The inclusion of rich sensors on modern smartphones has transformed mobile phones from simple communication devices into powerful human-centric sensing platforms. Similar trends are influencing other personal gadgets such as tablets, cameras, and wearable devices like Google Glass. Together, these sensors can provide
a high-resolution view of the user's context, ranging from simple information such as location and activity to high-level inferences about the user's intention, behavior, and social interactions. Understanding such context can help solve existing system-side
challenges and eventually enable a new world of real-life applications.
In this thesis, we propose to learn human behavior via multi-modal sensing. The intuition is that human behaviors leave footprints across different sensing dimensions: visual, acoustic, motion, and cyberspace. By collaboratively analyzing these footprints, the system can obtain valuable insights about the user. We show that the
analysis results can lead to a range of applications, including capturing life-logging videos, tagging user-generated photos, and enabling new forms of human-object interaction. Moreover, the same intuition can potentially be applied to enhance existing
system-side functionalities such as offloading, prefetching, and compression.