Puneet Jain

Researcher at Hewlett-Packard Labs, Palo Alto.


Environmental fingerprinting has been proposed as a key enabler of immersive, highly contextualized mobile computing applications, especially augmented reality. While fingerprints can be constructed in many domains (e.g., wireless RF, magnetic field, and motion patterns), visual fingerprinting is especially appealing due to the inherent heterogeneity of many indoor spaces. This visual diversity, however, is also its Achilles' heel -- matching a unique visual signature against a database of millions requires either impractical computation on a mobile device or uploading large quantities of visual data to the cloud. Further, most visual "features" tend to be low entropy -- e.g., homogeneous repetitions of floor and ceiling tiles. Our system, VisualPrint, offloads only the most distinctive visual data, that is, only those visual signatures which stand a good chance of yielding a unique match. VisualPrint enables cloud-offloaded visual fingerprinting with efficacy comparable to using whole images, but with an order-of-magnitude reduction in network transfer.
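As a rough illustration of the offload principle (not VisualPrint's actual selection algorithm), the Python sketch below uses OpenCV's ORB features and uploads only the strongest descriptors; the response-based ranking and the top_k budget are assumptions made for illustration.

    import cv2
    import numpy as np

    def distinctive_fingerprint(image_path, top_k=50):
        """Extract feature descriptors and keep only the most distinctive ones.
        Ranking by detector response is an illustrative stand-in for
        VisualPrint's actual distinctiveness measure."""
        img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
        orb = cv2.ORB_create(nfeatures=2000)
        keypoints, descriptors = orb.detectAndCompute(img, None)
        if descriptors is None:
            return np.empty((0, 32), dtype=np.uint8)
        # Rank keypoints by detector response (corner strength) and keep top_k;
        # weak, repetitive features (floor/ceiling tiles) tend to score low.
        order = np.argsort([-kp.response for kp in keypoints])[:top_k]
        return descriptors[order]

    # Only the selected descriptors (~top_k * 32 bytes for ORB) are uploaded,
    # rather than the full image.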

This work was a Student Research Competition (SRC) runner-up at MobiCom 2015 and is to appear in the ACM CoNEXT 2016 main conference.


The idea of augmented reality -- the ability to look at a physical object through a camera and view annotations about it -- is certainly not new. Yet, this apparently feasible vision has not materialized into a precise, fast, and comprehensively usable system. This project asks: what does it take to enable augmented reality (AR) on smartphones today? To build a ready-to-use mobile AR system, we adopt a top-down approach cutting across smartphone sensing, computer vision, cloud offloading, and linear optimization. Our proposed system, OverLay, allows tagging of arbitrary physical objects through the camera's viewfinder. Later, these tags can be seen by others from different angles, locations, and times. Our approach requires no changes to infrastructure, localization schemes, specialized cameras, or modifications to the phone's operating system. Designed and developed for current-generation smartphones, our experiments in an indoor setup show promising results. If made commercially available, OverLay could immediately apply to city tourism, PoI discovery, infrastructure maintenance, and object privacy.
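A minimal sketch of how viewfinder-to-tag matching could work, assuming OpenCV ORB descriptors, a brute-force Hamming matcher, and Lowe's ratio test; the tag_store layout and thresholds here are hypothetical, not OverLay's actual pipeline.

    import cv2

    def find_tag(frame_gray, tag_store, min_matches=25):
        """Match a camera frame against stored tag descriptors.
        tag_store: dict mapping tag label -> ORB descriptors saved at tagging
        time (a hypothetical layout). Ratio-tested feature matching tolerates
        moderate changes in viewing angle and distance."""
        orb = cv2.ORB_create(nfeatures=1000)
        _, desc = orb.detectAndCompute(frame_gray, None)
        if desc is None:
            return None
        matcher = cv2.BFMatcher(cv2.NORM_HAMMING)
        best_label, best_count = None, 0
        for label, tag_desc in tag_store.items():
            pairs = matcher.knnMatch(desc, tag_desc, k=2)
            # Lowe's ratio test: keep matches clearly better than the runner-up.
            good = [p[0] for p in pairs
                    if len(p) == 2 and p[0].distance < 0.75 * p[1].distance]
            if len(good) > best_count:
                best_label, best_count = label, len(good)
        return best_label if best_count >= min_matches else None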

This work won the Best Demo award at HotMobile 2015 and appeared in the ACM MobiSys 2015 main conference.


Large-scale, predictive analytics on social data have proven effective. Over the last decade, the research community has understood the potential value of inferences based on influence analysis, belief propagation, epidemic spread, sentiment mining, and behavior analysis. However, the majority of these efforts are not yet amenable to realizing this value in a practical deployment. In particular, they have not been designed with real-time responsiveness as a requirement, and typically only make "predictions" of past events through a post-mortem analysis of historical data. In many applications, the utility of social analytics degrades within minutes or hours. In this research work, we develop a cloud-based framework to enable real-time temporal analysis and inference from streaming social data. As an exemplary instantiation, we tune the framework to predict which YouTube videos are most likely to "go viral" in the near future. We monitored Twitter for more than 90 days, and found that the framework's real-time predictions demonstrate strong correlation with future YouTube viewership and Google search trends.
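A toy sketch of the real-time flavor of such a pipeline: counting per-video Twitter mentions in a sliding window and ranking videos by short-term momentum. The ViralityTracker class and its scoring heuristic are illustrative assumptions, not the framework's actual predictor.

    from collections import defaultdict, deque
    import time

    class ViralityTracker:
        """Track per-video tweet mentions in a sliding window and rank videos
        by short-term momentum (an illustrative rate-of-change signal)."""
        def __init__(self, window_sec=3600):
            self.window_sec = window_sec
            self.mentions = defaultdict(deque)  # video_id -> mention timestamps

        def observe(self, video_id, ts=None):
            ts = ts if ts is not None else time.time()
            q = self.mentions[video_id]
            q.append(ts)
            # Evict mentions that have fallen out of the sliding window.
            while q and q[0] < ts - self.window_sec:
                q.popleft()

        def top_k(self, k=10):
            # Score = mentions in the recent half-window minus the older half:
            # a crude proxy for a video that is "going viral" right now.
            now = time.time()
            scores = {}
            for vid, q in self.mentions.items():
                recent = sum(1 for t in q if t >= now - self.window_sec / 2)
                scores[vid] = recent - (len(q) - recent)
            return sorted(scores, key=scores.get, reverse=True)[:k]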

This work appeared in the AAAI ICWSM 2014 main conference.


This project uses Hadoop data analytics, multi-view stereo reconstruction, and mobile-sensing-based dead reckoning to enable video clustering based on shared content. Crowdsourced videos uploaded from mobile devices often provide engaging and diverse perspectives not captured by professional videographers. Unfortunately, such multimedia is difficult to organize due to the scale of the data. Video clustering services depend on manual tagging or machine-mineable viewer comments. While manual indexing can be effective for popular, well-established videos, it does not apply to newer or live content. We envisage video-sharing services for live user video streams, indexed automatically and in real time, especially by shared content. Our implementation, FOCUS, is a Hadoop-on-cloud video-analytics service that uniquely leverages 3D model reconstruction and sensing-based dead reckoning to decipher and continuously track a video's line-of-sight. Through spatial understanding of the relative geometry of multiple lines-of-sight, FOCUS recognizes shared content even when viewed from diverse angles and distances. We use spatial overlap between multiple lines-of-sight as a metric to perform real-time video clustering on the cloud.
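A simplified sketch of a line-of-sight overlap test, assuming each video's pose is reduced to a camera position and a unit viewing direction. The closest-point-between-rays geometry is standard; the max_gap threshold is an illustrative assumption rather than FOCUS's actual metric.

    import numpy as np

    def los_overlap(p1, d1, p2, d2, max_gap=5.0):
        """Crude line-of-sight overlap test: do two viewing rays converge near
        a common point in front of both cameras? p*: camera position (3,),
        d*: unit viewing direction (3,)."""
        # Closest points between the two rays (standard skew-line formula).
        w = p1 - p2
        a, b, c = d1 @ d1, d1 @ d2, d2 @ d2
        d, e = d1 @ w, d2 @ w
        denom = a * c - b * b
        if abs(denom) < 1e-9:           # near-parallel rays: no convergence
            return False
        t1 = (b * e - c * d) / denom
        t2 = (a * e - b * d) / denom
        if t1 < 0 or t2 < 0:            # closest approach behind a camera
            return False
        gap = np.linalg.norm((p1 + t1 * d1) - (p2 + t2 * d2))
        return gap < max_gap            # rays converge on a shared region

Videos whose rays mutually pass this test could then be grouped with simple connected-components clustering.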

This work appeared in the ACM SenSys 2013 main conference.


This project uses multi-modal sensing and computer vision as proxy metadata to enable distant object localization on the cloud. As an example use case, while driving on a highway entering New York, we may want to look at one of the skyscrapers through the smartphone camera and compute its GPS location. While the problem would have been far more difficult five years ago, the growing number of sensors on smartphones, combined with advances in computer vision, has opened up important opportunities. We harness these opportunities through a system called Object Positioning System (OPS) that achieves reasonable localization accuracy. Our core technique uses computer vision to create an approximate 3D structure of the object and camera, and applies mobile phone sensors to scale and rotate the structure to its absolute configuration. Then, by solving (nonlinear) optimizations on the residual (scaling and rotation) error, we ultimately estimate the object's GPS position.
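A simplified sketch of the final estimation step, assuming only compass bearings from several camera fixes (OPS additionally fuses the vision-derived 3D structure). The locate_object function and its local east/north frame are illustrative assumptions, solved here with SciPy's nonlinear least squares.

    import numpy as np
    from scipy.optimize import least_squares

    def locate_object(cam_xy, bearings_rad):
        """Triangulate an object's position from several camera fixes.
        cam_xy: (N,2) camera positions in a local metric frame (e.g., meters
        east/north of a reference GPS point); bearings_rad: compass bearing to
        the object from each fix. A simplified stand-in for OPS's optimization."""
        def residuals(obj):
            dx = obj[0] - cam_xy[:, 0]
            dy = obj[1] - cam_xy[:, 1]
            predicted = np.arctan2(dx, dy)  # bearing clockwise from north
            err = predicted - bearings_rad
            return np.arctan2(np.sin(err), np.cos(err))  # wrap to [-pi, pi]
        guess = cam_xy.mean(axis=0) + [0.0, 100.0]
        return least_squares(residuals, guess).x

    # Example: three roadside fixes sighting the same skyscraper at (200, 500).
    cams = np.array([[0.0, 0.0], [50.0, 0.0], [100.0, 0.0]])
    obs = np.array([np.arctan2(200 - c[0], 500 - c[1]) for c in cams])
    print(locate_object(cams, obs))  # ~[200, 500]

The recovered local coordinates can then be converted back to an absolute GPS position relative to the reference point.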

This work appeared in the ACM MobiSys 2012 main conference.