A Big 3D Data Approach to Visual Scene Understanding


CIS Colloquium, Mar 28, 2014, 11:00AM – 12:00PM, Wachman 1015D

Jianxiong Xiao, Princeton University

On your one-minute walk from the coffee machine to your desk each morning, you pass by dozens of scenes (a kitchen, an elevator, your office), and you effortlessly recognize them and perceive their 3D structure. Yet this one-minute scene-understanding problem has been an open challenge in computer vision and artificial intelligence for decades. In this talk, I will share my experience in leveraging big 3D data for scene understanding. I will first motivate why big 3D data is necessary for training computer vision systems to match human performance. Then, I will talk about how to use big 3D data for both bottom-up object detection and top-down scene parsing. As examples, I will discuss SlidingShape, a 3D object detector trained on millions of depth maps rendered from CAD models, and PanoContext, a whole-room 3D context model for 360-degree panoramic scene understanding. Finally, I will address some remaining open challenges in leveraging big 3D data for computer vision, including dataset construction, pose registration, and data visualization. In particular, I will highlight our ongoing efforts in constructing the large-scale SUN3D Database, which tries to address these issues by exploiting the unique advantages of big 3D data.

Jianxiong Xiao is an Assistant Professor in the Department of Computer Science at Princeton University. He received his Ph.D. from the Computer Science and Artificial Intelligence Laboratory (CSAIL) at the Massachusetts Institute of Technology (MIT). His research interests are in computer vision, with a focus on data-driven scene understanding. He has been motivated by the goal of building computer systems that automatically understand visual scenes, both inferring the semantics (e.g., SUN Database) and extracting 3D structure (e.g., Big Museum). His work received the Best Student Paper Award at the European Conference on Computer Vision (ECCV) in 2012 and the Google Research Best Papers Award for 2012, and has appeared in the popular press in the United States. Jianxiong was awarded the Google U.S./Canada Fellowship in Computer Vision in 2012, the MIT CSW Best Research Award in 2011, and Google Research Awards in 2014. More information can be found at his group website: http://vision.princeton.edu.