Computer Perception with Deep Learning

CIS Distinguished Lecture Series, Nov 20, 2013, 11:00AM – 12:00PM, Barton 108A

Yann LeCun, School of Computer Science, University of Massachusetts, Amherst

Pattern recognition tasks, particularly perceptual tasks such as vision and audition, require the extraction of good internal representations of the data prior to classification. Designing feature extractors that turn raw data into suitable representations for a classifier often requires a considerable amount of engineering and domain expertise. The purpose of the emergent field of “Deep Learning” is to devise methods that can train entire pattern recognition systems in an integrated fashion, from raw inputs to ultimate output, using a combination of labeled and unlabeled samples. Deep learning systems are multi-stage architectures in which the perceptual world is represented hierarchically. Features in successive stages are increasingly global, abstract, and invariant to irrelevant transformations of the input. Convolutional networks (ConvNets) are a particular type of deep architecture that is somewhat inspired by biology and consists of multiple stages of filter banks, interspersed with non-linear operations and spatial pooling. Deep learning models, particularly ConvNets, have become the record holders on a wide variety of benchmarks and competitions, including object recognition in images, semantic image labeling (2D and 3D), acoustic modeling for speech recognition, drug design, Asian handwriting recognition, pedestrian detection, road sign recognition, and biological image segmentation. The most recent speech recognition and image analysis systems deployed by Google, IBM, Microsoft, Baidu, NEC and others use deep learning, and many use convolutional networks. A number of supervised and unsupervised methods to train deep convolutional networks, the latter based on sparse auto-encoders, will be presented.
Several applications will be shown through videos and live demos, including a category-level object recognition system that can be trained on the fly, a system that can label every pixel in an image with the category of the object it belongs to (scene parsing), and a pedestrian detector. Specialized hardware architectures that run these systems in real time will also be described.
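
For readers unfamiliar with the terminology, the abstract's description of a ConvNet stage (a filter bank, followed by a non-linear operation, followed by spatial pooling) can be illustrated with a minimal NumPy sketch. This is not the speaker's implementation: the function names, the choice of a rectifier as the non-linearity, the use of max pooling, and the random filters are all assumptions made for illustration. In a trained network the filters would be learned from data.

```python
import numpy as np

def filter_bank(image, filters):
    """Valid 2D cross-correlation of one image with each square filter in the bank."""
    k = filters.shape[1]
    h, w = image.shape
    out = np.empty((filters.shape[0], h - k + 1, w - k + 1))
    for f, kernel in enumerate(filters):
        for i in range(h - k + 1):
            for j in range(w - k + 1):
                out[f, i, j] = np.sum(image[i:i + k, j:j + k] * kernel)
    return out

def rectify(x):
    """Point-wise non-linearity (assumed here to be a rectifier, max(x, 0))."""
    return np.maximum(x, 0.0)

def max_pool(maps, size=2):
    """Spatial pooling: maximum over non-overlapping size x size windows."""
    n, h, w = maps.shape
    h2, w2 = h // size, w // size
    cropped = maps[:, :h2 * size, :w2 * size]  # drop rows/columns that do not fill a window
    return cropped.reshape(n, h2, size, w2, size).max(axis=(2, 4))

def convnet_stage(image, filters, pool=2):
    """One stage: filter bank -> non-linearity -> spatial pooling."""
    return max_pool(rectify(filter_bank(image, filters)), pool)

# Hypothetical inputs: an 8x8 random "image" and four random 3x3 filters.
rng = np.random.default_rng(0)
image = rng.standard_normal((8, 8))
filters = rng.standard_normal((4, 3, 3))
features = convnet_stage(image, filters)
print(features.shape)  # (4, 3, 3)
```

Stacking several such stages, each consuming the feature maps of the previous one, gives the multi-stage hierarchy the abstract refers to; the pooling step is what makes later features less sensitive to small shifts of the input.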

Yann LeCun is the founding director of the Center for Data Science at New York University, and Silver Professor of Computer Science, Neural Science, and Electrical Engineering at the Courant Institute of Mathematical Sciences, the Center for Neural Science, and the ECE Department at NYU-Poly. He received the Electrical Engineer Diploma from Ecole Supérieure d’Ingénieurs en Electrotechnique et Electronique (ESIEE), Paris in 1983, and a PhD in Computer Science from Université Pierre et Marie Curie (Paris) in 1987. After a postdoc at the University of Toronto, he joined AT&T Bell Laboratories in Holmdel, NJ in 1988. He became head of the Image Processing Research Department at AT&T Labs-Research in 1996. He joined NYU as a professor in 2003, after a brief period as a Fellow of the NEC Research Institute in Princeton. His current interests include machine learning, computer perception, mobile robotics, and computational neuroscience. He has published over 180 technical papers and book chapters on these topics as well as on neural networks, handwriting recognition, image processing and compression, and dedicated circuits and architectures for computer perception. The character recognition technology he developed at Bell Labs is used by several banks around the world to read checks and was reading between 10% and 20% of all the checks in the US in the early 2000s. His image compression technology, called DjVu, is used by hundreds of web sites and publishers and millions of users to access scanned documents on the Web. A pattern recognition method he developed, called the convolutional network, is the basis of products and services deployed by companies such as AT&T, Google, Microsoft, NEC, IBM and Baidu for document recognition, human-computer interaction, image tagging, speech recognition, and video analytics. LeCun has been on the editorial boards of IJCV, IEEE PAMI, and IEEE Trans. Neural Networks, was program chair of CVPR’06, and is chair of ICLR 2013 and 2014.
He is on the science advisory board of the Institute for Pure and Applied Mathematics, and has advised many large and small companies about machine learning technology, including several startups he co-founded. He is the recipient of the 2014 IEEE Neural Network Pioneer Award.