Course Summary

This graduate course is intended primarily for Ph.D. students who have basic familiarity with computer vision, image processing, and pattern recognition and who want to bring their knowledge and toolset up to the state of the art, with direct utility in their own research.

The course focuses on the challenges of computer vision by learning. We address the theoretical foundations of machine learning in conjunction with computer vision and present algorithms that achieve state-of-the-art performance while maintaining efficient execution with minimal supervision. We explain and emphasize machine learning for vision tasks such as concept detection with deep learning, fine-grained categorization using kernel pooling, semantic segmentation with conditional random fields, object tracking by structured SVMs, event recognition by random forests, and retrieval from a single image by metric learning. We give an overview of the latest developments and future trends in the field on the basis of several recent challenges, including the TRECVID and ImageNet competitions, the leading competitions for visual search engines based on computer vision by learning, and we indicate how to obtain improvements in the near future.

Course Material

To prepare for the course, students are advised to read the following two papers:

Course Schedule

Tuesday March 25, 2014: Computer Vision

Time       Room    Topic                                            Lecturer
0930-1015  D1.116  Introduction, observables, invariance            Arnold Smeulders
1030-1115  D1.116  Bag of Words, codebooks                          Arnold Smeulders
1130-1215  D1.116  Object and scene classification, SVMs, codemaps  Cees Snoek
1215-1330          Lunch break
1330-1700  D1.111  Lab: measuring features

Wednesday March 26, 2014: Machine Learning

Time       Room    Topic                       Lecturer
0930-1015  D1.115  Pictorial structures        Laurens van der Maaten
1030-1115  D1.115  Latent and Structured SVMs  Laurens van der Maaten
1130-1215  D1.115  Convolutional networks      Laurens van der Maaten
1215-1330          Lunch break
1330-1700  D1.111  Lab: pedestrian detection | data

Thursday March 27, 2014: Spatiotemporal computer vision by learning

Time       Room    Topic                                            Lecturer
0930-1015  D1.115  Objects, spatial order, and concept interaction  Arnold Smeulders
1030-1115  D1.115  Motion and action recognition                    Jan van Gemert
1130-1215  D1.115  Object tracking by learning                      Arnold Smeulders
1215-1330          Lunch break
1330-1700  D1.111  Lab: learning object and scene detectors | ImageMiner Euvision Technologies

Friday March 28, 2014: Large-scale computer vision by learning

Time       Room    Topic                                     Lecturer
0930-1015  D1.115  Benchmarking                              Cees Snoek
1030-1115  D1.115  Computer vision by learning from the web  Cees Snoek
1130-1215  D1.115  Learning using attributes                 Thomas Mensink
1215-1330          Lunch break
1330-1600  D1.111  Lab: fine-grained categorization using attributes | data

Monday March 31, 2014: Invited tutorial by Shih-Fu Chang

Slides will be provided ASAP.
Time       Room    Topic                             Lecturer
0930-1015  G2.10   Event Recognition and Recounting  Shih-Fu Chang
1030-1115  G2.10   Proportional SVM                  Shih-Fu Chang
1130-1215  G2.10   Sentiment and Emotion             Shih-Fu Chang
1215-1330          Lunch break
1400-1700  G2.02   Lab: your own research problem

Invited tutorial

  • Shih-Fu Chang

    Shih-Fu Chang is the Richard Dicker Professor, Director of the Digital Video and Multimedia Lab, and Senior Vice Dean of the Engineering School at Columbia University. He is an active researcher leading the development of innovative technologies for multimedia information extraction and retrieval, while contributing to fundamental advances in the fields of machine learning, computer vision, and signal processing. Over the past several decades, his group has developed some of the earliest image/video search engines, laying the foundation of the vibrant field of content-based visual search. Recognized by many paper awards and by its citation impact, his scholarly work has set trends in several important areas, such as compressed-domain video manipulation, video structure parsing, image authentication, large-scale high-dimensional data indexing, and semantic video search. His group demonstrated the top performance in the international video retrieval evaluation forum TRECVID (2008 and 2010). The video concept classifier library, ontology, and annotated corpora from his group have been used by many groups worldwide. He co-led the ADVENT university-industry research consortium, with the participation of more than 25 industry sponsors. He has received the IEEE Signal Processing Society Technical Achievement Award, the ACM SIG Multimedia Technical Achievement Award, the IEEE Kiyo Tomiyasu Award, Service Recognition Awards from IEEE and ACM, and the Great Teacher Award from the Society of Columbia Graduates. He has served as Editor-in-Chief of the IEEE Signal Processing Magazine (2006-2008), Chairman of the Columbia Electrical Engineering Department (2007-2010), Senior Vice Dean of the Columbia Engineering School (2012-present), and advisor for several companies and research institutes. His research has been broadly supported by government agencies as well as many industry sponsors. He is a Fellow of IEEE and the American Association for the Advancement of Science.


  • Cees Snoek

    is currently an Associate Professor at the University of Amsterdam. In addition, he is head of R&D at Euvision Technologies, one of the lab’s spin-offs. He was a visiting scientist at Carnegie Mellon University, Pittsburgh, PA and the University of California, Berkeley, CA. His research interest is video and image search by computer vision and learning.

  • Laurens van der Maaten

    is an assistant professor at Delft University of Technology. He was previously at the University of California, San Diego, Tilburg University, the University of Toronto, and Maastricht University. In Delft, Laurens heads the university's Computer Vision Laboratory. His research interests are in computer vision and machine learning.

  • Arnold Smeulders

    is professor in visual information analysis at the University of Amsterdam. He has an interest in cognitive vision, content-based image retrieval, and the picture-language question. Currently, he is with the national research institute CWI, scientific director of the large public-private COMMIT research program in the Netherlands, and chair of the policy committee for ICT research in the Netherlands. He has graduated 43 PhD students. He co-founded Euvision Technologies, a UvA spin-off for image search engine technologies.

Guest Lecturers

  • Jan van Gemert

    is a computer vision researcher at the University of Amsterdam. He received a PhD degree from the University of Amsterdam. He was previously at MERL (USA), the National Institute of Informatics (Japan), and École Normale Supérieure (France). His research interests include image encodings, low-level visual features, image and video categorization, and action and object recognition.

  • Thomas Mensink

    is a postdoctoral researcher at the University of Amsterdam. He obtained his PhD in 2012 from the LEAR team of INRIA Grenoble and the Computer Vision group of Xerox Research Centre Europe, in France. His research interests are in applying machine learning models to computer vision problems.