Course Summary
This graduate course is intended primarily for Ph.D. students who have basic familiarity with computer vision, image processing, and pattern recognition and who want to bring their knowledge and toolset up to the state of the art, with direct utility for their own research.
The course focuses on the challenges of computer vision by learning. We address the theoretical foundations of machine learning in conjunction with computer vision and present algorithms that achieve state-of-the-art performance while maintaining efficient execution with minimal supervision. We explain and emphasize machine learning for vision tasks such as concept detection with deep learning, fine-grained categorization using kernel pooling, semantic segmentation with conditional random fields, object tracking by structured SVMs, event recognition by random forests, and retrieval from a single image by metric learning. We give an overview of the latest developments and future trends in the field on the basis of several recent challenges, including the TRECVID and ImageNet competitions, the leading competitions for visual search engines based on computer vision by learning, and we indicate how to obtain improvements in the near future.
Course Material
To prepare for the course, students are advised to read the following two papers:
- Distinctive image features from scale-invariant keypoints. David Lowe. International Journal of Computer Vision, 60(2):91-110, 2004.
- Local Features and Kernels for Classification of Texture and Object Categories: A Comprehensive Study. J. Zhang, M. Marszałek, S. Lazebnik, C. Schmid. International Journal of Computer Vision, 73(2):213-238, 2007.
Course Schedule
Tuesday March 25, 2014: Computer Vision
Time | Room | Topic | Lecturer |
---|---|---|---|
0930-1015 | D1.116 | Introduction, observables, invariance | Arnold Smeulders |
1015-1030 | | Break | |
1030-1115 | D1.116 | Bag of Words, codebooks | Arnold Smeulders |
1115-1130 | | Break | |
1130-1215 | D1.116 | Object and scene classification, SVMs, codemaps | Cees Snoek |
1215-1330 | | Lunch break | |
1330-1700 | D1.111 | Lab: measuring features | |
Wednesday March 26, 2014: Machine Learning
Time | Room | Topic | Lecturer |
---|---|---|---|
0930-1015 | D1.115 | Pictorial structures | Laurens van der Maaten |
1015-1030 | | Break | |
1030-1115 | D1.115 | Latent and Structured SVMs | Laurens van der Maaten |
1115-1130 | | Break | |
1130-1215 | D1.115 | Convolutional networks | Laurens van der Maaten |
1215-1330 | | Lunch break | |
1330-1700 | D1.111 | Lab: pedestrian detection | data |
Thursday March 27, 2014: Spatiotemporal computer vision by learning
Time | Room | Topic | Lecturer |
---|---|---|---|
0930-1015 | D1.115 | Objects, spatial order, and concept interaction | Arnold Smeulders |
1015-1030 | | Break | |
1030-1115 | D1.115 | Motion and action recognition | Jan van Gemert |
1115-1130 | | Break | |
1130-1215 | D1.115 | Object tracking by learning | Arnold Smeulders |
1215-1330 | | Lunch break | |
1330-1700 | D1.111 | Lab: learning object and scene detectors | ImageMiner | Euvision Technologies |
Friday March 28, 2014: Large-scale computer vision by learning
Time | Room | Topic | Lecturer |
---|---|---|---|
0930-1015 | D1.115 | Benchmarking | Cees Snoek |
1015-1030 | | Break | |
1030-1115 | D1.115 | Computer vision by learning from the web | Cees Snoek |
1115-1130 | | Break | |
1130-1215 | D1.115 | Learning using attributes | Thomas Mensink |
1215-1330 | | Lunch break | |
1330-1600 | D1.111 | Lab: Fine-grained categorization using attributes | Data | |
1600 | D1.111 | Drinks (borrel) | |
Monday March 31, 2014: Invited tutorial by Shih-Fu Chang
Slides will be provided as soon as possible.
Time | Room | Topic | Lecturer |
---|---|---|---|
0930-1015 | G2.10 | Event Recognition and Recounting | Shih-Fu Chang |
1015-1030 | | Break | |
1030-1115 | G2.10 | Proportional SVM | Shih-Fu Chang |
1115-1130 | | Break | |
1130-1215 | G2.10 | Sentiment and Emotion | Shih-Fu Chang |
1215-1330 | | Lunch break | |
1400-1700 | G2.02 | Lab: your own research problem | |

Invited tutorial
- Shih-Fu Chang is the Richard Dicker Professor, Director of the Digital Video and Multimedia Lab, and Senior Vice Dean of the Engineering School at Columbia University. He is an active researcher leading the development of innovative technologies for multimedia information extraction and retrieval, while contributing to fundamental advances in the fields of machine learning, computer vision, and signal processing. Over the past several decades, his group has developed some of the earliest image/video search engines, laying the foundation of the vibrant field of content-based visual search. Recognized by many paper awards and by its citation impact, his scholarly work has set trends in several important areas, such as compressed-domain video manipulation, video structure parsing, image authentication, large-scale high-dimensional data indexing, and semantic video search. His group demonstrated the top performance in the international video retrieval evaluation forum TRECVID (2008 and 2010). The video concept classifier library, ontology, and annotated corpora from his group have been used by many groups worldwide. He co-led the ADVENT university-industry research consortium, with participation of more than 25 industry sponsors. He has received the IEEE Signal Processing Society Technical Achievement Award, the ACM SIG Multimedia Technical Achievement Award, the IEEE Kiyo Tomiyasu Award, Service Recognition Awards from IEEE and ACM, and the Great Teacher Award from the Society of Columbia Graduates. He has served as Editor-in-Chief of the IEEE Signal Processing Magazine (2006-2008), Chairman of the Columbia Electrical Engineering Department (2007-2010), Senior Vice Dean of the Columbia Engineering School (2012-present), and advisor to several companies and research institutes. His research has been broadly supported by government agencies as well as many industry sponsors. He is a Fellow of IEEE and the American Association for the Advancement of Science.
Lecturers
- Cees Snoek is currently an Associate Professor at the University of Amsterdam. In addition, he is head of R&D at Euvision Technologies, one of the lab's spin-offs. He was a visiting scientist at Carnegie Mellon University, Pittsburgh, PA and the University of California, Berkeley, CA. His research interest is video and image search by computer vision and learning.
- Laurens van der Maaten is an assistant professor at Delft University of Technology. He was previously at the University of California San Diego, Tilburg University, the University of Toronto, and Maastricht University. In Delft, Laurens heads the university's Computer Vision Laboratory. His research interests are in computer vision and machine learning.
- Arnold Smeulders is a professor of visual information analysis at the University of Amsterdam. He has an interest in cognitive vision, content-based image retrieval, and the picture-language question. Currently, he is with the national research institute CWI, scientific director of the large public-private COMMIT research program in the Netherlands, and chair of the policy committee for ICT research in the Netherlands. He has supervised 43 PhD students to graduation. He co-founded Euvision Technologies, a UvA spin-off for image search engine technologies.
Guest Lecturers
- Jan van Gemert is a computer vision researcher at the University of Amsterdam. He received a PhD degree from the University of Amsterdam. He was previously at MERL (USA), the National Institute of Informatics (Japan), and École Normale Supérieure (France). His research interests include image encodings, low-level visual features, image and video categorization, and action and object recognition.
- Thomas Mensink is a postdoctoral researcher at the University of Amsterdam. He obtained his PhD in 2012 from the LEAR team of INRIA Grenoble and the Computer Vision group of Xerox Research Centre Europe, in France. His research interests are in applying machine learning models to computer vision problems.