#### Seminar: Advanced Topics in Machine Learning

Fall Semester 2014

*Abstract.*
In this seminar, recent papers from the pattern recognition and machine learning literature are presented and discussed. Topics include statistical models in computer vision, graphical models, and machine learning.
The seminar familiarizes students with recent developments in pattern recognition and machine learning. Original articles have to be presented and critically reviewed. The students will learn how to structure a scientific presentation in English that covers the key ideas of a scientific paper. An important goal of the seminar presentation is to summarize the essential ideas of the paper in sufficient depth while omitting details that are not essential for understanding the work. The presentation style will play an important role and should reach the level of professional scientific presentations.
The seminar covers a number of recent papers that have emerged as important contributions to the pattern recognition and machine learning literature. The topics vary from year to year but are centered on methodological issues in machine learning, such as new learning algorithms, ensemble methods, and new statistical models for machine learning applications. Frequently, papers are selected from computer vision or bioinformatics, two fields that rely increasingly on machine learning methodology and statistical models.

Introduction Slides with general information and all topics

Seminar Hours:

Tue 16-18 CAB H 52

Thu 16-18 CHN G 22

Start of sessions: 14 Oct

Contact:

Professors:
Joachim M. Buhmann, Thomas Hofmann, Andreas Krause

Postdocs:
Hamed Hassani,
Martin Jaggi,
Brian McWilliams

Schedule:

Tuesday Topics:

A - Learning Representations / Deep Networks

- Le, Q. V., Ngiam, J., Coates, A., Lahiri, A., Prochnow, B., & Ng, A. Y. (2011). On Optimization Methods for Deep Learning. ICML
- Bengio, Y., Courville, A., & Vincent, P. (2013). Representation Learning: A Review and New Perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(8), 1798–1828.
- Le, Q. V., & Mikolov, T. (2014). Distributed Representations of Sentences and Documents. ICML.
- Socher, R., Perelygin, A., Wu, J. Y., Chuang, J., Manning, C. D., Ng, A. Y., & Potts, C. (2013). Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank. EMNLP
- Hinton, G. E., Srivastava, N., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. R. (2012). Improving neural networks by preventing co-adaptation of feature detectors. arXiv.org.
- Wager, S., Wang, S., Liang, P. (2013). Dropout Training as Adaptive Regularization. NIPS
- Rahimi, A., & Recht, B. (2007). Random features for large-scale kernel machines. NIPS
- Coates, A., & Ng, A. Y. (2012). Learning Feature Representations with K-Means. LNCS (Vol. 7700, pp. 561–580). Springer
- Collobert, R., Weston, J., Bottou, L., Karlen, M., Kavukcuoglu, K., & Kuksa, P. (2011). Natural Language Processing (Almost) from Scratch. JMLR, 12, 2493–2537.
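For orientation, a minimal NumPy sketch of the inverted-dropout trick discussed in the Hinton et al. (2012) and Wager et al. (2013) papers above; the single ReLU layer and all names and sizes are illustrative, not taken from the papers:

```python
import numpy as np

# Inverted dropout on one ReLU layer: at training time each hidden unit
# is dropped with probability 1 - keep_prob and the survivors are rescaled,
# so the expected activation matches the dropout-free test-time pass.
def dense_forward(x, W, b, keep_prob=1.0, train=False, seed=0):
    h = np.maximum(0.0, x @ W + b)          # ReLU layer
    if train and keep_prob < 1.0:
        rng = np.random.default_rng(seed)
        mask = rng.random(h.shape) < keep_prob
        h = h * mask / keep_prob            # inverted scaling
    return h
```

With `keep_prob=1.0` the training and test passes coincide; otherwise each surviving unit is scaled by `1/keep_prob`.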

B - Optimization

- Juditsky, A., & Nesterov, Y. (2014). Deterministic and Stochastic Primal-Dual Subgradient Algorithms for Uniformly Convex Minimization. Submitted to Stochastic Systems.
- Shalev-Shwartz, S., & Zhang, T. (2013). Stochastic Dual Coordinate Ascent Methods for Regularized Loss Minimization. JMLR, 14, 567–599.
- Lacoste-Julien, S., Schmidt, M., & Bach, F. (2012). A simpler approach to obtaining an O(1/t) convergence rate for projected stochastic subgradient descent. arXiv.org.
- Shalev-Shwartz, S., Singer, Y., Srebro, N., & Cotter, A. (2010). Pegasos: Primal Estimated Sub-Gradient Solver for SVM. Mathematical Programming, 127(1), 3–30.
- Konečný, J., & Richtárik, P. (2013). Semi-Stochastic Gradient Descent Methods. arXiv.org.
- Johnson, R., & Zhang, T. (2013). Accelerating Stochastic Gradient Descent Using Predictive Variance Reduction. NIPS
- Zhao, P., & Zhang, T. (2014). Stochastic Optimization with Importance Sampling. arXiv.org.
- Shalev-Shwartz, S., & Srebro, N. (2008). SVM optimization: Inverse dependence on training set size. ICML
- Ahn, S., Korattikara, A., & Welling, M. (2012). Bayesian Posterior Sampling via Stochastic Gradient Fisher Scoring. ICML
- Duchi, J., Wainwright, M., & McMahan, B. (2013). Estimation, Optimization, and Parallelism when Data is Sparse. NIPS
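As a small taste of the stochastic subgradient methods in this block, here is a sketch of plain Pegasos (Shalev-Shwartz et al.) with step size 1/(λt); the toy data in the usage below and all parameter values are illustrative:

```python
import numpy as np

# Pegasos: stochastic subgradient descent on the regularized hinge loss.
# Each step shrinks w (the regularizer) and, if the sampled example
# violates the margin, adds a hinge-loss subgradient term.
def pegasos(X, y, lam=0.01, epochs=100, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(n):
            t += 1
            eta = 1.0 / (lam * t)                      # step size 1/(lambda t)
            if y[i] * (X[i] @ w) < 1.0:                # margin violated
                w = (1.0 - eta * lam) * w + eta * y[i] * X[i]
            else:
                w = (1.0 - eta * lam) * w
    return w
```

On linearly separable data the returned `w` classifies the training points correctly after a modest number of passes.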

C - Hashing & Randomization

- Dasgupta, S., & Gupta, A. (2002). An elementary proof of a theorem of Johnson and Lindenstrauss. Random Structures and Algorithms, 22(1), 60–65.
- Shi, Q., Petterson, J., Dror, G., Langford, J., Strehl, A. L., Smola, A. J., & Vishwanathan, S. V. N. (2009). Hash Kernels. AISTATS, 496–503.
- Rahimi, A., & Recht, B. (2007). Random features for large-scale kernel machines. NIPS
- Ogawa, K., Suzuki, Y., Suzumura, S., & Takeuchi, I. (2013). Safe Sample Screening for Support Vector Machines. ICML
- Liberty, E. (2013). Simple and Deterministic Matrix Sketching. KDD
- Drineas et al. (2012) Fast approximation of matrix coherence and statistical leverage. ICML
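The Johnson-Lindenstrauss idea behind several of these papers fits in a few lines; a sketch with a Gaussian projection matrix (dimensions and the distortion check are illustrative):

```python
import numpy as np

# Random projection in the Johnson-Lindenstrauss style: a Gaussian matrix
# with entries of variance 1/k approximately preserves pairwise Euclidean
# distances when mapping from d down to k dimensions.
def random_projection(X, k, seed=0):
    rng = np.random.default_rng(seed)
    R = rng.normal(0.0, 1.0 / np.sqrt(k), size=(X.shape[1], k))
    return X @ R
```

For, say, 20 points in 1000 dimensions projected to k = 300, all pairwise distances are typically preserved to within a modest relative error.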

D - Matrix Factorization

- Hardt, M. (2013). Understanding Alternating Minimization for Matrix Completion. arXiv.org.
- Bittorf, V., Recht, B., Ré, C., & Tropp, J. A. (2012). Factoring nonnegative matrices with linear programs. NIPS
- Gillis, N., & Luce, R. (2014). Robust Near-Separable Non-negative Matrix Factorization Using Linear Optimization. Journal of Machine Learning Research, 15, 1249–1280.
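The alternating minimization analyzed in the Hardt paper can be sketched as alternating ridge regressions; the regularizer, rank, and toy sizes here are illustrative choices, not the paper's setup:

```python
import numpy as np

# Alternating least squares for matrix completion: with V fixed, each row
# of U solves a small ridge problem over that row's observed entries,
# then the roles of U and V are swapped. mask marks observed entries.
def als_complete(M, mask, r, iters=100, reg=1e-3, seed=0):
    rng = np.random.default_rng(seed)
    m, n = M.shape
    U = rng.normal(size=(m, r))
    V = rng.normal(size=(n, r))
    for _ in range(iters):
        for i in range(m):
            obs = mask[i]
            A = V[obs].T @ V[obs] + reg * np.eye(r)
            U[i] = np.linalg.solve(A, V[obs].T @ M[i, obs])
        for j in range(n):
            obs = mask[:, j]
            A = U[obs].T @ U[obs] + reg * np.eye(r)
            V[j] = np.linalg.solve(A, U[obs].T @ M[obs, j])
    return U, V
```

On an exactly low-rank matrix with most entries observed, `U @ V.T` recovers the missing entries to high accuracy.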

Thursday Topics:

E - Data Summarization

- El-Arini, K., & Guestrin, C. Beyond Keyword Search: Discovering Relevant Scientific Literature. Proc. of 17th International Conference on Knowledge Discovery and Data Mining (KDD), 2011, San Diego, California
- Shahaf, D. & Guestrin, C. Connecting the Dots Between News Articles. KDD 2010.
- Shahaf, D., Yang, J., Suen, C., Jacobs, J., Wang, H., Leskovec, J. Information cartography: creating zoomable, large-scale maps of information. Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2013.
- Lin, H. and Bilmes, J. Learning Mixtures of Submodular Shells with Application to Document Summarization. In Uncertainty in Artificial Intelligence (UAI), AUAI, Catalina Island, USA, 2012.
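The common core of these summarization papers is greedy maximization of a monotone submodular objective; a sketch with a toy coverage function (the "documents" and concept sets are invented for illustration):

```python
# Greedy selection for a monotone submodular set function: repeatedly add
# the element with the largest marginal gain. For such objectives the
# greedy solution is within a (1 - 1/e) factor of optimal.
def greedy_submodular(f, ground, k):
    S = []
    for _ in range(k):
        gains = {x: f(S + [x]) - f(S) for x in ground if x not in S}
        S.append(max(gains, key=gains.get))   # largest marginal gain
    return S

# Toy coverage objective: each candidate "document" covers some concepts,
# and a summary is scored by how many distinct concepts it covers.
docs = {0: {1, 2, 3, 4}, 1: {4, 5, 6}, 2: {7}}
def coverage(S):
    return len(set().union(*(docs[x] for x in S))) if S else 0
```

Here greedy first picks document 0 (four new concepts), then document 1 (two more), rather than the narrow document 2.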

F - Networks, Crowds & Privacy

- Gomez-Rodriguez, M., Leskovec, J., Schölkopf, B. Modeling Information Propagation with Survival Theory. ICML 2013
- Karger, D. R., Oh, S., Shah, D. Iterative learning for reliable crowdsourcing systems, NIPS 2011
- Welinder, P., Branson, S., Belongie, S., Perona, P. The Multidimensional Wisdom of Crowds. NIPS 2010.
- Blum, A., Hajiaghayi, M., Ligett, K., Roth, A. Regret Minimization and the Price of Total Anarchy. STOC 2008.
- Hardt, M., Ligett, K., McSherry, F. A Simple and Practical Algorithm for Differentially Private Data Release. NIPS 2012
- Duchi, J.C., Jordan, M.I., Wainwright, M.J. Privacy Aware Learning. Journal of the ACM 2014
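For the privacy papers, the basic building block is the Laplace mechanism; a sketch (the query value, sensitivity, and epsilon below are illustrative numbers):

```python
import numpy as np

# Laplace mechanism: adding Laplace noise with scale sensitivity/epsilon
# to a numeric query answer makes its release epsilon-differentially
# private (for a query with the stated L1 sensitivity).
def laplace_mechanism(true_answer, sensitivity, epsilon, rng):
    return true_answer + rng.laplace(0.0, sensitivity / epsilon)
```

Smaller epsilon means stronger privacy and proportionally larger noise: the noise has standard deviation sqrt(2) * sensitivity / epsilon.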

G - Spectral Learning (quite theoretical)

- Vempala, S. and Wang, G. A spectral algorithm for learning mixtures of distributions. FOCS 2002
- Hsu, D., Kakade, S. M., and Zhang, T. A spectral algorithm for learning hidden Markov models, COLT 2009
- Song, L., Boots, B., Siddiqi, S. M., Gordon, G. J., & Smola, A. J. Hilbert space embeddings of hidden Markov models. In Proc. 27th Intl. Conf. on Machine Learning (ICML), 2010.
- Anandkumar, A., Foster, D., Hsu, D., Kakade, S. M., & Liu, Y. A spectral algorithm for latent Dirichlet allocation. arXiv:1204.6703, 2012
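To give a flavor of the spectral approach (in the spirit of Vempala & Wang, though far simpler than their algorithm): for two well-separated Gaussians, the top singular direction of the centered data aligns with the line between the means, so projecting onto it separates the clusters. The dimensions and separation below are illustrative:

```python
import numpy as np

# Project the centered data onto its top right singular vector and split
# at zero; for a well-separated two-component mixture this recovers the
# cluster labels (up to a global sign flip).
def spectral_split(X):
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return (Xc @ Vt[0]) > 0
```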

H - Dependence and Model Validation

- Gretton, A., et al. (2012). A Kernel Two-Sample Test. JMLR
- Lopez-Paz, D., Hennig, P., & Schölkopf, B. (2013). The Randomized Dependence Coefficient. NIPS
- Kleiner, A., Talwalkar, A., Sarkar, P., Jordan, M.I., (2014) A Scalable Bootstrap for Massive Data. Journal of the RSS, Series B, 76, 795-816.
- Agarwal, A., Bartlett, P.L., Duchi, J. (2012) Oracle inequalities for computationally adaptive model selection. arXiv.
- Niu, D., Dy, J., Jordan, M.I. (2013) Iterative discovery of multiple alternative clustering views. IEEE TPAMI.
- Watanabe, S. (2013). A Widely Applicable Bayesian Information Criterion. JMLR, 14, 867–897.
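The statistic behind the kernel two-sample test of Gretton et al. is the maximum mean discrepancy; a sketch of its unbiased squared estimate with an RBF kernel (gamma and the sample sizes in the usage are illustrative choices):

```python
import numpy as np

# Unbiased estimate of squared MMD with an RBF kernel: compare the mean
# within-sample kernel similarities against the cross-sample similarity.
# Near zero when X and Y come from the same distribution.
def mmd2(X, Y, gamma=0.5):
    def K(A, B):
        sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * sq)
    m, n = len(X), len(Y)
    Kxx, Kyy, Kxy = K(X, X), K(Y, Y), K(X, Y)
    return ((Kxx.sum() - np.trace(Kxx)) / (m * (m - 1))
            + (Kyy.sum() - np.trace(Kyy)) / (n * (n - 1))
            - 2.0 * Kxy.mean())
```

Two samples from the same Gaussian give an estimate near zero, while a mean-shifted sample gives a clearly larger value.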

I - Structure Learning / Graphical Models

- Wang J., Jebara, T., Chang, S. (2013). Semi-Supervised Learning Using Greedy Max-Cut, JMLR 14, 771-800.
- Nicolaou, M., Zafeiriou, S., Pantic, M. (2014) A Unified Framework for Probabilistic Component Analysis. Machine Learning and Knowledge Discovery in Databases
- Shotton, Sharp, Kohli, Nowozin, Winn, Criminisi (2013) Decision Jungles: Compact and Rich Models for Classification. NIPS
- Loh, P., Wainwright, M., (2012) Structure estimation for discrete graphical models: Generalized covariance matrices and their inverses. NIPS
- Tang, Ben Ayed, Boykov (2014) Pseudo-bound Optimization for Binary Energies. ECCV
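As a classic baseline for the graph-based semi-supervised theme in this block (it is not the algorithm of any of the listed papers), label propagation on a weighted graph can be sketched as repeated neighborhood averaging with the labeled nodes clamped:

```python
import numpy as np

# Label propagation: iterate f <- D^{-1} W f while clamping the labeled
# nodes, then threshold. Converges to the harmonic solution on the graph.
def label_propagation(W, y, iters=200):
    f = y.astype(float)         # y: +1/-1 for labeled nodes, 0 for unlabeled
    labeled = y != 0
    deg = W.sum(axis=1)
    for _ in range(iters):
        f = (W @ f) / deg       # average over graph neighbors
        f[labeled] = y[labeled] # clamp the labeled nodes
    return np.sign(f)
```

On a 5-node chain with the endpoints labeled +1 and -1, the nodes adjacent to each endpoint inherit its label, and the middle node sits exactly at zero.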