## Computational Intelligence Lab

### Spring Semester 2019

This laboratory course teaches fundamental concepts in computational science and machine learning based on **matrix factorization**,

**X** ≈ **A** · **B**,

where a data matrix **X** is (approximately) factorized into two matrices **A** and **B**. Depending on the choice of approximation quality measure and the constraints on **A** and **B**, the method provides a powerful framework of numerical linear algebra that encompasses many important techniques, such as dimension reduction, clustering and sparse coding.
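As a minimal sketch of the factorization described above, the following NumPy snippet computes a rank-k approximation with a truncated singular value decomposition, which is one of several ways to obtain **A** and **B**. The function name `low_rank_factorization`, the toy matrix and the rank `k` are illustrative assumptions and not part of the course material.

```python
import numpy as np


def low_rank_factorization(X, k):
    """Return A (m x k) and B (k x n) such that X is approximately A @ B."""
    # Thin SVD: X = U @ diag(s) @ Vt, singular values sorted in descending order.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    A = U[:, :k] * s[:k]  # scale the first k left singular vectors column-wise
    B = Vt[:k, :]         # keep the first k right singular vectors
    return A, B


# Toy data (assumption): a 6 x 5 matrix that has exactly rank 2.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 2)) @ rng.normal(size=(2, 5))

A, B = low_rank_factorization(X, k=2)
error = np.linalg.norm(X - A @ B)  # Frobenius norm of the residual
print(A.shape, B.shape, error)
```

With the squared Frobenius norm as quality measure and no further constraints, the truncated SVD gives the best rank-k approximation (Eckart-Young theorem). Other measures or constraints, such as non-negativity, lead to different algorithms.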

### News

Date | Info |
---|---|
21/22 February | First Exercise Sessions |
22 February | First Lecture |
1 February | The Q&A forum Piazza is online |

### Course Overview

Week | Topic | Lecture | Tutorial/Exercise | Solution |
---|---|---|---|---|
8 | Introduction, Linear Autoencoder | CIL2019-00, CIL2019-01-LinearAutoencoder | Exercise 1 | Solution 1 |
9 | Principal Component Analysis | CIL-2019-02-PCA | Exercise 2, Tutorial 2 | Solutions 2 (code), Solutions 2 (pdf) |
10 | Matrix Reconstruction | CIL2019-03-Matrix-Reconstruction | Tutorial 3, Exercise 3, Dataset and Code | Solution 3 |
11 | Non-Negative Matrix Factorization | CIL2019-04-Non-Negative | Tutorial 4: slides, Tutorial 4: notes, Exercise 4 | Solution 4 |
12 | Word embeddings | TBA | Exercise 5 | TBA |
13 | Clustering, Mixture Models | | | |
14 | Neural Networks | | | |
15 | Generative models | | | |
16 | (no lecture, Easter holiday) | | | |
17 | (no lecture, Easter holiday) | | | |
18 | Sparse coding | | | |
19 | Dictionary Learning | | | |
20 | TBD | | | |

### Classes

What | Time | Room |
---|---|---|
Lectures | Fri 8 - 10 | HG E 7 |
Exercises | Thu 15 - 17 | CAB G 51 |
 | Thu 16 - 18 | CHN C 14 |
 | Fri 15 - 17 | CAB G 61 |
Presence hour | Mon 11 - 12 | CAB H 53 |

### Organization

#### Video Recordings

Video Recordings of the 2019 Lecture, Video Recordings of the 2018 Lecture

#### Exercises

Exercise sheets contain pen-and-paper and implementation problems that help you solidify the theory presented in the lecture and identify gaps in your understanding. You are strongly encouraged to participate actively in the exercise classes.

#### Semester Project

The semester project is an integral part of the CIL course. **Participation is mandatory. Failing the project results in a failing grade for the overall CIL course.**

You work in groups of *three to four* students (no more, no less) to develop novel solutions to one of four topics.

Building on the implementations you develop during the semester, you and your teammates create a novel solution by combining and extending previous work. You compare your solution to at least two baselines and submit it to the online ranking system for competitive evaluation. Finally, you write up your methodology and present your experimental results in the form of a short scientific paper.

Project reports are due on **Friday, July 5th, 2019 (midnight)**. Deadlines for competitive submissions are given on Kaggle.

A detailed description of the semester project formalities can be found below.

#### Written Exam

The written exam lasts 120 minutes. The language of examination is English. As written aids, you may bring two A4 pages (i.e. one A4 sheet of paper), either handwritten or typed in a font size of at least 11 points.

#### Old Exams

cil-exam-2010.pdf, cil-exam-2012.pdf, cil-exam-2015.pdf, cil-exam-2016.pdf.

#### Grade

Your final grade will be determined by the written final exam (70% weight) and the semester project (30% weight).
**The project must be passed on its own** and has a bonus/penalty function.
**Failing the project results in a failing grade for the overall examination of the CIL course.**
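The weighting above can be sketched as follows. This is only an illustration under stated assumptions: a pass mark of 4.0 for the project, no rounding, and no modelling of the bonus/penalty function, whose form is not given here. The function name `final_grade` is hypothetical, and the binding rules are those in the course catalog.

```python
def final_grade(exam, project, pass_mark=4.0):
    """Weighted final grade, or None if the project was not passed.

    Assumptions: pass mark of 4.0, no rounding, and no bonus/penalty
    adjustment, because its exact form is not specified.
    """
    if project < pass_mark:
        # A failed project means a failing grade for the whole course.
        return None
    return 0.7 * exam + 0.3 * project


print(final_grade(exam=5.0, project=5.5))  # weighted average of both parts
print(final_grade(exam=6.0, project=3.5))  # None: project not passed
```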

Binding performance assessment information can be found in the course catalog.

#### Piazza Q&A Forum

Please post questions on the Piazza Q&A forum. You can sign up here using the lecture ID 263-0008-00L. You are welcome to join the discussion of your peers' questions.

### Semester Project

The semester project is an integral part of the CIL course. Participation is mandatory. Failing the project results in a failing grade for the overall CIL course.

You work in groups of *three to four* students (no more, no less) to develop novel solutions to one of four topics.
You may use Piazza to find team members or join an existing team.

Students repeating the course can either resubmit the previous year's project as-is, in which case they form a group of one person and must inform us in advance, or submit a new project in a (new) regular group.

As part of the semester project, you and your teammates are expected to

- Develop a novel solution, e.g. by combining and extending methods from the programming exercises.
- Compare your novel solution to at least two baseline algorithms.
- Submit your novel solution for evaluation to the Kaggle online ranking system.
- Write up your findings in a short scientific paper.

As a rough guide, you may approach the problem as follows: (i) Study the project description sheet. (ii) Download the training data and implement the baselines. (iii) Develop, debug and optimise your novel solution on the training data. (iv) Submit your solution for online evaluation on test data. (v) See where you stand in a ranking of all submissions.

#### Project Descriptions

The project descriptions can be found here.

#### Developing a Novel Solution

You are free to exploit any idea you have, provided it is not identical to any other group submission or existing implementation of an algorithm on the internet.

#### Comparison to Baselines

You must compare your solution to at least two baseline algorithms. For the baselines, you can use the implementations you developed as part of the programming exercises.

#### Ranking of Novel Solution

You must submit your novel algorithm to the Kaggle online ranking system. See the project descriptions for details.

#### Scientific Report

For instructions on how to write a scientific paper, see the following [PDF] [source]

The write-up must be a maximum of **4 pages** long (excluding references).

#### Project Submission

To submit your report, please go to https://cmt3.research.microsoft.com/ETHZCIL2019, register and follow the instructions given there. You can resubmit any number of times until the deadline passes.

- When you upload the final version of your report, you must also upload the Python code used for your final Kaggle submission. The code should be well documented and generate the predictions in the required format, as uploaded to Kaggle. For reproducibility, also include any code used to produce plots and additional experiments.
- Include the name of your group in the header of the submitted PDF file, e.g. `\author{Author1, Author2 \& Author3 \\ group: cil\_nerds, Department of Computer Science, ETH Zurich, Switzerland}`
- Attach the signed plagiarism form at the end of your paper (scan).

#### Project Grading

The project grade is composed of a competitive (30%) and a non-competitive (70%) part.

Competitive grade (30%): The ranks in the Kaggle competition system will be converted on a linear scale to a grade between 4 and 6.
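The exact conversion is not specified beyond being linear. A minimal sketch, assuming that rank 1 maps to 6.0 and the last rank maps to 4.0 (the function name `rank_to_grade` and these endpoints are assumptions), could look like this:

```python
def rank_to_grade(rank, n_teams, best=6.0, worst=4.0):
    """Map a leaderboard rank (1 = best) linearly onto [worst, best].

    Assumption: rank 1 receives `best` and rank `n_teams` receives `worst`.
    """
    if not 1 <= rank <= n_teams:
        raise ValueError("rank must lie between 1 and n_teams")
    if n_teams == 1:
        return best
    # Linear interpolation between the best and the worst grade.
    return best - (best - worst) * (rank - 1) / (n_teams - 1)


print(rank_to_grade(1, 21), rank_to_grade(11, 21), rank_to_grade(21, 21))
```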

Non-competitive grade (70%): The following criteria are graded based on an evaluation by the teaching assistants: quality of paper (30%), creativity of solution (20%), quality of implementation (20%). Each project is graded by two independent reviewers. Each reviewer's grades are de-biased so that the average grade over all projects a reviewer graded is comparable across reviewers.
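The de-biasing procedure is not described in detail. One plausible reading, shown here purely as an assumption, shifts each reviewer's grades so that the reviewer's mean equals the overall mean; the weights stated above then combine the parts into a project grade. The names `debias` and `project_grade` and the toy grades are hypothetical.

```python
from statistics import mean


def debias(grades_by_reviewer):
    """Shift each reviewer's grades so that every reviewer has the same mean.

    grades_by_reviewer maps reviewer -> {project: grade}. Mean-shifting is an
    assumed procedure, not necessarily the one used by the course.
    """
    all_grades = [g for grades in grades_by_reviewer.values()
                  for g in grades.values()]
    overall = mean(all_grades)
    return {
        reviewer: {project: grade - mean(list(grades.values())) + overall
                   for project, grade in grades.items()}
        for reviewer, grades in grades_by_reviewer.items()
    }


def project_grade(competitive, paper, creativity, implementation):
    """Combine the parts with the stated weights (30/30/20/20 percent)."""
    return (0.3 * competitive + 0.3 * paper
            + 0.2 * creativity + 0.2 * implementation)


# Toy example (assumption): reviewer r1 grades more generously than r2.
raw = {"r1": {"p1": 5.0, "p2": 6.0}, "r2": {"p1": 4.0, "p3": 5.0}}
debiased = debias(raw)
print(debiased)
print(project_grade(competitive=5.0, paper=5.5, creativity=4.5,
                    implementation=6.0))
```

After the shift, both reviewers have the same average, so a project is not penalised merely for being assigned to a stricter reviewer.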

### Reading Material

Here is a list of additional material if you want to read up on a specific topic of the course.

#### Probability Theory and Statistics

Chapter 1.2 "Probability Theory" in: Christopher M. Bishop (2006). *Pattern Recognition and Machine Learning*. Springer.

Larry Wasserman (2003). *All of Statistics*. Springer.

#### Linear Algebra

Gene Golub and Charles Van Loan (1996). *Matrix Computations*. The Johns Hopkins University Press.

Lloyd Trefethen and David Bau (1997). *Numerical Linear Algebra*. SIAM.

Dave Austin. *We recommend a Singular Value Decomposition*. (SVD tutorial)

Michael Mahoney. *Randomized algorithms for matrices and data*. (Recent review paper)

#### Collaborative Filtering

Yehuda Koren, Robert Bell and Chris Volinsky (2009). *Matrix Factorization Techniques for Recommender Systems*. IEEE Computer.

#### Clustering

Chapter 9 "Mixture Models and EM" in: Christopher M. Bishop (2006). *Pattern Recognition and Machine Learning*. Springer.

#### Neural Networks

TensorFlow tutorials and the Udacity Deep Learning lecture.

#### Sparse Coding

Chapter 1 "Sparse Representations" in: Stephane Mallat (2009). *A Wavelet Tour of Signal Processing - The Sparse Way*. Academic Press.

Chapter 6 "Wavelet Transforms", pp. 244-276; in: Andrew S. Glassner (1995). *Principles of Digital Image Synthesis*, Vol. 1. Morgan Kaufmann Publishers, Inc.

Chapter 13 "Fourier and Spectral Applications", pp. 699-716; in: William H. Press, Saul A. Teukolsky, William T. Vetterling and Brian P. Flannery (2007). *Numerical Recipes. The Art of Scientific Computing*. Cambridge University Press.

Aharon, Elad and Bruckstein (2005). *K-SVD: Design of Dictionaries for Sparse Representation.* Proceedings of SPARS.

Richard Baraniuk (2007). *Compressive sensing*. IEEE Signal Processing Magazine.

### Contact

Please post questions on the Piazza Q&A forum. You can sign up here using the lecture ID 263-0008-00L. You are welcome to join the discussion of your peers' questions.

Role | Name |
---|---|
Lecturer | Prof. Thomas Hofmann |
Head TA | Viktor Gal |
Head TA | Kevin Roth |
TA | Yannic Kilcher |
TA | Hadi Daneshmand |
TA | Jonas Kohler |
TA | Antonio Orvieto |
TA | Dario Pavllo |
TA | Gideon Dresdner |
TA | Mikhail Karasikov |
TA | Matthias Hüser |