Project descriptions

Collaborative Filtering

A recommender system is concerned with presenting items (e.g. books on Amazon, movies on MovieLens, or music on Last.fm) that are likely to interest the user. In collaborative filtering, we base our recommendations on the user's (known) preferences for other items, and also take into account the preferences of other users.
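
For intuition, here is a minimal sketch of one classical approach: low-rank matrix factorization trained by stochastic gradient descent. This is an illustration under an assumed data layout and arbitrary hyperparameters, not a prescribed method.

```python
# Matrix-factorization sketch for collaborative filtering (illustration only).
# Assumes ratings are given as (user, item, rating) triples with 0-based indices.
import numpy as np

def factorize(triples, n_users, n_items, k=10, lr=0.01, reg=0.1, epochs=20):
    rng = np.random.default_rng(0)
    U = rng.normal(scale=0.1, size=(n_users, k))  # user latent factors
    V = rng.normal(scale=0.1, size=(n_items, k))  # item latent factors
    for _ in range(epochs):
        for u, i, r in triples:
            pu, qi = U[u].copy(), V[i].copy()
            err = r - pu @ qi                   # prediction error
            U[u] += lr * (err * qi - reg * pu)  # SGD step on user factors
            V[i] += lr * (err * pu - reg * qi)  # SGD step on item factors
    return U, V

# A rating prediction for user u and item i, clipped to the valid 1..5 range:
#   np.clip(U[u] @ V[i], 1.0, 5.0)
```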

Resources

All the necessary resources (including training data) are available at https://inclass.kaggle.com/c/cil-collab-filtering-2019

To participate, follow the link above.

Training Data

For this problem, we have acquired ratings of 10000 users for 1000 different items. All ratings are integer values between 1 and 5 stars.

Evaluation Metrics

Your collaborative filtering algorithm is evaluated according to the following weighted criteria:


Text Sentiment Classification

The use of microblogging and text messaging as media of communication has greatly increased over the past 10 years. Such large volumes of data amplify the need for automatic methods to understand the opinion conveyed in a text.

Resources

All the necessary resources (including training data) are available at https://inclass.kaggle.com/c/cil-text-classification-2019

To participate, follow the link above.

Training Data

For this problem, we have acquired 2.5M tweets classified as either positive or negative.
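
As a point of reference, a simple bag-of-words baseline can be sketched with scikit-learn. The file names (one tweet per line) and the 80/20 validation split below are assumptions for illustration, not the competition's prescribed format.

```python
# Bag-of-words + logistic regression baseline for tweet sentiment (sketch only).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def load(path, label):
    with open(path, encoding="utf-8") as f:
        return [(line.strip(), label) for line in f]

# Hypothetical file names; adapt to the actual training data layout.
data = load("train_pos.txt", 1) + load("train_neg.txt", 0)
texts, labels = zip(*data)

X_tr, X_val, y_tr, y_val = train_test_split(
    texts, labels, test_size=0.2, random_state=0)

vec = TfidfVectorizer(min_df=5)              # sparse unigram features
clf = LogisticRegression(max_iter=1000)
clf.fit(vec.fit_transform(X_tr), y_tr)
print("validation accuracy:", clf.score(vec.transform(X_val), y_val))
```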

Evaluation Metrics

Your approach is evaluated according to the following criteria:


Road Segmentation

Segmenting an image consists of partitioning the image into multiple segments (formally, one has to assign a class label to each pixel). A simple baseline is to partition an image into a set of patches and classify every patch according to some simple features (e.g. average intensity). Although this can produce reasonable results for simple images, natural images typically require more complex procedures that reason about the entire image or very large windows.
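
A sketch of that patch baseline is given below; the patch size and intensity threshold are arbitrary illustrative choices.

```python
# Patch-averaging baseline from the description above: split the image into
# fixed-size patches and threshold the mean intensity of each patch.
import numpy as np

def patch_baseline(img, patch=16, threshold=0.5):
    """img: 2-D array of grayscale intensities in [0, 1].
    Returns a per-pixel {road=1, background=0} mask."""
    h, w = img.shape
    mask = np.zeros((h, w), dtype=np.uint8)
    for y in range(0, h, patch):
        for x in range(0, w, patch):
            block = img[y:y + patch, x:x + patch]
            mask[y:y + patch, x:x + patch] = int(block.mean() > threshold)
    return mask
```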

Resources

All the necessary resources (including training data) are available at https://inclass.kaggle.com/c/cil-road-segmentation-2019

To participate, follow the link above.

Training Data

For this problem, we provide 100 aerial images acquired from Google Maps. We also provide ground-truth images in which each pixel is labeled as {road, background}. Your goal is to train a classifier to segment roads in these images, i.e. assign a label {road=1, background=0} to each pixel.

Evaluation Metrics

Your approach is evaluated according to the following criteria:

Galaxy Image Generation

In this project, you are given a mix of realistic cosmology images, corrupted cosmology images, and images which show other concepts such as landscapes.

Most of the images have been scored according to their similarity to the concept of a prototypical 'cosmology image' according to our dataset. A high similarity score like 2.61 means that the image almost coincides with the prototypical cosmology image, while a low similarity score like 0.00244 means that the image is a poor representative of a cosmology image -- probably because it has a different subject, such as a landscape, or is corrupted. You can assume that similarity scores take values in the interval [0.0, 8.0].

Beyond the scored images, you are given a smaller set of labeled images, which you can assume are drawn from the same distribution as the scored images. For these images, you are not given the similarity score, but you get labels: 1.0 means that the image is a real cosmology image, whereas 0.0 means it has been corrupted or shows another subject.

Task description

You are required to use the combination of scored and labeled images to build a generative model of the concept of a 'realistic cosmology image', and then use this model to solve the following two tasks:

a) Generate a set of realistic cosmology images, i.e. images that have a high similarity to the concept of 'cosmology image' according to our dataset. You are encouraged to submit a set of diverse images, i.e. you should not submit images that are perturbed versions of each other or perturbed versions of the scored images. This part of the competition is not judged via Kaggle but uses a custom submission at the end of the project.

b) For a set of query images, assign a similarity score to each of them. This part of the competition is judged via Kaggle; you can submit solution CSV files and track your public leaderboard scores.
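
For task (b), even a trivial regressor on raw pixels gives a starting point. The sketch below assumes a hypothetical scored.csv index (id,score), a scored/ image folder, and .png files; the actual data layout may differ.

```python
# Trivial predictor for task (b): ridge regression on downsampled pixels.
import csv
import numpy as np
from PIL import Image
from sklearn.linear_model import Ridge

def features(path, size=(64, 64)):
    """Grayscale, downsample, and flatten an image into a feature vector."""
    img = Image.open(path).convert("L").resize(size)
    return np.asarray(img, dtype=np.float32).ravel() / 255.0

with open("scored.csv") as f:          # hypothetical index file: id,score
    rows = list(csv.reader(f))[1:]     # skip header

X = np.stack([features(f"scored/{img_id}.png") for img_id, _ in rows])
y = np.array([float(score) for _, score in rows])

model = Ridge(alpha=1.0).fit(X, y)

# Predictions for query images, clipped to the stated range [0.0, 8.0].
preds = np.clip(model.predict(X[:5]), 0.0, 8.0)
```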

Resources

All the necessary resources (including training data) are available at https://inclass.kaggle.com/c/cil-cosmology-2019

To participate, follow the link above.

Evaluation Metrics

Your approach is evaluated according to the following criteria:



Computational infrastructure

Use ETH's new Leonhard cluster.



Report Grading Guidelines

Your paper will be graded by two independent reviewers according to the following three criteria:

1) Quality of paper (30%)
----------------

6.0: Good enough for submission to an international conference.
5.5: Background, method, and experiment are clear. May have minor issues in one or two sections. Language is good. Scores and baselines are well documented.
5.0: Explanation of work is clear, and the reader is able to identify the novelty of the work. Minor issues in one or two sections. Minor problems with language. Has all the recommended sections in the howto-paper.
4.5: Able to identify the contribution. Major problems in the presentation of results and/or ideas and/or reproducibility/baselines.
4.0: Hard to identify the contribution, but it is still there. One or two good sections should get students a pass.
3.5: Unable to see novelty. No comparison with any baselines.


2) Creativity of solution (20%)
----------------------

6.0: Elegant proposal, either making a useful assumption, studying a particular class of data, or using a novel mathematical fact.
5.5: A non-obvious combination of ideas presented in the course or published in a paper (depending on the difficulty of the idea).
5.0: A novel idea or combination not explicitly presented in the course.
4.5: An idea mentioned in a published paper with small extensions/changes that are nonetheless not trivial to implement.
<=4.0: A trivial idea taken from a published paper.


3) Quality of implementation (20%)
----------------------

6.0: Idea is executed well. The experiments make sense in order to answer the proposed research questions, and there are no obvious experiments left undone that could greatly increase clarity. The submitted code and other supplementary material are understandable, commented, complete, and clean, and there is a README file that explains them and describes how to reproduce your results.

Subtractions from this grade will be made if:
- the submitted code is unclear, does not run, experiments cannot be reproduced, or there is no description of it
- the experiments are useless for gaining understanding, are of unclear nature, or obviously useful experiments have been left undone
- comparisons to baselines are not done