Projects
Representing Uncertain Databases as World-Set Decomposition
Data Wrangling
This work is based on the paper 10^10^6 Worlds and Beyond: Efficient Representation and Processing of Incomplete Information by Lyublena Antova, Christoph Koch, and Dan Olteanu, in which the authors present an efficient way to represent incomplete information in databases using World-Set Decompositions (WSDs). Because this model is a strong representation system for every query language, the result of any query on a WSD can itself be expressed as a WSD. In this project, I implemented algorithms to answer relational algebra queries on WSDs. Finally, I tested my algorithms on a dataset with real incompleteness, taken from the Paris Open Data website, that contains information about all the trees in Paris.
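To give an intuition of the idea, here is a minimal Python sketch of a world-set decomposition. It is only an illustration, not the code of the project: the toy data, the field naming scheme and the helper names (`world_count`, `worlds`, `possible_values`) are all assumptions. The set of possible worlds is stored as a product of small components, so the number of worlds is the product of the component sizes while the storage is only their sum.

```python
from itertools import product

# Hypothetical toy data. A field is a (tuple_id, attribute) pair, and each
# component lists the alternative joint values of the fields it covers.
components = [
    # The species of tree t1 is certain: a single alternative.
    [{("t1", "species"): "oak"}],
    # The height of t1 is one of two values.
    [{("t1", "height"): 10}, {("t1", "height"): 12}],
    # Species and height of t2 are correlated, so they share one component.
    [{("t2", "species"): "plane", ("t2", "height"): 20},
     {("t2", "species"): "lime", ("t2", "height"): 15}],
]

def world_count(components):
    """Number of represented worlds: the product of the component sizes."""
    count = 1
    for component in components:
        count *= len(component)
    return count

def worlds(components):
    """Enumerate every world by picking one alternative per component."""
    for choice in product(*components):
        world = {}
        for local_world in choice:
            world.update(local_world)
        yield world

def possible_values(components, field):
    """Values a field can take, read off the components without expanding worlds."""
    return {local[field] for comp in components for local in comp if field in local}
```

Here `world_count(components)` is 4, although only five alternatives are stored. Questions such as `possible_values(components, ("t1", "height"))` can be answered on the components directly, which is what makes the decomposition useful when the number of worlds is huge.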
Adaptive Heuristics
Game Theory
This work is based on the paper Adaptive Heuristics by Sergiu Hart, in which the author presents the concept of adaptive heuristics and various strategies such as Regret Matching and Generalized Regret Matching. I wrote a 10-page report on this paper, gave a 30-minute presentation including a proof of the main theorem, and implemented the main strategies presented in the paper.
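As an illustration, here is a minimal Python sketch of the simpler, unconditional form of regret matching, in which each action is played with probability proportional to its positive cumulative regret. It is not the code of the project, and the function names and the payoff encoding are assumptions made for the example.

```python
def regret_matching_strategy(cum_regret):
    """Mixed strategy proportional to the positive part of each cumulative regret."""
    positive = [max(r, 0.0) for r in cum_regret]
    total = sum(positive)
    n = len(cum_regret)
    if total == 0:
        # No action is regretted: fall back to the uniform strategy.
        return [1.0 / n] * n
    return [p / total for p in positive]

def update_regrets(cum_regret, payoffs, played):
    """Add, for every action a, what a would have earned minus what was earned.

    payoffs[a] is the payoff action a would have obtained this round
    (assumed to be observable), and played is the index of the action used.
    """
    return [r + payoffs[a] - payoffs[played] for a, r in enumerate(cum_regret)]
```

For example, with cumulative regrets `[2.0, -1.0, 6.0]`, the strategy is `[0.25, 0.0, 0.75]`: the second action is never played because its regret is negative.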
Robust Adversarial Networks for Image Detection
Data Science Projects
In this third project of the Data Science Projects course, our goal was to build neural networks for image classification that are robust to classic adversarial attacks such as PGD and FGSM, the latter presented in Explaining and Harnessing Adversarial Examples by Goodfellow et al. We first built a basic neural network and tested the different attacks on it. Then, we tried to make a network as robust as possible to these attacks. We got very good results when training a network with a contrastive loss on adversarial examples.
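To show what such an attack looks like, here is a minimal Python sketch of FGSM on a logistic regression model instead of a neural network, so that the gradient can be written by hand. The model, the weights and the function names are assumptions for the example, not the networks of the project. The attack moves the input by eps in the direction of the sign of the gradient of the loss with respect to the input.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(x, w, b):
    """Probability of class 1 under a logistic regression model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def sign(v):
    return (v > 0) - (v < 0)

def fgsm(x, y, w, b, eps):
    """One FGSM step: x + eps * sign(gradient of the loss with respect to x).

    For the cross-entropy loss of a logistic model, that gradient is
    (p - y) * w, where p is the predicted probability of class 1.
    """
    p = predict(x, w, b)
    grad = [(p - y) * wi for wi in w]
    return [xi + eps * sign(g) for xi, g in zip(x, grad)]
```

With `w = [1.0, -2.0]`, `b = 0.0`, `x = [0.5, -0.5]` and true label `y = 1`, calling `fgsm(x, 1, w, b, 0.1)` shifts each coordinate by 0.1 so that the predicted probability of the true class decreases. PGD can be seen as repeating such a step several times while projecting back into the allowed perturbation set.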
Unsupervised Word Translation
Data Science Projects
In this second project of the Data Science Projects course, our goal was to use word embeddings to build an unsupervised word translator, as described in the paper Word Translation Without Parallel Data by Conneau et al. (Facebook AI Research). We first implemented supervised word translation using a dictionary. Then, we tried the unsupervised method on different pairs of languages using generative adversarial networks.
Online-Learning for Recommendation Systems
Data Science Projects
In this first project of the Data Science Projects course, our goal was to familiarize ourselves with data science through a small project. We chose the project on reinforcement learning methods such as LinUCB for recommendation systems. We based our work on the paper A Contextual-Bandit Approach to Personalized News Article Recommendation (Yahoo Labs), in which the authors use contextual bandits for article recommendation. After that, we tried to implement an online-learning framework in which new movies and new users can enter the system at any time.
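As an illustration of the method, here is a minimal Python sketch of LinUCB with one independent linear model per arm. It is a sketch under assumptions, not the code of the project: the class and function names are hypothetical, and the inverse of the matrix A is maintained with a rank-one (Sherman-Morrison) update instead of being recomputed, which keeps the example free of external libraries.

```python
import math

class LinUCBArm:
    """One arm with its own ridge regression model (hypothetical helper class)."""

    def __init__(self, dim, alpha=1.0):
        self.alpha = alpha
        # A starts as the identity matrix, so its inverse is the identity too.
        self.A_inv = [[float(i == j) for j in range(dim)] for i in range(dim)]
        self.b = [0.0] * dim

    def _A_inv_times(self, v):
        return [sum(a * vi for a, vi in zip(row, v)) for row in self.A_inv]

    def score(self, x):
        """Upper confidence bound: estimated reward plus an exploration bonus."""
        theta = self._A_inv_times(self.b)
        ax = self._A_inv_times(x)
        mean = sum(t * xi for t, xi in zip(theta, x))
        width = math.sqrt(max(sum(a * xi for a, xi in zip(ax, x)), 0.0))
        return mean + self.alpha * width

    def update(self, x, reward):
        """Account for A += x x^T and b += reward * x after observing a reward."""
        ax = self._A_inv_times(x)
        denom = 1.0 + sum(a * xi for a, xi in zip(ax, x))
        n = len(x)
        for i in range(n):
            for j in range(n):
                self.A_inv[i][j] -= ax[i] * ax[j] / denom
        self.b = [bi + reward * xi for bi, xi in zip(self.b, x)]

def choose(arms, x):
    """Index of the arm with the highest upper confidence bound for context x."""
    return max(range(len(arms)), key=lambda i: arms[i].score(x))
```

In this sketch, a new article or movie simply corresponds to appending a fresh `LinUCBArm` to the list: its exploration bonus is large at first, so it gets tried before its estimate settles.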
Toxic Comments Classification
Natural Language Processing
This NLP project is based on the Toxic Comment Classification Challenge from Kaggle. The goal is to train a classifier on Wikipedia comments to automatically detect toxic comments and identify which kinds of toxicity are present in each comment. We tried several NLP methods, from a simple MLP to Transformers.
Electric Motor Temperature Prediction
Optimization for Machine Learning
For this project, the goal was to apply notions and algorithms covered in the course to a dataset of our choice, on a classification or regression task. I chose the Kaggle dataset Electric Motor Temperature for a regression task. I used classic methods such as Ridge and Lasso, as well as small neural networks coded entirely by hand. I also tried adding a recurrent unit to obtain better results.
Deep Learning for Go
Deep Learning
The goal of this project was to build the best possible deep neural network for playing Go, with the constraint that it must have fewer than 1 million parameters. We used Keras for this project and competed against other students, finishing in 7th place out of 16. Click here to access the page of the competition.
From Quantized Gossip to Voting
Networks Algorithms
In this project, we presented the method for propagating the result of a vote through a network proposed in the paper Interval consensus: From quantized gossip to voting by Bénézit et al. We described the proof of their method, then implemented and tested it ourselves.
SnaKhan
Systems and Networks
In this project, we implemented Kahn process networks in OCaml and tested their efficiency with the game Snake, which we also implemented in OCaml.
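For readers unfamiliar with the model, here is a minimal sketch of a Kahn process network, written in Python with threads and not in OCaml as in the project. Processes only communicate through unbounded FIFO channels with blocking reads, which makes the output deterministic whatever the scheduling. The process names and the `DONE` end-of-stream marker are assumptions made for the example.

```python
import queue
import threading

DONE = object()  # hypothetical end-of-stream marker

def producer(out, n):
    """Process that writes 0, 1, ..., n-1 on its output channel."""
    for i in range(n):
        out.put(i)
    out.put(DONE)

def doubler(inp, out):
    """Process that doubles every value read on its input channel."""
    while True:
        value = inp.get()  # blocking read, as required by the model
        if value is DONE:
            out.put(DONE)
            return
        out.put(2 * value)

def run_network(n):
    """Wire producer -> doubler and collect the output in the main thread."""
    a, b = queue.Queue(), queue.Queue()  # unbounded FIFO channels
    threads = [
        threading.Thread(target=producer, args=(a, n)),
        threading.Thread(target=doubler, args=(a, b)),
    ]
    for t in threads:
        t.start()
    result = []
    while True:
        value = b.get()
        if value is DONE:
            break
        result.append(value)
    for t in threads:
        t.join()
    return result
```

Calling `run_network(5)` returns `[0, 2, 4, 6, 8]` on every run, since each channel has a single writer and a single reader and preserves the order of the values.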
Mini-Rust Compiler
Compilation
The goal of this project was to build a compiler for a fragment of Rust called Mini-Rust, written in OCaml. I wrote the lexer, the parser, the AST and the compiler. I learnt a lot during this project and, in the end, we obtained a 96% success rate on the test cases.
A clock microprocessor
Digital System
The goal of this project was to build a microprocessor for RISC-V in OCaml and to write a clock program for RISC-V to test the microprocessor.