## Projects and Internships

## A knowledge base of mathematical results

### Internship

The basic unit of information used by researchers in theoretical fields is the mathematical result. We aim to build a knowledge base of these results, using information extraction techniques on scholarly documents. We present an algorithm which extracts mathematical results, and references to mathematical results, from scientific papers, using their PDF or LaTeX sources. We analyse the output of our algorithm on the whole arXiv database of scientific papers and explore the resulting graph of mathematical results, which contains more than 6 million results and 4.5 million edges. We also present attempts to link theorems across papers using a TF-IDF vectorizer or an autoencoder.
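The TF-IDF linking step can be sketched as follows. This is a toy pure-Python version (the theorem statements and tokenization are illustrative, not the project's actual pipeline): statements are embedded as sparse TF-IDF vectors and compared by cosine similarity, so near-duplicate theorems score high.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Compute TF-IDF vectors (as sparse dicts) for tokenized documents."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))                       # document frequency
    idf = {t: math.log(n / df[t]) for t in df}
    return [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

theorems = [
    "every bounded monotone sequence converges".split(),
    "a bounded monotone real sequence converges".split(),
    "the halting problem is undecidable".split(),
]
vecs = tfidf_vectors(theorems)
# The two statements about monotone sequences are the closest pair.
```

Pairs whose similarity exceeds a threshold become candidate links between theorems of different papers.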

**Advisor:** *Pr. Pierre Senellart.*

**Context:** M2 IASD · **Duration:** 5 months · **Team size:** 1

Slides · Report · Code

## Representing Uncertain Databases as World-Set Decomposition

### Data Wrangling

This work is based on the paper *10^(10^6) Worlds and Beyond: Efficient Representation and Processing of Incomplete Information* by Lyublena Antova, Christoph Koch, and Dan Olteanu, in which the authors present an efficient way to represent incomplete information in databases, using World-Set Decompositions (WSDs). Since this model is a strong representation system for every query language, the result of any query on a WSD can itself be expressed as a WSD. In this project, I implemented algorithms to answer relational algebra queries on WSDs. Finally, I tested my algorithms on a dataset with real incompleteness, taken from the Paris Open Data website and containing information about all the trees in Paris.
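The core idea can be sketched in a few lines of Python. This toy version stores each independent component as a list of alternatives, so the world set is their Cartesian product; real WSDs decompose sets of tuples, and the project handled full relational algebra, but selection already shows the closure property:

```python
from itertools import product

# Toy World-Set Decomposition: each component independently lists its
# possible local choices. k components with m alternatives each encode
# m**k worlds in O(k*m) space.
wsd = [
    [{"name": "Alice"}],                      # certain component
    [{"city": "Paris"}, {"city": "Lyon"}],    # uncertain city
    [{"age": 30}, {"age": 31}, {"age": 32}],  # uncertain age
]

def worlds(wsd):
    """Enumerate all possible worlds encoded by a WSD."""
    for choice in product(*wsd):
        world = {}
        for part in choice:
            world.update(part)
        yield world

def select(wsd, field, value):
    """Selection on a WSD: filter alternatives componentwise.
    The result is again a WSD (strong representation system)."""
    return [[alt for alt in comp if alt.get(field, value) == value]
            for comp in wsd]

all_worlds = list(worlds(wsd))                         # 1 * 2 * 3 = 6 worlds
paris_only = list(worlds(select(wsd, "city", "Paris")))
```

Selection never needs to materialize the exponentially many worlds; it acts on each component separately.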

**Context:** M2 IASD · **Duration:** 3 weeks · **Team size:** 1

Slides · Report · Code

## Adaptive Heuristics

### Game Theory

This work is based on the paper *Adaptive Heuristics* by Sergiu Hart, in which the author presents the concept of adaptive heuristics and strategies such as Regret Matching and Generalized Regret Matching. I wrote a 10-page report on the paper, gave a 30-minute presentation including a proof of the main theorem, and implemented the main strategies presented in the paper.
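Regret Matching is short enough to sketch here. The following is an illustrative self-play run on rock-paper-scissors (a game chosen for this sketch, not necessarily the one used in the project): each player plays actions with probability proportional to positive cumulative regret, and the empirical play approaches the set of correlated equilibria, here the uniform Nash equilibrium.

```python
import random

ACTIONS = 3  # rock, paper, scissors

def payoff(a, b):
    """+1 win, 0 tie, -1 loss for the row player."""
    return [[0, -1, 1], [1, 0, -1], [-1, 1, 0]][a][b]

def strategy(regret):
    """Play proportionally to positive regrets; uniform if none."""
    pos = [max(r, 0.0) for r in regret]
    s = sum(pos)
    return [p / s for p in pos] if s > 0 else [1.0 / ACTIONS] * ACTIONS

def sample(probs, rng):
    r, acc = rng.random(), 0.0
    for a, p in enumerate(probs):
        acc += p
        if r < acc:
            return a
    return ACTIONS - 1

rng = random.Random(0)
regret = [[0.0] * ACTIONS, [0.0] * ACTIONS]
counts = [[0] * ACTIONS, [0] * ACTIONS]
T = 20000
for _ in range(T):
    acts = [sample(strategy(regret[i]), rng) for i in range(2)]
    for i in range(2):
        me, other = acts[i], acts[1 - i]
        counts[i][me] += 1
        for a in range(ACTIONS):
            # regret of not having played a instead of the chosen action
            regret[i][a] += payoff(a, other) - payoff(me, other)

freqs = [c / T for c in counts[0]]  # empirical play of player 0
```

After many rounds each action's empirical frequency is close to 1/3, the unique equilibrium of the game.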

**Context:** M2 IASD · **Duration:** 2 weeks · **Team size:** 1

Slides · Report · Code

## Robust Adversarial Networks for Image Detection

### Data Science Projects

In this third project of the *Data Science Projects* course, our goal was to build neural networks for image classification that are robust to classic adversarial attacks, such as FGSM (introduced in *Explaining and Harnessing Adversarial Examples* by Goodfellow et al.) and PGD. We first built a basic neural network and tested the different attacks on it. Then, we tried to build the network most robust to these attacks. We obtained very good results when training a network with a contrastive loss on adversarial examples.
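The FGSM attack itself is a one-liner: perturb the input by epsilon times the sign of the loss gradient with respect to the input. Here is a hedged sketch on a toy logistic-regression "network" in pure Python (the weights are made up; the project attacked real convolutional networks):

```python
import math

w = [2.0, -3.0, 1.0]   # fixed model weights (assumed already trained)
b = 0.5

def predict(x):
    """Sigmoid output of a logistic-regression model."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

def fgsm(x, y, eps):
    """One FGSM step against binary cross-entropy loss.
    For logistic regression, dL/dx_i = (p - y) * w_i."""
    p = predict(x)
    grad = [(p - y) * wi for wi in w]
    sign = lambda g: (g > 0) - (g < 0)
    return [xi + eps * sign(gi) for xi, gi in zip(x, grad)]

x, y = [0.3, -0.2, 0.1], 1                    # clean example, true label 1
x_adv = fgsm(x, y, eps=0.5)
p_clean, p_adv = predict(x), predict(x_adv)   # the attack lowers p
```

Even this tiny perturbation flips the prediction, which is exactly the failure mode adversarial training tries to close.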

**Context:** M2 IASD · **Duration:** 3 weeks · **Team size:** 3

Slides · Report · Private repo

## Unsupervised Word Translation

### Data Science Projects

In this second project of the *Data Science Projects* course, our goal was to use word embeddings to build an unsupervised word translator, as described in the paper *Word Translation Without Parallel Data* by Conneau et al. (Facebook AI Research). We first implemented supervised word translation using a dictionary. Then, we tried the unsupervised method on different pairs of languages using generative adversarial networks.
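The supervised step has a classic closed-form solution via orthogonal Procrustes: given aligned embedding pairs from a seed dictionary, the orthogonal map W minimizing ||WX - Y|| is U·Vt where U, _, Vt = svd(Y·Xᵀ). A sketch with synthetic data, assuming numpy (the unsupervised variant learns W without any dictionary, via a GAN):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 5, 50
X = rng.normal(size=(d, n))                        # source embeddings as columns
R_true, _ = np.linalg.qr(rng.normal(size=(d, d)))  # hidden orthogonal "translation"
Y = R_true @ X                                     # target embeddings

# Orthogonal Procrustes: best orthogonal W for min ||W X - Y||_F
U, _, Vt = np.linalg.svd(Y @ X.T)
W = U @ Vt

err = float(np.linalg.norm(W @ X - Y))             # ~0: the map is recovered
```

On real embeddings the recovery is only approximate, but the same formula is what refines the GAN-learned map in the paper's pipeline.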

**Context:** M2 IASD · **Duration:** 3 weeks · **Team size:** 3

Slides · Report · Private repo

## Online-Learning for Recommendation Systems

### Data Science Projects

In this first project of the *Data Science Projects* course, our goal was to familiarize ourselves with data science through a small project. We chose the project on reinforcement learning methods, such as LinUCB, for recommendation systems. We based our work on the paper *A Contextual-Bandit Approach to Personalized News Article Recommendation* (Yahoo! Labs), which uses contextual bandits for article recommendation. We then tried to implement an online-learning setting in which new movies and new users can enter the system at any time.
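LinUCB keeps a per-arm ridge-regression estimate of the reward and picks the arm maximizing estimated reward plus an exploration bonus. A minimal sketch on a synthetic environment, assuming numpy (the project used real recommendation data, not this toy setup):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_arms, alpha = 4, 3, 1.0
theta = rng.normal(size=(n_arms, d))    # hidden per-arm reward parameters

A = [np.eye(d) for _ in range(n_arms)]  # per-arm regularized design matrices
b = [np.zeros(d) for _ in range(n_arms)]

picks, correct_late = [], 0
T = 2000
for t in range(T):
    x = rng.normal(size=d)              # context (user/item features)
    ucb = []
    for a in range(n_arms):
        A_inv = np.linalg.inv(A[a])
        theta_hat = A_inv @ b[a]        # ridge estimate for arm a
        ucb.append(theta_hat @ x + alpha * np.sqrt(x @ A_inv @ x))
    arm = int(np.argmax(ucb))
    reward = float(theta[arm] @ x) + 0.1 * rng.normal()
    A[arm] += np.outer(x, x)            # online ridge update
    b[arm] += reward * x
    picks.append(arm)
    if t >= 1500 and arm == int(np.argmax(theta @ x)):
        correct_late += 1               # chose the truly best arm late on
```

Because the update is a rank-one matrix addition, new arms (new movies) can be added at any time with a fresh identity matrix, which is what made the online-learning extension natural.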

**Context:** M2 IASD · **Duration:** 3 weeks · **Team size:** 3

Slides · Notebook

## Toxic Comments Classification

### Natural Language Processing

This NLP project is based on the Toxic Comment Classification Challenge from Kaggle. The goal is to train a classifier on Wikipedia comments to automatically detect toxic comments and identify which kind(s) of toxicity each comment contains. We tried several NLP methods, from a simple MLP to Transformers.

**Context:** M2 IASD · **Duration:** 1 month · **Team size:** 3

Report · Code

## Electric Motor Temperature Prediction

### Optimization for Machine Learning

For this project, the goal was to apply the notions and algorithms covered by the course to a dataset of our choice, on a classification or regression task. I chose the Kaggle dataset *Electric Motor Temperature* for a regression task. I used classic methods such as Ridge and Lasso regression, as well as small neural networks coded entirely by hand. I also tried adding a recurrent unit to obtain better results.
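Hand-coding Ridge means writing the regularized gradient step yourself. A minimal pure-Python sketch on toy data (not the motor dataset), minimizing mean squared error plus an L2 penalty by gradient descent:

```python
def ridge_gd(X, y, lam=0.1, lr=0.01, steps=5000):
    """Ridge regression by full-batch gradient descent.
    Loss: (1/n) * sum((w.x - y)^2) + lam * ||w||^2."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(steps):
        grad = [0.0] * d
        for xi, yi in zip(X, y):
            err = sum(wj * xj for wj, xj in zip(w, xi)) - yi
            for j in range(d):
                grad[j] += 2 * err * xi[j] / n      # MSE gradient
        for j in range(d):
            w[j] -= lr * (grad[j] + 2 * lam * w[j]) # + L2 penalty gradient
    return w

# Data generated from y = 3*x1 - 2*x2; regularization shrinks the weights.
X = [[1, 0], [0, 1], [1, 1], [2, 1], [1, 2]]
y = [3, -2, 1, 4, -1]
w = ridge_gd(X, y)   # converges to the closed-form solution [2.48, -1.52]
```

The closed-form check, (XᵀX/n + λI)w = Xᵀy/n, gives exactly (2.48, -1.52) here, so the descent can be verified against it.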

**Context:** M2 IASD · **Duration:** 1 week · **Team size:** 1

Notebook

## Deep Learning for Go

### Deep Learning

The goal of this project was to build the best possible deep neural network for playing Go, under the constraint of fewer than 1 million parameters. We used *Keras* for this project and competed against the other students, finishing 7th out of 16. *Click here to access the competition page.*
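Staying under the budget is mostly an exercise in counting parameters before training. A quick hand count for a small residual convolutional policy network (the layer sizes below are illustrative, not our submitted architecture):

```python
def conv_params(c_in, c_out, k):
    """Weights + biases of a k x k convolution."""
    return c_in * c_out * k * k + c_out

def count(blocks=6, channels=64, board=19):
    total = conv_params(3, channels, 3)                        # input conv (3 planes)
    total += 2 * blocks * conv_params(channels, channels, 3)   # residual tower
    total += conv_params(channels, 2, 1)                       # policy head 1x1 conv
    total += 2 * board * board * (board * board + 1)           # policy dense layer
    return total

n_params = count()   # 706,422: comfortably under the 1M budget
```

Counts like this show why the tower's 3x3 convolutions and the final dense layer dominate the budget, which drives the choice of channel width and number of blocks.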

**Context:** M2 IASD · **Duration:** 1 month · **Team size:** 2

Report

## From Quantized Gossip to Voting

### Networks Algorithms

In this project, we presented the method for propagating the result of a vote in a network proposed in the paper *Interval consensus: From quantized gossip to voting* by Bénézit et al. We walked through the proof of their method, then implemented and tested it ourselves.
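The key invariant can be shown with a simpler cousin of the paper's algorithm (this sketch is not the interval-consensus protocol itself): nodes hold integer values, and when two nodes meet they re-split their total as evenly as integers allow. The sum, and hence the vote outcome, is preserved while individual values converge to within one unit of each other.

```python
import random

rng = random.Random(0)
values = [5] * 3 + [0] * 9     # 3 nodes start with 5 tokens, 9 with none
total = sum(values)            # 15 tokens across a 12-node network

for _ in range(20000):
    i, j = rng.sample(range(len(values)), 2)    # a random pairwise meeting
    s = values[i] + values[j]
    values[i], values[j] = s // 2, s - s // 2   # even integer re-split

# sum still 15; all values are now 1 or 2, revealing the average > 1
```

The real protocol adds the extra quantization states needed so every node can locally read off the majority, which is the part the proof we presented establishes.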

**Context:** M1 MPRI · **Duration:** 1 week · **Team size:** 2

Slides · Code

## SnaKhan

### Systems and Networks

In this project, we implemented Kahn process networks in *OCaml* and tested their efficiency with a Snake game, also implemented in *OCaml*.
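The idea behind Kahn process networks fits in a short sketch (in Python here, rather than the project's OCaml): processes communicate only through FIFO channels with blocking reads, which makes the network's output deterministic regardless of thread scheduling.

```python
import threading
from queue import Queue

def producer(out, n):
    for i in range(n):
        out.put(i)                   # write to a FIFO channel

def doubler(inp, out, n):
    for _ in range(n):
        out.put(2 * inp.get())       # blocking read, then write

def collector(inp, n, result):
    for _ in range(n):
        result.append(inp.get())

n = 5
c1, c2, result = Queue(), Queue(), []
threads = [
    threading.Thread(target=producer, args=(c1, n)),
    threading.Thread(target=doubler, args=(c1, c2, n)),
    threading.Thread(target=collector, args=(c2, n, result)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
# result is [0, 2, 4, 6, 8] under any scheduling
```

Determinism under arbitrary interleaving is the property that makes KPNs a clean model for the concurrent game logic we built on top.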

**Context:** L3 · **Duration:** 2 weeks · **Team size:** 2

Slides · Private repo

## Mini-Rust Compiler

### Compilation

The goal of this project was to build a compiler, written in *OCaml*, for a fragment of Rust called *Mini-Rust*. I wrote the lexer, the parser, the AST, and the code generation. I learnt a lot during this project, and in the end we obtained a 96% success rate on the test cases.
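The first stage, lexing, can be illustrated with a miniature tokenizer for a Rust-like fragment (a Python sketch with an invented token set; the actual compiler was written in OCaml with its own lexer generator conventions):

```python
import re

# Ordered token spec: keywords must be tried before identifiers.
TOKEN_SPEC = [
    ("KEYWORD", r"\b(?:fn|let|mut|if|else|while|return)\b"),
    ("IDENT",   r"[A-Za-z_][A-Za-z0-9_]*"),
    ("INT",     r"\d+"),
    ("OP",      r"->|==|[+\-*/=<>!&{}();:,]"),
    ("SKIP",    r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{n}>{p})" for n, p in TOKEN_SPEC))

def lex(src):
    """Turn source text into (kind, lexeme) pairs, skipping whitespace."""
    tokens = []
    for m in MASTER.finditer(src):
        if m.lastgroup != "SKIP":
            tokens.append((m.lastgroup, m.group()))
    return tokens

tokens = lex("fn main() { let x = 41 + 1; }")
```

The parser then consumes this token stream to build the AST, and code generation walks the AST to emit assembly.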

**Context:** L3 · **Duration:** 2 months · **Team size:** 2

Slides · Private repo

## A clock microprocessor

### Digital System

The goal of this project was to build a *RISC-V* microprocessor in *OCaml*, and to write RISC-V code implementing a clock in order to test the microprocessor.

**Context:** L3 · **Duration:** 2 months · **Team size:** 4

Slides · Private repo

## Paper presentations

#### On the Representation and Querying of Sets of Possible Worlds

**Context:** Data Wrangling (M2) · **Duration:** 2 weeks · **Team size:** 1

Report

#### Rationalizing Neural Predictions

**Context:** Natural Language Processing (M2) · **Duration:** 1 week · **Team size:** 3

Slides

#### Pinpointing in the Description Logic EL

**Context:** Knowledge Graphs, Description Logics and Reasoning on Data (M2) · **Duration:** 1 week · **Team size:** 1

Slides