Projects

Research unit affiliated with the CNRS

Information about the project's scientific lead:

First name

Last name

Email

Project name

Publishable summary of the project (10 lines max.)

Thematic areas addressed (comma-separated)

Type

Describe the innovations expected at the end of the project (5 lines max.)

Scientific rationale describing the project, the state of the art, prior work, and the expected results (20 lines max.)

The reconstruction of set-theoretic types for dynamic languages can be prohibitively time-consuming. Our current prototype, [CDuce DynLang](https://www.cduce.org/dynlang/), takes an untyped program written in an "idealized" dynamic language, reconstructs an annotation tree for the program, and checks its typing. Although checking whether an annotation tree makes a program well-typed is extremely fast, the reconstruction of the annotation tree can be very slow.

The goal of this project is to speed up type inference by fine-tuning a Large Language Model (LLM) to predict a (possibly partial) annotation tree for a program. The type-checker then verifies the predicted annotation tree and falls back on the slower reconstruction only if the check fails.
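The check-then-fall-back control flow can be sketched as follows. This is only an illustration under stated assumptions, not the prototype's actual interface: `predict_annotations`, `check`, and `reconstruct` are hypothetical callables standing in for the fine-tuned LLM, the fast type-checker, and the slow reconstruction, respectively.

```python
from typing import Callable, Optional, Tuple, TypeVar

Program = TypeVar("Program")
Annotation = TypeVar("Annotation")


def infer_with_llm_hint(
    program: Program,
    predict_annotations: Callable[[Program], Optional[Annotation]],  # hypothetical: LLM
    check: Callable[[Program, Annotation], bool],  # hypothetical: fast type-checker
    reconstruct: Callable[[Program], Annotation],  # hypothetical: slow reconstruction
) -> Tuple[Annotation, bool]:
    """Return (annotation tree, used_fallback)."""
    # Ask the model for a candidate annotation tree (it may decline with None).
    candidate = predict_annotations(program)
    # Accept the candidate only if the type-checker validates it.
    if candidate is not None and check(program, candidate):
        return candidate, False
    # Otherwise pay for the slow reconstruction.
    return reconstruct(program), True
```

Because every predicted tree goes through the type-checker before being accepted, a wrong prediction in this scheme costs time but cannot make an ill-typed program pass.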

The project will be developed in the following phases:

1. Generate a large number of random well-formed programs to create a diverse dataset.

2. Run the type inference on the generated programs to produce a dataset of program-annotation tree pairs.

3. Clean up the annotations in the dataset, keeping only the most relevant information to improve the LLM's performance. This is a key research step, since the goal is to determine which information in an annotation is relevant. In particular, we must strike a balance: the annotations must carry enough information for the subsequent partial checking/inference to run efficiently, but not so much that they over-constrain the subsequent inference and cause it to miss important solutions.

4. Split the dataset into training and testing subsets to evaluate the LLM's accuracy and generalization capabilities.

5. Fine-tune an LLM using the computational resources of the LIP6 cluster.
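As an illustration of the train/test split in phase 4, here is a minimal sketch. It assumes, for the sake of the example, that each dataset entry is a pair of strings (program source, serialized annotation tree); the function name `split_dataset` and the hash-based assignment are choices made here, not part of the project description. Assigning each program to a subset by hashing its source keeps the split reproducible and ensures that duplicate programs, which random generation may produce, never appear on both sides.

```python
import hashlib
from typing import Iterable, List, Tuple

# Assumed representation: (program source, serialized annotation tree).
Pair = Tuple[str, str]


def split_dataset(
    pairs: Iterable[Pair], test_fraction: float = 0.1
) -> Tuple[List[Pair], List[Pair]]:
    """Deterministically split program/annotation pairs into (train, test)."""
    # Integer threshold in [0, 2**64]; avoids float rounding at the extremes.
    threshold = int(test_fraction * 2**64)
    train: List[Pair] = []
    test: List[Pair] = []
    for program, annotation in pairs:
        # The bucket depends only on the program text, so identical
        # programs always land in the same subset.
        digest = hashlib.sha256(program.encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:8], "big")
        (test if bucket < threshold else train).append((program, annotation))
    return train, test
```

The realized test share is only approximately `test_fraction`, since each program is assigned independently of the others.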

By integrating the fine-tuned LLM into the type inference process, we aim to significantly reduce the time required for reconstructing set-theoretic types in dynamic languages while maintaining a high level of accuracy.

Estimate the engineering time you need for your project:

What specific engineering skills are expected, in particular the languages or frameworks to be used in the project, existing tools, and required knowledge?

Describe the activities that will be assigned to the engineer(s) (if possible, with a schedule of expected tasks)
