matthieuvion

Former data scientist @Linkfluence (Radarly). NLP, tabular data, time series, APIs, but curious about anything, really.

Current projects : NLP, LLM fine-tuning / deployment optimization :

lmd_classi : 3-class classifier on pro-Russian comments. From (small) annotated data + baseline w/ few-shot model (SetFit) to synthetic data generation, LLM (Mistral-7B / Llama3-8B) fine-tuning and multi-e5-base classifier w/ quantization for deployment (FastAPI/Docker/GCP). Full guide, resources and benchmarks available as organized notebooks & code (Kaggle + Hugging Face + WandB).
lmd_viz : Ukraine War (1st year) : comments as a proxy for people's engagement. Viz + a 200k-comment dataset crafted from Le Monde w/ my own API (usable "as is").

Cool stuff :

wzkd app & match2kd : reverse engineering the matchmaking score from players' features only, using XGB & a Streamlit app.
wzlight : light wrapper around the COD WZ API (discontinued since). Also available on PyPI.

My favorite AI/ML newsletter (you should read it too!) :

Data Machina : not mine, not sponsored. Technical but simple enough, and it existed way before the LLM hype.
