
A Python tool for text classification based on the author's writing style.


cmantoux/authorship-attribution


Automatic Authorship Attribution using Statistics

This repository hosts the code for a school project on authorship attribution, by Guillaume Dalle, Jasmine Gamblin, Maxime Godin, Clément Mantoux, Wang Sun and Lucile Vigué. We designed software to answer the question: "Given a set of texts with known authors, can we infer the author of a new document?" The original motivation was to apply our method to a literary controversy over the authorship of Alexandre Dumas' Les Trois Mousquetaires.

We compared a wide variety of statistical methods, each combined with a preliminary feature extraction step. We proposed a standardized pipeline to evaluate, interpret and visually represent the output of the classification algorithms. We applied our method to several problems, such as classifying novels for children versus novels for adults, authorship attribution on a corpus of French naturalist writers, and distinguishing truths from lies on a data set we gathered ourselves.
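To make the "feature extraction, then classification, then evaluation" idea concrete, here is a minimal sketch. It assumes the style features are relative frequencies of a small, hand-picked list of French function words, uses a nearest-centroid classifier, and scores it with leave-one-out cross-validation. The names `FUNCTION_WORDS`, `extract_features`, `train`, `predict` and `leave_one_out_accuracy` are hypothetical and do not come from this repository; the features and methods the project actually compared are described in its report.

```python
import math
import re
from collections import Counter

# Hypothetical style markers: frequent French function words (an assumption,
# not the project's actual feature set).
FUNCTION_WORDS = ["le", "la", "les", "de", "des", "et", "à", "un", "une",
                  "que", "qui", "il", "elle", "ne", "pas", "en", "dans", "pour"]


def tokenize(text):
    """Lowercase the text and keep runs of Unicode letters (accents included)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def extract_features(text):
    """Relative frequency of each function word in the text."""
    tokens = tokenize(text)
    n = len(tokens) or 1  # avoid dividing by zero on empty input
    counts = Counter(tokens)
    return [counts[w] / n for w in FUNCTION_WORDS]


def train(corpus):
    """corpus: list of (text, author). Returns one mean feature vector per author."""
    by_author = {}
    for text, author in corpus:
        by_author.setdefault(author, []).append(extract_features(text))
    return {author: [sum(col) / len(vecs) for col in zip(*vecs)]
            for author, vecs in by_author.items()}


def predict(model, text):
    """Attribute the text to the author whose centroid is closest (Euclidean)."""
    x = extract_features(text)
    return min(model, key=lambda author: math.dist(x, model[author]))


def leave_one_out_accuracy(corpus):
    """Hold out each text in turn, train on the rest, and count correct attributions."""
    correct = 0
    for i, (text, author) in enumerate(corpus):
        model = train(corpus[:i] + corpus[i + 1:])
        correct += predict(model, text) == author
    return correct / len(corpus)


if __name__ == "__main__":
    # Toy corpus, invented purely for illustration.
    corpus = [
        ("le chat et le chien et le merle", "A"),
        ("le soleil et le vent et le ciel", "A"),
        ("une part de gâteau pour manger de suite pour moi", "B"),
        ("une tasse de thé pour boire de lait pour toi", "B"),
    ]
    model = train(corpus)
    print(predict(model, "le jardin et le bois"))          # → A
    print(predict(model, "une coupe de vin pour partir"))  # → B
    print(leave_one_out_accuracy(corpus))                  # → 1.0
```

Nearest centroid is only one possible classifier. A more realistic pipeline would typically standardize the features and compare several statistical methods on the same held-out splits, which is the kind of comparison described above.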

The final report for this PSC (Projet Scientifique Collectif, French for Collective Scientific Project) can be found here.
