TALC-sef is a pos-TAgged Literary Corpus, in Serbian, English and French developed at Université d'Artois and Université Lille 3.

abalvet/TALC-sef


TALC-sef

TALC-sef is a pos-TAgged Literary Corpus, in Serbian, English and French developed at Université d'Artois and Université Lille 3.

In this corpus, more than 830,000 Serbian tokens were tagged with BTagger (Gesmundo & Samardzic, 2012), based on a reference corpus of more than 100,000 manually revised tokens. With our ad hoc tagset (43 tags) and without lemmatization, tagging accuracy averages over 94%.
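For readers who want to check a tagging model against manually revised data, token-level accuracy can be computed as sketched below. This is only an illustration, not the evaluation procedure used for TALC-sef: the function name `tagging_accuracy` and the example tags are hypothetical, and the sketch assumes gold and predicted tags are already aligned token by token.

```python
from typing import Sequence


def tagging_accuracy(gold: Sequence[str], predicted: Sequence[str]) -> float:
    """Return the share of tokens whose predicted tag equals the gold tag."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same number of tokens")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty sequence")
    # Count positions where the two tag sequences agree.
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)


# Hypothetical tags, for illustration only (not the TALC-sef tagset).
gold_tags = ["N", "V", "ADJ", "N", "PREP"]
predicted_tags = ["N", "V", "N", "N", "PREP"]

accuracy = tagging_accuracy(gold_tags, predicted_tags)
print(f"{accuracy:.2%}")  # 4 of 5 tags match
```

An average over several held-out portions of the reference corpus could then be obtained by calling the function once per portion and averaging the results.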

The corpus is described in our paper: http://www.lrec-conf.org/proceedings/lrec2014/summaries/755.html.

Should you use the tagging models provided, or any other file from the TALC-sef project, please cite:

@InProceedings{BALVET14.755,
  author = {Antonio Balvet and Dejan Stosic and Aleksandra Miletic},
  title = {TALC-Sef a Manually-revised POS-Tagged Literary Corpus in Serbian, English and French},
  booktitle = {Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)},
  year = {2014},
  month = {may},
  date = {26-31},
  address = {Reykjavik, Iceland},
  editor = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Thierry Declerck and Hrafn Loftsson and Bente Maegaard and Joseph Mariani and Asuncion Moreno and Jan Odijk and Stelios Piperidis},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {978-2-9517408-8-4},
  language = {english}
}


A. Balvet, D. Stosic and A. Miletic.
