Graph-based and Transition-based dependency parsers based on BiLSTMs

wddabc/bist-parser
A PyTorch implementation of the BIST Parsers (graph-based parser only)

More precisely, this implementation is a line-by-line translation of the DyNet implementation, which can be found here. The techniques behind the parser are described in the paper Simple and Accurate Dependency Parsing Using Bidirectional LSTM Feature Representations.
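The paper's core idea is to represent each word by a BiLSTM vector and score every candidate (head, modifier) arc from those vectors. The sketch below is purely illustrative and not the repository's code: it fakes the BiLSTM outputs with fixed vectors, scores arcs with a dot product instead of the trained MLP, and picks a greedy argmax head per word, whereas the real graph-based parser decodes with a maximum-spanning-tree algorithm (greedy selection can, as here, produce cycles, which is exactly why tree decoding is needed).

```python
# Toy sketch of arc-factored scoring (hypothetical, not the repo's model).

def score_arc(head_vec, mod_vec):
    """Toy arc scorer: dot product of head and modifier vectors.
    The real parser scores arcs with a trained MLP over BiLSTM features."""
    return sum(h * m for h, m in zip(head_vec, mod_vec))

def greedy_heads(vectors):
    """For each word (index >= 1), pick the highest-scoring head (index 0 is ROOT).
    Illustration only: greedy choice may yield cycles, so the real parser
    decodes a maximum spanning tree over the score matrix instead."""
    heads = []
    for m in range(1, len(vectors)):
        candidates = [(score_arc(vectors[h], vectors[m]), h)
                      for h in range(len(vectors)) if h != m]
        heads.append(max(candidates)[1])
    return heads

# ROOT plus three words, each with a made-up 3-dimensional "BiLSTM" vector.
vectors = [
    [1.0, 0.0, 0.0],   # ROOT
    [0.2, 0.9, 0.1],   # word 1
    [0.1, 0.8, 0.3],   # word 2
    [0.9, 0.1, 0.2],   # word 3
]
print(greedy_heads(vectors))  # words 1 and 2 pick each other: a cycle
```

Note how words 1 and 2 select each other as heads, an invalid tree; MST decoding over the same score matrix would resolve this.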

Required software

Data format:

The software requires a training.conll and a development.conll file formatted according to the CoNLL data format, or a training.conllu and a development.conllu file formatted according to the CoNLL-U data format.
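For reference, each non-blank line of a CoNLL-format file has ten tab-separated columns (ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, PHEAD, PDEPREL), and blank lines separate sentences. A minimal reader sketch (this helper is hypothetical, not part of the repository):

```python
# Minimal CoNLL reader sketch (illustrative helper, not the repo's own loader).

def read_conll(lines):
    """Yield sentences as lists of (id, form, postag, head, deprel) tuples."""
    sentence = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:                      # blank line ends a sentence
            if sentence:
                yield sentence
                sentence = []
            continue
        cols = line.split("\t")           # the ten CoNLL columns
        sentence.append((int(cols[0]), cols[1], cols[4],
                         int(cols[6]), cols[7]))
    if sentence:                          # file may lack a trailing blank line
        yield sentence

sample = [
    "1\tMary\tmary\tNOUN\tNN\t_\t2\tnsubj\t_\t_",
    "2\truns\trun\tVERB\tVBZ\t_\t0\troot\t_\t_",
    "",
]
for sent in read_conll(sample):
    print(sent)  # [(1, 'Mary', 'NN', 2, 'nsubj'), (2, 'runs', 'VBZ', 0, 'root')]
```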

Train a parsing model

python src/parser.py --outdir [results directory] --train data/en-universal-train.conll --dev data/en-universal-dev.conll --epochs 30 --lstmdims 125 --bibi-lstm

Parse data with your parsing model

The command for parsing a test.conll file formatted according to the CoNLL data format with a previously trained model is:

python src/parser.py --predict --outdir [results directory] --test data/en-universal-test.conll --model [trained model file] --params [param file generated during training]

The parser will store the resulting CoNLL file in the output directory (--outdir).
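To check the output against a gold file, a common metric is the unlabeled attachment score (UAS): the fraction of tokens whose predicted HEAD (column 7) matches the gold HEAD. The helper below is a hypothetical sketch, not the evaluation script shipped with the parser, and it assumes the two files have identical tokenization:

```python
# UAS sketch (hypothetical helper; the repo ships its own evaluation script).

def uas(gold_lines, pred_lines):
    """Fraction of tokens whose predicted head matches the gold head."""
    correct = total = 0
    for g, p in zip(gold_lines, pred_lines):
        g, p = g.rstrip("\n"), p.rstrip("\n")
        if not g:                          # blank line: sentence boundary
            continue
        total += 1
        if g.split("\t")[6] == p.split("\t")[6]:   # compare HEAD columns
            correct += 1
    return correct / total

gold = ["1\tMary\t_\t_\t_\t_\t2\tnsubj\t_\t_",
        "2\truns\t_\t_\t_\t_\t0\troot\t_\t_",
        ""]
pred = ["1\tMary\t_\t_\t_\t_\t0\tnsubj\t_\t_",   # wrong head for "Mary"
        "2\truns\t_\t_\t_\t_\t0\troot\t_\t_",
        ""]
print(uas(gold, pred))  # 0.5: one of two heads is correct
```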

Differences from the DyNet implementation

  1. The multiple-roots check in the evaluation script is turned off (see here), since this version might generate trees with multiple roots. (See the discussion here.)
  2. This version does not yet support deep LSTMs as the DyNet version does, which means --lstmlayer can be no larger than 1.
