FileNotFoundError: data path not found: data/conll2003/resource/label.vocab #28

Open
ScottLiao920 opened this issue Jul 3, 2019 · 8 comments

Comments

@ScottLiao920

I cannot find this file. Also, what is run_embed.py used for?

@stevezheng23
Owner

stevezheng23 commented Jul 3, 2019

@ScottLiao920 This file, along with the train/dev/test data, has to be prepared for each NER task.

For the CoNLL2003 NER task, you can use the following labels:
<pad> O X <cls> <sep> B-LOC B-MISC B-ORG B-PER I-LOC I-MISC I-ORG I-PER
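
If the file is missing, one way to create it is a short script like the one below. This is only a minimal sketch: the one-label-per-line layout is an assumption (the comment above lists the labels but does not spell out the file layout), and `write_label_vocab` is a hypothetical helper, not part of this repository.

```python
from pathlib import Path

# Label inventory quoted in the comment above.
LABELS = [
    "<pad>", "O", "X", "<cls>", "<sep>",
    "B-LOC", "B-MISC", "B-ORG", "B-PER",
    "I-LOC", "I-MISC", "I-ORG", "I-PER",
]


def write_label_vocab(path, labels=LABELS):
    """Write labels to `path`, one per line (assumed layout).

    Parent directories are created if they do not exist yet.
    """
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text("\n".join(labels) + "\n", encoding="utf-8")
    return path
```

Calling `write_label_vocab("data/conll2003/resource/label.vocab")` would create the file at the path named in the error message.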

@stevezheng23
Owner

As for run_embed.py, it is used to export a saved model for sentence-level/token-level embedding generation.

@ScottLiao920
Author

Does this mean that, for the CoNLL2003 NER task, each line of label.vocab corresponds to a line of the training data file?

@stevezheng23
Owner

No, label.vocab only holds the CoNLL2003 NER tag list: <pad> O X <cls> <sep> B-LOC B-MISC B-ORG B-PER I-LOC I-MISC I-ORG I-PER. It doesn't correspond to lines in the training data file.
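
To illustrate the difference, here is a rough sketch of how a tag list like this is typically used: as a lookup table from tag to integer id, independent of how many lines the training data has. The id assignment by position and the `encode_labels` helper are assumptions made for illustration, not this repository's actual code.

```python
# Tag inventory from the comment above.
LABELS = [
    "<pad>", "O", "X", "<cls>", "<sep>",
    "B-LOC", "B-MISC", "B-ORG", "B-PER",
    "I-LOC", "I-MISC", "I-ORG", "I-PER",
]

# Assumed convention: a tag's id is its position in the vocabulary.
label2id = {label: idx for idx, label in enumerate(LABELS)}


def encode_labels(tags):
    """Convert one sentence's tag sequence to ids; unknown tags raise KeyError."""
    return [label2id[tag] for tag in tags]


# Hypothetical tags for a four-token sentence such as "John lives in Paris".
example_tags = ["B-PER", "O", "O", "B-LOC"]
print(encode_labels(example_tags))  # prints [8, 1, 1, 5]
```

The vocabulary has 13 entries no matter how many sentences the data file contains; every sentence is encoded against the same table.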

@stevezheng23
Owner

stevezheng23 commented Jul 3, 2019

Here are screenshots of the vocab file and the data file:
label.vocab:
image
train-conll2003.json:
image

@stevezheng23
Owner

stevezheng23 commented Jul 3, 2019

The data file format is slightly different from the CoNLL format. You can convert CoNLL-format data to JSON with the following command, where the input file is in CoNLL format and the output file is in JSON format:
python prepro/prepro_conll.py --data_format json --input_file data/ner/conll2003/raw/eng.xxx --output_file data/ner/conll2003/xxx-conll2003/xxx-conll2003.json
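
For readers unfamiliar with the CoNLL layout, the sketch below shows the general idea of such a conversion: group token lines into sentences and emit JSON. It is not the repository's prepro_conll.py; the `read_conll` helper and the `tokens`/`labels` field names are hypothetical, and the real JSON schema produced by the script may differ. The sample input is illustrative, with the token in the first column and the NER tag in the last.

```python
import json


def read_conll(lines):
    """Group CoNLL-style lines into sentences.

    Each non-blank line is whitespace-separated, token first and NER tag
    last. Blank lines end a sentence; -DOCSTART- lines are skipped.
    """
    sentences, tokens, labels = [], [], []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("-DOCSTART-"):
            # Sentence boundary: flush whatever has been collected so far.
            if tokens:
                sentences.append({"tokens": tokens, "labels": labels})
                tokens, labels = [], []
            continue
        columns = line.split()
        tokens.append(columns[0])
        labels.append(columns[-1])
    if tokens:
        sentences.append({"tokens": tokens, "labels": labels})
    return sentences


# Illustrative sample only, not taken from the actual data files.
sample = """-DOCSTART- -X- -X- O

EU NNP B-NP B-ORG
rejects VBZ B-VP O
German JJ B-NP B-MISC
call NN I-NP O
""".splitlines()

print(json.dumps(read_conll(sample), indent=2))
```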

@ScottLiao920
Author

Solved already. Thanks!

@stevezheng23
Owner

@ScottLiao920 Great!

2 participants