Train the model #56

Open
Kik099 opened this issue Jan 21, 2024 · 5 comments
Comments

@Kik099

Kik099 commented Jan 21, 2024

Hello, thank you for your attention. I am working on a dissertation for a master's in Computer Engineering, and I would like to ask whether I could use this model as a foundation for my thesis. If so, my goal is to train the model for use in Portuguese. I've successfully run Kazu, and now I'd like to use this one to compare data related to the healthcare field.
Can you tell me how I can train this model?
And what format do the training data files need to be in?

Best regards,
Rodrigo Saraiva

@mjeensung
Contributor

Hi @Kik099

The instructions for training NER models are described at https://github.com/dmis-lab/BERN2/tree/main/multi_ner/training. You can review the format of the training data there.

If you encounter any issues while training models on your data, please feel free to ask follow-up questions.

Thanks

@Kik099
Author

Kik099 commented Jan 21, 2024

I just want to know what the training data should look like. I have data files that are tokenized and contain the classification labels. Does this work? Example:

The 0
patient 0
took 0
an 0
aspirin B-drug
because 0
his 0
head B-disease
hurt I-disease
. 0

@minstar
Collaborator

minstar commented Jan 22, 2024

Hi @Kik099 ,

We split the labels by entity type and train a separate classifier layer for each type.
Here is an example of the training data:

  1. For drug
    The 0
    patient 0
    took 0
    an 0
    aspirin B
    because 0
    his 0
    head 0
    hurt 0
    . 0

  2. For disease
    The 0
    patient 0
    took 0
    an 0
    aspirin 0
    because 0
    his 0
    head B
    hurt I
    . 0
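For anyone starting from a single file with combined labels such as `B-drug` and `I-disease`, the per-type format above can be derived mechanically. The following is a minimal sketch, not part of the BERN2 code base: the function names `parse_rows` and `split_by_type` are hypothetical, and the outside tag is assumed to be written as `0`, as in this thread (many corpora use the letter `O` instead, so check the linked training instructions).

```python
OUTSIDE = "0"  # assumption: outside tag as written in this thread

SAMPLE = """The 0
patient 0
took 0
an 0
aspirin B-drug
because 0
his 0
head B-disease
hurt I-disease
. 0"""


def parse_rows(text):
    """Read 'token label' lines into a list of (token, label) pairs."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if line:  # skip blank lines
            token, label = line.split()
            rows.append((token, label))
    return rows


def split_by_type(rows, outside=OUTSIDE):
    """Build one label sequence per entity type.

    A label such as 'B-drug' becomes 'B' in the 'drug' sequence and the
    outside tag in every other sequence.
    """
    types = sorted({label.split("-", 1)[1] for _, label in rows if "-" in label})
    per_type = {etype: [] for etype in types}
    for token, label in rows:
        prefix, _, etype = label.partition("-")
        for t in types:
            per_type[t].append((token, prefix if etype == t else outside))
    return per_type


per_type = split_by_type(parse_rows(SAMPLE))
for etype, rows in per_type.items():
    print(f"# {etype}")
    for token, label in rows:
        print(token, label)
```

Each entry of `per_type` can then be written to its own file in whatever layout the training script expects.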

If you have any questions, feel free to tag me please!

@Kik099
Author

Kik099 commented Apr 9, 2024

Hi mjeensung.
Sorry for the late reply.
Basically, I need to train the model as I described.
You told me I would need separate files for each label.
But let's say I have 5000 annotated files.
Do I need to separate the files into different folders,
with one folder for each entity type?
And if so, the link you gave me only mentions a single data_dir. Does this mean NERdata/ needs to contain all the folders?

Sorry to bother you.

Best,

Rodrigo Saraiva

@minstar
Collaborator

minstar commented Apr 18, 2024

Hi @Kik099
We use separate files to annotate the different types, but that separation itself doesn't matter,
because each data type is connected to its own classification layer in the model.py file.
So if you want to annotate many types, you should assign each type to its own classification layer.
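
To illustrate the idea of one classification layer per entity type, here is a toy sketch in plain Python. It is not the actual model.py: the class `MultiHeadTagger`, its parameters, and the random weights are all hypothetical stand-ins, and the `features` are assumed to come from a shared encoder that is not shown.

```python
import random


class MultiHeadTagger:
    """Toy stand-in: one linear classification head per entity type."""

    def __init__(self, entity_types, hidden_size, num_labels=3, seed=0):
        rng = random.Random(seed)
        # One (num_labels x hidden_size) weight matrix per entity type.
        self.heads = {
            etype: [
                [rng.uniform(-1.0, 1.0) for _ in range(hidden_size)]
                for _ in range(num_labels)
            ]
            for etype in entity_types
        }

    def logits(self, features, entity_type):
        """Score each token vector with the head assigned to entity_type."""
        weights = self.heads[entity_type]
        return [
            [sum(w * x for w, x in zip(row, vec)) for row in weights]
            for vec in features
        ]


tagger = MultiHeadTagger(["drug", "disease"], hidden_size=4)
features = [[0.1, 0.2, 0.3, 0.4], [0.5, 0.1, 0.0, 0.2]]  # two tokens
drug_logits = tagger.logits(features, "drug")
disease_logits = tagger.logits(features, "disease")
```

The encoder output is shared, while each batch is routed to the head for its entity type. Under this sketch, supporting a new type means adding one more entry to `heads`.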

Best, Minbyul
