Train the model #56

Kik099 · 2024-01-21T00:41:20Z

Hello, thank you for your attention. I am working on a dissertation for a master's in Computer Engineering, and I would like to inquire if I could use this model as a foundation for my thesis. If so, my goal is to train the model for use in Portuguese. I've successfully run Kazu, and now I'd like to use this one to compare data related to the healthcare field.
Can you tell me how I can train this model, and what format the training data files need to be in?

Best regards,
Rodrigo Saraiva

mjeensung · 2024-01-21T05:20:34Z

Hi @Kik099

The instructions for training NER models are described at https://github.com/dmis-lab/BERN2/tree/main/multi_ner/training. You can review the format of the training data there.

If you encounter any issues while training models on your data, please feel free to ask follow-up questions.

Thanks

Kik099 · 2024-01-21T05:48:57Z

I just want to know what the training data should look like. I have data files that are tokenized and include a label for each token. Does this work? Example:

The 0
patient 0
took 0
an 0
aspirin B-drug
because 0
his 0
head B-disease
hurt I-disease
. 0

minstar · 2024-01-22T04:17:37Z

Hi @Kik099 ,

We split the data by label type and train a separate classifier layer for each type.
Here is an example of the training data:

For drug:

The 0
patient 0
took 0
an 0
aspirin B
because 0
his 0
head 0
hurt 0
. 0

For disease:

The 0
patient 0
took 0
an 0
aspirin 0
because 0
his 0
head B
hurt I
. 0
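
The split described above can be sketched in a few lines of Python. This is only an illustration and not part of the BERN2 code base: the function name `split_by_entity_type` and its parameters are hypothetical, and it assumes combined tags of the form `B-<type>` / `I-<type>` with the outside tag written the way it appears in this thread (`0`).

```python
def split_by_entity_type(tagged_tokens, entity_types, outside="O"):
    """Turn one combined tag sequence into one tag sequence per entity type.

    tagged_tokens: list of (token, tag) pairs, e.g. ("aspirin", "B-drug").
    entity_types:  entity type names to produce, e.g. ["drug", "disease"].
    outside:       the tag used for tokens that are not part of an entity.
    """
    per_type = {}
    for entity_type in entity_types:
        rows = []
        for token, tag in tagged_tokens:
            # "B-drug" -> ("B", "-", "drug"); an outside tag has no type part.
            prefix, _, tag_type = tag.partition("-")
            if tag_type == entity_type:
                rows.append((token, prefix))   # keep only B or I
            else:
                rows.append((token, outside))  # other types count as outside
        per_type[entity_type] = rows
    return per_type


# The sentence from this thread, using "0" as the outside tag as shown above.
sentence = [
    ("The", "0"), ("patient", "0"), ("took", "0"), ("an", "0"),
    ("aspirin", "B-drug"), ("because", "0"), ("his", "0"),
    ("head", "B-disease"), ("hurt", "I-disease"), (".", "0"),
]
split = split_by_entity_type(sentence, ["drug", "disease"], outside="0")

for entity_type, rows in split.items():
    print("For", entity_type)
    for token, tag in rows:
        print(token, tag)
```

Passing the list of entity types explicitly means that a sentence with no drug mention still produces a drug sequence with every token tagged as outside.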

If you have any questions, feel free to tag me please!

Kik099 · 2024-04-09T15:27:43Z

Hi mjeensung.
Sorry for the late reply.
I still need to train the model as I described.
You told me I would need separate files for each label type.
Suppose I have 5000 annotated files.
Do I need to separate the files into different folders, one folder per type?
If so, the link you gave me only refers to a single data_dir. Does this mean I need NERdata/ to contain all of the folders?

Sorry to bother you.

Best,

Rodrigo Saraiva

minstar · 2024-04-18T01:19:54Z

Hi @Kik099
We use separate files to annotate different types, but that is not a requirement.
What matters is that each data type is connected to its own classification layer in the model.py file.
So if you want to annotate many types, you should assign each type to its own classification layer.

Best, Minbyul
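
To make the idea of one classification layer per entity type concrete, here is a toy sketch in plain Python. It is not the code in BERN2's model.py: the class names `ClassificationHead` and `MultiHeadTagger` are hypothetical, the weights are random and untrained, and the input vectors stand in for whatever a shared encoder would produce.

```python
import random


class ClassificationHead:
    """Toy linear layer that scores each token vector against the labels."""

    def __init__(self, hidden_size, labels, rng):
        self.labels = labels
        # One weight row per label (random here; a real model learns these).
        self.weights = [
            [rng.uniform(-1, 1) for _ in range(hidden_size)] for _ in labels
        ]

    def predict(self, token_vectors):
        tags = []
        for vector in token_vectors:
            scores = [
                sum(w * x for w, x in zip(row, vector)) for row in self.weights
            ]
            tags.append(self.labels[scores.index(max(scores))])
        return tags


class MultiHeadTagger:
    """Shared token vectors, with one separate head per entity type."""

    def __init__(self, hidden_size, entity_types, seed=0):
        rng = random.Random(seed)
        self.heads = {
            entity_type: ClassificationHead(hidden_size, ["B", "I", "O"], rng)
            for entity_type in entity_types
        }

    def predict(self, token_vectors, entity_type):
        # Route the same token vectors to the head for the requested type.
        return self.heads[entity_type].predict(token_vectors)


tagger = MultiHeadTagger(hidden_size=4, entity_types=["drug", "disease"])
vectors = [[0.1, 0.2, 0.3, 0.4], [0.5, -0.1, 0.0, 0.2]]  # two fake tokens
drug_tags = tagger.predict(vectors, "drug")
disease_tags = tagger.predict(vectors, "disease")
```

In this sketch, adding a new entity type only means adding one more entry to `entity_types`, which creates one more head; the layout of the files on disk does not affect it.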
