how to handle incomplete structure token predictions? #16

Open
chadrick-kwag opened this issue Oct 6, 2020 · 2 comments

@chadrick-kwag

I wonder how you handled incomplete structure token sequences. Here's an example:

<thead><tr><td></td><td></td>**<td>**</tr></thead>...

In the first phase, when only the structure decoder is trained, the model produces partially incomplete predictions like this one, where the last `<td>` is never closed.

The paper does not mention this problem or how it was handled.

These imperfections affect the TEDS calculation. They also affect the second training phase, where the cell decoder has to be trained at the same time as the structure decoder. I expect they will make ground-truth (GT) assignment for the cell decoder ambiguous.

Can you share how these imperfections were handled?

@Sunnycheey

I would also like to know how to deal with this kind of exception.

@zhxgj
Contributor

zhxgj commented Dec 6, 2020

Currently we do not apply any post-processing to the model output. HTML is tolerant of some of these errors, and we do not think we can do much better than HTML itself. I would suggest parsing the incomplete sequence with something like lxml and seeing what the tree looks like.
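
A minimal sketch of that suggestion, using the example sequence from the question. The helper name `repair_structure` and the `<table>` wrapper are assumptions made for illustration and are not taken from the repository; the idea is only to let lxml's lenient HTML parser close the dangling `<td>` and then inspect the resulting tree.

```python
from lxml import html


def repair_structure(tokens):
    """Parse possibly incomplete structure tokens with lxml's lenient HTML parser.

    Hypothetical helper for illustration: returns the parsed <table> element
    and its re-serialized HTML string.
    """
    # Assumption: the predicted tokens are wrapped in <table> so the parser
    # sees them in table context.
    fragment = "<table>" + "".join(tokens) + "</table>"
    tree = html.fromstring(fragment)
    repaired = html.tostring(tree, encoding="unicode")
    return tree, repaired


# The incomplete prediction from the question: the third <td> is never closed.
predicted = [
    "<thead>", "<tr>",
    "<td>", "</td>",
    "<td>", "</td>",
    "<td>",
    "</tr>", "</thead>",
]

tree, repaired = repair_structure(predicted)
print(repaired)                       # serialized tree after lxml's recovery
print(len(tree.findall(".//td")))     # number of cells the parser recovered
```

Counting the recovered `<td>` elements this way gives one possible basis for deciding how many cells the cell decoder should be assigned, but whether the recovered tree matches the intended table still has to be checked case by case.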
