how to handle incomplete structure token predictions? #16

chadrick-kwag · 2020-10-06T01:15:02Z

I wonder how you handled incomplete sequences. Here's an example:

<thead><tr><td></td><td></td>**<td>**</tr></thead>...

In the first phase, when training only the structure decoder, the model produces predictions that are partially incomplete, like the unclosed `<td>` above.

The paper does not mention this problem or how it was handled.

These imperfections affect the TEDS calculation, and they also affect the second phase of training, where the cell decoder has to be trained at the same time. I expect they will make ground-truth (GT) assignment for the cell decoder ambiguous.

Can you share how these imperfections were handled?

Sunnycheey · 2020-10-25T08:15:03Z

I would also like to know how to deal with this kind of exception.

zhxgj · 2020-12-06T22:56:38Z

Currently we do not apply any post-processing to the model output. HTML is tolerant of some of these errors, and we do not think we can do much better than HTML's own error recovery. I would suggest parsing the incomplete sequence with something like lxml and seeing what the resulting tree looks like.
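
As a rough illustration of that suggestion, here is a minimal sketch of lenient parsing that closes whatever the predicted sequence left open. It is not the authors' method (they state that no post-processing is applied), and it uses Python's standard-library `html.parser` as a dependency-free stand-in for lxml, so lxml's actual recovery may differ. The names `StructureRepairer` and `repair_structure` are hypothetical, and the input is assumed to be the structure tokens already joined into one string.

```python
from html.parser import HTMLParser


class StructureRepairer(HTMLParser):
    """Hypothetical helper: re-emit a tag sequence, closing unclosed elements."""

    def __init__(self):
        super().__init__()
        self.open_tags = []  # stack of currently open tag names
        self.out = []        # repaired output pieces
        self.repairs = 0     # number of closing tags we had to insert

    def handle_starttag(self, tag, attrs):
        self.open_tags.append(tag)
        # Keep the start tag exactly as written (including rowspan/colspan).
        self.out.append(self.get_starttag_text())

    def handle_data(self, data):
        self.out.append(data)

    def handle_endtag(self, tag):
        if tag not in self.open_tags:
            return  # stray closing tag with no matching opener: drop it
        # Close every element opened after `tag`, then `tag` itself.
        while self.open_tags:
            top = self.open_tags.pop()
            self.out.append(f"</{top}>")
            if top == tag:
                break
            self.repairs += 1  # `top` was never closed by the model

    def finish(self):
        self.close()
        # Anything still open at the end of the sequence was truncated.
        while self.open_tags:
            self.out.append(f"</{self.open_tags.pop()}>")
            self.repairs += 1
        return "".join(self.out)


def repair_structure(sequence):
    """Return (repaired_html, number_of_inserted_closing_tags)."""
    parser = StructureRepairer()
    parser.feed(sequence)
    repaired = parser.finish()
    return repaired, parser.repairs


if __name__ == "__main__":
    # The incomplete prediction from the question: the third <td> is never closed.
    predicted = "<thead><tr><td></td><td></td><td></tr></thead>"
    repaired, inserted = repair_structure(predicted)
    print(repaired)
    print("closing tags inserted:", inserted)
```

The count of inserted closing tags gives a cheap signal for how malformed a prediction was, which may help when deciding whether a sample is usable for cell-level GT assignment. That use is an assumption on my part and is not something the thread confirms.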
