
About WYGIWYS #10

Open
tomoohive opened this issue Jul 16, 2020 · 3 comments

Comments

@tomoohive

tomoohive commented Jul 16, 2020

I would like to implement WYGIWYS before implementing EDD.
I saw this issue (#6 (comment)) and would like to know more about the WYGIWYS model.

Where in the original tutorial source should I add the RNN model?
Also, what role does that RNN play?

My guess is that the RNN predicts the structure of the table.
Is my understanding correct?

@zhxgj
Contributor

zhxgj commented Jul 17, 2020

In WYGIWYS, the RNN is applied to the CNN output to capture longer-range spatial dependencies. You can apply the RNN to each column of the CNN output.
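As a rough illustration of "apply the RNN to each column of the CNN output", here is a minimal sketch assuming a PyTorch setup. The class name `ColumnEncoder`, the choice of a bidirectional LSTM, and all tensor sizes are assumptions made for the example, not the code used in this repository.

```python
import torch
import torch.nn as nn


class ColumnEncoder(nn.Module):
    """Hypothetical module: runs a bidirectional LSTM down each column
    of a CNN feature map."""

    def __init__(self, channels: int, hidden: int):
        super().__init__()
        self.rnn = nn.LSTM(channels, hidden, batch_first=True, bidirectional=True)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (batch, channels, height, width), as produced by a CNN encoder
        b, c, h, w = feats.shape
        # Treat every column as its own sequence: (batch * width, height, channels)
        cols = feats.permute(0, 3, 2, 1).reshape(b * w, h, c)
        out, _ = self.rnn(cols)  # (batch * width, height, 2 * hidden)
        # Restore the grid layout: (batch, height, width, 2 * hidden)
        return out.reshape(b, w, h, -1).permute(0, 2, 1, 3)


# Random tensor standing in for a CNN output (sizes are arbitrary)
feats = torch.randn(2, 8, 3, 5)
encoded = ColumnEncoder(channels=8, hidden=4)(feats)
print(encoded.shape)
```

In an attention-based captioning pipeline, the result would typically be flattened to (batch, height * width, features) and passed to the attention decoder in place of the raw CNN features.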

@nishchay47b

I'm not sure I did this right, but I ran a small experiment with WYGIWYS using this implementation. I used the format_html function available in exploring_PubTabNet_dataset.ipynb to create a label file for a few hundred sample images, trained for 2 epochs, and got results like <html> <th> UNK </th> UNK UNK </html>, where UNK is the out-of-vocabulary token. Maybe this happened because I used the vocab from the PubTabNet dataset, whose entries are basically characters, whereas WYGIWYS expects word tokens in the vocab. If you create a vocab of words and use this function to generate label files with proper spacing for each image, maybe this can work.
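One way to check the vocab-mismatch hypothesis before training is to measure how many label tokens fall outside the vocab. The sketch below is only illustrative: `unk_rate`, the tiny vocab, and the sample labels are all made up for the example and do not come from PubTabNet.

```python
from typing import Iterable, List, Set


def unk_rate(sequences: Iterable[List[str]], vocab: Set[str]) -> float:
    """Fraction of tokens that would be mapped to UNK under `vocab`."""
    total = 0
    unknown = 0
    for seq in sequences:
        for tok in seq:
            total += 1
            if tok not in vocab:
                unknown += 1
    return unknown / total if total else 0.0


# Hypothetical character-level vocab where HTML tags are single tokens
char_vocab = {"<html>", "</html>", "<th>", "</th>", "G", "e", "n"}

# The same cell tokenized as a word versus as characters
word_labels = [["<html>", "<th>", "Gene", "</th>", "</html>"]]
char_labels = [["<html>", "<th>", "G", "e", "n", "e", "</th>", "</html>"]]

print(unk_rate(word_labels, char_vocab))  # 0.2
print(unk_rate(char_labels, char_vocab))  # 0.0
```

A high rate on the label file suggests that the tokenization of the labels and the vocab do not match, which would be consistent with outputs dominated by UNK.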

@zhxgj
Contributor

zhxgj commented Jul 21, 2020

@nishchay47b When I trained WYGIWYS, I used character-level tokenization, where each HTML tag is a single token.
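A minimal sketch of that tokenization scheme, assuming labels are plain HTML strings: tags are kept whole and everything between them is split into characters. The function name `tokenize` and the tag regex are assumptions for illustration; the exact rules used for the reported WYGIWYS training are not given in this thread.

```python
import re
from typing import List

# Assumed tag pattern: "<" or "</", a letter, then anything up to ">"
TAG = re.compile(r"</?[a-zA-Z][^<>]*>")


def tokenize(html: str) -> List[str]:
    """Split HTML into tokens: each tag is one token, other text is per character."""
    tokens: List[str] = []
    pos = 0
    for match in TAG.finditer(html):
        tokens.extend(html[pos:match.start()])  # characters before the tag
        tokens.append(match.group())            # the tag as a single token
        pos = match.end()
    tokens.extend(html[pos:])                   # trailing characters, if any
    return tokens


print(tokenize("<td>ab</td>"))  # ['<td>', 'a', 'b', '</td>']
```

The vocab is then the set of distinct tags plus the set of distinct characters. Whitespace is kept as ordinary character tokens here, which may or may not match the original setup.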
