Simplified DOM Trees for Transferable Attribute Extraction from the Web

MurtuzaBohra/SimpDOM

Attribute Extraction from Web Documents.

Title: Simplified DOM Trees for Transferable Attribute Extraction from the Web

URL: https://arxiv.org/pdf/2101.02415.pdf

Keywords: structured data extraction, web information extraction, Simplified DOM

The implementation is in PyTorch.

Weights trained on the SWDE dataset (auto vertical) are available here: https://drive.google.com/file/d/1aMuHb8RT_GrKr6VoUvmDsObEwIqypkHy/view?usp=sharing

To run the test.ipynb notebook, download the file and unzip it into the "data" folder.
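
The unzip step can also be scripted. The following is a minimal sketch, not part of the repository: the helper name `extract_weights` and the archive filename used in the usage note are assumptions, since the README does not state what the downloaded file is called.

```python
import zipfile
from pathlib import Path


def extract_weights(archive_path, data_dir="data"):
    """Unzip a downloaded archive into data_dir and return the member names.

    Hypothetical helper: the repository does not ship this function.
    """
    archive = Path(archive_path)
    target = Path(data_dir)
    # Fail early if the download is missing or is not a zip archive.
    if not zipfile.is_zipfile(archive):
        raise ValueError(f"{archive} is not a readable zip archive")
    # Create the "data" folder if it does not exist yet.
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(target)
        return zf.namelist()
```

For example, if the download were saved as `weights.zip` (a placeholder name), calling `extract_weights("weights.zip")` from the repository root would unpack it into the "data" folder.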

To retrain the model on other verticals of the SWDE dataset, use the train.ipynb notebook.
