Attribute Extraction from Web Documents.
Title: Simplified DOM Trees for Transferable Attribute Extraction from the Web
URL: https://arxiv.org/pdf/2101.02415.pdf
Keywords: structured data extraction, web information extraction, Simplified DOM
The implementation is in PyTorch.
Weights trained on the SWDE dataset (auto vertical) are available here: https://drive.google.com/file/d/1aMuHb8RT_GrKr6VoUvmDsObEwIqypkHy/view?usp=sharing
To run the test.ipynb notebook, download the weights file and unzip it into the "data" folder.
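If you prefer to do the unzip step from Python rather than by hand, the sketch below shows one way. It assumes the downloaded file is a zip archive; the helper name unzip_weights and the archive filename in the usage note are hypothetical, since this README does not state what the download is called.

```python
import zipfile
from pathlib import Path


def unzip_weights(archive_path, target_dir="data"):
    """Extract a downloaded archive into target_dir and return the archived file names.

    Hypothetical helper: the name and the "data" default only mirror the
    folder mentioned in this README, not code shipped with the repository.
    """
    archive = Path(archive_path)
    # is_zipfile returns False for missing files as well as non-zip files.
    if not zipfile.is_zipfile(archive):
        raise ValueError(f"{archive} is missing or is not a zip archive")

    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)  # create "data" if it does not exist

    with zipfile.ZipFile(archive) as zf:
        zf.extractall(target)
        return zf.namelist()
```

For example, if the download was saved as weights.zip (an assumed name) next to the notebooks, unzip_weights("weights.zip") would extract it into "data" and return the names of the extracted files, which you can check against the paths test.ipynb expects.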
To retrain the model on other verticals of the SWDE dataset, use the train.ipynb notebook.