# LOMI - Enrich Linked Open Data (DBpedia) with Microdata

This project was created during a one-year project at the University of Mannheim. It builds on Common Crawl, a project that provides an "open repository of web crawl data that can be accessed and analyzed by anyone". This crawl data is used by the Web Data Commons project to extract Schema.org data in N-Quads format.

The main startup classes are located under `com.maximilian_boehm.lod.main`. Ideally, you should assign at least 6 GB of heap to the JVM to get results. The program is divided into three phases. Phase 1 is the deduper (`A0_Deduper.java`), which finds instances with multiple occurrences and reduces them to a single one. In phase 2, the transformer (`A1_Transformer.java`) transforms the instances from the Schema.org vocabulary to the DBpedia ontology. Finally, in phase 3, the instance matcher (`A2_InstanceMatcher.java`) finds corresponding matches between data from the web and DBpedia.
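To illustrate the idea behind phase 1, here is a minimal sketch of deduplication by subject: statements about the same instance are grouped so that each instance and each of its property values occurs only once. This is an illustrative assumption about the approach, not the actual `A0_Deduper` code; the triple layout and example identifiers are made up.

```java
import java.util.*;

public class DedupSketch {

    // Merge all {subject, predicate, object} statements so that each
    // subject appears once, with duplicate property values collapsed.
    public static Map<String, Map<String, Set<String>>> dedupe(List<String[]> triples) {
        Map<String, Map<String, Set<String>>> merged = new LinkedHashMap<>();
        for (String[] t : triples) {
            merged.computeIfAbsent(t[0], s -> new LinkedHashMap<>())  // one entry per instance
                  .computeIfAbsent(t[1], p -> new LinkedHashSet<>())  // one set per property
                  .add(t[2]);                                         // set drops duplicate values
        }
        return merged;
    }

    public static void main(String[] args) {
        // Hypothetical input: the same instance extracted twice from the crawl.
        List<String[]> triples = Arrays.asList(
            new String[]{"_:movie1", "schema:name", "Inception"},
            new String[]{"_:movie1", "schema:name", "Inception"},
            new String[]{"_:movie1", "schema:director", "Christopher Nolan"});
        Map<String, Map<String, Set<String>>> out = dedupe(triples);
        System.out.println(out.size());                               // one instance remains
        System.out.println(out.get("_:movie1").get("schema:name"));   // single name value
    }
}
```

The real deduper works on N-Quads at a much larger scale (hence the 6 GB heap recommendation), but the reduction of multiple occurrences to a single one follows the same grouping principle.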

See also my blog post for further explanations.
