
Glogo/wikipedia-redirects

Wikipedia Redirects

Java projects for extracting and searching Wikipedia redirects (alternative titles).

Project created by Michael Gloger as a school assignment at FIIT STU Bratislava: http://vi.ikt.ui.sav.sk/User:Michael.Gloger?view=home

The main goal of this project was to implement a parser that finds alternative titles for Wikipedia pages by parsing the XML dump files of Wikipedia articles. Among other details, each page record contains the page title and a flag indicating whether the page is a redirect to another page. If a page is a redirect, its title can be considered an alternative title of the page it refers to.
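
The extraction step described above can be sketched with the JDK's streaming StAX API, which reads the XML one event at a time instead of building a DOM tree, so memory use stays roughly constant even for very large dumps. This is a minimal sketch, not the project's actual parser: it assumes that a redirect page is marked by a `<redirect title="..."/>` element inside `<page>`, as in the MediaWiki XML export format, and the names `RedirectExtractor`, `Redirect`, and `extract` are hypothetical.

```java
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical sketch, not the repository's implementation.
public class RedirectExtractor {

    /** One alternative title: the redirect page's title and the page it points to. */
    public record Redirect(String alternativeTitle, String targetTitle) {}

    /** Streams through a dump and collects every page that is a redirect. */
    public static List<Redirect> extract(InputStream dump) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(dump);
        List<Redirect> redirects = new ArrayList<>();
        String title = null;   // title of the <page> currently being read
        String target = null;  // redirect target, or null if the page is not a redirect
        try {
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    switch (reader.getLocalName()) {
                        // A new page starts: forget the previous page's data.
                        case "page" -> { title = null; target = null; }
                        // Reads the text content and moves the cursor to </title>.
                        case "title" -> title = reader.getElementText();
                        // Assumption: redirects look like <redirect title="Target" />.
                        case "redirect" -> target = reader.getAttributeValue(null, "title");
                        default -> { }
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT
                        && "page".equals(reader.getLocalName())
                        && title != null && target != null) {
                    // The page was a redirect, so its title is an alternative title of the target.
                    redirects.add(new Redirect(title, target));
                }
            }
        } finally {
            reader.close();
        }
        return redirects;
    }
}
```

In a real run the `InputStream` would wrap the dump file (ideally through a `BufferedInputStream`), and matches would be written out as they are found rather than collected in a list.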

Please note that this project does not bring any exciting new functionality. Wikipedia already provides online services such as "What links here", where you can find, among other things, the pages referring to a specified page. This project was more of a challenge, because the input XML files are larger than 50 GB and contain more than 14 million page records.

This repository contains two Java projects:

  1. Parser - parses Wikipedia XML dumps and saves the alternative titles to a CSV file
  2. Server - reads the alternative titles from that file, indexes them in Lucene, and provides REST services for page search plus a web page to display the results
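
The two projects meet at the CSV file. The sketch below illustrates that hand-off under stated assumptions: the two-column layout (alternative title, target title) with RFC 4180-style quoting is hypothetical, as are the names `AlternativeTitlesCsv`, `write`, `parseRow`, and `readIndex`. The real Server indexes the data in Lucene; here a plain `TreeMap` from target title to its alternative titles stands in for that index, purely to show the kind of lookup the search service answers.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: CSV layout and in-memory index are assumptions, not the project's code.
public class AlternativeTitlesCsv {

    /** Quotes a field RFC 4180-style: wrap it in quotes and double any embedded quote. */
    static String quote(String field) {
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    /** Writes one "alternative title","target title" row per redirect pair. */
    public static void write(List<String[]> pairs, Writer out) throws IOException {
        for (String[] p : pairs) {
            out.write(quote(p[0]) + "," + quote(p[1]) + "\n");
        }
    }

    /** Splits one row produced by write() into its fields, undoing the quoting. */
    static String[] parseRow(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"' && i + 1 < line.length() && line.charAt(i + 1) == '"') {
                    cur.append('"');   // doubled quote inside a field
                    i++;
                } else if (c == '"') {
                    inQuotes = false;  // closing quote of the field
                } else {
                    cur.append(c);
                }
            } else if (c == '"') {
                inQuotes = true;       // opening quote of the field
            } else if (c == ',') {
                fields.add(cur.toString());
                cur.setLength(0);
            }
        }
        fields.add(cur.toString());
        return fields.toArray(new String[0]);
    }

    /** Reads the CSV back and groups alternative titles under their target page. */
    public static Map<String, List<String>> readIndex(Reader in) throws IOException {
        Map<String, List<String>> index = new TreeMap<>();
        BufferedReader reader = new BufferedReader(in);
        String line;
        while ((line = reader.readLine()) != null) {
            if (line.isEmpty()) continue;
            String[] row = parseRow(line);
            index.computeIfAbsent(row[1], k -> new ArrayList<>()).add(row[0]);
        }
        return index;
    }
}
```

Reading line by line is safe here because Wikipedia page titles cannot contain line breaks; a general-purpose CSV reader would also need to handle newlines inside quoted fields.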
