Wikipedia Places API

The final product of this project is an HTTP server whose requests return JSON. It serves as the backend of WikiPo (a Google Play app).

Introduction:

  • This project was carried out at the TDK (Technion Data & Knowledge Lab) of the CS faculty by Nerya Hadad under the supervision of Dr. Oren Mishali, and was later updated by Tzahi Levi and Raz Levi.

Prerequisite:

Server Structure:

(Diagram of the server structure.)

API:

wiki_by_place

Parameters:
  • lat- location's latitude.
  • lon- location's longitude.
  • radius- a number followed by its unit [mm, cm, m, km...].
  • [optional] from- the index from which to start returning results (used for pagination).
  • [optional] size- the number of places to return (used for pagination).
Output: all Hebrew Wikipedia entries that have a location (coordinates) within the defined area:
  • label- the title of the Wikipedia page.
  • url- the URL of the Wikipedia page.
  • abstract- the first 5 sentences of the Wikipedia article. Some labels have no abstract.
  • imageUrl- the image URL of the Wikipedia page. Some labels have no imageUrl.
  • pin- {distance[km], location: {lat, lon}} of the article.
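A radius search of this kind maps naturally onto Elasticsearch's geo_distance query, which accepts a distance with a unit such as "2km" or "500m". The following is a minimal sketch of how such a search body could be built. It is an assumption about the approach, not the server's actual code: the function name build_geo_query and the field name pin.location are hypothetical, chosen to mirror the output fields above.

```python
def build_geo_query(lat, lon, radius, from_=0, size=10):
    """Build an Elasticsearch search body for places within `radius` of a point.

    `radius` is a number with its unit, e.g. "2km" or "500m".
    The field name "pin.location" is an assumption, not taken from the project.
    """
    return {
        "from": from_,   # pagination: index of the first hit to return
        "size": size,    # pagination: number of hits to return
        "query": {
            "bool": {
                "filter": {
                    "geo_distance": {
                        "distance": radius,
                        "pin.location": {"lat": lat, "lon": lon},
                    }
                }
            }
        },
    }


body = build_geo_query(32.7775, 35.0217, "2km", from_=0, size=5)
```

The resulting dictionary could be passed as the body of a search call in the elasticsearch Python client.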
Request Examples:
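As a rough illustration, the sketch below builds wiki_by_place request URLs in Python, including the optional pagination parameters. It assumes the endpoint is served over HTTP GET with query-string parameters at a route named after the endpoint; the base URL, the route path, and the helper names are hypothetical and should be adapted to your deployment.

```python
from urllib.parse import urlencode

# Hypothetical base URL; replace with the address of your own server.
BASE_URL = "http://localhost:5000"


def wiki_by_place_url(lat, lon, radius, from_=None, size=None):
    """Return a wiki_by_place request URL; from_ and size are optional."""
    params = {"lat": lat, "lon": lon, "radius": radius}
    if from_ is not None:
        params["from"] = from_
    if size is not None:
        params["size"] = size
    return f"{BASE_URL}/wiki_by_place?{urlencode(params)}"


def page_urls(lat, lon, radius, page_size, pages):
    """Return URLs for consecutive result pages of `page_size` places each."""
    return [
        wiki_by_place_url(lat, lon, radius, from_=i * page_size, size=page_size)
        for i in range(pages)
    ]


print(wiki_by_place_url(32.7775, 35.0217, "2km"))
for url in page_urls(32.7775, 35.0217, "2km", page_size=10, pages=2):
    print(url)
```

Each URL can then be fetched with the requests package (for example requests.get(url).json()) once the server is running.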

place_details_by_name

Parameters:
  • name- a partial place name or a pattern.
Output: the full name and the coordinates of the matching place
  • name- the place's full name.
  • lat- place's latitude.
  • lon- place's longitude.
Request Examples:

place_details_by_coordinates

Parameters:
  • lat- place's latitude.
  • lon- place's longitude.
Output: the full name of the given place
  • name- the place's full name.
Request Examples:

get_suggestions

Parameters:
  • name- the place's full name.
Output:
  • suggestions- a list of full place names that include the given name.
Request Examples:
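To show how a client might consume this endpoint, here is a small sketch that extracts the suggestions list from a response. It assumes the response is a JSON object with a top-level suggestions key, as the output description suggests; the helper name and the sample response are made up for illustration.

```python
import json


def extract_suggestions(response_text):
    """Parse a get_suggestions JSON response and return the list of names.

    Returns an empty list when the "suggestions" key is absent.
    """
    data = json.loads(response_text)
    return data.get("suggestions", [])


# Made-up sample response, used only to exercise the helper.
sample = '{"suggestions": ["Haifa", "Haifa Bay"]}'
print(extract_suggestions(sample))
```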

Instruction for running the server:

1. Clone:

  • Clone this repository to your local machine.

git clone https://github.com/TechnionTDK/wikipedia-places

2. Install Python:

Windows:

  • Download the latest Python 3 version from the official Python website.

Linux:

  • Install python3 with these commands:

sudo apt-get update
sudo apt-get install python3.6

3. Fill input directory:

  • The input was generated by following the instructions here. There are 3 scripts, but only 1 is relevant for our input.
  • Clone that repository.
  • Run labels_generator.py only.
  • Copy its output into the input/all_labels directory of our repository.

4. Python packages:

  • Install the required Python packages automatically by running:

pip install -r requirements.txt

  • Alternatively, install each package manually by running:

python3 -m pip install packageName

  • where packageName is each of the following:
  • elasticsearch==7.6.0
  • Flask==1.1.2
  • geopy==1.21.0
  • requests==2.23.0
  • Jinja2==3.0.3
  • itsdangerous==2.0.1
  • werkzeug==2.0.2

5. Verify Python packages were successfully installed:

  • Verify that the right packages are installed using:

pip list
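Comparing the output by eye is error-prone, so the check can also be scripted. The sketch below compares the output of pip freeze against the pinned versions listed in step 4. The helper names are hypothetical, and package names are compared case-insensitively because pip may report a name with different capitalization (for example Werkzeug).

```python
import subprocess
import sys

# Pinned versions copied from the package list in step 4.
REQUIRED = {
    "elasticsearch": "7.6.0",
    "Flask": "1.1.2",
    "geopy": "1.21.0",
    "requests": "2.23.0",
    "Jinja2": "3.0.3",
    "itsdangerous": "2.0.1",
    "werkzeug": "2.0.2",
}


def parse_freeze(text):
    """Turn `pip freeze` output into {lowercased name: version}."""
    installed = {}
    for line in text.splitlines():
        if "==" in line:
            name, version = line.strip().split("==", 1)
            installed[name.lower()] = version
    return installed


def find_mismatches(required, installed):
    """Return {package: (wanted, found)} for missing or differing versions."""
    mismatches = {}
    for name, wanted in required.items():
        found = installed.get(name.lower())
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


def check_environment():
    """Run `pip freeze` with the current interpreter and report mismatches."""
    output = subprocess.run(
        [sys.executable, "-m", "pip", "freeze"],
        stdout=subprocess.PIPE, universal_newlines=True, check=True,
    ).stdout
    return find_mismatches(REQUIRED, parse_freeze(output))
```

Calling check_environment() with the interpreter you intend to run the server with returns an empty dictionary when every pinned package matches.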

6. Create a virtual environment:

  • Run the following commands to create and activate a virtual environment:

python3.7 -m venv env
source env/bin/activate

7. Set up Elasticsearch:

Linux:

  • Run the following commands in your home directory:

cd ~
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.6.0-linux-x86_64.tar.gz
tar -xf elasticsearch-7.6.0-linux-x86_64.tar.gz
cd elasticsearch-7.6.0/
bin/elasticsearch

  • Use port 9200 (or change this number in the scripts to your port number).
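Before moving on, it can help to confirm that Elasticsearch is actually answering on that port. The sketch below requests the root URL, where a running Elasticsearch node serves a small JSON document that includes its version number. The function name and the injectable opener argument are illustrative choices, not part of this project.

```python
import json
from urllib.error import URLError
from urllib.request import urlopen


def elasticsearch_info(host="localhost", port=9200, opener=urlopen, timeout=5):
    """Return the JSON document served at the Elasticsearch root URL.

    Returns None if the node is unreachable or the reply is not valid JSON.
    """
    url = f"http://{host}:{port}/"
    try:
        with opener(url, timeout=timeout) as response:
            return json.loads(response.read().decode("utf-8"))
    except (URLError, OSError, ValueError):
        return None


if __name__ == "__main__":
    info = elasticsearch_info()
    if info is None:
        print("Elasticsearch is not reachable on port 9200")
    else:
        print("Elasticsearch version:", info.get("version", {}).get("number"))
```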

Windows:

8. Build the Elasticsearch index:

  • This phase is very long and can take several days, so you should use the screen command to prevent it from stopping.
  • The builder takes all the Wikipedia labels and parses them: it keeps only the articles that have coordinates and adds the imageUrl and the first paragraph of each article. At the end, it creates the indices for Elasticsearch.
  • Run elastic_builder.py using the screen command:

screen python elastic_builder.py

  • To check the progress, see the logs in report_file.txt.

cat report_file.txt

  • If a problem occurs, you can resume the script from the middle (all parsed data is saved locally). Print report_file.txt and check at which file the process stopped.
  • Use the argument --file (-f) to continue from that file number.
  • Use the argument --index (-i) if you already have all the parsed data and want to skip the parsing and only index Elasticsearch.
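To make the two resume options concrete, here is a hypothetical sketch of how such arguments could be declared with argparse. The flag names come from the description above; treating --file as an integer and --index as an on/off switch are assumptions, and the real elastic_builder.py may define them differently.

```python
import argparse


def build_parser():
    """Command-line parser mirroring the --file and --index options."""
    parser = argparse.ArgumentParser(
        description="Resume or re-run the Elasticsearch build."
    )
    parser.add_argument(
        "-f", "--file", type=int, default=0,
        help="file number to resume parsing from (assumed to be an integer)",
    )
    parser.add_argument(
        "-i", "--index", action="store_true",
        help="skip parsing and only index the already-parsed data",
    )
    return parser


def describe(args):
    """Summarize what a run with these arguments would do."""
    if args.index:
        return "index only"
    return f"parse from file {args.file}, then index"


args = build_parser().parse_args(["--file", "12"])
print(describe(args))
```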

9. Run server.py:

  • Run the Flask server.
  • Use the screen command to prevent it from stopping.

python server.py

  • If you get a permission denied error, run:

sudo env/bin/python server.py

10. Your server is now up and ready to receive requests.

Reindex the server:

  • Every time you want to rebuild the server, activate the virtual environment again:

source env/bin/activate

  • To reindex the server, simply run elastic_builder.py again.
