connectivity search backend

This Django application powers the API available at https://search-api.het.io/.
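
For example, the public API can be queried from the command line (this assumes the deployed instance uses the same /v1/ prefix as the local development server described below):

curl https://search-api.het.io/v1/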

Environment

This repository uses conda to manage its environment as specified in environment.yml. Install the environment with:

conda env create --file=environment.yml

Then use conda activate hetmech-backend and conda deactivate to activate or deactivate the environment.
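
If environment.yml changes later, the existing environment can be refreshed with:

conda env update --file=environment.yml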

Secrets

Users must supply dj_hetmech/secrets.yml with the database connection information and two optional parameters for Django settings. See dj_hetmech/secrets-template.yml for which fields should be defined. These secrets determine whether Django connects to a local or a remote database, as well as other Django security settings.
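
As a rough sketch, a local-development secrets file might look like the following; the key names here are illustrative, not confirmed, so consult dj_hetmech/secrets-template.yml for the actual fields:

# illustrative key names only -- see dj_hetmech/secrets-template.yml for the real ones
database_name: connectivity_db
database_user: dj_hetmech
database_password: not_secure
database_host: localhost
database_port: 5432
# plus the two optional Django settings parameters described in the template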

Notebooks

Use the following command to launch Jupyter Notebook in your browser for interactive development:

python manage.py shell_plus --notebook
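
Inside the notebook, shell_plus auto-imports the project's Django models, so they can be queried directly with the ORM. A minimal sketch (the Node model name is assumed here for illustration, not confirmed):

# count rows and peek at a few records via the Django ORM
Node.objects.count()
Node.objects.all()[:5]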

Server

A local development server can be started with the command:

python manage.py runserver

This exposes the API at http://localhost:8000/v1/.
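
Once the server is running, you can confirm the API responds by requesting that root URL, for example:

curl http://localhost:8000/v1/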

Database

This project uses a PostgreSQL database. The deployed version of this application uses a remote database. Public read-only access is available with the following configuration:

name: connectivity_db
user: read_only_user
password: tm8ut9uzqx7628swwkb9
host: search-db.het.io
port: 5432
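
For example, psql can connect to the remote database directly with these credentials (it prompts for the password above):

psql --host=search-db.het.io --port=5432 --username=read_only_user --dbname=connectivity_db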

To set up a new database locally for development, run:

# https://docs.docker.com/samples/library/postgres/
docker run \
  --name connectivity_db \
  --env POSTGRES_DB=connectivity_db \
  --env POSTGRES_USER=dj_hetmech \
  --env POSTGRES_PASSWORD=not_secure \
  --volume "$(pwd)"/database:/var/lib/postgresql/data \
  --publish 5432:5432 \
  --detach \
  postgres:12.4
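
To check that the container came up correctly, you can open a psql session inside it:

docker exec --interactive --tty connectivity_db \
  psql --username=dj_hetmech --dbname=connectivity_db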

Populating the database

To populate the database from scratch, use the populate_database management command. Here is an example workflow:

# migrate database to the current Django models
python manage.py makemigrations
python manage.py migrate --run-syncdb
# view the populate_database usage docs
python manage.py populate_database --help
# wipe the existing database (populate_database assumes empty tables)
python manage.py flush --no-input
# populate the database (will take a long time)
python manage.py populate_database --max-metapath-length=3 --reduced-metapaths --batch-size=12000
# output database information and table summaries
python manage.py database_info
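
For ad hoc inspection beyond database_info, Django's dbshell command opens a psql session against whichever database secrets.yml points to:

python manage.py dbshell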

Another option is to load the database from the connectivity-search-pg_dump.sql.gz database dump, which saves time if you want the full database (i.e. populated without --reduced-metapaths). This 5 GB file is available on Zenodo (TODO: update latest database dump to Zenodo).

To load connectivity-search-pg_dump.sql.gz into a new database, modify the following command:

zcat connectivity-search-pg_dump.sql.gz | psql --username=dj_hetmech --dbname=connectivity_db --host=HOST
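
For example, to load the dump into the local Docker database started above, HOST becomes localhost (a sketch; flags may need adjusting for your setup):

zcat connectivity-search-pg_dump.sql.gz | psql --username=dj_hetmech --dbname=connectivity_db --host=localhost --port=5432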

connectivity-search-pg_dump.sql.gz was exported from the development Docker database with the command:

docker exec connectivity_db \
  pg_dump \
  --host=localhost --username=dj_hetmech --dbname=connectivity_db \
  --create --clean \
  --compress=8 \
  > connectivity-search-pg_dump.sql.gz