Skip to content

drizzleco/salary-database

Repository files navigation

Salary Database

Build status codecov

Technology

  • sqlite3
  • python

Setup

  1. make install
  2. make collect_2019_data to populate salary db with just 2019 data(if using sqlite)
  3. docker-compose up --build to build and start docker container(if using postgres)

Info

To obtain the H1B data, go to the site here. Then, click on the "Disclosure Data" tab. Once we're here, scroll down to LCA Programs and download the report for the given fiscal year. In my sample, I ran this for 2019. After downloading one of the excel file, open it in your desktop client and export it to csv. Then, use the "Data Ingestion" notebook to clean it up.

Currently, all the information exists in the data folder. I added the steps for moving information in the data ingestion notebook. We can probably automate that to make this idempotent.

Feel free to access the GraphQL Viewer!

Using https://salary-database.herokuapp.com/graphql, feel free to take a look at the schema and query the data. Here is an example input

{
  salaries(limit: 10, employer: "AIRBNB", year:"2020"){
    caseNumber
    employerName
    jobTitle
    prevailingWage
    employmentStartDate
  }
}

with the following link: Query Link

Production

Deploying to Heroku

To deploy Docker container to heroku:

  1. heroku create (one time step)
  2. heroku container:push web
  3. heroku container:release web
  4. heroku open

Pushing Salary Data from Local DB to Production DB

You'll need:

  • postgresql
    • install with: brew install pgloader
  • pgloader
    • install with: brew install postgresql
  • Heroku CLI

In the future, when we want to update the database, the steps to push our local sqlite database to the production heroku database are:

  1. use pgloader to load our local sqlite db to a local postgres database
    • pgloader data/salary.sqlite postgresql:///[name of postgres dev db]
  2. reset remote db
    • heroku pg:reset DATABASE_URL --app salary-database
  3. push local postgres database to heroku
    • heroku pg:push postgresql:///[name of postgres dev db] DATABASE_URL --app salary-database

NOTE: you can use heroku pg:info --app salary-database to get info about the the production database and check if we are near the row limit.