Pennsylvania Monthly Salary Data (2012--2017) #2

Open
soodoku opened this issue Sep 22, 2017 · 7 comments

@soodoku
Member

soodoku commented Sep 22, 2017

Source: http://pennwatch.pa.gov/employees/Pages/Employee-Salaries.aspx

To scrape, use the form to subset salaries by range ($0 to $25,000, and so on), one range at a time.
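A minimal Python sketch of the range-by-range approach: it only builds the list of salary ranges to submit to the form, one query per range. The $25,000 step and the $200,000 cap are assumptions (the issue only gives "$0 to $25,000 etc."), and the actual form submission is not shown because the form's field names are not documented here.

```python
from typing import List, Optional, Tuple


def salary_brackets(step: int = 25_000, top: int = 200_000) -> List[Tuple[int, Optional[int]]]:
    """Return consecutive (low, high) salary ranges to query one at a time.

    Assumptions (hypothetical): ranges are `step` dollars wide and everything
    at or above `top` goes into a final open-ended range (high=None).
    """
    if step <= 0:
        raise ValueError("step must be positive")
    brackets: List[Tuple[int, Optional[int]]] = []
    low = 0
    while low < top:
        brackets.append((low, low + step))
        low += step
    brackets.append((top, None))  # open-ended top bracket
    return brackets
```

Each `(low, high)` pair would become one form query; with the default arguments the first pair is `(0, 25000)` and the last is `(200000, None)`.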

Add columns indicating the year and month to each scraped row.
Also add the department to each row; it appears as a title in the results rather than as a column.
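One way to attach year, month, and department to each scraped row, sketched in Python. It assumes (hypothetically) that the scraped table has already been split into rows of cell strings and that a department title shows up as a one-cell row just before that department's employees; the `header` column names are placeholders, not the site's actual columns.

```python
from typing import Dict, Iterable, List, Optional


def annotate_rows(rows: Iterable[List[str]], year: int, month: int,
                  header: List[str]) -> List[Dict[str, object]]:
    """Attach year, month and department to each scraped salary row.

    Assumption (hypothetical): a department title is a one-cell row that
    precedes that department's employee rows. Blank rows are skipped.
    """
    out: List[Dict[str, object]] = []
    department: Optional[str] = None  # stays None until a title row is seen
    for row in rows:
        if not any(cell.strip() for cell in row):
            continue  # skip empty rows
        if len(row) == 1:
            department = row[0].strip()  # title row: remember the department
            continue
        record: Dict[str, object] = dict(zip(header, row))
        record.update(year=year, month=month, department=department)
        out.append(record)
    return out
```

For example, `annotate_rows(rows, 2017, 11, ["name", "title", "salary"])` tags every employee row with the most recent department title above it.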

@ChrisMuir
Contributor

FYI, the service at the link in your post is unavailable (see screenshot). No idea if it's just temporarily down, but I figured I'd document it.

[Screenshot: penn_state_website]

@soodoku
Member Author

soodoku commented Dec 6, 2017

Temporary outage. I can see it.

@ChrisMuir
Contributor

ChrisMuir commented Dec 7, 2017

I manually pulled the latest PDFs (from 2017-11-15) and wrote a script that reads them all in, extracts the data, merges it into a single data frame, and writes it to CSV. As a next step, I will write a scraper to automate pulling the PDFs for all time periods, and then try applying the aggregation script to all of them.

I pushed the 2017-11-15 data (the raw PDFs and a 7z archive of the output data frame) and the aggregation script to the repo.
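The combine-and-write step described above can be sketched with Python's standard `csv` module. It is not the aggregation script mentioned in the comment; it assumes PDF extraction has already produced one list of records (dicts) per PDF, and it simply concatenates them, taking the union of column names so that a PDF with an extra column does not break the output.

```python
import csv
from typing import Dict, Iterable, List, TextIO


def write_combined_csv(tables: Iterable[List[Dict[str, object]]], out: TextIO) -> int:
    """Concatenate per-PDF record lists and write them as one CSV to `out`.

    Returns the number of data rows written (header excluded).
    """
    rows: List[Dict[str, object]] = []
    fieldnames: List[str] = []  # union of column names, in first-seen order
    for table in tables:
        for record in table:
            for key in record:
                if key not in fieldnames:
                    fieldnames.append(key)
            rows.append(record)
    # restval="" fills any column that a given record lacks
    writer = csv.DictWriter(out, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return len(rows)
```

Usage would look like `with open("combined.csv", "w", newline="", encoding="utf-8") as fh: write_combined_csv(per_pdf_tables, fh)`, where both the file name and `per_pdf_tables` are hypothetical.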

@soodoku
Member Author

soodoku commented Dec 7, 2017

Nice, man! Really cool! :-)

@soodoku
Member Author

soodoku commented Dec 23, 2017

Hey @ChrisMuir, should we close this issue?

@ChrisMuir
Contributor

Ah, I haven't completed the next step yet (writing a script to pull all of the PDFs from the website). We have a plan in place, but I don't know whether that's enough to close this issue or whether you want to wait until all of the PA work is done. It's up to you.

@soodoku
Member Author

soodoku commented Dec 23, 2017

Thanks, man! Let's wait.
