Aniruddhsinh03/JobSearcher

JobSearcher

This is a Scrapy project to scrape job information from https://newyork.craigslist.org/d/automotive-services/search/aos.

This project is only meant for educational purposes.

Element Selection

Job Listing Website

Image of Website

Extract Job Urls

Image of jobUrls

Extract Date

Image of Date

Extract Thumbs

Image of Thumbs

Extract Job Posting Body

Image of JobPostingBody

Extracted data

This project extracts job postings, each with its date, link, text, compensation, type, images, and address fields. The extracted data looks like this sample:

    {
      "date": "2020-06-08 02:00",
      "link": "https://newyork.craigslist.org/stn/aos/d/staten-island-mobile-auto-body-repair/7137824388.html",
      "text": "mobile auto body repair 60% off shop price",
      "compensation": null,
      "type": null,
      "images": [
        "https://images.craigslist.org/00W0W_dX9Njz7JLwE_0t20t2_600x450.jpg",
        "https://images.craigslist.org/00p0p_85S5teL5luk_0t20t2_600x450.jpg",
        "https://images.craigslist.org/01010_51Z32ESXNL7_0t20t2_600x450.jpg",
        "https://images.craigslist.org/00b0b_81REHtuVZWo_0t20t2_600x450.jpg"
      ],
      "address": [
        "\n        ",
        "\nmobile auto body repair we come to you same day service all jobs big or small satisfaction guaranteed 60% shop price      r            ubber & plastic bumper repair all size dents rust holes scuff marks scratches faded paint repair fiber glass repair         headlight restoration color matching & blending part replacement & installation call or text for free estimate 917.454.4453"
      ]
    }
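
The "address" field in the sample arrives as raw text fragments with leading newlines and long runs of spaces. A minimal post-processing sketch for tidying such an item is shown here; the helper names `clean_fragments` and `clean_item` are hypothetical and not part of this project.

```python
import re


def clean_fragments(fragments):
    """Collapse whitespace runs and drop fragments that end up empty."""
    cleaned = []
    for fragment in fragments:
        # Replace any run of spaces/newlines/tabs with a single space.
        text = re.sub(r"\s+", " ", fragment).strip()
        if text:
            cleaned.append(text)
    return cleaned


def clean_item(item):
    """Return a copy of a scraped item with "address" joined into one string."""
    result = dict(item)  # shallow copy, the original item is left untouched
    result["address"] = " ".join(clean_fragments(item.get("address") or []))
    return result
```

Calling `clean_item` on the sample above would keep every other field as-is and replace the two-element "address" list with a single whitespace-normalized string.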

Spiders

This project contains one spider, which you can list using the list command:

$ scrapy list
jonSearcherSpider

The spider extracts data from the job listing page, then follows each job link and extracts the posting details as well.

Running the spider

You can run a spider using the scrapy crawl command, such as:

$ scrapy crawl jonSearcherSpider

If you want to save the scraped data to a file, you can pass the -o option:

$ scrapy crawl jonSearcherSpider -o output.json
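
Once the crawl has written output.json, the items can be read back with the standard library. The sketch below assumes the file holds a single JSON array of items shaped like the sample above; `load_items` and `summarize` are hypothetical helper names.

```python
import json


def load_items(path):
    """Read the JSON array that the crawl wrote to `path`."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def summarize(items):
    """Count the items and how many of them carry images or compensation."""
    return {
        "count": len(items),
        "with_images": sum(1 for item in items if item.get("images")),
        "with_compensation": sum(
            1 for item in items if item.get("compensation") is not None
        ),
    }


if __name__ == "__main__":
    print(summarize(load_items("output.json")))
```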