[Source Request]: Lane Cove Municipality, NSW, Australia #2268

Open
dpupkov opened this issue Jul 15, 2024 · 0 comments
Labels
source request Request to add a new source

Municipality / Region

Lane Cove Municipality, NSW, Australia

Collection Calendar Webpage

https://www.lanecove.nsw.gov.au/Services/Waste-and-Recycling/Waste-Collection-Calendar

Example Address

17 Moore ST LANE COVE WEST, 2066

Collection Data Format

As html on a webpage

Additional Information

It looks like two requests are needed.
The first is an HTTP GET request that looks up the address:
https://www.lanecove.nsw.gov.au/api/v1/myarea/search?keywords=17%20Moore%20ST%20LANE%20COVE%20WEST
The response is JSON:

{
    "Items": [
        {
            "Id": "c827fdc6-bce5-4791-b11e-c1a831071646",
            "AddressSingleLine": "17 Moore ST LANE COVE WEST, 2066",
            "MunicipalSubdivision": null,
            "Distance": 0,
            "Score": 17.309006,
            "LatLon": null
        }
    ],
    "Offset": 0,
    "Limit": 10,
    "Total": 1
}
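
As a rough sketch of this first step, the snippet below builds the search URL and pulls the location ID out of a parsed response. The helper names `build_search_url` and `first_location_id` are hypothetical, and taking the first entry of `Items` is an assumption; a real source might want to compare `AddressSingleLine` or `Score` when several addresses match. No network request is made here.

```python
import json
from urllib.parse import quote, urlencode

# Endpoint taken from the request observed above.
SEARCH_URL = "https://www.lanecove.nsw.gov.au/api/v1/myarea/search"


def build_search_url(address: str) -> str:
    """Return the address-search URL with spaces encoded as %20."""
    query = urlencode({"keywords": address}, quote_via=quote)
    return f"{SEARCH_URL}?{query}"


def first_location_id(payload: dict) -> str:
    """Return the Id of the first match (assumption: first match is the right one)."""
    items = payload.get("Items") or []
    if not items:
        raise ValueError("address search returned no matches")
    return items[0]["Id"]


# Trimmed copy of the response shown above.
sample = json.loads("""
{
    "Items": [
        {
            "Id": "c827fdc6-bce5-4791-b11e-c1a831071646",
            "AddressSingleLine": "17 Moore ST LANE COVE WEST, 2066"
        }
    ],
    "Total": 1
}
""")

print(build_search_url("17 Moore ST LANE COVE WEST"))
print(first_location_id(sample))
```

The URL could then be fetched with any HTTP client and the decoded JSON passed to `first_location_id`.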

The second request retrieves the schedule for the location ID returned by the first. It is an HTTP GET request to https://www.lanecove.nsw.gov.au/ocapi/Public/myarea/wasteservices?geolocationid=c827fdc6-bce5-4791-b11e-c1a831071646&ocsvclang=en-AU&pageLink=/$b9015858-988c-48a4-9473-7c193df083e4$/Services/Waste-and-Recycling/Waste-Collection-Calendar (the pageLink parameter is potentially optional). The response is JSON:

{"success":true,"responseContent":"\r\n      \u003cdiv class=\"module-widget waste-services-widget\"\u003e\r\n\u003ch2 class=\"sub-title\"\u003eWaste Collection\u003c/h2\u003e\r\n\u003cdiv class=\"grid waste-services-results results-4\"\u003e\r\n    \r\n      \u003cdiv class=\"col-xs-12 col-m-6 waste-services-result regular-service general-waste date-precise item-0\"\u003e\r\n\u003carticle\u003e\r\n  \u003ch3\u003eGeneral Waste\u003c/h3\u003e\r\n\u003cdiv class=\"service-details\"\u003e\r\n   \u003cdiv class=\"note\"\u003eCollected weekly. Place bin on verge on night before collection.\u003c/div\u003e\r\n   \r\n   \u003cdiv class=\"next-service\"\u003e\r\n     Fri 19/7/2024\r\n   \u003c/div\u003e\r\n\u003c/div\u003e\r\n\u003c/article\u003e\r\n\u003c/div\u003e\r\n    \r\n      \u003cdiv class=\"col-xs-12 col-m-6 waste-services-result regular-service green-waste date-precise item-1\"\u003e\r\n\u003carticle\u003e\r\n  \u003ch3\u003eGreen Waste\u003c/h3\u003e\r\n\u003cdiv class=\"service-details\"\u003e\r\n   \r\n   \r\n   \u003cdiv class=\"next-service\"\u003e\r\n     Fri 26/7/2024\r\n   \u003c/div\u003e\r\n\u003c/div\u003e\r\n\u003c/article\u003e\r\n\u003c/div\u003e\r\n    \r\n      \u003cdiv class=\"col-xs-12 col-m-6 waste-services-result regular-service recycling date-precise item-2\"\u003e\r\n\u003carticle\u003e\r\n  \u003ch3\u003eContainer Recycling\u003c/h3\u003e\r\n\u003cdiv class=\"service-details\"\u003e\r\n   \r\n   \r\n   \u003cdiv class=\"next-service\"\u003e\r\n     Fri 19/7/2024\r\n   \u003c/div\u003e\r\n\u003c/div\u003e\r\n\u003c/article\u003e\r\n\u003c/div\u003e\r\n    \r\n      \u003cdiv class=\"col-xs-12 col-m-6 waste-services-result regular-service paper-cardboard-recycling date-precise item-3\"\u003e\r\n\u003carticle\u003e\r\n  \u003ch3\u003ePaper and Cardboard Recycling\u003c/h3\u003e\r\n\u003cdiv class=\"service-details\"\u003e\r\n   \r\n   \r\n   \u003cdiv class=\"next-service\"\u003e\r\n     Fri 26/7/2024\r\n   
\u003c/div\u003e\r\n\u003c/div\u003e\r\n\u003c/article\u003e\r\n\u003c/div\u003e\r\n    \r\n      \u003c/div\u003e\r\n\u003c/div\u003e\r\n    "}
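
A possible sketch of the second step: build the schedule URL from a location ID and unwrap the HTML from the JSON envelope. The names `build_schedule_url` and `extract_html` are hypothetical. Leaving out `pageLink` is an assumption based on the guess above that it is optional, and it has not been verified against the live endpoint.

```python
from urllib.parse import urlencode

# Endpoint taken from the request observed above.
SCHEDULE_URL = "https://www.lanecove.nsw.gov.au/ocapi/Public/myarea/wasteservices"


def build_schedule_url(location_id: str, lang: str = "en-AU") -> str:
    """Return the schedule URL; pageLink is omitted (assumed optional)."""
    query = urlencode({"geolocationid": location_id, "ocsvclang": lang})
    return f"{SCHEDULE_URL}?{query}"


def extract_html(payload: dict) -> str:
    """Return the HTML fragment from the envelope, or raise if the call failed."""
    if not payload.get("success"):
        raise ValueError("waste services request was not successful")
    return payload["responseContent"]


print(build_schedule_url("c827fdc6-bce5-4791-b11e-c1a831071646"))
```

Once the response body is decoded with a JSON parser, the `\u003c` and `\u003e` escapes become ordinary `<` and `>`, so `extract_html` returns plain HTML ready for an HTML parser.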

Something like this could work to parse the response:

import json
from bs4 import BeautifulSoup

# The JSON response. It is a raw string kept on a single line so that the
# \r\n, \" and \uXXXX escapes reach json.loads() untouched; in a normal
# string Python would expand them first and the JSON would no longer parse.
json_response = r'''{
    "success":true,
    "responseContent":"\r\n      \u003cdiv class=\"module-widget waste-services-widget\"\u003e\r\n\u003ch2 class=\"sub-title\"\u003eWaste Collection\u003c/h2\u003e\r\n\u003cdiv class=\"grid waste-services-results results-4\"\u003e\r\n    \r\n      \u003cdiv class=\"col-xs-12 col-m-6 waste-services-result regular-service general-waste date-precise item-0\"\u003e\r\n\u003carticle\u003e\r\n  \u003ch3\u003eGeneral Waste\u003c/h3\u003e\r\n\u003cdiv class=\"service-details\"\u003e\r\n   \u003cdiv class=\"note\"\u003eCollected weekly. Place bin on verge on night before collection.\u003c/div\u003e\r\n   \r\n   \u003cdiv class=\"next-service\"\u003e\r\n     Fri 19/7/2024\r\n   \u003c/div\u003e\r\n\u003c/div\u003e\r\n\u003c/article\u003e\r\n\u003c/div\u003e\r\n    \r\n      \u003cdiv class=\"col-xs-12 col-m-6 waste-services-result regular-service green-waste date-precise item-1\"\u003e\r\n\u003carticle\u003e\r\n  \u003ch3\u003eGreen Waste\u003c/h3\u003e\r\n\u003cdiv class=\"service-details\"\u003e\r\n   \r\n   \u003cdiv class=\"next-service\"\u003e\r\n     Fri 26/7/2024\r\n   \u003c/div\u003e\r\n\u003c/div\u003e\r\n\u003c/article\u003e\r\n\u003c/div\u003e\r\n    \r\n      \u003cdiv class=\"col-xs-12 col-m-6 waste-services-result regular-service recycling date-precise item-2\"\u003e\r\n\u003carticle\u003e\r\n  \u003ch3\u003eContainer Recycling\u003c/h3\u003e\r\n\u003cdiv class=\"service-details\"\u003e\r\n   \r\n   \u003cdiv class=\"next-service\"\u003e\r\n     Fri 19/7/2024\r\n   \u003c/div\u003e\r\n\u003c/div\u003e\r\n\u003c/article\u003e\r\n\u003c/div\u003e\r\n    \r\n      \u003cdiv class=\"col-xs-12 col-m-6 waste-services-result regular-service paper-cardboard-recycling date-precise item-3\"\u003e\r\n\u003carticle\u003e\r\n  \u003ch3\u003ePaper and Cardboard Recycling\u003c/h3\u003e\r\n\u003cdiv class=\"service-details\"\u003e\r\n   \u003cdiv class=\"next-service\"\u003e\r\n     Fri 26/7/2024\r\n   \u003c/div\u003e\r\n\u003c/div\u003e\r\n\u003c/article\u003e\r\n\u003c/div\u003e\r\n    \r\n      \u003c/div\u003e\r\n\u003c/div\u003e\r\n    "
}'''

# Parse the JSON response
data = json.loads(json_response)
html_content = data["responseContent"]

# Parse the HTML content
soup = BeautifulSoup(html_content, 'html.parser')

# Extract the required information
services = {}
for article in soup.find_all('article'):
    service_name = article.h3.text.strip()
    next_service_date = article.find('div', class_='next-service').text.strip()
    services[service_name] = next_service_date

# Print the extracted information
for service, date in services.items():
    print(f"{service}: {date}")

# The expected output should be:
# General Waste: Fri 19/7/2024
# Green Waste: Fri 26/7/2024
# Container Recycling: Fri 19/7/2024
# Paper and Cardboard Recycling: Fri 26/7/2024
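
The extracted values are still display strings such as `Fri 19/7/2024`. A source would probably need real dates, so here is a small sketch of that conversion. The helper name `parse_next_service` is hypothetical, and it assumes the text always ends in a day/month/year token as in the sample response; the weekday prefix is dropped rather than parsed so the result does not depend on the system locale.

```python
from datetime import date, datetime


def parse_next_service(text: str) -> date:
    """Convert e.g. 'Fri 19/7/2024' to a date, ignoring the weekday prefix."""
    day_month_year = text.split()[-1]  # keep only the '19/7/2024' part
    return datetime.strptime(day_month_year, "%d/%m/%Y").date()


# Values taken from the sample response above.
services = {
    "General Waste": "Fri 19/7/2024",
    "Green Waste": "Fri 26/7/2024",
}

for name, text in services.items():
    print(f"{name}: {parse_next_service(text).isoformat()}")
```

If the site ever shows a non-date value in `next-service`, `strptime` raises `ValueError`, which a caller could catch to skip that entry.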
dpupkov added the "source request" label Jul 15, 2024