
Web application to predict enterococci bacteria risk levels along Galveston Island beaches using weather trends


david-garza/enterococci_prediction_analysis


Beach Bacteria Prediction Modeling Project

Problem

Texas beaches are a great place to relax and have fun, but the water can hold hidden dangers. Bacteria levels can exceed safe limits, which can force the state to close beaches, cost local businesses revenue, or, most severely, make swimmers sick.

The state currently tests and reports bacteria levels for several beaches and displays bacteria counts at TexasBeachWatch.com.

The problem is that testing occurs only weekly or biweekly, and there is a delay of at least three days between when a sample is taken and when the results are finalized and published. Bacteria levels may rise to unsafe levels during that delay.

Proposal

Using historical bacteria sample and weather records, we propose to train a regression model that estimates bacteria counts from weather information. If successful, the model could produce an estimate as soon as weather data is available, greatly reducing the delay between sampling and reporting to the public.
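As a minimal sketch of the proposed regression idea, assuming a single hypothetical weather feature (for example, daily rainfall) and a numeric bacteria target, ordinary least squares can be fit in closed form. The function names are hypothetical and this is an illustration only, not the project's model.

```python
from typing import Sequence, Tuple


def fit_simple_ols(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Fit y ~ intercept + slope * x by ordinary least squares (one feature)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Slope = covariance(x, y) / variance(x); undefined if x is constant.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must not be constant")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(intercept: float, slope: float, x_new: float) -> float:
    """Estimate the target for a new (hypothetical) weather reading."""
    return intercept + slope * x_new
```

In practice a library such as scikit-learn and several weather features would likely be used; the closed form here only makes the fitting step explicit.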

Data Sources

  • Beach Advisory and Closing On-line Notification historical bacteria levels example. (CSV download)

  • Historical Weather Data example. (CSV download)

Team Communications Protocols

Zoom Meeting

The team will meet weekly via Zoom on Mondays at 9:00 AM to map out a work plan and duties for the coming week.

Slack

Primary channel for real time communications between team members and instructional staff.

GitHub

The team will comment on pull requests and issues to create a record of work specific to changes in the repo.

Project Outline

ETL

ETL was performed on the CSV files listed in Data Sources above. All of the ETL work was done in Python using pandas, and the connection to the database was established through SQLAlchemy.
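A minimal sketch of that extract-transform-load flow follows. The column names (`Station ID`, `Sample Date`, `Result`) and sample rows are hypothetical placeholders rather than the actual BEACON export, and an in-memory SQLite connection stands in for the PostgreSQL database so the example runs on its own. `DataFrame.to_sql` also accepts a SQLAlchemy engine, which is how the project connected.

```python
import io
import sqlite3

import pandas as pd

# Hypothetical raw export; the real BEACON CSV has different and more columns.
RAW_SAMPLES = """Station ID,Sample Date,Result
GAL001,06/01/2019,35
GAL001,06/08/2019,
GAL002,06/01/2019,210
"""


def transform_samples(raw_csv: str) -> pd.DataFrame:
    """Extract a raw bacteria-sample CSV and transform it into a tidy table."""
    df = pd.read_csv(io.StringIO(raw_csv))
    # Normalize headers to snake_case so they are convenient SQL column names.
    df.columns = [c.strip().lower().replace(" ", "_") for c in df.columns]
    # Parse dates so samples can later be joined to daily weather records.
    df["sample_date"] = pd.to_datetime(df["sample_date"], format="%m/%d/%Y")
    # Coerce results to numbers; blank or non-numeric entries become NaN and are dropped.
    df["result"] = pd.to_numeric(df["result"], errors="coerce")
    df = df.dropna(subset=["result"])
    return df[["station_id", "sample_date", "result"]].reset_index(drop=True)


def load(df: pd.DataFrame, con, table: str) -> None:
    """Load a DataFrame into a SQL table, replacing any previous copy."""
    df.to_sql(table, con, if_exists="replace", index=False)


if __name__ == "__main__":
    con = sqlite3.connect(":memory:")
    load(transform_samples(RAW_SAMPLES), con, "samples")
    print(pd.read_sql_query("SELECT * FROM samples", con))
```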

Details of the ETL process can be found in this Jupyter Notebook.

Database

An instance of PostgreSQL on AWS holds the transformed CSV data. The database contains five tables: one for beach properties, one for bacteria sample records, and three holding historical data from three different weather stations. The tables were joined into a view in PostgreSQL that the machine learning model can access via SQLAlchemy.
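The join can be sketched as a SQL view. The schema below is a hypothetical simplification: it folds the three weather-station tables into a single `weather` table keyed by a station column, uses placeholder table, column, and row values, and runs on in-memory SQLite so it is self-contained. `CREATE VIEW ... AS SELECT ... JOIN ...` is standard SQL and works the same way in PostgreSQL.

```python
import sqlite3

# Hypothetical, simplified schema; the project's real tables and columns differ.
SCHEMA = """
CREATE TABLE beaches (beach_id TEXT PRIMARY KEY, name TEXT, weather_station TEXT);
CREATE TABLE samples (beach_id TEXT, sample_date TEXT, enterococci REAL);
CREATE TABLE weather (station TEXT, obs_date TEXT, precip_in REAL, max_temp_f REAL);

-- Pair each sample with its beach and the same-day reading from that beach's station.
CREATE VIEW sample_weather AS
SELECT s.beach_id, b.name, s.sample_date, s.enterococci, w.precip_in, w.max_temp_f
FROM samples AS s
JOIN beaches AS b ON b.beach_id = s.beach_id
JOIN weather AS w ON w.station = b.weather_station AND w.obs_date = s.sample_date;
"""


def build_demo_db() -> sqlite3.Connection:
    """Create the schema and insert a few placeholder rows."""
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    con.executemany("INSERT INTO beaches VALUES (?, ?, ?)",
                    [("B1", "Placeholder Beach", "WX1")])
    con.executemany("INSERT INTO samples VALUES (?, ?, ?)",
                    [("B1", "2019-06-01", 35.0),
                     ("B1", "2019-06-08", 210.0),
                     ("B1", "2019-06-15", 500.0)])  # no weather row for this date
    con.executemany("INSERT INTO weather VALUES (?, ?, ?, ?)",
                    [("WX1", "2019-06-01", 0.0, 91.0),
                     ("WX1", "2019-06-08", 1.2, 88.0)])
    return con
```

Because the view uses inner joins, a sample with no matching weather record for that date is left out. A model could then read the joined rows with `pandas.read_sql_query("SELECT * FROM sample_weather", engine)`, where `engine` would be a SQLAlchemy engine for the real database.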

Detailed information on the database can be found in the database folder.

Exploratory Analysis

Exploratory analysis was done using Tableau. The findings were used to enhance and narrow the machine learning options.

The Tableau story can be found here.

Machine Learning

The machine learning model evolved from a regression model into a classifier, based on feedback from model performance and on findings from the exploratory analysis.
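Moving from regression to classification amounts to binning each bacteria count into a risk level that serves as the classifier's target. A minimal sketch follows, assuming two hypothetical cutoffs of 35 and 104 per 100 mL chosen only for illustration; the project's actual risk bins may differ, and the function names are hypothetical.

```python
from bisect import bisect_right
from typing import List, Sequence

# Hypothetical cutoffs (enterococci per 100 mL) for illustration only;
# these are assumptions, not values taken from the project.
CUTOFFS = (35.0, 104.0)
LABELS = ("low", "medium", "high")


def risk_level(count: float, cutoffs: Sequence[float] = CUTOFFS,
               labels: Sequence[str] = LABELS) -> str:
    """Map a bacteria count to a discrete risk label.

    A count exactly equal to a cutoff falls into the higher level.
    """
    if len(labels) != len(cutoffs) + 1:
        raise ValueError("need exactly one more label than cutoffs")
    if count < 0:
        raise ValueError("count must be non-negative")
    return labels[bisect_right(cutoffs, count)]


def to_class_targets(counts: Sequence[float]) -> List[str]:
    """Convert a sequence of counts into classifier target labels."""
    return [risk_level(c) for c in counts]
```

The resulting labels can be passed as the target to any classifier; keeping the cutoffs as parameters makes it easy to try different bin boundaries.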

A detailed write up of the machine learning model evolution can be found here.

GSlides

The slide presentation can be found here.

Interactive Website

Beach Bacteria Risk Prediction