Predicting future COVID-19 cases in different regions of Singapore using a Graph Neural Network (GNN) model

GNN Model Used

To predict future COVID-19 cases in Singapore, I replicated an existing graph structure and GNN model created by five final-year undergraduate students at the University of Illinois (https://github.com/Inerix/gnn_covid19_spread) to predict future COVID-19 cases in different states of the United States and Canada. I chose it because it captured the general trend in each state quite well, albeit with some volatility.

Graph Structure

The nodes of the graph represent the GRC (Group Representation Constituency) regions of Singapore. An edge exists between two nodes if the two GRCs are neighbours.
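
As a rough illustration of this structure, the sketch below builds an undirected adjacency map with the standard library only; it is a stand-in for the NetworkX graph the project creates, not the project's own code. The region names and neighbour pairs are hypothetical placeholders, not real GRC boundaries.

```python
from collections import defaultdict


def build_adjacency(neighbour_pairs):
    """Return an undirected adjacency map {node: set of neighbouring nodes}."""
    adjacency = defaultdict(set)
    for a, b in neighbour_pairs:
        if a == b:
            continue  # a region is not its own neighbour
        adjacency[a].add(b)
        adjacency[b].add(a)  # edges are undirected, so store both directions
    return dict(adjacency)


# Hypothetical placeholder regions; the real graph has 29 GRC nodes.
pairs = [
    ("GRC_A", "GRC_B"),
    ("GRC_B", "GRC_C"),
    ("GRC_A", "GRC_C"),
    ("GRC_C", "GRC_D"),
]
graph = build_adjacency(pairs)
```

Storing each edge in both directions keeps neighbour lookups symmetric, which matches the "two GRCs are neighbours" definition above.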

GNN Model's Input and Output

Input Tensor: The input of the model is a 29x10x1 tensor. There are 29 GRCs and therefore 29 nodes, and each node has a 10x1 feature vector that holds the total number of cases on each of the previous 10 days.
Output Vector: The output of the model is a 29x1 vector that gives the predicted total number of cases on the 11th day for each GRC.
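
The input and output shapes can be illustrated with a sliding window over per-region case counts. This is a minimal sketch in plain Python lists, assuming one count per region per day; the function name `make_windows` and the toy data are hypothetical, and the project itself stores these as tensors.

```python
def make_windows(cases_by_region, window=10):
    """Build (feature, label) pairs from per-region daily case counts.

    cases_by_region: list of equal-length lists, one per region (node).
    Each feature has shape [regions][window][1]; each label has shape
    [regions][1] and holds the count on the day right after the window.
    """
    n_days = len(cases_by_region[0])
    if any(len(series) != n_days for series in cases_by_region):
        raise ValueError("every region needs the same number of days")
    examples = []
    for start in range(n_days - window):
        feature = [
            [[series[start + t]] for t in range(window)]
            for series in cases_by_region
        ]
        label = [[series[start + window]] for series in cases_by_region]
        examples.append((feature, label))
    return examples


# Toy data: 3 regions and 12 days (the project uses 29 regions).
toy = [[region * 100 + day for day in range(12)] for region in range(3)]
examples = make_windows(toy, window=10)
```

With 12 days and a window of 10, this yields two examples: days 1 to 10 predicting day 11, and days 2 to 11 predicting day 12.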

GNN Model Layers

The GNN model used has 3 layers:

A modified Graph Convolutional Network (GCN) layer: it takes the 29x10x1 input tensor described above, performs the graph convolution on it, and then passes the aggregated data into a stack of 3 Long Short-Term Memory (LSTM) networks
A standard GCN layer
A final standard GCN layer which outputs the total number of cases for the 11th day for each GRC
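
To make the graph convolution step concrete, here is a minimal sketch of one textbook GCN propagation step, ReLU(Â H W), where Â is the adjacency matrix with self-loops and symmetric degree normalisation. It is written in plain Python rather than PyTorch, and it assumes the standard layers follow this textbook rule; the modified first layer and its LSTM stack are not reproduced here.

```python
import math


def gcn_layer(adjacency, features, weight):
    """One graph-convolution step: ReLU(A_hat @ features @ weight).

    adjacency: {node index: set of neighbour indices}, undirected.
    features:  per-node feature vectors (n_nodes x in_dim nested lists).
    weight:    in_dim x out_dim matrix as nested lists.
    """
    n = len(features)
    # Add self-loops so each node keeps part of its own signal.
    neighbours = [set(adjacency.get(i, ())) | {i} for i in range(n)]
    degree = [len(nb) for nb in neighbours]
    in_dim, out_dim = len(weight), len(weight[0])
    output = []
    for i in range(n):
        aggregated = [0.0] * in_dim
        for j in neighbours[i]:
            # Symmetric normalisation: 1 / sqrt(deg_i * deg_j).
            coeff = 1.0 / math.sqrt(degree[i] * degree[j])
            for k in range(in_dim):
                aggregated[k] += coeff * features[j][k]
        row = [
            sum(aggregated[k] * weight[k][m] for k in range(in_dim))
            for m in range(out_dim)
        ]
        output.append([max(0.0, value) for value in row])  # ReLU
    return output


# Toy path graph 0 - 1 - 2 with 2 input features and 1 output feature.
adjacency = {0: {1}, 1: {0, 2}, 2: {1}}
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weight = [[1.0], [1.0]]
hidden = gcn_layer(adjacency, features, weight)
```

Each node's output mixes its own features with those of its neighbours, which is how case counts in adjacent GRCs can influence a region's prediction.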

Technologies Used in the Project

Programming Language: Python
Version Control System: Git
Web Scraping Library: BeautifulSoup
(String) Pattern Matching Library: RegEx
PDF Table Parser Library: Camelot
Data Manipulation and Analysis Libraries: Pandas and NumPy
Geocoding Library: Google Maps API
Library for analysis of planar geometric objects: Shapely
Graph Creation Library: NetworkX
Machine Learning Libraries (to create/train GNN model): PyTorch and DGL (Deep Graph Library)
Plotting Library: Matplotlib

Project Pipeline

The project runs as a sequence of stages, and each stage is carried out by running one Python script:

get_moh_urls_and_save_text_files.py: Downloads all the daily COVID-19 press release reports from the Ministry of Health website (https://www.moh.gov.sg/covid-19/past-updates) and saves them as text files

call_parser.py: Creates a Table from the downloaded text files. The Table contains all relevant information about every published COVID-19 case in Singapore (as found in the daily COVID-19 press release reports). Each row entry has the following fields: “Case Number”, “Date of Confirmation”, “Nationality”, “Age”, “Gender”, “Cluster” and “Links” (every row entry must have a non-null “Case Number”)

aggregate_raw_cases.py: Aggregates all row entries in the Table that have the same “Case Number”, so that each row entry in the Table has a unique “Case Number”.

add_latlong_columns.py: Adds “Latitude” and “Longitude” columns to the Table. For every row entry, the “Latitude” and “Longitude” values are retrieved from the Google Maps API, based on the “Cluster” address

add_grc_columns.py: Adds a “GRC” column to the Table. For every row entry, the “GRC” value is calculated from the “Latitude” and “Longitude” values

create_number_of_cases_table.py: Reformats the Table so that the column labels are the “Dates of Confirmation” (in ascending order) and the row labels are the “GRC” regions. For example, the number of cases in GRC X on Date Y is found at (GRC X, Date Y) in the Table

create_grc_graph.py: Creates and saves all the 29x10x1 Feature Tensors (i.e. the training examples) and the corresponding 29x1 Label Vectors (i.e. the training labels) needed to train the GNN model, using the reformatted Table. It also creates and saves the graph in which the nodes represent GRCs and an edge exists between two nodes if the two GRCs are neighbours

train_model.py: Trains the GNN model (on the saved graph) with the saved Feature Tensors and Label Vectors

evaluate_trained_model.py: Plots the Ground Truth (actual number of cases) against the Prediction (number of cases predicted by the model) for the training and testing datasets. The r^2 score (i.e. the coefficient of determination) of the two plots can be used as a measure of the training and testing accuracy respectively. For every GRC, it also plots Ground Truth and Prediction against the Dates on a single graph, which shows how well the model predicts the number of cases for individual GRCs.
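
The two table-shaping stages above (aggregate_raw_cases.py and create_number_of_cases_table.py) can be sketched with the standard library, as a stand-in for the Pandas operations the project uses. The merge rule (the first non-empty value per field wins) is an assumption, since the description does not say how duplicate rows are combined, and the rows below are hypothetical placeholders. For brevity the sample rows already carry a “GRC” value, which in the real pipeline is added by a later stage.

```python
from collections import defaultdict


def aggregate_by_case_number(rows):
    """Merge rows sharing a "Case Number" (assumed rule: first non-empty value wins)."""
    merged = {}
    for row in rows:
        target = merged.setdefault(row["Case Number"], {})
        for key, value in row.items():
            if value not in (None, "") and key not in target:
                target[key] = value
    return list(merged.values())


def count_cases(rows):
    """Return {grc: {date: number of cases}}, with dates in ascending order."""
    table = defaultdict(lambda: defaultdict(int))
    for row in rows:
        table[row["GRC"]][row["Date of Confirmation"]] += 1
    # ISO-formatted date strings sort chronologically.
    return {grc: dict(sorted(by_date.items())) for grc, by_date in table.items()}


# Hypothetical rows; the GRC labels, dates and ages are placeholders.
raw = [
    {"Case Number": 1, "Date of Confirmation": "2020-04-01", "GRC": "GRC_A", "Age": None},
    {"Case Number": 1, "Date of Confirmation": "2020-04-01", "GRC": "GRC_A", "Age": 34},
    {"Case Number": 2, "Date of Confirmation": "2020-04-01", "GRC": "GRC_A", "Age": 51},
    {"Case Number": 3, "Date of Confirmation": "2020-04-02", "GRC": "GRC_B", "Age": 27},
]
cases = aggregate_by_case_number(raw)
table = count_cases(cases)
```

Looking up `table["GRC_A"]["2020-04-01"]` then plays the role of reading the (GRC X, Date Y) cell of the reformatted Table.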

Evaluation

Note: The evaluation results mentioned below have been taken from the branch 'train_gnn_with_weekly_cases'

Plotting the ground truth against the predictions gave an r^2 score of ~0.898 on the testing dataset and ~0.998 on the training dataset (both to 3 significant figures); these results are in the directory 'model files/accuracy files'. Furthermore, in the plots for most GRCs (in the directory 'model files/grc plots'), the predictions closely match the general trend of the actual number of cases. However, just as in the project done by the University of Illinois students, the predictions are volatile.
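
For reference, the r^2 score can be computed as shown in the sketch below. It assumes the usual definition, 1 - SS_res / SS_tot (the same one scikit-learn's `r2_score` uses); the text above does not say exactly how the project computed it. The numbers are made up for illustration and are not project results.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and the same length")
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual error
    ss_tot = sum((t - mean) ** 2 for t in y_true)  # spread of the true values
    if ss_tot == 0:
        raise ValueError("r^2 is undefined when all true values are equal")
    return 1.0 - ss_res / ss_tot


# Made-up numbers for illustration only.
truth = [3.0, 5.0, 7.0, 9.0]
prediction = [2.5, 5.5, 7.0, 9.5]
score = r2_score(truth, prediction)
```

A score of 1 means the predictions match the ground truth exactly, so a gap between the training and testing scores indicates how much accuracy is lost on unseen data.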

Repository: aang114/COVID-19-in-Singapore
