Steps to Run the Program :
- Run scrapper.py to fetch the data from Reddit and store it in CSV files.
- Next, run Visualise.py to generate the social network graph from edges.csv.
- Next, run analyse.py to compute the network metrics.
Libraries Required :
pandas, NetworkX, PRAW, matplotlib
Instructor: Prof. Kai Shu
This project is a two-member group assignment aimed at crawling social media data and analyzing the collected data. The project is divided into three main steps: Data Collection, Data Visualization, and Network Measures Calculation.
- Crawl social media data.
- Process and analyze the extracted data.
- Report findings and methodology.
- Data Collection: Select a social media platform and crawl data to create a social network with 100-500 nodes.
- Data Visualization: Use graph analysis software to visualize the network.
- Network Measures Calculation: Calculate and plot various network measures.
Choose a social media platform and figure out how to crawl data from it. You may need API credentials for this purpose. Document your process in the report.
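For Reddit, the crawling step can be sketched with PRAW as below. This is a minimal illustration, not the project's actual scrapper.py: the function name `collect_edges`, the "commenter → post author" edge definition, and the "hot posts" sampling strategy are all assumptions, and the credential values are placeholders you must replace with your own API keys.

```python
import csv

def collect_edges(reddit, subreddit_name, post_limit=25):
    """Build (commenter, post_author) edges from a subreddit's hot posts.

    `reddit` is a praw.Reddit instance (or anything with the same interface).
    """
    edges = []
    for submission in reddit.subreddit(subreddit_name).hot(limit=post_limit):
        # Flatten the comment tree; drop "load more comments" stubs.
        submission.comments.replace_more(limit=0)
        for comment in submission.comments.list():
            # Skip deleted accounts (author is None).
            if comment.author and submission.author:
                edges.append((str(comment.author), str(submission.author)))
    return edges

def save_edges(edges, path="edges.csv"):
    """Write the edge list with a header row, ready for Visualise.py."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["source", "target"])
        writer.writerows(edges)

# Example usage (requires Reddit API credentials; values are placeholders):
#   import praw
#   reddit = praw.Reddit(client_id="YOUR_ID", client_secret="YOUR_SECRET",
#                        user_agent="course-network-analysis")
#   save_edges(collect_edges(reddit, "python"))
```

Keeping the PRAW call behind a small helper like this also makes the crawler easy to test with fake objects, without hitting the API.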
Visualize the fetched data as a graph using packages like NetworkX, SNAP, Gephi, NodeXL, or graph-tool. Include snapshots in your report.
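With NetworkX the visualization step might look like the following sketch. It assumes edges.csv has `source`/`target` header columns (as in the collection sketch); the function name, layout choice, and output filename are illustrative, not the actual Visualise.py.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so no display is required
import matplotlib.pyplot as plt
import networkx as nx
import pandas as pd

def draw_network(edges_csv="edges.csv", out_png="network.png"):
    """Load an edge list from CSV and save a spring-layout snapshot."""
    df = pd.read_csv(edges_csv)
    G = nx.from_pandas_edgelist(df, source="source", target="target")
    pos = nx.spring_layout(G, seed=42)  # fixed seed => reproducible layout
    nx.draw(G, pos, node_size=30, width=0.5, with_labels=False)
    plt.savefig(out_png, dpi=150)
    plt.close()
    return G
```

The saved PNG can be included directly as a report snapshot; for a 100–500 node network, Gephi or NodeXL give more interactive control, but a spring layout is usually enough to show community structure.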
Calculate network measures like Degree Distribution, Clustering Coefficient, PageRank, Diameter, Closeness, and Betweenness. Choose appropriate measures to plot and analyze, and include these in your report.
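The listed measures map directly onto NetworkX one-liners; a sketch of how analyse.py might collect them (the function name and returned dictionary layout are assumptions):

```python
import networkx as nx

def network_measures(G):
    """Compute the measures named above for a connected graph G."""
    return {
        # Sorted list of node degrees; plot as a histogram for the report.
        "degree_distribution": sorted(d for _, d in G.degree()),
        "avg_clustering": nx.average_clustering(G),
        "pagerank": nx.pagerank(G),           # dict: node -> score
        "diameter": nx.diameter(G),           # requires a connected graph
        "closeness": nx.closeness_centrality(G),
        "betweenness": nx.betweenness_centrality(G),
    }
```

Note that `nx.diameter` raises an error on a disconnected graph, which crawled networks often are; in that case compute it on the largest connected component (`G.subgraph(max(nx.connected_components(G), key=len))`).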