A cloud-based, data-driven project that aims to establish a secure and efficient data management and analysis system for structured and semi-structured YouTube video data, focusing on video categories and trending metrics.

AbhignaSowgandhika/TubePulse-CloudComputing-Project-Fall2023

TubePulse: Harnessing YouTube Insights Using the Cloud

This project securely manages, streamlines, and analyzes structured and semi-structured YouTube video data, organized by video category and trending metrics.

Overview

This project explores YouTube data to uncover insights into popular topics, user engagement, and sentiment trends in comments. Using YouTube data together with AWS Glue, Lambda, Athena, and QuickSight, it covers data collection, processing, and analysis.

Architecture

The main components are:

  • Data Storage - Three S3 buckets hold the data: Raw, Cleansed, and Analytics
  • Processing - An ETL job and Lambda functions transform JSON and CSV data into Apache Parquet format
  • Joining - The cleansed data is joined using Amazon Athena queries and an ETL job
  • Metadata - AWS Glue crawlers catalog the dataset schema
  • Analysis - Dashboards are built in Amazon QuickSight
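The three-bucket layout can be sketched as a small path helper. This is only an illustration: the bucket names, the dataset folder names, and the `region=` partition folder are assumptions and do not come from the project itself.

```python
from typing import Optional

# Hypothetical bucket names; the real names are not given in this README.
BUCKETS = {
    "raw": "tubepulse-raw",
    "cleansed": "tubepulse-cleansed",
    "analytics": "tubepulse-analytics",
}


def s3_uri(stage: str, dataset: str, filename: str,
           region: Optional[str] = None) -> str:
    """Build an s3:// URI for a file in one of the three stages.

    When `region` is given, a Hive-style partition folder (region=xx) is
    added, a layout that Glue crawlers can register as a partition column.
    """
    if stage not in BUCKETS:
        raise ValueError(f"unknown stage: {stage!r}")
    parts = [dataset]
    if region is not None:
        parts.append(f"region={region}")
    parts.append(filename)
    return f"s3://{BUCKETS[stage]}/" + "/".join(parts)
```

For example, `s3_uri("raw", "statistics", "CAvideos.csv", region="ca")` yields `s3://tubepulse-raw/statistics/region=ca/CAvideos.csv`. Keeping each stage in its own bucket lets access policies and lifecycle rules differ between raw and curated data.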

Implementation Details

Key steps:

  • Configure the S3 buckets
  • Develop the ETL logic in Python
  • Deploy a Glue ETL job for CSV-to-Parquet conversion
  • Deploy an AWS Lambda function for JSON-to-Parquet conversion
  • Join the data using an ETL job
  • Create dashboards using QuickSight
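The transformation and joining steps above can be illustrated locally with the standard library alone. The sketch below is not the project's Lambda or Glue code: it assumes the category JSON has an `items` array whose entries carry `id` and `snippet.title`, and that the video CSV has a `category_id` column. It stops before the Parquet write, which would need a library such as pyarrow.

```python
import csv
import io
import json
from typing import Dict, List


def flatten_categories(raw_json: str) -> List[Dict]:
    """Turn the nested category JSON into flat rows (assumed field names)."""
    payload = json.loads(raw_json)
    rows = []
    for item in payload.get("items", []):
        snippet = item.get("snippet", {})
        rows.append({
            "category_id": int(item["id"]),  # cast so it matches the CSV side
            "category_title": snippet.get("title"),
        })
    return rows


def join_videos_with_categories(videos_csv: str,
                                categories: List[Dict]) -> List[Dict]:
    """Inner-join video rows with category rows on category_id."""
    titles = {c["category_id"]: c["category_title"] for c in categories}
    joined = []
    for row in csv.DictReader(io.StringIO(videos_csv)):
        category_id = int(row["category_id"])
        if category_id in titles:  # rows without a matching category are dropped
            joined.append({**row,
                           "category_id": category_id,
                           "category_title": titles[category_id]})
    return joined


# Tiny made-up inputs, for illustration only.
sample_json = ('{"items": [{"id": "10", "snippet": {"title": "Music"}},'
               ' {"id": "24", "snippet": {"title": "Entertainment"}}]}')
sample_csv = "video_id,category_id,views\nabc,10,100\nxyz,99,5\n"

joined_rows = join_videos_with_categories(
    sample_csv, flatten_categories(sample_json))
```

Casting the category id to an integer on both sides matters: the id is a string in the JSON here, and a type mismatch between the two datasets would make the join return no rows.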

Getting Started

Prerequisites:

  • An AWS environment with access keys
  • Python and the AWS CLI/SDK
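A quick way to check these prerequisites is sketched below. The function name `missing_prerequisites` is hypothetical, and the check is deliberately narrow: it only looks for the `aws` executable on the PATH and for the `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` environment variables. Credentials stored in a shared credentials file would not be detected.

```python
import os
import shutil
from typing import Callable, List, Mapping, Optional


def missing_prerequisites(
    env: Mapping[str, str] = os.environ,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return a list of human-readable problems; an empty list means OK.

    `env` and `which` are injectable so the check can be tested without
    touching the real environment.
    """
    missing = []
    if which("aws") is None:
        missing.append("AWS CLI (aws) not found on PATH")
    for name in ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY"):
        if not env.get(name):
            missing.append(f"environment variable {name} is not set")
    return missing
```

Calling `missing_prerequisites()` with no arguments inspects the current shell environment; printing the returned list gives a short checklist of what still needs to be set up.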

Next Steps

Future enhancements:

  • Scheduled ETL Jobs
  • Enhanced Dashboard Experiences
  • Continuous Monitoring and Optimization
  • Machine Learning Integration
