Downloading financial news podcasts and storing them using a data pipeline

Nana-322/Podcast-Download-with-Airflow-Data-Pipeline-

Overview

The goal is to build an Airflow data pipeline that downloads episodes of Marketplace, a financial news podcast.

The pipeline is divided into four tasks:

  • Task 1: Download and parse the podcast XML feed
  • Task 2: Create a SQLite database to store podcast metadata
  • Task 3: Store the podcast metadata in that database
  • Task 4: Download the podcast audio files using the Python requests library
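The four tasks could be sketched as plain Python functions along the following lines. This is a minimal illustration, not the repository's actual code: the function names, the `episodes` table schema, and the sample feed are all assumptions, and in the real pipeline each function would be wrapped as an Airflow task and the feed would be fetched from the podcast's RSS URL.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample feed standing in for the downloaded podcast XML.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example Podcast</title>
    <item>
      <title>Episode 1</title>
      <link>https://example.com/episodes/1</link>
      <pubDate>Mon, 01 Jan 2024 00:00:00 GMT</pubDate>
      <enclosure url="https://example.com/audio/1.mp3" type="audio/mpeg"/>
    </item>
    <item>
      <title>Episode 2</title>
      <link>https://example.com/episodes/2</link>
      <pubDate>Tue, 02 Jan 2024 00:00:00 GMT</pubDate>
      <enclosure url="https://example.com/audio/2.mp3" type="audio/mpeg"/>
    </item>
  </channel>
</rss>
"""


def parse_feed(xml_text):
    """Task 1 (parsing half): turn RSS XML into a list of episode dicts."""
    root = ET.fromstring(xml_text)
    episodes = []
    for item in root.iter("item"):
        enclosure = item.find("enclosure")
        episodes.append({
            "link": item.findtext("link", default=""),
            "title": item.findtext("title", default=""),
            "published": item.findtext("pubDate", default=""),
            "audio_url": enclosure.get("url", "") if enclosure is not None else "",
        })
    return episodes


def create_table(conn):
    """Task 2: create the metadata table (assumed schema) if it is missing."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS episodes ("
        "link TEXT PRIMARY KEY, title TEXT, published TEXT, audio_url TEXT)"
    )
    conn.commit()


def store_episodes(conn, episodes):
    """Task 3: insert episodes not stored yet; return how many were added."""
    before = conn.total_changes
    conn.executemany(
        "INSERT OR IGNORE INTO episodes (link, title, published, audio_url) "
        "VALUES (:link, :title, :published, :audio_url)",
        episodes,
    )
    conn.commit()
    return conn.total_changes - before


def download_audio(audio_url, dest_path):
    """Task 4: download one audio file with the requests library."""
    # Imported here so the parsing and storage steps run without requests.
    import requests

    response = requests.get(audio_url, timeout=30)
    response.raise_for_status()
    with open(dest_path, "wb") as audio_file:
        audio_file.write(response.content)


conn = sqlite3.connect(":memory:")  # the real pipeline would use a file on disk
create_table(conn)
episodes = parse_feed(SAMPLE_FEED)
added = store_episodes(conn, episodes)
```

Using the episode link as the primary key together with `INSERT OR IGNORE` means that re-running the pipeline on the same feed does not duplicate rows, which suits a scheduled DAG that polls the feed repeatedly.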

Future Work

  • Extend the pipeline to automatically transcribe downloaded podcast episodes using Vosk and pydub, then summarize the transcripts
  • Host downloaded podcast episodes and their summaries on an HTML page
