Unleash the power of real-time COVID-19 Tweet analysis with this microservices ETL architecture. Built on Spring Cloud Stream and Apache Kafka, this project is your gateway to ingesting, processing, and visualizing tweets about the pandemic.
I developed this project to practice what I learned in the Udemy course Apache Kafka Series - Learn Apache Kafka for Beginners v2.
The tech stack includes Spring Boot 2.3.2, Apache Maven 3.6.3, Spring Cloud Stream, Elasticsearch, Kibana, and more, all running as Docker containers.
Explore the project, visualize COVID-19 tweet data, and analyze sentiment and trending terms with ease. For more detailed information, check out our Medium article.
Thank you for visiting the Covid Tweets ETL Architecture GitHub repository! Stay informed and empowered with real-time Tweet analysis. 📈🦠🔍
- covid-tweets-api: Spring Boot web application (Java) that lets you retrieve and view the processed tweets through a REST API or STOMP over WebSocket.
- covid-tweets-collector: Spring Boot web application (Java) that listens for new messages on the `processed-tweets` topic in Kafka and saves them in Elasticsearch.
- covid-tweets-ingest: Spring Boot web application (Java) that implements a Twitter client. It receives the latest tweets about COVID-19, creates the data model associated with each tweet, and posts it to the `tweets-ingest` topic in Kafka.
- covid-tweets-processor: Spring Boot web application (Java) that listens for new messages on the `tweets-ingest` topic in Kafka and analyzes the text through an analysis service built on Stanford CoreNLP.
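To make the flow between the services easier to picture, here is a minimal plain-Java sketch of the ingest, process, and collect stages. It is not the project's code: the class names, the fields of the data model, and the keyword-based `naiveSentiment` method (a stand-in for the Stanford CoreNLP analysis) are all assumptions made for illustration, and an in-memory list stands in for Elasticsearch. It also assumes that the processor publishes its result to `processed-tweets`, which the descriptions above imply but do not state. The stages are written as `java.util.function` types because that is the shape Spring Cloud Stream's functional programming model binds to Kafka topics; whether this project uses that model or the annotation-based one is not stated here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.Consumer;
import java.util.function.Function;

// Illustrative sketch only: every name below is hypothetical.
class TweetPipelineSketch {

    // Assumed shape of a message on the tweets-ingest topic.
    static final class IngestedTweet {
        final String id;
        final String text;

        IngestedTweet(String id, String text) {
            this.id = id;
            this.text = text;
        }
    }

    // Assumed shape of a message on the processed-tweets topic.
    static final class ProcessedTweet {
        final String id;
        final String text;
        final String sentiment;

        ProcessedTweet(String id, String text, String sentiment) {
            this.id = id;
            this.text = text;
            this.sentiment = sentiment;
        }
    }

    // Stand-in for the CoreNLP analysis: a naive keyword check, not real NLP.
    static String naiveSentiment(String text) {
        String lower = text.toLowerCase(Locale.ROOT);
        if (lower.contains("recover") || lower.contains("good")) {
            return "POSITIVE";
        }
        if (lower.contains("death") || lower.contains("worse")) {
            return "NEGATIVE";
        }
        return "NEUTRAL";
    }

    // Processor stage: consumes an ingested tweet, emits an enriched one.
    static Function<IngestedTweet, ProcessedTweet> processor() {
        return tweet -> new ProcessedTweet(tweet.id, tweet.text, naiveSentiment(tweet.text));
    }

    // Collector stage: stores each processed tweet (a list replaces Elasticsearch here).
    static Consumer<ProcessedTweet> collector(List<ProcessedTweet> store) {
        return store::add;
    }

    public static void main(String[] args) {
        List<ProcessedTweet> store = new ArrayList<>();
        ProcessedTweet processed =
                processor().apply(new IngestedTweet("1", "Good news: more patients recover"));
        collector(store).accept(processed);
        System.out.println(store.get(0).id + " -> " + store.get(0).sentiment);
    }
}
```

In a Spring Cloud Stream application, functions like `processor()` and `collector(...)` would typically be exposed as beans and bound to topics through configuration, so the business logic stays independent of Kafka itself.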
- Spring Boot 2.3.2 / Apache Maven 3.6.3.
- Spring Cloud Stream (to build highly scalable event-driven applications connected with shared messaging systems)
- Spring Cloud Starter Stream Kafka.
- Lombok.
- Twitter4J Stream.
- MapStruct.
- Elasticsearch OSS 7.6.2.
- Spring Boot Starter Data Elasticsearch.
- Kibana OSS 7.6.2.
- Spring Boot Starter Web.
- Springdoc OpenAPI UI.
- Spring Boot Starter Websocket.
- Stanford CoreNLP.
The available tasks are listed below (run `rake --tasks` to print them):
Task | Description |
---|---|
check_deployment_file_task | Check Deployment File |
check_docker_task | Check Docker and Docker Compose Task |
cleaning_environment_task | Cleaning Environment Task |
deploy | Deploys the Covid Tweets Architecture and laun... |
login | Authenticating with existing credentials |
start | Start Containers |
status | Status Containers |
stop | Stop Containers |
undeploy | UnDeploy Covid Tweets Architecture |
To start the platform, make sure you have Ruby installed, go to the root directory of the project, and run the `rake deploy` task. This task carries out a series of preliminary checks, discards images and volumes that are no longer needed, downloads all the images, and initializes the containers.
Also make sure to define your own credentials in the `twitter4j.properties` file:
oauth.consumerKey=YOUR_CONSUMER_KEY
oauth.consumerSecret=YOUR_CONSUMER_SECRET
oauth.accessToken=YOUR_ACCESS_TOKEN
oauth.accessTokenSecret=YOUR_ACCESS_TOKEN_SECRET
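
Since a forgotten placeholder is an easy mistake to make before deploying, a small pre-flight check can help. The sketch below is not part of the project: the class name `Twitter4jCredentialsCheck` and the default file path are assumptions (pass the real location of `twitter4j.properties` as the first argument). It only uses `java.util.Properties` to load the file and reports any of the four keys above that are absent, blank, or still set to a `YOUR_...` placeholder.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

// Hypothetical helper, not part of the repository.
class Twitter4jCredentialsCheck {

    // The four OAuth keys listed in twitter4j.properties above.
    static final List<String> REQUIRED_KEYS = Arrays.asList(
            "oauth.consumerKey",
            "oauth.consumerSecret",
            "oauth.accessToken",
            "oauth.accessTokenSecret");

    // Returns the keys that are absent, blank, or still hold a YOUR_... placeholder.
    static List<String> missingKeys(Properties props) {
        List<String> missing = new ArrayList<>();
        for (String key : REQUIRED_KEYS) {
            String value = props.getProperty(key, "").trim();
            if (value.isEmpty() || value.startsWith("YOUR_")) {
                missing.add(key);
            }
        }
        return missing;
    }

    public static void main(String[] args) throws IOException {
        // Assumed default path; the actual location depends on the project layout.
        String path = args.length > 0 ? args[0] : "twitter4j.properties";
        Properties props = new Properties();
        try (InputStream in = Files.newInputStream(Paths.get(path))) {
            props.load(in);
        }
        List<String> missing = missingKeys(props);
        if (missing.isEmpty()) {
            System.out.println("All Twitter credentials are set.");
        } else {
            System.out.println("Missing or placeholder credentials: " + missing);
        }
    }
}
```

The check only confirms that values are present; it does not verify that the credentials are accepted by Twitter.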