Skip to content

build-on-aws/real-time-streaming-analytics-application-using-apache-kafka

Real-time streaming analytics application using Apache Kafka

In today’s fast-paced digital world, real-time streaming analytics has become increasingly important as companies require to understand what customers, application and products are doing right now and react promptly. For example, companies need to analyse data in real-time to continuously monitor an application to ensure high service uptime and personalize promotional offers and product recommendations to customers.

This repository contains the sample code to build a real-time streaming analytics application using Apache Kafka on AWS. It forms the basis of the following tutorial:

The source code builds Apache Flink application and creates the following resources on AWS using the provided AWS CloudFormation:

  • Amazon OpenSearch Cluster
  • Amazon ECS Cluster and a Task definition
  • Amazon Kinesis Data Analytics streaming application
  • Amazon EC2 Instance to serve as Kafka client
  • Amazon EC2 Instance to serve as Nginx proxy
  • Security groups and AWS IAM roles

Requirements

⚙️ Building the Apache Flink application

Follow the following steps to build the Apache Flink application:

  1. Install the following dependencies:
  1. Move to the right directory:
cd flink-clickstream-consumer
  1. Build the Apache Flink application file:
mvn clean package

💡 A file named flink-clickstream-consumer/target/ClickStreamProcessor-1.0.jar will be created. This is your Apache Flink application.

🌩 Deploying into AWS

Once you created built the Flink application, you can deploy the deploy the required resources on AWS to start building your real-time streaming analytics application. The required steps are detailed in the following tutorial:

Security

See CONTRIBUTING for more information.

License

This project is licensed under the MIT-0 License. See the LICENSE file.