Infinitely scalable, event-driven, language-agnostic orchestration and scheduling platform to manage millions of workflows declaratively in code.
Flink CDC is a streaming data integration tool for change data capture (CDC).
BitSail is a distributed, high-performance data integration engine that supports batch, streaming, and incremental scenarios. BitSail is widely used to synchronize hundreds of trillions of records every day.
SeaTunnel is a distributed, high-performance data integration platform for the synchronization and transformation of massive data (offline & real-time).
Compiler for streaming data pipelines and data microservices with configurable engines.
Kafka Streams made easy with a YAML file
A cron replacement for scheduling complex data workflows.
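What distinguishes such schedulers from plain cron is dependency awareness: a task runs only after its upstream tasks finish. A minimal sketch of that idea, using Python's standard-library `graphlib` and a hypothetical five-task workflow (the task names are illustrative, not from any listed project):

```python
from graphlib import TopologicalSorter

# Hypothetical workflow DAG: each task maps to the set of tasks
# it depends on. A dependency-aware scheduler must order execution
# so every task runs only after all of its upstreams complete.
workflow = {
    "extract": set(),
    "clean": {"extract"},
    "aggregate": {"clean"},
    "load": {"aggregate"},
    "report": {"load", "clean"},
}

def run_order(dag):
    """Return one valid execution order for the workflow DAG."""
    return list(TopologicalSorter(dag).static_order())

print(run_order(workflow))
```

`TopologicalSorter` also raises `CycleError` on circular dependencies, which is exactly the validation a workflow scheduler needs before accepting a new DAG.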
Data pipeline using Apache Kafka, Apache Spark and HDFS
Toolkit for describing data transformation pipelines by composing simple reusable components.
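The "reusable components" pattern usually means each stage is a small function from record to record, and a pipeline is just their composition. A minimal sketch under that assumption (the `parse` and `normalize` stages are hypothetical, not taken from the toolkit above):

```python
from functools import reduce

# Hypothetical stage 1: parse a "key=value" string into a record.
def parse(record: str) -> dict:
    key, _, value = record.partition("=")
    return {"key": key, "value": value}

# Hypothetical stage 2: normalize the key field.
def normalize(record: dict) -> dict:
    return {**record, "key": record["key"].strip().lower()}

def compose(*stages):
    """Compose stages left to right into a single pipeline function."""
    return lambda record: reduce(lambda acc, stage: stage(acc), stages, record)

pipeline = compose(parse, normalize)
print(pipeline(" User =42"))  # → {'key': 'user', 'value': '42'}
```

Because stages share one plain-function interface, they can be reordered, reused across pipelines, and unit-tested in isolation.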
An end-to-end data pipeline with Kafka and Spark Streaming integration.
LinkedIn's previous generation Kafka to HDFS pipeline.
⚡ Data Integration | DataLink is a lightweight data integration framework built on top of DataX, Spark, and Flink.
Data-processing and common libraries used in main project, all available under Apache 2.0
Real-time data streaming pipeline.
CS502Capstone
Efficiently captures real-time Wikimedia data, like a newsroom for Wikipedia changes. Uses microservices, Kafka, and Spring Boot for reliability and scalability. Ideal for research and analysis.
A Kafka-Elasticsearch pipeline for storing and analyzing server health logs.
Real-time metrics calculation pipeline using Kafka, Elasticsearch, and Kibana.
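The core of a metrics pipeline like this is the aggregation step between the Kafka source and the Elasticsearch sink: bucketing timestamped events into tumbling windows and reducing each bucket. A self-contained sketch of just that step (the event shape and 10-second window size are assumptions for illustration; Kafka consumption and the Elasticsearch sink are omitted):

```python
from collections import defaultdict

WINDOW_SECONDS = 10  # assumed tumbling-window size

def windowed_average(events):
    """Bucket (timestamp, latency_ms) events into tumbling windows
    and return the average latency per window start time."""
    buckets = defaultdict(list)
    for ts, latency_ms in events:
        buckets[ts - ts % WINDOW_SECONDS].append(latency_ms)
    return {start: sum(v) / len(v) for start, v in sorted(buckets.items())}

events = [(100, 20.0), (104, 40.0), (112, 10.0)]
print(windowed_average(events))  # → {100: 30.0, 110: 10.0}
```

In the real pipeline each window's result would be indexed into Elasticsearch as one document, which is what Kibana then visualizes.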