data-pipeline

Here are 646 public repositories matching this topic...

airbytehq / airbyte

The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.

Updated Jul 3, 2024
Python

apache / flink-cdc

Flink CDC is a streaming data integration tool

mysql real-time kafka etl postgresql distributed batch data-integration schema-evolution elt flink cdc data-pipeline change-data-capture paimon

Updated Jul 3, 2024
Java

snowplow / snowplow

The leader in Next-Generation Customer Data Infrastructure

data analytics snowplow data-collection data-pipeline product-analytics marketing-analytics snowplow-pipeline snowplow-events

Updated May 31, 2024
Scala

GoogleCloudPlatform / data-science-on-gcp

Source code accompanying the book Data Science on the Google Cloud Platform by Valliappa Lakshmanan (O'Reilly, 2017)

data-science machine-learning data-visualization data-engineering cloud-computing data-analysis data-processing data-pipeline

Updated May 1, 2024
Jupyter Notebook

adilkhash / Data-Engineering-HowTo

A list of useful resources to learn Data Engineering from scratch

distributed-systems scala cloud-providers data-engineering data-pipeline

Updated Jun 19, 2024

kestra-io / kestra

Infinitely scalable, event-driven, language-agnostic orchestration and scheduling platform to manage millions of workflows declaratively in code.

workflow data pipeline etl workflow-engine scheduler orchestration data-engineering data-integration elt data-pipeline data-quality low-code data-orchestration data-orchestrator reverse-etl

Updated Jul 3, 2024
Java

bytedance / bitsail

BitSail is a distributed, high-performance data integration engine that supports batch, streaming, and incremental scenarios. BitSail is widely used to synchronize hundreds of trillions of data every day.

real-time big-data high-performance data-lake data-integration flink data-synchronization data-pipeline

Updated Jan 1, 2024
Java

rudderlabs / rudder-server

Privacy and Security focused Segment-alternative, in Golang and React

Updated Jul 3, 2024
Go

apache / seatunnel-web

SeaTunnel is a distributed, high-performance data integration platform for the synchronization and transformation of massive data (offline & real-time).

real-time offline high-performance apache data-integration sql-engine data-pipeline etl-framework seatunnel

Updated Jun 18, 2024
Java

superstreamlabs / memphis

Memphis.dev is a highly scalable and effortless data streaming platform

kubernetes golang data enrichment microservices schema-registry message-bus message-queue data-engineering data-pipeline message-broker data-streaming data-stream-processing messaging-queue

Updated May 27, 2024
Go

damklis / DataEngineeringProject

Example end to end data engineering project.

python redis elasticsearch airflow kafka big-data mongodb scraping django-rest-framework s3 data-engineering minio kafka-connect hacktoberfest data-pipeline debezium

Updated Dec 8, 2022
Python

pydoit / doit

task management & automation tool

python workflow data-science build-automation task-runner build-tool build-system workflow-management hacktoberfest data-pipeline workflow-automation

Updated May 29, 2024
Python

infoslack / awesome-kafka

A list about Apache Kafka

infrastructure kafka apache-spark stream-processing apache-kafka kafka-streams data-processing data-pipeline streaming-data

Updated Feb 9, 2024

elementary-data / elementary

The dbt-native data observability solution for data & analytics engineers. Monitor your data pipelines in minutes. Available as self-hosted or cloud service with premium features.

bigquery snowflake data-warehouse dataops data-analysis redshift dbt data-pipelines data-pipeline lineage data-governance data-lineage analytics-engineer dbt-packages data-observability data-reliability dbt-artifacts

Updated Jul 3, 2024
HTML

reugn / go-streams

A lightweight stream processing library for Go

Updated Jun 22, 2024
Go

whylabs / whylogs

An open-source data logging library for machine learning models and data pipelines. 📚 Provides visibility into data quality & model performance over time. 🛡️ Supports privacy-preserving data collection, ensuring safety & robustness. 📈

python data-science machine-learning analytics logging constraints dataset dataops data-pipeline data-quality calculate-statistics data-constraints mlops model-performance ml-pipelines ai-pipelines approximate-statistics statistical-properties

Updated Jul 2, 2024
Jupyter Notebook

AgnostiqHQ / covalent

Pythonic tool for orchestrating machine-learning/high performance/quantum-computing workflows in heterogeneous compute environments.

Updated Jun 18, 2024
Python

sspaeti-com / practical-data-engineering

Practical Data Engineering: A Hands-On Real-Estate Project Guide

data-engineering data-pipeline dagster

Updated Mar 18, 2024
Jupyter Notebook

confluentinc / learn-kafka-courses

Learn the basics of Apache Kafka® from leaders in the Kafka community with these video courses covering the Kafka ecosystem and hands-on exercises.

kafka stream-processing apache-kafka kafka-streams data-pipelines data-pipeline ksqldb

Updated Oct 3, 2023
Shell

airscholar / e2e-data-engineering

An end-to-end data engineering pipeline that orchestrates data ingestion, processing, and storage using Apache Airflow, Python, Apache Kafka, Apache Zookeeper, Apache Spark, and Cassandra. All components are containerized with Docker for easy deployment and scalability.

docker big-data cassandra apache-spark data-storage postgresql data-engineering apache-kafka data-processing data-pipeline real-time-analytics containerization apache-zookeeper apache-airflow etl-pipeline

Updated Oct 5, 2023
Python
