Programs from the Big Data Analytics training conducted at Army Institute of Technology, Pune, in September 2024.
Updated Sep 5, 2024 - Java
Apache Spark is an open-source, distributed, general-purpose cluster-computing framework. It provides an interface for programming entire clusters with implicit data parallelism and fault tolerance.
REST web service for true real-time scoring (<1 ms) of Scikit-Learn, R and Apache Spark models
This repo contains examples of high-throughput ingestion using Apache Spark and Apache Iceberg. These examples cover IoT and CDC scenarios using best practices. The code can be deployed into any Spark-compatible engine, such as Amazon EMR Serverless or AWS Glue. A fully local developer environment is also provided.
High-performance data loader for OSM planet dumps. Transforms an OpenStreetMap world/region PBF dump into a lossless PostGIS pgsnapshot OSM schema representation partitioned by H3 regions, and/or into ArrowIPC/Parquet dumps
A connector for Apache Spark to access Exasol
Apache Spark Capstone project
REST API for Apache Spark on K8S or YARN
A Java-based project that extracts news articles from large .sgm files, processes them, and loads them into a MongoDB database. It includes an Apache Spark job for word frequency analysis directly from .sgm files, and a sentiment analysis implementation using a Bag-of-Words model in Java.
Data transformation framework for ETL processing with SQL-like syntax and GIS extensions, based on Apache Spark
Apache Spark-based 'Dist' utility to supplement the Data Cooker ETL tool
A suite of benchmark applications for distributed data stream processing systems
A comprehensive repository showcasing data engineering solutions using Apache Spark. Includes ETL pipelines, data transformations, and performance optimization techniques. Ideal for those looking to enhance their skills in big data processing with Spark.
The Proxima platform.
Student projects in the Big Data field.
Project for the course of Cloud Computing (2024)
OAuth2/OIDC authentication filter for Apache Spark Apps/History UIs
A real-time cryptocurrency data streaming pipeline.
Apache Spark-based framework for analyzing A/B experiments
Common library for Exasol Apache Spark-based connectors
Created by Matei Zaharia
Released May 26, 2014