apache-spark

Apache Spark is an open-source, distributed, general-purpose cluster-computing framework. It provides an interface for programming entire clusters with implicit data parallelism and fault tolerance.

Here are 219 public repositories matching this topic...

High-performance data loader for OSM planet dumps. Transforms an OpenStreetMap world or region PBF dump into a lossless PostGIS pgsnapshot OSM schema representation partitioned by H3 regions, and/or into ArrowIPC/Parquet dumps.

  • Updated Sep 2, 2024
  • Java

A Java-based project that extracts news articles from a large .sgm file, processes them, and loads them into a MongoDB database. It includes an Apache Spark job for word frequency analysis directly from .sgm files, and a sentiment analysis implementation using a Bag-of-Words model in Java.

  • Updated Aug 22, 2024
  • Java

Created by Matei Zaharia

Released May 26, 2014

Followers
422 followers
Repository
apache/spark
Website
spark.apache.org
Wikipedia
Wikipedia

Related Topics

hadoop
scala