Interactive and Reactive Data Science using Scala and Spark.
Apache Spark is an open-source, distributed, general-purpose cluster-computing framework. It provides an interface for programming entire clusters with implicit data parallelism and fault tolerance.
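As a minimal sketch of that programming model (PySpark here; the app name and data are purely illustrative), a driver hands a collection to the cluster and lets Spark parallelize the work across partitions:

```python
# Minimal sketch of Spark's implicit data parallelism: the driver describes
# the computation, Spark partitions the data and runs it on the executors,
# recomputing lost partitions from lineage if a node fails.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("parallelism-sketch").getOrCreate()
sc = spark.sparkContext

numbers = sc.parallelize(range(1_000_000), numSlices=8)  # split into 8 partitions
total = numbers.map(lambda x: x * x).reduce(lambda a, b: a + b)
print(total)

spark.stop()
```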
Includes notes on using Apache Spark in general, notes on using Spark for physics, how to run TPC-DS on PySpark, how to create histograms with Spark, tools for performance testing CPUs, Jupyter notebook examples for Spark, and examples for Oracle and other database systems.
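For the histogram topic mentioned above, a hedged sketch using the RDD API (the data here is randomly generated, not taken from any of the linked notes):

```python
# Build a histogram with RDD.histogram: given a bucket count, Spark computes
# evenly spaced bucket boundaries and the per-bucket counts in one pass.
from pyspark.sql import SparkSession
import random

spark = SparkSession.builder.appName("histogram-sketch").getOrCreate()
sc = spark.sparkContext

values = sc.parallelize([random.gauss(0.0, 1.0) for _ in range(10_000)])
boundaries, counts = values.histogram(20)  # (21 boundaries, 20 counts)

for lo, hi, n in zip(boundaries[:-1], boundaries[1:], counts):
    print("[%7.3f, %7.3f): %d" % (lo, hi, n))

spark.stop()
```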
Notes about Spark Streaming in Apache Spark
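A minimal streaming sketch, using Structured Streaming's built-in `rate` source so it runs without any external system (this is illustrative and not taken from the notes above):

```python
# Count events per 10-second window from the synthetic "rate" source and
# print the running counts to the console.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("streaming-sketch").getOrCreate()

stream = spark.readStream.format("rate").option("rowsPerSecond", 10).load()

counts = stream.groupBy(F.window("timestamp", "10 seconds")).count()

query = (counts.writeStream
         .outputMode("complete")   # emit the full updated aggregate each trigger
         .format("console")
         .start())
query.awaitTermination()
```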
Ansible roles to install a Spark Standalone cluster (HDFS/Spark/Jupyter Notebook) or an Ambari-based Spark cluster
PySpark Tutorial for Beginners - Practical Examples in Jupyter Notebook with Spark version 3.4.1. The tutorial covers various topics like Spark Introduction, Spark Installation, Spark RDD Transformations and Actions, Spark DataFrame, Spark SQL, and more. It is completely free on YouTube and is beginner-friendly without any prerequisites.
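Illustrative one-liners for the topics that tutorial lists (not excerpts from it): an RDD transformation versus an action, a DataFrame, and a Spark SQL query:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("tutorial-sketch").getOrCreate()
sc = spark.sparkContext

# RDD: transformations are lazy, actions trigger execution.
rdd = sc.parallelize([1, 2, 3, 4, 5])
evens = rdd.filter(lambda x: x % 2 == 0)  # transformation - nothing runs yet
print(evens.collect())                    # action - returns [2, 4]

# DataFrame and Spark SQL over the same small dataset.
df = spark.createDataFrame([(1, "a"), (2, "b"), (3, "c")], ["id", "label"])
df.createOrReplaceTempView("items")
spark.sql("SELECT label FROM items WHERE id > 1").show()

spark.stop()
```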
JupyterLab extension that enables monitoring launched Apache Spark jobs from within a notebook
Implementation of Spark code in Jupyter notebooks. Topics include RDDs and DataFrames, exploratory data analysis (EDA), handling multiple DataFrames, visualization, and machine learning.
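A small EDA-style sketch along those lines (column names and values are made up): summary statistics on one DataFrame, then a join and aggregation across two:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("eda-sketch").getOrCreate()

users = spark.createDataFrame(
    [(1, "alice", 34), (2, "bob", 29)], ["user_id", "name", "age"])
orders = spark.createDataFrame(
    [(1, 19.99), (1, 5.50), (2, 42.00)], ["user_id", "amount"])

users.describe("age").show()   # quick summary statistics for EDA

# Handling multiple DataFrames: join, then total spend per user.
users.join(orders, "user_id").groupBy("name").sum("amount").show()

spark.stop()
```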
Apache Zeppelin notebooks for Recommendation Engines using Keras and Machine Learning on Apache Spark
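That repo pairs Keras with Spark; as a Spark-only illustration of a recommendation engine (not the repo's approach), here is a minimal collaborative-filtering sketch with Spark ML's ALS and made-up ratings:

```python
from pyspark.sql import SparkSession
from pyspark.ml.recommendation import ALS

spark = SparkSession.builder.appName("als-sketch").getOrCreate()

ratings = spark.createDataFrame(
    [(0, 10, 4.0), (0, 11, 1.0), (1, 10, 5.0), (1, 12, 3.0)],
    ["userId", "itemId", "rating"])

als = ALS(userCol="userId", itemCol="itemId", ratingCol="rating",
          rank=5, maxIter=5, coldStartStrategy="drop")
model = als.fit(ratings)

model.recommendForAllUsers(2).show(truncate=False)  # top-2 items per user

spark.stop()
```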
An image for running Jupyter notebooks and Apache Spark in the cloud on OpenShift
Tutorial for exploring FHIR data with Apache Spark in an interactive notebook
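FHIR resources are JSON documents, so one common starting point is loading a newline-delimited JSON export into a DataFrame; the file path below is hypothetical and not from the tutorial:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("fhir-sketch").getOrCreate()

# Hypothetical bulk-export file of Patient resources (NDJSON).
patients = spark.read.json("data/Patient.ndjson")
patients.printSchema()                       # inspect the inferred FHIR schema
patients.groupBy("gender").count().show()    # simple exploratory aggregate

spark.stop()
```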
PySpark notebook with Docker
Zeppelin notebook online
Notebook for the classes of 75-06 Organización de Datos (Data Organization) - FIUBA
Reusable Python classes that extend open-source PySpark capabilities. Implementation examples are available in the notebooks of the repo https://github.com/bennyaustin/synapse-dataplatform
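As a hypothetical example of the kind of reusable class such a library might expose (not taken from the linked repo), a small helper that standardizes DataFrame column names:

```python
from pyspark.sql import DataFrame

class ColumnNameCleaner:
    """Lower-cases column names and replaces spaces with underscores."""

    def transform(self, df: DataFrame) -> DataFrame:
        for old in df.columns:
            new = old.strip().lower().replace(" ", "_")
            if new != old:
                df = df.withColumnRenamed(old, new)
        return df

# Usage: cleaned = ColumnNameCleaner().transform(raw_df)
```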
PySpark notebooks to learn Apache Spark (WIP)
An implementation of Apache Spark (combined with PySpark and Jupyter Notebook) on top of a Hadoop cluster using Docker
Low-humidity day prediction from daily weather data using SparkSQL in a Jupyter notebook and a KNIME workspace
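A hedged sketch of the SparkSQL side of such an analysis; the schema, values, and 30% threshold are illustrative, not from the repo:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("humidity-sketch").getOrCreate()

weather = spark.createDataFrame(
    [("2023-05-01", 31.0), ("2023-05-02", 64.0), ("2023-05-03", 28.5)],
    ["date", "relative_humidity"])
weather.createOrReplaceTempView("daily_weather")

# Flag days whose relative humidity falls below an illustrative threshold.
spark.sql("""
    SELECT date,
           relative_humidity,
           relative_humidity < 30 AS low_humidity_day
    FROM daily_weather
""").show()

spark.stop()
```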
One-click deployment of notebooks - bringing notebooks to production
Template for Spark Data Science Projects
This is the final project I completed for my Big Data Expert Program at U-TAD in September 2017. It uses the following technologies: Apache Spark v2.2.0, Python v2.7.3, Jupyter Notebook (PySpark), HDFS, Hive, Cloudera Impala, Cloudera HUE, and Tableau.
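In a stack like that, Spark typically reaches the Hive warehouse by enabling Hive support on the session; a minimal sketch (the database and table names are hypothetical):

```python
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hive-sketch")
         .enableHiveSupport()    # use the Hive metastore for table lookups
         .getOrCreate())

spark.sql("SHOW DATABASES").show()
spark.sql("SELECT COUNT(*) FROM default.events").show()  # hypothetical table

spark.stop()
```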
Created by Matei Zaharia
Released May 26, 2014