JsonSpark

This package brings Python's simplicity and feel to PySpark when working with JSON files.

It is simple to use: if you already know Python, there is nothing extra to learn.

Installation

pip install jsonSpark

Import the package
import jsonSpark
Read the JSON file into a PySpark DataFrame
df = sql.read.json("filename", multiLine=True) # or read from an S3 bucket
Create a JsonSpark object.
df = jsonSpark(df)
Print the schema, if you wish.
df.printSchema()
Display the data
df.show()
Use it like a Python dictionary
df["key1"]["key2"]["key3"]["key4"].show()
To use PySpark functions, convert the object back to a PySpark DataFrame.
pysparkObject = df._toDF()
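
The dictionary-style access shown above can be pictured as building a dotted column path, which is how PySpark itself addresses nested struct fields. The snippet below is only a sketch of that idea in plain Python, not jsonSpark's actual implementation; the `KeyPath` class and its `column` method are hypothetical names introduced for illustration.

```python
class KeyPath:
    """Hypothetical helper: records chained [] lookups as a dotted path."""

    def __init__(self, parts=()):
        self.parts = tuple(parts)

    def __getitem__(self, key):
        # Each lookup returns a new KeyPath with one more component,
        # so the original path object is never modified.
        return KeyPath(self.parts + (key,))

    def column(self):
        # PySpark addresses nested struct fields with dot notation.
        return ".".join(self.parts)


path = KeyPath()["key1"]["key2"]["key3"]["key4"]
print(path.column())  # key1.key2.key3.key4
```

With a plain PySpark DataFrame, a path built this way could be passed to `select`, for example `pysparkObject.select(path.column()).show()`, assuming every level along the path is a struct field.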
