maximveksler/awesome-serialization

awesome-serialization

So many formats, so little bandwidth. See wesm's comment before making up your mind.

Contents

API

Serialization suitable for API and RPC networked services.

  • CSV - Comma Separated Values. Textual.
  • JSON - Lightweight document data-interchange format. Textual.
  • JSONL - Schemaless "multiple JSON documents in 1 file" container data format. Textual.
  • Thrift - Scalable code generation, schema evolution binary format. Binary.
  • Protocol Buffers - Google's data interchange format. Binary.
  • Message Pack - Efficient JSON-like binary serialization format. Binary.
  • bson - Binary schemaless JSON encoding. Binary.
  • XML - Extensible Markup Language. Genuinely Horrible. Textual.
  • Plist - Property List representation. Apple. Textual.
  • YAML - Indentation based data serialization standard. Textual.
  • TOML - Tom's Obvious, Minimal Language. Textual.
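The difference between a single JSON document and a JSONL container can be sketched with nothing but the Python standard library. This is a minimal illustration, not a reference implementation: the sample records and the helper names `dump_jsonl` and `load_jsonl` are hypothetical and not part of any library listed here.

```python
import io
import json

# Hypothetical sample records, invented for illustration only.
records = [
    {"id": 1, "format": "JSON", "kind": "textual"},
    {"id": 2, "format": "Thrift", "kind": "binary"},
]


def dump_jsonl(rows, fp):
    """Write one compact JSON document per line."""
    for row in rows:
        fp.write(json.dumps(row, separators=(",", ":")) + "\n")


def load_jsonl(fp):
    """Parse each non-empty line as an independent JSON document."""
    return [json.loads(line) for line in fp if line.strip()]


# Round-trip through an in-memory text buffer instead of a real file.
buffer = io.StringIO()
dump_jsonl(records, buffer)
jsonl_text = buffer.getvalue()

buffer.seek(0)
restored = load_jsonl(buffer)
```

Because every line is a complete document, a reader can process a JSONL file one record at a time, whereas a single JSON array is typically parsed as a whole.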

RPC

  • gRPC - A high-performance, open source universal RPC framework. Binary, OSI Layer 7.
  • RSocket - Application protocol providing Reactive Streams semantics. Binary, OSI Layer 5 (or 6).

Big Data

Serialization suitable for big data at-rest systems, from the Hadoop family of solutions.

  • Parquet - Columnar storage for Hadoop workloads. Binary.
  • FlatBuffers - Protocol Buffers-like format suitable for larger datasets; data can be read without an unpacking step. Binary.
  • ORC - Columnar storage for Hadoop workloads. Binary.
  • Avro - Schema embedded, dynamic rich data structures. Textual/Binary.
  • Ion - Row storage with skip scan parsing. Structured, schema embedded. Amazon. Textual/Binary.

Scientific

Large scale sparse arrays used in physics, mathematics and statistics research.

  • HDF5® - n-dimensional datasets, complex objects, with schema. Efficient I/O. Binary.
  • npy - NumPy arrays, cell sparse metadata. Binary.

Machine Learning

Serialization of deep learning networks and weights.

  • SavedModel - TensorFlow package, weights, graph, executable code. Binary.
  • GraphDef - TensorFlow graphs. Binary. Deprecated.
  • ONNX - The open standard for machine learning interoperability.
  • safetensors - Simple, safe way to store and distribute tensors.

Graph

Serialization formats for representing graph oriented data structures.

  • json-ld - JSON for Linking Data. Textual.
  • Turtle - Terse RDF Triple Language. Textual.
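As a rough sketch of how the two formats above relate, a JSON-LD document is ordinary JSON that uses reserved keys such as `@context` and `@id`, and the same statement can be written as a Turtle triple. The person, the `example.org` identifier and the hand-written mapping below are assumptions made for illustration. This is not a general JSON-LD processor, only the standard `json` module plus string formatting.

```python
import json

# Hypothetical JSON-LD document: "@context" maps the short key "name" to an IRI.
doc = {
    "@context": {"name": "http://schema.org/name"},
    "@id": "http://example.org/people/alice",
    "name": "Alice",
}

# JSON-LD is plain JSON on the wire, so the standard json module round-trips it.
text = json.dumps(doc, indent=2)
parsed = json.loads(text)

# Hand-written mapping of this one statement to a Turtle triple:
# <subject IRI> <predicate IRI> "literal" .
predicate = parsed["@context"]["name"]
turtle = '<{}> <{}> "{}" .'.format(parsed["@id"], predicate, parsed["name"])
```

For anything beyond a toy document, a dedicated RDF or JSON-LD library would be the safer choice, since context resolution and literal escaping are not handled here.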

Workflow

Language specific

Language native serialization formats for in-transit data (aka memory based "live" objects).

Dart

Python

  • pickle - RAM to disk serialization. Binary.
  • msgpack-python - MessagePack serializer implementation for Python.
  • srsly - Modern high-performance serialization utilities for Python.
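A minimal sketch of what "live object" serialization means in practice, using only the standard library `pickle` module. The sample dictionary is hypothetical. It deliberately mixes types (tuple, set, bytes) that JSON cannot represent natively.

```python
import pickle

# Hypothetical in-memory object with types that have no direct JSON equivalent.
live = {
    "name": "sample",
    "shape": (2, 3),        # tuple
    "tags": {"a", "b"},     # set
    "raw": b"\x00\x01",     # bytes
}

# Serialize to a bytes blob; the same bytes could be written to a file on disk.
blob = pickle.dumps(live, protocol=pickle.HIGHEST_PROTOCOL)

# Deserialize back into an equal, independent Python object.
restored = pickle.loads(blob)
```

Note that unpickling can execute arbitrary code, so `pickle.loads` should only be used on data from a trusted source. That makes pickle a poor fit for the API use cases listed earlier.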

Swift

Java

Academic

Research papers discussing types, category theory, benchmarks and co.

  • Type theory - studies types, which informally are attributes that objects can possess.
  • Category theory - General theory of functions. Axiomatic foundation for mathematics, as an alternative to set theory.

Contribute

Contributions welcome! Read the contribution guidelines first.

License

CC0

To the extent possible under law, Maxim Veksler has waived all copyright and related or neighboring rights to this work.

About

Data formats useful for API, Big Data, ML, Graph & co
