Order-Book

A high speed NASDAQ ITCH trade data parser and Limit Order Book that processes ~10.8 million messages per second (~92ns latency).

This project will serve as a playground for exploring C++ performance and efficiency ideas.

Dependencies

Boost.Pool

A memory pool library for C++.

apt-get install libboost-all-dev

sparsehash

std::unordered_map alternative.

glog

Logging library with debug-only logs/asserts (though I have my doubts).

apt-get install libgoogle-glog-dev

Build and Usage

Build with make rel
Run with ./order-book <NASDAQ_ITCH_50_file>

~~For benchmarking, comment out all DLOGs in src/order_book.cpp as DLOG_ASSERT seems to be enforced even with #define NDEBUG.~~ For benchmarking, the DLOG_ASSERT issue was recently resolved, just make sure you have the latest glog build. Otherwise, comment out all DLOG*s for benchmarking (:%s/DLOG/\/\/DLOG/ in vim).

Design

The Order Book follows a price/time priority (i.e. best price first, then earliest order).

Order books are stored in a hash map keyed by symbol
Levels (limit prices) are stored in a hash map keyed by limit price
Orders are stored in a hash map keyed by reference number

In order to maintain the price/time priority:

Levels are also stored in a sorted tree (so adding orders with new limits is O(logm) where m = # of limits). The tree provides access to Level 1 data in O(1).
Orders of the same limit price form a doubly linked list. This allows for time ordering where new orders are simply appended in O(1). The orders being doubly linked allows for O(1) deletion since the Order can just be referenced from the hash map.

Performance Considerations

ITCH Reader

The ITCH 5.0 data files are +10GB binary files.

to avoid excessive I/O, the files are read in large chunks at a time
Memory is only allocated once and reused when storing these chunks

The file reader itself processes +100 million messages per second and has very little overhead.

aanrv@debianssd:~/Documents/Order-Book$ ./order-book spec/12302019.NASDAQ_ITCH50  
Processing spec/12302019.NASDAQ_ITCH50
processed 268744780 messages (8251407909 bytes) in 2052 milliseconds

Order Book

The Order Book takes up the majority of the application's runtime.

The bulk of the operations are Adds and Deletes with Replaces at a distant 3rd, so this is where the optimizations were focused

aanrv@debianssd:~/Documents/Order-Book$ ./order-book spec/12302019.NASDAQ_ITCH50  
Processing spec/12302019.NASDAQ_ITCH50
Add: 118631456
Exc: 5822741
Cnl: 2787676
Del: 114360997
Rpl: 21639067

To avoid the overhead costs of allocating/deallocating orders hundreds of millions of times, Boost's memory pool (Boost.Pool) was used. Seems to have removed memory management as a major bottleneck.

Levels require a sorted tree so that L1 info is available in O(1), so adding an order with a new limit is O(logm). But # of limits < # of orders by far (average of ~5 orders per limit), and this doesn't seem to be a bottleneck based on profiling.

aanrv@debianssd:~/Documents/Order-Book$ ./order-book spec/12302019.NASDAQ_ITCH50  
Processing spec/12302019.NASDAQ_ITCH50
# Levels: 30113162
# Orders: 140270523

The bulk of the time was spent on hash map insertion/deletion operations. Replacing std::unordered_map with google::dense_hash_map yielded a >30% decrease in runtime. Tested on an i7-9750H 2.6 GHz laptop in VMWare:

aanrv@debianssd:~/Documents/Order-Book$ ./order-book spec/12302019.NASDAQ_ITCH50  
Using: std::unordered_map
Processing spec/12302019.NASDAQ_ITCH50
processed 268744780 messages (8251407909 bytes) in 216493 milliseconds

aanrv@debianssd:~/Documents/Order-Book$ ./order-book spec/12302019.NASDAQ_ITCH50
Using: google::dense_hash_map
Processing spec/12302019.NASDAQ_ITCH50
processed 268744780 messages (8251407909 bytes) in 148342 milliseconds

Current optimized measurements show a latency of ~92ns:

aanrv@debianvm:~/Documents/Order-Book$ ./order-book spec/12302019.NASDAQ_ITCH50
Processing spec/12302019.NASDAQ_ITCH50
processed 268744780 messages (8251407909 bytes) in 24844 milliseconds

Further Improvements

ITCH file is read in chunks and is fairly fast, but look into memory mapped files as well
object_pool was used as the memory pool. Look into other memory pools which may be for suitable for this use case (e.g. where I can allocate memory in advance)
Perhaps there is a faster std::map alternative
Profiling still shows hash map insertions/deletions to be the largest bottleneck:

Even though switching to google::dense_hash_map significantly reduced runtime, look into further alternatives. Perhaps the most significant improvement would come from finding a way to use a contiguous container rather than a map/tree. This idea is discussed in detail here.

Name		Name	Last commit message	Last commit date
Latest commit History 133 Commits
include		include
spec		spec
src		src
LICENSE		LICENSE
README.md		README.md
makefile		makefile

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Order-Book

Dependencies

Build and Usage

Design

Performance Considerations

ITCH Reader

Order Book

Further Improvements

Resources

About

Releases

Packages

Languages

License

aanrv/Order-Book

Folders and files

Latest commit

History

Repository files navigation

Order-Book

Dependencies

Build and Usage

Design

Performance Considerations

ITCH Reader

Order Book

Further Improvements

Resources

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages