Sub-Domain : Distributed Big Data Analytics, Multi-market Analysis, Anomaly Detection
- Developed an architecture for preprocessing and analyzing 7 years of historical US stock market data (50 TB).
- Extracted fields from raw, nanosecond-granularity data files spanning multiple years and file formats, based on the field-length specifications published by the US stock exchange.
- Preprocessed and analyzed the data on multiple clusters with Apache Spark to reduce processing time.
- Conducted stock market data analysis (multi-market analysis of market dominance) and anomaly detection (flash crash days: May 6, 2010, and August 24, 2015), and produced visualizations and a report.
- Proposed applying unsupervised learning (clustering) to large-scale stock market data for anomaly detection and general market analysis, since the data is unlabeled.
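Field extraction from fixed-width files comes down to turning a list of field lengths into character offsets and slicing each record. The sketch below is illustrative only: the field names, widths, and the "4 implied decimal places" price encoding in `LAYOUT` are hypothetical placeholders, not the exchange's actual specification, which differs by year and file format.

```python
from decimal import Decimal
from itertools import accumulate

# HYPOTHETICAL layout: (field name, width in characters, converter).
# Real widths and encodings must come from the exchange's published
# specification for the specific year and file format being parsed.
LAYOUT = [
    ("timestamp_ns", 15, int),                       # ns since midnight, zero-padded
    ("symbol", 8, str.strip),                        # space-padded ticker
    ("price", 11, lambda s: Decimal(s).scaleb(-4)),  # assumed 4 implied decimals
    ("volume", 9, int),                              # shares traded
]


def build_slices(layout):
    """Convert field widths into (name, start, end, converter) offsets."""
    ends = list(accumulate(width for _, width, _ in layout))
    starts = [0] + ends[:-1]
    return [
        (name, start, end, conv)
        for (name, _, conv), start, end in zip(layout, starts, ends)
    ]


SLICES = build_slices(LAYOUT)
RECORD_LEN = SLICES[-1][2]  # total record width implied by the layout


def parse_record(line):
    """Slice one fixed-width record into typed fields."""
    line = line.rstrip("\r\n")
    if len(line) < RECORD_LEN:
        raise ValueError(f"record has {len(line)} chars, expected {RECORD_LEN}")
    return {name: conv(line[start:end]) for name, start, end, conv in SLICES}
```

Keeping the layout as data means each year or file format only needs its own `LAYOUT` table rather than new parsing code. In a Spark job, one possible way to distribute this is to map the function over lines, e.g. `sc.textFile(path).map(parse_record)`; how the original pipeline did it is not stated.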
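The source does not define how "market dominance" was measured. One common reading is each venue's share of traded volume per symbol; the minimal sketch below assumes that definition and a hypothetical `(symbol, exchange, volume)` record schema, and reports the venue with the largest share.

```python
from collections import defaultdict


def market_shares(trades):
    """Share of traded volume per exchange, for each symbol.

    trades: iterable of (symbol, exchange, volume) tuples (hypothetical schema).
    """
    volume = defaultdict(lambda: defaultdict(int))
    for symbol, exchange, qty in trades:
        volume[symbol][exchange] += qty
    shares = {}
    for symbol, by_exchange in volume.items():
        total = sum(by_exchange.values())
        if total == 0:
            continue  # no traded volume: share is undefined
        shares[symbol] = {ex: v / total for ex, v in by_exchange.items()}
    return shares


def dominant_exchange(shares):
    """Exchange with the largest volume share for each symbol."""
    return {symbol: max(by_ex, key=by_ex.get) for symbol, by_ex in shares.items()}
```

At 50 TB the same aggregation would run on the cluster rather than in plain Python; with PySpark DataFrames it corresponds roughly to `df.groupBy("symbol", "exchange").agg(F.sum("volume"))` followed by dividing by each symbol's total.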
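The proposal above does not name a clustering algorithm. As one plausible instance, the sketch below uses k-means (Lloyd's algorithm with deterministic farthest-first seeding) on feature vectors, such as per-interval (return, volume) pairs, which are an assumption here. Points are flagged as anomalous if they land in a cluster smaller than `min_cluster_size` or lie more than `factor` times the median distance from their centroid; both rules and their default values are hypothetical.

```python
import math
import statistics


def _nearest(point, centroids):
    """Index of, and Euclidean distance to, the closest centroid."""
    dists = [math.dist(point, c) for c in centroids]
    best = min(range(len(dists)), key=lambda j: dists[j])
    return best, dists[best]


def _farthest_first(points, k):
    """Seed with points[0], then repeatedly add the point farthest from all seeds."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: _nearest(p, centroids)[1]))
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid-mean updates."""
    centroids = _farthest_first(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [_nearest(p, centroids)[0] for p in points]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = tuple(statistics.fmean(axis) for axis in zip(*members))
    return centroids, labels


def flag_anomalies(points, k=2, min_cluster_size=2, factor=3.0):
    """Indices of points in tiny clusters or far from their centroid."""
    centroids, labels = kmeans(points, k)
    sizes = [labels.count(j) for j in range(k)]
    dists = [math.dist(p, centroids[lab]) for p, lab in zip(points, labels)]
    cutoff = factor * statistics.median(dists)
    return [
        i
        for i, (lab, d) in enumerate(zip(labels, dists))
        if sizes[lab] < min_cluster_size or d > cutoff
    ]
```

For example, `flag_anomalies([(0, 0), (1, 0), (0, 1), (1, 1), (10, 10), (11, 10), (10, 11), (11, 11), (30, -10)])` returns `[8]`. Farthest-first seeding tends to give an extreme point its own centroid, which is why the small-cluster rule is included alongside the distance rule. In practice, features would need scaling to comparable ranges before clustering.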
Current Version : v1.0.0.0
Last Update : 07.31.2017