Source code for SliceNStitch, described in the paper SliceNStitch: Continuous CP Decomposition of Sparse Tensor Streams by Taehyung Kwon*, Inkyu Park*, Dongjin Lee, and Kijung Shin, to be presented at ICDE 2021.
SliceNStitch is an algorithm for continous CANDECOMP/PARAFAC (CP) decomposition, which has numerous time-critical applications. It has the following properties:
- Any time: updates factor matrices immediately without having to wait until the current time period ends
- Fast: with constant-time updates up to 464x faster than online methods
- Accurate: with fitness comparable (specifically, 72 - 100%) to offline methods.
Please see supplementary.
All parsed datasets are available at this link. The source of each dataset is listed below.
Name | Structure | Size | # Non-zeros | Source |
---|---|---|---|---|
Divvy Bikes | sources x destinations x time (minutes) | 673 x 673 x 525594 | 3.82M | Link |
Chicago Crime | communities x crime types x time (hours) | 77 x 32 x 148464 | 5.33M | Link |
New York Taxi | sources x destinations x time (seconds) | 265 x 265 x 5184000 | 84.39M | Link |
Ride Austin | sources x destinations x colors x time (minutes) | 219 x 219 x 24 x 285136 | 0.89M | Link |
- C/C++17 compiler
- CMake
- Git
- Eigen (https://gitlab.com/libeigen/eigen)
- yaml-cpp (https://github.com/jbeder/yaml-cpp)
git clone --recursive https://github.com/DMLab-Tensor/SliceNStitch.git
To generate the build system, run the following command on your terminal:
mkdir build && cd build
cmake .. -DCMAKE_BUILD_TYPE=RELEASE
After that, you can build the program using the build automation software (e.g. Make, Ninja).
To test the algorithms in the paper, run the following command on your terminal:
./SliceNStitch [config_file]
# test.yaml
data:
filePath: "nyt_2019.csv" # Path of the data stream file
nonTempDim: [265, 265] # Non-temporal dimension
tempDim: 5184000 # Temporal length of data stream
tensor:
unitNum: 10 # The number of indices in the time mode (W)
unitSize: 3600 # Period (T)
rank: 20 # Rank of CPD (R)
algorithm:
name: "SNS_RND+"
settings: # Details are described in the next section
numSample: 20
clipping: 1000.0
test:
outputPath: "out.txt" # Path of the output file
startTime: 0 # Starting time of the input data to be processed
numEpoch: 180000 # The number of epochs
checkPeriod: 1 # Period of printing the error
updatePeriod: 1 # Period of updating the tensor
random: true # Randomness of the initial points
algorithm:
name: "ALS"
settings:
algorithm:
name: "SNS_MAT"
settings:
numIter: 1
algorithm:
name: "SNS_VEC"
settings:
algorithm:
name: "SNS_VEC+"
settings:
clipping: 1000.0 # Clipping value (η)
algorithm:
name: "SNS_RND"
settings:
numSample: 20 # Threshold (θ)
algorithm:
name: "SNS_RND+"
settings:
numSample: 20 # Threshold (θ)
clipping: 1000.0 # Clipping value (η)
Input (data:filePath in a config file) must be a CSV file that consists of a multi-aspect data stream. Each row of the file is a single event and the file should be formatted as follows.
For a CSV file with N columns,
- First (N-2) columns represent the non-temporal indices of events
- The (N-1)th column represents the time indices of events
- The last column represents the values of events
The output (test:outputPath) of the code will be:
-----Test Results-----
RMSE Fitness
0.33438 0.675436
0.329659 0.639608
...
0.37203 0.692263
-----The Total Number of Updates-----
4030721
-----Total Elapse Time-----
330.706 (sec)
-----Elapse Time per each Update-----
8.20463e-05 (sec)
If you use this code as part of any research, please cite the following paper.
@inproceedings{kwon2021slicenstitch,
title={Slicenstitch: Continuous cp decomposition of sparse tensor streams},
author={Kwon, Taehyung and Park, Inkyu and Lee, Dongjin and Shin, Kijung},
booktitle={2021 IEEE 37th International Conference on Data Engineering (ICDE)},
pages={816--827},
year={2021},
organization={IEEE}
}