The repository is organized as follows:
- `dataset/`: dataset folder
  - `synthetic_company`: data of the synthetic company;
  - `synthetic_transaction`: data of the synthetic transaction;
  - `real_company`: data of the real-world company;
  - `real_transaction`: data of the real-world transaction;
- `model/`: model folder
  - `DualFraud.ipynb`: data, model and training/testing code for DualFraud and DualFraud-S;
- `utils/`: functions folder
  - `synthetic_structsim.py`: generate structures for the synthetic dataset;
  - `featgen.py`: generate features for nodes in the synthetic dataset;
Detailed structure in `DualFraud.ipynb`:
- **Set up**: required Python packages
- **Data**: data generation and processing
  - **Synthetic Data**: synthetic data generation;
  - **Real-World Data**: data processing for real-world data;
- **Model**: model and train/test code
  - **Parameters**: parameters of the model;
  - **Layer**: layer definitions, including GraphConvLayer, MLPAttentionNetwork and BiLSTMAttentionNetwork;
  - **Model**: model definition of DualFraud;
  - **GNNExplainer**: GNNExplainer model from PyTorch Geometric;
  - **DualFraud: Train/Test**: training and testing code for DualFraud;
  - **DualFraud-S: Train/Test**: training and testing code for DualFraud-S;
  - **Explainer**: explainer component that generates explanations for enterprises and transactions.
We build Synthetic and Real-world datasets for experiments:
| Dataset | #Nodes (Fraud%) | #Features | Relation | #Edges |
|---|---|---|---|---|
| Synthetic Enterprise | 1143 (10.5%) | 8 | C-C | 1181 |
| Synthetic Transaction | 4575 (10.5%) | 8 | T-T | 4749 |
| Real-world Enterprise (HAT) | 13489 (26.4%) | 89 | C-I-C | 53874 |
| | | | C-C | 15908 |
| | | | C-M-C | 139413 |
| | | | ALL | 209195 |
| Real-world Transaction (BankSim) | 50000 (1.2%) | 23 | T-A-T | 206666 |
You can download the project and run the program as follows:
- Unzip the datasets into the dataset folder `dataset/`;
- Install the required packages listed in `requirements.txt`: `pip install -r requirements.txt`;
- Run the code in `DualFraud.ipynb` to run DualFraud.

\* Running the code requires Python 3.6 or later.
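
The first step and the version requirement can be scripted. The following is a minimal sketch, assuming the datasets ship as `.zip` archives placed directly inside the `dataset` folder; the archive format and the helper names `check_python` and `unzip_datasets` are illustrative assumptions, not part of the repository.

```python
import sys
import zipfile
from pathlib import Path


def check_python(min_version=(3, 6)):
    """Return True if the running interpreter is at least min_version."""
    return sys.version_info >= min_version


def unzip_datasets(dataset_dir="dataset"):
    """Extract every .zip archive found directly inside dataset_dir, in place.

    Assumption: the datasets are distributed as .zip files in this folder.
    Returns the names of the archives that were extracted.
    """
    dataset_dir = Path(dataset_dir)
    extracted = []
    for archive in sorted(dataset_dir.glob("*.zip")):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(dataset_dir)
        extracted.append(archive.name)
    return extracted


if __name__ == "__main__":
    if not check_python():
        raise SystemExit("Python 3.6 or later is required.")
    print("Extracted archives:", unzip_datasets("dataset"))
```

Run it from the repository root before installing the requirements and opening the notebook.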
To run DualFraud on your own datasets, you need to prepare the following data for enterprises and transactions separately:
- Multiple single-relation graphs over the same set of nodes, each stored as an adjacency matrix;
- An array of node labels;
- A node feature matrix.
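
The three inputs listed above can be sketched with NumPy as follows. This is a toy illustration under stated assumptions, not the notebook's loading code: the relation names, the undirected edges, the 0/1 label encoding and the helpers `build_adjacency` and `check_inputs` are all hypothetical.

```python
import numpy as np


def build_adjacency(num_nodes, edges):
    """Build a symmetric 0/1 adjacency matrix from (i, j) node-index pairs.

    Assumption: edges are undirected, so both (i, j) and (j, i) are set.
    """
    adj = np.zeros((num_nodes, num_nodes), dtype=np.float32)
    for i, j in edges:
        adj[i, j] = 1.0
        adj[j, i] = 1.0
    return adj


def check_inputs(adjs, labels, features):
    """Check that every relation graph, the labels and the features share the same nodes."""
    n = features.shape[0]
    assert labels.shape == (n,), "need exactly one label per node"
    for name, adj in adjs.items():
        assert adj.shape == (n, n), f"relation {name!r} must be {n} x {n}"
    return n


# Toy data: 4 nodes, 2 relations, 3 features per node (all values are made up).
num_nodes = 4
relation_edges = {
    "rel_a": [(0, 1), (1, 2)],  # hypothetical relation names
    "rel_b": [(0, 3)],
}

# One adjacency matrix per relation, all indexed by the same node ids.
adjs = {name: build_adjacency(num_nodes, e) for name, e in relation_edges.items()}

# Node labels; 1 = fraud, 0 = normal is an assumed encoding.
labels = np.array([0, 1, 0, 0])

# Node feature matrix of shape (num_nodes, num_features).
features = np.random.default_rng(0).random((num_nodes, 3)).astype(np.float32)

num_checked = check_inputs(adjs, labels, features)
```

Prepare one such set for enterprises and another for transactions, since the two are handled separately.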