ML4MILP is the first benchmark specifically designed to evaluate ML-based algorithms for solving mixed-integer linear programming (MILP) problems. It consists of three main components: Similarity Evaluation, Benchmark Datasets, and Baseline Library. Based on this structure, we uniformly trained and tested the baseline algorithms, then comprehensively evaluated and ranked the results.
We have assembled a large collection of MILP instances from a variety of sources, including open-source comprehensive datasets and MILP-related academic papers and competitions. Additionally, we generated a substantial number of standard instances based on nine canonical MILP problems: the Maximum Independent Set (MIS) problem, the Minimum Vertex Cover (MVC) problem, the Set Covering (SC) problem, the Balanced Item Placement (BIP) problem, the Combinatorial Auctions (CAT) problem, the Capacitated Facility Location (CFL) problem, the Mixed Integer Knapsack Set (MIKS) problem, the Middle-mile Consolidation Problem with Waiting Times (MMCW), and the Steiner Network Problem with Coverage Constraints (SNPCC). For each type of problem, we generated instances at three levels of difficulty: easy, medium, and hard.
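As an illustration of how such standard instances can be produced, the sketch below generates a random-graph MIS instance and writes it in CPLEX LP format. It is a minimal example, not the generator used to build the benchmark; the graph model (Erdős–Rényi) and the file layout are our assumptions.

```python
import random

def generate_mis_lp(n_nodes, edge_prob, path, seed=0):
    """Write a Maximum Independent Set MILP in CPLEX LP format.

    max sum_i x_i  s.t.  x_u + x_v <= 1 for every edge (u, v), x_i binary.
    The graph is Erdos-Renyi G(n, p); the benchmark's actual generators
    may use a different graph model.
    """
    rng = random.Random(seed)
    edges = [(u, v) for u in range(n_nodes) for v in range(u + 1, n_nodes)
             if rng.random() < edge_prob]
    with open(path, "w") as f:
        f.write("Maximize\n obj: " + " + ".join(f"x{i}" for i in range(n_nodes)) + "\n")
        f.write("Subject To\n")
        for k, (u, v) in enumerate(edges):
            f.write(f" e{k}: x{u} + x{v} <= 1\n")
        f.write("Binary\n " + " ".join(f"x{i}" for i in range(n_nodes)) + "\nEnd\n")

generate_mis_lp(100, 0.05, "mis_example.lp")
```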
The sizes of the categorized datasets are as follows; the download links are detailed in ./Benchmark Datasets/README.md. (A sketch for reproducing these size statistics appears after the table.)
Name (Path) | Number of Instances | Avg. Vars | Avg. Constraints |
---|---|---|---|
MIS_easy | 50 | 20000 | 60000 |
MIS_medium | 50 | 100000 | 300000 |
MIS_hard | 50 | 1000000 | 3000000 |
MVC_easy | 50 | 20000 | 60000 |
MVC_medium | 50 | 100000 | 300000 |
MVC_hard | 50 | 1000000 | 3000000 |
SC_easy | 50 | 40000 | 40000 |
SC_medium | 50 | 200000 | 200000 |
SC_hard | 50 | 2000000 | 2000000 |
BIP_easy | 50 | 4081 | 290 |
BIP_medium | 50 | 14182 | 690 |
BIP_hard | 50 | 54584 | 2090 |
CAT_easy | 50 | 2000 | 2000 |
CAT_medium | 50 | 22000 | 22000 |
CAT_hard | 50 | 2000000 | 2000000 |
CFL_easy | 50 | 16040 | 80 |
CFL_medium | 50 | 144200 | 320 |
CFL_hard | 50 | 656520 | 800 |
MIKS_easy | 50 | 5000 | 5000 |
MIKS_medium | 50 | 55000 | 55000 |
MIKS_hard | 50 | 1000000 | 1000000 |
MMCW_easy | 50 | 5760 | 2880 |
MMCW_medium | 50 | 55260 | 27630 |
MMCW_hard | 50 | 253980 | 126990 |
SNPCC_easy | 50 | 3000 | 30 |
SNPCC_medium | 50 | 15000 | 151 |
SNPCC_hard | 50 | 240000 | 2405 |
nn_verification | 3622 | 7144.02 | 6533.58 |
item_placement | 10000 | 1083 | 195 |
load_balancing | 10000 | 61000 | 64307.19 |
anonymous | 138 | 34674.03 | 44498.19 |
HEM_knapsack | 10000 | 720 | 72 |
HEM_mis | 10002 | 500 | 1953.48 |
HEM_setcover | 10000 | 1000 | 500 |
HEM_corlat | 1984 | 466 | 486.17 |
HEM_mik | 90 | 386.67 | 311.67 |
vary_bounds_s1 | 50 | 3117 | 1293 |
vary_bounds_s2 | 50 | 1758 | 351 |
vary_bounds_s3 | 50 | 1758 | 351 |
vary_matrix_s1 | 50 | 802 | 531 |
vary_matrix_rhs_bounds_s1 | 50 | 27710 | 16288 |
vary_matrix_rhs_bounds_obj | 50 | 7973 | 3558 |
vary_obj_s1 | 50 | 360 | 55 |
vary_obj_s2 | 50 | 745 | 26159 |
vary_obj_s3 | 50 | 9599 | 27940 |
vary_rhs_s1 | 50 | 12760 | 1501 |
vary_rhs_s2 | 50 | 1000 | 1250 |
vary_rhs_s3 | 50 | 63009 | 507 |
vary_rhs_s4 | 50 | 1000 | 1250 |
vary_rhs_obj_s1 | 50 | 90983 | 33438 |
vary_rhs_obj_s2 | 50 | 4626 | 8274 |
Aclib | 99 | 181 | 180 |
Coral | 279 | 18420.92 | 11831.01 |
Cut | 14 | 4113 | 1608.57 |
ECOGCNN | 44 | 36808.25 | 58768.84 |
fc.data | 20 | 571 | 330.5 |
MIPlib | 50 | 7719.98 | 6866.04 |
Nexp | 77 | 9207.09 | 7977.14 |
Transportation | 32 | 4871.5 | 2521.467 |
MIPLIB_collection_easy | 649 | 119747.4 | 123628.3 |
MIPLIB_collection_hard | 107 | 96181.4 | 101135.8 |
MIPLIB_collection_open | 204 | 438355.9 | 258599.5 |
MIRPLIB_Original | 72 | 36312.2 | 11485.8 |
MIRPLIB_Maritime_Group1 | 40 | 13919.5 | 19329.25 |
MIRPLIB_Maritime_Group2 | 40 | 24639.8 | 34053.25 |
MIRPLIB_Maritime_Group3 | 40 | 24639.8 | 34057.75 |
MIRPLIB_Maritime_Group4 | 20 | 4343.0 | 6336.0 |
MIRPLIB_Maritime_Group5 | 20 | 48330.0 | 66812.0 |
MIRPLIB_Maritime_Group6 | 20 | 48330.0 | 66815.0 |
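The averages above can be reproduced by reading each instance and counting its variables and constraints. Below is a minimal sketch using gurobipy (any .lp reader would work equally well); the folder layout and file pattern are assumptions.

```python
import glob
import gurobipy as gp

def dataset_stats(folder):
    """Average variable and constraint counts over all .lp files in a folder."""
    paths = glob.glob(f"{folder}/*.lp")
    if not paths:
        raise ValueError(f"no .lp files found in {folder}")
    n_vars = n_cons = 0
    for p in paths:
        m = gp.read(p)
        n_vars += m.NumVars
        n_cons += m.NumConstrs
    return n_vars / len(paths), n_cons / len(paths)

avg_v, avg_c = dataset_stats("<dataset folder>/LP")
print(f"Avg. Vars = {avg_v:.2f}, Avg. Constraints = {avg_c:.2f}")
```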
To validate the effectiveness of the proposed datasets, we organized the existing mainstream methods into a Baseline Library and compared them on the Benchmark Datasets. The algorithms in the Baseline Library are listed below (a minimal solver-invocation sketch follows the table).
Baseline | Code |
---|---|
Gurobi | ./Baseline Library/Gurobi/ |
SCIP | ./Baseline Library/SCIP/ |
Large Neighborhood Search | ./Baseline Library/LNS/ |
Adaptive Constraint Partition Based Optimization Framework | ./Baseline Library/ACP/ |
Learn to Branch | ./Baseline Library/Learn2Branch/ |
GNN&GBDT-Guided Fast Optimizing Framework | ./Baseline Library/GNN&GBDT/ |
GNN-Guided Predict-and-Search Framework | ./Baseline Library/Predict&Search/ |
Neural Diving | ./Baseline Library/Neural Diving/ |
Hybrid Learn to Branch | ./Baseline Library/Hybrid_Learn2Branch/ |
Graph Neural Networks with Random Features | ./Baseline Library/GNN_MILP/ |
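For the exact-solver baselines, evaluation reduces to reading an instance and optimizing under a time limit. The sketch below shows this for Gurobi via gurobipy; the time-limit value is an assumption for illustration, not the benchmark's evaluation protocol.

```python
import gurobipy as gp

def solve_instance(lp_path, time_limit=600):
    """Read an .lp instance and solve it with Gurobi under a time limit."""
    model = gp.read(lp_path)
    model.setParam("TimeLimit", time_limit)  # seconds; assumed value
    model.optimize()
    # ObjVal and MIPGap are only defined once an incumbent exists
    if model.SolCount > 0:
        return model.ObjVal, model.MIPGap, model.Runtime
    return None, None, model.Runtime

obj, gap, runtime = solve_instance("mis_example.lp")
print(f"objective={obj}, gap={gap}, runtime={runtime:.1f}s")
```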
The codes are shown in ./Similarity Evaluation/Similarity/Structure Similarity. The following bash command can be run to calculate the structural embedding similarity of a given dataset containing several MILP instances in .lp format.
```bash
base_dir="<The dataset folder>"

# Traverse all LP subfolders under the dataset folder
find "$base_dir" -type d -name LP | while read -r lp_dir; do
    # The parent of the LP folder is the instance directory
    instance_dir=$(dirname "$lp_dir")
    # Use the directory name as the problem name
    problem_name=$(basename "$instance_dir")
    echo "Processing problem: $problem_name, directory: $instance_dir"
    # Create the Test0 working folder in the instance directory
    mkdir -p "$instance_dir/Test0"
    # Convert the .lp models into graph data
    python MILP_utils.py --mode=model2data \
        --input_dir="$lp_dir" \
        --output_dir="$instance_dir/Test0" \
        --type=direct
    # Compute graph statistics for the converted instances
    python graph_statistics.py --input_dir="$instance_dir/Test0" \
        --output_file="$instance_dir/statistics"
    # Compute the similarity and save the results
    python calc_sim.py --input_file1="$instance_dir/statistics" > "$instance_dir/result.txt"
done
```
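For intuition, structure similarity can be thought of as comparing per-instance feature vectors, e.g. by their average pairwise cosine similarity. The sketch below is illustrative only: the actual metric is defined by calc_sim.py, and the assumption that the statistics file is a pickled (n_instances, n_features) array is ours.

```python
import pickle
import numpy as np

def mean_pairwise_cosine(stats_path):
    """Average pairwise cosine similarity over per-instance feature vectors.

    Assumes stats_path holds a pickled (n_instances, n_features) array;
    the repository's calc_sim.py defines the actual format and metric.
    """
    with open(stats_path, "rb") as f:
        feats = np.asarray(pickle.load(f), dtype=float)
    norms = np.linalg.norm(feats, axis=1, keepdims=True)
    unit = feats / np.clip(norms, 1e-12, None)
    sim = unit @ unit.T  # cosine similarity matrix
    n = len(feats)
    # Average over the off-diagonal (distinct-pair) entries
    return (sim.sum() - n) / (n * (n - 1))

print(mean_pairwise_cosine("statistics"))
```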
The codes are shown in ./Similarity Evaluation/Similarity/Neural Embedding Similarity. The following bash command can be run to calculate the neural embedding similarity of a given dataset containing several MILP instances in .lp format.
```bash
special_dir="<Your Dataset Folder>"

process_instance() {
    lp_dir=$1
    # The parent of the LP folder is the instance directory
    instance_dir=$(dirname "$lp_dir")
    # Use the directory name as the problem name
    problem_name=$(basename "$instance_dir")
    echo "Processing problem: $problem_name, directory: $instance_dir"
    # Create the Test0 working folder in the instance directory
    mkdir -p "$instance_dir/Test0"
    # Convert the .lp models into graph data
    python MILP_utils.py --mode=model2data \
        --input_dir="$lp_dir" \
        --output_dir="$instance_dir/Test0" \
        --type=direct
    # Run the trained encoder to produce neural embeddings
    python src/inference.py --dataset=MILP \
        --cfg_path=./experiments/configs/test.yml \
        --seed=1 \
        --device=2 \
        --model_path=./experiments/weights/encoder.pth \
        --input_dir="$instance_dir/Test0" \
        --output_file="$instance_dir/embedding" \
        --filename="$instance_dir/namelist" || echo "Inference failed"
    # Compute the similarity and save the results
    python calc_sim.py --input_file1="$instance_dir/embedding" > "$instance_dir/result_embedding.txt"
}

# The function must be defined before it is called in the loop
find "$special_dir" -type d -name LP | while read -r lp_dir; do
    process_instance "$lp_dir"
done
```
The codes are shown in ./Similarity Evaluation/Classification. The following bash command can be run to build the dataset, train the model, run inference, and cluster the results. For classification, we can use our trained model and simply run build, inference, and cluster in turn (an illustrative clustering sketch follows the script).
```bash
# Validate the requested stage
case $1 in
    build|train|inference|cluster)
        echo "Valid argument: $1"
        ;;
    *)
        echo "Invalid argument: $1. Please enter a correct choice."
        exit 1
        ;;
esac

# build: convert the .lp models into graph data
if [ "$1" = "build" ]; then
    python src/MILP_utils.py --mode=model2data --input_dir=dataset/model --output_dir=dataset/data --type=direct
fi

# train: train the encoder on the built data
if [ "$1" = "train" ]; then
    python src/train.py --dataset=MILP --cfg_path=experiments/configs/test.yml --seed=1 --device=0 --model_path=experiments/weights/encoder.pth --dataset_path=dataset/data || echo "Training failed"
fi

# inference: embed a dataset with the trained encoder
if [ "$1" = "inference" ]; then
    python src/inference.py --cfg_path=experiments/configs/test.yml --seed=1 --device=0 --model_path=experiments/weights/encoder.pth --input_dir="<Your Dataset Folder>" --output_file=tmp.pkl --filename=namelist.pkl || echo "Inference failed"
fi

# cluster: group the instances by embedding
if [ "$1" = "cluster" ]; then
    python src/clustering.py --filename=namelist.pkl --input_file=tmp.pkl
fi
```