
ML4MILP: A Benchmark Dataset for Machine Learning-based Mixed-Integer Linear Programming

Overview

ML4MILP is the first benchmark dataset specifically designed to evaluate ML-based algorithms for solving mixed-integer linear programming (MILP) problems. It consists of three main components: Similarity Evaluation, Benchmark Datasets, and a Baseline Library. Using this structure, we uniformly trained and tested the baseline algorithms, then comprehensively evaluated and ranked the results.

Framework

Benchmark Datasets

We have assembled a substantial collection of mixed-integer linear programming (MILP) instances from a variety of sources, including open-source comprehensive datasets, domain-specific academic papers, and MILP-related competitions. In addition, we generated a large number of standard problem instances based on nine canonical MILP problems: the Maximum Independent Set (MIS) problem, the Minimum Vertex Cover (MVC) problem, the Set Covering (SC) problem, the Balanced Item Placement (BIP) problem, the Combinatorial Auctions (CAT) problem, the Capacitated Facility Location (CFL) problem, the Mixed Integer Knapsack Set (MIKS) problem, the Middle-mile Consolidation Problem with Waiting Times (MMCW), and the Steiner Network Problem with Coverage Constraints (SNPCC). For each problem type, we generated instances at three levels of difficulty: easy, medium, and hard.
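
As an illustration of how such instances can be produced, here is a minimal sketch that formulates a random MIS instance as a MILP with gurobipy and exports it in the .lp format used throughout the benchmark. This is not the repository's actual generator; `n_nodes`, `edge_prob`, `seed`, and the output path are hypothetical parameters.

```python
# Minimal MIS-instance generator sketch -- NOT the benchmark's generator.
# Assumed/hypothetical: n_nodes, edge_prob, seed, and the output path.
import random
import gurobipy as gp
from gurobipy import GRB

def make_mis_instance(n_nodes=100, edge_prob=0.05, seed=0, path="mis_example.lp"):
    rng = random.Random(seed)
    model = gp.Model("MIS")
    # One binary variable per node: x[i] = 1 iff node i joins the independent set.
    x = model.addVars(n_nodes, vtype=GRB.BINARY, name="x")
    # For each sampled edge (i, j), at most one endpoint may be selected.
    for i in range(n_nodes):
        for j in range(i + 1, n_nodes):
            if rng.random() < edge_prob:
                model.addConstr(x[i] + x[j] <= 1)
    # Maximize the number of selected nodes.
    model.setObjective(x.sum(), GRB.MAXIMIZE)
    model.write(path)  # benchmark instances are stored in .lp format

make_mis_instance()
```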

The size of each dataset is listed below; the download links are detailed in ./Benchmark Datasets/README.md.

| Name (Path) | Number of Instances | Avg. Vars | Avg. Constraints |
| --- | --- | --- | --- |
| MIS_easy | 50 | 20000 | 60000 |
| MIS_medium | 50 | 100000 | 300000 |
| MIS_hard | 50 | 1000000 | 3000000 |
| MVC_easy | 50 | 20000 | 60000 |
| MVC_medium | 50 | 100000 | 300000 |
| MVC_hard | 50 | 1000000 | 3000000 |
| SC_easy | 50 | 40000 | 40000 |
| SC_medium | 50 | 200000 | 200000 |
| SC_hard | 50 | 2000000 | 2000000 |
| BIP_easy | 50 | 4081 | 290 |
| BIP_medium | 50 | 14182 | 690 |
| BIP_hard | 50 | 54584 | 2090 |
| CAT_easy | 50 | 2000 | 2000 |
| CAT_medium | 50 | 22000 | 22000 |
| CAT_hard | 50 | 2000000 | 2000000 |
| CFL_easy | 50 | 16040 | 80 |
| CFL_medium | 50 | 144200 | 320 |
| CFL_hard | 50 | 656520 | 800 |
| MIKS_easy | 50 | 5000 | 5000 |
| MIKS_medium | 50 | 55000 | 55000 |
| MIKS_hard | 50 | 1000000 | 1000000 |
| MMCW_easy | 50 | 5760 | 2880 |
| MMCW_medium | 50 | 55260 | 27630 |
| MMCW_hard | 50 | 253980 | 126990 |
| SNPCC_easy | 50 | 3000 | 30 |
| SNPCC_medium | 50 | 15000 | 151 |
| SNPCC_hard | 50 | 240000 | 2405 |
| nn_verification | 3622 | 7144.02 | 6533.58 |
| item_placement | 10000 | 1083 | 195 |
| load_balancing | 10000 | 61000 | 64307.19 |
| anonymous | 138 | 34674.03 | 44498.19 |
| HEM_knapsack | 10000 | 720 | 72 |
| HEM_mis | 10002 | 500 | 1953.48 |
| HEM_setcover | 10000 | 1000 | 500 |
| HEM_corlat | 1984 | 466 | 486.17 |
| HEM_mik | 90 | 386.67 | 311.67 |
| vary_bounds_s1 | 50 | 3117 | 1293 |
| vary_bounds_s2 | 50 | 1758 | 351 |
| vary_bounds_s3 | 50 | 1758 | 351 |
| vary_matrix_s1 | 50 | 802 | 531 |
| vary_matrix_rhs_bounds_s1 | 50 | 27710 | 16288 |
| vary_matrix_rhs_bounds_obj | 50 | 7973 | 3558 |
| vary_obj_s1 | 50 | 360 | 55 |
| vary_obj_s2 | 50 | 745 | 26159 |
| vary_obj_s3 | 50 | 9599 | 27940 |
| vary_rhs_s1 | 50 | 12760 | 1501 |
| vary_rhs_s2 | 50 | 1000 | 1250 |
| vary_rhs_s3 | 50 | 63009 | 507 |
| vary_rhs_s4 | 50 | 1000 | 1250 |
| vary_rhs_obj_s1 | 50 | 90983 | 33438 |
| vary_rhs_obj_s2 | 50 | 4626 | 8274 |
| Aclib | 99 | 181 | 180 |
| Coral | 279 | 18420.92 | 11831.01 |
| Cut | 14 | 4113 | 1608.57 |
| ECOGCNN | 44 | 36808.25 | 58768.84 |
| fc.data | 20 | 571 | 330.5 |
| MIPlib | 50 | 7719.98 | 6866.04 |
| Nexp | 77 | 9207.09 | 7977.14 |
| Transportation | 32 | 4871.5 | 2521.467 |
| MIPLIB_collection_easy | 649 | 119747.4 | 123628.3 |
| MIPLIB_collection_hard | 107 | 96181.4 | 101135.8 |
| MIPLIB_collection_open | 204 | 438355.9 | 258599.5 |
| MIRPLIB_Original | 72 | 36312.2 | 11485.8 |
| MIRPLIB_Maritime_Group1 | 40 | 13919.5 | 19329.25 |
| MIRPLIB_Maritime_Group2 | 40 | 24639.8 | 34053.25 |
| MIRPLIB_Maritime_Group3 | 40 | 24639.8 | 34057.75 |
| MIRPLIB_Maritime_Group4 | 20 | 4343.0 | 6336.0 |
| MIRPLIB_Maritime_Group5 | 20 | 48330.0 | 66812.0 |
| MIRPLIB_Maritime_Group6 | 20 | 48330.0 | 66815.0 |

Baseline Library

To validate the effectiveness of the proposed dataset, we organized the existing mainstream methods into a Baseline Library and evaluated them on the Benchmark Datasets. The algorithms in the Baseline Library are as follows.

| Baseline | Code |
| --- | --- |
| Gurobi | ./Baseline Library/Gurobi/ |
| SCIP | ./Baseline Library/SCIP/ |
| Large Neighborhood Search | ./Baseline Library/LNS/ |
| Adaptive Constraint Partition Based Optimization Framework | ./Baseline Library/ACP/ |
| Learn to Branch | ./Baseline Library/Learn2Branch/ |
| GNN&GBDT-Guided Fast Optimizing Framework | ./Baseline Library/GNN&GBDT/ |
| GNN-Guided Predict-and-Search Framework | ./Baseline Library/Predict&Search |
| Neural Diving | ./Baseline Library/Neural Diving |
| Hybrid Learn to Branch | ./Baseline Library/Hybrid_Learn2Branch |
| Graph Neural Networks with Random Features | ./Baseline Library/GNN_MILP |
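
For example, the traditional-solver baselines can be run directly on a benchmark instance. The sketch below solves one .lp file with Gurobi; the instance path and time limit are illustrative assumptions, not the library's evaluation protocol.

```python
# Solve a single benchmark instance with the Gurobi baseline.
# Assumed/hypothetical: the instance path and the 600 s time limit.
import gurobipy as gp

model = gp.read("Benchmark Datasets/MIS_easy/instance_0.lp")  # hypothetical path
model.Params.TimeLimit = 600  # cap the solve so hard instances still terminate
model.optimize()
best = model.ObjVal if model.SolCount > 0 else None
print(f"status={model.Status} objective={best}")
```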

Similarity Evaluation

Similarity Evaluation Metrics

Structure Embedding

The code is in ./Similarity Evaluation/Similarity/Structure Similarity. Run the following bash script to compute the structural embedding similarity of a dataset containing MILP instances in .lp format.

```bash
base_dir="<The dataset folder>"

# Traverse all LP subfolders under $base_dir
find "$base_dir" -type d -name LP | while read -r lp_dir; do
    # The parent of the LP folder is the instance directory
    instance_dir=$(dirname "$lp_dir")
    # The instance directory name is the problem name
    problem_name=$(basename "$instance_dir")

    echo "Processing problem: $problem_name, directory: $instance_dir"

    # Create the Test0 folder in the instance directory
    mkdir -p "$instance_dir/Test0"

    # Run MILP_utils.py to convert the .lp models into graph data
    python MILP_utils.py --mode=model2data \
        --input_dir="$lp_dir" \
        --output_dir="$instance_dir/Test0" \
        --type=direct

    # Run graph_statistics.py to compute structural statistics
    python graph_statistics.py --input_dir="$instance_dir/Test0" \
        --output_file="$instance_dir/statistics"

    # Run calc_sim.py and write the results
    python calc_sim.py --input_file1="$instance_dir/statistics" > "$instance_dir/result.txt"
done
```
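
For intuition about what such structural statistics can look like, here is an illustrative Python stand-in that reads one .lp file and summarizes its variable-constraint bipartite structure. This is not the repository's MILP_utils.py or graph_statistics.py, and the chosen features are examples only.

```python
# Illustrative structural features of one MILP instance -- a stand-in,
# not the repository's graph_statistics.py; the feature set is assumed.
import gurobipy as gp

def structure_stats(lp_path):
    m = gp.read(lp_path)
    A = m.getA()  # sparse constraint matrix (rows: constraints, cols: variables)
    n_vars, n_cons = m.NumVars, m.NumConstrs
    return {
        "n_vars": n_vars,
        "n_cons": n_cons,
        "density": A.nnz / (n_vars * n_cons),  # fraction of nonzero coefficients
        "mean_var_degree": A.nnz / n_vars,     # avg constraints touching a variable
        "mean_cons_degree": A.nnz / n_cons,    # avg variables per constraint
        "frac_integer": sum(v.VType != 'C' for v in m.getVars()) / n_vars,
    }

print(structure_stats("instance_0.lp"))  # hypothetical path
```
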
Neural Embedding

The code is in ./Similarity Evaluation/Similarity/Neural Embedding Similarity. Run the following bash script to compute the neural embedding similarity of a dataset containing MILP instances in .lp format.

```bash
special_dir="<Your Dataset Folder>"

# The function must be defined before the loop below calls it.
process_instance() {
    lp_dir=$1
    # The parent of the LP folder is the instance directory
    instance_dir=$(dirname "$lp_dir")
    # The instance directory name is the problem name
    problem_name=$(basename "$instance_dir")

    echo "Processing problem: $problem_name, directory: $instance_dir"

    # Create the Test0 folder in the instance directory
    mkdir -p "$instance_dir/Test0"

    # Run MILP_utils.py to convert the .lp models into graph data
    python MILP_utils.py --mode=model2data \
        --input_dir="$lp_dir" \
        --output_dir="$instance_dir/Test0" \
        --type=direct

    # Run inference.py to compute neural embeddings
    python src/inference.py --dataset=MILP \
        --cfg_path=./experiments/configs/test.yml \
        --seed=1 \
        --device=2 \
        --model_path=./experiments/weights/encoder.pth \
        --input_dir="$instance_dir/Test0" \
        --output_file="$instance_dir/embedding" \
        --filename="$instance_dir/namelist" || echo "Inference failed"

    # Run calc_sim.py and write the results
    python calc_sim.py --input_file1="$instance_dir/embedding" > "$instance_dir/result_embedding.txt"
}

find "$special_dir" -type d -name LP | while read -r lp_dir; do
    process_instance "$lp_dir"
done
```
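
As a rough picture of the final step, a dataset-level similarity score over the resulting embeddings can be obtained as the mean pairwise cosine similarity. The sketch below is an illustrative stand-in, not the repository's calc_sim.py.

```python
# Mean pairwise cosine similarity over instance embeddings -- a stand-in
# for calc_sim.py; the (n_instances, dim) array layout is an assumption.
import numpy as np

def mean_pairwise_cosine(embeddings: np.ndarray) -> float:
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = normed @ normed.T               # (n, n) cosine-similarity matrix
    mask = ~np.eye(len(sims), dtype=bool)  # drop the self-similarity diagonal
    return float(sims[mask].mean())

print(mean_pairwise_cosine(np.random.rand(5, 64)))  # toy embeddings
```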

Classification Algorithm

The code is in ./Similarity Evaluation/Classification. Run the following bash script to build the dataset, train the encoder, run inference, or cluster the embeddings. For classification, use our trained model and simply run build, inference, and cluster in turn.

```bash
case $1 in
  build|train|inference|cluster)
    echo "Valid argument: $1"
    ;;
  *)
    echo "Invalid argument: $1. Please enter a valid choice."
    exit 1
    ;;
esac

if [ "$1" = "build" ]; then
    python src/MILP_utils.py --mode=model2data --input_dir=dataset/model --output_dir=dataset/data --type=direct
fi

if [ "$1" = "train" ]; then
    python src/train.py --dataset=MILP --cfg_path=experiments/configs/test.yml --seed=1 --device=0 --model_path=experiments/weights/encoder.pth --dataset_path=dataset/data || echo "Training failed"
fi

if [ "$1" = "inference" ]; then
    python src/inference.py --cfg_path=experiments/configs/test.yml --seed=1 --device=0 --model_path=experiments/weights/encoder.pth --input_dir="<Your Dataset Folder>" --output_file=tmp.pkl --filename=namelist.pkl || echo "Inference failed"
fi

if [ "$1" = "cluster" ]; then
    python src/clustering.py --filename=namelist.pkl --input_file=tmp.pkl
fi
```
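
For intuition, the cluster stage could look like the sketch below, an illustrative stand-in for src/clustering.py; the pickle layouts of tmp.pkl and namelist.pkl and the number of clusters are assumptions.

```python
# Cluster instance embeddings -- a stand-in for src/clustering.py.
# Assumed: tmp.pkl holds an (n, dim) embedding array, namelist.pkl the
# matching instance names, and 8 clusters is an arbitrary choice.
import pickle
import numpy as np
from sklearn.cluster import KMeans

with open("tmp.pkl", "rb") as f:       # embeddings written by the inference stage
    embeddings = np.asarray(pickle.load(f))
with open("namelist.pkl", "rb") as f:  # instance names aligned with rows
    names = pickle.load(f)

labels = KMeans(n_clusters=8, n_init=10, random_state=1).fit_predict(embeddings)
for name, label in zip(names, labels):
    print(f"{name}\tcluster {label}")
```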
