This repo contains code to recreate the experiments in Abstraction Alignment: Comparing Model and Human Conceptual Relationships. Each of the experiments is contained within its own notebook:
- `cifar_abstraction_alignment.ipynb` --- section 4.1. Interpreting model behavior with abstraction alignment
- `language_model_abstraction_alignment.ipynb` --- section 4.2. Benchmarking language models' abstraction alignment
- `mimic_abstraction_alignment.ipynb` --- section 4.3. Analyzing datasets using abstraction alignment