Resources
Here, we include some useful resources for research in concept representation learning.
Relevant Papers and Materials
Below we include a list of works in concept representation learning, particularly in the areas of interpretability/explainability, that are relevant to concept-based interpretable deep learning. We discuss several of these papers in our tutorial, but we thought it would be beneficial to collect them in list format so that they are easier to access. Please keep in mind that this is by no means an exhaustive list of important works in concept learning: the field moves fast and we have limited space here. Nevertheless, we hope this list is helpful if you want to get a sense of where the field is and where it is heading.
Concept Learning Surveys
These are some of the surveys that touch on concept representation learning and its use in interpretable/explainable AI:
2023
- Concept-based Explainable Artificial Intelligence: A Survey. arXiv preprint arXiv:2312.12936, 2023
Various Aspects of XAI
Similarly, there are several key surveys/works that discuss formalisms, definitions, and limitations of key ideas in the general field of XAI. These works touch upon what it means to explain a model and on some of the issues of so-called “traditional” XAI approaches (e.g., saliency methods):
2023
- Dear XAI community, we need to talk! Fundamental misconceptions in current XAI research. In World Conference on Explainable Artificial Intelligence, 2023
2022
- The Disagreement Problem in Explainable Machine Learning: A Practitioner’s Perspective. Transactions on Machine Learning Research, 2022
- How cognitive biases affect XAI-assisted decision-making: A systematic review. In Proceedings of the 2022 AAAI/ACM Conference on AI, Ethics, and Society, 2022
2021
- A historical perspective of explainable artificial intelligence. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 2021
- Notions of explainability and evaluation approaches for explainable artificial intelligence. Information Fusion, 2021
2020
- Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI. Information Fusion, 2020
2019
- Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 2019
- Explanations can be manipulated and geometry is to blame. Advances in Neural Information Processing Systems, 2019
- Interpretation of neural networks is fragile. In Proceedings of the AAAI Conference on Artificial Intelligence, 2019
2018
- Explaining explanations: An overview of interpretability of machine learning. In 2018 IEEE 5th International Conference on Data Science and Advanced Analytics (DSAA), 2018
- Explainable AI: the new 42? In International Cross-Domain Conference for Machine Learning and Knowledge Extraction, 2018
Supervised Concept Learning
Here we include some relevant works in concept representation learning that assume concept labels are provided in some manner; these labels are used to learn concept representations from which explanations can then be constructed:
2024
- Understanding inter-concept relationships in concept-based models. In Proceedings of the 41st International Conference on Machine Learning, 2024
- A Neuro-Symbolic Benchmark Suite for Concept Quality and Reasoning Shortcuts. In The Thirty-eighth Conference on Neural Information Processing Systems Datasets and Benchmarks Track, 2024
- Energy-Based Concept Bottleneck Models: Unifying Prediction, Concept Intervention, and Probabilistic Interpretations. In The Twelfth International Conference on Learning Representations, 2024
- Concept-Based Interpretable Reinforcement Learning with Limited to No Human Labels. In Workshop on Interpretable Policies in Reinforcement Learning at RLC-2024, 2024
- Learning to Intervene on Concept Bottlenecks. In Forty-first International Conference on Machine Learning, 2024
- Stochastic Concept Bottleneck Models. In The Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024
- Beyond concept bottleneck models: How to make black boxes intervenable? In 38th Annual Conference on Neural Information Processing Systems (NeurIPS 2024), Vancouver, Canada, December 10-15, 2024
2023
- Understanding and enhancing robustness of concept-based models. In Proceedings of the AAAI Conference on Artificial Intelligence, 2023
- Concept correlation and its effects on concept-based models. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2023
- Towards robust metrics for concept representation evaluation. In Proceedings of the AAAI Conference on Artificial Intelligence, 2023
- Interpretability is in the mind of the beholder: A causal framework for human-interpretable representation learning. Entropy, 2023
- A closer look at the intervention procedure of concept bottleneck models. In International Conference on Machine Learning, 2023
- Interactive concept bottleneck models. In Proceedings of the AAAI Conference on Artificial Intelligence, 2023
2022
- Concept activation regions: A generalized framework for concept-based explanations. Advances in Neural Information Processing Systems, 2022
- GlanceNets: Interpretable, leak-proof concept-based models. Advances in Neural Information Processing Systems, 2022
- Concept embedding models: Beyond the accuracy-explainability trade-off. Advances in Neural Information Processing Systems, 2022
- Addressing leakage in concept bottleneck models. Advances in Neural Information Processing Systems, 2022
- Learning from uncertain concepts via test time interventions. In Workshop on Trustworthy and Socially Responsible Machine Learning, NeurIPS 2022, 2022
2021
- Promises and pitfalls of black-box concept learning models. ICML Workshop on Theoretic Foundation, Criticism, and Application Trend of Explainable AI, 2021
2018
- Net2Vec: Quantifying and explaining how concepts are encoded by filters in deep neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018
- Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (TCAV). In International Conference on Machine Learning, 2018
2017
- Network dissection: Quantifying interpretability of deep visual representations. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017
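To make the supervised setting above concrete, here is a minimal numpy sketch (not any specific paper's implementation) of a concept bottleneck: the input is mapped to predicted concept probabilities, the task prediction is computed only from those concepts, and a human can intervene at test time by overwriting a predicted concept with its true value. All weights and dimensions below are toy stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy dimensions: 4 input features, 2 binary concepts, 1 binary task label.
# In practice these weights would be learned with concept and task supervision.
W_xc = rng.normal(size=(4, 2))   # input -> concept logits
W_cy = rng.normal(size=(2, 1))   # concepts -> task logit

def predict(x, interventions=None):
    """Forward pass through the bottleneck. `interventions` maps a concept
    index to a ground-truth value supplied by a human at test time."""
    c = sigmoid(x @ W_xc)                 # predicted concept probabilities
    if interventions:
        for idx, value in interventions.items():
            c[idx] = value                # overwrite with the true concept
    y = sigmoid(c @ W_cy)                 # task prediction built on concepts only
    return c, y

x = rng.normal(size=4)
c_hat, y_hat = predict(x)
# Intervening on concept 0 changes the downstream prediction only through c,
# which is what makes such interventions interpretable.
c_int, y_int = predict(x, interventions={0: 1.0})
```

Because the task head never sees the raw input, correcting a mispredicted concept propagates directly to the task output; this is the intervention mechanism that several of the papers above study and extend.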
Unsupervised Concept Learning
In contrast to the works above, the following papers attempt to learn concept representations without implicit or explicit concept labels. This is done by means of concept discovery and represents a particularly active area of research in this field:
2023
- TabCBM: Concept-based interpretable neural networks for tabular data. Transactions on Machine Learning Research, 2023
- Bridging the human-AI knowledge gap: Concept discovery and transfer in AlphaZero. arXiv preprint arXiv:2310.16410, 2023
- Global concept-based interpretability for graph neural networks via neuron analysis. In Proceedings of the AAAI Conference on Artificial Intelligence, 2023
2021
- GCExplainer: Human-in-the-loop concept-based explanations for graph neural networks. 3rd ICML Workshop on Human in the Loop Learning, 2021
2020
- On completeness-aware concept-based explanations in deep neural networks. Advances in Neural Information Processing Systems, 2020
2019
- Towards automatic concept-based explanations. Advances in Neural Information Processing Systems, 2019
2018
- Towards robust interpretability with self-explaining neural networks. Advances in Neural Information Processing Systems, 2018
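A common pattern in the discovery works above is to cluster intermediate activations of a trained network and treat each cluster as a candidate concept. The numpy sketch below illustrates that idea with a plain k-means pass over synthetic "activations"; it is a simplified illustration of the general recipe, not the algorithm of any single paper, and the two Gaussian blobs stand in for activations of real inputs.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in "activations": in practice these would come from an intermediate
# layer of a trained network evaluated on many inputs (or input segments).
acts = np.vstack([
    rng.normal(loc=-2.0, size=(50, 8)),   # inputs expressing concept A
    rng.normal(loc=+2.0, size=(50, 8)),   # inputs expressing concept B
])

def kmeans(x, k, iters=20):
    """Plain Lloyd's algorithm; each resulting cluster is a candidate concept."""
    centers = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(iters):
        dists = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=-1)
        labels = dists.argmin(axis=1)
        new_centers = []
        for j in range(k):
            members = x[labels == j]
            # Keep the old center if a cluster happens to empty out.
            new_centers.append(members.mean(axis=0) if len(members) else centers[j])
        centers = np.array(new_centers)
    return labels, centers

labels, centers = kmeans(acts, k=2)
```

Each discovered cluster would then be inspected (e.g., by looking at the inputs assigned to it) and, in completeness-aware variants, scored by how much of the model's prediction it helps recover.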
Reasoning with Concepts
Finally, we include some papers that describe what one can do once concept representations have been learnt (regardless of whether these representations were learnt with or without concept supervision). These works are closely related to the field of neuro-symbolic reasoning, and we discuss them in more detail in our presentation:
2024
- DiConStruct: Causal Concept-based Explanations through Black-Box Distillation. arXiv preprint arXiv:2401.08534, 2024
2022
- Entropy-based logic explanations of neural networks. In Proceedings of the AAAI Conference on Artificial Intelligence, 2022
- Algorithmic concept-based explainable reasoning. In Proceedings of the AAAI Conference on Artificial Intelligence, 2022
2019
- Explaining classifiers with causal concept effect (CaCE). arXiv preprint arXiv:1907.07165, 2019
2018
- DeepProbLog: Neural probabilistic logic programming. Advances in Neural Information Processing Systems, 2018
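One simple way to see why concepts enable symbolic reasoning is that, once concept values are binary, task explanations can be expressed as logic formulas over concept names. The sketch below builds a naive disjunctive-normal-form rule directly from a truth table of concept/label pairs; it is a deliberately simplified stand-in for the learned, entropy-regularized rule extraction in the papers above, and the concept names ("has_wings", "flies") and data are invented for illustration.

```python
# Toy data: each sample is (binary concept values, binary task label).
samples = [
    ((1, 1), 1),
    ((1, 0), 1),
    ((0, 1), 0),
    ((0, 0), 0),
]
names = ["has_wings", "flies"]

def extract_dnf(samples, names):
    """Collect the distinct concept patterns observed with a positive label
    and render them as a disjunction of conjunctions over concept literals."""
    positive = sorted({c for c, y in samples if y == 1})
    terms = []
    for pattern in positive:
        literals = [n if v else f"NOT {n}" for n, v in zip(names, pattern)]
        terms.append(" AND ".join(literals))
    return " OR ".join(f"({t})" for t in terms)

rule = extract_dnf(samples, names)
# → (has_wings AND NOT flies) OR (has_wings AND flies)
```

A human can read such a rule directly, and a logic simplifier could further reduce this example to just "has_wings"; the cited works learn compact rules of this kind jointly with the model rather than enumerating the truth table.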
Concept-Learning Public Codebases
Below we list some open-source concept-based libraries. As with our reference material, this is by no means an exhaustive list but rather one containing libraries we have had the chance to interact with in the past. If your library is related to concept learning and you would like it listed, please do not hesitate to contact us.