Resources

Here we include some useful resources for research in concept representation learning.

Relevant Papers and Materials

Below we include a list of works in concept representation learning, particularly within interpretability/explainability, that are relevant to concept-based interpretable deep learning. We discuss several of these papers in our tutorial; nevertheless, we thought it would be beneficial to collect them in one list so that these works are easier to access. Please keep in mind that this is in no way an exhaustive list of important works in concept learning: this is a fast-moving field and we have only so much space we can use here. We nonetheless hope you find this list helpful if you want to get a sense of where the field is and where it is heading.

Concept Learning Surveys

These are some of the surveys that touch on concept representation learning and its use in interpretable/explainable AI:

2023

  1. Concept-based Explainable Artificial Intelligence: A Survey
    Eleonora Poeta ,  Gabriele Ciravegna ,  Eliana Pastor , and 2 more authors
    arXiv preprint arXiv:2312.12936, 2023

2022

  1. Concept embedding analysis: A review
    Gesina Schwalbe
    arXiv preprint arXiv:2203.13909, 2022

2020

  1. Explainable AI: A review of machine learning interpretability methods
    Pantelis Linardatos ,  Vasilis Papastefanopoulos ,  and  Sotiris Kotsiantis
    Entropy, 2020

Various Aspects of XAI

Similarly, there are several key surveys/works that discuss formalisms, definitions, and limitations of key ideas in the general field of XAI. These works touch upon definitions of what it means to explain a model and on some of the issues of so-called “traditional” XAI approaches (e.g., saliency methods):

2023

  1. Dear XAI community, we need to talk! Fundamental misconceptions in current XAI research
    Timo Freiesleben ,  and  Gunnar König
    In World Conference on Explainable Artificial Intelligence , 2023

2022

  1. The Disagreement Problem in Explainable Machine Learning: A Practitioner’s Perspective
    Satyapriya Krishna ,  Tessa Han ,  Alex Gu , and 3 more authors
    Transactions on Machine Learning Research, 2022
  2. How cognitive biases affect XAI-assisted decision-making: A systematic review
    Astrid Bertrand ,  Rafik Belloum ,  James R Eagan , and 1 more author
    In Proceedings of the 2022 AAAI/ACM Conference on AI, Ethics, and Society , 2022

2021

  1. A historical perspective of explainable artificial intelligence
    Roberto Confalonieri ,  Ludovik Coba ,  Benedikt Wagner , and 1 more author
    Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 2021
  2. Notions of explainability and evaluation approaches for explainable artificial intelligence
    Giulia Vilone ,  and  Luca Longo
    Information Fusion, 2021

2020

  1. Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
    Alejandro Barredo Arrieta ,  Natalia Díaz-Rodríguez ,  Javier Del Ser , and 8 more authors
    Information Fusion, 2020

2019

  1. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
    Cynthia Rudin
    Nature Machine Intelligence, 2019
  2. Explanations can be manipulated and geometry is to blame
    Ann-Kathrin Dombrowski ,  Maximillian Alber ,  Christopher Anders , and 3 more authors
    Advances in neural information processing systems, 2019
  3. Interpretation of neural networks is fragile
    Amirata Ghorbani ,  Abubakar Abid ,  and  James Zou
    In Proceedings of the AAAI conference on artificial intelligence , 2019

2018

  1. Explaining explanations: An overview of interpretability of machine learning
    Leilani H Gilpin ,  David Bau ,  Ben Z Yuan , and 3 more authors
    In 2018 IEEE 5th International Conference on data science and advanced analytics (DSAA) , 2018
  2. Explainable AI: the new 42?
    Randy Goebel ,  Ajay Chander ,  Katharina Holzinger , and 5 more authors
    In International cross-domain conference for machine learning and knowledge extraction , 2018
  3. Sanity checks for saliency maps
    Julius Adebayo ,  Justin Gilmer ,  Michael Muelly , and 3 more authors
    Advances in neural information processing systems, 2018

Supervised Concept Learning

Here we include some relevant works in concept representation learning that assume concept labels are provided in some manner and use them to learn concept representations from which explanations can then be constructed:
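As a rough illustration of this setting, many of the works below follow the concept bottleneck recipe: the model first predicts a set of human-interpretable concepts and then uses only those concepts to predict the task label, which also enables test-time concept interventions. The minimal numpy sketch below is our own illustrative toy (all dimensions, weights, and function names are invented here, not taken from any specific paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy concept bottleneck: input -> k concept scores -> task label.
# Dimensions and (random) weights are purely illustrative.
n_features, n_concepts, n_classes = 8, 3, 2
W_concept = rng.normal(size=(n_features, n_concepts))  # input -> concepts
W_task = rng.normal(size=(n_concepts, n_classes))      # concepts -> label

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict(x, interventions=None):
    """Return (concept probabilities, class logits); `interventions`
    optionally overrides concept predictions with expert-provided 0/1
    values (a test-time concept intervention)."""
    c = sigmoid(x @ W_concept)          # predicted concept probabilities
    if interventions:
        for idx, value in interventions.items():
            c[idx] = value              # replace a concept with its true value
    return c, c @ W_task                # concepts are the only path to the label

x = rng.normal(size=n_features)
concepts, logits = predict(x)
_, logits_fixed = predict(x, interventions={0: 1.0})  # correct concept 0
```

Because the label depends on the input only through the concept layer, fixing a mispredicted concept (as in `logits_fixed`) directly changes the downstream prediction.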

2024

  1. Do Concept Bottleneck Models Respect Localities?
    Naveen Raman ,  Mateo Espinosa Zarlenga ,  Juyeon Heo , and 1 more author
    NeurIPS Workshop on XAI in Action, 2024
  2. Understanding inter-concept relationships in concept-based models
    Naveen Raman ,  Mateo Espinosa Zarlenga ,  and  Mateja Jamnik
    In Proceedings of the 41st International Conference on Machine Learning , 2024
  3. A Neuro-Symbolic Benchmark Suite for Concept Quality and Reasoning Shortcuts
    Samuele Bortolotti ,  Emanuele Marconato ,  Tommaso Carraro , and 5 more authors
    In The Thirty-eight Conference on Neural Information Processing Systems Datasets and Benchmarks Track , 2024
  4. Energy-Based Concept Bottleneck Models: Unifying Prediction, Concept Intervention, and Probabilistic Interpretations
    Xinyue Xu ,  Yi Qin ,  Lu Mi , and 2 more authors
    In The Twelfth International Conference on Learning Representations , 2024
  5. Concept-Based Interpretable Reinforcement Learning with Limited to No Human Labels
    Zhuorui Ye ,  Stephanie Milani ,  Fei Fang , and 1 more author
    In Workshop on Interpretable Policies in Reinforcement Learning at RLC-2024 , 2024
  6. Learning to Intervene on Concept Bottlenecks
    David Steinmann ,  Wolfgang Stammer ,  Felix Friedrich , and 1 more author
    In Forty-first International Conference on Machine Learning , 2024
  7. Stochastic Concept Bottleneck Models
    Moritz Vandenhirtz ,  Sonia Laguna ,  Ričards Marcinkevičs , and 1 more author
    In The Thirty-eighth Annual Conference on Neural Information Processing Systems , 2024
  8. Beyond concept bottleneck models: How to make black boxes intervenable?
    Sonia Laguna Cillero ,  Ričards Marcinkevičs ,  Moritz Vandenhirtz , and 1 more author
    In The Thirty-eighth Annual Conference on Neural Information Processing Systems , 2024

2023

  1. Concept Gradient: Concept-based Interpretation Without Linear Assumption
    Andrew Bai ,  Chih-Kuan Yeh ,  Pradeep Ravikumar , and 2 more authors
    ICLR, 2023
  2. Learning to Receive Help: Intervention-Aware Concept Embedding Models
    Mateo Espinosa Zarlenga ,  Katherine M Collins ,  Krishnamurthy Dvijotham , and 3 more authors
    NeurIPS, 2023
  3. Label-Free Concept Bottleneck Models
    Tuomas Oikarinen ,  Subhro Das ,  Lam M Nguyen , and 1 more author
    ICLR, 2023
  4. Post-hoc concept bottleneck models
    Mert Yuksekgonul ,  Maggie Wang ,  and  James Zou
    ICLR, 2023
  5. Probabilistic Concept Bottleneck Models
    Eunji Kim ,  Dahuin Jung ,  Sangha Park , and 2 more authors
    ICML, 2023
  6. Understanding and enhancing robustness of concept-based models
    Sanchit Sinha ,  Mengdi Huai ,  Jianhui Sun , and 1 more author
    In Proceedings of the AAAI Conference on Artificial Intelligence , 2023
  7. Concept correlation and its effects on concept-based models
    Lena Heidemann ,  Maureen Monnet ,  and  Karsten Roscher
    In Proceedings of the ieee/cvf winter conference on applications of computer vision , 2023
  8. Towards robust metrics for concept representation evaluation
    Mateo Espinosa Zarlenga ,  Pietro Barbiero ,  Zohreh Shams , and 4 more authors
    In Proceedings of the AAAI Conference on Artificial Intelligence , 2023
  9. Interpretability is in the mind of the beholder: A causal framework for human-interpretable representation learning
    Emanuele Marconato ,  Andrea Passerini ,  and  Stefano Teso
    Entropy, 2023
  10. A closer look at the intervention procedure of concept bottleneck models
    Sungbin Shin ,  Yohan Jo ,  Sungsoo Ahn , and 1 more author
    In International Conference on Machine Learning , 2023
  11. Interactive concept bottleneck models
    Kushal Chauhan ,  Rishabh Tiwari ,  Jan Freyberg , and 2 more authors
    In Proceedings of the AAAI Conference on Artificial Intelligence , 2023

2022

  1. Concept activation regions: A generalized framework for concept-based explanations
    Jonathan Crabbé ,  and  Mihaela Schaar
    Advances in Neural Information Processing Systems, 2022
  2. Glancenets: Interpretable, leak-proof concept-based models
    Emanuele Marconato ,  Andrea Passerini ,  and  Stefano Teso
    Advances in Neural Information Processing Systems, 2022
  3. Concept embedding models: Beyond the accuracy-explainability trade-off
    Mateo Espinosa Zarlenga ,  Pietro Barbiero ,  Gabriele Ciravegna , and 8 more authors
    Advances in Neural Information Processing Systems, 2022
  4. Addressing leakage in concept bottleneck models
    Marton Havasi ,  Sonali Parbhoo ,  and  Finale Doshi-Velez
    Advances in Neural Information Processing Systems, 2022
  5. Learning from uncertain concepts via test time interventions
    Ivaxi Sheth ,  Aamer Abdul Rahman ,  Laya Rafiee Sevyeri , and 2 more authors
    In Workshop on Trustworthy and Socially Responsible Machine Learning, NeurIPS 2022 , 2022

2021

  1. Do concept bottleneck models learn as intended?
    Andrei Margeloiu ,  Matthew Ashman ,  Umang Bhatt , and 3 more authors
    ICLR Workshop on Responsible AI, 2021
  2. Promises and pitfalls of black-box concept learning models
    Anita Mahinpei ,  Justin Clark ,  Isaac Lage , and 2 more authors
    ICML Workshop on Theoretic Foundation, Criticism, and Application Trend of Explainable AI, 2021

2020

  1. Concept bottleneck models
    Pang Wei Koh ,  Thao Nguyen ,  Yew Siang Tang , and 4 more authors
    In International conference on machine learning , 2020
  2. Concept whitening for interpretable image recognition
    Zhi Chen ,  Yijie Bei ,  and  Cynthia Rudin
    Nature Machine Intelligence, 2020
  3. MEME: generating RNN model explanations via model extraction
    Dmitry Kazhdan ,  Botty Dimanov ,  Mateja Jamnik , and 1 more author
    NeurIPS HAMLETS Workshop, 2020

2018

  1. Net2vec: Quantifying and explaining how concepts are encoded by filters in deep neural networks
    Ruth Fong ,  and  Andrea Vedaldi
    In Proceedings of the IEEE conference on computer vision and pattern recognition , 2018
  2. Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (TCAV)
    Been Kim ,  Martin Wattenberg ,  Justin Gilmer , and 4 more authors
    In International conference on machine learning , 2018

2017

  1. Network dissection: Quantifying interpretability of deep visual representations
    David Bau ,  Bolei Zhou ,  Aditya Khosla , and 2 more authors
    In Proceedings of the IEEE conference on computer vision and pattern recognition , 2017
  2. Feature visualization
    Chris Olah ,  Alexander Mordvintsev ,  and  Ludwig Schubert
    Distill, 2017

Unsupervised Concept Learning

In contrast to the works above, the following papers attempt to learn concept representations without implicit or explicit concept labels. This is done by means of concept discovery and represents a particularly active area of research in this field:
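One common recipe in this space, loosely following automatic concept-extraction approaches, is to cluster a model's hidden activations and treat each cluster centroid as a candidate concept vector. The toy sketch below (synthetic stand-in activations, invented dimensions, plain k-means) illustrates the idea under those assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-in for hidden activations of many input patches: two
# well-separated groups, as if the network encoded two distinct concepts.
activations = np.vstack([
    rng.normal(loc=0.0, scale=0.1, size=(50, 4)),
    rng.normal(loc=5.0, scale=0.1, size=(50, 4)),
])

def kmeans(points, k, n_iters=20):
    """Plain k-means; each resulting centroid is a candidate 'concept'."""
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iters):
        # Assign each activation to its nearest centroid.
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=-1)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its assigned activations
        # (keeping the old centroid if a cluster happens to be empty).
        centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
    return centroids, labels

concept_vectors, assignments = kmeans(activations, k=2)
```

In the papers listed below the clustered objects are typically activations of real networks (e.g., over image segments), and the discovered directions are then scored for how much they matter to the model's predictions.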

2025

  1. Explanation Bottleneck Models
    Shin’ya Yamaguchi ,  and  Kosuke Nishida
    AAAI, 2025

2023

  1. Label-Free Concept Bottleneck Models
    Tuomas Oikarinen ,  Subhro Das ,  Lam M Nguyen , and 1 more author
    ICLR, 2023
  2. Tabcbm: Concept-based interpretable neural networks for tabular data
    Mateo Espinosa Zarlenga ,  Zohreh Shams ,  Michael Edward Nelson , and 2 more authors
    Transactions on Machine Learning Research, 2023
  3. Bridging the human-ai knowledge gap: Concept discovery and transfer in alphazero
    Lisa Schut ,  Nenad Tomasev ,  Tom McGrath , and 3 more authors
    arXiv preprint arXiv:2310.16410, 2023
  4. Global concept-based interpretability for graph neural networks via neuron analysis
    Han Xuanyuan ,  Pietro Barbiero ,  Dobrik Georgiev , and 2 more authors
    In Proceedings of the AAAI conference on artificial intelligence , 2023

2021

  1. Gcexplainer: Human-in-the-loop concept-based explanations for graph neural networks
    Lucie Charlotte Magister ,  Dmitry Kazhdan ,  Vikash Singh , and 1 more author
    3rd ICML Workshop on Human in the Loop Learning, 2021

2020

  1. On completeness-aware concept-based explanations in deep neural networks
    Chih-Kuan Yeh ,  Been Kim ,  Sercan Arik , and 3 more authors
    Advances in neural information processing systems, 2020

2019

  1. Towards automatic concept-based explanations
    Amirata Ghorbani ,  James Wexler ,  James Y Zou , and 1 more author
    Advances in neural information processing systems, 2019

2018

  1. Towards robust interpretability with self-explaining neural networks
    David Alvarez Melis ,  and  Tommi Jaakkola
    Advances in neural information processing systems, 2018

Reasoning with Concepts

Finally, we include some papers that describe very interesting things one can do once one has learnt some concept representations (regardless of whether those representations were learnt with or without concept supervision). These works are closely related to the field of neuro-symbolic reasoning, and we discuss them in more detail in our presentation:
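To give a flavour of this direction: once a model exposes concept predictions, one can layer symbolic rules on top of them. The snippet below is entirely hypothetical (the concept names, scores, and the rule are invented for illustration); it simply evaluates a logic rule over thresholded concept scores:

```python
# Hypothetical concept scores emitted by some concept-based model for one input.
concept_scores = {"has_wings": 0.92, "has_beak": 0.87, "has_fur": 0.04}

def truth(scores, concept, threshold=0.5):
    """Treat a concept as 'true' when its predicted score clears a threshold."""
    return scores[concept] >= threshold

# A symbolic rule over concepts: bird := has_wings AND has_beak AND NOT has_fur.
def is_bird(scores):
    return (truth(scores, "has_wings")
            and truth(scores, "has_beak")
            and not truth(scores, "has_fur"))

prediction = is_bird(concept_scores)
```

The works below go far beyond this sketch, e.g., by learning such rules from data, making them differentiable, or executing probabilistic logic programs over neural predictions.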

2024

  1. DiConStruct: Causal Concept-based Explanations through Black-Box Distillation
    Ricardo Moreira ,  Jacopo Bono ,  Mário Cardoso , and 3 more authors
    arXiv preprint arXiv:2401.08534, 2024

2023

  1. Logic explained networks
    Gabriele Ciravegna ,  Pietro Barbiero ,  Francesco Giannini , and 4 more authors
    Artificial Intelligence, 2023
  2. Interpretable Neural-Symbolic Concept Reasoning
    Pietro Barbiero ,  Gabriele Ciravegna ,  Francesco Giannini , and 7 more authors
    ICML, 2023

2022

  1. Logic tensor networks
    Samy Badreddine ,  Artur d’Avila Garcez ,  Luciano Serafini , and 1 more author
    Artificial Intelligence, 2022
  2. Entropy-based logic explanations of neural networks
    Pietro Barbiero ,  Gabriele Ciravegna ,  Francesco Giannini , and 3 more authors
    In Proceedings of the AAAI Conference on Artificial Intelligence , 2022
  3. Algorithmic concept-based explainable reasoning
    Dobrik Georgiev ,  Pietro Barbiero ,  Dmitry Kazhdan , and 2 more authors
    In Proceedings of the AAAI Conference on Artificial Intelligence , 2022

2021

  1. Neural algorithmic reasoning
    Petar Veličković ,  and  Charles Blundell
    Patterns, 2021
  2. Meaningfully explaining model mistakes using conceptual counterfactuals
    Abubakar Abid ,  Mert Yuksekgonul ,  and  James Zou
    ICML, 2021

2020

  1. Neural execution of graph algorithms
    Petar Veličković ,  Rex Ying ,  Matilde Padovano , and 2 more authors
    ICLR, 2020

2019

  1. Explaining classifiers with causal concept effect (cace)
    Yash Goyal ,  Amir Feder ,  Uri Shalit , and 1 more author
    arXiv preprint arXiv:1907.07165, 2019

2018

  1. Learning explanatory rules from noisy data
    Richard Evans ,  and  Edward Grefenstette
    Journal of Artificial Intelligence Research, 2018
  2. Deepproblog: Neural probabilistic logic programming
    Robin Manhaeve ,  Sebastijan Dumancic ,  Angelika Kimmig , and 2 more authors
    Advances in neural information processing systems, 2018

2016

  1. Neural gpus learn algorithms
    Łukasz Kaiser ,  and  Ilya Sutskever
    ICLR, 2016

2014

  1. Learning to execute
    Wojciech Zaremba ,  and  Ilya Sutskever
    arXiv preprint arXiv:1410.4615, 2014


Concept-Learning Public Codebases

Below we list some open-source concept-based libraries. As with our reference material, this is by no means an exhaustive list, but rather one that contains libraries we have had the chance to interact with in the past. If you have a library related to concept learning that you would like to see listed, please do not hesitate to contact us and we will add it.