math + software


I'm interested in how we can build AI systems that can solve new tasks reliably by discovering and composing the right first principles.

SELECTED RESEARCH

Latest on Google Scholar

The Factorization Curse: Which Tokens You Predict Underlie the Reversal Curse and More

We show that training transformers to predict multiple tokens ahead and back (instead of just the single next token) improves models' ability to retrieve knowledge. NeurIPS 2024

> paper

Ouail Kitouni, Niklas Nolte, Diane Bouchacourt, Adina Williams, Mike Rabbat, Mark Ibrahim

In a follow-up, we find the same objective, without tweaks, can improve transformers' ability to plan in maze navigation!
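The objective can be sketched as uniform-rate masked prediction: hide a random fraction of positions and predict them from the surrounding context in both directions, rather than always predicting left-to-right. This is a minimal illustrative sketch (function and variable names are mine, not the paper's code):

```python
import random

def make_uniform_mask_example(tokens, mask_id, rng=random):
    """Illustrative sketch of a factorization-agnostic training example:
    mask a uniformly sampled fraction of positions and ask the model to
    predict them from the remaining context, so supervision covers tokens
    'ahead and back', not just the single next token."""
    rate = rng.random()  # masking rate drawn uniformly from [0, 1)
    inputs, targets = [], []
    for tok in tokens:
        if rng.random() < rate:
            inputs.append(mask_id)   # hide this token from the model...
            targets.append(tok)      # ...and make it a prediction target
        else:
            inputs.append(tok)
            targets.append(None)     # no loss at visible positions
    return inputs, targets
```

Sampling the rate uniformly (instead of fixing it, as in BERT-style masking) exposes the model to every conditioning pattern, which is the intuition behind why one objective helps both retrieval and planning.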


\(\mathbb{X}\)-Sample Contrastive Loss: Improving Contrastive Learning with Sample Similarity Graphs

We propose a graph-based contrastive loss that explicitly encodes relationships across samples during training.

> paper

Vlad Sobal, Mark Ibrahim, Randall Balestriero, Vivien Cabannes, Diane Bouchacourt, Pietro Astolfi, Kyunghyun Cho, Yann LeCun
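The idea can be sketched as a soft-target variant of the InfoNCE objective, where each sample's target distribution comes from a row-normalized similarity graph over the batch instead of a one-hot label. A minimal NumPy sketch with hypothetical names (not the paper's reference implementation):

```python
import numpy as np

def _log_softmax(x):
    # Numerically stable row-wise log-softmax.
    m = x.max(axis=1, keepdims=True)
    return x - (m + np.log(np.exp(x - m).sum(axis=1, keepdims=True)))

def xsample_contrastive_loss(z_a, z_b, graph, temperature=0.1):
    """Graph-based contrastive loss sketch: cross-entropy between
    soft targets (a row-normalized cross-sample similarity graph)
    and the softmax over pairwise embedding similarities."""
    # Normalize embeddings so dot products are cosine similarities.
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / temperature              # batch x batch similarities
    targets = graph / graph.sum(axis=1, keepdims=True)  # soft targets per row
    return -(targets * _log_softmax(logits)).sum(axis=1).mean()
```

With an identity graph this reduces to the standard one-positive-per-row contrastive setup; a denser graph spreads target mass to similar samples rather than treating them as negatives.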


Does Progress On Object Recognition Benchmarks Improve Real-World Generalization?

Progress on standard benchmarks fails to reduce geographic disparities in today's best models. ICLR 2024

> paper

Megan Richards, Polina Kirichenko, Diane Bouchacourt, Mark Ibrahim


Shortcuts Come in Multiples Where Mitigating One Amplifies Others

We present a method for, and a study of, how deep learning techniques cope with multiple shortcuts. CVPR 2023

> paper + code

Zhiheng Li, Ivan Evtimov*, Albert Gordo, Caner Hazirbas, Tal Hassner, Cristian Canton Ferrer, Chenliang Xu, Mark Ibrahim*


ImageNet-X: Understanding Model Mistakes with Factor of Variation Annotations

We find surprisingly similar strengths and vulnerabilities across more than 2,200 deep learning models. ICLR Spotlight 2023

> paper + website

Badr Youbi Idrissi, Diane Bouchacourt, Randall Balestriero, Ivan Evtimov, Caner Hazirbas, Nicolas Ballas, Pascal Vincent, Michal Drozdzal, David Lopez-Paz, Mark Ibrahim



Global Explanations for Neural Networks

Mapping the Landscape of Predictions. AAAI/ACM Conference on AI, Ethics, and Society (AIES) 2019

> paper + open source library + blog post

Mark Ibrahim, Melissa Louie, Ceena Modarres, John Paisley (Columbia University)



A Cookbook of Self-Supervised Learning

Co-authored with Randall Balestriero, Yann LeCun, and many others.

Talks

NeurIPS Self-Supervised Learning Workshop Oral (Dec 2024)
Occam's Razor: What's sufficient for learning good self-supervised representations?

Brown University talk on Robust Representation Learning (2024)
From Vision to Multimodal Self-Supervised Models

ICML Tutorial on Self-Supervised Learning (2023) (400+ researchers attended)
From Research Advances to Best Practices (slides and recording)

Georgia Tech's Deep Learning Course Instructor (2022) (10k+ online students)
Lecture on "Feed Forward Neural Networks"

PyCon US 2020 (Python Conference)
Talk on "Machine Learning on Encrypted Data with CrypTen"

NeurIPS 2018 FEAP Workshop Spotlight Talk (Dec 2018)
"Towards Explainable Deep Learning for Credit Lending"

New York Python Meetup (Dec 2018)
Data Science Talk: "Explaining Deep Learning Models"

Applied Machine Learning Tom Tom Conference (April 2018)
"Explainable AI: Key Techniques and Societal Implications"

George Washington University, Data Driven Conference (Dec 2017)
"Understanding the Predictions of Deep Neural Networks"

NYC Data Wranglers Meetup (Aug 2016)
Data Science in Practice: "Building a Graph-Based Search Engine"


Advising & Mentorship


Research Internship Advising: Ouail Kitouni (MIT PhD, now Anthropic Research Scientist), Mazda Moayeri (University of Maryland PhD student), Cian Eastwood (now Senior Research Scientist at Valence Labs), Karsten Roth (PhD student, Google DeepMind/Tübingen).

AI Residents: Megan Richards (incoming NYU PhD student with Prof. Kyunghyun Cho), Haider Al-Tahan (incoming Georgia Tech PhD student)

Industry PhD Advisor: Polina Kirichenko (NYU PhD, now Research Scientist at FAIR, Meta AI).


Courses Taught at the University of Vermont


Calculus I: 71 eager minds,

Calculus II: 38 students, and

College Algebra: 42 students.