math + coding

Mark Ibrahim

Staff AI Researcher · FAIR, Meta Superintelligence Lab · New York

Building dependable multimodal & computer-use agents.

Co-author — Self-Supervised Learning Cookbook (with Yann LeCun)
30+ publications
3 spotlight awards
2 Oral Awards · top 1%
~10 years of AI research

I’m interested in how we can build reliable multimodal AI systems that can discover and compose the right first principles — from foundation-model training and evaluation to the computer-use agents they power.

SELECTED RESEARCH

Latest in Google Scholar

๐Ÿ—ž๏ธ News & Updates

๐Ÿ–ฑ๏ธ Computer-Use Agents

OpenApps: simulating app variations for computer-use agent reliability

★ ICLR 2026 Oral · top 1% ★ Oral · NE Agents Day

We open-source OpenApps, a Python research environment that generates endless versions of six real apps — with ground-truth state and rewards, on a single CPU — to train and evaluate computer-use agents across the variations they break on in deployment.

Integrated into OpenEnv (the Hugging Face / PyTorch RL environment) and BrowserGym · oral at NE Agents Day
📣 code release, 📃 paper, and 🎬 video tutorial

Karen Ullrich, Jingtong Su, Claudia Shi, Arjun Subramonian, Amir Bar, Ivan Evtimov, Nikolaos Tsilivis, Randall Balestriero, Julia Kempe, Mark Ibrahim

๐Ÿ–ผ๏ธ Multimodal Training & Evaluation

Common-O: Hallucination in Visual Reasoning Across Scenes

NeurIPS 2025

Multimodal models can perceive objects but hallucinate when reasoning across scenes. Common-O is a decontaminated multi-scene reasoning benchmark on which today's best models score under 25%.

🔥 30k+ Hugging Face downloads
> paper + data

Candace Ross, Florian Bordes, Adina Williams, Polina Kirichenko, Mark Ibrahim

LLIP: Latent-Language Image Pretraining

ICML 2024

A state-of-the-art open-weight vision encoder (ViT-G) with optimized visual cross-attention. Scaled to 5B samples, LLIP outperforms MetaCLIP by an average of 2.9% across 22 zero-shot benchmarks and 6% R@1 on COCO retrieval.

> paper + weights

Samuel Lavoie, Polina Kirichenko, Mark Ibrahim, Mahmoud Assran, Andrew Gordon Wilson, Aaron Courville, Nicolas Ballas

UniBench: Visual Reasoning Requires Rethinking Vision-Language Beyond Scaling

NeurIPS 2024

A 50+ benchmark suite of vision-language capabilities revealing that scaling alone doesn't improve visual reasoning — evaluate a model across 7 capability types on 1 GPU in minutes.

> paper

Haider Al-Tahan, Quentin Garrido, Randall Balestriero, Diane Bouchacourt, Caner Hazirbas, Mark Ibrahim

\(\mathbb{X}\)-Sample Contrastive Loss: Improving Contrastive Learning with Sample Similarity Graphs

ICLR 2025

A graph-based contrastive loss that explicitly encodes relationships across samples during training, improving efficiency and robustness.

> paper

Vlad Sobal, Mark Ibrahim, Randall Balestriero, Vivien Cabannes, Diane Bouchacourt, Pietro Astolfi, Kyunghyun Cho, Yann LeCun
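
As a rough illustration of the idea behind a graph-based contrastive loss, the sketch below replaces the usual one-hot contrastive targets with soft targets derived from a sample-similarity graph. It is a minimal sketch under stated assumptions, not the paper's implementation: the function names, the temperatures, and the toy similarity matrices are all hypothetical.

```python
import math

def log_softmax(row, temperature):
    """Numerically stable log-softmax over a list of floats."""
    scaled = [v / temperature for v in row]
    peak = max(scaled)
    log_total = peak + math.log(sum(math.exp(v - peak) for v in scaled))
    return [v - log_total for v in scaled]

def soft_contrastive_loss(pred_sim, graph_sim, pred_temp=0.1, graph_temp=0.1):
    """Mean cross-entropy between graph-derived soft targets and predictions.

    pred_sim[i][j]:  similarity of samples i and j under the model (assumed input).
    graph_sim[i][j]: similarity of samples i and j in an external graph
                     (hypothetical here, e.g. caption or label similarity).
    """
    total = 0.0
    for pred_row, graph_row in zip(pred_sim, graph_sim):
        # Soft targets: each row of the graph becomes a distribution over samples.
        targets = [math.exp(v) for v in log_softmax(graph_row, graph_temp)]
        log_probs = log_softmax(pred_row, pred_temp)
        total -= sum(t * lp for t, lp in zip(targets, log_probs))
    return total / len(pred_sim)

# Toy example: samples 1 and 2 are related in the graph, sample 0 is not.
graph = [[1.0, 0.0, 0.0],
         [0.0, 1.0, 0.8],
         [0.0, 0.8, 1.0]]
aligned = graph                      # model similarities agree with the graph
misaligned = [[1.0, 0.8, 0.0],
              [0.8, 1.0, 0.0],
              [0.0, 0.0, 1.0]]      # model groups the wrong pair together

loss_aligned = soft_contrastive_loss(aligned, graph)
loss_misaligned = soft_contrastive_loss(misaligned, graph)
```

With an identity graph and a very small `graph_temp`, the soft targets collapse to one-hot and the loss reduces to the standard per-sample contrastive cross-entropy, which is one way to see the graph as a generalization of the usual positive/negative labels.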

🧭 Alignment & Reasoning

Learning to Reason in 13 Parameters (TinyLoRA)

🔥 Front page of Hacker News

We show GRPO needs to update as few as 13 parameters (26 bytes) to bring Qwen2.5-8B within 5% of full-finetuning GSM8K performance — recovering 90% of reasoning gains while training 1000× fewer parameters.

> paper

John X. Morris, Niloofar Mireshghallah, Mark Ibrahim, Saeed Mahloujifar

AbstentionBench: Reasoning LLMs Fail on Unanswerable Questions

NeurIPS 2025

We evaluate LLMs' capacity for abstention — the skill of knowing when NOT to answer. We find reasoning LLMs struggle with unanswerable questions and hallucinate.

Used by OpenAI in the GPT-5 system card · adopted by the UK AI Security Institute's Inspect Evals · cited in the MuseSpark Preparedness Report
> paper + code + data

Polina Kirichenko*, Samuel J. Bell*, Kamalika Chaudhuri, Mark Ibrahim*

The Factorization Curse: Which Tokens You Predict Underlie the Reversal Curse and More

NeurIPS 2024

We show that training transformers to predict multiple tokens ahead and back (instead of just the single next token) improves models' ability to retrieve knowledge.

> paper

Ouail Kitouni, Niklas Nolte, Diane Bouchacourt, Adina Williams, Mike Rabbat, Mark Ibrahim

In a follow-up, the same objective improves a transformer's ability to plan in maze navigation (MLM-U), converging 2× faster in GPU hours.
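
To make "predicting tokens ahead and back" concrete, the sketch below builds one training example by masking tokens at a rate drawn uniformly per sequence, so the model must fill in tokens on either side of the visible context. This is a minimal sketch assuming a masked-prediction setup with a uniformly sampled masking rate; the `MASK` symbol and the function name are hypothetical, and the actual objective in the paper may differ in detail.

```python
import random

MASK = "[MASK]"  # hypothetical mask symbol, for illustration only

def uniform_rate_mask(tokens, rng):
    """Mask each token with probability r, where r ~ Uniform(0, 1) per sequence.

    Returns (inputs, targets). targets holds the original token at masked
    positions and None elsewhere, so the prediction targets can sit before
    or after the visible tokens rather than only at the next position.
    """
    rate = rng.random()
    inputs, targets = [], []
    for tok in tokens:
        if rng.random() < rate:
            inputs.append(MASK)
            targets.append(tok)
        else:
            inputs.append(tok)
            targets.append(None)
    return inputs, targets

rng = random.Random(0)
tokens = "Paris is the capital of France".split()
inputs, targets = uniform_rate_mask(tokens, rng)
```

Because the rate varies from sequence to sequence, some examples hide almost nothing and others hide almost everything, which exposes the model to many different ways of splitting a sequence into context and targets.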

🧪 Self-Supervised Learning & Generalization

Discovering Environments with XRM

★ ICML 2024 · top 1%

A method to automatically discover the spurious environments that break out-of-distribution generalization — without human annotations.

> paper

Mohammad Pezeshki, Diane Bouchacourt, Mark Ibrahim, Nicolas Ballas, Pascal Vincent, David Lopez-Paz

ImageNet-X: Understanding Model Mistakes with Factor of Variation Annotations

★ ICLR 2023 Spotlight · top 5%

We find surprisingly similar strengths and vulnerabilities across more than 2,200 deep learning models.

> paper + website

Badr Youbi Idrissi, Diane Bouchacourt, Randall Balestriero, Ivan Evtimov, Caner Hazirbas, Nicolas Ballas, Pascal Vincent, Michal Drozdzal, David Lopez-Paz, Mark Ibrahim

Shortcuts Come in Multiples Where Mitigating One Amplifies Others

CVPR 2023

A study of how deep learning techniques cope with multiple shortcuts, together with a method for mitigating them (Whac-A-Mole).

> paper + code

Zhiheng Li, Ivan Evtimov*, Albert Gordo, Caner Hazirbas, Tal Hassner, Cristian Canton Ferrer, Chenliang Xu, Mark Ibrahim*

Does Progress on Object Recognition Benchmarks Improve Real-World Generalization?

ICLR 2024

Progress on standard benchmarks fails to improve — and can worsen — geographic disparities in today's best models.

> paper

Megan Richards, Polina Kirichenko, Diane Bouchacourt, Mark Ibrahim

Self-Supervised Learning Cookbook: recipes for training and evaluating self-supervised learning systems, co-authored with Randall Balestriero, Yann LeCun, and many others.

Featured on the Meta AI blog

Earlier work

Global Explanations for Neural Networks: Mapping the Landscape of Predictions

AAAI 2019
> paper + open source library + blog post

Mark Ibrahim, Melissa Louie, Ceena Modarres, John Paisley (Columbia University)

Talks

ICLR Oral Presentation (May 2026)
OpenApps: simulating app variations for computer-use agent reliability

World Modeling Workshop at Mila (Feb 2026)
OpenApps: World Models for Computer-Use

NeurIPS Highlights (Dec 2025)
Slides

Self Supervised Learning: The Final Frontier of AI at the Simons Flatiron Institute (April 2025)
Lightning Talk on Latent Space Prediction
Organized by Randall Balestriero, Yann LeCun, Alberto Bietti, and Shirley Ho

NeurIPS Self-Supervised Learning Workshop Oral (Dec 2024)
Occam's Razor: What's sufficient for learning good self-supervised representations?

Brown University talk on Robust Representation Learning (2024)
From Vision to Multimodal Self-Supervised Models

ICML Tutorial on Self-Supervised Learning (2023) (400+ researchers attended)
From Research Advances to Best Practices (slides and recording)

Georgia Tech's Deep Learning Course Instructor (2022) (10k+ online students)
Lecture on "Feed Forward Neural Networks"

PyCon US 2020 (Python Conference)
Talk on "Machine Learning on Encrypted Data with CrypTen"

NeurIPS 2018 FEAP Workshop Spotlight Talk (Dec 2018)
"Towards Explainable Deep Learning for Credit Lending"

New York Python Meetup (Dec 2018)
Data Science Talk: "Explaining Deep Learning Models"

Applied Machine Learning Tom Tom Conference (April 2018)
"Explainable AI: Key Techniques and Societal Implications"

George Washington University, Data Driven Conference (Dec 2017)
"Understanding the Predictions of Deep Neural Networks"

NYC Data Wranglers Meetup (Aug 2016)
Data Science in Practice: "Building a Graph-Based Search Engine"


Advising & Mentorship


Research Internship Advising: Ouail Kitouni (MIT PhD, now Research Scientist at Anthropic), Mazda Moayeri (U Maryland PhD student), Cian Eastwood (now Senior Research Scientist at Valence Labs), Karsten Roth (PhD student, Google DeepMind/Tübingen).

AI Residents: Megan Richards (incoming NYU PhD student with Prof. Kyunghyun Cho), Haider Al-Tahan (incoming Georgia Tech PhD student).

Industry PhD Advisor: Polina Kirichenko (NYU PhD, now Research Scientist at FAIR, Meta AI).


Courses Taught at the University of Vermont


Calculus I: 71 eager minds,

Calculus II: 38 étudiants, and

College Algebra: 42 estudiantes.