LEDOM: An Open and Fundamental Reverse Language Model
ACL 2026
We train the first open right-to-left language model at scale (2B/7B, 435B tokens). LEDOM develops unique capabilities like abductive inference and question synthesis, and its Reverse Reward mechanism boosts strong forward models by up to 6.6% on AIME 2024 and 15% on AMC 2023 -- purely at inference time.
What reasoning patterns emerge when a model conditions on future context to predict the past? We investigate this question by training a right-to-left autoregressive language model at the 2B/7B parameter scale on 435 billion tokens. The resulting model develops distinctive capabilities including abductive inference, question synthesis, and natural resolution of the reversal curse. We propose Reverse Reward, which combines forward and reverse probabilities through noisy channel duality to rerank outputs. This approach achieves performance gains of up to 6.6% on AIME 2024 and 15% on AMC 2023 across multiple strong baselines. We release all models, code, and data publicly.
Thinking Backwards
Every language model you have ever used -- GPT, Claude, Llama -- works the same way: given some text, predict the next token. It is so fundamental that we rarely question it. But what if we flipped this entirely? What if a model were trained to predict the previous token instead?
This is not just a curiosity. LEDOM is the first serious attempt to build a reverse language model at scale. The name gives it away: LEDOM is MODEL spelled backwards. It captures the essence of what we have built -- a language model that thinks in reverse.
How It Works
Building a reverse language model requires rethinking every stage of the pipeline, from data to architecture to evaluation.
Reverse the Data
All training sequences are reversed at the token level. The model sees text right-to-left and learns to predict each token given only the tokens that originally came after it.
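To make the preprocessing concrete, here is a minimal Python sketch. The tokenizer name and the handling of special tokens and document boundaries are illustrative assumptions, not necessarily LEDOM's exact pipeline.

```python
# Minimal sketch of token-level reversal for reverse-LM pretraining.
# Assumes a Hugging Face tokenizer; special tokens and document
# boundaries are handled naively here.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2-7B")

def reverse_tokens(text: str) -> list[int]:
    """Tokenize normally, then reverse the token order.

    A standard next-token objective on the reversed sequence is
    equivalent to predicting the *previous* token of the original text.
    """
    ids = tokenizer(text, add_special_tokens=False)["input_ids"]
    return ids[::-1]

ids = reverse_tokens("The answer is 42.")
print(tokenizer.convert_ids_to_tokens(ids))  # tokens in right-to-left order
```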
Standard Architecture, Opposite Direction
We use a standard decoder-only transformer architecture (the Qwen2 family), identical to forward models. No architectural changes are needed -- once the data is reversed, the causal mask naturally handles right-to-left dependencies.
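To see why nothing in the model has to change, here is a sketch of a single training step (the checkpoint is illustrative and `reverse_tokens` is the helper from the sketch above; this is not LEDOM's training code):

```python
# One causal-LM training step on reversed data: the architecture,
# causal mask, and loss are all standard.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2-0.5B")  # illustrative

input_ids = torch.tensor([reverse_tokens("The answer is 42.")])

# labels == input_ids gives the usual shifted next-token loss; because
# the sequence is reversed, "next token" means the previous token of
# the original text. The causal mask is untouched.
loss = model(input_ids=input_ids, labels=input_ids).loss
loss.backward()
```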
Large-Scale Pretraining
We train on 435B tokens from three domains: 284B general text (DCLM), 103B mathematical data, and 48B code -- matching the data diet of strong forward models.
Reverse Reward Scoring
At inference time, LEDOM scores forward-model candidates by computing how likely each solution is to have led to the given answer -- a natural backward verification via noisy channel duality.
A Different Kind of Intelligence
When you read a sentence forward, you are predicting where it might go. When you read it backward, you are reasoning about how it got there. These are complementary forms of intelligence, and until now, we have only trained models for one of them.
LEDOM exhibits fascinating behaviors that emerge from backward processing. It naturally excels at verification -- checking whether a conclusion follows from given premises -- and it develops unique attention patterns suited for tracking backward dependencies. The same training signal yields abductive inference (reasoning from an observation back to a plausible explanation), question synthesis (generating a question that leads to a given answer), and a natural resolution of the reversal curse.
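As one illustration, question synthesis falls out of backward generation almost for free: feed the reverse model an answer in reversed token order, let it continue, and un-reverse the continuation to read text that plausibly precedes the answer. The checkpoint name and decoding settings below are placeholders, not the released artifacts:

```python
# Hedged sketch of question synthesis with a reverse LM.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("ledom-7b")            # placeholder name
rev_lm = AutoModelForCausalLM.from_pretrained("ledom-7b")  # placeholder name

answer = "Therefore, x = 7."
ids = tok(answer, add_special_tokens=False, return_tensors="pt").input_ids
rev_ids = ids.flip(dims=[1])  # right-to-left order, matching pretraining

out = rev_lm.generate(rev_ids, max_new_tokens=64, do_sample=True)
new_tokens = out[0, rev_ids.shape[1]:]            # the backward continuation
question = tok.decode(new_tokens.flip(dims=[0]))  # un-reverse to read it
print(question)  # text that plausibly precedes the answer
```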
Reverse Reward: The Killer Application
Given multiple candidate answers from a forward model, LEDOM evaluates each by asking "how likely is this reasoning chain to have produced this answer?" This backward likelihood, combined with forward probability through noisy channel duality, serves as a powerful reranking signal. The best part: it requires no additional training -- it is purely an inference-time improvement.
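A sketch of the reranking loop is below. The log-probability interpolation with weight `lam`, the helper names, and the conditioning details are our assumptions; the paper's exact normalization and prompt formatting may differ.

```python
# Hedged sketch of Reverse Reward reranking over candidate solutions.
import torch
import torch.nn.functional as F

@torch.no_grad()
def logprob_of_suffix(model, ids: torch.Tensor, prefix_len: int) -> float:
    """Log-prob of ids[prefix_len:] given ids[:prefix_len] (1-D token ids)."""
    logits = model(input_ids=ids.unsqueeze(0)).logits[0, :-1]
    logp = F.log_softmax(logits, dim=-1)
    token_lp = logp.gather(-1, ids[1:].unsqueeze(-1)).squeeze(-1)
    return token_lp[prefix_len - 1:].sum().item()

def rerank(candidates, x_ids, fwd_lm, rev_lm, lam=0.5):
    """Pick y maximizing (1 - lam) * log P_fwd(y|x) + lam * log P_rev(x|y)."""
    best, best_score = None, float("-inf")
    for y_ids in candidates:
        fwd = logprob_of_suffix(fwd_lm, torch.cat([x_ids, y_ids]), len(x_ids))
        # The reverse model reads right-to-left: reversed(y) is its prefix,
        # and it scores reversed(x) -- how likely the problem is given the
        # solution, i.e. P(x|y).
        rev_seq = torch.cat([y_ids.flip(0), x_ids.flip(0)])
        rev = logprob_of_suffix(rev_lm, rev_seq, len(y_ids))
        score = (1 - lam) * fwd + lam * rev
        if score > best_score:
            best, best_score = y_ids, score
    return best
```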
Reverse Reward Results on Competition Math
| Base Model | Method | GSM8K | MATH-500 | AIME 2024 | AMC 2023 |
|---|---|---|---|---|---|
| Qwen2.5-Math-7B | Greedy | 95.2% | 83.6% | 16.7% | 55.0% |
| | + Reverse Reward | 96.1% | 85.4% | 23.3% | 57.5% |
| DeepSeek-Math-7B | Greedy | -- | -- | 6.7% | -- |
| | + Reverse Reward | -- | -- | 13.3% | -- |
Noisy Channel Duality
The theoretical foundation of Reverse Reward comes from Bayes' rule. Instead of selecting the answer y that maximizes P(y|x) from the forward model alone, we combine it with the reverse probability P(x|y) from LEDOM. This noisy channel formulation -- long studied in machine translation -- turns out to be remarkably effective for mathematical reasoning verification.
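Written out as a selection rule (the interpolation weight λ is our assumption; the classical noisy channel instead maximizes P(x|y)·P(y)):

```latex
y^{*} \;=\; \operatorname*{arg\,max}_{y}\;
  \bigl[\,(1-\lambda)\,\log P_{\mathrm{fwd}}(y \mid x)
  \;+\; \lambda\,\log P_{\mathrm{rev}}(x \mid y)\,\bigr]
```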
What We Found
Training LEDOM revealed several surprising insights. First, reverse language modeling converges stably, with training curves that look remarkably similar to forward modeling. Second, the model develops genuinely different internal representations, not just mirror images of forward models. Third, combining forward generation with backward evaluation creates a more robust reasoning system than either alone.
Training Data Composition
| Domain | Source | Tokens | Proportion |
|---|---|---|---|
| General Text | DCLM | 284B | 65.3% |
| Mathematics | Math corpora | 103B | 23.7% |
| Code | Code corpora | 48B | 11.0% |
| Total | -- | 435B | 100% |
Key Contributions
- First open reverse LM at scale -- 2B and 7B models trained on 435B tokens, fully open-sourced
- Emergent capabilities: abductive inference, question synthesis, reversal curse resolution
- Reverse Reward: up to +6.6% on AIME 2024 and +15% on AMC 2023, with no additional training
- Noisy channel duality: principled Bayesian framework combining forward and reverse probabilities
- Full reproducibility: models, training code, and pre-training data all publicly released
Open Science
We believe in open research. That is why we are releasing not just the trained models, but the complete training code and pre-training data. We want others to explore reverse language modeling, to find applications we have not imagined, and to push this direction further.
LEDOM is an invitation to think differently about language models. Forward is not the only direction.