LEDOM: An Open and Fundamental Reverse Language Model
ACL 2026
We train the first open right-to-left language model at scale (2B/7B, 435B tokens). LEDOM develops unique capabilities like abductive inference and question synthesis, and its Reverse Reward mechanism boosts strong forward models by up to 6.6% on AIME 2024 and 15% on AMC 2023 -- purely at inference time.
What reasoning patterns emerge when a model conditions on future context to predict the past? We investigate this question by training a right-to-left autoregressive language model at the 2B/7B parameter scale on 435 billion tokens. The resulting model develops distinctive capabilities including abductive inference, question synthesis, and natural resolution of the reversal curse. We propose Reverse Reward, which combines forward and reverse probabilities through noisy channel duality to rerank outputs. This approach achieves performance gains of up to 6.6% on AIME 2024 and 15% on AMC 2023 across multiple strong baselines. We release all models, code, and data publicly.
Thinking Backwards
Every language model you have ever used -- GPT, Claude, Llama -- works the same way: given some text, predict the next token. It is so fundamental that we rarely question it. But what if we flipped this entirely? What if a model were trained to predict the previous token instead?
This is not just a curiosity. LEDOM is the first serious attempt to build a reverse language model at scale. The name gives it away: LEDOM is MODEL spelled backwards. It captures the essence of what we have built -- a language model that thinks in reverse.
How It Works
Building a reverse language model requires rethinking every stage of the pipeline, from data to architecture to evaluation.
Reverse the Data
All training sequences are reversed at the token level. The model sees text right-to-left and learns to predict each token given only the tokens that originally came after it.
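To make the preprocessing concrete, here is a minimal Python sketch. The tokenizer name and the handling of special tokens and document boundaries are illustrative assumptions, not necessarily LEDOM's exact pipeline.

```python
# Minimal sketch of token-level reversal for reverse-LM pretraining.
# Assumes a Hugging Face tokenizer; special tokens and document
# boundaries are handled naively here.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2-7B")

def reverse_tokens(text: str) -> list[int]:
    """Tokenize normally, then reverse the token order.

    A standard next-token objective on the reversed sequence is
    equivalent to predicting the *previous* token of the original text.
    """
    ids = tokenizer(text, add_special_tokens=False)["input_ids"]
    return ids[::-1]

ids = reverse_tokens("The answer is 42.")
print(tokenizer.convert_ids_to_tokens(ids))  # tokens in right-to-left order
```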
Standard Architecture, Opposite Direction
We use a standard decoder-only transformer architecture (the Qwen2 family), identical to forward models. No architectural changes are needed -- once the data is reversed, the causal mask naturally handles right-to-left dependencies.
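To see why nothing in the model has to change, here is a sketch of a single training step (the checkpoint is illustrative and `reverse_tokens` is the helper from the sketch above; this is not LEDOM's training code):

```python
# One causal-LM training step on reversed data: the architecture,
# causal mask, and loss are all standard.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2-0.5B")  # illustrative

input_ids = torch.tensor([reverse_tokens("The answer is 42.")])

# labels == input_ids gives the usual shifted next-token loss; because
# the sequence is reversed, "next token" means the previous token of
# the original text. The causal mask is untouched.
loss = model(input_ids=input_ids, labels=input_ids).loss
loss.backward()
```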
Large-Scale Pretraining
We train on 435B tokens from three domains: 284B general text (DCLM), 103B mathematical data, and 48B code -- matching the data diet of strong forward models.
Reverse Reward Scoring
At inference time, LEDOM scores forward-model candidates by computing how likely each solution is to have led to the given answer -- a natural backward verification via noisy channel duality.
A Different Kind of Intelligence
When you read a sentence forward, you are predicting where it might go. When you read it backward, you are reasoning about how it got there. These are complementary forms of intelligence, and until now, we have only trained models for one of them.
LEDOM exhibits fascinating behaviors that emerge from backward processing. It naturally excels at verification -- checking whether a conclusion follows from given premises -- and it develops unique attention patterns suited for tracking backward dependencies. The same training signal yields abductive inference (reasoning from an observation back to a plausible explanation), question synthesis (generating a question that leads to a given answer), and a natural resolution of the reversal curse.
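As one illustration, question synthesis falls out of backward generation almost for free: feed the reverse model an answer in reversed token order, let it continue, and un-reverse the continuation to read text that plausibly precedes the answer. The checkpoint name and decoding settings below are placeholders, not the released artifacts:

```python
# Hedged sketch of question synthesis with a reverse LM.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("ledom-7b")            # placeholder name
rev_lm = AutoModelForCausalLM.from_pretrained("ledom-7b")  # placeholder name

answer = "Therefore, x = 7."
ids = tok(answer, add_special_tokens=False, return_tensors="pt").input_ids
rev_ids = ids.flip(dims=[1])  # right-to-left order, matching pretraining

out = rev_lm.generate(rev_ids, max_new_tokens=64, do_sample=True)
new_tokens = out[0, rev_ids.shape[1]:]            # the backward continuation
question = tok.decode(new_tokens.flip(dims=[0]))  # un-reverse to read it
print(question)  # text that plausibly precedes the answer
```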
Reverse Reward: The Killer Application
Given multiple candidate answers from a forward model, LEDOM evaluates each by asking "how likely is this reasoning chain to have produced this answer?" This backward likelihood, combined with forward probability through noisy channel duality, serves as a powerful reranking signal. The best part: it requires no additional training -- it is purely an inference-time improvement.
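A sketch of the reranking loop is below. The log-probability interpolation with weight `lam`, the helper names, and the conditioning details are our assumptions; the paper's exact normalization and prompt formatting may differ.

```python
# Hedged sketch of Reverse Reward reranking over candidate solutions.
import torch
import torch.nn.functional as F

@torch.no_grad()
def logprob_of_suffix(model, ids: torch.Tensor, prefix_len: int) -> float:
    """Log-prob of ids[prefix_len:] given ids[:prefix_len] (1-D token ids)."""
    logits = model(input_ids=ids.unsqueeze(0)).logits[0, :-1]
    logp = F.log_softmax(logits, dim=-1)
    token_lp = logp.gather(-1, ids[1:].unsqueeze(-1)).squeeze(-1)
    return token_lp[prefix_len - 1:].sum().item()

def rerank(candidates, x_ids, fwd_lm, rev_lm, lam=0.5):
    """Pick y maximizing (1 - lam) * log P_fwd(y|x) + lam * log P_rev(x|y)."""
    best, best_score = None, float("-inf")
    for y_ids in candidates:
        fwd = logprob_of_suffix(fwd_lm, torch.cat([x_ids, y_ids]), len(x_ids))
        # The reverse model reads right-to-left: reversed(y) is its prefix,
        # and it scores reversed(x) -- how likely the problem is given the
        # solution, i.e. P(x|y).
        rev_seq = torch.cat([y_ids.flip(0), x_ids.flip(0)])
        rev = logprob_of_suffix(rev_lm, rev_seq, len(y_ids))
        score = (1 - lam) * fwd + lam * rev
        if score > best_score:
            best, best_score = y_ids, score
    return best
```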
Reverse Reward Results on Competition Math
| Base Model | Method | GSM8K | MATH-500 | AIME 2024 | AMC 2023 |
|---|---|---|---|---|---|
| Qwen2.5-Math-7B | Greedy | 95.2% | 83.6% | 16.7% | 55.0% |
| | + Reverse Reward | 96.1% | 85.4% | 23.3% | 57.5% |
| DeepSeek-Math-7B | Greedy | -- | -- | 6.7% | -- |
| | + Reverse Reward | -- | -- | 13.3% | -- |
Noisy Channel Duality
The theoretical foundation of Reverse Reward comes from Bayes' rule. Instead of selecting the answer y that maximizes P(y|x) from the forward model alone, we combine it with the reverse probability P(x|y) from LEDOM. This noisy channel formulation -- long studied in machine translation -- turns out to be remarkably effective for mathematical reasoning verification.
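Written out as a selection rule (the interpolation weight λ is our assumption; the classical noisy channel instead maximizes P(x|y)·P(y)):

```latex
y^{*} \;=\; \operatorname*{arg\,max}_{y}\;
  \bigl[\,(1-\lambda)\,\log P_{\mathrm{fwd}}(y \mid x)
  \;+\; \lambda\,\log P_{\mathrm{rev}}(x \mid y)\,\bigr]
```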
What We Found
Training LEDOM revealed several surprising insights. First, reverse language modeling converges stably, with training curves that look remarkably similar to forward modeling. Second, the model develops genuinely different internal representations, not just mirror images of forward models. Third, combining forward generation with backward evaluation creates a more robust reasoning system than either alone.
Training Data Composition
| Domain | Source | Tokens | Proportion |
|---|---|---|---|
| General Text | DCLM | 284B | 65.3% |
| Mathematics | Math corpora | 103B | 23.7% |
| Code | Code corpora | 48B | 11.0% |
| Total | -- | 435B | 100% |
Key Contributions
- First open reverse LM at scale -- 2B and 7B models trained on 435B tokens, fully open-sourced
- Emergent capabilities: abductive inference, question synthesis, reversal curse resolution
- Reverse Reward: up to +6.6% on AIME 2024 and +15% on AMC 2023, with no additional training
- Noisy channel duality: principled Bayesian framework combining forward and reverse probabilities
- Full reproducibility: models, training code, and pre-training data all publicly released
Open Science
We believe in open research. That is why we are releasing not just the trained models, but the complete training code and pre-training data. We want others to explore reverse language modeling, to find applications we have not imagined, and to push this direction further.
LEDOM is an invitation to think differently about language models. Forward is not the only direction.