History Matters: Temporal Knowledge Editing in Large Language Model
AAAI 2024
Current knowledge editing methods cause catastrophic forgetting of historical facts. We formalize Temporal Knowledge Editing, build the AToKe benchmark, and propose METO -- a framework that updates LLMs to current knowledge while preserving the historical record.
The need to revise or update the knowledge stored within large language models arises from two distinct sources: intrinsic errors learned during training, which should be corrected, and knowledge made obsolete by real-world change, which should be updated. Existing knowledge editing methods treat all updates uniformly, but we argue that models should retain historical knowledge while integrating new knowledge. We introduce Temporal Knowledge Editing (TKE) as a distinct task and create the AToKe (Assessment of Temporal Knowledge Editing) benchmark with three dataset variants. Our experiments reveal that existing editing approaches cause catastrophic forgetting of historical facts. We propose METO (Multi-Editing with Time Objective), a framework that edits both old and new knowledge simultaneously while optimizing the model's prediction of when each fact holds, substantially improving retention of historical knowledge.
Two Kinds of Wrong
When a language model is wrong, it can be wrong in two very different ways. Sometimes the model has an intrinsic error -- it learned something incorrectly during training and should simply be fixed. But sometimes the model's knowledge has become outdated -- the world changed, and what was once true is no longer true.
The President of the United States in 2020 isn't the same as in 2024. A company's CEO changes. A country's capital can move. When we update these facts in a language model, should we erase the history or preserve it?
The Problem with Forgetting
Current knowledge editing methods treat all updates the same -- they replace old knowledge with new. But this causes historical amnesia. Ask an edited model "Who was the CEO of Twitter in 2022?" and it might answer with the current CEO, erasing the historical record. For temporal facts, we want models to remember history while knowing what's current.
Temporal Knowledge Editing
We introduce Temporal Knowledge Editing (TKE) as a distinct task. Unlike standard editing that overwrites, TKE requires the model to maintain temporal coherence: knowing what was true at different times, and correctly distinguishing past from present.
To evaluate this, we built AToKe (Assessment of Temporal Knowledge Editing), a benchmark that tests whether edited models can answer time-indexed questions correctly -- both for current facts and historical ones. AToKe includes three variants: Single Editing (SE), Multi-Editing (ME), and Extended Editing (EE).
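To make the task concrete, a time-indexed fact can be thought of as a (subject, relation, object) triple paired with a validity interval, and a TKE benchmark item pairs a historical version of the fact with a current one. The sketch below illustrates this shape; the field names and values are illustrative assumptions, not AToKe's actual schema.

```python
# Illustrative shape of a time-indexed editing example. Field names and
# values are assumptions for illustration, not the benchmark's real schema.
example = {
    "subject": "Twitter",
    "relation": "chief executive officer",
    "historical": {"object": "Parag Agrawal", "start": 2021, "end": 2022},
    "current": {"object": "Linda Yaccarino", "start": 2023, "end": None},
    "questions": {
        "historical": "Who was the CEO of Twitter in 2022?",
        "current": "Who is the CEO of Twitter as of 2023?",
    },
}

def expected_answer(example, year):
    """Return the object whose validity interval covers the queried year."""
    for fact in (example["historical"], example["current"]):
        end = fact["end"] if fact["end"] is not None else float("inf")
        if fact["start"] <= year <= end:
            return fact["object"]
    return None  # no fact in the record covers that year
```

A temporally coherent model should behave like `expected_answer`: the same question template yields different answers depending on the year it is indexed to.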
Catastrophic Forgetting of History
Our experiments revealed a troubling finding: existing editing methods cause catastrophic forgetting of historical knowledge. They successfully implant new facts but destroy the record of what came before. A model that once knew historical sequences of events becomes temporally confused after editing.
This isn't just an academic concern. Models deployed in domains like law, medicine, or policy need to reason about what was true when -- not just what's true now.
Existing Methods on AToKe (Single Editing)
| Method | CES | CES-P | CRS | HES | HRS |
|---|---|---|---|---|---|
| CFT | 5.73 | 5.69 | 5.34 | 0.06 | 0.02 |
| MEND | 80.47 | 40.56 | 32.46 | 1.73 | 0.68 |
| ROME | 99.99 | 97.01 | 81.64 | 2.41 | 1.56 |
| MEMIT | 99.66 | 92.23 | 75.31 | 2.22 | 1.21 |
Reading the Results
CES/CES-P/CRS measure how well models handle current knowledge after editing. HES/HRS measure historical knowledge retention. Notice the stark contrast: methods like ROME achieve near-perfect current scores (99.99%) but historical scores near zero (2.41%). The models know the present but have forgotten the past.
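At their core, edit-success metrics of this kind reduce to answer accuracy over a set of questions: current-fact questions for the CES-style scores and historical-fact questions for the HES-style scores. The helper below is a simplified stand-in for such a metric, not the paper's exact scoring code.

```python
def edit_success(preds, golds):
    """Exact-match accuracy (%) of model answers against gold answers.

    A simplified stand-in for AToKe-style edit-success scores: applied to
    current-fact questions it plays the role of CES, and to historical-fact
    questions the role of HES.
    """
    assert len(preds) == len(golds), "one prediction per gold answer"
    hits = sum(p.strip().lower() == g.strip().lower()
               for p, g in zip(preds, golds))
    return 100.0 * hits / len(golds)

# Hypothetical post-edit behavior: the current fact is answered correctly,
# but the historical question gets the current answer (history overwritten).
current_score = edit_success(["linda yaccarino"], ["Linda Yaccarino"])
historical_score = edit_success(["Linda Yaccarino"], ["Parag Agrawal"])
```

Under this toy scoring, `current_score` is 100.0 and `historical_score` is 0.0, mirroring the high-CES / near-zero-HES pattern in the table above.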
Query Current Knowledge
Query the model for what it currently believes about the fact; this soon-to-be-outdated answer becomes the historical knowledge to preserve.
Multi-Edit Construction
Construct editing targets using both the historical knowledge (what was true before) and the new current knowledge (what is true now).
Time Objective Optimization
Optimize the model's ability to predict when each fact was true, adding temporal awareness to the editing process.
Joint Knowledge Update
Apply the edit using any existing editing method, now enhanced with both historical preservation and temporal prediction objectives.
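The four steps above can be sketched as a small pipeline. Everything here is a hedged sketch: `query_fn` and `base_editor` are hypothetical interfaces standing in for a model-querying helper and an existing editing method such as ROME or MEMIT, and the `Fact` record is an assumed representation, not the paper's implementation.

```python
from dataclasses import dataclass

@dataclass
class Fact:
    subject: str
    relation: str
    obj: str
    period: str  # e.g. "from 2021 to 2022" -- assumed textual encoding

def meto_edit(model, new_fact, base_editor, query_fn, time_weight=1.0):
    """Sketch of the four METO steps under the assumptions stated above.

    1. Query the model's current answer -- it becomes historical knowledge.
    2. Build edit targets from both the historical and the new fact.
    3. Attach a time objective so each fact's validity period is learned.
    4. Apply both edits with the chosen base editing method.
    """
    # Step 1: retrieve the soon-to-be-outdated fact from the model itself.
    old_obj, old_period = query_fn(model, new_fact.subject, new_fact.relation)
    historical = Fact(new_fact.subject, new_fact.relation, old_obj, old_period)
    # Step 2: multi-edit construction -- edit old and new knowledge together.
    targets = [historical, new_fact]
    # Step 3: pair each target with the period it was (or is) valid.
    time_targets = [(fact, fact.period) for fact in targets]
    # Step 4: delegate the actual weight update to the base editor.
    return base_editor(model, edits=targets, time_objective=time_targets,
                       time_weight=time_weight)
```

Because the base editor is passed in as a function, the sketch reflects METO's plug-in design: the temporal machinery wraps around whichever editing method is already in use.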
With METO Enhancement (Single Editing)
| Method | CES | CES-P | CRS | HES | HRS |
|---|---|---|---|---|---|
| CFT+ | 2.80 | 2.62 | 2.26 | 3.38 | 2.43 |
| MEND+ | 83.26 | 33.45 | 25.41 | 30.14 | 30.17 |
| ROME+ | 99.95 | 93.78 | 78.88 | 20.25 | 16.29 |
| MEMIT+ | 86.40 | 85.32 | 74.07 | 30.31 | 24.32 |
METO Dramatically Improves Historical Retention
Compare MEMIT's historical scores before and after METO: HES jumps from 2.22% to 30.31%, and HRS from 1.21% to 24.32% -- over 13x and 20x improvements respectively. METO transforms existing editing methods from history-erasing to history-preserving, while maintaining strong current knowledge performance.
Key Contributions
- New Task: Temporal Knowledge Editing as distinct from standard knowledge editing
- AToKe Benchmark: Three-variant evaluation suite (SE, ME, EE) for temporal knowledge editing
- Problem Discovery: Documenting catastrophic forgetting of historical knowledge in existing methods
- METO Framework: A plug-in solution that preserves historical knowledge across editing methods
History Matters
As language models become repositories of world knowledge, how we update them matters. The past isn't just noise to be overwritten -- it's context that gives meaning to the present. Temporal Knowledge Editing takes this seriously, ensuring that models can be updated while maintaining their role as keepers of historical record.