History Matters: Temporal Knowledge Editing in Large Language Model
AAAI 2024
Current knowledge editing methods cause catastrophic forgetting of historical facts. We formalize Temporal Knowledge Editing, build the AToKe benchmark, and propose METO -- a framework that updates LLMs to current knowledge while preserving the historical record.
The need to revise or update the knowledge stored within large language models arises from two distinct sources: intrinsic errors learned during training, which should be corrected, and knowledge made obsolete by real-world change, which should be updated. Existing knowledge editing methods treat all updates uniformly, but we argue that models should retain historical knowledge while integrating new knowledge. We introduce Temporal Knowledge Editing (TKE) as a distinct task and create the AToKe (Assessment of Temporal Knowledge Editing) benchmark with three dataset variants. Our experiments reveal that existing editing approaches cause catastrophic forgetting of historical facts. We propose METO (Multi-Editing with Time Objective), a framework that edits both old and new knowledge simultaneously while optimizing the model's prediction of when each fact holds, substantially improving retention of historical knowledge.
Two Kinds of Wrong
When a language model is wrong, it can be wrong in two very different ways. Sometimes the model has an intrinsic error -- it learned something incorrectly during training and should simply be fixed. But sometimes the model's knowledge has become outdated -- the world changed, and what was once true is no longer true.
The President of the United States in 2020 isn't the same as in 2024. A company's CEO changes. A country's capital can move. When we update these facts in a language model, should we erase the history or preserve it?
The Problem with Forgetting
Current knowledge editing methods treat all updates the same -- they replace old knowledge with new. But this causes historical amnesia. Ask an edited model "Who was the CEO of Twitter in 2022?" and it might answer with the current CEO, erasing the historical record. For temporal facts, we want models to remember history while knowing what's current.
Temporal Knowledge Editing
We introduce Temporal Knowledge Editing (TKE) as a distinct task. Unlike standard editing that overwrites, TKE requires the model to maintain temporal coherence: knowing what was true at different times, and correctly distinguishing past from present.
To evaluate this, we built AToKe (Assessment of Temporal Knowledge Editing), a benchmark that tests whether edited models can answer time-indexed questions correctly -- both for current facts and historical ones. AToKe includes three variants: Single Editing (SE), Multi-Editing (ME), and Extended Editing (EE).
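To make the task concrete, a time-indexed fact can be thought of as a (subject, relation, object) triple paired with a validity interval, and a TKE benchmark item pairs a historical version of the fact with a current one. The sketch below illustrates this shape; the field names and values are illustrative assumptions, not AToKe's actual schema.

```python
# Illustrative shape of a time-indexed editing example. Field names and
# values are assumptions for illustration, not the benchmark's real schema.
example = {
    "subject": "Twitter",
    "relation": "chief executive officer",
    "historical": {"object": "Parag Agrawal", "start": 2021, "end": 2022},
    "current": {"object": "Linda Yaccarino", "start": 2023, "end": None},
    "questions": {
        "historical": "Who was the CEO of Twitter in 2022?",
        "current": "Who is the CEO of Twitter as of 2023?",
    },
}

def expected_answer(example, year):
    """Return the object whose validity interval covers the queried year."""
    for fact in (example["historical"], example["current"]):
        end = fact["end"] if fact["end"] is not None else float("inf")
        if fact["start"] <= year <= end:
            return fact["object"]
    return None  # no fact in the record covers that year
```

A temporally coherent model should behave like `expected_answer`: the same question template yields different answers depending on the year it is indexed to.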
Catastrophic Forgetting of History
Our experiments revealed a troubling finding: existing editing methods cause catastrophic forgetting of historical knowledge. They successfully implant new facts but destroy the record of what came before. A model that once knew historical sequences of events becomes temporally confused after editing.
This isn't just an academic concern. Models deployed in domains like law, medicine, or policy need to reason about what was true when -- not just what's true now.
Existing Methods on AToKe (Single Editing)
| Method | CES | CES-P | CRS | HES | HRS |
|---|---|---|---|---|---|
| CFT | 5.73 | 5.69 | 5.34 | 0.06 | 0.02 |
| MEND | 80.47 | 40.56 | 32.46 | 1.73 | 0.68 |
| ROME | 99.99 | 97.01 | 81.64 | 2.41 | 1.56 |
| MEMIT | 99.66 | 92.23 | 75.31 | 2.22 | 1.21 |
Reading the Results
CES/CES-P/CRS measure how well models handle current knowledge after editing. HES/HRS measure historical knowledge retention. Notice the stark contrast: methods like ROME achieve near-perfect current scores (99.99%) but historical scores near zero (2.41%). The models know the present but have forgotten the past.
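At their core, edit-success metrics of this kind reduce to answer accuracy over a set of questions: current-fact questions for the CES-style scores and historical-fact questions for the HES-style scores. The helper below is a simplified stand-in for such a metric, not the paper's exact scoring code.

```python
def edit_success(preds, golds):
    """Exact-match accuracy (%) of model answers against gold answers.

    A simplified stand-in for AToKe-style edit-success scores: applied to
    current-fact questions it plays the role of CES, and to historical-fact
    questions the role of HES.
    """
    assert len(preds) == len(golds), "one prediction per gold answer"
    hits = sum(p.strip().lower() == g.strip().lower()
               for p, g in zip(preds, golds))
    return 100.0 * hits / len(golds)

# Hypothetical post-edit behavior: the current fact is answered correctly,
# but the historical question gets the current answer (history overwritten).
current_score = edit_success(["linda yaccarino"], ["Linda Yaccarino"])
historical_score = edit_success(["Linda Yaccarino"], ["Parag Agrawal"])
```

Under this toy scoring, `current_score` is 100.0 and `historical_score` is 0.0, mirroring the high-CES / near-zero-HES pattern in the table above.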
Query Current Knowledge
Query the model for what it currently believes about the fact; this soon-to-be-outdated answer becomes the historical knowledge to preserve.
Multi-Edit Construction
Construct editing targets using both the historical knowledge (what was true before) and the new current knowledge (what is true now).
Time Objective Optimization
Optimize the model's ability to predict when each fact was true, adding temporal awareness to the editing process.
Joint Knowledge Update
Apply the edit using any existing editing method, now enhanced with both historical preservation and temporal prediction objectives.
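The four steps above can be sketched as a small pipeline. Everything here is a hedged sketch: `query_fn` and `base_editor` are hypothetical interfaces standing in for a model-querying helper and an existing editing method such as ROME or MEMIT, and the `Fact` record is an assumed representation, not the paper's implementation.

```python
from dataclasses import dataclass

@dataclass
class Fact:
    subject: str
    relation: str
    obj: str
    period: str  # e.g. "from 2021 to 2022" -- assumed textual encoding

def meto_edit(model, new_fact, base_editor, query_fn, time_weight=1.0):
    """Sketch of the four METO steps under the assumptions stated above.

    1. Query the model's current answer -- it becomes historical knowledge.
    2. Build edit targets from both the historical and the new fact.
    3. Attach a time objective so each fact's validity period is learned.
    4. Apply both edits with the chosen base editing method.
    """
    # Step 1: retrieve the soon-to-be-outdated fact from the model itself.
    old_obj, old_period = query_fn(model, new_fact.subject, new_fact.relation)
    historical = Fact(new_fact.subject, new_fact.relation, old_obj, old_period)
    # Step 2: multi-edit construction -- edit old and new knowledge together.
    targets = [historical, new_fact]
    # Step 3: pair each target with the period it was (or is) valid.
    time_targets = [(fact, fact.period) for fact in targets]
    # Step 4: delegate the actual weight update to the base editor.
    return base_editor(model, edits=targets, time_objective=time_targets,
                       time_weight=time_weight)
```

Because the base editor is passed in as a function, the sketch reflects METO's plug-in design: the temporal machinery wraps around whichever editing method is already in use.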
With METO Enhancement (Single Editing)
| Method | CES | CES-P | CRS | HES | HRS |
|---|---|---|---|---|---|
| CFT+ | 2.80 | 2.62 | 2.26 | 3.38 | 2.43 |
| MEND+ | 83.26 | 33.45 | 25.41 | 30.14 | 30.17 |
| ROME+ | 99.95 | 93.78 | 78.88 | 20.25 | 16.29 |
| MEMIT+ | 86.40 | 85.32 | 74.07 | 30.31 | 24.32 |
METO Dramatically Improves Historical Retention
Compare MEMIT's historical scores before and after METO: HES jumps from 2.22% to 30.31%, and HRS from 1.21% to 24.32% -- over 13x and 20x improvements respectively. METO transforms existing editing methods from history-erasing to history-preserving, while maintaining strong current knowledge performance.
Key Contributions
- New Task: Temporal Knowledge Editing as distinct from standard knowledge editing
- AToKe Benchmark: Three-variant evaluation suite (SE, ME, EE) for temporal knowledge editing
- Problem Discovery: Documenting catastrophic forgetting of historical knowledge in existing methods
- METO Framework: A plug-in solution that preserves historical knowledge across editing methods
History Matters
As language models become repositories of world knowledge, how we update them matters. The past isn't just noise to be overwritten -- it's context that gives meaning to the present. Temporal Knowledge Editing takes this seriously, ensuring that models can be updated while maintaining their role as keepers of historical record.