A team has demonstrated stable recall across million-token contexts, sidestepping the quadratic cost that has limited long-context models.
The approach
Instead of brute-force attention, the method blends retrieval with a learned memory store that compresses older context.
Open questions
Independent replication is still pending, and the gains on real documents may differ from the synthetic benchmarks used so far.