By Zuzanna Stamirowska, CEO & Co-founder, Pathway
The next leap in AI will not come from large models that wake up with amnesia. It will come from architectures that can learn through experience.
Scaling laws and the transformer architecture gave researchers a recipe: more data, more compute, more parameters. It worked. We have systems that write, code, and converse with astonishing fluency. However, it also produced a blind spot that is now getting harder to ignore. Enterprises are asking AI to run agentic workflows that span days. Governments are building national AI strategies around ever-larger data centres. Both futures run into the same constraint: today’s models cannot accumulate experience.
Imagine two employees on their first day of work. One is brilliant but wakes up every morning having forgotten everything. You leave notes on their desk so they can catch up. The other employee can consult the same notes, but also remembers, and forms decision-making patterns over time. By day thirty, there is no comparison. The industry has been shipping amnesiacs like that first employee for five years now.
Most of what the industry currently calls “memory” is not memory. Long context windows are working memory, gone the moment the session ends. RAG is an external database lookup; the information lives in the index, not the model. The memory features in chat applications are text files attached to your user ID; the model is not learning about you, it is receiving a dossier. Fine-tuning is closer, but it is an offline batch process that assumes the production model stays frozen.
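To see the distinction in code, here is a minimal sketch, with hypothetical stand-ins rather than any real library’s API, of why retrieval and chat “memory” leave the model itself untouched: every path just prepends text to a prompt, and the weights never receive an update.

```python
# Minimal sketch (hypothetical names, not a real library's API): both
# RAG and chat "memory" only prepend text to the prompt.

class FrozenModel:
    """Stand-in for a deployed LLM: parameters fixed at load time."""
    def generate(self, prompt: str) -> str:
        return f"<answer conditioned on {len(prompt)} chars of prompt>"

class VectorIndex:
    """Stand-in for a retrieval index; the knowledge lives here, not in the model."""
    def __init__(self, docs):
        self.docs = docs
    def search(self, query: str, top_k: int = 3):
        return self.docs[:top_k]  # real systems rank by embedding similarity

def rag_answer(model, index, question):
    snippets = index.search(question)
    prompt = "\n".join(snippets) + "\n\nQuestion: " + question
    return model.generate(prompt)  # model weights untouched

def chat_with_memory(model, dossier, message):
    prompt = "Known about this user:\n" + dossier + "\n\n" + message
    return model.generate(prompt)  # weights untouched here too

model = FrozenModel()
index = VectorIndex(["Leave policy v3: ...", "Onboarding guide: ..."])
print(rag_answer(model, index, "What is the leave policy?"))
print(chat_with_memory(model, "Prefers concise answers.", "Summarize Q3."))
```

However many sessions pass, `FrozenModel` ends exactly where it started; only the text around it changes.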
The opportunity lies in building memory into the model itself — weights that change as the system forms memories and learns from experience. Repeated exposure to a domain makes the system genuinely better at the work, not just better at looking things up.
The solution is architectural. The separation between training and inference is not a law of intelligence. It is a property of the transformer architecture that powers most LLMs today. Knowledge lives in weights that are frozen the moment training ends, while everything afterward happens inside a temporary scratchpad, the context window, discarded when the session ends. Brains do not work this way. Neither do the post-transformer architectures emerging in 2026. A growing group of researchers now see memory, reasoning, and continual learning as built-in components of the architecture itself rather than capabilities bolted on after training.
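As a toy contrast, and not a sketch of any particular post-transformer design: the frozen path below runs inference and discards everything, while the continual path applies a small gradient step to its own weights after each interaction, so its error on a recurring task shrinks with use.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 4))  # the model's "knowledge": a tiny linear map

def frozen_step(x):
    return W @ x  # inference only; W never changes, however often it errs

def continual_step(x, target, lr=0.01):
    global W
    error = W @ x - target
    W -= lr * np.outer(error, x)  # gradient step on squared error:
    return error                  # experience updates the weights

x, target = rng.normal(size=4), rng.normal(size=4)
before = np.linalg.norm(W @ x - target)
for _ in range(200):              # the same task, encountered repeatedly
    continual_step(x, target)
after = np.linalg.norm(W @ x - target)
print(f"error before: {before:.3f}, after: {after:.3f}")  # shrinks with use
```

The point is not the linear map, which is deliberately trivial, but the loop: in one path the parameters are read-only at deployment, in the other every interaction leaves a trace in them.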
Cost and energy follow directly from these architectural choices. Transformer attention scales quadratically with context length, so longer context windows get disproportionately expensive, fast. Every task that rebuilds context from scratch pays that penalty in compute and in electricity. By contrast, many post-transformer alternatives use linear attention and sparse structures, which gives them energy efficiency and the ability to form memories as they work, without compute costs that blow up with scale.
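The back-of-envelope arithmetic behind that claim, with an illustrative head dimension and no real kernel constants: full self-attention builds an n-by-n score matrix at roughly n² · d multiply-adds per layer, while linear-attention variants run in roughly n · d² by maintaining a running summary state. The gap is a factor of n/d, and it widens as contexts grow.

```python
d = 128  # head dimension (illustrative)
for n in (4_000, 32_000, 256_000):  # context lengths
    quadratic = n * n * d  # full attention: the n x n score matrix QK^T
    linear = n * d * d     # kernelized / linear attention: running d x d state
    print(f"n={n:>7,}: quadratic is {quadratic / linear:,.0f}x the linear cost")
```

At 4,000 tokens the ratio is about 31x; at 256,000 tokens it is about 2,000x, which is why rebuilding long contexts from scratch is the expensive habit.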
In practical settings, the consequences are concrete. Consider a clinician weighing efficacy, side effects, drug interactions, and a patient’s history in the same moment. A model that starts each consultation from zero cannot accumulate the judgment that comes from having worked through a thousand cases. The same holds in legal work, operations, and scientific research. The most valuable business knowledge is judgment built through repeated exposure.
A growing conviction across the field is that the next frontier towards AGI is continual learning and memory, rather than scaling. METR measures how long frontier models stay coherent on real tasks; ARC-AGI-3 is built around whether systems can adapt inside novel environments. Academics such as Martin Farach-Colton, NYU’s Computer Science Chair, have publicly backed post-transformer research. The diagnosis is broadly agreed upon, and the discourse is now moving into the open. Adrian Kosowski, the inventor of the Dragon Hatchling architecture, has described this missing step as the “PageRank moment for intelligence.” Before that breakthrough, search could find pages but not reliably order the web by relevance. Likewise, transformers have made intelligence visible in machines. The next breakthrough will come from discovering what makes memory, adaptation, and experience compound. Llion Jones, one of the original transformer authors, has argued that continual learning feels like a hack when it is added on top of architectures built around static weights. On the direction itself, there is no real disagreement. What remains is for post-transformer neolabs or incumbent frontier labs to deliver on it.
Many researchers today see AI as a genuine participant in scientific work, believing it will move fields forward in ways human researchers could not alone, compressing decades of progress into years. Early examples, such as AlphaFold’s protein-structure predictions and AI-designed materials, lend this vision credibility. However, every version of that future carries an unstated precondition: experience compounds. A frozen model without native memory cannot learn from experience. It can only generate plausible answers from the hints in its prompt or retrieved snippets.
The path to superintelligence will not be carved by whoever owns the largest pile of data. It will be carved by whoever builds systems that learn from what they did yesterday, hold their judgment across sessions, and improve with use. Data made this generation of AI fluent. Memory will make the next generation capable.