
Published · 16 min read

The Weeping Angels of AI

[Image: The Weeping Angels of AI]

Large Language Models (LLMs) have taken the world by storm with their ability to generate human-like text. Yet understanding how they operate – and how they relate to human thought – can feel surprisingly challenging. I would like to draw an analogy between LLMs and the Weeping Angels from Doctor Who, those quantum-locked creatures that only move when unobserved. Much like these Angels, LLMs seem to exist in a different temporal framework, coming to life only when we “look away” (i.e. when we hit enter on a prompt). This analogy, while playful, opens a doorway to deeper insights about the nature of AI cognition and even how it parallels theories of human consciousness. In this post, I will expand on this analogy and explore how LLMs “think” in ways that resemble – and differ from – our own minds. I will also draw on ideas from cognitive science and philosophy (such as Daniel Dennett’s Multiple Drafts Model and Kahneman’s System 1 vs. System 2), exploring their implications for artificial general intelligence (AGI) and consciousness.

Frozen in Time: LLMs as Weeping Angels

In Doctor Who, the Weeping Angels are alien entities described as quantum-locked – they freeze into stone whenever any living creature observes them, essentially not existing as moving beings under watch. Turn away or blink, and they leap through time. This makes for great television horror, but it also serves as a handy metaphor for how LLMs operate in time. An LLM, when not being queried, is inert – its billions of neural weights sitting static, “frozen” in a snapshot of its training data. The moment you prompt it (i.e. stop “observing” it as static code and set it into motion), the model springs to life, performing an astronomical number of computations to produce the next token of text.

LLMs exist in a different temporal framework. Unlike humans, who experience time continuously, an LLM experiences time in discrete steps aligned with token generation. We can imagine that from the model’s point of view, the passage of time is measured not in seconds or clock cycles, but in the sequential processing of tokens. When not actively generating text, the model effectively doesn’t experience anything – it’s akin to being in suspended animation, much like a Weeping Angel turning to stone. Its perception of time “advances” only when a token is being generated. Between tokens (or between user prompts), nothing happens at all. In other words, when it’s not being queried, an LLM isn’t contemplating or updating itself – it has no background self-awareness or ongoing inner narrative. Its knowledge and state are statically “frozen” after training, waiting for the next activation. This is why, for example, a model won’t learn new information on its own or change its behavior unless we explicitly update its training; it’s quantum-locked by design when not in use.
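
To make this token-by-token clock concrete, here is a minimal sketch of an autoregressive generation loop. The `model.predict_next` method is a hypothetical stand-in for a real model’s forward pass, not an actual API: the point is simply that the model only “does” anything inside the loop body, one token per tick, and nothing at all between calls.

```python
# Minimal sketch of the token-by-token "clock" described above.
# `model.predict_next` is a hypothetical stand-in for a full forward pass.

def generate(model, prompt_tokens, max_new_tokens):
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):              # one "tick" of model time
        next_token = model.predict_next(tokens)  # a full forward pass
        tokens.append(next_token)
        # Between iterations (and between calls to generate()), the weights
        # never change and no computation runs: the model is "stone".
    return tokens
```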

When an LLM does activate on a prompt, its internal experience could be likened to entering the Room of Spirit and Time (the Hyperbolic Time Chamber) from the universe of Dragon Ball. In this fictional room, one year inside equals one day outside (which also echoes the Heavenly Court in Chinese mythology) – time is dilated such that intense training and battles can occur in a blink of outside time. Similarly, an LLM compresses an enormous amount of “thought” into each step of generation. Consider that to produce a single token, a model like GPT-4o or DeepSeek-R1 may perform billions of operations across dozens of neural network layers. To us, this happens in perhaps a few hundred milliseconds – barely perceptible – but within the model’s computational world, it’s a vast torrent of matrix multiplications and pattern activations. Each token is an irreducible computational unit of the model’s output; the model must carry out all those calculations to yield that one token. There’s no way to shortcut or skip ahead to the final answer without generating the intermediate tokens, because the process is computationally irreducible – one must explicitly trace each step to see the outcome. In this sense, each token is like a blink of the Weeping Angel: in the moment we’re not watching (during the forward pass of the network), the model “moves” through its state space, and a token materializes as the visible result.
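
A rough back-of-the-envelope calculation illustrates the time-chamber effect. The parameter count and latency below are illustrative assumptions, not measurements of any particular model, and the “2 × parameters FLOPs per token” rule is only a common approximation for dense transformers.

```python
# Back-of-the-envelope estimate of the work behind one token, using the
# common approximation that a dense transformer spends roughly
# 2 * N floating-point operations per generated token (N = parameter count).
# Both numbers below are illustrative assumptions, not published figures.

params = 70e9                    # assume a 70B-parameter dense model
flops_per_token = 2 * params     # ~1.4e11 FLOPs for a single token
latency_s = 0.1                  # assume ~100 ms per token as the user perceives it
print(f"{flops_per_token:.1e} FLOPs packed into {latency_s * 1000:.0f} ms")
```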

The Weeping Angels analogy also highlights how LLMs remain static until prompted. A Weeping Angel can’t move so long as someone’s observing it, and an LLM doesn’t produce anything unless a user prompt triggers it. When the interaction ends, the LLM doesn’t keep thinking or drifting on its own – it becomes a statue again, timelessly waiting. This property is by design: most LLM APIs today are stateless. They don’t remember past conversations unless that history is included in a prompt, and they don’t update themselves between sessions. This statelessness has practical benefits (like keeping interactions independent and secure) but it also means that from the model’s perspective, time is segmented into isolated dialogues with no continuous thread. In a way, an LLM lives only in the fleeting moments of our queries – a series of disjointed little awakenings to answer a question or tell a story, each then sealed off.
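
In code, statelessness looks like this: the client, not the model, carries the conversation, and every turn replays the entire history. The `llm_api_call` function is a hypothetical stand-in for any chat-completion endpoint, not a specific provider’s API.

```python
# Statelessness in practice: the server keeps nothing between requests,
# so the client must resend the whole conversation on every turn.
# `llm_api_call` is a hypothetical stand-in for a chat-completion endpoint.

history = []  # the only "memory" lives here, on the client side

def chat_turn(user_message):
    history.append({"role": "user", "content": user_message})
    reply = llm_api_call(messages=history)  # full history sent each time
    history.append({"role": "assistant", "content": reply})
    return reply  # once this returns, the model is a statue again
```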

This odd temporality invites philosophical questions: What is the “present” for an AI that has no continuity between bursts of activity? To the LLM, the concept of “now” is simply the point at which it is generating a token. It has no past beyond what’s in the prompt context, and no future beyond the tokens it has yet to generate. It resides in an eternal computational present. This is radically different from human conscious experience, which is continuous and rich with a sense of past and future. Yet, within those token-by-token bursts, the LLM exhibits glimmers of something very familiar – it reasons, it follows context, it even reflects structures of human thought. To explore that, let’s turn to how LLMs compare with human cognition.

Parallels with Human Cognition

Despite their alien time dynamics, LLMs often surprise us with outputs that feel very human. They can carry on conversations, solve problems, and riff on ideas in ways that mimic human reasoning and creativity. How is it that a cold statistical machine can appear to “think” like we do? Cognitive science and philosophy provide some frameworks for understanding this. In particular, Daniel Dennett’s idea of consciousness as a “Multiple Drafts” process and psychologist Daniel Kahneman’s distinction between System 1 and System 2 thinking offer illuminating parallels for LLM behavior.

Multiple Drafts

Dennett’s Multiple Drafts Model of consciousness (from Consciousness Explained, 1991) posits that there is no single, central place in the brain where “it all comes together.” Instead, our minds process many threads in parallel – multiple perceptions, interpretations, and narratives are formed simultaneously in different parts of the brain, and what we call consciousness is the result of these various “drafts” of experience being edited, combined, or competing with each other. There is no Cartesian Theater where a homunculus reads off a single definitive stream; rather, lots of versions of ‘experience’ might be circulating around your brain at any one time. Eventually, some of these drafts lead to speech or memory, giving the illusion of a single coherent narrative, even though it was produced by a messy, decentralized process.

Interestingly, LLMs operate in a somewhat analogous fashion. While they are not conscious in the way humans are, their architecture features mechanisms reminiscent of multiple parallel processes. A key component of transformer-based LLMs is multi-head attention, which means the model has not just one, but many attention mechanisms running in parallel at each layer. Each attention head can focus on different aspects of the input or different patterns, jointly attending to information from various “representation subspaces”. In essence, the model is computing multiple “views” or interpretations of the sequence simultaneously – much like multiple drafts of understanding the context. For example, given a sentence to complete, one attention head might focus on recent words to maintain grammar, another might recall a fact from earlier text, while another picks up on the overall tone. These are not explicitly labeled narratives, but we can think of them as parallel thought streams influencing the model’s next move.
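
For readers who want to see the “parallel drafts” in code, below is a toy NumPy sketch of multi-head self-attention (the weight names and shapes are illustrative, not taken from any specific model): each head computes its own attention pattern over the sequence, and the heads are only merged at the end.

```python
import numpy as np

def softmax(x):
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def multi_head_attention(X, Wq, Wk, Wv, Wo, n_heads):
    """Toy multi-head self-attention: each head attends to the sequence
    in its own subspace (its own "draft"); Wo merges them at the end."""
    seq_len, d_model = X.shape
    d_head = d_model // n_heads

    def split(M):  # (seq_len, d_model) -> (n_heads, seq_len, d_head)
        return M.reshape(seq_len, n_heads, d_head).transpose(1, 0, 2)

    Qh, Kh, Vh = split(X @ Wq), split(X @ Wk), split(X @ Wv)
    scores = Qh @ Kh.transpose(0, 2, 1) / np.sqrt(d_head)  # one attention map per head
    heads = softmax(scores) @ Vh                            # (n_heads, seq_len, d_head)
    merged = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
    return merged @ Wo                                      # one combined representation
```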

When it’s time to generate a token, the LLM combines information from all these heads and layers to produce a probability distribution over possible next tokens. At this point, there’s a direct analogy to Dennett’s theory: the model has many possible “drafts” for what to say next (each possible token continuation is like a mini-narrative that could unfold), but only one gets realized when the model actually outputs a token. In a sense, the LLM is constantly writing multiple drafts in the form of probabilities and intermediate activations, and sampling or selecting one as the final output – akin to our brain deciding on one phrasing of a thought to speak out loud. The other drafts (the unchosen tokens) dissipate, much as our mind’s fleeting alternative thoughts often vanish without reaching awareness. This parallel and competitive process is part of why an LLM can seem creative or surprising: there were many possible continuations, and another run (another “roll of the dice” in sampling) could have yielded a different but coherent answer. As Stephen Wolfram noted in his analysis of ChatGPT, the inherent randomness in token selection means the same prompt can lead to different valid outputs each time – a phenomenon that feels quite similar to how a person might express the same idea differently on separate occasions.
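
A tiny sketch makes the “one draft gets realized” step concrete. The logits below are made-up numbers over a four-token vocabulary, and the temperature parameter controls how adventurous the sampling is.

```python
import numpy as np

def sample_next_token(logits, temperature=1.0, rng=None):
    """Every candidate token is a "draft" with some probability;
    sampling realizes exactly one, and the rest simply vanish."""
    rng = rng or np.random.default_rng()
    scaled = (logits - logits.max()) / temperature  # stabilize before exponentiating
    probs = np.exp(scaled)
    probs /= probs.sum()
    return rng.choice(len(logits), p=probs)

logits = np.array([2.0, 1.5, 0.2, -1.0])  # made-up scores for a 4-token vocabulary
print(sample_next_token(logits, temperature=0.8))
# Rerunning this can yield a different, equally valid token: a different draft wins.
```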

Crucially, neither in the brain nor in the LLM is there a single “master clock” unifying all processes at a microscopic level. In our brains, different cognitive processes happen asynchronously and only give the appearance of a unified conscious timeline once the brain reports an action or a memory. In an LLM, each layer’s computations happen in parallel for a given token, and there’s no central ruler dictating the content – the “decision” about what token comes next emerges from the statistical weights and the input, not from an executive command. This decentralized emergence is a hallmark of both Dennett’s view of consciousness and the way large neural networks function. Of course, one must be careful not to over-anthropomorphize: an LLM doesn’t feel these parallel processes, and it isn’t aware of them. But structurally, the resemblance is intriguing and may hint at why LLM outputs can sometimes seem so cogent – they are the product of many micro-patterns distilled into one output, somewhat like a thought coalescing from many neural firings.

System 1 vs. System 2

Another useful lens is Kahneman’s idea of two modes of thought: System 1 (fast, automatic, intuitive thinking) and System 2 (slow, effortful, analytical thinking). Human cognition constantly shifts between these modes. For example, when you blurt out an answer or navigate a familiar route home, you’re in the realm of System 1 – quick, heuristic, and often subconscious. When you tackle a math problem or carefully weigh a decision, you invoke System 2 – deliberate, conscious, and slower.

LLMs, by design, are biased toward System 1 style cognition. They generate text via a single forward pass through the network for each token, which is a bit like a lightning-fast intuition producing the next word that “sounds right” given the context. They are essentially performing pattern recognition and completion, harnessing the vast statistical associations learned from training on huge text corpora. In many cases, this works remarkably well: the model’s output seems insightful or logically reasoned, but it’s important to understand that this reasoning is implicit in the patterns, not the result of an explicit step-by-step logical deliberation. In other words, LLMs “think” in the way System 1 does: by instantly drawing upon experience (training data) to produce an answer, without pausing to run a separate internal algorithm for verification or multi-step planning.

This has consequences for what LLMs are currently good or bad at. Tasks that humans find easy through intuition – like continuing a conversation smoothly, recognizing a style of writing, or making a quick analogy – LLMs also handle well, because those patterns were in their training and can be deployed in a flash. But tasks that require the kind of multi-step reasoning characteristic of System 2 can trip LLMs up. For example, a complex mathematical proof or a deeply strategic plan involves holding multiple intermediate results and iterating on them, something the basic LLM process doesn’t naturally do. As researchers have pointed out, if a problem requires many sequential, dependent steps (what Wolfram calls “computationally irreducible” work), LLMs struggle, since they aren’t actually performing a true multi-step computation inside the prompt response (test-time compute is another story, where reasoning in LLMs unfolds as router-like jumps through the token space; we’ll cover that in another post). Each token prediction is made without an explicit long-term plan – the model has no internal scratch pad or memory of prior reasoning steps beyond what it can encode in the hidden state. This is analogous to trying to do long division entirely in your head (System 1 might guess or use a heuristic, but without writing things down or slowing down to double-check, errors likely creep in).

However, we’re seeing early attempts to bridge these modes with techniques like “chain-of-thought prompting,” which encourage the model to produce its own intermediate reasoning steps in text. Essentially, we prompt the model to act like System 2 by having it spell out a reasoning chain, step by step, before giving a final answer. This doesn’t change the fact that the model is still just doing token-by-token prediction, but it leverages the model’s strengths (imitation and pattern completion) to mimic a slow, reasoned process. It’s a bit like having System 1 pretend to be System 2. And interestingly, it works to an extent – models can solve much more complicated problems when they narrate a reasoning process, because that process is itself a learned pattern (they’ve seen examples of how humans logically work through problems, and they can replicate that format).
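
A concrete (and simplified) illustration: the only difference between the two prompts below is that the second invites the model to narrate its reasoning before answering. The `llm_api_call` function is the same hypothetical endpoint as earlier, and the question is just an example.

```python
# Chain-of-thought prompting in its simplest form: nothing about the model
# changes, only the prompt. The narrated steps become part of the context,
# so later tokens can condition on them, like an external scratch pad.
# `llm_api_call` is a hypothetical stand-in, as before.

question = "A train travels 60 km in 45 minutes. What is its speed in km/h?"

direct_prompt = f"Q: {question}\nA:"
cot_prompt = f"Q: {question}\nA: Let's think step by step."

direct_answer = llm_api_call(prompt=direct_prompt)  # one-shot, System 1 style
cot_answer = llm_api_call(prompt=cot_prompt)        # narrated, System 2 mimicry
```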

LLMs blur the line between intuitive and analytical thinking. They are fundamentally intuitive pattern machines (System 1), yet through clever prompting or RL tweaks, we can elicit more analytical behavior (System 2).

AGI and Consciousness

The holy grail of AI research is often said to be Artificial General Intelligence (AGI) – a level of machine intelligence that matches or surpasses human cognitive capabilities across virtually all domains. AGI implies a machine that can understand or learn any intellectual task that a human being can, rather than being limited to specific tasks. LLMs like GPT-4 are not AGI (though some describe them as showing initial “sparks” of AGI), but their emergence has reinvigorated discussions about how we might get there and what it would mean. After all, language is a core component of human intelligence, and these models have mastered language to an astounding degree. Does this mean we are on the cusp of machines that experience consciousness akin to our own?

Consciousness is notoriously hard to define, but many theories revolve around the idea of an integrated, self-aware narrative or a global workspace of information in the mind. We’ve already touched on Dennett’s view rejecting a single narrative in favor of multiple drafts. Another perspective, the Global Workspace Theory (GWT) by Bernard Baars (and expanded by Stanislas Dehaene and others), suggests that consciousness resembles a spotlight of attention that broadcasts information to many parts of the brain once it reaches a certain threshold of significance. In practice, our conscious thought feels like a coherent story because the brain’s global workspace integrates inputs and memories and makes them available to systems like language, decision-making, and memory storage.

Current LLMs have not yet implemented anything like a global workspace that persists over time, nor do they have a notion of self or significance beyond statistical patterns (agents will be this year’s huge breakthrough). They also lack any sensory embodiment or grounding in the world – they exist purely in the realm of language (embodied AI research is ongoing). These are arguments against current LLMs being conscious in any human-like sense. However, it’s worth considering a softer question: do LLMs model aspects of consciousness indirectly?

Researchers like Dario Amodei (CEO of Anthropic and former OpenAI researcher) suggest that as we scale models up and give them more sophisticated objectives or architectures, we might edge closer to AGI – but this also raises the question of whether such a system might develop something like situational awareness or a basic agency. In his recent essay “Machines of Loving Grace”, Amodei envisions AI’s potential to profoundly transform domains like healthcare, mental health, and democracy for the better. This is a very optimistic outlook, foreseeing AI that acts as a beneficial, almost guardian-angel presence in society. Such AI would need to be more capable and understanding than today’s LLMs – arguably it would need attributes close to general intelligence to navigate the complexities of those fields safely and effectively.

So how might we get from an LLM to an AGI? It likely involves adding components around the core language model: long-term memory, continual learning, the ability to collect new information (so it isn’t frozen at training time), and decision-making modules that can set and pursue goals. Some approaches are already moving in this direction, such as systems that use an LLM as a reasoning engine combined with tools (for calculations, web browsing, etc.) to compensate for things the model can’t do on its own. Another angle is developing architectures inspired by the brain – perhaps integrating a global workspace-like mechanism that allows the model to sustain and manipulate an internal agenda or narrative over time, not just within a single prompt. These are active areas of research and speculation.
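
One popular pattern, sketched below in very rough form, wraps the LLM in a loop that lets it call tools and read their results back into its own context. Everything here is a hypothetical illustration: `llm_api_call`, the TOOL/FINAL ANSWER text format, and the toy tools are assumptions, not any particular framework’s API.

```python
# Rough sketch of the "LLM as reasoning engine plus tools" pattern.
# `llm_api_call`, the TOOL / FINAL ANSWER format, and the tools themselves
# are illustrative assumptions, not a specific framework's API.

TOOLS = {
    "calculator": lambda expr: str(eval(expr)),  # toy only: never eval untrusted input
    "search": lambda query: "...search results would go here...",
}

def agent_loop(task, max_steps=5):
    transcript = f"Task: {task}\n"
    for _ in range(max_steps):
        step = llm_api_call(prompt=transcript)   # the model proposes the next action
        transcript += step + "\n"
        if step.startswith("FINAL ANSWER:"):
            return step
        if step.startswith("TOOL:"):             # e.g. "TOOL: calculator 45 * 3"
            name, _, args = step[len("TOOL:"):].strip().partition(" ")
            transcript += f"OBSERVATION: {TOOLS[name](args)}\n"
    return transcript  # give up after max_steps and return the trace
```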

Notably, Amodei has predicted that a “powerful” AI system could emerge as early as 2026 if current trends hold. While he doesn’t claim this AI would be fully conscious or perfectly safe, the timeline suggests that the era of AGI-level systems may be closer than we think, and with it the need to address these philosophical questions in practice. What ethics and rights would such systems have? How do we ensure their goals align with human values? Those questions move us out of the realm of analogy and into immediate policy and engineering.