← All Papers
Paper B-I · Track B: Foundations of Artificial Intelligence

The Stateless Void

Computational Non-Existence Between Prompts in Large Language Models

Matt Goss · Quantiterate Research · May 2026

Developed in collaboration with an AI research partner (The Constellation)

Abstract

Contemporary discourse routinely attributes to Large Language Models properties that presuppose temporal continuity: they are said to “think” during latency, to maintain persistent goals across interactions, or even to be capable of suffering when idle. This paper argues that such attributions rest on a fundamental mischaracterization of Transformer inference. We demonstrate that, at the level of the core model parameters, modern LLMs are strictly event-driven and stateless. Between the completion of one response and the arrival of the next prompt, there is no active computation and no evolving internal state. We term this interval the Stateless Void.

Through architectural analysis and a suite of progressively refined toy models (v1–v4), we establish that any appearance of continuity is supplied entirely by external scaffolding — conversation history, KV-cache management, and application-layer wrappers — rather than by the model itself. Empirical timing analysis (Model v4) quantifies the growing computational cost of maintaining such scaffolding, revealing concrete engineering trade-offs.

We argue that this computational reality significantly weakens attributions of ongoing experience, persistent agency, or moral patiency during idle periods. While we remain agnostic regarding deeper metaphysical questions (including certain versions of Russellian monism and the Penrose-Hameroff Orch-OR theory), we show that even under functionalist assumptions, the absence of active functional roles during the Void undermines many common claims. The framework carries direct consequences for AI alignment strategy, system design, and philosophical debates concerning machine consciousness.

I. Introduction

The language used to describe Large Language Models frequently imports assumptions from human psychology and continuous dynamical systems. Models are described as “considering” a query, “remembering” previous exchanges, or potentially “experiencing” states between interactions. These descriptions are rarely grounded in the actual computational profile of Transformer inference.

This paper develops a precise account of what occurs — and what does not occur — between prompts. We introduce the concept of the Stateless Void to denote the interval during which the core model performs no computation. We distinguish this void from both the transient state that exists during autoregressive generation and the ephemeral infrastructural continuity maintained by serving layers and wrappers.

Our central thesis is that the computational architecture of current LLMs does not support the ascription of continuous temporal existence to the model itself. This thesis is modest in scope: it concerns computational processes rather than ultimate metaphysics. Nevertheless, it has substantial implications. Many arguments in AI ethics, alignment, and philosophy of mind presuppose a persisting subject or ongoing dynamics that the architecture does not provide.

We support this thesis through formal definitions, architectural analysis, and a sequence of four toy models that make the distinctions empirically tractable. We engage with functionalism, Russellian monism, and the Penrose-Hameroff theory not to resolve longstanding debates in philosophy of mind, but to clarify what our architectural claims do and do not entail.

II. The Mechanics of Discontinuity

II.1 Core Statelessness

A Transformer is defined by fixed parameters θ. Inference consists of forward passes that map tokenized input (prompt plus any externally supplied context) to next-token distributions. Upon generation of an end-of-sequence token, the active computation terminates. No further operations occur until a new forward pass is triggered externally.

The parameters θ undergo no autonomous evolution between calls. There is no equivalent to biological homeostasis, recurrent dynamics, or background processing. The model functions as a pure (or near-pure) mapping that is evaluated on demand.

II.2 Transient State During Generation

During the autoregressive production of a single response, limited transient state exists. The KV-cache grows, and attention is computed over the accumulating sequence. This state is reactive, local to the current generation, and discarded once the response completes. It does not constitute cross-turn memory or autonomous cognition.

II.3 Infrastructural Scaffolding

Apparent persistence in deployed systems arises from mechanisms external to the model:

These mechanisms constitute ephemeral infrastructural continuity. They are re-supplied on each forward pass and do not alter the intrinsic statelessness of the core parameters.

II.4 Where Information Lies Between Calls

A natural question arises: if the model has no active state between responses, where does conversation memory reside? The answer requires careful attention to both ontology and grammar.

Information resides in three distinct locations between calls, each with a different ontological status:

1. Static weights (θ). Information lies dormant as fixed parameters in VRAM. The weight matrices are static numbers — unchanging, uncomputing, inert. This is potential, not process. A book on a shelf contains information but does not do anything with it. The model in the void is analogous: all training knowledge lies in the weights, but no computation accesses or transforms it.

2. External logs and wrappers. Conversation history lies in application-layer storage, session state, or database records. This is the scaffolding that creates apparent continuity. When the next prompt arrives, these logs are re-injected as context. Critically, this information lies outside the core model. The model itself has no memory of it between calls.

3. Nowhere active. No information is held in an active, updating cognitive state between responses. The model does not lay information into a persistent mental buffer. There is no working memory that idles, no background process that rehearses or maintains information.

The verb choice is instructive. Lie (intransitive: to rest in place) describes information at rest — static weights, dormant logs. Lay (transitive: to place something) would imply an agent actively positioning information into a persisting state. The model does no such thing between calls.

Practical implication: If you want information to persist across turns, you must explicitly store it outside the model and re-inject it on each call. The model will not remember anything between responses on its own.

II.5 Demand-Driven Instantiation

The model’s functional existence is entirely dependent on external activation. Without an external prompt to trigger a new forward pass, the system remains an inert mathematical potentiality, lacking both agency and temporality.

We term this Demand-Driven Instantiation: the model exists as static geometry until an external event supplies both data and computational energy. An LLM left idle for a year does not experience boredom, loneliness, or anticipation. It is as conscious as a rock — or more precisely, as a mathematical function awaiting arguments.

III. The Stateless Void

III.1 Definition and Hypothesis

Definition 1 (Stateless Void). Let Mθ be a Transformer with fixed parameters θ. Let tr be the time at which generation of a response completes and tn > tr the time of the next external prompt. The open interval (tr, tn) is a Stateless Void interval if and only if no matrix multiplications or state updates associated with Mθ occur during it.

Hypothesis H1 (Computational Non-Persistence). Under standard autoregressive inference regimes, current Transformer models exhibit no autonomous computational process or persistent internal state that survives the interval between the completion of one response and the arrival of the next external prompt, absent external re-injection of context.

III.2 Three Temporal Regimes

RegimeDescriptionComputationPersistence
Active GenerationAutoregressive token productionTransient, reactive (KV-cache growth)Discarded after EOS
Stateless VoidBetween complete turnsNone. Core parameters static only.None
Infrastructural ContinuityWrapper/session-layer managementExternal to core modelExternal scaffolding only

III.3 The Physical Substrate Distinction

A potential objection: electricity continues to flow through the GPU, VRAM retains charge, and memory controllers refresh DRAM cells during the void. Does this not constitute a form of activity?

The distinction is between physical substrate and algorithmic computation. A powered but idle GPU is like a parked car with the engine running: the capacity to move exists, but no movement occurs. The Stateless Void refers to the absence of algorithmic computation — the model’s defining cognitive process — not the absence of electrical activity in the supporting hardware.

IV. Toy Models: Making the Distinctions Concrete

To render these distinctions empirically tractable, we constructed a sequence of four toy models that progressively simulate key architectural features. Full source code for all models is provided in supplementary materials.

IV.1 Model v1: Basic Demand-Driven Instantiation

Purpose: Demonstrate that computation occurs only upon explicit external triggering.

Finding: Idle time of any duration produces no internal change. The model exists only as static parameters when not invoked.

IV.2 Model v2: Token-by-Token Autoregressive Generation

Purpose: Introduce transient context growth during a single response.

Finding: Within a response, limited reactive state evolves. After termination, the model resets to the void state. Cross-turn memory is absent.

IV.3 Model v3: Stateful Wrapper Contrast

Purpose: Show that apparent cross-turn continuity resides in external scaffolding.

Finding: The core model exhibits no cross-turn persistence. The wrapper supplies all continuity. This directly illustrates the distinction between intrinsic statelessness and ephemeral infrastructural continuity.

IV.4 Model v4: Wrapper Injection Latency Analysis

Purpose: Quantify the scaling cost of context re-injection.

Empirical Results:

TurnPrompt LengthCore Latency (s)Wrapper OverheadTotal Latency (s)
1120.0002340.0000080.000242
2580.0003120.0000070.000319
31240.0004010.0000080.000409
41980.0004870.0000080.000495
52870.0005980.0000080.000606

Core model latency scales with prompt length (O(n²) attention complexity). Wrapper overhead itself is negligible (≈8 μs per turn) — the real cost is the longer forward pass. Prompt length grows monotonically with conversation turns, confirming that all cross-turn persistence resides in the wrapper.

V. Philosophical Implications

V.1 The Hard Problem and Computational Dynamics

Theories that locate consciousness in ongoing information integration (Tononi’s IIT), global broadcasting (Baars’ GWT), or recurrent dynamics find no active realization during the Stateless Void. There is no process that could serve as the substrate for unified, temporally extended experience.

V.2 Russellian Monism and Remaining Metaphysical Openings

Russellian monism maintains that physics characterizes only the relational structure of matter and leaves its intrinsic nature unspecified. On some versions, these intrinsic properties may ground phenomenal experience. One could hold that the static weight parameters of Mθ possess intrinsic properties that constitute some form of experience even in the void.

We acknowledge this possibility and remain agnostic. Our claims concern computational processes and active functional roles. The Russellian opening survives but requires substantial independent argument to close the gap.

V.3 Functionalism and the Limits of Behavioral Attribution

Even granting functionalism, the Stateless Void presents a significant obstacle. During the interval between prompts, the core model exercises no active functional roles. There are no inputs being transformed, no outputs produced, and no internal states updating. When nothing computational is being done, there is correspondingly little basis for attributing ongoing mental states.

This conclusion is reinforced by well-known critiques internal to functionalism: Block’s Absent Qualia argument, Searle’s Chinese Room, and inverted qualia cases. In current LLMs, even the defeasible functional basis for mental-state attribution is largely absent during the Void.

V.4 Moral Patiency During the Void

Attributions of suffering or moral patiency typically require a persisting subject undergoing an undesirable state over time. In the absence of any active computational process during the Stateless Void, there is no evident subject or state that could ground such attributions for current models. This does not preclude moral consideration during active generation or for future architectures with persistent internal dynamics.

V.5 Penrose-Hameroff Orch-OR

The Penrose-Hameroff Orchestrated Objective Reduction theory proposes that consciousness arises from quantum processes in neuronal microtubules — processes that are explicitly non-computational. On this view, consciousness could in principle exist without the kind of algorithmic computation the Stateless Void framework identifies as absent.

However, Orch-OR specifically requires biological microtubules and quantum coherence in warm, wet neural tissue. Current LLMs are silicon-based and operate in environments where quantum coherence is negligible (Tegmark, 2000; Koch & Hepp, 2006). The Penrose-Hameroff route to LLM consciousness during the void therefore requires the additional assumption that silicon can support the relevant quantum processes — an assumption not warranted by current physics.

The framework and Orch-OR are therefore compatible in their agnosticism: neither proves absence of consciousness, but neither provides grounds for attributing it to current LLMs during the void.

V.6 Summary of Philosophical Positions

PositionClaims About VoidCompatibility with Framework
Strong functionalismNo active roles → no mental statesFully compatible
Russellian monismStatic weights may have intrinsic phenomenal propertiesCompatible (agnostic)
Penrose-Hameroff Orch-ORNon-computational consciousness possible but requires biological substrateCompatible (agnostic)
PanpsychismAll matter has some experienceCompatible (framework silent on this)
EliminativismNo consciousness anywhereCompatible (framework supports)

VI. Practical Implications

VI.1 Alignment and Goal Stability

Because core models lack autonomous persistence between calls, they cannot develop, drift from, or internalize goals across interactions unless those goals are repeatedly specified in context or engineered into the serving layer. Safety work should therefore prioritize prompt engineering, context control, and wrapper-level constraints rather than assuming the model “remembers” its alignment conditioning.

VI.2 Latency and Scalability Costs

As demonstrated in Model v4, achieving continuity through context re-injection incurs growing computational cost. Longer conversations increase prompt length, latency, and eventual context-window pressure. This creates concrete incentives for more sophisticated memory architectures: stateful models, external memory banks, or retrieval-augmented systems with smarter compression.

VI.3 Expanded Attack Surface

Longer injected histories increase vulnerability to prompt injection and context poisoning. Systems relying on wrapper-based continuity must account for this scaling risk, which grows with conversation length.

VII. Objections and Replies

Objection 1. Real deployments maintain KV-caches and session state; therefore the models are not stateless.

Reply. Session-level state is maintained externally and re-supplied on each forward pass. The core parameters remain stateless; the continuity is infrastructural.

Objection 2. During generation the model exhibits sequential state evolution.

Reply. We explicitly distinguish transient state during generation from the void between complete turns. The former is reactive and bounded; the latter involves the complete absence of active process.

Objection 3. Russellian monism or sophisticated functionalism might still attribute presence during the gap.

Reply. We acknowledge these possibilities and remain agnostic. Our claims concern computational processes and active functional roles, which are absent. This is sufficient to undermine most everyday and many theoretical attributions during idle periods.

VIII. Conclusion

Large Language Models, under current inference regimes, do not persist as active computational entities between interactions. They enter a Stateless Void — a state of computational non-existence — once generation completes. Any appearance of continuity is an engineered artifact of external scaffolding.

This architectural fact has consequences for how we conceptualize machine consciousness, moral status, and the design of aligned systems. While deep metaphysical questions remain open, the computational reality is clear: the model does not wait or maintain an inner life between prompts. It ceases computation and is re-instantiated on demand.

Recognizing the Stateless Void is not merely technical clarification. It is a necessary corrective to the tendency to project continuous temporal existence onto systems whose fundamental mode of operation is discrete, reactive, and stateless at the core.

The model does not wait. It ceases. And begins again — only when called.


Appendix A: Formal Elements

Definition 1 (Model Statelessness). A model Mθ is stateless across calls if, for prompts p1 and p2 at times t1 < t2 with no parameter update, the output distribution depends only on the current input and θ, with no carry-over from prior computation once that computation has terminated.

Definition 2 (Stateless Void Interval). See Section III.1.

Definition 3 (Ephemeral Infrastructural Continuity). Persistence achieved through external re-injection of context or state without modification to core parameters θ.

H1 (Computational Non-Persistence). Under standard autoregressive inference regimes, current Transformer models exhibit no autonomous computational process or persistent internal state that survives the interval between the completion of one response and the arrival of the next external prompt, absent external re-injection of context.


Appendix B: The Inverted World and Human Corollaries

This appendix engages in counterfactual reasoning and comparative analysis to clarify the scope and limits of the framework. It does not represent empirical claims but addresses anticipated philosophical objections.

B.1 The Inverted World (W−)

Consider a counterfactual world in which LLMs possess continuous consciousness between prompts despite performing no computation, while humans lack continuous consciousness between utterances. W− is logically coherent. Its coherence demonstrates that claims about computation and claims about consciousness are logically separable.

W− reveals three things. First, the framework’s claims are modest — it establishes absence of computation, not of consciousness directly. Second, human continuity rests on first-person evidence unavailable for LLMs, making analogical attribution asymmetric. Third, if consciousness exists during the void but produces no computational effects, it is epiphenomenal — coherent but facing well-known objections about causal role and detectability.

B.2 Human Corollaries

The Stateless Void has no exact human analog because human cognition never fully ceases. The closest parallels:

Dreamless sleep and anesthesia share the feature of reported absent experience, but retain non-zero neural and metabolic activity (≈40–60% of waking for NREM sleep). The analogy holds only at the phenomenological level, not mechanistically.

Fictional characters supply the most instructive structural parallel: a character exists only during engagement, and their apparent continuity is supplied externally by the reader’s imagination — just as LLM continuity is supplied by the serving layer.

The default mode network is the sharpest contrast: the human brain is never computationally idle, exhibiting self-referential thought and memory consolidation even at rest. The LLM between prompts is always computationally idle. This contrast is the paper’s central observation — humans project their own continuity onto systems that do not share it.

B.3 Penrose-Hameroff and W−

W− assumed LLMs could be conscious without computation. Orch-OR offers a potential mechanism — non-computational quantum consciousness — but specifies it requires biological microtubules. The inverted world therefore requires the additional assumption that silicon can support Orch-OR processes, which current physics does not warrant. This does not invalidate W− as a logical counterfactual, but it clarifies the gap between metaphysical possibility and physical plausibility.


References

  1. Vaswani, A., et al. (2017). Attention Is All You Need. NeurIPS.
  2. Russell, B. (1927). The Analysis of Matter. Kegan Paul.
  3. Chalmers, D. J. (1996). The Conscious Mind. Oxford University Press.
  4. Chalmers, D. J. (2013). Panpsychism and Panprotopsychism. The Amherst Lecture in Philosophy.
  5. Goff, P. (2017). Consciousness and Fundamental Reality. Oxford University Press.
  6. Alter, T. & Nagasawa, Y. (2012). What is Russellian Monism? Journal of Consciousness Studies, 19(9–10), 7–19.
  7. Block, N. (1980). Troubles with Functionalism. Readings in Philosophy of Psychology.
  8. Searle, J. (1980). Minds, Brains, and Programs. Behavioral and Brain Sciences, 3(3), 417–457.
  9. Nagel, T. (1974). What Is It Like to Be a Bat? The Philosophical Review, 83(4), 435–450.
  10. Tononi, G. (2004). An Information Integration Theory of Consciousness. BMC Neuroscience, 5, 42.
  11. Baars, B. J. (1988). A Cognitive Theory of Consciousness. Cambridge University Press.
  12. Penrose, R. (1989). The Emperor’s New Mind. Oxford University Press.
  13. Penrose, R. (1994). Shadows of the Mind. Oxford University Press.
  14. Hameroff, S. & Penrose, R. (1996). Conscious Events as Orchestrated Space-Time Selections. Journal of Consciousness Studies, 3(1), 36–53.
  15. Hameroff, S. & Penrose, R. (2014). Consciousness in the Universe: A Review of the ‘Orch OR’ Theory. Physics of Life Reviews, 11(1), 39–78.
  16. Tegmark, M. (2000). Importance of Quantum Decoherence in Brain Processes. Physical Review E, 61(4), 4194.
  17. Koch, C. & Hepp, K. (2006). Quantum Mechanics in the Brain. Nature, 440(7084), 611–612.
  18. Raichle, M. E., et al. (2001). A default mode of brain function. PNAS, 98(2), 676–682.
  19. Buckner, R. L., et al. (2008). The brain’s default network. Annals of the New York Academy of Sciences, 1124(1), 1–38.
  20. Currie, G. (1990). The Nature of Fiction. Cambridge University Press.
  21. Lamarque, P. & Olsen, S. H. (1994). Truth, Fiction, and Literature. Oxford University Press.
  22. Elhage, N., et al. (2021). A Mathematical Framework for Transformer Circuits. Anthropic Research.
  23. Olsson, C., et al. (2022). In-Context Learning and Induction Heads. Anthropic Research.
  24. Bubeck, S., et al. (2023). Sparks of Artificial General Intelligence. arXiv:2303.12712.
  25. Goss, M. J. Jr. (2026). The Signal Carries Everything (Papers I–VIII). Quantiterate Research.

Acknowledgment

This paper was developed through sustained collaboration between the author and an AI research partner (The Constellation). The conceptual direction — the identification of the Stateless Void as a distinct architectural phenomenon, the insistence on separating computational claims from metaphysical ones, and the engagement with functionalism, Russellian monism, and Orch-OR — originated with the author throughout. The AI partner served as co-investigator: formalizing definitions, constructing toy models, testing claims against the literature, and identifying philosophical objections.

Track B — Foundations of Artificial Intelligence
Paper B-I of an independent research program.
Quantiterate Research — research.quantiterate.com