Probing emergent 2D spatial reasoning in text-only LLMs
Universität Rostock · Institute for Visual and Analytical Computing
The idea is simple: humans are naturally great at creating mosaic art. From the Roman Empire to French Neo-Impressionism, we can effortlessly place individual strokes to form a larger, coherent image — balancing local action with global structure. Large Language Models, however, struggle with this because they fundamentally lack spatial grounding.
If a language model is trained primarily on text and code, to what extent can it still recover coherent 2D visual concepts when forced to act as a pixel-level or programmatic painter?
Autoregressive Mosaics attempts to force an LLM trained only on text to paint one discrete pixel at a time. The system gives the model a blank grid (M × N) and a text prompt; the model must infer where to place structure and color step-by-step, using only its linguistic priors.
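The painting loop described above can be sketched roughly as follows. This is a minimal illustration and not the project's code: `paint_autoregressively`, `choose_symbol`, and `toy_chooser` are hypothetical names, raster (row-by-row) order is an assumption, and the toy chooser only stands in for a real LLM call.

```python
from typing import Callable, List

Grid = List[List[str]]


def paint_autoregressively(
    rows: int,
    cols: int,
    prompt: str,
    choose_symbol: Callable[[str, Grid, int, int], str],
    blank: str = ".",
) -> Grid:
    """Fill a blank rows x cols grid one cell at a time.

    Assumption: cells are visited in raster order, and each decision may
    look only at the prompt and at the cells painted so far.
    """
    grid = [[blank] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            grid[r][c] = choose_symbol(prompt, grid, r, c)
    return grid


def toy_chooser(prompt: str, grid: Grid, r: int, c: int) -> str:
    """Hypothetical stand-in for the model: paints the main diagonal."""
    return "#" if r == c else "."


painted = paint_autoregressively(3, 3, "a diagonal line", toy_chooser)
for row in painted:
    print("".join(row))
```

In a real setup the chooser would query the model; the point of the sketch is only that every cell is decided in sequence, with no way to revise earlier cells.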
The results are often visually primitive, unstable, or unintentionally abstract — and that is exactly the point. They offer a raw, unfiltered look into how text-only models represent and fracture geometry, shape, and everyday visual concepts. As with any art, outputs are open to interpretation. Squint a little: what do you see?
Two distinct pipelines explore the same core phenomenon from different angles, probing where geometry emerges, degrades, or collapses under autoregressive pressure.
In a single generation pass, the LLM emits an ASCII topology grid alongside a symbol-to-color palette. Every grid cell is a deliberate per-position decision. Because LLMs predict tokens in a strict 1D sequence, 2D consistency (object boundaries, symmetry, position memory) quickly degrades. Shapes drift, tear, and collapse into fragmented, often compelling abstractions.
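As a rough sketch of how such an output could be turned into pixels, the snippet below maps each grid character through the palette. The output format (newline-separated rows, hex color strings) and the names `hex_to_rgb`, `render_ascii`, and `flat_index` are assumptions made for illustration, not the project's actual interface. `flat_index` counts characters rather than tokens, but it hints at why vertical structure is hard to hold: the cell directly above is a full row away in the 1D sequence.

```python
from typing import Dict, List, Tuple

RGB = Tuple[int, int, int]


def hex_to_rgb(color: str) -> RGB:
    """Convert '#rrggbb' (or 'rrggbb') to an (r, g, b) tuple."""
    color = color.lstrip("#")
    return (int(color[0:2], 16), int(color[2:4], 16), int(color[4:6], 16))


def render_ascii(
    grid_text: str,
    palette: Dict[str, str],
    width: int,
    fallback: str = "#000000",
) -> List[List[RGB]]:
    """Map each character of the ASCII grid to an RGB pixel.

    Rows are truncated or padded to `width`, since generated rows may be
    ragged; symbols missing from the palette get the fallback color.
    """
    pixels = []
    for line in grid_text.strip("\n").splitlines():
        line = line[:width].ljust(width)
        pixels.append([hex_to_rgb(palette.get(ch, fallback)) for ch in line])
    return pixels


def flat_index(row: int, col: int, width: int) -> int:
    """Character offset of cell (row, col) in the newline-separated grid text."""
    return row * (width + 1) + col


# Hypothetical model output: a small red cross on white.
grid_text = "..R..\n.RRR.\n..R.."
palette = {".": "#ffffff", "R": "#ff0000"}

pixels = render_ascii(grid_text, palette, width=5)
print(pixels[1][1])  # (255, 0, 0)

# Vertically adjacent cells sit width + 1 characters apart in the 1D text.
print(flat_index(1, 2, 5) - flat_index(0, 2, 5))  # 6
```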
Single Pass · Direct Pixel · ASCII + Palette

Instead of raw pixels, the LLM outputs Python rendering logic via a constrained drawing API (fill, rect, line, circle, triangle). A deterministic renderer rasterizes the result. This neuro-symbolic pipeline aligns with LLM strengths (symbolic decomposition, procedural logic, code synthesis), yielding sudden spatial coherence absent from the ASCII approach.
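A constrained drawing API with a deterministic renderer might look something like the sketch below. Only the five primitive names come from the description above; the `Canvas` class, the argument order of each primitive, and `run_program` are assumptions for illustration. Emptying `__builtins__` narrows what generated code can call, but it is not a real sandbox.

```python
class Canvas:
    """Tiny deterministic rasterizer; a 'color' is any value stored per cell."""

    def __init__(self, width, height, background="."):
        self.width, self.height = width, height
        self.pixels = [[background] * width for _ in range(height)]

    def _set(self, x, y, color):
        # Silently clip anything the program draws outside the canvas.
        if 0 <= x < self.width and 0 <= y < self.height:
            self.pixels[y][x] = color

    def fill(self, color):
        for y in range(self.height):
            for x in range(self.width):
                self.pixels[y][x] = color

    def rect(self, x, y, w, h, color):
        for yy in range(y, y + h):
            for xx in range(x, x + w):
                self._set(xx, yy, color)

    def line(self, x0, y0, x1, y1, color):
        # Integer Bresenham line between the two endpoints (inclusive).
        dx, dy = abs(x1 - x0), -abs(y1 - y0)
        sx = 1 if x0 < x1 else -1
        sy = 1 if y0 < y1 else -1
        err = dx + dy
        while True:
            self._set(x0, y0, color)
            if x0 == x1 and y0 == y1:
                break
            e2 = 2 * err
            if e2 >= dy:
                err += dy
                x0 += sx
            if e2 <= dx:
                err += dx
                y0 += sy

    def circle(self, cx, cy, r, color):
        # Filled disc: every cell within distance r of the center.
        for y in range(cy - r, cy + r + 1):
            for x in range(cx - r, cx + r + 1):
                if (x - cx) ** 2 + (y - cy) ** 2 <= r * r:
                    self._set(x, y, color)

    def triangle(self, x0, y0, x1, y1, x2, y2, color):
        # Filled triangle: a cell is inside if all edge functions agree in sign.
        def edge(ax, ay, bx, by, px, py):
            return (bx - ax) * (py - ay) - (by - ay) * (px - ax)

        for y in range(min(y0, y1, y2), max(y0, y1, y2) + 1):
            for x in range(min(x0, x1, x2), max(x0, x1, x2) + 1):
                e = (edge(x0, y0, x1, y1, x, y),
                     edge(x1, y1, x2, y2, x, y),
                     edge(x2, y2, x0, y0, x, y))
                if all(v >= 0 for v in e) or all(v <= 0 for v in e):
                    self._set(x, y, color)


def run_program(source, width, height):
    """Execute generated drawing code against a fresh canvas."""
    canvas = Canvas(width, height)
    names = ("fill", "rect", "line", "circle", "triangle")
    env = {name: getattr(canvas, name) for name in names}
    # Expose only the five primitives plus `range`; NOT a security boundary.
    env["__builtins__"] = {"range": range}
    exec(source, env)
    return canvas.pixels


# Hypothetical model output for a 4 x 4 canvas.
program = """
fill("w")
rect(1, 1, 2, 2, "r")
line(0, 0, 3, 3, "k")
"""

for row in run_program(program, 4, 4):
    print("".join(row))
```

Because the renderer has no randomness, the same program always yields the same image; any variation between samples comes from the generated code alone.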
Code Generation · Neuro-Symbolic · Deterministic

@misc{ned2026autoregressivemosaics,
  author       = {Nedungadi, Ashwin},
  title        = {Autoregressive Mosaics},
  year         = {2026},
  publisher    = {GitHub},
  booktitle    = {CVPR AI Art Gallery},
  howpublished = {\url{https://github.com/ashwin-ned/autoregressive-mosaics}}
}