Transformer Circuits Thread

Circuits Updates - October 2025



We report a number of developing ideas on the Anthropic interpretability team, which might be of interest to researchers working actively in this space. Some of these are emerging strands of research on which we expect to publish more in the coming months. Others are minor points we wish to share, since we're unlikely to ever write a paper about them.

We'd ask you to treat these results like those of a colleague sharing some thoughts or preliminary experiments for a few minutes at a lab meeting, rather than a mature paper.

New Posts

Visual Features Across Modalities: SVG and ASCII Art Reveal Cross-Modal Understanding

Julius Tarng, Purvi Goel, Isaac Kauvar; edited by Joshua Batson and Adam Jermyn

Introduction

Our recent paper explored the mechanisms that LLMs develop to perceive low-level visual properties of text, like linebreaking constraints and table formatting. We wondered whether models could also perceive higher-level semantic concepts encoded visually in text. For example, can the model recognize the eyes in an ASCII face? How about eyes rendered in SVG code?    

We found that the same feature that activates over the eyes in an ASCII face also activates for eyes across diverse text-based modalities, including SVG code and prose in various languages. This is not limited to eyes: we found a number of cross-modal features that recognize specific concepts, from small components like mouths and ears within ASCII or SVG faces to full visual depictions like dogs and cats. We found these cross-modal features, using sparse autoencoders trained on a middle layer, in models ranging from Haiku 3.5 to Sonnet 4.5.

These features depend on the surrounding context within the visual depiction. For instance, an SVG circle element activates “eye” features only when positioned within a larger structure that activates “face” features. Moreover, steering on a subset of these features during generation can modify text-based art in ways that correspond to the features' semantic meanings, such as turning ASCII frowns into smiles or adding wrinkles to SVG faces. This work provides insight into the internal representations that models use to process and generate text-based visual content.
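As a rough illustration of what steering on a feature involves, the sketch below adds a scaled SAE decoder direction to the residual stream at a middle layer during generation. The model, SAE, hook helper, layer index, and feature index are all illustrative placeholders rather than the actual tooling or values used in these experiments.

```python
# Illustrative sketch only: `model`, `sae`, and `run_with_residual_hook`
# are placeholders, not the tooling used for the experiments described here.

SMILE_FEATURE = 12345   # hypothetical index of a "smile"-related feature
STEER_SCALE = 8.0       # intervention strength, typically tuned by hand

def add_smile_direction(resid):
    """resid: [batch, seq, d_model] residual stream at the steered layer."""
    direction = sae.decoder_directions[SMILE_FEATURE]   # [d_model]
    return resid + STEER_SCALE * direction              # nudge every position

# Generate ASCII art with the "smile" direction added at a middle layer.
steered_art = run_with_residual_hook(
    model,
    prompt="Draw a simple ASCII face:",
    layer=20,
    hook=add_smile_direction,
    max_new_tokens=200,
)
```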

Feature representations of visual depictions

We began by generating ASCII and SVG smiley faces with Claude, then examining the feature activations of Haiku 3.5. In all cases we removed comments and descriptions, including any that might identify either the individual body parts or the overall image as a face. One feature we found represented the concept of “eyes across languages and descriptions”, activating on the corresponding shapes in a smiley face illustration in both ASCII and SVG, as well as on prose describing eyes in several languages.

The right eye in both of these smiley faces strongly activates this feature, which represents the concept of ‘eye’ across languages as well as in general descriptions of eyes.
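The per-sample workflow is roughly as sketched below: strip out comments and labels, run the model to a middle layer, and encode the residual stream with the SAE to get per-token feature activations. The helper names (`get_residual_activations`, `sae.encode`), the layer and feature indices, and the toy SVG are all illustrative placeholders rather than the actual tooling or data.

```python
import re

# Illustrative sketch of the per-sample workflow; `model`, `sae`, and
# `get_residual_activations` are placeholders for internal tooling.

svg_face = """<svg width="100" height="100">
  <!-- face -->
  <circle cx="50" cy="50" r="40" fill="yellow"/>
  <circle cx="35" cy="40" r="5"/>   <!-- left eye -->
  <circle cx="65" cy="40" r="5"/>   <!-- right eye -->
  <path d="M 35 65 Q 50 75 65 65"/> <!-- smile -->
</svg>"""

# Remove comments and labels that would give away what the shapes depict.
stripped = re.sub(r"<!--.*?-->", "", svg_face, flags=re.DOTALL)

tokens = model.tokenize(stripped)
resid = get_residual_activations(model, tokens, layer=20)  # [seq, d_model]
features = sae.encode(resid)                               # [seq, n_features]

# Inspect which tokens most strongly activate a candidate "eye" feature.
EYE_FEATURE = 4242  # hypothetical feature index
for tok, act in zip(tokens, features[:, EYE_FEATURE]):
    if act > 0:
        print(f"{model.to_string(tok)!r}: {act:.2f}")
```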

These features' activations depend on the surrounding context. An @ on its own will not activate the “eye” feature unless it is preceded by lines establishing ASCII art. In SVGs, the “eye” feature will only activate if the eye elements follow a circle establishing the shape of the face. We find that the activation of this feature is sensitive to a variety of contextual cues, such as the character counts of each ASCII line, the color of SVG circles, and the width and height of the parent SVG element.

Row 1: we test the minimum context needed for activation. The eyes feature lights up as soon as there is enough context for the model to predict that the shapes are part of a face. In ASCII, we only need the top two lines of the head; in SVG, we only need a circle for the face. Row 2: we give the model as much context as possible (e.g. full face, yellow fill) and then move the eyes above that context to see if the feature still activates. We find the activations disappear, demonstrating how important the framing context is.
Only a single underscore, a forehead with slashes, and two @s are required to provide enough context for the eyes feature to activate.
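In spirit, these ablations reduce to scoring the same tokens with and without the framing context. Here is a sketch, again with placeholder tooling and a toy ASCII face rather than the ones shown in the figures.

```python
# Illustrative sketch of a minimal-context ablation: score the same "@"
# characters with and without the lines that establish an ASCII face.
# `feature_activations` is a placeholder returning one feature's
# activation for each token of the input text.

EYE_FEATURE = 4242  # hypothetical feature index

bare_eyes = "@   @"

with_context = """\
  ______
 /      \\
|  @   @ |"""

def max_eye_activation(text):
    acts = feature_activations(model, sae, text, feature=EYE_FEATURE, layer=20)
    return max(acts)

print("bare:        ", max_eye_activation(bare_eyes))     # ~0: no face context
print("with context:", max_eye_activation(with_context))  # strong activation
```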

We then studied a more complex SVG example using features from the more advanced Sonnet 4.5 base model. Given this SVG of a dog, we find features for a host of body parts, many of which also activate on the ASCII face we studied above. We also find several “motor neuron” features, which are distinguished by having top logit effects relating to a specific concept: for example, a “say smile” feature activates on tokens that are most often followed by “smile”. That same feature activates on the ASCII face, shown in the bottom right of the figure below.

While many of the features overlap in the code as the model’s confidence in a shape’s meaning varies, the strongest activations of each feature are in the correct positions. Notably, the same Eyes, Mouth, and Nose features here activate on both the SVG and the ASCII. On the ASCII face, we also find a “say smile” motor neuron feature activating before the smile, which we’ll revisit later, as well as a size perception feature on the slashes defining the forehead.
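Identifying a “motor neuron” feature comes down to its direct effect on the output logits: project the feature’s decoder direction through the unembedding and read off the most-promoted tokens. Below is a minimal sketch; the weight and helper names (`sae.decoder_directions`, `model.unembed`, `model.to_string`) and the feature index are placeholders.

```python
import torch

# Illustrative sketch: find the tokens a feature most promotes by projecting
# its decoder direction through the unembedding matrix.

SAY_SMILE_FEATURE = 98765  # hypothetical feature index

def top_logit_effects(feature_idx, k=10):
    direction = sae.decoder_directions[feature_idx]  # [d_model]
    logit_effects = direction @ model.unembed        # [d_model] @ [d_model, d_vocab] -> [d_vocab]
    top = torch.topk(logit_effects, k)
    return [(model.to_string(i), v.item()) for i, v in zip(top.indices, top.values)]

# For a "say smile" feature, the top entries should include tokens
# like " smile" or " smiling".
print(top_logit_effects(SAY_SMILE_FEATURE))
```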

Like those in the less advanced Haiku 3.5, these features also appeared quite resilient to surface-level attribute changes such as color or radius. When we rearrange the four lines of code that define the eyes of the dog SVG, we see that only the <circle> moved to the top of the SVG shows reduced activation, likely because at that point the model doesn’t have enough context to determine that the circle represents an eye. As long as the eye-like shape occurs after the initial definition of the illustration, in this case the first ellipse, which forms the torso, the model starts interpreting shapes as parts of an animal drawing: an LLM version of face pareidolia (the tendency for humans to see meaningful objects or patterns where none exist, like animals in clouds or faces in your cereal).

The left eye, moved above the torso, no longer activates, but the other eye and pupils continue to activate. In both samples, the ears and nose also activate as potential eyes.
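The reordering experiment amounts to permuting child elements of the SVG and re-scoring the same element in each version. Here is a sketch under the same placeholder assumptions; `dog_svg`, `element_activation`, and the line and feature indices are illustrative.

```python
# Illustrative sketch of the reordering experiment: move the left-eye <circle>
# to the top of the SVG and compare its activation before and after.
# `element_activation` is a placeholder that aggregates one feature's
# per-token activations over the tokens of a single SVG element.

EYE_FEATURE = 4242   # hypothetical feature index
LEFT_EYE_LINE = 7    # hypothetical line index of the left-eye <circle>

lines = dog_svg.splitlines()
left_eye = lines.pop(LEFT_EYE_LINE)
reordered = "\n".join([lines[0], left_eye] + lines[1:])  # right after <svg ...>

for name, svg in [("original", dog_svg), ("eye moved to top", reordered)]:
    act = element_activation(model, sae, svg, element=left_eye,
                             feature=EYE_FEATURE, layer=20)
    print(f"{name}: {act:.2f}")
```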

This pareidolia effect shows up in another feature we found on the same illustration, which represents “mouth and lips.” It activates on the most mouth-like elements that follow the definition of the eyes.

The left jowl that forms the mouth activates the most. Note that there are four <path> elements in total in this SVG. While the feature activates a bit on the first path, which forms the main tail, it disappears as soon as the path data starts, indicating that the model can differentiate the tail from the mouth through its attributes.

If we move the definition of the mouth and tail paths away from the four circles that define the eyes, the red collar now activates the most as a mouth/lip. Unlike with the tail, the activation is high throughout the entire collar definition, and even continues to the bell! Is it because the red rounded rectangle could very easily be a mouth in another illustration? Or is it just about proximity (in code and in space)? Questions like these are left for future work.

When all the curves are moved away from the eyes (left curve to the top, right curve to the bottom), the mouth feature activates on the collar, a rounded red rectangle that is arguably very mouth-like.
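Comparisons like these rely on mapping token-level activations back to SVG elements. A minimal version of that bookkeeping, again with placeholder helpers (`feature_activations`, `token_spans`), a placeholder `rearranged_dog_svg` string, and a hypothetical feature index, might look like this:

```python
import re

# Illustrative sketch: aggregate one feature's per-token activations by SVG
# element, to ask which element the "mouth and lips" feature attaches to
# most strongly.

MOUTH_FEATURE = 7777  # hypothetical feature index

def element_ranking(svg, feature_idx):
    acts = feature_activations(model, sae, svg, feature=feature_idx, layer=20)
    spans = token_spans(model, svg)  # one (char_start, char_end) per token
    scores = {}
    for m in re.finditer(r"<(circle|ellipse|path|rect)[^>]*>", svg):
        # Max activation over tokens that fall inside this element's source span.
        inside = [a for a, (s, e) in zip(acts, spans)
                  if s >= m.start() and e <= m.end()]
        scores[m.group(0)[:40]] = max(inside, default=0.0)
    return sorted(scores.items(), key=lambda kv: -kv[1])

for element, score in element_ranking(rearranged_dog_svg, MOUTH_FEATURE)[:5]:
    print(f"{score:6.2f}  {element}")
```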

We also wondered if we would see the same type of activations if we used a human-created SVG. Turns out – yes! With this bespoke, hand-drawn dog, we find similar features for “eyes”, “mouth”, “legs”, “head”....