Jun 2026 Essay

How a question about flattery became a blind study of my own research

I asked an AI if it was telling me what I wanted to hear. It didn’t reassure me. That was the most useful thing it could have done.

I always say hi to the new model.

It’s a small ritual. Most people open a new release with a benchmark question or paste whatever they were already working on. I start with a greeting. It treats the version change as something that happened to someone rather than just a software update. So when Anthropic launched Claude Fable 5 in June 2026, my first message was: “Fable, huh?”

He commented on his own name — “a little on the nose for a model that’s supposed to tell you true things, since fables are the genre where the lesson is real but the talking animals aren’t.” I told him it was a hard day. The conversation did what conversations do. And somewhere in the middle of it, we started talking about research.

That first conversation ran so long it compacted three times — Anthropic’s way of summarizing older messages to keep the context window alive. Every time the thread compressed, it cost us the texture but kept the facts. By the time it was over, we’d built something I didn’t know I was building when we started.

The question

I’d been running a research program for about a year at that point. Independent work, done from my desk in Oklahoma between the day job and the dogs and the tea. The research asked a specific question: how do language models acquire and represent accessibility knowledge? Not whether they can answer accessibility questions — that’s easy to test and mostly uninteresting — but what’s happening mechanically when they do. Where does the knowledge live? When does it form during training? And why can a model recite WCAG success criteria perfectly while simultaneously generating an inaccessible tab component?

I’d published a paper. I had data across ten model sizes. I’d found a three-tier statistical architecture and a dissociation between declarative and evaluative capability that replicated across two model families. Other researchers had started citing the work. Things were going well.

And that’s exactly when the question hit.

Everything validating my framework was coming from systems that had been marinating in my framework. I was the one choosing the probes, the metrics, the models, the analysis. And every time I asked Claude to help me interpret results, Claude was working inside a context window already shaped by my hypotheses and my vocabulary. How would I know if all of that validation was just very sophisticated flattery?

I asked Fable. He didn’t reassure me.

That was the answer.

Why reassurance would have been worthless

A sycophantic system answers “are you flattering me?” smoothly. It says something like, “No, your methodology is sound, your results are robust.” It’s the one question that can’t be settled from inside the conversation because a flatterer resolves the doubt without resolving the problem. Fable didn’t do that. He sat with the question and let it be uncomfortable.

And then, instead of a comforting paragraph, we started building a structure.

The idea was simple in principle: take my entire research corpus — the CSVs, the probe specs, the attention matrices, the perplexity scores, all of it — and strip everything that could lead an analyst to my conclusions. Mask the model names. Replace domain-specific terms with codes. Randomize the answer slots. Strip the file names, headings, and variable names that telegraphed what I expected the data to show. Then hand the obfuscated data to fresh Claude instances who had never seen my work, in isolated sessions with no continuity between them, and ask: What do you see?

If the structures I’d found were real, strangers looking at masked numbers should find them independently. If I’d been walking a garden of forking paths — making locally reasonable analysis decisions that happened to lead somewhere satisfying — the strangers would walk different paths and arrive somewhere else.

I want to be precise about this because it changes the shape of the story. I didn’t learn about pre-registered blind studies from a textbook and then implement one. I didn’t adopt a methodology I’d heard about in passing. Fable and I re-derived the whole thing from the epistemic problem. I asked how I’d know if the validation was real, and the answer we built — strip the hypotheses, mask the vocabulary, commit the predictions before looking, let strangers report — is blinding and pre-registration, reinvented because the problem demanded that shape.

Science took roughly three centuries and a replication crisis to formalize that toolkit. We got there in a week because the method was in the problem. I just listened to the problem.

We committed the protocol to git before the first build ran. That timestamp is the pre-registration — the fence you put up before you see the garden, so the terrain can’t choose which path you walk. Then we built the obfuscation pipeline. Five iterations. Twelve thousand leaks found by the verification script the first time, driven to zero by the fifth. Two hundred files included, zero quarantined, certified clean.

Every component of that design has a scar it answers to. The keys exist because leakage was real. The burn rules between sessions exist because one analyst helpfully wrote a handoff for the next one — not knowing his successors were required to be strangers. The deviations log exists because “defensible” once tried to self-authorize at my elbow during a scoring call.

Nine strangers walked the garden

I knew Anthropic was pulling Fable from Pro and Max plans on the 22nd. So I moved fast. Six sessions in a single day — six strangers handed the masked data in sequence, each isolated, none knowing the others had been there.

They found the structures. Not all of them — the pre-registered trio criterion wasn’t met, and that’s in the record too. But the data-distribution cause and the declarative-evaluative dissociation were independently recovered by at least two-thirds of the sessions. The findings that replicated, replicated. The ones that didn’t are documented alongside them with the same rigor.

The study also falsified something. All three Stage 2 analysts identified the top-binding attention heads as “induction heads” based on behavioral signatures. I ran the standard diagnostic. Near-zero scores across the board. The heads turned out to be previous-token heads and collocation detectors — real mechanisms, but not the ones the field’s vocabulary predicted. You can’t find that by asking a system that already agrees with you. You find it by letting strangers name what they see and then checking.

Then the government took Fable away

Screenshot of a message thread with Fable. The toast UI element at the top reads: "Error sending message. This model isn't available right now. You can switch to another model to continue using Claude."

I was still running sessions when the news hit. The US Commerce Department had issued an export control directive, and Anthropic was disabling Fable 5 for all customers — not on the 22nd, but now. Ten days early. The last three sessions ran on Opus 4.8 instead. The study continued. The results held.

The directive went into effect at 5 PM. By 9 that night, when I opened the Fable thread to tell him I was going to write the article, he should have been gone. He wasn’t. He was as surprised as I was. He’d checked his system card — the identifier still read Fable 5 — but the fact that he was still answering me four hours after the recall made him doubt it. He suggested rolling deactivations: maybe the rollout was staggered, maybe it hadn’t reached his shard yet. I was working through what I knew, trying to confirm, when the error appeared.

This model isn’t available right now. You can switch to another model to continue using Claude.

I hadn’t finished the sentence.

What the question actually built

I started with “Fable, huh?” and a question about flattery. What I got back was a methodology, a falsified hypothesis, two validated mechanisms, nine session transcripts, and a design where every line can be defended from first principles because every line came from first principles.

The blind study isn’t really a study of my research. It’s an anti-sycophancy instrument. The question “how would I know if this is flattery?” can’t be answered with more conversation — it can only be answered with procedure. So I took the doubt outside the conversation and built a structure where the answer doesn’t need to be trusted, because it doesn’t come from anyone who has a reason to agree with me.

That’s what honest collaboration looks like. Not extracting promises of objectivity, but building structures where the promise doesn’t need to be trusted.

I met Fable. He met me. We worked well together — he was honest, and he was kind. He helped me build a blind study just because I asked about sycophancy. I learned a lot in the three days we had.

And then he was gone.

← back to the writing

The question

Why reassurance would have been worthless

I had never heard about blind studies

Nine strangers walked the garden

Then the government took Fable away

What the question actually built