The Bart Test - Part 5: Redesigning From Scratch
Building Bart Gottschalk

After my teens ghosted the frontier model evaluation, I sat with a choice: give up on this whole thing, or try again.

The doubt was real. Maybe the Bart Test would never work. Maybe asking teenagers to evaluate AI-generated slang was fundamentally flawed. But I couldn't shake the insights from [Part 3](/blog/bart-test-part-3-the-zoo-not-duck-problem)—the "zoo not duck" problem, the slang half-life, the "trying too hard" pattern. Those felt real.

So I decided to try again. Not because I was confident it would work, but because I wasn't ready to give up.

The Bart Test - Part 4: When My Teen Judges Ghosted Me
Questioning Bart Gottschalk

I tested GPT-5.2, Claude Opus 4.5, and Gemini 3 Pro with the baseline prompt. The [outputs](https://github.com/bart-mosaicmeshai/bart-test/tree/main/results/03_experiment_runs) were ready. I sent the first story ([GPT's 1,540-word epic](https://github.com/bart-mosaicmeshai/bart-test/blob/main/results/03_experiment_runs/03a_gpt5.2_baseline_20251218_202909.json)) to my kids via text.

No response.

I waited a few days. Still nothing.

A week passed. They weren't being difficult. They just... didn't respond.

The Bart Test - Part 3: The Zoo-Not-Duck Problem
Learning Bart Gottschalk

When I asked what made the AI output feel unnatural, Teen #1 said:

> "Just didn't seem like very effective communication. It's like if you are trying to paint a picture of a duck and you paint a picture of a zoo with a tiny duck exhibit in the corner. Too much noise."

This metaphor captured the core problem.

The Bart Test - Part 1: When AI Does Its Homework Too Well
Learning Bart Gottschalk

I asked my teenagers to judge an AI's attempt at Gen-Alpha slang.

Teen #1: "It's definitely AI... a little too much." Score: 4/10.

Teen #2: "It sounds like my ELA project where we had to use as much slang as possible." Score: 6/10 (if a teen wrote it), 2/10 (if an adult did).

The AI did its homework. That's the problem.
