On AI character design

Designed, Not Prompted.

Language is a lossy compression of visual intent. Every AI creator has typed a perfectly clear prompt and gotten back something almost-but-not-quite right. Here's why prompts alone will never close that gap, and what to do about it.


You can picture the character clearly. Not vaguely — clearly. You know the shape of her jaw, the tilt of her eyes, how the cheekbones sit just a little higher than feels natural. You sit down, type a careful prompt, and hit generate. What comes back is competent. It is even close. But it isn't her.

So you rewrite. "Slightly narrower jaw. Almond eyes with a slight upturn." You generate again. Now the jaw is narrower but the nose has changed and the lighting is wrong. You add two more clauses. The hair color drifts. You add another. The pose shifts. By the twentieth attempt you have spent forty minutes and a small pile of credits, and you have either accepted something close enough or given up.

Every AI content creator has lived this loop. It is not a skill problem and it is not a model problem. It is a tool problem — specifically, a problem with what kind of instructions you are allowed to give the model in the first place.

Language is a lossy compression of visual intent

A prompt is a sentence about a picture. The picture in your head has thousands of independent variables — proportions, angles, color values, micro-expressions, occlusion, lighting direction. A sentence has, generously, dozens of meaningful tokens. Even with excellent prose, you are asking a model to expand a paragraph into a million pixels in exactly the way you imagined them.

The model is doing its best. It is filling in everything you did not specify — and there is an enormous amount you did not specify, because you cannot fit it into a sentence. "A young woman with short curly hair and freckles" leaves the position of the eyes, the shape of the lip line, the angle of the brows, the spacing between the irises, and the proportion of the forehead entirely up to the model. It will guess. It will guess differently every time.

This is not a bug you can prompt-engineer your way out of. It is the math of compression. You can carry more information per word by writing more carefully, but you can never carry the same density of intent that an image already contains — and the kind of intent that character design specifically requires (proportions, geometry, exact color) is the hardest kind to encode in language at all.
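
A rough back-of-envelope version of that math, as a sketch under loudly stated assumptions: the token count, vocabulary size, variable count, and per-variable precision below are illustrative guesses, not measurements. The prompt figure is a generous upper bound, since real prose is far more redundant than freely chosen tokens.

```python
import math

# All four numbers are illustrative assumptions, not measurements.
PROMPT_TOKENS = 60          # a long, careful prompt
VOCAB_SIZE = 50_000         # assumed tokenizer vocabulary size
VISUAL_VARIABLES = 2_000    # stand-in for "thousands of independent variables"
BITS_PER_VARIABLE = 8       # assumed precision per variable (256 levels)


def prompt_capacity_bits(num_tokens: int, vocab_size: int) -> float:
    """Upper bound: every token chosen freely from the whole vocabulary."""
    return num_tokens * math.log2(vocab_size)


def intent_bits(num_variables: int, bits_per_variable: int) -> int:
    """Bits needed to pin down every visual variable at the given precision."""
    return num_variables * bits_per_variable


prompt_bits = prompt_capacity_bits(PROMPT_TOKENS, VOCAB_SIZE)
picture_bits = intent_bits(VISUAL_VARIABLES, BITS_PER_VARIABLE)

print(f"prompt upper bound: {prompt_bits:,.0f} bits")
print(f"visual intent:      {picture_bits:,} bits")
print(f"shortfall factor:   {picture_bits / prompt_bits:.1f}x")
```

Even with the prompt given every benefit of the doubt, the picture needs more than ten times the information the sentence can hold. Changing the assumed numbers moves the factor, not the direction.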

The interfaces we already accept for this problem

We solved this problem decades ago, in every adjacent creative discipline.

A photographer does not type "warmer" at their editing software. They drag a color-temperature slider and watch the image shift in real time. An audio engineer does not describe a mix in prose — they grab the EQ band and pull it down two decibels. A 3D artist sculpts with a brush, not a sentence. Every mature creative tool eventually converges on the same insight: for some kinds of decisions, direct manipulation beats specification.

The reason is feedback latency. When you move a slider, you can see within about 16 milliseconds (one frame at 60 Hz) whether you have overshot or undershot, and adjust. When you write a prompt, you write a sentence, wait ten seconds for a render, evaluate, rewrite, wait ten more seconds. The OODA loop is hundreds to thousands of times slower, and the steering wheel is made of words instead of motion. Of course you overshoot and undershoot constantly. Of course the result drifts.
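
The size of that gap can be worked out from the figures already quoted in this piece. The sketch below compares one slider update against the render wait alone, and then against the full write-wait-evaluate cycle implied by the earlier anecdote (twenty attempts in forty minutes). The variable names are just labels for this illustration.

```python
# Figures taken from the text above; nothing here is measured.
FRAME_SECONDS = 0.016        # one slider update
RENDER_SECONDS = 10.0        # waiting for one render
ATTEMPTS = 20                # attempts in the anecdote
SESSION_SECONDS = 40 * 60    # forty minutes, in seconds

# Ratio if only the render wait is counted.
render_only_ratio = RENDER_SECONDS / FRAME_SECONDS

# Ratio if rewriting and evaluating are counted too.
full_loop_seconds = SESSION_SECONDS / ATTEMPTS
full_loop_ratio = full_loop_seconds / FRAME_SECONDS

print(f"render wait only: about {render_only_ratio:.0f}x slower")
print(f"full prompt loop: {full_loop_seconds:.0f} s per attempt, "
      f"about {full_loop_ratio:.0f}x slower")
```

Counting only the render wait gives a ratio of about 625. Counting the whole loop gives about 7,500. Either way, the difference is two to four orders of magnitude per correction.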

What this means for AI character design specifically

Characters are mostly proportions. The reason your AI character almost looks right and then doesn't is almost always geometric: the cheekbones are a little too low, the jaw is a little too wide, the eyes are spaced a fraction too far apart. These are the exact things prose is worst at conveying and sliders are best at. "Slightly more upturned outer corner" is approximate; a slider at +0.35 is not.

Color is the same. Hair color, skin tone, eye color — describing a precise hue in words ("a warm honey blonde, not too brassy, with a slight ash undertone") is laughable next to a hex code or a swatch. You can prompt yourself blue trying to specify a color. You can pick the color out of a palette in half a second.

Characters are mostly proportions. Proportions and color are the two things sliders do better than prose. It is not a coincidence that prompt-only tools struggle most where these two dominate.

Where prompts genuinely win

None of this is an argument against prompts. Prompts are good at exactly the things sliders are bad at: novelty, semantics, combinations, intent. "A weary cartographer in a flooded library" is not a slider preset. "Almond eyes with a slight upturn at the outer corners that suggests amusement rather than aggression" communicates an emotional valence that a numeric slider never could. Prompts are the right interface for telling the model what kind of thing you want. They are the wrong interface for telling it exactly how you want it.

The convergence of every other creative tool toward sliders-plus-text is no coincidence. Photoshop has both. DaVinci Resolve has both. Procreate has both. Every audio DAW has both. The mature answer is not sliders or prompts. It is sliders for shape and color, prompts for intent and surprise.

A different way to think about the workflow

Imagine starting with a face you can see, and then sculpting it. You drag jaw width down. You raise the cheekbones. You spread the eyes a fraction. You pick a hair color from a curated palette instead of describing it. The face changes in real time as a hint — geometrically warped, not yet photorealistic. When you have the proportions you want, you add the kind of sentence prompts were made for: "almond shape with slight upturn at the outer corners", scoped to just the eyes, because that is the kind of detail the geometric slider can't quite hit.

Then you commit, and the model renders the result naturalistically — taking the proportions as a fixed target and the per-region descriptions as the qualitative finish. One pass. One credit. The face you actually wanted.
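
One way to picture the data behind such a workflow, as a minimal sketch: every name here (`FaceSpec`, the slider and region keys, the request layout) is hypothetical and not taken from any real product's API. The slider range of -1 to 1 is likewise an assumption made for illustration.

```python
import re
from dataclasses import dataclass, field
from typing import Dict

HEX_COLOR = re.compile(r"#[0-9a-fA-F]{6}")


@dataclass
class FaceSpec:
    """Hypothetical container: exact values for shape and color, text for intent."""
    sliders: Dict[str, float] = field(default_factory=dict)  # offsets, assumed range [-1, 1]
    colors: Dict[str, str] = field(default_factory=dict)     # part -> "#rrggbb"
    hints: Dict[str, str] = field(default_factory=dict)      # region -> scoped text

    def set_slider(self, name: str, value: float) -> None:
        if not -1.0 <= value <= 1.0:
            raise ValueError(f"{name}: slider value {value} outside [-1, 1]")
        self.sliders[name] = value

    def set_color(self, part: str, hex_code: str) -> None:
        if not HEX_COLOR.fullmatch(hex_code):
            raise ValueError(f"{part}: {hex_code!r} is not a #RRGGBB color")
        self.colors[part] = hex_code.lower()

    def add_hint(self, region: str, text: str) -> None:
        self.hints[region] = text.strip()

    def to_request(self) -> dict:
        """Bundle everything into one payload for a single render pass."""
        return {
            "geometry": dict(self.sliders),
            "colors": dict(self.colors),
            "region_prompts": dict(self.hints),
        }


spec = FaceSpec()
spec.set_slider("jaw_width", -0.20)         # drag jaw width down
spec.set_slider("cheekbone_height", 0.35)   # raise the cheekbones
spec.set_slider("eye_spacing", 0.10)        # spread the eyes a fraction
spec.set_color("hair", "#C9A66B")           # picked from a palette, not described
spec.add_hint("eyes", "almond shape with slight upturn at the outer corners")

request = spec.to_request()
print(request)
```

The design choice worth noticing is that geometry and color travel as numbers, and only the qualitative finish travels as text, scoped to the region it describes.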

The point is not that this is the only correct workflow. The point is that it is dramatically closer to how character design has always worked — and dramatically further from the twenty-prompt drift cycle that has become the default. Direct manipulation of the things that should be directly manipulated. Words for the things that should be words.

A note on what this is not

None of this solves the larger problems in AI character creation — identity preservation across shots, lipsync, riggable export, multi-character continuity. Those are real, harder, and mostly unsolved. What it does solve is the friction at the very beginning of the pipeline: the gap between the character you can picture and the character that arrives on your screen. If you can't close that gap, nothing downstream matters, because you never had the character you wanted in the first place.

Most of the AI character world is racing to make prompt-only tools almost consistent. It might be more useful to question whether prompt-only was ever the right interface for the problem.


FaceHub's face editor is built on this idea — sliders for shape and color, scoped text hints for intent, one AI pass that respects both. If the loop above is familiar, it's worth a try.