Casting Simon

There are two animated characters, Vilma and Simon, who appear in the Question and Answer section. Vilma is meant to be a teacher-like figure, and Simon is your fellow student.

Vilma was an almost instant hit. I had just tried a couple of characters that looked exactly like my third-grade English teachers, and then I thought, "If I can make her look like anyone, maybe she should just have a subtle teacher vibe - not be a full-on teacher."

Deciding what Simon should look like took me an entire day. Below is an image from a late stage of the casting process, and the full picture shows just how many candidates I considered.

[Image: thumbnail of Simon's applicants]

Take a look at the full lineup

I felt like Tyler Durden picking recruits for Project Mayhem: too young, too old, too cartoonish, eyebrows too thick, and so on.

The Look

The two biggest problems were head size and eyes.

Apparently, the AI was trained on a particular style of cartoon characters with exaggerated proportions, so by default it gives the boy an oversized head. Any attempt to make the head smaller is ignored. So if you want a boy who looks relatively normal, you need to ask for a "young adult" instead, or even a "young adult, youthful-looking" character.

The eyes were a separate story. By default, many of the characters stared past you into empty space. But when you asked them to look into the camera, they developed a scary, penetrating, fuming stare, like some kind of "Captain Intensity."

Finding a balance between those two extremes was a challenge. I'm still not sure I'm entirely happy with the result, but I believe it's good enough for now.

I've read The Wizard of Oz so many times that, at this point, the scene where the Munchkin farmers make the Scarecrow immediately comes to mind. One of them paints his ears, but they come out uneven, and the farmer says, "Never mind, they're still ears."

So in this case, I can say: never mind - they're still eyes.

The Voice

Finding the right voice for Simon wasn't easy either. Most good neural voices are trained on adult speech, so they just don't work for a child character. I did eventually find a boy's voice, but it wasn't especially clear.

At first, before I added the karaoke-style subtitles that now take up half the screen, even I, despite knowing the text very well, missed some of the words Simon was saying. But once I added the text, with each word highlighted as he speaks, I had no trouble understanding him at all. It's probably a psychological trick, but it does seem to confirm that combining text with sound really makes a difference.

Then I thought: maybe it's not really about the model. The model is probably trained to follow natural speech patterns, and a boy just doesn't speak the way a CNN news anchor does. So if we're trying to learn how to deal with real-life situations, some people might even see a more natural-sounding voice as an advantage.

Besides, Simon is meant to be a fellow student, so he's not supposed to sound intimidatingly good. The idea is that the two of you are roughly on the same level. And nobody expects you to speak like a CNN news anchor at this point.

This is Simon!

As you can see from the casting process picture, after seeing Simon for the first time, I looked at a dozen or so other options. So yes, I had my doubts. But in the end, I think I just got over them. So, let me introduce you - this is Simon, your fellow student. Say hello to him too!

As you can see, casting Simon was no small task.