Moxie utilises ChatGPT


According to this article, ChatGPT by OpenAI is part of the mix of AI technologies used in Moxie:

Multimodal Moxie

“Moxie is way beyond sort of text-to-text LLM. It’s not just text in and text out, and this is an important distinction because about 90% of communication is non-verbal. So only-text contains about 10% of the content of interaction, and 90% is body language, intonation of voice, facial expressions, eye contact, nodding and emotional expression,” said Pirjanian.

Moxie is a multimodal robot whose mapping from multimodal input to multimodal output operates at a very high level. The tech infrastructure combines a number of models, including computer vision, voice recognition, sentiment analysis and others.

In addition to collaborating with AI model providers, the company has proprietary models too. “We have a partnership with OpenAI, so we use their large language models as part of this. We use off-the-shelf automatic speech recognition technology too, but everything else is custom developed by us in the past eight years.”

Hallucinations, a common problem with LLMs, have also been carefully mitigated in these robots, since children are their audience and the robot should not push conversations into inappropriate corners.

“One of the things we have done over the years is collect a lot of data and train a model that moderates the inputs and outputs of the large language models, and then gently nudge the conversation into what parents would consider appropriate, or sensitive to the situation, so that it doesn’t allow the child to go off the rails,” said Pirjanian.
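To make the idea in that quote concrete, here is a rough sketch of what "moderating the inputs and outputs" of an LLM could look like. This is only an illustration, not Embodied's implementation: the article describes a trained moderation model, whereas this uses a toy keyword check as a stand-in, and every name in it (BLOCKED_TOPICS, REDIRECT, is_appropriate, moderated_reply, echo_llm) is made up for the example.

```python
from typing import Callable

# Assumption: a toy keyword list standing in for a trained moderation model.
BLOCKED_TOPICS = {"violence", "weapon", "gambling"}

# Assumption: a fixed redirect line used to nudge the conversation elsewhere.
REDIRECT = "Let's talk about something else. What was the best part of your day?"


def is_appropriate(text: str) -> bool:
    """Return False if the text mentions any blocked topic (toy heuristic)."""
    words = {w.strip(".,!?").lower() for w in text.split()}
    return words.isdisjoint(BLOCKED_TOPICS)


def moderated_reply(child_input: str, llm: Callable[[str], str]) -> str:
    """Check the input, call the LLM, then check its output before replying."""
    if not is_appropriate(child_input):
        return REDIRECT  # nudge away without sending the input to the LLM
    reply = llm(child_input)
    if not is_appropriate(reply):
        return REDIRECT  # the LLM's answer failed the check, so nudge instead
    return reply


def echo_llm(prompt: str) -> str:
    """Hypothetical placeholder for a real LLM call; it just echoes the prompt."""
    return f"You said: {prompt}"


print(moderated_reply("I like dinosaurs", echo_llm))
print(moderated_reply("Tell me about a weapon", echo_llm))
```

The point of the wrapper is that the check runs on both sides of the LLM call, so neither the child's input nor the model's answer reaches the conversation unfiltered. A real system would presumably replace the keyword check with a learned classifier and a more context-aware redirect.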

The company releases a new version of the software that powers Moxie every month, and the quality assurance team constantly tests and monitors everything that Moxie does.


It seems ChatGPT is used alongside Embodied’s proprietary SocialX software:

Moxie already has a lot of characteristics aimed at making you forget that you are talking to a collection of circuits, metal, and plastic that is driven by Embodied’s SocialX Conversational AI integrated with large language models (LLMs) and noise-resilient automatic speech recognition (ASR).