The group behind Stable Diffusion's training data wants to open source emotion-detecting AI

Updated on May 15, 2024

In 2019, Amazon updated its Alexa assistant with a feature that lets it detect when a customer is likely to be upset and respond with more empathy. For example, if a customer asked Alexa to play a song and it queued up the wrong one, and the customer then said "No, Alexa" in an upset tone, Alexa could apologize and ask for clarification.

Now the group behind one of the datasets used to train Stable Diffusion's text-to-image model wants to bring similar emotion recognition capabilities to every developer - and for free.

This week, LAION, a non-profit organization that creates image and text datasets for training generative AI, including Stable Diffusion, announced Open Empathic, a project that aims to "equip open source AI systems with empathy and emotional intelligence."

"The LAION team, with a background in healthcare, education, and machine learning research, saw a gap in the open source community: emotional AI had been largely overlooked," Christoph Schumann, co-founder of LAION, told TechCrunch via email. "Much like our concerns about opaque AI monopolies that led to the birth of LAION, we felt a similar urgency here as well."

As part of Open Empathic, LAION is recruiting volunteers to submit audio clips to a database that can be used to train AI, including chatbots and text-to-speech models, that "understands" human emotion.

"At Open Empathic, our goal is to create an AI that doesn't just understand words," Schumann added. "We aim for it to pick up nuances in expressions and shifts in tone, making human-AI interactions more authentic and empathic."

LAION, an acronym for "Large-scale Artificial Intelligence Open Network," was founded in early 2021 by Schumann, a former German high school teacher, and several members of a Discord server for AI enthusiasts. Funded by donations and public research grants, including contributions from AI startup Hugging Face and Stability AI, the vendor behind Stable Diffusion, LAION's stated mission is to democratize resources for AI research and development, starting with training data.

"We are driven by a clear mission: to harness the power of artificial intelligence in a way that brings real benefits to society," Kari Norii, open source author of LAION and a PhD student at Bournemouth University, told TechCrunch. "We are passionate about transparency and believe that the best way to shape artificial intelligence is through open discussion."

Hence Open Empathic.

In the initial phase of the project, LAION has set up a website that invites volunteers to annotate YouTube clips - some pre-selected by the LAION team, others submitted by volunteers - each featuring an individual person. For each clip, volunteers fill in a detailed list of fields, including a transcription of the clip, descriptions of the audio and video, and the age, gender, accent (e.g., "British English"), arousal level ("alertness" - not sexual, to be clear), and valence level ("pleasantness" versus "unpleasantness") of the person in the clip.

Other fields on the form deal with the sound quality of the clip and the presence (or absence) of loud background noises. But the main focus is on the person's emotions - or at least the emotions that the volunteers think the person is feeling.

From a variety of drop-down menus, volunteers can select one or more emotions, ranging from "chirpy," "animated" and "emoting" to "pondering" and "engaging." Norii says the idea was to collect "rich" and "emotional" annotations while capturing expressions across languages and cultures.
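To make the shape of these annotations concrete, here is a minimal sketch, in Python, of what a single submission record might look like. The field names, types, and value scales below are hypothetical, inferred from the form fields described above rather than taken from LAION's actual schema.

    from dataclasses import dataclass, field

    # Hypothetical record mirroring the Open Empathic form fields
    # described above; LAION's actual schema may differ.
    @dataclass
    class EmpathicAnnotation:
        clip_url: str               # YouTube clip being annotated
        transcription: str          # what the person in the clip says
        audio_description: str      # free-text description of the audio
        video_description: str      # free-text description of the video
        age: str                    # e.g. an age range such as "30-40"
        gender: str
        accent: str                 # e.g. "British English"
        arousal: float              # alertness, assumed 0.0-1.0 scale
        valence: float              # pleasantness vs. unpleasantness, assumed 0.0-1.0
        emotions: list[str] = field(default_factory=list)  # multi-select menus
        audio_quality: str = "clean"      # sound quality of the clip
        background_noise: bool = False    # loud background noises present?

    # Example of what a volunteer's submission might contain:
    sample = EmpathicAnnotation(
        clip_url="https://youtube.com/watch?v=example",
        transcription="No, Alexa, that's not the song I meant.",
        audio_description="Frustrated adult voice, slightly raised",
        video_description="Person frowning at a smart speaker",
        age="30-40",
        gender="female",
        accent="British English",
        arousal=0.7,
        valence=0.2,
        emotions=["emoting", "pondering"],
    )

Collected at scale, records along these lines could, in principle, feed training pipelines for the emotion-aware chatbots and text-to-speech models LAION describes.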

"We aim to train AI models that can understand a wide variety of languages and understand different cultural traditions," Norii says. "We are working on building models that 'understand' languages and cultures using videos of real emotions and expressions."

Once volunteers submit a clip to the LAION database, they can repeat the process - there is no limit to the number of clips a single volunteer can annotate. LAION hopes to collect about 10,000 samples over the next few months and, more optimistically, between 100,000 and 1 million by next year.

"We have passionate community members who, driven by the idea of democratizing AI models and datasets, willingly provide annotations in their spare time," says Norii. "Their motivation is a shared dream of creating empathic and emotionally intelligent open source AI that is accessible to all."

The pitfalls of recognizing emotions

In addition to Amazon's attempts with Alexa, startups and tech giants are developing AI that can recognize emotions - for purposes ranging from sales training to preventing accidents caused by drowsiness.

In 2016, Apple acquired Emotient, a San Diego-based company working on artificial intelligence algorithms that analyze facial expressions. In May 2021, Swedish company Smart Eye acquired Affectiva, an MIT spin-off whose technology, it said, could detect anger or frustration in speech within 1.2 seconds. And speech recognition platform Nuance, which Microsoft acquired in April 2021, has demonstrated a product for cars that analyzes a driver's emotions based on their facial expressions.

Other players in the emotion recognition market include Hume, HireVue, and Realeyes, whose technologies are used to gauge how certain segments of viewers react to advertisements. Some employers use emotion recognition technology to evaluate potential employees, scoring them on empathy and emotional intelligence. Schools use it to monitor student engagement in class - and remotely, at home. Governments use emotion-detecting AI to identify "dangerous people," and it is being tested at border controls in the US, Hungary, Latvia, and Greece.

The LAION team, in turn, envisions useful and seamless applications of the technology in robotics, psychology, vocational training, education, and even gaming. Schumann paints a picture of robots that offer support and companionship, virtual assistants that sense when a person is feeling lonely or anxious, and tools that help diagnose psychological disorders.

It's a techno-utopian vision. The problem is that most emotion recognition technology stands on shaky scientific ground.

There are few, if any, universal markers of emotion, which casts doubt on the accuracy of emotion-detecting AI. Most emotion recognition systems are built on the work of psychologist Paul Ekman, published in the 1970s. However, subsequent research, including Ekman's own, supports the common-sense notion that people from different backgrounds express their feelings in significantly different ways.

For example, the expression supposedly universal for fear is, in Malaysia, a stereotype for threat or anger. In later work, Ekman suggested that American and Japanese students react to violent movies very differently, with Japanese students adopting "a completely different set of expressions" if someone else - especially an authority figure - is in the room.

Voices, too, span a wide range of characteristics, including those of people with disabilities or conditions such as autism, and those who speak other languages and dialects, such as African American Vernacular English (AAVE). A native French speaker interviewing in English may pause or pronounce a word with some uncertainty, which a listener unfamiliar with their speech might misread as an emotional marker.

Indeed, much of the problem with emotion-detecting AI is bias - implicit and explicit bias introduced by the annotators whose labels are used to train the models.

For example, in a 2019 study, researchers found that labelers were more likely to annotate phrases in AAVE as toxic than their General American English equivalents. Sexual orientation and gender identity can also strongly influence which words and phrases an annotator perceives as toxic, as can outright prejudice. Several widely used open source image datasets have been found to contain racist, sexist, and other offensive labels from annotators.

The consequences can be quite significant.

Retorio, an artificial intelligence hiring platform, has been found to react differently to the same candidate wearing different outfits, such as glasses and a headscarf. In a 2020 study at the Massachusetts Institute of Technology, researchers showed that facial analysis algorithms can be biased toward certain facial expressions, such as smiling, reducing their accuracy. More recent work shows that popular emotional analysis tools tend to assign more negative emotions to the faces of black men than to the faces of white men.

Trusting the process

So how will the LAION team combat these biases, making sure that, for example, white people don't outnumber black people in the dataset; that non-binary people aren't assigned the wrong gender; that people with mood disorders aren't attributed emotions they didn't intend to express?

It's not entirely clear.

Schumann argues that the process for submitting training data to Open Empathic is not an "open door" and that LAION has systems in place to "ensure the integrity of the materials."

"We can validate user intent and consistently check the quality of annotations," he added.

However, previous LAION datasets have not been particularly clean.

Analyses of LAION-400M, an image training set that the group attempted to curate with automated tools, revealed photos depicting sexual assault, rape, hate symbols, and graphic violence. LAION-400M is also rife with bias, for example returning images of men but not women for words such as "CEO," and images of Middle Eastern men for the word "terrorist."

This time around, Schumann is trusting the community to serve as a check.

"We believe in the power of hobby scientists and enthusiasts from around the world coming together and contributing to our datasets," he said. "We are open and collaborative, but we prioritize the quality and authenticity of our data."

As for how any emotion detection AI trained on the Open Empathic dataset will be used - biased or not - LAION intends to adhere to an open source philosophy, even if that means the AI may be misused.

"Using artificial intelligence to understand emotions is a powerful endeavor, but it is not without its challenges," Robert Kaczmarczyk, co-founder of LAION and a physician at the Technical University of Munich, said in an email. "Like any other tool, it can be used for good or for bad. Imagine if only a small group of people had access to advanced technology, while a large part of society remained in the dark. Such an imbalance could lead to abuse or even manipulation by the few who have control over the technology."

When it comes to artificial intelligence, laissez-faire approaches sometimes come back to bite model builders - witness the use of Stable Diffusion to create child sexual abuse material and non-consensual deepfakes.

Some privacy and human rights advocates, including European Digital Rights and Access Now, are calling for a complete ban on emotion recognition. The EU's recent Artificial Intelligence Act, which establishes a framework for governing artificial intelligence in the European Union, bans the use of emotion recognition in policing, border control, workplaces, and schools. And some companies, such as Microsoft, have voluntarily abandoned emotion-recognizing AI in the face of public backlash.

However, LAION seems to be relaxed about the level of risk and believes in an open development process.

"We welcome researchers who can poke around, suggest changes, and spot problems," Kaczmarczyk says. "And just as Wikipedia thrives on community input, Open Empathic is fueled by community participation, ensuring transparency and security."

Transparent? Sure. Safe? Time will tell.
