Why are emotions important?
An academic look at AI and emotion detection
Since I am doing my thesis at Gyver, where I am working on emotion detection from audiovisual data, I thought it would be interesting to share some of the insights from my thesis. The first and most important question I want to discuss is: why emotion detection? The truth is that if you know exactly how someone is feeling, you gain a lot of information about them. As an example, think about Sophia, the first robot ever to receive a nationality. Now imagine that Sophia, apart from being able to communicate with people, could also understand how a person is feeling, and maybe even give tips or help someone in a stressful situation. That is an example of how important it is to be able to detect emotions.
In this blog post, I want to give an academic overview of what we understand as emotions and why emotions are essential for humans. I will also provide some examples of use cases that may be closer than we think.
During the last few years, there has been a lot of talk about the need for machines to understand humans. We, as humans, still have much to learn about ourselves, but one thing is clear: emotions are one of the key components that make us human. Emotions are an essential factor in human communication; communication is only truly effective when both its meaning and its emotion are understood. Emotions build trust; they allow us to feel connected and show when someone cares about something. For humans to feel connected to machines, we need to see that machines can understand the world and that they can care about it. One big step in that direction is the understanding of emotions. The truth is that computers have been able to recognise emotions in a basic way since the start of AI. Lately, machine learning has taken it to a whole different level, with some algorithms learning to understand affection and empathy and to make decisions based on them.
Emotion recognition technology opens up a wide range of uses. It can be applied in fields such as video games, decoration, robotics, customer service, or health care. In healthcare, for example, researchers have obtained promising results by investigating particular features such as blood pressure or heartbeat (Choi, 2017). Thanks to voice analysis, it is possible to know the mood of a client in real time and improve the interaction, since acoustic features are closely related to human behaviour and psychological states. Emotions influence every aspect of our lives, from how we connect and how we make big or small decisions, to our health and well-being. Persuading and motivating, and much more, is achievable through technologies that use emotional intelligence.
Among the many features used to detect emotions, facial expressions are by far the most popular. Paul Ekman (Ekman, 1971) identified seven different emotions (Happiness, Sadness, Fear, Disgust, Anger, Contempt, and Surprise) that are universal across cultures. Later, Ekman developed the Facial Action Coding System (FACS), which became the standard scheme for facial expression research. Nowadays, with the latest research in machine learning and deep learning, it is possible to recognise facial expressions directly from images. More precisely, Noroozi (Noroozi, 2016) obtained good performance in emotion recognition by using passive sensor signals, such as audiovisual data. This opens the door to a whole new topic: emotion recognition in uncontrolled settings, i.e. "in the wild". Despite the effort and the latest advances in technology, this research path is still very challenging, and very few researchers have obtained good performance.
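To make the classification step concrete, here is a minimal sketch of how the raw scores of a facial-expression classifier could be mapped onto Ekman's seven categories. This is not Ekman's or Noroozi's actual method: the `predict_emotion` helper and the example logits are hypothetical, and a real system would produce the scores with a trained network rather than hard-code them.

```python
import numpy as np

# Ekman's seven universal emotions, as listed above.
EMOTIONS = ["happiness", "sadness", "fear", "disgust",
            "anger", "contempt", "surprise"]

def softmax(logits):
    """Numerically stable softmax over a vector of raw scores."""
    z = logits - np.max(logits)
    e = np.exp(z)
    return e / e.sum()

def predict_emotion(logits):
    """Map raw classifier scores to an Ekman label plus a confidence."""
    probs = softmax(np.asarray(logits, dtype=float))
    idx = int(np.argmax(probs))
    return EMOTIONS[idx], float(probs[idx])

# Example: scores a (hypothetical) CNN might output for one face image.
label, conf = predict_emotion([0.2, 0.1, -1.0, 0.0, 2.5, -0.5, 0.3])
# label == "anger"; conf is the softmax probability of that class
```

The interesting part in practice is, of course, producing good logits from pixels; the mapping to labels above is the trivial final step.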
Emotion recognition is not a trivial problem: human emotions lack temporal boundaries, and there are differences in how individuals express such feelings. For example, people from the south of Europe tend to be more expressive than people from the north.
In audio analysis, the results are not as accurate as in the visual counterpart. Much research still uses handcrafted features instead of deep neural networks (DNNs) that extract features automatically. The majority of these works use handcrafted features such as Mel-frequency Cepstral Coefficients (MFCC), Perceptual Linear Prediction (PLP) coefficients, and supra-segmental features. In particular, challenges such as the AVEC (Ringeval, 2015) use these features widely. In general, recognising emotion from speech is a more challenging task, and when done in combination with visual data, the complexity increases due to the individual differences in expressing emotions. In the last decade, there have been significant improvements in how to recognise speech or objects, as well as in combined problem-solving approaches such as audiovisual recognition.
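To illustrate what a handcrafted audio feature looks like, the following is a simplified, pure-NumPy sketch of the classic MFCC pipeline for a single frame: windowed power spectrum, triangular mel filterbank, log, and DCT. It is a didactic approximation, not the exact implementation used in AVEC or in standard toolkits, which add details such as pre-emphasis and liftering.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters spaced equally on the mel scale."""
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        for j in range(l, c):           # rising slope of the triangle
            fb[i - 1, j] = (j - l) / max(c - l, 1)
        for j in range(c, r):           # falling slope
            fb[i - 1, j] = (r - j) / max(r - c, 1)
    return fb

def mfcc(frame, sr, n_filters=26, n_coeffs=13):
    """Simplified MFCCs for one windowed audio frame."""
    n_fft = len(frame)
    spectrum = np.abs(np.fft.rfft(frame * np.hamming(n_fft))) ** 2
    energies = mel_filterbank(n_filters, n_fft, sr) @ spectrum
    log_e = np.log(energies + 1e-10)
    # DCT-II decorrelates the log filterbank energies
    n = np.arange(n_filters)
    return np.array([np.sum(log_e * np.cos(np.pi * k * (2 * n + 1)
                                           / (2 * n_filters)))
                     for k in range(n_coeffs)])

# Example: one 25 ms frame of a 440 Hz tone sampled at 16 kHz
sr = 16000
t = np.arange(400) / sr
feats = mfcc(np.sin(2 * np.pi * 440.0 * t), sr)
# feats.shape == (13,)
```

In a real pipeline these per-frame coefficients (plus deltas) are stacked over time and fed to the classifier, which is exactly the kind of manual feature engineering that end-to-end DNNs try to replace.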
The idea of fusing two of the primary emotion carriers is interesting. By allowing an AI to learn from both audio and video, you can expect it to understand better which emotion a person is trying to transmit. This modality fusion can contribute to improving the performance of technologies such as speech recognition, social robotics, and forensic analysis. Most current researchers put their effort into identifying emotions from the visual part, paying less attention to audio, which also carries essential information for the task. There is a lack of research on end-to-end models that accurately detect human emotions from an audiovisual perspective. That is why at Gyver, we are trying to solve this problem creatively and efficiently. The idea is to create a model that understands the inner relationship between audio and video inputs; by fusing the two, emotion detection gains confidence.
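One simple and widely used way to combine the two modalities is late fusion: each modality produces its own class probabilities, and the final decision is a weighted average. The sketch below only illustrates that mechanism; the weights and probability vectors are hypothetical, and it is not Gyver's model, which aims at learning the audio-video relationship end to end rather than averaging separate predictions.

```python
import numpy as np

EMOTIONS = ["happiness", "sadness", "fear", "disgust",
            "anger", "contempt", "surprise"]

def late_fusion(p_audio, p_video, w_audio=0.4, w_video=0.6):
    """Fuse per-modality class probabilities by weighted averaging."""
    p_audio = np.asarray(p_audio, dtype=float)
    p_video = np.asarray(p_video, dtype=float)
    fused = w_audio * p_audio + w_video * p_video
    fused /= fused.sum()  # renormalise to a probability distribution
    idx = int(np.argmax(fused))
    return EMOTIONS[idx], fused

# Audio alone hesitates between sadness and anger; video leans to anger.
p_a = [0.05, 0.35, 0.05, 0.05, 0.40, 0.05, 0.05]
p_v = [0.05, 0.10, 0.05, 0.05, 0.60, 0.05, 0.10]
label, fused = late_fusion(p_a, p_v)
# label == "anger": the video evidence resolves the audio ambiguity
```

The fixed weights here are the weakness of this scheme; learned, input-dependent fusion is precisely what an end-to-end audiovisual model tries to provide instead.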
Now that the importance of emotions is evident, can emotions be included in AI? And how could such an AI fit into our lives?
Making driving safer
Car manufacturers around the world are investing a lot of money in making cars more personal and safer. Now imagine if a car could automatically understand the emotion you are feeling while driving and take decisions based on it. Using facial emotion detection, the vehicle could alert drivers when they are feeling drowsy, or take control of the wheel if they are too agitated to drive.
Emotion detection in interviews
An interview is a challenging task in which an interviewer needs to assess whether a candidate is suitable or not. It can be a job interview or a police interview; the idea is similar: an interviewer must judge a candidate as objectively as possible. However, deciding what a person is really trying to say can be somewhat tricky. An AI could make this process easier by automatically measuring the candidate's facial expressions and understanding emotions from their voice. This would make it possible to build a better profile of the person and understand how they are feeling during the interview.
Market Research and video games
When studios design video games, they have a specific target audience in mind. Video games aim to evoke a particular behaviour and set of emotions from users. But how do companies measure whether video game testers give informed feedback and feel the passion the game tries to evoke? Emotion recognition could aid the process by automatically analysing and detecting the emotions of the player in real time, allowing feedback to be created without user input. In the same line of work, market research companies try to analyse the emotional response of their customers by performing interviews and asking them to formulate their preferences verbally. This can be very labour-intensive because all the input from the customers has to be analysed manually. Emotion detection would allow market research companies to measure the moment-by-moment emotional response to their products.
The take-home thought is that detecting emotions with technology is not a trivial task, and many problems still need to be solved. Yet it is also a field where machine learning algorithms have shown great promise. In the next post of this series, we will show how multimodal emotion detection can be implemented in a "simple" way to cope with some of the issues emotion detection has.