The importance of tone in artificial intelligence: A missed opportunity
Anders Hvelplund, SVP, Jabra
It’s no secret that artificial intelligence (AI) has made remarkable progress in recent years. And in the world of generative AI, perhaps nothing has stunned the world more than OpenAI’s ChatGPT, with its ability to provide coherent, all-too-human responses to even the most specific queries. Similarly, with the advent of tools such as DALL-E and Midjourney, we have seen AI become visually creative: you simply provide a text-based prompt, and these tools will generate striking art, no matter how specific or unusual the prompt.
Undoubtedly, this laser focus on text-based, written-word AI has led to tremendous development in the field. However, it also means that a critical aspect of human communication has been all but ignored: tone.
The communicative value of tone
Tone, which refers to the emotional or attitudinal quality of a message, is a vital element of human communication. It conveys the speaker’s intentions, feelings, and personality, and can significantly affect how a message is received by the listener. Consider the sentence: “I never said she stole the money.” Depending on which word is emphasised, its meaning changes significantly: stress “I” and someone else may have made the accusation; stress “stole” and she may merely have borrowed it.
This lack of attention helps explain the misunderstandings we see from existing AI solutions – for instance, from virtual assistants like Siri and Alexa. We should not be surprised. According to Mehrabian’s model, when communicating feelings and attitudes, tone accounts for 38% of the message, body language for 55%, and words for only 7%. Recent studies by Yale’s Michael Kraus have even challenged Mehrabian’s model, finding that body language may matter less than tone. AI models that ignore tone are thus missing the most critical piece in understanding the emotional context of human conversation.
The business case is a human case
The neglect of tone in mainstream AI is not for lack of opportunity. Contact centres, for example, spend billions of dollars to measure customer sentiment and the wellbeing of their agents, yet may not be getting enough value for that money. Traditionally, they have relied on surveys, but surveys suffer from low response rates and bias. And if they look to AI to solve the problem, current text-based AI solutions have, as described above, inherent limitations in accurately detecting sentiment and wellbeing. Tone of voice, by contrast, can provide a strong indication of customer satisfaction.
Moreover, contact centres illustrate that tone is not only a relevant measure of customer satisfaction but also a way to improve it: customers appreciate being met by agents with friendly, engaged tones. This need is not served by the text-based tools currently in use, which are constrained to teaching agents to follow scripts and use the right words. With research showing that 78% of customers say unscripted calls provide a better experience, contact centres could achieve better customer satisfaction by focusing on the tone of their agents instead of their words.
In addition to its crucial role in communication, tone can also be used to assess a person’s wellbeing and health. One survey of the academic literature in this field concluded that speech analytics is a promising tool for diagnosing early cognitive decline and Alzheimer’s disease, suggesting that tone analysis could become a valuable early-warning signal for the onset of these conditions.
Responsibly scaling tone AI
When it comes to implementation, tone-based AI has great benefits: it is easier to scale and has better data-privacy properties than its text-based counterparts. Unlike text-based approaches, it is not tied to specific languages and therefore scales much better across languages and accents. This means that users of the technology can broaden their potential audience. Similarly, they will not necessarily need to train models specifically for their contexts, generating vast implementation savings compared to text-based models.
Moreover, tone analysis has superior privacy properties compared to other forms of AI, which matters in a time when data privacy is, rightly, a major concern. While AI systems based on visuals or text operate on some of the most personal data we have (social security numbers, account numbers, and the like), tone analysis focuses on less sensitive signals such as pitch and speaking rate. In addition, tone models tend to be smaller than text-based models and can therefore run locally on devices such as smartphones, computers, wearables, or robots. This allows individuals and companies to keep their most critical data on their own premises rather than sharing it with third parties, ensuring that what happens on your device truly stays on your device.
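To make the distinction concrete, here is a minimal sketch of the kind of non-sensitive signal tone analysis works from: a pitch estimate derived purely from the shape of the waveform, with no words or identities involved. The autocorrelation method, sample rate, and synthetic tone below are illustrative assumptions for this sketch, not any vendor’s implementation.

```python
import math

SAMPLE_RATE = 16_000  # Hz; a common rate for speech processing

def sine_wave(freq, duration, rate=SAMPLE_RATE):
    """Generate a synthetic tone standing in for a short voiced speech frame."""
    n = int(duration * rate)
    return [math.sin(2 * math.pi * freq * i / rate) for i in range(n)]

def estimate_pitch(samples, rate=SAMPLE_RATE, fmin=80, fmax=400):
    """Estimate fundamental frequency via autocorrelation.

    Searches lags corresponding to the typical range of human pitch
    (fmin..fmax Hz) and returns the frequency whose lag shows the
    strongest self-similarity in the signal.
    """
    lo = int(rate / fmax)  # smallest lag to test (highest pitch)
    hi = int(rate / fmin)  # largest lag to test (lowest pitch)
    best_lag, best_score = lo, float("-inf")
    for lag in range(lo, hi + 1):
        score = sum(samples[i] * samples[i + lag]
                    for i in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return rate / best_lag

# A 50 ms frame at 220 Hz; the estimate recovers roughly that frequency.
frame = sine_wave(220.0, 0.05)
pitch = estimate_pitch(frame)
```

Note what the function never sees: no transcript, no account numbers, no names — only a buffer of amplitudes, from which it recovers a single number like 220 Hz. That is the sense in which prosodic features are inherently less privacy-sensitive than the content of speech.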
Breaking the tone barrier
One reason tone analysis has not had a greater impact is that, until recently, models struggled to perform consistently across different conditions, such as accents, background noise, and voice quality. However, the transformer architecture behind the successes of text-based AI can also be applied to tone, yielding models as impressive as ChatGPT and DALL-E. For example, audEERING, a European leader in voice AI, used this architecture to achieve the holy grail of voice AI: accurate prediction of valence, the emotional value (positive or negative) of an experience. Moreover, they demonstrated that these models now work across languages and accents and are therefore ready for the real world. Indeed, they are already being implemented in contact centres.
As we enter 2023, there is a big opportunity to focus more on tone-based AI – not only for the research community, but also in the world of business and practical affairs. And as text-based AI tools continue to develop and enter the mainstream, it will only become increasingly clear that a major dimension of communication is being sidelined. It’s time to finally give tone the attention it deserves and unlock tremendous value in the process.
Anders Hvelplund is SVP at Jabra
Jabra is a world leading brand in audio, video and collaboration solutions – engineered to empower consumers and businesses.
Proudly part of GN Group, we are committed to bringing people closer to one another and to what is important to them. Jabra engineering excellence leads the way, building on over 150 years of pioneering work within GN. This allows us to create integrated tools for contact centers, offices, and collaboration to help professionals work more productively from anywhere; and true wireless headphones and earbuds that let consumers better enjoy calls, music, and media.
Founded in 1869, GN Group employs more than 7,500 people and is listed on Nasdaq Copenhagen (GN.CO). GN’s solutions are sold in 100 countries across the world.