Mobile telephony has changed the world in the space of just two decades. From what was once a clunky device with one purpose – to make and receive calls – we now hold in our hands technology more powerful than many personal computers.
The journey that this technology has undertaken reflects how we communicate today. Think of video. Two decades ago, video was a passive experience – something we watched, that was produced by others, rather than engaged with. Today, millions of users are creating and sharing their own videos every day. Now the calls we make not only use audio, but are made using high-definition video as well. But while mobile has continually innovated in video – smartphones can now record in 4K and recognize faces – the same cannot be said of audio.
Audio has significantly lagged behind consumers’ expectations. Around 66 percent of consumers say they can tell the difference between technologies on sound alone, and 70 percent state that sound quality is more important than cost when buying headphones or speakers. This demonstrates that smartphone users clearly expect audio quality, and how natural or close to normal communication the audio is, to improve.
Evolving the standards
While there has been some progress in speech coding technologies, such as the development from the GSM FR codec to EVS, it is surprising that mobile voice services and telephony have not evolved at the same pace. These codec innovations have gone some way to improving the audio experience, while also ensuring cost-efficient and miniaturized product implementations for OEMs and ODMs.
But we are still some distance from recreating the same experience as a natural face-to-face discussion. We need to move away from passive sound to more intuitive audio experiences. The way to do this lies in alternative design philosophies and strategies.
Traditionally, audio systems are optimized for the best possible interference and noise cancellation. The aim is to improve the signal-to-noise ratio, creating an experience similar to talking in an empty room. But the challenge is to differentiate between noise and the signals we need to enhance, such as a voice against the background sound of a crowded concert. This signal model is visible in functionality like Voice Activity Detectors (VAD), Discontinuous Transmission (DTX) and comfort noise.
But these techniques don’t account for the ambient sounds we may want to keep, and they can inadvertently create competition over which sound to focus on.
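To make the traditional signal model concrete, here is a minimal sketch of an energy-based voice activity detector. It is an illustration only, not the VAD used in EVS or any other real codec, which are far more elaborate. The function names, the 160-sample frame length and the -30 dB threshold are all assumptions chosen for the example.

```python
import math

def frame_energy_db(frame):
    """Mean power of one frame in decibels, relative to full scale (1.0)."""
    power = sum(s * s for s in frame) / len(frame)
    return 10.0 * math.log10(power + 1e-12)  # small offset avoids log(0)

def simple_vad(samples, frame_len=160, threshold_db=-30.0):
    """Label each full frame True (treated as speech) or False (treated as noise).

    Hypothetical helper: frame_len and threshold_db are illustrative values.
    """
    flags = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        flags.append(frame_energy_db(frame) > threshold_db)
    return flags

# Synthetic input: a quiet ambient tone followed by a louder "voice" tone.
rate = 8000
quiet = [0.01 * math.sin(2 * math.pi * 200 * n / rate) for n in range(800)]
loud = [0.5 * math.sin(2 * math.pi * 200 * n / rate) for n in range(800)]

flags = simple_vad(quiet + loud)
print(flags)
```

In a DTX-style system, the frames flagged False would not be transmitted and the receiver would play comfort noise in their place. The sketch shows the limitation described above: the quiet ambient sound is discarded purely because of its level, whether or not the listener wanted to hear it.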
The alternative is to rely on the listener’s own hearing capabilities, presenting audio in a more natural manner without manipulating content with aggressive signal processing that can introduce further distortions. This type of approach can enable immersive audio experiences.
The core challenge in immersive audio is defining what the desired signal is. This is because sound is highly context-dependent, from the environment to its purpose. Humans are very good at adapting to various audio situations, so immersive audio systems need to support the listener’s goal and ability to adapt to these different scenarios.
Truly immersive audio
Nokia introduced OZO Audio, our spatial audio capture solution. By using spatial audio technologies, we re-create how people hear in a natural environment. It allows listeners to hear content as if they were present, meaning they can clearly differentiate sounds, separating what needs to be heard from what is background noise.
Currently, this technology only works for one-way applications, such as playing videos on a smartphone. But there is great scope to combine mobile telephony with spatial audio in the near future, making immersive phone calls a reality. This is the next big step in telephony.
This is just one reason why the future of immersive communications is so exciting. Recent advances and the democratization of virtual and augmented reality devices, alongside continued development of mobile platforms, provide the motor for this, and consumer demand is the fuel. The industry is now being pushed to transition away from the traditional telephony technology used for decades and towards immersive experiences where the physical location is less critical for fluent communication.
If the past decade has been about improving the user experience from a visual perspective, then the next few years must be about ensuring the audio experience matches it. The industry needs to place greater emphasis on innovating in sound, lest it lose both consumers and content creators to competitors who do.