Environmental footprint of audio: at Meta, an AI-assisted codec sets new compression records

While the digital sector is not the biggest polluter or emitter of greenhouse gases, its impact on natural ecosystems and global warming keeps growing, which makes it an area of concern, especially since it pervades our daily lives. Any new technology capable of reducing those effects therefore attracts the interest of environmentalists. That is the case of an audio compression method tested at FAIR (Fundamental AI Research), a laboratory belonging to none other than Meta.

Surprisingly, it was with the metaverse on the horizon that a small research team began working on AI-assisted compression, stressing the importance of making the new El Dorado of Facebook’s parent company accessible over all connections, even limited ones. To begin its work, the team focused on audio compression, a field revolutionized in the 1990s by the Fraunhofer Institute’s MP3, which enabled the rise of Napster and its like, opened up the market for portable digital music players, and made today’s streaming platforms possible. Except that with EnCodec, the name given to their audio codec, it is no longer about compression but hypercompression, as the researchers themselves put it.

AI-assisted decoding

In this field, encoding and decoding are the two main steps: they handle the compression and decompression of a file. Here a third stage comes in, the “quantizer”, an algorithm that breaks down an audio signal so that it can be reconstructed within a given final file size while preserving the most important information. This added complexity makes decompression very difficult, which is why the system relies on AI.
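To make the quantizer's role concrete, here is a minimal sketch in Python. The article does not describe the actual algorithm, so this is an assumption for illustration only: a toy residual quantizer in which each stage spends 2 bits coding what the previous stage left over. The codebook values and function names are hypothetical and are not taken from EnCodec.

```python
# Toy residual quantizer: an illustration, not EnCodec's actual quantizer.
# Each stage codes the error (residual) left by the previous stage, so the
# number of stages kept sets the bit budget and the precision.

# Hypothetical codebooks: 4 entries each (2 bits), each stage 4x finer.
CODEBOOKS = [
    [-0.75, -0.25, 0.25, 0.75],
    [-0.1875, -0.0625, 0.0625, 0.1875],
    [-0.046875, -0.015625, 0.015625, 0.046875],
]


def nearest(codebook, value):
    """Return the index of the codebook entry closest to value."""
    return min(range(len(codebook)), key=lambda i: abs(codebook[i] - value))


def encode(samples, codebooks):
    """Turn each sample into one small index per stage."""
    codes = []
    for x in samples:
        residual = x
        indices = []
        for cb in codebooks:
            i = nearest(cb, residual)
            indices.append(i)
            residual -= cb[i]  # what is still unexplained after this stage
        codes.append(indices)
    return codes


def decode(codes, codebooks):
    """Rebuild each sample by summing the chosen entry of every stage."""
    return [sum(cb[i] for cb, i in zip(codebooks, idx)) for idx in codes]


def max_error(samples, n_stages):
    """Worst reconstruction error when only the first n_stages are kept."""
    books = CODEBOOKS[:n_stages]
    rebuilt = decode(encode(samples, books), books)
    return max(abs(a - b) for a, b in zip(samples, rebuilt))


samples = [-0.9, -0.33, 0.0, 0.12, 0.58, 0.97]
for n in (1, 2, 3):
    print(f"{2 * n} bits per sample: max error {max_error(samples, n):.4f}")
```

Dropping stages shrinks the output at the cost of a larger error, which is the trade-off between target file size and preserved information described above.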

The idea here is not to keep a signal as detailed as the original; at this level of compression, that is unthinkable. Instead, the role of this intelligent decoding is to prevent changes that would be “perceptible by the human ear”. To do so it uses discriminators, neural networks trained to compare excerpts of the original file with the compressed results. The system is thus forced to produce reconstructed excerpts that are perceived as similar to the original.

This technique lets Meta’s researchers claim a tenfold reduction in the size of an audio file compared with a 64 kbps MP3, and with “no loss of quality”. Enough to fit an album on a plain, antediluvian floppy disk. It is a claim we do not take at face value, of course, especially since this work seems geared more toward encoding voice messages, which are easier to handle than a more complex piece of music.
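The floppy-disk claim can be checked with a little arithmetic. The sketch below assumes the “diskette” is a 3.5-inch high-density floppy (1,440 KiB) and that a tenfold reduction from 64 kbps means a bitrate of 6.4 kbps; neither figure is stated in the article, and the function name is only illustrative.

```python
# How much audio fits on a floppy at a given bitrate?
# Assumption: 3.5" high-density floppy, 1440 KiB = 1,474,560 bytes.
FLOPPY_BYTES = 1440 * 1024


def seconds_that_fit(bitrate_kbps, capacity_bytes=FLOPPY_BYTES):
    """Playback time (in seconds) that fits in capacity_bytes."""
    bytes_per_second = bitrate_kbps * 1000 / 8  # kbps -> bytes per second
    return capacity_bytes / bytes_per_second


mp3_minutes = seconds_that_fit(64) / 60        # 64 kbps MP3
tenfold_minutes = seconds_that_fit(6.4) / 60   # ten times smaller

print(f"64 kbps:  {mp3_minutes:.1f} min per floppy")
print(f"6.4 kbps: {tenfold_minutes:.1f} min per floppy")
```

Under these assumptions a floppy holds about three minutes of 64 kbps MP3 and about half an hour at a tenth of that bitrate, so the claim holds for a short album.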

Meta’s lab nonetheless believes strongly in this work, which it considers the first of its kind that can be applied to what it calls CD quality (16-bit stereo at 48 kHz; the audio CD itself is sampled at 44.1 kHz), the standard for music distribution. Eventually, the company hopes its compression method can be used for video conferencing as well as for streaming movies or playing virtual reality games online with friends.

Examples to listen to

There remains, of course, the matter of the hardware resources required to decode these files. Meta says little on this point but seeks to reassure, showing that decompression runs in real time on a single CPU core. The researchers also indicate that future progress in processors dedicated to these tasks could speed up the compression and decompression stages while consuming less power.

In the blog post presenting these advances, an example of compression at 6 kbps can be heard, making it possible to compare an original excerpt with the one obtained with EnCodec. Without a doubt, the result is far more listenable than what EVS (2014) or Opus (2020) deliver at the same compression target. Going from there to saying that no difference from the original file is audible is perhaps a step too far.

To get an idea, other examples are provided on Twitter by one of the Meta researchers, Alexandre Défossez, with bitrates ranging from 1.9 to 10.4 kbps. He also points out that EnCodec’s current model, at just 12 kbps, yields results that human listeners rate somewhere between MP3 at 64 kbps and MP3 at 128 kbps. That is a reduction in file size by a factor of roughly five to ten, which is impressive.
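The ratios behind that estimate follow directly from the bitrates quoted above. In this short sketch the three-minute track length is a hypothetical value chosen for illustration, and the function names are not from any library.

```python
# Size ratio between a reference codec and EnCodec at the quoted bitrates.

def size_ratio(reference_kbps, codec_kbps):
    """How many times smaller the file is at codec_kbps."""
    return reference_kbps / codec_kbps


def file_size_kb(bitrate_kbps, duration_s):
    """File size in kilobytes (1 kB = 1000 bytes) for a constant bitrate."""
    return bitrate_kbps * duration_s / 8


TRACK_SECONDS = 180  # hypothetical three-minute track

for name, kbps in (("EnCodec", 12), ("MP3", 64), ("MP3", 128)):
    print(f"{name} at {kbps} kbps: {file_size_kb(kbps, TRACK_SECONDS):.0f} kB")

print(f"vs MP3 64 kbps:  {size_ratio(64, 12):.2f}x smaller")
print(f"vs MP3 128 kbps: {size_ratio(128, 12):.2f}x smaller")
```

Because perceived quality is said to fall between the two MP3 settings, the effective gain lies somewhere between these two ratios.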

Streaming music, not so good for the planet

It did not take long for some to see in this compression method a major breakthrough for the music industry, imagining that it could significantly reduce the environmental footprint of streaming while improving the efficiency of song distribution. And this even though music streaming is itself presented as an ecological boon, in the sense that its carbon footprint is lower than that of any physical medium.

That point needs to be heavily qualified, however, because the technology has also come with a massive increase in music consumption. According to the organization Zero carbon, in 2020, with the Covid-19 pandemic, the use of streaming services increased by 70%, representing 570 million tons of CO2 equivalent released into the atmosphere. As for Sharon George of Keele University, she calculated that 5 hours of audio streaming equal the 288 g of CO2e of a CD with its case, and that 17 hours equal the 979 g of a vinyl record. Yet some studies estimate that subscribers to a premium offer stream around 5 hours of music on average.
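The two comparisons attributed to Sharon George imply the same hourly footprint for streaming, which a quick calculation confirms. The sketch uses only the figures quoted above; the function names are illustrative.

```python
# Hourly streaming footprint implied by the CD and vinyl comparisons.

def grams_per_hour(total_g_co2e, hours):
    """Average emissions per hour of streaming."""
    return total_g_co2e / hours


def break_even_hours(medium_g_co2e, rate_g_per_hour):
    """Hours of streaming that emit as much as one physical copy."""
    return medium_g_co2e / rate_g_per_hour


cd_rate = grams_per_hour(288, 5)      # from the CD comparison
vinyl_rate = grams_per_hour(979, 17)  # from the vinyl comparison

print(f"Rate implied by the CD figure:    {cd_rate:.1f} g CO2e/h")
print(f"Rate implied by the vinyl figure: {vinyl_rate:.1f} g CO2e/h")
print(f"Vinyl break-even at the CD rate:  {break_even_hours(979, cd_rate):.1f} h")
```

Both figures come out at a little under 58 g of CO2e per hour of streaming, so the two equivalences are consistent with each other.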

Better compression will nonetheless also have the advantage of bringing a whole host of technologies within reach of slow connections, while reducing the inconvenience of using them on an unstable network. That said, given the quality of an MP3 at 320 kbps, let alone at lower bitrates, and the number of critics the format still has among audiophiles who swear by lossless, it goes without saying that specialized forums are far from seeing the end of the format war.

Details of the publication on EnCodec, along with a link to its code, are also available.
