Opus gets another major update with the release of version 1.6. This release again brings many new features and improvements, while remaining fully compatible with RFC 6716. Here are some of the most noteworthy upgrades.

ML-Based Speech Bandwidth Extension (BWE)

Opus 1.6 introduces an experimental wideband-to-fullband speech enhancer that was presented at WASPAA 2025. It is an addition to the family of Opus speech coding enhancement algorithms which are covered by the related IETF draft. The new Bandwidth Extension (BWE) model is based on a neural network that was trained to generate high frequency speech content (8-20 kHz) from wideband speech (0-8 kHz) without any side information. It can therefore be used to enhance speech from any previous Opus version and there is no risk of breaking compatibility as the model gets improved in the future. Generating highband content from wideband speech is possible since all phonetic information is already contained in the lower frequency range. This is unlike the problem of narrowband (telephone) to wideband extension, which is hard to achieve and tends to be unreliable (which is why we are not attempting it here).

The model can be used to optionally decode wideband speech into fullband speech sampled at 48 kHz as shown below. It can furthermore be used in combination with the wideband enhancement methods introduced in Opus 1.5. It is, however, not intended to replace highband content encoded in hybrid mode and it will never activate for super-wideband or fullband audio.

Wideband speech decoded with (right) and without (left) BWE option.

A second application of the BWE model extends the FARGAN wideband speech vocoder used for deep packet loss concealment (PLC) and Deep Redundancy (DRED) decoding. Notably, this enables DRED to achieve fullband quality when combined with BWE, ensuring more consistent output quality for fullband transmissions.

Fullband speech (hybrid mode) with significant amount of DRED-decoded speech without (right) and with (left) BWE option.

Combining NoLACE and BWE significantly improves speech quality, making good fullband speech quality possible at bitrates as low as 9 kb/s. A comparison is provided below.

BWE MOS results
Subjective MOS evaluation of the new bandwidth extension. For uncoded speech, BWE is able to close about half of the quality gap between wideband and fullband. For coded speech in combination with NoLACE at 9 kb/s, it achieves similar quality to Opus 1.4 fullband at double the bitrate (18 kb/s). It also closes the gap with the EVS codec. Interestingly, we show that at 9 kb/s and above, Opus is now able to exceed the quality of a purely neural codec like EnCodec.

To use BWE, enable it at build time with the --enable-osce configure option. Then, at run time, it needs to be enabled explicitly via -enable_osce_bwe, and the decoder complexity (introduced in Opus 1.5) needs to be set to 4 or above. BWE will then be used only for speech coded in the SILK wideband mode, provided the decoder is configured for a 48 kHz sampling rate.
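
For illustration, here is a minimal sketch of the decoder-side setup in C, assuming a library built with --enable-osce. It only covers the two conditions named above (48 kHz output and decoder complexity of 4 or higher); the explicit run-time enable is done with the -enable_osce_bwe switch described above and is not part of this fragment.

    #include <stddef.h>
    #include <opus.h>

    /* Minimal sketch: prepare a decoder that meets the BWE requirements
       described above. Assumes a library configured with --enable-osce;
       the explicit -enable_osce_bwe run-time switch is not shown here. */
    OpusDecoder *create_bwe_ready_decoder(void)
    {
        int err;
        /* BWE only activates when the decoder outputs at 48 kHz. */
        OpusDecoder *dec = opus_decoder_create(48000, 1, &err);
        if (err != OPUS_OK || dec == NULL)
            return NULL;
        /* Decoder complexity (introduced in Opus 1.5) must be 4 or above. */
        opus_decoder_ctl(dec, OPUS_SET_COMPLEXITY(4));
        return dec;
    }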

DRED Improvements

Deep REDundancy (DRED) was first introduced as an experimental extension in Opus 1.5. Since then, it has continued to evolve, and Opus 1.6 brings a number of improvements. Since DRED is signaled in the bitstream, the new model is incompatible with the version in Opus 1.5. Thanks to model versioning, using a 1.5 encoder with a 1.6 decoder (or vice versa) will not produce any unwanted sounds, but the DRED information will simply be unavailable to an incompatible decoder. The good news is that we are hoping (but not yet promising) that the new model will not need to change in the final DRED standard. For more on DRED, see this paper and the IETF draft. The DRED improvements introduced in Opus 1.6 are described below.

Better Intelligibility

The original model in 1.5 had a tendency to overly smooth out temporal spectral variations, especially at very low bitrate. That could sometimes be perceived as slurred speech, with lower intelligibility. The new 1.6 model was trained with an improved loss function that is more sensitive to large errors (using a fourth-power error term). Although it does not improve perceptual quality (it doesn't sound better), the speech is more intelligible.
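
Purely as an illustration of why a fourth-power term helps, the toy loss below penalizes large errors far more heavily than plain squared error. It is not the actual DRED training objective, just a sketch of the idea.

    #include <stddef.h>

    /* Toy loss, illustration only: the e^4 term grows much faster than e^2
       for large errors, so training is pushed away from the big deviations
       that smear out temporal detail. Not the real DRED loss function. */
    static float toy_loss(const float *ref, const float *est, size_t n)
    {
        float acc = 0.f;
        for (size_t i = 0; i < n; i++) {
            float e = ref[i] - est[i];
            acc += e * e + e * e * e * e;
        }
        return n ? acc / (float)n : 0.f;
    }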

DRED intelligibility
Intelligibility of the updated DRED model compared to the original model from Opus 1.5, measured at low bitrate. Blue regions in the confusion matrix indicate pairs where the new model is more intelligible, whereas red regions indicate that the new model is less intelligible. See also the absolute confusion matrices (relative to uncompressed speech) for the new model at low, medium, and high bitrate. These results are taken from draft-lechler-mlcodec-test-battery (slides).

Increased Robustness

Nowadays, DNN-based speech enhancement is ubiquitous and it's rare to find ourselves encoding noisy or reverberant speech. That being said, robustness to such speech is still potentially useful for some applications, so it's a nice-to-have as long as it does not hurt clean speech performance. For that reason, the 1.6 model was trained on a mix of clean and noisy/reverberant speech. The new augmented training significantly improves quality and intelligibility in challenging conditions without hurting the clean speech case (maybe even improving it a bit).

Smaller Model

One of the main barriers to using DNNs in codecs — even more than complexity — is model size. Despite the improvements above, the new DRED encoder and decoder models are also about 3x smaller than the corresponding 1.5 models (600 kB, down from 1800 kB). The size reduction was made possible by a combination of architecture fine-tuning, increased sparsity, and the use of bottleneck layers for convolutional layers.
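
To give a feel for why bottleneck layers shrink a model, the sketch below counts parameters for a plain 1-D convolution versus a bottlenecked version. The channel counts are made up for illustration and this is not the actual DRED architecture.

    #include <stdio.h>

    /* Parameter count for a 1-D convolution layer (weights plus biases). */
    static long conv1d_params(long c_in, long c_out, long k)
    {
        return c_in * c_out * k + c_out;
    }

    int main(void)
    {
        /* Hypothetical channel counts, purely for illustration. */
        long c = 256, k = 3, b = 64;
        long direct = conv1d_params(c, c, k);
        /* Bottleneck: project down to b channels, convolve, project back up. */
        long bottleneck = conv1d_params(c, b, 1)
                        + conv1d_params(b, b, k)
                        + conv1d_params(b, c, 1);
        printf("direct conv: %ld params, bottleneck conv: %ld params\n",
               direct, bottleneck);
        return 0;
    }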

Experimental Opus HD Support (in development)

One more experimental feature (see IETF draft) introduced in this release is support for coding 96 kHz audio with bandwidth beyond the standard 20 kHz audio range, along with increased bitrates of up to 2 Mb/s. While 48 kHz audio is good enough for nearly everyone, there are use cases for the increased sampling rate and bitrate, including situations where sensors and/or ultrasonics are involved. To preserve backward compatibility, Opus HD is implemented as an extension layer on top of Opus: a regular Opus decoder will handle an Opus HD stream perfectly fine, albeit without being able to take advantage of the added bitrate and bandwidth.

In addition to the extended bandwidth, Opus HD also allows for increased resolution in the audible 0-20 kHz band. The standard Opus quantizers in RFC 6716 reach a resolution of about 8 bits per coefficient as the bitrate approaches its maximum of 510 kb/s. With Opus HD, quantizers can reach a depth of up to 20 bits. Moreover, since the added resolution is implemented as a layer, Opus can be used as a scalable codec. Note that the quantizer resolution is not to be confused with the PCM sample bit depth: even with quantizer depths below 8 bits, Opus is still easily capable of coding the full dynamic range of 24-bit audio, and more.

To enable Opus HD support, use the --enable-qext configure option when building Opus. To encode using Opus HD, use the -qext option in opus_demo, or use OPUS_SET_QEXT(1) with the encoder API. If built with Opus HD support, the decoder will automatically use any Opus HD layer it finds, unless OPUS_SET_IGNORE_EXTENSIONS(1) is used.
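
As a hedged sketch of the encoder-side API, the fragment below uses the OPUS_SET_QEXT(1) ctl mentioned above. It assumes the experimental --enable-qext build accepts a 96000 Hz sampling rate and a 2 Mb/s bitrate directly; a stock build will not.

    #include <stddef.h>
    #include <opus.h>

    /* Sketch only: requires a library configured with --enable-qext.
       Passing 96000 Hz to opus_encoder_create() is an assumption about
       the experimental build; a regular build rejects it. */
    OpusEncoder *create_hd_encoder(int channels)
    {
        int err;
        OpusEncoder *enc = opus_encoder_create(96000, channels,
                                               OPUS_APPLICATION_AUDIO, &err);
        if (err != OPUS_OK || enc == NULL)
            return NULL;
        opus_encoder_ctl(enc, OPUS_SET_QEXT(1));          /* enable the HD layer */
        opus_encoder_ctl(enc, OPUS_SET_BITRATE(2000000)); /* up to 2 Mb/s */
        return enc;
    }

On the decoding side, a QEXT-enabled decoder picks up the HD layer automatically unless OPUS_SET_IGNORE_EXTENSIONS(1) is used, as noted above.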

New 24-bit Audio API

Opus 1.6 introduces a new 24-bit integer audio API for encoding and decoding. While the original 16-bit integer and 32-bit float calls remain available, this new option targets high-resolution audio pipelines that prefer to avoid floating-point arithmetic. It is particularly useful on platforms where floating-point operations are expensive, or for applications that maintain a pure integer pipeline.

Since the C standard does not provide a native 24-bit integer type, the API uses opus_int32 with the audio data stored in the lower 24 bits. This results in a nominal range of [-2^23, +2^23-1]. Crucially, just like the floating-point API, the 24-bit API supports values slightly beyond this nominal range without hard clipping, preserving dynamic range peaks that would be lost with the 16-bit API.

The new calls use a "24" suffix (e.g., opus_encode24() and opus_decode24()). Support is comprehensive, covering the standard API as well as the multistream and projection APIs.
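
Here is a brief sketch of the new calls, assuming opus_encode24() and opus_decode24() mirror the signatures of opus_encode() and opus_decode() but take opus_int32 buffers with samples in the low 24 bits.

    #include <opus.h>

    #define FRAME_SIZE 960   /* 20 ms at 48 kHz */
    #define MAX_PACKET 1500

    /* Sketch: encode and decode one 24-bit frame. Assumes the *24() calls
       mirror the 16-bit signatures with opus_int32 sample buffers. */
    int roundtrip_24bit(OpusEncoder *enc, OpusDecoder *dec,
                        const opus_int32 *pcm_in, opus_int32 *pcm_out)
    {
        unsigned char packet[MAX_PACKET];
        /* Samples nominally lie in [-2^23, 2^23-1]; like the float API,
           slight overshoot beyond that range is tolerated. */
        opus_int32 len = opus_encode24(enc, pcm_in, FRAME_SIZE,
                                       packet, MAX_PACKET);
        if (len < 0)
            return (int)len;
        return opus_decode24(dec, packet, len, pcm_out, FRAME_SIZE, 0);
    }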

Miscellaneous

The header files previously used internal macros with an __opus prefix, which violates the C standard's rule that identifiers beginning with a double underscore are reserved. This has now been fixed. Unfortunately, libopusenc was (wrongly) relying on those internal macros, so anyone compiling against libopusenc (binaries are unaffected) will need to update to libopusenc 0.3.

As a side benefit of the Opus HD work, the accuracy of the fixed-point implementation has been significantly improved. It now more closely matches the floating point implementation.

The architecture-specific optimizations for MIPS have been updated and improved.

Run-time CPU detection for x86 SIMD instructions now supports OpenBSD.

As usual, many minor issues found in previous versions have been fixed.

Conclusion

This release is a continuation of the work in Opus 1.5, and these new features will continue to evolve over time. Please give Opus 1.6 a try and let us know about your experience (good or bad) so we can fix any issues that come up. Enjoy!

—The Opus development team
December 15th, 2025

Additional Resources

  1. First and foremost: The Opus Project Homepage
  2. The basic Opus techniques for music coding are described in the AES paper: High-Quality, Low-Delay Music Coding in the Opus Codec
  3. The basic Opus techniques for speech coding are described in this other AES paper: Voice Coding with Opus
  4. Join our development discussion in #opus at irc.libera.chat (→web interface)

(C) Copyright 2025 Xiph.Org Foundation