High-resolution volume control and normalisation #660

roderickvd · 2021-03-01T21:58:51Z

Enhancements:

Store and handle samples as 32-bit floats instead of 16-bit integers. This provides 24-25 bits of transparency, allowing for 48-54 dB of headroom to do volume control and normalisation without throwing away bits or dropping dynamic range below 96 dB CD quality.
Perform volume control and normalisation in 64-bit arithmetic for minimum quantisation noise.
Output to 32-bit float, 32-bit integer, 24-bit integer (both in padded 32-bit words and as three-byte arrays) or 16-bit integer (default), as specified on the command line.
Add a dynamic limiter with configurable threshold, attack time, release or decay time, and steepness for the sigmoid transfer function. This mimics the native Spotify limiter and offers greater dynamic range than the old limiter, which just reduced overall gain to prevent clipping.
Make the configurable threshold also apply to the old limiter, which is still available.
DRY up a lot of audio backend code, which at the same time enables OggData passthrough on the subprocess backend.
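
As a rough illustration of the first two points, here is a minimal sketch of applying a gain to 32-bit float samples while doing the multiplication in 64-bit arithmetic. The function names `db_to_ratio` and `apply_gain` are hypothetical and not taken from this PR's code.

```rust
/// Convert a gain in decibels to a linear amplitude ratio.
fn db_to_ratio(db: f64) -> f64 {
    10f64.powf(db / 20.0)
}

/// Scale 32-bit float samples in place.
/// The multiplication happens in f64; each sample is narrowed to f32 once,
/// when it is stored back.
fn apply_gain(samples: &mut [f32], gain_db: f64) {
    let factor = db_to_ratio(gain_db);
    for sample in samples.iter_mut() {
        *sample = (f64::from(*sample) * factor) as f32;
    }
}
```

Calling `apply_gain(&mut buffer, -20.0)` would scale every sample to roughly a tenth of its amplitude; a gain of 0.0 dB leaves the buffer unchanged.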

Resolves: #608

Notes:

New command line options:

        --format FORMAT Output format (F32, S32, S24, S24_3 or S16). Defaults to S16
        --normalisation-method NORMALISATION_METHOD
                        Specify the normalisation method to use - [basic,
                        dynamic]. Default is dynamic.
        --normalisation-threshold THRESHOLD
                        Threshold (dBFS) to prevent clipping. Default is -1.0.
        --normalisation-attack ATTACK
                        Attack time (ms) in which the dynamic limiter is
                        reducing gain. Default is 5.
        --normalisation-release RELEASE
                        Release or decay time (ms) in which the dynamic
                        limiter is restoring gain. Default is 100.
        --normalisation-knee KNEE
                        Knee steepness of the dynamic limiter. Default is 1.0.
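
The attack and release options above are time constants. One common way to turn such a time into a per-sample smoothing coefficient is a one-pole filter, sketched below. This is a generic convention and an assumption on my part, not necessarily the formula used in this PR; the 44.1 kHz sample rate and both function names are also assumptions.

```rust
/// Assumed sample rate in Hz.
const SAMPLE_RATE: f64 = 44_100.0;

/// One-pole smoothing coefficient for a time constant given in milliseconds.
/// Longer times give a coefficient closer to 1.0, i.e. slower movement.
fn time_to_coefficient(time_ms: f64) -> f64 {
    (-1.0 / (time_ms / 1000.0 * SAMPLE_RATE)).exp()
}

/// Advance a peak envelope by one sample.
/// A rising input uses the attack coefficient, a falling input the release one.
fn follow_envelope(envelope: f64, input: f64, attack_cf: f64, release_cf: f64) -> f64 {
    let level = input.abs();
    let cf = if level > envelope { attack_cf } else { release_cf };
    cf * envelope + (1.0 - cf) * level
}
```

With the defaults of 5 ms attack and 100 ms release, the envelope in this sketch rises much faster than it falls.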

For the dynamic limiter, steepness values between 0.5 and 2.0 work well. The default of 1.0 yields a linear function; values > 1.0 roll off softly at first and are steeper midway, while values < 1.0 give a sharp initial response that becomes gentler midway. Feedback on optimisation of these parameters is welcome.
Compiling with with-vorbis works, but the result panics at runtime. This was already the case and is not new to this PR. See: UB with std::mem::zeroed tomaka/vorbis-rs#19
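
To give a feel for the knee behaviour described above, here is a purely illustrative power curve. It is a stand-in, not the limiter's actual transfer function, and the name `shape` is hypothetical. It does reproduce the stated behaviour: linear at 1.0, soft at first for values above 1.0, sharp at first for values below 1.0.

```rust
/// Shape a normalised overshoot `x` in [0, 1] with a power curve.
/// NOTE: hypothetical shaping function, not taken from the PR's code.
fn shape(x: f64, knee: f64) -> f64 {
    // Limit the input to [0, 1] before applying the exponent.
    x.max(0.0).min(1.0).powf(knee)
}
```

For an input of 0.25, a knee of 2.0 gives a smaller result than the linear case and a knee of 0.5 gives a larger one.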

To do:

Add a command-line option to output either 16 or 32 bit depth
Add support for four-byte S24 output format
Add support for three-byte S24_3 output format
Rename normalisation-steepness to normalisation-knee
Revert default format to S16 for a seamless out-of-the-box experience
Test remaining backends (help needed!)
Optimise requantizer to work in f32 then round

Pending refactoring:

~~DRY up sample conversion in PortAudio and SDL backends~~
Edit: Not much to be gained, dropped idea
~~Use TryFrom idiom instead of AudioPacket::f32_to_<type>() helper functions~~
Edit: Moved sample conversion into separate struct
Use Self instead of full struct and enum names on config.rs

All done!

Test status:

All backends compile successfully, but require testing.

| Backend | Status | First verified by | Remarks |
|---|---|---|---|
| Alsa | ✅ | @roderickvd | - |
| GStreamer | ✅ | @JasonLG1979 | |
| JACK audio | ✅ | @sashahilton00 | |
| pipe | ✅ | @roderickvd | - |
| PortAudio | ☑️ | @roderickvd | [2] |
| PulseAudio | ✅ | @JasonLG1979 | - |
| Rodio | ☑️ | @roderickvd | [1] |
| SDL | | | [3] |
| subprocess | ✅ | @roderickvd | - |

[1] Rodio on Raspbian 10 (Raspberry Pi 3 Model B+) does not open the output correctly. See the issue at: Alsa output opened incorrectly RustAudio/cpal#564. This seems to happen on Alsa only (no issues on macOS Big Sur) and is not strictly related to this PR.
[2] Panics on Alsa and macOS, but that's already the case in dev.
[3] Free pass granted by @sashahilton00.

roderickvd · 2021-03-01T22:17:59Z

Is Rust 1.41.1 still a target? If so, should I work around the fact that clamp was still an experimental API back then? Easy enough to do at the expense of a little elegance.
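
For reference, a workaround of the kind meant here could look like the sketch below. `f32::clamp` was only stabilised in Rust 1.50.0, whereas `f32::max` and `f32::min` are available on 1.41.1. The helper name `clamp_f32` is hypothetical.

```rust
/// Clamp `value` to `[min, max]` without relying on `f32::clamp`.
/// Unlike `f32::clamp`, a NaN input yields `min` here instead of NaN.
fn clamp_f32(value: f32, min: f32, max: f32) -> f32 {
    debug_assert!(min <= max);
    value.max(min).min(max)
}
```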

sashahilton00 · 2021-03-02T03:06:41Z

1.41.1 is still a target; as I understand it, that is currently the version available in Debian stable.

roderickvd · 2021-03-09T21:56:52Z

From #652 I gather that the tokio_migration branch is the current development target? This PR tracks the current dev branch. I'm fine with rebasing on tokio_migration if that's the future.

Also, I am well on track to add a --format {F32|S16} command-line option and, while at it, to DRY up a lot of the backend code by reusing common code.

- Store and output samples as 32-bit floats instead of 16-bit integers. This provides 24-25 bits of transparency, allowing for 48-54 dB of headroom to do volume control and normalisation without throwing away bits or dropping dynamic range below 96 dB CD quality.
- Perform volume control and normalisation in 64-bit arithmetic.
- Add a dynamic limiter with configurable threshold, attack time, release or decay time, and steepness for the sigmoid transfer function. This mimics the native Spotify limiter, offering greater dynamic range than the old limiter, which just reduced overall gain to prevent clipping.
- Make the configurable threshold also apply to the old limiter, which is still available.

Resolves: librespot-org#608

Usage: `--format {F32|S16}`. Default is F32.

- Implemented for all backends, except for JACK audio, which itself only supports 32-bit output at this time. Setting JACK audio to S16 will panic and instruct the user to set output to F32.
- The F32 default works fine for Rodio on macOS, but not on Raspbian 10 with Alsa as host. Therefore users on Linux systems are warned to set output to S16 in case of garbled sound with Rodio. This seems to be an issue with cpal incorrectly detecting the output stream format.
- While at it, DRY up lots of code in the backends and, by that virtue, also enable OggData passthrough on the subprocess backend.
- I tested Rodio, ALSA, pipe and subprocess quite a bit, and call on others to join in and test the other backends.

roderickvd · 2021-03-13T08:42:15Z

@JasonLG1979 continuing from #608:

> @roderickvd you pushed a couple of commits since I cloned and started to compile. The version I cloned does not work. I get errors about setting the format. ALSA does not seem to support float formats even running through dmix. I see in your commits since then you mention ALSA only working in 16bit linear mode.

Alsa works great in 32-bit float, just not through the current cpal and Rodio. So be sure to launch with --backend alsa.

> aplay --dump-hw-params /usr/share/sounds/alsa/Front_Right.wav tells me that on my system at least that dmix will accept S16_LE S16_BE S24_LE S32_LE S32_BE S24_3LE so basically 16, 24, and 32bit linear.
>
> If you're already doing 16bit is there a reason you can't do 32bit?

Sure, should be easy enough to do now that the plumbing is there. In the meantime, does adding defaults.pcm.dmix.format S24_LE to /etc/asound.conf help?

> Really though I'd like to see a 24bit option also. Basically all but the cheapest DACs will do 24bit natively, a lot more than will do 32bit float, that's for sure.

It's in the back of my mind. This looks like a promising route: https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=3d233fedc8ed595a1e88e815d23cd009

Johannesd3 · 2021-03-13T08:59:35Z

What's the advantage of u24 compared to u32, or perhaps a newtype wrapper around u32 with bound check?

roderickvd · 2021-03-13T09:22:00Z

Only sound card compatibility. This is for final output to the driver only, internally all samples continue to be stored in f32.

I would use a helper function instead of implementing TryFrom because for PCM audio, it's not about putting the same value into another type, but shifting bytes. Or in the case of i32 to i24, dropping the least significant byte.
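
A minimal sketch of such a helper, assuming a full-scale signed 32-bit sample and little-endian output. The function name is hypothetical and this is not the code from the PR.

```rust
/// Pack a full-scale signed 32-bit sample into three little-endian bytes
/// by dropping the least significant byte.
fn i32_to_s24_3_le(sample: i32) -> [u8; 3] {
    // Little-endian byte order: index 0 is the least significant byte.
    let bytes = sample.to_le_bytes();
    [bytes[1], bytes[2], bytes[3]]
}
```

For example, `0x1234_5678` becomes the bytes `0x56, 0x34, 0x12`: the low byte `0x78` is discarded and the sign stays in the top byte.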

JasonLG1979 · 2021-03-13T16:42:43Z

@roderickvd the problem could have very well been fixed between the time I cloned and compiled last and now. I'll have to set up a cross compile environment. Compiling directly on a Pi Zero is for the birds.

> Alsa works great in 32-bit float, just not through the current cpal and Rodio. So be sure to launch with --backend alsa.

I compiled with the alsa backend and did use the --backend alsa option when running it.

> Sure, should be easy enough to do now that the plumbing is there. In the meantime, does adding defaults.pcm.dmix.format S24_LE to /etc/asound.conf help?

defaults.pcm.dmix.format S24_LE sets the sound card's input, i.e. what ALSA gives to the card (ALSA's output).

aplay --dump-hw-params /usr/share/sounds/alsa/Front_Right.wav, when going through dmix, tells you what you can input to dmix. Changing defaults.pcm.dmix.format will not change the output of aplay --dump-hw-params /usr/share/sounds/alsa/Front_Right.wav.

I generally use a custom /etc/asound.conf that I wrote that allows me to set the sampling rate, format, buffer size, and software volume settings. It's basically softvol > dmix > hardware.

JasonLG1979 · 2021-03-13T17:59:53Z

I just cloned and built your branch on my desktop (it takes less than a min to compile which is nice compared to hours for the Pi Zero) and everything seems to work great. PulseAudio has no problem with 32bit float.

Edit: It runs great on my desktop. I have yet to compile on my Pi Zero.

roderickvd · 2021-03-13T23:06:43Z

@JasonLG1979 thanks for verifying with PulseAudio. Could you also try Alsa now I've added support for --format S32?

JasonLG1979 · 2021-03-14T00:01:13Z

> thanks for verifying with PulseAudio.

I built it with the alsa backend. I'm not even sure what the point of having a PulseAudio specific backend is? PulseAudio when running is the "default" ALSA device for compatibility reasons. A separate PulseAudio backend seems redundant.

> Could you also try Alsa now I've added support for --format S32?

Sure, I'll give it a shot on my desktop first and, if that goes well, on the Pi Zero.

JasonLG1979 · 2021-03-14T05:23:35Z

@roderickvd running just fine on a Pi Zero. 15 - 20% CPU during normal playback and maybe 50 - 60% while fetching the next song during playback. I'd call that a success, it's not like you're going to be multitasking on a Pi Zero.

If it matters here are the args I used:

-n Librespot --enable-volume-normalisation --normalisation-gain-type track --normalisation-pregain 3 --format S32 --initial-volume 100 --username XXXX --password XXXX --autoplay --disable-discovery --disable-audio-cache -v

JasonLG1979 · 2021-03-14T05:37:22Z

The limiter seems to also work pretty well. Totally unscientific A/B testing between the official Spotify desktop client and your branch on my desktop with --normalisation-pregain 3 to match Spotify's default levels and they sound pretty close to identical. I also ran into a track that hits the limiter pretty hard (1.94 dB), Red Hot Chili Peppers - Under The Bridge, and it didn't distort, breathe or duck.

Thanks.

Edit: I would rename "Steepness" to "Knee". Attack, Release and Knee are common words used to describe compressor/limiter parameters. I've never seen Knee called Steepness.

roderickvd · 2021-03-14T12:54:27Z

> I built it with the alsa backend. I'm not even sure what the point of having a PulseAudio specific backend is? PulseAudio when running is the "default" ALSA device for compatibility reasons. A separate PulseAudio backend seems redundant.

Well, although I'm no PulseAudio expert, I do think PA offers things like networking support and per-application volume control, right? That sets it apart from the Alsa kernel it builds on.

> If it matters here are the args I used:
>
> -n Librespot --enable-volume-normalisation --normalisation-gain-type track --normalisation-pregain 3 --format S32 --initial-volume 100 --username XXXX --password XXXX --autoplay --disable-discovery --disable-audio-cache -v

So just checking: this is on Alsa, not on Rodio or PulseAudio?

> @roderickvd running just fine on a Pi Zero. 15 - 20% CPU during normal playback and maybe 50 - 60% while fetching the next song during playback. I'd call that a success, it's not like you're going to be multitasking on a Pi Zero.
>
> The limiter seems to also work pretty well. Totally unscientific A/B testing between the official Spotify desktop client and your branch on my desktop with --normalisation-pregain 3 to match Spotify's default levels and they sound pretty close to identical. I also ran into a track that hits the limiter pretty hard (1.94 dB), Red Hot Chili Peppers - Under The Bridge, and it didn't distort, breathe or duck.

Righteous!

> Edit: I would rename "Steepness" to "Knee". Attack, Release and Knee are common words used to describe compressor/limiter parameters. I've never seen Knee called Steepness.

That's a good suggestion.

JasonLG1979 · 2021-03-14T13:12:11Z

> Well, although I'm no PulseAudio expert, I do think PA offers things like networking support and per-application volume control, right? That sets it apart from the Alsa kernel it builds on.

I'm not a PulseAudio expert either. Maybe? I will build the branch with the PulseAudio backend and test it on my desktop.

> So just checking: this is on Alsa, not on Rodio or PulseAudio?

Yes ALSA. I'd never run PulseAudio on a headless Pi Zero, I'm not a masochist,lol!!! PulseAudio is not designed to run as a system service, but as a per user service. I generally run librespot as a system level service (with restricted permissions ofc) on headless Pi's.

> Righteous!

I thought so,lol!!!

> That's a good suggestion.

As far as the knee goes I think the steepness is about right, nice and middle of the road. And I'd still use steepness in the description.

JasonLG1979 · 2021-03-14T13:18:10Z

If they're going to do a PulseAudio backend though they should do it right and set the Pulseaudio Application Properties

roderickvd · 2021-03-14T13:35:51Z

> If they're going to do a PulseAudio backend though they should do it right and set the Pulseaudio Application Properties

With "they" you mean this project? The PA application and stream name are set at compile-time. Which properties are you missing? I'm not sure what you're getting at but you can consider opening a separate issue.

JasonLG1979 · 2021-03-14T13:48:57Z

> With "they" you mean this project? The PA application and stream name are set at compile-time. Which properties are you missing? I'm not sure what you're getting at but you can consider opening a separate issue.

It's not a must. PulseAudio is generally used on desktop systems with Desktop Environments and all the associated settings UI's and complex BS. PulseAudio properties allow things like telling the DE that librespot is a music player and having its name show up in the sound settings volume panel. It's also a way to send metadata like track title and whatnot when using PulseAudio as a network streamer. (Not that anyone really uses that functionality in PulseAudio, the network stuff I mean)

Something like this; ofc it's an app with a UI and it's in Python, but anyway:

https://github.com/pithos/pithos/blob/master/pithos/application.py#L42-L46
https://github.com/pithos/pithos/blob/master/pithos/pithos.py#L747-L750

JasonLG1979 · 2021-03-15T13:13:13Z

@roderickvd PulseAudio looks good to me. Seems to work as well as ALSA as far as I can tell.

JasonLG1979 · 2021-03-15T13:19:41Z

One thing you might consider though is defaulting to 16bit so you don't break librespot for people that upgrade from an existing install that up until this point was 16bit. 16bit is also the most commonly supported format. In my mind that means that librespot would work out of the box for more people.

roderickvd · 2021-03-15T16:19:23Z

> @roderickvd PulseAudio looks good to me. Seems to work as well as ALSA as far as I can tell.

Great, thanks for testing. Call on other lurkers to test the other backends as well!

> One thing you might consider though is defaulting to 16bit so you don't break librespot for people that upgrade from an existing install that up until this point was 16bit. 16bit is also the most commonly supported format. In my mind that means that librespot would work out of the box for more people.

Yes I was thinking the same. Interestingly though JACK audio supports F32 and not S16. Previously, samples were reformatted from S16 to F32 without the user even knowing.

Currently in my branch --backend jackaudio --format S16 panics instructing the user to use --format F32. But it might be more intuitive to revert to the original behavior, and add a warning that the format is overridden.

What do you think?
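
To make the second option concrete, here is a rough sketch of overriding the requested format with a warning instead of panicking. The `AudioFormat` enum below is a hypothetical stand-in listing the formats named in this PR, and `resolve_float_only_format` is an invented name; neither is the actual backend code.

```rust
/// Hypothetical stand-in for the output formats named in this PR.
#[allow(non_camel_case_types, dead_code)]
#[derive(Clone, Copy, Debug, PartialEq)]
enum AudioFormat {
    F32,
    S32,
    S24,
    S24_3,
    S16,
}

/// Pick the format a float-only backend will actually use.
/// Returns the effective format plus an optional warning for the user.
fn resolve_float_only_format(requested: AudioFormat) -> (AudioFormat, Option<String>) {
    if requested == AudioFormat::F32 {
        (requested, None)
    } else {
        let warning = format!(
            "backend only supports F32; overriding requested format {:?}",
            requested
        );
        (AudioFormat::F32, Some(warning))
    }
}
```

The caller would log the warning, if any, and carry on with F32, which matches the old behaviour where the conversion happened silently.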

> I would use a helper function instead of implementing TryFrom because for PCM audio, it's not about putting the same value into another type, but shifting bytes. Or in the case of i32 to i24, dropping the least significant byte.

Yesterday I was banging my head over why I was only hearing white noise. Finally I gave up, then in bed I realized I had actually implemented the three-byte S24_3 array instead of the four-byte S24 array (zero-shifting all bits by eight) 😑
Good news is I'm now close to adding both.
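
To illustrate the difference between the two layouts, here is a sketch that converts a float sample to a 24-bit value. It assumes S24 means the value sits in the low three bytes of a 32-bit word (as ALSA defines S24_LE) and S24_3 means three packed little-endian bytes. The function names and the scaling constant are my own choices, not the PR's.

```rust
/// Convert a float sample in [-1.0, 1.0] to a signed 24-bit value
/// held in the low three bytes of an `i32` (assumed S24 layout).
fn f32_to_s24(sample: f32) -> i32 {
    // 2^23 - 1, so that +1.0 and -1.0 map to symmetric extremes and 0.0 maps to 0.
    const SCALE: f64 = 8_388_607.0;
    let scaled = f64::from(sample) * SCALE;
    // Limit the range before rounding so out-of-range input cannot exceed 24 bits.
    scaled.max(-SCALE).min(SCALE).round() as i32
}

/// Pack the same value into three little-endian bytes (assumed S24_3 layout).
fn f32_to_s24_3(sample: f32) -> [u8; 3] {
    let bytes = f32_to_s24(sample).to_le_bytes();
    [bytes[0], bytes[1], bytes[2]]
}
```

Sending one layout to a device that expects the other misaligns every sample, which would explain hearing nothing but noise.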

roderickvd · 2021-03-27T20:23:36Z

> True, although there's something to be said about librespot being self contained and it just working out of the box on a small ARM based board running Linux as opposed to piping. I haven't tested to see how well piping actually works. I would assume it works just fine?

Yes it does.

roderickvd · 2021-03-27T21:53:18Z

Can someone verify there is no regression in the JACK Audio backend? I can't get playback and am unsure if I'm doing this right. I've got a server set up in working order; running jack_simple_client plays a test tone. However running librespot with just --backend jackaudio outputs no sound (nor errors) both on my branch and on dev.

Running jackd2 (jackdmp version 1.9.12 tmpdir /dev/shm protocol 8) as: jackd -dalsa -s -r 44100
librespot as: librespot --name test --verbose --backend jackaudio --disable-audio-cache

roderickvd · 2021-03-31T18:59:40Z

I reached out for help on the JackAudio mailing list, hoping they'll chime in.

As for SDL, could you give me a free pass? From a code walkthrough I believe it should work fine, but it feels like a burden to write an entire SDL shim just to test playback.

That said I feel like this PR is ready for merge. Are there any obstacles or further points?

sashahilton00 · 2021-04-09T00:37:24Z

Have just tested the Jack audio backend on my mac. My ears aren't good enough to tell the difference, but it appears to run fine. As for SDL, I'm fine with it being merged as is, don't think it warrants making a shim as you mentioned. Happy to merge if you are and there's no further feedback.

roderickvd · 2021-04-09T05:53:43Z

That's great man. I've got one little optimisation I'll commit this evening and let you know.

JasonLG1979 · 2021-04-09T09:33:44Z

@sashahilton00 How long until this ends up in a release once it's merged?

roderickvd · 2021-04-09T18:19:28Z

All done and happy to merge!

Hold my beer as I'm very close to opening a follow-up PR with configurable dithering and noise shaping.

sashahilton00 · 2021-04-10T00:29:19Z

Merged. Please update the wiki with any new CLI args and corresponding documentation as necessary.

roderickvd · 2021-04-10T19:14:23Z

> Merged. Please update the wiki with any new CLI args and corresponding documentation as necessary.

Done!

herrernst · 2021-04-19T16:14:30Z

@roderickvd Great work! Nowadays, librespot (e. g. on a cheapo Raspi Zero) is probably better than Spotify Connect in some >1000 USD/EUR AVR. (I have done the initial and very naive normalization implementation, sorry for that 😉)

eDad2003 · 2024-02-06T18:49:22Z

A voice from the future here. Thank you @JasonLG1979 and @roderickvd for recognizing the problem and spending the time to improve it. My (noob) take on this endeavor is this: "while Spotify may only transmit 16 bit information, the internal librespot code involved in gain adjustment may decrease resolution/precision of those bits resulting in sound quality degradation. Adjusting the FORMAT parameter will eliminate this decrease (provided your DAC can handle 24 or 32 bits)."

Perhaps the documentation can reflect this better? Sounds (pun intended) kinda important. I didn't understand the config parameter and googled my way to this thread. I was relying on https://github.com/librespot-org/librespot/wiki/Options for information.

Cheers

roderickvd added 4 commits March 12, 2021 23:09

Fix build on Rust < 1.50.0

1672eb8

Fix example

6379926

roderickvd force-pushed the hi-res-volume-control branch from bc05e7f to 3837cc2 Compare March 12, 2021 22:48

Fix Alsa backend for 64-bit systems

a4ef174

roderickvd force-pushed the hi-res-volume-control branch from 3837cc2 to a4ef174 Compare March 12, 2021 22:50

roderickvd mentioned this pull request Mar 12, 2021

Significanly higher CPU usage with JACK audio backend vs. ALSA backend on ARM #343

Closed

Add support for S32 output format

5f26a74

While at it, add a small tweak when converting "silent" samples from float to integer. This ensures 0.0 converts to 0 and vice versa.

Rename steepness to knee

309e264

Minor code improvements and crates bump

bfca1ec

roderickvd added 4 commits March 27, 2021 21:42

Make S16 to F32 sample conversion less magical

cdbce21

Fix formatting

a200b25

Fix buffer size in JACK Audio backend

cc60dc1

Warn about broken backends

d252eee

roderickvd mentioned this pull request Mar 31, 2021

Channel balance correction. #682

Closed

Use AudioFormat size for SDL

07d710e

roderickvd added 2 commits April 5, 2021 21:30

Move SamplesConverter into convert.rs

78bc621

DRY up constructors

928a673

Optimize requantizer to work in f32, then round

d0ea963

roderickvd force-pushed the hi-res-volume-control branch from 175ba02 to d0ea963 Compare April 9, 2021 17:33

roderickvd added 2 commits April 9, 2021 20:01

Bump playback crates to the latest supporting Rust 1.41.1

222f9bb

For Rodio, this fixes garbled sound on some but not all Alsa hosts.

Merge remote-tracking branch 'upstream/dev' into hi-res-volume-control

e20b96c

sashahilton00 merged commit 8fe2e01 into librespot-org:dev Apr 10, 2021

roderickvd mentioned this pull request Apr 10, 2021

Further progress on tokio migration #687

Merged

roderickvd deleted the hi-res-volume-control branch April 19, 2021 19:50

roderickvd mentioned this pull request Apr 20, 2021

Implement dithering #694

Merged

This was referenced May 13, 2021

Highly distorted audio when audio normalization is true hrkfdn/ncspot#522

Closed

Distorted audio when using dynamic normalisation #745

Closed
