mirror of https://github.com/OpenRCT2/OpenRCT2 synced 2025-12-10 01:22:25 +01:00
Audio specifics
Michael Steenbeek edited this page 2025-12-01 11:58:36 +01:00

This page was written to clear up some things around OpenRCT2's audio layer. We have had, and still have, some issues with crackling, as well as a desire to allow higher sampling rates. There has also been some confusion about what certain settings do, such as the number of samples in a buffer.

Digital audio in general

The human ear can generally pick up sounds ranging from 20 Hz to 20,000 Hz, although the upper bound drops with age and hearing loss. Analogue sound is a continuous wave, while digital audio is discrete, meaning it is a series of measurements (samples). These measurements need to be taken often enough to capture all the nuances: the Nyquist-Shannon sampling theorem states that the sampling rate must be more than twice the maximum frequency. For the aforementioned 20,000 Hz, this means more than 40,000 samples per second, and practical systems add a small margin on top of that to leave room for filtering. We call this the sampling rate or sampling frequency. Common sampling rates are 44,100 Hz (CD) and 48,000 Hz (DVD), with the latter having become the standard for computer audio as of 2025.
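As a rough illustration of why the sampling rate limits the frequencies that can be represented, the sketch below computes the highest representable frequency for a given rate, and the frequency a pure tone would fold back to if it were sampled without any filtering. The function names are made up for this example, and an idealised sampler is assumed.

```python
def nyquist_limit_hz(sample_rate_hz):
    # Highest frequency a given sampling rate can represent.
    return sample_rate_hz / 2


def aliased_frequency_hz(tone_hz, sample_rate_hz):
    # A sampled tone is indistinguishable from one shifted by any
    # multiple of the sampling rate, so it is heard at its distance
    # to the nearest multiple.
    nearest_multiple = round(tone_hz / sample_rate_hz) * sample_rate_hz
    return abs(tone_hz - nearest_multiple)


cd_limit = nyquist_limit_hz(44_100)               # 22050.0 Hz
low_tone = aliased_frequency_hz(10_000, 22_050)   # below the limit: unchanged
high_tone = aliased_frequency_hz(15_000, 22_050)  # above the limit: folds back
print(cd_limit, low_tone, high_tone)
```

At a sampling rate of 22,050 Hz the limit is 11,025 Hz, so the 10,000 Hz tone is preserved while the 15,000 Hz tone would be heard at 7,050 Hz.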

Every sample uses a number of bits; CD audio, for example, uses 16 bits per sample. Multiplying this by the sampling rate of 44,100 Hz and by 2 for the number of channels gives 1,411,200 bits per second, or 176,400 bytes per second. (This works out to about 747 MiB for a 74-minute CD.)
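The arithmetic above can be reproduced with a few lines. This is only a worked example; `pcm_data_rate` is a name chosen for illustration.

```python
def pcm_data_rate(sample_rate_hz, bits_per_sample, channels):
    # Uncompressed PCM: every channel stores one sample per sampling period.
    bits_per_second = sample_rate_hz * bits_per_sample * channels
    return bits_per_second, bits_per_second // 8


cd_bits, cd_bytes = pcm_data_rate(44_100, 16, 2)
cd_total_bytes = cd_bytes * 74 * 60           # 74 minutes of audio
cd_total_mib = cd_total_bytes / (1024 * 1024)
print(cd_bits, cd_bytes, round(cd_total_mib))
```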

When playing digital audio on a computer, the samples have to be loaded into a buffer. When the buffer runs dry, the result is often a crackling sound. This can be mitigated by making the buffer larger, but that increases latency. The latency can be calculated by dividing the number of samples in the buffer by the sample rate and multiplying by 1000 to get milliseconds. For example, a buffer of 2048 samples playing at an output rate of 22,050 Hz (which is what OpenRCT2 does as of November 2025) means a latency of about 93 ms.
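The latency formula can be written down as a small helper. `buffer_latency_ms` is a hypothetical name used only for this example.

```python
def buffer_latency_ms(buffer_samples, sample_rate_hz):
    # Time it takes to play out one full buffer, in milliseconds.
    return buffer_samples / sample_rate_hz * 1000


current = buffer_latency_ms(2048, 22_050)      # the 2048 samples at 22,050 Hz case
same_buffer_48k = buffer_latency_ms(2048, 48_000)
print(round(current), round(same_buffer_48k, 1))
```

Note that the same 2048-sample buffer covers far less time at 48,000 Hz, which matters when the output rate is raised.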

When the sampling frequency, the number of channels or the bit depth of an audio file does not match the output format, the audio has to be converted (resampled, in the case of the sampling frequency). When this is combined with streaming the audio off the disk, like RCT2 and OpenRCT2 do, special consideration has to be given to make sure enough data is read to fill the buffer at the output frequency.
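The following sketch shows one way to work out how much source data is needed to fill an output buffer when the rates differ. It is not OpenRCT2's actual code: the function names are invented, and a real resampler may need a few extra frames of history on top of this. The point is only that the division has to round up, because rounding down leaves the buffer short.

```python
def source_frames_needed(output_frames, source_rate_hz, output_rate_hz):
    # Ceiling division in integer arithmetic, so the result never
    # falls short of what the output buffer requires.
    numerator = output_frames * source_rate_hz
    return -(-numerator // output_rate_hz)


def source_bytes_needed(output_frames, source_rate_hz, output_rate_hz,
                        channels, bytes_per_sample):
    frames = source_frames_needed(output_frames, source_rate_hz, output_rate_hz)
    return frames * channels * bytes_per_sample


same_rate = source_frames_needed(2048, 22_050, 22_050)
from_44k = source_frames_needed(2048, 44_100, 22_050)
from_48k = source_frames_needed(2048, 48_000, 22_050)
truncated_48k = 2048 * 48_000 // 22_050        # rounding down: one frame short
bytes_48k = source_bytes_needed(2048, 48_000, 22_050, channels=2, bytes_per_sample=2)
print(same_rate, from_44k, from_48k, truncated_48k, bytes_48k)
```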

Situation in OpenRCT2/RCT2

OpenRCT2 currently uses SDL2 and has to output everything at the same output frequency, bit depth and number of channels. Any audio that differs needs to be converted to those values before it is mixed in. A rounding error in the resampling code caused OpenRCT2 to read too few bytes; this bug was resolved in November 2025. RCT2 sidesteps this issue by requiring all audio to be 16 bits and to use a sampling rate of 22,050 Hz, meaning it only has to account for the number of channels (as most of its sound effects are in mono), which is a simple factor of two.

Future development/SDL3

With the resampling bug fixed, it may be safe for OpenRCT2 to raise the output frequency without running into crackling issues. However, this will need careful testing. A higher output frequency also means that more data has to be moved in the same amount of time, and if the game cannot fill the buffer quickly enough, crackling will still occur. Besides the output frequency, the number of samples in the buffer also needs consideration. Generally, you want to increase the number of samples in the buffer more or less linearly with the sampling rate: doubling the output frequency means the same buffer covers only half as much time, while doubling both keeps the latency the same.
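As a sketch of the scaling described above, the helper below grows the buffer linearly with the output frequency so that the latency stays put. The names are illustrative only, and the result is not rounded to any size a particular audio back end might prefer.

```python
def latency_ms(buffer_samples, sample_rate_hz):
    return buffer_samples / sample_rate_hz * 1000


def scaled_buffer_samples(buffer_samples, old_rate_hz, new_rate_hz):
    # Scale the buffer linearly with the output frequency.
    return round(buffer_samples * new_rate_hz / old_rate_hz)


before = latency_ms(2048, 22_050)                  # about 93 ms
unscaled = latency_ms(2048, 44_100)                # same buffer, half the time
doubled = scaled_buffer_samples(2048, 22_050, 44_100)
after = latency_ms(doubled, 44_100)                # back to about 93 ms
for_48k = scaled_buffer_samples(2048, 22_050, 48_000)
print(round(before, 1), round(unscaled, 1), doubled, round(after, 1), for_48k)
```

For rates that are not a simple multiple of the old one, such as 48,000 Hz, the scaled value (4458 here) is not a round number, so in practice a nearby convenient size would have to be chosen, with a slightly different latency as a result.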

SDL3's audio system is more flexible around differing sample rates. In SDL2's setup, you open an audio device with a set output frequency, bit depth and number of channels, and any audio sent to the device has to adhere to these parameters. In SDL3, you can set these per stream, meaning that SDL handles any conversions needed. (A similar setup may be possible in SDL2 by using SDL2_mixer, though this has not been fully investigated.)

Unsorted sources