sndio
hints on writing & porting audio code

1 Introduction
- 1.1 Aim of this document
- 1.2 Device model overview
2 Parameter negotiation
- 2.1 Selecting formats and encodings
3 Choosing the buffer size
4 Synchronizing stuff on audio playback
5 Choosing and using the block size
6 Adjusting the volume, mute knob
7 Pausing and resuming
8 Using sndio in multi-threaded programs
9 Windows-style callbacks
10 Pitfalls
11 Glossary

1 Introduction

1.1 Aim of this document

This document contains simple tips on how to write new code for the sndio API as well as how to port existing code to it. It doesn't explain how to invoke sndio functions, which are already described in the sio_open(3) manual page.

Remember to keep things as simple as possible; the sndio API is designed to make this possible. If something looks complicated, the approach may be wrong. In some cases, it may be better to drop some complicated feature rather than adding hackish code that may hurt the overall correctness and robustness of the application.

1.2 Device model overview

The sndio device model is as follows:

A bidirectional data stream exists between the program and the sound-card, for the play and record directions, respectively.
Data is a sequence of frames, where each frame corresponds to a sample for all channels of the stream. It is submitted and retrieved using functions similar to the read(2) and write(2) syscalls.
A wall clock ticks when samples are processed by the hardware; i.e. the n-th frame of the stream corresponds to the n-th clock tick. The clock is exposed through a callback mechanism: a function registered by the program is called periodically, which takes as argument the number of clock ticks elapsed since the last call.

In other words, the n-th sample read is recorded exactly when the n-th written sample is played. This means that samples to play must be written before recorded samples can be read, otherwise a deadlock will occur.

2 Parameter negotiation

To minimize mistakes, the following approach can be used:

call sio_setpar(3) using the application's native parameters.
call sio_getpar(3) and verify whether returned parameters are usable.

Certain applications support multiple parameter sets, so if the above steps fail, you may want to retry with another set. However, that's unlikely to work in real life for two reasons:

apps often support "common" formats that "common" hardware supports, so if it didn't work, it's probably because the hardware is not so "common". Therefore, trying another "common" format the application supports has little chance of working.
that's more code, so there's a greater chance of introducing a bug. And for what? To allow the app to emulate the format one particular piece of hardware supports. Is it worth the effort, given that sndiod(8) already does emulation, supports any hardware, and is enabled by default?

2.1 Selecting formats and encodings

A typical example is a simple player that tries to play 2-channel, s16 at 44.1kHz. If the audio subsystem doesn't support this format it should just fail, so the following is OK:

...
par.pchan = 2;
par.sig = 1;
par.bits = 16;
par.le = SIO_LE_NATIVE;
par.rate = 44100;
if (!sio_setpar(hdl, &par))
	errx(1, "internal error, sio_setpar() failed");
if (!sio_getpar(hdl, &par))
	errx(1, "internal error, sio_getpar() failed");
if (par.pchan != 2)
	errx(1, "couldn't set number of channels");
if (!par.sig || par.bits != 16 || par.le != SIO_LE_NATIVE)
	errx(1, "couldn't set format");
if (par.bits != 16 || par.bps != 2)
	errx(1, "couldn't set precision");
if (par.rate < 44100 * 995 / 1000 ||
    par.rate > 44100 * 1005 / 1000)
	errx(1, "couldn't set rate");
...

As sndiod(8) is used by default, sio_setpar(3) will always use the requested parameters. If the user has requested direct access to the hardware, then sio_setpar(3) may configure the device to other parameters, so the new ones must be checked with sio_getpar(3).
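
The rate check in the example above accepts any rate within 0.5% of the requested one. A minimal sketch of that test as a standalone helper follows; the name rate_ok() is hypothetical and not part of sndio.

```c
/*
 * Return 1 if the rate reported by sio_getpar(3) is within 0.5% of the
 * requested rate, 0 otherwise.  rate_ok() is an illustrative helper,
 * not an sndio function.
 */
int
rate_ok(unsigned int requested, unsigned int actual)
{
	unsigned long long lo = (unsigned long long)requested * 995 / 1000;
	unsigned long long hi = (unsigned long long)requested * 1005 / 1000;

	return actual >= lo && actual <= hi;
}
```

With such a helper, the last test of the example could read ``if (!rate_ok(44100, par.rate))''.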

3 Choosing the buffer size

The buffer size represents the amount of time given to the application to produce data to play (or to consume recorded data). If the application doesn't respect this constraint, xruns will occur.

Therefore, we must estimate the maximum time it will take to prepare the data and to fill the buffer, and then choose a slightly larger buffer size by setting the appbufsz parameter in the sio_par structure.

On a multitasking system, the delay estimate must take into account the other processes hogging the system. On a typical Unix-like system, a margin of around 5-10ms seems OK. If the buffer size is not set, the audio subsystem will choose a reasonable value, something around 50ms.

For example, consider a file player. It's organized as follows:

for (;;) {
	read_file_to_fifo();
	play_from_fifo();
}

The maximum time it takes for the application to call play_from_fifo() is roughly equal to the maximum time read_file_to_fifo() takes to complete. Reading from a file may block for around 50ms, so around 100ms of buffer is mostly OK. If the file uses a 44.1kHz sampling rate, then the buffer size is:

0.1s * 44100Hz = 4410 frames

The orders of magnitude of the maximum delay for different operations, measured on a slow i386 system with ~2 users doing simple stuff (editors, basic X11, compilations), can be seen below:

operation	max delay
extract a block from a CD	300ms
read less than 64kB from hard disk	50ms
read from a pipe + pair of context switches	10ms

Note: the device may choose a different buffer size than the one the application requested. In any case, the application must use sio_getpar(3) and take into account the actual buffer size.
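
The buffer size computation used in this section (duration times sampling rate) can be sketched as a small helper. The name ms_to_frames() is hypothetical; rounding up is a choice made here so that the result is never shorter than the requested duration.

```c
/*
 * Convert a duration in milliseconds to a number of frames at the given
 * sampling rate, rounding up.  Illustrative helper, not part of sndio.
 */
unsigned int
ms_to_frames(unsigned int ms, unsigned int rate)
{
	return (unsigned int)(((unsigned long long)ms * rate + 999) / 1000);
}
```

The result could be stored in the appbufsz parameter before calling sio_setpar(3); the size actually chosen must still be read back with sio_getpar(3).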

4 Synchronizing stuff on audio playback

Timing information is available by setting up a callback with the sio_onmove(3) function:

struct sio_par par;
long long writecnt;	/* bytes written */
long long readcnt;	/* bytes read */
long long realpos;	/* frame number Joe is hearing */

void
cb(void *addr, int delta)
{
	realpos += delta;
}

int
main(void)
{
	struct sio_hdl *hdl;

	...

	writecnt = readcnt = realpos = 0;
	sio_onmove(hdl, cb, NULL);

	...

	for (;;) {
		...

		writecnt += sio_write(hdl, buf, count);

		...

		readcnt += sio_read(hdl, buf, count);

		...
	}
	...
}

The callback is invoked every time a block is processed by the hardware. It's called from one of the following functions:

sio_revents(3) after poll(2)
blocking sio_write(3) and sio_read(3)

4.1 Absolute play and record positions

The absolute play position is given by realpos, from the above example. If the application needs this expressed in seconds:

realpos_sec = realpos / par.rate;

Note that in earlier versions of sndio, ``realpos'' could be negative, but that feature was removed.
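
Since realpos and par.rate are both integers, the division above truncates to whole seconds. If fractional seconds are needed, a floating-point conversion along the following lines could be used; frames_to_sec() is a hypothetical name.

```c
/*
 * Convert a position expressed in frames to seconds, keeping the
 * fractional part.  Illustrative helper, not part of sndio.
 */
double
frames_to_sec(long long frames, unsigned int rate)
{
	return (double)frames / (double)rate;
}
```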

4.2 Playback latency à la GET_ODELAY

The playback latency is the delay (expressed in number of frames) that it will take until the last frame that was written becomes audible. This is exactly the buffer usage:

writepos = writecnt / (par.pchan * par.bps);	/* convert to frames */
bufused = writepos - realpos;

The recording latency is generally zero, since the application is waiting and consuming the data immediately.
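
The playback latency calculation can be wrapped in a helper, sketched below under the assumption that writecnt accumulates the byte counts returned by sio_write(3) and realpos accumulates the deltas passed to the sio_onmove(3) callback. The name play_latency() is hypothetical.

```c
/*
 * Playback latency in frames: frames written minus frames already
 * played.  writecnt is in bytes, realpos is in frames.
 */
long long
play_latency(long long writecnt, long long realpos,
    unsigned int pchan, unsigned int bps)
{
	long long writepos = writecnt / ((long long)pchan * bps);

	return writepos - realpos;
}
```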

4.3 Playback buffer usage à la GET_OSPACE

Certain applications ask for the number of bytes left in the playback buffer, assuming that sio_write(3) will not block if the program writes less than the space available in the buffer. This is wrong, but sometimes it's not desirable to change the application, so the available buffer space could be calculated as follows:

space_avail = par.bufsz - bufused;
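
Applications ported from GET_OSPACE-style interfaces usually expect the result in bytes rather than frames. A possible conversion is sketched below; play_space_bytes() is a hypothetical name, and clamping negative results to zero is a defensive choice made here.

```c
/*
 * Space left in the play buffer, in bytes.  bufsz and bufused are in
 * frames; negative results are clamped to zero.
 */
long long
play_space_bytes(unsigned int bufsz, long long bufused,
    unsigned int pchan, unsigned int bps)
{
	long long avail = (long long)bufsz - bufused;

	if (avail < 0)
		avail = 0;
	return avail * pchan * bps;
}
```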

4.4 Record buffer usage à la GET_ISPACE

Using this for non-blocking I/O is wrong too, nevertheless the buffer usage is:

readpos = readcnt / (par.rchan * par.bps);
bufused = realpos - readpos;

4.5 Sleeping until there's space for one block in the play buffer

Certain applications want to sleep until there's space for at least one block in the play buffer. There's no way to wait for such an event, and that's not compatible with Unix file semantics.

The best approach is to change the application to use poll(2). If that's not possible, wait until the stream is writable as follows:

void
wait_space_avail(void)
{
	int nfds, revents;
	struct pollfd *pfds;

	/* one pollfd per descriptor the handle may use */
	pfds = malloc(sio_nfds(hdl) * sizeof(*pfds));
	if (pfds == NULL)
		err(1, "malloc failed");
	do {
		nfds = sio_pollfd(hdl, pfds, POLLOUT);
		if (nfds > 0) {
			if (poll(pfds, nfds, -1) < 0)
				err(1, "poll failed");
		}
		revents = sio_revents(hdl, pfds);
		if (revents & POLLHUP)
			errx(1, "device disappeared");
	} while (!(revents & POLLOUT));
	free(pfds);
}

Other approaches would probably lead to stuttering or to a busy loop, which, in turn, may lead to stuttering.

Note, however, that if poll(2) is called with no file descriptors and non-zero timeout, it will hang, and if timeout is negative, it will hang forever. That means we need to check if nfds is positive.

5 Choosing and using the block size

5.1 Getting the block size to optimize I/O

Audio is a continuous stream of frames, but the hardware processes them in blocks. A typical player will have an internal ring that will be filled by the player and consumed using sio_write(3). If the ring size is a multiple of the hardware block size, then calls to sio_write(3) will be optimal.

The block size is stored in the ``round'' field of the sio_par structure, and is negotiated using sio_setpar(3) and sio_getpar(3). Applications should round their internal buffer sizes as follows:

buf_size  = desired_buf_size + par.round - 1;
buf_size -= buf_size % par.round;

The ``round'' parameter is very constrained by the hardware, so sio_setpar(3) only uses it as a hint.
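
The rounding shown above can be sketched as a function. The name round_to_block() is hypothetical, and the block size is assumed to be non-zero, as returned by sio_getpar(3).

```c
/*
 * Round a desired buffer size (in frames) up to the next multiple of
 * the block size.  The block size must be non-zero.
 */
unsigned int
round_to_block(unsigned int desired, unsigned int round)
{
	unsigned int size = desired + round - 1;

	return size - size % round;
}
```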

5.2 Using a small block size for low latency

The minimum latency a program can get is related to the minimum buffer size, which is often one or two blocks. So if an application needs very low latency, it must use a small block size too, but there's no need to change it explicitly.

When changing the ``appbufsz'' parameter, an optimal block size is calculated by the sio_setpar(3) function. The sio_setpar(3) function will evolve to cope with future hardware and software constraints, so it's expected to always do the right thing, on any hardware. Therefore, in order to get the maximum robustness, don't change the block size.

5.3 Getting higher clock resolution for synchronization

Synchronization is based on the callback set with the sio_onmove(3) function. It's called periodically, every time a block is processed. Basically, this provides clock ticks to the program, which correspond to the sound card's clock.

If the block size is large, the tick rate is low, and time increases in big steps, which may not be desirable for applications requiring higher clock resolution. The easiest solution is to use a smaller block size to get a higher tick rate. This approach has the advantage of being very accurate, but it's CPU intensive. It's also not always possible to choose the block size (e.g. because of hardware constraints).

Example: a video player plays 25 images per second. To get a smooth video, images must be displayed at regular time intervals. Thus, the clock resolution must be at least twice the image rate, i.e. 50 ticks per second. If the audio rate is 44.1kHz, the maximum block size to get smooth video is:

44100Hz / 50Hz = 882 frames per block
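
The same calculation for other rates could be sketched as follows; max_block_size() is a hypothetical name, and the factor of two is the ``twice the image rate'' rule used in the example above.

```c
/*
 * Largest block size (in frames) giving at least two clock ticks per
 * displayed image.  fps must be non-zero.
 */
unsigned int
max_block_size(unsigned int rate, unsigned int fps)
{
	return rate / (2 * fps);
}
```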

Another solution is to use a large block size, and extrapolate the time between clock ticks using gettimeofday(2). This is more complicated to get right, but works in all situations, is less CPU intensive and works even if very high clock resolution is needed.
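
A minimal sketch of such an extrapolation is given below. It assumes the program records the system time (obtained, for instance, with gettimeofday(2) and converted to microseconds) each time the sio_onmove(3) callback runs. The structure and function names are hypothetical. The estimate is clamped to one block past the last tick, on the assumption that ticks are delivered on time.

```c
struct clock_state {
	long long pos;		/* position at the last tick, in frames */
	long long usec;		/* system time of the last tick, in us */
};

/* Record a tick; meant to be called from the sio_onmove(3) callback. */
void
clock_tick(struct clock_state *c, int delta, long long now_usec)
{
	c->pos += delta;
	c->usec = now_usec;
}

/*
 * Estimate the current position by adding the frames that should have
 * been played since the last tick, never going past one block.
 */
long long
clock_estimate(const struct clock_state *c, long long now_usec,
    unsigned int rate, unsigned int round)
{
	long long extra = (now_usec - c->usec) * rate / 1000000;

	if (extra < 0)
		extra = 0;
	if (extra > (long long)round)
		extra = round;
	return c->pos + extra;
}
```

Taking timestamps as arguments rather than reading the system clock inside the functions keeps them easy to test.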

6 Adjusting the volume, mute knob

6.1 Setting the volume

It's as simple as calling sio_setvol(3) with a value in the 0..127 range, where 0 means ``mute the stream'' and 127 is the maximum volume (the default). Certain apps use percentages in the 0..100 range; in that case a conversion must be performed as follows:

#define PCT_TO_SIO(pct)	((127 * (pct) + 50) / 100)
#define SIO_TO_PCT(vol)	((100 * (vol) + 64) / 127)

void setvol(int p)
{
	...

	sio_setvol(hdl, PCT_TO_SIO(p));
}

6.2 Volume feedback (reading the current volume 1)

There's no getter for the current volume; instead the program can install a callback to be notified about volume changes:

void
cb(void *addr, unsigned vol)
{
	redraw_volume_slider(SIO_TO_PCT(vol));
}

int
main(void)
{
	...

	sio_onvol(hdl, cb, NULL);

	...

	for (;;) {
		p = mouse_event_to_pct();
		setvol(p);
	}
}

6.3 Volume getter (reading the current volume 2)

Certain applications require a ``get volume'' function and work as follows:

	for (;;) {
		p = volume_slider_to_pct();
		setvol(p);
		p = getvol();
		move_volume_slider(p);
	}

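One may think that it's enough to set a global ``current volume'' variable in the callback and to return it in the getter. This can't work because it would require both of the following properties, and the second one cannot hold since 128 volume levels are mapped onto only 101 percentages: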

x == SIO_TO_PCT(PCT_TO_SIO(x))		/* for all x */
y == PCT_TO_SIO(SIO_TO_PCT(y))		/* for all y */

So it may lead to various weird effects like the cursor stuttering around a given position, or ``+/- volume'' keyboard shortcuts not working. The correct implementation is to use feedback as in the above section. If that's not possible, a fake getter can be implemented as follows:

unsigned current_pct;

void
cb(void *addr, unsigned vol)
{
	if (vol != PCT_TO_SIO(current_pct))
		current_pct = SIO_TO_PCT(vol);
}

unsigned
getvol(void)
{
	return current_pct;
}
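
The round-trip property discussed in this section can be checked exhaustively with the two conversion macros. The sketch below counts the values that don't survive a round trip in each direction; the function names are hypothetical.

```c
#define PCT_TO_SIO(pct)	((127 * (pct) + 50) / 100)
#define SIO_TO_PCT(vol)	((100 * (vol) + 64) / 127)

/* Count percentages x in 0..100 with SIO_TO_PCT(PCT_TO_SIO(x)) != x. */
int
pct_roundtrip_failures(void)
{
	int x, n = 0;

	for (x = 0; x <= 100; x++) {
		if (SIO_TO_PCT(PCT_TO_SIO(x)) != x)
			n++;
	}
	return n;
}

/* Count volumes y in 0..127 with PCT_TO_SIO(SIO_TO_PCT(y)) != y. */
int
vol_roundtrip_failures(void)
{
	int y, n = 0;

	for (y = 0; y <= 127; y++) {
		if (PCT_TO_SIO(SIO_TO_PCT(y)) != y)
			n++;
	}
	return n;
}
```

Every percentage survives the round trip, but 27 of the 128 volume levels do not, which is why a volume set by another program may not be representable as a percentage.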

7 Pausing and resuming

Pause and resume functions do not exist, because it's hard to properly implement on any hardware. If the pause feature is required, it's easier to stop the stream with sio_stop(3) and to later restart it with sio_start(3).

Certain programs expect a pause-resume cycle to not change the amount of buffered data. If so, the "resume" function must play the same amount of silence as the amount of data the buffer contained when the "pause" function was called.

Update: doing nothing would also work, but only in a few cases. If you just stop providing data to play, the stream will underrun and stop automatically. Once data is available again, the stream will resume automatically. However, this abuse of the xrun mechanism is not desirable for two reasons:

The device will still be processing data (silence) and will waste CPU time (which consumes more energy from laptop batteries).
This doesn't work if sndiod(8) is used and the subdevice is controlled by MMC. Indeed, sndiod(8) will try to resynchronize after the underrun and will drop a huge amount of samples, corresponding to the duration of the pause.

8 Using sndio in multi-threaded programs

The sndio library can be safely used in multi-threaded programs as long as all calls to functions using the same handle are serialized. This is achieved either with locks or by simply running all sndio-related bits in the same thread. In any case, using multiple threads to handle audio I/O buys nothing since the process is I/O bound.

9 Windows-style callbacks

Certain programs expect to register a callback that will be invoked automatically by the audio subsystem whenever the play buffer must be filled. For instance, Windows, jack and portaudio APIs use such semantics; callbacks are typically called by a real-time thread or in an interrupt context. This approach is equivalent to the read/write based approach, which is widespread on Unix. Consider the following callback-style pseudo-code:

void
play_cb(void *buf, size_t buflen)
{
	/* fill buf with data to play */
}

int
main(void)
{
	register_audio_callback(play_cb);
	...
	wait_forever();
}

It could be rewritten using read/write style semantics:

void
play_cb(void *buf, size_t buflen)
{
	/* fill buf with data to play */
}

int
main(void)
{
	unsigned char *buf;
	unsigned buflen = par.round;

	...

	for (;;) {
		play_cb(buf, buflen);
		sio_write(hdl, buf, buflen);
	}
}

There's no fundamental difference between the two. In other words, any callback-style API could be exposed using sndio. The only remaining problem is where to put the sndio loop.

If the program is single-threaded, then it uses a poll(2)-based event loop, in which case non-blocking I/O should be used and the sndio bits should be hooked somewhere in the poll(2) loop. However, such programs probably come from the Unix world and don't use a callback-style API.

If the program is multi-threaded, then it is simpler to spawn a thread and run the simple loop from above in it. The thread could be spawned when sio_start(3) is called and terminated when sio_stop(3) is called; if so, the thread contains real-time code paths only, and its scheduling priority could be cranked.

Multi-threaded programs use locks for synchronization, and we don't want a thread to sleep while holding a lock. To avoid holding a lock while a blocking sio_write(3) call is sleeping, one can use non-blocking I/O and sleep in poll(2) without holding the lock. In other words, sio_write(3) could be expanded as follows:

	unsigned char *p = buf;
	struct pollfd pfds[MAXFDS];

	...
	pthread_mutex_lock(&hdl_mtx);
	...

	for (;;) {
		if (p - buf == buflen) {
			play_cb(buf, buflen);
			p = buf;
		}
		n = sio_pollfd(hdl, pfds, POLLOUT);
		pthread_mutex_unlock(&hdl_mtx);
		if (n > 0 && poll(pfds, n, -1) < 0) {
			pthread_mutex_lock(&hdl_mtx);
			continue;
		}
		pthread_mutex_lock(&hdl_mtx);
		if (sio_revents(hdl, pfds) & POLLOUT) {
			n = sio_write(hdl, p, buflen - (p - buf));
			p += n;
		}
	}

10 Pitfalls

10.1 Playback and record aren't independent

If for any reason a full-duplex program stops consuming recorded data, there's a buffer overrun and recording stops. But since the playback and record directions are synchronous, this will also stop playback. For instance, waiting for playback to drain without consuming recorded data will never complete, because the record direction will pause the stream because of the overrun. Deadlock occurs.

10.2 Using appbufsz instead of bufsz

The ``appbufsz'' parameter is the size of the buffer the application is responsible for keeping non-empty (playback) or non-full (record). It should never be used for latency or buffer usage calculations.

The ``bufsz'' parameter is read-only and gives the total buffering between the application and Joe's ears, i.e. it's the actual latency. It takes into account any buffering including uncontrolled buffering of network sockets.

10.3 How many bytes to store a 24-bit sample

Short answer: four. Hardware, like most software, stores 24-bit samples in 4-byte words. This format is often referred to as ``s24le'' or ``s24be'', and it's the default when 24-bit encodings are requested.

However, that's not always the case: .wav and .aiff files store 24-bit samples in 3-byte words to save space. This encoding is often referred to as ``s24le3'' or ``s24be3''. If a program just reads and plays such files without any processing, it's likely it will try to send the file contents on the audio stream as-is. If so, the parameters should be set as follows:

par.bits = 24;
par.bps = 3;
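
If a program does need to process such samples before playing them, the packed 3-byte words must first be expanded. A minimal sketch, assuming little-endian packed input (``s24le3'') and producing sign-extended native 32-bit integers, is shown below. The function name is hypothetical, and the alignment of the bits expected by the device (the ``msb'' parameter) is not addressed here.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Expand packed 24-bit little-endian samples (3 bytes each) into
 * sign-extended 32-bit integers in the -8388608..8388607 range.
 */
void
s24le3_to_s32(const unsigned char *in, int32_t *out, size_t nsamples)
{
	size_t i;
	uint32_t v;

	for (i = 0; i < nsamples; i++) {
		v = (uint32_t)in[0] |
		    (uint32_t)in[1] << 8 |
		    (uint32_t)in[2] << 16;
		/* flip the sign bit, then subtract it to sign-extend */
		out[i] = (int32_t)(v ^ 0x800000) - 0x800000;
		in += 3;
	}
}
```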

10.4 poll(2) not called fast enough

A (wrong) program may use the following approach. Consider the following function to wait for the play buffer to become ready:

void
wait_ready(void)
{
	/*
	 * wait buffer to be consumed, sleep not to hog the CPU
	 */
	while (bufused > threshold)
		usleep(5);
}

where the ``bufused'' variable is updated asynchronously by the callback set with sio_onmove(3). Suppose it's then called as follows:

for (;;) {
	prepare_data(some_data);
	wait_ready();
	sio_write(hdl, some_data, count);	
}

This will deadlock. The callback is invoked from sio_write(3), but sio_write(3) is not called until ``bufused'' is updated by the callback. The correct implementation uses poll(2) as follows; it's also more efficient:

void
wait_ready(void)
{
	int nfds, revents;
	struct pollfd pfds[1];
	
	do {
		nfds = sio_pollfd(hdl, pfds, POLLOUT);
		if (nfds > 0) {
			if (poll(pfds, nfds, -1) < 0)
				err(1, "poll failed");
		}
		revents = sio_revents(hdl, pfds);
	} while (!(revents & POLLOUT));
}

11 Glossary

channel

that's a mono signal. Multiple channels form an audio stream. For example, a stereo stream has two channels: left and right. Channels are identified by small integers rather than names; so ``channel 0'' means the ``left channel''.

channel numbers start from zero and are ordered as follows:

channel number	physical meaning
0	main left
1	main right
2	rear left
3	rear right
4	center
5	lfe

above, 0 is the origin, but that's arbitrary. The important point is that ``main left'' is just before ``main right''. This allows, for example, for the rear speakers to be viewed as a stereo substream.

sample

it's a scalar value representing ``the voltage'' on a given channel. The signal is a sequence of samples. We represent them as integers.

frame

the set of one sample for each channel of the stream. I.e. a sample for channel 0 followed by a sample for channel 1, followed by ... a sample for the last channel. Frames are numbered.

rate

the number of frames per second the stream carries, e.g. 44.1kHz, 48kHz

encoding

The format with which samples are stored in memory. Example: 16-bit signed integer with little endian byte order. We use the following abbreviations:

``s'' or ``u'' character for signed or unsigned
followed by the number of bits
followed by ``le'' or ``be'' for little or big endian.
followed by the number of bytes in which the bits are stored.
followed by ``msb'' or ``lsb'' indicating how the significant bits are aligned in the bytes.

Example: s16, s24le, s24le3, s24le3lsb

format

The number of channels, the encoding, and the rate of the stream. Example: stereo, s16le at 44.1kHz

stream

a (possibly bidirectional) connection between two endpoints like the application and the sound card. Basically this is a sequence of frames. If the stream is bidirectional, the two sequences are synchronous: frame number N is played at the same time as frame number N is recorded.

full-duplex stream:

bidirectional stream as described above.

underrun

played frames are buffered. If the producer (e.g. the application) doesn't provide frames fast enough, the consumer (e.g. the sound card) may end up without frames to play. Thus it will play something else (because it can't stop); often, it plays silence.

overrun

the recorded frames are buffered. If the consumer (e.g. the application) doesn't consume them fast enough, the producer (e.g. the sound card) may not be able to store newly recorded frames in the buffer, thus it will discard them (because it can't stop recording).

xrun

overrun or underrun. Note that on bidirectional streams, since both directions are synchronous, if one of the directions xruns, the error is present in the other direction as well. For instance, if the play buffer underruns, recorded frames during the underrun are lost.
