This document quickly explains how the sndio API can be used to solve common audio development problems. It applies to both writing new code and porting existing code to sndio. This document doesn't explain how to invoke sndio functions, which are already described in the sio_open(3) manual page.
The main goal of this document, and the aim of the sndio API itself, is to ease writing correct code, improving audio software quality and robustness. An easy approach to correctness is to keep things as simple as possible, and the sndio API is designed to allow this. So, if something looks complicated, the approach may be wrong. In some cases it's even better to do without a useful but complicated feature rather than add hackish code that may hurt the overall correctness and robustness of the application.
The sndio device model is built around two components:
Certain applications support multiple parameter sets, so if the above steps fail, you may want to retry with another set. But that's unlikely to work in real life for two reasons:
There's a special case. Some very rare applications support any format and want direct access to the hardware. They have another possibility:
	...
	par.pchan = 2;
	par.sig = 1;
	par.bits = 16;
	par.le = SIO_LE_NATIVE;
	par.rate = 44100;
	if (!sio_setpar(hdl, &par))
		errx(1, "internal error, sio_setpar() failed");
	if (!sio_getpar(hdl, &par))
		errx(1, "internal error, sio_getpar() failed");
	if (par.pchan != 2)
		errx(1, "couldn't set number of channels");
	if (!par.sig || par.bits != 16 || par.le != SIO_LE_NATIVE)
		errx(1, "couldn't set format");
	if (par.bits != 16 || par.bps != 2)
		errx(1, "couldn't set precision");
	if (par.rate < 44100 * 995 / 1000 ||
	    par.rate > 44100 * 1005 / 1000)
		errx(1, "couldn't set rate");
	...
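The rate test in the example above accepts any rate within 0.5% of the requested one. Below is a minimal sketch of the same test as a reusable helper; the name `rate_ok` is hypothetical and not part of the sndio API.

```c
#include <assert.h>

/*
 * Return 1 if the rate the device actually uses is within 0.5% of the
 * rate the program asked for, 0 otherwise. Hypothetical helper, not
 * part of sndio.
 */
int
rate_ok(unsigned int actual, unsigned int wanted)
{
	assert(wanted > 0);
	/* 64-bit arithmetic so that large rates cannot overflow */
	return actual >= wanted * 995ULL / 1000 &&
	    actual <= wanted * 1005ULL / 1000;
}
```

With such a helper the last test of the example would read `if (!rate_ok(par.rate, 44100))`.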
If sndiod(1) is used, sio_setpar(3) will always set the requested parameters. If the audio(4) backend is used, sio_setpar(3) may set the device to other parameters, so the new ones must be checked with sio_getpar(3).
There might be apps that don't care about certain parameters. For instance a guitar tuner might work at any sample rate and any number of channels but requires s16 encoding. In this case, just set the encoding parameters, leaving the channels and the rate to the default ones. Do not set the rate/channels, because if sndiod(1) is used, this might trigger useless format conversions and resampling.
So the code looks exactly like the above example, but without the rate and channel settings and their checks.
So the application must estimate the maximum time it will take to prepare the data and to fill the buffer and then choose a slightly larger buffer size by setting the appbufsz parameter in the sio_par structure.
On a multitasking system, the delay estimation must take into account the other processes hogging the system. On OpenBSD a margin of around 5-10ms seems OK. If the buffer size is not set, the audio subsystem will choose a reasonable value, something around 50ms.
Example: consider a file player. It's organized as follows:
	for (;;) {
		read_file_to_fifo();
		play_from_fifo();
	}
The maximum time it takes the application to call play_from_fifo() is roughly equal to the maximum time read_file_to_fifo() takes to complete. Reading from a file may block for around 50ms, so ~100ms of buffer is more than enough. If the file uses a 44.1kHz sampling rate, then the buffer size is:
	0.1s * 44100Hz = 4410 frames
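The same calculation can be done in integer arithmetic. This is only a sketch: `delay_to_frames` is a hypothetical helper, and its result is merely a candidate value for the appbufsz parameter.

```c
#include <assert.h>

/*
 * Convert a worst-case delay in milliseconds to a buffer size in
 * frames at the given sample rate, rounding up. Hypothetical helper,
 * not part of sndio.
 */
unsigned int
delay_to_frames(unsigned int delay_ms, unsigned int rate)
{
	assert(rate > 0);
	return (unsigned int)(((unsigned long long)delay_ms * rate + 999) / 1000);
}
```

For the file player above, `delay_to_frames(100, 44100)` gives 4410 frames.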
Below are a few orders of magnitude of maximum delays, measured on a slow i386 with ~2 users doing simple stuff (editors, basic X11, compilations):
| operation | max delay |
|---|---|
| extract a block from a CD | 300ms |
| read less than 64kB from hard disk | 50ms |
| read from a pipe + pair of context switches | 10ms |
Note: the device may choose a different buffer size than the one the application requested. In any case the application must use sio_getpar() and take into account the actual buffer size.
Timing information is available by setting up a callback with the sio_onmove(3) function:
	struct sio_par par;
	long long writecnt;		/* bytes written */
	long long readcnt;		/* bytes read */
	long long realpos;		/* frame number Joe is hearing */

	void
	cb(void *addr, int delta)
	{
		realpos += delta;
	}

	int
	main(void)
	{
		struct sio_hdl *hdl;
		...
		writecnt = readcnt = realpos = 0;
		sio_onmove(hdl, cb, NULL);
		...
		for (;;) {
			...
			writecnt += sio_write(hdl, buf, count);
			...
			readcnt += sio_read(hdl, buf, count);
			...
		}
		...
	}
The callback is invoked every time a block is processed by the hardware.
It's called from one of the following functions:
It's given by realpos, above. If the application needs this expressed in seconds:
	realpos_sec = realpos / par.rate;
Note that in earlier versions of libsndio, ``realpos'' could be negative, but that feature was removed.
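If both operands are integers, as with a ``long long'' position and an unsigned rate, the division above truncates to whole seconds. The sketch below shows two ways to keep sub-second resolution; both function names are hypothetical.

```c
#include <assert.h>

/* position in seconds, keeping the fractional part */
double
frames_to_sec(long long pos, unsigned int rate)
{
	assert(rate > 0);
	return (double)pos / rate;
}

/* position in whole milliseconds, using integer arithmetic only */
long long
frames_to_msec(long long pos, unsigned int rate)
{
	assert(rate > 0);
	return pos * 1000 / rate;
}
```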
	writepos = writecnt / (par.pchan * par.bps);	/* convert to frames */
	bufused = writepos - realpos;
The recording latency is generally zero, since the application is waiting and consuming the data immediately.
Certain applications ask for the number of bytes left in the playback buffer, assuming that sio_write(3) will not block if the program writes less than the space available in the buffer. This is wrong, but sometimes it's not desirable to fix the application, so the available buffer space is calculated as follows:
	space_avail = par.bufsz - bufused;
Note that we don't use par.appbufsz, but par.bufsz, which is read-only but takes into account any buffering, including uncontrolled network buffers.
	readpos = readcnt / (par.rchan * par.bps);
	bufused = realpos - readpos;
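The three formulas above can be gathered into small helpers. This is a sketch only: the function names are hypothetical, the byte counters and ``realpos'' are the ones maintained by the application as shown earlier, and all results are in frames.

```c
#include <assert.h>

/* frames written by the program but not yet heard */
long long
play_bufused(long long writecnt, long long realpos,
    unsigned int pchan, unsigned int bps)
{
	assert(pchan > 0 && bps > 0);
	return writecnt / (pchan * bps) - realpos;
}

/* frames recorded by the device but not yet read by the program */
long long
rec_bufused(long long readcnt, long long realpos,
    unsigned int rchan, unsigned int bps)
{
	assert(rchan > 0 && bps > 0);
	return realpos - readcnt / (rchan * bps);
}

/* free space left in the playback buffer; bufsz is par.bufsz */
long long
play_space_avail(long long bufsz, long long bufused)
{
	return bufsz - bufused;
}
```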
Certain applications need to sleep until there's space for at least one block in the play buffer. There's no way to wait for such an event, and that's not compatible with unix file semantics. The best approach is to change the application to use better semantics; if that's not possible, wait until the stream is writable as follows:
	void
	wait_space_avail(void)
	{
		int nfds, revents;
		struct pollfd pfds[1];

		do {
			nfds = sio_pollfd(hdl, pfds, POLLOUT);
			if (nfds > 0) {
				if (poll(pfds, nfds, -1) < 0)
					err(1, "poll failed");
			}
			revents = sio_revents(hdl, pfds);
			if (revents & POLLHUP)
				errx(1, "device disappeared");
		} while (!(revents & POLLOUT));
	}
Any other approach would probably lead to stuttering or to a busy loop which, in turn, may lead to stuttering.
If poll(2) is called with no file descriptors and a non-zero timeout, it blocks. The correct usage checks the descriptor count first. Example:
	...
	nfds = sio_pollfd(hdl, pfds, POLLOUT);
	if (nfds > 0) {
		if (poll(pfds, nfds, -1) < 0)
			err(1, "poll failed");
	}
	revents = sio_revents(hdl, pfds);
	...
If we forget to check whether nfds is positive, poll(2) may be called
with no descriptors to poll, and the program will hang forever.
Audio is a continuous stream of frames; however, the hardware processes them in blocks. A typical player will have an internal ring that will be filled by the player and consumed using sio_write(3). If the ring size is a multiple of the hardware block size, then calls to sio_write(3) will be optimal.
The block size is stored in the ``round'' field of the sio_par structure, and is negotiated using sio_setpar(3) and sio_getpar(3). Applications should round their internal buffer sizes as follows:
	buf_size = desired_buf_size + par.round - 1;
	buf_size -= buf_size % par.round;
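The same rounding, written as a function. This is a sketch: `round_to_blocks` is a hypothetical name, and both arguments are in frames.

```c
#include <assert.h>

/* round size up to the nearest multiple of the block size */
unsigned int
round_to_blocks(unsigned int size, unsigned int round)
{
	assert(round > 0);
	size += round - 1;
	return size - size % round;
}
```

For instance, with a block size of 960 frames, a desired size of 4410 frames becomes 4800 frames, that is, exactly five blocks.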
The ``round'' parameter is very constrained by the hardware, so sio_setpar(3) only uses it as a hint.
When changing the ``appbufsz'' parameter, an optimal block size is calculated by the sio_setpar(3) function. The sio_setpar(3) function will evolve to cope with future hardware and software constraints, so it's supposed to always do the right thing, on any hardware. So, to get the maximum robustness, don't change the block size.
Synchronization is based on the callback set with the sio_onmove(3) function. It's called periodically, once every time a block is processed. Basically this provides clock ticks to the program, corresponding to the sound card's clock.
If the block size is large, the tick rate is low and time advances in big steps, which may not be desirable for applications requiring higher clock resolution. The easiest solution is to use a smaller block size to get a higher tick rate. This approach has the advantage of being very accurate, but it's CPU intensive. Also, it's not always possible to choose the block size (e.g. because of hardware constraints).
Example: a video player plays 25 images per second. To get a smooth video, images must be displayed at regular time intervals. Thus the clock resolution must be at least twice the image rate, so 50 ticks per second. If the audio is at 44.1kHz, the maximum block size to get smooth video is:
44100Hz / 50Hz = 882 frames per block
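The rule of thumb above (two clock ticks per image) can be sketched as a helper; `max_block_size` is a hypothetical name and the factor of two is the one used in the example.

```c
#include <assert.h>

/*
 * Largest block size, in frames, that still gives two clock ticks per
 * video frame. Hypothetical helper, not part of sndio.
 */
unsigned int
max_block_size(unsigned int rate, unsigned int fps)
{
	assert(fps > 0);
	return rate / (2 * fps);
}
```

The result is only a hint: as explained above, the device may not accept it.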
Another solution is to use large block size, and extrapolate the time between clock ticks using gettimeofday(2). This is more complicated to get right, but works in all situations, is less CPU intensive and works even if very high clock resolution is needed.
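A minimal sketch of the extrapolation follows. It assumes the program records, in its sio_onmove(3) callback, the position and the wall-clock time of the last tick; the function name and its arguments are hypothetical. Times are in microseconds, for instance `tv.tv_sec * 1000000LL + tv.tv_usec` from gettimeofday(2).

```c
#include <assert.h>

/*
 * Estimate the current position, in frames, between two clock ticks.
 * tick_pos and tick_usec are the position and the time of the last
 * tick, round is the block size. Hypothetical helper, not part of
 * sndio.
 */
long long
extrapolate_pos(long long tick_pos, long long tick_usec,
    long long now_usec, unsigned int rate, unsigned int round)
{
	long long extra;

	assert(rate > 0);
	extra = (now_usec - tick_usec) * rate / 1000000;
	/* never go behind the last tick, nor beyond the next one */
	if (extra < 0)
		extra = 0;
	if (extra > round)
		extra = round;
	return tick_pos + extra;
}
```

Clamping keeps the estimate monotonic from one tick to the next if the system clock and the sound card's clock drift apart; this is a design choice of the sketch, other strategies are possible.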
It's as simple as calling sio_setvol(3) with a value in the 0..127 range, where 0 means ``mute the stream'' and 127 is the maximum volume (the default). Certain apps use percentages in the 0..100 range; if so, a conversion must be done as follows:
	#define PCT_TO_SIO(pct)	((127 * (pct) + 50) / 100)
	#define SIO_TO_PCT(vol)	((100 * (vol) + 64) / 127)

	void
	setvol(int p)
	{
		...
		sio_setvol(hdl, PCT_TO_SIO(p));
	}
There's no getter for the current volume; instead the program can install a callback to be notified about volume changes:
	void
	cb(void *addr, unsigned vol)
	{
		redraw_volume_slider(SIO_TO_PCT(vol));
	}

	int
	main(void)
	{
		...
		sio_onvol(hdl, cb, NULL);
		...
		for (;;) {
			p = mouse_event_to_pct();
			setvol(p);
		}
	}
Certain applications require a ``get volume'' function and work as follows:
	for (;;) {
		p = volume_slider_to_pct();
		setvol(p);
		p = getvol();
		move_volume_slider(p);
	}
One may think that it's enough to set a global ``current volume'' variable in the callback and to return it in the getter. This can't work because it would require the following properties:
	x == SIO_TO_PCT(PCT_TO_SIO(x))	/* for all x */
	y == PCT_TO_SIO(SIO_TO_PCT(y))	/* for all y */
The second one cannot hold: the 128 values of the 0..127 range cannot be mapped one-to-one onto the 101 values of the 0..100 range. So it may lead to various weird effects like the cursor stuttering around a given position, or ``+/- volume'' keyboard shortcuts not working. The correct implementation is to use feedback as in the above section; if that's not possible, a fake getter can be implemented as follows:
	unsigned current_pct;

	void
	cb(void *addr, unsigned vol)
	{
		/* keep the stored percentage unless the volume really changed */
		if (vol != PCT_TO_SIO(current_pct))
			current_pct = SIO_TO_PCT(vol);
	}

	unsigned
	getvol(void)
	{
		return current_pct;
	}
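To see why the naive getter misbehaves, the round trips can be checked exhaustively. The sketch below reuses the two conversion macros from above; the function names are hypothetical. `vol_after_get_set` mimics one iteration of the ``get volume, then set volume'' loop starting from a given device volume.

```c
#include <assert.h>

#define PCT_TO_SIO(pct)	((127 * (pct) + 50) / 100)
#define SIO_TO_PCT(vol)	((100 * (vol) + 64) / 127)

/* device volume after a naive getvol() followed by setvol() */
int
vol_after_get_set(int vol)
{
	assert(vol >= 0 && vol <= 127);
	return PCT_TO_SIO(SIO_TO_PCT(vol));
}

/* number of percentages in 0..100 changed by pct -> vol -> pct */
int
pct_roundtrip_failures(void)
{
	int x, n = 0;

	for (x = 0; x <= 100; x++) {
		if (SIO_TO_PCT(PCT_TO_SIO(x)) != x)
			n++;
	}
	return n;
}

/* number of volumes in 0..127 changed by vol -> pct -> vol */
int
vol_roundtrip_failures(void)
{
	int y, n = 0;

	for (y = 0; y <= 127; y++) {
		if (vol_after_get_set(y) != y)
			n++;
	}
	return n;
}
```

Percentages survive the round trip, but 27 of the 128 volume values do not; for instance a device volume of 2 comes back as 3. The fake getter avoids this by never converting the volume back unless it really changed.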
Pause and resume functions do not exist, because they are hard to implement properly on all hardware. If the pause feature is required, the easiest way is to stop the stream with sio_stop() and to restart it later with sio_start().
A possible (but not necessary) improvement would be to fill the play buffer with silence when resuming. The buffer size is obtained in the ``appbufsz'' parameter using sio_getpar().
Update: doing nothing would also work, but only in a few cases. Just stop providing data to play; the stream will underrun and stop automatically. Once data is available again, the stream will resume automatically. But this abuse of the xrun mechanism is not desirable for two reasons:
The sndio library can be safely used in multi-threaded programs as long as all calls to functions using the same handle are serialized. This is achieved either with locks or by simply running all sndio related bits in the same thread. Anyway, using multiple threads to handle audio I/O buys nothing since the process is I/O bound.
Certain programs expect to register a callback that will be invoked automatically by the audio subsystem whenever the play buffer must be filled. For instance, Windows, jack and portaudio APIs use such semantics; callbacks are typically called by a real-time thread or in interrupt context. This approach is equivalent to the read/write based approach widespread on Unix. Consider the following callback-style pseudo-code:
	void
	play_cb(void *buf, size_t buflen)
	{
		/* fill buf with data to play */
	}

	int
	main(void)
	{
		register_audio_callback(play_cb);
		...
		wait_forever();
	}
It could be rewritten using read/write style semantics:
	void
	play_cb(void *buf, size_t buflen)
	{
		/* fill buf with data to play */
	}

	int
	main(void)
	{
		unsigned char *buf;
		size_t buflen = par.round * par.pchan * par.bps;	/* one block, in bytes */
		...
		for (;;) {
			play_cb(buf, buflen);
			sio_write(hdl, buf, buflen);
		}
	}
There's no fundamental difference. In other words, any callback-style API could be exposed using sndio. The only remaining problem is where to put the sndio loop.
If the program is single-threaded, then it uses a poll()-based event loop, in which case non-blocking I/O should be used and the sndio bits should be hooked somewhere in the poll() loop. But such programs probably come from the unix world and don't use a callback-style API.
If the program is multi-threaded, then the simplest is to spawn a thread and run the above simple loop in it. The thread could be spawned when sio_start() is called and terminated when sio_stop() is called; if so, the thread contains real-time code paths only, and its scheduling priority could be cranked.
Multi-threaded programs use locks for synchronization, and we want no thread to sleep while holding a lock. To avoid holding a lock while a blocking sio_write() is sleeping, one can use non-blocking I/O and sleep in poll() without holding the lock. In other words sio_write() should be expanded as follows:
	unsigned char *p = buf;
	struct pollfd pfds[MAXFDS];
	int n;
	...
	pthread_mutex_lock(&hdl_mtx);
	...
	for (;;) {
		if (p - buf == buflen) {
			play_cb(buf, buflen);
			p = buf;
		}
		n = sio_pollfd(hdl, pfds, POLLOUT);
		pthread_mutex_unlock(&hdl_mtx);
		if (n > 0)
			n = poll(pfds, n, -1);	/* sleep without holding the lock */
		pthread_mutex_lock(&hdl_mtx);	/* re-lock before using hdl again */
		if (n < 0)
			continue;
		if (sio_revents(hdl, pfds) & POLLOUT) {
			n = sio_write(hdl, p, buflen - (p - buf));
			p += n;
		}
	}
If for any reason a full-duplex program stops consuming recorded data, there's a buffer overrun and recording stops. But since the playback and record directions are synchronous, this will stop playback too. For instance, waiting for playback to drain without consuming recorded data will never complete, because the record direction will pause the stream because of the overrun. Deadlock occurs.
The ``appbufsz'' parameter is the size of the buffer the application is responsible for keeping non-empty (playback) or non-full (record). It should never be used for latency or buffer usage calculations.
The ``bufsz'' parameter is read-only and gives the total buffering between the application and Joe's ears; it's actually the latency. It takes into account any buffering, including uncontrolled buffering of network sockets.
Short answer: four. Hardware, like most software, stores 24-bit samples in 4-byte words; this format is often referred to as ``s24le'' or ``s24be''. It's the default when 24-bit encodings are requested.
However that's not always the case: .wav and .aiff files store 24-bit samples in 3-byte words to save space. This encoding is often referred to as ``s24le3'' or ``s24be3''. If a program just reads and plays such files without any processing, it's likely it will try to send the file contents on the audio stream as-is. If so, the parameters should be set as follows:
	par.bits = 24;
	par.bps = 3;
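To make the difference between the two layouts concrete, here is a sketch of how a program that does process the samples might expand packed ``s24le3'' data into 4-byte integers. The function names are hypothetical; how the 24 significant bits must be aligned within the word expected by the device is outside the scope of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* decode one packed 3-byte little-endian signed sample */
int32_t
s24le3_sample(unsigned char b0, unsigned char b1, unsigned char b2)
{
	uint32_t u;

	u = (uint32_t)b0 | (uint32_t)b1 << 8 | (uint32_t)b2 << 16;
	/* sign-extend from bit 23 without relying on signed shifts */
	return (int32_t)(u ^ 0x800000) - 0x800000;
}

/* expand n packed samples into sign-extended 32-bit integers */
void
s24le3_to_s32(const unsigned char *in, int32_t *out, size_t n)
{
	size_t i;

	assert(in != NULL && out != NULL);
	for (i = 0; i < n; i++)
		out[i] = s24le3_sample(in[3 * i], in[3 * i + 1], in[3 * i + 2]);
}
```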
A (wrong) program may use the following approach. Consider the following function to wait for the play buffer to become ready:
	void
	wait_ready(void)
	{
		/*
		 * wait buffer to be consumed, sleep not to hog the CPU
		 */
		while (bufused > threshold)
			usleep(5);
	}
where the ``bufused'' variable is updated asynchronously by the callback set with sio_onmove(3). Then suppose it's called as follows:
	for (;;) {
		prepare_data(some_data);
		wait_ready();
		sio_write(hdl, some_data, count);
	}
This will deadlock. The callback is invoked from sio_write(3), but sio_write(3) is not called until ``bufused'' is updated by the callback. The correct implementation uses poll(2) as follows; it's also more efficient:
	void
	wait_ready(void)
	{
		int nfds, revents;
		struct pollfd pfds[1];

		do {
			nfds = sio_pollfd(hdl, pfds, POLLOUT);
			if (nfds > 0) {
				if (poll(pfds, nfds, -1) < 0)
					err(1, "poll failed");
			}
			revents = sio_revents(hdl, pfds);
		} while (!(revents & POLLOUT));
	}
Channel numbers start from zero and are ordered as follows:
| channel number | physical meaning |
|---|---|
| 0 | main left |
| 1 | main right |
| 2 | rear left |
| 3 | rear right |
| 4 | center |
| 5 | lfe |
Above, 0 is the origin, but that's arbitrary. The important point is that ``main left'' comes just before ``main right''. This allows, for instance, the rear speakers to be viewed as a stereo substream.
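As an illustration of the ``stereo substream'' view, the sketch below copies one left/right pair out of a multi-channel buffer. It assumes interleaved s16 samples, with channels stored in the order of the table above; the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* index of the sample of channel chan in the given frame */
size_t
sample_index(size_t frame, unsigned int nch, unsigned int chan)
{
	assert(chan < nch);
	return frame * nch + chan;
}

/*
 * Copy the stereo pair starting at channel first (e.g. 2 for the rear
 * speakers) out of nframes interleaved frames of nch channels each.
 */
void
extract_pair(const short *in, short *out, size_t nframes,
    unsigned int nch, unsigned int first)
{
	size_t i;

	assert(first + 1 < nch);
	for (i = 0; i < nframes; i++) {
		out[2 * i] = in[sample_index(i, nch, first)];		/* left */
		out[2 * i + 1] = in[sample_index(i, nch, first + 1)];	/* right */
	}
}
```

Because the left channel of each pair comes just before the right one, the same function works for the main pair (first = 0) and the rear pair (first = 2).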
Copyright (c) 2008-2012 Alexandre Ratchov
Last updated apr 23, 2012