XT-Audio
|
XT-Audio is a platform-independent audio I/O library aiming to achieve low-latency streaming audio with a simple, unified API. It is designed to map directly to underlying audio backends as much as possible, without introducing significant overhead in stream processing. Backends may be compiled-in selectively if you build from source (use -DXT_ENABLE_XYZ or the build scripts). The provided binaries are compiled with all supported backends for the platform (Win32/Linux).
Supported backends are:
ASIO, DirectSound, WASAPI (Windows)
ALSA, PulseAudio, JACK (Linux)
Supported platforms are:
x86/x64 linux (native)
x86/x64 windows (native)
x64 JVM (windows/linux)
x64 .NET Core (linux)
x86/x64 .NET Core (windows)
x86/x64 .NET framework (windows/linux)
Features (availability depends on the backend):
Channel masking
Stream time-stamping
Buffer under/overrun detection
Support both interleaved and non-interleaved operation
Full-duplex operation (JACK, ASIO) or stream aggregation (all others)
The audio platform (XtPlatform) is the entry point to the XT-Audio library. Applications obtain a platform handle using XtAudioInit, and cleanup the library using XtPlatformDestroy. The audio platform translates audio setups (XtSetup, consumer, system and pro-audio) to platform-specific backend identifiers (XtSystem) identifying the supported backends (DirectSound, WASAPI and ASIO on windows, and PulseAudio, ALSA and JACK on linux). XT-Audio may be compiled with support for specific backends enabled. Therefore not all systems which are supported on a given platform may actually be available in the compiled binary. Use XtPlatformGetSystems to determine which systems are available. Use XtPlatformSetupToSystem to translate a platform-independent audio setup (consumer, system and pro-audio) to a platform-specific system identifier. From there, use XtPlatformGetService to start working with a specific audio service.
An audio service (XtService) is an implementation of the XtService API on a specific backend. Services are used to query and open available audio devices and to query service-specific capabilities. The device list (XtDeviceList) represents metadata about the available audio devices within a given service. The device list allows retrieving device names and identifiers for all audio devices in a service without actually opening the device. Querying other metadata such as device capabilities may have to open the device (depending on the backend). XtService also allows retrieving a default device-id, for backends which support the notion of a default device. In addition, the XtService API provides a stream-aggregation feature which allows the application to combine multiple devices into a single audio stream. This is primarily intended to emulate full-duplex on backends which do not natively support this, but may be used to combine any number of input and output devices into a single audio stream.
An audio device is an implementation of the XtDevice API on a specific backend. Depending on the backend type, devices may directly correspond to physical audio cards (ASIO, ALSA hw, WASAPI exclusive), system mixers on top of such cards (ALSA plughw, WASAPI shared, DirectSound), or sound services which allow user- defined routing from application input/output to physical or virtual devices (JACK and PulseAudio). The device API may be used to query format support and buffer sizes, and to create audio streams.
An audio stream is an implementation of the XtStream API on a specific backend. Each stream is tied to an application-defined callback function (XtOnBuffer) which is called whenever the stream is ready to deliver or receive more data. The XtStream API can be used to control (start/stop) audio streams and to query latencies and buffer sizes. In addition, applications may optionally supply a buffer under/overrun callback (XtOnXRun) and stream state changed callback (XtOnRunning).
Include <xt/XtAudio.h> for C, <xt/XtAudio.hpp> for C++ and link xt-audio.dll/libxt-audio.so. Reference Xt.Audio.dll for .NET, xt.audio-(version).jar for java.
An application must first initialize the library using XtAudioInit. From there, using the returned platform handle it can select an audio service, query for available devices, and open streams on those devices. The choice of backend (system) greatly affects the way audio is streamed in terms of latency, supported formats and whether the application takes system-wide exclusive control of a device. If you know exactly what your application needs, use XtPlatformSetupToSystem to select a backend with specific characteristics. Otherwise it's probably best to let the end-user pick the system (XtPlatformGetSystems) and device to use.
Streaming audio with XT-Audio involves at least these steps:
When using interleaved access, the data passed to the callback (XtOnBuffer) is a pointer to a single array of samples, and the audio buffer contains alternating samples for each channel, e.g. [LRLRLR] for a stereo stream. When using non-interleaved access, there is one audio buffer for each channel, and the data passed to the callback is a pointer to an array of pointers each pointing to the audio buffer for a single channel, e.g. [LLL][RRR] for a stereo stream.
The input/output buffers passed to XtOnBuffer should be cast to the appropriate type before usage. This means unsigned char* (UInt8, Int24), short* (Int16), int* (Int32) or float* (Float32) for interleaved buffers, or unsigned char** (UInt8, Int24), short** (Int16), int** (Int32) or float** (Float32) for non-interleaved buffers.
XT-Audio allows an application to open an audio stream on more than one device. This is primarily intended to emulate full-duplex operation on systems which don't natively support it, but may be used to aggregate any number of input and/or output streams. The resulting stream's channel count will be equal to the total number of channels requested for each of the underlying streams. Aggregate streams introduce an intermediate buffer to keep underlying streams in sync, and as such (and unlike anything else in XT-Audio) they actually add latency to a running stream. Aggregate streams are created using XtServiceAggregateStream instead of XtDeviceOpenStream. Other than that, the stream interface (including callback functions) is exactly the same as for regular audio streams. Stream aggregation is not supported for backends which natively support full-duplex.
Applications that wish to extend XT-Audio's feature set can do so by using backend-specific functionality through XtDeviceGetHandle and XtStreamGetHandle. This allows direct access to IASIO, IDirectSound, IMMDevice/IAudioClient, snd_pcm_t, pa_simple, jack_client_t etc depending on the backend.
The core API uses error codes, while C++, Java and C# API's use exceptions.
Errors returned from backend API's typically (but not exclusively) correspond to expected error conditions (such as device unavailable) and are handled by returning XtError in the C API and by throwing XtException in language bindings. Errors caused by faulty application usage (such as invalid argument) are handled (by default, see XtAudioAssertTerminates()) by std::terminate() in the C API and by throwing some language-default exception type in language bindings. Internal failures which can not be communicated back to the application always result in std::terminate(). All errors, whether fatal or not, pass through XtOnError.
In the C and C++ API's, all strings are encoded as UTF-8. In the C# and Java API's strings use the built-in string type. XT-Audio provides ToString()-like functionality for basic data types (all enums, source location and error info) using the XtPrint* interfaces for C or operator << (std::ostream&) for C++. For .NET and Java, use the built-in ToString() functions. In particular, services and devices do NOT provide ToString()-like functionality. Applications should use XtPrintSystem/XtDeviceListGetName instead.
In all language API's there are 4 resources that must be explicitly managed: device lists, devices, streams, and the library itself. For C, this means XtDeviceListDestroy/XtDeviceDestroy/XtStreamDestroy/XtPlatformDestroy. For C++ everything is handled using unique_ptrs. For Java each resource implements Closeable, for C# IDisposable. No other resources are returned to the application that must be cleaned up by the caller. Strings and collections are handled by the language's respective data types (eg std::string, System.String, java.lang.String and std::vector or built-in array types). For C, string and array-like API's either return statically-allocated data or accept caller-allocated buffers.
The XtOnBuffer callback receives a pointer to the XtBuffer structure which in turn contains pointers to the native (unmanaged) audio data. Java and .NET applications may operate directly on these native buffers (using System.IntPtr or com.sun.jna.Pointer, respectively), or may choose to have audio data represented to them using managed arrays (e.g. byte[], short[] etc). Managed ("safe") buffering copies the audio data into/out of the CLR/JVM, but does not introduce additional latency. Applications can use safe buffers using the XtSafeBuffer API. After opening a stream, use XtSafeBuffer.Register to set up a safe buffer for that stream. Then inside the XtOnBuffer callback, use XtSafeBuffer.Lock before working with the audio data (copies in the captured data) and XtSafeBuffer.Unlock after working with the audio data (copies out the rendered data). In between the calls to Lock()/Unlock(), use the GetInput()/GetOutput() methods to retrieve the managed input/output audio buffers. Before closing the audio stream, use XtSafeBuffer.Dispose()/close() to clean up.
The input/output buffers retrieved from XtSafeBuffer should be cast to the appropriate type before usage. This means byte[] (UInt8, Int24), short[] (Int16), int[] (Int32) or float[] (Float32) for interleaved buffers, or byte[][] (UInt8, Int24), short[][] (Int16), int[][] (Int32) or float[][] (Float32) for non-interleaved buffers.