This chapter presents a modular networked audio system as a reference for the rest of this white paper. A collection of 12 modules is introduced as the building blocks of such a system. The system described processes audio in acoustic, analogue and networked formats.

Audio System

A collection of components connected together to process audio signals in order to increase a system’s sound quality.

The following pages will elaborate further on audio processes, formats and components.

2.1 Audio processes

A system’s audio processing can include:

table 201: audio processing types

format conversion of audio signals
transport of signals, eg. through cables
storage for editing, transport and playback using audio media, eg. tape, hard disk, CD
mixing multiple inputs to multiple outputs
equalising, compression, amplification etc.

An audio system can be mechanical, eg. two empty cans with a tensioned string in between, or a mechanical gramophone player. But since the invention of the carbon microphone in 1877-1878, independently by Edison and Berliner, most audio systems have used electrical circuits. Since the early 1980s many parts of audio systems have gradually become digital, leaving only head amps, A/D and D/A conversion and power amplification as electronic circuits, and microphones and loudspeakers as electroacoustic components. At this moment, digital point-to-point audio protocols such as AES10 (MADI) are being replaced by network protocols such as Dante and EtherSound.

In this white paper, the terms ‘networked audio system’ and ‘digital audio system’ are applied loosely, as many of the concepts presented concern both. When an issue is presented as applying to networked audio systems, it does not apply to non-networked digital audio systems. When an issue is presented as applying to digital audio systems, it also applies to networked audio systems.

2.2 Audio formats

With the introduction of electronic instruments, the input of an audio system can also be an electrical analogue or digitally synthesised signal. In this white paper, however, we assume all inputs and outputs of an audio system to be acoustic signals. In the field of professional audio, the following terms are used to identify the different audio formats:

table 202: audio formats

acoustic: audio signals as pressure waves in air
analogue: audio signals as a non-discrete electrical voltage
digital: audio signals as data (eg. 16 or 24 bit - 44.1, 48, 88.2 or 96kHz)
networked: audio data as streaming or switching packets (eg. Ethernet)
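As a rough illustration of the ‘audio signals as data’ entry, the raw data rate of an uncompressed digital audio signal follows from data width, sampling rate and channel count. The sketch below assumes plain linear PCM without any packet or protocol overhead; the function name is chosen for illustration only and does not come from any audio protocol.

```python
def pcm_bit_rate(bit_depth: int, sample_rate_hz: int, channels: int = 1) -> int:
    """Raw linear-PCM data rate in bits per second (assumes no protocol overhead)."""
    return bit_depth * sample_rate_hz * channels


if __name__ == "__main__":
    # Bit depth / sampling rate combinations taken from the examples in table 202.
    for bits, rate in [(16, 44_100), (24, 48_000), (24, 96_000)]:
        mbps = pcm_bit_rate(bits, rate) / 1e6
        print(f"{bits} bit @ {rate} Hz: {mbps:.3f} Mbit/s per channel")
```

A networked format carries the same data in packets, so the actual load on a network link will be somewhat higher than this raw figure, depending on the protocol used.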

A networked audio system includes these audio formats simultaneously - using specialised components to convert from one to another:

table 203: audio format conversion components

source format -> component -> destination format
acoustic -> microphone -> analogue
analogue -> A/D converter -> digital/networked
digital/networked -> D/A converter -> analogue
analogue -> loudspeaker -> acoustic

2.3 Audio system components

In this white paper we assume an audio system to be modular, using digital signal processing and networked audio and control distribution. An audio system’s inputs and outputs are assumed to be acoustic audio signals - with the inputs coming from one or multiple acoustic sound sources, and the output or outputs being picked up by one or more listeners. A selection of functional modules constitutes the audio system in between sound sources and listeners.

A typical networked audio system is presented in the diagram below. Note that this diagram presents the audio functions as separate functional blocks. Physical audio components can include more than one functional block - eg. a digital mixing console including head amps, A/D and D/A converters, DSP and a user interface. The distribution network in this diagram can be any topology - including ring, star or any combination.

Acoustic source

An acoustic sound source generates vibrations and radiates them into the air. Sound sources can be omni-directional, radiating in all directions, or directional, concentrating energy in one or more directions. Musical instruments use strings (eg. guitar, piano, violin), surfaces (eg. drums, marimba) or wind (eg. flute, trombone) to generate sound. In nature, sound is often generated by wind shearing past objects (eg. trees, buildings). The output of an audio system is also an acoustic sound source. Finally, almost all human activities (including singing) and man-made machinery (including car engines and bomb detonations) generate sound. The lowest sound pressure level in dB generated by acoustic sound sources approaches minus infinity, eg. resting bodies at absolute zero temperature. The maximum undistorted sound pressure level is said to be above 160dBSPL, before vacuum pockets start to form in the air. The lowest frequency an acoustic sound source can generate approaches zero Hertz (‘subsonic’), while the maximum frequency of a pressure wave in air without distortion is said to be above 1GHz.
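The statement that the lowest sound pressure level approaches minus infinity follows from the logarithmic definition of dBSPL. The sketch below assumes the commonly used reference pressure of 20 micropascal for 0dBSPL in air, a value not stated in this chapter; the function names are illustrative.

```python
import math

P_REF_PA = 20e-6  # assumed reference pressure for 0 dBSPL in air, in pascal


def spl_from_pressure(p_rms_pa: float) -> float:
    """Sound pressure level in dBSPL for an RMS pressure in pascal."""
    if p_rms_pa <= 0.0:
        # No pressure variation at all: the logarithm runs to minus infinity.
        return float("-inf")
    return 20.0 * math.log10(p_rms_pa / P_REF_PA)


def pressure_from_spl(spl_db: float) -> float:
    """RMS pressure in pascal for a sound pressure level in dBSPL."""
    return P_REF_PA * 10.0 ** (spl_db / 20.0)


if __name__ == "__main__":
    for level in (0, 60, 120, 160):
        print(f"{level} dBSPL corresponds to {pressure_from_spl(level):.6g} Pa RMS")
```

Under this assumption, every 20dB step multiplies the pressure by ten, so 60dBSPL corresponds to about 0.02 pascal and 160dBSPL to about 2000 pascal.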

Human auditory system

The human auditory system is the combination of two ears and one brain, creating a hearing sensation invoked by audio signals generated by acoustic sources. The inner ear codes a level range of approximately 120dB and a frequency range of approximately 20kHz into neural firing patterns, and sends them to a specialised part of the brain called the ‘auditory nervous system’. The brain interprets the coded signals and invokes a hearing sensation. The hearing sensation is most significantly influenced by changes in level and frequency over time, with the shortest detectable time interval being as low as 6 microseconds. Basic parameters of hearing sensations are loudness, pitch, timbre and localisation.


microphone

Microphones convert acoustic signals into electric signals - the analogue domain. Dynamic microphones use a coil and a magnet to generate the electrical signal, while condenser microphones reach a higher accuracy using a variable capacitor construction that is much lighter than a coil. Further varieties are piezo microphones and electromagnetic elements that pick up guitar strings directly.

head amp

The professional audio market adopted a nominal analogue signal level of 0.775Vrms as the 0dBu reference for line level audio signals, optimally supporting electronic circuit designs with the 9V to 15V balanced power supplies used in many audio products. Microphones generally output a much lower signal level, typically around 0.3mV (-68dBu) for the average sound level of conversational speech at 1 meter from the microphone (60dBSPL). These signals are therefore amplified to nominal level by a microphone preamplifier, or ‘head amp’ (abbreviated HA), before entering further electronic circuits. Head amps most commonly have an amplification range of around 70dB, and are designed to have a very low noise floor. The most common Equivalent Input Noise (EIN) of a head amp is -128dBu (0.3μVrms), with a maximum input level before clipping of up to +30dBu (24V). But as the balancing and buffering circuits of the HA block also add noise, and analogue level switching changes the signal levels in the gain control circuit, the maximum dynamic range a typical HA delivers to the A/D block is around 112dB. Of course, whenever the HA gain is increased to match a microphone’s signal level, the HA noise floor will increase as well, lowering the dynamic range. More details on head amp quality issues are presented in chapter 7.
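The level figures in this section can be checked with a short calculation. The sketch below uses only the 0.775Vrms reference for 0dBu given above; the function names and the choice of 0dBu as the target nominal level are assumptions made for illustration.

```python
import math

DBU_REF_VRMS = 0.775  # 0 dBu reference level as stated in the text


def volts_to_dbu(v_rms: float) -> float:
    """Convert an RMS voltage to a level in dBu."""
    return 20.0 * math.log10(v_rms / DBU_REF_VRMS)


def dbu_to_volts(level_dbu: float) -> float:
    """Convert a level in dBu to an RMS voltage."""
    return DBU_REF_VRMS * 10.0 ** (level_dbu / 20.0)


def gain_to_nominal(mic_level_dbu: float, nominal_dbu: float = 0.0) -> float:
    """Head amp gain in dB needed to bring a microphone level up to nominal level."""
    return nominal_dbu - mic_level_dbu


if __name__ == "__main__":
    mic_dbu = volts_to_dbu(0.3e-3)  # conversational speech example from the text
    print(f"0.3 mV microphone signal: {mic_dbu:.1f} dBu")
    print(f"gain needed to reach 0 dBu: {gain_to_nominal(mic_dbu):.1f} dB")
    print(f"EIN of -128 dBu: {dbu_to_volts(-128) * 1e6:.2f} uVrms")
    print(f"clip level of +30 dBu: {dbu_to_volts(30):.1f} Vrms")
```

Under these assumptions, the 0.3mV speech example works out to roughly -68dBu, so close to 68dB of gain is needed to reach 0dBu, which is consistent with the amplification range of around 70dB mentioned above.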

A/D converter

An A/D converter converts electrical (analogue) signals to digital data for further processing in digital audio systems. This process is called ‘sampling’, with most modern A/D converters using a 24-bit data width to represent audio signals. This allows a theoretical dynamic range of approximately 144dB to be registered accurately, with the inaccuracies in the A/D process accumulating in a digital noise floor at -144dB. Most modern digital audio equipment uses 48kHz or 96kHz sampling rates, supporting 20kHz or 40kHz frequency ranges. More details on sampling are presented in chapter 5.
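The 144dB figure can be reproduced from the data width, and the supported frequency range is bounded by half the sampling rate. The sketch below assumes the simple convention of 20·log10(2^N) for an N-bit word (about 6dB per bit); the function names are illustrative.

```python
import math


def theoretical_dynamic_range_db(bits: int) -> float:
    """Ratio between full scale and one quantisation step, in dB (about 6.02 dB per bit)."""
    return 20.0 * math.log10(2 ** bits)


def nyquist_frequency_hz(sample_rate_hz: float) -> float:
    """Upper bound of the frequency range that a sampling rate can represent."""
    return sample_rate_hz / 2.0


if __name__ == "__main__":
    for bits in (16, 24):
        print(f"{bits} bit: {theoretical_dynamic_range_db(bits):.1f} dB")
    for rate in (48_000, 96_000):
        print(f"{rate} Hz sampling: Nyquist limit {nyquist_frequency_hz(rate):.0f} Hz")
```

The Nyquist limits of 24kHz and 48kHz are somewhat higher than the 20kHz and 40kHz ranges quoted above; the difference is typically taken up by the filters around the converter.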

distribution network

A distribution network is a collection of components used to transfer data to and from all physical locations in the audio system. The distributed data of course includes audio, but it can also include data to control audio components, and other media data such as video and lighting control. A distribution network can consist of multiple point-to-point connections, separately for audio, control and other data. Such a network needs hardware routers or patch panels at every location to patch sources and destinations. This is not only expensive, but it also limits design freedom, as functional connections are restricted by physical connections: for every change in a system’s functional design, the physical design has to change with it. Also, distribution systems based on point-to-point connections have very limited redundancy options.

This is why networked systems have become a standard for audio distribution, allowing the functional and physical designs to be fully independent and also fully redundant. The audio protocol can be based on Ethernet, or it can include an embedded Ethernet tunnel. As most control systems use Ethernet, and protocol converters are available for other protocols (eg. USB, MIDI, RS232), the use of Ethernet allows virtually any digital data format to be transported over the same distribution network. If the audio system is Ethernet based, using Dante, EtherSound and/or CobraNet, the distribution network will typically be a collection of Ethernet switches and cables. More details on operational (non-audio) quality issues in networks are presented in chapter 9.

change & mixing (DSP)

Digital Signal Processors are used to perform real-time change and mixing of audio signals. Some LSI manufacturers, including Yamaha, Analog Devices, Motorola and Texas Instruments, offer dedicated DSP hardware architectures. Combined with general purpose Field Programmable Gate Array (FPGA) chips, the processing power of digital systems has evolved to a level far beyond the capabilities of previously used analogue systems. High data widths, eg. 32 bit or higher, ensure that error residuals of DSP calculations stay well under the noise floors of the head amp and A/D converter, leaving algorithm design and user interfacing as the main quality parameters for DSP functionality.

In the past, dedicated DSP was normally built into mixing consoles, effect units or speaker processors. But since networks started to support high channel counts, DSP units - including ‘plug-in servers’, ‘mixing engines’, effect units, speaker processing and user-programmable DSP units - can be located anywhere in the system in any quantity. More details on DSP quality issues are presented in chapter 6.
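As an illustration of the mixing function, mixing multiple inputs to multiple outputs can be described as a gain matrix applied to the input channels. The sketch below is a minimal, non-real-time model using plain Python lists; the function name and the matrix layout (one row per output, one column per input) are assumptions for illustration and do not describe any particular DSP product.

```python
from typing import List


def mix(inputs: List[List[float]], gains: List[List[float]]) -> List[List[float]]:
    """Mix input channels to output channels.

    inputs: one list of samples per input channel, all of equal length.
    gains:  gains[o][i] is the linear gain from input i to output o.
    """
    if not inputs:
        return [[] for _ in gains]
    n_samples = len(inputs[0])
    outputs = []
    for row in gains:
        out = [0.0] * n_samples
        # Sum every input channel into this output, scaled by its gain.
        for gain, channel in zip(row, inputs):
            for n, sample in enumerate(channel):
                out[n] += gain * sample
        outputs.append(out)
    return outputs


if __name__ == "__main__":
    two_inputs = [[1.0, 0.5], [0.0, -0.5]]
    two_outputs = mix(two_inputs, [[1.0, 1.0], [0.5, 0.0]])
    print(two_outputs)
```

A real DSP performs the same multiply-and-add operations per sample in real time, with the high data widths mentioned above keeping the rounding errors of these sums below the analogue noise floors.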

storage (recording, playback, editing)

A digital audio system can process audio in real time, but it can also store audio streams on media such as hard disks, memory cards, CD and DVD for later processing or playback. Through storage, an audio process can flow through multiple audio systems at different time slots - eg. a multitrack live recording being stored on a hard disk, then edited on a second system to an authoring DVD, then mixed down on a third system to CD, then transferred to a customer by post and then played back on a fourth system: the stereo system at the customer’s home. Multitrack recording, editing and authoring are most commonly done with Digital Audio Workstation (DAW) software running on Personal Computers, using Ethernet connectivity to connect to networked audio systems.

D/A converter

D/A converters convert digital audio data to electrical (analogue) signals to be sent to power amplifiers, accepting the same data width and rate as the A/D converters and the distribution network of the audio system.

power amplifier

A power amplifier increases an audio signal voltage to a higher level at a low impedance to drive energy into loudspeakers. Modern power amplifiers use high frequency switching output stages to directly drive loudspeakers (class-D), sometimes combined with class-AB circuits (class TD, EEEngine(*2A)). Some power amplifiers have distribution interfaces, DSP (for speaker processing) and D/A converters built in.


loudspeaker

Loudspeakers convert electric signals into acoustic signals. High quality loudspeaker systems use multiple transducers to generate a combined acoustic output, each delivering a separate frequency range. Multiple time-aligned transducers - ‘line arrays’ - can be used to generate coupled acoustic coverage. High frequency transducers (tweeters, compression drivers, ribbon drivers) are available in sizes varying from 0.5” to 3”, mid frequency transducers from 5” to 15”, and low frequency transducers (‘woofers’, ‘subwoofers’) from 8” to 21”. Loudspeakers and individual transducers have an efficiency (sensitivity) and a maximum SPL output (peak SPL), standardized through the AES1984 norm. In a networked audio system, the loudspeakers are the most prominent sources of distortion, depending on the build quality of both the transducer and the enclosure. Fortunately, the kind of harmonic distortion generated in loudspeakers often contributes positively to sound quality.

User interface

To allow sound engineers to operate audio systems, manufacturers of components provide some form of user interface. Conventional (mostly analogue) audio components use hardware ‘tactile’ user interfaces such as knobs and faders as an integral part of the analogue electronic circuitry. The use of digital technology introduced remote and graphic interfaces such as mouse/trackpad, display and touch screens, while the introduction of networking technology allowed multiple user interfaces to coexist in one system, sharing physical connections through the network protocol, and also functionality through common control protocols. Examples are the many available online graphic user interfaces on personal computers and tablets for digital mixing consoles.