docs/ambisonics.txt


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126

OpenAL Soft's renderer has advanced quite a bit since its start with panned
stereo output. Among these advancements is support for surround sound output,
using psychoacoustic modeling and more accurate plane wave reconstruction. The
concepts in use may not be immediately obvious to people just getting into 3D
audio, or people who only have more indirect experience through the use of 3D
audio APIs, so this document aims to introduce the ideas and purpose of
Ambisonics as used by OpenAL Soft.


What Is It?
===========

Originally developed in the 1970s by Michael Gerzon and a team others,
Ambisonics was created as a means of recording and playing back 3D sound.
Taking advantage of the way sound waves propogate, it is possible to record a
fully 3D soundfield using as few as 4 channels (or even just 3, if you don't
mind dropping down to 2 dimensions like many surround sound systems are). This
representation is called B-Format. It was designed to handle audio independent
of any specific speaker layout, so with a proper decoder the same recording can
be played back on a variety of speaker setups, from quadraphonic and hexagonal
to cubic and other periphonic (with height) layouts.

Although it was developed decades ago, various factors held ambisonics back
from really taking hold in the consumer market. However, given the solid
theories backing it, as well as the potential and practical benefits on offer,
it continued to be a topic of research over the years, with improvements being
made over the original design. One of the improvements made is the use of
Spherical Harmonics to increase the number of channels for greater spatial
definition. Where the original 4-channel design is termed as "First-Order
Ambisonics", or FOA, the increased channel count through the use of Spherical
Harmonics is termed as "Higher-Order Ambisonics", or HOA. The details of higher
order ambisonics are out of the scope of this document, but know that the added
channels are still independent of any speaker layout, and aim to further
improve the spatial detail for playback.

Today, the processing power available on even low-end computers means real-time
Ambisonics processing is possible. Not only can decoders be implemented in
software, but so can encoders, synthesizing a soundfield using multiple panned
sources, thus taking advantage of what ambisonics offers in a virtual audio
environment.


How Does It Help?
=================

Positional sound has come a long way from pan-pot stereo (aka pair-wise).
Although useful at the time, the issues became readily apparent when trying to
extend it for surround sound. Pan-pot doesn't work as well for depth (front-
back) or vertical panning, it has a rather small "sweet spot" (the area the
head needs to be in to perceive the sound in its intended direction), and it
misses key distance-related details of sound waves.

Ambisonics takes a different approach. It uses all available speakers to help
localize a sound, and it also takes into account how the brain localizes low
frequency sounds compared to high frequency ones -- a so-called psychoacoustic
model. It may seem counter-intuitive (if a sound is coming from the front-left,
surely just play it on the front-left speaker?), but to properly model a sound
coming from where a speaker doesn't exist, more needs to be done to construct a
proper sound wave that's perceived to come from the intended direction. Doing
this creates a larger sweet spot, allowing the perceived sound direction to
remain correct over a larger area around the center of the speakers.

In addition, Ambisonics can encode the near-field effect of sounds, effectively
capturing the sound distance. The near-field effect is a subtle low-frequency
boost as a result of wave-front curvature, and properly compensating for this
occuring with the output speakers (as well as emulating it with a synthesized
soundfield) can create an improved sense of distance for sounds that move near
or far.


How Is It Used?
===============

As a 3D audio API, OpenAL is tasked with playing 3D sound as best it can with
the speaker setup the user has. Since the OpenAL API does not explicitly handle
the output channel configuration, it has a lot of leeway in how to deal with
the audio before it's played back for the user to hear. Consequently, OpenAL
Soft (or any other OpenAL implementation that wishes to) can render using
Ambisonics and decode the ambisonic mix for a high level of accuracy over what
simple pan-pot could provide.

This is effectively what the high-quality mode option does, when given an
appropriate decoder configuation for the playback channel layout. 3D rendering
and effect mixing is done to an ambisonic buffer, which is later decoded for
output utilizing the benefits available to ambisonic processing.

The basic, non-high-quality, renderer uses similar principles, however it skips
the frequency-dependent processing (so low frequency sounds are treated the
same as high frequency sounds) and does some creative manipulation of the
involved math to skip the intermediate ambisonic buffer, rendering more
directly to the output while still taking advantage of all the available
speakers to reconstruct the sound wave. This method trades away some playback
quality for less memory and processor usage.

In addition to providing good support for surround sound playback, Ambisonics
also has benefits with stereo output. 2-channel UHJ is a stereo-compatible
format that encodes some surround sound information using a wide-band 90-degree
phase shift filter. It works by taking a B-Format signal, and deriving a
frontal stereo mix with the rear sounds attenuated and filtered in with it.
Although the result is not as good as 3-channel (2D) B-Format, it has the
distinct advantage of only using 2 channels and being compatible with stereo
output. This means it will sound just fine when played as-is through a normal
stereo device, or it may optionally be fed to a properly configured surround
sound receiver which can extract the encoded information and restore some of
the original surround sound signal.


What Are Its Limitations?
=========================

As good as Ambisonics is, it's not a magic bullet that can overcome all
problems. One of the bigger issues it has is dealing with irregular speaker
setups, such as 5.1 surround sound. The problem mainly lies in the imbalanced
speaker positioning -- there are three speakers within the front 60-degree area
(meaning only 30-degree gaps in between each of the three speakers), while only
two speakers cover the back 140-degree area, leaving 80-degree gaps on the
sides. It should be noted that this problem is inherent to the speaker layout
itself; there isn't much that can be done to get an optimal surround sound
response, with ambisonics or not. It will do the best it can, but there are
trade-offs between detail and accuracy.

Another issue lies with HRTF. While it's certainly possible to play an
ambisonic mix using HRTF and retain a sense of 3D sound, doing so with a high
degree of spatial detail requires a fair amount of resources, in both memory
and processing time. And even with it, mixing sounds with HRTF directly will
still be better for positional accuracy.