Audio Processing
by Edward Chow
In this web page, we introduce
-
the basic facts about sound,
-
how sound is digitized using PCM coding,
-
popular MP3 information, and
-
recording voice/music using Cool Edit 2000.
There is a homework exercise at the end of the web page.
Reference: Material here is adapted from the following related web pages and literature.
-
Davis Pan, "A Tutorial on MPEG/Audio Compression", IEEE Multimedia, pp. 60-74, 1995.
-
Audio Compression Overview, Simon Fraser University: a good technical overview of psychoacoustics, MPEG Audio, and other audio compression algorithms.
-
Davis Pan, tutorial paper on MPEG/Audio Compression.
-
Dolby AC-3: Multichannel Perceptual Coding, Dolby.
-
FAQ about MP3, MPEG Audio Layer 3, Fraunhofer Institute.
-
Martin Clifford, Modern Audio Technology, Prentice Hall, 1992.
-
William Stallings, Data and Computer Communications, 5th edition, Macmillan, 1999.
Elements of Sound
-
Sound is a longitudinal wave: the particles of the medium vibrate in the same direction in which the wave travels.
-
Sound can be converted by a transducer (e.g., a microphone) into another form of energy, such as an electrical signal, for storage.
-
Inverse Square Law: the intensity of sound radiating from a source is inversely proportional to the square of the distance from the source.
-
Acoustic masking: one sound can cover (mask) another, making it inaudible (especially in the mid to treble range).
-
Sound pressure is the variation above and below normal atmospheric pressure; the Sound Pressure Level (SPL) expresses it in decibels relative to a reference pressure (20 µPa in air).
-
Sound is produced by changes in atmospheric pressure.
-
Audio band: 20 Hz to 20 kHz
0 to 20 Hz: infrasonic
above 20 kHz: ultrasonic (supersonic)
Material here is adapted from “Modern Audio Technology” by Martin Clifford, 1992, Prentice Hall, and “Data and Computer Communications” by William Stallings, 4th edition, 1994, Macmillan.
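The inverse square law and the SPL definition above can be turned into a few lines of arithmetic. This is a minimal sketch assuming an ideal point source in free space (no reflections or absorption) and the standard 20 µPa reference pressure; the function names are illustrative only. It shows that doubling the distance quarters the intensity, a drop of about 6 dB.

```python
import math

P_REF = 20e-6  # reference pressure in pascals (20 µPa, roughly the threshold of hearing in air)

def intensity_ratio(d1: float, d2: float) -> float:
    """Inverse square law: I2 / I1 when moving from distance d1 to distance d2."""
    return (d1 / d2) ** 2

def level_change_db(d1: float, d2: float) -> float:
    """Change in sound level (dB) when moving from d1 to d2 (assumes an ideal point source)."""
    return 10 * math.log10(intensity_ratio(d1, d2))

def spl_db(pressure_pa: float) -> float:
    """Sound pressure level in dB SPL for an RMS pressure given in pascals."""
    return 20 * math.log10(pressure_pa / P_REF)

print(intensity_ratio(1.0, 2.0))            # 0.25: doubling the distance quarters the intensity
print(round(level_change_db(1.0, 2.0), 2))  # -6.02 dB
print(round(spl_db(1.0), 1))                # 94.0 dB SPL for 1 Pa
```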
Figure: Hearing-response characteristics
Figure: Basic characteristics of a sound wave
Harmonics and Tone Color
The same note played on different musical instruments produces a different tone color (timbre). The difference comes from the mix of harmonics that make up each tone.
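To illustrate how harmonics change tone color, the sketch below synthesizes the same note (same fundamental frequency f0) with two different sets of harmonic amplitudes. The amplitude lists named `flute_like` and `brass_like` are hypothetical values chosen for illustration, not measured instrument spectra. Both waveforms repeat with the same period (the same pitch), but their shapes differ.

```python
import math

def tone(f0, harmonic_amps, sample_rate=44100, n=441):
    """Sum of sine harmonics at k*f0 (k = 1, 2, ...), weighted by harmonic_amps."""
    return [
        sum(a * math.sin(2 * math.pi * (k + 1) * f0 * t / sample_rate)
            for k, a in enumerate(harmonic_amps))
        for t in range(n)
    ]

# Hypothetical harmonic weights for two "instruments" playing the same 441 Hz note.
flute_like = tone(441, [1.0, 0.1])
brass_like = tone(441, [1.0, 0.7, 0.5, 0.3])

# Same pitch: at 44100 samples/sec, a 441 Hz period is exactly 100 samples for both tones.
print(abs(flute_like[5] - flute_like[105]) < 1e-9, abs(brass_like[5] - brass_like[105]) < 1e-9)
# Different tone color: the sample values themselves differ.
print(max(abs(a - b) for a, b in zip(flute_like, brass_like)))
```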
Digital vs. Analog
In reality, a signal becomes distorted over distance because of
-
impedance (within the transmission medium)
-
interference (outside forces, e.g., clouds, lightning)
A digital signal can be regenerated along the way without accumulating the distortion that builds up in an analog signal.
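The advantage of regeneration can be seen in a small simulation. This is a sketch under simplifying assumptions: the signal is a sequence of 0/1 levels, each link ("hop") adds uniform random noise, and a digital repeater re-decides each level with a 0.5 threshold. The function `transmit` and its parameters are illustrative only. The analog path passes the noise along, so it accumulates; the regenerated digital path recovers the original bits exactly as long as the noise per hop stays below the decision threshold.

```python
import random

def transmit(bits, hops, noise, regenerate, seed=0):
    """Send 0/1 levels through `hops` noisy links.

    Each hop adds uniform noise in [-noise, noise]. With regenerate=True,
    a repeater restores each value to 0.0 or 1.0 (threshold 0.5) after
    every hop; otherwise the noisy analog value is passed on unchanged.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    levels = [float(b) for b in bits]
    for _ in range(hops):
        levels = [x + rng.uniform(-noise, noise) for x in levels]
        if regenerate:
            levels = [1.0 if x >= 0.5 else 0.0 for x in levels]
    return levels

bits = [1, 0, 1, 1, 0, 0, 1, 0]
analog = transmit(bits, hops=10, noise=0.1, regenerate=False)
digital = transmit(bits, hops=10, noise=0.1, regenerate=True)

print(max(abs(a - b) for a, b in zip(analog, bits)))  # accumulated analog error
print(digital == [float(b) for b in bits])            # True: the bits are recovered exactly
```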
Recording Analog Audio Signals Using Pulse Code Modulation (PCM)
The analog signal from a microphone or telephone handset is sampled at a high rate: 8,000 samples/sec for telephony and 44,100 samples/sec (44.1 kHz) for digital music recording. Each sample is then represented by 8 bits or 16 bits, respectively. The result is a digital bit stream.
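The two PCM steps, sampling and quantizing each sample to a fixed number of bits, can be sketched as follows. This is a minimal illustration assuming a signal function with values in [-1, 1] and plain linear (uniform) quantization to signed integers; real telephone PCM typically applies μ-law or A-law companding before quantizing, which this sketch omits. The function `pcm_encode` is a hypothetical name.

```python
import math

def pcm_encode(signal, sample_rate, bits, duration):
    """Sample signal(t) (values in [-1, 1]) and quantize each sample to a signed integer."""
    max_code = 2 ** (bits - 1) - 1           # 127 for 8 bits, 32767 for 16 bits
    n = round(sample_rate * duration)        # number of samples to take
    samples = []
    for i in range(n):
        x = signal(i / sample_rate)          # sample at time t = i / sample_rate
        x = max(-1.0, min(1.0, x))           # clip to the valid input range
        samples.append(round(x * max_code))  # quantize to the nearest integer level
    return samples

def tone_1khz(t):
    return math.sin(2 * math.pi * 1000 * t)

phone = pcm_encode(tone_1khz, 8000, 8, 0.01)  # telephone-style: 80 samples of 8 bits
cd = pcm_encode(tone_1khz, 44100, 16, 0.01)   # CD-style: 441 samples of 16 bits
print(len(phone), len(cd))
print(phone[:8])
```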
Fidelity can be improved by using a higher sampling rate (sampling more frequently, which captures higher frequencies) and more quantization levels (more bits for each sample, which reduces the quantization error).
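The effect of more bits per sample can be measured directly. This sketch assumes a full-scale sine wave and the same linear quantizer used for signed PCM samples; it computes the signal-to-quantization-noise ratio (SQNR). Each extra bit halves the quantization step, so the SQNR should rise by roughly 6 dB per bit. The test frequency 0.01234 cycles/sample is an arbitrary choice.

```python
import math

def sqnr_db(bits, n=10000):
    """Signal-to-quantization-noise ratio (dB) of a full-scale sine quantized to `bits` bits."""
    max_code = 2 ** (bits - 1) - 1
    sig = noise = 0.0
    for i in range(n):
        x = math.sin(2 * math.pi * i * 0.01234)  # arbitrary test frequency (cycles per sample)
        q = round(x * max_code) / max_code       # quantize, then map back to [-1, 1]
        sig += x * x
        noise += (x - q) ** 2
    return 10 * math.log10(sig / noise)

for b in (8, 9, 16):
    print(b, "bits:", round(sqnr_db(b), 1), "dB")
```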
Example:
With stereo, 16 bits/sample, a 44 kHz sampling rate, and PCM encoding, how many bytes of data will a three-minute sound recording generate?
Ans: 3 min * 60 sec/min * 44,000 samples/sec * 16 bits/sample * 2 channels = 253,440,000 bits; divided by 8 bits/byte, this gives 31,680,000 bytes, about 31.68 MB.
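The same calculation as a small reusable function: a sketch that assumes raw, uncompressed PCM and ignores any file-header overhead (a real .wav file adds a small header). The function name `pcm_bytes` is illustrative.

```python
def pcm_bytes(seconds, sample_rate, bits_per_sample, channels):
    """Size in bytes of raw, uncompressed PCM audio (no file header)."""
    total_bits = seconds * sample_rate * bits_per_sample * channels
    return total_bits // 8  # 8 bits per byte

size = pcm_bytes(3 * 60, 44_000, 16, 2)
print(size)              # 31680000 bytes
print(size / 1_000_000)  # 31.68 (MB, using 1 MB = 1,000,000 bytes)
```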
Masking
-
Masking is a phenomenon of the human hearing system. Normal human ears are sensitive to a wide range of frequencies; however, when a lot of signal energy is present at one frequency, the ear cannot hear lower-energy signals at nearby frequencies.
-
We say that the louder frequency
masks the softer frequencies. The louder frequency is called the masker.
Sub-Band Coding (SBC)
-
The basic idea of SBC is to
save signal bandwidth by throwing away information about frequencies which
are masked.
-
The result won't be the same
as the original signal, but if the computation is done right, human ears
can't hear the difference.
-
Divide the signal into frequency bands, perform the masking computation, and throw away the weak (masked) signals in each band.
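The SBC idea can be sketched with the simplest possible filter bank: a two-band Haar split into a low band (pairwise averages) and a high band (pairwise differences), each at half the sample rate. Two simplifications are assumed here and should be kept in mind: a plain energy threshold stands in for the psychoacoustic masking computation, and MPEG itself uses a 32-band filter bank rather than this two-band one. The functions `analyze`, `synthesize`, and `subband_code` are illustrative names.

```python
def analyze(x):
    """Split an even-length signal into low and high half-rate subbands (Haar filter bank)."""
    low = [(x[i] + x[i + 1]) / 2 for i in range(0, len(x), 2)]
    high = [(x[i] - x[i + 1]) / 2 for i in range(0, len(x), 2)]
    return low, high

def synthesize(low, high):
    """Inverse of analyze(): rebuild the full-rate signal from the two subbands."""
    out = []
    for l, h in zip(low, high):
        out += [l + h, l - h]
    return out

def energy(band):
    return sum(v * v for v in band)

def subband_code(x, threshold):
    """Zero out any subband whose energy falls below `threshold` (stand-in for masking)."""
    bands = analyze(x)
    kept = [b if energy(b) >= threshold else [0.0] * len(b) for b in bands]
    return synthesize(*kept)

x = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7]  # smooth ramp: energy is mostly in the low band
y = subband_code(x, threshold=0.1)             # the weak high band is thrown away
print([round(v, 3) for v in y])
```

With both bands kept, `synthesize(*analyze(x))` reproduces `x`; dropping the weak band saves half the samples at the cost of a small, bounded error.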
MP3: MPEG Audio Layer 3
-
MP3 is a perceptual audio coding scheme: it exploits the masking property of the human ear while trying to preserve the original sound quality as far as possible.
-
MPEG Audio specifies a family of three audio coding schemes, simply called Layer-1, Layer-2, and Layer-3. From Layer-1 to Layer-3, both encoder complexity and performance (sound quality per bitrate) increase.
-
In the MPEG-1 Audio standard:
-
all three layers may use a 32, 44.1, or 48 kHz sampling frequency.
-
all layers support similar bitrate ranges:
Layer-1: from 32 kbps to 448 kbps
Layer-2: from 32 kbps to 384 kbps
Layer-3: from 32 kbps to 320 kbps
-
MPEG-1 targets a maximum of about 1.5 Mbit/s for audio and video combined: about 1.2 Mbit/s for video and 0.3 Mbit/s for audio.
-
The MPEG-2 Audio standard adds the following extensions:
-
low sample rate extension: adds the lower sampling rates 16, 22.05, and 24 kHz (the unofficial "MPEG 2.5" extension goes down to 8 kHz).
-
multichannel extension: addresses surround sound applications, with up to 5 main audio channels (left, center, right, left surround, right surround) and optionally 1 extra "low frequency enhancement (LFE)" channel for subwoofer signals.
-
multilingual extension: allows the inclusion of up to 7 more audio channels.
-
Compression ratio: MP3 achieves a compression ratio of about 1:10 to 1:12, at about 64 kbit/s per audio channel, while maintaining the original CD sound quality.
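The 1:10 to 1:12 figure can be checked with simple arithmetic. This sketch assumes CD-quality stereo PCM (44.1 kHz, 16 bits, 2 channels) as the uncompressed reference and 64 kbit/s per channel for the MP3 stream; `compression_ratio` is an illustrative helper, not part of any MP3 tool.

```python
def compression_ratio(sample_rate, bits, channels, coded_kbps):
    """Ratio of the uncompressed PCM bitrate to the coded bitrate."""
    pcm_kbps = sample_rate * bits * channels / 1000
    return pcm_kbps / coded_kbps

cd_kbps = 44100 * 16 * 2 / 1000                  # CD audio: 1411.2 kbit/s
ratio = compression_ratio(44100, 16, 2, 2 * 64)  # stereo at 64 kbit/s per channel
print(cd_kbps)
print(f"about 1:{ratio:.0f}")                    # about 1:11
```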
MPEG Audio Algorithm
Steps in algorithm:
-
Use convolution filters to divide the audio signal (e.g., 48 kHz sound) into 32 frequency subbands that roughly approximate the ear's critical bands (sub-band filtering).
-
Determine the amount of masking in each band caused by nearby bands, using the masking behavior described above (this is called the psychoacoustic model).
-
If the power in a band is below
the masking threshold, don't encode it.
-
Otherwise, determine the number of bits needed to represent the coefficient so that the noise introduced by quantization stays below the masking threshold (recall that each bit of quantization resolution corresponds to about 6 dB of noise).
-
Format the bitstream.
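The per-band decisions in the steps above (skip masked bands, otherwise allocate enough bits to keep quantization noise under the mask) can be sketched as follows. This is a simplified illustration assuming the text's rule of thumb of about 6 dB per bit and that the band levels and masking thresholds are already known; real MPEG encoders distribute a fixed bit pool across bands iteratively, which this sketch does not do. `allocate_bits` is a hypothetical name.

```python
import math

DB_PER_BIT = 6.0  # rule of thumb from the text: each bit is worth about 6 dB

def allocate_bits(levels_db, masks_db):
    """Bits per band; 0 means the band is below its masking threshold and is not encoded."""
    bits = []
    for level, mask in zip(levels_db, masks_db):
        if level < mask:
            bits.append(0)                            # masked: don't encode it
        else:
            smr = level - mask                        # signal-to-mask ratio in dB
            bits.append(math.ceil(smr / DB_PER_BIT))  # enough bits to keep noise under the mask
    return bits

print(allocate_bits([10, 35], [12, 15]))  # [0, 4]
print(allocate_bits([35], [0]))           # [6]: with no masking, the full 35 dB must be covered
```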

Example:
-
After analysis, the levels of the first 16 of the 32 bands are:
----------------------------------------------------------------------
Band         1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
Level (dB)   0  8 12 10  6  2 10 60 35 20 15  2  3  5  3  1
----------------------------------------------------------------------
If the level of the 8th band is 60 dB, it gives a masking of 12 dB in the 7th band and 15 dB in the 9th.
The level in the 7th band is 10 dB (< 12 dB), so ignore it.
The level in the 9th band is 35 dB (> 15 dB), so send it.
--> Quantization noise in the 9th band may reach 15 dB without being heard, so the band can be encoded with up to 2 bits (= 12 dB) of quantization error.
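The bit saving in this example can be written out as a worked calculation. It assumes the page's rule of thumb of about 6 dB per bit, and that without masking the 9th band would need enough bits to cover its full 35 dB level; L and M denote a band's level and masking threshold.

```latex
\begin{aligned}
\text{Band 7: } & L_7 = 10\ \text{dB} < M_7 = 12\ \text{dB} \;\Rightarrow\; \text{not encoded} \\
\text{Band 9: } & L_9 = 35\ \text{dB} > M_9 = 15\ \text{dB} \;\Rightarrow\; \text{encoded} \\
& b_{\text{no mask}} = \lceil 35/6 \rceil = 6, \qquad
  b_{\text{mask}} = \lceil (35 - 15)/6 \rceil = \lceil 3.33 \rceil = 4 \\
& \text{saving} = 6 - 4 = 2\ \text{bits} \;(2 \times 6 = 12\ \text{dB} \le 15\ \text{dB})
\end{aligned}
```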
Figure: MPEG Layers
Audio Recording and Editing Tool: CoolEdit 2000
- CoolEdit 2000 is a shareware program that records, plays back, and edits sound files. (Get a 30-day trial version with limited features at http://www.syntrillium.com/cooledit/.)
- Select Start | Program | Cool Edit 2000 | Cool Edit 2000, and the main window appears.
- Select the File | New menu item; the "New Waveform" dialog window appears.
- Specify the sampling rate, resolution (bits per sample), and channels (mono/stereo).
For human speech, 11 kHz, 8-bit, mono settings are sufficient.
For singing, increase the sampling rate from 11 kHz to 22 kHz.
For music recording, use 44 kHz and 16 bits.
Click OK after choosing the sampling parameters.
- Press the red circle "Record" button in the lower left corner of the screen to start recording.
Press the square "Stop" button when you are done.
The recorded waveform is displayed in the main window.
- If the signal is weak, select the Transform | Amplify | Normalized menu item to raise the signal to a "normal" level.
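Normalizing means scaling the whole recording by a single gain so that its loudest sample reaches a target peak. The sketch below shows generic peak normalization on floating-point samples in [-1, 1]; it is an assumption about the general technique, not a description of CoolEdit's exact implementation, and `normalize` is an illustrative name.

```python
def normalize(samples, target_peak=1.0):
    """Scale samples so that the largest absolute value equals target_peak."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0:
        return list(samples)  # silence: nothing to scale
    gain = target_peak / peak
    return [s * gain for s in samples]

quiet = [0.0, 0.1, -0.25, 0.2]
print(normalize(quiet))  # [0.0, 0.4, -1.0, 0.8]
```

Because one gain is applied to every sample, the waveform's shape is unchanged; only its overall level rises.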
- If you did not get any recorded signal, make sure the microphone volume is reasonably high and the mute control is not checked.
- Edit the sound file by selecting an area of the waveform (point and drag) and pressing Ctrl-X or choosing the Edit | Cut menu item.
- Select the File | Save As menu item.
Choose a file name and audio file type, typically .au (Sun/NeXT), .ra (RealNetworks), .mp3, or .wav (Microsoft).
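Outside CoolEdit, an uncompressed .wav file can also be written programmatically. This sketch uses Python's standard-library `wave` module, which handles only uncompressed PCM WAV (not .ra or .mp3), and assumes 16-bit mono samples at 22,050 Hz; `write_wav` and the file name `tone.wav` are illustrative choices.

```python
import math
import struct
import wave

def write_wav(path, samples, sample_rate=22050):
    """Write float samples in [-1, 1] as a 16-bit mono PCM .wav file."""
    frames = b"".join(
        # clip to [-1, 1], scale to a signed 16-bit integer, pack little-endian
        struct.pack("<h", round(max(-1.0, min(1.0, s)) * 32767)) for s in samples
    )
    with wave.open(path, "wb") as w:
        w.setnchannels(1)            # mono
        w.setsampwidth(2)            # 2 bytes = 16 bits per sample
        w.setframerate(sample_rate)  # samples per second
        w.writeframes(frames)

# One second of a 440 Hz tone at half amplitude.
tone = [0.5 * math.sin(2 * math.pi * 440 * i / 22050) for i in range(22050)]
write_wav("tone.wav", tone)
```

The resulting file holds 22,050 samples * 2 bytes = 44,100 bytes of audio data plus a small header, which matches the PCM size arithmetic earlier on this page.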
Here are three sample audio files generated by CoolEdit:
- Vincent singing, 179KB, au format
- Vincent singing, 33KB, RealAudio format
- Vincent singing, 182KB, wav format
Homework Exercises:
- With mono, 8-bit, 22 kHz PCM encoding, how many bytes of data will a three-minute sound recording generate?
- Create an audio file and link it to a web page:
- Use CoolEdit to record a voice clip shorter than 5 seconds with mono, 16-bit, 22 kHz encoding.
- Edit out the unnecessary silent portions of the sound track and apply the Normalized effect.
- Save the recording in both .wav and .ra formats, using your_login.wav and your_login.ra as the file names.
- FTP your audio files to your public_html/sounds directory.
- Create hyperlinks from your class personal web page to the audio files using the <a> tag, for example:
<a href="sounds/your_login.ra">Vincent singing, 33KB, realaudio format</a>
- Indicate the file size of each file for comparison purposes.