In 1992, I was working as a tutor at the Technical University of Berlin. The research group
I was in needed a speech compression algorithm to support its multimedia
They found what they were looking for in the
Global System for Mobile telecommunication (GSM), Europe's
currently most popular protocol suite for digital cellular
phones. (John Scourias'
overview of GSM
does a good job introducing the overall architecture;
another, more recent,
overview of the GSM system (with a list of
Web links) comes from Javier Gozálvez Sempere.)
GSM 06.10 lossy speech compression
- telephone quality speech
- 13 kbit/s
- free sourcecode
The low-level speech compression algorithm of the GSM suite is
06.10 RPE-LTP (Regular-Pulse Excitation Long-Term Prediction).
My colleague Dr. Carsten Bormann and I have implemented a GSM 06.10
RPE-LTP coder and decoder in C.
The code is freely available, and we
encourage you to use it,
play with it, and invent new real-time media protocols and algorithms.
Our implementation consists of a C library and a stand-alone program.
Both are designed to be compiled and used in a Unix-like environment
with at least 32-bit integers, but others have ported it to VMS and a
GSM 06.10 is faster than code-book lookup algorithms such as CELP,
but by no means cheap; to use it for real-time communication,
you will need at least a medium-scale workstation.
When using the library, you create a gsm object that holds the state
necessary to either encode frames of 160 16-bit PCM samples into 264-bit
GSM frames, or to decode GSM frames into linear PCM frames.
If you want to examine and change the individual parts of the GSM frame,
you can ``explode'' it into an array of 76 parameters, change them there,
and ``implode'' them back into a packed frame; you can also print a
whole GSM frame to a file in human-readable format with a single function call.
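The frame sizes above can be related to the 13 kbit/s figure with a little arithmetic. The following is a minimal sketch, not part of the library; the function names are made up for illustration, and the 8 kHz sampling rate is an assumption (the usual telephony rate) rather than something stated above. The nominal 13 kbit/s corresponds to 260 payload bits per frame; the packed frame is rounded up to 33 whole bytes, which is where the 264 bits come from.

```c
#include <assert.h>

/* Figures from the text: 160 samples of 16-bit PCM per frame and
 * 264 bits (33 bytes) per packed frame.  The 8 kHz sampling rate is
 * an assumption, not taken from the text. */
enum {
    SAMPLE_RATE_HZ    = 8000,
    SAMPLES_PER_FRAME = 160,
    PCM_BITS          = 16,
    PACKED_FRAME_BITS = 264
};

/* Duration of one frame in milliseconds. */
int frame_duration_ms(void)
{
    return SAMPLES_PER_FRAME * 1000 / SAMPLE_RATE_HZ;
}

/* Number of frames needed for one second of audio. */
int frames_per_second(void)
{
    return SAMPLE_RATE_HZ / SAMPLES_PER_FRAME;
}

/* Bit rate of a stream that carries bits_per_frame bits in every frame. */
long bit_rate(int bits_per_frame)
{
    assert(bits_per_frame > 0);
    return (long)bits_per_frame * frames_per_second();
}
```

Under these assumptions a frame lasts 20 ms, `bit_rate(260)` gives the nominal 13000 bit/s, and `bit_rate(SAMPLES_PER_FRAME * PCM_BITS)` gives the 128000 bit/s of the uncompressed input.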
Our library client, called toast, is modeled after
the Unix compress program.
Running toast myspeech will
compress the file myspeech, remove it, and collect the result of
the compression in a new file called myspeech.gsm, while untoast
myspeech will reverse the process. The big difference
between toast and compress is that toast loses information with each
compression cycle. (After a few iterations, you can
hear high-pitched chirps
that I initially mistook for birds outside of my office window.)
Patent Issues with GSM 06.10
Philips is claiming intellectual property on GSM 06.10.
They haven't contacted the authors of this library,
but at least two large
companies that wanted to integrate GSM 06.10 codecs into their
products have been approached; one decided to pull their codec,
another to pull just the encoder and leave the decoder.
(So, apparently, at least some lawyers think the intellectual
property applies only to one half of the process.)
I don't know which parts of the patent are new,
or whether it would hold up in court, but of course nobody
wants to go to court over an issue as small as this.
The VPIM IETF workgroup is considering using GSM 06.10.
The IETF can't standardize on technology that forces its
users to pay license fees.
If Philips doesn't release their intellectual property for use in VPIM,
we'll be wasting a lot of bandwidth with voice mail.
That's not the end of the world, but it would be nice to
at least ask first.
I don't know whom to ask.
If you do, please contact me.
Get ETSI publications free of charge
ETSI is the European standards body
that came up with GSM. For a limited time, ETSI
is making copies of its publications available over the
Internet for the price
of giving away an email address,
among them the GSM 06.10 and GSM 06.06 drafts.
Try it out at http://pda.etsi.org/pda/.
GSM 06.10: the current patchlevel is 18
Last changes: don't ship MacOS quarantine files! [thanks, Alexander!]
Bad A-law up to patchlevel 9
Shortly after releasing patchlevel 9 (which has a little more WAV#49
support in the library), a long email correspondence convinced
me that the tables used in toast's A-law conversion were entirely wrong.
Nobody seems to use A-law much, so it is conceivable that
the wrong tables might have gone undetected since 1992. The
new tables have been independently tested against the vectors
supplied with G.726, and match those the ITU, Bell Labs, and at
least one other implementor use.
If you're using the library to encode and decode sound in
your project, and the resulting audio is nowhere near telephony
quality but sort of warbled, the most likely cause is that
you're using the same gsm state to both encode and decode.
Don't do that; allocate two different states instead, one
for each direction.
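Why a shared state garbles the audio is easiest to see with a toy codec. The sketch below is NOT GSM 06.10 and does not use the library's API; it is a made-up delta coder whose entire state is the previous sample, and all names in it are hypothetical. Like the real codec, though, its encoder and decoder each remember something about what they have processed so far.

```c
#include <assert.h>
#include <stddef.h>

/* Toy delta codec, NOT GSM 06.10: the state is just the previous sample. */
struct toy_state { int prev; };

/* Encode one sample as its difference from the previous input sample. */
int toy_encode(struct toy_state *s, int sample)
{
    assert(s != NULL);
    int delta = sample - s->prev;
    s->prev = sample;
    return delta;
}

/* Decode one delta by adding it to the previously decoded sample. */
int toy_decode(struct toy_state *s, int delta)
{
    assert(s != NULL);
    int sample = s->prev + delta;
    s->prev = sample;
    return sample;
}

/* Encode and immediately decode n samples; return how many came back
 * wrong.  Passing the same pointer for both states reproduces the
 * shared-state mistake. */
size_t toy_loopback_errors(struct toy_state *enc, struct toy_state *dec,
                           const int *in, size_t n)
{
    size_t errors = 0;
    for (size_t i = 0; i < n; i++) {
        int out = toy_decode(dec, toy_encode(enc, in[i]));
        if (out != in[i])
            errors++;
    }
    return errors;
}
```

With two separate states the loopback is exact; with one state passed for both directions, the encoder's bookkeeping overwrites the decoder's and every sample comes back wrong.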
Warble, warble, warble
Leila: Are you using a scrambler?
J. Frank: I can't hear you, I'm using a scrambler!
- Repo Man
Porting to a DEC Alpha
People porting the GSM 06.10 library to DEC Alphas have noticed
that the test of the basic math routines fails. The test prints:
0xfffffffe (4294967294) != L_<< (2147483647, 1) -- expected 0xfffffffe (-2)
0x00000000 (-4294967296) != L_<< (-2147483648, 1) -- expected 0
This can be fixed by changing the definitions of the
32-bit types in inc/private.h from long
and unsigned long
to int and unsigned int. On the Alpha, a long
has 64 bits; an int (at least with the unadorned native compiler
I used) has 32 bits.
The math tests that fail exploit properties specific to 32-bit arithmetic.
If you don't care about the math test, you don't have to
change the types. In spite of the failing test, the library
does work fine even with a 64-bit long. (It's been
tested against byte-swapped ETSI test patterns.)
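The two numbers in the failing test output can be reproduced without the library. This is a small sketch with hypothetical function names, using the fixed-width types from <stdint.h> rather than the library's own typedefs; it only illustrates how the same left shift behaves in a 32-bit and in a 64-bit word.

```c
#include <assert.h>
#include <stdint.h>

/* Left shift carried out in exactly 32 bits.  Shifting as unsigned
 * avoids undefined behaviour; converting back to int32_t wraps on the
 * usual two's-complement platforms (implementation-defined before C23). */
int32_t shl_32bit(int32_t a, unsigned n)
{
    assert(n < 32);
    return (int32_t)((uint32_t)a << n);
}

/* The same shift carried out in a 64-bit word, as happens when long
 * is 64 bits wide: the bits that would fall off a 32-bit word survive. */
int64_t shl_64bit(int32_t a, unsigned n)
{
    assert(n < 32);
    return (int64_t)((uint64_t)(int64_t)a << n);
}
```

Shifting 2147483647 left by one gives -2 in 32 bits but 4294967294 in 64 bits; shifting -2147483648 gives 0 and -4294967296 respectively, which are the pairs of values the test complains about.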
There is a .wav chunk format #49 that encodes GSM 06.10 frames.
Newer Windows versions support it natively. Microsoft's codec is a completely parallel
implementation to ours, written from the same ETSI pseudocode, but ending up
with incompatible framing and a different bit order within the bytes.
After fretting over intellectual property rights for
a few months,
Microsoft has now registered the encoding inside the WAV chunk as a
MIME type, particularly for use in the context of VPIM (Voice Profile
for Internet Mail)'s spinoff IVM, a way of sending
Voice Messages as MIME documents.
The Microsoft IETF draft is available as
from IETF draft repositories.
Long before that, Jeff Chilton
figured out the format by trial and error when he needed to write
compressed wave files for his shortwave radio application.
The patchlevel 9 release of GSM integrates Jeff's ``unofficial''
patch 8 in slightly different form,
breaking his sample source code along the way.
The updated version
has its GSM_OPT_WAV_FMT changed to GSM_OPT_WAV49, and (thanks
to Dima Barsky) a more portable way of looking at fputs's
result. If you couldn't get it to work earlier on a SysV-ish
environment, try again.
GSM 06.10 Errata
The list of tested overflow points for sequence 1 (coder part),
table 5.2 of the GSM 06.10 draft, expects 49 overflows in the APCM
quantizer's call to abs() (section 4.2.15).
Rob Wubben of Philips
Research Labs, who implemented a GSM 06.10 codec and counted, found
57 - ditto when he checked the same count in our library, and in
a colleague's C simulation of the codec.
In our opinion the table
(Update: Pierre Larbier reports that the final ETSI
release of the GSM 06.10 test sequences, attached to
ETS 300 580-2 edition 2 (GSM 06.10 version 4.1.1),
has corrected its SEQ01 to produce
only the promised 49 overflows.)
Page move from TU Berlin to ... Quut?
A more extensive version of this document used to be at
http://kbs.cs.tu-berlin.de/~jutta/toast.html, hosted by the university
where I studied computer science many years ago.
I don't know whether my alma mater is suffering some sort of technical
failure, or whether they finally noticed that I've been gone for about
ten years - but my directories there don't seem to exist anymore.
I made a couple of backups over the years, but I'll take the opportunity
to trim down the page a little.
The place where it is now, quut.com, the Questionable Utility Company,
will likely be around as long as I am. Thanks to the Technische Universität
Berlin for its long and largely pain-free hosting services; we'll take it from here.
Dr. Carsten Bormann
Carsten Bormann has left the TU Berlin to accompany Prof. Ute Bormann to the computer science department
of the Universität Bremen.
Carsten's email address in Bremen is email@example.com.
The Schur recursion
The Linear Predictive Coding (LPC) part of the GSM algorithm
uses an integer version of the ``Schur recursion'' described
by Issai Schur in 1917. (The Levinson-Durbin
algorithm from 1959 is better known, but the Schur recursion can be
faster when parallelized.) Linear prediction
means that the algorithm tries to find parameters for a filter
that predicts each sample of the signal in the current frame as a weighted
sum (or ``linear combination'') of the samples that precede it.
(Wil Howitt offers a short tutorial about LPC and CELP)
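To give an idea of what the recursion looks like, here is a floating-point sketch that turns autocorrelation values into reflection coefficients. It is not the integer code from the library: the function name, the MAX_ORDER limit, and the use of doubles are choices made for this illustration, and the sign convention (coefficient = -P[1]/P[0]) is one of two in common use.

```c
#include <assert.h>

#define MAX_ORDER 16   /* arbitrary limit for this sketch */

/* Floating-point sketch of the Schur recursion: turn autocorrelation
 * values acf[0..order] into reflection coefficients refl[0..order-1]. */
void schur_reflection(const double *acf, int order, double *refl)
{
    double P[MAX_ORDER + 1], K[MAX_ORDER + 1];
    assert(order >= 1 && order <= MAX_ORDER);

    for (int i = 0; i <= order; i++)
        P[i] = K[i] = acf[i];

    for (int n = 1; n <= order; n++) {
        double mag = P[1] < 0.0 ? -P[1] : P[1];

        /* Silent or degenerate input: zero the remaining coefficients. */
        if (P[0] <= 0.0 || P[0] < mag) {
            for (int i = n; i <= order; i++)
                refl[i - 1] = 0.0;
            return;
        }

        double r = -P[1] / P[0];
        refl[n - 1] = r;
        if (n == order)
            return;

        /* Update both working sequences for the next stage. */
        P[0] += P[1] * r;
        for (int m = 1; m <= order - n; m++) {
            double p_next = P[m + 1];
            P[m] = p_next + K[m] * r;
            K[m] = K[m] + p_next * r;
        }
    }
}
```

For a signal whose autocorrelation halves with every lag (1, 0.5, 0.25, 0.125), only the first coefficient is non-zero: one previous sample already predicts as much as can be predicted.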
Kudos to Steven Pickles for
a free full-source Java 1.1 port of the GSM 06.10 Decoder side.
(I don't have a good source for it anymore, but you can find versions of it floating around -- look for GSMDecoder.java.)
Chris Edwards did a Java port of the GSM 06.10 Encoder.
An open-source applet that can play lots of different GSM variants (with or without
.wav header) is MumboJumbo,
from voxeo's Omi Chandiramani. It's being extended to play other sound formats,
too, and you can help.
Louis Selvon <firstname.lastname@example.org> has created a new version
of toast for DOS,
based on the Patchlevel 10 release.
As part of his EE thesis work, Louis also measured the
objective and subjective performance (quality, not speed) of
GSM 06.10 using MatLab (objective) and his family and neighbors (subjective).
Richard Elofsson <email@example.com> has made
his DOS port of the Patchlevel 4 release
available. (He fixed bugs that it took me until
Patchlevel 7 to find, though.)
The source code, which compiles with Turbo C++ version 1.01,
could once be found as dos.zip in the toplevel GSM ftp directory.
Back when FTP was still a thing.
Sergey A. Zhatchenko (firstname.lastname@example.org),
from Novosibirsk, Russia, has donated a toast.exe, derived from a patchlevel 6 release of the GSM
library. Make sure your input filenames have no suffix;
this version of toast doesn't know that MS-DOS doesn't like more
than one dot in its filenames.
GSM on the BeBox
Pierre-Emmanuel Chaut ported the GSM library to the
BeBox, a PowerPC-based
multiprocessing platform that excels
at concurrent multimedia applications.
It takes, he writes, "4 seconds to compress 20 seconds
of sound". Way to go, Be.
GSM DLL for OS/2
Terry Fry created, and was at least for a while distributing and maintaining,
an OS/2 DLL version of the GSM 06.10 library.
Paul C.H. Ho and Pink Elephant Technologies have used the Patchlevel 6
release to write a drag-and-drop
that converts between .au.gsm and .au. You'll need System 7.5 or
System 7.0 and 7.1 with a Thread Manager extension; 68K and Power PC
hardware is fine.
The tool, initially written to decompress files broadcast by
Radio Television Hong Kong,
was freeware and was distributed via
ftp as a binary at ftp://ftp.cuhk.hk/pub/multimedia/macgsm-100.hqx .
GSM for GBA
Damian Yerrick ported part of the library to the Game Boy Advance as
part of a portable music player application that plays music
off 256 Mbit flash cards.
The GPL'ed free video player xine
now uses code from our library to help play GSM-encoded AppleTalk and
Windows WAV/AVI/ASF audio tracks.
The KDE sound server aRts, short for
analog realtime synthesizer,
has grown a GSM de- and encoder in its kdenonbeta module, thanks
to Matthias Kretz.
Jonas Tärnström released this compact
Windows multiuser voice chat application. It supported multiple
sample rates, could function as a client or server, and could be set to
stream audio either contiguously or whenever the voice level rises
above a threshold.
The makers of the TouchTown Internet package for seniors were using
a Java GSM 06.10 client for low-bandwidth telephony.
Even if you don't speak French, you can now read about and download
linphone, a web-phone application
that uses the GSM 06.10 library (with a fresh autoconf Makefile from author Simon
was an LGPL'ed voice-over-IP library written in C++, based on his thesis work.
It supported multiple codecs and codec parameters, VoIP session creation and
destruction, and 3D effects (!).
OpenH323 was an Open Source implementation of the ITU H.323 protocol stack that
ran on Linux, Windows, Solaris and other Unix platforms.
The OpenH323 client sample code can interoperate with NetMeeting in
audio mode, and can receive H261 format video. The GSM codec is the
standard codec used by Linux implementations where G.723.1 hardware is
Patches to the SOund eXchange tool, sox
patched Lance Norskog's
sox program to work with the GSM
library. I wish I had thought of that.
Sox-12.16: Son of SOX
Chris Bagwell (you might remember him as maintainer of the
Audio File Format FAQ) has snatched maintenance of the
cryptic, resourceful Unix tool sox from its original author,
Since Version 12.17, Sox supports GSM
Pulse Entertainment's 3d web animation plugin
is streaming GSM 06.10 audio to its real-time animated characters,
along with the lip sync and body animation information that
makes them come to life.
People with friends in Hyderabad, India, are in
is offering a (so far) free gateway service to numbers in the
local area there. Their small, free client also serves
as a gateway to an online chat system; as usual, if you
and a friend both download the client, have Duplex sound cards and
a reasonably fast Internet connection, you can talk for
free across the Internet, no matter where you are.
Somewhere towards the tail fin of the Japanese-English
that the Advanced Telecommunications Research group's Interpreting
Telecommunications Research Laboratories
are trying to build, a GSM 06.10 codec is one of the options
available for encoding the translated utterances.
NTT's "InterSpace" Virtual Environment
The Virtual Campus of NTT's
combines videoconferencing with 3D graphics and, recently added,
an audio chat facility that uses our library.
entrance graphics show rendered avatars whose heads are
replaced by video screens rendered into the scenery, rather
ingeniously close to the SnowCrash ideal.
QuickView, the DOS based multimedia viewer
Version 2.3 of QuickView
supports GSM 06.10 and a host of other video and audio formats.
The viewer is shareware that comes with a
three-week free evaluation period; if you're interested in licensing
the libraries or building custom viewers, contact Wolfgang Hesseler
Gir: A realtime player for Amiga OS
a small realtime player for Amiga OS named "Gir"; it comes with
a browser-like interface for playing music locally or from the net.
Included in the package (which can be found in tcp/Gir??.lha
in your local Aminet archive) are tools for converting between Amiga
raw 8-bit iff samples and GSM, and a "littlegir" plugin
Mark Podlipec has integrated support for GSM audio into his
animation, video, and audio player running under X on Unix and VMS.
(I guess someone had to come up with that name...)
Matt Krokosz and Greg Foglesong present version 2
an Internet phone application that runs on Sun workstations
and is being ported to Linux PCs.
The system comes with an (optional) user directory service running
on magenta.com; the full version costs $20, the demo (with
2 minutes of connect time through the central server only) is free.
Brian C Wiles has been breathing new life into John Walker's
an Internet phone that runs on SGIs, Sun SPARCstations,
and (with WINSOCK) on Windows. The tools interoperate
seamlessly and can encrypt their voice data streams
with IDEA, DES, PGP, and/or a one-time pad.
Source code is freely available for both the Unix and
Windows release. Version 8.0, now in beta under Windows,
features a multipoint conference mode, answering machine messages,
and easier interoperation with ICQ.
PCS 1.0 (?)
This isn't really an application, but there is, or used to be, an
industry consortium called the
Personal Conferencing Working
Group (PCWG) which defined
something called the Personal Conferencing Specification (PCS) -
yet another desktop video conferencing infrastructure -
and, according to Leigh Anne Rettinger's thesis,
the first version of it included GSM audio compression.
I can't find a trace of these people after 1997; if anyone
knows the story of what happened to them,
send me email.
The Linux ``xztalk'' by Liem Bahneman
(email@example.com) and Andy Burnett
(firstname.lastname@example.org) is based
on Scott ``This is so incredibly alpha, it isn't funny''
Doty (email@example.com)'s extended version of
firstname.lastname@example.org's ``mtalk''. W. Richard Jhang's ztalk
is also a descendant of Scott Doty's release; I don't know
whether xztalk used ztalk, or whether both were developed
independently.
Named after the author's IRC nick, ericyyyphone is a GPL-licensed audio conferencing application
written in C++, running on Linux.
Microsoft NT and Windows 95 (beta)
Microsoft's Audio Compression Manager includes a
GSM 6.10 CODEC (in addition to those for ADPCM,
IMA ADPCM, the DSP Group's
a PCM converter).
The Windows 95 beta
added CCITT G.711 u- and A-law CODECs to the collection.
Microsoft's GSM 06.10 CODEC is not compatible with toast's
frame format - they use 65-byte-frames (2 x 32 1/2) rather than
rounding to 33, and they number the bits in their bytes from
the other end.
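The arithmetic behind the two framings: one frame carries 260 bits, so two frames are 520 bits or exactly 65 bytes, while a single frame rounded up to whole bytes takes 33. The difference in bit numbering can be shown with a generic bit packer. The sketch below is only an illustration of the two bit orders; the names are hypothetical, and it reproduces neither our actual frame layout nor Microsoft's.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One field to pack: the low `width` bits of `value`. */
struct field { unsigned value; unsigned width; };

/* Pack fields starting at the most significant bit of each byte. */
void pack_msb_first(const struct field *f, size_t nfields,
                    unsigned char *out, size_t outlen)
{
    size_t pos = 0;                      /* absolute bit position */
    memset(out, 0, outlen);
    for (size_t i = 0; i < nfields; i++) {
        for (unsigned b = f[i].width; b-- > 0; pos++) {
            assert(pos / 8 < outlen);
            if ((f[i].value >> b) & 1u)
                out[pos / 8] |= (unsigned char)(0x80u >> (pos % 8));
        }
    }
}

/* Pack fields starting at the least significant bit of each byte. */
void pack_lsb_first(const struct field *f, size_t nfields,
                    unsigned char *out, size_t outlen)
{
    size_t pos = 0;                      /* absolute bit position */
    memset(out, 0, outlen);
    for (size_t i = 0; i < nfields; i++) {
        for (unsigned b = 0; b < f[i].width; b++, pos++) {
            assert(pos / 8 < outlen);
            if ((f[i].value >> b) & 1u)
                out[pos / 8] |= (unsigned char)(1u << (pos % 8));
        }
    }
}
```

Packing a 4-bit field 0xD followed by a 6-bit field 0x2A yields the bytes 0xDA 0x80 in the first order and 0xAD 0x02 in the second, so a decoder expecting one order cannot read the other without repacking.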
SoundApp for Macs
SoundApp plays as many audio formats on the Mac as its author
could get his hands on, among them GSM 06.10
(both ours and Microsoft's). Keeping with the flexible theme,
the application has been translated into Japanese, French, and Swedish.
Freewebfone Combo 3+1
Freewebfone Combo 3+1, formerly known as WebWatch for Windows, by Daniel Ding,
turns a Pentium with VideoBlaster-compatible capture card and
the usual sound support into a video phone.
The video codec does H.261's QCIF, the audio is GSM or ADPCM.
Internet Global Phone
Around December 1994, a company called microWonders, Inc.,
released source code for a GSM-using tool called
``Internet Global Phone'' and publicised the event
with a press release that suggested I was
distributing their tool.
The Internet Multicasting Service
The Internet Multicasting
Service has been broadcasting audio on the Internet
for more than two years, starting with the ``Geek of the
Week'' program in March 1993. In addition to
its original .au format, it now supports .ra (Real Audio) as
well as .gsm.
vat - LBNL Audio Conferencing Tool
Vat was developed by
the Lawrence Berkeley National Laboratory's Research Group.
It is part of a whole set of tcl/tk applications grouped around
IP multicasting on the MBONE (but functional without it).
With the most recent 4.0 alpha release, source code is finally
available; so are, as before, binary distributions for most
Nevot 3.34 (December 22nd, 1995)
Henning Schulzrinne's network voice terminal program NeVoT provided packet-voice communications across
internetworks. It operated in either unicast,
simulated multicast, or IP multicast environments, using the
I have been told that Cornell's CU-SeeMe for Macintosh computers
supports GSM encoding in some manner. The Web resources
list a mysterious new 16 kb/s encoding that ``should work over
a 14.4 line'' (the incredibly shrinking compression method!),
but I don't know anything specific.
Enhanced Full-Rate GSM
On November 4th 1995, Nokia announced that the EFR (enhanced full rate)
codec they had been developing with the University of
Sherbrooke, Canada, had been chosen by the ETSI
as the industry standard codec for GSM/DCS.
Additionally, the US PCS 1900 operators have also moved
to EFR. It's supposed to have ``landline quality,''
be ``more robust to non-voice signals such as music'' and
more resilient to ``environments with excess background noise''.
Anyone know more about this?
According to an article posted to comp.dsp by
Texas Instruments' Mansoor Chishtie,
The draft prETS 300 581-2 (GSM 06.20 Version 4.0.0)
is the mathematical description of half-rate GSM.
GSM half-rate is now a standard.
It is based on Motorola's VSELP technology similar to
It compresses speech at 5.6 kbit/s using two
7-bit codebooks for unvoiced speech and one 9-bit
codebook for voiced segments.
According to a posting to comp.dsp from Feb 18 1995 by Chris Cavigioli, back then of Analog Devices, Inc.,
they have ``joined
Alcatel Radiotelephone, Nokia, and Italtel-SIT in a sub-group to evaluate
the complexity (MIPs and memory) required of typical 16-bit DSPs, based on
bit-exact ANSI C programs supplied by Motorola and ANT Bosch (the two final
codec candidates)''; their results have been published in three
- DSPx '94 Proceedings (theoretical worst case complexity)
- DSPWorld '94 ... also known as ICSPAT '94 Proceedings (avg complexity)
- Wireless Symposium '95 Proceedings (compare ETSI vs. ADI DSP complexity)
Analog Devices have ``implemented the GSM half-rate standard in DSP assembly
code, running in real-time, and meeting the ETSI delay
(Of course, this says very little about what will be possible
in non-DSP software.)
In the proceedings of September's EUROSPEECH '95 in Madrid,
Tim Fingscheidt, T. Wiechers and E. Delfs have published a
paper on ``Implementation Aspects of the GSM Half-Rate Speech Codec''
(pp. 723-726). Tim, whose group implemented a half-rate
codec for the NEC PD77018 based on the 06.06 source code,
estimates the complexity of the half-rate codec at 4-6 times that of
the full-rate version.
GSM 06.06: sourcecode for GSM 06.20
GSM 06.06 is ANSI C source code for a half-rate codec.
Its public review period started on April 10th, 1995.
``Public review'' means that it is for sale as
draft ETS 300 581-7
from the ETSI sales department,
Ms. Anja Mulder
+33 92 94 42 58 (voice)
+33 93 95 81 33 (fax)
At the moment, it doesn't seem as if we're going to implement
GSM 06.06 here.
The test patterns for GSM 06.06, GSM 06.07, will become
draft ETS 300 581-8, and lag the source code by about two
06.42 is half-rate voice activity detection, 06.22 comfort noise.
email@example.com, July 2017.
Comments and corrections are welcome.