GSM Applications, Ports, Half-Rate GSM

GSM 06.10 lossy speech compression

- telephone quality speech
- 13 kbit/s
- free sourcecode

In 1992, I was working as a tutor at the Technical University of Berlin. The research group I was in needed a speech compression algorithm to support its multimedia conferencing experiments. They found what they were looking for in the ETSI specifications of the Global System for Mobile telecommunication (GSM), Europe's currently most popular protocol suite for digital cellular phones. (John Scourias' overview of GSM does a good job introducing the overall architecture; hire him. Another, more recent, overview of the GSM system (with a list of Web links) comes from Javier Gozàlvez Sempere.)

The low-level speech compression algorithm of the GSM suite is called GSM 06.10 RPE-LTP (Regular-Pulse Excitation Long-Term Predictor). My colleague Dr. Carsten Bormann and I have implemented a GSM 06.10 RPE-LTP coder and decoder in C. Its source code is freely available, and we encourage you to use it, play with it, and invent new real-time media protocols and algorithms.

Our implementation consists of a C library and a stand-alone program. Both are designed to be compiled and used in a Unix-like environment with at least 32-bit integers, but others have ported it to VMS and an MS-DOS 16-bit environment. GSM 06.10 is faster than code-book lookup algorithms such as CELP, but by no means cheap; to use it for real-time communication, you will need at least a medium-scale workstation.

When using the library, you create a gsm object that holds the state necessary to either encode frames of 160 16-bit PCM samples into 264-bit GSM frames, or to decode GSM frames into linear PCM frames. If you want to examine and change the individual parts of the GSM frame, you can ``explode'' it into an array of 76 parameters, change them there, and ``implode'' them back into a packed frame; you can also print a whole GSM frame to a file in human-readable format with a single function call.

Our library client, called toast, is modeled after the Unix compress program. Running toast myspeech will compress the file myspeech, remove it, and collect the result of the compression in a new file called myspeech.gsm, while untoast myspeech will reverse the process. The big difference between toast and compress is that toast loses information with each compression cycle. (After a few iterations, you can hear high-pitched chirps that I initially mistook for birds outside of my office window.)

Patent Issues with GSM 06.10

Philips is claiming intellectual property on GSM 06.10. They haven't contacted the authors of this library, but at least two large companies that wanted to integrate GSM 06.10 codecs into their products have been approached; one decided to pull their codec, another to pull just the encoder and leave the decoder. (So, apparently, at least some lawyers think the intellectual property applies only to one half of the process.)

I don't know which parts of the patent are new, or whether it would hold up in court, but of course nobody wants to go to court over an issue as small as this.

The VPIM IETF workgroup is considering using GSM 06.10. The IETF can't standardize on technology that forces its users to pay license fees. If Philips doesn't release their intellectual property for use in VPIM, we'll be wasting a lot of bandwidth with voice mail. That's not the end of the world, but it would be nice to at least ask first.

I don't know whom to ask. If you do, please contact me.

Get ETSI publications free of charge

ETSI is the European standards body that came up with GSM. For a limited time, ETSI is making copies of its publications available over the Internet for the price of giving away an email address, among them the GSM 06.10 and GSM 06.06 drafts and attachments. Try it out at http://pda.etsi.org/pda/.

GSM 06.10: the current patchlevel is 13

Shortly after releasing patchlevel 9 (which has a little more WAV#49 support in the library), a long email correspondence convinced me that the tables used in toast's A-law conversion were entirely fictional.

Nobody seems to use A-law much, so it is conceivable that the wrong tables might have gone undetected since 1992. The new tables have been independently tested against the vectors supplied with G.711, and match those the ITU, Bell Labs, and at least one other implementor use.

Warble, warble, warble


Leila: Are you using a scrambler?
J. Frank: I can't hear you, I'm using a scrambler!
- Repo Man

If you're using the library to encode and decode sound in your project, and the resulting audio is nowhere near telephony quality but sort of warbled, the most likely cause is that you're using the same gsm state to both encode and decode. Don't do that; allocate two different states instead, one for each direction.

Porting to a DEC Alpha

People porting the GSM 06.10 library to DEC Alphas have noticed that the test of the basic math routines fails. The test prints:
0xfffffffe (4294967294) != L_<< (2147483647, 1) -- expected 0xfffffffe (-2)
0x00000000 (-4294967296) != L_<< (-2147483648, 1) -- expected 0
This can be fixed by changing the definitions of the 32-bit types in inc/private.h from long and unsigned long to int and unsigned int. On the Alpha, a long has 64 bits; an int (at least with the unadorned native compiler I used) has 32 bits. The math tests that fail exploit properties specific to 32-bit integer math.

If you don't care about the math test, you don't have to change the types. In spite of the failing test, the library does work fine even with a 64-bit long. (It's been tested against byte-swapped ETSI test patterns.)

The .wav GSM format

There is a .wav chunk format #49 that encodes GSM 06.10 frames. Newer Windows versions support it natively. It's a completely parallel version to ours, written from the same ETSI pseudocode, but ending up with incompatible framing and different code order in the bytes.

After fretting over intellectual property rights for a few months, Microsoft has now registered the encoding inside the WAV chunk as a MIME type, particularly for use in the context of VPIM (Voice Profile for Internet Mail)'s spinoff IVM, a way of sending Voice Messages as MIME documents.

The Microsoft ietf-draft is available as draft-ema-vpim-msgsm-00.txt from IETF draft repositories.

Long before that, Jeff Chilton figured out the format by trial and error when he needed to write compressed wave files for his shortwave radio application.

The patchlevel 9 release of GSM integrates Jeff's ``unofficial'' patch 8 in slightly different form, breaking his sample source code along the way. The updated version has its GSM_OPT_WAV_FMT changed to GSM_OPT_WAV49, and (thanks to Dima Barsky) a more portable way of looking at fputs's result. If you couldn't get it to work earlier on a SysV-ish environment, try again.

GSM 06.10 Errata

The list of tested overflow points for sequence 1 (coder part), table 5.2 of the GSM 06.10 draft, expects 49 overflows in the APCM quantizer's call to abs() (section 4.2.15). Rob Wubben of Philips Research Labs, who implemented a GSM 06.10 codec and counted, found 57 - ditto when he checked the same count in our library, and in a colleague's C simulation of the codec. In our opinion the table is wrong.

(Update: Pierre Larbier reports that the final ETSI release of the GSM 06.10 test sequences, attached to ETS 300 580-2 edition 2 (GSM 06.10 version 4.1.1), has corrected its SEQ01 to produce only the promised 49 overflows.)

Page move from TU Berlin to ... Quut?

A more extensive version of this document used to be at http://www.cs.tu-berlin.de/~jutta/toast.html aka http://kbs.cs.tu-berlin.de/~jutta/toast.html, hosted by the university I studied computer science at many years ago. I don't know whether my alma mater is suffering some sort of technical failure, or whether they finally noticed that I've been gone for about ten years - but my directories there don't seem to exist anymore. I made a couple of backups over the years, but I'll take the opportunity to trim down the page a little.

The place where it is now, quut.com, the Questionable Utility Company, will likely be around as long as I am. Thanks to the Technische Universität Berlin for its long and largely pain-free hosting services; we'll take it from here.

Dr. Carsten Bormann

My co-author Carsten Bormann has left the TU Berlin to accompany Prof. Ute Bormann to the computer science department of the Universität Bremen. Carsten's email address in Bremen is cabo@tzi.org.

The Schur recursion

The Linear Predictive Coding (LPC) part of the GSM algorithm uses an integer version of the ``Schur recursion'' described by Issai Schur in 1917. (The Levinson-Durbin algorithm from 1959 is better known, but the Schur recursion can be faster when parallelized.) Linear prediction means that the algorithm tries to find parameters for a filter that predicts the signal in the current frame as a weighted sum (or ``linear combination'') of the previous ones. (Wil Howitt offers a short tutorial about LPC and CELP.)


GSM for X

Java

Kudos to Steven Pickles for a free full-source Java 1.1 port of the GSM 06.10 Decoder side. Unlike the C library, the Java code is licensed under the Free Software Foundation's General Public License; if you use it, keep the library source available.

Chris Edwards did a Java port of the GSM 06.10 Encoder.

An open-source applet that can play lots of different GSM variants (with or without .wav header) is MumboJumbo, from voxeo's Omi Chandiramani. It's being extended to play other sound formats, too, and you can help.

DOS?

Louis Selvon <lselvon@usa.net> has created a new version of toast for DOS, based on the Patchlevel 10 release. As part of his EE thesis work, Louis also measured the objective and subjective performance (not speed, quality) of GSM 06.10 using MatLab (objective) and his family and neighbors (subjective).

Richard Elofsson <rel@ldecs.ericsson.se> has made his DOS port of the Patchlevel 4 release available. (He fixed bugs that it took me until Patchlevel 7 to find, though.) The source code, which compiles with Turbo C++ version 1.01, can be found as gsm-dos.zip in the toplevel GSM ftp directory.

Sergey A. Zhatchenko (zha@ergenm.comcen.nsk.su), from Novosibirsk, Russia, has donated a toast.exe, derived from a patchlevel 6 release of the GSM library. Make sure your input filenames have no suffix; this version of toast doesn't know that MS-DOS doesn't like more than one dot in its filenames.

GSM on the BeBox

Pierre-Emmanuel Chaut ported the GSM library to the BeBox, a PowerPC-based multiprocessing platform that excels with concurrent multimedia applications. It takes, he writes, "4 seconds to compress 20 seconds of sound". Way to go, Be.

GSM DLL for OS/2

Terry Fry created, and is now distributing and maintaining, an OS/2 DLL version of the GSM 06.10 library. Next will be a .wav to .gsm converter for OS/2.

MacGSM

Paul C.H. Ho and Pink Elephant Technologies have used the Patchlevel 6 release to write a drag-and-drop GSM compressor/decompressor that converts between .au.gsm and .au. You'll need System 7.5 or System 7.0 and 7.1 with a Thread Manager extension; 68K and Power PC hardware is fine. The tool, initially written to decompress files broadcast by Radio Television Hong Kong, is freeware and is distributed via ftp as a binary.


GSM Applications

GSM for GBA

Damian Yerrick ported part of the library to the Game Boy Advance as part of a portable music player application that plays music off 256 Mbit flash cards.

xine

The GPL'ed free video player xine now uses code from our library to help play GSM-encoded AppleTalk and Windows WAV/AVI/ASF audio tracks.

aRts

The KDE sound server aRts, short for analog realtime synthesizer, has grown a GSM de- and encoder in its kdenonbeta module, thanks to Matthias Kretz.

JusTalk 2

Jonas Tärnström released this compact Windows multiuser voice chat application. It supports multiple sample rates, can function as a client or server, and can be set to stream audio either continuously or whenever the voice level rises above a threshold.

ElderVision

The makers of the TouchTown Internet package for seniors are using a Java GSM 06.10 client for low-bandwidth telephony.

linphone

Even if you don't speak French, you can now read about and download linphone, a web-phone application that uses the GSM 06.10 library (with a fresh autoconf Makefile from author Simon Morlat).

JVOIPLIB

Jori Liesenborgs's JVOIPLIB is an LGPL'ed voice-over-IP library written in C++, based on his thesis work. It supports multiple codecs and codec parameters, VoIP session creation and destruction, and 3D effects (!). Jori has just integrated the GSM library and will likely be shipping a subset of the GSM 06.10 release with his next version.

OpenH323

OpenH323 is an Open Source implementation of the ITU H.323 protocol stack which runs on Linux, Windows, Solaris and other Unix platforms. The OpenH323 client sample code can interoperate with NetMeeting in audio mode, and can receive H.261 format video. The GSM codec is the standard codec used by Linux implementations where G.723.1 hardware is not available.

Patches to the SOund eXchange tool, sox

Andrew Pam (avatar@aus.xanadu.com) has patched Lance Norskog's sox program to work with the GSM library. I wish I had thought of that.

Sox-12.16: Son of SOX

Chris Bagwell (you might remember him as maintainer of the Audio File Format FAQ) has snatched maintenance of the cryptic, resourceful Unix tool sox from its original author, Lance Norskog. Version 12.16 supports GSM and WAV#49.

Pulse Entertainment's 3d web animation plugin

Pulse3d is streaming GSM 06.10 audio to its real-time animated characters, along with the lip sync and body animation information that makes them come to life.

HotFoon

People with friends in Hyderabad, India, are in luck; hotfoon is offering a (so far) free gateway service to numbers in the local area there. Their small, free client also serves as a gateway to an online chat system; as usual, if you and a friend both download the client, have Duplex sound cards and a reasonably fast Internet connection, you can talk for free across the Internet, no matter where you are.

ATR-ITL

Somewhere towards the tail fin of the Japanese-English telephone "babelfish" that the Advanced Telecommunications Research group's Interpreting Telecommunications Research Laboratories are trying to build, a GSM 06.10 codec is one of the options available for encoding the translated utterances.

NTT's "InterSpace" Virtual Environment

The Virtual Campus of NTT's InterSpace project combines videoconferencing with 3D graphics and, recently added, an audio chat facility that uses our library. The site's entrance graphics show rendered avatars whose heads are replaced by video screens rendered into the scenery, rather ingeniously close to the SnowCrash ideal.

QuickView, the DOS based multimedia viewer

Version 2.3 of QuickView supports GSM 06.10 and a host of other video and audio formats. The viewer is shareware that comes with a three-week free evaluation period; if you're interested in licensing the libraries or building custom viewers, contact Wolfgang Hesseler at qv@multimediaware.com.

Gir: A realtime player for Amiga OS

Sinisa Kesic has developed a small realtime player for Amiga OS named "Gir"; it comes with a browser-like interface for playing music locally or from the net. Included in the package (which can be found in tcp/Gir??.lha in your local Aminet archive) are tools for converting between Amiga raw 8-bit iff samples and GSM, and a "littlegir" plugin for webbrowsers.

XAnim

Mark Podlipec has integrated support for GSM audio into his XAnim, an animation, video, and audio player running under X on Unix and VMS.

CyberPhone

(I guess someone had to come up with that name...) Matt Krokosz and Greg Foglesong present version 2 of CyberPhone, an Internet phone application that runs on Sun workstations and is being ported to Linux PCs. The system comes with an (optional) user directory service running on magenta.com; the full version costs $20, the demo (with 2 minutes of connect time through the central server only) is free.

Speak Freely

Brian C Wiles has been breathing new life into John Walker's Speak Freely, an Internet phone that runs on SGIs, Sun SPARCstations, and (with WINSOCK) on Windows. The tools interoperate seamlessly and can encrypt their voice data streams with IDEA, DES, PGP, and/or a one-time pad. Source code is freely available for both the Unix and Windows release. Version 8.0, now in beta under Windows, features a multipoint conference mode, answering machine messages, and easier interoperation with ICQ.

PCS 1.0 (?)

This isn't really an application, but there is, or used to be, a strongly Intel-influenced industry consortium called the Personal Conferencing Working Group (PCWG) which defined something called the Personal Conferencing Specification (PCS) - yet another desktop video conferencing infrastructure - and, according to Leigh Anne Rettinger's thesis, the first version of it included GSM audio compression. I can't find a trace of these people after 1997; if anyone knows the story of what happened to them, send me email.

xztalk, ztalk

The Linux ``xztalk'' by Liem Bahneman (roland@cac.washington.edu) and Andy Burnett (burnett@baldrick.cecer.army.mil) is based on Scott ``This is so incredibly alpha, it isn't funny'' Doty (scott@cs.santarosa.edu)'s extended version of misch@elara.fsag.de's ``mtalk''. W. Richard Jhang (feinmann@cs.mcgill.ca)'s ztalk is also a descendant of Scott Doty's release; I don't know whether xztalk used ztalk, or whether both were developed independently.

erikyyyphone

Named after the author's IRC nick, erikyyyphone is a GPL-licensed audio conferencing application written in C++, running on Linux.

Microsoft NT and Windows 95 (beta)

Microsoft's Audio Compression Manager includes a GSM 6.10 CODEC (in addition to those for ADPCM, IMA ADPCM, the DSP Group's TrueSpeech(TM), and a PCM converter). The Windows 95 beta added CCITT G.711 u- and A-law CODECs to the collection. Microsoft's GSM 06.10 CODEC is not compatible with toast's frame format - they use 65-byte-frames (2 x 32 1/2) rather than rounding to 33, and they number the bits in their bytes from the other end.

SoundApp for Macs

Norman Franke's SoundApp plays as many audio formats on the Mac as he could get his hands on, among them GSM 06.10 (both ours and Microsoft's). Keeping with the flexible theme, the application has been translated into Japanese, French, and Swedish.

Freewebfone Combo 3+1

Freewebfone Combo 3+1, formerly known as WebWatch for Windows, by Daniel Ding, turns a Pentium with VideoBlaster-compatible capture card and the usual sound support into a video phone. The video codec does H.261's QCIF, the audio is GSM or ADPCM.

Internet Global Phone

Around December 1994, a company called microWonders, Inc., released source code for a GSM-using tool called ``Internet Global Phone'' and publicised the event with a press release that suggested I was distributing their tool. (Longer version.)

The Internet Multicasting Service

The Internet Multicasting Service has been broadcasting audio on the Internet for more than two years, starting with the ``Geek of the Week'' program in March 1993. In addition to its original .au format, it now supports .ra (Real Audio) as well as .gsm.

vat - LBNL Audio Conferencing Tool

Vat was developed by the Lawrence Berkeley National Laboratory's Research Group. It is part of a whole set of tcl/tk applications grouped around IP multicasting on the MBONE (but functional without it). With the most recent 4.0 alpha release, source code is finally available; so are, as before, binary distributions for most Unix platforms.

Nevot 3.34 (December 22nd, 1995)

Henning Schulzrinne's network voice terminal program NeVoT provides packet-voice communications across internetworks. It operates in either unicast, simulated multicast, or IP multicast environments, using the vat or RTP protocols.

CU-SeeMe

I have been told that Cornell's CU-SeeMe for Macintosh computers supports GSM encoding in some manner. The Web resources list a mysterious new 16 kb/s encoding that ``should work over a 14.4 line'' (the incredibly shrinking compression method!), but I don't know anything specific.


Half-rate GSM and EFR

Enhanced Full-Rate GSM

On November 4th 1995, Nokia announced that the EFR (enhanced full rate) codec they had been developing with the University of Sherbrooke, Canada, had been chosen by the ETSI as the industry standard codec for GSM/DCS. Additionally, the US PCS 1900 operators have also moved to EFR. It's supposed to have ``landline quality,'' be ``more robust to non-voice signals such as music'' and more resilient to ``environments with excess background noise''. Anyone know more about this?

Half-Rate GSM

According to an article posted to comp.dsp by Texas Instruments' Mansoor Chishtie,
GSM half-rate is now a standard. It is based on Motorola's VSELP technology similar to IS-54 full-rate. It compresses speech at 5.6kbps using two 7-bit codebooks for unvoiced speech and one 9-bit codebook for voiced segments.
The draft prETS 300 581-2 (GSM 06.20 Version 4.0.0) is the mathematical description of half-rate GSM.

So, how complex is it?

Good question. According to a posting to comp.dsp from Feb 18 1995 by Chris Cavigioli, back then of Analog Devices, Inc., they have ``joined Alcatel Radiotelephone, Nokia, and Italtel-SIT in a sub-group to evaluate the complexity (MIPs and memory) required of typical 16-bit DSPs, based on bit-exact ANSI C programs supplied by Motorola and ANT Bosch (the two final codec candidates)''; their results have been published in three places:
  1. DSPx '94 Proceedings (theoretical worst case complexity)
  2. DSPWorld '94 ... also known as ICSPAT '94 Proceedings (avg complexity)
  3. Wireless Symposium '95 Proceedings (compare ETSI vs. ADI DSP complexity)
Analog Devices have ``implemented the GSM half-rate standard in DSP assembly code, running in real-time, and meeting the ETSI delay specifications.''

(Of course, this says very little about what will be possible in non-DSP software.)

In the proceedings of September's EUROSPEECH'95 in Madrid, Tim Fingscheidt, T. Wiechers and E. Delfs have published a paper on ``Implementation Aspects of the GSM Half-Rate Speech Codec'' (pp. 723/726). Tim, whose group implemented a half-rate codec for the NEC muPD77018 based on the 06.06 source code, estimates the complexity of the half-rate codec at 4-6 times that of the full-rate version.

GSM 06.06: sourcecode for GSM 06.20

GSM 06.06 is ANSI C source code for a half-rate codec. Its public review period started on April 10th, 1995. ``Public review'' means that it is for sale as
draft ETS 300 581-7
from the ETSI sales department,
Ms. Anja Mulder
+33 92 94 42 58 (voice)
+33 93 95 81 33 (fax)

At the moment, it doesn't seem as if we're going to implement GSM 06.06 here.

The test patterns for GSM 06.06, GSM 06.07, will become draft ETS 300 581-8, and lag the source code by about two weeks. 06.42 is half-rate voice activity detection, 06.22 comfort noise.


jutta@pobox.com, December 2009. Comments and corrections are welcome.