
GSM 06.10 lossy speech compression

- telephone quality speech
- 13 kbit/s
- free sourcecode


In 1992, I was working as a tutor at the Technical University of Berlin. The research group I was in needed a speech compression algorithm to support its multimedia conferencing experiments. They found what they were looking for in the ETSI specifications of the Global System for Mobile telecommunication (GSM), Europe's currently most popular protocol suite for digital cellular phones. (John Scourias' overview of GSM does a good job of introducing the overall architecture; hire him. Another, more recent, overview of the GSM system (with a list of Web links) comes from Javier Gozàlvez Sempere.)

The low-level speech compression algorithm of the GSM suite is called GSM 06.10 RPE-LTP (Regular-Pulse Excitation Long-Term Predictor). My colleague Dr. Carsten Bormann and I have implemented a GSM 06.10 RPE-LTP coder and decoder in C. Its source code is freely available, and we encourage you to use it, play with it, and invent new real-time media protocols and algorithms.

Our implementation consists of a C library and a stand-alone program. Both are designed to be compiled and used in a Unix-like environment with at least 32-bit integers, but others have ported them to VMS and to a 16-bit MS-DOS environment. GSM 06.10 is faster than codebook-search algorithms such as CELP, but by no means cheap; to use it for real-time communication, you will need at least a medium-scale workstation.

When using the library, you create a gsm object that holds the state necessary to either encode frames of 160 16-bit PCM samples into 264-bit GSM frames, or to decode GSM frames into linear PCM frames. If you want to examine and change the individual parts of the GSM frame, you can ``explode'' it into an array of 76 parameters, change them there, and ``implode'' them back into a packed frame; you can also print a whole GSM frame to a file in human-readable format with a single function call.
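Where the numbers come from: the GSM 06.10 parameter table assigns 36 bits to eight log-area-ratio coefficients and 56 bits to each of four 5 ms subframes, for 76 parameters and 260 bits per 20 ms frame. The sketch below just adds those allocations up. It assumes, for illustration, that the 4 extra bits in a 264-bit packed frame are a signature nibble and that a partial last frame is padded; all names are hypothetical, not part of the library's API.

```c
#include <assert.h>

/* Bit allocation of one GSM 06.10 full-rate frame: 160 samples, 20 ms. */
#define GSM_SAMPLES_PER_FRAME 160
#define GSM_SAMPLE_RATE_HZ    8000
#define GSM_SUBFRAMES         4
#define GSM_RPE_PULSES        13   /* RPE pulses per subframe */
#define GSM_RPE_PULSE_BITS    3    /* bits per RPE pulse */
#define GSM_SIGNATURE_BITS    4    /* assumed signature nibble in packed frames */

/* Log-area-ratio coefficients LARc[1..8]. */
static const int lar_bits[8] = { 6, 6, 5, 5, 4, 4, 3, 3 };

/* Per subframe: LTP lag Nc, LTP gain bc, RPE grid position Mc,
 * block amplitude xmaxc; the 13 RPE pulses follow. */
static const int subframe_bits[4] = { 7, 2, 2, 6 };

/* Coded parameters per frame: 8 LARs plus 4 x (4 + 13). */
int gsm_param_count(void)
{
    return 8 + GSM_SUBFRAMES * (4 + GSM_RPE_PULSES);
}

/* Payload bits per frame, without any container framing. */
int gsm_payload_bits(void)
{
    int bits = 0;
    for (int i = 0; i < 8; i++)
        bits += lar_bits[i];
    for (int s = 0; s < GSM_SUBFRAMES; s++) {
        for (int i = 0; i < 4; i++)
            bits += subframe_bits[i];
        bits += GSM_RPE_PULSES * GSM_RPE_PULSE_BITS;
    }
    return bits;
}

/* Bytes per packed frame: signature nibble plus payload. */
int gsm_packed_frame_bytes(void)
{
    return (GSM_SIGNATURE_BITS + gsm_payload_bits()) / 8;
}

/* Payload bit rate: one frame every 160 / 8000 s = 20 ms. */
int gsm_bit_rate(void)
{
    return gsm_payload_bits() * (GSM_SAMPLE_RATE_HZ / GSM_SAMPLES_PER_FRAME);
}

/* Compressed size for `samples` input samples, assuming a final
 * partial frame is padded out to a whole frame. */
long gsm_packed_size(long samples)
{
    assert(samples >= 0);
    long frames = (samples + GSM_SAMPLES_PER_FRAME - 1) / GSM_SAMPLES_PER_FRAME;
    return frames * gsm_packed_frame_bytes();
}
```

gsm_payload_bits() comes out at 260 and gsm_bit_rate() at 13000, the 13 kbit/s quoted at the top of this page. Against 160 x 16 = 2560 bits of linear PCM, a 264-bit packed frame is a reduction of a little under 10:1.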

Our library client, called toast, is modeled after the Unix compress program. Running toast myspeech will compress the file myspeech, remove it, and collect the result of the compression in a new file called myspeech.gsm, while untoast myspeech will reverse the process. The big difference between toast and compress is that toast loses information with each compression cycle. (After a few iterations, you can hear high-pitched chirps that I initially mistook for birds outside of my office window.)

Patent Issues with GSM 06.10

Philips is claiming intellectual property on GSM 06.10. They haven't contacted the authors of this library, but at least two large companies that wanted to integrate GSM 06.10 codecs into their products have been approached; one decided to pull their codec, another to pull just the encoder and leave the decoder. (So, apparently, at least some lawyers think the intellectual property applies only to one half of the process.)

I don't know which parts of the patent are new, or whether it would hold up in court, but of course nobody wants to go to court over an issue as small as this.

The VPIM IETF workgroup is considering using GSM 06.10. The IETF can't standardize on technology that forces its users to pay license fees. If Philips doesn't release their intellectual property for use in VPIM, we'll be wasting a lot of bandwidth with voice mail. That's not the end of the world, but it would be nice to at least ask first.

I don't know whom to ask. If you do, please contact me.

Get ETSI publications free of charge

ETSI is the European standards body that came up with GSM. For a limited time, ETSI is making copies of its publications available over the Internet for the price of giving away an email address, among them the GSM 06.10 and GSM 06.06 drafts and attachments. Try it out at http://pda.etsi.org/pda/.

GSM 06.10: the current patchlevel is 10

Shortly after releasing patchlevel 9 (which has a little more WAV#49 support in the library), a long email correspondence convinced me that the tables used in toast's A-law conversion were entirely fictional.

Nobody seems to use A-law much, so it is conceivable that the wrong tables might have gone undetected since 1992. The new tables have been independently tested against the vectors supplied with G.726, and match those the ITU, Bell Labs, and at least one other implementor use.
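For readers who haven't met it: G.711 A-law stores a 16-bit sample in 8 bits as a sign bit, a 3-bit segment number, and a 4-bit position within the segment, with the even bits inverted. The sketch below computes that mapping arithmetically instead of from tables, modeled on the widely circulated Sun Microsystems g711.c; it is not toast's conversion code, and the function names are illustrative. A table-driven converter can be cross-checked against a routine like this one for all 65,536 inputs.

```c
#include <assert.h>

/* Upper end of each A-law segment, for magnitudes scaled from 16 to 13 bits. */
static const int seg_aend[8] = { 0x1F, 0x3F, 0x7F, 0xFF,
                                 0x1FF, 0x3FF, 0x7FF, 0xFFF };

/* Encode one 16-bit linear PCM sample as an 8-bit A-law code. */
unsigned char linear2alaw(int pcm)
{
    int mag, mask, seg;
    unsigned char aval;

    assert(pcm >= -32768 && pcm <= 32767);
    if (pcm >= 0) {
        mask = 0xD5;              /* sign bit set, even bits inverted */
        mag = pcm >> 3;
    } else {
        mask = 0x55;              /* sign bit clear, even bits inverted */
        mag = (-pcm - 1) >> 3;    /* avoids shifting a negative value */
    }

    /* Find the segment that contains the magnitude (always 0..7 here). */
    for (seg = 0; seg < 7; seg++)
        if (mag <= seg_aend[seg])
            break;

    /* Combine segment number and the 4 quantization bits. */
    aval = (unsigned char)(seg << 4);
    if (seg < 2)
        aval |= (mag >> 1) & 0x0F;
    else
        aval |= (mag >> seg) & 0x0F;
    return (unsigned char)(aval ^ mask);
}

/* Decode an 8-bit A-law code to the midpoint of its 16-bit interval. */
int alaw2linear(unsigned char a_val)
{
    int t, seg;

    a_val ^= 0x55;                    /* undo the even-bit inversion */
    t = (a_val & 0x0F) << 4;          /* quantization bits */
    seg = (a_val & 0x70) >> 4;        /* segment number */
    if (seg == 0) {
        t += 8;                       /* half of a 16-wide step */
    } else {
        t += 0x108;                   /* implicit leading one plus half step */
        t <<= seg - 1;
    }
    return (a_val & 0x80) ? t : -t;
}
```

For example, 1000 encodes as 0xFA and decodes to 1008, the midpoint of the 32-wide quantization step [992, 1024) it falls into; 0 encodes as 0xD5 and decodes to 8.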

Patches relative to patchlevel 9: reference version, 8.3 filename version; Full release (if the last thing you have is patchlevel 7, get the full release): reference version, gzip'ed tar file, compressed, 8.3 filename ZIP archive.

WAV49 version of gsm_implode.c is broken.

The gsm_implode() function doesn't work if the WAV49 flag is set; as Shay Ben-David brought to my attention, the code in question is a duplicate, rather than the opposite, of the gsm_explode() code. This will be fixed in the next release.

Warble, warble, warble


Leila: Are you using a scrambler?
J. Frank: I can't hear you, I'm using a scrambler!
- Repo Man
If you're using the library to encode and decode sound in your project, and the resulting audio is nowhere near telephony quality but sort of warbled, the most likely cause is that you're using the same gsm state to both encode and decode. Don't do that; allocate two different states instead, one for each direction.
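Why a shared state warbles: a codec like GSM 06.10 keeps history (previous samples and filter memory) in its state, and the encoder and decoder each need their own copy of it. The toy differential codec below is hypothetical and far simpler than GSM, but it fails in the same way when one state serves both directions; with the real library, the fix is simply one gsm_create() for the encoder and another for the decoder.

```c
#include <assert.h>

/* Hypothetical toy differential codec, NOT the GSM library: like GSM,
 * each direction keeps history in a state object. */
struct toy_state {
    int prev;   /* last sample seen (encoder) or rebuilt (decoder) */
};

void toy_init(struct toy_state *s)
{
    assert(s != 0);
    s->prev = 0;
}

/* Encoder: transmit the difference from the previous sample. */
int toy_encode(struct toy_state *s, int sample)
{
    int diff = sample - s->prev;
    s->prev = sample;          /* encoder-side history */
    return diff;
}

/* Decoder: rebuild the sample from its own copy of the history. */
int toy_decode(struct toy_state *s, int diff)
{
    int sample = s->prev + diff;
    s->prev = sample;          /* decoder-side history */
    return sample;
}

/* Round-trip n samples with one state per direction (shared == 0) or a
 * single state for both (shared != 0); returns how many came back wrong. */
int toy_roundtrip_errors(const int *in, int n, int shared)
{
    struct toy_state enc, dec;
    struct toy_state *dstate = shared ? &enc : &dec;
    int errors = 0;

    toy_init(&enc);
    toy_init(&dec);
    for (int i = 0; i < n; i++) {
        int out = toy_decode(dstate, toy_encode(&enc, in[i]));
        if (out != in[i])
            errors++;
    }
    return errors;
}
```

With separate states every sample survives the round trip; with a shared state, the decoder starts from the encoder's history and the output drifts, which in a real speech codec is heard as warbling.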

Porting to a DEC Alpha

People porting the GSM 06.10 library to DEC Alphas have noticed that the test of the basic math routines fails. The test prints:
0xfffffffe (4294967294) != L_<< (2147483647, 1) -- expected 0xfffffffe (-2)
0x00000000 (-4294967296) != L_<< (-2147483648, 1) -- expected 0
This can be fixed by changing the definitions of the 32-bit types in inc/private.h from long and unsigned long to int and unsigned int. On the Alpha, a long has 64 bits; an int (at least with the unadorned native compiler I used) has 32 bits. The math tests that fail exploit properties specific to 32-bit integer math.
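The two lines of test output are exactly what 32-bit and 64-bit arithmetic give for the same left shift. A small sketch, assuming a C99 compiler with <stdint.h>; the function names are hypothetical and not taken from the library's test code.

```c
#include <assert.h>
#include <stdint.h>

/* Shift left by one and wrap to 32 bits, as the math test expects.
 * Shifting the unsigned value avoids signed overflow; converting the
 * result back to int32_t is implementation-defined in C, and gcc and
 * clang reduce it modulo 2^32. */
int32_t shl1_wrap32(int32_t x)
{
    uint32_t u = (uint32_t)x << 1;
    return (int32_t)u;
}

/* The same shift carried out in a 64-bit type, as happens when the
 * library's 32-bit types are declared as a 64-bit long. */
int64_t shl1_wide64(int32_t x)
{
    return (int64_t)x * 2;
}
```

shl1_wrap32(2147483647) returns -2 and shl1_wrap32(INT32_MIN) returns 0, as the test expects, while the 64-bit versions return 4294967294 and -4294967296, the values printed on the Alpha.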

If you don't care about the math test, you don't have to change the types. In spite of the failing test, the library does work fine even with a 64-bit long. (It's been tested against byte-swapped ETSI test patterns.)

The .wav GSM format

There is a .wav chunk format #49 that encodes GSM 06.10 frames. Newer Windows versions support it natively. It's a completely independent implementation, written from the same ETSI pseudocode as ours, but it ended up with incompatible framing and a different bit order within the bytes.

After fretting over intellectual property rights for a few months, Microsoft has now registered the encoding inside the WAV chunk as a MIME type, particularly for use in the context of IVM, a spinoff of VPIM (Voice Profile for Internet Mail) and a way of sending voice messages as MIME documents.

Microsoft's Internet-Draft is available as draft-ema-vpim-msgsm-00.txt from IETF draft repositories.

Long before that, Jeff Chilton figured out the format with trial-and-error when he needed to write compressed wave files for his shortwave radio application (see below).

The patchlevel 9 release of GSM integrates Jeff's ``unofficial'' patch 8 in slightly different form, breaking his sample source code along the way. The updated version has its GSM_OPT_WAV_FMT changed to GSM_OPT_WAV49, and (thanks to Dima Barsky) a more portable way of looking at fputs's result. If you couldn't get it to work earlier on a SysV-ish environment, try again.

GSM on the World-Wide Web

Jay Novello has gone ahead and used the audio/x-gsm MIME type. A page at the North Carolina Institute for Transportation Research and Education explains how users and web masters can configure their systems to conveniently handle GSM documents, and offers a few sounds to test with for those that do.

GSM 06.10 Errata

The list of tested overflow points for sequence 1 (coder part), table 5.2 of the GSM 06.10 draft, expects 49 overflows in the APCM quantizer's call to abs() (section 4.2.15). Rob Wubben of Philips Research Labs, who implemented a GSM 06.10 codec and counted, found 57; he got the same count from our library and from a colleague's C simulation of the codec. In our opinion the table is wrong.
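In the specification's 16-bit arithmetic, abs() of -32768 would be 32768, one more than the largest representable value, so the result saturates to 32767; presumably it is this saturation that the overflow table counts. A minimal sketch under that assumption, with a hypothetical counter standing in for test instrumentation, and the search for the maximum absolute value xmax that the APCM quantizer performs over the 13 samples of the selected RPE sequence:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical overflow counter standing in for test instrumentation. */
static long abs_overflows = 0;

/* 16-bit saturating absolute value: |-32768| does not fit in 16 bits,
 * so it saturates to 32767 and is counted as an overflow. */
int16_t sat_abs16(int16_t x)
{
    if (x == INT16_MIN) {
        abs_overflows++;
        return INT16_MAX;
    }
    return (int16_t)(x < 0 ? -x : x);
}

/* Maximum absolute value xmax over the 13 selected RPE samples. */
int16_t rpe_xmax(const int16_t xM[13])
{
    int16_t xmax = 0;
    for (int i = 0; i < 13; i++) {
        int16_t t = sat_abs16(xM[i]);
        if (t > xmax)
            xmax = t;
    }
    return xmax;
}
```

Counting overflows then comes down to how often a -32768 reaches that abs() call over the whole test sequence, which is why two bit-exact implementations fed the same input should agree on the number.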

(Update: Pierre Larbier reports that the final ETSI release of the GSM 06.10 test sequences, attached to ETS 300 580-2 edition 2 (GSM 06.10 version 4.1.1), has corrected its SEQ01 to produce only the promised 49 overflows.)

Dr. Carsten Bormann

My co-author Carsten Bormann left the TU Berlin a few months ago to accompany Prof. Ute Bormann to the computer science department of the Universität Bremen, but both still visit Berlin regularly. Carsten will continue to be reachable as cabo@cs.tu-berlin.de; his email address in Bremen is cabo@informatik.uni-bremen.de.

The Schur recursion

The Linear Predictive Coding (LPC) part of the GSM algorithm uses an integer version of the ``Schur recursion'' described by Issai Schur in 1917. (The Levinson-Durbin algorithm from 1959 is better known, but the Schur recursion can be faster when parallelized.) Linear prediction means that the algorithm tries to find parameters for a filter that predicts each sample in the current frame as a weighted sum (or ``linear combination'') of the preceding samples; the Schur recursion computes that filter's reflection coefficients from the frame's autocorrelation. (Wil Howitt offers a short tutorial about LPC and CELP.)
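To make the recursion concrete, here is a floating-point sketch. The GSM codec itself works in 16-bit fixed point on the autocorrelation of each 160-sample frame and produces 8 coefficients; this version keeps the structure of the two arrays that the recursion updates but drops the fixed-point scaling. It assumes the sign convention k[1] = -r[1]/r[0], and schur is a hypothetical name, not a library function.

```c
#include <assert.h>

#define MAX_ORDER 8   /* GSM 06.10 uses an 8th-order short-term predictor */

/* Floating-point Schur recursion: from autocorrelation values
 * r[0..order], compute reflection coefficients k[1..order]
 * (k[0] is unused). */
void schur(const double *r, int order, double *k)
{
    double p[MAX_ORDER + 1];   /* forward generator, P[] */
    double g[MAX_ORDER + 1];   /* backward generator, K[] */

    assert(order >= 1 && order <= MAX_ORDER);
    for (int i = 0; i <= order; i++)
        p[i] = r[i];
    for (int i = 1; i < order; i++)
        g[i] = r[i];

    for (int n = 1; n <= order; n++) {
        double mag = p[1] < 0.0 ? -p[1] : p[1];

        /* Silent input, or |P[1]| > P[0]: zero the remaining
         * coefficients instead of dividing. */
        if (p[0] <= 0.0 || p[0] < mag) {
            for (int i = n; i <= order; i++)
                k[i] = 0.0;
            return;
        }
        k[n] = -p[1] / p[0];
        if (n == order)
            return;

        /* Update both generators; each new value uses only old ones. */
        p[0] += p[1] * k[n];
        for (int m = 1; m <= order - n; m++) {
            double next = p[m + 1] + g[m] * k[n];
            g[m] += p[m + 1] * k[n];
            p[m] = next;
        }
    }
}
```

For the autocorrelation 1, 0.5, 0.1 this yields k[1] = -0.5 and k[2] = 0.2, the same values the Levinson-Durbin recursion gives; for r = 1, a, a*a (a first-order process) the second coefficient comes out as zero. Unlike Levinson-Durbin, the inner loop never needs the predictor coefficients themselves, which is what makes it easy to parallelize.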


GSM for X

Java

Kudos to Steven Pickles for a free full-source Java 1.1 port of the GSM 06.10 Decoder side. Unlike the C library, the Java code is licensed under the Free Software Foundation's General Public License; if you use it, keep the library source available.

Chris Edwards did a Java port of the GSM 06.10 Encoder.

An open-source applet that can play lots of different GSM variants (with or without .wav header) is MumboJumbo, from voxeo's Omi Chandiramani. It's being extended to play other sound formats, too, and you can help.

DOS?

Louis Selvon <lselvon@usa.net> has created a new version of toast for DOS, based on the Patchlevel 10 release. As part of his EE thesis work, Louis also measured the objective and subjective performance (not speed, quality) of GSM 06.10 using MatLab (objective) and his family and neighbors (subjective).

Richard Elofsson <rel@ldecs.ericsson.se> has made his DOS port of the Patchlevel 4 release available. (He fixed bugs that it took me until Patchlevel 7 to find, though.) The source code, which compiles with Turbo C++ version 1.01, can be found as gsm-dos.zip in the top-level GSM ftp directory.

Sergey A. Zhatchenko (zha@ergenm.comcen.nsk.su), from Novosibirsk, Russia, has donated a toast.exe, derived from a patchlevel 6 release of the GSM library. Make sure your input filenames have no suffix; this version of toast doesn't know that MS-DOS doesn't like more than one dot in its filenames.

GSM on the BeBox

Pierre-Emmanuel Chaut ported the GSM library to the BeBox, a PowerPC-based multiprocessing platform that excels with concurrent multimedia applications. It takes, he writes, "4 seconds to compress 20 seconds of sound". Way to go, Be.

Jake Bordens did his own port and implemented some GSM Coders as minimal sample applications. He's still too embarrassed to publish his code to just about anyone, but might be talked out of it; meanwhile, the binary is available from the webpage.

GSM DLL for OS/2

Terry Fry created, and is now distributing and maintaining, an OS/2 DLL version of the GSM 06.10 library. Next will be a .wav-to-.gsm converter for OS/2.

MacGSM

Paul C.H. Ho and Pink Elephant Technologies have used the Patchlevel 6 release to write a drag-and-drop GSM compressor/decompressor that converts between .au.gsm and .au. You'll need System 7.5, or System 7.0 or 7.1 with a Thread Manager extension; both 68K and PowerPC hardware are fine. The tool, initially written to decompress files broadcast by Radio Television Hong Kong, is freeware and is distributed via ftp as a binary.

GSM for the Amiga

Michael Cheng is responsible for the distribution of a toast binary compiled with Amiga gcc 2.7.2 on the Aminet repositories, path util/pack/GSMToast.lha. Michael also added some scripts that use toast to implement a streaming audio GSM MIME type; they can be found on the same archives in comm/tcp/unrealaudio.lha.


GSM Applications

GSM for GBA

Damian Yerrick ported part of the library to the Game Boy Advance as part of a portable music player application that plays music off 256 Mbit flash cards.

xine

The GPL'ed free video player xine now uses code from our library to help play GSM-encoded AppleTalk and Windows WAV/AVI/ASF audio tracks.

aRts

The KDE sound server aRts, short for analog realtime synthesizer, has grown a GSM de- and encoder in its kdenonbeta module, thanks to Matthias Kretz.

JusTalk 2

Jonas Tärnström released this compact Windows multiuser voice chat application. It supports multiple sample rates, can function as a client or server, and can be set to stream audio either continuously or whenever the voice level rises above a threshold.

ElderVision

The makers of the TouchTown Internet package for seniors are using a Java GSM 06.10 client for low-bandwidth telephony.

linphone

Even if you don't speak French, you can now read about and download linphone, a web-phone application that uses the GSM 06.10 library (with a fresh autoconf Makefile from author Simon Morlat).

JVOIPLIB

Jori Liesenborgs's JVOIPLIB is an LGPL'ed voice-over-IP library written in C++, based on his thesis work. It supports multiple codecs and codec parameters, VoIP session creation and destruction, and 3D effects (!). Jori has just integrated the GSM library and will likely be shipping a subset of the GSM 06.10 release with his next version.

OpenH323

OpenH323 is an Open Source implementation of the ITU H.323 protocol stack which runs on Linux, Windows, Solaris and other Unix platforms. The OpenH323 client sample code can interoperate with NetMeeting in audio mode, and can receive H.261 format video. The GSM codec is the standard codec used by Linux implementations where G.723.1 hardware is not available.

Patches to the SOund eXchange tool, sox

Andrew Pam (avatar@aus.xanadu.com) has patched Lance Norskog's sox program to work with the GSM library.  I wish I had thought of that.

Sox-12.16: Son of SOX

Chris Bagwell (you might remember him as maintainer of the Audio File Format FAQ) has snatched maintenance of the cryptic, resourceful Unix tool sox from its original author, Lance Norskog. Version 12.17 supports GSM and WAV#49.

Pulse Entertainment's 3d web animation plugin

Pulse3d is streaming GSM 06.10 audio to its real-time animated characters, along with the lip sync and body animation information that makes them come to life.

HotFoon

People with friends in Hyderabad, India, are in luck; hotfoon is offering a (so far) free gateway service to numbers in the local area there. Their small, free client also serves as a gateway to an online chat system; as usual, if you and a friend both download the client and have full-duplex sound cards and a reasonably fast Internet connection, you can talk for free across the Internet, no matter where you are.

ATR-ITL

Somewhere towards the tail fin of the Japanese-English telephone "babelfish" that the Advanced Telecommunications Research group's Interpreting Telecommunications Research Laboratories are trying to build, a GSM 06.10 codec is one of the options available for encoding the translated utterances.

The Audiograph Lecture Recorder and Player

The University of Surrey, UK, and Massey University, NZ, have developed a Mac-based authoring system and Windows/Mac Netscape plugin software for voice- and drawing-annotated slide shows; they now distribute it through www.nzedsoft.com. The viewers are free; version 1.2 of the authoring tool used to cost money, but is now free as well.

NTT's "InterSpace" Virtual Environment

The Virtual Campus of NTT's InterSpace project combines videoconferencing with 3D graphics and, recently added, an audio chat facility that uses our library. The site's entrance graphics show rendered avatars whose heads are replaced by video screens rendered into the scenery, rather ingeniously close to the Snow Crash ideal.

Vosaic streaming audio applet

In the long term, the young Illinois startup Vosaic tries to compete with Progressive Networks in the streaming video market. Right now, they're showing a GSM-streaming Java applet based on Avneesh Pant's work.

FreedomAudio - Streaming Audio Player

Rolande Kendal has written FreedomAudio, a beautifully minimalistic set of controls that is free for non-commercial use. The FreedomAudio Java applet can be used with a Java or JavaScript user interface and supports MS WAV #49 GSM by default; plain GSM is available on request.

1-Step Audio Publisher Version 2.x

Noël Bouchard's GSM player/converter for Windows supports plain WAV, Sun AU, GSM 6.10 as understood by toast, WAV #49 GSM, and TrueSpeech.

GSM to WAV, the second

Bill Neisius (neisius@netcom.com) sent me email about a GSM-to-WAV converter and Web client he wrote a while ago. Soon after it arrived, the email fell prey to a temporary shortage of disk space on our system; it didn't get deleted, it just got written to the Place Where I Never Look. Well, I looked there just now, and if you're lucky enough to be able to access Bill Neisius' ftp directory at netcom, you might find lots of interesting sound applications there, some of which convert GSM to plain WAV.

QuickView, the DOS based multimedia viewer

Version 2.3 of QuickView supports GSM 06.10 and a host of other video and audio formats. The viewer is shareware that comes with a three-week free evaluation period; if you're interested in licensing the libraries or building custom viewers, contact Wolfgang Hesseler at qv@multimediaware.com.

Gir: A realtime player for Amiga OS

Sinisa Kesic has developed a small realtime player for Amiga OS named "Gir"; it comes with a browser-like interface for playing music locally or from the net. Included in the package (which can be found in tcp/Gir??.lha in your local Aminet archive) are tools for converting between Amiga raw 8-bit iff samples and GSM, and a "littlegir" plugin for webbrowsers.

XAnim

Mark Podlipec has integrated support for GSM audio into his XAnim, an animation, video, and audio player running under X on Unix and VMS.

SoftFone

And yet another product starts its description with ``Now you can...'', as if IGP, WebPhone, DigiPhone, CyberPhone, and whatever they are called had never happened. SilverSoft's SoftFone shines with a built-in answering machine, voice mail, and variable rate compression; other than that, it's the usual full duplex point-to-point Internet phone deal.

IVS

Thierry Turletti's INRIA Videoconferencing System transmits video and audio data between camera-equipped Unix workstations on the Internet. It supports a number of different audio codecs (among them GSM 06.10) and an H.261 video codec that is packetized with the increasingly popular RTP.

V-Fone

Bob Summers brings us V-Fone, a flexible, low-end videoconferencing application for PCs running Windows '95. (It seems to be point-to-point right now, with broadcast just around the corner.)

The Internet Party Line

Intel's experimental Windows application is no longer supported by its creators, but some of its users still distribute and use the binaries. As the name suggests, this is a real-time, multi-party audio chat via the Internet. Closely modeled after text chats like IRC, the application queues each speaker's statements separately and plays them serially, allowing any person to talk at any time without interrupting the others. Any Internet-connected PC can become a server; both client and server binaries are publicly available.

CyberPhone

(I guess someone had to come up with that name...) Matt Krokosz and Greg Foglesong present version 2 of CyberPhone, an Internet phone application that runs on Sun workstations and is being ported to Linux PCs. The system comes with an (optional) user directory service running on magenta.com; the full version costs $20, the demo (with 2 minutes of connect time through the central server only) is free.

Speak Freely

Brian C Wiles has been breathing new life into John Walker's Speak Freely, an Internet phone that runs on SGIs, Sun SPARCstation, and (with WINSOCK) on Windows. The tools interoperate seamlessly and can encrypt their voice data streams with IDEA, DES, PGP, and/or a one-time pad. Source code is freely available for both the Unix and Windows release. Version 8.0, now in beta under Windows, features a multipoint conference mode, answering machine messages, and easier interoperation with ICQ.

PCS 1.0 (?)

This isn't really an application, but there is, or used to be, a strongly Intel-influenced industry consortium called the Personal Conferencing Working Group (PCWG) which defined something called the Personal Conferencing Specification (PCS) - yet another desktop video conferencing infrastructure - and, according to Leigh Anne Rettinger's thesis, the first version of it included GSM audio compression. I can't find a trace of these people after 1997; if anyone knows the story of what happened to them, send me email.

xztalk, ztalk

The Linux ``xztalk'' by Liem Bahneman (roland@cac.washington.edu) and Andy Burnett (burnett@baldrick.cecer.army.mil) is based on Scott ``This is so incredibly alpha, it isn't funny'' Doty (scott@cs.santarosa.edu)'s extended version of misch@elara.fsag.de's ``mtalk''.  W. Richard Jhang (feinmann@cs.mcgill.ca)'s ztalk is also a descendant of Scott Doty's release; I don't know whether xztalk used ztalk, or whether both were developed independently.  Contact your friendly sunsite mirror for details.

erikyyyphone

Named after the author's IRC nick, ericyyyphone is a GPL-licensed audio conferencing application written in C++, running on Linux.

Microsoft NT and Windows 95 (beta)

Microsoft's Audio Compression Manager includes a GSM 6.10 CODEC (in addition to those for ADPCM, IMA ADPCM, the DSP Group's TrueSpeech(TM), and a PCM converter). The Windows 95 beta added CCITT G.711 u- and A-law CODECs to the collection. Microsoft's GSM 06.10 CODEC is not compatible with toast's frame format: they pack two frames into 65 bytes (2 x 32 1/2) rather than rounding each frame up to 33 bytes, and they number the bits in their bytes from the other end. (Well done, guys.)
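The difference in framing can be made concrete. Two toast frames take 2 x 33 = 66 bytes, while two 260-bit frames packed back to back fill exactly 520 bits, or 65 bytes. The sketch below assumes, for illustration, that toast writes fields most significant bit first and that the Microsoft format writes them least significant bit first; the bit writer and the function names are hypothetical, and the actual order of the fields inside either frame is not modeled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bit writer that fills bytes either from the top bit
 * down (msb_first != 0) or from the bottom bit up (msb_first == 0). */
struct bitwriter {
    unsigned char *buf;
    size_t bitpos;      /* number of bits written so far */
    int msb_first;
};

/* Append the low `nbits` bits of `value`. */
void bw_put(struct bitwriter *w, unsigned value, int nbits)
{
    assert(nbits > 0 && nbits <= 16);
    for (int i = 0; i < nbits; i++) {
        /* MSB-first emits the field's top bit first, LSB-first its bottom bit. */
        unsigned bit = w->msb_first ? (value >> (nbits - 1 - i)) & 1u
                                    : (value >> i) & 1u;
        size_t byte = w->bitpos / 8;
        unsigned off = (unsigned)(w->bitpos % 8);
        unsigned shift = w->msb_first ? 7u - off : off;

        if (off == 0)
            w->buf[byte] = 0;           /* starting a fresh byte */
        w->buf[byte] |= (unsigned char)(bit << shift);
        w->bitpos++;
    }
}

/* Bytes for `frames` frames in toast's format: 33 bytes each. */
size_t toast_bytes(size_t frames)
{
    return frames * 33;
}

/* Bytes for `frames` frames packed in pairs: 2 x 260 bits = 65 bytes. */
size_t wav49_bytes(size_t frames)
{
    assert(frames % 2 == 0);            /* only whole pairs are modeled */
    return frames / 2 * 65;
}
```

Fed the same two fields, a 4-bit 0xD followed by a 6-bit 0x3F, the MSB-first writer produces the bytes 0xDF 0xC0 and the LSB-first writer 0xFD 0x03, which is why a decoder for one framing reads garbage from the other.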

SoundApp for Macs

Norman Franke's SoundApp plays as many audio formats on the Mac as he could get his hands on, among them GSM 06.10 (both ours and Microsoft's). Keeping with the flexible theme, the application has been translated into Japanese, French, and Swedish.

WebbWatch for Windows

WebbWatch for Windows, by Daniel Ding, turns a Pentium with VideoBlaster-compatible capture card and the usual sound support into a video phone. The video codec does H.261's QCIF, the audio is GSM or ADPCM.

VidCall from M R A Associates, Inc.

VidCall is a video and audio player and recorder, combined with a multipoint shared clipboard application, for Windows. It uses the aforementioned Microsoft GSM 06.10 CODEC for its audio; if you can't find yours, VidCall's public ftp directory has a replacement zipfile with an MSGSM610.ACM. The software is distributed via the Internet; restricted use during a 30-day evaluation period is free.

Internet Global Phone

Around December 1994, a company called microWonders, Inc., released source code for a GSM-using tool called ``Internet Global Phone'' and publicised the event with a press release that suggested I was distributing their tool. (Longer version.)

The Internet Multicasting Service

The Internet Multicasting Service has been broadcasting audio on the Internet for more than two years, starting with the ``Geek of the Week'' program in March 1993.  In addition to its original .au format, it now supports .ra (Real Audio) as well as .gsm.

vat - LBNL Audio Conferencing Tool

Vat was developed by the Lawrence Berkeley National Laboratory's Research Group. It is part of a whole set of tcl/tk applications grouped around IP multicasting on the MBONE (but functional without it). With the most recent 4.0 alpha release, source code is finally available; so are, as before, binary distributions for most Unix platforms.

NVAT - Network Video Audio Tool

NEC Corporation's NVAT implements video and audio conferencing with less than 64 Kbps bandwidth on PCs. To receive and send video, you'll need an i486 DX4/100MHz or Pentium 75MHz or faster running Windows NT, with at least 32 MB of memory, a VGA video adapter with at least 256 colors (it's faster with 65,536), a SoundBlaster16 card or similar, and a video card that works with the Video for Windows API. Some of these requirements can be dropped if video is only received, not sent. The tool is compatible with versions of the Unix-based nv and vat, and can receive MBONE broadcasts. The binary-only alpha release is free for research and evaluation purposes.

Nevot 3.34 (December 22nd, 1995)

Henning Schulzrinne's network voice terminal program NeVoT provides packet-voice communications across internetworks. It operates in either unicast, simulated multicast, or IP multicast environments, using the vat or RTP protocols.

NetPhone, DigiPhone, Digifone, e-Phone.

San Francisco's Electric Magic Company has renamed their NetPhone Internet phone application to e-Phone; the name NetPhone was already taken. They apparently plan to continue changing the name to Digifone with the Macintosh release, which is very likely a typo for DigiPhone, whose vendor Third Planet Publishing claims they bought e-Phone in 1995, which would be before the name change.

WebPhone

NetSpeak Corporation, formerly the "Internet Telephone Company," has released version 4.02 of their WebPhone application. The application, which requires a PC running Windows 3.1 or higher and an MCI-compliant sound card, supports TrueSpeech(TM), G.723.1, G.711 (that's 8 kHz u-law), and full-rate GSM.

CU-SeeMe

I have been told that CU-SeeMe for Macintosh computers supports GSM encoding in some manner. The Web resources list a mysterious new 16 kb/s encoding that ``should work over a 14.4 line'' (the incredibly shrinking compression method!), but I don't know anything specific.

InPerson

SGI's multimedia conferencing tool (technical details) offers GSM 06.10 encoding as one of six options for the audio stream. (I do not know who wrote their codec.) The software requires an SGI workstation with at least 32 MB RAM running IRIX 5.2 or above, and can be downloaded via ftp for a free 30-day trial.

UnReal Audio

As a take-off on RealAudio (see below), Roman Mitnitski (mitnits@shani.net) has implemented a simple real-time server/client for Linux based on the GSM library. The pre-release is freely available via ftp, but you'll need to bring your own XForms library and GSM library.

PowWow

Collaborative browsing is the speciality of Tribal Software's Windows-only PowWow tool. Users chat through text and, if their hardware and Internet connection allows, on a 14,400 bps voice line.

MBONE protocols for Windows

Precept Software, a Palo Alto startup, sells software that supports the dominant Internet conferencing protocols RTP and RTCP. (RSVP is being worked on.) Together with access to an MBONE-connected Internet host and Precept's H.261 codec, that's enough to both play data from the MBONE and broadcast to it. These capabilities are sold both as ``stand-alone'' software and as a convenient library for developers wishing to integrate multimedia Internet communication into their applications.


Other audio compression applications

Voxware

The Princeton, NJ startup Voxware specializes in vocoder software that, according to their own descriptions, allows for stunning compression rates without requiring dedicated hardware. They offer a complete product palette of speech codec applications, from the inevitable Internet telephone over browser plugins to a voice parameter editing system.

RealAudio

The selling point of Progressive Networks' RealAudio is not its format, but its flow control: the streams start playing immediately. Users don't have to wait for the whole document to arrive, and they can interact with the data stream (jump to different tracks, change channels).

Internet Phone

VocalTec Inc., from Northvale, NJ, sells an Internet Phone application for PCs that uses some unspecified ``unique voice compression algorithm'' to compress down to about 7.7 kbit/s, and, if you buy their compression card, even down to 6.72 kbit/s. Because of the attention the company paid to community infrastructure (initially leading to the demise of the IRC servers they used to support their directories), Internet Phone has become a ``scene'' similar to that of IRC.

Internet Wave

Internet Wave is VocalTec's answer to RealAudio. Unlike RealAudio, which consistently operates at a bandwidth of around 14,400 bps, VocalTec's system supports four different audio qualities, at bandwidths between 9,600 and 28,800 bps. Will VocalTec's existing market base and the possibly better sound quality suffice to dislodge RealAudio from its already established market position? And will either of them manage to move my headphone plug from its established position in the CD player? Not as long as there's Shriekback on, it won't. Stay tuned.


Half-rate GSM and EFR

Enhanced Full-Rate GSM

On November 4th, Nokia announced that the EFR (enhanced full rate) codec they had been developing with the University of Sherbrooke, Canada, had been chosen by the ETSI as the industry standard codec for GSM/DCS. Additionally, the US PCS 1900 operators have also moved to EFR. It's supposed to have ``landline quality,'' be ``more robust to non-voice signals such as music'' and more resilient to ``environments with excess background noise''. Anyone know more about this?

Half-Rate GSM

According to an article posted to comp.dsp by Texas Instruments' Mansoor Chishtie,
GSM half-rate is now a standard. It is based on Motorola's VSELP technology similar to IS-54 full-rate. It compresses speech at 5.6kbps using two 7-bit codebooks for unvoiced speech and one 9-bit codebook for voiced segments.
The draft prETS 300 581-2 (GSM 06.20 Version 4.0.0) is the mathematical description of half-rate GSM.

So, how complex is it?

Good question. According to a posting to comp.dsp from Feb 18 1995 by Chris Cavigioli, back then of Analog Devices, Inc., they have ``joined Alcatel Radiotelephone, Nokia, and Italtel-SIT in a sub-group to evaluate the complexity (MIPs and memory) required of typical 16-bit DSPs, based on bit-exact ANSI C programs supplied by Motorola and ANT Bosch (the two final codec candidates)''; their results have been published in three places:
  1. DSPx '94 Proceedings (theoretical worst case complexity)
  2. DSPWorld '94 ... also known as ICSPAT '94 Proceedings (avg complexity)
  3. Wireless Symposium '95 Proceedings (compare ETSI vs. ADI DSP complexity)
Analog Devices have ``implemented the GSM half-rate standard in DSP assembly code, running in real-time, and meeting the ETSI delay specifications.'' 

(Of course, this says very little about what will be possible in non-DSP software.)

In the proceedings of September's EUROSPEECH'95 in Madrid, Tim Fingscheidt, T. Wiechers and E. Delfs have published a paper on ``Implementation Aspects of the GSM Half-Rate Speech Codec'' (pp. 723/726). Tim, whose group implemented a half-rate codec for the NEC muPD77018 based on the 06.06 source code, estimates the complexity of the half-rate codec at 4-6 times that of the full-rate version.

GSM 06.06: sourcecode for GSM 06.20

GSM 06.06 is ANSI C source code for a half-rate codec. Its public review period started on April 10th, 1995. ``Public review'' means that it is for sale as
draft ETS 300 581-7
from the ETSI sales department,
Ms. Anja Mulder
+33 92 94 42 58 (voice)
+33 93 95 81 33 (fax)

At the moment, it doesn't seem as if we're going to implement GSM 06.06 here.

The test patterns for GSM 06.06, GSM 06.07, will become draft ETS 300 581-8, and lag the source code by about two weeks.  06.42 is half-rate voice activity detection, 06.22 comfort noise.

GSM 06.06, lesson I: the bait and switch

When a colleague recently told me he had ordered draft ETS 300 581-7 and would lend me his copy, I was looking forward to examining the code in GSM 06.06 and judging its complexity myself.  What he actually received from ETSI was only a call hierarchy of the functions in the code; the diskette listed as ``attached to the back cover'' was missing.  When he complained, ETSI told him that he could buy the actual source code for 1000 ECU (1 ECU = US$ 1.28 or DM 1.85).

Mind you, this is probably not malice: I hear that the electronic-only distribution format is still new to the ETSI bureaucracy, and they likely have trouble adapting their page-based pricing scheme to it.  Nevertheless, if you or your company order ETS 300 581-7, make clear that you want the source code, and have the person at the other end quote the price for the code; don't buy expensive but useless cross-reference listings.

[Update: Currently, ETSI documents - including the source code mentioned above - are available free of charge over the net. Go to http://webapp.etsi.org/pda/.]


Miscellaneous

Books

On digital speech processing, I recommend

For a well-written, interesting, 100% jargon-free introduction to language, speech, and the mind, see

The book on GSM in general is self-published and can only be ordered from the authors.

Introductions and Demos

A set of introductory DSP classes is online at http://www.bores.com/courses/intro.

If you're learning about digital speech processing, visit Phil Karn's Digital/Analog Voice Demo at Qualcomm. Using mu-law sound samples, Phil takes you from an original recording to a band-pass filtered version, to one with added noise, to a GSM version, a CELP-encoded version, Qualcomm's proprietary QCELP-encoded version at two different data rates, and an LPC-10 version, complete with running commentary on each encoding.

CELP source code sighted

Rick Ross found a set of speech compression engines at CMU, featuring a prehistoric version of GSM, an LPC coder, CCITT ADPCM, and various *ELPs that I haven't seen anywhere else.

Samples

The Vincent Voice Library at Michigan State University houses taped utterances of over 50,000 people, recorded over 100 years.  In addition to its standard set of 20 8-kHz mu-law samples (among them Isaac Asimov at MSU on writing books, 2:39), the site currently features an exhibition of voices of presidents from Grover Cleveland to Bill Clinton.
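The Vincent Voice Library samples, like those in Phil Karn's demo, are 8 kHz mu-law files, but a GSM 06.10 encoder expects 16-bit linear PCM. Converting between the two is the standard G.711 mu-law expansion. What follows is a minimal sketch of that expansion. It is not code taken from any of the sites above, and the function names are made up for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Expand one 8-bit G.711 mu-law byte to a 16-bit linear PCM sample. */
static int16_t ulaw_to_linear(uint8_t u)
{
    int sign, exponent, mantissa, sample;

    u = (uint8_t)~u;                 /* mu-law bytes are stored bit-inverted */
    sign     = u & 0x80;
    exponent = (u >> 4) & 0x07;      /* 3-bit segment number */
    mantissa = u & 0x0F;             /* 4-bit step within the segment */

    /* Rebuild the magnitude with the bias 0x84 (132) added, shift it
       into its segment, then subtract the bias again. */
    sample = (((mantissa << 3) + 0x84) << exponent) - 0x84;
    return (int16_t)(sign ? -sample : sample);
}

/* Convert n mu-law bytes (e.g. one 160-sample frame) into linear samples
   that could be fed to a GSM 06.10 encoder. */
static void ulaw_frame_to_linear(const uint8_t *in, int16_t *out, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        out[i] = ulaw_to_linear(in[i]);
}
```

The largest magnitude this expansion produces is 32124, so the result always fits in a 16-bit sample.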

The inexhaustible Jennifer Myers maintains a list of sites with audio clips.

A short tutorial on Cursing in Swedish is richly illustrated with wave file samples.

Sound applications on the World-Wide Web

Periodical sounds: the Weekly Idiom from the Comenius Group.

The amateur radio club of the US National Institutes of Health broadcasts traffic from the Listening Post as UDP packets containing GSM 06.10 audio. (To receive it, you'll need a UDP-based GSM 06.10 audio decoder such as Speak Freely.)
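This page does not describe the packet layout that the Listening Post and Speak Freely use. The following is therefore only a sketch under a loudly stated assumption: each UDP payload is taken to consist of back-to-back 33-byte (264-bit) GSM 06.10 frames of the kind our library produces, with no header of its own. The sketch checks each frame's 4-bit signature, which libgsm's packed format places as 0xD in the high nibble of the first byte. It then hands each frame to a caller-supplied function; in a real receiver, that function would call gsm_decode() to turn the frame into 160 linear samples. The names split_gsm_payload and frame_handler are hypothetical.

```c
#include <stddef.h>

#define GSM_FRAME_BYTES 33   /* one packed GSM 06.10 frame: 264 bits */
#define GSM_FRAME_SIG   0xD  /* 4-bit signature in the high nibble of byte 0 */

/* Hypothetical callback: receives one packed 33-byte frame. */
typedef void (*frame_handler)(const unsigned char *frame, void *ctx);

/* Split a UDP payload, ASSUMED to hold nothing but back-to-back packed
   frames, into individual frames.  The whole payload is validated first,
   so the handler is never called for a malformed packet.  Returns the
   number of frames, or -1 if the length is not a multiple of 33 bytes or
   a frame lacks the signature.  A NULL handler only validates and counts. */
static int split_gsm_payload(const unsigned char *payload, size_t len,
                             frame_handler handle, void *ctx)
{
    size_t off;

    if (len % GSM_FRAME_BYTES != 0)
        return -1;
    for (off = 0; off < len; off += GSM_FRAME_BYTES)
        if ((payload[off] >> 4) != GSM_FRAME_SIG)
            return -1;

    if (handle != NULL)
        for (off = 0; off < len; off += GSM_FRAME_BYTES)
            handle(payload + off, ctx);   /* e.g. wrap gsm_decode() here */

    return (int)(len / GSM_FRAME_BYTES);
}
```

Validating the whole packet before decoding anything means that a truncated or foreign packet is dropped as a unit. The alternative, playing half of it, would leave the decoder state out of step with the sender.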

Jeff Chilton's Shortwave Radio gives you access to the last 5 or 15 seconds from a user-selected frequency, as received in Reston, Virginia, USA.

``Bluedog can count!''

Voices is a Web interface to AT&T's text-to-speech engine. You can type in sentences of up to 40 words and hear them spoken; a second, more detailed interface lets you customize the pitch, head size, word rate, and aspiration of the generated sample.

Research

Voice Synthesizers On The Verge of a Nervous Breakdown: in 1989, Janet Cahn wrote her thesis at the MIT Media Lab about Expressive Synthesized Speech, that is, how to make voice synthesizers express emotions.  The three sound samples she has online, three different sentences synthesized in ten tones expressing anything from impatience through anger to depression, are still hilarious to listen to.

ASTeR, T.V. Raman's audio rendering system for mathematics, uses parameters of synthesized speech to indicate nesting and dependencies within formulas, culminating in an impressive 66-second audio rendering of Faà di Bruno's formula (Knuth Vol. 1, bottom of page 50).
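For readers without Knuth at hand: Faà di Bruno's formula generalizes the chain rule to the n-th derivative of a composite function. Knuth typesets it in his own notation; the following is an equivalent, commonly quoted form, not a transcription of his page:

```latex
\frac{d^n}{dx^n} f\bigl(g(x)\bigr)
  = \sum \frac{n!}{m_1!\,m_2!\cdots m_n!}\;
    f^{(m_1+m_2+\cdots+m_n)}\bigl(g(x)\bigr)
    \prod_{j=1}^{n}\left(\frac{g^{(j)}(x)}{j!}\right)^{m_j},
\qquad \text{summed over all } m_1,\dots,m_n \ge 0 \text{ with } \sum_{j=1}^{n} j\,m_j = n.
```

For n = 2, the two admissible tuples (m_1, m_2) = (2, 0) and (0, 1) give the familiar f''(g(x)) g'(x)^2 + f'(g(x)) g''(x).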

The European Speech Communication Association (ESCA) keeps a page of links to speech research institutions all over the world.  (You might want to turn off image loading for this page; every link is illustrated with the institution's logo.)

Fun

Proceed to the next stage of collaborative technology with the RealAroma server and, uh, plug-ins.

If you want to learn more about sounds, why not pay a visit to the San Francisco Exploratorium and its duck call vowels?

(If, conversely, you want to hear more about toasters, I recommend Patrick R. Michaud's report on Strawberry Pop-Tart Blow-Torches.)

The final word on telephone sex.


Indices

The maintainers of the following sites try to offer comprehensive and complete indices into their respective subjects; the documents should be large enough to get you within a few hops of your topic quickly.

Multimedia

Simon Gibbs' Index to Multimedia Information Sources
A long no-frills list of Multimedia links, with archives, standards, companies, research organisations, conference announcements, tutorial-type material, and FAQs.

Internet telecommunication

Audio and Video via the Internet from Jack Decker.
A long list of links sorted by type of application or institution: audio players; distributors of audio media; two-way audio; two-way audio/video; hardware; and miscellaneous links.
Voice/Video on the Net
Where the previous site has short descriptions, Jeff Pulver's restricts itself to links, but it is embedded in a rich subtree of media-related information that makes up for the main page's brevity. The NetWatch archives in particular have short notes and updates on what must be every PC Internet audio tool in existence.
How Do I Use the Internet as a Telephone?, from Kevin Savetz and Andrew Sears
The FAQ opens with a quick, informal introduction to the basic concepts of Internet telephony, strictly from a user's perspective; most of it is taken up by short reviews of Internet telephony products, grouped by platform.

Speech Processing

Andrew Hunt's comp.speech WWW site (Cambridge mirror)
The site's hypertext version of the comp.speech Frequently Asked Questions posting has pointers to general information and tools concerned with speech encoding, compression, recognition, synthesis, and other forms of natural language processing.

Jason Woodard's descriptions of Speech Codecs

Rather than pointing to every speech processing gizmo in existence, this subtree explains principles and formats, and gives crucial software and theory references, for three general classes of speech codecs and the most important standards.

Digital signal processing

The comp.dsp Frequently Asked Questions list
Questions, answers, and resources for general digital signal processing.
Josip Juric's DSP homepage
collects the FAQ and a number of other pointers to DSP resources, among them Guido van Rossum's Audio File Format FAQ and Appendix from comp.dsp.

Compression

The comp.compression Frequently Asked Questions list
explains, and often provides references to software that implements, most lossy and lossless algorithms. The hypertext FAQ archived at Ohio State University looks just like the ASCII FAQ, but has been broken up and links directly to referenced documents where possible.

Telecommunication

Telecommunication resources
from Australia's Telstra is a big page of commented links.

Telecom Information Resources from Jef MacKie-Mason,

a searchable list of references to technical, economic, public policy, and social aspects of telecommunications.

Telecommunication sites from John Scourias.

John is the author of the excellent overview referenced elsewhere on this page; this is his telecommunication hotlist.

Digital mobile telephony

Simon Hewison's FAQ on Digital Mobile Phones
lists service providers and manufacturers of digital cellular phones. If you're trying to find the difference between, say, GSM and PCN, or want to know exactly how many mutually incompatible CT2 networks there were in the UK, this is the place to look.

GSM

Jürgen Morhöfer's GSM List,
last updated on Sep 22nd 2000, lists GSM operators with network code and customer service phone number, sorted by country.

Supercall Cellular,

a South African provider, maintains a page of links to general information about GSM, including codes, networks, coverage maps for Europe -- and a request for submissions of scanned-in SIMs.


jutta@pobox.com, July 2000.   Comments and corrections are welcome.