abnfgen

In memoriam Peter Naur
October 25 1928 - January 3 2016
and John Backus
December 3 1924 - March 17 2007

What is it?

Given an Augmented Backus-Naur form (ABNF) grammar (Wikipedia, RFC 4234), abnfgen quickly generates lots of random documents that match that grammar.

$ cat GRAMMAR
ring = 1*12("ding" SP) "dong" CRLF
$ ./abnfgen GRAMMAR
DInG DiNg doNg

That's useful for

  • verifying an ABNF grammar's syntactical correctness
  • illustrating a grammar with non-obvious examples
  • testing software that is supposed to consume documents described by a grammar

The tool is in alpha and, while copyrighted, free for any use.

Download

abnfgen-0.20.tar.gz is the current version. You'll need a C compiler and a Unix-like environment to build it.

Documentation

Unix-style man page for abnfgen(1).
For a quickreference, run abnfgen -h.

Differences to abnf

The input grammar accepted by abnfgen differs from ABNF in three ways:

'abc'   In ABNF, single quotes have no special meaning. With abnfgen input grammars, 'abc' means abc exactly as written, without upper/lower case options. I like case-insensitive protocols as much as the next person, but if I want to generate some spcific text, and I can't write it any other way than as hex, decimal, or binary numbers, the results will be unpleasant to read, write, and maintain.
(Some years after I wrote the preceding paragraph, the standards community has added a way of spelling case-insensitive literals using the syntax of %s"abc". If you want to stay conforming, use that and mention that you require support for RFC 7405.)

{N}   In ABNF, curly braces have no special meaning, either. With abnfgen, a number in curly braces in front of an alternative controls the chance that that branch is taken. The default value is 1. So, with coin = {2} heads / tails, the toss will come out "heads" in two cases out of three. This works for repetition branches, too; the branch to the left of the * is the shorter one, to the right, the longer. (If you're trying to make longer strings, you may also want to increase the maximum recursion depth using the -y runtime option.)

UTF-8   Numeric character codes (e.g. %d13) are interpreted as Unicode and emitted as UTF-8. (ABNF is wisely silent on its character encoding.) In retrospect, that's a bit of a hack. It'll come out the same for 7-bit ASCII, and make things easier if what you're trying to generate is UTF-8; otherwise, complain to me.

The "legal" command-line flag, -l, turns all of this off.

Author & Homepage

Jutta Degener <jutta@pobox.com> wrote abnfgen and maintains it.
Its homepage is at http://www.quut.com/abnfgen/.

Recent Changes

The release was last updated April 8th, 2018.

0.20   Parenthesize macro arguments in expressions. (Yikes!)

0.19   Forgot to increment the version number for 0.18.

0.18   Fix the usage message. Thanks to Tony R. for the improved version.

0.17   Add support for RFC 7405 (%s"foo"); new option -7 turns it off. Thanks to Martin Bjorklund for sending me a patch!

0.16   New option -l ("legal") turns off extensions.

0.15   Handle EOF in '', "", and <> as an error, rather than looping forever.

0.14   Inside [..] or 0*N(..), abnfgen didn't always check resolvability prior to expansion.

0.13   Randomly expand [foo] as either foo or nothing. Up to version 0.12, abnfgen always picked foo if there was room for it.

0.12   Don't let "%d12-%d14" through - the correct syntax is "%d12-14".

0.11   Don't allow "_" in identifier names unless the new "-_" option is specified.

0.10   Don't treat any single count in front of a rule as if it were 1.
Document -u ("don't understand prose") option.

0.9   ABNF doesn't allow spaces in N*Mexpr; I did. Tighten the restrictions to make it easier for ABNF authors that want to check compliance.
Pre-load the core syntax for ALPHA BIT CHAR CR CRLF CTL DIGIT DQUOTE HEXDIG HTAB LF LWSP OCTET SP VCHAR WSP

0.8  Don't go into infinite loops when expanding grammars with recursive rules.

Other ABNF Software

Test case generators
Stéphane Bortzmeyer's Eusthatius is written in Haskell.

Parser generators
Lowell D. Thomas of Coast to Coast Research wrote APG, the ABNF parser generator. It produces a C++ class, given a grammar.
Tanaka Akira has a converter for ABNF to regexp in Ruby - for those instances where that's possible.

Verifiers
There's Bill Fenner's ABNF checker (for cut-and-pasted grammar) and
Chris Newman's abnf.c, a widely used validator (here's its cut-and-paste frontend).

Extractors
Bill Fenner's ABNF extractor, a small perl script that tries to guess which part of an Internet Draft or RFC is the ABNF; it too has a web frontend.

Test case #1.
Test case #2.
Test case #3.
Test case #4.
Reality.

Demo: applying abnfgen to itself.

1. Download abnfgen.

That should leave you with a file abnfgen-0.20.tar.gz in a directory you control, for example /tmp.

2. Compile a local copy.

Decompress, untar, configure, and make:
$ gunzip < abnfgen-0.20.tar.gz | tar xf -
$ cd abnfgen-0.20
$ ./configure
creating cache ./config.cache
checking for a BSD compatible install... /usr/bin/install -c
checking whether build environment is sane... yes
checking whether make sets ${MAKE}... yes
checking for working aclocal... found
checking for working autoconf... found
checking for working automake... found
checking for working autoheader... found
checking for working makeinfo... missing
checking for gcc... gcc
checking whether the C compiler (gcc ) works... yes
checking whether the C compiler (gcc ) is a cross-compiler... no
checking whether we are using GNU C... yes
checking whether gcc accepts -g... yes
checking for a BSD compatible install... /usr/bin/install -c
updating cache ./config.cache
creating ./config.status
creating Makefile
$ make
gcc -DPACKAGE=\"abnfgen\" -DVERSION=\"0.20\" -I. -I. -g -O2 -c abnfgen.c
gcc -DPACKAGE=\"abnfgen\" -DVERSION=\"0.20\" -I. -I. -g -O2 -c argv.c
gcc -DPACKAGE=\"abnfgen\" -DVERSION=\"0.20\" -I. -I. -g -O2 -c check.c
gcc -DPACKAGE=\"abnfgen\" -DVERSION=\"0.20\" -I. -I. -g -O2 -c coverage.c
gcc -DPACKAGE=\"abnfgen\" -DVERSION=\"0.20\" -I. -I. -g -O2 -c expression.c
gcc -DPACKAGE=\"abnfgen\" -DVERSION=\"0.20\" -I. -I. -g -O2 -c hash.c
gcc -DPACKAGE=\"abnfgen\" -DVERSION=\"0.20\" -I. -I. -g -O2 -c input.c
gcc -DPACKAGE=\"abnfgen\" -DVERSION=\"0.20\" -I. -I. -g -O2 -c mem.c
gcc -DPACKAGE=\"abnfgen\" -DVERSION=\"0.20\" -I. -I. -g -O2 -c nonterminal.c
gcc -DPACKAGE=\"abnfgen\" -DVERSION=\"0.20\" -I. -I. -g -O2 -c output.c
gcc -DPACKAGE=\"abnfgen\" -DVERSION=\"0.20\" -I. -I. -g -O2 -c renderchar.c
gcc -DPACKAGE=\"abnfgen\" -DVERSION=\"0.20\" -I. -I. -g -O2 -c scan.c
gcc -DPACKAGE=\"abnfgen\" -DVERSION=\"0.20\" -I. -I. -g -O2 -c symbol.c
gcc -DPACKAGE=\"abnfgen\" -DVERSION=\"0.20\" -I. -I. -g -O2 -c toutf8.c
gcc -g -O2 -o abnfgen abnfgen.o argv.o check.o coverage.o expression.o hash.o input.o mem.o nonterminal.o output.o renderchar.o scan.o symbol.o toutf8.o

3. Find the document that defines your grammar.

The grammar of ABNF itself is defined in RFC 4234. (That used to be 2234, but the authors published a new version in October 2005.) The text of that RFC is available from many servers, among them http://www.ietf.org/rfc/rfc4234.txt at the IETF.

4. Extract an ABNF grammar from the document.

In RFC 4234, the ABNF grammar is in section 4, "ABNF DEFINITION OF ABNF". Save that section in a file (let's call it "section-4.txt") and remove the page headers from the text (the three lines starting with "Crocker & Overell").
That'll get you something that looks like this: section-4.txt

5. Run abnfgen on the grammar.

$ ./abnfgen section-4.txt
;
;

    ;
  k =/ <>/ <>

So, as I'm writing this tutorial, abnfgen generated some empty comments and a rule adding two prose alternatives to a symbol named "k". If you're following along in real life, your case may vary. It's fairly common for the documents generated by abnfgen to be small - the alternatives that do nothing and the alternatives that do something in a grammar usually start out with about even weight.

It's also common for documents generated by abnfgen to be unfit for display on a terminal - many grammars allow control characters that one doesn't see much in practice.

5b ... lots of times.

Now that the basic grammar is working, let's generate 100 test documents.

$ ./abnfgen -d demo -n 100 section-4.txt

The -d option specifies the name of a directory. Abnfgen creates it if it doesn't exist yet. In the directory, there should now exist files demo/0001.tst, demo/0002.tst, and so on up to demo/0100.tst, each with a random ABNF grammar. Deciphering the meaning of these test files is an exercise in itself; it takes a while to recognize

=/8*" ";
604%D29-09
;f

as an ABNF grammar rather than line noise.

6. The test: run abnfgen on the grammars it generated.

The grammars are syntactically correct, but will have lots of semantic errors - missing definitions, definitions for nonterminals that aren't used, and so on. Abnfgen will rarely create output for them. So, for a simple test, all we need to do is run abnfgen with these grammars as input and see whether it crashes.

$ for i in demo/*
> do
> ./abnfgen $i > /dev/null 2>&1
> done
Segmentation fault
Segmentation fault

Well, that's what version 0.1 looked like - since version 0.2, abnfgen no longer crashes very much on its own input - most of the time.

(Two of the randomly generated grammars happened to have an infinite recursion in them; and abnfgen overflowed its stack following them.)