|
abnfgen
|
In memoriam Peter Naur
October 25 1928 - January 3 2016
and John Backus
December 3 1924 - March 17 2007
|
Given an Augmented Backus-Naur form (ABNF) grammar
(Wikipedia,
RFC 4234), abnfgen quickly
generates lots of random documents that match
that grammar.
$ cat GRAMMAR
ring = 1*12("ding" SP) "dong" CRLF
$ ./abnfgen GRAMMAR
DInG DiNg doNg
|
That's useful for
- verifying an ABNF grammar's syntactical correctness
- illustrating a grammar with non-obvious examples
- testing software that is supposed to consume
documents described by a grammar
The tool is in alpha and, while copyrighted,
free for any use.
abnfgen-0.20.tar.gz
is the current version. You'll need
a C compiler and a Unix-like environment
to build it.
Unix-style man page for abnfgen(1).
For a quickreference, run abnfgen -h.
The input grammar accepted by abnfgen differs from
ABNF in three ways:
'abc' In ABNF, single quotes have no special meaning.
With abnfgen input grammars, 'abc'
means abc exactly as written, without upper/lower
case options.
I like case-insensitive protocols as much as the
next person, but if I want to generate some spcific text,
and I can't write it any other way than as hex, decimal, or binary
numbers, the results will be unpleasant to read, write, and maintain.
(Some years after I wrote the preceding paragraph,
the standards community has added a way
of spelling case-insensitive literals using the syntax
of %s"abc". If you
want to stay conforming, use that and mention that
you require support for RFC 7405.)
{N} In ABNF, curly braces have no special
meaning, either. With abnfgen, a number in curly braces in front of an
alternative controls the chance that that branch is taken.
The default value is 1. So, with
coin = {2} heads / tails , the toss will come
out "heads" in two cases out of three.
This works for repetition branches, too; the branch to the
left of the * is the shorter one, to the right, the longer.
(If you're trying to make longer strings, you may also want to
increase the maximum recursion depth using the -y runtime option.)
UTF-8 Numeric character codes (e.g. %d13 )
are interpreted as Unicode and emitted as UTF-8. (ABNF is wisely silent
on its character encoding.)
In retrospect, that's a bit of a hack.
It'll come out the same for 7-bit ASCII, and make things easier
if what you're trying to generate is UTF-8; otherwise,
complain to me.
The "legal" command-line flag, -l , turns all of this off.
Jutta Degener <jutta@pobox.com> wrote
abnfgen and maintains it.
Its homepage is at
http://www.quut.com/abnfgen/.
The release was last updated April 8th, 2018.
0.20 Parenthesize macro arguments in
expressions. (Yikes!)
0.19 Forgot to increment the version
number for 0.18.
0.18 Fix the usage message.
Thanks to Tony R. for the improved version.
0.17 Add support for RFC 7405 (%s"foo");
new option -7 turns it off.
Thanks to Martin Bjorklund for sending me a
patch!
0.16 New option -l ("legal")
turns off extensions.
0.15 Handle EOF in '', "", and <>
as an error, rather than looping forever.
0.14 Inside [..] or 0*N(..), abnfgen
didn't always check resolvability prior to
expansion.
0.13 Randomly expand [foo] as
either foo or nothing. Up to version
0.12, abnfgen always picked foo if
there was room for it.
0.12 Don't let "%d12-%d14" through -
the correct syntax is "%d12-14".
0.11 Don't allow "_" in identifier names
unless the new "-_" option is specified.
0.10 Don't treat any single count
in front of a rule as if it were 1.
Document -u ("don't understand prose") option.
0.9 ABNF doesn't allow spaces in
N*Mexpr; I did. Tighten the restrictions
to make it easier for ABNF authors that want to
check compliance.
Pre-load the core syntax for
ALPHA BIT CHAR CR CRLF CTL DIGIT
DQUOTE HEXDIG HTAB LF LWSP OCTET SP VCHAR WSP
0.8 Don't go into infinite loops when
expanding grammars with recursive rules.
- Test case generators
Stéphane Bortzmeyer's
Eusthatius
is written in Haskell.
Parser generators
Lowell D. Thomas of Coast to Coast Research wrote
APG,
the ABNF parser generator. It produces a C++ class,
given a grammar.
Tanaka Akira has a
converter for ABNF
to regexp in Ruby - for those instances where that's
possible.
Verifiers
There's Bill Fenner's
ABNF checker
(for cut-and-pasted grammar) and
Chris Newman's
abnf.c, a widely used validator (here's
its cut-and-paste frontend).
Extractors
Bill Fenner's
ABNF
extractor, a small perl script that
tries to guess which part of an Internet Draft
or RFC is the ABNF; it too has a web frontend.
 | Test case #1. |
|
 | Test case #2. |
|
 | Test case #3. |
|
 | Test case #4. |
|
 | Reality. |
|
That should leave you with a file
abnfgen-0.20.tar.gz in a directory
you control, for example /tmp.
Decompress, untar, configure, and make:
$ gunzip < abnfgen-0.20.tar.gz | tar xf -
$ cd abnfgen-0.20
$ ./configure
creating cache ./config.cache
checking for a BSD compatible install... /usr/bin/install -c
checking whether build environment is sane... yes
checking whether make sets ${MAKE}... yes
checking for working aclocal... found
checking for working autoconf... found
checking for working automake... found
checking for working autoheader... found
checking for working makeinfo... missing
checking for gcc... gcc
checking whether the C compiler (gcc ) works... yes
checking whether the C compiler (gcc ) is a cross-compiler... no
checking whether we are using GNU C... yes
checking whether gcc accepts -g... yes
checking for a BSD compatible install... /usr/bin/install -c
updating cache ./config.cache
creating ./config.status
creating Makefile
$ make
gcc -DPACKAGE=\"abnfgen\" -DVERSION=\"0.20\" -I. -I. -g -O2 -c abnfgen.c
gcc -DPACKAGE=\"abnfgen\" -DVERSION=\"0.20\" -I. -I. -g -O2 -c argv.c
gcc -DPACKAGE=\"abnfgen\" -DVERSION=\"0.20\" -I. -I. -g -O2 -c check.c
gcc -DPACKAGE=\"abnfgen\" -DVERSION=\"0.20\" -I. -I. -g -O2 -c coverage.c
gcc -DPACKAGE=\"abnfgen\" -DVERSION=\"0.20\" -I. -I. -g -O2 -c expression.c
gcc -DPACKAGE=\"abnfgen\" -DVERSION=\"0.20\" -I. -I. -g -O2 -c hash.c
gcc -DPACKAGE=\"abnfgen\" -DVERSION=\"0.20\" -I. -I. -g -O2 -c input.c
gcc -DPACKAGE=\"abnfgen\" -DVERSION=\"0.20\" -I. -I. -g -O2 -c mem.c
gcc -DPACKAGE=\"abnfgen\" -DVERSION=\"0.20\" -I. -I. -g -O2 -c nonterminal.c
gcc -DPACKAGE=\"abnfgen\" -DVERSION=\"0.20\" -I. -I. -g -O2 -c output.c
gcc -DPACKAGE=\"abnfgen\" -DVERSION=\"0.20\" -I. -I. -g -O2 -c renderchar.c
gcc -DPACKAGE=\"abnfgen\" -DVERSION=\"0.20\" -I. -I. -g -O2 -c scan.c
gcc -DPACKAGE=\"abnfgen\" -DVERSION=\"0.20\" -I. -I. -g -O2 -c symbol.c
gcc -DPACKAGE=\"abnfgen\" -DVERSION=\"0.20\" -I. -I. -g -O2 -c toutf8.c
gcc -g -O2 -o abnfgen abnfgen.o argv.o check.o coverage.o expression.o hash.o input.o mem.o nonterminal.o output.o renderchar.o scan.o symbol.o toutf8.o
The grammar of ABNF itself is defined
in RFC 4234. (That used to be 2234, but
the authors published a new version in October
2005.) The text of that RFC is available
from many servers, among them http://www.ietf.org/rfc/rfc4234.txt
at the IETF.
In RFC 4234, the ABNF grammar
is in section 4,
"ABNF DEFINITION
OF ABNF". Save that section in
a file (let's call it "section-4.txt")
and remove the page headers
from the text (the three lines
starting with "Crocker & Overell").
That'll get you something that looks
like this: section-4.txt
$ ./abnfgen section-4.txt
;
;
;
k =/ <>/ <>
|
So, as I'm writing this tutorial,
abnfgen generated some empty comments and
a rule adding two prose alternatives to a
symbol named "k".
If you're following
along in real life, your case may vary.
It's fairly common for the documents generated
by abnfgen to be small - the alternatives
that do nothing and the alternatives that
do something in a grammar usually start
out with about even weight.
It's also common for documents generated
by abnfgen to be unfit for display on
a terminal - many grammars allow control
characters that one doesn't see much
in practice.
Now that the basic grammar is working,
let's generate 100 test documents.
$ ./abnfgen -d demo -n 100 section-4.txt
|
The -d option specifies the name of a directory.
Abnfgen creates it if it doesn't exist yet.
In the directory, there should now exist
files demo/0001.tst, demo/0002.tst, and so on
up to demo/0100.tst, each with a random
ABNF grammar. Deciphering the meaning of
these test files is an exercise in itself;
it takes a while to recognize
as an ABNF grammar rather than line noise.
The grammars are syntactically
correct, but will have lots of semantic
errors - missing definitions, definitions
for nonterminals that aren't used, and so on.
Abnfgen will rarely create output
for them. So, for a simple test,
all we need to do is run abnfgen with these
grammars as input and see
whether it crashes.
$ for i in demo/*
> do
> ./abnfgen $i > /dev/null 2>&1
> done
Segmentation fault
Segmentation fault
|
Well, that's what version 0.1 looked like -
since version 0.2, abnfgen no longer crashes very much
on its own input - most of the time.
(Two of the randomly generated grammars happened
to have an infinite recursion in them; and abnfgen
overflowed its stack following them.)
|
|