echo "full spelling of an English cardinal numeral less than a quadrillion" | ./kang
echo Nineteen hundred and eighty-four | ./kang echo uno | ./kang echo trois | ./kang echo fier | ./kang echo "shest'" | ./kang
The judges were able to appreciate the Indo-European language family relation by making this entry successfully recognize some French, German, Italian, Russian, and Spanish numerals.
Also worth mentioning is this entry’s ability to understand the colloquial year numbers of the last millennium.
We’ve added a linefeed to the print format for convenience.
This short program reads a spelt number (e.g.
forty-two) and writes a
corresponding decimal number (e.g.
42). Too long for one-liners, alas,
but it still qualifies as a short program as it has less than 0x100 bytes.
It accepts a variety of spelt numbers:
forty twoare same. So does period or comma.
a hundredare same.
one hundred twenty-threeand
one hundred and twenty-threeare same.
billionis 109 and
It does not accept some spelt numbers, which I found mostly irrelevant:
thousandetc. do not work.
one million milliondoes not work. Get used to
This program is quite portable, only requiring the following:
int main(int, int)should be accepted by the linker. (Original version only)
charshould be at least 8 bits long (as dictated by the standard),
intshould be at least 32 bits long,
long longshould be at least 64 bits long.
charshould wrap around, if your
The design of the program explicitly allows for
EOF which does not equal to -1
(it has to be negative per the standard) and both signed and unsigned
for example. Furthermore it is endian-independent.
Many obfuscations used are typical for standard IOCCC entries:
mainfunction are reused as normal variables.
?:ternary operator and
forloops and nothing else.
,) for multiple statements. The number of them is minimized, however, as it is too easy to (ab)use them.
"string"[n]. Both are fine for this program but I went to the former just for fun.
Other obfuscations are more subtle:
"1+DIY/.K430x9G(kC["is 18 bytes long, but actually 19 bytes including the final null character are used.
two hundred and three, it actually writes 0x203 as hexadecimal.
n) have dual uses.
main-er within it!
But the most important obfuscation is the clever construction of lookup table. The program uses 11 different characters required for recognizing 22 lexemes:
zero one tw- th(i)r- fo(u)r- fi- six- seven- eigh- nin- ten eleven twelve hundred(s) thousand(s) million(s) billion(s) trillion(s) a and -teen -ty
So that they are internally represented as like:
r n tw- tr- fr- f- s- sn- g- nn- tn ln twl nr(s) tsan(s) lln(s) blln(s) trlln(s) a an -tn -ty
The stemmer recognizes the longest matching prefix, so every lexeme can be
recognized by at most three characters (e.g.
trl instead of
trlln). This is
also handy for ignoring plurals. But that would make that the table does not fit
in the printable byte—112 is already almost 27!
The trick is to use octal; three characters (
g) are interpreted
as sequences of two characters (
nw respectively). Asides from
a smaller lookup table, it has many good consequences:
andshare the common prefix,
ny, and can be discarded altogether. Note that
nyitself does not appear from other entries.
thousandis interpreted as
tsan, which is equivalent to
tfynin the octal scheme. As it is the only entry with
tfprefix, it can be shorten by one character.
Having said this important trick, other details should be relatively easier to follow. The order of lookup table, for example, is very important, and the biggest constant 6177 is not arbitrarily chosen.
The cleaner (size-optimized) version of this program was originally published in my website in July 2011. Sun Park and others have reviewed it and let me aware of possible improvements. I’d also like to thank Seo Sanghyeon for proof-reading remarks.
© Copyright 1984-2015,
Leo Broukhis, Simon Cooper, Landon Curt Noll
- All rights reserved