Most Obfuscated Translator:
Tony Finch
Demon Internet
322 Regent's Park Road
Finchley
London N3 2QQ
United Kingdom
http://www.inch.demon.co.uk/
Judges' comments:
To build:
make fanf
Some systems may have problems building this entry because
of the system resources can require.
To use:
Enter an expression on standard input. Here are some
sample expressions that you can use:
((\a(\b(\c(d)))) e)
(\f\g\x
(
(g x) (f x)
) K K z
)
(\f(\f\g\x( (f((\a(g(b))) e)) (g x) ) K K z))
(Y\f\n
((= n 0)
1
(* n (f (- n 1)))
)
)
This program translares lambda expressions into combinator
expressions. But you do not need to know Lambda Calculus to be
impressed by this program!
See the file:
fanf.lambda
for more examples of lambda calculus source.
Notice how large the code grows from the fanf.c into the final
fanftmp2.c C program. Take a look at that final C program again,
can you begin to understand what it is doing?
Look at the first stage of the C pre-processing:
gcc -ansi fanf.c -E > fanftmp1.c
Skip to the bottom of fanftmp1.c (after all of the #include header
stuff ... look for a line of the form: # 2 "fanf.c" near the bottom)
and look at the resulting code. This code, when C pre-processed
will expand into over 342 times the original code (ignoring #include
headers) to produce fanftmp2.c. That program in turn, when compiled
translates a single lambda expression on standard input into
combinator expressions on standard output.
Extra credit question:
What do you have to do to make this program work with an
old non-ANSI C preprocessor?
Selected Author's comments:
My program is a demonstration of some practical applications of
theoretical computer science. The application it implements is a program
for translating lambda expressions into combinator expressions.
What the program does
=====================
The syntax of lambda expressions recognised by the program is as
follows: The basic atoms are variables which are single characters other
than backslash or parentheses, e.g. `a' or `b' or `@'. Variables are
combined by function application which is written e.g. `(f a)' which is
the function `f' applied to the argument `a'. Application groups to the
left, so `(f a b c)' is equivalent to `(((f a) b) c)'. Functions are
created from lambda terms which are written in the form `\a(expr)' which
is the function taking argument `a' and having value `expr'. Lambda is
spelt `\' because ASCII isn't good enough. Lambda binds tightly, so
`\a\bcd' is equivalent to `((\a(\b(c))) d)'. Whitespace is ignored.
For example, the identity function, called `I', is written `\x(x)'. A
simple expression is `(I a)' which is equivalent to `(\x(x) a)', which
has the value `a'. Another function called `K' is defined as `\x\y(x)'
and it evaluates as follows:
K p q
-> \x\y(x) p q
-> \y(p) q
-> p
Another function of interest is `S', defined as `\f\g\x((f x) (g x))'.
My program demonstrates the fact that any lambda expression can be
translated into an equivalent "combinator expression" which involves
only `S', `K', `I' (the combinators), and application, with no
variables or lambda abstractions. In fact, even `I' is unnecessary,
since it is equivalent to `S K K':
I z
-> \a(a) z
-> z
S K K z
-> \f\g\x((f x) (g x)) K K z
-> (K z) (K z)
-> \x\y(x) z (K z)
-> z
The algorithm for translating lambda expressions into combinator
expressions works as follows. There are three forms of lambda expression
to consider: variables, applications, and abstractions.
trans `v' -> `v' (variable)
trans `a b' -> (trans `a') (trans `b') (application)
trans `\ab' -> abs a (trans `b') (abstraction)
There are a further three cases to consider for the body of lambda
expressions, where we need to do the magic that transforms away the
variables.
abs a `f x' -> `S' (abs a `f') (abs a `x')
abs a `b' -> `K b' (b != a)
abs a `a' -> `I'
E.g. suppose we had combinator expressions for `+' and `3' and we wanted
to see what the combinator expression for doubling 3 looked like:
trans `\x(+ x x) 3'
-> (trans `\x(+ x x)') (trans `3')
-> (abs x (trans `+ x x')) `3'
-> (abs x `+ x x') `3'
-> `S' (abs x `+ x') (abs x `x') `3'
-> `S' (`S' (abs x `+') (abs x `x')) `I' `3'
-> `S' (`S' (`K +') `I') `I' `3'
-> `S (S (K +) I) I 3'
We can then check that this evaluates to the expected result:
S (S (K +) I) I 3
-> S (K +) I 3 (I 3)
-> K + 3 (I 3) (I 3)
-> + (I 3) (I 3)
-> + 3 3
It is possible to perform a slightly more compact translation with a
couple of simple optimisations. For example, note that `K (a b)' is the
same as `S (K a) (K b)', because
K (a b) x
-> a b
and
S (K a) (K b) x
-> K a x (K b x)
-> a b
Also, `S (K a) I' is the same as just `a' because
S (K a) I x
-> K a x (I x)
-> a x
The program simply reads a lambda expression from the standard input,
translates it into a combinator expression using the transformation and
optimisations described above, and prints out the result.
Why this is a practical application
===================================
Combinator expressions are easier to evaluate than lambda expressions
because there are no problems with variables. For example, if I naively
evaluated the expression `\x\y(x y) y' I might get `\y(y y)' which is
not the same as the correct result because the first `y' in the body
should refer to a global `y' rather than the local one. David Turner was
the first person to realise that because of this fact, SK combinators
provide a simple way to implement non-strict functional programming
languages, either in software or directly in hardware.
My program demonstrates this by being based on a combinator evaluation
engine which is used to implement a language called "OFL" (pronounced
like "offal" and short for "Obfuscated Functional Language"). I then
used OFL to implement all the high-level functionality described above.
This contributes a lot to the obfuscation because even small amounts of
OFL are quite unpalatable, and large expressions are completely
incomprehensible.
It is fairly well known that the lambda calculus (and hence SK
combinators) can compute anything, but mere computation is no use if you
cannot communicate with the world. This is why OFL includes a few
concessions to reality: a combinator `E' for representing characters and
testing them for equality, two IO combinators, `G' and `P', for reading
and writing characters respectively, and a combinator `J' for
representing "false". I also implemented the `Y' combinator directly
rather than in terms of the primitive combinators, since it only
required five characters to do so (viz. `Y W q Q g'). However this code
is not enabled because the definition of Y in terms of other combinators
is more cute.
The `E' combinator comes in a large number of forms, one for each
character and one for end of file. It is written `E(c)' for the variant
of `E' corresponding to the character `c'. An expression of the form
`E(c) x' is evaluated by first evaluating `x'. If `x' doesn't evaluate
to some form of `E', then evaluation of the expression terminates. If
`x' evaluates to `E(c)', then the expression evaluates to `K'. If `x'
evaluates to `E(d)' (where `d' is a different character to `c'), then
the expression evaluates to `J'.
`K' and `J' are used to represent "true" and "false" respectively; they
correspond exactly to the standard lambda calculus representations of
true and false, viz. `\t\f(t)' and `\t\f(f)' respectively. In this way a
conditional expression can just be written `cond then else', which is
even more terse than C's `cond ? then : else'. An expression that
compares a character with `a', say, can be written in the rather
obfuscated form `E('a') char then else'.
The `P' combinator, like the `E' combinator, starts off by evaluating
its first argument and checking that it is some form of `E'. It then
writes that character on stdout, and finally it calls its second
argument as a function with the argument `I', i.e.
P x f -> f I
with a side-effect.
The `G' combinator just reads a character from stdin then calls its
argument as a function with the appropriate `E' combinator as an
argument, i.e.
G f -> f E(c)
with a side-effect.
One of the problems with non-strict languages is that it is hard to
predict in advance in which order side-effects will occur. In my program
I used a monadic IO structure like that used by the programming language
Haskell; this allows one to write programs that manipulate external
state in a manner remarkably similar to imperative programming languages
like C. An examination of my program should provide an illustration of
how well this technique works in practice. (In particular, I used the
CPS form of the IO monad from page 6 of "Imperative Functional
Programming".)
The `Y' combinator is used for implementing recursive functions. It has
the evaluation rule
Y f -> f (Y f)
so that it can be used to call a function with itself as an argument.
For example, if I had the appropriate arithmetic operators defined, I
could write a factorial function in the lambda calculus like this:
Y \f\n((= n 0) 1 (* n (f (- n 1))))
OFL provides a facility for naming expressions so that they can be used
more than once without writing them in full. A definition must be written
on one line in the form `l name b expression' where `l' is short for
"let" and `b' is short for "be". Names must be more than one character
long. A complete OFL program starts off by defining the primitives it
will use with likes of the form `p (S)' etc., then goes on to define the
names it will use, and finishes off with the main expression followed by
a single `e' short for "end". OFL expressions must use the minimum
number of brackets needed to be unambiguous, except that the whole
expression must be surrounded by brackets.
OFL has a standard library which adds another five combinators to the
basic eight. These combinators provide greater scope for optimisation
when translating an expression from the lambda calculus to combinators,
and therefore more compact programs. Their evaluation rules are:
B f g x -> f (g x)
C f g x -> (f x) g
BB c f g x -> c (f (g x))
CC c f g x -> c (f x) g
SS c f g x -> c (f x) (g x)
I used them to help fit my program into the required space; unfortunately
I couldn't fit the version that fully optimises the lambda expressions
it reads into the required space. The complete optimisation rules
including the ones it doesn't use are:
S (K p) I -> p
S (K p) (K r) -> K ( p r )
S (K p) (B r s) -> BB p r s
S (K p) r -> B p r
S (B p q) (K r) -> CC p q r
S p (K r) -> C p r
S (B p q) r -> SS p q r
S p r -> S p r
The implementation of OFL
=========================
Although OFL has completely different syntax and semantics to C, all you
need to compile OFL programs is a C compiler and a header file (which in
my entry to the competition has been included in-line). The header file
implements a two-pass optimising compiler which produces a representation
of the program that is then interpreted.
The first pass of the compiler is implemented via the C preprocessor. It
implements one optimisation, namely function in-lining. Unfortunately,
this optimisation always increases object code size and execution time.
The opposite optimisation, common sub-expression elimination, (which
would offer improvements in code size and execution time) has not been
implemented. The output of this pass is a C program that may be compiled
by a normal C compiler.
The second pass is performed by the initialisation phase of the C
program. It builds a data structure representing the abstract syntax
tree of the program.
The program is then executed by an interpreter. This operates by
performing manipulations on the AST that correspond to the evaluation
rules of the combinators that I described above.
The first 54 lines of the program are mostly the interpreter; the next 8
lines are pass one of the compiler; the next nine declare the OFL
primitives; the next five define the OFL standard library; and the rest
implement the application described in the first part of this document.
This part of the program is transformed into the second pass of the
compiler by the first pass of the compiler.
The first pass of the compiler relies on some magic macros that expand
in a mutually-overlapping and almost recursive way in order to
perpetrate a complete change of syntax. Interestingly enough, if I
hadn't used the same identifier for more than one thing in various
places in the program this transformation would work with old-style cpp
as well and ANSI cpp. It expands the program from about 2.2KB to about
740KB. The resulting code does some tricky stuff to build a data
structure of the same form as the code itself.
The interpreter is written in fairly straightforward C, although it is
highly compressed. The C itself is written in a fairly functional style,
in that every function consists of a single expression. In the tradition
of functional programming languages it assumes that memory is infinite
and so never frees anything. It also assumes that the C compiler
implements the tail recursion optimisation, which is probably not true.
References
==========
I used these references in the course of writing this program and they
explain some of the ideas behind it in more detail. Arthur Norman (one
of the authors of the first paper) did a lot to enthuse me about this
topic.
Clarke, T. J. W., et al., 1980. SKIM - the S, K, I Reduction Machine.
Proceedings of the 1980 ACM LISP Conference, pp. 128-135.
Curry, H. B. and Feys, R., 1958. Combinatory Logic, Vol. 1.
North-Holland.
Gordon, M. J. C., 1988. Programming Language Theory and its
Implementation. Prentice-Hall.
Peterson, J, et al., 1997. Report on the Programming Language Haskell,
version 1.4.
Peyton Jones, S. L., 1987. The implementation of Functional
Programming Languages, Chapters 2, 11, 12, 16. Prentice-Hall.
Peyton Jones, S. L., and Wadler, P., 1992. Imperative Functional
Programming. Proceedings of the ACM Symposium on the Principles of
Programming Languages 1993, pp. 71-84.
Turner, D. A., 1979. A new implementation technique for applicative
languages. Software- Practice and Experience. Vol. 9, pp. 31-49.
Turner, D. A., 1979. Another algorithm for bracket abstraction. Journal
of Symbolic Logic. Vol. 44, no. 2, pp. 67-270.
Wadler, P., 1990. Comprehending Monads. Proceedings of the ACM
conference on Lisp and Functional Programming, 1990.
Wadler, P., 1995. How to Declare an Imperative. Proceedings of the ACM
International Logic Programming Symposium, 1995.