IOCCC image by Matt Zucker

The International Obfuscated C Code Contest

2025/yang2 - Most magical word

Zoltraak encoding

Author:

Award: Most magical word

To build:

    make all

To use:

    echo Message | ./prog > output.c; cc output.c; ./a.out

Try:

    ./try.sh

Alternate code:

Alternate build:

    make alt

Alternate use:

    echo Message | ./prog.alt > output.c; cc output.c; ./a.out

Judges’ remarks:

This is a fine example of concatenative programming.

Space matters in the output, but why does it also contain that single word? And what does it all mean?

A fun challenge

Create an alternative form of prog.c (i.e., prog.alt.c) that implements the compression algorithm used in the entry as a standalone program outputting a bitstream, and compare its performance with gzip -1.

The above fun challenge is still open. See the “Fun challenge Info” section for details.

Author’s remarks:

Synopsis

    gcc prog.c -o prog
    ./prog < input.txt > output.c

    g++ prog.c -o prog++
    ./prog++ < input.txt > output.c

Details

A skilled mage only needs one magic to defeat most foes, and that’s the ordinary offensive magic, also known as “zoltraak”. Inspired by the simplicity and versatility of this magic, we have created a language that contains just “zoltraak”. No operators or operands or other complications, just zoltraak. Here is a sample zoltraak program that outputs 216 + characters:

    zoltraak
    zoltraak

    zoltraak




    zoltraak


    zoltraak

    zoltraak

    zoltraak
    zoltraak









    zoltraak

In 27 lines of code (9 lines of zoltraak plus 18 blank lines), we managed to produce more bytes of output than what we put in. Not bad!

We still need a way to turn code this into an executable, and writing all these lines is rather tedious for large programs. This IOCCC entry (henceforth called “Fern”) solves both problems: simply feed the desired text through stdin, and Fern will produce C code on stdout.

    gcc prog.c -o prog
    ./prog < input.txt > output.c

    gcc output.c -o output
    ./output | diff input.txt -

Output C code starts with a few lines to define “zoltraak”, followed by a mix of zoltraak and blank lines. Output is slightly compressed, such that it’s possible to get something smaller than the original input depending on amount of redundancy present. Obviously we can achieve a higher compression ratio if we chose a shorter symbol than “zoltraak”. To use the more compact variant of zoltraak, compile Fern with a C++ compiler:

    g++ prog.c -o prog++
    ./prog++ < input.txt > output.c

Output features

All output comes with the same prelude. All difference in behavior are due to the zoltraak lines, the prelude is constant for all programs.

    ./prog < /dev/null > empty.c
    ./prog < input.txt > output.c

    head -`wc -l < empty.c` output.c | diff empty.c -

Concatenation of valid zoltraak code produces valid zoltraak code. The output of the concatenated program will be the concatenated output of the individual programs.

    ./prog < input1.txt > output1.c
    ./prog < input2.txt > output2.c
    cat input1.txt input2.txt > combined_input.txt

    cat output1.c output2.c > combined_output.c
    gcc combined_output.c -o combined_output
    ./combined_output | diff combined_input.txt -

It’s possible to determine which part of the input was most redundant by compiling and running the output with code coverage enabled. You might find the accompanying annotate_zoltraak.pl script helpful in annotating the output code. This script is compatible with both generated C code and gcov output.

    ./prog < input.txt | perl annotate_zoltraak.pl > output.c
    gcc --coverage output.c -o output
    ./output > /dev/null
    gcov output.c

Encoder features

Encoder and generated output have been verified to work under these environments:

CRC32 of the source code is embedded in the code itself (4665726e), which spells out Fern.

Code layout is based on Fern from “Sousou no Frieren” (Frieren: Beyond Journey’s End).

For people who prefer something different, there is also prog.alt.c, which does exactly what it says on the tin.

Design notes

Designing an one-symbol language and implementing a small driver to make it acceptable to C compilers is a fun exercise. This section provides some rationales behind what we have implemented. Readers are encouraged to think about how they might implement their own one-symbol language before reading further.

We would like to implement a system with these properties:

We will work backwards from how the code is designed, followed by the prelude implementation to make the code executable.

Code design

We can think of the code section as one large unary numeral where each code symbol represents a 1, and different output is encoded by varying number of symbols.

    '+' = 1111111111111111111111111111111111111111111

It’s possible to write programs this way, but it would be terribly inefficient if we wanted to support arbitrary output since the code size grows exponentially. Instead, we will assume each symbol represents varying number of 0 followed by a 1, where the number of zeroes is determined by the amount of whitespaces that precede each symbol. This enables a more efficient binary system where we can just translate 1 to symbols and 0 to spaces.

    '+' = 11010100 (least significant bit first)

But this trivial translation doesn’t work because we can’t encode trailing zeroes, due to the fact that we can’t detect trailing whitespaces. One fix is to always append an 1 bit after outputting 8 bits of data.

    '+' = 11010100 1

Because 1 bits translate to symbols, it’s more expensive to encode 1 bits compared to 0 bits. We can reduce the cost a bit by observing that bit 5 is often set in most text files (because lowercase letters, numbers, and common punctuation all have that bit set), so we can reduce average output size for text data if we XOR all bytes with 0x20.

    '+' = 11010000 1

There are other per-byte optimizations that we can try, but instead of going down that route, we implement a compression scheme similar to DEFLATE (RFC 1951), by adding an instruction that repeats some previous output sequence. This instruction is encoded as length + 0 + distance + 1:

We now have enough information to decode the 27-bit sample program near the beginning of this document.

    11010000 1 -> output '+' (0x2b ^ 0x20 = 0x0b)
    00101011 0 00000000 1 -> output 216 bytes starting from offset -1.

Prelude implementation

We would like a fixed prelude that is constant for all programs, constructed in such a way so that we can append arbitrary number of code lines and still have the end result being acceptable to a C compiler. There are a bunch of small puzzles that we need to solve to get the behavior we wanted:

It’s possible to solve all these by doing #embed __FILE__ and parse the zoltraak tokens at run time, but this means handling each zoltraak as data as opposed to translating them to actual C code. It would also require a compiler supporting C23 or C++26. Instead of #embed, all of the above can be solved using basic preprocessor tricks available since C89. A prelude that supports just the output instruction (and no repeat instructions) can be implemented in ~211 bytes.

There are at least two ways to implement the repeat instruction: a straightforward implementation is to maintain a buffer of recently printed bytes and copy from there. A more convoluted implementation is to jump back to the previous program location where we printed those bytes, run the instructions until we have printed the desired number of bytes, then return to the location after the repeat instruction. Naturally we went with the convoluted implementation, not because it’s easy but because it’s hard. It’s hard and costs more code (562 bytes, whereas the straightforward way could be implemented in ~386 bytes). But it does come with these features:

The jump-and-run method does require solving a few more puzzles:

Solution to these puzzles is left as an exercise for the reader.

Inventory for 2025/yang2

Primary files

Secondary files


 Jump to: top