
The International Obfuscated C Code Contest

2025/ncw3 - Best use of Unicode

F O R T H

Author:

Award: Best use of Unicode

To build:

    make all

To use:

    ./prog

Try:

    ./try.sh

Judges’ remarks:

We marveled at how few lines of code it takes to devolve (or to evolve?) C into FORTH.

A fun challenge

Create an alternative form of prog.c (i.e., prog.alt.c) that draws a Julia set.

The above fun challenge is still open. See the “Fun challenge Info” section for details.

Author’s remarks:

About FORTH

We are proud to present FORTH - a FORTH for the 21st century, re-engineered from first principles and second guesses. This revolutionary dialect compiles with any C compiler (given enough flags and emotional support) and brings the legendary transparency of FORTH programming straight into the C ecosystem, where it clearly belongs.

This modern FORTH uses double-precision floating-point numbers exclusively, delivering the same mathematical integrity as JavaScript with none of the user-friendly distractions. Integers were deprecated for philosophical reasons.

FORTH fully embraces Unicode - because if you’re not using obscure multi-byte characters that don’t render properly in your terminal, are you even programming? APL had its chance. It’s our turn to make your keyboard cry.

Requirements:

- needs POSIX usleep(3)
- a terminal with UTF-8 Support with Block Elements
- a terminal with 24-bit color support
- terminal window set to at minimum 120x40
- gcc >= 10 or clang >= 14
- may compile with other compilers!

Beyond the architectural manifesto below, no explanation of the C source mechanics is provided. The lucid self-documentation of the code should be apparent to all. If you still find it confusing, that’s not a bug, it’s a feature: FORTH is simply operating on a higher stack frame of consciousness.

FORTH - for those who looked at C and thought, “too straightforward.”

Architectural “Decisions”

While lesser implementations might adhere blindly to the ANS Forth standard, FORTH takes a more… creative approach to compliance.

Debugging & “Generative Art”

Writing the demo program was a nostalgic trip back to the metal. gdb was helpful, but FORTH includes a built-in “Chaos Monkey”: the stack is not checked for underflow or overflow.

When the stack becomes unbalanced, the program does not simply “error out.” It engages the Memory Visualization Engine (often confused with “Segmentation Faults”), producing unintentional but aesthetically pleasing graphical glitches before unceremoniously dying.

The Source Code

The source code prog.c has been formatted according to the strict style guidelines of the FORTH steering committee.

Spoilers

The following section explains exactly how the C preprocessor and tokenizer were abused to make C look like FORTH. We recommend reading the source code before decoding this.

The “FORTH” source code is actually valid C, made possible by a heavy abuse of Unicode characters and the C preprocessor.

  1. The “Invisible” Syntax: The core trick relies on C macros defined with invisible Unicode characters. These handle the stack manipulation and control flow, allowing the visible text to remain clean.

    • U+200B (Zero Width Space) expands to (void){, turning function names into C function definitions.
    • U+200C (Zero Width Non-Joiner) expands to ();, executing the previous word as a function call.
    • U+200D (Zero Width Joiner) acts as the C semicolon ;.
  2. The “Visible” Mimicry: Standard Forth words are implemented using Unicode Fullwidth forms to avoid C syntax collisions.

    • The Forth colon : is actually U+FF1A (Fullwidth Colon), defined as void.
    • The Forth semicolon ; is actually U+FF1B (Fullwidth Semicolon), defined as }.
    • Operators such as >R use their Fullwidth counterparts.
  3. Variables & Memory: The VARIABLES keyword is a macro for union E. The variables themselves (like u, v) are declared with an invisible separator (U+2063) that expands to a designated initializer using __COUNTER__. This automatically links C variables to the Forth memory array M.

  4. Literals:

    • Double literals are preceded by U+2061 (Function Application), which expands to (*--_).i=.
    • String literals use U+2062 (Invisible Times), which expands to (*--_).s = 1+. The 1+ skips the leading space inside the quote, allowing the source to mimic the Forth " string" syntax.


