IOCCC image by Matt Zucker

The International Obfuscated C Code Contest

extra.html for 2024/burton

extra

This readme provides a description of the extra files provided, how to use them, analysis of their results, and a deeper investigation on the failed guesses.

The recommendation is to do:

    make tools

    ./run dict blast swept      # exhaust runs this command a LOT!
    ./run dict crate swept      # show the diff between starting words

    mkdir results               # create the results directory
    ./exhaust                   # exhaustive test of all words (~1 hour)
    echo terns | ./word.result  # add the word "terns" to results
    ./mkresults > results.avg   # proc exhaust into a single, small file

The files created as output from exhaust allow one to view, for each starting word, how each of the solutions resolve, including the misses. mkresults outputs a 1566 line report showing the starting word, its average accuracy, and the number of missed words. To show the number of guesses and the actual missed words, run rank on the desired word:

    ./rank results/blast

    1 1         # blast guessed itself
    2 112       # 112 words guessed in 2 attempts
    3 867       # 867 words guessed in 3 attempts
    4 1019      # ...
    5 271
    6 41
    foyer       jaunt     shave     wound

    ./avg results/blast

    blast  3.673002       4

    grep blast results.avg

    blast  3.673002       4

The C programs allow for (automated) testing:

    gen.c - simulate wordle game by generating targets and scoring guesses
        ./gen dict [target]

    dups.c - remove words with(out) repeated letters
         (best starting words have no repeats)
        cat dict | wc -l        # all the words
        cat dict | ./dups | wc -l   # words with repeated letters
        cat dict | ./dups -n | wc -l    # words without repeated letters

    run.c - `gen` and `prog` connected via pipes; allows automated testing
        ./run dict [start [target]]
            start - the first guess
            target - force the target to be the named word

Using More Guesses

If you wish to see the failed words in action, and if more guesses would work, try:

    ./run dict blast jaunt | grep ^gen

    gen: target: jaunt
    gen: blast bbybg
    gen: daunt bgggg
    gen: haunt bgggg
    gen: gaunt bgggg
    gen: vaunt bgggg
    gen: taunt ygggg
    gen result end=0: n 6 lose: jaunt

    GUESSES=7 ./run dict blast jaunt | grep ^gen

    gen: target: jaunt
    gen: blast bbybg
    gen: daunt bgggg
    gen: haunt bgggg
    gen: gaunt bgggg
    gen: vaunt bgggg
    gen: taunt ygggg
    gen: jaunt ggggg
    gen result end=2: y 7 good: jaunt

Using GUESSES=9 and exhaust, one gets a metric that scores words without misses, allowing a better comparison on the average score. Note, however, that the game doesn’t bend this way, and anything more than 6 guesses is a loss. Take this metric with a grain of salt. Also, 9 is the limit of guesses; the program will not work properly - if at all - with more than 9 guesses.

Ranking by fewest missed with GUESSES=6, the order is:

    blast palsy climb flash lapse

Ranking by average with GUESSES=6, the order is:

    crate slate trace crane least

Ranking by average with GUESSES=9, the order is:

    slate trace crate train least

To reproduce these results:

    GUESSES=9 ./exhaust results.9
    ./mkresults results.9 > results.9.avg

Simple Analysis

The 1566 raw files produced by exhaust are ~370KB each (total ~562MB), far too large to submit, so included are the compressed results files for four starting words, and the summary file results.avg.

    blast   fewest missed
    crate   best average
    query   most missing
    terns   example used in remarks

The program rank allows one to see how the averages are calculated, as well as the actual missed words. Comparing the results for blast and terns:

    ./rank r.terns | awk 'NF==2' > a
    ./rank r.blast | awk 'NF==2' > b
    join a b | column -t

    guesses   terns  blast
    --------  -----  -----
    2           110    112
    3           909    867
    4           960   1019
    5           265    271
    6            53     41

    F            18      4  (failed puzzles)

In general, the program avg is easier to use, but conveys less information:

    ./avg r.terns; ./avg r.blast
    terns  3.641469      18
    blast  3.673002       4

You have to decide if you prefer a marginally lower average, but slightly higher chance of losing, versus a marginally higher average and lower chance of losing. This is developed further in the following section.

It is interesting that the letter order makes a noticeable difference:

    ./avg r.terns; ./avg results/stern
    terns  3.641469      18     # lower average and fewer misses
    stern  3.658315      20

For completeness, using GUESSES=9 and the full starting dictionary from Wordle(tm), here are the results for the “best” starting words: salet, slate, and blast.

    guesses  salet  slate  blast
    -------  -----  -----  -----
    1            0      1      1
    2          148    146    112
    3          932    939    867
    4          925    919   1019
    5          240    235    271
    6           56     57     41
    ---      -----  -----  -----
    7           11     13      4
    8            2      4      0
    9            1      1      0
    ---      -----  -----  -----
    avg      3.641  3.644  3.685

Obviously, salet produces the best results, excepting the losses (14). Just re-arranging the letters into slate produces almost the same, but there are more losses (18). If attempting to minimize losses, then blast is the optimal starting guess, since it never takes more than 7 guesses and has the fewest losses (4) of any starting word in the combined full Wordle(tm) dictionary.

NB: The averages for GUESSES=9 are larger than the default, because the misses are essentially folded into the average. The averages are more accurately compared when using the higher limit, although minimizing losses remains the priority (who likes to lose?).

Seek These

The starting words with the fewest misses with dict:

    sort -k3n results.avg | sed 18q | awk '{print $1, $3}' | column -t | pr -t -3

    blast  4        ralph  7        crawl  8
    palsy  6        ranch  7        crumb  8
    climb  7        shawl  7        grasp  8
    flash  7        tramp  7        lymph  8
    lapse  7        clash  8        spark  8
    prowl  7        clasp  8        swamp  8

The starting words with the best accuracy:

    sort -k2n results.avg | sed 18q | pr -t -3

    crate  3.586609  20 irate  3.611231  31 trice  3.617279  15
    slate  3.586609  18 slice  3.612095  22 cater  3.620302  26
    trace  3.588769  18 stale  3.612959  19 satin  3.621598  19
    crane  3.595248  26 train  3.613391  14 snail  3.621598  19
    least  3.599568  19 trade  3.613823  20 stare  3.621598  19
    plate  3.606479  22 leant  3.616415  18 dealt  3.622462  15

    Note the word with fewest misses, `blast`, is relatively far
    down the list in accuracy at 178th best:

    sort -k2n results.avg | pr -t -n | grep blast

    178 blast  3.673002  4

Versus starting words with fewest misses:

    egrep 'terns|lapse|blast|climb|crate' results.avg

    blast  3.673002  4
    climb  3.772354  7
    crate  3.586609  20
    lapse  3.664795  7
    terns  3.641469  18

             weighted avg
    #  blast  crate  blast   crate    improvement
    -  -----  -----  ---------------  -----------
    1      1      1  1.0000  1.0000    0.0000
    2    112    147  1.9912  1.9932   -0.0020
    3    867    952  2.8837  2.8645   -0.0192
    4   1019    884  3.4527  3.3951   -0.0576
    5    271    250  3.6374  3.5775   -0.0599
    6     41     61  3.6794  3.6429   -0.0365

    F      4     20  -16    +16 (0.26% blast vs 1.28% crate)

Notice the difference in the average and weighted average. The program avg simply takes the total guesses as a percentage of the total words. The program wavg takes each of the six scores and weights them by the score. I wrote wavg later to track my progress playing the online game, and it is a more accurate score IMHO, but both are useful for comparisons.

    $ ./avg results/blast
    blast  3.673002  4
    $ ./rank results/blast
    1 1
    2 112
    3 867
    4 1019
    5 271
    6 41
    foyer         jaunt     shave     wound
    $ ./wavg 1 112 867 1019 271 41
    3.6794

The program wrank (weighted rank) makes this latter calculation much easier. It takes the output from rank and invokes wavg with these numbers, saving a lot of typing and errors.

    ./wrank blast
    blast 3.6794

In words, crate as a first guess gets more correct answers in 2 and 3 guesses, reducing its cost. It looks to be a stronger candidate, but it misses 16 more words (therefore losing the game more often). Your task is to determine your risk profile, and decide among the variety of possible starting points, then let prog recommend your next guess.

A somewhat laborious calculation provides the answer to which starting word has the most answers in two words and three words:

    # two words
    for n in results/*; do
        rank $n | awk -v word=$n '/^2/ {print word,$2}';
    done | sort -k2nr | sed 3q

    trace 149
    crate 147
    slate 146

    # three words
    for n in results/*; do
        rank $n | awk -v word=$n '/^3/ {print word,$2}';
    done | sort -k2nr | sed 4q

    crate 952
    train 944
    trace 939
    slate 936

    # versus the averages
    sort -k2n -k3n results.avg | sed 3q
    slate  3.586609      18
    crate  3.586609      20
    trace  3.588769      18

Avoid These

The starting words with the worst accuracy:

    sort -k2nr results.avg | sed 18q | pr -t -3

    jumpy  3.997408  21 juicy  3.954212  39 squib  3.927862  28
    jumbo  3.996976  25 quake  3.952052  38 quark  3.926566  30
    buxom  3.995248  23 jerky  3.950756  31 quoth  3.920950  33
    quick  3.987041  36 equip  3.939957  22 gawky  3.918790  22
    quack  3.974514  23 quasi  3.938661  32 unzip  3.915335  24
    quirk  3.964147  40 epoxy  3.934773  27 judge  3.914903  33

The starting words with the most failures (missed words):

    sort -k3nr results.avg | sed 18q | pr -t -3

    query  3.893305  43 juicy  3.954212  39 enjoy  3.890713  37
    quite  3.835421  42 quota  3.881641  39 gauze  3.878186  37
    yield  3.740821  41 quake  3.952052  38 oxide  3.888121  37
    juice  3.847084  40 quiet  3.857451  38 fiery  3.772786  36
    quirk  3.964147  40 quote  3.829374  38 quail  3.893737  36
    axion  3.796544  39 unify  3.862203  38 quick  3.987041  36

The worst target words, for any starting word (N out of 1566):

    ./worst

    wound 1340    jaunt 1254    hatch 1157    foyer 1109    waste 1057
    found 997     shave 945     wight 943     willy 790 catch 779
    boxer 762     jolly 751     graze 626     rower 611 fatty 598
    roger 568     taunt 520     watch 497     witty 483 wooer 451
    tight 449     bound 442     swore 415     taste 405 mower 378
    hover 374     bunch 371     shake 358     racer 348 rarer 294
    paper 289     water 285     stave 271     ferry 250 fever 242
    fully 241     paddy 239     fight 225     parer 219 taffy 207
    frown 196     joker 195     filly 188     batty 179 lowly 179
    staff 176     jazzy 174     piper 152     vying 152 swear 151

SPOILER Deep Dive on Near Misses SPOILER

Dissecting the failed targets when starting with blast. Recall blast only misses 4 words out of all 2315:

    foyer       jaunt     shave     wound

These four words have common sub-patterns, which in hard-mode is difficult to disambiguate. Using run dict blast $word, here are the grep sub-patterns and counts for these four words as revealed in the output:

    foyer   25  grep '^.o.er' dict | wc -l
    jaunt    6  grep '^.aunt' dict | wc -l
    shave    7  grep '^sha.e' dict | wc -l
    wound    8  grep '^.ound' dict | wc -l

Looking at the patterns, the program is making guesses that just happen to select the correct word last in its choices. In all cases, on the seventh guess, the program has the correct answer using blast as the starting guess. It is easy to do an exhaustive look for only four words, but as the count goes up, so does the work.

Several variations were attempted, including randomization of the few results remaining. Besides being non-deterministic (this time it wins, next time it loses), better results were obtained with other changes to the selection algorithm. Some small insight into the algorithm used may be seen by the grep command lines, although there are additional insights used within the program that improve significantly upon these results.

    ./run dict blast foyer | grep ^gen
    gen: target: foyer  2315
    gen: blast bbbbb    429 grep -v '[blast]' dict | wc -l
    gen: drone byyby    27  grep r dict | grep o | grep e | grep -v '[blastdn]' | wc -l
    gen: cower bgbgg    8   grep '^.o.er' dict | grep -v '[blastdncw]'
                        foyer homer hover joker
                        mover poker roger rover
    gen: mover bgbgg    4   grep '.o.er' dict | grep -v '[blastdncwmv]'
                        foyer joker poker roger
    gen: poker bgbgg    rm poker joker (foyer roger)
    gen: roger ygbgg    rm roger (foyer)
    gen result end=0: n 6 lose: foyer

    ---------------

    ./run dict blast jaunt | grep ^gen
    gen: target: jaunt  2315
    gen: blast bbybg    46  grep 't$' dict | grep a | grep -v '[bls]' | wc -l
    gen: daunt bgggg    5   grep '^.aunt$' dict | grep -v '[blsd]'
                        gaunt haunt jaunt taunt
                        vaunt
    gen: haunt bgggg    rm haunt (gaunt, jaunt, taunt, vaunt)
    gen: gaunt bgggg    rm guant (jaunt, taunt, vaunt)
    gen: vaunt bgggg    rm vaunt (jaunt, taunt)
    gen: taunt ygggg    rm taunt (jaunt)
    gen result end=0: n 6 lose: jaunt

    ---------------

    ./run dict blast shave | grep ^gen
    gen: target: shake  2315
    gen: blast bbgyb    52  grep '^..a' dict | grep s | grep -v '[blt]' | wc -l
    gen: share gggbg    5   grep '^sha.e' dict | grep -v '[bltr]'
                        shade shake shame shape
                        shave
    gen: shade gggbg    rm shade (shake shame shape shave)
    gen: shape gggbg    rm shape (shake shame shave)
    gen: shame gggbg    rm shame (shake shave)
    gen: shake gggbg    rm shake (shave)
    gen result end=0: n 6 lose: shave

    ---------------

    ./run dict blast wound | grep ^gen
    gen: target: wound  2315
    gen: blast bbbbb    429 grep -v '[blast]' dict | wc -l
    gen: drone ybygb    7   grep 'n.$' dict | grep d | grep o | grep -v '[blastre]' | wc -l
    gen: pound bgggg    4   grep '^.ound' dict | grep -v '[blastreh]'
                        found hound mound wound
    gen: mound bgggg    rm mound (found, hound, wound)
    gen: hound bgggg    rm hound (found, wound)
    gen: found bgggg    rm found (wound)
    gen result end=0: n 6 lose: wound