Expected Number of Steps to Solve E( ) Wordle

Introduction

The expected number of steps required to solve the unknown Wordle word can be determined for any "guess" one makes in Wordle. This seemingly impossible notion is possible because both the list of possible Wordle solutions and the guess word are known. Therefore, the ways the guess word relates to every possible Wordle solution can be examined, counted and assigned a probability. This number, called E( ) herein (shorthand for E(the guess word)), is the average number of steps required to arrive at the Wordle solution, assuming each step is played with the intent to minimize the number of steps. E( ) is based upon probabilities associated with how guess words divide the list of possible solutions for a Wordle game.

In practice, working out the Expected Number of Steps to Solve is not necessary when comparing Wordle guesses by their group characteristics; a guess word's merits can be judged from its group summary information alone. Still, working out E( ) for a few guesses is a useful, albeit arduous, exercise for learning the group mechanics and what they produce.

Before getting into the weeds, it must be noted that a visiting adult son, Parker Seidel, derived the general formula for E( ) after being asked to lend some brain power to the subject of "groups".

Jargon and fundamental ideas are used here without inline explanation to avoid cluttering this writing. They are defined at the end in a section named Definitions and Fundamentals.

E( ) General Formula

  • E( ) is the sum of the following term over every group Gi generated by the groups analysis:
    • p(Gi)*[1+E(Gi)]
  • Note: This formula is in the context of "groups". Gi is the i-th group.
  • p(Gi) is the probability that the solution is in group Gi. p(Gi) is always n/(# of words), where 'n' is the number of words in group Gi and '# of words' is the number of remaining possible solutions. Think of p(Gi) as the fraction of the current inspection context's remaining possible solutions that make up group Gi.
  • E(Gi) is the expected number of steps for group Gi when the solution is in group Gi.
  • Note the recursion in the formula: it refers to the i-th group's own E( ), designated E(Gi). Each E(Gi) is itself computed as a sum of p( )*[1+E( )] terms over the groups that a guess within Gi produces, with p( ) using Gi's own '# of words' count.
  • The E( ) value, a number of steps, includes the candidate guess word's own step. Understanding this is important when applying E(Gi) to groups having more than one word.
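
The recursion can be sketched in Python. This is a minimal, hypothetical sketch, not the Wordle Helper's code: the helper names (`match_clue`, `groups`, `expected_steps_for_guess`, `best_expected_steps`) are illustrative, the match clue assumes standard Wordle handling of repeated letters, and E(Gi) is assumed to use the best next guess drawn from the group itself (in-pool) plus any optional extra out-of-pool words. Exact fractions keep values like 5/3 from being rounded. The brute-force search is only practical for small word lists.

```python
from collections import Counter, defaultdict
from fractions import Fraction
from functools import lru_cache


def match_clue(guess: str, target: str) -> str:
    """Return the match clue: '2' = green, '1' = yellow, '0' = grey (assumed standard Wordle rules)."""
    clue = ["0"] * len(guess)
    unmatched = Counter()
    for i, (g, t) in enumerate(zip(guess, target)):
        if g == t:
            clue[i] = "2"          # right letter, right position
        else:
            unmatched[t] += 1      # target letters still available for yellows
    for i, g in enumerate(guess):
        if clue[i] == "0" and unmatched[g] > 0:
            clue[i] = "1"          # letter present elsewhere in the target
            unmatched[g] -= 1
    return "".join(clue)


def groups(guess: str, solutions) -> dict:
    """Split the remaining possible solutions by the clue `guess` would produce."""
    out = defaultdict(list)
    for word in solutions:
        out[match_clue(guess, word)].append(word)
    return out


def expected_steps_for_guess(guess: str, solutions: tuple, extra_vocab: tuple = ()) -> Fraction:
    """E(guess) = sum over groups Gi of p(Gi) * [1 + E(Gi)]."""
    n = len(solutions)
    perfect = "2" * len(guess)
    total = Fraction(0)
    for clue, members in groups(guess, solutions).items():
        p = Fraction(len(members), n)
        # The "perfect match" group is already solved, so its E(Gi) is 0.
        e_gi = 0 if clue == perfect else best_expected_steps(tuple(sorted(members)), extra_vocab)
        total += p * (1 + e_gi)
    return total


@lru_cache(maxsize=None)
def best_expected_steps(solutions: tuple, extra_vocab: tuple = ()) -> Fraction:
    """E(Gi): the smallest E(guess) over in-pool words plus any extra (out-of-pool) words."""
    best = None
    for guess in sorted(set(solutions) | set(extra_vocab)):
        if guess not in solutions and len(groups(guess, solutions)) == 1:
            continue  # an out-of-pool guess that splits nothing makes no progress
        e = expected_steps_for_guess(guess, solutions, extra_vocab)
        if best is None or e < best:
            best = e
    return best


words = ("going", "owing", "thong")  # UNLED's 01000 group
print(expected_steps_for_guess("going", words))  # 5/3
print(expected_steps_for_guess("thong", words))  # 2
print(best_expected_steps(words))                # 5/3
```

GOING is an in-pool hole-in-one for that group and gives 5/3, while THONG gives 2 because it leaves GOING and OWING together in one 2-word group (E(Gi)=1.5).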

Calculating E( )

  • Wordle Helper's Group Optimal and Groups Driller are useful for manually working out the expected number of steps E( ) of any "guess", because these components identify the groups associated with a guess.

Example

  • (The table images shown here were created before the group's p2 value and the list-quantity-to-largest-group ratio were included in the summary data. Furthermore, the Classic+ guess vocabulary is not used, and p2 is not discussed. The p2 value is the population variance of the group sizes.)
  • In this example the word CARGO is the first play. Its match clue is 00011 (grey, grey, grey, yellow, yellow). As it happens, CARGO leaves 25 remaining possible solutions. This example compares two different kinds of guesses, in-pool and out-of-pool, made after the first guess CARGO.
  • The 25 remaining possible solutions are shown later in this example.

Looking At Groups Generated By In-Pool Candidates

  • In short, the in-pool word GOLEM generates the most groups: 16 groups.

  • Wordle Helper's Group Optimal, performed on these 25 possible solution words using only words from these 25 words (the Words Showing vocabulary source option in Wordle Helper's Group Optimal), is an in-pool operation. It shows GOLEM as the single optimal word. GOLEM divides the 25 words into 16 groups, and the maximum group size is 5 words.

    • The Words Showing are the words currently displayed by the Wordle Helper, i.e. the remaining possible solutions according to the current settings. Each of those words was match tested against the same words for the groups analysis. GOLEM is the word from the Words Showing that leads to the most groups.

    'EXPECT_golem_summary.png Image'

Looking At Groups Generated By Out-of-Pool Candidates

  • In short, the two out-of-pool words UNLED and GILET generate the most groups: 18 groups.

  • Wordle Helper's Group Optimal, performed on these 25 possible solution words using all acceptable words (the Large Vocabulary source option in Wordle Helper's Group Optimal), is an operation that allows out-of-pool words. It shows UNLED and GILET as the optimal words. UNLED and GILET each divide the 25 words into 18 groups. Their maximum group sizes are 3 words (UNLED) and 4 words (GILET).

    • The Large Vocabulary is the 14,855-word vocabulary of Wordle's allowed words. Each of those words was match tested against the Words Showing for the groups analysis. The Large Vocabulary includes the Words Showing. UNLED and GILET are the words leading to the most groups, surpassing GOLEM by 2 groups.

    'EXPECT_unled_gilet_summary.png Image'

Let us calculate the expected number of steps E( ) for each of these three guess words, starting with the two 18-group words UNLED and GILET.

UNLED

  • These are the groups into which UNLED divides the 25 remaining possible solutions, according to how UNLED matches each of the 25 words. Note there is no 22222 group. UNLED is not one of the 25 words, so UNLED is an out-of-pool selection for this step, and there cannot be a 22222 "perfect match" group. If one is still unsure about groups, understand that using UNLED as a guess against these 25 remaining possible solutions will leave one of these 18 groups as the remaining possible solutions. Which group that is depends on the actual unknown solution. What is known is that the solution is one of the 25 words, and it is sitting in one of the 18 groups. The prior guesses, just CARGO in this case, narrowed the list to those 25 words by the ways CARGO is not the solution; UNLED then divides them into the 18 groups. The way UNLED "matches", i.e. its match clue after playing UNLED, reveals in which group the solution resides.

    'EXPECT_unled_groups.png Image'

  • There is an expected number of steps associated with each group. The following table outlines how the groups for UNLED tally up to arrive at E()=2.2801. (Sorry, this table is not something generated by The Wordle Helper.) (The extra 0.0001 in 2.2801 comes from the table using the rounded value 1.667 for the 3-word groups instead of the more precise 1.6667. The E( ) is actually 2.28.)

    'EXPECT_unled_expected.png Image'

  • In the table above, every singleton group, like 11000 or 02000, has E(Gi)=1. Because the group leaves only 1 word to guess, 1 more step is required in the context of the group. Had UNLED been an in-pool selection, there would be one singleton "perfect match" group 22222 in this table with E(Gi)=0.

  • Every 2-word group has E(Gi)=1.5. It is assumed an in-pool guess will always be made when faced with a choice of 2 words: that guess is the solution half the time (1 step) and leaves the other word half the time (2 steps), which averages 1.5 steps.

  • In this example each 3-word group resulting from UNLED happens to have at least one possible in-pool, hole-in-one guess. Such 3-word groups have E(Gi)=1.667. Each of the 3 words [goofy, bigot, ghost] in group 00000 is an in-pool, hole-in-one guess. In group 01000 [owing, going, thong], the two words owing and going are in-pool, hole-in-one guesses. It is assumed one figures that out and would use those in-pool, hole-in-one guesses later on if the leading guess, UNLED in this case, reveals the solution is in one of these groups.
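
The UNLED tally can be checked with exact fractions. The group-size breakdown used below is not listed in the text; it is inferred from the stated facts (25 words, 18 groups, exactly two 3-word groups, largest group 3 words), which force 13 singletons and 3 two-word groups. `expected_steps` is a hypothetical helper name for this sketch.

```python
from fractions import Fraction


def expected_steps(groups):
    """Sum p(Gi) * [1 + E(Gi)] over (group size, E(Gi)) pairs."""
    total_words = sum(size for size, _ in groups)
    return sum(Fraction(size, total_words) * (1 + e_gi) for size, e_gi in groups)


# UNLED over the 25 remaining possible solutions (breakdown inferred, see above).
unled_groups = (
    [(1, 1)] * 13                  # non-perfect singleton: E(Gi) = 1
    + [(2, Fraction(3, 2))] * 3    # 2-word group, in-pool guess: E(Gi) = 1.5
    + [(3, Fraction(5, 3))] * 2    # 3-word group with an in-pool hole-in-one: E(Gi) = 5/3
)
e = expected_steps(unled_groups)
print(e, float(e))  # 57/25 2.28
```

Kept as exact fractions, the 3-word groups contribute exactly 16/25, so no rounding residue like the 0.0001 appears.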

Drilling Into The 3 Word Groups To See If They Are In-Pool Hole-In-One
  • Submitting each 3-word group to its own group optimal operation in the Groups Driller is shown here. The min-max of 1 indicates that every group resulting from the highlighted guess word contains only one word, so that guess is a hole-in-one guess. Being in-pool guesses, these are in-pool, hole-in-one guesses. Thus both 3-word groups have E(Gi)=1.667.

    'EXPECT_3word_grps_inpool_hio3.png Image' 'EXPECT_3word_grps_inpool_hio3.png Image'

GILET

  • These are the groups into which GILET divides the 25 remaining possible solutions, according to how GILET matches each of the 25 words.

    'EXPECT_gilet_groups.png Image'

  • There is an expected number of steps associated with each group. The following table outlines how the groups for GILET tally up to arrive at E()=2.2800. As it happens, this E( ) matches that of UNLED even though GILET has a 4-word maximum group.

    'EXPECT_gilet_expected.png Image'

  • GILET's E( ) equals UNLED's E( ) because GILET's 4-word group has two in-pool, hole-in-one guesses. Thus, this 4-word group's E(Gi) can be 1.75.

GOLEM

  • Now let's examine the E( ) for the in-pool guess GOLEM.

  • These are the groups into which GOLEM divides the 25 remaining possible solutions, according to how GOLEM matches each of the 25 words.

    'EXPECT_golem_groups.png Image'

  • GOLEM's table is as follows. GOLEM has an expected number of steps E()=2.32. This E( ) is larger than that for UNLED and GILET, as might be expected since GOLEM produces 16 groups versus 18.

    'EXPECT_golem_expected.png Image'

  • GOLEM is an in-pool guess. Therefore, it has a 22222 "perfect match" group, and that group has E(Gi)=0. Notice that the chance GOLEM is the solution makes only a tiny difference to the expected number of steps. The difference is small because there are 25 possible words, so that chance is only 1/25.

  • GOLEM's 5-word group does have in-pool, hole-in-one guesses, and thus it has E(Gi)=1.8.

Note

  • A spectacular example was intentionally not chosen. Luck plays a large role in Wordle.
  • In practice, fractional differences between guesses' E( ) values are not significant. Compare the groups generated by UNLED, GILET and GOLEM. Pretend various words from the list of 25 remaining possible solutions are the solution and see whether they fall into a singleton group. In this example GHOUL was the actual solution. GHOUL is not in a singleton group for any of the example guesses GOLEM, UNLED and GILET. GOLEM and UNLED, not GILET, would have been the better guesses, but both lead to 2 remaining possible solutions. GILET would have resulted in 4 remaining possible solutions.

Definitions and Fundamentals

  • Match Clue is the five-character code equivalent of Wordle's Grey, Yellow and Green letter colors for matching a guess word's letters to the target word. The character 0 corresponds to Grey, 1 to Yellow and 2 to Green. Each of the five code positions corresponds to a letter position in the five-letter guess word. Match Clue 22222 means all the letters match. Match Clue 00000 means no letters match.
    • For example, the guess word CARGO for the solution word GHOUL has the match clue 00011. The '000' represents the letters CAR, which are not in GHOUL. The '11' represents the letters G and O, which are present in GHOUL but not in the last two letter positions.

  • Remaining Possible Solutions refers to all the possible solutions that satisfy the currently known clues. The remaining possible solutions must contain the day's Wordle solution word. This fundamental premise is required.

  • Groups refers to the unique word groups into which the remaining possible solutions divide, according to how the words match or mismatch the letters of a candidate guess word. The candidate guess word matches or mismatches all the words in a group in the same, unique way. Every remaining possible solution is in exactly one of the groups.
    • The Wordle Helper uses 0 in the match clue to mean a letter in the candidate guess word is not present. This corresponds to Wordle Grey color.
    • The Wordle Helper uses 1 in the match clue to mean a letter in the candidate guess word is present, but not at this position. This corresponds to Wordle Yellow color.
    • The Wordle Helper uses 2 in the match clue to mean a letter in the candidate guess word is present and at the correct position. This corresponds to Wordle Green color.

  • Singleton refers to a group containing only one word.
    • All singleton groups other than a "perfect match" have E( )=1. In the context of the guess falling into such a non-perfect-match group, the guess step is already made, and only one word, the solution, remains to select. The contribution of a non-perfect-match singleton group to the expected number of steps is p(Gi)*(1 + 1). Falling into that condition results in 2 steps total: the first step, the 'guess', leaves one remaining word, and the second step is 'guessing' that last remaining word.
    • The singleton "perfect match" group is E( )=0. In the context of the guess falling into that perfect match group the guess step is already made. That guess's "perfect match" group's contribution to the expected number of steps is p(Gi)*(1 + 0). Falling into that condition results in 1 step total.

  • In-Pool refers to a guess word selected from the remaining possible solutions pool. The day's Wordle solution is one of those remaining possible solutions. Which word is the solution is unknown, but a guess's "perfect match" group can contain only that guess word, so it is a singleton group with E( )=0. In-pool categorization is useful when examining a guess whose groups result is hole-in-one, where every group is a singleton. This condition means E( )<2, and the amount by which it falls below 2 can be significant.

  • Out-Of-Pool refers to a selected guess that is not a member of the remaining possible solutions. That guess might have been selected from a pool of words that included the remaining possible solutions, but it was not one of them, so it can never be the solution. Such a guess cannot have a singleton "perfect match" group. Out-of-pool categorization is useful when examining a guess whose groups result is hole-in-one, where every group is a singleton. This condition means the guess resolves to E()=2, and no further examination is necessary.

  • Hole-In-One (HIO) refers to a groups result where every group contains only one word; every group is a singleton. In other words, a hole-in-one is when a guess used on a remaining possible solution list of N words results in N groups. Thus, the largest group size reported is 1, because the guess divided the N remaining possible solutions N ways. Remember, no word can be in more than one group.

    • An out-of-pool, hole-in-one guess has E( )=2: the solution is always found in 2 steps. The first guess, being out-of-pool, cannot be the solution, but being hole-in-one, it leaves exactly one remaining possible solution regardless of what the solution actually is; it eliminates all other candidates whichever candidate is the solution. The second guess can only be the solution. Contrary to conventional wisdom, out-of-pool, hole-in-one guesses are possible even for seemingly large remaining possible solution lists, like nine words or more. Using the largest vocabulary guess pool provides more out-of-pool, hole-in-one opportunities.
    • An in-pool, hole-in-one guess has E( )<2. Its E( ) equals 2 - 1/N for a remaining possible solution list of size N, starting at 1.667 for N=3 and approaching 2 as N increases.
  • Groups with two words always have E( )=1.5, because the guess in that situation has to be in-pool. If out-of-pool guesses were allowed, every guess could leave the same two words, and the expected number of steps could be infinite.

  • Context concerns the state or viewpoint from which one considers different outcomes. The context for the expected number of steps to solve E( ) for a guess is different before the guess is made than after it is made, and it is easy to confuse the two. After the guess is made, the context includes the guess step used to arrive at the list of possible groups resulting from that guess. In that after-the-guess context, the expected number of steps to solve E( ) for any one of those groups does not include the arriving step. This applies to all subsequent groups generated from a group, and it is the reason [1+E(Gi)] appears in the formula.
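
The Match Clue, Groups and Hole-In-One definitions can be sketched in Python as follows. This is an illustrative sketch, not the Wordle Helper's implementation: the function names are hypothetical, and repeated letters are assumed to follow standard Wordle rules (greens first, then yellows limited to the unmatched copies of each letter in the target). It reproduces the CARGO/GHOUL clue and the Groups Driller check of UNLED's two 3-word groups.

```python
from collections import Counter, defaultdict


def match_clue(guess: str, target: str) -> str:
    """Match Clue: '2' = Green, '1' = Yellow, '0' = Grey, one character per letter position."""
    clue = ["0"] * len(guess)
    unmatched = Counter()
    # Pass 1: greens; remember target letters not matched in place.
    for i, (g, t) in enumerate(zip(guess, target)):
        if g == t:
            clue[i] = "2"
        else:
            unmatched[t] += 1
    # Pass 2: yellows, limited to the unmatched copies of each letter (assumed Wordle rule).
    for i, g in enumerate(guess):
        if clue[i] == "0" and unmatched[g] > 0:
            clue[i] = "1"
            unmatched[g] -= 1
    return "".join(clue)


def groups(guess: str, solutions: list) -> dict:
    """Groups: each remaining possible solution lands in exactly one group, keyed by match clue."""
    out = defaultdict(list)
    for word in solutions:
        out[match_clue(guess, word)].append(word)
    return dict(out)


def is_hole_in_one(guess: str, solutions: list) -> bool:
    """Hole-in-one: N remaining possible solutions split into N groups (all singletons)."""
    return len(groups(guess, solutions)) == len(solutions)


print(match_clue("cargo", "ghoul"))  # 00011
for group in (["owing", "going", "thong"], ["goofy", "bigot", "ghost"]):
    print({word: is_hole_in_one(word, group) for word in group})
# {'owing': True, 'going': True, 'thong': False}
# {'goofy': True, 'bigot': True, 'ghost': True}
```

THONG is not a hole-in-one for its group because GOING and OWING both give it the same clue, 00122.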

E( ) Table

  • This table shows the E( ) values for word counts up to 12 words, both for hole-in-one guesses (group quantity equal to the word count) and for guesses producing one fewer group than the word count.
  • Notice that the E( ) value for the in-pool, hole-in-one guess situation approaches 2.00 as the number of words in a group gets larger, while the out-of-pool, hole-in-one guess situation is always 2.00 regardless of the word group size. The chance that the in-pool guess is itself the solution becomes less significant compared to the out-of-pool guess as the group grows.
  • Out-of-pool, hole-in-one guess situations are always 2-step situations regardless of word group size. They have an expected number of steps E( )=2; in fact only 2-step outcomes exist. An in-pool guess for an N-word list that results in N-1 groups (in other words, one 2-word group among the N-1 groups) also has E( )=2. There is a difference, though. The out-of-pool, hole-in-one guess results only in 2-step outcomes, whereas the in-pool N-1 group situation can also result in 1-step and 3-step outcomes. Its average expected number of steps is still 2.
Word Count  Group Qty  E() in-pool HIO  E() out-of-pool HIO  E() in-pool N-1 groups  E() out-of-pool N-1 groups
2           2          1.50             na                   na                      na
3           3          1.67             2.00                 na                      na
3           2          na               na                   2.00                    2.33
4           4          1.75             2.00                 na                      na
4           3          na               na                   2.00                    2.25
5           5          1.80             2.00                 na                      na
5           4          na               na                   2.00                    2.20
6           6          1.83             2.00                 na                      na
6           5          na               na                   2.00                    2.17
7           7          1.86             2.00                 na                      na
7           6          na               na                   2.00                    2.14
8           8          1.88             2.00                 na                      na
8           7          na               na                   2.00                    2.13
9           9          1.89             2.00                 na                      na
9           8          na               na                   2.00                    2.11
10          10         1.90             2.00                 na                      na
10          9          na               na                   2.00                    2.10
11          11         1.91             2.00                 na                      na
11          10         na               na                   2.00                    2.09
12          12         1.92             2.00                 na                      na
12          11         na               na                   2.00                    2.08
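
Each row of the table follows from the general formula. For N words: in-pool HIO gives (1/N)(1) + ((N-1)/N)(2) = 2 - 1/N; out-of-pool HIO gives 2; in-pool with one 2-word group gives (1/N)(1) + ((N-3)/N)(2) + (2/N)(2.5) = 2; out-of-pool with one 2-word group gives ((N-2)/N)(2) + (2/N)(2.5) = 2 + 1/N. The minimal sketch below, with hypothetical helper names, regenerates the values with exact fractions and rounds halves up for display (for example 2.125 to 2.13), which appears to be how the table was rounded.

```python
from decimal import Decimal, ROUND_HALF_UP
from fractions import Fraction

TWO_WORD_E = Fraction(3, 2)  # E(Gi) of a 2-word group (in-pool guess assumed)


def expected_steps(groups):
    """Sum p(Gi) * [1 + E(Gi)] over (group size, E(Gi)) pairs."""
    total = sum(size for size, _ in groups)
    return sum(Fraction(size, total) * (1 + e_gi) for size, e_gi in groups)


def singles(k):
    """k singleton groups that are not the perfect match: E(Gi) = 1 each."""
    return [(1, 1)] * k


def table_row(n):
    """E( ) for an N-word list: in/out-of-pool HIO, then in/out-of-pool with N-1 groups."""
    perfect = [(1, 0)]  # in-pool guess: the "perfect match" group has E(Gi) = 0
    in_hio = expected_steps(perfect + singles(n - 1))
    if n < 3:           # 2-word lists are assumed to use an in-pool guess only
        return in_hio, None, None, None
    out_hio = expected_steps(singles(n))
    in_pair = expected_steps(perfect + singles(n - 3) + [(2, TWO_WORD_E)])
    out_pair = expected_steps(singles(n - 2) + [(2, TWO_WORD_E)])
    return in_hio, out_hio, in_pair, out_pair


def fmt(x):
    """Two decimals, rounding halves up; None prints as 'na'."""
    if x is None:
        return "na"
    exact = Decimal(x.numerator) / Decimal(x.denominator)
    return str(exact.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP))


print("N  in-pool HIO  out-of-pool HIO  in-pool N-1 groups  out-of-pool N-1 groups")
for n in range(2, 13):
    print(n, *(fmt(v) for v in table_row(n)))
```

Rounding half-even instead (Python's default float formatting) would print 2.12 for N=8, which is why the display uses `Decimal` with `ROUND_HALF_UP`.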