Playing “Codenames” Using Word Embeddings

Zane Blanton
6 min read · Feb 4, 2023

The GitHub repo is here: https://github.com/zscore/codenames-embeddings

The Idea

Codenames is a very popular board game played by two teams, each with one Spymaster and one or more Operatives. The operatives see this view:

Operatives’ View

And spymasters see this view:

Spymaster’s View

If you’re the Red Spymaster, you want to give a clue that will point your operatives towards the words marked in red and away from the words marked in blue and especially black! The beige words are neutral, so it’s not a big problem if either side picks them.

For example, MUZZLED 2 might be a good clue for the red team: a BEAR can be muzzled, and a COLLAR is also used on animals along with a muzzle. But the operatives might also select CHAIN, which would be bad, since that word belongs to the blue team!

Word Similarity

The core of this AI is a measure of similarity between pairs of English words. Given a good similarity measure, we can take a dictionary of the most common English words and pick a clue that is similar to some red words and dissimilar to the blue words and the black word.

Similarity Function

To get a similarity function, I used embeddings from fastText trained on the Common Crawl: https://fasttext.cc/docs/en/english-vectors.html

This dataset represents the semantic content of each word with 300 real numbers. To compare two vectors, we use the cosine similarity function:

similarity(A, B) = cos(θ) = (A · B) / (‖A‖ ‖B‖)

This is turned into a distance function by subtracting it from 1. In principle, cosine distance ranges from 0 (same direction) to 2 (opposite directions), but in practice the distance between two word embeddings almost always falls between 0 and 1.

Examples of Cosine Distance
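As a concrete sketch, here is how that distance might be computed with NumPy; the `vectors` dict of word embeddings in the usage comment is illustrative, not the repo's actual code:

```python
import numpy as np

def cosine_distance(u, v):
    """1 minus the cosine similarity of two embedding vectors."""
    return 1.0 - np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

# e.g. cosine_distance(vectors["bear"], vectors["muzzle"])
```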

Similarity Transform

I wanted to transform this similarity function so that it agreed with my perception of the words’ similarity. In my view, everything up to a distance of 0.3 is an almost perfect match, and matches decline in quality up to 0.6, so I tried to fit a logistic function to match this. I think the function with a=12, b=16 works well.

If we wanted to calibrate this properly, we could measure how often human raters accept a word at a given distance as a match for a target word and fit a curve to that data.
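The exact form of the logistic isn’t spelled out above, so here is one plausible reading with a=12 and b=16: a sigmoid of b·d − a, which stays near 1 for small distances and drops toward 0 as the distance approaches 1:

```python
import numpy as np

# Assumed parameterization: sigma(a - b*d); only a=12, b=16 are given above.
def transformed_similarity(d, a=12.0, b=16.0):
    """Map a cosine distance d to a rough probability of a good match."""
    return 1.0 / (1.0 + np.exp(b * d - a))

# transformed_similarity(0.3) ≈ 0.999
# transformed_similarity(0.6) ≈ 0.92
# transformed_similarity(0.9) ≈ 0.08
```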

Utility Function

My utility function maps the transformed similarities (treated here as probabilities) into a single number so that we can rank candidate clues. Assuming you are the red team’s Spymaster, we take a weighted sum of the transformed similarity over all codenames still in play:

  • +1.0 for each red codename, since we score a point for each one we hit;
  • −1.0 for each blue codename, since picking one gives the other team a point;
  • −0.04 for each beige codename, because of the opportunity cost of a wasted guess (perhaps this penalty should also be folded into the blue codenames);
  • −15 for the black codename, since picking it means immediate defeat.
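Here is a minimal sketch of that scoring, reusing the `cosine_distance` and `transformed_similarity` helpers sketched above; the board representation is illustrative, not the repo’s actual code:

```python
# Weights from the post. The black weight is large and negative because
# revealing the assassin loses the game outright.
WEIGHTS = {"red": 1.0, "blue": -1.0, "beige": -0.04, "black": -15.0}

def clue_utility(clue_vec, board, vectors):
    """board: list of (codename, color) pairs still in play;
    vectors: dict mapping words to their embedding arrays."""
    return sum(
        WEIGHTS[color]
        * transformed_similarity(cosine_distance(clue_vec, vectors[word]))
        for word, color in board
    )
```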

Filtering

Dictionary

We sometimes run into the problem of getting embeddings for tokens that are not really words. For example:
`,` `.` `I` `)` `:` `“` `(` `!` `'s`

In addition, some embeddings are for quite obscure tokens. Do you recognize all of these words?

Aniket, Nahin, Fwy, e39, Haniyeh, Ravnica, accoglienza, Currumbin, Kawabata, Ced, Dorin, Dohrn

I solved this issue by loading a dictionary of the most common English words and keeping only the embedding vocabulary found there. Link to the word list: https://github.com/dwyl/english-words
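A sketch of that filter, assuming the `words_alpha.txt` file from that repo and an `embedding_vocab` iterable holding the fastText vocabulary (both names illustrative):

```python
# Keep only vocabulary entries that appear in the English word list.
with open("words_alpha.txt") as f:
    english_words = {line.strip() for line in f}

candidate_clues = [w for w in embedding_vocab if w.lower() in english_words]
```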

Lemmatizer

I also wanted to avoid presenting the same word more than once (stings, sting, stung, stinging), so I used a lemmatizer to reduce everything to a standard form. For this purpose, I used spaCy: I loaded `en_core_web_sm` and ran its pipeline to get the lemmas for our words. Sometimes the lemmatizer doesn’t behave the way we want, though; for example, viking is mapped to vike (did you even know that was a word?) and dressing is mapped to dress, so perhaps we could leave this step out if we don’t mind wading through more suggestions.
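A sketch of the deduplication with spaCy, keeping the first word seen for each lemma (`candidate_clues` carries over from the dictionary filter above):

```python
import spacy

nlp = spacy.load("en_core_web_sm")

def lemma(word):
    """Lemmatize a single word, e.g. 'stinging' -> 'sting'."""
    return nlp(word)[0].lemma_

seen_lemmas = set()
deduped_clues = []
for word in candidate_clues:
    l = lemma(word)
    if l not in seen_lemmas:
        seen_lemmas.add(l)
        deduped_clues.append(word)
```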

Controls

I use some JavaScript to pull the codenames out of the website at codenames.game and add them to my Jupyter notebook. Then we calculate the utility of every word in our dictionary and sort, displaying how similar each suggested clue is to each codename on the board. You can untick codenames after they’ve been selected and change teams to play through an entire game.
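Tying the sketches together, the ranking step might look like this (again with illustrative names, not the notebook’s exact code):

```python
# Score every candidate clue against the current board and print the top hits.
scored = sorted(
    ((clue_utility(vectors[w], board, vectors), w) for w in deduped_clues),
    reverse=True,
)
for score, word in scored[:20]:
    print(f"{score:+.3f}  {word}")
```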

Team can be chosen (blue or red)
Codenames can be unticked after they are chosen

Example

Red Starts

First page of suggestions
Second page of suggestions

Some of the suggestions on the second page are interesting. I would be tempted to try CAGED for three, although I’m not sure why ANIMAL is not an option. I think CAGED would suggest RAT, HAWK, and BEAR for red, and the only beige word it would also pull in is WHALE, which is not so bad.

AI View for Blue after we take away a few boxes

Page 1
Page 2

I would choose SUPERSTRUCTURES here! Cool word.

Drawbacks

When using the expansion, I found that there were some difficulties with similarities to some of the proper nouns, although this might be fixed by finding a better set of embeddings. Also, the parser sometimes splits up words that should not be split up, like ST. PATRICK, and the lemmatizer over-normalizes certain words like VIKING. An easy alternative would be to paste a description of the game Codenames and the clues into an LLM like ChatGPT and use that instead. I tried this with my friend, and it also performs pretty well, although it’s a bit hard to get it to give alternative answers, and it doesn’t necessarily understand how to avoid the assassin word.

Further Directions

  • https://arxiv.org/abs/2105.05885: this paper tackles the same problem in a more sophisticated way.
  • We could try a different set of embeddings that works better.
  • We could recalibrate the utility function or the transformation from cosine distance to probabilities.
  • We could use a word frequency metric to avoid suggesting clues that are too obscure.
  • We could try combining embeddings with a knowledge graph.
  • A Google Chrome plugin could give recommendations while playing on codenames.game.

