Westworld in Data

Unravel the mysteries of Westworld through data visualization.

Check back each week for updates!

The Westworld network

Which characters share scenes? Click and drag a character's image to explore their relationships. The thicker the line and the more forceful the pull, the stronger the connection between characters.
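The site doesn't say exactly how connection strength is measured. Below is one plausible approach, offered as an assumption and not as the method behind the graph. It treats two characters as sharing a scene when both speak in the same conversation_id of the Lines table, and it weights each edge by the number of such conversations. The rows are made up, character names stand in for character_id values, and conversation_id is assumed to be unique across episodes (if it isn't, key on the (episode_id, conversation_id) pair instead).

```python
from collections import Counter
from itertools import combinations

# Made-up Lines rows: (conversation_id, character). Names stand in for
# character_id values so the output is readable.
lines = [
    (1, "Dolores"), (1, "Teddy"), (1, "Dolores"),
    (2, "Dolores"), (2, "Bernard"),
    (3, "Dolores"), (3, "Teddy"), (3, "Man in Black"),
]

def edge_weights(rows):
    """Count the conversations in which each pair of characters both speak."""
    speakers_by_conversation = {}
    for conversation_id, character in rows:
        speakers_by_conversation.setdefault(conversation_id, set()).add(character)
    weights = Counter()
    for speakers in speakers_by_conversation.values():
        # sorted() gives each unordered pair a single canonical key.
        for pair in combinations(sorted(speakers), 2):
            weights[pair] += 1
    return weights

weights = edge_weights(lines)
```

Here Dolores and Teddy talk in two conversations, so their edge gets weight 2. Every other pair gets weight 1.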

Who knows about Arnold?

Which characters get mentioned the most, and by whom? The darker the square, the more often the character is mentioned.
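A chart like this can be built from the Mentions table by counting each (speaker, mentioned character) pair. Each count fills one square, with rows for speakers and columns for the characters they mention. Here is a minimal sketch, assuming made-up rows and names in place of ids. How counts map to shades of color is not described on the site, so that part is left out.

```python
from collections import Counter

# Made-up Mentions rows: (speaking character, mentioned character).
mentions = [
    ("Bernard", "Arnold"), ("Ford", "Arnold"), ("Ford", "Arnold"),
    ("Dolores", "Arnold"), ("Dolores", "Teddy"),
]

# One count per (speaker, mentioned) pair; each becomes a square in the grid.
cells = Counter(mentions)

# Total mentions per character, used to order the columns.
totals = Counter(mentioned for _, mentioned in mentions)

speakers = sorted({speaker for speaker, _ in mentions})
mentioned = sorted(totals, key=lambda c: (-totals[c], c))
# Rows are speakers, columns are mentioned characters; missing pairs count 0.
grid = [[cells[(s, m)] for m in mentioned] for s in speakers]
```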

Which characters dominate each episode?

Hover over a point on the graph to see how many words a character spoke. Dark clusters indicate which storylines have advanced in each episode.

Who speaks the most?

Word counts per character for every episode so far.
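Both the per-episode view and the running totals come down to summing word_count from the Lines table. Group by episode and character for the first, and by character alone for the second. The sketch below uses made-up rows and names in place of ids. It is an illustration, not the site's own code.

```python
from collections import defaultdict

# Made-up Lines rows: (episode_id, character, word_count).
lines = [
    (1, "Dolores", 12), (1, "Ford", 30), (1, "Dolores", 8),
    (2, "Ford", 5), (2, "Maeve", 20),
]

per_episode = defaultdict(int)  # (episode_id, character) -> words spoken
totals = defaultdict(int)       # character -> words across all episodes
for episode_id, character, word_count in lines:
    per_episode[(episode_id, character)] += word_count
    totals[character] += word_count

# Characters ordered from most to fewest words spoken so far.
ranking = sorted(totals.items(), key=lambda item: item[1], reverse=True)
```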

Which group has the bulk of the dialogue?

Women or men? Hosts or humans? These charts break down the percentage of words spoken by men vs. women and hosts vs. humans.
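These breakdowns amount to joining Lines to Characters, summing word_count for each group (sex, or is_human), and dividing by the total. Below is a minimal sketch with made-up characters and counts. The "female"/"male" values are an assumption about how the sex column is coded.

```python
# Made-up Characters rows: character -> (sex, is_human).
characters = {
    "Dolores": ("female", False),
    "Maeve": ("female", False),
    "Ford": ("male", True),
    "William": ("male", True),
    "Teddy": ("male", False),
}
# Made-up Lines rows: (character, word_count).
lines = [("Dolores", 40), ("Maeve", 10), ("Ford", 30), ("William", 15), ("Teddy", 5)]

def share_of_words(lines, group_of):
    """Return {group: percentage of all words spoken by that group}."""
    words = {}
    for character, word_count in lines:
        group = group_of(character)
        words[group] = words.get(group, 0) + word_count
    total = sum(words.values())
    return {group: 100 * count / total for group, count in words.items()}

by_sex = share_of_words(lines, lambda c: characters[c][0])
by_type = share_of_words(lines, lambda c: "human" if characters[c][1] else "host")
```

Because is_human can change as the season reveals who is a host, percentages like these are only as current as the Characters table.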

Word clouds

What are characters actually talking about?

Who has the foulest mouth in Westworld?

Who swears the most? Do hosts or humans curse more?
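The site doesn't publish the list of words it counted as swearing. The sketch below therefore uses a small hypothetical list (SWEARS) and invented lines. It tokenizes each line, counts matches, and tallies the results per character and per host/human group. In the real dataset, is_human would come from joining Lines to Characters.

```python
import re
from collections import Counter

# Hypothetical swear list; not the list the site used.
SWEARS = {"damn", "hell", "bastard"}

# Invented rows: (character, is_human, line).
lines = [
    ("Man in Black", True, "Hell, I don't care. Damn this place."),
    ("Dolores", False, "What a beautiful morning."),
    ("Teddy", False, "You bastard."),
]

def count_swears(text):
    # Lowercase, then keep runs of letters and apostrophes as words.
    words = re.findall(r"[a-z']+", text.lower())
    return sum(1 for word in words if word in SWEARS)

by_character = Counter()
by_type = Counter()
for character, is_human, text in lines:
    n = count_swears(text)
    by_character[character] += n
    by_type["human" if is_human else "host"] += n
```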

The origin of this project

This is an unofficial fan site dedicated to scrutinizing HBO's original series Westworld from a data perspective. It was made by the theory-obsessed employees of Mode Analytics for a hack day project. Read all about the process in this blog post.

The Data

You can access this dataset of Westworld characters and their lines through the Mode Public Warehouse. Sign up for a free Mode account to export the tables as CSVs or explore them with SQL and Python in Mode.

The Methodology

We obtained Westworld scripts from the Springfield! Springfield! film and TV script database.
We manually combed through each script to create four tables:

Episodes:

  • id: The episode's unique id number.
  • season: The season number.
  • episode: The episode number.
  • name: The episode name.

Characters:

  • id: The character's unique id number.
  • name: The character's name.
  • sex: The sex of the character.
  • is_human: Boolean. Whether the character is a human or a host. This may change as we learn new information throughout the season.
  • is_named: Boolean. Whether the character has a name. For instance, is_named for Dolores would equal 'true' and is_named for Guest 1 would equal 'false'.
  • is_major_character: Whether the character is part of the main cast or seems very important (e.g., Arnold or Wyatt).

Lines:

  • id: The line's unique id number.
  • episode_id: The id of the episode the line appears in (the id column of Episodes).
  • conversation_id: The conversation id number.
  • character_id: The id of the character who is speaking.
  • line: A string of text. For our purposes, we defined a line as the words a character speaks without being interrupted by another character. We didn't count sound effects or unintelligible sounds, such as gunshots or screaming.
  • word_count: The number of words in the character's line.
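Under the "uninterrupted speech" definition, consecutive script entries by the same speaker merge into a single line. Here is a minimal sketch of that rule. It assumes the utterances arrive in script order with sound effects already removed. It also assumes word_count means whitespace-separated tokens, because the page doesn't state how words were counted. The Line dataclass and build_lines function are hypothetical helpers, not part of the dataset.

```python
from dataclasses import dataclass

@dataclass
class Line:
    character: str
    line: str
    word_count: int

def build_lines(utterances):
    """Merge consecutive utterances by the same speaker into one line.

    utterances: (character, text) pairs in script order, with sound effects
    and unintelligible sounds already removed.
    """
    lines = []
    for character, text in utterances:
        if lines and lines[-1].character == character:
            # Same speaker and no interruption: extend the current line.
            merged = lines[-1].line + " " + text
            lines[-1] = Line(character, merged, len(merged.split()))
        else:
            lines.append(Line(character, text, len(text.split())))
    return lines

result = build_lines([
    ("Bernard", "Can you hear me?"),
    ("Dolores", "Yes."),
    ("Dolores", "I can hear you."),
    ("Bernard", "Good."),
])
```

Dolores's two entries become one five-word line because Bernard doesn't speak between them.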

Mentions:

  • id: The mention's unique id number.
  • episode_id: The id of the episode the mention appears in (the id column of Episodes).
  • conversation_id: The conversation id number.
  • speaking_character_id: The id of the character who is speaking.
  • mentioned_character_id: The id of the character whom the speaking character referred to by a proper noun. For instance, Dolores saying “Daddy” would count, but “he” or “her father” wouldn't.
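To try SQL against a local copy of the four tables, the sketch below recreates them in an in-memory SQLite database with Python's built-in sqlite3 module. The lowercase table names, column types, and foreign-key references are assumptions based on the descriptions above, not the Mode warehouse's actual definitions, and the inserted rows are made up. The final query ranks characters by how often they are mentioned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed schema; names and types are guesses from the column descriptions.
conn.executescript("""
CREATE TABLE episodes (id INTEGER PRIMARY KEY, season INTEGER, episode INTEGER, name TEXT);
CREATE TABLE characters (
    id INTEGER PRIMARY KEY, name TEXT, sex TEXT,
    is_human BOOLEAN, is_named BOOLEAN, is_major_character BOOLEAN
);
CREATE TABLE lines (
    id INTEGER PRIMARY KEY,
    episode_id INTEGER REFERENCES episodes(id),
    conversation_id INTEGER,
    character_id INTEGER REFERENCES characters(id),
    line TEXT, word_count INTEGER
);
CREATE TABLE mentions (
    id INTEGER PRIMARY KEY,
    episode_id INTEGER REFERENCES episodes(id),
    conversation_id INTEGER,
    speaking_character_id INTEGER REFERENCES characters(id),
    mentioned_character_id INTEGER REFERENCES characters(id)
);
""")

# Made-up rows for illustration.
conn.executemany("INSERT INTO characters (id, name) VALUES (?, ?)",
                 [(1, "Dolores"), (2, "Arnold"), (3, "Bernard")])
conn.executemany(
    "INSERT INTO mentions (episode_id, conversation_id, speaking_character_id,"
    " mentioned_character_id) VALUES (?, ?, ?, ?)",
    [(1, 1, 1, 2), (1, 2, 3, 2), (2, 5, 1, 3)],
)

# Who gets mentioned the most?
most_mentioned = conn.execute("""
    SELECT c.name, COUNT(*) AS times_mentioned
    FROM mentions AS m
    JOIN characters AS c ON c.id = m.mentioned_character_id
    GROUP BY c.name
    ORDER BY times_mentioned DESC, c.name
""").fetchall()
```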

Share your work

Been doing some Westworld data wrangling of your own, pardner? Send your data viz to westworld@modeanalytics.com and we might include it on the site!