Building a recommender system — Part 1

As my final project before graduating from Flatiron, I chose to build a recommendation system: Zero to Hero, an Overwatch hero recommender.

For those who don’t know, Overwatch is a dynamic 6-a-side team arena game. Players fight with heroes to control objectives and win matches.

I know this domain well, and one of the biggest issues I see is choosing heroes appropriate to the current situation. Twelve players take part in a single match, all with different skill levels and game understanding. Heroes have different synergies and matchups and excel at different objectives in the game. There are often situations that call for a certain hero or hero type, but no one steps up to fill the role.

With a recommendation system, I thought, I could help players of all levels diversify the heroes they play from the roster of 31 (32 as of last week).

In the first part of this two-part series, we will look at installing the relevant libraries and assembling the data.

Let’s get into it:

1. First, install the Scikit-Surprise library and import pandas.

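There are other recommender libraries out there, but this one is simple, and its name reflects that: Surprise stands for Simple Python RecommendatIon System Engine. We can install it on Macs easily with `pip install scikit-surprise`.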

Installation on Windows appears to be a bit trickier, as you may need Cython and a C compiler to build the package. One way to get the compiler is to install the Microsoft Visual C++ 14.0 Build Tools, which ship with Visual Studio 2015 and later.

2. Next, assemble the data.

Initially, my data came in the format: User|Hero1|Hero2|Hero3|Hero4|Hero5

Initial data format from the source.
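As a rough illustration of that wide layout, here is a minimal sketch of what a few rows might look like in pandas. The user names and hero picks are invented for the example and are not taken from my dataset.

```python
import pandas as pd

# Hypothetical sample rows: one row per user, one column per top-five slot.
raw = pd.DataFrame(
    {
        "User": ["player_a", "player_b", "player_c"],
        "Hero1": ["Mercy", "Reinhardt", "Genji"],
        "Hero2": ["Ana", "Zarya", "Tracer"],
        "Hero3": ["Lucio", "Mercy", None],  # a missing pick, as real data may have
        "Hero4": ["Moira", "Ana", "Hanzo"],
        "Hero5": ["Zenyatta", "Winston", "Widowmaker"],
    }
)

print(raw)
```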

Scikit-Surprise likes the data to come in with three key columns: User, Item, and Rating.

The data needs to be in this format for the surprise library.

2.1 The data is preprocessed to drop rows with missing values and to get dummies (one-hot columns) for every hero in each of the five hero columns. This allows me to give each user a rating for every hero in the cast of 31.

Taking out missing values and obtaining dummy data

Getting dummies for this data made the column count over 150! Each hero gets one dummy column per slot, so 5 slots × 31 heroes gives up to 155 columns.
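A minimal sketch of this preprocessing step, assuming the wide `User|Hero1…Hero5` layout and made-up sample rows. The variable names (`raw`, `clean`, `dummies`) are my own choices for the example.

```python
import pandas as pd

# Hypothetical sample data in the wide format described above.
raw = pd.DataFrame(
    {
        "User": ["player_a", "player_b", "player_c"],
        "Hero1": ["Mercy", "Reinhardt", "Genji"],
        "Hero2": ["Ana", "Zarya", "Tracer"],
        "Hero3": ["Lucio", "Mercy", None],
        "Hero4": ["Moira", "Ana", "Hanzo"],
        "Hero5": ["Zenyatta", "Winston", "Widowmaker"],
    }
)
hero_cols = ["Hero1", "Hero2", "Hero3", "Hero4", "Hero5"]

# Drop users with an incomplete top five.
clean = raw.dropna(subset=hero_cols)

# One 0/1 column per (slot, hero) pair, named like "Hero1_Mercy".
dummies = pd.get_dummies(clean, columns=hero_cols, dtype=int)

print(dummies.columns.tolist())
```

With the full roster, every slot can contribute one column per hero, which is where the 150+ columns come from.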

A rating function turns all the ones from the dummy data into actual ratings.
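A rating function along these lines might look like the sketch below. The scale is an assumption: this post only fixes the endpoints (5 stars for the number 1 hero, 4.2 stars for number 5), so the evenly spaced steps of 0.2 in between, and the names `SLOT_RATINGS` and `rate`, are illustrative.

```python
# Assumed scale: only 5.0 (slot 1) and 4.2 (slot 5) are stated in the post;
# the intermediate values are evenly spaced guesses.
SLOT_RATINGS = {1: 5.0, 2: 4.8, 3: 4.6, 4: 4.4, 5: 4.2}


def rate(value, slot):
    """Turn a dummy value (0 or 1) into a star rating for the given slot."""
    return SLOT_RATINGS[slot] if value == 1 else 0.0


print(rate(1, 1), rate(1, 5), rate(0, 3))
```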

2.2 A curation function applies this over all the columns. It uses a for loop and conditionals to find which Hero number (1–5) each column belongs to. This data represents the top five heroes a user plays from a pool of 31. A hero in position 1 gets a 5-star rating, and a hero in position 5 gets a 4.2-star rating.

The curation function applies the rating function.
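One way such a curation step could be written is sketched below. It assumes dummy columns named like `Hero1_Mercy` (the default `pd.get_dummies` naming) and the same assumed 0.2-step scale; `curate` and `SLOT_RATINGS` are hypothetical names, not the exact code from my project.

```python
import pandas as pd

# Assumed scale: only the 5.0 and 4.2 endpoints come from the post.
SLOT_RATINGS = {1: 5.0, 2: 4.8, 3: 4.6, 4: 4.4, 5: 4.2}


def curate(dummies):
    """Replace the 1s in every dummy column with that slot's rating."""
    rated = dummies.copy()
    for col in rated.columns:
        for slot, rating in SLOT_RATINGS.items():
            # The conditional picks out which of the five slots a column belongs to.
            if col.startswith(f"Hero{slot}_"):
                rated[col] = rated[col] * rating
    return rated


# Hypothetical dummy data for two users.
dummies = pd.DataFrame(
    {
        "User": ["player_a", "player_b"],
        "Hero1_Mercy": [1, 0],
        "Hero1_Reinhardt": [0, 1],
        "Hero5_Ana": [1, 0],
        "Hero5_Winston": [0, 1],
    }
)
rated = curate(dummies)
print(rated)
```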

2.3 Collapsing the data back to just 3 columns:

This function uses two for loops and some indexing to create a new column `df[hero]`, which sums, for each user’s row, the values in every column belonging to that hero across the 150+ dummy columns. Because a hero appears in at most one of a user’s five slots, all the other values are 0 and the sum equals the rating that hero was given.

The long line of code is cut here but reads the same for heroes index [0 to 4].
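The same collapse can be sketched more compactly by selecting each hero's columns by name and summing across them. This is an alternative to the long indexed line, not a copy of it, and it assumes columns named `HeroN_<hero>`; the data and the `heroes` list are made up.

```python
import pandas as pd

# Hypothetical rated dummy data: two heroes, two slots, two users.
rated = pd.DataFrame(
    {
        "User": ["player_a", "player_b"],
        "Hero1_Mercy": [5.0, 0.0],
        "Hero1_Ana": [0.0, 5.0],
        "Hero2_Ana": [4.8, 0.0],
        "Hero2_Mercy": [0.0, 4.8],
    }
)
heroes = ["Mercy", "Ana"]

collapsed = rated[["User"]].copy()
for hero in heroes:
    # Every slot column for this hero, e.g. Hero1_Mercy and Hero2_Mercy.
    slot_cols = [
        c for c in rated.columns if "_" in c and c.split("_", 1)[1] == hero
    ]
    # At most one slot is non-zero per user, so the sum is the hero's rating.
    collapsed[hero] = rated[slot_cols].sum(axis=1)

print(collapsed)
```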

Finally, the code above shows the value of `.melt()`, which allows you to unpivot a data frame from wide to long format.
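To make the wide-to-long step concrete, here is a small sketch on invented data with one column per hero. By default, `melt` names the two new columns `variable` and `value`.

```python
import pandas as pd

# Hypothetical collapsed data: one rating column per hero.
collapsed = pd.DataFrame(
    {
        "User": ["player_a", "player_b"],
        "Mercy": [5.0, 4.8],
        "Ana": [4.8, 5.0],
        "Genji": [0.0, 0.0],
    }
)

# Keep User as the identifier; every hero column becomes rows.
long_df = collapsed.melt(id_vars=["User"])

print(long_df)
```

Each user now has one row per hero, so two users and three heroes give six rows.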

Rename the columns that have appeared to something more suitable:

Final touches.
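The rename itself can be as small as the sketch below, assuming the default `variable` and `value` column names that `melt` produces. The sample rows are invented.

```python
import pandas as pd

# Hypothetical melted data with melt's default column names.
long_df = pd.DataFrame(
    {
        "User": ["player_a", "player_b"],
        "variable": ["Mercy", "Mercy"],
        "value": [5.0, 4.8],
    }
)

ratings = long_df.rename(columns={"variable": "Hero", "value": "Rating"})

print(ratings.columns.tolist())
```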

This final piece of code leaves us with many rows and only 3 columns:
User | Hero | Rating

This is the shape the data needs to be in for the next part of this series, where we will look at how to use Scikit-Surprise to create recommendation predictions.
