Building a recommender system — Part 1

As my final project before graduating from Flatiron, I chose to build a recommendation system: Zero to Hero, an Overwatch hero recommender.

For those who don’t know, Overwatch is a dynamic 6-a-side team arena game. Players fight as heroes to control objectives and win matches.

I have plenty of business understanding in this domain, and one of the biggest issues I see is players not choosing heroes that suit the current situation. Twelve players take part in a single match, all with different skill levels and game knowledge. Heroes have different synergies and matchups and excel at different objectives. The situation often calls for a particular hero or hero type, but no one steps up to fill the role.

With a recommendation system, I thought, “I could help players of all levels broaden the range of heroes they can play from the roster of 31” (32 as of last week).

In the first part of this two-part series, we will look at installing the relevant libraries and assembling the data.

Let’s get into it:

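The library I used is Scikit-Surprise. There are other recommender libraries out there, but this one is simple, and its name reflects that: Surprise stands for Simple Python RecommendatIon System Engine. On a Mac it installs easily with pip install scikit-surprise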

Installation on Windows appears to be trickier: the package is built from Cython code, so you need to make sure that Cython and a C++ compiler are installed too. One way to get the compiler is to install Microsoft Visual Studio (or its standalone C++ Build Tools).

Initially, my data came in the format: User|Hero1|Hero2|Hero3|Hero4|Hero5

Initial data format from the source.

Scikit-Surprise expects the data in three key columns: User, Item, and Rating. Here, the item is the hero.

The data needs to be in this format for the surprise library.
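As a rough illustration of that target shape, the sketch below builds a tiny long-format frame and shows how it could be handed to Surprise. The users, heroes and ratings are made-up placeholders, and `to_surprise_dataset` is a hypothetical helper name; `Reader` and `Dataset.load_from_df` are Surprise's loaders for a pandas DataFrame whose three columns are, in order, user, item and rating. The (0, 5) rating scale is my assumption, based on the 0s and 5-star maximum described in section 2.

```python
import pandas as pd

# Made-up rows in the long format Surprise expects:
# one row per (user, item) pair, columns ordered user, item, rating.
ratings = pd.DataFrame(
    {
        "User": ["player_a", "player_a", "player_b"],
        "Hero": ["Ana", "Reinhardt", "Ana"],
        "Rating": [5.0, 4.8, 0.0],
    }
)


def to_surprise_dataset(df):
    """Hypothetical helper: wrap a long-format frame as a Surprise Dataset."""
    # Imported lazily so the rest of this sketch runs without scikit-surprise.
    from surprise import Dataset, Reader

    # Assumed scale: 0 (hero not in a user's top five) up to 5 stars.
    reader = Reader(rating_scale=(0, 5))
    return Dataset.load_from_df(df[["User", "Hero", "Rating"]], reader)


print(ratings)
```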

2.1 The data is preprocessed by dropping missing values and creating dummy (one-hot) columns for all heroes. This allows me to give each user a rating for every hero in the cast of 31.

Taking out missing values and obtaining dummy data
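The preprocessing code itself is only a screenshot, so here is a minimal pandas sketch of the same idea. It assumes the five slot columns are literally named Hero1 to Hero5 and uses made-up users and heroes. `pd.get_dummies` with `columns=` replaces each listed column with one 0/1 indicator column per distinct value, named `<column>_<value>`.

```python
import pandas as pd

# Toy input in the original User|Hero1|...|Hero5 layout (made-up values).
raw = pd.DataFrame(
    {
        "User": ["player_a", "player_b", "player_c"],
        "Hero1": ["Ana", "Genji", None],
        "Hero2": ["Reinhardt", "Ana", "Mercy"],
        "Hero3": ["Mercy", "Tracer", "Ana"],
        "Hero4": ["Genji", "Mercy", "Genji"],
        "Hero5": ["Tracer", "Reinhardt", "Tracer"],
    }
)

# Drop users with a missing slot (player_c here), then one-hot encode the
# five slot columns. Each new column is named "<slot>_<hero>", e.g. "Hero1_Ana".
clean = raw.dropna()
slot_cols = ["Hero1", "Hero2", "Hero3", "Hero4", "Hero5"]
dummies = pd.get_dummies(clean, columns=slot_cols, dtype=int)

print(dummies.columns.tolist())
```

Note that `get_dummies` only creates columns for heroes that actually occur in a given slot, so the column count depends on the data.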

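Getting dummies for this data pushed the column count over 150! Each of the five hero columns is expanded into one indicator column per hero, so 5 × 31 heroes gives up to 155 columns.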

A rating function turns the 1s in the dummy data into actual ratings.

2.2 A curation function applies the rating function across all the columns. It uses a for loop and conditionals to check which hero number (1–5) each column belongs to. These numbers represent the top five heroes a user plays from the pool of 31: a hero in position 1 gets a 5-star rating, and a hero in position 5 gets a 4.2-star rating.

The curation function applies the rating function.
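The rating and curation functions are also only screenshots, so here is a hedged sketch of the idea. The text gives only the endpoints (position 1 gets 5 stars, position 5 gets 4.2 stars); the even 0.2-star step between them is my assumption, as are the hypothetical names `slot_rating` and `curate` and the `Hero<n>_<hero>` column naming.

```python
import pandas as pd


def slot_rating(slot):
    """Hypothetical rating function: slot 1 -> 5.0 stars ... slot 5 -> 4.2 stars.

    Assumes an even 0.2-star step between the endpoints given in the post.
    """
    return round(5.0 - 0.2 * (slot - 1), 1)


def curate(dummies):
    """Replace the 1s in every "Hero<n>_<hero>" column with slot n's rating."""
    rated = dummies.copy()
    for col in rated.columns:
        prefix = col.split("_", 1)[0]  # e.g. "Hero3" from "Hero3_Mercy"
        for slot in range(1, 6):
            if prefix == f"Hero{slot}":
                # A 1 becomes the slot's rating; a 0 stays 0.
                rated[col] = rated[col] * slot_rating(slot)
    return rated


# Made-up dummy frame: one user with Ana in slot 1 and Mercy in slot 5.
example = pd.DataFrame(
    {"User": ["player_a"], "Hero1_Ana": [1], "Hero5_Mercy": [1], "Hero5_Ana": [0]}
)
print(curate(example))
```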

2.3 Collapsing the data back to just 3 columns:

This function uses two for loops and some indexing to create a new column `df[hero]` for each hero. Each new column sums, for every user's row, that hero's dummy columns across the 150+ columns (one per hero position). The sum equals the rating the user gave that hero, since a hero fills at most one of a user's five positions and all the other values are 0.

The long line of code is cut off here, but it continues the same way for hero indices 0 to 4.
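A sketch of this collapsing step, assuming dummy columns named `Hero<n>_<hero>` (pandas' default `<column>_<value>` naming from `get_dummies`). The helper `collapse`, the short hero list and the ratings are hypothetical. The original code indexes positions 0 to 4; this sketch uses the numbers 1 to 5 that appear in the assumed column names.

```python
import pandas as pd


def collapse(rated, heroes):
    """Sum each hero's Hero1_..Hero5_ columns into a single per-hero column.

    A hero fills at most one of a user's five positions, so the sum is that
    user's rating for the hero (or 0 if the hero is not in their top five).
    """
    out = rated[["User"]].copy()
    for hero in heroes:
        total = 0
        for slot in range(1, 6):
            col = f"Hero{slot}_{hero}"
            if col in rated.columns:  # get_dummies only creates columns that occur
                total = total + rated[col]
        out[hero] = total
    return out


# Made-up rated dummy columns for two users.
rated = pd.DataFrame(
    {
        "User": ["player_a", "player_b"],
        "Hero1_Ana": [5.0, 0.0],
        "Hero2_Ana": [0.0, 4.8],
        "Hero1_Genji": [0.0, 5.0],
    }
)
print(collapse(rated, ["Ana", "Genji", "Mercy"]))
```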

Finally, the function finishes with pandas’ .melt(), which unpivots a DataFrame from wide to long format.
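For readers new to `.melt()`, here is a tiny example on made-up data. Passing `id_vars=["User"]` keeps the user column fixed and turns every other column into rows.

```python
import pandas as pd

# Wide frame: one row per user, one column per hero (made-up ratings).
wide = pd.DataFrame(
    {"User": ["player_a", "player_b"], "Ana": [5.0, 4.8], "Genji": [0.0, 5.0]}
)

# melt keeps "User" as the identifier and stacks every hero column into two
# new columns: "variable" (the former column name) and "value" (the rating).
long = wide.melt(id_vars=["User"])
print(long)
```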

Then rename the generic columns that .melt() creates to something more suitable:

Final touches.
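The renaming can be done with `DataFrame.rename`. This sketch assumes melt's default `variable` and `value` column names and uses made-up rows.

```python
import pandas as pd

# Made-up output of .melt() with its default column names.
long = pd.DataFrame(
    {"User": ["player_a", "player_a"], "variable": ["Ana", "Genji"], "value": [5.0, 0.0]}
)

# Map melt's generic names onto the User | Hero | Rating layout.
final = long.rename(columns={"variable": "Hero", "value": "Rating"})
print(final.columns.tolist())  # ['User', 'Hero', 'Rating']
```

Alternatively, passing `var_name="Hero", value_name="Rating"` to `.melt()` names the columns directly and skips the separate rename.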

These final touches leave us with one row per user and hero pair (31 rows per user) and only 3 columns:
User | Hero | Rating

This is the shape the data needs for the next part of this series, where we will look at how to use Scikit-Surprise to make recommendation predictions.

Written by

Practicing Data Scientist. Interested in Games, Gamification, Ocean Sciences, Music, Biology.
