Harnessing Recursion to Retrieve all Reviews.

Aleksandar Gakovic
Oct 23, 2020

A Google My Business Example with Python

Photo by Laurie Byrne on Unsplash

Learning the Google Cloud API and Google My Business (GMB) framework was not straightforward. I had to go to many different sources and ask many questions along the way. Obtaining review data for a location through the GMB API drove me to frustration, so I laid the path as straight as I could for those who follow in an earlier article, “Using Google Cloud and My Business platforms to obtain reviews for a location” (see References).

That article gives you access to the reviews, provided you already know your ‘locationId’ and ‘accountId’. More on that in the article mentioned above. Now I want to show you how to obtain all pages of reviews from the API with a recursive function in Python.

Obtaining Tokens and Ids

Using the Basic Setup guide (see References) you should be able to access all the info you need for the following code to work and the ideas to make sense. The guide is an excellent document that has much to offer, including how to access the OAuth 2.0 Playground and make ‘test’ requests to the API. We will be using the OAuth 2.0 Playground to grab location and account identifiers and also temporary authorisation tokens that allow us access to the API for a time-limited period.

# Cell missing data intentionally
headers = {'Authorization': ''}
accountId = None
locationId = None

Obtain the values for ‘accountId’, ‘locationId’ and ‘Authorization’ before going forward.

The Location and Account Ids will be numbers over 15 digits in length, likely around 20 digits long.

The Authorization token will start with ‘Bearer’ followed by a long alphanumeric string. For example, it may look like this once inserted as a header value:

headers = {
    'Authorization':
    'Bearer 1hsdf0983223ladf.xcvlwqr0i3098dfglm34eoj234rkjdfsav-90fdgfgfjfdlgkja'}

Once again all this information can be obtained by following the Basic Setup guide.

Once you have obtained the token and the Ids, avoid making them public. The bearer token is temporary and will stop working after a limited period (around 3000 seconds), but the Ids persist.
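One way to keep the token and Ids out of a shared notebook is to read them from environment variables. This is only a sketch: the variable names GMB_TOKEN, GMB_ACCOUNT_ID and GMB_LOCATION_ID and the helper load_credentials() are my own invention for illustration, not something the API requires.

```python
import os


def load_credentials(env=os.environ):
    """Read the token and Ids from environment variables (hypothetical names)."""
    token = env.get('GMB_TOKEN', '')
    accountId = env.get('GMB_ACCOUNT_ID')
    locationId = env.get('GMB_LOCATION_ID')

    # Fail early with a clear message if anything has not been set.
    required = [('GMB_TOKEN', token),
                ('GMB_ACCOUNT_ID', accountId),
                ('GMB_LOCATION_ID', locationId)]
    missing = [name for name, value in required if not value]
    if missing:
        raise KeyError(f'missing environment variables: {missing}')

    headers = {'Authorization': f'Bearer {token}'}
    return headers, accountId, locationId
```

Calling headers, accountId, locationId = load_credentials() then gives the same three objects as the cell above without the values ever appearing in the notebook.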

Obtaining Reviews by Recursion

Now the good part. You have reviews, and perhaps they total more than 50. The default and maximum pageSize is 50, so a single request returns only the first page. We can therefore implement a recursive function to retrieve all the reviews spread across multiple pages. I will follow the learning convention “Do it for one, then do it for all”.

Retrieve the first page of reviews:

We will need the requests library.

pip install requests

import requests

# Cell missing data intentionally
headers = {'Authorization': ''}
accountId = None
locationId = None
URI = f'https://mybusiness.googleapis.com/v4/accounts/{accountId}/locations/{locationId}/reviews'

r = requests.get(URI, headers=headers)
print(f'status code: {r.status_code}')

result = r.json()

And access the result:

result

This is a standard GET request. We define a URL (Uniform Resource Locator) or, in this case, a URI (Uniform Resource Identifier) and send it to the API along with the arguments passed in headers. The response is saved in an object ‘r’, and its JSON body is then parsed and saved in an object ‘result’.
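Printing the status code is fine while exploring, but an expired token would otherwise only show up later as a confusing result. A minimal sketch of a stricter version follows. The helper name fetch_json() and its get argument are hypothetical (the argument exists only so the function can be tried with a stand-in instead of a live call); raise_for_status() and the timeout argument are part of the requests library.

```python
def fetch_json(uri, headers, get=None):
    """Send a GET request and return the parsed JSON body, or raise on an HTTP error."""
    if get is None:
        import requests  # imported lazily so a stand-in can be supplied for testing
        get = requests.get
    response = get(uri, headers=headers, timeout=30)
    response.raise_for_status()  # raises for 4xx/5xx, e.g. an expired token
    return response.json()
```

With real credentials in place, result = fetch_json(URI, headers) would stand in for the two lines that create ‘r’ and ‘result’ above.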

What does this result look like?

Following the daisy chain of documentation for this API we can find the exact method that returns this paginated list. Inside we can find some handy info like what the response will look like.

Response body for the review list request.
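As I read the reviews list documentation, the response body carries four top-level fields: ‘reviews’, ‘averageRating’, ‘totalReviewCount’ and ‘nextPageToken’. The sample below is invented to show that shape (every value is made up), and expected_pages() is a hypothetical helper that estimates how many pages there will be.

```python
import math

# Invented sample shaped like the documented response body; all values are made up.
sample_result = {
    'reviews': [
        {'reviewId': 'abc123', 'starRating': 'FIVE', 'comment': 'Great service'},
    ],
    'averageRating': 4.6,
    'totalReviewCount': 120,
    'nextPageToken': 'token-for-page-2',
}


def expected_pages(result, page_size=50):
    """Estimate how many pages the full review list spans."""
    return math.ceil(result.get('totalReviewCount', 0) / page_size)


print(expected_pages(sample_result))  # 120 reviews at 50 per page -> 3
```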

So how do we proceed to the next page?

Often APIs accept URIs/URLs that contain query parameters. These can specify page numbers, dates, types, or categories of results, all sorts of things. We will need to look at the Query Parameters in the documentation for this specific request, found once again in the reviews list method mentioned just above.

Query Parameters for locations reviews list

We can see that the response body returns a “nextPageToken” string, which can be passed as the ‘pageToken’ query parameter to fetch the next page.

Query parameters are appended to the end of the URI after a ‘?’, like so:

URI = f'https://mybusiness.googleapis.com/v4/accounts/{accountId}/locations/{locationId}/reviews?pageToken='
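Tokens can contain characters that need escaping inside a URL, so it may be safer to let the standard library build the query string. A small sketch, where build_reviews_uri() is a hypothetical helper of mine and pageSize and pageToken are the query parameters named in the documentation:

```python
from urllib.parse import urlencode

BASE = 'https://mybusiness.googleapis.com/v4'


def build_reviews_uri(accountId, locationId, pageToken=None, pageSize=None):
    """Build the reviews URI, appending only the query parameters that were given."""
    uri = f'{BASE}/accounts/{accountId}/locations/{locationId}/reviews'
    params = {}
    if pageSize is not None:
        params['pageSize'] = pageSize
    if pageToken:
        params['pageToken'] = pageToken
    # urlencode escapes any special characters in the values
    return f'{uri}?{urlencode(params)}' if params else uri
```

The requests library can do the same job if you pass a dictionary through the params argument of requests.get().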

Great, but we won’t do this manually for every single page of 50 results!

Retrieving all reviews with recursion

If you’ve not been exposed to recursion or don’t have a good handle on how it works you might want to read this article about it before going forward.

You will also need to install Pandas if you have not already. The function also uses the time library to moderate the rate at which requests hit the API.

import pandas as pd
import time
import requests

We will feed the function with the same information we had above:

# Cell missing data intentionally
headers = {'Authorization': ''}
accountId = None
locationId = None
URI = f'https://mybusiness.googleapis.com/v4/accounts/{accountId}/locations/{locationId}/reviews'

In the recursive function below I iterate through all pages and save the results of each page to a Pandas DataFrame object on the fly. Only the reviews are saved to the data frame. The data frame is returned when the last page of results has been appended.

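# Automate data extraction using a recursive function:
def reviews_to_df(headers, accountId, locationId, df, URI):
    r = requests.get(URI, headers=headers)
    print(f'status code: {r.status_code}')
    time.sleep(3)  # pause so successive requests do not hammer the API
    result = r.json()
    reviews = result.get('reviews', [])  # tolerate a response without a 'reviews' key
    # DataFrame.append was removed in pandas 2.0; pd.concat replaces it
    new_df = pd.concat([df, pd.DataFrame(reviews)], ignore_index=True)
    print(f'the shape of the df is now: {new_df.shape}')
    # Base case: a short page (or no token for a further page) means we are done
    if len(reviews) != 50 or 'nextPageToken' not in result:
        return new_df
    else:
        nextPage = result['nextPageToken']
        URI = f'https://mybusiness.googleapis.com/v4/accounts/{accountId}/locations/{locationId}/reviews?pageToken={nextPage}'
        return reviews_to_df(headers, accountId, locationId, new_df, URI)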

To run the code, provide an empty data frame and save the output like so:

empty_df = pd.DataFrame()
df = reviews_to_df(headers, accountId, locationId, empty_df, URI)

Explanation:

The reviews_to_df() function takes the parameters we talked about: headers, Ids, URI, and an empty data frame.

It makes a request like before, but this time it appends the results to the data frame and checks whether the page contains 50 reviews (the maximum page size). If it does, the function triggers its else statement and calls itself, this time with a few changes: the URI now contains the pageToken query parameter, whose value is the ‘nextPageToken’ field from the result just obtained. Once a page comes back with fewer than 50 reviews, the if statement is triggered and the data frame is returned.

This is how we can quickly obtain all reviews from a paginated API request response body.
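To watch the recursion work without touching the API, here is a stripped-down sketch that swaps the live request for an in-memory stand-in. Everything in it is hypothetical: collect_reviews(), the fetch_page callable and the fake_pages data are inventions for illustration. It also stops on a missing ‘nextPageToken’ rather than on the page length, which is a variation on the function above rather than the same check.

```python
def collect_reviews(fetch_page, page_token=None, collected=None):
    """Recursively gather reviews; fetch_page(token) returns one page as a dict."""
    collected = [] if collected is None else collected
    result = fetch_page(page_token)
    collected.extend(result.get('reviews', []))
    next_token = result.get('nextPageToken')
    if not next_token:
        return collected  # base case: no further page to fetch
    return collect_reviews(fetch_page, next_token, collected)  # recursive case


# Stand-in for the API: three pages, keyed by the token that requests them.
fake_pages = {
    None: {'reviews': [{'reviewId': 'r1'}, {'reviewId': 'r2'}], 'nextPageToken': 'p2'},
    'p2': {'reviews': [{'reviewId': 'r3'}, {'reviewId': 'r4'}], 'nextPageToken': 'p3'},
    'p3': {'reviews': [{'reviewId': 'r5'}]},
}

all_reviews = collect_reviews(lambda token: fake_pages[token])
print(len(all_reviews))  # 5
```

Each page adds one frame to the call stack. Python’s default recursion limit is 1000 frames, so at 50 reviews per page this style copes with tens of thousands of reviews before a plain loop becomes the safer choice.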

Conclusion

Taking the time to read about recursion has helped me a great deal and I was so glad to be able to break through the frustration of obtaining these reviews and accomplish a handy recursive function to boot! I hope this has helped streamline the process for you and if you’re still reading, well done for making it this far! Please leave a clap and a comment and thank you for your time!

References

  1. Using Google Cloud and My Business platforms to obtain reviews for a location (algakovic.medium.com)
  2. Basic Setup, Get Started Guide (Google My Business API)
  3. Requests library docs
  4. Reviews list method (Google My Business API docs)
  5. Recursion in Python (algakovic.medium.com)
