Strava ride data with Python & Pandas

3 minute read

After spending the past 2.5 months madly finishing up projects, cycling took a serious back seat. Which is OK, because as you will see, cycling took the driver’s seat for much of 2018.

Now that the projects are done (or at least at v1), I’ve been itching to look at my Strava data again, this time with Python. In fact, I’ve been itching to look at my ride data a helluva lot more than to go riding.

That’s a bit of a problem, fitness-wise, although it gave me even more incentive to connect to the Strava API.

Here are some simple overview plots using variations on Pandas’ df.plot(kind = 'barh') that show my annual progressions for some ride stats.

For the sake of clarity, you’ll find all the code to get, clean & plot Strava data after the plots.

How much time did I spend on the bike?

png

How many rides were done on the trainer?

png

This could explain why I found myself dreading the trainer by the end of the year.

How far did I go?

png

And how much climbing was involved?

png

Admittedly, these can hardly compete with the slick video Strava sends you at the end of the year. However, it has been satisfying being able to look at stats that Strava doesn’t talk about much, if at all, like rides done on the trainer, as well as being able to see my annual progression.

Code

Get the data

# import librairies

import urllib2
import json
import pandas as pd

import matplotlib as plt
%matplotlib inline

import seaborn as sns
sns.set()

# For the base URL we use for activities
token = 'c3f*********' # replace with your token
base_url = "https://www.strava.com/api/v3/activities?access_token=%s&per_page=200&page=" % token
page_num = 1

# To store the json response
my_acts = "my_activities.txt"

cols = ["start_date","type","name","id","average_cadence",
        "average_heartrate","average_speed", "distance",
          "moving_time", "location_city", "pr_count","suffer_score",
          "trainer", "total_elevation_gain", "workout_type","map"]

### Starter code adapted from http://pdwhomeautomation.blogspot.fr/2014/11/raspberry-pi-and-strava-api-1.html

def get_strava_data(token = token, filename = my_acts):
    #Open the my_file to use
    my_file = open(filename,'w')

    #Loop extracting data.  Remember it comes in pages
    end_found = False
    page_num = 1

    strava_df = pd.DataFrame(columns = cols)


    #Main loop - Getting all activities
    while (end_found == False):
        #Do a HTTP Get - First form the full URL
        act_url = base_url + str(page_num)
        strava_json = urllib2.urlopen(act_url).read()

        if strava_json != "[]":   # Checks for empty JSON response, i.e. end of activities
            #Now we process the JSON
            act_json = json.loads(strava_json)
            page_df = pd.DataFrame(act_json)

            # reduce number of fields in df
            # page_df = page_df[cols] TO FIX: key error with avg HR
            strava_df = strava_df.append(page_df)

            #Set up for next loop
            cum_sum = page_num * 200
            print("{} activities printed...".format(cum_sum))
            page_num += 1
            strava_json = []

        else:
            end_found = True
            print("*** No more activities. ***")

    return strava_df

# create data frame

strava_df = get_strava_data()

activities printed...
activities printed...
activities printed...
activities printed...
activities printed...
activities printed...
*** No more activities. ***

Data wrangling

## Helper functions

def s_to_hrs(secs):
    m, s = divmod(secs, 60)
    h, m = divmod(m, 60)
    #d, h = divmod(h, 24)
    return("{} hrs, {} m".format(h,m))

def clean_strava(df = strava_df, cols = cols):

    # limit columns
    mystrava = df.copy()[cols]
    mystrava['start_date'] = pd.to_datetime(mystrava['start_date'], yearfirst = True, infer_datetime_format = True)

    # extract month & year
    mystrava['year'] = pd.DatetimeIndex(mystrava['start_date']).year
    mystrava['month'] = pd.DatetimeIndex(mystrava['start_date']).month

    # convert moving time to integer
    convert_cols = ['moving_time', 'pr_count', 'suffer_score', 'trainer']
    mystrava[convert_cols] = mystrava[convert_cols].fillna(0).astype(int)

    # convert distance to KMs
    mystrava['distance'] = mystrava['distance'].divide(1000)

    return mystrava

# clean up & filter data
types = ['Ride']
q = "type in @types & year < 2019 & year > 2009"

mystrava = clean_strava().query(q)

# sum activities by year
mystrava_by_yr = mystrava.set_index(['id']).groupby(['year','type']).sum()

# create moving_hrs
mystrava_by_yr['moving_hrs'] = mystrava_by_yr['moving_time'].apply(lambda x: s_to_hrs(x))

## plot function
types_str = ", ".join(types)

def act_bar(df = mystrava_by_yr, metric = 'moving_time', unit = 'seconds', legend = False):
    ax = df.reset_index().query(q).set_index('year')[['type', metric]]\
    .pivot(columns = 'type', values = metric)\
    .plot(kind='barh', legend = legend,
          title = "{} {}".format(str.title(types_str), metric))
    ax.set_ylabel("")
    ax.set_xlabel(unit);

plots

Summary plots

# how much time did I spend on the bike?
act_bar()

png

# how much is that in hours & minutes?
mystrava_by_yr.query(q)['moving_hrs']

year  type
Ride    103 hrs, 50 m
Ride    146 hrs, 29 m
Ride     69 hrs, 55 m
Ride    167 hrs, 32 m
Ride    319 hrs, 35 m
Name: moving_hrs, dtype: object

# how many rides?
mystrava.query(q).groupby(['year']).count()['id'].sort_index(ascending = False)

year
  133
   80
   43
   72
   35
Name: id, dtype: int64

# how many of those were on the trainer?
mystrava.query(q).groupby(['year', 'trainer']).count()['id'].unstack()\
    .plot(kind = 'barh',
          title = "{} count".format(types_str));

png

# how far did I go?
act_bar(metric = "distance", unit = 'Kilometers')

png

# how high did I climb?
act_bar(metric = 'total_elevation_gain', unit = "Meters")

png

Share on

Twitter Facebook Google+ LinkedIn

Sarah Hosking