Instagram x #Developer: Automating Data Collection with Python

In April of this year, Instagram slashed its API rate limits, meaning that a plethora of third-party apps reliant on retrieving Instagram data went out of business and effectively shut down.

In English: A couple of months ago, Instagram decided that third-party websites tracking user data on Instagram should be limited in how often they can actually retrieve that data.

Instagram suddenly chokes off developers as Facebook chases privacy

Sure, we all understood when Instagram started shutting down websites that did auto-liking and auto-commenting, because that was technically against their terms and policies. Reducing API rates, however, affects laymen who are just trying to track and analyze their own data. A lot of the websites that did data tracking relied on being able to do individual analysis of particular photos – since many influencers have hundreds of photos, a limit of 200 calls per hour hugely inhibits the ability to process all of them. My personal favorite (Websta) was hit by this, which unfortunately means I’ve lost data for the several months I was reliant on their service.

The DIY Approach to Collecting Instagram Statistics

When I first started tracking Instagram statistics, I did it all manually (don’t ask me why, it just seemed easier). Then I got tired of going in every day and writing things down by hand, so I started searching for effective analytics websites that would do the tracking for me. Since those websites are going down, and I remember how awful it was to collect the data on my own, I figured it would be easier to create a Python script that runs once a day and collects all my statistics for me.

You can call the Instagram API without registering an application with them, but you will need to get your user token from a third-party application (there are websites that will generate one for you).

Using a simple GET call to https://api.instagram.com/v1/users/self/?access_token={access_token}, you can get all your basic profile information:

{
	"data": {
		"id": "1457775797",
		"username": "jonesdoeslife",
		"profile_picture": "",
		"full_name": "Johna Rutz",
		"bio": "Custom Software Developer/Consultant at Credera, part-time petsitter, and coffee enthusiast; raised in Alaska, working in Texas. Questions? ",
		"website": "",
		"is_business": true,
		"counts": {
			"media": 389,
			"follows": 196,
			"followed_by": 22259
		}
	},
	"meta": {
		"code": 200
	}
}

The stats I care about tracking on a day-to-day basis are how many people follow me, how many people I’m following, and my media count (getting all of the ‘likes’ requires a lot more effort for not a lot more information, so I’m ignoring that for now).
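Pulling those three counts out of the response above is just a couple of dictionary lookups. Here’s a minimal sketch against a trimmed, hard-coded copy of the sample response:

```python
import json

# A trimmed copy of the sample API response shown above
sample = json.loads("""
{
  "data": {
    "counts": {"media": 389, "follows": 196, "followed_by": 22259}
  },
  "meta": {"code": 200}
}
""")

counts = sample['data']['counts']
followers = counts['followed_by']  # 22259
following = counts['follows']      # 196
media = counts['media']            # 389
```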

The Script

Because I manage two accounts, I created methods that can be used interchangeably with different access tokens. Each time the code runs, it appends that day’s statistics to a CSV file, which can be viewed and manipulated in Excel when I want to sit down and look at the numbers (I don’t need to look at nicely formatted graphs every day, I just want to collect the data).

Before running the script, I created a CSV file, seeded with a header row, where I want to store all of my stats:

Date, Day, Followers, Following, Media,
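Seeding that file is a one-liner in the terminal (the filename here is just an example – use whatever path your script will append to):

```shell
# Create the CSV with only the header row (example filename)
echo "Date, Day, Followers, Following, Media," > zoraInstagramStats.csv
```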

Then I got to the actual code:

# -*- coding: utf-8 -*-
import urllib2, json, time

# Declare Access Token Variable(s)
zoraAccess = "475223457.4137883.40ffb2cfasf89awerwaer8sdafdafwe39b1e4c5"
jonesAccess = "14123497.1677ed0.58awer8dcx8werj2m2b2i2n2o1k3n4k4b3k53f"

# File to append statistics to
zoraFile = "/Users/jrutz/Desktop/stats/zoraInstagramStats.csv"
jonesFile = "/Users/jrutz/Desktop/stats/jonesInstagramStats.csv"

# Function to retrieve JSON and create a CSV row with the stats
def parseInstagramStats(accessToken):
    url = 'https://api.instagram.com/v1/users/self/?access_token=' + accessToken
    contents = urllib2.urlopen(url).read()
    userContent = json.loads(contents)
    followers = userContent['data']['counts']['followed_by']
    following = userContent['data']['counts']['follows']
    media = userContent['data']['counts']['media']
    day = time.strftime("%a")
    date = time.strftime("%m/%d/%Y")
    return date + ',' + day + ',' + str(followers) + ',' + str(following) + ',' + str(media) + ',\n'

# User Specific: append a stats row to that account's CSV file
def getInstagramStats(accessToken, filePath):
    stats = parseInstagramStats(accessToken)
    file = open(filePath, "a")
    file.write(stats)
    file.close()

getInstagramStats(jonesAccess, jonesFile)
getInstagramStats(zoraAccess, zoraFile)

When I run the script in my terminal, it goes out, grabs the information from Instagram, and updates my CSV file(s).

What it looks like in plain text form


What it looks like if you open it up in Excel

All of this is great, but part of the convenience of using a third-party website was that you didn’t have to think about collecting all that information. Knowing me, I’m liable to skip a day or two even when it’s as easy as a one-line terminal command.

Scheduling the Script

I’ve never actually had a script I wanted to schedule before, so the world of cron jobs was new to me (but also surprisingly simple). A cron job is just a command that your computer runs automatically on a specified schedule (the scheduler is also known as a ‘clock daemon’, which sounds way cooler, but I guess cron is more syllable-efficient).

I want to run the script every day at a time when I’m pretty sure my computer will be open/on – the job won’t run if your computer is asleep or powered off. There are ways to have missed jobs run after the fact, but I don’t feel that motivated today.

So I open up my terminal and enter:

Jones-Mac:stats jrutz$ crontab -e

This opens the crontab file in a vim editor, where I can then edit the cron jobs I have. The syntax of a cron job is as follows:

minute hour day-of-month month day-of-week {script to be run}

I chose to run it at 9:01am because, with the exception of weekends, most days I will be awake, on my computer, and connected to the internet around that time.
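For reference, a crontab entry for 9:01am every day looks something like this (the script path is an example – substitute wherever you saved yours):

```
1 9 * * * /usr/bin/python /Users/jrutz/Desktop/stats/instagramStats.py
```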

If you want to get really fancy with it, there are a half-dozen other ways to set schedules. For me, once a day will work just fine. All you have to do is save the file and voila! you’re done.

(I was super paranoid about this because I’d never used cron before, so I tested the job with times just a minute or two ahead to double-check that it was working, then went back and deleted the extra rows.)


Instagram recently(ish) changed their policies which makes it more difficult for websites to offer the analytics they were offering previously and still be profitable. I still want to collect my Instagram data but am also pretty lazy and am tired of relying on other people, so decided to write a script that will do it for me and run once a day. What could be easier than that?
