Instagram x #Developer: Automating Data Collection with Python


In April of this year, Instagram changed its API rate limits, and a plethora of third-party apps reliant on retrieving that data went out of business and shut down.

In English: a couple of months ago, Instagram decided that third-party websites that track user data on Instagram should be limited in how often they can actually retrieve that data.

Instagram suddenly chokes off developers as Facebook chases privacy

Sure, we all understood when Instagram started shutting down websites that did auto-liking and auto-commenting because that was technically against their terms and policies. Reducing API rates, however, affects laymen who are trying to track and analyze their data. A lot of the websites that did data tracking relied on being able to do individual data analysis of particular photos – since many influencers have hundreds of photos, a limit of 200 calls per hour hugely inhibits the ability to process through all those photos. My personal favorite (Websta) was hit by this, which unfortunately means I’ve lost data for the several months I was reliant on their service.

The DIY Approach to Collecting Instagram Statistics

When I first started tracking Instagram statistics I did it all manually (don’t ask me why; it just seemed easier). Then I got tired of going in every day and writing things down by hand, so I started searching for analytics websites that would do the tracking for me. Since those websites are going down, and I remember how awful it was to collect the data on my own, I figured it would be easier to create a Python script that runs once a day and collects all my statistics for me.

You can call the Instagram API without registering an application with them, but you will need to get your user token from a third-party application (you can generate it from a website like this).

Using a simple GET call, you can retrieve all your basic profile information:

https://api.instagram.com/v1/users/self/?access_token={access_token}

{
	"data": {
		"id": "1457775797",
		"username": "jonesdoeslife",
		"profile_picture": "https://scontent.cdninstagram.com/vp/ca54a753645d4a27193102aa188fc237/5BEABFAF/t51.2885-19/s150x150/20214491_493624530982026_8438940438273982464_a.jpg",
		"full_name": "Johna Rutz",
		"bio": "Custom Software Developer/Consultant at Credera, part-time petsitter, and coffee enthusiast; raised in Alaska, working in Texas. Questions? ",
		"website": "http://www.johnarutz.com/ask",
		"is_business": true,
		"counts": {
			"media": 389,
			"follows": 196,
			"followed_by": 22259
		}
	},
	"meta": {
		"code": 200
	}
}

The stats I care about tracking on a day-to-day basis are how many people follow me, how many people I’m following, and my media count (getting all of the ‘likes’ requires a lot more effort for not much more information, so I’m ignoring that for now).

The Script

Because I manage two accounts, I created methods that can be used interchangeably with different access tokens. Each time the code runs, it appends that day’s statistics to a CSV file, which can be viewed and manipulated in Excel when I want to sit down and look at the numbers (I don’t need nicely formatted graphs every day; I just want to collect the data).

Before running the script, I created a CSV file where I want to store all of my stats:

Date, Day, Followers, Following, Media,
jonesInstagramStats.csv
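If you’d rather not make that file by hand, here’s a minimal one-time setup sketch in Python (the filename is just an example; use your own path):

```python
# One-time setup: create the stats CSV with the header row the
# collection script expects. The trailing comma mirrors the rows
# the script appends. The filename here is just an example.
csv_path = "jonesInstagramStats.csv"

with open(csv_path, "w") as f:
    f.write("Date, Day, Followers, Following, Media,\n")
```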

Then I got to the actual code:

# -*- coding: utf-8 -*-
import json, time, urllib.request

# Declare Access Token Variable(s)
zoraAccess = "475223457.4137883.40ffb2cfasf89awerwaer8sdafdafwe39b1e4c5"
jonesAccess = "14123497.1677ed0.58awer8dcx8werj2m2b2i2n2o1k3n4k4b3k53f"

# Files to append statistics to
zoraFile = "/Users/jrutz/Desktop/stats/zoraInstagramStats.csv"
jonesFile = "/Users/jrutz/Desktop/stats/jonesInstagramStats.csv"

# Retrieve the profile JSON and build a CSV row of today's stats
def parseInstagramStats(accessToken):
    url = 'https://api.instagram.com/v1/users/self?access_token=' + accessToken
    contents = urllib.request.urlopen(url).read()
    userContent = json.loads(contents)
    followers = userContent['data']['counts']['followed_by']
    following = userContent['data']['counts']['follows']
    media = userContent['data']['counts']['media']
    day = time.strftime("%a")
    date = time.strftime("%m/%d/%Y")

    return date + ',' + day + ',' + str(followers) + ',' + str(following) + ',' + str(media) + ',\n'

# User Specific: append one account's stats row to its CSV file
def getInstagramStats(accessToken, filePath):
    stats = parseInstagramStats(accessToken)
    with open(filePath, "a") as f:
        f.write(stats)

getInstagramStats(jonesAccess, jonesFile)
getInstagramStats(zoraAccess, zoraFile)
collectInstagramStats.py

When I run python collectInstagramStats.py in my terminal, it goes out, grabs the information from Instagram, and updates my CSV file(s).

What it looks like in plain-text form:
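Since the original screenshot didn’t survive, here is an illustrative stand-in (the dates and the second row’s numbers are made up; the first row’s counts come from the profile JSON above):

```
Date, Day, Followers, Following, Media,
06/14/2018,Thu,22259,196,389,
06/15/2018,Fri,22267,196,390,
```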

 

What it looks like if you open it up in Excel:
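Once a few rows have accumulated, a tiny script can stand in for the Excel step. A sketch (not from the original post) that computes the day-over-day follower change from the CSV:

```python
import csv

# Sketch: read the stats CSV and compute the day-over-day change
# in followers. Assumes the header/row layout used by the
# collection script above (followers in the third column).
def follower_deltas(csv_path):
    with open(csv_path) as f:
        rows = [row for row in csv.reader(f) if row]
    # Skip the header row; follower count is the third column.
    counts = [int(row[2]) for row in rows if row[0] != "Date"]
    return [b - a for a, b in zip(counts, counts[1:])]
```

A steady run of positive deltas confirms growth; a run of negatives flags the weeks you stopped posting.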

All of this is great, but part of the convenience of using a third-party website was that you didn’t have to think about collecting all that information. Knowing me, I’m liable to skip a day or two even if it’s as easy as a one-line terminal command.

Scheduling the Script

I’ve never actually had a script I wanted to schedule before, so the world of cron jobs was new to me (but also surprisingly simple). A cron job is just a command that you tell cron, your system’s scheduler, to run at specified times (cron is also known as the ‘clock daemon’, which sounds way cooler, but I guess cron is more syllable-efficient).

I want to run python collectInstagramStats.py every day at a time when I’m pretty sure my computer will be open and on – the job won’t run if your computer is asleep or powered off. There are ways to handle missed runs, but I don’t feel that motivated today.

So I open up my terminal and enter

Jones-Mac:stats jrutz$ crontab -e

This opens up a vim editor where I can edit the cron jobs I have. The syntax of a cron job is as follows:

minute  hour  day-of-month  month  day-of-week  {script to be run}

I chose to run it at 9:01am because, with the exception of weekends, most days I will be awake, on my computer, and online around that time.
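For reference, the crontab entry for that schedule would look something like this (the interpreter and script paths are assumptions; adapt them to your own setup):

```
1 9 * * * /usr/bin/python /Users/jrutz/Desktop/stats/collectInstagramStats.py
```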

If you want to get really fancy with it, there are a half-dozen other ways to set schedules. For me, once a day works just fine. All you have to do is save the file and voilà, you’re done.
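A few of those fancier schedules, for the curious (standard cron syntax; these are illustrative, not from the original post):

```
0 * * * *     # every hour, on the hour
1 9 * * 1-5   # 9:01am on weekdays only
*/15 * * * *  # every 15 minutes
```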

(I was super paranoid about this because I’d never used it before, so I tested the cron job with times that were just a minute or two further along to double-check that it was working then went back and deleted the extra data)

TL;DR

Instagram recently(ish) changed their policies, making it harder for websites to offer the analytics they previously offered and still be profitable. I still want to collect my Instagram data, but I’m also pretty lazy and tired of relying on other people, so I decided to write a script that does it for me and runs once a day. What could be easier than that?



2 thoughts on “Instagram x #Developer: Automating Data Collection with Python”

  • I am enjoying the tech breakdown of how you use Instagram info. Hi from another tech person on Instagram, btw. 🙂

    Curious to know, if you wouldn’t mind me asking: how did the data that you collected help your Instagram growth to over 20K now? I am reading on using data effectively for work and am currently looking at the simplest use case, collecting data from my own Instagram profile, as a trial run before using it for work. Thanks for your time.

  • Glad you’re enjoying! My data held up to about the 25k mark (I predicted I’d hit it at the end of October 2018 and did, but then took a holiday hiatus so it slid back down to 24.8k)

    What collecting this data helped me do is show how my engagement translated to growth. For example, if I post every day I get stable growth; if I post just once a week I lose followers. If I post pictures with me in them I get more followers than pictures of just a laptop, which get more than a random photo, etc. In that way it guides my mental “rules” for my account because I don’t have to guess what’s driving the growth. I remember when I started, some blogs said to post 3x a day every day and that seemed crazy to me (and now I have the data to back me up).

    What I didn’t account for when I started doing predictions was that after two years I engage much less actively and regularly with the platform than when I started. This has meant that the whole of my past data doesn’t accurately reflect how my account grows today.

    Best of luck! Would love to hear how you progress.
