Extracting Instagram Data – Part 1

For the next two posts, I’m going to introduce some of the techniques I’ve been using to mine content on Instagram, without using their API. This will include extracting a set of posts for a particular hashtag, extracting user information, and extracting a public users timeline. This first post will introduce some of the concepts of Instagram’s REST service it’s front-end JS client uses to load data from, and start by showing how we can use it to extract posts containing a hashtag.

If you’re just interested in the python code for this example, it can be found at https://github.com/tomkdickinson/Instagram-Search-API-Python.
Continue reading

A Python Wrapper for Mining Frequent Graphs Using ParSeMiS

As part of my PhD, I’ve been doing a lot of frequent graph mining, and thought I’d share some of the scripts I’ve written to mine frequent graphs using an already existing library called ParSeMiS.

ParSeMiS is a Java library that implements the gSpan algorithm (as well as a few others) for detecting frequent subgraphs in graph datasets.

The library takes two main inputs:

  • An input file, that contains a list of graphs
  • A minimum support where s is either an integer > 0, or percentage, where a “subgraph” is considered frequent, if it occurs in more than s graphs

As it’s a Java library, I’ve written a small wrapper in Python to interface with it.

The wrapper uses NetworkX to manage graphs, and takes as input a list of NetworkX graphs. Currently it only deals with directed graphs, but when I get a chance, I’ll add in undirected graph support as well.

If you haven’t used NetworkX before, the choice is mainly to do with its read/write functionality. This allows you to create your graphs in a variety of different formats, and then have NetworkX load them. As the wrapper will return a list of NetworkX graphs, you can then write them to disk (or handle them yourself).

At some point in the future, I may update it to use graph-tools instead, as graph-tools has much better support for things like detecting subgraph isomorphism, and handling larger graphs in general.

The library can be found at: https://github.com/tomkdickinson/parsemis_wrapper

I’ve included a small example that shows how the wrapper works.