As part of my PhD, I’ve been doing a lot of frequent graph mining, and thought I’d share some of the scripts I’ve written to mine frequent graphs using an already existing library called ParSeMiS.
ParSeMiS is a Java library that implements the gSpan algorithm (as well as a few others) for detecting frequent subgraphs in graph datasets.
The library takes two main inputs:
- An input file, that contains a list of graphs
- A minimum support s where s is either an integer > 0, or percentage, where a “subgraph” is considered frequent, if it occurs in more than s graphs
As it’s a Java library, I’ve written a small wrapper in Python to interface with it.
The wrapper uses NetworkX to manage graphs, and takes as input a list of NetworkX graphs. Currently it only deals with directed graphs, but when I get a chance, I’ll add in undirected graph support as well.
If you haven’t used NetworkX before, the choice is mainly to do with its read/write functionality. This allows you to create your graphs in a variety of different formats, and then have NetworkX load them. As the wrapper will return a list of NetworkX graphs, you can then write them to disk (or handle them yourself).
At some point in the future, I may update it to use graph-tools instead, as graph-tools has much better support for things like detecting subgraph isomorphism, and handling larger graphs in general.
The library can be found at: https://github.com/tomkdickinson/parsemis_wrapper
I’ve included a small example that shows how the wrapper works.