Skip to content

TomKDickinson Posts

Combining WordNet and ConceptNet in Neo4j

WordNet and ConceptNet are two popular databases of words and concepts that are used in a number of AI applications. This post will look at how we can combine the two of them into one searchable graph using Neo4j.

Before I start, some people may view this as a redundant task because ConceptNet already ingests WordNet, so why not just stick with loading ConceptNet into Neo4j?

While this is true, WordNet models its relationships at the synset level, while ConceptNet seeks to undo this (https://github.com/commonsense/conceptnet5/wiki/FAQ). This exercise is to combine both these types of abstractions into one large graph that we can use, allowing us to create graph queries against either network, or create queries that can combine both.

The rest of this post will give a quick overview of WordNet and ConceptNet, the graph model I’ve developer so far, a python script that will generate some import files for Neo4j, and finally some example cypher queries. For those just interested in the python script, you can find it at https://github.com/tomkdickinson/wordnet_conceptnet_neo4j.

Leave a Comment

Extracting Instagram Data – Part 1

For the next two posts, I’m going to introduce some of the techniques I’ve been using to mine content on Instagram, without using their API. This will include extracting a set of posts for a particular hashtag, extracting user information, and extracting a public users timeline. This first post will introduce some of the concepts of Instagram’s REST service it’s front-end JS client uses to load data from, and start by showing how we can use it to extract posts containing a hashtag.

If you’re just interested in the python code for this example, it can be found at https://github.com/tomkdickinson/Instagram-Search-API-Python.

23 Comments

Twitter Search Example in Python

This post is response to a few comments. Firstly, as some people had requested it, I’ve re-written the example from here in Python. I’ll be the first to admit, my Python skills aren’t fantastic, but I tested it against collecting 5k tweets and it appeared to work.

The library can be found at https://github.com/tomkdickinson/Twitter-Search-API-Python. Similar to the Java one, it defines an abstract class you can extend from and write your own custom implementation for what you do with the Tweets on each call to Twitter.

10 Comments