Skip to content

Category: Data Science

Combining WordNet and ConceptNet in Neo4j

WordNet and ConceptNet are two popular databases of words and concepts that are used in a number of AI applications. This post will look at how we can combine the two of them into one searchable graph using Neo4j.

Before I start, some people may view this as a redundant task because ConceptNet already ingests WordNet, so why not just stick with loading ConceptNet into Neo4j?

While this is true, WordNet models its relationships at the synset level, while ConceptNet seeks to undo this (https://github.com/commonsense/conceptnet5/wiki/FAQ). This exercise is to combine both these types of abstractions into one large graph that we can use, allowing us to create graph queries against either network, or create queries that can combine both.

The rest of this post will give a quick overview of WordNet and ConceptNet, the graph model I’ve developer so far, a python script that will generate some import files for Neo4j, and finally some example cypher queries. For those just interested in the python script, you can find it at https://github.com/tomkdickinson/wordnet_conceptnet_neo4j.

Leave a Comment

Twitter Search Example in Python

This post is response to a few comments. Firstly, as some people had requested it, I’ve re-written the example from here in Python. I’ll be the first to admit, my Python skills aren’t fantastic, but I tested it against collecting 5k tweets and it appeared to work.

The library can be found at https://github.com/tomkdickinson/Twitter-Search-API-Python. Similar to the Java one, it defines an abstract class you can extend from and write your own custom implementation for what you do with the Tweets on each call to Twitter.

10 Comments

Scraping Tweets Directly from Twitters Search Page – Part 2

UPDATE The code outlined here won’t work directly with Twitter now as they have updated their source code. Thankfully it only takes a few small changes to modify this script for it to work, which I’ve outlined here.

In the previous post we covered effectively the theory of how we can search and extract tweets from Twitter without having to use their API. This post now deals with an example implementation in Java, with an example git repository that you can improve on, or use yourself. You can find the git repository over at https://github.com/tomkdickinson/TwitterSearchAPI.

17 Comments

Scraping Tweets Directly from Twitters Search Page – Part 1

EDIT – Since I wrote this post, Twitter has updated how you get the next list of tweets for your result. Rather than using scroll_cursor, it uses max_position. I’ve written a bit more in detail here.

EDIT 2 – A useful update to the python version of this script, that allows larger datasets to be collected can be found here.

In fairly recent news, Twitter has started indexing it’s entire history of Tweets going all the way back to 2006. Hurrah for data scientists! However, even with this news (at time of writing), their search API is still restricted to the past seven days of Tweets. While I doubt this will be the case permanently, as a useful exercise this post presents how we can search for Tweets from Twitter without necessarily using their API. Besides the indexing, there is also the advantage that Twitter is a little more liberal with rate limits, and you don’t require any authentication keys.

The post will be split up into two parts, this first part looking at what we can extract from Twitter and how we might start to go about it, and the second a tutorial on how we can implement this in Java.

10 Comments