Those familiar with the Twitter API know how annoying it is that the number of tweets you can extract is capped at 3,200. For most of us who don’t…
UPDATE: The code outlined here no longer works directly with Twitter, as Twitter has since updated its source code. Thankfully, it takes only a few small changes to get this script working again, which I’ve outlined here.
In the previous post we covered the theory of how we can search for and extract tweets from Twitter without using their API. This post deals with an example implementation in Java, along with a git repository that you can improve on or use yourself. You can find the repository at https://github.com/tomkdickinson/TwitterSearchAPI.
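As a rough illustration of searching without the API, here is a sketch that builds a web search URL for a query limited to a date range using Twitter's `since:` and `until:` search operators. The base URL `https://twitter.com/search` and the names `SearchUrlSketch` and `buildUrl` are assumptions for illustration. Twitter's URLs have changed since this was written, and this is not the code from the repository. It needs Java 10+ for `URLEncoder.encode(String, Charset)`.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.time.LocalDate;

/**
 * Minimal sketch: build a Twitter web search URL for a date-bounded query.
 * ASSUMPTION: BASE_URL is a placeholder for the web search page and may not
 * match what Twitter serves today; SearchUrlSketch/buildUrl are hypothetical names.
 */
public class SearchUrlSketch {
    static final String BASE_URL = "https://twitter.com/search";

    /** Combines search terms with since:/until: operators and URL-encodes the result. */
    static String buildUrl(String terms, LocalDate since, LocalDate until) {
        // LocalDate.toString() yields ISO dates (yyyy-MM-dd), the format the operators use.
        String query = terms + " since:" + since + " until:" + until;
        // URLEncoder turns spaces into '+' and ':' into %3A.
        return BASE_URL + "?q=" + URLEncoder.encode(query, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String url = buildUrl("data science", LocalDate.of(2006, 3, 21), LocalDate.of(2006, 12, 31));
        // prints https://twitter.com/search?q=data+science+since%3A2006-03-21+until%3A2006-12-31
        System.out.println(url);
    }
}
```

Putting the date operators inside the `q` parameter keeps the request in the same shape as a search typed into the site. That is what lets this approach sidestep the API's seven-day window.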
EDIT: Since I wrote this post, Twitter has changed how you request the next page of tweets for a search. Rather than scroll_cursor, it now uses max_position. I’ve written about this in more detail here.
EDIT 2: A useful update to the Python version of this script, which allows larger datasets to be collected, can be found here.
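The move to max_position is a form of cursor-based paging. Each response carries a position token, and you send that token back as max_position to get the next page. You keep going until no more tweets come back. The following sketch shows only that loop. `Page`, `PageFetcher`, `nextPosition` and the in-memory `demoFetcher` are hypothetical stand-ins for the real HTTP request and response parsing, and which response field supplies the next position is an assumption here. It needs Java 16+ for records.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch of cursor-based paging with a max_position-style token.
 * ASSUMPTION: Page, PageFetcher and demoFetcher are hypothetical stand-ins
 * for an HTTP request plus response parsing; they are not Twitter's API.
 */
public class MaxPositionPager {

    /** One page of results plus the token to send as max_position next time. */
    record Page(List<String> tweets, String nextPosition) {}

    /** Hypothetical fetcher: null maxPosition means "first page". */
    interface PageFetcher {
        Page fetch(String maxPosition);
    }

    /** Follows the cursor until a page comes back empty or the limit is reached. */
    static List<String> collect(PageFetcher fetcher, int limit) {
        List<String> all = new ArrayList<>();
        String maxPosition = null; // no cursor on the first request
        while (all.size() < limit) {
            Page page = fetcher.fetch(maxPosition);
            if (page.tweets().isEmpty()) {
                break; // no more results
            }
            for (String tweet : page.tweets()) {
                if (all.size() >= limit) {
                    break;
                }
                all.add(tweet);
            }
            maxPosition = page.nextPosition(); // feed the returned token into the next request
        }
        return all;
    }

    /** In-memory fake serving three pages, then an empty page. */
    static PageFetcher demoFetcher() {
        List<List<String>> pages = List.of(
                List.of("t1", "t2"), List.of("t3", "t4"), List.of("t5"));
        return maxPosition -> {
            int i = maxPosition == null ? 0 : Integer.parseInt(maxPosition);
            if (i >= pages.size()) {
                return new Page(List.of(), null);
            }
            return new Page(pages.get(i), String.valueOf(i + 1));
        };
    }

    public static void main(String[] args) {
        System.out.println(collect(demoFetcher(), 100)); // prints [t1, t2, t3, t4, t5]
    }
}
```

Keeping the fetcher behind an interface means the paging logic can be checked offline, and a real implementation only has to turn a max_position value into a request and parse the reply into a `Page`.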
In fairly recent news, Twitter has started indexing its entire history of Tweets, going all the way back to 2006. Hurrah for data scientists! However, even with this news (at the time of writing), their search API is still restricted to the past seven days of Tweets. While I doubt this will remain the case permanently, as a useful exercise this post presents how we can search for Tweets on Twitter without using their API. Besides access to the full index, this approach has two further advantages: Twitter is a little more liberal with rate limits, and you don’t need any authentication keys.
The post is split into two parts: this first part looks at what we can extract from Twitter and how we might start to go about it, and the second is a tutorial on implementing this in Java.