Scraping Tweets Directly from Twitter’s Search Page – Part 2

UPDATE: The code outlined here no longer works directly with Twitter, as they have updated their source code. Thankfully, it only takes a few small changes to get the script working again, which I’ve outlined here.

In the previous post we covered the theory of how to search for and extract tweets from Twitter without having to use their API. This post walks through an example implementation in Java, with an accompanying git repository that you can use yourself or improve on. You can find the git repository over at

Scraping Tweets Directly from Twitter’s Search Page – Part 1

EDIT – Since I wrote this post, Twitter has changed how you fetch the next list of tweets for your result: rather than using scroll_cursor, it now uses max_position. I’ve written about this in a bit more detail here.

EDIT 2 – A useful update to the Python version of this script, which allows larger datasets to be collected, can be found here.

In fairly recent news, Twitter has started indexing its entire history of Tweets, going all the way back to 2006. Hurrah for data scientists! However, even with this news (at time of writing), their search API is still restricted to the past seven days of Tweets. While I doubt this will be the case permanently, as a useful exercise this post presents how we can search for Tweets from Twitter without necessarily using their API. Besides access to the full index, this approach also has the advantage that Twitter is a little more liberal with rate limits, and you don’t require any authentication keys.

The post is split into two parts: this first part looks at what we can extract from Twitter and how we might start to go about it, and the second is a tutorial on how we can implement this in Java.

Hello Internet!

Hello there, welcome to this here blog! My name is Tom, and this blog is meant to be a nice way both to disseminate some of my research and to pass on any useful technical knowledge. While I doubt you’ll be very interested in me, I’ve added a bio on the About Me page, so feel free to read that if you want a more in-depth overview of who I am. In short, I’m a PhD student researching ways we can extract stories from people’s social media profiles, with previous experience as a software developer and an undergraduate degree in Computer Science.

Hopefully this blog will be both useful and interesting to people; it is aimed at fellow researchers and developers alike. Besides blogging about my research, I’ll also be providing some, hopefully useful, overviews of techniques I’ve picked up over the four and a half years since I graduated. These might vary from posts on data extraction to using common frameworks like Spring, or build tools such as Gradle. If I think something I’ve done is useful to the research and development community in general, I’ll probably write a blog post about it!