
Extracting a User’s Twitter Timeline Above the 3.2k Limit

Those familiar with the Twitter API know the annoyance of its cap of 3,200 tweets that can be extracted from a user’s timeline. For most of us who don’t tweet that often this isn’t an issue, but sometimes we need to extract more tweets than that.

Following on from my previous post (many months ago, I know, but I’ve been busy with PhD work), Twitter’s indexing upgrade actually gives us the opportunity to extract more tweets than this, provided the user’s Twitter feed is public.

Using the same methodology as in that post, we can search for all original tweets that a user has made with the query:

from:screenname

So for example, to extract all tweets I’ve ever created, you can use from:tomkdickinson.
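As a rough illustration of how that query string might be built and URL-encoded before being handed to a search scraper, here is a minimal sketch. The helper names `build_user_query` and `encode_query` are hypothetical, and the use of a `q` parameter is an assumption about how the search URL is formed; neither comes from the original script.

```python
from urllib.parse import urlencode


def build_user_query(screen_name):
    """Return the search query for all original tweets by screen_name."""
    # Accept "@tomkdickinson" as well as "tomkdickinson".
    name = screen_name.strip().lstrip("@")
    if not name:
        raise ValueError("screen_name must not be empty")
    return "from:" + name


def encode_query(query):
    """URL-encode the query as a `q` parameter (the parameter name is assumed)."""
    return urlencode({"q": query})


print(build_user_query("@tomkdickinson"))  # from:tomkdickinson
print(encode_query("from:tomkdickinson"))  # q=from%3Atomkdickinson
```

The colon in the query is percent-encoded as `%3A`, which is worth knowing when comparing a logged URL against the query you typed.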

However, the major restriction with this method is that it does not include retweets. Even adding the “include:retweets” parameter does not seem to change this.

6 Comments

  1. Hi Tom,

    Thanks for sharing. I am amazed by the new method you provided in your posts on scraping Twitter. However, when I tested your Python script on some users (like @PurdueToday and @JustinBieber) whose total number of tweets is over 3.2k, the scraper usually just stops when around 3.2k tweets have been collected. Have you run into this issue?

    Thanks,
    Edward

    • I tested the search for “from:justinbieber” and extracted a total of 8898 tweets. This is quite a bit shy of the 29k tweets his stats mention; however, as I mentioned, this method does not collect retweets. The last tweet I collected was:

      8898 [2009-05-12 04:27:20] – Check out my single “ONE TIME” on my myspace and spread the word for me. Thanks http://www.myspace.com/justinbieber

      Which other services like “https://discover.twitter.com/first-tweet#justinbieber” seem to confirm is his first Tweet.

      If you’re not scraping more than 3.2k, check you haven’t added a limit to the example I added:

      twit = TwitterSearchImpl(0, 5, 5000)
      twit.search("from:justinbieber")

      Where the 5000 is a max limit to collect. I forgot about this myself when testing, and spent 5 minutes wondering why I could only collect 5k tweets (Doh!)

      If that’s not the issue, then you could try logging all your requests in the execute_search method. When the script exits, check the last URL it loaded and open it yourself to check there wasn’t a random issue where Twitter’s server returned a 404 or an empty message because it was under pressure from a lot of requests.

      Finally, this is completely speculative, but wasn’t there a news article the other day about Twitter blocking tweets based on geolocation? I have no idea or proof this is an issue with this, but if you’ve got a VPN service, you could always try running the extractor via a different country to see if you get more results.
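The request-logging suggestion in the reply above could look something like the following minimal sketch. The function name `fetch_with_logging`, the logger name, and the injectable `opener` argument are all assumptions made for illustration; the body of `execute_search` is not shown in this post, so this is a standalone helper one might call from it, not the script's actual code.

```python
import logging
import urllib.error
import urllib.request

logger = logging.getLogger("twitter_search")  # hypothetical logger name


def fetch_with_logging(url, opener=urllib.request.urlopen):
    """Fetch url, logging every request and its outcome.

    Returns the response body as bytes, or None if the request failed.
    `opener` is injectable so the helper can be exercised without network access.
    """
    logger.info("GET %s", url)
    try:
        with opener(url) as response:
            body = response.read()
            status = response.status
    except urllib.error.HTTPError as err:
        # 404s and other HTTP error codes end up here.
        logger.warning("HTTP %s for %s", err.code, url)
        return None
    except urllib.error.URLError as err:
        # DNS failures, refused connections, and similar.
        logger.warning("Request failed for %s: %s", url, err.reason)
        return None
    if not body:
        # A successful but empty response is the other case worth spotting.
        logger.warning("Empty response (status %s) for %s", status, url)
    else:
        logger.info("Status %s, %d bytes from %s", status, len(body), url)
    return body
```

With something like `logging.basicConfig(filename="requests.log", level=logging.INFO)` set up before the search starts, the last line of the log shows which URL was loaded just before the scraper stopped, and whether it came back as an error or as an empty body.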

  2. rtwrtw8

    Hey I was wondering where in the original code do you make the query change “from:screenname”? Sorry if this is an obvious question, and thanks!

  3. rtwrtw8

    Sorry I figured out where you modified the query but now it appears that it will only pull the first 33 tweets before stopping… I think you found this error in your python version – I was wondering, what was the solution?

  4. rtwrtw8

    Hey ignore my previous 2 comments, I figured it out; you are a god.

    Thanks!
