Tracking Epidemics on Twitter
I started a little experiment a while ago.
The idea is quite simple: use TwitterSearch to track keywords for specific diseases. Since I’m German, I would first like to apologize for any mistakes in grammar and spelling (in every post of this whole blog). Second, I only searched for German words.
There are a few German words for a cold or the flu, such as “Erkältung” (cold) and “Grippe” (flu). I also searched for the German word for fever (“Fieber”), which is a bit problematic: if somebody writes about “Saturday Night Fever”, it will also show up… The idea was to visualize the total count of those words per day on a timeline.
Since I’m not knee-deep in computer science and have no clue how to use Twitter’s API, I used very simple tools and very little time. First, I used OutTwit by TechHit to collect all the tweets into a search folder in my Outlook 2007. OutTwit is a great tool for many reasons; one of them is that it is not limited by any API restrictions. As long as your computer is running and connected to the internet, OutTwit collects the tweets you want it to collect (even thousands per hour). And of course I used Excel. Because you can simply copy all the tweets into Excel, it is very easy to handle even huge amounts of data. You can see the result in this little diagram:
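The counting itself was done with OutTwit and Excel. For anyone who would rather script it, here is a minimal Python sketch of the same per-day keyword count. It assumes the tweets have already been exported (for example from the Excel sheet) as a list of `(day, text)` pairs; the function name `count_per_day`, the `exclude` filter for false positives like “Saturday Night Fever”, and the sample tweets are all hypothetical and made up for illustration.

```python
from collections import Counter, defaultdict
from datetime import date

# The three search words from the experiment; matching is case-insensitive.
KEYWORDS = ("erkältung", "grippe", "fieber")


def count_per_day(tweets, keywords=KEYWORDS, exclude=()):
    """Count, per day, how many tweets mention each keyword.

    tweets:  iterable of (day, text) pairs (hypothetical export format)
    exclude: phrases that mark a tweet as a false positive (hypothetical filter)
    """
    counts = defaultdict(Counter)
    for day, text in tweets:
        lowered = text.lower()
        # Drop the whole tweet if it contains any exclusion phrase.
        if any(phrase in lowered for phrase in exclude):
            continue
        for kw in keywords:
            # Substring match, so compounds such as "Grippewelle" also count;
            # each tweet adds at most 1 per keyword.
            if kw in lowered:
                counts[day][kw] += 1
    return counts


# Made-up sample tweets
tweets = [
    (date(2009, 1, 5), "Ich habe Fieber und Grippe"),
    (date(2009, 1, 5), "Schon wieder eine Erkältung"),
    (date(2009, 1, 6), "Saturday Night Fever im Kino, Fieber pur"),
    (date(2009, 1, 6), "Die Grippe geht um"),
]
daily = count_per_day(tweets, exclude=("saturday night fever",))
for day in sorted(daily):
    print(day, dict(daily[day]))
```

Sorting the days gives exactly the per-day series that goes onto a timeline. Substring matching is a deliberate simplification: it catches German compound words, but it will also let through other false positives that a plain keyword search cannot tell apart.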
<edit> I guess I still have to learn how to embed pictures (and especially which photo service to use for it, since it didn’t work with Flickr). That’s why I posted the diagram below this post. </edit>
But the simple idea of counting something that already happened is not very exciting.
It would be far more exciting to predict what will happen in the future (or at least what is most likely to happen).
To make this more relevant, a few things would be necessary:
- Geo-data for every tweet (if the location is in the user’s profile, it may be easy to grab that data using the API, but I don’t know if that’s possible).
- A larger group of Twitter users; otherwise the picture the data paints will be too blurry and not precise enough.
Wouldn’t it be awesome to predict
- where a disease/epidemic is spreading,
- how fast it is spreading,
- in which direction it is moving?
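To make “how fast” and “in which direction” a bit more concrete, here is a deliberately naive Python sketch. It assumes daily keyword counts plus, for the direction part, tweets with latitude/longitude attached, which the data collected here did not have. “How fast” is taken as the day-over-day growth factor, and “which direction” as the compass bearing from one day’s average tweet position to the next. The function names and coordinates are hypothetical, and this is not an epidemiological model.

```python
import math


def growth_rates(daily_counts):
    """Day-over-day growth factor; None where the previous day had zero tweets."""
    return [cur / prev if prev else None
            for prev, cur in zip(daily_counts, daily_counts[1:])]


def centroid(points):
    """Average (lat, lon) of geotagged tweets; only sensible for a small region."""
    lats, lons = zip(*points)
    return sum(lats) / len(lats), sum(lons) / len(lons)


def bearing(start, end):
    """Initial compass bearing in degrees (0 = north, 90 = east) from start to end."""
    lat1, lon1 = map(math.radians, start)
    lat2, lon2 = map(math.radians, end)
    dlon = lon2 - lon1
    x = math.sin(dlon) * math.cos(lat2)
    y = math.cos(lat1) * math.sin(lat2) - math.sin(lat1) * math.cos(lat2) * math.cos(dlon)
    return (math.degrees(math.atan2(x, y)) + 360) % 360


# Made-up example data
counts = [10, 15, 30]
print(growth_rates(counts))  # [1.5, 2.0]

# Average position of (hypothetical) geotagged tweets on two consecutive days
monday = centroid([(48.1, 11.6), (48.3, 11.4)])
tuesday = centroid([(49.4, 11.1), (49.6, 11.0)])
print(round(bearing(monday, tuesday)))
```

A growth factor above 1 means the topic is gaining tweets. A steady bearing over several days would hint at a direction of movement, although with few users a single noisy day can swing the centroid a lot, which is exactly the blurriness problem mentioned above.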
I really wonder what the results would be if this were done professionally, using Twitter’s API and sophisticated statistics…
A short while after I started collecting tweets, I discovered that Google has a very interesting approach to tracking disease as well. They launched “Google Flu Trends”, which tracks how often people search Google for the flu and flu-related terms. Because the searches happen in real time, and Google’s immense processing power keeps the analysis very close to real time, their estimates are 10 to 14 days ahead of other predictions. For now, Google Flu Trends only works for the US, but I’m sure we will see a lot more useful tools like that on a global scale in the future.
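A claim like “ahead by 10 to 14 days” can be checked with a lagged correlation: slide one daily series against another and see at which shift they match best. The following Python sketch is a generic illustration of that idea, not a description of how Google Flu Trends works. `best_lead`, `pearson` and the example series are hypothetical; `early` could be daily tweet counts and `late` officially reported cases for the same days.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equally long sequences (0.0 if one is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def best_lead(early, late, max_lag):
    """Find the shift (in days) at which `early` correlates best with `late`.

    Both series must cover the same days; early[t] is compared with late[t + lag].
    """
    best_lag, best_r = 0, -2.0
    for lag in range(max_lag + 1):
        r = pearson(early[:len(early) - lag], late[lag:])
        if r > best_r:
            best_lag, best_r = lag, r
    return best_lag, best_r


# Made-up daily series: `late` is `early` delayed by three days
early = [0, 1, 3, 7, 12, 8, 4, 2, 1, 0, 0, 0, 0, 0]
late = [0, 0, 0] + early[:-3]
print(best_lead(early, late, max_lag=5))  # lag 3, correlation close to 1.0
```

With real data, `max_lag` has to be large enough to cover the expected lead (for a 10 to 14 day lead, something like 21 days), and the series need to be long enough that every shifted window still holds plenty of days. A high correlation at some lag is a hint, not proof, that one series really predicts the other.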