Twitter is certainly an interesting beast. It has been the poster child of success stories, saving lives, reporting tragedies, overturning governments and letting thousands of people know where you are going for lunch. There is no denying the impact it has on our scenius.
I wondered if there was a way to find out a bit more about how Twitter was being used within the local community, and if it was even possible to take a smaller stream from the river of data Twitter produces daily. For my local community, I chose Iceland.
I did so for several reasons: it is small, with only about 300,000 people in total; it has a distinct language, so it is easy to find people using only the text search; and it is where I live, so I can actually use any cool information I learn.
Any of the following steps can easily be reproduced for your local area, community or group. You’ll just need to find the key to unlocking like-minded people.
RSS not API
The Twitter API has plenty of great features, but to accomplish all of this I am using the Twitter search RSS results. There are plenty of libraries in your favourite language to parse RSS, so I won’t go into depth about those nuts and bolts. The key to getting quality data is to formulate very specific searches.
In Icelandic, there are several characters and words unique to the language. In your niche group there is jargon which can be used to identify the people in the know, a sort of 21st century Shibboleth.
To try to find Icelandic Twitter users, an easy first pass is to search for purely Icelandic words. Such a search returns an RSS feed of matching tweets, whose entries include elements like these:
<link type="text/html" href="http://twitter.com/username/statuses/12345" rel="alternate"/>
<link type="image/png" href="http://a1.twimg.com/profile_images/1234/profile5_normal.png" rel="image"/>
From this, we can then extract the display names, Twitter usernames and plenty of other information about the tweet.
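As a rough sketch of that extraction step, the snippet below pulls the username and status ID out of the "alternate" link of each entry. The surrounding feed, its Atom namespace and the title text are assumptions made up for illustration; only the two link elements come from the example above.

```python
import xml.etree.ElementTree as ET

# Assumption: the search feed uses the standard Atom namespace.
ATOM = "{http://www.w3.org/2005/Atom}"

# Hypothetical sample feed; only the two <link> elements are from the article.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <title>Takk fyrir</title>
    <link type="text/html" href="http://twitter.com/username/statuses/12345" rel="alternate"/>
    <link type="image/png" href="http://a1.twimg.com/profile_images/1234/profile5_normal.png" rel="image"/>
  </entry>
</feed>"""


def extract_tweets(feed_xml):
    """Return one dict per entry with the username, status ID and text."""
    root = ET.fromstring(feed_xml)
    tweets = []
    for entry in root.iter(ATOM + "entry"):
        for link in entry.findall(ATOM + "link"):
            if link.get("rel") != "alternate":
                continue
            # URL shape: http://twitter.com/<username>/statuses/<id>
            parts = link.get("href").rstrip("/").split("/")
            tweets.append({
                "username": parts[-3],
                "status_id": parts[-1],
                "text": entry.findtext(ATOM + "title", default=""),
            })
    return tweets


tweets = extract_tweets(SAMPLE_FEED)
```

Taking the username from the status URL rather than from a display field keeps the parsing independent of how names are formatted elsewhere in the entry.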
If you try searching for some jargon within your field, you can begin to collect like-minded people.
There is also a feature in the search which allows for language detection. Using the parameter &lang=is, it is possible to further filter the search. For English speakers, this might be more difficult than for other languages. The system isn’t perfect either, but it can be a way to limit search results for terms like “schmuck” in German to jewelry and “schmuck” in English to an insult.
Finally, you can restrict the results to a geographically close group of friends by using the near parameter &near=New+York&within=15&units=mi. You can also specify the distance and units (miles or kilometers). At the moment this uses the user’s Location field, but in the future it might also use their new geolocation API. People don’t always list their location, or they change it out of solidarity with a cause.
These systems aren’t perfect, but when put together they allow for an easy way to thin down the millions of Twitter users to a much smaller, more manageable size.
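To show how these filters combine, here is a minimal sketch that assembles a search URL from the parameters mentioned above (lang, near, within, units). The endpoint and the q parameter name are assumptions, not something stated in this article, so substitute whatever feed URL your own search produces.

```python
from urllib.parse import urlencode

# Assumption: placeholder endpoint; replace with the feed URL you actually use.
SEARCH_ENDPOINT = "http://search.twitter.com/search.atom"


def build_search_url(terms, lang=None, near=None, within=None, units="mi"):
    """Combine search terms with the optional language and location filters."""
    params = [("q", " ".join(terms))]  # "q" is an assumed parameter name
    if lang:
        params.append(("lang", lang))
    if near:
        params.append(("near", near))
        if within is not None:
            params.append(("within", str(within)))
            params.append(("units", units))
    # urlencode escapes spaces as "+", matching &near=New+York
    return SEARCH_ENDPOINT + "?" + urlencode(params)


german_url = build_search_url(["schmuck"], lang="de")
local_url = build_search_url(["EVE Online"], near="New York", within=15, units="mi")
```

Building the query from a list of pairs keeps the parameter order stable, which makes the resulting URLs easy to compare and cache.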
Create your own Groups
This is ideal for local affinity groups or video game clans. Want to see what all the “EVE Online” players who are in a specific company and live in “New York City” have to say? Well, we can search Twitter and start to collect matching usernames, index them and search for interesting conversations. Maybe you can make new friends locally beyond the game.
The downside of using search is that you will only find new people as they tweet. Search only goes back 10 days (maybe less as Twitter gets larger and larger). There will also be some false positives, but if you select some good, specific search terms, you minimize the work of pruning out mistakes.
I am searching Twitter once an hour for any new posts that meet the criteria. Once an hour is enough for me, but depending on the size and activity of your group, you can poll more or less frequently. As I loop through the RSS results and find usernames not already in the database, I add them, so I have a nicely growing list of people. I then subscribe to their RSS feeds and save all the tweets into another table for later analysis.
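The "add them if they are not already in the database" step could look something like the sketch below. The table layout, column names and function names are hypothetical, and SQLite is used here only because it ships with Python; the article does not say which database is behind the real system.

```python
import sqlite3


def open_db(path=":memory:"):
    """Open the database and make sure the (hypothetical) users table exists."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS users ("
        "username TEXT PRIMARY KEY, first_seen TEXT)"
    )
    return db


def record_usernames(db, usernames, seen_at):
    """Store usernames not yet in the table; return the newly added ones."""
    added = []
    for name in usernames:
        key = name.lower()  # Twitter usernames are case-insensitive
        row = db.execute(
            "SELECT 1 FROM users WHERE username = ?", (key,)
        ).fetchone()
        if row is None:
            db.execute(
                "INSERT INTO users (username, first_seen) VALUES (?, ?)",
                (key, seen_at),
            )
            added.append(key)
    db.commit()
    return added


db = open_db()
first_pass = record_usernames(db, ["Alice", "bob"], "2010-01-01T12:00")
second_pass = record_usernames(db, ["BOB", "carol"], "2010-01-01T13:00")
```

Returning only the new usernames gives the hourly job a ready-made list of feeds to subscribe to on each pass.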
Find and Subscribe
Over time, I have managed to find 3900+ Icelanders on Twitter. That’s over 1% of the population. Twitter does have a limit of 2000 people that you can follow before you need to maintain a ratio of following to followers. This prevents spammers from following thousands of people and wrecking the system.
Once you have the Twitter username, it isn’t necessary to follow them via Twitter’s functionality; it’s enough to subscribe directly to their RSS feed if they are public. Now, it seems ridiculous to subscribe to thousands of RSS feeds; that’s a bandwidth hog. So I have offloaded that work to Google Reader. Google has cached versions, so all I need to do is access a single feed in Google Reader which is a composite of all the Twitter RSS feeds I have found. This makes a single RSS fire hose of your group’s tweets.
Since I started mining my Twitter data, some interesting statistics have emerged. I looked at the last 24 hours to see how many unique usernames had posted. Roughly 10% of the Twitter population in Iceland is posting on a daily basis, but how active are they? Well, we can take the number of posts in the last 24 hours and divide it by the number of active users, which gives about 6.5 posts per active Twitter account per day.
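The arithmetic behind those two figures can be sketched as follows. The function name and the shape of the input (pairs of username and timestamp) are assumptions for illustration, and the sample data is invented rather than taken from the real archive.

```python
from datetime import datetime, timedelta


def daily_activity(tweets, known_users, now):
    """Summarise the last 24 hours of (username, posted_at) pairs."""
    cutoff = now - timedelta(hours=24)
    recent = [(user, ts) for user, ts in tweets if cutoff <= ts <= now]
    active = {user for user, _ in recent}
    return {
        "active_users": len(active),
        # share of all known accounts that posted in the window
        "active_share": len(active) / known_users if known_users else 0.0,
        # total posts divided by the number of accounts that posted
        "posts_per_active_user": len(recent) / len(active) if active else 0.0,
    }


# Invented sample: three recent tweets from two users, one tweet too old.
now = datetime(2010, 1, 10, 12, 0)
sample = [
    ("a", now - timedelta(hours=1)),
    ("a", now - timedelta(hours=2)),
    ("b", now - timedelta(hours=3)),
    ("c", now - timedelta(hours=30)),
]
stats = daily_activity(sample, known_users=10, now=now)
```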
I also looked at a longer time span of the last seven days. This allows me to get a breakdown by day to see the activity. As you can see, Sundays are pretty slow, then it picks up again on Monday. Presumably, as people return to work and are back in front of the computer, they get back to twittering. You can see the current stats at http://icelanders.optional.is/
These stats are valuable information if you are trying to spread your message via social networks. Do you send it out on a Sunday, when there is less traffic to compete with but fewer eyeballs to see it, or on Monday, when more people are online but there is a deluge of other tweets?
I also broke it down further into hour-by-hour slices. As you can see, the early morning hours from midnight until around 8am are pretty quiet. Still some activity, but not the time to expect someone to reply.
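Both the per-day and the hour-by-hour breakdowns amount to bucketing tweet timestamps. Here is a minimal sketch, assuming the timestamps have already been converted to local time; the function name and sample data are made up for illustration.

```python
from collections import Counter
from datetime import datetime

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]


def activity_breakdown(timestamps):
    """Count tweets per weekday name and per hour of the day (0-23)."""
    stamps = list(timestamps)  # allow generators; we iterate twice
    by_day = Counter(DAYS[ts.weekday()] for ts in stamps)
    by_hour = Counter(ts.hour for ts in stamps)
    return by_day, by_hour


# Invented sample: one Sunday tweet and two Monday tweets.
by_day, by_hour = activity_breakdown([
    datetime(2010, 1, 3, 9, 0),
    datetime(2010, 1, 4, 9, 30),
    datetime(2010, 1, 4, 14, 0),
])
```

A Counter returns zero for missing keys, so quiet hours simply show up as zero rather than needing special handling when the results are charted.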
Once you are archiving a subset of Twitter, it becomes possible to search this smaller list. Maybe the types of complaints within the group differ greatly from those of the general public. Are discussions staying within the niche group, or are links and retweets getting outside of the core group? You can even find out how many people within this group are following each other and determine the level of interconnectedness.
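One possible way to put a number on interconnectedness is the density of the follow graph: the share of all possible follower relationships inside the group that actually exist. This is just one measure chosen for illustration, not necessarily the one used here, and the follow pairs are assumed to have been collected separately.

```python
def group_density(members, follows):
    """Fraction of possible in-group follow links that exist.

    follows is an iterable of (follower, followed) username pairs.
    """
    members = set(members)
    internal = {
        (a, b) for a, b in follows
        if a in members and b in members and a != b
    }
    possible = len(members) * (len(members) - 1)  # directed pairs
    return len(internal) / possible if possible else 0.0


# Invented sample: three members, one follow pointing outside the group.
density = group_density(
    ["a", "b", "c"],
    [("a", "b"), ("b", "a"), ("a", "c"), ("a", "x")],
)
```

A value near 1 would mean almost everyone in the group follows everyone else, while a value near 0 suggests people who share a language or interest but not a network.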
Recently, I went through the database and, using the Twitter API, found out when each person signed up. I could then look to see if there was a growth spike or continual steady growth. It turns out there was a spike in early 2009.
Right around the time Twitter was being touted to the masses via Oprah, the major local newspaper joined and mentioned it a few times in print. These two, along with other factors, really boosted the sign-ups from a few tens of people to several hundred a month! Even now, there are a solid 100+ new Icelanders coming to Twitter each month to see what it is all about.
Building something similar for your group allows you to watch for spikes in discussion or sign-ups, look for trending topics within your community, engage in the discussion and find new friends.