Data visualization is cool. It’s also becoming ever more useful, as the vibrant online community of data visualizers (programmers, designers, artists, and statisticians, sometimes all in one person) grows and the tools to execute their visions improve.
Jeff Clark is part of this community. He, like many data visualization enthusiasts, fell into it after being inspired by pioneer Martin Wattenberg’s landmark treemap that visualized the stock market.
Clark’s latest work shows much promise. He’s built four engines that visualize that giant pile of data known as Twitter. All four basically search words used in tweets, then look for relationships to other words or to other Tweeters. They function in almost real time.
„Twitter is an obvious data source for lots of text information,“ says Clark. „It’s actually proven to be a great playground for testing out data visualization ideas.“ Clark readily admits not all the visualizations are the product of his design genius. It’s his programming skills that allow him to build engines that drive the visualizations. „I spend a fair amount of time looking at what’s out there. I’ll take what someone did visually and use a different data source. Twitter Spectrum was based on things people search for on Google. Chris Harrison did interesting work that looks really great and I thought, I can do something like that that’s based on live data. So I brought it to Twitter.“
His tools are definitely in the early stages, but even now it’s easy to imagine where they could be taken.
Take TwitterVenn. You enter three search terms and the app returns a Venn diagram showing how often each term is used and how often the terms overlap in a single tweet. As a bonus, it shows a small word map of the most common terms related to each search term; tweets per day for each term by itself and each combination of terms; and a recent tweet. I entered „apple, google, microsoft.“ Here’s what I got:
Right away I see Apple tweets are dominating, not surprisingly. But notice the high frequency of unexpected words like „win,“ „free,“ and „capacitive“ used with the term „apple.“ That suggests marketing (spam?) of Apple products via Twitter, i.e. „Win a free iPad…“.
I was shocked at the relative infrequency of „google“ tweets. In fact there were on average more tweets that included both „microsoft“ and „google“ than ones that just mentioned „google.“
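For the curious, the counting behind a diagram like this is easy to sketch. What follows is not Clark’s code, just a minimal illustration of the idea: sort each tweet into the Venn region that matches exactly which of the search terms it mentions. The function name `venn_counts` and the sample tweets are made up for the example, and the matching is a naive substring check.

```python
from itertools import combinations


def venn_counts(tweets, terms):
    """Count tweets by exactly which of the search terms they mention."""
    terms = [t.lower() for t in terms]
    # One counter per Venn region: every non-empty combination of terms.
    counts = {}
    for size in range(1, len(terms) + 1):
        for combo in combinations(terms, size):
            counts[frozenset(combo)] = 0
    for text in tweets:
        lowered = text.lower()
        # Naive substring match; a real tool would tokenize properly.
        present = frozenset(t for t in terms if t in lowered)
        if present:
            counts[present] += 1
    return counts


# Hypothetical sample tweets, invented for illustration only.
sample_tweets = [
    "Win a free iPad from Apple",
    "Google and Microsoft argue over search",
    "Apple event today",
    "Microsoft ships an update",
    "Nothing relevant here",
]
regions = venn_counts(sample_tweets, ["apple", "google", "microsoft"])
```

Each key in `regions` is one area of the diagram, so `regions[frozenset(["google", "microsoft"])]` holds the tweets that mention both of those terms but not „apple.“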
So then I went to Twitter Spectrum, a similar tool that compares two search terms and shows which words are most commonly associated with each term and which words are most commonly used in tweets with both terms. Here’s the „google, microsoft“ Twitter Spectrum:
I love that the word „ugh“ is dead center between Google and Microsoft. But the prominence of social media terms on the blue side versus search terms on the red side is fascinating. It looks like two armies marching at each other ready to fight different wars.
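How might a word end up „dead center“? I don’t know how Clark’s engine places words, but one plausible approach is to score each word by the share of its appearances that come alongside the second term: 0.0 means it only shows up with the first term, 1.0 only with the second, and 0.5 lands in the middle. The sketch below assumes that scoring; the function name `spectrum_positions` and the sample tweets are hypothetical.

```python
import re
from collections import Counter


def spectrum_positions(tweets, term_a, term_b):
    """Place each co-occurring word between term_a (0.0) and term_b (1.0)."""
    a, b = term_a.lower(), term_b.lower()
    with_a, with_b = Counter(), Counter()
    for text in tweets:
        # Count each word once per tweet, ignoring the search terms themselves.
        words = set(re.findall(r"[a-z']+", text.lower()))
        others = words - {a, b}
        if a in words:
            with_a.update(others)
        if b in words:
            with_b.update(others)
    positions = {}
    for word in set(with_a) | set(with_b):
        total = with_a[word] + with_b[word]
        positions[word] = with_b[word] / total
    return positions


# Hypothetical sample tweets, invented for illustration only.
sample_tweets = [
    "google search is fast",
    "microsoft joins social media",
    "ugh google",
    "ugh microsoft",
    "google and microsoft, ugh",
]
positions = spectrum_positions(sample_tweets, "google", "microsoft")
```

In this toy data „ugh“ appears equally often with each term, so it scores 0.5, while „search“ sits at the Google end and „social“ at the Microsoft end.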
Clark has also created TwitArcs. This one, I feel, is still a work in progress and Clark says „visually I like it but it might be the least useful so far.“ In this case, you type in a tweeter’s handle and it returns a stream of that person’s tweets with arcs that link common words between tweets (on the right) and common retweeters (on the left). Rolling your mouse over highlights the last tweet in the arc. Here’s a TwitArc of @timoreilly:
Finally, the Stream Graph. Enter a search term and Clark’s engine returns the frequency of the most common words found with your search term for the last 1,000 tweets. You see a literal flow of conversation. You can also highlight one term to see how its frequency changed over time and you’ll see the most recent tweets that include both your search term and that highlighted term.
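The data behind a graph like this can be sketched in a few lines. Again, this is my own rough illustration rather than Clark’s implementation: it assumes tweets arrive as (timestamp, text) pairs, slices them into fixed time buckets, and counts how often the most common companion words appear in each bucket. The names `stream_layers` and `bucket_seconds` are invented here, and there is no stop-word filtering.

```python
import re
from collections import Counter


def stream_layers(tweets, search_term, bucket_seconds=60, top_n=5):
    """Return {word: [count per time bucket]} for the top co-occurring words.

    tweets is a list of (unix_timestamp, text) pairs.
    """
    skip = set(search_term.lower().split())
    start = min(ts for ts, _ in tweets)
    end = max(ts for ts, _ in tweets)
    n_buckets = int((end - start) // bucket_seconds) + 1
    per_bucket = [Counter() for _ in range(n_buckets)]
    totals = Counter()
    for ts, text in tweets:
        # Count each word once per tweet, leaving out the search term itself.
        words = set(re.findall(r"[a-z']+", text.lower())) - skip
        index = int((ts - start) // bucket_seconds)
        per_bucket[index].update(words)
        totals.update(words)
    top_words = [word for word, _ in totals.most_common(top_n)]
    # Each list is one layer of the stream, ordered oldest bucket first.
    return {word: [bucket[word] for bucket in per_bucket] for word in top_words}


# Hypothetical sample data, invented for illustration only.
sample_tweets = [
    (0, "Tiger Woods breaks silence"),
    (30, "Tiger Woods silence ends"),
    (70, "Tiger Woods to speak"),
    (130, "Tiger Woods silence over"),
]
layers = stream_layers(sample_tweets, "Tiger Woods")
```

Highlighting one term in the graph then amounts to picking out a single layer, such as `layers["silence"]`, and watching how its counts rise and fall from bucket to bucket.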
Sometimes 1,000 tweets with your term may span weeks. For my search term, „Tiger Woods,“ which I entered yesterday afternoon right after news broke that he’d speak publicly, 1,000 tweets covered about 20 minutes. Here’s the „Tiger Woods“ stream graph with „silence“ highlighted:
It isn’t hard to imagine how this may be applicable to business. I can already see eager marketers watching the stream flow by as their commercial debuts during next year’s Super Bowl.
Clark, like many data visualizers, believes we’re on the front end of a revolution in information presentation. „There’s a lot of work done called scientific visualization or business intelligence graphics,“ he says. „And it’s pragmatic, trying to solve practical problems. It’s all standard, a bar chart or pie. But those standard ways are not adequate when you’re trying to mine a richer data space. The world is full of complex data and we’re just starting to get the tools to make sense of it. We’re looking for new ways of presenting data.“