You are what you Tweet

Someone once said to me that if you do something once, it’s an accident. Do it twice and it’s a coincidence. Do it three or more times and that’s just the way you’re living. The underlying message is that if you repeat something enough, the patterns of use start to tell their own story, and looking at those patterns can often give insights into the activity that are not apparent from the individual instances.

This idea of allowing data to “rest where it lays” and deriving insights from it is essentially the idea behind tag clouds, whose patterns reflect repeated use of words, tags, keywords or ideas.  If you look at someone’s Delicious tag cloud and see the patterns emerging in the form of highlighted, emphasised words, then you see a clear indication of what interests that person.  The more they bookmark using tags, the more evident their interests.  The numbers don’t lie when there are enough of them.

If you aggregate enough tag clouds you start to get an insight into the “patterns of the patterns” – you see not just the interests of individuals emerging, but the interests of the group. This is the whole notion of a folksonomy, and it taps into the fascinating concept of the “wisdom of the crowds”. Data, especially when you have enough of it to form reliable patterns, starts to become very interesting.
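For anyone curious how little machinery this takes, here is a minimal sketch in Python. It is not how Delicious or any other real service builds its clouds; the function names, the font-size range and the sample bookmarks are all made up for illustration. It counts how often each tag is used, scales the counts into font sizes, and then adds several people's counts together to get the group picture.

```python
from collections import Counter


def tag_counts(bookmarks):
    """Count how often each tag appears across one person's bookmarks.

    `bookmarks` is a list of tag lists, one list per bookmark (hypothetical format).
    """
    return Counter(tag.lower() for tags in bookmarks for tag in tags)


def cloud_sizes(counts, min_size=10, max_size=32):
    """Scale tag counts linearly into font sizes between min_size and max_size."""
    if not counts:
        return {}
    lowest, highest = min(counts.values()), max(counts.values())
    span = (highest - lowest) or 1  # avoid dividing by zero when all counts match
    return {
        tag: round(min_size + (n - lowest) * (max_size - min_size) / span)
        for tag, n in counts.items()
    }


def aggregate(clouds):
    """Add several people's tag counts together into one group-level count."""
    total = Counter()
    for cloud in clouds:
        total.update(cloud)
    return total


# Made-up sample data: two people and the tags on their bookmarks.
alice = tag_counts([["education", "web2.0"], ["education", "twitter"], ["education"]])
bob = tag_counts([["twitter", "music"], ["twitter"]])

group = aggregate([alice, bob])
print(cloud_sizes(alice))  # the individual's interests
print(cloud_sizes(group))  # the "patterns of the patterns"
```

With this sample, "education" dominates the first cloud, but in the combined cloud "twitter" is just as prominent. That is the group's interest showing through, even though it was only a minor tag for one of the two people.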

In the same spirit, I was a little intrigued by a Twitter app I saw today, called TweetPsych. TweetPsych looks at the contents of your last 1000 messages on Twitter, analyses the words you use and the way your sentences are constructed, and tries to draw conclusions about what you do, what interests you, and what sort of person you might be – psychologically speaking. I’ve no idea how accurate it might be, but it’s an interesting idea. I’ll be honest and admit to you that I have absolutely no idea what they really mean, but here are my results anyway…

Regardless of whether TweetPsych is accurate and up to scratch just yet, I think it signals an interesting development in what is sure to become a much bigger deal: the notion that some level of machine intelligence can be derived from an analysis of our online footprints. We are all leaving massive amounts of data behind us as we trawl around the Net, and somewhere in that trail of data there are machines piecing together an accurate picture of us… what we like, where we go on holidays, who we talk to, what our preferences are, and so on. It’s not a new idea – Google’s entire advertising strategy is based on the concept of knowing more and more about you – but seeing TweetPsych’s attempt at psychoanalysing me from these 140-character snippets of my thoughts just threw it into a new light.

Let’s just hope that this data can be put to use in positive, creative ways that help enhance our lives.

CC BY-SA 4.0 You are what you Tweet by Chris Betcher is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

9 Replies to “You are what you Tweet”

  1. I think this makes a valid point. Sometimes it seems as though we don’t really think about the trails we leave behind us, but everything we do is one more step leading back to us. I think it is great that there is technology to do that but like you said, let’s just hope it is used for a good cause because you never know when the wrong person might find one of your footprints.

  2. We are what we tweet and we is what we blog. I agree Chris. We are indeed leaving footprints. I wonder about sites like MySpace and Facebook. There is seemingly a lack of control and ownership there. Blogs such as your own enable you to exercise more control. With Facebook and MySpace you can be at the mercy of your ‘friends’.

    Yet, let me get to the point. I did the TweetPsych test and I am rather happy to say that I am like John Cleese.
    That has made my day.

  3. Hi John,
    I agree completely about the personal blog vs MySpace/Facebook thing. Interesting that my 13 year old daughter started a Blogger blog in preference to a Facebook account like all her friends. Her reasoning? She said she didn’t want a page that other people had more control of than her. She preferred a blog because she could decide what went on it. Clever girl that daughter of mine!
    John Cleese eh? Can you do the silly walk? 😉

  4. To tweet or not to tweet, that is the question. (What would ol’ Will of the Shake say if he were alive today?) Why use Twitter? I find it fascinating, primarily because I don’t feel the need to let everyone know what I’m doing or thinking. I have a Facebook page, a sort of hunters and collectors (not the band) anthology, and a MySpace page, that I use to promote my own music. Do they reveal anything about the real me? I would have to argue no……..
    Obviously an opinion / picture could be formed about me based on what those sites contain, but since I control that, it ensures that my real life remains just that.

    On a lighter note, of themselves, patterns can become interesting, but in the end what do they really reveal.
    If you were to look at my blogs, my tag clouds only reveal what I want to use for search bait and don’t indicate anything about my actual interests. I try not to repeat tags entered on posts, but damn, when I look at that tag cloud, there they are like ghosts that haunt, evidence of repeated usage. I sit back and review my own processes……..what kind of footprint am I leaving behind, will someone construct a fictional persona and ascribe it to me based on the bait I use? Do fish ascribe the same to fishermen?
    What you could argue is that you are looking at patterns of communication rather than any data that could facilitate an in depth analysis of the personality. If I were to use Twitter, could valid inferences be drawn from my tweeting? I would be so tempted to make it completely fictional, and perhaps there is an element of the fictional in the reality of the process, after all don’t you twitter only what you want other people to know? That leaves the remainder of what is unsaid as the hidden person. I can’t gauge anything about Chris from his twitter trail. I can only see as laid track.
    As the creator of TweetPsych emphasises in the disclaimer, it’s primarily for entertainment purposes. Which in itself begs the question: can we separate entertainment from reality? And do we believe our own fictions, even in the face of the primacy of the empirical world?
    Am I leading you up the garden path?

    1. Hi Gary,

      I think that you and I probably use Twitter quite differently. And despite your claim that you only show “bait” in your online profile, I’m pretty sure that, over time, I could build up a much better idea of who you are and what your interests are than you suspect… if I took the time to study your Twitter feed, your blog, your Delicious links, your Facebook page, etc… I think the picture it leaves behind of who you are might be more descriptive than you let on.

      Thanks for the comment.


  5. In the interests of fleshing out the ‘tag cloud’ issue, here is a little something to consider

    “We who make websites must strike a fine balance between guiding our users and allowing them to lead us. We listen but we also synthesize and invent. We conduct user research but we interpret the results. We ask what users want but we decide what they are really telling us — and we, not they, determine how best to fulfill the needs they didn’t necessarily realize they were articulating.

    Tag clouds remove the guidance and artistry from our side of the equation, offloading all the work to our users. What’s popular? What’s important? Users decide. This might be okay if the process did not create a false intellectual equivalence between high- and low-level topics, and if it did not skew toward popularity at the expense of findability.

    The idea behind tag clouds is that users know best. Their actions determine how other users navigate. Their choices leave a trail. Typically, though not always, the “important” topics get big while those considered less important (which in this case only means less popular) get small. Once they get small enough, they disappear.

    In Flickr and Technorati, users create their own tags (“design,” “cats,” “California”). When enough people have used the same tag, it begins to show up in the cloud. Once a lot of people have used it, it becomes a visually dominant element, encouraging others to click it — and subtly discouraging them from creating their own tags.

    As tag clouds come to replace expert taxonomies in common practice, carefully constructed hierarchies vanish. In their place is a flattened world where every idea, at any level, is a topic as worthy as any other. Eight Mile is a topic at the same level as Detroit, which is a topic at the same level as Cities, which is a topic at the same level as United States, and so on.

    Instead of a hierarchy based on user-centered classification systems, the tag cloud “hierarchy” is based on raw usage. If several citizens of Detroit view a collection of photos tagged Eight Mile, upload their own photos of that street, and tag their photos Eight Mile, then Eight Mile becomes an important — and visible — category. If no one visits what would ordinarily be a “master” topic page such as Cities or United States, then those master categories shrink in size until they are invisible.

    The intellectual problem is that tag clouds create a data world where subtopics are detached from their parents; where the very notion of parent/child relations no longer exists. The counter-argument is, who cares? If everyone digs Eight Mile, let’s make Eight Mile easy to find. Instead of relying on humans to mine the data every three months and have long tedious arguments about how to update the navigation, let’s allow software to do it in real time, based on actual user behavior. Let the process create the music. There is merit to this view, especially on the community sites from which it sprang. (There is no merit to it on single-author sites, where one person creates all the content and all the tags. If you don’t have a clear purpose for your site, who does?)

    The less brainy and more pressing problem is that with tag clouds, topics either gain immediate, widespread traction with the public, or they disappear from the cloud. Once they disappear, it is as if they no longer exist. Few users will ever find them. Network effects being exponential, what is immediately mildly popular quickly becomes artificially very popular, while what has yet to become popular never will be.

    In an ordinary IA structure, if a photo site contains pictures of Istanbul’s Taksim area, a user can find those pictures by clicking through a taxonomy based on the way folks look for such stuff (Turkey: Istanbul: Taksim). What are the odds of finding Taksim in a tag cloud? Unless the site is devoted to Istanbul nightlife, it’s unlikely that any user will ever find those photos, because they will not be popular enough to show up in the cloud. If the site’s goal is to let only the most popular stuff float to the top, then tag clouds work like James Brown. But if its goal is to offer a better way of letting users find any content they desire, then tag clouds are as wrong as the Patriot Act.

    The same problem plagues any web content mining service powered by popularity. Popularity sometimes promotes quality but it is often a finder of mindlessness: extreme leftist or rightist rants, passed-out co-ed photos, embarrassing videos of people who can’t dance trying to dance and people who can’t sing trying to sing.

    Every blogger knows of a half dozen services like Blogdex or Daypop that list “hot” posts in the selective ring of small publications some of us inaccurately choose to call “the blogosphere.” A post becomes hot when two people with somewhat visible blogs link to it. Once it appears in Blogdex or the Daypop Top 40, a hundred more bloggers will link to it, either because it interests them or just to signify their membership in the tribe.

    Thanks to the exponential nature of such linkage, our lucky post soon has 500 links. Some people link to it without even reading or looking at it, simply because a trustworthy blogger like Kottke linked to it first. Less fortunate articles and discussions wither and die, unnoticed.

    Tag clouds harness all that mindless accidental randomness and make it the driving engine for navigating deep, ever-expanding content troves. Older ways, based on library science, undoubtedly suffer from the disadvantage of not being new. But they help people find what they need. And that is what navigation should do.”

    1. Hi rewired,

      After having read Dave Weinberger’s “Everything is Miscellaneous” I have to say I completely disagree with your assertion that traditional hierarchical navigation can possibly be the right solution to coping with the amazing miscellany of our world. There are lots of obvious flaws with “library science” and almost no perfect examples of such systems doing a comprehensively good job of dealing with massively miscellaneous data. The Dewey Decimal System is a complete joke when you really look at how it works. Taxonomy systems like Carl Linnaeus’s do a great job of categorising the obvious, but a pretty lousy job of dealing with the obscure, and the more random or miscellaneous the information the harder it is for traditional categorisation systems to deal with it.

      The flaw in this argument is in thinking that tag clouds are primarily a navigation tool… they shouldn’t really be used as that. They DO give interesting insights into massive amounts of data, and they CAN reveal interesting patterns that may or may not have been otherwise obvious, but they aren’t really meant (in my opinion) to be a primary source of navigation. Tags ARE really good at adding hooks to data that might otherwise be hard for search engines to lock onto – pictures, links, videos, etc – and therefore they help make the unsearchable searchable. In your example of pictures of Taksim, tags would be an ideal way for search technologies to latch onto relatively obscure data.

      Sites that rise briefly to popularity after getting Dugg might have a short term win, but they quickly fall back off the radar. The real point is that without the ability for the public to draw such sites out of the crowd, they would get lost in the blur and never get any visibility. The tags in this case give visibility, no matter how briefly, to things that people DO find interesting. If Eight Mile is interesting to people, why shouldn’t it have its moment in the sun? It’s not like Detroit is not available – search will still find things that people really need – but the folksonomy will expose things that are not so obvious.

      Thought provoking comment, thanks for posting it.


      1. Hi Chris
        I have to apologize for not following net protocol and failing to correctly reference the quoted ‘part post’ from Jeffrey Zeldman. Had I done so the text would not have been attributed to me… hence, “I have to say I completely disagree with your assertion that traditional hierarchical navigation can possibly be the right solution to coping with the amazing miscellany of our world.” should be directed to Zeldman.
        Like you I do not necessarily agree with all of Zeldman’s assertions, I simply placed the referenced text as a post to stimulate conversation on this very interesting subject. I did place the text in quotation marks and provide the web link to Zeldman’s article, but perhaps I should have been clearer on this.
        You could follow this up by referencing the full post that was quoted in part……

        Remove Forebrain and Serve: Tag Clouds II

        Which is prefaced by

        In “Tag Clouds are the New Mullets” (Daily Report, 19 April 2005), I claimed that the weighted tag clouds meme popularized by Flickr and Technorati was about to cross a permanent shame threshold because of overuse. My comment suggested that the only sin of tag clouds was popularity. But the problems with tag clouds run deeper.

        All the best
