Turning Data into Knowledge

Steve Madsen emailed me a few months ago on behalf of the NSW Computer Studies Teachers Association, asking if I’d like to run a workshop at the next CSTA quarterly meeting. He didn’t have any particular theme in mind at the time, and indicated that he was happy for me to pick the topic… anything that might be useful to teachers of computing… and he asked that I get back to him with my idea for a workshop. No problem, I said.

I thought about what might be useful to a group of computing teachers. They would be a tech savvy group, so what could I possibly share with them? As much as it might sound like a buzzword, it seems to me that there is still an awful lot about the whole Web 2.0 phenomenon that many teachers are still trying to get their heads around, so I thought something along those lines might be useful. I didn’t want it to be too predictable though, and simply talking about blogs and wikis seemed like just a little too… I don’t know… obvious? I started thinking about ways to explore the ideas behind Web 2.0 in a fundamental yet interesting way. Around the same time, I was struck by a couple of websites that do some very Web 2.0 sorts of things, and when looked at in context with each other it became clear that they were tapping into the same fundamental principles in some very interesting ways.

The three sites that grabbed my attention were www.ilike.com, www.43things.com, and del.icio.us. All of these sites shared the same underlying theme of tagging personal data which could then be viewed as a semantic snapshot of the collective consciousness. That seemed like a cool concept to me; this idea of thousands of people all voluntarily submitting many terabytes of content to the web – a massive collection of text, photos, audio and video. More importantly, they were also submitting their opinions and interpretations about that content, and doing it in a way where it could be collated and organised into a broader meaning. Thinking I was being clever, I decided to call the workshop “I Like 43 Delicious Things”.

I emailed Steve back with the idea and he responded by saying that the DET proxy filters might make it hard to do much with that, since they are locked down pretty tight. A little disappointed, I figured I’d mull it over a bit more and maybe some other idea would come to me. However, the next time I heard from Steve he sent me a copy of the agenda for the meeting and there was my original workshop suggestion, listed as a definite thing. Hmm, now I had to make my clever idea actually work.

I sent a couple of emails to clarify the filter situation and it seemed that I might be able to go ahead with the original idea after all, so I started to gather some resources for the workshop. I kinda sorta knew what I wanted to say, but it was all still a bit nebulous in my head. How could I tie it all together so that it made sense to people? (and me!)

It’s funny how things just fall into place sometimes… a few days before the workshop I was still trying to figure out how to make sense of my original idea, and I stumbled across three items that brought it all together for me… one I’d come across before but completely forgotten about, and the other two I’d never seen. When I put these three resources together with the three original websites, it formed a powerful summary of what I felt was going on behind the Web 2.0 phenomenon.

Del.icio.us’s use of tagging to create semantic taxonomies of knowledge was pretty clear to me. The way the tag clouds formed around large collections of bookmarked resources provided a clear snapshot into their hidden meaning. The same concept seemed to apply to the lists of personal goals submitted by people on 43things.com: lots of people sharing ideas about life goals and forming patterns of collective thought by contributing those thoughts into one place. By tagging and adding metadata to their goals, they formed a “zeitgeist” picture of what the masses were thinking about. Finally, ilike.com tapped into the large store of metadata collected within thousands of iTunes music libraries and brought it all together online to form a community of music lovers who could share their tastes and draw suggestions from the crowd. Three very different sites that all used a common idea of data sharing, metadata tagging and community building.

The glue that held these ideas together was three more things… Firstly, a website which created dynamic tag clouds based on the past 200+ years of US presidential speeches. Chirag Mehta has cleverly been able to delve into the words of America’s past presidents, analyse the frequency and relative importance of their words, and create an interactive tag cloud concept which gives an amazing insight into the way the issues of their day could be seen as a summary of the culture at the time. It was a powerful example of the way existing data can be easily mined for greater meaning.
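
To make the tag cloud idea a little more concrete, here is a minimal sketch in Python of how word frequencies might be turned into font sizes. It is not how Chirag Mehta's site actually works; the stop-word list, the `tag_cloud` function and the size range are all assumptions made up for illustration.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real tag cloud would use a much longer one.
STOP_WORDS = {"the", "and", "of", "to", "a", "in", "for", "our", "we", "is", "that"}

def tag_cloud(text, top_n=5, min_size=10, max_size=40):
    """Return (word, font_size) pairs for the most frequent words in text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    top = counts.most_common(top_n)
    if not top:
        return []
    highest = top[0][1]
    lowest = top[-1][1]
    spread = max(highest - lowest, 1)  # avoid dividing by zero when all counts match
    # Scale each count linearly between min_size and max_size.
    return [
        (word, min_size + (count - lowest) * (max_size - min_size) // spread)
        for word, count in top
    ]

sample = "Freedom and peace. Peace for our nation, freedom for our people, freedom for all."
cloud = tag_cloud(sample, top_n=3)
for word, size in cloud:
    print(f"{word}: {size}pt")
```

The more often a word appears, the bigger it is drawn. Run the same idea over a couple of centuries of speeches, one speech at a time, and the shifting sizes start to tell a story.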

The second resource was a video called The Machine is Us/ing Us. Although this video has shown up on many education blogs in the last few months, it really explains well why the web is the way it is right now, and how user-contributed data, tagging, XML and CSS are increasingly responsible for the new web landscape.

The final resource was a video from the TED Talks series called “The Web’s Secret Stories” by Jonathan Harris. In this video, Harris shows a piece of research work (it was more like conceptual art to me) called We Feel Fine. You need to see this incredible piece of work for yourself, but I felt it perfectly tied the loose threads together… it was the closest thing I’ve seen to an IT-based system that constantly analyses the random thoughts of the blogosphere’s collective consciousness in near real-time and massages them into a form that is not only informative and interesting, but utterly compelling. You simply must watch the video, then go have a play with the website. It is amazing.

I think most people got something out of the workshop, at least I hope they did. More to the point, I know that I learnt an enormous amount by preparing to share this information with my colleagues. I felt I came away from it with a much deeper insight into the nature of the new web, and in the process got to grips with tools that I had often used but never truly understood. It’s so true that if you want to really understand something, try teaching it to someone else.


Equity, Dignity, Respect.

I once worked with a very nice Vice Principal. He was a charming fellow and I enjoyed working with him. In his role as VP however, he was required to be pretty strict with the kids… and he did a great job of it. His role was to uphold the rules and policies of the school and he did it with a certain authoritarian gruffness and bulldog-like tenacity. He seemed to work on the idea that if you repeated the rules often enough then the kids would eventually do the right thing (or at least have no excuse for not knowing what the right thing was!)

Every morning, he would get on the school’s PA system and reiterate the rules to the kids. And he would always, always, always finish his PA address with the phrase “Have a great day and remember to treat everyone with equity, dignity and respect”. It was something of a catchphrase for him.

The thing about this approach to repeating the rules so often is that the kids start to just tune out. I asked them one morning in homeroom whether they actually listened to what the VP was saying and they said they didn’t, they just sort of tuned out and didn’t really listen at all. We discussed this for a couple of minutes and I jokingly said that perhaps if he did it as a rap instead they might take more notice. Well, I should know better than to joke about things like that… the next morning I decided to hold my Mac up in front of the PA speaker and record the announcement, which just happened to be a real beauty outlining the sort of clothing the kids were allowed to wear on their civvies (mufti) day the next day.

I dragged the audio file into GarageBand and had a play with it for a few hours. The resulting tune became somewhat of a classic around the school, with many teachers and kids asking for a copy of it. I never did release the actual digital file of it though, because I was a bit concerned with it getting out “in the wild”, so to speak. However, since that was over a year ago and in a completely different country, I figure I may as well put it out there now.

So, for your listening and dancing pleasure, here is Equity, Dignity, Respect.

Equity, Dignity, Respect.

Just click the Audio MP3 button above to listen, or grab your own copy from my Box.net widget. Enjoy.

Killing Spam

Spam is an absolute scourge. I don’t understand why people do it, but then I’ve never really understood why people spray graffiti on walls, or write viruses either. I guess some people just get a kick out of being a bloody nuisance.

Of course, spam is a little different in that there is money involved. Big money apparently. If you send enough emails out about methods to enlarge your p3nis or buy p0rn and viag4r4 or whatever else spam tends to focus on, there are apparently enough stupid and gullible people in the world that someone, somewhere can make a comfortable living off their stupidity. It still amazes me that people respond to these messages in any way whatsoever, but apparently they do. The best way to deal with spam is to completely ignore it – don’t read it, don’t respond to it, don’t acknowledge it… just totally ignore it.

For the first time ever I feel like I’m winning the battle against spam, so I thought I’d share how I’ve managed to arrive at this point.

Firstly, my Australian ISP (Optusnet) provides spam filters at their mail servers, and I have these enabled. I get a monthly report emailed to me from my ISP, and I’ve always been surprised at just how much mail I get filtered. For the past year or so since I turned the filters on, approximately half of the roughly 3000 messages I get each month have been identified by Optusnet as spam. The percentage floats around the 50% mark, although I’ve seen it as high as 57% spam. Interestingly the most recent report said that only 37% of my mail was spam so perhaps things are improving, or maybe spammers take a break over Christmas?

So, what of the remaining 1500 or so messages that arrive at my mailbox? The Optusnet filters are a good start, but they are certainly not foolproof. I would estimate that about 60% of the remaining messages I receive are still spam. I tried creating some basic filtering rules within Entourage to catch the worst stuff, and it certainly helps, but things still get through.
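
Putting those rough figures together gives a feel for the overall picture. The quick Python sketch below just works through the arithmetic using the round numbers above (3000 messages a month, about 50% caught by the ISP, about 60% of the rest still spam). The variable names are my own, and it assumes the ISP filter never catches a legitimate message.

```python
total_messages = 3000      # rough monthly volume
isp_catch_rate = 0.50      # share of all mail the ISP flags as spam
residual_spam_rate = 0.60  # estimated share of delivered mail that is still spam

caught_by_isp = round(total_messages * isp_catch_rate)
delivered = total_messages - caught_by_isp
spam_delivered = round(delivered * residual_spam_rate)
legitimate = delivered - spam_delivered
overall_spam_share = (caught_by_isp + spam_delivered) / total_messages

print(f"Caught by ISP:       {caught_by_isp}")
print(f"Delivered:           {delivered}")
print(f"Spam still arriving: {spam_delivered}")
print(f"Legitimate mail:     {legitimate}")
print(f"Overall spam share:  {overall_spam_share:.0%}")
```

On those estimates, roughly 900 junk messages a month still reach the mailbox, and only about one message in five is something I actually want.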

I get a lot of mail from email lists and these are fairly safe messages so I filter these immediately into folders for later reading.

The remaining mail has filters applied that do things like identifying any messages that were sent to the Optusnet domain but do not start with my username. This kills off most of the mass mailout stuff. I have a few other tricky filters that try to avoid the most obvious spammy stuff, but I was still getting more junk coming into my Inbox than I really wanted.
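
For anyone who prefers to see a rule like that spelled out, here is one way it might look in Python. This is only a sketch of my reading of the rule, not the actual Entourage filter: the username is made up, the domain is assumed, and `looks_like_mass_mailout` is a hypothetical helper name.

```python
from email.utils import getaddresses

MY_USERNAME = "myname"          # hypothetical username, for illustration only
ISP_DOMAIN = "optusnet.com.au"  # assumed ISP mail domain

def looks_like_mass_mailout(to_header):
    """Return True if the To: header names addresses at the ISP domain
    but none of those addresses starts with my username."""
    addresses = [addr.lower() for _, addr in getaddresses([to_header])]
    at_isp = [a for a in addresses if a.endswith("@" + ISP_DOMAIN)]
    if not at_isp:
        return False  # not addressed to the ISP domain, so this rule does not apply
    return not any(a.startswith(MY_USERNAME) for a in at_isp)

print(looks_like_mass_mailout("Someone <aaron@optusnet.com.au>"))  # flagged
print(looks_like_mass_mailout("Me <myname@optusnet.com.au>"))      # not flagged
```

Spammers often blast a whole run of guessed usernames at a domain, so mail addressed to a neighbouring username is a fairly reliable giveaway.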

Then I discovered an amazing little tool called Spam Sieve. Spam Sieve is for the Mac OS X platform and uses a complex mix of safelists, whitelists, blacklists, Bayesian classification and intelligent heuristic scoring analysis to make some incredibly subtle and refined decisions about what comprises a spam message. It looks at word counts within the corpus of my messages and decides statistically what a spam message looks like.

The really neat thing about Spam Sieve is that it learns to make decisions based on MY actual mail flow and at the moment it’s running at 97.4% accuracy in identifying spam. ISP filters can only do so much because they are making blanket decisions about spam messages according to some fairly general rules that suit all users, but Spam Sieve is able to make constantly updated decisions about spam that is actually arriving in my mailbox, giving a far more nuanced view of what a spam message looks like.
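
To give a flavour of what Bayesian classification on word counts means, here is a toy naive Bayes filter in Python. It is emphatically not the tool's actual algorithm, which I have not seen; the `TinyBayesFilter` class, its method names and the training messages are all invented for illustration.

```python
import math
import re
from collections import Counter

class TinyBayesFilter:
    """A toy naive Bayes spam filter trained on one user's own mail."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    @staticmethod
    def tokenize(text):
        return re.findall(r"[a-z0-9$']+", text.lower())

    def train(self, text, label):
        """Learn from a message the user has labelled 'spam' or 'ham'."""
        self.word_counts[label].update(self.tokenize(text))
        self.message_counts[label] += 1

    def spam_probability(self, text):
        """Estimate the probability that text is spam."""
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_messages = sum(self.message_counts.values())
        log_score = {}
        for label in ("spam", "ham"):
            # Prior for the label, with add-one smoothing.
            log_p = math.log((self.message_counts[label] + 1) / (total_messages + 2))
            total_words = sum(self.word_counts[label].values())
            for word in self.tokenize(text):
                count = self.word_counts[label][word]
                # Add-one smoothing so unseen words do not zero out the score.
                log_p += math.log((count + 1) / (total_words + len(vocab) + 1))
            log_score[label] = log_p
        # Turn the two log scores into a probability, clamped to avoid overflow.
        diff = max(min(log_score["ham"] - log_score["spam"], 700), -700)
        return 1 / (1 + math.exp(diff))

mail_filter = TinyBayesFilter()
mail_filter.train("cheap pills buy now", "spam")
mail_filter.train("buy cheap watches now", "spam")
mail_filter.train("staff meeting agenda for monday", "ham")
mail_filter.train("workshop notes for the teachers meeting", "ham")

score_spam = mail_filter.spam_probability("buy cheap pills")
score_ham = mail_filter.spam_probability("agenda for the meeting")
print(f"{score_spam:.2f} {score_ham:.2f}")
```

Correcting a mistake is just another call to `train` with the right label, which is why a filter like this keeps getting better at recognising the particular junk that turns up in one person's mailbox.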

On the few occasions when it makes a wrong decision, a simple keystroke lets me teach Spam Sieve which messages were actually spam and the software learns from its mistake, relegating the messages to the spam folder where they belong. Just to make sure the creature is dead, I also set up a mail rule in Entourage that automatically empties the Junk Mail folder every 5 minutes. Begone foul spam!

It’s a $30 purchase but the best $30 I’ve ever spent. I deal with a lot of mail, and I haven’t seen a single spam message in weeks. The bottom line is that email has actually become pleasant to use again.

I’ll also say a nice word for Microsoft Entourage for the Mac which, apart from being a little slow under Rosetta, is probably the best mail client I’ve ever used. I can hardly wait for the Universal version!

There are probably similar solutions for Windows users. Maybe someone could leave a comment if you know of anything, or if you have any good spam coping strategies that you would like to share.