February 23, 2010

Local data vs Localized data: When are they the same?

[Image: compass rose. Photo courtesy of future15pic]

I've been thinking about geolocated status messages a bit recently and have just started to look into studies that have been conducted about such data.

This started after I saw some pretty cool visualizations of geocoded Facebook data. The visualizations showed a globe. Emanating from the globe were icons corresponding to events happening around the world (on Facebook), each associated with a latitude and longitude. It was amazing to see the activity visually. As you watch, you can see people waking up with the sun and making friends across the globe.

After a few minutes though I began to wonder if any of this mattered. It was certainly cool to see something like this, because basically it had never been seen before, but I began to wonder if the location from which the data was generated really mattered.

So I did a little follow-on experiment. I looked into the Twitter API and ran an experiment over a week. For one day I looked at all the geotagged tweets coming from the UCI campus (generously defined). Then over the course of 6 days I looked at all the tweets coming from the U.S. Capitol and about 4 blocks in all directions.
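
For readers who want to try something similar, here is a minimal sketch of the location filter such an experiment needs. It is not the code used for the experiment above, and it does not call the Twitter API: it assumes the tweets have already been collected into a list of dictionaries with hypothetical "lat" and "lon" keys. The sample tweets are made up, and the UCI coordinates and the 2 km radius are rough assumptions standing in for "generously defined".

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def tweets_near(tweets, center_lat, center_lon, radius_km):
    """Keep only tweets that carry coordinates within radius_km of the center."""
    nearby = []
    for tweet in tweets:
        lat, lon = tweet.get("lat"), tweet.get("lon")
        if lat is None or lon is None:
            continue  # not geotagged, so it cannot be placed on the map
        if haversine_km(lat, lon, center_lat, center_lon) <= radius_km:
            nearby.append(tweet)
    return nearby


# Approximate center of the UCI campus (an assumption, not a surveyed point).
UCI_LAT, UCI_LON = 33.6405, -117.8443

# Made-up sample records; the "lat"/"lon"/"text" keys are hypothetical.
sample_tweets = [
    {"text": "checked in at the student center", "lat": 33.6461, "lon": -117.8427},
    {"text": "walking past the Capitol", "lat": 38.8899, "lon": -77.0091},
    {"text": "no location attached"},
]

on_campus = tweets_near(sample_tweets, UCI_LAT, UCI_LON, radius_km=2.0)
print(len(on_campus), "of", len(sample_tweets), "sample tweets fall near campus")
```

A radius around a center point is only one way to define the area. A bounding box, or a polygon tracing the campus boundary, would work just as well and would change which tweets count as "from campus".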

I was very surprised to see that only 2 tweets emanated from UCI's campus and a similar number came from D.C. So about 2 tweets per day for a large geolocated area. The tweets were not very local either. In UCI's case, one was a tweet meme about Toyota's recent recall problems and the other was a Foursquare check-in. In D.C.'s case the results were similar.

This was a surprise to me because I was led to believe (by the media, I suppose) that there is a flood of geotagged tweets arising around us. If only you could tap into this stream you would be amazed at the richness of information around you. That turned out to just not be true at all.

Now, I've begun to think of geotagged data more like email. Why would you expect the location you sent an email from to be correlated with the content of the email? Certainly some of it is. And certainly some of the content is written because of the location. But it isn't as if the content is all about the location, or as if every email I send from my office is some description of my office.

I believe the same applies to Twitter, Facebook, etc. Generally speaking, there isn't any reason to think that the content of a geotagged status update is related to the place it was posted from, any more than you would expect the content of an email to be related to the location from which it was sent.

My attention to the issue has led me to read two papers about it recently. The first is "Where We Twitter" by two students in my department, Sam Kaufman and Judy Chen. It isn't a huge contribution, but it makes the point that there are some use cases for good local data, and it brings out the distinction I just described by naming two types of information: "localized data" and "local data". I was mostly complaining about localized data above, that is, data which has a place where it was generated but which is not necessarily about that place. This is in contrast to "local data", which is about a location but wasn't necessarily authored in that location. Geotagged tweets conflate the two types of information.

The second paper, "On the 'Localness' of User-Generated Content", was more rigorous and was published at CSCW 2010. In it, the authors, Brent Hecht and Darren Gergle, study how much local data is generated near the location it describes. The high-order takeaway was that Flickr's data showed much more alignment between the two: local data there tends to be localized at the same location. Wikipedia, in contrast, does not have nearly as much local data generated near the location. A simple but interesting explanation is that the easiest way to localize a Flickr photo is to take it with a camera that automatically tags it with a location. So entering local data in Flickr requires you to actually be at the location.

Posted by djp3 at February 23, 2010 10:02 AM

I think a third takeaway from our (super brief) paper is that, while not every email sent from your office is about your office, almost every email sent from your office is, in some sense, about you. We take that idea out onto a ledge by saying that, for instance, tweets sent from the café, or West Hollywood, or the gun range are about the group of people frequenting that place.

Maybe. We have some algos and visualizations in the works that should help us figure it out.

(commented from my iphone, Flame Broiler, irvine, ca, USA)

Posted by: Sam at February 23, 2010 1:18 PM