The Onion: “Upwards of 66% of our server time is spent on serving 404s from spiders crawling invalid urls and from urls that exist out in the wild from 6-10 years ago.” That’s pretty crazy.
Pretty cool stuff: dataset cleanup in Gridworks.
Mark Pilgrim just writes in HTML now: “I self-published “Dive Into Python” in HTML, PDF, Word, and plain text. For years, there they sat, a list of downloads in different formats. Then I looked at my logs and realized that very few people ever downloaded it at all, and those that did mostly downloaded the HTML version.”
I wonder what happened to the innovation in tagging. The stuff I did with Mefeedia was somewhat innovative, I think. Here are some screenshots (it’s no longer live):
We organized tags into facets (see above, this just took a few hours of organizing the top 1000 tags), and then built an inference engine (which was pretty easy to do):
It worked like this: if the tag “josh leo” (a person) is used together with the tag “new york” (a place), we can infer that Josh has been to New York.
Which goes to show that with very little metadata you can do a lot of cool stuff.
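The inference idea above can be sketched in a few lines. This is a minimal toy reconstruction, not the actual Mefeedia engine: the facet assignments and the `infer` function are made up for illustration, and the rule is just “a person tag co-occurring with a place tag implies a visit.”

```python
# A toy sketch of faceted-tag inference, with hypothetical data.
# Assumes tags have been hand-sorted into facets (person, place, genre, ...)
# and that inferences come from tags co-occurring on the same item.

from itertools import product

# Hypothetical facet assignments (in Mefeedia's case, the top 1000 tags
# were organized into facets by hand in a few hours).
FACETS = {
    "josh leo": "person",
    "new york": "place",
    "vlog": "genre",
}

def infer(item_tags):
    """Yield (person, 'has been to', place) facts from co-occurring tags."""
    people = [t for t in item_tags if FACETS.get(t) == "person"]
    places = [t for t in item_tags if FACETS.get(t) == "place"]
    for person, place in product(people, places):
        yield (person, "has been to", place)

# A video tagged "josh leo" and "new york" yields one inferred fact:
print(list(infer(["josh leo", "new york", "vlog"])))
# → [('josh leo', 'has been to', 'new york')]
```

The point stands even in toy form: once tags carry a facet, plain co-occurrence is enough to mint simple facts.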
Chilean Quake May Have Shortened Earth Days and ever-so-slightly shifted the earth’s axis :)