The Onion: “Upwards of 66% of our server time is spent on serving 404s from spiders crawling invalid urls and from urls that exist out in the wild from 6-10 years ago.” That’s pretty crazy.
Month: March 2010
Pretty cool stuff: cleaning up datasets in Gridworks.
Just write in HTML
Mark Pilgrim just writes in HTML now: “I self-published ‘Dive Into Python’ in HTML, PDF, Word, and plain text. For years, there they sat, a list of downloads in different formats. Then I looked at my logs and realized that very few people ever downloaded it at all, and those that did mostly downloaded the HTML version.”
Facets and tagging: whatever happened to innovation?
I wonder what happened to innovation in tagging. The stuff I did with Mefeedia was somewhat innovative, I think. Here are some screenshots (it’s no longer live):
We organized tags into facets (see above; this took just a few hours of organizing the top 1,000 tags), and then built an inference engine (which was pretty easy to do):
It worked like this: if the tag “josh leo” (a person) is used together with the tag “new york” (a place), we can infer that Josh has been to New York.
Which goes to show that with very little metadata you can do a lot of cool stuff.
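The rule above is simple enough to sketch in a few lines of Python. This is not the Mefeedia code, just a rough illustration of the idea: the `FACETS` and `RULES` tables and the `infer` function are all made up for this example, and the "vlog" tag and its facet are placeholders.

```python
from itertools import combinations

# Hypothetical facet table: tag -> facet. In the original this came from
# hand-organizing the most popular tags.
FACETS = {
    "josh leo": "person",
    "new york": "place",
    "vlog": "format",
}

# Hypothetical inference rules: a pair of facets -> a sentence template.
RULES = {
    ("person", "place"): "{0} has been to {1}",
}


def infer(tags):
    """Return statements inferred from tags used together on one item."""
    facts = []
    for a, b in combinations(tags, 2):
        # Try both orderings so the order of the tags does not matter.
        for x, y in ((a, b), (b, a)):
            template = RULES.get((FACETS.get(x), FACETS.get(y)))
            if template:
                facts.append(template.format(x, y))
    return facts


print(infer(["josh leo", "new york", "vlog"]))
```

Tags without a facet, and facet pairs without a rule, simply produce nothing, so adding one more line to `RULES` is all it takes to get a new kind of inference.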
“The real culprit, the real cause of their economic problems isn’t the Internet, it isn’t the wires that connect computers. It’s the under-$100 terabyte hard drive.”
Chilean Quake May Have Shortened Earth Days and ever-so-slightly shifted the earth’s axis :)