I realized today that the apps I use every day are all 1-page websites. Gmail. Bloglines. Google search. Actually, Google is the king of the 1-page websites – almost all their products consist of only 1 page.
Twitter is 1-page, because there is 1 page that you spend 90% of your time on. Flickr is multipage, although its main function (viewing photos) is 1-page. Digg is 1 page. Mmm…
You know this problem in IA when you design sites without real content, and before you know it there are loads of excerpts all over the page that don’t really mean anything, ending in 3 dots “…”? It leads to a homepage like this one for example, just lots of excerpted content that doesn’t really do much for anyone.
I got a word for that. Excerptitis. Maybe you have a better one?
April was not only the hottest and the driest April ever in Belgium, but most likely (if there’s no rain tomorrow) it will also be the first month ever without any rain at all.
So far so good, life in Belgium :) The weather has been incredible.
I always love to read scaling discussions, especially about popular web apps, and there are loads of them out there. Here’s my overview of the best. By the way, the best book on scaling apps I’ve ever read is Building Scalable Websites, by Cal Henderson (the Flickr guy).
It’s dog-eared on my desk, and taught me about sharding (which I used extensively for mefeedia). Sharding is when you cut a really big table into pieces, so you can put those on separate servers. It means you have to make changes to your code, and your database isn’t so database-y anymore, but it works. For example, online games use sharding to grow their virtual worlds, because there’s no way they could serve all that information from 1 db cluster.
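To make the idea concrete, here's a minimal sketch of sharding by user id. The shard hostnames, the `shard_for` helper and the `ShardedTable` class are all made up for illustration (this is not how mefeedia or Flickr actually did it), and the "servers" are just dicts in memory.

```python
import hashlib

# Hypothetical shard names; in a real setup each one would be its own database server.
SHARDS = [
    "db0.example.internal",
    "db1.example.internal",
    "db2.example.internal",
    "db3.example.internal",
]


def shard_for(user_id, shards=SHARDS):
    """Return the shard that holds every row belonging to this user."""
    # Hash the key so that ids spread evenly, even when they are sequential.
    digest = hashlib.md5(str(user_id).encode("utf-8")).hexdigest()
    return shards[int(digest, 16) % len(shards)]


class ShardedTable:
    """In-memory stand-in for one big table cut into per-server pieces."""

    def __init__(self, shards=SHARDS):
        self.shards = list(shards)
        # One dict per shard plays the role of one table per server.
        self.storage = {name: {} for name in self.shards}

    def put(self, user_id, row):
        self.storage[shard_for(user_id, self.shards)][user_id] = row

    def get(self, user_id):
        return self.storage[shard_for(user_id, self.shards)].get(user_id)
```

This is the "changes to your code" part: every read and write first has to work out which server to talk to. It's also why the database gets less database-y, since a query that spans users (a join, a global count) now has to visit every shard. Note that a plain modulo like this reshuffles most keys when you add a shard, so real setups often use a lookup table instead.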
Scaling Twitter with Ruby.
Twitter is hot today, and they ran into some serious scaling problems, although the app itself is quite simple. It consists of messages of at most 140 characters. The lessons are the same as for most apps: Memcache like crazy, and optimize the database (the biggest bottleneck most of the time).
Also, Ruby on Rails scales pretty much the same way as PHP and other similar languages: shared nothing architecture. Shared nothing means that there is no single thing that is shared by all servers, since that would become a bottleneck.
PHP, for example, has shared nothing architecture out of the box, except perhaps for sessions, but that’s easily solved by storing sessions in a db (which then has its own scaling approach) and not in the filesystem. Here’s a talk by Rasmus Lerdorf that explains scaling with PHP5. (Here’s the mp3 audio recorded by Niall Kennedy).
One of the problems you get into when scaling something like Flickr where you store LOTS of stuff, is that you can’t just store that on a hard drive anymore: it’s not big enough. Apart from just using Amazon’s S3 service (which rocks – I used it for mefeedia and I know lots of startups who use it), there are other solutions. A good presentation of that by Cal is this one:
Cal (he’s a busy dude) also made this presentation about scaling web apps, generally:
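Since the rest of my sketches are in Python, here's the sessions-in-a-db idea in Python too, with SQLite standing in for the real database. The `DbSessionStore` class and the table layout are my own invention for illustration, not something from Rasmus's talk.

```python
import json
import sqlite3
import uuid


class DbSessionStore:
    """Keeps session data in a database table instead of on the local disk."""

    def __init__(self, conn):
        self.conn = conn
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS sessions "
            "(id TEXT PRIMARY KEY, data TEXT NOT NULL)"
        )

    def create(self, data):
        """Store a new session and return its id (what you'd put in the cookie)."""
        session_id = uuid.uuid4().hex
        self.conn.execute(
            "INSERT INTO sessions (id, data) VALUES (?, ?)",
            (session_id, json.dumps(data)),
        )
        self.conn.commit()
        return session_id

    def load(self, session_id):
        """Return the session data, or None if the id is unknown."""
        row = self.conn.execute(
            "SELECT data FROM sessions WHERE id = ?", (session_id,)
        ).fetchone()
        return json.loads(row[0]) if row else None

    def save(self, session_id, data):
        """Overwrite the data of an existing session."""
        self.conn.execute(
            "UPDATE sessions SET data = ? WHERE id = ?",
            (json.dumps(data), session_id),
        )
        self.conn.commit()
```

The point is that any web server can handle any request: whichever machine gets the cookie looks the session up in the shared store, so nothing about the user lives on one particular box.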
John Allspaw (flickr plumbr) also has a good presentation about scaling Flickr:
Scaling LiveJournal.
LiveJournal was one of the first social networks, before that word meant anything, and they partly invented how to scale standard php/mysql/apache apps. They developed memcached, which is now used by almost everyone who wants to scale their site.
Brad Fitzpatrick has a good set of slides on how they evolved the service, here’s a PDF version. And here’s the slideshow embedded:
Kevin Rose mentioned this was “the bible for scaling Digg” – and I think quite a few other web apps are based on this.
Six Apart.
The LiveJournal guys with all their scaling expertise were acquired by Six Apart, and they soon launched Vox. And of course, here’s a presentation on making Vox scalable:
Bloglines.
Bloglines’ scaling problems were slightly different from your average web app, since they are an aggregator of feeds. That means they have billions of blogposts they have to keep and serve to users, and that creates its own scaling problems. The Bloglines approach was to, instead of using a database, just store all that stuff in a special filesystem. Today it’d be easier to do this since there are a few filesystems that do that, or you could just go with S3 again. Mark Fletcher (who also sold Onelist to Yahoo which is now Yahoo Groups) has given a few talks on scaling Onelist and Bloglines: here’s the mp3 audio version, and here’s the PDF of that talk. And a text transcript.
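I don't know the details of the Bloglines filesystem, so the following is only a guess at the general shape of "files instead of a database": each post becomes one small file, and its path is derived from a hash so no single directory gets too big. The function names and the two-level directory layout are assumptions of mine.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def post_path(root, feed_url, post_id):
    """Derive a stable file path for a post from its feed URL and id."""
    digest = hashlib.sha1(("%s|%s" % (feed_url, post_id)).encode("utf-8")).hexdigest()
    # Two levels of subdirectories keep any one directory from growing huge.
    return Path(root) / digest[:2] / digest[2:4] / (digest + ".json")


def save_post(root, feed_url, post_id, post):
    """Write the post as a JSON file and return where it ended up."""
    path = post_path(root, feed_url, post_id)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(post), encoding="utf-8")
    return path


def load_post(root, feed_url, post_id):
    """Read a post back, or return None if it was never stored."""
    path = post_path(root, feed_url, post_id)
    if not path.exists():
        return None
    return json.loads(path.read_text(encoding="utf-8"))
```

You'd call it with something like `save_post(tempfile.mkdtemp(), "http://example.com/feed", "post-1", {"title": "Hello"})`. Swap the local directory for an S3 bucket and the hash for the object key and you get roughly the "just go with S3" version.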
Last.fm
Last.fm is one of the aggregation-type apps: they gather a lot of data about what music you listen to. Similarly to Bloglines, that causes its own scaling problems:
Slideshare.
All the slides in this post are hosted by Slideshare, an incredible service by my fellow information architect Rashmi Sinha and team. When I found out about the project, I emailed her: “brilliant and so obvious once you think of it”. Like many startups, they use S3 to serve their content, and they have the obligatory yet interesting slides to explain how:
I haven’t linked to lots of good thinking about scaling, or to technical resources and stuff. But the presentations should get you going in the world of memcached, perlbal, shared nothing and federation :) Enjoy!
Another great talk in video this time, from the MySQL Bay Area Community Meetup, May 2007:
Finally, Dan Pritchett has a good presentation on scaling eBay (PDF). 26 Billion SQL queries per day! 300+ new features per quarter! 4 architecture versions since 1998 and some pretty crazy scaling of the search.
A talk on YouTube scalability: “In the summer of 2006, they grew from 30 million pages per day to 100 million pages per day, in a 4 month period. Thumbnails turn out to be surprisingly hard to serve efficiently.” (I ran into this with mefeedia too, luckily Amazon S3 came to the rescue by then.) YouTube uses Python, Apache, MySQL, Memcached.
NEW: Front end scaling is important too, and often ignored. Here’s a good presentation from the Yahoo guys:
In the continuing saga of illegible domain names, I’ve recently purchased wayut.com and xofy.net. Once you know what they are they’re actually easy to remember. 2 possible upcoming projects. Wanna guess?
What’s wrong with the workhack todo list: it disappears todo items that are done. I like to see what I’ve accomplished, to get that feeling of satisfaction, of knowing you’ve done at least *something* the past 2 days.
A good explanation of the problem with databases. Reading from and writing to memory is much faster than going to disk. Things are changing in db land – where did I read that mysql will now have a table type that’s basically an rss feed?
I hear mefeedia is doing great, numbers continue to grow fast. It’s very satisfying to see that the people I sold it to are continuing to build it out in the original spirit.