(I changed the title because “top 10” posts are indeed sucky. Also: looking for my Colombia travel site?)
By the way, here’s the RSS feed of my blog, in case you’d like to subscribe.
I always love to read scaling discussions, especially about popular web apps, and there are loads of them out there. Here’s my overview of the best. By the way, the best book on scaling apps I’ve ever read is Building Scalable Websites, by Cal Henderson (the Flickr guy).
It’s dog-eared on my desk, and taught me about sharding (which I used extensively for mefeedia). Sharding means cutting a really big table into pieces so you can put each piece on a separate server. You have to make changes to your code, and your database isn’t so database-y anymore, but it works. For example, online games use sharding to grow their virtual worlds, because there’s no way they could serve all that information from a single database cluster.
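To make the idea concrete, here’s a minimal Python sketch of hash-based sharding. It’s an illustration under assumptions, not how mefeedia or any game actually does it: the `ShardedStore` class is hypothetical, and each “shard” is just an in-memory dict standing in for a separate database server.

```python
import hashlib


class ShardedStore:
    """Toy sharded key-value store (hypothetical).

    Each shard is a dict standing in for a separate database server.
    """

    def __init__(self, num_shards: int):
        self.shards = [dict() for _ in range(num_shards)]

    def shard_index(self, user_id: int) -> int:
        # Hash the key so consecutive IDs spread evenly across shards.
        digest = hashlib.md5(str(user_id).encode()).hexdigest()
        return int(digest, 16) % len(self.shards)

    def put(self, user_id: int, row: dict) -> None:
        # All of a user's data lives on exactly one shard.
        self.shards[self.shard_index(user_id)][user_id] = row

    def get(self, user_id: int):
        # Only the shard that owns this user is consulted.
        return self.shards[self.shard_index(user_id)].get(user_id)


store = ShardedStore(4)
for uid in range(100):
    store.put(uid, {"id": uid})
```

Two caveats worth knowing: with plain modulo hashing, changing the number of shards moves most keys to a different shard, which is why some setups keep a lookup table mapping users to shards instead. And anything that spans shards (say, a JOIN across two users on different servers) now has to happen in your application code; that’s the “not so database-y” part.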
Scaling Twitter with Ruby.
Twitter is hot today, and they ran into some serious scaling problems, even though the app itself is quite simple, consisting of messages of at most 140 characters. The lessons are the same as for most apps: use memcache like crazy, and optimize the database (the biggest bottleneck most of the time).
Also, Ruby on Rails scales pretty much the same way as PHP and similar languages: with a shared nothing architecture. Shared nothing means that no single component is shared by all servers, since that component would become a bottleneck.
PHP, for example, has a shared nothing architecture out of the box, except perhaps for sessions, but that’s easily solved by storing sessions in a db (which then has its own scaling approach) instead of in the filesystem. Here’s a talk by Rasmus Lerdorf that explains scaling with PHP5. (Here’s the mp3 audio recorded by Niall Kennedy.)
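Here’s a minimal sketch of why database-backed sessions keep web servers shared nothing: any server can pick up any request, because session state lives in the database rather than on one machine’s disk. Assumptions: `DbSessionStore` is hypothetical, and an in-memory SQLite database stands in for the real session database. (In PHP itself you’d plug this in via `session_set_save_handler()`.)

```python
import json
import sqlite3
import uuid

# One shared connection stands in for the session database that every
# web server talks to (in-memory SQLite for this sketch).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE sessions (id TEXT PRIMARY KEY, data TEXT NOT NULL)")


class DbSessionStore:
    """Hypothetical session store: no session state lives on the web server."""

    def __init__(self, conn):
        self.conn = conn

    def create(self, data: dict) -> str:
        session_id = uuid.uuid4().hex
        self.conn.execute(
            "INSERT INTO sessions (id, data) VALUES (?, ?)",
            (session_id, json.dumps(data)),
        )
        self.conn.commit()
        return session_id

    def load(self, session_id: str):
        row = self.conn.execute(
            "SELECT data FROM sessions WHERE id = ?", (session_id,)
        ).fetchone()
        return json.loads(row[0]) if row else None


# Two "web servers" that share nothing except the database.
server_a = DbSessionStore(db)
server_b = DbSessionStore(db)

sid = server_a.create({"user": "alice"})
# The next request lands on a different server but still finds the session.
print(server_b.load(sid))
```

The session database now becomes the thing you have to scale, which is the “its own scaling approach” part.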
Blaine Cook made this presentation:
Scaling Flickr.
Cal Henderson wrote the book mentioned above, and also has a good presentation: Scaling Flickr slides as PDFs.
One of the problems you run into when scaling something like Flickr, where you store LOTS of stuff, is that you can’t just store it on a hard drive anymore: it’s not big enough. Apart from just using Amazon’s S3 service (which rocks; I used it for mefeedia and I know lots of startups who use it), there are other solutions. A good presentation on that by Cal is this one:
Cal (he’s a busy dude) also made this presentation about scaling web apps in general:
John Allspaw (Flickr plumbr) also has a good presentation about scaling Flickr:
Scaling LiveJournal.
LiveJournal was one of the first social networks, before that term meant anything, and they partly invented how to scale standard LAMP apps (Apache/MySQL with PHP or Perl). They developed memcached, which is now used by almost everyone who wants to scale their site.
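One detail that surprises people: memcached servers don’t know about each other; the client decides which server holds a given key. A common way to do that is consistent hashing (the ketama library is one example), so adding a cache server only remaps a fraction of the keys instead of nearly all of them. Here’s a minimal sketch; `HashRing` and the server names are hypothetical.

```python
import bisect
import hashlib


class HashRing:
    """Minimal consistent-hash ring for spreading keys over cache servers."""

    def __init__(self, servers, replicas=100):
        self.replicas = replicas  # virtual points per server, smooths the spread
        self._ring = []           # sorted list of (point, server)
        for server in servers:
            self.add(server)

    @staticmethod
    def _hash(value: str) -> int:
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def add(self, server: str) -> None:
        # Place several virtual points for this server around the ring.
        for i in range(self.replicas):
            bisect.insort(self._ring, (self._hash(f"{server}#{i}"), server))

    def server_for(self, key: str) -> str:
        # Walk clockwise to the first point at or after the key's hash.
        h = self._hash(key)
        idx = bisect.bisect(self._ring, (h,))
        if idx == len(self._ring):
            idx = 0  # wrap around the ring
        return self._ring[idx][1]


ring = HashRing(["cache1:11211", "cache2:11211", "cache3:11211"])
keys = [f"user:{i}" for i in range(1000)]
before = {k: ring.server_for(k) for k in keys}
ring.add("cache4:11211")
after = {k: ring.server_for(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
```

After adding a fourth server, only the keys that now land on it move; everything else keeps hitting a warm cache.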
Brad Fitzpatrick has a good set of slides on how they evolved the service; here’s a PDF version. And here’s the slideshow embedded:
Kevin Rose mentioned this was “the bible for scaling Digg” – and I think quite a few other web apps are based on this.
Six Apart.
The LiveJournal guys, with all their scaling expertise, were acquired by Six Apart, and they soon launched Vox. And of course, here’s a presentation on making Vox scalable:
Bloglines.
Bloglines’ scaling problems were slightly different from your average web app’s, since it is a feed aggregator. That means it has billions of blog posts to keep and serve to users, and that creates its own scaling problems. Instead of using a database, Bloglines stored all that data in a special filesystem. Today this would be easier, since there are a few filesystems designed for this kind of job, or you could just go with S3 again. Mark Fletcher (who also sold Onelist to Yahoo, which is now Yahoo Groups) has given a few talks on scaling Onelist and Bloglines: here’s the mp3 audio version, here’s the PDF of that talk, and here’s a text transcript.
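For a feel of what “store it in the filesystem instead of a database” can look like, here’s a generic sketch of one common trick: hash each feed’s ID into a nested directory path so no single directory ends up with millions of files. This isn’t meant to be Bloglines’ actual design; `post_path`, `save_post`, `load_post` and the layout are hypothetical, and post IDs are assumed to be safe to use as filenames.

```python
import hashlib
import json
from pathlib import Path


def post_path(root: Path, feed_id: str, post_id: str) -> Path:
    """Map a post to root/ab/cd/<feed hash>/<post_id>.json (hypothetical layout).

    Two levels of 2-hex-character directories give 256 * 256 = 65,536
    buckets, so no single directory holds millions of entries.
    """
    h = hashlib.sha1(feed_id.encode()).hexdigest()
    return root / h[:2] / h[2:4] / h / f"{post_id}.json"


def save_post(root: Path, feed_id: str, post_id: str, post: dict) -> Path:
    path = post_path(root, feed_id, post_id)
    path.parent.mkdir(parents=True, exist_ok=True)  # create buckets on demand
    path.write_text(json.dumps(post))
    return path


def load_post(root: Path, feed_id: str, post_id: str):
    path = post_path(root, feed_id, post_id)
    return json.loads(path.read_text()) if path.exists() else None
```

Reading a feed’s posts then means listing one small directory, with no database involved.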
Last.fm
Last.fm is one of the aggregation-type apps: they gather a lot of data about what music you listen to. As with Bloglines, that causes its own scaling problems:
Slideshare.
All the slides in this post are hosted by Slideshare, an incredible service by my fellow information architect Rashmi Sinha and team. When I found out about the project, I emailed her: “brilliant and so obvious once you think of it”. Like many startups, they use S3 to serve their content, and they have the obligatory yet interesting slides to explain how:
I haven’t linked to lots of good thinking about scaling, or to technical resources and such. But the presentations should get you going in the world of memcached, perlbal, shared nothing and federation :) Enjoy!
PS: See also How I Unexpectedly Found Myself Doing Consulting For Startups (this is a post on my “professional” site. I haven’t been able to figure out when to post here or there, any tips on that?).
Update: more presentations.
Another great talk, in video this time, from the MySQL Bay Area Community Meetup, May 2007:
Finally, Dan Pritchett has a good presentation on scaling eBay (PDF): 26 billion SQL queries per day! 300+ new features per quarter! 4 architecture versions since 1998, and some pretty crazy scaling of the search.
New: presentation on how Facebook uses PHP APC cache (PDF).
A talk on YouTube scalability: “In the summer of 2006, they grew from 30 million pages per day to 100 million pages per day, in a 4 month period. Thumbnails turn out to be surprisingly hard to serve efficiently.” (I ran into this with mefeedia too; luckily Amazon S3 came to the rescue by then.) YouTube uses Python, Apache, MySQL and Memcached.
NEW: Front-end scaling is important too, and often ignored. Here’s a good presentation from the Yahoo guys:
Hey, the comments work!
Great compilation, very useful. Thanks for putting it together.
Slideshare seems perfect for posts like this.
Wow!
Thanks for writing this up: these are a great resource.
And thanks for including my presentation on the list ;->
-jon
I know nothing about scaling. Yes, I am a user of Flickr, Bloglines and other scalable web apps.
Thanks to all of you who go through so many headaches scaling up these services :)
Your explanation and some of the slides are helpful. Cool!
Thanks for gathering these prezos in one place – superb! (I came via amazon/webservices top page)
I came from http://www.kottke.org and I bet a thousand more people will be coming from there shortly.
This is a great rehash of what’s out there on the topic, thank you so much! Now I’d love to find a similar post on the topic of “Should web apps go Open Source?”. It’s totally off topic, but that was my initial search when I came across this :)
I was going to do such a “meta” write-up, too. But after seeing all these presentations bunched together already (with some in there I didn’t know of) I’m just going to link to this ;-) Thanks man!
Great list of resources. I had copies of just a couple of these and didn’t know about all the others, or that they were now on Slideshare. Always a challenge to find useful information on high-end scaling.
This one makes sense: “One’s first step in wisdom is to question everything – and one’s last is to come to terms with everything.”
These guys have a document on how to build scalable websites:
http://blog.thembid.com/index.php/2007/04/05/build-scalable-web-20-sites-with-ubuntu-symfony-and-lighttpd/
This guy also put together a really nice post about different technologies behind different websites:
http://erdtek.com/techblog/2007/05/26/web-20-backends/
Awesome round up!
Brilliant!
Great collection!! :)
Now, I am no server admin, so this might be a total shot in the dark, but I know every one of my forums, blogs, etc. has been updated to the most secure release. What I do know is that the day I changed my control panel password on Dreamhost was the day this stopped happening.