Something strikes me as very curious. Standard thinking says you need “enterprise quality” languages to scale to large loads, yet big Web 2.0 sites scale much, much larger than most enterprise apps and use scripting languages to do it.

So does that mean that “enterprise quality” is a bunch of crap? No. These Web 2.0 sites are often putting enterprise-type code in the back-end and only using the scripting languages in the front-end. This, by the way, is the “normal” use case for scripting languages. Languages like Python and Ruby make it intentionally trivial to integrate with C-language code.
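To give a feel for how little ceremony that integration takes, here is a minimal sketch using Python's standard ctypes module to call strlen from the C standard library. It assumes a Unix-like system where the C library can be loaded this way; the wrapper name c_strlen is mine, purely for illustration.

```python
import ctypes
import ctypes.util

# Locate and load the C standard library. find_library may return None on
# some systems; on Linux, CDLL(None) then exposes the symbols already loaded
# into the running process, which include libc. (Assumes a Unix-like OS.)
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C signature: size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t


def c_strlen(text: str) -> int:
    """Return the UTF-8 byte length of text by calling C's strlen.

    Hypothetical helper; text must not contain embedded NUL characters.
    """
    return libc.strlen(text.encode("utf-8"))
```

Real projects usually go further and write extension modules, but the point stands: the scripting language is the glue, and the heavy lifting can live in C.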

Take a look at this Facebook architecture presentation to get an idea of what I’m talking about.

It’s PHP on the front-end. They recognize that PHP sucks for systems-level programming. But it’s easy to develop in, it’s easy to rev versions, and it’s very, very easy to scale with hardware. They use memcache as a data accelerator. For those of you unfamiliar with Web 2.0 architectures, memcache is an extremely simple shared in-memory key-value cache written in C. Many Web 2.0 sites use it as a rough staging area to keep from constantly hitting the database.
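The “staging area” idea is usually a read-through pattern: check the cache first, and only go to the database on a miss. The sketch below is my own illustration, not Facebook's code. TinyCache is an in-process stand-in for a memcache-style store, and load_profile_from_db, the key format, and the 60-second expiry are all assumptions.

```python
import time


class TinyCache:
    """In-process stand-in for a memcache-style key/value store with expiry."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            # Entry is stale: drop it and report a miss.
            del self._data[key]
            return None
        return value

    def set(self, key, value, ttl_seconds):
        self._data[key] = (value, time.monotonic() + ttl_seconds)


cache = TinyCache()
stats = {"db_hits": 0}  # counts how often we fall through to the "database"


def load_profile_from_db(user_id):
    """Hypothetical slow database read; here it just fabricates a row."""
    stats["db_hits"] += 1
    return {"id": user_id, "name": f"user-{user_id}"}


def get_profile(user_id, ttl_seconds=60):
    """Check the cache first; hit the database only on a miss."""
    key = f"profile:{user_id}"
    profile = cache.get(key)
    if profile is None:
        profile = load_profile_from_db(user_id)
        cache.set(key, profile, ttl_seconds)
    return profile
```

Calling get_profile(7) twice touches the database once; the second call is served from the cache. The trade-off is that readers may see data up to ttl_seconds old unless writes also update or delete the cached entry.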

Another interesting point is that traditional thinking would say that MySQL is a small database that isn’t adequate for scaling. Yet it’s virtually ubiquitous in extremely large Web 2.0 installations. Facebook uses it as their database. But they don’t use it as a traditional database. They don’t do joins. At all. Think about that. All tables are logically separate. They have to copy fields into multiple tables to avoid joins. And they don’t use centralized databases. Here’s an interesting tidbit: how do they avoid a centralized user store for login? They hash the login ID and use a locally stored lookup table to select the security database based on that hash. Most enterprise architects would shudder at that.
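The login trick is easier to see in code. This is a rough sketch of the general technique, not Facebook's actual scheme: the host names, the choice of SHA-256, and the lowercase normalization are all my assumptions.

```python
import hashlib

# Hypothetical lookup table shipped to every front-end machine. Each entry
# names one of the databases holding login credentials.
SHARD_TABLE = [
    "userdb-00.internal",
    "userdb-01.internal",
    "userdb-02.internal",
    "userdb-03.internal",
]


def shard_for_login(login_id: str) -> str:
    """Pick the credentials database for a login ID by hashing it."""
    # Normalize so "Alice@Example.com" and "alice@example.com" agree.
    normalized = login_id.strip().lower()
    digest = hashlib.sha256(normalized.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an integer bucket index.
    bucket = int.from_bytes(digest[:8], "big") % len(SHARD_TABLE)
    return SHARD_TABLE[bucket]
```

Every front-end computes the same answer with no central lookup, which is the whole point. The catch with a plain modulo like this is that changing the number of databases remaps most users, so a real deployment would need a migration plan or a fixed set of logical buckets mapped onto physical hosts.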

Now, that’s not the whole of the architecture. They use a custom form of service architecture on the back-end.

This is all open-source. And all of it has at least some customization. They made changes to PHP. They made changes to MySQL. They made changes to memcache.

Web 2.0 guys tend to reject enterprise languages. They also tend to reinvent the wheel. They often succeed, so it’s obvious that enterprise approaches aren’t the only approaches. They also tend to employ hardcore developers who won’t balk at digging into extending MySQL or writing low-level C socket code.

One of the most interesting things they built is a SOAP competitor named Thrift. Why? Here’s their summary opinion of SOAP and other alternatives:

  • SOAP - XML, XML and more XML
  • CORBA - bloated
  • COM - Win32
  • Pillar [ed. I’ve never heard of this] - Slick! But no versioning or abstraction
  • Protocol Buffers - You have to roll a lot of your own code on top of it

Facebook architects seem to be focused on the pragmatic. They don’t try to force PHP to be a systems language. They invented Thrift so that they could create services in whatever language or environment seemed to fit best, e.g., C++, Python, Java, Erlang, etc. At the transaction volume they’re running, they couldn’t support the overhead of traditional RPC. And they wanted something more formal than just REST.
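To get a feel for the overhead argument, here is a small sketch comparing the same two-field response encoded as XML and as a fixed binary record. To be clear, this is neither real SOAP nor Thrift's actual wire format; the element names and the record layout are invented for illustration.

```python
import struct
import xml.etree.ElementTree as ET


def encode_xml(user_id: int, score: float) -> bytes:
    """Encode a response as a small SOAP-flavored XML document (illustrative)."""
    envelope = ET.Element("Envelope")
    body = ET.SubElement(envelope, "Body")
    call = ET.SubElement(body, "getScoreResponse")
    ET.SubElement(call, "userId").text = str(user_id)
    ET.SubElement(call, "score").text = repr(score)
    return ET.tostring(envelope, encoding="utf-8")


# Fixed binary layout (an assumption, not Thrift's format):
# big-endian unsigned 64-bit ID followed by a 64-bit float.
RECORD = struct.Struct(">Qd")


def encode_binary(user_id: int, score: float) -> bytes:
    """Encode the same response as a 16-byte binary record."""
    return RECORD.pack(user_id, score)


def decode_binary(payload: bytes):
    """Decode a record produced by encode_binary into (user_id, score)."""
    return RECORD.unpack(payload)
```

The binary record is always 16 bytes and needs no parsing beyond one unpack call, while the XML version is several times larger and has to go through a parser. Multiply that by every internal call on every page view and the motivation for a compact, schema-driven format becomes clearer.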

What lessons can we take from this for enterprise development? Should we use whatever language a particular developer likes? Should we reinvent every wheel?

Maybe it’s best to turn this line of questioning around. Why is the business environment any different from these Web 2.0 sites? It’s different mainly because most enterprises aren’t in the software business. Software is a means to improving their business. They don’t sell it, they don’t make money directly from it. It’s a cost of doing business.

As a result they want proven technologies. They don’t want to customize code, they want to reuse it. They don’t want to blaze new ground. They want promises and commitments.

Do they get that with traditional enterprise tools? They get the promise of it. We’ve all seen that those promises are sometimes hollow, but they’re built on reality.

So then what lesson can we take? That simpler is often better. That many technologies are over-engineered and bloated. That open source works extremely well. That specialized tools are often better than general-purpose ones. And that being pragmatic is more effective than being pedantic. Keeping that in mind will help any architect and any developer.