MySpace Data Architecture: Hello Large Data

MySpace data architecture has had quite a journey. MySpace uses SQL Server in a big way. On Tuesday night MySpace Chief Data Architect Christa Stelzmuller spoke to the Silicon Valley Microsoft Data Platform User Group in Mountain View. We had a record turnout. This was a rare opportunity to learn how a high profile company is using SQL Server to manage very large data.  And I mean large – think 130 million active users a month!

It’s pretty well known that MySpace started out as a two-tier system. They used ColdFusion on the front end and SQL Server on the back end. Traffic grew radically, and the technical team scrambled to adapt. Over the years the technology has matured, but we’re still talking about big data, heavy traffic, and continued rapid growth.

Christa Stelzmuller speaks about MySpace Data Architecture

Christa Stelzmuller and me in Mountain View

Now ColdFusion is gone, replaced by C# and ASP.NET. They added a middle tier, and are running mainly on SQL Server 2005, Standard Edition, with a few instances of Enterprise where required.  They have about 4 petabytes of disk space, spread across 17,000+ disks.

That volume of data pushes the database hard, and in some cases, beyond what SQL Server can handle out of the box.  Load during replication was so high that they had to write their own replication mechanism.

Likewise for many other processes. The load also impacts the development, testing, release, and backup routines. According to Christa, they literally invented their own processes and tools, as they are in uncharted territory.

Despite continued growth, MySpace is making real technical progress. For instance, when Christa joined the team from Yahoo 2.5 years ago, they were experiencing more than 2 million data integrity errors per day. Now that’s down to about 100,000 per day. My hat goes off to the MySpace engineering team! The audience was so engaged that an extended Q&A broke out in the middle of the presentation. Christa fielded dozens of questions about MySpace Data Architecture, ranging from hardware configurations to backup strategies, and then finished off her presentation. You can check out Christa’s slides here.

Christa will speak to the San Francisco Microsoft Data Platform User Group on October 14, 2009, when her topic will be Service Dispatcher: The MySpace Implementation of Service Broker. I expect we’ll see another record turnout.