Pinterest: Leveraging the Amazon Cloud
I heard an interesting case study at the Amazon Web Services Summit in New York. Pinterest is an online pinboard where users can organize, share and explore things they love. It was named 2012’s Breakout Digital Trend at South by Southwest in March.
Pinterest runs entirely in the cloud. It makes heavy use of Amazon Web Services, relying in particular on Amazon EC2 and Amazon S3 for critical infrastructure. What makes the case study interesting is the rapid growth, to about 18 million visitors, that Pinterest is able to support with a small infrastructure team.
According to Ryan Park, operations engineer for Pinterest, “The cloud has enabled us to be more efficient, to try out new experiments at a very low cost, and enabled us to grow the site very dramatically while maintaining a very small team of 2 people.” The advantages Pinterest is leveraging: (1) elastic capacity; (2) quick and global deployment; (3) no CapEx, no initial spend; (4) pay as you go, for what you use; and (5) automation and reuse.
How do you build an infrastructure that can support 18M visitors?
Pinterest is a social networking service that lets people collect and organize items of interest so that others can view them. It uses Amazon Web Services (AWS) extensively, in particular Amazon’s S3 (Simple Storage Service) and EC2 (Elastic Compute Cloud). The company has about 80 million objects stored in S3, totaling about 410 terabytes of user data.
Pinterest uses a range of AWS services to run the site, allowing it to move quickly and scale fast. “Imagine we were running our data center, and we had to go through a process of capacity planning and ordering and racking hardware. It wouldn’t have been possible to scale fast enough,” Park said.
Pinterest uses about 150 EC2 virtual servers, called instances, to run its core Web services, which are written in Python and use the Django framework. Traffic is balanced across these instances using the Amazon ELB (Elastic Load Balancer). “The ELB has a great API, so we can [programmatically] bring in more instances, or take instances out if they are having problems,” Park said.
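To make the "bring instances in, take instances out" idea concrete, here is a minimal sketch. It is not Pinterest's tooling. It assumes `elb` behaves like a boto3 Classic Load Balancer client (`boto3.client("elb")`), and the function name `rotate_unhealthy` and the idea of a pool of spare instances are made up for illustration.

```python
def rotate_unhealthy(elb, load_balancer_name, spare_instance_ids):
    """Swap out-of-service instances for spares behind one load balancer.

    `elb` is assumed to look like a boto3 "elb" (Classic Load Balancer) client.
    Returns (instances taken out, instances brought in).
    """
    health = elb.describe_instance_health(LoadBalancerName=load_balancer_name)
    unhealthy = [
        state["InstanceId"]
        for state in health["InstanceStates"]
        if state["State"] == "OutOfService"
    ]
    if not unhealthy:
        return [], []

    # Bring in one spare per unhealthy instance, as far as spares are available.
    replacements = list(spare_instance_ids)[: len(unhealthy)]
    if replacements:
        elb.register_instances_with_load_balancer(
            LoadBalancerName=load_balancer_name,
            Instances=[{"InstanceId": i} for i in replacements],
        )

    # Take the instances that are having problems out of rotation.
    elb.deregister_instances_from_load_balancer(
        LoadBalancerName=load_balancer_name,
        Instances=[{"InstanceId": i} for i in unhealthy],
    )
    return unhealthy, replacements
```

Because the client is passed in as an argument, the same function can be exercised against a stub object before it is pointed at a real load balancer.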
Another 90 EC2 instances are dedicated to caching with memcache. “This allows us to keep a lot of data [in memory] that is accessed very often, so we can keep load off of our database system,” Park said. Another 35 instances are used for internal purposes.
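The pattern Park describes is usually called cache-aside: check the cache first and only go to the database on a miss. The sketch below is an illustration, not Pinterest's code. A plain dict stands in for the memcache cluster, and the class name `CacheAside` and the `load_from_db` callback are hypothetical.

```python
class CacheAside:
    """Read-through helper: serve hot keys from memory, fall back to the database."""

    def __init__(self, load_from_db):
        self._cache = {}                  # stand-in for a memcache cluster
        self._load_from_db = load_from_db # hypothetical database lookup function
        self.db_reads = 0                 # counts how often the database is hit

    def get(self, key):
        if key in self._cache:
            return self._cache[key]       # cache hit: no database load
        self.db_reads += 1
        value = self._load_from_db(key)   # cache miss: read from the database
        self._cache[key] = value          # keep it in memory for next time
        return value

    def invalidate(self, key):
        """Drop a key after the underlying row changes."""
        self._cache.pop(key, None)


# Hypothetical data: a dict plays the role of the database.
pins = {"pin:1": "red shoes"}
store = CacheAside(pins.get)
store.get("pin:1")
store.get("pin:1")
print(store.db_reads)
```

Repeated reads of the same key reach the database only once, which is the effect of keeping frequently accessed data in memory.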
Behind the application, Pinterest runs about 70 master databases on EC2, as well as another set of backup databases located in different regions around the world for redundancy.
To serve its users in a timely fashion, Pinterest shards its database tables across multiple servers. When a database server is more than 50 percent full, Pinterest engineers move half its contents to another server. Last November, the company had eight master-slave database pairs. Now it has 64 pairs of databases. “The sharded architecture has let us grow and get the I/O capacity we need,” Park said.
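The split rule can be sketched in a few lines. This is a toy model under stated assumptions, not Pinterest's procedure. The `Shard` class, the row-count notion of "full", and splitting by sorted key are all invented for illustration, and the routing of lookups to the right shard is left out.

```python
class Shard:
    """Toy database server that holds rows up to a fixed capacity."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.rows = {}

    def fill_ratio(self):
        return len(self.rows) / self.capacity


def split_if_needed(shard, threshold=0.5):
    """Move the upper half of the keys to a new shard once `shard` is over threshold.

    Returns the new shard, or None if no split was needed.
    """
    if shard.fill_ratio() <= threshold:
        return None
    new_shard = Shard(shard.capacity)
    keys_to_move = sorted(shard.rows)[len(shard.rows) // 2:]
    for key in keys_to_move:
        new_shard.rows[key] = shard.rows.pop(key)
    return new_shard
```

Splitting every shard once doubles the shard count, so three such rounds would be enough to go from 8 pairs to 64.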
- Current technologies include Python (Django), MySQL, Redis, Solr, and Hadoop.
- Configuration is automated with tools such as Chef or Puppet.
- Scripting is done in Bash, Python or Ruby; database management covers MySQL and NoSQL systems.
- Servers run Linux/Unix/BSD, with the infrastructure hosted on Amazon Web Services.
The Economics of AWS for Pinterest
Pinterest uses a mix of dedicated, on-demand and spot instances, and it pays only for the resources it consumes. Most of its traffic arrives during the afternoon and evening hours in the U.S., so it uses autoscaling to add instances during the day when traffic is heavy and remove excess instances at night.
With this approach, Pinterest is able to reduce the number of servers it uses at night by around 40 percent. Because Amazon charges by the hour, this reduction results in cost savings: during times of peak traffic, Pinterest spends about $52 an hour on EC2, though in the wee hours of the night the company can spend as little as $15 an hour.
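A quick back-of-the-envelope calculation shows what those hourly rates can mean per day. The two rates come from the figures above. The split of 16 peak hours and 8 off-peak hours is purely an assumption for illustration, and real traffic ramps gradually rather than switching between two levels.

```python
PEAK_RATE = 52.0      # USD per hour at peak traffic (figure quoted above)
OFF_PEAK_RATE = 15.0  # USD per hour at the overnight low (figure quoted above)


def daily_ec2_cost(peak_hours, peak_rate=PEAK_RATE, off_peak_rate=OFF_PEAK_RATE):
    """Daily EC2 spend for a simple two-level model of the day."""
    if not 0 <= peak_hours <= 24:
        raise ValueError("peak_hours must be between 0 and 24")
    return peak_hours * peak_rate + (24 - peak_hours) * off_peak_rate


scaled = daily_ec2_cost(peak_hours=16)  # assumed: 16 peak hours, 8 off-peak
flat = daily_ec2_cost(peak_hours=24)    # no scaling down at night
print(f"with scaling:    ${scaled:,.2f} per day")  # 16*52 + 8*15 = 952
print(f"without scaling: ${flat:,.2f} per day")    # 24*52 = 1248
print(f"saved:           ${flat - scaled:,.2f} per day")
```

Under these assumptions, scaling down at night saves $296 a day, roughly a quarter of the flat-rate bill.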
Amazon’s pay-as-you-go billing also lets Pinterest test new services without incurring the costs of buying servers or software. “There is no big sales process or big upfront costs when we try something out, so we can try experiments to see what works and what doesn’t,” Park said.
Pinterest uses MapReduce to recommend content to new users. It uses Amazon’s Hadoop-based Elastic MapReduce for data analysis, a service that costs the company only “a few hundred dollars a month,” according to Park.
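The talk did not say how the recommendations are computed, so the following is only a guess at the shape of such a job: count how often each item is pinned, then show the most popular items to a new user. The function names and the sample events are hypothetical, and the phases run in a single process here rather than on a Hadoop cluster.

```python
from collections import defaultdict


def map_phase(pin_events):
    """Emit (item_id, 1) for every pin event."""
    for _user_id, item_id in pin_events:
        yield item_id, 1


def shuffle(pairs):
    """Group emitted values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Sum the counts for each item."""
    return {item_id: sum(values) for item_id, values in groups.items()}


def recommend_for_new_user(pin_events, top_n=2):
    """Return the most-pinned items, ties broken by item id."""
    counts = reduce_phase(shuffle(map_phase(pin_events)))
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item_id for item_id, _count in ranked[:top_n]]


# Hypothetical (user, item) pin events.
events = [("u1", "a"), ("u2", "a"), ("u3", "b"),
          ("u1", "c"), ("u2", "c"), ("u3", "c")]
print(recommend_for_new_user(events))
```

A popularity count is about the simplest thing that fits the map and reduce phases; a real recommender would likely weigh many more signals.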
Bottom Line and Takeaway
This is a fascinating example of how to leverage AWS to create a new offering. If you are in a consumer-facing business and cannot predict demand for the service (1M users or 15M users), it makes sense to leverage AWS.
However, price optimization requires some creative thinking. Getting the right mix of reserved, on-demand and spot resources at the lowest price is an interesting new dimension.