© Black Duck 2013
Netflix: Building Up and
Scaling Out on Open Source
Andrew Aitken - Founder and GM of Olliance Consulting, the leading open source business and strategy consultancy and a division of Black Duck. With 15+ years of industry experience, Andrew is a recognized expert on strategies for FOSS commercialization and a leader in the open source community. He is the founder of the industry’s only “think tank” on the future of commercial open source, a bi-annual event held in Napa, CA and Paris, France, and regularly attended by leading CEOs and visionaries. He has served as an expert witness on open source issues and has been an invited guest lecturer at Stanford’s Entrepreneur program. Andrew has chaired and spoken internationally at multiple industry conferences, sits on the Board of Advisors of SugarCRM, DotNetNuke, and Funambol, and has personally worked with companies such as IBM, Microsoft, Intel and the U.S. Navy.
Adrian Cockcroft is the director of architecture for the Cloud Systems team at Netflix. He is focused on availability, resilience, performance, and measurement of the Netflix cloud platform, and has presented at many conferences, including QCon San Francisco, Beijing and Tokyo. Adrian is also well known as the author of several books written while he was a Distinguished Engineer at Sun Microsystems: Sun Performance and Tuning; Resource Management; and Capacity Planning for Web Services.
From 2004 to 2007 he was a founding member of eBay Research Labs. He graduated with a BSc in Applied Physics from The City University, London.
Olliance Consulting, a division of Black Duck
Open Source Strategy: Our Experience, Your Success
The world’s leading organizations turn to Olliance Consulting to create and implement open source strategies to achieve business success. With more than a decade of experience and hundreds of engagements assisting companies ranging from start-ups to the world’s largest corporations, Olliance creates innovative strategies to leverage the strategic, financial and technological advantages of open source software and methods.
Profile
–Open Source Software Industry’s leading business consultancy
–Over 700 engagements to date
The Open Source Think Tank is an invitation-only conference for 140 CEOs, CIOs, CTOs, legal experts, investors and other senior executives engaged in open source software. It is an annual event held in Napa, CA, and regularly attended by the industry’s leading CEOs and visionaries.
Visit osthinktank.com
Software is Eating the World
Marc Andreessen – 2011
Cloud Native Open Source at
Netflix
June 2013
Adrian Cockcroft
@adrianco #netflixcloud @NetflixOSS
http://www.linkedin.com/in/adriancockcroft
Cloud Native
NetflixOSS – Cloud Native On-Ramp
Netflix Open Source Cloud Prize
We are Engineers
We solve hard problems
We build amazing and complex things
We fix things when they break
We strive for perfection
Perfect code
Perfect hardware
Perfectly operated
But perfection takes too long…
So we compromise
Time to market vs. Quality
Utopia remains out of reach
Where time to market wins big
Web services
Agile infrastructure - cloud
Continuous deployment
How Soon?
Code features in days instead of months
Hardware in minutes instead of weeks
Tipping the Balance
A new engineering challenge
Construct a highly agile and highly
available service from ephemeral and
Netflix Streaming
Netflix Member Web Site Home Page
How Netflix Streaming Works
[Diagram components: Customer Device (PC, PS3, TV…); Web Site or Discovery API; User Data; Personalization; Streaming API; DRM; QoS Logging; OpenConnect CDN Boxes; CDN Management and Steering; Content Encoding; Consumer Electronics; AWS Cloud Services; CDN Edge Locations; Content Delivery Service]
[Chart labels: Amazon Video 1.31%; 18x; 25x; Streaming Bandwidth, Nov 2012; Mean Bandwidth, March 2013, +39% 6mo]
Real Web Server Dependencies Flow
(Netflix Home page business transaction as seen by AppDynamics)
Start Here
memcached Cassandra
Web service S3 bucket
Personalization movie group choosers (for US, Canada and Latam)
Each icon is three to a few hundred instances across three AWS zones
New Anti-Fragile Patterns
Micro-services and Chaos engines
Highly available systems composed
from ephemeral components
Open Source is the default
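One way to see why a service composed from ephemeral components can still be highly available is the rough sketch below. It assumes that instance failures are independent and that the service stays up as long as at least one instance is up; both are simplifying assumptions made here, not claims from the slides. The function name and the 99% per-instance figure are hypothetical.

```python
def service_availability(instance_availability: float, replicas: int) -> float:
    """Probability that at least one of `replicas` independent instances is up."""
    if not 0.0 <= instance_availability <= 1.0:
        raise ValueError("instance_availability must be between 0 and 1")
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    # The service is down only if every replica is down at the same time.
    return 1.0 - (1.0 - instance_availability) ** replicas


if __name__ == "__main__":
    # Hypothetical figure: each ephemeral instance is up 99% of the time.
    for n in (1, 2, 3):
        print(n, service_availability(0.99, n))
```

Under these assumptions, adding replicas shrinks the chance of a total outage geometrically, which is why individually unreliable instances can add up to a reliable service.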
Cloud Native
Master copies of data are cloud resident
Everything is dynamically provisioned
How to get to Cloud Native
Freedom and Responsibility for Developers
Decentralize and Automate Ops Activities
Netflix BusDevOps Organization
Chief Product Officer
VP Product Management: Directors Product
VP UI Engineering: Directors Development, Developers + DevOps, UI Data Sources, AWS
VP Discovery Engineering: Directors Development, Developers + DevOps, Discovery Data Sources, AWS
VP Platform: Directors Platform, Developers + DevOps, Platform Data Sources, AWS
Code, independently updated continuous delivery
Denormalized, independently updated and scaled data
Cloud, independently updated and scaled infrastructure
Four Transitions
• Management: Integrated Roles in a Single Organization
– Business, Development, Operations -> BusDevOps
• Developers: Denormalized Data – NoSQL
– Decentralized, scalable, available, polyglot
• Responsibility from Ops to Dev: Continuous Delivery
– Decentralized small daily production updates
• Responsibility from Ops to Dev: Agile Infrastructure - Cloud
What’s Different?
Get out of the way of innovation
Best of breed, provisioned by the hour
Choices based on features and scale
Almost everything is Open Source
Cost reduction -> Slow down developers -> Less competitive -> Less revenue -> Lower margins
Process reduction -> Speed up developers -> More competitive -> More revenue -> Higher margins
Asgard
Ephemeral Instances
• Largest services are autoscaled
• Average lifetime of an instance is 36 hours
[Chart: Push, Autoscale Up, Autoscale Down]
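An average instance lifetime of 36 hours implies a steady stream of replacements. A minimal sketch of that arithmetic, assuming a fleet in steady state where every terminated instance is replaced; the function name and the fleet size of 1,000 are hypothetical and not taken from the slides.

```python
def replacements_per_hour(fleet_size: int, mean_lifetime_hours: float) -> float:
    """Steady-state launch rate needed to keep `fleet_size` instances running."""
    if fleet_size < 0:
        raise ValueError("fleet_size must not be negative")
    if mean_lifetime_hours <= 0:
        raise ValueError("mean_lifetime_hours must be positive")
    # In steady state, launches per hour = fleet size / mean lifetime.
    return fleet_size / mean_lifetime_hours


if __name__ == "__main__":
    # Hypothetical fleet of 1,000 instances with the 36-hour mean lifetime.
    print(round(replacements_per_hour(1000, 36.0), 1))
```

At that turnover rate, manual instance management is impractical, which matches the slide's emphasis on autoscaling.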
Cross Region Use Cases
• Geographic Isolation
– US to Europe replication of subscriber data
– Read intensive, low update rate
– Production use since late 2011
• Redundancy for regional failover
– US East to US West replication of everything
– Includes write intensive data, high update rate
Managing Multi-Region Availability
[Diagram: two regions, each with Regional Load Balancers in front of Cassandra Replicas in Zone A, Zone B and Zone C]
DNS providers: UltraDNS, DynECT DNS, AWS Route53
Denominator – manage traffic via multiple DNS providers
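The idea of steering traffic through several DNS providers at once can be sketched as a provider-neutral interface. This is an illustration only: the class and function names, the in-memory provider, and the hostnames are all hypothetical, and none of it is Denominator's actual API.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class DnsProvider(ABC):
    """Hypothetical provider-neutral interface; not Denominator's real API."""

    @abstractmethod
    def set_cname(self, name: str, target: str) -> None: ...


class InMemoryProvider(DnsProvider):
    """Stand-in for a real provider client such as UltraDNS, DynECT or Route53."""

    def __init__(self, label: str) -> None:
        self.label = label
        self.records: Dict[str, str] = {}

    def set_cname(self, name: str, target: str) -> None:
        self.records[name] = target


def steer_traffic(providers: List[DnsProvider], name: str, target: str) -> None:
    """Apply the same record change to every configured provider."""
    for provider in providers:
        provider.set_cname(name, target)


if __name__ == "__main__":
    providers = [InMemoryProvider(p) for p in ("ultradns", "dynect", "route53")]
    # Hypothetical hostnames: point the service at a US West load balancer.
    steer_traffic(providers, "api.example.com", "us-west-2.elb.example.com")
    for p in providers:
        print(p.label, p.records)
```

Keeping the calling code independent of any single vendor is what lets one regional failover command update all providers consistently.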
Benchmarking Global Cassandra
Write intensive test of cross region capacity
16 x hi1.4xlarge SSD nodes per zone = 96 total
[Diagram: Cassandra Replicas in Zone A, Zone B and Zone C of US-West-2 Region - Oregon and of US-East-1 Region - Virginia, with Test Load and Validation Load]
• 1 Million writes CL.ONE
• 1 Million reads CL.ONE with no data loss
• Inter-Zone Traffic
• Inter-Region Traffic: up to 9Gbits/s, 83ms
• 18TB backup restored from S3 using Priam
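The node count on the benchmark slide follows from 16 nodes in each of three zones (A, B and C) in each of two regions (US-West-2 and US-East-1). A minimal sketch of that arithmetic; the constant and function names are illustrative only.

```python
NODES_PER_ZONE = 16    # hi1.4xlarge SSD nodes per zone, from the slide
ZONES_PER_REGION = 3   # Zones A, B and C
REGIONS = 2            # US-West-2 (Oregon) and US-East-1 (Virginia)


def total_nodes(nodes_per_zone: int, zones_per_region: int, regions: int) -> int:
    """Total cluster size across all zones and regions."""
    return nodes_per_zone * zones_per_region * regions


if __name__ == "__main__":
    print(total_nodes(NODES_PER_ZONE, ZONES_PER_REGION, REGIONS))
```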
Netflix Dataoven
Data Warehouse: over 2 Petabytes
Data Pipelines:
– Ursula: from cloud services, ~100 Billion events/day
– Aegisthus: from C*, terabytes of dimension data
Hadoop Clusters – AWS EMR: 1300 nodes, 800 nodes, multiple 150 nodes nightly
RDS Metadata
Gateways
Beware of Geeks Bearing Gifts: Strategies for an
Increasingly Open Economy
How did Netflix get ahead?
Netflix BusDevOps Org
• Doing it since 2009
• SaaS Applications
• PaaS for agility
• Public IaaS for AWS features
• Big data in the cloud
• Integrating many APIs
• FOSS from github
• Renting hardware for 1hr
• Coding in Java/Groovy/Scala
Traditional IT Operations
• Taking their time
• Pilot private cloud projects
• Beta quality installations
• Small scale
• Integrating several vendors
• Paying big $ for software
• Paying big $ for consulting
• Buying hardware for 3yrs
Netflix Platform Evolution
2009-2010: Bleeding Edge Innovation
2011-2012: Common Pattern
2013-2014: Shared Pattern
Netflix ended up several years ahead of the industry, but it’s becoming commoditized now
Making it easy to follow
Goals
• Establish our solutions as Best Practices / Standards
• Hire, Retain and Engage Top Engineers
• Build up Netflix Technology Brand
• Benefit from a shared ecosystem
Example Application – RSS Reader
Zuul Traffic Processing and Routing
Zuul Architecture
More Use Cases More Features
Better portability
Higher availability
Easier to deploy
Contributions from end users
Contributions from vendors
Vendor Driven Portability
Interest in using NetflixOSS for Enterprise Private Clouds
“It’s done when it runs Asgard” Functionally complete
Demonstrated March Released June in V3.3
Some vendor interest
Needs AWS compatible Autoscaler
Growing vendor interest
OpenStack “Heat” getting there
Another very large vendor planning to demo NetflixOSS at July 17th Meetup
AWS 2009
Baseline features needed to support NetflixOSS
Judges
Aino Corry – Program Chair for QCon/GOTO
Martin Fowler – Chief Scientist Thoughtworks
Simon Wardley – Strategist
Yury Izrailevsky – VP Cloud Netflix
Werner Vogels – CTO Amazon
Joe Weinman
[Diagram: Entrants, Netflix Engineering, Six Judges, Winners]
Registration: opened March 13 (Github)
Contributions: Apache Licensed (Github)
Close Entries: September 15 (Github)
Nominations: Conforms to Rules, Working Code, Community Traction, Categories
Award Ceremony Dinner: November, AWS Re:Invent
Ten Prize Categories
Prizes: $10K cash, $5K AWS, AWS Re:Invent Tickets, Trophy
Functionality and scale now, portability coming
Moving from parts to a platform in 2013
Netflix is fostering a cloud native ecosystem
Rapid Evolution - Low MTBIAMSH (Mean Time Between Idea And Making Stuff Happen)
Slideshare NetflixOSS Details
• Lightning Talks Feb S1E1
– http://www.slideshare.net/RuslanMeshenberg/netflixoss-open-house-lightning-talks
• Asgard In Depth Feb S1E1
– http://www.slideshare.net/joesondow/asgard-overview-from-netflix-oss-open-house
• Lightning Talks March S1E2
– http://www.slideshare.net/RuslanMeshenberg/netflixoss-meetup-lightning-talks-and-roadmap
• Security Architecture
– http://www.slideshare.net/jason_chan/
• Cost Aware Cloud Architectures – with Jinesh Varia of AWS
– http://www.slideshare.net/AmazonWebServices/building-costaware-architectures-jinesh-varia-aws-and-adrian-cockroft-netflix
Takeaway
NetflixOSS makes it easier for everyone to become Cloud Native
Open Source is not just the default, it’s a strategic weapon
Amazon Cloud Terminology Reference
See http://aws.amazon.com/
This is not a full list of Amazon Web Services features
• AWS – Amazon Web Services (common name for Amazon cloud)
• AMI – Amazon Machine Image (archived boot disk, Linux, Windows etc. plus application code)
• EC2 – Elastic Compute Cloud
– Range of virtual machine types m1, m2, c1, cc, cg. Varying memory, CPU and disk configurations.
– Instance – a running computer system. Ephemeral, when it is de-allocated nothing is kept.
– Reserved Instances – pre-paid to reduce cost for long term usage
– Availability Zone – datacenter with own power and cooling hosting cloud instances
– Region – group of Availability Zones – US-East, US-West, EU-Eire, Asia-Singapore, Asia-Japan, SA-Brazil, US-Gov
• ASG – Auto Scaling Group (instances booting from the same AMI)
• S3 – Simple Storage Service (http access)
• EBS – Elastic Block Store (network disk filesystem can be mounted on an instance)
• RDS – Relational Database Service (managed MySQL master and slaves)
• DynamoDB/SDB – Simple Data Base (hosted http based NoSQL datastore, DynamoDB replaces SDB)
• SQS – Simple Queue Service (http based message queue)
• SNS – Simple Notification Service (http and email based topics and messages)
• EMR – Elastic Map Reduce (automatically managed Hadoop cluster)
• ELB – Elastic Load Balancer
• EIP – Elastic IP (stable IP address mapping assigned to instance or ELB)
• VPC – Virtual Private Cloud (single tenant, more flexible network and security constructs)
• DirectConnect – secure pipe from AWS VPC to external datacenter