Netflix: Building Up and Scaling Out on Open Source

(1)

© Black Duck 2013

Netflix: Building Up and

Scaling Out on Open Source

(2)


Andrew Aitken is the founder and GM of Olliance Consulting, the leading open source business and strategy consultancy and a division of Black Duck. With 15+ years of industry experience, Andrew is a recognized expert on strategies for FOSS commercialization and a leader in the open source community. He is the founder of the industry’s only “think tank” on the future of commercial open source, a bi-annual event held in Napa, CA and Paris, France, and regularly attended by leading CEOs and visionaries. He has served as an expert witness on open source issues and has been an invited guest lecturer at Stanford’s Entrepreneur program. Andrew has chaired and spoken internationally at multiple industry conferences, sits on the Board of Advisors of SugarCRM, DotNetNuke, and Funambol, and has personally worked with companies such as IBM, Microsoft, Intel and the U.S. Navy.

Adrian Cockcroft is the director of architecture for the Cloud Systems team at Netflix. He is focused on availability, resilience, performance, and measurement of the Netflix cloud platform, and has presented at many conferences, including QCon San Francisco, Beijing and Tokyo. Adrian is also well known as the author of several books written while he was a Distinguished Engineer at Sun Microsystems: Sun Performance and Tuning; Resource Management; and Capacity Planning for Web Services.

From 2004 to 2007 he was a founding member of eBay Research Labs. He graduated with a BSc in Applied Physics from The City University, London.

(3)


Olliance Consulting, a division of Black Duck

Open Source Strategy: Our Experience, Your Success

The world’s leading organizations turn to Olliance Consulting to create and implement open source strategies to achieve business success. With more than a decade of experience and hundreds of engagements assisting companies ranging from start-ups to the world’s largest corporations, Olliance creates innovative strategies to leverage the strategic, financial and technological advantages of open source software and methods.

Profile

–Open Source Software Industry’s leading business consultancy

–Over 700 engagements to date

(4)


The Open Source Think Tank is an invitation-only conference for 140 CEOs, CIOs, CTOs, legal experts, investors and other senior executives engaged in open source software. It is an annual event held in Napa, CA, and regularly attended by the industry’s leading CEOs and visionaries.

Visit osthinktank.com

(5)


Software is Eating the World

Marc Andreessen – 2011

(6)

Cloud Native Open Source at Netflix

June 2013

Adrian Cockcroft

@adrianco #netflixcloud @NetflixOSS

http://www.linkedin.com/in/adriancockcroft

(7)

Cloud Native

NetflixOSS – Cloud Native On-Ramp

Netflix Open Source Cloud Prize

(8)

We are Engineers

We solve hard problems

We build amazing and complex things

We fix things when they break

(9)

We strive for perfection

Perfect code

Perfect hardware

Perfectly operated

(10)

But perfection takes too long…

So we compromise

Time to market vs. Quality

Utopia remains out of reach

(11)

Where time to market wins big

Web services

Agile infrastructure - cloud

Continuous deployment

(12)

How Soon?

Code features in days instead of months

Hardware in minutes instead of weeks

(13)

Tipping the Balance

(14)

A new engineering challenge

Construct a highly agile and highly available service from ephemeral and

(15)
(16)

Netflix Streaming

(17)

Netflix Member Web Site Home Page

(18)

How Netflix Streaming Works

Customer Device (PC, PS3, TV…)

Web Site or Discovery API

User Data

Personalization

Streaming API

DRM

QoS Logging

OpenConnect CDN Boxes

CDN Management and Steering

Content Encoding

Consumer Electronics

AWS Cloud Services

CDN Edge Locations

(19)

Content Delivery Service

(20)

[Chart: streaming bandwidth, Nov 2012 and March 2013. Mean bandwidth +39% in 6 months. Other labels: Amazon Video 1.31%, 18x, 25x]

(21)

Real Web Server Dependencies Flow

(Netflix Home page business transaction as seen by AppDynamics)

Start Here

memcached Cassandra

Web service S3 bucket

Personalization movie group choosers (for US, Canada and Latam)

Each icon is three to a few hundred instances across three AWS zones

(22)

New Anti-Fragile Patterns

Micro-services and Chaos engines

Highly available systems composed

from ephemeral components

Open Source is the default
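The chaos-engine idea on this slide can be illustrated with a minimal sketch. This is not Netflix's Chaos Monkey code: the group names, instance IDs, and the `pick_victims` helper are hypothetical, and the rule of skipping single-instance groups is an assumption made for the example. It picks at most one random instance per service group to terminate, on the premise that a service composed from ephemeral components should survive the loss.

```python
import random


def pick_victims(groups, probability, rng):
    """Pick at most one instance per group to terminate.

    groups: dict mapping a (hypothetical) group name to a list of instance IDs.
    probability: chance, per group, that a victim is chosen in this run.
    rng: a random.Random instance, injected so runs are reproducible.
    """
    victims = {}
    for name, instances in sorted(groups.items()):
        # Assumption for this sketch: leave groups with a single instance alone.
        if len(instances) < 2:
            continue
        if rng.random() < probability:
            victims[name] = rng.choice(instances)
    return victims


# Hypothetical service groups and instance IDs.
groups = {
    "api": ["i-a1", "i-a2", "i-a3"],
    "personalization": ["i-p1", "i-p2"],
    "singleton": ["i-s1"],
}
victims = pick_victims(groups, probability=1.0, rng=random.Random(42))
for group, instance in victims.items():
    print(f"would terminate {instance} in group {group}")
```

Injecting the random generator keeps the selection reproducible in tests while staying random in production use.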

(23)

Cloud Native

Master copies of data are cloud resident

Everything is dynamically provisioned

(24)

How to get to Cloud Native

Freedom and Responsibility for Developers

Decentralize and Automate Ops Activities

(25)

Netflix BusDevOps Organization

Chief Product Officer

VP Product Management -> Directors Product

VP UI Engineering -> Directors Development -> Developers + DevOps -> UI Data Sources -> AWS

VP Discovery Engineering -> Directors Development -> Developers + DevOps -> Discovery Data Sources -> AWS

VP Platform -> Directors Platform -> Developers + DevOps -> Platform Data Sources -> AWS

Denormalized, independently updated and scaled data

Cloud, independently updated and scaled infrastructure

Code, independently updated continuous delivery

(26)

Four Transitions

Management: Integrated Roles in a Single Organization

Business, Development, Operations -> BusDevOps

Developers: Denormalized Data – NoSQL

Decentralized, scalable, available, polyglot

Responsibility from Ops to Dev: Continuous Delivery

Decentralized small daily production updates

Responsibility from Ops to Dev: Agile Infrastructure - Cloud

(27)

What’s Different?

Get out of the way of innovation

Best of breed, provisioned by the hour

Choices based on features and scale

Almost everything is Open Source

Cost reduction -> Slow down developers -> Less competitive -> Less revenue -> Lower margins

Process reduction -> Speed up developers -> More competitive -> More revenue -> Higher margins

(28)
(29)

Asgard

(30)

Ephemeral Instances

Largest services are autoscaled

Average lifetime of an instance is 36 hours

[Chart labels: Push, Autoscale Up, Autoscale Down]

(31)
(32)

Cross Region Use Cases

Geographic Isolation

US to Europe replication of subscriber data

Read intensive, low update rate

Production use since late 2011

Redundancy for regional failover

US East to US West replication of everything

Includes write intensive data, high update rate

(33)

Managing Multi-Region Availability

Two regions, each with:

Regional Load Balancers

Cassandra Replicas in Zone A, Zone B and Zone C

DNS providers: UltraDNS, DynECT DNS, AWS Route53

Denominator – manage traffic via multiple DNS providers
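The idea of steering traffic through several DNS providers at once can be sketched as one common interface with a backend per provider. Denominator itself is a Java library; the sketch below does not reproduce its API. The `DnsProvider` and `InMemoryProvider` classes, the `steer_traffic` helper, and the host names are all hypothetical.

```python
from abc import ABC, abstractmethod


class DnsProvider(ABC):
    """Hypothetical common interface over different DNS vendors."""

    def __init__(self, label):
        self.label = label

    @abstractmethod
    def set_cname(self, name, target):
        """Point `name` at `target`."""


class InMemoryProvider(DnsProvider):
    """Stand-in backend that stores records in a dict instead of calling a vendor."""

    def __init__(self, label):
        super().__init__(label)
        self.records = {}

    def set_cname(self, name, target):
        self.records[name] = target


def steer_traffic(providers, name, target):
    """Apply the same CNAME change to every provider.

    Returns the labels of providers where the change failed, so the caller
    can retry or alert without stopping at the first failing vendor.
    """
    failed = []
    for provider in providers:
        try:
            provider.set_cname(name, target)
        except Exception:
            failed.append(provider.label)
    return failed


providers = [InMemoryProvider(label) for label in ("ultradns", "dynect", "route53")]
failed = steer_traffic(providers, "api.example.com", "us-west-2.example.com")
print("failed providers:", failed)
```

Keeping the vendor-specific calls behind one interface is what lets a regional failover be expressed once and applied to every provider.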

(34)

Benchmarking Global Cassandra

Write intensive test of cross region capacity

16 x hi1.4xlarge SSD nodes per zone = 96 total

US-West-2 Region - Oregon: Cassandra Replicas in Zone A, Zone B and Zone C

US-East-1 Region - Virginia: Cassandra Replicas in Zone A, Zone B and Zone C

Test Load, Test Load, Validation Load

Inter-Zone Traffic

18TB Backup Restored from S3 using Priam

1 Million writes CL.ONE

1 Million reads CL.ONE with no data loss

Inter-Region Traffic: up to 9 Gbits/s, 83ms

18TB S3
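The node count on this slide follows from the cluster layout: 16 nodes per zone, three zones per region, two regions. A small sketch of that arithmetic follows; the function name is hypothetical, and the zone and region counts are read from the diagram labels.

```python
def cluster_size(nodes_per_zone, zones_per_region, regions):
    """Total node count for a cluster laid out evenly across zones and regions."""
    return nodes_per_zone * zones_per_region * regions


per_region = cluster_size(nodes_per_zone=16, zones_per_region=3, regions=1)
total = cluster_size(nodes_per_zone=16, zones_per_region=3, regions=2)
print(f"{per_region} nodes per region, {total} nodes in total")
```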

(35)
(36)

Netflix Dataoven

Data Warehouse: Over 2 Petabytes

Data Pipelines

Ursula: From cloud Services, ~100 Billion Events/day

Aegisthus: From C*, Terabytes of Dimension data

Hadoop Clusters – AWS EMR: 1300 nodes, 800 nodes, Multiple 150 nodes Nightly

RDS Metadata

Gateways
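To give a sense of scale, the "~100 Billion Events/day" figure on this slide can be converted to an average per-second rate. This is plain arithmetic on the quoted number; it assumes an even spread over the day, so real peaks would be higher.

```python
EVENTS_PER_DAY = 100e9           # "~100 Billion Events/day" from the slide
SECONDS_PER_DAY = 24 * 60 * 60   # 86,400

avg_events_per_second = EVENTS_PER_DAY / SECONDS_PER_DAY
print(f"average rate: about {avg_events_per_second / 1e6:.2f} million events per second")
```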

(37)
(38)

Beware of Geeks Bearing Gifts: Strategies for an Increasingly Open Economy

(39)

How did Netflix get ahead?

Netflix BusDevOps Org

Doing it since 2009

SaaS Applications

PaaS for agility

Public IaaS for AWS features

Big data in the cloud

Integrating many APIs

FOSS from github

Renting hardware for 1hr

Coding in Java/Groovy/Scala

Traditional IT Operations

Taking their time

Pilot private cloud projects

Beta quality installations

Small scale

Integrating several vendors

Paying big $ for software

Paying big $ for consulting

Buying hardware for 3yrs

(40)

Netflix Platform Evolution

Bleeding Edge Innovation (2009-2010) -> Common Pattern (2011-2012) -> Shared Pattern (2013-2014)

Netflix ended up several years ahead of the industry, but it’s becoming commoditized now

(41)

Making it easy to follow

(42)

Goals

Establish our solutions as Best Practices / Standards

Hire, Retain and Engage Top Engineers

Build up Netflix Technology Brand

Benefit from a shared ecosystem

(43)
(44)

Example Application – RSS Reader

Zuul: Traffic Processing and Routing

(45)

Zuul Architecture
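Zuul's public documentation describes request handling as a chain of filters that run before routing, during routing, and after routing. The sketch below illustrates that pattern only. It is not Zuul's Java/Groovy API: the `run_filters` helper, the filter functions, and the origin name are hypothetical.

```python
def run_filters(request, filters):
    """Run filters in stage order (pre, route, post), then by order within a stage.

    filters: list of (stage, order, function) tuples; each function
    receives and mutates a shared context dict.
    """
    stage_rank = {"pre": 0, "route": 1, "post": 2}
    context = {"request": request, "response": None, "log": []}
    ordered = sorted(filters, key=lambda f: (stage_rank[f[0]], f[1]))
    for _stage, _order, func in ordered:
        func(context)
    return context


def authenticate(ctx):
    # Hypothetical pre filter: inspect the request before it is routed.
    ctx["log"].append("pre:auth")


def route_to_origin(ctx):
    # Hypothetical route filter: a real gateway would call the origin here.
    ctx["response"] = {"status": 200, "origin": "rss-middletier", "headers": {}}
    ctx["log"].append("route:origin")


def add_headers(ctx):
    # Hypothetical post filter: decorate the response on the way out.
    ctx["response"]["headers"]["X-Gateway"] = "sketch"
    ctx["log"].append("post:headers")


# Filters are registered out of order on purpose; run_filters sorts them.
filters = [
    ("post", 10, add_headers),
    ("route", 10, route_to_origin),
    ("pre", 10, authenticate),
]
context = run_filters({"path": "/rss"}, filters)
print(context["log"])
```

Because each filter is an independent unit keyed by stage and order, behavior can be added or changed without touching the routing core.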

(46)
(47)

More Use Cases

More Features

Better portability

Higher availability

Easier to deploy

Contributions from end users

Contributions from vendors

(48)

Vendor Driven Portability

Interest in using NetflixOSS for Enterprise Private Clouds

“It’s done when it runs Asgard”

Functionally complete

Demonstrated March

Released June in V3.3

Some vendor interest

Needs AWS compatible Autoscaler

Growing vendor interest

Openstack “Heat” getting there

Another very large vendor planning to demo NetflixOSS at July 17th Meetup

(49)

AWS 2009

Baseline features needed to support NetflixOSS

(50)
(51)
(52)

Judges

Aino Corry, Program Chair for QCon/GOTO

Martin Fowler, Chief Scientist, Thoughtworks

Simon Wardley, Strategist

Yury Izrailevsky, VP Cloud, Netflix

Werner Vogels, CTO, Amazon

Joe Weinman

(53)

Entrants, Netflix Engineering, Six Judges, Winners

Nominations, Conforms to Rules, Working Code, Community Traction, Categories

Registration Opened March 13

Github, Apache Licensed Contributions

Close Entries September 15

Award Ceremony Dinner, November, AWS Re:Invent

Ten Prize Categories: $10K cash, $5K AWS, AWS Re:Invent Tickets, Trophy

(54)

Functionality and scale now, portability coming

Moving from parts to a platform in 2013

Netflix is fostering a cloud native ecosystem

Rapid Evolution - Low MTBIAMSH (Mean Time Between Idea And Making Stuff Happen)

(55)

Slideshare NetflixOSS Details

• Lightning Talks Feb S1E1

– http://www.slideshare.net/RuslanMeshenberg/netflixoss-open-house-lightning-talks

• Asgard In Depth Feb S1E1

– http://www.slideshare.net/joesondow/asgard-overview-from-netflix-oss-open-house

• Lightning Talks March S1E2

– http://www.slideshare.net/RuslanMeshenberg/netflixoss-meetup-lightning-talks-and-roadmap

• Security Architecture

– http://www.slideshare.net/jason_chan/

• Cost Aware Cloud Architectures – with Jinesh Varia of AWS

– http://www.slideshare.net/AmazonWebServices/building-costaware-architectures-jinesh-varia-aws-and-adrian-cockroft-netflix

(56)

Takeaway

NetflixOSS makes it easier for everyone to become Cloud Native

Open Source is not just the default, it’s a strategic weapon

(57)


(58)

Amazon Cloud Terminology Reference

See http://aws.amazon.com/ This is not a full list of Amazon Web Service features

• AWS – Amazon Web Services (common name for Amazon cloud)

• AMI – Amazon Machine Image (archived boot disk, Linux, Windows etc. plus application code)

• EC2 – Elastic Compute Cloud

– Range of virtual machine types m1, m2, c1, cc, cg. Varying memory, CPU and disk configurations.

– Instance – a running computer system. Ephemeral, when it is de-allocated nothing is kept.

– Reserved Instances – pre-paid to reduce cost for long term usage

– Availability Zone – datacenter with own power and cooling hosting cloud instances

– Region – group of Avail Zones – US-East, US-West, EU-Eire, Asia-Singapore, Asia-Japan, SA-Brazil, US-Gov

• ASG – Auto Scaling Group (instances booting from the same AMI)

• S3 – Simple Storage Service (http access)

• EBS – Elastic Block Storage (network disk filesystem can be mounted on an instance)

• RDS – Relational Database Service (managed MySQL master and slaves)

• DynamoDB/SDB – Simple Data Base (hosted http based NoSQL datastore, DynamoDB replaces SDB)

• SQS – Simple Queue Service (http based message queue)

• SNS – Simple Notification Service (http and email based topics and messages)

• EMR – Elastic Map Reduce (automatically managed Hadoop cluster)

• ELB – Elastic Load Balancer

• EIP – Elastic IP (stable IP address mapping assigned to instance or ELB)

• VPC – Virtual Private Cloud (single tenant, more flexible network and security constructs)

• DirectConnect – secure pipe from AWS VPC to external datacenter
