Big Data with Component Based Software

(1)

Big Data with Component Based Software

(2)

Who am I

Erik who?

Erik Forsberg <[email protected]>

Linköping University, 1998-2003.

Computer Science programme + lot's of time at Lysator ACS At Opera Software since 2008

(3)

Outline

● Background

● The problem

● Components for Processing of Big Data – Hadoop

● Components for Coordination - ZooKeeper

● Components for Storing of Big Data - Cassandra

● Gluing it together

(4)

Background - What is Opera Software?

1200+ employees 13 locations

Norway, Sweden (Linköping [70+ people], Gothenburg, Stockholm), Poland, USA, Japan, China, South Korea, Taiwan, Russia, Ukraine, Iceland,

Singapore

We make browsers, deliver content, serve ads, compress video, help operators sell data plans, etc..

And use a lot of computers and bandwidth :-)

(5)

What am I doing @ Opera Software?

I'm the Grumpy Guy in the corner Architect, Opera Statistics Platform

Working with around 10 other people in Poland, India, USA

Dealing with Hadoop, Zookeeper, Cassandra, python, Java, Nagios, Zabbix, etc.

(6)

Background

(7)

Background – What's our problem?

Opera has a multitude of services producing data

● Opera Mini

● Opera Coast

● Opera for Android

● Opera Mediaworks

● Opera Discover

● Etc..

Basically every Opera Service / Product produce some kind of data

(8)

Background – contd.

Some of these produce huge amounts of data

Opera Mini – produce 5.5 TiB of data daily for some 250 million monthly users

(9)

Background – What's our problem?

We want statistics on.. a lot of things

● Usage by country?

● Usage by version?

● Usage by country and version?

● Pageviews by country and version?

● Unique users by country and version? Per day, week, month, quarter and year?

● Top domains used? Top domains per country?

● How do we make users stay with our products?

● Etc..

(10)

Problem #1: Big Data

Don't try to handle 5.5TiB of daily data using your laptop..

Don't try to store the results in a MySQL database and do JOIN. It doesn't work too well when you're adding 200+ million combinations every day.

“Internet-scale data”

(11)

Problem #2: Scalability

Our services are growing, and so is the list of required features Must be able to cope with growth by adding more machines, not by replacing one machine with another more expensive and powerful one.

(12)

Problem #3: Reliability / Fault tolerance

Statistics are important and must be delivered in time, even if machines in data processing cluster go down

(13)

Opera Statistics Platform

(14)

Opera Statistics Platform

Architectural overview

(15)

The General Idea

How it all works 1. Collect data

2. Produce combinations of variables for fixed timeperiods in Apache Hadoop

3. Store in Apache Cassandra as key/values 4. Access data via Web UI talking to Cassandra

(16)

Click Insert > Header & Footer

Input data

Logs are produced by various clusters

We install nginx on each machine.

Logs are retrieved over https, with client certificate authentication.

Components used: nginx

(17)

Getting the logs

OSP Log Manager, written in-house.

Pro tip: Doing 400 concurrent https connections, per file,

works well when communicating with servers in China :-)

Currently using 2Gbit/s during approx. 1.5h period

Logs are directly uploaded to HDFS

Components: Twisted library, python, zookeeper

(18)

Processing Big Data

Apache Hadoop

(19)

Apache Hadoop

Map/Reduce (M/R) framework

Distributed fault-tolerant filesystem (HDFS)

Scalable, just add more computers

M/R jobs can be written in Java (native), C++ (pipes),

or with any programming language (streaming)

(20)

Hadoop M/R example (in python)

class MRExample(object):

def map(self, key, value):

if key.startswith("i_want_this"):

yield key, value

def reduce(self, key, values):

yield key, sum(values)

Input:

(”i_want_this_1”, 7) (”i_want_this_2”, 8) (”dont_want_this”, 10) (”i_want_this_1”, 7)

Results:

(”i_want_this_1”, 14) (”i_want_this_2”, 8)

(21)

OSP and Hadoop

●

Long chain of M/R jobs

●

Begins with logextract job with directory of logs as input.

●

Continues with jobs that merges timeperiods

●

Then jobs that produce combinations from variables

●

Ends with job that bulkloads permutations to Cassandra

(22)

Current Hadoop Production Cluster

● 88 nodes

● Each with 16 cores, 64GiB RAM, 6TB disk

● In total 450TB of HDFS storage

● On Iceland, where Power and Cooling is environmentally friendly and cheap.

Hardware, you cannot always avoid it

(23)

OSP Scheduler

●

Keeps track of the chain of jobs

●

One job's output serves as input to another job

●

Multiple chains, executed per timeperiod

●

Multiple machines, multiple worker processes per machine

●

Uses Apache Zookeeper for coordination, locks, distributed queue

(24)

OSP Scheduler

Keeps track of our chain of events

● Reads a configuration file (.ini)

● When one job has completed, jobs depending on it will be started

● Tasks written in python

● Either runs on the scheduler node, or starts a Hadoop job

● Developed in-house. No good alternative available (when we started)

● Fault-tolerant and Scalable

Components used: Python, ZooKeeper

(25)

Coordinating Distributed

Systems

Apache ZooKeeper

(26)

Apache ZooKeeper

● Centralized service for

distributed synchronization

● Really simple API, but can be used to build complex

distributed services

● Provides a simple way of not having to worry about doing things like distributed locks right.

Because coordinating distributed systems is a Zoo

(27)

Apache ZooKeeper

● Filesystem-like data structure

● Znodes

● Znodes can have data (up to 1MiB)

● Znodes can have children

● Atomic changes

● Versioning

● Ephemeral nodes

● Watches Overview

(28)

Apache ZooKeeper

Fault-tolerance and performance

● Needs at least 3 servers for redundancy

● One server goes down – service is still up

● With 5 servers, 2 servers may go down and service is still up

● Coordinator is dynamically elected through quorum vote

● Write performance limited by I/O performance on coordinator node

● Read-performance is good, reads are local on each server. All servers keep full copy of database.

● Clients connect to a randomly selected node, writes are forwarded to coordinator node

(29)

How OSP uses ZooKeeper

● Distributed queue

● Multiple producers, multiple consumers

● No polling, uses watches

● Locks

● Either a znode is present, or it's not (atomicity)

● Want to run only one cron job at any time, but on multiple nodes – ephemeral node with known name

● Barriers

● Waiting until all tasks in a chain have completed before some action

● Configuration data

● Dynamic reconfig of service by watching known znode

(30)

Storing Big Data

Apache Cassandra

(31)

Apache Cassandra

● Distributed key/value store, but with columns

● Scalable. Keys distributed among nodes. Just add more hardware.

● Handles large amounts of data

● Tunable consistency

● Configurable replication

● All nodes in cluster are equal.

No master node.

A distributed NoSQL solution

(32)

NoSQL

I was raised with RDBMS and MySQL, this is different

● No relations

● Key/Value

● But with Columns. A Key can have many many columns

● Think Different

● “How will data be accessed by the application?”

● Not “How should I store data in the most efficient way?”

(33)

Cassandra Data Model

Row keys with sparse columns

● Row keys are hashed and data is distributed among servers

● Keyspace replication factor (RF) decides number of servers that hold each key

● RF > X protects against X servers failing

● RF also decides read performance

● Columns are sorted, and can be retrieved in slices

(34)

Cassandra Cluster

Keyspace1, RF=2 Keyspace1, RF=2 Column Family X

Column Family Y

Column Family Y Column Family Z

(35)

Cassandra data model

Sparse columns

Row

key

Home address Work e-mail Home e-mail forsberg The outskirts forsberg@oper

a.com

carrie NY Downtown sexandthecity

@gmail.com [email protected]

ola Lambohov Ola.leifler@liu.

se

(36)

Cassandra data model

No JOIN, use a second Column Family

Row

key

^Username

[email protected] forsberg [email protected] carrie

[email protected] ola

(37)

Cassandra storage

Sorted String Tables

● Writes to Cassandra are to commitlog, then to memory

● Memory is dumped to sstables periodically (or when they get full)

● Very fast, scalable writes!

● Sparse columns are stored efficiently. No null values to store.

● Columns can have TTL, which makes data go away automatically

● Sstables are merged automatically over time

(38)

How OSP use Cassandra

● Results of Hadoop calculations are stored in Cassandra

● i.e. “country=sweden,shoesize=43”: 7 unique users for Opera Mini

● Millions of keys stored every day

● TTL used to make keys go away after our retention period (i.e. daily data is gone after 6 months)

(39)

Choosing the right tool for

the job

(40)

Hadoop vs Cassandra

Hadoop

● Really good Streaming I/O

● Slow as a melting iceberg on random access. So not useful as backend for UI.

● Has M/R component

Cassandra

● Really good response times on random access to any row key.

● Makes it useful as backend for UI.

● No data crunching component

● Hadoop can crunch data from Cassandra efficiently

(41)

Cassandra vs ZooKeeper

Cassandra

● Scalable write performance

● Can keep large blobs

● Structured data, keys and

columns (can have) data types

● No atomicity, but tunable consistency

ZooKeeper

● Write performance limited by I/O of one node

● Small amounts of data (1MiB).

Data must fit in Heap

● Just a blob (bsob?) (we use JSON)

● Atomicity, ephemeral nodes, watches

Two very different Key/Value (kind of) systems

(42)

How do you choose the right component?

● KISS

● Don't be afraid of Open Source.

● But evaluate the community activeness

● Evaluate multiple alternatives

● Choose what seems easiest to work with. If it takes 3 days to set it up for an experiment, it probably isn't right.

(43)

The UI

(44)

The UI

I'm not a UI person, but we have other team members who are

● Written in PHP

● Uses Highcharts, a fine Norwegian component for drawing SVG graphs

● Also uses Memcache for some local caching

(45)

Tying it together

(46)

Tying it all together

● We choose python

● Chances are that if you need to communicate with component X, python already has a module for it

● Really quick to do development in. Get down to business at once!'

● At Opera we use a multitude of Languages

You need some kind of glue to build a system

(47)

End

Questions?

(48)

We are hiring!

http://jobs.opera.com

(49)