Internet Infrastructure

(1)

Internet Infrastructure

(2)

HA – DR - BC

• High Availability

– Resilience of your infrastructure

– Uptime during agreed windows of 9x,xx%

• Disaster Recovery

– Recovery capability

– Key objectives to meet:

• Recovery Point Objective (RPO): how much loss is acceptable? • Recovery Time Objective (RTO): how long may it take to recover?

• Business Continuity

– Business continuity plan describes how to continue business despite major hurdles (office buildings unavailable, computer rooms inaccessible, power grid outage …)

(3)

HA – LB – DR – back-up - BC

– User point of view

– The IT functions are highly available to their users

• Load balancing

– IT point of view

– Efficient use of available infrastructure

• Disaster recovery

– Returning back to acceptable situation after a major incident

• Back-up

– Safe copies of data to recover from data loss

• Business continuity

– Business point of view

(4)

Disaster recovery – business continuity

Disaster Recovery • Respond to disasters – System down – Disk crash • Measures:

– Recovery point objective

• the maximum time period in which data might be lost

• “loose 1 hour of data”

– Recovery time objective

• The amount of time the business can be without the service, without incurring significant risks or significant losses • “system unavailable for 1 hour”

• Options

– Back-up/restore

Business continuity • Major outage

– IT site unreachable, destroyed

• Survival mode: minimal functions to survive, able to continue business • Provide critical operational support • Options:

– Physically separated sites – Rented locations

– Home working – Types:

• Hot standby: up and running in no time • Cold standby: capability

(5)

HA – LB – DR - back-up - BC

– User point of view

– The IT functions are highly available to their users

• Load balancing

– Efficient use of available infrastructure

• Disaster recovery

– Returning back to acceptable situation after a major incident

• Back-up

– Defines recovery points

– Supports recovery of lost data (files – disks – major storage facilities)

• Business continuity

– Business point of view

– Ensuring business can continue because critical functions are operational – Switch to manual processes, postpone processing …

(6)

Topic:

(7)

High Availability

• Key property: availability

– “high” is relative

– Important to determine correctly

• Cost of unavailability

• Business continuity: how long can one accept unavailability?

• Key concept: single point of failure: SPoF • SPoF examples

– Network (one ISP, one network card, one switchboard, …) – Power (one switch, one provider, …)

– Physical access (one door, one gate, one road …) – People (only one person knows how to …)

• Principle:

– “No single point of failure”

(8)

HA: active - passive

Passive Active Hartbeat link Active Active Capacity: 100% Capacity: 100% Capacity: 70% Capacity: 70% 100% overcapacity 100% OK after switch 40% overcapacity 70% OK after switch Load balancing

(9)

(10)

(11)

Back-up

• Create copy of data • Full – incremental

– Need support for incremental operations

• Son - father – grandfather principle

– make and keep back-up daily (RPO one day) – Keep weekly (RPO one week)

– Keep monthly (RPO one month)

• Versioned file system

– Keeps last “n” versions – Avoids intra-day losses

(12)

Back-up - archiving

Back-up

• Use: protection against data modification – Delete – Update • Redundant copy • Full – incremental • Mostly On-line • Use: restore archiving • Use:

– Reduce on-line storage needs – Record keeping (requirement)

• Master data • Incremental • Mostly off-line • Use: search

(13)

Topic:

(14)

Load balancing

• Solve resource/capacity problem • One system cannot cope with load

– Put bigger system in place – Put a cluster in place

(15)

Load balancing

LB Server 1 Server 2 server3 Server 4 client Virtual server

(16)

Load balancer

• Hides the fact that there are multiple servers • Presents itself as THE server (virtual server) • Determines “best” system to handle

– “Best” system to handle requests based on request and system information

• Expected load

• Current load of systems

• But: may not look at either, and count on statistical balancing (round robin)

(17)

Load balancer: dealing with

stickyness

• Servers may need to get multipel requests to not brake the transparency, or just plain work • Issues:

– Sessions, state

• Example: IP packet of one TCP session must go to same serverin a standard implmentation

• http sessions must be handled by the same server for vanilla web servers

(18)

Sticky sessions

First req

Next req

First req

(19)

State in client – stateless operation

First req

(20)

Client state

• Need to communicate all of the state each time

– Overhead

– size limitations

– Multiple requests to different servers => independent state management possible (possible conflicts)

• Some state data is critical for server security

– Logged-in user, authorizations

• State protection:

– Protect confidentiality, integrity

(21)

Load balancing

• Servers have resource limitations • Objectives:

– Scalability

• Servers have resource limitations, a single server has maximal load points

• OS level (multiprocessor, clusters): expensive • Add or remove additional servers transparently

– Availability

• Replace defective components: only subset of users impacted • But not High-Availability (HA): other measures required

• Secondary

– Transparency

(22)

Load Balancing solutions

• Effective load balancing:

– Measure availability

• Health monitoring

• System availability: ping

• Service availability: service ping

– Measure “load”

• Load = resource usage: CPU, disk, ...

• Transparency

– Server side state

• Managed on a server instance:

– requires persistence

– Each packet /message of connection to same server

– Problem with higher level sessions (higher than LB mechanism)

• State distribution

– Central state store (new problem point) – Cost of state migration

(23)

General principle

Client Server Client Virtual Server Real Server Client1 Virtual Server Real Server Real Server C-S C-S VS-RS Map: C-S:VS-RS Client2 C1-S C2-S C1-S:VS-Rsa C2-S:VS-RSb VS-RSb VS-RSa

(24)

Load balancing systems

• Four main hooks for network load balancing:

– ARP

• Different MACs for single IP

– NAT

• Different mappings for same IP

– DNS round robin

• Different IPs for same DNS name

– Proxy

• Different back-ends for same proxy request

• Many vendors of solution: Alteon, F5, Cisco, Resonate, Radware, ...

(25)

ARP

• IP address: map one IP to multiple MAC addresses

• Keep track of sessions: which MAC for which TCP session

• Note: optimization

– Short requests, long replies: only request via load balancer, replies: direct

(26)

ARP: roundtrip

LB 9.9.9.9 -> 34.1.1.1 M_Rout->M_LB-L M_LB-L 9.9.9.9 -> 34.1.1.1 M_LB-R->M_sysi 34.1.1.1 M_sysi 34.1.1.1 9.9.9.9 <- 34.1.1.1 M_rout <- M_sysi Cl ie nt Rout er _M LB-R Sy s i

(27)

ARP: routing

LB -> 34.1.1.1 34.1.1.1 Cl ie nt Rout er S y s i 34.1.1.254 34.1.1.1 Arp: 34.1.1.1 – MAC LB “noarp” 34.1.1.100 34.1.1.3

(28)

NAT based load balancing

• Virtual IP incoming

• Translate to multiple internal IP addresses • Keep session state

(29)

NAT: roundtrip

LB ->34.1.1.1:80 Cl ie nt Rout er Sy s i 9.1.1.1:x-> 9.1.1.1:x-Sys_i 9.1.1.1:x-> ->10.1.1.1:80

•Set up client connection •Decide back-end system •Translate all packets

(30)

DNS round robin

• Do not allow caching of DNS – IP mapping • Hand out IPs in round-robin mode

• Statistical loadbalancing

(31)

Proxy

• Connections forwarded to different IPs • HTTP reverse proxies

– Host: a = server a/ Host: b = server b – Map:

• /images: http://a/images • /doc: http://b/doc