CONTROL PROCESSOR CONTROL PROCESSOR CONTROL PROCESSOR

(1)

Real-time Middleware for

Distributed and Embedded Systems

Douglas C. Schmidt

Associate Professor & Computer Engineering Dept.

www.cs.wustl.edu/

schmidt/ University of California, Irvine

Sponsors

NSF, DARPA, Bellcore/Telcordia, BBN, Boeing, CDI/GDIS, Experian, Global MT, Hughes, Kodak, Krones, Lockheed, Lucent, Microsoft, Motorola, Nokia, Nortel, OCI,

OTI, Raytheon, SAIC, Siemens SCR, Siemens MED, Siemens ZT, Sprint, USENIX

(2)

Motivation: the Telecom Software Crisis

www.arl.wustl.edu/arl/

Symptoms

– Network element

hardware

gets smaller, faster, cheaper

– Communication

software

gets larger, slower, more expensive

Culprits

–

Inherent

and

accidental

complexity

Solution Approach

– Manage & control network elements

via

QoS-enabled middleware

(3)

Goal: Apply Embedded Middleware to Network Element Mgmt & Control

MANAGEMENT CLIENT

RIO TAO

NETWORK OPERATIONS CENTER

VSI MASTER

RIO TAO

SIGNALING PROCESSOR

VSI MASTER

RIO TAO

SIGNALING PROCESSOR

VSI SLAVE

CONTROL PROCESSOR

TAO RIO

WUGS WUGS

WUGS

ENDSYSTEM A

UNI CLIENT

TAO

CONNECT REQUEST

ENDSYSTEM A

UNI CLIENT

TAO

ATM

SWITCH

ATM

SWITCH

ATM

SWITCH

VSI SLAVE

TAO RIO

VSI SLAVE

TAO RIO

ADD CONNECTION GET STATS

EVENT

PORT DOWN ADD CONNECTION

GET STATS

ADD CONNECTION

CONNECT REQUEST

INTER-ORB PROTOCOL

Domain Challenges

High-speed (20 Gbps) ATM switches w/embedded controllers

Low-latency and statistical real-time deadlines

COTS infrastructure, standards-based open

systems, and small footprint

(4)

Problem: Low-level Switch Control

& Management ( e.g. , GSMP & VSI)

SIGNALING

CLIENT MASTER CONTROLLER SIGNALING PROCESSOR

T A O

NETWORK ELEMENT SLAVE

CONTROLLER SLAVE CONTROLLER CONNECT COMMIT

SWITCH

CONTROL MESSAGES CONTROL LIBRARY API

Features

Setup & release connections

Add & delete

point-to-multipoint leaves

Manage ATM switch ports

Request

configuration information &

statistics

Drawbacks

Non-portable, tedious, and error-prone programming

APIs

(5)

Proxy Server Configuration

SIGNALING P R O C E S S O R

T A O

SIGNALING P R O C C E S S O R

P R O X Y M A S T E R C O N T R O L L E R

T A O

PROXY CONTROL P R O C E S S O R

SLAVE C O N T R O L L E R

T A O

A T M

S W I T C H

A T M

S W I T C H

A T M

S W I T C H

N E T W O R K E L E M E N T

Control cells

config port

C O N T R O L P R O C E S S O R

N E T W O R K E L E M E N T N E T W O R K E L E M E N T

C O N T R O L P R O C E S S O R

C O N N E C T I O N R E Q U E S T

SIGNALING P R O C E S S O R

T A O

SIGNALING P R O C C E S S O R

G S M P : ADD_B R A N C H

VSI: CONNECT COMMIT

Inter-ORB Protocol establish_connect()

Features

Supports standard CORBA programming API

Can use standard ORB

Transparent to existing GSMP & VSI servers

Scales to distributed configuration

– i.e., one CP can

control multiple

switches

(6)

Embedded ORB Configuration

TAO

NETWORK ELEMENT

ATM

SWITCH MANAGEMENT

AGENT

NETWORK OPERATIONS CENTER

SIGNALING PROCESSOR

MASTER CONTROLLER SIGNALING PROCESSOR

ADD NEW CONNECTION GET PORT

STATUS EVENT

PORT DOWN

TAO TAO

INTER-ORB^PROTOCOL

HOST A ^HOSTB

Features

Leverages middleware flexibility and

standardization

Multiple protocols can be supported

– GSMP & VSI in-line bridging, GIOP/IIOP, etc.

Minimal footprint

(7)

Context: Levels of Abstraction in Internetworking and Middleware

HTTP FTP

FDDI ATM

ETHERNET IP

FIBRE CHANNEL

UDP TCP

TELNET DNS

LYNXOS LINUX

WIN NT

CORBA

SOLARIS

CORBA SERVICES CORBA

APPLICATIONS

VXWORKS MIDDLEWARE ARCH INTERNETWORKING ARCH

(8)

Problem: Lack of QoS-enabled Middleware

LOW

LOW--LEVELLEVEL MIDDLEWARE MIDDLEWARE OPERATING OPERATING SYSTEMS SYSTEMS &&

PROTOCOLS PROTOCOLS HIGHER

HIGHER--LEVELLEVEL MIDDLEWARE MIDDLEWARE

COMMON COMMON SERVICES SERVICES

APPLICATIONS APPLICATIONS

Cons

EVENT Cons

EVENT CHANNEL

CHANNEL ConsCons Cons Cons

Many telecom applications require QoS guarantees

– e.g., call-processing,

network/switch management, wireless systems

Building these applications manually is hard

Existing middleware doesn’t support QoS effectively

– e.g., CORBA, DCOM, DCE, Java

Solutions must be integrated

horizontally & vertically

(9)

Candidate Solution: CORBA

INTERFACE REPOSITORY

IMPLEMENTATION REPOSITORY COMPILERIDL

DII ORB

INTERFACE

ORB

ORB CORECORE ^GIOP^GIOP//ÎIOPÎIOP//ÊSIOPSÊSIOPS

IDL

STUBSIDL

STUBS

operation() operation()^{in args}

out args + return value

CLIENT OBJECT

(^SERVANT)

OBJ REF

STANDARD INTERFACE STANDARD LANGUAGE MAPPING

ORB-SPECIFIC INTERFACE STANDARD PROTOCOL INTERFACE

REPOSITORY

IMPLEMENTATION REPOSITORY

IDL

COMPILER

SKELETONIDL DSI

OBJECT ADAPTER

www.cs.wustl.edu/

schmidt/corba.html

Goals of CORBA

Simplify distribution by automating

– Object location &

activation – Parameter

marshaling – Demultiplexing – Error handling

Provide foundation for higher-level

services

(10)

Caveat: Requirements/Limitations of CORBA for Telecom

NETWORK NETWORK OPERATIONS OPERATIONS

CENTER CENTER

HSM ARCHIVEHSM ARCHIVE SERVER SERVER AGENT

AGENT

INTERACTIVE INTERACTIVE AUDIO AUDIO//VIDEOVIDEO AGENT

AGENT ARCHITECTUREARCHITECTURE

SPC HARDWARESPC HARDWARE EMBEDDED EMBEDDED

TAO TAO MIB MIB AGENT

AGENT

www.cs.wustl.edu/schmidt/RT-ORB.ps.gz

Requirements

Location transparency

Performance transparency

Predictability transparency

Reliability tranparency

Limitations

Lack of QoS specifications

Lack of QoS enforcement

Lack of real-time

programming features

Lack of performance optimizations

(11)

Overview of the Real-time CORBA Specification

OS KERNEL

OS I/O SUBSYSTEM NETWORK ADAPTERS

STANDARD SYNCHRONIZERS END-TO-END PRIORITY

PROPAGATION

ORB CORE

OBJECT ADAPTER CLIENT

GIOP

PROTOCOL PROPERTIES

THREAD POOLS EXPLICIT

BINDING

NETWORK

OS KERNEL

OS I/O SUBSYSTEM NETWORK ADAPTERS

operation()

out args + return value in args

OBJECT REF

OBJECT

(^SERVANT)

STUBS SKELETON

Features

1. End-to-end priority propagation

2. Protocol properties 3. Thread pools

4. Explicit binding 5. Standard

synchronizers

www.cs.wustl.edu/

schmidt/

oorc.ps.gz

(12)

Our Approach: The ACE ORB (TAO)

NETWORK NETWORK ORB

ORB^RUN^RUN--^TIME^TIME

SCHEDULER SCHEDULER

operation() operation()

IDL

STUBSIDL

STUBS

IDL

SKELETONIDL

SKELETON in args

in args

out args + return value out args + return value

CLIENT CLIENT

OS KERNEL OS KERNEL

HIGH

HIGH--^SPEED^SPEED

NETWORK INTERFACE NETWORK INTERFACE

REAL

REAL--TIME ITIME I//OO SUBSYSTEM SUBSYSTEM

OBJECT OBJECT ((^SERVANT^SERVANT))

OS KERNEL OS KERNEL

HIGH

HIGH--^SPEED^SPEED

NETWORK INTERFACE NETWORK INTERFACE

REAL

REAL--TIME ITIME I//OO SUBSYSTEM SUBSYSTEM

ACE

ACE ^COMPONENTS^COMPONENTS

OBJ OBJ REF REF

REAL

REAL--TIMETIME ORBORB CORECORE IOP

IOP _PLUGGABLE_PLUGGABLE

ORB

ORB & & XPORTXPORT PROTOCOLS PROTOCOLS

IOP

PLUGGABLE IOP

PLUGGABLE ORB

ORB & & XPORTXPORT PROTOCOLS PROTOCOLS

REAL REAL--TIMETIME

OBJECT OBJECT ADAPTER ADAPTER

www.cs.wustl.edu/

schmidt/TAO.html

TAO Overview

^!

An open-source, standards-based, real-time,

high-performance CORBA ORB

Runs on

POSIX/UNIX, Win32, & RTOS platforms

– e.g., VxWorks, Chorus, LynxOS

Leverages ACE

(13)

The ADAPTIVE Communication Environment (ACE)

PROCESSES//

THREADS THREADS

DYNAMIC DYNAMIC LINKING LINKING

MEMORY MEMORY MAPPING MAPPING SELECT

SELECT//

IO COMP IO COMP

SYSTEM SYSTEM

V VIPCIPC

STREAM STREAM PIPES PIPES

NAMED NAMED PIPES PIPES

APICSS ^SOCKETS^SOCKETS//

TLI TLI

COMMUNICATION COMMUNICATION

SUBSYSTEM SUBSYSTEM

VIRTUAL MEMORY VIRTUAL MEMORY

SUBSYSTEM SUBSYSTEM GENERAL POSIX AND WIN32 SERVICES

PROCESS

PROCESS//^THREAD^THREAD

SUBSYSTEM SUBSYSTEM

FRAMEWORKS ^ACCEPTOR^ACCEPTOR ^CONNECTOR^CONNECTOR SELF-CONTAINED

DISTRIBUTED SERVICE COMPONENTS

NAME NAME SERVER SERVER TOKEN TOKEN SERVER SERVER LOGGING

LOGGING SERVER SERVER

GATEWAY GATEWAY SERVER SERVER

SOCK SOCK__^SAP^SAP//

TLI

TLI__^SAP^SAP ^FIFO^FIFO^SAP^SAP

LOG LOG MSG MSG SERVICE

SERVICE HANDLER HANDLER

TIME TIME SERVER SERVER

C++

WRAPPER FACADES

SPIPE SPIPE SAP SAP

CORBA CORBA HANDLER HANDLER

SYSV SYSV

WRAPPERS WRAPPERS SHARED

SHARED MALLOC MALLOC

THE ACE ORB THE ACE ORB

((^TAO^TAO))

JAWS ADAPTIVE JAWS ADAPTIVE WEB SERVER WEB SERVER

MIDDLEWARE APPLICATIONS

REACTOR REACTOR//

PROACTOR PROACTOR PROCESS

PROCESS//

THREAD THREAD MANAGERS MANAGERS

STREAMS STREAMS

SERVICE SERVICE CONFIG CONFIG--

URATOR URATOR SYNCH

SYNCH WRAPPERS WRAPPERS

MEM MEM MAP MAP

OS ADAPTATION LAYER

www.cs.wustl.edu/

schmidt/ACE.html

ACE Overview

^!

A concurrent OO networking framework

Available in C++ and Java

Ported to

POSIX, Win32, and RTOSs Related work

^!

x-Kernel

SysV

STREAMS

(14)

ACE and TAO Statistics

Over 35 person-years of effort – ACE

^>

200,000 LOC

– TAO

^>

125,000 LOC

– TAO IDL compiler

^>

100,000 LOC – TAO CORBA Object Services

^>

150,000 LOC

Ported to UNIX, Win32, MVS, and RTOS platforms

Large user community

– www.cs.wustl.edu/

schmidt/ACE- users.html

Currently used by dozens of companies

– Bellcore, Boeing, Ericsson, Kodak, Lockheed, Lucent,

Motorola, Nokia, Nortel, Raytheon, SAIC,

Siemens, etc.

Supported commercially – ACE

^!

www.riverace.com – TAO

^!

www.ociweb.com

(15)

Applying TAO to Avionics Mission Computing

REPLICATION SERVICE

OBJECT REQUEST BROKER 1:^SENSORS

GENERATE DATA

GPS IFF FLIR

3:^PUSH(^EVENTS) 2: SENSOR PROXIES DEMARSHAL DATA

& PASS TO EVENT CHANNEL

3:^PUSH(^EVENTS)

EVENT CHANNEL

HUD Nav

FrameAir WTS

4:^PULL(^DATA)

www.cs.wustl.edu/

schmidt/JSAC- 98.ps.gz

Domain Challenges

Deterministic & statistical real-time deadlines

Periodic & aperiodic processing

COTS and open systems

Reusable components

Support platform upgrades

www.cs.wustl.edu/

schmidt/TAO-

boeing.html

(16)

Applying TAO to Other Performance-Sensitive Applications

'

&

$

%

DIAGNOSTIC STATIONS DIAGNOSTIC STATIONS

ATM MANATM MAN

ATMLAN

MODALITIES

(CT, MR, CR) ^{BLOB STORE}^CENTRAL

CLUSTER STOREBLOB BLOBDX

STORE

'

&

$

%

WIDE AREA NETWORK

SATELLITES

SATELLITES TRACKINGTRACKING STATION STATION PEERS PEERS

STATUS INFO

COMMANDS BULK DATA

TRANSFER

LOCAL AREA NETWORK

GROUND STATION PEERS

GATEWAY

Medical Imaging Satellite Surveillance

(17)

Problem: Optimizing Complex Software

DIAGNOSTIC STATIONS DIAGNOSTIC STATIONS

ATM ATMMAN MAN

ATMLAN

MODALITIES

(^CT,^MR,^CR) ^{BLOB STORE}^CENTRAL

CLUSTER STOREBLOB BLOBDX

STORE

www.cs.wustl.edu/

schmidt/

JSAC-99.ps.gz

Common Problems

^!

Optimizing complex software is hard

Small “mistakes” can be costly Solution Approach (Iterative)

^!

Pinpoint overhead via white-box metrics

– e.g.,

Quantify

and

VMEtro

Apply patterns and framework components

Revalidate via white-box and

black-box metrics

(18)

Solution 1: Patterns and Framework Components

ACCEPTOR CONNECTOR

ABSTRACT FACTORY

SERVANT CLIENT

OS KERNEL OS KERNEL

ACTIVE OBJECT THREAD-^SPECIFIC

STORAGE SERVICE CONFIGURATOR

STRATEGY

REACTOR REACTOR WRAPPER FACADES WRAPPER FACADES

www.cs.wustl.edu/

schmidt/ORB- patterns.ps.gz

Definitions

Pattern

– A solution to a problem in a context

Framework

– A “semi-complete”

application built with components

Components

– Self-contained, “pluggable”

ADTs

(19)

Solution 2: ORB Optimization Principle Patterns

Definition

Optimization principle patterns

document rules for avoiding common design and

implementation problems that can degrade the performance, scalability, and predictability of complex systems

Key Principle Patterns Used in TAO

# Principle Pattern

1 Optimize for the common case 2 Remove gratuitous waste

3 Replace inefficient general-purpose

functions with efficient special-purpose ones 4 Shift computation in time,

e.g., precompute

5 Store redundant state to speed-up

expensive operations

6 Pass hints between layers and components

7 Don’t be tied to reference implementations/models

8 Use efficient/predictable data structures

(20)

ORB Latency and Priority Inversion Experiments

00 11

C₁ C₀

Requests

C_n

0

1 00110101 00

1100110011001101

Client

...

2 ATM Switch

Ultra 2 Ultra 2

I/O SUBSYSTEM

Server

Object Adapter

ORB Core

Servants ^SCHEDULER ^RUNTIME

www.cs.wustl.edu/

schmidt/RT-perf.ps.gz

Method

Vary ORBs, hold OS constant

Solaris real-time threads

High priority client ^C⁰ connects to servant ^S⁰ with matching priorities

Clients ^C1

:::C

n have same lower priority

Clients ^C¹ ^:^:^:^Cⁿ connect to servant ^S¹

Clients invoke two-way CORBA calls that cube a number on the servant and returns result

(21)

ORB Latency and Priority Inversion Results

0 4 8 12 16 20 24 28 32 36 40 44 48 52

1 5 10 15 20 25 30 35 40 45 50

Number of Low Priority Clients

Latency per Two-way Request in Milliseconds

CORBAplus High Priority CORBAplus Low Priority MT-ORBIX High Priority MT-ORBIX Low Priority miniCOOL High Priority miniCOOL Low Priority TAO High Priority TAO Low Priority

Synopsis of Results

TAO’s latency is lowest for large # of clients

TAO avoids priority inversion – i.e., high priority client

always has lowest latency

Primary overhead stems from concurrency and connection architecture

– e.g., synchronization and

context switching

(22)

ORB Jitter Results

1 5 10 15 20 25 30 35 40 45 50

TAO High Priority TAO Low Priority

miniCOOL High Priority miniCOOL Low Priority

MT-ORBIX High Priority MT-ORBIX Low Priority

CORBAplus High Priority CORBAplus Low Priority

0 50 100 150 200 250 300

Jitter in milliseconds

Number of Low Priority Clients

Definition

Jitter

^!

standard deviation from average latency Synopsis of Results

TAO’s jitter is lowest and most consistent

CORBAplus’ jitter

is highest and

most variable

(23)

Problem: Improper ORB Concurrency Models

OBJECT OBJECT ADAPTER ADAPTER

SERVANT

SERVANT DEMUXERDEMUXER

SERVANT SERVANT

SKELETONS SKELETONS^U

ORB

ORB CORECORE

FILTER FILTER FILTER FILTER FILTER FILTER

SERVANT SERVANT

SKELETONS SKELETONS^U

2: enqueue(data) 2: enqueue(data) 3: dequeue,

3: dequeue, filter filter request, request,

&enqueue

4: dispatch 4: dispatch

upcall() upcall()

II//OO SUBSYSTEMSUBSYSTEM

THREAD THREAD FILTER FILTER

1: recv() 1: recv()

CONNECTION THREADS CONNECTION THREADS

Common Problems

High context switching and synchronization overhead

Thread-level and packet-level priority inversions

Lack of application control over concurrency model www.cs.wustl.edu/

schmidt/

CACM-arch.ps.gz

(24)

Problem: ORB Shared Connection Models

SERVER SERVER ORB

ORB CORECORE II//OO SUBSYSTEMSUBSYSTEM

COMMUNICATION

COMMUNICATION LINKLINK

II//OO SUBSYSTEMSUBSYSTEM SERVANTS SERVANTS

ONE ONE TCPTCP CONNECTION CONNECTION

ORB

ORB CORECORE

SEMAPHORES SEMAPHORES

2: select()

2: select() 4: release() 3: read()

BORROWED THREAD BORROWED THREADS

APPLICATION

5: dispatch_return() 1: invoke_twoway()

www.cs.wustl.edu/

schmidt/

RTAS-98.ps.gz

Common Problems

Request-level

priority inversions – Sharing multiple

priorities on a

single connection

Complex connection multiplexing

Synchronization

overhead

(25)

Problem: High Locking Overhead

0

94

199

175

0

231

216

599

0 100 200 300 400 500 600 700

TAO miniCOOL CORBAplus MT ORBIX

ORBs Tested

User Level Lock Operations per Request

client server

Common Problems

Locking overhead affects

latency and jitter significantly

Memory management

commonly involves locking www.cs.wustl.edu/

schmidt/

RTAS-98.ps.gz

(26)

Solution: TAO’s ORB Endsystem Architecture

R R U U N N T T II M M E E S S C C H H E E D D U U L L E E R R

HIGH-SPEED NETWORK INTERFACES

(e.g., APIC, VME)

Z Z E E R R O O C C O O PP Y Y B B U U FF FF E E R R

RT SS

RT I/O I/O

SUBSYSTEM SUBSYSTEM

RT

RT OBJECTOBJECT ADAPTER ADAPTER

O

O(1) (1) REQUESTREQUEST DEMUXERDEMUXER

CLIENTS CLIENTS

STUBS STUBS

SERVANTS SERVANTS

SKELETONS SKELETONS

U

RT

RT ORBORB CORECORE

REACTOR REACTOR

((PP11))

REACTOR REACTOR

((PP22))

REACTOR REACTOR

((PP33))

REACTOR REACTOR

((PP44))

SOCKET

SOCKET QUEUEQUEUE DEMUXERDEMUXER PLUGGABLE

PLUGGABLE PROTOCOLSPROTOCOLS

Solution Approach

^!

Integrate scheduler into ORB endsystem

Co-schedule threads

Leader/followers thread pool Principle Patterns

^!

Pass hints, precompute, optimize common case, remove gratuitous waste, store state, don’t be tied to reference implementations &

models

(27)

Thread Pool Comparison Results

SERVANTS SERVANTS ORB

ORB CORECORE

1: select() 1: select()

5: dispatch upcall() 5: dispatch upcall()

2: read() 2: read() 3: enqueue() 3: enqueue() 4: dequeue()

4: dequeue()

II//^O^O

THREAD THREAD WORKER THREADS WORKER THREADS

REQUEST REQUEST QUEUE QUEUE

Worker Thread Pool

SERVANTS SERVANTS ORB

ORB CORECORE

SEMAPHORE SEMAPHORE

LEADER

LEADER FOLLOWERSFOLLOWERS

1: select()

1: select() 3: release()3: release()

4: dispatch upcall() 4: dispatch upcall()

2: read() 2: read()

Leader/Follower Thread Pool

1 2

3 4

5 6

0 25

50 75

100

0 500 1000 1500 2000 2500 3000

Performance Improvement

Threads

Load

(28)

Problem: Reducing Demultiplexing Latency

OPERATION1 OPERATION2 OPERATIONK

2:^{DEMUX TO}

I/O^HANDLE

...

POA1

...

OS I/O SUBSYSTEM NETWORK ADAPTERS SERVANT 1

5:^{DEMUX TO}

SKELETON

6:^DISPATCH

OPERATION

1: DEMUX THRU PROTOCOL STACK

4:^{DEMUX TO}

SERVANT

ORB CORE ORB CORE ROOT POA 3:^{DEMUX TO}

OBJECT ADAPTER

POA2 POA_N

SERVANT N

...

SKEL 1 ^SKEL2 ^{SKEL N}

SERVANT 2

KERNELOS LAYER LAYERORB SERVANT

LAYER

Design Challenges

Minimize demuxing layers

Provide

^O(1)

operation

demuxing through all layers

Avoid priority inversions

Remain CORBA-compliant www.cs.wustl.edu/

schmidt/

POA.ps.gz

(29)

Solution: TAO’s Request Demultiplexing Optimizations

... ...

...

IDL IDL SKEL SKEL 22

OBJECT ADAPTER

SERVANT SERVANT 500500 (A)

(A) LAYERED DEMUXING LAYERED DEMUXING,,

LINEAR SEARCH LINEAR SEARCH

SERVANT

SERVANT 11 ^SERVANT^SERVANT22

IDL IDL SKEL SKEL 11

OPERATIONOPERATION100100

...

IDL IDL SKEL N

SKEL N

... ... ...

SERVANTSERVANT500::500::OPERATIONOPERATION100100

OBJECT ADAPTER

...

IDL SKEL IDL SKEL 22

OBJECT ADAPTER

(C) LAYERED DEMUXING,

PERFECT HASHING

SERVANT

SERVANT 11 ^SERVANT^SERVANT22

...

IDL SKEL NIDL SKEL N

search(object key)

hash(operation)

IDL SKEL IDL SKEL 11

(D)^DE-LAYERED ACTIVE DEMUXING

hash(object key) search(operation)

index(object key/operation) (B) LAYERED DEMUXING,

DYNAMIC HASHING

SERVANT SERVANT 500500

Demuxing

www.cs.wustl.edu/

schmidt/

f

ieee tc-97,COOTS-99

^g

.ps.gz

Perfect hashing

www.cs.wustl.edu/

schmidt/

gperf.ps.gz

(30)

POA Demultiplexing Results

1 5

10 15

20

25 0

1 2 3 4 5

Latency(us)

POA Depth

Transient Persistent

Synopsis of Results

Active demux is

efficient & predictable for both transient and persistent object

references.

Principle Patterns

Precompute, pass hints, use

special-purpose &

predictable data

structures

(31)

Servant Demultiplexing Results

1 100

200 300

400 500

0 50 100 150 200

Latency (us)

No. of Objects

Active Demux Dynamic Demux Linear Demux

Synopsis of Results

Linear demux is costly

Active demux is most efficient &

predictable

Principle Patterns

Precompute, pass hints, use

special-purpose &

predictable data

structures

(32)

Operation Demultiplexing Results

1 10

20 30

40 50

0 5 10 15 20 25

Latency (us)

No. of Methods Perfect Hashing

Binary Search Dynamic Hashing Linear Search

Synopsis of Results

^!

Perfect Hashing

– Highly predictable – Low-latency

Others strategies slower

Principle Patterns

^!

Precompute, use

predictable data

structures, remove

gratuitous waste

(33)

TAO Request Demultiplexing Summary

...

POAPOA11

SERVANT

SERVANT 11

ORB CORE ORB CORE ROOT POA ACTIVE

DEMUXING

POA2 POA_N

SERVANT N

...

SKEL 1 ^SKEL2 ^{SKEL N}

SERVANT 2

ACTIVE DEMUXING

PERFECT HASHING

Demultiplexing Stage Absolute Time (

s)

1. Request parsing 2

2. POA demux 2

3. Servant demux 3

4. Operation demux 2

5. Parameter demarshaling operation dependent

6. User upcall servant dependent

7. Results marshaling operation dependent

(34)

Real-time ORB/OS Performance Experiments

[P ]

0 [P ]

1

SCHEDULER

0

RUNTIME

[P ]

I/O SUBSYSTEM

Server

0 ₁ Sn

Cn

00

1100110011001101

Pentium II

S S

C0 C₁

...

...

Object Adapter Servants

ORB Core

Client

...

[P]

Requests Priority

...

[P ] [P ]

1

[P ] n

n

www.cs.wustl.edu/

schmidt/RT-OS.ps.gz

Method

Vary OS, hold ORBs constant

Single-processor Intel Pentium II 450 Mhz, 256 Mbytes of RAM

Client and servant run on the same machine

Client ^Cⁱ connects to servant ^Sⁱ with priority ^Pⁱ

– ⁱ ranges from ¹^:^:^:⁵⁰

Clients invoke two-way CORBA calls that cube a number on the servant and returns result

(35)

Real-time ORB/OS Performance Results

'

&

$

%

0 200 400 600 800 1000 1200 1400 1600 1800 2000

0 1 2 5 10 15 20 25 30 35 40 45 50 Low Priority Clients

Two-way Request Latency, usec

Linux LynxOS NT Solaris VxWorks

'

&

$

%

0 2000 4000 6000 8000 10000 12000 14000 16000 18000 20000

0 1 2 5 10 15 20 25 30 35 40 45 50 Low Priority Clients

Two-way Request Latency, usec

Linux LynxOS NT Solaris VxWorks

High-priority Client Latency Low-priority Clients Latency

(36)

Real-time ORB/OS Jitter Results

'

&

$

%

0 2 10 20 30 40 50

0 500 1000 1500 2000 2500 3000 3500

Two-way Jitter, usec

Low Priority Clients

LynxOS NT VxWorks Linux Solaris

'

&

$

%

0 2 10 20 30 40 50

0 5000 10000 15000 20000 25000 30000 35000 40000

Two-way Jitter, usec

Low Priority Clients

LynxOS Linux Solaris NT VxWorks

High-priority Client Jitter Low-priority Clients Jitter

(37)

Problem: Hard-coded ORB Messaging and Transport Protocols

BOARD BOARD 11

VME VME 1553

1553

BOARD BOARD 22

GIOP/IIOP are not sufficient, e.g.:

– GIOP message footprint may be too large

– TCP lacks necessary QoS

– Legacy commitments to existing protocols

Many ORBs do not support

“pluggable protocols”

– This makes ORBs inflexible and

inefficient

(38)

One Solution: Hacking GIOP

GIOP requests include fields that aren’t needed in homogeneous embedded applications

–

e.g., GIOP magic #, GIOP version, byte order,

request principal, etc.

These fields can be omitted without any changes to the standard CORBA programming model

TAO’s

-ORBgioplite

option save 15 bytes per-request, yielding these calls-per-second:

Marshaling-enabled Marshaling-disabled

min max avg min max avg

GIOP 2,878 2,937 2,906 2,912 2,976 2,949 GIOPlite 2,883 2,978 2,943 2,911 3,003 2,967

The result is a measurable improvement in throughput/latency

– However, it’s so small (2%) that hacking GIOP

is of minimal gain except for low-bandwidth

links

(39)

Better Solution: TAO’s

Pluggable Protocols Framework

CLIENT

STUBS SKELETONS

TCP

MULTICAST

IOP

VME UDP

ORB MESSAGING COMPONENT

ORB TRANSPORT ADAPTER COMPONENT

ESIOP

REAL-TIME

IOP

EMBEDDED

IOP

RELIABLE,

BYTE-STREAM

ATM TCP

MEMORY MANAGEMENT CONCURRENCY

MODEL

OTHER

ORB

CORE SERVICES

COMMUNICATION INFRASTRUCTURE

HIGH SPEED NETWORK INTERFACE REAL-TIME I/O SUBSYSTEM

ORB MESSAGE FACTORY

ORB TRANSPORT ADAPTER FACTORY

OBJECT ADAPTER

GIOP GIOPLITE

ADAPTIVE Communication Environment (ACE)

OBJECT (SERVANT)

operation (args)

IN ARGS

OUT ARGS & RETURN VALUE

POLICY CONTROL

Features

Pluggable

ORB messaging and transport protocols

Highly efficient and predictable behavior

Principle Patterns

Replace

general-purpose functions

(protocols) with

special-purpose

ones

(40)

CORBA Protocol Interoperability Architecture

ORB^MESSAGING

COMPONENT

ORB TRANSPORT

ADAPTER COMPONENT

TRANSPORT LAYER

NETWORK LAYER

GIOP

IIOP

TCP

IP

VME DRIVER

AAL

5

ATM GIOPLITE

VME

-

IOP

ESIOP

ATM

-

IOP

RELIABLE SEQUENCED

PROTOCOL CONFIGURATIONS STANDARD CORBA PROGRAMMING API

Features

^!

Presentation layer – e.g., CDR

Message formats – e.g., GIOP

Transport assumptions – e.g., TCP

Object addressing – e.g., IIOP IOR

www.cs.wustl.edu/

schmidt/pluggable protocols.ps.gz

(41)

Embedded System Benchmark Configuration

MARSHAL MARSHAL PARAMS PARAMS

OBJECT

OBJECT ((^SERVANT^SERVANT))

OBJECT ADAPTER OBJECT ADAPTER

ACTIVE ACTIVE OBJECT OBJECT MAPMAP

IDL

SKELETONIDL

SKELETON

VME BUS VME BUS

OS KERNEL OS KERNEL

VME VME DRIVER DRIVER

OS KERNEL OS KERNEL

VME VME DRIVER DRIVER

IDL

STUBSIDL

STUBS

ORB MESSAGING ORB MESSAGING ORB TRANSPORT ORB TRANSPORT

CLIENT CLIENT

INCOMINGINCOMING

CDR CDR DATA COPY DATA COPY

DATA COPY DATA COPY

DMA

DMA^COPY^COPY

ORB MESSAGING ORB MESSAGING ORB TRANSPORT ORB TRANSPORT CDR

CDR

DATA COPY DATA COPY DEMARSHAL DEMARSHAL

PARAMS PARAMS

DATA COPY DATA COPY

OUTGOINGOUTGOING

ORB COREORB CORE

obj->op (params)

VxWorks running on 200 Mhz PowerPC over a 320 Mbps VME & 10

Mbps Ethernet

(42)

Ethernet & VME Two-way Latency Results

0 1000 2000 3000 4000 5000 6000 7000 8000 9000

void short octet long struct short seq <octet>

long seq <octet>

short seq <long>

long seq <long>

Data Type

Latency (usec)

VME/GIOPlite avg.

VME/GIOP avg.

Ethernet/GIOPlite avg.