Real-time Middleware for
Distributed and Embedded Systems
Douglas C. Schmidt
Associate Professor & Computer Engineering Dept.
www.cs.wustl.edu/
schmidt/ University of California, Irvine
Sponsors
NSF, DARPA, Bellcore/Telcordia, BBN, Boeing, CDI/GDIS, Experian, Global MT, Hughes, Kodak, Krones, Lockheed, Lucent, Microsoft, Motorola, Nokia, Nortel, OCI,
OTI, Raytheon, SAIC, Siemens SCR, Siemens MED, Siemens ZT, Sprint, USENIX
Motivation: the Telecom Software Crisis
www.arl.wustl.edu/arl/
Symptoms
– Network element
hardwaregets smaller, faster, cheaper
– Communication
softwaregets larger, slower, more expensive
Culprits
–
Inherentand
accidentalcomplexity
Solution Approach
– Manage & control network elements
via
QoS-enabled middlewareGoal: Apply Embedded Middleware to Network Element Mgmt & Control
MANAGEMENT CLIENT
RIO TAO
NETWORK OPERATIONS CENTER
VSI MASTER
RIO TAO
SIGNALING PROCESSOR
VSI MASTER
RIO TAO
SIGNALING PROCESSOR
VSI SLAVE
CONTROL PROCESSOR
TAO RIO
WUGS WUGS
WUGS
ENDSYSTEM A
UNI CLIENT
TAO
CONNECT REQUEST
ENDSYSTEM A
UNI CLIENT
TAO
ATM
SWITCH
ATM
SWITCH
ATM
SWITCH
VSI SLAVE
CONTROL PROCESSOR
TAO RIO
VSI SLAVE
CONTROL PROCESSOR
TAO RIO
ADD CONNECTION GET STATS
EVENT
PORT DOWN ADD CONNECTION
GET STATS
ADD CONNECTION
CONNECT REQUEST
INTER-ORB PROTOCOL
Domain Challenges
High-speed (20 Gbps) ATM switches w/embedded controllers
Low-latency and statistical real-time deadlines
COTS infrastructure, standards-based open
systems, and small footprint
Problem: Low-level Switch Control
& Management ( e.g. , GSMP & VSI)
SIGNALING
CLIENT MASTER CONTROLLER SIGNALING PROCESSOR
T A O
NETWORK ELEMENT SLAVE
CONTROLLER SLAVE CONTROLLER CONNECT COMMIT
SWITCH
CONTROL MESSAGES CONTROL LIBRARY API
Features
Setup & release connections
Add & delete
point-to-multipoint leaves
Manage ATM switch ports
Request
configuration information &
statistics
Drawbacks
Non-portable, tedious, and error-prone programming
APIs
Proxy Server Configuration
SIGNALING P R O C E S S O R
T A O
SIGNALING P R O C C E S S O R
P R O X Y M A S T E R C O N T R O L L E R
T A O
PROXY CONTROL P R O C E S S O R
SLAVE C O N T R O L L E R
T A O
A T M
S W I T C H
A T M
S W I T C H
A T M
S W I T C H
N E T W O R K E L E M E N T
Control cells
SLAVE C O N T R O L L E R
config port
C O N T R O L P R O C E S S O R
N E T W O R K E L E M E N T N E T W O R K E L E M E N T
C O N T R O L P R O C E S S O R
C O N N E C T I O N R E Q U E S T
SIGNALING P R O C E S S O R
T A O
SIGNALING P R O C C E S S O R
G S M P : ADD_B R A N C H
VSI: CONNECT COMMIT
Inter-ORB Protocol establish_connect()
Features
Supports standard CORBA programming API
Can use standard ORB
Transparent to existing GSMP & VSI servers
Scales to distributed configuration
– i.e., one CP can
control multiple
switches
Embedded ORB Configuration
SLAVE C O N T R O L L E R
TAO
CONTROL PROCESSOR
NETWORK ELEMENT
ATM
SWITCH MANAGEMENT
AGENT
NETWORK OPERATIONS CENTER
SIGNALING PROCESSOR
MASTER CONTROLLER SIGNALING PROCESSOR
ADD NEW CONNECTION GET PORT
STATUS EVENT
PORT DOWN
TAO TAO
INTER-ORB PROTOCOL
HOST A HOST B
Features
Leverages middleware flexibility and
standardization
Multiple protocols can be supported
– GSMP & VSI in-line bridging, GIOP/IIOP, etc.
Minimal footprint
Context: Levels of Abstraction in Internetworking and Middleware
HTTP FTP
FDDI ATM
ETHERNET IP
FIBRE CHANNEL
UDP TCP
TELNET DNS
LYNXOS LINUX
WIN NT
CORBA
SOLARIS
CORBA SERVICES CORBA
APPLICATIONS
VXWORKS MIDDLEWARE ARCH INTERNETWORKING ARCH
Problem: Lack of QoS-enabled Middleware
LOW
LOW--LEVELLEVEL MIDDLEWARE MIDDLEWARE OPERATING OPERATING SYSTEMS SYSTEMS &&
PROTOCOLS PROTOCOLS HIGHER
HIGHER--LEVELLEVEL MIDDLEWARE MIDDLEWARE
COMMON COMMON SERVICES SERVICES
APPLICATIONS APPLICATIONS
Cons
EVENT Cons
EVENT CHANNEL
CHANNEL ConsCons Cons Cons
Many telecom applications require QoS guarantees
– e.g., call-processing,
network/switch management, wireless systems
Building these applications manually is hard
Existing middleware doesn’t support QoS effectively
– e.g., CORBA, DCOM, DCE, Java
Solutions must be integrated
horizontally & vertically
Candidate Solution: CORBA
INTERFACE REPOSITORY
IMPLEMENTATION REPOSITORY COMPILERIDL
DII ORB
INTERFACE
ORB
ORB CORECORE GIOPGIOP//IIOPIIOP//ESIOPSESIOPS
IDL
STUBSIDL
STUBS
operation() operation()in args
out args + return value
CLIENT OBJECT
(SERVANT)
OBJ REF
STANDARD INTERFACE STANDARD LANGUAGE MAPPING
ORB-SPECIFIC INTERFACE STANDARD PROTOCOL INTERFACE
REPOSITORY
IMPLEMENTATION REPOSITORY
IDL
COMPILER
SKELETONIDL DSI
OBJECT ADAPTER
www.cs.wustl.edu/
schmidt/corba.html
Goals of CORBA
Simplify distribution by automating
– Object location &
activation – Parameter
marshaling – Demultiplexing – Error handling
Provide foundation for higher-level
services
Caveat: Requirements/Limitations of CORBA for Telecom
NETWORK NETWORK OPERATIONS OPERATIONS
CENTER CENTER
HSM ARCHIVEHSM ARCHIVE SERVER SERVER AGENT
AGENT
INTERACTIVE INTERACTIVE AUDIO AUDIO//VIDEOVIDEO AGENT
AGENT ARCHITECTUREARCHITECTURE
SPC HARDWARESPC HARDWARE EMBEDDED EMBEDDED
TAO TAO MIB MIB AGENT
AGENT
www.cs.wustl.edu/schmidt/RT-ORB.ps.gz
Requirements
Location transparency
Performance transparency
Predictability transparency
Reliability tranparency
Limitations
Lack of QoS specifications
Lack of QoS enforcement
Lack of real-time
programming features
Lack of performance optimizations
Overview of the Real-time CORBA Specification
OS KERNEL
OS I/O SUBSYSTEM NETWORK ADAPTERS
STANDARD SYNCHRONIZERS END-TO-END PRIORITY
PROPAGATION
ORB CORE
OBJECT ADAPTER CLIENT
GIOP
PROTOCOL PROPERTIES
THREAD POOLS EXPLICIT
BINDING
NETWORK
OS KERNEL
OS I/O SUBSYSTEM NETWORK ADAPTERS
operation()
out args + return value in args
OBJECT REF
OBJECT
(SERVANT)
STUBS SKELETON
Features
1. End-to-end priority propagation
2. Protocol properties 3. Thread pools
4. Explicit binding 5. Standard
synchronizers
www.cs.wustl.edu/
schmidt/
oorc.ps.gz
Our Approach: The ACE ORB (TAO)
NETWORK NETWORK ORB
ORB RUN RUN--TIMETIME
SCHEDULER SCHEDULER
operation() operation()
IDL
STUBSIDL
STUBS
IDL
SKELETONIDL
SKELETON in args
in args
out args + return value out args + return value
CLIENT CLIENT
OS KERNEL OS KERNEL
HIGH
HIGH--SPEEDSPEED
NETWORK INTERFACE NETWORK INTERFACE
REAL
REAL--TIME ITIME I//OO SUBSYSTEM SUBSYSTEM
OBJECT OBJECT ((SERVANTSERVANT))
OS KERNEL OS KERNEL
HIGH
HIGH--SPEEDSPEED
NETWORK INTERFACE NETWORK INTERFACE
REAL
REAL--TIME ITIME I//OO SUBSYSTEM SUBSYSTEM
ACE
ACE COMPONENTSCOMPONENTS
OBJ OBJ REF REF
REAL
REAL--TIMETIME ORBORB CORECORE IOP
IOP PLUGGABLEPLUGGABLE
ORB
ORB & & XPORTXPORT PROTOCOLS PROTOCOLS
IOP
PLUGGABLE IOP
PLUGGABLE ORB
ORB & & XPORTXPORT PROTOCOLS PROTOCOLS
REAL REAL--TIMETIME
OBJECT OBJECT ADAPTER ADAPTER
www.cs.wustl.edu/
schmidt/TAO.html
TAO Overview
!
An open-source, standards-based, real-time,
high-performance CORBA ORB
Runs on
POSIX/UNIX, Win32, & RTOS platforms
– e.g., VxWorks, Chorus, LynxOS
Leverages ACE
The ADAPTIVE Communication Environment (ACE)
PROCESSES//
THREADS THREADS
DYNAMIC DYNAMIC LINKING LINKING
MEMORY MEMORY MAPPING MAPPING SELECT
SELECT//
IO COMP IO COMP
SYSTEM SYSTEM
V V IPCIPC
STREAM STREAM PIPES PIPES
NAMED NAMED PIPES PIPES
APICSS SOCKETSSOCKETS//
TLI TLI
COMMUNICATION COMMUNICATION
SUBSYSTEM SUBSYSTEM
VIRTUAL MEMORY VIRTUAL MEMORY
SUBSYSTEM SUBSYSTEM GENERAL POSIX AND WIN32 SERVICES
PROCESS
PROCESS//THREADTHREAD
SUBSYSTEM SUBSYSTEM
FRAMEWORKS ACCEPTORACCEPTOR CONNECTORCONNECTOR SELF-CONTAINED
DISTRIBUTED SERVICE COMPONENTS
NAME NAME SERVER SERVER TOKEN TOKEN SERVER SERVER LOGGING
LOGGING SERVER SERVER
GATEWAY GATEWAY SERVER SERVER
SOCK SOCK__SAPSAP//
TLI
TLI__SAPSAP FIFOFIFOSAPSAP
LOG LOG MSG MSG SERVICE
SERVICE HANDLER HANDLER
TIME TIME SERVER SERVER
C++
WRAPPER FACADES
SPIPE SPIPE SAP SAP
CORBA CORBA HANDLER HANDLER
SYSV SYSV
WRAPPERS WRAPPERS SHARED
SHARED MALLOC MALLOC
THE ACE ORB THE ACE ORB
((TAOTAO))
JAWS ADAPTIVE JAWS ADAPTIVE WEB SERVER WEB SERVER
MIDDLEWARE APPLICATIONS
REACTOR REACTOR//
PROACTOR PROACTOR PROCESS
PROCESS//
THREAD THREAD MANAGERS MANAGERS
STREAMS STREAMS
SERVICE SERVICE CONFIG CONFIG--
URATOR URATOR SYNCH
SYNCH WRAPPERS WRAPPERS
MEM MEM MAP MAP
OS ADAPTATION LAYER
www.cs.wustl.edu/
schmidt/ACE.html
ACE Overview
!
A concurrent OO networking framework
Available in C++ and Java
Ported to
POSIX, Win32, and RTOSs Related work
!
x-Kernel
SysV
STREAMS
ACE and TAO Statistics
Over 35 person-years of effort – ACE
>200,000 LOC
– TAO
>125,000 LOC
– TAO IDL compiler
>100,000 LOC – TAO CORBA Object Services
>150,000 LOC
Ported to UNIX, Win32, MVS, and RTOS platforms
Large user community
– www.cs.wustl.edu/
schmidt/ACE- users.html
Currently used by dozens of companies
– Bellcore, Boeing, Ericsson, Kodak, Lockheed, Lucent,
Motorola, Nokia, Nortel, Raytheon, SAIC,
Siemens, etc.
Supported commercially – ACE
!www.riverace.com – TAO
!www.ociweb.com
Applying TAO to Avionics Mission Computing
REPLICATION SERVICE
OBJECT REQUEST BROKER 1: SENSORS
GENERATE DATA
GPS IFF FLIR
3:PUSH (EVENTS) 2: SENSOR PROXIES DEMARSHAL DATA
& PASS TO EVENT CHANNEL
3:PUSH (EVENTS)
EVENT CHANNEL
HUD Nav
FrameAir WTS
4: PULL(DATA)
www.cs.wustl.edu/
schmidt/JSAC- 98.ps.gz
Domain Challenges
Deterministic & statistical real-time deadlines
Periodic & aperiodic processing
COTS and open systems
Reusable components
Support platform upgrades
www.cs.wustl.edu/
schmidt/TAO-
boeing.html
Applying TAO to Other Performance-Sensitive Applications
'
&
$
%
DIAGNOSTIC STATIONS DIAGNOSTIC STATIONS
ATM MANATM MAN
ATMLAN
ATMLAN
MODALITIES
(CT, MR, CR) BLOB STORE CENTRAL
CLUSTER STOREBLOB BLOBDX
STORE
'
&
$
%
WIDE AREA NETWORK
SATELLITES
SATELLITES TRACKINGTRACKING STATION STATION PEERS PEERS
STATUS INFO
COMMANDS BULK DATA
TRANSFER
LOCAL AREA NETWORK
GROUND STATION PEERS
GATEWAY
Medical Imaging Satellite Surveillance
Problem: Optimizing Complex Software
DIAGNOSTIC STATIONS DIAGNOSTIC STATIONS
ATM ATMMAN MAN
ATMLAN
ATMLAN
MODALITIES
(CT, MR, CR) BLOB STORE CENTRAL
CLUSTER STOREBLOB BLOBDX
STORE
www.cs.wustl.edu/
schmidt/
JSAC-99.ps.gz
Common Problems
!
Optimizing complex software is hard
Small “mistakes” can be costly Solution Approach (Iterative)
!
Pinpoint overhead via white-box metrics
– e.g.,
Quantifyand
VMEtro
Apply patterns and framework components
Revalidate via white-box and
black-box metrics
Solution 1: Patterns and Framework Components
ACCEPTOR CONNECTOR
ABSTRACT FACTORY
SERVANT CLIENT
OS KERNEL OS KERNEL
ACTIVE OBJECT THREAD-SPECIFIC
STORAGE SERVICE CONFIGURATOR
STRATEGY
REACTOR REACTOR WRAPPER FACADES WRAPPER FACADES
www.cs.wustl.edu/
schmidt/ORB- patterns.ps.gz
Definitions
Pattern
– A solution to a problem in a context
Framework
– A “semi-complete”
application built with components
Components
– Self-contained, “pluggable”
ADTs
Solution 2: ORB Optimization Principle Patterns
Definition
Optimization principle patterns
document rules for avoiding common design and
implementation problems that can degrade the performance, scalability, and predictability of complex systems
Key Principle Patterns Used in TAO
# Principle Pattern
1 Optimize for the common case 2 Remove gratuitous waste
3 Replace inefficient general-purpose
functions with efficient special-purpose ones 4 Shift computation in time,
e.g., precompute5 Store redundant state to speed-up
expensive operations
6 Pass hints between layers and components
7 Don’t be tied to reference implementations/models
8 Use efficient/predictable data structures
ORB Latency and Priority Inversion Experiments
00 11
C1 C0
Requests
Cn
0
1 00110101 00
1100110011001101
Client
...
...
2 ATM Switch
Ultra 2 Ultra 2
I/O SUBSYSTEM
Server
Object Adapter
ORB Core
Servants SCHEDULER RUNTIME
www.cs.wustl.edu/
schmidt/RT-perf.ps.gz
Method
Vary ORBs, hold OS constant
Solaris real-time threads
High priority client C0 connects to servant S0 with matching priorities
Clients C1
:::C
n have same lower priority
Clients C1 :::Cn connect to servant S1
Clients invoke two-way CORBA calls that cube a number on the servant and returns result
ORB Latency and Priority Inversion Results
0 4 8 12 16 20 24 28 32 36 40 44 48 52
1 5 10 15 20 25 30 35 40 45 50
Number of Low Priority Clients
Latency per Two-way Request in Milliseconds
CORBAplus High Priority CORBAplus Low Priority MT-ORBIX High Priority MT-ORBIX Low Priority miniCOOL High Priority miniCOOL Low Priority TAO High Priority TAO Low Priority
Synopsis of Results
TAO’s latency is lowest for large # of clients
TAO avoids priority inversion – i.e., high priority client
always has lowest latency
Primary overhead stems from concurrency and connection architecture
– e.g., synchronization and
context switching
ORB Jitter Results
1 5 10 15 20 25 30 35 40 45 50
TAO High Priority TAO Low Priority
miniCOOL High Priority miniCOOL Low Priority
MT-ORBIX High Priority MT-ORBIX Low Priority
CORBAplus High Priority CORBAplus Low Priority
0 50 100 150 200 250 300
Jitter in milliseconds
Number of Low Priority Clients
Definition
Jitter
!standard deviation from average latency Synopsis of Results
TAO’s jitter is lowest and most consistent
CORBAplus’ jitter
is highest and
most variable
Problem: Improper ORB Concurrency Models
OBJECT OBJECT ADAPTER ADAPTER
SERVANT
SERVANT DEMUXERDEMUXER
SERVANT SERVANT
SKELETONS SKELETONSU
ORB
ORB CORECORE
FILTER FILTER FILTER FILTER FILTER FILTER
SERVANT SERVANT
SKELETONS SKELETONSU
2: enqueue(data) 2: enqueue(data) 3: dequeue,
3: dequeue, filter filter request, request,
&enqueue
&enqueue
4: dispatch 4: dispatch
upcall() upcall()
II//OO SUBSYSTEMSUBSYSTEM
THREAD THREAD FILTER FILTER
1: recv() 1: recv()
CONNECTION THREADS CONNECTION THREADS
Common Problems
High context switching and synchronization overhead
Thread-level and packet-level priority inversions
Lack of application control over concurrency model www.cs.wustl.edu/
schmidt/
CACM-arch.ps.gz
Problem: ORB Shared Connection Models
SERVER SERVER ORB
ORB CORECORE II//OO SUBSYSTEMSUBSYSTEM
II//OO SUBSYSTEMSUBSYSTEM
COMMUNICATION
COMMUNICATION LINKLINK
II//OO SUBSYSTEMSUBSYSTEM SERVANTS SERVANTS
ONE ONE TCPTCP CONNECTION CONNECTION
ORB
ORB CORECORE
SEMAPHORES SEMAPHORES
2: select()
2: select() 4: release() 3: read()
BORROWED THREAD BORROWED THREADS
APPLICATION
5: dispatch_return() 1: invoke_twoway()
www.cs.wustl.edu/
schmidt/
RTAS-98.ps.gz
Common Problems
Request-level
priority inversions – Sharing multiple
priorities on a
single connection
Complex connection multiplexing
Synchronization
overhead
Problem: High Locking Overhead
0
94
199
175
0
231
216
599
0 100 200 300 400 500 600 700
TAO miniCOOL CORBAplus MT ORBIX
ORBs Tested
User Level Lock Operations per Request
client server
Common Problems
Locking overhead affects
latency and jitter significantly
Memory management
commonly involves locking www.cs.wustl.edu/
schmidt/
RTAS-98.ps.gz
Solution: TAO’s ORB Endsystem Architecture
R R U U N N T T II M M E E S S C C H H E E D D U U L L E E R R
HIGH-SPEED NETWORK INTERFACES
(e.g., APIC, VME)
Z Z E E R R O O C C O O PP Y Y B B U U FF FF E E R R
RT SS
RT I/O I/O
SUBSYSTEM SUBSYSTEM
RT
RT OBJECTOBJECT ADAPTER ADAPTER
O
O(1) (1) REQUESTREQUEST DEMUXERDEMUXER
CLIENTS CLIENTS
STUBS STUBS
SERVANTS SERVANTS
SKELETONS SKELETONS
U
RT
RT ORBORB CORECORE
REACTOR REACTOR
((PP11))
REACTOR REACTOR
((PP22))
REACTOR REACTOR
((PP33))
REACTOR REACTOR
((PP44))
SOCKET
SOCKET QUEUEQUEUE DEMUXERDEMUXER PLUGGABLE
PLUGGABLE PROTOCOLSPROTOCOLS
Solution Approach
!
Integrate scheduler into ORB endsystem
Co-schedule threads
Leader/followers thread pool Principle Patterns
!
Pass hints, precompute, optimize common case, remove gratuitous waste, store state, don’t be tied to reference implementations &
models
Thread Pool Comparison Results
SERVANTS SERVANTS ORB
ORB CORECORE
1: select() 1: select()
II//OO SUBSYSTEMSUBSYSTEM
5: dispatch upcall() 5: dispatch upcall()
2: read() 2: read() 3: enqueue() 3: enqueue() 4: dequeue()
4: dequeue()
II//OO
THREAD THREAD WORKER THREADS WORKER THREADS
REQUEST REQUEST QUEUE QUEUE
Worker Thread Pool
SERVANTS SERVANTS ORB
ORB CORECORE
SEMAPHORE SEMAPHORE
LEADER
LEADER FOLLOWERSFOLLOWERS
1: select()
1: select() 3: release()3: release()
II//OO SUBSYSTEMSUBSYSTEM
4: dispatch upcall() 4: dispatch upcall()
2: read() 2: read()
Leader/Follower Thread Pool
1 2
3 4
5 6
0 25
50 75
100
0 500 1000 1500 2000 2500 3000
Performance Improvement
Threads
Load
Problem: Reducing Demultiplexing Latency
OPERATION1 OPERATION2 OPERATIONK
2: DEMUX TO
I/O HANDLE
...
...
POA1
...
OS I/O SUBSYSTEM NETWORK ADAPTERS SERVANT 1
5: DEMUX TO
SKELETON
6: DISPATCH
OPERATION
1: DEMUX THRU PROTOCOL STACK
4: DEMUX TO
SERVANT
ORB CORE ORB CORE ROOT POA 3: DEMUX TO
OBJECT ADAPTER
POA2 POAN
SERVANT N
...
SKEL 1 SKEL 2 SKEL N
SERVANT 2
KERNELOS LAYER LAYERORB SERVANT
LAYER
Design Challenges
Minimize demuxing layers
Provide
O(1)operation
demuxing through all layers
Avoid priority inversions
Remain CORBA-compliant www.cs.wustl.edu/
schmidt/
POA.ps.gz
Solution: TAO’s Request Demultiplexing Optimizations
... ...
...
...
...
...
IDL IDL SKEL SKEL 22
OBJECT ADAPTER
SERVANT SERVANT 500500 (A)
(A) LAYERED DEMUXING LAYERED DEMUXING,,
LINEAR SEARCH LINEAR SEARCH
SERVANT
SERVANT 11 SERVANT SERVANT 22
IDL IDL SKEL SKEL 11
OPERATIONOPERATION100100
OPERATIONOPERATION22
...
...
...
OPERATIONOPERATION11
IDL IDL SKEL N
SKEL N
... ... ...
SERVANTSERVANT500::500::OPERATIONOPERATION100100
SERVANTSERVANT500::500::OPERATIONOPERATION11
SERVANTSERVANT1::1::OPERATIONOPERATION100100
SERVANTSERVANT1::1::OPERATIONOPERATION22
SERVANTSERVANT1::1::OPERATIONOPERATION11
OBJECT ADAPTER
...
...
...
...
...
...
IDL SKEL IDL SKEL 22
OBJECT ADAPTER
(C) LAYERED DEMUXING,
PERFECT HASHING
SERVANT
SERVANT 11 SERVANT SERVANT 22
OPERATIONOPERATION100100
OPERATIONOPERATION22
...
...
...
OPERATIONOPERATION11
IDL SKEL NIDL SKEL N
search(object key)
hash(operation)
IDL SKEL IDL SKEL 11
(D) DE-LAYERED ACTIVE DEMUXING
hash(object key) search(operation)
index(object key/operation) (B) LAYERED DEMUXING,
DYNAMIC HASHING
SERVANT SERVANT 500500
Demuxing
www.cs.wustl.edu/
schmidt/
f
ieee tc-97,COOTS-99
g.ps.gz
Perfect hashing
www.cs.wustl.edu/
schmidt/
gperf.ps.gz
POA Demultiplexing Results
1 5
10 15
20
25 0
1 2 3 4 5
Latency(us)
POA Depth
Transient Persistent
Synopsis of Results
Active demux is
efficient & predictable for both transient and persistent object
references.
Principle Patterns
Precompute, pass hints, use
special-purpose &
predictable data
structures
Servant Demultiplexing Results
1 100
200 300
400 500
0 50 100 150 200
Latency (us)
No. of Objects
Active Demux Dynamic Demux Linear Demux
Synopsis of Results
Linear demux is costly
Active demux is most efficient &
predictable
Principle Patterns
Precompute, pass hints, use
special-purpose &
predictable data
structures
Operation Demultiplexing Results
1 10
20 30
40 50
0 5 10 15 20 25
Latency (us)
No. of Methods Perfect Hashing
Binary Search Dynamic Hashing Linear Search
Synopsis of Results
!
Perfect Hashing
– Highly predictable – Low-latency
Others strategies slower
Principle Patterns
!
Precompute, use
predictable data
structures, remove
gratuitous waste
TAO Request Demultiplexing Summary
...
...
...
...
...
...
POAPOA11
SERVANT
SERVANT 11
ORB CORE ORB CORE ROOT POA ACTIVE
DEMUXING
POA2 POAN
SERVANT N
...
SKEL 1 SKEL 2 SKEL N
SERVANT 2
ACTIVE DEMUXING
PERFECT HASHING
Demultiplexing Stage Absolute Time (
s)
1. Request parsing 2
2. POA demux 2
3. Servant demux 3
4. Operation demux 2
5. Parameter demarshaling operation dependent
6. User upcall servant dependent
7. Results marshaling operation dependent
Real-time ORB/OS Performance Experiments
[P ]
0 [P ]
1
SCHEDULER
0
RUNTIME
[P ]
I/O SUBSYSTEM
Server
0 1 Sn
Cn
00
1100110011001101
Pentium II
S S
C0 C1
...
...
Object Adapter Servants
ORB Core
Client
...
[P]
Requests Priority
...
[P ] [P ]
1
[P ] n
n
www.cs.wustl.edu/
schmidt/RT-OS.ps.gz
Method
Vary OS, hold ORBs constant
Single-processor Intel Pentium II 450 Mhz, 256 Mbytes of RAM
Client and servant run on the same machine
Client Ci connects to servant Si with priority Pi
– i ranges from 1:::50
Clients invoke two-way CORBA calls that cube a number on the servant and returns result
Real-time ORB/OS Performance Results
'
&
$
%
0 200 400 600 800 1000 1200 1400 1600 1800 2000
0 1 2 5 10 15 20 25 30 35 40 45 50 Low Priority Clients
Two-way Request Latency, usec
Linux LynxOS NT Solaris VxWorks
'
&
$
%
0 2000 4000 6000 8000 10000 12000 14000 16000 18000 20000
0 1 2 5 10 15 20 25 30 35 40 45 50 Low Priority Clients
Two-way Request Latency, usec
Linux LynxOS NT Solaris VxWorks
High-priority Client Latency Low-priority Clients Latency
Real-time ORB/OS Jitter Results
'
&
$
%
0 2 10 20 30 40 50
0 500 1000 1500 2000 2500 3000 3500
Two-way Jitter, usec
Low Priority Clients
LynxOS NT VxWorks Linux Solaris
'
&
$
%
0 2 10 20 30 40 50
0 5000 10000 15000 20000 25000 30000 35000 40000
Two-way Jitter, usec
Low Priority Clients
LynxOS Linux Solaris NT VxWorks
High-priority Client Jitter Low-priority Clients Jitter
Problem: Hard-coded ORB Messaging and Transport Protocols
BOARD BOARD 11
VME VME 1553
1553
BOARD BOARD 22
GIOP/IIOP are not sufficient, e.g.:
– GIOP message footprint may be too large
– TCP lacks necessary QoS
– Legacy commitments to existing protocols
Many ORBs do not support
“pluggable protocols”
– This makes ORBs inflexible and
inefficient
One Solution: Hacking GIOP
GIOP requests include fields that aren’t needed in homogeneous embedded applications
–
e.g., GIOP magic #, GIOP version, byte order,request principal, etc.
These fields can be omitted without any changes to the standard CORBA programming model
TAO’s
-ORBgiopliteoption save 15 bytes per-request, yielding these calls-per-second:
Marshaling-enabled Marshaling-disabled
min max avg min max avg
GIOP 2,878 2,937 2,906 2,912 2,976 2,949 GIOPlite 2,883 2,978 2,943 2,911 3,003 2,967
The result is a measurable improvement in throughput/latency
– However, it’s so small (2%) that hacking GIOP
is of minimal gain except for low-bandwidth
links
Better Solution: TAO’s
Pluggable Protocols Framework
CLIENT
STUBS SKELETONS
TCP
MULTICAST
IOP
VME UDP
ORB MESSAGING COMPONENT
ORB TRANSPORT ADAPTER COMPONENT
ESIOP
REAL-TIME
IOP
EMBEDDED
IOP
RELIABLE,
BYTE-STREAM
ATM TCP
MEMORY MANAGEMENT CONCURRENCY
MODEL
OTHER
ORB
CORE SERVICES
COMMUNICATION INFRASTRUCTURE
HIGH SPEED NETWORK INTERFACE REAL-TIME I/O SUBSYSTEM
ORB MESSAGE FACTORY
ORB TRANSPORT ADAPTER FACTORY
OBJECT ADAPTER
GIOP GIOPLITE
ADAPTIVE Communication Environment (ACE)
OBJECT (SERVANT)
operation (args)
IN ARGS
OUT ARGS & RETURN VALUE
POLICY CONTROL
Features
Pluggable
ORB messaging and transport protocols
Highly efficient and predictable behavior
Principle Patterns
Replace
general-purpose functions
(protocols) with
special-purpose
ones
CORBA Protocol Interoperability Architecture
ORB MESSAGING
COMPONENT
ORB TRANSPORT
ADAPTER COMPONENT
TRANSPORT LAYER
NETWORK LAYER
GIOP
IIOP
TCP
IP
VME DRIVER
AAL
5
ATM GIOPLITE
VME
-
IOPESIOP
ATM
-
IOPRELIABLE SEQUENCED
PROTOCOL CONFIGURATIONS STANDARD CORBA PROGRAMMING API
Features
!
Presentation layer – e.g., CDR
Message formats – e.g., GIOP
Transport assumptions – e.g., TCP
Object addressing – e.g., IIOP IOR
www.cs.wustl.edu/
schmidt/pluggable protocols.ps.gz
Embedded System Benchmark Configuration
MARSHAL MARSHAL PARAMS PARAMS
OBJECT
OBJECT ((SERVANTSERVANT))
OBJECT ADAPTER OBJECT ADAPTER
ACTIVE ACTIVE OBJECT OBJECT MAPMAP
IDL
SKELETONIDL
SKELETON
VME BUS VME BUS
OS KERNEL OS KERNEL
VME VME DRIVER DRIVER
OS KERNEL OS KERNEL
VME VME DRIVER DRIVER
IDL
STUBSIDL
STUBS
ORB MESSAGING ORB MESSAGING ORB TRANSPORT ORB TRANSPORT
CLIENT CLIENT
INCOMINGINCOMING
CDR CDR DATA COPY DATA COPY
DATA COPY DATA COPY
DMA
DMA COPY COPY
ORB MESSAGING ORB MESSAGING ORB TRANSPORT ORB TRANSPORT CDR
CDR
DATA COPY DATA COPY DEMARSHAL DEMARSHAL
PARAMS PARAMS
DATA COPY DATA COPY
OUTGOINGOUTGOING
ORB COREORB CORE
obj->op (params)
VxWorks running on 200 Mhz PowerPC over a 320 Mbps VME & 10
Mbps Ethernet
Ethernet & VME Two-way Latency Results
0 1000 2000 3000 4000 5000 6000 7000 8000 9000
void short octet long struct short seq <octet>
long seq <octet>
short seq <long>
long seq <long>
Data Type
Latency (usec)
VME/GIOPlite avg.
VME/GIOP avg.
Ethernet/GIOPlite avg.
Synopsis of Results
VME protocol is much faster than Ethernet