The Performance of
Object-Oriented Components for High-speed Network Programming
Douglas C. Schmidt [email protected]
Washington University, St. Louis
Introduction
Distributed object computing (DOC) frameworks are well-suited for certain communication requirements and certain network environments
  - e.g., request/response or oneway messaging over low-speed Ethernet or Token Ring
However, current DOC implementations exhibit high overhead for other types of requirements and environments
  - e.g., bandwidth-intensive and delay-sensitive streaming applications over high-speed ATM or FDDI
Outline
Outline communication requirements of distributed medical imaging domain
Compare performance of several network programming mechanisms:
  - Sockets
  - ACE C++ wrappers
  - CORBA (Orbix)
  - Blob Streaming
Outline Blob Streaming Architecture and Related Patterns
Evaluation and Recommendations
Distributed Medical Imaging in Project Spectrum
[Figure: network diagram; labels: DIAGNOSTIC STATIONS, MODALITIES (CT, MR, CR), ATM LAN, ATM MAN, CENTRAL BLOB STORE, CLUSTER BLOB STORE, DX BLOB STORE]
Distributed Objects in Medical Imaging Systems
[Figure: software bus diagram; labels: SOFTWARE BUS, BLOB DISPLAY, NAME SERVER, BLOB ROUTER, BLOB PROCESSOR, NETWORK TIME, IMAGE LOCATOR, IMAGE SERVER, DICOM PRINTER, BLOB SERVER, BLOB LOCATOR]
Blob Servers have the following responsibilities and requirements:
* Efficiently store/retrieve large medical images (Blobs)
* Respond to queries from Blob Locators
* Manage short-term and long-term blob persistence
DOC View of Project Spectrum
[Figure: network diagram; labels: DIAGNOSTIC STATIONS, MODALITIES (CT, MR, CR), ATM LAN, ATM MAN, DX BLOB STORE, BLOB LOCATOR, NAME SERVER, BLOB ROUTER, TIME SERVER, CLUSTER]
Motivation for Distributed Object Computing
Simplify application development and interworking, e.g.,
  - CORBA provides higher level integration than traditional "untyped TCP bytestreams"
  - ACE encapsulates lower-level networking and concurrency systems programming interfaces
Provide a foundation for higher-level application collaboration
  - e.g., Windows OLE and the OMG Common Object Service Specification (COSS)
Benefits for distributed programming similar to OO languages for non-distributed programming
  - e.g., encapsulation, interface inheritance, and object-based exception handling
CORBA Architecture
[Figure: CORBA architecture; labels: CLIENT, OBJECT IMPL, DYNAMIC INVOCATION INTERFACE, IDL STUBS, ORB INTERFACE, IDL SKELETON, OBJECT ADAPTER, OBJECT REQUEST BROKER, op(args), NAMING SERVICE, EVENT SERVICE, LIFECYCLE SERVICE, SECURITY SERVICE, TRADER SERVICE]
CORBA Components
The CORBA specification comprises several parts:
1. An Object Request Broker (ORB)
2. An Interface Definition Language (IDL)
3. A Static Invocation Interface (SII)
4. A Dynamic Invocation Interface (DII)
5. A Dynamic Skeleton Interface (DSI)
Other documents from OMG describe common object services built upon CORBA
  - e.g., CORBAServices -> Event services, Name services, Lifecycle services
ACE Architecture
[Figure: ACE architecture; labels: GENERAL UNIX AND WIN32 SERVICES, PROCESS/THREAD SUBSYSTEM, COMMUNICATION SUBSYSTEM, VIRTUAL MEMORY SUBSYSTEM, THREAD LIBRARY, SOCKETS/TLI, SELECT/POLL, STREAM PIPES, NAMED PIPES, SYSTEM V IPC, DYNAMIC LINKING, MEMORY MAPPING, C APIS, C++ WRAPPERS, THREAD MANAGER, SYNCH WRAPPERS, SPIPE SAP, SOCK_SAP/TLI_SAP, FIFO SAP, SYSV WRAPPERS, MEM MAP, SHARED MALLOC, LOG MSG, FRAMEWORKS AND CLASS CATEGORIES, REACTOR, ACCEPTOR, CONNECTOR, SERVICE CONFIGURATOR, SERVICE HANDLER, CORBA HANDLER, ADAPTIVE SERVICE EXECUTIVE (ASX), DISTRIBUTED SERVICES, NAME SERVER, TOKEN SERVER, LOGGING SERVER, GATEWAY SERVER]
A set of C++ wrappers, class categories, and frameworks based on design patterns
Motivation for CORBA and ACE on Project Spectrum
Two crucial issues for overall communication infrastructure: flexibility and performance
Flexibility motivates the use of a distributed object computing framework like CORBA to transport many formats of data
  - e.g., HL7, DICOM, Blobs, domain objects, etc.
Performance requires we transport this data as quickly as the current technology allows
Key Research Question
Can CORBA and ACE be used to transfer medical images efficiently over high-speed networks?
Our goal was to determine this empirically before adopting distributed object computing wholesale
Performance Experiments
Enhanced version of TTCP
  - TTCP measures end-to-end bulk data transfer with acknowledgements
  - Enhanced version tests C, ACE C++ wrappers, CORBA, and Blob Streaming
Parameters varied
  - 100 Mbytes of data transferred in various chunk sizes
  - Socket queues were 8k (default) and 64k (maximum)
  - Network was 155 Mbps ATM
Compiler was SunC++ 4.0.1 using highest optimization level
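A minimal sketch of the kind of measurement described above, assuming a local UNIX-domain socket pair in place of the ATM link: the sender writes a fixed number of bytes in chunks, the receiver acknowledges once everything has arrived, and throughput is reported in Mbits/sec. The names `run_transfer`, `Transfer_Result`, and `mbits_per_sec` are hypothetical and are not taken from TTCP.

```cpp
#include <sys/socket.h>
#include <unistd.h>
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <thread>
#include <vector>

// Convert a byte count and an elapsed time into Mbits/sec.
double mbits_per_sec(std::size_t bytes, double seconds) {
  if (seconds <= 0.0) return 0.0;
  return static_cast<double>(bytes) * 8.0 / (seconds * 1.0e6);
}

// Outcome of one run (hypothetical type, not part of TTCP).
struct Transfer_Result {
  std::size_t bytes_received;
  double mbits;
};

// Send total_bytes in chunk_size pieces and wait for a one-byte ack.
Transfer_Result run_transfer(std::size_t total_bytes, std::size_t chunk_size) {
  int fds[2];
  if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0) return {0, 0.0};

  std::size_t received = 0;
  std::thread receiver([&] {
    std::vector<char> buf(chunk_size);
    for (;;) {
      ssize_t n = read(fds[1], buf.data(), buf.size());
      if (n <= 0) break;  // end of data (or error)
      received += static_cast<std::size_t>(n);
    }
    char ack = 'A';
    ssize_t rc = write(fds[1], &ack, 1);  // acknowledge the transfer
    (void)rc;
  });

  std::vector<char> chunk(chunk_size, 'x');
  auto start = std::chrono::steady_clock::now();
  std::size_t sent = 0;
  while (sent < total_bytes) {
    std::size_t want = std::min(chunk_size, total_bytes - sent);
    ssize_t n = write(fds[0], chunk.data(), want);
    if (n <= 0) break;
    sent += static_cast<std::size_t>(n);
  }
  shutdown(fds[0], SHUT_WR);  // tell the receiver there is no more data
  char ack = 0;
  ssize_t rc = read(fds[0], &ack, 1);  // block until the ack arrives
  (void)rc;
  std::chrono::duration<double> elapsed =
      std::chrono::steady_clock::now() - start;

  receiver.join();
  close(fds[0]);
  close(fds[1]);
  return {received, mbits_per_sec(received, elapsed.count())};
}
```

Calling `run_transfer` with the same total size and different chunk sizes gives one data point per chunk size, which is the shape of the experiment outlined above.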
Network/Host Environment
[Figure: SPARCstation 20 Model 712s (ENI ATM adaptors and Ethernet) and a Bay Networks LattisCell ATM switch (16 port, OC3 155 Mbps/port, 9,180 MTU)]
TTCP Configuration for C and ACE C++ Wrappers
[Figure: Sender, ATM switch, and Receiver; steps: 1: write(buf), 2: forward, 3: read(buf), 4: ack]
TTCP Configuration for CORBA Implementation
[Figure: Sender, TTCP Stub, ATM switch, TTCP Skel, and TTCP Impl; steps: 1: send(buf), 2: forward, 3: send(buf), 4: ack]
TTCP Configuration for Blob Streaming
[Figure: Sender, ATM switch, and Blob Store, with Blob_Xport Stub, Blob_Xport Skel, Src Blob Proxy, and Dest Blob Proxy; steps: 1: send(buf), 2: connect, 3: write(buf), 4: forward, 5: read(buf), 6: send(buf), 7: ack]
Performance over ATM
[Figure: C, ACE C++, Blob Streaming, and Orbix over ATM. Axes: Mbits/sec and Blob chunk size in megabytes. Series: C, ACE, Blob Streaming, and Orbix, each with 8k and 64k windows]
Primary Sources of Overhead
Data copying
Demultiplexing
Memory allocation
Presentation layer formatting
High-Cost Functions
C and ACE C++ Tests
  - Transferring 64 Mbytes with 1 Mbyte buffers

Test                          %Time   #Calls  Name
----------------------------------------------------
C sockets (sender)             93.9      112  write
                                3.6      110  read
C sockets (receiver)           93.2   13,085  read
                                4.5        3  getmsg
ACE C++ wrapper (sender)       94.4      112  write
                                3.2      110  read
ACE C++ wrapper (receiver)     93.9   12,984  read
                                5.6        3  getmsg
High-Cost Functions (cont'd)
Orbix String and Sequence

Test                          %Time   #Calls  Name
----------------------------------------------------
Orbix Sequence (sender)        53.5      127  write
                               35.1      223  read
                                7.3    1,108  memcpy
Orbix Sequence (receiver)      85.6   12,846  read
                               12.4    1,064  memcpy
Orbix String (sender)          45.0      127  write
                               35.1      223  read
                               10.8    1,315  strlen
                                6.0    1,108  memcpy
Orbix String (receiver)        70.7   12,443  read
                               18.1    2,142  strlen
                               10.0    1,064  memcpy
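The strlen entries appear only in the Orbix String rows. A small sketch of why a NUL-terminated string is a poor carrier for image data, assuming a string-style marshaler has to scan for the terminator while a sequence-style marshaler carries an explicit length. The function names are hypothetical and are not Orbix APIs.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Length as a string-style marshaler would find it: scan every byte
// until the first NUL terminator.
std::size_t string_payload_length(const char *data) {
  return std::strlen(data);
}

// Length as a sequence-style marshaler would find it: the count is
// stored alongside the data, so no scan of the payload is needed.
std::size_t sequence_payload_length(const std::vector<unsigned char> &data) {
  return data.size();
}

// Eight bytes of sample "pixel" data with a zero byte in the middle.
std::vector<unsigned char> make_sample_pixels() {
  return {0x10, 0x20, 0x30, 0x00, 0x40, 0x50, 0x60, 0x00};
}
```

Besides the extra pass over the data, the scan stops at the first zero byte, so binary payloads cannot be measured this way at all.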
High-Cost Functions (cont'd)
Blob Streaming

Test                          %Time   #Calls  Name
----------------------------------------------------
BlobStreaming (sender)         48.8      327  write
                               44.8      232  read
                                1.3    2,055  memcpy
BlobStreaming (receiver)       77.2   12,546  read
                               16.4   12,734  memcpy
                                1.4      202  write
Overview of Blob Streaming
Blob Streaming provides developers with a uniform interface for operations on multiple types of Binary Large OBjects (BLOBs)
Two primary goals
1. Improved abstraction
  - Shield developers from knowledge of blob location (e.g., memory vs. "local" files vs. remote network)
2. Maximize performance
  - Transport blobs as efficiently as current technology allows
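One way to picture the first goal is an abstract class with one subclass per blob location. The slides do not show the actual Blob Streaming interface, so `Blob`, `Memory_Blob`, `File_Blob`, and `read_all` below are hypothetical names in a minimal sketch that covers only the memory and local-file cases.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdio>
#include <cstring>
#include <string>

// Hypothetical uniform interface: callers never see where the bytes live.
class Blob {
public:
  virtual ~Blob() = default;
  virtual std::size_t size() const = 0;
  // Copy up to len bytes starting at offset; return the number copied.
  virtual std::size_t read(std::size_t offset, char *buf, std::size_t len) = 0;
};

// Blob held in memory.
class Memory_Blob : public Blob {
public:
  explicit Memory_Blob(std::string data) : data_(std::move(data)) {}
  std::size_t size() const override { return data_.size(); }
  std::size_t read(std::size_t offset, char *buf, std::size_t len) override {
    if (offset >= data_.size()) return 0;
    std::size_t n = std::min(len, data_.size() - offset);
    std::memcpy(buf, data_.data() + offset, n);
    return n;
  }
private:
  std::string data_;
};

// Blob held in a local file.
class File_Blob : public Blob {
public:
  explicit File_Blob(const std::string &path)
      : fp_(std::fopen(path.c_str(), "rb")) {}
  File_Blob(const File_Blob &) = delete;
  File_Blob &operator=(const File_Blob &) = delete;
  ~File_Blob() override { if (fp_) std::fclose(fp_); }
  std::size_t size() const override {
    if (!fp_ || std::fseek(fp_, 0, SEEK_END) != 0) return 0;
    long end = std::ftell(fp_);
    return end < 0 ? 0 : static_cast<std::size_t>(end);
  }
  std::size_t read(std::size_t offset, char *buf, std::size_t len) override {
    if (!fp_ || std::fseek(fp_, static_cast<long>(offset), SEEK_SET) != 0)
      return 0;
    return std::fread(buf, 1, len, fp_);
  }
private:
  std::FILE *fp_;
};

// Application code written once against the uniform interface.
std::string read_all(Blob &blob) {
  std::string out(blob.size(), '\0');
  std::size_t got = 0;
  while (got < out.size()) {
    std::size_t n = blob.read(got, &out[got], out.size() - got);
    if (n == 0) break;
    got += n;
  }
  out.resize(got);
  return out;
}
```

A remote blob would be a third subclass; `read_all` would not change.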
Blob Streaming System Architecture
[Figure: Sender, Receiver, ATM switch, Blob Streaming Server, and Blob Store; steps: 1: push(image), 2: pull(image)]
Blob Streaming Architecture
Blob Streaming components allow transparent use of resources through uniform blob interfaces
Blob Streaming supports the following:
  - Blob location
    * e.g., smart caches to decouple transfers from location algorithms
  - Blob routing
    * e.g., context based routing
  - Source and destination independent Blob transport, e.g.,
    * Store and retrieve from remote or local databases
    * Abstract operations like reads/writes may use local file reads/writes, or remote reads/writes via sockets
Blob Streaming Architecture Design Goals
Goal: decouple application from OS platform
  - e.g., applications can be shielded from fact that current version is implemented for UNIX
    * Thus, can port Blob Streaming to Windows NT or OS/2 without changing applications
  - Platform specific operations hidden behind abstract interfaces
    * e.g., WIN32 WaitForMultipleObjects and UNIX select
Advantages
Portability and extensibility
Blob Streaming Architecture Design Goals (cont'd)
Goal: application independence from transport mechanism
  - Switch transports at any stage in the development without affecting application code
    * Presently using CORBA and TCP/IP as transport mechanisms
      However, neither mechanism is exposed to programmers
      e.g., can use Network OLE
    * As transport technology improves, Blob Streaming can change without affecting applications
      e.g., "direct ATM"
Advantages
Portability, extensibility, and performance tuning
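A minimal sketch of transport independence, assuming the application talks only to an abstract transport and a concrete one is plugged in behind it. `Blob_Transport`, `Fd_Transport`, `Memory_Transport`, and `push_image` are hypothetical names; the real Blob Streaming classes are not shown in the slides.

```cpp
#include <unistd.h>
#include <cstddef>
#include <string>

// Hypothetical transport interface seen by the application.
class Blob_Transport {
public:
  virtual ~Blob_Transport() = default;
  // Send len bytes; return true if all of them were accepted.
  virtual bool send(const char *buf, std::size_t len) = 0;
};

// Transport over any descriptor (e.g., a connected TCP socket or a pipe).
class Fd_Transport : public Blob_Transport {
public:
  explicit Fd_Transport(int fd) : fd_(fd) {}
  bool send(const char *buf, std::size_t len) override {
    std::size_t done = 0;
    while (done < len) {  // write() may accept fewer bytes than requested
      ssize_t n = ::write(fd_, buf + done, len - done);
      if (n <= 0) return false;
      done += static_cast<std::size_t>(n);
    }
    return true;
  }
private:
  int fd_;
};

// In-process transport, handy for tests or a co-located receiver.
class Memory_Transport : public Blob_Transport {
public:
  bool send(const char *buf, std::size_t len) override {
    received_.append(buf, len);
    return true;
  }
  const std::string &received() const { return received_; }
private:
  std::string received_;
};

// Application code: unaware of which transport is in use.
bool push_image(Blob_Transport &transport, const std::string &image) {
  return transport.send(image.data(), image.size());
}
```

Swapping the transport means constructing a different subclass; `push_image` and its callers stay as they are.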
Design Patterns in Blob Streaming
[Figure: patterns grouped under TACTICAL PATTERNS and STRATEGIC PATTERNS; labels: Active Object, Half-Sync/Half-Async, Iterator, Factory Method, Strategy/Adapter, Bridge, Connector, Acceptor, Router, Service Configurator, Service Stream, Thread Specific Storage, Reactor, External Polymorphism]
The Reactor Pattern
Intent
  - An object behavioral pattern that decouples event demultiplexing and event handler dispatching from the services performed in response to events
This pattern resolves the following forces for event-driven software:
  - How to demultiplex multiple types of events from multiple sources of events efficiently within a single thread of control
  - How to extend application behavior without requiring changes to the event dispatching framework
Structure of the Reactor Pattern
[Figure: class diagram divided into APPLICATION-DEPENDENT and APPLICATION-INDEPENDENT parts]
Reactor: handle_events(), register_handler(h), remove_handler(h), expire_timers()
Event_Handler (abstract): handle_input(), handle_output(), handle_signal(), handle_timeout(), get_handle()
Concrete Event_Handler
Timer_Queue: schedule_timer(h), cancel_timer(h), expire_timer(h)
Handles
Event loop pseudocode:
  select (handles);
  foreach h in handles {
    if (h is output handler) h->handle_output ();
    if (h is input handler) h->handle_input ();
    if (h is signal handler) h->handle_signal ();
  }
  this->expire_timers ();
Participants in the Reactor pattern
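The participants above can be sketched in a few dozen lines. This is a simplified illustration built on POSIX select(), not the ACE Reactor: it handles input events only (no output, signal, or timer dispatching), and `Collecting_Handler` is a hypothetical concrete handler added for the example.

```cpp
#include <sys/select.h>
#include <unistd.h>
#include <cstddef>
#include <map>
#include <string>

// Application-independent callback interface.
class Event_Handler {
public:
  virtual ~Event_Handler() = default;
  virtual int get_handle() const = 0;
  virtual void handle_input() = 0;
};

// Demultiplexes input events and dispatches them to registered handlers.
class Reactor {
public:
  void register_handler(Event_Handler *h) { handlers_[h->get_handle()] = h; }
  void remove_handler(Event_Handler *h) { handlers_.erase(h->get_handle()); }

  // Wait up to timeout_ms for input, then dispatch. Returns the number of
  // handlers dispatched, 0 on timeout, or -1 on error.
  int handle_events(long timeout_ms) {
    fd_set read_set;
    FD_ZERO(&read_set);
    int max_fd = -1;
    for (const auto &entry : handlers_) {
      FD_SET(entry.first, &read_set);
      if (entry.first > max_fd) max_fd = entry.first;
    }
    timeval tv;
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;
    int ready = select(max_fd + 1, &read_set, nullptr, nullptr, &tv);
    if (ready <= 0) return ready;

    int dispatched = 0;
    // Iterate over a copy so a handler may register or remove itself.
    std::map<int, Event_Handler *> snapshot = handlers_;
    for (const auto &entry : snapshot) {
      if (FD_ISSET(entry.first, &read_set)) {
        entry.second->handle_input();
        ++dispatched;
      }
    }
    return dispatched;
  }

private:
  std::map<int, Event_Handler *> handlers_;
};

// Application-dependent handler: accumulates whatever arrives on its handle.
class Collecting_Handler : public Event_Handler {
public:
  explicit Collecting_Handler(int fd) : fd_(fd) {}
  int get_handle() const override { return fd_; }
  void handle_input() override {
    char buf[256];
    ssize_t n = ::read(fd_, buf, sizeof buf);
    if (n > 0) data_.append(buf, static_cast<std::size_t>(n));
  }
  const std::string &data() const { return data_; }
private:
  int fd_;
  std::string data_;
};
```

New behavior is added by writing another Event_Handler subclass and registering it; the Reactor class itself is not modified.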
Collaboration in the Reactor Pattern
[Figure: interaction diagram between main program, callback : Concrete Event_Handler, and reactor : Reactor. Initialization mode: Reactor() (initialize), register_handler(callback) (register handler), get_handle() (extract handle). Event handling mode: handle_events() (start event loop), select(), then for each event: handle_input() (data arrives), handle_output() (OK to send), handle_signal() (signal arrives), handle_timeout() (timer expires); remove_handler(callback) (remove handler), handle_close() (cleanup)]
Using the Reactor for Blob Streaming
[Figure: framework level: Reactor with a Handle Table of registered objects (Blob Handlers, each an Event Handler) above the OS event demultiplexing interface; application level: Blob Handlers, Message Queue, and Blob Processor; steps: 1: handle_input(), 2: recv_request(msg), 3: putq(msg), 4: getq(msg), 5: svc(msg)]
The Active Object Pattern
Intent
  - Decouples method execution from method invocation and simplifies synchronized access to shared resources by concurrent threads
This pattern resolves the following forces for concurrent communication software:
  - How to allow blocking operations (such as read and write) to execute concurrently
  - How to simplify concurrent access to shared state
Structure of the Active Object Pattern in ACE
[Figure: class diagram divided into APPLICATION-SPECIFIC and APPLICATION-INDEPENDENT parts]
Event Handler (abstract): handle_input(), handle_output(), handle_exception(), handle_signal(), handle_timeout(), handle_close(), get_handle()=0
Shared Object (abstract): init()=0, fini()=0, info()=0
Service Object (abstract): suspend()=0, resume()=0
Task (abstract, parameterized by SYNCH): open()=0, close()=0, put()=0, svc()=0
Message Queue (parameterized by SYNCH)
Collaboration in ACE Active Objects
[Figure: three active Tasks (t1, t2, t3), each with its own Message Queue; steps: 1: put (msg), 2: enqueue (msg), 3: svc (), 4: dequeue (msg), 5: put (msg)]
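The put/enqueue/svc/dequeue steps can be sketched with standard C++ threads. This is a simplified illustration, not the ACE Task class: messages are plain strings, an empty string is used as a shutdown marker, and the "done:" prefix stands in for real processing.

```cpp
#include <condition_variable>
#include <deque>
#include <mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Synchronized queue shared by callers and the task's own thread.
class Message_Queue {
public:
  void enqueue(std::string msg) {
    {
      std::lock_guard<std::mutex> guard(lock_);
      queue_.push_back(std::move(msg));
    }
    not_empty_.notify_one();
  }
  std::string dequeue() {  // blocks until a message is available
    std::unique_lock<std::mutex> guard(lock_);
    not_empty_.wait(guard, [this] { return !queue_.empty(); });
    std::string msg = std::move(queue_.front());
    queue_.pop_front();
    return msg;
  }
private:
  std::mutex lock_;
  std::condition_variable not_empty_;
  std::deque<std::string> queue_;
};

// Active object: put() returns at once, svc() runs in the task's thread.
class Task {
public:
  ~Task() { close(); }
  void open() { thread_ = std::thread([this] { svc(); }); }
  void put(const std::string &msg) { queue_.enqueue(msg); }
  void close() {
    if (thread_.joinable()) {
      queue_.enqueue("");  // empty message asks svc() to stop
      thread_.join();
    }
  }
  // Safe to call only after close(), once the thread has finished.
  const std::vector<std::string> &processed() const { return processed_; }
private:
  void svc() {
    for (;;) {
      std::string msg = queue_.dequeue();
      if (msg.empty()) break;
      processed_.push_back("done:" + msg);  // stand-in for real work
    }
  }
  Message_Queue queue_;
  std::vector<std::string> processed_;
  std::thread thread_;
};
```

Only the queue is shared between threads, so the processing code in svc() needs no locking of its own.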
Using the Active Object Pattern for Blob Streaming
[Figure: framework level: Reactor with a Handle Table of registered objects (Blob Handlers, each an Event Handler) above the OS event demultiplexing interface; application level: Blob Handlers, Message Queue, and Blob Processor; steps: 1: handle_input(), 2: recv_request(msg), 3: putq(msg), 4: getq(msg), 5: svc(msg)]
Half-Sync/Half-Async Pattern
Intent
  - An architectural pattern that decouples synchronous I/O from asynchronous I/O in a system to simplify programming effort without degrading execution efficiency
This pattern resolves the following forces for concurrent communication systems:
  - How to simplify programming for higher-level communication tasks
    * These are performed synchronously (via Active Objects)
  - How to ensure efficient lower-level I/O communication tasks
    * These are performed asynchronously (via the Reactor)
Structure of the Half-Sync/Half-Async Pattern
[Figure: layered diagram; labels: SYNCHRONOUS TASK LAYER (SYNC TASK 1, SYNC TASK 2, SYNC TASK 3), QUEUEING LAYER, MESSAGE QUEUES, ASYNC TASK, EXTERNAL; steps: 1, 4: read(data), 2: interrupt, 3: enqueue(data)]
Collaborations in the Half-Sync/Half-Async Pattern
[Figure: interaction diagram between External Event Source, Async Task, Message Queue, and Sync Task. Async phase: notification() (external event), read(msg) (recv msg), work() (process msg). Queueing phase: enqueue(msg) (enqueue msg). Sync phase: read(msg) (dequeue msg), work() (execute task)]
This illustrates input processing (output processing is similar)
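The three phases can be sketched as follows. This is a simplified illustration under stated assumptions: a pipe stands in for the external event source, poll() drives the asynchronous layer, a standard C++ thread plays the synchronous task, and an empty string marks shutdown. The function names are hypothetical.

```cpp
#include <poll.h>
#include <unistd.h>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Queueing layer: the only point of contact between the two halves.
class Message_Queue {
public:
  void enqueue(std::string msg) {
    {
      std::lock_guard<std::mutex> guard(lock_);
      queue_.push_back(std::move(msg));
    }
    not_empty_.notify_one();
  }
  std::string dequeue() {
    std::unique_lock<std::mutex> guard(lock_);
    not_empty_.wait(guard, [this] { return !queue_.empty(); });
    std::string msg = std::move(queue_.front());
    queue_.pop_front();
    return msg;
  }
private:
  std::mutex lock_;
  std::condition_variable not_empty_;
  std::deque<std::string> queue_;
};

// Async layer: read input only when it is ready, then hand it off.
// Returns bytes enqueued, 0 on timeout, or -1 on EOF or error.
int async_task(int fd, Message_Queue &queue, int timeout_ms) {
  pollfd p;
  p.fd = fd;
  p.events = POLLIN;
  p.revents = 0;
  int ready = poll(&p, 1, timeout_ms);
  if (ready == 0) return 0;
  if (ready < 0) return -1;
  char buf[256];
  ssize_t n = read(fd, buf, sizeof buf);
  if (n <= 0) return -1;
  queue.enqueue(std::string(buf, static_cast<std::size_t>(n)));
  return static_cast<int>(n);
}

// Sync layer: ordinary blocking code, one message at a time.
void sync_task(Message_Queue &queue, std::vector<std::string> &out) {
  for (;;) {
    std::string msg = queue.dequeue();
    if (msg.empty()) return;  // empty message means shut down
    out.push_back("processed:" + msg);
  }
}

// Wire the layers together and push each input through the pipe.
std::vector<std::string> run_pipeline(const std::vector<std::string> &inputs) {
  std::vector<std::string> out;
  int fds[2];
  if (pipe(fds) != 0) return out;
  Message_Queue queue;
  std::thread worker(sync_task, std::ref(queue), std::ref(out));
  for (const std::string &in : inputs) {
    ssize_t rc = write(fds[1], in.data(), in.size());
    if (rc < 0 || async_task(fds[0], queue, 1000) < 0) break;
  }
  queue.enqueue("");
  worker.join();
  close(fds[0]);
  close(fds[1]);
  return out;
}
```

The synchronous task never touches the descriptor and the asynchronous task never performs processing; each side can be changed without the other noticing.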
Using the Half-Sync/Half-Async Pattern for Blob Streaming
[Figure: synch task level: Blob Processor; queueing level: Message Queue; lowest level (name lost in extraction): Reactor with Blob Handlers (Event Handlers); steps: 1: handle_input(), 2: recv_request(msg), 3: putq(msg), 4: getq(msg), 5: svc(msg)]
The Acceptor Pattern
Intent
  - Decouple the passive initialization of a service from the tasks performed once the service is initialized
This pattern resolves the following forces for network servers using interfaces like sockets or TLI:
1. How to reuse passive connection establishment code for each new service
2. How to make the connection establishment code portable across platforms that may contain sockets but not TLI, or vice versa
3. How to ensure that a passive-mode descriptor is not accidentally used to read or write data
4. How to enable flexible policies for creation, connection establishment, and concurrency
Structure of the Acceptor Pattern
[Figure: class diagram in three layers: REACTIVE LAYER, CONNECTION LAYER, APPLICATION LAYER]
Reactor
Event Handler (abstract): handle_input()
Acceptor (abstract, parameterized by SVC_HANDLER and PEER_ACCEPTOR): make_svc_handler(), accept_svc_handler(), activate_svc_handler(), open(), handle_input()
Svc Handler (abstract, parameterized by PEER_STREAM): open()
Concrete Acceptor (Concrete_Svc_Handler, SOCK_Acceptor)
Concrete Svc Handler (SOCK Stream): open()
Note attached to the Acceptor:
  sh = make_svc_handler();
  accept_svc_handler (sh);
  activate_svc_handler (sh);
Collaboration in the Acceptor Pattern
[Figure: interaction diagram between Server, acc : Acceptor, : SOCK Acceptor, sh : Svc_Handler, and reactor : Reactor. Endpoint initialization phase: open() (initialize passive endpoint), register_handler(acc) (register handler), get_handle() (extract handle), handle_events() (start event loop), select(). Service initialization phase: handle_input() (connection event), sh = make_svc_handler(), accept_svc_handler (sh), activate_svc_handler (sh) (create, accept, and activate object), open(), register_handler(sh) (register handler for client I/O), get_handle() (extract handle). Service processing phase: handle_input() (data event), svc() (process msg), handle_close() (client shutdown), handle_close() (server shutdown)]
Acceptor factory creates, connects, and activates a
Using the Acceptor Pattern for Blob Streaming
[Figure: passive listener: Acceptor registered with the Reactor; active connections: Blob Handlers (Svc Handlers) and Blob Processor; steps: 1: sh = make_svc_handler(), 2: accept_svc_handler(sh), 3: activate_svc_handler(sh)]
Evaluation and Recommendations
Understand communication requirements and network/host environments
Measure performance empirically before adopting a communication model
  - Low-speed networks often hide performance overhead
Insist CORBA implementors provide hooks to manipulate options
  - e.g., setting socket queue size with ORBeline was hard
Increase size of socket queues to largest value supported by OS
Tune the size of the transmitted data buffers to match MTU of the network
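On a plain BSD socket, the queue sizes are set with the SO_SNDBUF and SO_RCVBUF options. A minimal sketch follows; the helper names are hypothetical, and the operating system may round, scale, or cap the requested value, so the granted size should be read back rather than assumed.

```cpp
#include <sys/socket.h>
#include <unistd.h>

// Request the same size for the send and receive queues of a socket.
// Returns 0 on success, -1 if either request is rejected.
int set_socket_queues(int fd, int bytes) {
  if (setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &bytes, sizeof bytes) != 0)
    return -1;
  if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof bytes) != 0)
    return -1;
  return 0;
}

// Report the size actually granted for SO_SNDBUF or SO_RCVBUF,
// or -1 if the query fails.
int get_queue_size(int fd, int option) {
  int bytes = 0;
  socklen_t len = sizeof bytes;
  if (getsockopt(fd, SOL_SOCKET, option, &bytes, &len) != 0) return -1;
  return bytes;
}
```

For example, `set_socket_queues(fd, 64 * 1024)` requests the 64k setting on a socket before the transfer starts.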
Evaluation and Recommendations (cont'd)
Use IDL sequences rather than IDL strings to avoid unnecessary data access (i.e., strlen)
Use write/read rather than send/recv on SVR4 platforms
Long-term solution:
  - Optimize DOC frameworks
  - Add streaming support to CORBA specification
Near-term solution for CORBA overhead on high-speed networks:
Optimizations
[Figure: CORBA architecture (CLIENT, OBJECT IMPL, DYNAMIC INVOCATION INTERFACE, IDL STUBS, ORB INTERFACE, IDL SKELETON, OBJECT ADAPTER, OBJECT REQUEST BROKER, op(args), GIOP, OS KERNEL, NETWORK ADAPTER) annotated with: TRANSPORT PROTOCOL OPTIMIZATIONS, DATA COPYING OPTIMIZATIONS, OPERATION DEMULTIPLEXING OPTIMIZATIONS, PRESENTATION LAYER OPTIMIZATIONS, I/O SUBSYSTEM OPTIMIZATIONS, NETWORK ADAPTER OPTIMIZATIONS]
To be effective for use with performance-critical applications over high-speed networks, CORBA implementations must be optimized
Obtaining ACE
The ADAPTIVE Communication Environment (ACE) is an OO toolkit designed according to key network programming patterns
All source code for ACE is freely available
  - Anonymously ftp to wuarchive.wustl.edu
  - Transfer the files /languages/c++/ACE/*.gz and gnu/ACE-documentation/*.gz
Mailing list
  - [email protected]
  - [email protected]