• No results found

OBJECTOBJECTREQUEST BROKERREQUEST BROKER DIIDIIORBORB IDLIDL DSIDSI IDLIDL op()op() OBJECTOBJECTIMPLEMENTATIONIMPLEMENTATIONCLIENTCLIENT

N/A
N/A
Protected

Academic year: 2021

Share "OBJECTOBJECTREQUEST BROKERREQUEST BROKER DIIDIIORBORB IDLIDL DSIDSI IDLIDL op()op() OBJECTOBJECTIMPLEMENTATIONIMPLEMENTATIONCLIENTCLIENT"

Copied!
11
0
0

Loading.... (view fulltext now)

Full text

(1)

Optimizations for High Performance ORBs Douglas C. Schmidt (www.cs.wustl.edu/



schmidt)

Aniruddha S. Gokhale (www.cs.wustl.edu/



gokhale) Washington University, St. Louis,

USA.

1

Motivation



Typical state of a airs today is the \Distri- bution Crisis"

{

Computers and networks get faster and cheaper

{

Communication software gets slower, buggier, more expensive



Accidental complexity is one source of prob- lems, e.g.,

{

Incompatible software infrastructures

{

Continuous rediscovery and reinvention of core con- cepts and components



Inherent complexity is another source of prob- lems

{

e.g., latency, partial failures, partitioning, causal ordering, etc.

2

Candidate Solution: CORBA

DII

DII ORB ORB

INTERFACE INTERFACE

OBJECT OBJECT REQUEST BROKER REQUEST BROKER

op() op()

IDL

STUBS IDL

STUBS OBJECT OBJECT

ADAPTER ADAPTER

IDL

SKELETON IDL

SKELETON DSI DSI

in args in args out args + return value out args + return value

OBJECT OBJECT IMPLEMENTATION IMPLEMENTATION CLIENT

CLIENT



Goals

1. Simplify development of distributed applications 2. Provide exible foundation for higher-level services

Observations



CORBA is well-suited for certain commu- nication requirements and certain network environments

{

e.g., request/response or oneway messaging over low-speed Ethernet or Token Ring



However, current CORBA implementations exhibit high overhead for other types of re- quirements and environments

{

e.g., bandwidth-intensive and delay-sensitive ap- plications over high-speed networks



Performance limitations will ultimately im-

pede adoption of CORBA

(2)

Performance Problems with Existing ORBs



Existing ORBs lack certain features (e.g., end-to-end QoS)



Reasons for poor performance

{

Inecient server demultiplexing techniques

{

Long chains of intra-ORB function calls

{

Excessive presentation layer conversions and data copying

{

Non-optimized bu ering algorithms used for net- work reads and writes

{

Improper choice of underlying operating system in- terfaces

{

Lack of integration with advanced "OS" and net- work features

5

Key Research Questions



\Can CORBA be used for performance-sensitive applications on high-speed networks?"

{

Goal is to determine this empirically



\What are the strategic optimizations for Gigabit CORBA"?

{

Goal is to maintain strict CORBA compliance



\What changes are required to provide Real- time CORBA?"

{

Goal is to provide end-to-end QoS guarantees

6

Pinpointing CORBA Overhead



Presentation layer overhead

{

e.g., typed and untyped data



Data manipulation and data copying over- head

{

e.g., message management



Demultiplexing and operation dispatching over- head

{

e.g., layered and de-layered demultiplexing



OS/network/protocol integration

{

e.g., ATM/host adapters, resource reservation and scheduling

General Path of CORBA Requests

STUBS

(

SII

)

OR DII REQUEST

CLIENT SERVER

FRAMING

,

ERROR CHECKING AND INTEROPERABILITY

MARSHALLING

OS LEVEL

INTERFACE METHOD IMPLEMENTATION

FRAMING

,

ERROR CHECKING AND INTEROPERABILITY DEMARSHALLING AND

DEMULTIPLEXING

OS LEVEL

NETWORK

(3)

Experiments Performed



Throughput measurements

{

i.e., using static and dynamic invocation policies



Receiver-side request demultiplexing



Latency measurements



Scalability

{

i.e., under varying number of objects per server

9

Network/Host Environment

BAY NETWORKS BAY NETWORKS

LATTISCELL LATTISCELL ATM SWITCH ATM SWITCH

(16

(16 PORT PORT ,, OC3 OC3 155

155 MBPS MBPS // PORT PORT ,, 9,180

9,180 MTU MTU )) ULTRA

ULTRA SPARC SPARC 22

(( ENI ATM ENI ATM ADAPTORS ADAPTORS AND ETHERNET AND ETHERNET ))

10

TTCP Con guration for CORBA Implementation

ATM ATM SWITCH SWITCH

2: forward 2: forward

TTCP TTCP Impl Impl

3: send(buf) 3: send(buf)

4: ack 4: ack

Sender

Sender 1: send(buf) 1: send(buf)

TTCP TTCP Skel Skel TTCP

TTCP Stub Stub

Experimental Setup for Throughput Measurements



Enhanced version of TTCP

{

TTCP measures end-to-end data/request transfer

{

Enhanced version compares Orbix 2.1 and Visi- Broker 2.0



Parameters varied

{

64MB of typed data

.

Types included sequences of scalars and structs

{

Sender bu er sizes ranged from 1K to 1024K

{

Socket queues were 8k (default) and 64k (maxi- mum)

{

Network was 155 Mbps ATM and \loopback"

(4)

Throughput Measurements using Static Invocation Interface



www.cs.wustl.edu/



schmidt/SIGCOMM

,

96.ps.gz

13

Orbix SII 'Blackbox' Performance

0 20 40 60 80 100 120

0 20 40 60 80 100 120 140

Throughput in Mbps

Sender Buffer size in KBytes

short long double char octet struct

14

VisiBroker SII 'Blackbox' Performance

0 20 40 60 80 100 120

0 20 40 60 80 100 120 140

Throughput in Mbps

Sender Buffer size in KBytes

short long double char octet struct

Orbix High-Cost Functions (sender side)



Orbix sequences for 128K user bu er

Test Time %Exec Name

(msec)

--- scalars 6,091 86.00 write

895 13.00 memcpy struct 32,713 72.00 write

1,177 2.58 NullCoder::codeLongArray

978 2.15 Request::op<<(double&)

952 2.09 BinStruct::encodeOp

952 2.09 Request::encodeLongArray

922 2.02 Request::insertOctet

922 2.02 Request::op<<(short&)

922 2.02 Request::op<<(long&)

922 2.02 Request::op<<(char&)

838 1.84 NullCoder::codeDouble

755 1.66 NullCoder::codeLong

671 1.47 memcpy

(5)

Orbix Whitebox Analysis of Throughput using SII

read

OS Kernel ttcp_sequence_i::

sendStructSequence

CORBA::Request::

send

write

OS Kernel

NETWORK

CLIENT SERVER

ttcp_sequence::

sendStructSequence

CORBA::Request::

invoke

FRFInterface::send

OrbixChannel::

sendMsg

OrbixTCPChannel::

transmitBuffer

send OrbixTCPChannel::

receiveBuffer OrbixChannel::

readMsg Channel::

processIncomingMsg FRRInterface::dispatch

ContextClassS::

dispatch ttcp_sequence_dispatch::

dispatch

OS LEVEL INTERFACEMETHODIMPLEMENTATION

OS LEVEL

0.01%

15%

0.01%

75-80%

CLIENT SEND

0.01%

MARSHALLING

20-25%

0.01%

72%

FRAMING, ERROR CHECKING (NO IIOP INTEROPERABILITY) FRAMING,ERROR CHECKING(NO IIOPINTEROPERABILITY) DEMARSHALLINGDEMULTIPLEXING

17

VisiBroker High-Cost Functions (sender side)



VisiBroker sequences for 128K user bu er

Test Time %Exec Name

(msec)

--- scalars 6,214 99.00 write

struct 37,960 72.00 write

3,831 7.28 op<<(NCostream&,BinStruct&) 3,594 6.83 memcpy

979 1.86 PMCIIOPStream::op<<(double) 951 1.81 PMCIIOPStream::put

951 1.81 PMCIIOPStream::op<<(long) 666 1.27 PMCIIOPStream::op<<(short)

18

VisiBroker Whitebox Analysis of Throughput using SII

PMCIIOPstream::

_read PMCIIOPStream::

receiveMessage

read

OS Kernel ttcp_sequence_i::

sendStructSequence

PMCSkelInfo::

execute

PMCBOAClient::

request

PMCBOAClient::

ProcessMessasge

PMCIIOPstream::

_write CORBA::Object::

_send

PMCStubInfo::

send

PMCIIOPStream::

sendMessage

writev

OS Kernel

NETWORK

CLIENT SERVER

_sk_ttcp_sequence::

_sendStructSequence

ttcp_sequence::

sendStructSequence

INTERFACE

METHOD

IMPLEMENTATION FRAMING,ERROR CHECKINGINTEROPERABILITY OS LEVEL DEMARSHALLINGDEMULTIPLEXING

0.01%

12%

11%

70-75%

CLIENT SENDMARSHALLING

FRAMING, ERROR CHECKING INTEROPERABILITYOS LEVEL

0.01%

15-20%

5%

72%

Throughput Measurements using Dynamic Invocation and Dynamic

Skeleton Interface



www.cs.wustl.edu/



schmidt/GLOBECOM

,

96.ps.gz

(6)

Orbix DII 'Blackbox' Performance

0 20 40 60 80

0 20 40 60 80 100 120 140

Throughput in Mbps

Sender Buffer size in KBytes

short long double char octet struct

21

VisiBroker DII 'Blackbox' Performance

0 20 40 60 80

0 20 40 60 80 100 120 140

Throughput in Mbps

Sender Buffer size in KBytes

short long double char octet struct

22

VisiBroker DII Client Datapath

CORBA_Request::

send_oneway

CORBA_Request::

_invoke

CORBA_Request::

_marshal_out

CORBA_Object::

_send

CORBA_Any::write_value

_write_value(NCostream&, TypeCode&) _write_value(NCostream&, TypeCode&,MarshalInBuf)

Marshal individual fields of the struct

memcpy

PMCStubInfo::send

PMCIIOPStream::

sendMessage

PMCIIOPstream::_writev

CORBA::MarshalInBuffer::

get

memcpy writev

OS Kernel

Orbix High-Cost Functions



Orbix sequences for 128K user bu er

Test Time %Exec #Calls Name

(msec)

--- scalars 27,864 98.14 513 _libc_write

352 1.24 2,560 memcpy

struct 25,740 12.38 44,085,834 cleanfree 18,893 9.09 46,190,107 _free_unlocked 14,328 6.89 44,092,964 realfree

9,710 4.67 4,195,328 count 7,522 3.62 46,190,159 malloc 6,461 3.11 46,190,158 new

4,855 2.34 2,097,664 CORBA::typeCode

::putValue

(7)

VisiBroker High-Cost Functions



VisiBroker sequences for 128K user bu er

Time %Exec #Calls Name

(msec)

--- octets/chars

3,630 90.50 512 writev

353 8.81 8,195 memcpy

doubles

1,413 27.81 8,414,220 memcpy

1,229 24.21 512 writev

1,173 23.09 8,388,608 CORBA::MarshallInBuffer ::operator>>(double&) 880 17.32 8,389,632 CORBA::MarshallInBuffer

::get(char*)

352 6.93 512 CORBA::MarshallInBuffer ::get(double*)

25

VisiBroker High-Cost Functions (cont,d.)

Time %Exec #Calls Name

(msec)

--- longs

2,346 29.41 16,777,216 CORBA::MarshallInBuffer ::operator>>(long&) 1,883 23.60 16,817,163 memcpy

1,760 22.06 16,777,728 CORBA::MarshallInBuffer ::get(char*)

1,249 15.65 512 writev

704 8.82 512 CORBA::MarshallInBuffer ::get(long*)

shorts

4,693 33.87 33,554,422 CORBA::MarshallInBuffer operator>>(short&) 3,520 25.40 33,554,944 CORBA::MarshallInBuffer

::get(char*) 2,824 20.38 33,627,659 memcpy

1,408 10.16 512 CORBA::MarshallInBuffer ::get(short*)

1,367 9.86 512 writev

26

VisiBroker High-Cost Functions (cont,d.)

Time %Exec #Calls Name

(msec)

--- structs

7,279 25.96 9,216 writev

5,200 20.00 2,796,032 various marshalling functions

4,080 14.55 39,321,438 memcpy

3,188 11.37 16,776,192 write_value(NCOstream&, CORBA:::TypeCode*, CORBA::MarshallInBuffer) 2,053 7.32 19,572,736 CORBA::MarshalInBuffer

::get(char*) 1,702 6.07 2,796,032 _write_struct_value

(NCOstream, TypeCode) 1,466 5.23 13,980,160 CORBA::TypeCode::

member_type 1,408 5.02 16,776,192 CORBA::TypeCode::

member_count

VisiBroker DSI 'Blackbox' Performance

0 5 10 15 20 25 30

0 20 40 60 80 100 120 140

Throughput in Mbps

Sender Buffer size in KBytes

short

long

double

char

octet

struct

(8)

Measuring Server-side Request Demultiplexing Overhead



An interface with N (N is large, say 100) methods de ned



Method names chosen arbitrarily



In each iteration, client invokes the nal method M times (M is large, say 100)



Repeat the experiment 100, 500, 1,000 times

29

Client-side Latency in seconds

Version Iterations

1 100 500 1,000

---

Orbix 0.032 5.23 31.86 64.21

VisiBroker 0.010 4.53 24.44 50.06 ---

30

Demultiplexing in ORBs

2: DEMUX TO

I/O HANDLE

OBJECTOBJECT N::N:: METHODMETHOD KK

OBJECTOBJECT N::N:: METHODMETHOD 11

OBJECTOBJECT 1::1:: METHODMETHOD KK

OBJECTOBJECT 1::1:: METHODMETHOD 22

...

...

... ... ... ... ... ... ...

OBJECTOBJECT 1::1:: METHODMETHOD 11

DE

DE -- LAYERED REQUEST LAYERED REQUEST DEMULTIPLEXER DEMULTIPLEXER

METHODMETHOD KK

METHODMETHOD 22

...

...

METHODMETHOD 11 ...

...

...

...

...

...

...

IDL IDL SKEL

SKEL 11 SKEL SKEL IDL IDL 22 SKEL SKEL IDL IDL M M

OBJECT ADAPTER

OBJECT

OBJECT 11 OBJECT OBJECT 22 OBJECT OBJECT N N 4: DEMUX TO

SKELETON

5: DEMUX TO METHOD

3: DEMUX TO METHOD

1: DEMUX THRU PROTOCOL STACK

3: DEMUX TO OBJECT

OBJECT ADAPTER LAYERED

DEMULTIPLEXING

DE

DE -- LAYERED LAYERED DEMULTIPLEXING DEMULTIPLEXING

2: DEMUX TO

I/O HANDLE

1: DEMUX THRU PROTOCOL STACK

Demultiplexing and Dispatching Overhead in ORBs



Both Orbix and VisiBroker pass the method name in the request



Passing a string adds to the amount of in- formation carried in the request header



Orbix uses strcmp and linear search on the server side to demultiplex incoming request



For very large interfaces, this is undesirable

(9)

Hand-crafted Optimizations



For both the ORBs, hash the method name to a numeric value and pass this information in the request header



For Orbix, use numeric comparisons using the hashed value for method name lookup

33

Client-side Latency in seconds

Version Iterations

1 100 500 1,000

---

Orbix 0.032 5.23 31.86 64.21

Optimized Orbix 0.028 4.21 29.97 60.67

VisiBroker 0.010 4.53 24.44 50.06

Optimized VisiBroker 0.010 3.86 24.04 49.36 ---

34

Latency Measurements over ATM



Latency measurements for twoway methods



Methods are of two types:

1. Parameterless methods

2. Methods that send a sequence of octets and structs of primitive data types



Sequence parameter takes range of bu er sizes from 1 to 1,024 units of the speci c data type

Latency for Parameterless Method Invocation

0.0 0.5 1.0 1.5

Latency in msec

CORBA (Orbix) CORBA (Visigenic) TCP/IP (ACE)

(10)

Latency for Methods Sending Sequences

0.0 200.0 400.0 600.0 800.0 1000.0

Units sent (1 struct unit = 32 bytes, 1 octet unit = 1 byte) 0.0

1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0 10.0 11.0 12.0 13.0 14.0 15.0

Latency in msec

CORBA (Orbix) structs CORBA (Visigenic) structs TCP/IP (ACE) structs CORBA (Orbix) octets CORBA (Visigenic) octets TCP/IP (ACE) octets

37

Optimizations for High Performance ORBs



Lack of integration with advanced OS and network features:

{

Provide hooks to utilize OS features such as Real- Time scheduling, Multi-threading, and zero-copy bu er management



Inecient server demultiplexing techniques:

{

Use delayered demultiplexing and ecient packet lters



Long chains of intra-ORB function calls:

{

Use Integrated Layer Processing and Inlining

{

Use compiler techniques to automate this

38

Optimizations for High Performance ORBs



Excessive presentation layer conversions and data copying:

{

Use ecient stub compilers

{

Achieve an optimal tradeo between compiled ver- sus interpreted stubs



Non-optimized bu ering algorithms used for network reads and writes:

{

Use optimal bu er sizes and ow control

Gigabit CORBA Optimizations

REQUEST DEMULTIPLEXING

OPTIMIZATIONS COMMUNICATION

PROTOCOL OPTIMIZATIONS PRESENTATION

LAYER OPTIMIZATIONS

NETWORK ADAPTER OPTIMIZATIONS

DII

DII ORB ORB

INTERFACE INTERFACE

OBJECT OBJECT REQUEST BROKER REQUEST BROKER

op() op()

IDL IDL

STUBS STUBS

OBJECT OBJECT ADAPTER ADAPTER

IDL

IDL

SKELETON SKELETON

DSI DSI

in args

in args out args + return value out args + return value

OBJECT OBJECT IMPLEMENTATION IMPLEMENTATION CLIENT

CLIENT

DATA COPYING OPTIMIZATIONS

OS I OS I

//

OO SUBSYSTEM SUBSYSTEM

OS I OS I

//

O SUBSYSTEM

NETWORK

NETWORK

NETWORK

NETWORK ADAPTER ADAPTER

NETWORK NETWORK ADAPTER ADAPTER

I/O

SUBSYSTEM OPTIMIZATIONS

(11)

Real-time CORBA

...

...

... ... ... ... ... ... ...

GIGABIT I/O SUBSYSTEM GIOP/IIOP END-TO-END TRANSPORT

OBJECT ADAPTER

OBJECTOBJECT N::N:: METHODMETHOD KK

OBJECTOBJECT N::N:: METHODMETHOD 11

OBJECTOBJECT 1::1:: METHODMETHOD KK

OBJECTOBJECT 1::1:: METHODMETHOD 22

OBJECTOBJECT 1::1:: METHODMETHOD 11

DE

DE -- LAYERED REQUEST LAYERED REQUEST DEMULTIPLEXER DEMULTIPLEXER

ZERO ZERO COPY COPY BUFFERS BUFFERS

APPLICATION SPECIFIC

CODE

41

Integrated View of CORBA Optimizations

REAL-TIME REQUEST SCHEDULING

QUEUES REAL

REAL--TIME THREADSTIME THREADS AND UPCALLS AND UPCALLS

ZERO ZERO COPY COPY BUFFERS BUFFERS

REAL-TIME OBJECT ADAPTER

ATM PORT INTERFACE CONTROLLER

(APIC)

GIGABIT I/O SUBSYSTEM

VC

VC11 VCVC22 VCVC33 VCVC44 VCVC55

...

OBJECTOBJECTN::N::METHODMETHOD11

OBJECTOBJECT1::1::METHODMETHOD22

OBJECTOBJECT1::1::METHODMETHOD11

DE

DE--LAYERED REQUESTLAYERED REQUEST DEMULTIPLEXER DEMULTIPLEXER

APPLICATION SPECIFIC

& CORBA CODE SERVICES

REAL-TIME GIOP/IIOP PROTOCOLS

TCP/IP ATM RTP

ATM NETWORK

...

...

... ... ... ...

OBJECTOBJECTN::N::METHODMETHODKK

OBJECTOBJECT1::1::METHODMETHODKK

PRESENTATION LAYER PRESENTATION LAYER Q

QOOSS SPECIFICATIONS SPECIFICATIONS

42

Concluding Remarks



CORBA is a promising architecture for dis- tributed computing



Conventional CORBA implementations are not tuned for high-performance or real-time systems

{

Note, low-speed networks often hide performance overhead



Ultimately, an integrated approach is the best solution



Optimizations must be applied at multiple layers

{

e.g., network/OS/protocol/ORB

References

Related documents

Structs &amp; Alignment CSE 351 Spring 2021 Structs &amp; Alignment CSE 351 Spring 2021 Instructor: Ruth Anderson Teaching Assistants: Allen Aby Joy Dang

[r]

Support for Location Forwarding The location forward mechanism is used by the Orbix 3.3 daemon and the Orbix E2A locator service to direct clients to the true location of a

&gt; Reduces long lists of ideas &gt; Reduces long lists of ideas &gt; Identifies important items &gt; Identifies important items •• Nominal Group Technique Nominal Group Technique

(i) State the colour change you would expect at the end-point of a titration when sulfuric acid is added to potassium hydroxide using

Architecture (CORBA) is an international standard for an Object Request Broker - middleware to manage communications between distributed objects. •  Several implementation of

This type of client application uses C++ environmental objects to access the CORBA objects in an Oracle Tuxedo domain and the CORBA C++ Object Request Broker (ORB) to process

Based on the English translation version of Six Chapters of a Floating Life, the study analyses the manifestation of traditional Chinese culture and methods to