• No results found

SemCast: Semantic Multicast for Content-based Data Dissemination

N/A
N/A
Protected

Academic year: 2021

Share "SemCast: Semantic Multicast for Content-based Data Dissemination"

Copied!
23
0
0

Loading.... (view fulltext now)

Full text

(1)

SemCast:

Semantic Multicast for

Content-based Data

Dissemination

Olga Papaemmanouil Uğur Çetintemel

(2)

Wide Area Stream Dissemination

Applications

Network monitoring

Real-time financial

services/enterprise

News services (RSS feeds)

Characteristics

High data volume

Highly dispersed sources &

destinations

Low latency delivery

Data Sources Clients

(3)

Content-based Dissemination

High level of expressiveness

Profiles are query predicates

Sources/destinations decoupling

(4)

Content-based Pub/Sub

Content-based routing

Predefined acyclic overlay

network

SIENA [Carzaniga et al., 2001],

Gryphon [Banavar et al., 1999]

Predicate-based filtering

network

Upstream profile aggregation

Limitations

Processing cost at each hop

(5)

Processing cost overhead

At each level:

 Content-based filtering at each level

 Maintain data structures at every

broker

 Compression/decompression

 Usually with XML streams

 Encryption/decryption

 Integrity-critical applications (e.g.,

financial feeds, distributed games)

Forwarding costs could dominate

dissemination costs

(6)

Bandwidth overhead

Missed tree optimization

opportunities

Clients’ profiles could

overlap

One spanning tree is

suboptimal

Intermediate nodes may

receive irrelevant data

S1 S

1

Cost = 13 msg/time unit Max Latency= 3 hops

Cost = 10 msg/time unit Max Latency =3 hops

(7)

Our approach: Semantic Multicast

Constructs multiple content-based (a.k.a. semantic)

dissemination channels

 Semantic split of incoming streams

 Channels characterized by their content  Independent overlay dissemination trees

(8)

SemCast’s advantages

Low processing cost

Eliminates local filtering at interior brokers

Low processing requirements for

intermediate nodes

Low overall bandwidth requirements

Low-cost dissemination trees

Enables latency-cost trade off

Delay-bounded trees for latency-sensitive

applications

(9)

Content-based Channelization

SemCast decides:

 Number of channels  Content of channels

 Clients subscriptions to channels  Channel implementation

Operational goals

 No false exclusion

 Low run-time cost : Overall bandwidth consumption

Minimize redundancy among channels’ contents

Create efficient dissemination trees

(10)

System Model

 Source brokers (S)

 Receive streams from

publishers

 Gateway brokers (GB)

 Receive profiles from

subscribers

 Rendezvous points (RP)

 Roots of channels

 Interior brokers (I)

 Forward incoming

messages  Coordinators

 Identify content of Subscribers

S S S RP RP RP GB Coordinator Publishers I I I GB GB GB GB

(11)

SemCast overview

Subscription management

 Join existing channels or create new channels

Periodically reorganize channels

 Exploits overlap among profiles

 Content of channel defined by profiles assigned to it

 Assign similar profiles to same channel  Identify overlap between profiles

 Statistical & syntactical information  Discover containment relations

(12)

SemCast discovers

containment

hierarchies

P

j

contains

P

i

Messages matching P

i

are subset of

those matching P

j

Profile syntax reveals containment

relations

Statistics approximate containment

relations

Hierarchies are possible semantic

Containment relations

P

j

P

i

(13)

Cost-based Channelization

Containment hierarchies might cause high

data redundancy

High message rate for non-overlapping part

Small message rate for overlapping part

Partial overlapping profiles may increase

cost

Assign similar profiles to different channels

Forward matching messages to more than one

channel

(14)

Cost-based containment relations

Place profiles in the same hierarchy (channel) if

it

improves bandwidth consumption

If P

j

covers P

i

, compare cost

Pi and Pj in same channel

Pi on different channel

 Apply lowest-cost scenario

Cost estimation

 Statistics on message rate

 Approximate number of edges

(15)

Hierarchy Merging

Merging hierarchies with partial overlap reduces cost

 Use a cost-based model

 Send non overlapping part through “noise” channels

E2 : x:[15,..,50] E1 : x:[25,..,55]

Destinations D1

(16)

Hierarchy Merging

Merging hierarchies with partial overlap reduces cost

 Use a cost-based model

 Send non overlapping part through “noise” channels

E1 ∧ E2: x:[25,..,50] E1 ∧ ⌐ E2: x:[50,..,55] 25 50 55 15 Destinations D1 Destinations D1+ D2 Destinations D2 E1 ∧ E2 x:[15,..,25]

(17)

Incremental tree construction

Base low cost heuristic

 Request channel’s gateway broker from RP

 Find min-cost path to all destinations in the channel  Connect to closest one

 Incremental Steiner tree

Delay-bounded trees

 Find min-cost path to one broker in the tree that covers delay

bound. C RP A C S B C

(18)

Experimental study

Metrics

 Processing cost  Bandwidth efficiency 

Approaches

Unicast approach

SPT: Shortest Path Tree approach

Distributed pub-sub system [Carzaniga et al., 2001]

SemCast: Distributed content-based channelization

SemCast-O: Centralized Steiner tree construction

Network environment

(19)

0 10 20 30 40 50 60 70 80 90 100 0.02 0.06 0.18 0.5 profile selectivity % c os t d eg ra da tio n ov er S em C as t-O Unicast SPT SemCast

Bandwidth

efficiency-Disjoint profiles

SemCast’s trees are close to Steiner trees

Network size = 600 nodes Profiles= 7000

less distinct profiles higher bandwidth consumption

(20)

Bandwidth

efficiency-Overlapping profiles

-170 -140 -110 -80 -50 -20 10 40 70 0.05 0.25 0.45 0.65 0.85 1

aggregated profile overlap

% c os t i m pr ov em en t o ve r S P T Unicast SemCast Network size =600 Profiles =7000 Profile selectivity =2%

SemCast performs better than SPT with overlapping profiles

higher partial overlap higher bandwidth improvement

(21)

Scalability

0 10 20 30 40 50 60 0.05 0.25 0.45 0.65 0.85 1

aggregated profile overlap

% c o s t im p ro v e m e n t o v e r S P T SemCast (N=300) SemCast (N=400) SemCast (N=600) N = Network size Profiles =5 × N

higher partial overlap higher bandwidth improvement

(22)

Related Work

Publish-Subscribe Systems

 Distributed approaches: Content- based Routing

Gryphon [Opyrchal et al., 2000]SIENA [Carzaniga et al., 2001]ONYX [Diao et al, 2004]

 Centralized approaches: XML Filtering

XFilter [Altinel et al., 2000] , YFilter [Diao et al., 2002] XTrie [Chan et al., 2002]

Application-Level Multicast

SplitStream [Castro et al., 2003]SCRIBE [Castro et al., 2002]

(23)

Conclusions & Ongoing Work

SemCast

 Performs semantic split of incoming streams  Eliminates local filtering in interior brokers  Improves bandwidth consumption

Ongoing work

 SemCast prototype

 More expressive profiles

 Statefull subscriptions

References

Related documents

1) Aristotle CR Question Bank 2) US B-Schools Ranking 2012 3) Quant Concepts & Formulae 4) Global B-School Deadlines 2012 5) OG 11 & 12 Unique Questions’ list.. 6)

As Skute is intended to be used with real-time applications, a query should not be routed through multiple servers before reach- ing its destination. Routing has to be

During the processing of cluster system, each web server’s load is changing as time goes on, the system has to estimate the load-balance according to the real-time server load,

 The insured may assign the benefit of payment not to exceed $3,000 to a vendor providing services or materials to mitigate or repair damage directly arising from a covered

Ministry Studies Certificate – awarded to all students who complete at least Baptist Church History, Ministry of Worship, and Doctrine of the Church and earn an approved M.T.S..

In Figure 4 presents summary data of the dynamics of the number of children with disabilities in general education institutions of Rivne and educational and rehabilitation

Final game project classes give students a more holistic view of computer science and teach or reinforce a variety of skills, including computational thinking skills, software

Examining these elements helps identify two distinct processes in the transformation of representations of the local population, both linked to political interests