(1)

Maximizing Parallelism in NP

Ran Giladi

EZchip Technologies Ltd.,

Communication Systems Engineering

Ben-Gurion University, Israel

(2)

Outline

Why Network Processors?

What are Network Processors?

How does the networking environment affect the performance and parallelism of NPs?

What NP architectures are there to support parallelism (and performance)?

What is good for what?

Questions?

(3)

Network growth: Macro view

(4)

Search requirements: Micro view

(5)

Networking requirements

Exponential growth – higher than processing growth

Classifying, lookups, (modifications), forwarding

Multiprocessors and parallelism are a must

Von Neumann (SISD, MIMD, SIMD, MISD) / Dataflow

NoCs provide switching, not processing

Traffic managers, switch fabrics, packet processors, GP multi-core, co-processors, search engines, application processors,…

(6)

NP Typology

Low entry (<1Gbps) – Access

Wintegra, Agere, PMC-Sierra, Intel (IXP300)

Medium level (2-5Gbps) – Legacy and multi service, service cards, data center and L7 applications

Legacy: AMCC, Intel, C-Port, Agere, Vitesse, IBM (Hifn)

Multi core: Cavium, RMI, Broadcom, Silverback, Chelsio

High end (10-40Gbps) – Metro, line cards

EZchip, Xelerated, Sandburst, Bay Micro

(7)

Performance

Black box (system wise): Throughput (MPPS), latency, implementation issues (power, cost)

White box (chip level):

Internal aggregated bandwidth (Gbps)

Internal memory, inter-processors

External aggregated bandwidth (Gbps)

mainly memories for frames, rules DB, look-aside data & statistics

Operations (& searches) per second

How much “real work” rate is achieved?

Or, how sufficient is a wire from ingress to egress…
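
The black-box throughput metric (MPPS) can be related to line rate with a quick calculation. The sketch below is illustrative only and rests on assumptions the slides do not state: Ethernet framing, 64-byte minimum frames, and 20 bytes of per-frame wire overhead (8-byte preamble plus 12-byte inter-frame gap). The function names `packets_per_second` and `per_packet_budget_ns` are hypothetical helpers.

```python
def packets_per_second(line_rate_bps, frame_bytes=64, overhead_bytes=20):
    """Worst-case packet rate on a link carrying back-to-back frames.

    overhead_bytes is the assumed Ethernet preamble (8) plus inter-frame gap (12).
    """
    bits_on_wire = (frame_bytes + overhead_bytes) * 8
    return line_rate_bps / bits_on_wire


def per_packet_budget_ns(line_rate_bps, frame_bytes=64, overhead_bytes=20):
    """Time available per packet, in nanoseconds, before the next one arrives."""
    return 1e9 / packets_per_second(line_rate_bps, frame_bytes, overhead_bytes)


if __name__ == "__main__":
    # Line rates taken from the NP typology ranges (1, 10 and 40 Gbps).
    for gbps in (1, 10, 40):
        rate = gbps * 1e9
        mpps = packets_per_second(rate) / 1e6
        budget = per_packet_budget_ns(rate)
        print(f"{gbps:>2} Gbps: {mpps:.2f} Mpps, {budget:.1f} ns per packet")
```

Under these assumptions a 10 Gbps port delivers about 14.88 Mpps, which leaves roughly 67 ns per packet. That is the budget into which classification, lookups, modification and forwarding must fit, or be spread over parallel units.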

(8)

Parallelism: Enablers

Multiprocessors - the usual suspects

Hardware assists – essential in packet processing

Traffic managers

Search and classification engines

Data paths – critical for lots of parallel, asynchronous processing

Schedulers – resource optimization, and usage transparency

(9)

Parallelism: NP

Search engines & traffic managers - again

Packet independence

Line card apps vs. L7 and stateful processing

Implications: buffer usage, processing stall, functionality

Instruction set for packet processing

Heterogeneous vs. homogeneous processors

General purpose vs. optimized processors

“Work” per instruction

Flexibility – changing networking tasks & protocols

Tight real-time constraints

Parallelism transparency and ease of use – programming model

(10)

NP architectures

Homogeneous parallelized processors (µ/pico engines)

IBM (Hifn), Intel, Multi-cores (mainly MIPS), C-Port

Pipeline of homogeneous processors

Xelerated, C-Port, Intel

Pipeline of heterogeneous processors (Agere)

Parallel pipelines of homogeneous processors (Cisco)

(11)

Homogeneous parallelized processors

IBM NP Architecture

(12)

Zooming in – IBM NP

(13)

IBM NP - MPSoC

EPC – Embedded Processor Complex

ePwrPC – embedded PowerPC Complex

TSE – Tree Search Engine

PMM – Physical MAC Multiplexer

DASL – Data-Aligned Synchronous Link

(14)

Pipeline of homogeneous processors

Xelerated architecture

(15)

Pipeline of heterogeneous processors

Agere’s PayloadPlus architecture

[Figure: block diagrams. Labels in the first diagram: Agere System Interface (ASI), Fast Pattern Processor (FPP), Routing Switch Processor (RSP), Configuration Bus, µP. Labels in the second diagram: AP550 State Engine, Classification processor, Packet Modifier and Traffic Manager, µP.]

(16)

Parallel pipelines

Cisco’s Toaster Architecture

Express Micro Controller (XMC)

(dual instruction issue execution unit)

(17)

Pipelines of parallel processors

EZchip Architecture

[Figure: NP-1c block diagram. Labels: MAC, TOPparse engines, TOPsearch I engines, TOPresolve engines, TOPsearch II engines, TOPmodify engines, link/switch fabric queuing, memory blocks, external memory.]

(18)

Pipeline or Parallel?

Programming simplicity

Processing flexibility

Critical to NPs for adopting new protocols, feature sets, or applications

Determined by large code space, scalable

Excellent in parallel, restricted in pure pipeline

[Comparison table, columns: Parallel, Pipeline. Entries:]

Control and data hazards

Shared resources and semaphores

Slowest stage determines throughput

Packet ordering

Pipeline balancing (HW/SW)
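
The point that the slowest stage determines pipeline throughput, and that balancing matters, can be illustrated with a small idealized model. This is a sketch under stated assumptions, not a model taken from the slides: per-stage processing times are fixed, a pipeline has one processor per stage, and the parallel (run-to-completion) pool ignores contention for shared resources, semaphores and packet reordering. All function names and stage times are hypothetical.

```python
def pipeline_throughput(stage_times_ns):
    """Packets per ns for a pipeline: bounded by its slowest stage."""
    return 1.0 / max(stage_times_ns)


def pipeline_utilization(stage_times_ns):
    """Fraction of processor time doing useful work in a pipeline.

    Every stage waits for the slowest one, so unbalanced stages idle.
    """
    n_stages = len(stage_times_ns)
    return sum(stage_times_ns) / (n_stages * max(stage_times_ns))


def parallel_throughput(stage_times_ns, n_processors):
    """Packets per ns for a pool where each processor runs all stages.

    Idealized: assumes no contention for shared resources (an assumption).
    """
    return n_processors / sum(stage_times_ns)


if __name__ == "__main__":
    unbalanced = [10, 20, 10, 40]  # hypothetical per-stage times in ns
    balanced = [20, 20, 20, 20]    # same total work, evenly split
    for name, stages in (("unbalanced", unbalanced), ("balanced", balanced)):
        print(name,
              "pipeline:", pipeline_throughput(stages),
              "utilization:", pipeline_utilization(stages),
              "parallel x4:", parallel_throughput(stages, len(stages)))
```

With the unbalanced stage times, the pipeline is held to the 40 ns stage and only half of the processor time is useful, while four run-to-completion processors reach twice that rate in this idealized model. With balanced stages the two organizations match, which is why pipeline balancing appears as a cost on the pipeline side.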

(19)

Heterogeneous or Homogeneous?

Programming simplicity

Similar instruction set

Flexibility in function locations

Silicon efficiency

Unique capabilities are either restricted or wasted

Unique hardware accelerators are still a must

(20)

Simulations

What to measure?

Simulation models (OMNeT++) of pure parallel (Intel), pure pipeline (Xelerated), and pipeline of parallel processors (EZchip) were used to evaluate processor utilization

Results are highly dependent on the networking application

(21)

What is good for what

Multicore:

Greater flexibility

Applications, multi services, L7 tasks

=> service cards

Deterministic Pipeline:

Classifying, forwarding

=> line cards

Run to completion Pipeline:

Deep packet inspection, classifying, L4-L7 apps

=> line cards, service cards

(22)

Conclusions

Parallelism enables maximal utilization of silicon for providing processing power (work rate = packets per second multiplied by processing per packet)

Heterogeneous, task-optimized processors with a similar instruction set, organized in a pipeline of parallel units, enable maximum parallelism while preserving a simple programming model.
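
The work-rate definition above (packets per second multiplied by processing per packet) can be written out directly. The sketch below is a minimal illustration, assuming that processing per packet is measured in instructions and that every processor sustains a fixed instruction rate; neither assumption comes from the slides, and the function names and numbers are hypothetical.

```python
import math


def work_rate(packets_per_second, instructions_per_packet):
    """Work rate = packets per second x processing per packet.

    Processing is counted in instructions here (an assumption).
    """
    return packets_per_second * instructions_per_packet


def processors_needed(packets_per_second, instructions_per_packet,
                      instructions_per_second_per_processor):
    """Minimum number of fully utilized processors for the given work rate."""
    required = work_rate(packets_per_second, instructions_per_packet)
    return math.ceil(required / instructions_per_second_per_processor)


if __name__ == "__main__":
    # Hypothetical figures: 10 Mpps, 200 instructions per packet,
    # processors that each sustain 500 million instructions per second.
    print(work_rate(10e6, 200))
    print(processors_needed(10e6, 200, 500e6))
```

The processor count is a lower bound: it assumes full utilization, so any stall, imbalance or contention raises the number of processors actually required.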

(23)

References (& origin of title)

Shah, N. & Keutzer, K., “Network Processors: Origin of Species”, Proc. of ISCIS XVII, 2002.

Tullsen, D.M., Eggers, S.J., & Levy, H.M., “Simultaneous Multithreading: Maximizing On-Chip Parallelism”, Proceedings of ISCA XXII, 1995.
