
ARM Cortex-A9 MPCore Multicore Processor Hierarchical Implementation with IC Compiler


(1)

ARM Cortex-A9 MPCore Multicore Processor

Hierarchical Implementation with IC Compiler

DAC 2008

Philip Watson

Implementation Environment Program Manager, ARM Ltd

(2)

Background - Who Are We?

• Processor Division, Cores Implementation, ARM-India. This team is actively involved in processor development benchmarking
• The team has been working alongside the development of the microarchitecture of the ARM® Cortex-A9 processor since early development and test
• The aim of this effort is to showcase:
  • Power consumption
  • Performance
  • Area
• The effort is focused on making the Cortex-A9 processor core a deployable embedded solution

(3)

Partnership Through the Design Chain

Processors: The CPU is at the heart of the system-on-chip

Reference Methodology: The RM ties all this together, piloting the route from RTL to Silicon

Fabric & Physical IP: SoCs require high performance fabric and quality physical IP

EDA Tools: EDA tools provide the environment to exploit this IP

Mutual Customers:
• We partner with silicon foundries to provide diversity of SoC implementation and manufacturing choice
• We work with major EDA companies to ensure our IP works seamlessly

(4)

Cortex-A9 MPCore

Multicore Solutions

The relative performance and power range of an ARM processor enabled by its ARM Physical IP

[Figure: performance (MHz) across three physical IP platforms. Density Optimized Platform: 15% lower power, higher density. Mainstream Platform. Performance Platform: 15% CPU performance boost.]

(5)

Challenges with Cortex-A9 MPCore

• Implementation run time with all EDA tools is a key challenge for design closure, particularly with scalable performance processor designs
• Iteration time increases as the design size increases
• The iterations influence our ability to turn around floorplan changes, tailor optimizations, and debug constraints and design feedback; this is key to converging results

[Figure: gate count and run time for Cortex-A9 MP 1x, 2x and 4x with Neon; vertical axis from 0.0 to 6.0]

(6)

Challenges with Cortex-A9 MPCore

• Implementation of 1 CPU vs 4 CPU Cortex-A9 with flat flow

Configuration | 1 CPU, 1 Neon, 32K D$, 32K I$, 32 interrupts | 4 CPU, 4 Neon, 32K D$, 32K I$, 32 interrupts
Process Technology | TSMC CLN65LP | TSMC CLN65LP
Standard Cell Library | 12Track – Nominal VT | 12Track – Nominal VT
Memory Library | Optimized fast cache instances | Optimized fast cache instances

• The 4 CPU solution gives:
  • A significant increase in run time
  • Potentially some drop in performance (frequency)

…as compared to a 1 CPU implementation.

(7)

Hierarchical Implementation with IC Compiler

For faster TTR

Cortex-A9 cpu0: Placement (X Hrs), CTS (Y Hrs), Routing (Z Hrs)
Cortex-A9 cpu1: Placement (X Hrs), CTS (Y Hrs), Routing (Z Hrs)
Cortex-A9 cpu2: Placement (X Hrs), CTS (Y Hrs), Routing (Z Hrs)
Cortex-A9 cpu3: Placement (X Hrs), CTS (Y Hrs), Routing (Z Hrs)

Cortex-A9 MPCore (Cortex-A9 top only): Placement (A Hrs), CTS (B Hrs), Routing (C Hrs)

Total Run Time = X + Y + Z + C Hrs
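The run-time expression on this slide can be sketched as a small calculation. The sketch assumes one reading of the diagram: the four CPU blocks are implemented in parallel, so their X, Y and Z stages count once, and only the top-level routing time C is added afterwards. That reading, the function name and the hour values are illustrative assumptions, not figures from the presentation.

```python
def hierarchical_total_hours(x, y, z, c):
    """Total run time as stated on the slide: X + Y + Z + C.

    x, y, z: placement, CTS and routing hours for one CPU block
             (assumed identical for cpu0..cpu3 and run in parallel).
    c:       routing hours for the top level.
    """
    return x + y + z + c


# Hypothetical hour values, chosen only to illustrate the arithmetic.
total = hierarchical_total_hours(x=4.0, y=2.0, z=3.0, c=1.5)
print(f"Total run time: {total} hrs")
```

The top-level placement (A) and CTS (B) times do not appear in the slide's total, so the sketch leaves them out as well.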

(8)

Hierarchical Implementation with IC Compiler

Steps involved

• Floorplanning
• Create Physical Partition
• Partition Aware Place
• Power Network Synthesis
• Power Network Analysis
• In-Place Optimization
• Clock Planning
• Pin Assignment
• Budgeting
• Create Physical Partition SDC & ScanDef

(9)

Cortex-A9 MPCore Multicore Solutions

The relative performance and power range of an ARM processor enabled by its Artisan® physical IP

Cortex-A9 Hierarchical Flow (with IC Compiler)

[Figure: performance (MHz) versus power (mW) across three physical IP platforms. Density Optimized Platform: 15% lower power, higher density. Mainstream Platform. Performance Platform: 15% CPU performance boost.]

(10)

Hierarchical Implementation with IC Compiler

Results

• Implementation of 1 CPU Cortex-A9 flat vs 4 CPU Cortex-A9 hierarchical flow

Configuration | 4CPU, 4 Neon, 32K D$, 32K I$, 32 interrupts | 4CPU, 4 Neon, 32K D$, 32K I$, 32 interrupts
Process Technology | TSMC CLN65LP | TSMC CLN65LP
Standard Cell Library | 12Track – Nominal VT | 12Track – Nominal VT
Memory Library | Optimized fast cache instances | Optimized fast cache instances
Implementation flow | Flat | Hierarchical

[Figure: gate count and hierarchical-flow run time for A9 MP 1x, 2x and 4x with Neon; vertical axis from 0.0 to 4.5]

The 4 CPU implemented with a hierarchical flow gives:
• Comparable QoR in performance (frequency)
• 25% additional run time

…when compared to a 1CPU flat implementation
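To put the 25% figure in concrete terms, here is a minimal sketch of the arithmetic. The baseline hour value and the function name are hypothetical; only the 25% overhead comes from the slide.

```python
def hierarchical_4cpu_hours(flat_1cpu_hours, overhead=0.25):
    """Estimate 4 CPU hierarchical run time from a 1 CPU flat baseline.

    overhead=0.25 reflects the "25% additional run time" stated on the slide.
    """
    return flat_1cpu_hours * (1.0 + overhead)


# Hypothetical baseline: a 1 CPU flat run that takes 8 hours.
print(hierarchical_4cpu_hours(8.0))
```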

(11)

Next Steps

• Efficient handling of Multiple Instantiated Modules (MIM) for symmetric cores

(12)

Summary

• Hierarchical flow delivers much faster iteration time with no loss of QoR
• Simple and effective strategy to implement a multicore processor
• Reduction in high memory cluster requirements
• Lends itself very well to low power partitioning
  • Advanced low power management such as State Retention Power Gating
  • Leakage mitigation by power shutdown if the hardware is not being utilized
• Easily deployable for the partner base (estimated by end of 2008)
• In an ARM-Synopsys iRM (implementation Reference Methodology) with:
  • Floorplan
  • Tcl Scripts (complete flow from RTL to GDSII)
  • Physical IP Libraries
  • ARM Documentation - Core Signoff Guide

…providing an out-of-box solution from ARM
