• No results found

Transactional Memory

N/A
N/A
Protected

Academic year: 2021

Share "Transactional Memory"

Copied!
13
0
0

Loading.... (view fulltext now)

Full text

(1)

Transactional Memory

Transactional Memory

Konrad Lai

Konrad Lai

Microprocessor Technology Labs, Intel

Microprocessor Technology Labs, Intel

Intel

Intel MulticoreMulticore University Research ConferenceUniversity Research Conference Dec 8, 2005

(2)

Motivation

Motivation

y

y

Multiple cores face a serious programmability problem

Multiple cores face a serious programmability problem

à

à Writing correct parallel programs is very difficultWriting correct parallel programs is very difficult

y

y

Transactional Memory

Transactional Memory

addresses

addresses

key part of the problem

key part of the problem

à

à Makes parallel programming easier by simplifying coordinationMakes parallel programming easier by simplifying coordination à

(3)

What is Transactional Memory?

What is Transactional Memory?

Basic mechanisms:

Basic mechanisms:

y

y

Isolation: Track read and writes, detect when conflicts occur

Isolation: Track read and writes, detect when conflicts occur

y

y

Version management: Record new/old values

Version management: Record new/old values

y

y

Atomicity: Commit new values or abort back to old values

Atomicity: Commit new values or abort back to old values

Transactional Memory (TM) allows arbitrary multiple memory locat

Transactional Memory (TM) allows arbitrary multiple memory locations to ions to be updated atomically be updated atomically begin_xaction begin_xaction A = A A = A –– 2020 B = B + 20 B = B + 20 A = A A = A –– BB C = C + 20 C = C + 20 end_xaction end_xaction Thread 1 Thread 1 begin_xaction begin_xaction C = C C = C -- 3030 A = A + 30 A = A + 30 end_xaction end_xaction Thread 2 Thread 2 Thread 1 Thread 1’’s s accesses and accesses and updates to A, B, C updates to A, B, C are atomic are atomic

Thread 2 sees either

Thread 2 sees either

“allall”” or or ““nonenone”” of of Thread 1

(4)

Problem: Lock-Based Synchronization

Problem: Lock-Based Synchronization

y

y Software engineering problemsSoftware engineering problems

à

à LockLock--based programs do not composebased programs do not compose

à

à Performance and correctness tightly coupledPerformance and correctness tightly coupled

à

à Timing dependent errors are difficult to find and debugTiming dependent errors are difficult to find and debug

y

y Performance problemsPerformance problems

à

à High performance requires finer grain lockingHigh performance requires finer grain locking

à

à More and more locks add more overheadMore and more locks add more overhead

Need a better concurrency model for multi

Need a better concurrency model for multi

-

-

core software

core software

Lock

Lock

-

-

based synchronization of shared data access

based synchronization of shared data access

Is fundamentally problematic

(5)

Transactional Memory benefits

Transactional Memory benefits

y

y

Focus: Multithreaded programmability crisis

Focus: Multithreaded programmability crisis

à

à Programmability & performanceProgrammability & performance

Î

ÎAllows conservative synchronizationAllows conservative synchronization Î

ÎProgrammer focuses on parallelism & correctness, HW extracts Programmer focuses on parallelism & correctness, HW extracts performance

performance à

à Software engineering and Software engineering and composabilitycomposability

Î

ÎAllows library reAllows library re--use and composition (locks make this very difficult)use and composition (locks make this very difficult)

à

à Critical for wide multiCritical for wide multi--core demandcore demand

y

y

Makes high performance MT programming easier

Makes high performance MT programming easier

à

à Captures a fundamental, wellCaptures a fundamental, well--known, intuitive known, intuitive ““atomicatomic”” constructconstruct à

à Been around for decadesBeen around for decades à

à Similar to a Similar to a ““critical sectioncritical section”” but without its problemsbut without its problems

Î

(6)

Software Transactional Memory

Software Transactional Memory

y

y

Software Transactional Memory (1995 until now)

Software Transactional Memory (1995 until now)

à

à Significant work from Sun, Brown, Cambridge, MicrosoftSignificant work from Sun, Brown, Cambridge, Microsoft à

à Serious performance limitationsSerious performance limitations

Î

ÎDegrades Degrades ““commoncommon”” case of no conflicts/contentioncase of no conflicts/contention Î

Î>90% of transactions are no conflicts>90% of transactions are no conflicts

– 90% of critical sections are uncontended: what if all these slowed down by 5X?

à

à Serious deployability limitationsSerious deployability limitations

Î

ÎRelies on special runtime supportRelies on special runtime support Î

ÎInvasive to applications and librariesInvasive to applications and libraries

à

à Is STM is too slow and too invasive to deploy?Is STM is too slow and too invasive to deploy?

Î

ÎCould there be better implementation?Could there be better implementation? Î

(7)

Hardware Support for Performance

Hardware Support for Performance

A.sum = 100 B.sum = 200 Core 1 Core 1 begin_xaction A.withdraw(20) B.deposit(20) end_xaction

Architectural Memory state

Architectural Memory state

A.sum = 100 B.sum = 200 1 1 A.sum = 80 B.sum = 220 A.sum = 80 B.sum = 220 1.

1. Record recovery stateRecord recovery state 2.

2. Buffer updates/track accessesBuffer updates/track accesses 3.

3. Commit if no external access (discard Commit if no external access (discard all updates if conflict)

all updates if conflict)

1 Core 2 Core 2 1 begin_xaction Sum = A.sum + B.sum end_xaction Coherence protocol Coherence protocol for conflicts for conflicts Core 2 sees $300

Core 2 sees $300 –– never $280 or $320never $280 or $320

A.sum = 100 B.sum = 200 1 1 A bo rt & res tar t A bo rt & res tar t

(8)

What if HW is not sufficient?

What if HW is not sufficient?

y

y

This is the deployability challenge

This is the deployability challenge

à

à The missing piece of the puzzle for all prior workThe missing piece of the puzzle for all prior work……

y

y

Resource limitations are fundamental

Resource limitations are fundamental

à

à Space: cachesSpace: caches

Î

ÎMore HW delays the inevitable: will always be an n+1 caseMore HW delays the inevitable: will always be an n+1 case

à

à Time: scheduling quantaTime: scheduling quanta

Î

ÎProgrammers have no control over timeProgrammers have no control over time

à

à Affects functionality, not just performanceAffects functionality, not just performance

Î

ÎSome transactions may never completeSome transactions may never complete

y

y

Making HW limit explicit is difficult

Making HW limit explicit is difficult

à

à Limited usage onlyLimited usage only à

à Unreasonable for high level languagesUnreasonable for high level languages à

(9)

Virtualize for Completeness

Virtualize for Completeness

Timer interrupts, Timer interrupts, Context switches, Context switches, Exceptions, Exceptions,…… Limited buffers Limited buffers Core 1 Core 1 1 1 1 Virtual TM

virtual address space virtual address space

Log/buffer space

y

y Overflow management Overflow management

à

à Using virtual memoryUsing virtual memory

à

à Software Software libslibs. and microcode. and microcode

Out

Out--ofof--band concurrency controlband concurrency control

y

y Programmer transparentProgrammer transparent y

y Performance isolationPerformance isolation y

(10)

Recent TM Research

Recent TM Research

y

y

Recently, focus on solving the harder problem of TM

Recently, focus on solving the harder problem of TM

à

à Making the model immune to cache buffer size limitations, Making the model immune to cache buffer size limitations, scheduling limitations, etc.

scheduling limitations, etc.

y

y

TCC (Stanford) (2004)

TCC (Stanford) (2004)

à

à Same limitation as Same limitation as HerlihyHerlihy/Moss for TM (size limited to local caches)/Moss for TM (size limited to local caches)

y

y

LTM (MIT), VTM (Intel),

LTM (MIT), VTM (Intel),

LogTM

LogTM

(Wisconsin) (2005)

(Wisconsin) (2005)

à

à Assume hardware TM supportAssume hardware TM support à

à Add support to allow transactions to be immune to resource limitAdd support to allow transactions to be immune to resource limitationsations à

à Goals of each similar, approaches very differentGoals of each similar, approaches very different

Î

ÎLTM: only resource overflowLTM: only resource overflow Î

ÎVTM: complete virtualizationVTM: complete virtualization Î

(11)

Some Research Challenges

Some Research Challenges

y

y Large transactionsLarge transactions y

y Language extensionsLanguage extensions y

y IO, loophole, escape hatches, IO, loophole, escape hatches, …… y

y Interaction and coInteraction and co--existence withexistence with

à

à Other synchronization schemes: locks, flags, Other synchronization schemes: locks, flags, …… à

à Other transactionsOther transactions

Î

ÎDatabase transactionDatabase transaction Î

ÎSystem transaction (Microsoft)System transaction (Microsoft)

à

à Other libraries, system software, operating system, Other libraries, system software, operating system, ……

y

y Performance monitor, tuning, debugging, Performance monitor, tuning, debugging, …… y

y Open Open vsvs Closed NestingClosed Nesting y

y Interaction between transaction & nonInteraction between transaction & non--transactiontransaction y

y Usage & WorkloadUsage & Workload

à

(12)

Backup

(13)

TM – First Decade

TM – First Decade

y

y

IBM 801 Database Storage (1980s)

IBM 801 Database Storage (1980s)

à

à Lock attribute bits on Lock attribute bits on virtual memoryvirtual memory (via (via TLBsTLBs, , PTEsPTEs) at 128 byte ) at 128 byte granularity

granularity

y

y

Load

Load

-

-

Linked/Store Conditional (Jensen et al. 1987)

Linked/Store Conditional (Jensen et al. 1987)

à

à Optimistic update of single cache line (Alpha, MIPS, PowerPC)Optimistic update of single cache line (Alpha, MIPS, PowerPC)

y

y

Transactional Memory (

Transactional Memory (

Herlihy&Moss

Herlihy&Moss

1993)

1993)

à

à Coined term; TM generalization of LL/SCCoined term; TM generalization of LL/SC à

à Instructions explicitly identify transactional loads and storesInstructions explicitly identify transactional loads and stores à

à Used dedicated transaction cacheUsed dedicated transaction cache

Î

Î Size of transactions limited to transaction cacheSize of transactions limited to transaction cache

y

y

Oklahoma Update (Stone et al./IBM 1993)

Oklahoma Update (Stone et al./IBM 1993)

à

à Similar to TM, concurrently proposedSimilar to TM, concurrently proposed à

References

Related documents