Transactional Memory
Konrad Lai
Microprocessor Technology Labs, Intel
Intel Multicore University Research Conference, Dec 8, 2005

Motivation
- Multiple cores face a serious programmability problem
  - Writing correct parallel programs is very difficult
- Transactional Memory addresses a key part of the problem
  - Makes parallel programming easier by simplifying coordination

What is Transactional Memory?
Transactional Memory (TM) allows arbitrary multiple memory locations to be updated atomically

Basic mechanisms:
- Isolation: Track reads and writes, detect when conflicts occur
- Version management: Record new/old values
- Atomicity: Commit new values or abort back to old values

Thread 1:
    begin_xaction
      A = A - 20
      B = B + 20
      A = A - B
      C = C + 20
    end_xaction

Thread 2:
    begin_xaction
      C = C - 30
      A = A + 30
    end_xaction

Thread 1's accesses and updates to A, B, C are atomic
Thread 2 sees either "all" or "none" of Thread 1's updates

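The three mechanisms can be sketched in software. The following is a minimal illustration, not the design of any system named in this talk: it tracks reads by version number (isolation), buffers writes until commit (version management), and either publishes every buffered write or retries (atomicity). The names `TVar`, `Transaction`, and `atomic`, the single commit lock, and the initial values of 100 are all assumptions made for the example.

```python
import threading

class TVar:
    """A shared memory location with a version counter (hypothetical name)."""
    def __init__(self, value):
        self.value = value
        self.version = 0

_commit_lock = threading.Lock()  # guards snapshots and commits in this sketch

class Abort(Exception):
    """Raised when a conflict is detected; the transaction is retried."""

class Transaction:
    def __init__(self):
        self.reads = {}   # isolation: TVar -> version seen at first read
        self.writes = {}  # version management: TVar -> buffered new value

    def read(self, var):
        if var in self.writes:          # read our own buffered write
            return self.writes[var]
        with _commit_lock:              # consistent (value, version) snapshot
            value, version = var.value, var.version
        if self.reads.setdefault(var, version) != version:
            raise Abort()               # changed since we first read it
        return value

    def write(self, var, value):
        self.writes[var] = value        # not visible to others until commit

    def commit(self):
        with _commit_lock:
            for var, seen in self.reads.items():
                if var.version != seen:
                    raise Abort()       # conflict: discard buffered writes
            for var, value in self.writes.items():
                var.value = value       # atomicity: publish all writes together
                var.version += 1

def atomic(body):
    """Run body(tx) as a transaction, retrying until it commits."""
    while True:
        tx = Transaction()
        try:
            result = body(tx)
            tx.commit()
            return result
        except Abort:
            continue

# Initial values are assumed for illustration; the slide does not give them.
A, B, C = TVar(100), TVar(100), TVar(100)

def thread1(tx):
    tx.write(A, tx.read(A) - 20)
    tx.write(B, tx.read(B) + 20)
    tx.write(A, tx.read(A) - tx.read(B))
    tx.write(C, tx.read(C) + 20)

def thread2(tx):
    tx.write(C, tx.read(C) - 30)
    tx.write(A, tx.read(A) + 30)

t1 = threading.Thread(target=atomic, args=(thread1,))
t2 = threading.Thread(target=atomic, args=(thread2,))
t1.start(); t2.start(); t1.join(); t2.join()
```

Whichever transaction commits first, the final state matches one of the two serial orders; with these particular updates both orders give the same result.
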
Problem: Lock-Based Synchronization
- Software engineering problems
  - Lock-based programs do not compose
  - Performance and correctness tightly coupled
  - Timing-dependent errors are difficult to find and debug
- Performance problems
  - High performance requires finer-grain locking
  - More and more locks add more overhead
Need a better concurrency model for multi-core software
Lock-based synchronization of shared data access is fundamentally problematic

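A small sketch of why lock-based code does not compose: each `Account` method below is individually protected by a lock, yet a `transfer` built from two of them exposes an intermediate state. The `Account` class, the balances, and the `between` hook (which stands in for another thread running at that instant) are hypothetical and exist only to make the effect deterministic.

```python
import threading

class Account:
    """Each method is atomic on its own, guarded by a per-account lock."""
    def __init__(self, balance):
        self.balance = balance
        self._lock = threading.Lock()

    def withdraw(self, amount):
        with self._lock:
            self.balance -= amount

    def deposit(self, amount):
        with self._lock:
            self.balance += amount

    def read(self):
        with self._lock:
            return self.balance

def transfer(src, dst, amount, between=lambda: None):
    """Composition of two atomic operations that is not itself atomic."""
    src.withdraw(amount)
    between()            # another thread could observe the state here
    dst.deposit(amount)

a, b = Account(100), Account(200)
observed = []
transfer(a, b, 20, between=lambda: observed.append(a.read() + b.read()))
# observed holds the total seen mid-transfer: money has left a but not reached b
```

Making `transfer` atomic with locks means exposing both accounts' locks to the caller and agreeing on an acquisition order, which is where correctness and performance become coupled.
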
Transactional Memory benefits
- Focus: Multithreaded programmability crisis
  - Programmability & performance
    - Allows conservative synchronization
    - Programmer focuses on parallelism & correctness, HW extracts performance
  - Software engineering and composability
    - Allows library re-use and composition (locks make this very difficult)
  - Critical for wide multi-core demand
- Makes high performance MT programming easier
  - Captures a fundamental, well-known, intuitive "atomic" construct
  - Been around for decades
  - Similar to a "critical section" but without its problems

Software Transactional Memory
- Software Transactional Memory (1995 until now)
  - Significant work from Sun, Brown, Cambridge, Microsoft
  - Serious performance limitations
    - Degrades "common" case of no conflicts/contention
    - >90% of transactions have no conflicts
      - 90% of critical sections are uncontended: what if all these slowed down by 5X?
  - Serious deployability limitations
    - Relies on special runtime support
    - Invasive to applications and libraries
  - Is STM too slow and too invasive to deploy?
    - Could there be a better implementation?

Hardware Support for Performance
Architectural memory state: A.sum = 100, B.sum = 200

Core 1:
    begin_xaction
      A.withdraw(20)
      B.deposit(20)
    end_xaction
  Buffered state inside the transaction: A.sum = 80, B.sum = 220

1. Record recovery state
2. Buffer updates/track accesses
3. Commit if no external access (discard all updates if conflict)

Core 2:
    begin_xaction
      Sum = A.sum + B.sum
    end_xaction

Coherence protocol for conflicts: abort & restart
Core 2 sees $300, never $280 or $320

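The three numbered steps can be modelled with a toy simulation. This is a sketch under stated assumptions, not a description of real hardware: the class `HardwareTransaction`, its method names, and the policy that the transaction holding a conflicting line is the one that aborts are all invented for illustration. The `external_read` and `external_write` calls stand in for coherence requests arriving from another core.

```python
class HardwareTransaction:
    """Toy model of one core's transactional state (hypothetical API)."""
    def __init__(self, memory):
        self.memory = memory        # architectural memory state
        self.active = False
        self.read_set = set()
        self.write_buffer = {}

    def begin(self):
        # Step 1: record recovery state. Here that is just an empty buffer,
        # so an abort leaves architectural memory untouched.
        self.read_set, self.write_buffer = set(), {}
        self.active = True

    def load(self, addr):
        self.read_set.add(addr)     # Step 2: track accesses
        return self.write_buffer.get(addr, self.memory[addr])

    def store(self, addr, value):
        self.write_buffer[addr] = value   # Step 2: buffer updates

    def _abort(self):
        self.write_buffer.clear()   # discard all updates
        self.active = False

    def external_read(self, addr):
        """Another core reads addr; conflicts with our buffered writes."""
        if self.active and addr in self.write_buffer:
            self._abort()
        return self.memory[addr]

    def external_write(self, addr, value):
        """Another core writes addr; conflicts with our reads or writes."""
        if self.active and (addr in self.read_set or addr in self.write_buffer):
            self._abort()
        self.memory[addr] = value

    def commit(self):
        # Step 3: commit only if no conflicting external access occurred.
        if not self.active:
            return False
        self.memory.update(self.write_buffer)
        self.active = False
        return True

memory = {"A.sum": 100, "B.sum": 200}
tx = HardwareTransaction(memory)

# First attempt: Core 2 reads both sums after Core 1 has buffered A.sum = 80.
tx.begin()
tx.store("A.sum", tx.load("A.sum") - 20)
sum_seen = tx.external_read("A.sum") + tx.external_read("B.sum")
first_attempt_committed = tx.commit()

# Restart: this time nothing interferes, so the buffered updates commit.
tx.begin()
tx.store("A.sum", tx.load("A.sum") - 20)
tx.store("B.sum", tx.load("B.sum") + 20)
restarted_committed = tx.commit()
```

In this run Core 2's read lands in the middle of Core 1's first attempt, which is discarded, so Core 2 observes only the architectural values.
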
What if HW is not sufficient?
- This is the deployability challenge
  - The missing piece of the puzzle for all prior work…
- Resource limitations are fundamental
  - Space: caches
    - More HW delays the inevitable: will always be an n+1 case
  - Time: scheduling quanta
    - Programmers have no control over time
  - Affects functionality, not just performance
    - Some transactions may never complete
- Making HW limit explicit is difficult
  - Limited usage only
  - Unreasonable for high level languages

Virtualize for Completeness
[Figure: Core 1 with limited buffers, subject to timer interrupts, context switches, exceptions, …; Virtual TM backed by log/buffer space in the virtual address space]
- Overflow management
  - Using virtual memory
  - Software libs. and microcode
- Out-of-band concurrency control
- Programmer transparent
- Performance isolation

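The overflow idea can be illustrated with a toy model in which a small fixed-capacity buffer stands in for on-chip resources and a second, unbounded table stands in for log space in virtual memory. This is only a sketch of the general concept; it is not the mechanism of VTM or of any other proposal, and `VirtualizedTransaction`, `hw_capacity`, and `overflow_log` are hypothetical names.

```python
class VirtualizedTransaction:
    """Toy model: bounded 'hardware' buffer that spills to a software log."""
    def __init__(self, memory, hw_capacity):
        self.memory = memory
        self.hw_capacity = hw_capacity
        self.hw_buffer = {}      # models the limited on-chip buffer
        self.overflow_log = {}   # models log space in virtual memory

    def store(self, addr, value):
        if addr in self.overflow_log:
            self.overflow_log[addr] = value       # already spilled: update log
        elif addr in self.hw_buffer or len(self.hw_buffer) < self.hw_capacity:
            self.hw_buffer[addr] = value          # fits in the fast path
        else:
            self.overflow_log[addr] = value       # overflow: spill, do not abort

    def load(self, addr):
        if addr in self.hw_buffer:
            return self.hw_buffer[addr]
        if addr in self.overflow_log:
            return self.overflow_log[addr]
        return self.memory[addr]

    def commit(self):
        # Publish both the buffered and the spilled updates together.
        self.memory.update(self.hw_buffer)
        self.memory.update(self.overflow_log)
        self.hw_buffer.clear()
        self.overflow_log.clear()

memory = {addr: 0 for addr in range(8)}
tx = VirtualizedTransaction(memory, hw_capacity=4)
for addr in range(8):                 # twice as many writes as the buffer holds
    tx.store(addr, tx.load(addr) + addr + 1)
spilled = len(tx.overflow_log)        # entries that went to the software log
tx.commit()
```

The program issuing the stores never sees the capacity limit, which is the sense in which the overflow is programmer transparent; a real design would also need conflict detection on the spilled entries.
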
Recent TM Research
- Recently, focus on solving the harder problem of TM
  - Making the model immune to cache buffer size limitations, scheduling limitations, etc.
- TCC (Stanford) (2004)
  - Same limitation as Herlihy/Moss for TM (size limited to local caches)
- LTM (MIT), VTM (Intel), LogTM (Wisconsin) (2005)
  - Assume hardware TM support
  - Add support to allow transactions to be immune to resource limitations
  - Goals of each similar, approaches very different
    - LTM: only resource overflow
    - VTM: complete virtualization

Some Research Challenges
- Large transactions
- Language extensions
- IO, loophole, escape hatches, …
- Interaction and co-existence with
  - Other synchronization schemes: locks, flags, …
  - Other transactions
    - Database transaction
    - System transaction (Microsoft)
  - Other libraries, system software, operating system, …
- Performance monitor, tuning, debugging, …
- Open vs Closed Nesting
- Interaction between transaction & non-transaction
- Usage & Workload

Backup
TM – First Decade
- IBM 801 Database Storage (1980s)
  - Lock attribute bits on virtual memory (via TLBs, PTEs) at 128 byte granularity
- Load-Linked/Store Conditional (Jensen et al. 1987)
  - Optimistic update of single cache line (Alpha, MIPS, PowerPC)
- Transactional Memory (Herlihy & Moss 1993)
  - Coined term; TM generalization of LL/SC
  - Instructions explicitly identify transactional loads and stores
  - Used dedicated transaction cache
    - Size of transactions limited to transaction cache
- Oklahoma Update (Stone et al./IBM 1993)
  - Similar to TM, concurrently proposed