SEER: PROBABILISTIC SCHEDULING FOR COMMODITY HARDWARE TRANSACTIONAL MEMORY. 27th Symposium on Parallel Architectures and Algorithms


Academic year: 2021

(1)

SEER: Probabilistic Scheduling for Commodity Hardware Transactional Memory

Nuno Diegues, Paolo Romano and Stoyan Garbatov

(2)

Multi-cores are now ubiquitous

The multi-core (r)evolution

[Figure: CPU 1, CPU 2, CPU 3 and CPU 4 attached to a shared memory]

Concurrent programming is complex

Hard to get right:
• fine-grained locks
• deadlocks
• correctness

Classic approach: locking

atomic {
    withdraw(acc1,val);
    deposit(acc2,val);
}

Transactional Memory abstraction

Programmer identifies atomic blocks

Runtime implements synchronization

(3)

Too much optimism

Problem: CPU time is wasted

run other computations instead

inhibit parallelism

improve cache usage

increase core frequency

reduce power consumption

[Figure: two concurrent transactions, one executing y = x and the other x++]

Identify likely conflicts before

(4)

Software TM (STM): the library has full control over concurrency and can pinpoint the exact culprit of a conflict

HTM available in commodity processors

Hardware TM (HTM): feedback is quite limited

(5)

Avoid running T1 and T2 concurrently

How to find the root cause of the data conflict?

(6)

In an ideal world for HTMs…

xbegin
    withdraw(acc1,val)
    deposit(acc2,val)
xend

Transactions may abort because of contention on the same memory locations

…and every transaction shall eventually succeed

(7)

…in practice: HTMs are best-effort

No progress guarantees: a transaction may always abort

…due to a number of reasons:

• Forbidden instructions
• Capacity of caches (for reads and writes)
• Faults and signals

(8)

Single Global Lock (SGL): fall-back path for HTM

Hardware transaction executes if SGL is free

Acquire SGL depending on retry policy

SGL is a very simple scheduler

Ignores the root cause

Takes a global decision --- the SGL

Adaptive Transaction Scheduling [SPAA08]

We need better scheduling for commodity HTMs

(9)

Related Work

Scheduler            Support for HTM?   Support for Imprecise Information?   Schedules Transactions in a Fine-Grained Fashion?
ATS [SPAA08]         Yes                Yes                                  No
CAR-STM [PODC08]     No                 No                                   Yes
Shrink [PODC09]      No                 No                                   Yes
ProPS [Euro-Par14]   No                 No                                   Yes
SER [PPoPP10]        No                 No                                   Yes
TxLinux [SOSP07]     Yes                No                                   Yes
SOA [HiPEAC09/10]    Yes                No                                   Yes

(10)

Key Idea

Transactions to be executed are announced

Many observations are collected upon transaction commit and abort:
which transactions were active at the same time?

Over time, the outliers will be identifiable with high probability (w.h.p.)
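The bookkeeping behind this idea can be sketched as follows. This is a minimal, single-threaded illustration and not the authors' implementation: the class `ObservationLog`, its methods and the shared tables are hypothetical names, and a real scheduler would keep such counters per thread to stay cheap.

```python
from collections import defaultdict

class ObservationLog:
    """Records which transactions were active when another one finished."""

    def __init__(self, num_threads):
        self.active = [None] * num_threads   # announced transaction per thread
        self.commits = defaultdict(int)      # (x, y) -> commits of x while y was active
        self.aborts = defaultdict(int)       # (x, y) -> aborts of x while y was active

    def announce(self, thread_id, tx_id):
        """Publish the transaction this thread is about to execute."""
        self.active[thread_id] = tx_id

    def finish(self, thread_id, committed):
        """Record one observation per concurrently announced transaction."""
        x = self.active[thread_id]
        table = self.commits if committed else self.aborts
        for other, y in enumerate(self.active):
            if other != thread_id and y is not None:
                table[(x, y)] += 1
        self.active[thread_id] = None        # caller re-announces before a retry
```

Each abort of T1 while T2 is announced increments `aborts[("T1", "T2")]`. No single observation proves that T2 caused the abort, but pairs that really conflict accumulate counts faster than the rest.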

(11)

Seer: overview

Transaction = source code transaction

active transactions

(12)

Seer: details

Threads collect lightweight events independently (low overhead)

Locking scheme (re-)calculated periodically
• Calculate conditional probabilities of commit/abort
• Relevance threshold based on mean/stdev

One lock per transaction (atomic block in the application)
• T1 lock (L1) taken by T2 if they are deemed to conflict
• T1 waits for L1 to be free before executing
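A minimal sketch of how a mean/stdev relevance threshold might work. The slide only states that the threshold is based on the mean and standard deviation. The specific rule used here (mean plus `k` standard deviations) and the names `relevant_conflicts` and `k` are assumptions made for illustration.

```python
from statistics import mean, pstdev

def relevant_conflicts(abort_prob, k=1.0):
    """Return the transactions whose conditional abort probability is an outlier.

    abort_prob maps a transaction y to an estimate of
    P(x aborts | y runs concurrently), for one fixed transaction x.
    The cut-off mean + k * stdev is an assumed rule, not taken from the slides.
    """
    values = list(abort_prob.values())
    if len(values) < 2:
        return set()                       # nothing to compare against
    threshold = mean(values) + k * pstdev(values)
    return {y for y, p in abort_prob.items() if p > threshold}
```

With estimates such as `{"T1": 0.05, "T2": 0.04, "T3": 0.06, "T4": 0.90}`, only `T4` exceeds the cut-off, so only its lock would be involved in scheduling x.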

(13)

Seer: details

For each pair of transactions (x, y), each acquires the other's lock if:

Are abort events of x common enough with y running concurrently?

Is y one of the main causes for x to abort?
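The two questions above can be read as two ratio tests over the collected counts. The sketch below is one possible reading, not the authors' formula: the function name, the parameter names and the fixed thresholds `th_cond` and `th_share` are hypothetical.

```python
def should_serialize(aborts_x_with_y, runs_x_with_y, total_aborts_x,
                     th_cond=0.3, th_share=0.2):
    """Decide whether x and y should take each other's lock.

    aborts_x_with_y: aborts of x observed while y was active
    runs_x_with_y:   commits + aborts of x observed while y was active
    total_aborts_x:  all aborts of x, whatever else was running
    Thresholds are illustrative placeholders.
    """
    if runs_x_with_y == 0 or total_aborts_x == 0:
        return False
    # Are abort events of x common enough with y running concurrently?
    p_abort_given_y = aborts_x_with_y / runs_x_with_y
    # Is y one of the main causes for x to abort?
    share_of_aborts = aborts_x_with_y / total_aborts_x
    return p_abort_given_y >= th_cond and share_of_aborts >= th_share
```

Both tests are needed: a pair that aborts often together but only ever ran together a handful of times fails the second test, and a pair that runs together constantly but rarely aborts fails the first.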

(14)

Seer: optimizations

Capacity aborts: another limitation from the best-effort nature
• Per-core lock
• Taken when capacity aborts occur
• Tailored for hyper-thread usage

Only one thread (re-)calculates the locking scheme:
• Whenever it is waiting for the SGL (some thread is on the fallback path)
• If the SGL is rarely taken, then scheduling will not improve

Lock acquisition
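A minimal sketch of the per-core lock idea, presumably motivated by sibling hyper-threads sharing the core's cache capacity. The class `CoreLocks`, the helper `retry_after_capacity_abort` and the topology rule (hardware threads `i` and `i + num_cores` are siblings) are assumptions for illustration. Real thread-to-core mappings depend on the machine.

```python
import threading

class CoreLocks:
    """One lock per physical core, shared by its sibling hyper-threads."""

    def __init__(self, num_cores):
        self.num_cores = num_cores
        self.locks = [threading.Lock() for _ in range(num_cores)]

    def core_of(self, hw_thread_id):
        # Assumed topology: threads i and i + num_cores sit on the same core.
        return hw_thread_id % self.num_cores

    def lock_for(self, hw_thread_id):
        return self.locks[self.core_of(hw_thread_id)]

def retry_after_capacity_abort(core_locks, hw_thread_id, attempt):
    """Re-run a transaction attempt while holding this thread's core lock."""
    with core_locks.lock_for(hw_thread_id):
        return attempt()
```

Holding the core lock keeps the sibling hyper-thread from running a transaction at the same time, while threads on other cores are unaffected.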

(15)

Evaluation

HLE: Intel Hardware Lock Elision, i.e., no scheduling

RTM: Intel commodity HTM with an SGL

SCM: Software-assisted Contention Management [PODC14]
• schedules with a (single) auxiliary lock
• the auxiliary lock is not read speculatively (in hardware transactions)

(16)

How much can we gain with Seer?

[Figure: speedup vs. number of threads for the Genome and Intruder benchmarks]

Geometric mean speedup in STAMP: 50%

(17)

What motivates these gains?

Geometric mean over STAMP with 8 threads

• HLE: 77% with fall-back lock
• RTM: 37% with SGL
• SCM: 5% with SGL, 29% with (single) auxiliary lock
• Seer (fine-grained locks):
    • 3% with at least one tx lock
    • 4% with core lock
    • 12% with tx + core locks
    • 1% with SGL

(18)

Relevance of each mechanism?

Baseline: Seer with all mechanisms enabled (i.e., their overhead) but without any lock acquisitions.

HTM lock acquisition:

Small improvement --- benchmark dependent

the more locks, the better

Transaction locks:

Detect conflicts inherent to benchmarks

Core locks:

Only relevant for more than 4 threads (hyper-threading)

Threshold tuning for probabilities

Consistent/small improvement

(19)

Summary

First scheduler tailored for Commodity HTMs:

Copes with imprecise information

Schedules transactions in a fine-grained manner

50% performance improvement with 8 threads

0-8% overhead from monitoring/calculation

(20)

Thank you

Questions?

(21)
(22)

HTM with a fall-back path

start:
    int status = htm_begin

code:
    application logic

(23)

HTM with a fall-back path

start:
    int status = htm_begin
    if (status == ok)              // != ok when aborted
        if (fallback-in-use())
            htm_abort              // fall-back in use
        else goto code             // fast-path
    ??

code:
    application logic
    if (inFastPath)
        htm_end                    // fast-path
    else
        ??

(24)

HTM with a fall-back path

start:
    int status = htm_begin
    if (status == ok)              // != ok when aborted
        if (fallback-in-use())
            htm_abort              // fall-back in use
        else goto code             // fast-path
    if (shouldRetry())             // retry policy
        goto start
    else
        use-fallback()             // use fall-back

code:
    application logic
    if (inFastPath)
        htm_end                    // fast-path
    else
        quit-fallback()            // fall-back

(25)

HTM with a fall-back: a single lock

start:
    int status = htm_begin
    if (status == ok)              // != ok when aborted
        if (isTaken(lock))
            htm_abort              // fall-back in use
        else goto code             // fast-path
    if (shouldRetry())             // retry policy: e.g., limit retries to 10
        goto start
    else
        acquire(lock)              // use fall-back

code:
    application logic
    if (inFastPath)                // fast-path
        htm_end
    else                           // fall-back
        release(lock)
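The control flow of the single-lock fall-back can be exercised without HTM hardware by simulating the hardware attempt. The sketch below is a simulation only. The name `run_with_fallback` and the callable `htm_attempt`, which stands in for `htm_begin` ... `htm_end`, are hypothetical. It also does not model the fact that a real hardware transaction reads the lock speculatively and aborts if the lock is acquired later.

```python
import threading

def run_with_fallback(body, htm_attempt, lock, max_retries=10):
    """Try `body` as a (simulated) hardware transaction, then fall back to `lock`.

    htm_attempt(body) returns True when the simulated hardware
    transaction commits. Returns (path, hardware_attempts).
    """
    attempts = 0
    while attempts < max_retries:          # retry policy: bounded retries
        attempts += 1
        if lock.locked():                  # fall-back in use: counts as an abort
            continue
        if htm_attempt(body):              # fast path committed
            return ("htm", attempts)
    with lock:                             # fall-back path: run under the lock
        body()
    return ("fallback", attempts)
```

A transaction that can never commit in hardware (for example, one that exceeds cache capacity) exhausts its retries and runs under the lock. This is the progress guarantee that best-effort HTM lacks on its own.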
