Fine-grain multithread programming with MassiveThreads

SML# provides MassiveThreads-based multithread support by default. MassiveThreads is a user-level lightweight thread library in C provided by the University of Tokyo. SML#'s direct C interface and unobtrusive concurrent garbage collection enable SML# programs to call MassiveThreads directly.

MassiveThreads allows us to create millions of user threads, say 1,000,000 threads, that run concurrently on multicore processors.

By default, the SML# runtime is restricted to a single CPU core so that single-threaded programs perform well. To enable MassiveThreads on multicore processors, set at least one MYTH_* environment variable to configure MassiveThreads. A typical one is MYTH_NUM_WORKERS, which specifies the number of worker threads, i.e., the number of CPU cores the program uses. For example, run the following command to start an interactive session:

$ MYTH_NUM_WORKERS=0 smlsharp

On Linux, setting MYTH_NUM_WORKERS to 0 means using all available CPU cores.

SML# provides the Myth structure, which is a direct binding of the MassiveThreads library. The Myth.Thread structure contains the basic functions for thread management. Its primary functions are the following:

• User thread creation.

Myth.Thread.create : (unit -> int) -> Myth.thread

create f creates a new user thread that computes f (). The created user thread is scheduled onto an appropriate CPU core by the MassiveThreads task scheduler. The scheduling policy is non-preemptive: a thread occupies a CPU core until it either calls a MassiveThreads thread control function (such as Myth.Thread.yield) or terminates.

• User thread join.

Myth.Thread.join : Myth.thread -> int

join t waits for thread t to complete and returns the result of its computation. Every user thread created by create must eventually be joined. Note that the Myth.Thread structure is just a direct binding of C functions; as in C, the resources of created threads must be freed explicitly.

• User thread scheduling.

Myth.Thread.yield : unit -> unit

yield () yields the CPU core to other threads.
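As a minimal sketch of how creation and joining fit together, the following fragment starts a few user threads and joins each of them exactly once. It uses only the create and join functions listed above; the names squares and results and the choice of four threads are illustrative assumptions, and the fragment is meant to be typed into an interactive session.

```sml
(* Start four user threads; thread i computes i * i. *)
val squares =
    List.tabulate (4, fn i => Myth.Thread.create (fn () => i * i))

(* Every created thread must be joined; join returns the thread's result. *)
val results = map Myth.Thread.join squares
```

Given the behavior of join described above, results should be the list [0, 1, 4, 9], in creation order regardless of how the threads were scheduled.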

As an introduction to MassiveThreads programming, let us write a task parallel program. Roughly speaking, you can write a task parallel program by the following steps:

1. Write a recursive function that performs divide-and-conquer.

2. Surround each recursive call with a pair of create and join so that each recursive call is evaluated in a different thread.

3. To avoid creating very short-lived threads, set a threshold (cut-off) below which thread creation stops and the recursive calls run in the same thread. Choose the threshold so that the sequential wall-clock time of a user thread is sufficiently longer than the overhead of thread creation. In practice, 3–4 microseconds per user thread is good enough.

For example, let us write a program that computes fib 40 recursively. The following is a typical definition of the recursive fib function:

fun fib 0 = 0
  | fib 1 = 1
  | fib n = fib (n - 1) + fib (n - 2)
val result = fib 40

To compute fib (n - 1) and fib (n - 2) in parallel, surround one of them with create and join:

fun fib 0 = 0
  | fib 1 = 1
  | fib n =
    let
      val t2 = Myth.Thread.create (fn () => fib (n - 2))
    in
      fib (n - 1) + Myth.Thread.join t2
    end
val result = fib 40

This is not yet the goal. Unfortunately, when n is very small, the cost of computing fib n is much smaller than the overhead of thread creation. To avoid this, we introduce a threshold so that fib n is computed sequentially if n is smaller than 10.

val cutOff = 10
fun fib 0 = 0
  | fib 1 = 1
  | fib n =
    if n < cutOff
    then fib (n - 1) + fib (n - 2)
    else
      let
        val t2 = Myth.Thread.create (fn () => fib (n - 2))
      in
        fib (n - 1) + Myth.Thread.join t2
      end
val result = fib 40

Now it is all done! When run, this program eventually creates 3,524,577 user threads in total.
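The thread count can be checked without running any threads. The sketch below mirrors the branching structure of the cut-off version of fib and counts one creation per call that takes the parallel branch; the function name creations is a hypothetical helper, not part of SML# or MassiveThreads.

```sml
val cutOff = 10

(* Number of Myth.Thread.create calls performed while evaluating fib n
   with the cut-off version of fib. Below the cut-off no thread is
   created; otherwise one thread is created for fib (n - 2), and both
   recursive calls may create more. *)
fun creations n =
    if n < cutOff then 0
    else 1 + creations (n - 1) + creations (n - 2)

val total = creations 40
```

Here total evaluates to 3524577, which matches the figure quoted above.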

Chapter 12

SML# feature: seamless SQL integration

Accessing databases is essential in most practical programs that manipulate data. The most widely used database query language is SQL. The conventional method of accessing databases is to generate SQL command strings, which is cumbersome and error-prone. SML# instead integrates SQL expressions themselves as polymorphically typed first-class citizens. This chapter explains this feature.

12.1 Relational databases and SQL

Most practical database systems are relational databases. To help in understanding SML# database integration, this section reviews the basic notions of relational databases and SQL.

In the relational model, data are represented by a set of relations. A relation is a set of tuples, each of which represents an association of attribute values such as name, age, and salary. Such a relation is displayed as a table of the following form.

name   age  salary
"Joe"  21   10000
"Sue"  31   20000
"Bob"  41   20000

A relational database is a system for manipulating a collection of such tables. A relation R on the sets A_1, A_2, ..., A_n of attribute values is mathematically a subset of the Cartesian product A_1 × A_2 × ... × A_n. Each element t in R is an n-element tuple (a_1, ..., a_n). In an actual database system, each component of a tuple has an attribute name, and a tuple is represented as a labeled record. For example, the first line of the example table above is regarded as the record {name="Joe", age=21, salary=10000}. On these relations, a family of operations is defined, including union, projection, selection, and Cartesian product. A set of tables together with a set of these operations is called the relational algebra. One important thing to note is that, as its name indicates, the relational model is an algebra and is manipulated by an algebraic language. An algebraic language is a functional language that does not have function expressions.

In relational databases, the relational algebra is represented by a language called SQL, which is a language of set-valued expressions. The central construct of SQL is the following SELECT expression.

SELECT t^1.l_1 as l'_1, ..., t^m.l_m as l'_m
FROM R_1 as t_1, ..., R_n as t_n
WHERE P(t_1, ..., t_n)

Here we used the following meta variables.

• R: relation variables

• t: tuple variables

• l: labels, or attribute names

• t.l: the l attribute of tuple t

The operational meaning of a SELECT expression can be understood as follows.

1. Evaluate each R_i in the FROM clause and generate their Cartesian product R_1 × ... × R_n.

2. Let (t_1, ..., t_n) be any representative tuple in the product.

3. Select from the product the tuples that satisfy the predicate P(t_1, ..., t_n) specified in the WHERE clause.

4. For each element (t_1, ..., t_n) in the selected set, construct a record {l'_1 = t^1.l_1, ..., l'_m = t^m.l_m}.

5. Collect all these records.
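The five steps can be sketched with ordinary list functions for the case n = 2. This is only an illustration of the operational reading, not how SML# or a database server actually evaluates queries; the relations emps and depts, their contents, and the helper product are all hypothetical.

```sml
type emp  = {name : string, deptId : int}
type dept = {id : int, title : string}

val emps  : emp list  = [{name = "Joe", deptId = 1}, {name = "Sue", deptId = 2}]
val depts : dept list = [{id = 1, title = "Sales"}, {id = 2, title = "Dev"}]

(* Step 1: Cartesian product of two relations represented as lists. *)
fun product (xs, ys) =
    List.concat (map (fn x => map (fn y => (x, y)) ys) xs)

val pairs = product (emps, depts)

(* Steps 2-3: keep the pairs (e, d) that satisfy the WHERE predicate. *)
val selected =
    List.filter (fn (e : emp, d : dept) => #deptId e = #id d) pairs

(* Steps 4-5: build one result record per selected pair and collect them. *)
val result =
    map (fn (e : emp, d : dept) => {name = #name e, dept = #title d}) selected
```

This corresponds to SELECT e.name as name, d.title as dept FROM emps as e, depts as d WHERE e.deptId = d.id, and result is [{name = "Joe", dept = "Sales"}, {name = "Sue", dept = "Dev"}].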

For example, let the example table above be named Persons and consider the following SQL expression.

SELECT P.name as name, P.age as age
FROM Persons as P
WHERE P.salary > 10000

This expression is evaluated as follows.

• The Cartesian product of the sole relation Persons is Persons itself.

• Let P be any tuple in Persons.

• Select from Persons all the tuples P such that P.salary > 10000. We obtain the following set.

name   age  salary
"Sue"  31   20000
"Bob"  41   20000

• For each tuple P in this set, compute the new tuple {name=P.name, age=P.age} to obtain the following set.

name   age
"Sue"  31
"Bob"  41

This is the result of the expression.

This result represents the set (list) of records: {{name="Sue", age=31}, {name="Bob", age=41}}.
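The same evaluation can be mimicked in plain SML by treating the relation as a list of records. This is a sketch of the semantics only and does not use the SML# SQL integration that this chapter introduces; the type name person and the value names are assumptions made for the example.

```sml
type person = {name : string, age : int, salary : int}

val persons : person list =
    [{name = "Joe", age = 21, salary = 10000},
     {name = "Sue", age = 31, salary = 20000},
     {name = "Bob", age = 41, salary = 20000}]

(* WHERE P.salary > 10000 *)
val selected = List.filter (fn (p : person) => #salary p > 10000) persons

(* SELECT P.name as name, P.age as age *)
val result = map (fn (p : person) => {name = #name p, age = #age p}) selected
```

Here result is [{name = "Sue", age = 31}, {name = "Bob", age = 41}], the same two records as in the table above.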

In document SML# Document Version 4.0.0 (Page 67-71)