Modeling point-to-point communication - Algorithm selection and automatic tuning

4.2.1 Modeling point-to-point communication

Parallel communication models (PCMs) are a basis for the design and analysis of parallel algorithms. A good model uses the smallest possible number of parameters while still capturing enough of the complexity of the run-time system across a number of possible criteria. This section introduces three commonly used point-to-point communication models: Hockney, LogP/LogGP, and PLogP.

Hockney model

The Hockney model [Hockney, 1994] assumes that the time to send a message of size m between two nodes is defined as

T = α + β · m (4.1)

In the equation above, α refers to the message startup time, also referred to as latency, which includes the time to prepare the message (copy to system buffer, etc.) and the time for the first byte of the message to arrive at the receiver. β represents the transfer time per byte, that is, the reciprocal of the network bandwidth. The original Hockney model used asymptotic values for these two parameters.
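As a minimal sketch of Equation 4.1, the function below evaluates the Hockney prediction for a few message sizes. The function name and the numeric values of α and β are hypothetical and chosen only for illustration; they are not measurements from this work.

```python
def hockney_time(m, alpha, beta):
    """Predicted time to send an m-byte message: T = alpha + beta * m."""
    if m < 0:
        raise ValueError("message size must be non-negative")
    return alpha + beta * m


# Hypothetical parameters: 50 microseconds startup time, 1 GB/s bandwidth.
ALPHA = 50e-6        # seconds
BETA = 1.0 / 1e9     # seconds per byte (reciprocal of bandwidth)

for size in (1, 1024, 1024 * 1024):
    t_us = hockney_time(size, ALPHA, BETA) * 1e6
    print(f"{size:>8} bytes: {t_us:.2f} us")
```

For small messages the α term dominates the prediction, while for large messages the β · m term does.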

Figure 4.13: Hockney model (a) without, and (b) with communication overlap.

The time to receive a message is α + β · m, but the time when the sending process can start sending another message can be defined in a number of different ways. In a worst-case scenario, no communication overlap is allowed, thus the sender must wait for the receiver to receive the message fully before it is allowed to initiate another send. In a best-case scenario, a process is allowed to initiate the second send after the latency has expired. Figure 4.13 depicts the two cases. In our analysis, we assume the worst-case scenario: a sender is allowed to initiate the next send operation after α + β · m time.
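The difference between the two scenarios can be sketched for a sender that issues k messages of the same size back to back. The function name is hypothetical, and the best-case formula is one reading of the description above: it assumes each send may start α after the previous one and that transfers in flight do not slow each other down.

```python
def time_for_k_sends(k, m, alpha, beta, overlap=False):
    """Time until the last of k back-to-back m-byte messages is received."""
    if k <= 0:
        return 0.0
    if overlap:
        # Best case (assumed reading): send i starts at i * alpha, so the
        # last send starts at (k - 1) * alpha and takes alpha + beta * m.
        return (k - 1) * alpha + (alpha + beta * m)
    # Worst case: each send starts only after the previous message
    # has been fully received, alpha + beta * m later.
    return k * (alpha + beta * m)


# Hypothetical parameter values, for illustration only.
worst = time_for_k_sends(3, 4, alpha=1.0, beta=0.5)
best = time_for_k_sends(3, 4, alpha=1.0, beta=0.5, overlap=True)
print(worst, best)
```

The worst-case branch corresponds to the assumption adopted in the analysis: the cost of k sends is simply k times the single-message cost.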

One of the limitations of this model is that network congestion cannot be modeled directly. Different authors address this issue. For example, Chan et al. in [Chan et al., 2004] extend the basic model to account for network conflicts. The time to receive a message becomes T = α + k · β · m, where k is the maximum number of network conflicts at the time. However, in our analysis, we did not consider these additional changes to the model.
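The conflict-aware variant T = α + k · β · m can be sketched as a one-line change to the basic formula. The function name is hypothetical; with k = 1 the expression reduces to Equation 4.1.

```python
def hockney_time_with_conflicts(m, alpha, beta, k=1):
    """T = alpha + k * beta * m, where k is the number of network conflicts."""
    if k < 1:
        raise ValueError("k must be at least 1 (k = 1 means no contention)")
    return alpha + k * beta * m


# Hypothetical values: three messages contend for the same link.
print(hockney_time_with_conflicts(4, alpha=1.0, beta=0.5, k=1))
print(hockney_time_with_conflicts(4, alpha=1.0, beta=0.5, k=3))
```

Only the bandwidth term is scaled by k; the startup time α is left unchanged.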

LogP/LogGP models

The LogP model was introduced by Culler et al. in [Culler et al., 1993]. The model attempts to capture the properties of parallel computation in terms of the number of processors, the communication delay, the communication bandwidth (gap), and the communication overhead. The time to receive a message between two processes according to the LogP model is defined as

T = L + 2o (4.2)

where L refers to the network latency and o is the communication overhead. In the LogP model, the sender is allowed to initiate a new send operation after a period of time g. This implies that the network allows transmission of at most ⌊L/g⌋ messages simultaneously. Since none of the parameters depend on message size, the model assumes that only constant-size, small messages are communicated between the nodes. Figure 4.14 (a) displays a communication diagram for the LogP model parameters.
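A minimal sketch of the LogP quantities discussed above follows. The function names and parameter values are hypothetical. The capacity bound is taken as the floor of L/g, and the multi-message estimate assumes the sender injects one message every max(g, o) time units, which is a common reading of the model rather than something stated in the text.

```python
import math


def logp_time(L, o):
    """Time to deliver one small message: T = L + 2o (Equation 4.2)."""
    return L + 2 * o


def logp_capacity(L, g):
    """Maximum number of messages in transit, taken as floor(L / g)."""
    return math.floor(L / g)


def logp_time_k_messages(k, L, o, g):
    """Arrival time of the last of k small messages from one sender.

    Assumption: consecutive sends are spaced max(g, o) apart.
    """
    if k <= 0:
        return 0.0
    return (k - 1) * max(g, o) + logp_time(L, o)


# Hypothetical parameters in arbitrary time units.
print(logp_time(L=6.0, o=2.0))
print(logp_capacity(L=6.0, g=4.0))
print(logp_time_k_messages(3, L=6.0, o=2.0, g=4.0))
```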

Figure 4.14: (a) LogP and (b) LogGP model parameters. The figure shows a broadcast-like communication pattern: P0 → {P2, P1}, P1 → {...}, and P2 → {P3, ...}.

LogGP was introduced in [Alexandrov et al., 1995] as an extension of the LogP model that handles large messages. The model introduces the gap per byte parameter G, to capture the cost of sending large messages across the network. The LogGP model predicts the time to send a message of size m between two processes as

T = L + 2o + (m − 1)G (4.3)

As in the case of the LogP model, the sender is able to initiate a new message only after time g expires. Figure 4.14 (b) shows the LogGP model parameters in a simple communication pattern.
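Equation 4.3 can be sketched as follows. The function name and values are hypothetical. For a one-byte message the (m − 1)G term vanishes and the prediction coincides with the LogP time L + 2o.

```python
def loggp_time(m, L, o, G):
    """Time to send an m-byte message: T = L + 2o + (m - 1) * G."""
    if m < 1:
        raise ValueError("message size must be at least one byte")
    return L + 2 * o + (m - 1) * G


# Hypothetical parameters in arbitrary time units.
print(loggp_time(1, L=6.0, o=2.0, G=0.5))   # same as the LogP prediction
print(loggp_time(5, L=6.0, o=2.0, G=0.5))
```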

PLogP model

The Parametrized LogP model, PLogP, [Kielmann et al., 2000] is an extension of the LogP model. The PLogP model is defined in terms of end-to-end latency L, sender and receiver overheads, o_s(m) and o_r(m) respectively, gap per message g(m), and the number of processes involved in communication P. In this model, sender and receiver overheads and the gap per message parameters depend on the message size, m. The time to receive a message of size m in the PLogP model is defined as

T = L + g(m) (4.4)
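Because g(m) is a function of message size, a sketch of Equation 4.4 needs some representation of that function. The example below assumes, purely for illustration, that g(m) is available as a handful of (size, gap) samples and is linearly interpolated between them, with values clamped outside the sampled range. The helper names and sample values are hypothetical.

```python
from bisect import bisect_left


def make_gap_function(samples):
    """Build g(m) by linear interpolation over (size, gap) samples."""
    points = sorted(samples)
    sizes = [s for s, _ in points]
    gaps = [g for _, g in points]

    def g(m):
        # Clamp outside the sampled range (an assumption of this sketch).
        if m <= sizes[0]:
            return gaps[0]
        if m >= sizes[-1]:
            return gaps[-1]
        i = bisect_left(sizes, m)      # sizes[i - 1] < m <= sizes[i]
        s0, s1 = sizes[i - 1], sizes[i]
        g0, g1 = gaps[i - 1], gaps[i]
        return g0 + (g1 - g0) * (m - s0) / (s1 - s0)

    return g


def plogp_time(m, L, g):
    """Time to receive an m-byte message: T = L + g(m)."""
    return L + g(m)


# Hypothetical samples of g(m) in arbitrary time units.
gap = make_gap_function([(1, 2.0), (5, 4.0)])
print(plogp_time(3, L=10.0, g=gap))
```

Passing g as a callable keeps the model independent of how the gap values were obtained.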

Figure 4.15 shows the PLogP model parameters in action in a simple, broadcast-like communication pattern.

The notion of latency and gap in the PLogP model slightly differs from that of the LogP and LogGP models. In addition to the message transfer time, latency in the PLogP model includes all contributing factors, such as copying data to and from network interfaces. The gap parameter in the PLogP model is defined as the minimum time interval between consecutive message transmissions or receptions, implying that at all times g(m) ≥ o_s(m) and g(m) ≥ o_r(m). If g(m) is a linear function of message size m and L excludes the sender overhead, then the PLogP model is equivalent to a LogGP model that distinguishes between sender and receiver overheads. Kielmann et al. in [Kielmann et al., 2000] provide a transformation from PLogP parameters to LogP/LogGP parameters. Table 4.1 reproduces this dependence.

Figure 4.15: PLogP model parameters. The figure shows a broadcast-like communication pattern: P0 → {P2, P1}, P1 → {...}, and P2 → {P3, ...}.

LogP/LogGP    PLogP
L             L + g(1) − o_s(1) − o_r(1)
o             (o_s(1) + o_r(1)) / 2
g             g(1)
G             lim (m → ∞) g(m) / m
P             P

Table 4.1: LogP/LogGP parameters in terms of PLogP parameters.
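The transformation in Table 4.1 can be sketched as a small conversion routine. The function name, argument names, and example parameter functions are hypothetical. Since the limit defining G cannot be evaluated numerically, this sketch approximates it by g(m)/m at a single large message size, which is an assumption of the example.

```python
def plogp_to_loggp(L, o_s, o_r, g, P, large_m=2**20):
    """Derive LogP/LogGP parameters from PLogP parameters (Table 4.1).

    o_s, o_r and g are callables of the message size m.
    G is approximated by g(large_m) / large_m instead of the true limit.
    """
    return {
        "L": L + g(1) - o_s(1) - o_r(1),
        "o": (o_s(1) + o_r(1)) / 2,
        "g": g(1),
        "G": g(large_m) / large_m,
        "P": P,
    }


# Hypothetical PLogP parameter functions in arbitrary time units.
params = plogp_to_loggp(
    L=10.0,
    o_s=lambda m: 1.0,
    o_r=lambda m: 1.0,
    g=lambda m: 2.0 + 0.5 * m,
    P=4,
)
print(params)
```

With a linear g(m), as in this example, the approximated G approaches the slope of g(m) as large_m grows.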

In document Towards Automatic and Adaptive Optimizations of MPI Collective Operations (Page 54-57)