Evaluation of the correctness tool - High-level compiler analysis for OpenMP

To evaluate the correctness framework we take two approaches: we evaluate its usefulness based on the experience of novice programmers, and its accuracy compared with similar tools. This section details both evaluations.

4.4.1 Usefulness

To evaluate the usefulness of our tool we used it in three undergraduate courses. The first was the “Course on programming models using OmpSs” [16], which took place in June 2014 in Bucaramanga, Colombia. This course had 21 participants, lasted one week, and introduced basic and intermediate levels of OmpSs. The second and third courses were part of the “Parallelism” subject [153] of the Computer Science degree at the Technical University of Catalonia, Spain, and took place in May 2014 and October 2014. These courses had 23 participants (10 groups of 2 or 3 students) and 26 participants (13 groups of 2 students), respectively. Each course lasted 3 weeks and covered strategies for task decomposition and mechanisms for task synchronization. During the lectures, the students were asked to parallelize different algorithms using the OpenMP and OmpSs tasking models, and to analyze the performance and correctness of their implementations.

The students were provided with serial implementations or incomplete parallel versions of a series of benchmarks. They were given directions to perform the parallelization, and we applied quality checks on the results of the correctness tool at two different steps: a) before the tool was given to the students, I tested the expected mistakes myself, and b) during the lectures, the results were checked by the professors of the lectures and myself. The assignments involved the following medium-sized programs:

– Compute the nth number in the Fibonacci sequence: simple version and linked list version (appendix B.1.1).

– Compute the dot product of two equal-length arrays (appendix B.1.2).

– Compute the multiplication of two matrices (appendix B.1.3).

– Compute the number Pi with a Monte Carlo method (appendix B.1.4).

– Compute a solution for a random Sudoku puzzle (appendix B.1.5).

Figure 4.6 displays the results of this test. While all the codes used in this evaluation involved data-sharing attributes, only one of them involved dependence clauses. This explains why the most common mistakes are related to data-sharing attributes. The mistakes, ordered by frequency, are as follows:

1. Defining a variable that is never used, thus dead, due to using the firstprivate default data-sharing instead of explicitly defining it as shared.

2. Using a variable as firstprivate instead of private when its initial value is never read.

3. Having a race condition, either because a variable is not protected in an atomic or critical construct, or because the task is not properly synchronized.

4. Using an automatic storage variable in a task which is not synchronized in the scope of the variable.

5. Defining dependences on a pointer variable instead of on the pointed object.

6. Defining a variable as private when it should be firstprivate because it is upwards exposed.

7. Defining a variable as an input dependence when its value is never read.

The last two cases are uncommon because users have to explicitly specify the data-sharing attribute or the dependence clause, whereas in the other cases the default data-sharing rules apply to the variables and programmers usually forget to override them explicitly.
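As an illustration of mistakes 1 and 4, the following is a minimal sketch of a task-parallel Fibonacci. It is not the code from appendix B.1.1; the function names fib and fib_parallel are assumptions made for this example. The comments mark where the two mistakes would typically appear.

```c
#include <assert.h>

/* Hypothetical task-parallel Fibonacci (not the appendix B.1.1 code). */
long fib(int n)
{
    assert(n >= 0);
    if (n < 2)
        return n;

    long x, y;

    /* Mistake 1 would be omitting shared(x): x is a local variable, so it
     * would default to firstprivate in the task, and the value assigned
     * inside the task would be written to a private copy that is never used. */
    #pragma omp task shared(x)
    x = fib(n - 1);

    #pragma omp task shared(y)
    y = fib(n - 2);

    /* Mistake 4 would be omitting this taskwait: x and y have automatic
     * storage, so the tasks must finish before the function returns. */
    #pragma omp taskwait
    return x + y;
}

/* Driver: one thread creates the root of the task tree. */
long fib_parallel(int n)
{
    long result = 0;
    #pragma omp parallel
    #pragma omp single
    result = fib(n);
    return result;
}
```

When compiled without OpenMP support the pragmas are ignored and the code runs serially with the same result, which makes it easy to compare the two executions.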

[Figure 4.6: bar chart of occurrences (axis from 0 to 300) per error type: Dead, Incoherent firstprivate, Race, Automatic as shared, Incoherent pointer, Incoherent input, Incoherent private.]

4.4.2 Comparison with other frameworks: Oracle Solaris Studio 12.3

We have also compared our messages with those from the Oracle Solaris Studio 12.3 compiler [114]. The Studio compiler warns about two different situations: parallelized loops with data dependences between different loop iterations, and problematic data-sharing attributes (e.g., declaring as shared a variable whose accesses in a parallel region might cause a data race, or declaring as private a variable whose value in a parallel region is used after the parallel region). The first situation is not useful for our comparison because it does not involve tasks, so we only analyze the second. Moreover, Studio does not implement OpenMP 4.0, so the case study regarding dependence clauses (Section 4.3.5) cannot be compared.

We use the code snippets shown in each of the case studies presented in Section 4.3. The results are shown in Table 4.1 and analyzed as follows:

Case 1. Mercurium advises synchronizing the task instead of privatizing the variable, because the variable is an array. Studio advises firstprivatizing the variable instead. We used GCC to test the performance of the two versions of this simple code snippet. Over 5 executions, the average time of the firstprivate version is 6.656 ms and that of the taskwait version is 2.709 ms, so firstprivatizing costs about 4 ms (6.656 ms − 2.709 ms = 3.947 ms) even for this simple example.
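The two alternatives discussed in Case 1 can be sketched as follows. This is a hypothetical reconstruction, not the snippet from Section 4.3: the array size N, the global accumulator total, and all function names are assumptions chosen for illustration, and the sketch does not reproduce the timings reported above.

```c
#include <assert.h>
#include <stddef.h>

#define N 1024            /* hypothetical array size */

static long total;        /* hypothetical global accumulator */

static void fill(int *v, size_t n)
{
    assert(v != NULL);
    for (size_t i = 0; i < n; ++i)
        v[i] = (int)i;
}

static long sum(const int *v, size_t n)
{
    long s = 0;
    for (size_t i = 0; i < n; ++i)
        s += v[i];
    return s;
}

/* Variant suggested by Studio: the task receives its own copy of the
 * whole array, so the function may return before the task runs. */
void produce_firstprivate(void)
{
    int a[N];
    fill(a, N);
    #pragma omp task firstprivate(a)
    {
        long s = sum(a, N);
        #pragma omp atomic
        total += s;
    }
}

/* Variant suggested by Mercurium: the array is shared (no copy) and a
 * taskwait keeps it alive until the task has finished. */
void produce_taskwait(void)
{
    int a[N];
    fill(a, N);
    #pragma omp task shared(a)
    {
        long s = sum(a, N);
        #pragma omp atomic
        total += s;
    }
    #pragma omp taskwait
}

/* Runs one variant from a single thread inside a parallel region. */
long run(void (*variant)(void))
{
    total = 0;
    #pragma omp parallel
    #pragma omp single
    variant();
    return total;   /* the barrier ending the region completes all tasks */
}
```

Both variants compute the same result; they differ only in whether the array is copied into the task, which is the cost that the comparison in Case 1 measures.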

Case 2. Studio shows a wrong message, since x is a global variable and is therefore accessible from every scope (unless it has been shadowed). In the example, the variable is alive whenever the task is executed. Additionally, the compiler does not warn about the real problem, i.e., the race condition. If we wrap the task and the call to printf in a parallel construct, then Studio is able to recognize the race. It remains unclear to us why the lack of a parallel construct results in a wrong message. The Studio compiler is proprietary software and the only documentation is the Oracle web site, so we cannot analyze its algorithm.

Case 3. Studio does not consider the possible performance loss of firstprivatizing a variable whose value is never read. We already showed in Case 1 that copying arrays may be unnecessarily expensive.
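The situation described in Case 2 can be sketched as follows. This is a hypothetical reconstruction rather than the snippet from Section 4.3; the function name read_after_task and the value 42 are assumptions. The global x is shared in the task, so the relevant issue is the ordering between the write in the task and the later read, not the lifetime of x.

```c
#include <assert.h>

int x = 0;   /* global variable: alive for the whole program */

/* Hypothetical example: a task writes the global x and the creating
 * thread reads it afterwards. */
int read_after_task(void)
{
    int seen = 0;
    #pragma omp parallel
    #pragma omp single
    {
        #pragma omp task
        x = 42;              /* x is global, hence shared in the task */

        /* Without this taskwait the read below would race with the
         * write performed by the task. */
        #pragma omp taskwait
        seen = x;
    }
    return seen;
}
```

With the taskwait in place the read always observes the value written by the task; removing it would reintroduce the race that a correctness tool is expected to report.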
