
4.2 Graph Construction

4.2.2 OpenMP Tools Interface

The goal of the OpenMP Tools Interface (OMPT) is to provide a standard API for OpenMP performance tools that is independent of specific platforms and vendors. It is designed to let tools gather performance information while hiding low-level implementation details and remaining as non-intrusive as possible. The OMPT interface is currently a working draft [111] and should become standard when it is released as part of OpenMP 5.0. The current version provides mechanisms for registering a tool, exploring various execution details, examining the state of each OpenMP thread, interpreting a thread’s call stack, receiving event notifications, and tracing execution on OpenMP target devices. To support OMPT, an OpenMP runtime has to maintain additional information about the runtime state of each thread and provide a set of calls that tools can use to query the OpenMP runtime. Since this increases overhead, the runtime switches on OMPT support only if a tool registers itself at the beginning of the execution.

Table 4.1: Partial OMPT interface with functions and callbacks that are most relevant for constructing TDGs.

Name                              Type            Description
ompt_start_tool                   Tool interface  Registers a tool
ompt_initialize_t                 Init callback   Initializes callbacks
ompt_finalize_t                   Final callback  Clean-up
ompt_callback_thread_begin_t      Event callback  Thread begins
ompt_callback_thread_end_t        Event callback  Thread ends
ompt_callback_parallel_begin_t    Event callback  Parallel region begins
ompt_callback_parallel_end_t      Event callback  Parallel region ends
ompt_callback_task_create_t       Event callback  Explicit task creation
ompt_callback_task_dependences_t  Event callback  Task dependencies
ompt_callback_task_schedule_t     Event callback  Task scheduling point
ompt_callback_implicit_task_t     Event callback  Implicit task creation
ompt_callback_sync_region_t       Event callback  Barrier or taskwait
ompt_callback_work_t              Event callback  Worksharing construct
ext_callback_loop_t               Event callback  Parallel loop begins
ext_callback_chunk_t              Event callback  Loop chunk begins
ompt_get_thread_data_t            Entry point     Retrieves thread data

Table 4.1 presents a subset of the interface that is most relevant for a tool designed to construct TDGs. The focus, in this case, is on callbacks related to explicit tasks, parallel regions, loops, barriers, and task scheduling points. The first thing a tool must do is implement the ompt_start_tool function. By implementing it, the tool lets the runtime know that OMPT support should be switched on. The tool then provides function pointers to the initialization and finalization callbacks, ompt_initialize_t and ompt_finalize_t, respectively. Once the initialization callback is invoked, the tool registers the function pointers for its event callbacks and looks up runtime entry points, such as ompt_get_thread_data_t, that allow it to query the runtime for additional information, such as thread data, and to trace activities on a target device.
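The order of the steps in this handshake can be sketched with a small, self-contained mock. This is not the real OMPT header: the types tool_result_t, lookup_fn, init_cb, and fini_cb, the functions mock_start_tool and mock_runtime_run, and the entry-point name "get_thread_data" are hypothetical stand-ins chosen for illustration, and the actual OMPT declarations differ between draft versions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the OMPT types; NOT the real declarations. */
typedef void (*generic_fn)(void);
typedef generic_fn (*lookup_fn)(const char *entry_point_name);
typedef int (*init_cb)(lookup_fn lookup);
typedef void (*fini_cb)(void);
typedef struct { init_cb initialize; fini_cb finalize; } tool_result_t;

/* ---- tool side ---- */
int tool_initialized = 0, tool_finalized = 0, got_thread_data_fn = 0;

/* Plays the role of the initialization callback: look up runtime entry
 * points by name; a real tool would also register event callbacks here. */
int tool_initialize(lookup_fn lookup) {
    got_thread_data_fn = (lookup("get_thread_data") != NULL);
    tool_initialized = 1;
    return 1; /* in this mock, nonzero means "keep the tool active" */
}

/* Plays the role of the finalization callback. */
void tool_finalize(void) { tool_finalized = 1; }

/* Plays the role of ompt_start_tool: its mere presence tells the
 * runtime that tool support should be switched on. */
tool_result_t *mock_start_tool(void) {
    static tool_result_t result = { tool_initialize, tool_finalize };
    return &result;
}

/* ---- runtime side (mock) ---- */
void mock_get_thread_data(void) { /* placeholder entry point */ }

generic_fn mock_lookup(const char *name) {
    return strcmp(name, "get_thread_data") == 0 ? mock_get_thread_data : NULL;
}

/* Returns 1 if tool support was switched on, 0 if no tool registered. */
int mock_runtime_run(tool_result_t *(*start_tool)(void)) {
    tool_result_t *tool = start_tool ? start_tool() : NULL;
    if (tool == NULL) return 0;          /* no tool: no tool overhead */
    int active = tool->initialize(mock_lookup);
    /* ... the program would run here and event callbacks would fire ... */
    tool->finalize();
    return active;
}
```

Calling mock_runtime_run(NULL) models a run without a tool, in which none of the tool code is touched; calling mock_runtime_run(mock_start_tool) walks through registration, initialization, entry-point lookup, and finalization in that order.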

Event callbacks that signal the beginning or the creation of a new entity, such as a thread, a parallel region, or a task, provide a pointer that allows the tool to leave a “cookie” associated with that entity [108]. Whenever other events occur that involve that specific entity, this “cookie” is passed back to the tool, thereby allowing it to quickly associate events with existing entities. For example, when the ompt_callback_parallel_begin_t callback is invoked, a tool can create a new struct with all the relevant data for the parallel region; later on, when the ext_callback_loop_t callback is invoked, one of the input parameters is the pointer to the parallel region data created earlier, in whose context the loop is now running.
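A minimal sketch of the cookie mechanism is given below. The slot type tool_data_t is a stand-in modeled on a value/pointer union, and region_info_t together with the handlers on_parallel_begin, on_loop_begin, and on_parallel_end are hypothetical names with simplified parameter lists; the real callbacks take more arguments.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Stand-in for the per-entity slot the runtime hands to the tool. */
typedef union { uint64_t value; void *ptr; } tool_data_t;

/* Hypothetical bookkeeping a tool might keep per parallel region. */
typedef struct {
    uint64_t region_id;
    unsigned requested_threads;
    unsigned loops_seen;
} region_info_t;

static uint64_t next_region_id = 1;

/* Parallel region begins: allocate the record and leave it as the cookie. */
void on_parallel_begin(tool_data_t *parallel_data, unsigned requested_threads) {
    region_info_t *info = malloc(sizeof *info);
    assert(info != NULL);
    info->region_id = next_region_id++;
    info->requested_threads = requested_threads;
    info->loops_seen = 0;
    parallel_data->ptr = info;
}

/* A loop begins inside the region: the same slot is passed back, so the
 * enclosing region is found without any search. Returns the region id. */
uint64_t on_loop_begin(tool_data_t *parallel_data) {
    region_info_t *info = parallel_data->ptr;
    info->loops_seen++;
    return info->region_id;
}

/* Parallel region ends: release the record. Returns the loop count. */
unsigned on_parallel_end(tool_data_t *parallel_data) {
    region_info_t *info = parallel_data->ptr;
    unsigned loops = info->loops_seen;
    free(info);
    parallel_data->ptr = NULL;
    return loops;
}
```

Because the runtime stores only an opaque slot, the tool is free to keep an integer id in value instead of a pointer when it needs no per-entity record.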

The ompt_callback_task_* callbacks focus on explicit tasks. The task creation callback is invoked when an explicit or an initial task is created. An initial task is created right after the main thread is created, and it represents everything that this thread does until the first parallel region. The task scheduling callback is important for measuring accurate execution times of tasks. Whenever a task is suspended and a different one starts executing, we need to stop measuring the execution time of the suspended task. Since the OpenMP specification does not require explicit tasks to start running immediately after creation [28], this callback is also our only way of knowing that a task has started its execution.
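The timing logic driven by the task scheduling callback can be sketched as follows. The record task_timer_t and the handler on_task_schedule are hypothetical, and the timestamp is passed in as a parameter only to keep the sketch deterministic; a real tool would read a clock inside the callback.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-task timing record kept by the tool. */
typedef struct {
    uint64_t started_at; /* timestamp of the most recent start or resume */
    uint64_t elapsed;    /* accumulated execution time */
    int running;         /* nonzero while the task is executing */
} task_timer_t;

/* Invoked at a task scheduling point: `prev` stops executing and `next`
 * starts or resumes. Either pointer may be NULL. */
void on_task_schedule(task_timer_t *prev, task_timer_t *next, uint64_t now) {
    if (prev != NULL && prev->running) {
        prev->elapsed += now - prev->started_at; /* stop the clock */
        prev->running = 0;
    }
    if (next != NULL) {
        next->started_at = now;                  /* start the clock */
        next->running = 1;
    }
}
```

A task that is suspended and later resumed accumulates only the intervals in which it was actually executing, which is the quantity needed for the node weights of the graph.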

The ompt_callback_implicit_task_t callback is invoked twice per thread: right after a parallel region starts and right before it ends. It represents the separate execution of the parallel region by each thread, and hence the invocation occurs in the context of each thread. This callback has a parameter of type ompt_scope_endpoint_t that specifies whether the thread started or finished the implicit task. In contrast to explicit tasks, the execution of an implicit task starts right after its creation and in the context of the thread in which the callback was invoked. The ompt_callback_sync_region_t callback is invoked at both the beginning and the end of a synchronization region. It has a parameter of type ompt_sync_region_kind_t that specifies whether the region is a barrier or a taskwait construct.
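The begin/end pairing of these callbacks can be illustrated with a short sketch that separates the time a thread spends in barriers from the time it spends in taskwait constructs. The enums scope_endpoint_t and sync_kind_t, the record thread_sync_stats_t, and the handler on_sync_region are hypothetical simplifications, and it is assumed here that synchronization regions do not nest within a thread.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the endpoint and region-kind parameters. */
typedef enum { SCOPE_BEGIN, SCOPE_END } scope_endpoint_t;
typedef enum { SYNC_BARRIER, SYNC_TASKWAIT } sync_kind_t;

/* Hypothetical per-thread statistics. */
typedef struct {
    uint64_t entered_at;    /* timestamp of the pending region entry */
    uint64_t barrier_time;  /* total time spent in barriers */
    uint64_t taskwait_time; /* total time spent in taskwaits */
} thread_sync_stats_t;

/* One handler serves both invocations; the endpoint tells them apart. */
void on_sync_region(thread_sync_stats_t *stats, sync_kind_t kind,
                    scope_endpoint_t endpoint, uint64_t now) {
    if (endpoint == SCOPE_BEGIN) {
        stats->entered_at = now;
        return;
    }
    uint64_t waited = now - stats->entered_at;
    if (kind == SYNC_BARRIER)
        stats->barrier_time += waited;
    else
        stats->taskwait_time += waited;
}
```

The implicit task callback follows the same pattern: a single handler that branches on the endpoint to either open or close the record for the thread's share of the parallel region.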

The two callbacks related to parallel loops, namely ext_callback_loop_t and ext_callback_chunk_t, are not part of the working technical report draft of OMPT [111]. They belong to an experimental extension [112] that was added to OMPT to support the creation of Grain Graphs [113]. Their purpose is to allow tools to capture individual OpenMP loop chunks. A chunk consists of one or more iterations of a parallel loop that are executed by a single thread. Since chunks usually include more than one iteration and since the number of iterations in a loop can be very high, constructing a task for each chunk is more efficient than constructing a task for each iteration. Compared to the worksharing callback ompt_callback_work_t, which provides only very general information about a parallel loop, the ext_callback_loop_t callback provides more detailed information, such as iteration bounds, chunk size, and scheduling type. The ext_callback_chunk_t callback is invoked at the beginning of each chunk and provides the iteration bounds of that chunk, as well as a flag specifying whether the current chunk is the last one. Currently, chunk callbacks are only available if the loop is scheduled in dynamic mode (see Section 1.2.1). Supporting these callbacks in static mode requires more changes in the runtime and can have a negative impact on performance.
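To make the chunk granularity concrete, the sketch below enumerates the chunks of a loop from its iteration bounds and chunk size, producing for each chunk the kind of information the chunk callback is described as reporting (bounds and a last-chunk flag). The type chunk_t and the function split_into_chunks are hypothetical, the bounds are assumed to be inclusive with a unit stride, and the order in which a dynamic schedule actually hands chunks to threads is not modeled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record for one loop chunk, i.e. one node of the graph. */
typedef struct {
    long lower;  /* first iteration of the chunk (inclusive) */
    long upper;  /* last iteration of the chunk (inclusive) */
    int is_last; /* nonzero for the final chunk of the loop */
} chunk_t;

/* Splits the inclusive range [lb, ub] into chunks of at most chunk_size
 * iterations. Writes at most max_chunks entries to `out` and returns the
 * number written. */
size_t split_into_chunks(long lb, long ub, long chunk_size,
                         chunk_t *out, size_t max_chunks) {
    size_t n = 0;
    if (chunk_size <= 0) return 0;
    for (long lo = lb; lo <= ub && n < max_chunks; lo += chunk_size) {
        long hi = lo + chunk_size - 1;
        if (hi > ub) hi = ub;        /* the final chunk may be shorter */
        out[n].lower = lo;
        out[n].upper = hi;
        out[n].is_last = (hi == ub);
        n++;
    }
    return n;
}
```

For a loop over iterations 0 to 9 with a chunk size of 4, this yields three chunk nodes instead of ten iteration nodes, which is the saving the chunk-level granularity is meant to provide.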

Since OMPT is not yet part of the OpenMP standard, the only reference implementation available is an experimental branch of the LLVM OpenMP runtime [114]. Nevertheless, once OpenMP 5.0 is released, we are confident that OMPT support will quickly become available in other OpenMP implementations as well.

LD_PRELOAD="libtdg.so" OMP_NUM_THREADS=2 \
TDG_TOOL_POSTPROC="tim,dot,log" \
TDG_PAPI_COUNTERS="PAPI_TOT_CYC,PAPI_L3_TCM" ./example_app

Figure 4.3: Example of Libtdg usage.