Academic Journals Database
Disseminating quality controlled scientific knowledge

Autotuning Strategies For Reducing Synchronization Costs In Multithreaded Kernels

Author(s): Apan Qasem

Journal: ARPN Journal of Systems and Software
ISSN 2222-9833

Volume: 2;
Issue: 4;
Start page: 152;
Date: 2012;
VIEW PDF   PDF DOWNLOAD PDF   Download PDF Original page

Keywords: Compiler Design | Parallelism | Software Infrastructure

Emergence of multicore architectures has opened up new opportunities for thread-level parallelism and dramatically increased the theoretical peak on current systems. However, achieving a high fraction of peak performance requires careful orchestration of many architecture-sensitive parameters, both on-chip and across the interconnect. In particular, the presence of shared-caches on multicore architectures makes it necessary to consider, in concert, issues related to thread synchronization and data locality. This paper studies the complex interaction among several compiler-level code transformations that affect data locality, achieved parallelism and synchronization and communication costs. We characterize this interaction using static analysis and generate a search space suitable for efficient automatic performance tuning. We also develop a heuristic based on number of threads; data reuse patterns, and the size and configuration of the shared cache, to estimate the optimal synchronization interval for pipeline-parallelized code. We validate our choice of tuning parameters and evaluate our heuristic with experiments on a set of scientific kernels on four different multicore platforms. The results show that our proposed heuristic is able to estimate the optimal synchronization window with reasonable accuracy and able to achieve significant performance improvement.
Why do you need a reservation system?      Affiliate Program