The LeTS heuristic amortizes the communication cost between tasks by exploiting inter-task data locality, and thereby minimizes the overall schedule length (SL) of the target application. It takes both locality and load balancing into account in order to reduce the execution time of target applications on systems with a multi-level cache hierarchy. Extensive experimental evaluation, conducted using task graphs taken from the Standard Task Graph (STG) set, shows that LeTS outperforms the best-known state-of-the-art algorithms in amortizing the inter-task communication cost. We performed experiments by varying three major performance parameters: (1) the CCR, between 0.1 and 1.0; (2) the application size, i.e., task graphs consisting of 50, 100, and 300 tasks per graph; and (3) the number of cores, with 2-, 4-, 8-, and 16-core execution scenarios. The results show that conscious decision-making by the scheduler regarding data reuse across tasks, together with a task ordering that minimizes the reuse distance of data shared between tasks, can play an important role in minimizing inter-task communication cost. Our results also show in detail how variations in the application size and in the number of cores available to run these applications affect the overall execution time. The LeTS heuristic achieves load balancing through its work-conserving nature and the WTG-OP phase of its working principle. The working principle of LeTS requires the application task graph to be known a priori. Future extensions of the LeTS heuristic will target heterogeneous computing systems and partially known task graphs.
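To make two of the quantities above concrete, the following minimal sketch computes a CCR and a simplified reuse distance for a toy four-task graph. It is an illustration under stated assumptions, not the LeTS implementation: the CCR is assumed here to be the mean edge (communication) cost divided by the mean task (computation) cost, the reuse distance is simplified to the number of tasks executed on one core between two consecutive uses of the same data block, and all names, costs, and data blocks (`ccr`, `total_reuse_distance`, `t1` to `t4`, `A` to `C`) are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def ccr(comp_cost: Dict[str, float],
        comm_cost: Dict[Tuple[str, str], float]) -> float:
    """Assumed CCR definition: mean communication cost / mean computation cost."""
    mean_comm = sum(comm_cost.values()) / len(comm_cost)
    mean_comp = sum(comp_cost.values()) / len(comp_cost)
    return mean_comm / mean_comp


def total_reuse_distance(order: List[str], data: Dict[str, Set[str]]) -> int:
    """Sum, over every reuse of a data block, of the number of tasks that
    ran between the previous use and the current one (simplified metric)."""
    last_use: Dict[str, int] = {}
    total = 0
    for position, task in enumerate(order):
        for block in data[task]:
            if block in last_use:
                total += position - last_use[block] - 1
            last_use[block] = position
    return total


# Hypothetical task graph: t1 -> t2, t1 -> t3, t2 -> t4, t3 -> t4.
comp_cost = {"t1": 4.0, "t2": 6.0, "t3": 5.0, "t4": 5.0}
comm_cost = {("t1", "t2"): 2.0, ("t1", "t3"): 3.0,
             ("t2", "t4"): 1.0, ("t3", "t4"): 2.0}
# Hypothetical data blocks touched by each task.
data = {"t1": {"A"}, "t2": {"A", "B"}, "t3": {"C"}, "t4": {"B", "C"}}

# Two precedence-respecting orders of the same graph on a single core.
order_a = ["t1", "t2", "t3", "t4"]
order_b = ["t1", "t3", "t2", "t4"]

print(ccr(comp_cost, comm_cost))            # 0.4
print(total_reuse_distance(order_a, data))  # 1
print(total_reuse_distance(order_b, data))  # 2
```

Both orders respect the precedence constraints, yet the first keeps the consumers of blocks `A` and `C` directly after their producers and so yields the smaller total. This is the kind of ordering decision that the locality argument above refers to.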