Abstract
Many applications perform traversals over tree-like data structures, and fusing and parallelizing these traversals can improve their performance. Fusing multiple traversals improves the application's locality, while extracting parallelism and exploiting multi-threading can significantly reduce its runtime. Prior frameworks have fused and parallelized tree traversals using coarse-grained approaches, missing fine-grained opportunities for improving performance. Other frameworks support fine-grained fusion over heterogeneous tree types but fall short on parallelization. We introduce Orchard, a new framework built on top of Grafter. Orchard's novelty lies in transforming tree traversal applications by automatically applying fine-grained fusion and extracting heterogeneous parallelism. Orchard lets the programmer write general tree traversal applications in a simple, elegant embedded domain-specific language (eDSL). We show that, when the required conditions are met, combining fine-grained fusion with heterogeneous parallelism performs better than either technique alone.
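To make the two ideas in the abstract concrete, here is a minimal sketch in plain Python: two separate traversals over the same tree, a hand-fused version that visits each node once, and a simple fork-join variant that runs independent child subtrees on separate threads. Everything here (the `Node` class, the `subtree_sum` and `depth` fields, and all function names) is hypothetical and is not Orchard's eDSL or Grafter's API. The fusion is written by hand, whereas the abstract describes Orchard applying it automatically. The sketch also shows only plain fork-join parallelism over sibling subtrees, not Orchard's heterogeneous parallelism. Fusion is legal here only because neither traversal reads a field the other writes. In standard (GIL-enabled) CPython builds, the threads illustrate the fork-join structure but will not speed up this CPU-bound code.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical tree node; the traversals below fill in the last two fields."""
    value: int
    children: list[Node] = field(default_factory=list)
    subtree_sum: int = 0  # written by the post-order (bottom-up) traversal
    depth: int = 0        # written by the pre-order (top-down) traversal


# --- Two separate traversals: the tree is walked twice. ---
def compute_sums(node: Node) -> int:
    node.subtree_sum = node.value + sum(compute_sums(c) for c in node.children)
    return node.subtree_sum


def compute_depths(node: Node, depth: int = 0) -> None:
    node.depth = depth
    for c in node.children:
        compute_depths(c, depth + 1)


# --- Fused traversal: one walk does both jobs. Legal here because neither
# traversal reads a field that the other one writes. ---
def fused(node: Node, depth: int = 0) -> int:
    node.depth = depth            # pre-order work taken from compute_depths
    total = node.value
    for c in node.children:
        total += fused(c, depth + 1)
    node.subtree_sum = total      # post-order work taken from compute_sums
    return total


# --- Fork-join over the fused traversal: sibling subtrees touch disjoint
# nodes, so each child of the root is processed on its own worker thread.
# Forking only at the root avoids tasks blocking on nested tasks in the pool. ---
def fused_parallel(root: Node, max_workers: int = 4) -> int:
    root.depth = 0
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(fused, c, 1) for c in root.children]  # fork
        total = root.value + sum(f.result() for f in futures)        # join
    root.subtree_sum = total
    return total


def snapshot(node: Node) -> list[tuple[int, int, int]]:
    """Pre-order list of (value, depth, subtree_sum), for comparing results."""
    out = [(node.value, node.depth, node.subtree_sum)]
    for c in node.children:
        out.extend(snapshot(c))
    return out


def make_tree() -> Node:
    # 1 -> (2 -> (4, 5), 3 -> (6))
    return Node(1, [Node(2, [Node(4), Node(5)]), Node(3, [Node(6)])])


if __name__ == "__main__":
    a = make_tree()
    compute_sums(a)
    compute_depths(a)
    b = make_tree()
    fused(b)
    c = make_tree()
    fused_parallel(c)
    assert snapshot(a) == snapshot(b) == snapshot(c)
    print(snapshot(b))
    # [(1, 0, 21), (2, 1, 11), (4, 2, 4), (5, 2, 5), (3, 1, 9), (6, 2, 6)]
```

All three versions compute the same field values; the fused version simply does each node's pre-order and post-order work during a single visit instead of two separate walks, which is the locality benefit the abstract refers to.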
Orchard: Heterogeneous Parallelism and Fine-grained Fusion for Complex Tree Traversals