research-article
Just Accepted

Orchard: Heterogeneous Parallelism and Fine-grained Fusion for Complex Tree Traversals

Online AM: 15 March 2024

Abstract

Many applications perform traversals over tree-like data structures, and fusing and parallelizing these traversals can improve their performance. Fusing multiple traversals into one improves the application's locality, while extracting parallelism and using multithreading can significantly reduce its runtime. Prior frameworks have tried to fuse and parallelize tree traversals using coarse-grained approaches, and so they miss fine-grained opportunities for improving performance. Other frameworks successfully support fine-grained fusion on heterogeneous tree types but fall short on parallelization. We introduce Orchard, a new framework built on top of Grafter. Orchard's novelty lies in letting the programmer transform tree traversal applications by automatically applying fine-grained fusion and extracting heterogeneous parallelism. Orchard allows the programmer to write general tree traversal applications in a simple and elegant embedded domain-specific language (eDSL). We show that, when the required conditions are met, combining fine-grained fusion with heterogeneous parallelism performs better than either technique alone.
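To make the two ideas above concrete, here is a minimal Python sketch. It is not Orchard's or Grafter's API, and it shows only the simplest forms of each technique: two whole-tree traversals (a sum and a height computation) are fused into one pass that visits each node once, and the fused pass is then run on the root's two disjoint subtrees concurrently. This is plain fork-join parallelism over independent subtrees, a much simpler case than the fine-grained fusion and heterogeneous parallelism the abstract describes. The `Node` class and all function names are hypothetical. Because of CPython's global interpreter lock, the threads here illustrate the structure of the transformation rather than deliver a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Node:
    # Hypothetical binary tree node, used only for illustration.
    value: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


# Unfused: two separate traversals, each walking the whole tree.
def tree_sum(node: Optional[Node]) -> int:
    if node is None:
        return 0
    return node.value + tree_sum(node.left) + tree_sum(node.right)


def tree_height(node: Optional[Node]) -> int:
    if node is None:
        return 0
    return 1 + max(tree_height(node.left), tree_height(node.right))


# Fused: a single traversal computes both results, touching each node once,
# which is where the locality benefit of fusion comes from.
def fused(node: Optional[Node]) -> Tuple[int, int]:
    if node is None:
        return 0, 0
    ls, lh = fused(node.left)
    rs, rh = fused(node.right)
    return node.value + ls + rs, 1 + max(lh, rh)


# Parallel fused: the left and right subtrees are disjoint, so their fused
# traversals have no data dependence and can be run concurrently.
def parallel_fused(root: Optional[Node]) -> Tuple[int, int]:
    if root is None:
        return 0, 0
    with ThreadPoolExecutor(max_workers=2) as pool:
        left = pool.submit(fused, root.left)
        right = pool.submit(fused, root.right)
        (ls, lh), (rs, rh) = left.result(), right.result()
    return root.value + ls + rs, 1 + max(lh, rh)


if __name__ == "__main__":
    tree = Node(1, Node(2, Node(4)), Node(3, None, Node(5, Node(6))))
    print(tree_sum(tree), tree_height(tree))  # 21 4
    print(fused(tree), parallel_fused(tree))  # (21, 4) (21, 4)
```

The fused and parallel versions return the same results as the two separate traversals; only the number of passes over the tree and the order in which independent subtrees are processed change.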

References

  1. Robert D. Blumofe, Christopher F. Joerg, Bradley C. Kuszmaul, Charles E. Leiserson, Keith H. Randall, and Yuli Zhou. 1995. Cilk: An Efficient Multithreaded Runtime System. In Proceedings of the Fifth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPOPP ’95). Association for Computing Machinery, New York, NY, USA, 207–216. https://doi.org/10.1145/209936.209958
  2. Uday Bondhugula, Albert Hartono, J. Ramanujam, and P. Sadayappan. 2008. A Practical Automatic Polyhedral Parallelizer and Locality Optimizer. In Proceedings of the 29th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI ’08). Association for Computing Machinery, New York, NY, USA, 101–113. https://doi.org/10.1145/1375581.1375595
  3. Yanju Chen, Junrui Liu, Yu Feng, and Rastislav Bodik. 2022. Tree Traversal Synthesis Using Domain-Specific Symbolic Compilation. In Proceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS ’22). Association for Computing Machinery, New York, NY, USA, 1030–1042. https://doi.org/10.1145/3503222.3507751
  4. Duncan Coutts, Roman Leshchinskiy, and Don Stewart. 2007. Stream Fusion: From Lists to Streams to Nothing at All. In Proceedings of the 12th ACM SIGPLAN International Conference on Functional Programming (ICFP ’07). Association for Computing Machinery, New York, NY, USA, 315–326. https://doi.org/10.1145/1291151.1291199
  5. Alain Darte. 1999. On the Complexity of Loop Fusion. In Proceedings of the 1999 International Conference on Parallel Architectures and Compilation Techniques (PACT ’99). IEEE Computer Society, USA, 149.
  6. John Doner. 1970. Tree acceptors and some of their applications. J. Comput. System Sci. 4, 5 (1970), 406–451. https://doi.org/10.1016/S0022-0000(70)80041-1
  7. Joost Engelfriet and Sebastian Maneth. 2002. Output String Languages of Compositions of Deterministic Macro Tree Transducers. J. Comput. System Sci. 64, 2 (2002), 350–395. https://doi.org/10.1006/jcss.2001.1816
  8. Robert J. Harrison, Gregory Beylkin, Florian A. Bischoff, Justus A. Calvin, George I. Fann, Jacob Fosso-Tande, Diego Galindo, Jeff R. Hammond, Rebecca Hartman-Baker, Judith C. Hill, Jun Jia, Jakob S. Kottmann, M-J. Yvonne Ou, Junchen Pei, Laura E. Ratcliff, Matthew G. Reuter, Adam C. Richie-Halford, Nichols A. Romero, Hideo Sekino, William A. Shelton, Bryan E. Sundahl, W. Scott Thornton, Edward F. Valeev, Álvaro Vázquez-Mayagoitia, Nicholas Vence, Takeshi Yanai, and Yukina Yokoi. 2016. MADNESS: A Multiresolution, Adaptive Numerical Environment for Scientific Simulation. SIAM Journal on Scientific Computing 38, 5 (2016), S123–S142. https://doi.org/10.1137/15M1026171
  9. Ken Kennedy and Kathryn S. McKinley. 1993. Maximizing Loop Parallelism and Improving Data Locality via Loop Fusion and Distribution. In Proceedings of the 6th International Workshop on Languages and Compilers for Parallel Computing. Springer-Verlag, Berlin, Heidelberg, 301–320.
  10. J. R. Larus and P. N. Hilfinger. 1988. Detecting Conflicts Between Structure Accesses. In Proceedings of the ACM SIGPLAN 1988 Conference on Programming Language Design and Implementation (PLDI ’88). ACM, New York, NY, USA, 24–31. https://doi.org/10.1145/53990.53993
  11. Andreas Maletti. 2008. Compositions of extended top-down tree transducers. Information and Computation 206, 9 (2008), 1187–1196. https://doi.org/10.1016/j.ic.2008.03.019 Special Issue: 1st International Conference on Language and Automata Theory and Applications (LATA 2007).
  12. Leo A. Meyerovich and Rastislav Bodik. 2010. Fast and Parallel Webpage Layout. In Proceedings of the 19th International Conference on World Wide Web (WWW ’10). Association for Computing Machinery, New York, NY, USA, 711–720. https://doi.org/10.1145/1772690.1772763
  13. Leo A. Meyerovich, Matthew E. Torok, Eric Atkinson, and Rastislav Bodik. 2013. Parallel Schedule Synthesis for Attribute Grammars. In Proceedings of the 18th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP ’13). Association for Computing Machinery, New York, NY, USA, 187–196. https://doi.org/10.1145/2442516.2442535
  14. Dmitry Petrashko, Ondřej Lhoták, and Martin Odersky. 2017. Miniphases: Compilation Using Modular and Efficient Tree Transformations. In Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 2017). Association for Computing Machinery, New York, NY, USA, 201–216. https://doi.org/10.1145/3062341.3062346
  15. Apan Qasem and Ken Kennedy. 2006. Profitable Loop Fusion and Tiling Using Model-Driven Empirical Search. In Proceedings of the 20th Annual International Conference on Supercomputing (ICS ’06). Association for Computing Machinery, New York, NY, USA, 249–258. https://doi.org/10.1145/1183401.1183437
  16. S. Rajbhandari, J. Kim, S. Krishnamoorthy, L. Pouchet, F. Rastello, R. J. Harrison, and P. Sadayappan. 2016. A Domain-Specific Compiler for a Parallel Multiresolution Adaptive Numerical Simulation Environment. In SC ’16: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. 468–479.
  17. Samyam Rajbhandari, Jinsung Kim, Sriram Krishnamoorthy, Louis-Noël Pouchet, Fabrice Rastello, Robert J. Harrison, and P. Sadayappan. 2016. On Fusing Recursive Traversals of K-d Trees. In Proceedings of the 25th International Conference on Compiler Construction (CC 2016). Association for Computing Machinery, New York, NY, USA, 152–162. https://doi.org/10.1145/2892208.2892228
  18. Laith Sakka, Kirshanthan Sundararajah, and Milind Kulkarni. 2017. TreeFuser: A Framework for Analyzing and Fusing General Recursive Tree Traversals. Proc. ACM Program. Lang. 1, OOPSLA, Article 76 (Oct. 2017), 30 pages. https://doi.org/10.1145/3133900
  19. Laith Sakka, Kirshanthan Sundararajah, Ryan R. Newton, and Milind Kulkarni. 2019. Sound, Fine-Grained Traversal Fusion for Heterogeneous Trees. In Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 2019). Association for Computing Machinery, New York, NY, USA, 830–844. https://doi.org/10.1145/3314221.3314626
  20. Dipanwita Sarkar and Oscar Waddell. 2005. A Nanopass Framework for Compiler Education.
  21. Tao B. Schardl, William S. Moses, and Charles E. Leiserson. 2017. Tapir: Embedding Fork-Join Parallelism into LLVM’s Intermediate Representation. In Proceedings of the 22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP ’17). Association for Computing Machinery, New York, NY, USA, 249–265. https://doi.org/10.1145/3018743.3018758
  22. Philip Wadler. 1990. Deforestation: transforming programs to eliminate trees. Theoretical Computer Science 73, 2 (1990), 231–248. https://doi.org/10.1016/0304-3975(90)90147-A
  23. Yusheng Weijiang, Shruthi Balakrishna, Jianqiao Liu, and Milind Kulkarni. 2015. Tree Dependence Analysis. In Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI ’15). ACM, New York, NY, USA, 314–325. https://doi.org/10.1145/2737924.2737972
  24. Ben Wiedermann and William R. Cook. 2007. Extracting Queries by Static Analysis of Transparent Persistence. In Proceedings of the 34th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL ’07). ACM, New York, NY, USA, 199–210. https://doi.org/10.1145/1190216.1190248

      Published in

        ACM Transactions on Architecture and Code Optimization, Just Accepted
        ISSN: 1544-3566
        EISSN: 1544-3973

        Copyright © 2024 Copyright held by the owner/author(s).

        Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s).

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 15 March 2024
        • Accepted: 25 February 2024
        • Revised: 24 January 2024
        • Received: 17 July 2023

