Hostname: page-component-848d4c4894-p2v8j Total loading time: 0 Render date: 2024-05-07T10:16:40.564Z Has data issue: false hasContentIssue false

Lambda calculus with algebraic simplification for reduction parallelisation: Extended study

Part of: ICFP 2019

Published online by Cambridge University Press:  05 April 2021

AKIMASA MORIHATA*
Affiliation:
University of Tokyo, 3-8-1, Komaba, Meguro-ku, Tokyo, Japan (e-mail: morihata@graco.c.u-tokyo.ac.jp)
Rights & Permissions [Opens in a new window]

Abstract

Core share and HTML view are not available for this content. However, as you have access to this content, a full PDF is available via the ‘Save PDF’ action button.

Parallel reduction is a major component of parallel programming and widely used for summarisation and aggregation. It is not well understood, however, what sorts of non-trivial summarisations can be implemented as parallel reductions. This paper develops a calculus named λAS, a simply typed lambda calculus with algebraic simplification. This calculus provides a foundation for studying a parallelisation of complex reductions by equational reasoning. Its key feature is δ abstraction. A δ abstraction is observationally equivalent to the standard λ abstraction, but its body is simplified before the arrival of its arguments using algebraic properties such as associativity and commutativity. In addition, the type system of λAS guarantees that simplifications due to δ abstractions do not lead to serious overheads. The usefulness of λAS is demonstrated on examples of developing complex parallel reductions, including those containing more than one reduction operator, loops with conditional jumps, prefix sum patterns and even tree manipulations.

Type
Research Article
Creative Commons
Creative Common License - CCCreative Common License - BY
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited.
Copyright
© The Author(s), 2021. Published by Cambridge University Press

References

Bergstrom, L., Fluet, M., Rainey, M., Reppy, J. H. & Shaw, A. (2012) Lazy tree splitting. J. Funct. Program. 22(4–5), 382438.CrossRefGoogle Scholar
Blelloch, G. E. (1993) Prefix sums and their applications. In Synthesis of Parallel Algorithms, Chapter 1, Reif, J. H. (ed). Morgan Kaufmann Publishers.Google Scholar
Bruggeman, C., Waddell, O. & Dybvig, R. K. (1996) Representing control in the presence of one-shot continuations. In Proceedings of the ACM SIGPLAN’96 Conference on Programming Language Design and Implementation (PLDI), Philadephia, Pennsylvania, USA, May 21–24, 1996, Fischer, C. N. (ed). ACM, pp. 99107.CrossRefGoogle Scholar
Buneman, P., Cong, G., Fan, W. & Kementsietsidis, A. (2006) Using partial evaluation in distributed query evaluation. In Proceedings of the 32nd International Conference on Very Large Data Bases, Seoul, Korea, September 12–15, 2006, Dayal, U., Whang, K.-Y., Lomet, D. B., Alonso, G., Lohman, G. M., Kersten, M. L., Cha, S. K. & Kim, Y.-K. (eds). ACM, pp. 211222.Google Scholar
Callahan, D. (1992) Recognizing and parallelizing bounded recurrences. In Languages and Compilers for Parallel Computing, Fourth International Workshop, Santa Clara, California, USA, August 7–9, 1991, Proceedings, Banerjee, U., Gelernter, D., Nicolau, A. & Padua, D. A. (eds), Lecture Notes in Computer Science, vol. 589. Springer, pp. 169185.CrossRefGoogle Scholar
Castro, D., Hammond, K. & Sarkar, S. (2016) Farms, pipes, streams and reforestation: Reasoning about structured parallel processes using types and hylomorphisms. In Proceedings of the 21st ACM SIGPLAN International Conference on Functional Programming, ICFP 2016, Nara, Japan, September 18–22, 2016, Garrigue, J., Keller, G. & Sumii, E. (eds). ACM, pp. 417.CrossRefGoogle Scholar
Castro, D., Hammond, K., Sarkar, S. & Alguwaifli, Y. (2018) Automatically deriving cost models for structured parallel processes using hylomorphisms. Future Gener. Comp. Syst. 79, 653668.CrossRefGoogle Scholar
Chi, Y.-Y. & Mu, S.-C. (2011) Constructing list homomorphisms from proofs. In Programming Languages and Systems - 9th Asian Symposium, APLAS 2011, Kenting, Taiwan, December 5–7, 2011. Proceedings, Yang, H. (ed), Lecture Notes in Computer Science, vol. 7078. Springer, pp. 7488.CrossRefGoogle Scholar
Chin, W.-N., Takano, A. & Hu, Z. (1998) Parallelization via context preservation. In Proceedings of the 1998 International Conference on Computer Languages, ICCL’98, May 14–16, 1998, Chicago, IL, USA. IEEE Computer Society, pp. 153162.Google Scholar
Cole, M. I. (1989) Algorithmic Skeletons: Structural Management of Parallel Computation. MIT Press.Google Scholar
Cong, G., Fan, W. & Kementsietsidis, A. (2007) Distributed query evaluation with performance guarantees. In Proceedings of the ACM SIGMOD International Conference on Management of Data, Beijing, China, June 12–14, 2007, Chan, C. Y., Ooi, B. C. & Zhou, A. (eds). ACM, pp. 509520.CrossRefGoogle Scholar
Cong, G., Fan, W., Kementsietsidis, A., Li, J. & Liu, X. (2012) Partial evaluation for distributed XPath query processing and beyond. ACM Trans. Database Syst. 37(4), 32:1–32:43.CrossRefGoogle Scholar
Consel, C. & Danvy, O. (1992) Partial evaluation in parallel. Lisp Symb. Comput. 5(4), 327342.Google Scholar
Danvy, O. & Filinski, A. (1990) Abstracting control. In Proceedings of the 1990 ACM Conference on LISP and Functional Programming, LFP 1990, Nice, France, 27–29 June 1990. ACM, pp. 151160.CrossRefGoogle Scholar
Dean, J. & Ghemawat, S. (2004) MapReduce: Simplified data processing on large clusters. In 6th Symposium on Operating System Design and Implementation (OSDI 2004), December 6–8, 2004, San Francisco, California, USA, pp. 137150.Google Scholar
de Moura, A. & Ierusalimschy, R. (2009) Revisiting coroutines. ACM Trans. Program. Lang. Syst. 31(2), 6:16:31.CrossRefGoogle Scholar
Deitz, S. J., Callahan, D., Chamberlain, B. L. & Snyder, L. (2006) Global-view abstractions for user-defined reductions and scans. In Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPOPP 2006, New York, New York, USA, March 29–31, 2006, Torrellas, J. & Chatterjee, S. (eds). ACM, pp. 4047.CrossRefGoogle Scholar
Dolan, S., Eliopoulos, S., Hillerström, D., Madhavapeddy, A., Sivaramakrishnan, K. C. & White, L. (2017) Concurrent system programming with effect handlers. In Trends in Functional Programming - 18th International Symposium, TFP 2017, Canterbury, UK, June 19–21, 2017, Revised Selected, Papers, Wang, M. & Owens, S. (eds), Lecture Notes in Computer Science, vol. 10788. Springer, pp. 98117.Google Scholar
Emoto, K., Fischer, S. & Hu, Z. (2012) Filter-embedding semiring fusion for programming with mapreduce. Formal Asp. Comput. 24(4–6), 623645.Google Scholar
Emoto, K., Hu, Z., Kakehi, K., Matsuzaki, K. & Takeichi, M. (2010) Generators-of-generators library with optimization capabilities in Fortress. In Euro-Par 2010 - Parallel Processing, 16th International Euro-Par Conference, Ischia, Italy, August 31–September 3, 2010, Proceedings, Part, II, D’Ambra, P., Guarracino, M. R. & Talia, D. (eds), Lecture Notes in Computer Science, vol. 6272. Springer, pp. 2637.Google Scholar
Farzan, A. & Nicolet, V. (2017) Synthesis of divide and conquer parallelism for loops. In Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2017, Barcelona, Spain, June 18–23, 2017, Cohen, A. & Vechev, M. T. (eds). ACM, pp. 540555.CrossRefGoogle Scholar
Farzan, A. & Nicolet, V. (2019) Modular divide-and-conquer parallelization of nested loops. In Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2019, Phoenix, AZ, USA, June 22–26, 2019, McKinley, K. S. & Fisher, K. (eds). ACM, pp. 610624.CrossRefGoogle Scholar
Fedyukovich, G., Ahmad, M. B. S. & Bodk, R. (2017) Gradual synthesis for static parallelization of single-pass array-processing programs. In Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2017, Barcelona, Spain, June 18–23, 2017, Cohen, A. & Vechev, M. T. (eds). ACM, pp. 572585.CrossRefGoogle Scholar
Fisher, A. L. & Ghuloum, A. M. (1994) Parallelizing complex scans and reductions. In Proceedings of the ACM SIGPLAN’94 Conference on Programming Language Design and Implementation (PLDI), Orlando, Florida, June 20–24, 1994, pp. 135146.CrossRefGoogle Scholar
Fluet, M. & Pucella, R. (2006) Phantom types and subtyping. J. Funct. Program. 16(6), 751791.CrossRefGoogle Scholar
Fluet, M., Rainey, M., Reppy, J. H. & Shaw, A. (2008) Implicitly threaded parallelism in Manticore. J. Funct. Program. 20(5–6), 537576.CrossRefGoogle Scholar
Frigo, M., Halpern, P., Leiserson, C. E. & Lewin-Berlin, S. (2009) Reducers and other Cilk++ hyperobjects. In SPAA 2009: Proceedings of the 21st Annual ACM Symposium on Parallelism in Algorithms and Architectures, Calgary, Alberta, Canada, August 11–13, 2009, Meyer auf der Heide, F. & Bender, M. A., pp. 7990.CrossRefGoogle Scholar
Giorgi, J.-F. & Métayer, D. L. (1990) Continuation-based parallel implementation of functional programming languages. In Proceedings of the 1990 ACM Conference on LISP and Functional Programming, LFP 1990, Nice, France, 27–29 June 1990. ACM, pp. 209217.CrossRefGoogle Scholar
Gorlatch, S. (1999) Extracting and implementing list homomorphisms in parallel program development. Sci. Comput. Program. 33(1), 127.CrossRefGoogle Scholar
Henriksen, T., Serup, N. G. W., Elsman, M., Henglein, F. & Oancea, C. E. (2017) Futhark: Purely functional gpu-programming with nested parallelism and in-place array updates. In Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2017, Barcelona, Spain, June 18–23, 2017, Cohen, A. & Vechev, M. T. (eds). ACM, pp. 556571.CrossRefGoogle Scholar
Hu, Z., Iwasaki, H. & Takechi, M. (1997) Formal derivation of efficient parallel programs by construction of list homomorphisms. ACM Trans. Program. Lang. Syst. 19(3), 444461.CrossRefGoogle Scholar
Hu, Z., Takeichi, M. & Chin, W.-N. (1998) Parallelization in calculational forms. In POPL’98: Proceedings of the 25th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, January 19–21, 1998, San Diego, CA, USA. ACM, pp. 316328.CrossRefGoogle Scholar
Huet, G. P. (1997) The zipper. J. Funct. Program. 7(5), 549554.CrossRefGoogle Scholar
Imam, S. M. & Sarkar, V. (2014) Cooperative scheduling of parallel tasks with general synchronization patterns. In ECOOP 2014 - Object-Oriented Programming - 28th European Conference, Uppsala, Sweden, July 28–August 1, 2014. Proceedings, Jones, R. E. (ed), Lecture Notes in Computer Science, vol. 8586. Springer, pp. 618643.CrossRefGoogle Scholar
Jiang, P., Chen, L. & Agrawal, G. (2018) Revealing parallel scans and reductions in recurrences through function reconstruction. In Proceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques, PACT 2018, Limassol, Cyprus, November 01–04, 2018, Evripidou, S., Stenström, P. & O’Boyle, M. F. P. (eds). ACM, pp. 10:1–10:13.CrossRefGoogle Scholar
Jones, N. D. (1996) An introduction to partial evaluation. ACM Comput. Surv. 28(3), 480503.CrossRefGoogle Scholar
Keller, G., Chakravarty, M. M. T., Leshchinskiy, R., Jones, S. L. P. & Lippmeier, B. (2010) Regular, shape-polymorphic, parallel arrays in Haskell. In Proceeding of the 15th ACM SIGPLAN International Conference on Functional Programming, ICFP 2010, Baltimore, Maryland, USA, September 27–29, 2010, Hudak, P. & Weirich, S. (eds). ACM, pp. 261272.CrossRefGoogle Scholar
Kobayashi, N., Matsuda, K., Shinohara, A. & Yaguchi, K. (2012) Functional programs as compressed data. Higher-Order Symb. Comput. 25(1), 3984.CrossRefGoogle Scholar
Ladner, R. E. & Fischer, M. J. (1980) Parallel prefix computation. J. ACM 27(4), 831838.CrossRefGoogle Scholar
Launchbury, J. (1993) A natural semantics for lazy evaluation. In Conference Record of the Twentieth Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, Charleston, South Carolina, USA, January 1993, Deusen, M. S. V. & Lang, B. (eds). ACM, pp. 144154.Google Scholar
Lévy, J.-J. (1976) An algebraic interpretation of the lambda beta k-calculus; and an application of a labelled lambda-calculus. Theor. Comput. Sci. 2(1), 97114.CrossRefGoogle Scholar
Levy, P. B. (2003) Call-by-Push-Value: A Functional/Imperative Synthesis. Springer.CrossRefGoogle Scholar
Li, P., Marlow, S., Peyton Jones, S. L. & Tolmach, A. P. (2007) Lightweight concurrency primitives for GHC. In Proceedings of the ACM SIGPLAN Workshop on Haskell, Haskell 2007, Freiburg, Germany, September 30, 2007, Keller, G. (ed). ACM, pp. 107118.CrossRefGoogle Scholar
Marlow, S., Maier, P., Loidl, H.-W., Aswad, M. & Trinder, P. W. (2010) Seq no more: Better strategies for parallel Haskell. In Proceedings of the 3rd ACM SIGPLAN Symposium on Haskell, Haskell 2010, Baltimore, MD, USA, 30 September 2010, Gibbons, J. (ed). ACM, pp. 91102.CrossRefGoogle Scholar
Matsuzaki, K., Hu, Z., Kakehi, K. & Takeichi, M. (2005) Systematic derivation of tree contraction algorithms. Parallel Process. Lett. 15(3), 321336.CrossRefGoogle Scholar
Matsuzaki, K., Hu, Z. & Takeichi, M. (2006) Towards automatic parallelization of tree reductions in dynamic programming. In SPAA 2006: Proceedings of the 18th Annual ACM Symposium on Parallel Algorithms and Architectures, Cambridge, Massachusetts, USA, July 30–August 2, 2006, Gibbons, P. B. & Vishkin, U. (eds). ACM, pp. 3948.CrossRefGoogle Scholar
Minamide, Y. (1998) A functional representation of data structures with a hole. In POPL’98, Proceedings of the 25th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, San Diego, CA, USA, January 19–21, 1998, MacQueen, D. B. & Cardelli, L. (eds). ACM, pp. 7584.CrossRefGoogle Scholar
Morihata, A. (2019) Lambda calculus with algebraic simplification for reduction parallelization by equational reasoning. PACMPL 3(ICFP), 80:180:25.Google Scholar
Morihata, A. & Matsuzaki, K. (2010) Automatic parallelization of recursive functions using quantifier elimination. In Functional and Logic Programming, 10th International Symposium, FLOPS 2010, Sendai, Japan, April 19–21, 2010. Proceedings, Blume, M., Kobayashi, N. & Vidal, G. (eds), Lecture Notes in Computer Science, vol. 6009. Springer, pp. 321336.CrossRefGoogle Scholar
Morihata, A. & Matsuzaki, K. (2011) Balanced trees inhabiting functional parallel programming. In Proceeding of the 16th ACM SIGPLAN International Conference on Functional Programming, ICFP 2011, Tokyo, Japan, September 19–21, 2011, Chakravarty, M. M. T., Hu, Z. & Danvy, O. (eds). ACM, pp. 117128.CrossRefGoogle Scholar
Morihata, A., Matsuzaki, K., Hu, Z. & Takeichi, M. (2009) The third homomorphism theorem on trees: Downward & upward lead to divide-and-conquer. In Proceedings of the 36th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 2009, Savannah, Georgia, USA, January 21–23, 2009. ACM, pp. 177185.CrossRefGoogle Scholar
Morita, K., Morihata, A., Matsuzaki, K., Hu, Z. & Takeichi, M. (2007) Automatic inversion generates divide-and-conquer parallel programs. In Proceedings of the ACM SIGPLAN 2007 Conference on Programming Language Design and Implementation, San Diego, California, USA, June 10–13, 2007, Ferrante, J. & McKinley, K. S. (eds). ACM, pp. 146155.CrossRefGoogle Scholar
Nishimura, S. & Ohori, A. (1999) Parallel functional programming on recursively defined data via data-parallel recursion. J. Funct. Program. 9(4), 427462.CrossRefGoogle Scholar
Okada, M. (1989) Strong normalizability for the combined system of the typed lambda calculus and an arbitrary convergent term rewrite system. In Proceedings of the ACM-SIGSAM 1989 International Symposium on Symbolic and Algebraic Computation, ISSAC’89, Portland, Oregon, USA, July 17–19, 1989, Gonnet, G. H. (ed). ACM, pp. 357363.Google Scholar
Raychev, V., Musuvathi, M. & Mytkowicz, T. (2015) Parallelizing user-defined aggregations using symbolic execution. In Proceedings of the 25th Symposium on Operating Systems Principles, SOSP 2015, Monterey, CA, USA, October 4–7, 2015, Miller, E. L. & Hand, S. (eds). ACM, pp. 153167.CrossRefGoogle Scholar
Reid-Miller, M., Miller, G. L. & Modugno, F. (1993) List ranking and parallel tree contraction. In Synthesis of Parallel Algorithms, Chapter 3, Reif, J. H. (ed). Morgan Kaufmann Publishers.Google Scholar
Sato, S. & Iwasaki, H. (2011) Automatic parallelization via matrix multiplication. In Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2011, San Jose, CA, USA, June 4–8, 2011, Hall, M. W. & Padua, D. A. (eds). ACM, pp. 470479.CrossRefGoogle Scholar
Suganuma, T., Komatsu, H. & Nakatani, T. (1996) Detection and global optimization of reduction operations for distributed parallel machines. In ICS’96: Proceedings of the 1996 International Conference on Supercomputing, May 25–28, 1996, Philadelphia, PA, USA. ACM, pp. 1825.Google Scholar
Tannen, V. (1988) Combining algebra and higher-order types. In Proceedings of the Third Annual Symposium on Logic in Computer Science (LICS’88), Edinburgh, Scotland, UK, July 5–8, 1988. IEEE Computer Society, pp. 8290.Google Scholar
Tannen, V. & Gallier, J. H. (1991) Polymorphic rewriting conserves algebraic strong normalization. Theor. Comput. Sci. 83(1), 328.CrossRefGoogle Scholar
Terui, K. (2012). Semantic evaluation, intersection types and complexity of simply typed lambda calculus. In 23rd International Conference on Rewriting Techniques and Applications (RTA’12), RTA 2012, May 28–June 2, 2012, Nagoya, Japan, Tiwari, A. (ed), LIPIcs, vol. 15. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, pp. 323338.Google Scholar
Wand, M. (1999) Continuation-based multiprocessing. Higher-Order Symb. Comput. 12(3), 285299.Google Scholar
Xu, D. N., Khoo, S.-C. & Hu, Z. (2004) Ptype system: A featherweight parallelizability detector. In Programming Languages and Systems: Second Asian Symposium, APLAS 2004, Taipei, Taiwan, November 4–6, 2004. Proceedings, Chin, W.-N. (ed), Lecture Notes in Computer Science, vol. 3302. Springer, pp. 197212.Google Scholar
Submit a response

Discussions

No Discussions have been published for this article.