Lambda calculus with algebraic simplification for reduction parallelisation: Extended study

AKIMASA MORIHATA

doi:10.1017/S0956796821000058


Part of: ICFP 2019

Published online by Cambridge University Press: 05 April 2021


Affiliation: University of Tokyo, 3-8-1, Komaba, Meguro-ku, Tokyo, Japan (e-mail: morihata@graco.c.u-tokyo.ac.jp)


Abstract


Parallel reduction is a major component of parallel programming and widely used for summarisation and aggregation. It is not well understood, however, what sorts of non-trivial summarisations can be implemented as parallel reductions. This paper develops a calculus named λAS, a simply typed lambda calculus with algebraic simplification. This calculus provides a foundation for studying a parallelisation of complex reductions by equational reasoning. Its key feature is δ abstraction. A δ abstraction is observationally equivalent to the standard λ abstraction, but its body is simplified before the arrival of its arguments using algebraic properties such as associativity and commutativity. In addition, the type system of λAS guarantees that simplifications due to δ abstractions do not lead to serious overheads. The usefulness of λAS is demonstrated on examples of developing complex parallel reductions, including those containing more than one reduction operator, loops with conditional jumps, prefix sum patterns and even tree manipulations.
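The idea of simplifying a function body before its argument arrives can be illustrated outside the calculus. The following is a minimal Python sketch, not λAS itself and not the paper's notation: it assumes a loop whose every step has the form x -> a*x + b, and all names (`compose`, `apply_affine`, `tree_reduce`, `pairs`) are hypothetical. Instead of building a chain of closures, each composed step is simplified immediately to a pair of coefficients using distributivity and associativity, so the composition stays constant in size and the list of steps can be reduced divide-and-conquer style.

```python
# Each loop step x -> a*x + b is stored in simplified form as the pair (a, b).
# This representation is an assumption made for illustration only.

def compose(f, g):
    """Simplified form of 'apply f first, then g'."""
    a1, b1 = f
    a2, b2 = g
    # g(f(x)) = a2*(a1*x + b1) + b2 = (a2*a1)*x + (a2*b1 + b2)
    return (a2 * a1, a2 * b1 + b2)

def apply_affine(f, x):
    """Supply the argument to an already simplified step."""
    a, b = f
    return a * x + b

def sequential(pairs, x0):
    """Reference loop: inherently sequential as written."""
    x = x0
    for a, b in pairs:
        x = a * x + b
    return x

def tree_reduce(fs):
    """Divide-and-conquer reduction with compose.

    The two recursive calls are independent, so they could run in
    parallel; here they run one after the other for simplicity.
    """
    if not fs:
        return (1, 0)  # identity step: x -> 1*x + 0
    if len(fs) == 1:
        return fs[0]
    mid = len(fs) // 2
    return compose(tree_reduce(fs[:mid]), tree_reduce(fs[mid:]))

pairs = [(2, 1), (3, 4), (5, 6), (7, 8)]
summary = tree_reduce(pairs)          # computed before x0 is known
result = apply_affine(summary, 1)     # same value as sequential(pairs, 1)
```

Because `compose` is associative, any grouping of the steps gives the same summary, which is what permits the tree-shaped evaluation. The sketch covers only this one algebraic pattern; the calculus described in the abstract addresses a broader class of reductions and additionally bounds the cost of simplification through its type system.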

Type: Research Article
Information: Journal of Functional Programming, Volume 31, 2021, e7

DOI: https://doi.org/10.1017/S0956796821000058
Creative Commons: This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited.
Copyright: © The Author(s), 2021. Published by Cambridge University Press

