Skip to main content
Log in

Enhancing the Effectiveness of Inlining in Automatic Parallelization

  • Published:
International Journal of Parallel Programming Aims and scope Submit manuscript

Abstract

The emergence of multi-core architectures makes it essential for optimizing compilers to automatically extract parallelism for large scientific applications composed of many subroutines residing in different files. Inlining is a well-known technique which can be used to erase procedural boundaries and enable more aggressive loop parallelization. However, conventional inlining cannot be applied to external libraries where the source code is not available, and when overly applied, it can degrade the effectiveness of compiler optimizations due to excessive code complexity. This paper highlights some obstacles we encountered while applying conventional inlining combined with automatic loop parallelization using the Polaris optimizing compiler and presents a new approach, annotation-based inlining, to effectively overcome these obstacles. Our experimental results show that the annotation-based inlining approach can eliminate negative impact of conventional inlining while enhancing the effectiveness of interprocedural parallelization for a majority of applications from the PERFECT benchmark suite.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10
Fig. 11
Fig. 12
Fig. 13
Fig. 14
Fig. 15
Fig. 16
Fig. 17
Fig. 18
Fig. 19
Fig. 20

Similar content being viewed by others

References

  1. Allen, R., Kennedy, K.: Optimizing Compilers for Modern Architec- tures. Morgan Kaufmann, San Francisco (2001)

    Google Scholar 

  2. Arnold, M., Fink, S., Sarkar, V., Sweeney, P.F.: A comparative study of static and profile-based heuristics for inlining. SIGPLAN Not. 35(7), 52–64 (2000)

    Article  Google Scholar 

  3. Ashley, J. M.: The effectiveness of flow analysis for inlining. In: In Proceedings of the 1997 ACM SIGPLAN International Conference on Functional Programming, pp 99–111 (1997).

  4. Ayers, A., Schooler, R., Gottlieb, R.: Aggressive inlining. SIGPLAN Not. 32(5), 134–145 (1997)

    Article  Google Scholar 

  5. Balasundaram, V., Kennedy, K.: A technique for summarizing data access and its use in parallelism enhancing transformations. SIGPLAN Not. 24(7), 41–53 (1989)

    Article  Google Scholar 

  6. Blume, W., Eigenmann, R.: Performance analysis of parallelizing compilers on the perfect benchmarks programs. IEEE Trans. Parallel Distrib. Syst. 3, 643–656 (1992)

    Article  Google Scholar 

  7. Blume, W., Eigenmann, R., Faigin, K., Grout, J., Hoeflinger, J., Padua, D., Petersen, P., Pottenger, W., Rauchwerger, L., Tu, P., Weatherford, S.: Polaris: Improving the effectiveness of parallelizing compilers. In: In Languages and Compilers for Parallel Computing, pp. 141–154. Springer-Verlag (1994).

  8. Bondhugula, U., Hartono, A., Ramanujam, J.: A practical automatic polyhedral parallelizer and locality optimizer. In: In PLDI 08: Proceedings of the ACM SIGPLAN 2008 Conference on Programming Language Design and Implementation (2008).

  9. Calder, B., Grunwald, D.: Reducing indirect function call overhead in c++ programs. In: POPL ’94: Proceedings of the 21st ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pp. 397–408, ACM, New York, NY, USA, (1994).

  10. Cavazos, J., OBoyle, M. F. P.: Automatic tuning of inlining heuristics. In: In ACM/IEEE Conference on Supercomputing, p. 14 (2005).

  11. Chaki, S., Clarke, E., Groce, A.: Modular verification of software components in c. Trans. Softw. Eng. 1(8), 388–402 (2004).

  12. Chang, Y.-S., Lee, H.-J., Park, D.-S., Lee, I.-Y.: Interprocedural transformations for extracting maximum parallelism. In: ADVIS ’02: Proceedings of the Second International Conference on Advances in Information Systems, pp. 415–424, Springer-Verlag, London, UK, (2002).

  13. Cooper, K.D., Hall, M.W., Torczon, L.: Unexpected side effects of inline substitution: a case study. ACM Lett. Program. Lang. Syst. 1(1), 22–32 (1992)

    Article  Google Scholar 

  14. Dean, J., Chambers, C.: Towards better inlining decisions using inlining trials. In: In Proceedings of the 1994 ACM Conference on LISP and Functional Programming, pp. 273–282 (1994).

  15. Flanagan, C., Leino, K. R. M., Lillibridge, M., Nelson, G., Saxe, J. B.,Stata, R.: Extended static checking for java. In: Proceedings of the ACM SIGPLAN 2002 Conference on Programming Language Design and Implementation, PLDI ’02, pp. 234–245, ACM, New York, NY, USA (2002).

  16. Grant, B., Mock, M., Philipose, M., Chambers, C., Eggers, S.J.: Dyc: an expressive annotation-directed dynamic compiler for c. Theor. Comput. Sci. 248(1–2), 147–199 (2000)

    Article  Google Scholar 

  17. Grout, J. R.: Inline expansion for the polaris research compiler. Technical Report, (1995).

  18. Guyer, S.Z., Lin, C.: Broadway: A compiler for exploiting the domain-specific semantics of software libraries. Proc. IEEE Special Issue Prog. Gener. Optim. Adapt. 93(2), 342–357 (2005)

    Google Scholar 

  19. Hall, M., Kennedy, K., Kinley, K. S. M.: Interprocedural transformations for parallel code generation. In: In Proceedings of Supercomputing ’91, pp. 424–434 (1991).

  20. Hall, M. W., Mellor-Crummey, J. M., Carle, A., Rodrguez, R. G.: Fiat: A framework for interprocedural analysis and transformation. In: In Proceedings of the Sixth Workshop on Languages and Compilers for Parallel Computing, pp. 522–545. Springer-Verlag (1995).

  21. Hank, R. E., Mei, W., Hwu, W., Rau, B. R.: Region-based compilation: An introduction and motivation. In: In Proceedings of the 28th Annual International Symposium on Microarchitecture, pp. 158–168 (1995).

  22. Hoeflinger, J.P., Paek, Y., Yi, K.: Unified interprocedural parallelism detection. Int. J. Parallel Program. 29(2), 185–215 (2001)

    Article  Google Scholar 

  23. Jagannathan, S., Wright, A.: Flow-directed inlining. In: In Proceedings of the ACM Conference on Programming Language Design and Implementation, pp. 193–205 (1996).

  24. Liu, Y., Zhaoqing, Y. L., Qiao, R., Ching Ju, R. D.: A region- based compilation infrastructure. In: Proceefings of the 7th Workshop on Interaction between Compilers and Computer Architectures, pp. 75–84. IEEE Computer Society Press (2003).

  25. Psarris, K., Kyriakopoulos, K.: The impact of data dependence analysis on compilation and program parallelization. In: ICS ’03: Proceedings of the 17th Annual International Conference on Supercomputing, pp. 205–214, ACM, New York, NY, USA (2003).

  26. Ramaswamy, S., Sapatnekar, S., Banerjee, P.: A framework for exploiting data and functional parallelism on distributed memory multicomputers. IEEE Trans. Parallel Distrib. Syst. 8(11), 1098–1116 (1997). https://doi.org/10.1109/71.642945

  27. Saavedra, R.H., Smith, A.J.: Analysis of benchmark characteristics and benchmark performance prediction. ACM Trans. Comput. Syst. 14, 344–384 (1996)

    Article  Google Scholar 

  28. Seidl, H., Steffen, B.: Constraint-based inter-procedural analysis of parallel programs. Nordic J. of Computing 7(4), 375–400 (2000)

    MathSciNet  MATH  Google Scholar 

  29. Shenoy, U.N., Srikant, Y.N., Bhatkar, V.P.: An automatic parallelization framework for multicomputers. Comput. Lang. 20(3), 135–150 (1994)

    Article  Google Scholar 

  30. Shirako, J., Nagasawa, K., Ishizaka, K., Obata, M., Kasahara, H.: Selective inline expansion for improvement of multi grain parallelism. In: Parallel and Distributed Computing and Networks, pp. 476–482 (2004).

  31. Triantafyllis, S., Bridges, M. J., Raman, E., Ottoni, G., August, D. I.: A framework for unrestricted whole-program optimization. In: In ACM SIGPLAN 2006 Conference on Programming Language Design and Implementation, pp. 61–71 (2006).

  32. Waddell, O., Dybig, R. K.: Fast and effective procedure inlining. In: SAS ’97: Proceedings of the 4th International Symposium on Static Analysis, pp. 35–52, Springer-Verlag, London, UK (1997).

  33. Wolfe, M.J.: High Performance Compilers for Parallel Computing. Addison-Wesley Longman Publishing Co. Inc, Boston, MA, USA (1995)

    MATH  Google Scholar 

  34. Wu, P., Midkiff, S. P., Moreira, J. E., Gupta, M.: Improving Java performance through semantic inlining. In: Proceedings of the Ninth SIAM Conference on Parallel Processing for Scientific Computing (1999).

  35. Wu, P., Moreira, J. E., Midkiff, S. P., Gupta, M., Padua, D. A.: Semantic inlining - the compiler support for java in technical computing. In: PPSC (1999).

  36. Zhao, P., Amaral, J. N.: To inline or not to inline? enhanced inlining decisions. In: In Workshop on Languages and Compilers for Parallel Computing (LCPC), pp. 405–419 (2003).

  37. Zhao, P., Amaral, J.N.: Ablego: a function outlining and partial inlining framework. Softw. Pract. Exper. 37(5), 465–491 (2007)

    Article  Google Scholar 

  38. Aleksandar, P., et al.: An optimization-driven incremental inline substitution algorithm for just-in-time compilers.In: 2019 IEEE/ACM International Symposium on Code Generation and Optimization (CGO). IEEE (2019).

  39. Häubl, C., Wimmer, C., Mössenböck, H.: Context-sensitive Trace Inlining for Java. Comput. Lang. Syst. Struct. 39(4), 123–141 (2013). https://doi.org/10.1016/j.cl.2013.04.002

    Article  Google Scholar 

  40. Cammarota, R., Nicolau, A., Veidenbaum, A.V., Kejariwal, A., Donato, D., Madhugiri, M.: On the determination of inlining vectors for program optimization. In: Proceedings of the 22nd International Conference on Compiler Construction (CC’13). Springer-Verlag, Berlin, Heidelberg, pp. 164–183. https://doi.org/10.1007/978-3-642-37051-9_9

  41. Simon, D., Cavazos, J., Wimmer, C., Kulkarni, S.: Automatic construction of inlining heuristics using machine learning. In: Proceedings of the 2013 IEEE/ACM International Symposium on Code Generation and Optimization (CGO ’13). IEEE Computer Society, Washington, DC, pp. 1–12. https://doi.org/10.1109/CGO.2013.6495004

  42. Stucki, N., Biboudis, A., Doeraene, S., Odersky, M.: Semantics-preserving inlining for metaprogramming. In: Proceedings of the 11th ACM SIGPLAN International Symposium on Scala (SCALA 2020). ACM, New York, NY, USA, pp. 14–24. https://doi.org/10.1145/3426426.3428486

  43. Norris, H. B., Sadayappan, P.: Annotation-based empirical performance tuning using Orio, In: 2009 IEEE International Symposium on Parallel & Distributed Processing, Rome, Italy, pp. 1-11 (2009). https://doi.org/10.1109/IPDPS.2009.5161004

  44. Papadopoulos, I., Thomas, N., Fidel, A., Amato, N. M., Rauchwerger, L.: STAPL-RTS: An application driven runtime system. In: Proceedings of the 29th ACM on International Conference on Supercomputing (ICS '15). ACM, New York, NY, USA, pp. 425–434. https://doi.org/10.1145/2751205.2751233

  45. Yi, Q., Wang, Q., Cui, H.: Specializing compiler optimizations through programmable composition for dense matrix computations. In: Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-47). IEEE Computer Society, USA, pp. 596–608. https://doi.org/10.1109/MICRO.2014.14

  46. Nguyen, T., Cicotti, P., Bylaska, E., Quinlan, D., Baden, S.B.: Bamboo: translating MPI applications to a latency-tolerant, data-driven form. In: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (SC '12). IEEE Computer Society Press, Washington, DC, USA, Article 39, pp. 1–11.

  47. Ali, K., Lhoták, O.: Application-only call graph construction. In: Proceedings of the European Conference on Object-Oriented Programming (ECOOP 2012). Lecture Notes in Computer Science, Vol 7313. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31057-7_30

Download references

Funding

Funding was provided by National Science Foundation (Grant Number CCF-1910488).

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Kleanthis Psarris.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Guo, J., Yi, Q. & Psarris, K. Enhancing the Effectiveness of Inlining in Automatic Parallelization. Int J Parallel Prog 50, 65–88 (2022). https://doi.org/10.1007/s10766-021-00722-1

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10766-021-00722-1

Keywords

Navigation