Reproducibility, energy efficiency and performance of pseudorandom number generators in machine learning: a comparative study of python, numpy, tensorflow, and pytorch implementations

Antunes, Benjamin; Hill, David R. C

Computer Science > Mathematical Software

arXiv:2401.17345 (cs)

[Submitted on 30 Jan 2024 (v1), last revised 10 Feb 2024 (this version, v2)]

Title:Reproducibility, energy efficiency and performance of pseudorandom number generators in machine learning: a comparative study of python, numpy, tensorflow, and pytorch implementations

Authors:Benjamin Antunes, David R.C Hill

View PDF

Abstract:Pseudo-Random Number Generators (PRNGs) have become ubiquitous in machine learning technologies because they are interesting for numerous methods. The field of machine learning holds the potential for substantial advancements across various domains, as exemplified by recent breakthroughs in Large Language Models (LLMs). However, despite the growing interest, persistent concerns include issues related to reproducibility and energy consumption. Reproducibility is crucial for robust scientific inquiry and explainability, while energy efficiency underscores the imperative to conserve finite global resources. This study delves into the investigation of whether the leading Pseudo-Random Number Generators (PRNGs) employed in machine learning languages, libraries, and frameworks uphold statistical quality and numerical reproducibility when compared to the original C implementation of the respective PRNG algorithms. Additionally, we aim to evaluate the time efficiency and energy consumption of various implementations. Our experiments encompass Python, NumPy, TensorFlow, and PyTorch, utilizing the Mersenne Twister, PCG, and Philox algorithms. Remarkably, we verified that the temporal performance of machine learning technologies closely aligns with that of C-based implementations, with instances of achieving even superior performances. On the other hand, it is noteworthy that ML technologies consumed only 10% more energy than their C-implementation counterparts. However, while statistical quality was found to be comparable, achieving numerical reproducibility across different platforms for identical seeds and algorithms was not achieved.

Comments:	20 pages, 10 tables, 1 figure
Subjects:	Mathematical Software (cs.MS); Machine Learning (cs.LG)
Cite as:	arXiv:2401.17345 [cs.MS]
	(or arXiv:2401.17345v2 [cs.MS] for this version)
	https://doi.org/10.48550/arXiv.2401.17345

Submission history

From: Benjamin Antunes [view email]
[v1] Tue, 30 Jan 2024 15:44:14 UTC (729 KB)
[v2] Sat, 10 Feb 2024 12:09:18 UTC (727 KB)

Computer Science > Mathematical Software

Title:Reproducibility, energy efficiency and performance of pseudorandom number generators in machine learning: a comparative study of python, numpy, tensorflow, and pytorch implementations

Submission history

Access Paper:

References & Citations

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators

Computer Science > Mathematical Software

Title:Reproducibility, energy efficiency and performance of pseudorandom number generators in machine learning: a comparative study of python, numpy, tensorflow, and pytorch implementations

Submission history

Access Paper:

References & Citations

BibTeX formatted citation

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators