
Predicting Stellar Masses of the First Galaxies Using Graph Neural Networks


Published April 2024. © 2024. The Author(s). Published by the American Astronomical Society.

Citation: Vincent A. Horvath et al. 2024 Res. Notes AAS 8 108. DOI: 10.3847/2515-5172/ad40ad


Abstract

Theoretical models of galaxy formation and evolution are primarily investigated through cosmological simulations and semi-analytical models. The former consume $\mathcal{O}(10^6)$ core-hours explicitly modeling the dynamics of galaxies, whereas the latter require only $\mathcal{O}(10^3)$ core-hours, forgoing direct simulation of internal structure for computational efficiency. In this work, we present a proof-of-concept machine learning regression model, using a graph neural network architecture, that predicts the stellar masses of high-redshift galaxies solely from their dark matter merger trees, trained on a radiation hydrodynamics cosmological simulation of the first galaxies.


Original content from this work may be used under the terms of the Creative Commons Attribution 4.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.

1. Introduction

Traditional methods for estimating galactic properties require substantial amounts of compute time. The most detailed and expensive approaches use cosmological simulations that solve the governing dynamical equations throughout the simulation domain and resolve individual star-forming regions. Less computationally expensive semi-analytical models (SAMs) build galaxies by applying analytic and empirical prescriptions for baryonic physics within merger trees of dark matter (DM) halos, sampled from either N-body simulations or the extended Press–Schechter formalism. Results from both approaches have uncovered a detailed picture of galaxy formation and evolution (Somerville & Davé 2015) but require a non-negligible amount of computing power. We aim to provide a rapid alternative to these methods by utilizing their outputs to train a regression model that predicts stellar masses of high-redshift galaxies solely from their DM merger trees.

To achieve this vision, we utilize machine learning techniques, namely graph neural networks (GNNs). Previous work has found success with this approach, achieving rms errors up to a factor of two smaller than other methods while being four orders of magnitude faster than SAMs (Jespersen et al. 2022, hereafter "Mangrove"). Our approach differs in that we utilize "truth" data generated from cosmological simulations run with the adaptive mesh refinement code Enzo (Brummel-Smith et al. 2019), whereas Mangrove uses data from the Santa Cruz SAM (Somerville & Primack 1999). Mangrove also targets the low-redshift universe (z < 2), while our study focuses on z ≳ 9 galaxies.

2. Methods

We sample our galactic data from a cosmological radiation hydrodynamics simulation of the first stars and galaxies (Skinner & Wise 2020). We generate DM halo merger trees using a combination of Rockstar and Consistent Trees (Behroozi et al. 2012a, 2012b). We supplement the halo data in the merger trees with their metal-enriched stellar masses $M_*$. We then combine this stellar information with the merger trees into a graph using the PyTorch Geometric library (Hamilton et al. 2017; Fey & Lenssen 2019), with edges from parent halo (snapshot n) to child halo (snapshot n + 1). If two halos merge, both are parents of the new child halo. Our data set consists of 11,537 merger trees (graphs), where each node contains the galaxy's redshift, stellar mass, DM mass, virial radius, and position.
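To make the graph construction concrete, the following minimal sketch encodes a three-halo merger tree as a PyTorch Geometric `Data` object. The feature values, column ordering, and halo indices are illustrative assumptions, not values from our data set.

```python
import torch
from torch_geometric.data import Data

# Hypothetical node features, one row per halo:
# [redshift, log10 stellar mass, log10 DM mass, virial radius, x, y, z]
x = torch.tensor([
    [12.0, 2.8, 6.1, 0.21, 0.45, 0.52, 0.48],  # halo 0, snapshot n
    [12.0, 0.0, 5.9, 0.18, 0.47, 0.51, 0.49],  # halo 1, snapshot n
    [11.5, 3.1, 6.4, 0.25, 0.46, 0.52, 0.48],  # halo 2, snapshot n + 1
], dtype=torch.float)

# Directed edges run from parent halo to child halo; because halos 0
# and 1 merge, both are parents of halo 2.
edge_index = torch.tensor([[0, 1],   # sources (parents)
                           [2, 2]],  # targets (children)
                          dtype=torch.long)

tree = Data(x=x, edge_index=edge_index)
```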

Because halos grow exponentially, we transform the masses into log-10 values, allowing growth to be linearly regressed by our model. We select low-mass galaxies with halo mass $M_{\rm h} > 10^5\,M_\odot$ and $0 < M_*/M_\odot < 10^5$. This leaves 58 merger trees, approximately 0.5% of all merger trees from the input simulation.
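A sketch of these cuts and the log transform, assuming (hypothetically) that each graph stores DM mass in `x[:, 0]` and stellar mass in `y`, both in solar masses, with the final halo last, and that the cuts apply to the final halo:

```python
import torch

def select_and_log(trees):
    # Keep trees whose final halo passes the cuts above, then take
    # log10 of the masses so growth can be regressed linearly.
    kept = []
    for tree in trees:
        m_halo = tree.x[-1, 0].item()   # final-halo DM mass (M_sun)
        m_star = tree.y[-1].item()      # final-halo stellar mass (M_sun)
        if m_halo > 1e5 and 0.0 < m_star < 1e5:
            tree.x[:, 0] = torch.log10(tree.x[:, 0])
            # Clamp starless halos to 1 M_sun (log10 = 0) to avoid -inf;
            # this floor is our assumption, not stated in the text.
            tree.y = torch.log10(tree.y.clamp(min=1.0))
            kept.append(tree)
    return kept
```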

Before training our model, we select DM mass and redshift as our input features and stellar mass as the output. We then split our data set by shuffling and randomly selecting 20% of the merger trees for the test split; the remaining 80% are used for training.
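For instance, with `selected_trees` holding the 58 graphs returned by the sketch above, the split might look like:

```python
import random

random.seed(0)                   # assumed seed; none is stated in the text
random.shuffle(selected_trees)
n_test = int(0.2 * len(selected_trees))
test_trees = selected_trees[:n_test]
train_trees = selected_trees[n_test:]
```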

Our model uses the GraphSAGE convolution operator (SAGEConv) in the PyTorch Geometric library, which aggregates feature information from each node's neighbors. The output of each node is computed by the learnable function

$$\mathbf{x}_i' = \mathbf{W}_1 \mathbf{x}_i + \mathbf{W}_2 \cdot \underset{j \in \mathcal{N}(i)}{\mathrm{mean}}\, \mathbf{x}_j \qquad (1)$$

where $\mathbf{x}_i$ represents the features of the $i$th node, and $\mathcal{N}(i)$ represents its neighbors. In this context, each halo produces an output based on its own features and those of its immediate ancestors. Chaining multiple SAGEConv layers together allows nodes to obtain information from a wider neighborhood (ancestors-of-ancestors) in the tree.

To regress a relationship between the input features and stellar mass, we create a GNN using this convolutional operator. Our model's architecture consists of five hidden SAGEConv layers expanding to 8 hidden features, each followed by a ReLU activation layer and a batch normalization layer. These convolutional layers are followed by a single linear layer that reduces our output to a single feature, the predicted stellar mass.
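A minimal sketch of this architecture, assuming the two input features (DM mass and redshift) arrive as the columns of `x` and that the target is regressed per node:

```python
import torch
import torch.nn as nn
from torch_geometric.nn import SAGEConv

class StellarMassGNN(nn.Module):
    # Five SAGEConv layers with 8 hidden features, each followed by
    # ReLU and batch normalization, then a linear readout to the
    # predicted log10 stellar mass.
    def __init__(self, in_features=2, hidden=8, n_layers=5):
        super().__init__()
        self.convs = nn.ModuleList()
        self.norms = nn.ModuleList()
        for i in range(n_layers):
            self.convs.append(SAGEConv(in_features if i == 0 else hidden, hidden))
            self.norms.append(nn.BatchNorm1d(hidden))
        self.readout = nn.Linear(hidden, 1)

    def forward(self, x, edge_index):
        for conv, norm in zip(self.convs, self.norms):
            x = norm(torch.relu(conv(x, edge_index)))
        return self.readout(x)
```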

We train our model for 100 epochs with a batch size of 8 merger trees, using the mean squared error as our loss function. We use the Adam optimizer (Kingma & Ba 2014) with a learning rate of 0.05 and L2 regularization of $10^{-4}$. The model and workflow can be viewed on Zenodo: 10.5281/zenodo.10939379.
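Under the same assumptions (per-node targets in each graph's `y`, with `StellarMassGNN` and `train_trees` from the sketches above), a training loop matching these hyperparameters might look like:

```python
import torch
import torch.nn as nn
from torch_geometric.loader import DataLoader

model = StellarMassGNN()
optimizer = torch.optim.Adam(model.parameters(), lr=0.05, weight_decay=1e-4)
loss_fn = nn.MSELoss()
loader = DataLoader(train_trees, batch_size=8, shuffle=True)

for epoch in range(100):
    for batch in loader:                  # batches of 8 merger trees
        optimizer.zero_grad()
        pred = model(batch.x, batch.edge_index).squeeze(-1)
        loss = loss_fn(pred, batch.y)     # MSE on log10 stellar mass
        loss.backward()
        optimizer.step()
```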

3. Results

Like Mangrove (Jespersen et al. 2022), we evaluated our model using the standard deviation of the residuals

$$\sigma = \sqrt{\frac{1}{N_{\mathrm{test}}} \sum_{i=1}^{N_{\mathrm{test}}} \left(\Delta y_i - \overline{\Delta y}\right)^2} \qquad (2)$$

where $N_{\mathrm{test}}$ is the number of data points in the test set, and $\Delta y \equiv y - \hat{y}$ is the residual of a prediction. Because the scatter is heavily affected by outliers, we also consider the Pearson correlation coefficient

$$\rho = \frac{\mathrm{cov}(y, \hat{y})}{\sigma_y \, \sigma_{\hat{y}}} \qquad (3)$$

where cov is the covariance, and the coefficient of determination

$$R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}. \qquad (4)$$
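A sketch of how these statistics might be computed from test-set predictions; we take the bias to be the mean residual, an assumption since the text does not define it:

```python
import numpy as np
from scipy.stats import pearsonr

def evaluate(y_true, y_pred):
    # y_true, y_pred: arrays of log10 stellar masses over the test set.
    resid = y_true - y_pred
    scatter = np.std(resid)              # Eq. (2): std of the residuals
    bias = np.mean(resid)                # assumed definition of the bias
    rho = pearsonr(y_true, y_pred)[0]    # Eq. (3): Pearson correlation
    r2 = 1.0 - np.sum(resid**2) / np.sum((y_true - y_true.mean())**2)  # Eq. (4)
    return scatter, bias, rho, r2
```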

Our model obtained a scatter of 0.362 dex and a bias of −0.0589 dex, both significantly worse than Mangrove's results (Jespersen et al. 2022). We obtained a Pearson correlation coefficient of $\rho = 0.425$ and a coefficient of determination of $R^2 = 0.158$, implying that our model struggled to regress a strong relationship between the DM merger tree inputs and the stellar mass values. Figure 1 depicts the model performance.

Figure 1. Stellar masses from the Skinner & Wise (2020) simulation (left) and our GNN (middle) vs. simulated DM halo mass. The right panel compares our GNN-predicted results with the simulated masses.

4. Discussion

We attribute the relatively poor performance to three factors in this proof-of-concept study. First, our model targets very high redshifts (z ≳ 9). Because of their bursty nature and sensitivity to prior star formation and feedback, these young, low-mass galaxies are likely more difficult to regress without a significantly larger training data set with additional input features.

Second, the majority of our halos have stellar masses of $\log_{10}(M_*/M_\odot) = 2.5$–$3.5$. After a small initial burst, these galaxies may undergo a larger star formation event, jumping between two stellar masses within a few million years (Hazlett et al. 2024). These sharp gradients in stellar mass with respect to time and DM mass induce difficulties in our graph structure. Because our simulation volume is not large enough to adequately sample these galaxies, it is a challenge to provide a balanced input data set to train a reliable model for all masses. In future work, we recommend a dynamic sampling technique that constructs a balanced data set by including merger trees with higher or lower stellar mass distributions based on the data set's current distribution.
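One hypothetical realization of such a sampler, sketched under the same assumption that each graph stores its final log10 stellar mass in `y[-1]`:

```python
import numpy as np

def balanced_sample(trees, n_bins=10, seed=0):
    # Greedily accept merger trees whose final stellar mass falls in a
    # bin at or below the mean occupancy, flattening the stellar-mass
    # distribution of the training set.
    rng = np.random.default_rng(seed)
    masses = np.array([t.y[-1].item() for t in trees])
    bins = np.linspace(masses.min(), masses.max(), n_bins + 1)
    counts = np.zeros(n_bins)
    selected = []
    for i in rng.permutation(len(trees)):
        b = int(np.clip(np.searchsorted(bins, masses[i]) - 1, 0, n_bins - 1))
        if counts[b] <= counts.mean():
            counts[b] += 1
            selected.append(trees[i])
    return selected
```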

Finally, the Skinner & Wise (2020) simulation formed 896 metal-enriched star particles with a total mass of $3.4 \times 10^6\,M_\odot$, whose masses were captured in 32,026 unique nodes distributed over 918 outputs. After splitting the data, only 3694 halos were available to our test set. Since the majority of DM halos in the data set contain no stellar mass, our data are sparse compared to previous works. Jespersen et al. (2022) analyzed over 100,000 merger trees, with up to $2 \times 10^4$ halos per tree after pruning. Depending on the composition of their data set, the Mangrove team may have utilized up to tens of thousands of times more halos than we did. Acquiring more data from early-universe simulations will be critical to the future success of these deep learning applications.

Acknowledgments

This work is supported by NSF grant AST-2108020 and NASA grants 80NSSC20K0520 and 80NSSC21K1053. S.S.S. is supported by the NASA FINESST fellowship award 80NSSC22K1589.
