DEDICATION

We dedicate this work to the cherished memory of Vladimir P. Skulachev – our esteemed teacher, mentor, colleague, and extraordinary scientist. His profound influence extended beyond the confines of academia, as he played a pivotal role in orchestrating scientific endeavors. Academician Skulachev is renowned for his groundbreaking contributions to the fields of cell bioenergetics and mitochondrial function. His unwavering passion has, in recent years, also encompassed the biology of aging, yielding many remarkable discoveries. Academician Skulachev’s legacy is not only defined by his scientific achievements but also by the enduring impression he left through his brilliance, wisdom, kindness, and remarkable character. In our hearts, he will forever occupy a special place, his memory serving as an inspiration for current and future generations.

INTRODUCTION

Aging is the major risk factor for multiple chronic diseases and, therefore, poses a global medical problem [1]. Understanding the molecular basis and inter-species differences of the aging process is thus a crucial step towards developing effective geroprotectors and overcoming the difficulties aging brings to our society. The systems approach to characterize aging has gained popularity in the recent years with the advent of high-throughput sequencing and due to its comprehensiveness, resulting in a plethora of omics data available and a great demand for its aggregation into databases and meta-analysis.

To date, a number of databases containing aging genomics and transcriptomics data exist, including Aging Atlas [2], HAGR [3], Digital Ageing Atlas [4], AGEMAP [5], and Open Genes [6]. Most of these databases accommodate plenty of information in addition to aging transcriptomics, e.g., lifespan-extending interventions, aging phenotype, and epigenetics, but when it comes to gene expression, the information is mostly a collection of age-related differentially expressed genes identified from different studies. To the best of our knowledge, there are currently no databases that would present meta-analysis-derived quantitative general trends of gene expression changes with age, including species-specific, tissue-specific, and global trends, allowing easy comparison and visualization for each gene at the level of individual datasets and individual samples within each dataset. To fill this gap, we developed the AgeMeta database that provides an interactive visual interface to investigate these biomarkers in particular tissues and mammalian species as well as cellular functions enriched in the transcriptomic aging signatures, both at the level of general trends and individual datasets. To facilitate the use of our resource, we developed a comprehensive manual for each section of the database and made the database contents downloadable.

MATERIALS AND METHODS

The full list of datasets used to construct the transcriptomic signatures of mammalian aging integrated in AgeMeta included the following study IDs from GEO [7], ArrayExpress [8], and SRA [9] databases: GSE9103, GSE123981, GSE3150, GSE6591, GSE74463, GSE53960, GSE66715, GSE11291, GSE34378, GSE27625, GSE12480, GSE36192, GSE1572, GSE28422, GSE25941, GSE53890, GSE38718, GSE674, GSE17612, GSE21935, GSE362, GSE132040, E-MTAB-3374, PRJNA281127, PRJNA516151. In addition to these studies, we also used data from GTEx [10]. The datasets contained RNA-Seq and microarray samples obtained from different tissues of Mus musculus, Rattus norvegicus, and Homo sapiens. The tissue content varied across species; 23 tissues used across all three species included adipose tissue [in certain cases labeled specifically as brown adipose tissue (BAT), mesenteric adipose tissue (MAT), gonadal adipose tissue (GAT) and subcutaneous adipose tissue (SCAT)], adrenal gland, blood vessel, bone and bone marrow, brain (in certain cases olfactory bulbs were labeled separately as OB and cerebellum and frontal cortex were labeled separately), esophagus, heart, kidney, liver, lung, muscle, nerve, pancreas, pituitary, prostate, salivary gland, skin, small intestine, spleen, testis, thyroid and whole blood (in some cases white blood cells (WBC) were labeled separately). An annotation of every dataset used in this study, that includes information about the number of samples, age range, strain, tissue, and sex, can be found in the supplementary materials (Table S1 in the Online Resource 1).

The data preprocessing pipelines for RNA-Seq and microarray datasets were described in detail in [11]. For both types of data, all platform IDs of genes were standardized to Entrez IDs and mapped to mouse one-to-one orthologs (in the case of rat and human samples). These two preprocessing steps resulted in some genes being discarded from the data. Genes that were not detected in more than 40% of the datasets used to build a particular signature were also filtered out prior to the corresponding meta-analysis. Consequently, if a given gene has no entry in AgeMeta, this could have been caused by either of the above-stated data processing steps.

The gene expression log fold changes (logFC) utilized to construct the signatures of aging were calculated as the slope coefficient of a linear model when the sample ages included more than two unique values and as the difference in means otherwise, using the limma R package [12]. Mixed-effect model was utilized to aggregate the expression changes from all datasets into aging signatures with the metafor R package [13]. The logFCs and their respective standard errors were used as an input, and study ID, tissue, and species were introduced into the model as random terms. Gene set enrichment analysis was performed using the fgsea R package [14] with 10,000 permutations for the individual datasets and the signatures of aging. The AgeMeta website was created using the shiny R package [15].

DATABASE CONTENT

The contents of the database are based on the 7 transcriptomic signatures of aging identified in our recent study [11]. The transcriptomic signatures represent general trends of gene expression changes with age and were constructed by the meta-analysis of 122 datasets from 26 studies (table). Tissue-specific signatures of brain, skeletal muscle, and liver were derived from the corresponding human, mouse, and rat datasets, while species-specific signatures summarize the data across multiple different tissues of a certain species. The global signature encompasses all of the data from the three species and 23 tissues (see “Materials and Methods” section).

Overview of the 7 transcriptomic signatures of aging

Criterion

Human

Mouse

Rat

Brain

Muscle

Liver

Global

Number of datasets

51

52

19

25

17

11

122

Total number of genes

15,959

13,459

10,382

16,125

15,339

12,805

15,170

Number of statistically significant age-associated genes (p adjusted < 0.05)

649

184

501

1526

379

1219

7

The database consists of 5 major modules (Fig. 1). The first one is the comparative signature interface, which provides aggregated quantitative estimates of the age-related expression changes (normalized logFC) in a particular tissue and species along with the corresponding metrics of statistical significance (p-value before and after the Benjamini–Hochberg adjustment [16]) for every gene. The user can choose the main signature to work with, and sort and filter genes according to their significance in the chosen signature. One can easily compare the differential expression values of chosen genes and their significance between all 7 signatures of aging using the right part of the interactive table, where the colored arrows next to logFC values illustrate the direction of gene expression changes and the cell hues show statistical significance (Fig. 2). The side menu allows the user to filter genes based on their presence and significance in all 7 signatures and on whether they are up- or downregulated with age according to each of the signatures.

Fig. 1.
figure 1

The structure of the AgeMeta database. AgeMeta provides a visual interface for the 7 transcriptomic signatures of aging, consisting of two interactive tables which display quantitative age-related transcriptomic changes for individual genes (the comparative signature interface) and functional groups of genes (the functional signature interface) and allow comparison between the signatures; as well as three types of plots depicting the differential expression of a gene in a signature at the dataset level (the mixed-effects model plot), the change of expression with age for a gene within an individual dataset (the dataset expression plot), and enrichment scores of a functional group of genes for individual datasets associated with a certain signature (the functional dataset plot). Clicking the table cells navigates the user to the corresponding plots, while clicking the points in the mixed-effects model plot directs the user to the dataset expression plot related to the clicked dataset. Both tables can be downloaded in a separate section of the database.

Fig. 2.
figure 2

The comparative signature interface table. The arrow colors represent the direction of gene expression changes in a given signature (the signatures are labeled in the column names), the cell hues represent the statistical significance of those changes. Adjusted p-value less than the soft threshold results in a dull hue, adjusted p-value less than the strict threshold results in a brighter hue. Both thresholds can be set in the menu on the left-hand side of the page. All cells in the right part of the table with the signature logFCs are clickable. Clicking them transfers the user to the corresponding mixed-effects model plot.

Should one wish to inspect the origin of each logFC value in the interactive table, one can click the table cell with the value, which would direct them to the mixed-effects model plot corresponding to the gene and signature represented by the clicked cell (Fig. 3a). The mixed-effects model was used to aggregate differential expression data from individual datasets into signatures (see “Materials and Methods” section). In the plot, each point stands for a dataset, namely a data point with a unique combination of species, tissue, sex, strain, and study. The dataset specifications are displayed for each point on mouse hover. The red line shows the aggregated average logFC value for a given gene presented in the comparative signature table (this value is the output of the mixed-effects model), while the y-coordinate of each point shows the logFC from the corresponding dataset. The color and shape of the points can be set to discriminate between species, tissues, studies, sexes, platforms, and types of models used for obtaining the differential expression values, which allows the user to tailor the plot to their specific needs. A click of a point in this plot directs the user to the dataset expression plot, which shows how the expression of the selected gene changes with age in the clicked dataset (Fig. 3b). The goal of the dataset expression plot is to display the raw data, utilized to calculate the logFC value and its standard error for a particular dataset.

Fig. 3.
figure 3

Examples of the mixed-effects model plot and the differential gene expression plot. a) An example of the mixed-effects model plot. Each point represents a single dataset. Shape and color represent species and tissue, respectively. The data are mean normalized slope/logFC ± SE. The details of each dataset are specified in the text box that appears on mouse hover. b) An example of the dataset gene expression plot corresponding to a point clicked on the mixed-effects model plot. The slope of the regression line corresponds to the logFC or slope value for the gene in the clicked dataset (the y-axis coordinate of the clicked point), while the width of the shaded area around the line reflects the standard error for the slope.

The last two modules of the database essentially mirror the first two, but the genes are replaced with the functional groups of genes. Functional enrichment for the signatures and datasets was performed using Gene Set Enrichment Analysis [1718], where quantitative normalized enrichment scores (NES) and Benjamini–Hochberg adjusted p-values were calculated. In the functional signature interface module, one can compare enrichment scores and corresponding statistical significances across all 7 signatures. The functional terms can be filtered based on their presence, statistical significance, up- or downregulation according to the signatures, as well as on the ontology these terms came from. The utilized ontologies included GO Biological Process [1920], KEGG [21-23], Reactome [24], and Biocarta [25]. Since many functional groups of genes are nested within each other, we introduced a feature that allows one to simultaneously examine all parent and child gene sets for a selected functional term, i.e., groups that contain the selected gene set or are subsets of this gene set, respectively. Clicking a functional term name will reduce the functional signatures interface table, keeping only the selected term and its parent and child terms. The “Back to full table” button under the table will reset the table to the default state with all functional groups from the database. Clicking a table cell navigates the user to the functional dataset plot, which displays enrichment scores and statistical significances for all individual datasets corresponding to the clicked functional term and signature (Fig. 4). Although aggregated functional enrichment scores of signatures were not obtained from the scores of the individual datasets, this plot provides a general idea about the consistency of the age-related up- or downregulation of genes associated with the selected pathway across datasets and helps to justify the values in the interactive table. Each bar in the plot represents one dataset, with the dataset specifications shown on mouse hover. Green and red bars represent statistically significant up- or downregulation with age, respectively, while gray bars stand for enrichment scores that failed to cross the statistical significance threshold set in the side menu (by default, the adjusted p-value threshold is equal to 0.1).

Fig. 4.
figure 4

An example of the functional enrichment dataset plot. The plot corresponds to “GOBP: Antigen processing and presentation” function in the mouse signature. The bars are colored based on statistical significance: up- and downregulated functions are shown in green and red, respectively, while gray bars correspond to non-significant enrichments. Dataset specification is displayed in a text box on mouse hover.

To make the use of the database simple and convenient, we developed the “Manual” section where each database module is explained in detail with example tables and plots. In addition, the whole database is available in two languages: English and Russian. To allow independent analysis of our data, we made the differential gene expression data and the functional enrichment data fully downloadable. They can be accessed through the “Downloads” section.

DISCUSSION

With the use of AgeMeta, the aging-associated differential expression patterns and their inter-species or inter-tissue differences can be easily visualized for any of the thousands of genes or functional groups present in the database. The interactive tables and plots allow users to study genes of interest at the transcriptomic signature level as well as at the level of individual datasets. Additionally, the application of these and other related transcriptomic signatures for prediction and characterization of novel lifespan-extending and rejuvenative interventions [1126] suggests that the signatures of aging integrated in AgeMeta might facilitate the discovery and development of new potential geroprotectors.

Several databases of aging genomics and transcriptomics exist to date. Therefore, it is important to highlight some of the differences between the most relevant existing databases and AgeMeta. GenAge, a database that is a part of a collection of databases known as Human Ageing Genomic Resources, or HAGR [3], presents effects of genomic interventions (e.g., gene knockouts) on lifespan for model organisms such as Mus musculus. The association of these effects with age-related transcriptomic changes is not trivial, i.e., genomic interventions that result in a longer lifespan do not always counteract the gene expression changes that happen with age. This could be explained by some of the age-related gene expression changes being detrimental, while some being adaptive and having a positive effect on the aging phenotype [11]. An example of this phenomenon is Pparg, a gene which is significantly upregulated with age in the mouse signature but whose knockout significantly decreases the mean lifespan of mice [27].

There are other databases with data on aging transcriptomics, e.g., Aging Atlas [2] and AGEMAP [5]. Both of them contain differential expression information from individual studies (there is only one study in AGEMAP), but none of them offer meta-analysis-derived differential expression values, aggregated across multiple studies with multiple tissues and species, which is the key distinguishing feature of AgeMeta. Aging Atlas also contains multi-omics data, including protein–protein interactions, ChIP-Seq data, and single cell transcriptomics, which were outside of the scope of our study. Another database, Open Genes [6], contains descriptions of genes with many different types of associations with aging, including knockouts affecting lifespan and age-related differential expression, but this database also does not contain results of a quantitative meta-analysis.

A couple of limitations of our work should be noted. Firstly, the signatures of aging used in AgeMeta were based on linear differential expression models, which could imply that some non-linear age-related gene expression changes were not identified as statistically significant in this study. Secondly, differential co-expression was shown to occur with age in various mammalian species [28-30], whereas in our work only expression changes of individual genes were examined. Expansion of AgeMeta with a meta-analysis of non-linear differential gene expression and gene co-expression patterns may further improve the comprehensiveness and the application scope of our database as well as help enhance the understanding of molecular mechanisms of mammalian aging by the scientific community.