treats: A modular R package for simulating trees and traits
Methods in Ecology and Evolution (IF 6.6), Pub Date: 2024-03-08, DOI: 10.1111/2041-210x.14306
Thomas Guillerme

1 INTRODUCTION

Comparing biological patterns is one of the key ways to understand mechanisms in evolutionary biology. This has led to the development of phylogenetic comparative methods as a key methodologically driven topic in ecology, evolution and palaeontology (Felsenstein, 1985; Pennell & Harmon, 2013). These methods rely on comparing patterns in a phylogenetic context to understand biological mechanisms or concepts (Harmon, 2019). These comparisons can be made between patterns observed under different conditions, or against null, neutral or baseline models (see Bausman, 2018 for distinctions), each suggesting different processes or mechanisms. For example, one can compare trait distributions between species with different diets (Deepak et al., 2023) or habitats (Pinto-Ledezma et al., 2017), or compare an observed pattern to one simulated under null or baseline conditions (Miller et al., 2022). In theory, workers can use the following research pipeline: (1) thinking of a specific mechanism (e.g. a mass extinction allowing the surviving species to acquire new morphologies), (2) collecting some data to test this mechanism (e.g. some traits of species across an extinction event) and then (3) comparing these patterns to ones simulated under no specific conditions (e.g. a null model where the traits evolve randomly regardless of an extinction event; Puttick et al., 2020). Workers might thus need to simulate a great diversity of evolutionary scenarios to test their specific question. To do so, we need statistical and software solutions that simulate trees and data in order to generate many specific null models.

In practice, these evolutionary simulations can be done relatively easily on computers using a birth-death process (Feller, 1939; FitzJohn, 2012; Stadler, 2010). A birth-death process is a continuous-time Markov process that has been routinely implemented in R (R Core Team, 2023) to simulate realistic phylogenies (e.g. FitzJohn, 2012; Paradis & Schliep, 2019). This general algorithm to generate phylogenetic trees can be coupled with other Markov processes to also generate traits, for example using a Brownian Motion process (BM; Cavalli-Sforza & Edwards, 1967) or an Ornstein-Uhlenbeck process (OU; Lande, 1976; see Cooper, Thomas, Venditti, et al., 2016 for a distinction between both). In R, this can be done with several already well-used and well-documented packages. For example, if you want to simulate diversity through time, you can use TreeSim (Stadler, 2011) to simulate diversity under a set of specific parameters (e.g. speciation and extinction) with some events disrupting the simulations (e.g. mass extinctions). You can even improve on generating these patterns using FossilSim (Barido-Sottani et al., 2019) to take into account fossilisation processes. You can also use paleobuddy (do Rosario Petrucci et al., 2022) or paleotree (Bapst, 2012) to generate palaeontology-specific data. On the other hand, if you need to simulate both diversity and traits through time, this can be done with specific parameters in RPANDA (Morlon et al., 2016), diversitree (FitzJohn, 2012) or PETER (Puttick et al., 2020), where the traits are generated stochastically through time (given some process) during the birth-death process.
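
As a rough illustration of the coupling described above, the following Python sketch runs a constant-rate birth-death process and lets a single trait evolve by Brownian motion along each living lineage. It is a minimal sketch written for this text, not the algorithm of treats or of any package cited above; the function name `simulate_birth_death_bm`, its parameters and the record fields are all hypothetical, and the rates are assumed to be constant and positive.

```python
import math
import random


def simulate_birth_death_bm(speciation=1.0, extinction=0.2, sigma=1.0,
                            max_time=3.0, max_living=200, seed=None):
    """Hypothetical helper: birth-death lineages carrying one BM trait each.

    Returns a list of lineage records (dicts); a lineage is a branch that
    runs from its 'start' time to its 'end' time.
    """
    rng = random.Random(seed)
    lineages = [{"parent": None, "start": 0.0, "end": None,
                 "trait": 0.0, "fate": "living"}]
    living = [0]   # indices of lineages that are currently alive
    time = 0.0
    while living and len(living) < max_living:
        # Waiting time to the next event: exponential with rate n * (lambda + mu).
        total_rate = len(living) * (speciation + extinction)
        wait = rng.expovariate(total_rate)
        reached_end = time + wait >= max_time
        if reached_end:
            wait = max_time - time
        # Brownian motion: each living lineage gains a Normal(0, sigma^2 * wait) step.
        step_sd = sigma * math.sqrt(wait)
        for i in living:
            lineages[i]["trait"] += rng.gauss(0.0, step_sd)
        time = max_time if reached_end else time + wait
        if reached_end:
            break
        # Pick the lineage affected by the event; its branch ends here either way.
        target = rng.choice(living)
        living.remove(target)
        lineages[target]["end"] = time
        if rng.random() < speciation / (speciation + extinction):
            # Speciation: two daughter lineages inherit the parent's trait value.
            lineages[target]["fate"] = "speciated"
            for _ in range(2):
                lineages.append({"parent": target, "start": time, "end": None,
                                 "trait": lineages[target]["trait"],
                                 "fate": "living"})
                living.append(len(lineages) - 1)
        else:
            lineages[target]["fate"] = "extinct"
    # Lineages still alive when the simulation stops end at the stopping time.
    for i in living:
        lineages[i]["end"] = time
    return lineages
```

Calling `simulate_birth_death_bm(seed=1)` returns one record per branch, from which a tree can be rebuilt through the `parent` indices. Updating the traits between events, rather than after the tree is finished, is what allows the trait values to feed back into the tree-building step in more elaborate scenarios.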

Although the packages mentioned above are excellent and routinely used, with fast and reliable algorithms and associated documentation, they are all designed for specific tasks and do not allow much modification beyond the input parameters designed by the authors. For example, TreeSim can simulate a birth-death tree with some extinction event, but it is not designed to simulate one in which the extinction event makes the birth-death process no longer diversity dependent (i.e. a release in selection pressure after the extinction event that leads to a different process dominating speciation). Similarly, PETER is not designed to simulate a complex set of traits (say three correlated BM traits and two independent OU ones). This absence of modularity has hampered the use of complex and question-driven simulations, although I acknowledge this was not the primary aim of the authors of the excellent packages mentioned above. This has led workers to often develop their own tools to answer specific questions (e.g. Puttick et al., 2020). Therefore, I propose treats, a modular R package to simulate both trees and traits through time. Note that although treats is modular and can thus be used as a go-to tool for simulating trees and traits, it lacks the ready-to-use implemented methods featured in other packages, such as fossilisation and sampling (Barido-Sottani et al., 2019; do Rosario Petrucci et al., 2022; Stadler, 2011) or specific macroevolutionary simulations (Morlon et al., 2016; Puttick et al., 2020).
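
To make the idea of modularity concrete, the sketch below composes the trait set used as an example above (three correlated BM traits and two independent OU ones) from small interchangeable step functions. It is written in Python for illustration only and does not reproduce the treats interface; `cholesky`, `correlated_bm`, `ou` and `combine` are hypothetical helpers, and the covariance matrix and OU parameters are arbitrary assumed values.

```python
import math
import random


def cholesky(matrix):
    """Lower-triangular factor L such that L * L^T equals a symmetric
    positive-definite matrix (given as a list of rows)."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - s)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return lower


def correlated_bm(covariance):
    """Step function for BM traits whose increments share a rate matrix."""
    lower = cholesky(covariance)

    def step(values, dt, rng):
        noise = [rng.gauss(0.0, 1.0) for _ in values]
        scale = math.sqrt(dt)
        # Multiplying independent noise by L gives increments with covariance dt * C.
        return [v + scale * sum(lower[i][k] * noise[k] for k in range(i + 1))
                for i, v in enumerate(values)]
    return step


def ou(alpha, sigma, optimum=0.0):
    """Step function for independent OU traits (exact update over dt)."""
    def step(values, dt, rng):
        decay = math.exp(-alpha * dt)
        sd = sigma * math.sqrt((1.0 - decay ** 2) / (2.0 * alpha))
        return [optimum + (v - optimum) * decay + rng.gauss(0.0, sd)
                for v in values]
    return step


def combine(modules):
    """Chain (n_traits, step) modules into one step over the full trait vector."""
    def step(values, dt, rng):
        out, start = [], 0
        for n_traits, module_step in modules:
            out.extend(module_step(values[start:start + n_traits], dt, rng))
            start += n_traits
        return out
    return step


# Three correlated BM traits followed by two independent OU traits.
cov = [[1.0, 0.5, 0.2],
       [0.5, 1.0, 0.3],
       [0.2, 0.3, 1.0]]
trait_step = combine([(3, correlated_bm(cov)),
                      (1, ou(alpha=2.0, sigma=0.5)),
                      (1, ou(alpha=0.5, sigma=1.0, optimum=3.0))])

rng = random.Random(42)
traits = [0.0] * 5
for _ in range(100):          # 100 steps of length 0.01 along one branch
    traits = trait_step(traits, 0.01, rng)
```

Because every module has the same `(values, dt, rng)` signature, swapping one trait process for another only changes the list passed to `combine`, which is the kind of flexibility the paragraph above argues is missing from parameter-driven simulators.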




Updated: 2024-03-08