STRAS: a Snakemake pipeline for genome-wide short tandem repeats annotation and score
Human Genetics (IF 5.3), Pub Date: 2024-03-20, DOI: 10.1007/s00439-024-02662-5
Mengna Zhang

High-throughput whole genome sequencing (WGS) is used clinically to detect single nucleotide variants and small indels. Several bioinformatics tools, such as ExpansionHunter Denovo, GangSTR and HipSTR, have been developed to call short tandem repeat (STR) copy numbers from WGS data. However, repeat expansion disorders are rare, and it is hard to find candidate expansions in a single patient's sequencing data, which contains ~800,000 STR calls. In this paper I describe STRAS (genome-wide STR Annotation and Score), a Snakemake pipeline that uses a Random Forest (RF) model to predict pathogenicity. The predictor was validated on benchmark data from ClinVar and PubMed. The true positive rate was 93.8%, the true negative rate was 98.0%, precision was 98.6%, recall was 93.8%, and the F1-score was 0.961. Sensitivity was 93.8% and specificity was 99.6%. These results show that STRAS could be a useful tool for clinical researchers to find STR loci of interest and filter out neutral STRs. STRAS is freely available at https://github.com/fancheyu5/STRAS.
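The validation metrics above all derive from a binary confusion matrix in which pathogenic STRs are the positive class. The following is a minimal sketch in standard-library Python showing the standard definitions. It also checks that the reported F1-score is the harmonic mean of the reported precision and recall. Under these definitions, true positive rate, recall and sensitivity are the same quantity, as are true negative rate and specificity. The `ConfusionCounts` class and the function names are hypothetical illustrations, not part of STRAS.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Hypothetical counts from comparing predictions with benchmark labels."""
    tp: int  # pathogenic STRs predicted pathogenic
    fp: int  # neutral STRs predicted pathogenic
    tn: int  # neutral STRs predicted neutral
    fn: int  # pathogenic STRs predicted neutral


def sensitivity(c: ConfusionCounts) -> float:
    """True positive rate; identical to recall."""
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    """True negative rate."""
    return c.tn / (c.tn + c.fp)


def precision(c: ConfusionCounts) -> float:
    """Fraction of STRs predicted pathogenic that are truly pathogenic."""
    return c.tp / (c.tp + c.fp)


def f1_from(prec: float, rec: float) -> float:
    """F1-score: harmonic mean of precision and recall."""
    return 2 * prec * rec / (prec + rec)


def f1(c: ConfusionCounts) -> float:
    """F1-score computed directly from confusion counts."""
    return f1_from(precision(c), sensitivity(c))


if __name__ == "__main__":
    # Consistency check with the precision (98.6%) and recall (93.8%) in the abstract.
    print(round(f1_from(0.986, 0.938), 3))  # 0.961
```

Because F1 depends only on precision and recall, the reported 0.961 can be reproduced from those two figures alone, without the underlying counts.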

Updated: 2024-03-21