
StringFix: an annotation-guided transcriptome assembler improves the recovery of amino acid sequences from RNA-Seq reads

  • Research Article
Genes & Genomics

Abstract

Background

Reconstructing amino acid sequences from an assembled transcriptome is of interest in personalized medicine, for example, to predict drug-target (or protein-protein) interactions while accounting for an individual’s genomic variations. Most existing transcriptome assemblers, however, seem not to be well suited for this purpose.

Methods

In this work, we present StringFix, an annotation-guided transcriptome assembly and protein sequence reconstruction tool that takes genome-aligned reads and the annotations associated with the reference genome as input. The tool ‘fixes’ the pre-annotated transcript sequences by taking small variations into account, and then produces the possible amino acid sequences that are likely to exist in the test tissue.
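The idea of 'fixing' an annotated transcript and then deriving its protein product can be illustrated with a minimal sketch. This is not StringFix's code or API: the function names `apply_snvs` and `translate`, the variant tuple layout, and the toy sequence are all hypothetical, and only single-nucleotide substitutions are handled, whereas real small variations also include insertions and deletions.

```python
# Illustrative sketch only; names and data layout are assumptions, not StringFix's API.

# Standard genetic code, with bases ordered T, C, A, G at each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def apply_snvs(transcript, snvs):
    """Return a copy of `transcript` with single-nucleotide variants applied.

    `snvs` is a list of (zero_based_position, ref_base, alt_base) tuples
    (a hypothetical layout chosen for this example).
    """
    seq = list(transcript.upper())
    for pos, ref, alt in snvs:
        if seq[pos] != ref:
            raise ValueError(f"reference mismatch at position {pos}")
        seq[pos] = alt
    return "".join(seq)


def translate(cds):
    """Translate a coding sequence, stopping at the first stop codon.

    A trailing partial codon, if any, is ignored.
    """
    protein = []
    for i in range(0, len(cds) - 2, 3):
        amino_acid = CODON_TABLE[cds[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


annotated = "ATGGCCAAGTAA"                      # toy annotated coding sequence
fixed = apply_snvs(annotated, [(4, "C", "A")])  # toy variant observed in reads
print(translate(annotated))  # MAK
print(translate(fixed))      # MDK
```

In this toy case the single substitution changes the second codon from GCC to GAC, so the reconstructed protein differs from the one implied by the unmodified annotation.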

Results

The results show that, when the outputs of existing reference-based assemblers are used as the input GTF guide, StringFix reconstructs amino acid sequences with higher precision and sensitivity than direct generation from the transcripts recovered by any of the assemblers we tested.
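For readers unfamiliar with the two metrics, the sketch below shows one common way to compute precision and sensitivity over sets of protein sequences. It assumes exact string matching against a known reference set; the abstract does not state the matching criterion actually used, and the function name and toy sequences are hypothetical.

```python
# Illustrative sketch only; exact-match scoring is an assumption, not the paper's stated criterion.

def precision_sensitivity(reconstructed, reference):
    """Return (precision, sensitivity) for exact-match sequence recovery.

    precision   = correctly reconstructed / all reconstructed
    sensitivity = correctly reconstructed / all reference sequences
    """
    rec, ref = set(reconstructed), set(reference)
    true_positives = len(rec & ref)
    precision = true_positives / len(rec) if rec else 0.0
    sensitivity = true_positives / len(ref) if ref else 0.0
    return precision, sensitivity


# Toy data: 2 of 3 reconstructed sequences are correct, out of 4 true sequences.
precision, sensitivity = precision_sensitivity(
    ["MAK", "MDK", "MGK"],
    ["MDK", "MGK", "MEK", "MVK"],
)
print(round(precision, 3), sensitivity)  # 0.667 0.5
```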

Conclusion

By using StringFix with existing reference-based assemblers, one can recover not only novel transcripts and isoforms but also the possible amino acid sequences stemming from them.


Data and Software availability

The Python code was deposited to both PyPI and GitHub. Installation instructions, usage, and example code can be found at https://github.com/combio-dku/ (Project name: StringFix, License: GPL 3.0, Operating system(s): Platform independent, Programming language: Python 3, Other requirements: None). The datasets used in this work can be freely downloaded from the Gene Expression Omnibus (GEO) at https://www.ncbi.nlm.nih.gov/geo/ using their accession numbers.

Acknowledgements

The authors gratefully acknowledge the Center for Bio-Medical Engineering Core Facility at Dankook University.

Funding

This work was supported by the research fund of Dankook University in 2022.

Author information

Corresponding author

Correspondence to Seokhyun Yoon.

Ethics declarations

Competing interest

The authors declare that they have no competing interest.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary Material 1

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Lee, J., Kim, M., Han, K. et al. StringFix: an annotation-guided transcriptome assembler improves the recovery of amino acid sequences from RNA-Seq reads. Genes Genom 45, 1599–1609 (2023). https://doi.org/10.1007/s13258-023-01458-7

  • DOI: https://doi.org/10.1007/s13258-023-01458-7
