Novel resource-efficient methods for robust and accurate taxonomic profiling of metagenomic data

Weging, Silvio

Please use this identifier to cite or link to this item: http://dx.doi.org/10.25673/101350

Title:	Novel resource-efficient methods for robust and accurate taxonomic profiling of metagenomic data
Author(s):	Weging, Silvio
Referee(s):	Große, Ivo; Morgenstern, Burkhard
Granting Institution:	Martin-Luther-Universität Halle-Wittenberg
Issue Date:	2022
Extent:	1 online resource (ii, 149 pages)
Type:	University thesis (Hochschulschrift)
Type:	PhDThesis
Exam Date:	2022-10-20
Language:	English
URN:	urn:nbn:de:gbv:3:4-1981185920-1033066
Abstract:	Examining the taxonomic composition of sequenced data is a necessary step in almost any metagenomic analysis. Most existing, widely used programs prioritize speed over accuracy and robustness while consuming large amounts of memory. As an alternative, we have developed new methods and implemented them in a program called kASA, which efficiently identifies DNA or protein sequences using k-mers in order to build a metagenomic profile. kASA achieves high accuracy and robustness by combining an amino acid-like encoding with a range of k values, while using at most the amount of memory specified by the user. Algorithms and data structures specifically adapted to secondary memory allow a complete taxonomic analysis of metagenomic data, without compromises, on HPC clusters, desktops, or even laptops.
URI:	https://opendata.uni-halle.de//handle/1981185920/103306 http://dx.doi.org/10.25673/101350
Open Access:	Open access publication
License:	In Copyright
Appears in Collections:	Interne-Einreichungen

Files in This Item:

File	Description	Size	Format
Dissertation_MLU_2022_WegingSilvio.pdf		2.38 MB	Adobe PDF
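
The abstract mentions two ideas that a small example may make more concrete: converting DNA into an amino acid-like alphabet, and collecting k-mers for a range of k values rather than a single k. The Python sketch below is only an illustration under stated assumptions and is not kASA's implementation. It uses the standard genetic code as a stand-in for the "amino acid-like encoding" (the thesis may define a different mapping), and the names `translate` and `kmers_in_interval` are hypothetical.

```python
from itertools import product

# Standard genetic code, used here only as a stand-in for the
# "amino acid-like encoding" named in the abstract (assumption).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(dna, frame=0):
    """Map each complete codon, starting at `frame`, to one letter.

    Codons containing unknown characters (for example 'N') become 'X'.
    """
    dna = dna.upper()
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def kmers_in_interval(seq, k_min, k_max):
    """Return all overlapping k-mers of `seq` for every k in [k_min, k_max]."""
    return {
        k: [seq[i:i + k] for i in range(len(seq) - k + 1)]
        for k in range(k_min, k_max + 1)
    }


if __name__ == "__main__":
    encoded = translate("ATGGCCAAATGA")
    print(encoded)
    print(kmers_in_interval(encoded, 2, 3))
```

Working on the translated alphabet shortens the sequence by a factor of three, and keeping several values of k lets shorter k-mers still match where longer ones fail. How kASA stores these k-mers on secondary memory and enforces the user-specified memory limit is not shown here.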
