Please use this identifier to cite or link to this item: http://dx.doi.org/10.25673/118900
Title: The chemical space spanned by manually curated datasets of natural and synthetic compounds with activities against SARS-CoV-2
Author(s): Betow, Jude Y.
Turon, Gemma
Metuge, Clovis S.
Akame, Simeon
Shu, Vanessa A.
Ebob, Oyere T.
Duran, Miquel
Ntie-Kang, Fidele
Issue Date: 2025
Type: Article
Language: English
Abstract: Diseases caused by viruses are challenging to contain, as their outbreak and spread could be very sudden, compounded by rapid mutations, making the development of drugs and vaccines a continued endeavour that requires fast discovery and preparedness. Targeting viral infections with small molecules remains one of the treatment options to reduce transmission and the disease burden. A lesson learned from the recent coronavirus disease (COVID-19) is to collect ready-to-screen small molecule libraries in preparation for the next viral outbreak, and potentially find a clinical candidate before it becomes a pandemic. Public availability of diverse compound libraries, well annotated in terms of chemical structures and scaffolds, modes of action, and bioactivities, is therefore crucial to ensure the participation of academic laboratories in these screening efforts, especially in resource-limited settings where synthesis, testing and computing capacity are scarce. Here, we demonstrate a low-resource approach to populate the chemical space of naturally occurring and synthetic small molecules that have shown in vitro and/or in vivo activities against the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) and its target proteins. We have manually curated two datasets of small molecules (naturally occurring and synthetically derived) by reading and collecting (hand-curating) the published literature. Information from the literature reveals that a majority of the reported SARS-CoV-2 compounds act by inhibiting the main protease, while 25% of the compounds currently have no known target. Scaffold analysis and principal component analysis revealed that the most common scaffolds in the datasets are quite distinct. We then expanded the initially manually curated dataset of over 1200 compounds via an ultra-large scale 2D and 3D similarity search, obtaining an expanded collection of over 150 k purchasable compounds.
The spanned chemical space significantly extends beyond that of a commercially available coronavirus library of more than 20 k small molecules and constitutes a good starting collection for virtual screening campaigns given its manageable size and proximity to hand-curated compounds.
URI: https://opendata.uni-halle.de//handle/1981185920/120856
http://dx.doi.org/10.25673/118900
Open Access: Open access publication
License: (CC BY 4.0) Creative Commons Attribution 4.0
Journal Title: Molecular informatics
Publisher: Wiley-VCH-Verl.
Publisher Place: Weinheim
Volume: 44
Issue: 1
Original Publication: 10.1002/minf.202400293
Page Start: 1
Page End: 11
Appears in Collections: Open Access Publikationen der MLU