Abstract

Lyme disease caused by the Borrelia species is a growing health concern in many parts of the world. Current treatments for the disease may have side effects, and there is also a need for new therapies that can selectively target the bacteria. Pathogens responsible for Lyme disease include B. burgdorferi, B. afzelii, and B. garinii. In this study, we employed structural docking-based screening to identify potential lead-like inhibitors against the bacterium. We first identified the core essential genome fraction of the bacterium, using 37 strains. Later, we screened a library of lead-like marine microbial metabolites () against the arginine deiminase (ADI) protein of Borrelia garinii. This protein plays a crucial role in the survival of the bacteria, and inhibiting it can kill the bacterium. The prioritized lead compounds demonstrating favorable binding energies and interactions with the active site of ADI were then evaluated for their drug-like and pharmacokinetic parameters to assess their suitability for development as drugs. Results from molecular dynamics simulation (100 ns) and other scoring parameters suggest that the compound CMNPD18759 (common name: aureobasidin; IUPAC name: 2-[(4R,6R)-4,6-dihydroxydecanoyl]oxypropan-2-yl (3S,5R)-3,5-dihydroxydecanoate) holds promise as a potential drug candidate for the treatment of Lyme disease, caused by B. garinii. However, further experimental studies are needed to validate the efficacy and safety of this compound in vivo.

1. Introduction

Borrelia garinii belongs to the genus Borrelia and is responsible for causing Lyme disease [1, 2]. It is the most common tick-borne disease in the Northern Hemisphere, with over 300,000 new cases reported each year in the United States alone [3]. It is also the common Borrelia sp. found in Europe and Asia, where it is transmitted to humans through the bite of infected ticks [4]. This bacterium has a complex life cycle and can survive and persist in different hosts, including the vector tick and humans [5, 6]. After infection, B. garinii can cause a range of symptoms associated with Lyme disease, including fever, fatigue, headache, and a characteristic bull’s-eye rash [7, 8]. If left untreated, it can spread to other parts of the body and cause symptoms like joint pain, meningitis, and cardiac issues [9, 10]. The diagnosis of Lyme disease entails a comprehensive approach, integrating clinical symptom presentation with confirmation through blood tests and PCR assays [11, 12]. The standard treatment protocol typically includes a course of antibiotics, such as doxycycline or amoxicillin. The prognosis for patients is generally favorable, particularly when the disease is diagnosed early and treated promptly [13]. There is a need for newer drug leads targeting this bacterium, given the limitations associated with the antibiotics currently in use. They may effectively kill the bacterium during the early stages of infection but are often less effective in the later stages of the disease when the bacteria is disseminated to various organs and tissues [14]. In addition, the emergence of antibiotic-resistant strains of B. garinii is a growing concern, as this limits the effectiveness of existing antibiotics and poses a significant threat to public health [15, 16].

The discovery of new drugs and targets is critical to overcome this challenge and improve the efficacy and safety of Lyme disease treatment. B. garinii has several enzymes that allow it to evade the host immune system and establish persistent infection [17]. These enzymes could be targeted by small molecule inhibitors to prevent survival in the host. Targeting essential genes and proteins that are required for its survival and replication through subtractive proteomics is one strategy. Complementary methods like pan-proteomics are a powerful approach for the narrowing down of new therapeutic targets from the core region of the genome [18]. In silico methods, which rely on computational modeling and simulation, have tremendous potential for swift therapeutic target screening and inhibitor design [19, 20]. This approach can be used to screen a large number of potential drug candidates in a relatively short period, in contrast to slow and expensive traditional drug discovery methods. Current methods enable researchers to analyze vast amounts of genomic and structural data rapidly and efficiently. They can be used to identify potential therapeutic targets based on their essentiality, conservation, and druggability. Moreover, computational methods can be used for the design and optimization of small molecule inhibitors capable of selectively and effectively targeting these specific molecular targets [21]. Focusing on conserved proteins across various strains can enhance the efficacy of inhibition and decrease the probability of antibiotic resistance. This approach may involve utilizing natural products or similar drug-like compounds. The application of the pan-proteome concept and combination of these in silico techniques to select therapeutic protein targets from B. garinii have the potential to identify new targets and improve current therapies for Lyme disease treatment. Furthermore, this approach can contribute to the development of new therapies against the resistant varieties of B. garinii strains.

In this study, we selected marine-derived microbial metabolites for screening as they are a rich source of bioactive compounds with potential therapeutic applications, including the development of novel antibiotics [22]. Numerous marine bacterial and fungal species have developed distinctive metabolic pathways, yielding natural products that confer adaptive advantages for their survival in the challenging marine environment [23]. Their metabolites often possess antimicrobial activity against a range of pathogenic bacteria, including antibiotic-resistant strains [24]. One example of a marine organism-sourced bacterial inhibitor is salinosporamide A, which is produced by the bacterium Salinispora tropica [25]. This compound is a potent inhibitor of the proteasome, essential for the survival of many pathogenic bacteria, including Mycobacterium tuberculosis and Staphylococcus aureus [26]. Salinosporamide A has shown promise in preclinical studies as a potential treatment for multiple myelomas as well [27]. Another example of a marine actinomycete-derived MRSA inhibitor is marinopyrrole A [28, 29]. Quorum-sensing inhibitors from marine Oceanobacillus sp. [30] and Staphylococcus hominis [31] have also been reported in the literature. By disrupting quorum sensing, bacterial colonization and biofilm formation may be prevented in the pathogens. In addition, the efficacy of traditional antibiotics may also be enhanced. Therefore, inhibitors based on marine-derived natural products hold significant promise as novel antibiotics. These have the potential to address antibiotic-resistant infections, thereby enhancing the efficacy and safety of existing treatments. In this study, a marine microbe-derived metabolite library containing lead-like compounds was screened against the arginine deiminase (ADI) enzyme of B. garinii. The ADI enzyme plays a critical role in the urea cycle by catalyzing the conversion of arginine to citrulline and ammonia. Given that arginine is necessary for protein synthesis in bacteria, ADI inhibition presents a promising drug target to curtail bacterial growth. The utilization of a docking-based screening approach enabled the rapid screening of marine microbial inhibitors, as it is a streamlined method for identifying potential inhibitors for therapeutic applications.

2. Material and Methods

2.1. Pan-Proteomics

The proteome sequences of B. garinii () were retrieved in FASTA format from the BV-BRC database (https://www.bv-brc.org/; accessed 30 March 2023) and subjected to pan-proteome analysis using BPGA software [32]. BPGA performs sequence data preprocessing and clustering using USEARCH [33] to generate a pan, unique, and core-genome file. It also performs functional annotation and classification of genes. It uses MUSCLE [34] and rsvg-convert dependencies to align sequences and generate trees based on pan and core genes. Plots were visualized with the aid of gnuplot libraries [35].

2.2. Therapeutic Target Mapping

To identify paralogous sequences, the core proteome of B. garinii was subjected to the CD-HIT suite [36], with default parameters except for a threshold value of 60%. The CD-HIT suite is a popularly employed software for comparing and clustering protein and genomic sequences to eliminate redundant proteins. To identify essential proteins, BLAST was used against the Database of Essential Genes (DEG) [37] and Database of Essential Gene Clusters (CEG) [38]. Next, the obtained proteins were subjected to a BLAST search against the human proteome (https://www.uniprot.org/uniprotkb?query=reviewed%3Atrue+AND+proteome%3Aup000005640; accessed 4 April 2023) to obtain a dataset of nonhomologous proteins that could be targeted without impacting the human host. For this purpose, the E-value criterion was set at . The resulting dataset was further evaluated against human gut flora, and an E-value greater than 10-4 was considered [39]. The final dataset was subjected to a BLAST search against DrugBank [40], with an E-value of <10-3. Hits obtained were classified as drug targets. ADI (Accession: WP_031505634.1) was retained for downstream analysis due to its attributes of druggability, uniqueness, and significance in bacterial survival. To assess its similarity to pathogenically important Borrelia spp., its sequence was aligned using the multialign server (URL: http://multalin.toulouse.inra.fr/multalin/cgi-bin/multalin.pl; accessed 3 September 2023) with the ADI of the B. afzelii and B. burgdorferi.

2.3. Structural Modeling

The protein 3D model was built with iterative threading assembly refinement (I-TASSER) and AlphaFold [41]. The I-TASSER (https://zhanggroup.org/I-TASSER/; retrieved 8 April 2023) algorithm utilizes the identified templates to build a preliminary model, followed by iterative rounds of refinement that optimize the model structure by minimizing energy and improving stereochemistry [42]. During refinement, the algorithm also generates multiple models, which are ranked based on their quality scores. The top modeled structure was superimposed onto its templates through the use of the FATCAT algorithm within the RCSB pairwise structural alignment server module (https://www.rcsb.org/alignment; accessed 3 August 2023). This was to see the common folds and structural regions of similarity. Another tool, the AlphaFold, deploys deep learning and neural networks to predict the 3D arrangement of a protein, providing valuable insights into its folding and overall structural characteristics. UniRef90 was used for generating multiple sequence alignments. The predicted LDDT (pLDDT) was used for intradomain confidence, whereas Predicted Aligned Error (PAE) was used for determining “between domain” or “between chain” structural confidence. The 3D structure validation was carried out using Ramachandran plot analysis [43, 44], and the best predicted structure was retained for further analysis. The active site was defined as having a , estimated by SVM in I-TASSER. The secondary structure was identified by the ProMotif program [45].

2.4. Virtual Screening

The 3D protein structure was prepared for docking as described previously [46, 47]. Seven residues (GLY219, ARG236, HIS272, ASP274, GLY393, ARG394, and CYS399) were defined as active site residues, as predicted by the I-TASSER [42]. S-Nitroso-L-homocysteine was taken as the control, as it is a potent active-site-directed, irreversible inhibitor of ADI (EC number: 3.5.3.6) [48]. It was taken as a control for docking validation, by binding in the pocket and comparing its binding affinity value with new leads. A marine microbial metabolite (lead-like) set of compounds () was obtained from the CMNPD website (https://cmnpd.org/; accessed 12 April 2023). It was filtered based on drug-likeliness and lead-like criteria. Drug-like criteria comprised of the “Lipinski Rule of Five” [49], fulfilling the criteria of “, octanol-water partition coefficient , , and .” A compound that violated more than one of these criteria may have lower oral bioavailability and may face challenges in becoming an effective drug. Hence, the compounds fulfilling at least three of these parameters were selected. In addition, Oprea et al.’s lead-like criteria [50] were checked, and apart from molecular weight and hydrogen bond donors ()/acceptors (), key descriptors included rotatable bonds, rigid bonds, ring count, and logP (-3.5 to 4.5). The rules are aimed at maintaining focus towards effective and orally absorbable compounds. Only molecules fulfilling Lipinski’s drug-like and Oprea’s lead-like criteria were taken for docking. The docking was performed using AutoDock Vina software, according to the previously described parameters [51]. The affinity value was obtained for best binding pose, the top-ranking compounds were visually inspected, and 2D interactions were mapped to identify the interacting residues. Furthermore, Molecular Mechanics/Poisson-Boltzmann Surface Area (MM/PBSA) values were computed for these complexes, according to Basharat et al. [52]. MM/PBSA is a computational approach used in molecular dynamics simulations to estimate the free energy of binding for protein-ligand complexes. The MM/PBSA method combines molecular mechanics (MM) calculations, which describe the energy associated with the molecular structure, with solvation free energy computations based on the Poisson-Boltzmann equation and the solvent-accessible surface area (PBSA). The relevance of MM/PBSA values lies in their ability to offer insights into the strength of interactions between a ligand and its target protein, helping in the identification and ranking of potential drug candidates.

2.5. Pharmacokinetic Profiling

ADMET (Absorption, Distribution, Metabolism, Excretion, and Toxicity) was studied using the pKCSM server (https://biosig.lab.uq.edu.au/pkcsm/theory; accessed 25 April 2023) and SWISS-ADME [53] (http://www.swissadme.ch/index.php; accessed 28 April 2023). Multiple models have been implemented in these tools, including models for predicting human ether-a-go-go-related gene (hERG) inhibition to assess cardiac arrhythmia, cytochrome P450 (CYP) inhibition (leading to possible adverse reactions or therapeutic failures), mutagenicity (Ames test), blood-brain barrier penetration, cytotoxicity, and Caco2 permeability (measure intestinal absorption).

2.6. Molecular Dynamics (MD) Simulation

The top-scoring compound along with the control (S-nitroso-L-homocysteine) was subjected to MD simulation to further evaluate stability of the binding interactions with the ADI protein. Docked complexes were prepared using the preparation wizard of Desmond (Schrodinger, LLC, NY, USA) according to previously described parameters [54]. Het states were generated using the Epik module, and heavy atoms were converged to energy minimization till a value of 0.30 Å was attained. The OPLS3e force field and TIP3P water model were selected. The complexes were simulated for 100 ns, and then, the simulation results were analyzed using built-in tools for obtaining RMSD, etc. This allowed insights into the structural and dynamic properties of the complexes. Snapshots at 0, 50, and 100 ns were extracted, and MM/PBSA values were calculated [52].

3. Results

3.1. Pan-Proteome Analysis

The expected size of the pan-proteome was 3556, while the estimated size was 3622.33. The actual nonredundant proteins in the accessory fraction were 2396. The expected and estimated sizes of the pan-proteome were relatively close, but the actual number of nonredundant proteins in the accessory fraction was lower than expected. This could be due to the reason that some of the proteins in the accessory fraction may be redundant or may not have been detected in the study. The power law model provides a mathematical framework for understanding the open nature of the pan-genome or proteome. It allows the prediction of the pan-proteome size for a given species, even if the genomes of all of the organisms in that species have not yet been sequenced. The equation is used for this purpose, where and are constants that determine the shape of the curve. The value of the exponent in the power law model determines whether the pan-proteome is open or closed. BPGA determined the value of for B. garinii strains as 0.33, indicating that the pan-proteome is open (Figure 1). It signifies that the entirety of proteins within B. garinii is not final and that the ongoing advancements in scientific techniques may lead to the continual discovery of new proteins, rendering the catalog of proteins dynamic [55]. Additionally, horizontal gene transfer, etc., could also add to the existing gene pool [56]. The concept of an open proteome becomes particularly pertinent in the contemporary era of high-throughput sequencing and advanced proteomic methodologies, which help identify and elucidate novel proteins, as well as explore various isoforms, thereby contributing to the evolving understanding of the intricate protein landscape within living organisms. Most microorganisms have an open proteome to some extent, as technological advancements and ongoing research efforts continually reveal new protein-coding genes, splice variants, and posttranslational modifications [57]. The number of accessory proteins varied across proteomes, ranging from 444 in IPT107 to 1210 in IPT134 (Table 1). The expected size of the core genome was 0, while the estimated size was 0.18, which suggests significant variability in the genes considered to be part of the core genome. However, the real set of conserved proteins across all strains included in the study was 37, which were expected to be essential for the survival and function of the species.

The number of unique proteins also varied across proteomes, ranging from one in the strain SZ to 69 in IPT128. The total number of unique proteins was 1123. The highest number of unique genes was 69 in IPT128, suggesting that this strain has undergone significant genomic changes compared to other strains. The lowest number, i.e., just one unique protein, was in the strain IPT133, IPT136, IPT140, and SZ (Table 1). Findings indicate that each strain had a unique set of proteins that distinguished it from the other strains. The highest number of exclusively absent genes was 43 for IPT107, suggesting that this strain has lost a significant number of genes compared to other strains. This could be due to factors such as gene deletion or mutation.

3.2. Therapeutic Target Mapping

The core proteins were taken and subjected to a subtractive proteomic approach. Among these, no paralogs were present, showing there are no duplicated genes within the core proteome of this bacterium. A total of 25 essential proteins were identified after conducting BLAST searches with CEG and 27 after BLAST against DEG. Altogether, 25 proteins common to both databases were retained (Table 2). Subtraction from the human proteome left only eight sequences, while subtraction from the gut bacterial proteome left only a single protein, i.e., ADI (accession no.: WP_031505634.1) for further analysis. This protein was also identified as a drug target after BLAST with the DrugBank. Its alignment with pathogenically important B. afzelii and B. burgdorferi species revealed that 17.8% of residues in the consensus sequence had a similarity lower than 50% (Figure 2). Nonetheless, on the whole, the sequence remained conserved.

3.3. Structure Modeling of ADI

With regard to the total number of residues in the selected target, the ADI sequence was 414. Out of these, 20 were glycine and 14 were proline residues. These amino acids have been reported distinctly as they play a specific role in protein structure and function, transforming it into fibril or elastomer [58]. Small size and flexibility of glycine make it suitable for regions requiring structural adaptability and attain conformational flexibility. The cyclic structure of proline introduces constraints and rigidity, influencing protein folding and stability. It acts as a “helix breaker” and disrupts regular alpha-helix formations. This disruption is essential to prevent overly rigid structures and to facilitate proper protein folding [59].

The top threading templates by I-TASSER included ADI from group A Streptococcus (PDB ID: 4BOF), enolase from Mycoplasma pneumoniae (PDB ID: 7E2Q), ADI from Mycoplasmopsis arginini (PDB ID: 1S9R; 1LXY), and the structural protein VP3 from Bombyx mori cypovirus 1 (PDB ID: 7WHP). The top five predicted structures had confidence scores ranging from -3 to 0.97. The top-scoring structure (Figure 3(a)) with a confidence score of 0.97 had an estimated TM score of and RMSD of . This structure was also overlaid on the top threading templates used for building its coordinates and showed a 48% identity with ADI of group A Streptococcus, 37% identity with ADI of Mycoplasmopsis arginini, just 8% with cypovirus 1, and 4% with the enolase of Mycoplasma pneumoniae. The predicted secondary structure and solvent accessibility of the sequence suggests that it is a globular protein with a predominantly helical and coil structure. The 3D structure had six sheets, four beta-alpha-beta units, six beta-hairpins, seven beta-bulges, 18 strands, 19 helices, 17 helix-helix interactions, 45 beta-turns, and 16 gamma-turns. The predicted solvent accessibility advocates that the ADI comprises regions partially exposed to the solvent, potentially indicating the presence of binding or interaction sites. The structure predicted by AlphaFold (Figure 3(b)) comprised of 6 sheets, 4 beta-alpha-beta units, 6 beta-hairpins, 6 beta-bulges, 19 strands, 21 helices, 25 helix-helix interactions, 36 beta-turns, and 4 gamma-turns.

The Ramachandran plot statistics by I-TASSER depicted 77.0% residues in the most favored regions, 19.6% in additional allowed regions, 1.1% in the generously allowed regions, and 2.4% in disallowed regions (Figure 3(c)). The model had a reasonable stereochemical quality, with most residues falling in the most favored regions of the Ramachandran plot. However, some dihedral angles in the model were unusual, particularly Omega. The model by AlphaFold had 92.3% residues in the most favored regions, 7.4% in additional allowed regions, 0.3% in generously allowed regions, while none in the disallowed region (Figure 3(d)). pLDDT showed a very high confidence score of >90 and low predicted aligned error for most of the structure. Hence, this structure was used for molecular docking and further analysis.

3.4. Docking

Several inhibitors were prioritized after molecular docking (Table 3) with ADI, having better binding scores than control (S-nitroso-L-homocysteine). Ten residues of ADI interacted with the control (Figures 4(a)4(c)), 12 with CMNPD18759 (Figures 4(d)4(f)), 12 with CMNPD24419 (Figures 4(g)4(i)), nine with CMNPD24876 (Figures 4(j)4(l)), 11 with CMNPD8737 (Figures 4(m)4(o)), and ten with CMNPD23643 (Figures 4(p)4(r)).

The control made four, CMNPD18759 made three, CMNPD24876 made three, CMNPD8737 made four, and CMNPD23643 made five hydrogen bonds. CMNPD24876 also made two arene-hydrogen and one arene-cation bond. Five residues (ASP43, ASP44, ASN349, ARG367, and ARG394) made interactions with all ligands, including the control. MM/PBSA values were computed for protein-ligand complexes, representing the estimated free energy of binding for each compound with ADI. Negative MM/PBSA values generally indicate favorable binding interactions, with an increasing negative value suggesting a stronger binding affinity between the ligand and the protein. The ranking of screened compounds based on the free energy of binding mirrored their order from the docking scores. This consistency is promising and indicates the reliability of both approaches in predicting the binding affinity of the compounds. Control compound was an exception, where a positive value was obtained in contrast to its lower docking score. This discrepancy in docking and MM/PBSA score might be attributed to the different aspects captured by each method and emphasizes the importance of considering multiple computational approaches in drug discovery.

The top-scoring compound aureobasidin (Table 3) is an antibiotic and has previously been implicated as an inhibitor of the inositolphosphorylceramide synthase AUR1 of fungus [60]. It has been isolated from Aureobasidium pullulans R106, a black yeast-like fungus [61]. This fungus has previously been isolated from marine sources as well [62]. Korormicin is a metabolite of Pseudoalteromonas sp. [63] and has also been known to produce reactive oxygen species to kill bacterial species like Vibrio cholerae and Pseudomonas aeruginosa [64]. 6-Hydroxypestalotiopsone C is derived from the mangrove-derived endophytic fungus Acremonium strictum [65], while Pestalotiopsone E is derived from Pestalotiopsis sp. [66]. Pestalotiopsone has previously shown influenza virus neuraminidase inhibition activity [67]. These compounds depicted good binding efficacy against ADI of B. garinii and suggest that these marine-derived compounds could potentially be developed into new therapeutics against Lyme infection.

3.5. ADMET

Absorption results by pKCSM showed that the five prioritized marine microbial metabolites had low water solubility but relatively high Caco2 permeability and intestinal absorption (human), suggesting that they may be able to pass through the intestinal lining and enter the bloodstream. This was also confirmed by SWISS-ADME results (Figure 5). The skin permeability values suggest that these compounds may have difficulty penetrating the skin. They were also classified as P-glycoprotein substrates, indicating that they may be transported out of cells by this protein. Additionally, two of these compounds were also predicted as P-glycoprotein inhibitors (Table 4). The steady-state volume of distribution (VDss) values for all five metabolites were negative, indicating that they do not tend to concentrate in tissues, but rather in plasma. The fraction-unbound values were generally low, suggesting that these metabolites tend not to diffuse or traverse cellular membranes. Blood-brain barrier (BBB) permeability values suggest that the metabolites had poor ability to cross the blood-brain barrier, while CNS permeability values suggest that the compounds would have difficulty penetrating the central nervous system. This was again confirmed by SWISS-ADME results (Figure 5). The compounds were not a substrate for cytochrome P (CYP) enzymes, except for CMNPD8737, which had the tendency to be metabolized by CYP3A4. The clearance values for all five metabolites indicate that biliary and hepatic clearance mechanisms may be involved, but there does not seem to be any involvement of the renal OCT2 transporter clearance mechanism in excretion. Regarding toxicity, none of the metabolites were found to be toxic or inhibit potassium channels encoded by hERG (Table 4). Hence, they would not lead to long QT syndrome or subsequent ventricular arrhythmia. However, one of the metabolites, CMNPD8737, was found to be hepatotoxic. The values representing the dose indicate that CMNPD8737 and CMNPD23643 are relatively more toxic for humans, compared to other metabolites having positive values on a log scale. However, for rats, this was not the same, and acute toxicity values were relatively similar for most of the compounds. Only CMNPD18759 had a lower toxicity and a high LD50. The chronic toxicity pattern of the compounds varied from acute toxicity, and CMNPD8737 showed the highest chronic toxicity potential. Studying the toxicity of drug compounds in the ciliated model organism T. pyriformis and minnows is essential to assess their environmental impact, ensure regulatory compliance, and better understand their pharmacological and toxicological properties. The control, CMNPD18759, and CMNPD23643 showed higher toxicity in T. pyriformis, while CMNPD8737 and CMNPD18759 showed higher toxicity in minnow.

3.6. MD Simulation

The top-scoring compound CMNPD18759 was simulated alongside the control, for 100 ns, to determine its binding stability. The default ensemble used for protein-ligand simulations in Desmond was adopted, i.e., the NPT (constant Number of particles, Pressure, and Temperature) ensemble. Thus, the temperature was maintained constant (300 K) throughout the simulation, allowing the system to exchange energy with a heat bath while keeping the particle number fixed. This is suitable for simulating biological macromolecules like proteins and bound ligands because the conditions encountered in experimental settings are mimicked, where the system is maintained at a constant temperature. Counterions like Na+ and Cl- were introduced to neutralize the net charge of the system. The simulation results suggest significant differences in the binding behavior between the control and CMNPD18759, where the control remained bound stably to the ADI throughout the simulation (Figure 6(a)), while CMNPD18759 underwent a conformational change after 60 ns (Figure 6(b)), which implies for larger scale simulations to attain equilibrium. Side chains showed maximum deviation, followed by heavy atoms and then backbone residues. For investigating conformational changes of CMNPD18759, snapshots of ligand-bound ADI were extracted from the trajectory at the first frame (corresponding to 0 ns), 1000th frame (corresponding to 16 ns), 3000th frame (corresponding to 48 ns), 4000th frame (corresponding to 64 ns), and 5000th frame (corresponding to 80 ns). These were then visualized in comparison to the first frame (reference). The ligand position was altered in due course of time (Supplementary Figure 1). RMSF plot of the control showed flexibility around residue 130 and 260 (reaching up to 5 Å) (Supplementary Figure 2A), while the B-factor remained less than 100 throughout the simulation. The N-terminal and C-terminal regions usually exhibit higher levels of fluctuation compared to other segments of the protein. In contrast, secondary structural elements like alpha-helices and beta-strands tend to display greater rigidity in comparison to the unstructured regions of the protein, resulting in relatively lower fluctuations when considering the loop regions. TRP348, ASN349, ASP350, ARG367, SER391, and GLY393 interacted with the control for more than 30% of the simulation time (Supplementary Figure 2B). ARG367 retained hydrogen bonding for the maximum time, amongst other binding residues (Supplementary Figure 2C). Hydrogen-bonding characteristics are of paramount significance in drug design due to their potent impact on drug specificity, metabolism, and absorption. RMSF of CMNPD18759 binding with ADI was more flexible (Supplementary Figure 3A). There were 24.62% helices and 18.87% strands (overall 43.49% secondary structure elements) in the control trajectory over 100 ns, while CMNPD18759 showed 25.29% helices, 18.31% strands, and 43.60% overall secondary structure elements in the simulation trajectory. No residue retained contact for more than 30% of the simulation time with CMNPD18759. It also formed more hydrophobic interactions than the control (Supplementary Figure 3B). The simulations were replicated, and similar results were observed for both complexes (Supplementary Figure 4).

The MM/PBSA values of the simulated complexes were recorded at different time points (0 ns, 50 ns, and 100 ns). In the case of the control-ADI complex, the values were found to be -30.48 kcal/mol at 0 ns, -32.13 kcal/mol at 50 ns, and -31.85 kcal/mol at 100 ns. For the CMNPD18759-ADI complex, the corresponding MM/PBSA values were -30.44 kcal/mol, -33.38 kcal/mol, and -33.38 kcal/mol at 0 ns, 50 ns, and 100 ns, respectively. These values represent the calculated binding-free energies for each complex at the specified time intervals during the MD simulation. The negative values indicate a favorable interaction, with lower energies as the simulation time proceeds, suggesting stronger binding affinity between the ligand and ADI. The consistency in these values for the CMNPD18759-ADI complex over time suggests a stable and strong protein-ligand affinity.

4. Discussion

The field of drug design has been revolutionized by the advent of pan-proteomics, which revolves around the entire coding DNA sequence repertoire of a given microbial species to gain a comprehensive understanding of genetic diversity [68]. This approach enables the identification of core and accessory protein fractions, which can inform the design of drugs that target specific pathways and virulence factors in the select fraction. This approach has previously been implemented for the identification of highly specific and effective therapeutic targets using computational approaches [69, 70]. Using this approach, a single drug target was predicted from the core proteome of 61 strains in the case of Helicobacter pylori, a gastric cancer-causing bacterium [71]. Fereshteh et al. used this approach to determine common drug targets in four gram-negative superbugs [72] while Uddin and Jamil used a similar approach to find drug targets in P. aeruginosa [73]. Basharat et al. used a similar strategy to mine targets in Yersinia pseudotuberculosis [39] and Shigella sp. [52, 74]. Here, we used this approach coupled with subtractive proteomics for B. garinii, to help prioritize candidate therapeutic targets. This bacterium is responsible for Lyme disease, a tick-borne infectious disease. The primary treatment for Lyme disease is antibiotics, such as doxycycline, amoxicillin, and cefuroxime [75]. However, the emergence of antibiotic-resistant strains is a growing concern, as it limits the effectiveness of existing antibiotics and poses a significant threat to public health [15, 76, 77]. To treat such resistant infection and replenish the drying antibiotic pipeline, we need to identify new therapeutic targets and antibiotics.

Marine-derived metabolites are produced by marine organisms such as algae, microbes, sponges, and corals and have shown promising therapeutic potential in the treatment of various diseases, including infectious diseases [78]. Researchers can more efficiently identify and validate new therapeutic targets and design or optimize novel antibiotics from marine sources, for the treatment of infectious diseases (like Lyme disease). This can be done using molecular docking-aided virtual screening. Herein, we scanned more than 4500 lead-like compounds from marine microbes by leveraging the power of in silico methods. This procedure was used to identify lead compounds that can bind and inhibit the function of the selected ADI target. Richards et al. have previously identified its role in Borrelia sp. growth, where it boosts intracellular L-arginine crucial for growth [79]. Inhibiting ADI has several advantages as a therapeutic strategy. First, ADI is essential for bacterial growth, so inhibiting ADI can be expected to be effective against a wide range of bacteria. Second, ADI is not present in humans, so inhibiting ADI is unlikely to have any harmful side effects.

In silico methods have played a crucial role in identifying and optimizing potential small molecule inhibitors from various sources, including marine-derived products, offering a rapid and cost-effective approach to drug development [8082]. Marine natural products have garnered significant attention in drug discovery due to their structural diversity and bioactive properties. The number of marine natural products identified to date is >40,000 (https://marinlit.rsc.org/; retrieved 30 April 2023). These compounds come from a diverse range of marine sources, including microorganisms, phytoplankton, various types of algae, sponges, cnidarians, bryozoans, molluscs, tunicates, echinoderms, mangroves, and other intertidal plants and microorganisms. Only 15 marine-derived natural compounds have been approved by FDA, till 2022 [83]. We need to identify and study more of these useful natural product scaffolds to fight off the menace of antibiotic resistance and replenish drying antibiotic pipelines. Five such compounds are shortlisted in this study (Figure 7). The prioritized inhibitors of ADI had relatively high Caco2 permeability and intestinal absorption which suggest that they may be able to pass through the intestinal lining and enter the bloodstream. The skin permeability values indicate that these metabolites may have difficulty penetrating the skin, which limits their potential use in topical applications. The negative VDss values indicate that these metabolites do not tend to cause QT syndrome. They also tend to accumulate in plasma rather than tissue, and this can affect their distribution and elimination, as metabolites that accumulate in tissues can lead to toxic effects. Although drugs are specifically designed to produce therapeutic effects in humans, they may also inadvertently result in unintended side effects in other organisms. This can occur when pharmaceuticals enter water bodies through wastewater or when animals are exposed to pharmaceutical residues in the environment [84]. Hence, values for toxicity were calculated for model organisms like T. pyriformis and minnow, to help identify potential ecological risks. Compounds showing adverse effects depict a warning sign for potential environmental issues. Accumulation of even mildly toxic compounds over time in the environment can cause detrimental issues like biomagnification. Hence, this aspect needs to be considered, and such pharmaceutical compounds should be appropriately treated from key outlets like hospital wastewater before being released into the environment [85].

In silico methods can help researchers identify and optimize small molecule inhibitors with high binding affinity, selectivity, and pharmacokinetic properties [86]. This approach accelerates the drug discovery process, reduces the cost and time required for traditional drug development, and improves the likelihood of success in clinical trials. However, the limitation of this approach is that molecules may behave differently in the cellular environment, and computational simulations cannot fully capture that. Therefore, compounds identified by this method should be validated through in vitro or in vivo experiments. Moreover, our study only focused on marine-derived microbial compounds, and future research could investigate other natural sources, such as terrestrial plants or fungi, for identifying potential therapeutic compounds. Furthermore, this study only examined a relatively small subset of marine microbes, and a more extensive exploration of marine microbial diversity could yield additional promising compounds.

5. Conclusion

There is a critical need for the development of novel antibiotics that are effective against the causative agents of Lyme disease, for instance, B. garinii, B. burgdorferi, and B. afzelii. Achieving this necessitates the identification of new therapeutic targets and the development of inhibitors that can selectively and effectively inhibit these specific targets. In this study, we focused on one bacterium, i.e., B. garinii, and used a virtual screening approach to identify marine compounds having inhibitory potential against B. garinii. We screened a library of over 4000 marine compounds against the ADI enzyme and identified several compounds with good binding and possible inhibition activity. We conclude that marine compounds are a rich source of novel leads for drug development against B. garinii. These compounds may be further evaluated in vitro and in vivo, but our findings provide a promising starting point for the development of new antibiotics that are urgently needed to combat Lyme disease.

Data Availability

All the data used or generated in this study is provided as an accession number or relevant information as tables in the manuscript.

Conflicts of Interest

The authors declare that they have no conflict of interest.

Authors’ Contributions

ZB, GA, and AAB conceived and designed the experiments. ZB, SS, and AKA performed the experiments and wrote the manuscript. GA and AAB curated the data and validated the findings. GA provided software support, and ZB supervised the project. All authors approved manuscript for publication in the current form.

Acknowledgments

The authors would like to thank the Deanship of Scientific Research at Shaqra University, Saudi Arabia, for supporting the work.

Supplementary Materials

Supplementary Figure 1: protein ligand complex position at various time stages. Water molecules and ions have not been shown to visualize clear placement of the ligand. (A) First frame at 0 ns, showing the ligand in CPK representation. The protein is shown in the ribbon (colored by the secondary structure). (B) 1000th frame at 16 ns, showing the ligand in CPK representation. The protein is shown in the ribbon (purple colored). (C) 3000th frame at 48 ns, showing the ligand in CPK representation. The protein is shown in the ribbon (gray colored). (D) 4000th frame at 64 ns, showing the ligand in CPK representation. The protein is shown in the ribbon (wheat colored). (E) 5000th frame at 80 ns, showing the ligand in CPK representation. The protein is shown in the ribbon (orange colored). Supplementary Figure 2: (A) RMSF of control compound. (B) Residue interaction of control compound, retained for more than 30% of simulation time. (C) Interaction fraction of residues during simulation. Hydrogen bonds are shown in green, hydrophobic in mauve, ionic in pink, and water bridges in blue. Supplementary Figure 3: (A) RMSF of CMNPD18759. (B) Interaction fraction of residues during simulation. Hydrogen bonds are shown in green, hydrophobic in mauve, ionic in pink, and water bridges in blue. Supplementary Figure 4: (A) RMSD plot of control and protein. R1 denotes the first replicate, and R2 denotes the second replicate. (B) RMSD plot of CMNPD18759 and the protein. R1 denotes the first replicate, and R2 denotes the second replicate. (C) RMSF of the control-protein complex. (D) RMSF of the CMNPD18759-protein complex. (Supplementary Materials)