A survey on computational methods in discovering protein inhibitors of SARS-CoV-2

Abstract The outbreak of acute respiratory disease in 2019, namely Coronavirus Disease-2019 (COVID-19), has become an unprecedented healthcare crisis. To mitigate the pandemic, many collective and multidisciplinary efforts have been made to facilitate the rapid discovery of protein inhibitors or drugs against COVID-19. Although many computational methods to predict protein inhibitors have been developed [1–5], few systematic reviews of these methods have been published. Here, we provide a comprehensive overview of the existing methods to discover potential inhibitors of the COVID-19 virus, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). First, we briefly categorize and describe computational approaches by the basic algorithms involved. Then we review the related biological datasets used in such predictions. Finally, we discuss in depth the current knowledge on SARS-CoV-2 inhibitors, with the latest findings and developments of computational methods in uncovering protein inhibitors against COVID-19.


Introduction
Since the first case of the new coronavirus disease, Coronavirus Disease-2019 (COVID-19), was reported in Hubei Province, China in December 2019, approximately 18 months have passed and the local outbreak has turned into a global pandemic. As of 8 June 2021, a total of about 174 million people had been infected by COVID-19, including over 3 870 000 deaths worldwide [6]. The pandemic has had devastating consequences not only for human lives but also for the global economy, including more than 8.5 trillion US dollars lost in 2020 and 2021 [7,8]. Therefore, there is an urgent need to control the pandemic by accelerating the development or production of effective drugs against severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). According to previous studies [9,10], SARS-CoV-2 is a single-stranded enveloped RNA virus with a symmetrical nucleocapsid. The viral genome of SARS-CoV-2 is highly similar to those of SARS-CoV and MERS-CoV [11], whose outbreaks happened within the past two decades in China and Saudi Arabia, respectively. Hence, the drugs or inhibitors designed for SARS-CoV and MERS-CoV were considered for use against SARS-CoV-2 as well. For example, SARS-CoV enters the target cells through the structural spike (S) protein by binding to the angiotensin-converting enzyme 2 (ACE2) receptor [10]. The conservation of the spike protein in SARS-CoV-2 suggests that the same interaction between the spike protein and the ACE2 receptor is retained during the process of infection. In addition, other molecules can be potential targets because they play critical roles in viral genome replication and gene transcription, e.g. RNA-dependent RNA polymerase (RdRp), or in the cleavage and activation of the spike protein for host entry and subsequent genome replication, e.g. type 2 transmembrane serine protease (TMPRSS2). These targets are similarly conserved among SARS-CoV, MERS-CoV and SARS-CoV-2.
Regardless of their unknown side effects, effective vaccines have been developed against SARS-CoV-2 infection, such as the BioNTech vaccine (from Germany), the Moderna vaccine (from the USA), the Sinopharm vaccine (from China) and the AstraZeneca vaccine (from Britain) [12]. Among them, messenger RNA (mRNA)-based vaccines are a relatively novel technology that remains to be further proven. Both available mRNA-based vaccines, Moderna and BioNTech, encode the spike protein of SARS-CoV-2, which binds to the ACE2 receptor. However, the SARS-CoV-2 virus has mutated frequently during its evolution and transmission [13,14], resulting in genetic variation in the population of circulating viral strains throughout the COVID-19 pandemic. As of June 2021, multiple major variants of SARS-CoV-2 had dominated the world, e.g. the Alpha variant (found in England), the Beta variant (found in South Africa) and the Delta variant (found in India) [15]. Variants of SARS-CoV-2 have different characteristics, leading to unknown efficacy of the existing vaccines against the mutated virus [16]. Hence, the design of inhibitors or drugs for specifically mutated SARS-CoV-2 is still necessary and challenging. Figure 1 presents the timeline of major events related to the SARS-CoV-2 outbreak and vaccine development from 2020 to 30 June 2021.
Compared with the traditional drug/inhibitor design process, which is time-consuming and costly, computational methods for drug/inhibitor design are highly efficient at predicting or identifying potential molecules for disease treatment [17]. Thus, computer-aided approaches have great potential for rapidly designing drugs or vaccines for mutated SARS-CoV-2. In the past months, several small molecules have been identified as potential inhibitors targeting SARS-CoV-2, although more experimental validation is needed on the molecular targets. Among the molecular targets of SARS-CoV-2, the main protease (M pro) or 3-chymotrypsin-like protease (3CL pro) [18], structural proteins (e.g. spike protein) and nonstructural proteins, such as RdRp and helicase [19], are highly conserved as well as essential to the viral life cycle. The structural information and functional roles of these major molecular targets of SARS-CoV-2 are summarized in Table S1 in the supplementary material.
We start the review with diverse computational methods for drug and inhibitor design, followed by detailed descriptions and discussions of the findings on multiple enzymes as valid targets for potential inhibitors to treat coronavirus diseases.

Insights to the Computer-Aided Drug Design
Computer-Aided Drug Design (CADD) has emerged as an efficient method to uncover potential lead compounds and aid the development of possible drugs for a wide range of diseases, based on the knowledge collected in huge compound libraries [20]. Typically, CADD comprises three types of approaches: structure-based drug design (SBDD), ligand-based drug design (LBDD) and virtual screening (VS). Furthermore, machine learning-based drug design (MLDD) has been widely applied with the rapid development of computer science [21,22]. Herein, we provide a brief summary of CADD approaches and related databases, as seen in Figure 2.

Structure-based drug design
With the development of chemical biology and structural biology technology, the structural information of more and more drug targets has been uncovered, providing essential elements for SBDD. Relying on the 3D structure of targets (proteins), obtained by techniques such as X-ray crystallography or NMR spectroscopy, the SBDD method predicts potential interactions by evaluating the strength of the binding force between small-molecule compounds and targets with known structures. Molecular docking, a molecular modeling technique and the most basic method in SBDD, allows an exhaustive search for the most suitable binding conformation of small molecules in the binding pocket of the protein. The framework of molecular docking is a search algorithm in which the ligand conformation is computed recursively until it converges to the lowest energy. It can effectively determine the ligand molecules that match the spatial and electrical characteristics of the active sites of the target receptors. At present, molecular docking plays an increasingly important role in SBDD [23]. Some common molecular docking software packages are listed in Table S2, including AutoDock [24], AutoDock Vina [25], AutoDockFR [26], ZDOCK [27], Glide [28], Flare [29], Induced Fit [30], MolDock [31] and M-ZDOCK [32].
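The search loop behind molecular docking can be illustrated with a toy example. The sketch below does not reproduce how AutoDock or any other listed tool works internally. It assumes a hypothetical pocket made of three point atoms, a simplified Lennard-Jones-style score, and a random-perturbation search that keeps a new ligand position only when the score improves. All names (`POCKET_ATOMS`, `toy_score`, `dock`) and coordinates are illustrative assumptions.

```python
import math
import random

# Hypothetical binding-pocket atoms (x, y, z); coordinates are illustrative only.
POCKET_ATOMS = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (2.0, 3.5, 0.0)]


def toy_score(pos, r_opt=2.5):
    """Simplified 12-6 Lennard-Jones-style energy of a one-atom ligand (lower is better)."""
    energy = 0.0
    for atom in POCKET_ATOMS:
        r = max(math.dist(pos, atom), 0.5)  # clamp very short distances
        ratio = r_opt / r
        energy += ratio ** 12 - 2.0 * ratio ** 6
    return energy


def dock(start, steps=2000, step_size=0.3, seed=0):
    """Random-perturbation search: accept a trial position only if it lowers the score."""
    rng = random.Random(seed)
    best_pos, best_energy = start, toy_score(start)
    for _ in range(steps):
        trial = tuple(c + rng.uniform(-step_size, step_size) for c in best_pos)
        trial_energy = toy_score(trial)
        if trial_energy < best_energy:
            best_pos, best_energy = trial, trial_energy
    return best_pos, best_energy


start = (6.0, 6.0, 3.0)
pose, energy = dock(start)
print(f"start energy {toy_score(start):.3f} -> docked energy {energy:.3f}")
```

Real docking programs search over ligand orientation and torsions as well as position, and use far richer scoring functions; the greedy acceptance rule here is only the simplest possible stand-in for that search.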

Ligand-based drug design
However, the 3D structures of some drug targets have not been resolved successfully. For such cases, another approach for direct drug design, LBDD, was developed. It takes advantage of existing compounds with known biological activities and establishes the relationship between query molecules and the bioactive molecules. In general, LBDD first converts the molecular structures in a constructed database into numerical descriptors, e.g. molecular fragments, physicochemical properties, topology and pharmacophores, then models the relationship between these descriptors and the molecular activities using specific design models. New drug molecules can be predicted or designed based on proper statistical methods, whereas their possible targets can be inferred from the bioactive molecules having high chemical similarity with the query. In addition to two common LBDD methods, quantitative structure-activity relationship (QSAR) and pharmacophore modeling [33], other popular LBDD-based software and their corresponding information are listed in Table S3, including McQSAR [34], SYBYL-X [35], TOPS-MODE [36], LigandScout [37], PLIP [38], FindSite-metal [39] and CORAL [40].
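As a minimal illustration of the ligand-based idea, the sketch below compares a query molecule with known bioactive molecules using the Tanimoto coefficient on binary fingerprints and proposes targets from the most similar references. The fingerprints (sets of "on" bit indices), the reference names and the target labels are hypothetical placeholders, not data from any of the cited tools or databases.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto similarity between two fingerprints stored as sets of 'on' bit indices."""
    if not fp_a and not fp_b:
        return 0.0
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


# Hypothetical reference set: name -> (fingerprint bits, annotated target).
KNOWN_ACTIVES = {
    "ref_A": ({1, 4, 7, 9, 12}, "Mpro"),
    "ref_B": ({2, 4, 8, 15}, "RdRp"),
    "ref_C": ({1, 4, 7, 10}, "Mpro"),
}


def infer_targets(query_fp, references, threshold=0.5):
    """Return (name, target, similarity) for references at or above the threshold, best first."""
    hits = []
    for name, (fp, target) in references.items():
        sim = tanimoto(query_fp, fp)
        if sim >= threshold:
            hits.append((name, target, sim))
    return sorted(hits, key=lambda hit: hit[2], reverse=True)


query = {1, 4, 7, 9}
for name, target, sim in infer_targets(query, KNOWN_ACTIVES):
    print(f"{name}: target={target}, Tanimoto={sim:.2f}")
```

In practice the fingerprints would come from a cheminformatics toolkit and the similarity threshold would be tuned per descriptor type; 0.5 is an arbitrary choice for this sketch.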

Virtual screening
VS is another technique that uses high-performance computing to analyze large databases of compounds and identify potential drug candidates that bind well to known structural targets [22]. There are two specific strategies for VS: receptor structure-based and ligand similarity-based. Despite the differences between strategies, the following four steps are essential: (i) preparing the target protein and compound database; (ii) docking the molecules in the molecular library with the target one by one; (iii) obtaining a reasonable binding mode according to the scores of the binding modes between small molecule and target, then evaluating the binding strength; (iv) purchasing the selected, pre-ranked compounds, followed by activity tests.
The whole VS process can be carried out on computers by indexing the structures of compound molecules in the database, instead of purchasing and testing the real compounds before the selection. VS is therefore more convenient, cost-efficient and quicker than experimental synthesis. Table S4 lists some common VS tools with brief introductions, e.g. PyRx [41], LiSiCA [42], MTiOpenScreen [43], iScreen [44], DockThor [45], GOLD [46] and FlexX-Scan [47].
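The four VS steps can be sketched as a small ranking pipeline. This is an illustrative outline under stated assumptions, not the workflow of any tool in Table S4: the compound library, its two toy features and `placeholder_score` (a stand-in for a real docking score, where more negative means stronger predicted binding) are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple


def virtual_screen(
    library: Dict[str, dict],
    score_fn: Callable[[dict], float],
    top_k: int = 3,
    cutoff: float = -6.0,
) -> List[Tuple[str, float]]:
    """Score every compound, keep those at or below the cutoff, return the best top_k."""
    # Step (ii): evaluate each library compound against the target one by one.
    scored = [(name, score_fn(compound)) for name, compound in library.items()]
    # Step (iii): keep plausible binders and rank them by predicted binding strength.
    passing = [(name, score) for name, score in scored if score <= cutoff]
    passing.sort(key=lambda item: item[1])
    # Step (iv): the shortlist that would be purchased and tested experimentally.
    return passing[:top_k]


# Step (i): a hypothetical prepared library with precomputed toy interaction counts.
LIBRARY = {
    "cmpd_1": {"h_bonds": 3, "hydrophobic_contacts": 5},
    "cmpd_2": {"h_bonds": 4, "hydrophobic_contacts": 6},
    "cmpd_3": {"h_bonds": 1, "hydrophobic_contacts": 2},
    "cmpd_4": {"h_bonds": 5, "hydrophobic_contacts": 6},
    "cmpd_5": {"h_bonds": 2, "hydrophobic_contacts": 8},
}


def placeholder_score(compound: dict) -> float:
    """Stand-in for a docking score; the weights are arbitrary assumptions."""
    return -1.0 * compound["h_bonds"] - 0.5 * compound["hydrophobic_contacts"]


shortlist = virtual_screen(LIBRARY, placeholder_score, top_k=2)
print(shortlist)
```

Passing the scoring function as a parameter keeps the ranking logic independent of whether the score comes from receptor-based docking or from ligand similarity.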

Machine learning-based drug design
Machine learning (ML) is an advanced data analysis method that improves a model automatically by learning from data and patterns [48]. ML technologies have been widely used in many fields, such as computer vision [49][50][51], natural language processing [52][53][54][55] and bioinformatics [56][57][58][59][60]. MLDD adopts various algorithms, such as recursive partitioning, support vector machine (SVM), k-nearest neighbors and neural networks [61][62][63], to investigate the activities of compounds against a target before clinical trials [64,65]. For example, Holden et al. [66] applied the SVM classification algorithm to structure-activity relationship analysis to predict the inhibition of dihydrofolate reductase by pyrimidines. Meng et al. [67] proposed persistent spectral-based ML models for drug design, which consist of the persistent spectral graph, persistent spectral simplicial complex and persistent spectral hypergraph based on spectral theory. The integration of SARS-CoV-2-related studies with modern ML algorithms has now become a hot topic in drug repurposing [68][69][70].
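To make the MLDD idea concrete, the sketch below implements one of the algorithms named above, k-nearest neighbors, to label a query compound as active or inactive from its descriptor vector. The training compounds, their two scaled descriptors and their labels are hypothetical; the example is not taken from any of the cited studies.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Majority vote among the k training compounds closest to the query in descriptor space."""
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical training set: (descriptor vector scaled to [0, 1], activity label).
TRAIN = [
    ((0.9, 0.8), "active"),
    ((0.8, 0.9), "active"),
    ((0.7, 0.7), "active"),
    ((0.2, 0.1), "inactive"),
    ((0.1, 0.3), "inactive"),
    ((0.3, 0.2), "inactive"),
]

print(knn_predict(TRAIN, (0.75, 0.85)))
print(knn_predict(TRAIN, (0.15, 0.20)))
```

With real data, descriptors would need consistent scaling, and the choice of k and distance metric would be validated on held-out compounds.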
The databases related to CADD provide a variety of knowledge about drug candidates, including physicochemical properties and molecular structures, in addition to diverse in vitro, in vivo and clinical data. For example, PubChem [71] is a database of chemical molecules collected by the National Center for Biotechnology Information (NCBI). PubChem now comprises three dynamically growing primary databases, including 111 million entries of compounds, 293 million entries of substances, and bioactivity results from 1.25 million high-throughput screening assays. Similar to PubChem, ChEMBL [72] is a publicly available database, containing binding, functional and ADMET information for drug-like bioactive compounds. Currently, the database consists of 5.4 million bioactivity measurements for more than 1 million compounds and 5200 protein targets, which were manually abstracted from the primary published literature. There are also databases dedicated to drug compounds. For example, DrugBank [73] combines drug data with information on drug targets and drug actions, and has been widely used in drug-target discovery, drug design, drug docking or screening, and drug interaction prediction. It collects approximately 4900 drug entries including 60% more FDA-approved small molecules and 10% more experimental biotech drugs. DrugBank has significantly improved the simplicity of its infrastructure and text query searches in later updates. The e-Drug3D [75] is a 3D chemical structure database for drugs that provides several collections of drugs and commercial drug fragments. It currently contains 1519 annotated 3D structures of 1305 different FDA-approved drugs with molecular weight less than 2000. Meanwhile, genetic and proteomic databases provide another resource for drug design or discovery. As of September 2018, BioGRID [84] has recorded 1 598 688 biological interactions manually annotated from 55 809 publications for 71 species. BioGRID also accumulates details for over 700 000 posttranslational modification sites. The recently updated BioGRID also annotates genome-wide CRISPR/Cas9-based screens with gene-phenotype and gene-gene relationships.
During the drug development phases, biological information on therapeutics or metabolism is important and valuable. For example, the HMDB [78], released in 2007, is now considered the standard metabolomic resource for human metabolic studies, including information about human metabolites, physiological concentrations, disease knowledge, chemistry associations, reference spectra and metabolic pathways. Side effects, also known as adverse drug events, are a crucial research point in drug repurposing. DrugMatrix [80] has been developed based on drug toxicities, consisting of the comprehensive results of thousands of highly controlled and standardized toxicological experiments. It focuses on toxicity research, with more than 200 compounds tested in vivo in rat tissues and 125 compounds tested in vitro in rat hepatocytes. There is no doubt that clinical data can provide high-quality information supporting drug design or discovery. PharmGKB [83] is an open-access database with clinically relevant information, collecting approved drug labels, gene-drug interactions and relationships between genotype and phenotype. The corresponding detailed information can be found in Table S5.

CADD against SARS-CoV-2: targeting M pro
The main protease (M pro, also known as 3CL pro) is recognized as a key enzyme that plays a dominant role in mediating viral transcription and replication [85]. Since the binding pocket of this enzyme is highly conserved among all coronaviruses, like SARS-CoV, MERS-CoV and HCV, antiviral drugs targeting M pro may be effective against SARS-CoV-2 as well [86]. Indeed, many recent studies have employed CADD to discover anti-SARS-CoV-2 agents against M pro by different strategies, e.g. structure-based, ligand-based, VS or ML-based approaches (Figure 3).
For example, structure-based docking approaches have been used to predict inhibitory activity and help drug design against SARS-CoV-2 M pro [87]. Yu et al. [88] screened potential drugs by molecular docking to examine the effects of some common antiviral drugs like ribavirin, remdesivir and chloroquine, as well as honeysuckle (a traditional Chinese medicine), as shown in Figure 4. Importantly, they recognized that luteolin, the main flavonoid in honeysuckle (Figure 3), had a high binding affinity to the same sites of the main protease of SARS-CoV-2 as the control molecule. Tsuji [4] performed structural refinement and energy calculations in the presence of peptidomimetic α-ketoamide inhibitors (PDB ID: 6Y2G, shown in Figure 5A). In that study, 28 bioactive compounds, including CHEMBL3236740, CHEMBL1447944 and others, were identified as effective anti-SARS-CoV-2 drug candidates (Figure 3). Singh et al. [89] identified several compounds, glucogallin, mangiferin, N3, remdesivir and X77, which had stronger binding affinities with M pro. Furthermore, the results suggest that phlorizin had the lowest binding free energy toward M pro (Figure 4), followed by glucogallin and mangiferin.
However, long-range interactions have not been discussed as often as short-range interactions during the selection of candidate inhibitors. Sencanski et al. [90] used a protocol with both long-range and short-range interactions to select inhibitor candidates. They applied the informational spectrum method and molecular docking of small molecules to search the DrugBank database. Interestingly, 57 drugs were identified as potential SARS-CoV-2 M pro inhibitors. Additionally, Tinospora crispa (Figure 3) was recognized as one potential COVID-19 M pro inhibitor based on another independent molecular docking study [91].
To rapidly discover lead compounds for clinical treatments, Jin et al. [86] investigated a mechanism-based inhibitor (N3) by CADD and determined the crystal structure of SARS-CoV-2 M pro in complex with N3. Through a combination of structure-based virtual screening and high-throughput screening, they assayed over 10 000 compounds as inhibitor candidates of M pro. One of these compounds, named ebselen, also had potential antiviral ability in cell-based assays (Figures 3 and 4).
Some recent studies have shown the feasibility of employing VS in the design of inhibitors targeting M pro. For example, Abel et al. [103] developed a VS method with both ligand- and structure-based approaches. The proposed VS was performed on two natural product (NP) databases, Super Natural II [104] and Traditional Chinese Medicine [105]. Additionally, they used an integrated drug repurposing approach to identify potential inhibitors against SARS-CoV-2 M pro. Some compounds, like naldemedine, SN00017653 and pseudostellarin C, were identified as potential inhibitors for the first time (Figure 3). Lee et al. [1] identified potential inhibitors against COVID-19 from the Korea Chemical Bank drug repurposing (KCB-DR) database [106]. The results suggest ceftaroline fosamil (Figure 4) and the hepatitis C virus (HCV) protease inhibitor telaprevir as potential inhibitors against M pro.
Although some drugs, such as remdesivir, favipiravir or dexamethasone, are known to be beneficial for COVID-19 treatment, they have clinical limitations for different reasons. Hence, Nayak et al. [107] carried out VS of a variety of US-FDA-approved drugs using computer-aided tools. The US-FDA-approved drug structures were selected from DrugBank. Among them, arbutin, terbutaline, barnidipine, tipiracil and aprepitant were identified as potential hits. Moreover, tipiracil and aprepitant bound to M pro consistently, demonstrating potentially promising effects in pharmacologic treatments for COVID-19.
Structure-based VS is adopted to predict the best interaction between a ligand and a molecular target by a scoring function. For example, Kumar et al. [108] utilized structure-based VS to identify hit molecules binding with the highest affinity to M pro. The results indicated that hydrogen bonding and hydrophobic interactions are the major contributing factors in the binding pocket of COVID-19 M pro. In addition, Hage-Melim et al. [109] used VS approaches based on the structure of the enzyme and two compound libraries to identify apixaban as a potential drug for future treatment of COVID-19. Fischer et al. [110] used shape screening and two docking protocols relevant for pharmacokinetics to narrow down commercially available compounds, leading to the natural compounds (−)-taxifolin and rhamnetin as potential inhibitors of M pro (Figure 3). These new findings may bring insight into our further understanding and discovery of inhibitor candidates targeting M pro [5,111–113].
The reliability and accuracy of the ligand-based CADD method have been proven [114]. Han et al. [2] utilized ligand-protein docking and molecular dynamics simulation in an ab initio study to explore the binding mechanism or inhibitory ability by comparing two types of drugs: (i) clinically approved drugs including chloroquine, hydroxychloroquine, remdesivir, ritonavir, beclabuvir, indinavir and favipiravir, and (ii) a designed α-ketoamide inhibitor (13b) (Figure 3). The results suggested chloroquine had the strongest binding affinity with M pro/3CL pro. Meanwhile, inhibitor 13b has a higher research priority for treating SARS-CoV-2 owing to its improved inhibition efficiency. Eleftheriou et al. [115] noted that anticoagulant therapy has been proposed for the treatment of severe pneumonia caused by SARS-CoV-2 and, in particular, that DPP-4 inhibitors may be more effective for SARS-CoV-2-infected diabetic patients.
The QSAR model, the classical ligand-based CADD method, was also utilized in recent inhibitor design studies. For example, Ishola et al. [116] selected SARS coronavirus 3C-like protease (3CL pro) inhibitor data from the ChEMBL database. They constructed a QSAR model using the data with high correlations, which made the model statistically significant. The analysis revealed that the 3CL pro-compound 21, 3CL pro-compound 22 and 3CL pro-compound 40 complexes (Figure 3) were more stable than the baseline complex (3CL pro-X77). Alves et al. [3] developed QSAR models of these inhibitors, then applied these models in VS of drugs in DrugBank by conducting similarity searching and molecular docking in parallel. As a result, 42 compounds were identified as consensus computational hits. They were reported coincidentally in subsequent experimental screening studies (https://opendata.ncats.nih.gov/covid19/). Kumar et al. [117] developed a 2D-QSAR model based on multiple linear regression (MLR) with 3CL pro inhibitors. The proposed model clearly exhibited the structural features that enhanced the inhibitory activity against the 3CL pro enzyme. Additionally, the most and least active molecules were investigated using molecular docking tools to explore the molecular interactions involved in binding. Gogoi et al. [118] screened a library of 44 citrus flavonoids using molecular docking. The nontoxic compounds were further investigated with molecular dynamics simulation, and their activity (IC50 value) was predicted with the 3D-QSAR model. They suggested taxifolin (Figure 3) as a potential inhibitor against SARS-CoV-2 M pro, which can be further analyzed by subsequent experiments for treatment of COVID-19. More literature is available on ligand-based CADD for designing inhibitor candidates targeting M pro [119,120].
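An MLR-based QSAR model of the kind mentioned above fits activity as a linear function of molecular descriptors. The sketch below is a generic ordinary least-squares fit written from scratch, not the model of Kumar et al. or any other cited study. The two descriptors (labeled here as logP and hydrogen-bond donor count) and the pIC50 values are synthetic, generated exactly from the assumed relation pIC50 = 4.0 + 0.5·logP − 0.25·HBD so that the fit can be checked by eye.

```python
def fit_mlr(descriptors, activities):
    """Ordinary least squares via the normal equations (X^T X) b = X^T y."""
    rows = [[1.0] + [float(v) for v in x] for x in descriptors]  # prepend intercept
    n = len(rows[0])
    # Augmented matrix [X^T X | X^T y].
    aug = []
    for i in range(n):
        lhs = [sum(r[i] * r[j] for r in rows) for j in range(n)]
        rhs = sum(r[i] * y for r, y in zip(rows, activities))
        aug.append(lhs + [rhs])
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("descriptors are collinear; the model cannot be fitted")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][n] for i in range(n)]  # [intercept, coef_1, coef_2, ...]


def predict(coeffs, x):
    """Predicted activity for one descriptor vector."""
    return coeffs[0] + sum(c * v for c, v in zip(coeffs[1:], x))


# Synthetic training data: (logP, HBD) -> pIC50, noise-free by construction.
X = [(1.0, 2), (2.0, 1), (3.0, 4), (4.0, 3), (2.5, 0)]
y = [4.0, 4.75, 4.5, 5.25, 5.25]

coeffs = fit_mlr(X, y)
print([round(c, 3) for c in coeffs])
print(round(predict(coeffs, (2.0, 2)), 3))
```

The fitted coefficients indicate which descriptors raise or lower predicted activity, which is how such a model exposes activity-enhancing structural features. A real study would add noise-aware validation such as cross-validated statistics.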
As ML techniques can be applied to predictive scenarios based on previous knowledge and well-known patterns, some recent studies have contributed to the development of ML-based CADD methods targeting M pro. For example, Huang et al. [121] developed a biological activity-based modeling (BABM) approach, by which compound activity can be predicted for a new target or other assays by using profiles across multiple well-defined assays. This model identified 311 compounds against SARS-CoV-2, 32% of which showed antiviral activity in a cell culture live virus assay. More importantly, the most potent compounds showed half-maximal inhibitory concentrations at the nanomolar level. Nayarisseri et al. [122] proposed a shape-based ML method, which generates the 3D-shaped pharmacophoric features of the seed compound. Furthermore, molecular docking was performed with optimized potential for liquid simulations (OPLS) algorithms to recognize high-affinity compounds targeting M pro. The shape-based ML reported that remdesivir, valrubicin, aprepitant and fulvestrant were the best therapeutic drugs (Figure 3) because they had the highest affinities with the target protein. They also found a novel compound 'nCorv-EMBS', which is not included in public chemical databases (PubChem, ZINC or ChEMBL) so far. The results of toxicity analysis suggested that nCorv-EMBS is worth further research as a main protease inhibitor for COVID-19 [122].
Inspired by ensemble learning, Gimeno et al. [123] first applied molecular docking against the structure of M pro using three popular tools: Glide [28], FRED [124] and AutoDock Vina [25]. Then, they proposed a hybrid ensemble approach to generate hypothetical binding modes relying on three scoring functions. Seven possible SARS-CoV-2 M pro inhibitors were predicted, including perampanel, carprofen, celecoxib, alprazolam, trovafloxacin, sarafloxacin and ethyl biscoumacetate (Figure 3). Battisti et al. [125] also proposed an inhibitor prediction framework, which not only combines molecular dynamics simulations with molecular docking but also exploits the feature information of pharmacophore modeling and the flexibility of molecular dynamics simulations simultaneously. The proposed approach identified 10 compounds with high coronavirus inhibition potential.
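One simple way to combine several scoring functions is rank averaging, sketched below. This is a generic consensus scheme chosen for illustration and is not claimed to be the procedure used by Gimeno et al. [123]. The three score tables stand for three hypothetical docking tools whose scores are on different scales (lower means better in each), and all compound names and values are made up.

```python
def rank_positions(scores):
    """Map compound -> rank (1 = best) for one scoring function; lower score is better."""
    ordered = sorted(scores, key=scores.get)
    return {name: position + 1 for position, name in enumerate(ordered)}


def consensus_rank(score_tables):
    """Average each compound's rank across all tools and sort by mean rank (best first)."""
    ranks = [rank_positions(table) for table in score_tables]
    compounds = set.intersection(*(set(table) for table in score_tables))
    mean_rank = {c: sum(r[c] for r in ranks) / len(ranks) for c in compounds}
    # Ties are broken alphabetically to keep the output deterministic.
    return sorted(mean_rank.items(), key=lambda item: (item[1], item[0]))


# Hypothetical scores from three tools with different score scales.
tool_a = {"cmpd_1": -8.2, "cmpd_2": -7.5, "cmpd_3": -6.1, "cmpd_4": -9.0}
tool_b = {"cmpd_1": -55.0, "cmpd_2": -61.0, "cmpd_3": -40.0, "cmpd_4": -58.0}
tool_c = {"cmpd_1": -7.9, "cmpd_2": -7.0, "cmpd_3": -7.2, "cmpd_4": -8.5}

for compound, mean in consensus_rank([tool_a, tool_b, tool_c]):
    print(f"{compound}: mean rank {mean:.2f}")
```

Working with ranks rather than raw scores avoids having to normalize scores that different tools report in different units.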
In addition to traditional data-driven ML modeling, some studies used deep learning-based approaches to predict potential inhibitors of SARS-CoV-2 M pro [126]. For example, Park et al. [127] identified some potential drugs against SARS-CoV-2 using a pretrained deep learning drug-target interaction model called Molecule Transformer-Drug Target Interaction. They found that atazanavir, remdesivir, efavirenz, ritonavir and dolutegravir were the chemical compounds showing inhibitory potency against SARS-CoV-2 3CL pro. Interestingly, they found that lopinavir, ritonavir and darunavir, which were designed to target viral proteinases, also bound to the replication complex components of SARS-CoV-2. Bung et al. [128] employed deep generative and predictive models to discover small molecules inhibiting M pro. Transfer learning and reinforcement learning were applied to optimize the proposed deep learning model, which learned the chemical space around protease inhibitors. Other features, including multiple physicochemical property filters and VS scores, were used for the final screening as well. Finally, they proposed 33 potential compounds for further synthesis and testing against SARS-CoV-2. Based on the structural model, Zhang et al. [129] performed a deep learning-based VS method to rank and identify protein-ligand interactions. The summary of drugs or inhibitors targeting SARS-CoV-2 M pro/3CL pro can be found in Table S6.

CADD against SARS-CoV-2: targeting the structure protein
SARS-CoV-2 contains four structural proteins, namely membrane protein (M), spike protein (S), envelope protein (E) and nucleocapsid protein (N), in addition to 16 nonstructural proteins (NSP1-16, as seen in the next section) [130]. Among them, the S protein mediates the entry of coronaviruses into host cells, which makes it an attractive antiviral target for COVID-19 treatment.
Computational approaches have been developed to predict potential SARS-CoV-2 inhibitors targeting the S protein. Previous studies demonstrated that ACE2, which is bound by the spike protein of SARS-CoV-2, is the key factor for SARS-CoV-2 to enter host cells (Figure 5B). Hence, ACE2 becomes another common target of drug intervention. Wen et al. [131] investigated existing drugs according to their abilities to block the binding of the S protein to ACE2. Based on the pathogenesis of SARS-CoV-2 from the perspective of S protein and ACE2 binding, they found some substances, including peptide P6, griffithsin, EK1 and extracts from traditional Chinese medicine, which fought against SARS-CoV-2 by binding the ACE2 receptor or the S protein, or by inhibiting the host and virus. Faria et al. [132] also focused on molecules that can inhibit the interaction between the S protein and human ACE2. They discovered some molecules at the interaction sites: four molecules at Tyr-491(Spike)-Glu-37(ACE2) and one at Gly-488(Spike)-Lys-353(ACE2). Furthermore, they found that molecule 1629 and molecule 2542 had significant inhibitory effects on the Gly488-Lys353 and Tyr491-Glu37 sites, respectively, suggesting further laboratory tests on the combination of these molecules, which could act on the two interaction sites simultaneously. Additionally, the human furin protease, which cleaves the S1-S2 domains involved in entering the host cell, may become a third target. Cubuk et al. [133] docked five drug molecules, favipiravir, hydroxychloroquine, remdesivir, lopinavir and ritonavir, not only on the S protein and main protease but also on the human furin protease. The results of molecular docking revealed that the human furin protease can be a potential target against SARS-CoV-2, whereas remdesivir, a nucleic acid derivative, can be used as a template for designing novel furin protease inhibitors to fight against the disease. Taking advantage of DrugBank and PubChem, Unni et al. [134] identified bisoxatin (DB09219), a laxative drug, as a promising repurposable drug for developing a new chemical compound to inhibit SARS-CoV-2 entry into the host, even though bisoxatin was used to treat constipation and preparation. GR 127935 hydrochloride hydrate, GNF-5, RS504393 and eptifibatide acetate were found by Tomar et al. [135] to bind to the viral binding motifs of the ACE2 receptor. Table S6 presents the summary of drugs or inhibitors targeting the SARS-CoV-2 S protein and ACE2.
Many computational approaches have also focused on potential SARS-CoV-2 inhibitors targeting the M, N and E proteins, which are believed to be useful for further structure-based VS and other CADD-based drug and vaccine design. Dong et al. [136] searched for homologous templates of all structural proteins of SARS-CoV-2, including the S, E and N proteins. Banerjee et al. [137] identified small-molecule inhibitors targeting the M and E proteins of SARS-CoV-2 by integrating docking and simulation methods. They investigated compounds from an Indian medicinal plant source (Azadirachta indica or Neem) and found 70 compounds against these two proteins. With molecular dynamics simulations, a few common compounds binding to both M and E proteins were recognized as potentially inhibiting their functions. Table S6 lists drugs or inhibitors targeting SARS-CoV-2 proteins with essential information.

CADD against SARS-CoV-2: targeting the nonstructure protein
SARS-CoV-2 nonstructure proteins can be potential targets to inhibit SARS-CoV-2 as well. For example, RdRp, as shown in Figure 5C, plays a crucial role in the viral life cycle of coronaviruses, particularly the replication of the viral genome, with the assistance of the nonstructure proteins NSP7 and NSP8 in a polymerase complex. It is not surprising that RdRp has been recognized as an important coronavirus target for drug design. Since SARS-CoV-2 has high similarity with other SARS viruses, target-based VS and molecular docking of antiviral molecules against SARS suggested that the antiviral galidesivir had promise against SARS-CoV-2 as well [138]. Quinupristin was identified as one candidate that can bind in the RNA tunnel of RdRp and block the path and access on both sides, with the potential to prevent viral replication and RNA synthesis [139]. Wu et al. [140] systematically compared the proteins encoded by SARS-CoV-2 genes with those from other coronaviruses, then predicted and built 19 structures with homology modeling. Based on the ZINC drug database and their own NPs database, they found 78 antiviral drugs for SARS-CoV-2, which are currently on the market or undergoing clinical trials.
Helicase is another macromolecular viral replication enzyme, responsible for unwinding double-stranded DNA or RNA into two single-stranded nucleic acids during the coronavirus life cycle (Figure 5D). Some studies have also suggested drugs and NPs as potential SARS-CoV-2 helicase inhibitors. For example, one study suggests that vapreotide and atazanavir, two approved drugs for treating AIDS-related diarrhea and HIV infection, respectively, significantly interrupt the activities of the SARS-CoV-2 helicase [141]. Mirza et al. [142] proposed an integrative VS and molecular dynamics simulation approach targeting the main protease, RdRp and helicase, and the identified compounds warrant in vitro testing to evaluate their efficacy.
Iftikhar et al. [143] focused on small molecules that specifically bind to three essential proteins (RdRp, 3CL pro and helicase). They found three FDA-approved drugs binding to 3CL pro, one drug-like molecule binding to RdRp, and two drug-like molecules specifically interacting with helicase.
The poly-ADP-ribose polymerase 1 (PARP1, shown in Figure 5E) is also critical for viral replication [144][145][146]. Ge et al. [147] developed a data-driven drug repositioning framework combining ML and statistical analysis approaches to explore potential drug candidates against SARS-CoV-2, by integrating large-scale data including knowledge graphs and transcriptome data from the public domain and the literature. Based on the model, CVL218, a PARP1 inhibitor, was recognized as a repurposed therapeutic agent for COVID-19.
The host serine protease TMPRSS2 has a pivotal role in the viral entry of SARS-CoV-2 (Figure 5F). Singh et al. [89] uncovered strong binding affinities between TMPRSS2 and the compounds glucogallin, mangiferin, N3, remdesivir and X77. Among them, mangiferin showed the lowest binding free energy, followed by phlorizin and glucogallin.

Conclusion
Since the outbreak of COVID-19, people around the world have put much effort into investigating vaccines and drugs against SARS-CoV-2. CADD and ML techniques, which are considered feasible options to speed up drug design and discovery, have been employed in many studies to target SARS-CoV-2 macromolecules. Our paper reviewed the theory and applications of these approaches, along with the specific databases used in these studies. We explored the new findings on inhibitors as potential interventions and treatments for COVID-19.
However, considering the variations of SARS-CoV-2, we still face big challenges in ensuring that developed vaccines and drugs remain effective for different viral strains with specific mutations. It is known that structural variations on or even close to the binding sites could dramatically impact ligand binding properties. Gossen et al. [152] redefined the druggability of the proteins as an integrated chemical space generated by the multiple conformations that binding sites adopt upon ligand binding. This process revealed the unique blueprint of SARS-CoV-2 M pro, leading to a definition of a pharmacophore based on the specific structure, which provides a strong foundation for rational drug design for SARS-CoV-2 M pro. Ugurel et al. [153] analyzed 3458 SARS-CoV-2 genome sequences isolated from 58 countries. They found that the incidence of the C17747T and A17858G mutations on helicase (NSP13) was significantly higher than that of others. However, four drugs, including cangrelor, fludarabine, folic acid and polydatin, interrupted both the wild-type and mutant SARS-CoV-2 helicase, suggesting that they can be the most potent drugs. We expect that our review can bring insight into identifying antiviral inhibitors and potential drug candidates against diverse SARS-CoV-2 variants.

Key Points
• Discovering potential inhibitors or drugs of SARS-CoV-2 is critical in mitigating the pandemic impact of COVID-19.
• We give a brief overview of existing computer-aided drug design methods and biological databases used in predicting drugs or inhibitors.
• We provide a systematic review of current knowledge and the latest findings on using computational methods to discover protein inhibitors of SARS-CoV-2.

Supplementary data
Supplementary data are available online at https://academic.oup.com/bib.