SARS-CoV-2 and COVID-19 epidemiology and immunopathology

Three highly pathogenic betacoronaviruses (CoVs) with high mortality have emerged during the past two decades as a result of zoonotic transmission. The first two, severe acute respiratory syndrome coronavirus (SARS-CoV-1) and Middle East respiratory syndrome coronavirus (MERS-CoV), emerged in 2002 and 2012, respectively (Zhu et al. 2020). SARS-CoV-2 is a new betacoronavirus with a positive-sense single-stranded RNA genome that emerged at the end of 2019 in the Hubei province of China and causes coronavirus disease 2019 (COVID-19) (Zhu et al. 2020; Mackenzie et al. 2020). The incidence of COVID-19 continues to increase, and as of August 21, 2022, 600,761,268 cases, including 6,471,618 deaths, had been reported globally (COVID Live - Coronavirus Statistics - Worldometer 2022).

The genome organization of SARS-CoV-2 is similar to that of SARS-CoV-1 and other coronaviruses (Chen et al. 2020). Open reading frames (ORFs) are found in all betacoronaviruses, including SARS-CoV-2. These comprise ORF1ab, which encodes the vast majority of enzymatic proteins, and the ORFs encoding the surface spike glycoprotein (S), envelope protein (E), membrane protein (M), and nucleocapsid protein (N). Other proteins consist of nonstructural proteins encoded by ORF3a, ORF6a, ORF7, ORF8, and ORF10a. CoVs have a far lower rate of nucleotide changes than other RNA viruses because of an enzyme that corrects replication errors (Denison et al. 2011). Even so, multiple variants have emerged around the world as the SARS-CoV-2 pandemic continues (Makoni 2021; Korber et al. 2020; Hodcroft et al. 2020; Tang et al. 2021a, b; Quarleri et al. 2021). The World Health Organization (WHO) classifies these into two types: variants of concern (VOC) and variants of interest (VOI).

SARS-CoV-2 is transmitted mostly from person to person via respiratory droplets released by coughing, sneezing, or even talking, and sometimes indirectly through contaminated surfaces (Liu et al. 2020). Fever, cough, shortness of breath, and other breathing difficulties are the most common clinical symptoms in COVID-19 patients, alongside nonspecific symptoms such as headache, fatigue, and muscle pain, and digestive problems such as diarrhea and vomiting (Alimohamadi et al. 2020).

Age (> 60 years), gender, smoking history, prior pneumonia, and major concomitant conditions (such as immunocompromised states, chronic cardiovascular, cerebrovascular, pulmonary, and kidney disease, diabetes mellitus, fulminant inflammation, lactic acid accumulation, and thrombotic events) are linked with COVID-19 mortality (Goodman et al. 2019; Henkens et al. 2022; Lacedonia et al. 2021). Despite the global effort to characterize SARS-CoV-2 and COVID-19, including clinical manifestations, epidemiology, mortality and morbidity, and diagnostics, numerous gaps remain in our understanding of this disease, and many aspects of the host immune response to SARS-CoV-2 remain unknown.

The pathophysiology of SARS-CoV-2-induced pulmonary illness is substantially similar to that of SARS-CoV-1 and MERS-CoV (Zhu et al. 2020; Zhou et al. 2021). Damage to infected lung cells leads to hypoxemia and plasma exudate in alveolar spaces (Bösmüller et al. 2021). Histopathological examinations have revealed hyaline membranes, mononuclear and macrophage infiltration of air spaces, and thickening of the alveolar wall (Bösmüller et al. 2021; Mason 2020; Pandey and Agarwal 2020). SARS-CoV-2 also disrupts normal immune responses, leading to an impaired immune system and uncontrolled inflammatory responses in severe and critical patients with COVID-19 (Tay et al. 2020; Rubio-Casillas et al. 2022). These individuals show lymphopenia, lymphocyte activation and dysfunction, granulocyte and monocyte abnormalities, elevated cytokine levels, and a rise in immunoglobulin G (IgG) and total antibodies (Gibson et al. 2020; Lippi and Plebani 2020; Yang et al. 2020a). Merad et al. (2022) discussed two broad hypotheses to explain severe COVID-19 pathophysiology: an inability to mount a timely antiviral response and an inability to control SARS-CoV-2–driven inflammatory responses. Hypercoagulation, endothelial damage, and arterial and venous embolism are very commonly seen in severe COVID-19 (Livanos et al. 2021).

To identify viral infection, the innate immune system employs a variety of pattern recognition receptors (PRRs), while alveolar macrophages patrol the lumen of the respiratory tract and serve as the first line of defense (Madden and Diamond 2022). Coronaviruses are detected by cell types that express the endosomal Toll-like receptors (TLRs) TLR3 and TLR8 and also by specialized immune cells such as plasmacytoid dendritic cells (pDCs) through TLR7 (Merad et al. 2022). Cytosolic RNA sensors within infected cells, the RIG-I–like receptors (RLRs) such as RIG-I and MDA5, recognize dsRNA intermediates during viral replication. Signaling downstream of TLRs and RLRs promotes IRF3/IRF7-dependent transcription of type I and type III interferons (IFNs), as well as NF-κB-dependent pro-inflammatory cytokines and chemokines (Merad et al. 2022). A small number of severe patients have been found to have “loss of function” variants in loci that control TLR3- and IRF7-dependent type I IFN immunity (Zhang et al. 2020). By expressing a number of viral proteins that block these pathways, SARS-CoV-2 is capable of evading innate recognition, signaling, IFN induction, and IFN-stimulated genes (ISGs) (Kasuga et al. 2021). Patients with genetic mutations or autoantibodies that interfere with IFN pathways suffer from life-threatening COVID-19 disease. pDCs, effector cells such as NK cells and alveolar macrophages, and adaptive effector T cells that mediate viral clearance are severely depleted in patients with severe COVID-19 (Sanchez-Cerrillo et al. 2020; Lucas et al. 2020; Wilk et al. 2020, 2021; Zhou et al. 2020; Mathew et al. 2020). In contrast to the early IFN-I- and IFN-III-mediated antiviral defense, which is hindered, pro-inflammatory cytokines and chemokines are dramatically elevated (Israelow et al. 2020).

To better understand mechanisms of protection against the pathogen, to create immune-based therapeutics, and to design and administer vaccines, it is critical to understand the impact of the acquired immune response in the form of antibody (Ab) and cell-mediated immunity. Antibodies directed at distinct epitopes, mainly in the receptor-binding domain (RBD) of the S protein (primarily sites 1a and 1b), block binding to the ACE2 receptor and can promote effector functions by binding to complement and Fc receptors (Zohar and Alter 2020). Ab responses of varying magnitudes are observed in infected, preimmune, and vaccinated people, most likely reflecting differences in antigenic load and exposure (Piccoli et al. 2020; Lucas et al. 2021). Cell-mediated immunity protects an organism through the activation of antigen-specific cytotoxic T cells that induce apoptosis in cells displaying foreign antigens, such as virus-infected cells, cells with intracellular bacteria, and cancer cells displaying tumor antigens, and by promoting the production of antibodies through its effect on B cells. T cell responses to SARS-CoV-2 are crucial for recognizing and killing infected cells (Weiskopf et al. 2020). One of the constraints imposed by severe COVID-19 is extensive T cell lymphopenia, which impairs the host’s ability to develop a powerful immunological response. T cells from patients with severe COVID-19 have been shown to have antigen-dependent phenotypic differences associated with differential cytokine secretion (Altmann and Boyton 2020).

HLA overview

The genes encoding the human leukocyte antigen (HLA) molecules have been a primary focus in genetic association studies across a wide range of infectious and immune-mediated diseases because of their critical role in antigen presentation to T cells. The human major histocompatibility complex (MHC) is termed HLA because its antigens were first found using alloantibodies against leukocytes (Choo 2007). HLA molecules are well known as transplantation antigens (Crux and Elahi 2017). The human MHC maps to 6p21 and covers 3600 kilobases of DNA (Tripodis et al. 1998). All the loci of the HLA complex are grouped into four categories of gene status: protein coding, gene candidates, non-coding RNAs, and pseudogenes (Shiina et al. 2009). Human leukocyte antigens are of three main types. Class I HLA antigens include the HLA-A, HLA-B, and HLA-C molecules and are found on the surface of all nucleated cells; class II antigens, encoded by the HLA-DR, HLA-DQ, and HLA-DP loci, are expressed on the surface of professional antigen-presenting cells; and class III contains genes for proteins with diverse immune functions (The MHC sequencing consortium 1999). The HLA-A, HLA-B, and HLA-C class I and HLA-DP, HLA-DQ, and HLA-DR class II molecules exhibit extraordinary diversity. More than 18,000 class I and 7000 class II alleles have been defined (Choo 2007). Most of their sequence variability lies in the HLA class I and II antigen-binding sites, indicating diverse and specialized interactions with T cell receptors (TCRs) and killer cell immunoglobulin-like receptors (KIR). HLA-E, HLA-F, and HLA-G, the non-classical HLA class I genes, have limited polymorphism and a tissue-restricted expression pattern (Buhler et al. 2016). All the non-classical class I genes encode ligands recognized by NK cell receptors. The HLA class III region lies between the class I and II regions and contains genes encoding complement components, heat-shock proteins, and tumor necrosis factor α (TNFα) (Djaoud and Parham 2020).
Since HLA genes are closely linked, the entire MHC is inherited as an HLA haplotype in a Mendelian fashion from each parent (Shiina et al. 2009). Possible random combinations of antigens from different HLA loci on an HLA haplotype are enormous, but certain HLA haplotypes are found more frequently in some populations than expected by chance because of linkage disequilibrium (Choo 2007; Meyer et al. 2018). For example, HLA-A1, B8, DR17 is the most common HLA haplotype among individuals with European ancestry, with a frequency of 5% (Abdou et al. 2010). Amino acid variations in several regions change the fine shape of the groove and thus alter the peptide-binding specificity of HLA molecules (Abdou et al. 2010). HLA antigens vary widely by ancestry. This variation of HLA polymorphism may have evolved under distinct geographical pressures. This may be connected to the role of HLA molecules in presenting infectious pathogens in diverse regions (Markov and Pybus 2015).
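Linkage disequilibrium, as invoked above, can be quantified from allele and haplotype frequencies. The following minimal sketch computes the standard coefficients D, D′, and r² for one allele at each of two loci. The function name and the frequencies in the usage line are hypothetical illustration values and are not taken from any of the cited studies.

```python
def linkage_disequilibrium(p_ab, p_a, p_b):
    """Return (D, D_prime, r_squared) for allele A at locus 1 and allele B at locus 2.

    p_ab : observed frequency of the haplotype carrying both A and B
    p_a  : frequency of allele A
    p_b  : frequency of allele B
    """
    if not (0 < p_a < 1 and 0 < p_b < 1):
        raise ValueError("allele frequencies must lie strictly between 0 and 1")
    # D: excess of the observed haplotype frequency over the product
    # expected if the two alleles were inherited independently.
    d = p_ab - p_a * p_b
    # D' rescales D by its maximum attainable magnitude given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max if d_max > 0 else 0.0
    # r^2: squared correlation between the two alleles.
    r_squared = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r_squared


# Hypothetical frequencies, chosen only to illustrate the calculation.
d, d_prime, r_squared = linkage_disequilibrium(p_ab=0.05, p_a=0.15, p_b=0.10)
```

A haplotype observed more often than the product of its allele frequencies gives D > 0, which is the pattern described for frequent extended HLA haplotypes.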

Almost all nucleated cells have HLA class I molecules on their surface. Only B lymphocytes, antigen-presenting cells (monocytes, macrophages, and dendritic cells), and activated T lymphocytes express class II molecules (Shiina et al. 2009). Class I and class II molecules bind peptides of different nature and origin. Class I–restricted T cells identify endogenously generated antigens (e.g., cellular, transformed, or virus-induced proteins), while class II–restricted T cells recognize exogenously derived antigens.

The interaction between the HLA complex and the foreign antigen activates T lymphocytes. Upon activation, T cells proliferate and begin to release cytokines, which allows them to initiate an immune response that will recognize and eliminate cells bearing the same foreign antigen/HLA complex when they are next encountered. The enormous diversity of HLA molecules is essential for presenting a wide variety of antigens to T cells. T cell receptors recognize the conformational structure of the HLA antigen-binding groove together with the antigenic peptide (Shi et al. 2020; Lee and Koohy 2020; Dutta et al. 2018). The structural determination of disease-relevant peptide-HLA and HLA-peptide-TCR complexes is crucial for elucidating the molecular mechanisms responsible for the development of specific protective immunity against pathogens, as well as for the deleterious T cell reactivity that promotes disease. The HLA-peptide-TCR interactions that underpin self and non-self discrimination are guided by a set of rules that are also implicated in disease. For example, autoreactive T cells are directly generated in autoimmune diseases by mechanisms such as atypical HLA-peptide-TCR binding orientation, low-affinity peptide binding that facilitates thymic escape, TCR-mediated stabilization of weak peptide-HLA interactions, presentation of peptides in a different binding register, post-translational epitope modification, generation of hybrid peptides, and processes that regulate HLA expression and stability (Dendrou et al. 2018). HLA can affect three major cytotoxic T lymphocyte (CTL) features that together allow for an efficient antiviral response during chronic viral infections. First, targeting a conserved viral sequence or epitope rather than a variable region increases the effectiveness of CTLs in controlling viral infection (Elahi et al. 2011). Second, polyfunctionality increases the efficiency of viral clearance. For instance, the abundance of CD8+ PD-L1+ CXCR3+ polyfunctional T cells is associated with survival in patients with severe SARS-CoV-2 infection (Adam et al. 2021). Third, the proliferative capacity of CTLs may be correlated with their ability to destroy infected targets (Kunwar et al. 2013). Therefore, the immune response may be either beneficial or detrimental depending on which of these properties an HLA allele alters.

In humans, the HLA system orchestrates immune regulation. HLA expression during infection may depend on a number of other factors, including the presence of specific polymorphic sites in the promoter region that can be targeted by transcription factors; the degree to which post-transcriptional factors may degrade HLA mRNA, particularly microRNAs that can target polymorphic sites in the 3′UTR; the structural characteristics of the HLA molecule encoded by that person’s alleles that are available to interact with HLA receptors; and the role of neighboring or distant genes acting epistatically on the HLA gene. Investigating these pathways in depth might help us to comprehend the complexity behind varying immune responses. Enhanced binding of HLA molecules to viral peptides, such as those from SARS-CoV-2, on the surface of antigen-presenting cells is advantageous. For example, Zhang et al. (2022) showed an increased HLA-B*18:01/B*44:03 allelic fold change in SARS-CoV-2-infected A549 cells. New SARS-CoV-2 mutations may modify HLA expression, which may help explain immune response evasion. Benlyamani et al. observed that downregulation of HLA-DR molecules in circulating monocytes creates immunosuppressed conditions (such as lymphopenia) for the host response in critically ill individuals (Benlyamani et al. 2020). Thus, HLA haplotypes could be linked to distinct genetic predispositions to disease. Specific HLA genotypes can stimulate the T cell–mediated antiviral response differently and could possibly alter the symptoms and transmission of the disease (COVID-19 Host Genetics Initiative 2020). HLA-B*46:01 has the fewest predicted SARS-CoV-2 binding peptides, indicating that persons with this allele could be vulnerable to COVID-19, as found with SARS. HLA-B*15:03 showed the best ability to present highly conserved SARS-CoV-2 peptides shared among human coronaviruses, indicating that this allele may facilitate cross-protective T cell–reliant immunity (Nguyen et al. 2020).

Viruses such as hepatitis B virus and SARS-CoV-2 have also been reported to affect the cellular splicing machinery (Francies and Dlamini 2021; Wang et al. 2022; Thompson et al. 2020). The virus produces splicing variants that encourage cell division, block signaling pathways, inhibit tumor suppressors, change gene expression through epigenetic alteration, and produce immune-evading mechanisms (Francies and Dlamini 2021). HLA alternative splicing and expression can be affected by mutation, polymorphism, and gene deletion; for instance, the 14-bp INS/DEL polymorphism is involved in alternative splicing, which results in a higher yield of stable HLA-G mRNA transcripts in vitro. Because HLA plays such an important role in the immunological response to infections and the development of infectious illnesses, we hypothesize that population HLA variability could be connected with COVID-19 occurrence. The HLA system affects clinical outcomes in multiple infectious diseases, including HIV and SARS, as discussed later in the review (Bardeskar and Mania-Pramanik 2016; Barquera et al. 2020; Kulpa and Collins 2011).

In general, host genetic variability may help explain the multiplicity of immune responses to a virus within a community. Knowing how variability in HLA can impact the progression of COVID-19, in particular, may help distinguish individuals at higher risk for the disease. In the following section, we detail genetic findings relevant to SARS-CoV-2 infection and COVID-19 disease outcomes, with specific emphasis on HLA. In addition to this, in silico epitope prediction contributes new knowledge to the creation of a COVID-19 vaccine (Grifoni et al. 2020).

HLA associations in viral disease

Numerous studies have established links between variation in HLA and susceptibility to, and disease course in, viral disease. HLAs play a complex role in immunomodulation during HIV infection, and variations at the HLA class I locus have been linked to the efficiency of CD8+ T cell control of viremia (Martin and Carrington 2013). Polymorphisms within HLA class I and II loci have been identified as host genetic modifiers of HIV disease progression in several populations. In an Argentinian population, the HLA-B*39 allele frequency was considerably greater in HIV-1-positive participants than in controls, whereas the HLA-B*44 allele was not present in HIV-1-positive subjects (Sorrentino et al. 2000). In a Zambian early infection cohort, two HLA class II alleles (DRB1*15) and four HLA class I alleles (B*14:01, B*57, B*58:01, and B*81) were found to be linked with protection from rapid CD4+ T cell decline without affecting early plasma viral load (Claiborne et al. 2019). A GWAS showed HLA-B*57:01 rs2395029, HLA-C rs9264942, and HLA-B*27:02 to be associated with HIV-1 disease progression in Caucasian subjects (Fellay et al. 2009; Teixeira et al. 2014), while the presence of the HLA-B*52 allele was shown to favor slow AIDS progression in Brazilian subjects (Sorrentino et al. 2000).

A number of HLA alleles are identified as protective (A*03:01, B*15:39, B*27:05, B*39:02, B*57:01/02/03, HLA-B*44, and B*58:01) and risk (A*68:03/05, B*15:30, B*35:02, B*35:12/14, B*39:01/06, B*39:05, and B*40:01) factors for disease progression (Valenzuela-Ponce et al. 2018).

Infection with hepatitis B virus (HBV) increases the risk of cirrhosis and liver cancer (Yuen et al. 2018). HBV-specific CD8+ cytotoxic T cells are central to both viral clearance and liver damage, and HLA polymorphisms modify their responses. For example, HLA-DRB1*13:02 was associated with protection against persistent HBV infection among children and adults in Gambia (Thursz et al. 1995), while HLA-DQ rs9275319C, HLA-DQB1*06:03, and HLA-A*03:01 were found to decrease HBV infection risk and increase HBV clearance (Ou et al. 2018; Fan et al. 2016). HLA-DPB1 rs9277535A and HLA-B*08 were associated with the risk of persistent HBV infection (Akgöllü et al. 2017; Thio et al. 2003), while HLA-DQB1*06:01 was associated with chronic HBV infection in Japanese patients (Nishida et al. 2016).

Likewise, infection with the hepatitis C virus (HCV) is a substantial contributor to both acute and chronic hepatitis. Chronic hepatitis leads to liver cirrhosis and hepatocellular carcinoma (HCC) (Saito et al. 2018). HLA alleles (B*57:01, B*57:03, Cw*01:02, and DRB1*01:01) were associated with the absence of HCV RNA in a large multiracial cohort of US women (Kuniholm et al. 2010). HLA-A*02:01 and HLA-DRB1*11:01 were associated with spontaneous HCV clearance in the Chinese population (Huang et al. 2016), and associations of HLA-DQB1*02:02 and HLA-DRB1 alleles with HCV have been reported, with the DRB1*03:01:01 and DRB1*13:01:01 alleles linked to the risk of progression to chronic hepatitis C infection and DRB1*04:01:01, DRB1*04:05:01, DRB1*07:01:01, and DRB1*11:01:01 linked to protection against HCV infection (El-Bendary et al. 2019; Vejbaesya et al. 2000).

HLA-A (HLA-A*02:01), the amino acid polymorphism at position 107 of the HLA-A protein (HLA-A Gly107), and HLA-B (rs9266089) have been reported to be associated with chickenpox. Disease associations have also been seen in the class III region (variant rs41316748) and for IFNA21 (rs7047299) in a genome-wide association study of shingles (Tian et al. 2017). IFNA21 encodes a type I interferon and is mainly involved in the innate immune response against viral infection. It has been shown to be involved in the pathogenesis of rubella (Mo et al. 2007) and may also influence susceptibility to asthma and atopy (Chan et al. 2006). The amino acid polymorphism HLA-A Gln43 (rs114193679) and HLA-A*02:05 have been found to be significantly associated with mumps. A variant in the HLA class I region (rs3131623) and HLA-DRB1 LS11 in the class II region have shown GWAS association with pneumonia (Tian et al. 2017).

HLA associations in SARS-CoV-2 infection and COVID-19 disease outcome

Particularly early in the pandemic, many investigations of the role of HLA in disease focused on bioinformatic prediction of HLA binding affinity to SARS-CoV-2 peptides. These studies have shown that mutations in numerous viral strains, mostly in the Spike protein, alter the affinity between the mutant peptides and HLA molecules (Augusto and Hollenbach 2022). For example, the HLA-B*46:01 allele has a low binding affinity, suggesting that subjects with this allele may have a higher risk of developing the more severe forms of COVID-19 (Migliorini et al. 2021). In contrast, the HLA-B*15:03 allele is reported to have the highest binding affinity for viral peptides (Nguyen et al. 2020; Lin et al. 2003). A recent study examining HLA binders reported that five B22 alleles (B*54:01, B*55:01, B*55:07, B*55:12, and B*56:01) were among the 94 weakest HLA-B binders to SARS-CoV-2, further suggesting B22 as a susceptibility marker (Barquera et al. 2020). An in silico analysis of viral peptide-major histocompatibility complex (MHC) class I binding affinity by Nguyen et al. revealed that HLA-A*02:02, HLA-B*15:03, and HLA-C*12:03 effectively presented a larger number of peptides, whereas A*25:01, B*46:01, and C*01:02 were the least efficient at SARS-CoV-2 peptide presentation (Nguyen et al. 2020). Finally, another bioinformatic prediction revealed high binding affinity between SARS-CoV-2 epitopes and the HLA-A*02:06, HLA-B*52:01, HLA-A*24:02, HLA-A*02:01, and HLA-C*12:02 alleles (Kiyotani et al. 2020).
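The general logic behind such in silico comparisons can be sketched as follows: a viral protein is cut into overlapping peptides, each peptide-allele pair receives a predicted affinity from an external prediction tool, and alleles are ranked by how many peptides fall below a binding threshold. The sketch below is not the pipeline of any cited study. The allele labels, peptides, and affinity values are placeholders, the prediction step itself is assumed to have been run elsewhere, and the 500 nM cutoff is a commonly used convention adopted here as an assumption.

```python
from collections import Counter


def sliding_peptides(protein, lengths=(8, 9, 10, 11)):
    """Enumerate all overlapping peptides of the given lengths from a protein sequence."""
    return [
        protein[start:start + k]
        for k in lengths
        for start in range(len(protein) - k + 1)
    ]


def binders_per_allele(predictions, threshold_nm=500.0):
    """Count predicted binders per allele.

    predictions  : iterable of (allele, peptide, predicted_ic50_nm) tuples,
                   assumed to come from an external binding-prediction tool
    threshold_nm : peptides with a predicted IC50 below this value count as binders
                   (assumed convention, not a value from the cited studies)
    """
    counts = Counter()
    for allele, _peptide, ic50_nm in predictions:
        # Adding 0 for non-binders keeps alleles with no binders in the result.
        counts[allele] += int(ic50_nm < threshold_nm)
    return counts


# Placeholder predictions; "allele_X" and "allele_Y" are not real HLA alleles.
toy_predictions = [
    ("allele_X", "AAAAAAAAA", 35.0),
    ("allele_X", "CCCCCCCCC", 420.0),
    ("allele_X", "DDDDDDDDD", 9000.0),
    ("allele_Y", "AAAAAAAAA", 2500.0),
    ("allele_Y", "CCCCCCCCC", 12000.0),
]
ranking = binders_per_allele(toy_predictions).most_common()
```

An allele at the bottom of such a ranking presents few predicted viral peptides, which is the reasoning behind the susceptibility hypotheses described above. Predicted affinity alone does not establish an effect on clinical outcome.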

More recent investigations have suggested an association between HLA variation and COVID-19 outcomes, and between HLA genotypes and varying immune responses to SARS-CoV-2. While the number of studies continues to increase, the need for studies in more diverse global regions is evident. HLA association studies for COVID-19 reported as of this writing are shown by geographic area in Fig. 1.

Fig. 1 Map showing the number of significant association studies for HLA loci in different geographic regions

Studies using candidate gene approaches suggested roles for several alleles in the HLA region: C*07:29, B*15:27, B*27:07, DRB1*15:01, DQB1*06:02, C*06:02, and DRB1*07:01 (Novelli et al. 2020); HLA-C*07:29 and HLA-B*15:27 (Wang et al. 2020a); and HLA-C*04:01 (Littera et al. 2020), and in genes associated with the viral cell. HLA-DRB1*08 conferred risk for death due to COVID-19 (Amoroso et al. 2021). Peptide-binding prediction analyses showed that these DRB1*08 alleles were unable to bind any of the viral peptides with high affinity. The extended haplotype HLA-A*02:05, B*58:01, C*07:01, DRB1*03:01 was also shown to have a protective effect against SARS-CoV-2 infection in the Sardinian population (Littera et al. 2020).

Several HLA associations in COVID-19 outcomes have also been inferred from genome-wide studies. HLA-A*11:01 (p value = 0.009, OR 2.3), HLA-B*51:01 (p value = 0.007, OR 3.3), and HLA-C*14:02 (p value = 0.003, OR 4.7) were identified as top signals in the HLA class I region in the Chinese population by deep sequencing and analysis of 332 COVID-19 patients categorized by varying levels of severity (Wang et al. 2020b). HLA-A*11:01:01:01 was also identified as a risk factor for COVID-19 severity (p value = 0.003, OR 3.4) in a study involving 190 Japanese patients and 423 controls, after controlling for comorbidities and other confounding factors (Khor et al. 2021). Other groups conducted association analyses against healthy control groups to identify susceptibility to SARS-CoV-2 infection. One compared 82 COVID-19 patients with 3548 controls from China and found HLA-B*15:27 to be associated (p value = 0.001, OR 3.6) (Wang et al. 2020a); another compared 99 COVID-19 patients with 1017 controls from Italy and found three significant associations (HLA-B*27:07, p value = 0.00001; HLA-DRB1*15:01, p value = 0.002; HLA-DQB1*06:02, p value = 0.0001) (Novelli et al. 2020).
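For readers less familiar with the odds ratios (OR) quoted above, the following minimal sketch shows how an OR and an approximate 95% confidence interval are obtained from a 2 × 2 table of allele carriers and non-carriers among cases and controls. It uses the standard Woolf (log-normal) approximation. The counts are hypothetical and do not reproduce any of the cited studies, which may have used other tests or adjustments.

```python
import math


def odds_ratio_with_ci(case_pos, case_neg, ctrl_pos, ctrl_neg, z=1.96):
    """Odds ratio and Woolf confidence interval for a 2x2 table.

    case_pos / case_neg : cases carrying / not carrying the allele
    ctrl_pos / ctrl_neg : controls carrying / not carrying the allele
    z                   : normal quantile (1.96 gives an approximate 95% interval)
    """
    if min(case_pos, case_neg, ctrl_pos, ctrl_neg) <= 0:
        raise ValueError("all four cells must be positive for this approximation")
    odds_ratio = (case_pos * ctrl_neg) / (case_neg * ctrl_pos)
    # Standard error of log(OR) under the Woolf approximation.
    se_log_or = math.sqrt(1 / case_pos + 1 / case_neg + 1 / ctrl_pos + 1 / ctrl_neg)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts: 20 of 100 cases and 10 of 100 controls carry the allele.
estimate, low, high = odds_ratio_with_ci(20, 80, 10, 90)
# estimate is 2.25, but the interval (roughly 0.99 to 5.09) still includes 1.
```

The usage line illustrates why small cohorts give wide intervals: an apparent doubling of the odds is not distinguishable from no effect at these sample sizes.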

HLA associations were also examined in the meta-analysis of COVID-19 severity performed by the COVID-19 Host Genetics Initiative (HGI), where a variant in HLA-G was found to be significantly associated but was not replicated in a genome-wide association study of 2244 critically ill patients with COVID-19 from 208 UK intensive care units (Douillard 2021; Pairo-Castineira et al. 2021). When comparing the general population to patients with critical COVID-19 (n cases = 8779, n controls = 1,001,875, from 25 studies of various ancestries), five variants within the CCHCR1 gene, situated 110 kb downstream of HLA-C, reached statistical significance (top SNP: rs111837807, p value = 2.2 × 10−11, OR meta 1.23), as did a variant within the HLA-DPB1 3′UTR (rs9501257, p value = 4.1 × 10−8, OR meta 1.19) (Douillard 2021).
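A pooled estimate such as the "OR meta" values above is typically obtained by combining per-study effect sizes. The sketch below illustrates one generic approach, fixed-effect inverse-variance weighting on the log odds ratio scale. It is offered as an assumption about the general technique and not as a description of the consortium's actual pipeline, and the per-study values in the usage line are hypothetical.

```python
import math


def fixed_effect_meta(odds_ratios, standard_errors):
    """Fixed-effect inverse-variance meta-analysis of per-study odds ratios.

    odds_ratios     : per-study odds ratios
    standard_errors : per-study standard errors of log(OR)
    Returns (pooled_or, pooled_se_of_log_or, two_sided_p_value).
    """
    if len(odds_ratios) != len(standard_errors) or not odds_ratios:
        raise ValueError("need one standard error per odds ratio")
    log_ors = [math.log(value) for value in odds_ratios]
    # More precise studies (smaller standard error) receive larger weights.
    weights = [1.0 / (se ** 2) for se in standard_errors]
    total_weight = sum(weights)
    pooled_log_or = sum(w * b for w, b in zip(weights, log_ors)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z_score = pooled_log_or / pooled_se
    # Two-sided p value from the standard normal distribution.
    p_value = math.erfc(abs(z_score) / math.sqrt(2.0))
    return math.exp(pooled_log_or), pooled_se, p_value


# Hypothetical per-study estimates for a single variant.
pooled_or, pooled_se, p_value = fixed_effect_meta([1.2, 1.2], [0.1, 0.1])
```

Because the pooled standard error shrinks as studies are added, modest odds ratios can reach very small p values in meta-analyses of many cohorts.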

A recent extended GWAS meta-analysis of a well-characterized cohort of 3255 COVID-19 patients with respiratory failure and 12,488 population controls from Italy, Spain, Norway, and Germany/Austria, including stratified analyses based on age, sex, and disease severity, yielded no association for HLA loci at the genome-wide or nominal (p < 10−5) significance threshold, indicating no major role for HLA variability in mediating the severity of COVID-19 in those cohorts (Degenhardt et al. 2022). In total, 29 HLA alleles (eight protective and 21 risk alleles) have been identified as being linked with COVID-19 susceptibility (Robinson et al. 2020). Figure 2 shows a summary of risk and protective HLA alleles identified across studies as of this writing.

Fig. 2 Risk and protective alleles for HLA loci identified to date

In addition to classical HLA loci, several non-classical HLA associations with COVID-19 have been identified. Human leukocyte antigen-G (HLA-G) is a ligand for multiple immune inhibitory receptors, and its expression can be upregulated by viral infections (Jasinski-Bergner et al. 2022). Synergistic suppression effects induced by HLA-G/receptor signaling include the inhibition of cell proliferation and differentiation and the induction of cell apoptosis and senescence, resulting in a significant decrease or even exhaustion of immune-competent cells such as T cells, NK cells, B cells, and macrophages in patients with COVID-19 (Lin and Yan 2021). Critical COVID-19 patients had a significantly lower frequency of CD4+ HLA-G+ T lymphocytes compared with moderate/severe COVID-19 patients. The increased number of immunomodulatory HLA-G+ cells may reduce the severity of the disease in moderate/severe COVID-19 patients compared with critical COVID-19 patients (Ramzannezhad et al. 2022). Furthermore, it has been suggested that the HLA-G 14-bp Ins/Del polymorphism is associated with COVID-19 risk (Ad’hiah and Al-Bayatee 2019).

The SARS-CoV-2 Nsp13(232–240) peptide presented by HLA-E abrogates inhibition of NKG2A+ NK cells. HLA-E/Nsp13(232–240) complexes are formed efficiently but fail to engage NKG2A, thereby unleashing NKG2A+ NK cell effector functions through missing-self recognition (Hammer 2022). NP105-113-B*07:02-specific T cell responses are associated with mild disease and high antiviral efficacy, supporting inclusion of this epitope in future vaccine design (Peng et al. 2022).

While HLA-C presents various antigens to cytotoxic T lymphocytes (CTLs), it also plays a key role in modulating the activity of natural killer (NK) cells, which are primary effector cells in viral and inflammatory diseases (Kulpa and Collins 2011). One study reported a higher rate of HLA-C methylation in severe patients compared to moderate, critical, and total patients (p < 0.05). Both male and female patients had lower methylation levels than healthy volunteers (Sharif-zak 2022). Antiviral defense relies heavily on NK cells. Killer cell immunoglobulin–like receptors (KIR) and their associated HLA class I ligands are critical regulators of NK cell development and function. A lack of KIR3DL1+ HLA-Bw4+ and KIR3DL2+ HLA-A3/11+ combinations likely results in inadequate NK cell maturation and reduced anti-SARS-CoV-2 defense, favoring COVID-19 (Maruthamuthu et al. 2022). The KLRC2 gene encodes NKG2C, an activating NK cell receptor that interacts with HLA-E on infected cells to activate NK cells. Naturally occurring heterozygous or homozygous KLRC2 deletion (KLRC2del) is associated with considerably reduced or absent NKG2C expression. Compared with patients with minor symptoms, the KLRC2del allele and, to a lesser extent, the HLA-E*0101 allele were markedly overrepresented in hospitalized patients (p = 0.0006 and p = 0.01, respectively). This was especially true for patients needing intensive care (p < 0.0001 and p = 0.01). Each of the two genetic variants was an independent risk factor for severe COVID-19 (Vietzen et al. 2021).

Non-HLA genetic findings in COVID-19

Variation in COVID-19 infection and mortality has been documented across different groups as the pandemic has progressed. For example, total cumulative data show that Black, Hispanic, American Indian, and Native Hawaiian people have experienced higher rates of COVID-19 cases and deaths than White and Asian people when data were adjusted to account for differences in age by race and ethnicity (COVID-19 cases and deaths by race, ethnicity 2022). COVID-19 is associated with a number of risk factors, including human genetic polymorphisms. Polymorphisms in ACE2, TMPRSS2, CCR5, ACE1, ABO locus, APOE, IL-6, IL-10, MBL2, TLR4, NLRP3, TNFA, SIR, IL1RN, IL1B, CX3CR1, and INK4A/ARF have been reported to be potential risk predictors of COVID-19 (Feng et al. 2022; Dieter et al. 2022).

Some genes with top associations in GWAS studies have also been reported: ACE2 (p value = 3.63 10), LZTFL1 (p value = 1.05 × 10−80), SLC6A20 (p value = 1.82 × 10−47), CCR3 (p value = 1.86 × 10−19), ABO (p value = 1.15 × 10−9), TYK2 (p value = 3.7 × 10−9), OAS1 (p value = 2.73 × 10−10), FOXP4 (p value = 3.41 × 10−11), KANSL1 (p value = 1.7 × 10−11), TAC4 (p value = 3.88 × 10−9), TMPRSS2 (p value = 1.6 × 10−6), DPP9 (p value = 6.19 × 10−22), CCHCR1 (p value = 9.46 × 10−10) (https://www.covid19hg.org/).

A recent extensive review covers the association studies on COVID-19 (Dieter et al. 2022); we summarize some of its key findings here.

The gene encoding ACE2, the key receptor for SARS-CoV-2 binding, is located on the X chromosome. Because males are hemizygous, they have a higher risk of expressing ACE2 mutations, resulting in increased susceptibility or disease severity (Feng et al. 2022). Two genome-wide association studies (GWAS) have shown that 3p21.31 and the 9q34 region containing the ABO blood group locus are significantly associated with severe COVID-19 (Wu et al. 2020; Ellinghaus et al. 2020). TMPRSS2 gene polymorphisms (rs12329760, rs61735792, rs61735794) were significantly associated with COVID-19 infection (Monticelli et al. 2021; Torre-Fuentes et al. 2021). Furthermore, another study of an Italian population has shown that the rs769208985 variant of PCSK3, which encodes furin, is associated with COVID-19 infection (Latini et al. 2020).

Independent of gender, age, or the severity of the illness, the rs12979860 variant of the IFNL4 gene (which encodes the interferon (IFN) lambda 4 protein) is linked to the presence of COVID-19 in the Spanish population, which may be due to decreased virus clearance (Saponi-Cortes et al. 2021). The variant rs738409 (GG genotype) in PNPLA3 (patatin-like phospholipase domain-containing protein 3), which is involved in triacylglycerol hydrolysis in adipocytes, and the variant rs17047200 (TT genotype) in TLL-1 (Tolloid like-1), which is involved in complement system activation and spike protein cleavage, have been correlated with SARS-CoV-2 infection (Grimaudo et al. 2021). Subjects with the PNPLA3 GG genotype have constitutive NLRP3 inflammasome upregulation and may be more prone to SARS-CoV-2 tissue damage. The TT genotype in TLL-1 may influence protease activity on the SARS-CoV-2 spike protein, increasing its potential to infect or re-infect host cells (Grimaudo et al. 2021). TLE1, which encodes a transducin-like enhancer protein involved in negative regulation of NF-κB, has recently been reported to be associated with critical COVID-19 illness in females (Cruz et al. 2022; Kousathanas et al. 2022). In addition, X-linked deleterious variants in the TLR7 gene are reported to be causal for life-threatening COVID-19 in males only (Asano et al. 2021; Made et al. 2020; Fallerini et al. 2021). Fine-mapping shows a significant association with an independent missense variant (rs8178521) in IL10RB, a receptor for type III (lambda) interferons. A lead risk variant in phospholipid scramblase 1 (PLSCR1; chr3:146,517,122:G:A, rs343320), which disrupts a nuclear localization signal that is important for the antiviral effect of interferons, was also reported to be associated with critical COVID-19 outcomes (Kousathanas et al. 2022).
Genes such as BCL11A and TAC4, which are involved in lymphopoiesis and differentiation of myeloid cells, were also found to be significantly associated in the same study (Kousathanas et al. 2022). Granulocyte macrophage colony-stimulating factor (GM-CSF) is strongly upregulated in critical COVID-19 and has been proposed as a target for therapy (Bonaventura et al. 2020). Mendelian randomization supported a causal involvement of coagulation factors (F8) and platelet activation (PDGFRL) in critical COVID-19 (Kousathanas et al. 2022).

The Severe COVID-19 Consortium conducted a genome-wide association study of 1980 patients of European ancestry and found a region on chromosome 3 (SLC6A20, LZTFL1, CCR9, FYCO1, CXCR6, and XCR1) and the ABO locus (with blood group A conferring risk and O being protective) to be the only significantly associated loci (Severe Covid-19 GWAS Group et al. 2020). ELF5 has recently been reported as a new locus associated with critical illness in Europeans (Kousathanas et al. 2022). The DNA methylation status of 44 CpG sites was associated with the clinical severity of COVID-19. The gain-of-function risk A allele of a single-nucleotide polymorphism (SNP), rs17713054G > A, has been identified as a probable causative variant, and LZTFL1 as a candidate effector gene in pulmonary epithelial cells, underlying the strong COVID-19 association at the 3p21.31 locus (Downes et al. 2021). Several studies suggest that the androgen receptor (AR) pathway is involved in the severity of SARS-CoV-2 infection (Cruz et al. 2022; Lamy et al. 2021). Finally, fine-mapping, colocalization, and TWAS studies show enhanced MUC1 expression due to rs41264915, which suggests that mucins could be therapeutically important in critical COVID-19 (Kousathanas et al. 2022).

Importantly, most of these association studies were conducted in European and East Asian cohorts, with very few studies in South Asian cohorts (Srivastava et al. 2020b; Pandey et al. 2022; Srivastava 2020a; Ahmad et al. 2021). This highlights the need for more association studies in more diverse populations.

Future directions

It is abundantly clear that the link between HLA and COVID-19 needs to be studied in more diverse patient populations. Because of this lack of diversity, relatively few HLA allele sets have actually been examined in relation to disease. As more immunological data become available and DNA sequences are interpreted in the context of disease susceptibility, the analysis of allelic and haplotypic variation in populations with varying patterns of linkage disequilibrium should help to identify the loci and alleles responsible for the observed HLA disease associations. Moreover, the issue of genotyping ambiguity will become much more prevalent as more alleles (as well as additional exonic and intronic sequences for currently listed alleles) are discovered in more populations (Erlich 2012). Another potential drawback of the studies performed to date is the difficulty of evaluating the significance of HLA type relative to well-established risk variables for disease modification, such as age and clinical comorbidities (Migliorini et al. 2021; Li et al. 2020; Jain and Yuan 2020; Yang et al. 2020b; Guan et al. 2020).
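Because this argument leans on linkage disequilibrium (LD), a minimal sketch of the standard pairwise LD statistics may be useful. For two biallelic loci with alleles A and B, D = p_AB − p_A p_B and r² = D² / (p_A (1 − p_A) p_B (1 − p_B)). The function name and the haplotype and allele frequencies below are hypothetical illustration values, not data from any cited study.

```python
def linkage_disequilibrium(p_ab, p_a, p_b):
    """Return (D, r_squared) for two biallelic loci.

    p_ab: frequency of the haplotype carrying both allele A and allele B
    p_a:  frequency of allele A at the first locus
    p_b:  frequency of allele B at the second locus
    """
    if not (0.0 < p_a < 1.0 and 0.0 < p_b < 1.0):
        raise ValueError("allele frequencies must lie strictly between 0 and 1")
    d = p_ab - p_a * p_b  # deviation from independent assortment
    r_squared = d ** 2 / (p_a * (1.0 - p_a) * p_b * (1.0 - p_b))
    return d, r_squared


# Hypothetical frequencies chosen only to illustrate the calculation.
d, r_squared = linkage_disequilibrium(p_ab=0.15, p_a=0.3, p_b=0.2)
print(f"D = {d:.3f}, r^2 = {r_squared:.3f}")
```

When r² between a typed marker and a causal allele differs between populations, the same marker can show a strong association in one cohort and a weak one in another, which is one reason why cohorts with different LD patterns can help to narrow down the responsible allele.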

Based on evidence for multigenic control of immune responses to infectious pathogens, some studies have reported that not just a single HLA polymorphism but a whole extended haplotype influences humoral responses, even in vaccinated subjects (Pajewski et al. 2011; Ovsyannikova et al. 2010). Thus, in addition to specific genotypes, the entire HLA region and its associated haplotypes will determine susceptibility, making knowledge of these haplotypes crucial for the creation of future vaccines. Identification of high-risk subjects susceptible to SARS-CoV-2 infection could help to design more effective vaccines and treatments, reducing the public health burden and prioritizing preventive medicine. In light of the available data, it seems prudent to include HLA testing in clinical trials, and to combine HLA typing with COVID-19 testing, in order to more rapidly identify predictors of disease severity in the population and possibly to adapt vaccination strategies to genotypically at-risk populations.

Platforms for storing, processing, displaying, aggregating, and accessing data across numerous studies can be used for studying the etiology of human diseases, building predictive risk models, and finding potential biomarkers and therapeutic interventions. GWAS portals such as GRASP: Genome-Wide Repository of Associations Between SNPs and Phenotypes (Leslie et al. 2014), the NHGRI GWAS catalog (Welter et al. 2014), HGVbaseG2P (Retter et al. 2005), and GWASdb (Li et al. 2016) have proven invaluable for genome-wide association studies. Because of the highly variable nature of the HLA region and its complex nomenclature, a dedicated platform to aggregate HLA studies of SARS-CoV-2 infection and COVID-19 disease outcomes has been developed by our group and others (http://www.hlacovid19.org/database/). As we continue to unravel the fine details of the role of HLA variation in the ongoing pandemic, a resource dedicated to these studies should improve our understanding of the role that these critical immune genes play in disease.

Finally, incorporating data on HLA associations may aid in vaccine design and development. Binding of promiscuous epitopes to a range of human leukocyte antigen (HLA) alleles is crucial for immune control of COVID-19. When the pathogen epitope(s) triggering protective responses are unknown, genetic variants associated with resistance to infection or with less severe clinical manifestations are potentially powerful tools for the design of vaccines (Blackwell et al. 2009). For vaccine design, predictions can be made using immunoinformatic methods, which combine predictors of proteasomal processing, TAP transport, and MHC binding to produce an overall score indicating the intrinsic potential of each peptide as a T cell epitope (Cun et al. 2021). For example, an in silico analysis focused on vaccine design predicted the binding affinity of spike glycoprotein epitopes to 66 common HLA class II alleles (frequency ≥ 0.01) and found that DPB1 had the highest binding affinities, followed by DRB1 and DQB1. A cohort-based study of 17,440 participants of European ancestry identified HLA-A*03:01 as strongly associated with poorer clinical reactions to Pfizer-BioNTech vaccines (Bolze et al. 2022), which underscores the importance of understanding the role of HLA variation in vaccine design.
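To make the idea of an overall epitope score more concrete, the following minimal sketch keeps only alleles at or above a frequency cutoff (0.01, as in the in silico analysis above) and combines three per-peptide predictor outputs into a single score by a weighted sum. The weights, class and function names, peptide and allele labels, frequencies, and predictor scores are all hypothetical placeholders. Real pipelines obtain these values from dedicated prediction tools and may combine them differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PeptidePrediction:
    peptide: str        # placeholder peptide label
    allele: str         # HLA allele the binding score refers to
    proteasome: float   # predicted proteasomal processing score, scaled 0..1
    tap: float          # predicted TAP transport score, scaled 0..1
    mhc_binding: float  # predicted MHC binding score, scaled 0..1


# Hypothetical weights; they are illustrative only and sum to 1.
WEIGHTS = {"proteasome": 0.2, "tap": 0.1, "mhc_binding": 0.7}


def overall_score(pred, weights=WEIGHTS):
    """Weighted sum of the three predictor outputs for one peptide/allele pair."""
    return (weights["proteasome"] * pred.proteasome
            + weights["tap"] * pred.tap
            + weights["mhc_binding"] * pred.mhc_binding)


def rank_epitopes(predictions, allele_frequencies, min_frequency=0.01):
    """Drop alleles below the frequency cutoff, then sort by overall score."""
    common = {a for a, f in allele_frequencies.items() if f >= min_frequency}
    scored = [(p.peptide, p.allele, overall_score(p))
              for p in predictions if p.allele in common]
    return sorted(scored, key=lambda row: row[2], reverse=True)


# Hypothetical inputs chosen only to exercise the functions.
allele_frequencies = {"allele_common": 0.12, "allele_rare": 0.004}
predictions = [
    PeptidePrediction("PEPTIDE_2", "allele_common", 0.5, 0.5, 0.4),
    PeptidePrediction("PEPTIDE_1", "allele_common", 0.8, 0.6, 0.9),
    PeptidePrediction("PEPTIDE_3", "allele_rare", 0.9, 0.9, 0.95),
]

ranked = rank_epitopes(predictions, allele_frequencies)
for peptide, allele, score in ranked:
    print(f"{peptide} / {allele}: {score:.2f}")
```

In this toy input, the peptide paired with the rare allele is excluded despite its high predictor scores, which mirrors the practice of restricting predictions to common alleles so that candidate epitopes are relevant to a large share of the population.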

Conclusion

It is becoming increasingly clear that HLA genotypes contribute to differences in COVID-19 morbidity and mortality between individuals. Harnessing HLA function clinically is a challenging mission (Nguyen et al. 2020; Dendrou et al. 2018; Lin et al. 2003). Our ability to use HLA function for clinical benefit has improved as a result of advances in characterizing HLA variation and HLA correlations with human disease (Tomita et al. 2020). Identifying the HLA genotypes associated with the severity of COVID-19 or susceptibility to SARS-CoV-2 may support future vaccination strategies for genotypically at-risk populations. Individuals with high-risk HLA types could be prioritized for vaccines against SARS-CoV-2, which may effectively reduce COVID-19 morbidity and mortality (Tomita et al. 2020). Larger future cohorts or HLA-centric explorations may yield more meaningful signals. We also propose that more studies examining population-specific binding affinity predictions across different HLA alleles can help to assess the potential for cross-protective immunity conferred by prior exposure. We hope that the studies detailed here provide new insights into the COVID-19 pandemic and improve our understanding of the relationship between HLA and COVID-19.