

T cell receptor β repertoires as novel diagnostic markers for systemic lupus erythematosus and rheumatoid arthritis
  1. Xiao Liu1,2,
  2. Wei Zhang1,2,
  3. Ming Zhao3,
  4. Longfei Fu1,2,
  5. Limin Liu3,
  6. Jinghua Wu1,2,
  7. Shuangyan Luo3,
  8. Longlong Wang1,2,
  9. Zijun Wang3,
  10. Liya Lin1,2,
  11. Yan Liu3,
  12. Shiyu Wang1,2,
  13. Yang Yang3,
  14. Lihua Luo1,2,
  15. Juqing Jiang3,
  16. Xie Wang1,2,
  17. Yixin Tan3,
  18. Tao Li1,2,
  19. Bochen Zhu3,
  20. Yi Zhao1,2,
  21. Xiaofei Gao3,
  22. Ziyun Wan1,2,
  23. Cancan Huang3,
  24. Mingyan Fang1,2,
  25. Qianwen Li3,
  26. Huanhuan Peng1,2,
  27. Xiangping Liao4,
  28. Jinwei Chen5,
  29. Fen Li5,
  30. Guanghui Ling5,
  31. Hongjun Zhao6,
  32. Hui Luo6,
  33. Zhongyuan Xiang7,
  34. Jieyue Liao3,
  35. Yu Liu3,
  36. Heng Yin3,
  37. Hai Long3,
  38. Haijing Wu3,
  39. Huanming Yang1,2,8,
  40. Jian Wang1,2,8,
  41. Qianjin Lu3
  1. 1 BGI, Shenzhen, China
  2. 2 China National GeneBank, Shenzhen, China
  3. 3 Department of Dermatology, Hunan Key Laboratory of Medical Epigenomics, The Second Xiangya Hospital of Central South University, Changsha, China
  4. 4 Department of Nephropathy and Rheumatology, Chenzhou No.1 People's Hospital, Chenzhou, China
  5. 5 Department of Rheumatology, The Second Xiangya Hospital of Central South University, Changsha, China
  6. 6 Department of Rheumatology, Xiangya Hospital of Central South University, Changsha, China
  7. 7 Department of Clinical Laboratory, The Second Xiangya Hospital of Central South University, Changsha, China
  8. 8 James D. Watson Institute of Genome Sciences, Hangzhou, China
  1. Correspondence to Dr Qianjin Lu, The Second Xiangya Hospital of Central South University, Changsha, China; qianlu5860{at}


Objective T cell receptor (TCR) diversity determines the autoimmune responses in systemic lupus erythematosus (SLE) and rheumatoid arthritis (RA) and is closely associated with the prognosis and prevention of autoimmune diseases. However, the characteristics of variations in TCR diversity and their clinical significance are still unknown. Large series of patients must be studied in order to elucidate the effects of these variations.

Methods TCR repertoires were amplified from the peripheral blood of 877 SLE patients, 206 RA patients and 439 healthy controls (HC) and sequenced using a high-throughput sequencer. We developed a statistical model to identify disease-associated TCR clones and diagnose autoimmune diseases.

Results Significant differences were identified in variable (V), joining (J) and V-J pairing between the SLE or RA and HC groups. These differences can be utilised to discriminate the three groups with near-perfect accuracy (V: area under receiver operating curve > 0.99). One hundred and ninety-eight SLE-associated and 53 RA-associated TCRs were identified and used for disease classification by cross validation with high specificity and sensitivity. Disease-associated clones showed common features and high similarity between both autoimmune diseases. SLE displayed higher TCR heterogeneity than RA with several organ-specific properties. Furthermore, associations of clonal expansion and the concentration of disease-associated clones with disease severity were identified, and pathogen-related TCRs were enriched in both diseases.

Conclusions These characteristics of the TCR repertoire, particularly the disease-associated clones, can potentially serve as biomarkers and provide novel insights into disease status and therapeutic targets in autoimmune diseases.

  • autoimmune diseases
  • systemic lupus erythematosus
  • rheumatoid arthritis
  • t cells


Key messages

What is already known about this subject?

  • Clonal expansion and a reduction in diversity of the T cell receptor (TCR) repertoire have been identified in small series of systemic lupus erythematosus (SLE) or rheumatoid arthritis (RA) patients. However, the clinical implications of this variation in the TCR repertoire remain unclear.

What does this study add?

  • We conducted the most comprehensive and quantitative analysis on TCR beta repertoires to date, including data from 877 SLE patients, 206 RA patients and 439 healthy controls (HC).

  • We found significant differences in variable (V), joining (J) and V-J pairing between the SLE or RA and HC groups and developed a random forest model to classify SLE, RA and HC using V and V-J genes with near-perfect accuracy.

  • We identified 198 SLE-associated and 53 RA-associated TCR clones which can discriminate between SLE, RA and HC with very high specificity and sensitivity and found that disease-associated clones correlated with clinical features.

How might this impact clinical practice or future developments?

  • These findings contribute to understanding the immunological aetiology and clinical heterogeneity of SLE and RA, and provide a biomarker for the diagnosis of SLE and RA and a target for disease treatment.


Systemic lupus erythematosus (SLE) is a prototypic autoimmune disorder, characterised by excessive production of autoantibodies, immune complex formation and T cell infiltration into tissues, which together cause organ damage.1 Rheumatoid arthritis (RA) is an organ-specific autoimmune disease that is characterised by chronic synovial inflammation and bone erosion.2 T cells play an essential role in SLE and RA pathogenesis. Autoreactive T cells have been observed in the peripheral blood (PB) and various organs of patients with SLE and RA, where they are activated to secrete inflammatory cytokines and provide help to reactive B cells.3–9

In T cell receptors (TCRs), high diversity is generated by genomic rearrangement of the variable (V), diversity (D) and joining (J) regions, along with palindromic and random nucleotide additions, and this diversity is crucial for understanding adaptive immunity in health and disease.10–12 In previous studies, expansion of some TCRs and reduction of TCR repertoire diversity were observed in the PB of SLE and RA patients, as well as a correlation of PB T cell expansions or spectratype skewing with disease activity.13–16 However, only a limited number of TCR sequences could be obtained because of technical limitations. More recently, high-throughput sequencing of the TCR repertoire has been used in SLE and RA,17–19 but the small number of patient samples in these studies limited their findings. Whether the TCR repertoire or TCR clones can be used as biomarkers of SLE or RA remains unclear.

In this work, we provide the most comprehensive, quantitative and unrestricted immunogenetic landscape of the T cell receptor repertoire in PB samples from SLE patients, RA patients and healthy control (HC) subjects, and identify novel biomarkers for the accurate diagnosis and monitoring of SLE and RA.


Sample collection and preprocessing

This study was approved by the ethics committees and institutional review board of the Second Xiangya Hospital of Central South University, and all study participants provided written informed consent. SLE and RA patients were recruited from the outpatient clinics and wards of the Second Xiangya Hospital of Central South University, Xiangya Hospital of Central South University and Chenzhou No.1 People's Hospital. The HCs had no history of cancer, cardiovascular diseases, autoimmune diseases or known infectious diseases, and were recruited from the Health Examination Centre of the Second Xiangya Hospital. The baseline characteristics of all samples and the clinical information analysed in this study are presented in online supplementary table S1.

TCR repertoire sequencing

The TCR repertoire was prepared from genomic DNA. The details of library construction, sequencing, data processing and the method of identifying disease-specific clones are described in the online supplementary note and table S2.


Statistical analyses were performed using R software (V.3.4.1). A non-parametric test (Mann-Whitney-Wilcoxon test) was used to compare the difference between two groups. For multiple group comparisons, analysis of covariance was initially used. Correction for multiple testing was performed using the false discovery rate method.20
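The false discovery rate correction mentioned above can be illustrated with a short sketch. It assumes the Benjamini-Hochberg procedure, which is a common choice but is not confirmed by the text; the function name and the example p values are hypothetical, and the original analysis was run in R rather than Python.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the input order.

    Assumption: the paper's FDR method is Benjamini-Hochberg (not confirmed).
    """
    m = len(p_values)
    # Indices of the p values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p values, for example one per V gene.
raw = [0.001, 0.008, 0.039, 0.041, 0.60]
adj = benjamini_hochberg(raw)
significant = [p < 0.05 for p in adj]
```

In this toy input the two smallest p values stay below 0.05 after adjustment, whereas the two raw values near 0.04 do not.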


The overall TCR repertoire diversity and the expansion of public TCR clones in autoimmune diseases

The sample size of the current study is at least ten times larger than that of any previous study of autoimmune diseases, although the rarefaction analysis showed that the TCR clone diversities in HC and RA had not reached saturation (figure 1A). Overall, 7.5, 2.7 and 8.1 million unique TCR clones were identified in SLE, RA and HC, respectively, and few TCR clones were shared (figure 1B). Compared with the HCs, both the SLE and RA patients showed a decreased Shannon entropy, and the value in SLE was the smallest (figure 1C), which implies the existence of expanded TCR clones in the repertoire of typical autoimmunity. The cumulative frequencies of the top 100 or 50 TCR clones in the autoimmune disease patients were significantly higher than those in the HCs (figure 1C, online supplementary figure S1A, B), and the patients had more abundant clones (figure 1C). These comparisons indicate the higher clonality of autoimmunity.
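The two diversity measures used in this paragraph can be sketched as follows. This is a minimal illustration on toy clone counts, not the study pipeline; the function names are hypothetical and the natural logarithm is an assumption, since the text does not state the base.

```python
import math


def shannon_entropy(counts):
    """Shannon entropy of a repertoire given per-clone read counts."""
    total = sum(counts)
    freqs = [c / total for c in counts if c > 0]
    return -sum(f * math.log(f) for f in freqs)


def top_n_cumulative_frequency(counts, n):
    """Fraction of the repertoire occupied by the n most abundant clones."""
    total = sum(counts)
    return sum(sorted(counts, reverse=True)[:n]) / total


# Toy repertoires: one evenly distributed, one dominated by an expanded clone.
even = [25, 25, 25, 25]
skewed = [85, 5, 5, 5]

h_even = shannon_entropy(even)
h_skewed = shannon_entropy(skewed)
top1_skewed = top_n_cumulative_frequency(skewed, 1)
```

A repertoire dominated by a few expanded clones has a lower entropy and a higher top-clone cumulative frequency than an even one, which is the pattern reported for SLE and RA relative to HCs.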

Figure 1

Overall statistics of TCRβ repertoires for autoimmune patients and HC samples. (A) Rarefaction analysis for three groups. Each time, a certain number of subsamples was selected randomly from the total samples to calculate the unique clone number and public clone (observed in at least two subjects) number. For each sample size, the process of subsampling was repeated 50 times. (B) Amount of overlap for all unique clones and the public clones among the three groups. The text below shows the formula and results of the overlap rate. (C) Comparison of overall TCRβ repertoire diversity indices among the three groups (ANCOVA test and multiple comparisons test). The dot is the median and the grey line is the average value after being corrected by the covariate age. A high-frequency clone is a clone whose frequency is greater than 0.1%. (D) Relationship between clone incidence in each group and clone frequency. The dot in the left panel is the clone median frequency, and the line shows the ±SE of the frequencies. The clone median frequencies were divided into two groups for comparison (Mann-Whitney test). ANCOVA, analysis of covariance; freq., frequency; HC, healthy controls; J, joining; num., number; RA, rheumatoid arthritis; SLE, systemic lupus erythematosus; V, variable.

TCR clones shared by more than one individual are generally considered public. We define the publicity of a public clone as the number of individuals who share the clone. Rarefaction analysis showed that the number of public clones increased with the sample size (figure 1A). Compared with the total TCR clones, the sharing of public clones among the three groups was higher, which implies a selection advantage for public clones (figure 1B). The public clones in both SLE and RA were significantly more expanded than their counterparts in the HCs with matched publicity (figure 1D). Moreover, clones with higher publicity (eg, publicity >50) had higher clonal frequencies (figure 1D). As public clones have been reported to have biological implications with antigens,21 the above results imply that the higher-frequency clones in SLE or RA, which are shared by more individuals and reduce repertoire diversity, may contribute to the aetiopathology of these autoimmune diseases.
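The definition of publicity given above translates directly into a counting step. The sketch below is illustrative only: the sample identifiers and CDR3 strings are hypothetical, and a clone is represented simply by its CDR3 amino acid sequence, which may be coarser than the clone definition used in the study.

```python
from collections import Counter


def clone_publicity(repertoires):
    """Map each clone to the number of individuals whose repertoire contains it."""
    publicity = Counter()
    for clones in repertoires.values():
        publicity.update(set(clones))  # count each individual at most once
    return publicity


def public_clones(publicity, min_subjects=2):
    """Clones shared by at least min_subjects individuals."""
    return {clone for clone, n in publicity.items() if n >= min_subjects}


# Hypothetical repertoires keyed by sample ID.
repertoires = {
    "sample_1": ["CASSLGQGYEQYF", "CASSPTGELFF", "CASSLGQGYEQYF"],
    "sample_2": ["CASSLGQGYEQYF", "CASRDRNTEAFF"],
    "sample_3": ["CASSLGQGYEQYF", "CASSPTGELFF"],
}

publicity = clone_publicity(repertoires)
shared = public_clones(publicity)
```

Duplicate observations of a clone within one individual do not raise its publicity, because each repertoire is reduced to a set before counting.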

The gene usage comparison and its classification performance

Numerous genes showed significantly different usage between disease and health (adjusted p<0.05, figure 2A,B). The V-J pairings were more evenly distributed in RA compared with SLE and HCs (figure 1C, online supplementary figure S2). Genes and gene combinations were more discrepant between SLE and RA (figure 2C). Based on these observations, SLE can be distinguished from HCs using V genes with an area under receiver operating curve (AUC) of 99.63%, and the performance improved to 99.85% with V-J gene pairing. RA was distinguished from the HC group with AUCs of 99.78% and 99.56% with V genes and V-J pairing, respectively. SLE and RA may also be accurately separated with AUCs of 99.99% and 99.97% using V and V-J genes. J gene usage alone showed a poorer performance of 93.22% (figure 2D). Taking into account the abundance of unique sequences did not improve the overall performance of classification (online supplementary figure S3). Different discrimination models, such as linear discriminant analysis, also generated a high classification accuracy with an AUC of 99.67% between the SLE and HC groups (online supplementary figure S4A). To examine whether phenotypic differences, such as age and gender, contribute to the classification, we randomly selected samples with the same age and gender composition from the SLE and HC groups, and the AUC still reached 99.49% (online supplementary figure S4B).
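For readers less familiar with the AUC values quoted here, the following sketch computes an AUC from classifier scores using its rank interpretation: the probability that a randomly chosen patient receives a higher score than a randomly chosen control. The scores are hypothetical and the function is not part of the study's random forest pipeline.

```python
def auc_from_scores(pos_scores, neg_scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties contribute one half, matching the Mann-Whitney U formulation.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical classifier scores for patients (positives) and controls (negatives).
patient_scores = [0.9, 0.8, 0.4]
control_scores = [0.7, 0.3, 0.2]

auc = auc_from_scores(patient_scores, control_scores)
auc_separated = auc_from_scores([0.9, 0.8], [0.2, 0.1])
```

An AUC of 1.0 requires every patient to score above every control; values such as 99.63% therefore indicate that a small fraction of patient-control pairs are still ranked in the wrong order.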

Figure 2

The differential analysis of gene usage in TCRβ repertoires. (A) Vβ gene usage distribution. (B) Jβ gene usage distribution. The error bar in (A) and (B) is the SE. The red asterisk represents the p value between SLE and HCs, and the blue asterisk represents the p value between RA and HCs. Only the genes for which the average frequency is greater than 1% display the p values (Mann-Whitney test, corrected by false discovery rate). (C) Spearman correlation of Vβ gene, Jβ gene and V-J pairings between any two samples that belong to different groups. (D) ROC curves showing the performance of classification for two groups using the Vβ gene, Jβ gene and V-J pairings by 10-fold cross validation. The area under receiver operating curve and 95% CI are shown in the bottom box. AUC, area under receiver operating curve; HCs, healthy controls; J, joining; RA, rheumatoid arthritis; ROC, receiver operating characteristic; SLE, systemic lupus erythematosus; TCR, T cell receptor; TRBJ, TCR beta chain joining; TRBV, TCR beta chain variable; V, variable.

In silico identification of autoimmune disease-associated TCRs and application in disease diagnosis

Previous studies have emphasised the importance and antigenic association of public TCRs in various diseases.21 We also found that TCR clones with a higher publicity in SLE and RA had higher clonal frequencies compared with HCs (figure 1D). Importantly, by mapping the clones to manually curated TCR specificity databases,22–24 a clear trend showed that clones with high publicity had a higher probability of being identified in the database (online supplementary figure S5). Therefore, we assumed that the autoimmune disease-associated or specific TCR clones have higher publicities and clonal frequencies in patients than in HCs. With the strength of our large sample size, we developed an in silico pipeline to identify all disease-associated TCRs in SLE and RA (online supplementary note, figure 3A). The distribution of the p values, relative risks and false discovery rates for all TCR clones in the patients were plotted and evaluated (online supplementary figure S6). According to the performance of the leave-one-out cross validation with a set of p value thresholds, including the AUC, accuracy and cross-entropy loss, an appropriate p value cut-off was used to determine the disease-associated clones (online supplementary figure S7). Finally, we identified 198 SLE-associated TCR clones (p<5e-4, online supplementary table S3) and 53 RA-associated clones (p<1e-3, online supplementary table S4). To exclude the possibility that some of the associated TCR clones were misidentified due to sequence cross-contamination, a subset of samples was randomly selected, and replicated library preparation and sequencing were performed. The numbers of disease-associated TCRs in replicated samples were not reduced, ruling out random cross-contamination (online supplementary figure S8). These TCRs were disease specific or shared by significantly more patients (figure 3B).
Three hundred and twenty-seven TCRs have previously been reported to have SLE specificity,22–24 of which 156 were found in our SLE repertoires, but none was identified as SLE-associated in our study (figure 3B). Intriguingly, disease-associated TCRs are significantly correlated with disease activity in both SLE and RA, reflecting the reliability of our in silico identification, which will be discussed later.
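The exact statistical procedure is described in the online supplementary note, which is not reproduced here. As a rough illustration of how a per-clone p value and relative risk could be obtained from sample counts, the sketch below applies a one-sided Fisher's exact test to a 2×2 table. The choice of test, the function names and the example counts are all assumptions, and the "corrected supporting sample numbers" mentioned in the figure 3 legend are not modelled.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact p value for enrichment in the disease group.

    a: disease samples with the clone, b: disease samples without it,
    c: control samples with the clone, d: control samples without it.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    # Sum hypergeometric probabilities of tables at least as extreme as observed.
    numerator = sum(
        comb(row1, k) * comb(n - row1, col1 - k) for k in range(a, upper + 1)
    )
    return numerator / comb(n, col1)


def relative_risk(a, b, c, d):
    """Ratio of clone incidence in disease samples to that in control samples."""
    if c == 0:
        return float("inf")
    return (a / (a + b)) / (c / (c + d))


# Hypothetical clone seen in 30 of 877 SLE samples and 1 of 439 HC samples.
p_value = fisher_exact_greater(30, 847, 1, 438)
rr = relative_risk(30, 847, 1, 438)
is_candidate = p_value < 5e-4  # threshold quoted in the text for SLE
```

Under these assumptions, a clone would be retained as a candidate when its p value falls below the chosen cut-off, with the relative risk indicating the direction and size of the enrichment.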

Figure 3

Disease-associated clone identification and disease detection using these clones. (A) Bioinformatic pipeline for identification of disease-associated clones and the process of disease classification. (B) Publicity of TCRβ clone in disease and HC samples. The size of the spot in each position represents the abundance of the clone. Disease-associated clones identified in this study are represented by red dots. Previously reported SLE-reactive TCRβ clones are represented by brown dots. All other clones are shown in grey. (C) ROC curves showing the classification performance of a classifier for two groups by leave-one-out cross validation. The blue dot in each line is closest to the top-left part of the plot with perfect sensitivity or specificity. The inset depicts the classification results under the best accuracy. [symbol], clone’s supporting sample numbers in control. [symbol], clone’s supporting sample numbers in disease. cC, corrected supporting sample numbers in control. cD, corrected supporting sample numbers in disease. cnC, total samples in control - cC. cnD, total samples in control - [symbol]. RR, relative risk. FDR, false discovery rate. Funiq, proportion of unique specific clones presented in the sample. Fabund, proportion of total specific clones presented in the sample. [symbol], accumulated corrected supporting sample number (in disease group) of specific clones found in the sample. [symbol], accumulated corrected supporting sample number (in control group) of specific clones found in the sample. HC, healthy control; RA, rheumatoid arthritis; ROC, receiver operating characteristic; SLE, systemic lupus erythematosus; TCR, T cell receptor.

The major purpose of the current study is to validate whether autoimmune disease-associated TCRs can serve as diagnostic biomarkers. Using these TCR clones, we developed a machine learning model to classify SLE, RA and HCs (online supplementary note). In the leave-one-out cross-validation test, the AUC of the classification reached 94.27% between SLE and HC. The diagnostic accuracy was even higher for RA, with an AUC of 96.71%. Discrimination between SLE and RA was more effective still, with the AUC reaching 96.78% (figure 3C).

Features of autoimmune disease-associated TCR clones

We subsequently examined the features of these disease-associated TCR clones in SLE and RA. We found that the four clones with the highest publicities were SLE-associated TCRs, and two of them were also associated with RA (figure 4A). The clone with the highest publicity in RA was both an RA- and SLE-associated clone (online supplementary figure S9A). We also found that both the SLE- and RA-associated TCRs had apparently higher frequencies than the other unrelated clones (figure 4B). It has been reported that shorter TCR complementarity-determining region 3 (CDR3) sequences were recombined to confer susceptibility in type 1 diabetes.25 Our data indicated that the lengths of autoimmunity-associated TCR CDR3s, particularly in SLE, were significantly shorter than those of unrelated clones (figure 4C). We subsequently demonstrated that the shorter CDR3 length was mostly attributable to the reduced probabilities of longer insertions in the junction regions for the disease-associated clones (online supplementary figure S9B, C). A set of V and J genes was found to be over-represented among the disease-associated TCR clones in SLE and RA (figure 4D). The motif analysis in CDR3 indicated that SLE- and RA-associated clones shared more motifs than other unrelated clones, and these motifs were more prevalent than motifs derived from unrelated clones (figure 4E). To examine the similarities of these disease-associated TCR clones, the Levenshtein distance was used to cluster the clones. We found that disease-associated clones were significantly enriched in large clusters, which was not the case for other randomly selected unrelated clones (figure 4F,G).
We also used a published method Grouping of Lymphocyte Interactions by Paratope Hotspots (GLIPH) that integrates both naïve T cell receptor sequences and motif analysis to cluster the clones and plotted the connection among them.26 As expected, the SLE-associated TCRs were well connected and clustered, thereby illustrating the high sequence similarity among these clones. The RA-associated TCR clones were not clustered as well by this method (figure 4H). We also demonstrated our SLE- and RA-associated TCR sequences had significantly higher similarities within each group than sorted naïve T cells in the published TCR data26 (figure 4I).
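The Levenshtein-based clustering described above can be sketched as follows. The threshold follows the figure 4 legend (two CDR3s are linked if their Levenshtein distance is less than 3); the use of single-linkage grouping via union-find, the function names and the example CDR3 sequences are assumptions made for illustration. GLIPH itself is not reproduced.

```python
def levenshtein(a, b):
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def cluster_cdr3s(cdr3s, max_distance=2):
    """Single-linkage clusters: link two CDR3s when distance <= max_distance."""
    parent = list(range(len(cdr3s)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(len(cdr3s)):
        for j in range(i + 1, len(cdr3s)):
            if levenshtein(cdr3s[i], cdr3s[j]) <= max_distance:
                parent[find(i)] = find(j)

    clusters = {}
    for i, seq in enumerate(cdr3s):
        clusters.setdefault(find(i), []).append(seq)
    return sorted(clusters.values(), key=len, reverse=True)


# Hypothetical CDR3 amino acid sequences.
cdr3s = ["CASSLGGYEQYF", "CASSLGAYEQYF", "CASSLAAYEQYF", "CAWSVQETQYF"]
clusters = cluster_cdr3s(cdr3s)
```

With this toy input the three closely related sequences form one cluster and the unrelated sequence remains a singleton, mirroring the contrast between disease-associated and randomly selected clones.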

Figure 4

Characteristic analysis of SLE- and RA-associated TCRβ clones. (A) All SLE TCRβ clone frequencies are shown with their incidences in SLE samples. The SLE-specific clones overlapped with RA-specific clones are represented by black triangles. The other SLE-specific clones are represented by blue triangles. The remaining SLE clones are shown in red triangles. The results of comparison between SLE/RA specific clones and unrelated control clones are shown in (B) (C) (D) (E) (F) and (G). The SLE unrelated clones were randomly selected from all SLE clones with the exception of the SLE specific clones, and RA unrelated clones were selected by the same method. The number of unrelated clones is the same as that of the disease specific clones, and they were sampled with replacement 100 times. (B) Clone frequency distribution (Kolmogorov–Smirnov test). (C) CDR3 amino acid length distribution (Kolmogorov–Smirnov test). The error bar for other unrelated clones is the SE of 100 repeated samples. (D) Vβ gene and Jβ gene usage. The dashed lines were drawn with the slope 0.5 and 2 separately. S.O., SLE other clones. S.S., SLE specific clones. R.O., RA other clones. R.S., RA specific clones. (E) Shared motif between SLE and RA specific clones (left panel) and SLE and RA other clones (right panel). Motifs shared by both SLE and RA groups and observed in at least two samples are marked in red, and the number in the top left corner shows the number of motifs in red. (F) SLE and RA specific clone clustering using Levenshtein distance. (G) SLE and RA other clone clustering using Levenshtein distance. Two CDR3s are clustered together in (F) and (G) if the Levenshtein distance is less than 3. Dot represents a CDR3 and CDR3s in a cluster were connected by a line. Red dot, shared by both SLE and RA. Blue dot, RA clones. Purple dot, SLE clones. (H) SLE-specific clones (top panel) and RA-specific clones (bottom panel) were clustered by the tool GLIPH.
The dot represents a clone, and two dots are connected if they share a significant motif (compared with naïve clones) or have a similar CDR3 region. (I) Minimum Hamming distance of clones in disease-specific clones compared with equal-sized randomly sampled naïve T cell receptor clone pool. The error bar in the figure is the SE of 100 repeated random samples of naïve clones (χ² test). (J) Comparison of SLE and RA specific clones. R.S., RA-specific. S.S., SLE-specific. CDR3, complementarity-determining region 3; freq., frequency; J, joining; RA, rheumatoid arthritis; SLE, systemic lupus erythematosus; TCR, T cell receptor; V, variable.

Of the 53 RA-associated TCRs, 32 (60.4%) clones can be found in SLE patients, and 46 of 198 (23.2%) SLE-associated TCRs can be found in RA. Among them, we determined that 12 TCR clones were both RA- and SLE-associated TCRs (figure 4J). Moreover, of the 32 RA-associated TCRs found in SLE, 81.3% (26/32) had higher publicity in SLE than in HC, implying their potential association with SLE. Similarly, 76.1% (35/46) of the SLE-associated TCRs found in RA showed the same trend (figure 4J). These findings indicate that disease-associated TCRs reflect a common autoimmunity. Disease-associated clones in RA and SLE tended to preferentially use similar V genes, such as V7-7 or V6-4 (figure 4D). Moreover, RA- and SLE-associated clones were clustered in the same large cluster, including nine clones associated with both RA and SLE, which indicates the high similarity among these clones (figure 4F). Taken together, this evidence supports the existence of disease-associated TCR clones in the entire autoimmune disease spectrum.

The divergence and heterogeneity of the TCR repertoire in autoimmune diseases

The correlations of the V and J gene usage and V-J pairing were the lowest in the SLE patients, which indicates the high heterogeneity of TCR gene usage in SLE patients. However, these metrics in RA patients were even higher than in the HC samples, which implies that RA exhibits higher homogeneity in this respect (figure 5A). We then measured the clonal overlap between every two subjects within each group and found that the values in SLE and RA were smaller than in the HCs, with SLE being the smallest (figure 5B). Furthermore, the publicities of public clones were substantially smaller in SLE and RA than in HC (figure 5C,D). This evidence supports the high heterogeneity in autoimmune diseases, particularly in SLE. Hierarchical clustering on all samples by the public TCR clones also revealed higher heterogeneity in SLE and RA, although no apparent clusters appeared (figure 5E-G).
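The clonal overlap between two subjects is reported in figure 5B as a Jaccard index. A minimal sketch of that measure is given below; the function name and the example clone sets are hypothetical.

```python
def jaccard_index(clones_a, clones_b):
    """Shared unique clones divided by all unique clones in the two repertoires."""
    a, b = set(clones_a), set(clones_b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


# Hypothetical repertoires of two subjects, each clone given as a CDR3 string.
subject_1 = {"CASSLGQGYEQYF", "CASSPTGELFF", "CASRDRNTEAFF"}
subject_2 = {"CASSPTGELFF", "CASRDRNTEAFF", "CASSQDRGNTIYF"}

overlap = jaccard_index(subject_1, subject_2)
```

Lower pairwise values within a group indicate that its members share fewer clones, which is how the text infers higher heterogeneity in SLE.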

Figure 5

Heterogeneity of TCRβ repertoire in autoimmune diseases. (A) Spearman correlation of Vβ gene, Jβ gene and V-J pairings between any two subjects (Mann-Whitney test). (B) Overlapped rate (Jaccard index) of TCRβ clone repertoire between any two subjects (Mann-Whitney test). (C) Sample incidences of the top 50 clones for three groups. (D) Percentage of clone number in total clones are shown with the clone’s publicity. (E) Top 100 (ranking by publicity) SLE specific clones (y-axes) distributed among all SLE samples (x-axes). The red indicates the clone was observed in the sample, and the blue indicates the clone was not observed in the sample. (F) All 56 RA specific clone distributions among all RA samples. (G) Top 100 clones (ranking by publicity) distributed in HC samples. HC, healthy control; J, joining; RA, rheumatoid arthritis; SLE, systemic lupus erythematosus; TCR, T cell receptor; V, variable.

The clinical implication of the TCR features and enrichment of pathogen-related TCRs in autoimmunity

We subsequently investigated whether the TCR repertoire properties were correlated with the clinical features of these autoimmune diseases. In SLE, the Gini index of the V-J gene combinations and the content of SLE-associated clones were positively correlated with the disease activity measured by the SLE Disease Activity Index (SLEDAI) (figure 6A,C, online supplementary figure S10A). The content of SLE-associated TCR clones and the total unique clone number were correlated with the complement C3 and C4 measurements in SLE, respectively. Furthermore, the content of SLE-associated clones, the total frequency of the top 100 clones and the Gini index of V and J gene pairing were also correlated with the number of damaged organs (figure 6A, online supplementary figure S10A). Additionally, we found that some medications, such as steroids, could lower the diversity by increasing the frequencies of expanded clones in SLE (online supplementary figure S10A).
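The Gini index used here summarises how unevenly usage is distributed across V-J combinations. The sketch below uses the standard sorted-values formula on toy counts; the function name and inputs are hypothetical, and the study may have used a different estimator.

```python
def gini_index(values):
    """Gini index of non-negative values: 0 is perfectly even usage."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Standard formula on ascending values:
    # G = sum((2i - n - 1) * x_i) / (n * sum(x)), with i starting at 1.
    weighted = sum((2 * i - n - 1) * x for i, x in enumerate(xs, start=1))
    return weighted / (n * total)


# Toy V-J pairing counts: evenly used versus dominated by one pairing.
even_usage = [10, 10, 10, 10]
skewed_usage = [0, 0, 0, 10]

g_even = gini_index(even_usage)
g_skewed = gini_index(skewed_usage)
```

A higher Gini index means that a few V-J combinations dominate the repertoire, which is consistent with its positive correlation with disease activity reported above.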

Figure 6

Clinical association analysis and TCRβ repertoire’s annotation by manually curated TCR specificity databases. The heatmap shows p values between the repertoire diversity indices (ANOVA test and multiple comparisons test) or contents of disease-specific clones (ANOVA test and multiple comparisons test) and the clinical information in SLE (A) and RA (B). (C) Percentage of SLE specific clones in patients with different SLEDAI and C3. (D) Percentage of RA specific clones in patients with multiple clinical factors. (E) TCRβ repertoire’s annotation for each sample (x-axes) by databases TBAdb, VDJdb, McPAS-TCR. The colour presents the percentage of clones that were found in the database in the sample’s total unique clone number. The right panel shows the p values between two groups (Mann-Whitney test, corrected by false discovery rate). R, RA. S, SLE. C, HC. -, test is not available due to small dataset. (F) Violin plots illustrating the percentage of annotated clones for significantly different groups in (E). MS, multiple sclerosis. ANOVA, analysis of variance; DAS28, disease activity score 28; ESR, erythrocyte sedimentation rate; HC, healthy control; RA, rheumatoid arthritis; SLE, systemic lupus erythematosus; TCR, T cell receptor.

It has been reported that organ-specific factors, such as tissue resident cells, may contribute to inflammation and tissue injury in SLE.27 Therefore, we assessed whether organ-specific autoantigens may drive the selection of TCR clones in patients with specific organ lesions. Our data showed that patients with skin damage tended to have more similar TCR repertoires, but no clear evidence supported TCR selection for other specific organ lesions. Intriguingly, patients without organ lesions shared the closest repertoires (online supplementary figure S11A, B). Patients with blood system involvement tended to have more divergent repertoires (online supplementary figure S11A-C). We also identified a significantly shorter CDR3 length in patients with blood system involvement; however, the contribution of deletion or insertion was unclear (online supplementary figure S11D-F). Notably, the total frequency of SLE-associated TCR clones was the highest in patients with kidney damage, which implies a more active state (online supplementary figure S10A).

In RA, we also identified a higher clonal expansion and lower diversity, measured by the Shannon entropy, the unique clone number and the number of expanded clones, in patients with high ESR and C-reactive protein, which could indicate disease severity in RA (figure 6B). Intriguingly, in antinuclear antibody (ANA) positive patients, we identified a higher repertoire diversity measured by the unique clone number and a lower expansion of RA-associated TCRs (figure 6B,D and online supplementary figure S10B). In both autoimmune diseases, we found that disease-associated clones were more strongly correlated with the clinical features than the overall diversity indices, which further validated the reliability of disease-associated clone identification (figure 6A,B).

Strikingly, when we compared the occurrence of annotated TCR sequences from the manually curated TCR specificity databases TBAdb,22 VDJdb23 and McPAS-TCR24 in autoimmune disease samples with that in HCs, autoimmunity displayed a significant enrichment in several categories, most of which were pathogen-related. SLE harboured significantly more T cells targeting influenza and Epstein-Barr virus (EBV), and tuberculosis-specific T cells were enriched in both SLE and RA. Furthermore, we observed the prevalence of multiple sclerosis (MS), allergy and cancer-related T cells in SLE (figure 6E,F).


We have developed an in silico pipeline and identified a substantial number of disease-associated TCRs in SLE and RA. Of note, published SLE-associated TCRs are either absent from our data or have low publicity or clonal frequencies. This finding is not unexpected, and the major reasons include antigenic TCR privacy, cross reactivity with different antigens and lack of TCRα pairing information, which have been carefully discussed in cytomegalovirus (CMV) infection.28 In our study, other possibilities should also be raised, such as different human leukocyte antigen (HLA) backgrounds and, most importantly, the extremely heterogeneous autoantigenic exposure in SLE, as shown in our results. Using the identified SLE- or RA-associated TCRs, we can distinguish SLE or RA from HCs, and SLE from RA, with very high accuracy. These findings suggest that TCRs may be better biomarkers than traditional serum immunological markers, such as antinuclear antibody (ANA) and anti-double stranded DNA (anti-dsDNA) antibody. Although the phenomenon of epitope spreading further diversifies the TCR response from its original epitopes to newer epitopes over time, our findings suggest that an immune response exists to a limited set of common autoantigens in SLE or RA patients, which may help in developing targeted therapy and vaccinations for SLE and RA.

The interplay between autoimmunity and infections has been discussed for many years but remains elusive. There is evidence that viruses or other infectious agents, such as EBV and CMV, could trigger autoimmunity and drive pathogenesis. Several mechanisms could explain their co-occurrence, such as molecular mimicry that stimulates the cross activation of T cells or dysregulated activation of the host immune system.29 30 Our discovery of the enrichment of pathogen-specific T cells in autoimmune diseases is in line with clinical observations and supplements the evidence from a new perspective. As we showed in this study, given the sharing of associated T cells in both SLE and RA, it is not surprising that TCRs targeting other autoimmune-related phenotypes, such as MS or allergy, are more prevalent in SLE. The enrichment of cancer-related TCRs may be explained by the cross reactivity of self-antigens, as indicated by the association between autoimmunity and cancer.31

In summary, our large-scale work moves a step forward in demonstrating the clinical utility of TCR repertoire sequencing to assist the diagnosis, treatment and potentially early detection of autoimmune diseases.



  • XL, WZ and MZ contributed equally.

  • Handling editor Prof Josef S Smolen

  • Correction notice This article has been corrected since it was published Online First. The fourth author's name has been corrected.

  • Contributors QL, XL, WZ and MZ designed the study, analysed the data and wrote the manuscript. LF, JW, LW, LL, SW, LL, XW, TL, YZ, ZW, MF and HP performed TCR sequencing and data analysis. LL, SL, ZW, YL, YY, JJ, YT, BZ, XG, CH, QL, XL, JC, FL, GL, HZ, HL, ZX, JL, YL, HY, HL and HW collected the samples and information of patients and healthy subjects. QL, XL, HY and JW supervised the study.

  • Funding This study was supported by the National Key Research and Development Program of China (2016YFC0903900), the National Natural Science Foundation of China (No. 81430074, No.81522038 and No. 81220108017), the Key research and development plan of Hunan Province (2017SK2042) and the Shenzhen Municipal Government of China (No. JCYJ20170817145536203 and JCYJ20170817145428361).

  • Competing interests None declared.

  • Patient consent for publication Obtained.

  • Ethics approval The study has been approved by the ethical committee of the Second Xiangya Hospital of Central South University.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Data availability statement The study data have been made publicly available at the pan immune repertoire database (PIRD),22 which is located in China National GeneBank (CNGB). The project IDs in PIRD are P18081001, P18081101 and P18080801.