
Single-cell analysis reveals fibroblast heterogeneity and myofibroblasts in systemic sclerosis-associated interstitial lung disease
  1. Eleanor Valenzi1,
  2. Melissa Bulik2,
  3. Tracy Tabib3,
  4. Christina Morse3,
  5. John Sembrat1,
  6. Humberto Trejo Bittar4,
  7. Mauricio Rojas1,
  8. Robert Lafyatis3
  1. Division of Pulmonary, Allergy and Critical Care Medicine, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
  2. Department of Human Genetics, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
  3. Division of Rheumatology and Clinical Immunology, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
  4. Department of Pathology, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
  Correspondence to Dr Eleanor Valenzi, Division of Pulmonary, Allergy and Critical Care Medicine, University of Pittsburgh, Pittsburgh, Pennsylvania, USA; valenzie{at}

Abstract

Objectives Myofibroblasts are key effector cells in the extracellular matrix remodelling of systemic sclerosis-associated interstitial lung disease (SSc-ILD); however, the diversity of fibroblast populations in the healthy and SSc-ILD lung is unknown, which has prevented specific study of the myofibroblast transcriptome. We sought to identify and define the transcriptomes of myofibroblasts and other mesenchymal cell populations in healthy and SSc-ILD human lungs to understand how alterations in fibroblast phenotypes lead to SSc-ILD fibrosis.

Methods We performed droplet-based, single-cell RNA-sequencing with integrated canonical correlation analysis of 13 explanted lung tissue specimens (56 196 cells) from four healthy controls and four patients with SSc-ILD, with findings confirmed by cellular indexing of transcriptomes and epitopes by sequencing in additional samples.

Results Examination of gene expression in mesenchymal cells identified two major fibroblast populations (SPINT2hi and MFAP5hi) and one minor population (WIF1hi) in the healthy control lung. Combined analysis of control and SSc-ILD mesenchymal cells identified SPINT2hi and MFAP5hi fibroblasts, a few WIF1hi fibroblasts and a large new myofibroblast population with evidence of active proliferation. We compared differential gene expression between all SSc-ILD and control mesenchymal cell populations, as well as among the fibroblast subpopulations, showing that myofibroblasts undergo the greatest phenotypic changes in SSc-ILD and strongly upregulate expression of collagens and other profibrotic genes.

Conclusions Our results demonstrate previously unrecognised fibroblast heterogeneity in SSc-ILD and healthy lungs, and define multimodal transcriptome-phenotypes associated with these populations. Our data indicate that myofibroblast differentiation and proliferation are key pathological mechanisms driving fibrosis in SSc-ILD.

  • systemic sclerosis
  • fibroblast
  • pulmonary fibrosis


Key messages

What is already known about this subject?

  • Systemic sclerosis-associated interstitial lung disease (SSc-ILD) is a devastating complication of SSc, with high morbidity and mortality, and limited effective treatments.

  • In SSc-ILD, myofibroblasts are the key fibrotic effector cell due to their excessive extracellular matrix production and acquired contractile phenotype.

What does this study add?

  • For the first time, we identify fibroblast heterogeneity in SSc-ILD and healthy human lung, and define the transcriptome-phenotype of these and other mesenchymal cell populations.

How might this impact on clinical practice or future developments?

  • These results provide new insights into the transcriptome of pathogenic myofibroblasts, supporting the future development of new targeted therapies directed at these key cells.

Introduction

Systemic sclerosis (SSc) is an autoimmune disorder with diverse clinical manifestations, including fibrosis of the skin and visceral organs, as well as vasculopathy. With limited effective treatments available, SSc continues to result in substantial morbidity and mortality. Pulmonary complications, including interstitial lung disease (ILD) and pulmonary arterial hypertension, remain the leading cause of disease-related mortality in SSc.1 While the current paradigm for disease pathogenesis suggests multiple processes, including activation of the innate and adaptive immune systems, small vessel vasculopathy, and aberrant transforming growth factor (TGF)-β signalling inducing fibroblast dysfunction, the precise pathophysiology remains uncertain.2–4

In fibrotic ILD, myofibroblasts play a pivotal role in aberrant extracellular matrix remodelling due to their dual features: having the collagen-synthesising capacity of fibroblasts and the contractile capacity of smooth muscle.5 With enhanced contractility, myofibroblasts also induce progressive tissue stiffness, creating a perpetuating profibrotic stimulus.6 7 Myofibroblasts are currently believed to derive from multiple sources, including resident mesenchymal progenitors, pericytes, bone marrow-derived fibrocytes, resident fibroblasts, and mesenchymal transition of endothelial cells and epithelial cells.8–14 Studies evaluating the origin and behaviour of myofibroblasts have traditionally defined these cells as fibroblasts expressing ACTA2 (α-smooth muscle actin), a definition that lacks specificity because various other cell types, including pericytes, smooth muscle cells and myoepithelial cells, also express ACTA2.

To better examine changes occurring in the lungs of patients with SSc-ILD, we used droplet-based, single-cell RNA-sequencing (scRNA-seq) and cellular indexing of transcriptomes and epitopes by sequencing (CITE-seq),15 a method for simultaneously measuring cell surface proteins and messenger RNA (mRNA) transcripts at the single-cell level, for multimodal analysis of lung tissues from patients with SSc-ILD and healthy controls. We identified all the major cell populations in healthy and SSc-ILD lungs, including myofibroblasts and multiple previously unrecognised fibroblast subpopulations, confirmed with epitope expression by CITE-seq, allowing for detailed analysis of the transcriptome-phenotype of each unique fibroblast population.

Methods

scRNA-seq library preparation was performed using the 10X Genomics Chromium System per the manufacturer’s protocol. Libraries were sequenced using an Illumina NextSeq 500 through the University of Pittsburgh Genomics Core Sequencing Facility. Data analysis was performed in R (V.3.5) with the R package Seurat (V.2.3.4).16 17 To minimise batch effects when combining multiple samples for integrated analysis, an individual object was created for each sample, and the objects were then aligned by canonical correlation analysis using Seurat’s RunMultiCCA function.18 Differential gene expression analysis of SSc-ILD versus control cells within each cluster was performed using the Wilcoxon rank-sum test and the model-based analysis of single-cell transcriptomics (MAST) test.19 A Bonferroni correction was applied to the Wilcoxon p values to account for multiple comparisons. Changes in the mean proportion of cells comprising each cell type were compared using the non-parametric Kruskal-Wallis test with Dunn’s multiple comparison test for the overall cell types, and using the Mann-Whitney test for the fibroblast subpopulations. P values less than 0.05 were considered statistically significant. Patients and the public were not involved in the design of this study.
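The per-gene Wilcoxon rank-sum comparison with Bonferroni adjustment described above was run in R through Seurat. The standard-library Python sketch below is an illustrative re-implementation, not the authors' code. It assumes a two-sided test on per-cell expression values of one gene in two groups, and uses the normal approximation with a tie correction (no continuity correction). MAST, which fits a hurdle model, is not reproduced. The gene names and values are hypothetical toy data.

```python
import math
from typing import Dict, List, Sequence


def rank_with_ties(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Advance j to the end of the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def rank_sum_p(x: Sequence[float], y: Sequence[float]) -> float:
    """Two-sided Wilcoxon rank-sum (Mann-Whitney U) p value, normal approximation."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = list(x) + list(y)
    ranks = rank_with_ties(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    # Tie correction: subtract sum(t^3 - t) over groups of t tied values.
    tie_sizes: Dict[float, int] = {}
    for v in pooled:
        tie_sizes[v] = tie_sizes.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in tie_sizes.values())
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u <= 0:
        return 1.0  # every value identical: no evidence of a difference
    z = (u1 - mean_u) / math.sqrt(var_u)
    return math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail


def bonferroni(p_values: Dict[str, float]) -> Dict[str, float]:
    """Multiply each p value by the number of tests, capping at 1."""
    m = len(p_values)
    return {gene: min(1.0, p * m) for gene, p in p_values.items()}


if __name__ == "__main__":
    # Toy per-cell values for two hypothetical genes (not study data).
    ssc = {"GENE_A": [5.0, 6.1, 7.2, 5.5, 6.8, 7.0],
           "GENE_B": [1.0, 0.0, 1.2, 0.9, 0.0, 1.1]}
    ctrl = {"GENE_A": [1.0, 0.5, 1.5, 0.0, 0.8, 1.2],
            "GENE_B": [1.1, 0.0, 0.9, 1.0, 0.0, 1.2]}
    raw = {g: rank_sum_p(ssc[g], ctrl[g]) for g in ssc}
    print(bonferroni(raw))
```

The tie correction matters for single-cell data, where many cells share a value of zero for a given gene.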

Other comprehensive experimental methods and specific materials are detailed in online supplementary file 1.

Results

Study population

We analysed 13 lung tissue specimens by scRNA-seq from explanted tissues obtained from four patients with SSc-ILD at the time of lung transplant and four healthy controls (organ donors with lungs unable to be transplanted). Separate upper and lower lobe samples were included for each patient with SSc-ILD (8 SSc-ILD samples) and for one control, with only one sample available for each of the other controls (5 control samples). Explanted tissue from two additional SSc-ILD samples and one additional healthy control was used for confirmation of scRNA-seq findings at the transcriptome and epitope level. We reviewed the pathology of adjacent lung tissue and clinical information for all samples (table 1, online supplementary figure 1). Seven of the SSc-ILD samples showed usual interstitial pneumonia (UIP) on histology, with varied amounts of lymphoid aggregates, increased chronic inflammation and myointimal thickening of the pulmonary arteries. One upper lobe SSc-ILD sample showed non-specific interstitial pneumonia (NSIP) with acute lung injury. While NSIP is the most common histopathological pattern of ILD occurring in SSc overall,20 the predominance of UIP within these samples may reflect the end-stage lung disease seen more commonly in patients requiring transplant, and is consistent with a prior microarray study of SSc-ILD which observed UIP in all explant samples.21

Table 1

Characteristics of patient samples: demographics, number of cells analysed after filtering in the scRNA-seq analysis, pathological review of adjacent tissue and clinical characteristics of the patient samples included

Analysis of SSc-ILD and control transcriptomes reveals multiple cell populations

In total, 56 196 cells were analysed, with 21 768 cells from healthy controls and 34 428 cells from patients with SSc-ILD. Clusters were labelled by cell type using expression of previously described markers (figure 1, online supplementary figure 2). Macrophages and monocytes were identified in seven separate clusters, which divided into three primary subgroups: the first expressing SPP1, CCL2 and MERTK (SPP1hi), the second expressing FABP4, INHBA and SERPING1 (FABP4hi), and the third a population of monocytes expressing FCN1, IL1B and IL1R2 (FCN1hi) (online supplementary figure 3A). Proliferating cells from multiple cell types, predicted by cell phase analysis to be in the S or G2/M phase, clustered separately owing to their high expression of genes associated with active cell proliferation (online supplementary figure 3B, 4A, 4C). Proliferating macrophages of the FABP4hi phenotype were the predominant proliferating cell population; however, a higher proportion of proliferating SPP1hi macrophages appeared in SSc-ILD samples compared with healthy controls (online supplementary figure 3C). Cluster 13, containing control and SSc-ILD macrophages and lymphocytes with low gene expression, likely represented damaged and dying cells.
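A common approach to this kind of cell phase analysis (for example, Seurat's cell-cycle scoring) gives each cell an S score and a G2/M score from marker gene sets, calls G1 when both are negative, and otherwise assigns the higher-scoring phase. The sketch below imitates that decision rule under a simplifying assumption: each score is the mean expression of the phase genes minus the mean of all other genes in the cell, rather than expression-matched control gene sets. The three-gene marker subsets and the expression values are illustrative only, and this is not the authors' exact procedure.

```python
from typing import Dict, Iterable

# Small illustrative subsets of commonly used S-phase and G2/M marker genes.
S_GENES = ["MCM5", "PCNA", "TYMS"]
G2M_GENES = ["MKI67", "TOP2A", "CDK1"]


def module_score(cell: Dict[str, float], genes: Iterable[str]) -> float:
    """Mean expression of `genes` minus the mean of all other genes in the cell.

    Simplification: real implementations compare against random control genes
    matched for average expression.
    """
    gene_set = set(genes)
    in_set = [v for g, v in cell.items() if g in gene_set]
    background = [v for g, v in cell.items() if g not in gene_set]
    if not in_set or not background:
        return 0.0
    return sum(in_set) / len(in_set) - sum(background) / len(background)


def assign_phase(cell: Dict[str, float]) -> str:
    """Call G1, S or G2M from the two module scores."""
    s_score = module_score(cell, S_GENES)
    g2m_score = module_score(cell, G2M_GENES)
    if s_score < 0 and g2m_score < 0:
        return "G1"  # neither proliferation programme is above background
    return "S" if s_score > g2m_score else "G2M"


if __name__ == "__main__":
    # Hypothetical normalised expression for one cell.
    toy_cell = {"MCM5": 0.1, "PCNA": 0.2, "TYMS": 0.1, "MKI67": 3.0,
                "TOP2A": 2.5, "CDK1": 2.8, "ACTB": 1.0, "COL1A1": 0.5}
    print(assign_phase(toy_cell))
```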

Figure 1

Single-cell RNA-sequencing analysis of five human healthy control and eight SSc-ILD lung tissue samples. (A) Visualisation of clustering by t-SNE plot of all 13 combined healthy control and SSc-ILD samples, identified by cell type. (B) Reclustering of the original clusters (clusters 5, 6 and 10 in figure 1A) containing multiple bronchial and alveolar epithelial cell types demonstrating separation into individual epithelial cell types. (C) t-SNE plot of cells coloured according to disease status; all clusters contained cells from both SSC and control samples. (D) Heat map of scaled gene expression data for the top 5 differentially expressed genes identifying each cluster, with selected genes listed. (E) t-SNE plot of cells coloured according to sample of origin, demonstrating all clusters contain cells from all samples. SSc-ILD, systemic sclerosis-associated interstitial lung disease; SSC, systemic sclerosis; t-SNE, t-distributed stochastic neighbour embedding.

Changes in cell populations present

Analysing the proportion of total cells present in each population by sample and disease status revealed several significant changes between normal lungs and SSc-ILD (figure 2). The populations of smooth muscle cells and pericytes increased significantly in SSc-ILD upper lobes compared with healthy control lungs (p=0.0209), with endothelial cells also showing a trend towards increased numbers in SSc-ILD lungs. Total macrophages and monocytes fell from 60.84% of cells in controls to 53.33% of cells in SSc-ILD upper lobes and 49.09% of cells in SSc-ILD lower lobes. Proliferating macrophages increased from only 1.06% of cells in controls to 1.73% in SSc-ILD upper lobes and 2.95% in SSc-ILD lower lobes (p=0.0111). The proportion of natural killer cells decreased significantly from 7.34% of cells in controls to 3.80% in SSc-ILD upper lobes and 1.34% in SSc-ILD lower lobes (p=0.0125). No consistent changes were seen in the proportion of fibroblasts, alveolar type 1 or alveolar type 2 cells. In contrast, ciliated, club and basal cells all trended towards increased numbers in a graded fashion, with more cells in SSc-ILD upper lobes than in controls and more in SSc-ILD lower lobes than in SSc-ILD upper lobes, likely reflecting the mucociliary and basal epithelial cells lining honeycomb cysts.22 23
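The comparison above works on per-sample percentages rather than on pooled cells, so each sample contributes one value per cell type. A minimal sketch of that step and of a three-group Kruskal-Wallis test (control, SSc-ILD upper lobe, SSc-ILD lower lobe) follows, assuming cell-type labels are available per sample. It uses the chi-squared approximation, whose survival function for 2 degrees of freedom is exp(-H/2); with group sizes this small, statistical software may report exact p values instead, so numbers from this sketch need not match those reported. Dunn's post hoc test is not shown, and all inputs are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def percent_of_cells(labels_by_sample: Dict[str, Sequence[str]],
                     cell_type: str) -> Dict[str, float]:
    """Percentage of each sample's cells that carry the label `cell_type`."""
    return {sample: 100.0 * Counter(labels)[cell_type] / len(labels)
            for sample, labels in labels_by_sample.items()}


def kruskal_wallis_p(groups: List[Sequence[float]]) -> float:
    """Kruskal-Wallis test for three groups using the chi-squared approximation.

    With df = 2, the chi-squared survival function is exp(-H / 2).
    """
    if len(groups) != 3:
        raise ValueError("this sketch handles exactly three groups (df = 2)")
    pooled = sorted(v for g in groups for v in g)
    n = len(pooled)
    avg_rank: Dict[float, float] = {}
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1] == pooled[i]:
            j += 1
        avg_rank[pooled[i]] = (i + j) / 2 + 1  # mean of 1-based ranks i+1..j+1
        i = j + 1
    h = 12 / (n * (n + 1)) * sum(
        sum(avg_rank[v] for v in g) ** 2 / len(g) for g in groups
    ) - 3 * (n + 1)
    # Divide by the tie correction factor 1 - sum(t^3 - t) / (n^3 - n).
    correction = 1 - sum(t ** 3 - t for t in Counter(pooled).values()) / (n ** 3 - n)
    h = max(h / correction, 0.0) if correction > 0 else 0.0
    return math.exp(-h / 2)


if __name__ == "__main__":
    # Hypothetical per-sample percentages of one cell type in each group.
    control = [7.9, 6.8, 7.5, 6.6, 7.1]
    upper = [4.2, 3.5, 3.9, 3.6]
    lower = [1.5, 1.2, 1.4, 1.1]
    print(kruskal_wallis_p([control, upper, lower]))
```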

Figure 2

Mean percentage of total cells that comprised each cell type, comparing control, upper lobe SSc-ILD and lower lobe SSc-ILD samples. Bars indicate the mean percentage of total cells, with error bars indicating SEM. *=P value <0.05. SSc-ILD, systemic sclerosis-associated interstitial lung disease.

Smooth muscle cells and pericytes

Fibroblasts, smooth muscle cells and pericytes, as well as fibroblasts from the proliferating cell cluster (online supplementary figure 4), were combined and reclustered to allow clearer identification of these and any rare mesenchymal cell subpopulations (figure 3, online supplementary figure 5). Fibroblasts, robustly marked by expression of LUM, PDGFRA and FBLN1, formed four clusters, with the fibroblast subpopulations detailed below. Smooth muscle cells highly expressed ACTA2, DES, MYH11 and PLN (figure 3E, online supplementary figure 5). Differential gene expression between SSc-ILD and control lungs was analysed for all mesenchymal populations (online supplementary file 2).

Figure 3

Single-cell RNA-sequencing analysis of human healthy control and SSc-ILD mesenchymal cell populations. (A) t-SNE plot of combined fibroblast, smooth muscle/pericyte and proliferating fibroblast cells as identified by cell type and fibroblast subpopulations. (B) Volcano plot of differentially expressed genes (log2-fold change >0.5, adjusted p value <0.05) from the comparison of all SSC fibroblasts with all control fibroblasts. Results showed that 461 genes were upregulated and 115 genes were downregulated by greater than twofold. (C) Gene expression of CD34, demonstrating high CD34 expression in MFAP5hi fibroblasts, and THY1, demonstrating high THY1 expression in myofibroblasts, MFAP5hi fibroblasts and pericytes. (D) Mean proportion of total fibroblasts that each fibroblast subpopulation comprises in SSc-ILD and control lungs, calculated by individual sample. (E) Expression of selected collagen and cell type specific genes by mesenchymal population. Dot size corresponds to the percentage of cells in a cluster expressing the gene, and dot colour corresponds to the average expression level for the gene in the cluster. (F) t-SNE plot of healthy control fibroblasts only. (G) Violin plots of gene expression of SPINT2, MFAP5 and WIF1 by control fibroblast cluster. (H) Gene expression plots demonstrating high expression of SPINT2 and CD14 in SPINT2hi fibroblasts, MFAP5 and CD34 in MFAP5hi fibroblasts, and WIF1 and ITGA10 in WIF1hi fibroblasts. Dot colour corresponds to the level of gene expression in each cell. *=P value<0.05. SSc-ILD, systemic sclerosis-associated interstitial lung disease; SSC, systemic sclerosis; t-SNE, t-distributed stochastic neighbour embedding.

A distinct pericyte population, markedly expanded in SSc-ILD (p=0.0295), was identified by its expression of the known markers RGS5, PDGFRB, MCAM, CSPG4 (NG2) and NES (figure 3E, online supplementary figures 5, 6).24 25 Although none of these markers were exclusive to pericytes, this cluster was the only population expressing all of these genes. The identified pericyte population showed enhanced expression of FAM162B, CHN1, IGFBP2 and HIGD1B compared with other mesenchymal cells, with FAM162B the most specific identifier of this population. Pericytes did not separate into the previously described subclassification of type 1 (NES−/CSPG4+) and type 2 (NES+/CSPG4+).26 27
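The pericyte call rests on one cluster being the only population to express all five markers, with FAM162B singled out as the most specific gene. A minimal sketch of both checks follows, assuming per-cluster detection rates (fraction of cells with nonzero counts) as input. The 0.25 detection threshold, the `specificity` score (detection in the target cluster minus the highest detection elsewhere) and all numbers are hypothetical choices for illustration, not the study's criteria.

```python
from typing import Dict, Iterable, List

PERICYTE_MARKERS = ["RGS5", "PDGFRB", "MCAM", "CSPG4", "NES"]


def clusters_expressing_all(frac: Dict[str, Dict[str, float]],
                            markers: Iterable[str],
                            min_frac: float = 0.25) -> List[str]:
    """Clusters in which every marker is detected in at least `min_frac` of cells."""
    markers = list(markers)
    return [cluster for cluster, f in frac.items()
            if all(f.get(g, 0.0) >= min_frac for g in markers)]


def specificity(frac: Dict[str, Dict[str, float]], cluster: str, gene: str) -> float:
    """Detection rate in `cluster` minus the highest detection rate elsewhere."""
    others = [f.get(gene, 0.0) for c, f in frac.items() if c != cluster]
    return frac[cluster].get(gene, 0.0) - max(others, default=0.0)


if __name__ == "__main__":
    # Hypothetical detection rates per cluster.
    frac = {
        "pericyte": {"RGS5": 0.9, "PDGFRB": 0.8, "MCAM": 0.7, "CSPG4": 0.6,
                     "NES": 0.4, "FAM162B": 0.7},
        "smooth_muscle": {"RGS5": 0.3, "PDGFRB": 0.4, "MCAM": 0.6, "CSPG4": 0.5,
                          "NES": 0.1, "FAM162B": 0.05},
        "fibroblast": {"RGS5": 0.05, "PDGFRB": 0.6, "MCAM": 0.1, "CSPG4": 0.1,
                       "NES": 0.3, "FAM162B": 0.02},
    }
    print(clusters_expressing_all(frac, PERICYTE_MARKERS))
    candidates = PERICYTE_MARKERS + ["FAM162B"]
    print(max(candidates, key=lambda g: specificity(frac, "pericyte", g)))
```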

Fibroblast subpopulations in controls

To guide our understanding of fibroblast heterogeneity in the combined analysis of control and SSc-ILD lungs, we separately analysed only the fibroblasts from healthy control lungs. Two major subpopulations and one minor subpopulation of fibroblasts emerged, with all groups containing cells from each control sample (figure 3F-H). The first major population was defined by expression of SPINT2, CD14, LMCD1, FGFR4 and FIGF (SPINT2hi fibroblasts). A second major population was defined by expression of MFAP5, CD34, THY1, SLPI and PLA2G2A (MFAP5hi fibroblasts). A distinct minor population was distinguished by expression of WIF1 and ITGA10 (WIF1hi fibroblasts).
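Once clusters exist, subpopulation names such as these are typically attached by checking which marker set each cluster expresses most strongly. A hedged sketch of that labelling step follows, using the marker lists given above. It assumes average expression per cluster as input and scales each gene across clusters (a z-score) so that highly expressed markers do not dominate; the scaling choice and the toy values are assumptions, and clustering itself (done in Seurat in this study) is not shown.

```python
import statistics
from typing import Dict, List

# Marker genes for the control fibroblast subpopulations, as listed in the text.
MARKERS: Dict[str, List[str]] = {
    "SPINT2hi": ["SPINT2", "CD14", "LMCD1", "FGFR4", "FIGF"],
    "MFAP5hi": ["MFAP5", "CD34", "THY1", "SLPI", "PLA2G2A"],
    "WIF1hi": ["WIF1", "ITGA10"],
}


def scale_across_clusters(avg_expr: Dict[str, Dict[str, float]]) -> Dict[str, Dict[str, float]]:
    """Z-score each gene's average expression across clusters."""
    genes = {g for expr in avg_expr.values() for g in expr}
    scaled: Dict[str, Dict[str, float]] = {cluster: {} for cluster in avg_expr}
    for g in genes:
        values = [avg_expr[c].get(g, 0.0) for c in avg_expr]
        mu, sd = statistics.mean(values), statistics.pstdev(values)
        for c in avg_expr:
            scaled[c][g] = (avg_expr[c].get(g, 0.0) - mu) / sd if sd > 0 else 0.0
    return scaled


def label_clusters(avg_expr: Dict[str, Dict[str, float]],
                   markers: Dict[str, List[str]] = MARKERS) -> Dict[str, str]:
    """Give each cluster the subpopulation whose markers score highest on average."""
    scaled = scale_across_clusters(avg_expr)
    labels = {}
    for cluster, z in scaled.items():
        # Markers absent from the data contribute a neutral score of 0.
        scores = {name: statistics.mean(z.get(g, 0.0) for g in genes)
                  for name, genes in markers.items()}
        labels[cluster] = max(scores, key=scores.get)
    return labels


if __name__ == "__main__":
    # Hypothetical average expression per cluster (only some markers measured).
    avg = {
        "0": {"SPINT2": 3.0, "CD14": 2.0, "MFAP5": 0.5, "CD34": 0.4, "WIF1": 0.1, "ITGA10": 0.2},
        "1": {"SPINT2": 0.5, "CD14": 0.3, "MFAP5": 4.0, "CD34": 3.0, "WIF1": 0.1, "ITGA10": 0.1},
        "2": {"SPINT2": 0.4, "CD14": 0.2, "MFAP5": 0.6, "CD34": 0.5, "WIF1": 2.5, "ITGA10": 1.8},
    }
    print(label_clusters(avg))
```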

Fibroblast subpopulations in SSc-ILD

Examining fibroblast populations in the combined analysis of control and SSc-ILD mesenchymal cells, we again identified distinct populations of SPINT2hi and MFAP5hi fibroblasts, with each containing both control and SSc-ILD fibroblasts. An additional large population of fibroblasts, containing primarily SSc-ILD cells, expressed the highest level of ACTA2 (3.04-fold increase compared with other fibroblast populations), consistent with this population representing the contractile myofibroblasts. This population did not express the smooth muscle specific markers MYH11 and DES, and had the highest expression of several collagen genes among the mesenchymal cells (figure 3E). The myofibroblasts exhibited high expression of THY1 and low expression of CD34. A group of proliferating myofibroblasts clustered separately based on coexpression of cell proliferation genes and myofibroblast genes. The total myofibroblasts increased from 10.66% of fibroblasts in controls to 62.75% of fibroblasts in SSc-ILD (figure 3D). This may overestimate the proportion of control myofibroblasts, however, as the WIF1hi population was counted within this group, and no myofibroblast population was identified when analysing the control fibroblasts alone. The minor WIF1hi population consisted almost entirely of healthy control fibroblasts. Intersample variability in the presence of and marker gene expression for each fibroblast population is detailed in online supplementary figures 6 and 7.

Using CITE-seq, we confirmed the presence of myofibroblasts, SPINT2hi fibroblasts and MFAP5hi fibroblasts in separate SSc-ILD and healthy control samples, based on their transcriptome signatures and the surface markers CD34 and CD90 (THY1) (figure 4A,C-D). Although increased CD34 mRNA distinguished MFAP5hi fibroblasts, this difference was smaller at the level of surface protein expression (figure 4C-D), and neither protein distinguished the myofibroblast population well. To confirm the population putatively identified as myofibroblasts, we stained SSc-ILD and control lungs for α-smooth muscle actin and collagen triple helix repeat containing 1 (CTHRC1), a gene we found to be highly and selectively upregulated in the myofibroblasts (figure 4B).
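Antibody-derived (protein) counts from CITE-seq are usually normalised before CD34 or CD90 levels are compared between clusters. The normalisation used in this study is not stated in this section (see online supplementary file 1); the sketch below assumes a generic centred log-ratio (CLR) transform with a pseudocount of 1, applied to one antibody across cells, which is one common choice for such data. The counts and cluster labels are hypothetical.

```python
import math
import statistics
from typing import Dict, List, Sequence


def clr(counts: Sequence[float]) -> List[float]:
    """Centred log-ratio transform: log(1 + x) minus the mean of log(1 + x)."""
    logs = [math.log1p(c) for c in counts]
    centre = statistics.mean(logs)
    return [v - centre for v in logs]


def mean_by_group(values: Sequence[float], groups: Sequence[str]) -> Dict[str, float]:
    """Average a per-cell value within each cluster label."""
    by_group: Dict[str, List[float]] = {}
    for v, g in zip(values, groups):
        by_group.setdefault(g, []).append(v)
    return {g: statistics.mean(vs) for g, vs in by_group.items()}


if __name__ == "__main__":
    # Hypothetical CD34 antibody counts for six cells in two clusters.
    cd34_adt = [120, 95, 130, 40, 35, 50]
    clusters = ["MFAP5hi"] * 3 + ["myofibroblast"] * 3
    print(mean_by_group(clr(cd34_adt), clusters))
```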

Figure 4

CITE-seq of four additional SSc-ILD (SSC 9–12) and one additional healthy control lung (control 6) and immunofluorescence staining of SSc-ILD and control lung. (A) t-SNE plot of combined fibroblast, smooth muscle and pericyte cells, identified by subpopulation/cluster. (B) Serial sections of control and SSc-ILD lung with immunofluorescence with DAPI nuclear staining and trichrome staining. SMA and CTHRC1 coexpress in areas of disorganised myofibroblasts, with SMA+/CTHRC1− cells staining smooth muscle. Trichrome staining demonstrates excessive collagen deposition (blue) in SSc-ILD lungs. (C) Violin plots of gene expression of CD34 and THY1 and protein expression of CD34 (labelled as CD34-CITE) and CD90/THY1 (labelled as THY1-CITE) as detected by oligonucleotide-labelled antibodies. (D) Gene and protein expression of CD34 and THY1 (CD90). Dot colour corresponds to the level of gene expression in each cell. CITE-seq, cellular indexing of transcriptomes and epitopes by sequencing; CTHRC1, collagen triple helix repeat containing 1; DAPI, 4’,6-diamidino-2-phenylindole; SMA, α-smooth muscle actin; SSc-ILD, systemic sclerosis-associated interstitial lung disease; SSC, systemic sclerosis; t-SNE, t-distributed stochastic neighbour embedding.

Differential gene expression in SSc-ILD is driven by myofibroblasts

Comparing differential gene expression of all fibroblasts from SSc-ILD with that of controls, POSTN, CORIN, KIF26B, FNDC1, SEZ6L2 and LAMP5 were the top upregulated genes (figure 3B). Many of the upregulated non-collagen genes, including POSTN, KIAA1324L, COMP, TDO2, ADAM12, MXRA5, ALDH1A3 and LRRC17, were expressed distinctly by the SSc-ILD myofibroblasts, supporting the hypothesis that myofibroblasts undergo the greatest phenotypic changes in SSc-ILD.
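Selecting and ranking differentially expressed genes comes down to two thresholds, which the figure 3B legend gives as log2-fold change >0.5 and adjusted p<0.05. A minimal sketch of that filter follows, assuming results arrive as (gene, log2 fold change, adjusted p value) records; the gene names and values are hypothetical. Note that a log2-fold change of 0.5 corresponds to 2^0.5, about 1.41-fold, and a twofold change to a log2-fold change of 1; whether a given tool reports fold changes on a log2 or natural-log scale should be checked before applying such thresholds.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class DEResult:
    gene: str
    log2fc: float  # log2 fold change, SSc-ILD relative to control
    p_adj: float   # multiple-testing adjusted p value


def split_significant(results: Iterable[DEResult], min_abs_log2fc: float = 0.5,
                      alpha: float = 0.05) -> Tuple[List[DEResult], List[DEResult]]:
    """Return (upregulated, downregulated) genes passing both thresholds,
    each sorted from the largest absolute change downwards."""
    results = list(results)
    up = [r for r in results if r.p_adj < alpha and r.log2fc > min_abs_log2fc]
    down = [r for r in results if r.p_adj < alpha and r.log2fc < -min_abs_log2fc]
    up.sort(key=lambda r: r.log2fc, reverse=True)
    down.sort(key=lambda r: r.log2fc)
    return up, down


def linear_fold_change(log2fc: float) -> float:
    """Convert a log2 fold change to a linear fold change."""
    return 2.0 ** log2fc


if __name__ == "__main__":
    toy = [DEResult("GENE_1", 2.3, 1e-10), DEResult("GENE_2", 0.8, 0.01),
           DEResult("GENE_3", 0.4, 1e-5), DEResult("GENE_4", -1.2, 0.001),
           DEResult("GENE_5", 1.5, 0.2)]
    up, down = split_significant(toy)
    print([(r.gene, round(linear_fold_change(r.log2fc), 2)) for r in up])
    print([r.gene for r in down])
```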

Differential expression of myofibroblasts

To evaluate the synthetic properties and functions of the three major fibroblast populations, differentially expressed genes (adjusted p<0.05, absolute log2-fold change >0.5) distinguishing each cluster from the other two were identified and evaluated for enriched gene ontology biological processes (online supplementary file 3). In myofibroblasts, 237 genes were upregulated by greater than twofold, including COL10A1 (22.93-fold), DPEP1 (13.61-fold), TSPAN2 (6.45-fold), POSTN (6.39-fold) and CTHRC1 (5.18-fold) (figure 5A,D). The upregulated genes included 15 collagen genes, numerous other genes essential to collagen synthesis, multiple metalloendopeptidases and FBXO32, an E3 ubiquitin ligase that mediates post-translational modification by ubiquitination. Enriched processes among the upregulated genes reflect increased collagen and extracellular matrix synthesis by the myofibroblasts (figure 5G). Enriched processes among the downregulated genes reflect a reduced response to normal regulatory processes, including decreased regulation of cell proliferation and cell death, consistent with the pathological expansion of this subgroup in SSc-ILD.
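Gene ontology enrichment asks whether a term's genes are over-represented among the differentially expressed genes relative to a background set. The sketch below shows the basic one-sided hypergeometric (Fisher's exact) tail probability behind such tests. The study used DAVID, which reports the EASE score, a more conservative modification of Fisher's exact test that this sketch does not reproduce; the choice of background (all detected genes) and the gene sets are assumptions for illustration.

```python
from math import comb
from typing import Iterable, Set


def hypergeom_sf(k: int, population: int, successes: int, draws: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws)."""
    upper = min(successes, draws)
    total = comb(population, draws)
    tail = sum(comb(successes, i) * comb(population - successes, draws - i)
               for i in range(k, upper + 1))
    return tail / total


def term_enrichment_p(de_genes: Iterable[str], term_genes: Iterable[str],
                      background: Set[str]) -> float:
    """One-sided over-representation p value of a GO term among DE genes."""
    de = set(de_genes) & background
    term = set(term_genes) & background
    overlap = len(de & term)
    return hypergeom_sf(overlap, len(background), len(term), len(de))


if __name__ == "__main__":
    # Hypothetical: 20 detected genes, a 5-gene term, 5 DE genes (4 in the term).
    background = {f"G{i}" for i in range(20)}
    term = {"G0", "G1", "G2", "G3", "G4"}
    de = {"G0", "G1", "G2", "G3", "G10"}
    print(term_enrichment_p(de, term, background))
```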

Figure 5

Differential gene expression comparing each major fibroblast cluster (myofibroblast, SPINT2hi fibroblast, MFAP5hi fibroblast) with the other two. (A–C) Volcano plots include all differentially expressed genes with absolute log2-fold change >0.5, coloured according to whether the adjusted p value is <0.05. (D–F) Violin plots demonstrate gene expression of selected distinguishing genes for each fibroblast subpopulation. (G–I) Enriched GO biological processes for the upregulated and downregulated differentially expressed genes from each comparison of one major fibroblast cluster with the other two. Functional enrichment analysis was performed using DAVID, including all differentially expressed genes with adjusted p value <0.05 and absolute log2-fold change >0.5. Panels A, D and G show the comparison of myofibroblasts with SPINT2hi and MFAP5hi fibroblasts; panels B, E and H show the comparison of SPINT2hi fibroblasts with myofibroblasts and MFAP5hi fibroblasts; panels C, F and I show the comparison of MFAP5hi fibroblasts with myofibroblasts and SPINT2hi fibroblasts. BMP, Bone Morphogenetic Protein; DAVID, Database for Annotation, Visualization, and Integrated Discovery; GO, gene ontology.

Differential expression of SPINT2hi fibroblasts

Within the SPINT2hi fibroblasts, GRIA1 (13.74-fold), KANK3 (8.44-fold), FIGF (7.13-fold), SPINT2 (6.16-fold), FGFR4 (5.27-fold) and TCF21 (2.97-fold) were among the 98 genes with greater than twofold increased expression (figure 5B,E). While this subgroup exhibited substantially less expression of the abundant collagen genes COL12A1, COL1A1 and COL1A2, other collagens including COL13A1 (2.63-fold) and COL6A6 (1.60-fold) showed increased expression within this subgroup.

Differential expression of MFAP5hi fibroblasts

In the MFAP5hi fibroblasts, TNNT3 (39.89-fold), MFAP5 (26.34-fold), PI16 (18.87-fold), IGF2 (18.49-fold) and ACKR3 (16.16-fold) were among the 114 genes with greater than twofold increased expression (figure 5C,F). Three members of the Wnt-related secreted frizzled-related protein (SFRP) family—SFRP1 (13.45-fold), SFRP4 (3.34-fold) and SFRP2 (2.52-fold)—were upregulated, although none were exclusive to this fibroblast subset.

Discussion

There is presently a gap in knowledge regarding the heterogeneity of fibroblast populations in the human lung and their detailed phenotypic changes in SSc-ILD. In this study we provide a comprehensive view of fibroblasts and other mesenchymal cell populations present in both SSc-ILD and healthy control lungs, and identify new markers of myofibroblasts and other lung fibroblast populations. While fibroblasts with high expression of ACTA2 formed a distinct subtype, other fibroblast subgroups expressed low-level ACTA2 in both normal and diseased tissues, precluding its use as a unique marker of myofibroblasts. Given the high synthetic capacity of the myofibroblasts, excessive levels of type I collagen transcription, dramatic expansion and evidence of active proliferation in SSc-ILD, our results support the current disease paradigm that myofibroblasts are the key profibrotic effector cell.

Although transcriptome data alone cannot conclusively identify myofibroblast origin, our data support the model that, in SSc-ILD, myofibroblasts first differentiate from other lung mesenchymal populations, then proliferate. Control lungs demonstrated a paucity of myofibroblasts when examined alone, with a striking expansion of myofibroblasts appearing in SSc-ILD samples, including a subpopulation of actively proliferating myofibroblasts. While myofibroblasts likely differentiate from multiple sources in disease, we hypothesise that MFAP5hi fibroblasts may act as progenitors in SSc-ILD. The MFAP5hi fibroblasts clustered closest to the myofibroblasts on the t-SNE plot, reflecting their more similar transcriptome; they also expressed more collagen than the SPINT2hi fibroblasts and had elevated expression of multiple Wnt regulators, consistent with the overexpression of SFRP genes previously reported in both idiopathic pulmonary fibrosis lungs and SSc skin.28–30
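Distances between clusters on a t-SNE plot are not reliable measures of transcriptional similarity, because t-SNE mainly preserves local neighbourhoods. One hedged way to check a statement such as "MFAP5hi fibroblasts are closest to myofibroblasts" is to compare cluster centroids in expression (or principal component) space. The sketch below assumes each cluster is given as a list of cells, each a vector over the same genes, and uses Euclidean distance; the three-gene toy values are hypothetical and this is not an analysis performed in the study.

```python
import math
from typing import Dict, List, Sequence


def centroid(cells: Sequence[Sequence[float]]) -> List[float]:
    """Mean expression vector of a cluster (cells x genes)."""
    return [sum(column) / len(cells) for column in zip(*cells)]


def nearest_cluster(clusters: Dict[str, Sequence[Sequence[float]]], query: str) -> str:
    """Name of the cluster whose centroid is closest (Euclidean) to `query`'s."""
    centroids = {name: centroid(cells) for name, cells in clusters.items()}
    target = centroids[query]
    distances = {name: math.dist(target, c)
                 for name, c in centroids.items() if name != query}
    return min(distances, key=distances.get)


if __name__ == "__main__":
    # Hypothetical expression of three genes in two cells per cluster.
    toy_clusters = {
        "myofibroblast": [[5.0, 4.0, 1.0], [5.2, 4.1, 0.9]],
        "MFAP5hi": [[4.0, 3.0, 1.2], [4.2, 3.1, 1.0]],
        "SPINT2hi": [[1.0, 1.0, 4.0], [1.1, 0.9, 4.2]],
    }
    print(nearest_cluster(toy_clusters, "myofibroblast"))
```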

Pericytes are believed to contribute to fibrosis through their transformation into myofibroblasts, as well as their direct production of collagen.11 31 32 While the SSc-ILD myofibroblasts expressed significantly more collagen than the other mesenchymal cell populations (figure 3E), the SSc-ILD pericytes expressed COL1A2 and COL3A1 at levels similar to the SPINT2hi and MFAP5hi fibroblasts, whereas healthy control pericytes expressed much less collagen. In vitro and murine studies have demonstrated TGF-β1 induces transformation of pericytes to myofibroblasts.13 33 Although our data provide no direct evidence of pericyte to myofibroblast transformation, the marked expansion of pericytes in SSc-ILD samples is consistent with the possibility that pericytes play an important role in SSc-ILD. As all of the patients with SSc-ILD in our study also had WHO group 3 pulmonary hypertension due to chronic lung disease, the pericyte expansion may also play a role in this complication.

Comparing our scRNA-seq data with recently published analyses of murine lung mesenchymal populations,34 35 increased expression of COL13A1 and TCF21 among the SPINT2hi fibroblasts was analogous to the description of a subgroup of COL13A1 matrix fibroblasts in mice.34 ITGA8, NPNT, LBH and MFAP4, among others, were also conserved across species between these groups of fibroblasts, although none were exclusive to the SPINT2hi group, and most were expressed to a lesser degree by the myofibroblasts as well. However, none of the fibroblast subpopulations we observed were consistent with previously described lipofibroblasts, characterised by the lipid-droplet trafficking protein perilipin 2 (also known as ADRP, or adipose differentiation-related protein), as this was similarly expressed in all human fibroblasts.8 36 Other proposed lipofibroblast markers including LIPA, LPL and FABP5 also did not differentiate any specific fibroblast population, suggesting the lipofibroblast designation is not as phenotypically relevant in human lung, or possibly that these cells were lost during scRNA-seq processing.37 38 Recent studies have reported varying results as to whether lipid-droplet-containing cells are present in the human lung.8 39 No analogous human population corresponded to the murine COL14A1 matrix fibroblasts, and the newly proposed markers of murine myofibroblasts, such as Hhip, Mustn1 and Grem2, did not distinguish the human myofibroblast population. In comparison with the SFRP2/DPP4 fibroblasts recently identified in human skin,40 both the MFAP5hi lung fibroblasts and myofibroblasts expressed SFRP2 and DPP4, with the MFAP5hi fibroblasts expressing higher PCOLCE2 and CD55 compared with the other fibroblasts. Unlike in the skin, WIF1hi lung fibroblasts did not express SFRP2 or NKD2, and other markers of dermal fibroblasts did not differentiate pulmonary fibroblast subpopulations.

Although all samples were from end-stage disease and predominantly demonstrated UIP on histology, we examined both typically less advanced upper lobes and more fibrotic lower lobes in order to capture tissues reflective of a spectrum of the disease course. Case series identifying NSIP as the predominant histopathology in SSc used surgical lung biopsies,20 and may only reflect the distribution of disease patterns in early disease, rather than at transplant or death. For example, a 2001 case series including pathology from autopsy and biopsy noted a UIP pattern in 44% of cases,41 and a past report of gene expression in SSc-ILD explants observed UIP in all samples.21 Studying explant tissue is valuable as these patients all progressed to end-stage disease, and thus there is the most critical need for improved understanding of their disease pathogenesis in order to develop new therapeutic options. Comparing our data with a previous microarray analysis of non-end-stage NSIP SSc-ILD tissue obtained by surgical lung biopsy, among the top 40 differentially expressed genes by microarray, COMP, POSTN, FKBP11, COL3A1, COL1A1 and TDO2 were all distinctly expressed by the SSc-ILD myofibroblasts in our scRNA-seq analysis, thus strongly supporting the generalisability of our findings to patients with NSIP and earlier disease.

Our study was limited by its relatively small cohort of patient and control samples. Because patients with SSc-ILD now rarely undergo surgical lung biopsy, explanted lungs are the only consistent source of tissue for new investigative analyses. As many transplant centres continue to avoid transplanting patients with SSc due to their coincident oesophageal disease, the availability of tissue is limited to select centres and precludes acquiring large numbers of samples. We were unable to perform age and sex matching due to reliance on explanted tissue, and thus the average age and sex of control (35.2 years, 40% male) and SSc-ILD lung subjects (56.75 years, 75% male) differed. The canonical correlation analysis methodology aligned cell types well despite such potential bias.18 Additionally, due to a difference in the reagent chemistry and digestion protocol used in processing these samples, we did not combine the two SSc-ILD and one control sample used for CITE-seq with the other 13 samples, and instead chose to use these samples as a validation cohort. Analytic methods for scRNA-seq data are rapidly advancing and may in the near future allow for improved normalisation and integrated analysis of multiple samples, despite interindividual variation and batch effects, in order to create larger combined data sets from multiple investigators. Our study was also limited by the inability to complete immunohistological verifications of all fibroblast subpopulations at this time, due to the absence of reliable antibodies for immunohistochemistry of the relevant markers.

In summary, our analysis harnesses the distinct capacity of scRNA-seq and CITE-seq to discern new fibroblast heterogeneity in human SSc-ILD and healthy control lungs, providing new insights into these pathogenic cells at an unprecedented multimodal level. The expression signature of mRNAs and selected surface proteins (or transcriptome map) now available for the pathogenic myofibroblasts adds considerably to our knowledge of this key effector cell in fibrotic lung diseases and provides new insights into its functional importance.

Acknowledgements

The authors would like to acknowledge the University of Pittsburgh Medical Center lung transplantation team for procurement of the lungs, the Center for Organ Recovery and Education (CORE), and the organ donors and their families for the generous donation of tissues used in the study.



  • Handling editor Professor Josef S Smolen

  • Contributors RL and EV conceived the project. EV wrote the manuscript and RL edited the manuscript. JS and MR provided patient samples. TT, CM and EV performed the experiments, and TT performed the single-cell RNA-seq. EV performed the RNA-seq analysis. HTB performed the histological assessment of samples. MB and CM performed the immunological staining. All authors provided editorial commentary of the manuscript.

  • Funding Research reported in this publication was supported by the National Institutes of Health National Institute of Arthritis and Musculoskeletal and Skin Diseases under award number 2P50AR060780 (RL) and the National Heart, Lung, and Blood Institute under award numbers R01HL123766 (RL) and 2T32HL007563-31 (EV). The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

  • Competing interests RL has received consulting fees from PRISM BioLab, Merck, Bristol Myers Squibb, Biocon, Formation, Genentech/Roche, UCB and Sanofi, and grant support from Elpidera, Kiniksa and Regeneron, outside the submitted work.

  • Patient consent for publication Not required.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Data availability statement For scRNA-seq data, raw counts in sparse matrix format for all samples are available from the public, open-access Gene Expression Omnibus (GEO) repository under accession number GSE128169.
