Salivary gland changes, characterised by a focal lymphocytic sialadenitis, play an important role in the diagnosis of primary Sjögren's syndrome (PSS) and were first described over 40 years ago. Recent evidence suggests that minor salivary gland biopsy may also provide information useful for prognostication and stratification, yet difficulties may arise in the histopathological interpretation and scoring, and evidence exists that reporting is variable. With the increasing number of actual and proposed clinical trials in PSS, we review the evidence that might support the role of histopathology as a biomarker for stratification and response to therapy and highlight areas where further validation work is required.
- Sjøgren's Syndrome
- Outcomes research
Salivary gland biopsy is a commonly performed diagnostic procedure in primary Sjögren's syndrome (PSS). Under the American-European Consensus Group Criteria (AECG),1 the presence of either anti-Ro or La antibodies or a positive salivary gland biopsy is mandatory. Although designed as classification criteria, they have been widely used to support a diagnosis of PSS, and in centres where biopsy is frequently undertaken, as many as 40% of cases are autoantibody negative.2 Furthermore, salivary gland biopsy may have a prognostic role in relation to lymphoma development.
With an increasing number of clinical trials in PSS, we believe there is a need to (i) provide guidance on standardisation of salivary gland histopathology to ensure homogeneity of study populations, and (ii) understand its potential and capability as a biomarker of response. This paper reviews the relevant background data.
Current histopathological guidance for diagnosis
A lip biopsy is usually undertaken to obtain minor salivary glands (MSGs). However, parotid gland biopsy is sometimes performed, and in experienced hands is associated with a low complication rate.3 A positive biopsy has been defined as a focal lymphocytic sialadenitis (FLS) with a focus score (FS) of ≥1 per 4 mm².4–6 FLS describes the presence of ≥1 aggregate of ≥50 mononuclear cells, mostly lymphocytes, in a perivascular or periductal location, typically adjacent to normal acini.6 It should be recognised at the outset that FLS may occur in conjunction with other autoimmune diseases and in healthy individuals, and so is not by itself diagnostic of PSS.7 A protocol for the analysis of MSG biopsies is available on the Sjögren's International Collaborative Clinical Alliance consortium website.8 This recommends collection of 3–5 MSGs that are surgically separated (as opposed to wedge resection) and specifies that foci associated with FLS "are adjacent to normal-appearing mucous acini, in lobes or lobules that lack duct dilation and contain no more than a minority proportion of plasma cells". The diagnosis of FLS "is assigned when these foci are the only inflammation present in a specimen, or the most prominent feature". The FS is only provided when a ‘diagnosis’ of FLS is made and is calculated by dividing the number of foci by the total glandular surface area in mm², and multiplying by 4, to give the number of foci per 4 mm². Above an FS of 10, foci are typically confluent and an arbitrary score of 12 is often applied.4 ,5
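The FS arithmetic described above can be sketched in a few lines of Python. This is a minimal illustration rather than a validated scoring tool: the function and parameter names are hypothetical, and the substitution of 12 for any computed score above 10 simply follows the convention mentioned in the text. It assumes FLS has already been diagnosed by a pathologist, since the FS is only reported in that case.

```python
def focus_score(n_foci: int, glandular_area_mm2: float,
                confluent_score: float = 12.0) -> float:
    """Return the number of foci per 4 mm² of glandular tissue.

    Assumes a diagnosis of FLS has already been made on the specimen.
    """
    if n_foci < 0:
        raise ValueError("number of foci cannot be negative")
    if glandular_area_mm2 <= 0:
        raise ValueError("glandular surface area must be positive")
    fs = n_foci / glandular_area_mm2 * 4
    # Above an FS of 10 foci are typically confluent, and an arbitrary
    # score of 12 is often applied instead of the computed value.
    return confluent_score if fs > 10 else fs


def is_positive_biopsy(fs: float) -> bool:
    """A positive biopsy is defined as an FS of at least 1 per 4 mm²."""
    return fs >= 1
```

For example, three foci in 8 mm² of glandular tissue give an FS of 1.5, which meets the ≥1 threshold.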
Difficulties in interpretation may arise, however, given that features more usually associated with non-specific chronic sialadenitis (NSCS), such as acinar atrophy, interstitial fibrosis and duct dilatation, are relatively common and increase with age,9 and so may coexist with PSS-related FLS. NSCS itself is often accompanied by infiltration and even foci of lymphocytes. Furthermore, the replacement of glandular tissue with fibrotic tissue as a result of both age and chronic salivary gland inflammation10 may both lower the FS and lead to a ‘burnt-out’ appearance. In some cases, anti-Ro antibody-positive PSS may show patterns of inflammation not meeting these criteria.
Nevertheless, this description of FLS with subsequent calculation of an FS is to be preferred over the older Chisholm and Mason score,11 which is still widely reported. Grades 1 and 2 in the latter scheme encompass a qualitative description of lymphocyte infiltration, which is relatively common in the general population, and grades 3 and 4 require the presence of lymphocyte foci.11 Importantly, this non-linear scheme would allow the scoring of patterns of inflammation seen commonly in NSCS and does not enumerate severity above an FS of 1.6
Differences in practice include acquisition of only single MSGs and use of multiple cutting levels.2 ,12 Given the stochastic distribution of foci in FLS, examining an insufficient glandular area may lead to underestimation or overestimation of the FS. Altogether, it is not surprising that variability in reporting has been observed between centres.13
Biopsy as a prognostic tool?
In a subset of patients, lymphocyte foci may start taking on features reminiscent of secondary lymphoid organs and develop compartmentalised B-cell-rich and T-cell-rich zones.14 ,15 These are associated with ectopic production of lymphoid chemokines (CXCL13 and CCL21) and expression of the enzyme, activation-induced cytidine deaminase (AID).16 ,17 AID enables local maturation of autoimmune responses by facilitating affinity maturation and class switching of the antibody response, and is expressed in association with CD21+ follicular dendritic cell (FDC) networks.
While the presence of segregated foci with discrete B and T cell areas is relatively common, the presence of fully formed germinal centre (GC)-like structures, characterised by light and dark zones, is only present in a minority of patients.16 This distinction is likely to be functional since survival of high-affinity somatic-mutated B cells in the light zone is dependent upon antigen presentation and local expression of survival factors by FDC. Indeed, increased expression of autoantibodies, both in the tissue and in the periphery, has been associated with GC formation in the salivary glands.18
DNA hypermutation is associated with genetic instability, and recent data have suggested that GC-like structures may provide prognostic information in relation to the risk of lymphoma, which occurs in up to 5–10% of patients over prolonged follow-up.19 This possibility was first suggested in 1999,20 but more recently, Theander et al21 found that 6 out of 7 lymphomas observed in a cohort of 175 patients occurred in those with GCs in baseline biopsies, giving a 99% negative predictive value in the absence of GCs. GC formation is associated with higher FS, so it is not surprising that a similar observation was made in a separate cohort, in which an FS of <3 was also associated with a negative predictive value of 98%, whereas FS ≥3 had a positive predictive value of 16%.22 Importantly, the median follow-up between biopsy and lymphoma was 7 years in the former study and 68 months in the latter, suggesting that biopsy characteristics might be both stable over time and capable of stratifying patients.
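The predictive values quoted above follow from a standard 2×2 table of biopsy finding against later lymphoma. The sketch below shows the arithmetic only. The counts in the usage example are hypothetical, chosen for illustration, and are not taken from either cohort.

```python
def predictive_values(true_pos: int, false_pos: int,
                      false_neg: int, true_neg: int) -> tuple[float, float]:
    """Return (PPV, NPV) from the four cells of a 2x2 table.

    true_pos:  marker present (e.g. GCs or FS >= 3), lymphoma developed
    false_pos: marker present, no lymphoma
    false_neg: marker absent, lymphoma developed
    true_neg:  marker absent, no lymphoma
    """
    marker_present = true_pos + false_pos
    marker_absent = true_neg + false_neg
    if marker_present == 0 or marker_absent == 0:
        raise ValueError("both marker-present and marker-absent groups are needed")
    ppv = true_pos / marker_present
    npv = true_neg / marker_absent
    return ppv, npv


# Hypothetical counts, for illustration only (not data from the cited studies).
ppv, npv = predictive_values(true_pos=8, false_pos=42, false_neg=2, true_neg=148)
print(f"PPV = {ppv:.0%}, NPV = {npv:.1%}")
```

Because lymphoma is uncommon, a high negative predictive value can coexist with a modest positive predictive value, which is the pattern both cohorts report.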
Light microscopy is considered sufficient to allow accurate detection of fully formed GC-like structures on H&E stained sections. However, the reported prevalence of GC-like structures is quite variable, ranging between 18% and 59% of patients with PSS.14 Immunohistochemical staining for FDC networks, using CD21 as a marker, might improve the reliability and consistency of GC identification. However, this remains to be tested and whether this improves prognostic value should be evaluated.
MSG biopsy as an outcome measure
The pressing need for validated outcome measures to support clinical trials has been addressed by the European League Against Rheumatism (EULAR) with the development of EULAR's Sjögren's Syndrome Disease Activity Index (ESSDAI)23 and EULAR's Sjögren's Syndrome patient reported index (ESSPRI).24 While an important advance, the ESSDAI focuses on systemic disease features, and so is less relevant to patients with predominantly glandular manifestations. The ESSPRI addresses the symptomatic components of dryness, pain and fatigue, the latter having a particularly important impact on quality of life,25 but which might be susceptible to placebo effects or the impact of concomitant disorders, leading to important implications for sample size. An objective biomarker of glandular inflammation would therefore be desirable, and biopsy has the added advantage that it may offer insights into mechanism of action of a novel agent, or more importantly, reasons for failure in a negative study. Outcome Measures in Rheumatology have proposed a filter to examine the applicability of proposed outcome measures,26 and we consider the available evidence in regard to this below.
Truthfulness encompasses the concepts of face (credibility), content (comprehensiveness), criterion and construct validity. Face validity would be supported by the association of FLS with the presence of keratoconjunctivitis sicca (KCS), and of the severity of KCS with the FS.27 In a large study of patients presenting with dryness symptoms, an FS ≥1 was also associated with antibodies to Ro and La, rheumatoid factor and antinuclear antibody and lower unstimulated salivary flow.6 ,28 ,29 A higher FS predicts a greater decline in unstimulated salivary flow over time.30 The FS has also been reported to correlate moderately with stimulated salivary flow,5 ,10 ,31 and with the presence of fibrosis but not with atrophy.10 Change in stimulated salivary flow correlated with disease duration. Although a small study found no correlation between change in FS over time and change in stimulated salivary flow, a correlation between FS and stimulated salivary flow at the follow-up time point was observed.32
In relation to the comprehensiveness of MSG biopsy, it should be noted that most saliva is produced by the major salivary glands (parotid, submandibular and sublingual) with only 5–10% produced by the 600–1000 MSGs.33 Notably, the predominant parenchymal cells in parotid and MSGs differ, being serous and mucous acinar cells, respectively, and the relevance of this to the pathology of PSS has not been well studied. Few studies have compared major salivary gland and MSG biopsies. Pijpe et al undertook concurrent parotid and MSG biopsies in 30 patients investigated for the presence of PSS. The diagnostic sensitivity and specificity were identical and the presence of foci, confluence and GCs were similar, although the actual FS were not presented as the emphasis of the study was on diagnostic capability.3 In contrast, an older study of 31 subjects reported greater diagnostic sensitivity, but lower specificity, of sublingual versus MSG biopsies, but this predated the use of the AECG criteria and used a FS cut-off of >1 as opposed to ≥1.34 Again, a comparison of the actual FS obtained was not undertaken. Notably, both studies reported a lower incidence of lymphoepithelial lesions35 in MSGs compared with major glands. Lymphoepithelial lesions describe ductal infiltration by lymphocytes and accompanying basal epithelial cell proliferation. It has been suggested that these may also be associated with lymphoma development and, were this to be the case, the source of tissue would be a consideration for stratification and outcome.
Criterion validity tests the ability of a biomarker to agree with a gold standard. The ‘gold standard’ for diagnosis of PSS remains expert clinical opinion, and a recent systematic review found the diagnostic sensitivity and specificity of biopsy to be 72% and 87%, respectively.36 While this reinforces the relevance of MSG as a potential biomarker, it should be noted that only two out of the nine studies that were included made attempts to avoid the circular reasoning inherent in testing the diagnostic capability of a marker that is already in use in existing classification criteria. The ability of MSG biopsy to stratify patients into low-risk and high-risk groups for lymphoma, as already alluded to, provides some evidence to support predictive validity.
Construct validity describes consistency with theoretical concepts and one could consider this high, given the importance placed in pathogenic models on the infiltration of exocrine glands by autoreactive lymphocytes, leading to dryness through the induction of epithelial apoptosis or dysfunction. This seems probable even despite possible factors contributing to dryness that may not directly relate to extent of inflammation, such as neurally mediated reduction in salivary flow37 or functional antimuscarinic antibodies.38 Overall, despite some gaps in knowledge, existing evidence seems to support the truthfulness of FS assessment in relation to its use as a biomarker in PSS.
Critical to the ability of MSG biopsy to detect change (discriminant validity) are the issues of sensitivity to change and reliability. Uncontrolled case series involving small numbers of subjects and obsolete classification criteria and scoring systems (Talal39 and Tarpley40) suggested that MSG lesions were progressive over time but were improved with cyclophosphamide.40 ,41 Later small studies in subjects without treatment, or using agents not considered effective, suggested that the FS may be stable over time or, at most, progress slowly (table 1).
In order to consider biopsy as a biomarker, an understanding of variability and reliability is required. As an illustration of the problems that might arise, Gescuk et al randomised 14 patients meeting AECG criteria to treatment with lamivudine or placebo. There was a statistically significant difference in FS at baseline (6.0 vs 1.8; p=0.0087), although also in salivary flow, rheumatoid factor and anti-Ro positivity, suggesting these groups were poorly matched.42 In a study predating the AECG criteria, but where the presence of a focus in a 4 mm² representative section of tissue was an inclusion criterion, only the placebo group (n=8) showed a reduction in FS.43 An improved understanding of the natural history of FLS, the variability of scores in the patient population and the reliability of its assessment using MSG biopsies would be desirable and also important for statistical power calculations.
The reproducibility of MSG biopsy has been questioned. Al-Hashimi found that when taking additional sections the reproducibility of biopsy grade in all six sections was present in <40%.44 However, the scoring system used was Chisholm and Mason, and the most variable grades were II and III. Greater reproducibility was observed at grade IV, which might encompass many patients enrolled in a clinical trial. This also reflects the fact that in mild FLS foci are unevenly distributed, arguing for an increase in surface area to improve reliability. This was demonstrated in one study where the addition of two further cutting levels significantly improved diagnostic specificity in the subgroup with an FS of ≥1 but <2.12 Little diagnostic improvement was seen in the subgroup with FS>2, but as this will include many samples with an FS much greater than 2, one would need to assess the impact of number of cutting levels on change in actual FS in order to determine the optimal protocol for use as a biomarker. The practice of examining multiple MSG in each biopsy may also reduce the impact of increasing cutting levels. This was confirmed in other studies where there was no apparent difference in FS in two or three cutting levels 200 μm apart.45 ,46 Given that small lymphocytes are 7–10 μm, but larger lymphocytes up to 14–20 μm, the edges of a large focus could possibly be visible on two levels, risking introduction of bias by double counting.
In the absence of agents definitively proven to alter disease activity in PSS, the ability of MSG biopsy to be an effective biomarker is unproven. A recent study reported a reduction in FS following rituximab, but was not blinded.47 In a pilot study, abatacept increased disease duration-adjusted saliva flow and reduced the absolute number of foci, but not foci per mm².48 Glandular surface area was not reported. An increase in saliva production following open-label mizoribine was most marked in subjects with higher levels of inflammation (presence of at least focal aggregates) on baseline biopsies. Change in FS was not reported.49
Few studies have addressed interobserver variability. In a single centre with extensive experience, two observers had intraclass correlation coefficient (ICC) values of 0.97 for the number of foci and 0.96 for the FS.6 A high kappa value of 0.75 was observed for diagnosis, using the Chisholm and Mason score, among 7 Italian centres, but values for the FS were not recorded.50 In contrast, a much lower ICC score of 0.48 was recorded among five pathologists in a single centre,51 and in a tertiary centre, 53% of biopsies referred for a second opinion underwent diagnostic revision,13 emphasising the need for standardisation. These data are commensurate with many hospital centres receiving few salivary gland biopsy samples per annum for the investigation of PSS. A recent study from the Tolerance and Efficacy of Rituximab in PSS trial found complete intraobserver agreement for a dichotomised FS (<1 vs ≥1) read 2 months apart by a single specially trained pathologist.52 However, when compared with reports by local pathologists from individual study sites, kappa values were 0.71 for a dichotomised FS, and only 0.46 for determination of FLS. ‘Diagnostic revision’ following central read was made in 12.6% of patients, although it should be noted this was a cohort of patients with PSS recruited for a clinical trial, rather than unselected cases of possible PSS where levels of diagnostic revision might be higher. Agreement for the absolute FS, rather than the dichotomised score, was not provided.
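For readers less familiar with the kappa statistic quoted above, the following sketch computes Cohen's kappa for two readers scoring the same biopsies on a dichotomised FS (<1 vs ≥1). It is a plain-Python illustration under stated assumptions: the function names and the example scores are hypothetical, and the cited studies may have used different software or weighting.

```python
from collections import Counter


def dichotomise(focus_scores, threshold=1.0):
    """Map each FS to True (FS >= threshold) or False (FS < threshold)."""
    return [fs >= threshold for fs in focus_scores]


def cohens_kappa(reader_a, reader_b):
    """Cohen's kappa for two readers' categorical calls on the same samples."""
    if len(reader_a) != len(reader_b) or len(reader_a) == 0:
        raise ValueError("readers must score the same non-empty set of samples")
    n = len(reader_a)
    # Observed proportion of samples on which the readers agree.
    observed = sum(a == b for a, b in zip(reader_a, reader_b)) / n
    # Agreement expected by chance, from each reader's marginal frequencies.
    counts_a, counts_b = Counter(reader_a), Counter(reader_b)
    categories = set(counts_a) | set(counts_b)
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / n ** 2
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Hypothetical focus scores from a central and a local reader (illustration only).
central = dichotomise([2.1, 1.4, 1.0, 0.5, 0.0, 3.2, 0.8, 1.6, 4.0, 0.3])
local = dichotomise([1.8, 1.2, 0.7, 0.4, 0.0, 2.9, 1.1, 1.5, 3.6, 0.2])
print(round(cohens_kappa(central, local), 2))
```

Kappa corrects raw agreement for agreement expected by chance, so the two readers here agree on 8 of 10 biopsies yet obtain a kappa well below 0.8. Agreement on the absolute FS would instead call for a measure such as the ICC.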
In summary, further work on the natural history and reliability of FS evaluation would be desirable with respect to its use as a clinical trials biomarker. Standardisation of scoring and subsequent testing of interobserver and intraobserver variability is an important objective.
In experienced hands, MSG biopsy is a well-tolerated procedure. Some patients experience minor postprocedure pain, bleeding, bruising or swelling. Infection is rare, as is numbness or paraesthesia at the site of the procedure.2 While some patients are willing to have three or even four biopsies for the purposes of research,53 there is a limit on the number of times such a procedure could be performed. Given that histopathological findings may be relatively stable over time, consideration should be given to using pre-existing samples obtained up to 6 or 12 months before baseline if these are suitable for the proposed analyses. Patients’ attitudes towards lip biopsy and its role as a clinical trials biomarker should be studied.
Given the invasive nature of MSG biopsy, we would also advocate further evaluation of alternative methodologies that might function as biomarkers of salivary gland pathology. These would include salivary gland imaging. Ultrasound has been the most studied imaging modality; however, agreement with biopsy in relation to diagnosis in some studies has only been modest due to a lower sensitivity.54 Furthermore, it is unclear to what extent the hypoechoic areas observed on ultrasound reflect damage rather than inflammation, and therefore the extent to which these might be expected to change. Serum cytokine profiles have been reported to distinguish between patients with and without GC-like structures on biopsy,55 ,56 although this requires replication. Salivary biomarkers show promise, and recently a 4- and 6-plex panel was described that discriminated PSS from controls.57 The relationship between these panels and histopathological changes warrants further evaluation in a larger cohort. Numerous other biomarkers have been proposed in PSS. Of particular interest is the identification of type 1 interferon signatures and related biomarkers such as MxA.58 ,59 These appear to subdivide the PSS population and suggest potential for stratified medicine, but their relationship with salivary gland pathology has not been well-defined.
Need for a standardised core dataset
In the context of a clinical trial, how should scoring be standardised and what additional information should be available to allow comparison between studies? Given that NSCS is common, it might be unhelpful to include in the analysis a score of lymphocyte infiltration from a subject where FLS was not present on the baseline biopsy, as this might be observed for reasons other than PSS. However, this might mean some subjects being excluded. In one study, 16% of subjects meeting AECG criteria had a non-diagnostic biopsy, although almost half of these had focal infiltration but with FS <1.56
Conversely, given that fibrosis might arise through age or PSS-related processes, once FLS has been identified, an argument could be made for including all foci in the FS calculation, even when adjacent to fibrotic areas, and including all the glandular area in the denominator, to avoid introduction of bias.
The FS does not always capture the variation in severity seen in PSS since the size of foci may differ markedly and conceivably could change in response to treatment, independent of the FS itself. One report found that percentage of area infiltrated with lymphocytes correlated better than the FS with clinical and autoantibody parameters.60 In the context of clinical trials, we therefore think it is important to also calculate mean foci size and total foci area as a proportion of glandular surface area. Similar to what has been stated above in respect to the FS, an optimal minimum glandular surface area for this assessment that balances reproducibility with practicality has yet to be determined. Better characterisation of GC-like structures with staining for CD3, CD20 and the FDC marker CD21 should be explored.
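A short sketch may make the proposed additional measures concrete. It assumes that the area of each focus and the total glandular surface area have already been measured (for example by digital image analysis, which is not specified in the text). All names and example values are hypothetical.

```python
def foci_area_metrics(foci_areas_mm2, glandular_area_mm2):
    """Summarise foci burden for one biopsy.

    foci_areas_mm2: measured area of each lymphocytic focus, in mm²
    glandular_area_mm2: total glandular surface area examined, in mm²
    """
    if glandular_area_mm2 <= 0:
        raise ValueError("glandular surface area must be positive")
    n_foci = len(foci_areas_mm2)
    total_area = sum(foci_areas_mm2)
    return {
        "focus_score": n_foci / glandular_area_mm2 * 4,
        "mean_focus_area_mm2": total_area / n_foci if n_foci else 0.0,
        "infiltrated_fraction": total_area / glandular_area_mm2,
    }


# Two hypothetical biopsies with the same FS but very different foci sizes.
small_foci = foci_area_metrics([0.05, 0.05], glandular_area_mm2=8.0)
large_foci = foci_area_metrics([0.4, 0.6], glandular_area_mm2=8.0)
print(small_foci["focus_score"], large_foci["focus_score"])
print(small_foci["infiltrated_fraction"], large_foci["infiltrated_fraction"])
```

Both hypothetical biopsies have an FS of 1, yet the infiltrated fraction differs tenfold, which is the kind of variation in severity the FS alone would miss.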
PSS has been associated with a reduction in the ratio of IgA to IgG plasma cells.61–63 In the context of clinical trials, consideration could therefore be given to enumerating plasma cells over a defined area, with the respective IgA and IgG proportions. Higher FS are also associated with an increased B/T cell ratio, with additional reported associations including reduction in CD4/CD8 ratio and increasing CD68+ macrophage infiltration.64 At a minimum, it would seem desirable to enumerate the mean focal B/T cell ratio as this may be indicative of lesion severity. Although this is likely to be closely related to the extent of GC-like organisation of the lymphocytic foci, the relative value of each of these measures as a trial biomarker has yet to be determined.
Salivary gland biopsy offers distinct potential as a biomarker in PSS, particularly relevant to glandular involvement, and offers additional prognostic, stratification and mechanistic insights. However, its precise value is hard to determine in the absence of proven immunomodulatory therapies in PSS, and further work on validation and understanding the natural history would be desirable. Importantly, there seems a pressing need to standardise the histopathological interpretation and scoring of samples, followed by validation of the resulting protocols and recommendations.
Handling editor Tore K Kvien
Contributors All authors have contributed to the content of the manuscript, which was drafted by BAF and reviewed by all.
Competing interests None declared.
Provenance and peer review Not commissioned; externally peer reviewed.