Systemic auto-inflammatory diseases (SAIDs) are a group of disorders characterised by inflammation that occurs in the absence of pathogenic autoantibodies, autoreactive T lymphocytes or other infective causes.1 ,2 The inflammatory attacks are predominantly mediated by cells of the innate immune system, neutrophils and monocytes, and by molecules, such as interleukin (IL)-1β, IL-6 and tumour necrosis factor (TNF)-α.3 ,4
Depending on gene defect(s) and diagnosis, patients affected with SAIDs exhibit a continuous or recurrent acute phase response.3 SAIDs are genetically heterogeneous, showing monogenic and, to a lesser extent, multifactorial inheritance, with causative genes involved in the innate branch of the immune response.5 Manifestations usually start in early infancy or the neonatal period,6 ,7 but rare adult onset has also been described.8 ,9 The clinical picture of SAIDs is extremely wide, ranging from recurrent and self-limiting fever episodes associated with arthralgia and myalgia, rash, chest and abdominal pain (periodic fevers) to a chronic and persistent inflammatory disease course with skin rash, arthritis and possible severe neurological inflammation (cryopyrinopathies). Some diseases are characterised by more localised inflammatory manifestations of the skin, joints and bones (Pyogenic Arthritis, Pyoderma gangrenosum and Acne (PAPA) and Blau's syndromes). Finally, variable combinations of symptoms in patients with uncharacterised systemic inflammatory diseases may make the diagnosis very difficult.10 ,11
A great improvement in SAID diagnosis has been achieved since 1997, when the first SAID gene was identified.12 ,13 To date, there are 25 known SAID genes reported in the Infevers database (http://fmf.igh.cnrs.fr/ISSAID/infevers), responsible for many more SAID conditions, due to the wide phenotypic heterogeneity typical of these disorders. Molecular genetics has greatly contributed to correct diagnosis, especially in atypical presentations.14 On the other hand, mutational screening may not be exhaustive if focused on a minority of known genes and/or mutational hotspots.6 Indeed, it has been reported that at least 50% of patients with SAID have negative genetic test results.3 Reasons for such a low mutation detection rate include polygenic complex phenotypes, such as systemic onset juvenile idiopathic arthritis and likely also PFAPA syndrome,15 ,16 lack of precise clinical classification criteria and missing mutations because of partial gene screening, namely the screening of either a limited number of candidate genes or a subset of coding portions. As a further complication, a number of polymorphisms have frequently been detected in many SAID genes and investigated to verify their possible association with specific phenotypes.17–20 Moreover, even though SAIDs are characterised by early onset, adult patients carrying low penetrant mutations have also been described for cryopyrin-associated periodic syndrome (CAPS), tumour necrosis factor (TNF) receptor-associated periodic syndrome (TRAPS) and familial Mediterranean fever (FMF),21–24 thus indicating genetic and clinical complexity. Finally, somatic mosaicism has been demonstrated, by means of very deep sequencing, in patients presenting with typical CAPS and Blau syndromes, but negative for germline mutations.25 ,26
The very fast development of new technologies for massive sequencing has made it possible to focus on many different genes simultaneously, a comprehensive approach that effectively fulfils the diagnosis of complex and/or heterogeneous diseases.27–30
In an attempt to improve the molecular diagnosis and genotype interpretation of patients affected with SAIDs, here we report the development of a next-generation sequencing (NGS)-based protocol designed to simultaneously screen 10 genes. Fifty DNA samples, from patients with SAID already genotyped for the respective causative gene(s), were sequenced and three different bioinformatic pipelines compared. Sensitivity and specificity of data analysis were assessed, thus prompting both the application of the protocol to routine diagnosis and the study of the correlation between genotypes and the vast range of SAID phenotypes.
Materials and methods
Forty-eight genomic DNA samples from patients with SAID and two asymptomatic transmitting parents had already been found to carry at least one mutation in one of the known causative genes previously tested through Sanger sequencing. These mutation-positive patients were taken into consideration for developing an NGS-based diagnostic protocol.
Gene panel design, library preparation and sequencing
A panel of gene amplicons specific for SAIDs was designed through the Ion AmpliSeq designer software (https://www.ampliseq.com/browse.action), including 10 genes selected from those reported on the Infevers database (http://fmf.igh.cnrs.fr/ISSAID/infevers/), most of which have been known for a long time to be involved in the pathogenesis of SAIDs (see online supplementary table S1). Coding regions, plus 20 bp flanking each exon, were included in the design, which resulted in 21.8 kb of target DNA subdivided into 191 amplicons (2 primer pools) covering 116 exons. Library preparation and subsequent sequencing were carried out according to Life Technologies’ protocols. Further details are reported in the online supplementary text.
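To illustrate how a target of coding regions plus 20 bp flanks can be assembled and measured, the following is a minimal sketch. It is not the algorithm used by the AmpliSeq designer; the function names and the exon coordinates are hypothetical, and intervals are assumed to be half-open (start, end) positions on a single chromosome.

```python
def pad_and_merge(exons, flank=20):
    """Extend each exon by `flank` bp on both sides and merge overlapping intervals."""
    padded = sorted((max(0, start - flank), end + flank) for start, end in exons)
    merged = []
    for start, end in padded:
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def target_size(intervals):
    """Total number of bases covered by half-open intervals."""
    return sum(end - start for start, end in intervals)


# Hypothetical exon coordinates, for illustration only.
exons = [(1000, 1100), (1130, 1200), (5000, 5150)]
target = pad_and_merge(exons)
print(target)               # [(980, 1220), (4980, 5170)]
print(target_size(target))  # 430
```

Merging matters when two exons lie closer than twice the flank, since otherwise the shared bases would be counted twice in the target size.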
FastQ data, generated by Ion PGM semiconductor, were analysed by three different workflows: Ion Reporter 4.0, CLC Bio Genomics Workbench 6.5 and GATK-based in-house pipeline, as outlined in the online supplementary figure S1, and reported in detail in the online supplementary text.
Coverage assessment was carried out for every run by the Ion Coverage Analysis plug-in v4.0-r77897 and amplimers at <10× were discarded as unsuitable for further analysis.
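The coverage filter described above can be sketched as follows. This is an illustration only, not the Ion Coverage Analysis plug-in; the amplicon names and depths are hypothetical, and the only value taken from the text is the 10× threshold.

```python
MIN_COVERAGE = 10  # threshold stated in the text: amplimers at <10x are discarded


def split_by_coverage(coverage, min_cov=MIN_COVERAGE):
    """Return (kept, discarded) amplicon names, discarding those below min_cov."""
    kept = sorted(name for name, depth in coverage.items() if depth >= min_cov)
    discarded = sorted(name for name, depth in coverage.items() if depth < min_cov)
    return kept, discarded


# Hypothetical mean coverage per amplicon for one run.
run_coverage = {"AMP_001": 412.0, "AMP_002": 8.5, "AMP_003": 10.0, "AMP_004": 0.0}
kept, discarded = split_by_coverage(run_coverage)
print(kept)       # ['AMP_001', 'AMP_003']
print(discarded)  # ['AMP_002', 'AMP_004']
```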
Additional missense and frameshift variants were all validated by Sanger sequencing. PCR products were purified by ExoSAP-IT (GE Healthcare) and directly sequenced using Big Dye V.1.1 and an ABI3130 automated sequencer (Applied Biosystems, Foster City, California, USA). Validated variants were studied by SIFT (http://sift.jcvi.org) and PolyPhen (http://genetics.bwh.harvard.edu/pph2/) software in order to predict their possible functional impact.
Two-tailed Fisher's test was performed to compare variant allele frequencies assessed in the present study with those reported in the 1000 Genomes Project website (http://www.1000genomes.org/). p Values corrected for multiple testing are reported and considered significant when <0.05.
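As an illustration of this comparison, the sketch below computes a two-tailed Fisher's exact test on a 2×2 table of allele counts using only the Python standard library. The allele counts are hypothetical, and the Bonferroni adjustment is an assumption: the text states that p values were corrected for multiple testing but does not name the method.

```python
from math import comb


def fisher_two_tailed(a, b, c, d):
    """Two-tailed Fisher's exact test p value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of observing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more likely than the observed one.
    p = sum(prob(x) for x in range(lowest, highest + 1)
            if prob(x) <= p_observed * (1 + 1e-7))
    return min(1.0, p)


def bonferroni(p_values):
    """Bonferroni adjustment (assumed here; the correction method is not stated)."""
    n = len(p_values)
    return [min(1.0, p * n) for p in p_values]


# Hypothetical counts: variant vs reference alleles in patients and in controls.
said_variant, said_reference = 12, 88
control_variant, control_reference = 20, 980
p_raw = fisher_two_tailed(said_variant, said_reference,
                          control_variant, control_reference)
print(p_raw < 0.05)
```

Counting alleles rather than individuals means that 50 samples contribute 100 chromosomes per autosomal variant, which is why the hypothetical patient row sums to 100.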
SAID gene panel
An NGS-based gene panel screening was developed in order to identify causative mutations in SAIDs. To this end, 10 genes were selected, whose known mutations represent approximately 75% of all the entries reported in the Infevers database (http://fmf.igh.cnrs.fr/ISSAID/infevers/). Overall, the panel includes the MEFV, MVK, TNFRSF1A, NLRP3, NLRP12, NOD2, PSTPIP1, IL1RN, LPIN2 and PSMB8 genes, known to be associated with five SAID spectra, as reported in the online supplementary table S1. More than 99% of the input gene regions could be included in the experimental design. Missed regions are reported in online supplementary table S2.
Samples and PGM runs
Fifty DNA samples, from subjects previously genotyped by Sanger sequencing and carrying at least one coding mutation in one of the 10 SAID genes under analysis, were selected and processed through the new screening protocol. The whole target was captured by multiplex amplification of 191 amplicons, subdivided into two PCR pools of 95 and 96 amplicons, respectively. Not all the amplicons could be amplified with the same efficiency. Indeed, a mean coverage of 336× was assessed for the whole target of 191 amplimers in the 50 DNA samples, with 98.5% of amplicons at >10× and 96.06% at >30×. The mean coverage of each amplimer for all the 50 samples is reported in online supplementary figure S2. In figure 1, the 191 amplimers are ordered from the least (left) to the most (right) represented, with those at <30× framed by a red box. In order to find the minimum coverage able to detect true positive variants, amplimers at <10× were assessed and found to be only three, when estimated as the mean among the 50 samples. However, some amplimers show such low coverage in a significant proportion of DNA samples, as reported in online supplementary table S3, which lists amplimers with a coverage of ≤10× in >50% and >20% of samples. Amplimers with a high rate of low coverage, thus identified, are checked after every run and, where necessary, screened by Sanger sequencing.
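The per-sample check behind supplementary table S3 can be sketched as follows. The ≤10× threshold and the 20% and 50% cut-offs come from the text, and MV10 is the amplimer named later in the Results; the function names, the other amplicon names and all depth values are hypothetical.

```python
def low_coverage_rate(depths, threshold=10):
    """Fraction of samples in which an amplicon is covered at <= threshold."""
    return sum(1 for depth in depths if depth <= threshold) / len(depths)


def flag_amplicons(per_sample_depths, min_fraction):
    """Amplicons whose low-coverage rate exceeds min_fraction of the samples."""
    return sorted(name for name, depths in per_sample_depths.items()
                  if low_coverage_rate(depths) > min_fraction)


# Hypothetical depths for three amplicons across five samples.
depths = {
    "MV10": [4, 250, 9, 300, 120],       # low in 2 of 5 samples
    "AMP_A": [2, 5, 0, 7, 300],          # low in 4 of 5 samples
    "AMP_B": [400, 350, 380, 420, 500],  # never low
}
print(flag_amplicons(depths, 0.20))  # ['AMP_A', 'MV10']
print(flag_amplicons(depths, 0.50))  # ['AMP_A']
```

An amplicon can have a high mean coverage and still be flagged here, which is why the per-sample rate is more informative than the mean when deciding which regions need Sanger back-up.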
Due to sample ascertainment, at least one mutation in one of the 10 genes under study was expected to be found per patient. Therefore, the analyses performed with the three bioinformatic tools aimed at maximising detection of the expected variants, on the assumption that, once these variants were correctly called, unknown variants in genes not previously analysed in each patient would also be detected. Resulting variants were compared with the expected mutation list. Surprisingly, no workflow was able to detect all the 79 variants known in the 50 DNA samples (table 1). The CLC Genomics Workbench missed 11 variants, 10 of which are the same missense substitution p.V377I of the MVK gene, while one is the p.M694I mutation of the MEFV gene. The p.V377I change affects a codon lying in the amplimer named MV10, which is known to be covered <10× in >20% of samples (see online supplementary table S3). This can also explain the eight p.V377I variants missed by the Ion Reporter workflow. The GATK-based pipeline returned the most reliable results, missing only two variants: p.F433L and p.E277D in the NLRP3 and PSTPIP1 genes, respectively. In an attempt to force the calling of these two mutations, we changed the default setting of the ‘maximum deletion fraction’, which normally prevents calls at loci spanned by deletions in >5% of reads. Unfortunately, although both mutations were detected with the new settings, three false variants (p.Q345K and p.A946T in the NOD2 gene and p.R760G in the LPIN2 gene) were called in addition, as shown by Sanger sequencing.
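The comparison between called and expected variants amounts to a set difference, and the detection rate follows from the counts given above. In the sketch below, the sample identifiers and the example call sets are hypothetical; only the totals (79 expected variants, 11 missed by CLC, 2 missed by the GATK-based pipeline) are taken from the text. The total missed by Ion Reporter is not computed, because only the eight missed p.V377I calls are stated here.

```python
def missed_variants(expected, called):
    """Expected variants that are absent from a pipeline's call set."""
    return sorted(expected - called)


def detection_rate(n_expected, n_missed):
    """Fraction of expected variants that were called."""
    return (n_expected - n_missed) / n_expected


# Hypothetical (sample, gene, protein change) tuples.
expected = {("P01", "MVK", "p.V377I"), ("P02", "MEFV", "p.M694I"),
            ("P03", "NLRP3", "p.F433L")}
called = {("P01", "MVK", "p.V377I"), ("P03", "NLRP3", "p.F433L"),
          ("P03", "NOD2", "p.P268S")}
print(missed_variants(expected, called))  # [('P02', 'MEFV', 'p.M694I')]

# Counts reported in the text.
print(f"CLC: {detection_rate(79, 11):.1%}")  # CLC: 86.1%
print(f"GATK: {detection_rate(79, 2):.1%}")  # GATK: 97.5%
```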
The targeted re-sequencing of many genes not previously analysed in the patients has also allowed us to detect a number of additional variants. These new missense and indel variants, either common or unique, were validated by Sanger sequencing in order to compare the true and false positive detection rates of the three workflows. Interestingly, only 1 out of 50 DNA samples showed no additional rare or common variant, while the remaining 49 individuals turned out to carry more variants in the gene set under analysis.
A total of 204 variants were detected, only 151 of which were called by all the pipelines (table 2). CLC-Bio Workbench and Ion Reporter analysis returned 151 and 153 variants, respectively. Sanger sequencing could not confirm 2 CLC and 3 Ion Reporter calls. GATK showed the largest amount of false positive calls, with 54 out of 204 variants that could not be confirmed by Sanger sequencing. Therefore, false positive variants were detected by CLC, Ion Reporter and GATK at a rate of 1.32%, 1.96% and 26.47%, respectively.
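The false positive rates quoted above follow directly from the reported counts, as this short check shows. The dictionary layout and function name are illustrative; the counts are those given in the text.

```python
# (calls not confirmed by Sanger sequencing, denominator used in the text)
calls = {"CLC": (2, 151), "Ion Reporter": (3, 153), "GATK": (54, 204)}


def unconfirmed_rate(unconfirmed, total):
    """Percentage of calls that Sanger sequencing could not confirm."""
    return 100 * unconfirmed / total


rates = {name: round(unconfirmed_rate(*counts), 2) for name, counts in calls.items()}
print(rates)  # {'CLC': 1.32, 'Ion Reporter': 1.96, 'GATK': 26.47}

confirmed = {name: round(100 - unconfirmed_rate(*counts), 2)
             for name, counts in calls.items()}
print(confirmed)  # {'CLC': 98.68, 'Ion Reporter': 98.04, 'GATK': 73.53}
```

The complementary figures correspond to the confirmed fraction of calls, roughly 98% or more for CLC and Ion Reporter and about 74% for the GATK-based workflow.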
To test the hypothesis that some of the variants are either causative or predisposing to SAID, allele frequencies were compared with data available from the 1000 Genomes Project (http://www.1000genomes.org/), considering both all the samples (All) and European samples only (EU), as reported in online supplementary table S4, along with prediction data about possible variant effects, obtained using SIFT (http://sift.jcvi.org/) and PolyPhen (http://genetics.bwh.harvard.edu/pph2/) software. Only two single-nucleotide polymorphisms, MEFV p.V726A and LPIN2 p.P626S, showed a statistically significant over-representation of the variant alleles in SAID samples compared with 1000 Genomes data (All and EU). The MEFV variant was also predicted to be deleterious and possibly damaging, supporting its possible pathogenic effect. On the other hand, the LPIN2 variant was predicted to be tolerated and benign, thus suggesting it is a neutral polymorphism. Finally, two NOD2 variants, p.P268S and p.V955I, already known as genetic variations of uncertain significance, showed higher allele frequencies in SAID samples than in controls (All but not EU).
Impact of different variants on the clinical phenotype
The availability of complete genotyping data at 10 loci has prompted us to re-evaluate the overall clinical picture of the patients also in the light of the new mutations found. Complete clinical information at the last follow-up was available for 34 out of 50 individuals (48 patients and 2 transmitting asymptomatic parents). This has allowed us to draw a genotype–phenotype correlation, as displayed in table 3, whose details about groups and single patients are reported in the online supplementary information file.
Seven patients with definitive diagnosis of SAID (two CAPS, three mevalonate kinase deficiency (MKD) and two FMF) did not display any additional possibly causative variant/mutation, besides those already known, and were included in group 1. Consistently, these patients showed a clinical phenotype typical of the specific disease. Conversely, 16 patients with a definitive diagnosis of SAID (four CAPS, five MKD, two TRAPS, three FMF, one PAPA and one FCAS2) were carriers of one or more additional possible effective variants in at least another gene included in the NGS panel (group 2, table 3). Notably, in the majority of these patients, the additional variants seemed not to modify the clinical picture that was therefore consistent with the principal diagnosis. The third group includes patients without a definitive diagnosis of SAID, who show no clear causative mutation, but rather have variants that might have had a role in their disease-like phenotypes. Also in this case, additional mutations did not influence significantly the clinical picture (group 3, table 3). Finally, four patients with undefined clinical diagnosis and non-confirmatory genetic test belong to group 4. Further clinical details about these patients can be found in table 3 and the online supplementary text.
SAIDs are a heterogeneous group of both monogenic and, to a lesser extent, multifactorial diseases caused by primary dysfunctions of the innate immune system.5 Previous diagnosis by Sanger sequencing, performed by us on a restricted number of genes, namely NLRP3, MVK and TNFRSF1A, and/or gene portions, proved time consuming and expensive, failing to detect mutations in around 86% of >2000 patients recruited among several Italian Pediatric Rheumatologic Units (unpublished data).
Clinical misdiagnosis, mutations in untested gene regions, genetic heterogeneity and/or a complex mode of inheritance are all possible explanations. To improve molecular diagnosis, and to draw genotype–phenotype correlations, we sought to approach the mutation search in patients with SAID by using an NGS procedure. To this end, we launched a pilot project consisting in the mutational screening of 10 genes, through amplification capture of their coding portions and deep sequencing by an NGS platform, in 50 already diagnosed patients with SAID. The genes selected for the analysis were highly representative of the known SAID genes carrying, at the time of the experimental design, >75% of all the mutation entries reported in the Infevers database (http://fmf.igh.cnrs.fr/ISSAID/infevers/). Patients were known to carry mutations at one or more of these genes, and altogether guaranteed a heterogeneous set of genotypes. Results of our study have laid bare a number of not only technical but also genetic and clinical aspects that deserve consideration.
As expected, the gene input did not coincide with the amplification design obtained from the AmpliSeq online tool, and 6 regions of 5 genes, for a total of 58 coding base pairs, were not part of the initial target capture. Further sequences were missed as, despite the very high mean coverage of the whole target among the 50 samples (336×), some amplicons did not achieve a reliable coverage. Overall, around 1.6 kb of coding target sequences (7.3% of total target) were not included because of either lack of original design or poor amplification rate. Once precisely mapped, these regions, which are refractory to the NGS assay with the present capture tool, need to be tested by standard Sanger sequencing in order to complement the analysis results with variants possibly located within them. Indeed, this is a matter we need to be aware of, as during the validation of our NGS procedure one frequent MVK mutation (p.V377I) was missed several times by two out of three bioinformatic pipelines because of low coverage of the corresponding amplicon (in each case <10×).
Variant calling also represents a tricky step, which we undertook by comparing three different pipelines: (i) Ion Reporter, (ii) GATK-based and (iii) CLC-Bio software. Mutations previously detected by standard analysis in the 50 subjects (48 patients with SAID + 2 parents), hereafter referred to as ‘expected’ mutations, were correctly called, except in a few cases, due to either low coverage or setting conditions. These confounding circumstances need to be kept in mind to minimise the false negative rate. On the other hand, due to our experimental design, many additional variants were called that were as yet unknown in the corresponding patients, as these had not been analysed for all the 10 genes before. To estimate the rate of false positives among these latter variants, 204 of them were validated through standard Sanger analysis. Quite a high number of variants called by the GATK-based workflow could not be confirmed, while the specificity of the two other pipelines turned out to be much higher (≥98% vs 74%). Altogether, we can conclude that the Ion Reporter 4.0 has returned the most significant results, which are highly reliable provided that the gene portions presenting with low coverage are carefully considered. Finally, the minimum coverage needed for a reliable diagnostic test has already been recommended to be 30×;31 however, according to our results, which missed variants in amplicons at ≤10×, we are tempted to suggest a coverage of ≥15× to call germline variants.
Besides mere technical evaluations, we also assessed the potential impact of the additional variants on patients’ phenotypes by (i) comparing allele frequencies in our set with those deduced by the 1000 Genomes Project and (ii) drawing a genotype–phenotype correlation.
A few common variants have turned out to be possibly associated with SAID, an observation that may account for the wide phenotypic variability of these disorders, and deserving further investigations. Interestingly, some genes, such as IL1RN and PSMB8, did not show many variants, while others, that is, NOD2, LPIN2 and NLRP12, presented with a high frequency of mutations, some of which might be responsible for mild or atypical symptoms.
Nevertheless, their contribution to the clinical picture is still doubtful, representing the most challenging issue for the actual use of NGS panels in daily clinical practice in patients with suspected SAID. Indeed, our findings are going to have an impact on diagnosis assessment and on interactions between geneticists and clinicians to interpret genotypes and report combinations of new and/or known NGS variants. The contribution of multiple genes to SAIDs has already been reported: it may represent a fortuitous coexistence of highly prevalent mutations in a single individual, but digenic inheritance could also be proven in some cases.32 In the present study, we had the possibility to analyse for the first time the possible impact of additional variants in other genes on the clinical phenotype of 34 patients that we could carefully and systematically review in the light of the novel NGS findings. Our initial experience shows that, with a few exceptions, the presence of additional possibly pathogenic variants in genes included in the present small gene panel and not related to the originally diagnosed disorder apparently did not affect the clinical phenotype (group 2), even in patients with a non-confirmatory genetic test (group 3). Conversely, the interpretation of the actual NGS data in patients with an undefined inflammatory phenotype (group 4) is remarkably difficult. Though variant combinations may explain the presence of a complex and ambiguous phenotype, as in patient FP1418, in other cases the presence of several mutations could lead to overestimation of the NGS findings and to misdiagnosis.
A combination of NGS and clinical information may provide diagnosis for some patients, thus supporting the need of evidence-based and validated clinical criteria as crucial tools to be used concurrently with the genetic analysis for the final diagnosis and classification of patients with SAIDs.33 The unanswered question on the possible role of multi-gene analysis with NGS in the characterisation of undiagnosed patients with an undefined phenotype will be analysed in a larger number of patients through a larger panel of genes.