Artificial intelligence should always be guided by human intelligence. Response to ‘Augmented vs. artificial intelligence for stratification of patients with myositis’ by Mahler et al
  1. Alain Meyer1,2,
  2. Lionel Spielmann3,
  3. François Séverac4,5
  1. 1Exploration Fonctionnelle Musculaire, Service de physiologie, Hôpitaux Universitaires de Strasbourg, Strasbourg, France
  2. 2Centre National de Référence des Maladies Auto-Immunes Systémiques Rares de l'Est et du Sud-Ouest, Service de rhumatologie, Hôpitaux Universitaires de Strasbourg, Strasbourg, France
  3. 3Service de Rhumatologie, Hôpitaux civils de Colmar, Colmar, France
  4. 4Service de Santé Publique, GMRC, CHU de Strasbourg, Strasbourg, France
  5. 5ICube, UMR 7357, équipe IMAGeS, Université de Strasbourg, Strasbourg, France
  1. Correspondence to Dr Lionel Spielmann, Service de Rhumatologie, Hospices civils de Colmar, Colmar, Alsace 68024, France; lionel.spielmann{at}ch-colmar.fr

We thank Mahler et al for their comment1 on our work in which we used hierarchical clustering on principal components to define clinically meaningful subgroups of patients with anti-Ku antibodies.2

Mahler et al argue for the use of machine learning alongside expert decision, thus relying on augmented judgement in making the final decision on patient stratification. We share this view.

In this regard, however, we disagree with the statement that hierarchical clustering on principal components applied to the 1000 observations with a multivariate normal distribution proposed by Pinal-Fernandez and Mammen3 and the same method applied to our observations from 42 anti-Ku patients yielded similar results.

As Mahler et al stated, clustering “will always give an optimal solution for the number of clusters present in a dataset”.

There are, however, three major differences between the two results that should not be overlooked if false discovery is to be avoided:

  1. Mahler et al stated that “it is to the user’s discretion to determine whether those clusters exist or not”. At first (human) glance at the factorial maps, one can see that the results yielded by the clustering applied to Pinal-Fernandez and Mammen’s dataset and to ours are radically different. Naked-eye examination of the projection on the factorial map of the 1000 observations proposed by Pinal-Fernandez and Mammen reveals a single cloud of points, which should lead to the human intelligence–driven conclusion that no relevant cluster exists in the dataset. In contrast, the projection on a factorial map of our observations in 42 anti-Ku patients shows three separate clouds of points to the naked eye, which should lead to the human intelligence–driven conclusion that partitioning these patients on their principal components is relevant.

  2. To further appreciate the relevance of the clusters, human intelligence can be assisted (‘augmented intelligence’) by statistical tools such as Bartlett’s test of sphericity and K-fold cross-validation. We ran these two tests on the example proposed by Pinal-Fernandez and Mammen and showed that the irrelevance of its clusters, already obvious at the stage of ‘human intelligence’, was confirmed at the stage of ‘augmented intelligence’.4 In contrast, these tests confirmed both the suitability of our data for hierarchical clustering on principal components and the presence of the three clusters observed on naked-eye examination.2

  3. In essence, the example provided by Pinal-Fernandez and Mammen is a clever statistical sophistry in which artificial intelligence is deliberately misguided by human intelligence, since one should not use hierarchical clustering on principal components on a dataset that is not suspected to contain relevant clusters. In contrast, we used this technique because prior data from independent teams have highlighted that patients with anti-Ku antibodies are heterogeneous and thus well suited to this analysis.
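
As an illustration of the sphericity check mentioned in point 2, the following is a minimal Python sketch and not the analysis actually run on either dataset. It applies Bartlett’s test of sphericity to two synthetic datasets: 1000 independent normal observations, and 42 observations split into three shifted groups. The number of variables, the group offsets, the random seed and the function name `bartlett_sphericity` are all assumptions made for the example.

```python
import numpy as np
from scipy.stats import chi2


def bartlett_sphericity(X):
    """Bartlett's test of sphericity on an (n_samples, n_variables) array.

    Null hypothesis: the correlation matrix is the identity, i.e. the
    variables are uncorrelated and share no structure worth extracting.
    Returns (chi-square statistic, degrees of freedom, p-value).
    """
    n, p = X.shape
    R = np.corrcoef(X, rowvar=False)
    _, logdet = np.linalg.slogdet(R)  # log of det(R), numerically stable
    statistic = -(n - 1 - (2 * p + 5) / 6) * logdet
    dof = p * (p - 1) // 2
    return statistic, dof, chi2.sf(statistic, dof)


rng = np.random.default_rng(0)  # arbitrary seed (assumption)
n_vars = 5                      # number of variables is an assumption

# Synthetic "null" data: 1000 independent standard-normal observations.
X_null = rng.standard_normal((1000, n_vars))

# Synthetic "structured" data: 42 observations in three shifted groups.
centers = np.array([-4.0, 0.0, 4.0])  # hypothetical group offsets
X_clustered = np.vstack(
    [rng.standard_normal((14, n_vars)) + c for c in centers]
)

stat_null, dof_null, p_null = bartlett_sphericity(X_null)
stat_clustered, dof_clustered, p_clustered = bartlett_sphericity(X_clustered)

print(f"null data:       chi2={stat_null:.1f}, p={p_null:.3g}")
print(f"structured data: chi2={stat_clustered:.1f}, p={p_clustered:.3g}")
```

A significant result only indicates that the variables are correlated enough for principal components to be meaningful; it does not by itself prove that clusters exist, which is why it is used alongside visual inspection and cross-validation rather than in place of them.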

In conclusion, we share the view of Mahler et al that artificial intelligence should always be guided by human intelligence in order to avoid the risk of false discovery, elegantly illustrated by the example provided by Pinal-Fernandez and Mammen. As briefly pointed out above, certain good practice rules exist to protect both authors and readers from this danger. Ultimately, hypotheses generated by these techniques must be validated in independent cohorts.

Footnotes

  • Handling editor Josef S Smolen

  • Contributors AM and LS wrote the manuscript with support from FS. All authors contributed to the final version of the manuscript.

  • Funding The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.

  • Competing interests None declared.

  • Patient consent for publication Not required.

  • Provenance and peer review Commissioned; internally peer reviewed.
