On how to not misuse hierarchical clustering on principal components to define clinically meaningful patient subgroups. Response to: ‘On using machine learning algorithms to define clinical meaningful patient subgroups’ by Pinal-Fernandez and Mammen
Alain Meyer (1, 2), Lionel Spielmann (3), François Séverac (4, 5)

  1. Exploration Fonctionnelle Musculaire, Service de physiologie, Hôpitaux Universitaires de Strasbourg, Strasbourg, France
  2. Centre National de Référence des Maladies Auto-Immunes Systémiques Rares de l'Est et du Sud-Ouest, Service de rhumatologie, Hôpitaux Universitaires de Strasbourg, Strasbourg, France
  3. Service de Rhumatologie, Hôpitaux Civils de Colmar, Colmar, France
  4. Service de Santé Publique, GMRC, CHU de Strasbourg, Strasbourg, France
  5. iCUBE, UMR 7357, équipe IMAGeS, Université de Strasbourg, Strasbourg, France

Correspondence to Dr Lionel Spielmann, Service de Rhumatologie, Hospices Civils de Colmar, Colmar 68024, Alsace (Région), France; lionel.spielmann{at}ch-colmar.fr

We thank Pinal-Fernandez and Mammen for their interesting methodological comment on our work in which we used hierarchical clustering on principal components to define clinically meaningful subgroups of patients with anti-Ku antibodies.1 2

We fully agree with the conclusion of the authors: ‘machine learning methods may be fundamentally flawed if a cornerstone of the analysis depends upon the incorrect use of a complex biostatistical technique’.

In this regard, the example of hierarchical clustering on principal components that they provide in their comment illustrates how this statistical tool can be misused and generate false discoveries:

  1. First, hierarchical clustering on principal components is a descriptive method that is suited to describing heterogeneous datasets. Prior …
