Optimising the choice of normalisation method for use in machine-learning classification of human blood plasma ambient ionisation mass spectra

Eardley-Brunt ASJ, Song L, Study OAMI, Study OAAA, Vallance C

February 2026

Journal article

Journal:

International Journal of Mass Spectrometry

Volume:

520

Publisher:

Elsevier

Article number:

117553

Construction of large mass spectrometric data sets usually involves some combination of normalisation, scaling, and transformation of individual mass spectra in order to correct for technical (and sometimes biological) variation. Many different approaches to data normalisation have been reported, and there is no particular consensus on the best approach. The present study systematically evaluates a set of 24 normalisation, scaling, and transformation methods, and their 420 possible combinations, in the context of atmospheric solids analysis probe (ASAP) mass spectra of human blood plasma. The plasma samples came from two separate cohorts of patients, enrolled respectively in the Oxford Acute Myocardial Infarction (OxAMI) and Oxford Abdominal Aortic Aneurysm (OxAAA) clinical studies. Within each cohort, patients are classified according to a number of different clinical variables. We have investigated the effect of normalisation, scaling, and transformation method on subsequent clustering of the data into the classes of interest, and on machine-learning based classification of the data into the categories of interest. The choice of method was found to have a substantial effect on data clustering, measured via the clustering ratio C_R, but a much smaller effect on machine-learning based classification, quantified via Cohen’s κ statistic. New intensity-histogram-based normalisation methods were found to have the greatest effect on clustering, while mean, median, vector, and AUC normalisation yielded the best machine-learning classification performance across multiple algorithms. High clustering ratios do not necessarily correlate with improved supervised classification outcomes, underscoring the need to consider subsequent data analysis methodology carefully when optimising data preprocessing pipelines.
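For readers unfamiliar with the methods named in the abstract, the following is a minimal Python sketch of the four single-factor normalisations (mean, median, vector, AUC) and of Cohen's κ. It is illustrative only and is not taken from the paper: the function names `normalise` and `cohen_kappa` are hypothetical, "vector" is assumed to mean division by the Euclidean norm, and "AUC" is assumed to mean division by the trapezoidal area over the m/z axis. The study's own definitions may differ.

```python
import math
from statistics import mean, median


def normalise(intensities, mz=None, method="mean"):
    """Divide one spectrum by a single scaling factor chosen by `method`.

    Assumed definitions (not confirmed by the source):
      mean   -> mean intensity
      median -> median intensity
      vector -> Euclidean (L2) norm of the intensity vector
      auc    -> trapezoidal area under the spectrum over the m/z axis
    """
    if method == "mean":
        factor = mean(intensities)
    elif method == "median":
        factor = median(intensities)
    elif method == "vector":
        factor = math.sqrt(sum(x * x for x in intensities))
    elif method == "auc":
        if mz is None:
            raise ValueError("AUC normalisation needs the m/z axis")
        factor = sum(
            0.5 * (intensities[i] + intensities[i + 1]) * (mz[i + 1] - mz[i])
            for i in range(len(intensities) - 1)
        )
    else:
        raise ValueError(f"unknown method: {method}")
    if factor == 0:
        raise ValueError("cannot normalise: scaling factor is zero")
    return [x / factor for x in intensities]


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: agreement between two labelings, corrected for chance."""
    n = len(y_true)
    labels = set(y_true) | set(y_pred)
    # Observed agreement: fraction of samples where the labels match.
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # Expected agreement if both labelings were independent.
    p_e = sum((y_true.count(c) / n) * (y_pred.count(c) / n) for c in labels)
    if p_e == 1:
        raise ValueError("kappa is undefined when expected agreement is 1")
    return (p_o - p_e) / (1 - p_e)
```

A κ of 1 indicates perfect agreement between predicted and true classes, while a κ of 0 indicates agreement no better than chance, which is why it is a stricter measure than raw accuracy for imbalanced clinical classes.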

Keywords:

3401 Analytical Chemistry

34 Chemical Sciences

Machine Learning and Artificial Intelligence

DOI

10.1016/j.ijms.2025.117553
