The Kennard
Stones algorithm [23]-[25]
has gained an increasing popularity for splitting data sets into two subsets.
The algorithm starts by finding 2 samples that are the farthest apart from each
other on the basis of the input variables. These 2 samples are removed from
the original data set and put into the calibration data set. This procedure
is repeated until the desired number of samples has been reached in the calibration
set. The advantages of this algorithm are that the calibration samples map the
measured region of the variable space completely and that the test samples all
fall inside the measured region. Yet, this algorithm is only usable for a single
subsampling run, as the partitioning of the data is unique rendering the algorithm
for a resampling procedure unusable.