For the
variable selection, a combination of a genetic algorithm and neural networks, as described in section 2.8.5, was used. For the evaluation
of the fitness function, the calibration data set of the
refrigerant measurements (see section 4.5.1.1) was
randomly split into a calibration (75%) and a test data subset (25%). The neural
networks were fully connected with 4 hidden neurons and 2 output neurons (1
network for both analytes together). The genetic algorithm evaluated populations of 50 chromosomes over 76 generations, with the stopping criterion set to a convergence of the standard deviation of the genes below 0.04. The parameter a of the fitness function was set to 0.9,
which resulted in the selection of 8 time points (0, 12, 15, 51, 67, 93, 122 and 125 seconds) as the most dominant solution in the last generation.
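As a concrete illustration, the following Python sketch outlines how such a variable selection could be implemented. It is a minimal sketch under stated assumptions, not the algorithm of section 2.8.5: the exact form of the fitness function is not reproduced here, so weighting the test-subset error by a against the calibration-subset error by (1 - a) is an assumption; sklearn's MLPRegressor stands in for the neural networks; and the names (ga_select, fitness, rel_rmse) as well as the genetic operators (tournament selection, uniform crossover, bit-flip mutation) are hypothetical placeholders. Only the numerical settings (population of 50, at most 76 generations, convergence threshold 0.04, a = 0.9, 4 hidden neurons) follow the values given in the text.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def rel_rmse(y_true, y_pred):
    """Relative RMSE in percent (assumed definition: RMSE over the value range)."""
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2, axis=0))
    return float(np.mean(100.0 * rmse / (y_true.max(axis=0) - y_true.min(axis=0))))

def fitness(mask, X_cal, y_cal, X_test, y_test, a=0.9):
    """Score one chromosome (binary mask over the time points).

    Hypothetical weighting: the test-subset error counts with weight a and the
    calibration-subset error with (1 - a), so the fitness partly considers the
    calibration data (see text)."""
    idx = np.flatnonzero(mask)
    if idx.size == 0:
        return 0.0
    net = MLPRegressor(hidden_layer_sizes=(4,), max_iter=2000)  # 4 hidden neurons
    net.fit(X_cal[:, idx], y_cal)               # y has two columns: R22 and R134a
    err = (a * rel_rmse(y_test, net.predict(X_test[:, idx]))
           + (1.0 - a) * rel_rmse(y_cal, net.predict(X_cal[:, idx])))
    return 1.0 / err                            # lower error -> higher fitness

def ga_select(X_cal, y_cal, X_test, y_test, n_pop=50, n_gen=76, a=0.9, rng=None):
    """Plain GA: tournament selection, uniform crossover, bit-flip mutation.

    Stops early once the population has converged, i.e. the standard deviation
    of every gene across the population falls below 0.04."""
    rng = np.random.default_rng(rng)
    n_var = X_cal.shape[1]
    pop = rng.integers(0, 2, size=(n_pop, n_var))       # random initial chromosomes
    for gen in range(n_gen):
        if pop.std(axis=0).max() < 0.04:                # convergence criterion
            break
        scores = np.array([fitness(m, X_cal, y_cal, X_test, y_test, a) for m in pop])
        new_pop = [pop[scores.argmax()].copy()]         # elitism: keep the best
        while len(new_pop) < n_pop:
            i, j = rng.integers(0, n_pop, 2)
            p1 = pop[i] if scores[i] >= scores[j] else pop[j]   # tournament, size 2
            i, j = rng.integers(0, n_pop, 2)
            p2 = pop[i] if scores[i] >= scores[j] else pop[j]
            child = np.where(rng.random(n_var) < 0.5, p1, p2)   # uniform crossover
            flip = rng.random(n_var) < 0.01                     # bit-flip mutation
            new_pop.append(np.where(flip, 1 - child, child))
        pop = np.array(new_pop)
    scores = np.array([fitness(m, X_cal, y_cal, X_test, y_test, a) for m in pop])
    return np.flatnonzero(pop[scores.argmax()])         # indices of selected time points
```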
The corresponding neural network (8 hidden neurons, fully connected and 1 output neuron) predicted the test data subset with excellently low rel. RMSE of 1.87% for R22 and 2.50% for R134a. Yet, the prediction of the external validation data by this network,
which had been trained using the complete calibration data set, showed rel. RMSE of 2.32% for R22 and 2.93% for R134a, comparable with the non-optimized neural
networks using all time points (see table 3
in section 7.4). A second run of the genetic algorithm
using a different partitioning of the calibration data into calibration and
test data subsets showed even worse results. After 86 generations, 8 time points (0, 3, 6, 51, 74, 90, 115 and 125 seconds) were selected, with rel. RMSE of 1.84% (R22) and 2.62% (R134a) for the prediction of the test data subset. The prediction
of the external validation data showed disappointingly high errors of 2.63% for R22 and 3.35% for R134a (see table 3). For both
runs, the predictions of the external validation data are significantly worse than for the test data subset used for the evaluation of the fitness during the genetic optimization. Additionally, the selection of the time points is
not reproducible. This instability of the variable selection can also be seen
in figure 46, which shows the frequency
of the time points being selected during 100 runs of the GA. Although some time points are selected more often than others, there is no time point that was never selected. Both findings, the instability of the variable selection and the deterioration of the prediction ability for external validation data, can be ascribed to a general problem of single-run genetic algorithms. The variables
are selected on the basis of a fitness function with a static test and calibration
data set. Consequently, the optimal solution is only valid for one individual
partitioning of the data into calibration and test data subsets and is not representative
for the complete data set. Although the fitness function tries to compensate for the overestimation of the test data by partly considering the calibration data (in contrast to most GAs found in the literature), the drawbacks of a static partitioning cannot be completely compensated, as the sketch below illustrates.
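The partition dependence can be made visible with the hypothetical ga_select sketch above: repeating the selection on the same data, but with different random splits into calibration and test subsets, typically converges to different "optimal" subsets. The data below are purely synthetic stand-ins for the refrigerant measurements, and the reduced GA settings serve only to keep the run time short.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.random((120, 126))                   # synthetic: 120 samples, 126 time points
Y = rng.random((120, 2))                     # synthetic: two analytes

for seed in range(3):                        # three different static partitions
    X_cal, X_test, y_cal, y_test = train_test_split(
        X, Y, test_size=0.25, random_state=seed)
    selected = ga_select(X_cal, y_cal, X_test, y_test,
                         n_pop=10, n_gen=5, rng=seed)   # small settings for speed
    print(seed, selected)
# Each partitioning typically yields a different subset of time points.
```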
Apart from these problems known in the literature (approximately 99% of all GAs are based on static data sets), single-run algorithms face additional problems:
1. Both the chromosomes of the initial population and the weights of the neural networks are randomly generated. As there is no guarantee that the walk of the genetic algorithm through the search space, which also contains random steps, will always find the best subset of variables before
converging, different runs (even with identical test and calibration data
subsets) often find similar but not exactly identical subsets of variables [254].
2. Jouan-Rimbaud et al. [255] recently demonstrated that, owing to chance correlations, irrelevant variables are often selected by GAs, or at least have a significant influence on the final model, even if validation procedures are used; a small synthetic illustration follows below.
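The effect of chance correlation is easy to reproduce on synthetic data. In the purely illustrative snippet below, none of the candidate variables is related to the target, yet the variable best correlated with it on a finite sample still appears to carry information:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((50, 200))   # 200 noise variables, none related to y
y = rng.standard_normal(50)

corr = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
best = np.abs(corr).argmax()
print(f"best 'predictor': variable {best}, |r| = {abs(corr[best]):.2f}")
# With 200 candidates and only 50 samples, |r| values of roughly 0.3-0.45
# arise by chance alone, so any selection scheme that scores variables on
# this sample, including a GA, can latch onto them.
```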