Testing Machine Learning Models

With so much data now available, ML models are used extensively to enhance the customer experience.
Whether it is generating a T-shirt design, recommending a dress for an occasion, or helping someone buy furniture, most of these experiences are driven by machine learning.

Like conventional software products, machine learning models also need testing.
Conventional software has test oracles: for each input we know the expected output. An ML model's output is a prediction, so exact input-expected pairs are often not feasible.

Think of face recognition software. During model development, one can check accuracy using a train/test split or cross-validation.
In real life, however, there are many situations that the test data may not cover, such as a person wearing a mask or poor background lighting. We need to test those real-life situations too, and for that we need something known as pseudo-oracles.

Machine learning models are most commonly tested in three ways:

  1. Dual coding: The same data is fed to several models. Suppose a business rule is generated using a decision tree, a random forest, and an artificial neural network. We check whether, for the same input, the outputs of all these models fall in roughly the same range.
    If the values do not vary much, we can say that our rule generation approach is sound. For example, in face recognition software, the features selected to identify a face will be mostly the same irrespective of the model chosen.
  2. Metamorphic relations: For any ML model we need to find the metamorphic relations that exist, that is, how the output should change when the input changes in a known way. For example, take machine learning software that estimates the chance of developing heart disease. Suppose we determine that for a person who smokes, the chance of heart disease increases by 5 % at age 30, by 10 % at age 35 and above, and by 15 % at age 40 and above. When training or test data is passed through the model, the output should follow this pattern. If it fails at any level, that is a bug.
  3. Coverage-guided fuzzing (CGF): Coverage-guided fuzzing is used effectively for conventional software. There, fuzzed inputs are supplied so that as many code paths as possible are traversed; coverage increases, and so does bug detection.
    Now suppose there is an ML model based on a neural network. Its neurons can be considered similar to the paths of conventional software. The aim of fuzzing here is to activate all the neurons with a variety of inputs and thereby expose the corresponding bugs.
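
The first two approaches can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy models `rule_based_risk` and `smooth_risk` stand in for independently built models, the tolerance is arbitrary, and the metamorphic relation checked is a simplified version of the age bands above, namely that a smoker's predicted risk should never drop as age rises.

```python
from typing import Callable, Iterable, List, Tuple

RiskModel = Callable[[int, bool], float]


def rule_based_risk(age: int, smoker: bool) -> float:
    """Hypothetical model A: added risk (in %) in fixed age bands."""
    if not smoker or age < 30:
        return 0.0
    if age < 35:
        return 5.0
    if age < 40:
        return 10.0
    return 15.0


def smooth_risk(age: int, smoker: bool) -> float:
    """Hypothetical model B: a linear ramp approximating the same rule."""
    if not smoker or age < 30:
        return 0.0
    return min(15.0, 5.0 + (age - 30))


def dual_coding_disagreements(
    models: List[RiskModel],
    inputs: Iterable[Tuple[int, bool]],
    tolerance: float,
) -> List[Tuple[int, bool]]:
    """Return the inputs on which the models' outputs spread beyond tolerance."""
    flagged = []
    for age, smoker in inputs:
        outputs = [model(age, smoker) for model in models]
        if max(outputs) - min(outputs) > tolerance:
            flagged.append((age, smoker))
    return flagged


def metamorphic_violations(
    model: RiskModel, ages: Iterable[int]
) -> List[Tuple[int, int]]:
    """Return consecutive age pairs where a smoker's risk drops as age rises."""
    ordered = sorted(ages)
    return [
        (younger, older)
        for younger, older in zip(ordered, ordered[1:])
        if model(older, True) < model(younger, True)
    ]
```

A test suite could assert that both checks return an empty list for the chosen inputs and treat any returned entry as a case to investigate.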

CGF has three parts:

  1. Input fuzzer
  2. Mutator
  3. Analyzer

The aim of the input fuzzer is to supply inputs that lead to an error state and that also activate a variety of neurons.

The mutator modifies the input. For example, in image recognition software, a pixel in the input image is inverted, since that will activate a particular neuron and show how it behaves. Think of it this way: nose length is one factor used to identify a face, and there is a corresponding neuron for it. For a particular image, the nose length is increased and the image is passed to the ANN to see whether the neuron concerned can discriminate on that basis.
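
A pixel-level mutator of the kind described above might look like the following sketch. It assumes a grayscale image stored as a list of rows with pixel values from 0 to 255; the function name `invert_random_pixel` is hypothetical, and real fuzzers usually apply many richer mutations.

```python
import random
from typing import List

Image = List[List[int]]  # assumed layout: rows of grayscale pixels, 0..255


def invert_random_pixel(image: Image, rng: random.Random) -> Image:
    """Return a copy of the image with one randomly chosen pixel inverted."""
    mutated = [row[:] for row in image]  # copy so the original stays intact
    row = rng.randrange(len(mutated))
    col = rng.randrange(len(mutated[row]))
    mutated[row][col] = 255 - mutated[row][col]
    return mutated
```

Passing in a seeded `random.Random` keeps a fuzzing run reproducible, which helps when a mutated input needs to be replayed while debugging.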

The analyzer determines whether coverage is increasing. For example, in the case of TensorFlow, every input that is passed produces a set of activations. Whether those activations differ from the activations of the previous inputs determines whether coverage has increased. One way to tell activations apart is a nearest-neighbor search using Euclidean distance.
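
The distance-based check can be sketched as below, treating each input's activations as a plain list of numbers. The class name `CoverageAnalyzer` and the `threshold` parameter are assumptions for illustration, and the linear scan stands in for the approximate nearest-neighbor index a real tool would likely use.

```python
import math
from typing import List, Sequence


class CoverageAnalyzer:
    """Counts an input as new coverage if its activations are far from all seen ones."""

    def __init__(self, threshold: float) -> None:
        self.threshold = threshold
        self.seen: List[List[float]] = []

    def is_new_coverage(self, activations: Sequence[float]) -> bool:
        if self.seen:
            # Euclidean distance to the nearest previously recorded vector.
            nearest = min(math.dist(activations, old) for old in self.seen)
            if nearest <= self.threshold:
                return False
        self.seen.append(list(activations))
        return True
```

The threshold controls how different two activation vectors must be before the second one counts as new coverage; a larger value keeps the stored set smaller but may miss subtle behavior changes.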

Together, these three components guide the search toward higher coverage and, with it, more detected bugs.
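
How the three parts fit together can be shown with a small, self-contained loop. This is only a sketch under stated assumptions: the four-neuron ReLU layer in `WEIGHTS` is made up, coverage is measured simply as the set of neurons that have fired at least once, and the check for an error state is left out to keep the example short.

```python
import random
from typing import List, Set, Tuple

# Hypothetical hidden layer: four ReLU neurons over two input features.
WEIGHTS = [(1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0)]


def active_neurons(x: List[float]) -> Set[int]:
    """Indices of neurons whose pre-activation is positive (ReLU fires)."""
    return {i for i, (w0, w1) in enumerate(WEIGHTS) if w0 * x[0] + w1 * x[1] > 0}


def mutate(x: List[float], rng: random.Random) -> List[float]:
    """Mutator: nudge one randomly chosen feature by a small random amount."""
    child = list(x)
    i = rng.randrange(len(child))
    child[i] += rng.uniform(-1.0, 1.0)
    return child


def fuzz(
    seed_input: List[float], iterations: int, rng: random.Random
) -> Tuple[List[List[float]], Set[int]]:
    """Run a coverage-guided loop and return the kept inputs and covered neurons."""
    corpus = [list(seed_input)]
    covered = set(active_neurons(seed_input))
    for _ in range(iterations):
        parent = rng.choice(corpus)                # input fuzzer picks a candidate
        child = mutate(parent, rng)                # mutator changes it
        newly = active_neurons(child) - covered    # analyzer checks for new coverage
        if newly:
            covered |= newly
            corpus.append(child)                   # keep inputs that add coverage
    return corpus, covered
```

Only inputs that light up a neuron not seen before are kept in the corpus, so later mutations start from inputs that already reached new behavior.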

These approaches help us test machine learning models.
The prerequisite for testing any machine learning software is that test engineers understand the basic concepts of machine learning.
When deciding on the testing approach, the QA team, data scientists, and PM must sit at the same table so that the business goal to be achieved through the ML model is clear.
Doing so helps ensure a better and more accurate customer experience.


Productivity Engineer at Narvar. Data Science enthusiast. Boxer by passion.