With all the advances in AI in recent years, people increasingly fantasize about automating everything, including processes such as unit testing. In this article, I’ll pin down the two main arguments why full automation will never actually happen for unit testing.

1. Unit Test Generation: The less effort you put in, the worse your results will get

One of the main pain points of unit testing is actually generating the test cases. Thus, one main idea of using AI for unit testing is to auto-generate these test cases right away. Full automation in that case means that, based on information which is already there (usually the code under test), some algorithm applies some magic to generate unit test code, which can then be executed right away. Through static evaluation techniques, an abstract syntax tree can be generated to reason about the code under test and about which test cases might be required to test it. However, these techniques do not scale to modern-sized software, as they require parsing and interpreting the whole internal logic of your project. It kind of seems obvious that this is too complex for a computer – but there is also a lot of literature on this, where the problem is usually referred to as the state explosion problem. If you are interested in further reading, The State Explosion Problem might be a good starting point.
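To make the static approach concrete, here is a minimal sketch using Python’s built-in ast module: it parses a small code unit and lists the branch conditions a test suite would need to exercise. The `classify` function is an invented example; real tools go much further than this, which is exactly where the state explosion sets in.

```python
# Minimal static analysis for test generation: parse a code unit and
# collect every if/elif condition, i.e. the branches tests must cover.
import ast

SOURCE = """
def classify(x):
    if x < 0:
        return "negative"
    elif x == 0:
        return "zero"
    return "positive"
"""

def branch_conditions(source: str) -> list[str]:
    """Return the source text of every if/elif condition in the code."""
    tree = ast.parse(source)
    return [ast.unparse(node.test)
            for node in ast.walk(tree)
            if isinstance(node, ast.If)]

print(branch_conditions(SOURCE))  # ['x < 0', 'x == 0']
```

Each listed condition implies at least two test cases (condition true and false) – and the number of combinations grows exponentially with nesting, which is the state explosion in a nutshell.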

Thus, the second approach for automating test generation is to introduce randomness to reduce the complexity of the test generation problem. The basic idea is to disregard the abstract syntax tree and just auto-generate test cases for random parts of the code under test. How does this work? For each code unit, you simply insert random input values based on the data types of those values, and record the output of the software after executing it with these inputs. However, without any guidance (we are still talking about FULL automation), these random values might not be relevant at all. Just imagine all the potential values of a floating-point number – do you really think it’s reasonable to randomly choose some values from this infinite space? If you have bad luck, the first 1,000 results might all lie between 1 and 2, which most likely does not help in finding different bugs. Also, how easy is it for you to interpret a value if the random number generator outputs 2.5928303553095320111000222 for a floating-point number? And now, just forward this problem to randomly generated strings… Additionally, this approach assumes that the current version of the code unit is correct (as the output values are recorded as expected outputs). But isn’t the whole reason we do software testing in the first place that we assume our code is NOT correct??
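A toy sketch of this unguided approach makes both problems visible: the inputs are unreadable, and the current outputs are blindly recorded as the “expected” values. The `price_discount` function is an assumed example for illustration.

```python
# Unguided random test generation: feed random floats into a code unit
# and record whatever it returns as the expected output.
import random

def price_discount(amount: float) -> float:
    """The (possibly buggy) code under test."""
    return amount * 0.9 if amount > 100 else amount

random.seed(42)  # reproducibility for this sketch only
recorded_cases = []
for _ in range(5):
    # no guidance: any float from a huge range, readable or not
    value = random.uniform(-1e6, 1e6)
    recorded_cases.append((value, price_discount(value)))

# the "generated test suite": assertions that merely freeze current behavior
for inp, out in recorded_cases:
    print(f"assert price_discount({inp!r}) == {out!r}")
```

Note that if `price_discount` already contains a bug, these generated assertions cement it as correct behavior – the exact assumption criticized above.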

So to sum up, fully automated test case generation outputs a lot of test cases, most of which might be redundant, and all of which are usually very difficult to interpret and maintain. However, manually creating all test cases is not the best solution either. There is indeed a lot of automation potential. Techniques such as equivalence class partitioning (read more about it here: 2 methods that help you save 60 % of your effort in Unit Testing) can give automation tools the information required to generate meaningful test cases. The nice side: these techniques are optimized to require as little time as possible.
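As a sketch of this middle ground: a human specifies the equivalence classes once, and tooling expands them into concrete test cases. The partition below for an assumed `price_discount` function is illustrative, not taken from any real tool.

```python
# Equivalence class partitioning: one representative plus boundary
# values per class, instead of unguided random sampling.
def price_discount(amount: float) -> float:
    """The code under test: 10% discount above 100."""
    return amount * 0.9 if amount > 100 else amount

# human-provided knowledge: named classes with representative values
PARTITIONS = {
    "no discount": [50.0, 100.0],    # amount <= 100, incl. boundary
    "discount":    [100.01, 500.0],  # amount > 100, incl. near-boundary
}

def generate_cases(partitions):
    """Expand each class into (name, input, recorded output) cases."""
    return [(name, value, price_discount(value))
            for name, values in partitions.items()
            for value in values]

for name, value, expected in generate_cases(PARTITIONS):
    print(f"{name}: price_discount({value}) -> {expected}")
```

Four readable, named test cases replace thousands of random floats – little human effort, but effort nonetheless.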

So: try to automate unit test generation – but always keep in mind that the less effort you put in, the worse your results might get!

2. Unit Test Evaluation: The time vs. accuracy tradeoff

Even if you have some test cases at hand, you still want to know to what extent these tests are representative of the quality of your software. Therefore, metrics have been proposed to automatically evaluate the potential of a given test suite to find bugs. These metrics can be used by AI tools (genetic algorithms have been shown to be very efficient here, see e.g.: Search Algorithms for Regression Test Case Prioritization). However, there are still some issues with these metrics, which will also hinder full automation for test suite maintenance in the end.

Metrics such as code coverage can be calculated very easily, and thus give a fast overview of test suite effectiveness. However, these results are not really reliable, because of the nature of the underlying calculations. Why? For the same reason we mentioned in argument #1 already: we have to assume that the current version of our code unit is correct. Code coverage simply counts the lines of code which are covered by a test case. But what if there are lines of code which should have been written (e.g. input checks), but are not present in the actual software? For some examples that show the impact of this problem, continue reading here: Limits at which code coverage fails catastrophically.

As a result of this issue, blackbox-based evaluation metrics such as mutation scores have been developed. A mutation score is calculated by altering the code under test to introduce faults (e.g. changing < to <= in a check condition) – so-called mutants. If an introduced mutant makes at least one test case fail, it is regarded as “killed” by the respective test suite. The mutation score is then calculated by dividing the number of killed mutants by the overall number of introduced mutants. However, calculating this value proves to be quite computationally intense. For each mutant, the code needs to be parsed and adapted, and then the whole test suite needs to be run. Why the whole test suite for each mutant? Well, if you run the test suite only once for several mutants and several test cases fail, you don’t know whether these failures were caused by a single mutant or by several. Thus, calculating mutation scores can be very slow, leading to a tradeoff in unit test evaluation between time (very low for code coverage, but high for mutation scores) and accuracy (very high for mutation scores, but low for code coverage).
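The whole procedure fits in a short sketch: generate mutants, run the full suite once per mutant, then divide killed by total. The string-replacement mutation below is a deliberate simplification – real tools such as mutmut (Python) or PIT (Java) mutate the AST or bytecode – and the `is_adult` example is invented.

```python
# Minimal mutation-score calculation: one full test-suite run per mutant.
SOURCE = "def is_adult(age):\n    return age >= 18\n"

# each mutant swaps exactly one operator/literal occurrence
MUTATIONS = [(">=", ">"), (">=", "<="), ("18", "17")]

def run_suite(func) -> bool:
    """The whole test suite; returns True iff every test passes."""
    try:
        assert func(18) is True
        assert func(17) is False
        return True
    except AssertionError:
        return False

def mutation_score(source: str, mutations) -> float:
    killed = 0
    for old, new in mutations:
        namespace = {}
        exec(source.replace(old, new, 1), namespace)  # build the mutant
        if not run_suite(namespace["is_adult"]):      # suite fails -> killed
            killed += 1
    return killed / len(mutations)

print(mutation_score(SOURCE, MUTATIONS))  # 1.0 -- every mutant is killed
```

Even in this toy, the suite runs once per mutant; with thousands of mutants and a suite taking minutes, the cost explodes – which is exactly the time side of the tradeoff.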

Without solving this tradeoff, AI-based test maintenance will not work in the end. These techniques need some kind of information on the value of each test case. However, because of the mentioned tradeoff, there is no metric that provides stable results and can be calculated at scale (AI-based techniques usually require a lot of repeated calculations, especially when it comes to genetic algorithms). Thus, there will always be a human involved when it comes to maintaining test suites.

So after reading this article, I hope you learned that even though there is a lot of potential for automation in current unit testing practices, this automation will always require a human in the loop. I’ve also equipped you with the two main arguments to convince others of this fact. But what is your personal take on AI-based automation? Let me know in the comments section below!
