How a lack of unit tests almost destroyed our company

In 2021, everything was still going smoothly at AutoLab. The team was bursting with energy and set off into a new chapter full of enthusiasm. With our first major TwinCAT project, we decided to develop our own AutoLab library that would serve as the foundation for future projects. At the same time, we launched our most ambitious endeavor to date: a high-speed end-of-line testing machine that was meant to catapult us to the forefront of inspection technology.

For months, we sprinted toward this goal. Along the way, many new functions, classes, and modules were created, all tested manually in simulation. In meetings, we would go through what was already done and what had been tested. Statements like “The XTS process logic has been tested and works,” or “The event management system was developed over weeks and controls the error handling and response of the entire system” were made, reinforcing our confidence in our work. Unfortunately, it was a false confidence.

Commissioning was scheduled to begin in September. By July, a lot was already finished, but there was still plenty left to do. And that’s when strange behaviors began to pile up. Nothing clearly reproducible, but enough to set off my gut alarm. I analyzed a few of these cases—two full days of investigation—until it suddenly became clear what was going on: the code only worked on the surface. Underneath, it was riddled with bugs, inconsistent states, and silent errors. The manual testing had mostly focused on the standard case, but as soon as one of the many edge cases occurred, things fell apart. It was also quite possible that some things had once worked but were broken again by later changes—only no one could reliably notice, because we lacked automated test cases to systematically uncover such issues. What exactly had gone wrong was impossible to pin down. Without tests, everything remained speculation. It didn’t even matter anymore. What mattered was that we only had two months left—two months to fix everything in a system that no one had ever truly, and I mean systematically, tested.

The only way to make up for my late realization of the situation was to throw myself into finishing the project with everything I had. From that point on, 70- to 80-hour weeks became my norm. By the time commissioning began, many of the bigger bugs were fixed and most features prepared—unfortunately still without clean unit tests, just with sheer willpower. By early September, we had a project state we could actually start commissioning with. The first disaster had been averted, but the next one was already on the horizon.

We had only two months left until the Run@Rate, where the machine would have to prove its capabilities at full capacity. The big errors? We had already beaten them. But the truly dangerous ones weren’t the obvious ones—they were the dozens of small ones. And a small error could easily cause a major crash and wreck the entire schedule. And it was exactly these small errors that began to surface during commissioning, one after another.

We now sprinted toward the project finish line in overlapping two-shift operation: one programmer in the “normal day shift,” me from 12 noon until 3 a.m. During the day, I helped with the robots; in the evening and at night, I worked on cleaning up the PLC program or wrote the first HALCON routines for image processing. Saturdays and Sundays were, of course, workdays too. The lack of testing was now catching up to us, and the deadline was getting closer. Two weeks before Run@Rate, almost nothing was working yet—but giving up was not an option. Two working days before Run@Rate, the machine ran stably and robustly for the first time. I could hardly believe it. Somehow, through sheer force of will, we had made it. The Run@Rate itself was passed with flying colors, and the celebration afterward was long and lively.

The project was ultimately completed successfully, but the library was left in shambles—practically unusable. The planned project hours had been vastly exceeded. At that moment, it became clear to me that things could not continue like this. We needed a fundamental change: libraries that were not only tested manually, but reliably and automatically verified—with a kind of zero-trust approach. And so, we started our first library developed with consistent unit testing. It quickly became clear that this new approach was paying off: the library functions were reliable, and the error rate dropped drastically. This was confirmed impressively at the next commissioning, which went almost flawlessly thanks to the tested libraries and, this time, was completed in just a few weeks—without overtime. Hidden bugs were a thing of the past.

Today, I am convinced: unit testing saved AutoLab. If we hadn’t introduced it, we might have gotten lucky once or twice more, but our downfall would have been inevitable. Without tests, you rely on luck. With tests, you rely on clean, systematic validation.

What are unit tests and why are they important?

Unit tests

Unit tests are automated tests that check individual, isolated units of your code—typically functions or methods. They help ensure that these small building blocks of code do exactly what they are supposed to. Each test is written in such a way that it runs independently from the rest of the application.

Imagine you are developing an app that calculates discounts on products. One function in it should apply a 10% discount to a price.

A unit test would now test exactly this one function—independently of whether the app has a database, whether it runs in the browser, or whether other features exist.

Example: You enter €100 into the function—the test checks whether €90 comes out. Then you test €50 → expectation: €45. You can also check special cases: e.g. €0 or negative values.

The goal: You want to make sure that this one discount calculation works correctly—completely on its own.
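A minimal sketch of how such a test could look with pytest (the function apply_discount and its behavior are invented here purely for illustration):

```python
# test_discount.py -- self-contained sketch; run with `pytest test_discount.py`
# In a real project the function would live in its own module; it is inlined here for brevity.
import pytest


def apply_discount(price: float, rate: float = 0.10) -> float:
    """Return the price after applying a percentage discount (10% by default)."""
    if price < 0:
        raise ValueError("price must not be negative")
    return round(price * (1 - rate), 2)


def test_ten_percent_discount_on_100():
    assert apply_discount(100.0) == 90.0


def test_ten_percent_discount_on_50():
    assert apply_discount(50.0) == 45.0


def test_zero_price_stays_zero():
    assert apply_discount(0.0) == 0.0


def test_negative_price_is_rejected():
    with pytest.raises(ValueError):
        apply_discount(-10.0)
```

Each test calls the function with a fixed input and compares the actual result against the expected one; the edge cases (zero and negative prices) get their own tests.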

In your real application or library, the functions you test are often much more complex, for example, synchronizing motion controls. The good thing, however, is that defining the expected value is usually surprisingly simple. If the actual value later matches it, the unit test passes. If not, it fails, and you immediately know where to start. And most importantly: once written, the test can be executed automatically and systematically with every change in the codebase!

In this case, it doesn’t matter which programming language or environment you are working in. Whether Python, TwinCAT, or MATLAB—the principle remains the same: call a function with certain inputs and check the result against the expected value.

In Python, I work with pytest; in TwinCAT, with TcUnit; and in MATLAB, I use script-based tests that can be started directly via runtests.

Integration tests: the next step

While unit tests check individual building blocks, integration tests check whether several components work together. For example: an operator sets a target value via the touch panel. The test checks whether this value correctly reaches the PLC and is processed there, including feedback to the interface. Such tests are usually more complex and run more slowly, but they reveal precisely the critical transitions and interfaces where systems often fail.
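As a rough illustration of the idea (not our actual code; the Hmi and Plc classes below are invented stand-ins), an integration test exercises the whole chain instead of a single function:

```python
# test_setpoint_roundtrip.py -- illustrative only; in reality the PLC side would be
# reached via its real interface (e.g. ADS), not via a Python class.
class Plc:
    """Stand-in for the controller: accepts a target value and reports back what it accepted."""

    def __init__(self) -> None:
        self.target_rpm = 0

    def write_target(self, rpm: int) -> int:
        # A real controller would validate/clamp the value and return the accepted result.
        self.target_rpm = max(0, min(rpm, 3000))
        return self.target_rpm


class Hmi:
    """Stand-in for the touch panel: forwards operator input and shows the feedback."""

    def __init__(self, plc: Plc) -> None:
        self.plc = plc
        self.displayed_rpm = 0

    def operator_sets_target(self, rpm: int) -> None:
        self.displayed_rpm = self.plc.write_target(rpm)


def test_setpoint_reaches_plc_and_feedback_returns():
    plc = Plc()
    hmi = Hmi(plc)
    hmi.operator_sets_target(1500)
    assert plc.target_rpm == 1500      # the value arrived in the PLC
    assert hmi.displayed_rpm == 1500   # the feedback made it back to the interface
```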

In practice, the line between unit and integration tests often blurs. Of course, one could engage in a scientific debate over definitions, but in my opinion, that’s a complete waste of time. What really matters is that testing is done properly. That includes both unit and integration tests.

The good thing about it: in terms of tooling, it doesn’t matter. Whether pytest or TcUnit—both are suitable for both types of tests. The difference lies only in the goal: either you check the result of a single functional unit, or the result of an entire chain of functions working together in integration.

Test Driven Development (TDD): test first, then code

TDD is a development approach in which you write the test before the actual code. You first think about what a function is supposed to accomplish and then formulate exactly that as an automated test.

After that, you write only as much code as needed for the test to pass. Once that’s achieved, you refactor the code: clean it up, shorten it, rename things, extract reusable pieces. What’s important is that the behavior doesn’t change—the test must still pass.
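A sketch of one such cycle, using an invented helper clamp() as the behavior under development (pytest again, but the cycle itself is tool-agnostic):

```python
# Step 1 (red): formulate the desired behavior as a test before the code exists.
def test_clamp_limits_a_value_to_a_range():
    assert clamp(5, low=0, high=10) == 5     # inside the range: unchanged
    assert clamp(-3, low=0, high=10) == 0    # below the range: raised to the lower bound
    assert clamp(42, low=0, high=10) == 10   # above the range: capped at the upper bound


# Step 2 (green): write just enough code to make that test pass.
def clamp(value: float, low: float, high: float) -> float:
    return min(max(value, low), high)

# Step 3 (refactor): rename, simplify, extract -- the test above must still pass unchanged.
```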

TDD helps to think through the code more carefully, write it in a more modular way, and catch errors early on. Especially as complexity grows, this pays off in the long run.

Above all, the point about modularity is, in my opinion, a great side effect. TDD does not inherently aim to make the code more modular, but it automatically forces you to break functions down into small, manageable units that are as independent of other methods as possible. My experience: the more TDD I do, the cleaner my code becomes—seemingly as if by magic.

Who actually writes the tests?

In TDD, it is common for the same person to write both the test and the corresponding code. First, you formulate a test for a specific behavior, then you write the function that fulfills this behavior. After that comes the next test, the next function—step by step. In my experience, this also has a very practical mental advantage. Instead of working toward one big reward at the end, you experience many small intermediate successes throughout the day. Personally, this makes programming much more enjoyable for me.

Do you have to write all the tests first?

No. In TDD, the goal is not to write out the complete test catalog of a class in advance. Quite the opposite. You start with a small, specific test that describes a particular behavior. Then you write the corresponding code. Once that works, you move on to the next test, the next function.

In this way, clean code emerges step by step, traceable and verifiable at any time.

Challenges without unit tests

A central problem with skipping unit tests lies in the different approaches developers take to manual testing. While some proceed thoroughly, others hardly test individual methods at all and only check the code in the overall system, which makes debugging extremely difficult. On top of that, manual testing is significantly more time-consuming in the long run. For example: let’s assume manually testing a method takes about five minutes. Writing an automated test might take twenty minutes, but as soon as changes or refactorings come into play, the effort quickly pays off—because the test can be executed automatically as many times as needed.

Small example calculation:

  • Manual test: 5 minutes
  • Automated test (one-time write): 20 minutes
  • Break-even: 20 minutes ÷ 5 minutes = after 4 runs

That means: after just four manual test runs, the automation effort has already paid off. In large projects or libraries used over many years, manual testing quickly becomes a bottleneck. In addition, automated tests also uncover errors that are otherwise easy to overlook. For example, I once noticed through unit tests that a change in our own library had triggered a bug in an external integrated library, which would otherwise have gone undetected. Such examples show how valuable automated tests are for quality assurance.

Another crucial disadvantage is the lack of traceability. Automated tests document the testing process and the results. Moreover, without automated tests, many hidden dependencies and errors that creep into complex projects go unnoticed. All these points clearly show why unit tests are so valuable and, in the long run, significantly improve not only quality but also development efficiency.

Best Practices for Unit Testing

💡A unit test for every bug

Every bug gets its own unit test. When you discover a bug, first write a unit test that fails because of it. Only then fix the bug, and keep the test forever so that the fix stays covered even during later refactoring.
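Staying with the hypothetical discount function from the sketch above: suppose a user reports that a price of 19.99 comes back as 17.990000000000002 because the result was never rounded. Before touching the code, the bug is pinned down with a test that initially fails, and that test then stays in the suite permanently:

```python
def test_discount_result_is_rounded_to_cents():
    # Regression test for the (hypothetical) rounding bug reported above.
    # It fails on the buggy version and keeps the fix covered through every future refactoring.
    assert apply_discount(19.99) == 17.99
```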

💡Test behavior, not implementation

If you have to restructure the code just so the test can “see into” it, you are no longer testing the behavior; you are testing the implementation.
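A contrived pytest illustration of the difference (DiscountCalculator is invented for this example): the first test checks only the observable result and survives any refactoring, while the second one ties itself to an internal detail and breaks as soon as that detail changes, even though the behavior stays the same:

```python
class DiscountCalculator:
    """Invented example class; it stores an intermediate value only to show the anti-pattern."""

    def apply(self, price: float) -> float:
        self._last_discount_amount = price * 0.10   # internal detail, could vanish tomorrow
        return price - self._last_discount_amount


def test_behavior():
    # Good: cares only about input and output.
    assert DiscountCalculator().apply(100.0) == 90.0


def test_implementation_detail_anti_pattern():
    # Brittle: asserts on a private attribute, so renaming or removing
    # _last_discount_amount breaks the test although the result is still correct.
    calc = DiscountCalculator()
    calc.apply(100.0)
    assert calc._last_discount_amount == 10.0
```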

💡Keep tests small, independent, and clear

A good unit test is compact, self-contained, and tells you right away whether everything is correct.

💡Use meaningful test names

A good test name is like a good commit message: you immediately understand what it’s about—without any further comment.

💡Better to start than to wait

Better to start with automated unit tests today and trigger the pipeline manually than to wait months for the perfect automation.

Success stories and results

I am truly convinced that TDD saved our company. The reasons for this are manifold. In the systems we developed without unit testing, we regularly had smaller bugs even after commissioning. Nothing serious that would have endangered production, but annoying enough to make life difficult, especially for inexperienced operators, and to cost us a lot of time.

In all projects with TDD, the commissioning time has been drastically reduced—from around 30 to just 8 weeks for projects of similar size. And after commissioning, the topic is usually done. Occasionally, there may be feedback with requests for improved usability, but the functionalities simply run.

Another huge advantage is the trust we now have in our libraries. When developing machines as complex as ours, you always have dozens of actuators and sensors interacting with each other, plus different modes and operator inputs. In the past, when strange effects occurred, you were faced with the question: Is the bug in component A, B, C, D, E, or F?

Today, we usually know very quickly where to look, because over 90% of the code is in libraries that have been tested almost to perfection. In 99.99% of cases, the error is not there, and you can rule out a significant number of possible sources within seconds.

Just recently, we had a case where a system repeatedly stopped due to an error message: “Multiple, contradictory end positions on cylinder XY.” Thanks to our feature that automatically saves all history logs with every step change after an error occurs, we were able to trace it—even when it happened during the night shift.

Maintenance checked the situation, moved the cylinder, and assured us that the sensors were properly adjusted and the error was not plausible. The first impulse of many: the library has a bug. But since we had numerous unit and integration tests for exactly this case, I was able to rule that out with near certainty. That left only a few possibilities. One of them: the filter time for the end positions was set too high. Due to the signal delay, the error sometimes occurred.

The cause was therefore found within minutes—instead of a developer having to spend hours searching the code for errors. The library was then updated so that operators can now only set filter times that make sense, making such pseudo-errors impossible.

Even just now, while I’m writing this blog post, one of our developers reported a strange edge case: the TwinCAT event logger runs into a problem if, while an alarm is active, you reconfigure whether operator confirmation is required. Thanks to the numerous tests, we didn’t have to look for a needle in a haystack, but knew within minutes that it could only be an edge case occurring when making this change in FB_init. In the past, the analysis might have taken us a whole day—now it takes just a few minutes.

If I wanted to, I could list hundreds more advantages, but I’ll close with one that is far from insignificant: onboarding junior developers. I still clearly remember when we were working on a generic axis module as part of a bachelor’s thesis. Implementing the enabling and subsequent disabling of the axis took the developer 15 minutes. But when we introduced unit tests together with her and familiarized her with TDD, one tiny bug after another came to light. In the end, the whole process took 3 hours.

Some might cry out here and say: “Exactly, that shows how much time you waste with TDD.” But I see it as an absolute success. Thanks to testing, the focus is on a flawless implementation. The developer learns how many little details interact even in simple functions. And as a team lead, I know that during commissioning we save not just hours, but often days or even weeks—because we can rightly trust the foundation thanks to the tests.

Conclusion and outlook

In the end, it can be said that introducing TDD was an absolute game changer for us at AutoLab. It made our libraries more modular, easier to maintain, and many times less error-prone. Compared to the problems we had before without TDD, I don’t think it’s an exaggeration to say that it “saved” us.

Nevertheless, especially in the PLC field, the frameworks are still quite immature, and the entire industry probably still has significant potential to unlock here. We will definitely try to push this further by continuously demonstrating the benefits and actively promoting it as part of our consulting services.

If industrial automation is still lagging behind general software development when it comes to TDD, then in the research field it’s probably even worse. I have been able to observe this in the course of my academic work—and I am also deliberately trying to counteract it. I am deeply convinced that TDD also leads to better research work, since by now almost every technical or scientific field relies heavily on software.

