Why TDD for PowerShell? Or why Pester? Or why unit test a "scripting" language? / by Matt Wrock

I was asked a couple weeks ago by Adam Bertram (@abertram) on Twitter for any info on why one would want to use TDD with Pester. I have written a couple posts on HOW to use Pester, and I'm sure I mentioned TDD, but I really don't recall ever writing about WHY one would use TDD. I think that's a fascinating question. I have not been writing much PowerShell at all these days, but these questions are just as applicable to the infrastructure code I have been writing in Ruby. I have a lot of thoughts on this subject, but I'd like to expand the question to an even broader scope: why use Pester (or any unit testing framework) at all? Really? Unit tests for a "scripting" language?

We are living in very interesting times. As "infrastructure as code" grows in popularity, we have devs writing more systems code and admins/ops writing more and more sophisticated scripts. In some Windows circles, we see administrators learning and writing code who have never scripted before. So you have talented devs who don't know layer 1 from layer 2 networking and think CIDR is just something you drink, and experienced admins who consider SharePoint a source control repository and have never considered writing tests for their automation.

I'm part of the "dev" group and have no right to judge here. I believe God placed CIDR calculators on the internet (thanks, God!) for calculating IP ranges and Wikipedia as a place to look up the OSI model. However, I'm fairly competent at writing tests and believe the discovery of TDD was a turning point in my becoming a better developer.

So this post is a collection of observations and thoughts on testing "scripts". I'm intentionally putting scripts in quotes because I'm finding that one person's script quickly becomes a full-blown application. I'll also touch on TDD, which I am passionate about but less dogmatic on than I once was.

Tests? I ran the code and it worked. There's your test!

Don't dump your friction-laden values on my devops rainbow. By the time your tests turn green, I've shipped already and am in the black. I've encountered these sentiments both in infrastructure and in more traditional coding circles. Sometimes it is a conscious inability to see value in adding tests, but many times the developers are simply not considering tests or have never written test code. One may argue: why quibble over these implementation details? We are taking a huge, slow manual process that took an army of operators hours to accomplish and boiling it down to an automated script that does the same in a few minutes.

Once the script works, why would it break? In the case of provisioning infrastructure, many may feel that if the VM comes up and runs its bootstrap installs without errors, extra validations are a luxury.

Until the script works, testing it is a pain

So we all choose our friction. The first time we run through our code changes, we think we'll manually run it, validate it and then move on. Sounds reasonable, until the manual validations prove the code is broken again and again and again. We catch ourselves rerunning cleanup, rerunning the setup, then rerunning our code and then checking the same conditions. This gets old fast and gets even worse when you have to revisit it a few days or weeks later. It's great to have a harness that will set up, run the exact same tests and then clean up - all by invoking a single command.
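In Pester terms, that harness might look something like this (a minimal sketch in Pester v4+ syntax; Install-WidgetConfig and the config path are hypothetical stand-ins for your own code):

    Describe 'Install-WidgetConfig' {
        BeforeEach {
            # rerun the cleanup so every run starts from a known state
            Remove-Item "$env:TEMP\widget.conf" -ErrorAction SilentlyContinue
        }

        It 'writes the expected config file' {
            Install-WidgetConfig -Path "$env:TEMP\widget.conf"
            Test-Path "$env:TEMP\widget.conf" | Should -Be $true
        }

        AfterAll {
            # leave the machine the way we found it
            Remove-Item "$env:TEMP\widget.conf" -ErrorAction SilentlyContinue
        }
    }

Now setup, execution, validation and cleanup all replay identically every time you type Invoke-Pester.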

No tests? Welcome to Fear Driven Development

Look, testing is really hard. At least I think so. I usually spend way more time getting tests right and well factored than whipping out the actual implementation code. However, whenever I am making changes to a codebase, I am so relieved when there are tests. It's my safety net. If the tests were constructed skillfully, I should be able to rip things apart and know from all the failing tests that things are not deployable. I may need to add, change or remove some tests to account for my work, but overall, as those failing tests go green, it's like breadcrumbs leading me back home to safety.

But maybe there are no tests. Now I'm scared, and I should be, and if you are on my team then you should be too. So I have a Sophie's choice: write tests now, or practice yet another time-honored discipline - prayer-driven development - sneaking in just this one change and hoping some manual testing gets me through it.

I'm not going to say that the former is always the right answer. Writing tests for existing code can be incredibly difficult and can turn a 5 minute bug fix into a multi-day yak-hair-detangling session, even when you focus on just adding tests for the code you are changing. Sometimes it is the right thing to invest this extra time. It really depends on context, but I assure you, the more one takes the latter road, the more dangerous the code becomes to change. The last thing you want is to be afraid to change your own codebase - unless, of course, it works perfectly and its requirements are immutable.

Your POC will ship faster with no tests

Oh shoot, we shipped the POC. (You are likely saying something other than "shoot").

This may not always be the case, but I am pretty confident that an MVP (minimum viable product) can be completed more quickly without tests. However, v2 will be slower, v3 slower still, and v4 and on will likely be akin to death marches. You have probably hired a bunch of black-box testers to test the features, and they report bugs well after the developer has mentally moved on to other features. As the cyclomatic complexity of your code grows, it becomes nearly impossible to test all conditions affected by recent changes, let alone remember them.

TIP: A POC should be no more than a POC. Prove the concept and then STOP and do it right! Side note: it's pretty awesome to blog about this and stand so principled...real life is often much more complicated...ugh...real life.

But infrastructure code is different

Ok. So far I don't think anything in this post is specific to infrastructure code. As far as I am concerned, these are pretty universal rules of testing. However, infrastructure code IS different. I started the post (and titled it) referring to Pester - a test framework written in and for PowerShell. Chances are (though no guarantees) that if you are writing PowerShell, you are working on infrastructure. I have been focusing on infrastructure code for the past 3 to 4 years, and I have really found it different. I remain passionate about testing, but have embraced different patterns, workflows and principles since working in this domain. And I am still learning.

If I mock the infrastructure, what's left?

So when writing more traditional software projects (whatever the hell that is, but I don't know what else to call it), we often try to mock or stub out external "infrastructure-ish" systems. File systems, databases, network sockets - we have clever ways of faking these out, and that's a good thing. It allows us to focus on the code that actually needs testing.
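Pester makes this easy with its Mock keyword. A minimal sketch (Get-LogSize is a hypothetical function that sums file sizes under a path):

    function Get-LogSize {
        param([string]$Path)
        (Get-ChildItem -Path $Path -File | Measure-Object -Property Length -Sum).Sum
    }

    Describe 'Get-LogSize' {
        It 'sums the length of every file' {
            # fake out the file system; no real disk access happens
            Mock Get-ChildItem {
                [pscustomobject]@{ Length = 10 }
                [pscustomobject]@{ Length = 20 }
            }
            Get-LogSize -Path 'C:\logs' | Should -Be 30
        }
    }

The test exercises the summing logic without ever touching a real disk.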

However, if I am working on a PR for the winrm Ruby gem, which implements the WinRM protocol, or I am provisioning a VM, or am leaning heavily on something that uses the Windows registry, then mocking away all of these layers may lead me into the trap where I am not really testing my logic.

More integration tests

One way my testing habits have changed with infrastructure code is that I am more willing to sacrifice unit tests for integration-style tests. This is because there are likely to be big chunks of code that have little conditional logic and instead spend their effort just moving stuff around. If I mock everything out, I may just end up testing that I am calling the correct API endpoints with the expected parameters. This can be useful to some extent, but can quickly start to smell like the tests just repeat the implementation.

Typically I like the testing pyramid approach: lots and lots of unit tests under a relatively thin layer of integration tests. I'll fight to keep that structure, but I find that the integration layer often needs to be a bit thicker in the infrastructure domain. This may mean that coverage slips a bit at the unit level, but some unit tests just don't provide as much value, and I'm gonna get more bang for my buck in integration tests.
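Pester's TestDrive: helps keep those thicker integration tests honest without leaving litter behind - it is a real directory, wiped after each test, so the code genuinely exercises the file system. A sketch (Expand-ConfigTemplate is hypothetical):

    Describe 'Expand-ConfigTemplate (integration)' {
        It 'really writes the expanded config to disk' {
            Set-Content -Path 'TestDrive:\template.conf' -Value 'port={{PORT}}'

            Expand-ConfigTemplate -Source 'TestDrive:\template.conf' `
                                  -Destination 'TestDrive:\app.conf' `
                                  -Port 8080

            # no mocks: we assert on the actual file the code produced
            Get-Content -Path 'TestDrive:\app.conf' | Should -Be 'port=8080'
        }
    }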

Still - strive for unit tests

Having said I'm more willing to trade unit tests for integration tests, I would still stress the importance of unit tests. Unit tests can be tricky, but more often than not there is a way to abstract out the code that surrounds your logic in a testable way. It may seem like you are testing some trivial aspect of the code, but if you can capture the logic in unit tests, the tests will run much faster and you can iterate on the problem more quickly. Also, bugs found in unit tests lie far closer to the source of the bug and are thus much easier to troubleshoot.
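For example, rather than burying naming logic inside a provisioning script that talks to a hypervisor, pull it into a small pure function; the test below runs in milliseconds because nothing touches the network (the names are illustrative):

    function Get-NodeName {
        param([string]$Environment, [int]$Index)
        # pure string logic: no VM, no network, nothing to mock
        'web-{0}-{1:d3}' -f $Environment.ToLower(), $Index
    }

    Describe 'Get-NodeName' {
        It 'lowercases the environment and zero-pads the index' {
            Get-NodeName -Environment 'Prod' -Index 7 | Should -Be 'web-prod-007'
        }
    }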

Virtues of thorough unit test coverage in interpreted languages

When working with compiled languages like C#, Go, C++ or Java, it is often said that the compiler acts as Unit Test #1. There is a lot to be said for code that compiles. There is also great value in using dynamic languages, but one downside, in my opinion, is the loss of this initial "unit test". I have run into situations in both PowerShell and Ruby where code was deployed that simply was not correct - a misspelled method name or a reference to an undeclared variable, just to name a couple possibilities. If anything, unit tests that do no more than merely walk all possible code paths can protect code from randomly blowing up.
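A test doesn't need clever assertions to deliver that protection - simply executing every branch, ideally under Set-StrictMode, will surface a misspelled variable the moment its path runs. An illustrative sketch:

    Set-StrictMode -Version Latest

    function Get-InstallMessage {
        param([string]$Version)
        if ($Version) {
            # a typo like $Verison here only blows up when this branch executes
            "Installing $Version"
        }
        else {
            'Installing latest'
        }
    }

    Describe 'Get-InstallMessage' {
        It 'walks the explicit-version path' {
            Get-InstallMessage -Version '1.2' | Should -Be 'Installing 1.2'
        }
        It 'walks the default path' {
            Get-InstallMessage | Should -Be 'Installing latest'
        }
    }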

How about TDD?

Regardless of whether I'm writing infrastructure code or not, I tend NOT to do TDD when I am trying to figure out how to do something - like determining which APIs to call and how to call them. How can I test for outcomes when I have no idea what they look like? I might not know what registry tree to scan, or even whether the thing I'm automating is controlled by the registry at all.

Well, with infrastructure code I find myself in more scenarios where I start off having no idea how to do something, and the code is a journey of figuring that out. So I'm probably not writing unit tests until I do. But when I can, I still love TDD. I've done lots of infrastructure TDD. It really does not matter what the domain is - I love the red, green, refactor workflow.
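The rhythm is the same as in any other domain: write the failing test first, then the simplest implementation that goes green, then refactor. A tiny example in that spirit (ConvertTo-SubnetMask is hypothetical - and yes, there's that CIDR again):

    # red: this test fails first because ConvertTo-SubnetMask does not exist yet
    Describe 'ConvertTo-SubnetMask' {
        It 'converts a /24 prefix to a dotted mask' {
            ConvertTo-SubnetMask -PrefixLength 24 | Should -Be '255.255.255.0'
        }
    }

    # green: the simplest thing that passes; refactoring comes after
    function ConvertTo-SubnetMask {
        param([int]$PrefixLength)
        $bits = ('1' * $PrefixLength).PadRight(32, '0')
        $octets = 0..3 | ForEach-Object {
            [convert]::ToInt32($bits.Substring($_ * 8, 8), 2)
        }
        $octets -join '.'
    }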

If you can't test first, test ASAP

So maybe writing tests first does not make sense in some cases, as you hammer out just how things work. But once you do figure things out, either refactor what you have with tests first, or fill in with tests after the initial implementation. Another law I have found to apply equally to all code is that the longer you delay writing the tests, the more difficult (or impossible) it is to write them. Some code is easy to test and some is not. When you are coding with the explicit intention of writing tests, you are motivated to make sure things are testable.

This also tends to have the nice side effect of breaking the code down into smaller, decoupled components, because it's a pain in the butt to test monoliths.

When do we NOT write tests?

I don't think the answer is never. However, more often than not, "throw away code" is not thrown away. Instead it grows and grows. What started as a personal utility script gets committed to source control, distributed with our app and depended on by customers. So I think we just need to be careful to identify, as early as possible, the inflection point where our "one-off" script becomes a core routine of our infrastructure.