Boxstarter 2.5 Released by Matt Wrock

This week I released Boxstarter 2.5.10. Although no new features are introduced in this release, or really the last several releases, it seemed appropriate to bump the minor version after so many bug fixes, stabilizations and small experience improvement tweaks in v2.4. The focus of this release has been to provide improved isolation between the version of Chocolatey used by boxstarter and the version previously (or subsequently) installed by the user.

In this post I'd like to provide further details on why boxstarter uses its own version of Chocolatey and also share some thoughts on where I see boxstarter heading in the future.

Why is boxstarter running its own version of Chocolatey?

When Chocolatey v0.9.9 was released, it introduced a complete rewrite of chocolatey, transforming it from a set of powershell scripts and functions to a C# based executable. This fundamentally changed the way that boxstarter is able to interact with chocolatey and hook into the package installation sequence. Boxstarter essentially "monkey patches" chocolatey in order to intercept certain package installation commands. It needs to do this for a few reasons - the most important is to be able to detect pending reboots prior to any install operation and reboot if necessary.
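
To make that concrete, here is a rough sketch (not boxstarter's actual implementation) of the kind of pending-reboot check that has to run before each install operation. It inspects a few well known registry locations Windows uses to flag a required restart; Invoke-Reboot is boxstarter's own reboot command.

# rough sketch only - the real check covers more cases than this
function Test-PendingReboot {
    # Component Based Servicing and Windows Update both flag pending reboots with registry keys
    $cbs = Test-Path 'HKLM:\SOFTWARE\Microsoft\Windows\CurrentVersion\Component Based Servicing\RebootPending'
    $wua = Test-Path 'HKLM:\SOFTWARE\Microsoft\Windows\CurrentVersion\WindowsUpdate\Auto Update\RebootRequired'
    # pending file rename operations also indicate a reboot is needed
    $pfro = (Get-ItemProperty 'HKLM:\SYSTEM\CurrentControlSet\Control\Session Manager' -Name PendingFileRenameOperations -ErrorAction SilentlyContinue) -ne $null
    return ($cbs -or $wua -or $pfro)
}

# before every package install operation
if (Test-PendingReboot) { Invoke-Reboot }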

Boxstarter now installs the last powershell incarnation of chocolatey (v0.9.8.33) in its own install location ($env:appdata\Boxstarter\Chocolatey). However, it keeps the chocolatey lib and bin directories in the standard chocolatey locations inside $env:ProgramData\Chocolatey so that all applications installed with either version of chocolatey are kept in the same repository. Boxstarter continues to intercept choco commands and reroute them to its own chocolatey.ps1. It's important to note that this only happens when installing chocolatey packages via boxstarter.
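
As a simplified illustration (not boxstarter's actual source), you can picture the rerouting as a powershell function that shadows choco - functions take precedence over .exe files in powershell command resolution - and hands the arguments to boxstarter's private copy. The exact path to chocolatey.ps1 below is an assumption based on the install location described above.

function choco {
    # a function named choco is resolved before choco.exe, so this intercepts the call
    $bsChoco = Join-Path $env:appdata 'Boxstarter\Chocolatey\chocolateyInstall\chocolatey.ps1'
    & $bsChoco @args
}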

Boxstarter cannot achieve the same behavior with the latest version of chocolatey without some significant changes to boxstarter and likely some additional changes contributed to Chocolatey. I did some spiking on this while on vacation a few months ago but have not found the time to dig in further since. However, it is the highest priority item lined up for boxstarter feature work, standing just behind critical bug fixes.

Windows protected your PC?

If you install boxstarter using the "click-once" based web installer, you may see a banner in windows 8/2012 and up when installing boxstarter. This is because I have had to obtain a new code signing certificate (an annual event) and it has not yet gained enough time/downloads to establish complete trust from windows. I do not know what the actual algorithm for gaining this trust is, but last year these warnings went away after just a few days. In the meantime, just click the "more info" link and then choose "run anyway."

Why fewer features?

Over the last year I have taken a bit of a break from boxstarter development in comparison to 2013 and early 2014. This is partly because my development has been much less windows centric and I've been exploring the rich set of automation tool chains that have deeper roots in linux but have been gaining a foothold in windows as well. These include tools such as Vagrant, Packer, Chef, Docker and Ansible. This exploration has been fascinating. (Side note: another reason for less boxstarter work is my commitment to this blog. These posts consume a lot of time but I find it worth it.)

I strongly believe that in order to provide sound automation tools for windows and truly understand the landscape, one must explore how these tools have been built and consumed in the linux world.  Many of these tools have some overlap with boxstarter and I have no desire to reinvent the wheel. I also think I now have a more mature vision of how to focus future boxstarter contributions.

What's ahead for boxstarter?

First and foremost - compatibility with the latest chocolatey. This will mean some possibly gut wrenching (the good kind) changes to boxstarter architecture.

Integrating with other ecosystems

I'd like to focus on taking the things that make boxstarter uniquely valuable and making them accessible to other tools. This also means cutting some boxstarter features where other tools do a better job delivering the same functionality.

For instance, I'd like to make boxstarter provisioners for Vagrant and Packer. This means people can more easily setup windows boxes and images with boxstarter while leveraging Vagrant and Packer to interact with the Hypervisor and image file manipulation. With that, I could deprecate the Azure and Hyper-V boxstarter modules because Vagrant is already doing that work.

I'd also like to see better interoperability between boxstarter and other configuration management tools like chef, puppet and DSC.

Decoupling Features

Boxstarter has some features which I find very useful like its streaming scheduled task output, hang detection, and ability to wrap a script with reboot resiliency. There have been many times that I would have liked to have been able to reuse these features in different contexts, but they are currently hard wired and not intended to be easily consumable on their own. I'd like to break this functionality out which would not only make these features more accessible outside of boxstarter but also improve the boxstarter architecture.

Hoping for a more active year to come

I'm unfortunately still a bit time starved but I am really hoping to knock out the chocolatey compatibility work and get started on some new boxstarter features and integrations over the next year.

Why TDD for PowerShell? Or why pester? Or why unit test a "scripting" language? by Matt Wrock

I was asked a couple of weeks ago by Adam Bertram (@abertram) on twitter for any info on why one would want to use TDD with Pester. I have written a couple of posts on HOW to use pester and I'm sure I mentioned TDD, but I really don't recall ever seeing any posts on WHY one would use TDD. I think that's a fascinating question. I have not been writing much powershell at all these days, but these questions are just as applicable to the infrastructure code I have been writing in ruby. I have a lot of thoughts on this subject, but I'd like to expand the question to an even broader scope. Why use pester (or any unit testing framework) at all? Really? Unit tests for a "scripting" language?

We are living in very interesting times. As "infrastructure as code" grows in popularity, we have devs writing more systems code and admins/ops writing more and more sophisticated scripts. In some windows circles, we see more administrators learning and writing code who have never scripted before. So you have talented devs who don't know layer 1 from layer 2 networking and think CIDR is just something you drink, and experienced admins who consider SharePoint a source control repository and have never considered writing tests for their automation.

I'm part of the "dev" group and have no right to judge here. I believe god placed CIDR calculators on the internet (thanks god!) for calculating IP ranges and wikipedia as a place to look up the OSI model. However, I'm fairly competent in writing tests and believe the discovery of TDD was a turning point in my becoming a better developer.

So this post is a collection of observations and thoughts on testing "scripts". I'm intentionally surrounding scripts in quotes because I'm finding that one person's script quickly becomes a full blown application. I'll also touch on TDD, which I am passionate about but less dogmatic on than I once was.

Tests? I ran the code and it worked. There's your test!

Don't dump your friction laden values on my devops rainbow. By the time your tests turn green, I've shipped already and am in the black. I've encountered these sentiments both in infrastructure and more traditional coding circles. Sometimes it is a conscious inability to see value in adding tests, but many times the developers just are not considering tests or have never written test code. One may argue: Why quibble over these implementation details? We are taking a huge, slow manual process that took an army of operators hours to accomplish and boiling it down to an automated script that does the same in a few minutes.

Once the script works, why would it break? In the case of provisioning infrastructure, many may feel if the VM comes up and runs its bootstrap installs without errors, extra validations are a luxury.

Until the script works, testing it is a pain

So we all choose our friction. The first time we run through our code changes, we think we'll manually run it, validate it and then move on. Sounds reasonable until the manual validations prove the code is broken again and again and again. We catch ourselves rerunning cleanup, rerunning the setup, then rerunning our code and then checking the same conditions. This gets old fast and gets even worse when you have to revisit it a few days or weeks later. It's great to have a harness that will set up, run the exact same tests and then clean up - all by invoking a single command.
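
Here is a minimal Pester sketch (v3-era syntax) of what that looks like; the New-ConfigFile function under test is hypothetical. The setup and cleanup live right next to the assertion and the whole thing runs with a single Invoke-Pester.

Describe "New-ConfigFile" {
    $testDir = Join-Path $env:TEMP 'configtest'

    BeforeEach {
        # setup: a clean scratch directory for every test
        New-Item -ItemType Directory -Path $testDir -Force | Out-Null
    }

    AfterEach {
        # cleanup: leave nothing behind
        Remove-Item $testDir -Recurse -Force
    }

    It "writes the expected setting" {
        New-ConfigFile -Path "$testDir\app.config" -Setting 'debug=true'
        "$testDir\app.config" | Should Contain 'debug=true'
    }
}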

No tests? Welcome to Fear Driven Development

Look, testing is really hard. At least I think so. I usually spend way more time getting tests right and factored than whipping out the actual implementation code. However, whenever I am making changes to a codebase, I am so relieved when there are tests. It's my safety net. If the tests were constructed skillfully, I should be able to rip things apart and know from all the failing tests that things are not deployable. I may need to add, change or remove some tests to account for my work but overall, as those failing tests go green, it's like breadcrumbs leading me back home to safety.

But maybe there are no tests. Now I'm scared, and I should be, and if you are on my team then you should be too. So I have a "Sophie's choice": write tests now or practice yet another time honored discipline - prayer driven development - sneaking in just this one change and hoping some manual testing gets me through it.

I'm not going to say that the former is always the right answer. Writing tests for existing code can be incredibly difficult and can turn a 5 minute bug fix into a multi-day yak hair detangling session, even when you focus on just adding tests for the code you are changing. Sometimes it is the right thing to invest this extra time. It really depends on context, but I assure you the more one takes the latter road, the more dangerous the code becomes to change. The last thing you want is to be afraid to change your codebase - unless it all works perfectly and its requirements are immutable.

Your POC will ship faster with no tests

Oh shoot, we shipped the POC. (You are likely saying something other than "shoot").

This may not always be the case, but I am pretty confident that an MVP (minimal viable product) can be completed more quickly without tests. However, v2 will be slower, v3 even slower. v4 and on will likely be akin to death marches, and you have probably hired a bunch of black box testers who report bugs well after the developer has mentally moved on to other features. As the cyclomatic complexity of your code grows, it becomes nearly impossible to test all conditions affected by recent changes, let alone remember them.

TIP: A POC should be no more than a POC. Prove the concept and then STOP and do it right! Side note: It's pretty awesome to blog about this and stand so principled...real life is often much more complicated...ugh...real life.

But infrastructure code is different

Ok. So far I don't think anything in this post varies with infrastructure code. As far as I am concerned, these are pretty universal rules of testing. However, infrastructure code IS different. I started the post (and titled it) referring to Pester - a test framework written in and for PowerShell. Chances are (though no guarantees) that if you are writing PowerShell you are working on infrastructure. I have been focusing on infrastructure code for the past 3 to 4 years and have really found it different. I remain passionate about testing, but have embraced different patterns, workflows and principles since working in this domain. And I am still learning.

If I mock the infrastructure, what's left?

So when writing more traditional style software projects (whatever the hell that is, but I don't know what else to call it), we often try to mock or stub out external "infrastructure-ish" systems. File systems, databases, network sockets - we have clever ways of faking these out and that's a good thing. It allows us to focus on the code that actually needs testing.

However, if I am working on a PR for the winrm ruby gem that implements the winrm protocol, or provisioning a VM, or leaning heavily on something that uses the windows registry, and I mock away all of these layers, I may fall into the trap of not really testing my logic.

More integration tests

One way in which my testing habits have changed when dealing with infrastructure code is that I am more willing to sacrifice unit tests for integration style tests. This is because there are likely to be big chunks of code that have little conditional logic and instead expend their effort just moving stuff around. If I mock everything out, I may just end up testing that I am calling the correct API endpoints with the expected parameters. This can be useful to some extent but can quickly start to smell like the tests just repeat the implementation.

Typically I like the testing pyramid approach of lots and lots of unit tests under a relatively thin layer of integration tests. I'll fight to keep that structure but find that often the integration layer needs to be a bit thicker in the infrastructure domain. This may mean that coverage slips a bit at the unit level but some unit tests just don't provide as much value and I'm gonna get more bang for my buck in integration tests.

Still - strive for unit tests

Having said that I'm more willing to skip unit tests in favor of integration tests, I would still stress the importance of unit tests. Unit tests can be tricky, but there is more often than not a way to abstract out the code that surrounds your logic in a testable way. It may seem like you are testing some trivial aspect of the code, but if you can capture the logic in unit tests, the tests will run much faster and you can iterate on the problem more quickly. Also, bugs found by unit tests lie far closer to the source of the bug and are thus much easier to troubleshoot.

Virtues of thorough unit test coverage in interpreted languages

When working with compiled languages like C#, Go, C++, Java, etc., it is often said that the compiler acts as Unit Test #1. There is a lot to be said for code that compiles. There is also great value in using dynamic languages, but one downside in my opinion is the loss of this initial "unit test". I have run into situations both in PowerShell and Ruby where code was deployed that simply was not correct - using a misspelled method name or referencing an undeclared variable, just to name a couple of possibilities. If anything, unit tests that do no more than merely walk all possible code paths can protect code from randomly blowing up.
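
For example, here is a hedged Pester sketch of what I mean - Install-Agent is a hypothetical function, and the tests below assert almost nothing, yet a misspelled cmdlet or an undeclared variable in either branch will still fail the run.

Describe "Install-Agent" {
    # mock out the expensive or destructive calls the function makes
    Mock Invoke-WebRequest {}
    Mock Start-Service {}

    It "walks the fresh install path without blowing up" {
        { Install-Agent } | Should Not Throw
    }

    It "walks the upgrade path without blowing up" {
        { Install-Agent -Upgrade } | Should Not Throw
    }
}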

How about TDD?

Regardless of whether I'm writing infrastructure code or not, I tend NOT to do TDD when I am trying to figure out how to do something - like determining which APIs to call and how to call them. How can I test for outcomes when I have no idea what they look like? I might not know what registry tree to scan, or even whether the thing I am automating is controlled by the registry at all.

With infrastructure code I find myself in more scenarios where I start off having no idea how to do something, and the code is a journey of figuring that out. So I'm probably not writing unit tests until I figure it out. But when I can, I still love TDD. I've done lots of infrastructure TDD. It really does not matter what the domain is - I love the red, green, refactor workflow.

If you can't test first, test ASAP

So maybe writing tests first in some cases does not make sense as you hammer out just how things work. Once you do figure things out, either refactor what you have with tests first or fill in with tests after the initial implementation. Another law I have found to apply equally to all code is that the longer you delay writing the tests, the more difficult (or impossible) it is to write them. Some code is easy to test and some is not. When you are coding with the explicit intention of writing tests, you are motivated to make sure things are testable.

This tends to also have some nice side effects of breaking down the code into smaller decoupled components, because it's a pain in the butt to test monoliths.

When do we NOT write tests?

I don't think the answer is never. However, more often than not, "throw away code" is not thrown away. Instead it grows and grows. What started as a personal utility script gets committed to source control, distributed with our app and depended on by customers. So I think we just need to be cautious and identify these inflection points as soon as possible, when our "one-off" script becomes a core routine of our infrastructure.

How to tell if you are running in a remote process by Matt Wrock

A yak running remotely

I've encountered a few scenarios now where I need to perform different logic based on whether or not I am running locally or remotely. One scenario may be that I have just provisioned a new machine from a base image that has a generic bootstrap password and I need to change the root/admin password. If I do this in an SSH or WinRM/PowerShell session, it will likely kill the process I am running in (of course I could schedule it to happen later). Another scenario may be I'm about to install some Windows updates. I've blogged before about how this does not work over WinRM. So I know I will need to perform that action in a scheduled task.

Depending on your OS and the type of process you are living in (ruby, powershell, ssh, winrm, etc) there are different techniques to detect if you are remote or local and some are more friendly than others.

Detecting SSH

There are a couple of ways I have done this. One is language independent. It's probably OS independent too, but I have only done this on Linux. There are a couple of environment variables set in most SSH sessions, so you may be able to simply check if one exists:

if ENV['SSH_CLIENT']
  # your crazy remote code here
end

I've run into problems with this though. Let's say you start a service in an SSH session. Now the process in that service has inherited your environment variables. So even if you log out and the process continues to run outside of the initial SSH context, using the above code will still trigger the remote logic.

Is the console associated with a terminal (tty)?

Most languages have a friendly way to determine if the console has a TTY. Check out this wiki that provides snippets for several different programming languages. I've done this in ruby using:

if STDOUT.isatty
 puts 'I am remote...yo'
end

Detecting a powershell remoting session

Note that neither WinRM nor powershell remoting associates a TTY with the console like SSH does, so you need to take a different approach here. If you are lucky enough to be running in a powershell remoting session and NOT a vanilla WinRM session (these are similar but different), there is an easy way to tell if you are local or being invoked from a remote powershell session:

if($PSSenderInfo -ne $null) { Write-Output 'I am remote' }

$PSSenderInfo contains metadata about one's remote session, like the winrm connection string.
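
For instance, you can peek at a couple of its properties from inside a remoting session (a small illustrative snippet, not from the original post):

if ($PSSenderInfo -ne $null) {
    # the winrm connection string and the identity of the connecting user
    Write-Output "Connected via $($PSSenderInfo.ConnectionString)"
    Write-Output "Running as $($PSSenderInfo.UserInfo.Identity.Name)"
}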

Detecting a non-powershell winrm session

There are many reasons to dislike winrm and I regret to give you one more. I googled the "heck" out of this when I was trying to figure out how to do it, and I assure you words much harsher than "heck" flowed freely from my dry, parched lips. I found nothing. The solution I came up with was to search the process tree for an instance of winrshost.exe. All remote winrm sessions run as a child process of winrshost (god bless it).

Here is how you might do this in powershell:

function Test-ChildOfWinrs($ID = $PID) {
  # guard against getting caught in a circular traverse of the process tree
  if(++$script:recursionLevel -gt 20) { return $false }
  $parent = (Get-WmiObject -Class Win32_Process -Filter "ProcessID=$ID")
  $parent = $parent.ParentProcessID
  if($parent -eq $null) {
    return $false
  }
  else
  {
    try {
      $parentProc = Get-Process -ID $parent -ErrorAction Stop
    }
    catch {
      # the parent process no longer exists
      return $false
    }
    # winrshost hosts all remote winrm sessions
    if($parentProc.Name -eq "winrshost") {return $true}
    else {
      # keep climbing the process tree
      return Test-ChildOfWinrs $parent
    }
  }
}

This function takes a process id but defaults to the currently running process's id if none is provided. It traverses the ancestors of the process looking for winrshost.exe. One thing I found I had to do to make this "safe" is watch the recursion level so it does not become infinite. Your CPU will thank you. I wrote this a year ago and am trying to remember the exact situation, but I do know that I somehow hit a situation where I got caught in a circular traverse of the tree.
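
Using it is then just a matter of branching on the result - for example, deciding whether an install needs to be wrapped in a scheduled task:

if (Test-ChildOfWinrs) {
    Write-Output 'Running under winrm - deferring to a scheduled task'
}
else {
    Write-Output 'Running locally'
}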

Here is a ruby function I have used in chef to do roughly the same thing:

require 'win32ole'

def check_process_tree(parent, attr, match)
  # create the wmi search root once and reuse it for the whole traversal
  wmi = ::WIN32OLE.connect("winmgmts://")
  check_process_tree_int(wmi, parent, attr, match)
end

def check_process_tree_int(wmi, parent, attr, match)
  if parent.nil?
    return nil
  end
  query = "Select * from Win32_Process where ProcessID=#{parent}"
  parent_proc = wmi.ExecQuery(query)
  if parent_proc.each.count == 0
    # the parent process no longer exists
    return nil
  end
  proc = parent_proc.each.next
  result = proc.send(attr)
  if !result.nil? && result.downcase.include?(match)
    # found the process we are looking for
    return proc
  end
  if proc.Name == 'services.exe'
    # reached the top of the tree
    return nil
  end
  return check_process_tree_int(wmi, proc.ParentProcessID, attr, match)
end

This will climb the process tree looking for any process you tell it to. So you would call it using:

is_remote = !check_process_tree(Process.ppid, :Name, 'winrshost.exe').nil?

Be careful with wmi queries and recursion. Every instance of the wmi search root creates a process. So I typically make sure to create one and reuse it.

I hope you have found this post helpful.

A destructible and repeatable chef workstation environment by Matt Wrock

Several months ago I posted on the CenturyLink Cloud blog about how we reproduce our chef development environment with Vagrant, using the chef solo provisioner to consume the same cookbook as our build agents and create an environment for testing cookbooks. This post will dive into the technical details of such a Vagrantfile/cookbook and kick it up a notch by publicly sharing a repo that does just that, stripped of any CenturyLink specific references like our Nexus server, Berkshelf API client setup or internal gem installs. Here I will share a chef_workstation repo that anyone can use to quickly spin up an environment that's awesome for developing and testing cookbooks. The repo can be found in my github chef_workstation repo.

Why a vagrant box for chef development?

Many use vagrant for creating test nodes that converge cookbooks, but it is not as common to hear about using vagrant for creating chef workstations/development environments. The fact is that ChefDK goes a long way toward providing an isolated, pre-built chef environment. However, every environment is different even if many use the ChefDK as a starting point. It's not uncommon to be changing or creating gems, adding tools, and tweaking rake tasks and such before you want to either "capture" these changes to be easily shared or blow them away and start fresh from a known, working base.

We have found that being able to easily reproduce shared dev environments has added a ton of value to our workflow. So let's dive in and explore what exactly this repo provides.

What's in the box?

I've tried to make sure the current readme describes the details of this box, what it contains, and how to use it. So please feel free to review that. Here is a high level list of what the chef_workstation repo provides:

  • chefdk
  • docker
  • nano text editor
  • git
  • squid proxy cache
  • SSH agent forwarding
  • generated knife.rb
  • Rakefile with tasks for cookbook testing
  • Support for VirtualBox, Parallels, Hyper-V and VMWare
  • Secret energizing ingredient guaranteed to infuse your soul with the spirit of automation!!

The repo contains a Vagrantfile at the root that builds an Ubuntu 12.04 image and sets up all of the above. I've been updating this repo in preparation for this post and I can say there is a fair amount that goes into getting this setup just right. Some of it seems trivial to me today but when I was just getting used to chef and linux a year ago, some of this seemed very mysterious then.

We use vagrant to develop but not for testing

Let's get this out of the way right now. On the automation team at CenturyLink Cloud, we use docker to test almost all linux based cookbooks and a custom vsphere kitchen driver for everything else. We don't use kitchen-vagrant. I do use kitchen-vagrant often on personal projects, but it just does not make sense at work. We ARE a cloud, so spinning up our own cloud resources needs to be central to our testing. It's also simpler for us to manage test instances in our own large scale capacity cloud than on a much thinner build agent's hard drive.

So if you primarily use kitchen-vagrant for testing, you may not benefit from all the features here, but you may want to stick around before you run off. There is more here than just a test-kitchen environment, and if you work mostly with linux infrastructure, I can't stress enough the benefits of containers to testing productivity and decreased feedback cycles. This workstation aims to reduce the friction of setting this up repeatably.

Cache all the things!

One feature of this environment that is very popular is its use of squid and the vagrant-cachier plugin. The first time one runs vagrant up with this environment, it may take around 10 minutes to run, give or take a few minutes. This setup caches most internet downloaded artifacts on the host so that all subsequent vagrant ups will build much more quickly.

This feature is not available to windows hosts but if you run a mac or a linux desktop, this works wonders to bring down box build times.

Standardize on a base tools install while maintaining freedom of dev tools

I've historically preferred not to work in VMs as a dev environment. I either need to make sure all my "stuff" is installed there or cope with an environment that has minimal tooling. For those who use vagrant, you know this is where it shines. It syncs folders seamlessly across guest and host regardless of differing operating systems so I can dev on the same files in my favorite text editor or IDE on the host and then run those bits immediately in the shared guest environment.

This environment standardizes on Ubuntu 12.04 as a guest for a few reasons. It's used on many of our production servers, and the popular hashicorp/precise64 vagrant box image has providers for VirtualBox, Hyper-V, VMWare Fusion and Parallels. This makes it easy to share among any member of our org. I don't think there is anyone who cannot accommodate one of these hypervisor configurations. We don't have to tell anyone that their OS is not welcome.

Decoupling the "shell" from the "core"

This repository is intended to serve as one's "master" chef cookbook repository but it allows one to keep individual cookbooks in separate repos or in a single repo that is separate from this one. This repository contains only one cookbook which actually builds the environment. Any additional cookbook added to the cookbooks directory will be gitignored and should not be maintained in this repo. Simply clone your cookbooks into the cookbooks directory to be maintained separately. As long as they are in the cookbooks directory, they will be included in the vagrant folder syncing and immediately available in your vagrant guest.

Common rake tasks

The readme goes into the specifics of how to call all of the available tasks. I removed quite a lot from this Rakefile that was specific to our own CenturyLink Rakefile and tied into our internal CI/CD pipeline, with tasks for syncing dependencies, bumping versions, promoting cookbooks to different environments, etc. However, what remains are likely tasks that most would expect to be present in any basic chef repo.

The tasks can work on a single cookbook or all cookbooks in the cookbooks directory in a single task invocation. The kitchen task also allows you to provide your own cookbook level Rakefile if you need to "orchestrate" the test suites for more complex test behavior.

Chef server friendly

The idea here is that if you work with a chef server, upon bringing up this environment, knife commands should "just work" with your account with no gymnastics or secret winking patterns (not that I think secret winking patterns are an anti-pattern -- I'm a huge fan!).

The chef_workstation cookbook generates a knife.rb in the .chef directory of the repo, and only if no other knife.rb file already exists there. By default, the username of the logged in user on the host is used and the chef server url of the publicly hosted chef server is included, but it is missing the org in the path. Both the user and server url are configurable in the Vagrantfile. Then it is up to the user to add their .pem file to the top level .chef directory with the same name as the user. The entire .chef directory is gitignored so you can add keys without fear of them being added to source control.

A brief walk through of building the box and testing with docker

Getting set up

See the readme for the exact prerequisites and install instructions. Chances are high you already have them. At the very least you want vagrant and one of the supported hypervisors.

Unless you are on windows, I strongly suggest installing vagrant-cachier, but it's completely optional. Finally, clone the chef_workstation repo.

Make it your own

This wisdom applies equally to American Idol performances and customized chef workstation vagrant boxes. Edit the Vagrantfile chef.json property that will inject the node attributes into the chef_workstation cookbook that provisions the box. This can include chef server user name and url, gems you want added that are not in the default installed gems, or additional deb packages you want included. All the possible attributes are mentioned here.

Also add your own cookbooks that you intend to work with to the cookbooks directory. So far, everything we have mentioned in the walk through only has to be done once.

For this walk through, I will add the same cookbook from a couple of posts ago that covered multi node testing with test-kitchen and docker. It's a fork of the couchbase cookbook that can add nodes to a cluster.

cd cookbooks
git clone https://github.com/mwrock/couchbase -b multi-node


Build the box

Now run:

vagrant up

If you are using a hypervisor other than virtualbox, make sure to either:

  • Set the VAGRANT_DEFAULT_PROVIDER environment variable to the provider you want to use
  • Specify it in the --provider argument to vagrant up

In a few minutes the box should be ready.

Add your server key if using a server

If you use a chef server, copy your private key used to authenticate to the .chef directory in the root of the repo. The box provisioning should have created this directory. This file should be named {user_name}.pem. You may of course edit the knife.rb file if you prefer to use a different name. Once generated, the knife.rb file will never be overwritten when provisioning the box, and as mentioned previously, it is in the .gitignore file and thus excluded from source control.

Log on to the box and navigate to the couchbase cookbook

vagrant ssh
cd cookbooks/couchbase

Test a kitchen suite

We won't go through the whole multi node test cycle. Refer to my post on that topic if you are interested in running through it - in this environment you can, without a problem.

First let's run our chefspecs just to make sure unit tests are happy:

vagrant /chef-repo $ rake chefspec[couchbase]

...
2 deprecation warnings total

Finished in 0.90515 seconds (files took 2.34 seconds to load)
243 examples, 0 failures

vagrant /chef-repo $

Looks good. Now let's see what kitchen suites we have available. There are quite a few. Here is a portion of the suites at the top of the list:

vagrant /chef-repo/cookbooks/couchbase $ kitchen list
Instance                          Driver   Provisioner  Verifier  Transport  Last Action
server-community-debian-76        Docker   Nodes        Busser    Ssh        <Not Created>
server-community-ubuntu-1204      Docker   Nodes        Busser    Ssh        <Not Created>
server-community-centos-65        Docker   Nodes        Busser    Ssh        <Not Created>
server-community-windows-2012R2   Vagrant  Nodes        Busser    Winrm      <Not Created>
second-node-debian-76             Docker   Nodes        Busser    Ssh        <Not Created>
second-node-ubuntu-1204           Docker   Nodes        Busser    Ssh        <Not Created>

We will run the server-community-ubuntu-1204 suite:

kitchen verify server-community-ubuntu-1204

You will likely see a lot of text, and this may take several minutes to complete, at least the first time, as the base ubuntu docker image is downloaded. If you are using vagrant-cachier, the next time will be much faster. Hopefully this should end in a successful converge and serverspec tests...yup:

       Finished in 0.13151 seconds (files took 0.54705 seconds to load)
       13 examples, 0 failures

       Finished verifying <server-community-ubuntu-1204> (0m5.89s).
-----> Kitchen is finished. (5m1.32s)

Add the cookbook to our server

First we'll do a berks vendor to grab all dependencies:

vagrant /chef-repo/cookbooks/couchbase $ berks vendor /tmp/couchbase-vendor
Resolving cookbook dependencies...
Fetching 'couchbase' from source at .
Fetching 'couchbase-tests' from source at test/integration/cookbooks/couchbase-tests
Using apt (2.7.0)
Using chef-sugar (3.1.0)
Using chef_handler (1.1.6)
Using couchbase (1.3.0) from source at .
Using couchbase-tests (0.1.0) from source at test/integration/cookbooks/couchbase-tests
Using export-node (1.0.1)
Using minitest-handler (1.3.2)
Using openssl (4.0.0)
Using windows (1.36.6)
Using yum (3.6.0)
Vendoring apt (2.7.0) to /tmp/couchbase-vendor/apt
Vendoring chef-sugar (3.1.0) to /tmp/couchbase-vendor/chef-sugar
Vendoring chef_handler (1.1.6) to /tmp/couchbase-vendor/chef_handler
Vendoring couchbase (1.3.0) to /tmp/couchbase-vendor/couchbase
Vendoring couchbase-tests (0.1.0) to /tmp/couchbase-vendor/couchbase-tests
Vendoring export-node (1.0.1) to /tmp/couchbase-vendor/export-node
Vendoring minitest-handler (1.3.2) to /tmp/couchbase-vendor/minitest-handler
Vendoring openssl (4.0.0) to /tmp/couchbase-vendor/openssl
Vendoring windows (1.36.6) to /tmp/couchbase-vendor/windows
Vendoring yum (3.6.0) to /tmp/couchbase-vendor/yum

Now we will upload to our server. Mine is my personal, free hosted chef server:

vagrant /chef-repo/cookbooks/couchbase $ knife cookbook upload -a -o /tmp/couchbase-vendor
Uploading apt            [2.7.0]
Uploading chef-sugar     [3.1.0]
Uploading chef_handler   [1.1.6]
Uploading couchbase      [1.3.0]
Uploading couchbase-tests [0.1.0]
Uploading export-node    [1.0.1]
Uploading minitest-handler [1.3.2]
Uploading openssl        [4.0.0]
Uploading windows        [1.36.6]
Uploading yum            [3.6.0]
Uploaded all cookbooks.

Rinse and repeat

Now I am free to destroy this box, and my synced repo folder will remain intact. We do this all the time. Boxes get stale as other team members are revving shared internal gems, or you are experimenting with alternate configurations and want to just start over.

There is something to be said for the freedom that comes with knowing you can walk on a tight rope since there is a net just beneath you to catch your fall (except when there's not).

Multi node Test-Kitchen tests and working with Vagrant NAT addressing with VirtualBox by Matt Wrock

This is a gnat. Let the observer note that it is different from NAT

This is now the third post I have written on creating multi node test-kitchen tests. The first covered a windows scenario building an active directory controller pair and the last one covered a multi docker container couchbase cluster using the kitchen-docker driver. This will likely be the last post in this series and will cover perhaps the most commonly used test-kitchen setup using the kitchen-vagrant driver with virtualbox.

In one sense there is nothing particularly special about this setup, and all of the same tests demonstrated in the first two posts can be run through vagrant on virtualbox. In fact, that is exactly what the first post used for the active directory controllers, although it also supports Hyper-V. However, there is an interesting problem with this setup that the first post was lucky enough to avoid. If you were to switch from the docker driver to the vagrant driver in the second post that built a couchbase cluster in docker containers, you would notice a recipe at the top of the runlist in each suite: couchbase-tests::ipaddress.

Getting nodes to talk to one another

We will soon get to the purpose behind that recipe, but first I'll lay out the problem. When you use the kitchen-vagrant driver to build test instances with the virtualbox provider without configuring any networking properties on the driver, your instance will have a single interface with an ip address of 127.0.0.1. It's gonna be difficult to do any kind of multi node testing with this. For one thing, if you have two nodes, they will not be able to talk to each other over these interfaces. From the outside, the host can talk to these nodes using its own localhost address and the forwarded ports. But to another node? They are dead to each other.

The trick to get them to be accessible to one another is to add an additional network interface to the nodes by adding a network in the kitchen.yml:

    driver:
      network:
        - ["private_network", { type: "dhcp" }]

Now the nodes will have a dhcp assigned ip address that both the host and each node can use to access the other.

One could then use my kitchen-nodes provisioner that derives from the chef-zero provisioner so that normal chef searches can find the other nodes and access their ip addresses.

Just pull down the gem:

gem install kitchen-nodes

Add it to your .kitchen.yml:

provisioner:
  name: nodes

Now chef searches from inside your nodes can find one another as long as both nodes have been created:

other_ip = search(:node, "run_list:*couchbase??master*")[0]['ipaddress']

Missing the externally reachable ip address in ohai

While this allows the nodes to see one another, one may be surprised when inspecting a node's own ip address from inside that node.

ip = node['ipaddress'] # will be 127.0.0.1

The ip will be the localhost ip and not the same ip address that the other node will see. This may be fine in many scenarios, but for others you may need to know the externally reachable ip. You would like the ohai attributes to expose the second NIC's address and not the one belonging to the localhost interface.

This was the wall I hit in my kitchen-docker post: I was registering couchbase nodes in a couchbase cluster, the cluster could not add multiple nodes with the same ip (127.0.0.1), and each node needed to register an ip that the master node could use to reach it.

I googled for a while to see how others dealt with this scenario. I did not find much, but what I did find were posts explaining how to create a custom ohai plugin to expose the right ip. Most posts contained just a fraction of the information needed to create the plugin, and once I did manage to assemble all of it, the plugin honestly felt like quite a bit of ceremony for such a simple assignment.

Overwriting the 'automatic' attribute

So I thought that instead of using an ohai plugin, I'd find the second interface's address in the ohai attributes and then set the ['automatic']['ipaddress'] attribute with that reachable ip. This seemed to work just fine, and as long as it's done at the front of the chef run, any call to node['ipaddress'] later in the run returns the desired address.

Here is the full recipe that sets the correct ipaddress attribute:

kernel = node['kernel']
auto_node = node.automatic

# linux
if node["virtualization"] && node["virtualization"]["system"] == "vbox"
  interface = node["network"]["interfaces"].select do |key|
    puts key
    key == "eth1"
  end
  unless interface.empty?
    interface["eth1"]["addresses"].each do |ip, params|
      if params['family'] == ('inet')
        auto_node["ipaddress"] = ip
      end
    end
  end
  
# windows

elsif kernel && kernel["cs_info"] && kernel["cs_info"]["model"] == "VirtualBox"
  interfaces = node["network"]["interfaces"]
  interface_key = interfaces.keys.last
  auto_node["ipaddress"] = interfaces[interface_key]["configuration"]["ip_address"][0]
end

First, this is not a lot of code and it's confined to a single file. It seems a lot simpler than the wire-up required for an ohai plugin.

This is designed to work on both windows and linux. I have only used it on windows 2012R2 and ubuntu, but it's likely fairly universal. The code only sets the ip if virtualbox is the host hypervisor, so you can safely include the recipe in multi-driver suites, and other non virtualbox environments will simply ignore it.

Get it in the hurry-up-and-test-cookbook

In case others would like to use this, I have included this recipe in a new cookbook I am using to collect recipes helpful in testing cookbooks. This cookbook is called hurry-up-and-test and is available on the chef supermarket. It also includes the export-node recipe I have shown in a couple of posts, which allows one to access a node's attributes from inside test code.

I hope others find this useful and I'd love to hear if anyone thinks there is a reason to package this as a full blown ohai plugin instead.

Now what are you waiting for? Hurry up and test!!