Multi node Test-Kitchen tests and working with Vagrant NAT addressing with VirtualBox / May 1, 2015 by Matt Wrock

This is a gnat. Let the observer note that it is different from NAT

This is now the third post I have written on creating multi node test-kitchen tests. The first covered a windows scenario building an active directory controller pair and the last one covered a multi docker container couchbase cluster using the kitchen-docker driver. This will likely be the last post in this series and will cover perhaps the most commonly used test-kitchen setup using the kitchen-vagrant driver with virtualbox.

In one sense there is nothing particularly special about this setup and all of the same tests demonstrated in the first two posts can be run through vagrant on virtualbox. In fact that is exactly what the first post used for the active directory controllers although it also supports Hyper-V. However there is an interesting problem with this setup that the first post was lucky enough to avoid. It surfaces if you switch from the docker driver to the vagrant driver in the second post, which built a couchbase cluster in docker containers. In that post you may have noticed a recipe at the top of the runlist in each suite: couchbase-tests::ipaddress.

Getting nodes to talk to one another

We will soon get to the purpose behind that recipe, but first I'll lay out the problem. When you use the kitchen-vagrant driver to build test instances with the virtualbox provider without configuring any networking properties on the driver, your instance will have a single interface with an ip address of 10.0.2.15, the VirtualBox NAT default. It's gonna be difficult to do any kind of multi node testing with this. For one thing if you have two nodes, they will both have that same address and will not be able to talk to each other over these interfaces. From the outside the host can talk to these nodes using its own localhost address and over the forwarded ports. But to another node? They are dead to each other.

The trick to get them to be accessible to one another is to add an additional network interface to the nodes by adding a network in the kitchen.yml:

    driver:
      network:
        - ["private_network", { type: "dhcp" }]

Now the nodes will have a dhcp assigned ip address that both the host and each node can use to access the other.
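To make the shape of that network entry concrete, here is a small Ruby sketch that parses the same YAML and prints the Vagrantfile statement it roughly corresponds to. This is only an illustration: the to_vagrant_line and vagrant_network_lines helpers are hypothetical names of mine, and kitchen-vagrant renders its Vagrantfile through its own template rather than code like this.

```ruby
require 'yaml'

KITCHEN_YML = <<~YML
  driver:
    network:
      - ["private_network", { type: "dhcp" }]
YML

# Hypothetical helper: turn one network entry (a two element array of
# network kind and options hash) into a Vagrantfile-style statement.
def to_vagrant_line(entry)
  kind, options = entry
  opts = options.map { |key, value| "#{key}: #{value.inspect}" }.join(", ")
  "config.vm.network #{kind.inspect}, #{opts}"
end

# Hypothetical helper: read the driver's network list out of the YAML text.
def vagrant_network_lines(yaml_text)
  config = YAML.safe_load(yaml_text)
  config.fetch("driver").fetch("network").map { |entry| to_vagrant_line(entry) }
end

puts vagrant_network_lines(KITCHEN_YML)
```

Each entry in the list is just the argument list you would otherwise pass to vagrant's network setting, so adding a second entry would add a second extra interface.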

One could then use my kitchen-nodes provisioner that derives from the chef-zero provisioner so that normal chef searches can find the other nodes and access their ip addresses.

Just pull down the gem:

gem install kitchen-nodes

Add it to your .kitchen.yml:

provisioner:
  name: nodes

Now chef searches from inside your nodes can find one another as long as both nodes have been created:

other_ip = search(:node, "run_list:*couchbase??master*")[0]['ipaddress']

Missing the externally reachable ip address in ohai

While this allows the nodes to see one another, one may be surprised if they inspect a node's own ip address from inside that node.

ip = node['ipaddress'] # will be 10.0.2.15, the VirtualBox NAT address

The ip will be the VirtualBox NAT address (10.0.2.15 by default) and not the ip address that the other node will see. This may be fine in many scenarios, but for others you may need to know the externally reachable ip. You would like the ohai attributes to expose the second NIC's address and not the one belonging to the NAT interface.

This was the wall I hit in my kitchen-docker post because I was registering couchbase nodes in a couchbase cluster. The cluster could not add multiple nodes with the same ip (10.0.2.15), and each node needed to register an ip that the master node could use to reach it.

I googled for a while to see how others dealt with this scenario. I did not find much but what I did find were posts explaining how to create a custom ohai plugin to expose the right ip. Most posts really contained just a fraction of the information needed to create the plugin and once I did manage to assemble all the information needed to create the plugin, it honestly felt like quite a bit of ceremony for such a simple assignment.

Overwriting the 'automatic' attribute

So I thought instead of using an ohai plugin I'd find the second interface's address in the ohai attributes and then set the ['automatic']['ipaddress'] attribute with that reachable ip. This seemed to work just fine and as long as it's done at the front of the chef run, any subsequent call to node['ipaddress'] in the run would return the desired address.

Here is the full recipe that sets the correct ipaddress attribute:

kernel = node['kernel']
auto_node = node.automatic

# linux
if node["virtualization"] && node["virtualization"]["system"] == "vbox"
  # keep only the second NIC; eth0 is the NAT interface
  interface = node["network"]["interfaces"].select do |name, _data|
    name == "eth1"
  end
  unless interface.empty?
    interface["eth1"]["addresses"].each do |ip, params|
      if params['family'] == ('inet')
        auto_node["ipaddress"] = ip
      end
    end
  end
  
# windows

elsif kernel && kernel["cs_info"] && kernel["cs_info"]["model"] == "VirtualBox"
  interfaces = node["network"]["interfaces"]
  interface_key = interfaces.keys.last
  auto_node["ipaddress"] = interfaces[interface_key]["configuration"]["ip_address"][0]
end

First, this is not a lot of code and it's confined to a single file. Seems a lot simpler than the wire-up required in an ohai plugin.

This is designed to work on both windows and linux. I have only used it on windows 2012R2 and ubuntu but it's likely fairly universal. The code only sets the ip if virtualbox is the host hypervisor so you can safely include the recipe in multi-driver suites and other non virtualbox environments will simply ignore it.
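If you want to poke at the linux branch outside of a chef run, the lookup can be sketched as a plain Ruby method over a hash shaped like ohai's network data. The reachable_ipv4 name and all of the sample addresses are made up for illustration, and unlike the recipe (which keeps the last inet address it sees) this sketch returns the first one.

```ruby
# Plain-Ruby sketch of the linux lookup. `network` mimics the shape of
# node["network"] from ohai; nothing here touches chef itself.
def reachable_ipv4(network, interface_name = "eth1")
  interface = network.fetch("interfaces", {})[interface_name]
  return nil if interface.nil?

  # addresses is keyed by the address, with the family stored in the value
  match = interface.fetch("addresses", {}).find do |_ip, params|
    params["family"] == "inet"
  end
  match && match.first
end

# Made-up sample data: a NAT interface plus a second host-only interface.
sample_network = {
  "interfaces" => {
    "lo"   => { "addresses" => { "127.0.0.1" => { "family" => "inet" } } },
    "eth0" => { "addresses" => { "10.0.2.15" => { "family" => "inet" } } },
    "eth1" => { "addresses" => {
      "08:00:27:AA:BB:CC"        => { "family" => "lladdr" },
      "172.28.128.3"             => { "family" => "inet" },
      "fe80::a00:27ff:feaa:bbcc" => { "family" => "inet6" }
    } }
  }
}

# Prints the address found on eth1
puts reachable_ipv4(sample_network)
# Prints nil because there is no such interface
puts reachable_ipv4(sample_network, "eth9").inspect
```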

Get it in the hurry-up-and-test-cookbook

In case others would like to use this I have included this recipe in a new cookbook I am using to collect recipes helpful in testing cookbooks. This cookbook is called hurry-up-and-test and is available on the chef supermarket. It also includes the export-node recipe I have shown in a couple posts that allows one to access a node's attributes from inside test code.

I hope others find this useful and I'd love to hear if anyone thinks there is a reason to package this as a full blown ohai plugin instead.

Now what are you waiting for? Hurry up and test!!

Lamentations of the OSS consumer: I'd like to read the $@#%ing manual but no one has written it / April 24, 2015 by Matt Wrock

I've been an active open source consumer and contributor for the past few years and overall being involved in both roles has been one of the peak experiences of my career. I only wish I had discovered open source much much sooner. However it's not all roses, and things can be rough on both sides of the pull request. That is especially true for those new to these ecosystems and even more so if you come from a heritage of tools and culture not originally friendly to open source.

Yesterday I received a great email from someone asking about how to navigate a very popular infrastructure testing project where the documentation can be sparse in some respects. The project is ServerSpec - a really fantastic tool that many gain value from every day. The question comes from someone new to the chef community and ServerSpec is a key tool used in the chef ecosystem. The questioner immediately won my respect. They were curious but not bitter (at least did not admit to being so) and wanted to know how to learn more and start to contribute and get to a point of writing more creative tests.

This inspired me because I love interacting with people who are passionate about this craft and who, like myself, want to learn and improve themselves. It also struck a nerve since I have a lot of opinions about approaching OSS projects and empathy for those new to the playing field and perhaps feeling a bit awkward. This individual, like myself, comes from a windows background, so I think I have some insight into where they are coming from.

I thought it might be interesting to transform my responses to a blog post. Here are some modified excerpts from my replies.

When Windows is the edge case

I think one issue that windows suffers from in this ecosystem is that it is the "edge case". The vast majority of those using, testing and contributing to this project are linux users. So when a minor version bump occurs and the PR notes explicitly call out that it won't break current tests, clearly that's evidence that windows was not tested. One can argue that windows is just not much of a player here (but that's of course changing).

I'd look at this differently if this was code in a chef owned project where I would expect them to be paying more attention to windows. Regarding ServerSpec, a wholly open source project with no funding and shepherded by a community member who has a full time job that is not ServerSpec, I tend to be more forgiving but it can definitely make for a frustrating development experience at times.

I'm really hoping that more windows folks get involved and contribute more in this ecosystem both with code and documentation and also just filing issues, and I hope that their employers support them in these efforts. They stand to gain a lot in doing so.

There may be no manual but there are always the codes

One thing I have found in the ruby world and much of the OSS world outside of ruby is that sometimes the best way to figure something out is to read the source code. The obvious downside to this is that it's hard to read a language we may not be familiar with when we just want to write our test and move on with our lives.

So I find myself going back and forth. I may just do some quick “code spelunking” and not find anything that clearly points out how to do what I want and may take an uglier “brute force” approach. On other days depending on mood and barometric pressure in the office, I might be inclined to spend the time and dig deeper. It would be awesome if the authors took the time to spell out how to write custom matchers and resource types, but many OSS projects seem to lack this level of detailed documentation. I’m guessing because no one is paying them to and in the end they, like us, have a problem that needs solving and lack time to document.

One consolation is that these ruby libraries tend to be relatively small. Compare the ServerSpec code base including its sister project Specinfra to something like XUnit in C#. It's a lot less code. Of course it may take 3x longer to grok if you are a ruby beginner. What I often find is that given the motivation to learn and be more proficient, you eventually reach a point of minimal comfort with the codebase where you get it enough to see what needs to be added to get the thing to do what you want it to do and that’s when you start making contributions.

Heh. I totally have a weird love hate relationship with this stuff. There are days when I curse these libraries because I just want to do something that seems so simple and I really have no desire or time to make an investment and then there are other times when I am totally into the code and loving the sense that I am gaining an understanding of new patterns and coding constructs and realize I’m gaining some knowledge where I can not only make the code better for me but for others as well.

In the end it's all just a constant slog through the marshes of learning and as software engineers, that’s our sweet spot. The ability to live in a state of learning and not so much bask in what we have learned.

Multi node testing with Test-Kitchen and Docker containers / April 21, 2015 by Matt Wrock

Two docker containers created and tested with Kitchen-Docker

My last post provided a walk through of some of the new Windows functionality available in the latest Test-Kitchen RC and demonstrated those features by creating and testing a Windows Active Directory domain controller pair. This post will also be looking at testing multiple nodes but instead of windows, I'll be spinning up multiple docker containers. I'm going to be using a Couchbase cluster as my example. Note that while I am using docker containers, there is nothing special happening here preventing one from running the same tests on multiple linux or windows VMs using the Kitchen-Vagrant driver. Couchbase runs on windows too.

Why run tests with containers when my production nodes are VMs?

There are some really interesting things being done with containers in production environments but even if you are not using containers in production, there are some clear benefits to using them for testing infrastructure development. The biggest value is faster provisioning. Using the kitchen-docker driver over vagrant or another cloud based driver can potentially save several minutes per test. You might wonder "what's a couple minutes?" However, when you are iterating over a problem and need to reprovision several times, a couple minutes or more can add up quick.

You still want to test provisioning to VMs if that is what your production infrastructure runs, but that can sit later in your testing pipeline. You will save a lot of time, money and tears (you'll need those later) by keeping your feedback cycles short early in your development process.

Setting things up

To get started you will need to have the docker engine installed and the latest RC of test-kitchen.

Docker Install

There are a few approaches one can take to installing docker. Some are more complicated than others and really depend on your host operating system. I'm using an Ubuntu 14.04 desktop os on my laptop. Ubuntu 14.04 has no prerequisites and you simply run:

wget -qO- https://get.docker.com/ | sh

Ubuntu 12.04 requires a kernel upgrade and several packages before the above install will work. The docker installation documentation provides instructions for most operating systems. If you are running windows or a mac, you will want to run the docker engine from inside a linux vm. You can either set up a vm of your favorite linux distro and then install docker following the instructions on the docker site or you can install Boot2Docker which will install a local docker CLI, VirtualBox, and a stripped down, tiny core linux image.

This post is not aimed to explore the different ways of installing docker. If you do not already have docker or a vm setup from which you can install it friction-free, take a look at my chef_workstation repo that includes a Vagrantfile that will provision a workable chef enabled workstation environment with docker installed. It should work with VirtualBox, Hyper-V or Parallels on a mac. I believe it also works for VMWare Fusion users but I have not validated that for a while.

A multi-node enabled cookbook to test

To demonstrate multi node testing with test-kitchen, I have forked the community couchbase cookbook. I'll be sending a PR with these changes:

Compatibility with docker (current version uses netstat to validate a listening port and that's not installed on the default ubuntu container)
Extends the couchbase-cluster resource to allow other nodes to be joined to a cluster
Fixes the cookbook on windows which is unrelated to this post but aligns well with one of my personal missions in life

Clone my fork and checkout the multi-node branch:

git clone -b multi-node https://github.com/mwrock/couchbase

If you are using the vagrant box in my chef_workstation repo, cd to the cookbooks directory just below the directory you land in from vagrant ssh and clone from there.

Using the right gems

To help facilitate testing multiple nodes, this cookbook uses a custom test-kitchen provisioner plugin that utilizes functionality exposed in the latest test-kitchen RC. So the cookbook includes a Gemfile that references both of these gems and other important dependencies. To ensure that you are testing with all of the correct gems, cd into the root of the couchbase cookbook and run:

bundle install

Converge and test the first node

We are now ready to create, converge and test the first node of our couchbase cluster. Make sure to run with bundle exec so that we use all of the correct gem versions:

bundle exec kitchen verify server-community-ubuntu

This will start a new container running ubuntu 12.04, install Couchbase and initialize a new cluster. Then a serverspec test will ensure that the service is running and configured the way we want it.

Joining an additional node to the cluster

To get the full multi-node effect, let's now ask test-kitchen to run our second-node suite:

bundle exec kitchen verify second-node-ubuntu

This brings up a new container that will post to the couchbase rest endpoint of our first node asking to join the cluster. Then its serverspec test will pull the list of nodes in the cluster exposed from the original node and check if our second node is included in the list.

Discovering the original node

One possible strategy could be to set an attribute specifying the IP or host name of the initiating couchbase node. However this assumes it is a known and constant value. You may prefer your infrastructure to dynamically query for an existing couchbase node. In our test scenario, we really can't predict the ip or host name since we are getting IPs from DHCP and docker is handing out a unique hash for a host name.

Note that we could tweak the driver configuration in our .kitchen.yml to expose predictable hostnames that can link to other containers. Here is an example of a possible config for our node suites:

suites:
- name: server-community
  driver:
    publish_all: true
    instance_name: first_cluster
  run_list:
  - recipe[couchbase::server]
  attributes:
    couchbase:
      server:
        password: "whatever"

- name: second-node
  driver:
    links: "first_cluster:first_cluster"
  run_list:
  - recipe[couchbase-tests::default]
  attributes:
    couchbase:
      server:
        password: "whatever"
        cluster_to_join: first_cluster

Here the first node uses the kitchen-docker configuration to ask the docker engine to expose its container with a specific name "first_cluster." The second node is asked to link the name "first_cluster" with the "first_cluster" instance. This way any requests from the second container to the DNS name first_cluster will resolve to our first container. Finally we would create a node attribute named cluster_to_join that names the cluster our second node would ask to join.

This may work for your scenario and that's great. However it may break down for others. First it's not very portable. This cookbook supports windows and locking in docker specific options will run into problems for windows tests that leverage vagrant here:

- name: windows-2012R2
  driver:
    name: vagrant
    network:
      - ["private_network", { type: "dhcp" }]
  transport:
    name: winrm
  driver_config:
    gui: true
    box: mwrock/Windows2012R2Full
    customize:
      memory: 1024

Furthermore, our test logic needs to match production logic. If production nodes will be querying the chef server for a node to send cluster join requests to, our tests must validate that this strategy works.

The kitchen-nodes provisioner plugin

In my last post I demonstrated a strategy that uses chef search to find a chef node based on a run list recipe. It used my kitchen-nodes provisioner plugin to create mock chef nodes of each kitchen suite so that a chef search can find other suite test instances during convergences. Since that example was creating a windows active directory controller pair, the plugin had some windows specific functionality. I have since extended it to support most *Nix scenarios including docker.

First we tell test-kitchen to use the kitchen-nodes plugin as a provisioner for the suites that test our couchbase servers:

suites:
- name: server-community
  provisioner:
    name: nodes
  run_list:
  - recipe[couchbase-tests::ipaddress]
  - recipe[couchbase::server]
  - recipe[export-node]
  attributes:
    couchbase:
      server:
        password: "whatever"

- name: second-node
  provisioner:
    name: nodes
  run_list:
  - recipe[couchbase-tests::ipaddress]
  - recipe[couchbase-tests::default]
  - recipe[export-node]
  attributes:
    couchbase:
      server:
        password: "whatever"

The default recipe of the couchbase-tests cookbook used by our second node can now find the first node using chef search:

primary = search_for_nodes("run_list:*couchbase??server* AND platform:#{node['platform']}")
node.normal["couchbase-tests"]["primary_ip"] = primary[0]['ipaddress']

The search_for_nodes method is defined in our couchbase-tests library:

require 'timeout'

def search_for_nodes(query, timeout = 120)
  nodes = []
  Timeout::timeout(timeout) do
    nodes = search(:node, query)
    until  nodes.count > 0 && nodes[0].has_key?('ipaddress')
      sleep 5
      nodes = search(:node, query)
    end
  end

  if nodes.count == 0 || !nodes[0].has_key?('ipaddress')
    raise "Unable to find nodes!"
  end

  nodes
end

Here we are using a chef search to find a node that includes the couchbase server recipe and has the same os platform as the current node. Matching on platform is important if our .kitchen.yml is designed to test more than one platform like ours.
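The retry loop in search_for_nodes is a general "poll until something shows up or give up" pattern. Here is a minimal standalone sketch of it with the chef search swapped for a block, so it can run anywhere. The poll_until_found name, its keyword arguments and the fake search are assumptions of mine, not part of the cookbook. One difference from the helper above: this version turns the timeout into a plain error message instead of letting Timeout::Error bubble up.

```ruby
require 'timeout'

# Keep calling the block until it returns something non-empty, or give
# up after `limit` seconds. `interval` is the pause between attempts.
def poll_until_found(limit: 120, interval: 5)
  Timeout.timeout(limit) do
    loop do
      result = yield
      return result unless result.nil? || result.empty?
      sleep interval
    end
  end
rescue Timeout::Error
  raise "Nothing found after #{limit} seconds"
end

# Stand-in for a chef search that only succeeds on the third attempt.
attempts = 0
nodes = poll_until_found(limit: 5, interval: 0.01) do
  attempts += 1
  attempts < 3 ? [] : [{ "ipaddress" => "172.28.128.3" }]
end

puts "found #{nodes.first['ipaddress']} after #{attempts} attempts"
```

The polling matters because the second suite may start converging before the first node's data is searchable.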

Chef-zero and chef search

The kitchen-nodes plugin derives from the chef-zero test-kitchen provisioner. Using chef-zero we can issue a chef search for nodes without being hooked up to a real chef server. Chef-zero accomplishes this by storing information on each node in a json file stored in its nodes folder. The test-kitchen chef-zero provisioner wires all of this up by copying all files under test/integration/nodes to {test-kitchen temp folder on test instance}/nodes. So you can create a json file for each test suite in your local nodes folder and then chef search calls will effectively treat the nodes files as the master chef server database.
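To get a feel for how a folder of node files can answer a run_list search, here is a rough Ruby stand-in. It is emphatically not how chef-zero evaluates queries: File.fnmatch is only borrowed to approximate the * and ? wildcards, and find_nodes_by_run_list is a hypothetical helper. Note how the ?? in the pattern stands in for the :: of the recipe name.

```ruby
require 'json'
require 'tmpdir'

# Load every node file in the folder and keep the nodes whose run_list
# has at least one item matching the wildcard pattern.
def find_nodes_by_run_list(nodes_dir, pattern)
  Dir.glob(File.join(nodes_dir, "*.json")).sort
     .map { |path| JSON.parse(File.read(path)) }
     .select { |node| node["run_list"].any? { |item| File.fnmatch(pattern, item) } }
end

Dir.mktmpdir do |dir|
  # Two made-up node files, one per suite
  suites = {
    "server-community-ubuntu-1204" => ["recipe[couchbase::server]"],
    "second-node-ubuntu-1204"      => ["recipe[couchbase-tests::default]"]
  }
  suites.each do |id, run_list|
    body = JSON.pretty_generate("id" => id, "run_list" => run_list)
    File.write(File.join(dir, "#{id}.json"), body)
  end

  matches = find_nodes_by_run_list(dir, "*couchbase??server*")
  puts matches.map { |node| node["id"] }.inspect
end
```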

The kitchen-nodes plugin automatically generates a node file when a test instance is provisioned by test-kitchen. Provisioning occurs at the very beginning of the converge operation. kitchen-nodes populates the node's json file with ip address, platform, and run list. Here are the two nodes' json files generated in my tests:

{
  "id": "server-community-ubuntu-1204",
  "automatic": {
    "ipaddress": "172.28.128.3",
    "platform": "ubuntu"
  },
  "run_list": [
    "recipe[couchbase-tests::ipaddress]",
    "recipe[couchbase::server]",
    "recipe[export-node]"
  ]
}

{
  "id": "second-node-ubuntu-1204",
  "automatic": {
    "ipaddress": "172.17.128.4",
    "platform": "ubuntu"
  },
  "run_list": [
    "recipe[apt]",
    "recipe[couchbase-tests::ipaddress]",
    "recipe[couchbase-tests::default]",
    "recipe[export-node]"
  ]
}

During provisioning, kitchen-nodes will either use SSH or WinRM depending on the test instance platform to interrogate its interfaces for an IP that is accessible to the host. On windows, this information is retrieved using a few powershell cmdlets and on *Nix instances either ifconfig or ip addr show is used depending on what is available on that distro. There may be several interfaces but kitchen-nodes will only choose an ipv4 ip that can be pinged from the host.
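The *Nix side of that selection can be sketched in a few lines of Ruby. This is my own simplified illustration and not the plugin's code: the sample command output and its addresses are invented, the method names are hypothetical, and the ping is replaced by a block so the sketch runs without touching the network.

```ruby
# Invented sample of `ip addr show` output, trimmed to the relevant lines.
SAMPLE_IP_ADDR = <<~OUT
  1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
      inet 127.0.0.1/8 scope host lo
      inet6 ::1/128 scope host
  2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP
      inet 10.0.2.15/24 brd 10.0.2.255 scope global eth0
  3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP
      inet 172.28.128.3/24 brd 172.28.128.255 scope global eth1
OUT

# Collect every non-loopback IPv4 address from the command output.
def ipv4_candidates(ip_addr_output)
  ip_addr_output.scan(/^\s*inet (\d+\.\d+\.\d+\.\d+)\//).flatten
                .reject { |ip| ip.start_with?("127.") }
end

# Return the first candidate for which the reachability check passes.
# The check is passed in as a block in place of a real ping.
def first_reachable(candidates)
  candidates.find { |ip| yield ip }
end

candidates = ipv4_candidates(SAMPLE_IP_ADDR)
# Pretend only the host-only network answers pings from the host.
chosen = first_reachable(candidates) { |ip| ip.start_with?("172.28.") }
puts chosen
```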

Testing that we joined the correct cluster

So how do we test that we actually found the correct node? We can't write a serverspec test using a hard coded IP. We use a testing recipe, export-node, that dumps the entire node object to a json file. Our test recipe run by the second node stores the primary node's IP in a node attribute as we saw further above.

Here is an instant replay:

node.normal["couchbase-tests"]["primary_ip"] = primary[0]['ipaddress']

So when the export-node recipe dumps the node data, that IP address will be included. Here is the test that validates the node join:

require 'json'
require 'net/http'

describe "cluster" do
  let(:node) { JSON.parse(IO.read(File.join(ENV["TEMP"] || "/tmp", "kitchen/chef_node.json"))) }
  let(:response) do
    resp = Net::HTTP.start node["normal"]["couchbase-tests"]["primary_ip"], 8091 do |http|
      request = Net::HTTP::Get.new "/pools/default"
      request.basic_auth "Administrator", "whatever"
      http.request request
    end
    JSON.parse(resp.body)
  end

  it "has found the primary node and it is not itself" do
    expect(node["normal"]["couchbase-tests"]["primary_ip"]).not_to eq(node['automatic']['ipaddress'])
  end

  it "has joined the primary cluster" do
    joined = false
    response['nodes'].each do |cluster_node|
      if cluster_node['hostname'] == "#{node['automatic']['ipaddress']}:8091"
        joined =  true
      end
    end

    expect(joined).to be true
  end
end

The export-node recipe dumps the node json to a file named chef_node.json in the kitchen temp folder. So our test pulls the ip that was returned by the chef search from there. It makes sure that this ip is in fact different from the node's own IP and then issues a couchbase API request to that node to return all nodes in its cluster. Our test passes as long as the second node is included in the returned node list.
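The membership check at the heart of that last test can be tried in isolation. The sketch below feeds it a trimmed-down, invented response body rather than a live /pools/default call; the cluster_member? helper is a hypothetical name and real responses carry many more fields than hostname.

```ruby
require 'json'

# Invented, trimmed-down stand-in for a /pools/default response body.
SAMPLE_BODY = <<~JSON_BODY
  {
    "nodes": [
      { "hostname": "172.28.128.3:8091" },
      { "hostname": "172.28.128.4:8091" }
    ]
  }
JSON_BODY

# True when the cluster lists a node at the given IP on the given port.
def cluster_member?(response, ip, port = 8091)
  response.fetch("nodes", []).any? { |n| n["hostname"] == "#{ip}:#{port}" }
end

response = JSON.parse(SAMPLE_BODY)
puts cluster_member?(response, "172.28.128.4")
puts cluster_member?(response, "172.28.128.9")
```

Using any? says the same thing as the loop with the joined flag, just in one expression.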

Testing all the things

I find it helpful and reassuring that I can include my node interactions in my tests. Test-Kitchen's coverage can indeed extend well beyond the boundaries of a single node.

Orchestrating multi node Windows tests in Test-Kitchen Beta! / March 29, 2015 by Matt Wrock

Primary and backup active directory controllers in their own kitchen.local domain

This week marks an important milestone in the effort to bring Test-Kitchen to windows. All the work that has gone into this effort over the past nine months has been merged into the master branch and prerelease gems have been cut for both the test-kitchen repo as well as kitchen-vagrant. So this post serves as another update to getting started with these new bits on windows (there are some changes) and expands on some of my previous posts by illustrating a multi node converge and test. The same technique can be applied to linux boxes as well but I'm going to demonstrate this with a cookbook that will build a windows primary and backup active directory domain controller pair.

Prerequisites

In order to make the cookbook tests in this post work, the following needs to be preinstalled:

A recent version of vagrant greater than 1.6 and I would strongly recommend even higher to account for various bug fixes and enhancements around windows.
Either VirtualBox or Hyper-V hypervisor
git
A ruby environment with bundler. If you don't have this, I strongly suggest installing the chefdk.
Enough local compute resources to run 2 windows VMs. I use a first generation lenovo X1 with an i7 processor, 8GB of ram, an SSD and build 10041 of the windows 10 technical preview. I have also run this on a second generation X1 running Ubuntu 14.04 with the same specs.

The great news is that now your host can be linux, mac, or windows.

Setup

Install the vagrant-winrm plugin. Assuming vagrant is installed, this will download and install vagrant-winrm:

vagrant plugin install vagrant-winrm

Clone my windows-ad-pair cookbook:

git clone https://github.com/mwrock/windows-ad-pair.git

At the root of the repo, bundle install the necessary gems:

bundle install

This will grab the necessary prerelease test-kitchen and kitchen-vagrant gems along with their dependencies. It will also grab kitchen-nodes, a kitchen provisioner plugin I will explain later.

Using Hyper-V

I use Hyper-V when testing on my windows laptop. Vagrant will use VirtualBox by default but can use many other virtualization providers. To force it to use hyper-v you can:

Add the provider option to your .kitchen.yml:

  driver_config:
    box: mwrock/Windows2012R2Full
    communicator: winrm
    vm_hostname: false
    provider: hyperv

Add an environment variable, VAGRANT_DEFAULT_PROVIDER, and assign it the value "hyperv".

I prefer the latter option since given a particular machine, I will want to always use the same provider and I want to keep my .kitchen.yml portable so I can share it with others regardless of their hypervisor preferences.

A possible vagrant/hyper-v bug?

I've been seeing intermittent crashes in powershell during machine create on the box used in this post. I had to create new boxes for this cookbook. One reason is that using the same image for multiple nodes in the same security domain required the box to be sysprepped to clean all SIDs (security identifiers) from the base images. This means that when vagrant creates the machine, there is at least one extra reboot involved and I think this may be confusing the hyperv plugin.

I have not dug into this but I have found that immediately reconverging after this crash consistently succeeds.

Converge and verify

Due to the sensitive nature of standing up an Active Directory controller pair (ya know...reboots), rather than calling kitchen directly, we are going to orchestrate with rake. We'll dive deeper into this later but to kick things off run:

bundle exec rake windows_ad_pair:integration

Now go grab some coffee.

...

No, no, no. I meant get in your car and drive to another town for coffee and then come back.

What just happened?

Test kitchen created a primary AD controller, rebooted it and then spun up a replica controller. All of this uses the winrm protocol that is native to windows so no SSH services needed to be downloaded and installed. This pair now manages a kitchen.local domain that you could actually join additional nodes to if you so choose.

Where are these windows boxes coming from?

These are evaluation copies of windows 2012R2. They will expire in a little under six months from the date of this post. They are not small and weigh in at about 5GB. I typically use smaller boxes where I strip away unused windows features but I needed several features to remain in this cookbook and it was easiest to just package new boxes without any features removed. I keep the boxes accessible on Hashicorp's Atlas site but the bytes live in Azure blob storage.

Is there an SSH server running on these instances?

No. Thanks to Salim Afiune's work, there is a new abstraction in the Test-Kitchen model, a Transport. The transport governs communication between test-kitchen on the host and the test instance. The methods defined by the transport handle authentication, transferring files, and executing commands. For those familiar with the vagrant model, the transport is the moral equivalent of the vagrant communicator. Test-kitchen 1.4 includes built in transports for winrm and ssh. I could imagine other transports such as a vmware vmomi transport that would leverage the vmware client tools on a guest.

How does the backup controller locate the primary controller?

One of the challenges of performing multi node tests with test-kitchen be they windows or not is orchestrating node communication without hard coding endpoint URIs into your cookbooks or having to populate attributes with these endpoints. Ideally you want nodes to discover one another based on some metadata. At CenturyLink we use chef search to find nodes operating under a specific runlist or role. In this cookbook, the backup controller issues a chef search for a node with the primary recipe in its runlist and then grabs its IP address from the ohai data.

primary = search_for_nodes("run_list:*windows_ad_pair??primary*")
primary_ip = primary[0]['ipaddress']

The key here is the search_for_nodes method found in this cookbook's library helper:

require 'timeout'

def search_for_nodes(query, timeout = 120)
  nodes = []
  Timeout::timeout(timeout) do
    nodes = search(:node, query)
    until  nodes.count > 0 && nodes[0].has_key?('ipaddress')
      sleep 5
      nodes = search(:node, query)
    end
  end

  if nodes.count == 0 || !nodes[0].has_key?('ipaddress')
    raise "Unable to find any nodes meeting the search criteria '#{query}'!"
  end

  nodes
end

Does Test-Kitchen host a chef-server?

Mmmmm....kind of. Test-kitchen supports both chef solo and chef zero provisioners. Chef zero supports a solo-like workflow allowing one to converge a node locally with no real chef server and also supports search functionality. This is facilitated by adding json files to a nodes directory underneath the test/integration folder.

The node files are named using the same suite and platform combination as the kitchen test instances. The contents of the node look like:

{
  "id": "backup-windows-2012R2",
  "automatic": {
    "ipaddress": "192.168.1.10"
  },
  "run_list": [
    "recipe[windows_ad_pair::backup]"
  ]
}

You can certainly create these files manually but there is no guarantee that the ip address will always be the same especially if others use this same cookbook. Wouldn't it be nice if you could dynamically create and save this data at node provisioning time? I think so.

We use this technique at CenturyLink and have wired it into some of our internal drivers. I've been working to improve on this, making it more generalized and extracting it into its own dedicated kitchen provisioner plugin, kitchen-nodes. It's included in this cookbook's Gemfile and is wired into test-kitchen in the .kitchen.yml:

provisioner:
  name: nodes

It's still a work in progress and I came across a scenario in the cookbook here where I had to add functionality to support vagrant/VirtualBox and temporarily make this plugin windows specific. I'll be changing that later. There is really not much code involved and here is the meat of it:

def create_node
  node_dir = File.join(config[:test_base_path], "nodes")
  Dir.mkdir(node_dir) unless Dir.exist?(node_dir)
  node_file = File.join(node_dir, "#{instance.name}.json")

  state = Kitchen::StateFile.new(config[:kitchen_root], instance.name).read
  ipaddress = get_reachable_guest_address(state) || state[:hostname]

  node = {
    :id => instance.name,
    :automatic => {
      :ipaddress => ipaddress
    },
    :run_list => config[:run_list]
  }

  File.open(node_file, 'w') do |out|
    out << JSON.pretty_generate(node)
  end
end

def get_reachable_guest_address(state)
  # PowerShell that emits every IPv4 address bound to the guest
  ips = <<-EOS
    Get-NetIPConfiguration | % { $_.ipv4address.IPAddress}
  EOS
  session = instance.transport.connection(state).node_session
  session.run_powershell_script(ips) do |address, _|
    address = address.chomp unless address.nil?
    next if address.nil? || address == "127.0.0.1"
    # Return the first address that the host can actually ping
    return address if Net::Ping::External.new.ping(address)
  end
  return nil
end

The class containing this derives from the ChefZero provisioner class. It reads from the state file that test-kitchen uses to get the node's ip address and then ensures that this ip is reachable externally or uses one that is from a different NIC in the node. It then adds that ip and the node's run list to the node json file.
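The address selection described above can be tried in isolation. What follows is only a minimal sketch: first_reachable_address is a hypothetical helper that is not part of kitchen-nodes, and the reachability check is passed in as a block so that it can be exercised without a VM or a real ping.

```ruby
# Hypothetical helper, not part of kitchen-nodes: returns the first
# candidate that is not loopback and that the supplied block reports as
# reachable. Falls back to the given value when nothing qualifies.
def first_reachable_address(candidates, fallback = nil)
  candidates.each do |raw|
    next if raw.nil?
    address = raw.chomp
    next if address.empty? || address == '127.0.0.1'
    return address if yield(address)
  end
  fallback
end

# Stand-in for a real ping: pretend only the private network answers.
reachable = ->(ip) { ip.start_with?('192.168.') }

# Addresses as they might come back from the guest, newline included.
guest_ips = ["127.0.0.1\n", "10.0.2.15\n", "192.168.1.10\n"]
chosen = first_reachable_address(guest_ips, 'localhost') { |ip| reachable.call(ip) }
puts chosen
```

In the real provisioner the block would wrap something like the Net::Ping call shown earlier, and the fallback would be the hostname from the state file.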

Dealing with reboots

Standing up active directory controllers presents some awkward challenges to an automated workflow in that each must be rebooted before they can recognize and work with the domain they control. Ideally we would simply be able to kick off a kitchen test which would converge each node and run their tests. However if we did this here, the results would be disappointing unless you like failure - just don't count on failing fast. So we have to "orchestrate" this. A rather fancy term for the method I'm about to describe.

I'll warn this is crude and I'm sure there are better ways to do this but this is simple and it consistently works. It includes using a custom rake task to manage the test flow. It looks like this:

desc "run integration tests"
task :integration do
 system('kitchen destroy')
 system('kitchen converge primary-windows-2012R2')
 system("kitchen exec primary-windows-2012R2 -c 'Restart-Computer -Force'")
 system('kitchen converge backup-windows-2012R2')
 system("kitchen exec backup-windows-2012R2 -c 'Restart-Computer -Force'")
 system('kitchen verify primary-windows-2012R2')
 system('kitchen verify backup-windows-2012R2')
end

This creates and converges the primary node and then uses kitchen exec to reboot that instance. While it is rebooting, the backup instance is created and converged. Of course there is no guarantee that the primary node will be available by the time the backup node tries to join the domain but I have never seen the backup node reach that point before the primary node has completed its restart. Remember that the backup node has to go through a complete create first which takes minutes. Then the backup node reboots after its converge and while it's doing that, the primary node begins its verify process. The kitchen verify runs are independent of one another and do not need both instances to be up.

If this were a production cookbook running on a build agent, I'd raise an error if any of the system calls failed (returned false) and I'd include an ensure clause at the end that destroyed both instances.
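As a rough sketch of what that stricter version might look like, the helper below stops at the first failing command and always runs a cleanup command afterwards. The name run_integration and the injectable runner argument are assumptions made for illustration so the flow can be dry-run without kitchen installed. The instance names are the ones used in this post.

```ruby
# Hypothetical helper: runs each command in order through the runner
# (Kernel#system by default), raises on the first failure and always
# runs the cleanup command, even when a step fails.
def run_integration(steps, cleanup, runner = method(:system))
  steps.each do |command|
    raise "Command failed: #{command}" unless runner.call(command)
  end
ensure
  runner.call(cleanup)
end

STEPS = [
  'kitchen destroy',
  'kitchen converge primary-windows-2012R2',
  "kitchen exec primary-windows-2012R2 -c 'Restart-Computer -Force'",
  'kitchen converge backup-windows-2012R2',
  "kitchen exec backup-windows-2012R2 -c 'Restart-Computer -Force'",
  'kitchen verify primary-windows-2012R2',
  'kitchen verify backup-windows-2012R2'
].freeze

# Dry run with a fake runner that records commands and pretends the
# backup converge fails.
log = []
fake_runner = lambda do |command|
  log << command
  !command.include?('converge backup')
end

begin
  run_integration(STEPS, 'kitchen destroy', fake_runner)
rescue RuntimeError => e
  puts e.message
end
puts log
```

In a Rakefile the task body would then shrink to a single call such as run_integration(STEPS, 'kitchen destroy'), which uses the real system call.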

Vagrant networking

With the vagrant virtualbox provider, vagrant creates a NAT based network and forwards host ports for the transport protocol. This is a great implementation for single node testing but for multi node tests, you may need a real IP issued statically or via DHCP that one node can use to talk to another. This is not always the case but it is for an active directory pair mainly because creating a domain is tied to DNS and simply forwarding ports won't suffice. In order for the backup node to "see" the domain, "kitchen.local" here, created by the primary node, we assign the primary node's IP address to the primary DNS server of the backup node's NIC.

  powershell_script 'setting primary dns server to primary ad' do
    code <<-EOS
      Get-NetIPConfiguration | ? { 
        $_.IPv4Address.IPAddress.StartsWith("#{primary_subnet}") 
        } | Set-DnsClientServerAddress -ServerAddresses '#{primary_ip}'
    EOS

    guard_interpreter :powershell_script
    not_if "(Get-DnsClientServerAddress -AddressFamily IPv4 | ? { $_.ServerAddresses -contains '#{primary_ip}' }).count -gt 0"
  end

Adding 127.0.0.1 would not work. We need a unique IP from which the domain's DNS records can be queried.

Most non-virtualbox providers will not need special care here and certainly not hyper-v. Hyper-V uses an external or private virtual switch which maps to a real network adapter on the host. For virtualbox, you can provide network configuration to tell vagrant to create an additional NIC on the guest:

    driver:
      network:
        - ["private_network", { type: "dhcp" }]

This will simply be ignored by hyper-v.

Testing windows infrastructure - the future is now

I started using all of this last July and have been blogging about it since August. That first post was entitled Peering into the future of windows automation. Well here we are with prerelease gems available for use. It's really been exciting to see this shape up and it has been super helpful to me as I have been learning a new language, ruby, to interact with this code base alongside Salim Afiune and Fletcher Nichols. That has accelerated a learning process which is far from over.

By the way I'll be at ChefConf nearly all week next week and carrying both of my laptops (linux and windows) with all of this code. If anyone wants a demo to see this in action or just wants to talk Test Driven Infrastructure shop, I'd love to chat!

Managing Ruby versions and Gem dependencies from a .Net perspective / March 15, 2015 by Matt Wrock

I've mentioned this in several posts. The last year I have coded primarily in ruby after spending over a decade in C#. See this post for my general observations contrasting Windows and Linux culture. Here I want to focus on language version as well as dependency management particularly comparing nuget to gem packages. This is a topic that has really tripped me up on several occasions. I used to think that ruby was inferior in its ability to avoid dependency conflicts, but in fact I just misunderstood it. Gems and Nuget are deceptively similar but there are some concepts that Ruby separates into different tools like Bundler and that Nuget consolidates into the core package manager with some assistance from Visual Studio.

In this post I'll draw some comparisons and also contrast the differences of managing different versions of ruby as opposed to different .Net runtimes and then explore package dependency management.

.Net Runtimes vs. Ruby Versions

Individual versions of .net are the moral equivalent of different ruby versions. However .net has just one physical install per runtime/architecture saved to a common place on disk. Ruby can have multiple installations of the same version, stored in any location.

In .net, which runtime version to use is determined at application compile time (there are also ways to change this at runtime). One specifies which .net version to compile against and then that version will be loaded whenever the compiled program is invoked. In .net, the program "owns" the process and the runtime libraries are loaded into that process. This is somewhat orchestrated by the Windows registry which holds the location of the runtime on disk and therefore knows where to find the base runtime libraries to load. Note that I'm focusing on .net on windows and not mono which can run .Net on linux.

Ruby versions can be located anywhere on disk. They do not involve the registry at all. Which version is used depends entirely on which ruby executable is invoked. Unlike .net, the ruby user application is not the process entry point. Instead one always invokes the ruby binary (ruby.exe if on windows) and passes ruby files (*.rb) to load and run. Usually one controls which version is used system wide by putting that ruby bin folder on the path.
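A quick way to see which installation is actually in play is to ask the running interpreter. This is just a small sketch using the standard RbConfig and Gem modules. Running it with different ruby executables on the same machine should report different paths.

```ruby
require 'rbconfig'

# Full path of the ruby executable that is running this script
ruby_path = RbConfig.ruby

# Root directory of the installation that executable belongs to
install_root = RbConfig::CONFIG['prefix']

puts "ruby #{RUBY_VERSION} running from #{ruby_path}"
puts "installation root: #{install_root}"
puts "gems for this installation are stored under: #{Gem.dir}"
```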

Loading the runtime, step by step

Let's look at both .net and ruby and see what exactly happens when you invoke a program.

.net

Invoke an exe
create process
.net bootstrapper (corExeMain) loads .net runtime into process
.net code (msil) is run

This glosses over some details but the main point is that each .net runtime resides in a single location on disk and each compiled .net program includes a bootstrapper called corExeMain that makes some system calls to locate that runtime.

Also note that the .exe invoked is not necessarily "the app". ASP.Net is a good example. Assume a traditional IIS web server is hosting your ASP.Net application. There is an IIS worker process spawned by IIS that hosts the asp.net application. An ASP.net developer did not write this worker process. They wrote code that is compiled into .DLL libraries. The ASP.Net worker process discovers these DLLs and loads them into the .Net runtime that it hosts.

ruby

Let's take the case of a popular ruby executable, bundler.

Invoke bundle install on the command line
Assuming C:\Ruby21-x64\bin is on the PATH, C:\Ruby21-x64\bin\bundler.bat is called. This is a "binstub" that ruby creates.
This is a thin wrapper that calls ruby.exe C:/Ruby21-x64/bin/bundler
This bundler file is another thin wrapper that loads the bundler ruby gem and loads the bundler file in that gem's bin directory
This bundler file contains the entry point of the bundler application
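The lookup in the last two steps can be approximated with RubyGems' own API. This is a hedged sketch rather than the exact contents of a generated binstub: gem_executable_path is a hypothetical wrapper around Gem.bin_path, which returns the path of an executable that ships inside an installed gem.

```ruby
require 'rubygems'

# Hypothetical wrapper: returns the path to an executable shipped inside
# an installed gem, or nil when no installed gem provides it.
def gem_executable_path(gem_name, exec_name)
  Gem.bin_path(gem_name, exec_name)
rescue Gem::Exception
  nil
end

path = gem_executable_path('bundler', 'bundle')
if path
  # A binstub would hand control to this file with Kernel#load
  puts "entry point found at #{path}"
else
  puts 'bundler is not installed in this ruby installation'
end
```

The result depends entirely on which ruby installation runs the script, which is the point: the same command can resolve to different files under different installs.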

Managing multiple versions of the runtime on the same machine

.net places its copies of runtime libraries in the Microsoft.NET directory of the system drive. A .net developer will compile their library targeting a specific runtime version. Also, a configuration file can be created for any .Net .exe file that can inform which runtime to load. The developer just needs to ensure that the .net version they use exists on the machine.

With Ruby, it's all a matter of pointing to the ruby bin file in the installed ruby directory tree you want to use and this may be influenced by one's PATH settings when not using absolute file paths.

rvm

One popular and convenient method of managing multiple ruby versions is using a tool called rvm. This is really meant for consumption on linux machines. I do believe it can be used with cygwin on windows machines. Personally I'm not a big cygwin fan and prefer to just use a linux VM where I can use native rvm if I need to switch ruby versions.

rvm exposes commands that can install different versions of ruby and easily switch one's working environment from one version to another.

omnibus installs

One big challenge in ruby is distributing an application. When one writes a ruby program either as a library to be consumed or as a command line executable, it is most common to package this code into one or more ruby gems. However, a gem is just a collection of .rb files and some other supporting files. The gem does not contain the ruby runtime itself. If I were to give someone a .gem file who was unfamiliar with ruby, they would not have a clue as to what to do with that file, but I would still love them.

That person, god bless them, would need to install ruby and then install that gem into the ruby installation.

So one way that ruby applications have begun to distribute themselves is via an omnibus. An omnibus installation ships with a full ruby runtime embedded in the distributed app. Its application code is still packaged as one or more gems and they are located in the special gems directory of this ruby installation along with all of its dependent gems. This also ensures that all the dependent gems and their required versions are preinstalled to eliminate the risk of dependency conflicts. Two example omnibus applications that I regularly work with are Chef and Vagrant.

So chef might have the following directory structure:

chef
|_bin
|  |_chef.bat
|  |_chef
|_embedded
Upon installation, chef/bin is added to the PATH so that calling chef from the command line will invoke the chef.bat file. That chef.bat file is a thin wrapper that calls chef/embedded/bin/ruby.exe and loads the ruby in chef/bin/chef which then calls into the chef gem.

So the advantage of the omnibus is a complete ruby environment that is not at risk of containing user loaded gems. The disadvantage is that any ruby app even if it is tiny needs to distribute itself with a complete ruby runtime which is not small.

Assemblies

Before we dive into nuget and gem packages, we need to address assemblies (.DLLs) which only exist in .net. How do assemblies, .gems and .nupkg (nuget) files map to one another? Assemblies are the final container of application logic and are what physically compose the built .net application. Assemblies were at one time a collection of code files that have been compiled down to IL (intermediate language) and packaged as a .DLL file. In package management terms, assemblies are what gets packaged but they are not the package.

Assemblies can exist in various places on disk. Typically they will exist in one of two places, the Global Assembly Cache (GAC) or in an application's bin directory. When a .net application is compiled, every assembly includes an embedded manifest of its version and the versions of all dependent assemblies it was built with. At runtime, .net will try to locate assemblies of these versions unless there is configuration metadata telling it otherwise.

The .net runtime will always search the GAC first (there are tricks to subvert this) unless the assembly is not strong named and then fall back to the bin paths configured for the application. For details on assembly loading see this MSDN article. Side note: The GAC is evil and I am inclined to look down upon those who use it and their children. Other than that, I have no strong opinions on the matter.

So software teams have to maintain build processes that ensure that any version of their application is always built with an agreed upon set of assemblies. Some of these assemblies may be written by the same application team, others might be system assemblies that ship with the OS, others might be official microsoft assemblies freely available from Microsoft Downloads, and others might be from other commercial or open source projects. Keeping all of these straight can be a herculean effort. Often it comes down to just putting all of these dependencies (in their compiled form) in source control in the same repo as the consuming application. For the longest time - like a decade - this was version management in .net and it remains so for many today.

This suffers several disadvantages. Bloated source control for one thing. These assemblies can eventually take over the majority of a repository's space (multiple gigabytes). They do not lend themselves to being preserved as deltas and so a lot of their data is duplicated in the repo. Eventually, this crushes the productivity of builds and developer workflow since so much time is wasted pulling these bits down from source control.

One strategy to overcome this inefficiency is for larger teams or groups of teams to place all of their dependent assemblies in a common directory. This can save space and duplication since different teams that depend on the same assembly will basically share the same physical file. But now teams must version dependencies at the same cadence and eventually find themselves bound to a monolith leading to other practices that impede the maintainability of the application and make engineers cry and want to kill baby seals.

Package management

Enter package management. Package management performs several valuable functions. Here are just a few:

Dependency discovery - finding dependencies
Dependency delivery - downloading dependencies
Dependency storage and versioning outside of source control

Ruby was the first (between itself and .net) to implement this with RubyGems and later, inspired by ruby, .net introduced nuget.

A little history: ngem was the first incarnation of nuget that had several starts and stops. David Laribee came up with Nubular as the name for ngem in 2008 and it stuck. Later Dru Sellers, Rob Reynolds, Chris Patterson, Nick Parker, and Bil Simser picked it up as a ruby project instead of .net and started moving really fast.

In the meantime Microsoft had quietly been working on a project called NPack and had been doing so for about four months when they contacted the Nu team. Nu was getting wildly popular in a matter of a few short weeks. These teams combined forces because it was the best possible thing for the community - and to signify the joining forces it was renamed nupack.

Shortly thereafter it was discovered that Caltech had a tool called nucleic acid package or nupack for short so it was renamed to nuget which is what it remains today.

My guess, totally unsubstantiated, is that one reason why ruby was the first to develop this is because ruby has no assembly concept. Ruby is interpreted and therefore all of the individual code files are stored with the application. With assemblies, it's at least somewhat sane to have a unit of dependency be a single file that has a version embedded inside and that is not easy to tamper with and break.

Similarities between gems and nugets

There is so much that is similar here that it is easy to be deceived that there is more similar than there really is. So first let's cover some things that truly are the same. Keep in mind nuget was originally implemented as gems that could be stored on rubygems.org so there is a reason for the similarities.

gemspec/nuspec

Both have a "spec" file stored at the root of the package that contains metadata about the package. Ruby calls this a gemspec and nuget a nuspec. The key bits of data which both support are package version, name, content manifest and other packages this one depends on.
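To make that concrete, here is a minimal sketch of a gemspec built in plain ruby. The gem name, author, file list and version constraints are all made up for illustration. Only the Gem::Specification API itself is real.

```ruby
require 'rubygems'

# Hypothetical gem: every value below is illustrative only.
spec = Gem::Specification.new do |s|
  s.name    = 'my_gem'                      # package name
  s.version = '0.1.0'                       # package version
  s.summary = 'Illustrative gemspec'
  s.authors = ['Example Author']
  s.files   = ['lib/my_gem.rb']             # content manifest
  s.add_dependency 'http', '~> 0.7'         # needed at runtime
  s.add_development_dependency 'rspec', '~> 3.0' # needed only to develop
end

spec.dependencies.each do |dep|
  puts "#{dep.type}: #{dep.name} #{dep.requirement}"
end
```

A nuspec carries the same kind of information, only expressed as xml elements instead of ruby code.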

Feeds

Both gems and nuget packages can be discovered and downloaded from an http source or feed. These feeds expose an API allowing package consumers to query an http endpoint for which packages it has and which versions and a means of downloading the package file.

Semantic versioning

Both dependency resolvers are based on semantic versioning. However they use different nomenclature for specifying allowed ranges.
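As an example of the ruby nomenclature, the pessimistic operator ~> 2.3.4 allows 2.3.4 and later patch releases but nothing from 2.4 on. Nuget would express the same range with interval notation such as [2.3.4,2.4). The sketch below checks a few versions against that constraint using RubyGems' own classes.

```ruby
require 'rubygems'

# "~> 2.3.4" means: >= 2.3.4 and < 2.4
requirement = Gem::Requirement.new('~> 2.3.4')

%w[2.3.3 2.3.4 2.3.7 2.4.0].each do |candidate|
  allowed = requirement.satisfied_by?(Gem::Version.new(candidate))
  puts "#{candidate}: #{allowed ? 'allowed' : 'rejected'}"
end
```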

CLI

Both have a CLI and clearly the Nuget.exe cli commands come from ruby heritage but have diverged. In Ruby the gem CLI plays a MUCH more central role than it does with Nuget. But both have the core capabilities of building, installing, uninstalling, publishing and querying package stores.

A case of cognitive dissonance and adjusting to gems

So for those with experience working on .net projects inside Visual Studio and managing dependencies with nuget, the story of dependency management is fairly streamlined. There are definitely some gotchas but let's look at the .net workflow that many are used to practicing with two points of focus: application creation and application contribution. One involves the initial setup of an app's dependencies and the other captures the experience of someone wanting to contribute to that application.

Setting up and managing dependencies

One begins by creating a "project" in visual studio that will be responsible for building your code. Then when you want to add dependencies, you use the package manager gui inside visual studio to find the package and add it to your project. To be clear, you can do this from the command console in visual studio too. Installing these will also install their dependencies and resolve versions using the version constraints specified in each package's nuspec.

Once this is done you can return to the nuget package manager in visual studio to see all the packages you depend on and that will include any packages they depend on as well recursively. Here you will also see if any of these packages have been updated and you will have the opportunity to upgrade them.

All of this setup is reflected in a collection of files in your visual studio project and also solution (a "container" for multiple projects in a single visual studio instance). Each project has a packages.config file that lists all package dependencies and their version. While this includes all dependencies in the "tree", the list is flat.

<?xml version="1.0" encoding="utf-8"?>
<packages>
  <package id="log4net" version="2.0.3" targetFramework="net40" />
  <package id="Microsoft.Web.Xdt" version="2.1.1" targetFramework="net40" />
  <package id="NuGet.Core" version="2.8.2" targetFramework="net40" />
  <package id="PublishedApplications" version="2.4.0.0" targetFramework="net40" />
  <package id="Rx-Core" version="2.1.30214.0" targetFramework="net40" />
  <package id="Rx-Interfaces" version="2.1.30214.0" targetFramework="net40" />
  <package id="Rx-Linq" version="2.1.30214.0" targetFramework="net40" />
  <package id="SimpleInjector" version="2.5.0" targetFramework="net40" />
</packages>

The visual studio solution includes a packages folder that acts as a local repository for each package and also includes a repositories.config file that simply lists the path to each of the packages.config files inside of your project.

Many .net projects do not include this packages folder in source control because each packages.config file includes everything necessary to pull these packages from a nuget feed.

Contributing to an existing project with nuget dependencies

So now let's say I clone the above project with plans to submit a PR. Assuming I'm using automatic package restore (the current default), visual studio will download all the packages in the packages.config files to my own packages folder. It will pull the exact same versions of those packages that were committed to source control. Perfect! I can be pretty confident that when I commit my project and its dependent package manifests, others who clone this project will have the same experience.

Of course there are other nuances that can play into this like with multiple repositories that can also be configured but hopefully you get the general gist here.

Transplanting the same workflow to ruby

So I create a gem in ruby and declare some gems as dependencies in my gemspec file. I install my gem by running gem install and that adds my gem and its dependencies and their dependencies and so forth to my local gem repository. Just like with nuget, the version constraints declared in my gemspec and the gemspecs of my dependencies are honored and used to resolve the gem versions downloaded. Everything works great and all my tests pass so I commit my gem to source control.

Now I'm on a different machine and clone my gem. I do a gem install of my gem which puts my gem into my local repo and also downloads and installs all my dependencies. Now it just so happens that the http gem, one of my dependencies, updated the day before and its update bumped its own version constraint on http_parser which had an undiscovered bug only occurring on machine names beginning with princess that injected random princess images into the html body. Well imagine my surprise when I invoke my gem to find princess images littering my screen because, naturally, I happen to be using my princess_elsa machine.

How did this happen?

Well gem install has no packages.config equivalent. Thus the gem versions I had initially downloaded were not locked and when I ran gem install later, it simply installed all gems that complied with the version constraints in the gemspec. The bumped http gem still lay inside my constraint range so the fact that I got the different http_parser was completely legal. It's at this point that I start trashing ruby on twitter and my #IHateRuby slack channel.
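The drift is easy to reproduce on paper. The sketch below is a deliberate simplification: best_match is a hypothetical helper that looks at a single constraint, whereas a real install resolves the whole dependency graph. The constraint and version numbers are invented for illustration.

```ruby
require 'rubygems'

# Hypothetical helper: returns the highest available version that
# satisfies the constraint, which is roughly what an unlocked install
# ends up choosing.
def best_match(constraint, available)
  requirement = Gem::Requirement.new(constraint)
  available.map { |v| Gem::Version.new(v) }
           .select { |v| requirement.satisfied_by?(v) }
           .max
end

constraint = '~> 1.0'                 # made-up gemspec constraint
monday     = %w[1.0.0 1.0.1]          # versions on the feed at first install
tuesday    = monday + %w[1.1.0]       # a new release appears a day later

first_install  = best_match(constraint, monday)
second_install = best_match(constraint, tuesday)

puts "first install picked:  #{first_install}"
puts "second install picked: #{second_install}"
```

Same gemspec, same command, different result. Nothing was violated, and that is exactly the problem a lock file solves.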

Introducing bundler

Bundler fixes this. To be fair, I had been told about bundler in my very first Chef course. However, I didn't really get it. I wasn't able to map it to anything in my .net experience and it also seemed at the time that gem install just worked fine on its own. Why do I need this extra thing especially when I don't fully understand what it is doing or why I need it?

Today I would say, think of bundler being to gem what the visual studio nuget package manager is to nuget.exe. Ruby has no project system like Visual Studio (praise jesus). One could also point out that neither does C#. However, because it's a royal pain to compile raw C# files using the command line compiler csc.exe, the vast majority of C# devs use visual studio and we often take for granted the extra services that IDE offers. I know the next generation of .net tooling is aiming to fix all this but I'm focusing on what everyone works with today.

Bundler works with two files that can be shipped with a gem: Gemfile and Gemfile.lock. A Gemfile includes a list of gems, gem sources and version constraints. This file augments the gem dependencies in a gemspec file. A typical ruby workflow is to clone a gem and then run bundle install. Bundle install works very similarly to gem install and additionally creates a Gemfile.lock file in the root of your gem that includes the exact versions downloaded. This is the equivalent of a nuget packages.config. If the Gemfile.lock already exists, bundle install will not resolve gem versions but will simply fetch all the versions listed in the lock file. In fact, the Gemfile.lock is even more sophisticated than packages.config. It reveals the source feed from which each gem was downloaded and also represents the list of gems as a hierarchy. This is helpful because now if I need to troubleshoot where the gem dependencies originate, the lock file will reveal which parent gem caused a downloaded gem to be installed.

GEM
  remote: https://rubygems.org/
  remote: http://1.2.3.4:8081/artifactory/api/gems/localgems/
  specs:
    addressable (2.3.7)
    berkshelf (3.2.3)
      addressable (~> 2.3.4)
      berkshelf-api-client (~> 1.2)
      buff-config (~> 1.0)
      buff-extensions (~> 1.0)
      buff-shell_out (~> 0.1)
      celluloid (~> 0.16.0)
      celluloid-io (~> 0.16.1)
      cleanroom (~> 1.0)
      faraday (~> 0.9.0)
      minitar (~> 0.5.4)
      octokit (~> 3.0)
      retryable (~> 2.0)
      ridley (~> 4.0)
      solve (~> 1.1)
      thor (~> 0.19)
    berkshelf-api-client (1.2.1)
      faraday (~> 0.9.0)

This ruby workflow does not end with bundle install. The other staple command is bundle exec. bundle exec is called in front of a call to any ruby executable bin.

bundle exec rspec spec/*_spec.rb

Doing this adjusts one's shell environment so that only the gems in the installed bundle are loaded when calling the ruby executable - in this case rspec.

This might seem odd to a .net dev unless we remind ourselves how the ruby runtime is fundamentally different from .net as described earlier. Namely that all installed gems are available to any ruby program run within a single ruby installation. Remember there can be multiple on the same machine and the particular ruby.exe called determines which ruby install is used. For a given version of ruby, it's common practice for ruby devs to use the same installation. The local ruby repository is like a .net GAC but scoped to a ruby install instead of an entire machine. So even though I call bundle install with a Gemfile.lock, I may have a newer version of a gem than the version specified in the lock file. Now I have both and chances are that it's the newer version that will be loaded if I simply invoke a ruby program that depends on it. So using bundle exec ensures that only the gems explicitly listed in the lock file will be loaded when calling a ruby bin file.

More differences between ruby gems and nuget

I have to admit that the more that I work with ruby and become comfortable with the syntax and idioms, the more I like it. I really like the gem semantics. Here are some things I find superior to nuget:

gemspecs and Gemfiles are written in ruby, while all nuget artifacts are xml files. This means I can use ruby logic in these files. For example, instead of including a bunch of xml nodes to define the files in my package I can include "s.files = `git ls-files`.split($\)" to indicate that all files in the repo should be included in the packaged gem.
gems separate the notion of runtime dependencies and development dependencies (things like rake, rspec and other development time tools). Also you can group dependencies using your own arbitrary labels and then call bundle and omit one or more groups from the bundle.
Not only do I have more flexibility in assigning individual dependencies in a Gemfile to different gem sources (feeds), bundler allows me to specify git URLs and a git ref (commit, branch, tag, etc.) and the gem source will be pulled from that url and ref. This is great for development.

There is a lot more that could be mentioned here regarding the differences between nuget and gems. It's a rather hot topic in some circles. See this post as an example. My intent for this post is really to give a bird's-eye view of the key differences, especially in how they relate to dependency resolution. I hope others find this informative and helpful.