Managing Ruby versions and Gem dependencies from a .Net perspective by Matt Wrock

I've mentioned this in several posts. For the last year I have coded primarily in ruby after spending over a decade in C#. See this post for my general observations contrasting Windows and Linux culture. Here I want to focus on language versions as well as dependency management, particularly comparing nuget to gem packages. This is a topic that has really tripped me up on several occasions. I used to think that ruby was inferior in its ability to avoid dependency conflicts, but in fact I just misunderstood it. Gems and Nuget are deceptively similar, but there are some concepts that Ruby separates into different tools like Bundler and that Nuget consolidates into the core package manager with some assistance from Visual Studio.

In this post I'll draw some comparisons and also contrast the differences of managing different versions of ruby as opposed to different .Net runtimes and then explore package dependency management.

.Net Runtimes vs. Ruby Versions

Individual versions of .net are the moral equivalent of different ruby versions. However, .net has just one physical install per runtime/architecture, saved to a common place on disk. Ruby can have multiple installations of the same version, stored in any location.

In .net, which runtime version to use is determined at application compile time (there are also ways to change this at runtime). One specifies which .net version to compile against and then that version will be loaded whenever the compiled program is invoked. In .net, the program "owns" the process and the runtime libraries are loaded into that process. This is somewhat orchestrated by the Windows registry which holds the location of the runtime on disk and therefore knows where to find the base runtime libraries to load. Note that I'm focusing on .net on windows and not mono which can run .Net on linux.

Ruby versions can be located anywhere on disk. They do not involve the registry at all. Which version is used depends entirely on which ruby executable is invoked. Unlike .net, the ruby user application is not the process entry point. Instead one always invokes the ruby binary (ruby.exe if on windows) and passes it ruby files (*.rb) to load and run. Usually one controls which version is used system wide by putting that ruby's bin folder on the PATH.

Loading the runtime, step by step

Let's look at both .net and ruby and see exactly what happens when you invoke a program.

.net

  1. Invoke an exe
  2. create process
  3. .net bootstrapper (corExeMain) loads .net runtime into process
  4. .net code (msil) is run

This glosses over some details but the main point is that each .net runtime resides in a single location on disk and each compiled .net program includes a bootstrapper called corExeMain that makes some system calls to locate that runtime.

Also note that it is very likely that the .exe invoked is not "the app" itself. ASP.Net is a good example. Assume a traditional IIS web server is hosting your ASP.Net application. There is a worker process spawned by IIS that hosts the asp.net application. An ASP.Net developer did not write this worker process. They wrote code that is compiled into .DLL libraries. The ASP.Net worker process discovers these DLLs and loads them into the .Net runtime that it hosts.

ruby

Let's take the case of a popular ruby executable: bundler

  1. Invoke bundle install on the command line
  2. Assuming C:\Ruby21-x64\bin is on the PATH, C:\Ruby21-x64\bin\bundler.bat, a "binstub" that ruby creates, is called
  3. This is a thin wrapper that calls ruby.exe C:/Ruby21-x64/bin/bundler
  4. That bundler file is another thin wrapper that loads the bundler ruby gem and then loads the bundler file in that gem's bin directory
  5. This bundler file contains the entry point of the bundler application
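
For the curious, the ruby-side wrapper in step 4 is only a few generated lines. A rough sketch of what rubygems produces (not the exact file) looks like this:

#!/usr/bin/env ruby
# generated binstub: activate the bundler gem, then hand off to its real executable
require 'rubygems'

version = ">= 0"
gem 'bundler', version
load Gem.bin_path('bundler', 'bundler', version)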

Managing multiple versions of the runtime on the same machine

.net places its copies of runtime libraries in the Microsoft.NET directory of the system drive. A .net developer will compile their library targeting a specific runtime version. Also, a configuration file can be created for any .Net .exe file that can inform which runtime to load. The developer just needs to ensure that the .net version they use exists on the machine.

With Ruby, it's all a matter of pointing to the ruby bin file in the installed ruby directory tree you want to use, and this may be influenced by one's PATH settings when not using absolute file paths.

rvm

One popular and convenient method of managing multiple ruby versions is using a tool called rvm. This is really meant for consumption on linux machines. I do believe it can be used with cygwin on windows machines. Personally I'm not a big cygwin fan and prefer to just use a linux VM where I can use native rvm if I need to switch ruby versions.

rvm exposes commands that can install different versions of ruby and easily switch one's working environment from one version to another.
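
A minimal session might look like this (the versions here are purely illustrative):

rvm install 2.1.5
rvm use 2.1.5 --default
rvm list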

omnibus installs

One big challenge in ruby is distributing an application. When one writes a ruby program, either as a library to be consumed or as a command line executable, it is most common to package this code into one or more ruby gems. However, a gem is just a collection of .rb files and some other supporting files. The gem does not contain the ruby runtime itself. If I were to give a .gem file to someone unfamiliar with ruby, they would not have a clue what to do with that file, but I would still love them.

That person, god bless them, would need to install ruby and then install that gem into the ruby installation.

So one way that ruby applications have begun to distribute themselves is via an omnibus. An omnibus installation ships with a full ruby runtime embedded in the distributed app. Its application code is still packaged as one or more gems and they are located in the special gems directory of this ruby installation along with all of its dependent gems. This also ensures that all the dependent gems and their required versions are preinstalled to eliminate the risk of dependency conflicts. Two example omnibus applications that I regularly work with are Chef and Vagrant.

So chef might have the following directory structure:

chef
|_ bin
|  |_ chef.bat
|  |_ chef
|_ embedded
   |_ bin
      |_ ruby.exe

Upon installation, chef/bin is added to the PATH so that calling chef from the command line will invoke the chef.bat file. That chef.bat file is a thin wrapper that calls chef/embedded/bin/ruby.exe and loads the ruby file chef/bin/chef, which then calls into the chef gem.

So the advantage of the omnibus is a complete ruby environment that is not at risk of containing user loaded gems. The disadvantage is that any ruby app, even if it is tiny, needs to distribute itself with a complete ruby runtime, which is not small.

Assemblies

Before we dive into nuget and gem packages, we need to address assemblies (.DLLs), which only exist in .net. How do assemblies, .gems and .nupkg (nuget) files map to one another? Assemblies are the final container of application logic and are what physically compose the built .net application. An assembly is a collection of code files compiled down to IL (intermediate language) and packaged as a .DLL file. In package management terms, assemblies are what gets packaged but they are not the package.

Assemblies can exist in various places on disk. Typically they will exist in one of two places, the Global Assembly Cache (GAC) or in an application's bin directory. When a .net application is compiled, every assembly includes an embedded manifest of its version and the versions of all dependent assemblies it was built with. At runtime, .net will try to locate assemblies of these versions unless there is configuration metadata telling it otherwise.

For strong named assemblies, the .net runtime will always search the GAC first (there are tricks to subvert this) and then fall back to the bin paths configured for the application. For details on assembly loading see this MSDN article. Side note: The GAC is evil and I am inclined to look down upon those who use it and their children. Other than that, I have no strong opinions on the matter.

So software teams have to maintain build processes that ensure that any version of their application is always built with an agreed upon set of assemblies. Some of these assemblies may be written by the same application team, others might be system assemblies that ship with the OS, others might be official microsoft assemblies freely available from Microsoft Downloads, and others might be from other commercial or open source projects. Keeping all of these straight can be a herculean effort. Often it comes down to just putting all of these dependencies (in their compiled form) in source control in the same repo as the consuming application. For the longest time - like a decade - this was version management in .net, and it remains so for many today.

This suffers several disadvantages. Bloated source control for one thing. These assemblies can eventually take over the majority of a repository's space (multiple gigabytes). They do not lend themselves to being preserved as deltas and so a lot of their data is duplicated in the repo. Eventually, this crushes the productivity of builds and developer workflow since so much time is wasted pulling these bits down from source control.

One strategy to overcome this inefficiency is for larger teams or groups of teams to place all of their dependent assemblies in a common directory. This can save space and duplication since different teams that depend on the same assembly will basically share the same physical file. But now teams must version dependencies at the same cadence and eventually find themselves bound to a monolith leading to other practices that impede the maintainability of the application and make engineers cry and want to kill baby seals.

Package management

Enter package management. Package management performs several valuable functions. Here are just a few:

  • Dependency discovery - finding dependencies
  • Dependency delivery - downloading dependencies
  • Dependency storage and versioning outside of source control

Ruby was the first (between itself and .net) to implement this with RubyGems and later, inspired by ruby, .net introduced nuget.

A little history: ngem was the first incarnation of nuget and it had several starts and stops. David Laribee came up with Nubular as the name for ngem in 2008 and it stuck. Later Dru Sellers, Rob Reynolds, Chris Patterson, Nick Parker, and Bil Simser picked it up as a ruby project instead of .net and started moving really fast. 
In the meantime Microsoft had quietly been working on a project called NPack and had been doing so for about four months when they contacted the Nu team. Nu was getting wildly popular in a matter of a few short weeks. These teams combined forces because it was the best possible thing for the community - and to signify the joining forces it was renamed nupack.
Shortly thereafter it was discovered that Caltech had a tool called nucleic acid package or nupack for short so it was renamed to nuget which is what it remains today.

My guess, totally unsubstantiated, is that one reason why ruby was the first to develop this is that ruby has no assembly concept. Ruby is interpreted and therefore all of the individual code files are stored with the application. With assemblies, it's at least somewhat sane to have the unit of dependency be a single file that has a version embedded inside and that is not easy to tamper with and break.

Similarities between gems and nugets

There is so much that is similar here that it is easy to be deceived into thinking there is more in common than there really is. So first let's cover some things that truly are the same. Keep in mind nuget was originally implemented as gems that could be stored on rubygems.org, so there is a reason for the similarities.

gemspec/nuspec

Both have a "spec" file stored at the root of the package that contains metadata about the package. Ruby calls this a gemspec and nuget a nuspec. The key bits of data which both support are package version, name, content manifest and other packages this one depends on.
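
As a rough illustration, here is a minimal gemspec; the name, version and dependency are purely illustrative, but the attributes are the standard ones:

# my_gem.gemspec
Gem::Specification.new do |s|
  s.name     = "my_gem"
  s.version  = "0.1.0"
  s.summary  = "A tiny example gem"
  s.authors  = ["Matt"]
  # the content manifest
  s.files    = Dir["lib/**/*.rb"]
  # a dependency on another gem with a version constraint
  s.add_dependency "mixlib-shellout", "~> 2.0"
end

A nuspec carries the same kinds of data, just expressed as xml elements.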

Feeds

Both gems and nuget packages can be discovered and downloaded from an http source or feed. These feeds expose an API allowing package consumers to query an http endpoint for which packages it has in which versions, along with a means of downloading the package file.

Semantic versioning

Both dependency resolvers are based on semantic versioning. However they use different nomenclature for specifying allowed ranges.
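
For example, the same "at least 2.3 but below 3.0" range is spelled differently in each ecosystem. A quick illustration (gem and package names invented):

# in a gemspec, the "pessimistic" operator allows >= 2.3 and < 3.0
s.add_dependency "some_gem", "~> 2.3"

# the rough nuspec equivalent uses interval notation on the dependency element:
#   <dependency id="SomePackage" version="[2.3,3.0)" />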

CLI

Both have a CLI, and clearly the Nuget.exe cli commands come from ruby heritage but have diverged. In Ruby the gem CLI plays a MUCH more central role than it does with Nuget. But both have the core capabilities of building, installing, uninstalling, publishing and querying package stores.
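
A few roughly equivalent commands, purely for orientation (package names invented):

gem build my_gem.gemspec      <->  nuget pack MyPackage.nuspec
gem install my_gem            <->  nuget install MyPackage
gem push my_gem-0.1.0.gem     <->  nuget push MyPackage.0.1.0.nupkg
gem list my_gem --remote      <->  nuget list MyPackage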

A case of cognitive dissonance and adjusting to gems

So for those with experience working on .net projects inside Visual Studio and managing dependencies with nuget, the story of dependency management is fairly streamlined. There are definitely some gotchas, but let's look at the .net workflow that many are used to practicing, with two points of focus: application creation and application contribution. One involves the initial setup of an app's dependencies and the other captures the experience of someone wanting to contribute to that application.

Setting up and managing dependencies

One begins by creating a "project" in visual studio that will be responsible for building your code. Then when you want to add dependencies, you use the package manager gui inside visual studio to find the package and add it to your project. To be clear, you can do this from the command console in visual studio too. Installing these will also install their dependencies and resolve versions using the version constraints specified in each package's nuspec.

Once this is done you can return to the nuget package manager in visual studio to see all the packages you depend on and that will include any packages they depend on as well recursively. Here you will also see if any of these packages have been updated and you will have the opportunity to upgrade them.

All of this setup is reflected in a collection of files in your visual studio project and also solution (a "container" for multiple projects in a single visual studio instance). Each project has a packages.config file that lists all package dependencies and their version. While this includes all dependencies in the "tree", the list is flat.

<?xml version="1.0" encoding="utf-8"?>
<packages>
  <package id="log4net" version="2.0.3" targetFramework="net40" />
  <package id="Microsoft.Web.Xdt" version="2.1.1" targetFramework="net40" />
  <package id="NuGet.Core" version="2.8.2" targetFramework="net40" />
  <package id="PublishedApplications" version="2.4.0.0" targetFramework="net40" />
  <package id="Rx-Core" version="2.1.30214.0" targetFramework="net40" />
  <package id="Rx-Interfaces" version="2.1.30214.0" targetFramework="net40" />
  <package id="Rx-Linq" version="2.1.30214.0" targetFramework="net40" />
  <package id="SimpleInjector" version="2.5.0" targetFramework="net40" />
</packages>

The visual studio solution includes a packages folder that acts as a local repository for each package and also includes a repositories.config file that simply lists the path to each of the packages.config files inside of your project.

Many .net projects do not include this packages folder in source control because each packages.config file includes everything necessary to pull these packages from a nuget feed.

Contributing to an existing project with nuget dependencies

So now let's say I clone the above project with plans to submit a PR. Assuming I'm using automatic package restore (the current default), visual studio will download all the packages in the packages.config files to my own packages folder. It will pull the exact same versions of those packages that were recorded in the committed packages.config files. Perfect! I can be pretty confident that when I commit my project and its dependent package manifests, others who clone this project will have the same experience.

Of course there are other nuances that can play into this, like multiple repositories that can also be configured, but hopefully you get the general gist here.

Transplanting the same workflow to ruby

So I create a gem in ruby and declare some gems as dependencies in my gemspec file. I install my gem by running gem install, which adds my gem, its dependencies, their dependencies and so forth to my local gem repository. Just like with nuget, the version constraints declared in my gemspec and the gemspecs of my dependencies are honored and used to resolve the gem versions downloaded. Everything works great and all my tests pass, so I commit my gem to source control.
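
To make what happens next concrete, imagine the gemspec declared its http dependency with a loose constraint like this (the version is invented for the story):

# any http release below 1.0 satisfies this, so a release published tomorrow
# is just as legal a match as the one I originally tested against
s.add_dependency "http", "~> 0.6"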

Now I'm on a different machine and clone my gem. I do a gem install of my gem which puts my gem into my local repo and also downloads and installs all my dependencies. Now it just so happens that the http gem, one of my dependencies, updated the day before and its update bumped its own version constraint on http_parser, which had an undiscovered bug, occurring only on machine names beginning with princess, that injected random princess images into the html body. Well imagine my surprise when I invoke my gem to find princess images littering my screen because, naturally, I happen to be using my princess_elsa machine.

How did this happen?

Well gem install has no packages.config equivalent. Thus the gem versions I had initially downloaded were not locked, and when I ran gem install later, it simply installed all gems that complied with the version constraints in the gemspec. The bumped http gem still fell inside my constraint range, so the fact that I got a different http_parser was completely legal. It's at this point that I start trashing ruby on twitter and my #IHateRuby slack channel.

Introducing bundler

Bundler fixes this. To be fair, I had been told about bundler in my very first Chef course. However, I didn't really get it. I wasn't able to map it to anything in my .net experience and it also seemed at the time that gem install just worked fine on its own. Why do I need this extra thing, especially when I don't fully understand what it is doing or why I need it?

Today I would say: think of bundler being to gem what the visual studio nuget package manager is to nuget.exe. Ruby has no project system like Visual Studio (praise jesus). One could also point out that neither does C#. However, because it's a royal pain to compile raw C# files using the command line compiler csc.exe, the vast majority of C# devs use visual studio and we often take for granted the extra services that IDE offers. I know the next generation of .net tooling is aiming to fix all this but I'm focusing on what everyone works with today.

Bundler works with two files that can be shipped with a gem: Gemfile and Gemfile.lock. A Gemfile includes a list of gems, gem sources and version constraints. This file augments the gem dependencies in a gemspec file. A typical ruby workflow is to clone a gem and then run bundle install. Bundle install works very similarly to gem install but additionally creates a Gemfile.lock file in the root of your gem that includes the exact versions downloaded. This is the equivalent of a nuget packages.config. If the Gemfile.lock already exists, bundle install will not resolve gem versions but will simply fetch all the versions listed in the lock file. In fact, the Gemfile.lock is even more sophisticated than packages.config. It reveals the source feed from which each gem was downloaded and also represents the list of gems as a hierarchy. This is helpful because if I need to troubleshoot where a gem dependency originates, the lock file will reveal which parent gem caused a downloaded gem to be installed.
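
A bare bones Gemfile might look something like this sketch (the development gems here are arbitrary examples):

# Gemfile
source "https://rubygems.org"

# pull in the dependencies already declared in the gemspec
gemspec

# development-time extras
gem "rake"
gem "rspec", "~> 3.0"

And here is an excerpt of a Gemfile.lock, showing both the source feeds and the dependency hierarchy: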

GEM
  remote: https://rubygems.org/
  remote: http://1.2.3.4:8081/artifactory/api/gems/localgems/
  specs:
    addressable (2.3.7)
    berkshelf (3.2.3)
      addressable (~> 2.3.4)
      berkshelf-api-client (~> 1.2)
      buff-config (~> 1.0)
      buff-extensions (~> 1.0)
      buff-shell_out (~> 0.1)
      celluloid (~> 0.16.0)
      celluloid-io (~> 0.16.1)
      cleanroom (~> 1.0)
      faraday (~> 0.9.0)
      minitar (~> 0.5.4)
      octokit (~> 3.0)
      retryable (~> 2.0)
      ridley (~> 4.0)
      solve (~> 1.1)
      thor (~> 0.19)
    berkshelf-api-client (1.2.1)
      faraday (~> 0.9.0)

This ruby workflow does not end with bundle install. The other staple command is bundle exec, which is placed in front of a call to any ruby executable bin.

bundle exec rspec /spec/*_spec.rb

Doing this adjusts one's shell environment so that only the gems in the installed bundle are loaded when calling the ruby executable - in this case rspec.

This might seem odd to a .net dev unless we remind ourselves how the ruby runtime is fundamentally different from .net as described earlier. Namely, all installed gems are available to any ruby program run within a single ruby installation. Remember there can be multiple installations on the same machine and the particular ruby.exe called determines which ruby install is used. For a given version of ruby, it's common practice for ruby devs to use the same installation. The local gem repository is like a .net GAC but scoped to a ruby install instead of an entire machine. So even though I ran bundle install with a Gemfile.lock, I may have a newer version of a gem than the version specified in the lock file. Now I have both, and chances are that it's the newer version that will be loaded if I simply invoke a ruby program that depends on it. So using bundle exec ensures that only the gems explicitly listed in the lock file will be loaded when calling a ruby bin file.

More differences between ruby gems and nuget

I have to admit that the more that I work with ruby and become comfortable with the syntax and idioms, the more I like it. I really like the gem semantics. Here are some things I find superior to nuget:

  1. gemspecs and Gemfiles are written in ruby, while all nuget artifacts are xml files. This means I can use ruby logic in these files. For example, instead of including a bunch of xml nodes to define the files in my package, I can include "s.files = `git ls-files`.split($\)" to indicate that all files in the repo should be included in the packaged gem.
  2. gems separate the notion of runtime dependencies and development dependencies (things like rake, rspec and other development time tools). Also you can group dependencies using your own arbitrary labels and then call bundle and omit one or more groups from the bundle.
  3. Not only do I have more flexibility in assigning individual dependencies in a Gemfile to different gem sources (feeds), bundler also allows me to specify a git URL and a git ref (commit, branch, tag, etc.) so the gem source will be pulled from that url and ref. This is great for development (see the sketch below).
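
Here is a sketch of a Gemfile showing groups and a git sourced gem; the gem names, group label and URL are all made up:

source "https://rubygems.org"

gem "chef", "~> 12.0"

# a custom group that can be skipped with: bundle install --without integration
group :integration do
  gem "test-kitchen"
end

# pull a dependency straight from a git ref instead of a gem feed
gem "some_gem", git: "https://github.com/example/some_gem.git", branch: "my-fix"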

There is a lot more that could be mentioned here regarding the differences between nuget and gems. It's a rather hot topic in some circles. See this post as an example. My intent for this post is really to give a bird's eye view of the key differences, especially in how they relate to dependency resolution. I hope others find this informative and helpful.

Calling knife commands from ruby without shelling out by Matt Wrock


When I started out writing Chef cookbooks, occasionally I'd want to run a knife command from my recipe, library, LWRP or my own gem or knife plugin, and I'd typically just use the ruby system method, which creates a subshell to run a command. This never felt quite right. Composing a potentially complex command by building a large string is cumbersome and not particularly readable. Then there is the shelling out to a subshell, which is inefficient. So after doing some cursory research I was surprised to find little instruction or examples on how to use straight ruby to call knife commands. Maybe my google fu just wasn't up to snuff.

So here I'll run through the basics of how to compose a knife command in ruby, feeding it input and even capturing output and errors.

A simple knife example from ruby

We'll start with a complete but simple example of what a knife call in ruby looks like and then we can dissect it.

# require the knife plugin being invoked (paths here follow the chef gem's layout)
require 'stringio'
require 'chef'
require 'chef/knife'
require 'chef/knife/ssh'
require 'chef/knife/core/ui'

# load any dependencies declared in knife plugin
Chef::Knife::Ssh.load_deps

# instantiate command
knife = Chef::Knife::Ssh.new

# pass in switches
knife.config[:attribute] = 'ipaddress'
knife.config[:ssh_user] = "root"
knife.config[:ssh_password_ng] = "password"
knife.config[:config_file] = Chef::Config[:config_file]

# pass in args
knife.name_args = ["name:my_node", "chef-client"]

# setup output capture
stdout = StringIO.new
stderr = StringIO.new
knife.ui = Chef::Knife::UI.new(stdout, stderr, STDIN, {})

# run the command
knife.run

puts "Output: #{stdout.string}"
puts "Errors: #{stderr.string}"

Setup and create command

This is very straightforward. Knife plugins may optionally define a deps method which is intended to include any require statements needed to load the dependencies of the command. Not all plugins implement this, but you should always call load_deps (which will call deps) just in case they do.

Finally, new up the plugin class. The class name will always reflect the command name, where each command name token is capitalized in the class name. So knife cookbook list is Chef::Knife::CookbookList.

Command input

Knife commands typically take input via two forms:

Normal command line arguments

For instance:

knife cookbook upload my_cookbook

where my_cookbook is an argument to CookbookUpload.

These inputs are passed to the knife command in a simple array via the name_args method ordered just as they would be on the command line.

knife.name_args = ["name:my_node", "chef-client"]

Using our knife ssh example, here we are passing the search query and ssh command.

Command options

These include any command switches defined by either the plugin itself or its knife base classes so it can always include all the standard knife options.

These are passed via the config hash:

knife.config[:attribute] = 'ipaddress'
knife.config[:ssh_user] = "root"
knife.config[:ssh_password_ng] = "password"
knife.config[:config_file] = Chef::Config[:config_file]

Note that the hash keys are usually but not necessarily the same as the name of the option switch so you may need to review the plugin source code for these.
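
For reference, this is roughly how a knife plugin declares a switch (paraphrased from the plugin DSL rather than copied from any particular plugin). The symbol passed to option is the key you would set on knife.config:

# inside a knife plugin class definition
option :ssh_user,
  :short => "-x USERNAME",
  :long => "--ssh-user USERNAME",
  :description => "The ssh username"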

Capturing output and errors

By default, knife commands send output and errors to the STDOUT and STDERR streams using Chef::Knife::UI. You can intercept these by providing an alternate UI instance as we are doing here:

stdout = StringIO.new
stderr = StringIO.new
knife.ui = Chef::Knife::UI.new(stdout, stderr, STDIN, {})

Now instead of logging to STDOUT and STDERR, the command will send this text to our own stdout and stderr StringIO instances. So after we run the command we can extract any output from these instances.

For example:

puts "Output: #{stdout.string}"
puts "Errors: #{stderr.string}"

Running the command

This couldn't be simpler. You just call:

knife.run
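
If you want to be a bit more defensive, one possible pattern (just a sketch, not required) is to account for plugins that exit rather than raise and to always dump the captured streams:

begin
  knife.run
rescue SystemExit => e
  # some commands call exit on failure instead of raising a regular error
  puts "knife exited with status #{e.status}"
ensure
  puts "Output: #{stdout.string}"
  puts "Errors: #{stderr.string}"
end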

Hope this is helpful.

Help test the future of Windows Infrastructure Testing on Test-Kitchen by Matt Wrock

Update: Test-Kitchen beta is here! See this post for an example using the latest prerelease gems.

I've posted about using Test-Kitchen on Windows a couple times. See this post and this one too. Both of these posts include rather fragile instructions on how to prepare your environment in order to make this possible. Writing them feels like handing out scissors and then encouraging people to run on newly oiled floors generously sprinkled with legos while transporting said scissors. Then, if they are lucky, their windows nodes will converge and kick off tests ready for their review once they reach "the other side." It's dangerous. It's exciting. Pain may be an active ingredient.

Well development has been ramping up in this effort. Some of the outside forks have now been merged into a dedicated branch of the official Test-Kitchen repo - windows-guest-support - and it's been rebased with the latest master branch of Test-Kitchen. A group of folks from within and outside of chef, including test-kitchen creator Fletcher Nichol as well as Salim Afiune, who got the ball rolling on windows compatibility, meets regularly to discuss progress and bugs. I'm honored to be involved and contributed the winrm based file copying logic (future blog post pending - my wounds have not yet fully healed).

I can't wait until the day when no special instructions are required, and we think that day is not far off, but here is an update on how to get up and running with the latest bits. A lot has changed since my last post but I think it's much simpler now.

What to install and where to get it

First clone the windows-guest-support branch of the Test-Kitchen repo:

git clone -b windows-guest-support https://github.com/test-kitchen/test-kitchen 

Build and install the gem. If you are running the chefdk on either windows or linux, you can use the rake task dk_install which will do the build and install and additionally overlay the bits on top of the omnibussed Test-Kitchen.

rake dk_install

This may not be compatible with all drivers. I use it regularly with the kitchen-vagrant driver, so let's run through the vagrant setup.

Clone the windows-guest-support branch of kitchen-vagrant:

git clone -b windows-guest-support https://github.com/test-kitchen/kitchen-vagrant

Build and install the gem:

rake install

You should now be good to go:

C:\> kitchen -v
Test Kitchen version 1.3.2.dev

Configuration

There is just one thing that needs changing in your .kitchen.yml configuration file. As an example, here is my .kitchen.yml for a recent PR of mine adding windows support to the chef-minecraft cookbook:

driver_plugin: vagrant

provisioner:
  name: chef_zero

platforms:
- name: windows-2012R2
  driver_config:
    box_url: https://wrock.blob.core.windows.net/vhds/vbox2012r2.box
    communicator: winrm
    vm_hostname: false    
  transport:
    name: winrm

- name: ubuntu-12.04
  run_list:
  - recipe[ubuntu]
  driver_config:
    box: hashicorp/precise64

suites:
- name: default
  run_list:
  - recipe[minitest-handler]
  - recipe[minecraft]
  attributes:
    minecraft:
      accept_eula: true

The windows box hosted in my Azure storage is an evaluation copy due to expire in a couple months. I try to rebuild it before it expires. Note the transport setting here:

transport:
  name: winrm

This tells test-kitchen to use the winrm transport instead of the default ssh transport. Furthermore, you will notice that a

kitchen list

produces slightly modified output:

Instance                Driver   Provisioner  Transport  Last Action
default-windows-2012R2  Vagrant  ChefZero     Winrm      <Not Created>
default-ubuntu-1204     Vagrant  ChefZero     Ssh        <Not Created>

Note the new transport column.

A note for Hyper-V users

I tend to use Hyper-V on my personal windows laptop and VirtualBox on my work Ubuntu laptop. I have only one issue on Hyper-V now. It hangs when vagrant tries to change the hostname of the box. I believe this is a bug in Vagrant. If you interrupt the box provisioning and boot into the box, it then blue screens - at least this has been my experience. To work around this for now I comment out line 24 of templates/Vagrantfile.erb in the kitchen-vagrant driver:

<% if config[:vm_hostname] %>
  # c.vm.hostname = "<%= config[:vm_hostname] %>"
<% end %>

Then I reinstall the gem.

Tip: The url to my Hyper-V vagrant box with an evaluation copy of windows 2012R2 is:

https://wrock.blob.core.windows.net/vhds/hyperv2012r2.box

Lets all join hands and bow our heads in convergence

You'll appreciate the spiritual tone when your screen reveals a converged vm with passing tests. Either that or you will rage quit windows when this all goes to bloody hell, but I'm gonna try and keep a "glass half full" attitude here. You are welcome to follow along with me and the minecraft server cookbook. Clone my repo:

git clone -b windows https://github.com/mwrock/chef-minecraft

Now set the .kitchen-vagrant.yml file to be the "active" kitchen config file instead of .kitchen.yml which is configured to use DigitalOcean:

Powershell

$env:KITCHEN_YAML=".kitchen-vagrant.yml"

Bash

export KITCHEN_YAML=.kitchen-vagrant.yml

And all together now on 1, 2, 3...Converge!!

kitchen converge default-windows-2012R2

While you wait...:

Just 3 minutes later, it's a success!!!

       15 tests, 5 assertions, 0 failures, 0 errors, 0 skips
         - MiniTest::Chef::Handler
       Running handlers complete
       Chef Client finished, 34/57 resources updated in 180.231501 seconds
       Finished converging <default-windows-2012R2> (5m4.94s).

Side note on automating local policy

This might be somewhat unrelated but I just cannot let it go. The minecraft server cookbook creates a windows scheduled task (kinda like a linux cron job) that runs the java process that hosts the minecraft server and it creates a user under which the job runs. In order to run a scheduled task, a windows user must have the "log on as batch job" right configured in their local policy.

Turns out this is a bit tricky to automate. I'll spare the audience from the horror which makes this possible but if you must look, see https://github.com/mwrock/chef-minecraft/blob/windows/templates/default/LsaWrapper.ps1.erb. Basically this can only be done by calling into the windows API as is done here. Big thanks to Fabien Dibot for this information!

Hurry up and wait! Test and provide feedback

There is a lot of work going into making sure this is stable and provides a good experience for those wanting to test windows infrastructure with test-kitchen. However there are so many edge cases that are easy to miss. I very much encourage anyone wanting to try this out to do so and reach out via github issues to report problems.

More windows packaging for vagrant and fixing 1603 errors during MSI installs by Matt Wrock

This post is largely a follow up to my November post In search of a light weight windows vagrant box. If you are interested in some pointers to get your windows install as small as possible and then package it up into a Hyper-V or VirtualBox vagrant box file, I'd encourage you to read it. This post will cover three main topics:

  • Why a windows box (vagrant or otherwise) may suffer from 1603 errors when installing MSIs (this is why I set out to repackage my boxes)
  • New "gotchas" packaging Hyper-V boxes on the Windows 10 technical preview
  • LZMA vs. GZIP compression...a cage match

Caution: This Installation may be fatal!

While error code 1603 is a standard MSI error code, I can assure you it is never good and in fact it is always fatal. 1603 errors are "Fatal errors." First a quick primer in troubleshooting failed MSI installs.

MSI installs may simply fail silently, leaving no clue as to what might have happened. That can be common with many installation errors, especially if you are performing a silent install. The install will at least emit a failing exit code but perhaps nothing else. This is when it is time to use the log file switch and add some verbosity for good measure. This may assist you in tracking down an actionable error message, flood you with more information than you ever wanted, or both.

The log file is usually generated by adding:

/lv c:\some\log\file.log

to your MSIEXEC.exe command. If the install fails, give this file a good look over. It may seem overly cryptic and will largely contain info meant to be meaningful only to its authors but more often than not one can find the root cause of a failed install within this file.
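
For example, a quiet install with verbose logging might look like this (the MSI name is just a placeholder):

msiexec /i chef-client.msi /qn /lv c:\some\log\file.log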

In mid August of 2014, microsoft rolled out an update, KB 2918614, that caused many machines to raise this error when installing MSIs. An almost universal fix was found and that was to uninstall KB 2918614. But in this age of rolling forward, rolling back is so 2013. Months later a hotfix was issued, KB3000988. In short, this error can occur if you have patch KB2918614 and are running an install with an admin user that has never logged into the box before. In my case I was installing the chef client to start a Test-Kitchen run on a newly provisioned vagrant box.

I could manually install the chef client just fine when I hit this error because that entailed actually logging into the box. However, after doing this several times it gets really old, and running through a full vagrant repackaging can be a multi night process that I had been avoiding but can avoid no longer.

Packaging Hyper-V Vagrant boxes on windows 10

Tl;dr: you can't.

You can call me a "Negative Nancy" but I refuse to wear your labels. 

Hyper-V has changed the format it uses to store metadata about the VM. This has been stored in XML format until now. When you package a vagrant Hyper-V box, you include this file in the .box package file and then when you import it, vagrant reads from it and extracts the vital pieces of metadata like cores, memory, network info, etc in order to correctly create a new VM based on that data. It does NOT simply import that file since it contains some unique identifiers that could possibly conflict with other VMs on your host system.

Windows 10 uses a binary format to store this data with a .vmcx extension. This is supposed to provide better performance and reliability when changing vm settings. However it also renders a vagrant import doomed. Thankfully, one can still import vagrant boxes packaged in the xml format and Hyper-V will migrate them to the new format, but this migration is unidirectional at the moment.

I'm hoping future releases will be able to export machines in XML format or at the least that the .vmcx format will be published so that vagrant contributors can add support for these new boxes. For now, I'm just gonna need to find a pre v10 windows host to create an xml based VM export that I can package (I accept donations). Funny how I have access to thousands of guest VMs, but the only physical windows boxes I work with are my personal laptop and my wife and kids' machines running Windows Home edition (no Hyper-V). So on to creating a VirtualBox box file.

Update: I was able to package a Hyper-V box by simply using the same box artifacts I had used in my previous box and replacing the virtual hard drive with my updated one. It just needs to have the same name. This works as long as the vm metadata equally applies to the new drive which was the case for me.

LZMA compression: smaller payload larger compression/decompression overhead

In my November post I discussed the benefits of using the LZMA format to package the box. This format is more efficient but takes significantly longer to complete the compression. My personal opinion is that compression time is not that important compared with download and decompression time since the former is done far less frequently and can be scheduled "out of band" so to speak. Better compression is even more important with windows boxes because they are significantly larger than *nix flavored machines.

Arthur Maltson commented on twitter the other day that he sticks with gzip over lzma because the lzma decompression is also considerably longer. I hadn't noticed this but I also did not measure it closely. So let's have a closer look at 3 key factors: compressed box size, download time and decompression time.

This week I rebuilt my windows box including the  KB3000988 hotfix mentioned above. I created both an lzma .box file and a gzip version for VirtualBox so I could compare the two. Both are identical in content. The gzip box weighs in at 3.6GB and the lzma version is 2.8GB. About a 22% delta. Not bad but also not as large of a delta as my observations in November.

Anyone can do the math on the download time. I get about 13mbps on my home FIOS internet connection. So the .8GB delta should mean the gzip will take about 9 extra minutes to download assuming I am pulling the box from an online source. I keep my boxes in Azure storage. Now here is the kicker: the LZMA compressed box takes about 6 minutes to decompress compared to about 1 minute with the gzip. So overall I'm saving just under 5 minutes with the LZMA box. A five minute savings is great, but in light of a total one hour download and the two to three hours it took to produce the initial compressed box, I'm thinking gzip is the winner here. There are other benefits too. For instance, it means you can simply use the vagrant package command for VirtualBox boxes, which keeps things simpler.

Furthermore it is important to note that Vagrant downloads and decompresses the package only once and caches it in your .vagrant.d folder. All "vagrant up" commands simply copy the previously downloaded and decompressed image to a new VM. So any savings yielded from a smaller download is only rewarded one time per box on any one host, assuming you do not explicitly delete the box.

Staying "in the know" with podcasts by Matt Wrock

TL;DR: There will be no dog hosted podcasts discussed here but please enjoy this adorable image.

I love podcasts and I credit them, those who produce them and their guests for playing a significant role in developing my career and passions. You can skip to the end of this post to check out the podcasts I listen to today, but allow me to pontificate about podcasts and how  I like to consume them.

I started listening to podcasts (mostly technical) almost ten years ago. Around that time I got interested in ultra marathons (any run longer than 26.2 miles) and they would keep me company on my monthly 50K runs mostly in the dark through the trails of Chino Hills State Park. Back then I had been developing software professionally for several years and had done some truly cool stuff but mostly in a cave of my own making. I am a self-taught coder and what I knew at the time I had learned mostly from books and my own tinkering. I was not at all "plugged in" to any developer community and the actual human developers I knew were limited to those at my place of work. Podcasts changed all of that.

High level awareness over deep mastery

First things first, if you set aside time to listen to a podcast with the hopes of really learning some deep details about a particular topic, you may be disappointed. This is not to say that podcasts lack rich technical content, they simply are not the medium by which one should expect to gain mastery over a given topic.

Most will agree that technology workers like those likely reading this post are constantly inundated with new technologies, tools, and ideas. Sometimes it can feel like we are constantly making decisions as to what NOT to learn, because no human being can possibly study, or even gain a novice ability to work with, all of this information. So it's important that the facts we use to decide where to invest our learning efforts are as well informed as possible.

I like the fact that I can casually listen to several podcasts and build an awareness of concepts that may be useful to me and that I can draw from later at a deeper level. There have now been countless times that I have come across a particular problem and recalled something I heard in a podcast that I think may be applicable. At that point I can google the topic and either determine that it's not worth pursuing or start to dive in and explore.

So many trends and ideas - you need to be aware

There is so much going on in our space and at such a fast pace. Like I mention above, it's simply impossible to grasp everything. It's also impossible to simply follow every trending topic. However we all need to maintain some kind of feed to the greater technical community in order to maintain at least a basic awareness of what is current in our space. It's just too easy to live out our careers in isolation, regardless of how smart we are, and miss out on so many of the great ideas in circulation around us.

When I started listening to podcasts, my awareness and exposure to new ideas took off and allowed me to follow new disciplines that truly stretched me. I may not have gained that awareness had I not had this link to the "outside world."

Some of the significant "life changing" ideas that podcasts introduced me to were: Test Driven Development, Inversion of Control patterns and container implementations, several significant Open Source projects and, more importantly, a curiosity to become actively involved in open source.

Making a bigger impact

After listening to several podcasts I began to take stock of my career and realized that while I had managed to put out some good technology and gain notoriety within my own workplace, that notoriety and overall impact did not reach far beyond that relatively small sphere of influence. Listening to podcasts and being exposed to the guests that appeared on them made me recognize the value of "getting out there" and becoming involved with a broader group. This especially hit home when I decided to change jobs after being with the same employer for nine years.

It was in large part the prolific bloggers I heard interviewed who inspired me to start my own blog. I had listened to tons of open source project contributors talk about the projects they started and maintain, and I eventually started my own projects. A couple of these got noticed and I have now been invited to speak on a few podcasts myself. That just seems crazy and tends to strongly invoke my deep seated imposter syndrome, but they were all a lot of fun.

I even got to work with a podcaster who I enjoyed listening to for years, David Starr (@elegantcoder),  and had the privilege of sitting right next to him every day. What a treat and I have to say that the real life David lived up to the episodes I enjoyed on my runs years before. If you want to hear someone super smart, I'm talking about David, have a listen to his interview on Hanselminutes.

Podcasts I listen to

So my tastes and the topics I tend to gravitate towards have changed over the past few years. For instance, I listen to more "devopsy" podcasts and fewer webdev shows than I used to, but I still religiously listen to some of the first podcasts I started with. Some I enjoy more for the host than the topics covered.

Here are the podcasts I subscribe to today in alphabetical order:

.Net Rocks!

This may have been the first series I listened to and I still listen now and again. As the name suggests, its focus is on .net technologies. Carl Franklin and Richard Campbell do a great and very professional job producing this podcast.

Arrested Devops

This is a fairly new podcast focusing on devops topics and usually includes not only the hosts, Matt Stratton, Trevor Hess, and Bridget Kromhout but also one or more great guests knowledgeable of devops topics. You will also learn, and I'll just tell you right now, that there is always devops in the banana stand. I did not know that.

I've had the pleasure of meeting Matt on a few occasions at some Chef events. He's a great guy, fun to talk to and passionate about devops in the windows space.

The Cloudcast

Put on by Aaron Delp and Brian Gracely, I just started listening to this one and so far really like it. I work for a cloud so it seems only natural that I listen to such a podcast.

Devops Cafe

Another great podcast focusing on devops topics put on by John Willis and Damon Edwards. The favicon of their website looks like a Minecraft cube. Is there meaning here? I don't know but I like it.

Food Fight Show

Another Devops centered podcast hosted by Nathan Harvey and Brandon Burton. The show often covers topics relevant to the Chef development community. So if you are interested in Chef, I especially recommend this show but its coverage certainly includes much more.

Hanselminutes

Another show that I have been listening to since the beginning of my podcast listening. It's hosted by Scott Hanselman and I think he has a real knack for interviewing other engineers. Many of the shows cover topics relevant to Microsoft technologies, but in recent years Scott has been focusing a lot on broad, and I think important, social issues and how they intersect with developer communities. It's really good stuff.

Herding Code

A great show that often, but not necessarily always, focuses on web based technologies. These guys - Jon Galloway, K. Scott Allen, Kevin Dente, and Scott Koon - ask a lot of great questions of their guests and have the ability to dive deep into technical issues.

Ops All the Things

Put on by Steven Murawski and Chris Webber talking about devops related topics. I learned about Steven from his appearances on several other podcasts talking about Microsoft's DSC (Desired State Configuration) and his experiences working with it at Stack Exchange. I've had the privilege of meeting Steven and recently working with him on a working group aimed at bringing Test-Kitchen (an infrastructure automation testing tool) to Windows.

PowerScripting Podcast

A great show focused on powershell hosted by Jonathan Walz and Hal Rottenberg. If you like or are interested in powershell, you should definitely subscribe to this podcast. They have tons of great guests including at least three episodes with Jeffrey Snover the creator of powershell.

Runas Radio

A weekly interview show with Richard Campbell and an interesting guest focusing on Microsoft IT Professional (Ops) and lately many "devops" related guests and topics.

The Ship Show

Another podcast focused on devops topics hosted by J. Paul Reed, Youssuf El-Kalay, EJ Ciramella, Seth Thomas, Sascha Bates, and Pete Cheslock. These episodes often include great discussion both among the hosts and with some great guests.

Software Defined Talk

Another new show in my feed but this one is special. It's hosted by Michael Coté, Matt Ray, and Brandon Whichard. I find these guys very entertaining and informative. The show tends to focus on general market trends in the software industry, but there is something about the three of these guys and their personalities that I find really refreshing. I walk away from all of these episodes with a good chuckle and with several tidbits of industry knowledge I didn't have before.

Software Engineering Radio

Here is another show that I have been listening to since the beginning. One thing I like about this series is that it really has no core technical focus and therefore provides a nice range of topics across, "devops", process management, and engineering covering several different disciplines. I highly recommend a recent episode, Gang of Four – 20 Years Later.

This Developers Life

There hasn't been a new episode in over a year and perhaps there never will be another, but each of these episodes is a must listen. If you like the popular This American Life podcast, you should really enjoy this series, which shamelessly copies the former but focuses on issues core to development. Scott Hanselman and Rob Connery show true creative genius here.

Windows Weekly

It took me a couple episodes to get into this one but I now look forward to it every week. Hosted by Leo Laporte, Mary Jo Foley and Paul Thurrott, it takes a more "end user" view into Microsoft technologies. Now that I no longer work for Microsoft I find it all the more interesting to get some inside scoop on that place where I used to work.

The Goat Farm

I just discovered this and it looks like another good addition to my list. Run by Michael Ducy and Ross Clanton. Just listened to my first episode last night: Taylorism, Hating Agile, and DevOps at CSG.