Getting Ready…Troubleshooting unattended windows installation by Matt Wrock

[Image: the Windows “Getting Ready” screen]

I install windows (and linux) A LOT in my role at CenturyLink Cloud automating our infrastructure rollout and management. Sometimes things go wrong. Usually if our provisioning code has been waiting for more than a few minutes for the machine to be reachable I know something is not right. So I might pop open a VMWare console and see this ever familiar screen. The windows installation is “Getting Ready.” That may fill one with the adrenaline of sweet anticipation but I know this only ends in disappointment. I can assure you that if windows is not ready now, it will never be ready. As in never, ever, ever ready.

In the past I have sat staring into the spinning circle of emptiness wondering what in god’s name windows is doing. There are no error messages and usually nothing helpful in the VMWare events other than telling me that the OS customization has failed. Mmm…thanks. Sometimes after 5 or 15 minutes, the OS may come to life but often not in a state that our provisioning can connect to over winrm. I’m usually caught off guard by this since I have been spending the past several minutes in a very intense Vulcan mind meld with my monitor, hoping somehow to break through and thinking I’m just beginning to feel the silent, cold, lonely suffering of a failed domain join when suddenly I am asked to press ctrl+alt+delete. Well…ok…I will…and slowly, as if just awoken from one of those inception dreams within a dream within another dream and having aged hundreds of years, I type just that – ctrl+alt+delete.

OK. You got me. Ctrl+Alt+Del does not work in a VMWare console, but you get the idea. Anyhoo, I next run off to the event logs, reading lots and lots of events that are entirely unhelpful and provide no clues. Usually this all ends up being some stupid error like providing a faulty domain admin password to the unattend file. Not too long ago we added code to our windows provisioning that adds a second NIC, and that introduced a few issues leading to this phenomenon until I got the sequence of adding the NIC, disabling it, configuring it and enabling it just right. But a couple weeks ago I ran into a new issue that really stumped me and that I was not able to solve by looking over my provisioning code or configuration data. This prompted me to research how to get to the bottom of what's going on when Windows is “Getting Ready.” In this post I will cover what I learned and hopefully reveal clues that can help others figure out how to get out of these installation hang-ups.

Overview of CenturyLink Cloud’s server provisioning sequence

It may help to point out roughly how we go about installing our windows boxes. Our methods may be different from yours but that should be irrelevant, and the techniques here for troubleshooting windows installation hangs and errors should be applicable to just about any unattended windows install. Our windows servers do run server 2012 R2, so older OSs may certainly be different.

We have been using chef for our server automation and, in particular, Chef-Metal for our provisioning process. We have written a custom Chef-Metal Vsphere driver that leverages the RBVMOMI ruby library to interact with the VMWare VSphere API. The driver does all the footwork of going to the right host, cloning an initial VM template, hooking up the right data stores, setting up initial networking, etc. It also calls into VMWare’s guest OS customization configuration, which will produce a windows unattend.xml file, also known as an answer file. The VMWare tools will inject this file into the setup, which windows will then use to drive its installation.

Our unattend file ends up being pretty simple. It performs a domain join and runs some scripts that tweak winrm so our provisioner can talk to the machine, install the Chef client and kick off the appropriate cookbooks and recipes, making the machine a “real boy” in the end. We run a mix of windows and linux, and everything goes through this same sequence. Of course the linux boxes don’t have unattend.xml files generated, but they do have their own OS customization process that configures initial networking.

If everything goes right, this takes about 5 minutes from the initial cloning until the machine can receive network traffic and begin its convergence to whatever role that machine will fill: web server, rabbitMQ server, CouchDB server, etc. It really doesn't matter if it's windows or linux; 5 minutes is roughly the norm. BTW: for most of our automation testing of linux machines we use Docker, which is nearly instantaneous, but we do not use that in production (yet).
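The “has it been more than a few minutes?” check can be sketched in a few lines of Ruby. This is not our actual provisioning code, just a minimal illustration: the wait_for_port name, the 300 second budget and the 5 second polling interval are all assumptions, and 5985 is simply the default WinRM HTTP port.

```ruby
require 'socket'

# Returns true as soon as a TCP connection to host:port succeeds, or false
# once roughly `timeout` seconds have passed without one.
# (Hypothetical helper; the name and the defaults are illustrative only.)
def wait_for_port(host, port, timeout: 300, interval: 5)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    begin
      # Socket.tcp closes the socket again when the block exits.
      Socket.tcp(host, port, connect_timeout: 3) { return true }
    rescue SystemCallError, SocketError
      # Refused, unreachable, timed out or unresolvable: not ready yet.
    end
    return false if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
    sleep interval
  end
end

# Example (placeholder address):
#   wait_for_port('10.0.0.25', 5985) or abort 'Still "Getting Ready". Time for Shift-F10.'
```

A false return here only says the port never opened. It says nothing about why, which is what the rest of this post is about.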

Breaking through Getting Ready

So what can one do when the windows install gets “stuck” in this Getting Ready state? Shift-F10 is your friend. I don’t think it matters what hypervisor infrastructure you are using or even if this is a bare metal install. We use VMWare but this should work on Hyper-V, VirtualBox, etc. Shift-F10 will immediately open a CMD.exe as administrator if typed during the unattended install phase.

From here you can start poring over logs and can even open regedit and other gui based tools if necessary, but this command prompt is usually enough to find out what is happening.

Where are the logs?

As I have stated above, I have personally not found the VMWare events or the machine event logs to be much help. Your mileage may vary but you are likely going to want to find the unattend activity log which is located, of course, in

c:\windows\panther\UnattendGC\setupact.log

I don’t know what Panther is. I like to think there was some MS windows team back in the early 90’s that called themselves the panther team pioneering the way forward in windows automation. I also like to think they used gang-like panther calls to communicate with one another when spotting each other in the cafeteria or the campus store. They may have worn special jackets with the wild face of a panther on the back and perhaps some had tattoos or some form of tribal scarification applied resembling panther like imagery. Who knows…I can only guess.

At least in my case this is where the answers were found. Certainly they will be here if the issue is related to the domain join which mine usually tend to be. If the authentication with the domain admin account is at fault, that should be clear here. For instance:

2014-09-06 22:30:10, Warning  [DJOIN.EXE] Unattended Join: NetJoinDomain attempt failed: 0x775, will retry in 10 seconds...
2014-09-06 22:30:20, Warning  [DJOIN.EXE] Unattended Join: NetJoinDomain attempt failed: 0x775, will retry in 10 seconds...
2014-09-06 22:30:30, Warning  [DJOIN.EXE] Unattended Join: NetJoinDomain attempt failed: 0x775, will retry in 10 seconds...
2014-09-06 22:30:40, Warning  [DJOIN.EXE] Unattended Join: NetJoinDomain attempt failed: 0x775, will retry in 10 seconds...
2014-09-06 22:30:51, Warning  [DJOIN.EXE] Unattended Join: NetJoinDomain attempt failed: 0x775, will retry in 10 seconds...
2014-09-06 22:31:01, Warning  [DJOIN.EXE] Unattended Join: NetJoinDomain attempt failed: 0x775, will retry in 10 seconds...
2014-09-06 22:31:11, Warning  [DJOIN.EXE] Unattended Join: NetJoinDomain attempt failed: 0x775, will retry in 10 seconds...
2014-09-06 22:31:22, Warning  [DJOIN.EXE] Unattended Join: NetJoinDomain attempt failed: 0x775, will retry in 10 seconds...
2014-09-06 22:31:32, Warning  [DJOIN.EXE] Unattended Join: NetJoinDomain attempt failed: 0x775, will retry in 10 seconds...
2014-09-06 22:31:42, Warning  [DJOIN.EXE] Unattended Join: NetJoinDomain attempt failed: 0x775, will retry in 10 seconds...

The key above is the hex error code. Given the nature of the hexadecimal numeric format, the root is often immediately obvious and if not a google search usually points you to a more specific message.
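If you can copy the log off the box, pulling the codes out is easy to script. Below is a minimal Ruby sketch that counts the DJOIN failures in a setupact.log-style text and translates the codes it knows. The summarize_join_failures name and the tiny lookup table are my own additions for illustration; the three entries are standard Win32 system error codes, and anything else still needs a manual lookup.

```ruby
# A few Win32 system error codes that tend to matter for domain joins.
# This table is an illustrative subset, not a complete list.
WIN32_ERRORS = {
  0x52E => 'ERROR_LOGON_FAILURE: the user name or password is incorrect',
  0x54B => 'ERROR_NO_SUCH_DOMAIN: the domain does not exist or could not be contacted',
  0x775 => 'ERROR_ACCOUNT_LOCKED_OUT: the referenced account is currently locked out'
}.freeze

# Matches the DJOIN.EXE retry lines shown in the log excerpt above.
JOIN_FAILURE = /\[DJOIN\.EXE\].*NetJoinDomain attempt failed: (0x[0-9a-fA-F]+)/

# Returns one summary string per distinct error code found in the log text.
def summarize_join_failures(log_text)
  counts = Hash.new(0)
  log_text.each_line do |line|
    match = JOIN_FAILURE.match(line)
    counts[match[1].hex] += 1 if match
  end
  counts.map do |code, count|
    meaning = WIN32_ERRORS.fetch(code, 'not in this table; look it up')
    format('0x%X (%d decimal) seen %d times: %s', code, code, count, meaning)
  end
end

sample = <<~LOG
  2014-09-06 22:30:10, Warning  [DJOIN.EXE] Unattended Join: NetJoinDomain attempt failed: 0x775, will retry in 10 seconds...
  2014-09-06 22:30:20, Warning  [DJOIN.EXE] Unattended Join: NetJoinDomain attempt failed: 0x775, will retry in 10 seconds...
LOG

puts summarize_join_failures(sample)
```

For the excerpt above, 0x775 is 1909 in decimal, the code for a locked out account, which fits a domain admin credential problem.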

In my recent stump scenario, the issue was that the domain controller could not be found. It turned out that although I was explicitly giving the domain controller IPs as the DNS servers to use, I was assigning the machine IP via DHCP and the DHCP server pointed to a different pair of DNS servers. For whatever reason, windows was choosing to use those servers and was therefore unable to resolve the domain name to its correct domain controllers. There are also many other details unrelated to the domain join to be found here.

Other log locations that may be helpful

If, for whatever reason, the unattend activity log does not have helpful information, there are a few more places to look. Check all files and subdirectories under:

c:\windows\panther
c:\windows\debug
c:\windows\temp

If you too are using the VMWare tools to drive the OS customization, you will find logs specific to VMWare’s work in c:\windows\temp. Many of the logs in the directories mentioned above may duplicate one another but some may have more granular detail than others.
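Since these directories hold a lot of overlapping files, it can help to copy them off the VM and let a script pick out the lines worth reading. Here is a rough Ruby sketch under a few assumptions: the flagged_lines name is made up, the directories have been copied somewhere Ruby can read them, and the pattern only recognizes the “timestamp, Warning/Error” prefix seen in the setupact.log excerpt above (other logs may use other formats).

```ruby
# Matches the "<timestamp>, Warning" / "<timestamp>, Error" prefix used by
# the setupact.log excerpt; this format is an assumption for other logs.
FLAGGED = /\A[\d\- :]+, +(Warning|Error)\b/

# Walks every *.log file under the given root directories and returns
# [path, line_number, text] for each line tagged Warning or Error.
def flagged_lines(roots)
  results = []
  roots.each do |root|
    Dir.glob(File.join(root, '**', '*.log')).sort.each do |path|
      File.foreach(path).with_index(1) do |line, number|
        text = line.scrub.chomp # scrub guards against stray invalid bytes
        results << [path, number, text] if text.match?(FLAGGED)
      end
    end
  end
  results
end

# Example, with placeholder paths for the copied directories:
#   flagged_lines(%w[copied/panther copied/debug copied/temp]).each do |path, number, text|
#     puts "#{path}:#{number}: #{text}"
#   end
```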

I certainly hope this helps. If it does and you so happen to spot me in a crowd, let out a wild panther shriek and I promise to return with the same.

Dear VMWare, please give us nice things to automate the things by Matt Wrock

I’ve spent this week at VMWorld 2014 in San Francisco and have been exposed to a fair amount of VMWare news and products. One of my goals for the week was to talk to someone on the VMWare team about questions I have regarding their SDKs as well as to provide feedback regarding my own experiences working with their APIs. Yesterday I met with Brian Graf (@vTagion) during a “Meet the Experts” session. Brian has taken over Alan Renouf’s (@alanrenouf) previous role as technical marketing engineer focusing on automation. He was gracious enough to hear out some of my venting on this topic and asked that I follow up with an email so that he could direct these issues to the right folks in his organization. This blog post is intended to fill the role of that email in an internet indexable format.

I’m pretty new to VMWare. Having recently come from Microsoft, Hyper-V has been my primary virtualization tool. One of the attractions to my new job at CenturyLink Cloud was the opportunity to work with the VMWare APIs. I have many colleagues and acquaintances in the automation space who work almost exclusively with VMWare. VMWare has dominated the virtualization market since the beginning and I have been wanting for some time to get a better glimpse into their products and see what the hype was all about.

So for the last several months, I have been working closely with VMWare tools and APIs nearly all day every day. One of my key focus points has been developing the automation pipeline that CenturyLink Cloud uses to fire up new data centers and to bring existing datacenters under more highly automated management. This not only involves automating the build out of every server but automating the automation of those servers. That’s where me and VMWare hang out. We have been leveraging Chef and its new machine resource framework Chef Metal. So I have been doing a fair amount of Ruby development writing and refining a VSphere driver that serves as our bridge between VMWare VMs and Chef. This also includes code that ties into our testing framework and allows us to spin up VMs as new automation is committed to source control and then automatically tested to ensure what was meant to be automated is automated.

Not only am I new to VMWare, I’m also new to Ruby. For years and years I was largely a C# developer, and over the last 5 years I have worked more and more with powershell. So I may not be the most qualified voice to speak on behalf of the x-plat automation community, but I am a voice nonetheless and I have had the pleasure of interacting with several “Rubyists” and hearing their thoughts and observations on working with the VMWare APIs.

The VSphere SDK is fraught with friction

From what I can tell, there is absolutely no controversy here. Talk to any developer who has had to work with VMWare APIs around provisioning VMs and they are excited to not merely mention but pontificate upon the unfriendliness of these SDKs. I’m not talking about PowerCLI here (more on that later); I am speaking of nearly all of the major programming language SDKs that sit on top of the VMWare SOAP service. One nice thing is that they all look nearly identical since they all sit on the same API, so my criticism is globally applicable. I have personally worked with the C# SDK and, mostly, the Ruby based rbvmomi library.

One of the biggest pain points is the verbosity required to wire up a command. For example, to clone a VM there are quite a few configuration classes that I have to instantiate and string together to feed into the CloneVM method. So CloneVM ends up being a very fat call, and if anything goes wrong or I have not provided just the right value in one of the configuration classes, I may or may not get an actionable error message back. If not, I may have to engage in quite a bit of trial and error to determine just where things went wrong. I think I understand the technical reasoning here and that this is attempting to keep the network chatter down, but frankly I don’t care. This is a solvable problem and it would be interesting to know if VMWare is looking to solve this.
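To give a feel for the shape of the problem without needing a vCenter, here is a small plain-Ruby sketch. It uses nested hashes as stand-ins for the SDK’s spec classes and checks them locally before the fat call would be made. The key names follow the vSphere API’s VirtualMachineCloneSpec and VirtualMachineRelocateSpec properties, but the missing_paths helper, the list of “expected” fields and the placeholder values are hypothetical. This is not how rbvmomi validates anything; it is just the kind of pre-flight check one ends up wanting.

```ruby
# What this hypothetical site expects to be set before calling CloneVM.
# Key names mirror vSphere's VirtualMachineCloneSpec (location, powerOn,
# template) and VirtualMachineRelocateSpec (pool, datastore).
EXPECTED_PATHS = [
  %i[location pool],
  %i[location datastore],
  %i[powerOn],
  %i[template]
].freeze

# Returns the dotted paths that are absent (nil) in the nested spec hash.
def missing_paths(spec, expected = EXPECTED_PATHS)
  expected.select do |path|
    value = path.reduce(spec) { |node, key| node.is_a?(Hash) ? node[key] : nil }
    value.nil?
  end.map { |path| path.join('.') }
end

spec = {
  location: { pool: 'resgroup-42' }, # placeholder id; the datastore was forgotten
  powerOn: false,
  template: false
}

puts missing_paths(spec).inspect
```

A message naming the missing field, in this case location.datastore, is the sort of actionable error I wish came back from the API itself.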

Open Source solutions

I am not asking that VMWare feed me a better API. Especially in the Ruby community there are many quality developers more than willing to help. In fact a look over at github will reveal that there has been significant community effort and assistance here. 

Just use Fog

One option that many pursue in the Ruby space is using an API called Fog. This is an API that abstracts several of the popular (and not so popular) cloud and hypervisor solutions into a single API. In theory this means that I can code my infrastructure against VSphere but also leverage EC2. The API aggregates many of the underlying components that one expects to find in any of these environments like Machines, Networks, Storage, etc.

Of course the reality is that simply moving from one implementation to another never “just works.” Also, the more you need to leverage the specific and unique strengths of one implementation, the more likely it is that you will eventually need to “go native” and abandon fog. This was my fate, and I also found the fog plugin model to be inherently flawed: when I pull down the Fog ruby gem, I have to pull down all of the built-in plugin implementations, making for a huge payload to download and install.

An apparent OSS cone of silence?

The core Ruby library, rbvmomi, the same library that Fog leverages, has fairly recently been transferred to VMWare ownership. I’d be inclined to say that is a good thing. However, it seems that VMWare is neither engaging with the developers trying to contribute to this library nor releasing new bits to rubygems.org. Further, VMWare has been silent amidst requests about when a release can be expected. The last release was in December, 2013.

Unfortunately, one pull request merged in January (8 months ago) has still not been released to RubyGems. This particular commit fixes a breaking change related to the popular Nokogiri library, and this means that many need to pin their rbvmomi version to 1.5.5, released in 2012. Remember 2012? This not only shows a lack of desire to collaborate with and support their community but has the even more damaging effect of discouraging developers from contributing. Why would you want to contribute to a dead repository?
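For anyone unfamiliar with pinning, it is a one-line constraint in a Gemfile. The Ruby sketch below shows that line in a comment and uses RubyGems’ own requirement classes to show what an exact pin accepts and rejects. The allowed? helper is made up for the example, and 1.6.0 is just an arbitrary later version number chosen for illustration, not a claim about any particular rbvmomi release.

```ruby
require 'rubygems'

# The Gemfile line itself would be:
#   gem 'rbvmomi', '1.5.5'
pin = Gem::Requirement.new('= 1.5.5')

# True when the given version string satisfies the requirement.
def allowed?(requirement, version)
  requirement.satisfied_by?(Gem::Version.new(version))
end

puts allowed?(pin, '1.5.5') # the pinned version
puts allowed?(pin, '1.6.0') # any other version (illustrative number)
```

The cost of an exact pin is that every fix released after it stays out of reach until someone remembers to lift it.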

I’m not saying that I believe VMWare has no interest in community collaboration, and I fully appreciate the herculean effort sometimes needed to get a legal department in a company the size of VMWare to authorize this kind of collaboration, but silence does not help and it is sending a bad message (well I guess no message really).

Engagement with the Ruby community is important

This community is likely small in comparison to VMWare’s bread and butter customers, but the world is changing. Ruby has an incredibly large stake in the configuration management space along with many other popular tools in the “devops” ecosystem. Puppet and Chef are both rooted in Ruby and, speaking as a Chef integrator, almost any integration between Chef and VSphere is written in Ruby. Another popular tool is Vagrant; any Vagrant plugin to support VSphere integration is going to leverage rbvmomi.

The industry is currently seeing a huge influx of involvement and interest in getting tools like these plugged into their infrastructures. I believe this will continue to become more popular, and those who use VMWare by choice rather than because they “have to” may eventually opt for solutions that are more friendly to interact with.

Scant documentation

All of this is made worse by the fact that the documentation for the VSphere API is unacceptably sparse. VMWare does maintain a site that serves to document all of the objects, methods and properties exposed by their SDK. However these consist of one line descriptions with no in depth details or examples. Yes there is an active VMWare community but the resources they produce often do not suffice and may be difficult to find for more obscure issues.

PowerCLI is cool but it does not help me

The sense that I often get is that VMWare is trying to answer these shortcomings with its PowerCLI API – a powershell based API that I do think is awesome. PowerCLI has succeeded in turning many of the operations that take several lines of ruby or C# into one liners, and it comes with great command line help. In stark contrast to the other language SDKs, almost everyone who is able to use PowerCLI in their automation pipeline raves about it. However, this is not an answer, especially if you run either a linux or a mixed linux/windows shop (i.e. most people).

At CenturyLink Cloud we run both windows and linux. We find that it is easiest to run all of our automation from linux as the control center of our pipeline. It’s just not practical to provision a set of windows nodes to act as a gateway into VSphere. This means PowerCLI is only available to me for one-off scripts, which is exactly what we need to automate away from.

While I’m at it, one small nit on PowerCLI. Its implementation as a powershell snapin as opposed to a module can make it more difficult to plug into existing powershell infrastructure. It’s a powershell v1 technology (we are coming up on v5) and while somewhat small, it is one of those things that can give the impression of an amateur effort. That said, PowerCLI is by no means amateur.

What am I suggesting?

First, let us know that you hear this and that you have a desire to move forward to help my community integrate with your technology. Respond to people commenting on your github repository. If you lack the legal approval to release contributions, designate one or more community members and transfer ownership to them, allowing them to coordinate PRs and releases. This library is not contributing to VMWare IP; it’s just a convenience layer sitting on top of your web services, so this seems like a reasonable request. Let me also state what I don’t want. I do not want fancy GUIs or anything that waits for me to point and click. I’m working to automate every dark corner of our infrastructure so that we can survive.

Finally, I want to clarify that this post is not intended to insult anyone. My hope is that it serves as another data point to help you understand your customers, one that you can consider as you plan future strategy. I’m sure there are many employees at VMWare that share my passion and want to make integration with other automation tools a better story. To those employees, I say fight fight fight and do not become complacent and you are really super cool and awesome for doing that.

Running Ubuntu with DHCP on Hyper-V over WIFI by Matt Wrock

Our CenturyLink Cloud Chef workstation served from Vagrant on Hyper-V. Credit for the ascii art goes to Tim Shakarian (@tsh4k).

A few months back when I began doing a bunch of linux automation and was waiting for my company ordered machine to arrive, I was mostly working from my personal windows laptop and was fairly invested in Hyper-V as my hypervisor of choice. Both at work and at home I work off of a wireless connection. This has not been a problem running windows guests especially since windows 8.1. There were a few rough edges on windows 8 but those seem to have been smoothed over in 8.1.

So my first go of an Ubuntu 12.04 guest installed just fine and I could interact with it via a hyper-v console but I could not SSH to the guest. It was not being assigned an IP accessible from the outside.

I had difficulty finding good information about this on the net. This is probably because the scenario is not very popular. This issue does not occur if you are on a wired connection or if your guest is using a statically assigned IP. Anyhow, I thought I’d blog about the solution for the other five people who run into this.

Is only Ubuntu affected or are other linux distributions affected as well?

I’m not sure but it is very possible. Personally I ran into this on Ubuntu 12.04 and 14.04. I have found some reports that seem to indicate that this is due to some fundamental network configuration changes made to Ubuntu in v12. If you are experiencing similar symptoms under other distros or earlier Ubuntu versions, the solution reported here is certainly worth a shot and please comment if you can.

Why run linux on Hyper-V?

That’s a very fair question. It does seem that most folks running linux VMs on windows tend to use Virtual Box as their hypervisor. I’ve run Virtual Box quite a bit back on windows 7 and it worked great. Since Windows 8, hyper-v comes “in the box” on the professional and enterprise SKUs. I had become familiar with using hyper-v on windows server SKUs, liked it and also really liked the hyper-v powershell module that ships with powershell version 3 and above.

One thing to be aware of is that you cannot run Virtual Box and hyper-v concurrently on the same machine. However, there is a workaround if you create a separate boot record for a “sans Hyper-V” setup. Of course this means a reboot if you want to switch. More importantly though, I have found that if you later uninstall Virtual Box, your hyper-v install can become corrupted. This has happened to me twice. The first incident required a repave of my machine and the second I recovered from by restoring to a previous machine image. I don’t know…maybe I’m doing something wrong but that was my experience and hopefully your mileage will vary. Since I use hyper-v for some side projects, I prefer to keep Virtual Box off of my personal machine.

Use an internal virtual switch and enable internet connection sharing to its adapter

This, in short, is the solution. In other words, do not use an external switch. When you are on wifi, hyper-v will create a bridge between your wifi adapter and the adapter it creates for the external switch. I won’t get into the details (because I do not know them), but the Ubuntu guest cannot obtain an IP from DHCP under this setup.

So if you do not have one already, create an internal virtual switch from the Hyper-V management interface.

You can keep your external one if you use it for other guests, they can coexist just fine. Configure your linux guest’s network adapter to use the internal switch.

Next go to the Network and Sharing Center and select Change adapter settings. Open the properties of the adapter supplying your internet. This will likely be your wifi adapter. However, if you already have and plan to keep an external switch, you will notice that the wifi adapter is bridged to a separate adapter named after your external switch. If that’s the case, that’s the adapter whose properties you want to select.

Once in the properties pane, select the “sharing” tab and check: Allow other network users to connect through this computer’s network connection.

If you have multiple adapters that this adapter could possibly share with, there will be a drop down option to choose. You can only share with one. If you only have one (in this case the adapter assigned to the internal switch) then there will be no drop down.

That’s it. You may need to restart the networking service but after doing so, it should get an IP and you can SSH to the guest using that.

The only residual fallout from this setup, and you may experience this regardless, is that sometimes moving to a different network may require resetting one or more of your adapters, for example if you transport your laptop from a work network to a home network. Again, you may experience this even without this setup or you may not experience it at all. It’s been rather hit and miss for me but I seem to bump into this more often under this setup.

Peering into the future of windows automation testing with Chef, Vagrant and Test-Kitchen – Look mom, no SSH! by Matt Wrock

tablet_converge

Update: see this post for the latest update on getting up and running with Test-Kitchen on windows.

Linux automation testing has been supported for a while now using many great tools like chef, puppet, Test-Kitchen, ServerSpec, MiniTest, Bats, Vagrant, etc. If you were willing to install an SSH server on Windows, you could get most of these tools to work but if you wanted to stay “native” you were on your own.

Pictured above: Testing node convergence on an 8 inch tablet.

I’m not at all morally opposed to installing SSH on windows. I love SSH. We spoon regularly. But while SSH is “just there” on linux, it incurs an extra install step for windows that must either be done manually or included in initial provisioning or image creation. Also, for some windows-only shops, the unfamiliarity of SSH may add a layer of unwanted friction in an automation ecosystem where windows is often an afterthought.

Well, recent efforts to make Windows testing a first class experience are beginning to take shape. It’s still early days and some of the bits are not yet “officially” released but are available to use by pulling the latest bits from source control. I know…I know…to many that will still spell “friction” in bold. However, I want to share that one can today test windows machine builds via winrm with no SSH server installed. For those who prefer to wait until everything is fully baked, I also want to offer a glimpse of what is to come and let you know that the wheels are in motion, so please keep abreast of these developments.

Note: I presented much of this material and several Boxstarter demos to the Philadelphia PowerShell User Group last week, the video is available here.

It’s not automated until it is tested

I work for CenturyLink Cloud and infrastructure automation is front and center to our business. Like many shops, we have a mixed environment, and central to our principles is the belief that testing our automation is just as important as building our automation. In fact they are not even two separate concepts. Untested automation is not finished being built. So I am going to share with you here how we test our Windows server infrastructure along with some other bits I have been working with on the side.

Vagrant

If you have not heard of Vagrant, just stop reading right now and mosey on over to http://vagrantup.com. Vagrant is a hypervisor agnostic way of spinning up and provisioning servers that is particularly suited for developing and testing. It completely abstracts both the VM infrastructure as well as many possible provisioning systems (chef, puppet, plain shell scripts, docker and many many more) so that one can provision and share the same machine among a team using different platforms.

To illustrate the usefulness here, where I work we have a diverse team where some prefer Macs, others work on Windows and others (like myself) run a Linux desktop. We use Chef to automate our infrastructure, and anyone who needs to create or edit chef artifacts needs all sorts of dependencies installed with specific versions in order to be successful. Vagrant plays a key role here. Anyone can download our Ubuntu 12.04 base image via VirtualBox, VMWare or Hyper-V and then use its Chef provisioner plugin to build that image to a state that mirrors the one used by the entire team. All this is done by including a small file of metadata that serves as a pointer to where the base images can be found as well as the chef recipes. If this sounds interesting, again I refer you to Vagrant’s documentation for the details. What I want to point out here is its windows support.

Added support for WinRM and Hyper-V

Until fairly recently, Vagrant only supported SSH as a transport mechanism to provision a VM. It also lacked official Hyper-V support as a VM provider. This changed in version 1.6 with a WinRM “Communicator” and a Hyper-V provider plugin included in the box. While I don’t really use Hyper-V at work, I have some windows based personal projects at home and I prefer to use Hyper-V. So I quickly tested out this new plugin and was happy to see it available. There are still some kinks in the current version but work is underway to improve the experience. I’m trying to personally contribute fixes for the issues that are blocking my own work, and a couple have been accepted into Vagrant master. Overall that has been a lot of fun. Here are the issues that have come up for me:

  • Only .vhdx image files are supported and .vhd files cannot be imported. I hit a wall with this when trying to use the .vhd files freely available for testing here on Technet. I have since added a patch which has been accepted to fix this.
  • Generation 2 Hyper-V VMs are imported as Generation 1 VMs and fail to boot. Oddly, most .vhdx images tend to be generation 2. My PR for this issue was just accepted yesterday.
  • Synced folders over SMB (this is the norm for a windows host/windows guest setup) fail. I’m hoping my PR for this issue is accepted.

If these same issues become blockers for you, the first two can be immediately fixed by pulling the latest copy of Vagrant’s master branch and copying the lib and plugins directories onto the installed version. You are also welcome to pull my smb_sync branch, which includes all of the fixes:

git clone -b smb_sync https://github.com/mwrock/vagrant
copy-item -path vagrant\lib `
  C:\HashiCorp\Vagrant\embedded\gems\gems\vagrant-1.6.3 `
  -recurse -force
copy-item -path vagrant\plugins `
  C:\HashiCorp\Vagrant\embedded\gems\gems\vagrant-1.6.3 `
  -recurse -force

Having worked with Vagrant for the past few months, I’ve been finding myself wishing there was a remote powershell equivalent to the vagrant ssh command, which drops you into an ssh session on the guest box. So today I banged out a first draft of a vagrant ps command that does just that and will submit it once it is more polished. You can expect it to look like this:

C:\dev\vagrant\win8.1x64> vagrant ps
default: Creating powershell session to 192.168.1.14:5985
default: Username: vagrant
[192.168.1.14]: PS C:\Users\vagrant\Documents>

A base box for testing

I’ve been playing with creating windows vagrant boxes. Unfortunately for Hyper-V, the vagrant package command is not yet implemented so I have to “manually” create the base box. Perhaps I’ll work on an implementation for my next contribution. My Windows 2012R2 Hyper-V box requires all the above fixes to install without error. You could use this Vagrantfile to test:

# -*- mode: ruby -*-
# vi: set ft=ruby :

VAGRANTFILE_API_VERSION = "2"
Vagrant.configure(VAGRANTFILE_API_VERSION) do |config| 
  config.vm.box = "mwrock/Windows2012R2"
  config.vm.box_url = "https://vagrantcloud.com/mwrock/Windows2012R2/version/1/provider/hyperv.box"
  # Change "." below with your own folder you would like to sync
  config.vm.synced_folder ".", "/chocolateypackages", disabled: true
  config.vm.guest = :windows 
  config.vm.communicator = "winrm"
  config.winrm.username = "administrator"
  config.winrm.password = "Pass@word1"
end

Note here that you need to specify :windows as the guest. Vagrant will not infer that on its own, nor will it assume you are using winrm just because the guest is windows, so make sure to set the communicator in your boxes as well if you intend to use winrm.
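
When a winrm based guest does not respond, it helps to know whether the port is reachable at all. Here is a minimal sketch using Ruby’s standard socket library. The port_open? helper is a made-up name for illustration, and 5985 is the WinRM HTTP port shown in the vagrant ps output above.

```ruby
require 'socket'

# Returns true when a TCP connection to host:port succeeds within `timeout` seconds.
# Any failure (refused, unreachable, timed out, bad host name) counts as "not open".
def port_open?(host, port, timeout = 3)
  Socket.tcp(host, port, connect_timeout: timeout) { true }
rescue StandardError
  false
end

# Example (address taken from the vagrant ps output above):
#   port_open?('192.168.1.14', 5985)
```

An open port only tells you that something is listening. It says nothing about whether the credentials or the winrm configuration are right.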

Test-Kitchen

Test-Kitchen is a testing framework most often used for testing Chef recipes (hence – kitchen). However, I understand it is compatible with Puppet as well. Like many tools in this space, such as Vagrant above, it is highly plugin driven. Test-Kitchen by itself doesn’t really do much. What Test-Kitchen brings to the table (Ha Ha! I said table. get it?) is the ability to bring together a configuration management system like Chef or Puppet, a myriad of different cloud and hypervisor platforms and several testing frameworks. In the end it will spin up a machine, run your provisioning code and then run your tests. Further, you can integrate this into your builds, providing quick feedback on the quality of your automation upon committing changes.
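
To make that sequence concrete, here is a toy model of it in plain Ruby. It is only a sketch: the phases are named after the kitchen create, converge and verify subcommands, but the lambdas and the run_phases helper are stand-ins I made up, not Test-Kitchen’s actual classes or API.

```ruby
# Toy model of a spin up / provision / test run.
PHASES = %i[create converge verify].freeze

# Runs each phase in order and stops at the first one that fails.
# Returns the list of phases that completed successfully.
def run_phases(actions)
  completed = []
  PHASES.each do |phase|
    break unless actions.fetch(phase).call
    completed << phase
  end
  completed
end

actions = {
  create:   -> { puts 'spinning up the machine'; true },
  converge: -> { puts 'running provisioning code'; true },
  verify:   -> { puts 'running tests'; true }
}

run_phases(actions) # => [:create, :converge, :verify]
```

If provisioning fails, the tests never run, which is the behavior you want from a build step.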

“Official” support for windows guests coming soon

Currently the “official” release of Test-Kitchen does not support winrm and must go through SSH on windows. However, Salim Afiune (@afiune), a developer with Chef, has been working on adding winrm support. I have plumbed this into our Windows testing at CenturyLink Cloud and have also used it in developing my Boxstarter cookbook, which allows one to embed boxstarter based powershell in a recipe and provides all the reboot resiliency and windows config functions available in Boxstarter core. Salim has also contributed corresponding changes to the vagrant and EC2 Test-Kitchen drivers.

At CenturyLink, we use vmware and a customized vsphere driver to test with Test-Kitchen. It was trivial to add support for Salim’s branch. With the Boxstarter cookbook, I use his vagrant plugin without issue. According to this Chef blog post, all of this windows work will likely be pulled into the next release of Test-Kitchen.

But I just can’t wait. I must try this today!

So for those interested in “kicking the tires” today, here is how you can install all the bits needed:

cinst chefdk
cinst vagrant

git clone -b transport https://github.com/afiune/test-kitchen
git clone -b transport https://github.com/mwrock/kitchen-vagrant 

copy-item test-kitchen\lib `
  C:\opscode\chefdk\embedded\apps\test-kitchen `
  -recurse -force
copy-item test-kitchen\support `
  C:\opscode\chefdk\embedded\apps\test-kitchen `
  -recurse -force
copy-item -Path kitchen-vagrant\lib `
  C:\opscode\chefdk\embedded\lib\ruby\gems\2.0.0\gems\kitchen-vagrant-0.15.0 `
  -recurse -force

cd test-kitchen
gem build .\test-kitchen.gemspec
chef gem install test-kitchen-1.3.0.gem

This will install the Chef Development Kit and vagrant via Chocolatey, so I’m assuming you have Chocolatey installed. Otherwise you can download these from their respective download pages here and here. It then clones the winrm based test-kitchen and kitchen-vagrant projects, copies them over the current bits, and finally builds and installs the test-kitchen gem.

Note that my instructions here assume you are testing on Windows. However, the winrm functionality is most certainly capable of running on Linux, which is what I do at work. If you were doing this on Linux, I’d suggest running bundle install and bundle exec instead of copying over the chef directories. However, that approach has caused me too many problems on Windows to recommend to others, while purely copying the bits has not caused me any.
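
For the Linux route, a Gemfile along these lines is one way to let Bundler pull the same two branches cloned above. Treat it as a sketch rather than a tested setup. The repository URLs and branch names are the ones used in this post.

```ruby
# Gemfile (sketch): pull the winrm enabled branches straight from git
source 'https://rubygems.org'

gem 'test-kitchen', git: 'https://github.com/afiune/test-kitchen', branch: 'transport'
gem 'kitchen-vagrant', git: 'https://github.com/mwrock/kitchen-vagrant', branch: 'transport'
```

With that in place, bundle install followed by bundle exec kitchen test would run against those branches.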

Hyper-V

Now you can pull down the boxstarter cookbook to test from https://github.com/mwrock/boxstarter-cookbook. If you run Hyper-V, you will want to install my vagrant fixes according to the instructions above, since the box inside the boxstarter cookbook’s kitchen config is on a vhd file. You can then simply navigate to the boxstarter cookbook directory and run:

kitchen test

This will build a win 2012 R2 box and install and test a very simple cookbook via Test-Kitchen.

VirtualBox

If you run VirtualBox, you will need to make a couple changes. Replace the VagrantfileWinrm.erb content with this:

# -*- mode: ruby -*-
# vi: set ft=ruby :

VAGRANTFILE_API_VERSION = "2"
Vagrant.configure(VAGRANTFILE_API_VERSION) do |config| 
  config.vm.box = "<%= config[:box] %>"
  config.vm.box_url = "<%= config[:box_url] %>"
  config.vm.guest = :windows 
  config.winrm.username = "vagrant"
  config.winrm.password = "vagrant"
  config.winrm.port = 55985
end

You would also replace the .kitchen.yml content with:

---
driver:
  name: vagrant

provisioner:
  name: chef_zero

platforms:
  - name: windows-81
    transport:
      name: winrm
      max_threads: 1
    driver:
      port: 55985
      username: vagrant
      password: vagrant
      guest: :windows
      box: mwrock/Windows8.1-amd64
      vagrantfile_erb: VagrantfileWinrm.erb
      box_url: https://wrock.blob.core.windows.net/vhds/win8.1-vbox-amd64.box

suites:
  - name: default
    run_list:
      - recipe[boxstarter_test::simple]
    attributes:

The test included in the boxstarter cookbook is not very interesting but illustrates that you can indeed run kitchen tests against windows machines with no ssh installed.
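
A quick way to catch a malformed .kitchen.yml before running kitchen is to load it with Ruby’s YAML library. The following is a minimal sketch that inlines a trimmed copy of the platform entry so it is self-contained. The helper names and the list of “required” driver keys are my own assumptions for illustration, not anything Test-Kitchen defines.

```ruby
require 'yaml'

# Trimmed copy of the platform entry from the .kitchen.yml in this post.
KITCHEN_YAML = <<~DOC
  ---
  platforms:
    - name: windows-81
      transport:
        name: winrm
        max_threads: 1
      driver:
        port: 55985
        guest: :windows
        box: mwrock/Windows8.1-amd64
        vagrantfile_erb: VagrantfileWinrm.erb
DOC

# Symbol is permitted because `guest: :windows` loads as a Ruby symbol.
def load_kitchen_config(text)
  YAML.safe_load(text, permitted_classes: [Symbol])
end

# Lists the keys from `required` (a hypothetical checklist) that a
# platform's driver section does not define.
def missing_driver_keys(platform, required = %w[box vagrantfile_erb port])
  driver = platform.fetch('driver', {})
  required.reject { |key| driver.key?(key) }
end

platform = load_kitchen_config(KITCHEN_YAML)['platforms'].first
puts "transport: #{platform['transport']['name']}"
puts "missing driver keys: #{missing_driver_keys(platform).inspect}"
```

A check like this is cheap to run before kitchen test and points at the offending key rather than failing somewhere deep inside a driver.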

Looking at a more interesting ServerSpec test

For those reading who might want to see what a more interesting test would look like, let’s take a look at this Chef recipe:

include_recipe 'boxstarter::default'

boxstarter "boxstarter run" do
  password 'Pass@word1'
  code <<-EOH
    Update-ExecutionPolicy Unrestricted
    Set-WindowsExplorerOptions -EnableShowHiddenFilesFoldersDrives `
      -EnableShowProtectedOSFiles -EnableShowFileExtensions 
    Enable-RemoteDesktop
    cinst console2
    cinst IIS-WebServerRole -source windowsfeatures

    #Install-WindowsUpdate -acceptEula
  EOH
end

This is a sample recipe I include with the Boxstarter cookbook but I have commented out the call that runs windows updates. This recipe will run the included Boxstarter resource and perform the following:

  • Update the powershell execution policy

  • Adjust the windows explorer settings

  • Enable remote desktop

  • Install the console2 command line console

  • Install IIS

Here is a test file that will check most of the items changed by the recipe:

require 'serverspec'

include Serverspec::Helper::Cmd
include Serverspec::Helper::Windows 

describe file('C:\\programdata\\chocolatey\\bin\\console.exe') do
  it { should be_file }
end

describe windows_feature('Web-Server') do
  it { should be_installed.by("powershell") }
end

describe windows_registry_key(
  'HKEY_CURRENT_USER\Software\Microsoft\Windows\CurrentVersion\Explorer\advanced') do
  it { should have_property_value('Hidden', :type_dword, '1') }
end

describe command('Get-ExecutionPolicy') do
  it { should return_stdout 'Unrestricted' }
end

Serverspec provides a nice Ruby DSL for testing the state of a server. Although the test is pure ruby code, in most cases you don’t really need to know ruby. Familiarity with the cut and paste features will be very helpful so please review those as necessary.

The documentation on the ServerSpec.org page does a decent job of describing the different resources that can be tested. Above are just a few: a file resource, a windows feature resource, a windows registry resource and a command resource that you can use to issue any powershell necessary to test your server.

As we do at CenturyLink, all of these tests can be fed into a Continuous Integration server (Jenkins, TeamCity, TFS, etc.) to give your team speedy feedback on the state of your automation codebase.
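
What a CI server typically keys off is the exit code of the command it runs. Here is a minimal sketch of a wrapper step, assuming kitchen is on the PATH of the build agent. The run_step helper is a hypothetical name, not part of Test-Kitchen or any CI product.

```ruby
# Runs a command and returns its exit status so the caller can pass it on.
# A process killed by a signal has no exit status, so that case maps to 1.
def run_step(*command)
  system(*command)
  $?.exitstatus || 1
end

# Example build step: fail the build whenever the kitchen run fails.
#   exit run_step('kitchen', 'test')
```

Any non-zero status then marks the build as failed, which is the quick feedback loop described above.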

I hope you find this helpful and I look forward to these features making it into the official vagrant and test-kitchen installs soon.

Leaving Microsoft and Building a ‘DataCenterStarter’ for CenturyLink Cloud by Matt Wrock

As of last week I am no longer working at Microsoft. I worked at Microsoft for the last four and a half years and it was an amazing experience where I learned a lot from many very smart people. I am now a Software Engineer focusing on data center automation at CenturyLink Cloud.

What the heck did I do at Microsoft?

I came to Microsoft and the Pacific Northwest from Southern California where I had spent the previous 9 years working for an online advertising company, starting as a front line web developer and eventually becoming VP of technology. I reached a point where I wanted a change and a return to hands-on engineering. Having no formal computer engineering training and being almost exclusively exposed to “startup” shops, I really wanted to work for a major technology company to witness how a well-established organization runs things. Well, I definitely got what I was looking for and received exposure to some amazing people and practices.

At Microsoft, I started working on the Visual Studio Gallery and several other similar sites like the Technet script gallery and the MSDN Code Sample gallery as well as some of the “goo” that provided a unified experience for the Microsoft Forums, the galleries, search and profile pages on MSDN and Technet. Some of the greatest things I walked away with here were an ingrained devotion to the practice of Test Driven Development and a great appreciation for not only the consumption of but also participation in Open Source Software projects.

In my free time I created an open source library that significantly improved our page load performance across the above sites as well as the msdn/technet blog and wiki platform. Later I worked on environment setup and deployment automation for these properties which inspired Boxstarter.org. The last 2 years were spent in the Visual Studio Cloud Services org within DevDiv where I worked on “Feature Flags” allowing us to deploy “hidden” features while they were in the middle of development, the back end for the new Charting features inside TFS Work Item Tracking and most recently deployment automation for Visual Studio Online.

A new chapter

Over the past couple of years, my “side project” Boxstarter has consumed a lot of my passion and has led me to develop some relationships in the DevOps community and learn of many disciplines and technologies that fascinate me. I love and have become somewhat consumed by automation.

So a little over a month ago I received a Twitter DM from a previous colleague, Tim Shakarian, asking me if I would be interested in building a “DataCenterStarter” for CenturyLink’s recent cloud acquisition at Tier3. I read this having just returned from dinner with my friend Rob Reynolds and some other guys from Puppet Labs and Peter Pouliot who heads up Microsoft community development of Hyper-V integration in OpenStack. Everyone present shared the same passions for automation and I was especially inspired hearing Peter’s automation stories from his Novell days and recent work with organizations like CERN. So with these conversations fresh in my mind, I wondered what are these DataCenters Tim speaks of?

I really was not “on the market” looking to move from Microsoft. In fact my role had recently changed and there were some great opportunities ahead to bring my organization to an exciting new level of engineering efficiency, but I thought it would be foolish not to at least listen to what my friend Tim had to say. Well six weeks later here I am and I am totally excited to be working on data center automation for CenturyLink Cloud. I feel like a kid in a candy store beginning work on projects that give me the opportunity to build out automation at vast scale and help my team deliver an awesome cloud solution to our customers.

It’s not easy leaving behind an organization like Microsoft. There are a lot of great people there and I will truly miss my free MSDN Ultimate subscription where I managed to consistently milk about 148 of my 150 dollar monthly spending allowance on Azure services (the same granted to any MSDN Ultimate subscriber). I think it may be a couple years before I fully process my experiences at Microsoft so please be on the lookout for my forthcoming graphic novel series, Razzle Dragon, that portrays in Japanese Manga style my stint as a Microsoft software engineer.