Calling knife commands from ruby without shelling out by Matt Wrock


When I started out writing Chef cookbooks, occasionally I'd want to run a knife command from my recipe, library, LWRP or my own gem or knife plugin and I'd typically just use the ruby system method, which creates a subshell to run a command. This never felt quite right. Composing a potentially complex command by building a large string is cumbersome and not particularly readable. Then there is the shelling out to a subshell, which is inefficient. So after doing some cursory research I was surprised to find few instructions or examples on how to use straight ruby to call knife commands. Maybe my google-fu just wasn't up to snuff.

So here I'll run through the basics of how to compose a knife command in ruby, feeding it input and even capturing output and errors.

A simple knife example from ruby

We'll start with a complete but simple example of what a knife call in ruby looks like and then we can dissect it.

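# load knife and the ssh plugin class (require paths assume the standard chef gem layout)
require 'chef/knife'
require 'chef/knife/ssh'
require 'stringio'

# load any dependencies declared in knife plugin
Chef::Knife::Ssh.load_deps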

# instantiate command
knife = Chef::Knife::Ssh.new

# pass in switches
knife.config[:attribute] = 'ipaddress'
knife.config[:ssh_user] = "root"
knife.config[:ssh_password_ng] = "password"
knife.config[:config_file] = Chef::Config[:config_file]

# pass in args
knife.name_args = ["name:my_node", "chef-client"]

# setup output capture
stdout = StringIO.new
stderr = StringIO.new
knife.ui = Chef::Knife::UI.new(stdout, stderr, STDIN, {})

# run the command
knife.run

puts "Output: #{stdout.string}"
puts "Errors: #{stderr.string}"

Setup and create command

This is very straightforward. Knife plugins may optionally define a deps method which is intended to include any require statements needed to load the dependencies of the command. Not all plugins implement this, but you should always call load_deps (which will call deps) just in case they do.

Finally, new up the plugin class. The class name will always reflect the command name where each command name token is capitalized in the class name. So knife cookbook list is CookbookList.
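The naming rule above can be sketched in a few lines of plain ruby. This is only an illustration of the convention as described here; knife_class_name is a hypothetical helper, not part of Chef.

```ruby
# Hypothetical helper (not part of Chef): builds the plugin class name
# by capitalizing each token of the knife command and joining them.
def knife_class_name(command)
  command.split.map(&:capitalize).join
end

puts knife_class_name('cookbook list')
puts knife_class_name('cookbook upload')
puts knife_class_name('ssh')
```

With Chef loaded, something like Chef::Knife.const_get(knife_class_name('cookbook list')) should then resolve the class, assuming the plugin file has already been required.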

Command input

Knife commands typically take input via two forms:

Normal command line arguments

For instance:

knife cookbook upload my_cookbook

where my_cookbook is an argument to CookbookUpload.

These inputs are passed to the knife command in a simple array via the name_args method ordered just as they would be on the command line.

knife.name_args = ["name:my_node", "chef-client"]

Using our knife ssh example, here we are passing the search query and ssh command.

Command options

These include any command switches defined by either the plugin itself or its knife base classes so it can always include all the standard knife options.

These are passed via the config hash:

knife.config[:attribute] = 'ipaddress'
knife.config[:ssh_user] = "root"
knife.config[:ssh_password_ng] = "password"
knife.config[:config_file] = Chef::Config[:config_file]

Note that the hash keys are usually but not necessarily the same as the name of the option switch so you may need to review the plugin source code for these.

Capturing output and errors

By default, knife commands send output and errors to the STDOUT and STDERR streams using Knife::UI. You can intercept these by providing an alternate UI instance as we are doing here:

stdout = StringIO.new
stderr = StringIO.new
knife.ui = Chef::Knife::UI.new(stdout, stderr, STDIN, {})

Now instead of logging to STDOUT and STDERR, the command will send this text to our own stdout and stderr StringIO instances. So after we run the command we can extract any output from these instances.

For example:

puts "Output: #{stdout.string}"
puts "Errors: #{stderr.string}"

Running the command

This couldn't be simpler. You just call:

knife.run
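If you find yourself repeating these steps for several commands, they can be folded into one small helper. The sketch below is my own arrangement of the steps from this post, not something Chef ships. The capture_knife name and the ui_factory parameter are assumptions made so the helper does not depend on Chef being loaded.

```ruby
require 'stringio'

# Hypothetical helper: runs a knife-style command class and returns
# whatever it wrote to its UI.
#
# ui_factory builds the UI object from the two capture streams. With Chef
# loaded it could be:
#   ->(out, err) { Chef::Knife::UI.new(out, err, STDIN, {}) }
def capture_knife(command_class, name_args, config, ui_factory)
  # load plugin dependencies, then instantiate the command
  command_class.load_deps
  knife = command_class.new

  # pass in switches and args
  config.each { |key, value| knife.config[key] = value }
  knife.name_args = name_args

  # set up output capture
  stdout = StringIO.new
  stderr = StringIO.new
  knife.ui = ui_factory.call(stdout, stderr)

  knife.run
  { stdout: stdout.string, stderr: stderr.string }
end
```

A call would then look roughly like capture_knife(Chef::Knife::Ssh, ["name:my_node", "chef-client"], { ssh_user: "root" }, ui_factory), with the returned hash holding both streams as strings.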

Hope this is helpful.

Help test the future of Windows Infrastructure Testing on Test-Kitchen by Matt Wrock

I've posted about using Test-Kitchen on Windows a couple times. See this post and this one too. Both of these posts include rather fragile instructions on how to prepare your environment in order to make this possible. Writing them feels like handing out scissors and then encouraging people to run on newly oiled floors generously sprinkled with legos while transporting said scissors. Then, if they are lucky, their windows nodes will converge and kick off tests ready for their review once they reach "the other side." It's dangerous. It's exciting. Pain may be an active ingredient.

Well, development has been ramping up in this effort. Some of the outside forks have now been merged into a dedicated branch of the official Test-Kitchen repo, windows-guest-support, and it's been rebased with the latest master branch of Test-Kitchen. A group of folks from within and outside of Chef, including test-kitchen creator Fletcher Nichol as well as Salim Afiune who got the ball rolling on windows compatibility, meets regularly to discuss progress and bugs. I'm honored to be involved and contributed the winrm based file copying logic (future blog post pending - my wounds have not yet fully healed).

I can't wait until the day that no special instructions are required, and we think that day is not far off, but here is an update on how to get up and running with the latest bits. A lot has changed since my last post but I think it's much simpler now.

What to install and where to get it

First clone the windows-guest-support branch of the Test-Kitchen repo:

git clone -b windows-guest-support https://github.com/test-kitchen/test-kitchen 

Build and install the gem. If you are running the chefdk on either windows or linux, you can use the rake task dk_install which will do the build and install and additionally overlay the bits on top of the omnibussed Test-Kitchen.

rake dk_install

This may not be compatible with all drivers. I use it regularly with:

Let's run through the vagrant setup.

Clone the windows-guest-support branch of kitchen-vagrant:

git clone -b windows-guest-support https://github.com/test-kitchen/kitchen-vagrant

Build and install the gem:

rake install

You should now be good to go:

C:\> kitchen -v
Test Kitchen version 1.3.2.dev

Configuration

There is just one thing that needs changing in your .kitchen.yml configuration file. As an example, here is my .kitchen.yml for a recent PR of mine adding windows support to the chef-minecraft cookbook:

driver_plugin: vagrant

provisioner:
  name: chef_zero

platforms:
- name: windows-2012R2
  driver_config:
    box_url: https://wrock.blob.core.windows.net/vhds/vbox2012r2.box
  transport:
    name: winrm

- name: ubuntu-12.04
  run_list:
  - recipe[ubuntu]
  driver_config:
    box: hashicorp/precise64

suites:
- name: default
  run_list:
  - recipe[minitest-handler]
  - recipe[minecraft]
  attributes:
    minecraft:
      accept_eula: true

The windows box hosted in my Azure storage is an evaluation copy due to expire in a couple months. I try to rebuild it before it expires. Note the transport setting here:

transport:
  name: winrm

This tells test-kitchen to use the winrm transport instead of the default ssh transport. Furthermore, you will notice that a

kitchen list

produces slightly modified output:

Instance                Driver   Provisioner  Transport  Last Action
default-windows-2012R2  Vagrant  ChefZero     Winrm      <Not Created>
default-ubuntu-1204     Vagrant  ChefZero     Ssh        <Not Created>

Note the new transport column.
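To make the relationship between the config and that listing concrete, here is a rough ruby sketch that derives the instance names and transports from a trimmed copy of the yaml above. It is not Test-Kitchen's implementation. The instance_rows helper is hypothetical, and the ssh fallback and dot stripping are simply inferred from the output shown.

```ruby
require 'yaml'

# Trimmed copy of the .kitchen.yml shown earlier.
KITCHEN_CONFIG = <<~CONFIG
  platforms:
  - name: windows-2012R2
    transport:
      name: winrm
  - name: ubuntu-12.04
  suites:
  - name: default
CONFIG

# Hypothetical helper: builds [instance name, transport] pairs.
# Falling back to ssh and dropping dots from the name mirrors the
# kitchen list output above; it is an inference, not Test-Kitchen's code.
def instance_rows(config)
  config['suites'].flat_map do |suite|
    config['platforms'].map do |platform|
      name = "#{suite['name']}-#{platform['name']}".delete('.')
      transport = platform.fetch('transport', {}).fetch('name', 'ssh')
      [name, transport.capitalize]
    end
  end
end

instance_rows(YAML.safe_load(KITCHEN_CONFIG)).each do |name, transport|
  puts format('%-24s%s', name, transport)
end
```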

A note for Hyper-V users

I tend to use Hyper-V on my personal windows laptop and VirtualBox on my work Ubuntu laptop. I have only one issue on Hyper-V now. It hangs when vagrant tries to change the hostname of the box. I believe this is a bug in Vagrant. If you interrupt the box provisioning and boot into the box, it then blue screens - at least this has been my experience. To work around this for now I comment out line 24 of templates/Vagrantfile.erb in the kitchen-vagrant driver:

<% if config[:vm_hostname] %>
  # c.vm.hostname = "<%= config[:vm_hostname] %>"
<% end %>

Then I reinstall the gem.

Tip: The url to my Hyper-V vagrant box with an evaluation copy of windows 2012R2 is:

https://wrock.blob.core.windows.net/vhds/hyperv2012r2.box

Let's all join hands and bow our heads in convergence

You'll appreciate the spiritual tone when your screen reveals a converged vm with passing tests. Either that or you will rage quit windows when this all goes to bloody hell, but I'm gonna try and keep a "glass half full" attitude here. You are welcome to follow along with me and the minecraft server cookbook. Clone my repo:

git clone -b windows https://github.com/mwrock/chef-minecraft

Now set the .kitchen-vagrant.yml file to be the "active" kitchen config file instead of .kitchen.yml which is configured to use DigitalOcean:

Powershell

$env:KITCHEN_YAML=".kitchen-vagrant.yml"

Bash

export KITCHEN_YAML=.kitchen-vagrant.yml

And all together now on 1, 2, 3...Converge!!

kitchen converge default-windows-2012R2

While you wait...:

Just 3 minutes later, it's a success!!!

       15 tests, 5 assertions, 0 failures, 0 errors, 0 skips
         - MiniTest::Chef::Handler
       Running handlers complete
       Chef Client finished, 34/57 resources updated in 180.231501 seconds
       Finished converging <default-windows-2012R2> (5m4.94s).

Side note on automating local policy

This might be somewhat unrelated but I just cannot let it go. The minecraft server cookbook creates a windows scheduled task (kinda like a linux cron job) that runs the java process that hosts the minecraft server and it creates a user under which the job runs. In order to run a scheduled task, a windows user must have the "log on as batch job" right configured in their local policy.

Turns out this is a bit tricky to automate. I'll spare the audience from the horror which makes this possible but if you must look, see https://github.com/mwrock/chef-minecraft/blob/windows/templates/default/LsaWrapper.ps1.erb. Basically this can only be done by calling into the windows API as is done here. Big thanks to Fabien Dibot for this information!

Hurry up and wait! Test and provide feedback

There is a lot of work going into making sure this is stable and provides a good experience for those wanting to test windows infrastructure with test-kitchen. However, there are so many edge cases that are easy to miss. I very much encourage anyone wanting to try this out to do so and reach out via github issues to report problems.

More windows packaging for vagrant and fixing 1603 errors during MSI installs by Matt Wrock

This post is largely a follow up to my November post In search of a light weight windows vagrant box. If you are interested in some pointers to get your windows install as small as possible and then package it up into a Hyper-V or VirtualBox vagrant box file, I'd encourage you to read it. This post will cover three main topics:

  • Why a windows box (vagrant or otherwise) may suffer from 1603 errors when installing MSIs (this is why I set out to repackage my boxes)
  • New "gotchas" packaging Hyper-V boxes on the Windows 10 technical preview
  • LZMA vs. GZIP compression...a cage match

Caution: This Installation may be fatal!

While error code 1603 is a standard MSI error code, I can assure you it is never good and in fact it is always fatal. 1603 errors are "Fatal errors." First a quick primer in troubleshooting failed MSI installs.

MSI installs may simply fail silently, leaving no clue as to what might have happened. That is common with many installation errors, especially if you are performing a silent install. The install will likely emit a nonzero exit code at the least but perhaps nothing else. This is when it is time to use the log file switch and add some verbosity for good measure. This may assist you in tracking down an actionable error message or flood you with more information than you ever wanted or both.

The log file is usually generated by adding:

/lv c:\some\log\file.log

to your MSIEXEC.exe command. If the install fails, give this file a good look over. It may seem overly cryptic and will largely contain info meant to be meaningful only to its authors but more often than not one can find the root cause of a failed install within this file.
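One way to give the file that "good look over" is to search it for failed actions. The sketch below relies on a common Windows Installer convention that is not covered in this post: a failing action is usually logged with the text "Return value 3", and the useful message tends to sit in the lines just before it. The failed_actions helper is hypothetical, and reading the file is left to you since these logs are often UTF-16 encoded and may need an encoding option.

```ruby
# Hypothetical helper: finds every line reporting "Return value 3"
# (the usual marker of a failed action in a verbose MSI log) and returns
# it together with up to `context` preceding lines.
def failed_actions(lines, context: 5)
  lines.each_index
       .select { |i| lines[i].include?('Return value 3') }
       .map { |i| lines[[i - context, 0].max..i] }
end

# Usage, assuming `lines` holds the log file's lines as an array of strings:
#   failed_actions(lines).each { |chunk| puts chunk, '----' }
```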

In mid August of 2014, Microsoft rolled out an update, KB2918614, that caused many machines to raise this error when installing MSIs. An almost universal fix was found and that was to uninstall KB2918614. But in this age of rolling forward, rolling back is so 2013. Months later a hotfix, KB3000988, was issued. In short, this error can occur if you have patch KB2918614 and are running an install with an admin user that has never logged into the box before. In my case I was installing the chef client to start a Test-Kitchen run on a newly provisioned vagrant box.

I could manually install the chef client just fine if I hit this error because that entailed actually logging into the box. However, after doing this several times it gets really old. Running through a full vagrant packaging can be a multi night process that I have been avoiding, but I can do so no longer.

Packaging Hyper-V Vagrant boxes on windows 10

Tl;dr: you can't.

You can call me a "Negative Nancy" but I refuse to wear your labels. 

Hyper-V has changed the format it uses to store metadata about the VM. This has been stored in XML format until now. When you package a vagrant Hyper-V box, you include this file in the .box package file and then when you import it, vagrant reads from it and extracts the vital pieces of metadata like cores, memory, network info, etc in order to correctly create a new VM based on that data. It does NOT simply import that file since it contains some unique identifiers that could possibly conflict with other VMs on your host system.

Windows 10 uses a binary format to store this data with a .vmcx extension. This is supposed to provide better performance and reliability when changing vm settings. However it also renders a vagrant import doomed. Thankfully, one can still import vagrant boxes packaged in the xml format and Hyper-V will migrate them to the new format, but this migration is unidirectional at the moment.

I'm hoping future releases will be able to export machines in XML format or at the least the .vmcx format will be published so that vagrant contributors can add support for these new boxes. For now, I'm just gonna need to find a pre v10 windows host to create an xml based VM export that I can package. (I accept donations). Funny how I have access to thousands of guest VMs but the only physical windows boxes I work with are my personal laptop and my wife's and kids' machines with Windows Home edition (no Hyper-V). So on to creating a VirtualBox box file.

Update: I was able to package a Hyper-V box by simply using the same box artifacts I had used in my previous box and replacing the virtual hard drive with my updated one. It just needs to have the same name. This works as long as the vm metadata equally applies to the new drive which was the case for me.

LZMA compression: smaller payload larger compression/decompression overhead

In my November post I discussed the benefits of using the LZMA format to package the box. This format is more efficient but takes significantly longer to complete the compression. My personal opinion is that compression time is not that important compared with download and decompression time since the former is done far less frequently and can be scheduled "out of band" so to speak. Better compression is even more important with windows boxes because they are significantly larger than *nix flavored machines.

Arthur Maltson commented on twitter the other day that he sticks with gzip over lzma because the lzma decompression is also considerably longer. I hadn't noticed this but I also did not measure it closely. So let's have a closer look at 3 key factors: compressed box size, download time and decompression time.

This week I rebuilt my windows box including the KB3000988 hotfix mentioned above. I created both an lzma .box file and a gzip version for VirtualBox so I could compare the two. Both are identical in content. The gzip box weighs in at 3.6GB and the lzma version is 2.8GB. About a 22% delta. Not bad but also not as large of a delta as my observations in November.

Anyone can do the math on the download time. I get about 13 Mbps on my home FIOS internet connection. So the 0.8GB delta should mean the gzip will take about 9 extra minutes to download assuming I am pulling the box from an online source. I keep my boxes in Azure storage. Now here is the kicker: the LZMA compressed box takes about 6 minutes to decompress compared to about 1 minute with the gzip. So overall I'm saving about 4 minutes with the LZMA box. A four minute savings is great but in light of a total one hour download and the two to three hours it took to produce the initial compressed box, I'm thinking the gzip is the winner here. There are other benefits too. For instance, this means you are better off simply using the vagrant package command for VirtualBox boxes, meaning more simplicity.
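For anyone who would rather not do that math by hand, here is the same estimate as a short ruby script. It assumes ideal throughput and 1 GB = 8000 megabits, so the download figure lands a little under the rounded number quoted above. Treat it as a rough check rather than a measurement.

```ruby
MBPS = 13.0                 # connection speed, megabits per second
GZIP_GB = 3.6               # gzip box size
LZMA_GB = 2.8               # lzma box size
GZIP_DECOMPRESS_MIN = 1.0   # observed gzip decompression time
LZMA_DECOMPRESS_MIN = 6.0   # observed lzma decompression time

# Minutes to download a payload, assuming 1 GB = 8000 megabits and no overhead.
def download_minutes(gigabytes, mbps)
  gigabytes * 8000 / mbps / 60
end

extra_download = download_minutes(GZIP_GB - LZMA_GB, MBPS)
extra_decompress = LZMA_DECOMPRESS_MIN - GZIP_DECOMPRESS_MIN
net_savings = extra_download - extra_decompress

puts format('extra gzip download time:    %.1f min', extra_download)
puts format('extra lzma decompress time:  %.1f min', extra_decompress)
puts format('net savings with lzma:       %.1f min', net_savings)
```

Either way the net gain is only a few minutes, and it is earned once per box per host.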

Furthermore, it is important to note that Vagrant downloads and decompresses the package only once and caches it in your .vagrant.d folder. All "vagrant up" commands simply copy the previously downloaded and decompressed image to a new VM. So any savings yielded from a smaller download are only rewarded one time per box on any one host, assuming you do not explicitly delete the box.

Staying "in the know" with podcasts by Matt Wrock

TL;DR: There will be no dog hosted podcasts discussed here but please enjoy this adorable image.


I love podcasts and I credit them, those who produce them and their guests for playing a significant role in developing my career and passions. You can skip to the end of this post to check out the podcasts I listen to today, but allow me to pontificate about podcasts and how I like to consume them.

I started listening to podcasts (mostly technical) almost ten years ago. Around that time I got interested in ultra marathons (any run longer than 26.2 miles) and they would keep me company on my monthly 50K runs mostly in the dark through the trails of Chino Hills State Park. Back then I had been developing software professionally for several years and had done some truly cool stuff but mostly in a cave of my own making. I am a self-taught coder and what I knew at the time I had learned mostly from books and my own tinkering. I was not at all "plugged in" to any developer community and the actual human developers I knew were limited to those at my place of work. Podcasts changed all of that.

High level awareness over deep mastery

First things first, if you set aside time to listen to a podcast with the hopes of really learning some deep details about a particular topic, you may be disappointed. This is not to say that podcasts lack rich technical content, they simply are not the medium by which one should expect to gain mastery over a given topic.

Most will agree that technology workers like those likely reading this post are constantly inundated with new technologies, tools, and ideas. Sometimes it can feel like we are constantly making decisions as to what NOT to learn because no human being can possibly set out to study and even gain a novice ability to work with all of this information. So it's important that the facts we use to decide where to invest our learning efforts are as well informed as possible.

I like the fact that I can casually listen to several podcasts and build an awareness of concepts that may be useful to me and that I can draw from later at a deeper level. There have now been countless times that I have come across a particular problem and recalled something I heard in a podcast that I think may be applicable. At that time I can google the topic and either determine that it's not worth pursuing or start to dive in and explore.

So many trends and ideas - you need to be aware

There is so much going on in our space and at such a fast pace. Like I mention above, it's simply impossible to grasp everything. It's also impossible to simply follow every trending topic. However, we all need to maintain some kind of feed to the greater technical community in order to maintain at least a basic awareness of what is current in our space. It's just too easy to live out our careers in isolation, regardless of how smart we are, and miss out on so many of the great ideas in circulation around us.

When I started listening to podcasts, my awareness and exposure to new ideas took off and allowed me to follow new disciplines that truly stretched me. I may not have gained this awareness had I not had this link to the "outside world."

Some of the significant "life changing" ideas that podcasts introduced me to were: Test Driven Development, Inversion of Control patterns and container implementations, several significant Open Source projects but more importantly, a curiosity to become actively involved in open source.

Making a bigger impact

After listening to several podcasts I began to take stock of my career and realize that while I had managed to put out some good technology and gain notoriety within my own work place, that notoriety and overall impact did not reach far beyond that relatively small sphere of influence. Listening to podcasts and being exposed to the guests that appeared on them made me recognize the value of "getting out there" and becoming involved with a broader group. This especially hit home when I decided to change jobs after being with the same employer for nine years.

It was in large part some of the prolific bloggers I heard interviewed who inspired me to start my own blog. I had listened to tons of open source project contributors talk about the projects they started and maintain and I eventually started my own projects. A couple of these got noticed and I have now been invited to speak on a few podcasts myself. That just seems crazy and tends to strongly invoke my deep seated imposter syndrome, but they were all a lot of fun.

I even got to work with a podcaster who I enjoyed listening to for years, David Starr (@elegantcoder),  and had the privilege of sitting right next to him every day. What a treat and I have to say that the real life David lived up to the episodes I enjoyed on my runs years before. If you want to hear someone super smart, I'm talking about David, have a listen to his interview on Hanselminutes.

Podcasts I listen to

So my tastes and the topics I tend to gravitate towards have changed over the past few years. For instance, I listen to more "devopsy" podcasts and less webdev shows than I used to but I still religiously listen to some of the first podcasts I started with. Some I enjoy more for the host than the topics covered.

Here are the podcasts I subscribe to today in alphabetical order:

.Net Rocks!

This may have been the first series I listened to and I still listen now and again. As the name suggests, its focus is on .net technologies. Carl Franklin and Richard Campbell do a great and very professional job producing this podcast.

Arrested Devops

This is a fairly new podcast focusing on devops topics and usually includes not only the hosts, Matt Stratton, Trevor Hess, and Bridget Kromhout but also one or more great guests knowledgeable of devops topics. You will also learn, and I'll just tell you right now, that there is always devops in the banana stand. I did not know that.

I've had the pleasure of meeting Matt on a few occasions at some Chef events. He's a great guy, fun to talk to and passionate about devops in the windows space.

The Cloudcast

Put on by Aaron Delp and Brian Gracely, I just started listening to this one and so far really like it. I work for a cloud so it seems only natural that I listen to such a podcast.

Devops Cafe

Another great podcast focusing on devops topics put on by John Willis and Damon Edwards. The favicon of their website looks like a Minecraft cube. Is there meaning here? I don't know but I like it.

Food Fight Show

Another Devops centered podcast hosted by Nathan Harvey and Brandon Burton. The show often covers topics relevant to the Chef development community. So if you are interested in Chef, I especially recommend this show but its coverage certainly includes much more.

Hanselminutes

Another show that I have been listening to since the beginning of my podcast listening. It's hosted by Scott Hanselman and I think he has a real knack for interviewing other engineers. Many of the shows cover topics relevant to Microsoft technologies but in recent years Scott has been focusing a lot on broad, and I think important, social issues and how they intersect with developer communities. It's really good stuff.

Herding Code

A great show that often, but not necessarily always, focuses on web based technologies. These guys - Jon Galloway, K. Scott Allen, Kevin Dente, and Scott Koon - ask a lot of great questions of their guests and have the ability to dive deep into technical issues.

Ops All the Things

Put on by Steven Murawski and Chris Webber talking about devops related topics. I learned about Steven from his appearances on several other podcasts talking about Microsoft's DSC (Desired State Configuration) and his experiences working with it at Stack Exchange. I've had the privilege of meeting Steven and recently working with him on a working group aimed at bringing Test-Kitchen (an infrastructure automation testing tool) to Windows.

PowerScripting Podcast

A great show focused on powershell hosted by Jonathan Walz and Hal Rottenberg. If you like or are interested in powershell, you should definitely subscribe to this podcast. They have tons of great guests including at least three episodes with Jeffrey Snover the creator of powershell.

Runas Radio

A weekly interview show with Richard Campbell and an interesting guest focusing on Microsoft IT Professional (Ops) and lately many "devops" related guests and topics.

The Ship Show

Another podcast focused on devops topics hosted by J. Paul Reed, Youssuf El-Kalay, EJ Ciramella, Seth Thomas, Sascha Bates, and Pete Cheslock. These episodes often include great discussion both among the hosts and with some great guests.

Software Defined Talk

Another new show in my feed but this one is special. It's hosted by Michael Coté, Matt Ray, and Brandon Whichard. I find these guys very entertaining and informative. The show tends to focus on general market trends in the software industry but there is something about the three of these guys and their personalities that I find really refreshing. I walk away from all of these episodes with a good chuckle and with several tidbits of industry knowledge I didn't have before.

Software Engineering Radio

Here is another show that I have been listening to since the beginning. One thing I like about this series is that it really has no core technical focus and therefore provides a nice range of topics across "devops", process management, and engineering covering several different disciplines. I highly recommend a recent episode, Gang of Four – 20 Years Later.

This Developer's Life

There hasn't been a new episode in over a year and perhaps there never will be another but each of these episodes is a must listen. If you like the popular This American Life podcast, you should really enjoy this series which shamelessly copies the former but focuses on issues core to development. Scott Hanselman and Rob Conery are true creative geniuses here.

Windows Weekly

It took me a couple episodes to get into this one but I now look forward to it every week. Hosted by Leo Laporte, Mary Jo Foley and Paul Thurrott, it takes a more "end user" view into Microsoft technologies. Now that I no longer work for Microsoft I find it all the more interesting to get some inside scoop on that place where I used to work.


Adventures in sysprep and the failed quest for disk cleanup on server 2012 R2 by Matt Wrock

A couple months ago I wrote a post about creating light weight windows vagrant boxes. For those unfamiliar with vagrant, a "Vagrant Box" is essentially a VM image and vagrant provides a rich plugin ecosystem that allows one to consume a "box" from different clouds and hypervisors and also use a variety of provisioning technologies to build out the final instance. My post covered how a windows image is prepared for vagrant and also discussed several techniques for making the image as small as possible. Last week I set about updating a windows vmware template using many of those same optimizations but when it came time to sysprep the image, alas it was not a tear free process.

This post will cover:

  • Gotchas when it comes to sysprepping windows images
  • Troubleshooting sysprep failures
  • Public mourning of the loss of our good friend, cleanmgr.exe, on server 2012 R2

What is sysprep?

Sysprep is a command line tool that prepares a windows instance to be "reconsumed." It can take different command line arguments which will produce different flavors of output. My use of the tool and the one covered by this post is to prepare a base windows image to be deployed from VMWare infrastructure. This often involves the use of the /generalize switch which strips a windows OS of its individuality. It removes things like hostname, IP, user SIDs and even geographical association. You can also provide sysprep a path to an unattend file, also known as an answer file, that can contain all sorts of setup metadata such as administrator credentials, startup script, windows product key and more. Here is an example:

<?xml version="1.0" encoding="utf-8" ?>
<Unattend>
   <UserData>
      <!--This section contains elements for pre-populating user information and personalizing the user experience-->
      <AdminPassword Value="TG33hY" StrongPassword="No" EncryptedPassword="No"/>
      <FullName Value="Cookie Jones" />
      <ProductKey Value="12345-ABCDE-12345-ABCDE-12345" />
   </UserData>
   <DiskConfig>
      <!--This section contains elements for pre-populating information about disk configuration settings-->
      <Disk ID="0">
         <CreatePartition />
      </Disk>
   </DiskConfig>
   <SystemData>
      <RegionalSettings>
         <!--This section contains elements for selecting regional and language settings for the user interface-->
         <UserInterface Value="12" />
      </RegionalSettings>
   </SystemData>
</Unattend>

This has the advantage of preparing a fresh install that does not require the user to manually input a bunch of information before logging on and being productive.

You might prep the os with this file by running:

C:\windows\system32\sysprep\sysprep.exe /generalize /oobe /shutdown /unattend:myAnswerFile.xml

Sysprep without running sysprep

I don't often have the need to directly interact with sysprep.exe. Almost all of my dealings with it have been through VMWare's customization tooling and API which allow me to provision windows machines from ruby code that instructs VMWare how to perform sysprep and assemble the answer file. Here is an example of working with the ruby based vmware API, rbvmomi, to programmatically construct the answer file:

def windows_prep_for(options, vm_name)
  cust_runonce = RbVmomi::VIM::CustomizationGuiRunOnce.new(
    :commandList => [
      'winrm set winrm/config/client/auth @{Basic="true"}',
      'winrm set winrm/config/service/auth @{Basic="true"}',
      'winrm set winrm/config/service @{AllowUnencrypted="true"}',
      'shutdown -l'])

  cust_login_password = RbVmomi::VIM::CustomizationPassword(
    :plainText => true,
    :value => options[:password])
  if options.has_key?(:domain)
    cust_domain_password = RbVmomi::VIM::CustomizationPassword(
      :plainText => true,
      :value => options[:domainAdminPassword])
    cust_id = RbVmomi::VIM::CustomizationIdentification.new(
      :joinDomain => options[:domain],
      :domainAdmin => options[:domainAdmin],
      :domainAdminPassword => cust_domain_password)
  else
    cust_id = RbVmomi::VIM::CustomizationIdentification.new(
      :joinWorkgroup => 'WORKGROUP')
  end
  cust_gui_unattended = RbVmomi::VIM::CustomizationGuiUnattended.new(
    :autoLogon => true,
    :autoLogonCount => 1,
    :password => cust_login_password,
    :timeZone => options[:win_time_zone])
  cust_userdata = RbVmomi::VIM::CustomizationUserData.new(
    :computerName => RbVmomi::VIM::CustomizationFixedName.new(
      :name => options[:hostname]
    ),
    :fullName => options[:org_name],
    :orgName => options[:org_name],
    :productId => options[:product_id])
  RbVmomi::VIM::CustomizationSysprep.new(
    :guiRunOnce => cust_runonce,
    :identification => cust_id,
    :guiUnattended => cust_gui_unattended,
    :userData => cust_userdata)
end

VMWare calls sysprep.exe for me on the base vm template image and can pass in a file like the one above to enable winrm, register the product key, set up the local administrator and domain join the final vm. This all works great except for when it doesn't.

When things go wrong, either by calling sysprep.exe directly or via VMWare, it's not immediately obvious what the error is. In fact I would say that it is immediately very confusing...and even worse, sometimes it is not immediate at all. I wrote a post six months ago about how to troubleshoot unattended windows provisioning gone wrong. Here I want to look specifically at issues concerning disk cleanup.

Preparing for sysprep

Often the point of running sysprep is to be able to take a golden image and deploy that for use in many virtual instances. So you want to make sure that the image you are capturing is...well...golden. That might also mean, especially for windows, as small as possible. Since windows images are much larger than their linux counterparts and orders of magnitude larger than containers, it's important to me that they be as small as possible at the outset so that an already drawn out provisioning time does not go even longer.

There are a few techniques that can be applied here, and which ones to use will depend on the version of windows you are running. I'm focusing here on the latest released server version, 2012 R2. I'd definitely encourage you to read my vagrant post that talks about some of the new features of component cleanup and features on demand that can shave many gigabytes off of your base image. Another tool that many use to purge useless files from their windows os is cleanmgr.exe. Many know this better as the little app that is launched from the "disk cleanup" button when viewing a disk's properties.

Enabling Disk Cleanup on windows server

Windows clients have this feature enabled by default but out of the box it is not present on server SKUs. The way to enable it is by adding the Desktop Experience feature. This would be done in powershell by running:

Add-WindowsFeature Desktop-Experience

The problem with this is that the Desktop-Experience brings a lot of baggage with it that you do not typically need or want on a server. In fact, it will automatically enable two additional features:

  • Media Services
  • Ink and Handwriting Services

Between the extra files and the extra registry entries, this makes your OS footprint larger, so there are typically two ways to deal with it.

Install, Cleanup, Uninstall

You want to have this be the last step of your image preparation process. Once everything is as it should be, you install the Desktop Experience, perform a required reboot, invoke cleanmgr.exe and dump as much as you can and then uninstall the feature along with the above two features it installed. Then finally, of course, reboot again.

Install cleanmgr.exe a la carte style

You don't need this feature just to run cleanmgr. While this is certainly not obvious, it is buried deep inside your windows folder even when the desktop experience is not enabled. This is even documented on Technet. Search for cleanmgr.exe and cleanmgr.exe.mui inside of c:\windows\winSXS:

Get-ChildItem -Path c:\windows\winsxs -Recursive -Filter cleanmgr.exe
Get-ChildItem -Path c:\windows\winsxs -Recursive -Filter cleanmgr.exe.mui

This may return two or three versions of the same file. You'll probably want whichever has the highest version. According to the above referenced Technet article, on server 2008 R2 these will be in:

C:\Windows\winsxs\amd64_microsoft-windows-cleanmgr_31bf3856ad364e35_6.1.7600.16385_none_c9392808773cd7da\cleanmgr.exe

C:\Windows\winsxs\amd64_microsoft-windows-cleanmgr.resources_31bf3856ad364e35_6.1.7600.16385_en-us_b9cb6194b257cc63\cleanmgr.exe.mui

They can simply be copied to c:\windows\system32 and c:\windows\system32\en-US respectively. While they won't be visible from the disk properties, you can still access them from the command line.
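
When the search returns several copies, picking "the highest version" by eye is error prone because the version is buried in the middle of a long directory name. Here is a rough ruby sketch of one way to choose. It assumes the version is the dotted four part number embedded in the directory name, as in the 2008 R2 paths quoted above, and the method name newest_candidate is just something I made up for illustration.

```ruby
# Matches the dotted four-part version embedded in a WinSxS directory name,
# e.g. "_6.1.7600.16385_" in the paths quoted above.
VERSION_IN_PATH = /_(\d+\.\d+\.\d+\.\d+)_/

# Returns the path whose embedded version is highest, or nil if no path
# contains a recognizable version. Gem::Version compares numerically, so
# 6.1.7600.16385 correctly beats 6.1.7600.9999.
def newest_candidate(paths)
  paths.select { |p| p =~ VERSION_IN_PATH }
       .max_by { |p| Gem::Version.new(p[VERSION_IN_PATH, 1]) }
end

# Possible usage on the server itself (the glob pattern is an assumption):
#   newest_candidate(Dir.glob('C:/windows/winsxs/*cleanmgr*/cleanmgr.exe'))
```

Comparing the versions as plain strings would get the ordering wrong as soon as two parts have a different number of digits, which is why the sketch leans on Gem::Version.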

Two steps forward one step back on server 2012 R2

Server 2012 R2 has delivered some major enhancements for reducing the size of a windows os footprint. It provides commands for cleaning out installed updates and you can completely remove unused features from disk. Further, those parts of winSXS that are not in use are compressed. This is all great stuff but the problem is that because cleanmgr.exe is compressed, it cannot simply be copied out and run as is. Further, neither I nor anyone else on the internet can seem to extract it.

It's clearly compressed. While the feature is disabled the file is about 82k, and afterwards it's about 213k. I tried using the compact command line tool as well as winrar without luck.

One option is to do a mix of the above two approaches: Enable the feature. Once enabled, those two files are both expanded and moved to system32. Then copy them somewhere safe before disabling the feature. Now you could use these files either on this machine or another server 2012 R2 box...or not.

I have tried this and it works insofar as I can get cleanmgr.exe to pop its GUI dialog, but it appears crippled. Only a handful of the usually available options are present:

Where are the error reports, the setup files, etc?

So what does this have to do with sysprep?

Go ahead and perform a sysprep after disabling the desktop experience feature.

A fatal error...hmm.

Troubleshooting sysprep

When things go wrong during a sysprep cycle, the place to look is:

c:\windows\system32\sysprep\panther\setupact.log

This file will almost always include a more instructive error as well as information as to what it was doing just before the failure which can help debug the issue. The error we get here is:

Package winstore_1.0.0.0_neutral_neutral_cw5n1h2txyewy was installed for a user, but not provisioned for all users. This package will not function properly in the sysprep image.

Sysprep will attempt to uninstall all windows store apps and here it is complaining that it cannot and one is still installed.
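
The log can get long, so it can help to pull out just the error entries. The following is a minimal ruby sketch, not an official parser. It assumes each entry starts with a timestamp, then a comma, then the severity level (for example "Error"), and the method name sysprep_errors is hypothetical.

```ruby
# Assumed entry layout: "<date> <time>, <Level> ... <message>".
# Adjust the pattern if the entries in your log look different.
ERROR_LINE = /\A[^,]*,\s*Error\b/i

# Returns only the lines of the given log text that are flagged as errors.
def sysprep_errors(log_text)
  log_text.each_line.map(&:chomp).select { |line| line =~ ERROR_LINE }
end

# Possible usage on the machine that failed to sysprep:
#   log = File.read('c:\windows\system32\sysprep\panther\setupact.log')
#   puts sysprep_errors(log)
```

Since the lines just before an error often explain what sysprep was doing, you may still want to open the full log once you know which entry to look for.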

Let's just check to see what store apps are currently installed:

PS C:\Users\Administrator.WIN-DKAJ9Q1JK5N> Get-AppxPackage


Name              : winstore
Publisher         : CN=Microsoft Windows, O=Microsoft Corporation, L=Redmond, S=Washington, C=US
Architecture      : Neutral
ResourceId        : neutral
Version           : 1.0.0.0
PackageFullName   : winstore_1.0.0.0_neutral_neutral_cw5n1h2txyewy
InstallLocation   : C:\Windows\WinStore
IsFramework       : False
PackageFamilyName : winstore_cw5n1h2txyewy
PublisherId       : cw5n1h2txyewy
IsResourcePackage : False
IsBundle          : False
IsDevelopmentMode : False

Name              : windows.immersivecontrolpanel
Publisher         : CN=Microsoft Windows, O=Microsoft Corporation, L=Redmond, S=Washington, C=US
Architecture      : Neutral
ResourceId        : neutral
Version           : 6.2.0.0
PackageFullName   : windows.immersivecontrolpanel_6.2.0.0_neutral_neutral_cw5n1h2txyewy
InstallLocation   : C:\Windows\ImmersiveControlPanel
IsFramework       : False
PackageFamilyName : windows.immersivecontrolpanel_cw5n1h2txyewy
PublisherId       : cw5n1h2txyewy
IsResourcePackage : False
IsBundle          : False
IsDevelopmentMode : False

OK, fine. We'll just delete them ourselves.

PS C:\Users\Administrator.WIN-DKAJ9Q1JK5N> Get-AppxPackage | Remove-AppxPackage
Remove-AppxPackage : Deployment failed with HRESULT: 0x80073CFA, Removal failed. Please contact your software vendor.
(Exception from HRESULT: 0x80073CFA)
error 0x80070032: AppX Deployment Remove operation on package winstore_1.0.0.0_neutral_neutral_cw5n1h2txyewy from:
C:\Windows\WinStore failed. This app is part of Windows and cannot be uninstalled on a per-user basis. An
administrator can attempt to remove the app from the computer using Turn Windows Features on or off. However, it may
not be possible to uninstall the app.
NOTE: For additional information, look for [ActivityId] cc6d4139-3ae8-0000-0447-6dcce83ad001 in the Event Log or use
the command line Get-AppxLog -ActivityID cc6d4139-3ae8-0000-0447-6dcce83ad001
At line:1 char:19
+ Get-AppxPackage | Remove-AppxPackage
+                   ~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : WriteError: (winstore_1.0.0....l_cw5n1h2txyewy:String) [Remove-AppxPackage], IOException
    + FullyQualifiedErrorId : DeploymentError,Microsoft.Windows.Appx.PackageManager.Commands.RemoveAppxPackageCommand

Remove-AppxPackage : Deployment failed with HRESULT: 0x80073CFA, Removal failed. Please contact your software vendor.
(Exception from HRESULT: 0x80073CFA)
error 0x80070032: AppX Deployment Remove operation on package
windows.immersivecontrolpanel_6.2.0.0_neutral_neutral_cw5n1h2txyewy from: C:\Windows\ImmersiveControlPanel failed.
This app is part of Windows and cannot be uninstalled on a per-user basis. An administrator can attempt to remove the
app from the computer using Turn Windows Features on or off. However, it may not be possible to uninstall the app.
NOTE: For additional information, look for [ActivityId] cc6d4139-3ae8-0000-0f47-6dcce83ad001 in the Event Log or use
the command line Get-AppxLog -ActivityID cc6d4139-3ae8-0000-0f47-6dcce83ad001
At line:1 char:19
+ Get-AppxPackage | Remove-AppxPackage
+                   ~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : WriteError: (windows.immersi...l_cw5n1h2txyewy:String) [Remove-AppxPackage], IOException
    + FullyQualifiedErrorId : DeploymentError,Microsoft.Windows.Appx.PackageManager.Commands.RemoveAppxPackageCommand

Ugh. We can't uninstall these? Nope. You cannot. So once you install the desktop experience feature, it cannot be fully uninstalled. The only way to sysprep this machine is to keep the desktop experience feature enabled.

Whether you sysprep via the VMWare tools or directly, you can no longer run a successful sysprep without the desktop experience unless you start over with a new OS. I have scoured the internet for a workaround and have not found any. There are lots of folks complaining about this.

It's not as bad as it might seem

The fact of the matter is that I do not see this as being a horrendous show stopper, at least not for my use case. By the time I run disk cleanup, there really is not that much to be purged. Far less than a gigabyte. This is because I am preparing a fresh os so there has not been much accumulation of cruft. The vast majority of disposable content I can now purge very thoroughly with the new DISM.exe command:

Dism.exe /online /Cleanup-Image /StartComponentCleanup /ResetBase

Worst case, I manually delete temp files and some of the other random junk lying around. It's unfortunate that we have lost cleanmgr.exe but all is not lost.
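
Before deleting anything by hand, it can be worth checking how much a folder actually holds so you know whether the cleanup is worth the effort. Here is a small ruby sketch that sums the file sizes under a directory. Which directories to measure is up to you; the method name tree_size and the TEMP example are my own assumptions, not a list of what is safe to delete.

```ruby
require 'find'

# Sums the size in bytes of every regular file under root,
# descending into subdirectories.
def tree_size(root)
  total = 0
  Find.find(root) do |path|
    total += File.size(path) if File.file?(path)
  end
  total
end

# Possible usage (ENV['TEMP'] is only an example of a folder to inspect):
#   puts "#{tree_size(ENV['TEMP']) / (1024.0 * 1024)} MB in temp"
```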

Exceeding the 3 sysprep limit

Another issue I hit with sysprep that threw me and prompted a fair amount of research was the limit of 3 sysprep runs from a single os install. It is true that you are limited to three but there is an easy workaround I found here. The limit manifests itself with another fatal error during sysprep and the following message in the log file:

RunExternalDlls:Not running DLLs; either the machine is in an invalid state or we couldn't update the recorded state, dwRet = 0x1f

According to the post mentioned above, the workaround is to set the following reg keys:

HKEY_LOCAL_MACHINE\SYSTEM\Setup\Status\SysprepStatus\GeneralizationState\
CleanupState:2
GeneralizationState:7

Then run:

msdtc -uninstall
msdtc -install

and then reboot. I was able to get by by just setting the GeneralizationState property of HKEY_LOCAL_MACHINE\SYSTEM\Setup\Status\SysprepStatus\GeneralizationState to 7, but your mileage may vary.
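
If you want to script the workaround, the steps could be expressed in ruby as a list of commands to run in order. Treat this strictly as a sketch under assumptions: I am reading the listing above as two DWORD values (CleanupState and GeneralizationState) that live directly under the SysprepStatus key, and the helper names reg_add_dword and sysprep_reset_commands are hypothetical. Check the layout in regedit on your own machine first and pass a different key if it differs.

```ruby
# Assumption: both values are DWORDs directly under this key.
SYSPREP_STATUS_KEY = 'HKLM\SYSTEM\Setup\Status\SysprepStatus'

# Builds a "reg add" argument list that sets one DWORD value,
# overwriting any existing value without prompting (/f).
def reg_add_dword(key, name, value)
  ['reg', 'add', key, '/v', name, '/t', 'REG_DWORD', '/d', value.to_s, '/f']
end

# Returns the commands for the workaround, in the order described above.
def sysprep_reset_commands(key = SYSPREP_STATUS_KEY)
  [
    reg_add_dword(key, 'CleanupState', 2),
    reg_add_dword(key, 'GeneralizationState', 7),
    %w[msdtc -uninstall],
    %w[msdtc -install]
  ]
end

# Possible usage from an elevated prompt, followed by a reboot:
#   sysprep_reset_commands.each { |cmd| system(*cmd) or abort("failed: #{cmd.join(' ')}") }
```

Keeping the commands as data makes it easy to print them for review before anything touches the registry.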