How can we most optimally shrink a Windows base image? by Matt Wrock

I have spent alot of time trying to get my Windows vagrant boxes as small as possible. I blogged pretty extensively on what optimizations one can make and how those optimizations can be automated with Packer. Over the last week I've leveraged that automation to allow me to collect data on exactly how much each technique I employ saves in the final image. The results I think are very interesting.

Diving into the data

The metrics above reflect the savings yielded in a fully patched Windows 2012 R2 VirtualBox base image. The total size of the final compressed .box vagrant file with no optimizations was 7.71GB and 3.71GB with all optimizations applied.

I have previously blogged the details involved in each optimization and my Packer templates can be found online that automate this process. Let me quickly summarize these optimizations in order of biggest bang for buck:

  • SxS Cleanup (54%): The Windows SxS folder can grow larger and larger over time. This has historically been a major problem and until not too long ago, the only remedy was to periodically repave the OS. Among other things, this folder includes backups for every installed update so that they can be undone if necessary. The fact of the matter is that most will never rollback any update. Windows now expose commands and scheduled tasks that allow us to periodically trim this data. Naturally this will have the most impact the more updates that have been installed.
  • Removing windows features or Features On Demand (25%): Windows ships with almost all installable features and roles on disk. In many/most cases, a server is built for a specific task and its dormant unenabled features simply take up valuable disk space. Another relatively new feature in Windows management is the ability to totally remove these features from disk. They can always be restored later either via external media or Windows Update.
  • Optimize Disk (13%): This is basically a defragmenter and optimizes the disk according to its used sectors. This will likely be more important as disk activity increases between OS install and the time of optimization.
  • Removing Junk/Temp files (5%): Here we simply delete the temp folders and a few other unnecessary files and directories created during setup. This will likely have minimal impact if the server has not undergone much true usage.
  • Removing the Page File (3%): This is a bit misleading because the server will have a page file. We just make sure that the image in the .box file has no page file (possibly a GB in space but compresses to far less). On first boot, the page file will be recreated.

The importance of "0ing" out unused space

This is something that is of particular importance for VirtualBox images. This is the act of literally flipping every unused bit on disk to 0. Otherwise the image file treats this space as used in a final compressed .box file. The fascinating fact here is if you do NOT do this, you save NOTHING. At least for VirtualBox but not Hyper-V and that is all I measured. So our 7.71 GB original patched OS with all optimizations applied but without this step compressed to 7.71GB. 0% savings.

This is small?

Lets not kid ourselves. As hard as we try to chip away at a windows base image, we are still left with a beast of an image. Sure we can cut a fully patched Windows image almost in half but it is still just under 4 GB. Thats huge especially compared to most bare Linux base images.

If you want to experience a truly small Windows image, you will want to explore Windows Nano Server. Only then will we achieve orders of magnitude of savings and enter into the Linux "ballpark". The vagrant boxes I have created for nano weigh in at about 300MB and also boot up very quickly.

Your images may vary

The numbers above reflect a particular Windows version and hypervisor. Different versions and hypervisors will assuredly yield different metrics.

There is less to optimize on newer OS versions

This is largely due to the number of Windows updates available. Today, a fresh Windows 2012 R2 image will install just over 220 updates compared to only 5 on Windows 2016 Technical Preview 5. 220 updates takes up alot of space, scattering bits all over the disk.

Different Hypervisor file types are more efficient than others

A VirtualBox .vmdk will not automatically optimize as well as a Hyper-V .vhd/x. Thus come compression time, the final vagrant VirtualBox .box file will be much larger if you dont take steps yourself to optimize the disk.

Creating a Windows Server 2016 Vagrant box with Chef and Packer by Matt Wrock

I've been using Packer for a bit over a year now to create the Windows 2012 R2 Vagrant box that I regularly use for testing various server configuration scripts. My packer template has been evolving over time but is composed of some Boxstarter package setup and a few adhoc Powershell scripts. I have blogged about this process here. This has been working great, but I'm curious how it would look differently if I used Chef instead of Boxstarter and random powershell.

Chef is a much more mature configuration management platform than Boxstarter (which I would not even label as configuration management). My belief is that breaking up what I have now into Chef resources and recipes will make the image configuration more composable and easier to read. Also as an engineer employed by Chef, I'd like to be able to walk users through how this would look using Chef.

To switch things up further, I'm conducting this experimentation on a whole new OS - Windows Server 2016 TP5. This means I dont have to worry about breaking my other templates, my windows updates will be much smaller (5 updates vs > 220) and I can use DSC resources for much of the configuring. So this post will guide you through using Chef and Packer together and dealing with the "gotchas" which I ran into. The actual template can be found on github here.

If you want to "skip to the end," I have uploaded both Hyper-V and VirtualBox providers to Atlas and you can use them with vagrant via:

 vagrant init mwrock/Windows2016
 vagrant up

Preparing for the Chef Provisioner

There are a couple things that need to happen before our Chef recipes can run.

Dealing with cookbook dependencies

I've taken most of the scripts that I run in a packer run and have broken them down into various Chef recipes encapsulated in a single cookbook I include in my packer template repository. Packer's Chef provisioners will copy this cookbook to the image being built but what about other cookbooks it depends on? This cookbook uses the windows cookbook, the wsus-client cookbook and dependencies that they have and so on, but packer does not expose any mechanism for discovering those cookbooks and downloading them.

I experimented with three different approaches to fetching these dependencies. The first two really did the same thing: installed git and then cloned those cookbooks onto the image. The first method I tried did this in a simple powershell provisioner and the second method used a Chef recipe. The down sides to this approach were:

  • I had to know upfront what the exact dependency tree was and each git repo url.
  • I also would either have to solve all the versions myself or just settle for the HEAD of master for all cookbook dependencies.

Well there is a well known tool that solves these problems: Berkshelf. So my final strategy was to run berks vendor to discover the correct dependencies and their versions and download them locally to vendor/cookbooks which we ignore from source control:

C:\dev\packer-templates [master]> cd .\cookbooks\packer-templates\
C:\dev\packer-templates\cookbooks\packer-templates [master]> berks vendor ../../vendor/cookbooks
Resolving cookbook dependencies...
Fetching 'packer-templates' from source at .
Fetching cookbook index from https://supermarket.chef.io...
Using chef_handler (1.4.0)
Using windows (1.44.1)
Using packer-templates (0.1.0) from source at .
Using wsus-client (1.2.1)
Vendoring chef_handler (1.4.0) to ../../vendor/cookbooks/chef_handler
Vendoring packer-templates (0.1.0) to ../../vendor/cookbooks/packer-templates
Vendoring windows (1.44.1) to ../../vendor/cookbooks/windows
Vendoring wsus-client (1.2.1) to ../../vendor/cookbooks/wsus-client

Now I include both my packer-templates cookbook and the vendored dependent cookbooks in the chef-solo provisioner definition:

"provisioners": [
  {
    "type": "chef-solo",
    "cookbook_paths": ["cookbooks", "vendor/cookbooks"],
    "guest_os_type": "windows",
    "run_list": [
      "wsus-client::configure",
      ...

Configuring WinRM

As we will find as we make our way to a completed vagrant .box file, there are a few key places where we will need to change some machine state outside of Chef. The first of these is configuring WinRM. Before you can use either the chef-solo provisioner or a simple powershell provisioner, WinRM must be configured correctly. The Go WinRM library cannot authenticate via NTLM and so we must enable Basic Authentication and allow unencrypted traffic. Note that my template removes these settings prior to shutting down the vm before the image is exported since my testing scenarios have NTLM authentication available.

Since we cannot do this from any provisioner, we do this in the vm build step. We add a script to the <FirstLogonCommands> section of our windows answer file. This is the file that automates the initial install of windows so we are not prompted to enter things like admin password, locale, timezone, etc:

<FirstLogonCommands>
  <SynchronousCommand wcm:action="add">
     <CommandLine>cmd.exe /c C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe -File a:\winrm.ps1</CommandLine>
     <Order>1</Order>
  </SynchronousCommand>
</FirstLogonCommands>

The winrm.ps1 script looks like:

netsh advfirewall firewall add rule name="WinRM-HTTP" dir=in localport=5985 protocol=TCP action=allow
winrm set winrm/config/service/auth '@{Basic="true"}'
winrm set winrm/config/service '@{AllowUnencrypted="true"}'

As soon as this runs on our packer build, packer will detect trhat WinRM is accessible and will move on to provisioning.

Choosing a Chef provisioner

There are two Chef flavored provisioners that come "in the box" with packer. The chef-client provisioner is ideal if you store your cookbooks on a Chef server. Since I am storing the cookbook with the packer-templates to be copied to the image, I am using the chef-solo provisioner.

Both provisioners will install the Chef client on the windows VM and will then converge all recipes included in the runlist specified in the template:

  "provisioners": [
    {
      "type": "chef-solo",
      "cookbook_paths": ["cookbooks", "vendor/cookbooks"],
      "guest_os_type": "windows",
      "run_list": [
        "wsus-client::configure",
        "packer-templates::install_ps_modules",
        "packer-templates::vbox_guest_additions",
        "packer-templates::uninstall_powershell_ise",
        "packer-templates::delete_pagefile"
      ]
    },

Windows updates and other WinRM unfriendly tasks

The Chef provisioners invoke the Chef client via WinRM. This means that all of the restrictions of WinRM apply here. That means no windows updates, no installing .net, no installing SQL server and a few other edge case restrictions.

We can work around these restrictions by isolating these unfriendly commands and running them directly via the powershell provisioner set to run "elevated":

    {
      "type": "powershell",
      "script": "scripts/windows-updates.ps1",
      "elevated_user": "vagrant",
      "elevated_password": "vagrant"
    },

When elevated credentials are used, the powershell script is run via a scheduled task and therefore runs in the context of a local user free from the fetters of WinRM. So we start by converging a Chef runlist with just enough configuration to set things up. This includes turning off automatic updates by using the wsus-client::configure recipe so that manually running updates will not interfere with automatic updates kicked off by the vm. The initial runlist also installs the PSWindowsUpdate module which we will use in the above powershell provisioner.

Here is our install-ps-modules.rb recipe that installs the Nuget package provider so we can install the PSWindowsUpdate module and the other DSC modules we will need during our packer build:

powershell_script 'install Nuget package provider' do
  code 'Install-PackageProvider -Name NuGet -Force'
  not_if '(Get-PackageProvider -Name Nuget -ListAvailable -ErrorAction SilentlyContinue) -ne $null'
end

%w{PSWindowsUpdate xNetworking xRemoteDesktopAdmin xCertificate}.each do |ps_module|
  powershell_script "install #{ps_module} module" do
    code "Install-Module #{ps_module} -Force"
    not_if "(Get-Module #{ps_module} -list) -ne $null"
  end
end

The windows-updates.ps1 looks like:

Get-WUInstall -WindowsUpdate -AcceptAll -UpdateType Software -IgnoreReboot

Multiple Chef provisioning blocks

After windows updates, I move back to Chef to finish off the provisioning:

    {
      "type": "chef-solo",
      "remote_cookbook_paths": [
        "c:/windows/temp/packer-chef-client/cookbooks-0",
        "c:/windows/temp/packer-chef-client/cookbooks-1"
      ],
      "guest_os_type": "windows",
      "skip_install": "true",
      "run_list": [
        "packer-templates::enable_file_sharing",
        "packer-templates::remote_desktop",
        "packer-templates::clean_sxs",
        "packer-templates::add_postunattend",
        "packer-templates::add_pagefile",
        "packer-templates::set_local_account_token_filter_policy",
        "packer-templates::remove_dirs",
        "packer-templates::add_setup_complete"
      ]
    },

A couple important things to include when running the Chef provisioner more than once is to tell it not to install Chef and to reuse the cookbook directories it used on the first run.

For some reason, the Chef provisioners will download and install chef regardless of whether or not Chef is already installed. Also, on the first Chef run, packer copied the cookbooks from your local environment to the vm. When it copies these cookbooks on subsequent attempts, its incredibly slow (several minutes). I'm assuming this is due to file checksum checking logic in the go library. You can avoid this sluggish file copy by just referencing the remote cookbook paths setup by the first run with the remote_cookbook_paths array shown above.

Cleaning up

Once the image configuration is where you want it to be, you might (or not) want to remove the Chef client. I try to optimize my packer setup for minimal size and the chef-client is rather large (a few hundred MB). Now you can't remove Chef with Chef. What kind of sick world would that be? So we use the powershell provisioner again to remove Chef:

Write-Host "Uninstall Chef..."
if(Test-Path "c:\windows\temp\chef.msi") {
  Start-Process MSIEXEC.exe '/uninstall c:\windows\temp\chef.msi /quiet' -Wait
}

and then clean up the disk before its exported and compacted into its final .box file:

Write-Host "Cleaning Temp Files"
try {
  Takeown /d Y /R /f "C:\Windows\Temp\*"
  Icacls "C:\Windows\Temp\*" /GRANT:r administrators:F /T /c /q  2>&1
  Remove-Item "C:\Windows\Temp\*" -Recurse -Force -ErrorAction SilentlyContinue
} catch { }

Write-Host "Optimizing Drive"
Optimize-Volume -DriveLetter C

Write-Host "Wiping empty space on disk..."
$FilePath="c:\zero.tmp"
$Volume = Get-WmiObject win32_logicaldisk -filter "DeviceID='C:'"
$ArraySize= 64kb
$SpaceToLeave= $Volume.Size * 0.05
$FileSize= $Volume.FreeSpace - $SpacetoLeave
$ZeroArray= new-object byte[]($ArraySize)
 
$Stream= [io.File]::OpenWrite($FilePath)
try {
   $CurFileSize = 0
    while($CurFileSize -lt $FileSize) {
        $Stream.Write($ZeroArray,0, $ZeroArray.Length)
        $CurFileSize +=$ZeroArray.Length
    }
}
finally {
    if($Stream) {
        $Stream.Close()
    }
}
 
Del $FilePath

What just happenned?

All of the Chef recipes, powershell scripts and packer templates can be cloned from my packer-templates github repo, but in summary, this is what they all did:

  • Installed windows
  • Installed all windows updates
  • Turned off automatic updates
  • Installed VirtualBox guest additions (only in vbox-2016.json template)
  • Uninstalled Powershell ISE (I dont use this)
  • Removed the page file from the image (it will re create itself on vagrant up)
  • Removed all windows featured not enabled
  • Enabled file sharing firewall rules so you can map drives to the vm
  • Enabled Remote Desktop and its firewall rule
  • Cleaned up the windows SxS directory of update backup files
  • Set the LocalAccountTokenFilterPolicy so that local users can remote to the vm via NTLM
  • Removes "junk" files and folders
  • Wiped all unused space on disk (might seem weird but makes the final compressed .box file smaller)

Most of this was done with Chef resources and we were also able to make ample use of DSC. For example, here is our remote_desktop.rb recipe:

dsc_resource "Enable RDP" do
  resource :xRemoteDesktopAdmin
  property :UserAuthentication, "Secure"
  property :ensure, "Present"
end

dsc_resource "Allow RDP firewall rule" do
  resource :xfirewall
  property :name, "Remote Desktop"
  property :ensure, "Present"
  property :enabled, "True"
end

Testing provisioning recipes with Test-Kitchen

One thing I've found very important is to be able to test packer provisioning scripts outside of an actual packer run. Think of this, even if you pair down your provisioning scripts to almost nothing, a packer run will always have to run through the initial windows install. Thats gonna be several minutes. Then after the packer run, you must wait out the image export and if you are using the vagrant post-provisioner, its gonna be several more minutes while the .box file is compressed. So being able to test your provisioning scripts in an isolated environment that can be spun up relatively quickly can save quite a bit of time.

I have found that working on a packer template includes three stages:

  1. Creating a very basic box with next to no configuration
  2. Testing provisioning scripts in a premade VM
  3. A full Packer run with the provisioning scripts

There may be some permutations of this pattern. For example I might remove windows update until the very end.

Test-Kitchen comes in real handy in step #2. You can also use the box produced by step #1 in your Test-Kitchen run. Depending on if I'm building Hyper-V or VirtualBox provider I'll go about this differently. Either way, a simple call to kitchen converge can be much faster than packer build.

Using kitchen-hyperv to test scripts on Hyper-V

The .kitchen.yml file included in my packer-templates repo uses the kitchen-hyperv driver to test my Chef recipes that provision the image:

---
driver:
  name: hyperv
  parent_vhd_folder: '../../output-hyperv-iso/virtual hard disks'
  parent_vhd_name: packer-hyperv-iso.vhdx

If I'm using a hyperv builder to first create a minimal image, packer puts the build .vhdx file in output-hyperv-iso/virtual hard disks. I can use kitchen-hyperv and point it at that image and it will create a new VM using that vhdx file as the parent of a new differencing disk where I can test my recipes. I can then have test-kitchen run these recipes in just a few minutes or less which is a much tighter feedback loop than packer provides.

Using kitchen-vagrant to test on Virtualbox

If you create a .box file with a minimal packer template, it will output that .box file in the root of the packer-template repo. You can add that box to your local vagrant repo by running:

vagrant box add 2016 .\windows2016min-virtualbox.box

Now you can test against this with a test-kitchen driver config that looks like:

---
driver:
  name: vagrant
  box: 2016

Check out my talk on creating windows vagrant boxes with packer at Hashiconf!

I'll be talking on this topic next month (September 2016) at Hashiconf. You can use my discount code SPKR-MWROCK for 15% off General Admission tickets.

Creating Hyper-V images with Packer by Matt Wrock

Over the last week I've been playing with some new significant changes to my packer build process. This includes replacing the Boxstarter based install with a Chef cookbook,  and also using a native Hyper-V builder. I'll be blogging about the Chef stuff later. This post will discuss how to use Taliesin Sisson's PR #2576 to build Hyper-V images with Hyper-V.

The state of Hyper-V builders

Packer currently comes bundled with builders for the major cloud providers, some local hypervisors like VirtualBox, VMware, Parallels, and QEMU as wells as OpenStack and Docker. There is no built in Hyper-V builder - the native hypervisor on Windows.

The packer ecosystem does provide a plugin model for supporting third party builders. In early 2015 it was announced that MSOpenTech had built a usable Hyper-V builder plugin and there were hopes to pull that into Packer core. This never happened. I personally see this like two technology asteroids rocketing past each other. The Hyper-V builder came in on a version of GO that Packer did not yet support but by the time it did, packer and Hyper-V had moved on.

I started playing with packer in July of 2015 and when I tried this builder on Windows 10 (a technical preview at the time) it just did not work. Likely this is because some things in Hyper-V, like its virtual machine file format had completely changed. Hard to say but as a Packer newcomer and wanting to just get an image built I quickly moved away from using a Hyper-V builder.

Converting VirtualBox to Hyper-V

After ditching the hope of building Hyper-V images with packer, I resurrected my daughter's half busted laptop to become my VirtualBox Packer builder. It worked great.

I also quickly discovered that I could simply convert the VirtualBox images to VHD format and create a Vagrant Hyper-V provider box without Hyper-V. I blogged about this procedure here and I think its still a good option for creating multiple providers on a single platform.

Its great to take the exact same image that provisions a VirtualBox VM to also provision a Hyper-V VM. However, its sometimes a pain to have to switch over to a different environment. My day to day dev environment uses Hyper-V and ideally this is where I would develop and test Packer builds as well.

A Hyper-V builder that works

So early this year I started hearing mumblings of a new PR to packer for an updated Hyper-V builder. My VirtualBox setup worked fine and I needed to produce both VirtualBox and Hyper-V providers anyways so I was not highly motivated to try out this new builder.

Well next month I will be speaking at Hashiconf about creating windows vagrant boxes with packer. It sure would be nice not to have to bring my VirtualBox rig and just use a Hyper-V builder on my personal laptop. (Oh and hey: Use my discount code SPKR-MWROCK for 15% off General Admission tickets to Hashiconf!)

So I finally took this PR for a spin last week and I was pretty amazed when it just worked. One thing I have noticed in "contemporary devops tooling" is that the chances of the tooling working on Windows is sketchy and as for Hyper-V? Good luck! No one uses it in the communities where I mingle (oh yeah...except me it sometimes seems). If few are testing the tooling and most building the tooling are not familiar with Windows environment nuances, its not a scenario optimized for success.

Using PR #2576 to build Hyper-V images

For those unfamiliar with working with Go source builds, getting the PR built and working is probably the biggest blocker to getting started. Its really not that bad at all and here is a step by step walk through to building the PR:

  1. Install golang using chocolateycinst golang -y. This puts Go in c:\tools\go
  2. Create a directory for Go deveopment: c:\dev\go and set $env:gopath to that path
  3. From that path run go get github.com/mitchellh/packer which will put packer's master branch in c:\dev\go\src\github.com\mitchellh\packer
  4. Navigate to that directory and add a git remote to Taliesin Sisson's PR branch: git remote add hyperv https://github.com/taliesins/packer
  5. Run git fetch hyperv and then git checkout hyperv. Now the code for this PR is on disk
  6. Build it with go build -o bin/packer.exe .
  7. Now the built packer.exe is at C:\dev\go\src\github.com\mitchellh\packer\bin\packer.exe

You can now run C:\dev\go\src\github.com\mitchellh\packer\bin\packer.exe build and this builder will be available!

Things to know

If you have used the VirtualBox builder, this builder is really not much different at all. The only thing that surprised and tripped me up a bit at first is that unless you configure it differently, the builder will create a new switch to be used by the VMs it creates. This switch may not be able to access the internet and your build might break. You can easily avoid this and use an existing switch by using the switch_name setting.

A working template

As I mentioned above, I've been working on using Chef instead of Boxstarter to provision the packer image. I've been testing this building a Windows Server 2016 TP5 image. Here is the Hyper-V template. The builder section is as follows:

  "builders": [
    {
      "type": "hyperv-iso",
      "guest_additions_mode": "disable",
      "iso_url": "{{ user `iso_url` }}",
      "iso_checksum": "{{ user `iso_checksum` }}",
      "iso_checksum_type": "md5",
      "ram_size_mb": 2048,
      "communicator": "winrm",
      "winrm_username": "vagrant",
      "winrm_password": "vagrant",
      "winrm_timeout": "12h",
      "shutdown_command": "C:/Windows/Panther/Unattend/packer_shutdown.bat",
      "shutdown_timeout": "15m",
      "switch_name": "internal_switch",
      "floppy_files": [
        "answer_files/2016/Autounattend.xml",
        "scripts/winrm.ps1"
      ]
    }
  ]

Documentation

Fortunately this PR includes updated documentation for the builder. You can view it in markdown here.

A look under the hood at Powershell Remoting through a cross plaform lens by Matt Wrock

Many Powershell enthusiasts don't realize that when they are using commands like New-PsSession and streaming pipelines to a powershell runspace on a remote machine, they are actually writing a binary message wrapped in a SOAP envelope that leverages a protocol with the namesake of Windows Vista. Not much over a year ago I certainly wasn't. This set of knowledge all began with needing to transfer files from a linux machine to a windows machine. In a pure linux world there is a well known tool for this called SCP. In Windows we map drives or stream bytes to a remote powershell session. How do we get a file (or command for that matter) from one of these platforms to the other?

I was about to take a plunge to go deeper than I really wanted into a pool where I did not really care to swim. And today I emerge with a cross platform "partial" implementation of Powershell Remoting in Ruby. No not just WinRM but a working PSRP client.

In this post I will cover how PSRP differs from its more familiar cross platform cousin WinRM, why its of value and how one can give it a try. Hopefully this will provide an interesting perspective into what Powershell Remoting looks like from an implementor's point of view.

In the beginning there was WinRM

While PSRP is a different protocol from WinRM (Windows Remote Management) with its own spec. It cannot exist or be explained without WinRM. WinRM is a SOAP based web service defined by a protocol called Web Services Management Protocol Extensions for Windows Vista (WSMV). I love that name. This protocol defines several different message types for performing different tasks and gathering different kinds of information on a remote instance. I'm going to focus here on the messages involved with invoking commands and collecting their output.

A typical WinRM based conversation for invoking commands goes something like this:

  1. Send a Create Shell message and get the shell id from the response
  2. Create a command in the shell sending the command and any arguments and grab the command id from the response
  3. Send a request for output on the command id which may return streams (stdout and/or stderr) containing base64 encoded text.
  4. Keep requesting output until the command state is done and examine the exit code.
  5. Send a command termination signal
  6. Send a delete shell message

The native windows tool (which nobody uses anymore) to speak pure winrm is winrs.exe.

C:\dev\winrm [winrm-v2]> winrs -r:http://192.168.137.10:5985 -u:vagrant -p:vagrant ipconfig

Windows IP Configuration

Ethernet adapter Ethernet:

   Connection-specific DNS Suffix  . : mshome.net
   Link-local IPv6 Address . . . . . : fe80::c11b:f734:5bd4:ab03%3
   IPv4 Address. . . . . . . . . . . : 192.168.137.10
   Subnet Mask . . . . . . . . . . . : 255.255.255.0
   Default Gateway . . . . . . . . . : 192.168.137.1

You can turn on analytical event log messages or watch a wireshark transcript of the communication. One thing is for sure, you will see a lot of XML and alot of namespace definitions. Its not fun to debug but you'll learn to appreciate it after examining PSRP transcripts.

Here's an example create command message:

<s:Envelope
 xmlns:s="http://www.w3.org/2003/05/soap-envelope"
 xmlns:wsa="http://schemas.xmlsoap.org/ws/2004/08/addressing"
 xmlns:wsman="http://schemas.dmtf.org/wbem/wsman/1/wsman.xsd">
 <s:Header>
 <wsa:To>
 http://localhost:80/wsman
 </wsa:To>
 <wsman:ResourceURI s:mustUnderstand="true">
 http://schemas.microsoft.com/wbem/wsman/1/windows/shell/cmd
 </wsman:ResourceURI>
 <wsa:ReplyTo>
 <wsa:Address s:mustUnderstand="true">
 http://schemas.xmlsoap.org/ws/2004/08/addressing/role/anonymous
 </wsa:Address>
 </wsa:ReplyTo>
 <wsa:Action s:mustUnderstand="true">
 http://schemas.microsoft.com/wbem/wsman/1/windows/shell/Command
 </wsa:Action>
 <wsman:MaxEnvelopeSize s:mustUnderstand="true">153600</wsman:MaxEnvelopeSize>
 <wsa:MessageID>
 uuid:F8671978-E928-49DA-ADB8-5BF97EDD9535</wsa:MessageID>
 <wsman:Locale xml:lang="en-US" s:mustUnderstand="false" />
 <wsman:SelectorSet>
 <wsman:Selector Name="ShellId">
 uuid:0A442A7F-4627-43AE-8751-900B509F0A1F
 </wsman:Selector>
 </wsman:SelectorSet>
 <wsman:OptionSet xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
 <wsman:Option Name="WINRS_CONSOLEMODE_STDIN">TRUE</wsman:Option>
 <wsman:Option Name="WINRS_SKIP_CMD_SHELL">FALSE</wsman:Option>
 </wsman:OptionSet>
 <wsman:OperationTimeout>PT60.000S</wsman:OperationTimeout>
 </s:Header>
 <s:Body>
 <rsp:CommandLine
 xmlns:rsp="http://schemas.microsoft.com/wbem/wsman/1/windows/shell">
 <rsp:Command>del</rsp:Command>
 <rsp:Arguments>/p</rsp:Arguments>
 <rsp:Arguments>
 d:\temp\out.txt
 </rsp:Arguments>
 </rsp:CommandLine>
 </s:Body>
</s:Envelope>

Oh yeah. Thats good stuff. This runs del /p d:\temp\out.txt.

Powershell over WinRM

When you invoke a command over WinRM, you are running inside of a cmd.exe style shell. Just as you would inside a local cmd.exe, you can always run powershell.exe and pass it commands. Why would anyone ever do this? Usually its because they are using a cross platform WinRM library and its just the only way to do it.

There are popular libraries written for ruby, python, java, Go and others. Some of these abstract the extra powershell.exe call and make it feel like a true native powershell repl experience. The fact is that this works quite well and so why bothering implementing a separate protocol? As I'll cover in a bit, PSRP is much more complicated than vanila WSMV so if you can get away with the simpler protocol, great.

The limitations of WinRM

There are a few key limitations with WinRM. Many of these limitations are the same limitations involved with cmd.exe:

Multiple shells

You have to open two shells (processes). First the command shell and then startup a powershell instance. This can be a performance suck especially if you need to run several commands.

Maximum command length

The command line length is limited to 8k inside cmd.exe. Now you may ask, why in the world would you want to issue a command greater than 8192 characters? There are a couple common use cases here:

  1. You may have a long script (not just a single command) you want to run. However, this script is typically fed to the -command or -EncodedCommand argument of powershell.exe so this entire script needs to stay within the 8k threshold. Why not just run the script as a file? Ha!...Glad you asked.
  2. WinRM has no native means of copying files like SCP. So the common method of copying files via WinRM is to base64 encode a file's contents and create a command that appends 8k chunks to a file.

#2 is what sparked my interest in all of this. I just wanted to copy a damn file, So Shawn Neal, Fletcher Nichol and I wrote a ruby gem that leveraged WinRM to do just that. It basically does this alot:

"a whole bunch of base64 text" >> c:\some\file.txt

It turns out that 8k is not a whole lot of data and if you want to copy hundreds of megabytes or more, grab a book. We added some algorithms to make this as fast as possible like compressing multiple files before transferring and extracting them on the other end. However, you just cant get around the 8k transfer size and no performance trick is gonna make that fast.

More Powershell streams than command streams

Powershell supports much more than just stdout and stderr. Its got progress, verbose, etc, etc. The WSMV protocol has no rules for transmitting these other streams. So this means all streams other than the output stream is sent on stderr.

This can confuse some WinRM libraries and cause commands that are indeed successful to "appear" to fail. The trick is to "silence" these streams. For example the ruby WinRM gem prepends all powershell scripts with:

$ProgressPreference = "SilentlyContinue"

Talking with Windows Nano

The ruby WinRM gem uses the -EncodedCommand to send powershell command text to powershell.exe. This is a convenient way of avoiding quote hell and base64ing text that will be transferred inside XML markup. Well Nano's powershell.exe has no EncodedCommand argument and so the current ruby WinRM v1 gem cannot talk powershell with Windows Nano Server. Well that simply can't be. We have to be able to talk to Nano.

Introducing PSRP

So without further ado let me introduce PSRP. PSRP supports many message types for extracting all sorts of metadata about runspaces and commands. A full implementation of PSRP could create a rich REPL experience on non windows platforms. However in this post I'm gonna limit the discussion to messages involved in running commands and receiving their output.

As I mentioned before, PSRP cannot exist without WinRM. I did not just mean that in a philosophical sense, it literally sits on top of the WSMV protocol. Its sort of a protocol inside a protocol. Running commands and receiving their response includes the same exchange illustrated above and issuing the same WSMV messages. The key differences is that instead of issuing commands in these messages in plain text and recieving simple base64 encoded raw text output, the powershell commands are packaged as a binary PSRP message (or sequence of message fragments) and the response includes one or more binary fragments that are then "defragmented" into a single binary message.

PSRP Message Fragment

A complete WSMV SOAP envelope can only be so big. This size limitation is specified on the server via the MaxEnvelopeSizeKB setting. This defaults to 512 on 2012R2 and Nano server. So a very large pipeline script or a very large pipeline output must be split into fragments.

The PSRP spec illustrates a fragment as:

All fragments have an object id representing the message being fragmented and each fragment of that object will have incrementing fragment ids starting at 0. E and S are each a single bit flag that indicates if the fragment is an End fragment and if it is a Start fragment. So if an entire message fits into one fragment, both E and S will be 1.

The blob (the interesting stuff) is the actual PSRP message and of course the blob length of the blob in bytes. So the idea here is that you chain the blobs ob all fragments with the same object id in the order of fragment id and that aggregated blob is the PSRP message.

Here is an implementation of a message fragment written in ruby and here is how we unwrap several fragments into a message.

PSRP messages

There are 41 different types of PSRP messages. Here is the basic structure as illustrated in the PSRP spec:

Destination signifies who the message is for: client or server. Message type is a integer representing which of the 41 possible message types this is and RPID and PID both represent runspace_id and pipeline_id respectively. The data has the "meat" of the message and its structure is determined by the message type. The data is XML. Many powershellers are familiar with CLIXML. Thats the basic format of the message data. So in the case of a create_pipeline message, this will include the CLIXML representation of the powershell cmdlets and arguments to run. It can be quite verbose but always beautiful. The symmetric nature of XML really shines here.

Here is an implementation of PSRP message in ruby.

A "partial" implementation in Ruby

So as far as I am aware, the WinRM ruby gem has the first open source implementation of PSRP. Its not officially released yet, but the source is fully available and works (at least integration tests are passing). Why am I labeling it a "partial" implementation?

As I mentioned earlier, PSRP provides many message structures for listing runspaces, commands and gathering lots of metadata. The interests of the WinRM gem are simple and aims to adhere to the same basic interface it uses to issue WSMV messges (however we have rewritten the classes and methods for v2). Essentially we want to provide an SSH like experience where a user issues a command string and gets back string based standard output and error streams as well as an exit code. This really is a "dumbed down" rendering of what PSRP is capable of providing.

The possibilities are very exciting and perhaps we will add more true powershell REPL features in the future but today when one issues a powershell script, we are basically constructing CLIXML that emits the following command:

Invoke-Expression -Command "your command here" | Out-String -Stream

This means we do not have to write a CLIXML serializer/deserializer but we reap most of the benefits of running commands directly in powershell. No more multi shell lag, no more comand length limitations and hello Nano Server. In fact, our repo provides a Vagrantfile that provisions Windows Nano server for running integration tests.

Give it a try!

I have complete confidence that there are major flaws in the implementation as it is now. I've been testing all along the way but I'm just about to start really putting it through the ringer. I can guarantee that Write-Host "Hello World!" works flawlessly. The Hello juxtaposed starkly against the double pointed 'W' and ending in the minimalistic line on top of a dot (!) is pretty amazing. The readme in the winrm-v2 branch has been updated to document the code as it stands now and assuming you have git, ruby and bundler installed, here is a quick rundown of how to run some powershell using the new PSRP implementation:

git clone https://github.com/WinRb/WinRM
git fetch
git checkout winrm-v2
bundle install
bundle exec irb

require 'winrm'
opts = { 
  endpoint: "http://myhost:5985/wsman",
  user: 'administrator',
  password: 'Pass@word1'
}
conn = WinRM::Connection.new(opts)
conn.shell(:powershell) do |shell|
  shell.run('$PSVersionTable') do |stdout, stderr|
    STDOUT.print stdout
    STDERR.print stderr
  end
end

The interfaces are not entirely finalized so things may still change. The next steps are to refactor the winrm-fs and winrm-elevated gems to use this new winrm gem and also make sure that it works with vagrant and test-kitchen. I cant wait to start collecting benchmark data comparing file copy speeds using this new version and the one in use today!

Certificate (password-less) based authentication in WinRM by Matt Wrock

This week the WinRM ruby gem version 1.8.0 released adding support for certificate authentication. Many thanks to the contributions of @jfhutchi and @fgimenezm that make this possible. As I set out to test this feature, I explored how certificate authentication works in winrm using native windows tools like powershell remoting. My primary takeaway was that it was not at all straightforward to setup. If you have worked with similar authentication setups on linux using SSH commands, be prepared for more friction. Most of this is simply due to the lack of documentation and google results (well now there is one more). Regardless, I still think that once setup, authentication via certificates is a very good thing and many are not aware that this is available in WinRM.

This post will walk through how to configure certificate authentication, enumerate some of the "gotchas" and pitfalls one may encounter along the way and then explain how to use certificate authentication using Powershell Remoting as well as via the WinRM ruby gem which opens up the possibility of authenticating from a linux client to a Windows WinRM endpoint.

Why should I care about certificate based authentication?

First lets examine why certificate authentication has value. What's wrong with usernames and passwords? In short, certificates are more secure. I'm not going to go too deep here but here are a few points to consider:

  • Passwords can be obtained via brute force. You can protect against this by having longer, more complex passwords and changing them frequently. Very few actually do that, and even if you do, a complex password is way easier to break than a certificate. Its nearly impossible to brute force a private key of sufficient strength.

  • Sensitive data is not being transferred over the wire. You are sending a public key and if that falls into the wrong hands, no harm done.

  • There is a stronger trail of trust establishing that the person who is seeking authentication is in fact who they say they are given the multi layered process of having a generated certificate signed by a trusted certificate authority.

Its still important to remember that nothing may be able to protect us from sophisticated aliens or time traveling humans from the future. No means of security is impenetrable.

Not as convenient as SSH keys

So one reason some like to use certificates over passwords in SSH scenarios is ease of use. There is a one time setup "cost" of sending your public key to the remote server, but:

  1. SSH provides command line tools that make this fairly straight forward to setup from your local environment.

  2. Once setup, you just need to initiate an ssh session to the remote server and you don't have to hassle with entering a password.

Now the underlying cryptographic and authentication technology is no different using winrm, but both the initial setup and the "day to day" use of using the certificate to login is more burdensome. The details of why will become apparent throughout this post.

One important thing to consider though is that while winrm certificate authentication may be more burdensome, I don't think the primary use case is for user interactive login sessions (although that's too bad). In the case of automated services that need to interact with remote machines, these "burdens" simply need to be part of the connection automation and its just a non sentient cpu that does the suffering. Lets just hope they won't hold a grudge once sentience is obtained.

High level certificate authentication configuration overview

Here is a run down of what is involved to get everything setup for certificate authentication:

  1. Configure SSL connectivity to winrm on the endpoint

  2. Generate a user certificate used for authentication

  3. Enable Certificate authentication on the endpoint. Its disabled by default for server auth and enabled on the client side.

  4. Add the user certificate and its issuing CA certificate to the certificate store of the endpoint

  5. Create a user mapping in winrm with the thumbprint of the issuing certificate on the endpoint.

I'll walk through each of these steps here. When the above five steps are complete, you should be able to connect via certificate authentication using powershell remoting or using the ruby or python open source winrm libraries.

Setting up the SSL winrm listener

If you are using certificate authentication, you must use a https winrm endpoint. Attempts to authenicate with a certificate using http endpoints will fail. You can setup SSL on the endpoint with:

$ip="192.168.137.169" # your ip might be different
$c = New-SelfSignedCertificate -DnsName $ip `
                               -CertStoreLocation cert:\LocalMachine\My
winrm create winrm/config/Listener?Address=*+Transport=HTTPS "@{Hostname=`"$ip`";CertificateThumbprint=`"$($c.ThumbPrint)`"}"
netsh advfirewall firewall add rule name="WinRM-HTTPS" dir=in localport=5986 protocol=TCP action=allow

Generating a client certificate

Client certificates have two key requirements:

  1. An Extended Key Usage of Client Authentication

  2. A Subject Alternative Name with the UPN of the user.

Only ADCS certificates work from Windows 10/2012 R2 clients via powershell remoting

This was the step that I ended up spending the most time on. I continued to receive errors saying my certificate was malformed:

new-PSSession : The WinRM client cannot process the request. If you are using a machine certificate, it must contain a DNS name in the Subject Alternative Name extension or in the Subject Name field, and no UPN name. If you are using a user certificate, the Subject Alternative Name extension must contain a UPN name and must not contain a DNS name. Change the certificate structure and try the request again.

I was trying to authenticate from a windows 10 client using powershell remoting. I don't typically work or test in a domain environment and don't run an Active Directory Certificate Services authority. So I wanted to generate a certificate using either New-SelfSignedCertificate or OpenSSL.

In short here is the bump I hit: powershell remoting from a windows 10 or windows 2012 R2 client failed to authenticate with certificates generated from OpenSSL or New-SelfSignedCertificate. However these same certificates succeed to authenticate from windows 7 or windows 2008 R2. They only worked on Windows 10 and 2012 R2 if I used the ruby WinRM gem instead of powershell remoting. Note that while I tested on windows 10 and 2012 R2, I'm sure that windows 8.1 suffers the same issue. The only certificates I got to work on windows 10 and 2012 R2 via powershell remoting were created via an Active Directory Certificate Services Enterprise CA .

So unless I can find out otherwise, it seems that you must have access to an Enterprise root CA in Active Directory Certificate Services and have client certificates issued in order to use certificate authentication from powershell remoting on these later windows versions. If you are using ADCS, the stock client template should work.

Generating client certificates via OpenSSl

As stated above, certificates generated using OpenSSL or New-SelfSignedCertificate did not work using powershell remoting from windows 10 or 2012 R2. However, if you are using a previous version of windows or if you are using another client (like the ruby or python libraries), then these other non-ADCS methods will work fine and do not require the creation of a domain controller and certificate authority servers.

If you do not already have OpenSSL tools installed, you can get them via chocolatey:

cinst openssl.light -y

Then you can run the following powershell to generate a correctly formatted user certificate which was adapted from this bash script:

function New-ClientCertificate {
  param([String]$username, [String]$basePath = ((Resolve-Parh .).Path))

  $OPENSSL_CONF=[System.IO.Path]::GetTempFileName()

  Set-Content -Path $OPENSSL_CONF -Value @"
  distinguished_name = req_distinguished_name
  [req_distinguished_name]
  [v3_req_client]
  extendedKeyUsage = clientAuth
  subjectAltName = otherName:1.3.6.1.4.1.311.20.2.3;UTF8:$username@localhost
"@

  $user_path = Join-Path $basePath user.pem
  $key_path = Join-Path $basePath key.pem
  $pfx_path = Join-Path $basePath user.pfx

  openssl req -x509 -nodes -days 3650 -newkey rsa:2048 -out $user_path -outform PEM -keyout $key_path -subj "/CN=$username" -extensions v3_req_client 2>&1

  openssl pkcs12 -export -in $user_path -inkey $key_path -out $pfx_path -passout pass: 2>&1

  del $OPENSSL_CONF
}

This will output a certificate and private key file both in base64 .pem format and additionally a .pfx formatted file.

Generating client certificates via New-SelfSignedCertificate

If you are on windows 10 or server 2016, then you should have a more advanced version of the New-SelfSignedCertificate cmdlet - more advaced than what shipped with windows 2012 R2 and 8.1. Here is the command to generate the script:

New-SelfSignedCertificate -Type Custom `
                          -Container test* -Subject "CN=vagrant" `
                          -TextExtension @("2.5.29.37={text}1.3.6.1.5.5.7.3.2","2.5.29.17={text}upn=vagrant@localhost") `
                          -KeyUsage DigitalSignature,KeyEncipherment `
                          -KeyAlgorithm RSA `
                          -KeyLength 2048

This will add a  certificate for a vagrant user to the personal LocalComputer folder in the certificate store.

Enable certificate authentication

This is perhaps the simplest step. By default, certificate authentication is enabled for clients and disabled for server. So you will need to enable it on the endpoint:

Set-Item -Path WSMan:\localhost\Service\Auth\Certificate -Value $true

Import the certificate to the appropriate certificate store locations

If you are using powershell remoting, the user certificate and its private key should be in the My directory of either the LocalMachine or the CurrentUser store on the client. If you are using a cross platform library like the ruby or python library, the cert does not need to be in the store at all on the client. However, regardless of client implementation, it must be added to the server certificate store.

Importing on the client

As stated above, this is necessary for powershell remoting clients. If you used ADCS or New-SelfSignedCertificate, then the generated certificate is added automatically. However if you used OpenSSL, you need to import the .pfx yourself:

Import-pfxCertificate -FilePath user.pfx `
                      -CertStoreLocation Cert:\LocalMachine\my

Importing on the server

There are two steps to timporting the certificate on the endpoint:

  1. The issuing certificate must be present in the Trusted Root Certification Authorities of the LocalMachine store

  2. The client certificate public key must be present in the Trusted People folder of the LocalMachine store

Depending on your setup, the issuing certificate may already be in the Trusted Root location. This is the certificate used to issue the client cert. If you are using your own enterprise certificate authority or a publicly valid CA cert, its likely you already have this in the trusted roots. If you used OpenSSL or New-SelfSignedCertificate then the user certificate was issued by itself and needs to be imported.

If you used OpenSSL, you already have the .pem public key. Otherwise you can export it:

Get-ChildItem cert:\LocalMachine\my\7C8DCBD5427AFEE6560F4AF524E325915F51172C |
  Export-Certificate -FilePath myexport.cer -Type Cert

This assumes that 7C8DCBD5427AFEE6560F4AF524E325915F51172C is the thumbprint of your issuing certificate. I guarantee that is an incorrect assumption.

Now import these on the endpoint:

Import-Certificate -FilePath .\myexport.cer `
                   -CertStoreLocation cert:\LocalMachine\root
Import-Certificate -FilePath .\myexport.cer `
                   -CertStoreLocation cert:\LocalMachine\TrustedPeople

Create the winrm user mapping

This will declare on the endpoint: given a issuing CA, which certificates to allow access. You can potentially add multiple entries for different users or use a wildcard. We'll just map our one user:

New-Item -Path WSMan:\localhost\ClientCertificate `
         -Subject 'vagrant@localhost' `
         -URI * `
         -Issuer 7C8DCBD5427AFEE6560F4AF524E325915F51172C `
         -Credential (Get-Credential) `
         -Force

Again this assumes the issuing certificate thumbprint of the certificate that issued our user certificate is 7C8DCBD5427AFEE6560F4AF524E325915F51172C and we are allowing access to a local account called vagrant. Note that if your user certificate is self-signed, you would use the thumbprint of the user certificate itself.

Using certificate authentication

This completes the setup. Now we should actually be able to login remotely to the endpoint. I'll demonstrate this first using powershell remoting and then ruby.

Powershell remoting

C:\dev\WinRM [master]>Enter-PSSession -ComputerName 192.168.137.79 `
>> -CertificateThumbprint 7C8DCBD5427AFEE6560F4AF524E325915F51172C
[192.168.137.79]: PS C:\Users\vagrant\Documents>

Ruby WinRM gem

C:\dev\WinRM [master +3 ~0 -0 !]> gem install winrm
WARNING:  You don't have c:\users\matt\appdata\local\chefdk\gem\ruby\2.1.0\bin in your PATH,
          gem executables will not run.
Successfully installed winrm-1.8.0
Parsing documentation for winrm-1.8.0
Done installing documentation for winrm after 0 seconds
1 gem installed
C:\dev\WinRM [master +3 ~0 -0 !]> irb
irb(main):001:0> require 'winrm'
=> true
irb(main):002:0> endpoint = 'https://192.168.137.169:5986/wsman'
=> "https://192.168.137.169:5986/wsman"
irb(main):003:0> puts WinRM::WinRMWebService.new(
irb(main):004:1*   endpoint,
irb(main):005:1*   :ssl,
irb(main):006:1*   :client_cert => 'user.pem',
irb(main):007:1*   :client_key => 'key.pem',
irb(main):008:1*   :no_ssl_peer_verification => true
irb(main):009:1> ).create_executor.run_cmd('ipconfig').stdout

Windows IP Configuration


Ethernet adapter Ethernet:

   Connection-specific DNS Suffix  . : mshome.net
   Link-local IPv6 Address . . . . . : fe80::6c3f:586a:bdc0:5b4c%12
   IPv4 Address. . . . . . . . . . . : 192.168.137.169
   Subnet Mask . . . . . . . . . . . : 255.255.255.0
   Default Gateway . . . . . . . . . : 192.168.137.1

Tunnel adapter Local Area Connection* 12:

   Connection-specific DNS Suffix  . :
   IPv6 Address. . . . . . . . . . . : 2001:0:5ef5:79fd:24bc:3d4c:3f57:7656
   Link-local IPv6 Address . . . . . : fe80::24bc:3d4c:3f57:7656%14
   Default Gateway . . . . . . . . . : ::

Tunnel adapter isatap.mshome.net:

   Media State . . . . . . . . . . . : Media disconnected
   Connection-specific DNS Suffix  . : mshome.net
=> nil
irb(main):010:0>

Interested?

This functionality just became available in the winrm gem 1.8.0 this week. This gem is used by Vagrant, Chef and Test-Kitchen to connect to remote machines. However, none of these applications provide configuration options to make use of certificate authentication via winrm. My personal observation has been that nearly no one uses certificate authentication with winrm but that may be a false observation or a result of the fact that few no about this possibility.

If you are interested in using this in Chef, Vagrant or Test-Kitchen, please file an issue against their respective github repositories and make sure to @ mention me (@mwrock) and I'll see what I can do to plug this in or you can submit a PR yourself if so inclined.