Hosting a .Net 4 runtime inside Powershell v2 by Matt Wrock

My wish for the readers of this post is that you find this completely irrelevant and wonder why folks would wish to inflict powershell v2 on themselves now that we are on a much improved v5. However the reality is that many many machines are still running windows 7 and server 2008R2 without an upgraded powershell.

As I was working on Boxstarter 2.6 to support Chocolatey 0.9.9 which now ships as a .net 4 assembly, I had to be able to load it inside of Powershell 2 since I still want to support virgin win7/2008R2 environments. Without "help", this will fail because Powershell 2 hosts .Net 3.5. I really don't want to ask users to install an updated WMF prior to using Boxstarter because that violates the core mission of Boxstarter which is to setup a machine from scratch.

Adjusting CLR version system wide

So after some investigation I found several posts telling me what I already knew which included the following solutions:

  1. Upgrade to a WMF 3 or higher
  2. Create or edit a Powershell.exe.config file in C:\WINDOWS\System32\WindowsPowerShell\v1.0 setting the supportedRuntime to .net 4
  3. Edit the  hklm\software\microsoft\.netframework registry key to only use the latest CLR

I have already mentioned why option 1 was not an option. Options 2 and 3 are equally unpalatable if you do not "own" the system since both change system wide behavior. I just want to change the behavior when my application is running.

An application scoped solution

So after more digging I found an obscure, and seemingly undocumented environment variable that can impact the version of the .net runtime loaded: $env:COMPLUS_version. If you set this variable to "v4.0.30319" and then spawn a new process, that process will use the specified version of the .net runtime.

PS C:\Users\Administrator> $PSVersionTable

Name                           Value
----                           -----
CLRVersion                     2.0.50727.5420
BuildVersion                   6.1.7601.17514
PSVersion                      2.0
WSManStackVersion              2.0
PSCompatibleVersions           {1.0, 2.0}
SerializationVersion           1.1.0.1
PSRemotingProtocolVersion      2.1


PS C:\Users\Administrator> $env:COMPLUS_version="v4.0.30319"
PS C:\Users\Administrator> & powershell { $psVersionTable }

Name                           Value
----                           -----
PSVersion                      2.0
PSCompatibleVersions           {1.0, 2.0}
BuildVersion                   6.1.7601.17514
CLRVersion                     4.0.30319.17929
WSManStackVersion              2.0
PSRemotingProtocolVersion      2.1
SerializationVersion           1.1.0.1

A script that runs commands in .net 4

So given that this works, I created a Enter-DotNet4 command that allows one to run ad hoc scripts inside .net 4. Here it is:

function Enter-Dotnet4 {
<#
.SYNOPSIS
Runs a script from a process hosting the .net 4 runtile

.DESCRIPTION
This function will ensure that the .net 4 runtime is installed on the
machine. If it is not, it will be downloaded and installed. If running
remotely, the .net 4 installation will run from a scheduled task.

If the CLRVersion of the hosting powershell process is less than 4,
such as is the case in powershell 2, the given script will be run
from a new a new powershell process tht will be configured to host the
CLRVersion 4.0.30319.

.Parameter ScriptBlock
The script to be executed in the .net 4 CLR

.Parameter ArgumentList
Arguments to be passed to the ScriptBlock

.LINK
http://boxstarter.org

#>
    param(
        [ScriptBlock]$ScriptBlock,
        [object[]]$ArgumentList
    )
    Enable-Net40
    if($PSVersionTable.CLRVersion.Major -lt 4) {
        Write-BoxstarterMessage "Relaunching powershell under .net fx v4" -verbose
        $env:COMPLUS_version="v4.0.30319"
        & powershell -OutputFormat Text -ExecutionPolicy bypass -command $ScriptBlock -args $ArgumentList
    }
    else {
        Write-BoxstarterMessage "Using current powershell..." -verbose
        Invoke-Command -ScriptBlock $ScriptBlock -argumentlist $ArgumentList
    }
}

function Enable-Net40 {
    if(!(test-path "hklm:\SOFTWARE\Microsoft\.NETFramework\v4.0.30319")) {
        if((Test-PendingReboot) -and $Boxstarter.RebootOk) {return Invoke-Reboot}
        Write-BoxstarterMessage "Downloading .net 4.5..."
        Get-HttpResource "http://download.microsoft.com/download/b/a/4/ba4a7e71-2906-4b2d-a0e1-80cf16844f5f/dotnetfx45_full_x86_x64.exe" "$env:temp\net45.exe"
        Write-BoxstarterMessage "Installing .net 4.5..."
        if(Get-IsRemote) {
            Invoke-FromTask @"
Start-Process "$env:temp\net45.exe" -verb runas -wait -argumentList "/quiet /norestart /log $env:temp\net45.log"
"@
        }
        else {
            $proc = Start-Process "$env:temp\net45.exe" -verb runas -argumentList "/quiet /norestart /log $env:temp\net45.log" -PassThru 
            while(!$proc.HasExited){ sleep -Seconds 1 }
        }
    }
}

This will install .net 4.5 if not already installed and then spawn a new powershell process to run the given commands with the .net 4 runtime hosted.

Does not work in a remote shell

One scenario where this does not work is if you are remoted on a Powershell v2 machine. The .net4 CLR will almost immediately crash. My guess is that this is related to the fact that remote shells have an inherently different hosting model and run under wsmprovhost.exe or winrshost.exe.

The workaround for this in Boxstarter is to call the chocolatey.dll in a Scheduled Task instead of using Enter-DotNet4 when running remote.

Released Boxstarter 2.6 now with an embedded Chocolatey 0.9.10 Beta by Matt Wrock

Today I released Boxstarter 2.6. This release brings Chocolatey support up to the latest beta release of the Chocolatey core library. In March of this year, Chocolatey released a fully rewritten version 0.9.9. Prior to this release, Chocolatey was released as a Powershell module. Boxstarter intercepted every Chocolatey call and could easily maintain state as both chocolatey and boxstarter coexisted inside the same powershell process. With the 0.9.9 rewrite, Chocolatey ships as a .Net executable and creates a separate powershell process to run each package. So there has been lot of work to create a different execution flow requiring changes to almost every aspect of Boxstarter. While this may not introduce new boxstarter features, it does mean one can now take advantage of all new features in Chocolatey today.

A non breaking release

This release should not introduce any breaking functionality from previous releases. I have tested many different usage scenarios. I also increased the overall unit and functional test coverage of boxstarter in this release to be able to more easily validate the impact of the Chocolatey overhaul. That all said, there has been alot of changes to how boxstarter and chocolatey interact and its always possible that some bugs have hidden themselves away. So please report issues on github as soon as you encounter problems and I will do my best to resolve problems quickly. Pull requests are welcome too!

Where is Chocolatey?

One thing that may come as a surprise to some is that Boxstarter no longer installs Chocolatey. One may wonder, how can this be? Well, Chocolatey now exposes its core functionality via an API (chocolatey.dll). So if you are setting up a new machine with boxstarter, you will still find the Chocolatey repository in c:\ProgramData\Chocolatey, but no choco.exe. Further the typical chocolatey commands: choco, cinst, cup, etc will not be recognized commands on the command line after the Boxstarter run unless you explicitly install Chocolatey.

You may do just that: install chocolatey inside a boxstarter package if you would like the machine setup to include a working chocolatey command line.

iex ((new-object net.webclient).DownloadString('https://chocolatey.org/install.ps1'))

You'd have to be nuts NOT to want that.

Running 0.9.10-beta-20151210

When I say Boxstarter is running the latest Chocolatey, I really mean the latest prerelease. Why? That has a working version of the WindowsFeatures chocolatey feed. When the new version of Chocolatey was released, The WindowsFeature source feed did not make it in. However it has been recently added and because it is common to want to toggle windows features when setting up a machine and many Boxstarter packages make use of it, I consider it an important feature to include.


Fixing - WinRM Firewall exception rule not working when Internet Connection Type is set to Public by Matt Wrock

You may have seen the following error when either running Enable-PSRemoting or Set-WSManQuickConfig:

Set-WSManQuickConfig : <f:WSManFault xmlns:f="http://schemas.microsoft.com/wbem/wsman/1/wsmanfault" Code="2150859113"
Machine="localhost"><f:Message><f:ProviderFault provider="Config provider"
path="%systemroot%\system32\WsmSvc.dll"><f:WSManFault xmlns:f="http://schemas.microsoft.com/wbem/wsman/1/wsmanfault"
Code="2150859113" Machine="win81"><f:Message>WinRM firewall exception will not work since one of the network
connection types on this machine is set to Public. Change the network connection type to either Domain or Private and
try again. </f:Message></f:WSManFault></f:ProviderFault></f:Message></f:WSManFault>
At line:1 char:1
+ Set-WSManQuickConfig -Force
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : InvalidOperation: (:) [Set-WSManQuickConfig], InvalidOperationException
    + FullyQualifiedErrorId : WsManError,Microsoft.WSMan.Management.SetWSManQuickConfigCommand

This post will explain how to get around this error. There are different ways to do this depending on your operating system version. Windows 8/2012 workarounds are fairly sane while windows 7/2008R2 may seem a bit obtuse.

This post is inspired by an experience I had this week trying to get a customer's Chef node able to connect via WinRM over SSL. Her test node was running Windows 7 and she was getting the above error when trying to enable WinRM. Windows 7 has no way to change the connection Type via a native powershell cmdlet. I had done this via the commandline before on Windows 7 but did not have the commands handy. Further, it had been so long since changing the connection type on windows 7 via the GUI, I had to fire up my own win 7 VM and run through it just so I could relay propper instructions to this very patient customer.

So I write this to run through the different nuances of connection types on different operating systems but primarily to have a known place on the internet where I can stash the commands.

TL;DR for Windows 7 or 2008R2 Clients:

If you dont care about anything else other than getting past this error on windows 7 or 2008R2 without ceremonial pointing and clicking, simply run these commands:

$networkListManager = [Activator]::CreateInstance([Type]::GetTypeFromCLSID([Guid]"{DCB00C01-570F-4A9B-8D69-199FDBA5723B}")) 
$connections = $networkListManager.GetNetworkConnections() 

# Set network location to Private for all networks 
$connections | % {$_.GetNetwork().SetCategory(1)}

This works on windows 8/2012 and up as well but there are much friendlier commands you can run instead.  Unless you are one partial to GUIDs.

Connection Types - what does it mean?

Windows provide different connection type profiles (or Network Locations) each with different levels of restrictions on what connections can be granted to the local computer on the network.

I have personally always found these types to be confusing yet well meaning. You are perhaps familiar with the message presented the first time you logon to a machine asking you if you would like the computer to be discoverable on the internet. If you choose no, the network interface is given a public internet connection profile. If you choose "yes" then it is private. For me the confusion is that I equate "public" with "publicly accessible" but here the opposite applies.

Public network locations have Network Discovery turned off  and restrict your firewall for some applications. You cannot create or join Homegroups with this setting. WinRM firewall exception rules also cannot be enabled on a public network. Your network location must be private in order for other machines to make a WinRM connection to the computer.

Domain Networks

If your computer is on a domain, that is an entirely different network location type. On a domain network, the accessibility of the machine is governed by your domain policies. This network location type is automatically chosen if your machine is part of an Active Directory domain.

Working around Public network restrictions on Windows 8 and up

When enabling WinRM, client SKUs of windows (8, 8.1, 10) expose an additional setting that allow the machine to be discoverable over WinRM publicly but only on the same subnet. By using the -SkipNetworkProfileCheck switch of Enable-PSRemoting or Set-WSManQuickConfig you can still allow connections to your computer but those connections must come from other machines on the same subnet.

Enable-PSRemoting -SkipNetworkProfileCheck

So this can work for local VMs but will still be restrictive for cloud based VMs.

Changing the Network Location

Once you answer yes or no the initial question of whether you want to be discovered, you are never prompted again, but you can change this setting later.

This is a family friendly blog so I am not going to cover how to change the Network Location via the GUI. It can be done but you are a dirty person for doing so (full disclosure - I have been guilty of doing this).

Windows 8/2012 and up

Powershell version 3 and later expose cmdlets that allow you to see and change your Network Location. Get-NetConnectionProfile shows you the network location of all network interfaces:

PS C:\Windows\system32> Get-NetConnectionProfile


Name             : Network  2
InterfaceAlias   : Ethernet
InterfaceIndex   : 3
NetworkCategory  : Private
IPv4Connectivity : Internet
IPv6Connectivity : LocalNetwork

Note the NetworkCategory above. The Network Location is private.

Use the Set-NetConnectionProfile to change the location type:

Set-NetConnectionProfile -InterfaceAlias Ethernet -NetworkCategory Public

or

Get-NetConnectionProfile | Set-NetConnectionProfile -NetworkCategory Private

The later is convenient if you want to ensure that all network interfaces are set to a particular location.

Windows 7 and 2008R2

You will not have the above cmdlets available on Windows 7 or 2008R2. You can still change the location on the command line but the commands are far less intuitive. As shown in the tl;dr, here is the command:

$networkListManager = [Activator]::CreateInstance([Type]::GetTypeFromCLSID([Guid]"{DCB00C01-570F-4A9B-8D69-199FDBA5723B}")) 
$connections = $networkListManager.GetNetworkConnections() 

# Set network location to Private for all networks 
$connections | % {$_.GetNetwork().SetCategory(1)}

First we get a reference to a COM instance of an INetworkListManager which naturally has a Class ID of DCB00C01-570F-4A9B-8D69-199FDBA5723B. We then grab all the network connections and finally set them all to the desired location:

  • 0 - Public
  • 1 - Private
  • 2 - Domain

Understanding and troubleshooting WinRM connection and authentication: a thrill seeker's guide to adventure by Matt Wrock

Connecting to a remote windows machine is often far more difficult than one would have expected. This was my experience years ago when I made my first attempt to use powershell remoting to connect to an Azure VM. At the time, powershell 2 was the hotness and many were talking up its remoting capabilities. I had been using powershell for about a year at the time and thought I'd give it a go. It wasn't simple at all and took a few hours to finally succeed.

Now armed with 2012R2 and more knowledge its simpler but lets say you are trying to connect from a linux box using one of the open source WinRM ports, there are several gotchas.

I started working for Chef about six weeks ago and it is not at all uncommon to find customers and fellow employees struggling with failure to talk to a remote windows node. I'd like to lay out in this post some of the fundamental moving parts as well as the troubleshooting decision tree I often use to figure out where things are wrong and how to get connected.

I'll address cross platform scenarios using plain WinRM, powershell remoting from windows and some Chef specific tooling using the knife-windows gem.

Connecting and Authenticating

In my experience these are the primary hurdles to WinRM sweet success. First is connecting. Can I successfully establish a connection on a WinRM port to the remote machine? There are several things to get in the way here. Then a yak shave or two later you get past connectivity but are not granted access. What's that you say? You are signing in with admin credentials to the box?...I'm sorry say that again?...huh?...I just can't hear you.

TL;DR - A WinRM WTF checklist:

I am going to go into detail in this post on the different gotchas and their accompanying settings needed to successfully connect and execute commands on a remote windows machine using WinRM. However, if you are stuck right now and don't want to sift through all of this, here is a cheat sheet list of things to set to get you out of trouble:

On the remote windows machine:

  • Run Enable-PSRemoting
  • Open the firewall with: netsh advfirewall firewall add rule name="WinRM-HTTP" dir=in localport=5985 protocol=TCP action=allow
  • Accessing via cross platform tools like chef, vagrant, packer, ruby or go? Run these commands:
winrm set winrm/config/client/auth '@{Basic="true"}'
winrm set winrm/config/service/auth '@{Basic="true"}'
winrm set winrm/config/service '@{AllowUnencrypted="true"}'

Note: DO NOT use the above winrm settings on production nodes. This should be used for tets instances only for troubleshooting WinRM connectivity.

This checklist is likely to address most trouble scenarios when accessing winrm over HTTP. If you are still stuck or want to understand this domain more, please read on.

Barriers to entry

Lets talk about connectivity first. Here are the key issues that can prevent connection attempts to a WinRM endpoint:

  • The Winrm service is not running on the remote machine
  • The firewall on the remote machine is refusing connections
  • A proxy server stands in the way
  • Improper SSL configuration for HTTPS connections

We'll address each of these scenarios but first...

How can I know if I can connect?

It can often be unclear whether we are fighting a connection or authentication problem. So I'll point out how you can determine if you can eliminate connectivity as a potential issue.

On Mac/Linux:

$ nc -z -w1 <IP or host name> 5985;echo $?

This uses netcat available on the mac and most linux distros. Assuming you are using the default HTTP based WinRM port 5985 (more on determining the correct port in just a bit), if the above returns 0, you know you are getting through to a listening WinRM endpoint on the other side.

On Windows:

Test-WSMan -ComputerName <IP or host name>

Again this assumes you are trying to connect over the default HTTP WinRM port (5985), if not add -UseSSL. This should return some non-error response that looks something like:

wsmid         : http://schemas.dmtf.org/wbem/wsman/identity/1/wsmanidentity.xsd
ProtocolVersion : http://schemas.dmtf.org/wbem/wsman/1/wsman.xsd
ProductVendor   : Microsoft Corporation
ProductVersion  : OS: 0.0.0 SP: 0.0 Stack: 3.0

WinRM Ports

The above commands used the default WinRM HTTP port to attempt to connect to the remote WinRM endpoint - 5985. WinRM is a SOAP based HTTP protocol.

Side Note: In 2002, I used to car pool to my job in Sherman Oaks California with my friend Jimmy Bizzaro and would kill time by reading "Programming Web Services with SOAP" an O'Reilly publication. This was cutting edge, cool stuff. Java talking to .net, Java talking to Java but from different machines. This was the future. REST was done in a bed or on a toilet. So always remember, today's GO and Rust could be tomorrow's soap.

Anyhoo...WinRM can talk HTTP and HTTPS. The default ports are 5985 and 5986 respectfully. However the default ports can be changed. Now usually the change is driven by network address translation. Sure these ports can be changed locally too, but in my experience if you need to access WinRM on ports other than 5985 or 5986 its usually to accommodate NAT. So check your Virtualbox NAT config or your Azure or EC2 port mappings to see if there is a port forwarding to 5985/6 on the VM. Those would be the ports you need to use. The Test-WSMan cmdlet also takes a -port parameter where you can provide a non standard WinRM port.

So now you know the port to test but you are getting a non 0 netcat response or an error thrown from Test-WSMan. Now What?

Is WinRM turned on?

This is the first question I ask. If winrm is not listening for requests, then there is nothing to connect to. There are a couple ways to do this. What you usually do NOT want to do is simply start the winrm service. Not that that is a bad thing, its just not likely going to be enough. The two best ways to "turn on" WinRM are:

winrm quickconfig

or the powershell approach:

Enable-PSRemoting

For default windows 2012R2 installs (not altered by group policy), this should be on by default. However windows 2008R2 and client SKUs will be turned off until enabled.

Foiled by Public Network Location

You may get the following error when enabling winrm:

Set-WSManQuickConfig : <f:WSManFault xmlns:f="http://schemas.microsoft.com/wbem/wsman/1/wsmanfault" Code="2150859113"
Machine="localhost"><f:Message><f:ProviderFault provider="Config provider"
path="%systemroot%\system32\WsmSvc.dll"><f:WSManFault xmlns:f="http://schemas.microsoft.com/wbem/wsman/1/wsmanfault"
Code="2150859113" Machine="win81"><f:Message>WinRM firewall exception will not work since one of the network
connection types on this machine is set to Public. Change the network connection type to either Domain or Private and
try again. </f:Message></f:WSManFault></f:ProviderFault></f:Message></f:WSManFault>
At line:1 char:1
+ Set-WSManQuickConfig -Force
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : InvalidOperation: (:) [Set-WSManQuickConfig], InvalidOperationException
    + FullyQualifiedErrorId : WsManError,Microsoft.WSMan.Management.SetWSManQuickConfigCommand

You need to set the Network Location to Private. I have written a post devoted to Internet Connection Type. There are different ways to set the location on different windows versions. You can view the details in the above post but the one that is the most obscure but universally works across all versions is:

$networkListManager = [Activator]::CreateInstance([Type]::GetTypeFromCLSID([Guid]"{DCB00C01-570F-4A9B-8D69-199FDBA5723B}")) 
$connections = $networkListManager.GetNetworkConnections() 

# Set network location to Private for all networks 
$connections | % {$_.GetNetwork().SetCategory(1)}


Wall of fire

In some circles called a firewall. This can often be a blocker. For instance, while winrm is on by default on 2012R2, its firewall rules will block public traffic from outside its own subnet. So if you are trying to connect to a server in EC2 or Azure for example, opening this firewall restriction is important and can be done with:

HTTP:

netsh advfirewall firewall add rule name="WinRM-HTTP" dir=in localport=5985 protocol=TCP action=allow

HTTPS:

netsh advfirewall firewall add rule name="WinRM-HTTPS" dir=in localport=5986 protocol=TCP action=allow

This also affects client SKUs which by default do not open the firewall to any public traffic. If you are on a client version of windows 8 or higher, you can also use the -SkipNetworkProfileCheck switch when enabling winrm via Enable-PSRemoting which will at least open public traffic to the local subnet and may be enough if connecting to a machine on a local hypervisor.

Proxy Servers

As already stated, WinRM runs over http. Therefore if you have a proxy server sitting between you and the remote machine you are trying to connect to, you need to make sure that the request goes through that proxy server. This is usually not an issue if you are on a windows machine and using a native windows API like powershell remoting or winrs to connect. They will simply use the proxy settings in your internet settings.

Ruby tooling like Chef, Vagrant, or others uses a different mechanism. If the tool is using the WinRM ruby gem, like chef and vagrant do, they rely on the HTTP_PROXY environment variable instead of the local system's internet settings. As of knife-windows 1.1.0, the http_proxy settings in your knife.rb config file will make its way to the HTTP_PROXY environment variable. You can manually set this as follows:

Mac/Linux:

$ export HTTP_PROXY="http://<proxy server>:<proxy port>/"

Windows Powershell:

$env:HTTP_PROXY="http://<proxy server>:<proxy port>/"

Windows Cmd:

exit

Friends don't let friends use cmd.exe and you are my friend.

SSL

I'm saving SSL for the last connection issue because it is more involved (why folks often opt for HTTP over the more secure HTTPS). There is extra configuration required both on both the remote and local side and that can vary by local platform. Lets first discuss what must be done on the remote WinRM endpoint.

Create a self signed certificate

Assuming you have not purchased a SSL certificate from a valid certificate authority, you will need to generate a self signed certificate. If your are on a 2012R2 windows os version or later, this is trivial:

$c = New-SelfSignedCertificate -DnsName "<IP or host name>" -CertStoreLocation cert:\LocalMachine\My

Read ahead for issues with New-SelfSignedCertificate and certificate verification with openssl libraries.

Creating a HTTPS WinRM listener

Now WinRM needs to be configured to respond to https requests. This is done by adding an https listener and associating it with the thumbprint of the self signed cert you just created.

winrm create winrm/config/Listener?Address=*+Transport=HTTPS "@{Hostname=`"<IP or host name>`";CertificateThumbprint=`"$($c.ThumbPrint)`"}"

Adding firewall rule

Finally enable winrm https requests through the firewall:

netsh advfirewall firewall add rule name="WinRM-HTTPS" dir=in localport=5986 protocol=TCP action=allow

SSL client configuration

At this point you should be able to reach a listening WinRM endpoint on the remote server. On a mac or linux box, a netcat check on the https winrm port should be successful:

$ nc -z -w1 <IP or host name> 5986;echo $?

On Windows, runing Test-NetConnection (a welcome alternative to telnet on windows 8/2012 or higher) should show an open TCP port:

C:\> Test-netConnection <IP> -Port 5986

ComputerName           : <IP>
RemoteAddress          : <IP>
RemotePort             : 5986
InterfaceAlias         : vEthernet (External Virtual Switch)
SourceAddress          : <local IP>
PingSucceeded          : True
PingReplyDetails (RTT) : 0 ms
TcpTestSucceeded       : True

However, trying to establish a WinRM connection will likely fail with a certificate validation error unless you install that same self signed cert you created on the remote endpoint.

If you try to test the connection on windows using Test-WSMan as we saw before, you would receive this error:

Test-WSMan -ComputerName 192.168.1.153 -UseSSL
Test-WSMan : <f:WSManFault
xmlns:f="http://schemas.microsoft.com/wbem/wsman/1/wsmanfault" Code="12175"
Machine="ultrawrock"><f:Message>The server certificate on the destination
computer (192.168.1.153:5986) has the following errors:
The SSL certificate is signed by an unknown certificate authority.
The SSL certificate contains a common name (CN) that does not match the
hostname.     </f:Message></f:WSManFault>
At line:1 char:1
+ Test-WSMan -ComputerName 192.168.1.153 -UseSSL
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : InvalidOperation: (192.168.1.153:String) [Test-W
   SMan], InvalidOperationException
    + FullyQualifiedErrorId : WsManError,Microsoft.WSMan.Management.TestWSManC
   ommand

Now you have a few options depending on your platform and needs:

  • Do not install the certificate and disable certificate verification (not recommended)
  • Install to the windows certificate store if you are on windows and need to use native windows APIs like powershell remoting
  • Export the certificate to a .pem file for use with ruby based tools like chef

Ignoring certificate validation errors

This is equivalent to when you are browsing the internet in a standard browser and try to view a https based site with an invalid cert and the browser gives you a scary warning that you are about to go somewhere potentially dangerous but gives you the option to go there anyway even though that's probably a really bad idea.

If you are testing, especially using a local hypervisor, the risk of a man in the middle attack is pretty small, but you didn't hear that from me. If you do not want to go through the trouble of installing the certificate (we'll go through those steps shortly), here is what you need to do:

Powershell Remoting:

$options=New-PSSessionOption -SkipCACheck -SkipCNCheck
Enter-PSSession -ComputerName <IP or host name> -Credential <user name> -UseSSL -SessionOption $options

WinRM Ruby Gem:

irb
require 'winrm'
w=WinRM::WinRMWebService.new('https://<ip or host>:5986/wsman', :ssl, :user => '<user>', :pass => '<password>', :no_ssl_peer_verification => true)

Chef's knife-windows

knife winrm -m <ip> ipconfig -x <user> -P <password> -t ssl --winrm-ssl-verify-mode verify_none

Installing to the Windows Certificate store

This is the more secure route and will allow you to interact with the machine via powershell remoting without being nagged that your certificate is not valid.

The first thing to do is download the certificate installed on the remote machine:

$webRequest = [Net.WebRequest]::Create("https://<ip or host>:5986/wsman")
try { $webRequest.GetResponse() } catch {}
$cert = $webRequest.ServicePoint.Certificate

Now we have an X509Certificate instance of the certificate used by the remote winrm HTTPS listener. So we install it in our local machine certificate store along with the other root certificates:

$store = New-Object System.Security.Cryptography.X509Certificates.X509Store `
  -ArgumentList  "Root", "LocalMachine"
$store.Open('ReadWrite')
$store.Add($cert)
$store.Close()

Having done this, we can now validate the SSL connection with Test-WSMan:

C:\> Test-WSMan -ComputerName 192.168.43.166 -UseSSL
wsmid        : http://schemas.dmtf.org/wbem/wsman/identity/1/wsmanidentity.xsd
ProtocolVersion : http://schemas.dmtf.org/wbem/wsman/1/wsman.xsd
ProductVendor   : Microsoft Corporation
ProductVersion  : OS: 0.0.0 SP: 0.0 Stack: 3.0

Now we can use tools like powershell remoting or winrs to talk to the remote machine.

Exporting the certificate to a .pem/.cer file

The above certificate store solution works great on windows for windows tools, but it won't help for many cross platform scenarios like connecting from non-windows or using chef tools like knife-windows. The WinRM gem used by tools like Chef and Vagrant take a certificate file which is expected to be a base 64 encoded public key only certificate file. It commonly has a .pem, .cer, or .crt extension.

On windows you can export the X509Certificate we downloaded above to such a file by using the following lines of powershell:

"-----BEGIN CERTIFICATE-----" | Out-File cert.pem -Encoding ascii
[Convert]::ToBase64String($cert.Export('cert'), 'InsertLineBreaks') |
  Out-File .\cert.pem -Append -Encoding ascii
"-----END CERTIFICATE-----" | Out-File cert.pem -Encoding ascii -Append

With this file you could use Chef's knife winrm command from the knife-windows gem to run commands on a windows node:

knife winrm -m 192.168.1.153 ipconfig -x administrator -P Pass@word1 -t ssl -f cert.pem

Problems with New-SelfSignedCertificate and openssl

If the certificate on the server was generated using New-SelfSignedCertificate, cross platform tools that use openssl libraries may fail to verify the certificate unless New-SelfSignedCertificate was used with the -CloneCert argument and passed a certificate that includes a BasicConstraint property identifying it as a CA. Viewing the certificate's properties in the certificate manager GUI, the certificate should contain this:

certCA.PNG

There are are several alternatives to the convenient New-SelfSignedCertificate cmdlet if you need a cert that must be verified with openssl libraries:

  1. Disable peer verification (not recommended) as shown earlier
  2. Create a private/public key certificate using openssl's req command and then use openssl pkcs12 to combine those 2 files to a pfx file that can be imported to the winrm listener's certificate store. Note: Make sure to include the "Server Authentication" Extended Key Usage (EKU) not added by default
  3. Use the handy New-SelfSignedCertificateEx available from the Technet Script Center and provides finer grained control of the certificate properties and make sure to use the -IsCA argument:
. .\New-SelfSignedCertificateEx.ps1
New-SelfsignedCertificateEx -Subject "CN=$env:computername" `
 -EKU "Server Authentication" -StoreLocation LocalMachine `
 -StoreName My -IsCA $true

Exporting the self signed certificate on non-windows

If you are not on a windows machine, all this powershell is going to produce far different output than what is desirable. However, its actually even simpler to do this with the openssl s_client command:

openssl s_client -connect <ip or host name>:5986 -showcerts </dev/null 2>/dev/null|openssl x509 -outform PEM >mycertfile.pem

The output mycertfile.pem can now be passed to the -f argument of knife-windows commands to execute commands via winrm:

mwrock@ubuwrock:~$ openssl s_client -connect 192.168.1.155:5986 -showcerts </dev/null 2>/dev/null|openssl x509 -outform PEM >mycertfile.pem
mwrock@ubuwrock:~$ knife winrm -m 192.168.1.155 ipconfig -x administrator -P Pass@word1 -t ssl -f ~/mycertfile.pem
WARNING: No knife configuration file found
192.168.1.155
192.168.1.155 Windows IP Configuration
192.168.1.155
192.168.1.155
192.168.1.155 Ethernet adapter Ethernet:
192.168.1.155
192.168.1.155    Connection-specific DNS Suffix  . :
192.168.1.155    Link-local IPv6 Address . . . . . : fe80::6c3f:586a:bdc0:5b4c%12
192.168.1.155    IPv4 Address. . . . . . . . . . . : 192.168.1.155
192.168.1.155    Subnet Mask . . . . . . . . . . . : 255.255.255.0

Authentication

As you can probably tell so far, alot can go wrong and there are several moving parts to establishing a successful connection with a remote windows machine over WinRM. However, we are not there yet. Most of the gotchas here are when you are using HTTP instead of HTTPS and you are not domain joined. This tends to describe 95% of the dev/test scenarios I come in contact with.

As we saw above, there is quite a bit of ceremony involved in getting SSL just right and running WinRM over HTTPS. Lets be clear: its the right thing to do especially in production. However, you can avoid the ceremony but that just means there is other ceremonial sacrifices to be made. At this point, if you are connecting over HTTPS, authentication is pretty straight forward. If not, there are often additional steps to take. However these additional steps tend to be less friction laden, but more security heinous, than the SSL setup.

HTTP, Basic Authentication and cross-platform

Both the Ruby WinRM gem and the Go winrm package do not interact with the native windows APIs needed to make Negotiate authentication possible and therefore must use Basic Authentication when using the HTTP transport. So unless you are either using native windows WinRM via winrs or powershell remoting or using knife-windows on a windows client (more on this in a bit), you must tweak some of the WinRM settings on the remote windows server to allow plain text basic authentication over HTTP.

Here are the commands to run:

winrm set winrm/config/client/auth '@{Basic="true"}'
winrm set winrm/config/service/auth '@{Basic="true"}'
winrm set winrm/config/service '@{AllowUnencrypted="true"}'

One bit of easy guidance here is that if you can't use Negotiate authentication, you really really should be using HTTPS with verifiable certificates. However if you are just trying to get off the ground with local Vagrant boxes and you find yourself in a situation getting WinRM Authentication errors but know you are passing the correct credentials, please try running these on the remote machine before inflicting personal bodily harm.

I always include these commands in windows packer test images because that's what packer and vagrant need to talk to a windows box since they always use HTTP and are cross platform without access to the Negotiate APIs.

This is quite the security hole indeed but usually tempered by the fact that it is on a test box in a NATed network on the local host. Perhaps we are due for a vagrant PR allowing one to pass SSL options in the Vagrantfile. That would be simple to add.

Chef's winrm-s gem using windows negotiate on windows

Chef uses a separate gem that mostly monkey patches the WinRM gem if it sees that winrm is authenticating from windows to windows. In this case it leverages win32 APIs to use Negotiate authentication instead of Basic Authentication and therefore the above winrm settings can be avoided. However, if accessing from a linux client, it will drop to Basic Authentication and the settings shown above must then be present.

Local user accounts

Windows remote communication tends to be easier when you are using domain accounts. This is because domains create implicit trust boundaries so windows adds restrictions when using local accounts. Unfortunately the error messages you can sometimes get do not at all make it clear what you need to do to get past these restrictions. There are two issues with local accounts that I will mention:

Qualifying user names with the "local domain"

One thing that has previously tripped me up and I have seen others struggle with is related to authenticating local users. You may have a local user (not a domain user) and it is getting access denied errors trying to login. However if you prefix the user name with './', then the error is resolved. The './' prefix is equivelent to '<local host or ip>\<user>'. Note that the './' prefix may not work in a windows login dialog box. In that case use the host name or IP address of the remote machine instead of '.'.

Setting the LocalAccountTokenFilterPolicy registry setting

This does not apply to the built in administrator account. So if you only logon as administrator, you will not run into this. However lets say I create a local mwrock account and even add this account to the local Administrators security group. If I try to connect remotely with this account using the default remoting settings on the server, I will get an Access Denied error if using powershell remoting or a WinRMAuthentication error if using the winrm gem. This is typically only visible on 2012R2. By default, the winrm service is running on a newly installed 2012R2 machine with an HTTP listener but without the LocalAccountTokenFilterPolicy enabled, while 2008R2 and client SKUs have no winrm service running at all. Running winrm quickconfig or Enable-PSRemoting on any OS will enable the LocalAccountTokenFilterPolicy, which will allow local accounts to logon. This simply sets the LocalAccountTokenFilterPolicy subkey of HKLM\software\Microsoft\Windows\CurrentVersion\Policies\system to 1.

Trusted Hosts with HTTP, non domain joined powershell remoting

There is an additional security restriction imposed by powershell remoting when connected over HTTP on a non domain joined  (work group) environment. You need to add the host name of the machine you are connecting to the list of trusted hosts. This is a white list of hosts you consider ok to talk to. If there are many, you can comma delimit the list. You can also include wildcards for domains and subdomains:

Set-Item "wsman:\localhost\client\trustedhosts" -Value 'mymachine,*.mydomain.com' -Force

Setting your trusted hosts list a single wildcard would allow all hosts:

Set-Item "wsman:\localhost\client\trustedhosts" -Value '*' -Force

You would only do this if you simply interact with local test instances and even that is suspect.

Double-Hop Authentication

Lets say you want to access a UNC share on the box you have connected to or in any other way use your current credentials to access another machine. This will typically fail with the ever informative Access Denied error. You can enable whats called credential delegation by using a different type of authentication mechanism called CredSSP. This is only available using Powershell remoting and requires extra configuration on both the client and remote machines.

On the remote machine, run:

Enable-WSManCredSSP -Role Server

On the client there are a few things to set up. First, similar to the server, you need to enable it but also specify a white list of endpoints.  This is formatted similar to the trusted hosts discussed above:

Enable-WSManCredSSP -Role Client -DelegateComputer 'my_trusted_host.me.org'

Next you need to edit the local security policy on the machine to allow delegation to specific endpoints. In the gpedit GUI, navigate to Computer Configuration > Administrative Templates > System > Credential Delegation and enable "Allow Delegating Fresh Credentials". Further, you need to add the endpoints you authorize delegation to. You can add WSMAN\*.my_domain.com to allow all endpoints in the my_domain.com domain. You can add as many entries as you need.

Certificate based authentication

Even more secure than usernames and passwords is using a x509 certificate signed by a trusted certificate authority. Many use this techniue when using SSH with SSH keys. Well, the same is possible with WinRM. I won't get into the details here since I have blogged separately on this topic here.

Windows Nano TP 3

As of the date of this post, Microsoft has released technical preview 3 of its new Windows Nano flavored server OS. I have previously blogged about this super light weight os but here is a winrm related bit of info that is unique to nano as of this version at least: there are no tools to tweak the winrm settings. Neither the winrm command or the winrm powershell provider are present.

In order to make changes, you must edit the registry directly. These settings are located at:

 HKLM\Software\Microsoft\Windows\CurrentVersion\WSMAN

Other Caveats

I've written an entire post on this topic and will not go into the same detail here. Basically I have found that once winrm is correctly configured, there is still a small subset of operations that will fail in a remote context. Any interaction with wsus is an example but please read my previous post for more. When you hit one of these road blocks, you typically have two options:

  1. Use a Scheduled Task to execute the command in a local context
  2. Install an SSH server and use that

The second option appears to be imminent and in the end will make all of this easier and perhaps render this post irrelevant.

A Packer template for Windows Nano server weighing 300MB by Matt Wrock

Since the dawn of time, human kind has struggled to produce Windows  images under a gigabyte and failed. We have all read the stories from the early Upanishads, we have studied the Zoroastrian calculations, recited the talmudic laws governing SxS yet continue to grow ever older as we wait for our windows pets to find an IP for us to RDP to. Well hopefully these days are nearing an end. I think its pretty encouraging that I can now package a windows VM in a 300MB vagrant package.

This post is going to walk through the details and pitfalls of creating a Packer template for Windows Nano Vagrant boxes. I have already posted on the basics of Packer templates and Vagrant box packaging. This post will assume some knowledge of Packer and Vagrant basic concepts.

Windows Nano, a smaller windows

Windows nano finally brings us vm images of similar relative size to its linux cousins. The one I built for VirtualBox is about 307MB. This is 10x smaller than the smallest 2012R2 box I have packaged at around 3GB.

Why so much smaller?

Here are a few highlights:

  • No GUI. Really this time. No notepad and no cmd.exe window. Its windows without windows.
  • No SysWow64. Nano completely abandons 32 bit compatibility, but I'm bummed there will be no SysWowWow128.
  • Minimal packages and features in the base image. The windows team have stripped down this OS to a minimal set of APIs and features. You will likely find some of your go to utilities missing here, but thats ok because it likely has another and probably better API that accomplishes the same functionality.

Basically Microsoft is letting Backwards compatibility slide on this one and producing an OS that does not try to support legacy systems, but is far more effective at managing server cattle.

Installation challenges

Windows Nano does not come packaged in a separate ISO nor does it bundle as a separate image inside the ISO like most of the other server SKUs such as Standard Server or Data Center. Instead you need to build the image from bits in the installation media and extract that.

If you want to host Nano on Hyper-V, running the scripts to build and extract this image are shockingly easy. Even if you want to build a VirtualBox VM, things are not so bad. However there are more moving parts and some elusive gotchas when preparing a Packer template.

Just show me the template

Before I go into detail, mainly as a cathartic act of self governed therapy to recover from the past week of yak shaving, lets just show how to start producing and consuming packer templates for Nano images today. The template can be found here in my packer-templates repository. I'm going to walk through the template and the included scripts but that is optional reading.

I'm running Packer 0.8.2 and Virtualbox 5.0.4 on Windows 8.1 to build the template.

Known Issues

There were several snags but here are a couple items that just didn't work and may trip you up when you first try to build the template or Vagrant up:

  1. I had upgraded to the latest Packer version, 0.8.6 at the time of this post, and had issues with WinRM connectivity so reverted back to 0.8.2. I do plan to investigate that and alter the template to comply with the latest version or file issue(s) and/or PRs if necessary.
  2. Vagrant up will fail but may succeed to the extent that you need it to. It will fail to establish a WinRM connection with the box but it will create a connectable box and can also destroy it. This does mean that you will not have luck using any vagrant provisioners or packer provisioners. For me, that's fine for now.

The reason for the latter issue is that the WinRM service in nano expects requests to use codepage 65001 (UTF-8) and will refuse requests that do not. The WinRM ruby gem used by Vagrant uses codepage 437 and you will see exceptions when it tries to connect. Previous windows versions have accepted both codepages and I have heard that this will be the case with nano by the time it officially ships.

Connecting and interacting with the Nano Server

I have been connecting via powershell remoting. That of coarse assumes you are connecting from Windows. Despite what I said above about the limitations of the ruby WinRM gem, it does have a way to override the 437 codepage. However, doing so is not particularly friendly and means you cannot use alot of the helper methods in the gem.

To connect via powershell, run:

# Enable powershell remoting if it is not already enabled
Enable-PSRemoting -Force

# You may change "*" to the name or IP of the machine you want to connect to
Set-Item "wsman:\localhost\client\trustedhosts" -Value "*" -Force

# the password is vagrant
$creds = Get-Credential vagrant

# this assumes you are using NAT'd network which is the Virtualbox default
# Use the computername or IP of the machine mand skip the port arg
# if you are using Hyper-V or another non NAT network
Enter-PSSession -Computername localhost -Port 55985 -Credential $creds

If you do not have a windows environment from which to run a remote powershell session, you can just create a second VM.

Deploying Nano manually

Before going through the packer template, it would be helpful to understand how one would build a nano server without packer or by hand. Its a bit more involved that giving packer an answer file. There are a few different ways to do this and some paths work better for different scenarios. I'll just layout the procedure for building Nano on virtualbox.

From Windows hosts

Ben Armstrong has a great post on creating nano VMs for Hyper-V. If you are on Windows and want to create Virtualbox VMs, the instructions for creating the nano image are nearly identical. The key change is to specify -OEMDrivers instead of -GuestDrivers in the New-NanoServerImage command. GuestDrivers have the minimal set of drivers needed for Hyper-V. While it can also Create a VirtualBox image that loads and shows the initial nano login screen, I was unable to actually login. Using -OEMDrivers adds a larger set of drivers and allows the box to function in VirtualBox. Its interesting to note that a Hyper-V Vagrant Box build using GuestDrivers is 60MB smaller than one using OEMDrivers.

Here is a script that will pop out a VHD after you mount the Windows Server 2016 Technical Preview 3 ISO:

cd d:\NanoServer
. .\new-nanoserverimage.ps1
mkdir c:\dev\nano
$adminPassword = ConvertTo-SecureString "Pass@word1" -AsPlainText -Force

New-NanoServerImage `
  -MediaPath D:\ `
  -BasePath c:\dev\nano\Base `
  -TargetPath c:\dev\nano\Nano-image `
  -ComputerName Nano `
  -OEMDrivers `
  -ReverseForwarders `
  -AdministratorPassword $adminPassword

Now create a new Virtualbox VM and attach to the VHD created above.

From Mac or Linux hosts

You have no powershell here so the instructions are different. Basically you need to either create or use an existing windows VM. Make sure you have a shared folder setup so that you can easily copy the nano VHD from the windows VM to your host and then create the Virtualbox vm using that VHD as its storage.

That all seems easy, why Packer?

So you may very well be wondering at this point, "Its just a handful of steps to create a nano VM. Your packer template has multiple scripts and probably 100 lines of powershell. What is the advantage of using Packer?"

First there might not be one. If you want to create one instance and play around on the same host and don't care about supporting other instances on other hosts or have scenarios where you need to ensure that multiple nodes come from an identically built image, then packer may not be the right tool for you.

Here are some scenarios where packer shines:

  • Guaranteed  identical images - If all images come from the same template, you know that they are all the same and you have "executable documentation" on how they were produced.
  • Immutable Infrastructure - If I have production clusters that I routinely tear down and rebuild/replace or a continuous delivery pipeline that involves running tests on ephemeral VMs that are freshly built for each test suite, I can't be futzing around on each node, copying WIMs and VHDs.
  • Multi-Platform - If I need to create both linux and windows environments, I'd prefer to use a single tool to pump out the base images.
  • Single click, low friction box sharing - For the thousands and thousands of vagrant users out there, many of whom do not spend much time on windows, giving them a vagrant box is the best way to ensure they have a positive experience provisioning the right image and Packer is the best tool for creating vagrant boxes.

Walking through the template

So now we will step through the key parts of the template and scripts highlighting areas that stray from the practices you would normally see in windows template work and dwelling on nano behavior that may catch you off guard.

High level flow

First a quick summary of what the template does:

  1. Installs Windows Server 2016 Core on a new Virtualbox VM
  2. Powershell script is launched from the answer file that creates the Nano image, mounts it, copies it to an empty partition and then updates the default boot record to boot from that partition.
  3. Machine reboots into nano
  4. Some winrm tweaks are made, the Windows Server 2016 partition is removed and the nano partition extended over it.
  5. "Zap" unused space on disk.
  6. Packer archives the VM to vmdk and packages to a .box file.

Three initial disk partitions

We assume that there is no windows anywhere (because this reflects many build environments) so we will be installing two operating systems: The larger Windows Server 2016 and Nano. We build nano from the former. Our third partition is a system partition. Its easier to have a separate partition for the master boot record that we don't have to touch or move around in the process.

It is important that the Windows Server 2016 Partition be physically located at the end of the disk. Otherwise we will be stuck with a gap in the disk after we remove it.

One may find it odd that our Autounattend.xml file installs Server 2016 from an image named "Windows Server 2012 R2 SERVERDATACENTERCORE." It is odd but correct. That's cool. This is all beta still and I'm sure this is just one detail yet to be ironed out. There is probably some horrendously friction laden process involved to change the image name. One thing that tripped me up a bit is that there are 4 images in the ISO:

C:\dev\test> Dism /Get-ImageInfo /ImageFile:d:\sources\install.wim

Deployment Image Servicing and Management tool
Version: 10.0.10240.16384

Details for image : d:\sources\install.wim

Index : 1
Name : Windows Server 2012 R2 SERVERSTANDARDCORE
Description : Windows Server 2012 R2 SERVERSTANDARDCORE
Size : 9,621,044,487 bytes

Index : 2
Name : Windows Server 2012 R2 SERVERSTANDARD
Description : Windows Server 2012 R2 SERVERSTANDARD
Size : 13,850,658,303 bytes

Index : 3
Name : Windows Server 2012 R2 SERVERDATACENTERCORE
Description : Windows Server 2012 R2 SERVERDATACENTERCORE
Size : 9,586,595,551 bytes

Index : 4
Name : Windows Server 2012 R2 SERVERDATACENTER
Description : Windows Server 2012 R2 SERVERDATACENTER
Size : 13,847,190,006 bytes

The operation completed successfully.

Images 3 and 4, the DataCenter ones are the only ones installable from an answer file.

Building Nano

I think .\scripts\nano_create.ps1 is pretty straight forward. We build the nano image as discussed earlier in this post and copy it to a permanent partition.

What might seem odd is the last few lines that setup winrm. Why do we do this when we are about to blow away this OS and never use winrm? We do this because of the way that the VirtualBox builder works in packer. It is currently waiting for winrm to become available before moving forward in the build process. So this is done simply as a signal to packer. A signal to what? 

The Virtualbox builder will now invoke any "provisioners" in the template and then issue the template's shutdown command. We dont use any provisioners which brings us to the our first road bump.

Nano forces a codepage incompatible with packer and vagrant

On the one hand it is good to see Nano using a Utf-8 code page (65001). However, previous versions of Windows have traditionally used the old MS-DOS code page (437) and both the ruby WinRM gem used by Vagrant and the GO WinRM package used by packer are hard coded to use 437. At this time, Nano will not accept 437 so any attempt to establish WinRM communication by Vagrant and Packer will fail with htis error:

An error occurred executing a remote WinRM command.

Shell: powershell
Command: hostname
if ($?) { exit 0 } else { if($LASTEXITCODE) { exit $LASTEXITCODE } else { exit 1 } }
Message: [WSMAN ERROR CODE: 2150859072]: <f:WSManFault Code='2150859072' Machine='192.168.1.130' xmlns:f='http://schemas.microsoft.com/wbem/wsman/1/wsmanfault'><f:Message><f:ProviderFault path='%systemroot%\system32\winrscmd.dll' provider='Shell cmd plugin'>The WinRS client cannot process the request. The server cannot set Code Page. You may want to use the CHCP command to change the client Code Page to 437 and receive the results in English. </f:ProviderFault></f:Message></f:WSManFault>

 This means packer provisioners will not work and we need to take a different route to provisioning.

One may think this a show stopper for provisioning Windows images and it is for some scenarios but for my initial packer use case, that's OK and I hear that Nano will accept 437 before it "ships." Note that this only seems to be the case with Nano and not Windows Server 2016.

Cut off from Winrm Configuration APIs

Both Vagrant and Packer expect to communicate over unencrypted WinRM using Basic Authentication. I know I just said that Vagrant and Packer cant talk WinRM at all but I reached a challenge with WinRM before discovering the codepage issue. When trying to allow unencrypted WinRM and basic auth, I found that the two most popular methods for tweaking winrm were not usable on nano.

These methods include:

  1. Using the winrm command line utility
  2. Using the WSMan Powershell provider

The first simply does not exist. Now the winrm command is c:\windows\system32\winrm.cmd which is a tiny thin wrapper around cscript.exe, the scripting engine used to run vbscripts. Well there is no cscript or wscript so no visual basic runtime at all. Interestingly, winrm.vbs does exist. Feels like a sick joke.

So we could use the COM API to do the configuration. If you like COM constants and HRESULTS, this is totally for you. The easier approach at least for my purposes is to simply flip the registry keys to get the settings I want:

REG ADD HKLM\Software\Microsoft\Windows\CurrentVersion\WSMAN\Service /v allow_unencrypted /t REG_DWORD /d 1 /f

REG ADD HKLM\Software\Microsoft\Windows\CurrentVersion\WSMAN\Service /v auth_basic /t REG_DWORD /d 1 /f

REG ADD HKLM\Software\Microsoft\Windows\CurrentVersion\WSMAN\Client /v auth_basic /t REG_DWORD /d 1 /f

No modules loaded in Powershell scripts run from SetupComplete.cmd

SetupComplete.cmd is a special file that can sit in windows\setup\scripts and if it does, will be run on first boot and then never again. We use this because as mentioned before, we cant use Packer provisioners since winrm is not an option. I have never used this file before so its possible this is not specific to nano but that would be weird. I was wondering why the powershell script I called from this file was not being called at all. Everything seemed to go fine, no errors but my code was definitely not being called. Kinda like debugging scheduled tasks.

First, Start-Transcript is not present on Nano. So that was to blame for the lack of errors. I switched to old school redirection:

cmd.exe /c C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe -command . c:\windows\setup\scripts\nano_cleanup.ps1 > c:\windows\setup\scripts\cleanup.txt

Next I started seeing errors about other missing cmdlets like Out-File. I thought that seemed strange and had the script run Get-Module. The result was an empty list of modules so I added loading of the basic PS modules and the storage module, which would normally be auto loaded into my session:

Import-Module C:\windows\system32\windowspowershell\v1.0\Modules\Microsoft.PowerShell.Utility\Microsoft.PowerShell.Utility.psd1
Import-Module C:\windows\system32\windowspowershell\v1.0\Modules\Microsoft.PowerShell.Management\Microsoft.PowerShell.Management.psd1
Import-Module C:\windows\system32\windowspowershell\v1.0\Modules\Storage\Storage.psd1

Not everything you expect is on Nano but likely everything you need

As I mentioned above, Start-Transcipt and cscrpt.exe were missing, but thats not the only things. Here are some other commands I noticed were gone:

  • diskpart
  • bcdboot
  • Get-WMIObject
  • Restart-Computer

I'm sure there are plenty others but these all have alternatives that I could use.

Different arguments to powershell.exe

A powershell /? will reveal a command syntax slightly different from what one is used to:

C:\dev\test> Enter-PSSession -ComputerName 192.168.1.134 -Credential $c
[192.168.1.134]: PS C:\Users\vagrant\Documents> powershell /?
USAGE: powershell [-Verbose] [-Debug] [-Command] <CommandLine>



  CoreCLR is searched for in the directory that powershell.exe is in,

  then in %windir%\system32\CoreClrPowerShellExt\v1.0\.

No -ExecutionProfile,  no -File and others are missing too. I imagine this could break some existing scripts.

No 32 bit

I knew this going in but was still caught off guard when sdelete.exe failed to work. I use sdelete, a sysinternals utility, for freeing empty space on disk which leads to a dramatically smaller image size when we are done. Well I'm guessing it was compiled for 32 bit because I got complaints about the executable image being incomatible with nano.

In the end this turned out to be for the best, I found a pure powershell alternative to sdelete which I adapted for my limited needs:

$FilePath="c:\zero.tmp"
$Volume= Get-Volume -DriveLetter C
$ArraySize= 64kb
$SpaceToLeave= $Volume.Size * 0.05
$FileSize= $Volume.SizeRemaining - $SpacetoLeave
$ZeroArray= new-object byte[]($ArraySize)
 
$Stream= [io.File]::OpenWrite($FilePath)
try {
   $CurFileSize = 0
    while($CurFileSize -lt $FileSize) {
        $Stream.Write($ZeroArray,0, $ZeroArray.Length)
        $CurFileSize +=$ZeroArray.Length
    }
} 
finally {
    if($Stream) {
        $Stream.Close()
    }
}
 
Del $FilePath

Blue Screens of Death

So I finally got the box built and was generally delighted with its size (310MB). However when I launched the vagrant box, the machine blue screened reporting that a critical process had died. All of the above issues had made this a longer haul than I expected but it turned out that troubleshooting the bluescreens was the biggest time suck and sent me on hours of wild goose chases and red herrings. I almost wrote a separate post dedicated to this issue, but I'm gonna try to keep it relatively brief here (not a natural skill).

What was frustrating here is I knew this could work. I had several successful tests but with slightly different execution flows which I was tweaking along the way, but it certainly did not like my final template and scripts. I would get the CRITICAL_PROCESS_DIED blue screen twice and then it would stop at a display of error code 0xc0000225 and the message "a required device isn't connected or can't be accessed."

Based on some searching I thought that there was something wrong somewhere in the boot record. After all I was messing with deleting and resizing partitions and changing the boot record compounded by the fact that I am not an expert in that area. However lots of futzing with diskpoart, cdbedit, cdbboot, and bootrec got me nowhere. I also downloaded the Technical preview 3 debug symbols to analyze the memory dump but there was nothing interesting there. Just a report that the process that died was wininit.exe.

Trying to manually reproduce this I found that the final machine produced by packer was just fine. Packer exports the VM to a new .vmdk virtual disk. Trying to create a machine from that would produce blue screens. Further, manually cloning a .vdi had the same effect - more blue screens. Finally, I tried attaching a new VM to the same disk that worked and made sure the vm settings were identical to the working machine. This failed too which seemed very odd. I then discovered that removing the working machine and manually editing the broken machine 's .box xml to have the same UUID as the working one, fixed things. After more researching, I found out that Virtualbox has a modifiable setting called a Hardware UUID. If none is supplied, it uses the box's UUID. So I cloned another box from the working machine, validated that it blue screened and then ran:

vboxmanage modifyvm --hardwareid "{same uuid as the working box}"

Voila! The box came to life. So I could fix this by telling the packer template to stamp an artificial guid at startup:

    "vboxmanage": [
      [ "modifyvm", "{{.Name}}", "--natpf1", "guest_winrm,tcp,,55985,,5985" ],
      [ "modifyvm", "{{.Name}}", "--memory", "2048" ],
      [ "modifyvm", "{{.Name}}", "--vram", "36" ],
      [ "modifyvm", "{{.Name}}", "--cpus", "2" ],
      [ "modifyvm", "{{.Name}}", "--hardwareuuid", "02f110e7-369a-4bbc-bbe6-6f0b6864ccb6" ]
    ],

then add the exact same guid to the Vagrantfile template:

config.vm.provider "virtualbox" do |vb|
  vb.customize ["modifyvm", :id, "--hardwareuuid", "02f110e7-369a-4bbc-bbe6-6f0b6864ccb6"]
  vb.gui = true
  vb.memory = "1024"
 end

This ensures that vagrant "up"s the box with the same hardware UUID that it was created with. The actual id does not matter and I don't think there is any harm, at least for test purposes, in having duplicate hardware uuids.

I hoped that a similar strategy would work for Hyper-V by changing its BIOSGUID using a powershell script like this:

#Virtual System Management Service
$VSMS = Get-CimInstance -Namespace root/virtualization/v2 -Class Msvm_VirtualSystemManagementService
 
#Virtual Machine
$VM = Get-CimInstance -Namespace root/virtualization/v2  -Class Msvm_ComputerSystem -Filter "ElementName='Demo-VM'"
 
#Setting Data
$SD = $vm | Get-CimAssociatedInstance -ResultClassName Msvm_VirtualSystemSettingData -Association Msvm_SettingsDefineState

#Update bios uuid
$SD.BIOSGUID = "some guid"
 
#Create embedded instance
$cimSerializer = [Microsoft.Management.Infrastructure.Serialization.CimSerializer]::Create()
$serializedInstance = $cimSerializer.Serialize($SD, [Microsoft.Management.Infrastructure.Serialization.InstanceSerializationOptions]::None)
$embeddedInstanceString = [System.Text.Encoding]::Unicode.GetString($serializedInstance)
 
#Modify the system settings
Invoke-CimMethod -CimInstance $VSMS -MethodName ModifySystemSettings @{SystemSettings = $embeddedInstanceString}

Thanks to this post for an example.  It did not work but the Hyper-V blue screens seem to be "self healing". I post more details in the vagrant box readme on atlas.

No sysprep

I suspect that the above is somehow connected to OS activation. I saw lots of complaints on google about needing to do something similar with the hardware uuid in order to preserve the windows activation of a cloned machine. I also noted that if I cloned the box manually before packer rebooted into nano thereby letting the clone run the initial setup, things worked.

Ideally, the fix here would be to leave the hardware uuid alone and just sysprep the machine as the final packer step. This is what I do for 2012R2. However from what I can tell, there is no sysprep available for nano. I really hope that there will be.

Finally a version of windows for managing cattle servers

You may have heard the cattle vs. pets analogy that compares "special snowflake" servers to cloud based clusters. The idea is that one perceives cloud instances like cattle. There is no emotional attachment or special treatment given to one server over another. At this scale, one cant afford to. We don't have the time or resources to be accessing our instances via a remote desktop and clicking buttons or dragging windows. If one becomes sick, we put it out of its misery quickly and replace it.

Nano is light weight and made to be accessed remotely. Really interested to see how this progresses.