Retiring the Boxstarter Web Launcher by Matt Wrock

The "Web Launcher", which installs and runs Boxstarter using a "click once" install URL, will soon be retired. This post discusses why I have decided to sunset this feature and how those who regularly use it can access the same functionality via other methods.

What is the Web Launcher

When I originally wrote boxstarter, one of the primary design goals was that one could jump on a fresh Windows OS install and launch their configuration scripts with almost no effort or pre-bootstrapping. The click once install technology seemed like a good fit and indeed, I think it has served this purpose well. With a simple, easy to remember URL, one can install boxstarter and run a boxstarter package. This only works when invoked via Internet Explorer, and while I do not use IE as my default browser, this restriction is completely viable for a clean install where IE is guaranteed to be present.

Why retire a good thing?

Again, the click once installer has been a very successful boxstarter feature. The only hassle it has really caused has been for users wanting to use it from Chrome or Firefox. It has also been known to trigger false positive malware detection from Windows SmartScreen for reasons that usually baffle me. Both of these issues are really minor.

I am retiring it due to cost and time. Using click-once requires that I maintain a Software Signing certificate. I used to be able to obtain one for free, but the provider I have used has started to charge and made the renewal process particularly burdensome. The friction is not unreasonable given the nature of the company and I am truly grateful for the years of free service. Further, the click once installer requires some server side logic requiring me to pay hosting fees. As a former Microsoft Employee, I could host this on Azure for free but I no longer benefit from free Azure services.

I don't at all mean to come off like I'm on the brink of bankruptcy or anything like that. However, it seems unwise to pay hundreds of dollars a year for cert renewals and hosting fees when the fact of the matter is that almost all of this value can be accessed for free.

When will the Web Launcher retire?

I do not intend to yank the installer off the Boxstarter.org site right away. I'll likely keep it there for at least a few months. However, I will not be renewing the code signing certificate which means that starting June 28th 2017, Windows will warn users that the certificate is from an untrusted publisher.

I have removed documentation from the Boxstarter.org website that talks about the Web Launcher and replaced that documentation with new instructions for installing Boxstarter over the web and installing packages via boxstarter.

How can I install Boxstarter and install packages via the web without the Web Launcher?

Actually quite easily thanks to powershell. For some time now, I have shipped a bootstrapper.ps1 embedded in a setup.bat file downloadable from the boxstarter.org website. I am making some minor enhancements to this bootstrapper that will make it easy to install the boxstarter modules by simply running:

. { iwr -useb http://boxstarter.org/bootstrapper.ps1 } | iex; get-boxstarter -Force

This will install Chocolatey and even .Net 4.5 if either are not already installed and then install all of the necessary boxstarter modules and even import them into the current shell. The installer will terminate with a warning if you are not running as an administrator or have a Restricted Powershell Execution Policy.

Once this runs successfully, one can use the Install-BoxstarterPackage command to install their package or gist URL

Install-BoxstarterPackage -PackageName https://gist.githubusercontent.com/mwrock/5e483f46cd15791970bdd3dd221dc179/raw/2632913a757570b576b9945ed04f94b747355b69/gistfile1.txt -DisableReboots

One can consult the command line help or the boxstarter website for details on how to use the command.

I understand this is a tiny bit more involved than the Web Launcher. You cannot both install boxstarter and run a package in a single command and if you don't like to enter a console...well...now you have to.

The reason I did not expose the bootstrapper like this in the first place was that Powershell v3, which introduced Invoke-WebRequest (aliased iwr), was not at all the norm at the time, and the command that accomplishes the same in Powershell v2 was more verbose and awkward:

iex ((New-Object System.Net.WebClient).DownloadString('http://boxstarter.org/bootstrapper.ps1')); get-boxstarter -Force

Now I suspect that the majority of boxstarter users are on Powershell 3 or more likely even higher. If you are still on version 2, you can use the longer command above.

Habitat application portability and understanding dynamic linking of ELF binaries by Matt Wrock

I do not come from a classical computer science background and have spent the vast majority of my career working with Java, C# and Ruby - mostly on Windows. So I have managed to evade the details of exactly how native binaries find their dependencies at compile time and runtime on Linux. It just has not been a concern in the work that I do. If my app complains about missing low level dependencies, I find a binary distribution for Windows (99% of the time these exist and work across all modern Windows platforms) and install the MSI. Hopefully when the app is deployed, those same binary dependencies have been deployed on the production nodes and it would be just super if it's the same version.

Recently I joined the Habitat team at Chef and one of the first things I did to get the feel of using Habitat to build software was to start creating Habitat build plans. The first plan I set out to create was .NET Core. I would soon find out that building .NET Core from source on Linux was probably a bad choice for a first plan. It uses clang instead of GCC, it has lots of cmake files that expect binaries to live in /usr/lib and it downloads built executables that do not link to Habitat packaged dependencies. Right out the gate, I got all sorts of various build errors as I plodded forward. Most of these errors centered around a common theme: "I can't find X." There were all sorts of issues beyond linking too that I won't get into here but I'm convinced that if I knew the basics of what this post will attempt to explain, I would have had a MUCH easier time with all the errors and pitfalls I faced.

What is linking and what are ELF binaries?

First let's define our terms:

ELF

There are no "Lord of the Rings" references to be had here. ELF is the Executable and Linkable Format and defines how binary files are structured on Linux/Unix. This can include executable files, shared libraries, object files and more. An ELF file contains a set of headers and a number of sections for things like text, data, etc. One of the key roles of an ELF binary is to inform the operating system how to load a program into memory including all of the symbols it must link to.
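As a rough illustration of the headers mentioned above: every ELF file starts with the four magic bytes 0x7f 'E' 'L' 'F', and the fifth byte records whether the file is 32 or 64 bit. The sketch below checks both using only head and od. The function names is_elf and elf_class are made up for this example and are not part of any tool.

```shell
# Succeeds when the file starts with the ELF magic bytes 7f 45 4c 46.
is_elf() {
  magic=$(head -c 4 "$1" 2>/dev/null | od -An -tx1 | tr -d ' \n')
  [ "$magic" = "7f454c46" ]
}

# Prints the ELF class stored in byte 5 of the header (offset 4):
# 1 means 32-bit and 2 means 64-bit.
elf_class() {
  is_elf "$1" || { echo "not ELF"; return 1; }
  case $(od -An -tu1 -j4 -N1 "$1" | tr -d ' \n') in
    1) echo "32-bit" ;;
    2) echo "64-bit" ;;
    *) echo "unknown" ;;
  esac
}
```

Something like elf_class "$(command -v ls)" can be used to try it against a binary on your own machine.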

Linking

Linking is a key part of the process of building an executable. The other key part is compiling. Often we refer to both jointly as "compiling" but they are really two distinct operations. First the compiler takes source code files and turns them into machine language instructions in the form of object files. These object files alone are not very useful to running a program.

Linking takes the object files (some might be from source code you wrote) and links them together with external library files to create a functioning program. If your source code calls a function from an external library, the compiler gleefully assumes that function exists and moves on. If it doesn't exist, don't worry, the linker will let you know.

Often when we hear about linking, two types are mentioned: static and dynamic. Static linking takes the external machine instructions and embeds them directly into the built executable. If all external dependencies of a program were statically linked, there would be only one executable file and no need for any dependent shared object files to be referenced.

However, we usually dynamically link our external dependencies. Dynamic linking does not embed the external code into the final executable. Instead it just points to an external shared object (.so) file (or .dll file on Windows) and loads that code into the running process at runtime. This has the benefit of being able to update external dependencies without having to ship and package your application each time a dependency is updated. Dynamic linking also results in a smaller application binary since it does not contain the external code.

On Unix/Linux systems, the ELF format specifies the metadata that governs what libraries will be linked. These libraries can be in many places on the machine and may exist in more than one place. The metadata in the ELF binary will help determine exactly what files are linked when that binary is executed.

Habitat + dynamic linking = portability

Habitat leverages dynamic linking to provide true application portability. It might not be immediately obvious what this means or why it is important or if it is even a good thing. So let's start by describing how applications typically load their dependencies in a normal environment and the role that configuration management systems like Chef play in these environments.

How you manage dependencies today

Let's say you have written an application that depends on the ZeroMQ library. You might use apt-get or yum to install ZeroMQ and its binaries are likely dropped somewhere into /usr. Now you can build and run your application and it will consume the ZeroMQ libraries installed. Unless it is told otherwise, the linker will scan the trusted Linux library locations for shared object files to link.

To illustrate this, I have built ZeroMQ from source and it produced libzmq.so.5 and put it in /usr/local/lib. If I examine that shared object with ldd, I can see where it links to its dependencies:

mwrock@ultrawrock:~$ ldd /usr/local/lib/libzmq.so.5
linux-vdso.so.1 =>  (0x00007ffffe05f000)
libunwind.so.8 => /usr/lib/x86_64-linux-gnu/libunwind.so.8 (0x00007f7e92370000)
libsodium.so.18 => /usr/local/lib/libsodium.so.18 (0x00007f7e92100000)
librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f7e91ef0000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f7e91cd0000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f7e91ac0000)
libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f7e917a0000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f7e91490000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f7e910c0000)
/lib64/ld-linux-x86-64.so.2 (0x00007f7e92a00000)
liblzma.so.5 => /lib/x86_64-linux-gnu/liblzma.so.5 (0x00007f7e90e80000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f7e90c60000)

They are all linked to the dependencies found in the Linux trusted library locations.

Now the time comes to move to production and just like you needed to install the ZeroMQ libraries in your dev environment, you will need to do the same on your production nodes. We all know this drill and we have probably all been burned at some point - something new is deployed to production and either its dependencies were not there or they were but they were the wrong version.

Configuration Management as solution

Chef fixes this right? Kind of...it's complicated.

You can absolutely have Chef make sure that your application's dependencies are installed with the correct versions. But what if you have different applications or services on the same node that depend on a different version of the same dependency? It may not be possible to have multiple versions coexist in /usr/lib. Maybe your new version will work or maybe it won't. Especially for some of the lower level dependencies, there is simply no guarantee that compatible versions will exist. If anything, there is one guarantee: different distros will have different versions.

Keeping the automation with the application

Even more important - you want these dependencies to travel with your application. Ideally I want to install my application and know by virtue of installing it, everything it needs is there and has not stomped over the dependencies of anything else. I do not want to delegate the installation of its dependencies and the knowledge of which version to install to a separate management layer. Instead, Habitat binds dependencies with the application so that there is no question what your application needs and installing your application includes the installation of all of its dependencies. Let's look at how this works and see how dynamic linking is at play.

When you build a Habitat plan, the plan specifies each dependency required by your application:

pkg_deps=(core/glibc core/gcc-libs core/libsodium)

Then when Habitat packages your build into its final, deployable artifact (.hart file), that artifact will include a list of every dependent Habitat package (including the exact version and release):

[35][default:/src:0]# cat /hab/pkgs/mwrock/zeromq/4.1.4/20161225135834/DEPS
core/glibc/2.22/20160612063629
core/gcc-libs/5.2.0/20161208223920
core/libsodium/1.0.8/20161214075415
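Each line of that manifest is a fully qualified identifier in the form origin/name/version/release, and it maps directly onto a directory under /hab/pkgs. A small sketch of that mapping follows. The describe_ident helper is hypothetical and is only meant to make the layout explicit.

```shell
# Split a fully qualified identifier (origin/name/version/release) into its
# parts and print the directory it corresponds to under /hab/pkgs.
describe_ident() {
  ident=$1
  origin=${ident%%/*};  rest=${ident#*/}
  name=${rest%%/*};     rest=${rest#*/}
  version=${rest%%/*};  release=${rest#*/}
  printf 'origin=%s name=%s version=%s release=%s path=/hab/pkgs/%s\n' \
    "$origin" "$name" "$version" "$release" "$ident"
}
```

Because the version and release are part of the path, two releases of the same library can sit side by side without overwriting each other.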

At install time, Habitat installs your application package and also the packages included in its dependency manifest (the DEPS file shown above) in the pkgs folder under Habitat's root location. Here it will not conflict with any previously installed binaries on the node that might live in /usr. Further, the Habitat build process links your application to these exact package dependencies and ensures that at runtime, these are the exact binaries your application will load.

[36][default:/src:0]# ldd /hab/pkgs/mwrock/zeromq/4.1.4/20161225135834/lib/libzmq.so.5
linux-vdso.so.1 (0x00007fffd173c000)
libsodium.so.18 => /hab/pkgs/core/libsodium/1.0.8/20161214075415/lib/libsodium.so.18 (0x00007f8f47ea4000)
librt.so.1 => /hab/pkgs/core/glibc/2.22/20160612063629/lib/librt.so.1 (0x00007f8f47c9c000)
libpthread.so.0 => /hab/pkgs/core/glibc/2.22/20160612063629/lib/libpthread.so.0 (0x00007f8f47a7e000)
libstdc++.so.6 => /hab/pkgs/core/gcc-libs/5.2.0/20161208223920/lib/libstdc++.so.6 (0x00007f8f47704000)
libm.so.6 => /hab/pkgs/core/glibc/2.22/20160612063629/lib/libm.so.6 (0x00007f8f47406000)
libc.so.6 => /hab/pkgs/core/glibc/2.22/20160612063629/lib/libc.so.6 (0x00007f8f47061000)
libgcc_s.so.1 => /hab/pkgs/core/gcc-libs/5.2.0/20161208223920/lib/libgcc_s.so.1 (0x00007f8f46e4b000)
/hab/pkgs/core/glibc/2.22/20160612063629/lib64/ld-linux-x86-64.so.2 (0x0000560174705000)

Habitat guarantees that the same binaries that were linked at build time will be linked at run time. Even better, it just happens and you don't need a separate management layer to enforce this.

This is how a Habitat package provides portability. Installing and running a Habitat package brings all of its dependencies with it. They do not all live in the same .hart package, but your application's .hart package includes the necessary metadata to let Habitat know what other packages to download and install from the depot. These dependencies may or may not already exist on the node with varying versions, but it doesn't matter because a Habitat application only relies on the packages that reside within Habitat. And even within the Habitat environment, you can have multiple applications that rely on the same dependency but different versions, and these applications can run side by side.

The challenge of portability and the Habitat studio

So when you are building a Habitat plan into a hart package, what keeps that build from pulling dependencies from the default Linux lib directories? What if you do not specify these dependencies in your plan and the build links them from elsewhere? That could break our portability. If your application builds magically from a non-Habitat controlled location, then there is no guarantee that those dependencies will land when you install your application elsewhere. Habitat constructs a build environment called a "studio" to protect against this exact scenario.

The Habitat studio is a clean room environment. The only libraries you will find in this environment are those managed by Habitat. You will find /lib and /usr/lib totally empty here:

[37][default:/src:0]# ls /lib -la
total 8
drwxr-xr-x  2 root root 4096 Dec 24 22:46 .
drwxr-xr-x 26 root root 4096 Dec 24 22:46 ..
lrwxrwxrwx  1 root root    3 Dec 24 22:46 lib -> lib
[38][default:/src:0]# ls /usr/lib -la
total 8
drwxr-xr-x 2 root root 4096 Dec 24 22:46 .
drwxr-xr-x 9 root root 4096 Dec 24 22:46 ..
lrwxrwxrwx 1 root root    3 Dec 24 22:46 lib -> lib

Habitat installs several packages into the studio including several familiar Linux utilities and build tools. Every utility and library that Habitat loads into the studio is a Habitat package itself.

[1][default:/src:0]# ls /hab/pkgs/core/
acl       cacerts    gawk      gzip            libbsd         mg       readline    vim
attr      coreutils  gcc-libs  hab             libcap         mpfr     sed         wget
bash      diffutils  glibc     hab-backline    libidn         ncurses  tar         xz
binutils  file       gmp       hab-plan-build  linux-headers  openssl  unzip       zlib
bzip2     findutils  grep      less            make           pcre     util-linux

This can be a double edged sword. On the one hand it protects us from undeclared dependencies being missed by our package. The darker side is that your plan may be building source that has build scripts that expect dependencies or other build tools to exist in their "usual" homes. If you are unfamiliar with how the standard Linux linker scans for dependencies, discovering what is wrong with your build may be less than obvious.

The rules of dependency scanning

So before we go any further, let's take a look at how the linker works and how Habitat configures its build environment to influence where it finds dependencies at both build and run time. The linker looks at a combination of environment variables, cli options and well known directory paths in a strict order of precedence. Here is a direct quote from the ld (the linker binary) man page:

The linker uses the following search paths to locate required shared libraries:

1. Any directories specified by -rpath-link options.
2. Any directories specified by -rpath options.  The difference between -rpath and -rpath-link is that directories specified by -rpath options are included in the executable and used at runtime, whereas the -rpath-link option is only effective at link time. Searching -rpath in this way is only supported by native linkers and cross linkers which have been configured with the --with-sysroot option.
3. On an ELF system, for native linkers, if the -rpath and -rpath-link options were not used, search the contents of the environment variable "LD_RUN_PATH".
4. On SunOS, if the -rpath option was not used, search any directories specified using -L options.
5. For a native linker, search the contents of the environment variable "LD_LIBRARY_PATH".
6. For a native ELF linker, the directories in "DT_RUNPATH" or "DT_RPATH" of a shared library are searched for shared libraries needed by it. The "DT_RPATH" entries are ignored if "DT_RUNPATH" entries exist.
7. The default directories, normally /lib and /usr/lib.
8. For a native linker on an ELF system, if the file /etc/ld.so.conf exists, the list of directories found in that file.
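To make the precedence concrete, here is a deliberately simplified simulation of rules 1, 2, 3, 5 and 7 from the list above. It is only a sketch for reasoning about the order. It is not how ld is implemented, it ignores rules 4, 6 and 8, and the function names search_dirs and resolve_lib are invented for this example.

```shell
# Print candidate directories, one per line, in search order.
# Arguments: rpath-link dirs, rpath dirs, LD_RUN_PATH, LD_LIBRARY_PATH
# (each a colon separated list, possibly empty).
search_dirs() {
  rpath_link=$1 rpath=$2 run_path=$3 library_path=$4
  {
    printf '%s\n' "$rpath_link" "$rpath"
    # Rule 3: LD_RUN_PATH only counts when neither -rpath nor -rpath-link was used.
    if [ -z "$rpath_link" ] && [ -z "$rpath" ]; then
      printf '%s\n' "$run_path"
    fi
    printf '%s\n' "$library_path" /lib /usr/lib
  } | tr ':' '\n' | sed '/^$/d'
}

# Print the first directory entry that contains the named library.
# Prints nothing when no candidate directory has it.
resolve_lib() {
  lib=$1; shift
  search_dirs "$@" | while IFS= read -r dir; do
    if [ -e "$dir/$lib" ]; then
      printf '%s\n' "$dir/$lib"
      break
    fi
  done
}
```

For example, resolve_lib libsodium.so.18 "" "" "$LD_RUN_PATH" "$LD_LIBRARY_PATH" walks the run path first and only then falls back to LD_LIBRARY_PATH and the default directories.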

At build time Habitat sets the $LD_RUN_PATH variable to the library path of every dependency that the building plan depends on. We can see this in Habitat's build output when we build a Habitat plan:

zeromq: Setting LD_RUN_PATH=/hab/pkgs/mwrock/zeromq/4.1.4/20161225135834/lib:/hab/pkgs/core/glibc/2.22/20160612063629/lib:/hab/pkgs/core/gcc-libs/5.2.0/20161208223920/lib:/hab/pkgs/core/libsodium/1.0.8/20161214075415/lib

This means that at run time, when you run your application built by habitat, it will load from the "habetized" packaged dependencies. This is because setting the $LD_RUN_PATH influences how the ELF metadata is constructed and causes it to point to these Habitat package paths.
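The value in that build output can be approximated from the DEPS manifest shown earlier: the package's own lib directory comes first, followed by the lib directory of each dependency. The sketch below assumes every dependency keeps its libraries in a lib directory. build_run_path is a hypothetical helper written for illustration, not Habitat's actual implementation.

```shell
# Build a colon separated run path from a package prefix (first argument)
# and a list of dependency identifiers read from stdin, one per line.
build_run_path() {
  path="$1/lib"
  while IFS= read -r ident || [ -n "$ident" ]; do
    [ -n "$ident" ] || continue          # skip blank lines
    path="$path:/hab/pkgs/$ident/lib"
  done
  printf '%s\n' "$path"
}
```

Running build_run_path /hab/pkgs/mwrock/zeromq/4.1.4/20161225135834 with the DEPS file on stdin reproduces the same four directories that appear in the zeromq build output.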

Patching pre-built binaries

Habitat not only allows one to build packages from source but also supports "binary-only" packages. These are packages that are made up of binaries downloaded from some external binary repository or distribution site. These are ideal for closed-source software or software that may be too complicated or takes too long to build. However, Habitat cannot control the linking process for these binaries. If you try to execute these binaries in a Habitat studio, you may see runtime failures.

The dotnet-core package is a good example of this. I ended up giving up on building that plan from source and instead just download the binaries from the public .NET distribution site. Running ldd on the dotnet binary, we see:

[8][default:/src:0]# ldd /hab/pkgs/mwrock/dotnet-core/1.0.0-preview3-003930/20161225145648/bin/dotnet
/hab/pkgs/core/glibc/2.22/20160612063629/bin/ldd: line 117:
/hab/pkgs/mwrock/dotnet-core/1.0.0-preview3-003930/20161225145648/bin/dotnet:
No such file or directory

Well that's not very clear. This isn't even able to show us any of the linked dependencies because the glibc interpreter the ELF metadata says to use is not where the metadata says it is:

[9][default:/src:1]# file /hab/pkgs/mwrock/dotnet-core/1.0.0-preview3-003930/20161225145648/bin/dotnet
/hab/pkgs/mwrock/dotnet-core/1.0.0-preview3-003930/20161225145648/bin/dotnet:
ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked,
interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 2.6.32,
BuildID[sha1]=db256f0ac90cd718d8ec2d157b29437ea8bcb37f, not stripped

/lib64/ld-linux-x86-64.so.2 does not exist in the studio. We can manually fix this even after a binary is built with a tool called patchelf. We will declare a build dependency in our plan on core/patchelf and then we can use the following command:

find -type f -name 'dotnet' \
  -exec patchelf --interpreter "$(pkg_path_for glibc)/lib/ld-linux-x86-64.so.2" {} \;
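To see which interpreter a binary requests, before or after patching, the output of file can be parsed directly. This is a minimal sketch that assumes the "interpreter <path>," wording shown in the file output above. The two function names are hypothetical.

```shell
# Reads `file` output on stdin and prints the interpreter path it reports.
elf_interpreter() {
  sed -n 's/.*interpreter \([^,]*\),.*/\1/p'
}

# Reads `file` output on stdin and succeeds only when the reported
# interpreter actually exists on this filesystem.
interpreter_present() {
  interp=$(elf_interpreter)
  [ -n "$interp" ] && [ -e "$interp" ]
}
```

Inside the studio, piping file's output for the dotnet binary into interpreter_present would fail before the patch, since /lib64/ld-linux-x86-64.so.2 is not there.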

Now let's try ldd again:

[16][default:/src:130]# ldd /hab/pkgs/mwrock/dotnet-core/1.0.0-preview3-003930/20161225151837/bin/dotnet
linux-vdso.so.1 (0x00007ffe421eb000)
libdl.so.2 => /hab/pkgs/core/glibc/2.22/20160612063629/lib/libdl.so.2 (0x00007fcb0b2cc000)
libpthread.so.0 => /hab/pkgs/core/glibc/2.22/20160612063629/lib/libpthread.so.0 (0x00007fcb0b0af000)
libstdc++.so.6 => not found
libm.so.6 => /hab/pkgs/core/glibc/2.22/20160612063629/lib/libm.so.6 (0x00007fcb0adb1000)
libgcc_s.so.1 => not found
libc.so.6 => /hab/pkgs/core/glibc/2.22/20160612063629/lib/libc.so.6 (0x00007fcb0aa0d000)
/hab/pkgs/core/glibc/2.22/20160612063629/lib/ld-linux-x86-64.so.2 (0x00007fcb0b4d0000)

This is better. It now links our glibc dependencies to the Habitat packaged glibc binaries, but there are still a couple dependencies that the linker could not find. At least now we can see more clearly what they are.
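Scanning ldd output by eye gets tedious for binaries with many dependencies. A small filter can list only the entries that are missing or that resolve outside of Habitat's package root. The sketch below uses awk and is an illustration only. The audit_ldd name is invented, and it does not inspect the interpreter line or the linux-vdso entry.

```shell
# Reads `ldd` output on stdin. Prints every library that is "not found"
# or that resolves to a path which does not start with the given prefix.
audit_ldd() {
  awk -v prefix="$1" '
    /=> not found/ { print $1 " (not found)"; next }
    $2 == "=>" && $3 ~ /^\// && index($3, prefix) != 1 { print $1 " (" $3 ")" }
  '
}
```

Usage would look like ldd ./bin/dotnet | audit_ldd /hab/pkgs/ and an empty result means every listed library came from a Habitat package.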

There is another argument we can pass to patchelf, --set-rpath, that can edit the ELF metadata as if $LD_RUN_PATH was set when the binary was built:

find -type f -name 'dotnet' \
  -exec patchelf --interpreter "$(pkg_path_for glibc)/lib/ld-linux-x86-64.so.2" --set-rpath "$LD_RUN_PATH" {} \;
find -type f -name '*.so*' \
  -exec patchelf --set-rpath "$LD_RUN_PATH" {} \;

So we set the rpath to the $LD_RUN_PATH set in the Habitat environment. We will also make sure to do this for each *.so file in the directory where we downloaded the distributable binaries. Finally ldd now finds all of our dependencies:

[19][default:/src:130]# ldd /hab/pkgs/mwrock/dotnet-core/1.0.0-preview3-003930/20161225152801/bin/dotnet
linux-vdso.so.1 (0x00007fff3e9a4000)
libdl.so.2 => /hab/pkgs/core/glibc/2.22/20160612063629/lib/libdl.so.2 (0x00007f1e68834000)
libpthread.so.0 => /hab/pkgs/core/glibc/2.22/20160612063629/lib/libpthread.so.0 (0x00007f1e68617000)
libstdc++.so.6 => /hab/pkgs/core/gcc-libs/5.2.0/20161208223920/lib/libstdc++.so.6 (0x00007f1e6829d000)
libm.so.6 => /hab/pkgs/core/glibc/2.22/20160612063629/lib/libm.so.6 (0x00007f1e67f9f000)
libgcc_s.so.1 => /hab/pkgs/core/gcc-libs/5.2.0/20161208223920/lib/libgcc_s.so.1 (0x00007f1e67d89000)
libc.so.6 => /hab/pkgs/core/glibc/2.22/20160612063629/lib/libc.so.6 (0x00007f1e679e5000)
/hab/pkgs/core/glibc/2.22/20160612063629/lib/ld-linux-x86-64.so.2 (0x00007f1e68a38000)

Every dependency is a Habitat packaged binary as declared in our own application's (dotnet-core here) dependencies, down to a level as low as glibc. This should be fully portable across any 64 bit Linux distribution.

Creating a Docker container Host on Windows Nano Server with Chef by Matt Wrock

This week Microsoft launched the release of Windows Server 2016 along with its ultra light headless deployment option - Nano Server. The Nano server images are many times smaller than what we have come to expect from a Windows server image. A Nano Vagrant box is just a few hundred megabytes. These machines also boot up VERY quickly and require fewer updates and reboots.

Earlier this year, I blogged about how to run a Chef client on Windows Nano Server. Things have come a long way since then and this post serves as an update. Now that the RTM Nano bits are out, we will look at:

  • How to get and run a Nano server
  • How to install the chef client on Windows Nano
  • How to use Test-Kitchen and Inspec to test your Windows Nano Server cookbooks.

The sample cookbook I'll be demonstrating here will highlight some of the new Windows container features in Nano server. It will install docker and allow you to use your Nano server as a container host where you can run, manipulate and inspect Windows containers from any Windows client.

How to get Windows Nano Server

You have a few options here. One thing to understand about Windows Nano is that there is no separate Windows Nano ISO. Deploying a Nano server involves extracting a WIM and some powershell scripts from a Windows 2016 Server ISO. You can then use those scripts to generate a .VHD file from the WIM or you can use the WIM to deploy Nano to a bare metal server. There are some shortcuts available if you don't want to mess with the scripts and prefer a more instantly gratifying experience. Let's explore these scenarios.

Using New-NanoServerImage to create your Nano image

If you mount the server 2016 ISO (free evaluation versions available here), you will find a "NanoServer\NanoServerImageGenerator" folder containing a NanoServerImageGenerator powershell module. This module's core function is New-NanoServerImage. Here is an example of using it to produce a Nano Server VHD:

Import-Module NanoServerImageGenerator.psd1
$adminPassword = ConvertTo-SecureString "vagrant" -AsPlainText -Force

New-NanoServerImage `
  -MediaPath D:\ `
  -BasePath .\Base `
  -TargetPath .\Nano\Nano.vhdx `
  -ComputerName Nano `
  -Package @('Microsoft-NanoServer-DSC-Package','Microsoft-NanoServer-IIS-Package') `
  -Containers `
  -DeploymentType Guest `
  -Edition Standard `
  -AdministratorPassword $adminPassword

This will generate a Nano Hyper-V capable image file of a Container/DSC/IIS ready Nano server. You can read more about the details and other options of this function in this Technet article.

Direct EXE/VHD download

As I briefly noted above, you can download evaluation copies of Windows Server 2016. Instead of downloading a full multi gigabyte Windows ISO, you could choose the exe/vhd download option. This will download an exe file that will extract a pre-made vhd. You can then create a new Hyper-V VM from the vhd. With that vm, just login to the Nano console to set the administrative password and you are good to go.

Vagrant

This is my installation method of choice. I use a packer template to automate the download of the 2016 server ISO, the generation of the image file and finally package the image both for Hyper-V and VirtualBox Vagrant providers. I keep the image publicly available on Atlas via mwrock/WindowsNano. The advantage of these images is that they are fully patched (key for docker to work with Windows containers), work with VirtualBox and enable file sharing ports so you can map a drive to Nano.

Vagrant Nano bug

One challenge working with Nano Server and cross platform automation tools such as vagrant is that Nano exposes a Powershell.exe with no -EncodedCommand argument, which many cross platform WinRM libraries leverage to invoke remote Powershell on a Windows box.

Shawn Neal and I rewrote the WinRM ruby gem to use PSRP (powershell remoting protocol) to talk powershell and allow it to interact with Nano server. This has been integrated with all the Chef based tools and I will be porting it to Vagrant soon. In the meantime, a "vagrant up" will hang after creating the VM. Know that the VM is in fact fully functional and connectable. I'll mention a hack you can apply to get Test-Kitchen's vagrant driver working later in this post.

Connecting to Windows Nano Server

Once you have a Nano server VM up and running, you will probably want to actually use it. Note: There is no RDP available here. You can connect to Nano and run commands either using native Powershell Remoting from a Windows box (powershell on Linux does not yet support remoting) or use knife-windows' "knife winrm" from Windows, Mac or Linux.

Powershell Remoting:

$ip = "<ip address of Nano Server>"

# You only need to add the trusted host once
Set-Item WSMan:\localhost\Client\TrustedHosts $ip
# use username and password "vagrant" on the mwrock vagrant box
Enter-PSSession -ComputerName $ip -Credential Administrator

Knife-Windows:

# mwrock vagrant boxes have a username and password "vagrant"
# add "--winrm-port 55985" for local VirtualBox
knife winrm -m <ip address of Nano Server> "your command" --winrm-user <user> --winrm-password <password>

Note that knife winrm expects "cmd.exe" style commands by default. Use "--winrm-shell powershell" to send powershell commands.

Installing Chef on Windows Nano Server

Quick tip: Do not try to install a chef client MSI. That will not work.

Windows Nano server jettisons many of the APIs and subsystems we have grown accustomed to in order to achieve a much more compact and cloud friendly footprint. This includes the removal of the MSI subsystem. Nano server does support the newer appx packaging system currently best known as the format for packaging Windows Store Apps. With Nano Server, new extensions have been added to the appx model to support what is now known as "Windows Server Applications" (aka WSAs).

At Chef, we have added the creation of appx packages into our build pipelines but these are not yet exposed by our Artifactory and Bintray fed Omnitruck delivery mechanism. That will happen but in the mean time, I have uploaded one to a public AWS S3 bucket. You can grab the current client (as of this post) here. To install this .appx file (note: if using Test-Kitchen, this is all done automatically for you):

  1. Either copy the .appx file via a mapped drive or just download it from the Nano server using this powershell function.
  2. Run "Add-AppxPackage -Path <path to .appx file>"
  3. Copy the appx install to c:\opscode\chef:
  $rootParent = "c:\opscode"
  $chef_omnibus_root = Join-Path $rootParent "chef"
  
  if(!(Test-Path $rootParent)) {
    New-Item -ItemType Directory -Path $rootParent
  }

  # Remove old version of chef if it is here
  if(Test-Path $chef_omnibus_root) {
    Remove-Item -Path $chef_omnibus_root -Recurse -Force
  }

  # copy the appx install to the omnibus_root. There are serious
  # ACL related issues with running chef from the appx InstallLocation
  # This is temporary pending a fix from Microsoft.
  # We can eventually just symlink
  $package = (Get-AppxPackage -Name chef).InstallLocation
  Copy-Item $package $chef_omnibus_root -Recurse

The last item is a bit unfortunate but temporary. Microsoft has confirmed this to be an issue with running simple zipped appx applications. The ACLs on the appx install root are seriously restricted and you cannot invoke the chef client from that location. Until this is fixed, you need to copy the files from the appx location to somewhere else. We'll just copy to the well known Chef default location on Windows c:\opscode\chef.

Running Chef

With the chef client installed, it's easiest to work with chef when it's on your path. To add it, run:

$env:path += ";c:\opscode\chef\bin;c:\opscode\chef\embedded\bin"

# For persistent use, will apply even after a reboot.
setx PATH $env:path /M

Now you can run the chef client just as you would anywhere else. Here I'll check the version using knife:

C:\dev\docker_nano_host [master]> knife winrm -m 192.168.137.25 "chef-client -v" --winrm-user vagrant --winrm-password vagrant
192.168.137.25 Chef: 12.14.60

Not all resources may work

I have to include this disclaimer. Nano is a very different animal than our familiar 2012 R2. I am confident that the newly launched Windows Server 2016 should work just as 2012 R2 does today, but nano has APIs that have been stripped away that we have previously leveraged heavily in Chef and Inspec. One example is Get-WmiObject. This cmdlet is not available on Nano Server so any usage that depends on it will fail.

Most of the crucial areas surrounding installing and invoking chef are patched and tested. However, there may be resources that either have not yet been patched or will simply never work. The windows_package resource is a good example. It's used to install MSIs and EXE installers, which are not supported on Nano.

Test-Kitchen and Inspec on Nano

The WinRM rewrite to leverage PSRP allows our remote execution ecosystem tools to access Windows Nano Server. We have also overhauled our mixlib-install gem to use .Net core APIs (the .Net runtime supported on Nano) for the chef provisioners. With those changes in place, Test-Kitchen can install and run Chef, and Inspec can test resources on your Nano instances.

There are a few things to consider when using Test-Kitchen on Windows Nano:

Specifying the Chef appx installer

As I mentioned above, the "OmniTruck" system is not yet serving appx packages to Nano. However, you can tell Test-Kitchen in your .kitchen.yml to use a specific .msi or .appx installer. Here is some example yaml for running Test-Kitchen with Nano:

---
driver:
  name: vagrant

provisioner:
  name: chef_zero
  install_msi_url: https://s3-us-west-2.amazonaws.com/nano-chef-client/chef-12.14.60.appx

verifier:
  name: inspec

platforms:
  - name: windows-nano
    driver_config:
      box: mwrock/WindowsNano

Inspec requires no configuration changes.
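If you want a quick sanity check that your .kitchen.yml really points the provisioner at an appx installer, here is a minimal sketch using only Ruby's standard YAML library. The installer_url helper is hypothetical (it is not part of Test-Kitchen), and the sample config is a trimmed copy of the yaml above:

```ruby
require 'yaml'

# Hypothetical helper, not part of Test-Kitchen: returns the installer URL
# configured for the provisioner, or nil when none is set.
def installer_url(kitchen_yaml)
  config = YAML.safe_load(kitchen_yaml) || {}
  config.fetch('provisioner', {})['install_msi_url']
end

# Trimmed copy of the example .kitchen.yml from this post.
SAMPLE_KITCHEN_YAML = <<~CONFIG
  ---
  provisioner:
    name: chef_zero
    install_msi_url: https://s3-us-west-2.amazonaws.com/nano-chef-client/chef-12.14.60.appx
  platforms:
    - name: windows-nano
CONFIG

url = installer_url(SAMPLE_KITCHEN_YAML)
puts "installer: #{url}"
puts "appx installer? #{url.to_s.end_with?('.appx')}"
```

To check a real file, pass File.read('.kitchen.yml') instead of the sample string.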

Working around Vagrant hangs

Until I refactor Vagrant's winrm communicator, it cannot talk powershell with Windows Nano. Because Test-Kitchen and Inspec talk to Nano directly via the WinRM ruby gem, which now supports PSRP, they make Vagrant's limitation nearly unnoticeable. However, the RTM Nano bits exacerbated the Vagrant bug, causing it to hang when it does its initial winrm auth check. This can unfortunately hang your kitchen create. You can work around this by applying a simple "hack" to your vagrant install:

Update C:\HashiCorp\Vagrant\embedded\gems\gems\vagrant-1.8.5\plugins\communicators\winrm\communicator.rb (adjusting the vagrant gem version number as necessary) and change:

result = Timeout.timeout(@machine.config.winrm.timeout) do
  shell(true).powershell("hostname")
end

to:

result = Timeout.timeout(@machine.config.winrm.timeout) do
  shell(true).cmd("hostname")
end

This should get your test-kitchen runs unblocked.

Running on Azure hosted Nano images

If you prefer to run Test-Kitchen and Inspec against an Azure hosted VM instead of vagrant, use Stuart Preston's excellent kitchen-azurerm driver:

---
driver:
  name: azurerm

driver_config:
  subscription_id: 'your subscription id'
  location: 'West Europe'
  machine_size: 'Standard_F1'

platforms:
  - name: windowsnano
    driver_config:
      image_urn: MicrosoftWindowsServer:WindowsServer:2016-Nano-Server-Technical-Preview:latest

See the kitchen-azurerm readme for details regarding azure authentication configuration. As of the date of this post, RTM images are not yet available but that's probably going to change very soon. In the meantime, use TP5.

Using Chef to Configure a Docker host

One of the exciting new features of Windows Server 2016 and Nano Server is their ability to host Windows containers. They can do this using the same Docker API we are familiar with from linux containers. You could walk through the official instructions for setting this up or you could just have Chef do this for you.

Updating the Nano server

Note that in order for this to work on RTM Nano images, you must install the latest Windows updates. My vagrant boxes come fully patched and ready but if you are wondering how to install updates on a Nano server, here is how:

$sess = New-CimInstance -Namespace root/Microsoft/Windows/WindowsUpdate -ClassName MSFT_WUOperationsSession
Invoke-CimMethod -InputObject $sess -MethodName ApplyApplicableUpdates

Then just reboot and you are good.

A sample cookbook to install and configure the Docker service

I converted the above mentioned instructions for installing Docker and configuring the service into a Chef cookbook recipe. It's fairly straightforward:

powershell_script 'install Nuget package provider' do
  code 'Install-PackageProvider -Name NuGet -Force'
  not_if '(Get-PackageProvider -Name Nuget -ListAvailable -ErrorAction SilentlyContinue) -ne $null'
end

powershell_script 'install nano container package' do
  code 'Install-Module -Name xNetworking -Force'
  not_if '(Get-Module xNetworking -list) -ne $null'
end

zip_path = "#{Chef::Config[:file_cache_path]}/docker.zip"
docker_config = File.join(ENV["ProgramData"], "docker", "config")

remote_file zip_path do
  source "https://download.docker.com/components/engine/windows-server/cs-1.12/docker.zip"
  action :create_if_missing
end

dsc_resource "Extract Docker" do
  resource :archive
  property :path, zip_path
  property :ensure, "Present"
  property :destination, ENV["ProgramFiles"]
end

directory docker_config do
  recursive true
end

file File.join(docker_config, "daemon.json") do
  content "{ \"hosts\": [\"tcp://0.0.0.0:2375\", \"npipe://\"] }"
end

powershell_script "install docker service" do
  code "& '#{File.join(ENV["ProgramFiles"], "docker", "dockerd")}' --register-service"
  not_if "Get-Service docker -ErrorAction SilentlyContinue"
end

service 'docker' do
  action [:start]
end

dsc_resource "Enable docker firewall rule" do
  resource :xfirewall
  property :name, "Docker daemon"
  property :direction, "inbound"
  property :action, "allow"
  property :protocol, "tcp"
  property :localport, [ "2375" ]
  property :ensure, "Present"
  property :enabled, "True"
end

This downloads the appropriate docker binaries, installs the docker service and configures it to listen on port 2375.
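The daemon.json content in the recipe is a hand-escaped string, which is easy to get wrong as the config grows. As a minimal sketch (an alternative, not what the recipe above does), the same document can be built from a Ruby hash with the standard JSON library:

```ruby
require 'json'

# Build the daemon configuration as a Ruby hash and let the JSON library
# handle the quoting. The two hosts are the ones used in the recipe.
daemon_config = {
  'hosts' => ['tcp://0.0.0.0:2375', 'npipe://']
}

daemon_json = JSON.generate(daemon_config)
puts daemon_json
```

Since a recipe is plain Ruby, the generated string could presumably be handed to the file resource's content property in place of the escaped literal.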

To validate that all actually worked we have these Inspec tests:

describe port(2375) do
  it { should be_listening }
end

describe command("& '$env:ProgramFiles/docker/docker' ps") do
  its('exit_status') { should eq 0 }
end

describe command("(Get-service -Name 'docker').status") do
  its(:stdout) { should eq("Running\r\n") }
end

If this all passes, we know our server is listening on the expected port and that docker commands work.

Converge and Verify

So let's run these with kitchen verify:

C:\dev\docker_nano_host [master]> kitchen verify
-----> Starting Kitchen (v1.13.0)
-----> Creating <default-windows-nano>...
       Bringing machine 'default' up with 'hyperv' provider...
       ==> default: Verifying Hyper-V is enabled...
       ==> default: Starting the machine...
       ==> default: Waiting for the machine to report its IP address...
           default: Timeout: 240 seconds
           default: IP: 192.168.137.25
       ==> default: Waiting for machine to boot. This may take a few minutes...
           default: WinRM address: 192.168.137.25:5985
           default: WinRM username: vagrant
           default: WinRM execution_time_limit: PT2H
           default: WinRM transport: negotiate
       ==> default: Machine booted and ready!
       ==> default: Machine not provisioned because `--no-provision` is specified.
       [WinRM] Established

       Vagrant instance <default-windows-nano> created.
       Finished creating <default-windows-nano> (1m15.86s).
-----> Converging <default-windows-nano>...

...


  Port 2375
     ✔  should be listening
  Command &
     ✔  '$env:ProgramFiles/docker/docker' ps exit_status should eq 0
  Command (Get-service
     ✔  -Name 'docker').status stdout should eq "Running\r\n"

Summary: 3 successful, 0 failures, 0 skipped
       Finished verifying <default-windows-nano> (0m11.94s).

Ok our docker host is ready.

Creating and running a Windows container

First if you are running Nano on VirtualBox, you need to add a port forwarding rule for port 2375. Also note that you will need the docker client installed on the machine where you intend to run docker commands. I'm running them from my Windows 10 laptop. To install docker on Windows 10:

Invoke-WebRequest "https://download.docker.com/components/engine/windows-server/cs-1.12/docker.zip" -OutFile "$env:TEMP\docker.zip" -UseBasicParsing

Expand-Archive -Path "$env:TEMP\docker.zip" -DestinationPath $env:ProgramFiles

$env:path += ";c:\program files\docker"

No matter what platform you are running on, once you have the docker client, you need to tell it to use your Nano server as the docker host. Simply set the DOCKER_HOST environment variable to "tcp://<ipaddress of server>:2375".

So now let's download a nanoserver container image from the docker hub repository:

C:\dev\NanoVHD [update]> docker pull microsoft/nanoserver
Using default tag: latest
latest: Pulling from microsoft/nanoserver
5496abde368a: Pull complete
Digest: sha256:aee7d4330fe3dc5987c808f647441c16ed2fa1c7d9c6ef49d6498e5c9860b50b
Status: Downloaded newer image for microsoft/nanoserver:latest

Now let's run a command...heck, let's just launch an interactive powershell session inside the container with:

docker run -it microsoft/nanoserver powershell

Here is what we get:

Windows PowerShell
Copyright (C) 2016 Microsoft Corporation. All rights reserved.

PS C:\> ipconfig

Windows IP Configuration


Ethernet adapter vEthernet (Temp Nic Name):

   Connection-specific DNS Suffix  . : mshome.net
   Link-local IPv6 Address . . . . . : fe80::2029:a119:3e4f:851a%15
   IPv4 Address. . . . . . . . . . . : 172.30.245.4
   Subnet Mask . . . . . . . . . . . : 255.255.240.0
   Default Gateway . . . . . . . . . : 172.30.240.1
PS C:\> $env:COMPUTERNAME
E1C534D94707
PS C:\>

Ahhwwww yeeeeaaaahhhhhhh.

What's next?

So we have made a lot of progress over the last few months but the story is not entirely complete. We still need to finish knife bootstrap windows winrm and plug in our azure extension.

Please let us know what works and what does not work. I personally want to see Nano server succeed and of course we intend for Chef to provide a positive Windows Nano Server configuration story.

Released WinRM Gem 2.0 with a cross-platform, open source PSRP client implementation by Matt Wrock

Today we released the gems: WinRM 2.0, winrm-fs 1.0 and winrm-elevated 1.0. I first talked about this work in this post and have since performed extensive testing (but I have confidence the first bug will be reported soon) and made several improvements. Today it's released and available to any consuming application wanting to use it and we should see a Test-Kitchen release in the near future upgrading its winrm gems. Up next will be knife-windows and vagrant.

This is a near rewrite of the WinRM gem. It's gotten crufty over the years and its API and internal structure needed some attention. This release fixes several bugs and brings some big improvements. You should read the readme to catch up on the changes but here is how it looks in a nutshell (or an IRB shell):

mwrock@ubuwrock:~$ irb
2.2.1 :001 > require 'winrm'
 => true
2.2.1 :002 > opts = {
2.2.1 :003 >       endpoint: 'http://127.0.0.1:55985/wsman',
2.2.1 :004 >       user: 'vagrant',
2.2.1 :005 >       password: 'vagrant'
2.2.1 :006?>   }
 => {:endpoint=>"http://127.0.0.1:55985/wsman", :user=>"vagrant", :password=>"vagrant"}
2.2.1 :007 > conn = WinRM::Connection.new(opts); nil
 => nil
2.2.1 :008 > conn.shell(:powershell) do |shell|
2.2.1 :009 >       shell.run('$PSVersionTable') do |stdout, stderr|
2.2.1 :010 >           STDOUT.print stdout
2.2.1 :011?>         STDERR.print stderr
2.2.1 :012?>       end
2.2.1 :013?>   end; nil

Name                           Value
----                           -----
PSVersion                      4.0
WSManStackVersion              3.0
SerializationVersion           1.1.0.1
CLRVersion                     4.0.30319.34209
BuildVersion                   6.3.9600.17400
PSCompatibleVersions           {1.0, 2.0, 3.0, 4.0}
PSRemotingProtocolVersion      2.2

Note this is run from an Ubuntu 14.04 host targeting a Windows 2012R2 VirtualBox VM. No Windows host required.

100% Ruby PSRP client implementation

So for the four people reading this that know what this means: yaaay! woohoo! you go girl!! we talk PSRP now. yo.

No...Really...why should I care about this?

I'll be honest, there are tons of scenarios where PSRP will not make any difference, but here are some tangible points where it undoubtedly makes things better:

  • File copy can be orders of magnitude faster. If you use the winrm-fs gem to copy files to a remote windows machine, you may see transfer speeds as much as 30x faster. This will be more noticeable transferring files larger than several kilobytes. For example, the PSRP specification PDF - about 4 and a half MB - takes about 4 seconds via this release vs 2 minutes on the previous release on my work laptop. For details as to why PSRP is so much faster, see this post.
  • The WinRM gems can talk powershell to Windows Nano Server. The previous WinRM gem is unable to execute powershell commands against a Windows Nano server. If you are a test-kitchen user and would like to see this in action, clone https://github.com/mwrock/DSCTextfile and:
bundle install
bundle exec kitchen verify

This will download my WindowsNanoDSC vagrant box, provision it, converge a DSC file resource and test its success with Pester. You should notice that not only does the nano server's .box file download from the internet MUCH faster, it boots and converges several minutes faster than its Windows 2012R2 cousin.

Stay tuned for Chef based kitchen converges on Windows Nano!

  • You can now execute multiple commands that operate in the same scope (runspace). This means you can share variables and imported commands from call to call because calls share the same powershell runspace whereas before every call ran in a separate powershell.exe process. The winrm-fs gem is an example of how this is useful.
def stream_upload(input_io, dest)
  read_size = ((max_encoded_write - dest.length) / 4) * 3
  chunk, bytes = 1, 0
  buffer = ''
  shell.run(<<-EOS
    $to = $ExecutionContext.SessionState.Path.GetUnresolvedProviderPathFromPSPath("#{dest}")
    $parent = Split-Path $to
    if(!(Test-path $parent)) { mkdir $parent | Out-Null }
    $fileStream = New-Object -TypeName System.IO.FileStream -ArgumentList @(
        $to,
        [system.io.filemode]::Create,
        [System.io.FileAccess]::Write,
        [System.IO.FileShare]::ReadWrite
    )
    EOS
  )

  while input_io.read(read_size, buffer)
    bytes += (buffer.bytesize / 3 * 4)
    shell.run(stream_command([buffer].pack(BASE64_PACK)))
    logger.debug "Wrote chunk #{chunk} for #{dest}" if chunk % 25 == 0
    chunk += 1
    yield bytes if block_given?
  end
  shell.run('$fileStream.Dispose()')
  buffer = nil # rubocop:disable Lint/UselessAssignment

  [chunk - 1, bytes]
end

def stream_command(encoded_bytes)
  <<-EOS
    $bytes=[Convert]::FromBase64String('#{encoded_bytes}')
    $fileStream.Write($bytes, 0, $bytes.length)
  EOS
end

Here we issue some powershell to create a FileStream, then in ruby we iterate over an IO class and write to that FileStream instance as many times as we need and then dispose of the stream when done. Before, that FileStream would be gone on the next call and instead we'd have to open the file on each trip.

  • Non administrator users can execute commands. Because the former WinRM implementation was based on winrs, a user had to be an administrator in order to authenticate. Now non admin users, as long as they belong to the correct remoting users group, can execute remote commands.
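Going back to the stream_upload example, the read_size arithmetic is worth a closer look. Base64 turns every 3 raw bytes into 4 characters, so reading a multiple of 3 bytes keeps each encoded chunk within the command's character budget and free of padding. Here is a minimal local sketch of just that chunking, with no remote shell involved. The budget value, the sample payload and the read_size_for and encoded_chunks helpers are hypothetical, and I am assuming 'm0' (strict base64 without newlines) as the pack format; the real method also subtracts the destination path length from its budget:

```ruby
require 'stringio'

# Assumption: 'm0' (strict base64, no newlines) stands in for BASE64_PACK.
BASE64_PACK = 'm0'

# Largest raw read whose base64 form fits in `budget` characters.
# Round the character budget down to a multiple of 4, then convert
# back to raw bytes at 3 bytes per 4 characters.
def read_size_for(budget)
  (budget / 4) * 3
end

# Splits input_io into base64 chunks the same way the upload loop reads it.
def encoded_chunks(input_io, budget)
  read_size = read_size_for(budget)
  chunks = []
  buffer = +''
  while input_io.read(read_size, buffer)
    chunks << [buffer].pack(BASE64_PACK)
  end
  chunks
end

payload = 'x' * 10_000 # hypothetical file content
budget = 4_000         # hypothetical per-command character budget
chunks = encoded_chunks(StringIO.new(payload), budget)

# Decoding the chunks in order should restore the original bytes.
restored = chunks.map { |chunk| chunk.unpack1(BASE64_PACK) }.join
puts "chunks: #{chunks.length}, largest: #{chunks.map(&:length).max}"
puts "round trip ok: #{restored.b == payload.b}"
```

In the real gem each encoded chunk becomes one shell.run call writing to the shared $fileStream, which is exactly what the shared runspace makes possible.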

This is just the beginning

In and of itself, a WinRM release may not be that exciting but it lays the groundwork for some great experiences. I can't wait to explore testing infrastructure code on windows nano further and, sure, sane file transfer rates sound pretty great.

How can we most optimally shrink a Windows base image? by Matt Wrock

I have spent a lot of time trying to get my Windows vagrant boxes as small as possible. I blogged pretty extensively on what optimizations one can make and how those optimizations can be automated with Packer. Over the last week I've leveraged that automation to allow me to collect data on exactly how much each technique I employ saves in the final image. The results I think are very interesting.

Diving into the data

The metrics above reflect the savings yielded in a fully patched Windows 2012 R2 VirtualBox base image. The total size of the final compressed .box vagrant file with no optimizations was 7.71GB and 3.71GB with all optimizations applied.

I have previously blogged the details involved in each optimization and my Packer templates can be found online that automate this process. Let me quickly summarize these optimizations in order of biggest bang for buck:

  • SxS Cleanup (54%): The Windows SxS folder can grow larger and larger over time. This has historically been a major problem and until not too long ago, the only remedy was to periodically repave the OS. Among other things, this folder includes backups for every installed update so that they can be undone if necessary. The fact of the matter is that most will never roll back any update. Windows now exposes commands and scheduled tasks that allow us to periodically trim this data. Naturally this will have the most impact the more updates that have been installed.
  • Removing windows features or Features On Demand (25%): Windows ships with almost all installable features and roles on disk. In many/most cases, a server is built for a specific task and its dormant unenabled features simply take up valuable disk space. Another relatively new feature in Windows management is the ability to totally remove these features from disk. They can always be restored later either via external media or Windows Update.
  • Optimize Disk (13%): This is basically a defragmenter and optimizes the disk according to its used sectors. This will likely be more important as disk activity increases between OS install and the time of optimization.
  • Removing Junk/Temp files (5%): Here we simply delete the temp folders and a few other unnecessary files and directories created during setup. This will likely have minimal impact if the server has not undergone much true usage.
  • Removing the Page File (3%): This is a bit misleading because the server will have a page file. We just make sure that the image in the .box file has no page file (possibly a GB in space but compresses to far less). On first boot, the page file will be recreated.
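To put rough sizes on those percentages, here is a small worked calculation. It assumes the percentages are shares of the total savings (they add up to 100) and that the total is the difference between the two .box sizes quoted above; the per-item GB figures are derived, not measured:

```ruby
UNOPTIMIZED_GB = 7.71
OPTIMIZED_GB = 3.71

# Shares of the total savings as listed above. Assumption: each percentage
# is a share of the overall reduction, not of the original image size.
SHARES = {
  'SxS cleanup' => 54,
  'Features on Demand removal' => 25,
  'Optimize disk' => 13,
  'Junk/temp files' => 5,
  'Page file' => 3
}.freeze

total_saved = (UNOPTIMIZED_GB - OPTIMIZED_GB).round(2)

SHARES.each do |name, percent|
  gb = (total_saved * percent / 100.0).round(2)
  puts format('%-28s %3d%%  ~%.2f GB', name, percent, gb)
end
puts format('%-28s %3d%%  ~%.2f GB', 'Total', SHARES.values.sum, total_saved)
```

Under those assumptions the SxS cleanup alone accounts for a bit over 2 GB of the 4 GB saved.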

The importance of "0ing" out unused space

This is something that is of particular importance for VirtualBox images. This is the act of literally flipping every unused bit on disk to 0. Otherwise the image file treats this space as used in a final compressed .box file. The fascinating fact here is if you do NOT do this, you save NOTHING. At least that is the case for VirtualBox, though not for Hyper-V, and those are the only two I measured. So our 7.71 GB original patched OS with all optimizations applied but without this step compressed to 7.71GB. 0% savings.
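A quick way to see why zeroing matters is to compress a block of leftover "noise" and a block of zeros of the same size. This is only an illustrative sketch: it assumes a deflate-style compressor, and it uses seeded random bytes as a worst-case stand-in for stale data left behind by deleted files, which on a real disk would compress somewhat better than this:

```ruby
require 'zlib'

SIZE = 1_000_000 # one megabyte of pretend "free space"

# Stale free space: leftover bytes modeled as noise (seeded for repeatability).
stale = Random.new(42).bytes(SIZE)
# The same space after every byte has been set to zero.
zeroed = "\0" * SIZE

stale_size = Zlib::Deflate.deflate(stale).bytesize
zeroed_size = Zlib::Deflate.deflate(zeroed).bytesize

puts "stale free space compresses to  #{stale_size} bytes"
puts "zeroed free space compresses to #{zeroed_size} bytes"
```

The compressor has no idea which sectors the file system considers free, so deleted data costs just as much as live data until it is overwritten with zeros.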

This is small?

Let's not kid ourselves. As hard as we try to chip away at a windows base image, we are still left with a beast of an image. Sure we can cut a fully patched Windows image almost in half but it is still just under 4 GB. That's huge, especially compared to most bare Linux base images.

If you want to experience a truly small Windows image, you will want to explore Windows Nano Server. Only then will we achieve orders of magnitude of savings and enter into the Linux "ballpark". The vagrant boxes I have created for nano weigh in at about 300MB and also boot up very quickly.

Your images may vary

The numbers above reflect a particular Windows version and hypervisor. Different versions and hypervisors will assuredly yield different metrics.

There is less to optimize on newer OS versions

This is largely due to the number of Windows updates available. Today, a fresh Windows 2012 R2 image will install just over 220 updates compared to only 5 on Windows 2016 Technical Preview 5. 220 updates take up a lot of space, scattering bits all over the disk.

Different Hypervisor file types are more efficient than others

A VirtualBox .vmdk will not automatically optimize as well as a Hyper-V .vhd/x. Thus, come compression time, the final vagrant VirtualBox .box file will be much larger if you don't take steps yourself to optimize the disk.