I've mentioned this in several posts: for the last year I have coded primarily in ruby after spending over a decade in C#. See this post for my general observations contrasting Windows and Linux culture. Here I want to focus on language versions as well as dependency management, particularly comparing nuget to gem packages. This is a topic that has really tripped me up on several occasions. I used to think that ruby was inferior in its ability to avoid dependency conflicts, but in fact I just misunderstood it. Gems and Nuget are deceptively similar, but there are some concepts that Ruby separates into different tools like Bundler and that Nuget consolidates into the core package manager with some assistance from Visual Studio.
In this post I'll compare managing different versions of ruby with managing different .Net runtimes and then explore package dependency management.
.Net Runtimes vs. Ruby Versions
Individual versions of .net are the moral equivalent of different ruby versions. However, .net has just one physical install per runtime/architecture, saved to a common place on disk. Ruby can have multiple installations of the same version, stored in any location.
In .net, which runtime version to use is determined at application compile time (there are also ways to change this at runtime). One specifies which .net version to compile against and then that version will be loaded whenever the compiled program is invoked. In .net, the program "owns" the process and the runtime libraries are loaded into that process. This is somewhat orchestrated by the Windows registry which holds the location of the runtime on disk and therefore knows where to find the base runtime libraries to load. Note that I'm focusing on .net on windows and not mono which can run .Net on linux.
Ruby versions can be located anywhere on disk. They do not involve the registry at all. Which version is used depends entirely on which ruby executable is invoked. Unlike .net, the ruby user application is not the process entry point. Instead one always invokes the ruby binary (ruby.exe if on windows) and passes ruby files (*.rb) to load and run. Usually one controls which version is used system wide by putting that ruby's bin folder on the path.
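To make that concrete, here is a minimal sketch that asks the running interpreter where it lives and then walks the PATH the way a shell would. The helper names current_ruby_info and first_ruby_on_path are made up for this illustration; RbConfig and Gem.win_platform? are standard parts of ruby and RubyGems.

```ruby
require "rbconfig"

# Describe the interpreter that is executing this very script.
def current_ruby_info
  {
    executable: RbConfig.ruby,          # absolute path to the running ruby binary
    version: RUBY_VERSION,              # version of that interpreter
    bindir: RbConfig::CONFIG["bindir"]  # the bin folder one would put on the PATH
  }
end

# Walk the PATH in order and return the first ruby binary found, or nil.
def first_ruby_on_path(path = ENV.fetch("PATH", ""))
  names = Gem.win_platform? ? ["ruby.exe"] : ["ruby"]
  path.split(File::PATH_SEPARATOR).each do |dir|
    names.each do |name|
      candidate = File.join(dir, name)
      return candidate if File.file?(candidate) && File.executable?(candidate)
    end
  end
  nil
end

info = current_ruby_info
puts "running:       #{info[:executable]} (#{info[:version]})"
puts "first on PATH: #{first_ruby_on_path.inspect}"
```

If the two paths differ, the script was started with an explicit path to a ruby that is not the first one on the PATH, which is exactly the situation where two installs coexist on one machine.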
Loading the runtime, step by step
Let's look at both .net and ruby and see what exactly happens when you invoke a program.
.net
- Invoke an exe
- create process
- .net bootstrapper (_CorExeMain) loads .net runtime into process
- .net code (msil) is run
This glosses over some details but the main point is that each .net runtime resides in a single location on disk and each compiled .net program includes a small bootstrapper stub that calls _CorExeMain (exported by mscoree.dll), which makes some system calls to locate and load that runtime.
Also note that the .exe invoked is often not "the app". ASP.Net is a good example. Assume a traditional IIS web server is hosting your ASP.Net application. IIS spawns a worker process that hosts the asp.net application. An ASP.net developer did not write this worker process. They wrote code that is compiled into .DLL libraries. The worker process discovers these DLLs and loads them into the .Net runtime that it hosts.
ruby
Let's take the case of a popular ruby executable, bundler.
- Invoke bundle install on the command line
- Assuming C:\Ruby21-x64\bin is on the PATH, C:\Ruby21-x64\bin\bundler.bat is called. This is a "binstub" that RubyGems creates when the gem is installed.
- This is a thin wrapper that calls ruby.exe C:/Ruby21-x64/bin/bundler
- This bundler file is another thin wrapper that loads the bundler ruby gem and loads the bundler file in that gem's bin directory
- This bundler file contains the entry point of the bundler application
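The middle steps above can be sketched in a few lines of ruby. This is only an approximation of what a generated wrapper does, not a copy of one. It relies on Gem.bin_path, which is part of RubyGems; the helper name resolve_gem_executable is made up for this illustration.

```ruby
require "rubygems"

# Hypothetical helper: find the real script a binstub would hand control to.
# Returns the full path, or nil when no installed gem provides the executable.
def resolve_gem_executable(gem_name, exec_name, requirement = ">= 0")
  Gem.bin_path(gem_name, exec_name, requirement)
rescue Gem::Exception, LoadError
  nil
end

path = resolve_gem_executable("bundler", "bundle")
if path
  # A real wrapper would finish with `load path`, which runs the gem's entry point.
  puts "a binstub would load: #{path}"
else
  puts "bundler is not installed in this ruby"
end
```

The point is that the wrapper on the PATH holds no application logic. It only asks RubyGems where the installed gem keeps its executable and then loads that file into the already running ruby process.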
Managing multiple versions of the runtime on the same machine
.net places its copies of runtime libraries in the Microsoft.NET directory under the Windows directory (typically C:\Windows\Microsoft.NET). A .net developer will compile their library targeting a specific runtime version. Also, a configuration file can be created for any .Net .exe file that can inform which runtime to load. The developer just needs to ensure that the .net version they use exists on the machine.
With Ruby, it's all a matter of pointing to the ruby binary in the bin folder of the installed ruby directory tree you want to use. When not using absolute file paths, this is influenced by one's PATH settings.
rvm
One popular and convenient method of managing multiple ruby versions is using a tool called rvm. This is really meant for consumption on linux machines. I do believe it can be used with cygwin on windows machines. Personally I'm not a big cygwin fan and prefer to just use a linux VM where I can use native rvm if I need to switch ruby versions.
rvm exposes commands that can install different versions of ruby and easily switch one's working environment from one version to another.
omnibus installs
One big challenge in ruby is distributing an application. When one writes a ruby program either as a library to be consumed or as a command line executable, it is most common to package this code into one or more ruby gems. However, a gem is just a collection of .rb files and some other supporting files. The gem does not contain the ruby runtime itself. If I were to give someone a .gem file who was unfamiliar with ruby, they would not have a clue as to what to do with that file, but I would still love them.
That person, god bless them, would need to install ruby and then install that gem into the ruby installation.
So one way that ruby applications have begun to distribute themselves is via an omnibus. An omnibus installation ships with a full ruby runtime embedded in the distributed app. Its application code is still packaged as one or more gems and they are located in the special gems directory of this ruby installation along with all of its dependent gems. This also ensures that all the dependent gems and their required versions are preinstalled to eliminate the risk of dependency conflicts. Two example omnibus applications that I regularly work with are Chef and Vagrant.
So chef might have the following directory structure:
chef
|_bin
|  |_chef.bat
|  |_chef
|_embedded
Upon installation, chef/bin is added to the PATH so that calling chef from the command line will invoke the chef.bat file. That chef.bat file is a thin wrapper that calls chef/embedded/bin/ruby.exe and loads the ruby script in chef/bin/chef which then calls into the chef gem.
So the advantage of the omnibus is a complete ruby environment that is not at risk of containing user loaded gems. The disadvantage is that any ruby app even if it is tiny needs to distribute itself with a complete ruby runtime which is not small.
Assemblies
Before we dive into nuget and gem packages, we need to address assemblies (.DLLs) which only exist in .net. How do assemblies, .gems and .nupkg (nuget) files map to one another? Assemblies are the final container of application logic and are what physically compose the built .net application. An assembly starts out as a collection of code files that are compiled down to IL (intermediate language) and packaged as a .DLL file. In package management terms, assemblies are what gets packaged but they are not the package.
Assemblies can exist in various places on disk. Typically they will exist in one of two places, the Global Assembly Cache (GAC) or in an application's bin directory. When a .net application is compiled, every assembly includes an embedded manifest of its version and the versions of all dependent assemblies it was built with. At runtime, .net will try to locate assemblies of these versions unless there is configuration metadata telling it otherwise.
For strong named assemblies, the .net runtime will always search the GAC first (there are tricks to subvert this) and then fall back to the bin paths configured for the application. Assemblies that are not strong named are only looked for in those bin paths. For details on assembly loading see this MSDN article. Side note: The GAC is evil and I am inclined to look down upon those who use it and their children. Other than that, I have no strong opinions on the matter.
So software teams have to maintain build processes that ensure that any version of their application is always built with an agreed upon set of assemblies. Some of these assemblies may be written by the same application team, others might be system assemblies that ship with the OS, others might be official microsoft assemblies freely available from Microsoft Downloads, and others might be from other commercial or open source projects. Keeping all of these straight can be a herculean effort. Often it comes down to just putting all of these dependencies (in their compiled form) in source control in the same repo as the consuming application. For the longest time - like a decade - this was version management in .net, and it remains so for many today.
This suffers several disadvantages. Bloated source control for one thing. These assemblies can eventually take over the majority of a repository's space (multiple gigabytes). They do not lend themselves to being preserved as deltas and so a lot of their data is duplicated in the repo. Eventually, this crushes the productivity of builds and developer workflow since so much time is wasted pulling these bits down from source control.
One strategy to overcome this inefficiency is for larger teams or groups of teams to place all of their dependent assemblies in a common directory. This can save space and duplication since different teams that depend on the same assembly will basically share the same physical file. But now teams must version dependencies at the same cadence and eventually find themselves bound to a monolith leading to other practices that impede the maintainability of the application and make engineers cry and want to kill baby seals.
Package management
Enter package management. Package management performs several valuable functions. Here are just a few:
- Dependency discovery - finding dependencies
- Dependency delivery - downloading dependencies
- Dependency storage and versioning outside of source control
Ruby was the first (between itself and .net) to implement this with RubyGems and later, inspired by ruby, .net introduced nuget.
A little history: ngem was the first incarnation of nuget that had several starts and stops. David Laribee came up with Nubular as the name for ngem in 2008 and it stuck. Later Dru Sellers, Rob Reynolds, Chris Patterson, Nick Parker, and Bil Simser picked it up as a ruby project instead of .net and started moving really fast.
In the meantime Microsoft had quietly been working on a project called NPack and had been doing so for about four months when they contacted the Nu team. Nu was getting wildly popular in a matter of a few short weeks. These teams combined forces because it was the best possible thing for the community - and to signify the joining forces it was renamed nupack.
Shortly thereafter it was discovered that Caltech had a tool called nucleic acid package or nupack for short so it was renamed to nuget which is what it remains today.
My guess, totally unsubstantiated, is that one reason why ruby was the first to develop this is because ruby has no assembly concept. Ruby is interpreted and therefore all of the individual code files are stored with the application. With assemblies, it's at least somewhat sane to have a unit of dependency be a single file that has a version embedded inside and that is not easy to tamper with and break.
Similarities between gems and nugets
There is so much that is similar here that it is easy to be deceived into thinking the two have more in common than they really do. So first let's cover some things that truly are the same. Keep in mind nuget was originally implemented as gems that could be stored on rubygems.org so there is a reason for the similarities.
gemspec/nuspec
Both have a "spec" file stored at the root of the package that contains metadata about the package. Ruby calls this a gemspec and nuget a nuspec. The key bits of data which both support are package version, name, content manifest and other packages this one depends on.
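Since a gemspec is itself ruby code, those key bits are easy to show. Here is a minimal sketch of one built in memory with Gem::Specification from RubyGems. The gem name, version, author, file and dependency are all made up for illustration.

```ruby
require "rubygems"

# Every name and version below is hypothetical.
spec = Gem::Specification.new do |s|
  s.name    = "example_widget"                      # package name
  s.version = "1.2.0"                               # package version
  s.summary = "A made up gem that illustrates gemspec metadata"
  s.authors = ["Example Author"]
  s.files   = ["lib/example_widget.rb"]             # content manifest
  s.add_dependency "example_logger", "~> 2.1"       # another package this one depends on
end

puts "#{spec.name} #{spec.version}"
spec.dependencies.each do |dep|
  puts "  depends on #{dep.name} (#{dep.requirement})"
end
```

A nuspec carries the same kinds of fields (id, version, files, dependencies) but expresses them as XML rather than as code.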
Feeds
Both gems and nuget packages can be discovered and downloaded from an http source or feed. These feeds expose an API that lets package consumers query an http endpoint for the packages and versions it has, along with a means of downloading the package file.
Semantic versioning
Both dependency resolvers are based on semantic versioning. However they use different nomenclature for specifying allowed ranges.
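As a rough illustration of the different nomenclature, here is how RubyGems evaluates a few constraints, with what I believe is the closest nuget interval notation in the comments. The helper satisfies? is made up for this sketch, the nuget equivalents are approximate, and prerelease versions complicate the picture on both sides.

```ruby
require "rubygems"

# Returns true when the version string satisfies the RubyGems constraint(s).
def satisfies?(version, *constraints)
  Gem::Requirement.new(*constraints).satisfied_by?(Gem::Version.new(version))
end

# "~> 1.2" means >= 1.2 and < 2.0. nuget would write roughly [1.2,2.0)
puts satisfies?("1.9.3", "~> 1.2")          # true
puts satisfies?("2.0.0", "~> 1.2")          # false

# "~> 1.2.3" means >= 1.2.3 and < 1.3.0. nuget: roughly [1.2.3,1.3.0)
puts satisfies?("1.2.9", "~> 1.2.3")        # true
puts satisfies?("1.3.0", "~> 1.2.3")        # false

# An explicit range is spelled as two constraints. nuget: roughly [1.0,2.0)
puts satisfies?("1.5.0", ">= 1.0", "< 2.0") # true
```

The "~>" operator (often called the pessimistic operator) is the usual way gems accept newer releases while excluding the next breaking one. Nuget has no direct counterpart to it and spells the range out with brackets and parentheses.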
CLI
Both have a CLI and clearly the Nuget.exe cli commands come from ruby heritage but have diverged. In Ruby the gem CLI plays a MUCH more central role than it does with Nuget. But both have the core capabilities of building, installing, uninstalling, publishing and querying package stores.
A case of cognitive dissonance and adjusting to gems
So for those with experience working on .net projects inside Visual Studio and managing dependencies with nuget, the story of dependency management is fairly streamlined. There are definitely some gotchas but let's look at the .net workflow that many are used to practicing from two points of view: the application creator and the application contributor. The first involves the initial setup of an app's dependencies and the second captures the experience of someone wanting to contribute to that application.
Setting up and managing dependencies
One begins by creating a "project" in visual studio that will be responsible for building the code. Then to add dependencies, one uses the package manager gui inside visual studio to find the package and add it to the project. To be clear, this can be done from the Package Manager Console in visual studio too. Installing these packages will also install their dependencies and resolve versions using the version constraints specified in each package's nuspec.