This post is not about using git for source control. It assumes that you are already doing that. What I am going to discuss is a version numbering strategy that leverages the git log. The benefit here is the guarantee that any change in the artifact (cookbook, environment, data bag) will result in a unique version number that will not conflict with other versions provided by your fellow teammates. It ensures that deciding on what version to stamp your change is one thing you don’t need to think about. I'll close the post demonstrating how this can be automated as a part of your build process.
The strategy explained
You can use the git log command to list all commits applied to a directory or individual file in your repository:
git log --pretty=oneline some_directory/
This will list all commits within the some_directory directory in a single line per commit that prints the sha1 and the commit comment. To make this a version, you would count these lines:
(git log --pretty=oneline some_directory/).count
git log --pretty=oneline some_directory/ | wc –l
If you are using semantic versioning to express version numbers, the commit count can be used to produce the build number – the third element of a version. So what about the major and minor numbers? One argument you can pass to git log is a starting ref from which to list commits. When you decide to increment the major or minor build number, you want to tag your repository with those numbers:
git tag 2.3
So now you want your build numbers to reset to 0 starting from the commit being tagged. You can do this by by telling git’s log command to list all commits from that tag forward like so:
git log 2.3.. --pretty=oneline some_directory/
If you were to run this just after tagging your repo with the major and minor versions, you would get 0 commits and thus the semantic version would be 2.3.0. So you will need to give thought to incrementing major and minor build number but the final element just happens upon commit.
Benefits and downsides to this strategy
Before getting into the details of applying this to chef artifacts, lets briefly examine some pros and cons of this technique.
Any change to a versionable artifact will result in a unique build number and if two builds have the same contents, their build numbers will be the same
This is crucial especially if you need to communicate with customers or fellow team members regarding features or bugs. This can help to remove confusion and ensure you are discussing the same build. If you are using a bug tracking system, you will want to include this version in the bug report so other team members reviewing the bug can checkout that version from source control or review all changes made since that version was committed.
Builds can be produced independently of a separate build server
Especially for solo/side projects where you may not even have a build server, this can help you create deterministic build numbers. However even if your project’s authoritative builds are produced by a system like Jenkins or TeamCity, individual team members can produce their own builds and produce the same build numbers generated by your build server (assuming the build server is using this strategy). Of course the number may vary slightly if other team members have produced commits and have not yet pushed to your shared remote or if the build is performed without pulling the latest changes. That’s why you also want to include the current sha1 somewhere in your artifact. More on that later.
Allows you to separately version different artifacts in your repository
Especially if your chef repository houses multiple cookbooks and you freeze your cookbook versions or use version constraints in your environments, this can be very important. I want to know that any change to a cookbook will increment the version and if the cookbook has remained unchanged, its version should be the same.
There will be gaps in your build numbers
You will likely commit several times between builds. So two subsequent builds with say 5 commits in between will increment the build number by 5. This should not be an issue as long as your team is aware of this. However, if you consider sequential build numbers important as a customer facing means to communicate change, this could be an issue. I have used this technique on a couple of fairly popular OSS projects and I never had an issue with users or contributors stumbling on this.
Build numbers can get big
If you rarely increment the major or minor build numbers, this will surely happen over time. I try to increment the minor number on any feature enhancing release in which case this is not usually an issue.
If build agents cannot talk to git
If you are using a centralized build server and if this is a collaborative project you certainly should be, you definitely want the builds produced by your build server to follow this same strategy. In order to do that, you want to configure your build server to delegate the git pull to the build agents. Otherwise, the git log commands will not work. The build agent must have an actual git repo with the .git folder available to see the commit counts.
Applying this to chef artifacts
First, what do I mean by “chef artifacts?” Don’t I really mean cookbooks? No. While cookbooks are certainly included and are the most important artifact to version, I also want to version environment and data_bag files. If I used roles, I would version those too. Regardless of the fact that cookbooks are the only entity that has first class versioning support on a chef server, I should be able to pin these artifacts to their specific git commit. Also, I may change environment or data_bag files several times before uploading to the server and I may want to choose a specific version to upload. If you add cookbook version constraints to your environments, any dependency change will result in a version bump to your environment and your environment version may serve as a top level repository version.
Stamping the artifact
So what gets stamped where? For cookbooks this is obvious. The version string in metadata.rb will have the generated version applied. For environment and data_bag files, we create a new json element in the document:
I add the version as an override attribute since you cannot add new top level keys to environment files. However for data_bag files I do insert the version as a top level json key.
Including the sha1
You may have noticed that the environment file displayed above has a sha1 attribute just below the version. Every commit in git is identified by a sha1 hash that uniquely identifies it. While the version number is a human readable form of expressing changes and can still be used to find the specific commit in git that produced the version, having the sha1 included with the version makes it much easier to track down the specific git commit. I can simply do a:
git checkout <sha1>
This will update my working directory to match all code exactly as it was when that version was commited. If you report problems with a cookbook and can give me this sha1, I can bring up its exact code in seconds.
As we have already seen, the sha1 is stored in a separate json attribute for environment and data_bag files. For cookbook metadata.rb file, I add this as a comment to the end of the file: