4.2. Version numbers (SemVer)

Version numbers, in conjunction with a version control system (VCS), are an essential step towards reproducible data processing (and, more generally, reproducible work with software). Technically speaking, every revision in a version control system has a unique identifier (in the case of git, a cryptographic hash value), but such an identifier is usually not human-readable. Hence, human-readable version numbers adhering to a clear scheme provide a means of addressing a particular version/state of a piece of software.

It is crucial to have a one-to-one (bijective) mapping between software version (revision in a VCS) and version number: one version number refers to exactly one state of the software, and one state of the software has one and only one version number.

While many different schemes for version numbers are in use, one that is particularly widespread (not only) in the Python world is “Semantic Versioning” (SemVer): https://semver.org/.

4.2.1. SemVer: Emphasising compatibility

What is special about “Semantic Versioning”? This version numbering scheme encodes semantic information, i.e. meaning, into version numbers, namely compatibility. Furthermore, it has a strict scheme that is rather easy to follow. Strictly speaking, there is no way to automatically check or enforce adherence to this scheme, as the semantics cannot be checked algorithmically. Therefore, the information conveyed by the version numbers still relies on the programmers applying the scheme according to its specification.

SemVer essentially has a tripartite version number, with the individual parts separated by a single dot: MAJOR.MINOR.PATCH. The basic rules are pretty simple as well:

  • If you have a change in your code that breaks backwards-compatibility, increment “MAJOR”.

    Generally, it is not a good idea to introduce breaking changes, hence use this with care and only if absolutely necessary.

  • If you add new functionality in a backwards-compatible manner, increment “MINOR”.

  • If you fix bugs and don’t add new functionality, increment “PATCH”.

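Furthermore, each individual number is incremented independently, i.e. there is no carry-over from one part to the next: the minor version following “1.9.0” is therefore “1.10.0”, and similarly, “0.99.0” is followed by “0.100.0”. Incrementing one part does, however, reset all parts to its right to zero: the next minor version after “1.9.3” is “1.10.0”, and the next major version is “2.0.0”.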
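The increment rules can be sketched in a few lines of Python. This is a minimal illustration assuming plain MAJOR.MINOR.PATCH strings without any suffix; the function name bump_version is hypothetical and not part of any library.

```python
def bump_version(version: str, part: str) -> str:
    """Return the next version, incrementing "major", "minor" or "patch".

    Assumes a plain MAJOR.MINOR.PATCH string; suffixes such as
    dev# or rc# are deliberately not handled in this sketch.
    """
    major, minor, patch = (int(number) for number in version.split("."))
    if part == "major":
        # Breaking change: reset MINOR and PATCH to zero
        return f"{major + 1}.0.0"
    if part == "minor":
        # New backwards-compatible functionality: reset PATCH to zero
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"Unknown version part: {part!r}")


print(bump_version("1.9.0", "minor"))   # 1.10.0
print(bump_version("0.99.0", "minor"))  # 0.100.0
print(bump_version("1.9.3", "major"))   # 2.0.0
```

Note that the parts are converted to integers before incrementing, so “9” becomes “10” rather than rolling over into the next part.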

There are a few additional conventions that come in handy when actively developing software: for development versions, add the suffix “dev#”, with “#” denoting an incrementing number. Similarly, you can add “rc#” for “release candidates”, with “#” again being the number of the candidate. Hence, a typical version number for a development version would be something like 0.1.0.dev54. (Strictly speaking, these suffixes follow the Python convention laid out in PEP 440 rather than the SemVer specification itself, which separates pre-release identifiers with a hyphen, as in 1.0.0-rc.1.)
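Version numbers need to be compared as numbers, not as strings: as strings, “1.10.0” would sort before “1.9.0”. The following minimal sorting key illustrates this and the effect of the suffixes. It assumes the format described here (MAJOR.MINOR.PATCH, optionally followed by “dev#” or “rc#”) and borrows the ordering dev < rc < final release from PEP 440; the names VERSION_PATTERN and version_key are hypothetical. For real projects, the Version class of the packaging library implements the complete PEP 440 rules.

```python
import re

# Assumed format: MAJOR.MINOR.PATCH, optionally followed by "devN" or "rcN"
# (with or without a preceding dot), e.g. "0.1.0.dev54" or "1.10.0rc1"
VERSION_PATTERN = re.compile(r"^(\d+)\.(\d+)\.(\d+)(?:\.?(dev|rc)(\d+))?$")

# Development versions sort before release candidates,
# which sort before the final release (as in PEP 440)
SUFFIX_RANK = {"dev": 0, "rc": 1, None: 2}


def version_key(version: str) -> tuple:
    """Return a tuple that sorts versions numerically, not as strings."""
    match = VERSION_PATTERN.match(version)
    if match is None:
        raise ValueError(f"Not a valid version string: {version!r}")
    major, minor, patch, suffix, number = match.groups()
    return (int(major), int(minor), int(patch),
            SUFFIX_RANK[suffix], int(number or 0))


versions = ["1.10.0", "0.1.0.dev54", "1.9.0", "1.10.0rc1", "0.1.0"]
print(sorted(versions, key=version_key))
# ['0.1.0.dev54', '0.1.0', '1.9.0', '1.10.0rc1', '1.10.0']
```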

Make sure never to use development versions in production, i.e. for actual data processing and analysis, as with development versions you can never be sure that there is a one-to-one relationship between the actual state of your software and its version number. Still, it is better to know that you have used a development version than not to have this (sometimes crucial) information at all. To avoid being tempted to use development versions for actual data analysis, release early and release often, in incremental steps. A well-documented workflow for releasing your software helps a lot; an automatic continuous integration/continuous delivery (CI/CD) workflow is even better. In any case, keep in mind that everything you can automate (even if only in the sense of having a sheet of paper with the individual steps clearly outlined) frees you from having to think about it, and greatly enhances consistency and reproducibility.
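One way to make the “no development versions for data analysis” rule hard to break by accident is a guard at the start of a processing script. The sketch below assumes that the version is already available as a string (how it is obtained, e.g. from the VERSION file, is left open) and that development versions carry the “dev#” suffix described above; the function require_release_version is hypothetical.

```python
import re

# Matches a trailing development suffix such as ".dev54" (assumed format)
DEV_SUFFIX = re.compile(r"dev\d*$")


def require_release_version(version: str) -> None:
    """Raise ValueError if *version* denotes a development version."""
    if DEV_SUFFIX.search(version.strip()):
        raise ValueError(
            f"Version {version.strip()} is a development version; "
            "release the software before processing data with it."
        )


require_release_version("0.2.0")  # passes silently
try:
    require_release_version("0.1.0.dev54")
except ValueError as error:
    print(error)
```

Failing loudly here is a deliberate design choice: a missing release is noticed before any results are produced, not afterwards.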

4.2.2. Where to place the version number?

Simply in a file VERSION in your project root directory, containing nothing but the plain version number.

Make sure to store the version number in one single place only. If there are reasons to have the version number in more than one place, define one place as the primary store for the version number, and make all other places depend on this original information by having them updated automatically.
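As an illustration of such an automatic update, the following sketch regenerates a small Python module holding a __version__ variable from the VERSION file. The target file name (e.g. _version.py) and the function sync_version are assumptions for illustration; such a step could be run as part of the build or release workflow.

```python
from pathlib import Path


def sync_version(version_file: Path, target_file: Path) -> str:
    """Copy the version from the primary VERSION file into target_file.

    target_file is regenerated completely on each call and hence
    should never be edited by hand.
    """
    version = version_file.read_text(encoding="utf-8").strip()
    target_file.write_text(
        "# Generated automatically from VERSION, do not edit.\n"
        f'__version__ = "{version}"\n',
        encoding="utf-8",
    )
    return version
```

A call such as sync_version(Path("VERSION"), Path("<package name>/_version.py")) keeps the VERSION file as the single primary store, while the generated module merely mirrors it.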

For Python packages using a setup.py file in their root directory, you may enforce reading the version number from the VERSION file by adding the following code to your setup.py file:

import os
import setuptools


def read(fname):
    with open(os.path.join(os.path.dirname(__file__), fname)) as f:
        content = f.read()
    return content


setuptools.setup(
    name="<package name>",
    version=read('VERSION').strip(),
    # ...
)

Similarly, for the Sphinx configuration of your project documentation, residing in the conf.py file in the docs directory of your Python package, add something similar to the following code:

import os

# The VERSION file resides in the project root, one level above docs/
version_path = os.path.join(os.path.dirname(__file__), '..', 'VERSION')
with open(version_path) as version_file:
    release_ = version_file.read().strip()

# The short X.Y version
version = ".".join(release_.split('.')[0:2])
# The full version, including alpha/beta/rc tags
release = release_

4.2.3. Auto-incrementing version numbers

Having to increment the version number manually with each commit simply does not work in practice. Hence, we need a way to increment the version number automatically. Thankfully, git provides a means of hooking such a mechanism into the usual git workflow, namely git hooks. So, given a script taking care of the version number, we can simply hook it into git.

A bash script taking care of the version number stored in the file VERSION may look as follows:

#!/bin/bash
#
# Increment version number (last part of version string)
# read from a file containing only the version number.
#
# Assuming a version number scheme following SemVer
# consisting of
#
#	MAJOR.MINOR.PATCH
#
# in which case "PATCH" is incremented,
# or alternatively
#
#	MAJOR.MINOR.PATCH.dev#
#
# where the number following "dev" is incremented.
#
# If the internal variable CHECKGIT is set to "true", the file
# containing the version string will be checked for manual changes
# and if there are any, the script will exit immediately.
#
# Copyright (c) 2017-21, Till Biskup
# 2021-04-18

# Some configuration
VERSIONFILE="VERSION"
CHECKGIT=true # set to "true" to check for changes via git diff
ONLYONMASTER=true

CURRENTBRANCH=$(git rev-parse --abbrev-ref HEAD)

# Internal functions
function join_by { local IFS="$1"; shift; echo "$*"; }

if [[ ${ONLYONMASTER} == true && ${CURRENTBRANCH} != 'master' ]]
then
  echo "Not on master branch, hence nothing to do."
  exit
fi

if [[ ${CHECKGIT} == true && $(git diff --name-only ${VERSIONFILE}) ]]
then
    echo "File $VERSIONFILE has been changed already..."
    exit
fi


# Read version from file
read -r oldversionstring <<< "$(cat "${VERSIONFILE}")"

# Split version string
IFS='.' read -r -a versionArray <<< "$oldversionstring"

lastPart=${versionArray[${#versionArray[@]}-1]}

# Check whether we need to increment a development version
# Otherwise just increment $lastPart
if [[ ${lastPart} =~ .*dev.* ]]
then
    IFS='dev' read -r -a splitLastPart <<< "$lastPart"
    revision=${splitLastPart[${#splitLastPart[@]}-1]}
    ((revision++))
    lastPart=dev${revision}
else
    ((lastPart++))
fi

# Reassign last part of versionArray
versionArray[${#versionArray[@]}-1]=${lastPart}

# Concatenate new version string
newVersionString=$(join_by . "${versionArray[@]}")

# Write new version string to file
echo "${newVersionString}" > ${VERSIONFILE}

if [[ ${CHECKGIT} == true ]]
then
    git add ${VERSIONFILE}
    echo "Version in version file upped"
fi

Admittedly, bash is not the easiest scripting language to read. But this (well-tried) script does a good job, is reasonably well documented, and does what its comments say it does. In short:

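  • Check whether we are on the master branch, and if not, exit.

  • Check whether the version file has already been changed (i.e., shows up in git diff), and if so, exit.

  • Check whether we need to increment a development version, and increment either the number following “dev” or the last part (“PATCH”) of the version number.

  • Write the new version number to the version file and (if CHECKGIT is true) stage the file with git add.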
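For readers less fluent in bash, the core increment step of the script can be sketched in Python. This is a simplified re-implementation under the same assumptions about the version format, not a replacement for the script: it omits the git-related checks and the staging, and the function names increment_version and increment_version_file are hypothetical.

```python
from pathlib import Path


def increment_version(version: str) -> str:
    """Increment the last part of a version string.

    "MAJOR.MINOR.PATCH"      -> PATCH is incremented,
    "MAJOR.MINOR.PATCH.devN" -> N is incremented.
    """
    parts = version.strip().split(".")
    last = parts[-1]
    if "dev" in last:
        # As in the bash script: increment the number following "dev"
        revision = int(last.split("dev")[-1])
        parts[-1] = f"dev{revision + 1}"
    else:
        parts[-1] = str(int(last) + 1)
    return ".".join(parts)


def increment_version_file(path: Path) -> str:
    """Read the version from *path*, increment it, and write it back."""
    new_version = increment_version(path.read_text(encoding="utf-8"))
    path.write_text(new_version + "\n", encoding="utf-8")
    return new_version


print(increment_version("0.1.2"))        # 0.1.3
print(increment_version("0.1.0.dev54"))  # 0.1.0.dev55
```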

So how to hook this into our git workflow? Here are the steps:

  • Save the above bash script to a file in your project, preferably as bin/incrementVersion.sh, and make the file executable (chmod +x bin/incrementVersion.sh).

  • Create a file pre-commit in the .git/hooks/ directory of your repository, if it does not already exist.

  • Add these lines to the pre-commit file:

    #!/bin/sh
    bash bin/incrementVersion.sh
    
  • Make the pre-commit file executable (chmod +x .git/hooks/pre-commit).

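Make sure to repeat the steps concerning the pre-commit hook for every local copy of your repository on the different computers you are using, as the contents of the .git/hooks/ directory are not under version control and hence not copied when cloning. Furthermore, document these steps somewhere in your project’s documentation, preferably in the section for developers.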
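Since the hook has to be set up again for every clone, a small helper can save some typing. The following sketch assumes the paths used above (bin/incrementVersion.sh called from .git/hooks/pre-commit) and a POSIX system, as execute permissions work differently on Windows; install_pre_commit_hook is a hypothetical name. An existing, non-empty hook is extended rather than overwritten.

```python
import stat
from pathlib import Path

HOOK_LINE = "bash bin/incrementVersion.sh"


def install_pre_commit_hook(repo_root: Path) -> Path:
    """Make .git/hooks/pre-commit call bin/incrementVersion.sh.

    A non-empty existing hook is extended, never overwritten, and the
    call is only added if it is not present yet.
    """
    hook = repo_root / ".git" / "hooks" / "pre-commit"
    content = hook.read_text(encoding="utf-8") if hook.exists() else ""
    if not content.strip():
        # No (or an empty) hook yet: create it including the shebang line
        content = "#!/bin/sh\n" + HOOK_LINE + "\n"
    elif HOOK_LINE not in content:
        # Append the call to the existing hook
        if not content.endswith("\n"):
            content += "\n"
        content += HOOK_LINE + "\n"
    hook.parent.mkdir(parents=True, exist_ok=True)
    hook.write_text(content, encoding="utf-8")
    # Equivalent of "chmod +x": add execute permissions
    mode = hook.stat().st_mode
    hook.chmod(mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return hook
```

Calling install_pre_commit_hook(Path.cwd()) from the repository root would then replace the manual hook steps; running it twice does no harm, as the call is added only once.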