Helix Engineering: The art of configuration, part 1

September 27, 2018

This is part of a blog series written by the Helix Engineering staff. To follow along with the team as they publish, keep tabs on the Helix Engineering blog category.

Software is made of code and configuration. The code typically consists of the algorithms that process inputs and data, and the configuration controls how that code behaves. For example, the configuration of a microservice may include the endpoints of related microservices, timeouts, and various feature flags. Strictly speaking, separate configuration is not needed: you can always hard-code the values and create a new version of the code whenever you need to change them. For many good reasons that I will not get into here, developers usually prefer to separate their code from its configuration. In this blog post we’ll explore the following:

• Common configuration options
• Configuration in the world of cloud-native, distributed systems
• Configuration of microservices deployed as containers via a CI/CD pipeline
• AWS Parameter Store as a remote configuration service

Some related topics that I’ll leave for another day are:

• Resource provisioning
• Secret management
• Configuration versioning

Command-line arguments

Command-line arguments are a staple of program configuration. Virtually every programming language supports them, and there are many good libraries for defining and parsing them. They are particularly useful when you want to run the same program several times in a row with different arguments. The AWS CLI (command-line interface) is a great example of a program that takes full advantage of command-line arguments. It is a Python program that lets you access all the AWS APIs from the command line, organized into multiple levels of commands and sub-commands.

The downside of command-line arguments is that you have to provide them every time you run the command. Some programs require a lot of information to run, and providing it all on the command line can be tedious. There are several other issues you may run into, such as:

• Multi-line arguments
• Arguments containing spaces (the shell splits them into multiple arguments unless they are quoted)
• Arguments containing special characters that confuse the shell
• Too many arguments
• Difficult to provide hierarchical data such as JSON or YAML (or TOML)
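
To make the command and sub-command idea concrete, here is a minimal sketch using Python's standard argparse module. The tool name (mytool), the sub-commands, and the options are all hypothetical and exist only for illustration; this is not how the AWS CLI itself is implemented.

```python
import argparse


def build_parser():
    # "mytool", its sub-commands, and its options are hypothetical names
    # chosen for illustration only.
    parser = argparse.ArgumentParser(prog="mytool")
    parser.add_argument("--region", default="us-east-1",
                        help="region to operate on (illustrative default)")

    # One level of sub-commands: "mytool upload ..." and "mytool list".
    subparsers = parser.add_subparsers(dest="command", required=True)

    upload = subparsers.add_parser("upload", help="upload a file")
    upload.add_argument("path")
    upload.add_argument("--timeout", type=int, default=30)

    subparsers.add_parser("list", help="list uploaded files")
    return parser


# Passing an explicit list stands in for what the shell would hand the
# program. Calling parse_args() with no arguments reads sys.argv instead.
args = build_parser().parse_args(
    ["--region", "eu-west-1", "upload", "my file.txt", "--timeout", "5"]
)
```

Note that "my file.txt" arrives as a single argument here only because it is already one list element. On a real command line you would have to quote it, which is exactly the kind of friction listed above.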

Environment variables

Environment variables are variables you define in the shell before executing your program. They are available to your program, and you can use them instead of command-line arguments. Environment variables can save you a lot of typing if you often need to pass the same information to your program. Using the AWS CLI as an example again, most commands operate on a specific region. If you usually operate on the same region, then instead of passing it as a command-line argument each time, you can set an environment variable once (or even put it in your profile) and run as many commands as you want against that region:

export AWS_DEFAULT_REGION=us-east-1

Environment variables have their own issues:

• Clutter a global resource
• Risk of conflicting with environment variables of other programs
• Need to remember to set every time you start a new session (or have in profile)
• Limited size (different on different operating systems)
• Difficult to provide hierarchical data such as JSON or YAML (or TOML)
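
On the program side, reading an environment variable usually comes down to a lookup with a fallback. The sketch below uses Python's os.environ. AWS_DEFAULT_REGION is the variable mentioned above; MYTOOL_TIMEOUT and both fallback values are hypothetical and only serve the example.

```python
import os


def get_region(environ=os.environ):
    # AWS_DEFAULT_REGION comes from the text above; the "us-west-2"
    # fallback is an arbitrary value chosen for this sketch.
    return environ.get("AWS_DEFAULT_REGION", "us-west-2")


def get_timeout(environ=os.environ):
    # MYTOOL_TIMEOUT is a hypothetical variable. Environment values are
    # always strings, so anything numeric has to be converted and validated.
    raw = environ.get("MYTOOL_TIMEOUT", "30")
    try:
        return int(raw)
    except ValueError:
        raise ValueError(f"MYTOOL_TIMEOUT must be an integer, got {raw!r}")
```

Accepting the environment as a parameter (defaulting to os.environ) is a design choice for this sketch: it keeps the functions easy to test without touching the real, global environment, which is one of the issues listed above.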

Configuration files

Configuration files provide another flexible way to configure programs. Configuration files are great for managing a lot of structured information. A default configuration file is often stored in source control along with the code, so the history of configuration is accessible too. Configuration files are usually text files formatted as INI, XML, YAML, JSON or TOML. The AWS CLI uses a couple of configuration files, typically stored in:

• ~/.aws/credentials
• ~/.aws/config

Those files contain user credentials and profiles. Configuration files are typically large and contain a lot of information, so you can’t expect users to type in a new configuration file every time they run the program. As a result, configuration files are often shipped with the program, and savvy users can tweak them. But the fact that the configuration file must accompany the program makes the boundary between code and configuration a little fuzzy. This is especially prominent in dynamic languages like Python or Ruby, where the configuration file may itself be a Python or Ruby file that contains a bunch of assignments.
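
As a rough illustration of reading structured configuration, the sketch below parses a small INI-style document with Python's standard configparser module. The sample text is made up for this example and only loosely resembles a profile-based config file; it is not a complete or authoritative description of any real file format.

```python
import configparser

# Illustrative INI content with two sections. The section and key names
# are assumptions made for this sketch.
SAMPLE = """
[default]
region = us-east-1
output = json

[profile staging]
region = eu-west-1
"""


def load_profiles(text):
    # Parse the INI text and return a plain dict of {section: {key: value}}.
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {name: dict(parser[name]) for name in parser.sections()}


profiles = load_profiles(SAMPLE)
```

In a real program you would more likely call parser.read() with a file path. Reading from a string keeps the example self-contained.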

Hybrid approaches

These approaches are not mutually exclusive. You can use some or all of them at the same time. It is very convenient, for example, to have the AWS region specified in an environment variable if you usually operate in a single region, but still be able to override it with a command-line argument when you need to perform a one-time operation in another region. The AWS CLI takes this approach to the limit: it supports multiple configuration sources and defines a search order with clear rules about which source takes precedence. (See https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-getting-started.html#config-settings-and-precedence.)
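
A layered lookup like this can be sketched in a few lines. The function below checks a command-line argument first, then an environment variable, then a configuration file, and finally falls back to a hard-coded default. It is a simplified illustration of the idea, not the AWS CLI's actual precedence rules; the function name, the --region flag on a hypothetical mytool, and the [default] section layout are all assumptions made for this example.

```python
import argparse
import configparser


def resolve_region(argv, environ, config_text, default="us-east-1"):
    # 1. A command-line argument wins if it was given.
    parser = argparse.ArgumentParser(prog="mytool")  # hypothetical tool
    parser.add_argument("--region")
    args, _unknown = parser.parse_known_args(argv)
    if args.region:
        return args.region

    # 2. Otherwise fall back to the environment variable.
    if environ.get("AWS_DEFAULT_REGION"):
        return environ["AWS_DEFAULT_REGION"]

    # 3. Then to the configuration file (passed in as text for this sketch).
    config = configparser.ConfigParser()
    config.read_string(config_text)
    if config.has_option("default", "region"):
        return config.get("default", "region")

    # 4. Finally, use the hard-coded default.
    return default
```

For example, calling resolve_region(["--region", "ap-south-1"], {"AWS_DEFAULT_REGION": "us-east-1"}, "") returns the command-line value, while the same call with an empty argument list returns the value from the environment.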

Stay tuned for more

That’s it for part one—but check back soon, because our next chapter will cover the configuration of cloud applications and using the AWS Parameter Store as a remote configuration service.