Helix Engineering: Our approach to microservices, part 3

July 3, 2018

In Part 1 of our series, we touched on the benefits of microservices, why Helix uses them, and how we’ve progressed down that path. Last week, we looked at authentication, authorization, and APIs. In the conclusion to our three-part series on microservices, we’ll run through the CI/CD pipeline, testing, and where we’re headed next.

The CI/CD pipeline

Helix uses GoCD for continuous integration and deployment. Each service and application lives in its own Git repo on GitHub. Developers work on feature branches, and whenever they push changes, GoCD agents test those changes in a test environment. We have homegrown scripts to version and tag every build and to enforce uniform behavior when rebasing. When a PR is reviewed, accepted, and merged into master, it is tested again; the changes then propagate automatically to the staging environment, where they are tested once more. Finally, the changes move to production.
As part of the build process, each service is packaged into a Docker image that is pushed to the Amazon Elastic Container Registry (ECR). The same image is pulled into each environment and deployed on the Amazon Elastic Container Service (ECS). Another important aspect of deployment is provisioning AWS resources such as instances, IAM roles, firewall rules, and load balancers. We use Terraform to specify all of these resources, along with more homegrown scripts to ensure we follow a uniform process.

Testing

At Helix, we take testing very seriously. There is always a question of how much testing you need, and there is no single answer: different parts of the system may be more or less critical. We practice multi-tier testing that includes both unit tests and end-to-end tests for each service. In a microservices application, a service often needs to talk to other services, and some of these may be third-party services that do not have a test environment. We have several solutions, such as:

Mocking dependencies
Creating test data in dedicated test environments
Hitting endpoints in the staging environment for read-only tests

Our primary tool for testing Go code is the Ginkgo testing framework. For front-end code, we primarily use Jest and Nightwatch, with Flow type checks for good measure and a smattering of Selenium to drive the browser.

Error reporting, logging, and instrumentation

Any non-trivial distributed system must keep track of what’s going on inside. At Helix, we use Rollbar for central error reporting, Sumo Logic for central logging, and New Relic for collecting metrics.

Troubleshooting

Helix uses PagerDuty to stay on top of issues. When something goes wrong, the on-call engineer gets notified and can start diagnosing the problem. For infrastructure-related issues, we have a separate rotation of InfraOps engineers. In a microservices environment, it is not always easy to pinpoint the root cause, because data flows between many services and a problem at the beginning of the chain might only be discovered far downstream. This is where robust error messages and logging come into play.

Challenges

The Helix platform is a work in progress. New business needs and operational improvements require constant work from the Helix Engineering team. We are very proud of what we have built, and excited to further develop and refine it. Here are some of the upcoming challenges we will be facing:

Balancing developer productivity with system stability and security
New service provisioning and setup
Configuration
Cross-service testing
Cross-service troubleshooting
AWS cost reduction
AWS limits and quotas
Introducing and migrating to new technologies, tools and processes

Looking ahead

In the future, there are many technologies and capabilities we want to evaluate, incorporate into the Helix platform, or use more frequently. Here is a partial list:

gRPC
Serverless
GraphQL
Using the new React context API
Cross-service testing
Performance and load testing
Dynamic configuration
Queue-based service interactions
Fargate and Kubernetes
Blue-green and canary deployments

Stay tuned for more

So, that was a whirlwind high-level tour of the Helix microservice-oriented architecture. The story is still unfolding! In future blog posts, we’ll drill in deeper and examine the cool stuff we build here. Stay tuned!