Dynamic Configuration at Mixpanel

Move fast and (un)break things

Why Dynamic Configuration?

Your code is often powered by many constants and configurations. Imagine passing this static configuration to your code:

Config {
  whitelisted_customers: [customer_1, customer_2],
  feature_1_turn_on_percentage: 0.1,
}

You wrote your code with this static config and deployed it to 100 machines. All is good in the world until you need to change the config. For us, these changes were frequently motivated by one of the following reasons:

  • Adding a customer to a new alpha feature
  • Removing a customer from said alpha to debug query failures
  • Rate-limiting a particular client to avoid resource starvation
  • Carefully rolling out a feature to a stateful system
  • Changing timeouts, retry counts, rate limits, etc.

In the static configuration world, you make the config change in code, rebuild your service binary, and redeploy. This can be time-consuming, particularly for systems with large numbers of replicas or stateful systems that have elaborate graceful shutdown procedures. Ideally, you could update these configs “live” (without requiring a restart), for much faster iteration speed.

A dynamic configuration system allows developers to update configs without redeploying their services. Companies like Facebook and Twitter have built pretty sophisticated systems for this very purpose. At Mixpanel, we decided to write our own lightweight system for managing dynamic configuration in our Golang services, built on top of Kubernetes.

Requirements for dynamic configs

  1. Fast: When a change to configuration is made, all affected deployed services should see the effect in under a minute.
  2. Flexible: Configs should be stored in a format that allows for arbitrary values.
  3. Safe: It should be possible to write validation on the configs.
  4. Version Controlled: As a natural extension, it should be easy to revert to an older version of a config.
  5. Easy-to-use: There should be easy-to-use SDKs for services that make fetching dynamic configs feel like fetching regular config values.

How Mixpanel Manages Dynamic Configuration

Federation

With these design principles in mind, we set out to quickly build something that covers 90% of our use cases. Mixpanel runs on GKE, and most of our services are written in Go. Kubernetes already has a primitive called configmaps, which allows developers to use Kube’s SDKs to create, update, and apply configurations to services or deployments.

Mounting a configmap simply places the configs in files (the key being the filename, the value being the contents) in the pod’s filesystem. Changes to the configmap are applied just like changes to any other kube object. This reliably solves the problem of shipping configurations to pods in a consistent and near-real-time way.
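For illustration, here is roughly the relevant fragment of a pod spec that mounts a configmap as files (a minimal sketch; the names configs-volume, query-configs, and the mount path are illustrative, not our actual manifests):

# Fragment of a pod spec that mounts a configmap as files.
# All names here are illustrative.
spec:
  containers:
    - name: query
      volumeMounts:
        - name: configs-volume
          mountPath: /etc/configs  # each configmap key becomes a file here
  volumes:
    - name: configs-volume
      configMap:
        name: query-configs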

Version Control and Validation

Our configurations are written in YAML files, as described in a section below. These files are checked into our git repo, and changes to them follow the same dev workflow as changes to code. This way, all our configurations are version controlled and go through a review + CI process.
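For example, a CI-time check can parse configs.json and enforce basic invariants. Here is a minimal sketch in Go (the invariants shown are illustrative, not our full validation):

// config_validation_test.go: a sketch of a CI-time sanity check.
package configs

import (
	"encoding/json"
	"os"
	"testing"
)

func TestConfigsAreValid(t *testing.T) {
	raw, err := os.ReadFile("configs.json")
	if err != nil {
		t.Fatalf("reading configs.json: %v", err)
	}
	var entries []struct {
		Key   string          `json:"key"`
		Value json.RawMessage `json:"value"`
	}
	if err := json.Unmarshal(raw, &entries); err != nil {
		t.Fatalf("configs.json is not valid JSON: %v", err)
	}
	seen := map[string]bool{}
	for _, e := range entries {
		if e.Key == "" {
			t.Error("config entry with an empty key")
		}
		if seen[e.Key] {
			t.Errorf("duplicate config key %q", e.Key)
		}
		seen[e.Key] = true
	}
}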

Config Scope

Initially, we decided to group configurations for the same service together. For example, for our querying service, we would have a single scope “query” under which all the configurations used in that service live.

Later, we decoupled configs and services (though they are often closely related) for the following reasons:

  1. Some configurations (e.g., whitelists) supported a large feature that spanned multiple services.
  2. Two different teams working in a single service wanted to update configs independently, without the mental overhead of reasoning through each other’s configs.

Today, most of the configurations are scoped based on the product vertical as opposed to individual services.

Configuration Object Layout

Here is an example of what our configmap ends up looking like:

Data
====
configs.json:
----
[
  {
    "key": "feature_enabled_customers",
    "value": {
      "123": {},
      "456": {}
    }
  },
  {
    "key": "scaling_percentage",
    "value": 1
  },
  {
    "key": "timeout_secs",
    "value": 5
  }
]

Mounting the configmap above creates a file called configs.json with the JSON contents inside it. Notice that each config entry has the shape:

{
  "key": "key_name",
  "value": <JSON object, string, bool, number, ...>
}

SDK

At the moment, we have just one SDK, written in Go. The SDK does the following:

  1. Load the contents of the configs.json file into memory whenever the file changes. This is achieved using fsnotify (a sketch of the watch loop follows below).
  2. Provide an interface to fetch a configuration by its name. At the time of writing, this is what the SDK interface looks like:
(Figure: the ConfigManager interface)
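In rough shape, it resembles the sketch below (the method names here are hypothetical; the real definition lives in the repo linked next):

// A sketch of the config-fetching interface. Method names are
// illustrative; see the open-source repo for the real definition.
type ConfigManager interface {
	// GetBool returns the bool stored under key, or defaultValue
	// if the key is missing or has the wrong type.
	GetBool(key string, defaultValue bool) bool

	// GetFloat64 does the same for numeric values.
	GetFloat64(key string, defaultValue float64) float64

	// GetStruct unmarshals a JSON-object value into out.
	GetStruct(key string, out interface{}) error
}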

This SDK is open-sourced! You can find it at https://github.com/mixpanel/configmanager
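The watch in step 1 can be built on fsnotify along these lines (a simplified sketch; note that Kubernetes updates mounted configmaps via atomic symlink swaps, so a production watcher also needs to handle remove and create events):

// Sketch of reloading configs.json on change using fsnotify.
package config

import (
	"log"

	"github.com/fsnotify/fsnotify"
)

func watchConfigs(path string, reload func()) error {
	watcher, err := fsnotify.NewWatcher()
	if err != nil {
		return err
	}
	if err := watcher.Add(path); err != nil {
		return err
	}
	go func() {
		for {
			select {
			case event := <-watcher.Events:
				// Reload on writes or on the file being re-created.
				if event.Op&(fsnotify.Write|fsnotify.Create) != 0 {
					reload()
				}
			case err := <-watcher.Errors:
				log.Printf("config watch error: %v", err)
			}
		}
	}()
	return nil
}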

Changing Configuration

Kubernetes allows developers to describe resources (services, jobs, configs) as declarative YAML. At our scale of services, we’ve found hand-writing this YAML to be duplicative and error-prone. Instead, we write Jsonnet, which is then compiled to generate YAML files. Both the Jsonnet and YAML files are checked into our repo.

To make config changes declarative and easy for our developers, we wrote the following Jsonnet library.

(Figure: configmap lib in Jsonnet)
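In spirit, it is a small helper along these lines (a hypothetical sketch, not our exact library):

// configmap.libsonnet (sketch): wraps a list of {key, value} configs
// into a Kubernetes configmap with a single configs.json entry.
{
  configMap(name, configs):: {
    apiVersion: 'v1',
    kind: 'ConfigMap',
    metadata: { name: name },
    data: {
      'configs.json': std.manifestJsonEx(configs, '  '),
    },
  },
}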

Given this kind of syntactic sugar, we can write our sample config:

(Figure: sample config in Jsonnet)
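With the hypothetical helper above, the sample would read roughly:

// Sample config (sketch), using the helper above.
local cm = import 'configmap.libsonnet';

cm.configMap('example-configs', [
  { key: 'feature_enabled_customers', value: { '123': {}, '456': {} } },
  { key: 'scaling_percentage', value: 1 },
  { key: 'timeout_secs', value: 5 },
])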

Compiling this produces the configmap example shown above.

In the Wild

Configmanager (which is what we creatively call this library) is used widely across our backend services at Mixpanel. Here is just a small sampling of its uses:

  1. Storage: A pause/resume switch for CPU-intensive features such as fulfilling GDPR deletion requests.
  2. Ingestion: Rolling out critical features such as event de-duplication and identity management on our ingestion layer. Configmanager was used to add customers to (and remove them from) the whitelist and gradually ramp up traffic.
  3. Query: Invalidating caches for a single customer or globally in case of caching format changes or bugs.
  4. Everywhere: In various parts of our stack, it is used to control the amount of parallelism available to handle our workload. Another common use case is changing rate limits.

To debug configmanager itself, we have integrated it with our metrics reporting and logging systems to report various kinds of errors. It also has debugging HTTP endpoints that return the in-memory config state or force a configuration reload. We also wrote a dummy configmanager implementation with the same interface for use in unit tests.
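Wiring those endpoints up might look like the following sketch (the paths and the Snapshot/Reload methods are illustrative, not the library’s actual API):

// Sketch of the debug HTTP endpoints.
package config

import (
	"encoding/json"
	"net/http"
)

// debugTarget is a hypothetical view of the manager used here.
type debugTarget interface {
	Snapshot() map[string]interface{} // current in-memory config state
	Reload()                          // force a re-read of configs.json
}

func registerDebugHandlers(mux *http.ServeMux, m debugTarget) {
	mux.HandleFunc("/debug/configs", func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		json.NewEncoder(w).Encode(m.Snapshot())
	})
	mux.HandleFunc("/debug/configs/reload", func(w http.ResponseWriter, r *http.Request) {
		m.Reload()
		w.WriteHeader(http.StatusNoContent)
	})
}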

Future work

We are pretty happy with the flexibility and developer speedup provided by the configmanager. That said, there is room for improvement; here’s what we plan on changing in the future:

Adding a UI

Currently, developers use their favorite editor and the Kubernetes CLI to view and edit configs. Though these are simple and get the job done, for gigantic config files it may be easier to work with a UI than with a single massive JSON file. You can edit kube configmaps in the Google Cloud console, but it is not the best experience.

More powerful configuration language

Currently, configs are simple key-value pairs. Any additional logic must be encoded by the client. Here is an example with a hypothetical config key route_to_queue:

if {customer: c1, event: alias_call} route to high priority queue
if {customer: c1, event: track_call, lib_version: v2.1} route to fail queue
...

To do this kind of switch-statement-style routing today, you have to store a JSON object as the config value and write client code that evaluates it. Alternatively, you could express config values themselves in a flexible grammar and encode the switching logic in the configs. This would also mean we don’t have to write one-off methods like IsCustomerWhitelisted.
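For concreteness, here is a sketch of the “client evaluates the JSON” approach (the rule shape and field names are made up for illustration):

// Sketch of client-side evaluation of routing rules stored as JSON.
package routing

import "encoding/json"

// RouteRule is a hypothetical rule shape; empty fields match anything.
type RouteRule struct {
	Customer   string `json:"customer,omitempty"`
	Event      string `json:"event,omitempty"`
	LibVersion string `json:"lib_version,omitempty"`
	Queue      string `json:"queue"`
}

// routeFor returns the queue of the first matching rule,
// falling back to defaultQueue.
func routeFor(rulesJSON []byte, customer, event, libVersion, defaultQueue string) string {
	var rules []RouteRule
	if err := json.Unmarshal(rulesJSON, &rules); err != nil {
		return defaultQueue
	}
	for _, r := range rules {
		if (r.Customer == "" || r.Customer == customer) &&
			(r.Event == "" || r.Event == event) &&
			(r.LibVersion == "" || r.LibVersion == libVersion) {
			return r.Queue
		}
	}
	return defaultQueue
}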

In the past, I worked on the dynamic config team at Uber, where we modeled configuration values so that every config key had a default value and a list of exceptions, each consisting of a rule written in a PEG grammar and an associated value. This proved quite flexible and useful for a large number of teams, with the downside that a large config with many exceptions was harder to reason about.

Here is a post I wrote about that. At some point, if there is enough need for it at Mixpanel, we may extend configmanager in this way.

References

  1. https://medium.com/google-cloud/kubernetes-configmaps-and-secrets-68d061f7ab5b
  2. https://jsonnet.org/
  3. https://github.com/mixpanel/configmanager
  4. https://medium.com/@nikunjyadav/generic-rules-engine-in-golang-using-antlr-d30a0d0bb565
