(This is part II of a two part series of posts, you can find part I here)
One of the most powerful parts of the Mixpanel query language is the
any operator, which allows you to select events or profiles based on the value of any element in a list. The
any operator is just a bit more magical than the other operators in our query language, both in its power and in its implementation.
We’ve already written about building the Mixpanel expression language – the language we built inside of the Mixpanel data store to allow you to query and select data for reports. The model we built in the last post can do a lot of work, but parsing and interpreting the
any query takes the language to another level, both metaphorically and syntactically.
Like the basic expression language post, we’ll be using Python and JSON to talk about procedures and data, but won’t assume you’re a serious Pythonista. It will also be worth taking another look at the simple expression language post, since this post elaborates that model.
(This is part one of a two part series, you can find part II here)
The Mixpanel reporting API is built around a custom expression language that customers (and our main reporting application) can use to slice and dice their data. The expression language is a simple tool that allows you to ask powerful and complex questions and quickly get the answers you need.
The actual Mixpanel expression engine is part of a complex, heavily optimized C program, but the core principles are simple. I’d like to build a model of how the expression engine works, in part to illustrate how simple those core principles are, and in part to use for exploring how some of the optimizations work.
This post will use a lot of Python to express common ideas about data and programs. Familiarity with Python should not be required to enjoy and learn from the text, but familiarity with a programming language that has string-keyed hash tables, maps, or dictionaries, or familiarity with the JSON data model will help a lot.
We recommend setting up work queues and batching messages to our customers as an approach for scaling upward server-side Mixpanel implementations, but we use the same approach under the hood in our Android client library to scale downward to fit the constraints–battery power and CPU–of a mobile phone.
The basic technique, where work to be done is discovered in one part of your application and then stored to be executed in another, is a simple but broadly useful; both for scaling up in your big server farm and scaling down for your customer’s smartphones.