A core component of Mixpanel is the server that sits at http://api.mixpanel.com. This server is the entry point for all data that comes into the system – it’s hit every time an event is sent from a browser, phone, or backend server. Since it handles traffic from all of our customers’ customers, it must manage thousands of requests per second, reliably. It implements an interface we’ve spec’d out here, and essentially decodes the requests, cleans them up, and then puts them on a queue for further processing.
Because of these performance requirements, we originally wrote the server in Erlang (with MochiWeb) two years ago. After two years of iteration, the code has become difficult to maintain. No one on our team is an Erlang expert, and we have had trouble debugging downtime and performance problems. So, we decided to rewrite it in Python, the de-facto language at Mixpanel.
Given how crucial this service is to our product, you can imagine my surprise when I found out that this would be my first project as an intern on the backend team. I really enjoy working on scaling problems, and the cool thing about a startup like Mixpanel is that I got to dive into one immediately. Our backend architecture is modular, so as long my service implemented the specification, I didn’t have to worry about ramping up on other Mixpanel infrastructure.
I’m not going to spend much time describing what gevent is. I think the one sentence overview from its web site does a better job than I could:
gevent is a coroutine-based Python networking library that uses greenlet to provide a high-level synchronous API on top of libevent event loop.
What follows are my experiences using gevent for an internal project here at Mixpanel. I even whipped up some performance numbers specifically for this post!
At Mixpanel performance is particularly important to us and as we begin to scale our data volume to support billions of actions. We’ve found ourselves thinking about how to solve problems better.
We’re currently writing a feature that is going require considerable scale and performance but in order to do it we had to think about how to do it in a time for our users to be happy. Unfortunately, Python is too slow for some types of operations we wish to do where we can get an order of a magnitude of performance out of something lower level like C.
So imagine: You want to stick to Python because it’s so fast to develop in but need the performance of C/C++. Let me introduce you to C extensions in Python.
If you’ve ever used something like cJSON in the past, then you’ve already installed something like this before–it’s likely a lot modules you import in Python are built in C and not just pure-python.