At Mixpanel performance is particularly important to us and as we begin to scale our data volume to support billions of actions. We’ve found ourselves thinking about how to solve problems better.
We’re currently writing a feature that is going require considerable scale and performance but in order to do it we had to think about how to do it in a time for our users to be happy. Unfortunately, Python is too slow for some types of operations we wish to do where we can get an order of a magnitude of performance out of something lower level like C.
So imagine: You want to stick to Python because it’s so fast to develop in but need the performance of C/C++. Let me introduce you to C extensions in Python.
If you’ve ever used something like cJSON in the past, then you’ve already installed something like this before–it’s likely a lot modules you import in Python are built in C and not just pure-python.
Now if something wonky happens, I can easily modify the library code. We also get the added benefit of broader platform support – you can use mixpanel.com on your mobile device and it works perfectly.
Actually picking the library was a little tricky. We were lucky – highcharts was released right when we started looking and it has performed admirably. There are a few other good choices though, and I will go into all of them in some depth.
In my previous post covering OpenVPN, I said that we needed to restrict access to most of our servers – they will only be accessible to each other, rather than open to the outside world.
How do we do this? iptables. You can add iptables rules that explicitly state the ip addresses that are allowed through the firewall, and then disallow everything else.
If our network was static – meaning we would never have to add more machines – then this would be really simple. All you’d need to do is update your iptables file once with the ip of every server you own, and you’re done. No worries.
In the real world, the network isn’t static. We’re adding new machines all the time, and if we don’t update iptables at the same time, the new machines won’t be able to communicate with the old ones. To solve this problem, I dynamically generate iptables files and deploy them with Fabric.
Note: all code mentioned in this post can be found on github here: http://github.com/ttrefren/firewall
When I want to deploy code to http://mixpanel.com, I open a new terminal window and type
fab deploy. Even though we have quite a few servers these days, our deploy process is really streamlined – we push code multiple times per day.
When you’re first starting a new web project, deployment is easy. All you have to do is log in to your server and do a
git pull, and probably restart your web server. No problem.
If you grow beyond a single machine, though, this technique is rife with problems: the time it takes to deploy code grows linearly with the number of servers you have, it’s difficult to synchronize deployment, and it’s simply error-prone. Any point in your deployment process that requires you to log in to a server and type multiple commands is just asking for trouble.
We use a tool called Fabric to automate this process. Fabric makes it really easy to run commands across sets of machines. It’s similar to Capistrano (Ruby) but it’s written in Python so it was an easy choice for us
Imagine this scenario: your company is growing rapidly and you’re hiring tons of engineers. The passwords to many of your servers are stored in plaintext in configuration files — everyone has access to them. Your main database – the one with all the user data – is available to anyone who has that file.
You fire an engineer. Now what? He knows the passwords to everything. Do you trust him? What if you’re firing him because he’s just a bad egg? What can you do?
Well, you could change the passwords on every single server… but that’s a huge pain in the ass for everyone involved. Luckily, there’s a solution for this problem, one that comes with quite a few other benefits as well: you set up a VPN.