Our second vi tech blog update comes directly from our team in Kiev. Here, Sergey Koshel takes a chainsaw to hardware environments, as he argues for a radical approach to delivering new products.

Trashing Hardware

The ad:tech industry imposes strict SLAs on the latency and availability of services.

To compete globally, we need to be geographically close to our partners and clients. In terms of tech, that means running in several environments around the world while continuously delivering new features. And, as usual, delivery has to be fast, reliable and safe.

In the simplest case, a new feature is delivered without changing the environment: it is just a new version of your jar/deb/nuget/npm/docker package. Nothing special here, so let’s not dwell on it.

Things get interesting when deploying a new version requires changing the environment, sometimes including the hardware configuration. It’s hard to imagine deploying such a version without any casualties or downtime for end users. But it is possible.

What if a new version of the product is not just the code we deploy? What if it includes everything needed to set up the whole product from scratch, including the hardware and software configuration?

Then all you need to provide is credentials for your cloud provider, and you are ready.

At vi we’ve done this in 3 “simple” steps:
– scripts to set up the environment (Bash + Terraform)
– scripts to provision the environment / cluster (Bash + Ansible + Kubernetes)
– scripts to deploy the services into the cluster (Ansible)
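
To make the three steps a bit more concrete, here is a minimal sketch of how one release could be described as a list of commands. It is an illustration under assumptions, not our actual scripts: the directory layout, playbook names, inventory paths and variable names (`infra`, `provision.yml`, `deploy.yml`, `env_version`, `release_version`) are all hypothetical, and the sketch is written in Python rather than Bash. A separate Terraform workspace per version is one possible way to keep the new environment's state apart from the one that is still serving traffic.

```python
import shlex
import subprocess
from typing import List

# Hypothetical repository layout; none of these names come from the article.
INFRA_DIR = "infra"                   # Terraform configuration
PROVISION_PLAYBOOK = "provision.yml"  # Ansible playbook that builds the cluster
DEPLOY_PLAYBOOK = "deploy.yml"        # Ansible playbook that deploys services


def plan_release(version: str) -> List[List[str]]:
    """Return the commands that build a complete environment for one version."""
    inventory = f"inventories/{version}"  # hypothetical per-version inventory
    terraform = ["terraform", f"-chdir={INFRA_DIR}"]
    return [
        # Step 1: set up the environment (machines, network) for this version.
        terraform + ["init", "-input=false"],
        # A new workspace gives this version its own Terraform state.
        terraform + ["workspace", "new", version],
        terraform + ["apply", "-auto-approve", "-var", f"env_version={version}"],
        # Step 2: provision the cluster on the new machines.
        ["ansible-playbook", "-i", inventory, PROVISION_PLAYBOOK],
        # Step 3: deploy the services into the cluster.
        ["ansible-playbook", "-i", inventory, DEPLOY_PLAYBOOK,
         "-e", f"release_version={version}"],
    ]


def execute(commands: List[List[str]], dry_run: bool = True) -> List[str]:
    """Print each command; actually run it only when dry_run is False."""
    rendered = []
    for argv in commands:
        line = shlex.join(argv)
        rendered.append(line)
        print(line)
        if not dry_run:
            subprocess.run(argv, check=True)  # stop at the first failure
    return rendered


if __name__ == "__main__":
    execute(plan_release("v42"))  # dry run: only prints the plan
```

By default the sketch only prints the plan, which makes it easy to review what a release would do before anything is created in the cloud.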

This lets us prepare a new version in a matter of minutes and then smoothly switch traffic from the old version to the new one.
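
One way to picture a smooth switch is a gradual, weighted shift: send a small share of requests to the new version, check that it is healthy, then raise the share. The sketch below is a simplified model of that idea, not a description of our load balancer; the step sizes and the `is_healthy` callback are assumptions made for illustration.

```python
import random
from typing import Callable, Iterable


def shift_traffic(steps: Iterable[int], is_healthy: Callable[[int], bool]) -> int:
    """Raise the new version's traffic share (in percent) step by step.

    Returns the last share reached, or 0 if a health check failed and
    all traffic was sent back to the old version.
    """
    share = 0
    for share in steps:
        if not is_healthy(share):
            return 0  # switch back: the old version takes all traffic again
    return share


def pick_backend(new_share: int, rng: random.Random) -> str:
    """Route one request: 'new' with a probability of new_share percent."""
    return "new" if rng.random() * 100 < new_share else "old"


if __name__ == "__main__":
    final = shift_traffic([5, 25, 50, 100], is_healthy=lambda share: True)
    print(f"new version now receives {final}% of traffic")
```

In a real setup the weights would live in DNS, a load balancer or an ingress controller, and the health check would look at error rates and latency instead of always returning `True`.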

The last step is the most exciting: destroying the old version along with all the hardware involved. Seriously! Don’t be proud of your server uptime; be proud of your service uptime instead. Hardware is like software: you can stop it, launch it or uninstall it.

Here are the benefits of this approach:

– zero deploy downtime
– a new version of the product in a matter of minutes
– rollback is easy – just switch back to the previous version
– easy to update software/hardware
– you can test the whole configuration before going live
– A/B testing out of the box
– all the product knowledge is concentrated in one codebase, so you don’t need to keep tons of outdated information in your Wiki anymore
– one team with full product ownership, with no need for dedicated people just to prepare environments or open firewalls
– environment versioning: a history of every change made since version 0
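
The rollback and versioning points can be sketched as simple bookkeeping: remember which environment version is live, which one it replaced, and every switch ever made. The `Releases` class below is hypothetical and only models the idea; it assumes the previous environment has not been destroyed yet.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Releases:
    """Tracks which environment version is live and which one it replaced."""
    history: List[str] = field(default_factory=list)  # every switch, in order
    live: Optional[str] = None
    previous: Optional[str] = None

    def promote(self, version: str) -> None:
        """Make a freshly built environment the live one."""
        self.previous, self.live = self.live, version
        self.history.append(version)

    def rollback(self) -> str:
        """Switch back to the previous environment and return its version."""
        if self.previous is None:
            raise RuntimeError("no previous version to roll back to")
        self.live, self.previous = self.previous, self.live
        self.history.append(self.live)
        return self.live


if __name__ == "__main__":
    releases = Releases()
    releases.promote("v1")
    releases.promote("v2")
    releases.rollback()
    print(releases.live, releases.history)
```

Because a rollback is just another switch, it shows up in the history like any other release. Once the old environment is destroyed, going back means rebuilding that version from the codebase instead.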

Some tips and tricks on the road:

– keep state/storage separate from the software. Use NFS where possible
– visualize key events and KPIs from the system so you can see what’s going on in near real time
– use central log storage with tools for querying and visualization such as Kibana/Grafana/… This lets you see the overall picture and connect events from different parts of the system
– do not reinvent the wheel. Check what already exists, and build a prototype first

Of course, this approach takes time to script and prototype. Luckily, there are plenty of tools around to make it simpler. So open your toolbox, and trash some hardware.