By Venkat Thiruvengadam, Founder and CEO of DuploCloud
Which is more effective for managing cloud infrastructure: Infrastructure as Code (IaC) or Cloud Platforms? IAC requires engineers to use programming languages to build scripts that achieve the desired topology on a cloud platform. Popular tools for this approach include Terraform, Cloud Formation, Chef, Puppet, and Ansible. Essentially, the technology consists of a script-writing language and a controller that runs the scripts. Engineers save the scripts in a code repository and repeat the process for any future changes.
In contrast, Cloud Orchestrators or Platforms serve as a thin abstraction over native cloud APIs. Users interface with the orchestrator through a web service, where they can create and apply the cloud topology directly within the platform. The orchestrator saves the topology in its own database, eliminating the need for users to explicitly save the configuration. For updates, users log in to the system and make the necessary changes.
While a cloud platform may be too complex for smaller-scale operations, using IaC scripts can offer customization when needed. A better strategy is to use an off-the-shelf platform that can be enhanced with IaC scripts for further customization. It is important to keep in mind that megascale data centers such as those belonging to Facebook and Netflix have different requirements and are not considered here.
The fundamental value that a platform-based approach provides is what we call “long-running context.” People may also call this a “project” or a “tenant.” A context could map to, say, an application or an environment like demo, test, prod or a developer sandbox. When making updates to the topology, the user always operates in this context. The platform would save the updates in its own database within this context before applying the same to the cloud. In short: You are always guaranteed that what is present in this database is what is applied to the cloud.
In the IaC approach, such a context is not provided natively and is left to the user. Typically this would translate to something like “Which scripts need to be run for which context?” or maybe a folder in the code base that represents a configuration for a given tenant or project. Defining the context as a collection of code is harder because many of the scripts might be common across tenants. So most likely it comes down to the developers’ understanding of the code base.
A platform is a more declarative approach to the problem, as it requires little or no coding, as the system would generate the code based on the intent, without requiring knowledge of low-level implementation details. Meanwhile, in the case of IaC, any changes require a good understanding of the code base, especially when operating at scale. In the platform approach, a user can come back and log in to the same context a few days later and continue where they left off without having to dig deep into the code to understand what was done before.
Difference Between the Code Base and What Is Applied to the Cloud
The second fundamental difference between the two is that IaC is a multistep process (write the script, run it and merge it in the repo), while a platform is a one-step process (log in to the context and make the change). With IaC, it is possible that the user might update a script, but may also forget or postpone saving it in the repository. Meanwhile, another engineer could have made changes to the code base for their own side of topology and merged it. Now, since many pieces of code are shared for the two use cases, the first developer might find themselves in a conflict which, even if resolved by merging the code, lands them in a situation where what was run in the cloud is not what is in the repo. Now the developer has to re-run the merged code to validate, notwithstanding the possibility of causing regression. To avoid this risk, we need to now test the script in a QA environment.
All the ‘Other’ Stuff
IaC tools will enable deployments, but there is so much more to running infrastructure for cloud software. We need an application-provisioning mechanism, a way to collect and segregate logs and metrics per application, monitor health and raise alerts, create an audit trail, and an authentication system to manage user access to the infrastructure. Several tools are available to solve these individual problems, but they need to be stitched together and integrated into an application context. Kubernetes, Splunk, CloudWatch, Signalfx, Sentry, Elk and Oauth providers are all examples of these tools. But the developer needs a coherent “platform” to bring all this together if they want to operate at a reasonable scale. This brings us to our next point.
Much of IaC Is Basically a Homegrown Cloud Platform
When talking to many engineers we hear the argument that Infrastructure as Code combined with BASH scripts of even regular programming languages like Go, Java and Python provide all the hooks necessary to overcome the above challenges. Of course, I agree. With this sort of code, you can build anything. However, you might be building the same kind of platform that already exists. Why not start from an existing platform and add customization through scripts?
The second argument I have heard is that Infrastructure as Code is more flexible and allows for deep customization, while in a platform, you might have to wait for the vendor to provide the same support. I think as we are progressing in technology to the point where cars are driving themselves — once thought to be little more than pure fantasy! — platforms are far more advanced than they are given credit for and have great machine-generation techniques to satisfy most, if not all, use cases. Plus, a good platform would not block a user from customizing the part that is beyond its own scope via scripting tools. A well-designed platform should provide the right hooks to consume scripts written outside the platform itself. Hence this argument does not justify building a code base for the majority of the tasks that are standard.
‘There Is No Platform That Fits Our Needs’
This is also a common argument. And I agree: A good platform should strive to solve this prevalent problem. At DuploCloud, we believe we have built a platform that addresses the majority of the use cases while giving developers the ability to integrate policies created and managed outside the system.
‘The San Mateo Line!’
A somewhat surprising argument in favor of building homegrown platforms is that it is simply a very cool project for an engineer to tackle — especially if those engineers are from a systems background. I live in Silicon Valley and have found a very interesting trend while talking to customers specifically in this area.
When we talk to infrastructure engineers, we find that they have a stronger urge to build platforms in-house, and they are quite clear that they are building a “platform” for their respective organizations and are not, as they would consider it, “scripting.” For such companies, customization is the common argument against off-the-shelf tools, while hybrid cloud and on-premises are important use cases. Open source components like Kubernetes, Consul, etc., are common, and thus I frequently hear the assertion that the wheel need not be reinvented. Yet the size of the team and time allocated for the solution is substantial. In some cases, the focus on building the platform overshadows the core business product that the company is supposed to sell. While not entirely scientific, I tend to see these companies south of San Mateo.
Meanwhile, the engineering talent at companies north of San Mateo building purely software as service applications is full stack. The applications use so much native cloud software — S3, Dynamo, Amazon Simple Queue Service (SQS), Amazon Simple Notification Service (SNS) — that it’s hard to be hybrid. They are happy to give the container to Amazon Elastic Container Service (Amazon ECS) via API or UI to deploy it. They find no joy in either deploying or learning about Kubernetes. Hence, the trend and depth of in-house customizations is much less.
How many times and how many people will write the same code to achieve the same use? Time to market will eventually prevail.
– Venkat Thiruvengadam is the Founder and CEO of DuploCloud , an end-to-end DevOps automation and compliance platform that takes high-level application specifications from the user and translates them to secure cloud configurations. Venkat was also an early engineer at Microsoft Azure and the first developer and founding member in Azure’s networking team, Venkat wrote significant parts of Azure’s compute and network controller stack.