IaC Principles
Infrastructure as Code is not a tool or a technology — it is a discipline and a philosophy. It means applying the same practices used to build reliable software (version control, testing, code review, automation) to the infrastructure that software runs on.
The shift from manually managed servers to code-defined infrastructure is one of the most consequential changes in how modern systems are built and operated.
From the Iron Age to the Cloud Age
Traditional “Iron Age” infrastructure was static. Servers were physical machines, provisioned by hand, configured manually, and treated as long-lived, precious resources - the “pet” model. Changing them was risky. Replacing them was expensive.
Cloud-age infrastructure is dynamic. Resources are API-driven, provisioned in minutes, and designed to be disposable - the “cattle” model.
| Iron Age | Cloud Age |
|---|---|
| Physical servers, manual provisioning | API-driven, programmatic provisioning |
| Servers as pets - named, cherished, maintained | Servers as cattle - numbered, replaced on failure |
| Infrastructure changes are risky, infrequent | Infrastructure changes are routine, automated |
| Snowflake configurations (unique, fragile) | Reproducible, identical environments |
| Change management is a bottleneck | Change management is embedded in code review |
Why IaC Exists: The Core Problem It Solves
Manual infrastructure management fails at scale for three reasons:
- No repeatability - Two engineers provisioning the same environment by hand produce different results. Debugging becomes archaeology.
- No shareability - Tribal knowledge locked in one engineer’s head becomes a bus-factor risk.
- Drift - Environments configured by hand diverge over time. Production works; staging doesn’t. The cause is never documented.
IaC addresses all three failure modes by moving the infrastructure definition into code:
| Problem | IaC Solution |
|---|---|
| Repeatability | Same code produces identical infrastructure every time |
| Shareability | Code lives in Git — reviewable, forkable, discoverable |
| Drift | Desired state is declared; any deviation can be detected and corrected automatically |
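To make the drift row concrete, here is a minimal sketch of drift detection as a comparison between declared and observed state. The resource names, attributes, and the `detect_drift` helper are hypothetical and only illustrate the idea; real tools query the cloud provider's API for the actual state rather than taking a hand-written dictionary.

```python
from typing import Any

Resources = dict[str, dict[str, Any]]


def detect_drift(desired: Resources, actual: Resources) -> dict[str, list[str]]:
    """Compare declared resources against observed ones and classify each difference."""
    report: dict[str, list[str]] = {"missing": [], "unmanaged": [], "changed": []}
    for name, spec in desired.items():
        if name not in actual:
            report["missing"].append(name)      # declared in code, absent in reality
        elif actual[name] != spec:
            report["changed"].append(name)      # exists, but attributes differ
    for name in actual:
        if name not in desired:
            report["unmanaged"].append(name)    # exists, but nobody declared it
    return report


# Hypothetical desired state, as it would be declared in code.
desired = {
    "web-vm": {"machine_type": "e2-small", "zone": "us-central1-a"},
    "assets-bucket": {"location": "US", "versioning": True},
}

# Hypothetical observed state after some manual changes.
actual = {
    "web-vm": {"machine_type": "e2-medium", "zone": "us-central1-a"},   # resized by hand
    "debug-vm": {"machine_type": "e2-micro", "zone": "us-central1-a"},  # created by hand
}

report = detect_drift(desired, actual)
print(report)
```

Correcting drift then amounts to re-applying the declared state for everything in `missing` and `changed`, and deciding whether each `unmanaged` resource should be declared or deleted.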
The Three Core Practices
IaC rests on three practices that every team must adopt before anything else:
1. Define Everything as Code
If it’s not in code, it doesn’t exist. This includes:
- Compute resources (VMs, GKE node pools, Cloud Run services)
- Networking (VPCs, subnets, firewall rules, load balancers)
- IAM (service accounts, bindings, organization policies)
- Storage (buckets, databases, Pub/Sub topics)
- Configuration (environment variables, feature flags)
2. Continually Test and Deliver All Work in Progress
IaC that is only applied once a week is not continuous delivery; it is batch infrastructure change. IaC should be held to the same standard as application code: every commit triggers validation, every PR triggers a plan review, and merging to main triggers automated deployment.
3. Build Small, Simple Pieces That Can Change Independently
Monolithic infrastructure stacks (one giant Terraform configuration that manages everything) are the infrastructure equivalent of a monolithic application. When something changes, everything is at risk. Small, composable stacks with clear interfaces fail in isolation and change safely.
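One way to reason about "fail in isolation" is to compute the blast radius of a change from the dependencies between stacks. The sketch below assumes a hypothetical set of five stacks and a hypothetical `affected_by` helper; it is not how any particular tool models dependencies.

```python
from collections import deque

# Hypothetical stacks: each key lists the stacks it depends on.
DEPENDS_ON = {
    "network": [],
    "cluster": ["network"],
    "database": ["network"],
    "app": ["cluster", "database"],
    "dns": [],
}


def affected_by(changed: str, depends_on: dict[str, list[str]]) -> set[str]:
    """Return the changed stack plus every stack that depends on it, directly or transitively."""
    # Invert the graph: for each stack, which stacks consume it?
    dependents: dict[str, list[str]] = {name: [] for name in depends_on}
    for stack, deps in depends_on.items():
        for dep in deps:
            dependents[dep].append(stack)

    # Breadth-first walk over the consumers of the changed stack.
    seen = {changed}
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for consumer in dependents[current]:
            if consumer not in seen:
                seen.add(consumer)
                queue.append(consumer)
    return seen


print(sorted(affected_by("database", DEPENDS_ON)))  # ['app', 'database']
print(sorted(affected_by("network", DEPENDS_ON)))   # ['app', 'cluster', 'database', 'network']
```

In a single monolithic configuration every change has the blast radius of `network` here. With separate stacks, a change to `database` leaves `cluster` and `dns` untouched.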
The Four Key Metrics (DORA for Infrastructure)
The same DORA metrics used to measure software delivery performance apply directly to infrastructure:
| Metric | Infrastructure meaning |
|---|---|
| Deployment frequency | How often infrastructure changes are applied to production |
| Lead time for changes | Time from an infrastructure change commit to it running in production |
| Change failure rate | Percentage of infrastructure changes that cause incidents |
| MTTR (mean time to recovery) | How quickly infrastructure failures are recovered from |
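As an illustration of the table above, the four metrics can be computed from a log of infrastructure changes. The `Change` record and its fields are hypothetical, and the sketch makes simplifying assumptions: lead time is averaged rather than reported as a median, and recovery time is measured from the moment the change was applied.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Change:
    """One infrastructure change applied to production (hypothetical record shape)."""
    committed_at: datetime
    applied_at: datetime
    caused_incident: bool = False
    restored_at: Optional[datetime] = None  # set only when caused_incident is True


def hours(delta: timedelta) -> float:
    return delta.total_seconds() / 3600


def four_key_metrics(changes: list[Change], period_days: int) -> dict[str, float]:
    """Summarize a non-empty list of changes observed over period_days."""
    if not changes:
        raise ValueError("at least one change is required")
    failures = [c for c in changes if c.caused_incident]
    lead = [hours(c.applied_at - c.committed_at) for c in changes]
    restore = [hours(c.restored_at - c.applied_at) for c in failures if c.restored_at]
    return {
        "deployments_per_day": len(changes) / period_days,
        "mean_lead_time_hours": sum(lead) / len(lead),
        "change_failure_rate": len(failures) / len(changes),
        "mean_time_to_recovery_hours": sum(restore) / len(restore) if restore else 0.0,
    }


# Hypothetical four-day window with one change that caused an incident.
t0 = datetime(2024, 1, 1, 9, 0)
day, hour = timedelta(days=1), timedelta(hours=1)
changes = [
    Change(t0, t0 + 2 * hour),
    Change(t0 + day, t0 + day + 4 * hour, caused_incident=True,
           restored_at=t0 + day + 5 * hour),
    Change(t0 + 2 * day, t0 + 2 * day + hour),
    Change(t0 + 3 * day, t0 + 3 * day + hour),
]

metrics = four_key_metrics(changes, period_days=4)
print(metrics)
```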
Principles of Cloud Infrastructure
Beyond the core practices, well-designed cloud infrastructure follows seven principles:
| Principle | What it means in practice |
|---|---|
| Assume systems are unreliable | Design for failure — don’t assume a VM, network call, or managed service will always be available |
| Make everything reproducible | Any resource can be destroyed and recreated from code; no manual steps required |
| Avoid snowflake systems | If it took a specific person to build it, it’s a snowflake. Replace it with code. |
| Create disposable things | Infrastructure components are replaced, not repaired. Immutability is the default. |
| Minimize variation | Dev, staging, and production environments should be as identical as possible — defined by the same code with only parameterized differences |
| Ensure any procedure can be repeated | Runbooks become pipelines. No step is left to manual execution, where it could fail differently each time. |
| Apply software design principles | Separation of concerns, single responsibility, DRY — these apply to infrastructure code too |
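The "minimize variation" principle can be sketched as one shared definition plus a small set of per-environment parameters. The settings, environment names, and helper functions below are hypothetical; in Terraform the same idea is usually expressed with variables and per-environment variable files.

```python
from typing import Any

# Shared definition: every environment starts from the same settings (hypothetical values).
BASE: dict[str, Any] = {
    "machine_type": "e2-standard-2",
    "node_count": 3,
    "region": "us-central1",
    "deletion_protection": False,
}

# Only the parameterized differences are listed per environment.
OVERRIDES: dict[str, dict[str, Any]] = {
    "dev": {"node_count": 1},
    "staging": {},
    "prod": {"node_count": 6, "deletion_protection": True},
}


def render_environment(env: str) -> dict[str, Any]:
    """Build the full configuration for one environment from the shared base."""
    overrides = OVERRIDES[env]
    unknown = set(overrides) - set(BASE)
    if unknown:
        # An override for a setting the base does not define would be a snowflake.
        raise KeyError(f"unknown settings for {env}: {sorted(unknown)}")
    return {**BASE, **overrides, "name": f"app-{env}"}


def variation(env_a: str, env_b: str) -> set[str]:
    """Return the settings whose values differ between two environments."""
    a, b = render_environment(env_a), render_environment(env_b)
    return {key for key in a if a[key] != b[key]}


print(sorted(variation("staging", "prod")))
```

Keeping the overrides explicit makes the variation between environments reviewable: anything not listed is guaranteed to be the same everywhere.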
Aligning Infrastructure with Organizational Strategy
The Strategic Hierarchy
The alignment of infrastructure with an organization’s broader goals is fundamentally driven by customer value. This creates a top-down strategic flow where organizational strategy drives product strategy, which drives technology strategy, and ultimately dictates infrastructure strategy. In return, each foundational technical layer must be designed to explicitly support the strategic business layers above it.
The Disconnect Between Leadership and Engineering
A significant challenge for many organizations is the communication gap between the people making strategic commercial decisions and the engineers building the foundational systems.
- Leadership blind spots: Organizational leaders often dismiss the need for detailed infrastructure planning, mistakenly assuming that simply selecting a cloud vendor is the end of the process. When architectural problems eventually limit growth, security, or stability, these leaders tend to demand quick fixes rather than addressing the structural root causes.
- Engineering blind spots: Conversely, engineering teams frequently focus on implementing obvious technical solutions without thoroughly understanding the commercial context or the end-user requirements. For example, an engineering team once built a highly segregated multiregion cloud architecture to strictly comply with privacy regulations. Because they did not communicate closely with the commercial strategy team, they missed a critical business requirement: users needed international roaming access while traveling abroad. This strategic misalignment resulted in massive delays, immense expense, and a necessary total rework of the system architecture.
Mapping Business Goals to Infrastructure Capabilities
To prevent costly misalignments, it is essential that everyone, from boardroom executives to software developers, understands how technical architecture either enables or hinders strategic success. Specific business goals directly require specific infrastructure capabilities:
- Delivering continuous customer value: To release new products and features quickly and reliably, an organization requires infrastructure that easily supports developing, testing, and hosting services. Success in this area is measured by strong performance on the four key metrics (delivery lead time, deployment frequency, change fail percentage, and MTTR) and a low dependency on platform teams for routine software delivery tasks.
- Growing revenue and expanding into new markets: Expanding into new geographic regions or launching new product lines demands the ability to rapidly deploy new hosting environments and system capacity. The effectiveness of this infrastructure is measured by the speed at which new hosting can be added and the incremental cost of each new region or product instance.
- Providing highly reliable services: To maintain customer trust and satisfaction, systems must possess robust scaling, disaster recovery, and monitoring capabilities. Success here is tracked through standard availability and performance metrics.
The Role of Infrastructure as Code in Achieving Strategic Alignment
Broad organizational objectives ultimately filter down into specific, actionable goals for infrastructure architecture. Key strategic infrastructure requirements usually include environment consistency, self-service provisioning, automated recovery testing, and standardized platform products.
Adopting Infrastructure as Code (IaC) is an effective way to achieve these goals. For instance, IaC promotes environment consistency across the entire development lifecycle, so that test environments mirror production as closely as possible. This one foundational infrastructure capability ripples upward to support multiple high-level business goals by:
- Improving software delivery effectiveness and speed.
- Minimizing the manual customization and effort required when expanding into new global regions or launching new products.
- Making it significantly easier to automate overarching operational necessities like system security, regulatory compliance, and disaster recovery.
- Allowing the organization to consolidate, simplify, and rationalize its overall system architecture.
IaC and CI/CD
IaC and CI/CD are the same discipline applied to different artifacts. The pipeline that runs terraform apply is constructed the same way as the pipeline that deploys application code: it validates, tests, stages, and promotes. The difference is the artifact at the center:
| Software CI/CD | Infrastructure CI/CD |
|---|---|
| Build artifact from source | Generate plan from Terraform config |
| Run unit + integration tests | Run terraform validate, tflint, Checkov |
| Deploy to staging | Apply to a test environment |
| Deploy to production | Apply to production |
| Monitor and roll back | Detect drift, run terraform apply to reconcile |
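The infrastructure column of the table can be sketched as a gated pipeline in which each stage must succeed before the next one runs. The stage names and the `run_pipeline` helper are hypothetical, and the lambdas are placeholders: in a real pipeline each one would wrap a command such as terraform validate or terraform apply and report whether it succeeded.

```python
from typing import Callable

# A stage is a name plus a callable that returns True on success.
Stage = tuple[str, Callable[[], bool]]


def run_pipeline(stages: list[Stage]) -> list[str]:
    """Run stages in order and stop at the first failure, so later stages never run."""
    log: list[str] = []
    for name, step in stages:
        succeeded = step()
        log.append(f"{name}: {'ok' if succeeded else 'failed'}")
        if not succeeded:
            break  # a failed gate blocks promotion to every later stage
    return log


# Placeholder stages; here the apply to the test environment fails.
stages: list[Stage] = [
    ("validate", lambda: True),
    ("plan", lambda: True),
    ("apply-test", lambda: False),
    ("apply-prod", lambda: True),
]

log = run_pipeline(stages)
print(log)  # ['validate: ok', 'plan: ok', 'apply-test: failed']
```

Because the test-environment stage failed, the production stage never executes, which is the same promotion rule an application pipeline enforces between staging and production.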
Infrastructure “artifacts” (a GKE cluster, a Cloud SQL instance) take minutes to provision and are expensive to test in isolation. This shapes the testing strategy significantly. The testing strategy is covered in Testing IaC, and the delivery pipeline in IaC & CI/CD.