
In the era of cloud-native infrastructure, infrastructure-as-code (IaC) is at the heart of repeatability, scalability, and reliability.
Terraform, HashiCorp’s popular IaC tool, has become the backbone for provisioning and managing infrastructure across all major cloud providers: AWS, Microsoft Azure, and Google Cloud.
Terraform lets you describe your desired infrastructure state (virtual machines, networks, storage, and more) in declarative configuration files; its engine then takes responsibility for reconciling actual cloud resources to match your code through the plan, apply, and state workflow.
For Microsoft Azure, Terraform’s azurerm provider gives direct, programmable access to almost every Azure resource, including its foundational block storage: Azure Managed Disks.
Managed Disks, for the uninitiated, are high-durability, high-availability virtual disks attached to Azure VMs, abstracting away the fuss of traditional storage account management. Managed Disks offer built-in replication, security features, and performance tiers suited for anything from proof-of-concept to mission-critical workloads.
But simply creating resources programmatically isn’t the hard part. The power of Terraform lies in maintaining your infrastructure by enabling version control, code review, peer collaboration, and automated validation through CI/CD pipelines.
With infrastructure now defined as code, every change is auditable, testable, and reversible. These are fundamental attributes for modern cloud teams operating at scale.
Disks today are more than just ‘dumb storage’: they’re often the single point of persistence and, by extension, of risk. Every change, whether that’s resizing, attaching, moving, or even deleting a disk, introduces room for outages, silent drift, and, most importantly, cost explosions.
The traditional “set and forget” mindset doesn’t survive in the world of cloud storage operations.
Using Terraform to manage Azure disks, when done with discipline and care, offers:
1. Predictable, reproducible state: Disks are created, resized, and destroyed only as your code prescribes
2. Drift detection and mitigation: Manual console changes won’t go unnoticed; your code always reflects the real world, or alerts you if it doesn't
3. Audit and compliance: Every mutation is logged, peer-reviewed, and attributed
4. Disaster recovery made practical: Versioned infrastructure definitions are the key to reliable rollback and reconstruction
In other words, Terraform and IaC don’t just make things easier; they’re an operational shield against entropy and human error, especially as your environments and teams scale.
After all, surviving both the mundane and the disaster in today’s cloud means knowing your infrastructure’s behavior and being able to encode that knowledge in code.
Terraform’s azurerm_managed_disk resource enables you to create, configure, and manage every aspect of Azure Managed Disks, entirely through code. This approach unlocks the full suite of IaC benefits: repeatability, reviewability, and automation.
A Managed Disk in Terraform is composed of a few key attributes:
resource "azurerm_managed_disk" "db_data" {
  name                 = "prod-db-data-disk"
  location             = azurerm_resource_group.main.location
  resource_group_name  = azurerm_resource_group.main.name
  storage_account_type = "Premium_LRS"
  disk_size_gb         = 2048
  create_option        = "Empty"

  tags = {
    Environment = "Production"
    Owner       = "DBA Team"
  }
}

resource "azurerm_managed_disk" "restore" {
  name                 = "restored-prod-disk"
  location             = azurerm_resource_group.main.location
  resource_group_name  = azurerm_resource_group.main.name
  storage_account_type = "Premium_LRS"
  create_option        = "Copy"
  source_resource_id   = azurerm_snapshot.db_snapshot.id
  disk_size_gb         = 2048
}

Disks are attached using the azurerm_virtual_machine_data_disk_attachment resource (as detailed in the Pitfall #2 section), ensuring disk changes do not inadvertently destroy or reboot VMs.
Use the lifecycle meta-argument for guardrails such as:
lifecycle {
  prevent_destroy = true
  ignore_changes  = [tags]
}
These blocks help you avoid accidental data loss and unwanted drift (see below for pitfalls).
While the Azure Portal delivers a powerful, click-driven experience, managing Managed Disks programmatically with Terraform’s azurerm_managed_disk resource offers critical operational advantages for enterprise reliability and scale.
Here’s why:
When you codify your disk definitions, every terraform apply ensures the real-world disk matches your intended config.
Disks are often the backbone of stateful workloads. Any change (resize, move, re-attach) can materially impact application availability.
With code, it’s trivial to refactor resource names, move disks across resource groups, or replicate environments. Terraform’s support for resource renames (the moved block) and controlled lifecycle events enables safe modification and elimination of “ghost disk” anti-patterns.
Enforce encryption, labeling, network access policies, and regulatory controls through reusable, reviewed code. No individual operator has to remember every checkbox or dropdown.
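As an illustration, encryption and tagging policy can be baked into the resource definition itself. The sketch below is a hypothetical example: it assumes a pre-existing azurerm_disk_encryption_set.main holding a customer-managed key, and the name and tag values are placeholders.

```hcl
# Hypothetical example: customer-managed-key encryption enforced in code
resource "azurerm_managed_disk" "encrypted" {
  name                 = "prod-encrypted-disk"
  location             = azurerm_resource_group.main.location
  resource_group_name  = azurerm_resource_group.main.name
  storage_account_type = "Premium_LRS"
  create_option        = "Empty"
  disk_size_gb         = 512

  # Reference a disk encryption set defined elsewhere in the codebase
  disk_encryption_set_id = azurerm_disk_encryption_set.main.id

  tags = {
    Compliance = "Encrypted-CMK"
  }
}
```

Because the encryption set is a code-level reference, a reviewer verifies the control once, and every disk stamped from this pattern inherits it.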
Disks defined in Terraform can be managed as part of end-to-end deployment pipelines. Integrate disk operations (e.g., expand, snapshot, clone, attach/detach) with zero manual toil.
Update thousands of disks (e.g., resizing, tagging, or re-encrypting) through a single PR and pipeline run, as opposed to individually clicking through resources.
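One sketch of that fan-out, assuming a hypothetical local map of disk names to sizes: a single edit to the map (or to storage_account_type) propagates to every disk on the next pipeline run.

```hcl
# Hypothetical fleet definition: one map drives many disks
locals {
  data_disks = {
    "prod-db-data" = 2048
    "prod-db-logs" = 512
    "prod-cache"   = 256
  }
}

resource "azurerm_managed_disk" "fleet" {
  for_each             = local.data_disks
  name                 = each.key
  location             = azurerm_resource_group.main.location
  resource_group_name  = azurerm_resource_group.main.name
  storage_account_type = "Premium_LRS"
  create_option        = "Empty"
  disk_size_gb         = each.value
}
```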
Summary: Terraform’s azurerm_managed_disk isn’t just about convenience; it’s the difference between artisanal, hand-crafted ops and reliable, industrial-grade cloud operations.
With these approaches in your toolbox, you’re prepared to avoid some of the most common and impactful pitfalls teams face.
A classic: you refactor a disk resource, rename it, or move it to another location in code—or even update a region. Terraform sees the old one gone, creates a new one, and *deletes* the original, orphaning your data or breaking downstream attach dependencies.
This is the “ghost disk” effect. In audit trails, you see unexplained destroys and frantic scrambles to recover lost data.
Leverage lifecycle and refactor-aware migration.
resource "azurerm_managed_disk" "data" {
  name                 = "prod-db-data-disk"
  location             = azurerm_resource_group.main.location
  resource_group_name  = azurerm_resource_group.main.name
  storage_account_type = "Premium_LRS"
  create_option        = "Empty"
  disk_size_gb         = 2048

  lifecycle {
    prevent_destroy = true # Prevent accidental Terraform destroys
  }
}
Comment:
This block halts any unsuspecting `terraform apply` that tries to delete this disk, forcing manual intervention.
When you change names or paths, use the `moved` block to tell Terraform your intent explicitly:
moved {
  from = azurerm_managed_disk.old_data
  to   = azurerm_managed_disk.data
}
Comment:
Ensures Terraform moves (not re-creates) the resource, preserving the physical disk in Azure.
Attaching disks via the old `os_disk`/`data_disk` blocks *inside* the VM resource leads to dangerous coupling. If you change disk count, size, or type, Terraform can destroy and re-create the whole VM. If you update disk config, a rolling update triggers unneeded reboots, leading to downtime or “reboot loop” scenarios on critical workloads.
Decouple disk attachment using the `azurerm_virtual_machine_data_disk_attachment` resource. This isolates disk changes from VM lifecycle, enforcing strict idempotency.
resource "azurerm_managed_disk" "data" {
  # ...see above...
}

resource "azurerm_linux_virtual_machine" "app" {
  # ...your VM definition...
}

resource "azurerm_virtual_machine_data_disk_attachment" "app_data" {
  managed_disk_id    = azurerm_managed_disk.data.id
  virtual_machine_id = azurerm_linux_virtual_machine.app.id
  lun                = 0
  caching            = "ReadWrite"
}
Comment:
- *LUN* (Logical Unit Number) **must** be unique per VM.
- Detach/disposal is explicit and safe.
- Changing disk configuration or size does NOT force VM destroy.
Pro Tip: To manage LUN conflicts, use a standardized mapping (e.g., 0 for data, 1 for logs, 2 for backup) or generate LUNs dynamically in IaC for VM fleets.
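That mapping can be encoded directly in configuration. A minimal sketch, assuming the disks themselves are created with a matching for_each (local.disk_luns and azurerm_managed_disk.disks are hypothetical names):

```hcl
# Hypothetical standardized LUN map: disk role -> LUN
locals {
  disk_luns = {
    data   = 0
    logs   = 1
    backup = 2
  }
}

resource "azurerm_virtual_machine_data_disk_attachment" "by_role" {
  for_each           = local.disk_luns
  managed_disk_id    = azurerm_managed_disk.disks[each.key].id
  virtual_machine_id = azurerm_linux_virtual_machine.app.id
  lun                = each.value
  caching            = "ReadWrite"
}
```

Because the map keys are stable role names, adding a fourth disk role is a one-line change that cannot collide with an existing LUN.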
It’s a fire. An SRE jumps into the Azure Portal, resizes a disk to resolve an IO bottleneck or capacity alert. But now, Terraform’s state and the real world have diverged. The next `terraform apply` either reverts the change (dangerous!), or you add `ignore_changes = [disk_size_gb]`—and forget the drift ever happened. This snowballs into unpredictable, undocumented infra.
Avoid “manual fix, persistent drift” at all costs. Instead, enforce a *GitOps-First* workflow—**all** disk size and perf changes go via code, code review, and CI.
lifecycle {
  ignore_changes = [disk_size_gb] # Use sparingly; may hide drift!
}
Ignoring changes hides state mismatches, *including* shrinking disks (which Azure doesn’t support, but Terraform may try).
1. SRE files a PR with the disk size change.
2. Peer review ensures it’s the right disk (avoids a fat finger mistake).
3. Merge triggers CI/CD, applies disk resize in the same pipeline as OS change scripts (e.g., `resize2fs` for Linux).
Comment:
- Forces institutional memory in code history.
- Makes drift visible and correctable, not silent.
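One way to make the size change PR-reviewable is to hoist it into a variable, so the diff is a one-line, obviously intentional edit (the variable name here is a hypothetical choice):

```hcl
variable "db_data_disk_size_gb" {
  description = "Size of the prod DB data disk. Change only via PR."
  type        = number
  default     = 2048
}

resource "azurerm_managed_disk" "data" {
  name                 = "prod-db-data-disk"
  location             = azurerm_resource_group.main.location
  resource_group_name  = azurerm_resource_group.main.name
  storage_account_type = "Premium_LRS"
  create_option        = "Empty"
  disk_size_gb         = var.db_data_disk_size_gb
}
```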
Don’t equate “Apply Succeeded” with “Healthy Disk.” Drift, silent throttling, and noisy neighbors can all degrade performance.
Proactive teams alert not just on downtime, but pre-failure signals (e.g., 80%+ IOPS saturation).
resource "azurerm_monitor_metric_alert" "disk_iops_alert" {
  name                = "high-disk-iops"
  resource_group_name = azurerm_resource_group.main.name
  scopes              = [azurerm_managed_disk.data.id]
  description         = "Alert if disk IOPS usage exceeds 80% for 5 minutes"
  severity            = 2
  enabled             = true

  criteria {
    metric_namespace = "Microsoft.Compute/disks"
    metric_name      = "DiskIOPSConsumedPercentage"
    aggregation      = "Average"
    operator         = "GreaterThan"
    threshold        = 80
  }

  frequency   = "PT1M"
  window_size = "PT5M"
}
Comment:
Surface early warning signals—IOPS close to quota will manifest as app latency/timeout before outright IO errors.
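An alert is only useful if it reaches someone. A hedged sketch of routing the metric alert to on-call via an action group (the names and email address are placeholders); the alert resource would then reference it through an action block:

```hcl
resource "azurerm_monitor_action_group" "oncall" {
  name                = "disk-alerts-oncall"
  resource_group_name = azurerm_resource_group.main.name
  short_name          = "diskalert" # limited to 12 characters

  email_receiver {
    name          = "sre-oncall"
    email_address = "sre-oncall@example.com" # placeholder address
  }
}

# Inside azurerm_monitor_metric_alert:
#   action {
#     action_group_id = azurerm_monitor_action_group.oncall.id
#   }
```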
Disks in the cloud aren’t “dumb blocks”; they are the persistent substrate of your critical data and a source of hidden blast radius if neglected.
Treat disks as first-class citizens in your infrastructure-as-code systems: never “set and forget,” always monitored, always codified.
The patterns above (defensive lifecycles, decoupled attachment, immutable ops, and proactive monitoring) are what set reliable production infra apart from fragile proof-of-concept scripts.
