user_data
atrribute of Terraform’s aws_instance
provider is a perfect use-case for this sort of setup. In this case, I was working with an Amazon Linux AMI, so I elected to work with cloud-init as my user_data mechanism (vs vanilla shell commands). Using “traditional” cloud-init syntax, such as this:
#cloud-config
repo_update: true
repo_upgrade: all
packages:
- httpd24
- server
runcmd:
- service httpd start
- chkconfig httpd on
- groupadd www
- [ sh, -c, "usermod -a -G www ec2-user" ]
- [ sh, -c, "chown -R root:www /var/www" ]
- chmod 2775 /var/www
- [ find, /var/www, -type, d, -exec, chmod, 2775, {}, + ]
- [ find, /var/www, -type, f, -exec, chmod, 0664, {}, + ]
led to persistent failures. This sort of config, on instance creation, attempts to update all repos (repo_update/repo_upgrade), install apache and some other packages (packages), and run some shell commands (runcmd). While this is handy, it doesn’t give us a great amount of control over timing (it basically runs as soon as the instance is provisioned and RunInstances
is called by TF). After several attempts to run this configuration on machines in a private subnet with a properly allocated S3 endpoint (i.e. the endpoint’s CIDR is allocated in the security group), I surmised that perhaps the endpoint wasn’t coming up before the scripts were invoked, leading to the failure. This theory was further bolstered by two facts:
- That I could SSH to these hosts (through a bastion in the public subnet) after the entire Terraform process was complete, run the commands, and they ran without issue, and
- That when applying the same configuration to a bastion host in a public subnet, the user_data cloud-init statement executed successfully
This validated for me that my cloud-init was correct, and that the machines in the private subnet could actually install via yum commands (if later rather than sooner). This led me to try out an alternative cloud-init script passed into user_data, which looks like this:
#cloud-config
runcmd:
- [ sh, -c, "sleep 3m" ]
- yum -y update
- yum -y install httpd24
- service httpd start
- chkconfig httpd on
- groupadd www
- [ sh, -c, "usermod -a -G www ec2-user" ]
- [ sh, -c, "chown -R root:www /var/www" ]
- chmod 2775 /var/www
- [ find, /var/www, -type, d, -exec, chmod, 2775, {}, + ]
- [ find, /var/www, -type, f, -exec, chmod, 0664, {}, + ]
Imagine my surprise to see service httpd status
return a valid response after changing the cloud-init script! The three minutes assigned in the sleep comand is a totally arbitrary number; it could probably be lowered, but it never failed to apply the configuration successfully the few times I tested it, either.
And there you have it – provisioning servers via Terraform’s user_data attribute for EC2 instances in private subnets with S3 VPC endpoints just takes a little more time!