October 19, 2017 5:41:39 PM CEST By Daniil Baturin
Quick facts: the issue is caused by an unexpected change in the EC2 system, there is no solution or workaround yet but we are working on it.
In the last week a number of people reported an issue with newly created EC2 instances of VyOS where they could not login to their newly created instance. At first we thought it may be an intermittent fault in the AWS since the AMI has not changes and we could not reproduce the problem ourselves, but the number of reports grew quickly, and our own test instances started showing the problem as well.
Since EC2 instances don't provide any console access, it took us a bit of time to debug. By juggling EBS volumes we finally managed to boot an affected instance with an disk image modified to include our own SSH keys.
The root cause is in our script that checks if the machine is running in EC2. We wanted to produce the AMI from an unmodified image, which required inclusion of the script that checks if the environment is EC2. Executing a script that obtains an SSH key from a remote (even if link-local) address is a security risk since in a less controlled environment an attacker could setup a server that could inject their keys to all VyOS systems.
The key observation was that in EC2, both system-uuid and system-serial-number fields in the DMI data always start with "EC2". We thought this is a good enough condition, and for the few years we've been providing AMIs, it indeed was.
However, Amazon changed it without warning and now the system-uuid may not start with EC2 (serial numbers still do), and VyOS instances stopped executing their key fetching script.
We are working on the 1.1.8 release now, but it will go through an RC phase, while the solution to the AWS issue is needed right now. We'll contact Amazon support to see what are the options, stay tuned.