Not so long ago the only scenario when the issues with IPsec and NAT could arise was a remote access setup, while routers invariably had real public addresses and router to router IPsec operators were incredibly unlikely to run into those issues. A router behind NAT was seen as a pathological case.
I believe it's still a pathological case, but as platforms such as Amazon EC2 that use one to one NAT to simplify IP address management rose in popularity and people started setting up their routers there, it became the new normal. This new reality became a constant source of confusion for beginner network admins and people who simply are not VPN specialists. Attempts to setup a VPN as if there's no NAT, with peer external IP addresses and pre-shared keys as it's commonly done, just fail to work.
Sometimes I'm asked why is that so, if the NAT in question is 1:1, and no other protocol just refuses to work behind it without reconfiguration. The reasons are inside the IPsec protocols themselves, and 1:1 NAT is not much better in this regard than any other kind (even though it's somewhat better than the situation when there are multiple clients behind NAT).
IPsec pre-dates NAT by over a decade, and was explicitly designed with end to end connectivity in mind. It was also designed as a way to add built-in security to the IP protocol stack rather than a new protocol within the stack, so it made heavy use of those underlying assumptions. That's why workarounds had to be added when the assumption of end to end connectivity became incorrect.
Let's look into the details and then see how IPsec can be configured to work around those issues. Incidentally, the solutions are equally applicable to machines with dynamic addresses as well as those behind a 1:1 NAT, since the root cause of the issues is not knowing beforehand what your outgoing address will be next time.
AH and ESP
We all (hopefully) remember that there are three criteria of information security: availability, confidentiality, and integrity. Availability (when required) in the IP is provided by reliable transmission protocols such as TCP and SCTP. IPsec was supposed to provide the other two.
AH (Authentication Header) was designed to ensure packet integrity. For that, it calculates a hash function from the entire packet with all its headers, including source and destination addresses. Since NAT changes the addresses, it invalidates the checksum and NATed packets would be considered corrupted. So, AH is unconditionally incompatible with NAT.
ESP only protects the payload of the packet (whether it's some protocol data, or an entire tunnelled packet), so it, luckily, can work behind NAT. The UDP encapsulation of ESP packets was introduced to allow the traffic to pass through incorrect NAT implementations that may discard non-TCP or UDP protocols, and for correct implementations this should be irrelevant (I've seen a number of SOHO routers that couldn't handle typical L2TP/IPsec connections despite that UDP encapsulation — but that's another story).
Before we can start sending ESP packets, we need to establish a "connection" (security association) though. In practice, most IPsec connections are negotiated via the IKE protocol rather than statically configured, and that's where IKE issues come into play.
IKE and pre-shared keys
To begin with, original IKE was specified to use UDP/500 for both source and destination port. Since NAT may change the source port, the specification had to be adjusted to allow other source ports. Even more issues arise when there are multiple clients behind NAT and traffic has to be multiplexed, but we'll limit the discussion to 1:1 NAT where canonical IKE might work. Let's assume the IKE exchange has gone that far.
The vast majority of IPsec setups in the wild use pre-shared keys. I've configured connections to vendors and partners on behalf of my employers and customers countless times, and I believe I've seen an x.509-based setup once or twice. In a setup with pre-shared keys, routers need to know which keys to use for which peers. How do they know? Suppose you've configured a tunnel with peer 192.0.2.10 and key "qwerty". The other side has 172.16.0.1 address NATed to 192.0.2.10. Will it be the source address of the peer that will be used for the key lookup? Not quite.
In the IKE/ISAKMP exchange, there are identifiers. It's the pairs of identifiers and pre-shared keys that must be unique to reliably tell one peer from another and use correct keys for every peer. There are several types of identifiers specified by the ISAKMP protocol: IP (v4 or v6) address, subnet, FQDN, user and FQDN pair (user@example.com), and raw byte string.
By default, most implementations (including StrongSWAN in VyOS) will use the IP address of the outgoing interface for the identifier, and it will be embedded in the IKE packet. If the host is behind NAT, that address is a private address, 172.16.0.1. When the packet passes through the NAT, the payload will obviously remain unchanged, and when it arrives, the responder will try to find the pair of 172.16.0.1:qwerty in its configuration rather than 192.0.2.1:qwerty that is has configured, and will send NO_PROPOSAL_CHOSEN to the NATed peer.
The solutions
So, how do you get around those issues? Whether IKE is configured to use NAT detection or not, you'll get to use peer identification methods that do not rely on known and fixed IP addresses.
The simplest approach is to use a manually configured peer identifier. This approach will work when one side is NATed and the other is not. The only thing you need to remember is that VyOS (right now at least) requires that FQDN identifiers are prepended with the "@" character. They also do not need to match any real domain name of your machine. You also need to set the local-address on the NATed side to "any".
Here's an example:
Non-NATed side
vpn {
ipsec {
site-to-site {
peer @foo {
authentication {
mode pre-shared-secret
pre-shared-secret qwerty
}
default-esp-group Foo
ike-group Foo
local-address 203.0.113.1
tunnel 1 {
local {
prefix 10.103.0.0/24
}
remote {
prefix 10.104.0.0/24
}
}
}
}
NATed side
ipsec {
site-to-site {
peer 203.0.113.1 {
authentication {
id @foo
mode pre-shared-secret
pre-shared-secret qwerty
}
default-esp-group Foo
ike-group Foo
local-address any
tunnel 1 {
local {
prefix 10.104.0.0/24
}
remote {
prefix 10.103.0.0/24
}
}
}
}
}