WireGuard[1] Key Rotation
2023-11-22
WireGuard uses the private key as the sole identity of a peer, so switching this key requires careful planning to avoid interrupting ongoing connections on the network. This post explores different options to rotate the private keys of WireGuard interfaces without interrupting any ongoing flows.
This post assumes a basic understanding of WireGuard and of administering Linux systems. If you want to follow along, you need the following tools:
- wireguard-tools
- iproute2
- sudo / root access
- sysctl: net.ipv4.ip_forward = 1
The Simple Solution
A simple key rotation (at least in my mind) looks something like this:
- Create a new key pair.
- Distribute the new public key to all peers.
- Switch to the new key on all peers.
- Delete the old key pair.
Unfortunately, a peer can only have one key assigned. An alternative approach would be to configure the same peer twice, each time with a different key. Let's try that.
Assuming we have this simple configuration:
# wg showconf wg0
[Interface]
PrivateKey = yAnz5TF+lXXJte14tji3zlMNq+hd2rYUIgJBgB3fBmk=
[Peer]
PublicKey = xTIBA5rboUvnH4htodjb6e697QjLERt1NAB4mZqp8Dg=
AllowedIPs = 100.64.0.2/32, 10.0.1.0/24
We receive a message that TrMvSoP4jYQlY6RIzBgbssQqY3vxI2Pi+y71lOWWXX0= is the new public key of the configured peer. The intermediate configuration to support both keys would look like this:
# cat wg0.conf
[Interface]
PrivateKey = yAnz5TF+lXXJte14tji3zlMNq+hd2rYUIgJBgB3fBmk=
[Peer]
PublicKey = xTIBA5rboUvnH4htodjb6e697QjLERt1NAB4mZqp8Dg=
AllowedIPs = 100.64.0.2/32, 10.0.1.0/24
[Peer]
PublicKey = TrMvSoP4jYQlY6RIzBgbssQqY3vxI2Pi+y71lOWWXX0=
AllowedIPs = 100.64.0.2/32, 10.0.1.0/24
Once applied, we can check whether the configuration had the desired effect:
# wg setconf wg0 ./wg0.conf
# wg showconf wg0
[Interface]
PrivateKey = yAnz5TF+lXXJte14tji3zlMNq+hd2rYUIgJBgB3fBmk=
[Peer]
PublicKey = xTIBA5rboUvnH4htodjb6e697QjLERt1NAB4mZqp8Dg=
[Peer]
PublicKey = TrMvSoP4jYQlY6RIzBgbssQqY3vxI2Pi+y71lOWWXX0=
AllowedIPs = 100.64.0.2/32, 10.0.1.0/24
The allowed IPs are only assigned to the second peer! This is due to a core concept of WireGuard: Cryptokey Routing[2].
Since WireGuard uses the list of allowed IPs to decide which public key to use for encryption (and where to send the packet), every IP can only be associated with one peer. When applying a set of allowed IPs to a peer, any conflicts are resolved by silently removing the overlapping IPs from any other peer. To my understanding, not throwing an error in this case is a feature of WireGuard: it enables atomic updates of the allowed IPs of peers. Otherwise, the user would have to first remove the allowed IP from one peer and then add it to another one, causing a brief window in which packets to that IP cannot be routed.
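Because the conflict is resolved silently, it can help to check a configuration before applying it. The following is a minimal sketch, not part of wireguard-tools: the function name duplicate_allowed_ips is made up for this post, and it only reports entries that appear verbatim under more than one [Peer] section.

```shell
#!/bin/sh
# Print every AllowedIPs entry that is listed under more than one [Peer]
# section of a wg(8) configuration read from stdin.
# Assumption: only textually identical entries are compared; a shorter and
# a longer prefix on different peers are not reported.
duplicate_allowed_ips() {
    awk '
        /^\[Peer\]/ { peer++ }
        /^AllowedIPs[ \t]*=/ {
            sub(/^AllowedIPs[ \t]*=[ \t]*/, "")
            n = split($0, ips, /[ \t]*,[ \t]*/)
            for (i = 1; i <= n; i++) {
                # report the entry if an earlier, different peer already had it
                if (ips[i] in seen && seen[ips[i]] != peer) print ips[i]
                seen[ips[i]] = peer
            }
        }
    '
}
```

Running duplicate_allowed_ips < wg0.conf on the intermediate configuration above should list both 100.64.0.2/32 and 10.0.1.0/24.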
Rolling Keys Between Handshakes
Looking at the mailing list, this seems to be a common pitfall[3][4]. Someone else already raised the question about key rotation, and Jason's suggestion was to use the gap between two handshakes to rotate the key[5]. A new handshake occurs when either approximately two minutes have passed or more than one quintillion messages have been exchanged[6].
For a single tunnel (or a few) this might be a viable option. You still need to distribute the new key and agree on the timeframe in which the switch should happen, but the probability of a handshake occurring before all peers have switched the key should be low (depending on the time it takes to switch keys across all peers). For a setup that processes a lot of packets across a lot of peers, however, this might not be acceptable:
Assuming we have 1,000 peers in a network, we are (on average) performing a handshake every 120s / 1,000 = 0.12s. If it takes us one second (due to clock skew or network latencies) to switch the key on all devices, we will interrupt about 8 handshakes. If such a handshake failure occurs, the affected tunnel will be interrupted for about five seconds[7].
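The estimate above is easy to replay for other network sizes. This is a small sketch that assumes handshakes are spread evenly over the rekey interval; the function name handshake_estimate is made up for this post.

```shell
#!/bin/sh
# handshake_estimate PEERS REKEY_SECONDS SWITCH_SECONDS
# Assumption: handshakes are evenly distributed over the rekey interval.
handshake_estimate() {
    awk -v n="$1" -v t="$2" -v w="$3" 'BEGIN {
        gap = t / n    # average time between two handshakes in the network
        printf "one handshake every %.2f s on average\n", gap
        printf "about %.0f handshakes fall into a %d s window\n", w / gap, w
    }'
}

# the numbers used in the text: 1,000 peers, 120 s interval, 1 s to switch
handshake_estimate 1000 120 1
```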
A five-second interruption should be manageable for all but the most demanding scenarios. But I’m looking for a bulletproof solution and am willing to sacrifice some of the simplicity for it.
Rolling Keys in an A / B Fashion
A common suggestion when discussing the inability to handle overlapping allowed IPs is to configure multiple interfaces and do the routing using some other mechanism[8]. If we can have two tunnels (one using the old key and the other using the new key), we can configure the routing system to make the switch only once we know the new key is usable.
Rotating the keys of a single peer using this approach could look like this:
- Generate the new key pair.
- Distribute the new public key to all peers.
- All peers set up a second interface identical to the first, except that one key is different.
- All peers adjust their routes to send traffic over the new network.
- The old interfaces are taken down.
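The first step of this list can be sketched with the usual wg genkey / wg pubkey pipeline. The function name new_keypair and the file names private.key and public.key are placeholders chosen for this post; only the public half ever needs to leave the host.

```shell
#!/bin/sh
# Generate a WireGuard key pair in the given directory (default: current one).
# File names are hypothetical; adjust them to your own layout.
new_keypair() {
    dir=${1:-.}
    (
        umask 077    # keep the private key unreadable for other users
        wg genkey | tee "$dir/private.key" | wg pubkey > "$dir/public.key"
    )
}
# Usage: new_keypair . && cat ./public.key
```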
To verify our approach we need a test environment. We can simulate two hosts connected via a public network using Linux network namespaces. The public network is simulated by a network namespace containing a network bridge; each host connected to the network is also a network namespace. For each host a virtual Ethernet pair is created, with one interface in the host's namespace and the other one in the network's namespace. The interfaces on the network side get the bridge configured as their master to connect everything together. If you want to play along you can run this script to configure everything on your own machine. Each side also gets an IP address: host-a gets 10.0.1.1/24 and host-b gets 10.0.1.2/24.
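For readers without the script at hand, here is a rough sketch of such a setup. It is not the script referenced above: the namespace name net, the bridge name br0 and the veth-* names are assumptions, while host-a, host-b and eth0 match the commands used in the rest of this post. By default it only prints the commands; set DRY_RUN=0 and run it as root to execute them.

```shell
#!/bin/sh
# Dry-run by default: commands are printed instead of executed.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then echo "$*"; else "$@"; fi
}

setup_lab() {
    run ip netns add net                       # the simulated public network
    run ip -n net link add br0 type bridge
    run ip -n net link set br0 up
    i=1
    for host in host-a host-b; do
        run ip netns add "$host"
        # veth pair: eth0 inside the host, the other end next to the bridge
        run ip -n "$host" link add eth0 type veth peer name "veth-$host" netns net
        run ip -n net link set "veth-$host" master br0
        run ip -n net link set "veth-$host" up
        run ip -n "$host" address add "10.0.1.$i/24" dev eth0
        run ip -n "$host" link set eth0 up
        i=$((i + 1))
    done
}

setup_lab
```

Deleting the three namespaces again with ip netns del should also remove the interfaces inside them.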
Now we configure our interfaces. The initial configurations look like this:
# cat wga0.conf
[Interface]
PrivateKey = 8ONFgn3X/vU2TYYo8lUZV8O4KPvoJtvVPvYjiRN/+1s=
# PublicKey = 3Zk5XsCNJLoEDtv724JK9UDklKKX0LpPG/GWMt/PBgg=
ListenPort = 50000
[Peer]
PublicKey = /3E4XpED6iIOCX3TdZcXm5Z4btMswHog+e4b+E+ACSI=
Endpoint = 10.0.1.2:50000
AllowedIPs = 100.64.0.2/32
# cat wgb0.conf
[Interface]
PrivateKey = OMCQxK7SUim4x+U4j23zO6PFnU/9jo4ZdHt5+Lb9i3M=
# PublicKey = /3E4XpED6iIOCX3TdZcXm5Z4btMswHog+e4b+E+ACSI=
ListenPort = 50000
[Peer]
PublicKey = 3Zk5XsCNJLoEDtv724JK9UDklKKX0LpPG/GWMt/PBgg=
Endpoint = 10.0.1.1:50000
AllowedIPs = 100.64.0.1/32
Create the interface, apply the configuration, bring it up:
ip -n host-a link add wg0 type wireguard
ip -n host-a address add 100.64.0.1/24 dev wg0
ip netns exec host-a wg setconf wg0 ./wga0.conf
ip -n host-a link set wg0 up
ip -n host-b link add wg0 type wireguard
ip -n host-b address add 100.64.0.2/24 dev wg0
ip netns exec host-b wg setconf wg0 ./wgb0.conf
ip -n host-b link set wg0 up
Now we can verify that the two hosts can talk to each other:
ip netns exec host-a ping 100.64.0.2
All good!
Now for the thing we are here for: the key rotation. Following our list above we first generate a new key pair and use it to duplicate the configurations:
# cat wga1.conf
[Interface]
PrivateKey = eHG+epcOFAPw+O+vjKnfwPxVQd0tAOqRBQjmz54w51g=
# PublicKey = nNXf7jqgjtLKSyHWRAGtWRGuiNoOAXBEatovwnW+bjc=
ListenPort = 50000
[Peer]
PublicKey = /3E4XpED6iIOCX3TdZcXm5Z4btMswHog+e4b+E+ACSI=
Endpoint = 10.0.1.2:50000
AllowedIPs = 100.64.0.2/32
# cat wgb1.conf
[Interface]
PrivateKey = OMCQxK7SUim4x+U4j23zO6PFnU/9jo4ZdHt5+Lb9i3M=
# PublicKey = /3E4XpED6iIOCX3TdZcXm5Z4btMswHog+e4b+E+ACSI=
ListenPort = 50000
[Peer]
PublicKey = nNXf7jqgjtLKSyHWRAGtWRGuiNoOAXBEatovwnW+bjc=
Endpoint = 10.0.1.1:50000
AllowedIPs = 100.64.0.1/32
Repeat the steps from above but create the interfaces as wg1 instead of wg0. When bringing up the interfaces, the kernel gently reminds us that only one WireGuard interface can use a listening port at a time:
RTNETLINK answers: Address already in use
So we have to use different ports for our second network. Switch them all to 50001 (both ListenPort and Endpoint), and you should be able to bring up the interfaces.
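Editing the ports by hand works fine for two files. For more peers, a small filter along these lines could do it; this is a sketch that assumes the exact "Key = value" spacing used in the listings above, and the function name use_second_port is made up.

```shell
#!/bin/sh
# Rewrite ListenPort and Endpoint ports from 50000 to 50001 in a wg(8)
# configuration read from stdin; all other lines pass through unchanged.
use_second_port() {
    sed -E 's/^(ListenPort = |Endpoint = .*:)50000$/\150001/'
}
# Usage: use_second_port < wga1.conf > wga1.conf.new
```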
But which interface is in use? There are different ways to check that. First we can use our trusty tcpdump(1) to check where our packets are being processed (make sure the ping is still running in the background):
ip netns exec host-a tcpdump -plni wg0 # or wg1
Which should show something like this for wg0 (and nothing for wg1):
IP 100.64.0.1 > 100.64.0.2: ICMP echo request, id <num>, seq <num>, length <num>
IP 100.64.0.2 > 100.64.0.1: ICMP echo reply, id <num>, seq <num>, length <num>
Or we can consult the kernel debug logs:
# enable wireguard debug logs
echo module wireguard +p | tee /sys/kernel/debug/dynamic_debug/control
# monitor for any activity
dmesg -w | grep wireguard
You should occasionally (about every two minutes) see messages like these:
wireguard: wg0: Keypair <num> created for peer <num>
wireguard: wg0: Receiving handshake response from peer <num> (10.0.1.2:50000)
wireguard: wg0: Keypair <num> destroyed for peer <num>
Everything should show that the interfaces we created first are being used for sending traffic between the two peers. Looking at the configured routes we can see why:
# ip -n host-a route show
10.0.1.0/24 dev eth0 proto kernel scope link src 10.0.1.1
100.64.0.0/24 dev wg0 proto kernel scope link src 100.64.0.1
100.64.0.0/24 dev wg1 proto kernel scope link src 100.64.0.1
The route we created first is higher up in the main routing table and decides where packets are sent. Let’s remove the duplicate routes:
ip -n host-a route del 100.64.0.0/24 dev wg1
ip -n host-b route del 100.64.0.0/24 dev wg1
To switch traffic over to the new network we simply adjust the existing route to use the new interface:
ip -n host-a route change 100.64.0.0/24 dev wg1
ip -n host-b route change 100.64.0.0/24 dev wg1
Using the techniques from above we can confirm that wg0 went silent and traffic is now sent via wg1.
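A third way to check which interface is in use is to ask the kernel directly with ip route get, which prints the route it would choose for a destination. The helper below is a small sketch that extracts the interface name from that output; the name route_dev is made up for this post.

```shell
#!/bin/sh
# Print the interface named after "dev" in `ip route get` output on stdin.
route_dev() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "dev") { print $(i + 1); exit } }'
}
# Usage (needs the namespaces from this post):
#   ip -n host-a route get 100.64.0.2 | route_dev
```

After the route change this should print wg1 on the host that was switched.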
But wait - doesn’t this interrupt the traffic? If you switch only one host but not the other, everything still works even though the two directions are using different paths! As long as the packets reach their destination it doesn’t matter which route they took, which is also in line with the end-to-end principle[9].
This confirms that you can rotate WireGuard keys by switching traffic between two (almost) identical networks! The main benefit is that it doesn’t rely on timing the switch. Even if it takes multiple days because you have to update 1,000 hosts by hand, this approach will still work.
At this point you can experiment with different setups by running, for example, a netcat connection to test TCP while switching the route back and forth.
Thoughts for the Future
The setup can be improved by confirming the tunnel can be established before switching routes that affect any ongoing connections.
With such a setup it should be possible to seamlessly transfer active flows from one host to another.
How does the outer network handle flows that are split across different routes? Especially when you are running in a “real” network and not on a network bridge.
Do the conntrack table entries for the inner flow need to be synchronised across the two hosts?
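As a starting point for the first thought above, the check could be based on wg show <interface> latest-handshakes, which prints one public key and one Unix timestamp per peer (0 if there has been no handshake yet). This is a minimal sketch under assumptions: the function name handshakes_fresh and the 180 second threshold are arbitrary, and since WireGuard only performs a handshake once something is sent, the new tunnel needs some traffic or a PersistentKeepalive setting before the check can succeed.

```shell
#!/bin/sh
# handshakes_fresh MAX_AGE [NOW]
# Succeed only if every peer listed on stdin completed a handshake within
# the last MAX_AGE seconds. Input: "<public key> <unix timestamp>" per line.
handshakes_fresh() {
    max_age=$1
    now=${2:-$(date +%s)}
    awk -v now="$now" -v max="$max_age" '
        { seen = 1; if ($2 == 0 || now - $2 > max) stale = 1 }
        END { if (seen && !stale) exit 0; exit 1 }
    '
}
# Usage:
#   ip netns exec host-a wg show wg1 latest-handshakes | handshakes_fresh 180 \
#       && ip -n host-a route change 100.64.0.0/24 dev wg1
```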