Back in the old days, I could just type
route (or, later,
ip route) in my Linux terminal and get an accurate picture of all my routes. This is no longer the case.
For instance, the machine where I’m writing this is connected to the Mullvad VPN via the WireGuard protocol using the wg-quick script. I’m pretty sure all my traffic goes through Mullvad, yet you wouldn’t be able to tell this from my
ip route output:
% ip route
default via 192.168.1.1 dev enp34s0 proto static metric 100
192.168.1.0/24 dev enp34s0 proto kernel scope link src 192.168.1.121 metric 100
192.168.122.0/24 dev virbr0 proto kernel scope link src 192.168.122.1 linkdown
Note that the default route seemingly directs all traffic through my physical network interface, not the virtual VPN interface.
So let’s figure out how this all works.
Routing tables
In reality, there is no single routing table in Linux (and there hasn’t been for more than 20 years, since around Linux 2.2). Instead, there are multiple routing tables, plus a set of rules that tell the kernel how to choose the right table for each packet.
(By the way, do not confuse routing tables with iptables. To simplify a bit, routing tables specify how to deliver a packet, whereas iptables specify whether to deliver it at all. They are completely different and unrelated.)
What you see when you run
ip route without specifying a table is the contents of one particular table,
main. Tables are identified by integer numbers (from 1 to 2^32−1) but can also be given textual names, which are listed in the file
/etc/iproute2/rt_tables. The default one will look something like this:
#
# reserved values
#
255 local
254 main
253 default
0 unspec
#
# local
#
#1 inr.ruhep
(Are you wondering what inr.ruhep is? This is just an example, likely added by Alexey Kuznetsov, who worked on these parts of the Linux kernel and iproute tools. It stands for “Institute for Nuclear Research / Russian High Energy Physics”, the place where Alexey worked at the time, and probably refers to their internal network. There was also an old-school Russian computer network/ISP called RUHEP/Radio-MSU.)
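If you ever need the numeric ID behind a table name in a script, the file is easy to parse. Here is a minimal sketch; table_id is a helper name I made up for illustration, and the default path is simply the one mentioned above, which may be different on your distribution.

```shell
# Look up the numeric ID of a named routing table.
# Usage: table_id NAME [FILE]
# FILE defaults to /etc/iproute2/rt_tables (an assumption; adjust as needed).
table_id() {
    awk -v name="$1" '
        $1 ~ /^#/  { next }                          # skip comment lines
        $2 == name { print $1; found = 1; exit }     # first match wins
        END        { exit !found }                   # non-zero status if absent
    ' "${2:-/etc/iproute2/rt_tables}"
}
```

With the file shown above, table_id main would print 254, while the commented-out inr.ruhep entry is not found.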
You can view the contents of any table like this:
% ip route list table local
% ip route list table 13
Routing policies
So how does the kernel know which routing table to apply? It uses the “routing policy database”, which is managed by the
ip rule command. In particular,
ip rule without any arguments will print all existing rules. These are mine:
% ip rule
0: from all lookup local
32764: from all lookup main suppress_prefixlength 0
32765: not from all fwmark 0xca6c lookup 51820
32766: from all lookup main
32767: from all lookup default
The numbers you see on the left (0, 32764, …) are rule priorities: the lower the number, the higher the priority. That is to say, rules with lower numbers are processed first.
Apart from the priority, each rule also has a selector and an action. The selector tells us whether the rule applies to the packet at hand. If it does, the action is executed. The most common action is to consult a particular routing table (see the previous section). If that routing table contains a route for our packet, then we’re done; otherwise, we proceed to the next rule.
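To see at a glance which tables get consulted and in what order, you can boil the ip rule output down to priority/table pairs. This is a small sketch that assumes the output format shown above; rule_tables is a hypothetical helper, not part of iproute2, and it deliberately ignores the selectors.

```shell
# Read `ip rule` output on stdin and print "PRIORITY TABLE" for every
# rule whose action is a table lookup.
rule_tables() {
    awk '{
        prio = $1
        sub(/:$/, "", prio)                 # "32764:" -> "32764"
        for (i = 2; i < NF; i++)
            if ($i == "lookup")
                print prio, $(i + 1)        # the word after "lookup" is the table
    }'
}
```

You would use it as ip rule | rule_tables. Since ip rule already lists rules in priority order, the result reads top to bottom in the order the kernel walks the tables.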
The rules with priorities 0, 32766 and 32767 above are created automatically by the kernel. To quote the
ip-rule(8) man page:
Priority: 0, Selector: match anything, Action: lookup routing table local (ID 255). The local table is a special routing table containing high priority control routes for local and broadcast addresses.
Priority: 32766, Selector: match anything, Action: lookup routing table main (ID 254). The main table is the normal routing table containing all non-policy routes. This rule may be deleted and/or overridden with other ones by the administrator.
Priority: 32767, Selector: match anything, Action: lookup routing table default (ID 253). The default table is empty. It is reserved for some post-processing if no previous default rules selected the packet. This rule may also be deleted.
The other two rules have been created by the wg-quick script. If you want to understand how they work, read on.
Let’s look at the two rules that are added by wg-quick:
32764: from all lookup main suppress_prefixlength 0
32765: not from all fwmark 0xca6c lookup 51820
At first sight, these are quite cryptic: what does
suppress_prefixlength do, what is 0xca6c, and how can a packet be “not from all”?
Let’s start from the 32764 rule: as it has a lower number, it’s considered first.
32764: from all lookup main suppress_prefixlength 0
The rule’s selector, “from all”, matches everything, making the kernel consult the
main table for every single packet.
If this were the whole rule, every packet would be routed by the main table, never reaching the VPN. This is why the action also contains a suppressor:
suppress_prefixlength 0. From the
ip-rule(8) man page:
suppress_prefixlength NUMBER reject routing decisions that have a prefix length of NUMBER or less.
Here “prefix” refers to the address or range of addresses matched in the routing table. So if you have a route for 10.2.3.4, its prefix length is 32 (bits); if you change it to 10.0.0.0/8, the prefix length will be 8.
What is a prefix of length 0 or less? It’s the empty prefix, 0.0.0.0/0, corresponding to the default route. So if the packet was routed by the default route from
main, that routing decision is ignored; otherwise, it’s respected.
To summarize, the effect of this rule is to respect all manual routes that the administrator might have added to the
main table. However, if the packet didn’t match any of the specific routes, then instead of applying the default route, we proceed to the next rule.
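The suppressor logic can be spelled out in a few lines of shell. This is only a toy model of the check described above, not kernel code: prefix_len and keeps_decision are names I invented, and the bare-address case assumes IPv4.

```shell
# Prefix length of a route destination as printed by `ip route`.
prefix_len() {
    case "$1" in
        default) echo 0 ;;            # the default route is 0.0.0.0/0
        */*)     echo "${1#*/}" ;;    # 10.0.0.0/8 -> 8
        *)       echo 32 ;;           # bare IPv4 address: a host route
    esac
}

# Usage: keeps_decision DEST LIMIT
# Succeeds if a rule with `suppress_prefixlength LIMIT` would keep a
# decision made by a route to DEST, i.e. the prefix is longer than LIMIT.
keeps_decision() {
    [ "$(prefix_len "$1")" -gt "$2" ]
}
```

For example, keeps_decision 192.168.1.0/24 0 succeeds, so a LAN route from main is respected, whereas keeps_decision default 0 fails and the lookup moves on to the next rule.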
32765: not from all fwmark 0xca6c lookup 51820
The “not from all” bit is just a quirk of how
ip rule formats its rules. A better way to express it would be
32765: from all not fwmark 0xca6c lookup 51820
It’s just that when no “from” prefix (address or range) is present in the rule’s selector,
ip rule prints “from all”.
51820 is a routing table, also created by wg-quick, containing a single route:
% ip route list table 51820
default dev mullvad scope link
So the effect of the rule is to route everything that reached it through the VPN, with one exception: the mysterious
not fwmark 0xca6c.
0xca6c is just a numerical label (“firewall mark”) that wg-quick asked wg to attach to all of the packets that it emits. These are packets that already encapsulate other packets and are addressed to your VPN peer/server. If these packets were routed back to WireGuard, that would create an infinite loop of wrappers on top of wrappers.
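To make the moving parts concrete, here is a sketch of commands that would produce the table, the mark and the two rules discussed here. It is my reconstruction of the end state, not necessarily the exact sequence wg-quick runs; setup_vpn_routing is a made-up name, it handles IPv4 only, and it needs root.

```shell
# Usage: setup_vpn_routing IFACE [TABLE]
# TABLE doubles as the fwmark value; 51820 is the number seen in this article.
setup_vpn_routing() {
    iface=$1
    table=${2:-51820}
    wg set "$iface" fwmark "$table"                        # mark WireGuard's own packets
    ip route add default dev "$iface" table "$table"       # VPN-only default route
    ip rule add not fwmark "$table" table "$table"         # unmarked traffic uses that table
    ip rule add table main suppress_prefixlength 0         # main, minus its default route
}
```

The rules are added without explicit priorities, so the kernel picks the next free numbers below 32766, which is presumably how 32765 and 32764 came about.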
So the selector ensures that packets that have already been encapsulated can escape through your normal internet connection. Since these packets are ignored by this rule, they proceed to the rule
32766: from all lookup main
But now there is no suppressor, so these packets are free to use the default route.
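If you want to check this on a live system rather than take my word for it, ip route get asks the kernel for the routing decision it would actually make, rules included, and it accepts a mark argument to simulate a firewall mark. The helper below is a small sketch (route_dev is my own name for it) that pulls the interface name out of that output.

```shell
# Read `ip route get ...` output on stdin and print the interface that
# follows the first "dev" keyword.
route_dev() {
    awk '{
        for (i = 1; i < NF; i++)
            if ($i == "dev") { print $(i + 1); exit }
    }'
}
```

With the setup described here, ip route get 1.1.1.1 | route_dev should name the VPN interface, while ip route get 1.1.1.1 mark 0xca6c | route_dev should name the physical one. The address 1.1.1.1 is just an arbitrary public destination.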
Fun fact: wg-quick uses the same numbers for the table and the fwmark: 0xca6c is just 51820 in hexadecimal.
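You can verify the conversion right in the shell, since printf understands hexadecimal constants:

```shell
printf '%d\n' 0xca6c     # prints 51820
printf '0x%x\n' 51820    # prints 0xca6c
```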
Overall, this setup works quite well. Older VPN scripts used to override your default route in the main table when connecting to the VPN and restore it when disconnecting. Sometimes this wouldn’t work, and after disconnecting from the VPN you would be left without any default route at all. wg-quick doesn’t have this problem, as it never messes with your
main routing table. All it has to do when disconnecting is delete its two rules, and your default route is active again. Or you could even do that yourself with
ip rule del.
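For the do-it-yourself variant, something like the following should work. It is a sketch that assumes the table number 51820 from this article and IPv4 only; remove_vpn_rules is a name I picked, and the commands need root.

```shell
# Delete the two policy rules discussed above, leaving the main table
# and the WireGuard interface itself untouched.
remove_vpn_rules() {
    ip rule del table 51820                            # the "not fwmark ... lookup 51820" rule
    ip rule del table main suppress_prefixlength 0     # the suppressed lookup in main
}
```

Afterwards, ip rule should list only the three kernel-created rules, and traffic falls through to the default route in main again.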
Félix Baylac Jacqué says:
If you want to go further on advanced Linux-based routing:
man ip-vrf <= routing tables attached to a particular netdev.
man ip-netns <= network namespaces. There’s a nice trick using them to prevent any packet leakage w/ wireguard https://www.wireguard.com/netns/