I have a router with two IPv6 NICs, let them be eth0 (default route) and tun0, and a host with a self-assigned real IPv6 address in the range of eth0. All traffic from the host goes through eth0, no problem, but I want to pass some traffic via tun0 according to an IP set in nftables.
Because the host has its own IPv6 address and tun0 has a different one, masquerading is required, as well as packet marking according to the set. So the rules are
chain mangle_PREROUTING { type filter hook prerouting priority mangle; policy accept;
ip6 daddr @addr-tun0 counter meta mark set 4
}
iproute2 rule is
32762: from all fwmark 0x4 lookup tbltun0
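For context, the whole setup can be sketched as below. Only the mark rule and the ip rule were quoted verbatim; the family/table names, the set definition, and the exact masquerade rule are assumptions:

```shell
# Hypothetical minimal reproduction of the described setup.
# Only the mark rule was quoted in the post; the rest is assumed.
nft -f - <<'EOF'
table ip6 mangle {
    set addr-tun0 {
        type ipv6_addr
        flags interval
    }
    chain mangle_PREROUTING {
        type filter hook prerouting priority mangle; policy accept;
        ip6 daddr @addr-tun0 counter meta mark set 4
    }
}
table ip6 nat {
    chain POSTROUTING {
        type nat hook postrouting priority srcnat; policy accept;
        oifname "tun0" masquerade
    }
}
EOF

# Marked packets are looked up in table tbltun0, whose only route
# sends everything out through the tunnel.
ip -6 rule add fwmark 4 lookup tbltun0
ip -6 route add default dev tun0 table tbltun0
```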
As a result, packets are correctly routed via tun0, BUT there is no masquerading. The source address is the host's real IPv6. If I delete the mangle mark rule and add the iproute2 rule
32760: from <Host’s ipv6> lookup tbltun0
All traffic (as expected) from the host goes via tun0 and the source address is NATed to the tun0 IP.
Seems I do not understand something.
In the router, eth0 is the WAN interface with real address aaaa::1.
tun0 has fd00:1::2; the other end of the tunnel has fd00:1::1.
Another computer has real address aaaa::2 and wants to connect to bbbb::100.
If I do nothing, the packet will go out from eth0 as aaaa::2→bbbb::100 and the answer comes back to eth0, from bbbb::100 to aaaa::2.
I want this packet to go via tun0. So I need to create an iproute2 rule so that packets marked with 4 are routed according to table tbltun0 (via tun0), and add the bbbb::100 element to set @addr-tun0.
Then the packet from aaaa::2 goes via tun0 to bbbb::100; bbbb::100 sends the answer to aaaa::2, and this answer returns ANOTHER way, to eth0, not tun0. That's why masquerading is required: it changes the source address from aaaa::2 to fd00:1::2. The answer then returns the same way.
But masquerading somehow does not work if I add the marking rule. The packet goes to tun0 with source address aaaa::2 instead of fd00:1::2.
Remark:
I have a home router that does have IPv6. It gets a 2001:abcd::/64 subnet for my LAN from the ISP.
My machine gets four IPv6 routes:
$ ip -6 ro
::1 dev lo proto kernel metric 256 pref medium
2001:abcd::/64 dev enp7s0 proto ra metric 100 pref medium
fe80::/64 dev enp7s0 proto kernel metric 1024 pref medium
default via fe80::cdef dev enp7s0 proto ra metric 100 pref medium
That is, the default route is to link-local address of the router, not any of its public IPv6 addresses.
There obviously must be at least one router between aaaa:: and bbbb:: subnets.
To bbbb::100 via whom? Is it the aaaa::1?
Where is the other end of the tunnel? Is it within aaaa::, or outside?
Do you control/configure the other end?
Let's say the tun0 end is at Router2. The other end's IP is fd00:1::1; Router2 also has its real IP, let's say cccc::1.
If bbbb::100 receives a packet from aaaa::1, it sends an answer and it comes to Router1's eth0, because that has the aaaa::/64 net.
If bbbb::100 receives a packet from cccc::1, the answer comes to Router2 and can be forwarded via tun0 to Router1, because its net is cccc::/64.
If I want to send some packets via tun0, I need to define an iproute2 rule (mark or from…) and a table tbltun0 with "default dev tun0". Then the route is:
Host→ Router(lan) → Router1(tun0) → Router2(tun0) → Router2(eth0) → DstHost
Then the answer must return the same way:
DstHost → Router2(eth0) → Router2(tun0) → Router1(tun0) → Router(lan) → Host
And for that it needs to change the src address (masquerading) from aaaa::1 to fd00:1::2.
Without the src change, the return path is:
DstHost → Router1(eth0) → Router(lan) → Host
I have no problem with routing. If I use source-based routing instead of packet marking, traffic goes via the tunnel and the source address is correctly replaced by fd00:1::2. BUT in that case all traffic from aaaa::1 goes via the tunnel, not according to the list.
Yesterday I discovered that masquerading works if the set is small (2 elements). But if the list is about 10k elements (a regional list), there is no masquerading.
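If the large regional list is built from prefixes, one mitigation worth trying (names assumed, same as in the posts above) is declaring the set with interval and auto-merge, so that adjacent ranges collapse and the element count shrinks:

```shell
# Sketch: interval set with auto-merge (set/table names are assumptions).
nft -f - <<'EOF'
table ip6 mangle {
    set addr-tun0 {
        type ipv6_addr
        flags interval
        auto-merge
    }
}
EOF
# Adjacent prefixes like these are merged into a single interval (2a00::/15).
nft add element ip6 mangle addr-tun0 '{ 2a00::/16, 2a01::/16 }'
```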
Let me rephrase – for the source-routing case:
If there is a rule that all packets from host aaaa::2 are handled by table tbltun0 (rather than table main), then the routes in table tbltun0 must handle all destinations. If the table tbltun0 has only one route in it – the catch-all default – then obviously all packets (from aaaa::2) must use that one route.
When an interface (e.g. eno1) gets an address, for example aaaa::8, an implicit route to link-local neighbours is created (in the main table): aaaa::/x dev eno1
That route says that when the destination (e.g. aaaa::42) is a member of aaaa::, then it is sufficient to do an ARP broadcast "Who is aaaa::42" on eno1 to get the MAC address of that destination and then throw the packet out from eno1.
If the destination is not a member of aaaa::, then it is in some other subnet. In order to reach other subnets, there must be a route that matches the destination and names the gateway to use. The gateway is a router that is a member of aaaa:: and of another subnet (which may be the destination's subnet, or has the next gateway on the path towards the destination).
The default via fd00:1::1 has the "default", which matches any address, and the via fd00:1::1. The latter says to pass the packet to gateway fd00:1::1 (in the hope that it knows what to do with it). Hence, an ARP call "Who is fd00:1::1" is sent (to tun0, since fd00:1::1 is a link-local neighbour reachable via tun0). With the MAC for fd00:1::1 the packet can be passed on.
With that background,
the plain dev tun0 made no sense, as it did not name any gateway, but rather said "just ask for the MAC of bbbb::100 on tun0". The bbbb::100 is not link-local to the tun0 interface, so it cannot answer, and routers do not pretend to be someone else. (Network bridges do, to some extent.) I'm surprised that it did "work" at all.
The via fd00:1::1 dev tun0 looks like a proper route "to remote destinations".
A tunnel is a point-to-point connection; it does not need ARP (IPv4) or neighbor discovery (IPv6), because its endpoints do not have MAC addresses. A tunnel has its own IP and a peer IP. Packets are just sent to the "other end" (the peer IP), and the other end decides what to do next.
So all tunnels work with "default dev tun0", without "via".
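The two route forms under discussion, side by side (table name and addresses taken from the thread):

```shell
# Plain device route: rely on the point-to-point nature of the tunnel;
# no next-hop resolution, packets simply go to the peer.
ip -6 route add default dev tun0 table tbltun0

# Explicit gateway: additionally name the tunnel peer as the next hop.
ip -6 route replace default via fd00:1::1 dev tun0 table tbltun0
```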
I do not understand the situation.
All the same: iproute2 table with a default route, nftables postrouting masquerade for oifname tun0.
1) No mark rule in nftables, iproute2 rule "from <Host's ipv6> lookup tbltun0":
all traffic from that IP goes via the tunnel with the correct src address (masquerading works).
2) Mark rule with a small set:
masquerading works, traffic for the ranges in the set goes via tun0.
3) Mark rule with a big set:
nftables does not do its job – no masquerade. iproute2 correctly routes the marked traffic to tun0.
3a) Adding "via" to the default route in tbltun0 seems to let nftables understand that the src should be changed.
And that's only for IPv6; IPv4 traffic goes correctly.
So I have found a solution for myself – use a small set, because right now that's enough for me.
Logically, the size of an nftables set (which is used only in prerouting) should have no effect in postrouting. IIRC, there at least were issues with huge sets (either ipset or nftables) – very slow or something. This could be one such case.
Alas, RHEL 8 has already reached 8.10 and is in the "Maintenance phase" (until 2029). Hence it (and Rocky 8) will receive only critical fixes. Furthermore, one would have to reproduce the issue on RHEL 8 in order to create a bug report – there is no CentOS Stream 8 any more. (On the bright side, free-of-charge RHEL licensing does exist.) Anyway, an issue that occurs only with IPv6 P2P tunnels and has a workaround (the via gw_ip route) is hardly a "critical bug".
If the same issue is present also in RHEL 9/10, then there is more reason to report it to Red Hat.
As I noted before, not all traffic would go via the tunnel if the table tbltun0 had appropriate routes to direct some traffic via eth0.
All traffic from a SPECIFIC host goes via the tunnel because I have decided so! I have added the iproute2 rule:
from specific-ip6 lookup tbltun0
which means traffic from the host to any destination goes by the routes in tbltun0.
That was defined so, to check that the table routing works correctly and masquerading works.
My bad. I got the impression that you emphasized the "all" as if it were an issue,
while it was a completely insignificant detail. What you actually said was that the
packets that were routed to the tunnel did get the expected SNAT treatment.
One could – for completeness – test how would a table like below behave:
ip rule add from specific-ip6 lookup tbltun0
for DST in $addr_tun0
do ip route add "$DST" dev tun0 table tbltun0
done
ip route add from default via xxxx::1 dev eth0 table tbltun0
where the $addr_tun0 is a big list of destinations (in appropriate format).
That puts the big list into a routing table, rather than into nftables set. Traffic from specific-ip6 to any address not in addr_tun0 will use eth0, just like the setup with mark.
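Spelled out with the route syntax fixed (the stray "from" in the last line of the loop above is a typo, acknowledged later in the thread; specific-ip6 and xxxx::1 are placeholders for the host's address and the real eth0 gateway):

```shell
# Put the big destination list into a routing table instead of an
# nftables set. $addr_tun0 is assumed to hold the list of destination
# prefixes, e.g. addr_tun0="bbbb::100 cccc::/48 ..."
ip -6 rule add from specific-ip6 lookup tbltun0
for DST in $addr_tun0; do
    ip -6 route add "$DST" dev tun0 table tbltun0
done
# Catch-all: anything not in the list leaves via eth0 as usual.
ip -6 route add default via xxxx::1 dev eth0 table tbltun0
```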
10000+ routes… I am not sure that's a good idea.
But even if the system does not die, what happens if the destination is somewhere else, not among these routes? Traffic will be rejected.
If I try to route traffic by iproute2 alone, then I need this:
for DST in $addr_tun0
do ip rule add from specific-ip6 to "$DST" table tbltun0
done
That will create 10k rules. If the dst IP satisfies none of them, it will be routed by the default.
I do agree; the mere thought scares me. However, be it a table of routes or an nftables set, a list of 10000+ entries has to be loaded into kernel memory and searched with destination addresses. You have found that 10000+ entries in a set disturb postrouting. The (hypothetical) question is what is disturbed by 10000+ routes.
That is why I had above (alas, with syntax error):
ip route add default via xxxx::1 dev eth0 table tbltun0
If the destination is somewhere else, then it matches the default route and
follows the via xxxx::1 dev eth0, just like all the other packets that do not enter table tbltun0.
One more time: I want to route some traffic via tun0 and the rest via the default eth0. 10k routes in the table are no better than one default route, because the decision to send via tun0 is in the iproute2 rule ("from source-ip…" or "fwmark…"). If I don't use an nftables set, then I need 10k rules, but not 10k routes.
That means passing ALL traffic via tbltun0. If there is no route to the destination in the table, traffic will be rejected.
I did answer your question about "if the destination is somewhere else", which was about the case of exactly one rule (from specific-ip6 lookup tbltun0) and no mark.
No, it means that all traffic coming from address specific-ip6 will be handled according to the routes that table tbltun0 has. The routes in that table do not all technically have to direct to interface tun0. One can have a "default route" in there that handles all the destinations that the more specific routes do not.
It is naturally up to you what routes you do add to that table.
I don’t think that 10k rules are any cheaper than 10k routes.
Overall, I don't say that you should use 10k routes or 10k rules. I say that it would be of academic interest to know whether the kernel handles those better or worse than the 10k addresses in an nftables set.