[Shorewall-users] routing issue with rtrules (with SW dump)
Simon Matter
2017-06-20 13:41:18 UTC
Hi,
I used to be able to ping from the Shorewall FW to a remote host's IP
address in a particular zone (CAIB, see below).
Somehow, this ping is failing now, and I don't know whether it's a config
error on my part or whether the remote host stopped replying.
# ping -I 10.215.246.91 10.215.236.123 -c 1
PING 10.215.236.123 (10.215.236.123) from 10.215.246.91 : 56(84) bytes of data.
--- 10.215.236.123 ping statistics ---
1 packets transmitted, 0 received, 100% packet loss, time 0ms
Still on $FW, I can ping the same IP address from a different source IP:
# ping -I 10.215.144.91 10.215.236.123 -c 1
PING 10.215.236.123 (10.215.236.123) from 10.215.144.91 : 56(84) bytes of data.
64 bytes from 10.215.236.123: icmp_seq=1 ttl=60 time=2.08 ms
--- 10.215.236.123 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 2.084/2.084/2.084/0.000 ms
# grep "10.215.232.0/21" rtrules
10.215.144.0/23 10.215.232.0/21 IBS 11420
- 10.215.232.0/21 CAIB 11615
where IBS and CAIB are providers for the same 10.215.232.0/21 network (can
be used as load-balanced links or failover).
# shorewall show routing | grep 10.215.232.0
11420: from 10.215.144.0/23 to 10.215.232.0/21 lookup IBS
11615: from all to 10.215.232.0/21 lookup CAIB
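Read in priority order, the two rules above send anything sourced from 10.215.144.0/23 to the IBS table and every other source to the CAIB table, which is why the two pings leave through different providers. The following is a minimal sketch of that selection, not part of Shorewall: the function names are made up for illustration, it only understands lines of the form "PRIO: from X [to Y] lookup TABLE", and it reports the first matching rule only (the kernel moves on to later rules if the chosen table has no usable route).

```shell
#!/bin/sh
# Sketch only: which policy-routing rule matches a source/destination pair?
# All function names here are hypothetical helpers, not Shorewall or iproute2 commands.

# Convert a dotted-quad IPv4 address to an integer.
ip_to_int() {
    old_ifs=$IFS; IFS=.
    set -- $1
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# in_cidr ADDRESS PREFIX: succeed if ADDRESS lies inside PREFIX ("all" matches anything).
in_cidr() {
    [ "$2" = "all" ] && return 0
    case $2 in */*) len=${2#*/} ;; *) len=32 ;; esac
    [ "$len" -eq 0 ] && return 0
    addr=$(ip_to_int "$1")
    net=$(ip_to_int "${2%/*}")
    mask=$(( (0xFFFFFFFF << (32 - $len)) & 0xFFFFFFFF ))
    [ $(( $addr & $mask )) -eq $(( $net & $mask )) ]
}

# first_matching_table SRC DST: read rules on stdin (already sorted by priority)
# and print the table named by the first rule that matches.
first_matching_table() {
    src=$1; dst=$2
    while read -r _prio _kw from rest; do
        to=all
        case $rest in
            "to "*) rest=${rest#to }; to=${rest%% *}; rest=${rest#* } ;;
        esac
        case $rest in
            "lookup "*) table=${rest#lookup } ;;
            *) continue ;;   # skip rule forms this sketch does not understand
        esac
        in_cidr "$src" "$from" || continue
        in_cidr "$dst" "$to" || continue
        echo "$table"
        return 0
    done
    return 1
}
```

Feeding it the two lines above with source 10.215.246.91 selects CAIB, and with source 10.215.144.91 selects IBS. On the firewall itself, "ip route get 10.215.236.123 from 10.215.246.91" asks the kernel the same question directly and also shows the gateway and outgoing device.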
Note that pinging 10.215.236.123 from a LAN host with IP address
10.215.246.* is successful.
# traceroute -s 10.215.246.91 10.215.236.123
traceroute to 10.215.236.123 (10.215.236.123), 30 hops max, 60 byte packets
1 * * *
2 * * *
3 * * *
4 * * *
5 * * *
6 * * *
7 * * *
8 * * *
9 * * *
10 * * *
11 * *^C
# traceroute -s 10.215.144.91 10.215.236.123
traceroute to 10.215.236.123 (10.215.236.123), 30 hops max, 60 byte packets
1 172.28.17.110 (172.28.17.110) 0.694 ms 1.396 ms 1.408 ms
2 10.128.12.0 (10.128.12.0) 2.096 ms 2.558 ms 2.816 ms
3 172.20.30.2 (172.20.30.2) 1.770 ms 1.763 ms 1.732 ms
4 * * *
5 * * *
6 * * *
7 * * *
8 * * *
9 *^C
# traceroute -s 172.20.11.62 10.215.236.123
traceroute to 10.215.236.123 (10.215.236.123), 30 hops max, 60 byte packets
1 172.20.11.50 (172.20.11.50) 0.518 ms 0.612 ms 0.569 ms
2 172.20.4.210 (172.20.4.210) 2.008 ms 2.009 ms 1.966 ms
3 10.215.4.242 (10.215.4.242) 6.316 ms 6.314 ms 6.317 ms
4 172.20.4.14 (172.20.4.14) 8.094 ms 8.028 ms 8.549 ms^C
I'm attaching a shorewall dump while performing the ping from $FW
(10.215.246.91) to 10.215.236.123.
Hi Vieri,

Last week you asked the list about a possible arp cache issue. Did you
find a solution there or is the issue you report now probably related?

Since you didn't let us know what came out last week I'm not sure both
things are related or not.

Simon
Simon Matter
2017-06-22 04:59:38 UTC
Post by Simon Matter
# ping -I 10.215.246.91 10.215.236.123 -c 1
Last week you asked the list about a possible arp cache issue. Did you
find a solution there or is the issue you report now probably related?
Since you didn't let us know what came out last week I'm not sure both
things are related or not.
Hi Simon,
I didn't follow up on my last issue regarding arp cache because I ran into
several critical issues. I noticed that my main switch had a default cache
timeout of 300 seconds. I lowered it to 1s before doing the FW change, and
set it back to 300s afterwards. This helped because the change was very
fast, and the clients connected to the first layer of switches were
quickly working again. However, some other clients had trouble, and had to
be rebooted.
In any case, traffic was apparently flowing as expected in all zones
except for one (the WAN zone, or internet access). Since I wasn't able to
pinpoint the cause of the problem in due time, I had to revert to the old
FW. I wasn't even able to do a correct "shorewall dump" as specified in
the troubleshooting guide since I confused "shorewall reset" with
"shorewall clear". :-(
The ping issue I reported here was occurring after falling back to the old
FW. I mean *really* after -- at least three hours later.
Since this wasn't critical I ignored it until I tested it again this
morning.
This time it worked as expected.
Do you know how I can relate this to ARP? (This is the old FW, and it's
trying to ping from one of its own IP addresses on the LAN NIC to a
remote host's IP address through another NIC.) Also, how can I deal with
this if it ever occurs again? Should I run "arp -d *" on $FW? Or is the
issue in the ARP cache of the switches beyond my "other NIC", which are
remotely administered and to which I have no access (except maybe to
unplug them)?
I can't help you much here I guess. The only thing I have in mind is that
I had to handle ARP issues on a FW where we had a cold standby FW. When
changing the FW some hosts didn't work until I added the following to the
"init" script then:

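grep -s -v "^ *#" /etc/shorewall/proxyarp | while read address interface external haveroute; do
    # Temporarily add the proxied address to the external interface
    ip addr add $address dev $external
    # Announce it with an unsolicited ARP reply
    arping -q -A -c 1 -I $external $address
    # Two seconds later, send an unsolicited ARP request, then remove the address again
    ( sleep 2 ; arping -q -U -c 1 -I $external $address ; ip addr del $address dev $external ) > /dev/null 2>&1 < /dev/null &
done
sleep 3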

That's more than a decade ago so I don't really remember well.

I'm afraid I can't help you with your other questions.

Simon
This leads me to another question. You previously mentioned that proxyarp
in Linux can lead to similar issues. I was/am using proxyarp=1 in
lan $IF_LAN routeback,proxyarp=1,arp_filter=1
wan $IF_WAN routeback,proxyarp=1,arp_filter=1
caib $IF_CAIB arp_filter=1
ibs $IF_IBS arp_filter=1
dmz $IF_DMZ routeback,dhcp,proxyarp=1
- lo -
BTW on the other end of $IF_WAN there's another Shorewall server acting as
a gateway/router/firewall.
net4 $IF_ISP4 optional,tcpflags,nosmurfs,routefilter=0,logmartians,proxyarp=0,arp_ignore=1,sourceroute=0
net3 $IF_ISP3 optional,tcpflags,nosmurfs,routefilter=0,logmartians,proxyarp=0,arp_ignore=1,sourceroute=0
net2 $IF_ISP2 optional,tcpflags,nosmurfs,routefilter,logmartians,proxyarp=0,arp_ignore=1,sourceroute=0
net1 $IF_ISP1 optional,tcpflags,nosmurfs,routefilter,logmartians,proxyarp=0,arp_ignore=1,sourceroute=0
dmz $IF_DMZ routeback
loc $IF_LAN routeback
I used proxyarp on the FW because I had a particular use case a while
back, but I should not require it anymore. I was hoping to use this
lan $IF_LAN routeback,arp_filter=1
wan $IF_WAN routeback,arp_filter=1
caib $IF_CAIB arp_filter=1
ibs $IF_IBS arp_filter=1
dmz $IF_DMZ routeback,dhcp
- lo -
However, when I replaced the old FW with the new one without "proxyarp=1"
connections were not working within my zones. When I re-enabled
"proxyarp=1" all zone traffic worked except for LAN2WAN.
Anyway, I'll post a dump later on if I get the chance. In the meantime,
I'd like to know how to truly disable proxyarp on a system that had it
enabled previously. Removing "proxyarp=1" might not be enough. I might
need to use "proxyarp=0"? Or should I echo 0 >
/proc/sys/net/ipv4/conf/DEVICE/proxy_arp?
I'm asking because even after removing "proxyarp=1" I still see this:
# cat /proc/sys/net/ipv4/conf/$IF_LAN/proxy_arp
1
# cat /proc/sys/net/ipv4/conf/$IF_WAN/proxy_arp
1
# cat /proc/sys/net/ipv4/conf/$IF_DMZ/proxy_arp
1
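The output above suggests that simply dropping "proxyarp=1" leaves the kernel flag untouched, so it has to be cleared explicitly, either with "proxyarp=0" in the interfaces file or by writing to /proc. A minimal sketch of the /proc route follows, not a Shorewall feature: the function names and the CONF_ROOT variable are made up for illustration, and CONF_ROOT exists only so the functions can be tried against a scratch directory first.

```shell
#!/bin/sh
# Sketch only: inspect and clear the per-interface proxy_arp flag.
# CONF_ROOT and both function names are hypothetical; writing under the
# real /proc tree requires root.
CONF_ROOT=${CONF_ROOT:-/proc/sys/net/ipv4/conf}

# Print "DEVICE VALUE" for every interface directory that has a proxy_arp entry.
show_proxy_arp() {
    for f in "$CONF_ROOT"/*/proxy_arp; do
        [ -f "$f" ] || continue
        dev=${f%/proxy_arp}
        dev=${dev##*/}
        printf '%s %s\n' "$dev" "$(cat "$f")"
    done
}

# Write 0 to proxy_arp for each device named on the command line.
disable_proxy_arp() {
    for dev in "$@"; do
        echo 0 > "$CONF_ROOT/$dev/proxy_arp" || return 1
    done
}
```

Usage would be along the lines of "disable_proxy_arp $IF_LAN $IF_WAN $IF_DMZ" followed by "show_proxy_arp". The listing also includes the "all" entry, which matters because the kernel treats proxy ARP as enabled on an interface if either conf/all/proxy_arp or the interface's own proxy_arp is set. A value written this way does not survive a reboot.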
Thanks,
Vieri