[Shorewall-users] reducing latency, removing conntrack, other options?
Daniel Pocock
2017-04-28 17:30:16 UTC
Hi,

I'm running some applications on virtual servers with a virtual
firewall/router running Shorewall. Shorewall is version 4.6.4.3-2 on
Debian.

The virtualization platform is libvirt/KVM + Open vSwitch.

I'm noticing latency doubles when things go through the firewall. In
particular, I have recently set up a couple of virtual desktops and I'm
trying to access them with the SPICE protocol. It is supposed to be
more efficient than VNC or RDP but I'm finding there is always latency
in the UI.

I tried some ping tests (from my home, using a gigabit fibre connection)
and observed:

ping the physical server = 0.8ms
ping the virtual firewall = 1.4ms
ping the virtual server = 1.8ms

I also run Smokeping on various other nodes to monitor latency, and its
reports are consistent with those ping times.

I tried increasing RAM and CPU cores for the virtual firewall and
upgrading it to a Linux 4.9 kernel. There was no change.

Are there other improvements I can make to reduce latency?

Is it possible an upgrade to Shorewall 5 will make any difference?
5.0.15.6 is in Debian stretch[1]

Can Shorewall be used without connection tracking and could that
possibly make a difference?

Regards,

Daniel



1. https://packages.qa.debian.org/s/shorewall.html
Simon Hobson
2017-04-28 21:13:58 UTC
Post by Daniel Pocock
I'm noticing latency doubles when things go through the firewall. In
particular, I have recently set up a couple of virtual desktops and I'm
trying to access them with the SPICE protocol. It is supposed to be
more efficient than VNC or RDP but I'm finding there is always latency
in the UI.
I tried some ping tests (from my home, using a gigabit fibre connection)
ping the physical server = 0.8ms
ping the virtual firewall = 1.4ms
ping the virtual server = 1.8ms
What happens if you clear the firewall (shorewall clear) ?
Bear in mind that when you introduce the firewall, you are (I assume) sending the packets through an extra switch, virtual NIC, virtual machine, virtual NIC. So even without any firewall processing you will add latency.
Looking at the times you give above, adding the virtual switch and NIC to get to the firewall VM adds 0.6ms; the extra virtual NIC, virtual switch, virtual NIC to get to the server adds an additional 0.4ms. Not much in it.
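
The per-hop arithmetic above can be reproduced with a few lines of Python. This is only a sketch: the three figures are the ping times quoted in this thread, and the helper name hop_deltas is made up for illustration.

```python
# Round-trip times quoted in the thread, in milliseconds.
rtt_ms = [
    ("physical server", 0.8),
    ("virtual firewall", 1.4),
    ("virtual server", 1.8),
]


def hop_deltas(samples):
    """Return (from, to, added_ms) for each consecutive pair of targets."""
    deltas = []
    for (prev_name, prev_rtt), (name, rtt) in zip(samples, samples[1:]):
        # Round to avoid floating-point noise such as 0.5999999999999999.
        deltas.append((prev_name, name, round(rtt - prev_rtt, 3)))
    return deltas


for src, dst, added in hop_deltas(rtt_ms):
    print(f"{src} -> {dst}: +{added} ms")
```

Each delta covers everything on that extra leg (virtual switch, virtual NICs, the guest's IP stack and any filtering), so on its own it cannot say how much of the time is spent in the firewall rules.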

For something really latency sensitive, you might be better just running a firewall on the server.
Daniel Pocock
2017-04-29 05:58:00 UTC
Post by Simon Hobson
Post by Daniel Pocock
I'm noticing latency doubles when things go through the firewall. In
particular, I have recently set up a couple of virtual desktops and I'm
trying to access them with the SPICE protocol. It is supposed to be
more efficient than VNC or RDP but I'm finding there is always latency
in the UI.
I tried some ping tests (from my home, using a gigabit fibre connection)
ping the physical server = 0.8ms
ping the virtual firewall = 1.4ms
ping the virtual server = 1.8ms
What happens if you clear the firewall (shorewall clear) ?
Bear in mind that when you introduce the firewall, you are (I assume) sending the packets through an extra switch, virtual NIC, virtual machine, virtual NIC. So even without any firewall processing you will add latency.
Looking at the times you give above, adding the virtual switch and NIC to get to the firewall VM adds 0.6ms; the extra virtual NIC, virtual switch, virtual NIC to get to the server adds an additional 0.4ms. Not much in it.
I tried "shorewall clear && shorewall6 clear" while running ping and
didn't see much difference in the ping times so it may not be
firewalling at all.

If I understand the conntrack documentation[1] correctly, each TCP
packet is still processed by conntrack even if there are no firewall
rules using NAT. The only way to stop conntrack looking at packets is
to unload the modules for conntrack. Could conntrack be adding that
much latency though?
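
Before testing whether conntrack matters, it may help to confirm whether the conntrack modules are loaded at all. The sketch below reads /proc/modules and lists any module whose name contains "conntrack". The function name conntrack_modules is made up for illustration, and the check assumes conntrack is built as a loadable module; if it is compiled into the kernel it will not appear in /proc/modules.

```python
def conntrack_modules(proc_modules_text):
    """Return names of loaded modules whose name contains 'conntrack'."""
    names = []
    for line in proc_modules_text.splitlines():
        fields = line.split()
        # The first whitespace-separated field of each line is the module name.
        if fields and "conntrack" in fields[0]:
            names.append(fields[0])
    return names


if __name__ == "__main__":
    try:
        with open("/proc/modules") as f:
            loaded = conntrack_modules(f.read())
    except OSError:
        # Not a Linux system, or /proc is not mounted.
        loaded = []
    print("conntrack modules loaded:", ", ".join(loaded) or "none found")
```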

Beyond Shorewall, can anybody suggest any general strategies to reduce
latency in a Linux router/firewall setup or any good links on this topic?
Post by Simon Hobson
For something really latency sensitive, you might be better just running a firewall on the server.
That would be preferable, but in this case space for physical servers is
limited.

Regards,

Daniel


1. http://people.netfilter.org/pablo/docs/login.pdf
Simon Hobson
2017-04-29 09:57:55 UTC
Post by Daniel Pocock
Post by Simon Hobson
For something really latency sensitive, you might be better just running a firewall on the server.
That would be preferable, but in this case space for physical servers is
limited.
Sorry, I don't understand this bit. All I'm suggesting is that for a really latency sensitive application, you run a local (software) firewall on that (virtual) server and connect its interface outside of your firewall (virtual) device. There's no hardware difference, just removing some (software) elements from the packet path.


But there is an important thing to remember about software firewalling like this. If you go out and spend loads of dosh on a firewall device from the likes of Cisco, part of what that money buys you is a hardware packet processing engine.
The first packet in any conversation will (may ?*) still go through the supervisor processor, but once it has evaluated all the rules, it will cache the result in the hardware filter engine - thereafter, the packets are processed in hardware, probably with "cut through**", and handled very fast.
Using a software firewall on a Linux box, every packet must traverse the IP stack. So each packet must be received fully into a buffer, then the next level up can decide where that packet needs to be passed, at various points filters will be applied, it ends up in another buffer, and finally gets sent out of an interface. No matter how fast you make the processing, there is a fundamental limit that the packet isn't processed until it's all been received, and it can't start transmitting on the outbound interface until it's finished processing.

So I don't think any software implementation is going to match a hardware firewall/router UNLESS your processing rules are such that the traffic of interest can't be processed through the "fast path" of the hardware routing engine.

* I'm not that clued up on the current state, but I believe that modern hardware routing engines now have sufficient capabilities to apply some rules independently of the supervisory processor. If the packet can be processed by the fast routing engine then it is; only if it exceeds the engine's capabilities or knowledge (eg needs more complicated rule processing than the engine can do) does the packet leave the fast path and get passed up to the supervisory engine for a decision. Once that decision is made, the results are cached so that the engine can handle further packets in the conversation - basically a hardware equivalent to the Linux conntrack processing.

** cut-through packet handling allows the packet to be sent out as soon as there is enough information to determine its routing - and providing the egress interface is not already busy. So typically, you only need the packet headers (MAC addresses for switching, IP header for basic routing, IP+TCP/UDP/whatever headers for advanced routing or filtering) to make that decision, and can start sending the packet before the rest of it has come in.
A bit more reading from Cisco here http://www.cisco.com/c/en/us/products/collateral/switches/nexus-5020-switch/white_paper_c11-465436.html
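
To put rough numbers on the store-and-forward versus cut-through difference described above, here is a small worked calculation. It is a sketch under stated assumptions: a 1 Gbit/s link, a 1500-byte packet, and header sizes of 14 bytes (Ethernet), 20 bytes (IPv4 without options) and 20 bytes (TCP without options). It only counts the time needed to receive the bytes a forwarding decision depends on, not any processing time.

```python
def serialization_delay_us(num_bytes, link_bits_per_second):
    """Time in microseconds to clock num_bytes onto a link of the given rate."""
    return num_bytes * 8 / link_bits_per_second * 1e6


GIGABIT = 1_000_000_000  # assumed link rate, bits per second
PACKET_BYTES = 1500      # assumed packet size
ETH_HEADER = 14          # Ethernet header: enough for a switching decision
L3_L4_HEADERS = 14 + 20 + 20  # Ethernet + IPv4 + TCP headers, no options

# Store-and-forward: the whole packet must arrive before it is processed.
store_and_forward = serialization_delay_us(PACKET_BYTES, GIGABIT)

# Cut-through: forwarding can start once the relevant headers have arrived.
cut_through_switch = serialization_delay_us(ETH_HEADER, GIGABIT)
cut_through_filter = serialization_delay_us(L3_L4_HEADERS, GIGABIT)

print(f"store-and-forward wait: {store_and_forward:.3f} us")
print(f"cut-through (MAC only): {cut_through_switch:.3f} us")
print(f"cut-through (IP+TCP):   {cut_through_filter:.3f} us")
```

Under these assumptions the wait for a full packet is on the order of ten microseconds per hop, which is small next to the 0.4ms to 0.6ms steps reported earlier in the thread.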
Simon Hobson
2017-04-29 10:13:17 UTC
Post by Simon Hobson
But there is an important thing to remember about software firewalling like this. If you go out and spend loads of dosh on a firewall device from the likes of Cisco, part of what that money buys you is a hardware packet processing engine.
The first packet in any conversation will (may ?*) still go through the supervisor processor, but once it has evaluated all the rules, it will cache the result in the hardware filter engine - thereafter, the packets are processed in hardware, probably with "cut through**", and handled very fast.
Using a software firewall on a Linux box, every packet must traverse the IP stack. So each packet must be received fully into a buffer, then the next level up can decide where that packet needs to be passed, at various points filters will be applied, it ends up in another buffer, and finally gets sent out of an interface. No matter how fast you make the processing, there is a fundamental limit that the packet isn't processed until it's all been received, and it can't start transmitting on the outbound interface until it's finished processing.
Just had another think about this ...
Where the server, firewall, and virtual switches are all on one virtual host, there is another factor in favour of the software setup. The packets aren't serialised and sent down a bit of wire - transmitting a packet means copying a buffer of bytes in memory. So after the initial reception of the packet from the outside network, there's no further serialisation as the virtual switches and virtual NICs are all "just buffers in memory".

But against the standard setup, AIUI all the host packet handling is done by one thread running in Dom0. So that could be adding latency, as this single thread will be handling the packets going into your firewall "device", the packets coming out and going into another (virtual) switch, and the packets coming out the other side of that switch and going into your "server" virtual machine. I believe there are ways around this, but it's not something I've looked into.
Tom Eastep
2017-04-29 15:04:07 UTC
Post by Daniel Pocock
Post by Daniel Pocock
I'm noticing latency doubles when things go through the
firewall. In particular, I have recently set up a couple of
virtual desktops and I'm trying to access them with the SPICE
protocol. It is supposed to be more efficient than VNC or RDP
but I'm finding there is always latency in the UI.
I tried some ping tests (from my home, using a gigabit fibre connection)
ping the physical server = 0.8ms
ping the virtual firewall = 1.4ms
ping the virtual server = 1.8ms
What happens if you clear the firewall (shorewall clear) ? Bear
in mind that when you introduce the firewall, you are (I assume)
sending the packets through an extra switch, virtual NIC, virtual
machine, virtual NIC. So even without any firewall processing you
will add latency. Looking at the times you give above, adding the
virtual switch and NIC to get to the firewall VM adds 0.6ms, the
extra virtual NIC, virtual switch, virtual NIC to get to the
server adds an additional 0.4ms. Not much in it.
I tried "shorewall clear && shorewall6 clear" while running ping
and didn't see much difference in the ping times so it may not be
firewalling at all.
If I understand the conntrack documentation[1] correctly, each TCP
packet is still processed by conntrack even if there are no
firewall rules using NAT. The only way to stop conntrack looking
at packets is to unload the modules for conntrack. Could conntrack
be adding that much latency though?
Not by itself. Note that you can selectively disable conntrack by
adding entries in the conntrack file (shorewall-conntrack(5)).

-Tom
--
Tom Eastep \ Q: What do you get when you cross a mobster with
Shoreline, \ an international standard?
Washington, USA \ A: Someone who makes you an offer you can't
http://shorewall.net \________________________________________________