Before tuning, a few things need to be known:
- The amount of bandwidth available, in Kb/s (kilobits per second)
- The average ping response time between the source and destination hosts (in ms)
- The maximum TCP segment size (MSS), in bytes
For the purpose of this article, let's set some numbers of our own:
Parameter | Value |
---|---|
Bandwidth (Kb/s) | 20,000 |
Ping Time (ms) | 340 |
Max Segment Size (bytes) | 1300 |
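If you don't have a ping figure to hand, here is a small sketch for pulling the average out of ping's summary line. The function name `avg_rtt_ms` and the host in the usage comment are made up for illustration, and the parsing assumes the usual `rtt min/avg/max/mdev = ...` summary format printed by ping on Linux.

```shell
#!/bin/sh
# Print the average round-trip time, rounded to whole milliseconds,
# from ping output read on stdin.
# Assumes a summary line like: "rtt min/avg/max/mdev = a/b/c/d ms"
avg_rtt_ms() {
    # Splitting on "/" puts the average in the fifth field.
    awk -F'/' '/^rtt|^round-trip/ { printf "%.0f\n", $5 }'
}

# Usage (target.example.com is a placeholder, not a real host):
#   ping -c 10 target.example.com | avg_rtt_ms
```

Run it a few times at different times of day if you can, since a single sample can be misleading on a busy link.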
Note: To make full use of receive or even transmit window size tuning, BOTH hosts should be tuned.
- First of all, we need to find the current settings for the following parameters:
sysctl net.ipv4.tcp_window_scaling
sysctl net.ipv4.tcp_slow_start_after_idle
sysctl net.ipv4.tcp_rmem
sysctl net.ipv4.tcp_wmem
sysctl net.core.rmem_default
sysctl net.core.wmem_default
sysctl net.core.rmem_max
sysctl net.core.wmem_max
sysctl net.core.optmem_max
sysctl net.core.netdev_max_backlog
sysctl net.ipv4.tcp_congestion_control
sysctl net.ipv4.tcp_timestamps
sysctl net.ipv4.tcp_sack
- As an example, this is the output when running those commands on our test system:
net.ipv4.tcp_window_scaling = 1
net.ipv4.tcp_slow_start_after_idle = 1
net.ipv4.tcp_rmem = 4096 87380 4194304
net.ipv4.tcp_wmem = 4096 16384 4194304
net.core.rmem_default = 124928
net.core.wmem_default = 124928
net.core.rmem_max = 124928
net.core.wmem_max = 124928
net.core.optmem_max = 20480
net.core.netdev_max_backlog = 1000
net.ipv4.tcp_congestion_control = cubic
net.ipv4.tcp_timestamps = 1
net.ipv4.tcp_sack = 1
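It's worth keeping a copy of these values before changing anything, so you can roll back. The sketch below reads the same parameters straight from /proc/sys (where each sysctl key lives, with the dots turned into slashes) and prints them as `key = value` lines. The names `TUNABLES`, `key_to_path` and `dump_tunables`, and the output file name, are my own and not part of any tool.

```shell
#!/bin/sh
# The parameters inspected in this article.
TUNABLES="net.ipv4.tcp_window_scaling net.ipv4.tcp_slow_start_after_idle
net.ipv4.tcp_rmem net.ipv4.tcp_wmem
net.core.rmem_default net.core.wmem_default
net.core.rmem_max net.core.wmem_max
net.core.optmem_max net.core.netdev_max_backlog
net.ipv4.tcp_congestion_control net.ipv4.tcp_timestamps net.ipv4.tcp_sack"

# Map a sysctl key to its file under /proc/sys (dots become slashes).
key_to_path() {
    printf '/proc/sys/%s\n' "$(printf '%s' "$1" | tr '.' '/')"
}

# Print "key = value" for every parameter that exists on this kernel;
# parameters that are missing or unreadable are silently skipped.
dump_tunables() {
    for key in $TUNABLES; do
        path=$(key_to_path "$key")
        if [ -r "$path" ]; then
            printf '%s = %s\n' "$key" "$(cat "$path")"
        fi
    done
}

# Usage (the file name is just an example):
#   dump_tunables > tcp_settings_before.conf
```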
We’re only going to tune the receive window here, so ignore the wmem parameters (it’s the same process anyway, but it requires bi-directional access in order to test).
Calculating The Optimal Window Size
First we need to find our BDP or Bandwidth Delay Product.
Start by multiplying the amount of bandwidth available (in our case, 20000 Kb/s) by the ping response time (340 ms). Kilobits per second times milliseconds gives bits, so this gives us 6800000 bits. Divide this by 8 to convert to bytes and we have our BDP, which is 850000.
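The same sum as a quick shell sketch, using the example numbers from this article. The variable names are mine.

```shell
#!/bin/sh
bandwidth_kbit=20000   # available bandwidth in Kb/s (example value)
rtt_ms=340             # average ping response time in ms (example value)

# Kb/s multiplied by ms gives the number of bits in flight.
bdp_bits=$((bandwidth_kbit * rtt_ms))

# Divide by 8 to get the BDP in bytes.
bdp_bytes=$((bdp_bits / 8))

echo "BDP: $bdp_bytes bytes"
```

With the numbers above this prints `BDP: 850000 bytes`.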
Now we need to find our unscaled window value.
Take 65535 (the largest value that fits in the 16-bit window field of the TCP header, so the biggest window you can advertise without scaling), divide it by our MSS (1300) and round the result down to the nearest even number. 65535 / 1300 is 50.41153846153846. Rounded down to the nearest even number brings us to 50. Then, multiply this value by 1300 (our MSS) to find the optimal unscaled window value, which is 65000.
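Here is that step as a small shell sketch with the article's example MSS. Shell integer division already rounds down, so the only extra work is knocking off one segment if the count is odd. The variable names are mine.

```shell
#!/bin/sh
max_unscaled=65535   # largest window that fits in the 16-bit TCP window field
mss=1300             # maximum segment size in bytes (example value)

# Whole segments that fit in the unscaled window (integer division rounds down).
segments=$((max_unscaled / mss))

# Round down to the nearest even number of segments.
segments=$((segments - segments % 2))

# The unscaled window is that many full segments.
unscaled_window=$((segments * mss))

echo "Unscaled window: $unscaled_window bytes"
```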
Still following? Hope so.
Multiply 65000 by 2 until it is larger than our BDP (which is 850000) and you should arrive at 1040000, which is your optimal window size. I don’t use 1040000 as-is because it’s not a multiple of 1024. I usually divide it by 1024 and round up the result to the nearest whole number. In our case, 1040000 / 1024 is 1015.625, which when rounded up is 1016. 1024 * 1016 = 1040384, which after all that is now our default window size. That’s just the method I use and you can do more research if you want to, but I find it easy to remember, especially since 1024 is usually the default minimum value given when you dump the current configuration.
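The doubling and rounding can be sketched in shell like this, starting from the example BDP and unscaled window worked out above. The variable names are mine, and the rounding to a multiple of 1024 is just the habit described in this article, not a kernel requirement that I know of.

```shell
#!/bin/sh
bdp_bytes=850000        # bandwidth delay product in bytes (example value)
unscaled_window=65000   # unscaled window in bytes (example value)

# Keep doubling until the window is larger than the BDP.
window=$unscaled_window
while [ "$window" -le "$bdp_bytes" ]; do
    window=$((window * 2))
done

# Round up to the next multiple of 1024.
default_window=$(( (window + 1023) / 1024 * 1024 ))

echo "Default window: $default_window bytes"
```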
We also need to set a minimum and a maximum window size. To save you some time and effort, use 16MB (16777216 bytes) as the maximum, which is the maximum for a 1Gb/s local network link. Of course, if the server has a faster local network link, e.g. 10Gb/s, the maximum should be at least double that.
The minimum receive window size can be set wherever you want. Set this value too high, though, and it will cause problems because there won’t be enough memory available to handle your minimum.
The Optimised Values
Note that this also includes the write optimisation (wmem) values because I find myself referencing this documentation a lot.
# Turn on automatic TCP window size scaling
sysctl -w net.ipv4.tcp_window_scaling=1
# Disable TCP slow start after idle
sysctl -w net.ipv4.tcp_slow_start_after_idle=0
# Set the min, default and maximum receive window sizes used during auto tuning
sysctl -w net.ipv4.tcp_rmem='20480 1040384 16777216'
sysctl -w net.ipv4.tcp_wmem='20480 1040384 16777216'
# Set default receive window size here as well. This one is used when window size scaling is disabled
sysctl -w net.core.rmem_default=1040384
sysctl -w net.core.wmem_default=1040384
# Set max receive window size here as well. This one is used when window size scaling is disabled
sysctl -w net.core.rmem_max=16777216
sysctl -w net.core.wmem_max=16777216
# Set the maximum buffer size allowed per socket
# I recommend just setting this to your maximum window size
sysctl -w net.core.optmem_max=16777216
# Sets the maximum number of packets that will be buffered if the kernel can’t keep up
# There’s no real method, I just set it to something that’s a lot higher than default
sysctl -w net.core.netdev_max_backlog=65536
# Set the congestion control algorithm. Not sure which one is better but cubic seems like it’s better than reno
# (tcp_available_congestion_control is read-only and only lists the choices, so set tcp_congestion_control)
sysctl -w net.ipv4.tcp_congestion_control=cubic
# Enable timestamps as defined in RFC1323
sysctl -w net.ipv4.tcp_timestamps=1
# Enable selective acknowledgments
sysctl -w net.ipv4.tcp_sack=1
# Force all new TCP connections to use the above settings
sysctl -w net.ipv4.route.flush=1
- You can (and definitely should) store these values in a file like so:
tee /etc/sysctl.d/tcp_optimisations.conf <<EOF
# Turn on automatic TCP window size scaling
net.ipv4.tcp_window_scaling = 1
# Disable TCP slow start after idle
net.ipv4.tcp_slow_start_after_idle = 0
# Set the min, default and maximum receive window sizes used during auto tuning
net.ipv4.tcp_rmem = 20480 1040384 16777216
net.ipv4.tcp_wmem = 20480 1040384 16777216
# Set default receive window size here as well. This one is used when window size scaling is disabled
net.core.rmem_default = 1040384
net.core.wmem_default = 1040384
# Set max receive window size here as well. This one is used when window size scaling is disabled
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
# Set the maximum buffer size allowed per socket
# I recommend just setting this to your maximum window size
net.core.optmem_max = 16777216
# Sets the maximum number of packets that will be buffered if the kernel can’t keep up
# There’s no real method, I just set it to something that’s a lot higher than default
net.core.netdev_max_backlog = 65536
# Set the congestion control algorithm. Not sure which one is better but cubic seems like it’s better than reno
# (tcp_available_congestion_control is read-only and only lists the choices, so set tcp_congestion_control)
net.ipv4.tcp_congestion_control = cubic
# Enable timestamps as defined in RFC1323
net.ipv4.tcp_timestamps = 1
# Enable selective acknowledgments
net.ipv4.tcp_sack = 1
# Force all new TCP connections to use the above settings
net.ipv4.route.flush = 1
EOF
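Before loading the file, it can be worth a quick sanity check, because a comment that has wrapped onto a second line without its leading # is not a valid setting. The sketch below flags any line that isn't blank, a comment (# or ;) or a `key = value` assignment. The function name `check_sysctl_conf` is made up, and the check is deliberately rough: it doesn't verify that the keys exist.

```shell
#!/bin/sh
# Report lines in a sysctl config file that are not blank, a comment,
# or a "key = value" assignment. Exits non-zero if any are found.
check_sysctl_conf() {
    awk '
        /^[[:space:]]*$/     { next }   # blank line
        /^[[:space:]]*[#;]/  { next }   # comment
        /^[[:space:]]*[A-Za-z0-9_.-]+[[:space:]]*=/ { next }   # key = value
        { printf "line %d: %s\n", NR, $0; bad = 1 }
        END { exit bad + 0 }
    ' "$1"
}

# Usage: only load the file if it passes the check.
#   check_sysctl_conf /etc/sysctl.d/tcp_optimisations.conf &&
#       sysctl -p /etc/sysctl.d/tcp_optimisations.conf
```

`sysctl -p` with a file name loads just that file. Files in /etc/sysctl.d are also picked up at boot on most distributions.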
... and that's it! Happy tuning!