Long Fat Networks

Living in Australia generally means that you're on the end of a Long Fat Network (LFN), internet-wise. Despite the whimsical name, that's a real technical term: a network with a large bandwidth-delay product (plenty of bandwidth, but high round-trip latency), which matters to the networking stack when it sizes buffers and windows for data transfer.
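
For a rough sense of scale, the bandwidth-delay product is just bandwidth multiplied by round-trip time, and it tells you how much data has to be in flight at once to keep the link busy. A back-of-the-envelope sketch, assuming our 100Mbit/s link and a guessed ~175ms RTT from Melbourne to the SF Bay Area:

$ # BDP in bytes = bandwidth (bytes/sec) x RTT (sec); the 175ms RTT is a guess
$ echo '100 * 1000000 / 8 * 175 / 1000' | bc
2187500

That's a shade over 2MB that needs to be in the pipe at any instant, which is already more than the 1MB defaults we kept tripping over below.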

Two of my colleagues down in Melbourne are also with Aussie Broadband, on the top NBN speed tier (100Mbit down, 40Mbit up). We also have company-issued hardware VPN units because we work from home full-time. I was delighted with the bandwidth Aussie gave us to the work systems in the SF Bay Area: when I had cause to update my systems to a new build, the update took about 55 minutes on our media server, rather than the 80-90 minutes it used to take over the SkyMesh connection.

There was a fly in the ointment, however: my colleagues and I calculated that we should be getting a sustained transfer rate of 1MB/s or more from the internal pkg server, yet we'd often see around 400kB/s. Since networking is supposed to be something Solaris is good at, we started digging.
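
Incidentally, curl can report that sustained rate directly, which made it easy to compare runs; a quick sketch, with a placeholder URL standing in for the internal pkg server:

$ # print the average download speed in bytes/sec; the URL is a placeholder
$ curl -s -o /dev/null -w '%{speed_download}\n' http://pkg.example.com/bigfile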

The first thing we looked at was the maximum buffer size (max-buf), which for tcp defaults to 1MB. Greg found https://fasterdata.es.net/host-tuning/other/ so we changed that for tcp, udp and sctp. While the fasterdata document talks about using /usr/sbin/ndd, the Proper Way(tm) to do this in Solaris 11.x is with /usr/sbin/ipadm:

 # for pp in tcp udp sctp; do ipadm show-prop -p max-buf $pp; done

PROTO PROPERTY              PERM CURRENT      PERSISTENT   DEFAULT      POSSIBLE
tcp   max-buf               rw   1048576      --           1048576      1048576-1073741824

PROTO PROPERTY              PERM CURRENT      PERSISTENT   DEFAULT      POSSIBLE
udp   max-buf               rw   2097152      --           2097152      65536-1073741824

PROTO PROPERTY              PERM CURRENT      PERSISTENT   DEFAULT      POSSIBLE
sctp  max-buf               rw   1048576      --           1048576      102400-1073741824

To effect a quick and persistent change, we uttered:

# for pp in tcp udp sctp; do ipadm set-prop -p max-buf=1073741824 $pp; done
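
For reference, the legacy invocation the fasterdata page describes would have looked something like this on older releases (tcp only, and from memory, so treat it as a sketch rather than gospel):

# ndd -set /dev/tcp tcp_max_buf 1073741824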

While that did seem to make a positive difference, the transfer rate for a sample large file pulled from across the Pacific still cycled up and down. The cycling was really annoying, so we kept digging.
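
One way to see whether loss-triggered congestion responses were behind the cycling is to watch the TCP retransmission counters while a transfer runs; a rough sketch (the counter names may differ slightly between releases):

# netstat -s -P tcp | grep -i retrans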

The next thing we investigated was the congestion window, which is where the aforementioned LFN comes into play: the congestion window caps how much unacknowledged data a sender will keep in flight, so on an LFN it has to grow to at least the bandwidth-delay product before the pipe is full. The property that limits it is cwnd-max:

 # for pp in tcp sctp; do ipadm show-prop -p cwnd-max $pp; done

PROTO PROPERTY              PERM CURRENT      PERSISTENT   DEFAULT      POSSIBLE
tcp   cwnd-max              rw   1048576      --           1048576      128-1073741824

PROTO PROPERTY              PERM CURRENT      PERSISTENT   DEFAULT      POSSIBLE
sctp  cwnd-max              rw   1048576      --           1048576      128-1073741824
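
That default cap matters: a single connection can't sustain more than roughly cwnd-max / RTT. With the default 1MB cap and the same guessed ~175ms RTT:

$ # throughput ceiling ~= cwnd-max / RTT, in bytes/sec (175ms RTT assumed)
$ echo '1048576 * 1000 / 175' | bc
5991862

Call it 6MB/s as a theoretical ceiling for one connection, and rather less in practice once the congestion algorithm backs off after a loss.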

Figuring that if it was worth doing, it was worth overdoing, we bumped that parameter up too:

# for pp in tcp sctp; do ipadm set-prop -p cwnd-max=1073741824 $pp; done
$ curl -o moz.bz2 http://ftp.mozilla.org/pub/mozilla/VMs/CentOS5-ReferencePlatform.tar.bz2
  % Total    % Received % Xferd  Average Speed   Time    Time     Time Current
                                 Dload  Upload   Total   Spent    Left  Speed
  3 3091M    3  102M    0     0  4542k      0  0:11:36  0:00:23  0:11:13 5747k^C

While that speed cycled around a lot, it mostly remained above 5MB/s.

Another large improvement. Yay!

However... we still saw the cycling. Intriguingly, the period was about 20 seconds, so there was still something else to twiddle.

In the meantime, however, I decided to update our media server.

I was blown away.

23 minutes 1 second

Not bad at all, even considering that when pkg(1) is transferring lots of small files it's difficult to keep the pipes filled.

Now that both Greg and I had several interesting data points to consider, I asked some of our network gurus for advice on what else we could look at. N suggested looking at the actual congestion algorithm in use, and pointed me to this article on High speed TCP.

High-speed TCP (HS-TCP). HS-TCP is an update of TCP that reacts better when using large congestion windows on high-bandwidth, high-latency networks.

The Solaris default is the newreno algorithm:

 # ipadm show-prop -p cong-default,cong-enabled tcp
PROTO PROPERTY              PERM CURRENT      PERSISTENT   DEFAULT      POSSIBLE
tcp   cong-default          rw   newreno      --           newreno      newreno,cubic,
                                                                        dctcp,
                                                                        highspeed,
                                                                        vegas
tcp   cong-enabled          rw   newreno,     newreno,     newreno      newreno,cubic,
                                 cubic,dctcp, cubic,dctcp,              dctcp,
                                 highspeed,   highspeed,                highspeed,
                                 vegas        vegas                     vegas

Changing that was easy:

# for pp in tcp sctp ; do ipadm set-prop -p cong-default=highspeed $pp; done
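
As far as we could tell, the algorithm is picked when a connection is established, so the new default only affects connections made after the change; re-checking the property (and re-running the transfer) is cheap:

# ipadm show-prop -p cong-default tcp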

Off to pull down that bz2 from mozilla.org again:

 $ curl -o blah.tar.bz2 http://ftp.mozilla.org/pub/mozilla/VMs/CentOS5-ReferencePlatform.tar.bz2
  % Total    % Received % Xferd  Average Speed   Time    Time     Time Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 3091M  100 3091M    0     0  5866k      0  0:08:59  0:08:59 --:--:-- 8684k

For a more local test (within Australia) I made use of Internode's test-file facility:

$ curl -o t.test http://mirror.internode.on.net/pub/test/1000meg.test
  % Total    % Received % Xferd  Average Speed   Time    Time     Time Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  953M  100  953M    0     0  10.0M      0  0:01:35  0:01:35 --:--:-- 11.0M

And finally, updating my global zone.

 # time pkg update --be-name $NEWBE core-os@$version *incorporation@$version
            Packages to update: 291
       Create boot environment: Yes
Create backup boot environment:  No

DOWNLOAD                                PKGS         FILES    XFER (MB)   SPEED
Completed                            291/291     2025/2025  116.9/116.9  317k/s

PHASE                                          ITEMS
Removing old actions                       1544/1544
Installing new actions                     1552/1552
Updating modified actions                  2358/2358
Updating package state database                 Done
Updating package cache                       291/291
Updating image state                            Done
Creating fast lookup database                   Done
Reading search index                            Done
Building new search index                  1932/1932

A clone of $oldbe exists and has been updated and activated.
On the next boot the Boot Environment be://rpool/$newbe will be
mounted on '/'.  Reboot when ready to switch to this updated BE.


real    12m30.391s
user    4m4.173s
sys     0m21.496s

I think that's sufficient.