Wednesday, April 8, 2015

Fixing Docker VPN Incompatibilities on OSX

I currently use a MacBook Pro as my dev box, and I have Cisco's AnyConnect for VPN access. From time to time I get into a state where I can't access my docker container running in boot2docker. For example, I'll get the following behavior:

$ docker images
FATA[0032] An error occurred trying to connect: Get https://192.168.59.103:2376/v1.17/images/json: dial tcp 192.168.59.103:2376: i/o timeout


In this state boot2docker itself works fine. I can start it, stop it, and check its status:

$ boot2docker status
running

But docker cannot be reached, and neither can the IP address of the VirtualBox VM, which gives the biggest clue as to what is causing this problem:

$ boot2docker ip
192.168.59.103

$ ping 192.168.59.103
PING 192.168.59.103 (192.168.59.103): 56 data bytes
Request timeout for icmp_seq 0
Request timeout for icmp_seq 1
Request timeout for icmp_seq 2

It turns out that one VPN profile I use significantly alters the routing table entries. To see this, I captured my routing table contents after a fresh hard restart of the Mac. Here is the output, showing only the entries that might be related:

$ netstat -rn
Routing tables

Internet:
Destination  Gateway       Flags   Refs    Use     Netif Expire
default      10.109.16.1   UGSc      90      0     en8
default      10.109.128.1  UGScI      2      0     en0
...
127          127.0.0.1     UCS        0      0     lo0
127.0.0.1    127.0.0.1     UH       135   7682     lo0
...

I captured this output to a text file, started boot2docker, captured the output again in a separate text file, and diffed the two. After factoring out changes in Refs, Use, and Expire, the only change is this line, which has been added to the bottom of the table:

Destination  Gateway       Flags   Refs    Use   Netif Expire
...
192.168.59   link#14       UC         1      0 vboxnet
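
The before/after comparison described above could be scripted roughly as follows. This is only a sketch: normalize_routes is a made-up helper name, and it assumes the seven-column layout shown in this post (Destination, Gateway, Flags, Refs, Use, Netif, Expire), which may differ on other OS X versions.

```shell
# Reduce `netstat -rn` output (read from stdin) to the columns worth diffing:
# Destination, Gateway, Flags and Netif. Refs, Use and Expire are dropped
# because they change constantly.
# Assumption: the column order is the one shown in this post.
normalize_routes() {
    awk 'NF >= 6 && $1 != "Destination" { print $1, $2, $3, $6 }'
}

# Usage on the Mac (not run here):
#   netstat -rn -f inet | normalize_routes > before.txt
#   boot2docker start
#   netstat -rn -f inet | normalize_routes > after.txt
#   diff before.txt after.txt
```

Header lines such as "Routing tables" and "Internet:" have fewer than six fields, so the filter skips them without special handling.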

Depending on whether you have run any docker commands after running boot2docker up or boot2docker start, you may have a "host route" in there as well, looking something like this:

Destination    Gateway         Flags  Refs   Use   Netif Expire
...
192.168.59.103 8:0:27:ac:51:8b UHLWIi    1    40 vboxnet   1197

If, like me, you haven't done much with networking issues, pull up the netstat man page and read what it says about the -r output to grasp what is going on here. It says:

The routing table display indicates the available routes and their status.  Each route consists of a destination host or network and a gateway to use in forwarding packets.  The flags field shows a collection of information about the route stored as binary choices.  The individual flags are discussed in more detail in the route(8) and route(4) manual pages.  The mapping between letters and flags is:

1  RTF_PROTO1     Protocol specific routing flag #1
2  RTF_PROTO2     Protocol specific routing flag #2
3  RTF_PROTO3     Protocol specific routing flag #3
B  RTF_BLACKHOLE  Just discard packets (during updates)
b  RTF_BROADCAST  The route represents a broadcast address
C  RTF_CLONING    Generate new routes on use
c  RTF_PRCLONING  Protocol-specified generate new routes on use
D  RTF_DYNAMIC    Created dynamically (by redirect)
G  RTF_GATEWAY    Destination requires forwarding by intermediary
H  RTF_HOST       Host entry (net otherwise)
I  RTF_IFSCOPE    Route is associated with an interface scope
i  RTF_IFREF      Route is holding a reference to the interface
L  RTF_LLINFO     Valid protocol to link address translation
M  RTF_MODIFIED   Modified dynamically (by redirect)
m  RTF_MULTICAST  The route represents a multicast address
R  RTF_REJECT     Host or net unreachable
r  RTF_ROUTER     Host is a default router
S  RTF_STATIC     Manually added
U  RTF_UP         Route usable
W  RTF_WASCLONED  Route was generated as a result of cloning
X  RTF_XRESOLVE   External daemon translates proto to link address
Y  RTF_PROXY      Proxying; cloned routes will not be scoped

     Direct routes are created for each interface attached to the local host; the gateway field for such entries shows the address of the outgoing interface.  The refcnt field gives the current number of active uses of the route.  Connection oriented protocols normally hold on to a single route for the duration of a connection while connectionless protocols obtain a route while sending to the same destination.  The use field provides a count of the number of packets sent using that route.  The interface entry indicates the network interface utilized for the route.  A route which is marked with the RTF_IFSCOPE flag is instantiated for the corresponding interface.  A cloning route which is marked with the RTF_PROXY flag will not generate new routes that are associated with its interface scope.
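
To avoid looking up each letter by hand, the table above can be turned into a small lookup. The sketch below covers only the flags that show up in this post; decode_route_flags is a hypothetical helper name, not part of netstat or route.

```shell
# Print one line per flag letter in a netstat Flags string, e.g. "UHLWIi".
# Only a subset of the man page table is covered (an assumption made to keep
# the sketch short); anything else is reported as not covered.
decode_route_flags() {
    flags=$1
    while [ -n "$flags" ]; do
        rest=${flags#?}          # everything after the first character
        flag=${flags%"$rest"}    # the first character itself
        case $flag in
            U) echo "U RTF_UP" ;;
            G) echo "G RTF_GATEWAY" ;;
            S) echo "S RTF_STATIC" ;;
            C) echo "C RTF_CLONING" ;;
            c) echo "c RTF_PRCLONING" ;;
            H) echo "H RTF_HOST" ;;
            L) echo "L RTF_LLINFO" ;;
            W) echo "W RTF_WASCLONED" ;;
            I) echo "I RTF_IFSCOPE" ;;
            i) echo "i RTF_IFREF" ;;
            b) echo "b RTF_BROADCAST" ;;
            *) echo "$flag (not covered here, see the netstat man page)" ;;
        esac
        flags=$rest
    done
}

# Usage:
#   decode_route_flags UHLWIi
```

Matching in case is case-sensitive, which matters here because C and c, or I and i, mean different things.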


So the first new line is a route for the network itself. Its UC flags indicate that the route is usable and will generate (clone) a new route when used: namely a host route, which is the second line that gets added when you first run a docker command and access the VM. That line's flags, UHLWIi, respectively indicate that it is usable, is a host route, has a valid protocol-to-link address translation, was generated as a result of cloning, is associated with an interface scope, and is holding a reference to that interface, which is shown in the Netif column as vboxnet. And docker is happy, as can be seen with any of its commands, such as:

$ docker version
Client version: 1.5.0
Client API version: 1.17
Go version (client): go1.4.1
Git commit (client): a8a31ef
OS/Arch (client): darwin/amd64
Server version: 1.5.0
Server API version: 1.17
Go version (server): go1.4.1
Git commit (server): a8a31ef

Interestingly, if I put the MacBook to sleep and come back at this point, the host route in the route table has changed, notably its destination, gateway, and flags. The new flag, b, indicates that it is a broadcast address. Once exercised, the route reverts to the same destination, gateway, and flags that we had before.

Destination    Gateway           Flags  Refs  Use    Netif Expire
...
192.168.59     link#14           UC        2    0  vboxnet
192.168.59.255 ff:ff:ff:ff:ff:ff UHLWbI    0   15  vboxnet

So far so good.

VPN to the Rescue (NOT)

Now I start up that VPN profile and try using docker, which is no longer happy and fails after a long TCP timeout:

$ docker version
Client version: 1.5.0
Client API version: 1.17
Go version (client): go1.4.1
Git commit (client): a8a31ef
OS/Arch (client): darwin/amd64
FATA[0032] An error occurred trying to connect: Get https://192.168.59.103:2376/v1.17/version: dial tcp 192.168.59.103:2376: i/o timeout

Looking at the routing table, the host route has been removed completely and the 192.168.59 network routing has now been changed to:

Destination    Gateway           Flags  Refs  Use  Netif Expire
...
192.168.59     link#13           UCS       0    0  utun0

That changed entry is the problem. With this entry in the routing table, 192.168.59.X traffic gets routed to the VPN's network adapter, which knows nothing about any VM running locally.
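
A quick way to check for this state might be to ask the system which interface it would use for the VM's address. The sketch below assumes that route -n get prints an "interface:" line, as it does on the OS X versions I am aware of; route_interface and is_vpn_interface are hypothetical helper names.

```shell
# Extract the interface name from `route -n get <ip>` output read on stdin.
# Assumption: the output contains a line of the form "interface: <name>".
route_interface() {
    awk '$1 == "interface:" { print $2 }'
}

# Succeed (exit status 0) when the given interface name looks like a
# VPN tunnel device (utun*), fail otherwise.
is_vpn_interface() {
    case $1 in
        utun*) return 0 ;;
        *)     return 1 ;;
    esac
}

# Usage on the Mac (not run here):
#   iface=$(route -n get 192.168.59.103 | route_interface)
#   if is_vpn_interface "$iface"; then
#       echo "docker traffic is being sent to the VPN ($iface)"
#   fi
```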

In this state, with the VPN running, I've been unable to remove that route with either of the following:

sudo route -n delete -net 192.168.59.0/24 -interface utun0
sudo route -n delete -net 192.168.59.0 -interface utun0 

Even though they appear to work, looking at the route table afterward shows the same route still in there. However, once the VPN is turned off there is hope.


Fixing Once VPN is Off

Once I shut down my VPN and look at the route table, that entry has disappeared. Neither boot2docker start nor sudo route -n flush followed by boot2docker start will add the vboxnet route back in, although restarting the Mac will. But it can be added back manually with the following command:

sudo route -n add -net 192.168.59.0/24 -interface vboxnet0

If you are uncertain whether your VM is using the vboxnet0 interface, you can find out with this command, which returns vboxnet0 on my machine as shown:

$ VBoxManage showvminfo boot2docker-vm --machinereadable | grep hostonlyadapter | cut -d '"' -f 2
vboxnet0

Once re-added, take a look at the table and you'll see:

Destination    Gateway           Flags  Refs  Use    Netif Expire
...
192.168.59     link#14           UCSc      0    0  vboxnet

And docker is happy again. 
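
The manual steps above could be bundled into a couple of helpers along these lines. This is a sketch, not a tested fix script: hostonly_adapter and route_fix_command are made-up names, and the second one only prints the route command so it can be reviewed before being run with the VPN disconnected.

```shell
# Pick the first host-only adapter name out of
# `VBoxManage showvminfo <vm> --machinereadable` output read on stdin.
hostonly_adapter() {
    grep hostonlyadapter | cut -d '"' -f 2 | head -n 1
}

# Print (do not run) the command that re-adds the route.
#   $1 = network, e.g. 192.168.59.0/24
#   $2 = interface, e.g. vboxnet0
route_fix_command() {
    printf 'sudo route -n add -net %s -interface %s\n' "$1" "$2"
}

# Usage on the Mac (not run here):
#   iface=$(VBoxManage showvminfo boot2docker-vm --machinereadable | hostonly_adapter)
#   route_fix_command 192.168.59.0/24 "$iface"
```

Printing the command instead of executing it keeps sudo out of the script and makes it easy to check the interface name first.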

If anyone figures out how to fix that route in the route table while the VPN is connected, please leave a comment so the rest of us can docker while we work.

Enjoy.

12 comments:

  1. > And I have Cisco's AnyConnect for VPN access

    I avoid AnyConnect like the plague that it is. That's only because I know the "shared secrets" that the org prefers not to share with us, but it enables me to setup the Cisco IPSec VPN that comes built-in to Mac OS X.

  2. Are you able to run docker commands while connecting with Cisco IPSec VPN? If so that would be a great way to get around this issue.

  3. I think you can remove the unwanted route by removing the extra data and then adding the right one... If I were you I'd try this (as root), with the VPN connected:
    route -n delete -net 192.168.59.0/24
    route -n add -net 192.168.59.0/24 -interface vboxnet0

    I think the error here is trying to define a route two times. Of course I can be truly wrong, but I hope this helps.

    (I've posted two times in order to provide contact info)

  4. Cool solution for OS X.

  5. I haven't revisited the docker issue since I posted. When I take another look I'll post what I find.

  6. This comment has been removed by a blog administrator.

  7. Whenever I try to change/delete/add a new route, AnyConnect quickly reverts/adds/deletes what I've changed. My VM's ip was 192.168.99 with the vboxnet0 interface. When I added the route for 192.168.99.100, it actually added, but then was immediately commandeered to be utun0.

  8. @Kelly Nicholdes - exactly. Your VPN must still be connected as I noted in this post. The fix only works once VPN is disconnected which sucks but at least gets you working again.

  9. This seems to work after disconnect:

    docker-machine stop default && sudo /Library/Application\ Support/VirtualBox/LaunchDaemons/VirtualBoxStartup.sh restart && docker-machine restart default

    Or, using openconnect allows me to not worry about it ever. :)
