Trying to solve problems, only to find other problems

Docker iptables firewalling

Some time ago I was working on a web site for collecting data on the Croatian 2015 parliamentary elections. It was not rocket science, just a Python web application behind nginx collecting user responses to a couple of questions. We designed the architecture with Docker in mind, so every part of the system was a separate container.

Docker is a really nice piece of technology: it lets you define images in small text files (Dockerfiles) and then create running containers from those images. The cool part is that the images can be moved to other machines, so you always have the same template from which you create a container. No more dev/test/production mismatch.

Once you have set up your base images and run your entire system on a server, you remember that you have to write some firewall rules so that only the ports you want are exposed to the public. The basic approach is to close all ports and open only the one used to connect to the server over SSH. This basic iptables setup (in iptables-save format) is shown in the following listing:

*filter
:INPUT DROP [0:0]
:FORWARD DROP [0:0]
:OUTPUT ACCEPT [0:0]
-A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
-A INPUT -s 127.0.0.0/8 -d 127.0.0.0/8 -i lo -j ACCEPT
-A INPUT -p tcp -m tcp --dport 22 -m state --state NEW -j ACCEPT
COMMIT

Note

This is an example from a machine without Docker installed.

As you can see, the default policy of the INPUT and FORWARD chains is DROP, so packets that match no rule in those chains are dropped. The RELATED,ESTABLISHED rule lets already established connections keep working; never leave this one out. The --dport 22 rule allows new connections to port 22 to be established. As this is the port that we use for SSH, we are done.

Now, let's install Docker on this machine. After a default install, the iptables rules look like this:

*nat
:PREROUTING ACCEPT [0:0]
:INPUT ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
:POSTROUTING ACCEPT [0:0]
:DOCKER - [0:0]
-A PREROUTING -m addrtype --dst-type LOCAL -j DOCKER
-A OUTPUT ! -d 127.0.0.0/8 -m addrtype --dst-type LOCAL -j DOCKER
-A POSTROUTING -s 172.17.0.0/16 ! -o docker0 -j MASQUERADE
COMMIT
*filter
:INPUT DROP [0:0]
:FORWARD DROP [0:0]
:OUTPUT ACCEPT [0:0]
:DOCKER - [0:0]
-A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
-A INPUT -s 127.0.0.0/8 -d 127.0.0.0/8 -i lo -j ACCEPT
-A INPUT -p tcp -m tcp --dport 22 -m state --state NEW -j ACCEPT
-A FORWARD -o docker0 -j DOCKER
-A FORWARD -o docker0 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -i docker0 ! -o docker0 -j ACCEPT
-A FORWARD -i docker0 -o docker0 -j ACCEPT
COMMIT

Docker added quite a few rules here, both in the nat and in the filter table. To understand them, you should know that Docker creates a bridge named docker0 on the host and connects the eth0 interface of every container to this bridge. The default IP address of the docker0 bridge is 172.17.42.1 (until Docker 1.9, which changed it to 172.17.0.1). Interfaces in containers get free addresses from the 172.17.0.0/16 subnet. Docker uses iptables extensively to route packets to and from containers, and it uses NAT (the MASQUERADE rule in POSTROUTING) when containers need to communicate with the outside world. Explaining every single line here would be tedious and not very useful without context. So let's first add an nginx container and map its port 80 to the host's port 80:

docker run --name nginx_server -d -p 80:80 nginx

Docker added a few rules here to make this mapping work:

*nat
:PREROUTING ACCEPT [3:234]
:INPUT ACCEPT [0:0]
:OUTPUT ACCEPT [3:180]
:POSTROUTING ACCEPT [4:232]
:DOCKER - [0:0]
-A PREROUTING -m addrtype --dst-type LOCAL -j DOCKER
-A OUTPUT ! -d 127.0.0.0/8 -m addrtype --dst-type LOCAL -j DOCKER
-A POSTROUTING -s 172.17.0.0/16 ! -o docker0 -j MASQUERADE
-A POSTROUTING -s 172.17.0.2/32 -d 172.17.0.2/32 -p tcp -m tcp --dport 80 -j MASQUERADE
-A DOCKER ! -i docker0 -p tcp -m tcp --dport 80 -j DNAT --to-destination 172.17.0.2:80
COMMIT
*filter
:INPUT DROP [3:234]
:FORWARD DROP [0:0]
:OUTPUT ACCEPT [230:27481]
:DOCKER - [0:0]
-A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
-A INPUT -s 127.0.0.0/8 -d 127.0.0.0/8 -i lo -j ACCEPT
-A INPUT -p tcp -m tcp --dport 22 -m state --state NEW -j ACCEPT
-A FORWARD -o docker0 -j DOCKER
-A FORWARD -o docker0 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -i docker0 ! -o docker0 -j ACCEPT
-A FORWARD -i docker0 -o docker0 -j ACCEPT
-A DOCKER -d 172.17.0.2/32 ! -i docker0 -o docker0 -p tcp -m tcp --dport 80 -j ACCEPT
COMMIT

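There are three new rules. In the nat table, the MASQUERADE rule for 172.17.0.2 in POSTROUTING handles traffic from the nginx container to its own published port, and the DNAT rule in the DOCKER chain rewrites packets for port 80 that did not arrive through docker0 so that they go to 172.17.0.2:80. In the filter table, the ACCEPT rule in the DOCKER chain lets those rewritten packets from any interface other than docker0 (for example eth0) pass through the FORWARD chain to the container. We will dig into them later.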
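To see how these three rules generalize to a mapping where the host and container ports differ, here is a minimal shell sketch that prints them as iptables arguments. It is reconstructed from the listing above, not taken from Docker's source, so treat it as an assumption that may not match every Docker version; the function name docker_publish_rules is hypothetical. What it illustrates is that only the DNAT rule matches the host port: the MASQUERADE and filter rules see the packet after its destination has been rewritten, so they match the container port.

```shell
# Sketch: print, as iptables arguments, the three rules the listing above
# shows Docker adding for "-p HOST_PORT:CONTAINER_PORT" (TCP).
# Reconstructed from that listing (assumption), not from Docker's source.
# Usage: docker_publish_rules CONTAINER_IP HOST_PORT CONTAINER_PORT
docker_publish_rules() {
  ip=$1 host_port=$2 ctr_port=$3
  # nat POSTROUTING: masquerade the container reaching its own published
  # port. Evaluated after DNAT, so it matches the container port.
  printf '%s\n' "-t nat -A POSTROUTING -s $ip/32 -d $ip/32 -p tcp -m tcp --dport $ctr_port -j MASQUERADE"
  # nat DOCKER: rewrite the destination of packets for the host port that
  # did not arrive on docker0. This is the only rule using the host port.
  printf '%s\n' "-t nat -A DOCKER ! -i docker0 -p tcp -m tcp --dport $host_port -j DNAT --to-destination $ip:$ctr_port"
  # filter DOCKER (reached from FORWARD): accept the rewritten packets.
  printf '%s\n' "-t filter -A DOCKER -d $ip/32 ! -i docker0 -o docker0 -p tcp -m tcp --dport $ctr_port -j ACCEPT"
}
```

For example, docker_publish_rules 172.17.0.2 80 80 prints the three rules from the listing, while docker_publish_rules 172.17.0.2 8080 80 shows --dport 8080 only in the DNAT rule.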

To check that the web server is working, you can use a simple curl:

curl 0.0.0.0:80

You will get the default nginx HTML page. So far so good. Now you want to check that it is not accessible from the outside. So you use another machine and point your browser to the external IP address of this machine (e.g. 192.168.1.100). And it still works. You weren't expecting that, were you? Neither was I :)

So we have a problem: our usual M.O. for filtering ports doesn't work. Why doesn't it work, and how do we fix it? As I hadn't used iptables for a long time, I tried to refresh my memory by reading a few how-tos ([1], [2], [3], and [4]). After reading all that, I was still not sure why it behaved like that. I knew that the chains are traversed in a specific order ([5]), but it still wasn't clear. So I used a trick with the iptables logging facility: at the beginning of each chain we insert a rule that jumps to a copy of that chain with logging turned on. For our example, the following commands should be executed (as root):

iptables -t nat -N LOGGING_PREROUTING
iptables -t nat -I PREROUTING 1 -j LOGGING_PREROUTING
iptables -t nat -A LOGGING_PREROUTING -m limit --limit 10/sec -j LOG --log-prefix "PREROUTING-chain: " --log-level 7
iptables -t nat -A LOGGING_PREROUTING -m addrtype --dst-type LOCAL -j DOCKER

iptables -t nat -N LOGGING_POSTROUTING
iptables -t nat -I POSTROUTING 1 -j LOGGING_POSTROUTING
iptables -t nat -A LOGGING_POSTROUTING -m limit --limit 10/sec -j LOG --log-prefix "POSTROUTING-1-chain: " --log-level 7
iptables -t nat -A LOGGING_POSTROUTING -s 172.17.0.0/16 ! -o docker0 -j MASQUERADE
iptables -t nat -A LOGGING_POSTROUTING -m limit --limit 10/sec -j LOG --log-prefix "POSTROUTING-2-chain: " --log-level 7
iptables -t nat -A LOGGING_POSTROUTING -s 172.17.0.2/32 -d 172.17.0.2/32 -p tcp -m tcp --dport 80 -j MASQUERADE
iptables -t nat -A LOGGING_POSTROUTING -m limit --limit 10/sec -j LOG --log-prefix "POSTROUTING-3-chain: " --log-level 7

iptables -t nat -N LOGGING_DOCKER_NAT
iptables -t nat -I DOCKER 1 -j LOGGING_DOCKER_NAT
iptables -t nat -A LOGGING_DOCKER_NAT -m limit --limit 10/sec -j LOG --log-prefix "DOCKER_NAT-1-chain: " --log-level 7
iptables -t nat -A LOGGING_DOCKER_NAT ! -i docker0 -p tcp -m tcp --dport 80 -j DNAT --to-destination 172.17.0.2:80
iptables -t nat -A LOGGING_DOCKER_NAT -m limit --limit 10/sec -j LOG --log-prefix "DOCKER_NAT-2-chain: " --log-level 7

iptables -t nat -N LOGGING_OUTPUT_NAT
iptables -t nat -I OUTPUT 1 -j LOGGING_OUTPUT_NAT
iptables -t nat -A LOGGING_OUTPUT_NAT -m limit --limit 10/sec -j LOG --log-prefix "OUTPUT_NAT-chain: " --log-level 7
iptables -t nat -A LOGGING_OUTPUT_NAT ! -d 127.0.0.0/8 -m addrtype --dst-type LOCAL -j DOCKER

iptables -N LOGGING_FORWARD
iptables -I FORWARD 1 -j LOGGING_FORWARD
iptables -A LOGGING_FORWARD -m limit --limit 10/sec -j LOG --log-prefix "FOWARD-chain: " --log-level 7
iptables -A LOGGING_FORWARD -o docker0 -j DOCKER
iptables -A LOGGING_FORWARD -o docker0 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
iptables -A LOGGING_FORWARD -i docker0 ! -o docker0 -j ACCEPT
iptables -A LOGGING_FORWARD -i docker0 -o docker0 -j ACCEPT

iptables -N LOGGING_DOCKER_FILTER
iptables -I DOCKER 1 -j LOGGING_DOCKER_FILTER
iptables -A LOGGING_DOCKER_FILTER -m limit --limit 10/sec -j LOG --log-prefix "DOCKER_FILTER-chain: " --log-level 7
iptables -A LOGGING_DOCKER_FILTER -d 172.17.0.2/32 ! -i docker0 -o docker0 -p tcp -m tcp --dport 80 -j ACCEPT

iptables -N LOGGING_INPUT
iptables -I INPUT 1 -j LOGGING_INPUT
iptables -A LOGGING_INPUT -m limit --limit 10/sec -j LOG --log-prefix "INPUT_FILTER-chain: " --log-level 7
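Most of these commands follow the same pattern: create a logging chain, jump to it from the top of the original chain, and make its first rule a rate-limited LOG. Below is a minimal sketch of a shell helper for that pattern. The name add_log_chain and the IPT override are hypothetical; setting IPT=echo prints the commands instead of running them, which is handy for reviewing them first. The copies of the original rules still have to be appended by hand.

```shell
# Sketch (hypothetical helper): create LOG_CHAIN in TABLE, jump to it
# from position 1 of CHAIN, and add a rate-limited LOG rule as its first rule.
# The copies of CHAIN's original rules still have to be appended by hand.
# Set IPT=echo to print the commands instead of running them.
# Usage: add_log_chain TABLE CHAIN LOG_CHAIN PREFIX
add_log_chain() {
  table=$1 chain=$2 log_chain=$3 prefix=$4
  ipt=${IPT:-iptables}
  "$ipt" -t "$table" -N "$log_chain"
  "$ipt" -t "$table" -I "$chain" 1 -j "$log_chain"
  "$ipt" -t "$table" -A "$log_chain" -m limit --limit 10/sec \
    -j LOG --log-prefix "$prefix" --log-level 7
}
```

For example, add_log_chain nat PREROUTING LOGGING_PREROUTING "PREROUTING-chain: " followed by iptables -t nat -A LOGGING_PREROUTING -m addrtype --dst-type LOCAL -j DOCKER reproduces the first group of commands above.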

The logging setup adds quite a few rules, but if you look carefully you will see that each LOGGING_* chain is a 1-to-1 copy of the original chain with LOG rules added. The only exception is LOGGING_INPUT, which only logs, because the INPUT chain is never hit by these packets (which is exactly what causes our filtering problem). With this in place, it is time to look at the logs. If you access the nginx default page from the outside like before, the log will fill with data. On Ubuntu the log file is /var/log/kern.log. Listing only the log prefixes with the IN and OUT interfaces is enough to see the route of the packets:

PREROUTING-chain: IN=eth0 OUT=
DOCKER_NAT-1-chain: IN=eth0 OUT=
FOWARD-chain: IN=eth0 OUT=docker0
DOCKER_FILTER-chain: IN=eth0 OUT=docker0
POSTROUTING-1-chain: IN= OUT=docker0
POSTROUTING-2-chain: IN= OUT=docker0
POSTROUTING-3-chain: IN= OUT=docker0
FOWARD-chain: IN=docker0 OUT=eth0
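A condensed trace like the one above can be extracted from the raw log with a grep along these lines. This is a sketch: packet_route is a hypothetical name, and it assumes every LOG prefix ends in "-chain: " as in the logging commands and that kernel messages go to Ubuntu's default /var/log/kern.log.

```shell
# Sketch (hypothetical helper): reduce kernel LOG lines to
# "<prefix> IN=<iface> OUT=<iface>". Assumes prefixes end in "-chain: ",
# which the kernel follows directly with "IN=... OUT=...".
# Usage: packet_route [LOGFILE]   (defaults to /var/log/kern.log)
packet_route() {
  grep -oE '[A-Z0-9_-]+-chain: IN=[^ ]* OUT=[^ ]*' "${1:-/var/log/kern.log}"
}
```

Run it as packet_route /var/log/kern.log | tail -n 20 right after loading the page from the other machine.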

The last line (FOWARD-chain with IN=docker0 OUT=eth0) is the reply packet from the container, so that is where the trace ends. The host is actually acting as a router: after the DNAT in PREROUTING the packet is addressed to 172.17.0.2:80, which is not a local address, so it goes through the FORWARD chain and never enters the INPUT chain. That is the reason the INPUT chain rule that we used for SSH didn't work in this case. Now that we know the problem, the solution should be easy. Instead of putting these rules in the INPUT chain, put them into the DOCKER chain of the filter table. In the filter table, add these two lines before the COMMIT command:

-A DOCKER -i eth0 -m state --state RELATED,ESTABLISHED -j ACCEPT
-A DOCKER -i eth0 -j DROP

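If you now restart the Docker service, access from the outside doesn't work any more. To enable a port when you create a container, add another rule per port at the top of the DOCKER chain. Because the filter table sees the packets after the DNAT, --dport here is the container port, which in our 80:80 example is the same as the host port: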

iptables -I DOCKER 1 -i eth0 -p tcp -m tcp --dport 80 -j ACCEPT
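If you open and close ports regularly, this rule can be wrapped in a pair of small shell functions. A minimal sketch, assuming the external interface is eth0 as in this post; expose_docker_port, close_docker_port, the optional interface argument and the IPT override are all hypothetical. Inserting at position 1 keeps the ACCEPT ahead of the -i eth0 -j DROP rule, and deleting by rule specification removes the matching ACCEPT when a container goes away.

```shell
# Sketch (hypothetical helpers): open or close one published TCP port in
# the filter DOCKER chain. PORT is the container port: the filter table
# sees packets after DNAT. IFACE defaults to eth0 (assumption).
# Set IPT=echo to print the commands instead of running them.
# Usage: expose_docker_port PORT [IFACE]; close_docker_port PORT [IFACE]
expose_docker_port() {
  port=$1 ext_if=${2:-eth0}
  # Position 1 puts the ACCEPT before the "-i eth0 -j DROP" rule.
  "${IPT:-iptables}" -I DOCKER 1 -i "$ext_if" -p tcp -m tcp --dport "$port" -j ACCEPT
}

close_docker_port() {
  port=$1 ext_if=${2:-eth0}
  # Delete the rule by its specification, wherever it is in the chain.
  "${IPT:-iptables}" -D DOCKER -i "$ext_if" -p tcp -m tcp --dport "$port" -j ACCEPT
}
```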

To remove the logging chains afterwards, either reboot or run something like this:

iptables -t nat -F LOGGING_PREROUTING
iptables -t nat -D PREROUTING -j LOGGING_PREROUTING
iptables -t nat -F LOGGING_POSTROUTING
iptables -t nat -D POSTROUTING -j LOGGING_POSTROUTING
iptables -t nat -F LOGGING_DOCKER_NAT
iptables -t nat -D DOCKER -j LOGGING_DOCKER_NAT
iptables -t nat -F LOGGING_OUTPUT_NAT
iptables -t nat -D OUTPUT -j LOGGING_OUTPUT_NAT
iptables -F LOGGING_FORWARD
iptables -D FORWARD -j LOGGING_FORWARD
iptables -F LOGGING_DOCKER_FILTER
iptables -D DOCKER -j LOGGING_DOCKER_FILTER
iptables -F LOGGING_INPUT
iptables -D INPUT -j LOGGING_INPUT

iptables -t nat -X LOGGING_PREROUTING
iptables -t nat -X LOGGING_POSTROUTING
iptables -t nat -X LOGGING_DOCKER_NAT
iptables -t nat -X LOGGING_OUTPUT_NAT
iptables -X LOGGING_FORWARD
iptables -X LOGGING_DOCKER_FILTER
iptables -X LOGGING_INPUT
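The same cleanup can be driven from a list of chains instead of being written out by hand. A minimal sketch (log_chains and remove_log_chains are hypothetical names): for each table, parent chain and logging chain it flushes the logging chain, deletes the jump to it and then deletes the now empty chain. Unlike the commands above, it finishes each chain before moving on to the next, which works here because no logging chain jumps directly to another logging chain.

```shell
# Sketch (hypothetical helpers): "table parent_chain logging_chain" triples
# matching the logging setup above.
log_chains() {
  cat <<'EOF'
nat    PREROUTING   LOGGING_PREROUTING
nat    POSTROUTING  LOGGING_POSTROUTING
nat    DOCKER       LOGGING_DOCKER_NAT
nat    OUTPUT       LOGGING_OUTPUT_NAT
filter FORWARD      LOGGING_FORWARD
filter DOCKER       LOGGING_DOCKER_FILTER
filter INPUT        LOGGING_INPUT
EOF
}

# For each triple read from stdin: flush the logging chain, delete the
# jump to it from the parent chain, then delete the empty chain.
# Set IPT=echo to print the commands instead of running them.
remove_log_chains() {
  ipt=${IPT:-iptables}
  while read -r table parent log_chain; do
    [ -n "$log_chain" ] || continue   # skip blank lines
    "$ipt" -t "$table" -F "$log_chain"
    "$ipt" -t "$table" -D "$parent" -j "$log_chain"
    "$ipt" -t "$table" -X "$log_chain"
  done
}
```

Run it as root with log_chains | remove_log_chains, or with IPT=echo first to review the commands.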

TL;DR

To allow outside access only to selected ports published by Docker containers with the -p host_port:container_port (e.g. -p 80:80) syntax, add the rules to the DOCKER chain of the filter table, not to the INPUT chain (the :DOCKER chain declaration and the three -A DOCKER rules below).

*filter
:INPUT DROP [0:0]
:FORWARD DROP [0:0]
:OUTPUT ACCEPT [0:0]
:DOCKER - [0:0]
-A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
-A INPUT -s 127.0.0.0/8 -d 127.0.0.0/8 -i lo -j ACCEPT
-A INPUT -p tcp -m tcp --dport 22 -m state --state NEW -j ACCEPT
-A DOCKER -i eth0 -p tcp -m tcp --dport 80 -j ACCEPT
-A DOCKER -i eth0 -m state --state RELATED,ESTABLISHED -j ACCEPT
-A DOCKER -i eth0 -j DROP
COMMIT
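If more than one port is published, the listing can be generated instead of edited by hand. A minimal sketch: render_rules and the EXT_IF variable are hypothetical, the arguments are container ports (these rules are evaluated after DNAT), and the SSH port 22 and default interface eth0 are the values used in this post.

```shell
# Sketch (hypothetical helper): print the TL;DR ruleset in iptables-restore
# format with one ACCEPT rule in the DOCKER chain per argument.
# Arguments are container ports; EXT_IF (hypothetical) defaults to eth0.
render_rules() {
  ext_if=${EXT_IF:-eth0}
  # Fixed part: default policies, DOCKER chain declaration, SSH access.
  cat <<'EOF'
*filter
:INPUT DROP [0:0]
:FORWARD DROP [0:0]
:OUTPUT ACCEPT [0:0]
:DOCKER - [0:0]
-A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
-A INPUT -s 127.0.0.0/8 -d 127.0.0.0/8 -i lo -j ACCEPT
-A INPUT -p tcp -m tcp --dport 22 -m state --state NEW -j ACCEPT
EOF
  # One ACCEPT per published port, ahead of the DROP below.
  for port in "$@"; do
    printf '%s\n' "-A DOCKER -i $ext_if -p tcp -m tcp --dport $port -j ACCEPT"
  done
  # Packets of established connections (e.g. replies to connections the
  # containers opened), then drop everything else from the external interface.
  printf '%s\n' \
    "-A DOCKER -i $ext_if -m state --state RELATED,ESTABLISHED -j ACCEPT" \
    "-A DOCKER -i $ext_if -j DROP" \
    "COMMIT"
}
```

For example, render_rules 80 443 | sudo iptables-restore. Keep in mind that iptables-restore without --noflush replaces the whole filter table, including the rules Docker added, so load it before Docker starts or restart the Docker service afterwards so that it can add its own rules again.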
Mladen Marinović

CTO @ Smartivo

2016-01-30