Examining iptables rules for a Kubernetes service

When a Service is created, kube-proxy, which typically runs as a DaemonSet on every node, programs a set of iptables chains and rules on each node (in its iptables proxy mode).

Let’s look at the iptables rules for the sample nginx Service with 2 Pods:

# iptables -L -t nat|grep nginx
The output looks like this:
...
KUBE-EXT-OVTWZ4GROBJZO4C5  tcp  --  anywhere             anywhere             /* default/nginx:80-80 */ tcp dpt:32754
...
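The output shows the NodePort allocated to the Service: 32754 (tcp dpt:32754). The comment identifies the Service as namespace/service-name:port-name, here Service nginx in namespace default with port name 80-80.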
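To pull every NodePort out of a KUBE-NODEPORTS listing at once, the output can be post-processed with awk. This is a minimal sketch: list_nodeports is a hypothetical helper name, and it assumes the iptables -L layout shown above (kube-proxy comment between /* and */, port after dpt:). When feeding it live output, add -n (for example iptables -t nat -L KUBE-NODEPORTS -n | list_nodeports) so that well-known ports are printed as numbers rather than as service names.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read an `iptables -t nat -L` listing on stdin and print
# "<namespace/service:port-name> <port>" for every rule that carries a
# kube-proxy comment and a dpt: match.
list_nodeports() {
  awk '
    /\/\* .* \*\// && /dpt:[0-9]+/ {
      # The kube-proxy comment is the text between "/* " and " */".
      start = index($0, "/* ") + 3
      end = index($0, " */")
      comment = substr($0, start, end - start)
      # The destination port follows "dpt:".
      match($0, /dpt:[0-9]+/)
      port = substr($0, RSTART + 4, RLENGTH - 4)
      print comment, port
    }'
}

# Usage with the sample line shown above:
echo 'KUBE-EXT-OVTWZ4GROBJZO4C5  tcp  --  anywhere             anywhere             /* default/nginx:80-80 */ tcp dpt:32754' | list_nodeports
```

Chain headers and column titles carry no comment or dpt: match, so they are skipped.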

The KUBE-SERVICES chain contains an entry for every Service. Since the Service we created is of type NodePort, let's also list the rules in the KUBE-NODEPORTS chain:
# iptables -L KUBE-NODEPORTS -t nat
To get the full rule set in one file, dump it with iptables-save:
# iptables-save > rules.txt
In the saved file, KUBE-SERVICES contains this rule for our Service:
-A KUBE-SERVICES -d 10.110.199.45/32 -p tcp -m comment --comment "default/nginx:80-80 cluster IP" -m tcp --dport 80 -j KUBE-SVC-OVTWZ4GROBJZO4C5
This rule tells the kernel: any TCP packet whose destination address is 10.110.199.45 (the Service's ClusterIP) and whose destination port is 80 jumps to another chain, KUBE-SVC-OVTWZ4GROBJZO4C5, for further processing.
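Finding the KUBE-SVC chain for a given ClusterIP and port by hand gets tedious with many Services. A minimal sketch that does the lookup over the iptables-save dump; svc_chain_for is a hypothetical helper, and it assumes the one-rule-per-line iptables-save format shown above, with the ClusterIP after -d (as a /32), the port after --dport and the jump target after -j. On a node, run it as svc_chain_for 10.110.199.45 80 < rules.txt.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read iptables-save output on stdin and print the chain
# that KUBE-SERVICES jumps to for ClusterIP $1 and port $2.
svc_chain_for() {
  local ip="$1" port="$2"
  awk -v dst="$ip/32" -v dport="$port" '
    $1 == "-A" && $2 == "KUBE-SERVICES" {
      d = ""; p = ""; j = ""
      # Scan option/value pairs; quoted comment words never equal these flags.
      for (i = 3; i < NF; i++) {
        if ($i == "-d") d = $(i + 1)
        if ($i == "--dport") p = $(i + 1)
        if ($i == "-j") j = $(i + 1)
      }
      if (d == dst && p == dport) print j
    }'
}

# Usage with the rule shown above:
rule='-A KUBE-SERVICES -d 10.110.199.45/32 -p tcp -m comment --comment "default/nginx:80-80 cluster IP" -m tcp --dport 80 -j KUBE-SVC-OVTWZ4GROBJZO4C5'
printf '%s\n' "$rule" | svc_chain_for 10.110.199.45 80
```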

Let's see that chain by doing a grep for KUBE-SVC-OVTWZ4GROBJZO4C5:
# cat rules.txt|grep KUBE-SVC-OVTWZ4GROBJZO4C5
Output:
-A KUBE-SVC-OVTWZ4GROBJZO4C5 ! -s 10.42.0.0/24 -d 10.110.199.45/32 -p tcp -m comment --comment "default/nginx:80-80 cluster IP" -m tcp --dport 80 -j KUBE-MARK-MASQ
-A KUBE-SVC-OVTWZ4GROBJZO4C5 -m comment --comment "default/nginx:80-80 -> 10.42.0.199:80" -m statistic --mode random --probability 0.50000000000 -j KUBE-SEP-ZGUCVTBPJNCW75NN
-A KUBE-SVC-OVTWZ4GROBJZO4C5 -m comment --comment "default/nginx:80-80 -> 10.42.0.200:80" -j KUBE-SEP-UIOQG3KKGTPDCR56
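The first rule sends packets whose source is outside 10.42.0.0/24 (note the ! before -s) to KUBE-MARK-MASQ, which marks them for masquerading (source NAT). The other two rules select an endpoint: the first uses the statistic module in random mode (-m statistic --mode random --probability 0.50000000000), and the last has no statistic match at all.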
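The first rule above only matches sources outside 10.42.0.0/24, because the ! negates the -s test. A minimal sketch of that CIDR check in plain bash arithmetic, to make the negation concrete; ip_to_int and in_cidr are hypothetical helpers, the sample source addresses are made up, and only IPv4 is handled.

```bash
#!/usr/bin/env bash
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  local IFS=.
  local -a o
  read -r -a o <<< "$1"
  echo "$(( (o[0] << 24) | (o[1] << 16) | (o[2] << 8) | o[3] ))"
}

# Succeeds (exit 0) when IPv4 address $1 lies inside CIDR $2, e.g. 10.42.0.0/24.
in_cidr() {
  local ip net bits mask
  ip=$(ip_to_int "$1")
  net=$(ip_to_int "${2%/*}")
  bits=${2#*/}
  mask=$(( bits == 0 ? 0 : (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  (( (ip & mask) == (net & mask) ))
}

# Mimic "! -s 10.42.0.0/24 ... -j KUBE-MARK-MASQ" for two example sources.
for src in 10.42.0.15 192.168.1.20; do
  if in_cidr "$src" 10.42.0.0/24; then
    echo "$src: inside 10.42.0.0/24, not marked by this rule"
  else
    echo "$src: outside 10.42.0.0/24, jumps to KUBE-MARK-MASQ"
  fi
done
```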

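This is where the Service load-balances across its two endpoint chains, KUBE-SEP-ZGUCVTBPJNCW75NN and KUBE-SEP-UIOQG3KKGTPDCR56, one per Pod backing the Service. The choice is made once per connection: the nat table only sees the first packet of a connection, and conntrack applies the same translation to the rest of it.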

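Rules are evaluated in order. The first rule matches with probability 1/2; the last rule has no statistic match, so it catches every packet the earlier rule did not select (effectively probability 1). Each Pod therefore receives half of the new connections. For 3 Pods the per-rule probabilities are 1/3, 1/2 and 1, which again gives every Pod an equal share: 1/3, (2/3) × (1/2) = 1/3, and (2/3) × (1/2) × 1 = 1/3.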
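The cascade can also be checked numerically. A minimal sketch that, assuming the 1/n, 1/(n-1), ..., 1 pattern described above, prints each rule's probability and the fraction of all new connections it ends up receiving; endpoint_probabilities is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# For n endpoints, rule i (1-based) matches with probability 1/(n - i + 1);
# the last rule's probability is 1. Print each rule's probability and the
# overall share of connections it receives, which should come out as 1/n.
endpoint_probabilities() {
  awk -v n="$1" 'BEGIN {
    remaining = 1.0                 # fraction of connections not yet matched
    for (i = 1; i <= n; i++) {
      p = 1.0 / (n - i + 1)         # per-rule probability
      share = remaining * p         # fraction of all connections this rule takes
      printf "rule %d: probability %.5f, overall share %.5f\n", i, p, share
      remaining -= share
    }
  }'
}

endpoint_probabilities 3
```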

Pod rules: now look at the details of the two endpoint chains above, KUBE-SEP-ZGUCVTBPJNCW75NN and KUBE-SEP-UIOQG3KKGTPDCR56:
# cat rules.txt|grep KUBE-SEP-ZGUCVTBPJNCW75NN
...
-A KUBE-SEP-ZGUCVTBPJNCW75NN -p tcp -m comment --comment "default/nginx:80-80" -m tcp -j DNAT --to-destination 10.42.0.199:80
...
and:
# cat rules.txt|grep KUBE-SEP-UIOQG3KKGTPDCR56
...
-A KUBE-SEP-UIOQG3KKGTPDCR56 -p tcp -m comment --comment "default/nginx:80-80" -m tcp -j DNAT --to-destination 10.42.0.200:80
...
These rules finally expose the target Pod IP addresses, 10.42.0.199 and 10.42.0.200, which shows where the Service actually forwards traffic.
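The whole walk, from a KUBE-SVC chain to the Pod addresses in its KUBE-SEP chains, can be automated over the iptables-save dump. A minimal sketch; svc_endpoints is a hypothetical helper, and it assumes the iptables-save format shown above, where the jump target follows -j and the DNAT address follows --to-destination. On a node, run it as svc_endpoints KUBE-SVC-OVTWZ4GROBJZO4C5 < rules.txt.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read iptables-save output on stdin and, for KUBE-SVC
# chain $1, print each KUBE-SEP chain it jumps to (in rule order) together
# with the address from that chain's DNAT rule.
svc_endpoints() {
  awk -v svc="$1" '
    $1 == "-A" {
      target = ""; dest = ""
      for (i = 3; i < NF; i++) {
        if ($i == "-j") target = $(i + 1)
        if ($i == "--to-destination") dest = $(i + 1)
      }
      # Endpoint jumps of the Service chain, in order.
      if ($2 == svc && target ~ /^KUBE-SEP-/) order[++n] = target
      # DNAT destination of every endpoint chain, wherever it appears.
      if ($2 ~ /^KUBE-SEP-/ && dest != "") dnat[$2] = dest
    }
    END {
      for (k = 1; k <= n; k++) print order[k], dnat[order[k]]
    }'
}

# Usage with the rules shown above:
svc_endpoints KUBE-SVC-OVTWZ4GROBJZO4C5 <<'EOF'
-A KUBE-SVC-OVTWZ4GROBJZO4C5 -m comment --comment "default/nginx:80-80 -> 10.42.0.199:80" -m statistic --mode random --probability 0.50000000000 -j KUBE-SEP-ZGUCVTBPJNCW75NN
-A KUBE-SVC-OVTWZ4GROBJZO4C5 -m comment --comment "default/nginx:80-80 -> 10.42.0.200:80" -j KUBE-SEP-UIOQG3KKGTPDCR56
-A KUBE-SEP-ZGUCVTBPJNCW75NN -p tcp -m comment --comment "default/nginx:80-80" -m tcp -j DNAT --to-destination 10.42.0.199:80
-A KUBE-SEP-UIOQG3KKGTPDCR56 -p tcp -m comment --comment "default/nginx:80-80" -m tcp -j DNAT --to-destination 10.42.0.200:80
EOF
```

Because the DNAT addresses are collected first and printed at the end, the order in which iptables-save emits the chains does not matter.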

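Each of these two chains ends in a DNAT rule. Before the DNAT, packets that need source NAT are sent to the KUBE-MARK-MASQ chain (as in the first KUBE-SVC rule above), which sets a mark on them with --set-xmark; the KUBE-POSTROUTING chain later masquerades the packets that carry this mark.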
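The mark is set by the MARK target's --set-xmark value/mask option, which clears the bits in mask and then XORs value into the packet mark; a mark match with --mark value/mask succeeds when (mark AND mask) equals value. A minimal sketch of that bit arithmetic, assuming kube-proxy's default masquerade mark 0x4000/0x4000 (bit 14); set_xmark and mark_matches are hypothetical helpers, not iptables commands.

```bash
#!/usr/bin/env bash
# MARK target, --set-xmark value/mask: clear the bits in mask, then XOR value
# into the packet mark. Prints the resulting mark in hex.
set_xmark() {
  local mark=$(( $1 )) value=$(( $2 )) mask=$(( $3 ))
  printf '0x%x\n' $(( (mark & ~mask) ^ value ))
}

# mark match, --mark value/mask: true when (mark & mask) == value.
mark_matches() {
  (( ($1 & $3) == $2 ))
}

# Assumption: masquerade mark 0x4000/0x4000, applied to an unmarked packet.
m=$(set_xmark 0x0 0x4000 0x4000)
echo "mark after KUBE-MARK-MASQ: $m"
if mark_matches "$m" 0x4000 0x4000; then
  echo "marked: KUBE-POSTROUTING will masquerade this packet"
fi
```

Using a mask means other bits in the packet mark, possibly set by other software on the node, are left untouched.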

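The DNAT rule rewrites the destination address and port of the packet to the value given by --to-destination (for example 10.42.0.199:80). For packets arriving at the node this happens in the nat table's PREROUTING chain, that is, before the routing decision, so the packet is then routed to the Pod instead of to its original destination. Locally generated packets are translated in the nat table's OUTPUT chain instead.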