Kubernetes Networking – kube-proxy in Depth

kube-proxy is a core networking component of Kubernetes; it implements Service exposure and traffic forwarding. As mentioned in earlier chapters, kube-proxy supports three proxy modes: userspace, iptables, and ipvs. The userspace mode has serious performance problems and is essentially no longer used; iptables and ipvs see the most use in practice. If no mode is specified at startup, current versions of kube-proxy prefer iptables and fall back to the userspace mode if the system does not support it. Likewise, if ipvs is requested but the system does not support it, kube-proxy falls back to iptables.
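To confirm which mode a node actually ended up in, recent kube-proxy versions expose a small /proxyMode endpoint on the metrics port (127.0.0.1:10249 by default); the sketch below assumes the metrics endpoint has not been moved or disabled:

```bash
# Ask the local kube-proxy which proxy mode it is actually running
# (10249 is the default metrics bind port; adjust if customized).
curl -s http://127.0.0.1:10249/proxyMode ; echo
# iptables        <- or "ipvs"
```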

kube-proxy watches Service and Endpoints resources through the apiserver and updates the rules on the node dynamically. Since iptables is currently the most widely used mode, this section focuses on analyzing the iptables rules. Note that the ipvs mode does not do away with iptables entirely; in certain situations it still relies on iptables rules for related functionality (see here). Once the iptables rules have been analyzed, the same approach can be applied to the ipvs rules, which additionally requires familiarity with ipvs and ipset.
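Concretely, for the nginx Service used throughout this section, kube-proxy reacts to the Endpoints object shown below; the pod IPs are illustrative (the actual set changes as pods come and go, which is exactly what triggers a rule resync):

```bash
# The endpoints kube-proxy watches for default/nginx; these pod IPs become
# the DNAT / ipvs real-server targets in the rules analyzed below.
kubectl get endpoints nginx
# NAME    ENDPOINTS                                         AGE
# nginx   10.248.4.147:80,10.248.4.148:80,10.248.5.62:80    3d5h
```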

Other networking components implement the same functionality as kube-proxy, for example kube-router and Cilium. Both support what kube-proxy provides, so a cluster using one of these plugins can run without deploying kube-proxy at all.

When synchronizing iptables rules, kube-proxy dumps the current rules with iptables-save, updates them according to the state of the Services in the cluster, and finally writes the result back with iptables-restore. A kube-proxy restart therefore does not disturb new or established connections. When a Service is deleted, kube-proxy additionally clears the related conntrack entries and performs some protocol-specific cleanup, which is not covered in detail here.
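The manual equivalent of that cycle looks roughly like the sketch below; it only illustrates the save / edit / restore pattern and does not reproduce kube-proxy's actual implementation:

```bash
# Sketch of the sync pattern (manual equivalent, not kube-proxy's real code):
iptables-save -t nat > /tmp/kube-nat.rules                   # 1. dump the current nat table
vi /tmp/kube-nat.rules                                       # 2. kube-proxy rewrites the KUBE-* chains in this dump
iptables-restore --noflush --counters < /tmp/kube-nat.rules  # 3. load the result back atomically
```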

The analysis below uses the following nginx Service as a concrete example.

```bash
[root@master1 ~]# kubectl get svc
NAME         TYPE           CLUSTER-IP      EXTERNAL-IP            PORT(S)        AGE
kubernetes   ClusterIP      172.17.0.1      <none>                 443/TCP        79d
nginx        LoadBalancer   172.17.45.122   192.168.64.1,4.3.2.1   80:30935/TCP   3d5h
[root@master1 ~]# kubectl edit svc nginx
Edit cancelled, no changes made.
[root@master1 ~]# kubectl get svc nginx -oyaml
apiVersion: v1
kind: Service
metadata:
  creationTimestamp: "2021-10-16T05:55:09Z"
  labels:
    app: nginx
  name: nginx
  namespace: default
  resourceVersion: "619828"
  uid: 58ad25b5-6ee0-4a10-8c60-3a4371db1bbc
spec:
  clusterIP: 172.17.45.122
  clusterIPs:
  - 172.17.45.122
  externalIPs:
  - 4.3.2.1
  externalTrafficPolicy: Cluster
  ipFamilies:
  - IPv4
  ipFamilyPolicy: SingleStack
  ports:
  - nodePort: 30935
    port: 80
    protocol: TCP
    targetPort: 80
  selector:
    app: nginx
  sessionAffinity: None
  type: LoadBalancer
status:
  loadBalancer:
    ingress:
    - ip: 192.168.64.1
```

iptables starts matching at the five built-in chains. As each chain is traversed, the rules from the four tables are evaluated to decide whether the packet is accepted, jumps to a user-defined chain, or is handed to a target. The tables are consulted in the priority order below, with raw highest and filter lowest; kube-proxy only uses the nat and filter tables.

raw -> mangle -> nat -> filter

iptables analysis

The two commands below show which KUBE-* chains the built-in chains jump to first; from those first-level jumps we can then follow the subsequent chain jumps one by one.

```bash
iptables -S | grep '\-A INPUT\|\-A PREROUTING\|\-A FORWARD\|\-A POSTROUTING\|\-A OUTPUT' | grep KUBE
iptables -t nat -S | grep '\-A INPUT\|\-A PREROUTING\|\-A FORWARD\|\-A POSTROUTING\|\-A OUTPUT' | grep KUBE
```
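The first-level jumps typically look like the following. Treat this as a reconstructed sample for a v1.21/v1.22-era kube-proxy rather than exact output; the comments and rule set vary between versions, and the KUBE-FIREWALL jumps are installed by kubelet rather than kube-proxy:

```bash
# filter table (first command)
-A INPUT -m comment --comment "kubernetes health check service ports" -j KUBE-NODEPORTS
-A INPUT -m conntrack --ctstate NEW -m comment --comment "kubernetes externally-visible service portals" -j KUBE-EXTERNAL-SERVICES
-A INPUT -j KUBE-FIREWALL
-A FORWARD -m comment --comment "kubernetes forwarding rules" -j KUBE-FORWARD
-A FORWARD -m conntrack --ctstate NEW -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
-A FORWARD -m conntrack --ctstate NEW -m comment --comment "kubernetes externally-visible service portals" -j KUBE-EXTERNAL-SERVICES
-A OUTPUT -m conntrack --ctstate NEW -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
-A OUTPUT -j KUBE-FIREWALL

# nat table (second command)
-A PREROUTING -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
-A OUTPUT -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
-A POSTROUTING -m comment --comment "kubernetes postrouting rules" -j KUBE-POSTROUTING
```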

Below is an organized view of the iptables rules on a node. Instead of going table by table, it uses the five built-in chains as entry points and follows a packet through the chains it jumps to and the nat and filter rules applied along the way, which makes the packet flow easier to understand.

- **PREROUTING**

  - nat table

    - **KUBE-SERVICES (1)**

      - If there are no endpoints

        No rules

      - If there are endpoints

        ```bash
        # If the source is not a cluster pod, mark the packet so SNAT is applied later
        -A KUBE-SERVICES ! -s 10.248.0.0/13 -d 172.17.45.122/32 -p tcp -m comment --comment "default/nginx cluster IP" -m tcp --dport 80 -j KUBE-MARK-MASQ
        # For traffic addressed directly to the service (cluster) IP
        -A KUBE-SERVICES -d 172.17.45.122/32 -p tcp -m comment --comment "default/nginx cluster IP" -m tcp --dport 80 -j KUBE-SVC-2CMXP7HKUVJN7L6M
        # For traffic arriving via the load balancer IP
        -A KUBE-SERVICES -d 192.168.64.1/32 -p tcp -m comment --comment "default/nginx loadbalancer IP" -m tcp --dport 80 -j KUBE-FW-2CMXP7HKUVJN7L6M
        # For traffic to externalIPs, when externalTrafficPolicy=Cluster
        -A KUBE-SERVICES ! -s 10.248.0.0/13 -d 4.3.2.1/32 -p tcp -m comment --comment "default/nginx external IP" -m tcp --dport 80 -j KUBE-MARK-MASQ
        -A KUBE-SERVICES -d 4.3.2.1/32 -p tcp -m comment --comment "default/nginx external IP" -m tcp --dport 80 -j KUBE-SVC-2CMXP7HKUVJN7L6M
        # For traffic to externalIPs, when externalTrafficPolicy=Local
        -A KUBE-SERVICES -d 4.3.2.1/32 -p tcp -m comment --comment "default/nginx external IP" -m tcp --dport 80 -j KUBE-XLB-2CMXP7HKUVJN7L6M
        ```

      - **Finally, the nodePort rule**

        ```bash
        -A KUBE-SERVICES -m comment --comment "kubernetes service nodeports; NOTE: this must be the last rule in this chain" -m addrtype --dst-type LOCAL -j KUBE-NODEPORTS
        ```

    - **KUBE-SVC-2CMXP7HKUVJN7L6M (2)**

      ```bash
      # Per-service chain. The statistic module matches with a given probability; the rules below balance across 3 pods
      -A KUBE-SVC-2CMXP7HKUVJN7L6M -m comment --comment "default/nginx" -m statistic --mode random --probability 0.33333333349 -j KUBE-SEP-FE2RUGH5B3B35OF2
      -A KUBE-SVC-2CMXP7HKUVJN7L6M -m comment --comment "default/nginx" -m statistic --mode random --probability 0.50000000000 -j KUBE-SEP-B3DCXHZMDQ7P4TD4
      -A KUBE-SVC-2CMXP7HKUVJN7L6M -m comment --comment "default/nginx" -j KUBE-SEP-O356R7YCHBWDV6FO
      ```

    - **KUBE-SEP-FE2RUGH5B3B35OF2 (3)**

      ```bash
      # Per-endpoint chain; the first rule marks hairpin traffic (a pod reaching itself through the service) for SNAT
      -A KUBE-SEP-FE2RUGH5B3B35OF2 -s 10.248.4.135/32 -m comment --comment "default/nginx" -j KUBE-MARK-MASQ
      -A KUBE-SEP-FE2RUGH5B3B35OF2 -p tcp -m comment --comment "default/nginx" -m tcp -j DNAT --to-destination 10.248.4.135:80
      ```

    - **KUBE-XLB-2CMXP7HKUVJN7L6M (2)**

      - If externalTrafficPolicy=Local

        ```bash
        # On a node that has a local pod of the service
        -A KUBE-XLB-2CMXP7HKUVJN7L6M -s 10.248.0.0/13 -m comment --comment "Redirect pods trying to reach external loadbalancer VIP to clusterIP" -j KUBE-SVC-2CMXP7HKUVJN7L6M
        -A KUBE-XLB-2CMXP7HKUVJN7L6M -m comment --comment "masquerade LOCAL traffic for default/nginx LB IP" -m addrtype --src-type LOCAL -j KUBE-MARK-MASQ
        -A KUBE-XLB-2CMXP7HKUVJN7L6M -m comment --comment "route LOCAL traffic for default/nginx LB IP to service chain" -m addrtype --src-type LOCAL -j KUBE-SVC-2CMXP7HKUVJN7L6M
        -A KUBE-XLB-2CMXP7HKUVJN7L6M -m comment --comment "Balancing rule 0 for default/nginx" -m statistic --mode random --probability 0.50000000000 -j KUBE-SEP-L3GP2ZY57RKPXMWQ
        -A KUBE-XLB-2CMXP7HKUVJN7L6M -m comment --comment "Balancing rule 1 for default/nginx" -j KUBE-SEP-4YKHL2MNKW2M7HD6
        
        # On a node without a local pod
        -A KUBE-XLB-2CMXP7HKUVJN7L6M -s 10.248.0.0/13 -m comment --comment "Redirect pods trying to reach external loadbalancer VIP to clusterIP" -j KUBE-SVC-2CMXP7HKUVJN7L6M
        -A KUBE-XLB-2CMXP7HKUVJN7L6M -m comment --comment "masquerade LOCAL traffic for default/nginx LB IP" -m addrtype --src-type LOCAL -j KUBE-MARK-MASQ
        -A KUBE-XLB-2CMXP7HKUVJN7L6M -m comment --comment "route LOCAL traffic for default/nginx LB IP to service chain" -m addrtype --src-type LOCAL -j KUBE-SVC-2CMXP7HKUVJN7L6M
        -A KUBE-XLB-2CMXP7HKUVJN7L6M -m comment --comment "default/nginx has no local endpoints" -j KUBE-MARK-DROP
        ```

      - If externalTrafficPolicy=Cluster

        No rules

    - **KUBE-FW-2CMXP7HKUVJN7L6M (2)**

      Handles traffic addressed to the load balancer IP.

      - If externalTrafficPolicy=Local

        ```bash
        # If loadBalancerSourceRanges is set
        -A KUBE-FW-2CMXP7HKUVJN7L6M -s 192.168.10.0/24 -m comment --comment "default/nginx loadbalancer IP" -j KUBE-XLB-2CMXP7HKUVJN7L6M
        # If loadBalancerSourceRanges is not set
        -A KUBE-FW-2CMXP7HKUVJN7L6M -m comment --comment "default/nginx loadbalancer IP" -j KUBE-XLB-2CMXP7HKUVJN7L6M
        # This rule is always present
        -A KUBE-FW-2CMXP7HKUVJN7L6M -m comment --comment "default/nginx loadbalancer IP" -j KUBE-MARK-DROP
        ```

      - If externalTrafficPolicy=Cluster

        ```bash
        -A KUBE-FW-2CMXP7HKUVJN7L6M -m comment --comment "default/nginx loadbalancer IP" -j KUBE-MARK-MASQ
        # If loadBalancerSourceRanges is set
        -A KUBE-FW-2CMXP7HKUVJN7L6M -s 192.168.10.0/24 -m comment --comment "default/nginx loadbalancer IP" -j KUBE-SVC-2CMXP7HKUVJN7L6M
        # If loadBalancerSourceRanges is not set (both of the following rules)
        -A KUBE-FW-2CMXP7HKUVJN7L6M -m comment --comment "default/nginx loadbalancer IP" -j KUBE-SVC-2CMXP7HKUVJN7L6M
        -A KUBE-FW-2CMXP7HKUVJN7L6M -m comment --comment "default/nginx loadbalancer IP" -j KUBE-MARK-DROP
        ```

    - **KUBE-MARK-MASQ**

      Marks the packet so that source NAT is applied to it in POSTROUTING.

      ```bash
      -A KUBE-MARK-MASQ -j MARK --set-xmark 0x4000/0x4000
      ```

    - **KUBE-MARK-DROP**

      Marks packets that should be dropped; the actual DROP happens later in the KUBE-FIREWALL chain of the filter table.

      ```bash
      -A KUBE-MARK-DROP -j MARK --set-xmark 0x8000/0x8000
      ```

    - **KUBE-NODEPORTS (2)**

      Present when the service has a nodePort allocated. Note that this port is not the same as the health-check node port described under INPUT below.

      ```bash
      # If externalTrafficPolicy=Cluster
      -A KUBE-NODEPORTS -p tcp -m comment --comment "default/nginx" -m tcp --dport 30935 -j KUBE-MARK-MASQ
      -A KUBE-NODEPORTS -p tcp -m comment --comment "default/nginx" -m tcp --dport 30935 -j KUBE-SVC-2CMXP7HKUVJN7L6M
      # If externalTrafficPolicy=Local
      -A KUBE-NODEPORTS -s 127.0.0.0/8 -p tcp -m comment --comment "default/nginx" -m tcp --dport 30935 -j KUBE-MARK-MASQ
      -A KUBE-NODEPORTS -p tcp -m comment --comment "default/nginx" -m tcp --dport 30935 -j KUBE-XLB-2CMXP7HKUVJN7L6M
      ```

- **INPUT**

  - filter table

    - **KUBE-NODEPORTS (1)** (used for the health check when externalTrafficPolicy=Local)

      - If externalTrafficPolicy=Local

        ```bash
        -A KUBE-NODEPORTS -p tcp -m comment --comment "default/nginx health check node port" -m tcp --dport 31745 -j ACCEPT
        ```

        Port 31745 is opened and served by kube-proxy on every node; the number is allocated by the apiserver, so it can differ each time. It is a small HTTP endpoint; see the health-check section below.

      - If externalTrafficPolicy=Cluster

        No rules.

    - **KUBE-EXTERNAL-SERVICES (1)** (-m conntrack --ctstate NEW)

      Independent of externalTrafficPolicy.

      - If there are no endpoints

        ```bash
        # 4.3.2.1 is specified in service.spec.externalIPs
        -A KUBE-EXTERNAL-SERVICES -d 4.3.2.1/32 -p tcp -m comment --comment "default/nginx has no endpoints" -m tcp --dport 80 -j REJECT --reject-with icmp-port-unreachable
        # 192.168.64.1 is the load balancer IP
        -A KUBE-EXTERNAL-SERVICES -d 192.168.64.1/32 -p tcp -m comment --comment "default/nginx has no endpoints" -m tcp --dport 80 -j REJECT --reject-with icmp-port-unreachable
        # This rule covers the NodePort case
        -A KUBE-EXTERNAL-SERVICES -p tcp -m comment --comment "default/nginx has no endpoints" -m addrtype --dst-type LOCAL -m tcp --dport 30935 -j REJECT --reject-with icmp-port-unreachable
        ```

      - If there are endpoints

        No rules

    - **KUBE-FIREWALL (1)**

      ```bash
      -A KUBE-FIREWALL -m comment --comment "kubernetes firewall for dropping marked packets" -m mark --mark 0x8000/0x8000 -j DROP
      -A KUBE-FIREWALL ! -s 127.0.0.0/8 -d 127.0.0.0/8 -m comment --comment "block incoming localnet connections" -m conntrack ! --ctstate RELATED,ESTABLISHED,DNAT -j DROP
      ```

- **FORWARD**

  - filter table

    - **KUBE-FORWARD (1)**

      ```bash
      -A KUBE-FORWARD -m conntrack --ctstate INVALID -j DROP
      -A KUBE-FORWARD -m comment --comment "kubernetes forwarding rules" -m mark --mark 0x4000/0x4000 -j ACCEPT
      -A KUBE-FORWARD -m comment --comment "kubernetes forwarding conntrack pod source rule" -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
      -A KUBE-FORWARD -m comment --comment "kubernetes forwarding conntrack pod destination rule" -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
      ```

    - **KUBE-SERVICES (1)**(-m conntrack --ctstate NEW)

      - If there are no endpoints

        ```bash
        -A KUBE-SERVICES -d 172.17.45.122/32 -p tcp -m comment --comment "default/nginx has no endpoints" -m tcp --dport 80 -j REJECT --reject-with icmp-port-unreachable
        ```

      - If there are endpoints

        No rules

    - **KUBE-EXTERNAL-SERVICES (1)**(-m conntrack --ctstate NEW)

      This is the same KUBE-EXTERNAL-SERVICES chain as in the filter table under INPUT above, so the rules are identical. Here it applies to requests being forwarded through this node.

- **OUTPUT**

  Mainly covers requests originating from the node itself.

  - nat table

    - **KUBE-SERVICES (1)**

      Same KUBE-SERVICES chain as in the nat table under PREROUTING above, so the rules are identical.

  - filter table

    - **KUBE-SERVICES (1)** (-m conntrack --ctstate NEW)

      Same KUBE-SERVICES chain as in the filter table under FORWARD above, so the rules are identical.

    - **KUBE-FIREWALL (1)**

      Same KUBE-FIREWALL chain as in the filter table under INPUT above, so the rules are identical.

- **POSTROUTING**

  - nat table

    - **KUBE-POSTROUTING (1)**

      ```bash
      -A KUBE-POSTROUTING -m mark ! --mark 0x4000/0x4000 -j RETURN
      -A KUBE-POSTROUTING -j MARK --set-xmark 0x4000/0x0
      -A KUBE-POSTROUTING -m comment --comment "kubernetes service traffic requiring SNAT" -j MASQUERADE
      ```

MASQUERADE rewrites the source address of marked packets to that of the outgoing interface. For example, a connection 172.0.0.1:5544 > 10.0.0.1:80 that has been DNATed to the backend 172.0.0.2 leaves the node as 192.168.1.1:4532 -> 172.0.0.2, so the backend replies to the node, which reverses both translations before forwarding the reply to the client.
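Connections translated this way can be inspected with the conntrack tool; the entry below is a hand-written illustration matching the addresses above (not captured output), showing the original direction and the reply direction after DNAT plus MASQUERADE:

```bash
# Illustrative conntrack entry for the example above (addresses are made up):
conntrack -L | grep 10.0.0.1
# tcp  6 86397 ESTABLISHED src=172.0.0.1 dst=10.0.0.1 sport=5544 dport=80 \
#      src=172.0.0.2 dst=192.168.1.1 sport=80 dport=4532 [ASSURED] mark=0 use=1
```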

nodePort health check

When service.spec.externalTrafficPolicy is set to Local, the apiserver allocates an additional port (healthCheckNodePort) from the configured nodePort range. kube-proxy on every node listens on this port and runs a small HTTP service; querying it tells you whether the node currently has local pod endpoints for that service.

```bash
[root@node3 ~]# iptables -S | grep NODEPORT
-N KUBE-NODEPORTS
-A INPUT -m comment --comment "kubernetes health check service ports" -j KUBE-NODEPORTS
-A KUBE-NODEPORTS -p tcp -m comment --comment "default/nginx health check node port" -m tcp --dport 30459 -j ACCEPT
[root@node3 ~]# curl node3:30459 -v
* About to connect() to node3 port 30459 (#0)
*   Trying 192.168.3.28...
* Connected to node3 (192.168.3.28) port 30459 (#0)
> GET / HTTP/1.1
> User-Agent: curl/7.29.0
> Host: node3:30459
> Accept: */*
>
< HTTP/1.1 200 OK
< Content-Type: application/json
< X-Content-Type-Options: nosniff
< Date: Wed, 20 Oct 2021 04:54:19 GMT
< Content-Length: 86
<
{
	"service": {
		"namespace": "default",
		"name": "nginx"
	},
	"localEndpoints": 1
* Connection #0 to host node3 left intact
}[root@node3 ~]#
[root@node3 ~]# exit
logout
Connection to node3 closed.
[root@master1 ~]# iptables -S | grep NODEPORT
-N KUBE-NODEPORTS
-A INPUT -m comment --comment "kubernetes health check service ports" -j KUBE-NODEPORTS
-A KUBE-NODEPORTS -p tcp -m comment --comment "default/nginx health check node port" -m tcp --dport 30459 -j ACCEPT
[root@master1 ~]# curl master1:30459 -v
* About to connect() to master1 port 30459 (#0)
*   Trying 192.168.3.29...
* Connected to master1 (192.168.3.29) port 30459 (#0)
> GET / HTTP/1.1
> User-Agent: curl/7.29.0
> Host: master1:30459
> Accept: */*
>
< HTTP/1.1 503 Service Unavailable
< Content-Type: application/json
< X-Content-Type-Options: nosniff
< Date: Wed, 20 Oct 2021 04:54:55 GMT
< Content-Length: 86
<
{
	"service": {
		"namespace": "default",
		"name": "nginx"
	},
	"localEndpoints": 0
* Connection #0 to host master1 left intact
}[root@master1 ~]#
```

Path analysis

  • A client outside the cluster accessing the load balancer address
  • A node inside the cluster accessing the clusterIP
  • A client outside the cluster accessing the nodePort
  • Each of the above when externalTrafficPolicy is Local

The first of these paths is traced as an example right below; the others can be followed the same way through the chains listed above.
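A sketch of the first path (external client -> node:30935 with externalTrafficPolicy=Cluster), written as the sequence of chains derived from the rules above; it is a reading aid, not the output of any tool:

```bash
# External client -> nodeIP:30935, externalTrafficPolicy=Cluster
#
# nat/PREROUTING   -> KUBE-SERVICES
#                  -> KUBE-NODEPORTS                (dst-type LOCAL, tcp dport 30935)
#                  -> KUBE-MARK-MASQ                (mark 0x4000: SNAT required later)
#                  -> KUBE-SVC-2CMXP7HKUVJN7L6M     (random endpoint selection)
#                  -> KUBE-SEP-...                  (DNAT to <pod IP>:80)
# filter/FORWARD   -> KUBE-FORWARD                  (mark 0x4000 -> ACCEPT)
# nat/POSTROUTING  -> KUBE-POSTROUTING              (mark 0x4000 -> MASQUERADE, source becomes the node IP)
```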

ipvs mode

ipvs overview

IPVS is a layer-4 load-balancing feature built into the Linux kernel, implemented by the ip_vs, ip_vs_rr and related kernel modules. The user-space management tool is ipvsadm, which configures the kernel's virtual servers. A quick reference for its basic usage follows.

```bash
###############################
# Add or edit a virtual service
###############################
# -A add
# -E edit
# -t TCP service
# -u UDP service
# -f firewall mark (groups services for firewall rules)
# -s scheduling algorithm: rr, wrr, lc, dh, ...
# -p enable session persistence
# -M persistence granularity (network mask)
ipvsadm -A|E -t|u|f service-address [-s scheduler] [-p [timeout]] [-O] [-M netmask]

#################################
# List virtual server information
#################################
# -L|l list
# -t limit output to the given service
# -c show connection entries
# --stats statistics
# --rate rates
ipvsadm -L|l [options]

###########################
# Delete a virtual service
###########################
ipvsadm -D -t|u|f service-address

#################################
# Clear the virtual service table
#################################
ipvsadm -C

#############################
# Save (export) the rule set
#############################
# -n print addresses and ports numerically
ipvsadm -S [-n]

######################
# Restore the rule set
######################
ipvsadm -R

############################
# Add or edit a real server
############################
# -t TCP service
# -u UDP service
# -f firewall mark
# -r real server address
#
# Packet forwarding method:
# -g direct routing
# -i IPIP encapsulation (tunneling)
# -m NAT (masquerading)
#
# -w server weight (0 or greater)
# -x upper connection threshold
# -y lower connection threshold
ipvsadm -a|e -t|u|f service-address -r server-address
[-g|i|m] [-w weight] [-x upper] [-y lower]

#######################
# Delete a real server
#######################
# -r real server address
ipvsadm -d -t|u|f service-address -r server-address

###############################
# Zero packet and byte counters
###############################
ipvsadm -Z [-t|u|f service-address]

# Create virtual services
# The service address is usually a virtual IP on one of the NICs, reachable from other machines
ipvsadm -A -t 1.3.5.7:8080 -s rr -p
ipvsadm -A -u 1.3.5.7:8080 -s rr -p
ipvsadm -A -f 1234

# Delete virtual services
ipvsadm -D -t 1.3.5.7:8080
ipvsadm -D -u 1.3.5.7:8080
ipvsadm -D -f 1234

# Edit a virtual service
ipvsadm -E -t 1.3.5.7:6443 -s wrr -p 180
ipvsadm -E -f 1234 -p 60 -s rr

# Add a real server
ipvsadm -a -t 1.3.5.7:8080 -r master2:6443 -m

# Delete a real server
ipvsadm -d -t 1.3.5.7:8080 -r master2:6443

# Show service information
ipvsadm -L -n --stats --rate
```

ipvs itself provides no health-check mechanism, so an external component has to manage the real servers dynamically, e.g. ldirectord, keepalived, or kube-proxy.

Looking at the ipvs rules configured by kube-proxy, every address a client could possibly use is set up as a virtual service, including every local IP address combined with the nodePort.

```bash
# loadBalancer IP
[root@node1 ~]# ipvsadm -Ln | grep 192.168.64.1 -A 5
TCP  192.168.64.1:80 rr # externalTrafficPolicy=Local: no real servers
TCP  192.168.64.3:80 rr
  -> 10.248.3.214:80              Masq    1      0          0
TCP  192.168.64.3:443 rr
  -> 10.248.3.214:443             Masq    1      0          0
TCP  4.3.2.1:80 rr
[root@node1 ~]# ipvsadm -Ln | grep 192.168.64.1 -A 5
TCP  192.168.64.1:80 rr # externalTrafficPolicy=Cluster: has real servers
  -> 10.248.4.147:80              Masq    1      0          0
  -> 10.248.4.148:80              Masq    1      0          0
  -> 10.248.5.62:80               Masq    1      0          0
TCP  192.168.64.3:80 rr
  -> 10.248.3.214:80              Masq    1      0          0
  
# externalIPs
[root@node1 ~]# ipvsadm -Ln | grep 4.3.2.1 -A 5
TCP  4.3.2.1:80 rr # externalTrafficPolicy=Cluster: has real servers
  -> 10.248.4.147:80              Masq    1      0          0
  -> 10.248.4.148:80              Masq    1      0          0
  -> 10.248.5.62:80               Masq    1      0          0
TCP  10.248.3.0:30840 rr
  -> 10.248.3.214:443             Masq    1      0          0
[root@node1 ~]# ipvsadm -Ln | grep 4.3.2.1 -A 5
TCP  4.3.2.1:80 rr # externalTrafficPolicy=Local: no real servers
TCP  10.248.3.0:30840 rr
  -> 10.248.3.214:443             Masq    1      0          0
TCP  10.248.3.0:30935 rr
TCP  10.248.3.0:31415 rr
  -> 10.248.3.214:80              Masq    1      0          0
  
# clusterIP
[root@node1 ~]# ipvsadm -Ln | grep 172.17.45.122 -A 5
TCP  172.17.45.122:80 rr
  -> 10.248.4.147:80              Masq    1      0          0
  -> 10.248.4.148:80              Masq    1      0          0
  -> 10.248.5.62:80               Masq    1      0          0
TCP  172.17.89.152:8000 rr
  -> 10.248.3.218:8000            Masq    1      0          0

# every local IP + nodePort
[root@node1 ~]# ipvsadm -Ln | grep 30935 -A 5
TCP  172.17.0.1:30935 rr
  -> 10.248.4.147:80              Masq    1      0          0
  -> 10.248.4.148:80              Masq    1      0          0
  -> 10.248.5.62:80               Masq    1      0          0
TCP  172.17.0.1:31415 rr
  -> 10.248.3.214:80              Masq    1      0          0
--
TCP  192.168.3.27:30935 rr
  -> 10.248.4.147:80              Masq    1      0          0
  -> 10.248.4.148:80              Masq    1      0          0
  -> 10.248.5.62:80               Masq    1      0          0
TCP  192.168.3.27:31415 rr
  -> 10.248.3.214:80              Masq    1      0          0
--
TCP  10.248.3.0:30935 rr
  -> 10.248.4.147:80              Masq    1      0          0
  -> 10.248.4.148:80              Masq    1      0          0
  -> 10.248.5.62:80               Masq    1      0          0
TCP  10.248.3.0:31415 rr
  -> 10.248.3.214:80              Masq    1      0          0
--
TCP  127.0.0.1:30935 rr
  -> 10.248.4.147:80              Masq    1      0          0
  -> 10.248.4.148:80              Masq    1      0          0
  -> 10.248.5.62:80               Masq    1      0          0
TCP  127.0.0.1:31415 rr
  -> 10.248.3.214:80              Masq    1      0          0
```

ipvs analysis

To experiment, edit the kube-proxy ConfigMap to switch the nodes to ipvs mode, clear the old iptables rules, and then walk through the resulting configuration with the help of the post linked below.

https://kubernetes.io/blog/2018/07/09/ipvs-based-in-cluster-load-balancing-deep-dive/

We have seen that in-cluster Service traffic is load-balanced by ipvs itself, but nodePort, externalIPs and similar addresses still depend on iptables, so the same method used above can be applied to analyze those rules.
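One way to perform the switch described above, assuming a kubeadm-style cluster where kube-proxy runs as a DaemonSet and is configured through the kube-proxy ConfigMap (names and the cleanup step may differ in other setups):

```bash
# Switch kube-proxy to ipvs mode and restart it (kubeadm-style cluster):
kubectl -n kube-system edit configmap kube-proxy            # set: mode: "ipvs"
kubectl -n kube-system rollout restart daemonset kube-proxy

# On each node, remove the rules left over from the previous mode.
# kube-proxy provides a --cleanup flag for this; it flushes the KUBE-*
# iptables chains and the ipvs configuration and then exits.
kube-proxy --cleanup
```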

- **PREROUTING**

  - nat table

    - **KUBE-SERVICES**

      ```bash
      -A KUBE-SERVICES -m comment --comment "Kubernetes service lb portal" -m set --match-set KUBE-LOAD-BALANCER dst,dst -j KUBE-LOAD-BALANCER
      # Decides whether the packet is allowed through and whether NAT is applied
      # ipset list KUBE-LOAD-BALANCER
      # ...
      # 192.168.64.1,tcp:80
      # ...
      
      -A KUBE-SERVICES ! -s 10.248.0.0/13 -m comment --comment "Kubernetes service cluster ip + port for masquerade purpose" -m set --match-set KUBE-CLUSTER-IP dst,dst -j KUBE-MARK-MASQ
      # When the destination is a service clusterIP and the source is not in the cluster CIDR, mark the packet for SNAT
      # ipset list KUBE-CLUSTER-IP
      # ...
      # 172.17.45.122,tcp:80
      # ...
      
      -A KUBE-SERVICES -m comment --comment "Kubernetes service external ip + port with externalTrafficPolicy=local" -m set --match-set KUBE-EXTERNAL-IP-LOCAL dst,dst -m physdev ! --physdev-is-in -m addrtype ! --src-type LOCAL -j ACCEPT
      # Allow traffic for external IPs that does not come from a bridge (i.e. not from a container)
      # nor from a local process to be forwarded to the service.
      # This rule roughly translates to "all traffic from off-machine".
      #	This is imperfect in the face of network plugins that might not use a bridge, but we can revisit that later.
      # With externalTrafficPolicy=Local: if the packet did not come in through a bridge and its source is not a local
      # address, accept it; in effect, all traffic originating outside this machine is allowed through
      -A KUBE-SERVICES -m comment --comment "Kubernetes service external ip + port with externalTrafficPolicy=local" -m set --match-set KUBE-EXTERNAL-IP-LOCAL dst,dst -m addrtype --dst-type LOCAL -j ACCEPT
      # Allow traffic bound for external IPs that happen to be recognized as local IPs to stay local.
      #	This covers cases like GCE load-balancers which get added to the local routing table.
      # With externalTrafficPolicy=Local: if the destination is a local IP, accept the packet (per the upstream comment, this covers load balancers such as GCE that add the LB address to the local routing table).
      # ipset list KUBE-EXTERNAL-IP-LOCAL
      # ...
      # 4.3.2.1,tcp:80
      # ...
      
      -A KUBE-SERVICES -m addrtype --dst-type LOCAL -j KUBE-NODE-PORT
      -A KUBE-SERVICES -m set --match-set KUBE-CLUSTER-IP dst,dst -j ACCEPT
      -A KUBE-SERVICES -m set --match-set KUBE-LOAD-BALANCER dst,dst -j ACCEPT
      ```

    - **KUBE-LOAD-BALANCER**

      ```bash
      -A KUBE-LOAD-BALANCER -m comment --comment "Kubernetes service load balancer ip + port for load balancer with sourceRange" -m set --match-set KUBE-LOAD-BALANCER-FW dst,dst -j KUBE-FIREWALL
      # ipset list KUBE-LOAD-BALANCER-FW
      # ...
      # 192.168.64.1,tcp:80
      # ...
      
      -A KUBE-LOAD-BALANCER -m comment --comment "Kubernetes service load balancer ip + port with externalTrafficPolicy=local" -m set --match-set KUBE-LOAD-BALANCER-LOCAL dst,dst -j RETURN
      # With externalTrafficPolicy=Local this matches and returns without NAT;
      # with Cluster it falls through to the KUBE-MARK-MASQ rule below
      # ipset list KUBE-LOAD-BALANCER-LOCAL
      # ...
      # 192.168.64.1,tcp:80
      # ...
      -A KUBE-LOAD-BALANCER -j KUBE-MARK-MASQ
      ```

    - **KUBE-FIREWALL**

      ```bash
      -A KUBE-FIREWALL -m comment --comment "Kubernetes service load balancer ip + port + source cidr for packet filter purpose" -m set --match-set KUBE-LOAD-BALANCER-SOURCE-CIDR dst,dst,src -j RETURN
      # ipset list KUBE-LOAD-BALANCER-SOURCE-CIDR
      # ...
      # 192.168.64.1,tcp:80,192.168.10.0/24
      # ...
      -A KUBE-FIREWALL -j KUBE-MARK-DROP
      ```

    - **KUBE-NODE-PORT**

      ```bash
      -A KUBE-NODE-PORT -p tcp -m comment --comment "Kubernetes nodeport TCP port with externalTrafficPolicy=local" -m set --match-set KUBE-NODE-PORT-LOCAL-TCP dst -j RETURN
      -A KUBE-NODE-PORT -p tcp -m comment --comment "Kubernetes nodeport TCP port for masquerade purpose" -m set --match-set KUBE-NODE-PORT-TCP dst -j KUBE-MARK-MASQ
      # The Local set is matched first and returns without NAT; anything else falls through to the second rule and is marked for SNAT.
      # For other protocols there are corresponding sets: KUBE-NODE-PORT-UDP, KUBE-NODE-PORT-LOCAL-UDP
      # ipset list KUBE-NODE-PORT-TCP
      # ...
      # 30935
      # ...
      ```

- **INPUT**

  - filter table

    - **KUBE-NODE-PORT**

      ```bash
      -A KUBE-NODE-PORT -m comment --comment "Kubernetes health check node port" -m set --match-set KUBE-HEALTH-CHECK-NODE-PORT dst -j ACCEPT
      # ipset list KUBE-HEALTH-CHECK-NODE-PORT
      # This set holds the healthCheckNodePort values; health-check traffic must be accepted, and the port is again
      # served by kube-proxy. The command below tests whether a given port is in the set.
      # ipset test KUBE-HEALTH-CHECK-NODE-PORT 3215
      ```

- **FORWARD**

  - filter table

    - **KUBE-FORWARD**

      ```bash
      -A KUBE-FORWARD -m comment --comment "kubernetes forwarding rules" -m mark --mark 0x4000/0x4000 -j ACCEPT
      -A KUBE-FORWARD -m comment --comment "kubernetes forwarding conntrack pod source rule" -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
      -A KUBE-FORWARD -m comment --comment "kubernetes forwarding conntrack pod destination rule" -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
      ```

- **OUTPUT**

  - nat table

    - **KUBE-SERVICES**

      Same KUBE-SERVICES chain as in the nat table under PREROUTING above, so the rules are identical. It mainly covers requests originating from the node itself.

- **POSTROUTING**

  - nat table

    - **KUBE-POSTROUTING**

      ```bash
      -A KUBE-POSTROUTING -m comment --comment "Kubernetes endpoints dst ip:port, source ip for solving hairpin purpose" -m set --match-set KUBE-LOOP-BACK dst,dst,src -j MASQUERADE
      -A KUBE-POSTROUTING -m mark ! --mark 0x4000/0x4000 -j RETURN
      -A KUBE-POSTROUTING -j MARK --set-xmark 0x4000/0x0
      -A KUBE-POSTROUTING -m comment --comment "kubernetes service traffic requiring SNAT" -j MASQUERADE
      ```
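
      The KUBE-LOOP-BACK set referenced by the first rule contains endpoint-IP,port,endpoint-IP triples, so it matches hairpin traffic where a pod reaches itself through a service and forces SNAT for it. A sketch of its contents, with entries built from the pod IPs used earlier in this section (illustrative, not captured output):

      ```bash
      # Illustrative contents of the hairpin ipset; each member follows the pattern
      # <endpoint IP>,<proto:port>,<endpoint IP> (destination and source are the same pod).
      ipset list KUBE-LOOP-BACK
      # Name: KUBE-LOOP-BACK
      # Type: hash:ip,port,ip
      # Members:
      # 10.248.4.147,tcp:80,10.248.4.147
      # 10.248.4.148,tcp:80,10.248.4.148
      # 10.248.5.62,tcp:80,10.248.5.62
      ```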
