Author: DevOps旭
Source: DevOps探路者
I. A first look at Docker networking
1. Understanding Docker networking from the kernel up
Viewed from the network angle, Docker's technical architecture is built on top of network namespaces. So what exactly is a network namespace?
1.1 What is a network namespace?
A network namespace is a resource the Linux kernel provides for network isolation: created with the CLONE_NEWNET flag, it isolates network devices, the network stack, ports, and related resources. Network namespaces let you create multiple network spaces inside a single operating system, each with its own independent network protocol stack, and administrators can manage them with the ip tool:
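Before turning to the ip tool, the same CLONE_NEWNET isolation can be seen with a one-liner. Here is a minimal sketch using util-linux's unshare, which calls the unshare() syscall with the CLONE_NEWNET flag; the transcript below is illustrative:
<code>
# Spawn a shell inside a brand-new network namespace (CLONE_NEWNET);
# only an unconfigured loopback device exists there.
[root@docker1 ~]# unshare --net bash
[root@docker1 ~]# ip link list
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN mode DEFAULT group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
[root@docker1 ~]# exit    # the namespace is freed once no process holds it
</code>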
<code>
[root@docker1 ~]# ip net help
Usage: ip netns list
       ip netns add NAME
       ip netns set NAME NETNSID
       ip [-all] netns delete [NAME]
       ip netns identify [PID]
       ip netns pids NAME
       ip [-all] netns exec [NAME] cmd ...
       ip netns monitor
       ip netns list-id
</code>
Let's start by creating a namespace named example_net1 with the ip netns add example_net1 command:
<code>
[root@docker1 ~]# ip netns add example_net1
[root@docker1 ~]# ip netns list
example_net1
[root@docker1 ~]# ls /var/run/netns/
example_net1
</code>
This new namespace has its own NICs, ARP table, routing table, and iptables rules, all of which we can inspect with ip netns exec:
<code>
[root@docker1 ~]# ip netns exec example_net1 ip link list
1: lo: mtu 65536 qdisc noop state DOWN mode DEFAULT group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
[root@docker1 ~]# ip netns exec example_net1 route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
[root@docker1 ~]# ip netns exec example_net1 iptables -L
Chain INPUT (policy ACCEPT)
target     prot opt source               destination
Chain FORWARD (policy ACCEPT)
target     prot opt source               destination
Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination
[root@docker1 ~]# ip netns exec example_net1 arp -a
</code>
The experiment above shows that the namespace's ARP table, routing table, and iptables rules are indeed isolated. But how do we verify that the NICs are isolated too? Let's create a veth pair and plug one end into the namespace:
<code>
[root@docker1 ~]# ip link add type veth
[root@docker1 ~]# ip link list
1: lo: mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: ens32: mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
    link/ether 00:0c:29:fc:3f:5f brd ff:ff:ff:ff:ff:ff
3: docker0: mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default
    link/ether 02:42:ad:84:cb:87 brd ff:ff:ff:ff:ff:ff
4: veth0@veth1: mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 86:62:51:08:e8:1b brd ff:ff:ff:ff:ff:ff
5: veth1@veth0: mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 3a:21:26:35:f8:82 brd ff:ff:ff:ff:ff:ff
[root@docker1 ~]# ip link set veth0 netns example_net1
[root@docker1 ~]# ip link list
1: lo: mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: ens32: mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
    link/ether 00:0c:29:fc:3f:5f brd ff:ff:ff:ff:ff:ff
3: docker0: mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default
    link/ether 02:42:ad:84:cb:87 brd ff:ff:ff:ff:ff:ff
5: veth1@if4: mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 3a:21:26:35:f8:82 brd ff:ff:ff:ff:ff:ff link-netnsid 0
[root@docker1 ~]# ip netns exec example_net1 ip link list
1: lo: mtu 65536 qdisc noop state DOWN mode DEFAULT group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
4: veth0@if5: mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 86:62:51:08:e8:1b brd ff:ff:ff:ff:ff:ff link-netnsid 0
[root@docker1 ~]#
</code>
As we can see, after the veth pair is created there are five interfaces; once veth0 is moved into the example_net1 namespace, only four remain on the host, while example_net1 has gained veth0. So NICs are isolated as well.
1.2 How do different network namespaces communicate?
So how does traffic cross namespace boundaries? Let's push the simulation a step further: create a second namespace and move veth1 into it.
<code>
[root@docker1 ~]# ip netns add example_net2
[root@docker1 ~]# ip link set veth1 netns example_net2
</code>
Now let's configure the interfaces:
<code>
[root@docker1 ~]# ip netns exec example_net1 ip link set veth0 up
[root@docker1 ~]# ip netns exec example_net2 ip link set veth1 up
[root@docker1 ~]# ip netns exec example_net1 ip addr add 172.19.0.1/24 dev veth0
[root@docker1 ~]# ip netns exec example_net2 ip addr add 172.19.0.2/24 dev veth1
[root@docker1 ~]# ip netns exec example_net1 ip a
1: lo: mtu 65536 qdisc noop state DOWN group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
4: veth0@if5: mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 86:62:51:08:e8:1b brd ff:ff:ff:ff:ff:ff link-netnsid 1
    inet 172.19.0.1/24 scope global veth0
       valid_lft forever preferred_lft forever
    inet6 fe80::8462:51ff:fe08:e81b/64 scope link
       valid_lft forever preferred_lft forever
[root@docker1 ~]# ip netns exec example_net2 ip a
1: lo: mtu 65536 qdisc noop state DOWN group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
5: veth1@if4: mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 3a:21:26:35:f8:82 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 172.19.0.2/24 scope global veth1
       valid_lft forever preferred_lft forever
    inet6 fe80::3821:26ff:fe35:f882/64 scope link
       valid_lft forever preferred_lft forever
</code>
Now the two namespaces can talk to each other:
<code>
[root@docker1 ~]# ip netns exec example_net1 ping 172.19.0.2
PING 172.19.0.2 (172.19.0.2) 56(84) bytes of data.
64 bytes from 172.19.0.2: icmp_seq=1 ttl=64 time=0.095 ms
64 bytes from 172.19.0.2: icmp_seq=2 ttl=64 time=0.042 ms
64 bytes from 172.19.0.2: icmp_seq=3 ttl=64 time=0.057 ms
^C
--- 172.19.0.2 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2015ms
rtt min/avg/max/mdev = 0.042/0.064/0.095/0.024 ms
[root@docker1 ~]#
</code>
We are now very close to real container networking. So how does docker0, the bridge through which containers on the same host communicate, actually work? Let's simulate it on another machine.
<code>
[root@docker2 ~]# ip netns add ns1
[root@docker2 ~]# ip netns add ns2
[root@docker2 ~]# brctl addbr bridge0
[root@docker2 ~]# ip link set dev bridge0 up
[root@docker2 ~]# ip link add type veth
[root@docker2 ~]# ip link add type veth
[root@docker2 ~]# ip link list
1: lo: mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: ens32: mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
    link/ether 00:0c:29:1b:1e:63 brd ff:ff:ff:ff:ff:ff
3: docker0: mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default
    link/ether 02:42:85:70:1a:f0 brd ff:ff:ff:ff:ff:ff
24: bridge0: mtu 1500 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether 8e:34:ad:1e:80:18 brd ff:ff:ff:ff:ff:ff
25: veth0@veth1: mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether a2:2f:0e:64:d0:7b brd ff:ff:ff:ff:ff:ff
26: veth1@veth0: mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether d6:13:14:1a:d4:42 brd ff:ff:ff:ff:ff:ff
27: veth2@veth3: mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 2e:f1:f1:ae:77:1e brd ff:ff:ff:ff:ff:ff
28: veth3@veth2: mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether da:08:09:21:36:98 brd ff:ff:ff:ff:ff:ff
[root@docker2 ~]# ip link set dev veth0 netns ns1
[root@docker2 ~]# ip netns exec ns1 ip addr add 192.21.0.1/24 dev veth0
[root@docker2 ~]# ip netns exec ns1 ip link set veth0 up
[root@docker2 ~]# ip link set dev veth3 netns ns2
[root@docker2 ~]# ip netns exec ns2 ip addr add 192.21.0.2/24 dev veth3
[root@docker2 ~]# ip netns exec ns2 ip link set veth3 up
[root@docker2 ~]# ip link set dev veth1 master bridge0
[root@docker2 ~]# ip link set dev veth2 master bridge0
[root@docker2 ~]# ip link set dev veth1 up
[root@docker2 ~]# ip link set dev veth2 up
[root@docker2 ~]# ip netns exec ns1 ping 192.21.0.2
PING 192.21.0.2 (192.21.0.2) 56(84) bytes of data.
64 bytes from 192.21.0.2: icmp_seq=1 ttl=64 time=0.075 ms
64 bytes from 192.21.0.2: icmp_seq=2 ttl=64 time=0.064 ms
64 bytes from 192.21.0.2: icmp_seq=3 ttl=64 time=0.077 ms
^C
--- 192.21.0.2 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2005ms
rtt min/avg/max/mdev = 0.064/0.072/0.077/0.005 ms
</code>
This reproduces exactly how containers communicate through the docker0 bridge.
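Once the experiment is done, the scaffolding can be removed. A minimal cleanup sketch, using the names from the transcript above:
<code>
# Deleting a namespace destroys the veth ends inside it, and removing one end
# of a veth pair removes its peer, so only the bridge needs separate cleanup.
[root@docker2 ~]# ip netns delete ns1
[root@docker2 ~]# ip netns delete ns2
[root@docker2 ~]# ip link set dev bridge0 down
[root@docker2 ~]# brctl delbr bridge0
</code>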
This first acquaintance with network namespaces will make Docker networking much easier to follow. Now let's meet Docker's four network models.
2. Docker's network modes
Docker provides four network modes: bridge, host, container, and none. Let's study each of them in turn.
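These modes correspond to the built-in networks Docker creates at install time, which docker network ls will list (the IDs below are illustrative):
<code>
[root@docker1 ~]# docker network ls
NETWORK ID          NAME                DRIVER              SCOPE
9f6f59b1df12        bridge              bridge              local
17fd34e68f67        host                host                local
5a01401aef1e        none                null                local
</code>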
2.1 Bridge mode
Once Docker is installed, it automatically creates a default bridge named docker0, and bridge is also Docker's default network mode: without an explicit --network, every container is given its own network namespace and its own IP. We can observe this directly on the host:
<code>
[root@docker1 ~]# ifconfig
docker0: flags=4099  mtu 1500
        inet 172.17.0.1  netmask 255.255.0.0  broadcast 172.17.255.255
        ether 02:42:f7:b9:1b:10  txqueuelen 0  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

ens32: flags=4163  mtu 1500
        inet 192.168.1.51  netmask 255.255.255.0  broadcast 192.168.1.255
        inet6 fe80::20c:29ff:fefc:3f5f  prefixlen 64  scopeid 0x20
        ether 00:0c:29:fc:3f:5f  txqueuelen 1000  (Ethernet)
        RX packets 145458  bytes 212051596 (202.2 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 9782  bytes 817967 (798.7 KiB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

lo: flags=73  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10
        loop  txqueuelen 1  (Local Loopback)
        RX packets 8  bytes 684 (684.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 8  bytes 684 (684.0 B)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0
</code>
We can see the bridge docker0 that Docker created on the host, with IP 172.17.0.1. docker0 is a standalone Linux bridge; container traffic bound for the outside world is routed through it and then leaves via the host NIC ens32. Next, let's create two containers:
<code>
[root@docker1 ~]# docker run -itd centos:7.2.1511
22db06123b51ad671e2545e5204dec5c40853f60cbe1784519d614fd54fee838
[root@docker1 ~]# docker run -itd centos:7.2.1511
f35b9d78e08f9c708c421913496cd1b0f37e7afd05a6619b2cd0997513d98885
[root@docker1 ~]# docker ps
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES
f35b9d78e08f        centos:7.2.1511     "/bin/bash"         3 seconds ago       Up 2 seconds                            tender_nightingale
22db06123b51        centos:7.2.1511     "/bin/bash"         4 seconds ago       Up 3 seconds                            elastic_darwin
</code>
Now let's look at the interfaces again. What do we find?
<code>
[root@docker1 ~]# ifconfig
docker0: flags=4163  mtu 1500
        inet 172.17.0.1  netmask 255.255.0.0  broadcast 172.17.255.255
        inet6 fe80::42:f7ff:feb9:1b10  prefixlen 64  scopeid 0x20
        ether 02:42:f7:b9:1b:10  txqueuelen 0  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 8  bytes 648 (648.0 B)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

ens32: flags=4163  mtu 1500
        inet 192.168.1.51  netmask 255.255.255.0  broadcast 192.168.1.255
        inet6 fe80::20c:29ff:fefc:3f5f  prefixlen 64  scopeid 0x20
        ether 00:0c:29:fc:3f:5f  txqueuelen 1000  (Ethernet)
        RX packets 145565  bytes 212062634 (202.2 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 9839  bytes 828247 (808.8 KiB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

lo: flags=73  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10
        loop  txqueuelen 1  (Local Loopback)
        RX packets 8  bytes 684 (684.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 8  bytes 684 (684.0 B)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

vethe96ebe7: flags=4163  mtu 1500
        inet6 fe80::fcf9:fbff:fefb:db3b  prefixlen 64  scopeid 0x20
        ether fe:f9:fb:fb:db:3b  txqueuelen 0  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 13  bytes 1038 (1.0 KiB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

vethfdafaf4: flags=4163  mtu 1500
        inet6 fe80::1426:8eff:fe86:1dfb  prefixlen 64  scopeid 0x20
        ether 16:26:8e:86:1d:fb  txqueuelen 0  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 16  bytes 1296 (1.2 KiB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0
</code>
We can clearly see two new interfaces, vethe96ebe7 and vethfdafaf4, and a matching interface (eth0 in a stock container) was created inside each container; each host-side device and its in-container peer make up the veth pair that underpins bridge mode. The veth pair forms a data channel, while the docker0 bridge acts as the gateway.
Containers on the same host find each other via ARP broadcast over docker0 and then exchange traffic across the bridge; for destinations beyond the host, docker0 serves as the containers' default gateway.
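To confirm this topology for yourself, brctl show lists the veth devices attached to docker0, and docker network inspect reveals each container's address; a sketch, with the bridge id and the container addresses illustrative:
<code>
[root@docker1 ~]# brctl show docker0
bridge name     bridge id               STP enabled     interfaces
docker0         8000.0242f7b91b10       no              vethe96ebe7
                                                        vethfdafaf4
# Print each container's name and IP on the default bridge network
[root@docker1 ~]# docker network inspect -f '{{range .Containers}}{{.Name}}: {{.IPv4Address}} {{end}}' bridge
tender_nightingale: 172.17.0.3/16 elastic_darwin: 172.17.0.2/16
</code>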
2.2 Host mode
Host mode shares the host's network, so the container's network configuration is exactly the same as the host's. Let's start a container in host mode:
<code>
[root@docker1 ~]# docker run -itd --network=host centos:7.2.1511
841b66477f9c1fc1cdbcd4377cc691db919f7e02565e03aa8367eb5b0357abeb
[root@docker1 ~]# ifconfig
docker0: flags=4099  mtu 1500
        inet 172.17.0.1  netmask 255.255.0.0  broadcast 172.17.255.255
        inet6 fe80::42:65ff:fead:8ecf  prefixlen 64  scopeid 0x20
        ether 02:42:65:ad:8e:cf  txqueuelen 0  (Ethernet)
        RX packets 18  bytes 19764 (19.3 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 16  bytes 2845 (2.7 KiB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

ens32: flags=4163  mtu 1500
        inet 192.168.1.51  netmask 255.255.255.0  broadcast 192.168.1.255
        inet6 fe80::20c:29ff:fefc:3f5f  prefixlen 64  scopeid 0x20
        ether 00:0c:29:fc:3f:5f  txqueuelen 1000  (Ethernet)
        RX packets 147550  bytes 212300608 (202.4 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 11002  bytes 1116344 (1.0 MiB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

lo: flags=73  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10
        loop  txqueuelen 1  (Local Loopback)
        RX packets 8  bytes 684 (684.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 8  bytes 684 (684.0 B)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0
</code>
We can see that host mode creates no virtual NIC device at all. Let's step inside the container:
<code>
[root@docker1 /]# hostname -i
fe80::20c:29ff:fefc:3f5f%ens32 fe80::42:65ff:fead:8ecf%docker0 192.168.1.51 172.17.0.1
</code>
It is plain to see that the container's IP is the host's IP, and the docker0 address shows up too, because the container shares the host's entire network stack.
Host mode is certainly convenient, but it also has real shortcomings, which we will come back to later; one of them is worth a quick look now.
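A host-mode container binds the host's ports directly, so two services can never share a port. A hedged sketch of this, where the nginx image and the ss output are illustrative:
<code>
# No -p mapping is needed (or possible): the listener appears on the host itself.
[root@docker1 ~]# docker run -d --network=host nginx
[root@docker1 ~]# ss -lntp | grep ':80 '     # run on the host: nginx owns host port 80
LISTEN   0   128   *:80   *:*   users:(("nginx",pid=12345,fd=6))
# A second host-mode nginx would fail to start: port 80 is already taken.
</code>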
2.3 Container mode
In this mode, a newly created container is placed into the network stack of an existing container, sharing its IP and other network resources, and the two communicate over the lo loopback interface. Let's start a container and observe:
<code>
[root@docker2 ~]# docker ps
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES
b7b201d3139f        centos:7.2.1511     "/bin/bash"         28 minutes ago      Up 27 minutes                           jovial_kepler
[root@docker2 ~]# docker run -itd --network=container:b7b201d3139f centos:7.2.1511
c4a6d59a971a7f5d92bfcc22de85853bfd36580bf08272de58cb70bc4898a5f0
[root@docker2 ~]# docker ps
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES
c4a6d59a971a        centos:7.2.1511     "/bin/bash"         2 seconds ago       Up 2 seconds                            confident_shockley
b7b201d3139f        centos:7.2.1511     "/bin/bash"         28 minutes ago      Up 28 minutes                           jovial_kepler
[root@docker2 ~]# ifconfig
docker0: flags=4163  mtu 1500
        inet 172.18.0.1  netmask 255.255.0.0  broadcast 172.18.255.255
        inet6 fe80::42:8ff:feed:83e7  prefixlen 64  scopeid 0x20
        ether 02:42:08:ed:83:e7  txqueuelen 0  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 8  bytes 648 (648.0 B)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

ens32: flags=4163  mtu 1500
        inet 192.168.1.52  netmask 255.255.255.0  broadcast 192.168.1.255
        inet6 fe80::20c:29ff:fe1b:1e63  prefixlen 64  scopeid 0x20
        ether 00:0c:29:1b:1e:63  txqueuelen 1000  (Ethernet)
        RX packets 146713  bytes 212204228 (202.3 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 10150  bytes 947975 (925.7 KiB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

lo: flags=73  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10
        loop  txqueuelen 1  (Local Loopback)
        RX packets 4  bytes 340 (340.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 4  bytes 340 (340.0 B)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

vethb360066: flags=4163  mtu 1500
        inet6 fe80::ca6:62ff:fe45:70e8  prefixlen 64  scopeid 0x20
        ether 0e:a6:62:45:70:e8  txqueuelen 0  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 13  bytes 1038 (1.0 KiB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0
</code>
Here we have two containers but only one veth device on the host. Now let's enter the new container:
<code>
[root@docker2 ~]# docker exec -it c4a6d59a971a bash
[root@b7b201d3139f /]# hostname -i
172.18.0.3
</code>
We can see clearly that the two containers share one set of network resources; notice the shell prompt even reads b7b201d3139f, because the joined container inherits the network container's hostname as well.
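Cross-checking from the other direction confirms it; a sketch whose outputs match the session above (the eth0 line is illustrative):
<code>
[root@docker2 ~]# docker exec b7b201d3139f hostname -i
172.18.0.3
[root@docker2 ~]# docker exec c4a6d59a971a ip a | grep 'inet '
    inet 127.0.0.1/8 scope host lo
    inet 172.18.0.3/16 scope global eth0
</code>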
2.4 None mode
With --network=none, the container gets an isolated namespace with no configuration whatsoever. This mode is for administrators who have their own ideas about network configuration and are prepared to build the network stack by hand to suit their needs.
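As a minimal sketch of what "configure it yourself" means here (the container name net_none is hypothetical), the container's namespace can be exposed to the ip tool and wired by hand:
<code>
[root@docker1 ~]# docker run -itd --network=none --name net_none centos:7.2.1511
[root@docker1 ~]# pid=$(docker inspect -f '{{.State.Pid}}' net_none)
[root@docker1 ~]# mkdir -p /var/run/netns
# Expose the container's namespace so `ip netns` can address it
[root@docker1 ~]# ln -s /proc/$pid/ns/net /var/run/netns/net_none
[root@docker1 ~]# ip netns exec net_none ip link list    # only lo: nothing is configured
# From here, veth pairs, addresses, and routes can be added exactly as in section 1.2
</code>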
With all of the above, we now have a first working picture of Docker networking. Next we turn to Docker cluster networking solutions.
II. Going deeper: Docker cluster networking solutions
1. Communicating beyond the host
Generally speaking, containers can reach the outside world as long as ip_forward is enabled on the Linux host and the SNAT/MASQUERADE rules are in place: Linux forwarding relies on ip_forward, and every packet leaving the container subnet passes through a MASQUERADE rule that rewrites its source address to the host's.
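Both preconditions are easy to verify on the host; the MASQUERADE rule shown below is the one Docker installs for the default 172.17.0.0/16 subnet (your exact rules and subnet may differ):
<code>
[root@docker1 ~]# sysctl net.ipv4.ip_forward            # 1 means forwarding is on
net.ipv4.ip_forward = 1
[root@docker1 ~]# iptables -t nat -S POSTROUTING | grep MASQUERADE
-A POSTROUTING -s 172.17.0.0/16 ! -o docker0 -j MASQUERADE
</code>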
We also need to pay attention to how DNS and the hostname are handled:
/etc/resolv.conf defaults to a copy of the host's /etc/resolv.conf at container creation time
/etc/hosts records some of the container's own addresses and names
/etc/hostname records the container's hostname
Edits to these three files inside the container take effect immediately but are lost on restart, so DNS should instead be set with --dns=address, and the hostname likewise (see the example below).
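A brief example of both flags (the DNS address and hostname are placeholders):
<code>
# Persisted in the container's config, so it survives restarts,
# unlike hand-edits to /etc/resolv.conf or /etc/hostname.
[root@docker1 ~]# docker run -itd --dns=114.114.114.114 --hostname=web01 centos:7.2.1511
</code>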
2. Cluster networking: the tunnel solution
The tunnel solution, i.e. an overlay network, works over almost any underlying network infrastructure; the only requirement is IP connectivity between hosts. The price is double encapsulation, which costs performance and makes troubleshooting noticeably harder.
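Here is a minimal sketch of the idea behind such tunnels, using a VXLAN device built with plain iproute2 (the VNI, peer address, and device names are illustrative): every frame entering vxlan0 is wrapped in an outer UDP/IP header addressed to the peer host, which strips it again; that extra header is exactly the second encapsulation mentioned above.
<code>
# On docker1: tunnel traffic to docker2 (192.168.1.52) over UDP port 4789
[root@docker1 ~]# ip link add vxlan0 type vxlan id 42 dstport 4789 \
      remote 192.168.1.52 dev ens32
[root@docker1 ~]# ip link set vxlan0 up
# A mirror-image device on docker2 (remote 192.168.1.51) completes the tunnel;
# attaching vxlan0 to a bridge then extends that bridge across both hosts.
</code>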