深度解讀 Kubernetes CRI 容器運行時接口

簡介: Kubernetes提供了多種容器開放接口用於對接不同的後端來提供資源,如提供計算資源的容器運行時接口(Container Runtime Interface, CRI),提供網絡資源的容器網絡接口(Container Network Interface, CNI),提供提供存儲資源的容器存儲接口(Container Storage Interface, CSI)。這篇作為這系列的開篇,主要介紹了kubelet的CRI接口實現。


CRI 簡介

在 Kubernetes1.5 之前 Docker 作為第一個容器運行時,Kubelet 通過內嵌 dockershim 操作容器API,但隨著越來越多的容器運行時的希望加入kubelet,社區開始有人提出通過“加入一個client/server接口來抽象容器運行時”的ISSUE。在 v1.6.0 後, Kubernetes 開始默認啟用 CRI(容器運行時接口),下圖是容器運行時在 kubernets 中得作用。

深度解讀 Kubernetes CRI 容器運行時接口

圖1. kubernets操作數據流圖


CRI 架構介紹

以下主要介紹Kubernetes1.18版的CRI

CRI 為 kubelet 提供一套抽象的容器調度的接口,CRI 主要承接 kubelet 對容器的操作。CRI 得通信協議是 gRPC,當時主要考慮到性能問題。加入 CRI 後 kubelet 得架構如下圖所示。

深度解讀 Kubernetes CRI 容器運行時接口

圖2. kubelet 架構

Kubelet 現在主要包含兩個運行時的模塊,一個是 dockershim, 一個是 remote。dockershim 是原來的提供Docker的運行時接口(PS: docker果然還是一等公民:)。remote包對應的就是 CRI 接口,採用gRPC,通過 RemoteRuntime 和 CRI RuntimeService相連:

<code>...
// createAndStartFakeRemoteRuntime creates and starts fakeremote.RemoteRuntime.
// It returns the RemoteRuntime, endpoint on success.
// Users should call fakeRuntime.Stop() to cleanup the server.
func createAndStartFakeRemoteRuntime(t *testing.T) (*fakeremote.RemoteRuntime, string) {
endpoint, err := fakeremote.GenerateEndpoint()
require.NoError(t, err)

fakeRuntime := fakeremote.NewFakeRemoteRuntime()
fakeRuntime.Start(endpoint)

return fakeRuntime, endpoint
}

func createRemoteRuntimeService(endpoint string, t *testing.T) internalapi.RuntimeService {
runtimeService, err := NewRemoteRuntimeService(endpoint, defaultConnectionTimeout)
require.NoError(t, err)

return runtimeService
}

func TestVersion(t *testing.T) {
fakeRuntime, endpoint := createAndStartFakeRemoteRuntime(t)
defer fakeRuntime.Stop()

r := createRemoteRuntimeService(endpoint, t)
version, err := r.Version(apitest.FakeVersion)
assert.NoError(t, err)
assert.Equal(t, apitest.FakeVersion, version.Version)
assert.Equal(t, apitest.FakeRuntimeName, version.RuntimeName)
}/<code>

kubernetes remote代碼地址:https://github.com/kubernetes/kubernetes/tree/release-1.18/pkg/kubelet/remote


CRI 容器運行時的三類行為

CRI 容器運行時主要描述了三種服務的行為 Sandbox、Container、Image:

深度解讀 Kubernetes CRI 容器運行時接口

圖3. CRI容器運行時流程


Sandbox:

<code>// PodSandboxManager contains methods for operating on PodSandboxes. The methods
// are thread-safe.
type PodSandboxManager interface {
// RunPodSandbox creates and starts a pod-level sandbox. Runtimes should ensure
// the sandbox is in ready state.
RunPodSandbox(config *runtimeapi.PodSandboxConfig, runtimeHandler string) (string, error)
// StopPodSandbox stops the sandbox. If there are any running containers in the

// sandbox, they should be force terminated.
StopPodSandbox(podSandboxID string) error
// RemovePodSandbox removes the sandbox. If there are running containers in the
// sandbox, they should be forcibly removed.
RemovePodSandbox(podSandboxID string) error
// PodSandboxStatus returns the Status of the PodSandbox.
PodSandboxStatus(podSandboxID string) (*runtimeapi.PodSandboxStatus, error)
// ListPodSandbox returns a list of Sandbox.
ListPodSandbox(filter *runtimeapi.PodSandboxFilter) ([]*runtimeapi.PodSandbox, error)
// PortForward prepares a streaming endpoint to forward ports from a PodSandbox, and returns the address.
PortForward(*runtimeapi.PortForwardRequest) (*runtimeapi.PortForwardResponse, error)
}/<code>

Container:

<code>// ContainerManager contains methods to manipulate containers managed by a
// container runtime. The methods are thread-safe.
type ContainerManager interface {
// CreateContainer creates a new container in specified PodSandbox.
CreateContainer(podSandboxID string, config *runtimeapi.ContainerConfig, sandboxConfig *runtimeapi.PodSandboxConfig) (string, error)
// StartContainer starts the container.
StartContainer(containerID string) error
// StopContainer stops a running container with a grace period (i.e., timeout).
StopContainer(containerID string, timeout int64) error
// RemoveContainer removes the container.
RemoveContainer(containerID string) error
// ListContainers lists all containers by filters.
ListContainers(filter *runtimeapi.ContainerFilter) ([]*runtimeapi.Container, error)
// ContainerStatus returns the status of the container.
ContainerStatus(containerID string) (*runtimeapi.ContainerStatus, error)
// UpdateContainerResources updates the cgroup resources for the container.
UpdateContainerResources(containerID string, resources *runtimeapi.LinuxContainerResources) error
// ExecSync executes a command in the container, and returns the stdout output.
// If command exits with a non-zero exit code, an error is returned.
ExecSync(containerID string, cmd []string, timeout time.Duration) (stdout []byte, stderr []byte, err error)
// Exec prepares a streaming endpoint to execute a command in the container, and returns the address.
Exec(*runtimeapi.ExecRequest) (*runtimeapi.ExecResponse, error)
// Attach prepares a streaming endpoint to attach to a running container, and returns the address.
Attach(req *runtimeapi.AttachRequest) (*runtimeapi.AttachResponse, error)
// ReopenContainerLog asks runtime to reopen the stdout/stderr log file
// for the container. If it returns error, new container log file MUST NOT
// be created.
ReopenContainerLog(ContainerID string) error
}/<code>

Image:

<code>// ImageManagerService interface should be implemented by a container image
// manager.
// The methods should be thread-safe.
type ImageManagerService interface {
// ListImages lists the existing images.
ListImages(filter *runtimeapi.ImageFilter) ([]*runtimeapi.Image, error)
// ImageStatus returns the status of the image.
ImageStatus(image *runtimeapi.ImageSpec) (*runtimeapi.Image, error)
// PullImage pulls an image with the authentication config.
PullImage(image *runtimeapi.ImageSpec, auth *runtimeapi.AuthConfig, podSandboxConfig *runtimeapi.PodSandboxConfig) (string, error)
// RemoveImage removes the image.
RemoveImage(image *runtimeapi.ImageSpec) error
// ImageFsInfo returns information of the filesystem that is used to store images.
ImageFsInfo() ([]*runtimeapi.FilesystemUsage, error)
}/<code>

CRI 容器生命週期操作流程

kubelet創建一個Pod主要可以拆解成:

  1. 調用RunPodSandox創建一個pod沙盒
  2. 調用CreateContainer創建一個容器
  3. 調用StartContainer啟動一個容器
深度解讀 Kubernetes CRI 容器運行時接口

圖4. 容器生命週期操作流程


CRI Streaming接口

streaming接口主要是用於執行 exec 命令,exec 命令主要用於 attach 容器進行交互,通過流式接口的可以節省資源,提高連接的可靠性。kubelet 調用 Exec() 接口發給 RuntimeService ,RuntimeService 返回一個 url 給到 apiserver, 讓 apiserver 跟 Stream Server 直接建立連接,獲取流式數據。 由於繞過了kubelet,因此Stream Server 也提高連接的可靠性CRI 中 Exec() 接口:

<code>// ContainerManager contains methods to manipulate containers managed by a
// container runtime. The methods are thread-safe.
type ContainerManager interface {
...
// Exec prepares a streaming endpoint to execute a command in the container, and returns the address.
Exec(*runtimeapi.ExecRequest) (*runtimeapi.ExecResponse, error)
// Attach prepares a streaming endpoint to attach to a running container, and returns the address.
Attach(req *runtimeapi.AttachRequest) (*runtimeapi.AttachResponse, error)
...
}/<code>
深度解讀 Kubernetes CRI 容器運行時接口

圖5. streaming數據流


CRI proto接口定義

CRI 是一個為kubelet提供的一個廣泛的容器運行時的無需編譯的接口插件。 CRI 包含了一個 protocol buffers 和 gRPC API。kubernetes1.18的 CRI 代碼路徑:kubernetes/staging/src/k8s.io/cri-api/。CRI中定義了容器和鏡像的服務的接口,因為容器運行時與鏡像的生命週期是彼此隔離的,因此需要定義兩個服務 RuntimeService 和 ImageService。

RuntimeService的proto接口定義文件

<code>// Runtime service defines the public APIs for remote container runtimes
service RuntimeService {
// Version returns the runtime name, runtime version, and runtime API version.
rpc Version(VersionRequest) returns (VersionResponse) {}

// RunPodSandbox creates and starts a pod-level sandbox. Runtimes must ensure
// the sandbox is in the ready state on success.
rpc RunPodSandbox(RunPodSandboxRequest) returns (RunPodSandboxResponse) {}
// StopPodSandbox stops any running process that is part of the sandbox and
// reclaims network resources (e.g., IP addresses) allocated to the sandbox.
// If there are any running containers in the sandbox, they must be forcibly
// terminated.
// This call is idempotent, and must not return an error if all relevant
// resources have already been reclaimed. kubelet will call StopPodSandbox
// at least once before calling RemovePodSandbox. It will also attempt to
// reclaim resources eagerly, as soon as a sandbox is not needed. Hence,
// multiple StopPodSandbox calls are expected.
rpc StopPodSandbox(StopPodSandboxRequest) returns (StopPodSandboxResponse) {}
// RemovePodSandbox removes the sandbox. If there are any running containers
// in the sandbox, they must be forcibly terminated and removed.
// This call is idempotent, and must not return an error if the sandbox has
// already been removed.
rpc RemovePodSandbox(RemovePodSandboxRequest) returns (RemovePodSandboxResponse) {}
// PodSandboxStatus returns the status of the PodSandbox. If the PodSandbox is not
// present, returns an error.
rpc PodSandboxStatus(PodSandboxStatusRequest) returns (PodSandboxStatusResponse) {}
// ListPodSandbox returns a list of PodSandboxes.
rpc ListPodSandbox(ListPodSandboxRequest) returns (ListPodSandboxResponse) {}


// CreateContainer creates a new container in specified PodSandbox
rpc CreateContainer(CreateContainerRequest) returns (CreateContainerResponse) {}
// StartContainer starts the container.
rpc StartContainer(StartContainerRequest) returns (StartContainerResponse) {}
// StopContainer stops a running container with a grace period (i.e., timeout).
// This call is idempotent, and must not return an error if the container has
// already been stopped.
// TODO: what must the runtime do after the grace period is reached?
rpc StopContainer(StopContainerRequest) returns (StopContainerResponse) {}
// RemoveContainer removes the container. If the container is running, the
// container must be forcibly removed.
// This call is idempotent, and must not return an error if the container has
// already been removed.
rpc RemoveContainer(RemoveContainerRequest) returns (RemoveContainerResponse) {}
// ListContainers lists all containers by filters.
rpc ListContainers(ListContainersRequest) returns (ListContainersResponse) {}
// ContainerStatus returns status of the container. If the container is not
// present, returns an error.
rpc ContainerStatus(ContainerStatusRequest) returns (ContainerStatusResponse) {}
// UpdateContainerResources updates ContainerConfig of the container.
rpc UpdateContainerResources(UpdateContainerResourcesRequest) returns (UpdateContainerResourcesResponse) {}
// ReopenContainerLog asks runtime to reopen the stdout/stderr log file
// for the container. This is often called after the log file has been
// rotated. If the container is not running, container runtime can choose
// to either create a new log file and return nil, or return an error.
// Once it returns error, new container log file MUST NOT be created.
rpc ReopenContainerLog(ReopenContainerLogRequest) returns (ReopenContainerLogResponse) {}

// ExecSync runs a command in a container synchronously.
rpc ExecSync(ExecSyncRequest) returns (ExecSyncResponse) {}
// Exec prepares a streaming endpoint to execute a command in the container.
rpc Exec(ExecRequest) returns (ExecResponse) {}
// Attach prepares a streaming endpoint to attach to a running container.
rpc Attach(AttachRequest) returns (AttachResponse) {}
// PortForward prepares a streaming endpoint to forward ports from a PodSandbox.
rpc PortForward(PortForwardRequest) returns (PortForwardResponse) {}

// ContainerStats returns stats of the container. If the container does not
// exist, the call returns an error.
rpc ContainerStats(ContainerStatsRequest) returns (ContainerStatsResponse) {}
// ListContainerStats returns stats of all running containers.
rpc ListContainerStats(ListContainerStatsRequest) returns (ListContainerStatsResponse) {}

// UpdateRuntimeConfig updates the runtime configuration based on the given request.
rpc UpdateRuntimeConfig(UpdateRuntimeConfigRequest) returns (UpdateRuntimeConfigResponse) {}

// Status returns the status of the runtime.
rpc Status(StatusRequest) returns (StatusResponse) {}
}/<code>

ImageService 的 proto 接口定義文件:

<code>// ImageService defines the public APIs for managing images.
service ImageService {
// ListImages lists existing images.
rpc ListImages(ListImagesRequest) returns (ListImagesResponse) {}
// ImageStatus returns the status of the image. If the image is not
// present, returns a response with ImageStatusResponse.Image set to
// nil.
rpc ImageStatus(ImageStatusRequest) returns (ImageStatusResponse) {}
// PullImage pulls an image with authentication config.
rpc PullImage(PullImageRequest) returns (PullImageResponse) {}
// RemoveImage removes the image.
// This call is idempotent, and must not return an error if the image has
// already been removed.
rpc RemoveImage(RemoveImageRequest) returns (RemoveImageResponse) {}
// ImageFSInfo returns information of the filesystem that is used to store images.
rpc ImageFsInfo(ImageFsInfoRequest) returns (ImageFsInfoResponse) {}
}/<code>

CRI工具介紹

  • CRI命令工具:crictl,幫助用戶和開發者調試容器問題
  • CRI測試工具:critest,用於驗證CRI接口的測試工具,驗證是否滿足Kubelet要求。

crictl 安裝:

<code>VERSION="v1.17.0"
wget https://github.com/kubernetes-sigs/cri-tools/releases/download/$VERSION/crictl-$VERSION-linux-amd64.tar.gz
sudo tar zxvf crictl-$VERSION-linux-amd64.tar.gz -C /usr/local/bin
rm -f crictl-$VERSION-linux-amd64.tar.gz/<code>

critest 安裝:

<code>VERSION="v1.17.0"
wget https://github.com/kubernetes-sigs/cri-tools/releases/download/$VERSION/critest-$VERSION-linux-amd64.tar.gz
sudo tar zxvf critest-$VERSION-linux-amd64.tar.gz -C /usr/local/bin
rm -f critest-$VERSION-linux-amd64.tar.gz/<code>

參考文獻

- Introducing Container Runtime Interface (CRI) in Kubernetes

- CNCF x Alibaba 雲原生技術公開課

- cri-tools

- shikanon.com


分享到:


相關文章: