Implementing Prometheus Federation in k8s with Consul



Contents

1. Background

2. Architecture

3. Deployment

4. Verification


Background

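Our project needed the platform's monitoring data to be stored and displayed in one place. Native Prometheus federation requires the federated instances to be configured statically, so new Prometheus instances cannot be added dynamically. After some discussion we decided to use consul to get dynamic configuration: a newly added Prometheus only needs to register itself in consul, and the Prometheus responsible for storage and display only needs to fetch the list of Prometheus instances from consul.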


Architecture

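To make the description easier, let's first give the participating Prometheus instances roles. The Prometheus responsible for central storage and display is called the Data Center, and the Prometheus in each environment is called a Data Node. The overall architecture looks like this: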

[Figure: overall architecture]

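As the figure shows, the Data Center uses Prometheus-Operator. I won't go over the advantages of Prometheus-Operator here; see its official documentation. The Data Nodes in the figure all run plain Prometheus (without the operator).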

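  • Deploy consul (it can run in any k8s cluster, as long as the Data Nodes can reach its address when they register)
  • Deploy the Data Nodes and register them in consul
  • Deploy Prometheus-Operator and have it discover the registered Prometheus instances from consul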
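The discovery side of this design boils down to asking consul for the instances of one service name and turning each into a host:port target, which Prometheus's consul_sd_configs does for us later. The sketch below imitates that step on a hypothetical response from consul's catalog endpoint (GET /v1/catalog/service/<name>); the node names and addresses are made up, and the sketch ignores tags, health status and IPv6 bracketing.

```python
import json

# Hypothetical response of GET /v1/catalog/service/wisecloud-agent-prometheus
# for two registered Data Nodes; all addresses are made up.
SAMPLE_CATALOG = """
[
  {"Node": "node-a", "Address": "10.0.0.11",
   "ServiceName": "wisecloud-agent-prometheus",
   "ServiceAddress": "", "ServicePort": 9090},
  {"Node": "node-b", "Address": "10.0.0.12",
   "ServiceName": "wisecloud-agent-prometheus",
   "ServiceAddress": "10.0.1.12", "ServicePort": 9090}
]
"""


def targets_from_catalog(catalog_json: str) -> list:
    """Turn a consul catalog response into host:port scrape targets.

    Follows the same address rule Prometheus' consul discovery uses:
    prefer the service address, fall back to the node address when empty.
    IPv4 only: IPv6 hosts would need [brackets].
    """
    targets = []
    for entry in json.loads(catalog_json):
        host = entry.get("ServiceAddress") or entry["Address"]
        targets.append(f"{host}:{entry['ServicePort']}")
    return targets


if __name__ == "__main__":
    print(targets_from_catalog(SAMPLE_CATALOG))
```

Adding a Data Node therefore only changes consul's answer; the Data Center configuration stays the same.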

Note: having run into this ourselves, we recommend Prometheus-Operator v0.24.0 or later, because operator versions before v0.24.0 parse additionalScrapeConfigs incorrectly.


Deployment

  • Deploy the Data Nodes (prometheus.yaml)
apiVersion: v1
kind: "Service"
metadata:
  name: prometheus
  namespace: wisecloud-agent
  labels:
    name: prometheus-service
spec:
  ports:
  - name: prometheus-server
    protocol: TCP
    port: 9090
    targetPort: 9090
    nodePort: 32001
  selector:
    app: prometheus-server
  type: NodePort
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    name: prometheus
    app: prometheus-server
    com.wise2c.service: prometheus
    com.wise2c.stack: wisecloud-agent
  name: prometheus
  namespace: wisecloud-agent
spec:
  replicas: 1
  # apps/v1 requires an explicit selector matching the pod template labels
  selector:
    matchLabels:
      app: prometheus-server
  template:
    metadata:
      labels:
        name: prometheus
        com.wise2c.service: prometheus
        com.wise2c.stack: wisecloud-agent
        app: prometheus-server
    spec:
      hostNetwork: true
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: io.wise2c.service.prometheus
                operator: Exists
      serviceAccountName: wisecloud-agent
      containers:
      # sidecar that registers this Prometheus in consul
      - name: service-register
        image: registry.cn-hangzhou.aliyuncs.com/tder/service-register
        imagePullPolicy: IfNotPresent
        env:
        - name: CONSUL_URL
          value: 192.168.5.11:8500
        - name: LISTEN_PORT
          value: "8089"
        - name: SERVICE_NAME
          value: wisecloud-agent-prometheus
        - name: SERVICE_PORT
          value: "9090"
        - name: SERVICE_HEALTH_CHECK_PATH
          value: "/health"
      - name: prometheus
        image: prom/prometheus:v1.7.1
        imagePullPolicy: IfNotPresent
        command:
        - "/bin/prometheus"
        args:
        - "-config.file=/etc/prometheus/prometheus.yml"
        - "-storage.local.path=/prometheus"
        - "-storage.local.retention=180h"
        ports:
        - containerPort: 9090
          protocol: TCP
        volumeMounts:
        - name: data
          mountPath: "/prometheus"
        - name: config-volume
          mountPath: "/etc/prometheus"
        - name: alert-roles
          mountPath: "/etc/prometheus-rules"
        - name: local-timezone
          mountPath: "/etc/localtime"
        resources:
          requests:
            cpu: 100m
            memory: 100Mi
          limits:
            cpu: 500m
            memory: 2500Mi
      tolerations:
      - key: "node-role.kubernetes.io/master"
        operator: "Exists"
        effect: "NoSchedule"
      volumes:
      - name: data
        hostPath:
          path: /prometheus
      - name: alert-roles
        hostPath:
          path: /etc/prometheus-rules
      - name: config-volume
        configMap:
          name: prometheus-config
      - name: local-timezone
        hostPath:
          path: /etc/localtime
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
  namespace: wisecloud-agent
data:
  prometheus.yml: |
    global:
      scrape_interval: 30s
      scrape_timeout: 10s
      evaluation_interval: 30s
      external_labels:
        environment: k8s_test
    rule_files:
    - "/etc/prometheus-rules/*.rules"
    scrape_configs:
    - job_name: 'cadvisor'
      honor_labels: true
      metrics_path: /metrics/cadvisor
      scheme: https
      kubernetes_sd_configs:
      - role: endpoints
        namespaces:
          names:
          - kube-system
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      tls_config:
        insecure_skip_verify: true
      relabel_configs:
      # keep only endpoints of services labelled k8s-app=kubelet
      - source_labels: [__meta_kubernetes_service_label_k8s_app]
        separator: ;
        regex: kubelet
        replacement: $1
        action: keep
      # ...and only their https-metrics port
      - source_labels: [__meta_kubernetes_endpoint_port_name]
        separator: ;
        regex: https-metrics
        replacement: $1
        action: keep
      - source_labels: [__meta_kubernetes_namespace]
        separator: ;
        regex: (.*)
        target_label: namespace
        replacement: $1
        action: replace
      - source_labels: [__meta_kubernetes_pod_name]
        separator: ;
        regex: (.*)
        target_label: pod
        replacement: $1
        action: replace
      - source_labels: [__meta_kubernetes_service_name]
        separator: ;
        regex: (.*)
        target_label: service
        replacement: $1
        action: replace
      # job = service name, then overridden by the k8s-app label (job="kubelet")
      - source_labels: [__meta_kubernetes_service_name]
        separator: ;
        regex: (.*)
        target_label: job
        replacement: ${1}
        action: replace
      - source_labels: [__meta_kubernetes_service_label_k8s_app]
        separator: ;
        regex: (.+)
        target_label: job
        replacement: ${1}
        action: replace
      - separator: ;
        regex: (.*)
        target_label: endpoint
        replacement: https-metrics
        action: replace
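The relabel_configs above decide which kubelet endpoints are scraped and which labels they end up with; in particular they set job to the value of the k8s-app service label, which is why the Data Center later federates {job="kubelet"}. Below is a simplified Python model of only the keep and replace actions, meant to illustrate their semantics (fully anchored regex, $1 / ${1} expansion, an empty result deleting the label). It is not Prometheus's implementation, and the sample discovery metadata is hypothetical.

```python
from __future__ import annotations

import re

# Simplified copy of the cadvisor job's relabel_configs ("separator: ;" is the default here).
RULES = [
    {"source_labels": ["__meta_kubernetes_service_label_k8s_app"], "regex": "kubelet", "action": "keep"},
    {"source_labels": ["__meta_kubernetes_endpoint_port_name"], "regex": "https-metrics", "action": "keep"},
    {"source_labels": ["__meta_kubernetes_namespace"], "regex": "(.*)", "target_label": "namespace", "replacement": "$1", "action": "replace"},
    {"source_labels": ["__meta_kubernetes_pod_name"], "regex": "(.*)", "target_label": "pod", "replacement": "$1", "action": "replace"},
    {"source_labels": ["__meta_kubernetes_service_name"], "regex": "(.*)", "target_label": "service", "replacement": "$1", "action": "replace"},
    {"source_labels": ["__meta_kubernetes_service_name"], "regex": "(.*)", "target_label": "job", "replacement": "${1}", "action": "replace"},
    {"source_labels": ["__meta_kubernetes_service_label_k8s_app"], "regex": "(.+)", "target_label": "job", "replacement": "${1}", "action": "replace"},
    {"source_labels": [], "regex": "(.*)", "target_label": "endpoint", "replacement": "https-metrics", "action": "replace"},
]

_REF = re.compile(r"\$\{(\d+)\}|\$(\d+)")


def expand(replacement: str, match: re.Match) -> str:
    """Expand $1 / ${1} group references (numeric groups only)."""
    def sub(ref: re.Match) -> str:
        idx = int(ref.group(1) or ref.group(2))
        if idx > match.re.groups:
            return ""  # unknown group expands to nothing
        return match.group(idx) or ""
    return _REF.sub(sub, replacement)


def relabel(labels: dict[str, str], rules: list[dict]) -> dict[str, str] | None:
    """Apply keep/replace rules in order; None means the target is dropped."""
    labels = dict(labels)
    for rule in rules:
        value = rule.get("separator", ";").join(
            labels.get(name, "") for name in rule["source_labels"]
        )
        # The regex must match the whole value (Prometheus anchors it).
        match = re.fullmatch(rule["regex"], value)
        if rule["action"] == "keep":
            if match is None:
                return None
        elif rule["action"] == "replace" and match is not None:
            result = expand(rule.get("replacement", "$1"), match)
            if result:
                labels[rule["target_label"]] = result
            else:
                # An empty result removes the label instead of setting "".
                labels.pop(rule["target_label"], None)
    return labels


# Hypothetical discovered metadata for two ports of the kubelet service.
HTTPS_PORT = {
    "__address__": "10.0.0.11:10250",
    "__meta_kubernetes_namespace": "kube-system",
    "__meta_kubernetes_service_name": "kubelet",
    "__meta_kubernetes_service_label_k8s_app": "kubelet",
    "__meta_kubernetes_endpoint_port_name": "https-metrics",
}
HTTP_PORT = dict(HTTPS_PORT, __meta_kubernetes_endpoint_port_name="http-metrics")

if __name__ == "__main__":
    print(relabel(HTTPS_PORT, RULES))  # kept, with job=kubelet and endpoint=https-metrics
    print(relabel(HTTP_PORT, RULES))   # dropped by the second keep rule
```

Because kubelet endpoints have no pod behind them, the pod rule produces an empty value and the label is simply not set.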


Here are the important parts of the YAML above.

- name: service-register
  image: registry.cn-hangzhou.aliyuncs.com/tder/service-register
  imagePullPolicy: IfNotPresent
  env:
  - name: CONSUL_URL
    value: 192.168.5.11:8500
  - name: LISTEN_PORT
    value: "8089"
  - name: SERVICE_NAME
    value: wisecloud-agent-prometheus
  - name: SERVICE_PORT
    value: "9090"
  - name: SERVICE_HEALTH_CHECK_PATH
    value: "/health"

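If these parameters are wrong, the instance may fail to register with consul. The ConfigMap above holds the Prometheus scrape configuration, which you can adapt to your needs. Note that it uses the cadvisor exporter built into k8s (the kubelet's /metrics/cadvisor endpoint); if you deploy cadvisor yourself, adjust the data section of the ConfigMap accordingly.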
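The article doesn't describe how the service-register image works internally, so the following is only a hypothetical illustration of the job such a sidecar performs: turning these environment variables into a registration request for consul's agent API (PUT /v1/agent/service/register) with an HTTP health check. HOST_IP, the ID format and the 10s check interval are my assumptions, not settings of that image; LISTEN_PORT is left out because its role isn't described here.

```python
import json
import urllib.request

# Values copied from the service-register container above, plus HOST_IP,
# which is an assumption: with hostNetwork: true the node IP is the address
# other clusters can reach this Prometheus on.
SAMPLE_ENV = {
    "CONSUL_URL": "192.168.5.11:8500",
    "SERVICE_NAME": "wisecloud-agent-prometheus",
    "SERVICE_PORT": "9090",
    "SERVICE_HEALTH_CHECK_PATH": "/health",
    "HOST_IP": "10.0.0.11",  # hypothetical
}


def build_registration(env: dict) -> tuple:
    """Return (url, payload) for consul's PUT /v1/agent/service/register."""
    host = env["HOST_IP"]
    port = int(env["SERVICE_PORT"])
    payload = {
        # ID must be unique per instance (format is hypothetical). Name is
        # shared by all Data Nodes and must equal the services entry of the
        # Data Center's consul_sd_configs, otherwise nothing is discovered.
        "ID": f"{env['SERVICE_NAME']}-{host}",
        "Name": env["SERVICE_NAME"],
        "Address": host,
        "Port": port,
        "Check": {
            "HTTP": f"http://{host}:{port}{env['SERVICE_HEALTH_CHECK_PATH']}",
            "Interval": "10s",  # assumed interval
        },
    }
    url = f"http://{env['CONSUL_URL']}/v1/agent/service/register"
    return url, payload


def register(env: dict) -> int:
    """Send the registration to consul and return the HTTP status code."""
    url, payload = build_registration(env)
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(req, timeout=5) as resp:
        return resp.status


if __name__ == "__main__":
    url, payload = build_registration(SAMPLE_ENV)
    print(url)
    print(json.dumps(payload, indent=2))
```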

Deploy the YAML:

kubectl apply -f prometheus.yaml -n xxx

The result looks like this:

[Figure: Data Node after deployment]


  • Deploy the Data Center. To simplify deploying Prometheus-Operator you can use https://github.com/alanpeng/install-prometheus-operator (stars welcome). After startup it looks like this:
[Figure: Prometheus-Operator components after startup]

For easier debugging, you can expose the grafana and prometheus Services in the svc list as NodePort.

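That completes the Prometheus-Operator deployment. Next we configure its dynamic discovery. Prometheus-Operator offers ServiceMonitor objects to generate the scrape configuration Prometheus needs, but they do not handle complex scenarios well, so the operator provides the additionalScrapeConfigs property. To use additionalScrapeConfigs, the deployed Prometheus-Operator needs the following changes: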

  • Create the secret (additional.yaml)


apiVersion: v1
data:
  prometheus-additional.yaml: LSBqb2JfbmFtZTogJ2ZlZGVyYXRlJwogIHNjcmFwZV9pbnRlcnZhbDogMTVzCgogIGhvbm9yX2xhYmVsczogdHJ1ZQogIG1ldHJpY3NfcGF0aDogJy9mZWRlcmF0ZScKCiAgcGFyYW1zOgogICAgJ21hdGNoW10nOgogICAgLSAne2pvYj0iazhzLWFwaS1leHBvcnRlciJ9JwogICAgLSAne2pvYj0ia3ViZS1zdGF0ZS1tZXRyaWNzIn0nCiAgICAtICd7am9iPSJrdWJlbGV0In0nCgogIGNvbnN1bF9zZF9jb25maWdzOgogICAgICAtIHNlcnZlcjogeHgueHgueHgueHg6ODUwMAogICAgICAgIHNlcnZpY2VzOiBbIndpc2VjbG91ZC1hZ2VudC1wcm9tZXRoZXVzIl0=
kind: Secret
metadata:
  labels:
    managed-by: prometheus-operator
  name: prometheus-k8s-new
  namespace: monitoring
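Every value under data in a Kubernetes Secret is base64-encoded, which is why the scrape config is unreadable in additional.yaml. A minimal Python sketch for decoding the value and re-encoding an edited config; the value below is copied verbatim from the secret, and xx.xx.xx.xx inside it is the article's own placeholder for the consul address.

```python
import base64

# The prometheus-additional.yaml value from the secret, copied verbatim.
SECRET_VALUE = "LSBqb2JfbmFtZTogJ2ZlZGVyYXRlJwogIHNjcmFwZV9pbnRlcnZhbDogMTVzCgogIGhvbm9yX2xhYmVsczogdHJ1ZQogIG1ldHJpY3NfcGF0aDogJy9mZWRlcmF0ZScKCiAgcGFyYW1zOgogICAgJ21hdGNoW10nOgogICAgLSAne2pvYj0iazhzLWFwaS1leHBvcnRlciJ9JwogICAgLSAne2pvYj0ia3ViZS1zdGF0ZS1tZXRyaWNzIn0nCiAgICAtICd7am9iPSJrdWJlbGV0In0nCgogIGNvbnN1bF9zZF9jb25maWdzOgogICAgICAtIHNlcnZlcjogeHgueHgueHgueHg6ODUwMAogICAgICAgIHNlcnZpY2VzOiBbIndpc2VjbG91ZC1hZ2VudC1wcm9tZXRoZXVzIl0="


def decode_secret_value(value: str) -> str:
    """base64 value from the secret -> plain scrape-config YAML."""
    return base64.b64decode(value).decode("utf-8")


def encode_secret_value(text: str) -> str:
    """Plain scrape-config YAML -> string to put under data:."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


if __name__ == "__main__":
    plain = decode_secret_value(SECRET_VALUE)
    print(plain)
    # Re-encoding the decoded text reproduces the original value.
    assert encode_secret_value(plain) == SECRET_VALUE
```

Alternatively, kubectl create secret generic prometheus-k8s-new --from-file=prometheus-additional.yaml -n monitoring builds the same secret from a plain file and does the encoding itself (without the managed-by label).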

The prometheus-additional.yaml key above deserves a closer look. Decoding its base64 value gives the following:

- job_name: 'federate'
  scrape_interval: 15s
  honor_labels: true
  metrics_path: '/federate'
  params:
    'match[]':
    - '{job="k8s-api-exporter"}'
    - '{job="kube-state-metrics"}'
    - '{job="kubelet"}'
  consul_sd_configs:
  - server: xx.xx.xx.xx:8500
    services:
    - wisecloud-agent-prometheus
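With this job, Prometheus takes every instance registered in consul under wisecloud-agent-prometheus as a target and requests its /federate endpoint, passing each match[] selector as a query parameter; honor_labels: true keeps the labels returned by the Data Node (such as job) instead of renaming conflicting ones to exported_*. The sketch below only reconstructs that request URL in Python to make the parameter encoding visible; the Data Node address is hypothetical.

```python
from urllib.parse import urlencode

# match[] selectors from the federate job above
MATCHERS = [
    '{job="k8s-api-exporter"}',
    '{job="kube-state-metrics"}',
    '{job="kubelet"}',
]


def federate_url(target: str, matchers: list, metrics_path: str = "/federate") -> str:
    """URL the Data Center requests from one Data Node discovered in consul."""
    # doseq=True emits one match[] parameter per selector
    query = urlencode({"match[]": matchers}, doseq=True)
    return f"http://{target}{metrics_path}?{query}"


if __name__ == "__main__":
    # 10.0.0.11:9090 stands in for a Data Node address returned by consul
    print(federate_url("10.0.0.11:9090", MATCHERS))
```

Opening such a URL by hand against a Data Node is a quick way to check what will be federated; the returned series should already carry the Data Node's external label (environment).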




  • Modify prometheus-prometheus.yaml


apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  labels:
    prometheus: k8s
  name: k8s
  namespace: monitoring
spec:
  alerting:
    alertmanagers:
    - name: alertmanager-main
      namespace: monitoring
      port: web
  baseImage: 192.168.5.14/library/prometheus
  nodeSelector:
    beta.kubernetes.io/os: linux
  replicas: 1
  resources:
    requests:
      memory: 100Mi
  ruleSelector:
    matchLabels:
      prometheus: k8s
      role: alert-rules
  serviceAccountName: prometheus-k8s
  serviceMonitorNamespaceSelector: {}
  serviceMonitorSelector: {}
  version: v2.3.1
  # secret (name) and key inside it that hold the extra scrape config
  additionalScrapeConfigs:
    name: prometheus-k8s-new
    key: prometheus-additional.yaml


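Note: once deployed, Prometheus-Operator already monitors the resources of its own cluster. If a Data Node Prometheus and the operator are deployed in the same environment, it is best to delete the operator's corresponding ServiceMonitors so that the same metrics are not collected twice: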

kubectl delete servicemonitor alertmanager coredns kube-apiserver kube-controller-manager kube-scheduler kube-state-metrics kubelet prometheus prometheus-operator -n monitoring


Deploy the two files above, additional.yaml and prometheus-prometheus.yaml:

kubectl apply -f additional.yaml -f prometheus-prometheus.yaml


Verification

Open the Prometheus instance managed by the operator and you should see the following:

  • Targets overview
[Figure: Targets page]


  • Querying metrics
[Figure: metrics query result]

In the figure above, environment=k8s_test is the label defined under external_labels in the Data Node's configuration. If you see results like this, federation is working.
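The same check can be scripted against the Data Center's HTTP API: an instant query such as count by (environment) ({job="kubelet"}) should return one series per federated environment. A minimal Python sketch, assuming you substitute your Data Center address for the hypothetical DATA_CENTER_HOST:9090 (for example the NodePort if you exposed it); the sample response is made up, only its shape follows the /api/v1/query format.

```python
import json
from urllib.parse import urlencode


def query_url(prometheus: str, promql: str) -> str:
    """URL for an instant query against Prometheus' HTTP API."""
    return f"http://{prometheus}/api/v1/query?" + urlencode({"query": promql})


def environments(response_json: str) -> set:
    """Collect the environment label values from an instant-query response."""
    body = json.loads(response_json)
    if body.get("status") != "success":
        raise RuntimeError(f"query failed: {body.get('error')}")
    return {series["metric"].get("environment", "") for series in body["data"]["result"]}


# Hypothetical response to: count by (environment) ({job="kubelet"})
SAMPLE_RESPONSE = json.dumps({
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"environment": "k8s_test"}, "value": [1536000000.0, "42"]},
        ],
    },
})

if __name__ == "__main__":
    print(query_url("DATA_CENTER_HOST:9090", 'count by (environment) ({job="kubelet"})'))
    print(environments(SAMPLE_RESPONSE))
```

To run it for real, fetch the printed URL (for example with urllib.request.urlopen) and pass the response body to environments(); every registered Data Node should contribute its own environment value.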

Original article: https://mp.weixin.qq.com/s/C4f5rhBSyOTHD_mtaujeLw

About Wise2C

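Wise2C (深圳睿雲智合科技有限公司) was founded in 2012 and is headquartered in Shenzhen, with R&D centers in Chengdu and Shenzhen and branch offices in Beijing and Shanghai. Its core staff are senior business and technical experts from well-known finance and technology companies. In its early years the company focused on professional consulting in innovative technology, e-commerce, CRM and related areas for large Chinese enterprises in finance and insurance.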

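Since 2016, after being the first to bring container technology to customers in China's insurance industry, the company has built a dedicated team for container product development and implementation services. Its aim is to help Chinese financial-industry customers apply container technology to strengthen the IT foundation that supports their business growth, and to become the leading container-technology service brand in China's finance and insurance industry.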

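In addition, drawing on years of business experience and technical expertise in the call-center field, Wise2C was among the first in the industry to launch a microservice-based multimedia digital business platform built on the open-source softswitch FreeSwitch. The platform integrates customer contact channels such as voice, video, webchat, WeChat and Weibo, implementing a CRM strategy of unified access, accurate identification and intelligent routing, and uses containerized governance to support full application lifecycle management. This makes digital business processing considerably more flexible, efficient, elastic and stable, and offers a complete one-stop solution for traditional enterprises moving toward customer-centric digital business.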



