When browsing the Configuration page of the Prometheus dashboard, a natural question arises: where does all of this configuration come from?
The difference between prometheus-operator and deploying Prometheus directly is that the operator wraps the Prometheus and Alertmanager server configuration, together with the scrape configs and recording/alerting rules, into Kubernetes CRDs:
[root@master01 ~]# kubectl get crd | grep monitoring
alertmanagers.monitoring.coreos.com       2019-10-14T10:19:34Z
podmonitors.monitoring.coreos.com         2019-10-14T10:19:34Z
prometheuses.monitoring.coreos.com        2019-10-14T10:19:35Z
prometheusrules.monitoring.coreos.com     2019-10-14T10:19:35Z
servicemonitors.monitoring.coreos.com     2019-10-14T10:19:35Z
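The scrape jobs in the generated configuration come from ServiceMonitor (or PodMonitor) objects. A minimal sketch of one (the name example-app, its app label, and the port name web are hypothetical placeholders, not objects from this cluster):

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: example-app              # hypothetical name
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app: example-app           # must match the labels on the target Service
  endpoints:
  - port: web                    # named port of the Service to scrape
    interval: 30s

The operator renders each such object into a scrape job named <namespace>/<servicemonitor-name>/<endpoint-index>, which is exactly the job_name pattern (e.g. monitoring/alertmanager/0) visible in the generated configuration further below.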
When a CRD object is modified, the operator detects the change, generates a new Prometheus configuration file, compresses it with gzip, and stores it as a Kubernetes Secret.
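The content of that Secret is driven by the Prometheus custom resource (named k8s in kube-prometheus, which the ownerReferences of the Secret shown further below also confirm); it can be inspected directly:

kubectl -n monitoring get prometheus k8s -o yaml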
So how do we retrieve and then modify this configuration?
First, let's see which Secrets exist in the monitoring namespace:
[root@master01 ~]# kubectl get secret -n monitoring
NAME                              TYPE                                  DATA   AGE
alertmanager-main                 Opaque                                1      295d
alertmanager-main-token-fpw52     kubernetes.io/service-account-token   3      295d
default-token-dbggv               kubernetes.io/service-account-token   3      295d
etcd-certs                        Opaque                                3      295d
grafana-datasources               Opaque                                1      295d
grafana-token-b2vkn               kubernetes.io/service-account-token   3      295d
istio.alertmanager-main           istio.io/key-and-cert                 3      295d
istio.default                     istio.io/key-and-cert                 3      295d
istio.grafana                     istio.io/key-and-cert                 3      295d
istio.kube-state-metrics          istio.io/key-and-cert                 3      295d
istio.node-exporter               istio.io/key-and-cert                 3      295d
istio.prometheus-adapter          istio.io/key-and-cert                 3      295d
istio.prometheus-k8s              istio.io/key-and-cert                 3      295d
istio.prometheus-operator         istio.io/key-and-cert                 3      295d
kube-state-metrics-token-24vd9    kubernetes.io/service-account-token   3      295d
node-exporter-token-m8gbn         kubernetes.io/service-account-token   3      295d
prometheus-adapter-token-gz56f    kubernetes.io/service-account-token   3      295d
prometheus-k8s                    Opaque                                1      295d
prometheus-k8s-token-bwq27        kubernetes.io/service-account-token   3      295d
prometheus-operator-token-bgds6   kubernetes.io/service-account-token   3      295d
Read the prometheus-k8s Secret as JSON; what matters is the value of the xxx.xxx.gz entry (here prometheus.yaml.gz) under data:
[root@master01 ~]# kubectl get secret -n monitoring prometheus-k8s -o json
{
    "apiVersion": "v1",
    "data": {
        "prometheus.yaml.gz": "xxxxxxxxx"
    },
    "kind": "Secret",
    "metadata": {
        "annotations": {
            "generated": "true"
        },
        "creationTimestamp": "2019-10-14T10:19:55Z",
        "labels": {
            "managed-by": "prometheus-operator"
        },
        "name": "prometheus-k8s",
        "namespace": "monitoring",
        "ownerReferences": [
            {
                "apiVersion": "monitoring.coreos.com/v1",
                "blockOwnerDeletion": true,
                "controller": true,
                "kind": "Prometheus",
                "name": "k8s",
                "uid": "41a949b2-bc18-43d7-b87d-db6e8990c27f"
            }
        ],
        "resourceVersion": "1953",
        "selfLink": "/api/v1/namespaces/monitoring/secrets/prometheus-k8s",
        "uid": "beda93d3-b123-4085-acea-30225f899f5a"
    },
    "type": "Opaque"
}
Take the value of data."xxx.xxx.gz", base64-decode it, and decompress it with gzip to obtain the final configuration file:
[root@master01 tmp]# kubectl get secret -n monitoring prometheus-k8s -o json | jq -r '.data."prometheus.yaml.gz"' | base64 -d | gzip -d
or
[root@master01 tmp]# echo -n "<value of data.'xxx.xxx.gz'>" | base64 -d | gzip -d
global:
  evaluation_interval: 30s
  scrape_interval: 30s
  external_labels:
    prometheus: monitoring/k8s
    prometheus_replica: $(POD_NAME)
rule_files:
- /etc/prometheus/rules/prometheus-k8s-rulefiles-0/*.yaml
scrape_configs:
- job_name: monitoring/alertmanager/0
  ......
- job_name: monitoring/coredns/0
  ......
- job_name: monitoring/etcd-k8s/0
  ......
- job_name: monitoring/grafana/0
  ......
- job_name: monitoring/kube-apiserver/0
  ......
- job_name: monitoring/kube-controller-manager/0
  ......
- job_name: monitoring/kube-scheduler/0
  ......
- job_name: monitoring/kube-state-metrics/0
  ......
- job_name: monitoring/kube-state-metrics/1
  ......
- job_name: monitoring/kubelet/0
  ......
- job_name: monitoring/kubelet/1
  ......
- job_name: monitoring/node-exporter/0
  ......
- job_name: monitoring/prometheus/0
  ......
- job_name: monitoring/prometheus-operator/0
  ......
alerting:
  alert_relabel_configs:
  - action: labeldrop
    regex: prometheus_replica
  alertmanagers:
  - path_prefix: /
    scheme: http
    kubernetes_sd_configs:
    - role: endpoints
      namespaces:
        names:
        - monitoring
    relabel_configs:
    - action: keep
      source_labels:
      - __meta_kubernetes_service_name
      regex: alertmanager-main
    - action: keep
      source_labels:
      - __meta_kubernetes_endpoint_port_name
      regex: web
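The rule_files entry above points to files the operator generates from PrometheusRule objects and mounts into the pod via a ConfigMap. A quick way to inspect them (the ConfigMap name prometheus-k8s-rulefiles-0 is taken from the rule_files path above and may differ in other installations):

kubectl -n monitoring get prometheusrules
kubectl -n monitoring get configmap prometheus-k8s-rulefiles-0 -o yaml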
The Alertmanager configuration can be obtained the same way (alertmanager.yaml in this Secret is only base64-encoded, not gzipped, so gzip -d is not needed):
[root@master01 ~]# kubectl get secret -n monitoring alertmanager-main -o json | jq -r '.data."alertmanager.yaml"' | base64 -d
"global":
"resolve_timeout": "5m"
"receivers":
- "name": "null"
"route":
"group_by":
- "job"
"group_interval": "5m"
"group_wait": "30s"
"receiver": "null"
"repeat_interval": "12h"
"routes":
- "match":
"alertname": "Watchdog"
"receiver": "null"
So how do we modify this configuration?
Let's start with the slightly simpler case: modifying the alertmanager-main Secret.
1. Take the content retrieved from the alertmanager-main Secret:
"global":
"resolve_timeout": "5m"
"receivers":
- "name": "null"
"route":
"group_by":
- "job"
"group_interval": "5m"
"group_wait": "30s"
"receiver": "null"
"repeat_interval": "12h"
"routes":
- "match":
"alertname": "Watchdog"
"receiver": "null"
2. Adjust it to the following and save it to a file named alertmanage-main-upd-v1.yaml:
global:
  resolve_timeout: 5m
  smtp_auth_username: info@jz-ins.com
  smtp_auth_password: Dgn1JfL4oBfHTWPE
  smtp_from: info@jz-ins.com
  smtp_require_tls: false
  smtp_smarthost: smtp.qq.com:465
receivers:
- email_configs:
  - headers:
      Subject: '[ERROR] prometheus............'
    to: 517469812@qq.com,snail@jz-ins.com,dracula@jz-ins.com
  name: team-X-mails
- name: "null"
route:
  group_by:
  - alertname
  - cluster
  - service
  group_interval: 5m
  group_wait: 60s
  receiver: team-X-mails
  repeat_interval: 24h
  routes:
  - match:
      alertname: Watchdog
    receiver: "null"
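Before encoding the file it is worth validating it; a minimal check, assuming amtool (shipped with the Alertmanager releases) is available on the machine:

amtool check-config alertmanage-main-upd-v1.yaml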
3. Base64-encode the adjusted alertmanage-main-upd-v1.yaml from the command line and write the result to alertmanage-main-upd-v1.yaml.txt (Base64 encoding/decoding can also be done with an online tool such as https://www.sojson.com/base64.html):
base64 alertmanage-main-upd-v1.yaml > alertmanage-main-upd-v1.yaml.txt
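Note that GNU base64 wraps its output at 76 characters by default, while the value placed in the Secret must be a single line; with GNU coreutils the wrapping can be disabled:

base64 -w 0 alertmanage-main-upd-v1.yaml > alertmanage-main-upd-v1.yaml.txt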
4. Open alertmanage-main-upd-v1.yaml.txt and copy the encoded string, run kubectl edit secrets -n monitoring alertmanager-main, replace the value of alertmanager.yaml under data with the string you just copied, then save and exit (Esc -> :wq). Once the update is applied, open the Alertmanager UI to check whether the new configuration has taken effect.
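As an alternative to steps 3 and 4, the Secret can be regenerated directly from the edited file, which avoids copying base64 strings by hand (a sketch; newer kubectl versions expect --dry-run=client instead of --dry-run):

kubectl -n monitoring create secret generic alertmanager-main \
  --from-file=alertmanager.yaml=alertmanage-main-upd-v1.yaml \
  --dry-run -o yaml | kubectl -n monitoring apply -f -

If the Alertmanager UI is not exposed outside the cluster, a port-forward is enough for the final check (assuming the alertmanager-main Service exposes port 9093, as in a default kube-prometheus install):

kubectl -n monitoring port-forward svc/alertmanager-main 9093:9093
# then open http://localhost:9093/#/status and confirm the new configuration is shown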