Consul 监控

微服务的框架体系中,服务发现是很重要的一个模块,常用来做服务发现的框架有 Consul 、 Etcd、Zookeeper 等等。今天我们来看看 Consul 要如何监控,其他的框架有时间我们也看一下。

针对 Consul 的监控 Prometheus 官方出了一个插件,叫 Consul Exporter ,地址是 :https://github.com/prometheus/consul_exporter ,这个仓库已经一年没有更新,但是 Prometheus 官方出品,可以继续使用,如果到了 2022 年 10 月份还没有更新,那么就要重新评估了。

Consul Exporter 当前最新版本是 0.7.1 ,发布于 2020.07.21 。

安装

Consul Exporter 可以使用二进制运行,也可以使用 Docker image 来运行。一个 Consul 机器启动一个 Consul Exporter 就可以了。 像下边这样,指定 Consul Server 的地址就可以。缺省从 9107 端口暴露监控数据。

./consul_exporter --consul.server=http://192.168.1.1:8500

如果是 Docker image 启动的话可以使用如下命令:

docker run -d -p 9107:9107 prom/consul-exporter --consul.server=192.168.1.1:8500

对于启动的参数 Consul Exporter 有很多。

  • consul.allow_stale: Allows any Consul server (non-leader) to service a read.
  • consul.ca-file: File path to a PEM-encoded certificate authority used to validate the authenticity of a server certificate.
  • consul.cert-file: File path to a PEM-encoded certificate used with the private key to verify the exporter's authenticity.
  • consul.health-summary: Collects information about each registered service and exports consul_catalog_service_node_healthy. This requires n+1 Consul API queries to gather all information about each service. Health check information are available via consul_health_service_status as well, but only for services which have a health check configured. Defaults to true. Disable using --no-consul.heatlh-summary.
  • consul.key-file: File path to a PEM-encoded private key used with the certificate to verify the exporter's authenticity.
  • consul.insecure: Disable TLS host verification.
  • consul.require_consistent: Forces the read to be fully consistent.
  • consul.server: Address (host and port) of the Consul instance we should connect to. This could be a local agent (localhost:8500, for instance), or the address of a Consul server.
  • consul.server-name: When provided, this overrides the hostname for the TLS certificate. It can be used to ensure that the certificate name matches the hostname we declare.
  • consul.timeout: Timeout on HTTP requests to consul. consul.request-limit: Limit the maximum number of concurrent requests to consul, 0 means no limit.
  • log.format: Set the log target and format. Example: logger:syslog?appname=bob&local=7 or logger:stdout?json=true
  • log.level: Logging level. info by default.
  • web.listen-address: Address to listen on for web interface and telemetry.
  • web.telemetry-path: Path under which to expose metrics.

对于 Consul Exporter 支持 Consul 官方提供的所有环境变量,包括使用 CONSUL_HTTP_TOKEN 去设置 ACL TOKEN 。官方提供的环境变量可以参考 https://github.com/hashicorp/consul/blob/b2478036d88a7e8eb9d6a0daf1a1c9ad0c8885ca/api/api.go#L24-L74

指标

从 Consul Exporter 暴露出来的

  • consul_up Was the last query of Consul successful
  • consul_raft_peers How many peers (servers) are in the Raft cluster
  • consul_serf_lan_members How many members are in the cluster
  • consul_serf_lan_member_status Status of member in the cluster. 1=Alive, 2=Leaving, 3=Left, 4=Failed. 这个指标的 Label 有 member
  • consul_catalog_services How many services are in the cluster
  • consul_catalog_service_node_healthy Is this service healthy on this node 这个指标的 Label 有 service, node
  • consul_health_node_status Status of health checks associated with a node 这个指标的 Label 有 check, node, status
  • consul_health_service_status Status of health checks associated with a service 这个指标的 Label 有 check, node, service, status
  • consul_catalog_kv The values for selected keys in Consul's key/value catalog. Keys with non-numeric values are omitted 这个指标的 Label 有 key
  • consul_service_checks Link the Consul service ID with check name if available 这个指标的 Label 有 service_id,service_name, check_id, check_name, node

对于这些指标有这些常用的计算过程如下:

通过下面这个计算,当值为 1 的时候,说明这个服务所有的节点都是好的,状态是 passing ,如果值为 0 ,那么说明这个服务至少有一个节点的状态是异常的。

min(consul_catalog_service_node_healthy) by (service_name)

那么我们可以继续计算,通过下面这个查询我们就能得到 状态异常的服务名称和节点名称:

sum by (node, service_name)(consul_catalog_service_node_healthy == 0)

通过下面这个查询可以得到服务状态是 critical 的服务名称:

consul_health_service_status{status="critical"} == 1

小结

简单列举了 Consul Exporter 的使用方法和指标,活用这些指标可以快速的发现问题。

results matching ""

    No results matching ""