Elasticsearch Exporter
Elasticsearch 是一个实时的分布式存储、搜索、分析的引擎,官网是 https://www.elastic.co/cn/elasticsearch/ 。 很多公司在用这个,你的公司大概率也有,那么对于 Elasticsearch 该怎么监控呢?针对这个问题,Prometheus Community 开发了一个 Elasticsearch Exporter ,用来导出 Elasticsearch 的监控数据,仓库地址是 https://github.com/prometheus-community/elasticsearch_exporter ,用 Golang 编写。
Elasticsearch Exporter 当前版本是 1.3.0 ,发布于 2021.10.21
安装和配置
安装是比较简单的,下载最新版本的二进制包,或者下载 Docker image 。启动 Docker image 可以使用
docker run --rm -p 9114:9114 quay.io/prometheuscommunity/elasticsearch-exporter:latest --es.uri=http://localhost:9200
在 Kubernetes 里使用 helm 部署可以参考 https://github.com/kubernetes/charts/tree/master/stable/elasticsearch-exporter 。
每次从这个 Exporter 获取 Elasticsearch 的监控数据都会从 Elasticsearch 服务拉取一次,如果获取时间间隔太短的话,可能会增大服务端的压力,尤其是使用 --es.all
和 --es.indices
运行 Exporter 的时候。所以在实际使用过程中,建议大家在确定获取数据间隔的时候,先测量获取 /_nodes/stats
和 /_all/_stats
指标的时间,防止监控对服务造成影响。
Elasticsearch Exporter 启动的时候有如下启动参数可以选择。
参数 | 出现的版本 | 描述 | 缺省值 |
---|---|---|---|
es.uri |1.0.2 |Address (host and port) of the Elasticsearch node we should connect to. This could be a local node (localhost:9200, for instance), or the address of a remote Elasticsearch server. When basic auth is needed, specify as:
es.client-private-key |1.0.2 |Path to PEM file that contains the private key for client auth when connecting to Elasticsearch.
es.client-cert |1.0.2 |Path to PEM file that contains the corresponding cert for the private key to connect to Elasticsearch.
es.clusterinfo.interval |1.1.0rc1 |Cluster info update interval for the cluster label |5m
es.ssl-skip-verify |1.0.4rc1 |Skip SSL verification when connecting to Elasticsearch. |false
es.apiKey |unreleased |API Key to use for authenticating against Elasticsearch.
web.listen-address |1.0.2 |Address to listen on for web interface and telemetry. |:9114
web.telemetry-path |1.0.2 |Path under which to expose metrics. |/metrics
version |1.0.2 |Show version info on stdout and exit.
命令行参数以单个参数开始,用于版本低于 1.1.0rc1 。对于大于 1.1.0rc1 的版本,命令行参数用 --
指定。此外,所有命令行参数都可以作为环境变量提供。
对于 Elasticsearch 7.x 版本提供了一些安全性的措施,用户名和密码可以直接在 URI 中传递,也可以通过ES_USERNAME
和ES_PASSWORD
环境变量传递。如果在 URI 中传递了身份验证,那么指定这两个环境变量以后将覆盖URI 里的内容。
Elasticsearch 7.x 支持 RBAC,这些参数可以在 Elasticsearch Exporter 里使用。
Setting |Privilege Required |Description
| --- | --- | --- |
exporter defaults |cluster monitor |All cluster read-only operations, like cluster health and state, hot threads, node info, node and cluster stats, and pending cluster tasks.
es.cluster_settings |cluster monitor
es.indices |indices monitor (per index or ) |All actions that are required for monitoring (recovery, segments info, index stats and status)
es.indices_settings |indices monitor (per index or )
es.shards |not sure if indices or cluster monitor or both
es.snapshots |cluster:admin/snapshot/status and cluster:admin/repository/get |ES Forum Post
指标
Elasticsearch Exporter 可以导出很多指标,如下所示:
指标 | 类型 | 基数 | 帮助信息 |
---|---|---|---|
elasticsearch_breakers_estimated_size_bytes | gauge | 4 | Estimated size in bytes of breaker |
elasticsearch_breakers_limit_size_bytes | gauge | 4 | Limit size in bytes for breaker |
elasticsearch_breakers_tripped | counter | 4 | tripped for breaker |
elasticsearch_cluster_health_active_primary_shards | gauge | 1 | The number of primary shards in your cluster. This is an aggregate total across all indices. |
elasticsearch_cluster_health_active_shards | gauge | 1 | Aggregate total of all shards across all indices, which includes replica shards. |
elasticsearch_cluster_health_delayed_unassigned_shards | gauge | 1 | Shards delayed to reduce reallocation overhead |
elasticsearch_cluster_health_initializing_shards | gauge | 1 | Count of shards that are being freshly created. |
elasticsearch_cluster_health_number_of_data_nodes | gauge | 1 | Number of data nodes in the cluster. |
elasticsearch_cluster_health_number_of_in_flight_fetch | gauge | 1 | The number of ongoing shard info requests. |
elasticsearch_cluster_health_number_of_nodes | gauge | 1 | Number of nodes in the cluster. |
elasticsearch_cluster_health_number_of_pending_tasks | gauge | 1 | Cluster level changes which have not yet been executed |
elasticsearch_cluster_health_task_max_waiting_in_queue_millis | gauge | 1 | Max time in millis that a task is waiting in queue. |
elasticsearch_cluster_health_relocating_shards | gauge | 1 | The number of shards that are currently moving from one node to another node. |
elasticsearch_cluster_health_status | gauge | 3 | Whether all primary and replica shards are allocated. |
elasticsearch_cluster_health_timed_out | gauge | 1 | Number of cluster health checks timed out |
elasticsearch_cluster_health_unassigned_shards | gauge | 1 | The number of shards that exist in the cluster state, but cannot be found in the cluster itself. |
elasticsearch_clustersettings_stats_max_shards_per_node | gauge | 0 | Current maximum number of shards per node setting. |
elasticsearch_filesystem_data_available_bytes | gauge | 1 | Available space on block device in bytes |
elasticsearch_filesystem_data_free_bytes | gauge | 1 | Free space on block device in bytes |
elasticsearch_filesystem_data_size_bytes | gauge | 1 | Size of block device in bytes |
elasticsearch_filesystem_io_stats_device_operations_count | gauge | 1 | Count of disk operations |
elasticsearch_filesystem_io_stats_device_read_operations_count | gauge | 1 | Count of disk read operations |
elasticsearch_filesystem_io_stats_device_write_operations_count | gauge | 1 | Count of disk write operations |
elasticsearch_filesystem_io_stats_device_read_size_kilobytes_sum | gauge | 1 | Total kilobytes read from disk |
elasticsearch_filesystem_io_stats_device_write_size_kilobytes_sum | gauge | 1 | Total kilobytes written to disk |
elasticsearch_indices_active_queries | gauge | 1 | The number of currently active queries |
elasticsearch_indices_docs | gauge | 1 | Count of documents on this node |
elasticsearch_indices_docs_deleted | gauge | 1 | Count of deleted documents on this node |
elasticsearch_indices_docs_primary | gauge | Count of documents with only primary shards on all nodes | |
elasticsearch_indices_fielddata_evictions | counter | 1 | Evictions from field data |
elasticsearch_indices_fielddata_memory_size_bytes | gauge | 1 | Field data cache memory usage in bytes |
elasticsearch_indices_filter_cache_evictions | counter | 1 | Evictions from filter cache |
elasticsearch_indices_filter_cache_memory_size_bytes | gauge | 1 | Filter cache memory usage in bytes |
elasticsearch_indices_flush_time_seconds | counter | 1 | Cumulative flush time in seconds |
elasticsearch_indices_flush_total | counter | 1 | Total flushes |
elasticsearch_indices_get_exists_time_seconds | counter | 1 | Total time get exists in seconds |
elasticsearch_indices_get_exists_total | counter | 1 | Total get exists operations |
elasticsearch_indices_get_missing_time_seconds | counter | 1 | Total time of get missing in seconds |
elasticsearch_indices_get_missing_total | counter | 1 | Total get missing |
elasticsearch_indices_get_time_seconds | counter | 1 | Total get time in seconds |
elasticsearch_indices_get_total | counter | 1 | Total get |
elasticsearch_indices_indexing_delete_time_seconds_total | counter | 1 | Total time indexing delete in seconds |
elasticsearch_indices_indexing_delete_total | counter | 1 | Total indexing deletes |
elasticsearch_indices_index_current | gauge | 1 | The number of documents currently being indexed to an index |
elasticsearch_indices_indexing_index_time_seconds_total | counter | 1 | Cumulative index time in seconds |
elasticsearch_indices_indexing_index_total | counter | 1 | Total index calls |
elasticsearch_indices_mappings_stats_fields | gauge | 1 | Count of fields currently mapped by index |
elasticsearch_indices_mappings_stats_json_parse_failures_total | counter | 0 | Number of errors while parsing JSON |
elasticsearch_indices_mappings_stats_scrapes_total | counter | 0 | Current total ElasticSearch Indices Mappings scrapes |
elasticsearch_indices_mappings_stats_up | gauge | 0 | Was the last scrape of the ElasticSearch Indices Mappings endpoint successful |
elasticsearch_indices_merges_docs_total | counter | 1 | Cumulative docs merged |
elasticsearch_indices_merges_total | counter | 1 | Total merges |
elasticsearch_indices_merges_total_size_bytes_total | counter | 1 | Total merge size in bytes |
elasticsearch_indices_merges_total_time_seconds_total | counter | 1 | Total time spent merging in seconds |
elasticsearch_indices_query_cache_cache_total | counter | 1 | Count of query cache |
elasticsearch_indices_query_cache_cache_size | gauge | 1 | Size of query cache |
elasticsearch_indices_query_cache_count | counter | 2 | Count of query cache hit/miss |
elasticsearch_indices_query_cache_evictions | counter | 1 | Evictions from query cache |
elasticsearch_indices_query_cache_memory_size_bytes | gauge | 1 | Query cache memory usage in bytes |
elasticsearch_indices_query_cache_total | counter | 1 | Size of query cache total |
elasticsearch_indices_refresh_time_seconds_total | counter | 1 | Total time spent refreshing in seconds |
elasticsearch_indices_refresh_total | counter | 1 | Total refreshes |
elasticsearch_indices_request_cache_count | counter | 2 | Count of request cache hit/miss |
elasticsearch_indices_request_cache_evictions | counter | 1 | Evictions from request cache |
elasticsearch_indices_request_cache_memory_size_bytes | gauge | 1 | Request cache memory usage in bytes |
elasticsearch_indices_search_fetch_time_seconds | counter | 1 | Total search fetch time in seconds |
elasticsearch_indices_search_fetch_total | counter | 1 | Total number of fetches |
elasticsearch_indices_search_query_time_seconds | counter | 1 | Total search query time in seconds |
elasticsearch_indices_search_query_total | counter | 1 | Total number of queries |
elasticsearch_indices_segments_count | gauge | 1 | Count of index segments on this node |
elasticsearch_indices_segments_memory_bytes | gauge | 1 | Current memory size of segments in bytes |
elasticsearch_indices_settings_stats_read_only_indices | gauge | 1 | Count of indices that have read_only_allow_delete=true |
elasticsearch_indices_settings_total_fields | gauge | Index setting value for index.mapping.total_fields.limit (total allowable mapped fields in a index) | |
elasticsearch_indices_shards_docs | gauge | 3 | Count of documents on this shard |
elasticsearch_indices_shards_docs_deleted | gauge | 3 | Count of deleted documents on each shard |
elasticsearch_indices_store_size_bytes | gauge | 1 | Current size of stored index data in bytes |
elasticsearch_indices_store_size_bytes_primary | gauge | Current size of stored index data in bytes with only primary shards on all nodes | |
elasticsearch_indices_store_size_bytes_total | gauge | Current size of stored index data in bytes with all shards on all nodes | |
elasticsearch_indices_store_throttle_time_seconds_total | counter | 1 | Throttle time for index store in seconds |
elasticsearch_indices_translog_operations | counter | 1 | Total translog operations |
elasticsearch_indices_translog_size_in_bytes | counter | 1 | Total translog size in bytes |
elasticsearch_indices_warmer_time_seconds_total | counter | 1 | Total warmer time in seconds |
elasticsearch_indices_warmer_total | counter | 1 | Total warmer count |
elasticsearch_jvm_gc_collection_seconds_count | counter | 2 | Count of JVM GC runs |
elasticsearch_jvm_gc_collection_seconds_sum | counter | 2 | GC run time in seconds |
elasticsearch_jvm_memory_committed_bytes | gauge | 2 | JVM memory currently committed by area |
elasticsearch_jvm_memory_max_bytes | gauge | 1 | JVM memory max |
elasticsearch_jvm_memory_used_bytes | gauge | 2 | JVM memory currently used by area |
elasticsearch_jvm_memory_pool_used_bytes | gauge | 3 | JVM memory currently used by pool |
elasticsearch_jvm_memory_pool_max_bytes | counter | 3 | JVM memory max by pool |
elasticsearch_jvm_memory_pool_peak_used_bytes | counter | 3 | JVM memory peak used by pool |
elasticsearch_jvm_memory_pool_peak_max_bytes | counter | 3 | JVM memory peak max by pool |
elasticsearch_os_cpu_percent | gauge | 1 | Percent CPU used by the OS |
elasticsearch_os_load1 | gauge | 1 | Shortterm load average |
elasticsearch_os_load5 | gauge | 1 | Midterm load average |
elasticsearch_os_load15 | gauge | 1 | Longterm load average |
elasticsearch_process_cpu_percent | gauge | 1 | Percent CPU used by process |
elasticsearch_process_cpu_time_seconds_sum | counter | 3 | Process CPU time in seconds |
elasticsearch_process_mem_resident_size_bytes | gauge | 1 | Resident memory in use by process in bytes |
elasticsearch_process_mem_share_size_bytes | gauge | 1 | Shared memory in use by process in bytes |
elasticsearch_process_mem_virtual_size_bytes | gauge | 1 | Total virtual memory used in bytes |
elasticsearch_process_open_files_count | gauge | 1 | Open file descriptors |
elasticsearch_snapshot_stats_number_of_snapshots | gauge | 1 | Total number of snapshots |
elasticsearch_snapshot_stats_oldest_snapshot_timestamp | gauge | 1 | Oldest snapshot timestamp |
elasticsearch_snapshot_stats_snapshot_start_time_timestamp | gauge | 1 | Last snapshot start timestamp |
elasticsearch_snapshot_stats_latest_snapshot_timestamp_seconds | gauge | 1 | Timestamp of the latest SUCCESS or PARTIAL snapshot |
elasticsearch_snapshot_stats_snapshot_end_time_timestamp | gauge | 1 | Last snapshot end timestamp |
elasticsearch_snapshot_stats_snapshot_number_of_failures | gauge | 1 | Last snapshot number of failures |
elasticsearch_snapshot_stats_snapshot_number_of_indices | gauge | 1 | Last snapshot number of indices |
elasticsearch_snapshot_stats_snapshot_failed_shards | gauge | 1 | Last snapshot failed shards |
elasticsearch_snapshot_stats_snapshot_successful_shards | gauge | 1 | Last snapshot successful shards |
elasticsearch_snapshot_stats_snapshot_total_shards | gauge | 1 | Last snapshot total shard |
elasticsearch_thread_pool_active_count | gauge | 14 | Thread Pool threads active |
elasticsearch_thread_pool_completed_count | counter | 14 | Thread Pool operations completed |
elasticsearch_thread_pool_largest_count | gauge | 14 | Thread Pool largest threads count |
elasticsearch_thread_pool_queue_count | gauge | 14 | Thread Pool operations queued |
elasticsearch_thread_pool_rejected_count | counter | 14 | Thread Pool operations rejected |
elasticsearch_thread_pool_threads_count | gauge | 14 | Thread Pool current threads count |
elasticsearch_transport_rx_packets_total | counter | 1 | Count of packets received |
elasticsearch_transport_rx_size_bytes_total | counter | 1 | Total number of bytes received |
elasticsearch_transport_tx_packets_total | counter | 1 | Count of packets sent |
elasticsearch_transport_tx_size_bytes_total | counter | 1 | Total number of bytes sent |
elasticsearch_clusterinfo_last_retrieval_success_ts | gauge | 1 | Timestamp of the last successful cluster info retrieval |
elasticsearch_clusterinfo_up | gauge | 1 | Up metric for the cluster info collector |
elasticsearch_clusterinfo_version_info | gauge | 6 | Constant metric with ES version information as labels |
Elasticsearch Exporter 提供了一些告警规则 alerts and recording rules 在这里有一个 Grafana 的 Dashboard 模板 和 Kubernetes 里的部署 Deployment 的 yaml 文件。
这个示例的 Grafana Dashboard 需要安装 node_exporter 。