Elasticsearch Exporter

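Elasticsearch is a real-time distributed storage, search, and analytics engine; its official site is https://www.elastic.co/cn/elasticsearch/ . Many companies use it, and yours most likely does too, so how should it be monitored? To answer that, the Prometheus Community developed Elasticsearch Exporter, which exports monitoring data from Elasticsearch. The repository is at https://github.com/prometheus-community/elasticsearch_exporter and the exporter is written in Go.

At the time of writing, the current version of Elasticsearch Exporter is 1.3.0, released on 2021-10-21.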

Installation and configuration

Installation is straightforward: download the latest binary release, or pull the Docker image. To start the Docker image, run:

docker run --rm -p 9114:9114 quay.io/prometheuscommunity/elasticsearch-exporter:latest --es.uri=http://localhost:9200

To deploy on Kubernetes with Helm, see https://github.com/kubernetes/charts/tree/master/stable/elasticsearch-exporter

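Every scrape of this exporter triggers a fresh fetch from the Elasticsearch service. If the scrape interval is too short, the extra requests can put pressure on the Elasticsearch servers, especially when the exporter runs with --es.all or --es.indices. Before settling on a scrape interval, measure how long it takes to fetch /_nodes/stats and /_all/_stats on your cluster, so that monitoring does not affect the service.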
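The measurement suggested above can be sketched in a few lines of Python. This is a minimal sketch, not part of the exporter: the helper names (http_get, time_endpoints) and the default repeat count are assumptions made for illustration, and the base URL you pass in must point at your own cluster.

```python
import time
import urllib.request
from typing import Callable, Dict

# The two endpoints named in the text above.
ENDPOINTS = ("/_nodes/stats", "/_all/_stats")


def http_get(url: str) -> bytes:
    """Fetch a URL with the standard library and return the raw body."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read()


def time_endpoints(
    base_url: str,
    fetch: Callable[[str], bytes] = http_get,
    repeats: int = 3,
) -> Dict[str, float]:
    """Return the slowest observed wall-clock time in seconds per endpoint.

    `fetch` is injectable so the timing logic can be tried without a cluster.
    """
    worst: Dict[str, float] = {}
    for path in ENDPOINTS:
        durations = []
        for _ in range(repeats):
            start = time.perf_counter()
            fetch(base_url.rstrip("/") + path)
            durations.append(time.perf_counter() - start)
        worst[path] = max(durations)
    return worst
```

Calling time_endpoints("http://localhost:9200") would time both endpoints against a local node. A scrape interval comfortably above the slowest value is a reasonable starting point.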

Elasticsearch Exporter accepts the following startup arguments.

| Argument | Introduced in version | Description | Default |
| --- | --- | --- | --- |
| es.uri | 1.0.2 | Address (host and port) of the Elasticsearch node we should connect to. This could be a local node (localhost:9200, for instance), or the address of a remote Elasticsearch server. When basic auth is needed, specify as: <proto>://<user>:<password>@<host>:<port>. E.g., http://admin:pass@localhost:9200. Special characters in the user credentials need to be URL-encoded. | http://localhost:9200 |
| es.all | 1.0.2 | If true, query stats for all nodes in the cluster, rather than just the node we connect to. | false |
| es.cluster_settings | 1.1.0rc1 | If true, query stats for cluster settings. | false |
| es.indices | 1.0.2 | If true, query stats for all indices in the cluster. | false |
| es.indices_settings | 1.0.4rc1 | If true, query settings stats for all indices in the cluster. | false |
| es.indices_mappings | 1.2.0 | If true, query stats for mappings of all indices of the cluster. | false |
| es.shards | 1.0.3rc1 | If true, query stats for all indices in the cluster, including shard-level stats (implies es.indices=true). | false |
| es.snapshots | 1.0.4rc1 | If true, query stats for the cluster snapshots. | false |
| es.timeout | 1.0.2 | Timeout for trying to get stats from Elasticsearch. (ex: 20s) | 5s |
| es.ca | 1.0.2 | Path to PEM file that contains trusted Certificate Authorities for the Elasticsearch connection. | |
| es.client-private-key | 1.0.2 | Path to PEM file that contains the private key for client auth when connecting to Elasticsearch. | |
| es.client-cert | 1.0.2 | Path to PEM file that contains the corresponding cert for the private key to connect to Elasticsearch. | |
| es.clusterinfo.interval | 1.1.0rc1 | Cluster info update interval for the cluster label | 5m |
| es.ssl-skip-verify | 1.0.4rc1 | Skip SSL verification when connecting to Elasticsearch. | false |
| es.apiKey | unreleased | API Key to use for authenticating against Elasticsearch. | |
| web.listen-address | 1.0.2 | Address to listen on for web interface and telemetry. | :9114 |
| web.telemetry-path | 1.0.2 | Path under which to expose metrics. | /metrics |
| version | 1.0.2 | Show version info on stdout and exit. | |

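In versions earlier than 1.1.0rc1, command-line arguments start with a single dash (-). In versions later than 1.1.0rc1, they are specified with a double dash (--). In addition, every command-line argument can be provided as an environment variable.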
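The text above does not spell out how an argument name maps to an environment variable name. The sketch below assumes the convention described in the exporter's README (upper-case the argument name and replace . and - with _); treat that rule as an assumption and verify it against the version you run. The helper name flag_to_env is hypothetical.

```python
def flag_to_env(flag: str) -> str:
    """Map an argument such as 'es.uri' or '--es.ssl-skip-verify' to an env var name.

    ASSUMPTION: the name is upper-cased and '.' and '-' become '_'.
    Leading dashes are stripped first so both '-flag' and '--flag' work.
    """
    name = flag.lstrip("-")
    return name.replace(".", "_").replace("-", "_").upper()
```

Under that assumption, flag_to_env("es.timeout") yields ES_TIMEOUT, so setting ES_TIMEOUT=20s in the environment would stand in for --es.timeout=20s.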

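Elasticsearch 7.x provides security features. The username and password can be passed directly in the URI, or through the ES_USERNAME and ES_PASSWORD environment variables. When these two environment variables are set, they override any credentials passed in the URI.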
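Because special characters in URI credentials must be URL-encoded, it can help to build the es.uri value programmatically. The following is a minimal sketch using only the Python standard library; the function name build_es_uri and the example credentials are made up for illustration.

```python
from urllib.parse import quote


def build_es_uri(scheme: str, host: str, port: int, username: str, password: str) -> str:
    """Build a URI with percent-encoded basic-auth credentials.

    safe="" makes quote() encode every reserved character, including
    '/', ':' and '@', which would otherwise break URI parsing.
    """
    user = quote(username, safe="")
    secret = quote(password, safe="")
    return f"{scheme}://{user}:{secret}@{host}:{port}"


# Hypothetical credentials containing characters that need encoding.
uri = build_es_uri("http", "localhost", 9200, "admin", "p@ss:w/rd")
```

The resulting string can be passed as --es.uri. If embedding secrets in a URI is undesirable, the ES_USERNAME and ES_PASSWORD environment variables described above avoid the encoding question altogether.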

Elasticsearch 7.x supports RBAC. The table below lists the privileges that each Elasticsearch Exporter setting requires.

| Setting | Privilege Required | Description |
| --- | --- | --- |
| exporter defaults | cluster monitor | All cluster read-only operations, like cluster health and state, hot threads, node info, node and cluster stats, and pending cluster tasks. |
| es.cluster_settings | cluster monitor | |
| es.indices | indices monitor (per index or *) | All actions that are required for monitoring (recovery, segments info, index stats and status) |
| es.indices_settings | indices monitor (per index or *) | |
| es.shards | not sure if indices or cluster monitor or both | |
| es.snapshots | cluster:admin/snapshot/status and cluster:admin/repository/get | ES Forum Post |

Metrics

Elasticsearch Exporter exports many metrics, listed below:

| Name | Type | Cardinality | Help |
| ---- | ---- | ----------- | ---- |
| elasticsearch_breakers_estimated_size_bytes | gauge | 4 | Estimated size in bytes of breaker |
| elasticsearch_breakers_limit_size_bytes | gauge | 4 | Limit size in bytes for breaker |
| elasticsearch_breakers_tripped | counter | 4 | tripped for breaker |
| elasticsearch_cluster_health_active_primary_shards | gauge | 1 | The number of primary shards in your cluster. This is an aggregate total across all indices. |
| elasticsearch_cluster_health_active_shards | gauge | 1 | Aggregate total of all shards across all indices, which includes replica shards. |
| elasticsearch_cluster_health_delayed_unassigned_shards | gauge | 1 | Shards delayed to reduce reallocation overhead |
| elasticsearch_cluster_health_initializing_shards | gauge | 1 | Count of shards that are being freshly created. |
| elasticsearch_cluster_health_number_of_data_nodes | gauge | 1 | Number of data nodes in the cluster. |
| elasticsearch_cluster_health_number_of_in_flight_fetch | gauge | 1 | The number of ongoing shard info requests. |
| elasticsearch_cluster_health_number_of_nodes | gauge | 1 | Number of nodes in the cluster. |
| elasticsearch_cluster_health_number_of_pending_tasks | gauge | 1 | Cluster level changes which have not yet been executed |
| elasticsearch_cluster_health_task_max_waiting_in_queue_millis | gauge | 1 | Max time in millis that a task is waiting in queue. |
| elasticsearch_cluster_health_relocating_shards | gauge | 1 | The number of shards that are currently moving from one node to another node. |
| elasticsearch_cluster_health_status | gauge | 3 | Whether all primary and replica shards are allocated. |
| elasticsearch_cluster_health_timed_out | gauge | 1 | Number of cluster health checks timed out |
| elasticsearch_cluster_health_unassigned_shards | gauge | 1 | The number of shards that exist in the cluster state, but cannot be found in the cluster itself. |
| elasticsearch_clustersettings_stats_max_shards_per_node | gauge | 0 | Current maximum number of shards per node setting. |
| elasticsearch_filesystem_data_available_bytes | gauge | 1 | Available space on block device in bytes |
| elasticsearch_filesystem_data_free_bytes | gauge | 1 | Free space on block device in bytes |
| elasticsearch_filesystem_data_size_bytes | gauge | 1 | Size of block device in bytes |
| elasticsearch_filesystem_io_stats_device_operations_count | gauge | 1 | Count of disk operations |
| elasticsearch_filesystem_io_stats_device_read_operations_count | gauge | 1 | Count of disk read operations |
| elasticsearch_filesystem_io_stats_device_write_operations_count | gauge | 1 | Count of disk write operations |
| elasticsearch_filesystem_io_stats_device_read_size_kilobytes_sum | gauge | 1 | Total kilobytes read from disk |
| elasticsearch_filesystem_io_stats_device_write_size_kilobytes_sum | gauge | 1 | Total kilobytes written to disk |
| elasticsearch_indices_active_queries | gauge | 1 | The number of currently active queries |
| elasticsearch_indices_docs | gauge | 1 | Count of documents on this node |
| elasticsearch_indices_docs_deleted | gauge | 1 | Count of deleted documents on this node |
| elasticsearch_indices_docs_primary | gauge | | Count of documents with only primary shards on all nodes |
| elasticsearch_indices_fielddata_evictions | counter | 1 | Evictions from field data |
| elasticsearch_indices_fielddata_memory_size_bytes | gauge | 1 | Field data cache memory usage in bytes |
| elasticsearch_indices_filter_cache_evictions | counter | 1 | Evictions from filter cache |
| elasticsearch_indices_filter_cache_memory_size_bytes | gauge | 1 | Filter cache memory usage in bytes |
| elasticsearch_indices_flush_time_seconds | counter | 1 | Cumulative flush time in seconds |
| elasticsearch_indices_flush_total | counter | 1 | Total flushes |
| elasticsearch_indices_get_exists_time_seconds | counter | 1 | Total time get exists in seconds |
| elasticsearch_indices_get_exists_total | counter | 1 | Total get exists operations |
| elasticsearch_indices_get_missing_time_seconds | counter | 1 | Total time of get missing in seconds |
| elasticsearch_indices_get_missing_total | counter | 1 | Total get missing |
| elasticsearch_indices_get_time_seconds | counter | 1 | Total get time in seconds |
| elasticsearch_indices_get_total | counter | 1 | Total get |
| elasticsearch_indices_indexing_delete_time_seconds_total | counter | 1 | Total time indexing delete in seconds |
| elasticsearch_indices_indexing_delete_total | counter | 1 | Total indexing deletes |
| elasticsearch_indices_index_current | gauge | 1 | The number of documents currently being indexed to an index |
| elasticsearch_indices_indexing_index_time_seconds_total | counter | 1 | Cumulative index time in seconds |
| elasticsearch_indices_indexing_index_total | counter | 1 | Total index calls |
| elasticsearch_indices_mappings_stats_fields | gauge | 1 | Count of fields currently mapped by index |
| elasticsearch_indices_mappings_stats_json_parse_failures_total | counter | 0 | Number of errors while parsing JSON |
| elasticsearch_indices_mappings_stats_scrapes_total | counter | 0 | Current total ElasticSearch Indices Mappings scrapes |
| elasticsearch_indices_mappings_stats_up | gauge | 0 | Was the last scrape of the ElasticSearch Indices Mappings endpoint successful |
| elasticsearch_indices_merges_docs_total | counter | 1 | Cumulative docs merged |
| elasticsearch_indices_merges_total | counter | 1 | Total merges |
| elasticsearch_indices_merges_total_size_bytes_total | counter | 1 | Total merge size in bytes |
| elasticsearch_indices_merges_total_time_seconds_total | counter | 1 | Total time spent merging in seconds |
| elasticsearch_indices_query_cache_cache_total | counter | 1 | Count of query cache |
| elasticsearch_indices_query_cache_cache_size | gauge | 1 | Size of query cache |
| elasticsearch_indices_query_cache_count | counter | 2 | Count of query cache hit/miss |
| elasticsearch_indices_query_cache_evictions | counter | 1 | Evictions from query cache |
| elasticsearch_indices_query_cache_memory_size_bytes | gauge | 1 | Query cache memory usage in bytes |
| elasticsearch_indices_query_cache_total | counter | 1 | Size of query cache total |
| elasticsearch_indices_refresh_time_seconds_total | counter | 1 | Total time spent refreshing in seconds |
| elasticsearch_indices_refresh_total | counter | 1 | Total refreshes |
| elasticsearch_indices_request_cache_count | counter | 2 | Count of request cache hit/miss |
| elasticsearch_indices_request_cache_evictions | counter | 1 | Evictions from request cache |
| elasticsearch_indices_request_cache_memory_size_bytes | gauge | 1 | Request cache memory usage in bytes |
| elasticsearch_indices_search_fetch_time_seconds | counter | 1 | Total search fetch time in seconds |
| elasticsearch_indices_search_fetch_total | counter | 1 | Total number of fetches |
| elasticsearch_indices_search_query_time_seconds | counter | 1 | Total search query time in seconds |
| elasticsearch_indices_search_query_total | counter | 1 | Total number of queries |
| elasticsearch_indices_segments_count | gauge | 1 | Count of index segments on this node |
| elasticsearch_indices_segments_memory_bytes | gauge | 1 | Current memory size of segments in bytes |
| elasticsearch_indices_settings_stats_read_only_indices | gauge | 1 | Count of indices that have read_only_allow_delete=true |
| elasticsearch_indices_settings_total_fields | gauge | | Index setting value for index.mapping.total_fields.limit (total allowable mapped fields in a index) |
| elasticsearch_indices_shards_docs | gauge | 3 | Count of documents on this shard |
| elasticsearch_indices_shards_docs_deleted | gauge | 3 | Count of deleted documents on each shard |
| elasticsearch_indices_store_size_bytes | gauge | 1 | Current size of stored index data in bytes |
| elasticsearch_indices_store_size_bytes_primary | gauge | | Current size of stored index data in bytes with only primary shards on all nodes |
| elasticsearch_indices_store_size_bytes_total | gauge | | Current size of stored index data in bytes with all shards on all nodes |
| elasticsearch_indices_store_throttle_time_seconds_total | counter | 1 | Throttle time for index store in seconds |
| elasticsearch_indices_translog_operations | counter | 1 | Total translog operations |
| elasticsearch_indices_translog_size_in_bytes | counter | 1 | Total translog size in bytes |
| elasticsearch_indices_warmer_time_seconds_total | counter | 1 | Total warmer time in seconds |
| elasticsearch_indices_warmer_total | counter | 1 | Total warmer count |
| elasticsearch_jvm_gc_collection_seconds_count | counter | 2 | Count of JVM GC runs |
| elasticsearch_jvm_gc_collection_seconds_sum | counter | 2 | GC run time in seconds |
| elasticsearch_jvm_memory_committed_bytes | gauge | 2 | JVM memory currently committed by area |
| elasticsearch_jvm_memory_max_bytes | gauge | 1 | JVM memory max |
| elasticsearch_jvm_memory_used_bytes | gauge | 2 | JVM memory currently used by area |
| elasticsearch_jvm_memory_pool_used_bytes | gauge | 3 | JVM memory currently used by pool |
| elasticsearch_jvm_memory_pool_max_bytes | counter | 3 | JVM memory max by pool |
| elasticsearch_jvm_memory_pool_peak_used_bytes | counter | 3 | JVM memory peak used by pool |
| elasticsearch_jvm_memory_pool_peak_max_bytes | counter | 3 | JVM memory peak max by pool |
| elasticsearch_os_cpu_percent | gauge | 1 | Percent CPU used by the OS |
| elasticsearch_os_load1 | gauge | 1 | Shortterm load average |
| elasticsearch_os_load5 | gauge | 1 | Midterm load average |
| elasticsearch_os_load15 | gauge | 1 | Longterm load average |
| elasticsearch_process_cpu_percent | gauge | 1 | Percent CPU used by process |
| elasticsearch_process_cpu_time_seconds_sum | counter | 3 | Process CPU time in seconds |
| elasticsearch_process_mem_resident_size_bytes | gauge | 1 | Resident memory in use by process in bytes |
| elasticsearch_process_mem_share_size_bytes | gauge | 1 | Shared memory in use by process in bytes |
| elasticsearch_process_mem_virtual_size_bytes | gauge | 1 | Total virtual memory used in bytes |
| elasticsearch_process_open_files_count | gauge | 1 | Open file descriptors |
| elasticsearch_snapshot_stats_number_of_snapshots | gauge | 1 | Total number of snapshots |
| elasticsearch_snapshot_stats_oldest_snapshot_timestamp | gauge | 1 | Oldest snapshot timestamp |
| elasticsearch_snapshot_stats_snapshot_start_time_timestamp | gauge | 1 | Last snapshot start timestamp |
| elasticsearch_snapshot_stats_latest_snapshot_timestamp_seconds | gauge | 1 | Timestamp of the latest SUCCESS or PARTIAL snapshot |
| elasticsearch_snapshot_stats_snapshot_end_time_timestamp | gauge | 1 | Last snapshot end timestamp |
| elasticsearch_snapshot_stats_snapshot_number_of_failures | gauge | 1 | Last snapshot number of failures |
| elasticsearch_snapshot_stats_snapshot_number_of_indices | gauge | 1 | Last snapshot number of indices |
| elasticsearch_snapshot_stats_snapshot_failed_shards | gauge | 1 | Last snapshot failed shards |
| elasticsearch_snapshot_stats_snapshot_successful_shards | gauge | 1 | Last snapshot successful shards |
| elasticsearch_snapshot_stats_snapshot_total_shards | gauge | 1 | Last snapshot total shard |
| elasticsearch_thread_pool_active_count | gauge | 14 | Thread Pool threads active |
| elasticsearch_thread_pool_completed_count | counter | 14 | Thread Pool operations completed |
| elasticsearch_thread_pool_largest_count | gauge | 14 | Thread Pool largest threads count |
| elasticsearch_thread_pool_queue_count | gauge | 14 | Thread Pool operations queued |
| elasticsearch_thread_pool_rejected_count | counter | 14 | Thread Pool operations rejected |
| elasticsearch_thread_pool_threads_count | gauge | 14 | Thread Pool current threads count |
| elasticsearch_transport_rx_packets_total | counter | 1 | Count of packets received |
| elasticsearch_transport_rx_size_bytes_total | counter | 1 | Total number of bytes received |
| elasticsearch_transport_tx_packets_total | counter | 1 | Count of packets sent |
| elasticsearch_transport_tx_size_bytes_total | counter | 1 | Total number of bytes sent |
| elasticsearch_clusterinfo_last_retrieval_success_ts | gauge | 1 | Timestamp of the last successful cluster info retrieval |
| elasticsearch_clusterinfo_up | gauge | 1 | Up metric for the cluster info collector |
| elasticsearch_clusterinfo_version_info | gauge | 6 | Constant metric with ES version information as labels |
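To show how one of these metrics might be consumed, here is a minimal Python sketch that reads elasticsearch_cluster_health_status from the exporter's text output. The cardinality of 3 suggests one series per health color; the label names (cluster, color) and the sample payload below are assumptions for illustration, so check them against your exporter's actual /metrics output.

```python
import re
from typing import Optional

# Hypothetical /metrics excerpt; label names are assumed, not taken from the text.
SAMPLE = '''\
# HELP elasticsearch_cluster_health_status Whether all primary and replica shards are allocated.
# TYPE elasticsearch_cluster_health_status gauge
elasticsearch_cluster_health_status{cluster="demo",color="green"} 0
elasticsearch_cluster_health_status{cluster="demo",color="red"} 0
elasticsearch_cluster_health_status{cluster="demo",color="yellow"} 1
'''

SERIES = re.compile(
    r'^elasticsearch_cluster_health_status\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)'
)
LABEL = re.compile(r'(\w+)="([^"]*)"')


def cluster_color(metrics_text: str) -> Optional[str]:
    """Return the color label of the series whose value is 1, or None."""
    for line in metrics_text.splitlines():
        match = SERIES.match(line)
        if not match:
            continue  # skips comments and unrelated metrics
        labels = dict(LABEL.findall(match.group("labels")))
        if float(match.group("value")) == 1.0:
            return labels.get("color")
    return None
```

With the sample above, cluster_color(SAMPLE) returns "yellow". In a real setup the same check is more naturally written as an alerting rule in Prometheus; the parser here only illustrates the shape of the data.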

Elasticsearch Exporter provides some example alerts and recording rules, along with a Grafana dashboard template and a YAML file for a Kubernetes Deployment.

The example Grafana dashboard requires node_exporter to be installed.
