Prometheus 故障汇总(一)

1. out of bounds

日志详情

level=warn ts=2021-01-19T09:16:01.434Z caller=scrape.go:1378 component="scrape manager" scrape_pool=promtsdb target=http://127.0.0.1:9090/metrics msg="Error on ingesting samples that are too old or are too far into the future" num_dropped=580
level=warn ts=2021-01-19T09:16:01.434Z caller=scrape.go:1145 component="scrape manager" scrape_pool=promtsdb target=http://127.0.0.1:9090/metrics msg="Append failed" err="out of bounds"
level=warn ts=2021-01-19T09:16:01.434Z caller=scrape.go:1094 component="scrape manager" scrape_pool=promtsdb target=http://127.0.0.1:9090/metrics msg="Appending scrape report failed" err="out of bounds"

原因:Prometheus 机器长时间断电或者在打包 Prometheus 镜像时将一部分数据打包到镜像中,导致 Prometheus 的部分数据与当前时间相差较大。

处理方法 1 :新建数据目录,在 Prometheus 启动时指定新的数据目录。

处理方法 2 :清空数据目录,并且重启 Prometheus 服务。

参考资料: 参考 Issue #3930

2. CPU User counter jumped backwards

日志详情

level=warn ts=2021-03-08T20:24:12.523Z caller=cpu_linux.go:255 collector=cpu msg="CPU User counter jumped backwards" cpu=22 old_value=296678.2 new_value=296676.19
level=warn ts=2021-03-08T20:24:12.623Z caller=cpu_linux.go:267 collector=cpu msg="CPU System counter jumped backwards" cpu=22 old_value=137405.56 new_value=137405.28
level=warn ts=2021-03-08T20:24:12.623Z caller=cpu_linux.go:247 collector=cpu msg="CPU Idle counter jumped backwards, possible hotplug event, resetting CPU stats" cpu=23 old_value=1.089999957e+07 new_value=1.089999731e+07
level=warn ts=2021-03-08T20:32:31.229Z caller=cpu_linux.go:247 collector=cpu msg="CPU Idle counter jumped backwards, possible hotplug event, resetting CPU stats" cpu=0 old_value=1.087819428e+07 new_value=1.087816568e+07
level=warn ts=2021-03-08T20:32:31.229Z caller=cpu_linux.go:247 collector=cpu msg="CPU Idle counter jumped backwards, possible hotplug event, resetting CPU stats" cpu=1 old_value=1.090932216e+07 new_value=1.090929192e+07
level=warn ts=2021-03-08T20:32:31.229Z caller=cpu_linux.go:247 collector=cpu msg="CPU Idle counter jumped backwards, possible hotplug event, resetting CPU stats" cpu=2 old_value=1.097667619e+07 new_value=1.09766428e+07
level=warn ts=2021-03-08T20:32:31.229Z caller=cpu_linux.go:247 collector=cpu msg="CPU Idle counter jumped backwards, possible hotplug event, resetting CPU stats" cpu=3 old_value=1.102438945e+07 new_value=1.102435437e+07

原因:node-exporter 1.0.0 版本的问题,在下一个版本,将该信息调整为了 debug 级别。

解决办法: 升级 node-exporter 版本到 1.1.0+

参考链接

https://github.com/prometheus/node_exporter/issues/1903

results matching ""

    No results matching ""