33 Commits

Author SHA1 Message Date
5fda0b60bc telegraf: run as root + bypass gosu drop to read docker.sock GID-independently
The container ran as user 0:994 and accessed the docker socket via group
membership, but the host docker group GID is auto-assigned and varies per
host (e.g. uk-8 is 988, not 994), so the hardcoded gid silently breaks
telegraf's docker input wherever it differs (uk-8 was in a restart loop:
permission denied on /var/run/docker.sock). Run as root (0:0) with
entrypoint [telegraf] to skip the image's gosu privilege-drop, so telegraf
reads the socket as its owner regardless of the host docker gid. Works
uniformly fleet-wide; no regression on hosts where the gid happened to match.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-16 15:11:26 +00:00
9ce0e14cd6 Revert "monitoring: enable cadvisor diskIO metrics"
This reverts commit a5081013f3.
2026-06-16 10:17:23 +00:00
941a0aa691 Revert "monitoring: run node_exporter on host network"
This reverts commit d48713cb15.
2026-06-16 10:10:01 +00:00
d48713cb15 monitoring: run node_exporter on host network
netdev (netlink) was reading the container's own veth (idle) -> node_network_*
showed ~0 on every host. Host network lets it see real host interfaces, so
bandwidth (egress vs tariff quota) works. Removed networks/expose (exclusive
with network_mode:host).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-16 10:06:55 +00:00
a5081013f3 monitoring: enable cadvisor diskIO metrics
Adds diskIO to cadvisor --enable_metrics so per-container disk read/write
(container_fs_reads/writes_bytes_total) is exposed — lets node attribution
name which node is the noisy IO neighbor, not just flag the host.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-16 10:01:33 +00:00
6d65582af4 monitoring: scrape cadvisor/nodeexporter/telegraf at /metrics not /
prometheus-scrape.metrics_path was '/' which made prometheus-docker-sd scrape
the HTML root and fail with 'INVALID is not a valid start token', leaving the
targets up=0. Fixes per-container (cadvisor) + host (node_exporter) metrics so
they can be wired into the DRPC insights MCP for per-node resource attribution.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-16 09:06:57 +00:00
5687d74a62 Pin Loki to version 3.4.3 to trigger container restart
Co-Authored-By: Claude Agent <claude@stakesquid.eu>
2026-01-28 13:49:05 +00:00
Para Dox
6c63b34a20 compress grafana 2025-06-15 20:53:57 +07:00
squidbear
9b619cb611 add monitoring 2025-06-14 08:35:27 +02:00
Sebastian
05c87006ac remove unnecessary default monitoring tools due to weird load characteristics on small machines 2024-12-22 07:04:20 +01:00
Sebastian
2ab53a937a fix 2024-09-03 15:21:50 +02:00
Sebastian
31ca3b5482 fix 2024-09-03 15:15:03 +02:00
Sebastian
fc980407c2 update 2024-04-17 08:45:56 +02:00
Sebastian
03cd9ae1d0 update 2024-04-17 08:43:18 +02:00
Sebastian
da3526a50b update 2024-04-17 08:41:00 +02:00
Sebastian
e1bca8337f update 2024-04-17 08:37:51 +02:00
cventastic
0136227b4c forgot metric ports 2022-09-07 23:16:13 +02:00
cventastic
f6d85e4014 add monitoring network 2022-09-07 23:08:31 +02:00
cventastic
22c91bf68c add monitoring network 2022-09-07 22:29:48 +02:00
cventastic
a50380028f bump monitoring.yml 2022-09-06 17:15:44 +02:00
cventastic
b966320f77 conflicting parameters expose/network_mode 2022-06-28 12:26:32 +02:00
cventastic
dbdb6e963c conflicting parameters port/network_mode 2022-06-28 12:25:39 +02:00
cventastic
7235941152 put prometheus into wireguard network 2022-06-28 12:24:03 +02:00
cventastic
e3b2558333 added forked prometheus-docker-sd 2022-06-28 11:07:51 +02:00
cventastic
b625adc085 format still doesn't work 2022-06-23 14:36:51 +02:00
cventastic
8a6cd18038 format doesn't work 2022-06-23 14:33:02 +02:00
cventastic
1b1913d335 overhaul monitoring 2022-06-23 14:30:17 +02:00
cventastic
33e2908492 put prometheus into chains network... 2022-03-16 14:19:59 +01:00
cventastic
b10221e068 remove internal network...
can't reach prometheus for queries this way
2022-03-16 14:17:21 +01:00
cventastic
766caca32d fix typo 2022-03-16 14:06:02 +01:00
cventastic
892c5eb1e6 prom autodiscovery 2022-03-16 14:01:50 +01:00
czarly
1d03732438 fix 2022-03-14 18:01:31 +04:00
czarly
6ddc7d5926 split up the big files 2022-03-14 17:49:20 +04:00