Commit Graph

4651 Commits

Author SHA1 Message Date
6ce0fc2346 deploy: regenerate compose from vibe-node main c66ad823f9e3 2026-06-20 00:58:03 +00:00
7ce0428db4 erigon3: bump Dockerfile VERSION from v3.0.7 to v3.4.4 2026-06-19 16:36:18 +00:00
b72cf641a9 deploy: regenerate compose from vibe-node main 666f653d478a 2026-06-19 16:03:20 +00:00
5891d050f9 deploy: regenerate compose from vibe-node main 59f3f7f3cf2e 2026-06-19 14:40:16 +00:00
196744cf4f deploy: regenerate compose from vibe-node main 276ea4d82e62 2026-06-19 14:29:52 +00:00
rob
4dd902e9af Merge branch 'issue-63' 2026-06-19 14:19:50 +00:00
a7e9d4a65d deploy: regenerate compose from vibe-node main 60976572a3c4 2026-06-19 13:05:25 +00:00
sebastian
8c46b66bdc Merge pull request 'shibarium: build bor + heimdall from source (bone fork, chain 109)' (#12) from shibarium-bone-fork-fix into main
Reviewed-on: #12
2026-06-19 13:02:00 +00:00
rob
ee21a0245e shibarium: build bor + heimdall from source (no published images)
shibaone ships no docker images for the bone fork — only source + .deb config
packages — so both Dockerfiles must clone+build, not FROM a (nonexistent) image.

- bor.Dockerfile: clone shibaone/bor@${BOR_VERSION}, make bor, cp build/bin/bor
  (was: alpine + wrong /src/build/bor path). golang:1.22.1 like upstream.
- cometbft.Dockerfile: clone shibaone/heimdall@${CL_VERSION}, make install
  (was: FROM shibaone/heimdall:v1.0.7-bone — that tag does not exist on any
  registry), then layer the CometBFT init entrypoint.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-19 11:46:00 +00:00
65695472f1 shibarium: set heimdall CL_VERSION to v1.0.7-bone in cometbft.Dockerfile 2026-06-19 11:20:45 +00:00
bda550eef6 shibarium: add bone fork build-from-source assets
- Update heimdall init.sh with heimdall-109 chain_id and mainnet seeds
- Update cometbft.Dockerfile with v1.0.7-bone version
- Add bor.Dockerfile for building shibaone/bor from source at v1.3.9-bone

Generated by Mistral Vibe.
Co-Authored-By: Mistral Vibe <vibe@mistral.ai>
2026-06-19 11:05:34 +00:00
59ff415fdb cronos: add mainnet EVM chain (chain-id 25) as client-type node
Add cronosd cometbft.Dockerfile and init.sh for Cronos EVM mainnet.
- Dockerfile: layer cometbft-common.sh + init.sh onto upstream cronos image
- init.sh: adapted from haqq pattern with EVM JSON-RPC on 8545, WS on 8546,
  CometBFT RPC on 26657, P2P on 10521, chain-id cronosmainnet_25-1
- Statesync via ct_configure_statesync, genesis from official repo
- Pruning: custom with keep-recent=100, interval=19

Generated by Mistral Vibe.
Co-Authored-By: Mistral Vibe <vibe@mistral.ai>
2026-06-19 09:35:50 +00:00
efcb1f451a deploy: regenerate compose from vibe-node main 0fd15005e080 2026-06-19 09:22:31 +00:00
ee13d0dd23 deploy: regenerate compose from vibe-node main e5b5567e535a 2026-06-19 06:48:10 +00:00
6d8920d659 deploy: regenerate compose from vibe-node main fb44bc5499e8 2026-06-19 06:28:21 +00:00
sebastian
5076c90a12 Merge pull request 'Bump reth latest→v2.1.0' (#10) from vupd/53-reth-v2.1.0 into main
Reviewed-on: #10
2026-06-19 06:23:20 +00:00
bcdd950eb6 reth: bump Dockerfile RETH_VERSION from v1.4.3 to v2.1.0 2026-06-19 05:36:38 +00:00
a17a21b55f shibarium: add heimdall node ASSET (cometbft.Dockerfile, init.sh) 2026-06-19 05:25:31 +00:00
rob
aefcd41a88 status sweep: cap check-health per node (timeout) so one stuck node can't wedge fleet rpc-update
A hung check-health.sh (aztec-testnet, looping on an unresponsive reference RPC)
blocked show-status.sh's parallel 'wait' for 3.5h, hanging the whole fleet
rpc-update and holding the deploy lock. Each curl was bounded (-m 3) and the
retry loop capped (3x), but the call itself wasn't time-bounded.
- sync-status.sh: wrap each check-health.sh call in 'timeout ${HC_TIMEOUT:-30}'
  (-> exit 124 + 'timeout' status on overrun).
- show-status.sh: wrap the whole per-node sync-status.sh call in
  'timeout ${SYNC_TIMEOUT:-60}' so the parallel wait can never block forever.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-19 03:57:18 +00:00
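The bounded-call idea in commit aefcd41a88 can be sketched as follows. This is a minimal Python illustration, not the repository's shell code: the function name `run_with_timeout` and the returned tuple shape are assumptions, and only the 30-second default and the exit code 124 with a 'timeout' status come from the commit message.

```python
import subprocess


def run_with_timeout(cmd, timeout_s=30):
    """Run `cmd` with a hard deadline and return (exit_code, status).

    Hypothetical helper: it mimics coreutils `timeout` by reporting
    exit code 124 and the status string "timeout" when the deadline passes.
    """
    try:
        proc = subprocess.run(
            cmd,
            capture_output=True,
            text=True,
            timeout=timeout_s,  # subprocess.run kills the child on overrun
        )
    except subprocess.TimeoutExpired:
        return 124, "timeout"
    # Otherwise pass through the child's own exit code and its printed status.
    return proc.returncode, proc.stdout.strip()
```

Wrapping the outer per-node call the same way, with a larger limit (the commit uses 60 seconds), means a parallel wait over all nodes is bounded by the slowest deadline rather than by the slowest node.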
9ee59cf9fa deploy: regenerate compose from vibe-node main 9053889ed7bb 2026-06-19 03:45:34 +00:00
rob
9ad7565f97 deploy: regenerate compose from vibe-node main f843789776db 2026-06-18 16:34:36 +00:00
sebastian
30e866802a Merge pull request 'check-health.sh: retry logic for hash comparison (fix false-positive forked)' (#8) from issue-42 into main
Reviewed-on: #8
2026-06-18 16:08:31 +00:00
rob
c56542ade0 deploy: regenerate compose from parent main 1ecbe0739ddc 2026-06-18 15:19:10 +00:00
rob
e9ed1c0cd3 check-health.sh: add retry logic for hash comparison to fix false-positive forked status 2026-06-18 14:49:34 +00:00
rob
a3a78cb3be deploy: regenerate compose from parent main 3cc8d26c8d58 2026-06-18 13:44:07 +00:00
rob
7d00f3a1ce deploy: regenerate compose from parent main 0abbb3abd857 2026-06-18 11:20:34 +00:00
rob
6bb0b19f45 Harden restore-volumes.sh against silent restore truncation and incomplete-download skips
- BUG 1: Add error checking after tar extraction for both LOCAL and REMOTE-CACHE branches
  - Check exit status of tar -I zstd -xf commands
  - Print error to stderr and exit non-zero on failure
  - Prevents silent truncation where corrupt/incomplete backup extracts partial data
  - Mirrors existing remote-STREAM branch error handling
- BUG 2: Fix REMOTE branch to resume incomplete aria2c downloads
  - Check for presence of <file>.aria2 control file as incomplete signal
  - aria2c -c continues/resumes download when .aria2 file exists
  - Only skip download when file exists AND no .aria2 control file remains
  - aria2 deletes .aria2 sidecar on successful completion, making it a reliable signal
- Maintain all existing flags: aria2c -c -Z -x8 -j8 -s8 -d
- Preserve reth guard logic and static-file offload behavior unchanged

Generated by Mistral Vibe.
Co-Authored-By: Mistral Vibe <vibe@mistral.ai>
2026-06-17 16:22:50 +00:00
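The skip-or-resume rule described under BUG 2 in commit 6bb0b19f45 reduces to one small predicate. The sketch below is in Python rather than the script's shell, and `needs_download` is a hypothetical name; the only behaviour it relies on is the one stated in the commit, that aria2 leaves a `<file>.aria2` control file next to an incomplete download and removes it on success.

```python
from pathlib import Path


def needs_download(target: Path) -> bool:
    """Return True when `target` must be fetched or resumed.

    Hypothetical helper illustrating the rule from the commit message:
    skip only when the file exists AND no .aria2 control file remains.
    """
    control = target.with_name(target.name + ".aria2")
    if not target.exists():
        return True          # nothing on disk yet: download from scratch
    return control.exists()  # sidecar present: earlier download was incomplete
```

When this returns True for an existing file, invoking aria2c with `-c` (as the script already does) continues the partial download instead of restarting it.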
rob
6ddb18dbc5 Fix /slowdisk static-file offload: add reth guard and --keep-directory-symlink
- Add RETH GUARD to clone-backup.sh and clone-node.sh: when the config name
  contains 'reth', skip the whole /slowdisk static-file symlink offload and
  extract everything onto the primary disk (equivalent to --no-slowdisk).
  This matches the already-correct restore-volumes.sh behavior.
  Reason: reth refuses to start when its static_files directory is a symlink,
  failing at boot with 'failed to create dir static_files: File exists'.

- Add --keep-directory-symlink to all tar extraction options in both scripts
  for the SLOWDISK path. This allows tar to extract files THROUGH the
  pre-created directory symlinks instead of trying to mkdir over them
  (which fails with 'Cannot mkdir: File exists'). This matches the
  already-correct restore-volumes.sh behavior.

Generated by Mistral Vibe.
Co-Authored-By: Mistral Vibe <vibe@mistral.ai>
2026-06-17 16:07:21 +00:00
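The reth guard from commit 6ddb18dbc5 can be pictured as a single decision. This is a sketch under assumptions: the function name, the `no_slowdisk` parameter, and the case-sensitive substring match are illustrative choices, and the real scripts are shell.

```python
def use_slowdisk_offload(config_name: str, no_slowdisk: bool = False) -> bool:
    """Decide whether static files may be offloaded to /slowdisk via symlinks.

    Hypothetical helper: configs whose name contains 'reth' are always
    extracted onto the primary disk, as if --no-slowdisk had been passed.
    """
    if no_slowdisk:
        return False
    # Per the commit, reth refuses to start when static_files is a symlink.
    return "reth" not in config_name
```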
rob
27d0ea0d28 op/doma: rename celestia relay to op-alt, add mainnet op-alt relay, vendor mainnet rollup.json
- Rename doma testnet relay from celestia to op-alt (ghcr.io/celestiaorg/op-alt-da)
- Add op-alt relay for doma mainnet with image tag 0.15.0 (not v0.15.0)
- Vendor complete mainnet rollup.json with alt_da (GenericCommitment)
- Fix per-network relay settings: namespace + CELESTIA_*_RPC env vars
- Switch mainnet op-geth to use vendored rollup.json (bind-mounted)

Generated by Mistral Vibe.
Co-Authored-By: Mistral Vibe <vibe@mistral.ai>
2026-06-17 14:02:34 +00:00
rob
1944662053 op/doma/testnet: fix celestia relay image tag from v0.15.0 to 0.14.0-mocha
Generated by Mistral Vibe.
Co-Authored-By: Mistral Vibe <vibe@mistral.ai>
2026-06-17 11:21:56 +00:00
rob
fc03d6ddf6 op/doma/testnet: add Celestia Alt-DA support (celestiaorg/op-alt-da:v0.15.0)
- Add relay template (templates/relays/celestia.doma.yml) for op-alt-da
- Add celestia relay config to op/doma stack in context.yml
- Add relay: celestia to op/doma testnet in config.yml
- Add op-node.doma.yml template with ALTDA env vars gated on relay_name
- Update op/doma/testnet/rollup.json with complete config including alt_da block

Generated by Mistral Vibe.
Co-Authored-By: Mistral Vibe <vibe@mistral.ai>
2026-06-17 10:40:59 +00:00
4ea0bed344 doma-testnet: vendor rollup.json with chain_op_config
Conduit's testnet rollup.json (doma-dev-ix58nm4rnd) omits chain_op_config,
crashing op-node v1.19.0 with 'cfg.Rollup.ChainOpConfig is nil'. Vendor a
complete rollup.json (chain_op_config eip1559 6/50/250, confirmed identical
in mainnet rollup.json and testnet genesis optimism config) plus the
post-fjord fork times baked in, mounted via custom_config so op-node uses
the local file instead of curling Conduit's incomplete one.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-17 02:50:29 +00:00
f4b42ff530 doma-testnet: add op-reth variant on conduit-op-reth (chain 97476, L1 sepolia)
Doma testnet (Conduit slug doma-dev-ix58nm4rnd) on Conduit's purpose-built
conduit-op-reth image - better-tested for Conduit chains than upstream op-geth.
Fork times granite/holocene/isthmus/jovian (missing from Conduit's bare
rollup.json) supplied via env/op/doma/testnet/node.env. Experiment to validate
a clean canonical sync before redoing mainnet on op-reth.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-17 02:31:25 +00:00
9921391cfa doma-mainnet: add OP_NODE_OVERRIDE_JOVIAN=1769065201 (Conduit rollup.json omits it)
Conduit's bare rollup.json for doma-mainnet omits jovian_time entirely, but the
EL genesis activates Jovian at 1769065201. Without the override op-node has no
Jovian fork time and mis-derives at the boundary (chain forks off canonical).
Same fix pattern as bob-sepolia. Override comes from env/op/doma/mainnet/node.env.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-17 01:39:58 +00:00
cd7f05997b berachain: bump beacon-kit v1.3.4 -> v1.4.0-rc3 (Bepolia Fusaka/Fulu fork)
Bepolia activated the Fusaka/Fulu hard fork at 2026-05-27 16:00 UTC (unix
1779897600), which ships new EVM-inflation-withdrawal values. beacon-kit
v1.3.4 (pre-fork) crash-loops replaying the fork-boundary block 20513284:
'first withdrawal is not the EVM inflation withdrawal' (RestartCount 3742 on
us-41). The EL (bera-reth v1.4.0) already speaks the Fusaka engine API
(newPayloadV4P11 etc.) and accepted the post-fork block, so only the CL is
behind. v1.4.0-rc3 sets the Fulu fork time + new inflation values. Applies
to all berachain chains' beacon-kit default; only bepolia is deployed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-17 00:54:30 +00:00
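The fork time quoted in commit cd7f05997b (unix 1779897600 for 2026-05-27 16:00 UTC) is easy to double-check, as are the Jovian override values in the neighbouring doma and bob-sepolia commits. The helper name `to_utc` below is illustrative only.

```python
from datetime import datetime, timezone


def to_utc(ts: int) -> str:
    """Format a unix timestamp as a UTC wall-clock string."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%d %H:%M:%S")


print(to_utc(1779897600))  # Bepolia Fusaka/Fulu fork: 2026-05-27 16:00:00
print(to_utc(1769065201))  # doma-mainnet Jovian override: 2026-01-22 07:00:01
print(to_utc(1772548201))  # bob-sepolia Jovian override: 2026-03-03 14:30:01
```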
5fda0b60bc telegraf: run as root + bypass gosu drop to read docker.sock GID-independently
The container ran as user 0:994 and accessed the docker socket via group
membership, but the host docker group GID is auto-assigned and varies per
host (e.g. uk-8 is 988, not 994), so the hardcoded gid silently breaks
telegraf's docker input wherever it differs (uk-8 was in a restart loop:
permission denied on /var/run/docker.sock). Run as root (0:0) with
entrypoint [telegraf] to skip the image's gosu privilege-drop, so telegraf
reads the socket as its owner regardless of the host docker gid. Works
uniformly fleet-wide; no regression on hosts where the gid happened to match.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-16 15:11:26 +00:00
fe94f3b605 doma: op-node consensus-layer sync (node_sync)
op-geth was stuck at genesis (block 0, 0 L2 peers for 31h) because execution-layer
snap-sync had no peers. node_sync=true -> OP_NODE_SYNCMODE=consensus-layer (derive
L2 from L1) + geth --syncmode=full. Diagnosed by cursor on de-13.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-16 14:23:25 +00:00
9ce0e14cd6 Revert "monitoring: enable cadvisor diskIO metrics"
This reverts commit a5081013f3.
2026-06-16 10:17:23 +00:00
941a0aa691 Revert "monitoring: run node_exporter on host network"
This reverts commit d48713cb15.
2026-06-16 10:10:01 +00:00
d48713cb15 monitoring: run node_exporter on host network
netdev (netlink) was reading the container's own veth (idle) -> node_network_*
showed ~0 on every host. Host network lets it see real host interfaces, so
bandwidth (egress vs tariff quota) works. Removed networks/expose (exclusive
with network_mode:host).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-16 10:06:55 +00:00
a5081013f3 monitoring: enable cadvisor diskIO metrics
Adds diskIO to cadvisor --enable_metrics so per-container disk read/write
(container_fs_reads/writes_bytes_total) is exposed — lets node attribution
name which node is the noisy IO neighbor, not just flag the host.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-16 10:01:33 +00:00
6d65582af4 monitoring: scrape cadvisor/nodeexporter/telegraf at /metrics not /
prometheus-scrape.metrics_path was '/' which made prometheus-docker-sd scrape
the HTML root and fail with 'INVALID is not a valid start token', leaving the
targets up=0. Fixes per-container (cadvisor) + host (node_exporter) metrics so
they can be wired into the DRPC insights MCP for per-node resource attribution.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-16 09:06:57 +00:00
2761625c15 utils: add mint-grafana-service-account-token.sh
Mint a Viewer Grafana service-account token on a host and print it to stdout,
so each box's local Grafana (dshackle + node_exporter + cadvisor) can be wired
into the DRPC insights MCP (grafana_servers.yaml). Idempotent SA, fresh token
per call.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-16 08:59:46 +00:00
1134a3774a sync-status: dRPC-homogeneous block-lag status + fix never-used reference fallbacks
Match the dRPC gateway's per-chain "how many blocks behind is ok" model instead of a
fixed 2s/5s timestamp tolerance:
- check-health.sh: compare the reference head vs local head by BLOCK NUMBER and classify
  with the chain's dRPC lag thresholds (LAGGING_LAG/SYNCING_LAG, in blocks, from
  chains.yaml). dRPC uses the two thresholds inconsistently across chains (sometimes
  lagging<syncing, sometimes the reverse) so the smaller is the online boundary and the
  larger the syncing/drop boundary. Defaults 2/6 when a chain has no thresholds.
- multicurl.sh: also skip responses with result:null (a lagging endpoint lacking the
  requested block) so the fallback reference URLs are actually tried. Previously the first
  endpoint's {"result":null} was accepted as success -> fallbacks never ran, and the null
  reference hash made check-health report false "forked" (the online/forked flapping).
- sync-status.sh: resolve the lag thresholds (by drpc slug or chain id) and export
  LAGGING_LAG/SYNCING_LAG.
- reference-rpc-endpoint.sh: add --lags and --block-time-ms lookups.
- reference-rpc-endpoint.json: regenerated with per-chain block_time_ms + lagging_lag +
  syncing_lag (additive).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-16 05:55:55 +00:00
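Two pieces of commit 1134a3774a lend themselves to a short sketch: the block-lag classification and the null-result skip in the reference fallback. The Python below is an illustration under stated assumptions, not the shell implementation. The function names, the status label "lagging" for the middle band, the inclusive boundaries, and clamping a negative lag to zero are all assumptions; the commit only states that the smaller threshold is the online boundary, the larger is the syncing/drop boundary, and the defaults are 2/6.

```python
def lag_boundaries(lagging_lag=None, syncing_lag=None):
    """Return (online_max, syncing_max) in blocks.

    The two thresholds may arrive in either order, so the smaller one is
    taken as the online boundary. Falls back to 2/6 when either is missing.
    """
    if lagging_lag is None or syncing_lag is None:
        return 2, 6
    return min(lagging_lag, syncing_lag), max(lagging_lag, syncing_lag)


def classify(reference_head, local_head, lagging_lag=None, syncing_lag=None):
    """Classify a node by how many blocks it trails the reference head."""
    online_max, syncing_max = lag_boundaries(lagging_lag, syncing_lag)
    lag = max(0, reference_head - local_head)  # a node ahead counts as lag 0
    if lag <= online_max:
        return "online"
    if lag <= syncing_max:
        return "lagging"  # assumed label for the band between the boundaries
    return "syncing"


def first_usable(responses):
    """Return the first JSON-RPC body with a non-null result, else None.

    Skipping {"result": null} is what lets the later fallback reference
    endpoints be tried at all.
    """
    for body in responses:
        if isinstance(body, dict) and body.get("result") is not None:
            return body
    return None
```

Treating a null result as a miss rather than a success matters because a null reference hash would otherwise be compared against the local hash and reported as a fork.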
rob
df6c17f5cc arb nitro minimal: prune-cycle model (.prune.yml --init.prune=minimal)
Revert to the prune-cycle model for minimal nodes: the normal compose serves
RPC with no --init.prune, and a generated .prune.yml runs --init.prune=minimal,
driven periodically by prune-if-prunable (same mechanism as pruned/full). Minimal
nodes are seeded from a pruned backup, then pruned to minimal.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-16 05:43:09 +00:00
577ac5d7f2 bsc minimal: disable eth_getProof on dshackle upstream
Minimal BSC nodes lack the state to serve eth_getProof and return
"header not found" (especially while catching up after a restart). The
drpc gateway probes eth_getProof and marks the WHOLE upstream unavailable
on failure, so e.g. bsc-mainnet-bsc-minimal on us-32 showed unavailable in
US-East despite the node being online and at head.

Disable eth_getProof on bsc minimal upstreams only (network=bsc,
db_type=minimal) so they stay available for every other method. Archive/
pruned bsc and all other nodes keep serving getProof.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-16 05:29:55 +00:00
677a98d9bd xlayer: archive-only reth on OKX xlayer/op-reth image + snapshot reth.toml
X Layer mainnet/testnet op-reth now use the OKX xlayer/op-reth:v0.0.4.1 fork
(generic op-reth can't read the X Layer DB format), the image's built-in
xlayer-<network> chain spec, --rpc.legacy-url for post-snapshot history gap-fill,
and the OKX archive-compatible reth.toml (light pruning only: merkle_changesets
distance=10064). Switched the reth profile from full_trace to archive_trace and
deleted the pruned variants — a "pruned" compose over archive snapshot data
crash-loops on a block-height mismatch. Requires the official OKX reth snapshot
pre-loaded into the volume; do NOT sync from scratch.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-16 04:35:28 +00:00
rob
fd283122f5 arb nitro minimal: self-prune via --init.prune=minimal in normal compose
Correct the minimal-node model: the minimal compose itself carries
--init.prune=minimal (prunes to genesis+head on start) instead of relying on a
separate .prune.yml — otherwise a minimal node was byte-identical to pruned in
normal operation and had no way to enforce minimal state. Removed the redundant
minimal .prune.yml files.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-16 04:19:09 +00:00
95f9da1e73 bob-sepolia: consensus-layer sync + debug_geth L1 + Jovian override (unstick)
op-node was following the sequencer tip in execution-layer mode while op-reth stayed
at genesis (blocks buffered, never canonical). Switch to consensus-layer sync so
op-node drives L1 derivation, add op-reth --disable-discovery, and use debug_geth
(eth_getBlockReceipts) for the L1 RPC kind — "basic" per-tx receipts hit "receipt 0
has unexpected nil block number" on the Sepolia L1 endpoint. OP_NODE_OVERRIDE_JOVIAN
=1772548201 is belt-and-suspenders (jovian_time already in rollup.json). Applies to
all bob-sepolia EL variants. Regenerated from the vibe-node generator.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-16 04:14:34 +00:00
fbd498aaa2 superseed-sepolia: consensus-layer sync + debug_geth L1 + drop genesis blob fields
op-reth/op-geth sepolia now run op-node consensus-layer sync (empty DB is never
P2P-backfilled) with --disable-discovery, derive L1 via debug_geth against an
operator-set full-history Sepolia RPC (SUPERSEED_SEPOLIA_L1_EXECUTION_RPC, falling
back to ETHEREUM_SEPOLIA_EXECUTION_RPC), and serve a genesis.json with the invalid
top-level excessBlobGas/blobGasUsed nulls removed (eip1559DenominatorCanyon=250
retained). Regenerated from the vibe-node generator.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-16 04:05:05 +00:00