Commit Graph

4609 Commits

Author SHA1 Message Date
2761625c15 utils: add mint-grafana-service-account-token.sh
Mint a Viewer Grafana service-account token on a host and print it to stdout,
so each box's local Grafana (dshackle + node_exporter + cadvisor) can be wired
into the DRPC insights MCP (grafana_servers.yaml). Idempotent SA, fresh token
per call.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-16 08:59:46 +00:00
1134a3774a sync-status: dRPC-homogeneous block-lag status + fix never-used reference fallbacks
Match the dRPC gateway's per-chain "how many blocks behind is ok" model instead of a
fixed 2s/5s timestamp tolerance:
- check-health.sh: compare the reference head vs local head by BLOCK NUMBER and classify
  with the chain's dRPC lag thresholds (LAGGING_LAG/SYNCING_LAG, in blocks, from
  chains.yaml). dRPC uses the two thresholds inconsistently across chains (sometimes
  lagging<syncing, sometimes the reverse) so the smaller is the online boundary and the
  larger the syncing/drop boundary. Defaults 2/6 when a chain has no thresholds.
- multicurl.sh: also skip responses with result:null (a lagging endpoint lacking the
  requested block) so the fallback reference URLs are actually tried. Previously the first
  endpoint's {"result":null} was accepted as success -> fallbacks never ran, and the null
  reference hash made check-health report false "forked" (the online/forked flapping).
- sync-status.sh: resolve the lag thresholds (by drpc slug or chain id) and export
  LAGGING_LAG/SYNCING_LAG.
- reference-rpc-endpoint.sh: add --lags and --block-time-ms lookups.
- reference-rpc-endpoint.json: regenerated with per-chain block_time_ms + lagging_lag +
  syncing_lag (additive).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-16 05:55:55 +00:00
rob
df6c17f5cc arb nitro minimal: prune-cycle model (.prune.yml --init.prune=minimal)
Revert to the prune-cycle model for minimal nodes: the normal compose serves
RPC with no --init.prune, and a generated .prune.yml runs --init.prune=minimal,
driven periodically by prune-if-prunable (same mechanism as pruned/full). Minimal
nodes are seeded from a pruned backup, then pruned to minimal.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-16 05:43:09 +00:00
577ac5d7f2 bsc minimal: disable eth_getProof on dshackle upstream
Minimal BSC nodes lack the state to serve eth_getProof and return
"header not found" (especially while catching up after a restart). The
drpc gateway probes eth_getProof and marks the WHOLE upstream unavailable
on failure, so e.g. bsc-mainnet-bsc-minimal on us-32 showed unavailable in
US-East despite the node being online and at head.

Disable eth_getProof on bsc minimal upstreams only (network=bsc,
db_type=minimal) so they stay available for every other method. Archive/
pruned bsc and all other nodes keep serving getProof.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-16 05:29:55 +00:00
677a98d9bd xlayer: archive-only reth on OKX xlayer/op-reth image + snapshot reth.toml
X Layer mainnet/testnet op-reth now use the OKX xlayer/op-reth:v0.0.4.1 fork
(generic op-reth can't read the X Layer DB format), the image's built-in
xlayer-<network> chain spec, --rpc.legacy-url for post-snapshot history gap-fill,
and the OKX archive-compatible reth.toml (light pruning only: merkle_changesets
distance=10064). Switched the reth profile from full_trace to archive_trace and
deleted the pruned variants — a "pruned" compose over archive snapshot data
crash-loops on a block-height mismatch. Requires the official OKX reth snapshot
pre-loaded into the volume; do NOT sync from scratch.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-16 04:35:28 +00:00
rob
fd283122f5 arb nitro minimal: self-prune via --init.prune=minimal in normal compose
Correct the minimal-node model: the minimal compose itself carries
--init.prune=minimal (prunes to genesis+head on start) instead of relying on a
separate .prune.yml — otherwise a minimal node was byte-identical to pruned in
normal operation and had no way to enforce minimal state. Removed the redundant
minimal .prune.yml files.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-16 04:19:09 +00:00
95f9da1e73 bob-sepolia: consensus-layer sync + debug_geth L1 + Jovian override (unstick)
op-node was following the sequencer tip in execution-layer mode while op-reth stayed
at genesis (blocks buffered, never canonical). Switch to consensus-layer sync so
op-node drives L1 derivation, add op-reth --disable-discovery, and use debug_geth
(eth_getBlockReceipts) for the L1 RPC kind — "basic" per-tx receipts hit "receipt 0
has unexpected nil block number" on the Sepolia L1 endpoint. OP_NODE_OVERRIDE_JOVIAN
=1772548201 is belt-and-suspenders (jovian_time already in rollup.json). Applies to
all bob-sepolia EL variants. Regenerated from the vibe-node generator.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-16 04:14:34 +00:00
fbd498aaa2 superseed-sepolia: consensus-layer sync + debug_geth L1 + drop genesis blob fields
op-reth/op-geth sepolia now run op-node consensus-layer sync (empty DB is never
P2P-backfilled) with --disable-discovery, derive L1 via debug_geth against an
operator-set full-history Sepolia RPC (SUPERSEED_SEPOLIA_L1_EXECUTION_RPC, falling
back to ETHEREUM_SEPOLIA_EXECUTION_RPC), and serve a genesis.json with the invalid
top-level excessBlobGas/blobGasUsed nulls removed (eip1559DenominatorCanyon=250
retained). Regenerated from the vibe-node generator.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-16 04:05:05 +00:00
ad56365253 cometbft: WS Upgrade matcher case-insensitive + ct_ensure_wasm + statesync-skip-if-data
Per cursor cosmos handoff. (1) Traefik WS router rule Headers(Upgrade,websocket) is
case-SENSITIVE -> clients sending 'Upgrade: WebSocket' (python websocket-client) fall
through to the RPC router (200/400 not 101). rpc-client.yml now emits
HeadersRegexp(Upgrade,(?i)websocket) for all split-WS chains; regenerated cosmos+avalanche.
(2) Refactor into cometbft-common.sh: ct_configure_statesync skips if data/application.db
exists; new ct_ensure_wasm seeds CosmWasm/IBC-08 wasm (statesync omits them). init.sh calls
both. README documents wasm/version/WS gotchas. 45 other split-WS composes get HeadersRegexp
on their next regen.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-16 03:27:19 +00:00
c07fb81a56 op-node: default L1 RPC kind basic -> standard (eth_getBlockReceipts)
Our L1 geth (e.g. rpc-de-31 ethereum-mainnet-geth) returns null for the
per-tx receipt path on historical blocks but serves eth_getBlockReceipts
fine. op-node's 'basic' kind only uses the per-tx batch, so chains
deriving from genesis (e.g. Katana) hit the broken path immediately;
chains already synced past it don't. 'standard' uses eth_getBlockReceipts
and keeps the per-tx batch as fallback (strict superset of 'basic'), so
it is safe for every OP-stack node.

Flips OP_NODE_L1_RPC_KIND default across all 176 op-node compose files.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-16 03:26:02 +00:00
576530e68f gaiad: v27.4.0 + statesync-skip-if-data + wasm-snapshot seed (port cursor host fix)
Live chain is v27.4.0 (abci_info); v25.3.2 halts at the v26/v27 upgrade heights (ghcr
lags but v27.4.0 IS pullable). Statesync omits CosmWasm + IBC 08-wasm files -> gaia panics
'wasmlckeeper failed initialize pinned codes / Error opening Wasm file'. init.sh now: skip
statesync if data/application.db exists (idempotent restart, preserves a restored snapshot);
on fresh statesync, seed wasm from polkachu cosmos_wasmonly.tar.lz4. Porting cursor's host
fix into the repo (host /root/rpc edits get reverted by run-rpc-update re-clone).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-16 03:08:02 +00:00
b4922e7fd3 gaiad: traefik split-WS router -> /websocket (CometBFT WS endpoint)
dshackle's WS head subscription hit the RPC root (got HTTP 200, not a WS upgrade).
Set client_ws_path=/websocket -> rpc-client.yml emits the avalanche-style priority-100
WS router (Header Upgrade:websocket -> replacepath /websocket) so WS reaches CometBFT's
/websocket. RPC router unchanged (stripprefix -> root).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-16 01:53:12 +00:00
rob
48683cabba arb nitro: add minimal-prune node variant (--init.prune=minimal)
New minimal profile for arbitrum one/nova/sepolia: a pruned-style node whose
prune cycle uses --init.prune=minimal (most aggressive: genesis+head only)
instead of full. Separate composes so it can be tested independently of the
production pruned nodes.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-15 13:34:22 +00:00
814afc05af gaiad: lead client_staticpeers with polkachu statesync node (direct IP, serves snapshots)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-15 12:56:56 +00:00
68d77df969 gaiad: set client_staticpeers to live cosmos peers (statesync peer discovery)
The polkachu seed was refusing -> 0 peers -> statesync couldn't discover snapshots.
Use client_staticpeers (cosmos-hub is a CLIENT in our stack) with 8 live peers from
polkachu /net_info. Template now reads client_staticpeers/client_bootnodes (standard attrs).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-15 12:52:57 +00:00
0df57827f2 cometbft statesync: fix rpc_servers/trust_height/trust_hash (underscore, not hyphen)
CometBFT config.toml uses underscore keys; the sed targeted hyphens (rpc-servers etc.)
so they never matched -> rpc_servers stayed empty -> 'at least 2 RPC servers required,
got 0'. Now section-anchored to [statesync] + [_-] tolerant. (This is also why sei
never held chainhead — same hyphen bug in its bespoke init.)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-15 12:45:42 +00:00
rob
e8dafa1383 prune: regenerate matching-version .prune.yml + add avalanche offline-pruning
Fixes prune-version drift that corrupted nodes (prune ran an older nitro binary
than the node). All .prune.yml now regenerate from the same config as the normal
compose via client_needs_prune.

- arb nitro: nova/one/sepolia .prune.yml bumped to the node's version (v3.10.1)
- avalanche: add .prune.yml variants that mount a /config/prune chain-config with
  offline-pruning-enabled, plus avalanche/{mainnet,fuji}/prune/C/config.json
- archive profiles are excluded (db_type==pruned gate) so they are never pruned

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-15 12:40:10 +00:00
fc0bd40523 gaiad: drop invalid --api.*/--grpc.* start flags (printed help + exited)
CometBFT RPC :26657 (the dshackle upstream) is served via rpc.laddr regardless;
minimal 'gaiad start --home --minimum-gas-prices' is the valid invocation.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-15 12:39:47 +00:00
55433a4822 cosmos: fix CometBFT RPC .result wrapping in statesync + sync-status helpers
ct_configure_statesync read .block.header.height (null — CometBFT wraps in .result) so
statesync silently skipped -> gaiad fell back to genesis replay (panic). Use
'.result.X // .X' (robust to wrapped/unwrapped). Same fallback in check-health --cosmos.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-15 12:36:47 +00:00
0c67fe451b gaiad: drop hardcoded p2p port (auto-assign to avoid host collision w/ morph 26656)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-15 12:32:45 +00:00
71345092f9 cosmos-hub (gaiad): add compose + init.sh + cometbft.Dockerfile
First pure-cosmos chain (validates the §2 cometbft-common.sh lib via real gaiad init).
Single-binary CometBFT client serving :26657 (dshackle chain=cosmos-hub). Statesync
bootstrap (polkachu) since genesis-replay across gaia governance upgrades is impractical.
gzipped genesis. v25.3.2 (live chain version; registry's v27.4.0 not on ghcr). Params
(chain_id/genesis_url/statesync_rpc/seeds/min_gas) from context.yml.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-15 12:30:07 +00:00
78c78f5079 sync-status: add cosmos/CometBFT handler (gaiad + cosmos batch)
Cosmos-hub has NO EVM RPC — eth_blockNumber checks don't apply. Add a --cosmos
handler that probes the CometBFT /status method: sync_info.catching_up=false =>
online, else syncing; optional head-gap check vs the drpc reference. sync-status.sh
dispatches protocol=cosmos -> check-health.sh --cosmos. eth/starknet/aztec untouched.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-15 12:15:54 +00:00
9566a6d23f morph: init container also mkdir /db/data + seed priv_validator_state.json
morphnode panics (open /db/data/...: no such file) without the data dir + initial
validator state. EL healthy; this unblocks the CL.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-15 12:05:22 +00:00
2adce4cf5a morph: rename node morph-node->morph; add CL config seed (config.toml+genesis.json)
morphnode --mainnet does NOT self-generate config (crash: Config File not found in
/db/config). Add init container that seeds the committed morph/mainnet/node-data/config/
(config.toml + genesis.json, from run-morph-node) into /db/config (no-clobber). Node
renamed morph-node -> morph (cleaner; vol _morph, MORPH_MAINNET_MORPH_* env).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-15 12:02:56 +00:00
f4ce85bbcd morph-mainnet: add generated compose (Scroll-l2geth EL + op-node-style CL)
EL morph-geth (go-ethereum 2.2.3, --morph --morph-mpt, archive leveldb/hash/mpt,
pure EL driven via engine API) + CL morph-node (node:0.5.7, env-driven, derives from
eth-mainnet L1). Params from authoritative morph-l2/run-morph-node. Fresh L1 derivation.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-15 11:52:26 +00:00
535f51bda2 cometbft: add reusable family-C bootstrap library + docs
Generalizes the berachain beacon-kit init pattern into cometbft-common.sh (sourced
helper lib) so morph-node, gaiad/cosmos batch, shibarium and the EVM-cosmos chains
(haqq/sei/zero-gravity) become thin init.sh overrides. Validated against the real
init flows of beacon-kit, haqq, sei, zero-gravity. README.cometbft.md documents the
pattern (per-chain Dockerfile + thin init.sh + context). berachain stays on its own
beacon-kit.yml (untouched).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-15 11:26:06 +00:00
fa3f96382c base: bump node-reth client + op-node v0.16.1 -> v1.1.0 (Storage V2 / reth 2.3.0-dev)
For de-33 base V1->V2 migrate-v2 experiment. Global pin per operator;
de-13 prod base only moves on its next run-rpc-update.
2026-06-15 11:16:49 +00:00
237ec1d2e1 apechain: pin official Caldera image apechain-v3.5.6 (revert DAC flip)
Generic nitro v3.10.1 strictly refuses DA on a DAC=false chain, so DAC=true
override did not work (confirmed live: node still errored 'AnyTrust DA usage
set to false'). ApeChain's canonical config is DAC=false; its official build
public.ecr.aws/i6b2w2n6/nitro-node:apechain-v3.5.6 accepts DAC=false + AnyTrust
DAS. Revert DAC to false (canonical) + keep das rest-aggregator.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-15 11:03:14 +00:00
110c3483cd apechain: enable AnyTrust DAS reader (it IS AnyTrust, not pure rollup)
Live node was 'forked', looping 'no AnyTrust reader configured for AnyTrust
message (header byte 0x88)'. ApeChain (Caldera) posts AnyTrust DA certs; the
earlier 'pure rollup' fix removed the DAS reader. Set chain-config
DataAvailabilityCommittee=true (matching every other AnyTrust chain in repo,
required by nitro v3.10.1 when DA enabled) and add das rest-aggregator
https://apechain.calderachain.xyz/rest-aggregator (from official Caldera node config).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-15 10:52:30 +00:00
0838787aa8 katana-mainnet: add Jovian (1773066601) fork override for op-geth + op-node
Conduit rollup.json omits post-Fjord fork times; chain is past Jovian so
without it op-geth/op-node mis-execute/mis-derive at the boundary.
2026-06-15 10:38:58 +00:00
be54c7d7cd Revert "superseed-sepolia: add Karst hardfork time (1781712001 = 2026-06-17 16:00:01 UTC)"
This reverts commit ec702984f1.
2026-06-15 10:36:35 +00:00
ec702984f1 superseed-sepolia: add Karst hardfork time (1781712001 = 2026-06-17 16:00:01 UTC)
OP Sepolia network-wide Karst activation. op-node v1.19.0 (pinned) and
conduit-op-reth:latest are Karst-aware; op-node v1.17.0 introduced Karst.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-15 10:29:17 +00:00
a032f3c9d6 apechain: drop AnyTrust DA flags (it is a pure rollup, DAC=false)
nitro crash-looped: 'AnyTrust DA usage for this chain is set to false but
--node.da.anytrust.enable is set to true'. apechain posts data to Arbitrum One,
not a DAS. Regenerated without DA flags; added parent-chain-is-arbitrum:true to
the chain-info. Derives batches from the Arbitrum One parent + sequencer feed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-15 09:55:40 +00:00
e68c802d7c Add apechain-mainnet nitro node (Arbitrum Orbit L3 on Arbitrum One)
ApeChain chain 33139, parent Arbitrum One (42161), Caldera infra, $APE gas.
baseConfig.json chain-info from the official Constellation replica nodeConfig
(ArbOS 31, owner 0x5737..601c, full rollup contract set). das/feed/sequencer
point at Caldera; parent-chain = ARBITRUM_ONE_EXECUTION_RPC (no L1 beacon).
pruned-pebble-path + archive-pebble-hash profiles.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-15 06:45:51 +00:00
450c9d7874 Restore full reference-rpc-endpoint.json (filtered gen truncated it)
The doma --filter run regenerated this global aggregate from only the
doma subset, emptying it. Restore the complete file; doma resolves via
its dshackle label so no reference-endpoint entry is required.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-15 05:52:52 +00:00
1afe87fe12 Add doma-mainnet op-geth node (Conduit OP-stack, chain 97477)
Conduit OP-stack L2 on Ethereum mainnet L1. Canyon-at-genesis; op-geth
ingests the Conduit genesis.json directly (no op-reth hash pitfall).
Config served from Conduit slug doma-mainnet-qvzsfv8nv0.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-15 05:51:18 +00:00
rob
b24c0f12dd clone-node/clone-backup: add --no-slowdisk override for /slowdisk offload
Mirror the restore-volumes.sh --no-slowdisk capability for live/backup clones.
Both scripts gate the target /slowdisk static-file offload on the target's
SLOWDISK env (case-insensitive, matches the Python-templated 'True') and accept
a --no-slowdisk flag that forces the offload off for one run. When SLOWDISK is
on but the target /slowdisk is too small for the static files, the clone warns
and aborts, telling the operator to re-run with --no-slowdisk.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-15 05:43:45 +00:00
baf26b234f static-file size/manifest: match offloaded (symlinked) dirs + show offload indicator
find -type d skipped static dirs that are already offloaded (a symlink to /slowdisk is type
l, not d) — so show-static-file-size.sh reported zero static for an offloaded node (e.g. bob
on de-35), and backup-node.sh would drop them from the manifest (breaking re-offload on the
next restore). Match dirs AND symlinks now (root-level entries too).

show-static-file-size.sh also tags each static dir with its location:
[OFFLOADED -> /slowdisk/...], [on-disk], or [BROKEN SYMLINK ...].

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-15 05:03:16 +00:00
57b5757a85 restore: SLOWDISK gate matches Python-templated "True" (capitalized)
The SLOWDISK value is emitted by Python templates as capitalized booleans (True/False), so
match "True" (also accept manual lowercase "true"); anything else = offload off. --no-slowdisk
sets SLOWDISK=False to override.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-15 04:58:13 +00:00
6c110c08ed restore: gate offload on SLOWDISK=true (exact) with --no-slowdisk override
Use the single SLOWDISK env var from .env (already set to "true" on the dedicated-extra-disk
hosts, e.g. us-35). Offload runs only when SLOWDISK is exactly the literal "true" (case
matters) and the --no-slowdisk flag was not passed. --no-slowdisk forces it off even when
SLOWDISK=true (extra disk full). Replaces the prior NO_SLOWDISK inversion.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-15 04:50:06 +00:00
b9efcfe34d restore: invert gate to NO_SLOWDISK (default true) + add --no-slowdisk override flag
Static-file offload gating, clearer semantics:
- NO_SLOWDISK defaults to TRUE (offload OFF) — safe default everywhere.
- A host with a real dedicated extra disk at /slowdisk sets NO_SLOWDISK=false in its .env
  to ENABLE the offload.
- New --no-slowdisk CLI flag forces NO_SLOWDISK=true, overriding the .env false — for when
  the extra disk exists but is full. Flag is parsed position-independently; the positional
  args ($1 compose, $2 remote source) are preserved.

Offload runs only when NO_SLOWDISK is false AND the flag was not passed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-15 04:43:47 +00:00
3caa4ab873 restore: gate static-file offload on SLOWDISK env var (.env), not just /slowdisk presence
A `/slowdisk` directory exists on hosts even when it is just a folder on the root disk
(no dedicated extra disk) — offloading there gives no benefit. Source the host .env and
require SLOWDISK to be set (operator sets it only on hosts with a real extra disk mounted
at /slowdisk) before activating the static-file -> /slowdisk symlink offload. Unset =
normal extract everywhere. Target path stays /slowdisk (the fixed in-container mount).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-15 04:39:48 +00:00
a9e8fba794 static-file list: fleet-verified rewrite + root-vs-suffix matching
Verified static-file dirs across the reachable fleet via per-host cursor inspection
(de-13/14/16/22/27/30/31/32/35, us-16/41, uk-4). Findings:
- nitro freezers are network-prefixed (arbitrum-one/nitro/l2chaindata/ancient ... up to
  1.3 TB), so the old chain/ and data/ prefixes matched nothing;
- missing: snapshots (erigon3/op-erigon/cosmos), lightchaindata/ancient, nested l2geth
  geth/geth/chaindata/ancient, aztec archiver, nitro classic-msg/ancient;
- bare `ancient` matched nothing (all are nested).

List rewritten to canonical entries. Matching (backup-node.sh manifest + show-static-
file-size.sh) now: entry with NO slash = root-level only (so `snapshots` does NOT catch
postgres pg_logical/snapshots), entry WITH a slash = path-suffix via find -path "*/X"
(matches any prefix: network-prefixed nitro, nested datadirs). The manifest now records
the CONCRETE per-volume path so restore-volumes.sh recreates the exact symlink.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-15 03:22:26 +00:00
345538954d restore/cleanup: implement static-file -> /slowdisk offload + free it on removal
restore-volumes.sh: pre-create static-file symlinks from the backup's .txt manifest so
the immutable ancient/freezer dirs land on /slowdisk (SSD) and extract THROUGH the
symlinks via tar --keep-directory-symlink (was --dereference, which clobbered them);
hot state stays on the primary disk. Cleans stale /slowdisk targets first (no leak on
re-restore). Safe fallbacks: no /slowdisk / no manifest / no static paths -> normal
extract. Reth excluded (reth dropped whole-dir static-file symlinks).

volume-utils.sh: add delete_slowdisk_targets_for_key() — follows a volume's symlinks and
sweeps the rpc_<key>__data_ pattern under /slowdisk (matches delete-volumes.sh).

cleanup-volumes.sh: free the /slowdisk static data before docker volume rm (was leaking),
and fix the fragile substring used/unused match to an exact name match.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-15 03:05:04 +00:00
f39e09dac0 hemi mainnet: fix op-geth/op-node startup crashes
- op-geth: drop stray trailing ` \` on --hvm.genesisheader and
  --tbc.leveldbhome (the backslash was decoded as part of the hVM
  genesis header -> "invalid byte U+0020 ' '" crash)
- op-node: add OP_NODE_OVERRIDE_ECOTONE/CANYON/DELTA=1725868497
  (rollup.json carries only regolith_time; without these the later
  fjord override tripped "fork fjord set but prior fork ecotone missing")
- op-node: OP_NODE_BSS_WS no longer wrapped in literal quotes
- bump to Hemi's current pins (mainnet is past isthmus/holocene):
  op-geth 05d4d8f, op-node 7c70d2d

Values mirror hemilabs/hemi-node mainnet scripts/gen.sh + config.json.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-15 02:27:32 +00:00
3c4492179c arc: Malachite CL is now a proper node component (arc-testnet-node)
Consensus service renamed arc-testnet-consensus -> arc-testnet-node: it is now
generated by templates/nodes/malachite.yml (node: malachite) instead of being
hacked into the arc client template's indexer block. Container body unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-15 02:19:36 +00:00
2249352c20 op-node: default OP_NODE_L2_ENGINE_RPC_TIMEOUT=120s for all OP-stack nodes
op-node's default engine-API RPC timeout is short; on nodes whose EL responds
slowly to engine_newPayload/forkChoiceUpdated (e.g. boba-mainnet op-reth pruned
during catch-up) op-node times out driving the EL and the execution head freezes
while op-node keeps deriving from L1. Raising the engine RPC timeout to 120s
prevents this class of stall fleet-wide.

Regenerated from env/op/node.env: adds the env var to every op-node service
(116 geth + 40 reth + 22 erigon composes); additions only, no other changes.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-14 15:43:34 +00:00
16057e4875 bob-sepolia: add Jovian to EL genesis; drop redundant op-node override
genesis.json had no jovianTime, so op-geth/op-reth never activated Jovian
and op-node rejected Jovian-era L1-info txs ("unexpected length"). Added
jovianTime=1772548201 to the genesis config (EL source of truth for both
geth and reth). Removed OP_NODE_OVERRIDE_JOVIAN from the op-node env: the
local rollup.json already carries jovian_time, making the override redundant.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-14 06:28:01 +00:00
36b8243a03 op-reth: emit --disable-discovery (not --nodiscover) for node_sync nodes
op-reth rejects geth's --nodiscover flag, which the op-geth base template
emits for node_sync nodes. Use reth's --disable-discovery instead.

- hashkeychain-testnet: now node_sync/consensus-layer (was execution-layer
  with an empty DB, which could not backfill) + --disable-discovery
- xlayer mainnet/testnet (archive+pruned): --nodiscover -> --disable-discovery
  (these were latently broken on restart)

Fix lives in the op-reth.yml base template so all reth nodes are covered;
the redundant override in op-reth.boba.yml was removed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-14 06:18:55 +00:00
10deed7819 superseed-sepolia op-reth: use conduit-op-reth image + historicalrpc
Chain 53302 is legacy Canyon-at-genesis (block 0 has no Shanghai
withdrawalsRoot). Upstream op-reth always injects it and computes the wrong
genesis hash 0xfa6f21ae vs canonical 0x7274a90e, so it crash-loops against
the DB. conduit-op-reth has legacy-Canyon genesis compatibility for 53302.

Regenerated from context.yml (reth client_image override) + op-reth.yml
template (historicalrpc for superseed-sepolia fresh-sync backfill).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-14 06:10:06 +00:00