Match the dRPC gateway's per-chain "how many blocks behind is ok" model instead of a
fixed 2s/5s timestamp tolerance:
- check-health.sh: compare the reference head vs local head by BLOCK NUMBER and classify
with the chain's dRPC lag thresholds (LAGGING_LAG/SYNCING_LAG, in blocks, from
chains.yaml). dRPC orders the two thresholds inconsistently across chains (sometimes
lagging<syncing, sometimes the reverse), so the smaller is used as the online boundary and
the larger as the syncing/drop boundary. Defaults to 2/6 when a chain has no thresholds.
- multicurl.sh: also skip responses with result:null (a lagging endpoint lacking the
requested block) so the fallback reference URLs are actually tried. Previously the first
endpoint's {"result":null} was accepted as success -> fallbacks never ran, and the null
reference hash made check-health report false "forked" (the online/forked flapping).
- sync-status.sh: resolve the lag thresholds (by drpc slug or chain id) and export
LAGGING_LAG/SYNCING_LAG.
- reference-rpc-endpoint.sh: add --lags and --block-time-ms lookups.
- reference-rpc-endpoint.json: regenerated with per-chain block_time_ms + lagging_lag +
syncing_lag (additive).
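A minimal sketch of the block-number classification, not the actual check-health.sh
code. The function name classify_lag and the state names online/lagging/syncing are
assumptions made for illustration; only the min/max ordering of the two thresholds and
the 2/6 defaults come from this change.

```shell
#!/usr/bin/env bash
# classify_lag REF_HEAD LOCAL_HEAD [LAGGING_LAG] [SYNCING_LAG]
# Hypothetical helper: prints online, lagging or syncing.
classify_lag() {
  local ref=$1 local_head=$2
  local a=${3:-2} b=${4:-6}          # defaults when the chain has no thresholds
  # The two thresholds may arrive in either order; sort them.
  local lo=$(( a < b ? a : b ))
  local hi=$(( a < b ? b : a ))
  local lag=$(( ref - local_head ))
  # A local head ahead of the reference counts as zero lag.
  if (( lag < 0 )); then lag=0; fi
  if (( lag <= lo )); then
    echo online
  elif (( lag <= hi )); then
    echo lagging
  else
    echo syncing
  fi
}

classify_lag 1000 997 6 2   # thresholds given in reverse order; prints "lagging"
```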
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Revert to the prune-cycle model for minimal nodes: the normal compose serves
RPC with no --init.prune, and a generated .prune.yml runs --init.prune=minimal,
driven periodically by prune-if-prunable (same mechanism as pruned/full). Minimal
nodes are seeded from a pruned backup, then pruned to minimal.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Minimal BSC nodes lack the state to serve eth_getProof and return
"header not found" (especially while catching up after a restart). The
drpc gateway probes eth_getProof and marks the WHOLE upstream unavailable
on failure, so e.g. bsc-mainnet-bsc-minimal on us-32 showed unavailable in
US-East despite the node being online and at head.
Disable eth_getProof on bsc minimal upstreams only (network=bsc,
db_type=minimal) so they stay available for every other method. Archive/
pruned bsc and all other nodes keep serving getProof.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
X Layer mainnet/testnet op-reth now use the OKX xlayer/op-reth:v0.0.4.1 fork
(generic op-reth can't read the X Layer DB format), the image's built-in
xlayer-<network> chain spec, --rpc.legacy-url for post-snapshot history gap-fill,
and the OKX archive-compatible reth.toml (light pruning only: merkle_changesets
distance=10064). Switched the reth profile from full_trace to archive_trace and
deleted the pruned variants — a "pruned" compose over archive snapshot data
crash-loops on a block-height mismatch. Requires the official OKX reth snapshot
pre-loaded into the volume; do NOT sync from scratch.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Correct the minimal-node model: the minimal compose itself carries
--init.prune=minimal (prunes to genesis+head on start) instead of relying on a
separate .prune.yml — otherwise a minimal node was byte-identical to pruned in
normal operation and had no way to enforce minimal state. Removed the redundant
minimal .prune.yml files.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
op-node was following the sequencer tip in execution-layer mode while op-reth stayed
at genesis (blocks buffered, never canonical). Switch to consensus-layer sync so
op-node drives L1 derivation, add op-reth --disable-discovery, and use debug_geth
(eth_getBlockReceipts) for the L1 RPC kind — "basic" per-tx receipts hit "receipt 0
has unexpected nil block number" on the Sepolia L1 endpoint. OP_NODE_OVERRIDE_JOVIAN
=1772548201 is belt-and-suspenders (jovian_time already in rollup.json). Applies to
all bob-sepolia EL variants. Regenerated from the vibe-node generator.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
op-reth/op-geth sepolia now run op-node consensus-layer sync (empty DB is never
P2P-backfilled) with --disable-discovery, derive L1 via debug_geth against an
operator-set full-history Sepolia RPC (SUPERSEED_SEPOLIA_L1_EXECUTION_RPC, falling
back to ETHEREUM_SEPOLIA_EXECUTION_RPC), and serve a genesis.json with the invalid
top-level excessBlobGas/blobGasUsed nulls removed (eip1559DenominatorCanyon=250
retained). Regenerated from the vibe-node generator.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Per cursor cosmos handoff. (1) Traefik WS router rule Headers(Upgrade,websocket) is
case-SENSITIVE -> clients sending 'Upgrade: WebSocket' (python websocket-client) fall
through to the RPC router (200/400 not 101). rpc-client.yml now emits
HeadersRegexp(Upgrade,(?i)websocket) for all split-WS chains; regenerated cosmos+avalanche.
(2) Refactor into cometbft-common.sh: ct_configure_statesync skips if data/application.db
exists; new ct_ensure_wasm seeds CosmWasm/IBC-08 wasm (statesync omits them). init.sh calls
both. README documents wasm/version/WS gotchas. 45 other split-WS composes get HeadersRegexp
on their next regen.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Our L1 geth (e.g. rpc-de-31 ethereum-mainnet-geth) returns null for the
per-tx receipt path on historical blocks but serves eth_getBlockReceipts
fine. op-node's 'basic' kind only uses the per-tx batch, so chains
deriving from genesis (e.g. Katana) hit the broken path immediately;
chains already synced past it don't. 'standard' uses eth_getBlockReceipts
and keeps the per-tx batch as fallback (strict superset of 'basic'), so
it is safe for every OP-stack node.
Flips OP_NODE_L1_RPC_KIND default across all 176 op-node compose files.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
dshackle's WS head subscription hit the RPC root (got HTTP 200, not a WS upgrade).
Set client_ws_path=/websocket -> rpc-client.yml emits the avalanche-style priority-100
WS router (Header Upgrade:websocket -> replacepath /websocket) so WS reaches CometBFT's
/websocket. RPC router unchanged (stripprefix -> root).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
New minimal profile for arbitrum one/nova/sepolia: a pruned-style node whose
prune cycle uses --init.prune=minimal (most aggressive: genesis+head only)
instead of full. Separate composes so it can be tested independently of the
production pruned nodes.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The polkachu seed was refusing connections -> 0 peers -> statesync couldn't discover snapshots.
Use client_staticpeers (cosmos-hub is a CLIENT in our stack) with 8 live peers from
polkachu /net_info. Template now reads client_staticpeers/client_bootnodes (standard attrs).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
CometBFT config.toml uses underscore keys; the sed targeted hyphens (rpc-servers etc.)
so they never matched -> rpc_servers stayed empty -> 'at least 2 RPC servers required,
got 0'. Now section-anchored to [statesync] + [_-] tolerant. (This is also why sei
never held chainhead — same hyphen bug in its bespoke init.)
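A rough sketch of a section-anchored, separator-tolerant rewrite, assuming a
config.toml where [statesync] is followed by another [section] header. The function
name set_statesync_key is hypothetical and the real init code may differ; the value
must not contain | or & because both are special in the sed replacement.

```shell
#!/usr/bin/env bash
# set_statesync_key FILE KEY VALUE
# Prints FILE with KEY (given in underscore form) set inside [statesync] only.
set_statesync_key() {
  local file=$1 key=$2 value=$3
  # Accept either separator in the existing key: rpc_servers or rpc-servers.
  local pattern=${key//[_-]/[_-]}
  # The address range limits the substitution to the [statesync] section.
  sed -E "/^\[statesync\]/,/^\[/ s|^(${pattern})[[:space:]]*=.*|${key} = \"${value}\"|" "$file"
}
```

Writing to stdout instead of editing in place sidesteps the GNU/BSD difference in
`sed -i`; a caller can redirect to a temp file and move it over the original.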
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Fixes prune-version drift that corrupted nodes (prune ran an older nitro binary
than the node). All .prune.yml now regenerate from the same config as the normal
compose via client_needs_prune.
- arb nitro: nova/one/sepolia .prune.yml bumped to the node's version (v3.10.1)
- avalanche: add .prune.yml variants that mount a /config/prune chain-config with
offline-pruning-enabled, plus avalanche/{mainnet,fuji}/prune/C/config.json
- archive profiles are excluded (db_type==pruned gate) so they are never pruned
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
CometBFT RPC :26657 (the dshackle upstream) is served via rpc.laddr regardless;
minimal 'gaiad start --home --minimum-gas-prices' is the valid invocation.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
ct_configure_statesync read .block.header.height (null — CometBFT wraps in .result) so
statesync silently skipped -> gaiad fell back to genesis replay (panic). Use
'.result.X // .X' (robust to wrapped/unwrapped). Same fallback in check-health --cosmos.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
First pure-cosmos chain (validates the §2 cometbft-common.sh lib via real gaiad init).
Single-binary CometBFT client serving :26657 (dshackle chain=cosmos-hub). Statesync
bootstrap (polkachu) since genesis-replay across gaia governance upgrades is impractical.
gzipped genesis. v25.3.2 (live chain version; registry's v27.4.0 not on ghcr). Params
(chain_id/genesis_url/statesync_rpc/seeds/min_gas) from context.yml.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Cosmos-hub has NO EVM RPC — eth_blockNumber checks don't apply. Add a --cosmos
handler that probes the CometBFT /status method: sync_info.catching_up=false =>
online, else syncing; optional head-gap check vs the drpc reference. sync-status.sh
dispatches protocol=cosmos -> check-health.sh --cosmos. eth/starknet/aztec untouched.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
morphnode panics (open /db/data/...: no such file) without the data dir + initial
validator state. EL healthy; this unblocks the CL.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
morphnode --mainnet does NOT self-generate config (crash: Config File not found in
/db/config). Add init container that seeds the committed morph/mainnet/node-data/config/
(config.toml + genesis.json, from run-morph-node) into /db/config (no-clobber). Node
renamed morph-node -> morph (cleaner; vol _morph, MORPH_MAINNET_MORPH_* env).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
EL morph-geth (go-ethereum 2.2.3, --morph --morph-mpt, archive leveldb/hash/mpt,
pure EL driven via engine API) + CL morph-node (node:0.5.7, env-driven, derives from
eth-mainnet L1). Params from authoritative morph-l2/run-morph-node. Fresh L1 derivation.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Generalizes the berachain beacon-kit init pattern into cometbft-common.sh (sourced
helper lib) so morph-node, gaiad/cosmos batch, shibarium and the EVM-cosmos chains
(haqq/sei/zero-gravity) become thin init.sh overrides. Validated against the real
init flows of beacon-kit, haqq, sei, zero-gravity. README.cometbft.md documents the
pattern (per-chain Dockerfile + thin init.sh + context). berachain stays on its own
beacon-kit.yml (untouched).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Generic nitro v3.10.1 strictly refuses DA on a DAC=false chain, so the DAC=true
override did not work (confirmed live: node still errored 'AnyTrust DA usage
set to false'). ApeChain's canonical config is DAC=false; its official build
public.ecr.aws/i6b2w2n6/nitro-node:apechain-v3.5.6 accepts DAC=false + AnyTrust
DAS. Revert DAC to false (canonical) + keep das rest-aggregator.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Live node was 'forked', looping 'no AnyTrust reader configured for AnyTrust
message (header byte 0x88)'. ApeChain (Caldera) posts AnyTrust DA certs; the
earlier 'pure rollup' fix removed the DAS reader. Set chain-config
DataAvailabilityCommittee=true (matching every other AnyTrust chain in repo,
required by nitro v3.10.1 when DA enabled) and add das rest-aggregator
https://apechain.calderachain.xyz/rest-aggregator (from official Caldera node config).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
OP Sepolia network-wide Karst activation. op-node v1.19.0 (pinned) and
conduit-op-reth:latest are Karst-aware; op-node v1.17.0 introduced Karst.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
nitro crash-looped: 'AnyTrust DA usage for this chain is set to false but
--node.da.anytrust.enable is set to true'. apechain posts data to Arbitrum One,
not a DAS. Regenerated without DA flags; added parent-chain-is-arbitrum:true to
the chain-info. Derives batches from the Arbitrum One parent + sequencer feed.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
ApeChain chain 33139, parent Arbitrum One (42161), Caldera infra, $APE gas.
baseConfig.json chain-info from the official Constellation replica nodeConfig
(ArbOS 31, owner 0x5737..601c, full rollup contract set). das/feed/sequencer
point at Caldera; parent-chain = ARBITRUM_ONE_EXECUTION_RPC (no L1 beacon).
pruned-pebble-path + archive-pebble-hash profiles.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The doma --filter run regenerated this global aggregate from only the
doma subset, emptying it. Restore the complete file; doma resolves via
its dshackle label so no reference-endpoint entry is required.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Conduit OP-stack L2 on Ethereum mainnet L1. Canyon-at-genesis; op-geth
ingests the Conduit genesis.json directly (no op-reth hash pitfall).
Config served from Conduit slug doma-mainnet-qvzsfv8nv0.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Mirror the restore-volumes.sh --no-slowdisk capability for live/backup clones.
Both scripts gate the target /slowdisk static-file offload on the target's
SLOWDISK env (case-insensitive, matches the Python-templated 'True') and accept
a --no-slowdisk flag that forces the offload off for one run. When SLOWDISK is
on but the target /slowdisk is too small for the static files, the clone warns
and aborts, telling the operator to re-run with --no-slowdisk.
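The gating can be sketched roughly as follows. slowdisk_enabled is a hypothetical
name and the scripts may structure this differently; the sketch only illustrates the
case-insensitive match on SLOWDISK and the one-run override.

```shell
#!/usr/bin/env bash
# slowdisk_enabled SLOWDISK_VALUE [NO_SLOWDISK_FLAG]
# Returns 0 (offload on) only when the value is "true" in any letter case
# and the --no-slowdisk flag (passed here as 1) was not given.
slowdisk_enabled() {
  local value=${1:-} no_flag=${2:-0}
  if [ "$no_flag" = 1 ]; then
    return 1
  fi
  case $(printf '%s' "$value" | tr '[:upper:]' '[:lower:]') in
    true) return 0 ;;
    *)    return 1 ;;   # unset, empty, False, or anything else: offload off
  esac
}
```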
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
find -type d skipped static dirs that are already offloaded (a symlink to /slowdisk is type
l, not d) — so show-static-file-size.sh reported zero static for an offloaded node (e.g. bob
on de-35), and backup-node.sh would drop them from the manifest (breaking re-offload on the
next restore). Match dirs AND symlinks now (root-level entries too).
show-static-file-size.sh also tags each static dir with its location:
[OFFLOADED -> /slowdisk/...], [on-disk], or [BROKEN SYMLINK ...].
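An illustrative sketch of the dirs-and-symlinks match with location tags, not the
script itself. list_static_dirs is a hypothetical name, and the exact text after
BROKEN SYMLINK is an assumption because the commit elides it.

```shell
#!/usr/bin/env bash
# list_static_dirs ROOT SUFFIX
# Lists every dir or symlink under ROOT whose path ends in /SUFFIX.
list_static_dirs() {
  local root=$1 suffix=$2 p
  # -type d alone would skip offloaded dirs, which are symlinks (type l).
  find "$root" \( -type d -o -type l \) -path "*/$suffix" -print | sort |
  while IFS= read -r p; do
    if [ -L "$p" ]; then
      if [ -e "$p" ]; then
        printf '%s [OFFLOADED -> %s]\n' "$p" "$(readlink "$p")"
      else
        # -e follows the link, so it fails when the target is gone.
        printf '%s [BROKEN SYMLINK %s]\n' "$p" "$(readlink "$p")"
      fi
    else
      printf '%s [on-disk]\n' "$p"
    fi
  done
}
```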
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The SLOWDISK value is emitted by Python templates as capitalized booleans (True/False), so
match "True" (also accept manual lowercase "true"); anything else = offload off. --no-slowdisk
sets SLOWDISK=False to override.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Use the single SLOWDISK env var from .env (already set to "true" on the dedicated-extra-disk
hosts, e.g. us-35). Offload runs only when SLOWDISK is exactly the literal "true" (case
matters) and the --no-slowdisk flag was not passed. --no-slowdisk forces it off even when
SLOWDISK=true (extra disk full). Replaces the prior NO_SLOWDISK inversion.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Static-file offload gating, clearer semantics:
- NO_SLOWDISK defaults to TRUE (offload OFF) — safe default everywhere.
- A host with a real dedicated extra disk at /slowdisk sets NO_SLOWDISK=false in its .env
to ENABLE the offload.
- New --no-slowdisk CLI flag forces NO_SLOWDISK=true, overriding the .env false — for when
the extra disk exists but is full. Flag is parsed position-independently; the positional
args ($1 compose, $2 remote source) are preserved.
Offload runs only when NO_SLOWDISK is false AND the flag was not passed.
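A small sketch of the position-independent flag parsing and the resulting gate.
The names parse_restore_args, offload_enabled, NO_SLOWDISK_FLAG and POSITIONAL are
hypothetical, chosen for illustration only.

```shell
#!/usr/bin/env bash
# Split --no-slowdisk out of the argument list wherever it appears,
# keeping the remaining arguments in their original order.
parse_restore_args() {
  NO_SLOWDISK_FLAG=0
  POSITIONAL=()
  local arg
  for arg in "$@"; do
    case $arg in
      --no-slowdisk) NO_SLOWDISK_FLAG=1 ;;
      *)             POSITIONAL+=("$arg") ;;
    esac
  done
}

# Offload runs only when NO_SLOWDISK is false AND the flag was not passed.
# NO_SLOWDISK defaults to true (offload off) when unset.
offload_enabled() {
  [ "${NO_SLOWDISK:-true}" = "false" ] && [ "$NO_SLOWDISK_FLAG" != 1 ]
}
```

After parsing, the compose file and remote source are available as
"${POSITIONAL[0]}" and "${POSITIONAL[1]}" regardless of where the flag was placed.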
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
A `/slowdisk` directory exists on hosts even when it is just a folder on the root disk
(no dedicated extra disk) — offloading there gives no benefit. Source the host .env and
require SLOWDISK to be set (operator sets it only on hosts with a real extra disk mounted
at /slowdisk) before activating the static-file -> /slowdisk symlink offload. Unset =
normal extract everywhere. Target path stays /slowdisk (the fixed in-container mount).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Verified static-file dirs across the reachable fleet via per-host cursor inspection
(de-13/14/16/22/27/30/31/32/35, us-16/41, uk-4). Findings:
- nitro freezers are network-prefixed (arbitrum-one/nitro/l2chaindata/ancient ... up to
1.3 TB), so the old chain/ and data/ prefixes matched nothing;
- missing: snapshots (erigon3/op-erigon/cosmos), lightchaindata/ancient, nested l2geth
geth/geth/chaindata/ancient, aztec archiver, nitro classic-msg/ancient;
- bare `ancient` matched nothing (all are nested).
List rewritten to canonical entries. Matching (backup-node.sh manifest + show-static-
file-size.sh) now: entry with NO slash = root-level only (so `snapshots` does NOT catch
postgres pg_logical/snapshots), entry WITH a slash = path-suffix via find -path "*/X"
(matches any prefix: network-prefixed nitro, nested datadirs). The manifest now records
the CONCRETE per-volume path so restore-volumes.sh recreates the exact symlink.
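The two matching modes could look roughly like this. find_static_entry is a
hypothetical name; the sketch assumes a find that supports -mindepth/-maxdepth
(GNU and BSD both do).

```shell
#!/usr/bin/env bash
# find_static_entry ROOT ENTRY
# ENTRY without a slash: match at the volume root only.
# ENTRY with a slash: match as a path suffix at any depth.
find_static_entry() {
  local root=$1 entry=$2
  case $entry in
    */*)
      find "$root" \( -type d -o -type l \) -path "*/$entry"
      ;;
    *)
      find "$root" -mindepth 1 -maxdepth 1 \( -type d -o -type l \) -name "$entry"
      ;;
  esac
}
```

With this split, `snapshots` matches only ROOT/snapshots and leaves a nested
pg_logical/snapshots alone, while `l2chaindata/ancient` matches under any prefix.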
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
restore-volumes.sh: pre-create static-file symlinks from the backup's .txt manifest so
the immutable ancient/freezer dirs land on /slowdisk (SSD) and extract THROUGH the
symlinks via tar --keep-directory-symlink (was --dereference, which clobbered them);
hot state stays on the primary disk. Cleans stale /slowdisk targets first (no leak on
re-restore). Safe fallbacks: no /slowdisk / no manifest / no static paths -> normal
extract. Reth excluded (reth dropped whole-dir static-file symlinks).
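A simplified sketch of extracting through a pre-created symlink, assuming GNU tar
and a fresh volume where the static path does not exist yet. restore_through_symlink
and its arguments are hypothetical; the real script reads the paths from the manifest.

```shell
#!/usr/bin/env bash
# restore_through_symlink ARCHIVE VOLUME_DIR STATIC_REL SLOW_TARGET
restore_through_symlink() {
  local archive=$1 volume_dir=$2 static_rel=$3 slow_target=$4
  # Create the offload target and the parent of the static path.
  mkdir -p "$slow_target" "$(dirname "$volume_dir/$static_rel")"
  # Point the static path at the offload target before extracting.
  ln -sfn "$slow_target" "$volume_dir/$static_rel"
  # Keep the symlink instead of replacing it with a real directory,
  # so the archived files land on the offload target.
  tar -xf "$archive" -C "$volume_dir" --keep-directory-symlink
}
```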
volume-utils.sh: add delete_slowdisk_targets_for_key() — follows a volume's symlinks and
sweeps the rpc_<key>__data_ pattern under /slowdisk (matches delete-volumes.sh).
cleanup-volumes.sh: free the /slowdisk static data before docker volume rm (was leaking),
and fix the fragile substring used/unused match to an exact name match.
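The difference between a substring and an exact name match can be sketched as
below. volume_in_use is a hypothetical helper, and the newline-separated list
format is an assumption.

```shell
#!/usr/bin/env bash
# volume_in_use NAME USED_LIST
# USED_LIST is a newline-separated list of volume names.
volume_in_use() {
  local name=$1 used_list=$2
  # -F: fixed string, -x: whole line must match, -q: status only.
  printf '%s\n' "$used_list" | grep -Fxq -- "$name"
}
```

A plain `grep "$name"` would report rpc_bob as used whenever rpc_bob-sepolia is in
the list; the -x flag avoids that false positive.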
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- op-geth: drop stray trailing ` \` on --hvm.genesisheader and
--tbc.leveldbhome (the backslash was decoded as part of the hVM
genesis header -> "invalid byte U+0020 ' '" crash)
- op-node: add OP_NODE_OVERRIDE_ECOTONE/CANYON/DELTA=1725868497
(rollup.json carries only regolith_time; without these the later
fjord override tripped "fork fjord set but prior fork ecotone missing")
- op-node: OP_NODE_BSS_WS no longer wrapped in literal quotes
- bump to Hemi's current pins (mainnet is past isthmus/holocene):
op-geth 05d4d8f, op-node 7c70d2d
Values mirror hemilabs/hemi-node mainnet scripts/gen.sh + config.json.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Consensus service renamed arc-testnet-consensus -> arc-testnet-node: it is now
generated by templates/nodes/malachite.yml (node: malachite) instead of being
hacked into the arc client template's indexer block. Container body unchanged.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
op-node's default engine-API RPC timeout is short; on nodes whose EL responds
slowly to engine_newPayload/forkChoiceUpdated (e.g. boba-mainnet op-reth pruned
during catch-up) op-node times out driving the EL and the execution head freezes
while op-node keeps deriving from L1. Raising the engine RPC timeout to 120s
prevents this class of stall fleet-wide.
Regenerated from env/op/node.env: adds the env var to every op-node service
(116 geth + 40 reth + 22 erigon composes); additions only, no other changes.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
genesis.json had no jovianTime, so op-geth/op-reth never activated Jovian
and op-node rejected Jovian-era L1-info txs ("unexpected length"). Added
jovianTime=1772548201 to the genesis config (EL source of truth for both
geth and reth). Removed OP_NODE_OVERRIDE_JOVIAN from the op-node env: the
local rollup.json already carries jovian_time, making the override redundant.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
op-reth rejects geth's --nodiscover flag, which the op-geth base template
emits for node_sync nodes. Use reth's --disable-discovery instead.
- hashkeychain-testnet: now node_sync/consensus-layer (was execution-layer
with an empty DB, which could not backfill) + --disable-discovery
- xlayer mainnet/testnet (archive+pruned): --nodiscover -> --disable-discovery
(these were latently broken on restart)
Fix lives in the op-reth.yml base template so all reth nodes are covered;
the redundant override in op-reth.boba.yml was removed.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Chain 53302 is legacy Canyon-at-genesis (block 0 has no Shanghai
withdrawalsRoot). Upstream op-reth always injects it and computes the wrong
genesis hash 0xfa6f21ae vs canonical 0x7274a90e, so it crash-loops against
the DB. conduit-op-reth has legacy-Canyon genesis compatibility for 53302.
Regenerated from context.yml (reth client_image override) + op-reth.yml
template (historicalrpc for superseed-sepolia fresh-sync backfill).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Conduit's downloaded genesis.json/rollup.json are faulty (rollup.json
lacks chain_op_config + post-Fjord fork times, crashing op-node v1.19.0).
The fixed configs live in op/bob/sepolia/. Removed the client_genesis /
node_rollup_json Conduit URLs from context.yml and regenerated, so all 7
bob-sepolia composes (geth + reth) now bind-mount ./op/bob/sepolia:/config
and drop the curl-from-Conduit init containers.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>