Commit Graph

32 Commits

Author SHA1 Message Date
rob
aefcd41a88 status sweep: cap check-health per node (timeout) so one stuck node can't wedge fleet rpc-update
A hung check-health.sh (aztec-testnet, looping on an unresponsive reference RPC)
blocked show-status.sh's parallel 'wait' for 3.5h, hanging the whole fleet
rpc-update and holding the deploy lock. Each curl was bounded (-m 3) and the
retry loop capped (3x), but the call itself wasn't time-bounded.
- sync-status.sh: wrap each check-health.sh call in 'timeout ${HC_TIMEOUT:-30}'
  (-> exit 124 + 'timeout' status on overrun).
- show-status.sh: wrap the whole per-node sync-status.sh call in
  'timeout ${SYNC_TIMEOUT:-60}' so the parallel wait can never block forever.
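
The two-level bound described above can be sketched roughly as follows. This is a minimal illustration, not the repository's code: `run_bounded` and its status words are hypothetical names, and it assumes GNU coreutils `timeout`, which exits with 124 when the limit is reached.

```shell
#!/usr/bin/env bash
# Sketch only: run_bounded and the status words are illustrative assumptions.

# Run a command under a wall-clock limit; print a status word and
# return the command's exit code (124 means the limit was hit).
run_bounded() {
  local limit="$1"; shift
  local rc=0
  timeout "$limit" "$@" || rc=$?
  case "$rc" in
    0)   echo "ok" ;;
    124) echo "timeout" ;;   # GNU timeout's exit code on overrun
    *)   echo "error" ;;
  esac
  return "$rc"
}

# Defaults taken from the commit message: inner bound per health check,
# outer bound per node.
HC_TIMEOUT="${HC_TIMEOUT:-30}"
SYNC_TIMEOUT="${SYNC_TIMEOUT:-60}"
```

With a helper like this, a per-node loop such as `run_bounded "$SYNC_TIMEOUT" ./sync-status.sh "$node" &` followed by `wait` returns within roughly `SYNC_TIMEOUT` seconds even if one node hangs.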

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-19 03:57:18 +00:00
1134a3774a sync-status: dRPC-homogeneous block-lag status + fix never-used reference fallbacks
Match the dRPC gateway's per-chain "how many blocks behind is ok" model instead of a
fixed 2s/5s timestamp tolerance:
- check-health.sh: compare the reference head vs local head by BLOCK NUMBER and classify
  with the chain's dRPC lag thresholds (LAGGING_LAG/SYNCING_LAG, in blocks, from
  chains.yaml). dRPC uses the two thresholds inconsistently across chains (sometimes
  lagging<syncing, sometimes the reverse) so the smaller is the online boundary and the
  larger the syncing/drop boundary. Defaults 2/6 when a chain has no thresholds.
- multicurl.sh: also skip responses with result:null (a lagging endpoint lacking the
  requested block) so the fallback reference URLs are actually tried. Previously the first
  endpoint's {"result":null} was accepted as success -> fallbacks never ran, and the null
  reference hash made check-health report false "forked" (the online/forked flapping).
- sync-status.sh: resolve the lag thresholds (by drpc slug or chain id) and export
  LAGGING_LAG/SYNCING_LAG.
- reference-rpc-endpoint.sh: add --lags and --block-time-ms lookups.
- reference-rpc-endpoint.json: regenerated with per-chain block_time_ms + lagging_lag +
  syncing_lag (additive).
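
A rough sketch of the two ideas above: classifying by block lag with order-independent thresholds, and rejecting a `result:null` reply so the next reference URL is tried. The function names, the inclusive boundaries, and the `lagging` label for the worst case are assumptions made for illustration; the commit does not specify them.

```shell
#!/usr/bin/env bash
# Sketch only: names, inclusive boundaries and the "lagging" label are assumptions.

# classify_lag REF_HEAD LOCAL_HEAD [LAGGING_LAG] [SYNCING_LAG]
# The smaller threshold is the online boundary, the larger the syncing boundary,
# regardless of which variable holds which. Defaults are 2 and 6 blocks.
classify_lag() {
  local ref="$1" local_head="$2"
  local a="${3:-2}" b="${4:-6}"
  local lo hi lag
  if [ "$a" -le "$b" ]; then lo="$a"; hi="$b"; else lo="$b"; hi="$a"; fi
  lag=$(( ref - local_head ))
  if [ "$lag" -lt 0 ]; then lag=0; fi   # local node ahead of the reference
  if [ "$lag" -le "$lo" ]; then
    echo "online"
  elif [ "$lag" -le "$hi" ]; then
    echo "syncing"
  else
    echo "lagging"
  fi
}

# is_usable_response BODY
# Succeeds only when a JSON-RPC body has a result that is not null,
# so a caller can move on to the next fallback URL otherwise.
is_usable_response() {
  local body="$1"
  printf '%s' "$body" | grep -q '"result"' || return 1
  if printf '%s' "$body" | grep -Eq '"result"[[:space:]]*:[[:space:]]*null'; then
    return 1
  fi
  return 0
}
```

For example, `classify_lag 100 96 6 2` treats 2 as the online boundary and 6 as the syncing boundary even though the arguments arrive in reverse order.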

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-16 05:55:55 +00:00
78c78f5079 sync-status: add cosmos/CometBFT handler (gaiad + cosmos batch)
Cosmos-hub has NO EVM RPC — eth_blockNumber checks don't apply. Add a --cosmos
handler that probes the CometBFT /status method: sync_info.catching_up=false =>
online, else syncing; optional head-gap check vs the drpc reference. sync-status.sh
dispatches protocol=cosmos -> check-health.sh --cosmos. eth/starknet/aztec untouched.
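
The `catching_up` rule can be illustrated with a small parser over a CometBFT `/status` response. This is a sketch under assumptions: the function name is hypothetical, it uses `grep` rather than a JSON parser, and the `unknown` branch for a body without the field is an addition not described in the commit.

```shell
#!/usr/bin/env bash
# Sketch only: cosmos_status_from_json is an illustrative name, and the
# "unknown" branch is an assumption not described in the commit.

# Reads a CometBFT /status response on stdin and prints a status word.
cosmos_status_from_json() {
  local body
  body="$(cat)"
  if printf '%s' "$body" | grep -Eq '"catching_up"[[:space:]]*:[[:space:]]*false'; then
    echo "online"     # node reports it has caught up
  elif printf '%s' "$body" | grep -Eq '"catching_up"[[:space:]]*:[[:space:]]*true'; then
    echo "syncing"    # node is still replaying blocks
  else
    echo "unknown"    # field missing, e.g. an error body
  fi
}
```

A call might look like `curl -s -m 3 "$RPC_URL/status" | cosmos_status_from_json`, where `RPC_URL` is a placeholder for the node's CometBFT RPC address.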

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-15 12:15:54 +00:00
405f36d02f monitoring scripts: protocol dispatch via registry slug + mantle/metis chainid fixes
- sync-status.sh: resolve protocol family from the compose x-upstreams chain
  label via the registry (reference-rpc-endpoint.sh --protocol) instead of
  path-substring guessing; legacy path detection kept as fallback for
  composes without a resolved label. Unknown families report
  'unsupported protocol: X' honestly instead of a bogus eth_chainId error.
  Aztec reference lookup falls back to slug urls when rollup_version is
  not in the registry.
- reference-rpc-endpoint.sh: new --chain <slug> (urls by registry key,
  works for idless non-EVM entries) and --protocol <slug> modes; existing
  chainid and --rollup-version lookups unchanged.
- mantle-sepolia: chainid 5001 -> 5003 (verified live: 0x138b), label and
  --networkid now correct
- metis-sepolia: label resolves via registry override (drpc chains.yaml
  carries wrong id 59901; live chain is 59902, verified via official RPC)
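
The "verified live" check for mantle-sepolia amounts to comparing a hex `eth_chainId` result with the expected decimal ID. A minimal sketch, with `chainid_matches` as a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Sketch only: chainid_matches is an illustrative helper, not a repository script.

# chainid_matches HEX_FROM_RPC EXPECTED_DECIMAL
# Succeeds when the hex chain ID returned by eth_chainId equals the expected ID.
chainid_matches() {
  local got_hex="$1" want_dec="$2"
  local got_dec
  got_dec=$(printf '%d' "$got_hex") || return 2   # not a valid number
  [ "$got_dec" -eq "$want_dec" ]
}
```

Here `chainid_matches 0x138b 5003` succeeds, while `chainid_matches 0x138b 5001` fails, which is the mismatch the fix corrects.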

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-12 06:34:31 +00:00
goldsquid
376b1a750f fix 2026-01-31 11:26:01 +07:00
goldsquid
eafbb2e2c3 fix 2026-01-31 11:24:14 +07:00
goldsquid
a6e7348b40 some fix 2026-01-31 11:09:28 +07:00
goldsquid
3c20aac136 aztec maybe 2026-01-31 11:00:36 +07:00
rob
52d7ec6d40 Fix Starknet chain ID matching - handle hex-encoded ASCII
Juno returns chain ID as hex-encoded ASCII (0x534e5f5345504f4c4941)
rather than plain string (SN_SEPOLIA). Match both formats.
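
Matching both formats could look like the sketch below. The function names are hypothetical and the code assumes bash (substring expansion and `printf` hex escapes); it decodes the hex form to ASCII and compares it with the expected name.

```shell
#!/usr/bin/env bash
# Sketch only: both function names are illustrative assumptions. Requires bash.

# hex_to_ascii 0x534e5f... -> prints the decoded ASCII string
hex_to_ascii() {
  local hex="${1#0x}" out="" i
  for (( i = 0; i < ${#hex}; i += 2 )); do
    out+="$(printf "\\x${hex:i:2}")"   # decode one byte at a time
  done
  printf '%s\n' "$out"
}

# starknet_chain_matches GOT WANT
# Succeeds when GOT equals WANT directly or after hex decoding.
starknet_chain_matches() {
  local got="$1" want="$2"
  if [ "$got" = "$want" ]; then return 0; fi
  case "$got" in
    0x*) [ "$(hex_to_ascii "$got")" = "$want" ] ;;
    *)   return 1 ;;
  esac
}
```

With this, both `starknet_chain_matches SN_SEPOLIA SN_SEPOLIA` and `starknet_chain_matches 0x534e5f5345504f4c4941 SN_SEPOLIA` succeed.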

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-15 09:17:22 +00:00
rob
63b720f1e9 Add Starknet support to sync-status and check-health scripts
- sync-status.sh now detects Starknet paths and uses starknet_chainId
- Maps SN_MAIN/SN_SEPOLIA chain IDs to reference endpoints
- check-health.sh accepts --starknet flag for Starknet mode
- Uses starknet_getBlockWithTxHashes instead of eth_getBlockByNumber
- Handles decimal timestamps and block_hash field differences

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-15 09:10:52 +00:00
goldsquid
0b3cf5eb38 fix 2025-11-15 11:24:52 +07:00
Para Dox
e8f8f8b0b2 fix 2025-04-12 11:45:19 +07:00
squidbear
0a1c490e87 fix for tron 2025-03-31 12:58:11 +02:00
squidbear
0e85af9526 use multiple reference rpc 2025-03-23 11:27:14 +01:00
Sebastian
003bcc7e7b speedup with less errors 2025-03-21 06:32:25 +01:00
Sebastian
5159490e75 fix 2025-03-18 06:49:51 +01:00
Sebastian
1ac9afbd19 fix nossl 2025-03-18 06:48:49 +01:00
root
2eaa8f99e4 homecoming 2025-03-18 07:18:19 +02:00
Sebastian
0cebb54c00 refactor 2025-02-12 12:32:46 +01:00
Sebastian
c299275992 get going with tron 2024-12-15 08:27:07 +01:00
Sebastian
f29ef5a236 remove catchup from status 2024-10-20 14:55:58 +02:00
Sebastian
5ad15b9823 add estimate 2024-09-11 13:24:34 +02:00
Sebastian
70fd7e0346 add estimate 2024-09-11 13:23:28 +02:00
Sebastian
68d1ab5495 fix 2024-03-21 07:42:46 +01:00
Sebastian
f69081ec4e fix 2024-03-19 07:05:47 +01:00
Sebastian
bbb3cd28b3 make blacklist configurable 2024-03-19 05:59:24 +01:00
Sebastian
6d18778286 fix 2024-03-18 16:50:15 +01:00
Sebastian
eb12637cc3 fix 2024-03-18 16:36:17 +01:00
Sebastian
d873bb263a better sync status 2024-03-18 16:21:27 +01:00
Sebastian
5a347724fe almost 2024-03-18 08:42:41 +01:00
Sebastian
ce7c32da47 fix 2024-03-18 04:27:41 +01:00
Sebastian
08109790e6 init 2024-03-18 04:20:42 +01:00