ethereum-rpc-docker/VIBE.md
rob dd8ce689e4 Add VIBE.md debugging guide and update README.md
- Add VIBE.md as primary debugging reference for automated tools
- Rewrite README.md as human-focused operator guide
- Fix README.md inaccuracies (remove show-networks.sh references, fix typo)
- Split content: README for humans, VIBE for agents

Generated by Mistral Vibe.
Co-Authored-By: Mistral Vibe <vibe@mistral.ai>
2026-06-22 08:37:38 +00:00


VIBE.md — ethereum-rpc-docker Operations & Debugging Guide

You are an LLM agent or operator running or debugging blockchain RPC nodes from this repository. This file is your primary reference for all operational tasks.

This repo contains Docker Compose configurations for blockchain RPC nodes plus operational scripts for managing them. Everything you need to run, monitor, debug, and fix nodes is here.


0. WHEN A NODE IS FAULTY — Start Here

Immediate Triage (30 seconds)

# 1. Is the container running?
./show-running.sh

# 2. Check overall status of all configured nodes
./show-status.sh

# 3. If you know the config name, check its specific status
./sync-status.sh <config-name>

# 4. Check logs for the faulty node
./logs.sh <config-name>

If the container isn't running, go to §3. Container Lifecycle Issues

If the container is running but not synced, go to §4. Sync Issues

If the container is running and synced but RPC fails, go to §5. RPC/Connectivity Issues

If you see errors in logs but aren't sure what they mean, go to §6. Log Interpretation
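The routing above can be read as a small decision function. The sketch below is a hypothetical helper, not a script shipped in this repo: `triage_section` and its yes/no arguments are names invented for illustration, and the last branch assumes that a node which is running, synced, and answering RPC is being triaged because of log errors.

```shell
# Hypothetical helper: map three yes/no observations to the section to read next.
# Arguments: container running? node synced? RPC answering? (each "yes" or "no")
triage_section() {
  local running="$1" synced="$2" rpc_ok="$3"
  if [ "$running" != "yes" ]; then
    echo "3. Container Lifecycle Issues"
  elif [ "$synced" != "yes" ]; then
    echo "4. Sync Issues"
  elif [ "$rpc_ok" != "yes" ]; then
    echo "5. RPC/Connectivity Issues"
  else
    echo "6. Log Interpretation"
  fi
}
```

For example, `triage_section yes no no` prints the sync section, because the checks are ordered from the most basic failure to the most specific.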


1. Repository Overview

What This Repo Contains

rpc/
├── *.yml                    # Docker Compose files for node configurations
├── *.sh                     # Operational scripts (YOUR PRIMARY TOOLS)
├── scripts/                 # Additional helper scripts (CometBFT support)
├── <network>/               # Network directories (e.g., ethereum/, op/, arb/)
│   ├── *.yml                # Compose files for specific chains
│   └── <chain>/             # Chain-specific assets
│       ├── genesis.json     # Custom genesis files
│       ├── rollup.json      # Rollup configurations (OP Stack)
│       └── *.Dockerfile     # Custom build files
├── README.md                # User documentation
└── VIBE.md                  # THIS FILE — operations guide

Key Concepts

  • Config name: The compose filename WITHOUT .yml (e.g., ethereum-mainnet-geth-pruned)
  • Service name: Derived from config name, used in docker compose commands
  • Short name: Used in URL paths, container labels. Format: {network}-{chain}[-{client}][-{db_type}]
  • Volume names: Docker volumes follow the full config name pattern
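As a minimal sketch of the first concept, the config name can be derived from a compose filename by dropping the `.yml` suffix. The function name `config_name` is hypothetical; the repo's own scripts may derive it differently, and how files inside network directories are named is not covered here.

```shell
# Hypothetical helper: derive the config name from a compose filename.
# Only the trailing ".yml" is removed; everything else is kept as-is.
config_name() {
  printf '%s\n' "${1%.yml}"
}
```

With this, `config_name ethereum-mainnet-geth-pruned.yml` yields the name that scripts such as `./logs.sh` expect as `<config-name>`.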

Supported Networks

Layer 1: Ethereum, Polygon, BSC, Avalanche, Gnosis, Fantom, Core, Berachain, Ronin, Viction, Fuse, Tron, ThunderCore, Goat, AlephZero, Haqq, Taiko, Rootstock, Dogecoin, Litecoin, Bitcoin, Bitcoin-Cash, Ripple, Solana

Layer 2 (OP Stack): Optimism, Base, Zora, Mode, Blast, Fraxtal, Bob, Boba, Worldchain, Metal, Ink, Lisk, SNAX, Celo

Layer 2 (Arbitrum): Arbitrum One, Arbitrum Nova, Everclear, Playblock, Real, Connext, OpenCampusCodex

Other L2s: Linea, Scroll, zkSync Era, Metis, Moonbeam, Starknet, zkEVM, Immutable zkEVM, Polygon zkEVM


2. Essential Scripts Reference

Status & Monitoring Scripts

| Script | Usage | What It Does |
| --- | --- | --- |
| show-status.sh | [config-name] | Lists ALL configured nodes with sync status, block height, health |
| show-running.sh | | Lists currently running containers |
| sync-status.sh | <config-name> | Detailed sync status for one config |
| latest.sh | <config-name> | Latest block number + hash |
| logs.sh | <config-name> | Tail logs from all containers in a config |
| show-db-size.sh | | Disk usage of ALL Docker volumes, sorted by size |
| show-ram.sh | <config-name> | Memory usage of containers |
| show-cpu.sh | | CPU usage display |
| peer-count.sh | | P2P peer count for all running nodes |
| time-since-last-block.sh | <config-name> | How long since last block was processed |
| ping.sh | <container-name> | Test network connectivity from container |
| show-errors.sh | | Show error counts/logs across containers |
| show-size.sh | | Show size of containers/volumes |
| show-file-size.sh | | Show static file sizes |
| show-static-file-size.sh | | Show static file sizes (alternative) |

Lifecycle Management Scripts

| Script | Usage | What It Does |
| --- | --- | --- |
| start.sh | <config-name> | Start all containers for a config |
| stop.sh | <config-name> | Stop all containers for a config |
| force-recreate.sh | <config-name> | Force recreate containers (keeps volumes) |
| rm.sh | <config-name> | Remove containers (keeps volumes) |
| delete-volumes.sh | <config-name> | DESTRUCTIVE - Remove containers AND volumes |
| delete-node-keys.sh | <config-name> | Remove node keys (for re-initialization) |

Backup & Restore Scripts

| Script | Usage | What It Does |
| --- | --- | --- |
| backup-node.sh | <config-name> [url] | Backup volumes locally or to WebDAV |
| restore-volumes.sh | <config-name> [url] | Restore volumes from local or HTTP |
| clone-node.sh | <config-name> | Clone a node's state |
| clone-backup.sh | | Clone backup files |
| clone-peers.sh | | Clone peer information |
| restore-peers.sh | | Restore peer connections |
| list-backups.sh | | List available backup files |
| list-peer-backups.sh | | List peer backup files |
| list-restorable.sh | | List restorable configurations |
| cleanup-backups.sh | | Remove old backups |
| cleanup-volumes.sh | | Clean up unused volumes |

Network & Connectivity Scripts

| Script | Usage | What It Does |
| --- | --- | --- |
| upstreams.sh | | Generate dshackle upstream configuration |
| connect-peers.sh | | Connect to peer nodes |
| search-node.sh | <query> | Search compose files for patterns |
| search-compose.sh | <query> | Search compose files |
| network-to-config.sh | | Map network names to config files |
| reload_dshackle.sh | | Reload dshackle configuration |
| update-whitelist.sh | | Update IP whitelist |
| update-ip.sh | | Update IP configuration |

Specialized Scripts

| Script | Usage | What It Does |
| --- | --- | --- |
| op-wheel.sh | | OP rollup maintenance (rewind, set forkchoice) |
| op-wheel-finalize-latest-block.sh | <client_svc> [node_svc] | Finalize latest block (nuclear option) |
| catchup.sh | <config-name> | Help node catch up to chain head |
| success-if-almost-synced.sh | <config-name> <seconds> | Exit 0 if node is almost synced |
| groq.sh | | Query using Groq |
| trai.sh | | Trace transaction |
| multicurl.sh | | Parallel curl requests |
| blocknumber.sh | | Get block number |
| get-block.sh | | Get block information |
| get-local-url.sh | | Get local RPC URL |
| get-shortname.sh | <config-file> | Get short name for a config |
| disk-space.sh | | Check disk space |
| limit-bandwidth.sh | | Limit bandwidth |
| maintenance.sh | | Maintenance helper |
| random-port.sh | | Generate random port |
| reference-rpc-endpoint.sh | | Reference RPC endpoint helper |
| reset-terminal.sh | | Reset terminal |
| setup-bandwidth-limit-cron.sh | | Setup cron for bandwidth limiting |

3. Container Lifecycle Issues

Symptom: Container Won't Start

# Check why it failed
./logs.sh <config-name> 2>&1 | tail -50

# Check container exit code
docker ps -a --filter "name=<config-name>" --format "{{.Names}} | {{.State}} | {{.Status}}"

# Inspect the container
docker inspect <container-name> | jq '.[0].State'

Common causes:

  • Port conflict: Two services trying to bind to same host port
  • Volume permission issues: Docker can't write to volume
  • Missing environment variables: .env file incomplete
  • Invalid compose syntax: YAML parsing error
  • Image pull failure: Network issue or private registry auth

Fixes:

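# Check for port conflicts (host ports published by more than one service)
grep -hoE '^[[:space:]]*-[[:space:]]*"?[0-9]{1,5}:[0-9]{1,5}' *.yml | grep -oE '[0-9]{1,5}:' | sort | uniq -d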

# Validate compose syntax
docker compose -f <config-file>.yml config

# Pull images manually
docker compose -f <config-file>.yml pull

# Start with --build if using custom Dockerfiles
docker compose -f <config-file>.yml up -d --build

Symptom: Container Exits Immediately After Starting

# View the last 100 lines of logs before exit
./logs.sh <config-name> 2>&1 | tail -100

# Check exit code
docker ps -a --filter "name=<service>" --format "{{.Status}}"

# Run interactively to see error
docker compose -f <config-file>.yml run --rm <service-name> sh

Common causes:

  • Missing config files: /config/ mount empty or wrong path
  • Invalid flags: Command-line arguments malformed
  • Database corruption: Existing data incompatible with new version
  • Checkpoint/genesis mismatch: Chain ID or genesis doesn't match

Fixes:

# Verify config directory exists (if using custom configs)
ls -la <network>/<chain>/

# Try with fresh volumes (DESTRUCTIVE)
./delete-volumes.sh <config-name>
./start.sh <config-name>

Symptom: Container Restarts Repeatedly (Crash Loop)

# Watch logs in real-time
./logs.sh <config-name> -f

# Check restart count
docker inspect <container-name> | jq '.[0].RestartCount'

# Check last restart reason
docker inspect <container-name> | jq '.[0].State.ExitCode, .[0].State.Error'

Common causes:

  • OOM killed: Memory limit exceeded
  • Out of disk space: No space left on device
  • Segmentation fault: Client bug or bad data
  • Panic: Go client panic
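The exit code from `docker inspect` often points at one of the causes above. The following is a rough sketch, assuming the conventional meanings of these codes (128 plus the signal number for signal deaths, and status 2 for an unrecovered Go panic); `explain_exit_code` is a hypothetical name, and a client may use the same codes for other reasons, so treat the result as a hint rather than a diagnosis.

```shell
# Hypothetical helper: translate a container exit code into a likely cause.
explain_exit_code() {
  case "$1" in
    0)   echo "clean exit" ;;
    2)   echo "possible Go panic" ;;        # Go runtime exits with 2 on an unrecovered panic
    137) echo "SIGKILL, often OOM killed" ;; # 128 + 9
    139) echo "segmentation fault" ;;        # 128 + 11 (SIGSEGV)
    *)   echo "unclassified, read the logs" ;;
  esac
}
```

A possible use is `explain_exit_code "$(docker inspect -f '{{.State.ExitCode}}' <container-name>)"`. For code 137, `docker inspect -f '{{.State.OOMKilled}}' <container-name>` reports whether Docker recorded an out-of-memory kill.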

Fixes:

# Check memory usage
./show-ram.sh <config-name>

# Check disk space
df -h /var/lib/docker
./show-db-size.sh

# Increase resources in compose file or .env
# Then force recreate
./force-recreate.sh <config-name>

4. Sync Issues

Symptom: Node Not Syncing (Stuck at Block 0 or Low Block)

# Check sync status
./sync-status.sh <config-name>

# Check current block
./latest.sh <config-name>

# Check logs for sync errors
./logs.sh <config-name> | grep -i -E "sync|error|fail|warn|stuck|behind"

# Check peer count
./peer-count.sh | grep <config-name>

Common causes:

  • No peers: P2P network connection failed
  • Wrong network: Connected to wrong chain
  • Checkpoint too old: Checkpoint URL unavailable or outdated
  • Snapshot download failed: Snapshot server unreachable

Fixes:

# Check if checkpoint/snapshot is configured
grep -E "(checkpoint|snapshot)" <config-file>.yml

# Test checkpoint URL manually
curl -I $(grep checkpoint <config-file>.yml | grep -oE 'http[^ ]+')

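# Check peer connections (geth example; requires the admin API to be enabled over HTTP)
curl -s -X POST http://localhost:<port> -H "Content-Type: application/json" -d '{"jsonrpc":"2.0","method":"admin_peers","params":[],"id":1}' | jq '.result | length'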

Symptom: Sync is Very Slow

# Check sync speed over time
./latest.sh <config-name>; sleep 60; ./latest.sh <config-name>

# Check if node is processing blocks
./time-since-last-block.sh <config-name>

# Check CPU and memory
top -d 1 -p $(docker inspect <container> | jq -r '.[0].State.Pid')
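Two height samples taken a known interval apart are enough to estimate sync speed and a rough time to catch up. This is a sketch only: `blocks_per_minute` and `eta_minutes` are hypothetical helpers, they assume you have already extracted plain decimal block heights (the output format of `latest.sh` is not assumed here), and the estimate ignores the chain head advancing while the node syncs.

```shell
# blocks_per_minute <first-height> <second-height> <seconds-between-samples>
blocks_per_minute() {
  local first="$1" second="$2" seconds="$3"
  echo $(( (second - first) * 60 / seconds ))
}

# eta_minutes <current-height> <chain-head> <blocks-per-minute>
# Prints "never" when the node is making no forward progress.
eta_minutes() {
  local current="$1" head="$2" rate="$3"
  if [ "$rate" -le 0 ]; then
    echo "never"
  else
    echo $(( (head - current + rate - 1) / rate ))   # integer division, rounded up
  fi
}
```

For instance, moving from block 1000 to block 1600 in 60 seconds is 600 blocks per minute, and at that rate a node at 1600 needs about 10 minutes to reach a head at 7600.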

Common causes:

  • Resource constrained: CPU throttled, memory swapped
  • Disk I/O bottleneck: Slow storage or contention
  • Network rate limited: P2P or RPC rate limiting
  • Too many peers: P2P overhead
  • Wrong sync mode: Full sync instead of snap sync

Symptom: Sync Stuck at Specific Block

# Check logs around the stuck block
./logs.sh <config-name> | grep -A 10 -B 10 "block <stuck-block-number>"

# Check if it's a known bad block
# Search online: <chain> bad block <number>

Common causes:

  • Bad block in chain: Requires client patch or manual intervention
  • State trie inconsistency: Database corruption
  • Fork choice issue: Node on wrong fork

Fixes for OP Stack:

# Try to finalize past the block
./op-wheel-finalize-latest-block.sh <client-service>

Symptom: Node on Wrong Fork / Chain

# Check chain ID
./latest.sh <config-name> | grep -i chain

# Check what chain the node thinks it's on (returns the chain ID as hex)
curl -s -X POST http://localhost:<port> -H "Content-Type: application/json" -d '{"jsonrpc":"2.0","method":"eth_chainId","params":[],"id":1}'

# Compare with expected chain ID
grep chainId <config-file>.yml

5. RPC/Connectivity Issues

Symptom: RPC Endpoint Not Responding

# Test from host
curl -s http://localhost:<port> | head -c 100

# Check if traefik/proxy is running
docker ps | grep -E "(traefik|proxy|nginx)"

# Check traefik logs
docker logs <traefik-container> | tail -50

Common causes:

  • Container not running: Client crashed
  • Port not exposed: Wrong port mapping
  • Traefik misconfiguration: Labels wrong or missing
  • Firewall blocking: Host firewall or cloud security group

Symptom: RPC Returns Wrong Chain ID

# Query chain ID from RPC
curl -s -X POST http://localhost:<port> \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","method":"eth_chainId","params":[],"id":1}'
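`eth_chainId` returns the chain ID as a hex string, which is awkward to compare against decimal IDs in configs. The sketch below converts a response body to decimal; `chain_id_from_response` is a hypothetical helper that uses `sed` rather than `jq` so it has no extra dependencies, and it assumes the whole response fits on one line.

```shell
# Hypothetical helper: extract "result" from an eth_chainId response and print it in decimal.
# Returns non-zero when the response has no hex result (for example an error object).
chain_id_from_response() {
  local hex
  hex=$(printf '%s' "$1" | sed -n 's/.*"result"[[:space:]]*:[[:space:]]*"\(0x[0-9a-fA-F]\{1,\}\)".*/\1/p')
  [ -n "$hex" ] || return 1
  printf '%d\n' "$hex"
}
```

A response of `{"jsonrpc":"2.0","id":1,"result":"0x1"}` gives 1 (Ethereum mainnet) and a result of `0x2105` gives 8453 (Base).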

Symptom: Cannot Connect to P2P Network

# Check peer count
./peer-count.sh | grep <config-name>

# Test P2P connectivity from container
docker exec <client-container> nc -zv <bootstrap-node> <p2p-port>

Fixes:

# Set public IP in .env
IP=$(curl -s ipinfo.io/ip)
echo "IP=$IP" >> .env
./force-recreate.sh <config-name>

6. Log Interpretation

Common Log Patterns

Warnings (Node may still function)

| Pattern | Meaning | Action |
| --- | --- | --- |
| WARN.*sync.*slow | Sync slower than expected | Check resources |
| WARN.*peers.*low | Fewer peers than desired | Check P2P connectivity |
| WARN.*rate.*limit | API rate limiting active | Normal for public endpoints |

Errors (Node is degraded)

| Pattern | Meaning | Action |
| --- | --- | --- |
| Error.*database.*corrupt | Database corruption | Restore from backup or resync |
| Error.*handshake.*fail | P2P handshake failed | Check chain ID |
| Error.*no.*peers | Cannot connect to P2P | Check bootstrap nodes |
| Error.*timeout | RPC/HTTP timeout | Check network, increase timeout |

Fatal (Node will not function)

| Pattern | Meaning | Action |
| --- | --- | --- |
| Fatal.*panic | Client crashed | Check client version |
| Fatal.*OOM | Out of memory | Increase memory limit |
| Fatal.*disk.*full | No disk space | Free space |
| Fatal.*permission.*denied | Filesystem permissions | Fix volume permissions |
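The three pattern groups above can be applied mechanically, checking the most severe group first. The sketch below is illustrative only: `classify_log_line` is a hypothetical helper, the patterns are copied from the tables and matched case-sensitively, and real client logs vary in format, so unmatched lines are reported as `unknown` rather than assumed healthy.

```shell
# Hypothetical helper: classify one log line as fatal, error, warning, or unknown.
classify_log_line() {
  local line="$1"
  if printf '%s\n' "$line" | grep -Eq 'Fatal.*(panic|OOM|disk.*full|permission.*denied)'; then
    echo fatal
  elif printf '%s\n' "$line" | grep -Eq 'Error.*(database.*corrupt|handshake.*fail|no.*peers|timeout)'; then
    echo error
  elif printf '%s\n' "$line" | grep -Eq 'WARN.*(sync.*slow|peers.*low|rate.*limit)'; then
    echo warning
  else
    echo unknown
  fi
}
```

One way to use it is to loop over recent output, for example `./logs.sh <config-name> 2>&1 | tail -100 | while IFS= read -r l; do classify_log_line "$l"; done | sort | uniq -c`.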

7. Resource Issues

High CPU Usage

./show-ram.sh <config-name>
./show-cpu.sh
docker stats <container-name> --no-stream

High Memory Usage

./show-ram.sh <config-name>
docker stats <container-name> --no-stream --format "{{.Container}} | {{.MemUsage}} | {{.MemPerc}}"

High Disk Usage

./show-db-size.sh
docker system df -v

Disk I/O Bottleneck

iotop -o -d 1

8. Backup and Restore

Creating a Backup

# Local backup (to /backup directory)
./backup-node.sh <config-name>

# Remote backup (to WebDAV)
./backup-node.sh <config-name> https://backup-server.tld/dav

Restoring from Backup

# List available backups
./list-backups.sh

# Restore latest backup for config
./restore-volumes.sh <config-name>

# Restore from specific URL
./restore-volumes.sh <config-name> https://backup-server.tld/backup/

Cloning a Node

# Clone a node to a new location
./clone-node.sh <config-name>

# Clone peers (for faster sync)
./clone-peers.sh <config-name>

Nuclear Option: Full Reset

# WARNING: This deletes ALL data for the config
./stop.sh <config-name> && \
./rm.sh <config-name> && \
./delete-volumes.sh <config-name> && \
./delete-node-keys.sh <config-name> && \
./force-recreate.sh <config-name>

# Then check logs
./logs.sh <config-name>

9. Common Error Messages

Database Errors

| Error | Cause | Solution |
| --- | --- | --- |
| database is corrupted | Power loss, bug | Restore from backup or resync |
| database version mismatch | Client version changed | Delete and resync |

P2P Errors

| Error | Cause | Solution |
| --- | --- | --- |
| no configured peers | Missing bootstrap nodes | Add bootstrap nodes |
| handshake failed | Chain ID mismatch | Verify genesis.json |

RPC Errors

| Error | Cause | Solution |
| --- | --- | --- |
| method not found | Wrong client | Use correct client |
| connection refused | Port not open | Check container running, port mapping |

10. OP Stack Specific Debugging

OP Node Issues

# Check op-node logs
./logs.sh <config-name> | grep -i "op-node\|rollup\|sequencer"

# Check rollup configuration (if custom)
cat op/<network>/ethereum/rollup.json | jq .

# Check if rollup.json is mounted
docker exec <op-node-container> cat /config/rollup.json | jq .

OP Wheel (Manual Intervention)

# Rewind to specific block (DANGEROUS - only if you know what you're doing)
./op-wheel.sh engine set-forkchoice \
  --unsafe=<block-hash> \
  --safe=<block-hash> \
  --finalized=<block-hash> \
  --engine=http://<client-service>:8551/ \
  --engine.open=http://<client-service>:8545 \
  --engine.jwt-secret-path=/jwtsecret

# Nuclear option: finalize latest local block
./op-wheel-finalize-latest-block.sh <client-service> <node-service>

11. CometBFT Family (Cosmos, etc.) Specific

Init Container Issues

# CometBFT chains use init.sh inside the container
# The master script is at scripts/cometbft-common.sh

# Check if init completed
./logs.sh <config-name> | grep -i "init\|setup\|complete"

# Check the init script
cat <network>/<chain>/scripts/init.sh

12. Quick Start Guide

Starting a Node

# 1. Set up environment
IP=$(curl -s ipinfo.io/ip)   # capture the IP in the shell so DOMAIN can be derived from it
echo "IP=$IP" > .env
echo "DOMAIN=${IP//./-}.traefik.me" >> .env
echo "MAIL=your-email@example.com" >> .env

# 2. Select which nodes to run
# Add compose files to COMPOSE_FILE (colon-separated)
echo "COMPOSE_FILE=base.yml:rpc.yml:ethereum-mainnet-geth-pruned.yml" >> .env

# 3. Start the node
docker compose up -d

# 4. Verify it's running
./show-status.sh
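Step 2 above builds a colon-separated `COMPOSE_FILE` value. As a small sketch, the value can be assembled from a list of config names; `compose_file_value` is a hypothetical helper, and it assumes `base.yml` and `rpc.yml` always come first, as in the example.

```shell
# Hypothetical helper: build a COMPOSE_FILE value from one or more config names.
# A trailing ".yml" on an argument is tolerated and not duplicated.
compose_file_value() {
  local value="base.yml:rpc.yml" name
  for name in "$@"; do
    value="$value:${name%.yml}.yml"
  done
  printf '%s\n' "$value"
}
```

It could be used as `echo "COMPOSE_FILE=$(compose_file_value ethereum-mainnet-geth-pruned)" >> .env`, which writes the same line as the example.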

Accessing Your Node

# Once running, access via:
# HTTP: http://<your-domain>/ethereum-mainnet-geth-pruned
# HTTPS: https://<your-domain>/ethereum-mainnet-geth-pruned
# WebSocket: wss://<your-domain>/ethereum-mainnet-geth-pruned

# Or locally (if NO_SSL=true):
# HTTP: http://localhost:<port>

13. Configuration Reference

Environment Variables

Required for most setups:

IP=203.0.113.42                    # Your public IP
DOMAIN=203-0-113-42.traefik.me    # Your domain (traefik.me for testing)
MAIL=your-email@example.com        # For Let's Encrypt SSL
WHITELIST=0.0.0.0/0               # IP whitelist (0.0.0.0/0 = all)
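The `DOMAIN` value shown here is the public IP with dots replaced by dashes, followed by `.traefik.me`. A minimal sketch of that derivation follows; `traefik_domain` is a hypothetical name, and the input is assumed to be a dotted IPv4 address.

```shell
# Hypothetical helper: derive the traefik.me test domain from an IPv4 address.
traefik_domain() {
  printf '%s.traefik.me\n' "$(printf '%s' "$1" | tr '.' '-')"
}
```

`traefik_domain 203.0.113.42` prints `203-0-113-42.traefik.me`, matching the example values.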

Optional:

NO_SSL=true                       # Disable SSL (testing only)
CHAINS_SUBNET=192.168.0.0/26      # Docker network subnet

Chain-specific (examples):

ETHEREUM_MAINNET_EXECUTION_RPC=https://fallback-rpc.example.com
ARBITRUM_SEPOLIA_EXECUTION_RPC=https://arb-sepolia-rpc.example.com
OP_NODE_NETWORK=mainnet
OP_NODE_L1_RPC_URL=https://l1-rpc.example.com

Compose File Structure

Each compose file defines one or more services:

  • client: Execution layer (Geth, Erigon, Reth, etc.)
  • node: Consensus/derivation node (op-node, lighthouse, etc.)
  • relay: DA relay (eigenda-proxy, op-alt, etc.)
  • proxy: HTTP/WS proxy (nginx, etc.)
  • database: External database (Postgres, etc.)

Volume Naming

Volumes are named after the config:

<config-name>_<service>_data
<config-name>_<service>_config

Example: ethereum-mainnet-geth-pruned_client_data
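The naming pattern can be expressed as a one-line helper, which is handy when checking for a volume by name. This is a sketch: `volume_name` is a hypothetical function, and the third argument defaults to `data` only for convenience.

```shell
# Hypothetical helper: compose a volume name as <config-name>_<service>_<kind>.
volume_name() {
  local config="$1" service="$2" kind="${3:-data}"
  printf '%s_%s_%s\n' "$config" "$service" "$kind"
}
```

For example, `docker volume inspect "$(volume_name ethereum-mainnet-geth-pruned client)"` would look up the data volume named in the example.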


14. Quick Debugging Checklist

Use this checklist when debugging an issue:

  • Is the container running? → ./show-running.sh
  • Are there errors in logs? → ./logs.sh <config> | grep -i error
  • Is the node synced? → ./sync-status.sh <config>
  • Are peers connected? → ./peer-count.sh
  • Are resources adequate? → ./show-ram.sh, ./show-db-size.sh
  • Is P2P working? → Check peer count
  • Is RPC responding? → Test with curl
  • Is disk space available? → df -h /var/lib/docker
  • Is the config file correct? → docker compose -f <file>.yml config
  • Are environment variables set? → Check .env
  • Is the genesis file correct? → Check chain ID

15. When to Escalate

Escalate to a human operator if:

  • Node stuck for > 2 hours with no progress
  • Repeated Fatal or panic errors after restart
  • Database corruption confirmed
  • Issue affects multiple nodes across different chains
  • Need to force-push to this repo
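The first criterion is the only one with a numeric threshold, so it is easy to check in a script. The sketch below assumes you already have the Unix timestamp of the last processed block (how you obtain it is up to you); `stalled_too_long` is a hypothetical helper and the 7200-second limit is simply the 2 hours stated above.

```shell
# Hypothetical helper: succeed (exit 0) when the last block is more than 2 hours old.
# Usage: stalled_too_long <last-block-unix-timestamp> [now-unix-timestamp]
stalled_too_long() {
  local last_block_ts="$1" now="${2:-$(date +%s)}"
  [ $(( now - last_block_ts )) -gt 7200 ]
}
```

The optional second argument exists only to make the check reproducible in tests; in normal use it defaults to the current time.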

16. File Locations Quick Reference

| What You Need | Where to Find It |
| --- | --- |
| Compose files | Root of this repo (*.yml) |
| Operational scripts | Root of this repo (*.sh) |
| Chain assets | <network>/<chain>/ or <stack>/<network>/<settlement>/ |
| Genesis files | <stack>/<network>/<settlement>/genesis.json |
| Rollup configs | op/<network>/<settlement>/rollup.json |
| Custom Dockerfiles | <path>/*.Dockerfile |
| Init scripts | <path>/scripts/init.sh |
| CometBFT common | scripts/cometbft-common.sh |
| Compose registry | compose_registry.json |
| RPC endpoints | reference-rpc-endpoint.json |
| Environment | .env |

17. Resource Requirements Reference

| Node Type | Disk | RAM | CPU |
| --- | --- | --- | --- |
| Ethereum pruned | ~500GB | 8GB | 2+ cores |
| Ethereum archive | ~2TB+ | 16GB+ | 4+ cores |
| Ethereum archive-trace | ~4TB+ | 32GB+ | 8+ cores |
| L2 pruned | ~100-500GB | 4-8GB | 2+ cores |
| L2 archive | ~1-2TB | 8-16GB | 4+ cores |

Note: Requirements vary by chain. Check specific chain documentation.


This file is your complete operations and debugging reference. For additional user documentation, see README.md.