
rob dd8ce689e4 Add VIBE.md debugging guide and update README.md

- Add VIBE.md as primary debugging reference for automated tools
- Rewrite README.md as human-focused operator guide
- Fix README.md inaccuracies (remove show-networks.sh references, fix typo)
- Split content: README for humans, VIBE for agents

Generated by Mistral Vibe.
Co-Authored-By: Mistral Vibe <vibe@mistral.ai>

2026-06-22 08:37:38 +00:00

21 KiB


VIBE.md — ethereum-rpc-docker Operations & Debugging Guide

You are an LLM agent or operator running or debugging blockchain RPC nodes from this repository. This file is your primary reference for all operational tasks.

This repo contains Docker Compose configurations for blockchain RPC nodes plus operational scripts for managing them. Everything you need to run, monitor, debug, and fix nodes is here.

0. WHEN A NODE IS FAULTY — Start Here

Immediate Triage (30 seconds)

# 1. Is the container running?
./show-running.sh

# 2. Check overall status of all configured nodes
./show-status.sh

# 3. If you know the config name, check its specific status
./sync-status.sh <config-name>

# 4. Check logs for the faulty node
./logs.sh <config-name>

If the container isn't running, go to §3. Container Lifecycle Issues

If the container is running but not synced, go to §4. Sync Issues

If the container is running and synced but RPC fails, go to §5. RPC/Connectivity Issues

If you see errors in logs but aren't sure what they mean, go to §6. Log Interpretation

1. Repository Overview

What This Repo Contains

rpc/
├── *.yml                    # Docker Compose files for node configurations
├── *.sh                     # Operational scripts (YOUR PRIMARY TOOLS)
├── scripts/                 # Additional helper scripts (CometBFT support)
├── <network>/               # Network directories (e.g., ethereum/, op/, arb/)
│   ├── *.yml                # Compose files for specific chains
│   └── <chain>/             # Chain-specific assets
│       ├── genesis.json     # Custom genesis files
│       ├── rollup.json      # Rollup configurations (OP Stack)
│       └── *.Dockerfile     # Custom build files
├── README.md                # User documentation
└── VIBE.md                  # THIS FILE — operations guide

Key Concepts

Config name: The compose filename WITHOUT .yml (e.g., ethereum-mainnet-geth-pruned)
Service name: Derived from config name, used in docker compose commands
Short name: Used in URL paths, container labels. Format: {network}-{chain}[-{client}][-{db_type}]
Volume names: Docker volumes follow the full config name pattern
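The naming rules above can be expressed as two small shell functions. This is a minimal sketch. `config_name` only strips the directory and the `.yml` suffix. `short_name` is a hypothetical helper that joins the parts of the documented `{network}-{chain}[-{client}][-{db_type}]` format. It is not how `get-shortname.sh` works internally, so treat that script as authoritative.

```bash
# config_name <path/to/file.yml>: compose file name without directory and .yml
config_name() {
  basename "$1" .yml
}

# short_name <network> <chain> [client] [db_type]
# Hypothetical helper: joins the non-empty parts with "-" to illustrate the
# documented format. Use ./get-shortname.sh for the real value.
short_name() {
  local out="" p
  for p in "$@"; do
    [ -n "$p" ] || continue
    out="${out:+$out-}$p"
  done
  echo "$out"
}
```

For example, `config_name ./ethereum-mainnet-geth-pruned.yml` prints `ethereum-mainnet-geth-pruned`, which is the form the scripts in §2 expect as `<config-name>`.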

Supported Networks

Layer 1: Ethereum, Polygon, BSC, Avalanche, Gnosis, Fantom, Core, Berachain, Ronin, Viction, Fuse, Tron, ThunderCore, Goat, AlephZero, Haqq, Taiko, Rootstock, Dogecoin, Litecoin, Bitcoin, Bitcoin-Cash, Ripple, Solana

Layer 2 (OP Stack): Optimism, Base, Zora, Mode, Blast, Fraxtal, Bob, Boba, Worldchain, Metal, Ink, Lisk, SNAX, Celo

Layer 2 (Arbitrum): Arbitrum One, Arbitrum Nova, Everclear, Playblock, Real, Connext, OpenCampusCodex

Other L2s: Linea, Scroll, zkSync Era, Metis, Moonbeam, Starknet, zkEVM, Immutable zkEVM, Polygon zkEVM

2. Essential Scripts Reference

Status & Monitoring Scripts

Script	Usage	What It Does
`show-status.sh`	`[config-name]`	Lists ALL configured nodes with sync status, block height, health
`show-running.sh`		Lists currently running containers
`sync-status.sh`	`<config-name>`	Detailed sync status for one config
`latest.sh`	`<config-name>`	Latest block number + hash
`logs.sh`	`<config-name>`	Tail logs from all containers in a config
`show-db-size.sh`		Disk usage of ALL Docker volumes, sorted by size
`show-ram.sh`	`<config-name>`	Memory usage of containers
`show-cpu.sh`		CPU usage display
`peer-count.sh`		P2P peer count for all running nodes
`time-since-last-block.sh`	`<config-name>`	How long since last block was processed
`ping.sh`	`<container-name>`	Test network connectivity from container
`show-errors.sh`		Show error counts/logs across containers
`show-size.sh`		Show size of containers/volumes
`show-file-size.sh`		Show static file sizes
`show-static-file-size.sh`		Show static file sizes (alternative)

Lifecycle Management Scripts

Script	Usage	What It Does
`start.sh`	`<config-name>`	Start all containers for a config
`stop.sh`	`<config-name>`	Stop all containers for a config
`force-recreate.sh`	`<config-name>`	Force recreate containers (keeps volumes)
`rm.sh`	`<config-name>`	Remove containers (keeps volumes)
`delete-volumes.sh`	`<config-name>`	DESTRUCTIVE - Remove containers AND volumes
`delete-node-keys.sh`	`<config-name>`	Remove node keys (for re-initialization)

Backup & Restore Scripts

Script	Usage	What It Does
`backup-node.sh`	`<config-name> [url]`	Backup volumes locally or to WebDAV
`restore-volumes.sh`	`<config-name> [url]`	Restore volumes from local or HTTP
`clone-node.sh`	`<config-name>`	Clone a node's state
`clone-backup.sh`		Clone backup files
`clone-peers.sh`		Clone peer information
`restore-peers.sh`		Restore peer connections
`list-backups.sh`		List available backup files
`list-peer-backups.sh`		List peer backup files
`list-restorable.sh`		List restorable configurations
`cleanup-backups.sh`		Remove old backups
`cleanup-volumes.sh`		Clean up unused volumes

Network & Connectivity Scripts

Script	Usage	What It Does
`upstreams.sh`		Generate dshackle upstream configuration
`connect-peers.sh`		Connect to peer nodes
`search-node.sh`	`<query>`	Search compose files for patterns
`search-compose.sh`	`<query>`	Search compose files
`network-to-config.sh`		Map network names to config files
`reload_dshackle.sh`		Reload dshackle configuration
`update-whitelist.sh`		Update IP whitelist
`update-ip.sh`		Update IP configuration

Specialized Scripts

Script	Usage	What It Does
`op-wheel.sh`		OP rollup maintenance (rewind, set forkchoice)
`op-wheel-finalize-latest-block.sh`	`<client_svc> [node_svc]`	Finalize latest block (nuclear option)
`catchup.sh`	`<config-name>`	Help node catch up to chain head
`success-if-almost-synced.sh`	`<config-name> <seconds>`	Exit 0 if node is almost synced
`groq.sh`		Query using Groq
`trai.sh`		Trace transaction
`multicurl.sh`		Parallel curl requests
`blocknumber.sh`		Get block number
`get-block.sh`		Get block information
`get-local-url.sh`		Get local RPC URL
`get-shortname.sh`	`<config-file>`	Get short name for a config
`disk-space.sh`		Check disk space
`limit-bandwidth.sh`		Limit bandwidth
`maintenance.sh`		Maintenance helper
`random-port.sh`		Generate random port
`reference-rpc-endpoint.sh`		Reference RPC endpoint helper
`reset-terminal.sh`		Reset terminal
`setup-bandwidth-limit-cron.sh`		Setup cron for bandwidth limiting

3. Container Lifecycle Issues

Symptom: Container Won't Start

# Check why it failed
./logs.sh <config-name> 2>&1 | tail -50

# Check container exit code
docker ps -a --filter "name=<config-name>" --format "{{.Names}} | {{.State}} | {{.Status}}"

# Inspect the container
docker inspect <container-name> | jq '.[0].State'

Common causes:

Port conflict: Two services trying to bind to same host port
Volume permission issues: the container process can't write to its volume
Missing environment variables: .env file incomplete
Invalid compose syntax: YAML parsing error
Image pull failure: Network issue or private registry auth

Fixes:

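# Check for duplicate host ports across compose files (short syntax such as "30303:30303/udp";
# only a real conflict if both files run on the same host)
sed -nE 's#^[[:space:]]*-[[:space:]]*"?([0-9.]+:)?([0-9]{1,5}):[0-9]{1,5}(/(tcp|udp))?"?[[:space:]]*$#\2\3#p' *.yml | sort | uniq -d

# Check whether a host port is already in use by another process
ss -ltnup | grep ':<port> '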

# Validate compose syntax
docker compose -f <config-file>.yml config

# Pull images manually
docker compose -f <config-file>.yml pull

# Start with --build if using custom Dockerfiles
docker compose -f <config-file>.yml up -d --build

Symptom: Container Exits Immediately After Starting

# View the last 100 lines of logs before exit
./logs.sh <config-name> 2>&1 | tail -100

# Check exit code
docker ps -a --filter "name=<service>" --format "{{.Status}}"

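# Run the service in the foreground to see the error
docker compose -f <config-file>.yml run --rm <service-name>

# Or get a shell in the image (only if it ships one; --entrypoint bypasses the client's entrypoint)
docker compose -f <config-file>.yml run --rm --entrypoint sh <service-name>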

Common causes:

Missing config files: /config/ mount empty or wrong path
Invalid flags: Command-line arguments malformed
Database corruption or incompatibility: existing data is damaged, or was written by a different client version
Checkpoint/genesis mismatch: Chain ID or genesis doesn't match

Fixes:

# Verify config directory exists (if using custom configs)
ls -la <network>/<chain>/

# Try with fresh volumes (DESTRUCTIVE: back up first with ./backup-node.sh if the data matters)
./delete-volumes.sh <config-name>
./start.sh <config-name>

Symptom: Container Restarts Repeatedly (Crash Loop)

# Watch logs in real-time
./logs.sh <config-name> -f

# Check restart count
docker inspect <container-name> | jq '.[0].RestartCount'

# Check last exit code, error message and whether the kernel OOM-killed it
docker inspect <container-name> | jq '.[0].State | {ExitCode, Error, OOMKilled}'

Common causes:

OOM killed: Memory limit exceeded
Out of disk space: No space left on device
Segmentation fault: Client bug or bad data
Panic: Go client panic
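When a container keeps restarting, its exit code usually narrows down which of these causes applies. The sketch below maps common codes using the 128 + signal-number convention (137 = SIGKILL, 139 = SIGSEGV, 143 = SIGTERM) and Go's exit status 2 for an unrecovered panic. `describe_exit_code` is a hypothetical helper. A 137 only suggests an OOM kill until `.State.OOMKilled` confirms it.

```bash
# describe_exit_code <code>: rough hint for a container exit code.
describe_exit_code() {
  local code=$1
  case "$code" in
    0)   echo "clean exit" ;;
    2)   echo "exit 2: Go panic or Go flag-parsing error; search the logs for 'panic:'" ;;
    137) echo "SIGKILL (128+9): often the OOM killer; check .State.OOMKilled" ;;
    139) echo "SIGSEGV (128+11): segmentation fault" ;;
    143) echo "SIGTERM (128+15): usually a normal stop, e.g. docker stop" ;;
    *)
      # Other codes above 128 follow the 128 + signal convention
      if [ "$code" -gt 128 ] 2>/dev/null; then
        echo "killed by signal $((code - 128))"
      else
        echo "application exit code $code; see the logs"
      fi
      ;;
  esac
}
```

Usage: `describe_exit_code "$(docker inspect -f '{{.State.ExitCode}}' <container-name>)"`.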

Fixes:

# Check memory usage
./show-ram.sh <config-name>

# Check disk space
df -h /var/lib/docker
./show-db-size.sh

# Increase resources in compose file or .env
# Then force recreate
./force-recreate.sh <config-name>

4. Sync Issues

Symptom: Node Not Syncing (Stuck at Block 0 or Low Block)

# Check sync status
./sync-status.sh <config-name>

# Check current block
./latest.sh <config-name>

# Check logs for sync errors
./logs.sh <config-name> | grep -i -E "sync|error|fail|warn|stuck|behind"

# Check peer count
./peer-count.sh | grep <config-name>

Common causes:

No peers: P2P network connection failed
Wrong network: Connected to wrong chain
Checkpoint too old: Checkpoint URL unavailable or outdated
Snapshot download failed: Snapshot server unreachable

Fixes:

# Check if checkpoint/snapshot is configured
grep -E "(checkpoint|snapshot)" <config-file>.yml

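# Test checkpoint URL(s) manually: prints one HTTP status line per URL
# (any HTTP reply proves reachability; some servers answer 404 on the root path)
for url in $(grep -i checkpoint <config-file>.yml | grep -oE 'https?://[^ "]+'); do curl -sI "$url" | head -n 1; done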

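# Check peer count over JSON-RPC (net_peerCount returns hex, e.g. "0x19" = 25 peers)
curl -s -X POST http://localhost:<port> -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","method":"net_peerCount","params":[],"id":1}'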

Symptom: Sync is Very Slow

# Check sync speed over time
./latest.sh <config-name>; sleep 60; ./latest.sh <config-name>

# Check if node is processing blocks
./time-since-last-block.sh <config-name>

# Check CPU and memory
top -d 1 -p $(docker inspect <container> | jq -r '.[0].State.Pid')
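Two `latest.sh` readings taken 60 seconds apart can be turned into a rate and a rough time to catch up. This is a minimal sketch. It assumes you have already extracted plain decimal block numbers, because the exact output format of `latest.sh` is not shown here, and a chain-head height from a reference RPC. It ignores blocks produced while syncing, so treat the ETA as a lower bound. Both function names are hypothetical.

```bash
# blocks_per_minute <block_before> <block_after> <seconds_between> (seconds > 0)
blocks_per_minute() {
  local before=$1 after=$2 secs=$3
  echo $(( (after - before) * 60 / secs ))
}

# eta_minutes <chain_head> <local_block> <blocks_per_minute>
# Rounds up; fails if the node is not making progress.
eta_minutes() {
  local head=$1 current=$2 rate=$3
  if [ "$rate" -le 0 ]; then
    echo "no progress" >&2
    return 1
  fi
  echo $(( (head - current + rate - 1) / rate ))
}
```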

Common causes:

Resource constrained: CPU throttled, memory swapped
Disk I/O bottleneck: Slow storage or contention
Network rate limited: P2P or RPC rate limiting
Too many peers: P2P overhead
Wrong sync mode: Full sync instead of snap sync

Symptom: Sync Stuck at Specific Block

# Check logs around the stuck block (formats vary by client; geth, for example, prints numbers like 18,000,000)
./logs.sh <config-name> | grep -A 10 -B 10 "<stuck-block-number>"

# Check if it's a known bad block
# Search online: <chain> bad block <number>
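Some clients print block numbers with thousands separators in their logs. Geth, for example, writes values such as `number=18,000,000`, so grepping for the plain number can miss the lines you want. The sketch below builds an extended regular expression that matches either form without matching inside a longer number. `with_thousands` and `block_log_pattern` are hypothetical helpers, not repo scripts.

```bash
# with_thousands <n>: 18000000 -> 18,000,000 (repeatedly inserts a comma before
# the last run of three digits that is preceded by another digit)
with_thousands() {
  echo "$1" | sed -e :a -e 's/\(.*[0-9]\)\([0-9]\{3\}\)/\1,\2/' -e ta
}

# block_log_pattern <n>: ERE matching <n> with or without separators,
# but not as part of a longer number
block_log_pattern() {
  printf '(^|[^0-9,])(%s|%s)([^0-9,]|$)\n' "$1" "$(with_thousands "$1")"
}
```

Usage: `./logs.sh <config-name> | grep -E -A 10 -B 10 "$(block_log_pattern <stuck-block-number>)"`.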

Common causes:

Bad block in chain: Requires client patch or manual intervention
State trie inconsistency: Database corruption
Fork choice issue: Node on wrong fork

Fixes for OP Stack (last resort):

# Mark the latest local block as finalized (nuclear option, see §10)
./op-wheel-finalize-latest-block.sh <client-service>

Symptom: Node on Wrong Fork / Chain

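# Check chain ID (use the eth_chainId query from §5, "RPC Returns Wrong Chain ID")

# Check which chain the node is on: compare its genesis block hash with the chain's published genesis hash
curl -s -X POST http://localhost:<port> -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","method":"eth_getBlockByNumber","params":["0x0",false],"id":1}' | jq -r '.result.hash'

# Compare with the expected chain ID (compose flags and/or custom genesis file)
grep -iE "chainid|networkid" <config-file>.yml
jq '.config.chainId' <network>/<chain>/genesis.json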

5. RPC/Connectivity Issues

Symptom: RPC Endpoint Not Responding

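# Test from host with a real JSON-RPC call (a bare GET may not return a useful reply)
curl -s -m 5 -X POST http://localhost:<port> -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}'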

# Check if traefik/proxy is running
docker ps | grep -E "(traefik|proxy|nginx)"

# Check traefik logs (docker logs writes the container's stderr to stderr, so include it)
docker logs --tail 50 <traefik-container> 2>&1

Common causes:

Container not running: Client crashed
Port not exposed: Wrong port mapping
Traefik misconfiguration: Labels wrong or missing
Firewall blocking: Host firewall or cloud security group

Symptom: RPC Returns Wrong Chain ID

# Query chain ID from RPC
curl -s -X POST http://localhost:<port> \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","method":"eth_chainId","params":[],"id":1}'
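`eth_chainId` returns a hex-encoded quantity, as do `eth_blockNumber` and `net_peerCount`, so the raw response is easier to compare after converting it. This minimal sketch pulls the `result` field out with `sed`, so it works without `jq`, and converts it with shell arithmetic. `rpc_hex_result_to_dec` is a hypothetical helper name, and the regex assumes the response arrives on a single line.

```bash
# rpc_hex_result_to_dec: read a JSON-RPC response on stdin and print its
# hex "result" (e.g. "0x1") as a decimal number. Fails on error responses.
rpc_hex_result_to_dec() {
  local hex
  hex=$(sed -nE 's/.*"result"[[:space:]]*:[[:space:]]*"(0x[0-9a-fA-F]+)".*/\1/p' | head -n 1)
  if [ -z "$hex" ]; then
    echo "no hex result in response" >&2
    return 1
  fi
  # Shell arithmetic accepts C-style hex constants such as 0xa4b1
  echo $(($hex))
}
```

For example, piping the `eth_chainId` call above into `rpc_hex_result_to_dec` should print `1` on Ethereum mainnet.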

Symptom: Cannot Connect to P2P Network

# Check peer count
./peer-count.sh | grep <config-name>

# Test TCP reachability of a peer from the container (needs nc in the image; discovery also uses UDP)
docker exec <client-container> nc -zv <bootstrap-node> <p2p-port>

Fixes:

# Set public IP in .env (drop any existing IP= line first so it isn't duplicated)
IP=$(curl -s ipinfo.io/ip)
sed -i '/^IP=/d' .env
echo "IP=$IP" >> .env
./force-recreate.sh <config-name>

6. Log Interpretation

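Common Log Patterns

The patterns below are extended regular expressions for use with grep -iE. The exact wording and level tags differ between clients, so treat them as starting points.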

Warnings (Node may still function)

Pattern	Meaning	Action
`WARN.*sync.*slow`	Sync slower than expected	Check resources
`WARN.*peers.*low`	Fewer peers than desired	Check P2P connectivity
`WARN.*rate.*limit`	API rate limiting active	Normal for public endpoints

Errors (Node is degraded)

Pattern	Meaning	Action
`Error.*database.*corrupt`	Database corruption	Restore from backup or resync
`Error.*handshake.*fail`	P2P handshake failed	Check chain ID
`Error.*no.*peers`	Cannot connect to P2P	Check bootstrap nodes
`Error.*timeout`	RPC/HTTP timeout	Check network, increase timeout

Fatal (Node will not function)

Pattern	Meaning	Action
`Fatal.*panic`	Client crashed	Check client version
`Fatal.*OOM`	Out of memory (an OOM kill often leaves no log line; look for exit code 137 or OOMKilled in docker inspect)	Increase memory limit
`Fatal.*disk.*full`	No disk space	Free space
`Fatal.*permission.*denied`	Filesystem permissions	Fix volume permissions
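Before you match the specific patterns above, a rough tally shows whether a node's logs lean toward warnings, errors or fatal problems. This sketch counts lines case-insensitively and assigns each line to its most severe level only. The level words it looks for, including geth's four-letter `EROR` and `CRIT` tags, are assumptions to adapt per client. Ordinary words such as "error" inside an INFO line will be over-counted. `count_log_levels` is a hypothetical helper.

```bash
# count_log_levels: read log lines on stdin and print a rough tally of
# fatal/panic, error and warning lines (each line counted once, most severe first).
count_log_levels() {
  awk '
    { line = tolower($0) }
    line ~ /fatal|panic|crit/ { fatal++; next }
    line ~ /error|eror/       { error++; next }
    line ~ /warn/             { warn++;  next }
    END { printf "warn=%d error=%d fatal=%d\n", warn, error, fatal }
  '
}
```

Usage: `docker logs --tail 1000 <container-name> 2>&1 | count_log_levels`.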

7. Resource Issues

High CPU Usage

./show-cpu.sh
docker stats <container-name> --no-stream --format "{{.Container}} | {{.CPUPerc}}"

High Memory Usage

./show-ram.sh <config-name>
docker stats <container-name> --no-stream --format "{{.Container}} | {{.MemUsage}} | {{.MemPerc}}"

High Disk Usage

./show-db-size.sh
docker system df -v

Disk I/O Bottleneck

iotop -o -d 1

8. Backup and Restore

Creating a Backup

# Local backup (to /backup directory)
./backup-node.sh <config-name>

# Remote backup (to WebDAV)
./backup-node.sh <config-name> https://backup-server.tld/dav

Restoring from Backup

# List available backups
./list-backups.sh

# Restore latest backup for config
./restore-volumes.sh <config-name>

# Restore from specific URL
./restore-volumes.sh <config-name> https://backup-server.tld/backup/

Cloning a Node

# Clone a node to a new location
./clone-node.sh <config-name>

# Clone peers (for faster sync)
./clone-peers.sh <config-name>

Nuclear Option: Full Reset

# WARNING: This deletes ALL data for the config; back up first (./backup-node.sh) if you may need it
./stop.sh <config-name> && \
./rm.sh <config-name> && \
./delete-volumes.sh <config-name> && \
./delete-node-keys.sh <config-name> && \
./force-recreate.sh <config-name>

# Then check logs
./logs.sh <config-name>
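Every step in this reset chain is irreversible, so an agent or operator may want a deliberate confirmation in front of it. This is a minimal sketch of such a guard and assumes you run it interactively. `confirm_destructive` is a hypothetical helper, not one of the repo's scripts. It only proceeds when the exact config name is typed back.

```bash
# confirm_destructive <config-name>: succeed only if the answer read from
# stdin is exactly the config name.
confirm_destructive() {
  local config=$1 answer
  printf 'This deletes ALL data for %s. Type the config name to continue: ' "$config" >&2
  IFS= read -r answer || return 1
  [ "$answer" = "$config" ]
}
```

Usage: `confirm_destructive <config-name> && ./delete-volumes.sh <config-name>`.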

9. Common Error Messages

Database Errors

Error	Cause	Solution
`database is corrupted`	Power loss, bug	Restore from backup or resync
`database version mismatch`	Client version changed	Delete and resync

P2P Errors

Error	Cause	Solution
`no configured peers`	Missing bootstrap nodes	Add bootstrap nodes
`handshake failed`	Chain ID mismatch	Verify genesis.json

RPC Errors

Error	Cause	Solution
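`method not found`	Method not supported by this client, or its API namespace (e.g. debug, trace) not enabled	Use a client that supports it, or enable the namespace in the client flags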
`connection refused`	Port not open	Check container running, port mapping

10. OP Stack Specific Debugging

OP Node Issues

# Check op-node logs
./logs.sh <config-name> | grep -i "op-node\|rollup\|sequencer"

# Check rollup configuration (if custom)
cat op/<network>/ethereum/rollup.json | jq .

# Check if rollup.json is mounted
docker exec <op-node-container> cat /config/rollup.json | jq .

OP Wheel (Manual Intervention)

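# Rewind to specific block (DANGEROUS - only if you know what you're doing)
# Check the op-wheel help output first: depending on the version, set-forkchoice may expect block numbers rather than hashes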
./op-wheel.sh engine set-forkchoice \
  --unsafe=<block-hash> \
  --safe=<block-hash> \
  --finalized=<block-hash> \
  --engine=http://<client-service>:8551/ \
  --engine.open=http://<client-service>:8545 \
  --engine.jwt-secret-path=/jwtsecret

# Nuclear option: finalize latest local block
./op-wheel-finalize-latest-block.sh <client-service> <node-service>

11. CometBFT Family (Cosmos, etc.) Specific

Init Container Issues

# CometBFT chains use init.sh inside the container
# The master script is at scripts/cometbft-common.sh

# Check if init completed
./logs.sh <config-name> | grep -i "init\|setup\|complete"

# Check the init script
cat <network>/<chain>/scripts/init.sh

12. Quick Start Guide

Starting a Node

# 1. Set up environment (note: ">" overwrites an existing .env)
IP=$(curl -s ipinfo.io/ip)
echo "IP=$IP" > .env
echo "DOMAIN=${IP//./-}.traefik.me" >> .env
echo "MAIL=your-email@example.com" >> .env

# 2. Select which nodes to run
# Add compose files to COMPOSE_FILE (colon-separated)
echo "COMPOSE_FILE=base.yml:rpc.yml:ethereum-mainnet-geth-pruned.yml" >> .env

# 3. Start the node
docker compose up -d

# 4. Verify it's running
./show-status.sh
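The values in steps 1 and 2 follow two simple rules. The traefik.me domain is the public IP with dots replaced by dashes, and `COMPOSE_FILE` is a colon-separated list. This is a small sketch of both rules as shell functions. `traefik_me_domain` and `compose_file_list` are hypothetical helper names, not repo scripts.

```bash
# traefik_me_domain <ip>: 203.0.113.42 -> 203-0-113-42.traefik.me
traefik_me_domain() {
  printf '%s.traefik.me\n' "$(printf '%s' "$1" | tr . -)"
}

# compose_file_list <file>...: join compose files with ":" for COMPOSE_FILE
# (runs in a subshell so the IFS change does not leak)
compose_file_list() (
  IFS=:
  echo "$*"
)
```

For example, `echo "COMPOSE_FILE=$(compose_file_list base.yml rpc.yml ethereum-mainnet-geth-pruned.yml)" >> .env` writes the same line as step 2.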

Accessing Your Node

# Once running, access via:
# HTTP: http://<your-domain>/ethereum-mainnet-geth-pruned
# HTTPS: https://<your-domain>/ethereum-mainnet-geth-pruned
# WebSocket: wss://<your-domain>/ethereum-mainnet-geth-pruned

# Or locally (if NO_SSL=true):
# HTTP: http://localhost:<port>

13. Configuration Reference

Environment Variables

Required for most setups:

IP=203.0.113.42                    # Your public IP
DOMAIN=203-0-113-42.traefik.me    # Your domain (traefik.me for testing)
MAIL=your-email@example.com        # For Let's Encrypt SSL
WHITELIST=0.0.0.0/0               # IP whitelist (0.0.0.0/0 = all)
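Before starting, it can help to confirm that these keys are present and non-empty in `.env`. This is a minimal sketch. `check_required_env` is a hypothetical helper, and its key list is taken from the "Required for most setups" block. Adjust the list for setups, such as those using `NO_SSL=true`, that may not need every key.

```bash
# check_required_env [file]: print each required key that is missing or empty
# in the env file (default .env); return non-zero if any are missing.
check_required_env() {
  local file=${1:-.env} key missing=0
  for key in IP DOMAIN MAIL WHITELIST; do
    if ! grep -q "^${key}=." "$file" 2>/dev/null; then
      echo "missing: $key"
      missing=1
    fi
  done
  return "$missing"
}
```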

Optional:

NO_SSL=true                       # Disable SSL (testing only)
CHAINS_SUBNET=192.168.0.0/26      # Docker network subnet

Chain-specific (examples):

ETHEREUM_MAINNET_EXECUTION_RPC=https://fallback-rpc.example.com
ARBITRUM_SEPOLIA_EXECUTION_RPC=https://arb-sepolia-rpc.example.com
OP_NODE_NETWORK=mainnet
OP_NODE_L1_RPC_URL=https://l1-rpc.example.com

Compose File Structure

Each compose file defines one or more services:

client: Execution layer (Geth, Erigon, Reth, etc.)
node: Consensus/derivation node (op-node, lighthouse, etc.)
relay: DA relay (eigenda-proxy, op-alt, etc.)
proxy: HTTP/WS proxy (nginx, etc.)
database: External database (Postgres, etc.)

Volume Naming

Volumes are named after the config:

<config-name>_<service>_data
<config-name>_<service>_config

Example: ethereum-mainnet-geth-pruned_client_data
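Building the name from the documented pattern makes it easy to inspect or back up a specific volume. This sketch assumes the `<config-name>_<service>_<kind>` pattern above holds for the config you are working with. Docker Compose normally prefixes volume names with the project name unless the volume sets an explicit `name:`, so confirm with `docker volume ls`. `volume_name` is a hypothetical helper.

```bash
# volume_name <config-name> <service> <kind>: builds <config-name>_<service>_<kind>,
# where kind is "data" or "config" as in the pattern above.
volume_name() {
  local config=$1 service=$2 kind=$3
  case "$kind" in
    data|config) ;;
    *) echo "unknown volume kind: $kind" >&2; return 1 ;;
  esac
  printf '%s_%s_%s\n' "$config" "$service" "$kind"
}
```

Usage: `docker volume inspect "$(volume_name ethereum-mainnet-geth-pruned client data)"`.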

14. Quick Debugging Checklist

Use this checklist when debugging an issue:

Is the container running? → ./show-running.sh
Are there errors in logs? → ./logs.sh <config> | grep -i error
Is the node synced? → ./sync-status.sh <config>
Are peers connected? → ./peer-count.sh
Are resources adequate? → ./show-ram.sh, ./show-db-size.sh
Is P2P reachable? → Check the public IP in .env and the P2P port (§5)
Is RPC responding? → Test with curl
Is disk space available? → df -h /var/lib/docker
Is the config file correct? → docker compose -f <file>.yml config
Are environment variables set? → Check .env
Is the genesis file correct? → Check chain ID

15. When to Escalate

Escalate to a human operator if:

Node stuck for > 2 hours with no progress
Repeated Fatal or panic errors after restart
Database corruption confirmed
Issue affects multiple nodes across different chains
Need to force-push to this repo

16. File Locations Quick Reference

What You Need	Where to Find It
Compose files	Root of this repo (`*.yml`)
Operational scripts	Root of this repo (`*.sh`)
Chain assets	`<network>/<chain>/` or `<stack>/<network>/<settlement>/`
Genesis files	`<stack>/<network>/<settlement>/genesis.json`
Rollup configs	`op/<network>/<settlement>/rollup.json`
Custom Dockerfiles	`<path>/*.Dockerfile`
Init scripts	`<path>/scripts/init.sh`
CometBFT common	`scripts/cometbft-common.sh`
Compose registry	`compose_registry.json`
RPC endpoints	`reference-rpc-endpoint.json`
Environment	`.env`

17. Resource Requirements Reference

Node Type	Disk	RAM	CPU
Ethereum pruned	~500GB	8GB	2+ cores
Ethereum archive	~2TB+	16GB+	4+ cores
Ethereum archive-trace	~4TB+	32GB+	8+ cores
L2 pruned	~100-500GB	4-8GB	2+ cores
L2 archive	~1-2TB	8-16GB	4+ cores

Note: Requirements vary by chain. Check specific chain documentation.
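Before starting a node, you can check the disk column against the filesystem where Docker stores volumes. This is a minimal sketch using POSIX `df -Pk`. `disk_ok` is a hypothetical helper that compares against currently free space in GiB. `/var/lib/docker` is only the default Docker data root, so adjust the path if yours differs.

```bash
# disk_ok <path> <required_gb>: succeed if the filesystem holding <path>
# has at least <required_gb> GiB available right now.
disk_ok() {
  local path=$1 required_gb=$2 avail_kb
  # Column 4 of POSIX df output is available space in 1024-byte blocks
  avail_kb=$(df -Pk "$path" | awk 'NR == 2 { print $4 }')
  [ -n "$avail_kb" ] || return 1
  [ "$avail_kb" -ge $((required_gb * 1024 * 1024)) ]
}
```

Usage: `disk_ok /var/lib/docker 500 || echo "less than 500 GiB free"`.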

This file is your complete operations and debugging reference. For additional user documentation, see README.md.
