- Add VIBE.md as primary debugging reference for automated tools - Rewrite README.md as human-focused operator guide - Fix README.md inaccuracies (remove show-networks.sh references, fix typo) - Split content: README for humans, VIBE for agents Generated by Mistral Vibe. Co-Authored-By: Mistral Vibe <vibe@mistral.ai>
VIBE.md — ethereum-rpc-docker Operations & Debugging Guide
You are an LLM agent or operator running or debugging blockchain RPC nodes from this repository. This file is your primary reference for all operational tasks.
This repo contains Docker Compose configurations for blockchain RPC nodes plus operational scripts for managing them. Everything you need to run, monitor, debug, and fix nodes is here.
0. WHEN A NODE IS FAULTY — Start Here
Immediate Triage (30 seconds)
```bash
# 1. Is the container running?
./show-running.sh
# 2. Check overall status of all configured nodes
./show-status.sh
# 3. If you know the config name, check its specific status
./sync-status.sh <config-name>
# 4. Check logs for the faulty node
./logs.sh <config-name>
```
- If the container isn't running, go to §3 (Container Lifecycle Issues).
- If the container is running but not synced, go to §4 (Sync Issues).
- If the container is running and synced but RPC fails, go to §5 (RPC/Connectivity Issues).
- If you see errors in logs but aren't sure what they mean, go to §6 (Log Interpretation).
1. Repository Overview
What This Repo Contains
```
rpc/
├── *.yml                    # Docker Compose files for node configurations
├── *.sh                     # Operational scripts (YOUR PRIMARY TOOLS)
├── scripts/                 # Additional helper scripts (CometBFT support)
├── <network>/               # Network directories (e.g., ethereum/, op/, arb/)
│   ├── *.yml                # Compose files for specific chains
│   └── <chain>/             # Chain-specific assets
│       ├── genesis.json     # Custom genesis files
│       ├── rollup.json      # Rollup configurations (OP Stack)
│       └── *.Dockerfile     # Custom build files
├── README.md                # User documentation
└── VIBE.md                  # THIS FILE: operations guide
```
Key Concepts
- Config name: The compose filename WITHOUT `.yml` (e.g., `ethereum-mainnet-geth-pruned`)
- Service name: Derived from config name, used in `docker compose` commands
- Short name: Used in URL paths, container labels. Format: `{network}-{chain}[-{client}][-{db_type}]`
- Volume names: Docker volumes follow the full config name pattern
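
A config name is the compose filename with `.yml` stripped. A short name joins its parts with hyphens. The sketch below illustrates these two rules. The helper names `config_name_from_file` and `build_short_name` are hypothetical and not scripts in this repo. The repo's own `get-shortname.sh` may apply extra rules.

```bash
# Hypothetical helpers illustrating the naming rules above; not part of this repo.

# Config name: the compose filename without its directory and .yml suffix.
config_name_from_file() {
  basename "$1" .yml
}

# Short name: {network}-{chain}[-{client}][-{db_type}].
# Empty optional parts are skipped, so a missing client/db_type leaves no stray hyphen.
build_short_name() {
  local out="" part
  for part in "$@"; do
    [ -n "$part" ] || continue
    out="${out:+$out-}$part"
  done
  printf '%s\n' "$out"
}

# Usage:
# config_name_from_file ethereum-mainnet-geth-pruned.yml   # ethereum-mainnet-geth-pruned
# build_short_name ethereum mainnet geth                    # ethereum-mainnet-geth
```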
Supported Networks
Layer 1: Ethereum, Polygon, BSC, Avalanche, Gnosis, Fantom, Core, Berachain, Ronin, Viction, Fuse, Tron, ThunderCore, Goat, AlephZero, Haqq, Taiko, Rootstock, Dogecoin, Litecoin, Bitcoin, Bitcoin-Cash, Ripple, Solana
Layer 2 (OP Stack): Optimism, Base, Zora, Mode, Blast, Fraxtal, Bob, Boba, Worldchain, Metal, Ink, Lisk, SNAX, Celo
Layer 2 (Arbitrum): Arbitrum One, Arbitrum Nova, Everclear, Playblock, Real, Connext, OpenCampusCodex
Other L2s: Linea, Scroll, zkSync Era, Metis, Moonbeam, Starknet, zkEVM, Immutable zkEVM, Polygon zkEVM
2. Essential Scripts Reference
Status & Monitoring Scripts
| Script | Usage | What It Does |
|---|---|---|
| `show-status.sh` | `[config-name]` | Lists ALL configured nodes with sync status, block height, health |
| `show-running.sh` | | Lists currently running containers |
| `sync-status.sh` | `<config-name>` | Detailed sync status for one config |
| `latest.sh` | `<config-name>` | Latest block number + hash |
| `logs.sh` | `<config-name>` | Tail logs from all containers in a config |
| `show-db-size.sh` | | Disk usage of ALL Docker volumes, sorted by size |
| `show-ram.sh` | `<config-name>` | Memory usage of containers |
| `show-cpu.sh` | | CPU usage display |
| `peer-count.sh` | | P2P peer count for all running nodes |
| `time-since-last-block.sh` | `<config-name>` | How long since last block was processed |
| `ping.sh` | `<container-name>` | Test network connectivity from container |
| `show-errors.sh` | | Show error counts/logs across containers |
| `show-size.sh` | | Show size of containers/volumes |
| `show-file-size.sh` | | Show static file sizes |
| `show-static-file-size.sh` | | Show static file sizes (alternative) |
Lifecycle Management Scripts
| Script | Usage | What It Does |
|---|---|---|
| `start.sh` | `<config-name>` | Start all containers for a config |
| `stop.sh` | `<config-name>` | Stop all containers for a config |
| `force-recreate.sh` | `<config-name>` | Force recreate containers (keeps volumes) |
| `rm.sh` | `<config-name>` | Remove containers (keeps volumes) |
| `delete-volumes.sh` | `<config-name>` | DESTRUCTIVE - Remove containers AND volumes |
| `delete-node-keys.sh` | `<config-name>` | Remove node keys (for re-initialization) |
Backup & Restore Scripts
| Script | Usage | What It Does |
|---|---|---|
| `backup-node.sh` | `<config-name> [url]` | Backup volumes locally or to WebDAV |
| `restore-volumes.sh` | `<config-name> [url]` | Restore volumes from local or HTTP |
| `clone-node.sh` | `<config-name>` | Clone a node's state |
| `clone-backup.sh` | | Clone backup files |
| `clone-peers.sh` | | Clone peer information |
| `restore-peers.sh` | | Restore peer connections |
| `list-backups.sh` | | List available backup files |
| `list-peer-backups.sh` | | List peer backup files |
| `list-restorable.sh` | | List restorable configurations |
| `cleanup-backups.sh` | | Remove old backups |
| `cleanup-volumes.sh` | | Clean up unused volumes |
Network & Connectivity Scripts
| Script | Usage | What It Does |
|---|---|---|
| `upstreams.sh` | | Generate dshackle upstream configuration |
| `connect-peers.sh` | | Connect to peer nodes |
| `search-node.sh` | `<query>` | Search compose files for patterns |
| `search-compose.sh` | `<query>` | Search compose files |
| `network-to-config.sh` | | Map network names to config files |
| `reload_dshackle.sh` | | Reload dshackle configuration |
| `update-whitelist.sh` | | Update IP whitelist |
| `update-ip.sh` | | Update IP configuration |
Specialized Scripts
| Script | Usage | What It Does |
|---|---|---|
| `op-wheel.sh` | | OP rollup maintenance (rewind, set forkchoice) |
| `op-wheel-finalize-latest-block.sh` | `<client_svc> [node_svc]` | Finalize latest block (nuclear option) |
| `catchup.sh` | `<config-name>` | Help node catch up to chain head |
| `success-if-almost-synced.sh` | `<config-name> <seconds>` | Exit 0 if node is almost synced |
| `groq.sh` | | Query using Groq |
| `trai.sh` | | Trace transaction |
| `multicurl.sh` | | Parallel curl requests |
| `blocknumber.sh` | | Get block number |
| `get-block.sh` | | Get block information |
| `get-local-url.sh` | | Get local RPC URL |
| `get-shortname.sh` | `<config-file>` | Get short name for a config |
| `disk-space.sh` | | Check disk space |
| `limit-bandwidth.sh` | | Limit bandwidth |
| `maintenance.sh` | | Maintenance helper |
| `random-port.sh` | | Generate random port |
| `reference-rpc-endpoint.sh` | | Reference RPC endpoint helper |
| `reset-terminal.sh` | | Reset terminal |
| `setup-bandwidth-limit-cron.sh` | | Setup cron for bandwidth limiting |
3. Container Lifecycle Issues
Symptom: Container Won't Start
```bash
# Check why it failed
./logs.sh <config-name> 2>&1 | tail -50
# Check container exit code
docker ps -a --filter "name=<config-name>" --format "{{.Names}} | {{.State}} | {{.Status}}"
# Inspect the container
docker inspect <container-name> | jq '.[0].State'
```
Common causes:
- Port conflict: Two services trying to bind to same host port
- Volume permission issues: Docker can't write to volume
- Missing environment variables: `.env` file incomplete
- Invalid compose syntax: YAML parsing error
- Image pull failure: Network issue or private registry auth
Fixes:
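```bash
# Check for duplicate host port mappings across compose files
# (YAML port entries are indented, e.g.  - "30303:30303", so don't anchor at column 0;
#  /udp is kept so a TCP+UDP pair on the same port is not flagged)
sed -nE 's|^[[:space:]]*-[[:space:]]*"?([0-9]{1,5}):[0-9]{1,5}(/udp)?.*|\1\2|p' *.yml | sort | uniq -d
# Check whether a host port is already taken by another process
ss -ltnp | grep ':<port>'
# Validate compose syntax
docker compose -f <config-file>.yml config
# Pull images manually
docker compose -f <config-file>.yml pull
# Start with --build if using custom Dockerfiles
docker compose -f <config-file>.yml up -d --build
```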
Symptom: Container Exits Immediately After Starting
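```bash
# View the last 100 lines of logs before exit
./logs.sh <config-name> 2>&1 | tail -100
# Check exit code
docker ps -a --filter "name=<service>" --format "{{.Status}}"
# Run the service in the foreground to see the error directly
docker compose -f <config-file>.yml run --rm <service-name>
# Or open a shell in the image, bypassing its entrypoint (only if the image ships sh)
docker compose -f <config-file>.yml run --rm --entrypoint sh <service-name>
```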
Common causes:
- Missing config files: `/config/` mount empty or wrong path
- Invalid flags: Command-line arguments malformed
- Database corruption: Existing data incompatible with new version
- Checkpoint/genesis mismatch: Chain ID or genesis doesn't match
Fixes:
```bash
# Verify config directory exists (if using custom configs)
ls -la <network>/<chain>/
# Try with fresh volumes (DESTRUCTIVE)
./delete-volumes.sh <config-name>
./start.sh <config-name>
```
Symptom: Container Restarts Repeatedly (Crash Loop)
```bash
# Watch logs in real-time
./logs.sh <config-name> -f
# Check restart count
docker inspect <container-name> | jq '.[0].RestartCount'
# Check last exit code, error, and whether the container was OOM-killed
docker inspect <container-name> | jq '.[0].State | {ExitCode, Error, OOMKilled}'
```
Common causes:
- OOM killed: Memory limit exceeded
- Out of disk space: No space left on device
- Segmentation fault: Client bug or bad data
- Panic: Go client panic
Fixes:
```bash
# Check memory usage
./show-ram.sh <config-name>
# Check disk space
df -h /var/lib/docker
./show-db-size.sh
# Increase resources in compose file or .env
# Then force recreate
./force-recreate.sh <config-name>
```
4. Sync Issues
Symptom: Node Not Syncing (Stuck at Block 0 or Low Block)
```bash
# Check sync status
./sync-status.sh <config-name>
# Check current block
./latest.sh <config-name>
# Check logs for sync errors
./logs.sh <config-name> | grep -i -E "sync|error|fail|warn|stuck|behind"
# Check peer count
./peer-count.sh | grep <config-name>
```
Common causes:
- No peers: P2P network connection failed
- Wrong network: Connected to wrong chain
- Checkpoint too old: Checkpoint URL unavailable or outdated
- Snapshot download failed: Snapshot server unreachable
Fixes:
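```bash
# Check if checkpoint/snapshot is configured
grep -E "(checkpoint|snapshot)" <config-file>.yml
# Test the first checkpoint URL manually (headers only)
curl -sI "$(grep -i checkpoint <config-file>.yml | grep -oE 'https?://[^ "]+' | head -n 1)"
# Check peer count via the standard net_peerCount JSON-RPC method (result is hex, e.g. "0x19" = 25)
curl -s -X POST http://localhost:<port> -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","method":"net_peerCount","params":[],"id":1}' | jq -r .result
```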
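
One way to check whether a node is moving at all is to compare the latest block's timestamp with the current time. The sketch below assumes the client serves the standard `eth_getBlockByNumber` JSON-RPC method on `localhost:<port>`. The helper `block_age_seconds` is hypothetical and independent of `time-since-last-block.sh`.

```bash
# Hypothetical helper (not part of this repo): seconds elapsed since a block,
# given its hex "timestamp" field and, optionally, the current Unix time.
block_age_seconds() {
  local ts_hex="$1" now="${2:-$(date +%s)}"
  # Bash arithmetic accepts 0x-prefixed hex directly.
  echo $(( now - ts_hex ))
}

# Usage, assuming the client's HTTP JSON-RPC is reachable on localhost:<port>:
# ts=$(curl -s -X POST http://localhost:<port> -H "Content-Type: application/json" \
#   -d '{"jsonrpc":"2.0","method":"eth_getBlockByNumber","params":["latest",false],"id":1}' \
#   | jq -r .result.timestamp)
# block_age_seconds "$ts"
```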
Symptom: Sync is Very Slow
```bash
# Check sync speed over time
./latest.sh <config-name>; sleep 60; ./latest.sh <config-name>
# Check if node is processing blocks
./time-since-last-block.sh <config-name>
# Check CPU and memory
top -d 1 -p $(docker inspect <container> | jq -r '.[0].State.Pid')
```
Common causes:
- Resource constrained: CPU throttled, memory swapped
- Disk I/O bottleneck: Slow storage or contention
- Network rate limited: P2P or RPC rate limiting
- Too many peers: P2P overhead
- Wrong sync mode: Full sync instead of snap sync
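
Two `./latest.sh` readings taken a known interval apart give a rough sync rate and ETA. The sketch below is hypothetical: `sync_eta` is not a repo script. It takes decimal block heights and assumes you get the chain head from a reference RPC you trust.

```bash
# Hypothetical helper (not part of this repo). Arguments are decimal block
# heights: first sample, second sample, seconds between them, chain head.
sync_eta() {
  local first="$1" second="$2" interval="$3" head="$4"
  local gained=$(( second - first ))
  if [ "$gained" -le 0 ]; then
    echo "no progress in ${interval}s"
    return 1
  fi
  local remaining=$(( head - second ))
  if [ "$remaining" -le 0 ]; then
    echo "at or past head"
    return 0
  fi
  local per_min=$(( gained * 60 / interval ))
  # remaining / (gained / interval) blocks per second / 60, rounded up to whole minutes
  local eta_min=$(( (remaining * interval + gained * 60 - 1) / (gained * 60) ))
  echo "${per_min} blocks/min, ~${eta_min} min to head"
}

# Usage: take two heights 60s apart, then: sync_eta <height1> <height2> 60 <head-height>
```

A rate that stays near zero while peers are connected usually points to one of the resource causes above rather than the network.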
Symptom: Sync Stuck at Specific Block
```bash
# Check logs around the stuck block
./logs.sh <config-name> | grep -A 10 -B 10 "block <stuck-block-number>"
# Check if it's a known bad block
# Search online: <chain> bad block <number>
```
Common causes:
- Bad block in chain: Requires client patch or manual intervention
- State trie inconsistency: Database corruption
- Fork choice issue: Node on wrong fork
Fixes for OP Stack:
```bash
# Last resort: try to finalize past the block (read §10 before using)
./op-wheel-finalize-latest-block.sh <client-service>
```
Symptom: Node on Wrong Fork / Chain
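```bash
# Check which chain ID the node reports (hex, e.g. 0x1 for Ethereum mainnet)
curl -s -X POST http://localhost:<port> -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","method":"eth_chainId","params":[],"id":1}'
# Compare with the expected chain ID from the compose file or genesis file
grep -i chainid <config-file>.yml <network>/<chain>/genesis.json
```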
5. RPC/Connectivity Issues
Symptom: RPC Endpoint Not Responding
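```bash
# Test from host with a real JSON-RPC call (a bare GET is not a valid JSON-RPC request)
curl -s -X POST http://localhost:<port> -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}'
# Check if traefik/proxy is running
docker ps | grep -E "(traefik|proxy|nginx)"
# Check traefik logs (docker logs writes to both stdout and stderr)
docker logs <traefik-container> 2>&1 | tail -50
```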
Common causes:
- Container not running: Client crashed
- Port not exposed: Wrong port mapping
- Traefik misconfiguration: Labels wrong or missing
- Firewall blocking: Host firewall or cloud security group
Symptom: RPC Returns Wrong Chain ID
```bash
# Query chain ID from RPC
curl -s -X POST http://localhost:<port> \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","method":"eth_chainId","params":[],"id":1}'
```
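
`eth_chainId` returns a hex string, so comparing it by eye against a decimal chain ID is error-prone. The sketch below extracts the `result` field and compares it numerically. The function name `check_chain_id` is hypothetical. It parses with `sed` instead of `jq` only to avoid an extra dependency.

```bash
# Hypothetical helper (not part of this repo): extract the hex "result" from an
# eth_chainId response and compare it, as a decimal number, with the expected ID.
check_chain_id() {
  local response="$1" expected="$2" hex actual
  hex=$(printf '%s' "$response" \
    | sed -n 's/.*"result"[[:space:]]*:[[:space:]]*"\(0x[0-9a-fA-F]*\)".*/\1/p')
  if [ -z "$hex" ]; then
    echo "no result in response"
    return 2
  fi
  actual=$(( hex ))   # bash arithmetic converts 0x-prefixed hex to decimal
  if [ "$actual" -eq "$expected" ]; then
    echo "ok: chain ID $actual"
  else
    echo "MISMATCH: got $actual, expected $expected"
    return 1
  fi
}

# Usage:
# resp=$(curl -s -X POST http://localhost:<port> -H "Content-Type: application/json" \
#   -d '{"jsonrpc":"2.0","method":"eth_chainId","params":[],"id":1}')
# check_chain_id "$resp" 1
```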
Symptom: Cannot Connect to P2P Network
```bash
# Check peer count
./peer-count.sh | grep <config-name>
# Test P2P connectivity from container (requires nc in the image)
docker exec <client-container> nc -zv <bootstrap-node> <p2p-port>
```
Fixes:
```bash
# Set public IP in .env
IP=$(curl -s ipinfo.io/ip)
echo "IP=$IP" >> .env
./force-recreate.sh <config-name>
```
6. Log Interpretation
Common Log Patterns
Warnings (Node may still function)

| Pattern | Meaning | Action |
|---|---|---|
| `WARN.*sync.*slow` | Sync slower than expected | Check resources |
| `WARN.*peers.*low` | Fewer peers than desired | Check P2P connectivity |
| `WARN.*rate.*limit` | API rate limiting active | Normal for public endpoints |

Errors (Node is degraded)

| Pattern | Meaning | Action |
|---|---|---|
| `Error.*database.*corrupt` | Database corruption | Restore from backup or resync |
| `Error.*handshake.*fail` | P2P handshake failed | Check chain ID |
| `Error.*no.*peers` | Cannot connect to P2P | Check bootstrap nodes |
| `Error.*timeout` | RPC/HTTP timeout | Check network, increase timeout |

Fatal (Node will not function)

| Pattern | Meaning | Action |
|---|---|---|
| `Fatal.*panic` | Client crashed | Check client version |
| `Fatal.*OOM` | Out of memory | Increase memory limit |
| `Fatal.*disk.*full` | No disk space | Free space |
| `Fatal.*permission.*denied` | Filesystem permissions | Fix volume permissions |
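
The three tables can be applied mechanically as a first pass over a log. The sketch below matches their patterns case-insensitively. That is an assumption: clients format log levels differently (for example `WARN`, `WRN` or `level=warn`), so treat the regexes as a starting point rather than an exhaustive list. `classify_log_line` is a hypothetical helper.

```bash
# Hypothetical first-pass classifier built from the pattern tables above.
# Prints FATAL, ERROR, WARN or OK; the most severe matching class wins.
classify_log_line() {
  local line="$1"
  if printf '%s\n' "$line" | grep -qiE 'fatal.*(panic|oom|disk.*full|permission.*denied)'; then
    echo FATAL
  elif printf '%s\n' "$line" | grep -qiE 'error.*(database.*corrupt|handshake.*fail|no.*peers|timeout)'; then
    echo ERROR
  elif printf '%s\n' "$line" | grep -qiE 'warn.*(sync.*slow|peers.*low|rate.*limit)'; then
    echo WARN
  else
    echo OK
  fi
}

# Usage: count lines per class for one config
# ./logs.sh <config-name> 2>&1 | while IFS= read -r l; do classify_log_line "$l"; done | sort | uniq -c
```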
7. Resource Issues
High CPU Usage
```bash
./show-ram.sh <config-name>
./show-cpu.sh
docker stats <container-name> --no-stream
```
High Memory Usage
```bash
./show-ram.sh <config-name>
docker stats <container-name> --no-stream --format "{{.Container}} | {{.MemUsage}} | {{.MemPerc}}"
```
High Disk Usage
```bash
./show-db-size.sh
docker system df -v
```
Disk I/O Bottleneck
```bash
iotop -o -d 1
```
8. Backup and Restore
Creating a Backup
```bash
# Local backup (to /backup directory)
./backup-node.sh <config-name>
# Remote backup (to WebDAV)
./backup-node.sh <config-name> https://backup-server.tld/dav
```
Restoring from Backup
```bash
# List available backups
./list-backups.sh
# Restore latest backup for config
./restore-volumes.sh <config-name>
# Restore from specific URL
./restore-volumes.sh <config-name> https://backup-server.tld/backup/
```
Cloning a Node
```bash
# Clone a node to a new location
./clone-node.sh <config-name>
# Clone peers (for faster sync)
./clone-peers.sh <config-name>
```
Nuclear Option: Full Reset
```bash
# WARNING: This deletes ALL data for the config
./stop.sh <config-name> && \
./rm.sh <config-name> && \
./delete-volumes.sh <config-name> && \
./delete-node-keys.sh <config-name> && \
./force-recreate.sh <config-name>
# Then check logs
./logs.sh <config-name>
```
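
Because the full reset is irreversible, it can be worth gating it behind an explicit confirmation. The sketch below uses `confirm_config`, a hypothetical helper and not a repo script. It only returns success when the operator retypes the exact config name.

```bash
# Hypothetical guard: succeed only if the operator types the config name back.
confirm_config() {
  local config="$1" answer
  printf 'Type "%s" to DELETE all of its data: ' "$config" >&2
  IFS= read -r answer || return 1
  [ "$answer" = "$config" ]
}

# Usage: prefix the reset chain with the guard, e.g.
# confirm_config <config-name> && ./stop.sh <config-name> && ./rm.sh <config-name> && ...
```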
9. Common Error Messages
Database Errors

| Error | Cause | Solution |
|---|---|---|
| `database is corrupted` | Power loss, bug | Restore from backup or resync |
| `database version mismatch` | Client version changed | Delete and resync |

P2P Errors

| Error | Cause | Solution |
|---|---|---|
| `no configured peers` | Missing bootstrap nodes | Add bootstrap nodes |
| `handshake failed` | Chain ID mismatch | Verify genesis.json |

RPC Errors

| Error | Cause | Solution |
|---|---|---|
| `method not found` | Wrong client | Use correct client |
| `connection refused` | Port not open | Check container running, port mapping |
10. OP Stack Specific Debugging
OP Node Issues
```bash
# Check op-node logs
./logs.sh <config-name> | grep -i "op-node\|rollup\|sequencer"
# Check rollup configuration (if custom)
cat op/<network>/ethereum/rollup.json | jq .
# Check if rollup.json is mounted
docker exec <op-node-container> cat /config/rollup.json | jq .
```
OP Wheel (Manual Intervention)
```bash
# Rewind to specific block (DANGEROUS - only if you know what you're doing)
./op-wheel.sh engine set-forkchoice \
  --unsafe=<block-hash> \
  --safe=<block-hash> \
  --finalized=<block-hash> \
  --engine=http://<client-service>:8551/ \
  --engine.open=http://<client-service>:8545 \
  --engine.jwt-secret-path=/jwtsecret
# Nuclear option: finalize latest local block
./op-wheel-finalize-latest-block.sh <client-service> <node-service>
```
11. CometBFT Family (Cosmos, etc.) Specific
Init Container Issues
```bash
# CometBFT chains use init.sh inside the container
# The master script is at scripts/cometbft-common.sh
# Check if init completed
./logs.sh <config-name> | grep -i "init\|setup\|complete"
# Check the init script
cat <network>/<chain>/scripts/init.sh
```
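
Once init has finished, a CometBFT node reports whether it is still syncing in the `result.sync_info.catching_up` field of its RPC `/status` endpoint. The sketch below assumes the default CometBFT RPC port 26657. Check the compose file for the actual port or mapping. `cometbft_catching_up` is a hypothetical helper that parses the field with `grep`, so it needs no `jq`.

```bash
# Hypothetical helper: print "true" or "false" for sync_info.catching_up
# from a CometBFT /status JSON response.
cometbft_catching_up() {
  printf '%s' "$1" \
    | grep -oE '"catching_up"[[:space:]]*:[[:space:]]*(true|false)' \
    | grep -oE '(true|false)$'
}

# Usage (26657 is the CometBFT default RPC port, assumed here):
# cometbft_catching_up "$(curl -s http://localhost:26657/status)"
```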
12. Quick Start Guide
Starting a Node
```bash
# 1. Set up environment (note: '>' overwrites any existing .env)
IP=$(curl -s ipinfo.io/ip)
echo "IP=$IP" > .env
echo "DOMAIN=${IP//./-}.traefik.me" >> .env
echo "MAIL=your-email@example.com" >> .env
# 2. Select which nodes to run
# Add compose files to COMPOSE_FILE (colon-separated)
echo "COMPOSE_FILE=base.yml:rpc.yml:ethereum-mainnet-geth-pruned.yml" >> .env
# 3. Start the node
docker compose up -d
# 4. Verify it's running
./show-status.sh
```
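
`traefik.me` resolves a name like `203-0-113-42.traefik.me` to the IP `203.0.113.42`, which is why `DOMAIN` is built by replacing dots with dashes. If the IP lookup fails, that substitution silently produces `DOMAIN=.traefik.me`. The sketch below adds a basic sanity check before deriving the name. `traefik_me_domain` is hypothetical, and it checks only that the input contains digits and dots. It does not fully validate IPv4 syntax.

```bash
# Hypothetical helper: turn an IPv4 address into its traefik.me hostname,
# refusing empty input or anything other than digits and dots.
traefik_me_domain() {
  local ip="$1"
  case "$ip" in
    ""|*[!0-9.]*) echo "not an IPv4 address: '$ip'" >&2; return 1 ;;
  esac
  printf '%s.traefik.me\n' "${ip//./-}"
}

# Usage:
# DOMAIN=$(traefik_me_domain "$(curl -s ipinfo.io/ip)") && echo "DOMAIN=$DOMAIN" >> .env
```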
Accessing Your Node
```bash
# Once running, access via:
# HTTP: http://<your-domain>/ethereum-mainnet-geth-pruned
# HTTPS: https://<your-domain>/ethereum-mainnet-geth-pruned
# WebSocket: wss://<your-domain>/ethereum-mainnet-geth-pruned
# Or locally (if NO_SSL=true):
# HTTP: http://localhost:<port>
```
13. Configuration Reference
Environment Variables
Required for most setups:
```bash
IP=203.0.113.42                  # Your public IP
DOMAIN=203-0-113-42.traefik.me   # Your domain (traefik.me for testing)
MAIL=your-email@example.com      # For Let's Encrypt SSL
WHITELIST=0.0.0.0/0              # IP whitelist (0.0.0.0/0 = all)
```
Optional:
```bash
NO_SSL=true                      # Disable SSL (testing only)
CHAINS_SUBNET=192.168.0.0/26     # Docker network subnet
```
Chain-specific (examples):
```bash
ETHEREUM_MAINNET_EXECUTION_RPC=https://fallback-rpc.example.com
ARBITRUM_SEPOLIA_EXECUTION_RPC=https://arb-sepolia-rpc.example.com
OP_NODE_NETWORK=mainnet
OP_NODE_L1_RPC_URL=https://l1-rpc.example.com
```
Compose File Structure
Each compose file defines one or more services:
- client: Execution layer (Geth, Erigon, Reth, etc.)
- node: Consensus/derivation node (op-node, lighthouse, etc.)
- relay: DA relay (eigenda-proxy, op-alt, etc.)
- proxy: HTTP/WS proxy (nginx, etc.)
- database: External database (Postgres, etc.)
Volume Naming
Volumes are named after the config:
```
<config-name>_<service>_data
<config-name>_<service>_config
```
Example: `ethereum-mainnet-geth-pruned_client_data`
14. Quick Debugging Checklist
Use this checklist when debugging an issue:
- Is the container running? → `./show-running.sh`
- Are there errors in logs? → `./logs.sh <config> | grep -i error`
- Is the node synced? → `./sync-status.sh <config>`
- Are peers connected? → `./peer-count.sh`
- Are resources adequate? → `./show-ram.sh`, `./show-db-size.sh`
- Is P2P working? → Check peer count
- Is RPC responding? → Test with curl
- Is disk space available? → `df -h /var/lib/docker`
- Is the config file correct? → `docker compose -f <file>.yml config`
- Are environment variables set? → Check `.env`
- Is the genesis file correct? → Check chain ID
15. When to Escalate
Escalate to a human operator if:
- Node stuck for > 2 hours with no progress
- Repeated `Fatal` or `panic` errors after restart
- Database corruption confirmed
- Issue affects multiple nodes across different chains
- Need to force-push to this repo
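
The first criterion, no progress for more than 2 hours, can be checked automatically. The sketch below assumes you already have the number of seconds since the last processed block, for example from `./time-since-last-block.sh`. That script's output format is not documented here, so parse it as needed. `should_escalate_stall` is hypothetical and defaults to the 7200 s threshold from the list above.

```bash
# Hypothetical check for the "stuck for > 2 hours" escalation rule.
# Returns 0 (escalate) when the stall exceeds the threshold, 1 otherwise.
should_escalate_stall() {
  local seconds_since_last_block="$1" threshold="${2:-7200}"
  if [ "$seconds_since_last_block" -gt "$threshold" ]; then
    echo "ESCALATE: no new block for ${seconds_since_last_block}s (> ${threshold}s)"
    return 0
  fi
  echo "ok: last block ${seconds_since_last_block}s ago"
  return 1
}
```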
16. File Locations Quick Reference
| What You Need | Where to Find It |
|---|---|
| Compose files | Root of this repo (`*.yml`) |
| Operational scripts | Root of this repo (`*.sh`) |
| Chain assets | `<network>/<chain>/` or `<stack>/<network>/<settlement>/` |
| Genesis files | `<stack>/<network>/<settlement>/genesis.json` |
| Rollup configs | `op/<network>/<settlement>/rollup.json` |
| Custom Dockerfiles | `<path>/*.Dockerfile` |
| Init scripts | `<path>/scripts/init.sh` |
| CometBFT common | `scripts/cometbft-common.sh` |
| Compose registry | `compose_registry.json` |
| RPC endpoints | `reference-rpc-endpoint.json` |
| Environment | `.env` |
17. Resource Requirements Reference
| Node Type | Disk | RAM | CPU |
|---|---|---|---|
| Ethereum pruned | ~500GB | 8GB | 2+ cores |
| Ethereum archive | ~2TB+ | 16GB+ | 4+ cores |
| Ethereum archive-trace | ~4TB+ | 32GB+ | 8+ cores |
| L2 pruned | ~100-500GB | 4-8GB | 2+ cores |
| L2 archive | ~1-2TB | 8-16GB | 4+ cores |
Note: Requirements vary by chain. Check specific chain documentation.
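
Before starting a node, check that the Docker data directory has room for the disk figures in the table. The sketch below assumes Docker uses the default `/var/lib/docker` and that GNU `df` is available. `has_disk_for` is hypothetical. It takes the requirement in GB from the table and, optionally, the free space in GB.

```bash
# Hypothetical pre-flight check: compare free space (GB) with a requirement (GB).
has_disk_for() {
  local required_gb="$1" avail_gb="${2:-}"
  if [ -z "$avail_gb" ]; then
    # GNU df: available space on the Docker data dir, in whole GB (digits only)
    avail_gb=$(df --output=avail -BG /var/lib/docker | tail -n 1 | tr -dc '0-9')
  fi
  if [ "$avail_gb" -ge "$required_gb" ]; then
    echo "ok: ${avail_gb}GB free, ${required_gb}GB needed"
  else
    echo "insufficient: ${avail_gb}GB free, ${required_gb}GB needed"
    return 1
  fi
}

# Usage: has_disk_for 500    # Ethereum pruned, per the table above
```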
This file is your complete operations and debugging reference. For additional user documentation, see README.md.