# VIBE.md — ethereum-rpc-docker Operations & Debugging Guide

You are an LLM agent or operator **running or debugging blockchain RPC nodes** from this repository. This file is your **primary reference** for all operational tasks.

This repo contains Docker Compose configurations for blockchain RPC nodes plus operational scripts for managing them. Everything you need to run, monitor, debug, and fix nodes is here.

---

## 0. WHEN A NODE IS FAULTY — Start Here

### Immediate Triage (30 seconds)

```bash
# 1. Is the container running?
./show-running.sh

# 2. Check overall status of all configured nodes
./show-status.sh

# 3. If you know the config name, check its specific status
./sync-status.sh

# 4. Check logs for the faulty node
./logs.sh
```

**If the container isn't running**, go to [§3. Container Lifecycle Issues](#3-container-lifecycle-issues)

**If the container is running but not synced**, go to [§4. Sync Issues](#4-sync-issues)

**If the container is running and synced but RPC fails**, go to [§5. RPC/Connectivity Issues](#5-rpcconnectivity-issues)

**If you see errors in logs but aren't sure what they mean**, go to [§6. Log Interpretation](#6-log-interpretation)

---

## 1. Repository Overview

### What This Repo Contains

```
rpc/
├── *.yml                    # Docker Compose files for node configurations
├── *.sh                     # Operational scripts (YOUR PRIMARY TOOLS)
├── scripts/                 # Additional helper scripts (CometBFT support)
├── /                        # Network directories (e.g., ethereum/, op/, arb/)
│   ├── *.yml                # Compose files for specific chains
│   └── /                    # Chain-specific assets
│       ├── genesis.json     # Custom genesis files
│       ├── rollup.json      # Rollup configurations (OP Stack)
│       └── *.Dockerfile     # Custom build files
├── README.md                # User documentation
└── VIBE.md                  # THIS FILE — operations guide
```

### Key Concepts

- **Config name**: The compose filename WITHOUT `.yml` (e.g., `ethereum-mainnet-geth-pruned`)
- **Service name**: Derived from config name, used in `docker compose` commands
- **Short name**: Used in URL paths, container labels. Format: `{network}-{chain}[-{client}][-{db_type}]`
- **Volume names**: Docker volumes follow the full config name pattern

### Supported Networks

**Layer 1**: Ethereum, Polygon, BSC, Avalanche, Gnosis, Fantom, Core, Berachain, Ronin, Viction, Fuse, Tron, ThunderCore, Goat, AlephZero, Haqq, Taiko, Rootstock, Dogecoin, Litecoin, Bitcoin, Bitcoin-Cash, Ripple, Solana

**Layer 2 (OP Stack)**: Optimism, Base, Zora, Mode, Blast, Fraxtal, Bob, Boba, Worldchain, Metal, Ink, Lisk, SNAX, Celo

**Layer 2 (Arbitrum)**: Arbitrum One, Arbitrum Nova, Everclear, Playblock, Real, Connext, OpenCampusCodex

**Other L2s**: Linea, Scroll, zkSync Era, Metis, Moonbeam, Starknet, zkEVM, Immutable zkEVM, Polygon zkEVM

---

## 2. Essential Scripts Reference

### Status & Monitoring Scripts

| Script | Usage | What It Does |
|---|---|---|
| `show-status.sh` | `[config-name]` | Lists ALL configured nodes with sync status, block height, health |
| `show-running.sh` | | Lists currently running containers |
| `sync-status.sh` | `` | Detailed sync status for one config |
| `latest.sh` | `` | Latest block number + hash |
| `logs.sh` | `` | Tail logs from all containers in a config |
| `show-db-size.sh` | | Disk usage of ALL Docker volumes, sorted by size |
| `show-ram.sh` | `` | Memory usage of containers |
| `show-cpu.sh` | | CPU usage display |
| `peer-count.sh` | | P2P peer count for all running nodes |
| `time-since-last-block.sh` | `` | How long since last block was processed |
| `ping.sh` | `` | Test network connectivity from container |
| `show-errors.sh` | | Show error counts/logs across containers |
| `show-size.sh` | | Show size of containers/volumes |
| `show-file-size.sh` | | Show static file sizes |
| `show-static-file-size.sh` | | Show static file sizes (alternative) |

### Lifecycle Management Scripts

| Script | Usage | What It Does |
|---|---|---|
| `start.sh` | `` | Start all containers for a config |
| `stop.sh` | `` | Stop all containers for a config |
| `force-recreate.sh` | `` | Force recreate containers (keeps volumes) |
| `rm.sh` | `` | Remove containers (keeps volumes) |
| `delete-volumes.sh` | `` | **DESTRUCTIVE** - Remove containers AND volumes |
| `delete-node-keys.sh` | `` | Remove node keys (for re-initialization) |

### Backup & Restore Scripts

| Script | Usage | What It Does |
|---|---|---|
| `backup-node.sh` | ` [url]` | Backup volumes locally or to WebDAV |
| `restore-volumes.sh` | ` [url]` | Restore volumes from local or HTTP |
| `clone-node.sh` | `` | Clone a node's state |
| `clone-backup.sh` | | Clone backup files |
| `clone-peers.sh` | | Clone peer information |
| `restore-peers.sh` | | Restore peer connections |
| `list-backups.sh` | | List available backup files |
| `list-peer-backups.sh` | | List peer backup files |
| `list-restorable.sh` | | List restorable configurations |
| `cleanup-backups.sh` | | Remove old backups |
| `cleanup-volumes.sh` | | Clean up unused volumes |

### Network & Connectivity Scripts

| Script | Usage | What It Does |
|---|---|---|
| `upstreams.sh` | | Generate dshackle upstream configuration |
| `connect-peers.sh` | | Connect to peer nodes |
| `search-node.sh` | `` | Search compose files for patterns |
| `search-compose.sh` | `` | Search compose files |
| `network-to-config.sh` | | Map network names to config files |
| `reload_dshackle.sh` | | Reload dshackle configuration |
| `update-whitelist.sh` | | Update IP whitelist |
| `update-ip.sh` | | Update IP configuration |

### Specialized Scripts

| Script | Usage | What It Does |
|---|---|---|
| `op-wheel.sh` | | OP rollup maintenance (rewind, set forkchoice) |
| `op-wheel-finalize-latest-block.sh` | ` [node_svc]` | Finalize latest block (nuclear option) |
| `catchup.sh` | `` | Help node catch up to chain head |
| `success-if-almost-synced.sh` | ` ` | Exit 0 if node is almost synced |
| `groq.sh` | | Query using Groq |
| `trai.sh` | | Trace transaction |
| `multicurl.sh` | | Parallel curl requests |
| `blocknumber.sh` | | Get block number |
| `get-block.sh` | | Get block information |
| `get-local-url.sh` | | Get local RPC URL |
| `get-shortname.sh` | `` | Get short name for a config |
| `disk-space.sh` | | Check disk space |
| `limit-bandwidth.sh` | | Limit bandwidth |
| `maintenance.sh` | | Maintenance helper |
| `random-port.sh` | | Generate random port |
| `reference-rpc-endpoint.sh` | | Reference RPC endpoint helper |
| `reset-terminal.sh` | | Reset terminal |
| `setup-bandwidth-limit-cron.sh` | | Setup cron for bandwidth limiting |

---

## 3. Container Lifecycle Issues

### Symptom: Container Won't Start

```bash
# Check why it failed
./logs.sh 2>&1 | tail -50

# Check container exit code
docker ps -a --filter "name=" --format "{{.Names}} | {{.State}} | {{.Status}}"

# Inspect the container
docker inspect | jq '.[0].State'
```

**Common causes:**

- **Port conflict**: Two services trying to bind to same host port
- **Volume permission issues**: Docker can't write to volume
- **Missing environment variables**: `.env` file incomplete
- **Invalid compose syntax**: YAML parsing error
- **Image pull failure**: Network issue or private registry auth

**Fixes:**

```bash
# Check for port conflicts
grep -h "^[0-9]\{1,5\}:[0-9]" *.yml | sort | uniq -d

# Validate compose syntax
docker compose -f .yml config

# Pull images manually
docker compose -f .yml pull

# Start with --build if using custom Dockerfiles
docker compose -f .yml up -d --build
```

### Symptom: Container Exits Immediately After Starting

```bash
# View the last 100 lines of logs before exit
./logs.sh 2>&1 | tail -100

# Check exit code
docker ps -a --filter "name=" --format "{{.Status}}"

# Run interactively to see error
docker compose -f .yml run --rm sh
```

**Common causes:**

- **Missing config files**: `/config/` mount empty or wrong path
- **Invalid flags**: Command-line arguments malformed
- **Database corruption**: Existing data incompatible with new version
- **Checkpoint/genesis mismatch**: Chain ID or genesis doesn't match

**Fixes:**

```bash
# Verify config directory exists (if using custom configs)
ls -la //

# Try with fresh volumes (DESTRUCTIVE)
./delete-volumes.sh
./start.sh
```

### Symptom: Container Restarts Repeatedly (Crash Loop)

```bash
# Watch logs in real-time
./logs.sh -f

# Check restart count
docker inspect | jq '.[0].RestartCount'

# Check last restart reason
docker inspect | jq '.[0].State.ExitCode, .[0].State.Error'
```

**Common causes:**

- **OOM killed**: Memory limit exceeded
- **Out of disk space**: No space left on device
- **Segmentation fault**: Client bug or bad data
- **Panic**: Go client panic

**Fixes:**

```bash
# Check memory usage
./show-ram.sh

# Check disk space
df -h /var/lib/docker
./show-db-size.sh

# Increase resources in compose file or .env
# Then force recreate
./force-recreate.sh
```

---

## 4. Sync Issues

### Symptom: Node Not Syncing (Stuck at Block 0 or Low Block)

```bash
# Check sync status
./sync-status.sh

# Check current block
./latest.sh

# Check logs for sync errors
./logs.sh | grep -i -E "sync|error|fail|warn|stuck|behind"

# Check peer count
./peer-count.sh | grep
```

**Common causes:**

- **No peers**: P2P network connection failed
- **Wrong network**: Connected to wrong chain
- **Checkpoint too old**: Checkpoint URL unavailable or outdated
- **Snapshot download failed**: Snapshot server unreachable

**Fixes:**

```bash
# Check if checkpoint/snapshot is configured
grep -E "(checkpoint|snapshot)" .yml

# Test checkpoint URL manually
curl -I $(grep checkpoint .yml | grep -oE 'http[^ ]+')

# Check peer connections (geth example)
docker exec admin_peers | jq '.[] | .network.remoteAddress' | wc -l
```

### Symptom: Sync is Very Slow

```bash
# Check sync speed over time
./latest.sh ; sleep 60; ./latest.sh

# Check if node is processing blocks
./time-since-last-block.sh

# Check CPU and memory
top -d 1 -p $(docker inspect | jq -r '.[0].State.Pid')
```

**Common causes:**

- **Resource constrained**: CPU throttled, memory swapped
- **Disk I/O bottleneck**: Slow storage or contention
- **Network rate limited**: P2P or RPC rate limiting
- **Too many peers**: P2P overhead
- **Wrong sync mode**: Full sync instead of snap sync

### Symptom: Sync Stuck at Specific Block

```bash
# Check logs around the stuck block
./logs.sh | grep -A 10 -B 10 "block "

# Check if it's a known bad block
# Search online: bad block
```

**Common causes:**

- **Bad block in chain**: Requires client patch or manual intervention
- **State trie inconsistency**: Database corruption
- **Fork choice issue**: Node on wrong fork

**Fixes for OP Stack:**

```bash
# Try to finalize past the block
./op-wheel-finalize-latest-block.sh
```

### Symptom: Node on Wrong Fork / Chain

```bash
# Check chain ID
./latest.sh | grep -i chain

# Check what chain the node thinks it's on
docker exec ethdo chain --endpoint=http://localhost:8545

# Compare with expected chain ID
grep chainId .yml
```

---

## 5. RPC/Connectivity Issues

### Symptom: RPC Endpoint Not Responding

```bash
# Test from host
curl -s http://localhost: | head -c 100

# Check if traefik/proxy is running
docker ps | grep -E "(traefik|proxy|nginx)"

# Check traefik logs
docker logs | tail -50
```

**Common causes:**

- **Container not running**: Client crashed
- **Port not exposed**: Wrong port mapping
- **Traefik misconfiguration**: Labels wrong or missing
- **Firewall blocking**: Host firewall or cloud security group

### Symptom: RPC Returns Wrong Chain ID

```bash
# Query chain ID from RPC
curl -s -X POST http://localhost: \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","method":"eth_chainId","params":[],"id":1}'
```

### Symptom: Cannot Connect to P2P Network

```bash
# Check peer count
./peer-count.sh | grep

# Test P2P connectivity from container
docker exec nc -zv
```

**Fixes:**

```bash
# Set public IP in .env
IP=$(curl -s ipinfo.io/ip)
echo "IP=$IP" >> .env
./force-recreate.sh
```

---

## 6. Log Interpretation

### Common Log Patterns

#### Warnings (Node may still function)

| Pattern | Meaning | Action |
|---|---|---|
| `WARN.*sync.*slow` | Sync slower than expected | Check resources |
| `WARN.*peers.*low` | Fewer peers than desired | Check P2P connectivity |
| `WARN.*rate.*limit` | API rate limiting active | Normal for public endpoints |

#### Errors (Node is degraded)

| Pattern | Meaning | Action |
|---|---|---|
| `Error.*database.*corrupt` | Database corruption | Restore from backup or resync |
| `Error.*handshake.*fail` | P2P handshake failed | Check chain ID |
| `Error.*no.*peers` | Cannot connect to P2P | Check bootstrap nodes |
| `Error.*timeout` | RPC/HTTP timeout | Check network, increase timeout |

#### Fatal (Node will not function)

| Pattern | Meaning | Action |
|---|---|---|
| `Fatal.*panic` | Client crashed | Check client version |
| `Fatal.*OOM` | Out of memory | Increase memory limit |
| `Fatal.*disk.*full` | No disk space | Free space |
| `Fatal.*permission.*denied` | Filesystem permissions | Fix volume permissions |

---

## 7. Resource Issues

### High CPU Usage

```bash
./show-ram.sh
./show-cpu.sh
docker stats --no-stream
```

### High Memory Usage

```bash
./show-ram.sh
docker stats --no-stream --format "{{.Container}} | {{.MemUsage}} | {{.MemPerc}}"
```

### High Disk Usage

```bash
./show-db-size.sh
docker system df -v
```

### Disk I/O Bottleneck

```bash
iotop -o -d 1
```

---

## 8. Backup and Restore

### Creating a Backup

```bash
# Local backup (to /backup directory)
./backup-node.sh

# Remote backup (to WebDAV)
./backup-node.sh https://backup-server.tld/dav
```

### Restoring from Backup

```bash
# List available backups
./list-backups.sh

# Restore latest backup for config
./restore-volumes.sh

# Restore from specific URL
./restore-volumes.sh https://backup-server.tld/backup/
```

### Cloning a Node

```bash
# Clone a node to a new location
./clone-node.sh

# Clone peers (for faster sync)
./clone-peers.sh
```

### Nuclear Option: Full Reset

```bash
# WARNING: This deletes ALL data for the config
./stop.sh && \
./rm.sh && \
./delete-volumes.sh && \
./delete-node-keys.sh && \
./force-recreate.sh

# Then check logs
./logs.sh
```

---

## 9. Common Error Messages

### Database Errors

| Error | Cause | Solution |
|---|---|---|
| `database is corrupted` | Power loss, bug | Restore from backup or resync |
| `database version mismatch` | Client version changed | Delete and resync |

### P2P Errors

| Error | Cause | Solution |
|---|---|---|
| `no configured peers` | Missing bootstrap nodes | Add bootstrap nodes |
| `handshake failed` | Chain ID mismatch | Verify genesis.json |

### RPC Errors

| Error | Cause | Solution |
|---|---|---|
| `method not found` | Wrong client | Use correct client |
| `connection refused` | Port not open | Check container running, port mapping |

---

## 10. OP Stack Specific Debugging

### OP Node Issues

```bash
# Check op-node logs
./logs.sh | grep -i "op-node\|rollup\|sequencer"

# Check rollup configuration (if custom)
cat op//ethereum/rollup.json | jq .

# Check if rollup.json is mounted
docker exec cat /config/rollup.json | jq .
```

### OP Wheel (Manual Intervention)

```bash
# Rewind to specific block (DANGEROUS - only if you know what you're doing)
./op-wheel.sh engine set-forkchoice \
  --unsafe= \
  --safe= \
  --finalized= \
  --engine=http://:8551/ \
  --engine.open=http://:8545 \
  --engine.jwt-secret-path=/jwtsecret

# Nuclear option: finalize latest local block
./op-wheel-finalize-latest-block.sh
```

---

## 11. CometBFT Family (Cosmos, etc.) Specific

### Init Container Issues

```bash
# CometBFT chains use init.sh inside the container
# The master script is at scripts/cometbft-common.sh

# Check if init completed
./logs.sh | grep -i "init\|setup\|complete"

# Check the init script
cat //scripts/init.sh
```

---

## 12. Quick Start Guide

### Starting a Node

```bash
# 1. Set up environment
# Set IP in the shell first so that ${IP//./-} below expands to a real value
IP=$(curl -s ipinfo.io/ip)
echo "IP=$IP" > .env
echo "DOMAIN=${IP//./-}.traefik.me" >> .env
echo "MAIL=your-email@example.com" >> .env

# 2. Select which nodes to run
# Add compose files to COMPOSE_FILE (colon-separated)
echo "COMPOSE_FILE=base.yml:rpc.yml:ethereum-mainnet-geth-pruned.yml" >> .env

# 3. Start the node
docker compose up -d

# 4. Verify it's running
./show-status.sh
```

### Accessing Your Node

```bash
# Once running, access via:
# HTTP: http:///ethereum-mainnet-geth-pruned
# HTTPS: https:///ethereum-mainnet-geth-pruned
# WebSocket: wss:///ethereum-mainnet-geth-pruned

# Or locally (if NO_SSL=true):
# HTTP: http://localhost:
```

---

## 13. Configuration Reference

### Environment Variables

**Required for most setups:**

```bash
IP=203.0.113.42                  # Your public IP
DOMAIN=203-0-113-42.traefik.me   # Your domain (traefik.me for testing)
MAIL=your-email@example.com      # For Let's Encrypt SSL
WHITELIST=0.0.0.0/0              # IP whitelist (0.0.0.0/0 = all)
```

**Optional:**

```bash
NO_SSL=true                      # Disable SSL (testing only)
CHAINS_SUBNET=192.168.0.0/26     # Docker network subnet
```

**Chain-specific (examples):**

```bash
ETHEREUM_MAINNET_EXECUTION_RPC=https://fallback-rpc.example.com
ARBITRUM_SEPOLIA_EXECUTION_RPC=https://arb-sepolia-rpc.example.com
OP_NODE_NETWORK=mainnet
OP_NODE_L1_RPC_URL=https://l1-rpc.example.com
```

### Compose File Structure

Each compose file defines one or more services:

- **client**: Execution layer (Geth, Erigon, Reth, etc.)
- **node**: Consensus/derivation node (op-node, lighthouse, etc.)
- **relay**: DA relay (eigenda-proxy, op-alt, etc.)
- **proxy**: HTTP/WS proxy (nginx, etc.)
- **database**: External database (Postgres, etc.)

### Volume Naming

Volumes are named after the config:

```
__data
__config
```

Example: `ethereum-mainnet-geth-pruned_client_data`

---

## 14. Quick Debugging Checklist

Use this checklist when debugging an issue:

- [ ] **Is the container running?** → `./show-running.sh`
- [ ] **Are there errors in logs?** → `./logs.sh | grep -i error`
- [ ] **Is the node synced?** → `./sync-status.sh`
- [ ] **Are peers connected?** → `./peer-count.sh`
- [ ] **Are resources adequate?** → `./show-ram.sh`, `./show-db-size.sh`
- [ ] **Is P2P working?** → Check peer count
- [ ] **Is RPC responding?** → Test with curl
- [ ] **Is disk space available?** → `df -h /var/lib/docker`
- [ ] **Is the config file correct?** → `docker compose -f .yml config`
- [ ] **Are environment variables set?** → Check `.env`
- [ ] **Is the genesis file correct?** → Check chain ID

---

## 15. When to Escalate

Escalate to a human operator if:

- [ ] Node stuck for > 2 hours with no progress
- [ ] Repeated `Fatal` or `panic` errors after restart
- [ ] Database corruption confirmed
- [ ] Issue affects multiple nodes across different chains
- [ ] Need to force-push to this repo

---

## 16. File Locations Quick Reference

| What You Need | Where to Find It |
|---|---|
| Compose files | Root of this repo (`*.yml`) |
| Operational scripts | Root of this repo (`*.sh`) |
| Chain assets | `//` or `///` |
| Genesis files | `///genesis.json` |
| Rollup configs | `op///rollup.json` |
| Custom Dockerfiles | `/*.Dockerfile` |
| Init scripts | `/scripts/init.sh` |
| CometBFT common | `scripts/cometbft-common.sh` |
| Compose registry | `compose_registry.json` |
| RPC endpoints | `reference-rpc-endpoint.json` |
| Environment | `.env` |

---

## 17. Resource Requirements Reference

| Node Type | Disk | RAM | CPU |
|---|---|---|---|
| Ethereum pruned | ~500GB | 8GB | 2+ cores |
| Ethereum archive | ~2TB+ | 16GB+ | 4+ cores |
| Ethereum archive-trace | ~4TB+ | 32GB+ | 8+ cores |
| L2 pruned | ~100-500GB | 4-8GB | 2+ cores |
| L2 archive | ~1-2TB | 8-16GB | 4+ cores |

**Note:** Requirements vary by chain. Check specific chain documentation.

---

*This file is your complete operations and debugging reference. For additional user documentation, see README.md.*
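
Two of the checks in this guide involve a little arithmetic: the sync-speed check in §4 compares two block heights taken 60 seconds apart, and the `eth_chainId` query in §5 returns the chain ID as a hex quantity. The following is a minimal sketch of that arithmetic, not part of this repository. The helper names `hex_to_dec` and `eta_seconds` are hypothetical, and the sketch assumes you have already extracted the block heights as plain decimal integers (the output format of `latest.sh` is not assumed here). The estimate also ignores that the chain head keeps advancing, so treat it as a lower bound.

```bash
# Hypothetical helpers for illustration only; they are not scripts from this repo.

# Convert a JSON-RPC hex quantity (e.g. the "result" of eth_chainId) to decimal.
hex_to_dec() {
  printf '%d\n' "$1"
}

# Estimate seconds until a node reaches the chain head.
#   $1 = first local block height
#   $2 = second local block height, sampled $3 seconds later
#   $3 = sampling interval in seconds
#   $4 = current chain head height (e.g. from a reference RPC endpoint)
# Prints 0 if already at or past the head, "stalled" if no blocks were gained.
# The body runs in a subshell so the variables do not leak into the caller.
eta_seconds() (
  first=$1
  second=$2
  interval_s=$3
  head=$4
  gained=$((second - first))
  remaining=$((head - second))
  if [ "$remaining" -le 0 ]; then
    echo 0
  elif [ "$gained" -le 0 ]; then
    echo stalled
  else
    # remaining blocks divided by (gained blocks per second), integer math
    echo $((remaining * interval_s / gained))
  fi
)
```

For example, `hex_to_dec 0x2105` prints `8453`, and `eta_seconds 1000 1600 60 2200` prints `60` (600 blocks gained in 60 seconds, 600 blocks still to go). A result of `stalled` corresponds to the "Sync Stuck at Specific Block" symptom in §4.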