Add VIBE.md debugging guide and update README.md

- Add VIBE.md as primary debugging reference for automated tools
- Rewrite README.md as human-focused operator guide
- Fix README.md inaccuracies (remove show-networks.sh references, fix typo)
- Split content: README for humans, VIBE for agents

Generated by Mistral Vibe.
Co-Authored-By: Mistral Vibe <vibe@mistral.ai>
729
VIBE.md
Normal file
@@ -0,0 +1,729 @@
# VIBE.md: ethereum-rpc-docker Operations & Debugging Guide

You are an LLM agent or operator **running or debugging blockchain RPC nodes** from this
repository. This file is your **primary reference** for all operational tasks.

This repo contains Docker Compose configurations for blockchain RPC nodes plus operational
scripts for managing them. Everything you need to run, monitor, debug, and fix nodes is here.

---

## 0. WHEN A NODE IS FAULTY: Start Here

### Immediate Triage (30 seconds)

```bash
# 1. Is the container running?
./show-running.sh

# 2. Check overall status of all configured nodes
./show-status.sh

# 3. If you know the config name, check its specific status
./sync-status.sh <config-name>

# 4. Check logs for the faulty node
./logs.sh <config-name>
```

**If the container isn't running**, go to [§3. Container Lifecycle Issues](#3-container-lifecycle-issues)

**If the container is running but not synced**, go to [§4. Sync Issues](#4-sync-issues)

**If the container is running and synced but RPC fails**, go to [§5. RPC/Connectivity Issues](#5-rpcconnectivity-issues)

**If you see errors in logs but aren't sure what they mean**, go to [§6. Log Interpretation](#6-log-interpretation)
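
The routing rules above can be sketched as a small decision function. This is an illustration only: `triage_route` and its three yes/no arguments are hypothetical and not part of this repo; you would fill them in by hand after running the triage commands.

```bash
# Hypothetical helper: maps three observations to the section to read next.
# Arguments (each "yes" or "no"): container running, node synced, RPC responding.
triage_route() {
  running=$1
  synced=$2
  rpc_ok=$3
  if [ "$running" != "yes" ]; then
    echo "3. Container Lifecycle Issues"
  elif [ "$synced" != "yes" ]; then
    echo "4. Sync Issues"
  elif [ "$rpc_ok" != "yes" ]; then
    echo "5. RPC/Connectivity Issues"
  else
    # Everything looks healthy on the surface: fall back to reading the logs.
    echo "6. Log Interpretation"
  fi
}

# Example: container is up, but the node is behind.
# triage_route yes no no
```
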

---

## 1. Repository Overview

### What This Repo Contains

```
rpc/
├── *.yml                    # Docker Compose files for node configurations
├── *.sh                     # Operational scripts (YOUR PRIMARY TOOLS)
├── scripts/                 # Additional helper scripts (CometBFT support)
├── <network>/               # Network directories (e.g., ethereum/, op/, arb/)
│   ├── *.yml                # Compose files for specific chains
│   └── <chain>/             # Chain-specific assets
│       ├── genesis.json     # Custom genesis files
│       ├── rollup.json      # Rollup configurations (OP Stack)
│       └── *.Dockerfile     # Custom build files
├── README.md                # User documentation
└── VIBE.md                  # THIS FILE: operations guide
```

### Key Concepts

- **Config name**: The compose filename WITHOUT `.yml` (e.g., `ethereum-mainnet-geth-pruned`)
- **Service name**: Derived from config name, used in `docker compose` commands
- **Short name**: Used in URL paths, container labels. Format: `{network}-{chain}[-{client}][-{db_type}]`
- **Volume names**: Docker volumes follow the full config name pattern
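
The naming rules above can be illustrated with two small functions. This is a sketch, not the repo's implementation: `config_name` and `short_name` are hypothetical names, and dropping any leading directory from the compose path is an assumption. The repo's own `get-shortname.sh` remains the authoritative source for short names.

```bash
# Config name: the compose filename without the .yml suffix.
# Assumption: a leading directory, if any, is dropped as well.
config_name() {
  basename "$1" .yml
}

# Short name: {network}-{chain}[-{client}][-{db_type}]
# The third and fourth arguments are optional.
short_name() {
  name="$1-$2"
  if [ -n "${3:-}" ]; then name="$name-$3"; fi
  if [ -n "${4:-}" ]; then name="$name-$4"; fi
  echo "$name"
}

# Example:
# config_name ethereum-mainnet-geth-pruned.yml
# short_name ethereum mainnet geth pruned
```
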

### Supported Networks

**Layer 1**: Ethereum, Polygon, BSC, Avalanche, Gnosis, Fantom, Core, Berachain, Ronin, Viction, Fuse, Tron, ThunderCore, Goat, AlephZero, Haqq, Taiko, Rootstock, Dogecoin, Litecoin, Bitcoin, Bitcoin-Cash, Ripple, Solana

**Layer 2 (OP Stack)**: Optimism, Base, Zora, Mode, Blast, Fraxtal, Bob, Boba, Worldchain, Metal, Ink, Lisk, SNAX, Celo

**Layer 2 (Arbitrum)**: Arbitrum One, Arbitrum Nova, Everclear, Playblock, Real, Connext, OpenCampusCodex

**Other L2s**: Linea, Scroll, zkSync Era, Metis, Moonbeam, Starknet, zkEVM, Immutable zkEVM, Polygon zkEVM

---

## 2. Essential Scripts Reference

### Status & Monitoring Scripts

| Script | Usage | What It Does |
|---|---|---|
| `show-status.sh` | `[config-name]` | Lists ALL configured nodes with sync status, block height, health |
| `show-running.sh` | | Lists currently running containers |
| `sync-status.sh` | `<config-name>` | Detailed sync status for one config |
| `latest.sh` | `<config-name>` | Latest block number + hash |
| `logs.sh` | `<config-name>` | Tail logs from all containers in a config |
| `show-db-size.sh` | | Disk usage of ALL Docker volumes, sorted by size |
| `show-ram.sh` | `<config-name>` | Memory usage of containers |
| `show-cpu.sh` | | CPU usage display |
| `peer-count.sh` | | P2P peer count for all running nodes |
| `time-since-last-block.sh` | `<config-name>` | How long since last block was processed |
| `ping.sh` | `<container-name>` | Test network connectivity from container |
| `show-errors.sh` | | Show error counts/logs across containers |
| `show-size.sh` | | Show size of containers/volumes |
| `show-file-size.sh` | | Show static file sizes |
| `show-static-file-size.sh` | | Show static file sizes (alternative) |

### Lifecycle Management Scripts

| Script | Usage | What It Does |
|---|---|---|
| `start.sh` | `<config-name>` | Start all containers for a config |
| `stop.sh` | `<config-name>` | Stop all containers for a config |
| `force-recreate.sh` | `<config-name>` | Force recreate containers (keeps volumes) |
| `rm.sh` | `<config-name>` | Remove containers (keeps volumes) |
| `delete-volumes.sh` | `<config-name>` | **DESTRUCTIVE** - Remove containers AND volumes |
| `delete-node-keys.sh` | `<config-name>` | Remove node keys (for re-initialization) |

### Backup & Restore Scripts

| Script | Usage | What It Does |
|---|---|---|
| `backup-node.sh` | `<config-name> [url]` | Backup volumes locally or to WebDAV |
| `restore-volumes.sh` | `<config-name> [url]` | Restore volumes from local or HTTP |
| `clone-node.sh` | `<config-name>` | Clone a node's state |
| `clone-backup.sh` | | Clone backup files |
| `clone-peers.sh` | | Clone peer information |
| `restore-peers.sh` | | Restore peer connections |
| `list-backups.sh` | | List available backup files |
| `list-peer-backups.sh` | | List peer backup files |
| `list-restorable.sh` | | List restorable configurations |
| `cleanup-backups.sh` | | Remove old backups |
| `cleanup-volumes.sh` | | Clean up unused volumes |

### Network & Connectivity Scripts

| Script | Usage | What It Does |
|---|---|---|
| `upstreams.sh` | | Generate dshackle upstream configuration |
| `connect-peers.sh` | | Connect to peer nodes |
| `search-node.sh` | `<query>` | Search compose files for patterns |
| `search-compose.sh` | `<query>` | Search compose files |
| `network-to-config.sh` | | Map network names to config files |
| `reload_dshackle.sh` | | Reload dshackle configuration |
| `update-whitelist.sh` | | Update IP whitelist |
| `update-ip.sh` | | Update IP configuration |

### Specialized Scripts

| Script | Usage | What It Does |
|---|---|---|
| `op-wheel.sh` | | OP rollup maintenance (rewind, set forkchoice) |
| `op-wheel-finalize-latest-block.sh` | `<client_svc> [node_svc]` | Finalize latest block (nuclear option) |
| `catchup.sh` | `<config-name>` | Help node catch up to chain head |
| `success-if-almost-synced.sh` | `<config-name> <seconds>` | Exit 0 if node is almost synced |
| `groq.sh` | | Query using Groq |
| `trai.sh` | | Trace transaction |
| `multicurl.sh` | | Parallel curl requests |
| `blocknumber.sh` | | Get block number |
| `get-block.sh` | | Get block information |
| `get-local-url.sh` | | Get local RPC URL |
| `get-shortname.sh` | `<config-file>` | Get short name for a config |
| `disk-space.sh` | | Check disk space |
| `limit-bandwidth.sh` | | Limit bandwidth |
| `maintenance.sh` | | Maintenance helper |
| `random-port.sh` | | Generate random port |
| `reference-rpc-endpoint.sh` | | Reference RPC endpoint helper |
| `reset-terminal.sh` | | Reset terminal |
| `setup-bandwidth-limit-cron.sh` | | Setup cron for bandwidth limiting |

---

## 3. Container Lifecycle Issues

### Symptom: Container Won't Start

```bash
# Check why it failed
./logs.sh <config-name> 2>&1 | tail -50

# Check container state (the exit code appears in the Status column)
docker ps -a --filter "name=<config-name>" --format "{{.Names}} | {{.State}} | {{.Status}}"

# Inspect the container
docker inspect <container-name> | jq '.[0].State'
```

**Common causes:**
- **Port conflict**: Two services trying to bind to same host port
- **Volume permission issues**: Docker can't write to volume
- **Missing environment variables**: `.env` file incomplete
- **Invalid compose syntax**: YAML parsing error
- **Image pull failure**: Network issue or private registry auth

**Fixes:**
```bash
# Check for host ports mapped more than once (heuristic: matches list entries such as - "8545:8545")
grep -hoE '^[[:space:]]*-[[:space:]]*"?[0-9]{1,5}:[0-9]{1,5}' *.yml | grep -oE '[0-9]{1,5}:' | sort | uniq -d

# Validate compose syntax
docker compose -f <config-file>.yml config

# Pull images manually
docker compose -f <config-file>.yml pull

# Start with --build if using custom Dockerfiles
docker compose -f <config-file>.yml up -d --build
```
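
The port-conflict check above can also be written as a filter that reads compose YAML on stdin, which makes it easy to try on a sample before pointing it at real files. This is a rough sketch under stated assumptions: `duplicate_host_ports` is a hypothetical name, it only understands plain `HOST:CONTAINER[/proto]` list entries, and it ignores mappings that use environment variables or an interface prefix such as `127.0.0.1:8545:8545`.

```bash
# Prints host ports (with /udp kept distinct from TCP) that appear more than once.
duplicate_host_ports() {
  sed -nE 's#^[[:space:]]*-[[:space:]]*"?([0-9]{1,5}):[0-9]{1,5}(/(tcp|udp))?"?[[:space:]]*$#\1\2#p' \
    | sed 's#/tcp$##' \
    | sort | uniq -d
}

# Example (run from the repo root):
# cat *.yml | duplicate_host_ports
```
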

### Symptom: Container Exits Immediately After Starting

```bash
# View the last 100 lines of logs before exit
./logs.sh <config-name> 2>&1 | tail -100

# Check exit code
docker ps -a --filter "name=<service>" --format "{{.Status}}"

# Run interactively to see error
docker compose -f <config-file>.yml run --rm <service-name> sh
```

**Common causes:**
- **Missing config files**: `/config/` mount empty or wrong path
- **Invalid flags**: Command-line arguments malformed
- **Database corruption**: Existing data incompatible with new version
- **Checkpoint/genesis mismatch**: Chain ID or genesis doesn't match

**Fixes:**
```bash
# Verify config directory exists (if using custom configs)
ls -la <network>/<chain>/

# Try with fresh volumes (DESTRUCTIVE)
./delete-volumes.sh <config-name>
./start.sh <config-name>
```

### Symptom: Container Restarts Repeatedly (Crash Loop)

```bash
# Watch logs in real-time
./logs.sh <config-name> -f

# Check restart count
docker inspect <container-name> | jq '.[0].RestartCount'

# Check last restart reason (OOMKilled is true when the container was killed for exceeding memory)
docker inspect <container-name> | jq '.[0].State | {ExitCode, OOMKilled, Error}'
```

**Common causes:**
- **OOM killed**: Memory limit exceeded
- **Out of disk space**: No space left on device
- **Segmentation fault**: Client bug or bad data
- **Panic**: Go client panic

**Fixes:**
```bash
# Check memory usage
./show-ram.sh <config-name>

# Check disk space
df -h /var/lib/docker
./show-db-size.sh

# Increase resources in compose file or .env
# Then force recreate
./force-recreate.sh <config-name>
```
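
The crash-loop causes listed in this section usually show up in the container's exit code. The sketch below maps a few common codes to a likely cause. `explain_exit_code` is a hypothetical helper, and the mapping is only a first guess: codes above 128 conventionally mean "killed by signal (code minus 128)", and Go programs exit with 2 on an unrecovered panic, but a client is free to use these codes for other purposes.

```bash
# Usage: explain_exit_code <exit-code>
explain_exit_code() {
  code=$1
  case "$code" in
    0)   echo "clean exit" ;;
    2)   echo "possible Go panic" ;;
    137) echo "SIGKILL, often the OOM killer" ;;
    139) echo "SIGSEGV, segmentation fault" ;;
    143) echo "SIGTERM, stopped on request" ;;
    *)
      if [ "$code" -gt 128 ]; then
        echo "killed by signal $((code - 128))"
      else
        echo "application error $code"
      fi
      ;;
  esac
}

# Example, feeding in the code reported by docker inspect:
# explain_exit_code "$(docker inspect <container-name> | jq '.[0].State.ExitCode')"
```
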

---

## 4. Sync Issues

### Symptom: Node Not Syncing (Stuck at Block 0 or Low Block)

```bash
# Check sync status
./sync-status.sh <config-name>

# Check current block
./latest.sh <config-name>

# Check logs for sync errors
./logs.sh <config-name> | grep -i -E "sync|error|fail|warn|stuck|behind"

# Check peer count
./peer-count.sh | grep <config-name>
```

**Common causes:**
- **No peers**: P2P network connection failed
- **Wrong network**: Connected to wrong chain
- **Checkpoint too old**: Checkpoint URL unavailable or outdated
- **Snapshot download failed**: Snapshot server unreachable

**Fixes:**
```bash
# Check if checkpoint/snapshot is configured
grep -E "(checkpoint|snapshot)" <config-file>.yml

# Test the first checkpoint URL manually
curl -I "$(grep checkpoint <config-file>.yml | grep -oE 'https?://[^" ]+' | head -n 1)"

# Check peer connections (geth example; requires the admin API on this endpoint)
docker exec <client-container> geth attach --exec 'admin.peers.length' http://localhost:8545
```

### Symptom: Sync is Very Slow

```bash
# Check sync speed over time
./latest.sh <config-name>; sleep 60; ./latest.sh <config-name>

# Check if node is processing blocks
./time-since-last-block.sh <config-name>

# Check CPU and memory
top -d 1 -p $(docker inspect <container> | jq -r '.[0].State.Pid')
```
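
The two-sample measurement above (block height, wait 60 seconds, block height again) can be turned into a rough catch-up estimate. This is a sketch with a hypothetical helper, `sync_eta`; you pass the heights in by hand because the exact output format of `latest.sh` is not assumed here. The estimate ignores that the chain head keeps advancing, so treat it as a lower bound.

```bash
# Usage: sync_eta <height_before> <height_after> <interval_seconds> <chain_head>
sync_eta() {
  before=$1
  after=$2
  interval=$3
  head=$4
  gained=$((after - before))
  if [ "$gained" -le 0 ]; then
    echo "no progress in ${interval}s"
    return 1
  fi
  remaining=$((head - after))
  # Integer arithmetic: seconds = remaining blocks * interval / blocks gained
  eta=$((remaining * interval / gained))
  echo "${gained} blocks in ${interval}s, about ${eta}s to reach head"
}

# Example: 600 blocks gained in 60s, 17400 blocks still missing.
# sync_eta 1000 1600 60 19000
```
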

**Common causes:**
- **Resource constrained**: CPU throttled, memory swapped
- **Disk I/O bottleneck**: Slow storage or contention
- **Network rate limited**: P2P or RPC rate limiting
- **Too many peers**: P2P overhead
- **Wrong sync mode**: Full sync instead of snap sync

### Symptom: Sync Stuck at Specific Block

```bash
# Check logs around the stuck block
./logs.sh <config-name> | grep -A 10 -B 10 "block <stuck-block-number>"

# Check if it's a known bad block
# Search online: <chain> bad block <number>
```

**Common causes:**
- **Bad block in chain**: Requires client patch or manual intervention
- **State trie inconsistency**: Database corruption
- **Fork choice issue**: Node on wrong fork

**Fixes for OP Stack:**
```bash
# Try to finalize past the block
./op-wheel-finalize-latest-block.sh <client-service>
```

### Symptom: Node on Wrong Fork / Chain

```bash
# Check chain ID
./latest.sh <config-name> | grep -i chain

# Check what chain the node thinks it's on (eth_chainId returns the ID as hex)
curl -s -X POST http://localhost:<port> \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","method":"eth_chainId","params":[],"id":1}'

# Compare with expected chain ID
grep chainId <config-file>.yml
```

---

## 5. RPC/Connectivity Issues

### Symptom: RPC Endpoint Not Responding

```bash
# Test from host
curl -s http://localhost:<port> | head -c 100

# Check if traefik/proxy is running
docker ps | grep -E "(traefik|proxy|nginx)"

# Check traefik logs (--tail also covers lines written to stderr)
docker logs --tail 50 <traefik-container>
```

**Common causes:**
- **Container not running**: Client crashed
- **Port not exposed**: Wrong port mapping
- **Traefik misconfiguration**: Labels wrong or missing
- **Firewall blocking**: Host firewall or cloud security group

### Symptom: RPC Returns Wrong Chain ID

```bash
# Query chain ID from RPC
curl -s -X POST http://localhost:<port> \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","method":"eth_chainId","params":[],"id":1}'
```
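
`eth_chainId` returns the chain ID as a hex string (for example `"0x1"` for Ethereum mainnet), so comparing it with a decimal ID needs a conversion. A minimal sketch, assuming the response is a single-line JSON object; `chain_id_from_response` is a hypothetical helper that uses `sed` rather than `jq` so it has no extra dependencies.

```bash
# Usage: chain_id_from_response '<json-rpc response>'
# Prints the decimal chain ID, or returns 1 if no hex result is found.
chain_id_from_response() {
  hex=$(printf '%s' "$1" | sed -n 's/.*"result"[[:space:]]*:[[:space:]]*"\(0x[0-9a-fA-F]*\)".*/\1/p')
  [ -n "$hex" ] || return 1
  # printf accepts C-style hex constants and prints them in decimal.
  printf '%d\n' "$hex"
}

# Example:
# chain_id_from_response "$(curl -s -X POST http://localhost:<port> \
#   -H 'Content-Type: application/json' \
#   -d '{"jsonrpc":"2.0","method":"eth_chainId","params":[],"id":1}')"
```
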

### Symptom: Cannot Connect to P2P Network

```bash
# Check peer count
./peer-count.sh | grep <config-name>

# Test P2P connectivity from container
docker exec <client-container> nc -zv <bootstrap-node> <p2p-port>
```

**Fixes:**
```bash
# Set public IP in .env
IP=$(curl -s ipinfo.io/ip)
echo "IP=$IP" >> .env
./force-recreate.sh <config-name>
```

---

## 6. Log Interpretation

### Common Log Patterns

#### Warnings (Node may still function)
| Pattern | Meaning | Action |
|---|---|---|
| `WARN.*sync.*slow` | Sync slower than expected | Check resources |
| `WARN.*peers.*low` | Fewer peers than desired | Check P2P connectivity |
| `WARN.*rate.*limit` | API rate limiting active | Normal for public endpoints |

#### Errors (Node is degraded)
| Pattern | Meaning | Action |
|---|---|---|
| `Error.*database.*corrupt` | Database corruption | Restore from backup or resync |
| `Error.*handshake.*fail` | P2P handshake failed | Check chain ID |
| `Error.*no.*peers` | Cannot connect to P2P | Check bootstrap nodes |
| `Error.*timeout` | RPC/HTTP timeout | Check network, increase timeout |

#### Fatal (Node will not function)
| Pattern | Meaning | Action |
|---|---|---|
| `Fatal.*panic` | Client crashed | Check client version |
| `Fatal.*OOM` | Out of memory | Increase memory limit |
| `Fatal.*disk.*full` | No disk space | Free space |
| `Fatal.*permission.*denied` | Filesystem permissions | Fix volume permissions |
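
The pattern tables above can be applied mechanically with `grep -E`. The sketch below checks one log line against the three groups, most severe first. `classify_log_line` is a hypothetical helper and the sample lines in the comments are made up; real clients differ in casing and wording (for example `ERROR` versus `Error`), so expect to adjust the patterns per client.

```bash
# Usage: classify_log_line '<one log line>'
# Prints fatal, error, warning, or unclassified.
classify_log_line() {
  line=$1
  if printf '%s\n' "$line" | grep -Eq 'Fatal.*(panic|OOM|disk.*full|permission.*denied)'; then
    echo "fatal"
  elif printf '%s\n' "$line" | grep -Eq 'Error.*(database.*corrupt|handshake.*fail|no.*peers|timeout)'; then
    echo "error"
  elif printf '%s\n' "$line" | grep -Eq 'WARN.*(sync.*slow|peers.*low|rate.*limit)'; then
    echo "warning"
  else
    echo "unclassified"
  fi
}

# Example with made-up lines:
# classify_log_line 'Error: database is corrupted'
# classify_log_line 'WARN peers are low (2/50)'
```
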

---

## 7. Resource Issues

### High CPU Usage
```bash
./show-cpu.sh
docker stats <container-name> --no-stream
```

### High Memory Usage
```bash
./show-ram.sh <config-name>
docker stats <container-name> --no-stream --format "{{.Container}} | {{.MemUsage}} | {{.MemPerc}}"
```

### High Disk Usage
```bash
./show-db-size.sh
docker system df -v
```
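
For scripted checks it helps to reduce `df` output to a single number. The sketch below reads POSIX-format output (`df -P`) on stdin and compares the usage percentage with a threshold. Both function names are hypothetical, and the default threshold of 90 percent is an arbitrary assumption, not a value taken from this repo.

```bash
# Reads `df -P <path>` output on stdin and prints the use percentage as a bare number.
disk_use_percent() {
  awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Usage: disk_is_tight <use-percent> [threshold]
# Succeeds when usage is at or above the threshold (default 90, an assumption).
disk_is_tight() {
  [ "$1" -ge "${2:-90}" ]
}

# Example:
# pct=$(df -P /var/lib/docker | disk_use_percent)
# disk_is_tight "$pct" && echo "disk usage at ${pct}%, free space before resyncing"
```
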

### Disk I/O Bottleneck
```bash
iotop -o -d 1
```

---

## 8. Backup and Restore

### Creating a Backup
```bash
# Local backup (to /backup directory)
./backup-node.sh <config-name>

# Remote backup (to WebDAV)
./backup-node.sh <config-name> https://backup-server.tld/dav
```

### Restoring from Backup
```bash
# List available backups
./list-backups.sh

# Restore latest backup for config
./restore-volumes.sh <config-name>

# Restore from specific URL
./restore-volumes.sh <config-name> https://backup-server.tld/backup/
```

### Cloning a Node

```bash
# Clone a node to a new location
./clone-node.sh <config-name>

# Clone peers (for faster sync)
./clone-peers.sh <config-name>
```

### Nuclear Option: Full Reset

```bash
# WARNING: This deletes ALL data for the config
./stop.sh <config-name> && \
./rm.sh <config-name> && \
./delete-volumes.sh <config-name> && \
./delete-node-keys.sh <config-name> && \
./force-recreate.sh <config-name>

# Then check logs
./logs.sh <config-name>
```

---

## 9. Common Error Messages

### Database Errors
| Error | Cause | Solution |
|---|---|---|
| `database is corrupted` | Power loss, bug | Restore from backup or resync |
| `database version mismatch` | Client version changed | Delete and resync |

### P2P Errors
| Error | Cause | Solution |
|---|---|---|
| `no configured peers` | Missing bootstrap nodes | Add bootstrap nodes |
| `handshake failed` | Chain ID mismatch | Verify genesis.json |

### RPC Errors
| Error | Cause | Solution |
|---|---|---|
| `method not found` | Wrong client | Use correct client |
| `connection refused` | Port not open | Check container running, port mapping |

---

## 10. OP Stack Specific Debugging

### OP Node Issues

```bash
# Check op-node logs
./logs.sh <config-name> | grep -i "op-node\|rollup\|sequencer"

# Check rollup configuration (if custom)
jq . op/<network>/<settlement>/rollup.json

# Check if rollup.json is mounted
docker exec <op-node-container> cat /config/rollup.json | jq .
```

### OP Wheel (Manual Intervention)

```bash
# Rewind to specific block (DANGEROUS - only if you know what you're doing)
./op-wheel.sh engine set-forkchoice \
  --unsafe=<block-hash> \
  --safe=<block-hash> \
  --finalized=<block-hash> \
  --engine=http://<client-service>:8551/ \
  --engine.open=http://<client-service>:8545 \
  --engine.jwt-secret-path=/jwtsecret

# Nuclear option: finalize latest local block
./op-wheel-finalize-latest-block.sh <client-service> <node-service>
```

---

## 11. CometBFT Family (Cosmos, etc.) Specific

### Init Container Issues

```bash
# CometBFT chains use init.sh inside the container
# The master script is at scripts/cometbft-common.sh

# Check if init completed
./logs.sh <config-name> | grep -i "init\|setup\|complete"

# Check the init script
cat <network>/<chain>/scripts/init.sh
```

---

## 12. Quick Start Guide

### Starting a Node

```bash
# 1. Set up environment (note: ">" overwrites any existing .env)
IP=$(curl -s ipinfo.io/ip)
echo "IP=$IP" > .env
echo "DOMAIN=${IP//./-}.traefik.me" >> .env
echo "MAIL=your-email@example.com" >> .env

# 2. Select which nodes to run
# Add compose files to COMPOSE_FILE (colon-separated)
echo "COMPOSE_FILE=base.yml:rpc.yml:ethereum-mainnet-geth-pruned.yml" >> .env

# 3. Start the node
docker compose up -d

# 4. Verify it's running
./show-status.sh
```
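
Two details of the setup above are easy to get wrong by hand: the `traefik.me` hostname is the public IP with dots replaced by dashes, and `COMPOSE_FILE` is a colon-separated list that should not contain duplicates. The sketch below shows both steps as pure string functions. The function names are hypothetical, and writing the result back into `.env` is left to you.

```bash
# Usage: domain_from_ip <ipv4-address>
domain_from_ip() {
  printf '%s.traefik.me\n' "$(printf '%s' "$1" | tr . -)"
}

# Usage: append_compose_file <current COMPOSE_FILE value> <file>
# Prints the new value; the file is only added if it is not already listed.
append_compose_file() {
  current=$1
  file=$2
  case ":$current:" in
    *":$file:"*) echo "$current" ;;
    "::")        echo "$file" ;;
    *)           echo "$current:$file" ;;
  esac
}

# Example:
# domain_from_ip 203.0.113.42
# append_compose_file "base.yml:rpc.yml" ethereum-mainnet-geth-pruned.yml
```
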

### Accessing Your Node

```bash
# Once running, access via:
# HTTP: http://<your-domain>/ethereum-mainnet-geth-pruned
# HTTPS: https://<your-domain>/ethereum-mainnet-geth-pruned
# WebSocket: wss://<your-domain>/ethereum-mainnet-geth-pruned

# Or locally (if NO_SSL=true):
# HTTP: http://localhost:<port>
```

---

## 13. Configuration Reference

### Environment Variables

**Required for most setups:**
```bash
IP=203.0.113.42                    # Your public IP
DOMAIN=203-0-113-42.traefik.me     # Your domain (traefik.me for testing)
MAIL=your-email@example.com        # For Let's Encrypt SSL
WHITELIST=0.0.0.0/0                # IP whitelist (0.0.0.0/0 = all)
```

**Optional:**
```bash
NO_SSL=true                        # Disable SSL (testing only)
CHAINS_SUBNET=192.168.0.0/26       # Docker network subnet
```

**Chain-specific (examples):**
```bash
ETHEREUM_MAINNET_EXECUTION_RPC=https://fallback-rpc.example.com
ARBITRUM_SEPOLIA_EXECUTION_RPC=https://arb-sepolia-rpc.example.com
OP_NODE_NETWORK=mainnet
OP_NODE_L1_RPC_URL=https://l1-rpc.example.com
```

### Compose File Structure

Each compose file defines one or more services:
- **client**: Execution layer (Geth, Erigon, Reth, etc.)
- **node**: Consensus/derivation node (op-node, lighthouse, etc.)
- **relay**: DA relay (eigenda-proxy, op-alt, etc.)
- **proxy**: HTTP/WS proxy (nginx, etc.)
- **database**: External database (Postgres, etc.)

### Volume Naming

Volumes are named after the config:
```
<config-name>_<service>_data
<config-name>_<service>_config
```

Example: `ethereum-mainnet-geth-pruned_client_data`
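
The naming pattern above can be expressed as a one-line helper, which is handy when a script needs to find the volumes that belong to a config. A minimal sketch: `volume_name` is a hypothetical function, and it assumes every volume follows the pattern shown here.

```bash
# Usage: volume_name <config-name> <service> <kind>   (kind is "data" or "config")
volume_name() {
  printf '%s_%s_%s\n' "$1" "$2" "$3"
}

# Example: look up the client data volume of a config.
# docker volume ls --filter "name=$(volume_name ethereum-mainnet-geth-pruned client data)"
```
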

---

## 14. Quick Debugging Checklist

Use this checklist when debugging an issue:

- [ ] **Is the container running?** → `./show-running.sh`
- [ ] **Are there errors in logs?** → `./logs.sh <config> | grep -i error`
- [ ] **Is the node synced?** → `./sync-status.sh <config>`
- [ ] **Are peers connected?** → `./peer-count.sh`
- [ ] **Are resources adequate?** → `./show-ram.sh`, `./show-db-size.sh`
- [ ] **Is P2P working?** → Check peer count
- [ ] **Is RPC responding?** → Test with curl
- [ ] **Is disk space available?** → `df -h /var/lib/docker`
- [ ] **Is the config file correct?** → `docker compose -f <file>.yml config`
- [ ] **Are environment variables set?** → Check `.env`
- [ ] **Is the genesis file correct?** → Check chain ID

---

## 15. When to Escalate

Escalate to a human operator if:

- [ ] Node stuck for > 2 hours with no progress
- [ ] Repeated `Fatal` or `panic` errors after restart
- [ ] Database corruption confirmed
- [ ] Issue affects multiple nodes across different chains
- [ ] Need to force-push to this repo

---

## 16. File Locations Quick Reference

| What You Need | Where to Find It |
|---|---|
| Compose files | Root of this repo (`*.yml`) |
| Operational scripts | Root of this repo (`*.sh`) |
| Chain assets | `<network>/<chain>/` or `<stack>/<network>/<settlement>/` |
| Genesis files | `<stack>/<network>/<settlement>/genesis.json` |
| Rollup configs | `op/<network>/<settlement>/rollup.json` |
| Custom Dockerfiles | `<path>/*.Dockerfile` |
| Init scripts | `<path>/scripts/init.sh` |
| CometBFT common | `scripts/cometbft-common.sh` |
| Compose registry | `compose_registry.json` |
| RPC endpoints | `reference-rpc-endpoint.json` |
| Environment | `.env` |

---

## 17. Resource Requirements Reference

| Node Type | Disk | RAM | CPU |
|---|---|---|---|
| Ethereum pruned | ~500GB | 8GB | 2+ cores |
| Ethereum archive | ~2TB+ | 16GB+ | 4+ cores |
| Ethereum archive-trace | ~4TB+ | 32GB+ | 8+ cores |
| L2 pruned | ~100-500GB | 4-8GB | 2+ cores |
| L2 archive | ~1-2TB | 8-16GB | 4+ cores |

**Note:** Requirements vary by chain. Check specific chain documentation.

---

*This file is your complete operations and debugging reference. For additional user documentation, see README.md.*