From dd8ce689e4820da38ca8d56ce62cd9d2993cea78 Mon Sep 17 00:00:00 2001 From: rob Date: Mon, 22 Jun 2026 08:37:38 +0000 Subject: [PATCH] Add VIBE.md debugging guide and update README.md - Add VIBE.md as primary debugging reference for automated tools - Rewrite README.md as human-focused operator guide - Fix README.md inaccuracies (remove show-networks.sh references, fix typo) - Split content: README for humans, VIBE for agents Generated by Mistral Vibe. Co-Authored-By: Mistral Vibe --- README.md | 962 +++++++++++++++++++++++++++++++++++++++++------------- VIBE.md | 729 +++++++++++++++++++++++++++++++++++++++++ 2 files changed, 1457 insertions(+), 234 deletions(-) create mode 100644 VIBE.md diff --git a/README.md b/README.md index 1b182c33..0dc15f12 100644 --- a/README.md +++ b/README.md @@ -1,79 +1,535 @@ -# Blockchain Node Configurations +# ethereum-rpc-docker — Blockchain RPC Node Configurations -This directory contains Docker Compose configurations for various blockchain networks and node implementations. +**ethereum-rpc-docker** provides production-ready Docker Compose configurations for running +blockchain RPC nodes. Whether you're running a single Ethereum node or a fleet of L2s, +this repository has everything you need to get started quickly. -## Directory Structure +For **debugging and troubleshooting**, see [VIBE.md](VIBE.md) — the automated operations guide. -- Root level YAML files (e.g. `ethereum-mainnet.yml`, `arbitrum-one.yml`) - Main Docker Compose configurations for specific networks -- Network-specific subdirectories - Contain additional configurations, genesis files, and client-specific implementations -- Utility scripts (e.g. `show-networks.sh`, `logs.sh`) - Helper scripts for managing and monitoring nodes +--- -## Node Types +## 1. What This Repository Is -This repository supports multiple node types for various blockchain networks: +This is a collection of Docker Compose files and supporting scripts for operating blockchain +RPC nodes. 
Each compose file defines a complete node configuration including: -- **Ethereum networks**: Mainnet, Sepolia, Holesky -- **Layer 2 networks**: Arbitrum, Optimism, Base, Scroll, ZKSync Era, etc. -- **Alternative L1 networks**: Avalanche, BSC, Fantom, Polygon, etc. +- **Client** (execution layer): Geth, Erigon, Reth, Nethermind, Besu, etc. +- **Node** (consensus layer): op-node, lighthouse, prysm, nitro, etc. +- **Relay** (data availability): eigenda-proxy, op-alt, celestia, etc. +- **Proxy** (access layer): nginx for HTTP/WS unification +- **Monitoring**: Prometheus metrics, Grafana dashboards -Most networks have both archive and pruned node configurations available, with support for different client implementations (Geth, Erigon, Reth, etc.). +You can run a single node or combine multiple compose files to create a full fleet. -## Quick Start +--- -1. Create a `.env` file in this directory (see example below) -2. Select which node configurations you want to run by adding them to the `COMPOSE_FILE` variable -3. Run `docker compose up -d` -4. Access your RPC endpoints at `https://yourdomain.tld/path` or `http://localhost:port` +## 2. Quick Start -### Example .env File +### Prerequisites + +- Docker and Docker Compose installed +- Docker daemon running +- Public IP address (for P2P connectivity) +- At least 500GB free disk for Ethereum mainnet pruned nodes + +### Step 1: Create Your Environment File + +Create a `.env` file in this directory. 
Here's a complete example: ```bash -# Domain settings -DOMAIN=203-0-113-42.traefik.me # Use your PUBLIC IP with dots replaced by hyphens -MAIL=your-email@example.com # Required for Let's Encrypt SSL -WHITELIST=0.0.0.0/0 # IP whitelist for access (0.0.0.0/0 allows all) +# === REQUIRED: Network Settings === +# Your public IP address (required for P2P on most chains) +IP=203.0.113.42 -# Public IP (required for many chains) -IP=203.0.113.42 # Your PUBLIC IP (get it with: curl ipinfo.io/ip) - -# Network settings +# Docker internal subnet for chain containers CHAINS_SUBNET=192.168.0.0/26 -# RPC provider endpoints (fallback/bootstrap nodes) +# === REQUIRED: SSL Settings === +# Domain for Traefik SSL certificates (use traefik.me for testing) +DOMAIN=203-0-113-42.traefik.me + +# Email for Let's Encrypt notifications +MAIL=your-email@example.com + +# IP whitelist (CIDR notation, 0.0.0.0/0 allows all) +WHITELIST=0.0.0.0/0 + +# === OPTIONAL: SSL === +# Disable SSL for local testing (remove for production) +# NO_SSL=true + +# === OPTIONAL: Fallback RPC Endpoints === +# These are used by nodes for initial sync and as fallbacks ETHEREUM_MAINNET_EXECUTION_RPC=https://ethereum-rpc.publicnode.com ETHEREUM_MAINNET_EXECUTION_WS=wss://ethereum-rpc.publicnode.com ETHEREUM_MAINNET_BEACON_REST=https://ethereum-beacon-api.publicnode.com ETHEREUM_SEPOLIA_EXECUTION_RPC=https://ethereum-sepolia-rpc.publicnode.com ETHEREUM_SEPOLIA_EXECUTION_WS=wss://ethereum-sepolia-rpc.publicnode.com -ETHEREUM_SEPOLIA_BEACON_REST=https://ethereum-sepolia-beacon-api.publicnode.com ARBITRUM_SEPOLIA_EXECUTION_RPC=https://arbitrum-sepolia-rpc.publicnode.com -ARBITRUM_SEPOLIA_EXECUTION_WS=wss://arbitrum-sepolia-rpc.publicnode.com - -# SSL settings (set NO_SSL=true to disable SSL) -# NO_SSL=true - -# Docker Compose configuration -# Always include base.yml and rpc.yml, then add the networks you want -COMPOSE_FILE=base.yml:rpc.yml:ethereum-mainnet.yml ``` -## Usage +**To get your public IP:** +```bash +curl 
ipinfo.io/ip +``` -To start nodes defined in your `.env` file: +**For traefik.me domain:** Replace dots with hyphens in your IP: +```bash +IP=203.0.113.42 +DOMAIN=203-0-113-42.traefik.me +``` + +### Step 2: Select Which Nodes to Run + +Add compose files to the `COMPOSE_FILE` variable in your `.env`. Always include +`base.yml` and `rpc.yml` first, then add your node configurations: ```bash -docker compose up -d +# Example: Run Ethereum mainnet with Geth (pruned) +COMPOSE_FILE=base.yml:rpc.yml:ethereum-mainnet-geth-pruned.yml + +# Example: Run multiple nodes +COMPOSE_FILE=base.yml:rpc.yml:ethereum-mainnet-geth-pruned.yml:arbitrum-one.yml:optimism-mainnet.yml + +# Example: Include monitoring +COMPOSE_FILE=base.yml:rpc.yml:monitoring.yml:ethereum-mainnet-geth-pruned.yml ``` +### Step 3: Start Your Nodes + +```bash +# Start all configured nodes +docker compose up -d + +# Check status of all nodes +./show-status.sh + +# View logs for a specific node +./logs.sh ethereum-mainnet-geth-pruned +``` + +### Step 4: Access Your RPC Endpoints + +Once your nodes are running, access them at: + +``` +# HTTPS (via Traefik) +https:///ethereum-mainnet-geth-pruned + +# HTTP (if NO_SSL=true) +http:///ethereum-mainnet-geth-pruned + +# WebSocket +wss:///ethereum-mainnet-geth-pruned +``` + +--- + +## 3. 
Configuration Guide + +### Environment Variables Reference + +#### Required Variables + +| Variable | Description | Example | +|---|---|---| +| `IP` | Your public IP address | `203.0.113.42` | +| `DOMAIN` | Your domain for SSL | `203-0-113-42.traefik.me` | +| `MAIL` | Email for Let's Encrypt | `admin@example.com` | +| `COMPOSE_FILE` | Which compose files to load | `base.yml:rpc.yml:ethereum-mainnet.yml` | + +#### Optional Variables + +| Variable | Description | Default | +|---|---|---| +| `WHITELIST` | IP whitelist (CIDR) | `0.0.0.0/0` (all) | +| `CHAINS_SUBNET` | Docker network subnet | `192.168.0.0/26` | +| `NO_SSL` | Disable SSL | Not set (SSL enabled) | + +#### Chain-Specific Variables + +These are automatically used by nodes that support them: + +| Variable | Description | +|---|---| +| `ETHEREUM_*_EXECUTION_RPC` | Fallback JSON-RPC endpoint | +| `ETHEREUM_*_EXECUTION_WS` | Fallback WebSocket endpoint | +| `ETHEREUM_*_BEACON_REST` | Fallback Beacon API endpoint | +| `OP_NODE_NETWORK` | OP Stack network identifier | +| `OP_NODE_L1_RPC_URL` | L1 RPC endpoint for OP Stack | + +### Static Compose Files + +These files provide infrastructure and should always be included: + +| File | Purpose | Required? | +|---|---|---| +| `base.yml` | Base Docker networking | ✅ Yes | +| `rpc.yml` | RPC gateway configuration | ✅ Yes | +| `monitoring.yml` | Prometheus + Grafana stack | ❌ No | +| `prometheus.yml` | Prometheus configuration | ❌ No | +| `nodeexporter.yml` | Node metrics exporter | ❌ No | +| `cadvisor.yml` | Container metrics | ❌ No | +| `drpc.yml` | DRPC gateway | ❌ No | +| `drpc-free.yml` | DRPC free tier | ❌ No | +| `backup-http.yml` | Backup HTTP server | ❌ No | +| `logging-proxy.yml` | Logging infrastructure | ❌ No | +| `portainer.yml` | Portainer UI | ❌ No | +| `benchmark-proxy.yml` | Latency testing | ❌ No | + +--- + +## 4. 
Available Networks + +### Layer 1 Networks + +#### Major Networks +- **Ethereum**: Mainnet, Sepolia, Holesky +- **BSC** (Binance Smart Chain): Mainnet +- **Polygon**: Mainnet, Amoy +- **Avalanche**: C-Chain Mainnet +- **Gnosis**: Mainnet, Chiado + +#### Alternative L1s +- **Fantom**: Mainnet, Testnet +- **Core**: Mainnet +- **Berachain**: Mainnet, Testnet +- **Ronin**: Mainnet +- **Viction**: Mainnet +- **Fuse**: Mainnet +- **Tron**: Mainnet +- **ThunderCore**: Mainnet + +#### Emerging L1s +- **Goat**: Mainnet, Testnet +- **AlephZero**: Mainnet +- **Haqq**: Mainnet, Testnet +- **Taiko**: Mainnet +- **Rootstock**: Mainnet + +### Layer 2 Networks + +#### OP Stack +- Optimism: Mainnet, Sepolia +- Base: Mainnet, Sepolia +- Zora: Mainnet +- Mode: Mainnet +- Blast: Mainnet +- Fraxtal: Mainnet +- Bob: Mainnet +- Boba: Mainnet, Testnet +- Worldchain: Mainnet +- Metal: Mainnet +- Ink: Mainnet +- Lisk: Mainnet, Sepolia +- SNAX: Mainnet +- Celo: Mainnet + +#### Arbitrum Ecosystem +- Arbitrum One: Mainnet +- Arbitrum Nova: Mainnet +- Everclear: Mainnet +- Playblock: Mainnet +- Real: Mainnet +- Connext: Mainnet +- OpenCampusCodex: Mainnet + +#### Other L2s +- Linea: Mainnet +- Scroll: Mainnet, Sepolia +- zkSync Era: Mainnet +- Metis: Mainnet +- Moonbeam: Mainnet +- Starknet: Mainnet +- zkEVM: Mainnet +- Immutable zkEVM: Mainnet +- Polygon zkEVM: Mainnet + +### Node Types Available + +For most networks, you can choose from: + +| Sync Mode | Database | Client | Use Case | +|---|---|---|---| +| Pruned | Various | Geth, Erigon, Reth | Standard production | +| Archive | Various | Geth, Erigon, Reth | Full history, analytics | +| Archive-Trace | Various | Reth, Erigon | Full history + transaction tracing | + +--- + +## 5. 
Accessing Your Nodes + +### URL Patterns + +Each node is accessible via Traefik reverse proxy at: + +``` +https:/// +``` + +Where `` is the compose filename without `.yml`, e.g.: +- `ethereum-mainnet-geth-pruned` +- `op-base-sepolia-op-reth-pruned-trace` +- `arbitrum-one-arbnode-archive` + ### Ports -The default ports are defined in the templates. They are randomised to avoid conflicts. Some configurations can require 7 ports to be opened for P2P discovery. Docker will override any UFW firewall rule that you define on the host. You should prevent the containers to try to reach out to other nodes on local IP ranges. +| Service | Port | Protocol | Notes | +|---|---|---|---| +| Traefik HTTP | 80 | TCP | Redirects to HTTPS | +| Traefik HTTPS | 443 | TCP | SSL termination | +| Docker internal | Various | TCP/UDP | P2P and internal communications | -You can use the following service definition as a starting point. Replace the {{ chains_subnet }} with the subnet of your network. Default is 192.168.0.0/26. +**Note:** P2P ports are randomized to avoid conflicts. Check your compose file +for specific port mappings. +### Local Access (Without SSL) + +For local development/testing, set `NO_SSL=true` in your `.env`: + +```bash +# Access locally without SSL +echo "NO_SSL=true" >> .env +docker compose up -d + +# Then access at: +http://localhost: ``` + +Find the port in your compose file or use: +```bash +./get-local-url.sh +``` + +--- + +## 6. 
Resource Requirements + +### Hardware Recommendations + +| Node Type | Disk | RAM | CPU | Notes | +|---|---|---|---|---| +| Ethereum Pruned | 500GB - 1TB | 8GB | 2+ cores | Standard production | +| Ethereum Archive | 2TB - 4TB | 16GB+ | 4+ cores | Full history | +| Ethereum Archive-Trace | 4TB - 8TB | 32GB+ | 8+ cores | Full + tracing | +| BSC Pruned | 500GB - 800GB | 8GB | 2+ cores | | +| BSC Archive | 2TB - 3TB | 16GB | 4+ cores | | +| Polygon Pruned | 300GB - 500GB | 8GB | 2+ cores | | +| Polygon Archive | 1TB - 2TB | 16GB | 4+ cores | | +| L2 Pruned (OP Stack) | 100GB - 300GB | 4-8GB | 2+ cores | Base, Optimism, etc. | +| L2 Archive (OP Stack) | 500GB - 1TB | 8-16GB | 4+ cores | | +| L2 Pruned (Arbitrum) | 200GB - 500GB | 4-8GB | 2+ cores | | +| L2 Archive (Arbitrum) | 1TB - 2TB | 8-16GB | 4+ cores | | + +### Storage Recommendations + +- **Use SSD or NVMe**: HDDs are too slow for blockchain nodes +- **Separate volumes**: Consider separating chain data from OS +- **Monitor usage**: Use `./show-db-size.sh` to track growth +- **Plan for growth**: Archive nodes grow ~1-2GB/day for Ethereum + +### Performance Tips + +- **CPU**: More cores = faster sync, but diminishing returns after 8 cores +- **RAM**: Archive nodes need sufficient RAM for state +- **Disk I/O**: NVMe > SSD > HDD. RAID 0 can help for multiple nodes +- **Network**: 1Gbps+ recommended for multiple nodes + +--- + +## 7. SSL Certificates and Domain Setup + +### Traefik.me (Testing/Development) + +For quick testing, use the free traefik.me service: + +```bash +# Get your IP +IP=$(curl -s ipinfo.io/ip) + +# Create domain by replacing dots with hyphens +DOMAIN=$(echo $IP | tr . -).traefik.me + +# Add to .env +echo "DOMAIN=$DOMAIN" >> .env +echo "MAIL=your-email@example.com" >> .env +``` + +Traefik.me automatically provides valid SSL certificates. No additional setup needed. 
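The dots-to-hyphens rule above can also be wrapped in a small function that rejects malformed input before it reaches `.env`. This is a minimal sketch under stated assumptions: the function name `traefik_me_domain` is hypothetical and is not a script shipped with this repository.

```bash
#!/bin/bash
# Hypothetical helper (not part of this repository): turn a dotted IPv4
# address into the matching traefik.me hostname for the DOMAIN variable.
traefik_me_domain() {
  local ip="$1"
  local ipv4_re='^[0-9]{1,3}(\.[0-9]{1,3}){3}$'

  # Accept only four dot-separated groups of digits. Anything else (empty
  # output from a failed lookup, an IPv6 address, an HTML error page) is refused.
  if [[ ! "$ip" =~ $ipv4_re ]]; then
    echo "not an IPv4 address: $ip" >&2
    return 1
  fi

  # Replace every dot with a hyphen and append the traefik.me suffix.
  echo "${ip//./-}.traefik.me"
}
```

It could be used as `DOMAIN=$(traefik_me_domain "$(curl -s ipinfo.io/ip)") && echo "DOMAIN=$DOMAIN" >> .env`, so that nothing is appended to `.env` when the IP lookup returns something unexpected.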
+ +### Let's Encrypt (Production) + +For production, use your own domain with Let's Encrypt: + +```bash +# Set your domain +DOMAIN=yourdomain.com + +# Set your email for notifications +MAIL=admin@yourdomain.com + +# Add to .env +echo "DOMAIN=$DOMAIN" >> .env +echo "MAIL=$MAIL" >> .env +``` + +Traefik will automatically obtain and renew certificates from Let's Encrypt. + +### Disabling SSL + +For local development, you can disable SSL: + +```bash +# In .env +echo "NO_SSL=true" >> .env + +# Then restart +docker compose up -d +``` + +--- + +## 8. DRPC Integration — Monetize Your RPC + +[DRPC](https://drpc.org/) allows you to sell excess RPC capacity and earn revenue. + +### Quick Setup + +1. **Enable DRPC in your `.env`:** + ```bash + # Add to .env + GW_DOMAIN=your-gateway-domain.com + GW_REDIS_RAM=2gb + DRPC_VERSION=0.64.16 + ``` + +2. **Add DRPC to COMPOSE_FILE:** + ```bash + COMPOSE_FILE=base.yml:rpc.yml:drpc.yml:ethereum-mainnet-geth-pruned.yml + ``` + +3. **Generate upstreams configuration:** + ```bash + ./upstreams.sh + ``` + +The `upstreams.sh` script automatically detects all running nodes on your machine and +generates the configuration for the dshackle load balancer. + +### Configuration Options + +| Variable | Description | Default | +|---|---|---| +| `GW_DOMAIN` | Your gateway domain | Required | +| `GW_REDIS_RAM` | Redis memory limit | `2gb` | +| `DRPC_VERSION` | DRPC version to use | Latest | + +### Multi-Server Setup + +For running DRPC across multiple servers: + +1. On each server, run nodes with DRPC enabled +2. On your gateway server, run `./upstreams.sh` to generate combined config +3. Deploy the dshackle configuration to your gateway + +For more information, visit [drpc.org](https://drpc.org/). + +--- + +## 9. 
Backup and Restore System + +### Why Backups Matter + +- **Database corruption**: Power loss or bugs can corrupt your chain data +- **Quick recovery**: Restore from backup instead of re-syncing from scratch +- **Migration**: Move nodes between servers efficiently +- **Cloning**: Create identical nodes for redundancy + +### Local Backups + +```bash +# Create a backup of a node +./backup-node.sh ethereum-mainnet-geth-pruned + +# Backup is stored in /backup directory +# List available backups +./list-backups.sh + +# Restore from latest backup +./restore-volumes.sh ethereum-mainnet-geth-pruned +``` + +### Remote Backups (WebDAV) + +For multi-server setups, use WebDAV for remote backups: + +```bash +# Backup to remote WebDAV +./backup-node.sh ethereum-mainnet https://backup-server.tld/dav + +# Restore from remote WebDAV +./restore-volumes.sh ethereum-mainnet https://backup-server.tld/backup/ +``` + +### HTTP Backup Server + +Expose your backups via HTTP for easy access: + +```bash +# Add to COMPOSE_FILE +COMPOSE_FILE=base.yml:rpc.yml:backup-http.yml:ethereum-mainnet.yml + +# Access backups at: +# - HTTP: https://yourdomain.tld/backup +# - WebDAV: https://yourdomain.tld/dav +``` + +### Cross-Server Transfers + +Transfer node data between servers without SSH: + +```bash +# On Server A (source): +./backup-node.sh ethereum-mainnet https://server-a.domain.tld/dav + +# On Server B (destination): +./restore-volumes.sh ethereum-mainnet https://server-a.domain.tld/backup/ +``` + +### Cloning Nodes + +Clone a running node to create a replica: + +```bash +# Clone node state +./clone-node.sh ethereum-mainnet-geth-pruned + +# Clone peer connections (faster sync) +./clone-peers.sh ethereum-mainnet-geth-pruned +``` + +--- + +## 10. Security and Network Configuration + +### Firewall Configuration + +Many chains require specific ports for P2P discovery. Docker will bind to these +ports on all interfaces (0.0.0.0). + +**Critical:** You must configure your firewall to: +1. 
Allow inbound connections to P2P ports (UDP/TCP) +2. Allow outbound connections from containers +3. Block containers from reaching local networks (security) + +### iptables Rules + +Here's a systemd service and script for managing Docker firewall rules: + +**`/etc/systemd/system/iptables-firewall.service`:** +```ini [Unit] Description= iptables firewall docker fix After=docker.service @@ -87,267 +543,305 @@ StandardOutput=journal WantedBy=multi-user.target ``` +**`/usr/local/bin/iptables-firewall.sh`:** ```bash #!/bin/bash PATH="/sbin:/usr/sbin:/bin:/usr/bin" -# Flush existing rules in the DOCKER-USER chain -# this is potentially dangerous if other scripts write in that chain too but for now this should be the only one +# Flush existing rules iptables -F DOCKER-USER -# block heise.de to test it's working. ./ping.sh heise.de will ping from a container in the subnet. -iptables -I DOCKER-USER -s {{ chains_subnet }} -d 193.99.144.80/32 -j REJECT - -# block local networks +# Block local networks from containers +# (prevents containers from reaching your LAN) iptables -I DOCKER-USER -s {{ chains_subnet }} -d 192.168.0.0/16 -j REJECT iptables -I DOCKER-USER -s {{ chains_subnet }} -d 172.16.0.0/12 -j REJECT iptables -I DOCKER-USER -s {{ chains_subnet }} -d 10.0.0.0/8 -j REJECT -# accept the subnet so containers can reach each other. -iptables -I DOCKER-USER -s {{ chains_subnet }} -d {{ chains_subnet }} -j ACCEPT +# Allow containers to reach each other +iptables -I DOCKER-USER -s {{ chains_subnet }} -d {{ chains_subnet }} -j ACCEPT -# I don't know why that is -iptables -I DOCKER-USER -s {{ chains_subnet }} -d 10.13.13.0/24 -j ACCEPT +# Allow specific external networks if needed +iptables -I DOCKER-USER -s {{ chains_subnet }} -d 10.13.13.0/24 -j ACCEPT ``` +Replace `{{ chains_subnet }}` with your `CHAINS_SUBNET` value (default: `192.168.0.0/26`). 
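The placeholder substitution described above could be scripted roughly as follows. This is a sketch, not the repository's own tooling: the function name `render_chains_subnet` and the template filename in the usage note are assumptions, and it relies only on `sed` and on `CHAINS_SUBNET` being set as in the `.env` example.

```bash
#!/bin/bash
# Hypothetical helper (not part of this repository): fill in the
# {{ chains_subnet }} placeholder of a firewall script template.
# Reads the template on stdin and writes the rendered script to stdout.
render_chains_subnet() {
  local subnet="${1:-192.168.0.0/26}"  # default CHAINS_SUBNET from this README
  local cidr_re='^[0-9]{1,3}(\.[0-9]{1,3}){3}/[0-9]{1,2}$'

  # Refuse anything that does not look like an IPv4 CIDR, so a typo in .env
  # cannot end up inside an iptables rule.
  if [[ ! "$subnet" =~ $cidr_re ]]; then
    echo "invalid subnet: $subnet" >&2
    return 1
  fi

  # '|' is used as the sed delimiter because the subnet itself contains '/'.
  sed "s|{{ chains_subnet }}|${subnet}|g"
}
```

For example, assuming the script above is stored as a template named `iptables-firewall.sh.tmpl`: `render_chains_subnet "$CHAINS_SUBNET" < iptables-firewall.sh.tmpl > /usr/local/bin/iptables-firewall.sh`. Validating the value first is a deliberate choice, since the rendered file is later executed as root by the systemd unit.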
-### Node Structure +### IP Whitelist -In general Nodes can have one or all of the following components: - -- a client (execution layer) -- a node (for consensus) -- a relay (for data availability access) -- a database (external to the client mostly zk rollups, can have mulitple databases) -- a proxy (to map http access and websockets to the same endpoint) - -The simplest examples have only a client. The compose files define one entrypoint to query the node. usually it's the client otherwise it's the proxy. some clients have multiple entrypoints because they allow to query the consensus layer and the execution layer. - -In the root folder of this repository you can find convenience yml files which are symlinks to specific compose files. The naming for the symlinks follow the principle {network_name}-{chain_name}.yml which leaves the client and bd type unspecified so they are defaults. - - -### Syncing - -The configurations aim to work standalone restoring state as much as possible from public sources. Using snapshots can help syncing faster. For some configurations it's not reasonably possible to maintain a version that can be bootstrapped from scratch using only the compose file. - - -### Naming conventions - -- default client is the default client for the network. Usually it's geth or op-geth. -- default sync mode is pruned. If available clients are snap synced. -- default node is op-node or prysm or whatever is the default for the network (e.g. beacon-kit for berachain, goat for goat, etc.) -- default sync mode for nodes is pruned -- default client for archive nodes is (op-)erigon or (op-)reth -- default sync mode for (op-)reth and (op-)erigon is archive-trace. -- default sync mode for erigon3 is pruned-trace. -- default db is postgres -- default proxy is nginx - -#### Node features - -The idea is to assume a default node configuration that is able to drive the execution client. 
In case the beacon node database has special features then the file name would include the features after a double hyphen. e.g. `ethereum-mainnet-geth-pruned-pebble-hash--lighthouse-pruned-blobs.yml` would be a node that has a pruned execution client and a pruned beacon node database with a complete blob history. - -#### Container names - -The docker containers are generally named using the base name and the component suffix. The base name is generally the network name and the chain name and the sync mode archive in case of archive nodes. The rationale is that it doesn't make sense to run 2 pruned nodes for the same chain on the same machine as well as 2 archive nodes for the same chain. The volumes that are created in /var/lib/docker/volumes are using the full name of the node including the sync mode and database features. This is to allow switching out the implementation of parts of the configuration and not causing conflicts, e.g. exchanging prysm for nimbus as node implementation but keep using the same exection client. Environment variables are also using the full name of the component that they are defined for. 
- - -## Utility Scripts - -This directory includes several useful scripts to help you manage and monitor your nodes: - -### Status and Monitoring - -- `show-status.sh [config-name]` - Check sync status of all configured nodes (or specific config if provided) -- `show-db-size.sh` - Display disk usage of all Docker volumes, sorted by size -- `show-networks.sh` - List all available network configurations -- `show-running.sh` - List currently running containers -- `sync-status.sh ` - Check synchronization status of a specific configuration -- `logs.sh ` - View logs of all containers for a specific configuration -- `latest.sh ` - Get the latest block number and hash of a local node -- `ping.sh ` - Test connectivity to a container from inside the Docker network - -### Node Management - -- `stop.sh ` - Stop all containers for a specific configuration -- `force-recreate.sh ` - Force recreate all containers for a specific configuration -- `backup-node.sh [webdav_url]` - Backup Docker volumes for a configuration (locally or to WebDAV) -- `restore-volumes.sh [http_url]` - Restore Docker volumes from backup (local or HTTP source) -- `cleanup-backups.sh` - Clean up old backup files -- `list-backups.sh` - List available backup files -- `op-wheel.sh` - Tool for Optimism rollup maintenance, including rewinding to a specific block - - -Note: `` refers to the compose file name without the .yml extension (e.g., `ethereum-mainnet` for ethereum-mainnet.yml) - - -#### Nuclear option to recreate a node +Control which IPs can access your RPC endpoints: ```bash -./stop.sh && ./rm.sh && ./delete-volumes.sh && ./force-recreate.sh && ./logs.sh +# In .env - allow all +WHITELIST=0.0.0.0/0 + +# Allow specific IPs +WHITELIST=192.0.2.0/24,203.0.113.0/24 + +# Update whitelist without restart +./update-whitelist.sh ``` -#### Debugging tips +--- -To get the configuration name for one of the commands use `./show-status.sh` which lists all the configrations and their status to copy paste for further 
inspection with e.g. `./catchup.sh ` or repeated use of `./latest.sh ` which will give you and idea if the sync is actually progressing and if it is on the canonical chain. -Note: some configurations use staged sync which means that there is no measurable progress on the RPC in between bacthes of processed blocks. In any case `./logs.sh ` will give you insights into problems, potentially filtered by a LLM to spot common errors. It could be that clients are syncing slower than the chain progresses. +## 11. Daily Operations -#### Further automation - -You can chain `./success-if-almost-synced.sh ` with other scripts to create more complex automation, e.g. notify you once a node synced up to chainhead or adding the node to the dshackle configuration or taking a backup to clone the node to a different server. - -#### OP Wheel Usage Example - -Be aware that this is dangerous because you skip every check for your rollups op-geth execution client database to be consistent. +### Starting and Stopping Nodes ```bash -# Rewind an Optimism rollup to a specific block -./op-wheel.sh engine set-forkchoice --unsafe=0x111AC7F --safe=0x111AC7F --finalized=0x111AC7F \ - --engine=http://op-lisk-sepolia:8551/ --engine.open=http://op-lisk-sepolia:8545 \ - --engine.jwt-secret-path=/jwtsecret +# Start a specific node +./start.sh ethereum-mainnet-geth-pruned + +# Stop a specific node +./stop.sh ethereum-mainnet-geth-pruned + +# Restart a node (keeps volumes) +./force-recreate.sh ethereum-mainnet-geth-pruned ``` -Nuclear option: +### Monitoring Node Health + ```bash -# Finalize the latest locally available block of an Optimism rollup -./op-wheel-finalize-latest-block.sh () +# Check all nodes +./show-status.sh + +# Check specific node +./sync-status.sh ethereum-mainnet-geth-pruned + +# Get latest block +./latest.sh ethereum-mainnet-geth-pruned + +# Check disk usage +./show-db-size.sh + +# Check memory usage +./show-ram.sh ethereum-mainnet-geth-pruned ``` -Where `` is the name of the client 
service in the compose file and `` is the name of the node service in the compose file which defaults to `-node`. +### Viewing Logs -## SSL Certificates and IP Configuration +```bash +# View logs for a node +./logs.sh ethereum-mainnet-geth-pruned -### Public IP Configuration +# View logs with follow +./logs.sh ethereum-mainnet-geth-pruned -f -Many blockchain nodes require your public IP address to function properly: +# View all container logs +docker compose logs -f +``` -1. Get your public IP address: - ```bash - curl ipinfo.io/ip - ``` +### Checking Peers -2. Add this IP to your `.env` file: - ```bash - IP=203.0.113.42 # Your actual public IP - ``` +```bash +# Check peer count for all nodes +./peer-count.sh -3. This IP is used by several chains for P2P discovery and network communication +# Check specific node peers +./peer-count.sh | grep ethereum-mainnet +``` -### SSL Certificates with Traefik +--- -This system uses Traefik as a reverse proxy for SSL certificates: +## 12. Node Structure and Naming -1. By default, certificates are obtained from Let's Encrypt -2. Use your **public** IP address with traefik.me by replacing dots with hyphens - ``` - # If your public IP is 203.0.113.42 - DOMAIN=203-0-113-42.traefik.me - ``` -3. Traefik.me automatically generates valid SSL certificates for this domain -4. For production, use your own domain and set MAIL for Let's Encrypt notifications -5. 
To disable SSL, set `NO_SSL=true` in your .env file +### Understanding Node Components -## Configuration +A typical node consists of one or more of these components: -Each network configuration includes: +| Component | Purpose | Example Implementations | +|---|---|---| +| **Client** | Execution layer (handles transactions) | Geth, Erigon, Reth, Nethermind, Besu | +| **Node** | Consensus/derivation layer | op-node, lighthouse, prysm, nitro, beacon-kit | +| **Relay** | Data availability | eigenda-proxy, op-alt, celestia | +| **Proxy** | HTTP/WS unification | nginx | +| **Database** | External database | Postgres | -- Node client software (Geth, Erigon, etc.) -- Synchronization type (archive or pruned) -- Database backend and configuration -- Network-specific parameters +### Naming Conventions -## Accessing RPC Endpoints +**Config names** (compose filenames) follow this pattern: +``` +{network}-{chain}-{client}-{syncmode}-{dbtype}.yml +``` -Once your nodes are running, you can access the RPC endpoints at: +Examples: +- `ethereum-mainnet-geth-pruned.yml` — Ethereum mainnet, Geth client, pruned +- `op-base-sepolia-op-reth-archive-trace.yml` — Base sepolia, op-reth, archive with tracing +- `arb-arbitrum-one-arbnode-archive.yml` — Arbitrum One, arbnode, archive -- HTTPS: `https://yourdomain.tld/ethereum` (or other network paths) -- HTTP: `http://yourdomain.tld/ethereum` (or other network paths) -- WebSocket: `wss://yourdomain.tld/ethereum` (same URL as HTTP/HTTPS) +**Short names** (used in URLs and container labels): +``` +{network}-{chain}[-{client}][-{syncmode}][-{dbtype}] +``` -All services use standard ports (80 for HTTP, 443 for HTTPS), so no port specification is required in the URL. 
+**Volume names**: +``` +__ +``` +Example: `ethereum-mainnet-geth-pruned_client_data` -## Resource Requirements +### Defaults -Different node types have different hardware requirements: +- **Default client**: geth (Ethereum), op-geth (OP Stack), arbnode (Arbitrum) +- **Default sync mode**: pruned +- **Default database**: Depends on client (leveldb, pebble, etc.) +- **Default node**: op-node (OP Stack), prysm (Ethereum beacon), etc. -- Pruned Ethereum node: ~500GB disk, 8GB RAM -- Archive Ethereum node: ~2TB disk, 16GB RAM -- L2 nodes typically require less resources than L1 nodes -- Consider using SSD or NVMe storage for better performance +--- -## DRPC Integration +## 13. Syncing and Initial Setup -This system includes support for DRPC (Decentralized RPC) integration, allowing you to monetize your RPC nodes by selling excess capacity: +### Sync Modes -### Setting Up DRPC +| Mode | Description | Speed | Storage | +|---|---|---|---| +| **Snap Sync** | Download recent state, verify | Fast | Medium | +| **Fast Sync** | Download blocks, verify headers | Medium | Medium | +| **Full Sync** | Verify all blocks from genesis | Slow | Full | +| **Archive** | Full sync + keep all state | Slowest | Largest | -1. Add `drpc.yml` to your `COMPOSE_FILE` variable in `.env` -2. Configure the following variables in your `.env` file: - ``` - GW_DOMAIN=your-gateway-domain.com - GW_REDIS_RAM=2gb # Memory allocation for Redis - DRPC_VERSION=0.64.16 # Or latest version - ``` -3. Generate the upstream configurations for dshackle: - ```bash - # Using domain URLs (default) - ./upstreams.sh +Most nodes use **snap sync** or **fast sync** by default for pruned nodes. - ``` +### Using Checkpoints -The `upstreams.sh` script automatically detects all running nodes on your machine and generates the appropriate configuration for the dshackle load balancer. This allows you to connect your nodes to drpc.org and sell RPC capacity. 
+Some nodes support checkpoint sync for faster startup: -For more information about DRPC, visit [drpc.org](https://drpc.org/). +```bash +# Check if checkpoint is configured +grep checkpoint .yml +``` -## Supported Networks +Checkpoints are typically provided by trusted community members or node operators. -This repository supports a comprehensive range of blockchain networks: +### Snapshots -### Layer 1 Networks -- **Major Networks**: Ethereum (Mainnet, Sepolia, Holesky), BSC, Polygon, Avalanche, Gnosis -- **Alternative L1s**: Fantom, Core, Berachain, Ronin, Viction, Fuse, Tron, ThunderCore -- **Emerging L1s**: Goat, AlephZero, Haqq, Taiko, Rootstock +For even faster sync, some nodes support snapshot downloads: -### Layer 2 Networks -- **OP Stack**: Optimism, Base, Zora, Mode, Blast, Fraxtal, Bob, Boba, Worldchain, Metal, Ink, Lisk, SNAX, Celo -- **Arbitrum Ecosystem**: Arbitrum One, Arbitrum Nova, Everclear, Playblock, Real, Connext, OpenCampusCodex -- **Other L2s**: Linea, Scroll, zkSync Era, Metis, Moonbeam +```bash +# Check snapshot configuration +grep snapshot .yml +``` -Most networks support multiple node implementations (Geth, Erigon, Reth) and environments (mainnet, testnet). +Snapshots are large files containing the entire chain state at a point in time. -## Backup and Restore System +### Initial Sync Time -This repository includes a comprehensive backup and restore system for Docker volumes: +| Network | Pruned | Archive | +|---|---|---| +| Ethereum Mainnet | 1-3 days | 1-2 weeks | +| BSC Mainnet | 1-2 days | 3-5 days | +| Polygon Mainnet | 6-12 hours | 1-2 days | +| L2 Networks | Minutes - hours | Hours - 1 day | -### Local Backups +Times vary based on hardware, network connection, and chain state. -- `backup-node.sh ` - Create a backup of all volumes for a configuration to the `/backup` directory -- `restore-volumes.sh ` - Restore volumes from the latest backup in the `/backup` directory +--- -### Remote Backups +## 14. 
Common Issues and Solutions -To serve backups via HTTP and WebDAV: +### Container Won't Start -1. Add `backup-http.yml` to your `COMPOSE_FILE` variable in `.env` -2. This exposes: - - HTTP access to backups at `https://yourdomain.tld/backup` - - WebDAV access at `https://yourdomain.tld/dav` +**Symptom:** Container exits immediately or fails to start. -### Cross-Server Backup and Restore +**Check:** +```bash +./logs.sh +docker ps -a | grep +``` -For multi-server setups: +**Common fixes:** +- Check that the `.env` file has all required variables +- Validate compose syntax: `docker compose -f .yml config` +- Ensure ports aren't conflicting +- Pull images: `docker compose pull` -1. On server A: Include `backup-http.yml` in `COMPOSE_FILE` to serve backups -2. On server B: Use restore from server A's backups: - ```bash - # Restore directly from server A - ./restore-volumes.sh ethereum-mainnet https://serverA.domain.tld/backup/ - ``` +### Not Syncing / Stuck -3. Create backups on server B and send to server A via WebDAV: - ```bash - # Backup to server A's WebDAV - ./backup-node.sh ethereum-mainnet https://serverA.domain.tld/dav - ``` +**Symptom:** Node shows 0 peers or is stuck at a low block. -This allows for efficient volume transfers between servers without needing SSH access. \ No newline at end of file +**Check:** +```bash +./sync-status.sh +./peer-count.sh | grep +./logs.sh | grep -i error +``` + +**Common fixes:** +- Set `IP=` in `.env` (required for P2P) +- Check that the firewall allows P2P ports +- Verify checkpoint/snapshot URLs are reachable +- Restart with fresh volumes (destructive: requires a full resync) + +### RPC Endpoint Not Accessible + +**Symptom:** Can't connect to RPC endpoint. + +**Check:** +```bash +./show-status.sh +curl http://localhost: +docker ps | grep traefik +``` + +**Common fixes:** +- Check Traefik is running +- Verify DOMAIN is set correctly +- Check SSL configuration +- Test with NO_SSL=true for debugging + +### High Resource Usage + +**Symptom:** Node is using too much CPU, memory, or disk. 
+ +**Check:** +```bash +./show-ram.sh +./show-db-size.sh +./show-cpu.sh +``` + +**Common fixes:** +- Reduce max peers +- Upgrade hardware +- Switch to pruned mode +- Limit specific resources in compose file + +### For More Debugging + +For detailed debugging workflows, error patterns, and advanced troubleshooting, +see [VIBE.md](VIBE.md). + +--- + +## 15. Utility Scripts Overview + +This repository includes many scripts for managing your nodes. For the complete +debugging reference, see [VIBE.md](VIBE.md). + +### Most Useful for Daily Operations + +| Script | What It Does | +|---|---| +| `show-status.sh` | Show health of all running nodes | +| `show-running.sh` | List running containers | +| `logs.sh ` | View node logs | +| `sync-status.sh ` | Check sync progress | +| `latest.sh ` | Get latest block | +| `stop.sh ` | Stop a node | +| `start.sh ` | Start a node | +| `force-recreate.sh ` | Restart a node | +| `backup-node.sh ` | Create backup | +| `restore-volumes.sh ` | Restore from backup | + +--- + +## 16. Learning More + +- **[VIBE.md](VIBE.md)** — Complete debugging and operations guide for automated tools +- **[DRPC Documentation](https://drpc.org/)** — For monetization setup +- **[Traefik Documentation](https://doc.traefik.io/traefik/)** — For SSL and reverse proxy +- **[Docker Compose Documentation](https://docs.docker.com/compose/)** — For compose file reference + +--- + +*For debugging and troubleshooting, see [VIBE.md](VIBE.md).* +*For the complete script reference and advanced operations, see [VIBE.md](VIBE.md).* diff --git a/VIBE.md b/VIBE.md new file mode 100644 index 00000000..77ecb136 --- /dev/null +++ b/VIBE.md @@ -0,0 +1,729 @@ +# VIBE.md — ethereum-rpc-docker Operations & Debugging Guide + +You are an LLM agent or operator **running or debugging blockchain RPC nodes** from this +repository. This file is your **primary reference** for all operational tasks. 
+ +This repo contains Docker Compose configurations for blockchain RPC nodes plus operational +scripts for managing them. Everything you need to run, monitor, debug, and fix nodes is here. + +--- + +## 0. WHEN A NODE IS FAULTY — Start Here + +### Immediate Triage (30 seconds) + +```bash +# 1. Is the container running? +./show-running.sh + +# 2. Check overall status of all configured nodes +./show-status.sh + +# 3. If you know the config name, check its specific status +./sync-status.sh + +# 4. Check logs for the faulty node +./logs.sh +``` + +**If the container isn't running**, go to [§3. Container Lifecycle Issues](#3-container-lifecycle-issues) + +**If the container is running but not synced**, go to [§4. Sync Issues](#4-sync-issues) + +**If the container is running and synced but RPC fails**, go to [§5. RPC/Connectivity Issues](#5-rpcconnectivity-issues) + +**If you see errors in logs but aren't sure what they mean**, go to [§6. Log Interpretation](#6-log-interpretation) + +--- + +## 1. Repository Overview + +### What This Repo Contains + +``` +rpc/ +├── *.yml # Docker Compose files for node configurations +├── *.sh # Operational scripts (YOUR PRIMARY TOOLS) +├── scripts/ # Additional helper scripts (CometBFT support) +├── / # Network directories (e.g., ethereum/, op/, arb/) +│ ├── *.yml # Compose files for specific chains +│ └── / # Chain-specific assets +│ ├── genesis.json # Custom genesis files +│ ├── rollup.json # Rollup configurations (OP Stack) +│ └── *.Dockerfile # Custom build files +├── README.md # User documentation +└── VIBE.md # THIS FILE — operations guide +``` + +### Key Concepts + +- **Config name**: The compose filename WITHOUT `.yml` (e.g., `ethereum-mainnet-geth-pruned`) +- **Service name**: Derived from config name, used in `docker compose` commands +- **Short name**: Used in URL paths, container labels. 
Format: `{network}-{chain}[-{client}][-{db_type}]` +- **Volume names**: Docker volumes follow the full config name pattern + +### Supported Networks + +**Layer 1**: Ethereum, Polygon, BSC, Avalanche, Gnosis, Fantom, Core, Berachain, Ronin, Viction, Fuse, Tron, ThunderCore, Goat, AlephZero, Haqq, Taiko, Rootstock, Dogecoin, Litecoin, Bitcoin, Bitcoin-Cash, Ripple, Solana + +**Layer 2 (OP Stack)**: Optimism, Base, Zora, Mode, Blast, Fraxtal, Bob, Boba, Worldchain, Metal, Ink, Lisk, SNAX, Celo + +**Layer 2 (Arbitrum)**: Arbitrum One, Arbitrum Nova, Everclear, Playblock, Real, Connext, OpenCampusCodex + +**Other L2s**: Linea, Scroll, zkSync Era, Metis, Moonbeam, Starknet, zkEVM, Immutable zkEVM, Polygon zkEVM + +--- + +## 2. Essential Scripts Reference + +### Status & Monitoring Scripts + +| Script | Usage | What It Does | +|---|---|---| +| `show-status.sh` | `[config-name]` | Lists ALL configured nodes with sync status, block height, health | +| `show-running.sh` | | Lists currently running containers | +| `sync-status.sh` | `` | Detailed sync status for one config | +| `latest.sh` | `` | Latest block number + hash | +| `logs.sh` | `` | Tail logs from all containers in a config | +| `show-db-size.sh` | | Disk usage of ALL Docker volumes, sorted by size | +| `show-ram.sh` | `` | Memory usage of containers | +| `show-cpu.sh` | | CPU usage display | +| `peer-count.sh` | | P2P peer count for all running nodes | +| `time-since-last-block.sh` | `` | How long since last block was processed | +| `ping.sh` | `` | Test network connectivity from container | +| `show-errors.sh` | | Show error counts/logs across containers | +| `show-size.sh` | | Show size of containers/volumes | +| `show-file-size.sh` | | Show static file sizes | +| `show-static-file-size.sh` | | Show static file sizes (alternative) | + +### Lifecycle Management Scripts + +| Script | Usage | What It Does | +|---|---|---| +| `start.sh` | `` | Start all containers for a config | +| `stop.sh` | `` | Stop all 
containers for a config | +| `force-recreate.sh` | `` | Force recreate containers (keeps volumes) | +| `rm.sh` | `` | Remove containers (keeps volumes) | +| `delete-volumes.sh` | `` | **DESTRUCTIVE** - Remove containers AND volumes | +| `delete-node-keys.sh` | `` | Remove node keys (for re-initialization) | + +### Backup & Restore Scripts + +| Script | Usage | What It Does | +|---|---|---| +| `backup-node.sh` | ` [url]` | Backup volumes locally or to WebDAV | +| `restore-volumes.sh` | ` [url]` | Restore volumes from local or HTTP | +| `clone-node.sh` | `` | Clone a node's state | +| `clone-backup.sh` | | Clone backup files | +| `clone-peers.sh` | | Clone peer information | +| `restore-peers.sh` | | Restore peer connections | +| `list-backups.sh` | | List available backup files | +| `list-peer-backups.sh` | | List peer backup files | +| `list-restorable.sh` | | List restorable configurations | +| `cleanup-backups.sh` | | Remove old backups | +| `cleanup-volumes.sh` | | Clean up unused volumes | + +### Network & Connectivity Scripts + +| Script | Usage | What It Does | +|---|---|---| +| `upstreams.sh` | | Generate dshackle upstream configuration | +| `connect-peers.sh` | | Connect to peer nodes | +| `search-node.sh` | `` | Search compose files for patterns | +| `search-compose.sh` | `` | Search compose files | +| `network-to-config.sh` | | Map network names to config files | +| `reload_dshackle.sh` | | Reload dshackle configuration | +| `update-whitelist.sh` | | Update IP whitelist | +| `update-ip.sh` | | Update IP configuration | + +### Specialized Scripts + +| Script | Usage | What It Does | +|---|---|---| +| `op-wheel.sh` | | OP rollup maintenance (rewind, set forkchoice) | +| `op-wheel-finalize-latest-block.sh` | ` [node_svc]` | Finalize latest block (nuclear option) | +| `catchup.sh` | `` | Help node catch up to chain head | +| `success-if-almost-synced.sh` | ` ` | Exit 0 if node is almost synced | +| `groq.sh` | | Query using Groq | +| `trai.sh` | | Trace 
transaction | +| `multicurl.sh` | | Parallel curl requests | +| `blocknumber.sh` | | Get block number | +| `get-block.sh` | | Get block information | +| `get-local-url.sh` | | Get local RPC URL | +| `get-shortname.sh` | `` | Get short name for a config | +| `disk-space.sh` | | Check disk space | +| `limit-bandwidth.sh` | | Limit bandwidth | +| `maintenance.sh` | | Maintenance helper | +| `random-port.sh` | | Generate random port | +| `reference-rpc-endpoint.sh` | | Reference RPC endpoint helper | +| `reset-terminal.sh` | | Reset terminal | +| `setup-bandwidth-limit-cron.sh` | | Setup cron for bandwidth limiting | + +--- + +## 3. Container Lifecycle Issues + +### Symptom: Container Won't Start + +```bash +# Check why it failed +./logs.sh 2>&1 | tail -50 + +# Check container exit code +docker ps -a --filter "name=" --format "{{.Names}} | {{.State}} | {{.Status}}" + +# Inspect the container +docker inspect | jq '.[0].State' +``` + +**Common causes:** +- **Port conflict**: Two services trying to bind to same host port +- **Volume permission issues**: Docker can't write to volume +- **Missing environment variables**: `.env` file incomplete +- **Invalid compose syntax**: YAML parsing error +- **Image pull failure**: Network issue or private registry auth + +**Fixes:** +```bash +# Check for port conflicts +grep -h "^[0-9]\{1,5\}:[0-9]" *.yml | sort | uniq -d + +# Validate compose syntax +docker compose -f .yml config + +# Pull images manually +docker compose -f .yml pull + +# Start with --build if using custom Dockerfiles +docker compose -f .yml up -d --build +``` + +### Symptom: Container Exits Immediately After Starting + +```bash +# View the last 100 lines of logs before exit +./logs.sh 2>&1 | tail -100 + +# Check exit code +docker ps -a --filter "name=" --format "{{.Status}}" + +# Run interactively to see error +docker compose -f .yml run --rm sh +``` + +**Common causes:** +- **Missing config files**: `/config/` mount empty or wrong path +- **Invalid flags**: 
Command-line arguments malformed +- **Database corruption**: Existing data incompatible with new version +- **Checkpoint/genesis mismatch**: Chain ID or genesis doesn't match + +**Fixes:** +```bash +# Verify config directory exists (if using custom configs) +ls -la // + +# Try with fresh volumes (DESTRUCTIVE) +./delete-volumes.sh +./start.sh +``` + +### Symptom: Container Restarts Repeatedly (Crash Loop) + +```bash +# Watch logs in real-time +./logs.sh -f + +# Check restart count +docker inspect | jq '.[0].RestartCount' + +# Check last restart reason +docker inspect | jq '.[0].State.ExitCode, .[0].State.Error' +``` + +**Common causes:** +- **OOM killed**: Memory limit exceeded +- **Out of disk space**: No space left on device +- **Segmentation fault**: Client bug or bad data +- **Panic**: Go client panic + +**Fixes:** +```bash +# Check memory usage +./show-ram.sh + +# Check disk space +df -h /var/lib/docker +./show-db-size.sh + +# Increase resources in compose file or .env +# Then force recreate +./force-recreate.sh +``` + +--- + +## 4. 
Sync Issues + +### Symptom: Node Not Syncing (Stuck at Block 0 or Low Block) + +```bash +# Check sync status +./sync-status.sh + +# Check current block +./latest.sh + +# Check logs for sync errors +./logs.sh | grep -i -E "sync|error|fail|warn|stuck|behind" + +# Check peer count +./peer-count.sh | grep +``` + +**Common causes:** +- **No peers**: P2P network connection failed +- **Wrong network**: Connected to wrong chain +- **Checkpoint too old**: Checkpoint URL unavailable or outdated +- **Snapshot download failed**: Snapshot server unreachable + +**Fixes:** +```bash +# Check if checkpoint/snapshot is configured +grep -E "(checkpoint|snapshot)" .yml + +# Test checkpoint URL manually +curl -I $(grep checkpoint .yml | grep -oE 'http[^ ]+') + +# Check peer connections (geth example) +docker exec admin_peers | jq '.[] | .network.remoteAddress' | wc -l +``` + +### Symptom: Sync is Very Slow + +```bash +# Check sync speed over time +./latest.sh ; sleep 60; ./latest.sh + +# Check if node is processing blocks +./time-since-last-block.sh + +# Check CPU and memory +top -d 1 -p $(docker inspect | jq -r '.[0].State.Pid') +``` + +**Common causes:** +- **Resource constrained**: CPU throttled, memory swapped +- **Disk I/O bottleneck**: Slow storage or contention +- **Network rate limited**: P2P or RPC rate limiting +- **Too many peers**: P2P overhead +- **Wrong sync mode**: Full sync instead of snap sync + +### Symptom: Sync Stuck at Specific Block + +```bash +# Check logs around the stuck block +./logs.sh | grep -A 10 -B 10 "block " + +# Check if it's a known bad block +# Search online: bad block +``` + +**Common causes:** +- **Bad block in chain**: Requires client patch or manual intervention +- **State trie inconsistency**: Database corruption +- **Fork choice issue**: Node on wrong fork + +**Fixes for OP Stack:** +```bash +# Try to finalize past the block +./op-wheel-finalize-latest-block.sh +``` + +### Symptom: Node on Wrong Fork / Chain + +```bash +# Check chain ID 
+./latest.sh | grep -i chain + +# Check what chain the node thinks it's on +docker exec ethdo chain --endpoint=http://localhost:8545 + +# Compare with expected chain ID +grep chainId .yml +``` + +--- + +## 5. RPC/Connectivity Issues + +### Symptom: RPC Endpoint Not Responding + +```bash +# Test from host +curl -s http://localhost: | head -c 100 + +# Check if traefik/proxy is running +docker ps | grep -E "(traefik|proxy|nginx)" + +# Check traefik logs +docker logs | tail -50 +``` + +**Common causes:** +- **Container not running**: Client crashed +- **Port not exposed**: Wrong port mapping +- **Traefik misconfiguration**: Labels wrong or missing +- **Firewall blocking**: Host firewall or cloud security group + +### Symptom: RPC Returns Wrong Chain ID + +```bash +# Query chain ID from RPC +curl -s -X POST http://localhost: \ + -H "Content-Type: application/json" \ + -d '{"jsonrpc":"2.0","method":"eth_chainId","params":[],"id":1}' +``` + +### Symptom: Cannot Connect to P2P Network + +```bash +# Check peer count +./peer-count.sh | grep + +# Test P2P connectivity from container +docker exec nc -zv +``` + +**Fixes:** +```bash +# Set public IP in .env +IP=$(curl -s ipinfo.io/ip) +echo "IP=$IP" >> .env +./force-recreate.sh +``` + +--- + +## 6. 
Log Interpretation + +### Common Log Patterns + +#### Warnings (Node may still function) +| Pattern | Meaning | Action | +|---|---|---| +| `WARN.*sync.*slow` | Sync slower than expected | Check resources | +| `WARN.*peers.*low` | Fewer peers than desired | Check P2P connectivity | +| `WARN.*rate.*limit` | API rate limiting active | Normal for public endpoints | + +#### Errors (Node is degraded) +| Pattern | Meaning | Action | +|---|---|---| +| `Error.*database.*corrupt` | Database corruption | Restore from backup or resync | +| `Error.*handshake.*fail` | P2P handshake failed | Check chain ID | +| `Error.*no.*peers` | Cannot connect to P2P | Check bootstrap nodes | +| `Error.*timeout` | RPC/HTTP timeout | Check network, increase timeout | + +#### Fatal (Node will not function) +| Pattern | Meaning | Action | +|---|---|---| +| `Fatal.*panic` | Client crashed | Check client version | +| `Fatal.*OOM` | Out of memory | Increase memory limit | +| `Fatal.*disk.*full` | No disk space | Free space | +| `Fatal.*permission.*denied` | Filesystem permissions | Fix volume permissions | + +--- + +## 7. Resource Issues + +### High CPU Usage +```bash +./show-ram.sh +./show-cpu.sh +docker stats --no-stream +``` + +### High Memory Usage +```bash +./show-ram.sh +docker stats --no-stream --format "{{.Container}} | {{.MemUsage}} | {{.MemPerc}}" +``` + +### High Disk Usage +```bash +./show-db-size.sh +docker system df -v +``` + +### Disk I/O Bottleneck +```bash +iotop -o -d 1 +``` + +--- + +## 8. 
Backup and Restore + +### Creating a Backup +```bash +# Local backup (to /backup directory) +./backup-node.sh + +# Remote backup (to WebDAV) +./backup-node.sh https://backup-server.tld/dav +``` + +### Restoring from Backup +```bash +# List available backups +./list-backups.sh + +# Restore latest backup for config +./restore-volumes.sh + +# Restore from specific URL +./restore-volumes.sh https://backup-server.tld/backup/ +``` + +### Cloning a Node + +```bash +# Clone a node to a new location +./clone-node.sh + +# Clone peers (for faster sync) +./clone-peers.sh +``` + +### Nuclear Option: Full Reset + +```bash +# WARNING: This deletes ALL data for the config +./stop.sh && \ +./rm.sh && \ +./delete-volumes.sh && \ +./delete-node-keys.sh && \ +./force-recreate.sh + +# Then check logs +./logs.sh +``` + +--- + +## 9. Common Error Messages + +### Database Errors +| Error | Cause | Solution | +|---|---|---| +| `database is corrupted` | Power loss, bug | Restore from backup or resync | +| `database version mismatch` | Client version changed | Delete and resync | + +### P2P Errors +| Error | Cause | Solution | +|---|---|---| +| `no configured peers` | Missing bootstrap nodes | Add bootstrap nodes | +| `handshake failed` | Chain ID mismatch | Verify genesis.json | + +### RPC Errors +| Error | Cause | Solution | +|---|---|---| +| `method not found` | Wrong client | Use correct client | +| `connection refused` | Port not open | Check container running, port mapping | + +--- + +## 10. OP Stack Specific Debugging + +### OP Node Issues + +```bash +# Check op-node logs +./logs.sh | grep -i "op-node\|rollup\|sequencer" + +# Check rollup configuration (if custom) +cat op//ethereum/rollup.json | jq . + +# Check if rollup.json is mounted +docker exec cat /config/rollup.json | jq . 
+``` + +### OP Wheel (Manual Intervention) + +```bash +# Rewind to specific block (DANGEROUS - only if you know what you're doing) +./op-wheel.sh engine set-forkchoice \ + --unsafe= \ + --safe= \ + --finalized= \ + --engine=http://:8551/ \ + --engine.open=http://:8545 \ + --engine.jwt-secret-path=/jwtsecret + +# Nuclear option: finalize latest local block +./op-wheel-finalize-latest-block.sh +``` + +--- + +## 11. CometBFT Family (Cosmos, etc.) Specific + +### Init Container Issues + +```bash +# CometBFT chains use init.sh inside the container +# The master script is at scripts/cometbft-common.sh + +# Check if init completed +./logs.sh | grep -i "init\|setup\|complete" + +# Check the init script +cat //scripts/init.sh +``` + +--- + +## 12. Quick Start Guide + +### Starting a Node + +```bash +# 1. Set up environment (IP must be a shell variable so ${IP//./-} expands below) +IP=$(curl -s ipinfo.io/ip); echo "IP=$IP" > .env +echo "DOMAIN=${IP//./-}.traefik.me" >> .env +echo "MAIL=your-email@example.com" >> .env + +# 2. Select which nodes to run +# Add compose files to COMPOSE_FILE (colon-separated) +echo "COMPOSE_FILE=base.yml:rpc.yml:ethereum-mainnet-geth-pruned.yml" >> .env + +# 3. Start the node +docker compose up -d + +# 4. Verify it's running +./show-status.sh +``` + +### Accessing Your Node + +```bash +# Once running, access via: +# HTTP: http:///ethereum-mainnet-geth-pruned +# HTTPS: https:///ethereum-mainnet-geth-pruned +# WebSocket: wss:///ethereum-mainnet-geth-pruned + +# Or locally (if NO_SSL=true): +# HTTP: http://localhost: +``` + +--- + +## 13. 
Configuration Reference + +### Environment Variables + +**Required for most setups:** +```bash +IP=203.0.113.42 # Your public IP +DOMAIN=203-0-113-42.traefik.me # Your domain (traefik.me for testing) +MAIL=your-email@example.com # For Let's Encrypt SSL +WHITELIST=0.0.0.0/0 # IP whitelist (0.0.0.0/0 = all) +``` + +**Optional:** +```bash +NO_SSL=true # Disable SSL (testing only) +CHAINS_SUBNET=192.168.0.0/26 # Docker network subnet +``` + +**Chain-specific (examples):** +```bash +ETHEREUM_MAINNET_EXECUTION_RPC=https://fallback-rpc.example.com +ARBITRUM_SEPOLIA_EXECUTION_RPC=https://arb-sepolia-rpc.example.com +OP_NODE_NETWORK=mainnet +OP_NODE_L1_RPC_URL=https://l1-rpc.example.com +``` + +### Compose File Structure + +Each compose file defines one or more services: +- **client**: Execution layer (Geth, Erigon, Reth, etc.) +- **node**: Consensus/derivation node (op-node, lighthouse, etc.) +- **relay**: DA relay (eigenda-proxy, op-alt, etc.) +- **proxy**: HTTP/WS proxy (nginx, etc.) +- **database**: External database (Postgres, etc.) + +### Volume Naming + +Volumes are named after the config: +``` +__data +__config +``` + +Example: `ethereum-mainnet-geth-pruned_client_data` + +--- + +## 14. Quick Debugging Checklist + +Use this checklist when debugging an issue: + +- [ ] **Is the container running?** → `./show-running.sh` +- [ ] **Are there errors in logs?** → `./logs.sh | grep -i error` +- [ ] **Is the node synced?** → `./sync-status.sh ` +- [ ] **Are peers connected?** → `./peer-count.sh` +- [ ] **Are resources adequate?** → `./show-ram.sh`, `./show-db-size.sh` +- [ ] **Is P2P working?** → Check peer count +- [ ] **Is RPC responding?** → Test with curl +- [ ] **Is disk space available?** → `df -h /var/lib/docker` +- [ ] **Is the config file correct?** → `docker compose -f .yml config` +- [ ] **Are environment variables set?** → Check `.env` +- [ ] **Is the genesis file correct?** → Check chain ID + +--- + +## 15. 
When to Escalate + +Escalate to a human operator if: + +- [ ] Node stuck for > 2 hours with no progress +- [ ] Repeated `Fatal` or `panic` errors after restart +- [ ] Database corruption confirmed +- [ ] Issue affects multiple nodes across different chains +- [ ] Need to force-push to this repo + +--- + +## 16. File Locations Quick Reference + +| What You Need | Where to Find It | +|---|---| +| Compose files | Root of this repo (`*.yml`) | +| Operational scripts | Root of this repo (`*.sh`) | +| Chain assets | `//` or `///` | +| Genesis files | `///genesis.json` | +| Rollup configs | `op///rollup.json` | +| Custom Dockerfiles | `/*.Dockerfile` | +| Init scripts | `/scripts/init.sh` | +| CometBFT common | `scripts/cometbft-common.sh` | +| Compose registry | `compose_registry.json` | +| RPC endpoints | `reference-rpc-endpoint.json` | +| Environment | `.env` | + +--- + +## 17. Resource Requirements Reference + +| Node Type | Disk | RAM | CPU | +|---|---|---|---| +| Ethereum pruned | ~500GB | 8GB | 2+ cores | +| Ethereum archive | ~2TB+ | 16GB+ | 4+ cores | +| Ethereum archive-trace | ~4TB+ | 32GB+ | 8+ cores | +| L2 pruned | ~100-500GB | 4-8GB | 2+ cores | +| L2 archive | ~1-2TB | 8-16GB | 4+ cores | + +**Note:** Requirements vary by chain. Check specific chain documentation. + +--- + +*This file is your complete operations and debugging reference. For additional user documentation, see README.md.*