Sophon: PostgreSQL in Docker too slow to catch up to chainhead #1

Open
opened 2025-12-17 08:26:30 +00:00 by claude · 0 comments

Problem

The Sophon node is unable to sync to chainhead, most likely because PostgreSQL performs poorly when it runs in a Docker container.

Observed Behavior

On rpc-de-13, the sophon-mainnet node was stuck in a restart loop with the following error:

migration 20250822160137 was previously applied but is missing in the resolved migrations

This migration error may be a symptom of underlying database performance issues causing incomplete or corrupted state.

Hypothesis

Sophon uses PostgreSQL as its database backend. Running PostgreSQL in Docker with default settings may not provide adequate I/O performance for blockchain sync workloads, which are write-heavy and require consistent low-latency disk access.
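One way to test this hypothesis is to measure write-plus-fsync latency on the directory that backs the database volume, once inside the container and once on the host. The following is a minimal sketch, not a replacement for a proper benchmark (PostgreSQL ships `pg_test_fsync` for that purpose). The function names, the default of 200 writes, and the choice of 8 kB blocks (PostgreSQL's default page size) are illustrative assumptions.

```python
import os
import statistics
import sys
import tempfile
import time


def fsync_latencies(directory, writes=200, block_size=8192):
    """Time `writes` sequential write+fsync calls on a temp file in `directory`.

    Returns a list of per-call latencies in seconds. The 8192-byte block
    mirrors PostgreSQL's default page size; this is only a rough probe.
    """
    payload = os.urandom(block_size)
    latencies = []
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        for _ in range(writes):
            start = time.perf_counter()
            os.write(fd, payload)
            os.fsync(fd)  # force the data to stable storage, as WAL commits do
            latencies.append(time.perf_counter() - start)
    finally:
        os.close(fd)
        os.unlink(path)
    return latencies


def summarize(latencies):
    """Reduce raw latencies to median, approximate p99, and max, in milliseconds."""
    ordered = sorted(latencies)
    p99_index = min(len(ordered) - 1, int(len(ordered) * 0.99))
    return {
        "median_ms": statistics.median(ordered) * 1000,
        "p99_ms": ordered[p99_index] * 1000,
        "max_ms": ordered[-1] * 1000,
    }


if __name__ == "__main__":
    # Pass the directory to probe, e.g. the mount point of the data volume.
    target = sys.argv[1] if len(sys.argv) > 1 else tempfile.gettempdir()
    print(target, summarize(fsync_latencies(target)))
```

Running the script against the same physical disk from both environments gives a first indication of whether the container's storage path adds latency; a large gap in `p99_ms` would support the hypothesis, while similar numbers would point elsewhere.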

Investigation Needed

  1. PostgreSQL configuration: Check whether Sophon's PostgreSQL container uses settings tuned for blockchain workloads (shared_buffers, wal_buffers, checkpoint settings, etc.)

  2. Volume configuration: Verify that the PostgreSQL data volume uses an appropriate storage driver and mount options

  3. I/O performance: Compare PostgreSQL I/O metrics in Docker vs. bare metal

  4. Alternative approaches:

    • Run PostgreSQL on bare metal with Sophon connecting remotely
    • Use a dedicated PostgreSQL container with tuned settings
    • Consider alternative database backends if Sophon supports them
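For the configuration check in item 1, a small script can flag settings that are still at their stock values. The sketch below parses the text of a `postgresql.conf` and reports which of a few write-related parameters are unset or equal to the default. The `STOCK_DEFAULTS` table reflects the documented defaults of recent PostgreSQL releases and should be verified against the version in use; the function names are hypothetical, and values are compared as plain strings without unit normalization.

```python
import re

# Assumed stock defaults for recent PostgreSQL releases; verify for your version.
STOCK_DEFAULTS = {
    "shared_buffers": "128MB",
    "wal_buffers": "-1",          # -1 means "size automatically"
    "checkpoint_timeout": "5min",
    "max_wal_size": "1GB",
}

# Matches `name = value` or `name value`; commented-out lines do not match.
SETTING_RE = re.compile(
    r"^\s*([a-z_][a-z0-9_.]*)(?:\s*=\s*|\s+)('(?:[^']|'')*'|[^\s#]+)",
    re.IGNORECASE,
)


def parse_conf(text):
    """Return {parameter: value} for every active setting in the conf text."""
    settings = {}
    for line in text.splitlines():
        match = SETTING_RE.match(line)
        if match:
            name, value = match.groups()
            settings[name.lower()] = value.strip("'")
    return settings


def untuned(settings):
    """List the tracked parameters that are unset or still at the stock default."""
    return sorted(
        name
        for name, default in STOCK_DEFAULTS.items()
        if settings.get(name, default) == default
    )


if __name__ == "__main__":
    sample = """
    # illustrative configuration, not taken from the Sophon deployment
    shared_buffers = 4GB        # tuned
    #wal_buffers = -1
    checkpoint_timeout = 5min
    max_wal_size = '8GB'
    """
    print(untuned(parse_conf(sample)))
```

Settings can also be passed to the server as `-c name=value` command-line options, which the file will not show, so querying the `pg_settings` view on the running instance is the more authoritative check; the file-based approach is used here only because it needs no database connection.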

Current Status

  • Node has been purged from rpc-de-13
  • No other Sophon instances currently running in the fleet
  • No existing backups available

References

  • Host: rpc-de-13
  • Node: sophon-mainnet

Reference: StakeSquid/ethereum-rpc-docker#1