Indexer Runtime and Performance
Proper configuration and resource monitoring deliver the most performant custom indexer possible. For example:
- Runtime configuration options for ingestion, database connections, and pipeline selection, as well as purposeful use of debugging tools like tokio_console, help dial in your indexer performance.
- A sensible strategy targeting efficient data pruning for your tables keeps them performant over time.
- Following best practices for exposing and extending Prometheus metrics helps you keep track of indexer performance.
Together, these techniques help you run indexers that are fast, resource-efficient, and easier to monitor in both development and production.
Fine-tuning configurations
The indexing framework provides multiple levels of configuration to optimize performance for different use cases. This section covers basic configuration options, while complex pipeline-specific tuning is covered in Indexer Pipeline Architecture.
Ingestion layer configuration
Control how checkpoint data is fetched and distributed:
let ingestion_config = IngestionConfig {
// Buffer size across all downstream workers (default: 5000)
checkpoint_buffer_size: 10000,
// Concurrent checkpoint fetches (default: 200)
ingest_concurrency: 500,
// Retry interval for missing checkpoints in ms (default: 200)
retry_interval_ms: 100,
};
Tuning guidelines:
- checkpoint_buffer_size: Increase for high-throughput scenarios, decrease to reduce memory usage.
- ingest_concurrency: Higher values improve ingestion speed but increase network/storage load.
- retry_interval_ms: Lower values reduce latency for live data, higher values reduce unnecessary retries.
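As a rough back-of-the-envelope check on buffer sizing, the following sketch assumes an average serialized checkpoint size of 1 MiB, which varies by network and workload; the built-in total_ingested_bytes and total_ingested_checkpoints metrics give a measured figure for your own deployment.

```rust
fn main() {
    // Assumed average checkpoint size; divide total_ingested_bytes by
    // total_ingested_checkpoints on your own deployment for a measured value.
    let avg_checkpoint_bytes: u64 = 1024 * 1024; // 1 MiB (assumption)
    let checkpoint_buffer_size: u64 = 10_000;

    let approx_peak_bytes = avg_checkpoint_bytes * checkpoint_buffer_size;
    println!(
        "~{:.1} GiB of checkpoint data may sit in the buffer at peak",
        approx_peak_bytes as f64 / (1024.0 * 1024.0 * 1024.0)
    );
}
```

If the estimate exceeds the memory you can spare, lower checkpoint_buffer_size accordingly.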
Database connection configuration
let db_args = DbArgs {
// Connection pool size (default: 100)
db_connection_pool_size: 200,
// Connection timeout in ms (default: 60,000)
db_connection_timeout_ms: 30000,
// Statement timeout in ms (default: None)
db_statement_timeout_ms: Some(120000),
};
Tuning guidelines:
- db_connection_pool_size: Size based on write_concurrency across all pipelines.
- db_connection_timeout_ms: Reduce for faster failure detection in high-load scenarios.
- db_statement_timeout_ms: Set based on expected query complexity and database performance.
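For example, a minimal sizing sketch, assuming three concurrent pipelines with hypothetical write_concurrency values, that sums write_concurrency across pipelines and adds headroom for watermark and pruner connections:

```rust
fn main() {
    // Hypothetical write_concurrency settings for three pipelines; substitute
    // the values from your own pipeline configuration.
    let pipeline_write_concurrency: [usize; 3] = [5, 10, 10];

    // Headroom for watermark updates, pruning, and ad-hoc queries (assumption).
    let headroom: usize = 25;

    let pool_size: usize = pipeline_write_concurrency.iter().sum::<usize>() + headroom;
    println!("db_connection_pool_size >= {pool_size}"); // 50 in this example
}
```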
Command-line arguments
Include the following command-line arguments to help focus processing. The values shown are for demonstration only; use values that make sense for your environment and goals.
# Checkpoint range control
--first-checkpoint 1000000 # Start from specific checkpoint
--last-checkpoint 2000000 # Stop at specific checkpoint
# Pipeline selection
--pipeline "tx_counts" # Run specific pipeline only
--pipeline "events" # Can specify multiple pipelines
Use cases:
- Checkpoint range: Essential for backfills and historical data processing.
- Pipeline selection: Useful for selective reprocessing or testing.
- Skip watermark: Enables faster backfills when watermark consistency isn't required.
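If you expose these flags from your own binary, a minimal sketch using the clap derive API might look like the following. The Args struct, its doc comments, and the flag set are illustrative rather than the framework's own argument types, and it assumes clap with the derive feature in Cargo.toml.

```rust
use clap::Parser;

/// Illustrative argument struct mirroring the flags above.
#[derive(Parser, Debug)]
struct Args {
    /// Start from a specific checkpoint.
    #[arg(long)]
    first_checkpoint: Option<u64>,

    /// Stop at a specific checkpoint.
    #[arg(long)]
    last_checkpoint: Option<u64>,

    /// Repeat the flag to run a subset of pipelines.
    #[arg(long)]
    pipeline: Vec<String>,
}

fn main() {
    let args = Args::parse();
    println!("{args:?}");
}
```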
Pipeline-specific advanced tuning
For complex configuration scenarios that require a deep understanding of pipeline internals, refer to Indexer Pipeline Architecture.
Tokio runtime debugging
For performance-sensitive pipelines or when troubleshooting async runtime issues, the sui-indexer-alt-framework integrates with tokio-console, a powerful debugger for async Rust applications. This tool provides real-time insights into task execution, helping identify performance bottlenecks, stuck tasks, and memory issues.
When to use Tokio console
The Tokio console is particularly useful for:
- Performance debugging: Identifying slow or blocking tasks.
- Memory analysis: Finding tasks consuming excessive memory.
- Concurrency issues: Detecting tasks that never yield or wake themselves excessively.
- Runtime behavior: Understanding task scheduling and execution patterns.
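As an illustration of the kind of concurrency issue the console surfaces, the following contrived sketch (not framework code; assumes tokio with the full feature set) spawns a CPU-bound task. Without the periodic yield it would monopolize its worker thread, which tokio-console reports as a task with very long poll times.

```rust
use std::time::Duration;

#[tokio::main]
async fn main() {
    tokio::spawn(async {
        let mut acc: u64 = 0;
        for i in 0..50_000_000u64 {
            acc = acc.wrapping_mul(31).wrapping_add(i);
            // Yield periodically (or move the work to spawn_blocking) so other
            // tasks on this worker thread are not starved.
            if i % 1_000_000 == 0 {
                tokio::task::yield_now().await;
            }
        }
        println!("done: {acc}");
    });

    tokio::time::sleep(Duration::from_secs(2)).await;
}
```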
Setup instructions
Consult the README in the tokio-console GitHub repo for additional information.
- Add dependencies

  Add the telemetry_subscribers dependency to your Cargo.toml:

  [dependencies]
  telemetry_subscribers = { git = "https://github.com/MystenLabs/sui.git", branch = "main" }
- Initialize telemetry

  Add telemetry initialization at the beginning of your main function:

  #[tokio::main]
  async fn main() -> Result<()> {
      // Enable tracing, configured by environment variables
      let _guard = telemetry_subscribers::TelemetryConfig::new()
          .with_env()
          .init();

      // Your indexer code here...

      Ok(())
  }
- Run with console enabled

  Start your indexer with the required flags:

  RUSTFLAGS="--cfg tokio_unstable" TOKIO_CONSOLE=1 cargo run

  Flag explanations:
  - TOKIO_CONSOLE=1: Enables tokio-console integration in telemetry_subscribers.
  - RUSTFLAGS="--cfg tokio_unstable": Required by tokio-console to collect task instrumentation data.
- Launch the console dashboard

  # Install tokio-console if not already installed
  cargo install tokio-console

  # Connect to your running indexer (default: localhost:6669)
  tokio-console

  If successful, the dashboard appears with information about every running Tokio task:

Console features
For detailed information about the console dashboard, available views, warnings, and diagnostic capabilities, refer to the official tokio-console documentation.
Production considerations
tokio-console introduces runtime overhead, so use it carefully in production. It is safe to use regularly during development and staging, but enabling it in production requires careful evaluation: run performance benchmarks with and without tokio-console to measure the impact on your specific workload, and consider enabling it only during maintenance windows or targeted troubleshooting sessions while monitoring system resources.
Metrics
The sui-indexer-alt-framework provides built-in Prometheus metrics for monitoring indexer performance and health. All metrics are automatically exposed via HTTP and can be extended with custom metrics.
Built-in metrics
The framework tracks extensive metrics across ingestion, pipeline processing, database operations, and watermark management. For a complete list of available metrics with descriptions, refer to the IndexerMetrics struct in sui-indexer-alt-framework/src/metrics.rs.
#[derive(Clone)]
pub struct IndexerMetrics {
// Statistics related to fetching data from the remote store.
pub total_ingested_checkpoints: IntCounter,
pub total_ingested_transactions: IntCounter,
pub total_ingested_events: IntCounter,
pub total_ingested_inputs: IntCounter,
pub total_ingested_outputs: IntCounter,
pub total_ingested_bytes: IntCounter,
pub total_ingested_transient_retries: IntCounterVec,
pub total_ingested_not_found_retries: IntCounter,
// Checkpoint lag metrics for the ingestion pipeline.
pub latest_ingested_checkpoint: IntGauge,
pub latest_ingested_checkpoint_timestamp_lag_ms: IntGauge,
pub ingested_checkpoint_timestamp_lag: Histogram,
pub ingested_checkpoint_latency: Histogram,
// Statistics related to individual ingestion pipelines' handlers.
pub total_handler_checkpoints_received: IntCounterVec,
pub total_handler_checkpoints_processed: IntCounterVec,
pub total_handler_rows_created: IntCounterVec,
pub latest_processed_checkpoint: IntGaugeVec,
pub latest_processed_checkpoint_timestamp_lag_ms: IntGaugeVec,
pub processed_checkpoint_timestamp_lag: HistogramVec,
pub handler_checkpoint_latency: HistogramVec,
// Statistics related to individual ingestion pipelines.
pub total_collector_checkpoints_received: IntCounterVec,
pub total_collector_rows_received: IntCounterVec,
pub total_collector_batches_created: IntCounterVec,
pub total_committer_batches_attempted: IntCounterVec,
pub total_committer_batches_succeeded: IntCounterVec,
pub total_committer_batches_failed: IntCounterVec,
pub total_committer_rows_committed: IntCounterVec,
pub total_committer_rows_affected: IntCounterVec,
pub total_watermarks_out_of_order: IntCounterVec,
pub total_pruner_chunks_attempted: IntCounterVec,
pub total_pruner_chunks_deleted: IntCounterVec,
pub total_pruner_rows_deleted: IntCounterVec,
// Checkpoint lag metrics for the collector.
pub latest_collected_checkpoint: IntGaugeVec,
pub latest_collected_checkpoint_timestamp_lag_ms: IntGaugeVec,
pub collected_checkpoint_timestamp_lag: HistogramVec,
// Checkpoint lag metrics for the committer.
// We can only report partially committed checkpoints, since the concurrent committer isn't aware of
// when a checkpoint is fully committed. So we report whenever we see a checkpoint. Since data from
// the same checkpoint is batched continuously, this is a good proxy for the last committed checkpoint.
pub latest_partially_committed_checkpoint: IntGaugeVec,
pub latest_partially_committed_checkpoint_timestamp_lag_ms: IntGaugeVec,
pub partially_committed_checkpoint_timestamp_lag: HistogramVec,
// Checkpoint lag metrics for the watermarker.
// The latest watermarked checkpoint metric is already covered by watermark_checkpoint_in_db.
// While we already have watermark_timestamp_in_db_ms metric, reporting the lag explicitly
// for consistency.
pub latest_watermarked_checkpoint_timestamp_lag_ms: IntGaugeVec,
pub watermarked_checkpoint_timestamp_lag: HistogramVec,
pub collector_gather_latency: HistogramVec,
pub collector_batch_size: HistogramVec,
pub committer_commit_latency: HistogramVec,
pub committer_tx_rows: HistogramVec,
pub watermark_gather_latency: HistogramVec,
pub watermark_commit_latency: HistogramVec,
pub watermark_pruner_read_latency: HistogramVec,
pub watermark_pruner_write_latency: HistogramVec,
pub pruner_delete_latency: HistogramVec,
pub watermark_epoch: IntGaugeVec,
pub watermark_checkpoint: IntGaugeVec,
pub watermark_transaction: IntGaugeVec,
pub watermark_timestamp_ms: IntGaugeVec,
pub watermark_reader_lo: IntGaugeVec,
pub watermark_pruner_hi: IntGaugeVec,
pub watermark_epoch_in_db: IntGaugeVec,
pub watermark_checkpoint_in_db: IntGaugeVec,
pub watermark_transaction_in_db: IntGaugeVec,
pub watermark_timestamp_in_db_ms: IntGaugeVec,
pub watermark_reader_lo_in_db: IntGaugeVec,
pub watermark_pruner_hi_in_db: IntGaugeVec,
}
Key metric categories include:
- Ingestion metrics: Global checkpoint and transaction processing stats.
- Pipeline metrics: Per-pipeline processing performance (labeled by pipeline name).
- Database metrics: Batch processing, commit latency, and failure rates.
- Watermark metrics: Progress tracking and lag measurements.
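Because the built-in metrics are standard Prometheus types, extending them with your own metrics follows the usual prometheus-crate pattern. The sketch below is illustrative: the metric name, the pipeline label value, and the standalone Registry are assumptions, and in a real indexer you would register against whatever registry the framework serves over HTTP.

```rust
use prometheus::{Encoder, IntCounterVec, Opts, Registry, TextEncoder};

/// Register a hypothetical per-pipeline counter on the given registry.
fn register_custom_metrics(registry: &Registry) -> prometheus::Result<IntCounterVec> {
    let skipped_rows = IntCounterVec::new(
        Opts::new("indexer_custom_skipped_rows", "Rows skipped by custom handlers"),
        &["pipeline"],
    )?;
    registry.register(Box::new(skipped_rows.clone()))?;
    Ok(skipped_rows)
}

fn main() -> prometheus::Result<()> {
    // Standalone registry for demonstration; in a real indexer you would use the
    // registry that backs the framework's metrics endpoint.
    let registry = Registry::new();
    let skipped_rows = register_custom_metrics(&registry)?;
    skipped_rows.with_label_values(&["tx_counts"]).inc();

    // Encode in the text exposition format that a Prometheus scrape would receive.
    let mut buf = Vec::new();
    TextEncoder::new().encode(&registry.gather(), &mut buf)?;
    println!("{}", String::from_utf8_lossy(&buf));
    Ok(())
}
```

Labeling custom metrics by pipeline name keeps them consistent with the built-in per-pipeline metrics described above.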