Skip to main content
Version: Latest-4.0

Graceful Exit

From v3.3 onwards, StarRocks supports Graceful Exit.

Overview​

Graceful Exit is a mechanism designed to support non-disruptive upgrades and restarts of StarRocks FE, BE, and CN nodes. Its primary objective is to minimize the impact on running queries and data ingestion tasks during maintenance operations such as node restart, rolling upgrade, or cluster scaling.

Graceful Exit ensures that:

  • The node stops accepting new tasks once exit begins;
  • Existing queries and load jobs are allowed to complete within a controlled time window;
  • System components (FE/BE/CN) coordinate status changes so that the cluster correctly reroutes traffic.

Graceful Exit mechanisms differ between FE and BE/CN nodes, as described below.

FE Graceful Exit Mechanism​

Trigger Signal​

FE Graceful Exit is initiated via:

stop_fe.sh -g

This sends a SIGUSR1 signal, while the default exit (without the -g option) sends SIGTERM signal.

Load Balancer Awareness​

Upon receiving the signal:

  • FE immediately returns HTTP 500 on the /api/health endpoint.
  • Load balancers detect the degraded state within ~15 seconds and stop routing new connections to this FE.

Connection Drain and Shutdown Logic​

Follower FE

  • Handles read-only queries.
  • If the FE node has no active sessions, the connection is closed immediately.
  • If SQL is running, the FE node waits for execution to finish before closing the session.

Leader FE

  • Read request handling is identical to the of Followers.

  • Write request handling requires:

    • Closing BDBJE.
    • Allowing a new Leader election to complete.
    • Redirecting subsequent writes to the newly elected Leader.

Timeout Control​

If a query runs for too long, FE forcibly exits after 60 seconds (configurable via the --timeout option).

BE/CN Graceful Exit Mechanism​

Trigger Signal​

BE Graceful Exit is initiated via:

stop_be.sh -g

CN Graceful Exit is initiated via:

stop_cn.sh -g

This sends a SIGTERM signal, while the default exit (without the -g option) sends SIGKILL signal.

State Transition​

After receiving the signal:

  • The BE/CN node marks itself as exiting.
  • It rejects new query fragments by returning INTERNAL_ERROR.
  • It continues processing existing fragments.

Wait Loop for In-Flight Queries​

The behavior that BE/CN waits for existing fragments to finish is controlled by the BE/CN configuration loop_count_wait_fragments_finish (Default: 2). The actual wait duration equals loop_count_wait_fragments_finish Γ— 10 seconds (that is, 20 seconds by default). If fragments remain after timeout, BE/CN proceeds with normal shutdown (closing threads, network, and other processes).

Improved FE Awareness​

From v3.4 onwards, FE no longer marks BE/CN as DEAD based on heartbeat failures. It correctly recognizes the BE/CN β€œexiting” state, allowing significantly longer graceful-exit windows for fragments to be completed.

Configurations​

FE Configurations​

stop_fe.sh -g --timeout​

  • Description: Maximum waiting time before FE is force-killed.
  • Default: 60 (seconds)
  • How to apply: Specify it in the script command, for example, --timeout 120.

Minimum LB detection time​

  • Description: LB requires at least 15 seconds to detect degraded health.
  • Default: 15 (seconds)
  • How to apply: Fixed value

BE/CN Configurations​

loop_count_wait_fragments_finish​

  • Description: BE/CN wait duration for existing fragments. Multiply the value with 10 seconds.
  • Default: 2
  • How to apply: Modify it in the BE/CN configuration file or update it dynamically.

graceful_exit_wait_for_frontend_heartbeat​

  • Description: Whether BE/CN waits for FE to confirm SHUTDOWN via heartbeat. From v3.4.5 onwards.
  • Default: false
  • How to apply: Modify it in the BE/CN configuration file or update it dynamically.

stop_be.sh -g --timeout, stop_cn.sh -g --timeout​

  • Description: Maximum waiting time before BE/CN is force-killed. Set it to a value larger than loop_count_wait_fragments_finish * 10 to prevent termination before the BE/CN wait duration is reached.
  • Default: false
  • How to apply: Specify it in the script command, for example, --timeout 30.

Global Switches​

Graceful Exit is enabled by default from v3.4 onwards. To disable it temporarily, set the BE/CN configuration loop_count_wait_fragments_finish to 0.

Expected Behavior During Graceful Exit​

Query Workloads​

Query TypeExpected Behavior
Short (less than 20 seconds)BE/CN waits long enough, so queries complete normally.
Medium (20 to 60 seconds)Queries completed within BE/CN wait window are returned with success; else queries are cancelled and require manual retry.
Long (more than 60 seconds)Queries are likely terminated by FE/BE/CN due to timeout and requires manual retry.

Data Ingestion Tasks​

  • Loading tasks via Flink or Kafka Connectors are automatically retried with no user-visible interruption.
  • Stream Load (non-framework), Broker Load, and Routine Load tasks may fail if connection breaks. Manual retry is required.
  • Background tasks are automatically re-scheduled and executed by the FE retry mechanism.

Upgrade and Restart Operations​

Graceful Exit ensures:

  • No cluster-wide downtime;
  • Safe rolling upgrade by draining nodes one at a time.

Limitations and Version Differences​

Version Behavior Differences​

VersionBehavior
v3.3BE Graceful Exit flawed: FE may prematurely mark BE/CN as DEAD, causing queries to get cancelled. The effective wait is limited (15 seconds by default).
v3.4+Fully supports extended wait duration; FE correctly identifies BE/CN β€œexiting” state. Recommended for production.

Operational Limitations​

  • In extreme cases (for example, BE/CN hangs), Graceful Exit may fail. Terminating the process requires kill -9, risking partial data persistence (recoverable via snapshot).

Usage​

Prerequisites​

StarRocks version:

  • v3.3+: Basic Graceful Exit support.
  • v3.4+: Enhanced status management, longer wait windows (up to several minutes).

Configuration:

  • Make sure loop_count_wait_fragments_finish is set to a positive integer.
  • Set graceful_exit_wait_for_frontend_heartbeat to true allow FE to detect BE's "EXITING" state.

Perform FE Graceful Exit​

./bin/stop_fe.sh -g --timeout 60

Parameters:

  • --timeout: The maximum time to wait before the FE node is force-killed.

Behavior:

  • The system sends the SIGUSR1 signal first.
  • After timeout, it falls back to SIGKILL.

Validate FE State​

You can check the FE health via the following API:

http://<fe_host>:8030/api/health

LB removes the node after receiving consecutive non-200 responses.

Perform BE/CN Graceful Exit​

  • For v3.3:

    • BE:
    ./be/bin/stop_be.sh -g
    • CN:
    ./be/bin/stop_cn.sh -g
  • For v3.4+:

    • BE:
    ./bin/stop_be.sh -g --timeout 600
    • CN:
    ./bin/stop_cn.sh -g --timeout 600

BE/CN exits immediately if no fragments remain.

Validate BE/CN Status​

Run on FE:

SHOW BACKENDS;

StatusCode:

  • SHUTDOWN: BE/CN Graceful Exit in progress.
  • DISCONNECTED: BE/CN Node has fully exited.

Rolling Upgrade Workflow​

Procedure​

  1. Perform Graceful Exit on the node A.
  2. Confirm the node A is shown as DISCONNECTED from the FE side.
  3. Upgrade and restart the node A.
  4. Repeat the above for remaining nodes.

Monitor Graceful Exit​

Check FE logs fe.log, BE logs be.log, or CN logs cn.log to make sure whether there were tasks during the exit.

Troubleshooting​

BE/CN exits by timeout​

If tasks fail to complete within the Graceful Exit period, BE/CN will trigger forced termination (SIGKILL). Verify whether this is caused by excessively long task duration or improper configurations (for example, an overly small --timeout value).

Node status is not SHUTDOWN​

If the node status is not SHUTDOWN, verify whether loop_count_wait_fragments_finish is set to a positive integer, or if BE/CN reported a heartbeat before exiting (if not, set graceful_exit_wait_for_frontend_heartbeat to true).

Rocky the happy otterStarRocks Assistant

AI generated answers are based on docs and other sources. Please test answers in non-production environments.