Commit scopes v5
Commit scopes are rules which determine how transaction commits and conflicts are handled within a PGD system.
Commit scopes are manipulated by the bdr.add_commit_scope
, bdr.alter_commit_scope
and bdr.remove_commit_scope
functions.
Commit scope syntax
The overall grammar for commit scope rules is composed as follows:
where node_group
is the name of a PGD data node group.
Commit scope groups
ANY
Example: ANY 2 (left_dc)
A transaction under this commit scope group will be considered commited if any two nodes in the left_dc group confirm they have processed the transaction.
ANY NOT
Example: ANY 2 NOT (left_dc)
A transaction under this commit scope group will be considered commited if any two nodes that are not in the left_dc group confirm they have processed the transaction.
MAJORITY
Example: MAJORITY (left_dc)
A transaction under this commit scope group will be considered commited if a majority of the nodes in the left_dc group confirm they have processed the transaction.
MAJORITY NOT
Example: MAJORITY NOT (left_dc)
A transaction under this commit scope group will be considered commited if a majority of the nodes that are not in the left_dc group confirm they have processed the transaction.
ALL
Example: ALL (left_dc)
A transaction under this commit scope group will be considered commited if all of the nodes in the left_dc group confirm they have processed the transaction.
ALL NOT
Example: ALL NOT (left_dc)
A transaction under this commit scope group will be considered commited if all of the nodes that are not in the left_dc group confirm they have processed the transaction.
Confirmation level
The confirmation level sets the point in time when a remote PGD node confirms that it has reached a particular point in processing a transaction.
ON received
A transaction will be confirmed immediately after receiving it, prior to starting the local application.
ON replicated
A transaction will be confirmed after applying changes of the transaction but before flushing them to disk.
ON durable
A transaction will be confirmed after all of its changes are flushed to disk.
ON visible
This is the default visibility. A transaction will be confirmed after all of its changes are flushed to disk and it's visible to concurrent transactions.
Commit Scope kinds
More details of the commit scope kinds and details of their parameters:
Parameter values
Boolean, enum, int and interval values are specified using the Postgres GUC Parameter value conventions.
GROUP COMMIT
Allows commits to be confirmed by a consensus of nodes and controls conflict resolution settings.
GROUP COMMIT parameters
Parameter | Type | Default | Description |
---|---|---|---|
transaction_tracking | Boolean | Off/False | Specifies whether to track status of transaction. See transaction_tracking settings |
conflict_resolution | enum | async | Specifies how to handle conflicts. (async |eager ). See conflict_resolution settings |
commit_decision | enum | group | Specifies how the COMMIT decision is made. (group |partner |raft ). See commit_decision settings |
ABORT ON parameters
Parameter | Type | Default | Description |
---|---|---|---|
timeout | interval | 0 | Timeout in milliseconds (accepts other units). (0 means not set) |
require_write_lead | bool | false | CAMO only. If set, then for a transaction to switch to local (async) mode, a consensus request will be required. |
transaction_tracking settings
When set to true, two phase commit transactions will:
- Lookup commit decisions when a writer is processing a PREPARE message.
- When recovering from an interruption, lookup the transactions prepared before the interruption. When found it will then lookup the commit scope of the transaction and any corresponding RAFT commit decision. If the node is the origin of the transaction and doesn't have a RAFT commit decision, and transaction_tracking in on in the commit scope, it will periodically look for a RAFT commit decision for this unresolved transaction until it is committed or aborted.
conflict_resolution settings
The value async
means resolve conflicts asynchronously during replication using the conflict resolution policy.
The value eager
means that conflicts are resolved eagerly during COMMIT by aborting one of the conflicting transactions.
See Group Commit/Conflict Resolution
commit_decision settings
The value group
means the preceding commit_scope_group
specification also affects the COMMIT decision, not just durability.
The value partner
means the partner node decides whether transactions can be committed. This value is allowed only on groups with 2 data nodes.
The value raft
means the COMMIT decision is done using Raft consensus of all the nodes in the cluster, independent of any commit_scope_group
consensus. This option has low performance and is for backwards compatibility only.
See Group Commit/Commit Decision
CAMO
Enables protection, with the client's cooperation, to prevent multiple insertions of the same transaction in failover scenarios.
See CAMO in the Durability guide for more details.
Degrade On parameters
Allows degrading to asynchronous operation on timeout.
Parameter | Type | Default | Description |
---|---|---|---|
timeout | interval | 0 | Timeout in milliseconds (accepts other units) after which operation becomes asynchronous. (0 means not set) |
require_write_lead | Boolean | False | Specifies whether the node must be a write lead to be able to switch to asynchronous mode. |
LAG CONTROL
Allows the configuration of dynamic rate-limiting controlled by replication lag.
See Lag Control in the Durability guide for more details.
LAG CONTROL parameters
Parameter | Type | Default | Description |
---|---|---|---|
max_lag_size | int | 0 | The maximum lag in kB that a given node can have in the replication connection to another node. When the lag exceeds this maximum scaled by max_commit_delay , lag control adjusts the commit delay. |
max_lag_time | interval | 0 | The maximum replication lag in milliseconds that the given origin can have wrt to a replication connection to a given downstream node. |
max_commit_delay | interval | 0 | Configures the maximum delay each commit can take, in fractional milliseconds. If set to 0, it disables lag control. After each commit delay adjustment (e.g. if the replication is lagging more than max_lag_size or max_lag_time ), the commit delay is recalculated with the weight of the bdr.lag_control_commit_delay_adjust GUC. The max_commit_delay is a ceiling for the commit delay. |
- If
max_lag_size
andmax_lag_time
are set to 0, the LAG CONTROL is disabled. - If
max_commit_delay
is not set or set to 0, the LAG CONTROL is disabled.
The lag size is derived from the delta of the send_ptr of the walsender to the apply_ptr of the receiver.
The lag time is calculated according to the following formula:
Where lag_size
is the delta between the send_ptr and apply_ptr (as used for
max_lag_size
) and apply_rate
is a weighted exponential moving average,
following the simplified formula:
Where:
prev_apply_rate
was the previously configuredapply_rate
, before recalculating the new rateapply_rate_weight
is the value of the GUCbdr.lag_tracker_apply_rate_weight
apply_ptr_diff
is the difference between the current apply_ptr and the apply_ptr at the point in time which the apply rate was last computeddiff_secs
is the delta in seconds from the last time the apply rate was calculated
SYNCHRONOUS_COMMIT
The SYNCHRONOUS_COMMIT
commit scope kind has no parameters. It is effectively configured only by the commit scope group and commit scope visibility.