Add packet trimming API
When a packet is lost, it can be recovered through fast retransmission
(e.g., Go-Back-N in RoCE) or by using timeouts. Retransmission triggered
by timeouts typically incurs significant latency. Packet trimming aims
to facilitate rapid packet loss notification and, consequently,
eliminate slow timeout-based retransmissions.

Signed-off-by: Marian Pritsak <[email protected]>
marian-pritsak committed Jan 17, 2025
1 parent 1443ba8 commit ec74c26
Showing 6 changed files with 288 additions and 1 deletion.
166 changes: 166 additions & 0 deletions doc/SAI-Proposal-Packet-Trimming.md
@@ -0,0 +1,166 @@
# Switch Abstraction Interface Change Proposal for Packet Trimming

Title | Packet Trimming
------------|----------------
Authors | Nvidia
Status | In review
Type | Standards track
Created | 8/28/2024
SAI-Version | 1.14
----------

## Overview
When a lossy queue exceeds its buffer threshold, it drops packets without notifying the destination host.

When a packet is lost, it can be recovered through fast retransmission (e.g., Go-Back-N in RoCE) or by using timeouts. Retransmission triggered by timeouts typically incurs significant latency. Packet trimming aims to facilitate rapid packet loss notification and, consequently, eliminate slow timeout-based retransmissions.

To help the host recover data more quickly and accurately, we introduce a packet trimming feature. When a packet fails admission to the shared buffer,
the switch trims it to a configured size and tries to send it on a different queue, delivering a packet drop notification to the end host.

```
                                       ┌────────────────┐
                                       │ Trimmed packet │
                                       └────────────────┘
                                       ┌─┬─┬─┬─┬────────┐
                           ┌──────────►│ │ │ │ │        │ Queue
                           │           └─┴─┴─┴─┴────────┘
┌────────┐  ┌──────────┐   │    \ /    ┌─┬─┬─┬─┬─┬─┬─┬─┬┐
│ Packet │  │ Pipeline ├───┴─────X────►│ │ │ │ │ │ │ │ ││ Queue
└────────┘  └──────────┘        / \    └─┴─┴─┴─┴─┴─┴─┴─┴┘
```

This feature assumes that forwarding tables are configured properly, and the original packet would be delivered to the destination successfully if not for the congestion.

## Spec
There is a tradeoff between trying to configure a higher threshold in a queue buffer profile and trimming the packet.

If the user chooses to configure higher thresholds for queues, the probability of a drop on a particular queue is lower only if other ports are less congested at the moment.

However, if all the ports are equally utilized, it makes sense to create a different buffer profile for these queues, with a stricter threshold, to improve fairness in the shared buffer.

A static trimming threshold may not be effective with shared buffer switches, where the buffer resources allocated to a queue or port can vary over time. Therefore, we propose adding a new attribute to a buffer profile to allow configuring packet trimming on such stricter profiles:
```
/**
* @brief Enum defining queue actions in case the packet fails to pass the admission control.
*/
typedef enum _sai_buffer_profile_packet_admission_fail_action_t
{
/**
* @brief Drop the packet.
*
* Default action. Packet has nowhere to go
* and will be dropped.
*/
SAI_BUFFER_PROFILE_PACKET_ADMISSION_FAIL_ACTION_DROP,
/**
* @brief Trim the packet.
*
* Try sending a shortened packet over a different
* queue. The original packet will be dropped and a trimmed copy of the packet will be sent.
* The IP length and checksum fields will be updated in the trimmed copy.
* SAI_QUEUE_STAT_DROPPED_PACKETS as well as SAI_QUEUE_STAT_DROPPED_BYTES
* will count the original discarded frames even if they are trimmed afterwards.
* Interface statistics must show dropped packets.
* Interface statistics may show sent trimmed packets.
*/
SAI_BUFFER_PROFILE_PACKET_ADMISSION_FAIL_ACTION_DROP_AND_TRIM,
} sai_buffer_profile_packet_admission_fail_action_t;
```
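The `DROP_AND_TRIM` description says the trimmed copy carries updated IP length and checksum fields. The following is a minimal sketch of what that rewrite could look like for IPv4, not the behavior of any particular device. The helper names `ipv4_header_checksum` and `trim_ipv4_packet` are hypothetical, and the sketch assumes the buffer starts at the IPv4 header and that the trim size is counted from that point; the proposal does not state whether the trim size includes L2 headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One's-complement checksum over an IPv4 header (RFC 1071 style).
 * The checksum field must be zeroed before computing a new value. */
static uint16_t ipv4_header_checksum(const uint8_t *hdr, size_t hdr_len)
{
    uint32_t sum = 0;
    for (size_t i = 0; i + 1 < hdr_len; i += 2) {
        sum += (uint32_t)((hdr[i] << 8) | hdr[i + 1]);
    }
    while (sum >> 16) {
        sum = (sum & 0xFFFFu) + (sum >> 16);  /* fold carries back in */
    }
    return (uint16_t)~sum;
}

/* Hypothetical helper (not part of SAI): trims an IPv4 packet in place to at
 * most trim_size bytes and returns the new length.
 * Assumption: pkt points at the IPv4 header and trim_size is measured from it. */
static size_t trim_ipv4_packet(uint8_t *pkt, size_t len, size_t trim_size)
{
    assert(pkt != NULL);
    assert(len >= 20 && len <= 0xFFFF);

    size_t ihl = (size_t)(pkt[0] & 0x0F) * 4;  /* header length in bytes */
    assert(ihl >= 20 && ihl <= len);

    if (len <= trim_size) {
        return len;  /* already short enough, leave untouched */
    }

    /* Never cut into the IP header itself. */
    size_t new_len = trim_size < ihl ? ihl : trim_size;

    /* Update Total Length (bytes 2-3, network byte order). */
    pkt[2] = (uint8_t)(new_len >> 8);
    pkt[3] = (uint8_t)(new_len & 0xFF);

    /* Recompute Header Checksum (bytes 10-11). */
    pkt[10] = 0;
    pkt[11] = 0;
    uint16_t csum = ipv4_header_checksum(pkt, ihl);
    pkt[10] = (uint8_t)(csum >> 8);
    pkt[11] = (uint8_t)(csum & 0xFF);

    return new_len;
}
```

A receiver can verify the result by running `ipv4_header_checksum` over the rewritten header with the checksum field left in place, which yields zero for a valid header.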
```
/**
* @brief Buffer profile discard action
*
* Action to be taken upon packet discard due to
* buffer profile configuration. Applicable only
* when attached to a queue.
*
* @type sai_buffer_profile_packet_admission_fail_action_t
* @flags CREATE_AND_SET
* @default SAI_BUFFER_PROFILE_PACKET_ADMISSION_FAIL_ACTION_DROP
*/
SAI_BUFFER_PROFILE_ATTR_PACKET_ADMISSION_FAIL_ACTION,
```

Trimming engine attributes are configured globally.
```
/**
* @brief Trim packets to this size to reduce bandwidth
*
* @type sai_uint32_t
* @flags CREATE_AND_SET
* @default 128
*/
SAI_SWITCH_ATTR_PACKET_TRIM_SIZE,
/**
* @brief New packet trim DSCP value
*
* @type sai_uint8_t
* @flags CREATE_AND_SET
* @default 0
*/
SAI_SWITCH_ATTR_PACKET_TRIM_DSCP_VALUE,
/**
* @brief Queue mapping mode for a trimmed packet
*
* @type sai_packet_trim_queue_resolution_mode_t
* @flags CREATE_AND_SET
* @default SAI_PACKET_TRIM_QUEUE_RESOLUTION_MODE_STATIC
*/
SAI_SWITCH_ATTR_PACKET_TRIM_QUEUE_RESOLUTION_MODE,
/**
* @brief New packet trim queue index
*
* @type sai_uint8_t
* @flags CREATE_AND_SET
* @default 0
* @validonly SAI_SWITCH_ATTR_PACKET_TRIM_QUEUE_RESOLUTION_MODE == SAI_PACKET_TRIM_QUEUE_RESOLUTION_MODE_STATIC
*/
SAI_SWITCH_ATTR_PACKET_TRIM_QUEUE_INDEX,
```
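The header change in `inc/saiswitch.h` describes two queue resolution modes: in static mode the application sets the queue for the trimmed packet directly, and in dynamic mode the queue is resolved through QoS maps applied to the new DSCP value. A rough model of that selection is sketched below. All types and names here are local stand-ins rather than the real SAI definitions, and the flat `dscp_to_queue` table is an assumption that collapses whatever QoS map chain a device actually uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-in for sai_packet_trim_queue_resolution_mode_t. */
typedef enum {
    TRIM_QUEUE_RESOLUTION_STATIC,
    TRIM_QUEUE_RESOLUTION_DYNAMIC
} trim_queue_resolution_mode_t;

/* Local stand-in for the four global trimming attributes. */
typedef struct {
    uint32_t trim_size;                 /* models PACKET_TRIM_SIZE */
    uint8_t dscp_value;                 /* models PACKET_TRIM_DSCP_VALUE */
    trim_queue_resolution_mode_t mode;  /* models PACKET_TRIM_QUEUE_RESOLUTION_MODE */
    uint8_t queue_index;                /* models PACKET_TRIM_QUEUE_INDEX (static mode only) */
} trim_config_t;

/* Hypothetical resolver: picks the egress queue for a trimmed packet.
 * dscp_to_queue is an assumed, pre-flattened QoS lookup with 64 entries. */
static uint8_t resolve_trim_queue(const trim_config_t *cfg,
                                  const uint8_t dscp_to_queue[64])
{
    assert(cfg != NULL && dscp_to_queue != NULL);
    assert(cfg->dscp_value < 64);  /* DSCP is a 6-bit field */

    if (cfg->mode == TRIM_QUEUE_RESOLUTION_STATIC) {
        return cfg->queue_index;   /* queue chosen directly by the application */
    }
    return dscp_to_queue[cfg->dscp_value];  /* queue derived from the new DSCP */
}
```

In this model the queue index is simply ignored in dynamic mode, which mirrors the `@validonly` condition on `SAI_SWITCH_ATTR_PACKET_TRIM_QUEUE_INDEX`.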

If more granularity is needed (e.g., trimming only a specific protocol, or only certain packets within a protocol), an ACL action is added that disables trimming for matching packets, even when they would otherwise be eligible because their queue has a buffer profile with trimming enabled.
```
/**
* @brief Disable packet trim for a given match condition.
*
* This rule takes effect only when packet trim is configured on a buffer profile of a queue to which a packet belongs.
*
* @type sai_acl_action_data_t bool
* @flags CREATE_AND_SET
* @default disabled
*/
SAI_ACL_ENTRY_ATTR_ACTION_PACKET_TRIM_DISABLE = SAI_ACL_ENTRY_ATTR_ACTION_START + 0x39,
```

Both the queue and the port have a packet counter that reflects the number of trimmed packets.
```
/** Packets trimmed due to failed shared buffer admission [uint64_t] */
SAI_PORT_STAT_TRIM_PACKETS,
```
```
/** Packets trimmed due to failed admission [uint64_t] */
SAI_QUEUE_STAT_TRIM_PACKETS = 0x00000028,
```
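To tie the pieces together, here is a small model of how the buffer profile action, the ACL disable action, and the counters might interact when a packet fails admission. It is only a sketch under stated assumptions: the types and the function `on_admission_failure` are hypothetical stand-ins, only queue-level counters are modeled, and the trim counter is assumed to increment when a trimmed copy is generated (the proposal does not say whether it counts attempts or successful transmissions).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-in for sai_buffer_profile_packet_admission_fail_action_t. */
typedef enum {
    ADMISSION_FAIL_ACTION_DROP,
    ADMISSION_FAIL_ACTION_DROP_AND_TRIM
} admission_fail_action_t;

/* Local stand-in for the relevant queue statistics. */
typedef struct {
    uint64_t dropped_packets;  /* models SAI_QUEUE_STAT_DROPPED_PACKETS */
    uint64_t dropped_bytes;    /* models SAI_QUEUE_STAT_DROPPED_BYTES */
    uint64_t trim_packets;     /* models SAI_QUEUE_STAT_TRIM_PACKETS */
} queue_stats_t;

/* Hypothetical handler for a packet that failed buffer admission.
 * Returns true when a trimmed copy should be generated. */
static bool on_admission_failure(queue_stats_t *stats,
                                 admission_fail_action_t action,
                                 bool acl_trim_disable,
                                 size_t packet_len)
{
    assert(stats != NULL);

    /* The original packet is always counted as dropped, even if trimmed. */
    stats->dropped_packets += 1;
    stats->dropped_bytes += packet_len;

    /* Trimming needs the profile action and no matching ACL disable rule. */
    if (action != ADMISSION_FAIL_ACTION_DROP_AND_TRIM || acl_trim_disable) {
        return false;
    }

    stats->trim_packets += 1;  /* assumption: counted when the copy is generated */
    return true;
}
```

A port-level counter such as `SAI_PORT_STAT_TRIM_PACKETS` would presumably be updated alongside the queue counter in the same way.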
16 changes: 15 additions & 1 deletion inc/saiacl.h
@@ -288,6 +288,9 @@ typedef enum _sai_acl_action_type_t

/** Next Chain Group */
SAI_ACL_ACTION_TYPE_CHAIN_REDIRECT = 0x00000038,

/** Disable packet trim */
SAI_ACL_ACTION_TYPE_PACKET_TRIM_DISABLE = 0x00000039,
} sai_acl_action_type_t;

/**
@@ -3289,10 +3292,21 @@ typedef enum _sai_acl_entry_attr_t
*/
SAI_ACL_ENTRY_ATTR_ACTION_CHAIN_REDIRECT = SAI_ACL_ENTRY_ATTR_ACTION_START + 0x38,

/**
* @brief Disable packet trim for a given match condition.
*
* This rule takes effect only when packet trim is configured on a buffer profile of a queue to which a packet belongs.
*
* @type sai_acl_action_data_t bool
* @flags CREATE_AND_SET
* @default disabled
*/
SAI_ACL_ENTRY_ATTR_ACTION_PACKET_TRIM_DISABLE = SAI_ACL_ENTRY_ATTR_ACTION_START + 0x39,

/**
* @brief End of Rule Actions
*/
SAI_ACL_ENTRY_ATTR_ACTION_END = SAI_ACL_ENTRY_ATTR_ACTION_CHAIN_REDIRECT,
SAI_ACL_ENTRY_ATTR_ACTION_END = SAI_ACL_ENTRY_ATTR_ACTION_PACKET_TRIM_DISABLE,

/**
* @brief End of ACL Entry attributes
40 changes: 40 additions & 0 deletions inc/saibuffer.h
@@ -590,6 +590,33 @@ typedef enum _sai_buffer_profile_threshold_mode_t

} sai_buffer_profile_threshold_mode_t;

/**
* @brief Enum defining queue actions in case the packet fails to pass the admission control.
*/
typedef enum _sai_buffer_profile_packet_admission_fail_action_t
{
/**
* @brief Drop the packet.
*
* Default action. Packet has nowhere to go
* and will be dropped.
*/
SAI_BUFFER_PROFILE_PACKET_ADMISSION_FAIL_ACTION_DROP,

/**
* @brief Trim the packet.
*
* Try sending a shortened packet over a different
* queue. The original packet will be dropped and a trimmed copy of the packet will be sent.
* The IP length and checksum fields will be updated in the trimmed copy.
* SAI_QUEUE_STAT_DROPPED_PACKETS as well as SAI_QUEUE_STAT_DROPPED_BYTES
* will count the original discarded frames even if they are trimmed afterwards.
* Interface statistics must show dropped packets.
* Interface statistics may show sent trimmed packets.
*/
SAI_BUFFER_PROFILE_PACKET_ADMISSION_FAIL_ACTION_DROP_AND_TRIM,
} sai_buffer_profile_packet_admission_fail_action_t;

/**
* @brief Enum defining buffer profile attributes.
*/
@@ -711,6 +738,19 @@ typedef enum _sai_buffer_profile_attr_t
*/
SAI_BUFFER_PROFILE_ATTR_XON_OFFSET_TH,

/**
* @brief Buffer profile discard action
*
* Action to be taken upon packet discard due to
* buffer profile configuration. Applicable only
* when attached to a queue.
*
* @type sai_buffer_profile_packet_admission_fail_action_t
* @flags CREATE_AND_SET
* @default SAI_BUFFER_PROFILE_PACKET_ADMISSION_FAIL_ACTION_DROP
*/
SAI_BUFFER_PROFILE_ATTR_PACKET_ADMISSION_FAIL_ACTION,

/**
* @brief End of attributes
*/
3 changes: 3 additions & 0 deletions inc/saiport.h
@@ -3352,6 +3352,9 @@ typedef enum _sai_port_stat_t
/** Count of total bits corrected by FEC. Counter will increment monotonically. */
SAI_PORT_STAT_IF_IN_FEC_CORRECTED_BITS,

/** Packets trimmed due to failed shared buffer admission [uint64_t] */
SAI_PORT_STAT_TRIM_PACKETS,

/** Port stat in drop reasons range start */
SAI_PORT_STAT_IN_DROP_REASON_RANGE_BASE = 0x00001000,

3 changes: 3 additions & 0 deletions inc/saiqueue.h
@@ -418,6 +418,9 @@ typedef enum _sai_queue_stat_t
/** Queue delay watermark in nanoseconds [uint64_t] */
SAI_QUEUE_STAT_DELAY_WATERMARK_NS = 0x00000027,

/** Packets trimmed due to failed admission [uint64_t] */
SAI_QUEUE_STAT_TRIM_PACKETS = 0x00000028,

/** Custom range base value */
SAI_QUEUE_STAT_CUSTOM_RANGE_BASE = 0x10000000

61 changes: 61 additions & 0 deletions inc/saiswitch.h
@@ -610,6 +610,30 @@ typedef enum _sai_switch_hostif_oper_status_update_mode_t

} sai_switch_hostif_oper_status_update_mode_t;

/**
* @brief Attribute data for SAI_SWITCH_ATTR_PACKET_TRIM_QUEUE_RESOLUTION_MODE.
*/
typedef enum _sai_packet_trim_queue_resolution_mode_t
{
/**
* @brief Static queue resolution.
*
* In this mode, a new queue for the trimmed packet is set directly
* by the application.
*/
SAI_PACKET_TRIM_QUEUE_RESOLUTION_MODE_STATIC,

/**
* @brief Dynamic queue resolution.
*
* In this mode, a new queue for the trimmed packet is resolved
* using QOS maps, applied to a new DSCP value that was provided
* for a trimmed packet.
*/
SAI_PACKET_TRIM_QUEUE_RESOLUTION_MODE_DYNAMIC,

} sai_packet_trim_queue_resolution_mode_t;

/**
* @brief Attribute Id in sai_set_switch_attribute() and
* sai_get_switch_attribute() calls.
@@ -3124,6 +3148,43 @@ typedef enum _sai_switch_attr_t
*/
SAI_SWITCH_ATTR_TAM_TEL_TYPE_CONFIG_CHANGE_NOTIFY,

/**
* @brief Trim packets to this size to reduce bandwidth
*
* @type sai_uint32_t
* @flags CREATE_AND_SET
* @default 128
*/
SAI_SWITCH_ATTR_PACKET_TRIM_SIZE,

/**
* @brief New packet trim DSCP value
*
* @type sai_uint8_t
* @flags CREATE_AND_SET
* @default 0
*/
SAI_SWITCH_ATTR_PACKET_TRIM_DSCP_VALUE,

/**
* @brief Queue mapping mode for a trimmed packet
*
* @type sai_packet_trim_queue_resolution_mode_t
* @flags CREATE_AND_SET
* @default SAI_PACKET_TRIM_QUEUE_RESOLUTION_MODE_STATIC
*/
SAI_SWITCH_ATTR_PACKET_TRIM_QUEUE_RESOLUTION_MODE,

/**
* @brief New packet trim queue index
*
* @type sai_uint8_t
* @flags CREATE_AND_SET
* @default 0
* @validonly SAI_SWITCH_ATTR_PACKET_TRIM_QUEUE_RESOLUTION_MODE == SAI_PACKET_TRIM_QUEUE_RESOLUTION_MODE_STATIC
*/
SAI_SWITCH_ATTR_PACKET_TRIM_QUEUE_INDEX,

/**
* @brief End of attributes
*/
