Priority-based Flow Control

The Ethernet standard defines an unreliable layer 2 technology: the nodes of the network do not exchange control messages acknowledging the correct reception of frames. Ethernet also does not define a flow control mechanism at layer 2, leaving this task to upper layers, as mentioned earlier. To fill this gap, another IEEE standard, 802.3x, defined a flow control mechanism for Ethernet.

The Ethernet Flow Control

The flow control proposed by 802.3x is based on MAC PAUSE messages. A PAUSE message stops a sender from transmitting continuously when the receiver cannot process data as fast as it arrives, preventing buffer overflow. The PAUSE message is implemented as a special Ethernet frame addressed to a multicast MAC address, and it carries the time for which the link between the sender and the receiver should be paused. This time is a two-byte unsigned integer measured in pause quanta, where one quantum is the time needed to send 512 bits at the current speed of the link. The rest of the frame payload is padded with 42 bytes so that the frame reaches the minimum Ethernet frame size.
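The frame layout described above can be sketched in a few lines of Python. This is an illustrative sketch, not a reference implementation: the function names are hypothetical, and the constants that the text does not give (the reserved multicast address 01-80-C2-00-00-01, the MAC Control EtherType 0x8808 and the PAUSE opcode 0x0001) are taken from IEEE 802.3 and should be checked against the standard. The preamble and the frame check sequence are left out.

```python
import struct

# Assumed constants from IEEE 802.3 (not stated in the text above).
PAUSE_DST = bytes.fromhex("0180c2000001")  # reserved multicast address
MAC_CONTROL_ETHERTYPE = 0x8808
PAUSE_OPCODE = 0x0001
BITS_PER_QUANTUM = 512  # one pause quantum = time to send 512 bits


def build_pause_frame(src_mac: bytes, quanta: int) -> bytes:
    """Build a PAUSE frame (hypothetical helper, no preamble or FCS)."""
    if len(src_mac) != 6:
        raise ValueError("src_mac must be 6 bytes")
    if not 0 <= quanta <= 0xFFFF:
        raise ValueError("quanta must fit in a two-byte unsigned integer")
    header = PAUSE_DST + src_mac + struct.pack("!H", MAC_CONTROL_ETHERTYPE)
    # Opcode (2 bytes) + pause time (2 bytes) + 42 bytes of zero padding.
    payload = struct.pack("!HH", PAUSE_OPCODE, quanta) + bytes(42)
    return header + payload


def pause_seconds(quanta: int, link_bps: float) -> float:
    """Convert pause quanta to seconds for a given link speed in bit/s."""
    return quanta * BITS_PER_QUANTUM / link_bps


if __name__ == "__main__":
    frame = build_pause_frame(bytes.fromhex("020000000001"), 0xFFFF)
    print(len(frame), "bytes")
    print(pause_seconds(0xFFFF, 10e9), "seconds at 10 Gb/s")
```

Because the quantum is defined in bit times, the same pause value lasts ten times longer on a 1 Gb/s link than on a 10 Gb/s link.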

Although it solved the problem of flow control on Ethernet, this approach had drawbacks for links that carry traffic from multiple applications. Originally, when a link was paused, the sender could not transmit any more frames of any kind. As a result, PAUSE messages made it impossible to offer differentiated quality of service to frames carrying more important data, which is highly desirable in data centers. To meet this need, Priority Flow Control emerged as a way to use Ethernet in data center networks.

Priority on the Ethernet Flow Control

To adapt Ethernet flow control to data center networks, the IEEE Data Center Bridging task group proposed a modification to the PAUSE message. The new message defines eight quality-of-service classes that represent the priority of the data sent in frames. The pause time of each class can be set independently, so the transmission of high-priority data can be favored.

To implement these changes, the payload of the PAUSE message was modified. The payload starts with a Quality of Service vector that marks which priority classes the message applies to. After this vector comes the pause time for each class, each one a two-byte unsigned integer. The rest of the payload is padding.
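As a rough illustration of this payload, the sketch below packs the vector and the eight per-class pause times. The helper name is hypothetical. The opcode 0x0101, the two-byte width of the vector and the 46-byte minimum payload are assumptions taken from the IEEE specifications rather than from the text above.

```python
import struct
from typing import Mapping

PFC_OPCODE = 0x0101      # assumed opcode for priority-based flow control
NUM_CLASSES = 8          # eight classes, as described in the text
MIN_PAYLOAD_BYTES = 46   # assumed minimum Ethernet payload size


def build_pfc_payload(pause_quanta: Mapping[int, int]) -> bytes:
    """Build a PFC payload from {class number: pause time in quanta}.

    Hypothetical helper: classes missing from the mapping get a cleared
    bit in the vector and a pause time of zero.
    """
    enable_vector = 0
    timers = [0] * NUM_CLASSES
    for prio, quanta in pause_quanta.items():
        if not 0 <= prio < NUM_CLASSES:
            raise ValueError("class number must be between 0 and 7")
        if not 0 <= quanta <= 0xFFFF:
            raise ValueError("quanta must fit in a two-byte unsigned integer")
        enable_vector |= 1 << prio   # mark this class as addressed
        timers[prio] = quanta
    # Opcode, vector, then eight two-byte timers, all in network byte order.
    body = struct.pack("!HH8H", PFC_OPCODE, enable_vector, *timers)
    return body + bytes(MIN_PAYLOAD_BYTES - len(body))


if __name__ == "__main__":
    # Pause only class 3 for the maximum time; other classes keep flowing.
    print(build_pfc_payload({3: 0xFFFF}).hex())
```

Only the classes whose bit is set in the vector are affected, which is what lets one class be paused while the others continue to transmit.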

The header of the frames is shown in this image taken from [8]:

The PAUSE message modifies the usual structure of the Ethernet control frame to support both flow control and priority-based flow control.

The Classes of Service are tied to the VLAN tags of the frames. Every Ethernet frame must carry a VLAN tag, even when no VLAN is defined (in that case a default tag is used). To introduce classes of service, the DCB task group therefore decided to carry the Class of Service number in the VLAN tag bits. This approach solves the multiple-traffic issue and lets bandwidth use be optimized for high-priority data.
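To make the link between the VLAN tag and the class number concrete, the sketch below reads the class from a four-byte tag. The field layout is an assumption based on IEEE 802.1Q and is not given in the text above: a two-byte tag identifier 0x8100, followed by three priority bits, one drop-eligible bit and a twelve-bit VLAN identifier. The function name is hypothetical.

```python
import struct

VLAN_TPID = 0x8100  # assumed 802.1Q tag protocol identifier


def parse_vlan_tag(tag: bytes) -> dict:
    """Split a 4-byte VLAN tag into its fields (hypothetical helper)."""
    if len(tag) != 4:
        raise ValueError("a VLAN tag is 4 bytes long")
    tpid, tci = struct.unpack("!HH", tag)
    if tpid != VLAN_TPID:
        raise ValueError("not an 802.1Q tag")
    return {
        "priority": tci >> 13,             # top 3 bits: class of service 0..7
        "drop_eligible": (tci >> 12) & 1,  # next bit
        "vlan_id": tci & 0x0FFF,           # low 12 bits
    }


if __name__ == "__main__":
    tag = struct.pack("!HH", VLAN_TPID, (5 << 13) | 100)
    print(parse_vlan_tag(tag))
```

Three priority bits give exactly eight values, which matches the eight classes handled by the priority-based PAUSE message.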

To visualize these modifications, this image taken from [8] shows PAUSE messages acting on different types of traffic:



The PAUSE message pauses only the flow of the traffic type it targets.