Monday, February 6, 2012

Data Center TCP


Data Center TCP (DCTCP) by M. Alizadeh, A. Greenberg, D. Maltz, J. Padhye, P. Patel, B. Prabhakar, S. Sengupta, and M. Sridharan.

This paper aims to achieve high burst tolerance, low latency, and high throughput with commodity switches by modifying the traditional Transmission Control Protocol (TCP) and optimizing it for data center traffic. The authors name this modified protocol "Data Center TCP (DCTCP)".

Challenges

This protocol targets three main performance issues that are common in a production cluster: the incast problem, queue buildup, and buffer pressure.

1. Incast Problem: If a large number of flows converge on the same interface of a switch over a short period of time, the packets may exhaust either the switch memory or the maximum permitted buffer for that interface, resulting in packet losses. This can occur even for flows of small size.

2. Queue Buildup: If long and short flows traverse the same queue, the packets of the short flows see increased latency even when no packets are lost, because they have to wait behind the queue built up by the packets of the long flows.

3. Buffer Pressure: If a few ports of a switch are overloaded with traffic, there is a direct impact on the traffic traversing other ports of the same switch. The explanation for this effect is that the ports of a switch draw on a shared memory pool, which gets occupied by the overloaded ports and causes packet loss on the other ports.
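Buffer pressure is easy to see in a toy model. The sketch below is not how any real switch allocates memory (real switches typically apply per-port limits on top of the shared pool); it assumes a single first-come, first-served pool, and the class name, port names, and packet counts are made up for illustration.

```python
class SharedBufferSwitch:
    """Toy switch whose ports all draw packets from one shared memory pool."""

    def __init__(self, pool_pkts):
        self.pool_pkts = pool_pkts  # total buffer space, in packets (assumed unit)
        self.queues = {}            # port name -> packets currently queued

    def used(self):
        return sum(self.queues.values())

    def enqueue(self, port, pkts):
        """Queue up to `pkts` packets on `port`; return how many were dropped."""
        free = self.pool_pkts - self.used()
        accepted = min(pkts, free)
        self.queues[port] = self.queues.get(port, 0) + accepted
        return pkts - accepted


switch = SharedBufferSwitch(pool_pkts=100)
switch.enqueue("overloaded_port", 95)        # a long flow fills most of the pool
dropped = switch.enqueue("quiet_port", 10)   # a small burst on another port
print(dropped)                               # packets lost on the quiet port
```

The quiet port loses packets even though its own queue was empty, which is the effect described in item 3.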

DCTCP Algorithm

In order to tackle these issues, DCTCP uses a simple marking scheme at switches that sets the Congestion Experienced (CE) codepoint of packets as soon as the buffer occupancy exceeds a small fixed threshold, K. The receiver of the packet checks this flag and echoes it in the ACK in order to notify the sender about the congestion. The sender then reduces its window size in proportion to the fraction of ACKs received with the flag set. This approach differs from regular TCP, which cuts its window size by a factor of 2 whenever it detects congestion. By reacting in proportion to the amount of congestion, DCTCP tries to get maximum throughput without incurring any packet loss.
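The summary above does not spell out the update rule, so here is a minimal sketch of it as I understand it from the paper: the sender keeps a running estimate alpha of the fraction of marked packets, updated once per window as alpha = (1 - g) * alpha + g * F, and on congestion it shrinks the window to cwnd * (1 - alpha / 2). The function and class names, the initial window, the one-packet-per-window growth step, and the choice g = 1/16 are assumptions made for illustration; this is not the authors' implementation.

```python
def switch_marks(queue_len_pkts, k_threshold_pkts):
    """Switch side: mark CE when the instantaneous queue length exceeds K."""
    return queue_len_pkts > k_threshold_pkts


class DctcpSender:
    """Sender side: scale the window cut by the estimated extent of congestion."""

    def __init__(self, cwnd=10.0, g=1.0 / 16.0):
        self.cwnd = cwnd    # congestion window, in packets
        self.g = g          # weight given to the newest sample (assumed value)
        self.alpha = 0.0    # running estimate of the fraction of marked packets

    def on_window_acked(self, acked, marked):
        """Call once per window with the ACK count and how many echoed a mark."""
        if acked <= 0:
            return
        frac = marked / acked
        self.alpha = (1.0 - self.g) * self.alpha + self.g * frac
        if marked > 0:
            # Cut in proportion to congestion; alpha == 1 gives TCP's halving.
            self.cwnd = max(1.0, self.cwnd * (1.0 - self.alpha / 2.0))
        else:
            # No marks seen: grow by one packet per window (simplified).
            self.cwnd += 1.0


sender = DctcpSender(cwnd=100.0)
sender.on_window_acked(acked=10, marked=10)   # one fully marked window
print(sender.alpha, sender.cwnd)
```

With a small alpha the window shrinks only slightly, and only sustained marking drives the cut toward the factor-of-2 reduction that regular TCP applies every time.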

Results

DCTCP reduces the impact of the incast problem. However, when a large number of synchronized small flows hit the same queue, even a single packet from each flow can be enough to overflow the buffer. In that case there isn't much DCTCP can do to avoid packet drops.

Since DCTCP senders start reacting as soon as the queue length on an interface exceeds the marking threshold K, the queue buildup issue is resolved.

DCTCP also solves the buffer pressure problem, because a congested port's queue length doesn't grow exceedingly large.

The results show that a data center using DCTCP could handle query responses and background flows that are each 10 times larger, while still performing better than it does with TCP today.
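The arithmetic behind that limit is simple. The sketch below assumes the worst case, where every flow's burst arrives at the same instant, the queue starts empty, and nothing drains during the burst; the function name and all numbers are hypothetical.

```python
def incast_drops(num_flows, pkts_per_flow, buffer_pkts):
    """Packets dropped when all bursts arrive at once at an empty queue."""
    arriving = num_flows * pkts_per_flow
    return max(0, arriving - buffer_pkts)


print(incast_drops(num_flows=40, pkts_per_flow=1, buffer_pkts=100))
print(incast_drops(num_flows=150, pkts_per_flow=1, buffer_pkts=100))
```

Once the number of flows exceeds the buffer size in packets, drops happen before any sender has received a single ACK, so no feedback-based window adjustment can prevent them.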

Reviewer's comments

This paper is very interesting and promising, as DCTCP performs really well in test results based on real-world workloads. Implementing DCTCP takes only 30 lines of code change to TCP, which is quite amazing considering the performance gain it could achieve. The authors mention in the paper that this protocol might not be suitable for the WAN, and it would be interesting to see how it performs in the presence of regular TCP traffic. Overall the authors have done a very good job.
