TCP Loss and Latency Measurement per Flow

In this notebook, we'll use the QoF flow meter, the Python ipfix module, and Pandas to explore passive measurement of TCP loss and latency, the applications QoF was built for.

The QoF command we used to create the trace used in this notebook is shown below:

[brian@magpie ~]$ qof --verbose --yaml qof-tcp-biflow.yaml --in mawi-0330-30min.pcap.gz \
                    | gzip > mawi-0330-30min-biflow.ipfix.gz
[2014-06-23 16:00:19] qof 0.9.0 ("Albula") starting
[2014-06-23 16:01:34] Processed 66397589 packets into 6044114 flows:
[2014-06-23 16:01:34]   Mean flow rate 80708.40/s.
[2014-06-23 16:01:34]   Mean packet rate 886621.76/s.
[2014-06-23 16:01:34]   Virtual bandwidth 5456.9622 Mbps.
[2014-06-23 16:01:34]   Maximum flow table size 159126.
[2014-06-23 16:01:34]   579 flush events.
[2014-06-23 16:01:34]   4453487 asymmetric/unidirectional flows detected (73.68%)
[2014-06-23 16:01:34] Assembled 33813 fragments into 16810 packets:
[2014-06-23 16:01:34]   Expired 26 incomplete fragmented packets. (0.00%)
[2014-06-23 16:01:34]   Maximum fragment table size 23.
[2014-06-23 16:01:34] Rejected 65101 packets during decode: (0.10%)
[2014-06-23 16:01:34]   65101 due to incomplete headers: (0.10%)
[2014-06-23 16:01:34]     52931 incomplete IPv6 extension headers. (0.08%)
[2014-06-23 16:01:34]     12170 incomplete transport headers. (0.02%)
[2014-06-23 16:01:34]     (Use a larger snaplen to reduce incomplete headers.)
[2014-06-23 16:01:34] qof terminating

As with the flow introduction notebook, this notebook uses the Pandas data analysis framework to explore a collection of flow data. So first, run the following code to set up the environment:

In [ ]:

import ipfix
import panfix
import gzip
import bz2

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

plt.rcParams['figure.figsize'] = (12, 6)

ipfix.ie.use_iana_default()
ipfix.ie.use_5103_default()         # since we're dealing with RFC 5103 biflows
ipfix.ie.use_specfile("qof.iespec") # to get the QoF enterprise Information Elements
ipfix.types.use_integer_ipv4()

In contrast to the flow introduction notebook, here we're looking only at TCP biflows: flows for which information was seen in both directions, or, in other words, complete connections. We'll use a much longer list of IEs in our dataframe, as well, in order to examine performance-relevant parameters of TCP flows.

By selecting biflow Information Elements and TCP-specific Information Elements, we're telling panfix to ignore all flows that don't contain these Information Elements; this prefilters UDP and one-way flows, and leaves us with a much smaller set of flows to work with.
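The prefiltering idea can be illustrated with a minimal sketch. It assumes each decoded record is a plain dict keyed by Information Element name; that layout and the `has_all_ies` helper are hypothetical stand-ins for illustration, not panfix's actual implementation.

```python
# Hypothetical stand-in for decoded IPFIX records: one dict per flow,
# keyed by Information Element name. This is not panfix's data model.
REQUIRED_IES = ("octetDeltaCount", "reverseOctetDeltaCount", "tcpSequenceCount")

def has_all_ies(record, required=REQUIRED_IES):
    """Return True only if the record carries every requested IE."""
    return all(name in record for name in required)

records = [
    # TCP biflow: forward, reverse, and TCP-specific IEs are all present
    {"octetDeltaCount": 1200, "reverseOctetDeltaCount": 5400, "tcpSequenceCount": 1100},
    # one-way TCP flow: no reverse IEs
    {"octetDeltaCount": 60, "tcpSequenceCount": 0},
    # UDP biflow: no TCP-specific IEs
    {"octetDeltaCount": 80, "reverseOctetDeltaCount": 120},
]

# Keep only records that contain every required IE
kept = [r for r in records if has_all_ies(r)]
print(len(kept), "of", len(records), "records kept")
```

Only the first toy record survives, which is the effect the longer IE list has on the real trace.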

In [ ]:

# Set the name of the IPFIX file to work on here.
ipfix_filename = "../../mawi-0330-30min-biflow.ipfix.gz"
# change to gzip.open, bz2.open or open, as appropriate
ipfix_file_fn = gzip.open
# Change to None for no limit, or set a limit to reduce memory requirements
ipfix_max_flows = 1000000

df = panfix.dataframe_from_ipfix(ipfix_filename, (
                                 "flowStartMilliseconds",           "flowEndMilliseconds", 
                                 "sourceIPv4Address",               "sourceTransportPort",
                                 "destinationIPv4Address",          "destinationTransportPort", 
                                 "protocolIdentifier",              "flowEndReason",
                                 "octetDeltaCount",                 "packetDeltaCount",
                                 "transportOctetDeltaCount",        "transportPacketDeltaCount",
                                 "reverseOctetDeltaCount",          "reversePacketDeltaCount",
                                 "reverseTransportOctetDeltaCount", "reverseTransportPacketDeltaCount",
                                 "tcpSequenceCount",         "reverseTcpSequenceCount", 
                                 "tcpSequenceLossCount",     "reverseTcpSequenceLossCount",
                                 "tcpSequenceJumpCount",     "reverseTcpSequenceJumpCount",
                                 "tcpRetransmitCount",       "reverseTcpRetransmitCount",
                                 "tcpLossEventCount",        "reverseTcpLossEventCount",
                                 "minTcpRttMilliseconds",    "lastTcpRttMilliseconds",
                                 "tcpRttSampleCount"),
                                 count=ipfix_max_flows,
                                 open_fn=ipfix_file_fn)
df = panfix.coerce_timestamps(df)
df = panfix.derive_duration(df)
print(f"Loaded {len(df)} flows.")

Measuring Observation Loss

As discussed in the course (Never Make the Mistake of Thinking You're Measuring What You Think You're Measuring, part one), QoF observes the set of TCP sequence numbers to determine if packets were probably sent but not observed, and stores its estimation of the amount of observation loss per flow in terms of octets in the tcpSequenceLossCount and reverseTcpSequenceLossCount IEs.

Observation loss can result from improperly designed or provisioned measurement infrastructure. In the case of the MAWI data, it primarily occurs because QoF's decoder rejects packets truncated by the snaplen used by the MAWI trace; from the QoF verbose output:

[2014-06-23 16:01:34]   Expired 26 incomplete fragmented packets. (0.00%)
...
[2014-06-23 16:01:34] Rejected 65101 packets during decode: (0.10%)
[2014-06-23 16:01:34]   65101 due to incomplete headers: (0.10%)
[2014-06-23 16:01:34]     52931 incomplete IPv6 extension headers. (0.08%)
[2014-06-23 16:01:34]     12170 incomplete transport headers. (0.02%)
...
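To make the idea of "sent but not observed" concrete, here is a minimal sketch of one way to count it. It assumes an octet counts as unobserved when the receiver acknowledged it but no observed segment covered it. QoF's actual algorithm is not described in this notebook, so the function name, inputs, and logic below are illustrative assumptions, and sequence number wraparound is ignored.

```python
def unobserved_octets(segments, highest_ack):
    """Count octets below highest_ack that no observed segment covers.

    segments: iterable of (seq, length) pairs seen by the observer.
    highest_ack: highest acknowledgment number seen in the reverse direction.
    Assumption: sequence numbers do not wrap within the flow.
    """
    missing = 0
    next_expected = None
    for seq, length in sorted(segments):
        if next_expected is None:
            next_expected = seq  # first observed octet anchors the stream
        if seq > next_expected:
            # A hole in sequence space; count only the acknowledged part of it
            missing += min(seq, highest_ack) - min(next_expected, highest_ack)
        next_expected = max(next_expected, seq + length)
    # Octets acknowledged beyond the last observed segment
    if next_expected is not None and highest_ack > next_expected:
        missing += highest_ack - next_expected
    return missing

# Toy flow: the segment covering octets 2000..2499 was never observed
observed = [(1000, 500), (1500, 500), (2500, 500)]
print(unobserved_octets(observed, highest_ack=3000))
```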

Let's see how much observation loss we're dealing with:

In [ ]:

lossy = (df["tcpSequenceLossCount"] > 0) | (df["reverseTcpSequenceLossCount"] > 0)
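# Fraction of flows with observation loss in either direction
# (the mean of a boolean mask; also works when no flow is lossy)
lossy.mean()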

Less than one percent. As discussed in the course, it introduces bias to drop these lossy flows, so we'll simply call this amount of loss acceptable and continue.

RTT Measurement

As discussed in the course (Per-flow passive TCP performance measurement), QoF measures RTT passively by matching sequence numbers to acknowledgments and timestamps to timestamp echoes, in order to estimate the RTT as would be measured by the sender. This is a fairly noisy measurement, as it also captures endpoint and application delay in addition to network latency, but does so without generating any extra traffic.
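The sequence-to-acknowledgment matching can be sketched as follows. This is a simplified illustration under stated assumptions, not QoF's implementation: it uses only sequence and acknowledgment numbers (no timestamp options), ignores retransmission ambiguity and wraparound, and the function name and input layout are hypothetical.

```python
def rtt_samples(data_segments, acks):
    """Pair each data segment with the first ACK that covers it.

    data_segments: list of (time_s, seq, length) in the forward direction.
    acks: list of (time_s, ack_number) in the reverse direction, time-ordered.
    Returns one RTT sample in milliseconds per matched segment.
    """
    samples = []
    for sent_at, seq, length in data_segments:
        end_seq = seq + length  # first sequence number after this segment
        for acked_at, ack in acks:
            if acked_at >= sent_at and ack >= end_seq:
                samples.append((acked_at - sent_at) * 1000.0)
                break
    return samples

# Toy trace with made-up times (seconds) and sequence numbers
segments = [(0.000, 1, 100), (0.010, 101, 100)]
acks = [(0.050, 101), (0.062, 201)]
samples = rtt_samples(segments, acks)
print(samples)
```

Seen from the middle of the path, a pairing like this covers only the portion of the round trip between the observation point and the receiver, so a full estimate would also need the matching component from the other direction.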

Presently, QoF exports this information in two IEs: minTcpRttMilliseconds, containing the minimum of all smoothed RTT estimates over the flow's lifetime, and lastTcpRttMilliseconds, containing the final smoothed RTT estimate. The former aims to provide an upper bound for network latency along the path(s) that the flow took.
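A minimal sketch of how a minimum and a final smoothed estimate relate to a series of raw RTT samples is shown here. The smoothing rule is an assumption: an exponentially weighted moving average with the RFC 6298 gain of 1/8. The notebook does not state which smoothing QoF applies, and `summarize_rtt` is a hypothetical helper.

```python
def summarize_rtt(samples_ms, alpha=0.125):
    """Return (min_smoothed, last_smoothed) over a flow's RTT samples.

    alpha=1/8 is the RFC 6298 smoothing gain, used here as an assumption.
    Returns (None, None) when there are no samples.
    """
    smoothed = None
    min_smoothed = None
    for sample in samples_ms:
        if smoothed is None:
            smoothed = float(sample)  # first sample initializes the estimate
        else:
            smoothed = (1 - alpha) * smoothed + alpha * sample
        if min_smoothed is None or smoothed < min_smoothed:
            min_smoothed = smoothed
    return min_smoothed, smoothed

# Toy samples in milliseconds: the smoothed series is 100.0, 95.0, 100.625
min_rtt, last_rtt = summarize_rtt([100, 60, 140])
print(min_rtt, last_rtt)
```

Because smoothing damps single low samples, the minimum of the smoothed series (95.0 here) sits above the lowest raw sample (60).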

Let's have a look at estimated network latency for all TCP flows. We clamp the range to 500ms, as there is a very long tail of flows that have insufficient samples for accurate measurement.

In [ ]:

df['minTcpRttMilliseconds'].hist(bins=250, range=(0,500))
plt.xlabel("RTT ms")
plt.ylabel("flows")

Here we see peaks around 15ms, 40ms, 115ms, and 285ms. This broadly makes sense, as this traffic was taken from a transpacific link: the 115ms peak represents Asia-US traffic, for instance, and 285ms Asia-Europe via the US.

We can also weight the RTTs by number of packets in the flow:

In [ ]:

df['minTcpRttMilliseconds'].hist(bins=250, range=(0,500), 
                                 weights=df['packetDeltaCount'] + df['reversePacketDeltaCount'])
plt.xlabel("RTT ms")
plt.ylabel("packets")

Here, we see that the highest volume flows tend to have lower RTTs. This is a natural outcome of TCP congestion control: since the RTT is the fundamental frequency of the TCP control loop, longer-RTT flows will find less bandwidth and be crowded out by shorter-RTT flows. Indeed, looking at RTT versus data rate in two dimensions confirms this:
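The notebook does not give a model for this effect. One widely used simplification, brought in here as an assumption, is the Mathis et al. approximation for steady-state TCP throughput, rate ≈ (MSS / RTT) · C / √p, where p is the loss probability and C ≈ √(3/2). The sketch below uses made-up values for MSS and loss probability to show how the rate scales inversely with RTT.

```python
import math

def mathis_rate_bps(mss_bytes, rtt_s, loss_prob, c=math.sqrt(1.5)):
    """Approximate steady-state TCP throughput in bits per second.

    Simplified Mathis et al. model; all parameter values used below are
    illustrative assumptions, not measurements from the MAWI trace.
    """
    return (mss_bytes * 8 / rtt_s) * (c / math.sqrt(loss_prob))

# Same MSS and loss probability, RTTs taken from two of the histogram peaks
short_rtt = mathis_rate_bps(mss_bytes=1460, rtt_s=0.015, loss_prob=1e-4)
long_rtt = mathis_rate_bps(mss_bytes=1460, rtt_s=0.285, loss_prob=1e-4)
ratio = short_rtt / long_rtt
print(f"15 ms flow gets about {ratio:.0f}x the rate of the 285 ms flow")
```

Under this model, with equal loss, the 15 ms flow achieves 19 times the rate of the 285 ms flow, simply the ratio of the two RTTs.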

In [ ]:

# Hexbin of data rate against minimum RTT; each cell is shaded by the sum of column `by`.
def plot_rate_rtt(df, by="packetDeltaCount", filename=None):
    plt.figure(figsize=(9,7))
    plt.hexbin(x = df["minTcpRttMilliseconds"],
           y = ((df["octetDeltaCount"] + df["reverseOctetDeltaCount"]) * 8) / (df["durationSeconds"] + 0.001), 
           C = df[by],
           reduce_C_function = np.sum,
           yscale='log',
           bins='log',
           cmap = plt.cm.binary)
    cb = plt.colorbar()
    cb.set_label("log10("+by+")")
    plt.xlabel("RTT ms")
    plt.ylabel("data rate (bps)")
    if filename:
        plt.savefig(filename)

In [ ]:

plot_rate_rtt(df[df["minTcpRttMilliseconds"] < 500], by="packetDeltaCount")

In an operational environment, we could use these RTT measurements to determine the delay between pairs of networks, using each flow to refine the estimate; however, because the MAWI data is anonymized, this won't work here. More interesting would be looking at changes in RTT over time, but we don't have enough data to show that here, either.
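For non-anonymized data, the per-pair refinement could look something like the sketch below. The choices are assumptions for illustration: networks are taken to be /24 prefixes, the refinement rule is a running minimum, and `network_of` and `refine_pair_rtt` are hypothetical helpers. Addresses are integers, as they are when `ipfix.types.use_integer_ipv4()` is in effect.

```python
import ipaddress

def network_of(addr_int, prefix_len=24):
    """Map an integer IPv4 address to its enclosing prefix (assumed /24)."""
    addr = ipaddress.ip_address(addr_int)
    return ipaddress.ip_network(f"{addr}/{prefix_len}", strict=False)

def refine_pair_rtt(flows, prefix_len=24):
    """Keep the lowest minimum RTT seen per (source net, destination net).

    flows: iterable of (src_int, dst_int, min_rtt_ms) tuples.
    """
    best = {}
    for src, dst, min_rtt_ms in flows:
        key = (network_of(src, prefix_len), network_of(dst, prefix_len))
        if key not in best or min_rtt_ms < best[key]:
            best[key] = min_rtt_ms
    return best

def ip(text):
    """Convert dotted-quad text to the integer form used in the dataframe."""
    return int(ipaddress.ip_address(text))

# Toy flows using documentation address ranges and made-up RTTs
flows = [
    (ip("192.0.2.10"), ip("198.51.100.7"), 120),
    (ip("192.0.2.99"), ip("198.51.100.200"), 115),
    (ip("192.0.2.10"), ip("203.0.113.5"), 40),
]
pair_rtt = refine_pair_rtt(flows)
for (src_net, dst_net), rtt in pair_rtt.items():
    print(src_net, "->", dst_net, rtt, "ms")
```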

Efficiency and Loss Event Measurement

We can measure loss per flow in two ways: in terms of efficiency (how many bytes were sent by the application versus how many were seen on the wire, effectively counting the proportion of "wasted" traffic), and in terms of loss events (as in the talk, detections of retransmissions or sequence number gaps per RTT). Let's look at efficiency first. To keep from dividing by zero, we need to discard empty flows (flows that had no application layer content, e.g. because the connection was refused) and aberrant flows (certain flows that apparently had more bytes sent by the application than seen on the wire, e.g. due to observation loss):

In [ ]:

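# Empty flows: no sequence-space (application) octets in either direction
empty = (df["tcpSequenceCount"] + df["reverseTcpSequenceCount"]) == 0
# Aberrant flows: more application octets than transport octets on the wire
aberrant = ((df["tcpSequenceCount"] + df["reverseTcpSequenceCount"]) >
            (df["transportOctetDeltaCount"] + df["reverseTransportOctetDeltaCount"]))
eff_df = df[~aberrant & ~empty]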
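As a worked illustration of the efficiency definition with made-up numbers, the sketch below computes the ratio for a single flow and applies the same two exclusions as the filters above. The `efficiency` helper is hypothetical and exists only for this example.

```python
def efficiency(seq_octets, wire_octets):
    """Application octets (sequence space) per transport octet on the wire.

    Returns None for empty flows (no application octets) and for aberrant
    flows (more application octets than wire octets), mirroring the filters.
    """
    if seq_octets == 0 or seq_octets > wire_octets:
        return None
    return seq_octets / wire_octets

# 10000 octets on the wire carried 9900 octets of new sequence space,
# so 100 octets (1%) were "wasted", e.g. on retransmissions
print(efficiency(9900, 10000))

# Empty and aberrant flows are excluded rather than scored
print(efficiency(0, 0), efficiency(10500, 10000))
```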

In [ ]:

((eff_df["tcpSequenceCount"] + eff_df["reverseTcpSequenceCount"]) / 
 (eff_df["transportOctetDeltaCount"] + eff_df["reverseTransportOctetDeltaCount"])
 ).hist(weights=(eff_df["packetDeltaCount"] + eff_df["reversePacketDeltaCount"]), 
        range=(0.95,1.00), bins=250)
plt.xlabel("efficiency")
plt.ylabel("packets")

Here we see that most packets are sent in flows that are at least 99.8% efficient. Indeed, almost 90% of flows see no loss at all:

In [ ]:

noloss_df = eff_df[eff_df["tcpLossEventCount"] + eff_df["reverseTcpLossEventCount"] == 0]
loss_df = eff_df[eff_df["tcpLossEventCount"] + eff_df["reverseTcpLossEventCount"] > 0]
len(noloss_df) / len(eff_df)

Flows usually see no loss because they weren't long enough to probe the maximum bandwidth available. To check this, let's look at the difference in durations between lossless flows and those experiencing loss:

In [ ]:

noloss_df["durationSeconds"].describe()

In [ ]:

loss_df["durationSeconds"].describe()

We can examine this in more detail by plotting loss event counts by flow duration:

In [ ]:

def plot_loss_duration(df, by="packetDeltaCount", filename=None):
    plt.figure(figsize=(9,7))
    plt.hexbin(x = df["durationSeconds"],
           y = df["tcpLossEventCount"] + df["reverseTcpLossEventCount"], 
           C = df[by],
           reduce_C_function = np.sum,
           bins='log',
           cmap = plt.cm.binary)
    cb = plt.colorbar()
    cb.set_label("log10("+by+")")
    plt.xlabel("duration (s)")
    plt.ylabel("loss count")
    if filename:
        plt.savefig(filename)

In [ ]:

plot_loss_duration(eff_df)
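A tabular view can complement the hexbin plot. The sketch below bins flows by duration and summarizes loss events per bin. It runs on a made-up toy dataframe, and the `lossEvents` column is a hypothetical stand-in for the sum of tcpLossEventCount and reverseTcpLossEventCount; the bin edges are arbitrary assumptions.

```python
import pandas as pd

# Toy data standing in for eff_df; values are invented for illustration
toy = pd.DataFrame({
    "durationSeconds": [0.2, 0.8, 3.0, 12.0, 45.0, 90.0],
    "lossEvents":      [0,   0,   1,   0,    4,    7],
})

# Assign each flow to a duration bin: (0, 1], (1, 10], (10, 100] seconds
bins = [0, 1, 10, 100]
toy["durationBin"] = pd.cut(toy["durationSeconds"], bins=bins)

# Number of flows and mean loss events per duration bin
summary = (toy.groupby("durationBin", observed=False)["lossEvents"]
              .agg(["count", "mean"]))
print(summary)
```

On the real data, the same grouping applied to eff_df would show whether mean loss event counts rise with duration, as the duration summaries above suggest.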