Many network providers apply different policies to different kinds of network traffic. For example, T-Mobile's
Binge On program zero-rates network traffic identified as video streaming (i.e., does not charge it against the
monthly data quota) and also throttles this traffic to a maximum of 1.5 Mbps. In general, however, a network
provider does not know which app you are using; it sees only the app's network traffic. As a result, it has to
make educated guesses based on the network traffic that the app generates.
To address this challenge, network providers usually deploy one or more devices (typically called middleboxes)
that perform this mapping between network traffic and applications. Specifically, such middleboxes include a
classification rule that maps network traffic into a specific category, and an action that specifies what
should be done with that category of traffic. Little is known about these classification rules, since
middleboxes use proprietary, closed-source hardware and software.
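
To make the rule/action pairing concrete, the following is a minimal sketch of how such a pairing might be modeled. It is an illustration only: the `Rule` type, the `classify` helper, the substring-matching behavior, and the `video.example` keyword are all assumptions introduced here, not the behavior of any particular middlebox.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Rule:
    field: str     # field inspected, e.g. "Host" or "SNI"
    keyword: str   # substring that triggers the rule (hypothetical value)
    category: str  # traffic category assigned on a match
    action: str    # policy applied to that category


def classify(fields: Dict[str, str], rules: List[Rule]) -> Optional[Rule]:
    """Return the first rule whose keyword occurs in the inspected field."""
    for rule in rules:
        if rule.keyword in fields.get(rule.field, "").lower():
            return rule
    return None


# Hypothetical rule set; real middlebox rules are proprietary.
RULES = [Rule("Host", "video.example", "video", "zero-rate and throttle")]

match = classify({"Host": "cdn.video.example"}, RULES)
print(match.category if match else "unclassified")
```

Under this assumed substring match, a flow with the Host value `notvideo.example` would also be placed in the video category, which is the kind of misclassification that keyword-based rules can produce.
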
In this work, we develop a general approach for identifying the classification rules (i.e., the network
provider's "educated guesses") that map network traffic to applications. Specifically, we use an efficient
binary search and carefully generated flows to reduce the number of tests needed to reverse-engineer the rules.
We also characterize the classification rules for HTTP(S) traffic implemented in today's carrier-grade
middleboxes and identify examples of misclassification (traffic from application A being mistakenly labeled as
application B).
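
The binary-search idea can be sketched as follows. This is not the exact procedure used in this work; it is a simplified illustration that assumes a hypothetical oracle `is_classified(payload)` reporting whether a replayed flow received the differentiated treatment, and assumes the rule is a conjunction of keywords that each appear once in the payload. The function name `find_matching_bytes` and the masking byte are likewise assumptions.

```python
from typing import Callable, List


def find_matching_bytes(
    payload: bytes,
    is_classified: Callable[[bytes], bool],
    mask: int = 0x58,  # ASCII "X"; assumed not to be part of any keyword
) -> List[int]:
    """Return indices of payload bytes that the classifier depends on."""
    if not is_classified(payload):
        return []  # nothing to reverse-engineer for this flow

    found: List[int] = []

    def probe(lo: int, hi: int) -> None:
        # Overwrite bytes [lo, hi) and replay the modified flow.
        masked = payload[:lo] + bytes([mask]) * (hi - lo) + payload[hi:]
        if is_classified(masked):
            return  # no byte in this range matters; skip the whole range
        if hi - lo == 1:
            found.append(lo)  # this single byte is part of a matching field
            return
        mid = (lo + hi) // 2
        probe(lo, mid)
        probe(mid, hi)

    probe(0, len(payload))
    return found


# Stand-in oracle for demonstration; a real test would replay traffic
# through the network and observe throttling or zero-rating.
def oracle(data: bytes) -> bool:
    return b"video.example" in data


flow = b"GET /clip HTTP/1.1\r\nHost: cdn.video.example\r\nUser-Agent: demo\r\n\r\n"
indices = find_matching_bytes(flow, oracle)
print(bytes(flow[i] for i in indices))
```

Because any range whose masking leaves the classification unchanged is discarded in a single test, the number of replays grows with the size of the matched fields and the logarithm of the payload length rather than with the payload length itself.
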
In summary, our analysis shows that different vendors use different matching rules, but all generally focus on a
small number of fields in HTTP(S) traffic.
List of identified keywords in Host headers
List of identified keywords in User-Agent headers
List of identified keywords in Content-Type headers
List of identified keywords in SNI fields