LBNL-40319
UCB//CSD-97-945

Measurements and Analysis of End-to-End Internet Dynamics

Vern Paxson

Ph.D. Thesis

Computer Science Division
University of California, Berkeley
and
Information and Computing Sciences Division
Lawrence Berkeley National Laboratory
University of California
Berkeley, CA 94720

April, 1997

This work was supported by the Director, Office of Energy Research, Scientific Computing Staff, of the United States Department of Energy under Contract No. DE-AC03-76SF00098.

Measurements and Analysis of End-to-End Internet Dynamics

by

Vern Edward Paxson

B.S. (Stanford University) 1985
M.S. (University of California, Berkeley) 1991

A dissertation submitted in partial satisfaction of the requirements for the degree of Doctor of Philosophy in Computer Science in the GRADUATE DIVISION of the UNIVERSITY of CALIFORNIA at BERKELEY

Committee in charge:
Prof. Domenico Ferrari, Chair
Prof. Michael Luby
Prof. John Rice

1997

Measurements and Analysis of End-to-End Internet Dynamics
Copyright 1997 by Vern Edward Paxson

The U.S. Department of Energy has the right to use this document for any purpose whatsoever, including the right to reproduce all or any part thereof.

Abstract

Measurements and Analysis of End-to-End Internet Dynamics
by Vern Edward Paxson
Doctor of Philosophy in Computer Science
University of California at Berkeley
Prof. Domenico Ferrari, Chair

Accurately characterizing end-to-end Internet dynamics---the performance that a user actually obtains from the lengthy series of network links that comprise a path through the Internet---is exceptionally difficult, due to the network's immense heterogeneity. It can be impossible to gauge the generality of findings based on measurements of a handful of paths, yet logistically it has proven very difficult to obtain end-to-end measurements on larger scales.

At the heart of our work is a ``measurement framework'' we devised in which a number of sites around the Internet host a specialized measurement service. By coordinating ``probes'' between pairs of these sites we can measure end-to-end behavior along O(N^2) paths for a framework consisting of N sites. Consequently, we obtain a superlinear scaling that allows us to measure a rich cross-section of Internet behavior without requiring huge numbers of observation points. In all, 37 sites participated in our study, allowing us to measure more than 1,000 distinct Internet paths.

The first part of our work looks at the behavior of end-to-end routing: the series of routers over which a connection's packets travel. Based on 40,000 measurements made using our framework, we analyze: routing ``pathologies'' such as loops, outages, and flutter; the stability of routes over time; and the symmetry of routing along the two directions of an end-to-end path. We find that pathologies increased significantly over the course of 1995, indicating that, by one metric, routing degraded over the year; that Internet paths are heavily dominated by a single route, but that routing lifetimes range from seconds to many days, with most lasting for days; and that, at the end of 1995, about half of all Internet paths included a major routing asymmetry.

The second part of our work studies end-to-end Internet packet dynamics. We analyze 20,000 TCP transfers of 100 Kbyte each to investigate the performance of both the TCP endpoints and the Internet paths.
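To make the framework's scaling concrete: N participating sites yield N(N-1) ordered source/destination pairs, so 37 sites correspond to 37 x 36 = 1332 directed paths, consistent with the ``more than 1,000 distinct Internet paths'' figure above. The following minimal sketch (with hypothetical site names standing in for the actual participants) enumerates such pairs:

    from itertools import permutations

    def measurable_paths(sites):
        """Return all ordered (source, destination) pairs among the given sites.

        With N sites this yields N * (N - 1) directed end-to-end paths,
        which is the superlinear O(N^2) scaling the framework relies on.
        """
        return list(permutations(sites, 2))

    # Hypothetical site names standing in for the 37 actual participants.
    sites = ["site%02d" % i for i in range(37)]
    paths = measurable_paths(sites)
    print(len(paths))  # 37 * 36 = 1332 directed paths

In practice not every pair yields a usable measurement, which is consistent with the measured total falling somewhat below this bound.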
The measurements used for this part of our study are much richer than those for the first part, but require a great degree of attention to issues of calibration, which we address by applying self-consistency checks to the measurements whenever possible. We find that packet filters are capable of a wide range of measurement errors, some of which, if undetected, can significantly taint subsequent analysis. We further find that network clocks exhibit adjustments and skews relative to other clocks frequently enough that a failure to detect and remove these effects will likewise pollute subsequent packet timing analysis.

Using TCP transfers for our network path ``measurement probes'' gains a number of advantages, the chief of which is the ability to probe fine time scales without unduly loading the network. However, using TCP also requires us to accurately distinguish between connection dynamics due to the behavior of the TCP endpoints, and dynamics due to the behavior of the network path between them. To address this problem, we develop an analysis program, tcpanaly, that has coded into it knowledge of how the different TCP implementations in our study function. In the process of developing tcpanaly, we thus in tandem develop detailed descriptions of the performance and congestion-avoidance behavior of the different implementations. We find that some of the implementations suffer from gross problems, the most serious of which would devastate overall Internet performance, were the implementations ubiquitously deployed.

With the measurements calibrated and the TCP behavior understood, we then can turn to analyzing the dynamics of Internet paths. We first need to determine a path's bottleneck bandwidth, meaning the fastest transfer rate the path can sustain. Knowing the bottleneck bandwidth then lets us determine which of the packets a sender transmits must necessarily queue behind their predecessors, due to the load the sender imposes on the path. This in turn allows us to determine which of our probes are perforce correlated. We identify several problems with the existing bottleneck estimation technique, ``packet pair,'' and devise a robust estimation algorithm, PBM (``packet bunch modes''), that addresses these difficulties. We calibrate PBM by gauging the degree to which the bottleneck rates it estimates accord with known link speeds, and find good agreement. We then characterize the scope of Internet path bottleneck rates, finding wide variation, not infrequent asymmetries, but considerable stability over time.

We next turn to an analysis of packet loss along Internet paths. To do so, we distinguish between losses of ``loaded'' data packets, meaning those which necessarily queued behind a predecessor at the bottleneck; ``unloaded'' data packets, which did not do so; and the small ``acknowledgement'' packets returned to a TCP sender by the TCP receiver. We find that network paths are well characterized by two general states, ``quiescent,'' in which no loss occurs, and ``busy,'' in which one or more packets of a connection are lost. The prevalence of quiescent connections remained about 50% in both our datasets, but for busy connections, packet loss rates increased significantly over the course of 1995. We further find that loss rates vary dramatically between different regions of the network, with European and especially trans-Atlantic connections faring much worse than those confined to the United States.
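A minimal sketch of the two-state characterization just described (using made-up per-connection loss counts rather than the measured datasets): classify each connection as ``quiescent'' or ``busy'' and compute the loss rate conditioned on the connection being busy.

    def summarize_loss(per_connection_losses):
        """Two-state summary of per-connection packet loss.

        per_connection_losses: list of (packets_lost, packets_sent) tuples,
        one per connection.  Returns the fraction of quiescent (zero-loss)
        connections and the aggregate loss rate among busy connections.
        """
        total = len(per_connection_losses)
        quiescent = sum(1 for lost, _ in per_connection_losses if lost == 0)
        busy = [(lost, sent) for lost, sent in per_connection_losses if lost > 0]
        busy_lost = sum(lost for lost, _ in busy)
        busy_sent = sum(sent for _, sent in busy)
        return (quiescent / total,
                busy_lost / busy_sent if busy_sent else 0.0)

    # Illustrative numbers only, not drawn from the measured traces.
    connections = [(0, 200), (0, 200), (5, 200), (11, 200), (0, 200), (2, 200)]
    frac_quiescent, busy_loss_rate = summarize_loss(connections)
    print(round(frac_quiescent, 2), round(busy_loss_rate, 3))  # 0.5 0.03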
We also characterize: loss symmetry, finding that loss rates along the two directions of an Internet path are nearly uncorrelated; loss ``outages,'' finding that outage durations exhibit clear Pareto distributions, indicating they span a large range of time scales; the degree to which a connection's loss patterns predict those of future connections, finding that observing quiescence is a good predictor of observing quiescence in the future, and likewise for observing a busy network path, but that the proportion of lost packets does not well predict the future proportion; and the efficacy of TCP implementations in dealing with loss efficiently, by retransmitting only when necessary. We find that most TCPs retransmit fairly efficiently, and that deploying the proposed ``selective acknowledgement'' option would eliminate almost all of their remaining unnecessary retransmissions. However, some TCPs incorrectly determine how long to wait before retransmitting, and these can suffer large numbers of unnecessary retransmissions.

We finish our study with a look at variations in packet transit delays. We find great ``peak-to-peak'' variation, meaning that maximum delays far exceed minimum delays. Delay variations along the two directions of an Internet path are only lightly correlated, but correlate well with loss rates observed in the same direction along the path. We identify three types of ``timing compression,'' in which packets arrive at their receiver spaced more closely together than when originally transmitted. None of the three is prevalent enough to significantly perturb network performance, but all three occur frequently enough to require judicious filtering by network measurement procedures to avoid deriving false timing conclusions.

We then look at the question of the time scales on which most of a path's queueing variations occur. We find that, overall, most variation occurs on time scales of 100--1000 msec, which means that transport connections might effectively adapt their transmission to the variations, but only if they act quickly. However, as with many Internet path properties, we find wide ranges of behavior, with not insignificant queueing variations occurring on time scales as small as 10 msec and as large as one minute.

The last aspect of packet delay variation we investigate is the degree to which it reflects an Internet path's available bandwidth. We show that the ratio between the delay variations packets incur due to their connection's own loading of the network, versus the total delay variations incurred, correlates well with the connection's overall throughput. We further find that Internet paths exhibit wide variation in available bandwidth, ranging from very little available to virtually all. The degree of available bandwidth diminished markedly over the course of 1995, but, as for packet loss rates, we also find sharp geographic differences, so the overall trend cannot be summarized in completely simple terms. Finally, we investigate the degree to which the available bandwidth observed by a connection accurately predicts that observed by future connections, finding that the predictive power is fairly good for time scales of minutes to hours, but diminishes significantly for longer time periods.
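One way to see the heavy-tailed character of the loss ``outage'' durations described above is to plot their complementary distribution on log-log axes; Pareto-distributed durations then fall near a straight line whose slope gives the shape parameter. The sketch below (using synthetic Pareto data purely for illustration, not the measured outages) shows the basic computation:

    import math
    import random

    def pareto_shape_estimate(durations):
        """Estimate a Pareto shape parameter from positive outage durations.

        Fits a least-squares line to log10(P[X > x]) versus log10(x) over the
        empirical complementary distribution; for Pareto data the points lie
        near a line of slope -alpha.
        """
        xs = sorted(durations)
        n = len(xs)
        pts = [(math.log10(x), math.log10(1.0 - i / n)) for i, x in enumerate(xs)]
        mean_x = sum(p[0] for p in pts) / n
        mean_y = sum(p[1] for p in pts) / n
        slope = (sum((x - mean_x) * (y - mean_y) for x, y in pts) /
                 sum((x - mean_x) ** 2 for x, _ in pts))
        return -slope

    # Synthetic Pareto(alpha = 1.2) outage durations, for illustration only.
    random.seed(1)
    outages = [0.2 * random.paretovariate(1.2) for _ in range(2000)]
    print(round(pareto_shape_estimate(outages), 2))  # roughly recovers alpha = 1.2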
We argue that our work supports several general themes:

- The N^2 scaling property of our measurement framework serves to measure a sufficiently diverse set of Internet paths that we might plausibly interpret the resulting analysis as accurately reflecting general Internet behavior.

- Coping with such large-scale measurements requires attention to calibration using self-consistency checks; robust statistics to avoid skewing by outliers; and automated ``micro-analysis,'' such as that performed by tcpanaly, so that we might see the forest as well as the trees.

- With due diligence to remove packet filter errors and TCP effects, TCP-based measurement provides a viable means for assessing end-to-end packet dynamics.

- We find wide ranges of behavior, so we must exercise great caution in regarding any aspect of packet dynamics as ``typical.''

- Some common assumptions such as in-order packet delivery, FIFO bottleneck queueing, independent loss events, single congestion time scales, and path symmetries are all sometimes violated.

- The combination of path asymmetries and reverse-path noise renders sender-only measurement techniques markedly inferior to those that include receiver cooperation.

Finally, we believe an important aspect of this work is how it might contribute towards developing a ``measurement infrastructure'' for the Internet: one that proves ubiquitous, informative, and sound.

To Lindsay --- For making it both possible and worthwhile --- with all my love

Contents

List of Figures xi
List of Tables xvi
1 Introduction 1

I End-to-End Routing Behavior in the Internet 4

2 Overview of the Routing Study 5

3 Related Research 8
3.1 Studies of routing protocols 8
3.2 Studies of routing behavior 8
3.3 End-to-end routing dynamics 9
3.4 Routing in the Internet 10

4 Methodology 12
4.1 Experimental apparatus 12
4.2 The traceroute Utility 13
4.2.1 The Time To Live field 13
4.2.2 How traceroute works 14
4.2.3 Traceroute limitations 15
4.3 Exponential sampling 18
4.4 Which observations are representative?
: : : : : : : : : : : : : : : : : : : : : : 19 4.5 Testing for significant differences : : : : : : : : : : : : : : : : : : : : : : : : : 20 4.6 A note on independence : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 22 5 The Raw Routing Data 23 5.1 Participating sites : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 23 5.2 Measurement breakdown : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 27 5.3 Geography : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 30 v 6 Routing Pathologies 34 6.1 Unresponsive routers : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 34 6.2 Rate­limiting routers : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 35 6.3 Routing loops : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 35 6.3.1 Persistent routing loops : : : : : : : : : : : : : : : : : : : : : : : : : : 36 6.3.2 Temporary routing loops : : : : : : : : : : : : : : : : : : : : : : : : : : 41 6.3.3 Location of routing loops : : : : : : : : : : : : : : : : : : : : : : : : : 44 6.4 Erroneous routing : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 44 6.5 Connectivity altered mid­stream : : : : : : : : : : : : : : : : : : : : : : : : : : 45 6.6 Fluttering : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 49 6.6.1 A simple example : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 49 6.6.2 A more dramatic example : : : : : : : : : : : : : : : : : : : : : : : : : 50 6.6.3 Fluttering at another site : : : : : : : : : : : : : : : : : : : : : : : : : : 55 6.6.4 Skipping : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 56 6.6.5 Significance of fluttering : : : : : : : : : : : : : : : : : : : : : : : : : : 57 6.7 Unreachability : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 58 6.7.1 Host down : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 58 6.7.2 Stub network outage : : : : : : : : : : : : : : : : : : : : : : : : : : : : 58 6.7.3 Infrastructure failure : : : : : : : : : : : : : : : : : : : : : : : : : : : : 60 6.7.4 Consistently unreachable hosts : : : : : : : : : : : : : : : : : : : : : : : 61 6.7.5 Unreachable due to too many hops : : : : : : : : : : : : : : : : : : : : : 61 6.8 Temporary outages : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 62 6.9 Circuitous routing : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 64 6.10 Summary : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 69 7 End­to­End Routing Stability 71 7.1 Importance of routing stability : : : : : : : : : : : : : : : : : : : : : : : : : : : 71 7.2 Why routes change : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 73 7.3 Two definitions of stability : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 74 7.4 Reducing the data : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 75 7.5 Routing Prevalence : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 77 7.6 Routing Persistence : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 82 7.6.1 Rapid route alternation : : : : : : : : : : : : : : : : : : : : : : : : : : : 82 7.6.2 Medium­scale route alternation : : : : : : : : : : : : : : : : : : : : : : 86 7.6.3 Large­scale route alternation : : : : : : : : : : : : : : : : : : : : : : : : 86 7.6.4 Duration of long­lived routes : : : : : : : : : : : : : : : : : : : : : : : : 87 7.6.5 Summary of routing 
persistence : : : : : : : : : : : : : : : : : : : : : : 88 7.7 Detecting route changes : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 89 8 Routing Symmetry 92 8.1 Importance of routing symmetry : : : : : : : : : : : : : : : : : : : : : : : : : : 92 8.2 Sources of routing asymmetries : : : : : : : : : : : : : : : : : : : : : : : : : : 93 8.3 Definition of routing symmetry : : : : : : : : : : : : : : : : : : : : : : : : : : : 95 8.4 Analysis of routing symmetry : : : : : : : : : : : : : : : : : : : : : : : : : : : 97 8.5 Increasing prevalence of asymmetry : : : : : : : : : : : : : : : : : : : : : : : : 98 vi 8.6 Size of asymmetries : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 98 II End­to­End Internet Packet Dynamics 101 9 Overview of the Packet Dynamics Study 102 9.1 Methodology : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 103 9.1.1 Measurement considerations : : : : : : : : : : : : : : : : : : : : : : : : 103 9.1.2 Using TCP : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 104 9.1.3 Tracing at both sender and receiver : : : : : : : : : : : : : : : : : : : : 106 9.1.4 Analysis strategies : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 107 9.2 An overview of TCP : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 109 9.2.1 Data delivery goals : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 109 9.2.2 Achieving high performance : : : : : : : : : : : : : : : : : : : : : : : : 110 9.2.3 Congestion control : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 112 9.2.4 Slow start : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 113 9.2.5 Self­clocking : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 114 9.2.6 Responding to congestion : : : : : : : : : : : : : : : : : : : : : : : : : 117 9.2.7 Fast retransmit and recovery : : : : : : : : : : : : : : : : : : : : : : : : 119 9.3 The Raw Measurements : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 122 10 Calibrating Packet Filters 125 10.1 The notion of ``wire time'' : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 125 10.2 How packet filters work : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 126 10.3 Packet filter errors : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 127 10.3.1 Drops : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 128 10.3.2 Packet drop reports : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 128 10.3.3 Inferring filter drops : : : : : : : : : : : : : : : : : : : : : : : : : : : : 129 10.3.4 Trace truncation : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 131 10.3.5 Additions : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 131 10.3.6 Resequencing : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 133 10.3.7 Timing : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 135 10.3.8 Misfiltering : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 137 10.4 Packet filter ``vantage point'' : : : : : : : : : : : : : : : : : : : : : : : : : : : : 138 10.5 Pairing packet departures and arrivals : : : : : : : : : : : : : : : : : : : : : : : 139 11 Analyzing TCP Behavior 142 11.1 Analysis strategy : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 142 11.2 Checking packet and measurement integrity : : : : : : : : : : : : : : : : : : : : 145 11.3 Sender analysis : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 146 
11.3.1 Data liberations : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 147 11.3.2 Inferring sender windows : : : : : : : : : : : : : : : : : : : : : : : : : 149 11.3.3 Inferring source quenches : : : : : : : : : : : : : : : : : : : : : : : : : 149 11.3.4 Inferring initial ssthresh : : : : : : : : : : : : : : : : : : : : : : : : : : 151 11.4 Receiver analysis : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 151 vii 11.4.1 Ack obligations : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 151 11.4.2 Inferring checksum errors : : : : : : : : : : : : : : : : : : : : : : : : : 153 11.5 Sender behavior of different TCP implementations : : : : : : : : : : : : : : : : : 155 11.5.1 Previous studies of TCP implementations : : : : : : : : : : : : : : : : : 156 11.5.2 Generic Tahoe behavior : : : : : : : : : : : : : : : : : : : : : : : : : : 158 11.5.3 Generic Reno behavior : : : : : : : : : : : : : : : : : : : : : : : : : : : 158 11.5.4 BSDI TCP : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 159 11.5.5 Digital OSF/1 TCP : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 161 11.5.6 HP/UX TCP : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 161 11.5.7 IRIX TCP : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 162 11.5.8 Linux TCP : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 162 11.5.9 NetBSD TCP : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 164 11.5.10 Solaris TCP : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 165 11.5.11 SunOS TCP : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 168 11.5.12 VJ TCP : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 168 11.6 Receiver behavior of different TCP implementations : : : : : : : : : : : : : : : : 169 11.6.1 Acking in­sequence data : : : : : : : : : : : : : : : : : : : : : : : : : : 169 11.6.2 Acking out­of­sequence data : : : : : : : : : : : : : : : : : : : : : : : : 175 11.6.3 Gratuitous acks : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 176 11.6.4 Response delays : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 177 11.7 Behavior of additional TCP implementations : : : : : : : : : : : : : : : : : : : 179 11.7.1 Windows NT TCP : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 180 11.7.2 Windows 95 TCP : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 180 11.7.3 Trumpet/Winsock TCP : : : : : : : : : : : : : : : : : : : : : : : : : : : 181 12 Calibrating Pairs of Clocks 185 12.1 Basic clock terminology : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 185 12.1.1 Resolution : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 186 12.1.2 Offset : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 186 12.1.3 Accuracy : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 186 12.1.4 Skew and drift : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 186 12.2 Lack of synchronized clocks : : : : : : : : : : : : : : : : : : : : : : : : : : : : 187 12.3 Terminology for comparing clocks : : : : : : : : : : : : : : : : : : : : : : : : : 187 12.4 Assessing clock resolution : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 189 12.4.1 Method for assessing resolution : : : : : : : : : : : : : : : : : : : : : : 189 12.4.2 Results of assessing resolution : : : : : : : : : : : : : : : : : : : : : : : 190 12.5 Assessing relative clock offset : : : : : : : : : : : : : : : : : : : : : : : : : 
: : 191 12.5.1 Method for assessing relative offset : : : : : : : : : : : : : : : : : : : : 191 12.5.2 Relative offset for full­sized sender packets : : : : : : : : : : : : : : : : 193 12.5.3 Results of assessing relative offset : : : : : : : : : : : : : : : : : : : : : 193 12.6 Detecting clock adjustments : : : : : : : : : : : : : : : : : : : : : : : : : : : : 201 12.6.1 A graphical technique for detecting adjustments : : : : : : : : : : : : : : 201 12.6.2 Removing noise from OTT measurements : : : : : : : : : : : : : : : : : 203 12.6.3 An algorithm for detecting adjustments : : : : : : : : : : : : : : : : : : 204 12.6.4 Results of checking for adjustments : : : : : : : : : : : : : : : : : : : : 207 viii 12.6.5 Problems with detection method : : : : : : : : : : : : : : : : : : : : : : 207 12.6.6 Detecting adjustments via correlation : : : : : : : : : : : : : : : : : : : 212 12.7 Assessing relative clock skew : : : : : : : : : : : : : : : : : : : : : : : : : : : 213 12.7.1 Defining canonical sender/receiver skew : : : : : : : : : : : : : : : : : : 215 12.7.2 Difficulties with noise : : : : : : : : : : : : : : : : : : : : : : : : : : : 216 12.7.3 Failure of line­fitting approaches : : : : : : : : : : : : : : : : : : : : : : 218 12.7.4 A test based on cumulative minima : : : : : : : : : : : : : : : : : : : : 218 12.7.5 Applying the test to a positive trend : : : : : : : : : : : : : : : : : : : : 220 12.7.6 Identifying skew trends : : : : : : : : : : : : : : : : : : : : : : : : : : 220 12.7.7 Results of checking for skew : : : : : : : : : : : : : : : : : : : : : : : : 222 12.7.8 oce's puzzling dynamics : : : : : : : : : : : : : : : : : : : : : : : : : 224 12.7.9 Removing relative skew : : : : : : : : : : : : : : : : : : : : : : : : : : 227 12.8 Additional clock consistency checks : : : : : : : : : : : : : : : : : : : : : : : : 228 12.8.1 Non­positive min­RTT sr : : : : : : : : : : : : : : : : : : : : : : : : : : 228 12.8.2 Gap analysis : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 229 12.9 Clock synchronization vs. 
stability : : : : : : : : : : : : : : : : : : : : : : : : : 230 13 Network Pathologies 232 13.1 Out­of­order delivery : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 232 13.1.1 Detecting out­of­order delivery : : : : : : : : : : : : : : : : : : : : : : 233 13.1.2 Results of out­of­order analysis : : : : : : : : : : : : : : : : : : : : : : 233 13.1.3 Impact of reordering : : : : : : : : : : : : : : : : : : : : : : : : : : : : 237 13.2 Packet replication : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 245 13.3 Packet corruption : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 248 14 Bottleneck Bandwidth 252 14.1 Bottleneck bandwidth as a fundamental quantity : : : : : : : : : : : : : : : : : : 252 14.2 Packet pair : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 254 14.3 Receiver­based packet pair : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 256 14.4 Difficulties with packet pair : : : : : : : : : : : : : : : : : : : : : : : : : : : : 257 14.4.1 Out­of­order delivery : : : : : : : : : : : : : : : : : : : : : : : : : : : 257 14.4.2 Limitations due to clock resolution : : : : : : : : : : : : : : : : : : : : 258 14.4.3 Changes in bottleneck bandwidth : : : : : : : : : : : : : : : : : : : : : 260 14.4.4 Multi­channel bottleneck links : : : : : : : : : : : : : : : : : : : : : : : 261 14.5 Peak rate estimation : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 263 14.6 Robust bottleneck estimation : : : : : : : : : : : : : : : : : : : : : : : : : : : : 266 14.6.1 Forming estimates for each ``extent'' : : : : : : : : : : : : : : : : : : : : 267 14.6.2 Searching for bottleneck bandwidth modes : : : : : : : : : : : : : : : : 269 14.7 Analysis of bottleneck bandwidths in the Internet : : : : : : : : : : : : : : : : : 274 14.7.1 Single bottlenecks : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 275 14.7.2 Bottleneck changes : : : : : : : : : : : : : : : : : : : : : : : : : : : : 282 14.7.3 Multi­channel bottlenecks : : : : : : : : : : : : : : : : : : : : : : : : : 284 14.7.4 Estimation errors due to TCP behavior : : : : : : : : : : : : : : : : : : : 286 14.8 Efficacy of other estimation techniques : : : : : : : : : : : : : : : : : : : : : : : 287 14.8.1 Efficacy of PR : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 287 ix 14.8.2 Efficacy of RBPP : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 288 14.8.3 Efficacy of SBPP : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 288 14.8.4 Summary of different bottleneck estimators : : : : : : : : : : : : : : : : 290 15 Packet Loss 291 15.1 Loss rates : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 291 15.2 Data packet loss vs. 
ack loss : : : : : : : : : : : : : : : : : : : : : : : : : : : : 299 15.3 Loss bursts : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 305 15.4 Loss location : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 310 15.5 Evolution of packet loss rate : : : : : : : : : : : : : : : : : : : : : : : : : : : : 313 15.6 Efficacy of TCP retransmission : : : : : : : : : : : : : : : : : : : : : : : : : : 316 16 Packet Delay 323 16.1 RTT variation : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 324 16.1.1 The role of RTTs : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 324 16.1.2 RTT measurement considerations : : : : : : : : : : : : : : : : : : : : : 324 16.1.3 RTT extremes : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 325 16.1.4 RTT variation during a connection : : : : : : : : : : : : : : : : : : : : : 327 16.2 OTT variation : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 332 16.2.1 Why we do not analyze OTT extremes : : : : : : : : : : : : : : : : : : : 332 16.2.2 Range of OTT variation : : : : : : : : : : : : : : : : : : : : : : : : : : 332 16.2.3 Path symmetry of OTT variation : : : : : : : : : : : : : : : : : : : : : : 333 16.2.4 Relationship between loss rate and OTT variation : : : : : : : : : : : : : 335 16.2.5 Evolution of OTT variation : : : : : : : : : : : : : : : : : : : : : : : : 335 16.2.6 Removing load from OTTs : : : : : : : : : : : : : : : : : : : : : : : : : 338 16.2.7 Periodicity in OTTs : : : : : : : : : : : : : : : : : : : : : : : : : : : : 342 16.3 Timing compression : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 343 16.3.1 Ack compression : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 344 16.3.2 Data packet timing compression : : : : : : : : : : : : : : : : : : : : : : 345 16.3.3 Receiver compression : : : : : : : : : : : : : : : : : : : : : : : : : : : 348 16.4 Queueing analysis : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 349 16.5 Available bandwidth : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 354 17 Summary 365 17.1 The routing study : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 365 17.2 The packet dynamics study : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 366 17.2.1 Measurement calibration and TCP behavior : : : : : : : : : : : : : : : : 366 17.2.2 Timing calibration : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 367 17.2.3 Network pathologies : : : : : : : : : : : : : : : : : : : : : : : : : : : : 367 17.2.4 Estimating bottleneck bandwidth : : : : : : : : : : : : : : : : : : : : : : 367 17.2.5 Packet loss : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 368 17.2.6 Packet delay : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 369 17.3 Future research : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 370 17.4 Themes of the work : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 371 x Bibliography 372 A The Network Probe Daemon 383 A.1 Daemon operation : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 383 A.2 Security issues : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 385 A.2.1 Using rtcpdump instead of tcpdump : : : : : : : : : : : : : : : : : 386 A.2.2 NPD authentication : : : : : : : : : : : : : : : : : : : : : : : : : : : : 386 xi List of Figures 5.1 Sites participating in routing study, North America and Asia : : : : : : : : : : 25 5.2 
Sites participating in routing study, Europe : : : : : : : : : : : : : : : : : : : 26 5.3 Number of measurements made for each Internet path, R 1 dataset : : : : : : : 28 5.4 Number of measurements made for each Internet path, R 2 dataset : : : : : : : 29 5.5 Links traversed during R 1 and R 2 , North American perspective : : : : : : : : : 33 5.6 Links traversed during R 1 and R 2 , European perspective : : : : : : : : : : : : 33 6.1 Routes taken by alternating packets, wustl to umann : : : : : : : : : : : : : 52 6.2 Distribution of long R 1 outages : : : : : : : : : : : : : : : : : : : : : : : : : 63 6.3 Distribution of long R 2 outages : : : : : : : : : : : : : : : : : : : : : : : : : 63 6.4 Circuitous route from bsdi to usc : : : : : : : : : : : : : : : : : : : : : : : 64 6.5 Circuitous route from lbli to ucol : : : : : : : : : : : : : : : : : : : : : : 65 6.6 Circuitous route from nrao to wustl : : : : : : : : : : : : : : : : : : : : : : 65 6.7 Circuitous route from lbl to wustl : : : : : : : : : : : : : : : : : : : : : : 66 6.8 Individual routers comprising circuitous path from lbl to wustl : : : : : : : : 67 6.9 Circuitous route from ncar to xor : : : : : : : : : : : : : : : : : : : : : : : 68 6.10 Circuitous route from inria to oce : : : : : : : : : : : : : : : : : : : : : : 68 7.1 Prevalence of the dominant route : : : : : : : : : : : : : : : : : : : : : : : : : 78 7.2 Prevalence of the dominant route, for different source sites : : : : : : : : : : : : 80 7.3 Prevalence of the dominant route, for different destination sites : : : : : : : : : 81 7.4 Site­to­site variation in P 10 dst s : : : : : : : : : : : : : : : : : : : : : : : : : : : 84 7.5 Estimated distribution of long­lived route durations : : : : : : : : : : : : : : : 88 8.1 Route observed from ucol to ucl : : : : : : : : : : : : : : : : : : : : : : : : 96 8.2 Route observed from ucl to ucol : : : : : : : : : : : : : : : : : : : : : : : : 96 8.3 Second route observed from ucl to ucol : : : : : : : : : : : : : : : : : : : : 97 8.4 Distribution of asymmetry sizes : : : : : : : : : : : : : : : : : : : : : : : : : 100 9.1 Sequence plot of a TCP connection during its ``slow start'' phase : : : : : : : : 113 9.2 Sequence plot of a ``window­limited'' TCP connection : : : : : : : : : : : : : : 115 9.3 TCP ``self­clocking'' : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 116 9.4 Sequence plot showing a TCP timeout retransmission : : : : : : : : : : : : : : 119 9.5 Sequence plot showing a TCP ``fast retransmission'' : : : : : : : : : : : : : : : 120 9.6 Sequence plot showing TCP ``fast recovery'' : : : : : : : : : : : : : : : : : : : 122 xii 10.1 Packet filter replication : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 132 10.2 Packet filter resequencing : : : : : : : : : : : : : : : : : : : : : : : : : : : : 133 10.3 Enlargement of resequencing event in previous figure : : : : : : : : : : : : : : 134 10.4 Example of ``time travel'' : : : : : : : : : : : : : : : : : : : : : : : : : : : : 136 10.5 Same plot, with lines showing the ordering of the packets in the trace file : : : : 136 10.6 Receiver sequence plot showing a forward clock adjustment, undetectable to the eye : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 137 10.7 Example of an ambiguity caused by the packet filter's vantage point : : : : : : : 138 11.1 Sequence plot showing effects of unobserved source quench : : : : : : : : : : 150 11.2 Receiver sequence plot showing two data checksum errors : : : : : : : : : : : 
154 11.3 Sequence plot showing a burst of checksum errors : : : : : : : : : : : : : : : : 154 11.4 Sequence plot showing the Net/3 uninitialized­cwnd bug : : : : : : : : : : : : 160 11.5 Sequence plot showing the HP/UX congestion window advance with duplicate acks : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 161 11.6 Sequence plot showing broken Linux 1.0 retransmission behavior : : : : : : : : 163 11.7 Enlargement of righthand side of previous figure : : : : : : : : : : : : : : : : 163 11.8 Sequence plot showing broken Solaris 2.3/2.4 retransmissions, RTT = 680 msec 165 11.9 Sequence plot showing broken Solaris 2.3/2.4 retransmissions, RTT = 2.6 sec : : 166 11.10 Solaris 2.4 retransmitting without cutting cwnd : : : : : : : : : : : : : : : : : 167 11.11 Sequence plot showing Solaris 2.4 acknowledgments during initial slow­start : : 171 11.12 Corresponding burstiness at sender : : : : : : : : : : : : : : : : : : : : : : : 172 11.13 Sequence plot showing retransmission timeout due to loss of single Solaris 2.4 ack 173 11.14 Receiver sequence plot showing lulls due to Solaris 2.3 acking policy : : : : : : 174 11.15 Sequence plot showing more frequent acking leading to ``filling the pipe'' : : : : 175 11.16 Sequence plot showing gratuitous acknowledgement : : : : : : : : : : : : : : 177 11.17 Sequence plot showing false gratuitous acknowledgement : : : : : : : : : : : : 178 11.18 Sequence plot showing Windows 95 TCP transmit problem : : : : : : : : : : : 180 11.19 Sequence plot showing Trumpet/Winsock TCP skipping initial slow start : : : : 181 11.20 Sequence plot showing Trumpet/Winsock TCP skipping slow start after timeout 182 11.21 Sequence plot showing Trumpet/Winsock timer­driven acking : : : : : : : : : 183 11.22 Sequence plot showing Trumpet/Winsock failure to retain above­sequence data : 183 12.1 Median magnitude of clock offset, N 1 tracing hosts : : : : : : : : : : : : : : : 194 12.2 Median magnitude of clock offset, N 2 tracing hosts : : : : : : : : : : : : : : : 194 12.3 Evolution of austr's relative clock offset over the course of N 1 : : : : : : : : 196 12.4 Evolution of oce's relative clock offset over the course of N 1 : : : : : : : : : : 197 12.5 Evolution of bnl's relative clock offset over the course of N 1 : : : : : : : : : : 197 12.6 Expanded view of the central line in the previous figure : : : : : : : : : : : : : 198 12.7 Evolution of xor's relative clock offset over the course of N 1 : : : : : : : : : : 199 12.8 Evolution of oce's relative clock offset over the course of N 2 : : : : : : : : : : 199 12.9 Evolution of lbli's relative clock offset over the course of N 2 : : : : : : : : : 200 12.10 Evolution of sandia's relative clock offset over the course of N 2 : : : : : : : : 200 12.11 Evolution of umont's relative clock offset over the course of N 2 : : : : : : : : 201 12.12 OTT­pair plot illustrating a clock adjustment : : : : : : : : : : : : : : : : : : 202 xiii 12.13 Same measurements after de­noising pair­plot : : : : : : : : : : : : : : : : : : 205 12.14 Clock adjustment via temporary skew : : : : : : : : : : : : : : : : : : : : : : 208 12.15 Temporary skew leading to separate pivots : : : : : : : : : : : : : : : : : : : 208 12.16 Clock adjustment masked by excessive network delays : : : : : : : : : : : : : 209 12.17 Clock adjustment missed because too close to end of connection : : : : : : : : 210 12.18 Double clock adjustment : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 211 12.19 Clock adjustment ``hiccup'' : : : : 
: : : : : : : : : : : : : : : : : : : : : : : 211 12.20 An OTT pair plot showing strong negative correlation : : : : : : : : : : : : : : 213 12.21 An OTT pair plot showing relative clock skew : : : : : : : : : : : : : : : : : : 214 12.22 Clock skew obscured by network delays : : : : : : : : : : : : : : : : : : : : : 217 12.23 Enlargement of reverse path : : : : : : : : : : : : : : : : : : : : : : : : : : : 217 12.24 Distribution of R(n; k) for n = 15 : : : : : : : : : : : : : : : : : : : : : : : : 220 12.25 Example of extreme clock skew : : : : : : : : : : : : : : : : : : : : : : : : : 223 12.26 Strong relative clock skew of 6% : : : : : : : : : : : : : : : : : : : : : : : : 224 12.27 Example of puzzling oce behavior : : : : : : : : : : : : : : : : : : : : : : : 225 12.28 Another example of puzzling oce behavior : : : : : : : : : : : : : : : : : : : 225 12.29 One more example of puzzling oce behavior : : : : : : : : : : : : : : : : : : 226 12.30 Initial packet filter timing glitch : : : : : : : : : : : : : : : : : : : : : : : : : 229 13.1 Sequence plot showing a connection with 36% of data packets delivered out­of­ order : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 235 13.2 Sequence plot showing a connection with an out­of­order gap of 54 packets : : : 236 13.3 Out­of­order delivery with two distinct slopes : : : : : : : : : : : : : : : : : : 236 13.4 Sequence plot of entire connection shown in previous figure : : : : : : : : : : : 237 13.5 Sequence plot of ack delivered out­of­order : : : : : : : : : : : : : : : : : : : 238 13.6 Sequence plot of two acks delivered out­of­order and very late : : : : : : : : : 238 13.7 Distribution of out­of­order delivery interval for N 1 data packets : : : : : : : : 240 13.8 Distribution of data packet out­of­order delivery interval for N 1 and N 2 : : : : : 241 13.9 Sequence plot showing retransmission event leading to top duplicate ack series : 244 13.10 Enlargement of top duplicate ack series : : : : : : : : : : : : : : : : : : : : : 245 13.11 Two acks replicated 8 times each : : : : : : : : : : : : : : : : : : : : : : : : 246 13.12 Data packet replicated 22 times : : : : : : : : : : : : : : : : : : : : : : : : : 247 13.13 Data packet replicated at sender : : : : : : : : : : : : : : : : : : : : : : : : : 247 14.1 Paired sequence plot showing timing of data packets at sender and when received 256 14.2 Same plot with acks included : : : : : : : : : : : : : : : : : : : : : : : : : : 257 14.3 Receiver sequence plot illustrating difficulties of packet­pair bottleneck band­ width estimation in the presence of out­of­order arrivals : : : : : : : : : : : : : 258 14.4 Receiver sequence plot showing two distinct bottleneck bandwidths : : : : : : : 260 14.5 Enlargement of part of the previous figure : : : : : : : : : : : : : : : : : : : : 261 14.6 Enlargement of part of the previous figure : : : : : : : : : : : : : : : : : : : : 262 14.7 Multi­channel phasing effect : : : : : : : : : : : : : : : : : : : : : : : : : : : 263 14.8 Peak­rate optimistic and conservative bottleneck estimates, window­limited con­ nection : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 266 14.9 Erroneous optimistic estimate due to data packet compression : : : : : : : : : : 267 xiv 14.10 Histogram of different single­bottleneck estimates for N 1 : : : : : : : : : : : : 276 14.11 Histogram of different single­bottleneck estimates for N 2 : : : : : : : : : : : : 277 14.12 Box plots of bottlenecks for different N 2 receiving sites 
: : : : : : : : : : : : : 280 14.13 Time until a 20% shift in bottleneck bandwidth, if ever observed : : : : : : : : 281 14.14 Symmetry of median bottleneck rate : : : : : : : : : : : : : : : : : : : : : : : 283 14.15 Sequence plot reflecting halving of bottleneck rate : : : : : : : : : : : : : : : 284 14.16 Excerpt from a trace exhibiting a false ``multi­channel'' bottleneck : : : : : : : 285 14.17 Self­clocking TCP ``fast recovery'' : : : : : : : : : : : : : : : : : : : : : : : : 286 15.1 Connection durations for N 1 (solid) and N 2 (dotted) : : : : : : : : : : : : : : 292 15.2 Connection durations for sites common to N 1 (solid) and N 2 (dotted) : : : : : : 294 15.3 Hourly variation in ack loss rate for North American connections : : : : : : : : 297 15.4 Hourly variation in ack loss rate for European connections : : : : : : : : : : : 298 15.5 Successful North American measurements, per hour : : : : : : : : : : : : : : : 298 15.6 Successful European measurements, per hour : : : : : : : : : : : : : : : : : : 299 15.7 N 2 loss rates for data packets and acks : : : : : : : : : : : : : : : : : : : : : 301 15.8 Complementary distribution plot of N 2 unloaded data packet loss rate : : : : : : 303 15.9 Complementary distribution plot of N 2 loaded data packet loss rate : : : : : : : 304 15.10 Complementary distribution plot of N 2 ack loss rate : : : : : : : : : : : : : : : 304 15.11 Distribution of packet loss outage durations : : : : : : : : : : : : : : : : : : : 307 15.12 Distribution of packet loss outage durations exceeding 200 msec : : : : : : : : 308 15.13 Log­log complementary distribution plot of N 2 ack outage durations : : : : : : 308 15.14 Receiver sequence plot showing packet lost at or before bottleneck link : : : : : 311 15.15 Receiver sequence plot showing packet lost after bottleneck link : : : : : : : : 311 15.16 Evolution of how well observing a zero­loss connection predicts that a future con­ nection will also be zero­loss : : : : : : : : : : : : : : : : : : : : : : : : : : 314 15.17 Evolution of how well observing a non­zero­loss connection predicts that a future connection will also be non­zero­loss : : : : : : : : : : : : : : : : : : : : : : 315 15.18 Evolution of the mean difference in loss­rate between successive connections along the same path : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 316 15.19 Receiver sequence plot showing large number of sequence holes : : : : : : : : 317 15.20 Redundant retransmissions subsequent to previous figure : : : : : : : : : : : : 318 15.21 Sender sequence plot showing failure of RTO adaption : : : : : : : : : : : : : 320 16.1 Distribution of the ratio between a connection's maximum RTT to minimum RTT 328 16.2 Log­log complementary distribution plot of max­min RTT ratio : : : : : : : : : 328 16.3 Distribution of inverse ratio (minimum RTT to maximum RTT) : : : : : : : : : 329 16.4 Q­Q plot of ratio of minimum RTT to maximum RTT versus fitted normal distri­ bution : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 329 16.5 Distribution of RTT interquartile range : : : : : : : : : : : : : : : : : : : : : 330 16.6 Distribution of RTT interquartile range, normalized to minimum RTT : : : : : : 331 16.7 Distribution of difference between maximum RTT and minimum RTT, normalized by interquartile range : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 331 16.8 Distribution of interquartile and max­min OTT variation : : : : : : : : : : : : 333 xv 16.9 Scatter plot of interquartile ranges of 
unloaded data packet OTT variations versus acks : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 334 16.10 Scatter plot of ack loss rate versus interquartile ack OTT variation, for N 2 con­ nections that lost at least one ack : : : : : : : : : : : : : : : : : : : : : : : : 336 16.11 Evolution of how the interquartile range of normalized ack OTT variation differs with time : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 337 16.12 Evolution of how the interquartile range of raw ack OTT variation differs with time 338 16.13 OTT plot revealing ``broken'' bottleneck estimate: one that is too low : : : : : : 339 16.14 Another OTT plot revealing a ``broken'' bottleneck estimate: one that failed to detect a change in the bottleneck rate : : : : : : : : : : : : : : : : : : : : : : : 340 16.15 OTT plot showing virtually all OTT variation due to connection's own queueing load : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 341 16.16 Enlargement of adjusted OTTs from previous figure : : : : : : : : : : : : : : : 341 16.17 Ack OTT plot showing 10­sec periodicities : : : : : : : : : : : : : : : : : : : 342 16.18 Paired sequence plot showing ack compression : : : : : : : : : : : : : : : : : 344 16.19 Data packet timing compression : : : : : : : : : : : : : : : : : : : : : : : : : 346 16.20 Rampant data packet timing compression : : : : : : : : : : : : : : : : : : : : 347 16.21 Receiver sequence plot showing major receiver compression : : : : : : : : : : 347 16.22 Ack OTT plot for a connection with b ř = 4 sec for \DeltaQ ř : : : : : : : : : : : : : 350 16.23 Ack OTT plot for a connection with b ř = 1 sec for Q max ř : : : : : : : : : : : : 350 16.24 Proportion (normalized) of connections with given timescale of maximum sus­ tained delay variation (b ř ) : : : : : : : : : : : : : : : : : : : : : : : : : : : : 352 16.25 Proportion (normalized) of connections with given timescale of maximum peak delay variation (b ř ) : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 353 16.26 Distribution of N 1 inferred available bandwidth (fi) : : : : : : : : : : : : : : : 357 16.27 Distribution of N 2 inferred available bandwidth (fi) : : : : : : : : : : : : : : : 357 16.28 Distribution of N 1 inferred available bandwidth (fi) for connections with bottle­ neck rates exceeding 100 Kbyte/sec : : : : : : : : : : : : : : : : : : : : : : : 359 16.29 Distribution of N 2 inferred available bandwidth (fi) for connections with bottle­ neck rates exceeding 100 Kbyte/sec : : : : : : : : : : : : : : : : : : : : : : : 359 16.30 Distribution of N 2 inferred available bandwidth (fi) for connections with bottle­ neck rates exceeding 250 Kbyte/sec : : : : : : : : : : : : : : : : : : : : : : : 360 16.31 Distribution of N 1 minimum inferred available bandwidth (fi) for connections with bottleneck rates exceeding 100 Kbyte/sec : : : : : : : : : : : : : : : : : 360 16.32 Distribution of N 1 maximum inferred available bandwidth (fi) for connections with bottleneck rates exceeding 100 Kbyte/sec : : : : : : : : : : : : : : : : : 361 16.33 Distribution of N 2 inferred available bandwidth (fi) for U.S. 
connections 362
16.34 Distribution of N_2 inferred available bandwidth (fi) for European connections 363
16.35 Evolution of difference between inferred available bandwidth (fi) for successive connections 363

List of Tables

I Sites participating in first experiment (R_1) 24
II Additional sites participating in second experiment (R_2) 25
III Summary of routing experiment difficulties 27
IV Uncertain router sites 30
V Router cities 32
VI Persistent routing loops in R_1 37
VII Persistent routing loops in R_2 40
VIII Failure modes for unreachable hosts in R_1 58
IX Failure modes for unreachable hosts in R_2 58
X Summary of representative routing pathologies 69
XI Tightly-coupled routers 76
XII Summary of persistence at different time scales 89
XIII Summary of TTL method for detecting route changes 90
XIV Sites participating in the packet dynamics study 123
XV TCP implementations known to tcpanaly 144
XVI Relationship between relative clock accuracy and clock adjustments 230
XVII Relationship between relative clock accuracy and clock skew 231
XVIII Types of results of bottleneck estimation for N_1 and N_2 274
XIX Types of results after eliminating trace pairs with lbli 274
XX Raw and user-data rates of different common links 278
XXI Ack loss rates for different connection geographies 295
XXII Conditional ack loss rates for different connection geographies 296
XXIII Unconditional and conditional loss rates for different packet types 306
XXIV Proportion of redundant retransmissions (RRs) due to different causes 319

Acknowledgements

This work has its roots in the teaching, help, patience, and inspiration of a great number of people, to whom I wish to express my heartfelt gratitude.

Simply put, Van Jacobson is the reason I have studied networking; the reason I embarked on this study; and the reason I had faith that the work would, with sufficient diligence, yield a host of new insights. I am delighted that, having known him for nearly twenty years, I still find he has much to teach me.

Likewise, this work drew inspiration and invaluable support from Domenico Ferrari. The energy and respect that he affords to both his students' efforts, and to his students themselves, have made it a privilege to be advised by him.

I have also been delighted to have Sally Floyd as my mentor, colleague, and friend. She has listened to countless half-baked ideas of how to analyze and interpret various measurements, and has always patiently separated the promising from the harebrained.
This calibration of ideas, and her suggestions on how to then pursue the more promising ones, has proved invaluable for fostering my sense of how to conduct sound research.

A number of others played major roles in shaping this work. I would particularly like to thank John Rice and Mike Luby for their industrious efforts in serving on my dissertation committee, which led to the work being much more solid than it would otherwise have been.[1]

[1] Particular thanks to Mike for throwing down the glove, and for knowing which glove to use.

My heartfelt thanks to Greg Minshall, for his detailed, insightful comments on nearly every page of the work (and for his willingness to burn an entire Friday evening discussing some of them); and to Amit Gupta, John Hawkinson, Kurt Lidl, Craig Partridge, and anonymous SIGCOMM and IEEE/ACM Transactions on Networking referees, all of whom contributed very helpful comments on earlier versions of the work.

I would like to also thank my colleagues at the Network Research Group: Kevin Fall, Craig Leres, and Steve McCanne, for their much appreciated ideas, support, and feedback.

Special thanks to Kathryn Crabtree, for her untiring help in surmounting innumerable administrative hurdles along the dissertation trail. She is an invaluable asset to UCB computer science.

This work would not have been possible without the efforts of the many volunteers who installed the Network Probe Daemon at their sites. In the process they endured debugging headaches, inetd crashes, software updates, and a seemingly endless stream of queries from me regarding their site's behavior. I am indebted to: Guy Almes and Bob Camm (adv); Jos Alsters (unij); Jean-Chrysostome Bolot (inria); Hans-Werner Braun, Kim Claffy, and Bilal Chinoy (sdsc); Randy Bush (rain); Jon Crowcroft and Atanu Ghosh (ucl); Peter Danzig and Katia Obraczka (usc); Mark Eliot (sri); Robert Elz (austr); Teus Hagen (oce); Steinar Haug and Håvard Eidnes (sintef1, sintef2); John Hawkinson (near and panix); TR Hein (xor); Tobias Helbig and Werner Sinze (ustutt); Paul Hyder (ncar); Alden Jackson (sandia); Kate Lance (austr2); Craig Leres (lbl); Kurt Lidl (pubnix); Peter Linington, Alan Ibbetson, Peter Collinson, and Ian Penny (ukc); Steve McCanne (lbli); John Milburn (korea); Walter Mueller (umann); Evi Nemeth, Mike Schwartz, Dirk Grunwald, Lynda McGinley (ucol, batman); François Pinard (umont); Jeff Polk and Keith Bostic (bsdi); Todd Satogata (bnl); Doug Schmidt and Miranda Flory (wustl); Sorell Slaymaker and Alan Hannan (mid); Don Wells and Dave Brown (nrao); Gary Wright (connix); John Wroclawski (mit); Cliff Young and Brad Karp (harv); and Lixia Zhang, Mario Gerla, and Simon Walton (ucla).

I am likewise indebted to Keith Bostic, Evi Nemeth, Rich Stevens, George Varghese, Andres Albanese, Wieland Holfelder, and Bernd Lamparter for their invaluable help in recruiting NPD sites. Thanks, too, to Peter Danzig, Jeff Mogul, and Mike Schwartz for feedback on the design of NPD.

This work also benefited from discussions with Guy Almes, Tom Anderson, Robert Elz, Teus Hagen, John Krawczyk, Kate Lance, Dun Liu, Paul Love, Jamshid Mahdavi, Matt Mathis, Dave Mills, Pravin Varaiya, Curtis Villamizar, and Walter Willinger. A preliminary analysis of the R_1 routing dataset was done by Mark Stemm and Ketan Patel.

Often to understand the behavior of particular routers or to determine their location, I asked personnel from the organization responsible for the routers.
I was delighted at how willing they were to help, and in this regard would like to acknowledge: Vadim Antonov, Tony Bates, Michael Behringer, Per Gregers Bilse, Bjorn Carlsson, Peggy Cheng, Guy Davies, Sean Doran, Bjorn Eriksen, Amit Gupta, Tony Hain, John Hawkinson (again!), Susan Harris, Ittai Hershman, Kevin Hoadley, Scott Huddle, James Jokl, Kristi Keith, Harald Koch, Craig Labovitz, Tony Li, Martijn Lindgreen, Ted Lindgreen, Dan Long, Bill Manning, Milo Medin, Keith Mitchell, Roderik Muit, Chris Myers, Torben Nielsen, Richard Nuttall, Mark Oros, Michael Ramsey, Juergen Rauschenbach, Douglas Ray, Brian Renaud, Jyrki Soini, Nigel Titley, Paul Vixie, and Rusty Zickefoose.

Finally, this work would never have been realized without the ongoing support provided by the Lawrence Berkeley National Laboratory. I am deeply grateful. In particular, I would like to thank Stu Loken and Ed Theil for their efforts and encouragement.

This work was supported by the Director, Office of Energy Research, Scientific Computing Staff, of the U.S. Department of Energy under Contract No. DE-AC03-76SF00098.

Chapter 1

Introduction

As the Internet grows larger, measuring and characterizing its dynamics grows harder. Part of the difficulty is how quickly the network changes. Depending on the figure of interest, the network grows by between 80% and 100% each year, and has sustained this growth for well over a decade. Furthermore, the dominant protocols and their patterns of use can change radically over just a few years, or even a few months [Pa94b, CBP94]. Another difficulty, though, is the network's incredible---and increasing---heterogeneity. It is more and more difficult to measure a plausibly representative cross-section of its behavior. It is this latter concern that we attempt to address in this work.

In this chapter, we develop the context for the rest of the study, by discussing different types of traffic studies, and how research efforts of those types have addressed heterogeneity problems. Our study falls in the category with perhaps the greatest heterogeneity difficulties, that of the ``end-to-end'' performance of entire paths through the network.

Our work has two distinct parts: a study of end-to-end routing behavior in the Internet (Part I), and a study of end-to-end Internet packet dynamics (Part II). These two are united by the common measurement framework used to gather the data analyzed in each part (described in Chapter 4). In addition, some of the results used in each part are incorporated into the analysis in the other part. However, in many ways the two parts are distinct and self-contained. A reader particularly interested in one or the other topic might profitably just read the relevant part. Consequently, we defer an overview of each part to later chapters (Chapter 2 of Part I, and Chapter 9 of Part II). We summarize both parts, and what we perceive as the themes of the work, in Chapter 17, at the end of Part II.

For the remainder of this introduction, we give an overview of the general problem of measuring large networks. We can classify measurement studies into several basic types. Each faces the problem of heterogeneity to varying degrees, as follows.

Exhaustive studies analyze properties of a significant fraction of the entire network.
Examples are Kleinrock's study of the ARPANET's behavior on time scales of hours to days [Kl76]; the series of ``ping'' experiments conducted by Mills to evaluate the effectiveness of the TCP retransmission­timeout algorithm [Mi83]; Claffy et al.'s study characterizing traffic on the T1 NSFNET backbone [CPB93b]; and Chinoy's study of the dynamics of routing information within the NSFNET backbone [Ch93]. While these studies can convincingly characterize the full range of behavior one might expect to observe from the network, they become impractical as the 2 network grows in size. 1 Site studies characterize the aggregate traffic patterns observed for entire sites. They fo­ cus on the connections sizes, durations, and interarrival times. An early site study, by Danzig and colleagues, identified large heterogeneities in the traffic ``mix'' at each site, meaning that the pro­ portion of total traffic (total connections, total packets, or total bytes) due to different applications varies greatly [DJCME92]. Our subsequent work extended this finding to the characteristics of the connections made by each type of application. We found that the distributions of a particular appli­ cation's connection sizes and durations varied greatly from site to site [Pa94a], in agreement with much earlier findings, in a different communications context, by Fuchs and Jackson [FJ70]. Another type of study focuses on server behavior, for services that are distributed over the Internet. The heterogeneity issues faced by these studies vary greatly, depending on the service. For example, Danzig and colleagues analyzed requests arriving at a ``root'' name server, finding a variety of performance problems [DOK92]. However, there are only a handful of root name servers. Because clients divide their requests between them, studying a single server yields results plausibly representative for all of the servers. On the other hand, a recent study of World Wide Web servers had to grapple with the issue of differences among the various servers studied [AW96], and did so by developing its central theme around the search for behavioral ``invariants'' among the six Web servers analyzed. Related to server studies are client studies, which analyze the different ways in which clients access servers. From a heterogeneity standpoint, client studies are more difficult than server studies, since usually there are many more clients than servers. One approach is to study the behav­ ior of all clients located at a particular site [CB96]. Doing so, however, incurs problems similar to those of site studies: it is difficult to gauge the generality of the findings. These problems, however, can be tempered based on the nature of the service. For example, we might expect request from one site's Web clients to more closely resemble those of another site's Web clients, than for the site's aggregate traffic to resemble that of another site. Another type of study analyzes the aggregate traffic seen on network links. These studies have focussed on the dynamics of packet arrivals on the link [FL91, LTWW94, PF95, WTSW95], the characteristics of packet ``flows'' [JR86, He90, CBP95], or on traffic patterns over particularly singular links, such as the trans­Atlantic link connecting the U.S. and the U.K. [CW91, WLC92]. 
For link studies of local area networks [JR86, FL91, LTWW94, WTSW95], heterogeneity presents less of a problem than for those of wide area networks, because the latter encompass a much broader range of traffic sources and path characteristics than the former. Some wide area link studies attempt to address heterogeneity issues by analyzing traces from multiple links. However, gathering link traces is difficult (and becoming more so due to security, privacy, and business concerns), and such studies have not to date analyzed more than two dozen or so traces. 2 Our work falls in still another class, that of end­to­end studies. These studies concern 1 At the time of the later of the first two studies, the Internet comprised about 600 hosts. As of this writing, it comprises about 16 million hosts [Lo97]. At the time of the Claffy study, the backbone consisted of 15 nodes and two dozen links. Today, it is much larger, though sources of accurate statistics on its size have virtually disappeared with the commercialization of the Internet infrastructure. 2 Most site studies are conducted as link studies, too, since an extremely convenient way to capture an entire site's Internet traffic is to monitor what is usually a single link connecting the site to the rest of the Internet. Some server studies can also be conducted in the context of a link study, by analyzing all of the server requests seen on a highly aggregated link [EHS92, DHS93]. 3 how the network performs from the perspective of an end user. To users, a network like the Internet is simply a black box that somehow forwards packets between their host and hosts with which they wish to communicate. End­to­end studies face extreme heterogeneity problems because they strive to characterize the behavior of paths through the network. Not only does the Internet contain millions of distinct paths, but the dynamics of each path reflects the concatenation of the dynamics of each forwarding element along the path, and hence can be highly complex. The few studies to date of end­to­end packet dynamics---Mogul's look at TCP dynamics such as ack compression [Mo92], Bolot's analysis of patterns of packet loss and delay [Bo93], and Claffy et al.'s characterization of one­way latencies [CPB93a]---have all been confined to measuring a handful of Internet paths, because of the great logistical difficulties, and also analysis difficulties, presented by large­scale measurement. 3 Consequently, it is hard to gauge how representative these end­to­end findings are for today's Internet. As a result, even basic Internet path questions such as ``how often do routes change?'' and ``how often are packets dropped?'' remain unanswered in any sort of general way. It is towards answering these questions that we now embark. 3 Mogul's study was actually conducted as a link study. Doing so let him observe behavior from a fairly large number of Internet paths, albeit ones that all had the single link in common. A drawback of this approach, however, is that it is difficult to infer from the perspective of a link the full end­to­end behavior as perceived by the endpoints, an issue we discuss further in x 10.4. 4 Part I End­to­End Routing Behavior in the Internet 5 Chapter 2 Overview of the Routing Study The large­scale behavior of routing in the Internet has gone virtually without any for­ mal study, the exception being Chinoy's analysis of the dynamics of Internet routing information [Ch93]. 
In this part of our thesis we analyze 40,000 route measurements conducted using repeated ``traceroutes'' between 37 Internet sites. The main questions we strive to answer are: 1. What sort of pathologies and failures occur in Internet routing? 2. Do routes remain stable over time or change frequently? 3. Do routes from A to B tend to be symmetric (the same in reverse) as routes from B to A? Our framework for answering these questions is the measurement of a large sample of Internet routes between a number of geographically diverse hosts. We argue that the set of routes is representative of Internet routes in general, and analyze how the routes changed over time to assess how Internet routing in general changes over time. We begin by giving an overview of the routing literature in general and, more specifically, how routing works in the Internet (Chapter 3). We find that while routing protocols (mechanisms) have been heavily studied, the literature offers very few measurement studies of how routing behaves in practice. We then discuss our experimental methodology (Chapter 4). This includes our measure­ ment apparatus, which is the npd ``network probe daemon'' and the traceroute utility for measur­ ing Internet paths; the use of exponential sampling, which allows us to apply the PASTA Principle [Wo82] as the basis for the generalizations we derive from our measurements; and the use of the Fisher's exact test [Ri95] to test for significant differences between different sets of observations. We also discuss which aspects of our measurements are plausibly representative of Internet routing behavior in general (namely, aggregate observations of Internet paths), and which are not (those depending on the behavior of individual sites). In Chapter 5 we give an overview of the 37 sites participating in the study, and details of the raw data and of the failures encountered when attempting to capture it. We also discuss how we assigned geographic locations to all of the 1,531 routers appearing in the paths we measured. We performed two separate sets of measurements. The first, R 1 , consisted of 6,991 at­ tempted measurements of 689 different paths through the Internet (i.e., distinct source/destination pairs). The R 1 measurements were made with an average interval of 1­2 days between samples. 6 Upon analyzing the R 1 data, we realized that we could not accurately answer crucial ques­ tions regarding routing stability without higher frequency sampling, nor could we unambiguously assess routing symmetry without simultaneous measurements of both directions of an Internet path. To resolve these difficulties, we conducted a second set of measurements, R 2 , consisting of 37,097 attempted measurements of 1,056 Internet paths. These measurements were made in two groups, one with an average interval of about 2.75 days between samples, and the other with an average measurement interval of 2 hours. The latter suffices for accurately assessing routing stability. We also paired the bulk (80%) of the measurements, conducting back­to­back measurements of the dif­ ferent directions of each Internet path. Pairing allows us to accurately assess routing asymmetries, and also to reduce a source of measurement error (x 5.2). Before analyzing the data for routing stability and symmetry, we needed to categorize any anomalies present in order to prevent them from skewing the analysis. 
In Chapter 6 we classify a number of routing pathologies:

- unresponsive routers, routing loops, routing changes in the middle of measurement, erroneous routes, omission of TTL decrement, and infrastructure failures, all of which were rare;
- host and stub network outages, which were fairly common (but for which our samples are probably not representative);
- and ``fluttering,'' in which the path rapidly alternated between two different routes. In R 1 , fluttering was quite common, and sometimes had great impact on the routes of consecutive packets sent by a host. But, like outages, our samples are not persuasively representative, and fluttering was rare in R 2 .

Because R 1 and R 2 were made a year apart, we can analyze the relative prevalence of pathologies in each (§ 6.10). We find that the likelihood of encountering a major routing pathology more than doubled between the end of 1994 (R 1 ) and the end of 1995 (R 2 ), rising from 1.5% to 3.4%. After removing anomalous measurements, we analyze the remainder to investigate routing stability and symmetry. This analysis is primarily done using the R 2 data, for the reasons given above. We begin in Chapter 7 by reviewing the importance of routing stability for a variety of network applications. This review reveals that there are two distinct types of stability that are of interest. The first is prevalence: whether we are likely to observe the same route in the future as at the present. The second is persistence: whether the route we observe at the present is likely to remain unchanged for a considerable period of time. We show that it is easy to assess routing prevalence, and find that Internet paths are strongly dominated by a single prevalent route. But routing persistence is much more difficult to assess, because we have no a priori reason for assuming that observing a route at time T 1 and then again at time T 2 tells us anything about whether it changed (and changed back) in between those two measurements. We tackle this difficulty by first analyzing those measurements we made that were spaced only minutes apart. Doing so reveals that a minority of the paths have routes that persist for only tens of minutes, while the majority persist for significantly longer. After eliminating the quickly changing paths, we repeat the analysis at time scales of 1 hour, 6 hours, and days. We find that, at each time scale, some paths are prone to changes and others are not. Overall, about two thirds of the paths have routes persisting for days or weeks. A final question concerning routing stability is how an endpoint can determine that its route has changed. We investigate a simple method based on observing changes in the Time To Live (TTL) field. We find that this method provides a useful heuristic, having an overall accuracy of about 95%, but is prone to false negatives (missing the fact that the route has changed), which limits its utility. In Chapter 8 we turn to the question of routing symmetry. As with routing stability, we first discuss the importance of symmetry for a number of networking applications. We also look at different mechanisms that can introduce asymmetry into network routing. Of these, one in particular (``hot potato'' routing between different Internet service providers) is expected to grow in the future, leading to a greater prevalence of routing asymmetry, and the differences in asymmetry between the R 1 and R 2 measurements suggest that this is happening.
Our first attempt at defining whether two routes are symmetric founders on the difficulties of determining whether two Internet addresses do indeed correspond to the same host. In the face of this problem, we revise our definition to consider two routes symmetric only if they visit exactly the same cities. If two routes are asymmetric according to this definition, then they visit at least one different city. Such asymmetries are major because they likely imply different path characteristics, such as propagation times and congestion levels. We find that half of all Internet paths in R 2 contained a major asymmetry, while only 30% in R 1 did. About 20% of the R 2 paths differed in two or more cities, and about 30% differed in the autonomous systems they visited. The presence of pathologies, short­lived routes, and major asymmetries highlights the difficulties of providing a consistent topological view in an environment as large and diverse as the Internet. Furthermore, the findings that the prevalence of pathologies and asymmetries greatly increased during 1995 show in no uncertain terms that Internet routing has become less predictable in major ways. A constant theme running through our study is that of widespread diversity. We repeat­ edly find that different sites or pairs of sites encounter very different routing characteristics. This finding matches that of our previous work [Pa94a], which emphasizes that the variations in Internet traffic characteristics between sites are significant to the point that there is no ``typical'' Internet site. Similarly, there is no ``typical'' Internet path. But we believe the scope of our measurements gives us a solid understanding of the breadth of behavior we might expect to encounter---and how, from an endpoint's view, routing in the Internet actually works. 8 Chapter 3 Related Research The problem of routing traffic in communications networks has been studied for well over twenty years [Sc77, SS80]. The subject has matured to the point where a number of books have been written thoroughly examining the different issues and solutions [Pe92, St95, Hu95]. A key distinction we will make concerning the study of routing is that between routing protocols, by which we mean mechanisms for disseminating routing information within a network and the particulars of how to use that information to forward traffic, and routing behavior, meaning how in practice the routing algorithms perform. This distinction is important because, while routing protocols have been heavily studied, routing behavior has not. 3.1 Studies of routing protocols The literature contains many studies of routing protocols. In addition to the books cited above, see, for example, McQuillan et al.'s discussion of the initial ARPANET routing algorithm [MFR78] and the algorithms that replaced it [MRR80, KZ89]; the Exterior Gateway Protocol used in the NSFNET [Ro82, Re89], and the Border Gateway Protocol that replaced it [RL95, RG95, Tr95a, Tr95b]; the related work by Estrin et al on routing between administrative domains [BE90, ERH92]; Awerbuch's technique of reducing asynchronous networks to synchronous ones to simplify routing algorithms [Aw90]; Perlman and Varghese's discussion of difficulties in designing routing algorithms [PV88]; Deering and Cheriton's seminal work on multicast routing [DC90]; Perlman's comparison of the popular OSPF and IS­IS protocols [Pe91]; and Baransel et al.'s survey of routing techniques for very high speed networks [BDG95]. 
3.2 Studies of routing behavior For routing behavior, however, the literature contains considerably fewer studies. Some of these studies are based on pure analysis, such as Bertsekas' study of routing dynamics for different topologies [Be82]; or on simulation, such as Zaumen and Garcia­Luna Aceves' studies of routing behavior on several different wide­area topologies [ZG­LA91, ZG­LA92], and Sidhu et al.'s simu­ lation of OSPF [SFANC93]. In only a few studies do measurements play a significant role: Rekhter and Chinoy's trace­driven simulation of the tradeoffs in using inter­autonomous system routing in­ formation to optimize routing within a single autonomous system [RC92]; Chinoy's study of the 9 dynamics of routing information propagated inside the NSFNET infrastructure [Ch93]; and Floyd and Jacobson's analysis of how periodicity in routing messages can lead to global synchronization among the routers [FJ94]. This is not to say that studies of routing protocols ignore routing behavior. But the presen­ tation of routing behavior in the protocol studies is almost always qualitative, such as the discussion of the poor performance of the original ARPANET routing algorithm [MFR78] or the tendency for the revised algorithm to oscillate under heavy load [KZ89]. Even [MRR80], which presents the revised algorithm, and notes that to test it the authors subjected the network during off­hours to a greater volume of test traffic than users generated during peak hours, discuss this stress­testing in general terms, rather than delving into any measurement specifics. Of the measurement studies mentioned above, [RC92] and [FJ94] are both devoted to examining a tightly focussed question. Only Chinoy's study is devoted to characterizing routing behavior in­the­large, and it remains the only formal measurement study of routing in wide­area networks of which we are aware. 1 Chinoy found wide ranges in the dynamics of routing information: For those routers that send updates periodically regardless of whether any connectivity information has changed, the vast majority of the updates contain no new information. Most routing changes occur at the edges of the network and not along its ``backbone.'' Outages during which a network is unreachable from the backbone span a large range of time, from a few minutes to a number of hours. Finally, most networks are nearly quiescent, while a few exhibit frequent connectivity transitions. 3.3 End­to­end routing dynamics Chinoy's study concerns how routing information propagates inside the network. It is not obvious, though, how these dynamics translate into the routing dynamics seen by an end user. One of the areas noted by Chinoy as ripe for further study is ``the end­to­end dynamics of routing information.'' We will use the term path to denote the network­level abstraction of a ``virtual link'' be­ tween two Internet hosts. For example, when Internet host A wishes to establish a network­level connection to host B, A need not have any knowledge of the routing infrastructure upon which the Internet is built. As far as A is concerned, the network layer provides it with a link, or path, directly to B. Similarly, B has a path to A. We will sometimes abbreviate the notion of the path from A to B as A ) B. At any given instant in time, the path A ) B is realized at the network layer by a single route, which is a sequence of Internet routers along which packets sent by A and destined for B are forwarded. We will refer to a single hop of a particular route for the path as R 1 ! 
R 2 , indicating that after arriving at router R 1 , packets are next forwarded to R 2 . The path A ) B may oscillate very rapidly between different routes, or it may be quite stable (an issue we explore in Chapter 7). So Chinoy's suggested research area can be viewed as: 1 Since publishing some of the results from this part of our thesis [Pa96b], we have learned of a very interesting study of Internet routing, similar in spirit to that of Chinoy's, by Jahanian, Labovitz and Malan [JLM97]. We will discuss this new work in the version of [Pa96b] presently undergoing revision for publication in IEEE/ACM Transactions on Networking. We unfortunately learned of the work too late to include discussion of it here. 10 given two hosts A and B at the edges of the network, how does the path A ) B between them behave over time? This is the question we attempt to answer in our study. 3.4 Routing in the Internet For routing purposes, the Internet is partitioned into a disjoint set of autonomous systems (AS's), a notion first introduced in [Ro82]. Originally, an AS was a collection of routers and hosts unified by running a single ``interior gateway protocol.'' Over time, the notion has evolved to be essentially synonymous with that of administrative domain [HK89], in which the routers and hosts are unified by a single administrative authority. Within the domain or AS are one or more routing domains, which are hosts and routers that communicate using the same routing protocol. Routing between autonomous systems provides the highest­level of Internet interconnec­ tion. RFC 1126 [Li89] outlines the goals and requirements for inter­AS routing (of particular interest for our study are the goals of infrequent loops and stable routes). [Re95] gives an overview of how inter­AS routing has evolved. When the NSFNET formed the ``backbone'' of the Internet, inter­AS routing was done using the Exterior Gateway Protocol (EGP) [Ro82, Re89]. A major constraint of EGP, however, is that it requires a tree­like topology between the AS's (with the NSFNET backbone at the root), and, if the topology is violated, loops can result. EGP has since been replaced with the Border Gateway Protocol (BGP), currently in its fourth version [RL95, RG95]. BGP is now used between all signif­ icant AS's [Tr95a]. BGP removes the EGP topology restrictions, allowing arbitrary interconnection topologies between AS's. It also provides a mechanism for preventing routing loops between AS's, which we discuss in x 6.3.1 and x 6.3.3. The key to whether use of BGP will scale to a very large Internet lies in the stability of inter­AS routing [Tr95b]. If routes between AS's vary frequently---a phenomenon termed ``flap­ ping'' [Do95]---then the BGP routers will spend a great deal of their time updating their routing tables and propagating the routing changes. Daily statistics concerning routing flapping are avail­ able from [Me95b] (see also [Co91­95]). It is important to note that stable inter­AS routing does not guarantee stable end­to­end routing, because AS's are large entities capable of significant internal instabilities. In our study we focus on end­to­end routing behavior at the granularity of individual routers, though we also note where appropriate how the behavior changes when the granularity is shifted to that of autonomous systems (where the route for the path A ) B is viewed as a sequence of AS's rather than a sequence of routers). 
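Because Part I repeatedly shifts between the router-level and the AS-level view of a route, the following minimal Python sketch shows one way the shift can be expressed; the route representation, the router-to-AS mapping, and the AS numbers used in the example are hypothetical conveniences for illustration, not part of the measurement software described later.

def as_level_route(router_route, as_of):
    # Collapse a router-level route (the sequence of routers reported hop by
    # hop for a path) into the corresponding sequence of autonomous systems,
    # merging consecutive hops that lie inside the same AS.
    as_route = []
    for router in router_route:
        asn = as_of[router]
        if not as_route or as_route[-1] != asn:
            as_route.append(asn)
    return as_route

# Illustrative only: two campus hops followed by two provider hops, with
# made-up AS numbers from the documentation range.
route = ["128.138.209.2", "128.138.138.1", "144.228.73.113", "144.228.73.82"]
as_map = {"128.138.209.2": 64496, "128.138.138.1": 64496,
          "144.228.73.113": 64497, "144.228.73.82": 64497}
print(as_level_route(route, as_map))   # -> [64496, 64497]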
One final note: since the publication of Chinoy's study, the Internet has undergone a ma­ jor topological and administrative change. Inter­AS routing now uses BGP rather than EGP, as discussed above; and the network topology is no longer constrained to a tree with the NSFNET backbone at the root, but has switched to a number of commercial network service providers sup­ porting a potentially arbitrary interconnection topology. Our measurements spanned this transition, with the first dataset taken at the end of 1994, before the NSFNET backbone was decommissioned in Spring 1995, while the second was taken at the end of 1995. Thus, our measurements give us an opportunity to determine whether Internet routing changed significantly during the year separating them. As discussed in x 6.10 and x 8.5, we find significant increases in the prevalence of routing ``pathologies'' and in routing asymmetry. These changes are not, however, necessarily due to the 11 NSFNET transition; in particular, two thirds of the routes measured in the first dataset already did not transit the NSFNET, traversing instead commercial providers such as SprintLink and MCINET, or networks outside the U.S. 12 Chapter 4 Methodology In this chapter we discuss the methodology used to make our routing measurements. We begin with the software we used: the npd network probe daemon, the npd control program used to drive the measurements, and the traceroute utility for measuring Internet paths. We then discuss the utility of sampling at exponentially distributed intervals, including the ``PASTA Principle,'' which provides the underlying statistical validity of our measurements. In x 4.4 we then address which aspects of our data are plausibly representative of Internet traffic and which are not. In our analysis we also attempt to draw some conclusions as to which differences between our datasets reflect significant changes in Internet conditions over time. To do so, we give in x 4.5 an overview of Fisher's exact test for determining whether the frequencies with which a property is observed in two different datasets is consistent with the null hypothesis of a single underlying prob­ ability of observing the property. If the frequencies observed are inconsistent with this hypothesis, then we conclude that the probability of observing the property changed between the two datasets, reflecting a corresponding change in Internet conditions. Finally, in order to use Fisher's test, we need to make an independence assumption that is not entirely accurate. x 4.6 discusses why this assumption remains tenable. 4.1 Experimental apparatus We conducted our experiment as follows. First we recruited a number of Internet sites (detailed in Tables I and II) to participate in the study. Each site ran a ``network probe daemon'' (npd) that provides measurement services, as described in the Appendix. To measure the route from Internet host A to host B, a program called npd control, running on our local workstation, would connect to npd on host A and request that it trace the route to host B using traceroute. The npd on A would then do so and send the results back to npd control. In this fashion, we could run a single script on our local workstation to orchestrate any number of simultaneous route measurements. The script (which we programmaticly generated) would run npd control in the background to conduct a single measurement, sleep until the time for the next measurement, run an­ other npd control in the background to conduct that measurement, and so on. 
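To make this orchestration concrete, here is a minimal sketch of such a driver, written in Python for illustration; the npd_control command line, the site list, and the mean rate are assumptions standing in for the actual (programmatically generated) script, and the exponentially distributed spacing anticipates the discussion of sampling in § 4.3.

import random
import subprocess
import time

SITES = ["lbl", "ucol", "sdsc", "mit"]     # placeholder subset of the NPD sites
MEAN_INTERVAL = 2 * 3600.0                 # one measurement every two hours, on average

def launch_measurement():
    # Pick a source/destination pair and run npd_control in the background;
    # the command name and arguments here are illustrative only.
    src, dst = random.sample(SITES, 2)
    subprocess.Popen(["npd_control", src, dst])

while True:
    launch_measurement()
    # Sleep for an exponentially distributed interval, giving independent,
    # memoryless spacing between consecutive measurements (see Section 4.3).
    time.sleep(random.expovariate(1.0 / MEAN_INTERVAL))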
Each measurement comprised a single traceroute from a randomly selected site to another randomly selected site. This setup gave our experiment a single point of failure, namely our local workstation, but also the benefit of a single point of administration, which greatly simplified the task of keeping 13 the experiment running correctly as we added new participating sites. Fortunately, during the en­ tire measurement period the workstation never crashed or required rebooting, so the measurements proceeded uninterrupted. For our first set of measurements, termed R 1 , we tuned the script driving the measure­ ments so that each site would measure routes at an average rate of one every two hours, to minimize network load. Two exceptions were the austr and korea sites. They instead made measurements at lower rates of one every four hours and one every eight hours, in deference to the heavily loaded trans­Pacific network links that their traffic had to cross. While using the same rate for each site meant each site had a consistent measurement load, as we added new participating sites to our study, the sampling rate of pairs of sites decreased. This inhomogeneity, however, does not present any particular difficulties for our sampling methodology, a point we address in x 4.3. For the second set of measurements, R 2 , we made measurements at two different, fixed rates. The majority (60%) of the measurements were made with a mean inter­measurement interval of 2 hours, while the remainder were made with a mean interval of about 2.75 days. The bulk of the R 2 measurements were also paired, meaning we would measure the path A ) B and then immediately measure the path B ) A. We discuss the reasons for these changes in methodology in x 7.4 and x 8.4. 4.2 The traceroute Utility Traceroute is a program written by Van Jacobson to trace the different hops comprising a route through the Internet [Jac89]. In this section we discuss the operation of the tool, as its particulars have direct impact on our routing measurements. 4.2.1 The Time To Live field All packets sent using the Internetwork Protocol (IP) contain in their headers a Time To Live (TTL) field [Po81a]. In the original IP design, this field was meant to limit the amount of time that a packet could exist inside the network, to prevent packets from endlessly circulating around routing loops (and eventually clogging up the entire network). The TTL header field is 8 bits wide and is interpreted as the time in seconds remaining until the packet must be discarded. Each internetwork router must decrement the field by the amount of time required to process the packet (including queueing), or by 1 second, whichever is larger. Thus, the TTL limits packets to at most 255 hops through the network, 1 and a lifetime of at most 255 seconds. If upon decrementing the TTL field a router observes that the TTL has reached zero, then it must not forward the packet but instead discard it as being too old. When it discards a packet for this reason, it must 2 then send back an Internet Control Message Protocol (ICMP; [Po81b]) message informing the sender of the packet that it was dropped due to an expired lifetime. 1 This is plenty in today's Internet. Routes of more than 30 hops are rare (x 6.7.5). But if much longer routes became commonplace, then the limited size of the TTL field could render parts of the Internet unable to communicate with other parts. 
2 This ``must'' is actually a very strong ``should.'' [Ba95] states that the router must generate the message, but can provide a per­interface option to disable generation, provided the option defaults to generation enabled.

While the original IP standard states that TTL is a time [Po81a], in reality virtually all Internet routers only decrement the TTL by 1 per hop, regardless of the processing time, often for reasons of performance. Acknowledging this de facto behavior, the current standard for Internet routers only requires that routers decrement the TTL by 1 per hop, while allowing them the option to decrement by more to account for processing time [Ba95]. Part of the motivation for this relaxation of the TTL requirement is to aid the workings of traceroute.

4.2.2 How traceroute works

The heart of traceroute is clever exploitation of the TTL field, as follows. To trace the route to a remote host H , traceroute first constructs a packet with H as its destination but with the TTL field initialized to 1. When this packet reaches the first hop in the path to H , the router decrements the TTL field, notices that it is zero, and sends back an ICMP message to this effect. The ICMP message includes in its own header the address of the router sending the message, which lets traceroute identify the hop 1 router as that address. Traceroute then sends a packet to H with the TTL field initialized to 2, and, similarly, gets back an ICMP message identifying the hop 2 router. It proceeds in this fashion until it receives a reply from H itself, and at that point it has elucidated the entire path to H . (Note that it has not also elucidated the path from H to the host running traceroute. The two are not necessarily the same, as we demonstrate in Chapter 8.) We will refer to the packets traceroute sends with adjusted TTL's as probes, and those with an initial TTL of n as ``hop n'' probes. Here is an example of the output from traceroute, tracing the path from a host at the University of Colorado at Boulder (ucol, as explained in Table I) to one at the San Diego Supercomputer Center (sdsc).

traceroute to rintrah­fddi.sdsc.edu (198.17.46.57), 30 hops max, 40 byte packets
 1  128.138.209.2  2 ms  2 ms  2 ms
 2  128.138.138.1  14 ms  4 ms  3 ms
 3  144.228.73.113  44 ms  39 ms  53 ms
 4  144.228.73.82  218 ms  207 ms  147 ms
 5  134.24.66.100  234 ms  *  85 ms
 6  198.17.46.57  85 ms  63 ms  67 ms

By default, traceroute sends three probes for each hop. The probes are sent serially, each waiting until traceroute receives an answer for the previous one. For each hop, traceroute reports the hop number, the IP address of the corresponding router, and the time in milliseconds it took to receive the reply. We note, however, that these times are often exceptionally noisy, because part of the total round­trip time includes the delay incurred at the router in generating an ICMP response to the exceptional event of an expired TTL. This delay can be quite large if the router is busy with other, higher priority tasks. A reply time of ``*,'' such as shown for hop 5, corresponds to a lost packet. Either the traceroute probe or the corresponding ICMP message was dropped by the network (or perhaps the ICMP message was not generated---see § 6.1, and also below). Traceroute waits 5 seconds for a reply before deciding that it will not be getting one. 3

3 Most versions of the traceroute documentation erroneously give this time as 3 seconds.
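The following Python sketch distills the probing mechanism of § 4.2.2: send UDP probes toward the destination with increasing TTL values, and learn each hop from the router that returns the ICMP reply. It is only an illustration of the idea, not the actual traceroute implementation; it sends a single probe per hop, requires raw-socket privileges, and omits the matching of replies to probes and the retry behavior of the real tool.

import socket

def trace(dest, max_hops=30, port=33434):
    # Simplified traceroute-style probing: one UDP probe per TTL value,
    # aimed at a high (presumably unused) port on the destination.
    dest_ip = socket.gethostbyname(dest)
    hops = []
    for ttl in range(1, max_hops + 1):
        recv = socket.socket(socket.AF_INET, socket.SOCK_RAW, socket.IPPROTO_ICMP)
        send = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
        recv.settimeout(5.0)                      # traceroute's reply timeout
        send.setsockopt(socket.IPPROTO_IP, socket.IP_TTL, ttl)
        send.sendto(b"", (dest_ip, port))         # the ``hop ttl'' probe
        try:
            # An ICMP Time Exceeded (or, from the destination itself, a Port
            # Unreachable) reply arrives from the router at this hop.
            _, (addr, _) = recv.recvfrom(512)
        except socket.timeout:
            addr = None                           # shown as ``*'' by traceroute
        finally:
            send.close()
            recv.close()
        hops.append(addr)
        if addr == dest_ip:                       # reached the destination
            break
    return hops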
15 The first line of the output indicates ``30 hops max,'' meaning that traceroute will stop sending probes after trying to elicit the 30th hop. This behavior is important because, as we will see in x 6.3.1, the Internet sometimes contains routing loops that would allow packets to circulate all the way up to the maximum of 255 hops, wasting considerable network resources. For our study we always used the default of 30 hops maximum (only very rarely did this prevent us from measuring the full path between sites in our study; see x 6.7.5), and the default of three probes per hop. We can translate the IP addresses to hostnames in order to visualize the route more clearly: 1 cs­gw­discovery.cs.colorado.edu 2 ms 2 ms 2 ms 2 cu­gw.colorado.edu 14 ms 4 ms 3 ms 3 sl­ana­3­s2/4­t1.sprintlink.net 44 ms 39 ms 53 ms 4 sl­univ­ca­1­s0­t1.sprintlink.net 218 ms 207 ms 147 ms 5 sdsc­ucop­mci.cerf.net 234 ms * 85 ms 6 rintrah.sdsc.edu 85 ms 63 ms 67 ms We see that the first two hops occur inside the University of Colorado at Boulder; then the packets are forwarded on to SprintLink, traveling first to Anaheim, CA, then up to Oakland, California (the University of California Office of the President), and finally back down along CERFNET to San Diego. 4.2.3 Traceroute limitations When using traceroute there are several limitations and measurement difficulties that one must bear in mind. In the previous section we showed an example of a traceroute from Colorado to San Diego that went quite smoothly, suffering only a single packet loss. In contrast, consider the following traceroute, between the same two hosts: traceroute to rintrah.sdsc.edu (198.17.47.57), 30 hops max, 40 byte packets 1 128.138.209.2 10 ms 0 ms 0 ms 2 128.138.138.1 0 ms 0 ms 0 ms 3 129.19.248.61 10 ms 129.19.254.45 10 ms 129.19.248.61 30 ms 4 192.52.106.1 60 ms 60 ms 70 ms 5 140.222.96.4 60 ms * 50 ms 6 140.222.88.1 70 ms 60 ms 60 ms 7 140.222.8.1 60 ms 50 ms 60 ms 8 140.222.16.1 70 ms 70 ms 70 ms 9 140.222.135.1 60 ms 70 ms 70 ms 10 198.17.47.2 4720 ms !H * 5100 ms !H Here are the corresponding hostnames: traceroute to rintrah.sdsc.edu (198.17.47.57), 30 hops max, 40 byte packets 1 cs­gw­discovery.cs.colorado.edu 10 ms 0 ms 0 ms 2 cu­gw.colorado.edu 0 ms 0 ms 0 ms 3 129.19.248.61 10 ms ncar­cu.co.westnet.net 10 ms 129.19.248.61 30 ms 4 enss.ucar.edu 60 ms 60 ms 70 ms 5 t3­3.cnss96.denver.t3.ans.net 60 ms * 50 ms 16 6 t3­0.cnss88.seattle.t3.ans.net 70 ms 60 ms 60 ms 7 t3­0.cnss8.san­francisco.t3.ans.net 60 ms 50 ms 60 ms 8 t3­0.cnss16.los­angeles.t3.ans.net 70 ms 70 ms 70 ms 9 t3­0.enss135.t3.ans.net 60 ms 70 ms 70 ms 10 enss.sdsc.edu 4720 ms !H * 5100 ms !H The first thing we notice is that this route is longer than the previous one, and more circuitous, travel­ ing over ANSNET (instead of SprintLink) through Denver and Seattle before arriving in California. We also notice that the router at hop 3, 129.19.248.61, does not have a correspond­ ing hostname registered in the Domain Name System (DNS; [MD88]). While most routers have hostnames associated with their IP addresses, we found that not all do. In this case, we could identify the router's location from its network prefix (129.19), as Colorado State University in Boulder, Colorado. Furthermore, for hop 3 traceroute reports not just one IP address but multiple ad­ dresses. What happened was that the first hop 3 probe was routed via the router with IP address 129.19.248.61, while the second one went via a different router, 129.19.254.45 (this one has a hostname, ncar­cu.co.westnet.net). 
The third one went via the same router as the first one, 129.19.248.61. Routing variation such as this can occur due to ``load balancing,'' in which the upstream router (hop 2 in this case) alternates the downstream links it uses to forward packets in an effort to spread load among them and avoid overloading either one. We investigate the effects of such routing, which we term ``fluttering,'' in detail in x 6.6. Hop 3 also illustrates the more general principle that packets do not always take the same route. It also can be difficult to determine whether two routes are equivalent. For example, it may be that 129.19.248.61 is indeed an interface on the same ncar­cu.co.westnet.net router, but one that happens not to have a hostname associated with it. Or it may be a physically distinct router. Because Internet routes can change between successive probe packets, we need to also realize that we have no guarantee that probes of different hops take the same route as previous probes. For example, from the above we might conclude that the first hop 3 probe took the route cs­gw­discovery.cs.colorado.edu ! cu­gw.colorado.edu ! 129.19.248.61, and the second took the route cs­gw­discovery.cs.colorado.edu ! cu­gw.colorado.edu ! ncar­cu.co.westnet.net. But for all we know the upstream route could have changed between the end of the hop 2 probes and the beginning of the hop 3 probes, and the hop 3 packets may have been routed via Alaska at the first two hops! The only ``guarantees'' we can have that the route has not changed are: (1) consistency with other measurements of the same path (for example, in multiple measurements we always see the same routers for hop 2 and hop 3), and (2) self­consistency within the route. For example, if we find that hop n + 1 is geographically distant from hop n, and we know the network lacks a link between those two locations, then we would conclude that a routing change occurred upstream from hop n + 1. Some examples of this behavior are given in x 6.5 and x 6.6.1. In general, if a route appears self­consistent and shows no sign of multiple routing for any of its hops, then we assume that it is indeed self­consistent, and treat the route as a valid measurement of the path to the remote host. Another anomaly to discuss in the example above is the 10th hop: 10 enss.sdsc.edu 4720 ms !H * 5100 ms !H 17 Here ``!H'' indicates that traceroute received an ICMP ``Host unreachable'' message from the router enss.sdsc.edu. This means that the router knows that the host cannot be presently reached. Another diagnostic traceroute can generate is ``!N,'' indicating that it received an ICMP ``Net­ work unreachable,'' the counterpart message indicating an entire network is unreachable (e.g., due to a failed link). We observed only two of these in all of our measurements. Note also that the 3rd probe packet reports a round­trip time (RTT) of 5,100 msec, even though traceroute supposedly only waits 5 seconds to receive a reply. Traceroute's timer, however, is not fine­grained, so due either to the timer's granularity, or to delays in scheduling the traceroute process for execution, traceroute received the reply before it decided to time out the probe. Another limitation to keep in mind is that traceroute elicits the route as seen at the IP network layer. Each hop reported gives the next IP router in the path from the source to the destina­ tion. 
Often, IP routers are connected to one another using simple ``link layer'' technologies such as Ethernets or point­to­point links, with trivial topologies. Increasingly, however, the link layer tech­ nologies, for example ATM or Frame Relay, themselves have more complicated topologies, and are capable of routing packets within a link layer mesh that itself has multiple hops. traceroute can­ not measure routing at this layer, because the TTL mechanism (x 4.2.2) is present only at the higher (IP) layer. For example, in our second dataset we found a route with the following two successive hops: gw1.scl1.alter.net 107.hssi4/0.gw1.mia1.alter.net The first hop is in Santa Clara, California, and the second in Miami, Florida. It turns out that there is no direct physical connection between these two routers, but rather a Frame Relay mesh [Lid96], a fact that we could not have surmised from the traceroute measurement of the route. Another potential source of measurement error arises in older (4.3 BSD­derived) routers incorrectly setting the TTL in their ICMP replies. As explained in the traceroute documentation ([Jac89]), these routers would erroneously use for the ICMP reply the TTL of the incoming packet that triggered the reply. For traceroute probes, this is a disaster, because the reply being triggered is precisely ``TTL expired,'' so the ICMP replies would be sent back using a TTL of 0, too (and thus never reach us). Since such routers consistently fail to return an ICMP reply to the sender, they are a form of ``unresponsive'' router, for which we analyze our measurements in x 6.1. A more subtle measurement problem occurs due to routers that are configured to rate limit generation of ICMP messages. For example, some routers will send at most one ICMP mes­ sage each second. Such behavior is specifically encouraged in x4.3.2.8 of [Ba95], as a means of conserving both network bandwidth and router resources. In x 6.2 we analyze our measurements for the presence of rate­limiting routers, and find that, in general, only endpoint hosts (and not routers internal to the Internet) appear to be presently limiting their ICMP generation rate. Another issue regarding traceroute concerns its use of the User Datagram Protocol (UDP; [Po80]). In order to associate the ICMP replies it receives with the probe packets it previously generated, traceroute must construct packets that manage to record identifying information in just the first 8 bytes of the transport layer header, as that is all of the original packet returned in an ICMP message. It does this by using for its probe a UDP packet, which it sends to a (hopefully) non­ existent port on the remote host H . The information traceroute needs to record the identifying information is coded in the port number in the UDP header. 18 Some network sites, however, have ``firewalls'' in place to filter incoming network traffic for security purposes [CB94]. These firewalls may decide that the incoming UDP packet does not appear destined to any of the services the site wishes to make publicly available to the Internet, so the firewall drops the packet without returning an ICMP Time Exceeded message. Thus, firewalls can generate an effect similar to lost packets (traceroute never receives a reply for a given hop, or beyond it). It was easy to identify such sites, as traceroutes to them consistently stopped short at the same router (x 6.7.4). For our analysis of the data, we considered any traceroute reaching a firewall router as having successfully reached the host. 
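The last observation suggests a simple post-processing check for firewalled destinations: flag hosts whose traceroutes consistently stop short at the same router. A minimal sketch follows; it assumes the measurements have already been parsed into per-traceroute lists of responding hop addresses, a hypothetical representation rather than NPD's actual data format.

def stops_short_consistently(traces, dest_ip):
    # traces: traceroutes toward dest_ip, each a list of responding hop
    # addresses (None for hops that timed out).
    # Returns the common final responder if every trace ends at that same
    # router without reaching dest_ip (suggesting a firewall), else None.
    last_responders = set()
    for hops in traces:
        responders = [h for h in hops if h is not None]
        if not responders or responders[-1] == dest_ip:
            return None    # a trace reached the host, or saw no replies at all
        last_responders.add(responders[-1])
    return last_responders.pop() if len(last_responders) == 1 else None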
Traceroute's use of UDP packets raises another measurement issue. When traceroute traces the route to an IP address A, it determines that it has elicited the full route whenever it receives a ``UDP Port Unreachable'' ICMP reply, even if the reply did not come from a router identifying itself as address A. Some hosts (and indeed all routers) have multiple IP addresses associated with them, so it is possible when tracing the route to address A to receive a reply from address B. When this happens, it indicates that A and B are both addresses for the same host (even though their associated hostnames might not reveal this). 4 It is sometimes possible to use this traceroute feature to determine whether two IP addresses correspond to the same host. For example, the name associated with 134.55.12.231 is llnl3­e­stub.es.net, while the name associated with 134.55.6.71 is llnl­lc3­3.es.net. Both of these names have DNS ``A'' records for the corresponding ad­ dresses, and no extra records, so a priori we might assume that the two addresses/hostnames refer to two separate machines. However, depending on the state of ESNET routing, it is possible for a traceroute to llnl3­e­stub.es.net to be ``answered'' by llnl­lc3­3.es.net, indicating that they are indeed the same machine. This test is not guaranteed to work, though. It depends on the machine's algorithm for deciding what IP address to put in its ICMP reply, and on which interface the incoming UDP probe packet arrives (which in turn depends on the current routing). 4.3 Exponential sampling We use the term ``measurement'' to denote the full process of running the traceroute utility; that is, the attempted tracing of the entire route between a source host and a destination host. In our experiment we devise our measurements of Internet routes so that the time intervals between consecutive measurements are independent and exponentially distributed. Using independent and exponentially distributed intervals between measurements gains two important (and related) properties. The first is that the measurements correspond to additive random sampling [BM92]. Such sampling is unbiased because it samples all instantaneous signal values with equal probability. The second important property is that the measurement times form a Poisson process. This means that Wolff's PASTA principle---``Poisson Arrivals See Time Averages''---applies to our measurements: asymptotically, the proportion of our measurements that observe a given state is equal to the amount of time that the Internet spends in that state [Wo82]. 4 Note also that sometimes the route to address A is different than the route to address B! For our measurements, this only occurred for mbone.ucar.edu, for which the route to one of its addresses is one hop longer (and a strict superset) of the route to the other address. We accommodated this difference in our analysis by considering a traceroute that reached the endpoint of the shorter route as having traveled successfully to the host. 19 Two important points regarding Wolff's theorem are (1) the observed process does not need to be Markovian; and (2) the Poisson arrivals need not be homogeneous 5 [Wo82, x 3]. This second point is particularly important for our study, because our measurement rate varied, as dis­ cussed in x 4.1. The only requirement of the PASTA theorem is that the observed process cannot antici­ pate observation arrivals. 
For any interarrival distribution other than independent exponentials, the process can anticipate observation times to some degree because the instantaneous probability of an arrival changes with the length of time since the last observation. For the exponential distribution, however, the probability remains constant, a consequence of the distribution's ``memoryless'' property. Thus, the theorem fundamentally requires independent exponential intervals between measurements, which argues strongly for the use of exponential sampling in practice. There is one respect in which our measurements fail the ``lack of anticipation'' requirement. Even though we schedule our observations to come at independent, exponentially distributed intervals, the network can anticipate arrivals to a certain extent. In particular, when the network has lost connectivity between the site running npd control (§ 4.1) and a site potentially conducting a traceroute, the network can predict that no measurement will occur. Thus, while the times at which we attempted to measure the network satisfy the PASTA requirements, the times for which we successfully measured the network do not in this regard. The effect of this imperfect sampling is a tendency to underestimate the prevalence of network connectivity problems, as discussed further in § 5.2. The main use we make of the PASTA theorem is as follows. If we make n observations of Internet routing, of which k find state S and n - k do not, then we can use k/n as an unbiased estimate of the proportion of time that the Internet spends in state S. Furthermore, if we treat the individual observations as independent, we can apply Fisher's exact test (§ 4.5) to test for significant differences among sets of observations. We discuss this independence assumption further in § 4.6.

4.4 Which observations are representative?

In this section we discuss what sort of observations we can make of the Internet for which our samples are plausibly representative of Internet behavior in general, and those for which we would not consider our samples representative. 37 Internet hosts participated in our routing study. This is a minuscule fraction of the estimated 6.6 million Internet hosts as of July, 1995 [Lo95], so clearly the behavior we observe that is due to the particular endpoint hosts in our study is not representative. 6 The 37 endpoint hosts were from 34 different networks, again a minuscule fraction of the more than 50,000 known to the NSFNET in April, 1995 [Me95a]. So, again, any behavior we observe due to the particular endpoint (``stub'') networks in our study is not persuasively representative. On the other hand, we argue that the routes between the 37 hosts are plausibly representative, because they include a non­negligible fraction of the autonomous systems (AS's) which together comprise the Internet. Recall that AS's are administrative entities that manage routing for a collection of networks, using unspecified protocol(s), and that routing between AS's is done using the Border Gateway Protocol. We expect the different routes within an AS to have similar characteristics (e.g., prevalence of pathologies, or routing stability), because they fall under a common administration. We therefore argue that sampling a significant number of AS's lends representational weight to a set of measurements.

5 That is, the arrival rate can vary over time, as long as the interarrival distribution remains exponential and the arrivals remain independent of each other and of the observed process.
6 Furthermore, the sites were self­selected (usually, though not always, because someone at the site had an interest in wide­area networking) and skewed to universities.
To determine the number of AS's in the Internet, we proceeded as follows. In Jan­ uary, 1996, we obtained a BGP routing table dump from the AS border router kasina.sdsc.edu, located at the San Diego Supercomputer Center (SDSC) 7 . The routing table lists all the destinations (networks, more or less) known to the router, i.e., its view of the Internet. For each of those destina­ tions, the table includes a list of AS's over which routing information for the destination traveled to kasina.sdsc.edu. The view of Internet routing given directly by this table is skewed by SDSC's particular location in the Internet. However, virtually all of the routing reflects disparate AS's con­ necting to SDSC's network service provider, MCI, at many different points. So, if we exclude MCI itself from our subsequent analysis, then the remainder of the routing gives us a much broader view, namely that seen by MCI at its many interconnection points. All in all, the routes in the table included 1,031 AS's for 33,824 distinct destinations. From this we estimate that the Internet presently has about 1,000 active AS's. (As of August, 1995, about 6,600 had been assigned [DISA95].) The routes in our study traversed 85 of these, or about 8%. An important point, however, is that not all AS's are equal---some are much more promi­ nent in Internet routing than others. We devised a ``weight'' to associate with AS's as follows. For each AS, we counted the number of times it occurred in the BGP table in a path to a remote des­ tination. The AS's weight then is the ratio of the number of its occurrences to the total number of occurrences of any AS. 8 The weights obtained in this fashion are skewed towards the view of the Internet as seen by SDSC, and indeed two AS's had weight 25%: AS 145 (``NSFNET­CORE'') and AS 3561 (``MCI­ RESTON''), because virtually every route known to the SDSC router goes through these two. But the next AS has a weight of only 5% (AS 1239, ``SprintLink''), because the majority of the routes do not go through it. So we adjusted for the SDSC­skewed perspective by removing the first two AS's from the set and recomputing the weights. After this adjustment, we find that the AS's sampled by the routes we measured represent, by weight, about 52% of the Internet routes. We take this as an indication that we did indeed sample a significant subset of the large­scale variation in Internet routes, and our observations of those routes are plausibly representative of Internet routing as a whole. 4.5 Testing for significant differences Because we have measurements taken at two points in time---the end of 1994 and the end of 1995---we have an opportunity to assess a number of aspects of the measurements in the two datasets for the degree to which they reflect significant differences. We can then interpret these 7 Many thanks to Hans­Werner Braun of SDSC for suggesting and facilitating this. 8 Better would probably be to weight by traffic volume. Unfortunately, the statistics necessary for doing so are not available. 21 differences (or lack of differences) as indicating how the Internet changed (remained unchanged) over the course of 1995. While having just two points in time offers only the most crude form of trend, it is still far better than simply assuming that characteristics of the Internet do not change, particularly given evidence of major changes over time as discussed in our previous work [Pa94b, Pa94a]. 
The potential changes we will attempt to assess concern the frequency with which we observe different Internet phenomena (for example, routing loops). Suppose that, out of two representative samples from R 1 and R 2 of n 1 and n 2 observations, respectively, we find that subsets of size k 1 and k 2 exhibit some property P . We wish to gauge whether finding k 1 instances of P out of n 1 samples in R 1 is statistically consistent with finding k 2 instances out of n 2 samples in R 2 . If consistent, then we do not have evidence of a significant change between R 1 and R 2 . But if the findings are inconsistent, then we interpret the difference as due to a change in the prevalence of P: either the likelihood of P increased during 1995, if k 2 /n 2 > k 1 /n 1 , or decreased, if k 2 /n 2 < k 1 /n 1 . To test for statistically significant differences, we use Fisher's exact test. The discussion of the test we now present follows that of Rice [Ri95]. Let K 1 denote a random variable giving the number of instances of P observed in R 1 , N 1 the total number of observations in R 1 , and K 2 and N 2 the same for R 2 . Let K = K 1 + K 2 and N = N 1 + N 2 correspond to the totals across both datasets. The key observation of Fisher's test is that, if the likelihood of observing P is the same in the two datasets, then we can view the problem as: for K total instances of P out of N observations, how likely is it that K 1 of them would have fallen into R 1 , given that R 1 comprises N 1 observations? With this rephrasing of the problem, we have that

    P[K_1 = k_1 \mid N_1 = n_1, K = k, N = n] = \frac{\binom{n_1}{k_1} \binom{n_2}{k - k_1}}{\binom{n}{k}}     (4.1)

The denominator of Eqn 4.1 corresponds to the number of ways that k instances of P can be distributed, among a partition of n total observations, into two sets of n 1 and n 2 = n - n 1 observations, while the numerator counts those arrangements in which exactly k 1 of the instances fall into the first set. Eqn 4.1 thus gives the probability of observing k 1 instances in R 1 , given the size of R 1 , the total number of instances of P , the size of the combined sample pool, and the null hypothesis that R 1 and R 2 are constructed using independent draws without replacement from the combined sample pool. Armed with Eqn 4.1 for the probability of observing exactly k 1 instances, we can then construct a rejection region corresponding to values of k 1 that we would be unlikely to observe if the null hypothesis is indeed correct. We use a two­sided region, meaning that it includes both values of k 1 that are too low to be likely, and values that are too high. To construct the region, we find the maximum k l and minimum k u for which

    P[K_1 \le k_l \mid N_1 = n_1, K = k, N = n] \le \alpha/2
    P[K_1 \ge k_u \mid N_1 = n_1, K = k, N = n] \le \alpha/2

Given these values, we then have

    P[K_1 \le k_l \text{ or } K_1 \ge k_u \mid N_1 = n_1, K = k, N = n] \le \alpha

So, given the null hypothesis, K 1 will fall into the rejection region by chance with probability α or smaller. By using α = 0.05, using this test we will erroneously reject the null hypothesis at most 5% of the time. Consequently, if K 1 falls into the rejection region, we conclude with confidence 95% that the null hypothesis is incorrect, and indeed there was a significant change in the prevalence of P between R 1 and R 2 . All that remains to use this test is to specify how to find k l and k u . For a given κ, we have

    P[K_1 \le \kappa \mid N_1 = n_1, K = k, N = n] = \sum_{j = \max(0,\, k - n_2)}^{\min(\kappa,\, k)} \frac{\binom{n_1}{j} \binom{n_2}{k - j}}{\binom{n}{k}}

where n 2 = n - n 1 . 9

9 Here and in the equation, the min and max operators are to exclude values of the summation index j that are impossible because they require more than n 2 instances of P in the second set of the partition, or fewer than 0.
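As a concrete illustration of the procedure just described, the following Python sketch builds the two-sided rejection region directly from the hypergeometric probabilities of Eqn 4.1, using only the standard library; it is an illustrative reimplementation for this discussion, not the code actually used in the study.

from math import comb

def hypergeom_pmf(j, n1, k, n):
    # Eqn 4.1: probability that exactly j of the k instances of P fall in the
    # first set under the null hypothesis of draws without replacement.
    n2 = n - n1
    if j < max(0, k - n2) or j > min(k, n1):
        return 0.0
    return comb(n1, j) * comb(n2, k - j) / comb(n, k)

def fisher_rejects(k1, n1, k2, n2, alpha=0.05):
    # Returns True if observing k1 of n1 in R1 versus k2 of n2 in R2 is
    # inconsistent, at level alpha, with a single underlying probability of P.
    k, n = k1 + k2, n1 + n2
    top = min(k, n1)
    pmf = [hypergeom_pmf(j, n1, k, n) for j in range(top + 1)]
    k_l, k_u = -1, top + 1          # defaults: empty rejection tails
    lower = 0.0
    for j in range(top + 1):        # largest k_l with P[K1 <= k_l] <= alpha/2
        lower += pmf[j]
        if lower <= alpha / 2:
            k_l = j
    upper = 0.0
    for j in range(top, -1, -1):    # smallest k_u with P[K1 >= k_u] <= alpha/2
        upper += pmf[j]
        if upper <= alpha / 2:
            k_u = j
    return k1 <= k_l or k1 >= k_u

Applied to the counts with which some pathology appears in R 1 and R 2 , this is the kind of comparison drawn in, for example, § 6.10 and § 8.5.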
4.6 A note on independence

The argument in the previous section assumes that our measurements are observing independent events. This is not quite true for our measurements. Using Poisson sampling means that the measurement arrivals are independent. However, the observations themselves (what each independently scheduled measurement observes) are not independent: any temporal correlations in the observed process will be faithfully reflected in the observations.

However, we will be applying the methodology in § 4.5 to rare events, such as the observation of pathological routing conditions. These rare events are generally not clustered in time, so the approximation that observations of them are independent is a good one.

Chapter 5
The Raw Routing Data

In this chapter we discuss the sites that participated in our routing experiments, the duration of the experiments, and the preliminary reduction of the raw data we gathered.

5.1 Participating sites

The first routing experiment began the evening of Tuesday, November 8, 1994, and lasted until the morning of Saturday, December 24. During this time, we attempted 6,991 traceroutes between 27 sites. We refer to this collection of measurements as R1 (dataset #1). We will often refer to a single such measurement as a ``traceroute.''

The second experiment began the morning of Friday, November 3, 1995, and lasted until the afternoon of Thursday, December 21. It included 37,097 attempted traceroutes between 33 sites. We refer to this collection of measurements as R2. Details of the measurements and the sampling intervals are discussed in § 4.1.

Both R1 and R2 are publicly available from the Internet Traffic Archive, at:

    http://www.acm.org/sigcomm/ITA

under the name NPD-Routes.[1]

[1] At the time of this writing, the Archive is moving from its old location to the above URL. If the reader has any difficulty accessing the Archive, send email to vern@ee.lbl.gov.

Table I lists the sites participating in R1, giving the abbreviation we will use to refer to each site, the site's Internet domain, the number of days it participated in the study, a brief description of the site, and its location. These sites also participated in R2, except for batman, korea, usc, and xor. Table II lists the additional sites participating in R2. In R2, all sites participated at least a month, except for ukc, which participated for 23 days, and 13 of the sites participated for the maximum of 48 days. The sites include educational institutes, research labs, network service providers, and commercial companies, in 9 countries. Figures 5.1 and 5.2 show the geographic locations of the North American and European sites.

Name | Domain | Days | Description | Location
austr | mu.oz.au | 24 | University of Melbourne | Melbourne, Australia
batman | batman.net | 11 | Experimental ATM network at National Center for Atmospheric Research | Boulder, CO
bnl | bnl.gov | 37 | Brookhaven National Lab | Brookhaven, NY
bsdi | bsdi.com | 9 | Berkeley Software Design, Inc. | Colorado Springs, CO
connix | connix.com | 22 | Caravela Software | Middlefield, CT
harv | harvard.edu | 9 | Harvard University | Cambridge, MA
inria | inria.fr | 9 | INRIA | Sophia, France
korea | postech.ac.kr | 36 | Pohang Institute of Science and Technology | Pohang, South Korea
lbl | lbl.gov | 45 | Lawrence Berkeley Lab | Berkeley, CA
lbli | lbl.gov | 45 | LBL home computer connected via ISDN | Berkeley, CA
mit | mit.edu | 21 | Massachusetts Institute of Technology | Cambridge, MA
ncar | ucar.edu | 22 | National Center for Atmospheric Research | Boulder, CO
nrao | cv.nrao.edu | 44 | National Radio Astronomy Observatory | Charlottesville, VA
oce | oce.nl | 19 | Oce-van der Grinten | Venlo, The Netherlands
pubnix | va.pubnix.com | 11 | Pix Technologies Corp. | Fairfax, VA
sdsc | sdsc.edu | 24 | San Diego Supercomputer Center | San Diego, CA
sri | sri.com | 9 | SRI International | Menlo Park, CA
ucl | ucl.ac.uk | 24 | University College | London, U.K.
ucol | colorado.edu | 45 | University of Colorado | Boulder, CO
ukc | ukc.ac.uk | 24 | University of Kent | Canterbury, U.K.
umann | uni-mannheim.de | 19 | University of Mannheim | Mannheim, Germany
umont | umontreal.ca | 15 | University of Montreal | Montreal, Canada
unij | kun.nl | 9 | University of Nijmegen | Nijmegen, The Netherlands
usc | usc.edu | 45 | University of Southern California | Los Angeles, CA
ustutt | uni-stuttgart.de | 16 | University of Stuttgart | Stuttgart, Germany
wustl | wustl.edu | 33 | Washington University | St. Louis, MO
xor | xor.com | 30 | XOR Network Engineering | East Boulder, CO

Table I: Sites participating in first experiment (R1)

Name | Domain | Description | Location
adv | advanced.org | Advanced Network & Services | Armonk, New York
austr2 | newcastle.edu.au | University of Newcastle | Newcastle, Australia
mid | mid.net | MIDnet | Lincoln, Nebraska
near | near.net | NEARnet | Cambridge, Massachusetts
panix | nyc.access.net | Public Access Networks Corporation | New York, New York
rain | rain.net | RAINet, Inc. | Portland, Oregon
sandia | ca.sandia.gov | Sandia National Laboratories | Livermore, California
sintef1 | sintef.no | University of Trondheim | Trondheim, Norway
sintef2 | sintef.no | University of Trondheim | Trondheim, Norway
ucla | ucla.edu | University of California | Los Angeles, California

Table II: Additional sites participating in second experiment (R2)

[Figure 5.1: Sites participating in routing study, North America and Asia]

[Figure 5.2: Sites participating in routing study, Europe]

Status | # (Exp. 1) | % (Exp. 1) | # (Exp. 2) | % (Exp. 2)
Unable to contact daemon | 495 | 7.1% | 1,872 | 5.0%
Daemon configuration error | 25 | 0.4% | 15 | 0.04%
Host lookup failure | 12 | 0.2% | 101 | 0.3%
Total failures | 532 | 7.6% | 1,988 | 5.4%
Total successes | 6,459 | 92.4% | 35,109 | 94.6%
Total | 6,991 | 100.0% | 37,097 | 100.0%

Table III: Summary of routing experiment difficulties

5.2 Measurement breakdown

In the two experiments, between 5--8% of the traceroutes failed outright (i.e., we were unable to contact the remote npd, execute traceroute and retrieve its output). As shown in Table III, almost all of the failures were due to an inability of the npd control process to contact the remote daemon. Some of these were failures involving lbli; that site, due to its ISDN link frequently being down (§ 6.7.4), was often unreachable. But for most of the failures we do not a priori know whether they represent the remote host being down or an Internet connectivity failure.
It is important to note that, if the latter was frequently the case, then to some degree the assumptions behind PASTA are invalid, since an agent at the remote site with knowledge of current connectivity problems could reliably predict that no sampling would occur in the near future (§ 4.3). For our analysis, these failures to contact the remote daemon (npd) lead to a bias towards underestimating Internet connectivity failures: a failure to contact the remote daemon sometimes costs us the opportunity for a traceroute measurement to reveal a lack of connectivity between that site and another remote site, when the path between them shares the faulty portion of the path between the npd control process and the daemon.

When taking the R2 measurements, however, we somewhat corrected for this underestimation by pairing each measurement of the path A → B with a measurement of the path B → A.[2] If the npd control process was unable to reach one of either A or B, it still attempted to contact the other to measure the reverse route. In those circumstances where it was able to measure the reverse route, it still had an opportunity to observe the routing fault, if present in both directions. The npd control process was unable to reach one of either A or B 1,872 times; in only 5% of these instances was it also unable to contact the other host of the measurement pair. Thus, for the most part, the R2 measurements do not suffer from bias in observing bidirectional routing faults.

[2] About 20% of the measurements were not paired, because they were made in conjunction with the measurements discussed in Part II.

We could further reduce this measurement problem by introducing a ``batch'' design to npd, in which the daemon would accept a list of measurements it should make at future points in time, and would email back the results when they were complete. We did not adopt this approach because one of our goals in the design of npd was to keep it simple enough that sites volunteering to run it could with reasonable ease inspect the code to see what they were running.
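A sketch of the pairing discipline just described is given below. The helper names contact_daemon and run_traceroute are hypothetical stand-ins for npd's actual control interface; only the control flow mirrors the text.

    def measure_pair(a, b, contact_daemon, run_traceroute):
        """Attempt to measure both A -> B and B -> A.  If one endpoint's
        daemon is unreachable, still try the other, so a bidirectional
        routing fault can still be observed from the reverse direction."""
        results = {}
        for src, dst in ((a, b), (b, a)):
            if contact_daemon(src):
                results[(src, dst)] = run_traceroute(src, dst)
            else:
                results[(src, dst)] = None   # daemon unreachable; measurement lost
        return results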
Figures 5.3 and 5.4 summarize the number of traceroute measurements between each pair of sites for each of the experiments.

[Figure 5.3: Number of measurements made for each Internet path, R1 dataset. The figure is a source-by-destination matrix of per-path traceroute counts for the 27 R1 sites (6,459 measurements in all); the individual counts are omitted here.]

[Figure 5.4: Number of measurements made for each Internet path, R2 dataset. The figure is a source-by-destination matrix of per-path traceroute counts for the 33 R2 sites (35,109 measurements in all); the individual counts are omitted here.]

Site | Best Guess
wvnet-wtn9-c1.sura.net | Charleston, WV
128.167.205.2 | Charlottesville, VA
reynolds-ctv1-c1.sura.net | Charlottesville, VA
uva-ctv-c3mb.sura.net | Charlottesville, VA
38.2.213.16 | New York, NY
core.net218.psi.net | New York, NY
leaf.net218.psi.net | New York, NY
38.1.2.14 | Washington, D.C.
core.net222.psi.net | Washington, D.C.
137.209.1.1 | College Park, MD
192.80.6.2 | College Park, MD
198.25.80.1 | College Park, MD
199.54.78.1 | College Park, MD

Table IV: Uncertain router sites

5.3 Geography

To understand the Internet topology traversed by the experiment, and how each router relates to others, we undertook to identify the geographic locations of the 751 routers (distinct IP addresses) involved in R1 and the 1,095 routers in R2. The identification involved several steps:
1. Routers with an Internet hostname in the same domain as one of the participating sites (e.g., colorado.edu) were assumed to be located at that site.

2. Routers with a single geographic location in their name (e.g., dallas1.tx.alter.net) were assumed to reside at that location.

3. For still-unidentified routers, we sent email to the NIC ``whois'' contacts [HSF85] for the router's domain, asking if they could identify the router's location or the naming scheme used for routers in that domain. The various contacts proved remarkably helpful, willing to go to considerable efforts to aid in locating the sites. We also benefited from various ``whois'' servers, especially the European server whois.ripe.net and its corresponding WAIS server, and topology maps.

4. If any still-unidentified routers only occurred as a hop between two identified sites at the same location, we assumed the router was sited at that location too. For example, if we observed a partial network path of A → B → C, with A and C both sited in San Diego, then we assumed that B was sited in San Diego too.

5. For the remainder, we made a ``best guess,'' based on the locations of upstream and downstream routers. Table IV summarizes the sites for which we had to guess.

Thus, of the 1,531 routers traversed during the study, we were able to identify the location of all but 13.

After locating the routers, we reduced the topology traversed by the experiment to connections between cities, listed in Table V. Having developed a geographic database for the various routers, we then constructed maps showing the links traversed in the study.[3] Figure 5.5 shows these links from a North American perspective, where sites in Hawaii, Korea, and Australia are shown west of California, and sites in Europe and Israel are shown in the Atlantic. Figure 5.6 shows the links from a European perspective; here, the only links extending outside of Europe were those to sites in the U.S., which is represented as a single site west of France.

[3] Doing so first required analyzing the traceroutes for routing pathologies (§ 6), because ``fluttering'' and mid-stream routing changes can easily introduce spurious links.

State or Country | City
California | Anaheim, Berkeley, Bloomington, Hayward, Livermore, Los Angeles, NASA-AMES (Moffett Field), Oakland, Palo Alto, Pasadena, Sacramento, San Diego, San Francisco, San Jose, Santa Clara, Stockton
Colorado | Boulder, Colorado Springs, Denver, East Boulder
Connecticut | Hartford, Middlefield
Florida | Miami
Georgia | Atlanta
Hawaii | Honolulu
Illinois | Batavia, Chicago, Willow Springs
Maryland | College Park
Massachusetts | Boston, Cambridge, Waltham
Michigan | Detroit
Missouri | Kansas City, St. Louis
Nebraska | Lincoln
New Jersey | Pennsauken, Princeton, West Orange
New Mexico | Albuquerque, Los Alamos
New York | Albany, Armonk, Brookhaven, Buffalo, Deer Park, Ithaca, New York, Syracuse
North Carolina | Greensboro, Raleigh
Ohio | Cleveland, North Royalton
Oregon | Portland
South Carolina | Greenville
Texas | Austin, Dallas, Fort Worth, Houston
Virginia | Charlottesville, Fairfax, Falls Church, Newport News, Norfolk, Vienna
Washington, D.C. |
Washington | Kent, Seattle
West Virginia | Charleston
Australia | Adelaide, Canberra, Melbourne, Newcastle, Sydney
Austria | Vienna
Belgium | Brussels
Canada | Vancouver, Montreal, Toronto
England | Cambridge, Canterbury, London, Manchester
Finland | Helsinki
France | Lyon, Marseilles, Montpellier, Nice, Paris, Poitiers, Sophia, Toulouse
Germany | Aachen, Duesseldorf, Heidelberg, Karlsruhe, Mannheim, Munich, Stuttgart
Italy | Milan
Israel | Jerusalem, Rehovot
Korea | Pohang, Seoul
Netherlands | Amersfoort, Amsterdam, Den Bosch, Eindhoven, Nijmegen, Venlo, Utrecht
Norway | Oslo, Trondheim
Spain | Madrid
Sweden | Stockholm
Switzerland | Geneva

Table V: Router cities

[Figure 5.5: Links traversed during R1 and R2, North American perspective]

[Figure 5.6: Links traversed during R1 and R2, European perspective]

Chapter 6
Routing Pathologies

We begin our analysis by classifying occurrences of routing pathologies---those routes that exhibited either clear sub-standard performance, or out-and-out broken behavior.

6.1 Unresponsive routers

Some routers do not return the required ICMP messages in response to traceroute probes (§ 4.2.2), or do so with insufficient TTL's to make the return trip. We refer to these as unresponsive routers. If these routers are prevalent, they will add a great deal of noise to our measurements, making analysis difficult. This is especially the case because an unresponsive router looks identical to a router that had to drop all three probe packets due to congestion, a case we are interested in analyzing.

Fortunately, unresponsive routers are easy to spot. Unlike congested routers, unresponsive routers consistently fail to answer any of the traceroute probe packets. Because we measured multiple traceroutes between sites, we can look for just such consistency.[1]

[1] Recall that we use the term ``traceroute'' to refer to both the utility, and to an instance of a measurement made using the utility.

Upon inspecting the traceroutes in R1, we found 4 unresponsive routers (which between them appeared in a total of 93 traceroutes): the last two hops prior to the ukc endpoint (repaired on December 8); the last hop prior to the lbli endpoint (frequently, but not always); and the 8th hop from usc to various destinations for traffic routed between CERFNET (hop 7) and AlterNet or MCINET (hop 9), consistently. This quantity of only 4 unresponsive routers contrasts with the 751 responsive routers in the first measurement set: clearly almost all Internet routers correctly return ICMP messages for expired TTL's.

Furthermore, in R2 we did not identify any unresponsive routers, in contrast with 1,095 responsive routers. The previously unresponsive routers found in the first measurement set now were responsive, indicating they had been upgraded (except we were unable to determine if those on the usc paths had been upgraded, since usc did not participate in the second set of measurements).[2]

[2] In doing this analysis for R2, we encountered a strange anomaly: all of the traceroutes from adv to ustutt were missing the hop between icm-dc-1-h1/0-t3.icp.net and amsterdam1.dante.net. But this hop consistently appeared in other traceroutes to ustutt, identifying itself as icm-dante-e0.icp.net. It turned out that due to an administrative decision, icm-dante-e0.icp.net did not have a route to adv's autonomous system, so its replies were always lost.
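The consistency check described above is simple to mechanize. The following sketch (hypothetical data layout; not the thesis's tooling) flags hop positions on a path that never produced a reply across every traceroute in which they appear, distinguishing them from congested hops, which fail only sporadically.

    from collections import defaultdict

    def find_unresponsive(traceroutes, min_observations=5):
        """Flag hop positions that never answer any of the three probes across
        all traceroutes of the same path.  `traceroutes` is a hypothetical list
        of (path_id, hops), where each hop is a list of responders, with '*'
        standing for an unanswered probe."""
        seen = defaultdict(int)
        answered = defaultdict(int)
        for path_id, hops in traceroutes:
            for hop_index, replies in enumerate(hops):
                seen[(path_id, hop_index)] += 1
                if any(r != '*' for r in replies):
                    answered[(path_id, hop_index)] += 1
        # Require a handful of observations before concluding anything.
        return [key for key, n in seen.items()
                if n >= min_observations and answered[key] == 0]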
6.2 Rate-limiting routers

Some routers limit the rate at which they generate ICMP messages, to conserve resources (§ 4.2.3). We can partially test for the presence of such routers in our measurements as follows. Recall that, for each hop n, traceroute sends three ``probes'' to elicit ICMP messages in reply. If the hop n router limits its ICMP generation rate, then in general it will reply to the first probe (unless it happens to already have been generating ICMP messages). This reply will lead to traceroute rapidly sending another probe, one whose ICMP reply will then be suppressed by the router due to rate-limiting. Since traceroute waits up to 5 seconds between probe packets, the third probe will not arrive until 5 seconds after the second, by which time rate-limiting again allows the router to reply. So rate-limiting routers that limit ICMP generation to on the order of one per 1-2 seconds will show up in our measurements as having a high proportion of hops with the first and third replies received, but no second reply received. We term such replies ``R-*-R,'' reflecting their pattern.

We analyzed R2 to determine for each router the proportion ρ of ``R-*-R'' replies, limiting the analysis to routers for which we had at least 5 measurements. The distribution of ρ was sharply bimodal, with 8 routers exhibiting ρ ≥ 50% and the remaining 701 all having ρ ≤ 20%. Of the 8 routers, 7 were endpoints: inria, mid, nrao, sri, ustutt, ucl, and wustl. These seven are all running the Solaris operating system, which by default is configured to do rate-limiting. The other router was cs-gw.colorado.edu, which, according to its DNS ``HINFO'' record, is a Cisco 7000. These routers support rate-limiting, and apparently this one had the option activated; but we conclude that, in general, routers deployed today do not rate-limit their ICMP generation, at least not on time scales of one per 1-2 seconds.

Because we subsequently only undertake light analysis of dropped traceroute probes (and never endpoint drops), for simplicity we assume that all missing ICMP replies correspond to either a dropped traceroute probe packet or a dropped reply, and not to the effects of rate-limiting.

6.3 Routing loops

Suppose router R1's routing tables indicate that, to forward a packet to host H, it should send the packet along a path that eventually includes router R2. If, due to an inconsistency, R2's tables indicate it in turn should forward the packet to H via a path that eventually includes R1, the network contains a loop. (Here R1 and R2 denote two routers, not our two datasets.) The packet will circulate between R1 and R2 until either its TTL expires (§ 4.2.1), never reaching H, or the loop is broken by a routing update.

In general, routing algorithms are designed to avoid loops, provided all of the routers in the network share a consistent view of the present connectivity. Thus, loops are apt to form when the network experiences a change in connectivity and that change is not immediately propagated to all of the routers [Hu95].

One hopes that loops resolve themselves quickly, as they represent a complete failure. As long as the loop persists, end-to-end communication involving the path is impossible. Some researchers have downplayed the significance of temporary routing loops [MRR80], and the ARPANET was subject to transitory looping ``at the 1% level'' [Co90].
Assuming that this means that ARPANET paths on average contained a loop 1% of the time, then from the figures presented in this section and the next we will see that loops in the Internet occur much more rarely. Other researchers have noted that loops can rapidly lead to congestion as a router is flooded with multiple copies of each packet it forwards [ZG-LA92], and minimizing loops is a major Internet design goal [Li89]. To this end, the Border Gateway Protocol (BGP) used between autonomous systems is designed to never allow the creation of inter-AS loops [RL95, Re95], which it accomplishes by tagging all routing information with the AS path it traversed. This technique is based on the observation that routing loops occur only when the propagation of routing information itself is subject to loops. The tagging allows a BGP router to determine if a peer is giving it information that the peer directly or indirectly derived from the router itself. If so, the router discards the information.

In this section we analyze our measurements for the prevalence of routing loops. We classify these loops as two types, ``persistent'' if they lasted longer than the traceroute measurement, or ``temporary'' if they resolved within the span of the traceroute observing them. The next two subsections look at these two types, and the final subsection comments on the location of the loops within the network.

6.3.1 Persistent routing loops

A persistent routing loop is easy to detect in a traceroute. Here is an example of a loop between lbl and lbli, ordinarily 6 hops apart:

     1  ir6gw.lbl.gov  1.853 ms  1.623 ms  2.358 ms
     2  er1gw.lbl.gov  7.165 ms  2.996 ms  3.098 ms
     3  ir2gw.lbl.gov  4.882 ms  3.516 ms  8.371 ms
     4  isdn1gw.lbl.gov  7.98 ms  4.393 ms  4.311 ms
     5  ascend49.lbl.gov  36.833 ms  32.772 ms  31.428 ms
     6  isdn1gw.lbl.gov  30.428 ms  30.502 ms  33.528 ms
     7  ascend49.lbl.gov  69.006 ms  59.429 ms  58.82 ms
     8  isdn1gw.lbl.gov  59.358 ms  63.734 ms  61.775 ms
     9  ascend49.lbl.gov  85.629 ms  84.168 ms  83.397 ms
    10  isdn1gw.lbl.gov  83.374 ms  83.201 ms  83.349 ms
    11  ascend49.lbl.gov  110.316 ms  120.243 ms  116.84 ms
    12  isdn1gw.lbl.gov  109.221 ms  108.97 ms  109.242 ms
    13  ascend49.lbl.gov  135.867 ms  136.797 ms  140.849 ms
    14  isdn1gw.lbl.gov  137.359 ms  138.757 ms  137.028 ms
    15  ascend49.lbl.gov  171.109 ms  167.197 ms  168.027 ms
    16  isdn1gw.lbl.gov  187.18 ms  177.017 ms  165.499 ms
    17  ascend49.lbl.gov  199.461 ms  193.441 ms  201.067 ms
    18  isdn1gw.lbl.gov  191.205 ms  198.674 ms  192.041 ms
    19  ascend49.lbl.gov  228.833 ms  219.05 ms  240.464 ms
    20  isdn1gw.lbl.gov  213.537 ms  214.975 ms  220.435 ms
    21  ascend49.lbl.gov  249.681 ms  254.247 ms  243.089 ms
    22  isdn1gw.lbl.gov  239.341 ms  239.072 ms  243.516 ms
    23  ascend49.lbl.gov  268.134 ms  270.585 ms  267.982 ms
    24  isdn1gw.lbl.gov  273.742 ms  274.974 ms  265.043 ms
    25  ascend49.lbl.gov  297.033 ms  293.392 ms  294.328 ms
    26  isdn1gw.lbl.gov  348.844 ms  303.868 ms  291.552 ms
    27  ascend49.lbl.gov  335.637 ms  324.15 ms  322.982 ms
    28  isdn1gw.lbl.gov  328.654 ms  321.418 ms  316.452 ms
    29  ascend49.lbl.gov  344.561 ms  351.843 ms  346.087 ms
    30  isdn1gw.lbl.gov  358.938 ms  348.781 ms  355.01 ms

isdn1gw.lbl.gov is the Laboratory's ISDN gateway, and ascend49.lbl.gov is the other end of the ISDN link to lbli. Here, ascend49.lbl.gov apparently has lost track of the notion that lbli resides on its side of the ISDN point-to-point link, so it forwards any packets for lbli back to the ISDN gateway.

For our analysis, we considered any traceroute showing a loop that was not resolved by the end of the traceroute (i.e., after probing 30 hops) as a ``persistent loop.'' Of the 6,204 traceroutes in R1,[3] 10 exhibited persistent routing loops. Table VI summarizes these.

[3] This number represents the 6,459 total traceroutes, minus 255 traceroutes originating from wustl, which, as explained in § 6.6.2, suffered from a large degree of ``fluttering,'' making it difficult to determine whether true routing loops were also present.

Source | Dest. | Loop | Location
ucol | bnl | 129.19.253.18, 129.19.253.17 | Col. State Univ.
austr | umann | mf-0.enss145.t3.ans.net, umd-rt1.es.net | FIX-East
mit | umann | same |
lbli | xor | icm-fix-e-h2/0-t3.icp.net, icm-dc-2b-h3/0-t3.icp.net | FIX-East, Washington D.C.
lbl | lbli | isdn1gw.lbl.gov, ascend49.lbl.gov | LBL (this loop occurred twice)
lbl | inria | llnl-e-llnl2.es.net, llnl2-e-llnl.es.net | Livermore, California
sdsc | ukc | gw.ukc.ac.uk, gw.ulcc.ja.net | London, Canterbury
sdsc | usc | mobydick.cerf.net, drzog.cerf.net | SDSC
harv | ucl | mf-0.cnss56.washington-dc.t3.ans.net, mf-0.cnss58.washington-dc.t3.ans.net | Washington, D.C.

Table VI: Persistent routing loops in R1

Three of these loops appear to have formed during the traceroute probe. In the harv → ucl loop, for example, the probes made it to London and almost to the ucl endpoint before the loop appeared in Washington, D.C., at hop 16:

     1  glan-gw.harvard.edu  87 ms  3 ms  2 ms
     2  wjhgw1.harvard.edu  4 ms  2 ms  2 ms
     3  harvard-gw.near.net  8 ms  11 ms  4 ms
     4  prospect-gw.near.net  20 ms  20 ms  12 ms
     5  tang-gw.near.net  32 ms  6 ms  6 ms
     6  enss.near.net  6 ms  6 ms  3 ms
     7  t3-3.cnss48.hartford.t3.ans.net  7 ms  9 ms  11 ms
     8  t3-2.cnss32.new-york.t3.ans.net  9 ms  10 ms  10 ms
     9  t3-1.cnss56.washington-dc.t3.ans.net  18 ms  16 ms  20 ms
    10  mf-0.cnss58.washington-dc.t3.ans.net  15 ms  17 ms  16 ms
    11  washington2.dante.net  20 ms  15 ms  19 ms
    12  icm-dc-1-e4/0.icp.net  75 ms  58 ms  77 ms
    13  icm-london-1-s1-1984k.icp.net  144 ms  218 ms  127 ms
    14  smds-gw.ulcc.ja.net  230 ms  161 ms  146 ms
    15  smds-gw.ucl.ja.net  131 ms  155 ms  138 ms
    16  cisco-pb.ucl.ac.uk  1566 ms  *  mf-0.cnss58.washington-dc.t3.ans.net  53 ms
    17  mf-0.cnss56.washington-dc.t3.ans.net  58 ms  58 ms  55 ms
    18  mf-0.cnss58.washington-dc.t3.ans.net  66 ms  61 ms  60 ms
    19  mf-0.cnss56.washington-dc.t3.ans.net  62 ms  68 ms  68 ms
        etc.

In the sdsc → usc loop, the loop formed just one hop from the SDSC source, after the probe had already made it from San Diego to Los Angeles:

     1  drzog.cerf.net  163 ms  2 ms  2 ms
     2  134.24.120.102  7 ms  8 ms  7 ms
     3  *  ucla-la-smds.cerf.net  66 ms  19 ms
     4  *  losnet.ucla.edu  16 ms  16 ms
     5  isi-ucla-gw.ln.net  57 ms  20 ms  18 ms
     6  *  *  mobydick.cerf.net  9 ms
     7  drzog.cerf.net  13 ms  9 ms  7 ms
     8  mobydick.cerf.net  9 ms  10 ms  9 ms
     9  drzog.cerf.net  10 ms  11 ms  21 ms
    10  mobydick.cerf.net  13 ms  32 ms  11 ms
        etc.

The presence of packet loss (*'s) prior to the loop forming at hops 6--7 may indicate connectivity deteriorating prior to a routing change (which led to an inconsistent state). A similar loss can be seen in the harv → ucl example above, at hop 16.
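The rule used above for classifying a traceroute as containing a persistent loop can be sketched as a simple candidate test. The fragment below (hypothetical input format, not the thesis's analysis code) handles the common two-router case: the probes never reach the destination, the full hop budget is used, and the trailing hops keep alternating among a small set of already-visited routers.

    def persistent_loop_candidate(hops, destination, max_hops=30):
        """`hops` is a hypothetical list of responding routers, one per hop,
        with None where all three probes went unanswered."""
        responders = [r for r in hops if r is not None]
        if not responders or responders[-1] == destination or len(hops) < max_hops:
            return False
        tail = responders[-6:]                 # last few responding hops
        distinct = set(tail)
        # A two-router loop shows up as e.g. A B A B A B at the end of the trace.
        return len(distinct) <= 2 and all(tail.count(r) >= 2 for r in distinct)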
The lbl → inria loop entailed two separate loops:

     1  ir6gw.lbl.gov  1.858 ms  1.66 ms  1.546 ms
     2  er1gw.lbl.gov  3.68 ms  2.423 ms  2.244 ms
     3  lbl-lc2-1.es.net  3.252 ms  2.618 ms  2.645 ms
     4  llnl-lbl-t3.es.net  5.892 ms  4.634 ms  3.985 ms
     5  lanl-llnl-t3.es.net  34.728 ms  29.444 ms  30.195 ms
     6  snla-lanl-t3.es.net  61.712 ms  60.392 ms  60.347 ms
     7  pppl-fnal-t3.es.net  78.807 ms  79.19 ms  77.252 ms
     8  pppl-nis.es.net  79.454 ms  78.5 ms  78.166 ms
     9  umd-pppl.es.net  85.851 ms  105.744 ms  89.141 ms
    10  icm-fix-e-f0.icp.net  129.442 ms  86.567 ms  88.157 ms
    11  * * *
    12  *  *  llnl-lanl-t3.es.net  321.099 ms
    13  lanl-llnl-t3.es.net  577.496 ms  199.259 ms  134.383 ms
    14  llnl-lanl-t3.es.net  134.854 ms  135.204 ms  134.909 ms
    15  lanl-llnl-t3.es.net  160.895 ms  160.312 ms  162.187 ms
    16  llnl-lanl-t3.es.net  161.882 ms  315.869 ms  *
    17  * * *
    18  * * *
    19  * * *
    20  * * *
    21  * * *
    22  * * *
    23  * * *
    24  llnl2-e-llnl.es.net  17.051 ms  26.225 ms  22.082 ms
    25  llnl-e-llnl2.es.net  21.823 ms  15.619 ms  21.804 ms
    26  llnl2-e-llnl.es.net  16.693 ms  22.776 ms  26.126 ms
    27  llnl-e-llnl2.es.net  23.758 ms  19.809 ms  22.475 ms
        etc.

The first sign of trouble is at hop 11, where, after having made it to FIX-East in Maryland at hop 10, the network begins dropping probe packets (or their responses). At hop 12, a temporary routing loop forms between the ESNET routers in Livermore, California, and Los Alamos, New Mexico. This loop appears to lead to further problems at the end of hop 16,[4] where subsequent packets are lost for nearly 2 minutes (recall that each `*' represents a lost response, including a 5-second wait). Finally, at hop 24 the network comes back, but in an inconsistent state, with a consequent routing loop. Most likely the routing inconsistency leading to the first loop was propagated through ESNET to form the second loop.

[4] So the loop persisted for about 2.5 seconds, as indicated by summing the return times for each of the probe packets.

In R2, 50 traceroutes showed persistent loops. Due to R2's higher sampling frequency, for some of these loops we can place bounds on how long they persisted, by looking for surrounding measurements between the same hosts that do not show the loop. In addition, sometimes the surrounding measurements do show the loop---these allow us to put lower bounds on the loop's duration, too.

Table VII summarizes the loops seen in R2. The first two columns give the source and destination of the traceroute, the next column the date, and the fourth column the number of consecutive traceroutes that encountered the loop. The fifth and sixth columns give the routers involved in the loop and the geographic location. Note that only one of the loops spanned multiple cities (and multiple continents!), the last in the table. The final column gives the bounds we were able to assess for the duration of the loop. Upper bounds indicate the difference in time between the two non-looping traceroutes bracketing the loop, if this difference was less than 1 day (otherwise the upper bound is potentially so lax that we omit it). Lower bounds, when present, indicate the difference in time between the first traceroute in a sequence observing the loop, and the last. For loops only observed during a single traceroute, this bound is omitted. Loops for which we were unable to assign any plausible bounds have their bounds marked as ``?''.

The loop durations appear to fall into two modes, those definitely under 3 hours (and possibly quite shorter), and those of more than half a day.
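The bounding procedure just described amounts to a small amount of bookkeeping over the time-ordered measurements for a path. The sketch below (hypothetical data layout, not the thesis's code) derives a lower bound from the first and last looping observations and an upper bound from the bracketing non-looping observations, discarding upper bounds beyond a day as too lax.

    def loop_duration_bounds(observations, max_gap_hours=24.0):
        """`observations` is a hypothetical time-ordered list of
        (timestamp_in_hours, saw_loop) pairs for one source/destination pair.
        All looping observations are treated as a single episode (a
        simplification).  Returns (lower_bound, upper_bound) in hours, with
        upper_bound None when it would exceed max_gap_hours."""
        looped = [t for t, saw in observations if saw]
        if not looped:
            return None
        first, last = looped[0], looped[-1]

        # Lower bound: span of the consecutive traceroutes observing the loop
        # (zero when only a single traceroute saw it).
        lower = last - first if len(looped) > 1 else 0.0

        # Upper bound: gap between the bracketing non-looping traceroutes,
        # kept only if it is under a day.
        before = [t for t, saw in observations if not saw and t < first]
        after = [t for t, saw in observations if not saw and t > last]
        upper = None
        if before and after and (after[0] - before[-1]) <= max_gap_hours:
            upper = after[0] - before[-1]
        return lower, upper

    # Example: a loop seen at 2 h and 5 h, bracketed by clean measurements at
    # 0 h and 12 h, yields bounds of 3--12 hours.
    print(loop_duration_bounds([(0.0, False), (2.0, True), (5.0, True), (12.0, False)]))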
The presence of persistent loops of durations on the order of hours to tens of hours is quite surprising, and suggests a lack of good tools for diagnosing network problems: neither the NOC's (Network Operation Centers) responsible for the looped routers, nor the customers, apparently discovered and repaired the loops for considerable periods of time, despite the total connectivity outage due to the loop. We also note a tendency for persistent loops to come in clusters.

Source | Dest. | Date | # | Loop | Location | Duration
inria | adv | Nov. 6 | 1 | icm-dc-1-f0/0.icp.net, icm-dc-2b-f2/0.icp.net | Washington | ?
inria | near | Nov. 11 | 1 | same as above | Washington | ≤ 3 hr
wustl | inria | Nov. 24 | 1 | same as above | Washington | ?
inria | pubnix | Nov. 12 | 1 | icm-dc-3-f2/0.icp.net, icm-dc-2b-f2/0.icp.net | Washington | ?
inria | austr2 | Nov. 15 | 1 | same as above | Washington | ?
sintef1 | adv | Nov. 12 | 1 | icm-pen-1-h1/0-t3.icp.net, icm-dc-2b-h0/0-t3.icp.net | Washington | ?
pubnix | sintef1 | Nov. 8 | 1 | sl-ana-1-f0/0.sprintlink.net, sl-ana-2-f0/0.sprintlink.net | Anaheim | ?
ustutt | ucl | Nov. 11 | 16 | stuttgart1.belwue.de, stuttgart4.belwue.de | Stuttgart | 16--32 hr
connix | bsdi | Nov. 14 | 1 | sl-dc-8-h1/0-t3.sprintlink.net, sl-mae-e-h2/0-t3.sprintlink.net | MAE-East | ≥ 10 hr
ustutt | austr | Nov. 14 | 1 | same as above | |
pubnix | sintef1 | Nov. 14 | 1 | fddi0/0.cr1.dca1.alter.net, cisco1.washington.dc.ms.uu.net | Washington | ≤ 5.5 hr
austr | nrao | Nov. 15 | 1 | cpk8-cpk-cf.sura.net, cpk9-cpk-cf.sura.net | College Park | ?
many | oce | Nov. 23 | 12 | amsterdam.nl.net, wgm01.nl.net | Amsterdam | 14--17 hr
ucol | ustutt | Nov. 24 | 1 | borderx1-hssi3-0.sanfrancisco.mci.net, pacbell-nap-atm.sanfrancisco.mci.net | San Francisco | ?
ucol | inria | Nov. 27 | 1 | stamand1.renater.ft.net, stamand3.renater.ft.net | Paris | ≤ 14 hr
mid | bsdi | Nov. 28 | 1 | sl-dc-6-f0/0.sprintlink.net, sl-dc-8-f0/0.sprintlink.net | Washington | ≤ 3 hr
mid | austr | Dec. 6 | 1 | sl-chi-6-h3/0-t3.sprintlink.net, sl-chi-nap-h1/0-t3.sprintlink.net | Chicago | ≤ 3 hr
mit | wustl | Dec. 10 | 1 | starnet2.starnet.net, starnet8.starnet.net | St. Louis | ?
umann | nrao | Dec. 13 | 1 | heidelberg1.belwue.de, heidelberg2.belwue.de | Heidelberg | ?
ucl | mit | Dec. 14 | 1 | mci-its.near.net, w91-rtr-external-fddi.mit.edu | Cambridge | ≤ 3 hr
near | ucla | Dec. 16 | 1 | ln-gw.cs.ucla.edu, ucla-isi-gw.ln.net | Los Angeles | ?
sri | near | Dec. 17 | 1 | su-a.bbnplanet.net, su-b.bbnplanet.net | Palo Alto | ?
near | sri | same | 1 | barrnet.sanfrancisco.mci.net, borderx1-hssi2-0.sanfrancisco.mci.net | San Francisco | ?
bsdi | sintef1 | Dec. 21 | 1 | icm-pen-2-h2/0-t3.icp.net, icm-uk-1-h0/0-t3.icp.net | Pennsauken, London | ≤ 10 hr

Table VII: Persistent routing loops in R2

Geographically, loops occurred much more often in the Washington D.C. area (MAE-East and College Park are only a few miles away), perhaps because the very high degree of interchange between different network service providers in that area offers ample opportunity for introducing inconsistencies.

Loops involving separate pairs of routers also are clustered in time. The pubnix → sintef1 loop, involving two AlterNet routers sited in Washington D.C., was measured at the same time between the connix → bsdi and ustutt → austr observations of a SprintLink loop, at nearby MAE-East. The sri → near and near → sri loop observations were made back-to-back.
They do not observe the same loop, but rather two separate loops between closely related routers (the typical path from near to sri proceeds from MCINET in San Francisco immediately to BARRNET at Stanford (Palo Alto), and then at the next hop to BBN Planet at Stanford). Thus, it appears that the inconsistencies that lead to long-lived routing loops are not confined to a single pair of routers but also affect nearby routers, tending to introduce loops into their tables too. This in turn suggests that any persistent loop encountered in the network is very serious, as it may reflect a substantially larger outage than just the two looped routers initially observed.

6.3.2 Temporary routing loops

Fortunately, routing loops do not always persist for long periods of time. In addition to analyzing the traceroute data for persistent loops, we also looked for temporary loops. We define a temporary loop as one during which a router was visited at different hops, yet eventually the traceroute probe traveled beyond the loop. This definition requires manual inspection of the candidates, to remove spurious ``loops'' that are in reality due instead to other factors, such as ``fluttering'' (rapidly-variable routing; § 6.6.2) or mid-stream route changes (§ 6.5).

The lbl → inria example in the previous section shows both a temporary loop and a permanent loop, both involving ESNET routers. In addition to the lbl → inria example above, R1 exhibited one other case of a temporary routing loop, occurring between ucl and wustl:

     1  cisco.cs.ucl.ac.uk  12 ms  5 ms  5 ms
     2  cisco-pb.ucl.ac.uk  11 ms  4 ms  4 ms
     3  cisco-b.ucl.ac.uk  5 ms  4 ms  5 ms
     4  gw.lon.ja.net  20 ms  22 ms  19 ms
     5  eu-gw.ja.net  60 ms  21 ms  19 ms
     6  icm-lon-1.icp.net  20 ms  25 ms  37 ms
     7  icm-dc-1-s3/2-1984k.icp.net  177 ms  191 ms  168 ms
     8  *  sl-dc-7-f0.sprintlink.net  1174 ms  183 ms
     9  sl-starnet-1-s0-t1.sprintlink.net  220 ms  216 ms  233 ms
    10  * * *
    11  * * *
    12  stl2-e0.starnet.net  506 ms  775 ms  262 ms
    13  stl3-e0.starnet.net  218 ms  *  *
    14  stl2-e0.starnet.net  919 ms  *  237 ms
    15  *  stl3-e0.starnet.net  193 ms  191 ms
    16  * * *
    17  * * *
    18  * * *
    19  * * *
    20  * * *
    21  *  tango.cs.wustl.edu  260 ms  *

Here, at hops 12-15, the STARnet routers engage in a short-term routing loop that evidently is resolved during hops 16-20 (an outage of about 80 seconds).[5]

[5] As discussed in § 6.6.2 below, these STARnet routers also suffered from route ``fluttering,'' though that problem was apparently fixed on December 12, and this trace is from December 15, after the repair.

While in R1 we only observed two temporary loops, in R2 we found 23. We confine ourselves here to a look at two of the more seriously pathological, as these illustrate the degree to which routing can degrade.
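Before turning to these two examples, the candidate-finding step implied by the definition above can be sketched as follows; as noted, the manual inspection to weed out fluttering and mid-stream route changes still remains. The input format is hypothetical.

    def temporary_loop_candidates(hops):
        """List routers that are candidates for a temporary loop: the router
        is seen again at a later, non-adjacent hop, and some different router
        answers afterwards (a loose proxy for the probe eventually traveling
        beyond the loop).  `hops` is a hypothetical list of responding
        routers, with None for unanswered hops."""
        last_seen = {}
        candidates = set()
        for i, router in enumerate(hops):
            if router is None:
                continue
            if router in last_seen and i - last_seen[router] > 1:
                later = [r for r in hops[i + 1:] if r is not None and r != router]
                if later:
                    candidates.add(router)
            last_seen[router] = i
        return sorted(candidates)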
The first of these was from rain to inria:

     1  r0.pdx.rain.rg.net  3.212 ms  2.903 ms  2.348 ms
     2  border1-serial2-5.seattle.mci.net  8.119 ms  7.509 ms  8.303 ms
     3  core-fddi-0.seattle.mci.net  10.255 ms  11.472 ms  9.087 ms
     4  core2-hssi-3.denver.mci.net  42.005 ms  45.637 ms  41.765 ms
     5  core1-aip-4.denver.mci.net  180.353 ms  210.453 ms  222.771 ms
     6  core2-hssi-2.westorange.mci.net  192.796 ms  224.263 ms  257.99 ms
     7  core2-hssi-2.washington.mci.net  96.183 ms  90.611 ms  90.897 ms
     8  borderx2-fddi-1.washington.mci.net  88.917 ms  98.286 ms  99.512 ms
     9  mae-east-plusplus-two.washington.mci.net  95.96 ms  111.302 ms  121.937 ms
    10  icm-dc-e-f0/0.icp.net  91.077 ms  102.348 ms  95.265 ms
    11  * * *
    12  * * *
    13  *  *  borderx2-fddi-1.washington.mci.net  269.431 ms
    14  mae-east-plusplus-two.washington.mci.net  440.782 ms  293.266 ms  166.355 ms
    15  mae-east-plusplus.washington.mci.net  89.681 ms  94.609 ms  90.987 ms
    16  borderx1-hssi2-0.washington.mci.net  91.661 ms  89.673 ms  96.562 ms
    17  core2-fddi-0.washington.mci.net  137.351 ms  174.362 ms  204.639 ms
    18  borderx2-fddi-1.washington.mci.net  95.169 ms  90.19 ms  94.371 ms
    19  mae-east-plusplus-two.washington.mci.net  97.839 ms  91.079 ms  97.236 ms
    20  mae-east-plusplus.washington.mci.net  92.483 ms  91.213 ms  91.38 ms
    21  borderx1-hssi2-0.washington.mci.net  92.318 ms  92.662 ms  95.358 ms
    22  * * *
    23  r0.pdx.rain.rg.net  3.343 ms !H  *  *
    24  *  t8-gw.inria.fr  779.58 ms  *
    25  tom.inria.fr  657.659 ms  *  *

The traceroute begins without any problems, traveling to ICP (the Sprint/NSF International Connectivity Project) in Washington via Seattle, Denver, West Orange (New Jersey), Washington, and MAE-East. At hop 11, however, we observe a 40 second outage. Evidently the outage was due to the loss of the link between mae-east-plusplus-two.washington.mci.net and icm-dc-e-f0/0.icp.net, because when the outage finished, we find ourselves in a routing loop between five different routers:

    borderx2-fddi-1.washington.mci.net
    mae-east-plusplus-two.washington.mci.net
    mae-east-plusplus.washington.mci.net
    borderx1-hssi2-0.washington.mci.net
    core2-fddi-0.washington.mci.net

This is one of only two times in either R1 or R2 that we observed a loop involving more than two routers. (The other is discussed in § 6.4.) The loop persists from hop 13 to hop 21 (at least). At hop 22 we suffer a 15 second outage, and when it resolves we find ourselves all the way back to where we started at hop 1. The router there has returned an ``ICMP unreachable'' message (the !H), indicating it is convinced that it cannot reach inria, presumably because it has lost its link to border1-serial2-5.seattle.mci.net. After another 15 second outage, however, we suddenly find ourselves in France, at inria's doorstep: either both of the previous problems had resolved themselves, or an alternate path was discovered.
The second seriously pathological traceroute was from ucol to umann:

     1  cs-gw-srl.cs.colorado.edu  3 ms  3 ms  2 ms
     2  cu-gw-fddi.colorado.edu  5 ms  2 ms  4 ms
     3  ncar-cu.co.westnet.net  13 ms  4 ms  8 ms
     4  ml-t3-gw.ucar.edu  11 ms  24 ms  34 ms
     5  border2-hssi1-0.denver.mci.net  73 ms  141 ms  87 ms
     6  core-fddi-1.denver.mci.net  80 ms  22 ms  24 ms
     7  *  core2-hssi-2.westorange.mci.net  47 ms  64 ms
     8  core2-hssi-2.washington.mci.net  58 ms  63 ms  59 ms
     9  borderx2-fddi-1.washington.mci.net  73 ms  98 ms  111 ms
    10  mae-east-plusplus-two.washington.mci.net  60 ms  64 ms  60 ms
    11  icm-dc-e-f0/0.icp.net  112 ms  99 ms  91 ms
    12  icm-dc-1-h1/0-t3.icp.net  81 ms  94 ms  105 ms
    13  icm-dante-e0.icp.net  115 ms  150 ms  *
    14  *  amsterdam1.dante.net  205 ms  *
    15  nl-s1.dante.bt.net  177 ms  166 ms  151 ms
    16  nl-f0-0.eurocore.bt.net  172 ms  190 ms  176 ms
    17  de-s1-1.eurocore.bt.net  206 ms  247 ms  227 ms
    18  de-f0.dante.bt.net  251 ms  181 ms  227 ms
    19  * * *
    20  * * *
    21  *  icm-dc-2b-f2/0.icp.net  151 ms  138 ms
    22  icm-dc-1-f0/0.icp.net  97 ms  86 ms  64 ms
    23  icm-dc-2b-f2/0.icp.net  98 ms  85 ms  107 ms
    24  icm-dc-1-f0/0.icp.net  109 ms  92 ms  umd2-pppl2.es.net  251 ms
    25  *  mae-east-plusplus-two.washington.mci.net  178 ms  251 ms
    26  pppl2-umd2.es.net  702 ms  *  *
    27  core-hssi-3.sanfrancisco.mci.net  158 ms !H  *  core-fddi-1.denver.mci.net  34 ms !H

Everything is fine up until hop 18, with the path traversing from Boulder to Denver, in Colorado; then over MCINET to West Orange and down to MAE-East, then across to Amsterdam and over to Duesseldorf---almost there! But a 35 second outage at hops 19--21 is the beginning of trouble. When the network begins responding again, we have fallen back to a temporary loop between icm-dc-1-f0/0.icp.net and icm-dc-2b-f2/0.icp.net in Washington, D.C., a position similar to that we had achieved at hops 11-12 earlier. At hop 25 we again visit mae-east-plusplus-two.washington.mci.net, already visited at hop 10. Note two things about this hop. First, we have now backtracked twice, once to icm-dc-2b-f2/0.icp.net, and then again to MAE-East, which is an earlier hop than ICM in Washington. Second, we have acquired an additional 15 hops to our route upstream of MAE-East, so along with the routing loop in Washington, there is also a major change closer to ucol.

At hop 26 we find ourselves on ESNET, but at hop 27 we initially are rerouted to San Francisco on MCINET, indicating another upstream change (since ESNET does not have a link from Princeton to MCI in San Francisco). This router indicates that it knows of an immediate outage by flagging the hop using !H. But only five seconds later we lose connectivity even to San Francisco---we are back in Denver again, as we were at hop 6, and unable to make any further progress (the router flags !H).

Clearly at least two different major failures occurred in this example, one the routing loop at icm-dc-2b-f2/0.icp.net, and the other the rapidly changing (and lengthening) path upstream from MAE-East. In the previous example, the same applies: we observed both a routing loop in Washington, and a connectivity outage between Portland and Seattle. A very interesting question is whether these failures were actually reflections of a single underlying catastrophe that propagated through the network at large.
All in all we observed 20 instances of multiple large-scale changes such as illustrated in this example, suggesting that either the propagation of a single fault's effects through the network sometimes leads to widespread, temporary instability, or that a mechanism separate from the exchange of routing information is producing widespread faults. Determining which of these is the case and how the fault propagates would make for interesting future work.

6.3.3 Location of routing loops

We analyzed the routers involved in temporary and persistent loops to see whether any of the loops involved more than one AS. As mentioned above, the design of BGP in theory prevents any inter-AS loops, by preventing any looping of routing information. We found that only three of the R1 loops spanned more than one AS, and only two of those in R2. We also learned that at least one of the inter-AS loops in R2 occurred due to the presence of a static route, and thus clearly was not the fault of BGP. It may be that the others have similar explanations. In any event, it appears clear from our data that BGP loop suppression virtually eliminates inter-AS looping.

6.4 Erroneous routing

A final example of a routing loop occurred during a connix → ucl traceroute, which also exhibits erroneous routing, where the packets clearly took the wrong path:

     1  mfd-01.rt.connix.net  8 ms  4 ms  3 ms
     2  sl-dc-5-s2/0-512k.sprintlink.net  39 ms  39 ms  39 ms
     3  sl-dc-6-f0/0.sprintlink.net  39 ms  38 ms  50 ms
     4  psi-mae-east-1.psi.net  48 ms  66 ms  *
     5  *  *  core.net218.psi.net  90 ms
     6  192.91.187.2  1139 ms  1188 ms  *
     7  * * *
     8  biu-tau.ac.il  1389 ms  *  *
     9  tau.man.ac.il  1019 ms  *  *
    10  * * *
    11  *  cisco301s1.huji.ac.il  1976 ms  *
    12  * * *
    13  * * *
    14  *  *  cisco101e5.huji.ac.il  1974 ms
    15  * * *
    16  *  cisco103e2.gr.huji.ac.il  1010 ms  1069 ms
    17  cisco101e01.cc.huji.ac.il  2132 ms  *  *
    18  cisco102e13.huji.ac.il  888 ms  976 ms  2005 ms
    19  cisco103e2.gr.huji.ac.il  1657 ms  *  *
    20  *  *  cisco101e01.cc.huji.ac.il  1349 ms
    21  * * *
        etc.

Recall that connix is sited in Middlefield, Connecticut, and ucl in London, England. Yet at hop 6, instead of routing towards London, the route winds up visiting 192.91.187.2 as the next hop---192.91.187.2 is sited at the Weizmann Institute in Rehovot, Israel! (As can be seen by the long latency to hop 6, a satellite link is involved here.) Not surprisingly, the bewildered Israeli routers do not really know what to make of the London-bound packet: it enters a routing loop between cisco101e01.cc.huji.ac.il, cisco102e13.huji.ac.il, and cisco103e2.gr.huji.ac.il prior to being discarded. The lack of any response to traceroute probes beyond hop 20 may be due to the route being terminated further upstream, or because growing congestion on the US--Israel link led to subsequent probes getting dropped.

There is a security lesson to be considered here, too: one really cannot make any safe assumptions about where one's packets might travel on the Internet. If the Israeli routers had an alternate path to London available to them, it is possible that this highly circuitous route would have succeeded (cf. § 6.9).
6.5 Connectivity altered mid-stream

In 10 of the R1 traces we observed routing connectivity reported earlier in the traceroute later lost or altered, indicating we were catching a routing failure as it happened:

     1  netlab1-gw.usc.edu  3 ms  3 ms  3 ms
     2  rtr1.usc.edu  3 ms  2 ms  2 ms
     3  isi-usc-gw.ln.net  5 ms  4 ms  5 ms
     4  ucla-isi-gw.ln.net  121 ms  230 ms  *
     5  * * *
     6  * * *
     7  * * *
     8  * * *
     9  *  rtr1.usc.edu  2 ms !H  *
    10  * * *
    11  rtr1.usc.edu  2 ms !H  *  *
    12  * * *
    13  rtr1.usc.edu  2 ms !H  *  2 ms !H

In this trace from usc to ucol, by hop 4 the packets have made it from usc out to the UCLA/ISI Los Nettos gateway. The large round-trip times reported at hop 4 indicate trouble, however,[6] and after the second hop 4 reply, connectivity is lost for about 70 seconds. When it returns, connectivity is only present to the hop 2 router, which reports that the destination host is unreachable (the ``!H'' flag). Because the recovery only extends to the 2nd hop, we infer that the problem occurred not at the hop 4 router but rather at hop 3, the gateway between USC and ISI.

[6] Between usc and ucol this hop usually had a latency of 5-10 msec. We did not, however, undertake any rigorous evaluation of hop latencies, because of the potentially large noise associated with these times, as discussed in § 4.2.2, and as illustrated above.

In the other traces, a connectivity loss was followed by a recovery, as shown in this traceroute between bnl and usc:

     1  cerberus.90.bnl.gov  2 ms  2 ms  2 ms
     2  nioh.bnl.gov  3 ms  2 ms  4 ms
     3  192.12.15.224  3 ms  2 ms  2 ms
     4  pppl-bnl.es.net  11 ms  11 ms  14 ms
     5  * * *
     6  *  192.12.15.224  4 ms !H  *
     7  *  192.12.15.224  3 ms !H  *
     8  *  192.12.15.224  5 ms !H  *
     9  * * *
    10  * * *
    11  *  192.12.15.224  4 ms !H  *
    12  *  192.12.15.224  84 ms !H  *
    13  * * *
    14  *  usc-cit-gw.ln.net  563 ms  257 ms
    15  rtr5.usc.edu  283 ms  317 ms  242 ms
    16  catarina.usc.edu  282 ms  102 ms  211 ms
    17  escondido.usc.edu  199 ms  306 ms  392 ms

Router 192.12.15.224 is located at the bnl site. At hop 5, it clearly loses its link to pppl-bnl.es.net, and the link does not return for two minutes. Once it does, the traceroute probes are able to continue all the way to usc.

Three additional R1 traces revealed similar high-delay recoveries, incurring outages ranging from about 1 minute to almost 5 minutes. One striking example is from wustl to ucol:

     1  jcr-166.cs.wustl.edu  5 ms  2 ms  2 ms
     2  ncrc-eng.wustl.edu  3 ms  2 ms  2 ms
     3  128.252.5.120  3 ms  3 ms  2 ms
     4  128.252.1.2  4 ms  4 ms  3 ms
     5  sl-dc-7-s7-t1.sprintlink.net  30 ms  28 ms  28 ms
     6  sl-dc-6-f0/0.sprintlink.net  81 ms  27 ms  33 ms
     7  sl-dc-8-f0/0.sprintlink.net  106 ms  37 ms  30 ms
     8  * * *
     9  *  *  sl-dc-8-f0/0.sprintlink.net  32 ms !H
    10  * * *
    11  * * *
    12  * * *
    13  * * *
    14  * * *
    15  * * *
    16  * * *
    17  * * *
    18  * * *
    19  * * *
    20  * * *
    21  * * *
    22  * * *
    23  * * *
    24  * * *
    25  clark.cs.colorado.edu  128 ms  106 ms  105 ms

Here, connectivity was lost for between 15-17 hops. At first it might appear from this traceroute that the route upon recovery consisted of 25 hops, but that is instead a measurement artifact: by the time the network had recovered, the traceroute hop-count had ratcheted so high that the first successful probes following the outage made it all the way to the ucol endpoint. They no doubt would also have done so if they had been transmitted with somewhat lower TTL's.
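The outage durations quoted in this section are derived informally from the probe losses: each `*' reflects a lost reply and a wait of roughly 5 seconds. A minimal sketch of that arithmetic follows (hypothetical input format; answered probes' round-trip times are ignored, so the figures are rough):

    PROBE_TIMEOUT = 5.0   # seconds traceroute waits for each unanswered probe

    def estimate_outages(hops, timeout=PROBE_TIMEOUT):
        """Convert runs of unanswered probes into approximate outage
        durations.  `hops` is a hypothetical list of per-hop probe results,
        e.g. ['*', '*', '*'] for a fully unanswered hop."""
        outages, run = [], 0
        for probes in hops:
            for p in probes:
                if p == '*':
                    run += 1
                else:
                    if run:
                        outages.append(run * timeout)
                    run = 0
        if run:
            outages.append(run * timeout)
        return outages

    # Example: the wustl -> ucol trace above contains roughly 50 unanswered
    # probes in a row, suggesting an outage on the order of 50 * 5 s, i.e.
    # about four minutes.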
Two other traces revealed different, quite quick recovery behavior:

     1  netlab1-gw.usc.edu  3 ms  3 ms  3 ms
     2  rtr1.usc.edu  4 ms  3 ms  3 ms
     3  cit-usc-gw.ln.net  8 ms  3 ms  4 ms
     4  cerfnet-cit-gw.ln.net  17 ms  23 ms  6 ms
     5  sdsc-cit.cerf.net  84 ms  39 ms  21 ms
     6  mobydick.cerf.net  30 ms  37 ms  35 ms
     7  ucop-sdsc-2.cerf.net  85 ms  43 ms  50 ms
     8  sl-ana-3-s2/6-t1.sprintlink.net  68 ms  86 ms  84 ms
     9  sl-ana-1-f0/0.sprintlink.net  94 ms  72 ms  53 ms
    10  sl-fw-6-h2/0-t3.sprintlink.net  100 ms  99 ms  62 ms
    11  sl-fw-2-f0.sprintlink.net  120 ms  130 ms  132 ms
    12  sl-colorado-1-s0-t1.sprintlink.net  146 ms  151 ms  172 ms
    13  *  t3-0.cnss56.washington-dc.t3.ans.net  121 ms  140 ms
    14  t3-0.enss145.t3.ans.net  132 ms  127 ms  120 ms
    15  icm-fix-e-f0.icp.net  155 ms  129 ms  306 ms
    16  icm-dc-2b-h3/0-t3.icp.net  370 ms  137 ms  148 ms
    17  sl-dc-8-f0/0.sprintlink.net  127 ms  144 ms  145 ms
    18  *  sl-fw-5-h4/0-t3.sprintlink.net  334 ms  211 ms
    19  sl-fw-2-f0.sprintlink.net  156 ms  183 ms  157 ms
    20  sl-colorado-1-s0-t1.sprintlink.net  202 ms  *  199 ms
    21  gw2.boulder.co.coop.net  179 ms  193 ms  189 ms
    22  bandicoot.xor.com  237 ms  199 ms  210 ms

The path here is from usc to xor. It looks fairly straight-forward, suffering only three isolated losses, but observe that hop 11 and hop 19 are identical! (As are hops 12 and 20.) The sl-colorado-1-s0-t1.sprintlink.net router is only two hops from the destination, bandicoot.xor.com, so apparently this traceroute was on the verge of reaching its destination at hop 14 (and indeed two of the other usc → xor traceroutes took only 14 hops) when a routing change occurred upstream, forcing the packets to detour all the way to the East coast of the U.S. on their trip from California to Colorado. In contrast to the examples in the previous section, in this case the routing change occurred quite smoothly, with only a single packet loss at hop 13 indicating a 5-second outage during the switch-over.

By inspecting other usc routes involving t3-0.cnss56.washington-dc.t3.ans.net at hop 13, we conclude that the change occurred at hop 10, where instead of routing from Anaheim, California to Fort Worth, Texas, as shown above, and staying inside Sprintlink, the switch was made to route to Houston, Texas, using ANS.

Another example, a ucl → wustl traceroute, is even more striking:

     1  cisco.cs.ucl.ac.uk  13 ms  5 ms  5 ms
     2  cisco-pb.ucl.ac.uk  14 ms  4 ms  4 ms
     3  cisco-b.ucl.ac.uk  5 ms  4 ms  4 ms
     4  gw.lon.ja.net  48 ms  36 ms  81 ms
     5  eu-gw.ja.net  71 ms  58 ms  72 ms
     6  icm-lon-1.icp.net  56 ms  120 ms  119 ms
     7  icm-dc-1-s3/2-1984k.icp.net  162 ms  137 ms  175 ms
     8  sl-dc-7-f0.sprintlink.net  160 ms  197 ms  189 ms
     9  sl-starnet-1-s0-t1.sprintlink.net  166 ms  122 ms  634 ms
    10  ncrc-acn.wustl.edu  457 ms  127 ms  119 ms
    11  ncrc-eng.wustl.edu  140 ms  237 ms  174 ms
    12  cisco-b.ucl.ac.uk  488 ms !H  jcr.ecl.wustl.edu  244 ms  232 ms
    13  tango.cs.wustl.edu  228 ms  *  151 ms

Note that the first hop 12 router, cisco-b.ucl.ac.uk, is the same as the hop 3 router! This router also reports ``!H'', indicating it could not forward the packet, and yet the second and third traceroute probe packets for that hop make it all the way to wustl. This traceroute appears to reflect a 500 msec outage, quickly repaired.

We thus see that the distribution of recovery times from routing problems is at least bimodal---some recoveries occur quite quickly, on the time scale of congestion delays, while others take on the order of a minute to resolve. The latter type of recovery presents significant difficulties to time-sensitive applications that assume outages are short-lived.
Sometimes the presence of a connectivity change is more subtle, such as in this R 1 traceroute from korea to ucol: 1 fpls.postech.ac.kr 2 ms 1 ms 1 ms 2 fddicc.postech.ac.kr 3 ms 2 ms 2 ms 3 ktrc­postech.hana.nm.kr 30 ms 30 ms 51 ms 4 gateway.hana.nm.kr 31 ms 31 ms 31 ms 5 hana.hana.nm.kr 33 ms 44 ms 32 ms 6 bloodyrouter.hawaii.net 1152 ms 1275 ms 968 ms 7 bloodyrouter.hawaii.net 744 ms 336 ms 325 ms 8 arc1.nsn.nasa.gov 384 ms 491 ms 691 ms 9 jpl6.nsn.nasa.gov 791 ms 772 ms 1082 ms 10 jpl3.nsn.nasa.gov 876 ms * 1641 ms 11 ncar1.nsn.nasa.gov 1117 ms 1225 ms 848 ms 12 * cu­gw.ucar.edu 1280 ms 805 ms 13 cu­ncar.co.westnet.net 774 ms 884 ms * 14 cs­gw.colorado.edu 1079 ms 897 ms 603 ms 15 lewis.cs.colorado.edu 283 ms 383 ms 899 ms 49 In this example, hop 6 and hop 7 were both to bloodyrouter.hawaii.net. 7 The subsequent route shown above is exactly the route taken by every other korea ) ucol traceroute, except each hop is delayed by one (e.g., jpl6.nsn.nasa.gov is hop 9 here instead of hop 8 as usual). Duplicate hops such as this one are most likely due to upstream route changes (x 4.2.3) which, in this example, added an extra hop upstream to bloodyrouter.hawaii.net. The change would have had to occur just between the end of the probes for hop 6 and the beginning of those for hop 7. We considered all such duplicated hops to be midstream route changes. In contrast with the rarity of connectivity changes in R 1 (10 total), in R 2 we observed 155 instances of a change, a fact we comment upon further in x 6.10. 6.6 Fluttering We use the term ``fluttering'' to refer to rapidly­variable routing. On the time scale of a single traceroute (seconds to minutes) we would expect the path we are measuring to remain stable, yet surprisingly often our data showed that the packets belonging to a single traceroute took multiple paths through the Internet. 6.6.1 A simple example Route fluttering can be detected from traceroute output by the presence of more than one host listed for a single hop, as in this example of a R 1 traceroute between korea and austr. 1 fpls.postech.ac.kr 2 ms 2 ms 2 ms 2 fddicc.postech.ac.kr 3 ms 2 ms 2 ms 3 ktrc­postech.hana.nm.kr 57 ms 123 ms 30 ms 4 gateway.hana.nm.kr 31 ms 31 ms 31 ms 5 hana.hana.nm.kr 33 ms 140 ms 32 ms 6 bloodyrouter.hawaii.net 825 ms 722 ms 805 ms 7 usa­serial.gw.au 960 ms 922 ms 893 ms 8 national­aix­us.gw.au 1039 ms * * 9 * rb1.rtr.unimelb.edu.au 903 ms rb2.rtr.unimelb.edu.au 1279 ms 10 itee.rtr.unimelb.edu.au 1067 ms 1097 ms 872 ms 11 * * mulkirri.cs.mu.oz.au 1468 ms 12 mullala.cs.mu.oz.au 1042 ms 1140 ms 1262 ms Here, the 9th hop shows two different hosts (as well as no reply for the first traceroute packet), rb1.rtr.unimelb.edu.au and rb2.rtr.unimelb.edu.au. Thus, it appears that for the sec­ ond packet national­aix­us.gw.au routed the packet to rb1.rtr.unimelb.edu.au, and for the third packet to rb2.rtr.unimelb.edu.au. (This change occurred most likely for purposes of load­balancing---see x 6.6.2 and x 7.4.) It is important to keep in mind, though, that the actual route flutter could have occurred upstream from national­aix­us.gw.au, and that for the hop 9 traceroute packets, the 8th hop was actually a different router altogether (x 4.2.3). 7 In the example we have shown hostnames rather than IP addresses, as this aids in placing the router's location and service provider. It is possible for two different IP addresses to translate to the same hostname (indeed this is very common for routers). 
But inspecting the raw traceroute reveals the same IP address for both hop 6 and hop 7. 50 For subsequent hops, we cannot tell which of rb1.rtr.unimelb.edu.au or rb2.rtr.unimelb.edu.au was used (indeed, it could have been all of one or the other, or a continuation of switching between the two, or still a third router; the path was consistent with others we observed from the two routers). 6.6.2 A more dramatic example The preceding example is straight­forward and demonstrates only minor fluttering, which presumably has no significant effect on the characteristics of the Internet path between korea and austr. A more dramatic example comes from a R 1 traceroute between wustl and umann: 1 128.252.166.249 11 ms 29 ms 8 ms 2 128.252.123.254 3 ms 2 ms 2 ms 3 128.252.5.120 3 ms 3 ms 14 ms 4 128.252.1.135 6 ms 3 ms 3 ms 5 199.217.253.1 19 ms 35 ms 199.217.253.3 64 ms 6 144.228.73.17 56 ms 144.228.27.5 26 ms 28 ms 7 144.228.20.101 29 ms 38 ms 144.228.70.2 55 ms 8 144.228.10.25 69 ms 65 ms 192.157.65.74 57 ms 9 144.228.8.233 217 ms 117 ms 194.41.0.17 118 ms 10 144.228.10.22 107 ms 193.172.4.8 122 ms 114 ms 11 192.203.230.253 68 ms 193.172.4.12 130 ms 192.203.230.253 70 ms 12 193.174.74.94 194 ms 140.222.8.4 72 ms 193.174.74.94 192 ms 13 193.174.74.29 192 ms 189 ms 192 ms 14 140.222.112.2 108 ms 129.143.6.16 222 ms 216 ms 15 140.222.64.1 128 ms 153.17.62.105 236 ms 140.222.64.1 141 ms 16 129.143.61.2 238 ms 284 ms 140.222.104.2 162 ms 17 134.155.48.125 242 ms 140.222.72.1 164 ms 134.155.48.125 263 ms Here we show the route using untranslated IP addresses, since showing the names of all of the various routers would make for messy reading. However, consider hop 10: 10 icm­fix­w­h2/0­t3.icp.net 107 ms amsterdam6.empb.net 122 ms 114 ms The first packet visited FIX­West at NASA AMES Research Center (Moffett Field, San Francisco Bay Area), while the second and third made it to Amsterdam! The divergence begins at hops 4­5: 4 128.252.1.135 6 ms 3 ms 3 ms 5 stl1­e0.starnet.net 19 ms 35 ms stl3­e0.starnet.net 64 ms The WUSTL border router (128.252.1.135) picks two different STARnet routers for the next hop, each of which presumably has a different notion of the best path to Europe. The confused traceroute shown above can be reduced to two separate traceroutes at this split. First, the ``successful'' path---the one that first reaches umann: 5 ? 6 sl­dc­7­s7­t1.sprintlink.net 7 icm­dc­1­f0/0.icp.net 8 icm­dante­e0.icp.net 51 9 amsterdam1.dante.net 10 amsterdam6.empb.net 11 duesseldorf2.empb.net 12 ipgate2.win­ip.dfn.de 13 duesseldorf2.win­ip.dfn.de 14 heidelberg1.belwue.de 15 mannheim.belwue.de 16 belwue­gw.uni­mannheim.de 17 eratosthenes.informatik.uni­mannheim.de Geographically, this route traverses: St. Louis, Missouri; Washington, D.C.; Amsterdam, the Netherlands; and Duesseldorf, Heidelberg, and Mannheim, in Germany. 8 The second route instead criss­crosses the United States: 5 ? 6 sl­ana­3­s3/1­t1.sprintlink.net 7 sl­ana­2­f0/0.sprintlink.net 8 sl­stk­6­h2/0­t3.sprintlink.net 9 144.228.8.233 10 icm­fix­w­h2/0­t3.icp.net 11 t3­0.enss144.t3.nsf.net 12 t3­3.cnss8.san­francisco.t3.ans.net 13 ? 14 t3­1.cnss112.albuquerque.t3.ans.net 15 t3­0.cnss64.houston.t3.ans.net 16 t3­1.cnss104.atlanta.t3.ans.net 17 t3­0.cnss72.greensboro.t3.ans.net Geographically, this route traverses: St. Louis, Missouri; Anaheim, Stockton, FIX­West, and San Francisco, California; Albuquerque, New Mexico; Houston, Texas; Atlanta, Georgia; and Greensboro, North Carolina. 
9 From other traceroutes that included t3­0.cnss72 .greensboro.t3.ans.net, we can determine that eventually this route would also have made it to the destination, albeit with many more hops. For example, from a trace from sri to umann, we have: 12 t3­0.cnss72.greensboro.t3.ans.net 13 t3­0.cnss56.washington­dc.t3.ans.net 14 t3­0.enss145.t3.ans.net 15 umd­rt1.es.net 16 umd2­e­stub.es.net 17 pppl2­umd2.es.net 18 ipgate2.win­ip.dfn.de 8 Hop 5 is marked as ``?'' because from the trace it is not clear which of the two STARnet routers picks this route (by forwarding to sl­dc­7­s7­t1.sprintlink.net), and which picks the longer route. 9 Hop 13 is missing because, in the raw trace, all three replies to the hop 13 traceroute probe were returned by duesseldorf2.win­ip.dfn.de, which clearly is not the next hop following t3­3.cnss8.san­francisco.t3.ans.net, but rather represents hop 13 from the first route. By inspecting other traceroutes from wustl to umann, it is evident that hop 13 for the second route is t3­0.cnss16.los­angeles.t3.ans.net, so we can add Los Angeles to the list of California cities tra­ versed by the route. 52 Amsterdam Duesseldorf Figure 6.1: Routes taken by alternating packets from wustl (St. Louis, Missouri) to umann (Mannheim, Germany), due to fluttering 19 ipgate2.win­ip.dfn.de 20 duesseldorf2.win­ip.dfn.de 21 heidelberg1.belwue.de 22 mannheim.belwue.de 23 belwue­gw.uni­mannheim.de 24 eratosthenes.informatik.uni­mannheim.de Thus, it appears that the second wustl ) umann route would also succeed in delivering packets, though using 29 hops instead of 17. The wustl fluttering occurs over very small timescales, essentially the time between successive traceroute probes, which are spaced out by the amount of time it takes for each reply to the previous probe packet (x 4.2.2). One routing mechanism that can lead to such small­scale fluttering occurs when a router alternates between multiple next­hop routers in order to split load among the links to those routers. Such behavior is explicitly allowed in [Ba95, p.79], though that document also cautions that there are situations for which it is inappropriate, and so it should at most be a configurable option for a router. It turns out that the wustl fluttering was indeed due to load­splitting: STARnet had two T1 links for its access to Sprintlink, one to Anaheim and the other to Washington, D.C. (as shown above), and would alternate packets ``round­robin'' between them in order to balance load [My95]. Figure 6.1 shows the two routes that packets can take from wustl to umann. The dramatic difference in the lengths of the two routes highlights the great impact an early routing discrepancy can make. Of the 380 traceroutes initiated by wustl, 255 exhibited fluttering, all but one oc­ curring before 12PM PST, December 13. After this point, the Anaheim link apparently became unavailable, and the routing was no longer split. This change however was not due to a decision to eliminate fluttering, but, apparently, simply due to an outage along the Anaheim link. 
On Decem­ 53 ber 20 the Anaheim link again became operational, and led to an interesting pathology: 1 128.252.166.249 4 ms 2 ms 3 ms 2 128.252.123.254 3 ms 2 ms 4 ms 3 128.252.5.120 3 ms 2 ms 2 ms 4 128.252.1.2 5 ms 3 ms 3 ms 5 199.217.253.2 4 ms 3 ms 4 ms 6 199.217.253.1 4 ms 11 ms 199.217.253.3 6 ms 7 199.217.253.2 4 ms 144.228.73.17 58 ms 56 ms 8 144.228.70.1 56 ms 199.217.253.3 4 ms 5 ms 9 144.228.10.29 85 ms 144.228.73.17 74 ms 63 ms 10 144.228.30.5 102 ms 217 ms 218 ms 11 144.228.10.29 81 ms 144.228.10.17 93 ms 92 ms 12 144.228.20.6 84 ms 131 ms 125 ms 13 192.157.65.227 85 ms 144.228.10.29 80 ms 192.157.65.227 81 ms 14 144.228.20.6 137 ms 144.228.30.5 264 ms 144.228.20.6 165 ms 15 144.228.10.17 70 ms * * 16 144.228.30.5 90 ms * 144.228.20.6 74 ms 17 * 192.157.65.227 105 ms * 18 137.39.128.7 120 ms * * 19 * 192.157.65.227 84 ms * 20 * * * 21 * * * 22 * * 137.39.128.7 202 ms 23 * * * 24 * * * 25 * * * 26 * * * 27 * * * 28 * * * 29 * * * The fluttering begins at hop 6: 5 stl2­e0.starnet.net 4 ms 3 ms 4 ms 6 stl1­e0.starnet.net 4 ms 11 ms stl3­e0.starnet.net 6 ms Here, packets again alternate between stl1­e0.starnet.net and stl3­e0.starnet.net. Hop 7, though, shows that the routing is further confused: 7 stl2­e0.starnet.net 4 ms sl­ana­3­s3/1­t1.sprintlink.net 58 ms 56 ms It appears that either stl1­e0.starnet.net or stl3­e0.starnet.net forwarded the packet back to stl2­e0.starnet.net, while the other forwarded the packet to sl­ana­3­s3/1 t1.sprintlink.net in Anaheim, California. In the next hop: 8 sl­ana­1­f0/0.sprintlink.net 56 ms stl3­e0.starnet.net 4 ms 5 ms one of the packets makes it to the next Anaheim hop, while the other is forwarded (apparently from stl2­e0.starnet.net) to stl3­e0.starnet.net. At this point, the packets proceed to bsdi but with some making one (or even more!) vis­ its from stl2­e0.starnet.net to the non­forwarding STARnet router (it is difficult to determine whether this is stl1­e0.starnet.net or stl3­e0.starnet.net). Viewed geographically: 54 9 Fort­Worth­6 85 ms Anaheim 74 ms 63 ms 10 Fort­Worth­5 102 ms 217 ms 218 ms 11 Fort­Worth­6 81 ms Washington­DC­8 93 ms 92 ms 12 Washington­DC­6 84 ms 131 ms 125 ms 13 Boone­VA 85 ms Fort­Worth­6 80 ms Boone­VA 81 ms 14 Washington­DC­6 137 ms Fort­Worth­5 264 ms Washington­DC­6 165 ms 15 Washington­DC­8 70 ms * * 16 Fort­Worth­5 90 ms * Washington­DC­6 74 ms 17 * Boone­VA 105 ms * 18 Dallas 120 ms * * 19 * Boone­VA 84 ms * The Fort­Worth­5 router at both hop 10 and hop 16 indicates that one of the hop 16 packets made three trips to the non­forwarding STARnet router prior to getting forwarded to the working router. Most likely this pathology occurred due to a set of inconsistent routing tables introduced by the reactivation of the Anaheim link. For reference, a flutter­free route from wustl to bsdi is: 1 jcr­166.cs.wustl.edu 5 ms 2 ms 2 ms 2 ncrc­eng.wustl.edu 3 ms 2 ms 2 ms 3 128.252.5.120 4 ms 3 ms 2 ms 4 128.252.1.2 6 ms 6 ms 3 ms 5 sl­dc­7­s7­t1.sprintlink.net 29 ms 28 ms 25 ms 6 sl­dc­6­f0/0.sprintlink.net 156 ms 26 ms 64 ms 7 boone1.va.alter.net 30 ms 35 ms 28 ms 8 dallas1.tx.alter.net 80 ms 67 ms 69 ms where the 128.252.x.y routers are local to WUSTL (traceroutes to bsdi stop in Dallas, as explained in x 6.7.4). The STARnet routing remained split for many more months. 
Here is a traceroute from wustl to umann, taken on July 2, 1995: 1 128.252.166.249 3 ms 2 ms 2 ms 2 128.252.123.254 4 ms 2 ms 2 ms 3 128.252.5.120 4 ms 2 ms 2 ms 4 128.252.41.2 4 ms 3 ms 3 ms 5 199.217.253.1 4 ms 6 ms 11 ms 6 144.228.73.17 71 ms 144.228.27.5 41 ms 144.228.73.17 166 ms 7 144.228.20.8 30 ms 144.228.70.1 151 ms 56 ms 8 144.228.10.29 87 ms 144.228.10.42 61 ms 144.228.10.29 90 ms 9 144.228.30.5 143 ms 258 ms 192.41.177.252 35 ms 10 144.228.10.17 91 ms 134.55.12.161 81 ms 67 ms 11 192.188.33.10 138 ms 159 ms 144.228.10.42 74 ms 12 192.41.177.252 79 ms 73 ms 74 ms 13 153.17.200.105 198 ms * 220 ms 14 192.188.33.10 202 ms * * 15 193.174.74.141 224 ms 134.155.48.125 245 ms 214 ms Fluttering occurs downstream of the hop 5 router: 5 stl1­e0.starnet.net 4 ms 6 ms 11 ms 6 Anaheim­3 71 ms Washington­DC­7 41 ms Anaheim­3 166 ms 55 and continues from there. This example is slightly different from the previous ones we looked at, in that the STARnet routers stl2­e0.starnet.net and stl3­e0.starnet.net no longer appear. Instead, it looks like stl1­e0.starnet.net is doing its own load­splitting between sl­ana­3­s3/1­t1.sprintlink.net and sl­dc­7­s7­t1.sprintlink.net, on opposite sides of the country. STARnet has since switched to a single connection (via MCI), so this pathology no longer occurs [My95]. In x 13.1.3 we analyze the effects that the split­routing had upon TCP performance. Sur­ prisingly, it was generally quite minor. While wustl packets very often arrived out of order, they only very rarely arrived so far out of order as to trigger a spurious fast retransmission, as discussed in x 6.6.5 below. 6.6.3 Fluttering at another site Putting aside traceroute probes initiated at wustl, of the remaining 6,079 R 1 probes, 295 (about 5%) exhibited fluttering. None of these sites suffered such extreme fluttering as wustl; all of the flutters affected either a single hop or at most two hops. Here is an example of a two­hop flutter, between ncar and ucol, both sited in Boulder, Colorado: 1 north­gw.scd.ucar.edu 3 ms 2 ms 2 ms 2 server­gw.ucar.edu 3 ms 2 ms 2 ms 3 cu­gw.ucar.edu 4 ms 3 ms 3 ms 4 129.19.248.62 5 ms cu­ncar.co.westnet.net 5 ms 129.19.248.62 6 ms 5 cs­gw.colorado.edu 6 ms 6 ms 5 ms 6 lewis.cs.colorado.edu 8 ms 19 ms 9 ms The 4th hop shows a flutter from 129.19.248.62 (at Colorado State University) to cu­ncar .co.westnet.net and back again. We note that the problem occurred during a hop to Colorado State University, which suggests that those routers may be prone to fluttering. Indeed, of the 295 remaining flutters, 277 involved ucol. For all but 6 of these, the fluttering occurred immediately downstream from either the cu­gw.colorado.edu router (for traffic outbound from ucol) or the cu­gw.ucar.edu (traffic inbound to ucol). It appears that these routers were splitting load just as did the STARnet router in the previous section, but both downstream routers they alternated between had the same view of subsequent wide­area routing, so the effect remained localized. Neither the ucol nor the wustl fluttering was present in R 2 . The only re­ peated pattern we found was that every route originating at sdsc that passed through nynap­sdsc­atm­ds3.cerf.net suffered from downstream fluttering. 
Here is an example, from a traceroute to adv: 1 tigerfish.sdsc.edu 8 ms 8 ms 8 ms 2 mobydick.cerf.net 85 ms 246 ms 18 ms 3 nynap­sdsc­atm­ds3.cerf.net 475 ms 380 ms 71 ms 4 sprintnap.ans.net 73 ms t3­3.cnss32.new­york.t3.ans.net 75 ms 77 ms 5 cnss33.new­york.t3.ans.net 76 ms 77 ms 76 ms 6 enss240.t3.ans.net 80 ms 80 ms 79 ms 7 enss240.t3.ans.net 173 ms betelgeuse.advanced.org 81 ms 87 ms There were only 7 of these, however, so their overall impact on routing performance in R 2 was insignificant. 56 6.6.4 Skipping When analyzing the traces for fluttering, we notice an interesting anomaly in which routers were visited ``prematurely.'' Here is an example, taken from an xor ) ucl traceroute: 1 xor­gw.xor.com 0 ms 0 ms 10 ms 2 gw1.boulder.co.coop.net 0 ms 0 ms 0 ms 3 sl­fw­2­s9­t1.sprintlink.net 30 ms 30 ms 30 ms 4 sl­fw­5­f1/0.sprintlink.net 30 ms 20 ms 40 ms 5 sl­dc­8­h3/0­t3.sprintlink.net 60 ms 60 ms 60 ms 6 icm­dc­1­f0/0.icp.net 1520 ms icm­london­1­s1­1984k.icp.net 160 ms icm­dc­1­f0/0.icp.net 60 ms 7 icm­london­1­s1­1984k.icp.net 150 ms 140 ms 150 ms 8 smds­gw.ulcc.ja.net 140 ms 150 ms 140 ms 9 smds­gw.ucl.ja.net 150 ms 150 ms 140 ms 10 cisco­pb.ucl.ac.uk 160 ms 160 ms 160 ms 11 cisco.cs.ucl.ac.uk 150 ms 160 ms 160 ms 12 neptune.cs.ucl.ac.uk 160 ms 160 ms 170 ms At hop 6, we see flutter between icm­dc­1­f0/0.icp.net and icm­london­1­s1­1984k .icp.net. But hop 7 then reveals that icm­london­1­s1­1984k.icp.net is actually the next hop! All told, 11 traceroutes in R 1 and 22 in R 2 (at a number of different routers) showed this ``skipping'' effect. Furthermore, very often the packet return time just prior to the skip was unusually high (note in the example above the return time of 1,520 msec, much larger than any other in the traceroute). It appears that the router was under a period of stress during the time of the skip, and (perhaps due to a forwarding bug only exhibited under high load) a packet was erroneously forwarded without decrementing and checking its TTL. The downstream router then decremented the TTL, noted it had expired, and returned an ICMP message. The upstream router subsequently recovered from the error condition and continued to correctly forward packets, as is shown for the third probe of hop 6 above. If the source of the router load were network traffic, then the response from the down­ stream router should have been heavily delayed too, but, as shown above, it was not. Another explanation is that the load was instead due to the upstream router processing a routing update. This agrees with the fact that the router recovered quickly from the load condition: all that was needed was a single packet's worth of time (about 160 msec above) for the load to disappear. That a router might, under stress, forward a packet without decrementing its TTL raises a possibility of network instability. If the router stress was due to a routing loop, packets might circulate around the loop indefinitely because their TTL's would not correctly expire, which might in turn maintain the router stress. We considered traceroutes exhibiting ``skipping'' as reflecting a pathology separate from ``fluttering,'' since the underlying mechanisms (load­balancing vs. an apparent packet forward­ ing error) are quite different. 57 6.6.5 Significance of fluttering While fluttering can provide benefits as a way to balance load in a network, it also creates a number of problems for different networking applications: 1. 
A fluttering network path presents the difficulties that arise from unstable network paths, as discussed in x 7.1: difficult­to­predict behavior, potential inconsistencies in state information created in the routers on behalf of connections, and problems with constructing consistent measurements of the network's condition. However, if fluttering occurs only at a larger gran­ ularity than individual packets---for example, per connection or per end­to­end ``flow''---then these problems are ameliorated. 2. If the fluttering only occurs in one direction (as it does for wustl, but not for ucol), then the path is necessarily partially asymmetric, too, suffering from the problems discussed in x 8.1: difficulties in computing unidirectional latencies for protocols such as NTP, difficul­ ties in using ``sender­only'' measurement techniques, and inefficiencies in keeping state for bidirectional flows. 3. Constructing reliable estimates of the path characteristics, such as round­trip time and avail­ able bandwidth, becomes potentially very difficult, since in fact there may be two different sets of values to estimate. 4. When the two routes have different propagation times, such as many of those from the wustl site, then packets will often arrive at the destination out­of­order, depending on whether they took the shorter route or the longer route. At a minimum, this can lead to extra processing at the receiver to reassemble the out­of­order data stream. It can lead to a more serious problem for TCP connections, however. Whenever a TCP end­ point receives an out­of­order packet, the receipt triggers the sending of a redundant acknow­ ledgement in reply, as a mechanism for informing the sender that the receiver has a hole in its sequence space. If three out­of­order packets arrive in a row, then the receiver will generate three redundant acknowledgements. These are enough in turn to trigger ``fast retransmission'' by the sender (x 9.2.7), leading it to needlessly retransmit data. Thus, out­of­order delivery can result in redundant network traffic, both due to the extra acknowledgements, and due to possible data retransmissions. We explore this phenomenon further in x 13.1.3. These problems all argue for eliminating large­scale fluttering whenever possible, where we define fluttering as large­scale if it leads to significantly different routes (as it does for wustl). On the other hand, when the effects of the flutter are confined, as for ucol, or invisible at the network layer (such as split­routing used at the link layer, which would not show up at all in our study), then these problems are all ameliorated. Finally, we note that ``deflection'' and ``dispersion'' routing schemes that forward packets along varying or multiple paths have many of the characteristics of fluttering paths [BDG95, GK97]. While these schemes can offer benefits in terms of simplified routing decisions, enhanced through­ put, and resilience, they bring with them the difficulties discussed above. From the discussion of dispersion routing in [GK97], it appears that the literature in that area to date has only considered the problem of out­of­order delivery, which is addressed simply by noting that the schemes require a resequencing buffer. 
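Before leaving the subject of fluttering, we note that the detection rule of x 6.6.1 (more than one host listed for a single hop) is simple to mechanize. The sketch below illustrates it; the parsed-hop representation is hypothetical, and, per the caveat of x 4.2.3, a differing responder can also reflect an upstream route change rather than a flutter at that particular hop:

# Sketch: identify hops at which a traceroute shows route flutter,
# i.e. more than one distinct responder for a single hop (x 6.6.1).
# Each hop is a list of responder names, with None for an unanswered probe.

def fluttering_hops(hops):
    """Return the (1-based) hop numbers with two or more distinct responders."""
    flagged = []
    for hopno, probes in enumerate(hops, start=1):
        responders = {r for r in probes if r is not None}
        if len(responders) > 1:
            flagged.append(hopno)
    return flagged

# Toy version of the korea -> austr trace of x 6.6.1; only hop 9 flutters.
trace = [
    ["fpls.postech.ac.kr"] * 3,
    ["fddicc.postech.ac.kr"] * 3,
    ["ktrc-postech.hana.nm.kr"] * 3,
    ["gateway.hana.nm.kr"] * 3,
    ["hana.hana.nm.kr"] * 3,
    ["bloodyrouter.hawaii.net"] * 3,
    ["usa-serial.gw.au"] * 3,
    ["national-aix-us.gw.au", None, None],
    [None, "rb1.rtr.unimelb.edu.au", "rb2.rtr.unimelb.edu.au"],
    ["itee.rtr.unimelb.edu.au"] * 3,
]
print(fluttering_hops(trace))   # [9]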
6.7 Unreachability

In addition to traceroute failures due to persistent routing loops and erroneous routing, 125 of the R 1 traceroutes and 617 of the R 2 traceroutes failed to reach the destination host for other reasons. We analyzed these failures to determine the corresponding failure modes, summarized in Tables VIII and IX.

Failure mode             # Failures     Notes
Host down                81 (65%)       umann, sdsc, and inria accounted for 93%
Stub network outage      31 (25%)       ustutt accounted for 74% of these
Infrastructure failure   13 (10%)       no dominant pattern

Table VIII: Failure modes for unreachable hosts in R 1

Failure mode             # Failures     Notes
Host down                277 (45%)      panix accounted for 61% of these
Stub network outage      170 (27.5%)    nrao accounted for 57% of these
Infrastructure failure   170 (27.5%)    no dominant pattern

Table IX: Failure modes for unreachable hosts in R 2

6.7.1 Host down

We concluded that a host was down (first row) if the traceroute to it terminated at one of the routers which in another traceroute proved to be the penultimate hop to that host. In R 1 , this occurred 81 times out of a total of 6,459 traceroutes, giving us an unconditional probability that a site participating in our study was down during an experiment of p ≈ 1.25%. This probability corresponds to an availability of ≈ 98.75%. Similarly, for R 2 we get an availability of ≈ 99.2%. These values are a bit higher than the median availability of 97.2% reported in [LMG95], though our ``polling'' frequency is lower than theirs (a mean of 10 minutes), which could explain the discrepancy. Also, as noted in x 4.4, our sites do not plausibly constitute a random sample of Internet hosts (while [LMG95]'s sites are much closer to such), so disagreement between the two figures is not particularly significant. Finally, note that most of the failures were due to just a few of the sites, as indicated in the tables.

6.7.2 Stub network outage

We classified an unreachability failure as a ``stub network outage'' (second row) if the final router reached during the traceroute was sited inside the same institute as the endpoint (but not a penultimate hop), or at the border between the institute and the remainder of the Internet. 10

10 Such a failure could also occur at the traceroute source's institute. One might think we would never observe this in our traces because, in order to generate a traceroute, the npd control site had to be able to connect to

The numbers of observations of such failures correspond to availabilities of 99.5% for both R 1 and R 2 , though again we cannot draw a general conclusion about connectivity to Internet sites because our collection of participating sites might not be representative. We also need to be wary about generality given the strong dominance of this type of failure by routes to the ustutt and nrao 11 sites.

On the other hand, the prevalence of network outages to ustutt gives us an opportunity to assess how quickly a router learns that the next-hop router has crashed. If a router does not have a route to a packet's destination, the router is required to generate some form of ICMP ``Destination Unreachable'' message [Ba95]. However, a router may not know that it has no route to the packet's destination, because it is unaware that the next-hop router has crashed. These two cases result in different traceroute behavior: the first elicits a ``!H'' (or ``!N'') response in the traceroute output, while the second will simply show a dropped packet.
Consider the following traceroute from ukc to ustutt: 1 rtcomp.ukc.ac.uk 2 ms 2 ms 2 ms 2 brtcomp.ukc.ac.uk 2 ms 2 ms 2 ms 3 brtsj.ukc.ac.uk 3 ms 3 ms 3 ms 4 smds­gw.ulcc.ja.net 7 ms 7 ms 6 ms 5 eu­gw.ja.net 8 ms 8 ms 6 ms 6 london4.empb.net 12 ms 11 ms 8 ms 7 duesseldorf2.empb.net 33 ms 31 ms 38 ms 8 ipgate2.win­ip.dfn.de 91 ms 52 ms 46 ms 9 duesseldorf4.win­ip.dfn.de 70 ms 44 ms 32 ms 10 stuttgart4.belwue.de 67 ms 68 ms 56 ms 11 stuttgart1.belwue.de 84 ms 85 ms 74 ms 12 belwue­gw.uni­stuttgart.de 63 ms 57 ms 69 ms 13 * * * 14 * * * 15 * * * 16 * * * 17 * * * 18 * * * 19 * * * 20 * * * 21 * * * 22 * * * 23 * * belwue­gw.uni­stuttgart.de 68 ms !H 24 * * * 25 * * * 26 * * * 27 * * * 28 * * * 29 * * * 30 * * belwue­gw.uni­stuttgart.de 64 ms !H the source in the first place. However, some sources have multiple connections to the Internet, and we did observe several instances where we were able to connect to a source but it was unable to advance packets to any routers outside of its site. We include these instances in the tables as stub network outages. 11 It turns out that the entire nrao site was intentionally disconnected from the Internet from November 28 through December 6, 1995, following a serious break­in by a network cracker. 60 Hop 12 makes it to belwue­gw.uni­stuttgart.de, ustutt's border router. Normally the next hop would be to cisco1.rus.uni­stuttgart.de, inside the ustutt site, and hop 12 gives no indication of an impending problem here. But the next 36 packets are dropped, re­ flecting an outage of 3.5 minutes. At hop 23, belwue­gw.uni­stuttgart.de again re­ sponds, but this time includes an ICMP unreachable message. Thus, it appears that it took belwue­gw.uni­stuttgart.de at least 3.5 minutes to learn that the next hop had crashed. What follows, from hops 24­30, remains a puzzle: belwue­gw.uni­stuttgart.de apparently forgets that the next­hop router has crashed and only relearns the fact after another 100 seconds. At this point the traceroute terminates because it has reached the 30­hop limit. Of the 23 R 1 stub network outages involving ustutt, 19 exhibited this pattern. 12 For those 19, the learning periods range from 0 seconds (the router immediately knew that the next hop was unavailable) to 170 seconds, with a median of 30 seconds and a mean of 50 seconds (distributed roughly exponentially--see x 6.8 for the significance of this). For the other four ustutt outages, the router failed to learn the unavailability of the downstream hop before the traceroute terminated due to the 30­hop limit. These failures spanned between 105 and 225 seconds, so those give lower bounds on the learning time. Clearly, for belwue­gw.uni­stuttgart.de, the router does not quickly learn about a next­hop crash. If this slow response is typical (we lack enough data to know if it is), then Internet traffic is subject to outages on the order of a minute whenever a router crashes. This finding is consistent with the BGP specification, which recommends that routers wait for 90 seconds' worth of unanswered polls before deciding that a peer is unreachable [RL95]. The higher this figure is, the less prone a network is to routing oscillations; but high delays in detecting unreachable peers also present serious difficulties for real­time protocols that need to quickly adapt to such faults [GR95]. 
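For concreteness, the learning periods above can be extracted mechanically from a parsed traceroute: find the last router that answered normally, count the unanswered probes that follow, and stop when that same router begins answering with ``!H''. The sketch below illustrates this bookkeeping; the parsed-hop representation is hypothetical, and the roughly 5 seconds per unanswered probe is the same accounting used for outages in x 6.8, so the result is only an approximation:

# Sketch: estimate how long a router took to "learn" that its next hop
# had crashed.  Each probe is a (responder, annotation) pair; roughly
# 5 seconds per unanswered probe, as in the outage accounting of x 6.8.

SECONDS_PER_LOST_PROBE = 5

def learning_time(hops):
    """Return (router, seconds) for the first !H-after-silence pattern,
    or None if the trace shows no such pattern."""
    last_responder = None
    lost = 0
    for probes in hops:
        for responder, note in probes:
            if responder is None:
                lost += 1
            elif note == "!H" and responder == last_responder:
                return responder, lost * SECONDS_PER_LOST_PROBE
            else:
                last_responder, lost = responder, 0
    return None

# Toy version of the ukc -> ustutt trace: the border router answers
# normally, then a long run of '*' probes, then the same router answers "!H".
trace = [[("belwue-gw.uni-stuttgart.de", "")] * 3]
trace += [[(None, "")] * 3 for _ in range(10)]
trace += [[(None, ""), (None, ""), ("belwue-gw.uni-stuttgart.de", "!H")]]
print(learning_time(trace))   # ('belwue-gw.uni-stuttgart.de', 160)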
6.7.3 Infrastructure failure The final type of failure (third row in each table) reflects a problem inside the Internet infrastructure: the terminating router in the traceroute was in the middle of the network, not at the source or destination. 13 In this case, we can make a general statement about availability, since the basis for our study is the assumption that the collection of routes between our sites is represen­ tative of Internet connectivity as a whole (x 4.4). A total of 13 failures out of 6,459 R 1 observations corresponds to an Internet infrastructure availability of 99.8%, while for R 2 this percentage drops to 99.5%. The difference is significant using the methodology discussed in x 4.5. If we add to these failures the instances of persistent routing loops (x 6.3.1) and erroneous routing (x 6.4), then the R 1 12 All of the ustutt outages occurred between the early morning of Saturday, December 10th and the early morning of Monday, December 12th (Stuttgart time), indicating that the crashed router was down for the weekend. 13 In some cases, such a termination can still reflect an unreachable host or a stub network outage, if the unreachability information has been propagated into the interior of the network. However, in these cases we would expect that the information is not propagated deeply into the network, since the need to ``aggregate'' routing information means that information pertaining to individual host or stub network outages cannot be propagated beyond the point at which it is aggregated with information for other, reachable hosts or networks. We inspected the points in the terminating routers for the infrastructure failures and found that in the vast majority of cases, the router was sited far from the unreachable destination. For example, we observed several infrastructure failures for traceroutes going from bnl to European sites, each of which terminated at ames­llnl.es.net in California. Such a termination is much more likely to reflect loss of general connectivity to Europe, than an outage of a single European site being propagated all the way to a router in California. 61 availability falls to 99.6%, and that for R 2 to 99.35%. We must bear in mind, however, that these numbers will be skewed by the fairly large proportion of our attempted measurements that failed due to an inability to contact the remote npd site (x 5.2); some of these failures could be due to infrastructure problems, making these availability numbers overestimates. A solid figure for Internet infrastructure availability is important for network service providers wishing to provide a form of guaranteed service in which the guarantees carry legal (con­ tractual) obligations [Fe90, PaFe94]. We do not claim that the availabilities given in the preceding paragraph are such solid figures, but they are a step in that direction. 14 6.7.4 Consistently unreachable hosts Several hosts in our study were either always or frequently unreachable. Those always unreachable---bsdi in R 1 , and oce and ucol in R 2 ---all reside behind firewalls that drop incom­ ing, unidentified UDP packets (such as used by traceroute; x 4.2.3), so traceroutes to it always showed connectivity lost after the hop prior to the firewall. We adjusted for this behavior by considering any traceroutes that made it to that hop as making it all the way to the host. The other frequently unreachable host, lbli, is connected to the Internet via an ISDN circuit. 
This circuit disconnects after any idle period during which lbli did not use the circuit for a configurable amount of time (typically 10­20 minutes). Thus, many traceroutes to lbli found the circuit down, and terminated at the Internet side of the ISDN link. As with the firewall hosts, we considered these traceroutes as having successfully reach the lbli host. The net effect of these adjustments is to introduce possible underestimation into our as­ sessment of the prevalence of stub network outages and hosts being down. Most likely, this intro­ duced bias is quite slight, given how our stub network outages and downed hosts statistics were dominated by just a few sites anyway. 6.7.5 Unreachable due to too many hops As noted in x 4.2.1, traceroute by default probes up to 30 hops of the route between two hosts. This length sufficed for all of the R 1 measurements, and all but 6 of the R 2 measurements. 15 The fact that it failed occasionally in R 2 , however, indicates that the operational diameter of the Internet has grown beyond 30 hops, and argues for using large initial TTL values when a host orig­ inates an IP datagram. In informal studies of the link connecting the Lawrence Berkeley National Laboratory to the rest of Internet, we have found that most hosts send IP datagrams with TTL's well above 30, but a non­negligible proportion of the datagrams (10% in one dataset) appear to have been sent with TTL's of around 30. While routes of more than 30 hops were not correctly measured by traceroute in our experiment, they were so rare as to not present any significant source of error. A final note concerning large hop counts: it is sometimes assumed that the hop count of a route equates to its geographical distance. While from our data this appears roughly the case, we 14 Naturally, a network service provider will keep detailed statistics on their own network, and not need a figure such as that we have computed. But if they must deal with other providers for portions of the end­to­end route, such a figure as a rule­of­thumb will prove useful. 15 5 of the 6 were to or from inria. Routing within France (and international routing in general) often has many hops. The other was between umont and umann, also international in scope. 62 noticed some remarkable disagreements, both in terms of a few hops corresponding to large dis­ tances, and many hops corresponding to little distance. For example, the shortest route we observed from ncar, in Colorado, to sdsc, in southern California (about 1,500 km distant), was three hops: cs­vbns.ucar.edu cs­atm0­0­3.sdsc.vbns.net rintrah.sdsc.edu This route traveled over the VBNS ATM backbone (recall from x 4.2.3 that traceroute elicits paths at the network layer, and does not measure any ``hops'' made at the link layer). We also observed in R 1 a 5 hop route from pubnix to bsdi, about 2,000 km distant. On the other hand, all of the routes we observed between mit and harv (in either direc­ tion), sited about 3 km apart, were 11 hops, and we observed 14 and 17 hop routes between sri and lbl, about 50 km apart. 6.8 Temporary outages The final pathology we studied was temporary network outages. When a sequence of consecutive traceroute probes are lost, the most likely cause is either a temporary loss of net­ work connectivity, or very heavy congestion lasting 10's of seconds. For each traceroute, we examined its longest period of consecutive probe losses (other than consecutive losses at the end of a traceroute when, for example, the endpoint was unreachable). 
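A minimal sketch of this bookkeeping follows, assuming each traceroute has been reduced to a flat, in-order sequence of probe outcomes (a representation introduced here purely for illustration); the conversion to seconds uses the 5 seconds per lost probe adopted in the analysis below:

# Sketch: longest internal run of consecutive lost probes in a traceroute,
# converted to an outage duration at 5 seconds per lost probe.
# 'probes' is a flat, in-order sequence of booleans: True = probe answered,
# False = probe lost.  A trailing run of losses (e.g. an unreachable
# endpoint) is ignored, per the parenthetical above.

SECONDS_PER_LOST_PROBE = 5

def longest_outage(probes):
    # Drop a trailing run of losses.
    while probes and not probes[-1]:
        probes = probes[:-1]
    longest = run = 0
    for answered in probes:
        if answered:
            longest = max(longest, run)
            run = 0
        else:
            run += 1
    return max(longest, run) * SECONDS_PER_LOST_PROBE

# 36 consecutive losses, as in the ukc -> ustutt trace of x 6.7.2,
# bracketed by answered probes and followed by a trailing loss run.
probes = [True] * 36 + [False] * 36 + [True] * 3 + [False] * 6
print(longest_outage(probes))   # 180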
The resulting distribution of the number of probes lost appears trimodal. In R 1 (R 2 ), about 55% (43%) of the traceroutes had no losses, 44% (55%) had between 1 and 5 losses, and 0.96% (2.2%) had 6 or more losses.16 Of these latter, after eliminating those to ukc in R 1 (because these ``outages'' are actually unresponsive routers; see x 6.1), the distribution of the number of probes lost in the R 1 data is quite close to geometric. Figure 6.2 plots the outage duration on the x-axis vs. the probability of observing that duration or larger on the y-axis (logarithmically scaled). The outage duration is determined by the number of probe losses multiplied by 5 seconds per lost probe. The line added to the plot corresponds to what would be expected for a geometric distribution with probability p = 0.92 that a probe beyond the 5th is dropped. (The line appears straight due to the logarithmic y-axis scale and the fact that the geometric distribution is the discrete counterpart to the exponential distribution.) As can be seen, the fit is fairly good, especially in the tail.

From the above evidence it is reasonable to argue that long outages are well-modeled as persisting for 30 seconds plus an exponentially distributed random variable with mean equal to about 40 seconds. This finding would be convenient, since the exponential distribution often makes for tractable analysis. If we turn to the R 2 data, however, we find that the geometric tail with p = 0.92 is still present, but only for outages more than 75 seconds long, as illustrated in Figure 6.3. For outages between 30 and 70 seconds, the duration still exhibits a geometric distribution, but with p = 0.62, suggesting two different recovery mechanisms, one operating on time scales of 30 seconds to a minute or so and the other on significantly longer time scales.

16 Recall from x 4.2.3 that probe ``losses'' can also be due to ICMP rate-limiting, which we do not differentiate. We analyze true packet losses in much greater detail in Chapter 15.

Figure 6.2: Distribution of long R 1 outages

Figure 6.3: Distribution of long R 2 outages

Figure 6.4: Circuitous route from bsdi to usc

Note that x 6.7.2 provides separate evidence that the time taken for routers to recover from the loss of a next-hop router is exponentially distributed, with a mean of 50 seconds (shorter than the R 1 fit, but in agreement with the R 2 data).

6.9 Circuitous routing

Since the inception of the Internet Protocol, one of its main goals has been resilience in the presence of network failures [Cl88]. In this section we document some of the more circuitous routes the network found in order to maintain connectivity in the presence of failures. These routes do not represent pathologies per se but rather triumphs of robust routing, or, sometimes, simply the lack of the necessary infrastructure to take advantage of more direct routes.

Figure 6.4 shows a route used from bsdi, in Colorado Springs, Colorado, to usc, in Los Angeles, California. The route is perhaps three times longer than the bsdi route to sri (located in Northern California), which also makes a first hop to Dallas, Texas, but from there travels to San Jose, California, rather than to the East coast. Figure 6.5 shows one of the routes used from lbli, in Berkeley, California, to ucol, in Boulder, Colorado.
Here the packets travel all the way to the East coast, then back to the West coast, and finally over to Colorado. A more direct path, also present in our data, travels straight from New Mexico to Colorado. Presumably this link was unavailable during the time of the longer route. Figure 6.6 shows a route from nrao, in Charlottesville, Virginia, to wustl, in St. Louis, Missouri. This route increases the distance of the more direct route we also observed (via Washing­ ton, D.C., and then straight to St. Louis) by roughly a factor of five. Figure 6.7 shows an even more tortuous route to wustl, this time from lbl. The packets 65 Figure 6.5: Circuitous route from lbli to ucol Figure 6.6: Circuitous route from nrao to wustl 66 Figure 6.7: Circuitous route from lbl to wustl first travel to Livermore, California, and then Los Alamos, New Mexico, via ESNET. They continue up to Illinois and across to Washington, D.C., via Princeton, New Jersey, and College Park, Mary­ land. They next take a southern route all the way back to northern California (!), back to southern California, and finally across to St. Louis. Figure 6.8 shows the 29 hops making up this path. One might be tempted to conclude that the path must have been the product of some sort of one­time glitch, but it showed up 5 different times in the R 1 data. In Figure 6.9 we see an illustration of the difficulties sometimes encountered even when going a very short distance. This route was the only one we observed from ncar to xor (8 observa­ tions total). ncar is located in Boulder, Colorado, and xor in East Boulder, Colorado, a few miles to the east. Yet the route between them visits the Gulf of Mexico and the East coast before crossing those few miles. Circuitous routing is not limited to the United States. Figure 6.10 shows the route from inria, located in Southern France, to oce, located in the Netherlands, a few hundred kilometers to the North. The routing takes the packets across the Atlantic ocean to Vienna, Virginia (and nearby Falls Church), before crossing the Atlantic again to Amsterdam. The return path from oce to inria also follows this path, except in one instance the routing went from Amsterdam to Paris via Vienna, Austria (shown with a dotted line), rather than Vienna, Virginia. We speculated that perhaps the trans­Atlantic routing was due simply to accidental misconfiguration based on the similarities of the names; but we learned from EUnet personnel that much more likely the trans­Atlantic routing was intentional, due to its low­cost and higher available capacity compared to the underprovisioned intra­European links [Bi95]. Persistent circuitous routing might strike us as pathological, and unexpected in a well­ run network. Because we do not know the underlying reasons for the routing configurations, we are unable from our data to answer why circuitous routing exists. 
We speculate, however, that 67 ir6gw.lbl.gov (Berkeley, CA) er1gw.lbl.gov lbl­lc2­1.es.net llnl­lbl­t3.es.net (Livermore, CA) lanl­llnl­t3.es.net (Los Alamos, NM) fnal­lanl­t3.es.net (Batavia, IL) pppl­fnal­t3.es.net (Princeton, NJ) pppl­nis.es.net umd­pppl.es.net (College Park, MD) mf­0.enss145.t3.ans.net t3­2.cnss56.washington­dc.t3.ans.net (Washington, DC) t3­1.cnss72.greensboro.t3.ans.net (Greensboro, NC) t3­0.cnss104.atlanta.t3.ans.net (Atlanta, GA) t3­2.cnss64.houston.t3.ans.net (Houston, TX) t3­0.cnss112.albuquerque.t3.ans.net (Albuquerque, NM) t3­1.cnss16.los­angeles.t3.ans.net (Los Angeles, CA) t3­2.cnss8.san­francisco.t3.ans.net (San Francisco, CA) t3­0.enss144.t3.ans.net (Moffett Field, CA) fix­w.icm.net sl­stk­5­h2/0­t3.sprintlink.net (Stockton, CA) sl­stk­6­f0/0.sprintlink.net sl­ana­2­h4/0­t3.sprintlink.net (Anaheim, CA) sl­ana­3­f0/0.sprintlink.net sl­starnet2­1­s0­t1.sprintlink.net (St. Louis, MO) stl2­e0.starnet.net ncrc­acn.wustl.edu ncrc­eng.wustl.edu jcr.ecl.wustl.edu tango.cs.wustl.edu Figure 6.8: Individual routers comprising circuitous path from lbl to wustl 68 Figure 6.9: Circuitous route from ncar to xor USA Figure 6.10: Circuitous route from inria to oce 69 Pathology Probability Trend Notes Unresponsive routers 0.00--0.53% Rare enough to not present a mea­ surement problem. Failure to decrement TTL 0.18% ? 0.06% better Downstream router visited prematurely. Persistent routing loops 0.13--0.16% Some lasted for hours. Temporary routing loops 0.055--0.078% Erroneous routing 0.004--0.004% Packets in R1 visited Israel! No instances in R2 . Connectivity altered mid­stream 0.16% ? 0.44% worse Suggests rapidly varying routes. Infrastructure failure 0.21% ? 0.48% worse No dominant link. Temporary outage – 30 secs 0.96% ? 2.2% worse Outage duration distributed as constant plus exponential. This distribution in R2 is bimodal. Total user­visible pathologies 1.5% ? 3.4% worse Table X: Summary of representative routing pathologies it may be an inevitable consequence of the structure of today's Internet: the network is so vast and heterogeneous, and so under­instrumented for purposes of diagnosing end­to­end ailments, that errors inexorably arise and persist for long periods of time. 6.10 Summary Table X summarizes the routing pathologies we studied in this section. The table is con­ fined to those pathologies for which we claim our samples are representative (x 4.4). (So, for exam­ ple, we omit the ``fluttering'' pathology, which was heavily dominated by a pair of sites in our study; and also ``host down,'' and ``stub network outage.'') The first part of the table reflects pathologies that are not in general visible to an end­to­end user of the network; that is, their presence does not significantly impact most network users. The second part of the table summarizes pathologies that are user­visible. The second column gives the probability of observing the pathology, in two forms. When the probability is given as a range, such as for ``persistent routing loops,'' then the proportion of ob­ servations of the pathology in R 1 was consistent with the proportion in R 2 (using the methodology in x 4.5). The range reflects the values spanned by the two datasets. When the table lists two probabilities separated by ``?,'' then the proportion of R 1 obser­ vations was inconsistent, with 95% confidence, with the proportion of R 2 observations. 
The first probability applies to the R 1 measurements, and reflects the state of the Internet at the end of 1994; and the second to the R 2 measurements, reflecting the state at the end of 1995. For those pathologies with inconsistent probabilities, the third column assesses the trend during the year separating the R 1 and R 2 measurements. A trend of ``better'' indicates that the situ­ ation improved, and ``worse'' that it degraded. One pathology improved significantly: the likelihood 70 of a router failing to decrement the TTL decreased. This change likely reflects upgraded and more stable router software. Note though that this pathology is of no interest to end­to­end users of the network--- improvements in the pathology do not reflect any significant gains in network service for the user. On the other hand, of the pathologies given in the second part of the table, which are of interest to users, none of them improved!, and a number became significantly worse. The final row summarizes the total probability of observing a user­visible pathology. We note that: During 1995, the likelihood of a user encountering a serious end­to­end routing problem more than doubled, to 1 in 30. The most prevalent of these problems was an outage lasting more than 30 seconds. This finding raises concerns regarding the long­term stability of the Internet. Clearly, if the trend continues, then network service will degrade to unacceptable levels. Unfortunately, from only two points in time it is impossible to assess the actual likelihood of the trend continuing. Finally, we note that, for reasons given in x 5.2, our estimates of the prevalence of patholo­ gies are biased towards underestimation; the true figures are most likely somewhat higher. 71 Chapter 7 End­to­End Routing Stability One key property we would like to know about an end­to­end Internet route is its stability: do routes change often, or are they stable over time? In this section we analyze the routing measure­ ments to address this question. We begin by discussing the impact of routing stability on different aspects of networking, to motivate our study, and summarizing the reasons why routes change. We then present two different notions of routing stability, ``prevalence'' and ``persistence,'' and show that they can be orthogonal (i.e., a route can be considered ``stable'' by one definition independently of whether it is stable by the other definition). It turns out that ``prevalence'' is quite easy to assess from our measurements, and ``persis­ tence'' quite difficult. In x 7.5 we characterize the ``prevalence'' stability of the routes, and then in x 7.6 we tackle the problem of assessing ``persistence.'' We finish by evaluating a method for detecting route changes based on observing changes in hop count (TTL). We find this method makes a decent heuristic, but generates enough ``false negatives'' that it should not be trusted if accuracy is crucial. 7.1 Importance of routing stability One of the stated goals of the Internet architecture is that large­scale routing changes (i.e., those involving different autonomous systems) rarely occur [Li89]: The Inter­AS Routing scheme must provide stability of routes. It is totally unac­ ceptable for routes to vary on a frequent basis. This requirement is not meant to limit the ability of the routing algorithm to react rapidly to major topological changes, such as the loss of connectivity between two AS's. The need for adaptive routing does not imply any desire for load­based routing. 
This point has been argued by others as well [BE90, Tr95b]. Routing instability sets the foremost limit on how use of BGP can scale to a very large internet, because CPU utilization required by BGP routers increases directly in proportion to the frequency of routing changes (but not, otherwise, in proportion to the overall size of the network) [Tr95b]. Hence, the key concern is that routing instability can in turn lead to general network instability (i.e., loss of packet­forwarding function). There are a number of aspects of networking affected by routing stability: 72 1. Some of the most important properties of a network---latency, bandwidth, congestion levels, packet losses---are all route properties. If the route through the network changes, so might some or all of these properties. Therefore, the degree to which a network's behavior is pre­ dictable is directly related to the stability of its routes. This is not to say that, even if the route remains stable, these properties will too. Rather, routing stability is necessary but not sufficient for predictable network behavior. One particular example affected by routing stability is the predictive service scheme proposed for real­time network traffic [CSZ92]. Predictive service attempts to satisfy the performance requirements of real­time traffic by only admitting new real­time flows if recent traffic mea­ surements suggest the network has sufficient capacity for them. If routes are unstable over short time scales, however, then these predictions become considerably difficult to make. 2. The degree to which endpoints can benefit from caching information of previously encoun­ tered path conditions is limited by (among other factors) whether the route observed in the past is likely to be the same as the present route. 3. New network protocols supporting ``real­time'' applications such as audio and visual flows generally require establishing state in routers in order to assure that the flows receive the nec­ essary performance. Real­time flows will often be long­lived, existing for time spans on the order of human interactions (minutes to hours) rather than computer interactions (millisec­ onds to seconds). If routing changes occur frequently, then these long­lived flows will be prone to losing the state they have established in the routers in the network, and will suf­ fer outages or degraded service while they attempt to find alternate routes with sufficient resources. Some protocols use ``hard state'' in the routers, meaning that, if state information for a given flow is not present in the router, then the router will not forward the flow's packets [DB95, FBZ94]. Other protocols use ``soft state'' schemes in which, even if a router has no corresponding state information for the flow, it will forward a flow's packets, though with possibly degraded performance [ZDESZ93, BCS94, DEFJLW94]. Hard state and soft state schemes trade off performance guarantees versus flexibility in the face of errors. Part of the question of evaluating the flexibility gain of soft state schemes concerns the degree of route stability. If routes do not tend to change frequently, then the soft state gain in flexibility is minor, but, if routes change frequently, then the gain will be larger. For an overview of the difficulties of dealing with routing changes in real­time protocols, see [GR95]. We do not attempt here to evaluate the flexibility gain of soft state versus hard state schemes. Indeed, the question is much more complex than stated above 1 . 
But we do attempt to characterize the stability constants that could then be used in such an evaluation. 4. Another form of router state arises from schemes for supporting advance reservations, in which the network allows resources to be reserved for future use [FGV95]. If the state con­ 1 For example, both types of schemes often use ``route pinning,'' in which the route available when a flow is established remains the route used by that flow for its lifetime. If a route is pinned, then only route changes due to the failure of a router used by the flow affect the flow; not those due to the discovery of improved routes (x 7.2). Similarly, some hard state schemes have explicit recovery mechanisms for when a flow's route does fail ([Ba94, DB95, GR95]), so these schemes do not necessarily stop working in the presence of route changes. 73 cerning these reservations is stored in the network's routers (a logical choice, to avoid cen­ tralized bottlenecks), then frequent route changes may lead to reservations failing because the routers used to establish the reservations are no longer the routers relevant to the real­ time path. 5. If routes change frequently, then network measurements face difficult consistency problems. For example, several studies of end­to­end network behavior rely on repeated measurements of a network path made over the course of hours to days [Mi83, CPB93a, Bo93, SAGJ93, Mu94, BCG95]. Whether these measurements all observe the same path significantly affects the accuracy of the studies. Similarly, distributed algorithms for analyzing the network's state also face consistency prob­ lems if routes change frequently. For example, recent theoretical work has developed ``tomog­ raphy'' techniques for inferring end­to­end network traffic intensities using just measurements of aggregate traffic intensities along the network's links [Va95]. The work assumes stable routing (an extension explores Markovian routing). If routes change frequently, then it may prove extremely difficult to capture a consistent global snapshot of any significant portion of the Internet for purposes of operational monitoring. We now look briefly at why routes change, and then introduce two different notions of routing stability, to encompass the different stability concerns discussed above. 7.2 Why routes change There are several different reasons why a route might change: 1. If a link or router fails, then the network must reroute traffic using that link or router. 2. If a link or router recovers, then the network may elect to route previously redirected traffic back to using that link or router. If routes are ``pinned,'' however, then they will not be changed due to recoveries. 3. If a link degrades or improves, where such notions might for example be measured by con­ gestion levels, then the network might adapt by changing routes to account for the altered view of the cost of the link. For example, the ARPANET routing algorithms were designed to route around congested areas of the network. As experience with the ARPANET showed, such adaptive routing is tricky to get right: the initial routing scheme reacted ``very quickly to good news, and very slowly to bad news'' [MFR78], and the first revision of the algo­ rithm [MRR80] also exhibited oscillations under heavy load [KZ89]. 
Because it is difficult to achieve stable adaptive routing, in which routes are not subject to rapid oscillation in response to transient congestion, adaptive routing is not widely used [Mo95], and a number of researchers argue for caution in its use [ERH92, RG95].

4. A router might cycle between different routes to the same destination in order to balance load. We analyzed this sort of route ``flutter'' in § 6.6, where we found that often its effects are confined to a single hop in an Internet path, but sometimes the split routes fail to rejoin, leading to drastically different path characteristics.

We would hope to observe four different time constants associated with these four reasons, of decreasing durations. Link failures should occur only rarely, hopefully on the time scale of days. Link recoveries should occur significantly more quickly (i.e., shortly after the link failure), on the time scale of minutes (if a reboot or restart is all that is required) to hours (if human intervention and repair is required). If adaptive routing is used, then changes should occur on the time scales of congestion epochs (unfortunately not well characterized in the literature), which one presumes is on the order of seconds to minutes; adaptive routing algorithms generally damp rapid changes, though, to avoid oscillations, so we would expect this time constant to be more on the order of minutes. Finally, load balancing is generally done on very small time scales (such as every other packet), on the order of milliseconds.

7.3 Two definitions of stability

As suggested in § 7.1, there are two distinct views of routing stability. The first is: ``Given that I observed route r at time t, how likely am I to observe r again at time t + s?'' We refer to this notion as prevalence. A route's prevalence directly affects the first two motivations discussed above, namely predictability of service, and our ability to learn from past conditions. In general, the degree of route prevalence will depend on s. For large s, however, we would expect the observation at time t + s to be (nearly) independent of the observation at time t. In this study, for simplicity we focus on the unconditional probability of observing a route, confining our analysis to s → ∞, i.e., the steady-state probability of observing r again at a point far in the future. We leave the interesting question of how prevalence evolves for different intervals s for future work.

A second view of stability is: ``Given that I observed route r at time t, how long before that route is likely to have changed?'' The likelihood of routes changing in the near future has implications for the latter three motivations, namely hard and soft router state, resource reservations, and network measurement consistency. We refer to this notion as persistence.

Intuitively, we might expect these two notions to be coupled. Consider, for example, a sequence of routing observations made every T units of time. If the routes we observe are:

  R_1, R_1, R_1, R_1, R_1, R_1, R_1, R_1, R_1, R_1, R_1, R_2, R_1, R_1, R_1, ...

then clearly route R_1 is much more prevalent than route R_2. We might also conclude that route R_1 is persistent, because we observe it so frequently; but this is not at all necessarily the case. For example, suppose T is one day. If the mean duration of R_1 is actually 10 days, and that of R_2 is one day, then this sequence of observations is quite plausible, and we would be correct in concluding that R_1 is persistent and prevalent.
Furthermore, depending on our concern, we might also deem that R_2 is persistent, since on average it lasts for a full day (if its lifetime were much shorter, then we would have been unlikely to observe it from measurements made only once a day). If we consider a route that lasts for more than a few hours as persistent, then from the above observations we could argue that R_2 is persistent but not prevalent.

But suppose instead that the mean duration of R_1 is 10 seconds and the mean duration of R_2 is 1 second, and that alternations between them occur as a semi-Markov process, where state 1 of the process corresponds to R_1, state 2 to R_2, and P_{1,2} = P_{2,1} = 1 (i.e., whenever a change occurs, it is a change to the other route). (Such processes consist of a set of states. Each state i has associated with it a distribution of durations, G_i. The distribution depends on the state number i, but not on anything else. Upon entering state i, a duration is drawn independently from G_i. The process remains in state i until the duration elapses. At this point, a new state j is chosen based on a set of probabilities fixed for state i.) Then a well-known result from the theory of stochastic processes states that the proportion of time the system spends in state 1 is equal to the mean duration of state 1 divided by the sum of the mean durations of states 1 and 2 [Ro83]. For our example, we have that the proportion of time spent in state R_1 is 10/11, reflecting that R_1 is prevalent. Similarly, the proportion of time spent in state R_2 is 1/11. Given these proportions, the sequence of observations is again plausible, even though each observation of R_1 is actually of a separate instance of the route. In this case, R_1 is prevalent but not persistent, and R_2 is neither prevalent nor persistent. In other words, we very likely are missing instances of R_2 between observations of R_1, and hence R_1 is not persistent.

This example shows that the notions of ``prevalent'' versus ``persistent'' stability are orthogonal, in the sense that the presence or absence of one does not necessarily indicate anything about the presence or absence of the other.

7.4 Reducing the data

To begin our analysis, we first need to reduce the more than 40,000 traceroute measurements in R_1 and R_2 to those relevant for assessing stability. Before we had gathered the R_2 measurements, we performed an initial stability analysis of the R_1 data. Doing so, we concluded that the inter-measurement spacing of the R_1 traceroutes, on average about one day, was too large to allow any assessment of routing stability in terms of persistence, because of the ambiguities discussed in the previous section. Consequently, we confine our routing stability analysis to R_2, which contains the bulk (85%) of the 40,000 measurements. 60% of these were taken with a 2-hour inter-measurement spacing. As shown in the remainder of this chapter, this granularity is sufficient to resolve the persistence ambiguities.

Of the 35,109 R_2 measurements, we began by excluding those exhibiting the pathologies discussed in Chapter 6, because they reflect connectivity difficulties distinct from routing instabilities. (An exception is the pathology of a routing change during a traceroute. Including these pathologies, however, can lead to overestimating the frequency of route changes. Suppose we make three route measurements of a particular path, yielding routes A, A/B, and B, where A/B indicates a traceroute that included a change from route A to route B. If we included the second, pathological measurement, we would conclude that over the three observations two changes occurred (A to A/B and A/B to B), whereas in reality only one change occurred (A to B). It is possible that instead the sequence we observe is A, A/B, A, because route B was short-lived; in this case, omitting the pathological traceroute underestimates the frequency of changes. But this becomes an issue only if B was quite short-lived, and we account for such routes separately, as discussed in § 7.6.1.) We did not exclude ``circuitous'' routes, however, because, as mentioned in § 6.9, these are not true pathologies. Excluding the pathologies eliminated 805 traceroutes. We also omitted traceroutes for which one or more hops were completely missing (all three of the probe packets unanswered). These measurements are inherently ambiguous, because we could not tell if the route was the same as that observed at other instances.
This decision eliminated another 2,595 measurements, leaving us with a total of 31,709 measurements.

We next made a preliminary assessment of the patterns of route changes by seeing which changes occurred the most frequently. We found the pattern of changes dominated by a number of single-hop differences, at which consecutive measurements showed exactly the same path except for a single router. Furthermore, the names of these routers often suggested that the pair were administratively interchangeable. For example, many of the routing changes to the austr site only differed in whether the University of Melbourne border router in the route was rb1.rtr.unimelb.edu.au or rb2.rtr.unimelb.edu.au. Which of these two routers provides the route to the austr host depends on the distribution of load within other parts of the University, but the two routers are under the same direct administration and would indeed be one machine if a single router with sufficient capacity had been available at the time of acquisition [El96]. (Sometimes the paired routers were in fact identical: IP address 192.157.65.130, which translates to icm-paris-1-s0-1984k.icp.net, is actually also an interface on paris-ebs2.ebone.net.)

It seems likely that many route changes differing at just a single hop are due to shifting traffic between two tightly coupled machines. For the stability concerns given in § 7.1, such a change is likely to have little consequence, provided the two routers are co-located and capable of sharing state. We decided that, if a single pair of routers with like names were responsible for more than 200 routing transitions, then we would classify them as ``tightly coupled,'' and merge them into a single router for purposes of evaluating stability. Table XI summarizes these routers.

Table XI: Tightly-coupled routers
  - asd01.nl.net, amf01.nl.net (these routers are located in different cities, but provide equal bandwidth and latency to their peers [Lin96])
  - icm-dc-1.icp.net, icm-dc-2b-s4/0-1984k.icp.net
  - rgnet-b1-serial2-3.seattle.mci.net, rainnet-inc.seattle.mci.net
  - rb1.rtr.unimelb.edu.au, rb2.rtr.unimelb.edu.au
  - unit-gw.unit.no, sintef-gw.sintef.no (both at the University of Trondheim)

After merging those responsible for more than 200 changes, the remaining pairs were all responsible for 80 or fewer changes. We left these as separate routers, as changes between them did not dominate the data, and we would like to minimize assumptions about which routers are tightly coupled.

Finally, we reduced the acceptable routes to three different levels of granularity. First, we considered each route as a sequence of Internet hostnames. We call this host granularity.
We then reduced the routes to sequences of cities, as outlined in § 5.3; we call this city granularity. Note that a route change at host granularity might not be a route change at city granularity, though the converse always holds. The motivation behind the distinction of host granularity vs. city granularity is to introduce a notion of ``any change'' vs. ``major change.'' A route change at city granularity will likely have considerably more repercussions than a change visible only at host granularity. For example, the latency of the route will often be different. Overall, 57% of the route changes at host granularity were also route changes at city granularity.

The third level of granularity was AS path---the sequence of autonomous systems visited by the route (§ 4.4). A change at AS granularity reflects a possible change in the intermediate routing algorithms and policies, and as such is another form of major change. Overall, 36% of the route changes at host granularity were also changes at AS path granularity. Note that a change at AS path granularity is not necessarily a change at city granularity, nor vice versa, though overall we found AS path granularity coarser (i.e., comprising fewer changes) than city granularity.

7.5 Routing Prevalence

In this section we look at routing stability from the standpoint of prevalence: how likely we are, overall, to observe a particular route (cf. § 7.3). We can associate with prevalence a parameter π_r, the steady-state probability that a path at an arbitrary point in time uses a particular route r.

We can assess π_r from our data as follows. We suppose that routing changes follow a semi-Markov process. In this model, each route's duration has a fixed distribution (but different routes can have different distributions), and the duration of each instance of a route is independent of all previous route durations. Furthermore, the probability that route r_1 is followed by route r_2 is fixed and independent of past events. We then use the result that, for a semi-Markov process, the steady-state probability of observing a particular state is equal to the average amount of time spent in that state [Ro83]. (This result requires that the distribution of time spent in each state be nonlattice: i.e., not always an integral multiple of some constant, so that the notion of ``steady state'' can be defined without reference to specifics about exactly when, in the far future, we observe the process. For route durations, this seems like a plausible assumption.) Furthermore, because of PASTA, our independent exponential sampling gives us an unbiased estimator of this time average (§ 4.3). Suppose we make n observations of a path and k_r of them find state r (i.e., route r). Then we will estimate π_r as π̂_r = k_r / n.

We proceed as follows. For a particular path p (and for a given granularity), let n_p be the total number of traceroutes measuring that path, and d_p the number of distinct routes seen. We will denote the most commonly occurring route as the dominant route, and the others as secondary routes; thus, there are always d_p − 1 secondary routes. If the dominant route was observed in k_p of the measurements, then we estimate π̂_dom = k_p / n_p, the prevalence of the dominant route.

Figure 7.1 shows the cumulative distribution of the prevalence of the dominant routes over all of the paths in our study (i.e., all 1,054 source/destination pairs), for the three different granularities. For example, at host granularity, nearly half (49%) of the paths (y-axis) were dominated by a route with a prevalence of at least 80% (x-axis). There is clearly a wide range, particularly for host granularity.
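To make the per-path estimate concrete, computing π̂_dom amounts to reducing each traceroute to the chosen granularity and taking a frequency count. The following is a minimal sketch in Python; the route tuples and city names are hypothetical illustrations, not data from R_1 or R_2, and this is not the analysis code used in the study:

from collections import Counter

def dominant_prevalence(observed_routes):
    # observed_routes: the routes seen for one source/destination pair,
    # each already reduced to the desired granularity (e.g., a tuple of
    # hostnames, cities, or AS numbers).
    n_p = len(observed_routes)
    counts = Counter(observed_routes)
    dominant, k_p = counts.most_common(1)[0]
    return dominant, k_p / n_p          # (dominant route, pi_hat_dom = k_p / n_p)

# Five hypothetical measurements of one path, at city granularity:
routes = [
    ("Boulder", "Fort Worth", "Washington", "London"),
    ("Boulder", "Fort Worth", "Washington", "London"),
    ("Boulder", "Anaheim", "Fort Worth", "Washington", "London"),
    ("Boulder", "Fort Worth", "Washington", "London"),
    ("Boulder", "Fort Worth", "Washington", "London"),
]
dominant, prevalence = dominant_prevalence(routes)
print(prevalence)    # 0.8: the dominant route accounts for 4 of the 5 observations

The same count, applied per path and per granularity, yields the distributions plotted in Figure 7.1.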
For example, for the path between pubnix and austr, in 46 measurements we observed 9 distinct routes at host granularity, and the dominant route was observed only 10 times, leading to π̂_dom = 0.217. On the other hand, at host granularity more than 25% of the paths exhibited only a single route (π̂_dom = 1). For city and AS path granularities, the spread in π̂_dom is more narrow, as would be expected (the figure also shows how route changes at city or AS path granularity do not necessarily imply changes at the other granularity, since neither is strictly below the other). A key figure to keep in mind from this plot, however, is that, while there is a wide range in the distribution of π̂_dom over different paths, its median value at host granularity is 82%, 97% at city granularity, and 100% at AS path granularity.

[Figure 7.1: Fraction of measurements observing the dominant route, for all paths, at all granularities. Cumulative probability versus prevalence of the dominant route, with separate curves for host, city, and AS granularity.]

The clustering of many paths only ever exhibiting a single route (i.e., prevalence = 100%) reflects the finding we develop below in § 7.6 that many routes are long-lived. (If we had data gathered over periods of time exceeding several weeks, we would doubtless find that the spike at prevalence = 100% would spread out to values in the upper 90%'s.) Thus, we can conclude: In general, Internet paths are strongly dominated by a single route.

Our previous work, however, has shown that many characteristics of network traffic exhibit considerable site-to-site variation [Pa94a], and thus it behooves us to assess the differences in π̂_dom between the sites in our study. To do so, for each site s (and for each granularity) we computed:

  π̂_src,s = ( Σ_i k_{s,i} ) / ( Σ_j n_{s,j} ),

where both sums run over the paths with source s, k_{s,i} is the number of times we observed the dominant route when measuring the path from source s to destination i, and n_{s,j} is the total number of times we made a measurement of the path from source s to destination j. The aggregate estimate π̂_src,s then indicates the overall prevalence of dominant routes from s to different destinations. We expect variations in this estimate for different sites to reflect differing routing prevalence due to route changes near the source. Route changes further downstream from the source occur either deep inside the network (and so will affect many different sites), or near the destination (and thus will not affect any particular source site unduly). Similarly, we can construct π̂_dst,s for all of the paths with destination s.

Studying π̂_src,s and π̂_dst,s for different sites and at different granularities reveals considerable site-to-site variation, in agreement with the general findings in [Pa94a]. Figure 7.2 shows the values computed for π̂_src,s for each of the R_2 sites, at host granularity. We find that the prevalence of the dominant routes originating at the ucl source is under 50% (we will see in § 7.6.1 the main cause for this), and for bnl, sintef1, sintef2, and pubnix it is around 60%; while for ncar, ucol, and unij, it is just under 90%.
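Computed directly, π̂_src,s is just a ratio of sums over all of the paths originating at s. A minimal sketch (Python; the per-path counts below are hypothetical, not values from the datasets):

def site_prevalence(paths_from_s):
    # paths_from_s: mapping destination -> (k, n), where k is the number of
    # observations of that path's dominant route and n is the total number
    # of observations of that path.
    total_k = sum(k for k, n in paths_from_s.values())
    total_n = sum(n for k, n in paths_from_s.values())
    return total_k / total_n            # pi_hat_src,s

# Three hypothetical destinations measured from one source site:
print(site_prevalence({"austr": (10, 46), "ukc": (30, 40), "lbl": (25, 25)}))
# (10 + 30 + 25) / (46 + 40 + 25) = 65/111, roughly 0.59

π̂_dst,s is computed the same way, with the sums taken over the paths terminating at s.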
Even at AS path granularity, the ucl source has an average prevalence of 60%, with ukc about 70%, and the remainder from 85% to 99%. At city granularity, however, the main outlier is bnl, with a prevalence of 75% (cf. § 7.6.2), because the ucl and ukc instabilities, while spanning autonomous systems, do not span different cities.

We find similar spreads for π̂_dst,s for different destination sites s. Figure 7.3 shows the per-site values, computed for host granularity. Sometimes the sites with low overall prevalence are the same as the sites with low prevalence for π̂_src,s (e.g., ucl), and sometimes they are different (e.g., ukc); this variation is due to asymmetric routing, which we analyze in Chapter 8.

We can thus summarize routing prevalence as follows: In general, Internet paths are strongly dominated by a single route, but, as with many aspects of Internet behavior, we also find significant site-to-site variation.

[Figure 7.2: Fraction of measurements observing the dominant route (``Likelihood of Observing the Dominant Route''), for different source sites, at host granularity.]

[Figure 7.3: Fraction of measurements observing the dominant route (``Likelihood of Observing the Dominant Route''), for different destination sites, at host granularity.]

7.6 Routing Persistence

We now turn to the more difficult task of assessing the persistence of routes: how long they are likely to endure before changing. As illustrated in § 7.3, unlike prevalence, routing persistence can be difficult to evaluate, because a series of measurements at particular points in time does not necessarily indicate the absence of a change (and a change back) in between the measurement points. Thus, accurately assessing persistence requires first determining whether routing alternates on short time scales. If not, then we can trust shortly spaced measurements observing the same route as indicating that the route did indeed persist during the interval between the measurements. If shortly spaced measurements can be trusted in this fashion, then they can be used to assess whether routing alternates on medium time scales.

Fortunately, we have measurements made at a number of different intervals: about 60% of the R_2 measurements were exponentially distributed with a mean of 2 hours, and the other 40% with a mean of about 66 hours (with wide variation in the actual intervals, since they were exponentially distributed). These measurements do not allow us to directly address the problem of assessing persistence---doing so would require a way to unambiguously determine exactly when a route changed, which could be done by tracing BGP routing information exchanges, but not from end-to-end traceroutes. (As briefly mentioned in § 3.2, recent work by Jahanian, Labovitz and Malan pursues this approach with very interesting results [JLM97]. We became aware of this work too late to discuss it here, but will address it in the version of [Pa96b] that we are presently revising for publication in IEEE/ACM Transactions on Networking.) Instead, our strategy is to analyze the measurements with the shorter spacing to assess the frequency of route alternations, and, in turn, to determine to what degree we can trust the measurements with larger spacing. In this fashion, we aim to ``bootstrap'' ourselves into a position to be able to make sound characterizations of routing persistence across a number of time scales.
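The screening applied in the following subsections (§§ 7.6.1-7.6.3) reduces to two simple computations over closely spaced measurements: a per-site estimate of the probability that a pair of observations shows a route change (the P quantities defined in § 7.6.1), and a check of triple observations for patterns that would indicate missed changes. A minimal sketch (Python; the site names and route labels are hypothetical, and this is not the study's analysis code):

def change_probability(pairs_by_site):
    # pairs_by_site: mapping site -> list of (route_1, route_2) pairs of
    # closely spaced measurements with that site as source (or destination).
    # Returns site -> fraction of pairs showing a change, i.e. X/N below.
    return {site: sum(r1 != r2 for r1, r2 in pairs) / len(pairs)
            for site, pairs in pairs_by_site.items() if pairs}

def misses_changes(triple):
    # A triple of the form R1,R2,R1 or R1,R2,R3 indicates that two
    # observations at this spacing could miss a route change.
    r1, r2, r3 = triple
    return r1 != r2 and r2 != r3

# Hypothetical routes, labeled by opaque identifiers:
print(change_probability({"austr": [("A", "A"), ("A", "B")], "adv": [("C", "C")]}))
# {'austr': 0.5, 'adv': 0.0}
print(misses_changes(("A", "B", "A")), misses_changes(("A", "A", "B")))
# True False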
7.6.1 Rapid route alternation

In order to reliably analyze widely-spaced traceroute measurements, we must first assess the predominance of rapidly alternating routes. We have already identified two types of rapidly alternating routes: those due to ``flutter'' and those due to ``tightly coupled'' routers. We have separately characterized fluttering (§ 6.6) and consequently have not included paths experiencing flutter in this analysis. As mentioned in § 7.4, we merged tightly coupled routers into a single entity, so their presence also does not further affect our analysis of rapidly alternating routes.

We next note that in R_2 we observed 155 instances of a route change during a traceroute. The combined amount of time observed by the 35,109 R_2 traceroutes was 881,578 seconds. (That is, the mean duration of an R_2 traceroute was 25.1 seconds.) Since when observing the network for 881,578 seconds we saw 155 route changes, we can estimate that on average we will see a route change every 5,687 seconds (≈ 1.5 hours). This reflects quite a high rate of route alternation, and bodes ill for relying on measurements made much more than a few hours apart (though see § 7.6.2); but it is not such a high rate that we would expect to completely miss routing changes for sampling intervals significantly less than an hour.

We first looked at those traceroute measurements that were made less than 60 seconds apart. There were only 54 of these, but all of them were of the form ``R_1, R_1''---i.e., both of the measurements observed the same route. This provides credible, though not definitive, evidence that there are no additional widespread, high-frequency routing oscillations, other than those we have already characterized.

We then looked at measurements made less than 10 minutes apart. There were 1,302 of these, and 40 triple observations (three observations all within a ten-minute interval). The triple observations allow us to double-check for the presence of high-frequency oscillations: if we observe the pattern R_1, R_2, R_1 or R_1, R_2, R_3, then we are likely to miss some route changes when using only two measurements 10 minutes apart. If we only observe R_1, R_1, R_1; R_1, R_2, R_2; or R_1, R_1, R_2, then measurements made 10 minutes apart are not missing short-lived routes. Of the 40 triple observations, none were of the form R_1, R_2, R_1 or R_1, R_2, R_3, confirming the finding from the 60-second observations that there are no additional sources of high-frequency oscillation.

The 1,302 ten-minute observations included 25 instances of a route change (R_1, R_2). This suggests that the likelihood of observing a route change over a ten-minute interval is not negligible, and requires further investigation before we can look at more widely spaced measurements. A natural question to ask concerning 10-minute changes is whether they are equally likely to occur along paths between any two sites, or if just a few sites are responsible for most of the 10-minute changes. (Certainly no single path between the same source/destination pair is skewing the count of 10-minute changes, since the most frequently observed single path only accounted for 8 of the 1,302 observations.)
This is an important consideration: if all paths are equally likely to exhibit a change during a 10-minute interval, then from the figure above of 25 changes observed out of 1,302 ten-minute observations we could conclude that routes change, on average, 25 times per (1,302 · 10 min), or about once every eight hours.

We test whether paths to or from particular sites are more prone to change than others as follows. For each site s, let N^{10}_{src,s} be the number of 10-minute pairs of measurements originating at s, and X^{10}_{src,s} be the number of times those pairs reflected a transition (i.e., the pair was R_1, R_2). Similarly, define N^{10}_{dst,s} and X^{10}_{dst,s} for those pairs of measurements with destination s. Here we are aggregating, for each site, all of the measurements made using that site as a source (destination), in an attempt to see whether route oscillations are significantly more prevalent near a handful of the sites. For each site s, we can then define:

  P^{10}_{src,s} = X^{10}_{src,s} / N^{10}_{src,s},

and similarly for P^{10}_{dst,s}. These values then give the estimated probability that a pair of ten-minute observations of paths with source (or destination) s will show a routing change.

We now check the P^{10}_{src,s} (and P^{10}_{dst,s}) estimates for each site to determine which sites appear particularly prone to exhibiting changes during ten-minute intervals. Figure 7.4 shows the sorted P^{10}_{dst,s} estimates. We see, for example, that none of the 10-minute measurements of paths to the destination adv observed a route change, but more than 12% of those to austr did. From the plot, austr appears to be an outlier, and merits further investigation. Before removing it as an outlier, however, we must be careful to first look at its routing oscillations to see what patterns they exhibit.

[Figure 7.4: Site-to-site variation in P^{10}_{dst,s} (estimated probability of observing a change), sorted per destination site.]

For the destination austr, the 10-minute changes involve a number of source sites: inria, mit, near (twice), and pubnix. All of the changes take place at the point-of-entry into Australia. (Note that in general the paths to austr and austr2 use two different trans-Pacific links, which is why austr2 does not exhibit these rapid changes.) The changes are either the first Australian hop of vic.gw.au, in Melbourne, or act.gw.au, in Canberra, or serial4-6.pad-core2.sydney.telstra.net in Sydney followed by an additional hop to nsw.gw.au (also in Sydney). These are the only points of change: before and after, the routes are unchanged. Thus, the destination austr exhibits rapid (time scale of tens of minutes) changes in its incoming routing, and these changes are non-negligible, since they involve different Australian cities. As such, the routing to austr is not at all persistent.

However, for the next potential outlier, sandia, the story is different. Both of its changes occurred along the path originating at sri, and reflected the following change at hops 8 and 9:

  core-fddi-0.sanfrancisco.mci.net
  borderx2-fddi0-0.sanfrancisco.mci.

versus:

  core2-fddi-0.sanfrancisco.mci.net
  borderx2-fddi-1.sanfrancisco.mci.net

These changes are localized to a single city.
Furthermore, had this change been more prevalent, we might have decided that the two pairs of routers in question were ``tightly coupled'' (§ 7.4), except that it turns out that they are responsible for routing changes only between sri and sandia. Thus, we can deal with this outlier by just eliminating the path sri ⇒ sandia, but keeping the other paths with destination sandia.

In addition to the destination austr, a similar analysis of P^{10}_{src,s} points up ucl, ukc, mid, and umann as outliers. Both ucl and ukc had frequent oscillations in the routers visited between London and Washington, D.C., alternating between the two hops of:

  icm-lon-1.icp.net
  icm-dc-1-s3/2-1984k.icp.net

and the four hops of:

  eu-gw.ja.net
  gw.linx.ja.net
  us-gw.thouse.ja.net
  icm-dc-1-s2/4-1984k.icp.net

Note that these different hops also correspond to different AS's, as the latter includes AS 786 (JANET) and the former does not. For mid and umann, however, the changes did not have a clear pattern, and their prevalence could be due simply to chance.

On the basis of this analysis, we conclude that the sources ucl and ukc, and the destination austr, suffer from significant, high-frequency oscillation, and we excluded them from further analysis. After removing any measurements originating from the first two or destined to austr, we then revisited the range of values for P^{10}_{src,s} and P^{10}_{dst,s}. Both of these now had a median of 0 observed changes, and a maximum corresponding to about 1 change per hour (this latter rate is computed by dividing the number of route changes observed for the site's paths by the total amount of time spanned by the measurements of those paths). On this basis, we believe we are on firm ground treating pairs of measurements between these sites, made less than an hour apart, both observing the same route, as consistent with that route having persisted unchanged between the measurements.

7.6.2 Medium-scale route alternation

Given the findings in the previous section that, except for a few sites, route changes do not occur on time scales less than an hour, we now turn to analyzing those measurements made an hour or less apart to determine what they tell us about medium-scale routing persistence. We proceed much as in § 7.6.1. Let P^{hr}_{src,s} and P^{hr}_{dst,s} be the analogs of P^{10}_{src,s} and P^{10}_{dst,s}, but now for measurements made an hour or less apart. After eliminating the rapidly oscillating paths identified in the previous section, we have 7,287 pairs of measurements to assess. The data also included 1,517 triple observations spanning an hour or less. Of these, only 10 observed the pattern R_1, R_2, R_1 or R_1, R_2, R_3, indicating that, in general, two observations spaced an hour apart are not likely to miss a routing change.

Plots similar to Figure 7.4 immediately pick out paths originating from bnl as exhibiting rapid changes. These changes are almost all from oscillation between llnl-satm.es.net and pppl-satm.es.net. The first of these is in Livermore, California, while the other is in Princeton, New Jersey, so this change is definitely major. ESNET oscillations also occurred on one-hour time scales in traffic between lbl (and lbli) and the Cambridge sites, near, harv, and mit.

The other prevalent oscillation we found was between the source umann and the destinations ucl and ukc.
Here the alternation was:

  ch-s1-0.eurocore.bt.net
  uk-s1-1.eurocore.bt.net

which goes through Switzerland to reach England, versus:

  nl-s1-1.eurocore.bt.net
  uk-s1-0.eurocore.bt.net

which goes through the Netherlands instead, also a major change.

Eliminating these oscillating paths leaves us with 6,919 measurement pairs. These paths are not statistically identical (i.e., we find among them paths that have significantly different route change rates), but all have low rates of routing changes. For these paths, the median P^{hr}_{src,s} and P^{hr}_{dst,s} correspond to one routing change per 1.5 days, and the maximum to one change per 12 hours.

7.6.3 Large-scale route alternation

Given that, after removing the oscillating paths discussed in § 7.6.1 and § 7.6.2, we expect at most on the order of one route change per 12 hours, we now can analyze measurements less than 6 hours apart of the remaining paths to assess longer-term route changes. There were 15,171 such pairs of measurements. As 6 hours is significantly larger than the mean 2-hour sampling interval (§ 7.6), not surprisingly we find many triple measurements spanning less than 6 hours. But of the 10,660 triple measurements, only 75 included a route change of the form R_1, R_2, R_1 or R_1, R_2, R_3, indicating that, for the paths to which we have now narrowed our focus, we are still not missing many routing changes using measurements spaced up to 6 hours apart.

Employing the same analysis, we first identify sintef1 and sintef2 as outliers, both as source and as destination sites. The majority of their route changes turn out to be oscillations between two sets of routers. The first alternates between:

  trd-gw2.uninett.no

in Trondheim, and:

  oslos-gw.uninett.no
  trds-gw.uninett.no

(or the reverse of this, for paths originating at sintef1 or sintef2), which includes an extra hop to Oslo. The second alternates between:

  nord-gw.nordu.net
  no-gw.nordu.net

(or the reverse), the first hop in Stockholm and the second in Trondheim, and:

  syd-gw.nordu.net
  no-gw2.nordu.net
  oslos-gw.uninett.no
  trds-gw.uninett.no

which again adds a visit to Oslo (middle two hops).

Two other outliers at this level are traffic to or from sdsc, which alternates between two different pairs of CERFNET routers, all sited in San Diego, and traffic originating from mid, which alternates between two MIDNET routers, both in St. Louis. Eliminating these paths leaves 11,174 measurements of the 712 remaining paths. The paths between the sites in these remaining measurements are quite stable, with a maximum transition rate for any site of about one change every two days, and a median rate of one change every four days.

7.6.4 Duration of long-lived routes

We will term the remaining measurements as corresponding to ``long-lived'' routes. For these, we might hazard to estimate the durations of the different routes as follows. We suppose that we are not completely missing any routing transitions (changes of the form R_1, R_2, R_1, where we only observe the first and last). We base this assumption on the overall low rate of routing changes. Then, for a sequence of measurements all observing the same route, we assume that the route's duration was at least the span of the measurements. So if the last observation was made two weeks after the first observation, we assume the route's duration was at least two weeks.
Furthermore, if at time t_1 we observe route R_1, and then the next measurement at time t_2 observes route R_2, we make a ``best guess'' that route R_1 terminated and route R_2 began halfway between these measurements, i.e., at time (t_1 + t_2)/2. For routes observed at the beginning (end) of our measurement period, but not spanning the entire measurement period, we assign a starting (ending) time as follows. If the next (previous) measurement also observed the route, then we estimate that the route persisted for at least that much time into the past (future). If the next (previous) measurement did not observe the route, then we take the lone observation of the route as its starting (ending) time. This rule will tend to underestimate routing durations, while the rule in the previous paragraph will tend to overestimate (due to occasionally missing a routing change), so these estimation errors will to some degree tend to cancel.

Figure 7.5 shows the distribution of the estimated durations of the ``long-lived'' routes. Even keeping in mind that our estimates are rough, it is clear that the distribution of long-lived route durations has two distinct regions, with many of the routes persisting for 1-7 days, and another group persisting for several weeks. (Although not evident from the plot, about 4% of the routes had durations under 6 hours, so we might consider the distribution as having three distinct regions.) About half the routes persisted for under a week, but the half of the routes lasting more than a week accounted for 90% of total persistence, meaning the integrated amount of time during which routes remained unchanged. This means that, if we observe a path at an arbitrary point in time, and we are not observing one of the numerous, more rapidly oscillating paths outlined in the previous sections, then we have about a 90% chance of observing a route for that path with a duration of at least a week.

[Figure 7.5: Estimated distribution of long-lived route durations. Fraction of routes versus route duration in days.]

7.6.5 Summary of routing persistence

We summarize routing persistence as follows. First, routing changes occur over a wide range of time scales, ranging from seconds to days. Table XII lists different time scales over which routes change. The second column gives the percentage of all of our measurement paths (source/destination pairs) that were affected by route changes at the given time scale. (The first two rows show ``N/A'' in this field because the changes were due to a very small set of routers, so we do not claim any sort of representative fractions.) The third column gives the section where we discuss the changes, and the final column any associated notes. When the note mentions ``inside the network'' or ``intra-network,'' we mean that the changes occurred not at the stub networks where the sites themselves connect to the Internet, but instead in what we would deem the Internet infrastructure.
Table XII: Summary of persistence at different time scales
  - seconds (N/A; § 6.6): ``Flutter'' for purposes of load balancing. Treated separately, as a pathology, and not included in the analysis of persistence.
  - minutes (N/A; § 7.4): ``Tightly-coupled routers.'' We identified five instances, which we merged into single routers for the remainder of the analysis.
  - 10's of minutes (9% of paths; § 7.6.1): Frequent route changes inside the network. In some cases these involved routing through different cities or AS's.
  - hours (4% of paths; § 7.6.2): Usually intra-network changes.
  - 6+ hours (19% of paths; § 7.6.3): Also intra-network changes.
  - days (68% of paths; § 7.6.4): Two regions. 50% of routes persist for under 7 days. The remaining 50% account for 90% of the total route lifetimes.

One important point apparent from the table is that routing changes on shorter time scales (fewer than days) happen inside the network and not at the stub networks. Thus, those changes observed in our measurements are likely to be similar to those observed by most Internet sites. On the other hand, while the changes occurred inside the network, only those involving ucl and ukc (§ 7.6.1) involved different sequences of autonomous systems. While this bodes well for the scalability of BGP, we do not claim this finding as having major significance: one could make a much more thorough assessment of the degree of inter-AS route flapping by analyzing the data discussed in [Do95, Me95b].

Finally, two thirds of the Internet paths we studied had quite stable routes, persisting for days or weeks. This finding is in accord with that of Chinoy, who found that most networks are nearly quiescent (in terms of routing changes) while a few exhibit frequent connectivity transitions [Ch93].

7.7 Detecting route changes

Given our findings that routes change in the Internet on a wide range of time scales, we would like to find mechanisms by which an endpoint can detect that its route to a remote destination has changed. This knowledge has two different applications. The first is that it allows the endpoint to flush any cached information associated with the route, such as round-trip time or available bandwidth. The second application is for network measurement experiments. A number of Internet experiments have been made in which a path through the network is repeatedly sampled [Mi83, CPB93a, Bo93, SAGJ93, Mu94, BCG95]. For such measurements it is important to know whether each time the path is measured, the measurement is observing the same route for that path, or whether the route may have changed (affecting the measurement).

While traceroute can be used to elicit the route currently used for a given Internet path, its use is expensive in terms of network resources, and also slow because of the necessity to wait for (possibly dropped) replies to many probe packets. On the other hand, endpoints can easily determine whether a route's hop count has changed by seeing whether the TTL of packets arriving from the remote destination differs from the previously observed TTL. Because the IP TTL field is in fact a hop count and not a time-to-live (§ 4.2.1), this measurement has no noise, provided the remote destination always sends packets with the same initial TTL. Thus, the endpoint need receive only a single packet from the destination in order to detect that the hop count of the path from the destination to the endpoint has changed. We call this method the ``TTL method.'' To our knowledge, it was first used in [CPB93a].

While the TTL method has an attractive simplicity, it will sometimes result in ``false negatives'': the underlying route might have changed, perhaps drastically, but if the new route happens to have the same number of hops as the cached one, the TTL method will report it as unchanged.
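In its simplest form, the method just compares the TTL of an arriving packet against a remembered value. A minimal sketch (Python; the initial-TTL value and hop counts are hypothetical, and a real endpoint would simply cache the TTL seen in the first packet it receives from the remote host):

def ttl_route_change(cached_ttl, observed_ttl):
    # Report a possible route change if the arriving packet's TTL differs
    # from the previously cached value.  Since the IP TTL is decremented
    # once per hop, a constant initial TTL at the sender means any
    # difference reflects a change in the reverse path's hop count.
    # Equal TTLs do NOT guarantee an unchanged route: a false negative
    # occurs if the new route happens to have the same number of hops.
    return observed_ttl != cached_ttl

# Hypothetical remote host sending with an initial TTL of 255:
cached = 255 - 14                          # first packet arrived after 14 hops
print(ttl_route_change(cached, 255 - 14))  # False: same hop count, no change reported
print(ttl_route_change(cached, 255 - 16))  # True: hop count changed by two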
In this section, we explore the degree to which these false negatives affect the practicality of the method.

After removing pathologies and fluttering paths, the data contained 30,145 consecutive traceroutes for us to test. Of these, 3,380 were route changes when viewed at host granularity, 1,928 at city granularity, and 1,266 at AS path granularity. We consider a route to have changed if and only if it did not visit exactly the same hosts (cities; AS's) in the same order. Before determining the host visited at each hop, however, we merged the ``tightly-coupled'' routers discussed in § 7.4 into a single router.

We deem the method as generating a ``false positive'' if it erroneously declares that the route changed, and a ``false negative'' if it fails to detect that the route did indeed change. To make these notions more precise, suppose that, out of N observations, K were genuine route changes at a given granularity, but of these K the method only detects k, and it also erroneously ``detects'' b bogus route changes. Then the false positive rate is b / (N − K), the false negative rate is (K − k) / K, and the overall error rate is (b + K − k) / N. Barring the remote host altering its initial TTL setting, or routers actually decrementing the TTL field for each second they delay a packet, the TTL method will never generate a false positive at host granularity (provided we exclude from testing pathological routes that visit a given hop more than once, which we did). It can do so at other granularities, however, when the underlying route changes in the number of hops but the same cities or AS's are still visited. At all three granularities, the TTL method can generate false negatives.

Table XIII summarizes the effectiveness of the TTL method for detecting different granularities of route changes.

Table XIII: Summary of TTL method for detecting route changes at different granularities

  Granularity   False positives   False negatives   Error rate
  host          0%                25%               3%
  city          4%                26%               5%
  AS path       5%                10%               5%

Its overall error rate is consistently low. This is mostly a reflection of the fact that, all in all, the underlying route does not change very often. Because in the absence of any change whatsoever the TTL method always reports ``no change,'' it is correct whenever the underlying route has not changed. At no granularity, however, is the false negative rate especially good, and at city and AS path granularities the false positive rate is non-negligible, too. Thus, we conclude that the TTL method serves as a handy heuristic, but is definitely not fool-proof. Still, it seems worthwhile to use the TTL method to detect route changes when conducting the network measurement studies mentioned at the beginning of this section, and the generally low false positive rate suggests that flushing cached route information upon observing a TTL change will usually be the correct action. One must not, however, be too complacent in accepting the absence of a TTL change as indicative of an unchanged route.

A final note concerning the TTL method: the TTL value most easily available to an endpoint for caching is that in packets the endpoint receives from the remote host. The TTLs in these packets reflect the hop count for the route from the remote host to the local host. If the routes between the two hosts are asymmetrical, however, then this hop count does not necessarily reflect the hop count along the route in the other direction (local host to remote host), which is generally the direction of interest. As shown in Chapter 8, routing asymmetry is not uncommon. Because of this, use of the TTL method may require some additional mechanism by which the local host can learn the TTL the remote host observed in packets it received from the local host.
We do not attempt here to offer a well-thought-out mechanism for doing so. We only comment that any such mechanism must take care that, when a route changes, the network is not immediately flooded with messages to that effect. Perhaps a solution can be found using multicasting techniques to minimize the number of messages sent after route changes.

Chapter 8

Routing Symmetry

We now analyze the routes from our measurement study to assess the degree to which routes are symmetric. We first motivate the investigation by discussing the impact of routing asymmetry on different network protocols and measurements. We then give an overview of various mechanisms that can introduce asymmetry into Internet routing, including ``hot potato'' routing (§ 8.2), which could result in a greater proportion of asymmetric routes in the future. We next introduce a definition of routing symmetry, and show that practical considerations require a revision in which we view routes as asymmetric only if they visit different cities or autonomous systems. We then assess our data for these asymmetries and find that, overall, 50% of the time an Internet path includes a major asymmetry in terms of the cities visited in the different directions, and 30% of the time it includes a major asymmetry in terms of the autonomous systems visited. We finish with a discussion of the magnitude of the asymmetries, most of which differ at just one ``hop,'' but some at many hops.

8.1 Importance of routing symmetry

Routing symmetry affects a number of aspects of network behavior. When attempting to assess the one-way propagation time between two Internet hosts, the common practice is to assume it is well approximated as half of the round-trip time (RTT) between the hosts [CPB93a]. The Network Time Protocol (NTP) needs to make such an assumption when synchronizing clocks between widely separated hosts [Mi92a]. If routes are asymmetric, however, the assumption might easily lead to error. The NTP design utilizes multiple time server peers and robust algorithms to choose among them for the best time offset to use to account for propagation effects. Thus, routing asymmetry has an impact on NTP only if the paths between two NTP communities are predominantly asymmetric, with similar differences in one-way times. In that case, the two communities will keep consistent time among themselves, but not between each other. (Recently, however, highly accurate atomic clocks have become much more affordable than in the past, as have Global Positioning System receivers, which also provide reliable time. These provide an independent solution to the problem of keeping widely separated NTP servers synchronized.)

Claffy and colleagues studied variations in one-way latencies between the United States, Europe, and Japan [CPB93a]. They discuss the difficulties of measuring absolute differences in propagation times in the absence of separately synchronized clocks, but for their study they focussed on variations, which does not require synchronization of the clocks. They found that the two opposing directions of a path do indeed exhibit considerably different latencies, in part due to different congestion levels, and in part due to routing changes, which they detected using the TTL method (§ 7.7).

Along with affecting Internet protocols such as NTP, routing asymmetry can render network measurement considerably more difficult.
Often it is easiest to perform measurements at a single endpoint of a network path, but in the face of routing asymmetries, such measurements might be unable to distinguish between considerably different behavior along the forward and reverse directions of the path. We explore this problem at length in Part II (see § 9.1.3 for a general discussion).

Closely related to this measurement problem, routing asymmetry also potentially complicates mechanisms by which connection endpoints can infer network conditions from the pattern of packet arrivals they observe. For example, we develop a technique in Chapter 14 for estimating the ``bottleneck bandwidth'' of the network path used by a connection. The technique works by examining the timing with which packets arrive at their receiver. If routing is symmetric, then (for most link technologies) the bottleneck bandwidth measured by this technique will be the same as that encountered by packets sent in the other direction. Symmetry could, for example, allow the server for a request/reply application such as the World Wide Web [BCLF+], or, more generally, T/TCP [Br94], to determine the link bandwidth available for sending its reply, based on the bandwidth inferred from the request. If routing is asymmetric, however, then the server runs the risk of inferring an incorrect value for the bandwidth. (Even if routing is symmetric, the server cannot rely on the congestion levels being symmetric. Thus, as with routing stability, routing symmetry is necessary but not sufficient for predicting network behavior.) However, we show in Chapters 14 and 16 that bottleneck bandwidths and delays are often asymmetric along the two directions of a path, and attribute the difference at least in part to routing asymmetries.

Finally, recent work has investigated the characteristics of network traffic flows as viewed by a router [CBP95]. That study describes a taxonomy of methodologies that can be used by routers to define and manage flow state. One finding of the study is that a large number of flows are bidirectional, due in part to request/reply transactions such as those used by the Domain Name System (DNS; [MD88]) and the World Wide Web. When a router R sees a flow likely to be bidirectional, for example a DNS request from A to B, one might consider establishing anticipatory flow state in the router for the reply coming from B to A, to avoid the overhead of two separate trips through the ``slow path'' associated with flows for which there is no cached state. With prevalent routing asymmetry, however, while B may very likely send such a message shortly, the reply could well not be routed via R, in which case the anticipatory flow state is wasted effort and resources.

Similarly, accounting used to charge for carrying network traffic is complicated by the possibility of locally observing only one direction of a traffic flow. For example, a recently developed architecture for Internet traffic flow measurement makes the basic assumption that routers observe bidirectional flows [BMR97].

8.2 Sources of routing asymmetries

In this section we discuss several mechanisms that can lead to routing asymmetries. To illustrate, we assume the viewpoint of a router R_0 faced with the decision of how to forward packets originated by host A and destined for host B. In addition to the upstream router from which R_0
receives packets sent by A, R_0 is connected to two potential downstream routers, R_1 and R_2, and the decision it must make is to which of these it forwards packets bound for B. Let us also assume that packets from B headed to A arrive at R_0 via R_1 (but in general R_0 does not itself know this fact), and that these packets first pass through a router R_3, which makes the decision whether to use the route that ultimately delivers the packets to R_0 via R_1, or a different route that results in the packets arriving at R_0 via R_2.

In general, routing algorithms incorporate ``link costs'' or metrics to quantify the desirability of using a particular link for a given route [Pe92, St95]. To assure reliable operation, a router also generally knows of multiple paths available to a remote destination B, so we assume that R_0 has two metrics, μ_1 and μ_2, associated with forwarding packets to B via R_1 or R_2. If μ_1 = μ_2, then R_0 must somehow arbitrate between them. If it does so deterministically, by picking R_2, then an asymmetry is created. (If it instead alternates between R_1 and R_2, it creates fluttering, as discussed in § 6.6.)

Another way of introducing asymmetry is via configuration asymmetries or errors. For example, if due to misconfiguration R_0 believes that using the link to R_1 is very expensive, but R_1 does not share this view, then R_0 will artificially inflate the cost of using R_1 to get to B, and instead pick R_2.

Network topology changes can also introduce routing asymmetries, albeit transient ones, due to the non-negligible amount of time required for changes to propagate through the network. For example, suppose R_2 learns of a better route to R_3 than it had before. If knowledge of this new route propagates to R_0 before R_3, then R_0 will switch from R_1 to R_2, and an asymmetry will exist until R_3 learns of the route.

Another transient mechanism for creating routing asymmetries can arise due to adaptive routing (§ 7.2), in which a router attempts to shift traffic from a highly loaded link to a less loaded link. For example, R_0 might decide that it is sending too much traffic via the link to R_1 (the bulk of this traffic might not be destined for B), so it increases the metrics associated with R_1 to the point where routing via R_2 becomes the preferred route to B. More generally, if routing metrics include a notion of current congestion levels, then asymmetric congestion in the network can lead to asymmetric routing, as the network alters its routing to avoid the congested region.

A final mechanism introducing asymmetry, and one of possibly growing importance, concerns ``hot potato'' and ``cold potato'' routing. In the past, Internet backbones were primarily operated by a single entity. In recent years this has changed, with the growth of competing Internet Service Providers (ISPs) due to the privatization of the Internet infrastructure. Suppose host A in California uses ISP I_A, and host B in New York uses I_B. Assume that both I_A and I_B provide Internet connectivity across the entire United States. When A sends a packet to B, the routers belonging to I_A must at some point transfer the packet to routers belonging to I_B. Since cross-country links are a scarce resource, both I_A and I_B would prefer that the other convey the packet across the country. If the inter-ISP routing scheme allows the upstream ISP (I_A, in our example) to determine when to transfer the packet to I_B, then, due to the preference of avoiding the cross-country haul, I_A will elect to route the packet via I_B as soon as possible.
This form of routing is known as ``hot potato.'' In our example, it leads to I_A transferring the packet to I_B in California. But when B sends traffic to A, I_B gets to make the decision as to when to forward the traffic to I_A, and with hot potato it will choose to do so in New York. Since the paths between California and New York used by I_A and I_B will in general be quite different, hot potato routing thus leads to a major routing asymmetry between A and B.

Conversely, if the downstream ISP can control where the upstream ISP transfers packets to it, then the result is ``cold potato'' routing, in which I_B instructs I_A that, to reach B, I_A should forward packets to I_B's New York network access point (NAP). Similarly, I_A advertises to I_B that, to reach A, I_B should forward packets to I_A's California NAP. The result is that packets from A to B travel across the country via I_A's links, while those from B to A travel via I_B's links. The paths are the opposite of those resulting from hot potato routing, but the degree of asymmetry remains the same, and potentially large. For further discussion of asymmetry issues, see [Che95].

8.3 Definition of routing symmetry

In this section we develop a definition for whether two routes are symmetric. We first try the following:

Definition 1: For two hosts A and B, let r_1, ..., r_n denote the routers visited in sequence by packets sent from A to B, and r'_1, ..., r'_m denote those visited in sequence by packets from B to A. Then the two routes are symmetric if and only if n = m and, for all i with 1 ≤ i ≤ n, r_i = r'_{n+1−i}.

This definition has two problems. The first is that it does not distinguish between minor and major asymmetries. Suppose, for example, that a site has two separate access points, and that traffic from A to B leaves the site at the first access point for a downstream router R, while traffic from B to A comes to the site also from R, but arriving at the second access point. Such an asymmetry is minor. For example, it will have minimal impact on the accuracy of the NTP protocol (§ 8.1). On the other hand, if the route from A to B visits a different city than does the route from B to A, then the two paths might have considerably different properties, and the asymmetry is major.

To illustrate these differences, consider the route we observed in R_1 from ucol to ucl (where we have annotated the cities visited in parentheses), shown in Figure 8.1. One of the complementary routes we observed from ucl to ucol is shown in Figure 8.2. This route visits the same cities as the reverse route, though not the same routers; the asymmetry is minor. On the other hand, we also observed a route from ucl to ucol as shown in Figure 8.3. In this case, the detour via California is skipped, shaving perhaps 2,000 kilometers of travel from the route: a major asymmetry.

A second problem with Definition 1 is determining whether two routers r_i and r'_j are indeed the same router. The difficulty arises because traceroute provides an IP address for each hop, but these do not uniquely identify routers. In general, routers have multiple IP addresses, one for each network interface attached to the router. Furthermore, these IP addresses can translate to different hostnames. Thus, for example, it is difficult to determine whether the IP address with hostname sl-ana-3-s2/4-t1.sprintlink.net in Figure 8.1 corresponds to the same router as that with hostname sl-ana-3-f0/0.sprintlink.net in Figure 8.2.
cs-gw-discovery.cs.colorado.edu (Boulder, CO)
cu-gw.colorado.edu
sl-ana-3-s2/4-t1.sprintlink.net (Anaheim, CA)
sl-ana-1-f0/0.sprintlink.net
sl-fw-6-h2/0-t3.sprintlink.net (Fort Worth, TX)
sl-fw-5-f1/0.sprintlink.net
sl-dc-8-h3/0-t3.sprintlink.net (Washington, D.C.)
icm-dc-1-f0/0.icp.net
icm-london-1-s1-1984k.icp.net (London, UK)
smds-gw.ulcc.ja.net
smds-gw.ucl.ja.net
cisco-pb.ucl.ac.uk
cisco.cs.ucl.ac.uk
neptune.cs.ucl.ac.uk

Figure 8.1: Route observed from ucol to ucl

cisco.cs.ucl.ac.uk (London, UK)
cisco-pb.ucl.ac.uk
cisco-b.ucl.ac.uk
gw.lon.ja.net
eu-gw.ja.net
icm-lon-1.icp.net
icm-dc-1-s3/2-1984k.icp.net (Washington, D.C.)
sl-dc-6-f0/0.sprintlink.net
sl-dc-8-f0/0.sprintlink.net
sl-fw-5-h4/0-t3.sprintlink.net (Fort Worth, TX)
sl-fw-6-f0/0.sprintlink.net
sl-ana-1-h2/0-t3.sprintlink.net (Anaheim, CA)
sl-ana-3-f0/0.sprintlink.net
sl-ucb-2-s0-t1.sprintlink.net (Boulder, CO)
cs-gw.colorado.edu
clark.cs.colorado.edu

Figure 8.2: Route observed from ucl to ucol

cisco.cs.ucl.ac.uk (London, UK)
cisco-pb.ucl.ac.uk
cisco-c.ucl.ac.uk
smds-gw.ulcc.ja.net
icm-lon-1.icp.net
icm-dc-1-s3/2-1984k.icp.net (Washington, D.C.)
sl-dc-8-f0/0.sprintlink.net
sl-fw-5-h4/0-t3.sprintlink.net (Fort Worth, TX)
sl-fw-4-f0/0.sprintlink.net
sl-ucb-1-s0-t1.sprintlink.net (Boulder, CO)
cns-gw-suns.colorado.edu
cs-gw.colorado.edu
lewis.cs.colorado.edu

Figure 8.3: Second route observed from ucl to ucol

We address both these difficulties using a revised definition:

Definition 2  For two hosts A and B, let $c_1, \ldots, c_n$ denote the cities visited in sequence by packets sent from A to B, and $c'_1, \ldots, c'_m$ denote those visited in sequence by packets from B to A. Then the two routes are symmetric if and only if $n = m$ and:
$$\forall i,\ 1 \le i \le n:\quad c_i = c'_{n+1-i}.$$

It is much easier to determine whether two routers are located in the same city than whether they refer to the same router, since with a bit of effort it is generally possible to determine the city corresponding to an Internet hostname (cf. § 5.3). For example, we know from the Sprintlink naming convention that both sl-ana-3-s2/4-t1.sprintlink.net and sl-ana-3-f0/0.sprintlink.net are located in Anaheim, California. We can make an analogous definition for routes differing in the autonomous systems they visit, rather than the cities.

8.4 Analysis of routing symmetry

In R1, we did not make simultaneous measurements of the paths A ⇒ B and B ⇒ A, which introduces ambiguity into an analysis of routing symmetry: if a measurement of A ⇒ B is asymmetric to a later measurement of B ⇒ A, is that because the route is the same but asymmetric, or because the route changed? In R2, however, the bulk of the measurements were paired: we first measured A ⇒ B and then immediately afterward measured B ⇒ A. Barring rapid route oscillations (which we can avoid by eliminating pathological traceroutes from our analysis), these measurements allow us to unambiguously determine whether the route between A and B is symmetric.

The R2 measurements contain 11,339 successful pairs of measurements, in which we were able to conduct traceroutes in both directions between sites A and B, neither of the measurements encountering pathologies. We find that 49% of the measurements observed an asymmetric path that visited at least one different city. There is a large range, however, in the prevalence of asymmetric routes among paths to and from the different sites. For example, 86% of the paths involving umann were asymmetric, because nearly all outbound traffic from umann travels via Heidelberg, but none of the inbound traffic does.
At the other end of the spectrum, only 25% of the paths involving umont were asymmetric (but this is still a significant amount).

If we consider autonomous systems rather than cities, then we still find asymmetry quite common: about 30% of the paired measurements observed different autonomous systems traversed in the path's two directions. The most common asymmetry was the addition of a single AS in one of the directions. This can reflect a major change, however. For example, the most common of these additions was the presence of SprintLink routers in one direction along the path but not in the other. Again, we find a wide range in the prevalence of asymmetry among the different sites. Fully 84% of the paths involving the ucl site were asymmetric, mostly due to some paths including JANET routers in London and others not (unsurprising, given the rapid oscillation between JANET and non-JANET routers discussed in § 7.6.1). On the other end of the spectrum, only 7.5% of adv's paths were asymmetric at AS granularity.

8.5 Increasing prevalence of asymmetry

We previously analyzed R1 for routing asymmetry, attempting to adjust for the non-simultaneity of its measurements by only using measurements spaced less than a day apart. The mismatch is likely to overestimate routing asymmetry, since a route change between the two measurements may be incorrectly regarded as an asymmetry, per our discussion at the beginning of § 8.4. The mismatch can also introduce false symmetries, if the route happens to change to the symmetric counterpart, but this circumstance is probably rarer than the introduction of false asymmetries.

In the R1 measurements, we found that 30% of the paths contained city-level asymmetries. The large discrepancy between this figure and the 50% figure for the R2 measurements suggests that over the course of a year routing became significantly more asymmetric. We surmise that the increase in asymmetry is likely due to the ``hot potato'' effect discussed in § 8.2. If so, then the rise in asymmetry has its roots in commercial factors, and frequent routing asymmetry may continue to be common in the Internet in the future. From a measurement perspective, this would be unfortunate, for the reasons given in § 8.1, and further developed in § 9.1.3.

8.6 Size of asymmetries

We finish our study of routing symmetry with a look at the size of the different asymmetries. We can assign a ``magnitude'' to each asymmetry in terms of the number of cities that differ in the two directions. We consider each ``city hop'' at which the two directions of a path differ as contributing a magnitude of 1; if one direction has more ``city hops'' than the other, each additional city contributes 1/2.
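To make this metric concrete, the following Python sketch implements the city-level symmetry test of Definition 2 together with the magnitude computation just described. It is purely illustrative (the function names are ours, not part of the NPD analysis software), and it assumes that city hops are matched up position by position against the reversed return route, which is one simple way of realizing the ``city hop'' comparison; it also assumes consecutive routers in the same city collapse into a single hop.

    def dedup(cities):
        """Collapse consecutive routers in the same city into one 'city hop'."""
        hops = []
        for c in cities:
            if not hops or hops[-1] != c:
                hops.append(c)
        return hops

    def city_symmetric(fwd_cities, rev_cities):
        """Definition 2: symmetric iff the return route visits the same cities
        in the opposite order."""
        a, b = dedup(fwd_cities), list(reversed(dedup(rev_cities)))
        return a == b

    def asymmetry_magnitude(fwd_cities, rev_cities):
        """Each differing city hop contributes 1; each extra hop in the longer
        direction contributes 1/2."""
        a, b = dedup(fwd_cities), list(reversed(dedup(rev_cities)))
        common = min(len(a), len(b))
        magnitude = sum(1 for i in range(common) if a[i] != b[i])
        magnitude += 0.5 * abs(len(a) - len(b))
        return magnitude

    # The rain <-> bnl measurement pair discussed next yields a magnitude of 5:
    fwd = ["Portland", "Stockton", "Washington", "Pennsauken", "New York",
           "Deer Park", "BNL"]
    rev = ["BNL", "Livermore", "Mountain View", "San Francisco", "Sacramento",
           "Seattle", "Portland"]
    assert not city_symmetric(fwd, rev)
    assert asymmetry_magnitude(fwd, rev) == 5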
For example, for the paths between rain and bnl, we observed simultaneous measurements of the following routes:

r0.pdx.rain.rg.net (Portland)
sl-stk-13-s2/2-t1.sprintlink.net (Stockton)
sl-stk-5-f0/0.sprintlink.net
sl-dc-6-h1/0-t3.sprintlink.net (Washington, D.C.)
sl-pen-1-h2/0-t3.sprintlink.net (Pennsauken)
sl-pen-2-f0/0.sprintlink.net
ny-nyc-2-h1/0-t3.nysernet.net (New York)
ny-nyc-6-f0/0.nysernet.net
ny-dp-1-h0/0-t3.nysernet.net (Deer Park)
ny-bnl-2-s0-t1.nysernet.net (BNL)
cerberus.bnl.gov
frog.rhic.bnl.gov

and

cerberus.90.bnl.gov (BNL)
nioh.bnl.gov
192.12.15.224
llnl-satm.es.net (Livermore)
ames-llnl.es.net (Mountain View)
fix-west-cpe.sanfrancisco.mci.net (San Francisco)
borderx2-hssi2-0.sanfrancisco.mci.net
core2-fddi-1.sanfrancisco.mci.net
core1-hssi-2.sacramento.mci.net (Sacramento)
core-hssi-3.seattle.mci.net (Seattle)
border1-fddi-0.seattle.mci.net
rgnet-b1-serial2-3.seattle.mci.net
chia.rain.net (Portland)

The paths differ at five ``city hops'': Stockton/Seattle, Washington/Sacramento, Pennsauken/San Francisco, New York/Mountain View, and Deer Park/Livermore, so we assign a magnitude of 5 to this asymmetry.

Figure 8.4 shows the distribution of asymmetry magnitudes. We see that the asymmetries typically include only one different city hop, or, even more commonly, just one additional city. About one third of the asymmetries have magnitude 2 or greater. We should bear in mind, though, that this corresponds to almost 20% of all the paired measurements in our study, and can correspond to a very large asymmetry. For example, a magnitude 2 asymmetry between ucl and umann differs at the central city hops of Amsterdam and Heidelberg in one direction, and Princeton and College Park in the other! In general, the presence of such asymmetries highlights the difficulties of providing a consistent topological view in an environment as large and diverse as the Internet.

[Figure 8.4: Distribution of asymmetry sizes. X axis: number of different cities (magnitude), 0-8; y axis: fraction of asymmetric routes, 0.0-0.3.]

Part II

End-to-End Internet Packet Dynamics

Chapter 9

Overview of the Packet Dynamics Study

In this part of our study we present our efforts to find convincing answers to questions about end-to-end Internet packet dynamics such as ``how often are packets dropped?'' As in Part I, we devise a large-scale measurement experiment based on the ``Network Probe Daemon'' (NPD) measurement framework. Our goal with this part of our study is to develop persuasive characterizations of the dynamics of Internet packet loss and delay. To do so, however, requires a great deal of groundwork in order to assure that the resulting findings are sound.

First, we need to calibrate our basic packet measurements, detecting those that are untrustworthy or inaccurate so that we can discard them to avoid drawing false conclusions. We describe how we do so in Chapter 10. Because we use TCP transfers as our basic ``probes'' for measuring network paths, our probes have a complicated structure due to the particulars of TCP. In Chapter 11 we discuss our development of an analysis tool, tcpanaly, that accounts for the details of the various TCPs in our study, and thus can separate their effects from true networking effects. The development of tcpanaly also gives us an opportunity to look at the differences in behavior between the TCP implementations in our study.
These turn out to be quite significant, including some sufficiently broken TCPs that, if ubiquitously deployed, would devastate Internet performance due to congestion collapse.

Because one of our goals is to characterize one-way packet delays, we must also deal with the problem of calibrating the clocks used in our study. This proves much more difficult than we had originally anticipated. Chapter 12 details our efforts.

In Chapter 13 we turn to examining network ``pathologies,'' meaning unexpected network behavior. These include out-of-order delivery, in which packets arrive at the receiver in a different order than that in which they were sent; packet replication, in which the network delivers multiple copies of a single packet; and packet corruption, in which the data in the packet delivered by the network differs from that in the packet as originally sent.

In order to then soundly evaluate packet delay and loss, we need to first determine each connection's bottleneck rate, i.e., the upper bound imposed by the network path on the connection's throughput. This rate plays a crucial role because it determines when closely-spaced packets must necessarily queue behind each other in the network. Network conditions observed by such packets are correlated and must be treated separately from uncorrelated observations. In Chapter 14 we discuss shortcomings of the main existing technique for estimating bottleneck bandwidth, ``packet pair,'' and develop a robust algorithm, PBM (``packet bunch modes''), to address these problems. In addition, we characterize the range of bottleneck rates we observed among the various Internet paths, and assess the stability of a path's bottleneck rate over time.

We then proceed in Chapter 15 to an analysis of patterns of Internet packet loss. We look at many different facets of loss, including the differences between loss rates of data packets and acknowledgements; correlations between loss rates along the two directions of a network path; trends in loss rates; differences in loss rates due to geography; the duration of loss ``outages''; the location, with respect to the path's bottleneck element, where packet loss occurs; how well a connection's observed packet loss predicts those of future connections; and how well TCP deals with packet loss, in terms of retransmitting only when necessary.

We finish in Chapter 16 with an analysis of patterns of Internet packet delay. We look at variations and extremes of round-trip times (RTTs) and one-way transit times (OTTs); symmetry in OTT variation along the two directions of a network path; correlations between delay variations and loss; how well a connection's delay variations predict those of future connections; the phenomenon of packet timing ``compression''; the time scales on which queueing occurs; and the degree of available bandwidth present along Internet paths.

Chapter 17 summarizes the findings of both Part I and Part II, and sketches the main themes of the work.

In the remainder of this chapter, we discuss our experimental methodology (§ 9.1); those aspects of the TCP protocol relevant to our study (§ 9.2); and the raw data produced by the experiment (§ 9.3).

9.1 Methodology

In this section we discuss the methodology underlying the packet dynamics experiment. We address two separate issues: how to make the measurements, and how to analyze them.
9.1.1 Measurement considerations

For our packet dynamics study, our measurement ``probes'' consisted of TCP transfers of 100 Kbyte files over different Internet paths. We discuss in § 9.1.2 the reasoning behind using TCP for the study. The transfers were unidirectional: data only flowed along one direction of the path. Such connections are referred to as bulk transfers [DJCME92, Pa94a]. There are other classes of traffic in the Internet (such as request/response, interactive, multicast, and real-time). All of these ultimately boil down to dividing data into packets for delivery by the Internet's packet forwarding infrastructure. Our goal is to characterize what happens to packets once they are in the hands of this infrastructure. For this purpose, bulk transfers serve well, as they provide a fairly steady stream of data packets traveling in one direction, and a corresponding stream of ack packets traveling in the other. We can then analyze the fate and timing of the packets to determine how the two directions of the Internet path performed.

Each transfer was traced using the tcpdump utility [JLM89] at both the sender and the receiver, resulting in two trace files. We term the combination of the two trace files a ``trace pair.'' Our findings are all based on analyzing trace files and trace pairs.

For security reasons, the NPD transfers used fixed TCP sending and receiving ``ports,'' so tcpdump could immediately filter out traffic not related to the transfer. That we did so has two drawbacks. First, it means that the traces lack some network traffic relevant to the transfer, namely any associated Internet Control Message Protocol (ICMP; [Po81b]) messages. We discuss in § 11.3.3 how we inferred the presence of a particular type of ICMP message, termed ``source quench.'' In addition, using fixed ports resulted in our measurements incurring a minimum separation between consecutive measurements of the same pair of hosts, because TCP has rules governing how quickly a pair of ports can be reused for a new connection. (Nominally, this minimum time is four minutes, twice the ``maximum segment lifetime'' of two minutes. In practice, it varies between TCP implementations.)

As with the routing dynamics experiment, we used exponentially-spaced sampling intervals in order that our measurements might observe an unbiased sample of conditions along the different Internet paths (§ 4.3). We conducted two experimental runs, N1 and N2, detailed in § 9.3. For N1, source hosts were randomly paired with destination hosts, and we conducted a single measurement for each pairing. The drawback of this approach is that, if we want to study how an Internet path's characteristics change (or ``evolve'') over time, then random pairing results in widely-spaced measurements of individual pairs. For example, in N1 the mean sampling interval for a given pair was about two days. Consequently, we cannot analyze much finer time scales of evolution.

We addressed this difficulty in the second run, N2, by randomly pairing sources and destinations into groups of measurements. Each measurement group consisted of two subgroups. Within a subgroup, we conducted six measurements, separated by 180 sec plus exponentially-distributed intervals with means 30 sec, 60 sec, 120 sec, 240 sec, and 480 sec. (The 180 sec constant interval was required to avoid problems with reusing the fixed source and destination ports, discussed above.) These spacings allow us to analyze evolutions over short time intervals. The two subgroups were then separated by an exponentially-distributed interval with mean 2 hours, allowing us to characterize evolution over medium time intervals.
In addition, source/destination pairs would conduct additional groups of measurements separated from the previous group by another exponential interval with a mean of 12 hours. Finally, the pairs would be revisited on the order of a number of days later. These last two groups of measurements allow us to characterize relatively long time intervals, too.

9.1.2 Using TCP

Most previous end-to-end studies have used ICMP ``ping'' messages [Mi83, CPB93a] or User Datagram Protocol (UDP; [Po80]) ``echo'' messages for their network probes [Bo93]. (An exception is Mogul's study of TCP packet dynamics [Mo92].) Both have the considerable advantage of logistical ease: most Internet hosts readily reply to ``ping'' messages (though this is changing, with the advent of firewalls), and activating the UDP echo service is often a one-line configuration tweak.

However, these types of probes also incur disadvantages. The most significant of these is the rate at which the probes are sent. To probe fine time scales requires sending closely-spaced probe packets. Yet, if this is done blindly, say by deciding to send packets 1 msec apart, then depending on the mismatch between the sending rate and the capacity of the network path, the measurement traffic can grossly overload the path. Consequently, both ``production'' traffic sharing the path suffers, and the measurements are skewed by the abnormal loading. Unfortunately, there is a very wide range in network path capacity (we develop this claim in detail in Chapter 14 and Chapter 16), so there is no a priori correct choice to use for the probe spacing. Furthermore, capacity changes over the course of a series of probes, so we cannot determine a single correct choice for a path even after studying the path a bit. Therefore, ICMP- and UDP-based measurement must make a trade-off between possibly overloading the network path, and probing conservatively but with no possibility of analyzing finer time scales. In general, researchers have prudently chosen the latter.

One could devise a probing strategy based on adapting the probe transmission rate to the current network conditions. However, to do this properly, one essentially must implement TCP's congestion control. At this point, it becomes easier to just start with TCP in the first place!

Another drawback with echo-based techniques is that the echo services return a full copy of whatever packet they receive. Consequently, the measurement loads the network path both in the forward and the return direction. If the measurement is conducted using ``sender-only'' techniques (§ 9.1.3), then the reverse-path loading makes it impossible to determine which direction of the path is responsible for what proportion of the phenomena observed. If the echoes are instead small, such as are TCP acknowledgements for data packets, then the connection does not load the reverse path, which lessens the conflation of the two directions. (There is one way in which small packets can contribute to load along the reverse path similarly to large packets. If a congested router manages its buffers for queued packets on a per-packet basis, rather than allocating the number of bytes required to queue a packet out of a shared pool, then small packets consume the same amount of resource when queued at the router as do large packets. In this regard, small packets can push the congested router to the point of buffer overflow as fast as large packets do. Once, however, the small packets receive service, by transmission across their outbound link, then their contribution to the router's load immediately diminishes, since they require significantly less transmission time.)

Both of these considerations, particularly the first, argue favorably for using TCP transfers as network probes, since then, by construction, our probes do not load the network any more than does a routine file transfer. Using TCP has one other major advantage: TCP is very widely used.
Consequently, the end-to-end performance observed by TCP transfers is a much closer match to the service Internet users actually obtain from the network than are echo-based techniques. We will also see in Chapter 11 that one result of our using TCP is to uncover a large variation in how different TCP implementations perform, some with major performance and congestion implications.

Using TCP, however, also brings with it some serious drawbacks. The first of these is that the TCP protocol behavior is quite complex. When casually inspecting TCP measurements, it can be difficult to determine which facets of the overall connection behavior were due to the state of the network path, and which were due to the behavior of the TCP implementations at the endpoints. If our goal is to characterize the network path, we must be able to separate these two, which entails understanding the nitty-gritty details of how different TCP implementations realize the protocol. To do so, for our study we developed a program, tcpanaly, which has knowledge of various TCP implementations and can analyze tcpdump traces in order to separate TCP endpoint effects from those due to the network path. Writing tcpanaly was a significant undertaking, much harder than we had initially anticipated (because we had not realized the wide range of real-world TCP behaviors). We discuss it in detail in Chapter 11.

The other major drawback with using TCP is that it often sends small groups of data packets at rates exceeding the capacity of the network path (§ 9.2.5). These packets necessarily queue behind one another at the path's bottleneck. Therefore, for measuring the network's state, such a group constitutes a correlated set of probes. We address this difficulty at length in Chapter 14. Furthermore, the TCP sender adapts the rate at which it transmits data packets based on previously observed network conditions (in particular, packet loss, per § 9.2.6). Thus, even when uncorrelated, the data packets do not reflect an unbiased measurement process, but rather one that changes its sampling rate in order to try to minimize observed packet loss. We discuss this property in Chapter 15.

On the other hand, for a TCP bulk transfer, both of these problems only occur along the forward path. The traffic along the reverse path is comprised entirely of small acknowledgement packets. These in general do not necessarily queue behind one another at the bottleneck, and, furthermore, their transmission rate is adapted not to conditions along the reverse path, which they observe, but to conditions along the forward path. We show in Chapter 15 that these conditions are generally uncorrelated. Thus, the ``ack stream'' along the reverse path reflects a much cleaner measurement process.
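To illustrate what it means for closely-spaced data packets to form a correlated set of probes, the following sketch flags, for a sequence of packet departure times and sizes, those packets that must have queued behind a predecessor at the path's bottleneck. It is a deliberately simplified model (a single FIFO bottleneck of known bandwidth, with the constant propagation delay in front of it ignored since it does not affect who queues behind whom), with names of our own choosing; it is not the algorithm used by tcpanaly, and Chapter 14 addresses the real problem of estimating the bottleneck rate in the first place.

    def correlated_probes(send_times, sizes, rho_B):
        """Flag packets that necessarily queue behind a predecessor at a
        FIFO bottleneck of bandwidth rho_B (bytes/sec)."""
        flags = []
        bottleneck_free_at = float("-inf")   # when the bottleneck finishes
                                             # serving the previous packet
        for t, size in zip(send_times, sizes):
            queued = t < bottleneck_free_at          # arrives before it is free
            service_start = max(t, bottleneck_free_at)
            bottleneck_free_at = service_start + size / rho_B
            flags.append(queued)
        return flags

    # Example: 512-byte packets sent 1 msec apart through a 64 Kbyte/sec
    # bottleneck (8 msec of transmission time each): all but the first queue.
    print(correlated_probes([0.000, 0.001, 0.002, 0.003], [512] * 4, 64000))
    # -> [False, True, True, True]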
In summary, by using TCP transfers, we get two basic types of measurements: those that correspond to conditions that TCP data packets encounter (the forward path), and those that tell us about general Internet path properties (the reverse path ack stream). The combination makes for rich analysis.

9.1.3 Tracing at both sender and receiver

End-to-end measurement is often done using what we term ``sender-based'' or ``sender-only'' measurement, meaning that probes and their replies are recorded only at the location of the probe sender. Sender-based measurement has the enormous logistical advantage of not requiring access to the remote site in order to instrument the probe arrivals. Such access can be difficult to gain, for administrative and security reasons.

On the other hand, sender-based measurement carries with it the limitation that from it one can say little about how traffic behaves along the path's two different directions. For example, suppose a measurement consists of sending a flight of 20 ICMP ``ping'' packets from A to B, and timing at A the arrival of their echoes. If only 6 echoes return, we have no way of knowing whether B never sent the 14 others, because their corresponding pings never arrived at B; or whether B did send them, but they were lost on their journey from B back to A; or whether some combination of loss from A to B and loss from B to A occurred. Consequently, it is difficult to say anything concrete about the nature of the loss event.

This consideration becomes more subtle, but equally important, when applied to analyzing packet delay. A sender-based scheme can only observe round-trip time (RTT) delays. These are perforce the sum of the one-way transit time (OTT) delays in the two directions, plus the (unobserved) delay of the receiver generating its reply. If the goal of the timing measurement is to estimate capacity along the forward path, such as for TCP Vegas [BOP94], then any delay variations incurred on the return path are pure noise, and at best dilute the precision with which the sender can estimate the path capacity.

Because we traced our transfers at both the sender and the receiver, we can fully separate effects due to the forward path, the reverse path, and the processing delays at both the sender and the receiver. Throughout our study we examine issues of path symmetry with an eye to gauging the effectiveness of sender-only measurement. We find, overall, that such measurement is significantly less accurate than receiver-based measurement. Consequently, it behooves us to consider mechanisms for coordinating measurement between sender and receiver.

9.1.4 Analysis strategies

In this section we discuss the principles underlying our analysis of the measurement data. They are all in response to three dominant considerations. The first is that we gathered a very large volume of data: more than 20,000 transfers recorded at both sender and receiver. Each transfer consisted of 100--400 packets, resulting in well over a gigabyte of data. The second consideration is that we lack separate means of calibrating the measurements. All we have to work with are the packet traces. It is easy to assume that such traces accurately reflect the true number and timing of the packets comprising the traffic we wish to measure, but no large-scale study has been made to test the overall integrity of packet traces, so the validity of this assumption is unproven.
The third consideration is that network behavior almost inevitably includes ``noise'' in a variety of forms and on a variety of scales. We observe ``extreme'' behavior much more often than we might expect using a traditional statistical framework (such as one based on assumptions of normality and tame correlations).

That we must deal with a large volume of data lies at the heart of our study: the study is interesting precisely because the volume of data is large. By (very careful) analysis of it, we have a hope of capturing a useful description of the immensely diverse behavior of the huge, heterogeneous network that is the Internet. We further argue that future Internet traffic studies must likewise measure on a large scale, otherwise we have little hope of divining from them general results. Thus, a central contribution of our work is the set of approaches we develop to deal with this large, uncalibrated, noisy mass of measurements. In addition, in the hopes of abetting future studies, we will make our TCP data publicly available via the Internet Traffic Archive, sited at:

http://www.acm.org/sigcomm/ITA

(At the time of this writing, the Archive is moving from its old location to this URL. If the reader has any difficulty accessing the Archive, send email to vern@ee.lbl.gov.) The routing data analyzed in Part I is already available in the Archive, under the name NPD-Routes.

Automated analysis

Confronted with 20,000 traces to analyze, it is clear that we cannot hope to individually analyze each trace. We must instead turn to automated analysis. That is, we realize part of our analysis in terms of a computer program that has coded into it the different reductions and calculations required by the analysis. We briefly mentioned this program, tcpanaly, above. One of its basic tasks is to separate TCP endpoint behavior from network behavior, hence its name. Another is to then characterize the network dynamics reflected in the trace of the connection.

tcpanaly undertakes what we might call ``micro-analysis.'' It is limited in its scope to analyzing single connections. The ``macro-analysis,'' namely the sifting through the individual micro-analyses in search of unifying observations and themes (much in the sense of ``scientific inference,'' as discussed in [Cha95]), is then done manually. (We used the S statistical environment [BCW88] for the macro-analysis.) Both forms of analysis are highly iterative processes, and each gives insight into the other by identifying patterns that merit further investigation.

Self-consistency checks

To address the second problem---lack of separate calibration---we must turn to ``self-calibration'' in the form of self-consistency checks: testing, to as great a degree as possible, for any ways in which different aspects of the data contradict one another. Calibration is all about detecting error, whether introduced by the measurement process, or by the subsequent analysis. Ideally, all of the effort is for naught; the data and analysis are wholly free of error. Consequently, it can sometimes be tempting to skip calibration or treat it lightly, since it only provides negative results. Doing so, however, undermines the entire validity of the measurement process. Furthermore, our experience in conducting both this study and several other large-scale studies [Pa94a, Pa94b, PF95] is that, when the scale becomes sufficiently large, errors are inevitable, since even rarely observed problems have sufficient opportunity to manifest themselves.
Thus, we discuss self-consistency checks throughout our study. (For example, Chapter 12 is almost entirely about developing self-consistency checks for calibrating the timing measurements recorded in our traces.) The degree to which these checks prove persuasive is the degree to which one might accept our findings as well-grounded.

Robust statistics

The final problem we must address with our analysis strategies is that of widespread noise. For example, if we wish to summarize a connection's round trip times (RTTs), we might at first think to express them in terms of their sample mean and variance (or standard deviation, the square root of variance). However, in practice we find that often a connection observes one or two RTTs that are much higher than the remainder. These extreme values greatly skew the sample mean and variance, so that the resulting summaries do not accurately reflect ``typical'' behavior.

To address these sorts of problems, statisticians have developed the field of robust statistics [HMT83]. These are statistics that remain resilient in the presence of extremes, or ``outliers.'' One example is use of the median, or 50th percentile, as a statistic for summarizing a distribution's central location, rather than the mean. Unlike the mean, the median is virtually unaffected by the presence of outliers. In our study, we make heavy use of medians as robust estimates of central location. To compute a median of $n$ points $x_1, \ldots, x_n$, we sort the points to obtain $x_{(1)}, \ldots, x_{(n)}$, and then use:
$$\textrm{median}(x_i) = x_{\left(\frac{n+1}{2}\right)},$$
if $n$ is odd, or:
$$\textrm{median}(x_i) = \frac{1}{2}\left(x_{\left(\frac{n}{2}\right)} + x_{\left(\frac{n}{2}+1\right)}\right),$$
if $n$ is even.

A robust statistic for measuring variation is the interquartile range, or IQR [Ri95]. The IQR is the difference between a distribution's 75th percentile and its 25th percentile. Thus, it characterizes the distribution's ``central variation.'' It is likewise virtually unaffected by the presence of outliers, since these by definition fall outside of the range of the values used to compute the IQR. We likewise in our study often make use of IQR rather than standard deviation.

One other technique we borrow from robust statistics is that of fitting a line to a series of ⟨x, y⟩ points. Techniques such as least-squares can be heavily skewed by trying to minimize the distance between the fitted line and any outliers. The technique we use, taken from [HMT83], is to first estimate the slope of the line as the median of all of the pairwise slopes between the different points, and then estimate the intercept as the median of the offset of the y coordinates from a line with the given slope and zero-intercept.
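The following Python sketch shows the three robust statistics just described: the median, the IQR, and the line-fitting procedure from [HMT83] (slope taken as the median of all pairwise slopes, intercept as the median offset of the y values from a zero-intercept line with that slope). The code and its names are ours and purely illustrative; in particular, the IQR here uses a deliberately simple percentile rule.

    def median(xs):
        s = sorted(xs)
        n = len(s)
        if n % 2 == 1:
            return s[n // 2]
        return 0.5 * (s[n // 2 - 1] + s[n // 2])

    def iqr(xs):
        """Difference between the 75th and 25th percentiles (simple rule)."""
        s = sorted(xs)
        n = len(s)
        return s[int(0.75 * (n - 1))] - s[int(0.25 * (n - 1))]

    def robust_line_fit(xs, ys):
        """Slope = median of pairwise slopes; intercept = median offset of the
        y values from a zero-intercept line with that slope."""
        slopes = [(ys[j] - ys[i]) / (xs[j] - xs[i])
                  for i in range(len(xs)) for j in range(i + 1, len(xs))
                  if xs[j] != xs[i]]
        slope = median(slopes)
        intercept = median([y - slope * x for x, y in zip(xs, ys)])
        return slope, intercept

    # A single large outlier barely perturbs any of these statistics:
    print(median([1, 2, 3, 4, 1000]), iqr([1, 2, 3, 4, 1000]))   # 3 and 2
    xs = [0, 1, 2, 3, 4, 5]
    ys = [0.1, 1.0, 2.1, 2.9, 400.0, 5.0]    # roughly y = x, one wild point
    print(robust_line_fit(xs, ys))           # slope close to 1, intercept near 0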
9.2 An overview of TCP

In this section we give an overview of how the Internet's TCP protocol works. We make numerous references to its operation in subsequent chapters. Our presentation is not exhaustive, but confined to those aspects of TCP relevant to our later discussion.

The main protocol used in the Internet for reliable data delivery is the Transmission Control Protocol, or TCP. TCP is specified in [Po81c], with updates and clarifications given by [Br89], as well as several other documents specifying optional extensions [BJ88, BBJ92, Br94, MMFR96]. Stevens gives an excellent, detailed description of how TCP works [St94], and [WS95] analyzes an entire TCP implementation line-by-line. (Both books also discuss other Internet protocols in depth.) TCP is implemented on top of the Internet Protocol, or IP, described in [Po81a]. The combination is often referred to as ``TCP/IP.''

9.2.1 Data delivery goals

TCP is a complex protocol, since it was designed to accomplish a number of objectives:

• In-order delivery, meaning that data is presented to the receiving application in the same sequence as transmitted by the sending application.

• A byte-stream model, in which the sender and receiver view the data simply as a series of bytes, with no apparent boundary points (such as those introduced by packetization).

• Reliable data delivery, meaning that all of the data transmitted ultimately arrives at the receiver with its original contents (i.e., undamaged).

Accomplishing these objectives in an environment where packets can be delayed, dropped, reordered, duplicated, or corrupted is quite challenging. TCP achieves in-order, byte-stream data delivery by assigning each byte of data a sequence number, corresponding to its offset from the beginning of the byte stream. It does so efficiently by associating with each data packet a beginning sequence number (i.e., the sequence number of the first byte in the payload) and a length, which then gives the packet's upper sequence number. In subsequent discussion, we will adopt the convention of using upper sequence numbers to distinguish between different data packets. When this identification is not unique, we will also give the time at which the packet was sent or received, to disambiguate.

TCP achieves reliability by having the data receiver return acknowledgements, or ``acks,'' to the data sender. (It also uses a 16-bit checksum to verify data integrity, a point we return to in § 11.4.2.) Each ack includes an acknowledged sequence number, which indicates all of the in-order data that the receiver has successfully received. For example, if data packets with sequence numbers 1, 2, 3, 5, and 6 arrive at the receiver, then it can acknowledge up to sequence number 3. It cannot acknowledge 5 or 6, since they are not (yet) in-order. When the receiver subsequently receives sequence number 4, then it can acknowledge all the way up to 6. Such acks are termed ``cumulative,'' since receipt of any ack serves to acknowledge all of the data correctly received so far. [MMFR96] describes a TCP extension for ``selective acks'' (SACKs), which allow more detailed feedback of exactly which out-of-order packets have arrived at the receiver so far. In Chapters 13 and 15 we study some aspects of the efficacy of this extension, finding that it has considerable merits.

If a TCP sender does not eventually receive an ack for data it has sent, then it concludes that the data packet was lost (``dropped'' or ``discarded'') during its journey through the network, and it retransmits the data in a new packet. Such a retransmission is termed a ``timeout retransmission,'' because it occurs when a timer expires indicating that enough time has elapsed that the packet was presumably lost, since an ack should have been received by now. The amount of time to wait before retransmitting is termed the retransmission timeout (RTO). Choosing a good value for RTO is a major problem, which we discuss in more detail below. We discuss another form of retransmission in § 9.2.7.

9.2.2 Achieving high performance

Achieving these objectives would be considerably simpler if TCP did not have another goal, namely performance.
Without performance considerations, one can achieve in-order, reliable byte-stream delivery by simply sending one packet at a time until the receiver acknowledges it, and then advancing to the next packet (``stop-and-go''). Stop-and-go can be tremendously inefficient in terms of the performance achieved. If packets are $b$ bytes and the round-trip time (RTT; the interval between when a packet is sent and when the corresponding acknowledgement arrives) is $\Delta T$ seconds, then even if the network path is completely unloaded and does not suffer from any undue loss or delay, the maximum achieved throughput is:
$$\rho = \frac{b}{\Delta T}. \qquad (9.1)$$
A typical value for $b$ is 512 bytes, and a typical cross-country path in the U.S. has $\Delta T \approx 100$ msec, so $\rho$ = 5,120 bytes/sec, even though the path might be capable of transferring megabytes per second.

TCP addresses performance issues in several ways. First, it sends packets that are as large as possible. Each Internet path has a Maximum Transmission Unit (MTU), which is the largest IP packet that can be transmitted along the path without incurring potentially expensive ``fragmentation'' into smaller packets. An end-to-end path's MTU is the minimum of the MTUs of the various links that comprise the path. When a connection is established between two TCP endpoints, they negotiate a Maximum Segment Size (MSS), which is the largest amount of data each TCP is prepared to receive in a single packet transmitted to it by the other TCP. In general, the MSS is less than the MTU, since the MTU must also include the overhead associated with each packet, namely its protocol header information. (Some TCPs confuse MSS and MTU, as described in § 11.5.4.) Given these considerations, TCP implementations strive to transmit ``full-sized'' data packets, meaning those that carry MSS bytes of user data. They cannot always do so, if the sending application has not provided them with enough data to completely fill a data packet. For our bulk-transfer connections, however, this is generally not a problem, and the TCPs usually sent full-sized data packets.

Using full-sized data packets helps increase $b$ in Eqn 9.1, but never beyond MSS. Generally, MSS values are on the order of 512 bytes or sometimes 1460 bytes or 4 Kbytes, so this increase alone does not suffice for achieving good performance along a high-speed path. The much larger performance gain comes from having multiple packets in flight at one time. If a TCP has $k$ packets in flight, then the potential throughput is:
$$\rho = \frac{kb}{\Delta T},$$
which can in principle match any available path speed (``bandwidth'') by using a suitably large $k$. The problem then becomes how to choose $k$. There are two separate concerns: how fast the receiver can accept data, and how fast the network can accept data. The first problem is referred to as ``flow control,'' and the second as ``congestion control.''

TCP addresses flow control by including in the receiver's acks an ``offered window'' (also referred to as the ``advertised window,'' ``receiver window,'' and, in some contexts, simply the ``window''). The offered window specifies how much new data the receiver promises to accept from the sender. It reflects the buffer available at the receiver, which is used to absorb discrepancies between the rate at which new data arrives at the receiver, and the rate at which the receiving application consumes that data from the receiving TCP.
When the available buffer changes, the receiving TCP may send ``window updates,'' which are acknowledgements with revised values for the offered window. The offered window is expressed in terms of a ``credit'' beyond the packet acknowledged by the ack.

For example, suppose the MSS is 512 bytes and the receiver has 4,096 bytes of buffer available. If data packets with sequence numbers 512, 1024, 2048, and 2560 arrive (where we use the conventions that the sequence number refers to the upper sequence number carried in the packet, and that data packets are always full-sized), then the receiver can acknowledge up to 1024, since the first two packets arrived in sequence. It cannot acknowledge 2048 or 2560 because they arrived ``above sequence.'' So far, the receiving application has not consumed any of the data, even though it could read the first 1,024 bytes if it wished. So the receiving TCP needs to hold the data from all four of the packets in its buffer. Because it has 4,096 bytes of buffer, it can accommodate additional data up to sequence 4096, so in the ack it includes an offered window of 3,072 bytes, even though the above-sequence data it is holding leaves it with only 2,048 bytes of uncommitted buffer. This works because the sender can only use the 3,072 byte credit as a window beyond the acknowledgement point (``ack point''). So it can only transmit 2,048 bytes' worth of data not already buffered, namely those corresponding to sequence numbers 1536, 3072, 3584, and 4096.

Suppose now the receiving application reads the 1,024 bytes that have been acknowledged. (It cannot yet read the data in the later packets, since they are presently ``above sequence,'' so they cannot be read in-sequence yet.) Now the receiving TCP no longer needs to buffer the first two packets, so it can accommodate an additional 1,024 bytes from the sender. It may at this point send another acknowledgement for sequence 1024, but this time with an offered window of 4,096, allowing the sender to transmit all the way up to sequence 1024 + 4096 = 5120. When sequence 1536 arrives, then up to sequence 2560 may be ack'd, since the data up to there can now be delivered in-sequence. When the sender receives the ack of 2560, its window, meaning the range of data it can now send, ``advances.'' As part of this advance, the ``upper edge'' of the window, meaning the largest sequence number the sender can transmit (equal to the ack point plus the offered window), ``slides'' to a new maximum. Consequently, transport protocols using this form of flow control are termed ``sliding window'' protocols.

In this fashion, the receiving TCP can (if it wishes) assure that it is always able to accommodate data arriving from the sender. If, for example, the receiving application ceases to consume data, then eventually the TCP's buffer will fill. When it does, the TCP will advertise a window of 0 bytes, requiring the sender to cease transmission.
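The receiver-side bookkeeping in the example above can be made concrete with a short sketch. The code below tracks the cumulative ack point and the offered window for full-sized 512-byte segments and a fixed 4,096-byte receive buffer; it is a simplification for illustration (the class and method names are ours, and real details such as window-update timing are omitted), not a model of any particular TCP implementation.

    MSS = 512
    BUFFER = 4096     # receiver's buffer, in bytes

    class Receiver:
        def __init__(self):
            self.ack_point = 0        # highest in-order sequence received
            self.consumed = 0         # bytes the receiving application has read
            self.above_seq = set()    # buffered "above sequence" segments

        def receive(self, upper_seq):
            if upper_seq > self.ack_point:
                self.above_seq.add(upper_seq)
            # advance the ack point over any data that is now in order
            while self.ack_point + MSS in self.above_seq:
                self.above_seq.remove(self.ack_point + MSS)
                self.ack_point += MSS
            return self.ack()

        def read(self, nbytes):
            # the application can only consume data that is in order
            self.consumed = min(self.consumed + nbytes, self.ack_point)
            return self.ack()

        def ack(self):
            # the credit runs from the ack point to the end of the buffer,
            # i.e., up to sequence number consumed + BUFFER
            offered_window = self.consumed + BUFFER - self.ack_point
            return self.ack_point, offered_window

    r = Receiver()
    for seq in (512, 1024, 2048, 2560):
        ack, win = r.receive(seq)
    print(ack, win)          # 1024 3072  (upper edge: 1024 + 3072 = 4096)
    print(r.read(1024))      # (1024, 4096): upper edge slides to 5120
    print(r.receive(1536))   # (2560, 2560): 1536 fills the hole up to 2560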
9.2.3 Congestion control

Quite separate from flow control is the vital performance issue of congestion control. The limitation on how fast the sender should transmit may derive not from limited buffer at the receiver, but from limited capacity inside the network. Originally, TCP dealt with congestion control by setting the RTO (retransmission timeout) to a multiple of the estimated mean RTT (round-trip time). When the RTO expired, unacknowledged packets were retransmitted, and the RTO was doubled (``exponential backoff''), so that during periods of high congestion, the connection would progressively lower its sending rate.

In a landmark paper, Jacobson described the shortcomings of this form of congestion control: in particular, its excessive consumption of resources due to retransmitting multiple packets at one time, and the instability that occurs because it does so precisely when the network has been overloaded to the point of packet loss. He also identified inadequacies in the RTO algorithm, which used only the estimated mean RTT, without including an estimated RTT variance [Ja88]. He addressed these problems by introducing a second window, the congestion window (or cwnd), and a modified RTO algorithm that includes the estimated RTT variance, both of which have been incorporated into the TCP specification [Br89, St97]. It is no exaggeration to say that the Internet works today only because of these changes. Without them, the network would inevitably devolve into ``congestion collapse'' (discussed below). Thus, proper TCP congestion control is vital to the Internet's stability, a point we return to in Chapter 11, where we find that some TCP implementations fail to follow these requirements.

We will focus on the first of Jacobson's changes, the congestion window. cwnd is completely separate from the receiver's offered window. At any time, a TCP sender must not send beyond the minimum of the two windows. The offered window governs how much in-flight data the receiver's buffer can accommodate, and cwnd governs how much the buffers along the network path can accommodate. The networking infrastructure, however, does not provide this latter information explicitly (nor can it, for scalability reasons). Jacobson's insightful observation was that the network path does, however, provide an implicit signal that its buffer resources are scarce: namely, it drops packets. Thus, packet loss is interpreted as a sign of congestion (also observed by Jain in [Jai89]). Such losses are termed ``congestive losses.'' While packet loss can occur for other reasons, the presumption is that most losses occur due to congestion, and so merit a response by the sending TCP: diminishing the rate at which it transmits packets. It does so by reducing cwnd.

[Figure 9.1: Sequence plot of a TCP connection during its ``slow start'' phase. X axis: time, 0.0-0.4 sec; y axis: sequence number, 0-6000.]

9.2.4 Slow start

Jacobson discussed two different issues in managing cwnd. The first is what value to use for it initially, which we address in this section. The second is how it should be cut upon detecting loss, to adapt to congestion, which we address in § 9.2.6. His scheme addresses the first issue by initializing cwnd to one packet (more precisely, to MSS bytes), so connections begin by transmitting just one packet and waiting for an acknowledgement. Each ack that arrives then increases cwnd by one packet (again, actually by MSS bytes). Thus, if the receiving TCP sends an ack for every in-sequence packet it receives, then in the absence of loss the congestion window will be 1 packet, 2 packets, 4 packets, 8 packets, and so on, where each increase reflects cwnd after the packets in the previous ``flight'' have been acknowledged. (We use the term ``flight'' to refer to a set of packets transmitted within a single RTT's worth of time.) Thus, in the absence of loss, cwnd increases exponentially quickly. It continues to do so until either it is limited by the receiver's offered window; or the connection suffers a loss, indicating that a network-imposed congestive limit has been reached; or the connection completes before either of these occur.
This form of window increase is called ``slow start,'' since the window starts at a small value, and hence the TCP transmits slowly at first. Figure 9.1 shows a ``sequence plot'' of the packets sent and received by a TCP sender during its slow-start phase. We will make extensive use of such plots and so describe them here in detail. The x-axis gives time since the connection was established. The y-axis gives sequence numbers: these are either upper sequence numbers for data packets (shown as solid squares), or acknowledged sequence numbers for acks (hollow squares). Sequence plots are highly informative illustrations of what happens during a connection.

Here, the solid square at T = 0 sec with sequence number 1 corresponds to the ``initial SYN'' packet. Each connection begins with the originator transmitting a packet with the ``SYN'' flag set in the header to request establishment of a connection (``SYN'' is short for ``synchronize sequence numbers''). If the connection request is accepted, then the responder replies with a SYN acknowledgement (``SYN-ack'') packet. If the sequence numbers in the SYN-ack accord with those that the originator sent, then the sender acknowledges the SYN-ack and the connection has been established. Because establishment entails exchanging three packets---the initial SYN, the SYN-ack, and the final ack of the SYN-ack---it is referred to as a ``three-way'' handshake. (TCP uses a three-way handshake for reliability concerns that we will not describe further here; see [St94] for a detailed discussion.) TCP terminates connections in a similar fashion, using an exchange of ``FIN'' (``finish'') packets and a final ack, for another three-way handshake.

The initial SYN carries a sequence number of 1 because the SYN flag conceptually occupies the first sequence position of the byte stream. At about T = 0.07 sec, the plot shows the arrival of an acknowledgement for sequence number 1. This is the SYN-ack packet. Shortly after, the sender begins transmitting. It sends a single packet carrying 512 bytes (and with sequence number 513), because cwnd has been set to one packet due to slow start. This data packet also carries the ack for the SYN-ack packet, and hence completes the three-way handshake. When 513 is ack'd at T = 0.17 sec, the congestion window opens to two packets, and these are promptly sent (sequence numbers 1025 and 1537). Both of these are acknowledged by a single ack at T = 0.23 sec, which opens cwnd by an additional packet, and three new packets are sent. (One might expect that cwnd would open by two additional packets, since the received ack acknowledges receipt of that many packets. However, the TCP standard governing congestion window management [St97] specifies that, during slow start, each ack for new data increases the window by one packet, regardless of how much new data has been ack'd.)

At T = 0.30 sec, an ack arrives for the first two of the three packets in this flight. It opens the window to four packets. Since one of these four is already in flight, the TCP only transmits the three new ones. At T = 0.36 sec an ack arrives for the third packet of the earlier flight (sequence 3073). This is a delayed ack, one that the receiver momentarily refrained from sending in hope that more data would arrive and it could ack two packets at once. (The receiver employs an ``ack-every-other'' policy for sending its acknowledgements, as do many TCPs.) Very shortly after this ack arrives, so does another one, for sequence 4097, corresponding to the first two packets of the most recent flight. Each of these acks advances cwnd by one packet, so after both arrive, cwnd is 6 packets. One of these is already in flight (and not yet acknowledged), so the TCP sends the other five. Note that we can read the RTT directly from the plot: it is the x-axis distance between the transmission of a packet and its acknowledgement, in this case about 70 msec.

The sending TCP continues to open up cwnd until loss occurs.
If cwnd reaches the point where it exceeds the size of the offered window, then the connection becomes receiver-window limited. Figure 9.2 shows a sequence plot of the same connection later in its transmission, when this has occurred. We have added circles to the plot indicating the upper ``edge'' of the window, that is, the sum of the offered window and the sequence number acknowledged by the ack (the ``ack point''). We see that the sending TCP closely tracks the upper edge, sending packets up to that limit every time the edge advances.

[Figure 9.2: Sequence plot of a ``window-limited'' TCP connection. X axis: time, 0.75-1.00 sec; y axis: sequence number, 28000-40000.]

9.2.5 Self-clocking

Another effect shown in Figure 9.2 is the important phenomenon of self-clocking. In the figure, each flight of data packets elicits in response an ack ``echo'' that preserves the temporal structure of the flight. When the flight of acks arrives at the sender, the window advances in step with the echo, because the receiving application is consuming the data as fast as it arrives, and hence the offered window remains constant---4,096 bytes, in this case. Because the connection is receiver-window limited, the sending TCP then transmits new data whose temporal structure reflects that of the window advances, and thus ultimately reflects that of the previous flight of data packets, so the cycle continues.

The term ``self-clocking'' is also due to Jacobson. It comes from the observation that a window-limited TCP connection will over time naturally pace out its data packets to exactly match the bandwidth available along the network path. Figure 9.3, reproduced from [Ja88], illustrates this property. The top ``pipe'' represents the network path from the sender to the receiver, and the bottom pipe that from the receiver back to the sender. The thickness of each component in a pipe reflects the bandwidth available at that part of the pipe, and horizontal distances correspond to differences in time. Each packet occupies a portion of the pipe, shown as a shaded region. The width of these regions indicates how long it takes the packet to traverse that portion of the pipe, and the height reflects that the packet consumes the region's available bandwidth during the traversal. In the figure, the sender has sent a number of packets back-to-back into the local, high-speed end of the path. These packets travel through the network closely spaced, until they reach the path's bottleneck (thin central region), where the available bandwidth sharply diminishes. At this point, it takes much more time to transmit each packet, so they spread out in time.

[Figure 9.3: TCP ``self-clocking.'' Pipe diagram marking the packet spacings P_b and P_r and the ack spacings A_b, A_r, and A_s along the forward and reverse paths. Reprinted with permission from [Ja88], copyright © 1988 Association for Computing Machinery.]

The key observation underlying self-clocking is that once the packets have been spaced out to a distance P_b by the bottleneck, they remain spaced out. That is, P_r in the figure is equal to P_b. There is no mechanism for subsequently recovering their initial spacing (to first order; see § 16.3).
Furthermore, such recovery is not desirable: the distance P_b is in fact the optimal spacing for the connection's packets. If they are transmitted any closer together, they will simply have to wait in a queue at the bottleneck anyway, because it cannot accommodate a faster rate. So we want packets ideally to be transmitted with a distance P_b between them. Any less and they cause queueing without any gain in performance. Any more and the connection underutilizes the available bandwidth.

The very nice property of window-based flow control is that, as the data packets arrive at the receiver with a spacing of P_r = P_b between them, the receiver generates acks for them and these also have a spacing A_r = P_r = P_b between them. Furthermore, the acks are small and are not spaced out by the bottleneck along the return path, even if it is smaller than that along the forward path. Consequently, the acks arrive at the sender with a spacing A_s between them, and because the timing has not been perturbed, A_s = P_b. Finally, these acks then advance the window, and the sender transmits new packets in response to them. The timing of the new packets, however, reflects that of the acks, and hence they have a spacing of P_b, just as we desire. Thus, the connection settles into a state in which it ``clocks out'' new packets at exactly the proper rate for the available bandwidth.

Receivers that employ ack-every-other policies, such as that shown in Figure 9.2, perturb self-clocking in only a minor fashion. Instead of generating acks with a spacing A_r = P_b between them, they generate acks with a spacing of A_r = 2 P_b. Consequently, in the absence of extra delays along the return path, they arrive at the sender with that same spacing. Note, however, that these acks advance the window by two packets each instead of one packet, so the sender then transmits two packets that are spaced a distance P_r = 2 P_b apart from the previous group of two packets. Thus, over intervals of 2 P_b, the connection transmits at exactly the bottleneck rate. On finer time scales, it can transmit faster, but the excess is only one additional packet, so very modest buffer space at the bottleneck can accommodate the burst. In § 11.6.1 we will see other acking policies that involve acking large numbers of packets with single acks. These can lead to highly bursty arrivals at the bottleneck.

Self-clocking is an idealized state. In practice, connections might not self-clock due to delay variations along the network paths in either direction, discussed in depth in Chapter 16. One particular form of delay variation that defeats self-clocking is timing compression, which we discuss in § 16.3. Connections also do not self-clock when in the slow-start phase, since arriving acks do not simply advance the window, but widen it also. Consequently, the average spacing between the packets transmitted by the sender during slow start will be less than P_b. When the network is unloaded, this behavior is not only acceptable but desirable, because our goal is to continually send packets with intervals of P_b between them (``filling the pipe'').
If we can accomplish this ideal spacing, then the connection sustains transmission at the full available bandwidth, and we have achieved the maximum performance possible along the given path. However, a TCP sender does not know in advance the proper value of P_b (and the proper value might change over the course of a connection). Slow start is a mechanism for hunting for the correct spacing, by continually opening up the window until the connection finds itself in the self-clocking regime corresponding to the currently available bandwidth.

The difficulty with this hunt is knowing when to stop. TCP currently determines the stopping point when it has driven the network to the point of packet loss (see below). But this point corresponds to having exceeded the available bandwidth by a factor proportional to the available buffer space, too, since Internet routers today only drop packets when their buffers are exhausted. Thus, beginning a connection using slow start will often drive the network to the point of loss, which is excessive: we instead want connections to drive the network only to the limit of available throughput. There are proposals for modifying either TCP [WC91, WC92, BOP94] or the drop policy used by routers [RJ90, FJ93] so that connections can find the available bandwidth without unduly stressing the network. We comment on both approaches as we analyze our measurements. The TCP modifications are of particular interest for our study because they rely on accurate packet timing information, which we will find can be elusive.

In Figure 9.2, the connection fails to fill the pipe because it is receiver-window limited. In general, to fill the pipe requires that the window size in bytes, w, exceed the ``bandwidth-delay product,'' i.e.:

    w ≥ ρ_A · RTT,    (9.2)

where ρ_A is the available bandwidth in bytes per second, and RTT is the round-trip time in seconds. We develop this relationship in detail in § 16.1. While estimating RTT is not difficult, estimating ρ_A is, so connections cannot readily use Eqn 9.2 to determine their correct window size. In Chapter 14 we discuss ways of estimating the bottleneck bandwidth, ρ_B, which is an upper bound on ρ_A, and in § 16.5 we look at the relationship between the two.

9.2.6 Responding to congestion

The other fundamental component of Jacobson's modifications to TCP concerns how TCP reacts to congestion, i.e., periods when some element of the end-to-end chain of routers and links is under stress: a sustained interval during which packets arrive more quickly than the element can service them. During congestion, the unserviced packets are queued at the congested router until they can be serviced. If the congestion lasts long enough, the queues build and build until eventually the queued packets exhaust the router's buffer. At this point, the router must discard incoming packets. Note that congestion spans a spectrum between busy periods during which queues grow large, and periods when no more buffer remains and packets are lost. Thus, packets may or may not be lost during congestion periods, depending on the sizes of the buffers and the duration of the congestion. (We examine the interplay between delay and loss in § 16.2.4.)

Congestion is potentially lethal to a network because it can lead to positive feedback that sustains and even magnifies the congestion.
In particular, if packet loss leads to retransmissions that are sent at the same rate as the original packets, then the load borne by the network will not diminish, and the congestion sustains itself. Packets from newly-initiated connections add further to the load, leading to even higher levels of congestion. The positive feedback can thus bring the network to a state of congestion collapse, in which the network load stays extremely high but throughput is reduced to close to zero [Na84]. Exactly this happened in the early days of the Internet, and led to Jacobson's work on TCP congestion avoidance [Ja88].

As discussed above, one of the key insights of that work is that the network provides an implicit signal of congestion in the form of packet loss. Barring loss due to causes such as transmission noise on a network link, the network should only discard packets if it no longer has enough buffers to carry them. Consequently, when a TCP sender observes a packet loss, it should infer that the network path is congested, and ease its use of the path by cutting cwnd (and hence limiting its transmission rate).

It does so as follows. First, upon retransmitting, cwnd is set to one packet, so the connection begins a ``slow start'' phase in order to hunt for the correct value of the available bandwidth again. Second, the TCP state variable ssthresh (``slow start threshold'') is set to half of the window in effect at the time of the retransmission (i.e., the smaller of either the offered window, or the value of cwnd prior to setting it down to one packet). The intent behind ssthresh is to denote the window size beyond which it is likely that there is no more available bandwidth. The sending TCP should only gingerly expand cwnd beyond ssthresh. As acks arrive for packets now transmitted by the sender, each increases cwnd by one packet, per the usual slow-start increase. Once cwnd reaches ssthresh, however, the TCP increases cwnd by only one packet per RTT. Thus, the rate at which cwnd increases changes from exponential during the slow-start phase to linear during the ``congestion avoidance'' phase.

Figure 9.4 illustrates how a TCP timeout retransmission appears on a sequence plot. At T = 2.3 sec, data up to 24577 has been acknowledged, and eight more packets are in flight, which equals the offered window. A little later two more acks for 24577 arrive (``duplicates,'' as discussed in § 9.2.7 below). However, no additional acks are forthcoming. At T = 3.4 sec, the RTO expires and the sending TCP retransmits the first unacknowledged packet, 25089. At this point, cwnd has been set to 1 packet (which is why only one packet is retransmitted), and ssthresh has been set to 4 packets, half of the window in effect at the time of the retransmission. The retransmitted packet elicits an ack for 28673, corresponding to all of the outstanding data. This indicates that only 25089 was dropped by the network---all of the later packets arrived at the receiver, so a retransmission of 25089 was all that was needed to fill the sequence ``hole.''

[Figure 9.4: Sequence plot showing a TCP timeout retransmission]

The ack for 28673 both advances the window edge and enlarges cwnd to 2 packets, so the sender now transmits two new packets. As acks arrive, the sender continues rapidly increasing cwnd in slow-start fashion.
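The cwnd and ssthresh bookkeeping just described can be summarized in a short sketch. The following Python fragment is a minimal illustration, in packets rather than bytes, of the loss response and the per-ack window growth; the names and granularity are our own simplification and do not correspond to any particular implementation in our study.

    # Minimal sketch, in packets rather than bytes, of the loss response
    # and per-ack window growth described above.

    class TcpWindow:
        def __init__(self, offered_window):
            self.offered = offered_window   # receiver's offered window (packets)
            self.cwnd = 1.0                 # congestion window (packets)
            self.ssthresh = offered_window  # no threshold in effect initially

        def usable_window(self):
            # A TCP may have at most this many packets in flight.
            return min(self.cwnd, self.offered)

        def on_timeout_retransmission(self):
            # ssthresh <- half the window in effect; cwnd back to one packet,
            # so the connection re-enters slow start.
            self.ssthresh = self.usable_window() / 2
            self.cwnd = 1.0

        def on_ack_for_new_data(self):
            if self.cwnd < self.ssthresh:
                self.cwnd += 1              # slow start: +1 packet per ack
            else:
                self.cwnd += 1 / self.cwnd  # congestion avoidance: ~+1 packet per RTT

Whether the slow-start test is written as cwnd < ssthresh or cwnd ≤ ssthresh differs among implementations, a distinction that shows up again in the fast-recovery example of § 9.2.7 below.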
Returning to Figure 9.4: when the ack for 32257 arrives at T = 3.75 sec, cwnd does not increase from 5 packets to 6 packets, but remains at 5 packets, because the TCP has now entered congestion avoidance. It is only after the arrival of the ack for 35329, the last in the plot, that cwnd increases to six packets.

While the above outlines the congestion avoidance principles, in practice there are many fine points regarding exactly how congestion avoidance is implemented. (For example, why in Figure 9.4 it took more than one RTT during congestion avoidance for cwnd to increase by one packet.) We discuss a number of these in Chapter 11.

9.2.7 Fast retransmit and recovery

In addition to timeouts, TCP supports another retransmission mechanism, called ``fast retransmit.'' It is also due to Jacobson. Although not part of the TCP specification, it is widely implemented. Fast retransmission is an attempt to avoid the sometimes lengthy lulls a connection experiences upon a loss, due to the RTO being much larger than the RTT. Figure 9.4 above illustrates the problem. For this connection, the RTT was about 65 msec, but the RTO wait was 1.2 sec.

In general, RTO should be larger than the maximum RTT a connection's packets might experience, in order to allow enough time for acks to arrive. Yet, it is difficult to estimate this maximum due to frequent fluctuations in RTT, and, furthermore, it is important to estimate it conservatively, i.e., overestimate it rather than underestimate it, so that packets are not needlessly retransmitted. (We will see the effects of underestimation in § 11.5.10.) Finally, many TCP implementations have access to only coarse-grained clocks, so it is difficult for them to time small RTOs.

To address this problem, Jacobson noted that TCPs receive an additional, implicit signal when a packet has been lost. This signal comes in the form of the arrival of ``duplicate acks.'' When above-sequence data arrives at a TCP receiver, the specification states that the TCP should generate a redundant acknowledgement. (It should also do so if below-sequence data arrives, i.e., unnecessarily retransmitted data; we explore the distinction between these two in § 13.1.3.) These are termed ``duplicate acks'' or ``dups.'' In Figure 9.4 we see two of these arriving at T = 2.33 sec and T = 2.53 sec. Since all but the first packet beyond the ack point arrived at the receiver, it should have sent 7 dups. From the plot, we cannot tell whether it did so but 5 were lost, or if it failed to do so. (It turns out that, in this case, it failed to do so. The TCP implementation was one that does not include the recommended generation of dups.)

Fast retransmission works by counting duplicate acks, and, if their number reaches a given threshold, N_d, then the sending TCP infers that the packet beyond the ack point was lost, and retransmits it. Current implementations use N_d = 3. This value was chosen as a trade-off between not missing fast retransmit opportunities because too few dups arrive, versus not misinterpreting the arrival of dups and retransmitting unnecessarily. The latter can occur when packets arrive out of order. In § 13.1.3 we examine how well N_d = 3 performs, and find that it does very well, almost always detecting true loss and not being fooled by reordering; and, further, that N_d = 2 would result in TCPs being fooled significantly more often.

[Figure 9.5: Sequence plot showing a TCP ``fast retransmission'']

Figure 9.5 shows a sequence plot depicting a fast retransmission.
In the figure, packet 36865, originally transmitted at T = 0.85 sec, was lost, but all of its 6 successors arrived successfully. These then elicit six dups, the third of which causes a fast retransmission at T = 0.93 sec. At this point, cwnd is one packet and ssthresh is 4 packets. When the retransmission is ack'd at T = 0.98 sec, slow start advances cwnd to 2 packets, and then to 3 packets upon receipt of an ack for those two. (The apparent duplicate ack for 39937 is in fact a ``window update,'' per § 9.2.2. TCPs are careful to distinguish between window updates and true duplicates, as the former do not indicate the safe arrival of an additional data packet.)

Fast retransmit works very well for eliminating the lengthy timeout lull, provided enough above-sequence packets arrive at the receiver to elicit at least 3 dups. (If the receiver's offered window is small, or if cwnd is small, then this may be a problem.) Jacobson further refined it with a mechanism termed ``fast recovery.'' The observation underlying fast recovery is that each additional dup beyond the first N_d = 3 indicates that another data packet has arrived at the receiver. Thus, it is sound to increase cwnd (which was cut to 1 packet upon the fast retransmission) by one packet for each of these, though not to exceed ssthresh. Furthermore, it is sound to increase cwnd by N_d packets upon a fast retransmission, too, because each of the first N_d dups likewise corresponds to a successfully received packet. Thus, fast recovery opens cwnd more quickly.

If this were all that the TCP did, then fast recovery would lead to a large burst when the TCP received an ack for the retransmitted packet (T = 0.98 sec in Figure 9.5), because at this point cwnd would often be much larger than 1 packet (then increased in Figure 9.5 to 2 packets by the arrival of the ack). To eliminate this burstiness, fast recovery also specifies that, if the TCP receives enough additional dups, it then begins transmitting new data, before it has received an acknowledgement for the retransmitted data. Thus, the algorithm looks like:

1. Upon receiving 3 dups, set ssthresh to half the effective window, set cwnd to one packet, and retransmit the first unacknowledged packet.

2. Next, ``inflate'' cwnd using:

       cwnd ← ssthresh + 3,

   where the constant 3 reflects the three duplicates already received. (We have simplified the discussion by presenting the algorithms in terms of full-sized packets, when in fact they are implemented in terms of bytes; provided all of the packets contain a full MSS' worth of data, the two are equivalent.)

3. Whenever another dup arrives, increase cwnd by one more packet. If cwnd is now large enough to transmit new data, do so.

4. When an ack arrives that advances the ack point to or beyond the last packet that was in flight prior to the fast retransmission, then fast recovery ends. Execute:

       cwnd ← ssthresh

   to ``deflate'' the window to its proper post-recovery size, and update cwnd from the ack normally.

Figure 9.6 illustrates how fast recovery appears on a sequence plot. A number of dups arrive for 74573, which is retransmitted after the third dup is received (i.e., after four acks for 74573 arrive, the first being the ``original'' and the others being dups). Prior to the retransmission, the window was 8 packets, so after the retransmission ssthresh is 4 packets, and after window inflation, cwnd is 4 + 3 = 7 packets. The next dup advances cwnd to 8 packets, but the TCP already has 8 packets' worth of data in flight, so it cannot retransmit at this point.
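As a companion to the numbered algorithm above, the following Python fragment sketches the dup-counting and window inflation/deflation bookkeeping. It is our own simplification for illustration, in packets rather than bytes and with hypothetical names; it is not the code of any TCP in our study, and it omits many details such as RTO interactions.

    # Sketch of the fast retransmit / fast recovery steps above, in packets.

    N_D = 3                                  # duplicate-ack threshold

    class FastRecovery:
        def __init__(self, cwnd, ssthresh):
            self.cwnd, self.ssthresh = cwnd, ssthresh
            self.dups = 0
            self.in_recovery = False

        def on_dup_ack(self, effective_window):
            self.dups += 1
            if not self.in_recovery and self.dups == N_D:
                # Steps 1 and 2: halve the window, retransmit the first
                # unacknowledged packet, then inflate cwnd by the N_d dups.
                self.ssthresh = effective_window // 2
                self.cwnd = self.ssthresh + N_D   # net effect of cwnd=1 then inflation
                self.in_recovery = True
                # (the TCP retransmits the first unacknowledged packet here)
            elif self.in_recovery:
                # Step 3: each further dup means another packet has left the
                # network, so cwnd grows by one, possibly liberating new data.
                self.cwnd += 1

        def on_ack_advancing_past_recovery_point(self):
            # Step 4: deflate the window back to its proper post-recovery size.
            self.cwnd = self.ssthresh
            self.in_recovery = False
            self.dups = 0

Combining steps 1 and 2 makes explicit that the net effect of a fast retransmission is cwnd = ssthresh + N_d packets, which is why the connection in Figure 9.6 is left with cwnd = 4 + 3 = 7 packets immediately after retransmitting.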
Continuing the Figure 9.6 walk-through: the dup arriving at T = 7.94 sec advances cwnd to 9 packets, and this is enough to liberate a new data packet, 79361. Two more dups after it advance cwnd to 10 and 11 packets, and two more data packets are sent. Then, at T = 7.96 sec, all of the data outstanding prior to the retransmission is ack'd (closely followed by a window update, the second ack shown overlapping with the first).

[Figure 9.6: Sequence plot showing TCP ``fast recovery'']

At this point, the window deflates back to ssthresh, or 4 packets. The ack is then processed, and since this TCP's test for congestion avoidance is cwnd > ssthresh rather than cwnd ≥ ssthresh (used by some other TCPs), the connection is deemed still in slow start, so the ack advances cwnd to 5 packets. Three of these are already in flight, so the TCP transmits two new packets. Thus, the TCP was able to continue transmitting, and ended the retransmission period with cwnd having just entered congestion avoidance, and it did so without generating any unduly large bursts.

We make one final point regarding fast recovery. The window inflation and deflation is subtle (and often confusing). It arises from conflating the meaning of cwnd to be both ``how many packets the connection can have in flight'' and ``how far above the ack point the connection can transmit.'' During fast recovery, these notions are separate, since some of the packets above the ack point are indeed no longer in flight (because they are what caused the dups). Because these points are subtle, we should not be too surprised to learn in Chapter 11 that TCPs implementing fast recovery suffer from more than one bug in managing the window deflation.

9.3 The Raw Measurements

Table XIV lists the 35 sites that participated in the two experimental runs, N_1 and N_2. Tables I and II in Part I summarize the sites. We conducted the first run, N_1, during December 1994, coincident with the routing study. Likewise, we conducted the second, N_2, during November--December 1995. As with the routing study, differences between N_1 and N_2 give us an opportunity to analyze how Internet packet dynamics changed during the course of 1995.

    Name      # N_1    # N_2    Tracing machine
    adv         --     1,244
    austr      207     1,036    BSDI 1.1
    austr2      --     1,259
    bnl        307     1,200
    bsdi       166     1,374
    connix     308     1,474
    harv       190     1,061
    inria      172     1,180
    korea       49        --    HP/UX 9.01
    lbl        318     1,412    SunOS/BPF
    lbli       230     1,134    SunOS/BPF
    mid         --     1,295
    mit        308        --
    near        --     1,296    SunOS/BPF
    nrao       301       982
    oce        126       838
    panix       --       240
    pubnix     148     1,085
    rain        --     1,289
    sandia      --     1,182
    sdsc       259       964
    sintef1     --     1,469    NetBSD 1.0
    sintef2     --     1,524    SunOS/BPF
    sri        194     1,306
    ucl        230     1,266
    ucla        --     1,397    SunOS 4.1
    ucol       275     1,208    SunOS 4.1 (N_1)
    ukc        299       989    SunOS 4.1
    umann      222       998
    umont      144     1,469    SunOS 4.1
    unij        74     1,412    SunOS 4.1
    usc        231        --    SunOS 4.1
    ustutt     240     1,165
    wustl      304     1,232
    xor        316        --
    Total    2,805    18,490

    Table XIV: Sites participating in the packet dynamics study

The second and third columns give the number of connections in which the site participated as either sender or receiver. The final column lists the operating system of the machine used to trace the site's TCP traffic, or is empty if the tracing was conducted on the same machine as ran the TCP.
Tracing systems listed as ``XYZ/BPF'' had the Berkeley Packet Filter installed [MJ93], which greatly aids accurate packet measurement. One site, ucol, changed its measurement setup between N_1 and N_2, using a separate machine during N_1 but the same machine during N_2.

As discussed above, each measurement was made by instructing the Network Probe Daemons (NPDs) running at two of the sites to send or receive a 100 Kbyte TCP bulk transfer, and to trace the results using tcpdump. An important difference between N_1 and N_2 is that in N_2 we used Unix socket options to assure that the sending and receiving TCPs had sufficiently large buffers that they were never ``window limited'' (§ 9.2.4), to prevent window limitations from throttling the transfer's throughput. This change has a downside: it sometimes clouds apparent trends between the N_1 and N_2 measurements with the question of whether the trends are simply artifacts of using bigger windows in N_2. Nevertheless, the change was worth making, since the bigger windows enabled the N_2 connections to push considerably harder on the network path, giving us more opportunities to observe how much capacity the path had available.

Finally, we limited measurements to a total of 10 minutes, as a mechanism to prevent measurement attempts from indefinitely consuming resources at the NPD sites. This limit leads to under-representation of those times during which network conditions were poor enough to make it difficult to complete a 100 Kbyte transfer in that much time. Thus, our measurements are biased towards more favorable network conditions. In § 15.1 we show that the bias was negligible for North American sites, but noticeable for European sites.

Chapter 10

Calibrating Packet Filters

The data for our entire packet dynamics study are traces of packets sent through the network recorded by the tcpdump utility, written by Van Jacobson, Craig Leres, and Steve McCanne [JLM89]. tcpdump uses a host computer's packet filter to measure when packets appear on the local network. Packet filters are operating system services for recording network packets. In this chapter we discuss the general problem of how to test the soundness of a trace measured by a packet filter, and the specific issues that arise from the different packet filters used in our study.

We begin by introducing the notion of ``wire time'' (§ 10.1), and then describe how packet filters work (§ 10.2). One of the goals of a packet filter is to record wire times as accurately as possible. In § 10.3 we give an overview of the sorts of measurement errors packet filters can exhibit. For each error, we discuss how tcpanaly attempts to detect its presence when analyzing a tcpdump trace. While not a measurement error, a packet filter's ``vantage point''---where in the network it is located---can also complicate the analysis of a tcpdump trace, which we discuss in § 10.4. Finally, it is not quite as simple as it might at first appear to pair up instances of the same packets in two tcpdump traces, one recorded at the TCP sender and one at the receiver. § 10.5 covers the details of doing so.

10.1 The notion of ``wire time''

If we wish to accurately describe how packets travel through a network, then we need to carefully specify exactly what we mean when we associate a time with a packet's appearance on a link in the network.
To do so, we introduce the notion of ``wire time.'' Wire time is defined in terms of a particular measurement location M on a particular network link L. For a given packet p, the wire time of p on L is the time t at which p appears at M on L.

Note that this definition is vague in some fundamental ways. Sometimes what we (ideally) want to know is when p first appears at M, which one might define as p's ``wire arrival time,'' corresponding to the first moment at which any bit of p is viewed at M on L. Other times we want to know when p finishes appearing at M, its ``wire completion time,'' corresponding to the first moment at which all the bits of p have been seen at M on L. These two times can be quite different if L has a low bandwidth, and so it takes a long time for all of p's bits to pass M. Depending on the particular link, wire times can vary considerably with different measurement points on the link, such as the two ends of a satellite link; or very little, such as the various measurement points on an Ethernet.

For each packet recorded by a packet filter, the filter generates a timestamp corresponding to the time at which the filter captured the packet (discussed further in § 10.2). One goal of a well-designed packet filter is to ensure that this timestamp is as close as possible to the packet's wire time with respect to the packet filter's measurement location, M.

However, the filter's location M will often differ from the location E where the connection endpoint whose traffic we wish to measure resides. (This difference affects the packet filter's vantage point, an issue we discuss in detail in § 10.4.) Consequently, it may be difficult to accurately estimate from packet filter timestamps recorded at M the wire times as seen at E. In our study, however, the packet filters always monitored the same local-area network (LAN) as was used by one of the endpoints, or ran on the endpoint itself. Since the LANs had small propagation times, the packet filter timestamps were (potentially) quite close to the wire times as seen at E.

10.2 How packet filters work

The goal of a packet filter supplied with an operating system is to selectively record network traffic. This operation is referred to as packet ``capture.'' The captured packets might be only to or from the computer running the packet filter, or might be ancillary traffic that has nothing to do with the local computer. In the latter case, the filter still needs some way to ``see'' the traffic in order to measure it. This is done by passively monitoring broadcast media such as Ethernet or FDDI networks, a mode of operation referred to as ``promiscuous.'' With non-broadcast media such as point-to-point links or Ethernet ``hubs,'' passive operation is sometimes not possible (depending on the design of the networking elements) unless considerable pains are taken to split the physical signal so that the passive monitoring machine receives its own copy. For our study, measurement was always done either in the context of a broadcast medium, or on the endpoint host itself.

The position of a packet filter with respect to the TCP endpoints, or its ``vantage point,'' can complicate analysis of cause-and-effect among the streams of packets between the sender and the receiver. We discuss this issue further in § 10.4. We note here that vantage point complexities are often more significant for passive monitoring because the monitoring machine is further removed from the TCP endpoint.
Apart from this issue, which can be important, we in general prefer passive monitoring because it minimizes measurement error. A passively-monitoring packet filter often can yield more accurate estimates of ``wire time'' because the computer doing the measurement is not also busy processing the network traffic itself.

Packet capture usually takes place inside the operating system's kernel, since dealing with hardware devices such as network interfaces generally falls within the kernel's domain. It is presumably at this point that the packet's timestamp is generated, reflecting the time at which the packet was captured. Ideally this occurs as early in the process as possible, so that the timestamp is as close to the packet's wire time (§ 10.1) as possible. (The timestamps generally are closer to ``wire completion'' times than ``wire arrival'' times, since usually the timestamp is generated after the entire packet has been received from the network interface.)

Depending on what one wishes to measure, often most of the network traffic seen by the filter is irrelevant and needs to be discarded. Doing so is termed packet ``filtering,'' and provides the genesis for the name ``packet filter.'' Operating systems differ greatly in the amount of filtering provided by their kernels. Some provide only very simple filtering, while others allow quite sophisticated pattern-matching. The difference can be very important for network measurement, because, if a kernel supports only crude filtering, then additional filtering must be performed by the application program accessing the packet filter. This filtering is done at user level, which entails copying the potentially very high volume of network traffic from the kernel up to user level, merely so almost all of it can be discarded. This copy operation can take considerable processing, and thus can greatly aggravate the problem of packet filter drops (§ 10.3.1). For this reason, one generally prefers what is termed a kernel packet filter, meaning a packet filter that implements sophisticated filtering at the kernel level, since these can much more rapidly winnow down the packet stream to just those packets of interest to the application.

We used the tcpdump utility for generating our packet traces. tcpdump is written in terms of libpcap, a library that knows about a great number of packet filters provided by different operating systems [MLJ94]. libpcap provides packet filtering using the BSD Packet Filter (BPF; [MJ93]). For operating systems that fail to provide much in terms of kernel-level packet filtering, libpcap hauls up all the packets received by the filter and uses the BPF matcher at user level to filter. For systems that provide BPF-equivalent kernel filtering, libpcap knows how to download a filter from the application program (tcpdump, in our case) to the kernel, to obtain the benefits of kernel-level filtering.

Of the sites participating in our study, libpcap was able to use kernel-level filtering on those systems running the following operating systems: BSDI (bsdi, connix, pubnix, rain; austr's separate tracing machine), NetBSD (panix; sintef1's separate tracing machine), and Digital OSF/1 (harv, mit, ucol in N_2, umann). In addition, some systems had BPF manually added to their kernels (lbl, lbli, near; sintef2's tracing machine). For the remainder, libpcap performed packet filtering at the user level.
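To make concrete what such a filter computes, the following Python fragment sketches the kind of per-packet predicate involved, operating on already-parsed header fields. The field names, addresses, and packet representation are hypothetical; a kernel filter such as BPF evaluates an equivalent test compiled into its own instruction format before the packet is ever copied to user level, whereas a user-level filter must copy every packet first and only then apply the test.

    # Illustrative per-packet filter predicate (hypothetical field names).

    def matches(pkt, host_a, host_b, port):
        if pkt["ip_proto"] != "tcp":
            return False
        hosts_ok = {pkt["ip_src"], pkt["ip_dst"]} == {host_a, host_b}
        ports_ok = pkt["tcp_sport"] == port and pkt["tcp_dport"] == port
        return hosts_ok and ports_ok

    # A probe packet between the two hosts of interest passes the filter ...
    probe = {"ip_proto": "tcp", "ip_src": "10.0.0.1", "ip_dst": "10.0.0.2",
             "tcp_sport": 7505, "tcp_dport": 7505}
    assert matches(probe, "10.0.0.1", "10.0.0.2", 7505)

    # ... while unrelated traffic is discarded.
    other = dict(probe, tcp_dport=80)
    assert not matches(other, "10.0.0.1", "10.0.0.2", 7505)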
In all cases, the filtering used in our study was for packets with the IP addresses of the NPD source and destination hosts in their IP header, and with both a source port and a destination port of 7505, as these were the ports used by all of the NPD probe traffic. Note that we did not additionally capture ICMP traffic directed to either host, the lack of which subsequently complicated our TCP analysis, since one form of ICMP message (``source quench,'' cf. § 11.3.3) alters the TCP behavior of a host receiving it.

A final measurement consideration concerning packet filters is the use of a ``snapshot length'' or snaplen to control how much of each packet the filter records. Often, for network analysis all that is required is to record the packet headers. Doing so and omitting the packet contents can save large amounts of both copying (minimizing processing time and thus decreasing the chance of measurement drops) and storage space. Consequently, for our study we only recorded packet headers. Doing so limited certain types of analysis that require packet contents for full accuracy, such as assessing the prevalence of data corruption. We discuss how we worked around this limitation in § 11.4.2.

10.3 Packet filter errors

It is crucial in any study based on packet filter measurement to consider the forms of measurement errors that packet filters can exhibit. In this section we discuss five types of errors: drops, additions, resequencing, timing errors, and misfiltering. For each, we look at the impact of the error on subsequent analysis, and how tcpanaly attempts to diagnose the presence of the error.

10.3.1 Drops

The most widely recognized (and often most common) form of packet filter error is the presence of drops, in which the trace produced by the filter fails to include all of the packets appearing on the network link that matched the filter pattern. The missing packets are said to have been ``dropped.'' The usual reason that drops occur is that the measuring computer lacks sufficient processing power to keep up with the rate at which packets arrive on the monitored network link. This is particularly a problem for machines requiring ``user-level'' filtering (§ 10.2), because for them considerable processing can be spent simply moving the stream of monitored packets up to the user level from the kernel level.

Packet filter drops can present serious problems for analyzing network traffic. For example, any analysis of network packet loss rates must be certain not to confuse filter drops with true network drops. Furthermore, filter drops generally occur during periods of peak network load. These are often precisely the times of greatest interest for studying traffic dynamics. If the peaks are ``clipped,'' one can easily underestimate the maximum load the network experiences [FL91].

In general, packets can be dropped at two different places. The network interface card that connects the monitoring computer to the network link can run out of buffer memory for storing packets awaiting recording, because the kernel is too busy doing other things to read them quickly enough from the card; or the kernel itself can exhaust its buffer for storing packets awaiting consumption by the user-level tracing utility.
Once a packet is successfully transferred to the tracing utility, it is usually immune from further drops (unless it fails to match the filter, naturally), but the time required to subsequently transfer it to permanent storage can result in the user-level utility failing to consume new packets at the same rate that the kernel makes them available, eventually exhausting the kernel's buffer memory. As discussed in § 10.2, kernel-level packet filters are generally much less susceptible to drops because they pare down the measured packet stream much more rapidly than do user-level packet filters, and hence require much less processing time.

10.3.2 Packet drop reports

The operating system's packet filter interface usually includes a mechanism to query how many packets the kernel dropped, taking care of the second place where packets can be dropped. Network interface cards, on the other hand, often supply only crude signals that packets were dropped (such as a boolean flag indicating simply whether or not any drops have occurred), making it more difficult to evaluate drops that occur because the kernel is unable to keep up with the rate of packet arrivals.

Unfortunately, some operating systems do not report drops (harv, ucol in N_2, korea, sandia; most of the Solaris sites). Others report drops when in fact the trace includes all of the connection's packets. This can occur with user-level filtering, because the drop count tallies the number of packets the kernel was unable to deliver to the user level, and it can be the case that none of these belong to the connection of interest. Worse, some report no drops when in fact there were drops. This occurred numerous times for the NetBSD 1.0 machine used to trace sintef1's traffic, and also for some of the Solaris machines that nominally reported drop counts (xor, austr2, nrao in N_2). None of these systems ever reported a drop count other than zero, indicating that the accounting machinery is absent.

Finally, we note that packet drops were quite rare for the systems with kernel-level filtering, though they did sometimes occur.

10.3.3 Inferring filter drops

Because we cannot trust the different packet filters to reliably report drops, tcpanaly employs a number of self-consistency checks to infer their presence. The key in doing so is to be certain not to mistake a genuine network drop for a filter drop, while still detecting filter drops as reliably as possible. Fortunately, for TCP traffic it is usually possible to distinguish between a network drop and a filter drop, because TCP is reliable. This means that a (correct) TCP implementation will diligently work to repair genuine network drops, while taking no action in response to filter drops, since, in fact, it successfully transmitted the packets. (An exception is if a packet is dropped by both the packet filter and, later, by the network.)

This observation leads to a number of self-consistency checks employed by tcpanaly:

1. Since TCP implementations send data in sequence order, except during retransmission, a ``skip ahead'' in which new data is sent that does not follow the highest sequence sent so far indicates that the packet filter dropped some earlier-sent data (namely, the data that was indeed in-sequence). When applying this check, one must be careful to allow for the possibility of a network ``interface drop.'' That is, the implementation may appear to have skipped ahead because the earlier-sent packet, while successfully transferred from the sending computer to its network interface card, never made it out from the card onto the local network.
Interface drops are actually a special case of the ``vantage point'' problem discussed in § 10.4 below. tcpanaly distinguishes between a likely measurement drop and an interface drop by checking to see whether the TCP later retransmits the skipped packet. If so, it most likely did so because the packet did indeed fail to arrive at the receiver, and it was an interface drop. If not, then the packet must have arrived at the receiver (since TCP is reliable), so it was a measurement drop.

2. Even during retransmission, TCPs have a particular order in which they will retransmit data. While this varies between implementations, for those implementations tcpanaly knows about, it can detect whether the TCP deviates from the order, which generally indicates that the packet filter either dropped an incoming ack that altered the retransmission order, or an outgoing data packet that maintained the integrity of the retransmission order.

3. Since a TCP implementation should never send data beyond the upper edge of the congestion window (§ 9.2.2), or the inflated congestion window in the case of fast recovery (§ 9.2.7), the presence of such data in a trace is much more likely to be due to the packet filter having dropped an ack. Detecting this inconsistency is difficult because it requires understanding exactly how the particular TCP implementation manages its congestion window. tcpanaly does have this knowledge (Chapter 11), however, so it can make this consistency check. This is fortunate, because if the receiver is offering a spacious window, as was the design in N_2 (§ 9.3), then offered window violations (see below) will be very rare, even in the presence of filter drops of acks; but congestion window violations will still flag most instances in which the filter drops an ack.

4. For TCP implementations free of retransmit timer problems (cf. § 11.5.8 and § 11.5.10), the presence of an uncalled-for retransmission usually indicates that the packet filter has dropped one or more acknowledgements that triggered a ``fast retransmit'' (§ 9.2.7) sequence.

5. A failure of a TCP to send data when it apparently was allowed to do so can likewise signal a packet filter drop---the data was actually sent, but the filter failed to record this fact. However, there are many reasons why a TCP might not send data when it appears it can, including not having data available from the sending application; attempting to avoid the ``silly window syndrome'' [Cl82]; or the host processor being busy doing something else. Because it can be difficult to determine whether one of these is the reason the TCP failed to send, tcpanaly does not consider a failure to send as indicative of a measurement drop.

6. A properly functioning TCP will never acknowledge data that has not arrived, nor will it acknowledge data above a sequence ``hole'' (some earlier data has still not arrived), since TCP acknowledgements are cumulative. The presence of such acknowledgements is thus much more likely to be due to the packet filter having dropped some incoming data packets.

7. Since a TCP implementation should never send data beyond the upper edge of the offered window (cf. § 9.2.2), the presence of such data in a trace is almost certainly due to the packet filter having dropped an ack (or having resequenced an ack; see § 10.3.6 below).
8. If data is sent before the connection is fully established (§ 9.2.4), this usually indicates that some of the packets in the establishment sequence were dropped by the filter. (The T/TCP extension to TCP allows data to be sent prior to full establishment [Br94, St96]; none of the TCPs in our study used T/TCP, however.)

Most of these checks can only be conducted from vantage points (§ 10.4) that are local to the point where the bogus traffic is sent (or fails to be sent). If the vantage point is some distance away (in particular, if it is at the opposite end of the connection) then one cannot always distinguish measurement drops from network drops. Consequently, the first five of the checks can only be reliably assessed from traces gathered at the data sender, the sixth can only be reliably assessed at the receiver, and the last two can be reliably assessed at either end, since they should never occur regardless of earlier packets dropped by the network.

For trace pairs, tcpanaly makes one further check: if a packet arrives at the receiver that was never sent according to the sender trace, then almost always this indicates a measurement drop at the sender. (Note that this check is complementary to those above, and does not serve to replace them, since it only detects measurement drops at the packet sender.) For further discussion, including why it does not always indicate such, see § 10.5 and § 13.2.

10.3.4 Trace truncation

Related to packet filter drops but slightly different is the problem of trace truncation. Truncation occurs when the filter misses the packets belonging to either the beginning of a connection or the end. Both cases are easy to detect because TCP connections are delimited by an exchange of special connection management packets (§ 9.2.4). If this exchange is missing, then the trace has been truncated.

Trace truncation occurs due to a race between when the measurement process begins and finishes executing and when the connection itself begins and finishes. npd control attempts to avoid this race by waiting five seconds between requesting that the remote npds start their measurement processes, and requesting that they proceed with the connection. Similarly, it waits five seconds after the transfer source indicates it has finished before requesting that the remote npds terminate their measurement processes.

These delays do not always avoid the race, however, particularly because npd control's trace requests may themselves be held up in the network due to transmission delays, so the transfer request can wind up arriving right on the heels of the measurement request. In addition, the sending application can consider itself done transmitting its data well before its TCP actually completes the transfer, due to retransmissions that occur after the application has scheduled all of the data for transfer. This mismatch further contributes to the potential for races. A better design would be to use explicit handshaking between the measurement and transfer processes to ensure that the measurements always fully bracket the transfer.

If the beginning of a trace is missing, then tcpanaly gives up on trying to analyze it, because it is too difficult to then work out what the congestion window is, and hence to apply the powerful self-consistency check of looking for packets that are sent in violation of the congestion window. If, however, only the end of a trace is missing, then tcpanaly can readily analyze the remainder of the trace.
When pairing such a truncated trace with the complementary trace made at the other endpoint, tcpanaly truncates the trace pair at the last packet appearing in both traces. This occurred in about 6% of the N_1 trace pairs, and 3% of the N_2 pairs. Truncation typically involved only the final few packets of the trace.

A final note: sometimes a trace begins with what is actually leftover traffic from a previous measurement between the same pair of hosts, because at the network level the previous connection's final connection handshake has not yet completed. In principle, this should never happen, because the TCP implementation should not allow the same connection port to be reused while it still maintains state for the earlier instantiation of the connection. In practice, however, we have observed it in several of our traces, sometimes in the traces at both ends of the new connection, indicating it is not simply stale packets left unread from the earlier use of the packet filter but indeed the last wisps of the previous connection. Provided the packets have connection termination flags set (FIN or RST), tcpanaly simply ignores them.

10.3.5 Additions

While it is easy to see how packet filters can sometimes fail to record network packets, we might not expect that they can also record extra packets! Yet, this does indeed happen, with the IRIX 5.2 and 5.3 packet filters. Figure 10.1 shows part of a sequence plot exhibiting this problem. Here, the ack just before time T = 11.175 has liberated five packets.

[Figure 10.1: Packet filter replication]

Each outgoing data packet appears twice. The slope (i.e., data rate, per § 9.2.4) of the two sets of packets is telling. The first corresponds to a data rate of over 2.5 MB/sec, while the second is almost exactly 1 MB/sec. This latter agrees closely with the data rate of an Ethernet, and indeed the host generating the traffic is connected to an Ethernet. Thus, surprisingly, the first set of packets appears to have bogus timing while the second set appears to be accurate! Furthermore, the two sets are indeed intertwined, that is, the second occurrence of sequence number 32,257 appears in the trace before the first occurrence of 34,305.

This puzzling picture all makes sense given the following explanation. This trace was made running the packet filter on the same machine as that generating the network traffic, and the operating system is copying outgoing packets to the filter twice, the first time when the packets are scheduled to be sent out onto the local Ethernet, and the second time when they actually depart onto the Ethernet. The 2.5 MB/sec corresponds to how fast the operating system is sourcing the traffic, while the 1 MB/sec reflects the local rate limit of the Ethernet link speed.

About 2,000 of the traces in our study have duplications of this sort. Clearly such duplicates can complicate or skew our analysis: for example, the computation of packet loss rates had better not conclude, when the sender's filter reports 400 packets sent but only 200 arrive, that the loss rate was 50%! On the other hand, we would rather not discard all these traces from our subsequent analysis, so tcpanaly needs to cope with the duplication. Yet, we cannot blithely discard the second copy of each packet, because we might in the process discard a packet truly replicated by the network, an event that would be very interesting to detect (this does indeed happen; see § 13.2).
For our measurement purposes, the second copy is actually preferred to the first, since it is closer to the true wire time (§ 10.1). Unfortunately, while in many traces every single packet sent by the host (data packets if the IRIX host was the sender, acks if the receiver) appeared twice, in some of the traces a second copy was occasionally missing. (We know the omission was not due to an interface drop, per § 10.3.3, because it was never retransmitted.) Furthermore, in some traces the duplication starts midway through the trace, rather than permeating the entire trace. For these reasons, tcpanaly copes with measurement duplicates by discarding the later copy.

It discriminates between a measurement duplicate and a true retransmission as follows. First, it checks whether the ``id'' fields in the packets' IP headers match. This field is used by IP for fragmentation purposes, which we need not delve into here. However, one salient property of the field is that in general it is incremented for each new IP packet that a host sends. Consequently, different TCP packets will usually have different IP ``id'' fields in their IP headers. If the ``id'' fields agree, it then checks whether the sequence number fields match, and, for data packets, also whether the second copy was sent less than one quarter of the minimum observed round-trip time (RTT) after the first copy. If the endpoint TCP is known to reuse the IP ``id'' field when retransmitting a data packet (of the TCPs in our study, only Linux 1.0 does this), then data packets are never considered candidates for measurement duplication, since it is too easy to confuse a true retransmission with a measurement duplicate (especially since Linux 1.0 retransmits too early, per § 11.5.8, and hence would pass the RTT test). Fortunately, the packet filter used to trace the sole Linux host (korea) does not appear to suffer from measurement duplications, so we do not lose any calibration by doing so.

10.3.6 Resequencing

Another form of packet filter error is what we term ``resequencing,'' in which the packet filter alters the ordering of the packets so that it no longer reflects events as they actually occurred in the network. Figure 10.2 shows a portion of a sender trace in which this occurred.

[Figure 10.2: Packet filter resequencing]

At first glance the plot appears normal: acks are occasionally arriving, and as they do, the window slides several packets' worth and newly liberated packets depart shortly afterwards. Figure 10.3, however, shows a blow-up of the central tower from the previous figure.

[Figure 10.3: Enlargement of resequencing event in previous figure]

We see that the packet filter has recorded timestamps for the packets such that the first two data packets are sequenced as having departed before the acknowledgement arrived. Since the congestion window would not have permitted their earlier departure, and there was a lengthy lull as shown in Figure 10.2 before their departure but only hundreds of microseconds between their alleged departure and the arrival of their liberating ack, it is clear that the filter has misrepresented the true sequence of events. The problem here is not a clock adjustment (§ 12.6), since the packets appear in the shown chronological order in the trace file (and also because this problem is much more common than we find clock adjustments to be).
This problem occurs quite frequently for the Solaris 2.3 and 2.4 packet filters, plaguing about 20% of the traces they record. It almost never occurs for any of the other packet filters. Most likely the resequencing occurs because the packets are being recorded by a packet filter running on the same host as is generating the traffic. We speculate that the Solaris packet filter has two code paths by which packets are copied to the packet filter for recording, one corresponding to incoming packets and one corresponding to outbound ones. If the outbound path is appreciably faster than the inbound one; if copies of packets can queue separately in both paths waiting for the filter to record them; and if packets are only timestamped when the filter processes them, then the resequencing makes sense.

Unfortunately, resequencing presents a considerable analysis headache, as it destroys any ready assessment of cause-and-effect. It also means that the packet timestamps have large margins of error, with a bias towards overestimating how long it takes acks to arrive compared to how quickly data packets are sent out. Thus, tcpanaly needs to detect this problem so that it knows not to trust the sequence of events reported by the packet filter. It cannot really correct the problem, since we do not know when the ack truly arrived, and so do not have a sound timestamp to assign to it. Instead, it flags the trace as lacking accurate timing and causality information.

To detect resequencing for traces recorded at the data sender, tcpanaly keeps track of stall packets. These are data packets that are not timeouts (i.e., not retransmissions of the lowest unacknowledged sequence number) and that have been sent after a lengthy lull in network activity. tcpanaly considers a lull to have occurred if at least 25 msec has elapsed since the previous data packet was sent, or, if an ack arrived after the last data packet was sent, then at least 50 msec has elapsed since the ack arrived. If an ack follows a stall packet by less than max(1 msec, R_s), where R_s is the clock resolution of the packet filter's timestamps (§ 12.1), and if the ack acknowledged a sequence number below that of the stall packet (so transmitting the stall packet after seeing the ack would have made sense), then tcpanaly flags the ack as reflecting a resequencing event.

As mentioned above, the stall-packet technique only works for traces recorded at the data sender. tcpanaly uses a similar technique for receiver traces, namely looking for acknowledgements for as-yet-unreceived data, with that data then arriving shortly afterwards.

Note that there is some overlap between detecting measurement drops and resequencing events. For example, an observation of data sent beyond the congestion window could be due to the corresponding ack having been dropped, or due to resequencing, with the ack arriving shortly after the window violation. tcpanaly may occasionally mistake one for the other, based on the timing of the packets arriving after the event. For our purposes, this potential misattribution of the exact type of packet filter error is unimportant. The key requirement is simply that tcpanaly recognize the trace as untrustworthy.
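To make the stall-packet test above concrete, here is a small Python sketch of the sender-side check. The thresholds (25 msec, 50 msec, and max(1 msec, R_s)) are those given in the text; the record layout and function names are our own simplification of what tcpanaly does, and for brevity we omit the requirement that the stall packet not be a timeout retransmission.

    # Sender-side resequencing check, sketched from the rules above.
    # Each event is (time_sec, kind, seq), in trace order, with kind
    # "data" or "ack"; for acks, seq is the acknowledged sequence number.

    def is_stall(prev_data_time, last_ack_time, now):
        # A lull: >= 25 msec since the previous data packet, or, if an ack
        # arrived after that data packet, >= 50 msec since the ack.
        if last_ack_time is not None and last_ack_time > prev_data_time:
            return now - last_ack_time >= 0.050
        return now - prev_data_time >= 0.025

    def flag_resequenced_acks(events, clock_resolution):
        threshold = max(0.001, clock_resolution)
        flagged = []
        prev_data_time = last_ack_time = None
        stall = None                  # (time, seq) of the most recent stall packet
        for t, kind, seq in events:
            if kind == "data":
                if prev_data_time is not None and is_stall(prev_data_time, last_ack_time, t):
                    stall = (t, seq)
                prev_data_time = t
            else:
                # An ack just after a stall packet, acknowledging a sequence
                # number below it, almost certainly arrived before the packet.
                if stall and t - stall[0] < threshold and seq < stall[1]:
                    flagged.append(t)
                last_ack_time = t
                stall = None          # only the first subsequent ack is tested
        return flagged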
Finally, the Solaris filters are particularly apt to resequence an ack for a FIN packet terminating the connection, presumably because the associated code paths are particularly asymmetric in terms of processing time. Since for our analysis this reordering is essentially benign, because it comes at the very end of the connection, tcpanaly does not consider traces that only exhibit resequencing for a FIN packet as untrustworthy. The statistic above of 20% of the Solaris traces having resequencing problems does not include those with only resequenced FIN packets.

10.3.7 Timing

Another type of packet filter error concerns the accuracy of the timestamp recorded for each packet: how close is the timestamp to the true wire time? In Chapter 12 we look at the issue of calibrating these timestamps in detail. Most of the consistency checks we develop in that chapter rely on comparing pairs of packet timings, those corresponding to when the sender's packet filter recorded the packet's departure, and those of when the receiver's packet filter recorded the packet's arrival. These tests prove quite powerful at detecting different clock problems, but require extensive analysis. In this section we confine ourselves to a simple test tcpanaly performs to check the validity of a single trace's timestamps, namely ensuring that they never decrease. We refer to a decrease in the timestamp values as ``time travel.''

One might think that time travel would never occur, and checking for it would be a waste of effort, but, surprisingly, it does happen! We recorded four instances in N_1, all involving connix's clock, and 538 instances (!) in N_2, 498 involving sintef1's clock (that is, the clock of sintef1's NetBSD 1.0 tracing machine) and 40 involving panix's clock (also a NetBSD 1.0 machine). Figure 10.4 gives an example of how a sequence plot exhibiting time travel appears. If we add lines to the plot showing the order of the packets as they appear in the trace file (Figure 10.5), then we see a sharp backward jump from time T = 3.6 sec to T = 3.05 sec.

[Figure 10.4: Example of ``time travel'']

[Figure 10.5: Same plot, with lines showing the ordering of the packets in the trace file]

Time travel has a simple explanation: it reflects the local clock being set backwards. It can occur frequently, as with sintef1, if the clock is periodically synchronized with an external source by setting it to the source's reading, and if the clock tends to run fast. Another form of time travel, considerably more difficult to detect, is forward adjustments. Figure 10.6 shows a receiver sequence plot spanning an 11 second period during which the receiver's clock was artificially advanced by an additional 400 msec. To the eye, however, this adjustment is completely hidden. We look at detecting clock adjustments in greater detail in § 12.6.

[Figure 10.6: Receiver sequence plot showing a forward clock adjustment, undetectable to the eye]

10.3.8 Misfiltering

The last type of packet filter error we look at is ``misfiltering,'' meaning that the filter incorrectly executes its pattern matching and either rejects packets it should accept, or accepts packets it should reject. The first of these is similar to a measurement drop, though systematic in nature. The second can in principle be detected by checking the accepted packets to make sure they do indeed match the desired filter. To do this check properly requires an implementation of the filtering mechanism separate from that used by libpcap, since otherwise one would expect the same erroneous match to occur again.
tcpanaly does not include a full, separate matching mechanism, but it does perform two consistency checks in this regard. First, it checks to make sure that the IP header of each packet indicates a TCP packet. This check never failed. Second, it partitions all the TCP packets it inspects into individual TCP connections based on their host and port numbers, and analyzes each resulting connection separately. In no case did it find more than one connection in a trace, though it occasionally found remnants of an earlier incarnation of the same connection, as discussed in § 10.3.4.

10.4 Packet filter ``vantage point''

While not a measurement error per se, another difficulty in calibrating packet filter measurements arises from complications due to the packet filter's location in the network. We term this its ``vantage point.'' For example, if the packet filter records data packets as they arrive at the receiver, ambiguities arise in trying to determine whether any arrival anomalies observed are due to the network perturbing the packets, or because they were sent by the source in an unusual fashion. Suppose two packets arrive out of sequence order; it is not always apparent whether the network reordered them, or whether the packet with the lower sequence number was dropped by the network and the sender has already retransmitted it.

Vantage point effects can be significantly more subtle than in this example, however. They are most insidious when the filter appears as if it were located directly at one of the TCP endpoints, and only occasionally does its separate location alter the traffic perspective it records. Figure 10.7 gives an example.

[Figure 10.7: Example of an ambiguity caused by the packet filter's vantage point]

The sequence plot is from a packet filter recording traffic at the sending endpoint. A little after time T = 4.203, an ack arrives for a sequence number a bit below 52,000. Very shortly afterwards, at time T = 4.204, an ack arrives for a sequence number above 54,000. Then at time T = 4.205, the sender transmits two packets with sequence numbers below 54,000. If the sequence plot truly reflected the traffic as seen by the TCP endpoint, then the TCP never should have sent these packets, since it had already received an acknowledgement for the corresponding data! As can be seen from the plot, shortly after sending these two packets the endpoint then does process the second ack, and sends new, unacknowledged data.

The key point here is that neither the packet filter nor the endpoint TCP is behaving erroneously. The problem is simply that the packet filter's vantage point is not exactly the same as that of the endpoint TCPs, and the problem is exacerbated by the vantage point being very close to that of the TCPs, as this then encourages the assumption that the two are indeed the same.

Vantage-point problems can be reduced by running the packet filter on the same machine as the TCP endpoint, although this introduces other measurement problems due to competition for the machine's processing power. This step does not, however, eliminate the problem, because cause and effect can still be obscured if the TCP takes a long time to react to any particular input. For example, when new data arrives, many TCP receivers only acknowledge it after the receiving application process has consumed at least two packets' worth of data, which can take considerable time after the network arrival of the data.
In order to correctly analyze TCP traffic, tcpanaly must be able to cope with vantage-point problems. This means that in general it is insufficient for analysis purposes to only remember the most recently received packet. Dealing with vantage-point problems considerably complicates tcpanaly's design, but the result is much more robust analysis. We discuss how we do so in § 11.3.1.

10.5 Pairing packet departures and arrivals

The last packet filter issue we look at is how to take two trace files, T_s recorded at TCP endpoint s, and T_r recorded at endpoint r, and from them synthesize a trace pair that matches packet departures from s and r with their corresponding arrivals at r and s.

The basic approach we use for trace pairing comes from the observation that each packet has two ``fairly'' unique fields in its header, its sequence number (or the sequence number it is acknowledging, if an ack) and its IP ``id'' field (§ 10.3.5). If these fields were indeed unique, then trace pairing would be easy, since the fields would allow unambiguous determination of which packets in T_s correspond to which in T_r. Those without a corresponding packet were either dropped by the network (if present only in the trace local to their sender), or by the packet filter (if present only in the trace local to their receiver).

The pairing problem lies in the fact that the sequence number and IP id fields are not actually unique. Sequence numbers can reappear in different packets due to retransmissions or duplicate acks (§ 9.2.7). Most TCPs only reuse the IP id field when its 16-bit counter wraps around, but one system in our study (Linux 1.0) reuses the IP id field as well as the sequence number when retransmitting.^4

Footnote 4: This is a reasonable performance decision, and explicitly allowed in § 4.2.2.15 of [Br89]. If the sending TCP keeps its unacknowledged data in the form of fully assembled packets, then for retransmission all it needs to do is copy the packet out to the network interface. The reuse of the IP id field does not present an integrity problem since what is being retransmitted is a verbatim copy of what was sent earlier.

tcpanaly deals with these problems as follows. Suppose we wish to pair packets sent by s with their arrivals at r (everything works the same when pairing in the other direction). tcpanaly first goes through T_s and for each packet p sent by s it computes a key, K_p, comprised of the triple of the packet's IP id field and its data and acknowledgement sequence numbers. Using K_p as an index into a table P_s, we check to see whether we have already seen a packet with the same key. If not, the packet is added to P_s and tcpanaly proceeds to the next packet.

If another packet with the same key has been seen, then we check whether the packets are identical, meaning they have the same TCP header flags, data and acknowledgement sequence numbers, length, and offered window.^5 If any of these differ, then tcpanaly flags that a serious analysis error has occurred, since the assumption that the key suffices as a unique identifier has proven incorrect. For all of the traces in N_1 and N_2, this never occurred. We next check whether the packet filter in use is known to create spurious measurement duplications (§ 10.3.5). If so, then tcpanaly discards the later copy of p as a measurement artifact.

Footnote 5: In principle, for data packets we should also check whether the data contents agree. Since, however, the traces in our study only include packet headers, and not data contents, we could not perform this test.
Otherwise, if the sending TCP is known to reuse IP id fields (Linux 1.0, for our study), then the additional packet is entered as a second instance of K_p in P_s. If none of these considerations hold, then tcpanaly flags that a packet has apparently been replicated at the sender (these are analyzed further in § 13.2), and does not construct a trace pair for T_s and T_r because it cannot do so reliably.

tcpanaly next goes through each packet p arriving from s in T_r, again computing its key K_p. If K_p does not appear in P_s (the table of packets sent by s, indexed by their keys), then either p's transmission was dropped by the packet filter at the sender; or T_s was truncated (§ 10.3.4); or the network garbled p in transmission so that its sequence number or IP id field has changed (analyzed further in § 13.3). If K_p appears in P_s, then p is checked against the T_s version of the packet to see if they are identical. If not, tcpanaly flags that the packet was corrupted by the network (again analyzed in § 13.3). If the two copies agree, then we proceed as follows:

1. If K_p appears exactly once in P_s, and has not yet been paired with an arrival in T_r, then it is paired with p in T_r.

2. If K_p appears exactly once in P_s but has already been paired in T_r with an arrival p', then p is flagged as a replication of p'. Replications are further analyzed in § 13.2.

3. If K_p appears m times in P_s for m > 1, then we term the pairing ambiguous. To resolve ambiguous pairings, tcpanaly first computes n, how many times the same key occurs in T_r. If n = m, then tcpanaly assumes that each packet arrived in order and pairs them in order of occurrence. If n > m, then we presume a measurement drop occurred in T_s (it could also have been a packet replication, but that is much less likely). If n < m, then some of the original instances of the packet were dropped by the network. In this case, we attempt to pair each departure with the arrival that has the smallest difference in timestamps, provided this difference is no smaller than the smallest such difference for all of the unambiguous pairings. If this pairing results in a single packet departure matching two different packet arrivals, then we abandon the attempt to construct a trace pair, since we cannot construct a plausible set of pairings.

If tcpanaly was not able to unambiguously pair the packets in the traces, or if the traces included corrupted packets (which may be erroneously paired), then tcpanaly does not construct a ``trace pair'' and skips any subsequent analysis that requires a trace pair. The latter problem (corrupted packets) was extremely rare, but the former problem is more common: ambiguities due to Linux 1.0 TCP reusing IP id fields rendered 65% (15 out of 23) of the traces with a Linux 1.0 sender unpairable. Consequently, we were unable to perform sound analysis of the trans-Pacific path from Korea to the other sites, especially because the Linux 1.0 traces that did not suffer ambiguities were those with especially low levels of retransmission, so analyzing just them would result in a biased assessment of the levels of retransmission and loss along the path.

Finally, if tcpanaly removes relative skew from the receiver's clock (§ 12.7.9), it then recomputes the packet pairings, in case any of the ambiguous matches are changed by the altered receiver timestamps.
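To make the bookkeeping concrete, the following C++ sketch shows one way to form the key described above and to index the sender-side table. It is only an illustration of the technique, not tcpanaly's code; the type names and fields are our own.

    #include <cstdint>
    #include <map>
    #include <vector>

    // Hypothetical summary of one packet as recorded by the filter.
    struct PacketSummary {
        uint16_t ip_id;      // IP "id" field
        uint32_t seq;        // data sequence number
        uint32_t ack;        // acknowledgement sequence number
        uint8_t  flags;      // TCP header flags
        uint16_t len;        // amount of data carried
        uint16_t window;     // offered window
        double   timestamp;  // packet filter timestamp (seconds)
    };

    // Key formed from the "fairly unique" fields: the IP id plus the
    // data and acknowledgement sequence numbers.
    struct PairKey {
        uint16_t ip_id;
        uint32_t seq;
        uint32_t ack;
        bool operator<(const PairKey& o) const {
            if (ip_id != o.ip_id) return ip_id < o.ip_id;
            if (seq != o.seq) return seq < o.seq;
            return ack < o.ack;
        }
    };

    // "Identical" in the sense used for pairing: same flags, sequence
    // numbers, length, and offered window.
    bool identical(const PacketSummary& a, const PacketSummary& b) {
        return a.flags == b.flags && a.seq == b.seq && a.ack == b.ack &&
               a.len == b.len && a.window == b.window;
    }

    // Table of packets sent by s (P_s), indexed by key.  A key that maps
    // to more than one departure corresponds to an "ambiguous" pairing.
    using SenderTable = std::map<PairKey, std::vector<PacketSummary>>;

    void add_departure(SenderTable& table, const PacketSummary& p) {
        table[PairKey{p.ip_id, p.seq, p.ack}].push_back(p);
    }

Arrivals in T_r are then looked up by the same key: a missing key corresponds to a drop (measurement or network), a single match to an unambiguous pairing, and multiple matches to the ambiguous case handled by rule 3 above.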
142 Chapter 11 Analyzing TCP Behavior We discussed earlier how one of the main drawbacks to using TCP traffic for our network ``probes'' is the often quite complex behavior of the TCP endpoints (x 9.1.2). We argued that the resulting fine­resolution probing outweighs this disadvantage, because the disadvantage can be over­ come by careful analysis of the packet arrivals and departures in order to remove those aspects of the traffic behavior due to the TCP endpoints themselves. In this chapter we discuss how tcpanaly performs this analysis. In addition, the process of removing the TCP effects reveals a wealth of in­ teresting detail about how different TCP implementations behave. We find a tremendous range both in their performance and in their congestion­avoidance behavior, the latter playing a critical role in the Internet's global stability. In addition, a solid understanding of each TCP endpoint's exact behavior enables us to distinguish between packet filter errors and bona fide network anomalies. For example, if multiple copies of a single data packet arrive at the TCP receiving endpoint, we can look to see whether the receiver generates an ack for each one. If it does, then the extra copies are bona fide and not measurement duplications (x 10.3.5). If not, then if the TCP endpoint is known to correctly generate acks when it receives redundant packets, we can conclude that a measurement error occurred, and the packets did not really exist. If the TCP is known to not generate acks in this situation, then we cannot tell, and look for a separate indication of whether the packets were indeed real (for example, whether they have different TTL's). Thus, thoroughly understanding TCP behavior provides an invaluable self­consistency check on the soundness of our measurement (x 9.1.4). 11.1 Analysis strategy As its name suggests, we began writing tcpanaly with the goal of analyzing TCP behav­ ior. Only later did we realize that, in the process of doing so, it develops many of the data structures also needed to analyze network dynamics. Our original goal was for the program to work in one pass over the packet trace by recog­ nizing generic TCP actions. The goal of executing only one pass stemmed from hoping tcpanaly might later evolve into a tool that could watch an Internet link in real­time and detect misbehaving TCP sessions on the link. Designing the program in terms of generic TCP actions such as ``time­ out'' and ``fast retransmission'' would then enable it to work for any TCP implementation without needing to know details of the implementation. 143 After considerable effort, we were forced to abandon both of these goals. One­pass anal­ ysis immediately proved difficult due to vantage point issues (x 10.4), in which it was often hard to tell whether a TCP's actions were due to the most recently received packet, or one received in the more distant past. Attempts to surmount this problem by using k­packet look­ahead for small k proved clumsy, and finally foundered when we realized that one basic property tcpanaly needs to determine concerning a TCP implementation is only truly apparent upon inspecting an entire connection, namely whether the implementation has a ``sender window'' (x 11.3.2). Since sender windows are common, in order to infer them soundly we decided to allow tcpanaly to inspect the entire packet trace before making decisions as to how the TCP behaved. Doing so immediately simplified other types of analysis, too. 
We abandoned the goal of recognizing generic TCP actions as the wide variation in TCP behavior became apparent. For example, as related below, the Solaris and Linux TCP implementations in our study often retransmit data packets much too early, before the original packet had a chance to arrive at the destination and be acknowledged, and the Linux implementation furthermore retransmits entire flights of packets rather than just one packet at a time. Neither of these behaviors fits a generic TCP action (except ``broken retransmission''!), and they are very easily confused with legitimate retransmissions due to ``fast retransmission'' (§ 9.2.7). Similarly, the fashions in which different implementations open the congestion window differ in subtle ways, with the result that sometimes it can be extremely difficult to tell why a TCP failed to send new data when an ack arrives: is it because its window has not opened another full packet, or because the TCP is simply running slow and has not had time to do so? Both occur quite frequently.

Thus, we are left with a much less flexible but more robust design for tcpanaly: it makes two passes over the packet trace, it uses k-packet look-ahead and look-behind to resolve ambiguities, and, instead of characterizing the TCP behavior in terms of generic actions, we must settle for it having coded into it intimate knowledge of the idiosyncrasies of 17 different TCP implementations. Furthermore, when confronted with a trace generated by a new implementation not already coded into it, it can only fruitfully analyze the trace if the new implementation behaves identically to one of the 17 it already knows about, or if the extra effort is made to add knowledge of the new implementation to the program. To ameliorate this shortcoming, the program is capable of automatically running all known implementations against a given trace to determine those with which the trace appears in full accord.

All told, tcpanaly is about 14,000 lines of C++ code. Of these, about 7,500 analyze TCP behavior (1,400 concerning individual implementation behavior), 5,000 analyze network behavior, and the remainder perform utility functions. The use of C++ is particularly beneficial for expressing the behavior of one TCP implementation in terms of its differences from that of another implementation. In particular, tcpanaly includes a ``Reno'' implementation that captures the main features of the BSD Reno TCP release, from which most of the TCPs in our study were derived. This allows these derivatives to be expressed succinctly, in terms of just how they differ from ``generic'' Reno. A widespread Reno variant known as Net/3 is discussed in detail in [WS95].

Table XV summarizes the different TCP implementations known to tcpanaly. The first column gives the name of the implementation and the version numbers present among the implementations in our study. The second column lists the sites running each version, separated by ';'s. Sites listed with a subscript of 1 or 2 participated in both N_1 and N_2, but only used the given implementation during the first or second, respectively.

  BSDI 1.1; 2.0; 2.1ff -- Sites: bsdi_1, connix, pubnix_1, austr_2; pubnix_2, rain; bsdi_2 -- Reno-derived. BSDI 2.1ff not a public release.
  Digital OSF/1 -- Sites: harv, mit, ucol_2, umann -- Reno-derived. No differences observed between versions 1.3a, 2.0, 3.0, 3.2.
  HP/UX 9.05; 10.00 -- Sites: sintef2; sintef1 -- Reno-derived.
  IRIX 4.0.1; 4.0.5f; 5.1; 5.2; 5.3; 6.2ff -- Sites: oce; sandia; bnl_1; sdsc_1; adv, bnl_2; sdsc_2 -- Reno-derived. No differences observed between 4.0.1 and 4.0.5f, nor between 5.3 and 6.2ff. 6.2ff not a public release.
  Linux 1.0 -- Sites: korea -- Implemented independently from BSD.
  NetBSD 1.0 -- Sites: panix -- Reno-derived.
  Solaris 2.3; 2.4 -- Sites: inria_1, sri, ucl_1, ustutt, wustl, xor; austr2, inria_2, mid, nrao_2, ucl_2 -- Implemented independently from BSD. Very minor differences between 2.3 and 2.4.
  SunOS 4.1 -- Sites: austr_1, lbl_1, near, nrao_1, ncar, ucla, ucol_1, ukc, umont, unij, usc -- Tahoe-derived. 4.1.3 and 4.1.4 appear identical.
  VJ_1; VJ_2 -- Sites: lbl_2; lbli -- Experimental Reno variants developed by Van Jacobson. Neither a public release.

Table XV: TCP Implementations known to tcpanaly

All but Linux 1.0, Solaris 2.3 and 2.4, and SunOS 4.1 are some variant of ``Reno.'' SunOS 4.1 is a variant of ``Tahoe,'' a Reno predecessor, while the Linux and Solaris implementations were written independently of Reno and of each other.

11.2 Checking packet and measurement integrity

One often assumes that a trace produced by a packet filter sited at a TCP endpoint does indeed reflect the packets sent and received by the endpoint. In Chapter 10 we discussed some ways in which this assumption can be violated. Here we look at additional consistency checks tcpanaly uses to avoid misassumptions. Among all the traces in the study, we never observed any of the following:

1. Options present in the IP header.
2. A packet sent with more data than the MSS (§ 9.2.2).
3. A TCP connection-establishment option present in a non-establishment (non-SYN) packet.
4. An establishment (SYN) packet appearing after completion of the connection establishment handshake.
5. Illegal or unknown TCP header options.
6. SYN packets with other flags set. (We have seen this in other Internet traffic traces.)
7. IP fragments with the ``Don't Fragment'' bit set.
8. Non-TCP traffic (§ 10.3.8).
9. Illegal IP header lengths.
10. TCP ``simultaneous open'' [St94].

We did, however, occasionally observe the following:

1. Time travel (§ 10.3.7).

2. IP header checksum errors. tcpanaly verifies that the computed checksum for the IP header matches that in the header. This test never failed in N_1, but failed 17 times in N_2. All 17 occurrences were between the same pair of hosts (connix and nrao), and all of the IP headers flagged with errors suffered from corrupted (too large) length fields. These circumstances strongly suggest a faulty link somewhere in the middle of the path between connix and nrao, presumably the final hop in the path because otherwise an intermediary router should have discarded the packets. The corrupted length fields are consistent with CSLIP errors, as discussed in § 13.3.

3. TCP checksum errors. Packet traces generated by tcpdump have a snaplen that limits the amount of data recorded for each packet to the first n bytes (§ 10.2). The snaplen can greatly reduce the volume of data the packet filter must copy and record. But it means that, for TCP data packets longer than the snaplen, tcpanaly cannot compute the corresponding checksum and compare it to the value in the TCP header. tcpanaly can, however, checksum pure ack packets, since they completely fit within the snaplen used in our experiment. It does so unless the header checksum is exactly 2^16 - 1, which indicates that the sending host deferred the checksum computation, because the computation is done later by the network interface hardware.
Checksum errors in pure acks detected by this means are quite rare: 1 instance in N_1 and 26 instances in N_2. All but one of these latter involved lbli, which, as discussed in § 13.3, suffered from an atypically strong predilection for checksum errors. We discuss how to infer checksum errors in data packets below in § 11.4.2, and in § 13.3 we find that these are much more common than errors in pure acks.

An interesting question is whether tcpanaly ever falsely identified TCP checksum errors because a packet filter recorded a corrupted copy of a packet (while the receiving TCP received an uncorrupted copy). However, with corrupted packets removed from the analysis, tcpanaly still found that the receiving TCP behaved as expected, indicating that the packets were indeed corrupted and ignored by the TCP.

4. Truncated packets. These are packets that, according to the IP header, have a length of n bytes, but in fact, as delivered by the local link, had a length of only k < n bytes. There were 4 instances in N_1, 348 in N_2. The latter involved 8 different receiving hosts.

5. Illegal TCP header length. This is a TCP header length field that indicates a length less than the allowed minimum of 20 bytes. It indicates a corrupted packet. We observed only two instances, both in N_2.

6. IP fragments. We observed 5 instances in N_2 (none in N_1) of a packet arriving with an IP header indicating it was the beginning of a fragment. (The packet filter pattern we used precluded capture of any fragment portions other than the initial fragment.) Upon inspection, however, all of these were not actually bona fide IP fragments, but instead a repeated pattern of packet corruption: the packet was enlarged in flight from carrying 512 bytes of data to purportedly carrying either 980 bytes or 1460 bytes. Both of the latter are popular MTU values (§ 9.2.2). Their presence suggests a SLIP compression error, as discussed in more detail in § 13.3.

It is important for tcpanaly to detect corrupted packets, because they are discarded by the receiving TCP rather than processed by it. If tcpanaly misses such a corruption, then it can erroneously infer that the TCP failed to act when it should have. Thus, we believe the effort entailed in detecting the sometimes quite rare errors reported above is well worthwhile, especially because a priori we have no solid reason for assuming they are indeed rare.

11.3 Sender analysis

In this section we discuss how tcpanaly analyzes a TCP implementation's sender behavior: that is, the details of how the TCP reliably transmits data to the other endpoint. The sender behavior includes the TCP's congestion behavior, too: how the TCP responds to signals of network stress. Proper congestion behavior (§ 9.2.6) is crucial to assure the network's stability. The next main section (§ 11.4) then discusses how tcpanaly analyzes receiver behavior: when and how a TCP implementation chooses to acknowledge the data it receives. In general a TCP both sends and receives data. tcpanaly, however, only accurately analyzes unidirectional TCP transfers. Extending it to cope with bidirectional transfers would not be a major undertaking, but was not needed for our study and so was left for future work.

11.3.1 Data liberations

To accurately deduce the sender behavior of a TCP from a record of its traffic requires a packet trace captured from a vantage point (§ 10.4) at or near the TCP.
If the vantage point is distant from the sender (especially at the receiver), tcpanaly has no reliable means of distinguishing between measurement drops, anomalous TCP behavior, and true network drops. It also cannot distinguish lengthy latencies from the vantage point to the sending TCP's location, and a TCP that is simply slow to respond to the acknowledgements it receives. As discussed in x 10.4, even a vantage point quite close to the sender still can result in timing ambiguities. We accommodate this difficulty by introducing the notion of data liberations. Whenever an acknowledgement arrives, tcpanaly determines how it updates the offered window and the congestion window (x 9.2.2). If the new window values permit the TCP to send another packet(s), tcpanaly then notes which packets it should send. We term each such newly­allowed data packet a ``liberation.'' By noting the time at which new acks created liberations, tcpanaly can keep a list of all pending liberations and, when the TCP finally does send more data packets, determine their corresponding liberations. The difference in time between when the data packet was sent and when it was liberated then defines the response time of the TCP for that ack. Unusually large response times often indicate that tcpanaly has an incomplete understanding of the TCP's behavior, and that the delay was really because the purported ``liberating'' ack did not in fact liberate the data finally sent. It flags such instances so they can be inspected manually to determine the origins of the apparently imperfect behavior. Sometimes tcpanaly will observe a packet being sent that has no corresponding liber­ ation. We term this a ``window violation,'' because it indicates that the TCP exceeded either the congestion window or the offered window. In principle, tcpanaly should never observe a window violation if it correctly understands the operation of the sending TCP. Violations can still occur, how­ ever, if the trace suffers from measurement drops, or if the understanding of the TCP is incomplete or inaccurate. tcpanaly can use statistics of response times (minimum value, mean value) to compare how closely different candidate TCP implementations match a particular trace. If a candidate im­ plementation is indeed correct, then its response times will usually be quite small. If the candidate is incorrect, then the liberations tcpanaly computes for the implementation will not correspond to the times at which packets were actually liberated. The difference leads to either increased response times or window violations. Thus, depending on the relative response times and presence or lack of window violations, tcpanaly sorts candidate implementations into those that are close fits, those that are imperfect fits, and those that are clearly incorrect fits (for example, if it observes window violations). These last can also occur due to measurement drops, though, in that case, tcpanaly 148 usually rejects all of the candidate implementations. The process of coding into tcpanaly a new TCP implementation likewise relies on min­ imizing response time statistics and eliminating window violations. For example, we might begin by deriving a C++ class to encapsulate the new implementation I in terms of differences from the generic Reno class. We then run tcpanaly against a trace of I's sender behavior. 
If tcpanaly flags a window violation, we manually inspect the trace at the location of the violation (usually using a sequence plot; § 9.2.4) and attempt to determine a rule for how I differs from Reno at that point. Once all window violations have been eliminated, we then turn to the response time statistics. If the maximum response time is quite large, it usually indicates a congestion window that has opened up more slowly than expected, or a failure to take advantage of fast retransmit. Again, a sequence plot greatly aids in diagnosing the behavior. After identifying and codifying I's behavior, we test to assure that this has indeed lowered the response time. If so, we proceed to the next instance of a large response time, or the next trace of I's behavior. If the new TCP is close to one of the existing ones, this is a fairly quick process.

In addition to summarizing the amount of data newly allowed and when it became liberated, liberations include a set of zero or more attributes that describe how tcpanaly should interpret a failure of the TCP to promptly use the liberation:

Blameless due to SWS (Silly Window Syndrome) avoidance: TCPs are supposed to implement the SWS avoidance algorithm described in [Cl82, St94], which in some cases prevents them from sending data that they otherwise could. This attribute indicates that the TCP should not be blamed for failing to utilize the liberation, since the TCP's state after receiving the ack that created the liberation corresponds to one in which it should not send due to SWS avoidance.

Blameless due to PSH: When a TCP is sending data and has temporarily exhausted the available data, then the TCP marks the last packet it sends with the PSH (``push'') flag, informing the receiving TCP that it should not wait for any further data since none will be forthcoming for a while. Any ack received after a PSH packet was sent is marked as blameless-due-to-PSH, since the TCP might still not have any fresh data to send, and hence could reasonably ignore the opportunity created by the ack to send additional data.

Blameless due to no more data: tcpanaly has looked ahead and the sender will never have any more data to send, so the liberation can be safely ignored. This attribute is separate from the one above because TCPs do not always set PSH when all of the data for a connection has been sent.

Should not be missed: If true, then tcpanaly should specifically complain if the TCP fails to respond to the ack. An example is the third duplicate ack that, for many TCP implementations, triggers a ``fast retransmission'' sequence (§ 9.2.7). For those implementations, the fast retransmission should always occur.

These attributes guide tcpanaly in correctly assessing the sending TCP's response times. For ``blameless'' liberations, if the TCP's apparent response time is excessive, it is ignored.

There are many additional, minor details to tcpanaly's accurate management of liberations. We omit further discussion here in the interest of brevity. They are documented in the C++ code.
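The following C++ sketch illustrates the liberation bookkeeping just described. It is a minimal illustration under our own naming, not tcpanaly's actual data structures, and it elides the attribute-specific handling detailed above.

    #include <cstdint>
    #include <deque>

    // Hypothetical record of one pending "liberation": a data packet the
    // sender became entitled to transmit when a particular ack arrived.
    struct Liberation {
        uint32_t seq;               // sequence number of the newly allowed data
        double   when_liberated;    // time the liberating ack arrived
        // Attributes describing how to interpret a failure to use it promptly:
        bool blameless_sws        = false;  // SWS avoidance forbids sending
        bool blameless_psh        = false;  // sender may have no fresh data (PSH seen)
        bool blameless_no_data    = false;  // no more data will ever be available
        bool should_not_be_missed = false;  // e.g., third dup ack => fast retransmit
    };

    // When the sender finally transmits data with sequence number `seq` at
    // time `when_sent`, find its pending liberation and return the response
    // time.  A negative return value means no liberation covers this send --
    // a "window violation" in the text's terminology.
    double match_send(std::deque<Liberation>& pending,
                      uint32_t seq, double when_sent) {
        for (auto it = pending.begin(); it != pending.end(); ++it) {
            if (it->seq == seq) {
                double response = when_sent - it->when_liberated;
                pending.erase(it);
                return response;
            }
        }
        return -1.0;  // window violation: data sent that was never liberated
    }

Statistics over the returned response times (minimum, mean), together with the presence or absence of window violations, then drive the comparison among candidate implementations described above.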
11.3.2 Inferring sender windows

tcpanaly sometimes lacks critical information that affects the sending TCP's behavior. In this and the next two sections we discuss how it infers such information based on testing the directly available information for self-consistency.

In § 11.1 above we discussed the problem of determining whether the sending TCP has an unstated ``sender window,'' that is, a fixed limit on how many packets it can have in flight separate from its congestion window and the offered window (§ 9.2.2). In practice all TCPs have a sender window, namely the amount of buffer space they can commit for holding previously sent data until it is acknowledged. The key question, though, is whether this limit is ever smaller than the congestion window and the offered window. If so, then it is reasonable for the TCP to not send data even though from recent liberations it looks like it could. However, there is no obvious sign in a packet trace what the TCP's actual sender window is.

tcpanaly infers whether a sender window was in effect by calculating the maximum amount of data the connection ever had in flight. Then, during its second pass over the trace, if at some point the TCP's congestion window and the offered window would have allowed it to have sent a full segment (§ 9.2.2) more than this amount, but the TCP failed to do so, then the failure to send additional data was either due to a sender window, or to insufficient understanding of the TCP.^1 One clue sometimes present that the limitation was indeed a sender window is that often the sender window is the same as the offered window advertised by the sending TCP in the data packets it transmits to the receiver. tcpanaly can still make mistakes, however, particularly when it fails to realize that the reason the TCP did not transmit more data is not because of a sender window, but because of the arrival of a source quench (§ 11.3.3).

Footnote 1: A particularly easy error to make is to overlook the possibility that the TCP failed to send due to SWS avoidance.

11.3.3 Inferring source quenches

Unfortunately, the filter pattern we used to collect the traces in our study was limited to exactly the TCP packets used for each TCP transfer. This limit was imposed for security reasons, to guarantee that the packet filter making the trace could not be used (either accidentally, or maliciously, by a cracker) to spy on other network traffic using the same link. Usually, the TCP packets fully suffice for understanding the resulting TCP behavior. One exception, however, is if some element of the Internet infrastructure sends an Internet Control Message Protocol (ICMP; [Po81b]) message to the sending TCP instructing it to slow down. This message is called a ``source quench,'' and its packet format does not match the filter pattern used for our measurement, so our traces do not include any source quench ICMP messages.

TCP implementations vary on how they respond to source quench messages. In general, the TCP is supposed to diminish its sending rate. BSD-derived TCPs do so by entering a ``slow start'' phase (§ 9.2.4). Figure 11.1 shows an example of this happening. At time T = 11.2 the congestion window is five packets, so the ack at T = 11.25, which advanced the window by two packets, should have led to two additional packets being sent. None were, however. About 200 msec later another ack arrives and advances the window another two packets, yet only one packet is sent, as though the window were now only three packets. This would indeed be the case if a source quench had arrived between T = 11.2 and T = 11.25, setting the window to 1 packet. Due to slow start, the first ack (T = 11.25) would then have advanced the window to 2 packets, not enough to send any new data, and the second ack would have advanced it to three packets. Similarly, the ack around T = 11.6 advances the window to 4 packets, as can be seen in the plot.

[Figure 11.1: Sequence plot showing effects of unobserved source quench (sequence number vs. time)]

Solaris also enters slow start, but in addition it cuts ssthresh by a factor of two. Linux 1.0 diminishes the congestion window by one full segment (MSS).

tcpanaly infers the presence of a source quench as follows. Any time it detects a large lull between when a liberation is created and when the resulting packet was actually sent, it looks at the series of packets between the ack creating the liberation and the data packet ostensibly corresponding to the liberation, as well as the packets shortly after. If the whole series is consistent with slow start having begun (for no discernible reason) sometime between the ack and the data packet, then the trace is consistent with an unseen source quench. (This analysis does not work for Linux 1.0, since it does not enter slow start. Consequently, tcpanaly fails to infer source quenches for Linux 1.0.)

Source quenches are quite rare---they have been deprecated (§ 4.3.3.3 of [Ba95]), since generating extra network traffic during a time of heavy load violates fundamental stability principles---but they do happen. In N_1, tcpanaly inferred a total of 26 source quenches in 20 different traces. Almost all of these included bnl as sender (one time as receiver), suggesting that a router near it still generates source quenches when stressed. Likewise, tcpanaly inferred 65 source quenches in 64 different N_2 traces, almost all of which involved connix or austr2. The connix source quenches are quite striking in their regularity: the time they arrived after the beginning of the connection was always between 500 msec and 1 sec, with a median and mean of 750 msec. The connections further exhibit a strikingly regular pattern of the connix TCP opening its congestion window to about 2^15 bytes just before the source quench is sent, suggesting that it is single-handedly stressing a particular nearby router. We note that often the source quenches inferred by tcpanaly are almost immediately followed by retransmissions, indicating that the router sending them is indeed almost overwhelmed. We can see this phenomenon at the end of Figure 11.1.

We also note that tcpanaly's analysis of possible source quenches is only heuristic. In particular, if a source quench is followed by a retransmission timeout or a second source quench, then tcpanaly will not find an exact match to a slow-start sequence following the first source quench, and does not infer that a source quench occurred.

11.3.4 Inferring initial ssthresh

The final inference done by tcpanaly is determining whether the sending TCP has an initial limit on ssthresh. Recall from § 9.2.6 that the TCP state variable ssthresh determines when the TCP should switch from ``slow start,'' in which the congestion window begins at only 1 packet but rapidly expands, to ``congestion avoidance,'' in which the window increases less quickly. Usually, when a new TCP connection begins, its ssthresh variable is initialized to the equivalent of ``infinity,'' allowing it to rapidly probe for the presence of arbitrarily high available bandwidth. (Exceptions are Solaris, which initializes ssthresh to 8 packets, and Linux, which sets it to a single packet.)
Sometimes, however, the TCP implementation first inspects its route cache for information about previous connections to the same remote host. These implementations then initialize ssthresh based on the congestion conditions previously encountered.

tcpanaly needs to be able to detect when the initial ssthresh is lower than normal, because otherwise it will erroneously conclude that the sending TCP is very slow in responding to the acks that would normally---due to slow start---have opened up the congestion window beyond the hidden initial ssthresh limit. It does so in a fashion similar to inferring source quenches (§ 11.3.3). Any time the TCP appears to take too long to respond to a liberation, if the TCP has not already undergone a retransmission (which would have altered ssthresh anyway) then tcpanaly looks ahead to see whether the series of packets beyond the point of the apparent lull is consistent with congestion avoidance rather than slow start. If so, it infers that the connection had an atypical initial value for ssthresh.

It turns out that only the experimental VJ_1 and VJ_2 TCPs exhibit non-default initial ssthresh values.^2 Other TCPs may in the future exhibit different initial ssthresh's, too, as a recent proposal for improving TCP's start-up behavior includes setting the initial ssthresh based on measurements of the connection's first few packets [Ho96].

Footnote 2: The HP/UX implementations appeared to, also, but so rarely that we cannot determine whether a different, not yet determined mechanism is leading to the early onset of congestion avoidance.

11.4 Receiver analysis

In this section we discuss how tcpanaly analyzes a TCP implementation's receiver behavior, namely when and how the implementation chooses to acknowledge the data it receives.

11.4.1 Ack obligations

Similar to the notion of data liberations (§ 11.3.1), when analyzing receiver behavior tcpanaly addresses vantage point problems (§ 10.4) by keeping track of a list of pending ack obligations. Whenever a TCP receives data, it incurs some sort of obligation to generate an acknowledgement in response to that data. The obligation may be optional or mandatory, as discussed below. tcpanaly has a default set of rules for the sorts of obligations created by different types of packets. It then includes additional rules for specific implementations that do not follow the default set, as discussed in § 11.6. In our discussion of different types of ack obligations below, we also detail tcpanaly's corresponding default rules.

Optional ack obligations

An optional ack obligation refers to data that the TCP may choose to acknowledge but can also wait before acknowledging. This occurs when new data arrives that is in sequence. The TCP standard states that a TCP may refrain from acknowledging such data in the hopes that additional data may arrive and the acknowledgements combined, but for no longer than 500 msec (§ 4.2.3.2 of [Br89]). Furthermore, a correct TCP implementation should always generate at least one acknowledgement for every two packets' worth of new data received.^3 Acknowledgement strategy is further discussed in [Cl82].

Footnote 3: § 4.2.3.2 of [Br89] expresses this as ``SHOULD,'' while § 4.2.5 notes it as ``MUST.''

tcpanaly considers the arrival of any new, in-sequence data as creating an optional ack obligation, even if more than one such packet has arrived and not yet been ack'd. When an acknowledgement is finally generated for the new data, we then inspect the number of packets acknowledged to see whether the TCP has heeded the suggested limit of one ack for every two packets.
tcpanaly reports instances in which the limit is violated, but considers this different than a failure to meet a mandatory ack obligation, discussed in the next section.

Mandatory ack obligations

A mandatory ack obligation occurs when a packet arrives to which the TCP standard requires the receiving TCP to respond with an acknowledgement. In the original TCP specification, the receipt of a packet containing already-acknowledged data mandated that a new acknowledgement be sent, since the unnecessary retransmission indicates that the sender may be confused as to what data the receiver has successfully received. This was clarified in § 4.2.2.21 of [Br89] to also optionally include the receipt of packets whose data cannot yet be acknowledged due to a sequence ``hole'' below the packet's sequence, in order to facilitate ``fast retransmission'' (§ 9.2.7).

Consequently, tcpanaly considers the arrival of any out-of-sequence data as creating a mandatory ack obligation. (The mandatory obligation is not to ack the out-of-sequence data, but instead to generate a cumulative ack for all in-sequence data received, since TCP acknowledgements always reflect the extent of cumulative, in-sequence data received, per § 9.2.1.) tcpanaly keeps track of statistics concerning how often and how quickly an implementation responds to mandatory obligations separately from those for optional obligations.

Gratuitous acks

If tcpanaly observes an ack being sent for which there was no obligation, and which does not change the offered window or terminate the connection, then it flags the ack as gratuitous. Observing gratuitous acks plays a role analogous to observing window violations when analyzing a sender's behavior: they can indicate confusion regarding tcpanaly's interpretation of the TCP's behavior, or measurement errors in the packet trace.
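The following C++ sketch shows one way the optional/mandatory distinction can be tracked. It is only an illustration under our own naming and simplifications (sequence-number wraparound, the 500 msec bound, and the one-ack-per-two-packets check are all omitted), not tcpanaly's code.

    #include <cstdint>
    #include <vector>

    // Hypothetical bookkeeping for pending acknowledgement obligations.
    enum class AckObligation { Optional, Mandatory };

    struct PendingAck {
        AckObligation kind;
        uint32_t      cumulative_ack;  // ack value that would discharge it
        double        when_incurred;   // arrival time of the data creating it
    };

    struct ReceiverState {
        uint32_t next_expected = 0;    // left edge of the in-sequence data
        std::vector<PendingAck> pending;
    };

    // Classify an arriving data packet: in-sequence data creates an optional
    // obligation; duplicate or above-sequence data creates a mandatory one
    // (the obligation being a cumulative ack for the in-sequence data).
    void on_data_arrival(ReceiverState& st, uint32_t seq, uint16_t len, double now) {
        if (seq == st.next_expected) {
            st.next_expected += len;
            st.pending.push_back({AckObligation::Optional, st.next_expected, now});
        } else {
            st.pending.push_back({AckObligation::Mandatory, st.next_expected, now});
        }
    }

    // When the receiver finally emits an ack, discharge every pending
    // obligation it covers and report how long each one waited.
    std::vector<double> on_ack_sent(ReceiverState& st, uint32_t ack, double now) {
        std::vector<double> delays;
        std::vector<PendingAck> still_pending;
        for (const PendingAck& p : st.pending) {
            if (ack >= p.cumulative_ack) delays.push_back(now - p.when_incurred);
            else still_pending.push_back(p);
        }
        st.pending.swap(still_pending);
        return delays;
    }

A fuller analysis would also keep the delay statistics for optional and mandatory obligations separately, as tcpanaly does, and would flag acks that correspond to no pending obligation as gratuitous.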
11.4.2 Inferring checksum errors

As noted in § 11.2, tcpanaly often cannot verify a packet's TCP checksum because the packet filter only records the beginning of the packet and not its entire contents. Nevertheless, checksum failures do indeed occur, and when they do tcpanaly needs to deduce their presence to avoid misattributing the receiving TCP's behavior to something else.

There are several situations in which tcpanaly infers the possibility that a packet received earlier had a checksum error (and thus the subsequent ack obligations derived from the trace do not correctly reflect the situation as perceived by the receiving TCP):

1. If a retransmission is received for data already apparently received by the TCP, and which should have previously been ack'd by the TCP but was not, and if all sequentially earlier data has been ack'd;

2. if instead of acking increasing sequence numbers in response to a series of optional ack obligations, the TCP generates duplicate acks as each new packet arrives, until the retransmission called for by the duplicate acks arrives; or,

3. if an apparently unnecessary retransmitted packet actually results in an advance of the acknowledged sequence number, indicating that the retransmission did indeed fill a sequence hole. (This item is slightly different from the first item, because here we are considering data that originally arrived above-sequence, and so could not be acknowledged directly at that time.)

More precisely, what tcpanaly really infers is that the TCP acted as though it ignored an arriving packet. We then assume that the packet was ignored because it failed its checksum test. We return to this point in more detail later.

tcpanaly does not attempt to infer checksum errors in traces recorded by packet filters that it has determined either dropped (§ 10.3.1) or resequenced (§ 10.3.6) packets, since it is too difficult with these traces to disambiguate between a genuine checksum failure and seemingly confusing TCP behavior because the trace is inaccurate.

Figure 11.2 shows a sequence plot reflecting two checksum errors. The plot comes from a trace recorded at the receiving end of a connection. Consequently, most of the points showing acknowledgements lie directly on top of the data packets being acknowledged and thus do not show up visually. (This is fine for the purposes of this example.) Up through time T = 20.0 the data all arrives in sequence, but starting at time T = 19.5 the receiving TCP generates duplicate acks for sequence 74,241 rather than advancing the acknowledgements. This continues until data packet 74,241 is retransmitted at T = 20.2. The retransmission leads to the TCP immediately acking all of the outstanding data, fully consistent with a single checksum error occurring at the 74,241 data packet. Note that, after the retransmission, the pattern repeats at time T = 20.5. Duplicate acks for sequence 78,849 indicate that the 79,361 packet was likewise discarded due to a checksum error.

[Figure 11.2: Receiver sequence plot showing two data checksum errors (sequence number vs. time)]

Figure 11.3 shows a sequence plot of a considerably different instance of checksum errors. Unlike in Figure 11.2, where two isolated packets were corrupted, here an entire burst of 9 packets was discarded by the receiving TCP. We can tell that the TCP did not accept the nine packets from 26,281 to 37,961 at T = 3.7 to T = 5.2 sec, because as the data is retransmitted the TCP only acknowledges the newly retransmitted packets---they are not shown filling any sequence ``holes'' as would be the case if some of the 9 packets had been successfully received. We further discuss checksum bursts such as this one, as well as detailing the prevalence of checksum errors in our datasets, in § 13.3.

[Figure 11.3: Sequence plot showing a burst of checksum errors (sequence number vs. time)]

As noted above, what tcpanaly really infers are packets ignored by the receiver, which we then assume were ignored due to checksum errors. It is possible that the packets were ignored for a different reason, such as the kernel lacking sufficient buffers to keep them until the receiving TCP could process them. In § 13.3 we investigate this possibility and find that almost all of the errors appear indeed due to packet corruption.

11.5 Sender behavior of different TCP implementations

In this section we look at the variations in how the different TCP implementations listed in Table XV act when sending data. Our findings in this section and the next are almost all based on the modifications we had to make to tcpanaly in order for it to successfully match the traces of the TCP's behavior. A few other behaviors were discovered by examining source code for the implementations, which we had for Linux 1.0, Solaris 2.5, VJ_1 and VJ_2, as well as the invaluable source code analysis of Net/3 in [WS95].
In addition, in x 11.7 we present brief findings of behavior observed for three other implementations; these were determined by manually studying sequence plots, as tcpanaly does not have the behavior of these implementations coded into it. TCP behavior is very complex, and we do not attempt to exhaustively examine it. Our main interest is in performance and congestion behavior: does the TCP implementation use the net­ work as effectively as it can, and does it correctly adapt to congestion by decreasing its transmission rate, as is required for global Internet stability? There is a natural tension between these two goals, and a great deal of research has gone into tuning TCP so it balances high performance with stable behavior in the presence of congestion. One of the basic questions we would like to answer in this section is how successfully this research has in fact been incorporated into TCP implementations deployed in the Internet. The answer turns out to be ``quite mixed.'' We proceed as follows. First, we give an overview of previous work in analyzing the behavior of TCP implementations. The work focuses almost entirely on sender behavior. Next, we present the sender behavior of the implementations in our study, beginning with two ``generic'' implementations, ``Tahoe'' and ``Reno,'' from which almost all of the implementations derive their behavior. We then discuss each of the different implementations in Table XV. After analyzing sending behavior, we turn in x 11.6 to receiver behavior, namely the policy by which the TCP sends acks. Finally, we look in x 11.7 at the behavior of some additional TCP implementations: Windows 95, NT, and Trumpet/Winsock. This last investigation was motivated by our finding that the independently written TCP implementations in our study (Linux and Solaris) suffered from serious congestion and performance problems. We were interested to see whether other non­Reno­ derived TCP implementations likewise have these sorts of problems. The answer turns out to be: yes! 156 11.5.1 Previous studies of TCP implementations Several researchers have previously studied and characterized the behavior of TCP imple­ mentations, using different techniques from ours. Comer and Lin Comer and Lin studied TCP behavior using a technique termed active probing [CL94]. Active probing consists of treating a TCP implementation as a black box and observing how it reacts to external stimuli, such as a loss of connectivity to the other endpoint, or a failure by the other endpoint to consume data sent by the TCP under study. They examined five implementa­ tions, IRIX 5.1.1, HP­UX 9.0, SunOS 4.0.3, SunOS 4.1.4, and Solaris 2.1, to determine their initial retransmission timeout values, ``keep­alive'' strategies, and zero­window probing techniques. The authors' emphasis was on correctness in terms of the TCP standards, and they found several imple­ mentation flaws. Brakmo and Peterson Brakmo and Peterson analyzed performance problems they found in TCP Lite, a widely­ used successor to TCP Reno (and the basis for some of the implementations in our study) [BP95b]. TCP Lite is also known as ``Net/3,'' which is the term we will use for consistency with other studies we discuss. Their approach was to simulate Net/3's behavior using a simulator based on the x­kernel [HP91]. The x­kernel is highly configurable, so that the simulations actually directly executed the Net/3 code, an important consideration for assuring accuracy. They found: 1. An error in the ``header prediction'' code. 
Net/3 uses this code to make an early decision whether an incoming packet is what would have normally been expected: either an in-sequence, non-retransmitted data packet, or an ack for new data that does not change the size of the offered window [CJRS89]. If the packet matches the expectation, then it can be processed succinctly; for example, without all the computations necessary to update the congestion window. The error they found was that the code considered an incoming acknowledgement as expected even if the congestion window had been inflated due to ``fast recovery'' (§ 9.2.7). Thus, if after fast recovery the acknowledgements all passed the header prediction test, then the window was never deflated. Fixing this problem is a one-line addition to the prediction code.

2. Inaccuracies computing the retransmission timeout (RTO) due to details in some of the integer arithmetic used to approximate the true real-numbered calculations. The authors proposed altering the scaling used in the integer arithmetic to remedy the inaccuracy.

3. Confusion between whether the ``maximum segment size'' variable used to decide when to send new acknowledgements and how to update the congestion window should include the size of TCP header options or not.

4. Very bursty behavior when the offered window advances a large amount (an incoming ack for a large amount of new data). When this occurs, Net/3 (and, in our experience, all other TCPs) immediately sends as many packets as the new window allows. The authors include a small coding addition that would reduce such bursts to 2 or 3 packets at a time.

5. A ``fencepost'' error in determining whether the congestion window was inflated due to fast recovery, and later needs deflating. The fix is replacing a > test with a >= test.

Of these problems, we found that a number of the implementations in our study exhibited all of them, except we did not examine the RTOs used by the implementations and thus did not have an opportunity to observe the second problem.

Stevens

In [St96], Stevens devotes a chapter to an analysis of the behavior of a large number of TCP connections made to a World Wide Web server running Net/3 TCP. The analysis was based on a 24 hour tcpdump packet trace of 147,103 attempts by remote sites to connect to the Web server. He characterized the range of options offered by the remote TCPs, finding tremendous variation (including many obviously incorrect values); the rate at which connection attempts and re-attempts arrived; the variation in round trip time between the server and the remote clients; and the pending-connection load on the server.

In addition, he analyzed three Net/3 implementation bugs, one in which two different TCP connection states become confused (``SYN received'' and ``performing keep-alive probe''), one in which the TCP fails to time out zero window probes (and thus over time devotes more and more resources to zero window probes for connections that have permanently lost connectivity), and one in which the TCP can skip the first cycle of ``slow start'' if it happens to have data ready to send upon connection establishment.

He further found that almost 10% of all SYN packets were retransmitted; some remote TCPs sent ``storms'' of up to 30 SYNs/sec, all requesting the same connection; and some remote TCPs did not correctly back off their connection-establishment retry timer, or reset it after 4 attempts.
Dawson, Jahanian and Mitton

In recent work, Dawson, Jahanian and Mitton studied six TCP implementations using a ``software fault injection'' tool they developed [DJM97]. The implementations were: SunOS 4.1.3, AIX 3.2.3, NeXT (Mach 2.5), OS/2, Windows 95, and Solaris 2.3. The first and last were also present in our study; the remainder were not. Their basic approach is a refinement of Comer and Lin's ``active probing'' (§ 11.5.1). They use the x-kernel to interpose a general purpose packet manipulation program between the TCP implementation and the actual network, so they can arbitrarily alter, delay, reorder, replicate, or discard any packets the TCP sends or receives.

The main focus was on timer management. They found that retransmission sequences vary a great deal; that some TCPs do not correctly terminate the connection with a RST packet if the maximum retransmission count is reached; and that Solaris 2.3 uses a much lower bound for its initial RTO, around 300 msec, than the other implementations, and also takes much longer to adapt the RTO to higher, measured RTTs. We further discuss both of these latter problems in § 11.5.10.

They also studied keep-alive behavior. ``Keep-alives'' are an optional TCP mechanism for probing idle connections to ensure that the network path still provides connectivity between the two endpoints. The TCP standard specifies that, if a TCP supports keep-alives, then, by default, the idle interval must be at least two hours before the TCP begins probing the path. However, the authors of [DJM97] found that OS/2 begins keep-alive after only 800 sec. In addition, Windows 95 only makes four keep-alive probes, all sent one second apart. If none of these elicit replies, then it abandons the connection. This latter behavior will make Windows 95 connections quite brittle in the face of mid-sized connectivity outages.

Finally, they found that Solaris 2.5.1 (not otherwise part of their study) incorrectly implements Karn's algorithm, which is used to disambiguate round-trip time measurements [KP87].

11.5.2 Generic Tahoe behavior

The goal of our TCP behavior analysis is to delve considerably deeper into the performance and congestion behavior of the different TCPs in our study than done previously. We begin by discussing the generic TCP ``Tahoe'' implementation that tcpanaly uses as a building block for describing the behavior of all of the TCP implementations except Linux 1.0.

Our Tahoe implementation reflects the behavior of the Tahoe version of BSD TCP, released in 1988 [St96, p. 27]. It includes slow start (§ 9.2.4), congestion avoidance (§ 9.2.6), and fast retransmission (§ 9.2.7), but not fast recovery (§ 9.2.7). It updates the congestion window upon the receipt of any ack for new data. It sets ssthresh to half the effective window upon a retransmission, but for fast retransmit it rounds the result down to a multiple of the Maximum Segment Size (MSS; § 9.2.2), while for a timeout it does not. No doubt this inconsistency is due to the fast retransmit code having been added later than the original timeout code. In both cases, ssthresh is never set lower than 2 · MSS.

Tahoe updates the congestion window cwnd using congestion avoidance if cwnd is strictly larger than ssthresh. The increase is:

    \Delta W = \left\lfloor \frac{\mathrm{MSS}^2}{\mathit{cwnd}} \right\rfloor    (11.1)

without any additional constant term (cf. Eqn 11.2 below).
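To make the window-update rule concrete, the following C++ sketch shows the Tahoe-style behavior as modeled by Eqn 11.1. It is our own illustrative rendering with hypothetical variable names, not code from any of the implementations studied; the Reno variant discussed next adds a further floor(MSS/8) term (Eqn 11.2).

    #include <algorithm>
    #include <cstdint>

    // Illustrative model of the Tahoe-style congestion window update
    // on receipt of an ack for new data.  Names are ours.
    struct CongestionState {
        uint32_t cwnd;      // congestion window, in bytes
        uint32_t ssthresh;  // slow-start threshold, in bytes
        uint32_t mss;       // maximum segment size, in bytes
    };

    void on_new_ack(CongestionState& s) {
        if (s.cwnd <= s.ssthresh) {
            // Slow start: open the window by one MSS per ack for new data,
            // roughly doubling it every round-trip time.
            s.cwnd += s.mss;
        } else {
            // Congestion avoidance, per Eqn 11.1: Delta W = floor(MSS^2 / cwnd),
            // roughly one MSS per round-trip time.  Integer division gives
            // the floor.
            s.cwnd += (s.mss * s.mss) / s.cwnd;
        }
    }

    // On a timeout, ssthresh drops to half the effective window (never
    // below 2*MSS) and slow start begins again from one segment.  For a
    // fast retransmit, Tahoe additionally rounds ssthresh down to a
    // multiple of the MSS, as noted in the text.
    void on_retransmit_timeout(CongestionState& s, uint32_t effective_window) {
        s.ssthresh = std::max(effective_window / 2, 2 * s.mss);
        s.cwnd = s.mss;
    }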
11.5.3 Generic Reno behavior

The ``Reno'' version of BSD TCP was released in 1990. Our generic Reno implementation does not attempt to precisely describe that release, but instead to provide a common base from which we can express as variants the numerous Reno-derived implementations in our study. Reno differs from Tahoe as follows:

1. It implements fast recovery (§ 9.2.7), in which following a fast retransmit it inflates the congestion window cwnd and will send additional packets if enough additional duplicate acks arrive.

2. It consequently suffers from the ``header prediction'' and ``fencepost'' errors when deflating the window, as previously described in [BP95b] (§ 11.5.1).

3. It rounds ssthresh down to a multiple of MSS for timeout retransmissions as well as fast retransmits.

4. It includes an additive constant when increasing the window during congestion avoidance. That is, instead of using Tahoe's increase as given in Eqn 11.1, it uses:

    \Delta W = \left\lfloor \frac{\mathrm{MSS}^2}{\mathit{cwnd}} \right\rfloor + \left\lfloor \frac{\mathrm{MSS}}{8} \right\rfloor    (11.2)

The extra term MSS/8 leads to a super-linear increase of the congestion window during congestion avoidance. Subsequent to its addition to Reno, this extra term has come to be viewed as too aggressive ([BP95b], credited to S. Floyd in footnote 6 of that paper), but its presence is widespread.

11.5.4 BSDI TCP

We had several BSDI 1.1 and 2.0 sites in our study, as well as one site running an alpha release of 2.1, which we term 2.1ff. BSDI 1.1 appears identical to our generic Reno implementation.

We observed two changes with BSDI 2.0. The first is that it omits the extra congestion avoidance increment (i.e., it uses Eqn 11.1 rather than Eqn 11.2). The second is that it computes the MSS governing how much data it should send in each TCP/IP packet in a slightly complicated fashion, as follows. When initiating a connection, BSDI 2.0 includes the ``window scaling'' and ``timestamp'' options in its initial SYN packet. If the remote peer agrees to these options in its SYN-ack, then each subsequent packet sent by BSDI 2.0 includes an accompanying timestamp in its header. With padding, this option requires an additional 12 bytes of space in the header. If, for example, the MSS is 512 bytes, as is often the case, then the TCP should send 512 bytes of data in each packet along with 52 bytes of header, the usual 40 bytes of TCP/IP header plus the timestamp option. Instead, it uses an MSS of 500 bytes.

The fundamental problem^4 is that the implementation is overloading the notion of ``MSS,'' trying to make it serve as both the maximum amount of data to send to the receiver in one packet, and also as the largest total TCP/IP packet size that can be sent along the Internet path without incurring fragmentation. Yet, the presence of options means the relationship between these two is more complex than simply adding in a constant header size. To further complicate matters, BSDI 2.0 uses the unadjusted MSS (i.e., its value before deducting 12 bytes for options) in its congestion window computations. None of these MSS fine points has much impact at all on BSDI 2.0's performance or congestion behavior. But they do subtly alter the conditions under which the TCP will send packets, and thus solid analysis of the TCP's behavior must take them into account.

Footnote 4: Pointed out to me by Matt Mathis.
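The MSS bookkeeping above is easy to get wrong. The following small C++ illustration (our own, with hypothetical names) spells out, using the 512-byte example from the text, the difference between the per-packet data limit and the total packet size once the 12-byte timestamp option is in use.

    #include <cstdio>

    // Illustration of the two quantities that BSDI 2.0 conflates under "MSS".
    // Numbers follow the 512-byte example in the text; names are ours.
    int main() {
        const unsigned negotiated_mss   = 512;  // data bytes per segment agreed at setup
        const unsigned base_header      = 40;   // usual TCP/IP header
        const unsigned timestamp_option = 12;   // timestamp option, with padding

        // Per the discussion in the text, the sender should keep sending
        // negotiated_mss bytes of data, accepting a larger total packet.
        unsigned correct_total = negotiated_mss + base_header + timestamp_option;

        // What BSDI 2.0 does instead: shrink the payload so the data plus
        // the option still fit within the negotiated MSS.
        unsigned bsdi_payload = negotiated_mss - timestamp_option;  // 500 bytes

        std::printf("expected: %u data + %u header = %u bytes per packet\n",
                    negotiated_mss, base_header + timestamp_option, correct_total);
        std::printf("BSDI 2.0: %u data + %u header = %u bytes per packet\n",
                    bsdi_payload, base_header + timestamp_option,
                    bsdi_payload + base_header + timestamp_option);
        return 0;
    }

As the text notes, the performance impact is negligible; the point is that an analysis program must model exactly which MSS value governs packet sizes and which governs the congestion window computations.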
The second is that, if the remote TCP does not include an MSS option in its SYN-ack reply to the BSDI TCP's initial SYN packet, then the congestion window and ssthresh are initialized to a huge value [5] instead of MSS bytes. This bug occurs because of an assumption in the Net/3 code that SYN-acks will always include MSS options, and that therefore receiving a SYN-ack is the proper time to initialize cwnd and ssthresh.

[Footnote 4] Pointed out to me by Matt Mathis.
[Footnote 5] Specifically: 2^30 - 2^14. See [WS95, p.835].

[Figure 11.4: Sequence plot showing the Net/3 uninitialized-cwnd bug]

Figure 11.4 dramatically illustrates the potential burstiness created by this bug. Here, when the initial ack arrives offering a window of 16,384 bytes (and with no MSS option), the BSDI TCP instantly sends all the full-sized (536 bytes, in this case) packets that fit within the window, a total of 30 packets. The next ack (which was sent because it updates the advertised window) offers a larger window (cf. § 9.3), and again the TCP floods the network with packets, taking advantage of the increased window. A third ack arrives but does not advance the window, so nothing further is sent. Ironically, even the first packet of the storm was lost (as was its retransmission), as can be seen by the lack of progress in the acknowledgements. All told, 14 of the 61 packets sent in the first two spikes were lost (any other connections sharing the path between the two TCPs also surely suffered).

Fortunately, it is relatively rare that this bug manifests itself so dramatically. It requires interaction between the BSDI TCP and a remote TCP that both does not send MSS options in its SYN-ack, and offers a large window. TCPs that do not offer MSS options tend to be of quite old vintage, and these tend to offer small receiver windows. The bug does not always manifest itself under the conditions given above. We suspect that the times it does not are when the BSDI TCP finds initial cwnd and ssthresh values in its route cache, and thus begins the new connection with tamer values.

This bug nicely illustrates the fundamental tension between TCP performance and congestion behavior. Fixing it lessens the TCP's performance (blasting out 30 packets at a time can work extremely well in making sure one utilizes all available bandwidth), but also makes the TCP much more ``congestion friendly.''

Finally, we note that the IRIX 5.2 TCP implementation also exhibits this bug, as does Net/3. Most likely both BSDI 2.1ff and IRIX 5.2 ``inherited'' the bug as they incorporated enhancements and changes from Net/3.

[Figure 11.5: Sequence plot showing the HP/UX congestion window advance with duplicate acks]

11.5.5 Digital OSF/1 TCP

Digital's OSF/1 TCP implementation appears virtually identical to our generic Reno implementation. The only difference we observed was that it does not always manifest the ``header prediction'' bug (§ 11.5.1). We could not find a pattern to when it would and when it would not. For analyzing a given trace, tcpanaly accommodates its inability to know whether the TCP will exhibit the bug by looking ahead to determine whether in fact the TCP deflated the congestion window. We did not observe any differences between Digital OSF/1 versions 1.3a, 2.0, 3.0, and 3.2.

11.5.6 HP/UX TCP

HP/UX 9.05 TCP is very similar to our generic Reno implementation.
The only dif­ ferences we observed were two behaviors that rarely have an opportunity to manifest themselves. First, HP/UX 9.05 does not clear its ``dup­ack'' counter (x 9.2.7) when a timeout occurs, so if it receives additional duplicate acknowledgements after a timeout, these can lead to fast retransmit or the sending of additional fast recovery packets. Second, such duplicate acks also advance the congestion window, providing that the timeout was for a segment previously retransmitted using fast retransmission. We illustrate this latter behavior in Figure 11.5, since it is somewhat unusual. The stream of acks along the bottom of the figure are all duplicates. The packet they call for has already been retransmitted, but was dropped. The data packets sent around T = 18:8 with sequence numbers near 55,000 are fast recovery packets, sent out by inflating cwnd. Just before T = 19:0, the previously­ retransmitted packet times out and is retransmitted again. As more dups arrive (from an earlier huge flight of packets), each liberates another retransmission via fast recovery. This is not ideal behavior: the packets being retransmitted may all have already arrived at the receiver. The TCP should instead 162 either send additional new data, as it was doing prior to the timeout (and which is the intent behind fast recovery, thwarted by the timeout having reset cwnd), or simply wait one RTT to see what data the peer has now received. HP/UX 10.00 behaves identically to HP/UX 9.05 except it advances the congestion win­ dow (per Figure 11.5) for dup acks received after any timeout, not just one of a packet previously transmitted using fast­retransmission; and it uses the original MSS it offered to its peer when com­ puting congestion window updates, rather than the final MSS negotiated during the connection setup. 11.5.7 IRIX TCP IRIX 4.0 appears identical to our generic Reno implementation except it does not manifest the header prediction bug (x 11.5.1). IRIX 5.1 does, though not always, the same as Digital OSF/1 TCP (x 11.5.5). IRIX 5.2 is identical to IRIX 5.1 except it also exhibits the uninitialized­cwnd bug shown in Figure 11.4. IRIX 5.3 is identical to IRIX 5.2 except that, if the remote peer does not include an MSS option in its SYN­ack, then IRIX 5.3 initializes the congestion window to the value it offered, even if this is larger than the final MSS it used. 6 11.5.8 Linux TCP The Linux 1.0 TCP implementation was written independently from any other. Conse­ quently, it is not surprising that it differs in many ways from the others in our study, including some ways that are particularly significant. The most significant is its broken retransmission behavior. First, often when it decides to retransmit, it re­sends every unacknowledged packet in a single burst. Second, it decides to retransmit much too early, leading it to retransmit packets for which acks are already heading back, or, even worse, which are themselves still in flight towards the receiver. 7 Jacobson terms this sort of behavior ``the network equivalent of pouring gasoline on a fire'' [Ja88], because it unnecessarily consumes network resources precisely when they are scarce. Consequently, it can lead to congestion collapse, in which the network load stays extremely high but throughput is reduced to close to zero [Na84]. Figure 11.6 illustrates Linux 1.0's behavior. At about T = 85 an acknowledgement arrives advancing the window by three packets, which the TCP immediately sends. 
At T = 86, however, two duplicate acks arrive, the first of which spurs the TCP to retransmit every packet it has in flight. Shortly after, an ack arrives for sequence 77,825; this correctly liberates only new data, as does the ack for 78,849 that follows momentarily. None of the new data arrives successfully---the network is already clogged with the unnecessary retransmissions. At T = 87.8, sequence 79,361 times out and is retransmitted. This happens again at T = 90.6 (the timeout is not fully doubling as it backs off, though in other cases it does).

At T = 92, dup acks for 78,849 arrive. These were sent within 400 msec of the ack received at T = 86.4, but took more than 5 seconds to arrive, indicating huge delays in the network. The TCP appears to ignore their arrival, however (so would a Reno TCP), but when the twice-retransmitted data packet is ack'd a little while later, again all data in flight is retransmitted, and again 1.3 sec later, and again 1.1 sec later. Worse, not only is all of this data being retransmitted at about 1 sec intervals: if we blow up the activity (Figure 11.7), we see the packets are also being retransmitted on much finer time scales! [8]

[Footnote 6] The offered MSS can differ from the final MSS used because, if the remote peer does not include an MSS option, then the TCP must use an MSS of no more than 536 bytes (§ 4.2.2.6 of [Br89]).
[Footnote 7] These retransmissions usually occur shortly after receiving an ack, suggesting that they are not timeout retransmissions per se, but are stimulated instead by the arrival of the ack.

[Figure 11.6: Sequence plot showing broken Linux 1.0 retransmission behavior]
[Figure 11.7: Enlargement of the righthand side of the previous figure]

All told, this connection sent 317 packets, 117 of them retransmissions. 20% of the packets were dropped by the network. Of the retransmitted packets that reached the other end, 60% were superfluous, since the data had already arrived safely in an earlier packet. How hard this connection hammered others sharing the network path, we can only guess! But it is clear that, if Linux 1.0 were ubiquitous, its retransmission behavior would bring the Internet to its knees.

The excessive retransmissions clearly follow shortly after the TCP receives an ack, so tcpanaly models them as a type of ``fast retransmission.'' We have been unable to determine exactly which incoming acks will trigger these retransmissions, though they appear to occur only for duplicate acks or acks received during a retransmission sequence. Consequently, tcpanaly simply allows that either of these might potentially liberate the entire window for retransmission. The Linux TCP maintainers are aware of this problem and report that it has since been fixed.

Linux 1.0 differs from the other implementations in our study in several other ways:

1. It does not implement fast retransmission or fast recovery.

2. It initializes ssthresh to a single packet (MSS), which makes it slow to initially open its window. This behavior is beneficial from the perspective of network stability, as it means that Linux 1.0 TCP connections begin in a fundamentally conservative fashion.

3. The Linux 1.0 code has logic in it to prevent more than 2,048 bytes from ever being in flight, quite conservative behavior. However, a typo [9] renders it ineffective.

4. It does not round ssthresh down to a multiple of MSS for any form of retransmission.

5. Its test for slow start is cwnd < ssthresh rather than cwnd ≤ ssthresh.
6. In congestion avoidance, it counts the number of acks received, and, when they exceed cwnd divided by MSS, cwnd is advanced by MSS and the counter reset to zero.

7. It has no minimum value on how far it can cut ssthresh.

8. It acks every packet received (§ 11.6).

[Footnote 8] We have observed Linux 1.0 retransmitting a packet it sent less than 2 msec before. The first transmission was due to a newly arrived ack advancing the window, and the second was part of a retransmission burst apparently triggered by the receipt of the ack.
[Footnote 9] The limit is specified as 2048 when what is being tested against it is the number of packets in flight.

11.5.9 NetBSD TCP

As far as we could determine, NetBSD 1.0 TCP is identical to our generic Reno implementation.

[Figure 11.8: Sequence plot showing broken Solaris 2.3/2.4 retransmission behavior, RTT = 680 msec]

11.5.10 Solaris TCP

Along with Linux, Solaris TCP is the other independent TCP implementation in our study. tcpanaly knows about two versions, 2.3 and 2.4, which differ only in minor ways.

Like Linux, the most striking feature of Solaris 2.3 and 2.4 TCP is its broken retransmission behavior. Dawson et al. identified that Solaris uses an atypically low initial value of about 300 msec for its retransmission timeout (RTO). This value, plus difficulties the timer has with adapting to higher RTTs, leads to the broken retransmission behavior. For a connection with a longer RTT, the TCP is guaranteed to retransmit its first packet, whether needed or not. Such an unnecessary retransmission would be only a minor problem if the timer then adapted to the RTT and raised the RTO, but it fails to do so, leading to connections riddled with premature, unnecessary retransmissions.

Figure 11.8 shows an example of the retransmission problem in action. The sender is sri, in California, and the receiver is oce, in the Netherlands. The round-trip time is about 680 msec, above the 200 msec of the initial Solaris retransmit timer (but not pathologically large). The Solaris TCP sends almost as many retransmissions as new packets, yet no data packets whatsoever were dropped! Each retransmission was completely unnecessary. Furthermore, so many retransmissions are generated that it is difficult to find unambiguous RTT timings with which to adapt the timer. While the RTO does indeed double on multiple timeouts, it is restored to its erroneously small value immediately upon an acknowledgement for a retransmitted packet, so it never has much opportunity to adapt.

As the path's RTT increases, the problem only gets worse. Figure 11.9 shows a plot for an N_2 connection from wustl to oce. The smallest RTT in the trace is about 2.6 sec, and it got as high as 9.9 sec. The beginning of the connection is simply disastrous, with the first data packet being retransmitted 5 times (the first retransmission occurs closely enough to the original packet that it is hard to distinguish in the plot), the second data packet retransmitted 6 times, the third 4 times, the fourth 4 times (not all shown), and so on. None of the packets or their retransmissions were dropped! All of the retransmissions were needless.

[Figure 11.9: Sequence plot showing broken Solaris 2.3/2.4 retransmission behavior, RTT = 2.6 sec]
Worse yet, because they were needless, they elicited dup acks from the receiver, which eventually reached the level sufficient to trigger fast retransmission (§ 9.2.7), generating further needless retransmissions! The connection eventually ran smoother, as the timer managed to adapt, but it was still plagued with needless retransmissions as the RTT grew larger and the timer sometimes failed to track it quickly enough.

Thus, Solaris TCP can effectively increase the load it presents to any high-latency Internet path by a factor of two or even quite a bit more. Unfortunately, many of the most heavily loaded Internet paths---those linking different continents via trans-oceanic or satellite links---have exactly this property. It would be interesting to learn what proportion of the traffic on a very heavily utilized link (such as the U.K.--U.S. trans-Atlantic cable) is due to completely unnecessary retransmissions. The Solaris TCP maintainers are aware of this problem and have issued a patch to fix it.

Solaris TCP differs from the other implementations in our study in a number of additional ways:

1. It initializes ssthresh to 8·MSS. From the perspective of network stability, this is nicely conservative, but from the perspective of performance, it impedes fast transfers unless they are quite lengthy.

2. Sometimes when it receives an ack, it retransmits the packet just after the ack rather than the packet newly liberated by the advance of the window. These retransmissions do not affect the congestion window, nor do they alter the notion of what new data should be sent next time the window advances. Figure 11.10 shows an example. At T = 10.3, the Solaris TCP retransmits sequence 37,125, and then just after T = 10.5 it retransmits 38,577. Yet, when an ack arrives for (the original transmission of) 38,577, we see that the congestion window was not reduced by the retransmissions, but remains at 5 packets.

[Figure 11.10: Solaris 2.4 retransmitting without cutting cwnd]

3. Its duplicate-ack counter survives timeouts, which can lead to a packet recently retransmitted via timeout being retransmitted again via fast retransmission.

4. Although there is code in the implementation for fast recovery, it is only exercised under rare circumstances. The problem is that the Solaris implementation is careful to advance the congestion window only upon receiving an ack for new data (see next item). This means that the dup acks that are supposed to keep inflating the window in order to liberate additional packets do not actually increase the window, since they do not acknowledge any new data. The rare circumstance in which the TCP can send a single fast recovery packet is if it has already accumulated during congestion avoidance more ``excess'' bytes than are required to advance cwnd given its current value.

5. During congestion avoidance, the TCP keeps track of exactly how many bytes of data have been acknowledged since the last advance in cwnd. Whenever this value exceeds cwnd, cwnd is increased by the MSS. (Like the Linux congestion avoidance increment strategy, this is closer in spirit to the scheme outlined in [Ja88] than the Tahoe approach given by Eqn 11.1; a sketch comparing these increment rules follows this list.)

6. Its test for whether it is in a slow-start phase is cwnd < ssthresh rather than cwnd ≤ ssthresh.

7. Upon receiving an ICMP Source Quench (§ 11.3.3), it sets ssthresh to cwnd/2 prior to entering slow start.

8. When cutting ssthresh, it does not round it down to a multiple of MSS.
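To make the differences among these congestion-avoidance increment rules concrete, here is a minimal Python sketch (ours; the class and function names are illustrative and not drawn from any of the implementations) contrasting the per-ack Tahoe/Reno formula with the ack-counting rule of Linux 1.0 (item 6 of its list above) and the byte-counting rule of Solaris (item 5 above). Whether Solaris carries over the ``excess'' bytes, as assumed here, is suggested by item 4 but is an assumption of the sketch.

    # Illustrative comparison of the congestion-avoidance increment rules
    # described above.  Quantities are in bytes; names are hypothetical.

    def tahoe_reno_increment(cwnd, mss, reno_extra=False):
        """Per-ack increment of Eqn 11.1 (Tahoe) or Eqn 11.2 (Reno)."""
        incr = (mss * mss) // cwnd
        if reno_extra:
            incr += mss // 8
        return cwnd + incr

    class LinuxStyleAvoidance:
        """Count acks; bump cwnd by MSS once they exceed cwnd/MSS."""
        def __init__(self):
            self.ack_count = 0
        def on_ack(self, cwnd, mss):
            self.ack_count += 1
            if self.ack_count > cwnd // mss:
                self.ack_count = 0
                return cwnd + mss
            return cwnd

    class SolarisStyleAvoidance:
        """Count acknowledged bytes; bump cwnd by MSS once they exceed cwnd."""
        def __init__(self):
            self.acked_bytes = 0
        def on_ack(self, cwnd, mss, newly_acked_bytes):
            self.acked_bytes += newly_acked_bytes
            if self.acked_bytes > cwnd:
                # Carrying the leftover "excess" bytes forward is our reading
                # of item 4 above; resetting to zero would also be plausible.
                self.acked_bytes -= cwnd
                return cwnd + mss
            return cwnd

All three rules open cwnd by roughly one MSS per round-trip time during congestion avoidance; they differ mainly in how the bookkeeping is done within the round trip, which is what subtly changes when individual packets get sent.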
The only differences between Solaris 2.3 and 2.4 that we observed are in their acking policies. See § 11.6 for discussion.

11.5.11 SunOS TCP

We had many SunOS 4.1.3 and 4.1.4 sites in our study. We did not observe any differences between the two releases. SunOS 4.1 appears to have been derived from BSD Tahoe, with the following differences:

1. If the MSS offered by the remote TCP peer is larger than that offered by the SunOS TCP, then it uses the larger value to initialize cwnd, though it still uses its own offered value for all subsequent cwnd calculations.

2. If it receives a series of acknowledgements for the same sequence number, and any of the acks is a window recision (that is, it advertises a smaller window than did the previously received acks), it simply ignores that ack. Other TCPs consider the window-recision ack as resetting the duplicate ack counter, delaying the possible onset of fast retransmission. We note that in our study the only window recisions we observed were due to packet reordering. No TCP ever originated an ack that rescinded a previously offered window.

3. It will only enter fast retransmission for a packet that was not previously retransmitted using fast retransmission (circumstances under which this behavior manifests itself are rare).

4. Upon retransmission, when cutting ssthresh it does not round it down to a multiple of MSS, regardless of the type of retransmission.

11.5.12 VJ TCP

Two sites in our study ran experimental TCP implementations developed by Van Jacobson. lbl during N_2 ran a version we term VJ_1 (in N_1 it ran SunOS), and in both N_1 and N_2 lbli ran a version we term VJ_2. Though it differs from the numbering, VJ_2 is the earlier of the two versions. It behaves the same as our generic Reno implementation except:

1. it uses an additive constant of 4 bytes when updating cwnd during congestion avoidance, as opposed to MSS/8 (Eqn 11.2);

2. it does not exhibit the ``fencepost'' error when deflating the window (§ 11.5.1);

3. it does not cut ssthresh if a timeout retransmission occurs during a fast retransmission sequence;

4. it has a bug that leads to it always cutting ssthresh down to two segments upon any other timeout.

VJ_1 behaves like VJ_2 except it does not exhibit the header-prediction bug (§ 11.5.1), and it uses Eqn 11.1 to update the congestion window during congestion avoidance (no additive increment).

11.6 Receiver behavior of different TCP implementations

In this section we examine variations in how the different implementations behave as receivers of data: the policies used to acknowledge newly arrived data, and the effects of these policies on performance and congestion. We begin with a discussion of how different implementations acknowledge in-sequence data, the ``normal'' case of a connection proceeding smoothly (§ 11.6.1). We find a number of different ``policies'' for choosing exactly when to generate acknowledgements, some of which lead to surprising performance problems. We then look at how implementations acknowledge out-of-sequence data: packets arriving above or below a sequence hole (§ 11.6.2). Finally, after characterizing the generation of gratuitous acks (§ 11.6.3), we finish with an analysis of response delays, namely, how long it takes a TCP receiver to generate its acknowledgements (§ 11.6.4). Variations in response times can introduce a significant noise term for senders that attempt to measure round-trip times (RTTs) to high resolution.
One of our goals is to assess the viability of sender­only timing schemes. 11.6.1 Acking in­sequence data When a TCP receives in­sequence data, it needs to eventually generate an acknowledge­ ment for the data, so the sender knows it has been successfully received and can release the resources allocated for retaining the data in case it required retransmission. There is a basic tension between acknowledging data quickly versus waiting to see if more in­sequence data arrives so that a single ack can take care of acknowledging multiple data packets. 10 The more acks the receiver generates, the more network resources its feedback stream consumes; but also the more likely in the face of packet loss that enough acks will reach the sender that it will not retransmit unnecessarily, and the smoother the resulting stream of transmitted packets, since the window moves in numerous, small increments rather than rare, large increments. TCPs need to assure that they acknowledge data quickly enough that the sender does not erroneously conclude a packet was lost and retransmit it. The TCP standard requires that acknowl­ edgements be delayed no more than 500 msec, and either recommends or requires (x 4.2.3.2 and x 4.2.5 of [Br89]) that a TCP acknowledge upon receiving the equivalent of two full­sized packets, that is, 2\DeltaMSS bytes (x 9.2.2). As discussed in x 11.4, tcpanaly associates the acks generated by a TCP with the data packet that prompted the TCP to send the ack, allowing determination of the acknowledgement delay. It also classifies acks into three categories, those for less than two full­sized packets (``delayed acks''), those for two full­sized packets (``normal acks''), and those for more than two full­sized packets (``stretch acks''). We expect: delayed acks to incur considerable delay as the TCP waits hoping for more data to acknowledge; normal acks to be commonplace in any connection with significant data flow, and to take little time to generate; and stretch acks to be rare. We now treat each in turn. Delayed acks In both N 1 and N 2 , it was exceedingly rare to observe a delayed ack that took longer than 500 msec, on the order of one trace in 1,000. 10 Or to see if the ack can piggyback on a data packet or window update being sent back to the sender. 170 All of the BSD­ (i.e., Tahoe­ and Reno­) derived implementations in Table XV use a delayed­ack timer of 200 msec, meaning that, except for truly unusual conditions (presumably when the host was very busy doing something else), they generate delayed acks within 200 msec of receiving the corresponding packet. These delays are furthermore evenly distributed over the range 0 msec to 200 msec, a consequence of the implementations using a 200 msec ``heartbeat'' timer. Every time the timer expires, they check to see whether new, unacknowledged data has arrived. If so, they generate an ack. The fact that the new data may have arrived at any point since the last heartbeat leads to the even distribution of the delays. Linux 1.0 always immediately acknowledges newly arrived in­sequence data, so, by tcpanaly's definition, all of its acks are delayed acks. It usually generates the ack within 1 msec. Solaris TCP differs from the others in that it uses a 50 msec interval timer, scheduled upon the arrival of each packet, instead of a 200 msec heartbeat timer. As a result, the delay is generally very close to 50 msec (slightly lower, perhaps because the timer is scheduled before the packet filter timestamps the arriving data packet; cf. 
§ 10.3.6), though it is a configurable parameter.

One might think that a shorter delay would lead to better performance because the sender waits less before receiving the ack. We note, however, that, for certain link speeds, a low value such as 50 msec guarantees that every ack for in-sequence data will be a delayed ack, which is instead counter-productive because the sender winds up waiting longer for acks in terms of the delay required to acknowledge two packets. Suppose the delay timer is set for t seconds, the maximum data transfer rate the Internet path can support is ρ bytes/sec, and the data packets have size b bytes. Then whenever

    t < b / ρ,

it is impossible that two full-sized data packets will arrive before the delay timer expires. [11] Consequently, the sender will wait an extra t seconds for the acknowledgements of every two packets. If t = 50 msec and b = 512 bytes, then for ρ < 10 KB/sec the delay will be sub-optimal, leading to acking of every packet even if they arrive as fast as possible. This range includes the still-quite-common rates of 56 Kbit/sec and 64 Kbit/sec. If, however, t = 200 msec, then only for ρ < 2.5 KB/sec is the delay sub-optimal. This rate includes some of today's modems, but no other commonly used link technologies.

Finally, we temper this discussion by noting that the deficiency is fairly minor. Yes, a low delay timer results in extra ack traffic and somewhat elevated RTTs. However, acks are small, so the additional traffic load is likewise small, and the additional latency is bounded by the small timer setting to an often-imperceptible value.

[Footnote 11] Well, almost impossible. See § 16.3.2.

Normal acks

We term an ack ``normal'' if it is for two full-sized packets. Since our study concerns unidirectional bulk transfer, we expect that most of the time the receiving TCP will have plenty of opportunity to generate normal acks.

BSD-derived TCPs do not simply generate acknowledgements every time they receive two in-sequence, full-sized packets. Instead, they generate the acknowledgements when the receiving application process has consumed that much data, even if the data it consumed was actually delivered in earlier packets. This means that normal acks are not always promptly generated. We analyze the timing of their generation below in § 11.6.4. Here we simply note that quite frequently the delay in generation is very small, presumably because it takes little time for the application process to consume the new data.

Since Linux 1.0 TCP acks every packet, it does not generate normal acks, by tcpanaly's definition of ``normal.'' Solaris TCP generates normal acks after an initial slow-start sequence, but not before (see next section).

[Figure 11.11: Sequence plot showing Solaris 2.4 acknowledgments (large squares) during initial slow-start]

Stretch acks

Every implementation in our study except Linux 1.0 sometimes generates ``stretch'' acks, that is, acknowledgements for more than two full-sized packets, contrary to [Br89] (though they all came less than 500 msec after the last packet they were acknowledging). We suspect most of these occur because of delays in the application process consuming the newly arrived data (discussed above). For most implementations and sites, stretch acks usually were for no more than three full-sized packets.

Some implementations and sites, however, were especially prone to large stretch acks, particularly some of the IRIX sites.
These instances, however, were intermittent (except for Solaris---see below): quite often, the site would not generate a significant number of stretch acks, while at other times it would. Most likely this intermittence reflects periods of heavy versus light load. The IRIX sites might be particularly prone because of some peculiarity of how the IRIX scheduler deals with heavy processor contention: if it delays competing processes for lengthy periods of time, this could easily translate into stretch acks. For example, we noticed that adv often generated stretch acks separated by almost exactly a multiple of 30 msec, and posit that 30 msec reflects the host's scheduling quantum.

[Figure 11.12: Corresponding burstiness at sender]

Solaris TCP, however, generates stretch acks in quite a different manner. It apparently has been tuned so that, during the initial slow-start, it generates only one ack for each increasingly-large ``flight'' of packets. Figure 11.11 shows how this works, using a trace recorded at a Solaris receiver. Here, the acks are shown with large squares, since they lie directly on top of the end of each initial slow-start flight. The delay between the final packet of a flight and the corresponding ack is only 100's of μsec---much too small for timer-driven acking. Since the TCP appears to ``know'' exactly when each flight ends without waiting any appreciable time for additional packets, we conclude that it does indeed know: it predicts that each flight will be one packet larger than the previous flight (which is exactly the case during slow-start, if each flight elicits only one ack in reply), and counts exactly that many packets before acknowledging.

At around T = 0.9 a data packet was lost, and thus the prediction that 10 packets would arrive in that flight failed. The ack for the 9 packets that did arrive is sent when the delayed-ack timer expires, 49 msec after the last packet in the flight arrived. The packets liberated by this ack then arrive above the sequence hole, the TCP generates a series of duplicate acks in response, and the sending TCP retransmits the missing packet. Note that, after this point, the Solaris TCP gives up on trying to ack just once for each flight, and falls back on acking every three full-sized packets (in violation of [Br89]), or fewer if the delayed-ack timer expires before three arrive. This behavior also fits with our hypothesis that the TCP is predicting flights by counting slow-start cycles: once the connection is no longer in slow-start, the TCP cannot easily determine the size of the next flight, so it falls back on a less sparse acking policy.

It seems very likely that this acking behavior was developed in order to maximize throughput for local-area networks. We are led to speculate that this is the case because the acking policy has four major drawbacks for wide-area network use, worth discussing in detail because at first blush one might find such a frugal ack policy attractive as apparently efficient and streamlined:

1. Because each ack advances the window by increasingly large amounts, the acking behavior leads to progressively burstier transmissions by the sender. Figure 11.12 shows the same trace as in Figure 11.11 except recorded at the sending TCP.
We see increasingly taller ``towers'' of packets, sent at rates up to 1.15 Mbyte/sec, completely saturating the local Ethernet. While a local-area network might be able to accommodate such burstiness, it can be very hard on a wide-area network, because it leads to rapid queue growth if the bottleneck bandwidth is significantly lower (in this connection, tcpanaly calculated it to be about 350 Kbyte/sec, evidently two T1 circuits, using the methodology discussed in Chapter 14). This queue variation then potentially perturbs all the other connections currently sharing the bottleneck link, by delaying their packets and perhaps causing their packets to be dropped. Much better is for the packets to be spaced out more evenly, approaching the bottleneck bandwidth, which will happen naturally due to ``self-clocking'' (§ 9.2.5) if the receiving TCP generates acks at a quicker rate. See [BP95b] for a discussion of TCP sender modifications to achieve smoother spacing in the face of large ack advances.

2. Because only one ack is sent per round-trip time, the connection loses the usual benefit of exponential window increase during slow-start. On the kth slow-start flight, the Solaris acking policy will lead to exactly k packets in flight. A policy of ack-every-packet, on the other hand, leads to 2^(k-1) packets in flight---a dramatic difference for a network path with a large bandwidth-delay product.

3. Because only one ack is sent per round-trip time, the resulting connections are brittle in the face of packet loss, which is much more prevalent in wide-area networks than local-area networks. Since each flight of data elicits only one ack in response, if the ack is lost, then the data/ack ``pipeline'' must shut down with an expensive (in terms of performance) retransmission timeout, because the sender will not receive any more information about the data it sent. Figure 11.13 shows a trace recorded at a Solaris receiver in which this occurred.

[Figure 11.13: Sequence plot showing retransmission timeout due to loss of a single Solaris 2.4 ack]
While that ack is traversing the network back to the sender, the sender is perforce doing nothing, because it has already sent its entire flight and cannot send any more data until an ack arrives to advance the window. Thus, the Solaris acking policy guarantees that a lull equal to the round­trip time will accommodate each flight of data. As long as the sender remains in slow­start, the receiver will never see a continuous stream of packets arriving at the available bandwidth! Figure 11.14 illustrates this problem. This connection has a RTT of about 44 msec, and a T1 bandwidth limit of about 170 Kbyte/sec. Thus, the connection's bandwidth­delay product 175 Time Sequence # 0.0 0.5 1.0 1.5 0 20000 40000 60000 80000 100000 Figure 11.15: Sequence plot showing more frequent acking leading to ``filling the pipe'' is about 8 Kbyte, so if the sending TCP has this much data in flight at one time, ordinarily that would suffice to ``fill the pipe'' and completely utilize the available bandwidth. Near the end of the connection, it has more than 8 Kbyte in flight, and yet still does not achieve full utilization, due to the 44 msec delays incurred at the end of each flight. The only Solaris TCP in our study that did not exhibit this problem was austr2, because its bottleneck bandwidth of about 13 Kbyte/sec was so small that the delay ack timer (50 msec in Solaris) would often expire before the full flight could arrive. Other acking policies avoid this problem because, by acking more often, they can ensure (for a large enough window) that the sender will have additional data already in flight by the time the current flight ends. As the window grows sufficiently large, the packets from this next flight will arrive closer and closer to the end of the first flight, until eventually the distinction between flights blurs and the connection settles into a continuous stream of arriving data packets. Figure 11.15 shows such a connection, with the same sender as in Figure 11.14. Note that this connection had a longer RTT than that shown in Figure 11.14, which explains why it happened to achieve only the same overall throughput, instead of higher throughput, which would have been the case for equal RTTs and a greater degree of ``filling the pipe.'' 11.6.2 Acking out­of­sequence data When a TCP receives a packet with out­of­sequence data, it either must generate an ack­ nowledgement, if the data corresponds to data already acknowledged, which we term ``below se­ quence''; or should generate an acknowledgement, if the data is for a sequence number beyond what has been previously acknowledged, which we term ``above sequence'' [Br89]. (These situations are also discussed above in x 11.4.1.) For example, suppose a TCP has received contiguous data up to sequence 10,000. If it now receives data with a sequence number below 10,000, then it must gener­ 176 ate another acknowledgement for sequence 10,000. If, instead, it receives data starting at sequence number 11,000, then it should generate another acknowledgement for sequence 10,000. In both cases, the acknowledgement generated is for the highest in­sequence data re­ ceived. The reason for generating acks in the first case is that the sender has retransmitted unnec­ essarily and thus appears confused as to how much data the receiver has in fact received, so the receiver needs to inform the sender again of what it has received. The reason for generating acks in the second case is to enable ``fast retransmit,'' discussed in x 9.2.7. 
Of the TCPs in Table XV, only SunOS 4.1 exhibited unusual behavior when receiving out-of-sequence data. While it generally will immediately acknowledge below-sequence packets, it does not always do so, and it never immediately acknowledges above-sequence packets. Instead, it apparently checks upon each expiration of the 200 msec delayed-ack heartbeat timer whether any above-sequence (or, sometimes, below-sequence) data has arrived. If so, it generates a single duplicate acknowledgement reflecting its current upper-sequence limit.

One other form of ``mandatory'' ack not generated by SunOS 4.1 concerns the initial SYN packet used to begin establishing a TCP connection. SunOS 4.1 TCP appears to ignore retransmissions of the initial SYN once it has sent a SYN-ack, and instead continues retransmitting (upon timeout) the SYN-ack until it is acknowledged. This behavior has only minor implications, concerning a possible delay in establishing connections when the first SYN-ack is lost.

Other than SunOS, all the implementations in our study tend to generate mandatory acknowledgements promptly (though we have observed delays of more than 1 minute for a Solaris implementation while it waited for a sequence hole to be filled!). The few times tcpanaly detected a failure to send a mandatory ack were generally due to either vantage-point problems (§ 10.4), packet-filter resequencing errors (§ 10.3.6), or confusion caused by checksum errors.

The only other failure we observed with respect to generating mandatory acks is with Solaris 2.3 TCP. If it receives a packet containing only a FIN (no data), arriving above-sequence, then it simply ignores the packet. If the packet contains data, then it elicits a duplicate ack like any other above-sequence arrival, but the presence of the FIN bit is forgotten (so if the sequence hole is filled, the TCP will acknowledge all of the data but not the FIN). This behavior is fixed in Solaris 2.4, and is the only difference in behavior we observed between the two implementations.

11.6.3 Gratuitous acks

tcpanaly includes in its analysis checking for ``gratuitous acks,'' meaning acknowledgements that, as far as it could determine, simply did not need to have been sent. These are quite rare. For example, only about 0.5% of the N_2 receiver traces exhibited a gratuitous ack.

SunOS 4.1 TCP is particularly apt to generate them; Figure 11.16 shows a typical gratuitous ack produced by this implementation. The acknowledgement at T = 0.4 is sent on the delayed-ack timer, because the TCP has received above-sequence data that it cannot directly acknowledge. [12] (As noted in § 11.4.1, SunOS 4.1 does not acknowledge each above-sequence packet.) The second ack, at time T = 0.6, appears completely unneeded. It was sent almost exactly 200 msec after the first ack in the plot, so almost certainly due to the delayed-ack timer. While the last data packet arrived shortly before the T = 0.4 ack was sent, we suspect it had not yet been processed, and its processing led the TCP to generate another ack the next time the delayed-ack timer expired. (So this example is really a vantage-point problem, per § 10.4.)

[Footnote 12] This ack includes the same offered window as its predecessor; it was not sent in order to update the window.

[Figure 11.16: Sequence plot showing gratuitous acknowledgement]
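As an illustration of how such acks might be flagged in the first place, the following Python sketch gives one simple classification scheme. It is only illustrative of the idea of pairing each ack with a plausible cause; it is not tcpanaly's actual algorithm.

    # Illustrative sketch of flagging "gratuitous" acks: acks for which no
    # plausible cause -- newly acknowledged data, a window update, or
    # out-of-sequence data needing a duplicate ack -- can be found among the
    # packets received since the previous ack.

    def classify_ack(ack, prev_ack, arrivals_since_prev_ack):
        """ack / prev_ack: dicts with 'ack_seq' and 'win'.
        arrivals_since_prev_ack: list of dicts with an 'in_sequence' flag for
        each data packet received between the two acks."""
        if ack["ack_seq"] > prev_ack["ack_seq"]:
            return "acks new data"
        if ack["win"] != prev_ack["win"]:
            return "window update"
        if any(not pkt["in_sequence"] for pkt in arrivals_since_prev_ack):
            return "duplicate ack for out-of-sequence data"
        if arrivals_since_prev_ack:
            return "possibly delayed response to an earlier arrival"
        return "gratuitous"

In practice, tardy acks, vantage-point effects, and checksum errors can blur the distinction between the last two cases, which is exactly the sort of confusion described next.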
tcpanaly can also become confused and falsely conclude a gratuitous ack was sent if the TCP takes a particularly long time to generate an ack, or if a checksum error confuses tcpanaly's analysis of cause and effect. Figure 11.17 shows an example of the former, in which tcpanaly views the lower ack sent at T = 1:28 as gratuitous, even though it was really a response to an out­of­ order packet 21,745 received shortly before the packet preceding it in sequence, around T = 1:26. Thus, it took the TCP in this example (HP/UX 9.05) more than 20 msec to generate the mandatory ack required by receiving an out­of­sequence packet, which in the presence of the earlier (likewise tardy) ack for the same sequence number at T = 1:26 sufficed to confuse tcpanaly as to why the second ack was sent. One other form of gratuitous ack we observed occurs with Linux 1.0. It will generate an ack if 30 seconds have elapsed without any newly arriving packets. Presumably, this ack is intended to resynchronize the sender with the receiver in the face of a lull induced by the loss of previous acks. 11.6.4 Response delays As discussed in x 9.1.3, there are a number of advantages to network measurement schemes that rely only on the ability to record packet timings at one of the two connection end­ points. One of the main advantages is that it is logistically much easier to secure single­endpoint measurements than dual­endpoint. For example, TCP Vegas has as one if its central congestion con­ trol mechanisms an analysis of round­trip times measured by the TCP sender [BOP94]. The goal of these measurements is to infer how the sender's window changes are affecting the queueing delays in the network, i.e., how the sender's behavior is utilizing networking resources. As developed in [BOP94], the RTT timings central to the congestion control policy are made solely by the sender. 178 Time Sequence # 1.20 1.22 1.24 1.26 1.28 1.30 18000 19000 20000 21000 22000 Figure 11.17: Sequence plot showing false gratuitous acknowledgement Not needing to rely on cooperation by the receiver in making these measurements is a great boon, but it carries with it the risk of having to make control decisions based on considerably less precise measurements than could be obtained if the receiver cooperated. In this section we look at the variation among a TCP's response delays, by which we mean how much time the TCP takes to generate an acknowledgement for new data it has received. We are interested in the variation because it directly affects the precision with which a sending TCP can measure round­trip time delays. If the receiving TCP exhibits large variations in the time it takes to generate acknowledgements, and if the sender has no way of factoring out these delays, then the sender must contend with considerable noise in its RTT measurements, perhaps enough to render impractical the accurate assessment of the network's state based on sender­only measurement. As we argue elsewhere (Chapter 16), often what is of greatest interest is variations in networking delays rather than the absolute magnitude of the delays. Thus, we do not concern ourselves in this section with the mean time a TCP takes to generate an acknowledgement, as this contributes nothing to errors in measuring delay variation. Instead, we focus on the variation of the time taken to generate an acknowledgement. Furthermore, we assume that the sender can eliminate one of the common sources of delay variation, namely delayed acks. 
These are easy to spot, because any time an ack is received that advances the window by less than two full-sized packets, the ack was potentially delayed, so RTTs derived from its arrival should not be trusted beyond the 200 msec of variation known to frequently attend delayed acks. We also assume that acks generated for exceptional conditions such as out-of-sequence data are not of much interest, since they generally indicate that the sending TCP is about to enter an exceptional state (retransmission) anyway. Thus, we confine ourselves to the time taken by different TCPs to generate acks for two or more full-sized, in-sequence packets.

The maximum time taken by a TCP to generate a ``normal'' ack (§ 11.6.1) is almost always less than 200 msec and often less than 50 msec, no doubt reflecting the BSD and Solaris delayed-ack timer intervals. We have, however, observed values as high as 1.6 sec. (The mean time taken is less than 1 msec in about two thirds of our traces, and less than 10 msec in about 95% of our traces.)

One final important point is that to assess response time we compute the standard deviation (σ) of the response time, rather than using a more robust statistic (§ 9.1.4). We do so because we argue that a real-time, sender-based measurement scheme will only be able to make fairly cheap assessments of delay variations, rather than employing robust statistics. Furthermore, even if the sender can afford to compute robust statistics on the packet timing measurements it gathers, it will still have the serious problem of discerning between ``outliers'' due to receiver delays and those due to genuine networking effects. Thus, we argue it is reasonable to assess delay variations in terms of standard deviation, even though we know this estimator can be seriously misleading in the presence of occasionally quite large, exceptional values.

In assessing both N_1 and N_2, we find that about two thirds of the time the σ calculated for the response time is below 1 msec. These cases are good news for sender-based measurement, since often clock resolutions are not appreciably more accurate than 1 msec anyway (§ 12.4.2). However, the mean value of σ was about 5 msec, and for the one-third of the traces with σ > 1 msec, the mean climbs to 15 msec. There is a great amount of site-to-site variation among the average values of σ, no doubt reflecting large variations in average site-to-site load. For example, adv, an IRIX system, has an average value of σ just under 1 msec, while bnl, another IRIX system, has an average value of over 5 msec.

We conclude that, for high-precision, sender-only RTT measurement, the ack response delays will often not prove an impediment; but sometimes they will, meaning that the intrinsic measurement errors will be large enough to possibly swamp any true network effects we wish to quantify. Here, ``often not'' is roughly two-thirds of the time, ``sometimes they will'' is one-third of the time, and ``large enough'' is on the order of 15 msec. Naturally, the point at which the noise impairs measurement and control depends on the particular time constants associated with the connection, and on what information the TCP wishes to derive from its measurements.

11.7 Behavior of additional TCP implementations

Our analysis of TCP behavior above revealed two implementations with particularly significant problems: Linux 1.0 and Solaris (2.3 and 2.4). These implementations were both written independently of any of the others.
Furthermore, of the 15 other implementations we studied, none of which exhibited problems of the same magnitude as these two, all were derived from a common implementation, the BSD Tahoe/Reno releases. Thus, we find a striking dichotomy between those TCP implementations exhibiting serious problems, and those that do not: the former were written independently, the latter built upon the Tahoe/Reno code base. We interpret this difference as highlighting the fact that implementing TCP correctly is extremely difficult. The Tahoe/Reno implementations benefited from extensive development and testing by a host of TCP experts. Furthermore, they were the code base used by Jacobson to imple­ ment the algorithms in his seminal paper on TCP congestion behavior [Ja88]. However, to test our hypothesis that implementing TCP independently is difficult and fraught with error, we need to examine other independent implementations. To do so, we gathered 180 Time Sequence # 155 156 157 158 159 160 438000 440000 442000 444000 446000 448000 Figure 11.18: Sequence plot showing Windows 95 TCP transmit problem tcpdump traces 13 of three additional TCPs: Windows NT, Windows 95, and Trumpet/Winsock, all implementations for personal computers. 14 We analyzed these traces by studying sequence plots of their behavior. We did not in­ tegrate them into tcpanaly because we had only a handful of traces to study. These sufficed, however, to find some interesting behavior. 11.7.1 Windows NT TCP We inspected four traces of Windows NT TCP, two of it sending data and two of it receiv­ ing data. We found no serious problems. It does not do fast retransmit, but this only impedes its own performance; it does not affect network stability (if anything, it abets stability). The only unusual aspect of its behavior we found is that its congestion window during its initial slow­start begins at 2 packets instead of 1. This could be a calculated decision to improve initial performance, or a bug due to treating the ack that completes the three­way SYN handshake establishing the connection as opening the congestion window. 11.7.2 Windows 95 TCP We obtained only two traces of Windows 95 TCP, one of it sending data and one of it receiving. The sending trace exhibited a striking performance problem: often when it could send out two packets, only the second appeared to have been sent, and the first would subsequently be 13 Many thanks to Kevin Fall for undertaking the measurement of these. 14 We have subsequently been informed that the Windows NT and Windows 95 TCPs are in fact the same implementa­ tion. We observed different, but not inconsistent, behaviors between them, as noted below. In particular, the Windows 95 behavior that we did not observe in Windows NT may be due to the particular software/hardware combination used when obtaining the Windows 95 traces, which differed from that used to obtain the Windows NT traces. 181 Time Sequence # 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0 2000 4000 6000 8000 10000 Figure 11.19: Sequence plot showing Trumpet/Winsock TCP skipping initial slow start sent via timeout ``retransmission.'' Figure 11.18 shows this problem. A pattern of one­ack, two­acks, one­ack, two­acks repeats. The first ack (such as the one a bit before T = 155) reflects a timeout retransmission filling a sequence hole. The congestion window is evidently one packet at this point. The TCP sends a single packet and this is acknowledged about 150 msec later. It then apparently sends not the next in­sequence packet, but the one after that. 
Receiving this out-of-sequence packet elicits a dup ack from the remote TCP, but only one, after which no more acks arrive. The sending TCP thus times out and retransmits the packet it should have sent in the first place, and the cycle repeats. Eventually it breaks out of the cycle, by sending two back-to-back packets when called for by a newly received ack.

We suspect the problem is that the TCP is sending both packets, but the first is frequently being dropped by the network interface card, perhaps because the second arrives too closely on its heels. This would explain why the problem is sporadic, and also why it may have gone unnoticed during development of the TCP. Certainly, if this problem is widespread, then Windows 95 TCP users suffer from very poor performance. Since the retransmission problem lies wholly within the sending host, however, it does not threaten network stability in any way.

11.7.3 Trumpet/Winsock TCP

The last independently implemented TCP we studied was Trumpet/Winsock. We obtained 13 traces of its behavior. Some were made with version 2.0b and some with version 3.0c. We did not detect any difference in behavior between the two, even though the release notes of 3.0c indicate it fixed a retransmission problem with version 2.

The first problem Trumpet/Winsock TCP exhibits is that it skips the initial slow start. Figure 11.19 illustrates this behavior. The connection is established just after T = 0. The TCP waits 400 msec and then dumps 6 packets of 1460 bytes (except the first, which is 512 bytes) without waiting for any acknowledgements. When the first ack arrives, the window simply slides and more packets go out. Over time the window opened to 9 packets.

It further skips slow start after timeout retransmission. Figure 11.20 illustrates this behavior. At T = 7.6, a packet times out and is retransmitted. When an acknowledgement for it and a number of other successfully received packets arrives, the TCP sends another 8 packets, and when an ack for the first four of these arrives (along with dups), another 9 are sent! (We observed similar behavior even if the ack for the retransmitted packet only acknowledged a few packets beyond it.) We did also observe some apparent slow-start sequences after retransmission timeouts (though duplicate acks received during these sequences advanced the congestion window), indicating that the notion of entering slow start after timeout is present in the implementation, but incorrectly implemented. These sequences had one other unusual aspect, which is that they began with the transmission of a packet followed 10 msec later by a retransmission of that same packet.

[Figure 11.20: Sequence plot showing Trumpet/Winsock TCP skipping slow start after timeout]

We are, unfortunately, not yet finished with cataloging Trumpet/Winsock TCP's implementation flaws. Figure 11.21 shows the TCP's acking policy. The trace was recorded at a Trumpet/Winsock receiver of a bulk transfer. The only acks it sent are those shown distinctly in the plot---none were sent shortly after a data packet arrived. The acking is clearly entirely timer-driven, with performance implications similar to those for Solaris (§ 11.6.1), except that Trumpet/Winsock always acks in this fashion, rather than just during the initial slow-start, and it acks off of a timer rather than when it knows no more data is in flight.
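As a rough way to compare the acking policies encountered in this chapter, the following Python sketch (ours; the packet size, rate, and timer values are made-up illustrations) counts how many acks each policy generates for one flight of back-to-back packets.

    # Illustrative sketch comparing the acking policies discussed in
    # Sections 11.6.1 and 11.7.3 for one flight of packets arriving
    # evenly spaced at the bottleneck rate.  All numbers are hypothetical.

    def acks_for_flight(policy, flight_pkts, pkt_bytes=512,
                        rate_bytes_per_sec=20_000, timer_sec=0.1):
        flight_duration = flight_pkts * pkt_bytes / rate_bytes_per_sec
        if policy == "every packet":          # Linux 1.0
            return flight_pkts
        if policy == "every two packets":     # BSD-derived TCPs (roughly)
            return flight_pkts // 2 + flight_pkts % 2
        if policy == "once per flight":       # Solaris, during initial slow start
            return 1
        if policy == "timer driven":          # Trumpet/Winsock (approximate count)
            return int(flight_duration // timer_sec) + 1
        raise ValueError(policy)

    for p in ("every packet", "every two packets", "once per flight", "timer driven"):
        print(p, acks_for_flight(p, flight_pkts=8))

Fewer acks per flight mean larger, burstier window advances and greater vulnerability to the loss of any single ack, which is exactly the pattern observed for Solaris and Trumpet/Winsock above.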
The final implementation flaw we found in Trumpet/Winsock TCP is that it discards any above­sequence data it receives. Figure 11.22 shows this surprising deficiency. Again, the trace was captured at the Trumpet/Winsock side of a connection in which the TCP was receiving a bulk transfer. Shortly after T = 18:5, a sequence hole forms due to a packet having been dropped by the network. 13 more packets follow, all arriving safely, yet the TCP does not generate any duplicate acks indicating their reception. Furthermore, when the lost packet is finally retransmitted 183 Time Sequence # 16.2 16.4 16.6 16.8 17.0 17.2 17.4 160000 170000 180000 190000 Figure 11.21: Sequence plot showing Trumpet/Winsock timer­driven acking Time Sequence # 18.5 19.0 19.5 20.0 20.5 21.0 220000 225000 230000 Figure 11.22: Sequence plot showing Trumpet/Winsock failure to retain above­sequence data 184 due to a timeout, we find it does not fill the hole previously created, which would lead to the TCP acknowledging both it and the 13 previously received packets. Instead, only it is acknowledged, and, as additional packets (already safely received) are retransmitted, they too form the limit of the acknowledged data. Thus, the TCP has thrown away all of the additional packets it received above the se­ quence hole. As noted in x 13.3, this pattern of behavior is possible when a CSLIP link generates a ``burst'' of checksum failures. When we first observed this behavior, we presumed that was what had happened. However, we 15 subsequently gathered full packet traces (no snaplen limitation on the amount recorded for each packet; cf. x 10.2) and enabled tcpanaly's checksum testing (x 11.2) to determine whether the data packets were received uncorrupted. They were, indicating that the TCP could have kept them but instead discarded them. Furthermore, we never observed the TCP generating a duplicate ack upon receiving a packet above a sequence hole, nor acting as though a retransmission had filled a sequence hole. All of these behaviors have strong, adverse impacts on network stability. Skipping slow start initially and after loss means that Trumpet/Winsock data transfers can present heavy bursts of traffic to the network when it lacks the resources to accept them. It violates [Br89]. Acking only when a timer expires can lead to excessive, unnecessary retransmissions when a single ack for many packets is dropped by the network. This also violates [Br89]. Finally, discarding successfully­ received above­sequence data wastes network resources as the other TCP must resend all of the data again. This behavior, while strongly discouraged by [Br89, x 4.2.2.20], is not strictly forbidden, presumably to avoid indefinitely tying up resources in the receiving TCP in cases where connectivity is lost with the sender. 15 Thanks again to Kevin Fall. 185 Chapter 12 Calibrating Pairs of Clocks In this chapter we tackle the difficult problem of calibrating the accuracy of packet filter timestamps. ``Wire times,'' as defined in x 10.1, lie at the heart of much of our study, and the packet filter timestamps are the only means we have for estimating wire times. Yet, we have no independent means of verifying that the timestamps reported by the packet filters are indeed accurate. We must instead develop self­consistency techniques for calibrating the timestamps against themselves. For the most part, we are successful in doing so. 
Undetected clock errors can result in serious systematic errors in our analysis of network dynamics, since superficially a clock error is indistinguishable from variations in packet transit times. These latter variations occur all the time due to queueing in the network, and we are interested in accurately analyzing them.

We begin by defining in § 12.1 basic terminology for describing the different clock attributes of ``resolution,'' ``offset,'' ``accuracy,'' and ``skew.'' We next discuss in § 12.2 why we did not require the clocks in our study to be synchronized, and how, if we had, use of the popular Network Time Protocol (NTP) would not necessarily have eliminated clock problems. Since the clocks at the connection endpoints lacked synchronization, we introduce in § 12.3 ``relative'' counterparts of ``offset,'' ``accuracy,'' and ``skew,'' for discussing potential disagreements between two network clocks. We then turn to methods for assessing clock resolution and relative clock accuracy (§ 12.4, § 12.5); detecting clock adjustments (§ 12.6), in which a clock quickly jumps or skews forward or backward because it is being set to a new absolute time; and detecting relative clock skew (§ 12.7). Clock adjustments and skew can introduce large, artificial network ``dynamics,'' so it is particularly important to detect and remove these effects. We finish in § 12.9 with a look at how well a clock's synchronization correlates with stable clock behavior (lack of adjustments and of skew). We show that, unfortunately, a high degree of synchronization between two clocks does not necessarily mean that the clocks are free of relative errors.

12.1 Basic clock terminology

In this section we define basic terminology for discussing the characteristics of the clocks used in our study. The Network Time Protocol (NTP; [Mi92a]) defines a nomenclature for discussing clock characteristics, which we will use as appropriate. It is important to note, however, that the main goal of NTP is to provide accurate timekeeping over fairly long time scales, such as minutes to days, while for our purposes we are concerned with much shorter-term accuracy, namely between the beginning of a network transfer and its end. This difference in goals sometimes leads to different definitions of terminology, as discussed below.

12.1.1 Resolution

A clock's resolution is the smallest unit by which the clock's time is updated. It gives a lower bound on the clock's uncertainty. (Note that clocks can have very fine resolutions and yet be wildly inaccurate.) It is crucial that this uncertainty be propagated when deriving estimates of network properties from timestamps produced by the clock. Note that we define resolution relative to the clock's reported time and not to true time, so for example a resolution of 10 msec only means that the clock updates its notion of time in 0.01 second increments, not that this is the true amount of time between updates.

12.1.2 Offset

We define a clock's offset at a particular moment as the difference between the time reported by the clock and the ``true'' time as defined by national standards. If the clock reports a time T_c and the true time is T_t, then the clock's offset is T_c − T_t.

12.1.3 Accuracy

We will refer to a clock as accurate at a particular moment if its offset is zero at that moment. The NTP notion of accuracy also includes a notion of the frequency of the clock; for our purposes, we split out this notion into that of skew, because we define accuracy in terms of a single moment in time rather than over an interval of time.
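Restating the definitions so far in symbols (this is our own shorthand, not notation from the original standards documents): if T_c(t) is the time the clock reports at true time t, then

    \[
      \mathrm{offset}(t) \;=\; T_c(t) - t ,
      \qquad
      \text{the clock is accurate at time } t \iff \mathrm{offset}(t) = 0 .
    \]

The resolution, by contrast, bounds only how finely T_c(t) is quantized, and says nothing about how small the offset is.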
12.1.4 Skew and drift

A clock's skew at a particular moment is the frequency difference (first derivative of its offset with respect to true time) between the clock and national standards.

As noted in [Mi92a], real clocks exhibit some variation in skew. That is, the second derivative of the clock's offset with respect to true time is generally non-zero. [Mi92a] defines this quantity as the clock's drift. We in general will only talk about this notion in terms of clock adjustments, during which the clock's time is rapidly altered, because during the small time scales of interest for our study, only large drift values have discernible effects.[1]

[1] We will see in § 12.7 that, for the time scale of a single TCP connection in our study, relative clock skew is nearly always very close to linear, indicating near-zero relative drift over small time scales.

12.2 Lack of synchronized clocks

When designing the Network Probe Daemon (NPD) experiment, we made an early decision not to require synchronization between the clocks at the participating NPD sites. There were two reasons for this decision. First, one of the most important requirements of the experiment was to enlist as many participating sites as possible, in the quest for obtaining plausibly representative results. It was felt that requiring sites to install clock synchronization as well as bring up the measurement daemon would significantly add to the burden of participating in the study.

Furthermore, it is not clear that requiring clock synchronization would help in the measurement analysis. The main reason why it might not is that the most common form of clock synchronization used by Internet hosts is the Network Time Protocol (NTP). Use of NTP for the NPD experiment has two important shortcomings. First, NTP's accuracy depends in part on the properties (particularly delay) of the Internet paths used by the NTP peers, and these are exactly the properties that we wish to measure, so it would be less than completely sound to use NTP to calibrate our measurements. Second, NTP focuses on clock accuracy, which can come at the expense of short-term clock skew and drift. For example, when a host's clock is indirectly synchronized via NTP to a time source, if the synchronization intervals occur infrequently, then the host will sometimes be faced with the problem of how to reconcile its current, incorrect time, T_i, with a considerably different, more accurate time it has just learned, T_a. Two general ways in which this is done are to either immediately set the current time to T_a, or to adjust the local clock's update frequency (hence, its skew) so that at some point in the future the local time T'_i will agree with the more accurate time T'_a. (We will see examples of both of these in § 12.7.)

A key point is that, for the NPD experiment, we are much more interested in correctly estimating differences between two timestamps than in the correctness of individual timestamps. That is, we care much more about clock skew than clock accuracy, because it is the differences that measure network delays. So, given a choice, we would prefer to buy very low clock skew at the expense of diminished clock accuracy, but NTP makes the opposite trade-off. In this respect, we prefer to synchronize the clocks a posteriori as we do here, after having completed the measurements.
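The two reconciliation strategies just described can be illustrated with a small schematic sketch in Python. This is our own illustration, not NTP's actual algorithm; the slew rate and the one-second granularity are arbitrary illustrative choices:

    def step_clock(local_time, accurate_time):
        # "Step": immediately load the accurate time.  The clock becomes accurate
        # at once, but any interval spanning the step is distorted by the full
        # size of the correction.
        return accurate_time

    def slew_clock(local_time, accurate_time, slew_rate=5e-4):
        # "Slew": keep the reported time continuous, but run the clock slightly
        # fast or slow (i.e., temporarily alter its skew) until it converges on
        # the accurate time.  Returns the local clock's trajectory, one sample
        # per second of true time.
        trajectory = []
        while abs(accurate_time - local_time) > slew_rate:
            direction = 1.0 if accurate_time > local_time else -1.0
            local_time += 1.0 + direction * slew_rate   # local second runs long/short
            accurate_time += 1.0                        # true time advances normally
            trajectory.append(local_time)
        return trajectory

The step approach optimizes accuracy at the cost of a discontinuity; the slew approach avoids the discontinuity but deliberately introduces short-term skew, which is exactly the property that matters most for our delay measurements.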
In the future, it may be possible to obtain highly accurate clock synchronization via a mechanism separate from using the network itself; for example, GPS (Global Positioning System) receivers. That would allow us to have both accuracy and very low skew, which would be ideal for network measurement. Unfortunately, obtaining such separate synchronization today remains rare, so it behooves us to see how much use we can make of unsynchronized or NTP-synchronized clocks.

Finally, one might hope that a highly accurate clock will have very low skew, because if it had high skew it would not tend to be highly accurate. In § 12.9 we briefly investigate the degree to which this held for the closely synchronized hosts, and find that it is only somewhat true. We also briefly argue in that section that, even with separate synchronization such as GPS receivers, sound measurement still calls for calibrating the timestamps.

12.3 Terminology for comparing clocks

A fundamental part of our experimental design was to arrange to record packet departures and arrivals at both ends of the end-to-end TCP connections between the NPD hosts. Doing so is crucial for discriminating between network conditions on the forward path, in which the data packets flowed, and the reverse path, over which only the receiver's acks flowed (since the TCP transfers were unidirectional). While recording packets at only one of the connection's endpoints is logistically much easier, analyzing network effects then becomes much more difficult, because the forward and reverse path become deeply intertwined.

Tracing packets at both ends, however, immediately raises questions about how to compare the timestamps produced by the packet filters at the two endpoints. In this section, we develop terminology for discussing differences between the two clocks producing the timestamps. The definitions are, for the most part, analogous to those in § 12.1, except that, instead of comparing a single clock against ``true'' time, we are comparing one clock against another.

We first introduce the meta-notation of a subscript ``s'' denoting time measured at the TCP sender, and ``r'' denoting time at the TCP receiver. Because our transfers are unidirectional, data flows only from the sender to the receiver, and acks flow from the receiver to the sender. Let C_s and C_r refer to the clocks at the sender and receiver, and R_s and R_r their respective resolutions.

We define C_r's offset relative to C_s at a particular true time T as T_r − T_s: the instantaneous difference between the time T_r reported by C_r and the time T_s reported by C_s. For brevity, we will sometimes simply speak of C_r's relative offset, leaving implicit C_s, the clock against which C_r is compared. Similarly, C_r's relative skew is the first derivative of C_r's relative offset with respect to true time. Since we lack an independent means of measuring true time, we can only estimate C_r's relative skew in terms of time as measured by either C_s or C_r. See § 12.7 for further discussion.

If C_r is accurate relative to C_s (their relative offset is zero), then we will refer to the pair of clocks as ``synchronized.'' Note that clocks can be highly synchronized yet arbitrarily inaccurate in terms of how well they tell true time. This point is important because, for the analysis of our measurements, synchronization between C_s and C_r is more useful than the absolute accuracy of the clocks. The same is somewhat true of skew, too: as long as the absolute skew is not too great (§ 12.7.9), it is minimal relative skew that matters more, because relative skew induces systematic trends in the packet transit times computed by comparing timestamps produced by the two clocks.
In addition, since we lack an independent time standard in our study, we have no general way of assessing absolute skew, only relative skew. These distinctions arise because what is often most important for our measurements are differences in time as computed by comparing the timestamps from the two clocks. The process of computing the difference removes any error due to clock inaccuracies with respect to true time; but it is crucial that the differences themselves reflect good approximations to differences in true time.

For resolution, what we care about is not ``relative resolution'' but joint resolution, which we define as R_{s,r} = R_s + R_r. This definition reflects the fact that, when comparing timestamps from C_s with those from C_r, the corresponding uncertainties must be added to properly propagate the resulting total uncertainty.

While the presence of generally unsynchronized clocks in our study presents a number of measurement headaches, it also provides an opportunity for detecting certain types of clock errors---namely adjustments and skew---that sometimes cannot be determined at all when analyzing timestamps produced by a single clock. We delve into methods for detecting such errors in detail in the subsequent sections.

12.4 Assessing clock resolution

All of the computers participating in our study ran some variant of the Unix operating system. Unix defines a data structure for recording timestamps that has two fields, one for how many seconds have elapsed since a particular epoch, and one for how many microseconds have elapsed since the beginning of the current second. Thus, timestamp resolution is never better than 1 µsec. It can be much worse.

The basic idea behind estimating the resolution of the packet filter timestamps produced by the clocks in our study is to examine consecutive timestamps to determine the smallest difference between them. Unfortunately, Unix systems differ on how they report the time on subsequent calls during which the (digital) clock has not advanced. Some systems simply return the same unchanged time as given for previous calls. These are easy to detect, by disregarding timestamp differences of zero when determining clock resolution.

Other Unix systems add a small increment to the reported time to maintain monotone-increasing timestamps. We will refer to these adjustments as monotonicity increments. For such systems, we do not want to consider monotonicity increments when evaluating the clock's resolution, since they are artifacts of a more coarse resolution. Such systems generally increase the clock by 1 µsec to maintain monotonicity, but we cannot simply disregard timestamp differences of exactly 1 µsec, because it is possible that other processes running on the same machine (or even the packet filter, when discarding unwanted traffic) have queried the clock multiple times, making the increase n µsec. We proceed by hoping that occasionally n is small (in particular, n < 5), so that, if we observe a very small, positive timestamp difference, then we can infer that the system uses monotonicity increments.

12.4.1 Method for assessing resolution

Taking these considerations into account, we use the following method for estimating the clock resolution R̂:

1. Let T_i, 0 ≤ i ≤ n, be the ith packet filter timestamp, given n + 1 successive timestamps.

2. Let ΔT_i = T_i − T_{i−1}, 1 ≤ i ≤ n, be the differences between successive timestamps.
3. If any ΔT_i is less than zero, then the timestamps exhibit time travel, and the timing is untrustworthy (§ 10.3.7).

4. If any ΔT_i is greater than zero but less than 5 µsec, then set R̂_0 to the smallest ΔT_i greater than 100 µsec.

5. Otherwise, set R̂_0 to the smallest ΔT_i greater than zero.

This method either produces R̂_0, an initial bound on the clock resolution, or the determination that the timestamps are polluted by time travel. If the former, we then form our estimate R̂ as R̂_0 rounded to two decimal digits.[2] The rounding is primarily to introduce a reminder that R̂ is only a rough estimate, and not to be taken too exactly. It is also useful for ensuring that a resolution like 10 msec is expressed as such, rather than 9.999 msec, as can happen if two timestamps differ by slightly less than 10 msec because of a monotonicity increment. (We give a short code sketch of this procedure below.)

[2] The exact algorithm used by tcpanaly is slightly more complicated. It executes the above algorithm ``on the fly,'' for historical reasons. To minimize computation, tcpanaly only decreases R̂_0 if a new value is at least 2.5% smaller than the best value so far.

Note that this computation of R̂ produces at best an upper bound on R, the clock's true resolution, because it may happen that the packet filter never receives back-to-back packets as little as R seconds apart. For our purposes, this inaccuracy is acceptable, because the extra error introduced is conservative in the sense that it only widens the uncertainties we associate with our timing analysis.

12.4.2 Results of assessing resolution

tcpanaly uses the method outlined in the previous section to estimate the timestamp resolution of each trace it analyzes. We would hope to always observe roughly the same value for each particular packet filter, since a computer clock's resolution changes only very rarely (due to a hardware or perhaps operating system upgrade). This is indeed the case. Here we summarize the resolutions of the timestamps returned by the different packet filters.[3]

Three of the systems, oce, ucol (during N_1), and xor, always had an estimated resolution of 10 msec. Their operating systems were IRIX 4.0, SunOS 4.1.3, and Solaris 2.3. A number of other sites running these operating systems also participated in the study, all with finer resolutions, so the limitations must be due to either hardware constraints or user configuration, rather than being fixed by the operating systems. We did not further investigate the hardware differences, as our primary interest is in accurately estimating a packet filter's timestamp resolution, and not the details of why the resolution is what it is. The coarse 10 msec resolution proves problematic during our later analysis, because it makes it difficult to resolve, for example, bottleneck bandwidths with any sort of precision. We address this difficulty in § 14.7.

One system, sandia, also running IRIX 4.0, always had an estimated resolution of either 1 msec or 990 µsec. All of the Digital Unix OSF/1 systems (harv, mit, umann, ucol in N_2) always had a resolution of 980 µsec or 970 µsec, which matches a clock advance of 2^10 = 1,024 ticks/sec.

Some of the SunOS (nrao, umont, unij) and BSDI (austr, rain) systems always had resolutions ≥ 200 µsec, while other SunOS and BSDI systems had finer resolutions, again suggesting hardware differences or user configuration. Of the remainder, all exhibited resolutions finer than 200 µsec, though not in every trace.
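Recapping the procedure of § 12.4.1 in code form, the following is a minimal sketch in Python. It is our own illustration rather than tcpanaly's implementation (which, as the footnote above notes, applies the same logic incrementally):

    def estimate_resolution(timestamps):
        # Estimate clock resolution (in seconds) from successive packet-filter
        # timestamps, following the numbered steps of Section 12.4.1.
        deltas = [t1 - t0 for t0, t1 in zip(timestamps, timestamps[1:])]

        # Step 3: any negative difference means "time travel"; timing untrustworthy.
        if any(d < 0 for d in deltas):
            return None

        positive = [d for d in deltas if d > 0]
        if not positive:
            return None              # clock never advanced between packets

        if min(positive) < 5e-6:
            # Step 4: a tiny positive difference implies monotonicity increments,
            # so base the bound only on differences above 100 usec.
            candidates = [d for d in positive if d > 100e-6]
            if not candidates:
                return None
            r0 = min(candidates)
        else:
            # Step 5: otherwise the smallest positive difference bounds R.
            r0 = min(positive)

        # Round to two significant decimal digits, a reminder that the result is
        # only a rough upper bound on the true resolution.
        return float(f"{r0:.1e}")

For example, timestamps that only ever advance in 10 msec steps yield an estimate of 0.01 sec, while a trace polluted by a backward step yields no estimate at all.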
The median resolutions over all of the traces were almost always in the 10–300 µsec range. This turns out to be ample for our purposes.

Finally, we note that estimates based on packet traces from a given host H receiving a unidirectional data transfer tend to be slightly larger (more coarse) than those from traces of H sending the data. The difference is on the order of 3–25%. It can be understood in terms of the overestimation effect discussed in the previous section, namely that, if the packet filter never sees back-to-back packets with a spacing equal to the clock resolution, then tcpanaly has no opportunity to accurately estimate the resolution. A TCP sender will often send two packets back-to-back as the window slides or the congestion window opens (§ 9.2.2), and these then provide an opportunity to observe minimally spaced timestamps. TCP receivers, on the other hand, receive these packets spaced out by the bottleneck bandwidth (Chapter 14), generally well above the clock resolution. Furthermore, most implementations will wait to send an ack until the receiving application has read at least two packets' worth of data (§ 11.6.1), which will entail extra delay, perhaps more than the clock's true resolution.

[3] Recall that some NPD sites used a separate computer for monitoring the NPD traffic (Table XIV). All of the analysis in this chapter concerns the clock of the host used in tracing the traffic, as that is the only clock relevant to our subsequent analysis.

12.5 Assessing relative clock offset

In this section we discuss how to estimate the relative offset between two network clocks. The closer the offset is to zero, the greater the relative clock accuracy (degree of synchronization). For our purposes, estimating relative offset is not crucial to our subsequent analysis of network dynamics. We only need to do so in order to construct legible plots of the two-way flow of packets and acks, and to qualitatively investigate the relationship between large relative offset and other clock problems such as relative skew. Accordingly, we are satisfied with the method developed in this section even though it is not highly accurate.

12.5.1 Method for assessing relative offset

Let ΔT_ps be the time required to send a packet p_s from host s to host r. In general, we refer to this time as the ``one-way transit time'' or ``OTT.'' Suppose p_s is sent from s with a timestamp T_s from s's clock, and it is received at r with a local timestamp T_r. If the clock C_r were perfectly synchronized with C_s, then we would have ΔT_ps = T_r − T_s. In general, however, the timestamp difference also reflects C_r's offset relative to C_s:

    ΔT_ps = (T_r − T_s) − ΔC_{r,s},    (12.1)

where ΔC_{r,s} is C_r's relative offset (§ 12.3). Our strategy is to estimate ΔT_ps and then use that estimate to estimate ΔC_{r,s}, as follows. First, define the measured timestamp difference:

    ΔT̃_ps = T_r − T_s.    (12.2)

We can then rewrite Eqn 12.1 as:

    ΔC_{r,s} = ΔT̃_ps − ΔT_ps.    (12.3)

The difficulty in applying Eqn 12.3 is that we do not know ΔT_ps, the true OTT, which depends both on the network conditions the packet encounters and on the packet's size. For a unidirectional transfer the size is almost always large for packets from the sender to the receiver (the exception being the SYN and FIN handshake packets that delimit the connection, and the occasional very small data packet sent due to buffer boundary mismatches), and always small for the acks sent in the reverse direction. We can, however, attempt to control for network conditions, by selecting the minimal observed ΔT̃_ps. (Here we are applying the assumption that minima occur during times when the network is unloaded.) Selecting the minimal value works because (most) network-induced noise is additive and positive (§ 12.6.2). Term the minimal value δT̃_ps. Similarly, we compute δT̃_pr for the acks sent in the opposite direction.
Since ΔC_{r,s} = ΔT̃_ps − ΔT_ps, we would like to estimate it using the minimal measured values δT̃_ps and δT̃_pr. Sources of error in doing so include the possibility that one or both of the network paths were never unloaded during the transfer, differences in skew between C_r and C_s, and asymmetries in the routes in the two directions, which we know from Chapter 8 are quite common.

While keeping these uncertainties in mind, we can manipulate Eqn 12.3 as follows. Combining it with its counterpart for the reverse path, ΔC_{s,r} = −ΔC_{r,s} = ΔT̃_pr − ΔT_pr, gives:

    2 ΔC_{r,s} = (ΔT̃_ps − ΔT̃_pr) − (ΔT_ps − ΔT_pr).    (12.4)

We now make two approximations: the first, that we can estimate ΔT̃_ps and ΔT̃_pr using the minimal values δT̃_ps and δT̃_pr; and the second, that:

    ΔT_pr = ΔT_ps.    (12.5)

Eqn 12.5 corresponds to an assumption that the OTTs in the two directions are the same. We know that this is not in general true, for the reasons given above, but are otherwise at a loss as to how to rectify the clock readings. It is the inaccuracy of Eqn 12.5 that requires us to make only casual use of the estimate for ΔC_{r,s}, as discussed at the beginning of the section. We note that the Network Time Protocol must make this same assumption when attempting to synchronize clocks over the Internet. See Claffy et al. for further discussion [CPB93a]. With this assumption, we then have:

    ΔC_{r,s} ≈ (δT̃_ps − δT̃_pr) / 2.    (12.6)

We correspondingly estimate the minimal round-trip time between s and r as:

    min-RTT_sr = δT̃_ps + δT̃_pr.    (12.7)

Eqn 12.7 offers an immediate self-consistency check: it should always be positive due to the underlying ``network physics.'' Surprisingly, this test fails for 57 N_1 trace pairs and 30 N_2 pairs. We discuss these failures in more detail in § 12.8.1 below.

12.5.2 Relative offset for full-sized sender packets

As discussed above, the bulk transfer sender s sometimes will send full, Maximum Segment Size (MSS; § 9.2) packets, and other times shorter packets, including some with no data whatsoever. If the path from s to r is slow (low bandwidth), then the shorter packets might arrive appreciably more quickly than the full-sized packets. Sometimes it is more convenient to discuss the relative clock offset and minimal RTT as computed when considering only the full-sized packets sent by s (and continuing to consider all of the packets sent by r, which tend to be acks of uniform size). To do so, we introduce the terms ΔC^MSS_{r,s} and min-RTT^MSS_sr.

12.5.3 Results of assessing relative offset

Using the methodology developed in § 12.5.1, we evaluated the relative clock offsets in N_1 and N_2 to see what sort of variation they exhibited. A single computation of ΔC_{r,s} does not tell anything about the absolute accuracy of either C_r or C_s, but we would expect that many computations of different ΔC_{r_i,s_j}'s will reveal clusterings among the truly accurate clocks, and a large spread among the inaccurate clocks.

Maximum relative offset

In N_1, the largest observed offset was 207,982 seconds (2.4 days!). Overall, 42 times we observed an offset greater in magnitude than 1,000 seconds, almost all greater than 10,000 seconds. All of the host pairs with these large offsets included austr, and the problem clearly lay with its clock. We will see the reason for this in § 12.7.7 below.

In N_2, the largest offset was 824 seconds (13+ minutes). We observed an offset larger than 6 minutes 782 times, always with oce as one of the hosts. We will likewise see in § 12.7.8 that oce's clock and network paths have puzzling properties.

These two outliers are thus suggestive that, upon observing a very large relative clock offset, we should consider the possibility of other clock errors.

Median relative offset

We next look at clustering host clocks based on the magnitude of their median relative clock offset for all the traces in which they participated.
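(Before doing so, for concreteness: the per-trace-pair computation of Eqns 12.2, 12.6, and 12.7 that produces each offset value reduces to a few lines. The following minimal Python sketch is our own illustration, with a hypothetical representation of the timestamp pairs; it is not tcpanaly's code.)

    def offset_and_min_rtt(data_pkts, ack_pkts):
        # data_pkts: (T_s, T_r) timestamp pairs for data packets from s to r,
        #            T_s read from the sender's clock, T_r from the receiver's.
        # ack_pkts:  (T_r, T_s) pairs for acks flowing from r back to s.
        d_ps = min(t_r - t_s for t_s, t_r in data_pkts)   # Eqn 12.2 minimum: delta-T~_ps
        d_pr = min(t_s - t_r for t_r, t_s in ack_pkts)    # delta-T~_pr for the acks

        rel_offset = (d_ps - d_pr) / 2.0   # Eqn 12.6: C_r's offset relative to C_s
        min_rtt    = d_ps + d_pr           # Eqn 12.7: should be positive
        return rel_offset, min_rtt

A negative min_rtt here is precisely the self-consistency failure noted at the end of § 12.5.1.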
We use the median offset in order to isolate hosts that consistently had large relative offsets, instead of those that only occasionally had large offsets, since the latter could be skewed by unfortunately frequent pairing of a host having an accurate clock with a host having a poor clock. We use the median of the absolute value of the offset rather than the median of the offset itself as a way of detecting hosts that often ``swing'' from being too slow to too fast. For each host, we analyze the relative offsets for those traces in which it was the source; these are quite similar (though opposite in sign) to the offsets when it was the receiver, and limiting our analysis to just when the host was the source simplifies the presentation.

Figures 12.1 and 12.2 show the median magnitudes of each host's relative clock offset. In both, oce is a clear outlier, being typically 5–15 minutes different from the other clock. Note that, for N_1, austr is not a particularly striking outlier, even though in the previous section we identified it as having the largest maximum clock offset magnitudes. The reason it is not an outlier in Figure 12.1 is that its clock ran accurately for most of N_1, and only degraded late during the experimental run (see below). Hence its median relative offset over all of the transfers it participated in is quite small.

[Figure 12.1: Median magnitude of clock offset, N_1 tracing hosts.]

[Figure 12.2: Median magnitude of clock offset, N_2 tracing hosts.]

Both figures show other apparent outliers in addition to oce. We need to be careful before removing them, though, as there is a possibility that some of them have unusually high proportions of their connections to the other outliers, and hence are outliers only by ``association.'' Thus we remove the connections involving the largest outlier and recompute the plot, then remove those involving what is now the largest remaining outlier and recompute the plot, and so on, similar to the approach developed in § 7.6.1 for assessing the ``persistence'' of Internet routes.

For N_1, this process removes oce, korea, bnl, harv, sdsc, xor, lbli, and pubnix as being outliers. Note that, during the iterative process, austr ceased to be an outlier, even though in Figure 12.1 it looks like it has almost as large a median offset as pubnix: this is because it was an outlier only by association with larger outliers. After eliminating these hosts, the remainder all have median offsets < 1.25 sec. We consider this group of 17 hosts as closely synchronized. We can, if we wish, continue the process to find a core group of highly synchronized hosts: they are austr (!), bsdi, mit, nrao, and ukc, all with median offsets < 10 msec between one another.

For N_2, outlier removal eliminates the six largest spikes in Figure 12.2, namely, oce, ucla, lbli, bnl, wustl, and ucl, these last two having relatively small median offsets of 3 and 1.5 sec, respectively. We consider the remaining group of 25 hosts as closely synchronized. They all have median offsets < 600 msec, and, if lbl is removed from the group, they are all below 250 msec.
Eliminating six more of the hosts with the largest median offsets leaves a group of 18 synchronized hosts, with median offsets below 50 msec. We can further winnow the group down to a final set of highly synchronized hosts: adv, connix, harv, near, nrao, pubnix, sdsc, sintef2 (but not sintef1), ucol, and unij, all of which have median offsets between each other of less than 10 msec. Note that this group includes hosts on both coasts of North America as well as two in Europe, indicating synchronization to well within the propagation time between the hosts: very good, and around the accuracy limit for NTP reported in [Mi92b], even though we are performing a cruder estimate of accuracy (and of relative accuracy rather than absolute accuracy).

We will make use of these different groups of closely synchronized and highly synchronized hosts in § 12.9 when we test whether high clock accuracy (which we assume can be inferred from close synchronization, although this is not necessarily the case) tends to correlate with low relative clock skew.

Evolution of relative offset

We finish with a look at how a host's relative offset evolves over the course of an experimental run. The evolution is interesting because it provides a large-scale look at how clock accuracy changes. Our interest here is phenomenological---to develop an appreciation for clock inaccuracies and an awareness of how they occur.

To assess offset evolution, for each host we constructed a plot with, on the y-axis, the relative offsets (in seconds) computed for those connections for which it served as the data source, using the methodology given in § 12.5, versus, on the x-axis, the time of the connection (days since the beginning of the experiment). Since the plots are for the host as the data source, the offsets reflect the receiver's clock minus the host's clock. Hence, positive values indicate the host's clock was running behind the receiver's clock. Note that we include the sign of the offset in the plot---there is no need to use only the magnitude, as we did above.

[Figure 12.3: Evolution of austr's relative clock offset over the course of N_1.]

Figure 12.3 shows such a plot for the austr tracing host's clock over the course of the N_1 experimental run. This is the site that we identified above as sometimes having very large relative clock offsets, on the order of days, yet also, surprisingly, found not to be an outlier in terms of its median relative offset. From the figure, it is immediately clear how to reconcile the findings: up until the 14th day of austr's participation in N_1, it kept good time, but after that point its clock came unglued and ran very slowly, such that the clocks of the other hosts to which it transferred data ran further and further ahead of it (hence, higher and higher offsets). We look at this phenomenon further in § 12.7.7.

Figure 12.4 shows the evolution of N_1's greatest median offset outlier, oce, after eliminating its connections with austr. The central points in the plot reflect connections for which oce was paired with sites that had a clock closely synchronized to true time (or at least, so we presume, because of the preponderance of such clocks in the plot).[4] ``Noise'' values distant from the central points reflect pairings with other sites that had poorly synchronized clocks. We see that the 5 minute median offset actually grew increasingly negative over the course of N_1.

[4] As discussed in § 12.2, and revisited below in § 12.9, we did not require NTP synchronization of the clocks of the sites in our study. In addition, we assume that, when we discover highly synchronized clocks, the synchronization was achieved using NTP. Regrettably, we did not ask the participating sites specifics regarding the site's clock synchronization.
A robust linear fit (shown in the plot) to the points yields an overall offset decrease of about 1.5 sec/day. This is quite small compared to the magnitude of the offsets themselves.

[Figure 12.4: Evolution of oce's relative clock offset over the course of N_1.]

[Figure 12.5: Evolution of bnl's relative clock offset over the course of N_1.]

[Figure 12.6: Expanded view of the central line in the previous figure.]

Figure 12.5 shows the evolution of bnl's relative clock offset, with connections to oce removed. The central line appears to show an increasing trend, but a somewhat complicated one. To look at it in greater detail, Figure 12.6 examines just the region of the line. We observe what appear to be three separate regions of clearly upward trend, one spanning 0–5 days, one spanning 8–14 days, and one spanning 15–16 days. Each increase corresponds to about 0.7 sec/day. What is puzzling are the offset shifts between the regions. These appear to be too small to have been caused by someone adjusting bnl's clock by hand, and too far from true to have been induced by NTP synchronization. Perhaps the changes came from temporary changes in machine-room temperatures, which are known to alter clock skew [Mi92b].

Figure 12.7 shows the evolution of xor's clock during N_1, after removing connections to austr and oce. It shows not only a steadily increasing relative offset, but a 2-minute adjustment around day 6. We look at clock adjustments in more detail in § 12.6 below.

Figure 12.8 shows the evolution of oce's relative offset over the course of N_2 (as opposed to N_1 in Figure 12.4). The sustained decreasing offset is striking; the fit corresponds to a decrease of about 1.4 sec/day.

Figure 12.9 shows the evolution of lbli's clock during N_2. While overall the clock has a clear persistent skew, the skew is reversed around day 8, perhaps in an effort to correct the clock's inaccuracy. But the effort ends a few days later and the original skew returns. However, around day 27 the clock's relative offset jumps by over a minute, reflecting a different sort of correction.

Figure 12.10 shows how sandia's clock evolved during N_2. For most of the experimental run the clock performs very smoothly, but around day 20 it began a slow increase over the next week, eventually reaching 3 seconds. During this week it initiated transfers to a number of different other sites, so this effect is definitely due to its own clock variation rather than those of its NPD peers.

Figure 12.11 presents our last example of interesting clock offset evolution, that for umont's clock during N_2. What is striking here is the presence of offset ``towers'' that, over the course of hours, slowly elevate the relative offset from nearly zero to several hundred milliseconds. Apparently what is happening is that umont's clock has a fairly hearty intrinsic skew, but NTP synchronization is detecting this and periodically resetting the clock as it strays too far.
We will see more regarding this behavior of umont's clock below in § 12.6.5.

[Figure 12.7: Evolution of xor's relative clock offset over the course of N_1.]

[Figure 12.8: Evolution of oce's relative clock offset over the course of N_2.]

[Figure 12.9: Evolution of lbli's relative clock offset over the course of N_2.]

[Figure 12.10: Evolution of sandia's relative clock offset over the course of N_2.]

[Figure 12.11: Evolution of umont's relative clock offset over the course of N_2.]

12.6 Detecting clock adjustments

As shown quite strikingly in Figures 12.7 and 12.9, computer clocks are sometimes subject to abrupt adjustments in which the clock's notion of the current time is changed, either gradually or instantaneously (§ 12.2). Gradual change is produced by artificially altering the clock's skew, so that it slowly alters its offset towards the target. Instantaneous change is produced by simply loading a new value into the clock register.

In order to characterize Internet packet dynamics, we will make heavy use in later chapters of variation in one-way trip times (OTTs). A clock adjustment will result in a systematic shift in OTTs between those computed prior to the adjustment and those computed after (illustrated below). If undetected, such a shift can lead to completely erroneous findings of periods of sustained high delay. Since we are very interested in the possibility that network dynamics truly have this property anyway, it is vital that we reliably detect clock adjustments so as not to be fooled by them into drawing such a conclusion.

Backward clock adjustments, in which a clock is set to a value it already registered in the past, can sometimes be easily detected if the adjustment is large, by the presence of a pair of timestamps T_1 and T_2 for which T_2 < T_1 even though T_2 was recorded after T_1. We refer to this sort of adjustment as ``time travel,'' and already analyzed it in § 10.3.7. In this section we tackle the harder problem of clock adjustments (both forward and backward) that are not apparent by trivial inspection of the timestamp sequences.

12.6.1 A graphical technique for detecting adjustments

Suppose we have a trace pair between s and r. One simple way to detect whether a clock adjustment occurred during the trace is to plot both the OTTs for the packets from s to r and those in the reverse direction. (Packets that are dropped have no OTT associated with them and are omitted from the plot.)

[Figure 12.12: OTT-pair plot illustrating a clock adjustment (sender packets are filled, receiver packets are hollow).]

Figure 12.12 shows such a plot made for a connection from sdsc to usc in N_1. The solid black squares indicate the OTTs for packets sent from the sender to the receiver, and the hollow squares reflect the OTTs of the acks sent from the receiver to the sender. The OTTs have been adjusted using Eqn 12.6 to approximately synchronize the two clocks. (In this case, the approximation does not work particularly well, since there is more than one clock offset to estimate!) The figure shows a striking level shift occurring for the sender's OTTs around time T = 0.7 seconds, a fall of about 10 msec.
Furthermore, the OTTs in the opposite direction show an equal and opposite change. This equal and opposite change is a crucial aspect of the plot, as it is the signature of a clock adjustment. If the shift were due to a change in network path properties (for example, a route change), then in general we would expect that (1) either it would occur in only one direction, or (2) if it occurred in both directions due to a coupled effect, it would have the same sign. For a networking change to result in an equal-but-opposite level shift, some resource needs to have been shifted between the two directions of the network path, and furthermore the resource needs to affect the transit times of the small acks equally with those of the large data packets. It is difficult to see what sort of networking change could do this (but see § 12.7.8).

The change, however, makes perfect sense if, at around time T = 0.7 seconds, sdsc's clock was set ahead 10 msec, or usc's clock was set back 10 msec. In either of these cases, the difference in the timestamps for packets sent from sdsc to usc, i.e., the quantity ΔT̃_ps defined in Eqn 12.2, will decrease by 10 msec, and similarly ΔT̃_pr will increase by 10 msec. This is exactly the behavior shown in the plot.

12.6.2 Removing noise from OTT measurements

Two other points concerning Figure 12.12 merit attention. The first is the presence of a few unusually small sender packet OTTs, one of about 7 msec around T = 0, and the other at the very end of the connection (such packets are the reason for introducing ΔC^MSS_{r,s}, as discussed in § 12.5.2). Both of these reflect sender packets that did not carry any data (the SYN and FIN connection management packets). These travel through the network more quickly than full-sized data packets. Often in OTT plots we will include such packets (as they are a useful reminder of one source of OTT variation), but we need to be careful when developing techniques for analyzing OTT behavior to remember that these packets have unusually low OTTs due to their size. Hence our techniques need to be careful not to weigh their OTT values the same as those for full-sized packets.

The second important point shown in the plot is the large variation in OTTs, both for the full-sized sender packets and the receiver packets. For example, note that the OTTs of both some of the acks before the adjustment, and some of the data packets after the adjustment, are larger than many of the OTTs on the other side of the adjustment. This variation is the first suggestion that we will require robust algorithms in order not to be fooled by noise when analyzing OTT data. The eye quite readily picks out the twin level shifts in this plot, but doing so algorithmically requires care to screen out noise such as these large OTT values.

OTTs often exhibit considerable network-induced noise, in terms of deviation of a given OTT from the value expected if the network were unloaded. The noise, however, has one crucial property that often makes it tractable: barring a significant change in the network path (such as a route change), the noise always takes the form of an additive, positive increase. This means that, given a set of OTT measurements, we can often hope to find those with very little network-induced noise by looking at the smallest values in the set. We used this property of OTT noise in § 12.5.1 above when we picked δT̃_ps and δT̃_pr as the measured raw offsets to use when attempting to estimate the relative clock offset. We will use it again when developing methods to detect clock adjustments and skew.
For these latter, what is interesting are trends in how the OTT values (with noise removed) change over the course of the connection. Thus, we cannot simply de-noise the OTT values by selecting the global minimum, or we will obliterate the trend. Instead we divide the series of OTT values up into intervals and de-noise each interval by selecting the minimum value observed during the interval.

The question then becomes which intervals to use. One natural way of devising intervals is to allocate them so that each has the same number of packets. Another is to choose them so that they each span the same amount of time. For assessing trends in OTT values over time, the latter seems to be the natural choice. But using fixed-time intervals has a fundamental problem. Sometimes a connection's activity primarily occurs during only a small portion of the connection's total duration, with the rest of the time mostly inactive due to lengthy retransmission timeout lulls.

To address this difficulty, we combine the two approaches by choosing both a packet-count interval, I_p, and a duration interval, I_t. We then advance through the OTT timings and group timings into a single interval whenever we have either encountered I_p packets, or we have reached a point I_t from the beginning of the interval. At this point, if we have any packets at all, we take their minimum as the de-noised OTT value for the interval, and we begin a new interval by resetting the packet count and setting the start of the interval to coincide with the next OTT measurement.

One detail we must attend to is the final partial interval at the end of the connection. It in general will not span I_t nor have a full I_p's worth of packets in it. We adopted the rule that, if the interval had more than I_p/2 packets, we included it; otherwise, we skipped it.

The final issue is how to pick I_p and I_t. For a set of n OTT measurements spanning an interval ΔT, we used:

    I_p = ⌊√n⌋,    I_t = ΔT / √n.

Using these choices means that the number of de-noised OTT values scales as the square root of the total number of values. This struck us as a good compromise between preserving sufficient detail and not using too fine a resolution (which could mean we do not effectively remove noise). Furthermore, we anticipate subsequently applying a number of robust algorithms to the de-noised values, some of which have running times of O(n^2) or higher. For these, if we present them with only O(√n) values, then the total running time will remain O(n) or only slightly higher, which is important for performing fast automatic analysis.

We will refer to a measured series of OTT values as x_t. Here, x_t can reflect either a series of data packet OTTs, or ack OTTs. To detect adjustments ultimately requires comparing properties of the data packet OTTs with those of the ack OTTs, but prior to developing the tests on these properties, our discussion will apply to any generic series of OTT values. We denote the de-noised series derived from x_t as x̄_t. Note that, for each x̄_t, the index t corresponds to the same index as where in the interval we found the (first) minimal value of x_t. This is an important point---if we instead adjusted the index to reflect, say, the middle of the interval, then we might introduce inaccuracies in the trends. The key idea is that the ``best'' (least noisy) value of x_t during the interval occurred at a particular t, and we want to take that point and discard all the others in the interval.
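A minimal sketch of this de-noising step in Python (our own illustration, not tcpanaly's code; the input is assumed to be a list of (time, OTT) pairs ordered by measurement time):

    import math

    def denoise_otts(samples):
        # samples: list of (t, ott) pairs ordered by t.  Returns one (t, ott) pair
        # per interval: the (first) minimal OTT in the interval, keeping its
        # original time index.
        n = len(samples)
        if n == 0:
            return []
        span = samples[-1][0] - samples[0][0]
        i_p = max(1, int(math.sqrt(n)))                           # packet-count interval
        i_t = span / math.sqrt(n) if span > 0 else float('inf')   # duration interval

        denoised, bucket = [], []
        start = samples[0][0]
        for t, ott in samples:
            if bucket and (len(bucket) >= i_p or t - start >= i_t):
                denoised.append(min(bucket, key=lambda p: p[1]))  # keep first minimum
                bucket, start = [], t      # next interval starts at this measurement
            bucket.append((t, ott))
        # Final partial interval: keep it only if it has more than I_p / 2 packets.
        if len(bucket) > i_p / 2:
            denoised.append(min(bucket, key=lambda p: p[1]))
        return denoised

Note that Python's min() returns the earliest of tied elements, matching the ``first minimal value'' rule above.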
Figure 12.13 shows the results of applying this de-noising method to the measurements plotted in Figure 12.12.

[Figure 12.13: Same measurements after de-noising (pair plot).]

12.6.3 An algorithm for detecting adjustments

We now turn to attempting to detect adjustments algorithmically, since it is infeasible to manually inspect all 20,000 of our trace pairs to look for adjustments (§ 9.1.4). The central notion we will use is that of the signature of the OTTs in the two directions showing equal but opposite level shifts.

Identifying pivots

The foundation of our approach lies in identifying pivots: points in time before which the OTTs all lie predominantly above or below all the OTTs after the given point in time. In Figure 12.12, the pivot we aim to identify occurs around T = 0.7 sec. In this subsection we develop a heuristic for identifying pivots in the series of OTTs for packets sent in a single direction (from s to r or vice versa). In the next subsection we then analyze the pivots identified in both directions to test for a clock adjustment.

Let x̄_t be a series of de-noised OTT values occurring at times t, ordered by the time index t. Let x̄_{t_i} be the same series numbered from i = 1 ... n, where t_i is the ith measurement time. We define a pivot partition of x̄_t as a partition of x̄_t into two disjoint sets, x̄'_t and x̄''_t, for which the maximum of one set is less than the minimum of the other. Without loss of generality, let x̄'_t be the ``larger'' of the two sets, i.e., its minimum is larger than the maximum of x̄''_t. We further require that the time intervals spanned by x̄'_t and x̄''_t are disjoint, namely that either the largest index i occurring in x̄'_t is less than the smallest index j occurring in x̄''_t, or vice versa. We term the pivot partition positive if the measurements in x̄'_t occurred after those in x̄''_t, and negative otherwise.

Geometrically, this definition corresponds to being able to draw horizontal and vertical lines on a plot like that in Figure 12.13 such that either all of the points lie in the first and third quadrants formed by the lines (if positive), or they all lie in the second and fourth quadrants (negative).

It is important to note that a given series x̄_t may have more than one pivot partition. For example, if x̄_t is strictly decreasing, then every value of t gives rise to a pivot partition. In addition, any time the largest or smallest value of x̄_t occurs at the lowest value of t, i.e., at x̄_{t_1}, then there is a pivot partition that isolates that one value versus placing all the other values in the other partition set. Generally, this is not a pivot partition of interest.

We proceed as follows. First, we determine whether to search for a positive or negative pivot by inspecting whether x̄_{t_1} is less than or greater than x̄_{t_n}. From here on, we assume without loss of generality that we wish to detect a positive pivot, such as the one exhibited by the receiver packets (hollow squares) in Figure 12.12. We indicate in brackets, like [this], the changes we make to the algorithm when testing instead for a negative pivot.

We search through the measurements to find the point k for which

    min(x̄_{t_{k+1}}, x̄_{t_{k+2}}) − max(x̄_{t_{k−1}}, x̄_{t_k})

is greatest [for a negative pivot, the point k for which max(x̄_{t_{k+1}}, x̄_{t_{k+2}}) − min(x̄_{t_{k−1}}, x̄_{t_k}) is most negative]. By comparing against two de-noised values on each side of the candidate point, we spread the differencing over the additional intervals on either side to combat the problem of the intervals right at the pivot misleading us due to noise.
Note that this spreading operation also means that we cannot detect a pivot that occurs right at the beginning or end of a connection (§ 12.6.5).

k is now the candidate pivot (actually, the potential pivot occurs at a point in time between measurement k and measurement k + 1). We then inspect the points with indices ≤ k to find x̆_k, the largest [respectively, the smallest] value before the candidate pivot, and likewise those with indices > k to find x̆_{k+1}, the smallest [largest] value after the candidate. If x̆_k is less [greater] than x̆_{k+1}, then we conclude that [k, k + 1] does indeed straddle a pivot; otherwise, we conclude it does not.

If we find a pivot partition, then we define its magnitude M as the absolute value of the difference between the median of the points after the pivot and the median of those before. We also associate with the pivot a width, W = t_{k+1} − t_k, the time spanned between the two de-noised measurements that straddle it.

Identifying adjustment signatures

We now turn to identifying the signature of a clock adjustment for the clocks of two hosts, s and r. The method we developed is not entirely satisfying, as it uses some heuristics in order to accommodate residual noise in the OTT measurements, while attempting not to mistake genuine networking effects for a clock adjustment. However, the method appears to work well in practice. We note, though, that the method assumes that clock adjustments are relatively rare events: rare enough that our traces are likely to exhibit at most one adjustment, and that the likelihood of both of the clocks we are comparing exhibiting an adjustment during the trace is negligible.[5]

[5] This assumption might be violated if NTP updates among widely separated clocks sometimes happen in synchronization. To our knowledge, the possibility of this occurring for NTP has not been studied. Given the findings of synchronized routing messages reported in [FJ94], it does not seem completely implausible.

Suppose we have two sets of de-noised OTT measurements, s̄_t and r̄_t, corresponding to full-sized packets from the data sender to the receiver, and acks in the other direction, respectively. If either of s̄_t or r̄_t does not exhibit a pivot, or if the pivots are both positive or both negative, then we conclude there was not any clock adjustment.

Let M_s, W_s, M_r, and W_r be the magnitudes and widths of the corresponding pivots. We next check whether the pivots overlap. Let s_1 and s_2 denote the packets bracketing s̄_t's pivot region, and likewise r_1 and r_2 for r̄_t's. Let s^s_1 denote the time at which s_1 was sent from s (according to s's clock), and s^r_1 the time at which it arrived at r (according to r's clock). With analogous definitions for the other packets, we then conclude that the pivots overlap if either of the following holds:

    s^r_1 < r^r_2 + δt  and  s^r_2 + δt > r^r_1,    or
    r^s_1 < s^s_2 + δt  and  r^s_2 + δt > s^s_1,

where δt is the allowed measurement ``slop,'' which we set to:

    δt = max(W_s, W_r) / 2.

The idea behind the slop is to allow for other-than-instantaneous adjustments (illustrated below). If the pivots do not overlap, then we conclude there was no adjustment.

If they do, we then next look at the magnitudes of the pivots. If either magnitude is less than the larger of twice the joint clock resolution R_{s,r} (§ 12.3), or 2 msec (an arbitrary value to weed out fairly insignificant adjustments), then we declare the pivot ``insignificant'' and ignore it.

Finally, we look to see whether M_s and M_r are within a factor of two of each other.
If not, then we term the pivot a ``disparity pivot,'' meaning that it may be due to unusual networking dynamics (§ 12.6.5). If the two agree within a factor of two (which experience has shown is a good cut-off point), then we conclude that the trace pair exhibits a clock adjustment with a magnitude of about (M_s + M_r)/2.

12.6.4 Results of checking for adjustments

tcpanaly uses the method given in § 12.6.3 to check each trace pair it analyzes for clock adjustments. Doing so, we found 36 trace pairs in N_1 out of 2,335 (1.5%) that exhibited clock adjustments, and 128 out of 15,492 in N_2 (0.8%). While these proportions are fairly low (and not representative, since the behavior of the individual hosts in our study is not necessarily representative), they are high enough to argue that a large-scale measurement study for which accurate timestamps are important needs to take into account the possibility of clock adjustments. Furthermore, the adjustments are only detectable due to the use of a pair of clocks. If a study uses timestamps from only one measurement endpoint, then checking the timestamps for clock adjustments becomes much more difficult.

The median adjustments were on the order of 10–20 msec, the mean around 100 msec, and the maxima close to 1 sec. These magnitudes are unfortunately small enough to sometimes not be glaringly obvious, but large enough to be comparable to wide-area packet transit times, so they can introduce quite large analysis errors if undetected.

While clock adjustments are usually abrupt, this is not always the case. The adjustment-detection method found some clock adjustments that occurred due to a short period of altered clock frequency (i.e., temporary skew). Figure 12.14 shows a striking example.[6] Here, around time T = 40 sec the sender's clock began running more quickly than the receiver's, leading to lower sender OTTs and higher receiver OTTs. Less than 20 seconds later, the frequencies were again equal, but the relative offsets between the clocks shifted by nearly 1 sec in the process.

12.6.5 Problems with detection method

The method given in § 12.6.3 works well in practice, but it does sometimes fail to detect clock adjustments. In this section we look at some cases where we identified this happening.

[Figure 12.14: Clock adjustment via temporary skew.]

[Figure 12.15: Temporary skew leading to separate pivots.]

[Figure 12.16: Clock adjustment masked by excessive network delays.]

Failure to detect adjustment via skew

In Figure 12.14 we illustrated how a clock adjustment can sometimes occur due to temporary skew. Figure 12.15, however, shows such a case that the method fails to detect. The problem here is that, due to noise in the forward direction, the two pivots located by the method do not overlap, so the possibility of an adjustment is rejected. The lefthand vertical line marks the pivot the method found for the data packets (solid), and the righthand vertical line marks the pivot for the acks (hollow). In general, this sort of failure will only occur with adjustments using temporary skew; abrupt adjustments have sharply defined pivots. This example does, however, exhibit a negative estimate for min-RTT_sr (§ 12.5.1), so tcpanaly still flags it as having a clock problem.
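(For reference, the overlap-and-magnitude test of § 12.6.3 that fails in this example can be sketched compactly. The following Python fragment is our own simplified illustration: it assumes the per-direction pivot search has already produced each pivot's sign, magnitude, width, and bracketing timestamps, and, unlike the full method, it applies the overlap test against a single common clock rather than both.)

    def adjustment_signature(piv_s, piv_r, joint_resolution):
        # piv_s, piv_r: pivots found for the data-packet and ack OTT series.
        # Each is a dict with keys:
        #   'sign'        +1 for a positive pivot, -1 for a negative one
        #   'mag','width' the pivot's magnitude M and width W
        #   't1','t2'     timestamps of the bracketing packets, here taken at one
        #                 common clock so the two pivot regions are comparable
        if piv_s is None or piv_r is None:
            return None                      # no pivot in one direction
        if piv_s['sign'] == piv_r['sign']:
            return None                      # need equal-but-opposite shifts

        slop = max(piv_s['width'], piv_r['width']) / 2.0
        overlap = (piv_s['t1'] < piv_r['t2'] + slop and
                   piv_s['t2'] + slop > piv_r['t1'])
        if not overlap:
            return None                      # this is the check that fails above

        m_s, m_r = piv_s['mag'], piv_r['mag']
        if min(m_s, m_r) < max(2 * joint_resolution, 0.002):
            return None                      # "insignificant" pivot
        if not (0.5 <= m_s / m_r <= 2.0):
            return 'disparity pivot'         # magnitudes differ by more than 2x
        return (m_s + m_r) / 2.0             # estimated adjustment magnitude

In the Figure 12.15 case it is the overlap test that rejects the adjustment; the negative min-RTT_sr estimate provides the backstop.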
Excessive network-induced delay

Figure 12.16 shows a case where the reverse path exhibits a clear level shift around T = 70 sec, with a magnitude of about 250 msec, but the corresponding shift on the forward path is less clear because it is accompanied by an increase in networking delays, too. In that direction, tcpanaly assesses the magnitude of the shift as about 730 msec. Since this is more than twice the magnitude in the other direction, tcpanaly rejects the possibility of a clock adjustment.

tcpanaly flags a trace pair like this as having a ``disparity pivot,'' namely common pivots that have too great a difference in their magnitudes to be considered a clock adjustment. Disparity pivots are quite rare (only 61 in N_2). We inspected each one and found that only the one shown above was a likely clock adjustment. The rest appear simply due to unfortuitous patterns of noise, often in truncated traces (§ 10.3.4) with few OTT timings.

[6] Note that the OTTs in the plot have not been ``de-noised'' (discussed in § 12.6.2). Likewise, subsequent OTT plots do not show de-noised OTTs unless so stated.

[Figure 12.17: Clock adjustment missed because too close to end of connection.]

Adjustment too close to connection edge

Since our method for identifying pivots (§ 12.6.3) will not accept a pivot right at the beginning or at the end of a connection, tcpanaly naturally will miss this sort of adjustment should it occur. Figure 12.17 shows an example. This one, like the one above, is still detected by tcpanaly due to a negative estimate for min-RTT_sr.

Multiple adjustments

The development of the clock adjustment detection algorithm presumes that there is a single clock adjustment to be detected. Sometimes a trace pair suffers from more than one adjustment, and the algorithm either only detects one of them (which suffices, if the policy is to discard trace pairs with any adjustments in them), or fails to detect any of them. The latter is particularly likely if there are two adjustments in opposite directions. Figure 12.18 shows a striking example of a trace pair with two adjustments, both effected using temporary skew. The algorithm fails to detect these adjustments, but tcpanaly flags the trace pair due to a negative estimate for min-RTT_sr, as well as due to strong negative correlation between the two directions (§ 12.6.6 below).

[Figure 12.18: Double clock adjustment (both using temporary skew).]

[Figure 12.19: Clock adjustment ``hiccup.'']

Clock ``hiccups''

Related to the multiple adjustments discussed in the previous subsection are clock ``hiccups,'' in which one of the clocks in a trace pair momentarily either ceases to advance or advances very quickly. Figure 12.19 shows an example, occurring at time T = 6 sec. It is possible that this example is actually due to surprising network dynamics, as the 4 acks with lowered OTTs come right after the only packet reordering event in the trace. While a clock glitch can change the value of OTTs, it cannot reorder packets on the wire! But it is difficult to see what networking mechanism could lead to the data packets in the opposite direction simultaneously experiencing increased delay. This hiccup is undetected by tcpanaly.
12.6.6 Detecting adjustments via correlation

When we examine a smoothed OTT pair plot such as that in Figure 12.13, a different approach for detecting adjustments suggests itself: look for strong negative correlation between the forward OTTs and the reverse OTTs.

In general, this approach suffers from two problems. First, it is highly susceptible to error due to large noise elements: periods of inflated OTT values (such as due to an increase in queueing) tend to dominate the computation of the coefficient of correlation. We attempted to address this difficulty by devising a ``robust coefficient of correlation'' based on the direction of deviations from the median, but this proved no better: we were unable to eliminate the dominant effects of noise. The second problem is that strong negative correlation is also a signature for relative clock skew, as discussed in the next section. So, by itself, it does not suffice for detecting clock adjustments.

There is still a role for correlation testing, though. In particular, if we only consider correlation significant when it is extremely strong, then the noise effects of momentary congestion periods diminish, and the approach holds promise for detecting cases of large adjustments and relative skews. In particular, very strong correlations can detect multiple adjustments and adjustments via skew, and this property motivated us to pursue it further.

The method we devised is based on examining the intervals produced when looking for pivots. For each interval i, we compute the median of the OTTs of the packets sent by the sender (either full-sized data packets or acks, depending on the direction); call this m_i^s. Similarly, for the packets received by the sender from the receiver during the interval, we compute their OTT median, m_i^r. (We require that at least three packets were sent and another three received; otherwise we skip the interval in our analysis.) We then compute ρ_{s,r}, the coefficient of correlation between the m_i^s's and their corresponding m_i^r's. Similarly, we compute ρ_{r,s} in the opposite direction; that is, we construct similar intervals based on packet departures and arrivals at r instead of at s. If tcpanaly finds that both ρ_{s,r} and ρ_{r,s} are strongly negative (below a fixed, quite stringent cut-off), then it flags the trace pair as exhibiting strong negative correlation. We then inspect the trace pair by hand (i.e., using an OTT pair plot) to determine the source of the correlations.
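A minimal sketch of this interval-median correlation check follows. The -0.9 cutoff is an assumed placeholder standing in for tcpanaly's (stricter) threshold, and in practice the check is applied twice, once with intervals based at each endpoint, with the trace pair flagged only if both directions test strongly negative.

    from statistics import median

    def strong_negative_correlation(intervals, threshold=-0.9):
        """Flag strong negative correlation between per-interval OTT medians.

        `intervals` is a list of (sender_otts, receiver_otts) pairs, one per
        pivot interval; intervals with fewer than three packets in either
        direction are skipped, as in the text.
        """
        xs, ys = [], []
        for snd, rcv in intervals:
            if len(snd) < 3 or len(rcv) < 3:
                continue
            xs.append(median(snd))
            ys.append(median(rcv))
        if len(xs) < 2:
            return False
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        vx = sum((x - mx) ** 2 for x in xs)
        vy = sum((y - my) ** 2 for y in ys)
        if vx == 0 or vy == 0:
            return False
        rho = cov / (vx * vy) ** 0.5      # Pearson coefficient of correlation
        return rho <= threshold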
We found that connections only very rarely exhibit such strong negative correlation: in N1, only two trace pairs were flagged. One of these was the double-adjustment shown in Figure 12.18. In N2, six connections were flagged. Five of these, however, involved oce, which we show below (§ 12.7.8) to have highly unusual behavior in general. The sixth is an ``edge'' clock adjustment similar to that shown in Figure 12.17.

The second N1 trace pair with strong negative correlation is quite interesting, however. Figure 12.20 shows the corresponding OTT pair plot. It is clear that the correlation stems from the tendency for the reverse-path OTTs to climb sharply, by 100-200 msec, followed shortly by the forward-path OTTs falling by roughly the same amount. Another striking feature of the plot is the sustained elevated level of the forward OTTs after about time T = 3 sec.

Figure 12.20: An OTT pair plot showing strong negative correlation [time (sec) vs. one-way delay (msec)]

These two features are fundamentally related. The link connecting the sender of this connection to the rest of the Internet had a capacity of 56 Kbit/sec, or under 7 Kbyte/sec after link-level overhead is deducted. Thus, it was not difficult for the sender to open its window sufficiently to build up a queue at this link's router; the size of the OTT increase reflects the size of this queue. Occasionally, the acknowledgements sent by the receiver are compressed: several of them arrive at a queue together and have their spacing compressed because they are placed in the queue closely together. (See § 16.3.1 for a more detailed discussion.) The signature of ``ack compression'' on an OTT plot is a quick build-up in OTT (reflecting the wait in the queue) followed by a likewise quick decrease in OTT (as the back-to-back acks all leave the queue closely spaced).

By inspecting sequence plots corresponding to this connection, we see that the ack compression leads to a lull at the sender as it waits for the lead ack of the compressed group to arrive. During this lull, the queue at the 56 Kbit/sec link connecting the sender to the Internet drains, so once the acks finally arrive and the sender sends out a bunch of packets, the first packet encounters very little queueing delay at that link. This low delay is reflected in the plot by the dip in the sender OTTs, which then immediately climb back up as the remaining packets in the bunch queue behind the lead packet. This effect occurs quite often in connections for which there is a low-speed bottleneck link. The example shown above, though, was the only one in which the effect was so strong as to be detected by the negative correlation test.

12.7 Assessing relative clock skew

Many of the clock errors discussed in § 12.5.3---often skews on the order of perhaps a second a day---might seem trivial and perhaps not worth the effort of characterizing. For purposes of keeping fairly good absolute time, this is true; for purposes of assessing network dynamics, it is not.

To illustrate why skew is a crucial concern, consider evaluating OTTs between two hosts s and r, for which r's clock runs 0.01% faster than s's. Over the course of a day, r's clock will gain about 9 seconds relative to s's clock, not a particularly large error for many purposes. If, however, we are computing OTTs between s and r, then over the course of only 10 minutes r's clock will gain 60 msec over s's clock. If we assume that variations in OTT reflect queueing delays in the network, then this minor clock drift could lead to a large false impression of growing congestion. For example, if s sends 512 byte packets to r and the bandwidth of the path between them is T1 (§ 14.7.1), then a true 60 msec increase in delay reflects the equivalent of an additional 23 packets' worth of queueing. Thus, quite ``minor'' skew differences between the two endpoint clocks can lead to quite large, erroneous assessments of queueing delay. Because we are very interested in accurately characterizing queueing time scales (§ 16.4), it is vital that we determine whether a given pair of clocks suffer from skew.
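The arithmetic behind these figures is easy to check directly. The sketch below assumes a raw T1 rate of 1.544 Mbit/sec and the example's 0.01% skew and 512-byte packets; the variable names are purely illustrative.

    # Worked check of the skew arithmetic in the text.
    skew = 0.0001                     # r's clock runs 0.01% fast
    print(skew * 86400)               # gain per day: ~8.6 sec ("about 9 seconds")
    drift = skew * 600                # gain over 10 minutes
    print(drift)                      # 0.06 sec = 60 msec
    t1_bytes_per_sec = 1.544e6 / 8    # assumed raw T1 rate
    pkts = drift * t1_bytes_per_sec / 512
    print(round(pkts))                # ~23 packets' worth of spurious "queueing"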
The first issue is then to identify a skew ``signature'' similar to that for clock adjustments shown in Figure 12.12. Figure 12.21 shows an OTT pair plot that exhibits a clear skew signature: the OTTs in one direction show a steady overall increase, while those in the opposite direction show a steady decrease. Both changes have a magnitude of about 120 msec over the 2 minute course of the connection, consistent with the receiver's clock advancing about 0.1% faster than the sender's clock. It is difficult to see what sort of network dynamics could introduce such a true combined inflation and deflation of OTTs over a two-minute period, so we conclude that the OTT pair plot shows strong evidence of relative clock skew.

Figure 12.21: An OTT pair plot showing relative clock skew [time (sec) vs. one-way delay (msec)]

Two other clock skew signatures we investigated were differences in the round-trip times (RTTs) reported by the endpoints of a connection, and strong negative correlations between the forward and reverse OTTs. The difficulty with evaluating RTT differences lies in limited clock resolution, and in noise making the RTTs in the two directions slightly different even in the absence of clock skew. [Footnote 7: For example, if the RTT is on the order of 100 msec and the clock resolution is 1 msec, then only relative skews larger than 1% can be detected; these are very large.] The difficulty with looking for strong negative correlations is the same as discussed in § 12.6.6 above: except in instances of very strong clock skew, there is too much noise to obtain a reliable decision based on the strength of the correlations.

In the remainder of this section we develop robust algorithms for detecting and removing relative clock skew.

12.7.1 Defining canonical sender/receiver skew

Before we proceed with developing a method for identifying relative clock skew, we need to define exactly what quantity it is that we wish to estimate. First, we assume that the skew trends we identify will be linear. While we might possibly encounter non-linear skew, we did not find any clear examples of such in N1 or N2, except those shown in § 12.6.5. For linear skew, we can summarize the skew using a single value that reflects the excess rate at which one clock advances compared to the other.

To avoid ambiguity (in terms of which clock we are comparing to which), we will always quantify how C_r, the receiver's clock, advances with respect to C_s. Suppose C_r runs a factor j faster than C_s, by which we mean that, if C_s reports that an interval ΔT has elapsed, then C_r will have reported the same interval as having length jΔT. Likewise, we can say that C_s runs a factor 1/j faster than C_r (or, a factor of j slower).

The algorithms we develop for estimating relative skew all work in terms of linear trends in OTT measurements. These trends are estimated based on how OTT measurements expand or shrink with respect to time. It is important to recognize that the phrase ``with respect to time'' does not mean ``with respect to true time,'' since we have no way of measuring true time. Instead, it means ``with respect to the packet originator's clock,'' that is, the clock associated with tracing the TCP endpoint that sent the packet.

When discussing a linear trend in the measured OTTs of the packets sent by host s, we will quantify the trend in terms of G_s, the growth in the OTTs of the packets sent by s. Suppose packet p1 is sent at time T^1_s, according to C_s, and arrives at time T^1_r, according to C_r. Likewise, suppose packet p2 is sent at T^2_s and arrives at T^2_r. Suppose further that the transit times of the packets are identical (no network-induced noise), so the only variation in their OTTs is due to clock skew. The measured OTTs for the two packets are O_1 = T^1_r − T^1_s and O_2 = T^2_r − T^2_s. If there is no relative skew between C_r and C_s, then G_s = G_r = 0.0. If C_r runs faster than C_s, then the packets sent by s will exhibit increasing OTTs and those sent by r will exhibit decreasing OTTs, so we will have G_s > 0 and G_r < 0. Naturally, the reverse holds if C_r runs slower than C_s.

We now relate G_r and G_s to j, the factor by which C_r runs faster than C_s. Continuing the example above, we have:

    G_s = (O_2 − O_1) / (T^2_s − T^1_s).

Because C_r runs a factor j faster than C_s, the interval between the two arrivals as reported by C_r is j times the interval between the two departures as reported by C_s, that is, T^2_r − T^1_r = j (T^2_s − T^1_s). Substituting then gives:

    G_s = j − 1.   (12.8)

Applying the same reasoning to the packets sent by r, for which the roles of the two clocks are exchanged, gives:

    G_r = 1/j − 1,   (12.9)

and combining Eqns 12.8 and 12.9 yields:

    G_s = −G_r / (1 + G_r).   (12.10)

If we write j = 1 + ε, where |ε| ≪ 1, we have G_s = ε and G_r ≈ −ε. It is thus tempting to assume that G_s and G_r are always equal but opposite, since this often appears to be the case when inspecting OTT pair plots. To ensure full accuracy, we instead take care to always use Eqns 12.8 and 12.9 to express relative clock skew in terms of j, or Eqn 12.10 to translate G_r to G_s. We will refer to values of G_s and G_r that are consistent with respect to Eqn 12.10 as ``equivalent but opposite'' slopes.
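Converting among j, G_s, and G_r is mechanical given the relations above (as reconstructed here); the helpers below simply transcribe Eqns 12.8-12.10 and are an illustration of the bookkeeping, not code from tcpanaly.

    def j_from_Gs(Gs):
        """Factor by which C_r runs faster than C_s, from the forward OTT slope."""
        return 1.0 + Gs            # Eqn 12.8: G_s = j - 1

    def j_from_Gr(Gr):
        """The same factor, from the reverse OTT slope."""
        return 1.0 / (1.0 + Gr)    # Eqn 12.9: G_r = 1/j - 1

    def Gs_from_Gr(Gr):
        """Translate a reverse-path slope into its 'equivalent but opposite' forward slope."""
        return -Gr / (1.0 + Gr)    # Eqn 12.10

    # For small skews the two slopes are very nearly equal and opposite:
    # Gs_from_Gr(-0.0001) is about +0.0001, i.e., +0.01%.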
12.7.2 Difficulties with noise

One particular problem with testing for clock skew is that one of the paths can have such highly variable OTTs due to queueing fluctuations that these completely mask the smaller-scale trend of OTT increase or decrease due to skew, even after de-noising. Figure 12.22 shows an example, in which congestion on the forward path completely obscures the relative clock skew; the skew is apparent, though, from the enlargement of the reverse path shown in Figure 12.23. Such noise most often obscures the forward path (presumably due to extra queueing induced by the data packets), but it can also obscure the reverse path. Thus, we cannot always rely on the signature of dual, equivalent-but-opposite OTT trends; sometimes we must settle instead for simply a compelling trend in one direction.

Figure 12.22: Clock skew obscured by network delays [time (sec) vs. one-way delay (msec)]

Figure 12.23: Enlargement of reverse path [time (sec) vs. one-way delay (msec)]

12.7.3 Failure of line-fitting approaches

Our first attempt to detect relative skew was based on the idea of fitting lines to the OTT plots. We hoped that fits with equivalent and opposite slopes would indicate clock skew, and that fits without them would indicate a lack of skew. One difficulty with this approach is cases of unidirectional noise, as illustrated in the previous section. For these, we can still try to find a very clean fit in one direction and, if present, use it to deduce the presence of skew.

From Figure 12.21 it is clear that the raw OTT measurements are too noisy to hope for clean fitting, as was also the case when testing for clock adjustments. So, we again base our analysis on the de-noised OTT measurements (§ 12.6.2) for the two directions: the de-noised forward-path OTTs (those of the full-sized data packets sent by s) and the de-noised reverse-path OTTs (those of the acks sent by r).

Even using de-noised measurements, least-squares fitting fails to provide solid skew detection, because residual noise in the two series makes it too difficult to reliably distinguish between a skewing trend and coincidental opposite queueing trends. All it takes is one period of elevated queueing at either end of the connection to throw off the fit. We expected as much, but had high hopes for the robust linear fitting technique discussed in § 9.1.4 as a way of coping with the residual noise. Alas, even this approach fails to reliably detect clock skew. The difficulty lies in both false positives and false negatives generated by queueing fluctuations.
These fluctuations suffice to introduce frequent non-zero slopes in the robust fits, and sometimes these slopes happen to have equivalent magnitudes with opposite signs. Furthermore, the fluctuations are often significant enough to alter the slopes so that they no longer have equivalent magnitudes in the two directions, even though skew is present. Finally, the robust techniques do not offer much help in distinguishing between a genuine skew trend in one direction accompanied by noise in the other (§ 12.7.2), versus noise in both directions but no skew.

12.7.4 A test based on cumulative minima

Eventually we recognized that the most salient feature of relative clock skew is not simply the overall trend (slope) of the OTT measurements, but the fact that the smallest such measurements continually increase or decrease. This observation suggests the following statistical test, whose strength is that it is relatively immune to transient increases in OTT measurements due to queueing buildups.

Suppose we have n observations X_{t_i}, 1 ≤ i ≤ n, where t_i is the time of the observation and X_{t_i} is its value. We assume that the t_i are monotone increasing, and that the X_{t_i} are distinct. Further, we assume without loss of generality that we wish to test for a negative trend in X_{t_i}; we discuss applying the same test for a positive trend in § 12.7.5 below. Consider the indicator:

    I_{t_j} = 1, if X_{t_j} < min_{i<j} X_{t_i}, or if j = 1;
              0, otherwise.

That is, I_{t_j} is 1 if X_{t_j} represents a new ``cumulative minimum'' when we inspect the X_{t_i} from 1 up to j (but not all the way up to n), and 0 if there is an earlier X_{t_i} that is less than X_{t_j}. If the X_{t_i} are independent, then:

    P[I_{t_j} = 1] = 1/j,

because the probability that any particular one of j independent observations is the minimum of the group is simply 1/j.

Consider now the function:

    M_j = sum_{i=1}^{j} I_{t_i},

which is the number of cumulative minima seen as we inspect the X_{t_i} from the first value up to the jth value. The key observation we make is that, in the absence of a negative trend, the distribution of M_j will tend to be close to that for independent X_{t_i}; that is, we will find a few cumulative minima but not a great number. In the presence of a negative trend, however, we should find many cumulative minima, since the X_{t_i} tend to get smaller and smaller.

Suppose we find M_n = k, that is, the X_{t_i} exhibit k cumulative minima. We wish to compute the probability that we would have observed this many or more minima, given the independence assumption. If we find the probability sufficiently low, we will reject the null hypothesis that the X_{t_i} are independent, and accept in its place the tentative hypothesis (which we will further test in § 12.7.6) that the X_{t_i} exhibit a negative trend. Let:

    R(n, k) = P[M_n ≥ k].

Given 0 ≤ k ≤ n, we can compute R(n, k) recursively, as follows:

    R(n, k) = 1,                                          if k = 0;
              1/n!,                                       if k = n;
              (1/n) R(n−1, k−1) + ((n−1)/n) R(n−1, k),    otherwise.   (12.11)

The first case reflects the fact that we always observe at least zero cumulative minima, so the probability is always 1. The second case corresponds to every single X_{t_i} being a cumulative minimum. This only occurs if the X_{t_i} are sorted in descending order, which, if they are independent, has probability 1/n!, since there are n! permutations of the X_{t_i}, only one of which is so sorted (because the X_{t_i} are distinct). The last case corresponds to conditioning on whether X_{t_n} is a cumulative minimum or not. For independent X_{t_i}, it will be a cumulative minimum with probability 1/n; in that case, for the n points to exhibit at least k cumulative minima, the first n−1 points must exhibit at least k−1 of them, while if X_{t_n} is not a cumulative minimum, the first n−1 points must exhibit at least k on their own. Evaluated naively, the recurrence leads to a great deal of recomputation; but if the dynamic programming is done using a ``memo'' function that remembers its previously computed results in a table, then additional computations of R(n, k) benefit from earlier ones, and the evaluation becomes extremely cheap.
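A direct transcription of the recurrence follows; Python's functools.lru_cache stands in here for the ``memo'' function, and the sketch is illustrative rather than tcpanaly's code.

    from functools import lru_cache
    from math import factorial

    @lru_cache(maxsize=None)
    def R(n, k):
        """P[M_n >= k]: probability of at least k cumulative minima among n
        independent, distinct observations (Eqn 12.11)."""
        if k <= 0:
            return 1.0                      # at least zero minima: always true
        if k > n:
            return 0.0
        if k == n:
            return 1.0 / factorial(n)       # all n are minima: descending order
        # Condition on whether the last observation is a cumulative minimum
        # (probability 1/n under independence).
        return (R(n - 1, k - 1) + (n - 1) * R(n - 1, k)) / n

    # For example, R(15, 8) is already quite small, matching the sharp
    # fall-off visible in Figure 12.24.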
Figure 12.24 shows the distribution of R(n, k) for n = 15. The key feature of the distribution that makes it a powerful test for a negative trend is the rapid fall-off in probability above a certain point, in this case around k = 8. Because, if the X_{t_i} do indeed have a negative trend, we should find k quite close to n, we can readily distinguish between the case of a negative trend and that of no trend, without requiring that the X_{t_i} decrease monotonically. Thus, we can accommodate considerable noise. Finally, for the size of the trend we take the slope computed by a robust linear fit (§ 9.1.4) to the X_{t_i}'s cumulative minima. This corresponds to the value G_s or G_r discussed in § 12.7.1 above.

Figure 12.24: Distribution of R(n, k) for n = 15 [k vs. P[M(n) ≥ k]]

12.7.5 Applying the test to a positive trend

The test developed in § 12.7.4 for detecting a negative trend can also be applied to detecting a positive trend, with one subtlety. At first blush one might think that, to do so, one simply uses maxima in lieu of minima. This works in principle, but fails when applied to OTT sequences, because of the positive, additive nature of OTT noise (§ 12.6.2). That is, the maxima will often be dominated by the noisiest OTT values, rather than by OTT values that slowly rise due to skew, so the noise will obscure any positive trend due to clock skew. This remains a problem even after de-noising, since all it takes is a single period of elevated OTT values, long enough to span an entire de-noising interval, to pollute the de-noised values with what will in some cases be a global maximum. When searching for a negative trend, on the other hand, such an interval will simply not contribute a minimum; it will not prevent the test from finding other minima due to clock skew.

There is a simple fix for this problem, though. The key observation is that the smallest OTT values are in general those with the least noise. So we apply the cumulative minima test to the series read in reverse time order, Y_{t_j} = X_{t_{n+1−j}}. Doing so converts a positive trend in X_{t_i} into a negative trend in Y_{t_j}, which the cumulative minima algorithm then readily detects, while still keying on the smallest (least noisy) OTT values.

Finally, for a given series X_{t_i} we need to decide whether to test it for a positive or a negative trend. We do this by first performing a robust linear fit to the observations. If the slope of the fit is positive, we look for a positive trend; if negative, a negative trend; and if exactly zero, we decree there is no trend.
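Combining the pieces, a rough sketch of the whole trend test might look as follows. It reuses R(n, k) from the sketch above, uses an ordinary least-squares slope as a stand-in for the robust fit of § 9.1.4, and assumes the OTT series has already been de-noised per § 12.6.2.

    def cumulative_minima(xs):
        """Indices of the cumulative minima of xs (each a new smallest value)."""
        idx, best = [], None
        for i, x in enumerate(xs):
            if best is None or x < best:
                idx.append(i)
                best = x
        return idx

    def trend_test(times, otts):
        """Return (p_value, slope) for a skew-like trend in de-noised OTTs.

        Positive trends are handled by reversing the series, so the test still
        keys on the smallest (least noisy) OTT values.
        """
        n = len(otts)
        if n < 3:
            return 1.0, 0.0
        # Crude slope estimate, used only to pick the direction to test.
        mt, mo = sum(times) / n, sum(otts) / n
        denom = sum((t - mt) ** 2 for t in times)
        if denom == 0:
            return 1.0, 0.0
        slope = sum((t - mt) * (o - mo) for t, o in zip(times, otts)) / denom
        if slope == 0.0:
            return 1.0, 0.0
        series = otts if slope < 0 else list(reversed(otts))
        k = len(cumulative_minima(series))
        return R(n, k), slope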
12.7.6 Identifying skew trends

With the cumulative minima test we finally have a robust algorithm for detecting trends. These trends, however, might be due not to clock skew but to networking effects, so we need to develop further heuristic checks to correctly detect linear skew.

Suppose we have two sequences of de-noised OTT measurements, corresponding as usual to the full-sized data packets sent from the connection sender to the receiver (the forward path), and to the acks sent back from the receiver to the data sender (the reverse path). For each sequence, we first determine whether it is a skew candidate, as follows. Let u_t denote the given sequence, and let R_u(n, k) be the probability that u_t matches the null hypothesis of no trend (independence) given by Eqn 12.11. We consider u_t a skew candidate if either:

1. R_u(n, k) falls below a modest cut-off probability and the trend of u_t is negative. This latter test is because queueing buildup due to the data packets sent along the forward path can often produce a strong positive trend; or

2. R_u(n, k) falls below a much smaller cut-off probability, regardless of the trend's direction. Such small values may be unattainable if, for example, we do not have a large number of points in u_t: with only 7 points, the smallest possible value of R_u(n, k) is R_u(n, n) = R_u(7, 7) = 1/7! ≈ 2·10^{-4}; or

3. a large number of the de-noised OTTs lie very closely along a pure linear trend.

We then proceed to determine whether either of the two de-noised sequences is compelling enough by itself to accept as evidence of a skew trend; or whether the pair forms a joint skew candidate to be investigated further; or whether there is insufficient evidence for a skew trend. To do so, we first consider which of them is individually a skew candidate, as follows:

1. If neither is a candidate, then we check whether max(R_s(n, k), R_r(n, k)) is nonetheless fairly small; if so, we treat the pair as a joint skew candidate, while if either probability exceeds the cut-off, we conclude there is no skew trend.

2. If the reverse-path sequence is a skew candidate but the forward-path sequence is not, then we accept the reverse-path trend as reflecting clock skew, quantified using G_r. We do so because sometimes we have no hope of detecting a skew trend in the forward-path OTTs due to queueing buildup, as illustrated in Figure 12.22 and Figure 12.23.

3. If the forward-path sequence is a skew candidate but the reverse-path sequence is not, then we check the direction of its trend. If the trend is negative, then it goes against the networking tendency for a positive trend induced by the queueing of the data packets along the forward path, and we accept it as reflecting clock skew quantified using G_s. If the trend is positive, we must proceed carefully to screen out a false skew trend due to queueing. First, we require that the variance of the de-noised OTT values along the forward path be no greater than that along the reverse path; if this is not the case, then we reject the trace pair as a candidate for exhibiting a skew trend. Next, we split the forward-path sequence into two halves, with the division coming at ⌊n/2⌋ if the sequence has n values. If R(n, k) for either half fails to remain small, we reject the trace pair as a skew candidate. Otherwise, we reverse the second half so that it now has a trend opposite to the first half's, and proceed as discussed below, treating the two halves as a joint skew candidate.

4. If both sequences are skew candidates, then we consider them together a joint skew candidate.

If the above procedure yields a joint skew candidate, we then evaluate the candidate as follows:

1. If both candidates have the same trend direction, then we reject the possibility of a skew trend.

2. If not, then we translate the first candidate's skew quantification into the terms of the second candidate using Eqn 12.10. Let G_1 and G_2 be the corresponding skew quantifications (one of which has been translated, so the two can be directly compared). If G_1 and G_2 agree closely, we accept the pair as indicative of a skew quantified as G = (G_1 + G_2)/2.
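As a rough illustration of the final joint-evaluation step only (not tcpanaly's code, and with a factor-of-two agreement test standing in for whatever comparison the original used), the logic might be sketched as:

    def evaluate_joint(G_s, G_r, agreement_factor=2.0):
        """Joint evaluation of a forward trend G_s and a reverse trend G_r.

        Returns the accepted skew, expressed in forward-path terms, or None.
        The translation uses Eqn 12.10; the agreement test is an assumption.
        """
        if G_s * G_r >= 0:                  # same direction: not a skew signature
            return None
        G1, G2 = G_s, -G_r / (1.0 + G_r)    # translate G_r into forward-path terms
        big, small = max(abs(G1), abs(G2)), min(abs(G1), abs(G2))
        if small == 0 or big > agreement_factor * small:
            return None                     # slopes not "equivalent but opposite"
        return (G1 + G2) / 2.0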
12.7.7 Results of checking for skew

tcpanaly uses the method given in § 12.7.6 to check each trace pair it analyzes for clock skew. We found that 295 trace pairs out of 2,335 in N1 (13%) exhibited clock skew, and 487 out of 15,492 (3%) did so in N2. These proportions are high enough to argue for considerable caution when comparing timestamps from two different packet filters. In both N1 and N2, about three-quarters of the skews were detected on the basis of the reverse-path OTTs alone; this is not particularly surprising, since a skew trend in the forward-path OTTs will often be lost in the OTT variations due to queueing induced by the data packets.

The largest skew in N1 was a whopping j = 5.5, meaning that one clock ran more than five times faster than the other! Figure 12.25 shows how skew like this appears in an OTT pair plot. (Note that in the plot the reverse path appears to begin before time T = 0, a reflection of the gross disagreement between the two clocks.)

Figure 12.25: Example of extreme clock skew [time (sec) vs. one-way delay (msec)]

This example is more than just an amusing curiosity. It occurred not once but 43 times in N1. Each time, the slower clock belonged to austr, and that was indeed the erroneous clock. We know it was the broken clock of the pairs exhibiting the problem not just because it was always one member of each problematic pair (which would be convincing by itself), but also because RTTs in those connections computed using its timestamps are physically impossible (too small) for the long distances traversed by the packets it sent and received. We likewise see the onset of this problem above in Figure 12.3. Note, however, that austr's clock was one of those identified in § 12.5.3 as being highly synchronized with a number of the other sites, indicating that care was being taken to keep accurate time with it (presumably using NTP). Thus, this clock's behavior is a compelling argument that just because a clock is believed to be well-synchronized does not render it immune from extreme error!

Aside from austr's clock, the next largest skew we observed in N1 was j = 0.991, a frequency difference of about 0.9%. This led to an OTT change of about 70 msec during an 8 sec connection. All in all, after removing connections involving austr, in N1 the median skew had a magnitude of about 0.023%, and the mean 0.035%. These are small, but not negligible, as discussed at the beginning of § 12.7.

In N2, the prevalence of trace pairs exhibiting skew was significantly lower (3% versus 13%), perhaps due to the use among the participating sites of newer hardware with more reliable clocks. Apart from oce's clock, which we discuss in § 12.7.8 below, the largest skews we observed were on the order of 6%. One of these was the example of clock adjustment via skew in Figure 12.15 above. Figure 12.26 shows another example. The pattern is quite striking, and clearly could lead to grossly inaccurate conclusions about network dynamics if undetected. Note that both sites involved in this connection, nrao and ustutt, were among those identified as closely synchronized in N2 (§ 12.5.3), again emphasizing that clocks that are in general well-synchronized can still exhibit very large errors.

Figure 12.26: Strong relative clock skew of 6% [time (sec) vs. one-way delay (msec)]

If we remove oce's connections and those with skews larger than 1%, then the median skew magnitude of the remainder in N2 is about 0.011%, and the mean around 0.016%. These are a factor of two smaller than the corresponding N1 figures, but still not completely negligible for assessing queueing in longer-lived connections.

12.7.8 oce's puzzling dynamics

When testing the N2 trace pairs for clock skew, we repeatedly encountered puzzling dynamics (or clock behavior) for some of the connections originated by oce and, to a lesser degree, some of those in which oce was the receiver of the TCP transfer. (This did not occur for oce connections in N1.) Figures 12.27 and 12.28 show the general pattern of behavior. The connections have exceptionally high RTTs, more than 2 sec. These times far exceed the intrinsic propagation delay from the remote sites to oce.
Furthermore, traceroutes from oce to other sites often show a first-hop RTT on the order of 2 sec; thus, almost all of the delay is incurred right at oce's border to the Internet. Another part of the puzzle is the shift in OTTs from almost all of the total delay being incurred by the acks incoming to oce, to almost all of it being incurred by the data packets outbound from oce, and then back to the incoming acks again. The pattern is sometimes a bit different. Figure 12.29 shows a trace for which, during most of the trace's 7.5 minute lifetime, the ack OTTs were virtually constant, while those for the data packets fluctuated enormously (thousands of msec). Then, at T = 235 sec, the ack OTTs suddenly increase by a whopping 8 seconds, only to return to 1 sec again after a 75 second outage.

One possible explanation is that the network path between oce and the rest of the Internet exhibits what we term half-duplex self-interference. That is, somewhere on the path, probably at the first hop, there is a half-duplex link that does not fairly arbitrate between traffic in the two directions.

Figure 12.27: Example of puzzling oce behavior [time (sec) vs. one-way delay (msec)]

Figure 12.28: Another example of puzzling oce behavior [time (sec) vs. one-way delay (msec)]

Figure 12.29: One more example of puzzling oce behavior [time (sec) vs. one-way delay (msec)]

Initially, the data packets get first use of the link, and the acks must wait for their turn. Eventually, the phasing determining which end of the link has preference shifts, so the acks gain preference and the data packets must wait; with time, it then shifts back.

One can imagine half-duplex self-interference occurring on any heavily loaded half-duplex link that does not explicitly guarantee fairness between the hosts using the link. For example, Ethernet networks can exhibit a ``capture effect'' in which the host currently using the network is unfairly able to continue using it longer than intended [RY94]. Another half-duplex networking technology that can exhibit unfairness on small time scales is FDDI, in which a single host can continue to use the ring for up to the ``token holding time'' [Jai90]. We have observed ``ack compression'' (§ 16.3.1) on high-speed network paths in which the compression appears to be due not to network-layer queueing but to link-layer delays, with a TCP connection's acks winding up waiting for an FDDI token that is being hoarded by the same connection's data packets traveling in the opposite direction.

While half-duplex self-interference would explain the interplay between the oce forward and reverse OTT variations, it does not by itself explain the very large first-hop delay associated with the behavior. It may be that reversing the direction in which the link is being used is a very expensive operation (perhaps because of low-layer errors and retries; it seems unlikely such an expensive mechanism would be designed into a data link). The oce staff was unable to obtain an explanation for the phenomenon from their networking provider. oce does have a firewall in place through which the NPD traffic must transit, but a firewall that added 2 seconds of latency to every packet it forwarded would be performing extremely poorly.

The final part of the puzzle concerns oce's clock. As discussed in § 12.5.3, its clock was the least well synchronized in both N1 and N2.
Even for those N2 oce connections that did not exhibit this sort of behavior (and many did not), the clock often exhibited skew. It is possible that oce's puzzling network dynamics make synchronizing the clock difficult. But it is also quite possible that at least some of the puzzling dynamics are due to the clock itself (i.e., are measurement artifacts), since the variations closely resemble the signature of a clock that varies its rate over short time scales. The only problem with this explanation is that the connections much more often start with elevated OTTs for the return path that then decrease as the forward-path OTTs increase (Figure 12.27 and Figure 12.28) than the other way around (Figure 12.29). If the behavior were due to a variable-rate clock, then we would expect the clock to be equally likely to start the connection running at an elevated rate as at a depressed one. For the OTT patterns to be due entirely to a misbehaving clock thus requires that fluctuations in the clock's variable rate somehow be tied to the host computer's network traffic, and it is difficult to see what sort of mechanism could create this linkage.

Because the magnitude of the effect is sometimes so large, and because we could not rule out clock behavior as a source of some or all of the behavior, we decided to eliminate all of the N2 oce connections from any analysis that involves timestamps produced by its clock. (We still analyze its connections for statistics such as the proportion of packets lost, since these do not rely on timestamps.)

12.7.9 Removing relative skew

As discussed in the previous section, a non-negligible proportion of the trace pairs in our study suffer from relative clock skew. We would like to remove this skew so that we can then reliably include those traces in our analysis of network dynamics. Fortunately, the skew almost always appears well described as linear, which means it is straightforward to remove. To remove skew of magnitude j, we simply modify all of the timestamps t_{r_i} generated by C_r using:

    t'_{r_i} = t_{r_i} + G_r (t_{r_i} − t_{r_0}),   (12.12)

where t_{r_0} is a fixed reference point such as the first timestamp C_r generated (the choice of reference affects only a constant offset). The effect of Eqn 12.12 is to alter the rate at which the timestamps generated by r change with time. If G_r > 0, then the OTTs of the packets sent by r increase with time, indicating that C_r runs more slowly than C_s, and to adjust for this we need to increase the timestamps it generates. If G_r < 0, then those OTTs decrease with time, and we need to decrease C_r's timestamps to effectively slow it down.

A key point is that applying Eqn 12.12 does not necessarily rectify C_r's skew with respect to true time; it only rectifies it with respect to C_s. It could be that the correct action to take in terms of true skew removal is to apply an analogous transformation to C_s's timestamps instead. We have no way of knowing which clock is in error, but by Eqn 12.12 we can at least make the two sets of timestamps consistent. Indeed, both clocks could be skewed with respect to true time, in which case neither action would correct them in an absolute sense. But for purposes of comparing the clocks' timestamps in order to compute OTTs and infer queueing delays from them, the most important consideration is that the two clocks have no relative skew. Provided the absolute skew is small (say, under 1%), its only effect is that the magnitudes of the computed OTTs (and RTTs) will be off by an equally small amount. By correcting the relative skew, we remove potentially quite large, artificial OTT trends, and there lies our primary goal.

tcpanaly uses Eqn 12.12 to take out relative clock skew if its magnitude is less than 1%. If it is larger, then it flags the trace pair as having large relative skew and refrains from any timing-based analysis.
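A minimal sketch of the correction in Eqn 12.12 applied to the receiver's timestamps follows; the 1% guard mirrors the text, while the argument layout and names are illustrative.

    def remove_relative_skew(r_timestamps, G_r, max_skew=0.01):
        """Rectify C_r's timestamps against C_s per Eqn 12.12.

        G_r is the growth in the OTTs of packets sent by r.  If |G_r| is 1% or
        more, the trace pair should be flagged rather than corrected, so the
        function returns None in that case.
        """
        if abs(G_r) >= max_skew:
            return None
        t0 = r_timestamps[0]              # reference point; only shifts a constant
        return [t + G_r * (t - t0) for t in r_timestamps]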
Finally, after tcpanaly removes relative skew, it re-analyzes the clock. If it still detects relative skew, then either its initial assessment that the trace pair had relative skew was wrong, or the skew was not linear. It flags this case separately, and likewise refrains from any further timing analysis. Thus, re-analysis provides a self-consistency test of the soundness of our skew detection. Only 1 of the 295 N1 trace pairs flagged as having relative skew failed this additional test, and only 10 of the 487 N2 trace pairs failed. Of these 11, three involved the puzzling oce behavior discussed in § 12.7.8, seven appear to have been false skew assessments due to network noise, and one had definite skew but enough noise along the reverse path to lead to a misassessment of the skew's magnitude.

12.8 Additional clock consistency checks

Along with testing the timestamps in trace pairs for clock adjustments and relative skew using the methods developed above, we apply two final self-consistency checks to the timestamps in an attempt to calibrate their accuracy.

12.8.1 Non-positive min-RTT_sr

We stated in § 12.5.1 that min-RTT_sr, as given by Eqn 12.7, should always be positive. tcpanaly flags any trace pair for which it is non-positive. It also checks whether a non-positive min-RTT_sr was the only indication of a clock problem, as this means that our main heuristics failed to detect a measurement problem. This happened four times in N1 and twelve times in N2, rarely enough to give us considerable confidence in our heuristics. Most of the missed clock problems were due to one of the following: failing to detect skew in the presence of considerable noise; failing to detect adjustments due to noise or to their occurring at the edge of a connection (§ 12.6.5); or dealing with connections for which the RTT is on the order of the clock accuracy (some between sintef1 and sintef2). Of the three remaining problems flagged only by the min-RTT_sr check, one was due to tcpanaly failing to detect unreliable packet filter timestamps (§ 10.3.6), and the other two were due to a bizarre packet filter timing problem in which the filter appears to have waited many seconds before starting to timestamp packets at the beginning of a connection. Thus, for example, a connection between sdsc in San Diego and korea, on the other side of the Pacific, had packet filter timestamps from the korea tracing machine showing that the initial SYN handshake took only 4 msec to complete, while the San Diego packet filter reported that it took 510 msec! Physically the first value is impossible, as the propagation time across the Pacific is much larger than 4 msec. Further inspection shows that packet timings on the korea end varied wildly at the beginning of the connection, yielding a swing of more than 10 seconds in the OTTs, after which they settled down and remained quite even. Figure 12.30 shows the corresponding OTT pair plot. Had this occurred in only one trace, we would have concluded that the measurement had the bad luck to encounter a clock adjustment right at the connection's beginning; but it happened similarly in a second korea trace, indicating instead a packet filter timing problem associated with the beginning of a connection trace.
Figure 12.30: Initial packet filter timing glitch [time (sec) vs. one-way delay (msec)]

12.8.2 Gap analysis

The final self-consistency check is based on the following observation. Suppose host s sends a packet at time s_1, measured by C_s, and it arrives at r at time r_1, according to C_r. Later, r sends a packet at r_2, arriving at s at s_2. It should always be the case that:

    s_2 − s_1 > r_2 − r_1,   (12.13)

since r_1 reflects an event that occurred after s_1, and r_2 reflects an event that occurred before s_2. Put another way, if all of the timestamps were accurate, then we would have:

    s_1 < r_1 < r_2 < s_2,

and, even if C_s and C_r have a relative offset ΔC_{r,s} between them, as long as the offset is fixed, the inequality in Eqn 12.13 still follows, since the subtractions remove the effects of the offset. Eqn 12.13 might not hold, however, if C_s is running slower than C_r, or if C_s is adjusted backward (or C_r forward) in between s_1 and s_2 (respectively, in between r_1 and r_2). We term checking whether Eqn 12.13 holds ``gap analysis.''

Exhaustively testing all of the packet arrivals and departures for consistency with Eqn 12.13 requires O(n^2) time for n packets, since each departure of a sender packet can be paired with the departures of any of the receiver's packets sent after it. To avoid this cost, tcpanaly instead employs a strategy of ``burning the candle at both ends'': it checks Eqn 12.13 for the first packet and the last ack; then for the next packet and the penultimate ack; and so on, until it works its way to the middle of the connection. Doing so reduces the O(n^2) time to O(n), at the cost of perhaps missing some instances in which Eqn 12.13 fails to hold, though the strategy still spans a wide range of gap intervals. tcpanaly also does gap analysis from the receiver's perspective (where s is the host generating acks and r the host generating subsequent data packets). It needs to check both perspectives in order to detect relative skew and adjustments in which either of the two clocks runs faster than the other.

Gap analysis finds some but by no means all of the clock adjustment and skew problems uncovered by the more robust techniques developed earlier. However, it also serves as a self-consistency check: we would like to know that the robust techniques find all of the clock problems, so we would hope that gap analysis never uncovers a problem missed by the others. It did so only once, the problem being a clock ``hiccup'' (§ 12.6.5) in which a connection with OTTs of about 3 msec (from lbl to sandia) had a single packet with an OTT of 430 µsec!
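The ``candle at both ends'' pairing is easy to express. In the sketch below, sender events are (s1, r1) send/arrival timestamp pairs for the sender's packets and receiver events are (r2, s2) pairs for the receiver's packets, both in time order; the representation is illustrative, and the check is simply the inequality of Eqn 12.13.

    def gap_analysis(sender_pkts, receiver_pkts):
        """Check Eqn 12.13 by 'burning the candle at both ends'.

        sender_pkts:   list of (s1, r1) -- sent at s1 (per C_s), arrived at r1 (per C_r)
        receiver_pkts: list of (r2, s2) -- sent at r2 (per C_r), arrived at s2 (per C_s)
        Returns the pairings that violate s2 - s1 > r2 - r1, i.e., evidence of
        an adjustment or relative skew.  Runs in O(n) rather than O(n^2).
        """
        violations = []
        n = min(len(sender_pkts), len(receiver_pkts))
        for i in range(n):
            s1, r1 = sender_pkts[i]            # i-th packet from the front ...
            r2, s2 = receiver_pkts[-(i + 1)]   # ... paired with the i-th from the back
            if r2 <= r1:                       # receiver packet not sent after arrival
                continue
            if s2 - s1 <= r2 - r1:
                violations.append((i, s1, r1, r2, s2))
        return violations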
12.9 Clock synchronization vs. stability

We finish our study of clock calibration with an investigation into the question of whether highly synchronized clocks tend to be free of problems such as adjustments and skew. We term clocks free of such problems ``stable.'' We might hope that highly synchronized clocks would also be stable, because freedom from such problems greatly aids a clock in maintaining synchronization. On the other hand, if good synchronization is maintained by frequently adjusting an errant clock to match an external notion of accurate time, then such clocks might be more likely to exhibit adjustments or skew (§ 12.2), and hence be less stable than other clocks.

The issue is an important one because it is quite cheap to determine whether a remote clock's offset is close to that of a local clock (§ 12.5.1). If relative accuracy is a good indicator that the remote clock is stable, then we can quickly determine that we can rely on the soundness of the timestamps generated by the remote clock, without having to go through all the effort of the methods developed in this chapter for detecting adjustments and skew. Such a quick determination could prove invaluable for a transport protocol that needs to decide whether it can trust the timing feedback being returned by a remote peer; the hope is that the protocol could do so by looking at just a few initial timestamps.

Table XVI shows the relationship between relative clock accuracy and the likelihood of observing a clock adjustment. We see that closely synchronized clocks, i.e., those with a relative offset under 1 sec, are only slightly less likely to exhibit a clock adjustment than less closely synchronized clocks. Thus, relative clock accuracy is not a good predictor of the absence of clock adjustments.

    Dataset   Relative offset   Likelihood of adjustment
    N1        < 1 sec           1.4%
    N1        >= 1 sec          1.6%
    N2        < 1 sec           0.75%
    N2        >= 1 sec          0.95%

Table XVI: Relationship between relative clock accuracy and clock adjustments

Table XVII shows the relationship between relative clock accuracy and the likelihood of observing relative clock skew. [Footnote 8: The percentages given in the table include the outlier sites of austr in N1 and oce in N2. However, these sites only affect the ``>= 1 sec'' rows, since their relative offsets were large; and it seems legitimate to leave them in the summaries, since they are indeed instances of large relative offsets indicating an increased likelihood of clock skew.] For N1, clock synchronization only provides an advantage if the clocks are highly synchronized, with a relative offset under 100 msec and preferably under 10 msec. For N2, however, synchronization of under 1 sec provides a definite advantage in predicting a lower likelihood of skew, though much better synchronization provides little additional predictive power. For both N1 and N2, not even very close synchronization reduces the likelihood of encountering clock skew to a negligible level (i.e., appreciably lower than 1%).

    Dataset   Relative offset   Likelihood of skew
    N1        < 0.01 sec        0.95%
    N1        < 0.1 sec         5.6%
    N1        < 1 sec           13%
    N1        >= 1 sec          12%
    N2        < 0.001 sec       1.3%
    N2        < 0.01 sec        0.88%
    N2        < 0.1 sec         1.3%
    N2        < 1 sec           1.8%
    N2        >= 1 sec          5.3%

Table XVII: Relationship between relative clock accuracy and clock skew

In summary, we conclude that relative clock accuracy provides no benefit in assuring that clock adjustments will be unlikely, and some benefit in assuring that clock skew is less likely, but not to such a degree that we can ignore the possibility of clock skew when analyzing more than a handful of measurements.

In addition, we conjecture that the closely synchronized hosts in our study are most likely synchronized using NTP. If so, then the use of NTP does not reduce the likelihood of clock adjustments introducing systematic errors when measuring packet transit times, and reduces but does not eliminate the likelihood of clock skew introducing systematic errors. This finding does not mean that NTP fails to keep good time. Rather, the time scales on which it does so significantly exceed those of our connections; NTP keeps good time on large time scales precisely by altering clock behavior on small time scales. Thus, prudent large-scale measurement and analysis of packet timings should include algorithms such as those developed in this chapter as self-consistency checks to detect possible systematic errors, even in the presence of NTP synchronization. We further argue that even pairs of clocks using a more direct external synchronization source, such as GPS, should be subjected to such checks, as a means of assuring that no timing errors have crept in between the original, highly accurate time source and the timestamps ultimately produced by the packet filters.
Chapter 13  Network Pathologies

After correcting for packet filter errors (Chapter 10) and TCP behavior (Chapter 11), we next turn to analyzing network behavior we might consider ``pathological,'' meaning unusual or unexpected. When we present a series of packets to the network for delivery to a remote endpoint, a number of things might happen. The network can:

(i) deliver them as we asked;
(ii) fail to deliver them at all (packet loss, cf. Chapter 15);
(iii) unduly delay them (packet delay, cf. Chapter 16), where ``unduly'' does not have a precise definition, except perhaps ``causing unnecessary retransmission'';
(iv) deliver them in a different order than sent (out-of-order delivery, § 13.1);
(v) deliver them more than once (packet replication, § 13.2);
(vi) deliver imperfect copies of them (packet corruption, § 13.3).

All but ``deliver them as we asked'' are in some sense unusual or unexpected, though to varying degrees. The first two unusual behaviors are in fact often expected; we devote two subsequent chapters to analyzing them in depth. The last three are less often expected, and we discuss them in the remainder of this chapter. It is important that tcpanaly recognize these sorts of pathological behaviors so that its subsequent analysis of packet loss and delay is not skewed by their presence. For example, it is very difficult to perform any sort of sound queueing delay analysis in the presence of out-of-order delivery, since the latter indicates that a first-in-first-out (FIFO) queueing model of the network does not apply.

13.1 Out-of-order delivery

While Internet routers almost always employ FIFO queueing, the packet-switched nature of the network provides one common mechanism for reordering packets so that they arrive in a different order than sent: whenever the routes taken by two packets differ, and the second packet enjoys a sufficiently shorter transit time than the first, reordering can occur [Mo92]. The designers of TCP were well aware of this fact, and engineered TCP for resilience in the face of out-of-order delivery, as well as the other pathologies enumerated above.

In the context of a transport protocol like TCP that sequences its data stream, we need to make a distinction between out-of-order delivery, which is caused by the network, and out-of-sequence delivery, which is caused by either the network (due to packet loss) or the transport protocol (due to retransmission). From a trace recorded at a TCP receiver, we cannot always distinguish between these two, though two heuristics often work well. The first is checking whether the IP ``id'' field (§ 10.3.5) of two packets exhibits a small backward skip: since each IP packet sent by a host typically increments the field by one, a backward skip usually occurs only due to reordering. The second is to look at the length of time between the arrival of the first (out-of-order or out-of-sequence) packet and that of the second. If it is on the order of the round-trip time (RTT) or higher, then it is likely that the first packet is a retransmission; if it is quite short, then it is likely due to network reordering.
Since we have traces recorded at both ends of each TCP connection, and since we can reliably pair departures recorded in one trace with arrivals in the other (§ 10.5), we can more directly detect network reordering. tcpanaly does this as follows.

13.1.1 Detecting out-of-order delivery

To analyze network reordering between endpoints s and r, with corresponding packet traces T_s and T_r, we first check whether we have previously determined that r's packet filter suffers from resequencing (§ 10.3.6), or whether we were unable to pair the packets in the two traces due to ambiguities (§ 10.5). If either of these occurred, we skip further analysis. Otherwise, we scan the packet arrivals in T_r. For each arriving packet p_i recorded in the trace, we check whether it was sent after the last non-reordered packet, p_N. If so, then we set p_N ← p_i and proceed to the next arrival. If, however, p_i was sent before p_N, then we count p_i's arrival as an instance of network reordering. So, for example, if a flight of ten packets all arrive in the order sent except that the last one sent arrives before all of the others, we consider this to reflect 9 reordered packets rather than 1. Likewise, if the first one sent arrives after all the others, and otherwise all arrivals are in order, we consider this as reflecting 1 reordered packet. This definition emphasizes ``late'' arrivals rather than ``premature'' arrivals. It turns out that counting late arrivals gives somewhat higher numbers than counting premature arrivals, but the difference is not that great (≈ 25%). tcpanaly further computes statistics on how many packets were sent between p_i and p_N, how many of these arrived prior to p_N, and how much time elapsed between the arrival of p_i and that of p_N. After analyzing packets sent from s to r, it then repeats the process for those sent from r to s.
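A sketch of this counting rule follows, assuming each arrival record carries the send-order index assigned at the sender (obtained from the trace pairing of § 10.5); the function name and representation are illustrative.

    def count_late_arrivals(arrivals):
        """Count network-reordered ('late') packets.

        `arrivals` is the receiver-side arrival sequence, each element being
        the packet's send index at the sender.  A packet sent before the last
        non-reordered packet counts as reordered.
        """
        reordered = 0
        last_in_order = None
        for send_idx in arrivals:
            if last_in_order is None or send_idx > last_in_order:
                last_in_order = send_idx     # new "last non-reordered" packet
            else:
                reordered += 1               # sent before it, so it arrived late
        return reordered

    # Ten packets where the last-sent arrives first count as 9, not 1:
    assert count_late_arrivals([9, 0, 1, 2, 3, 4, 5, 6, 7, 8]) == 9
    # The first-sent arriving last counts as 1:
    assert count_late_arrivals([1, 2, 3, 4, 5, 6, 7, 8, 9, 0]) == 1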
13.1.2 Results of out-of-order analysis

Out-of-order packet delivery proved much more prevalent in the Internet than we had expected (prior to the routing pathology analysis in § 6). In N1, 36% of the traces included at least one packet (data or ack) delivered out of order, while in N2, 12% did. Overall, 2.0% of all of the N1 data packets and 0.61% of the acks arrived out of order, while in N2 the corresponding figures fell to 0.26% and 0.10%. It is not surprising that data packets are reordered significantly more often than acks, because they are frequently sent closer together than acknowledgements (due to ack-every-other acking policies, § 11.6.1), and so reordering data packets requires less of a difference in transit times than reordering acks.

We should not infer from the differences between reordering in N1 and N2 that reordering became less likely over the course of 1995, because out-of-order delivery varies greatly from site to site. For example, 15% of the data packets sent by ucol during N1 arrived out of order, far exceeding the average for the entire dataset. Likewise, reordering is highly asymmetric: for example, only 1.5% of the data packets sent to ucol during N1 arrived out of order. Furthermore, while for some sites out-of-order delivery of packets sent from the site strongly correlated with out-of-order delivery of those sent to the site, for other sites (such as ucol and wustl) the two directions were uncorrelated. This means a TCP cannot soundly infer whether the packets it sends are likely to be reordered based on observations of the acks it receives. This is unfortunate because, if a TCP could make this inference, then it could more accurately determine the correct duplicate ack threshold to use for fast retransmission (see § 13.1.3 below).

The site-to-site variation in reordering directly matches our findings concerning route flutter (§ 6.6). In that analysis, we identified two sites as particularly exhibiting flutter, ucol and wustl. For the part of N1 during which wustl exhibited route flutter, 24% of all of the data packets it sent arrived out of order, a rather stunning degree of reordering. If we eliminate ucol and wustl from the analysis, then the proportion of all N1 data packets delivered out of order falls by a factor of two. Clearly, these two sites heavily dominate N1 reordering. Finally, we note that, in N2, data packets sent by ucol were reordered only 25 times out of nearly 100,000 sent, though 3.3% of the data packets sent to ucol arrived out of order, dramatizing how, over long time scales, site-specific effects can completely change. Thus, we should not interpret the prevalence of out-of-order delivery summarized above as giving any sort of representative numbers for the Internet, but should instead form the rule of thumb: Internet paths are sometimes subject to a high incidence of reordering, but the effect is strongly site-dependent, and highly correlated with route fluttering.

The extremes of out-of-order delivery are interesting because they represent situations of network behavior far from normal. Such true pathologies sometimes illuminate unforeseen interactions between transport protocols and the network. Figure 13.1 shows the single worst trace in our data in terms of out-of-order delivery, from wustl to nrao in N1: 74 packets out of 205 sent arrived out of order, a proportion of 36% (the worst in N2 was 28%). The plot includes a line linking adjacent packets to highlight the effect; every time the line heads downward to the right, it indicates an out-of-order delivery.

Figure 13.1: Sequence plot showing a connection with 36% of data packets delivered out-of-order [time (sec) vs. sequence number]

It is interesting to note that while this connection endured major reordering, it did not suffer any packet loss, and only one needless retransmission, that due to the Solaris TCP's insufficiently large initial retransmission timeout (RTO), discussed in § 11.5.10. In particular, the timer was able to cope with significant fluctuations in round-trip time. This may appear surprising in light of the problems previously uncovered with the Solaris timer adaptation algorithm (§ 11.5.1). However, out-of-order packets elicit duplicate acks from the receiver, corresponding to the temporarily missing packets. If the RTO adaptation only uses timings based on acks that advance the window, then it will tend to see timings reflecting the longer of the two routes over which the packets travel. This is, fortunately, exactly the right RTT timing to which to adapt the RTO, since it represents the worst case for how long it can take for a packet to traverse the network and be acknowledged.

While we found earlier in this section that data packets are significantly more likely to be reordered than acks, this does not necessarily apply to the extremes of behavior. Indeed, in N1 we observed 12 connections in which 20% or more of the acks were reordered, with an extreme value of 33% reordered. (In N2, the extreme value was 13%.)
Figure 13.2 shows the largest out-of-order gap we found. In this N2 trace from adv to harv, all the packets shown in the plot were sent in sequence. After data packet 61,953 arrives, the next arrival is 89,601, sent 54 packets later!

Figure 13.2: Sequence plot showing a connection with an out-of-order gap of 54 packets [time (sec) vs. sequence number]

While at first blush it might appear that the reordering in Figure 13.2 is due to a routing change beginning at sequence 89,601, the evidence indicates that it is in fact due to a different effect. Figure 13.3 shows a similar massive reordering event. Here, however, the higher-sequence packets lie nearly on a line. Indeed, fitting a line to them yields a data rate of a little over 170 Kbyte/sec. This rate is a compelling value because it agrees with a T1 bottleneck (§ 14.7.1). Furthermore, it agrees with the remainder of the trace, which is shown in its entirety in Figure 13.4. Indeed, from that figure it is clear both that the slope of the packets delivered late in Figure 13.3 is aberrant, and that the late packets were abnormally delayed, rather than the high-sequence packets arriving early due to a routing change. Finally, the slope of the late packets, if we factor in the number of high-sequence packets arriving in their midst, is just under 1 Mbyte/sec, consistent with an Ethernet bottleneck.

Figure 13.3: Out-of-order delivery with two distinct slopes [time (sec) vs. sequence number]

Figure 13.4: Sequence plot of entire connection shown in previous figure [time (sec) vs. sequence number]

We analyze this behavior as follows. A router quite close to the receiver (such that the bottleneck bandwidth between the router and the receiver corresponds to Ethernet speed) stopped forwarding packets just as 72,705 arrived. The most likely explanation for its 110 msec lull is that it had a routing update to process, as these can take considerable time, and many routers cease forwarding packets during the update [FJ94]. After the processing finished, which occurred just between the arrivals of 91,137 and 91,649, the router began forwarding packets normally again. Thus, the higher-sequence packets, which arrived at the router at T1 speed since that is the upstream bottleneck, continued through the router unaltered. Meanwhile, the router had queued some 35 packets while it processed the update, and these were now finally forwarded whenever the router had time (i.e., was not processing a newly arriving packet). Thus, they went out as quickly as possible, namely at Ethernet speed. We observed this pattern a number of times in our data---not frequently enough to conclude that it is anything but a pathology, but often enough to suggest that significant momentary increases in networking delay can be due to effects different from both queueing and route changes; most likely they reflect router ``pauses.''

Striking reordering is not confined to data packets. Figure 13.5 shows a SYN-ack packet, still advertising a (relatively) small initial window (shown in the plot by the circle above the ack), arriving a full second after it was sent, after 19 subsequent acks have already arrived. Even more striking is the trace shown in Figure 13.6. Here, two acks, the first for 47,617 and the second for 48,129, arrive a full twelve seconds after they were sent (and long after the packets they acknowledged were needlessly retransmitted).
Just where in the network they spent those 12 seconds, and what led to their eventual release, remains a mystery! One clue, however, is that they arrived with a remaining TTL of 40, while all the other acks had TTLs of 41 remaining. They may have taken a different route through the network. This is not certain, however, because the router that detained them may instead have additionally decremented the TTL field to reflect the long delay (§ 4.2.1).

[Figure 13.5: Sequence plot of ack delivered out-of-order]
[Figure 13.6: Sequence plot of two acks delivered out-of-order and very late]

13.1.3 Impact of reordering

While out-of-order delivery can violate one's assumptions about how the network works---in particular, the abstraction that the network is well modeled as a series of FIFO queueing servers---it often is no more than a nuisance in terms of its impact on transport protocols such as TCP. For example, Figure 13.1 above shows the trace that endured the largest proportion of out-of-order packet delivery of the more than 20,000 traces we studied; yet it did not suffer any retransmissions, and in fact had its performance limited by the small advertised receiver window, rather than by any effects from the reordering.

Where reordering makes a difference, however, is when one wishes to make a quick decision whether or not to retransmit an unacknowledged packet.¹ In particular, if the network never exhibited reordering, then, as soon as the receiver observed the arrival of a packet that created a sequence ``hole,'' it would know that the expected in-sequence packet had been dropped, and could signal this information to the sender to call for prompt retransmission. Because of reordering, however, the receiver does not know whether the packet in fact was dropped; it may instead have simply been reordered and will arrive shortly. In this latter case, the receiver should not call for retransmission, as retransmission is unnecessary and will thus needlessly consume network resources.

TCP addresses this problem as follows. When a TCP receives a packet above a sequence hole, it may generate a dup ack for the sequence hole. (Indeed, all TCPs in our study except SunOS generate such acks; see § 11.6.2.) If a TCP receives a certain threshold number N_d of dup acks, it then can enter a fast retransmit phase (§ 9.2.7). Presently, N_d = 3, a value chosen so that ``false'' dup acks generated by out-of-order delivery are unlikely to lead to spurious retransmissions.

The value of N_d = 3 was chosen primarily to assure that the threshold was conservative and needless retransmission avoided. Large-scale measurement studies were not available to further guide the selection of the threshold. In this section we examine whether the fast retransmit mechanism could be improved in two different ways: by delaying the generation of dup acks in order to better disambiguate packet loss from out-of-order delivery, and by choosing a different threshold value to improve the balance between increasing opportunities to retransmit quickly, and avoiding unneeded retransmissions due to out-of-order delivery.

We first look at packet reordering time scales to determine whether a TCP could profitably wait a short period of time upon receiving a packet above a sequence hole before generating a dup ack.
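The mechanism under study here, counting duplicate acks against the threshold N_d, can be rendered schematically as follows. This is an illustrative sketch in Python form, not code from any of the TCP implementations in our study; it also omits the requirement that a duplicate be a ``pure'' ack carrying the same offered window.

    N_D = 3                              # current fast retransmit threshold

    class FastRetransmitDetector:
        """Schematic sender-side duplicate ack counter."""

        def __init__(self, threshold=N_D):
            self.threshold = threshold
            self.last_ack = None
            self.dup_count = 0

        def on_ack(self, ack_seq):
            """Process an arriving ack; return True when fast retransmit fires."""
            if ack_seq == self.last_ack:
                self.dup_count += 1
                if self.dup_count == self.threshold:
                    return True          # retransmit the segment starting at ack_seq
            else:
                self.last_ack = ack_seq  # the ack advanced: reset the counter
                self.dup_count = 0
            return False

    # Three duplicates of the ack for 8192 trigger a retransmission; a mild
    # reordering event that produces only one or two duplicates does not.
    d = FastRetransmitDetector()
    print([d.on_ack(a) for a in (8192, 8192, 8192, 8192)])
    # -> [False, False, False, True]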
We only look at the time scales of data packet reorderings, since ack reordering time scales do not affect the fast retransmission process. Indeed, since TCP acks are cumulative, out-of-order delivery of acks has essentially no effect on the performance of a TCP connection.

Figure 13.7 shows the distribution of the amount of time between an out-of-order data packet arrival and the later arrival of the last packet sent before it. The plot is log-scaled and thus reflects a wide range in reordering times. The distribution exhibits several artifacts meriting investigation. For example, the central step in the distribution occurring around 50% probability lies at exactly 10 msec, and corresponds to a common clock resolution (§ 12.4.2). Likewise, the smaller step a bit to the right of it is at 20 msec, another common resolution.

The skip at the upper right of the plot is more interesting, as it is not a measurement artifact per se. It lies right at 81 msec, which initially seems a strange value. However, one of the sites in our study was linked to the Internet during N_1 via a 56 Kbit/sec link (connix). Using the methodology developed in Chapter 14, we found this site's bottleneck bandwidth was right around 6,320 user data bytes/sec. If a remote site is sending 512 byte packets, and if they are reordered upstream from the 56 Kbit/sec bottleneck link, then the packets can arrive no closer than:

    512 bytes / 6,320 bytes/sec = 81.0 msec.

Thus, we see that reordering can have associated with it a minimum time which can be quite large. This effect, however, will diminish with time as faster links replace slower ones.

[Figure 13.7: Distribution of out-of-order delivery interval for N_1 data packets]

Figure 13.8 shows the same distribution for N_2 (solid), with N_1 added (dotted) for comparison. It likewise exhibits timer resolution steps and the 56 Kbit/sec minimum reordering time, as well as a slightly smaller minimum time of 70 msec, corresponding to 64 Kbit/sec links delivering about 7,300 bytes/sec. The most noteworthy aspect of the plot, however, is the strong shift towards lower values. The median of the N_1 intervals was 10 msec, and the geometric mean 9 msec, while for N_2 these dropped by more than a factor of two, both to around 4 msec. We suspect the change is due to the deployment of faster links within the Internet infrastructure.² If so, then again we expect reordering times to diminish as the infrastructure is further upgraded.

[Figure 13.8: Distribution of data packet out-of-order delivery interval for N_1 (dotted) and N_2 (solid)]

Even with the N_1 intervals, a strategy of waiting 20 msec would identify 70% of the out-of-order deliveries. For the N_2 intervals, the same proportion can be achieved waiting 8 msec.

¹ It can also make a significant difference for a TCP receiver that does not retain above-sequence data, as we saw for Trumpet/Winsock in § 11.7.3. Such a TCP will force retransmission of every packet delivered out of order.

² It is not due to better clock resolutions in N_2 compared to those in N_1.
If we eliminate the 9--11 msec and 19--21 msec spikes in the distributions shown in Figure 13.7 and Figure 13.8, we still find a virtually identical shift between the two datasets.

However, a more basic question is: are false fast retransmit signals due to out-of-order deliveries actually a problem? To find an answer, we added to tcpanaly analysis of duplicate acks³ as follows. For each trace pair it analyzes, it inspects each series of duplicate acks arriving at the sending TCP and classifies the sequence as one of:

good: indeed due to a missing packet requiring retransmission;

bad: actually due to a temporary sequence hole caused by out-of-order delivery; or,

top: corresponding to the top sequence number sent so far.

³ tcpanaly only considers an ack as a duplicate of the preceding ack if it (i) acknowledges the same sequence number; (ii) contains the same offered window; and (iii) is a ``pure'' ack packet, one not containing any data. This test can still mistake a series of acknowledgements for ``zero window'' probes as triggering a fast retransmit. However, such probes were exceedingly rare in our traces: only 6 instances in N_1, and none in N_2. Of the 6 in N_1, only one persisted long enough to elicit more than a single ack in reply (it elicited two such acks).

The terms good and bad reflect the perspective of using the series of duplicate acks as a signal for fast retransmission. top series reflect situations in which the TCP has already needlessly retransmitted. When a needless retransmission arrives at the receiver, because it is below-sequence it will immediately trigger the generation of a duplicate ack (§ 9.2.7). top series can lead to further needless retransmission (thus perpetuating the cycle), but the TCP can employ a simple heuristic to avoid these, discussed below.

In addition to classifying each duplicate ack series, tcpanaly assigns a length D corresponding to the number of duplicate acks in the series. For good duplicate ack series, tcpanaly also associates a savings S indicating how much time would have been saved if the fast retransmit threshold N_d had been equal to D, and thus the series had led to retransmission. For D > 3, S is often negative, because in fact the packet was already retransmitted upon receipt of the third duplicate, rather than waiting for all D packets. For bad duplicate ack series, tcpanaly associates a waiting time W, indicating how long the TCP would have had to wait in order to recognize that the sequence hole was due to out-of-order delivery rather than to packet loss.

When considering a refinement to the fast retransmission mechanism, our interest lies in the resulting ratio of good to bad, R_g:b, which is controlled by both N_d and W̃, the minimum amount of time that the receiving TCP would wait prior to generating a duplicate; and in the mean ensuing savings S of how much more quickly the TCP can retransmit as a result of the refinement.

We first consider the current state of affairs, in which N_d = 3 duplicates and W̃ = 0, namely duplicate acks are generated immediately as called for. In N_1 we find R_g:b = 22, and in N_2, R_g:b = 300! (That is, in N_1, each incorrect fast retransmit was countered, overall, by 22 correct fast retransmits, and, in N_2, by 300 correct retransmits.) The order-of-magnitude improvement between N_1 and N_2 is likely mostly due to the use in N_2 of bigger windows (§ 9.3), and hence much more opportunity for good duplicate ack series. (We do not evaluate the savings S of the current mechanism, because it is what we are measuring against.)

Because the current scheme works well, we do not investigate increasing the threshold in detail.
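Before turning to alternative thresholds, the classification just described can be sketched as follows. This is a simplified rendering for illustration, not tcpanaly's implementation; in particular, it presumes the trace analysis already knows whether the segment at the ack point was truly dropped, something a live TCP cannot know.

    def classify_dup_series(ack_seq, highest_seq_sent, segment_was_lost):
        """Label one duplicate ack series as 'good', 'bad', or 'top'.

        ack_seq          -- sequence number the duplicate acks acknowledge
        highest_seq_sent -- highest sequence number sent when the series began
        segment_was_lost -- True if the segment at ack_seq was in fact dropped
        """
        if ack_seq >= highest_seq_sent:
            return "top"      # dups elicited by a needless retransmission arriving
        if segment_was_lost:
            return "good"     # dups correctly signal a loss needing retransmission
        return "bad"          # dups caused merely by out-of-order delivery

    def good_bad_ratio(labels):
        """R_g:b over a collection of classified series."""
        good, bad = labels.count("good"), labels.count("bad")
        return good / bad if bad else float("inf")

    labels = [classify_dup_series(*s) for s in
              [(45056, 70000, True), (45056, 70000, False), (70000, 70000, False)]]
    print(labels, good_bad_ratio(labels))     # ['good', 'bad', 'top'] 1.0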
We note, however, that N d = 4 improves R g:b by about a factor of 2.5, but diminishes the number of fast retransmit opportunities by about 30%, a significant loss. We might instead consider whether the threshold can be safely lowered from 3 to 2. For N d = 2, we gain about 65--70% more fast retransmit opportunities (i.e., good dup ack sequences), a hefty improvement. Furthermore, the mean savings S for these new opportunities is 1.65--1.73 sec, because we are avoiding retransmission timeouts. The cost, however, is that R g:b falls by about a factor of three, in both N 1 and N 2 . If, however, the receiving TCP waited f W = 20 msec before generating a second dup (avoiding doing so if the missing packet arrived, and immediately doing so if another out­of­order arrival called for a third dup), then, for N 1 , R g:b only falls from 22 to 15, while for N 2 it does not fall at all. Thus, the simplest change of just lowering N d from 3 to 2 gains a large proportion of quicker retransmissions, but at the cost of three times as many unnecessary retransmissions. A com­ panion change to TCPs to delay for f W = 20 msec when sending a second duplicate ack ameliorates almost all of the drawbacks of lowering N d to 2. However, there are considerable deployment differ­ ences between these two modifications. The first is a one­line change in most TCP implementations and garners benefits (and drawbacks) even if only the sending TCP has been modified and it is communicating with an unmodified receiving TCP. The receiving TCP change involves additional timer management and so is not necessarily a simple change, and it only garners benefit if both the sending and receiving TCP have been modified (it does not do much harm if the sender has not, however). But lowering the retransmit threshold to two duplicate acks is only a sound change if de­ ployed simultaneously with the f W = 20 msec change. Such widespread simultaneous deployment, however, is virtually infeasible due to the size of the Internet. Therefore, we would have to live with partial deployment for a lengthy period of time, and, for that time, significantly more unneeded retransmissions. In summary, if we require changing both the sender and the receiver, then, while the change is appealing, it is likely impractical considering the size of the Internet's installed base of TCP implementations. Another approach would be to modify senders to wait 20 msec before responding to N d = 2 duplicate acks with a fast retransmission. This pause would then generally allow, in the case of out­of­order delivery, sufficient time for another ack to arrive indicating that the temporarily missing data packet was successfully delivered. We do not evaluate this approach in detail here, but note that it has several drawbacks. First, it requires additional timer management, which, as mentioned above, is not always a simple change. Second, delay variations along the return path taken by the acks might require a significantly larger value of f W to avoid unnecessary retransmissions. Third, if the ack return path suffers from loss, then the ``clarifying'' ack that identifies the first two dups as due to a reordering event might be lost, again leading to unnecessary retransmissions. 
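A minimal sketch of the receiver-side companion change discussed above: hold the second duplicate ack for roughly 20 msec, cancel it if the temporarily missing segment shows up, and release it at once if a third duplicate becomes due. This is illustrative pseudocode in Python form, not taken from any real TCP stack; the timer interface (schedule_timer returning a handle with cancel() and fire_now()) is an assumption made for the sketch.

    DUP_ACK_DELAY = 0.020        # ~20 msec, per the reordering time scales above

    class DelayedDupAckReceiver:
        """Schematic receiver that paces only the *second* duplicate ack."""

        def __init__(self, schedule_timer):
            self.schedule_timer = schedule_timer   # assumed host-TCP timer facility
            self.pending = None                    # handle for a held-back dup ack
            self.dups_due = 0

        def on_packet_above_hole(self, send_dup_ack):
            """Called when a segment above the current sequence hole arrives."""
            self.dups_due += 1
            if self.dups_due == 2:
                # Hold the second dup briefly: reordering usually resolves
                # within ~20 msec, whereas a genuine loss does not.
                self.pending = self.schedule_timer(DUP_ACK_DELAY, send_dup_ack)
            else:
                if self.pending is not None:
                    self.pending.fire_now()        # a third dup is due: release the held one
                    self.pending = None
                send_dup_ack()

        def on_hole_filled(self):
            """Called when the missing segment finally arrives."""
            if self.pending is not None:
                self.pending.cancel()              # it was reordering, not loss: stay silent
            self.pending = None
            self.dups_due = 0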
4 4 We show in x 15.2 that losses along the forward and reverse directions of an Internet path are, overall, nearly uncor­ related, so we could quite plausibly have a situation in which ``clarifying'' acks are dropped, but there is no loss along the forward path, and hence no retransmission necessary. 243 We note that the TCP selective acknowledgement (``SACK'') option, now pending stan­ dardization, also holds promise for honing TCP retransmission [MMFR96]. SACK provides suf­ ficiently fine­grained acknowledgement information that the sending TCP can generally tell which packets require retransmission and which have safely arrived (x 15.6). To gain any benefits from SACK, however, requires that both the sender and the receiver support the option, so the deployment problems are similar to those discussed above. Furthermore, use of SACK aids a TCP in determin­ ing what to retransmit, but not when to retransmit. Because these considerations are orthogonal, investigating the effects of lowering N d to 2 merits investigation, even in face of impending deploy­ ment of SACK. Perhaps needless to say, lowering N d all the way to a single dup ack is a disaster. R g:b falls by a factor of 10 from its value for N d = 3. For N 2 , using a 20 msec delay before generating a dup ack wins back most of the loss (changing the factor to 1.5), but for N 1 , it still falls by a factor of 3. The final category of duplicate ack series analyzed by tcpanaly is top. These are quite common, due primarily to broken retransmission timers (x 11.5.10), but also due to imperfect re­ covery during retransmission. A top series occurs when the original ack (of which all the others are then duplicates) had acknowledged all of the outstanding data (hence, the top of the sequence space). When this occurs, subsequent duplicates for that ack are always due to an unnecessary re­ transmission arriving at the receiving TCP, until the sending TCP sends new data. Even when it does, subsequent duplicates are still due to redundant packets until at least a round­trip time has elapsed after sending the new data. Figure 13.9 shows a retransmission event leading to a top series. The sender has opened a large window of about 50 packets when data packet 45,025 is lost, as are the 17 packets following it. A river of dup acks pours in as 54,673 and above successfully arrive. The third dup triggers fast retransmit, but since nearly half the window was lost, the many dup acks are not enough to induce fast recovery, so no more packets are in flight, and hence no more dups arrive signaling that 45,561 was also lost. Thus, 45,561 times out, and a slow­start sequence begins at T = 2:46. The first four flights of this sequence all work to fill the large sequence hole due to the 18 dropped packets, but the fifth flight, considerably larger than the fourth, transmits almost entirely redundant data already safely received at the other end. The arrival of these unnecessary packets then causes another sequence of duplicate acks. Figure 13.10 shows the resulting top series. The first ack for 67,001 is not a dup but instead indicates that the sequence hole has been filled. It also advances the window, so 13 new packets are sent, beginning with 67,537. Shortly after, the first of the dups arrive, and, after three, 67,537 is sent again due to fast retransmission, and more packets are sent on the additional dups due to fast recovery. 
Since fast recovery is enabled, however, no more spurious retransmissions result, ending the cycle, and the connection proceeds normally once fast recovery terminates about time T = 2.85.

[Figure 13.9: Sequence plot showing retransmission event leading to top duplicate ack series]
[Figure 13.10: Enlargement of top duplicate ack series]

Top series are about 10 times rarer than good series, but that still makes them the cause of between 2 and 15 times as many unnecessary retransmissions as bad series due to out-of-order delivery. They are, however, preventable, using the following heuristic. Whenever a TCP receives an ack, it notes whether the ack covers all of the data sent so far. If so, it then ignores any duplicates it receives for the ack; otherwise, it acts on them in accordance with the usual fast retransmission mechanism. The only drawback to this method is that, if the TCP sends a flight of new data after receiving the first top ack, and the first packet of the flight is lost, then the subsequent dups generated by the arrival of the remainder of the flight will fail to trigger fast retransmission for the missing packet, and so the connection will stall pending a timeout retransmission.

This deficiency can be addressed by allowing the TCP to honor dup acks if they arrive at least one round-trip time (RTT) after the TCP sent new data. This requires, however, that the TCP maintain an estimate of the minimal RTT, which most present implementations do not. (The retransmission timeout is based on an estimate of the maximum RTT.)

Use of SACK will also eliminate top dup ack series, since SACK allows the sender to disambiguate between dups due to needless retransmission and dups due to a genuine missing packet. But the heuristic we propose has the attractive benefit of not requiring that both the sender and receiver implement it. It works fine if just the sender uses it.

13.2 Packet replication

In this section we look at packet replication, meaning instances in which the network delivers multiple copies of the same single packet. While with out-of-order delivery we can readily picture a causal mechanism, namely uneven path delays, it is difficult to see how the network can replicate a packet given to it. Our imaginations notwithstanding, it does occur, albeit very rarely. We suspect the mechanism may involve links whose link-level technology includes a notion of retransmission, and for which the sender of a packet on the link incorrectly believes the packet was not successfully received, so it sends the packet again. A related mechanism, pointed out by Van Jacobson, would occur on a token ring network if the sender's network interface sometimes failed to promptly drain the packet from the ring, such that it made multiple circuits.

In N_1, we observed only one instance of packet replication. Figure 13.11 shows the corresponding sequence plot, recorded at the data sender. Two acks, one for 43,009 and one for 44,033, arrive at T = 1.86. They then arrive again, and again, and again, for a total of 9 pairs of arrivals, each pair coming 32 msec after the last.

[Figure 13.11: Two acks replicated 8 times each]
Since the replication involves two different acks, the multiple arrivals do not constitute a duplicate ack series, and so no fast retransmission occurs (§ 9.2.7). The fact that two packets were together replicated does not fit with the explanations offered above for how a single packet could be replicated, since link-layer effects would only replicate one packet at a time. Finally, the replication in Figure 13.11 was accompanied by a routing change along the path from the data sender to the receiver. It seems likely the two events were somehow related.

In N_2, however, we observed 65 instances of the network infrastructure replicating a packet. Figure 13.12 shows the most striking of these, a single data packet 78,337 being replicated 22 times by the network (two extended blurs in the plot). The receiving TCP dutifully generates dup acks for each additional arrival, though it experiences a processing lull of about 7 msec while doing so.

All of the packet replications in N_2 were of a single packet, indicating perhaps a different mechanism than that for N_1's lone replication event. Several sites dominated the N_2 replication events: in particular, the two Trondheim sites, sintef1 and sintef2, accounted for half of the events (almost all of these involving sintef1). Of the remainder, the two British sites, ucl and ukc, accounted for nearly half again. But after eliminating all of these, we still observed replication events among connections between 7 different sites, so the effect is not completely isolated to one or two locations.

[Figure 13.12: Data packet replicated 22 times]

Surprisingly, packets can also be replicated at the sender. Figure 13.13 shows an example. Here, the ack arriving in the lower left corner of the plot has liberated 19 new packets (the receiver is a Solaris system and the ack reflects the Solaris slow-start acking strategy discussed in § 11.6.1). The packets are sent at nearly Ethernet speed, but, 4 msec after it was first sent, packet 91,649 shows up again. The second occurrence is a replication and not a temporary routing loop, because both copies show up at the receiver.⁵ Furthermore, the second copy had a TTL field one less than that in the first copy, indicating that the replicant did indeed take a slight detour before showing up again on the local link.

[Figure 13.13: Data packet replicated at sender]

While there were no sender-replicated packets in N_1, N_2 had 17 instances, 12 involving sintef1 and the remainder involving connix. For both sites, the replicated packet was always outbound, sometimes an ack and sometimes a data packet.

13.3 Packet corruption

The final pathology we look at is packet corruption, in which the network delivers to the receiver an imperfect copy of the original packet. Packet corruption is a well-known problem and a great deal of effort has been devoted to coding schemes and checksums in order to detect and correct for transmission errors.

For TCP/IP, the IP header includes a 16 bit header checksum that is computed over the IP header bytes. It does not include the TCP header or the TCP data bytes. It is supposed to be checked at each forwarding hop (though it is not clear whether all high-speed routers do so). If the checksum fails to match the header, the packet is discarded, because it cannot be reliably forwarded (who knows what is the true destination address?).
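Both the IP header checksum and the TCP checksum discussed here are instances of the standard 16-bit ones'-complement Internet checksum (RFC 1071), computed over different byte ranges. A compact reference implementation, included only to make the later discussion of 16-bit protection concrete:

    def internet_checksum(data: bytes) -> int:
        """16-bit ones'-complement sum of 16-bit words (RFC 1071)."""
        if len(data) % 2:
            data += b"\x00"                            # pad odd-length input
        total = 0
        for i in range(0, len(data), 2):
            total += (data[i] << 8) | data[i + 1]
            total = (total & 0xFFFF) + (total >> 16)   # fold carries back in
        return ~total & 0xFFFF

    # Only 16 bits of protection are available: a corrupted packet whose
    # altered contents happen to yield the same sum will pass undetected.
    original = bytes(range(64))
    corrupted = bytes([original[0] ^ 0x40]) + original[1:]
    print(internet_checksum(original) == internet_checksum(corrupted))   # False here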
TCP packets are further protected by a 16 bit checksum covering the entire data contents of the packet, as well as the TCP header and part of the IP header. This checksum is intended as an end-to-end checksum, the merits of which are persuasively argued in [SRC84].

We discussed tcpanaly's checksum analysis in § 11.2 and § 11.4.2. One issue we mentioned was the fact that what tcpanaly is actually detecting are packets ignored by the TCP receiver, which we then presume are due to checksum failures. An important point is that packets can be ignored due to other effects, such as the kernel having exhausted its available buffer space for keeping the packet until the TCP receiver can process it, or the network card dropping the incoming packet for the same sort of reason. In particular, the vantage-point problem (§ 10.4) can render the distinction between a checksum failure and other problems difficult to make.

We address this difficulty by observing that packet filters running on the same host as the TCP receivers should only see packets also seen by the receiver: if the network interface or kernel lacked resources for delivering the packet to the TCP, then the filter should not have received a copy, either.⁶ Packet filters running on separate hosts, on the other hand, will see both kinds of receiver losses, those due to checksum failures and those due to other causes. Thus, if a significant portion of tcpanaly's inferred checksum errors are actually packets discarded for a different reason, then we should find the sites with separate packet filter hosts more likely to detect purported checksum errors than those with the packet filter running on the same host as the TCP.

⁵ We verified that both copies include the same value in the IP ``id'' field.

⁶ It might be possible that, on some systems, the kernel may find it has sufficient resources to give a copy of a packet to the packet filter, but not a separate copy to the TCP receiver. We would expect, though, that this sort of borderline case would manifest itself only rarely.

We do not, however, find much of a disparity: in N_2, after eliminating lbli (see below), we find that 3.3% of the traces recorded by separate-host packet filters included a purported checksum error, while about 3.0% of those recorded by same-host filters did. Accordingly, we argue that the vast majority of checksum errors inferred by tcpanaly are indeed due to packet corruption.

We now present analysis based on this assumption. In N_1, tcpanaly flagged 75 traces (2.9%) as exhibiting a total of 105 checksum errors, with an overall proportion of 0.02% of the received packets corrupted by checksum errors.
In N_2, however, the figures climbed to 748 traces (4.4%) exhibiting 1,982 checksum errors, for an overall proportion of 0.06% of the received packets. The apparent trend, however, is not significant. It is all due to an increase in the checksum errors seen for data packets received by lbli. In N_1, only 4% of the traces with data checksum errors had lbli as the receiver. In N_2, however, 33% did. Furthermore, lbli in N_2 was particularly prone to checksum bursts like those shown in Figure 11.3. If we eliminate from our analysis those N_2 traces with lbli as the receiver, then the proportion of traces with errors falls to 3.0% and the proportion of received packets falls to 0.02%, essentially the same as in N_1. After doing so, no particular site stands out as being exceptionally plagued by checksum errors.

Thus, the evidence is good that, as a rule of thumb, the proportion of Internet data packets corrupted in transit is around 1 in 5,000.⁷

A corruption rate of 1 packet in 5,000 is low but certainly not negligible, because TCP protects its data with a mere 16-bit checksum. Consequently, on average one bad packet out of 65,536 will be erroneously accepted by the receiving TCP, resulting in undetected data corruption. If the rates in our study are typical, which seems plausible (but see below), then about one in every 300 million Internet packets is accepted with corruption. As the Internet carries far more data than 300 million packets per day,⁸ it appears likely that bad data is being accepted by a number of TCPs around the Internet every day.⁹ Thus, these statistics argue that TCP's 16-bit checksum is no longer adequate, if the goal is that globally in the Internet there are very few corrupted packets accepted by TCP implementations.

We noted above that lbli showed a strong increase in the prevalence of corrupted data packets received between N_1 and N_2. Since lbli's Internet link is via an ISDN line, it appears quite likely that the change is due to an increase in noise on the ISDN channels. That the errors most likely occur on an ISDN link also suggests why we observe bursts of checksum errors. The link in question uses SLIP compression (CSLIP) in order to transmit the TCP/IP header information very succinctly over the link [Jac90]. CSLIP works by encoding the header as differences with respect to the header of the connection's previous packet. Thus, if the link suffers an undetected error, not only will the current packet be corrupted, but so will every subsequent packet whose header is expressed in terms of differences with respect to the current packet's corrupted header. CSLIP consequently produces a stream of corrupted packets until the compression is reset (which happens when the originally-corrupted packet is retransmitted). This is exactly the behavior seen in Figure 11.3---the errors stop as soon as the first corrupted packet is retransmitted. (We frequently see this pattern with checksum bursts.)

This means that, at the physical layer, probably only one error occurs, but the use of compression magnifies this error and turns it into a burst. From a networking perspective, this is quite unfortunate, as it results in a spate of what should have been unneeded retransmissions. The correct fix for this problem is probably to ensure that the link layer uses a strong checksum, so it can discard corrupted packets without even presenting them to CSLIP for decompression; and to ensure that CSLIP can resynchronize its compression state in the presence of such discards.

⁷ If we assume single-bit uniformly-distributed errors, along with 512 byte data packets having 40 bytes of TCP/IP header, then this corruption rate corresponds to a Bit Error Rate of about 4.5 × 10^-8.

⁸ For example, a trace of traffic at FIX-WEST captured on June 21, 1995, logged slightly under 1,000,000 packets per minute [http://www.nlanr.net/Flowsresearch/fixstats.21.6.html].

⁹ This analysis assumes that corruptions result in uniformly-distributed checksum alterations. See [PHS95] for a more detailed analysis of data corruption checksum patterns, which can make the failure rate for accepting bad data significantly higher. In general, our data does not enable us to check for these other patterns, since our traces do not include packet contents.
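The rule-of-thumb figures quoted above follow from simple arithmetic, reproduced here under the same assumptions (a 1-in-5,000 corruption rate, uniformly distributed single-bit errors, and 512-byte data segments carrying 40 bytes of TCP/IP header):

    corruption_rate = 1 / 5000        # ~0.02% of data packets corrupted in transit
    checksum_miss = 1 / 2**16         # chance a corruption leaves the 16-bit sum unchanged

    accepted_bad = corruption_rate * checksum_miss
    print(round(1 / accepted_bad / 1e6))     # ~328: one packet in roughly 300 million

    packet_bits = (512 + 40) * 8             # data bytes plus TCP/IP header, in bits
    print(corruption_rate / packet_bits)     # ~4.5e-08, the Bit Error Rate of footnote 7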
Finally, we note that the data checksum error rate of 0.02% of the packets is much higher than that found for pure acks (§ 11.2). For pure acks, we found only 1 corruption out of 313,730 acks in N_1, and 26 out of 1,839,824 acks in N_2. Of the 26 in N_2, however, 25 were received by lbli, which we removed from our analysis above since it showed a clear prevalence of checksum errors far exceeding any other site. We thus need to reconcile an error rate of 2 × 10^-4 for data packets with one on the order of 10^-6 for pure acks.

One possible explanation is the tendency for data packet corruptions to come in bursts, as discussed above. However, other than lbli, this is not the case---for other sites, corruption events were usually confined to isolated packets.

If we assume that corruption is due to uniformly distributed single-bit errors, then a packet's likelihood of corruption will be directly proportional to the packet's size. Since pure acks have 40 bytes of TCP/IP header while data packets in our study were usually about 14 times larger (though sometimes as much as 37 times), the difference in size alone does not appear to reconcile the discrepancy.

Note, however, that the IP header has its own checksum, which is supposedly verified at each hop taken by a packet. We add the caveat ``supposedly'' because it is not clear whether all high-speed routers verify checksums, a potentially costly packet-forwarding step as it requires inspecting the entire IP header, which might otherwise be avoidable. Thus, if a packet is corrupted on a link so that its IP header is altered, then the router receiving the packet is supposed to discard it. Furthermore, if either of the 16-bit port fields in the TCP header are corrupted, then the packet filter used in our study would have rejected the packet, so we would not have had an opportunity to observe the checksum error. The net effect is that, from the perspective of the number of corruptible-yet-observable bits, pure acks have a size of only 16 bytes. (The number of corruptible-yet-observable bits in data packets likewise diminishes, but by a much smaller fraction.) This effect, plus the factor of 14 difference in size, reduces the weighted error rate ratio to between about 2:1 and 10:1.

In addition, if a compression technique such as that in CSLIP is used, then pure acks as transmitted on a link can take much less than 40 bytes (as little as 5 bytes using CSLIP), while data packets take only slightly less than their full TCP/IP size. The size difference can therefore expand from 14:1 to 100:1 or even larger. However, it is not clear whether CSLIP is used on any but quite slow links, since for faster links, the performance cost of compressing and decompressing the packet headers might outweigh the gains due to the reduced transmission times.

Another possibility is that errors are not uniformly distributed across the bits in a packet. We could imagine a scenario, for example, in which the beginning of the transmission of a packet on a link serves to synchronize the sender and receiver on the link. It could then be that for longer packets there is more opportunity for the sender and receiver to drift out of synchronization, adding noise to the signals used to communicate the bits. Investigating this possibility, however, is beyond the scope of our study---doing so would require capturing entire packets in order to assess the distribution of errors within them.

In summary, we can make a somewhat plausible, but not compelling, argument that we can reconcile the discrepancies in checksum failure rates.
If we accept the argument, then the compression effect's large role in reconciling the two error rate estimates suggests that errors tend 251 to occur most often on point­to­point links, since those are the ones for which compression is widely used; and furthermore, most likely on slow point­to­point links, as those are the ones for which it is particularly appealing to use compression. Such links might also plausibly be relatively more prone to link errors, since the underlying technology will be pushed hard to try to squeeze out as much bandwidth as possible. Finally, we note that packet corruption combined with CSLIP can produce surprising errors. Because CSLIP highly compresses the representation of the IP and TCP headers, but does not utilize an additional checksum to protect the compact representation, a bit error can result in packets that appear in many respects perfectly reasonable, albeit different than what was originally sent! We refer to these as ``desynchronization errors,'' since one of the elements leading to them is that the CSLIP sender and receiver have lost agreement upon their common state. One benign form of desynchronization error exhibits itself as a change in the IP ``id'' field (x 10.3.5). This has virtually no effect upon the packet's integrity as far as TCP is concerned, though it can introduce ambiguities when attempting to match up packets in pairs of traces (x 10.5). A considerably nastier form of desynchronization error occurs when a packet alters in a plausible fashion. If undetected by the checksum, these packets will often match what the TCP receiving them expects, leading to a fundamental mismatch between the connection state at the two TCP endpoints. We observed several such instances, all in N 2 and all involving packets sent to or from lbli. In one, an acknowledgement for sequence 1 (corresponding to an ack for the receiver's SYN­ack) arrived at the receiver with an ack for sequence 33 instead, and similarly for the next packet; then two more packets after those arrived with acknowledgements for sequence 65. Needless to say, the receiver had never sent any of this data! In others, packets sent without any data arrived with 512 bytes of in­sequence data, and other packets changed size in flight. All of these failed their checksum tests. But the ability of a CSLIP link to turn bit errors into plausible header fields, which is somewhat inevitable due to its clever, heavy use of compression, means that, when a corrupted packet finally does pass the checksum test, it is considerably more likely to both be accepted by the receiving TCP as valid and to desynchronize the TCP's state with respect to that of its remote peer. 252 Chapter 14 Bottleneck Bandwidth In this chapter we discuss one of the fundamental properties of a network connection, the bottleneck bandwidth that sets the upper limit on how quickly the network can deliver the sender's data to the receiver. In x 14.1 we discuss the general notion of bottleneck bandwidth and why we consider it a fundamental quantity. x 14.2 discusses ``packet pair,'' the technique used in previous work, and x 14.3 discusses why for our study we gain significant benefits using ``receiver­based packet pair,'' in which the measurements used in the estimation are those recorded by the receiver, rather than the ack ``echoes'' that the sender later receives. While packet pair often works well, in x 14.4 we illustrate four difficulties with the tech­ nique, three surmountable and the fourth fundamental. 
Motivated by these problems, we develop a robust estimation algorithm, ``packet bunch modes'' (PBM). To do so, we first in x 14.5 discuss an alternative estimation technique based on measurements of the ``peak rate'' (PR) achieved by the connection, for use in calibrating the PBM technique, which we then develop in detail in x 14.6. In x 14.7, we analyze the estimated bottleneck bandwidths for the Internet paths in our study, and in x 14.8 we finish with a comparison of the efficacy of the various techniques. 14.1 Bottleneck bandwidth as a fundamental quantity Each element in the end­to­end chain between a data sender and the data receiver has some maximum rate at which it can forward data. These maxima may arise directly from physical properties of the element, such as the frequency bandwidth of a wire, or from more complex prop­ erties, such as the minimum amount of time required by a router to look up an address to determine how to forward a packet. The first of these situations often dominates, and accordingly the term bandwidth is used to denote the maximum rate, even if the maximum does not come directly from a physical bandwidth limitation. Because sending data involves forwarding the data along an end­to­end chain of network­ ing elements, the slowest element in the entire chain sets the bottleneck bandwidth, i.e., the max­ imum rate at which data can be sent along the chain. The usual assumption is that the bottleneck element is a network link with a limited bandwidth, although this need not be the case. Note that from our data we cannot say anything meaningful about the location of the bottleneck along the network path, since our methodology gives us only end­to­end measurements (though see x 15.4). Furthermore, there may be multiple elements along the network path, each 253 limited to the same bottleneck rate. Thus, our analysis is confined to an assessment of the bottleneck bandwidth as an end­to­end path property, rather than as the property of a particular element in the path. We must make a crucial distinction between bottleneck bandwidth and available band­ width. The former gives an upper bound on how fast a connection can possibly transmit data, while the less­well­defined latter term denotes how fast the connection in fact can transmit data, or in some cases how fast it should transmit data to preserve network stability, even though it could transmit faster. Thus, the available bandwidth never exceeds the bottleneck bandwidth, and can in fact be much smaller. Bottleneck bandwidth is often presumed to be a fairly static quantity, while available bandwidth is often recognized as intimately reflecting current network traffic levels (congestion). Using the above terminology, the bottleneck location(s), if we were able to pinpoint them, would generally not change during the course of a connection, unless the network path used by the connec­ tion underwent a routing changes. But the networking element(s) limiting the available bandwidth might readily change over the lifetime of a connection. TCP's congestion avoidance and control algorithms reflect an attempt to confine each connection to the available bandwidth. For this purpose, the bottleneck bandwidth is essentially irrelevant. For connection performance, however, the bottleneck bandwidth is a fundamental quan­ tity, because it indicates a limit on what the connection can hope to achieve. 
If the sender tries to transmit any faster, not only is it guaranteed to fail, but the additional traffic it generates in doing so will either lead to queueing delays somewhere in the network, or to packet drops, if the overloaded element lacks sufficient buffer capacity. We discuss available bandwidth further in § 16.5, and for the remainder of this chapter focus on assessing bottleneck bandwidth.

The bottleneck bandwidth is further a fundamental quantity because it determines what we term the self-interference time constant, Q_b. Q_b measures the amount of time required to forward a given packet through the bottleneck element. Thus, Q_b is identical to the service time at the bottleneck element; we use the term ``self-interference time constant'' instead because of the central role Q_b plays in determining when packet transit times are necessarily correlated, as discussed below. If a packet carries a total of b bytes and the bottleneck bandwidth is ρ_B byte/sec, then:

    Q_b = b / ρ_B                                        (14.1)

in units of seconds. We use the term ``self-interference'' because if the sender transmits two b-byte packets with an interval ΔT_s < Q_b between them, then the second one is guaranteed to have to wait behind the first one at the bottleneck element (hence the use of ``Q'' to denote ``queueing'').

We use the notation Q_b instead of the more functional notation Q(b) because we will assume, unless otherwise stated, that, for a particular trace pair, b is fixed to the maximum segment size (MSS; § 9.2.2). We note that full-sized packets are larger than MSS, due to overhead from transport, network, and link-layer headers. However, while it might at first appear that this overhead is known (except for the link layer) and can thus be safely added into b, if the bottleneck link along a path uses header compression (§ 13.3) then the header as transmitted might take much less data than would appear from tallying the number of bytes in the header. Since many of the most significant bottleneck links in our study also use header compression, we decided to perform all of our analysis of the bottleneck bandwidth in terms of the maximum rate at which a connection can transmit user data.

For our measurement analysis, accurate assessment of Q_b is critical. Suppose we observe a sender transmitting p_1 and p_2, both b bytes in size, and that they are sent an interval ΔT_s apart. If ΔT_s < Q_b, then we know that p_2 had to wait a time Q_b - ΔT_s behind p_1 at the bottleneck, so the delays experienced by p_1 and p_2 are perforce correlated. If ΔT_s ≥ Q_b, then if p_2 experiences greater delay than p_1, the increase is not due to self-interference but some other source (such as additional traffic from other connections, or processing delays).

We use Q_b to analyze packet timings and remove self-interference effects in Chapter 16. In this chapter, we focus on sound estimation of Q_b, as we must have this in order for the subsequent timing analysis to be likewise sound.

14.2 Packet pair

The fundamental idea behind the packet pair estimation technique is that, if two packets are transmitted by the sender with an interval ΔT_s < Q_b between them, then when they arrive at the bottleneck they will be spread out in time by the transmission delay of the first packet across the bottleneck: after completing transmission through the bottleneck, their spacing will be exactly Q_b.
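As a concrete instance of Eqn 14.1, consider the 56 Kbit/sec path discussed in § 13.1.3, whose bottleneck carried about 6,320 user data bytes/sec (a worked example, not a new measurement):

    def self_interference_time(b, rho_B):
        """Q_b = b / rho_B (Eqn 14.1): time for a b-byte packet to cross the bottleneck."""
        return b / rho_B

    # 512 user-data bytes through a ~6,320 byte/sec bottleneck:
    print(self_interference_time(512, 6320))      # ~0.081 sec
    # Two such packets sent less than 81 msec apart must queue at the bottleneck
    # and emerge no closer than 81 msec, the minimum reordering gap of § 13.1.3.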
Barring subsequent delay variations (due to downstream queueing or processing lulls), they will then arrive at the receiver spaced not \DeltaT s apart, but \DeltaT r = Q b . The size b then enables computation of ae B via Eqn 14.1. 1 The principle of the bottleneck spacing effect was noted in Jacobson's classic congestion paper [Ja88], where it in turn leads to the ``self­clocking'' mechanism (x 9.2.5). Keshav subsequently formally analyzed the behavior of packet pair for a network in which all of the routers obey the ``fair queueing'' scheduling discipline, and developed a provably stable flow control scheme based on packet pair measurements [Ke91]. 2 Both Jacobson and Keshav were interested in estimating available rather than bottleneck bandwidth, and for this variations from Q b due to queueing are of primary concern (x 16.5). But if, as for us, the goal is to estimate ae B , then these variations instead become noise we must deal with. To use Jacobson's self­clocking model to estimate bottleneck bandwidth requires an as­ sumption that delay variation in the network is small compared to Q b . Using Keshav's scheme requires fair queueing. Internet paths, however, often suffer considerable delay variation (Chap­ ter 16), and Internet routers do not employ fair queueing. Thus, efforts to estimate ae B using packet pair must deal with considerable noise issues. The first step in dealing with measurement noise is 1 If the two packets in the pair have different sizes b1 and b2 , then which to use depends on how we interpret the timestamps for the packets. If the timestamps reflect when the packet began to arrive at the packet filter's monitoring point, then b1 should be used, since that is how much data was transmitted between the timestamps of the two packets. If the timestamps reflect when the packet finished arriving, then b2 should be used. In practice, a packet's timestamp is recorded some time after the packet has finished arriving, per x 10.2, and so if b1 6= b2 , tcpanaly uses b2 . 2 Keshav also coined the term ``packet pair.'' 255 to analyze as large a number of pairs as feasible, with an eye to the tradeoff between measurement accuracy and undue loading of the network by the measurement traffic. 3 Bolot used a stream of packets sent at fixed intervals to probe several Internet paths in order to characterize delay and loss behavior [Bo93]. He measured round­trip delay of UDP echo packets and, among other analyses, applied the packet pair technique to form estimates of bottleneck bandwidths. He found good agreement with known link capacities, though a limitation of his study is that the measurements were confined to a small number of Internet paths. One of our goals is to address this limitation by determining how well packet pair techniques work across diverse Internet conditions. Recent work by Carter and Crovella also investigates the utility of using packet pair in the Internet for estimating bottleneck bandwidth [CC96a]. Their work focusses on bprobe, a tool they devised for estimating bottleneck bandwidth by transmitting 10 consecutive ICMP echo packets and recording the arrival times of the corresponding replies. bprobe then repeats this process with varying (and carefully chosen) packet sizes. Much of the effort in developing bprobe concerns how to filter the resulting raw measurements in order to form a solid estimate. 
bprobe currently filters by first widening each estimate into an interval by adding an (unspecified) error term, and then finding the point at which the largest number of intervals overlap. The authors also undertook to calibrate bprobe by testing its performance for a number of Internet paths with known bottlenecks. They found in general it worked well, though some paths exhibited sufficient noise to sometimes produce erroneous estimates. Finally, they note that measurements made using larger echo packets yielded more accurate estimates than those made using smaller packets, which bodes well for our interest in measuring Q b for b = MSS. One limitation of both studies is that they were based on measurements made only at the data sender (x 9.1.3). Since in both studies the packets echoed back from the remote end were the same size as those sent to it, neither analysis was able to distinguish whether the bottleneck along the forward and reverse paths was the same, or whether it was present in only one direction. The bot­ tleneck could differ in the two directions due the packets traversing different physical links because of asymmetric routing (x 8), or because some media, such as satellite links, can have significant bandwidth asymmetries depending on the direction traversed [DMT96]. For the study in [CC96a], this is not a problem, because the authors' ultimate goal was to determine which Web server to pick for a document available from a number of different servers. Since Web transfers are request/response, and hence bidirectional (albeit potentially asymmetric in the volume of data sent in each direction), the bottleneck for the combined forward and reverse path is indeed a figure of interest. For general TCP traffic, however, this is not always the case, since for a unidirectional transfer---especially for FTP transfers, which can sometimes be quite huge [PF95]--- the data packets sent along the forward path are much larger than the acks returned along the reverse path. Thus, even if the reverse path has a significantly lower bottleneck bandwidth, this is unlikely to limit the connection's maximum rate. However, for estimating bottleneck bandwidth by measuring TCP traffic a second problem arises: if the only measurements available are those at the sender, then ack compression (x 16.3.1) can significantly alter the spacing of the small ack packets as they return through the network, distorting the bandwidth estimate. We investigate the degree of this problem below. 3 Gathering large samples, however, can conflict with another goal, that of forming an estimate quickly, briefly discussed at the end of the chapter. 256 Time Sequence # 0.0 0.5 1.0 1.5 2.0 2.5 3.0 0 2000 4000 6000 8000 10000 Figure 14.1: Paired sequence plot showing timing of data packets at sender (black squares) and when received (arrowheads) 14.3 Receiver­based packet pair For our analysis, we consider what we term receiver­based packet pair (RBPP), in which we look at the pattern of data packet arrivals at the receiver. We also utilize knowledge of the pattern in which the data packets were originally sent, so we assume that the receiver has full timing information available to it. In particular, we assume that the receiver knows when the packets sent were not stretched out by the network, and can reject these as candidates for RBPP analysis. RBPP is considerably more accurate than sender­based packet pair (SBPP; cf. 
§ 14.2), since it eliminates the additional noise and possible asymmetry of the return path, as well as noise due to delays in generating the acks themselves (§ 11.6.4).

Figure 14.1 shows a paired sequence plot for data transferred over a path known to have a 56 Kbit/sec bottleneck link. The centers of the filled black squares indicate the times at which the sender transmitted the successive data packets, and the arrowheads point to the times at which they arrived at the receiver. (We have adjusted the relative clock offset per the methodology given in § 12.5.) The packet pair effect is quite strong: while the sender tends to transmit packets in groups of two back-to-back (due to slow start opening the congestion window), this timing structure has been completely removed by the time the packets arrive, and instead they come in at a nearly constant rate of about 6,200 byte/sec.

Figure 14.2 shows the same trace pair with the acknowledgements added. They are offset slightly lower than the sequence number they acknowledge, for legibility. The arrows start at the point in time at which the ack was generated by the receiver, and continue until received by the sender. We can see that some acks are generated immediately, but others (such as 4,096) are delayed. Furthermore, there is considerable variation among the transit times of the acks, even though they are almost certainly too small to be subject to stretching at the bottleneck link along the return path. If we follow the ack arrowheads by eye, it is clear that the strikingly smooth pattern in Figure 14.1 has been blurred by the ack delays, which have nothing to do with the quantity of interest, namely Q_b on the forward path.

[Figure 14.2: Same plot with acks included]

14.4 Difficulties with packet pair

As shown in the Bolot and Carter/Crovella studies ([Bo93, CC96a]), packet pair techniques often provide good estimates of bottleneck bandwidth. We are interested both in estimating the bottleneck bandwidth of the Internet paths in our study, and, furthermore, whether the packet-pair technique is robust enough that an Internet transport protocol might profitably use it in order to make decisions based on Q_b. A preliminary investigation of our data revealed four potential problems with packet pair techniques, even if receiver-based. Three of these can often be satisfactorily addressed, but the fourth is more fundamental. We discuss each in turn.

14.4.1 Out-of-order delivery

The first problem stems from the fact that, for some Internet paths, out-of-order packet delivery occurs quite frequently (§ 13.1). Clearly, packet pairs delivered out of order completely destroy the packet pair technique, since they result in ΔT_r < 0, which then leads to a negative estimate for ρ_B. The receiver sequence plot in Figure 14.3 illustrates the basic problem. (Compare with the clean arrivals in Figure 14.1.)

Out-of-order delivery is symptomatic of a more general problem, namely that the two packets in a pair may not take the same route through the network, which then violates the assumption that the second queues behind the first at the bottleneck.
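A minimal receiver-based packet-pair estimator incorporating the checks discussed so far might look as follows. It is a sketch for illustration only (one estimate per surviving pair, with out-of-order and insufficiently spaced pairs discarded) and is far simpler than the PBM procedure developed in § 14.6; note that it takes a prior notion of Q_b as input to decide which pairs were sent back-to-back, so in practice the estimate must be iterated.

    def rbpp_estimates(arrivals, b, q_b):
        """Receiver-based packet pair: one bandwidth estimate per usable pair.

        arrivals -- (send_time, recv_time) tuples listed in the order sent,
                    each packet carrying b bytes of user data (in real traces,
                    sequence numbers and IP id fields identify this ordering)
        b        -- user data bytes per packet
        q_b      -- assumed self-interference time constant, used to require
                    that a pair was sent closely enough to queue at the bottleneck
        """
        estimates = []
        for prev, cur in zip(arrivals, arrivals[1:]):
            sent_gap = cur[0] - prev[0]
            recv_gap = cur[1] - prev[1]
            if recv_gap < 0:
                continue          # delivered out of order: discard the pair
            if recv_gap == 0:
                continue          # spacing lost beneath the clock resolution
            if sent_gap >= q_b:
                continue          # not sent back-to-back, so no bottleneck spreading
            estimates.append(b / recv_gap)
        return estimates

    # Packets sent 1 msec apart but arriving 81 msec apart suggest ~6,320 byte/sec.
    trace = [(0.000, 0.500), (0.001, 0.581), (0.002, 0.662)]
    print(rbpp_estimates(trace, b=512, q_b=0.081))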
In a sense, out-of-order delivery is a blessing, because the receiver can usually detect the event (based on sequence numbers, and possibly IP ``id'' fields for retransmitted packets; cf. § 10.5). More insidious are packet pairs that traverse different paths but still arrive in order. The interval computed from their arrivals may have nothing to do with the bottleneck bandwidth, and yet it is difficult to recognize this case and discard the measurement from subsequent analysis. We discuss a particularly problematic instance of this problem in § 14.4.4 below.

[Figure 14.3: Receiver sequence plot illustrating difficulties of packet-pair bottleneck bandwidth estimation in the presence of out-of-order arrivals]

14.4.2 Limitations due to clock resolution

Another problem relates to the receiver's clock resolution, C_r (§ 12.3). C_r can introduce large margins of error around estimates of ρ_B. Suppose two b-byte packets arrive at the receiver with a spacing of ΔT_r. We want to estimate ρ_B from Eqn 14.1 using ΔT_r = Q_b = b/ρ_B, and hence

    ρ_B = b / ΔT_r.                                      (14.2)

However, we cannot measure ΔT_r exactly, but only estimate an interval in which it lies, using:

    max(ΔT̃_r - C_r, 0) ≤ ΔT_r ≤ ΔT̃_r + C_r,             (14.3)

where ΔT̃_r is the value reported by the receiver's clock for the spacing between the two packets. Combining Eqn 14.2 with Eqn 14.3 gives us bounds on the estimate ρ̂_B = b / ΔT̃_r:

    b / (ΔT̃_r + C_r) ≤ ρ̂_B ≤ b / max(ΔT̃_r - C_r, 0),

where the upper bound is unbounded if ΔT̃_r ≤ C_r. If C_r = 10 msec, a common value on older hardware (§ 12.4.2), then for b = 512 bytes, from the arrival of a single packet pair we cannot distinguish between

    ρ_B = 512 bytes / 0.010 sec = 51,200 byte/sec,   and   ρ_B = ∞.

This means we cannot distinguish between a fairly pedestrian T1 link of under 200 Kbyte/sec, and a blindingly fast (today!) OC-12 link of about 80 Mbyte/sec.

For C_r = 1 msec, the threshold rises to 512,000 byte/sec, still much too low for meaningful estimation for high-speed networks. For today's networks, C_r = 100 μsec almost allows us to distinguish between T3 speeds of a bit over 5 Mbyte/sec and higher speeds. Since some of the clocks in our study had finer resolution, we view this problem as tractable with today's (better) hardware. It is not clear, however, whether in the future processor clock resolution will grow finer at a rate to match how network bandwidths grow faster (and thus Q_b decreases).

While some of today's hardware provides sufficient resolution for packet-pair analysis, other platforms do not, so we still need to find a way to deal with low-resolution clocks. In line with the argument in the previous paragraph, doing so also potentially benefits measurement of future networks, since their bandwidth growth may outpace that of clock resolution.

A basic technique for coping with poor clock resolution is to use packet bunch rather than packet pair.⁴ The idea behind packet bunch, in which k ≥ 2 back-to-back packets are used, is that bunches should be less prone to noise, since individual packet variations are smoothed over a single large interval rather than k - 1 small intervals. On the other hand, a larger bunch presents more opportunity for one of its packets to be disrupted by a significant delay, leading to underestimation of ρ_B. We investigate this concern below. However, another benefit of packet bunch is that the overall time interval spanned by the k packets will be about k - 1 times larger than that spanned by a single pair, diminishing the relative error that the clock resolution C_r contributes to the result.

⁴ The term ``packet bunch'' has been in informal use for at least several years; however, we were unable to find any appearance of it in the networking literature.
[Figure 14.4: Receiver sequence plot showing two distinct bottleneck bandwidths.]

14.4.3 Changes in bottleneck bandwidth

Another problem that any bottleneck bandwidth estimation must deal with is the possibility that the bottleneck changes over the course of the connection. Figure 14.4 shows a trace in which this happened. We have shown the entire trace, but only the data packets and not the corresponding acks. While the details are lost, the eye immediately picks out a transition from one overall slope to another, just after T = 6. The first slope corresponds to about 6,600 byte/sec, while the second is about 13,300 byte/sec, an increase of about a factor of two.

For this example, we know enough about one of the endpoints (lbli) to fully describe what occurred. lbli's Internet connection is via an ISDN link. The link has two channels, each nominally capable of 64 Kbit/sec. When lbli initially uses the ISDN link, the router only activates one channel (to reduce the expense). However, if lbli makes sustained use of the link, then the router activates the second channel, doubling the bandwidth.

While for this particular example the mechanism leading to the bottleneck shift is specific to the underlying link technology, the principle that the bottleneck can change with time is both important and general. It is important to detect such an event, because it has a major impact on the ensuing behavior of the connection. Furthermore, bottlenecks can shift for reasons other than multi-channel links. In particular, routing changes might alter the bottleneck in a significant way.

Packet pair studies to date have focussed on identifying a single bottleneck bandwidth [Bo93, CC96a]. Unfortunately, in the presence of a bottleneck shift, any technique shaped to estimate a single, unchanging bottleneck will fail: it will either return a bogus compromise estimate, or, if care is taken to remove noise, select one bottleneck and reject the other. In both cases, the salient fact that the bottleneck shifted is overlooked. We attempt to address this problem in the development of our robust estimation algorithm (§ 14.6).

[Figure 14.5: Enlargement of part of the previous figure.]

14.4.4 Multi-channel bottleneck links

We now turn to a more fundamental problem with packet-pair techniques, namely bottleneck estimation in the face of multi-channel links. Here we do not concern ourselves with the problem of detecting that the bottleneck has changed due to the activation or deactivation of the link's additional channel (§ 14.4.3). We instead illustrate a situation in which packet pair yields incorrect overestimates even in the absence of any delay noise.

Figure 14.5 expands a portion of Figure 14.4. The slope of the large linear trend in the plot corresponds to 13,300 byte/sec, as earlier noted. However, we see that the line is actually made up of pairs of packets. Figure 14.6 expands the plot again, showing quite clearly the pairing pattern.
The slope between the pairs of packets corresponds to a data rate of about 160 Kbyte/sec, even though we know that the ISDN link has a hard limit of 128 Kbit/sec = 16 Kbyte/sec, a factor of ten smaller! Clearly, an estimate of ρ̂_B ≈ 160 Kbyte/sec must be wrong, yet that is what a packet-pair calculation will yield.

The question then is: where is the spacing corresponding to 160 Kbyte/sec coming from? A clue to the answer lies in the number itself. It is not far below the user data rates achieved over T1 circuits, typically on the order of 170 Kbyte/sec. It is as though every other packet were immune to queueing behind its predecessor at the known 16 Kbyte/sec bottleneck, but instead queued behind it at a downstream T1 bottleneck.

Indeed, this is exactly what is happening. As discussed in § 14.4.3, the bottleneck ISDN link has two channels. These operate in parallel. That is, when the link is idle and a packet arrives, it goes out over the first channel, and when another packet arrives shortly after, it goes out over the other channel. If a third packet then arrives, it has to wait until one of the channels becomes free. Effectively, it is queued behind not its immediate predecessor but its predecessor's predecessor, the first packet in the series, and it is queued not for a 16 Kbyte/sec link but for an 8 Kbyte/sec channel making up just part of the link.

[Figure 14.6: Enlargement of part of the previous figure.]

As queues build up at the router utilizing the multi-channel link, often both channels will remain busy for an extended period of time. In this case, additional traffic arriving at the router, or processing delays, can alter the ``phase'' between the two channels, meaning the offset between when the first begins sending a packet and when the second does so. Thus, we do not always get an arrival pattern clearly reflecting the downstream bottleneck as shown in Figure 14.6. We can instead get a pairing pattern somewhere between the downstream bottleneck and the true bottleneck. Figure 14.7 shows an earlier part of the same connection where a change in phase quite clearly occurs a bit before T = 8. Here the pair slope shifts from about 23 Kbyte/sec up to 160 Kbyte/sec. Note that the overall rate at which new data arrives at the receiver has not changed at all during this transition; only the fine-scale timing structure has changed.

We conclude that, in the presence of multi-channel links, packet-pair techniques can give completely misleading estimates for ρ_B. Worse, these estimates will often be much too high. The fundamental problem is the assumption with packet pair that there is only a single path through the network, and that therefore packets queue behind one another at the bottleneck.

We should stress that the problem is more general than the circumstances shown in this example, in two important ways. First, while in this example the parallelism leading to the estimation error came from a single link with two separate (and parallel) physical channels, the exact same effect could come from a router that balances its outgoing load across two different links. If these links have different propagation times, then the likely result is out-of-order arrivals, which can be detected by the receiver and removed from the analysis (§ 14.4.1). But if the links have equal or almost equal propagation times, then the parallelism they offer can completely obscure the true bottleneck bandwidth.
[Figure 14.7: Multi-channel phasing effect.]

Second, it may be tempting to dismiss this problem as correctable by using packet bunch (§ 14.4.2) with k = 3 instead of packet pair. This argument is not compelling without further investigation, however, because packet bunch is potentially more prone to error; and, more fundamentally, k = 3 only works if the parallelism comes from two channels. If it came from three channels (or load-balancing links), then k = 3 will still yield misleading estimates.

We now turn to developing techniques to address these difficulties.

14.5 Peak rate estimation

In this section we discuss a simple, cheap-to-compute, and not particularly accurate technique for estimating the bottleneck bandwidth along a network path. We term this technique peak rate and subsequently refer to it as PR. Our interest in PR lies in providing calibration for the robust technique developed in the next section, based on packet-bunch modes (``PBM'').

We develop two PR-based estimates: a ``conservative'' estimate, PR^c, very unlikely to be an overestimate, and an ``optimistic'' estimate, PR^o, which is more likely to be accurate but is also prone to overestimation. Armed with these estimates, we then can compare them with results given by PBM. If the robust technique yields an estimate less than PR^c, or higher than PR^o, then the discrepancy merits investigation. If they generally agree, then perhaps we can use the simpler PR techniques instead of PBM without losing accuracy (though it would be surprising to find that PR techniques suffice, per the discussion below).

PR is based on the observation that the peak rate the connection ever manages to transmit along the path should give a lower bound on the bottleneck rate. PR is a necessarily stressful technique in that it requires loading the network to capacity to assure accuracy. As such, we would prefer not to use PR as an active measurement methodology, but it works fine for situations in which the measurements being analyzed are due to traffic initiated for some reason other than bottleneck measurement. Thus, PR makes sense as a candidate algorithm for adding to a transport protocol. In contrast, packet pair and PBM do not necessarily require stressing the network for accuracy, so they are attractive both as additions to transport protocols to aid in their decision-making, and as independent network analysis tools.

At its simplest, PR consists of just dividing the amount of data transferred by the duration of the connection. This technique, however, often grossly underestimates the true bottleneck bandwidth, because transmission lulls due to slow start, insufficient window, or retransmission timeouts can greatly inflate the connection duration. To reduce the error in PR requires confining the proportion of the connection on which we calculate the peak rate to a region during which none of these lulls impeded transmission. Avoiding slow-start and timeout delays is easy, since these regions are relatively simple to identify. Identifying times of insufficient window, however, is more difficult, because the correct window is a function of both the round-trip time (RTT) and the available bandwidth, and the latter is shaped in part by the bottleneck bandwidth, which is what we are trying to estimate.
If the connection was at some point not window-limited, then by definition it achieved a sustained rate (over at least one RTT) at or exceeding the available capacity. Since the hope embodied in PR is that at some point the available capacity matched the bottleneck bandwidth, we address the problem of insufficient window by forming our estimate from the maximum rate achieved over a single RTT.

tcpanaly computes a PR-based estimate by advancing through the data packet arrivals at the TCP receiver as follows. For each arrival, it computes the amount of data (in bytes) that arrived between that arrival and the next data packet coming just beyond the edge of a temporal window equal to the minimum RTT, RTT_min. (RTT_min is computed as the smallest interval between a full-sized packet's departure from the sender and the arrival at the sender of an acknowledgement for that packet.) Suppose we find B bytes arrived in a total time ΔT_r > RTT_min, and that the interval spanned by the departure of the packets when transmitted by the sender is ΔT_s.[5] Finally, if any of the packets arrived out of order, then we exclude the group of packets from any further analysis. Otherwise, we compute the expansion factor

    ξ_{s,r} = (ΔT_r + C_r) / (ΔT_s + C_s),    (14.5)

where C_s and C_r are the resolutions of the sender's and receiver's clocks (§ 12.3). ξ_{s,r} measures the factor by which the group of packets was spread out by the network. If it is less than 1, then the packets were not spread out by the network and hence not shaped by the bottleneck. Thus, calculations based on their arrival times should not be used in estimating the bottleneck.

[5] Here, B does not include the bytes carried by the first packet of the group, since we assume that the packet timestamps reflect when packets finished arriving, so the first packet's bytes arrived before the point in time indicated by its timestamp. Also see the footnote in § 14.2.

In practice, however, two effects complicate the simple rule of rejecting timings if ξ_{s,r} < 1. The first is that, if C_s is considerably different (orders of magnitude larger or smaller) than C_r, then ξ_{s,r} can vary considerably, even if the magnitudes of ΔT_r and ΔT_s are close. The second problem is that sometimes, due to ``self-clocking'' (§ 9.2.5), a connection rapidly settles into a pattern of transmitting packets at very close to the bottleneck bandwidth, in which case we might find ξ_{s,r} slightly less than 1 even though it allows for a solid estimate of ρ_B. To address these concerns, we use a slightly different definition of ξ_{s,r} than that given by Eqn 14.5:

    ξ̃_{s,r} = (ΔT_r + C_r) / (ΔT_s + C_r),    (14.6)

namely, C_r is used in both the numerator and the denominator, which eliminates large swings in ξ_{s,r} due to discrepancies between C_r and C_s. This is a bit of a ``fudge factor,'' and in retrospect a better solution would have been to use C_r + C_s; but we find it works well in practice. The other fudge factor is that tcpanaly allows estimates for ξ̃_{s,r} ≥ 0.95, to accommodate self-clocking effects.

After taking into account these considerations, we then form the PR-based estimate:

    PR^c = B / (ΔT_r + C_r).    (14.7)

The c superscript indicates that the estimator is conservative. Since it requires ΔT_r > RTT_min, it may be an underestimate if the connection never managed to ``fill the pipe,'' which we illustrate shortly.
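The following condensed sketch (Python; simplified relative to tcpanaly, whose actual logic also handles out-of-order arrivals and variable packet sizes, and whose names differ) shows the core of the conservative peak-rate computation: slide an RTT_min window over the arrivals, apply the expansion-factor test of Eqn 14.6, and keep the maximum rate per Eqn 14.7.

    def pr_conservative(arrivals, departures, sizes, rtt_min, Cr):
        # arrivals[i]/departures[i]: receiver/sender timestamps of data packet i (sec)
        # sizes[i]: bytes carried by packet i; rtt_min: minimum RTT (sec)
        # Cr: receiver clock resolution (sec, assumed > 0)
        best = 0.0
        n = len(arrivals)
        for i in range(n):
            # first packet arriving just beyond an RTT_min window after packet i
            j = i + 1
            while j < n and arrivals[j] - arrivals[i] <= rtt_min:
                j += 1
            if j >= n:
                break
            dTr = arrivals[j] - arrivals[i]
            dTs = departures[j] - departures[i]
            B = sum(sizes[i+1:j+1])          # first packet's bytes excluded (footnote 5)
            xi = (dTr + Cr) / (dTs + Cr)     # Eqn 14.6
            if xi < 0.95:
                continue                     # group not spread out by the network
            best = max(best, B / (dTr + Cr)) # Eqn 14.7
        return best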
For the same group of packets, tcpanaly also computes an ``optimistic'' estimate corresponding to the group minus the final packet (the one that arrived more than RTT_min after the first packet):

    PR^o = B' / (ΔT'_r + C_r),    (14.8)

where B' and ΔT'_r are the byte count and arrival interval of the truncated group. (Thus, we always have ΔT'_r ≤ RTT_min.) Even though such a group spans less than a full RTT, its packets can still give reliable estimates, because they queued at the bottleneck link behind earlier packets transmitted by the sender. tcpanaly does require, however, that either the truncated group span a minimum number of packets, or that the transfer have filled the offered window (i.e., the connection was certainly window-limited), to ensure that compression of a small number of packets does not skew the estimate.[6] We compute the final estimates as the maxima of the per-group values of PR^c and PR^o.

[6] The precise method used is a bit more complicated, since it includes the possibility of different-sized packets arriving.

Note that the algorithms described above work best with cooperation between the sender and the receiver, in order to detect out-of-order arrivals, and to form a good estimate for RTT_min, which can be quite difficult to assess from the receiver's vantage point because it cannot reliably infer the sender's congestion window.

Figure 14.8 illustrates the difference between computing PR^c and PR^o for a window-limited connection. RTT_min is about 110 msec. Eight packets arrive, starting at T = 1.5. The optimistic estimate is based on the 3,584 bytes arriving 22 msec after the first packet, for a rate of about 163 Kbyte/sec. The conservative estimate includes the 9th packet arriving significantly later than the first 8 (due to the window limit). The corresponding estimate is 4,096 bytes arriving in 115 msec, for a rate of about 36 Kbyte/sec. In this case, the optimistic estimate is much more accurate, as the limiting bandwidth is in fact that of a T1 circuit, corresponding to about 170 Kbyte/sec of user data.

[Figure 14.8: Peak-rate optimistic and conservative bottleneck estimates for a window-limited connection. Annotations: 3,584 bytes / 22 msec ≈ 163,000 byte/sec; 4,096 bytes / 115 msec ≈ 36,000 byte/sec.]

In this example, the connection is limited by the offered window, which is easy to detect. Very often, however, connections are instead limited by the congestion window, due to earlier retransmission events. This limit is more difficult for the receiver to detect. Thus, PR^c often forms a considerable underestimate.

On the other hand, Figure 14.9 shows an instance in which PR^o is a large overestimate. The optimistic and conservative estimates for this trace both occurred for the group of packets arriving at time T = 1.5, in the middle of the figure. As can be seen from the surrounding groups, the true bottleneck capacity is about 170 Kbyte/sec (T1). The packet group at T = 1.5, however, has been compressed by the network (cf. § 16.3.2), and it all arrives at Ethernet speed. Thus, PR^o forms a gross overestimate, and even the ξ̃_{s,r} requirement does not catch compression events such as this one. So, while PR estimation is simple to compute, it often fails to provide reliable estimates. We need a more robust estimation technique.
By considering different bunch sizes, we can accommodate limited receiver clock resolutions (§ 14.4.2) and the possibility of multiple channels or load-balancing across multiple links (§ 14.4.4), while still avoiding the risk of underestimation due to noise diluting larger bunches, or window limitations (§ 14.5), since we also consider small bunch sizes.

[Figure 14.9: Erroneous optimistic estimate due to data packet compression.]

By allowing for finding multiple bottleneck values, we both again accommodate multi-channel (and multi-link) effects, and also the possibility of a bottleneck change (§ 14.4.3). Furthermore, these two effects can be distinguished from one another: multiple bottleneck values due to multi-channel effects overlap, while those due to bottleneck changes fall into separate regions in time.

In the remainder of this section we discuss a number of details of PBM. Many are heuristic in nature and evolved out of an iterative process of refining PBM to avoid a number of obvious estimation errors. It is unfortunate that PBM has a large heuristic component, as it makes it more difficult to understand. On the other hand, we were unable to otherwise satisfactorily deal with the considerable problem of noise in the packet arrival times. We hope that the basic ideas underlying PBM---searching for multiple modes and interpreting the ways they overlap in terms of bottleneck changes and multi-channel paths---might be revisited in the future, in an attempt to put them on a more systematic basis.

14.6.1 Forming estimates for each ``extent''

PBM works by stepping through an increasing series of packet bunch sizes, and, for each, computing from the receiver trace all of the corresponding bottleneck estimates. We term the bunch size the extent and denote it by k.

For each extent, we advance a window over the arrivals at the receiver. The window is nominally k packets in size, but is extended as needed so that it always includes k · MSS bytes of data (so we can include less-than-full packets in our analysis). We do not, however, do this extension for k = 1, as that can obscure multi-channel effects.[7] We also extend the window to include more packets if ΔT_r < C_r, that is, if all the arrivals occurred without the receiver's clock advancing.

[7] For higher extents (k > 1), this extension does not obscure multi-channel effects, because we detect multi-channel bottlenecks based on comparing estimates for k = 1 with estimates for k = m, where m is the number of multiple channels. Thus, the main concern is to not confuse the k = 1 estimate.

If any of the arrivals within the window occurred out of order, or if they were transmitted due to a timeout retransmission, we skip analysis of the group of packets, as the arrival timings will likely not reflect the bottleneck bandwidth. If, when the last packet in the group was sent, the sender had fewer than k packets in flight, then some unusual event occurred during the flight (such as retransmission or receipt of an ICMP source quench), and we likewise skip analysis of the group.

We next compute bounds on ΔT_r, using Eqn 14.3:

    ΔT_r^- = max(ΔT_r - C_r, 0),  and  ΔT_r^+ = ΔT_r + C_r.

We also compute two expansion factors associated with the group, similar to that in Eqn 14.6.
The first is more conservative:

    ξ^c_{s,r} = ΔT_r^- / (ΔT_s + C_r).    (14.9)

The second is likely to be overall the more accurate, but subject to fluctuations due to limited clock resolution:

    ξ^o_{s,r} = ΔT_r / (ΔT_s + C_r).

We term it ``optimistic'' since it yields expansion factors larger than ξ^c_{s,r}.

If the last packet group we inspected spanned an interval of ΔT'_r, then we perform a heuristic test. If:

    (ΔT_r + C_r) / (ΔT'_r + C_r) > 2,    (14.10)

then this group was spaced out more than twice as much as the previous group, and we skip the group (after assigning ΔT'_r ← ΔT_r), because it is likely to reflect sporadic arrivals. In some cases, this decision will be wrong; in particular, after a compression event such as that shown in Figure 14.9, we will often skip the immediately following packet group. However, this will be the only group we skip after the event, so, unless a trace is riddled with compression, our estimation does not suffer.

We then test whether ξ^o_{s,r} ≥ 0.95 (where use of 0.95 rather than 1 is again an attempt to accommodate the self-clocking effect, per the discussion of Eqn 14.6). If so, we ``accept'' the group, meaning we treat it as providing a reliable estimate. (We will further analyze the accepted estimates, as discussed below.) Let B denote the number of bytes in the group (excluding those in the first packet, as also done in § 14.5). With the ith such estimate (corresponding to the ith acceptable group), we associate six quantities:

1. p_i^f, an index identifying the first packet in the group;
2. p_i^l, an index identifying the last packet in the group;
3. ρ_i = B/ΔT_r, the bandwidth estimate;
4. ρ_i^- = B/ΔT_r^+, the lower bound on the estimate due to the clock resolution C_r;
5. ρ_i^+ = B/ΔT_r^-, the corresponding upper bound; and
6. the expansion factors ξ^c_{s,r} and ξ^o_{s,r} associated with the group.

One exception to the acceptance test above is that, if ξ^o_{s,r} < 0.2, i.e., the data packets were grossly compressed, then we also accept the estimate given by the corresponding group. (So we reject the estimate if 0.2 ≤ ξ^o_{s,r} < 0.95.) The reasoning behind this heuristic is the same as that accompanying the discussion of Eqn 14.8, namely, that data packets can be highly compressed but still reflect the bottleneck bandwidth due to queueing at the bottleneck behind earlier packets transmitted by the sender. Finally, we note that this heuristic does not generally lead to problems accepting estimates based on compressed data that would otherwise be rejected, because the compression needs to be rampant for PBM to erroneously accept it as a bona fide estimate.

Finally, from a computational perspective, we would like to have an upper bound on the maximum extent k for which we do this analysis. The nominal upper bound we use is k = 4. If, however, the bounds on the estimates obtained for k < 4 are unsatisfactorily wide due to limited clock resolution, or if we found a new candidate bottleneck for k = 4, then we continue increasing k until both the bounds become satisfactory and we have not produced any new bottleneck candidates. These issues are discussed in more detail in the next section.
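Pulling the per-extent mechanics together, the sketch below (Python; the names are ours, and it is deliberately simplified: the out-of-order, retransmission, flight-size, and previous-group tests described above are omitted) forms the estimate set for a given extent k.

    def extent_estimates(arrivals, departures, sizes, k, Cr, mss):
        # Returns accepted estimates for extent k, each with the indices of the
        # first/last packets, the point estimate rho, and clock-resolution bounds.
        psi = []
        n = len(arrivals)
        for i in range(n):
            j = i + k - 1
            if j >= n:
                break
            # extend the window so it spans k*MSS bytes (for k > 1) and so
            # that the receiver's clock actually advanced
            while j < n - 1 and ((k > 1 and sum(sizes[i+1:j+1]) < k * mss)
                                 or arrivals[j] - arrivals[i] < Cr):
                j += 1
            dTr = arrivals[j] - arrivals[i]
            dTs = departures[j] - departures[i]
            if dTr <= 0 or dTs + Cr <= 0:
                continue
            B = sum(sizes[i+1:j+1])            # first packet's bytes excluded
            xi_o = dTr / (dTs + Cr)            # "optimistic" expansion factor
            if xi_o >= 0.95 or xi_o < 0.2:     # expanded, or grossly compressed
                psi.append({'first': i, 'last': j,
                            'rho': B / dTr,
                            'rho_lo': B / (dTr + Cr),
                            'rho_hi': B / (dTr - Cr) if dTr > Cr else float('inf')})
        return psi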
14.6.2 Searching for bottleneck bandwidth modes

In this section we discuss how we reduce a set of bottleneck bandwidth estimates into a small set of one or more values. Let Ψ(k) be the set of bottleneck estimates formed using the procedure outlined in the previous section, for an extent of k packets. Let n_k denote the number of estimates, and N the total number of packets that arrived at the receiver. If:

    n_k < max(N/4, 5),

then we reject further analysis of Ψ(k) because it consists of too few estimates. Otherwise, we consider Ψ(k) as comprising a sound set of estimates, and turn to the problem of extracting the best estimate from the set.

Previous bottleneck estimation work has focussed on identifying a single best estimate [Bo93, CC96a]. As discussed at the beginning of § 14.6, we must instead accommodate the possibility of forming multiple estimates. This then rules out the use of the most common robust estimator, the median, since it presupposes unimodality. We instead turn to techniques for identifying modes, i.e., local maxima in the density function of the distribution of the estimates. Using modal techniques gives PBM the ability to distinguish between a number of situations (bottleneck changes, multi-channel links) that previous techniques cannot.

Clustering the estimates

Because modes are properties of density functions, in trying to identify them we run into the usual problem of estimating density from a finite set of samples drawn from an (essentially) continuous distribution. [PFTV86] gives one procedure for doing so, based on passing a size-k window over the sorted samples X_(i) to see where X_(i+k) - X_(i) is smallest, i.e., where the interval [X_(i), X_(i+k)] is most densely populated. There is, however, no a priori choice for k, and different values yield different estimates.

We then devised an algorithm based on a similar principle of conceptually passing a window over the sorted data. Instead of parameterizing the algorithm with a window size k, we use an ``error factor,'' σ, for σ > 1. We then proceed through the sorted data, and, for each X_(i), we search for an l satisfying i ≤ l < n such that:

    X_(l) ≤ σ X_(i) < X_(l+1).

In other words, we look ahead to find two estimates that straddle the value a factor σ larger than X_(i). The first estimate, with index (l), is within a factor σ of X_(i), while the second, (l+1), is beyond it. If there is no such l (which can only happen if X_(n) ≤ σ X_(i)), then we consider X_(n) as the end of the range of the modal peak.

We term C_i = l - i + 1 the cluster size, as it gives us the number of points that lie within a factor of σ of X_(i). If C_i ≤ 3, then we consider the cluster trivial, and disregard it. Otherwise, we take as the cluster's mode its central observation, i.e., X_(i + C_i/2). If this is identical to that of a previously observed cluster, we merge the two clusters.[8] We then continue advancing the window until we have defined m cluster tuples. The final step is to prune out any clusters that overlap with a larger cluster.

We now turn to how to select σ. We decided to regard as consistent any bottleneck estimates that fall within ±20% of the central bottleneck estimate. We found that using smaller error bars (less than ±20%) can lead to PBM finding spurious multiple peaks, while larger ones can wash out true, separate peaks. Consequently, we will accept as falling within the estimate's bounds:

    X_(i) = 0.8 · X_(i + C_i/2),  and  X_(l) = 1.2 · X_(i + C_i/2).

However, σ is expressed in terms of the ratio between X_(l), the high end of the bottleneck estimate's range, and X_(i), the low end. It is easy to show that the above two relationships can hold if σ = 1.5, so that is the value we choose. Note, though, that we do not define the estimate's bounds in terms of ±20%, but as:

    [ min(X_(i), ρ^-), max(X_(l), ρ^+) ],    (14.11)

where ρ^- and ρ^+ are the clock-resolution bounds associated with the cluster's central estimate. In the absence of clock resolution limits, the bounds will often be tighter than ±20%; but in the presence of such limits, they will often be wider.

The final result is Φ(k), a list of disjoint, non-trivial clusters associated with Ψ(k), sorted by descending cluster size, and each with associated error bars given by Eqn 14.11.

[8] This can happen because of repeated observations yielding the same bottleneck estimates, due to clock resolution granularities and constant packet sizes.
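The clustering step can be summarized by the following sketch (Python; the names are illustrative, and the merging of identical modes and the pruning of overlapping clusters are cruder here than in the full procedure).

    def find_clusters(estimates, sigma=1.5, min_size=4):
        # Cluster sorted estimates using the error factor sigma: for each X[i],
        # count the sorted values within a factor of sigma of X[i]; summarize
        # each non-trivial cluster by its central observation (its mode).
        X = sorted(estimates)
        n = len(X)
        clusters = []                       # tuples: (size, mode, lo_index, hi_index)
        for i in range(n):
            l = i
            while l + 1 < n and X[l + 1] <= sigma * X[i]:
                l += 1
            size = l - i + 1
            if size < min_size:
                continue                    # trivial cluster: disregard
            mode = X[i + size // 2]
            if clusters and clusters[-1][1] == mode:
                s, m, lo, hi = clusters[-1] # same mode as before: merge
                clusters[-1] = (max(s, size), m, min(lo, i), max(hi, l))
            else:
                clusters.append((size, mode, i, l))
        # keep only clusters that do not overlap a larger cluster
        clusters.sort(reverse=True)
        kept = []
        for c in clusters:
            if all(c[3] < k[2] or c[2] > k[3] for k in kept):
                kept.append(c)
        return kept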
Reducing the clusters

It is possible that Φ(k) is empty, because Ψ(k) did not contain any non-trivial clusters. This can happen even if n_k is large, if the individual estimates differ sufficiently. In this case, we consider the extent-k analysis as having failed, and proceed to the next extent, or stop if k ≥ 4.

Otherwise, we inspect the estimate reflected by each cluster to determine its suitability, as follows. First, we compute ξ^c(50)_i and ξ^c(95)_i as the 50th and 95th percentiles of the conservative expansion factors ξ^c_i associated with each of the estimates ψ_i within the cluster (per Eqn 14.9). We next examine all of the estimates that fall within the cluster's error bars (nominally, ±20%), to determine the cluster's range: where in the trace we first and last encountered packets leading to estimates consistent with the cluster. When determining the cluster's range, we only consider estimates for which ξ^c_i ≥ min(ξ^c(50)_i, 1.1), to ensure that we base the cluster's range on sound estimates (those derived from definite expansion, if present very often; otherwise, those in the upper 50% of the expansions). Without this filtering, a cluster's range can be artificially inflated due to self-clocking and spurious noise, which in turn can mask a bottleneck change.

We next inspect all of the extent-k estimates derived from packets falling within the cluster's range, to determine φ_i, the proportion of these estimates consistent with the cluster (within the error bars given by Eqn 14.11). φ_i is the cluster's local proportion, and reflects how well it captures the behavior within its associated range. A value of φ_i near 1 indicates that, over its range, the evidence was very consistent for the given bottleneck estimate, while a lower value indicates the evidence for the bottleneck was diluted by the presence of numerous inconsistent measurements.

If φ_i < 0.2, or if k = 2 (i.e., we are looking at packet pair estimates) and φ_i < 0.3, we reject the estimate reflected by the cluster as too feeble. This heuristic prunes out the vast majority of estimates that have made it this far in the process, since most of them are due to spurious noise effects. It keeps, however, those that appear to dominate the region over which we found them. It at first appears that a threshold of 0.2 or 0.3 is considerably too lenient, but in fact it works well in practice, and using a higher threshold runs the risk of failing to detect multi-channel effects, which can split the estimates into two or three different regions. For example, in Figure 14.7 we can readily see that a number of different slopes emerge.

An estimate that has made it this far is promising. The next step is to see whether we have already made essentially the same estimate. We do so by inspecting the previously accepted (``solid'') estimates to see whether the new estimate overlaps. If so, we consolidate the two estimates. The details of the consolidation are numerous and tedious.[9] We will not develop them here, except to note that this is the point where a solid estimate with a large error interval (ρ^- to ρ^+, wide due to limited clock resolution) can have its bounds tightened by a more precise new estimate, and that it is also the point where we determine whether to increase the maximum extent associated with an estimate.
Doing so is important when hunting for multi-channel bottleneck links, as these should exhibit one bandwidth estimate with a maximum extent exactly equal to the number of parallel channels. If we do not consolidate a new estimate with any previous solid ones, then we add it to the set of solid estimates.

[9] And can be gleaned from the tcpanaly source code.

Forming the final estimates

After executing the process outlined in the previous two subsections, we have produced Υ, a set of ``solid'' estimates. It then remains to further analyze Υ to determine whether the estimates indicate the presence of a multi-channel link or a bottleneck change. Note that in the process we may additionally merge some of the estimates; we have not yet constructed the set of ``final'' estimates!

If Υ is empty, then we failed to produce any solid bandwidth estimates. This is rare but occasionally happens, for one of the following reasons:

1. so many packet losses that too few groups arrived at the receiver to form a reliable estimate;
2. so many retransmission events that the connection never opened its congestion window sufficiently to produce a viable stream of packet pairs;
3. such a small receiver window that the connection could never produce a viable stream of packet pairs; or,
4. the trace of the connection was so truncated that it did not include enough packet arrivals (§ 10.3.4).

In N1, we encountered 37 failures; in N2, only 1, presumably because the bigger windows used in N2 (§ 9.3) gave more opportunity of observing a packet group spaced out by the bottleneck link. Interestingly, no estimation failed on account of too many out-of-order packet deliveries. Even those with 25% of the arrivals occurring out of order provided enough in-order arrivals to form a bottleneck estimate.

Assuming Υ is not empty, then if it includes more than one solid estimate, we compare the different estimates as follows. First, we define the base estimate, ρ^Λ, as the first solid estimate we produced. No other estimate was formed using a smaller extent than ρ^Λ, since we generated estimates in order of increasing extent.

If ρ^Λ was formed using an extent of k = 2, and if Υ includes additional estimates that were only observed for k = 2 (i.e., for higher extents we never found a compatible estimate with which to consolidate them), then we assess whether these estimates are ``weak.'' An estimate is weak if it is low compared to ρ^Λ; the overall proportion of the trace in accordance with the estimate is small; and the estimate's expansions ξ^c(50)_i and ξ^c(95)_i are low. If these all hold, then the estimate fits the profile of a spurious bandwidth peak (due, for example, to the relatively slow pace at which duplicate acks clock out new packets during ``fast recovery,'' per § 9.2.7), and we discard the estimate.

We now can (at last!) proceed to producing a set of final bandwidth estimates. We begin with the base estimate, ρ^Λ. We next inspect the other surviving estimates as follows. For each estimate, we test to see whether its range overlaps any of the final estimates. If so, then we check whether the two estimates might reflect a two-channel bottleneck link, which requires:

1. One of the estimates must have a maximum extent of k = 2 and the other must have a minimum extent of k ≥ 3. Call these E2 and E3.
This requirement splits the estimates into one that reflects the downstream bottleneck, which is only observed for packet pairs (k = 2, since for k > 2 the effect cannot be observed for a two-channel bottleneck), and the other that reflects the true link bandwidth (which can only be observed for k > 2, since k = 2 is obscured by the multi-channel effect).

2. E3 must span at least as much of the trace as E2. It may span more due to phase effects, as illustrated in Figure 14.7.

3. Unless E3 spans almost the entire trace, we require that:

    ξ^c(95)_3 ≥ min( (3/4) ξ^c(95)_2, 2 ).

This requirement assures that E3 was at least occasionally observed for a considerable expansion factor, or, if not, then neither was E2. The goal here is to not be fooled by an E3 that was only generated by self-clocking (i.e., no opportunity to observe a higher bandwidth for an extent k > 2).

4. The bandwidth estimate corresponding to E3 must be at least a factor of 1.5 different from that of E2, to avoid confusing a single very broad peak with two distinct peaks.

If the two estimates meet these requirements, then we classify the trace as exhibiting a multi-channel bottleneck link.

We originally performed the same analysis for (E3, E4), that is, for overlapping estimates, one with extent k = 3 and one with k ≥ 4. A three-channel bottleneck would produce estimates for both. We did not find any traces that plausibly exhibited three-channel bottleneck links, though, and did endure a number of false findings, so we omit three-channel analysis from PBM. If we have the opportunity in the future to obtain traces from paths with known three-channel bottlenecks, then we presume we could devise a refinement to the present methodology that would reliably detect their presence.

If two estimates overlap but fail the above test for a multi-channel bottleneck, and if either has both a higher bandwidth estimate and accords with twice as many measurements as the other, then we discard the weaker estimate and use the stronger in its place. If they overlap but neither dominates, then if one has a minimum extent larger than the other's maximum extent, and larger than k = 3 (to avoid erroneously discarding multi-channel estimates), then we discard it as almost certainly reflecting spurious measurements.

If two estimates overlap and none of the three procedures above resolves the conflict, then PBM reports that it has found conflicting estimates. This never happened when analyzing N1. For N2, we found only 10 instances; 7 involve lbli, which frequently exhibits both a bottleneck change and a multi-channel bottleneck, per Figures 14.4 and 14.5. The other three all exhibit a great deal of delay variation, leading to the conflicting estimates.

If the newly considered estimate does not overlap, then, after some final sanity checks to screen out spurious measurements (which can otherwise remain undetected, if they happen to occur at the very beginning or end of the trace, and thus do not overlap with the main estimate), we add it to the collection of final estimates. At this point, we conclude that the trace exhibits a bottleneck change.

Completing the above steps results in one or more final estimates. For each final estimate ρ_B, we then associate bounds, ρ_B^- and ρ_B^+, per Eqn 14.11; these may be limited by the ±20% consistency range or by the receiver's clock resolution, and in the latter case we term the estimate clock-limited.
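For concreteness, the two-channel test embodied in requirements 1--4 above can be sketched as follows (Python; the field names are hypothetical, and the ``almost the entire trace'' threshold of 90% is our own stand-in for a value not stated above).

    def looks_like_two_channel(E2, E3, trace_span):
        # E2/E3: overlapping solid estimates, as dicts with (hypothetical) fields
        # 'max_extent', 'min_extent', 'span' (seconds of trace covered),
        # 'xi_c95' (95th-percentile conservative expansion factor), and 'rho'.
        if not (E2['max_extent'] == 2 and E3['min_extent'] >= 3):
            return False                              # requirement 1
        if E3['span'] < E2['span']:
            return False                              # requirement 2
        if E3['span'] < 0.9 * trace_span:             # "almost the entire trace" (assumed 90%)
            if E3['xi_c95'] < min(0.75 * E2['xi_c95'], 2.0):
                return False                          # requirement 3
        ratio = max(E2['rho'], E3['rho']) / min(E2['rho'], E3['rho'])
        return ratio >= 1.5                           # requirement 4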
14.7 Analysis of bottleneck bandwidths in the Internet

We applied the bottleneck estimation algorithms developed in § 14.5 and § 14.6 to the trace pairs in N1 and N2 for which the clock analysis described in Chapter 12 did not uncover any uncorrectable problems. These comprised a total of 2,240 and 15,335 trace pairs, respectively. Table XVIII summarizes the types of results we obtained.

Table XVIII: Types of results of bottleneck estimation for N1 and N2

    Results of estimation     N1 (#)   N1 (%)   N2 (#)   N2 (%)
    Single bottleneck          2,018      90%   14,483      94%
    Estimate failure              37     1.7%        1      ---
    Broken estimate               46     2.1%       72     0.5%
    Ambiguous estimate:          139     6.2%      779     5.1%
      change                      94     4.2%      594     3.9%
      multi-channel               74     3.3%      506     3.3%
      conflicting                  0     0.0%       11    0.07%
    Total trace pairs          2,240     100%   15,335     100%

``Single bottleneck'' refers to traces for which we found solid evidence for a single, well-defined bottleneck bandwidth. An ``estimate failure'' occurs when PBM is unable to find any persuasive estimate peaks (§ 14.6.2). ``Broken estimate'' summarizes traces for which PBM yielded a single uncontested estimate, but subsequent queueing analysis found counter-evidence indicating the estimate was inaccurate. (We describe this self-consistency test in § 16.2.6.) ``Ambiguous estimate'' means that the trace pair did not exhibit a single, well-defined bottleneck: it included either evidence of a bottleneck change, or a multi-channel bottleneck link, or both; or it had conflicting estimates, already discussed in § 14.6.2.

The ambiguous estimates were clearly dominated by lbli, no doubt because its ISDN link routinely exhibited both bottleneck changes and multi-channel effects (since when it activates the second ISDN channel, the bandwidth doubles and a parallel path arises). Table XIX summarizes the types of results after removing all trace pairs with lbli as sender or receiver.

Table XIX: Types of results after eliminating trace pairs with lbli

    Results of estimation     N1 (#)   N1 (%)   N2 (#)   N2 (%)
    Single bottleneck          1,929      95%   14,134      98%
    Estimate failure              37     1.8%        1      ---
    Broken estimate               19     0.9%       61     0.4%
    Ambiguous estimate:           48     2.3%      204     1.4%
      change                       7    0.34%       67    0.47%
      multi-channel               41     2.0%      135     0.9%
      conflicting                  0     0.0%        3    0.02%
    Total trace pairs          2,033     100%   14,400     100%

We see that PBM almost always finds a single bottleneck. The results also exhibit a general trend between N1 and N2 towards fewer problematic estimates. We suspect the difference is due to two effects: the lower prevalence of out-of-order delivery in N2 compared to N1, and the use of bigger windows in N2 (§ 9.3), which provides more opportunity for generating tightly-spaced packet pairs and packet bunches.

In the remainder of this section, we analyze each of the different types of estimated bottlenecks.

14.7.1 Single bottlenecks

Far and away the most common result of applying PBM to our traces was that we obtained a single estimated bottleneck bandwidth. Unlike [CC96a], we do not a priori know the bottleneck bandwidths for many of the paths in our study. We thus must fall back on self-consistency checks in order to gauge the accuracy of PBM.

Figures 14.10 and 14.11 show histograms of the estimates formed for N1 and N2, where the histogram binning is done using the logarithms of the estimates, so the ratio of the sizes of adjacent bins remains constant through the plot. There are a number of readily apparent peaks. In N1, we find the strongest at about 170 Kbyte/sec, and another strong one at 6.5 Kbyte/sec.
Secondary peaks occur at about 100, 330, 80, and 50 Kbyte/sec, with lesser peaks at 30 Kbyte/sec, 500 Kbyte/sec, and at a bit over 1 Mbyte/sec. The pattern in N2 is a bit different. The 170 Kbyte/sec peak clearly dominates, and the 6.5 Kbyte/sec peak has shifted over to about 7.5 Kbyte/sec. The peaks between 50 and 100 Kbyte/sec are no longer much apparent, and the 330 Kbyte/sec peak has diminished while the 30, 500 and 1 Mbyte/sec peaks have grown. Finally, a new, somewhat broad peak has emerged at 13--14 Kbyte/sec.

[Figure 14.10: Histogram of different single-bottleneck estimates for N1. Peaks annotated with candidate link rates: 56, 64, 128, and 256 Kbps; the 10 msec clock artifact; .5 T1; .5 E1; T1; E1; 2T1; 3T1; Ethernet.]

[Figure 14.11: Histogram of different single-bottleneck estimates for N2. Peaks annotated with: 64, 128, and 256 Kbps; .5 E1; T1; E1; 2T1; 3T1; Ethernet.]

We calibrate these peaks using a combination of external knowledge about popular link speeds, and by inspecting which sites tend to predominate for a given peak. Several common slower link speeds are 56, 64, 128, and 256 Kbit/sec. Common faster links are 1.544 Mbit/sec (``T1''---primarily used in North America), 2.048 Mbit/sec (``E1''---used outside North America), and 10 Mbit/sec (Ethernet). Certainly faster links are in use in the Internet, but we omit discussion of them since none of the bottlenecks in our study exceeded 10 Mbit/sec; we note, however, that it is the use of faster wide-area links that enables a local-area limit such as Ethernet to wind up as a connection's bottleneck.

The link speeds discussed above reflect the raw capacity of the links. Not all of this capacity is available to carry user data. Often a portion of the capacity is permanently set aside for framing and signaling. Furthermore, transmitting a packet of user data using TCP requires encapsulating the data in link-layer, IP, and TCP headers. The size of the link-layer header varies with the link technology. The IP and TCP headers nominally require at least 40 bytes, more if IP or TCP options are used. Use of IP options for TCP connections is rare, and none of the connections in our study did so. TCP options are common, especially in the initial SYN packets. Thus, we might take 40 bytes as a solid lower bound on the TCP/IP header overhead. An exception, however, is links utilizing header compression (§ 13.3), which, depending on the homogeneity of the traffic traversing the link, can greatly reduce the bytes required to transmit the headers. Header compression works by leveraging off of the large degree of redundancy between the headers of a connection's successive packets. For example, under optimal conditions, CSLIP compression can reduce the 40 bytes to 5 bytes. Finally, some links use data compression techniques to reduce the number of bytes required to transmit the user data. We presume these techniques did not affect the connections in our study because NPD sends a pseudo-random sequence of bytes (to avoid just this effect).

Given these sundry considerations, we do not hope to nail down a single figure for each link technology reflecting the user data rate it delivers.
Instead, we make ``ballpark'' estimates, as follows. For high-speed links, the framing and signaling overhead consumes about 4.5% of the raw bandwidth [Ta96]. We compromise on the issues of header compression versus additional bytes required for link-layer headers and TCP options by assuming 40 bytes of overhead for each TCP/IP packet. Finally, we assume that a ``typical'' data packet carries 512 bytes of user data. This is the most commonly observed value in our traces, though certainly not the only one. Given these assumptions, the user data rate available from a link with a raw rate of ρ_R is:

    ρ_U ≈ (0.955) · (512 / (512 + 40)) · ρ_R ≈ 0.886 ρ_R.

Table XX summarizes the corresponding estimated user-data rates for the common raw link rates discussed above.

Table XX: Raw and user-data rates of different common links

    Raw rate (ρ_R)      User data rate (ρ_U)     Notes
    56 Kbit/sec         ≈ 6.2 Kbyte/sec
    64 Kbit/sec         ≈ 7.1 Kbyte/sec
    128 Kbit/sec        ≈ 14.2 Kbyte/sec
    256 Kbit/sec        ≈ 28.4 Kbyte/sec
    1.544 Mbit/sec      ≈ 171 Kbyte/sec          T1
    2.048 Mbit/sec      ≈ 227 Kbyte/sec          E1
    10 Mbit/sec         ≈ 1.1 Mbyte/sec          Ethernet

From the table, it is clear that the strong 170 Kbyte/sec peak in Figure 14.10 and Figure 14.11 reflects T1 bottlenecks. Likewise, the 6.5 Kbyte/sec peak reflects 56 Kbit/sec links, and may be slightly higher than the estimate in the table due to the likely use of header compression. Its shift to 7.5 Kbyte/sec reflects upgrading of 56 Kbit/sec links to 64 Kbit/sec. The 13--14 Kbyte/sec peak reflects 128 Kbit/sec links and the 30 Kbyte/sec peak, 256 Kbit/sec. The 1 Mbyte/sec peaks are clearly due to Ethernet bottlenecks.

These identifications still leave us with some unexplained peaks from the bottleneck estimates. We speculate that the 330 Kbyte/sec peak reflects use of two T1 circuits in parallel, 500 Kbyte/sec reflects three T1 circuits (not half an Ethernet, since there is no easy way to subdivide an Ethernet's bandwidth), and 80 Kbyte/sec comes from use of half of a T1. We then have only two unexplained peaks remaining: 50 and 100 Kbyte/sec.

The 50 Kbyte/sec peak is only prominent in N1. We find that this peak in fact reflects vagueness due to limited clock resolution: in § 14.4.2 we showed that, for packet pair, the fastest bandwidth a 10 msec clock can yield for 512 byte packets is 51.2 Kbyte/sec. Thus, the 50 Kbyte/sec peak is a measurement artifact, though it also indicates the presence of connections for which PBM was unable to tighten its bottleneck estimate using higher extents (which would reduce uncertainties due to clock resolution), presumably because the connection rarely had more than two packets delivered to the receiver at the bottleneck rate, due to extensive queueing noise.

The 100 Kbyte/sec peak, on the other hand, most likely is due to splitting a single E1 circuit in half. Indeed, we find non-North American sites predominating among these connections, as well as exhibiting peaks at 200--220 Kbyte/sec, above the T1 rate and just a bit below E1. This peak is, however, absent from North American connections. (See also Figure 14.12 and accompanying discussion, below.)

In summary, we believe we can offer plausible explanations for all of the peaks. Passing this self-consistency test in turn argues that PBM is indeed detecting true bottleneck bandwidths.
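The ballpark user-data-rate calculation behind Table XX is simple enough to express directly; the sketch below (Python) encodes the 4.5% overhead, 40-byte headers, and 512-byte payloads assumed above, and nothing more.

    def user_data_rate(raw_bps, overhead=0.045, header=40, payload=512):
        # Convert a raw link rate (bits/sec) to an approximate user data rate
        # (bytes/sec), per the assumptions of Sec. 14.7.1.
        return (1 - overhead) * (payload / (payload + header)) * raw_bps / 8

    for name, bps in [('56 Kbit/sec', 56e3), ('64 Kbit/sec', 64e3),
                      ('128 Kbit/sec', 128e3), ('T1', 1.544e6),
                      ('E1', 2.048e6), ('Ethernet', 10e6)]:
        print(f'{name:12s} ~ {user_data_rate(bps)/1e3:7.1f} Kbyte/sec')

For a T1 link this yields about 171 Kbyte/sec, matching the entry in Table XX and the dominant peak in Figures 14.10 and 14.11.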
We next turn to variation in bottleneck rates. We would expect to observe strong site-specific variations in bottleneck rates, since some of the limits arise directly from the speed of the link connecting the site to the rest of the Internet. Figure 14.12 clearly shows this effect. The figure shows a ``box plot'' of log10 of the bottleneck estimates for each of the N2 receiving sites. In these plots, we draw a box spanning the inner two quartiles (that is, from 25% to 75%). A dot shows the median and the ``whiskers'' extend out to 1.5 times the inter-quartile range. The plot shows any values beyond the whiskers as individual points. The horizontal line marks 171 Kbyte/sec, the popular T1 user data rate (Table XX).

[Figure 14.12: Box plots of bottleneck estimates (log10 Kbyte/sec) for the different N2 receiving sites.]

The plot clearly shows considerable site-to-site variation. While all sites reflect some 64 and 128 Kbit/sec bottlenecks, we quickly see that austr2 has virtually only 128 Kbit/sec bottlenecks, indicating it almost certainly uses a link with that rate for its Internet connection. (austr, on the other hand, has at least E1 connectivity.) lbli generally does not have a single bottleneck above 64 Kbit/sec (it often has a bottleneck change that includes 128 Kbit/sec, but in this section we only consider traces exhibiting a single, unchanged bottleneck). The lbli estimates tend to be quite sharply defined. Of those larger than 7 Kbyte/sec, 96% lay within a 30 byte/sec range centered about 7,791 byte/sec. The other site with a narrow bottleneck bandwidth region is oce, which has a 64 Kbit/sec link to the Internet, as clearly evidenced by the plot, except for a cluster of outliers at 17 Kbyte/sec. All of the outliers were localized to a 1 day period, perhaps a time when oce momentarily enjoyed faster connectivity.

In the main, the plot exhibits a large number of sites with median bottlenecks at the T1 rate. A few have slightly higher median bottlenecks, and these tend to be non-North American sites, consistent with E1 links. Two sites have occasional values just below log10 = 1.5, corresponding to 256 Kbit/sec links. These sites are ucl and ukc, both located in Britain, so we suspect these bottlenecks reflect a British circuit or set of circuits. Some sites also exhibit a fair number of bottlenecks exceeding 1 Mbyte/sec: bnl, lbl, mid, near, panix, and wustl (as well as, more rarely, a number of others), indicating these all enjoyed Ethernet-limited Internet connectivity.

We next investigate the stability of bottleneck bandwidth over time. We confine this investigation to N2, since it includes many more connections between the same sender/receiver pairs, spaced over a large range of time. We begin by constructing for each sender/receiver pair two sequences, ΔT_{s,r} and R_{s,r}, giving the difference in time between the beginning of successive connections from the sender to the receiver, and the ratio of the estimated bottleneck rate for the second of the connections to that of the first.

As noted in § 9.3, we varied the mean time between successive connections between sender/receiver pairs, and, in addition, our methodology would sometimes include ``revisiting'' a pair at a later date. Accordingly, ΔT_{s,r} exhibits considerable range: its median is 8 minutes, its 90th percentile is 104 minutes, but its mean is about 7 hours, due to revisiting.
1, depending on whether the second of a pair of estimates was lower or higher than the first. What is more relevant is the ``magnitude'' of the ratio between successive estimates, which we define as: jRj s;r j exp[j log R s;r j]; that is, the ratio of the larger of the two estimates to the smaller. The median of jRj s;r is 1.0175, indicating that 50% of the successive estimates differ by less than 1.75% from the previous estimate. We find that 80% of the successive estimates differ by less than 10%, and 98% differ by less than a factor of two. We consider two different assessments of the stability of the bottleneck rate over time. First, we examine the correlation between jRj s;r and \DeltaT s;r . If bottlenecks fluctuate significantly over time, then we would expect the magnitude of the ratio to correlate with the time separating the connections. If fluctuations are mainly due to measurement imprecision, then the two should be uncorrelated. For \DeltaT s;r ! 1 hour (85% of the successive measurements), we find very slight positive correlation between jRj s;r and \DeltaT s;r , with a coefficient of correlation equal to 0.03. We obtain a coefficient of about this size regardless of whether we first apply logarithmic transformations to either or both of jRj s;r and \DeltaT s;r in an attempt to curb the influence of outliers. For \DeltaT s;r – 1 hour, the coefficient of correlation rises to about 0.09. This is still not strong positive correlation, and indicates that bottleneck bandwidth is quite stable over periods of time ranging from minutes to days (the mean of \DeltaT s;r , conditioned on it exceeding 1 hour, is 52 hours). We can also assess stability in terms of the time required to observe a significant change. To do so, for each sender/receiver pair we take the first bottleneck estimate as a ``base measurement'' 282 and then look to see when we find two consecutive later estimates that both differ from the base measurement by more than 20%, and that both agree in terms of the direction of the change (20% larger or smaller). We look for consecutive estimates to weed out spurious changes due to isolated measurement errors. We find that only about a fifth of the sender/receiver pairs ever exhibited a shift of this magnitude. Furthermore, the amount of time between the first measurement and the first of the pair constituting the shift has a striking distribution, shown in Figure 14.13. The distribution appears almost uniform, except that the x­axis is logarithmically scaled, indicating that shifts in bottleneck bandwidth occur over a wide range of time scales. This finding qualitatively matches that in Chapter 7 that the time over which different routes persist varies over a wide range of scales. We would expect general agreement since one obvious mechanism for a shift in bottleneck bandwidth is a routing change, though some routing changes will not alter the bottleneck. The last property of bottleneck bandwidth we study in this section is its symmetry: how often is the bottleneck from host A to host B the same as that from B to A? We know from Chapter 8 that Internet routes often exhibit major routing asymmetries, with the route from A to B differing from the reverse of B to A by at least one city about 50% of the time in N 2 . 
The last property of bottleneck bandwidth we study in this section is its symmetry: how often is the bottleneck from host A to host B the same as that from B to A? We know from Chapter 8 that Internet routes often exhibit major routing asymmetries, with the route from A to B differing from the reverse of B to A by at least one city about 50% of the time in N2. It is quite possible that these asymmetries will also lead to bottleneck asymmetries, an important consideration because sender-based ``echo'' bottleneck measurement techniques such as those explored in [Bo93, CC96a] will observe the minimum bottleneck of the two directions.

Figure 14.14 shows a scatter plot of the median bottleneck rate estimated in the two directions for the hosts in our study. The plot uses logarithmic scaling on both axes to accommodate the wide range of bottleneck rates. For each pair of hosts A and B for which we had successful measurements in both directions, we plot a point corresponding to A's median estimate on the x-axis, and B's median estimate on the y-axis. The solid diagonal line has slope one and offset zero. Points falling on it have equal estimates in the two directions. The dashed diagonal lines mark the extent of estimates 20% above or below the solid line.

[Figure 14.14: Symmetry of median bottleneck rate (median bottleneck rate A → B versus B → A, Kbyte/sec).]

About 45% of the points fall within ±5% of equality, and 80% within ±20% (i.e., within the dashed lines). But about 20% of the estimates differ by considerably more. For example, some paths are T1 limited in one direction but Ethernet limited in the other, a major difference. Of the considerably different estimates, the median ratio between the two estimates is 40% and the mean is 65%. In light of these variations, we see that sender-based bottleneck measurement provides a good rough estimate, but will sometimes yield quite inaccurate results.

14.7.2 Bottleneck changes

We now turn to analyzing how frequently the bottleneck bandwidth changes during a single TCP connection. From the results in the previous section, we expect such changes to occur only rarely, and indeed this is the case. If we disregard lbli, which, as noted in § 14.4.3, frequently exhibits a bottleneck change due to the activation of its second ISDN channel, then, as shown in Table XIX, only about 1 connection in 250 (0.4%) exhibited a bottleneck change. The changes are all large, by definition (since we merge bottleneck estimates with minor differences), with the median ratio between the two bottlenecks in the range 3--6.

Figure 14.15 illustrates one of the smaller changes. At about T = 2.3, the bottleneck decreases from an estimated 168 Kbyte/sec to an estimated 99 Kbyte/sec. The effect here is not self-clocking, as the one-way delays of the packets show a considerable increase at T = 2.3 as well. Contrast this behavior with that at about T = 2.1, where we see a momentary decrease. In this case, the slow-down is not accompanied by an increase in transit time, and is instead a self-clocking ``echo'' of the slow-down at T = 1.9.

[Figure 14.15: Sequence plot reflecting halving of bottleneck rate.]

Since 99 Kbyte/sec is not a particularly compelling link rate vis-à-vis Table XX, we might consider that the bottleneck rate did not in fact change, but instead at T = 2.3 a constant-rate source of competing traffic began arriving at the bottleneck link, diluting the bandwidth available to our connection and hence widening the spacing between arriving data packets. This may well be the case.
We note, however, that effectively this situation is the same as a change in the bottleneck rate: if the additional traffic is indeed constant rate, and not adaptive to the presence of our traffic, then we might as well have suffered a reduction in the basic bottleneck link rate, since that is exactly the effect our connection will experience. So we argue that, in this case, we want to regard the change as due to a bottleneck shift, rather than due to congestion.

A few of the bottleneck ``changes'' appear spurious, however. These apparently stem from connections with sufficient delay noise to completely wash out the true bottleneck spacing, and which coincidentally produce a common set of packet spacings that lead to a false bottleneck peak. Most changes, however, appear genuine. In both datasets, about 15% of the changes differ by about a factor of two, suggesting that a link had been split or two sub-links merged following a failure or repair at the physical layer.

14.7.3 Multi-channel bottlenecks

The final type of bottleneck we analyze are those exhibiting the multi-channel effect discussed in § 14.4.4. As shown in Table XIX, except for connections involving lbli, known to have a 2-channel bottleneck link, we found few multi-channel bottlenecks. However, after excluding lbli, we still found a tendency for a few sites to predominate among those exhibiting multi-channel bottlenecks: inria, ukc, and ustutt, in both datasets, and wustl in N_1. The presence of this last site in the list is not surprising, since we know that due to route ``flutter'' many of its connections used two very different paths to each remote site (§ 6.6). However, we cannot confidently claim that any of the non-lbli purported multi-channel bottlenecks are in fact due to multi-channel links, since we find that very often the trace in question is plagued with delay noise, and lacks the compelling pattern shown in Figure 14.6. The ratios between the nominal bandwidths of extent k = 2 and k ≥ 3 bunches also generally tend to be < 2, which from our experience often instead indicates excessive measurement noise smearing out the bottleneck signature.

[Figure 14.16: Excerpt from a trace exhibiting a false ``multi-channel'' bottleneck. Axes: sequence number versus time.]

Even when the measurements appear quite clean, we must exercise caution. Figure 14.16 shows a portion of an N_1 trace from ukc to ucl with a pattern very similar to that in Figure 14.6. Most of the trace looks exactly like the pattern shown. PBM analyzes this trace as exhibiting a multi-channel bottleneck with an upper rate of 477 Kbyte/sec and a slower rate of 18 Kbyte/sec. However, detailed analysis of the trace reveals a few packet bunches with k ≥ 3 that arrived spaced at 477 Kbyte/sec, evidence that either the bunches were compressed (§ 16.3.2) subsequent to the multi-channel bottleneck, or the bottleneck is in fact not multi-channel. Further analysis reveals that the sending TCP was limited by a sender window (§ 11.3.2), and that the ack-every-other policy used by the receiver led to almost perfect self-clocking of flights of two packets arriving at the true bottleneck rate, followed by a self-clocking lull, followed by another flight of two, and so on. While PBM includes heuristics based on $e_{s,r}$ (Eqn 14.6) that attempt to discard traces like these as multi-channel candidates, this one passed the heuristic due to some unfortuitous packet bunch expansion early in the trace.
Had the sending TCP not been window-limited, it would have continued expanding the window as the self-clocking set in, leading to numerous flights of k ≥ 3 packets all arriving at the faster link rate, and PBM would have determined that in fact the link was not multi-channel.

In summary, we are not able to make quantitative statements about multi-channel bottlenecks in the Internet, except that in any case they are quite rare; that at least one link technology (ISDN) definitely exhibits them; and that some sites exhibit either true such links, or at least noise patterns resembling the multi-channel signature.

14.7.4 Estimation errors due to TCP behavior

In the previous section, we noted how TCP ``self-clocking'' can lead to a packet arrival pattern that matches quite closely that expected for a multi-channel bottleneck link, even though the bottleneck link is not in fact multi-channel. In this section we briefly illustrate another form of TCP behavior that can lead to false bottleneck estimates.

[Figure 14.17: Self-clocking TCP ``fast recovery.'' Axes: sequence number versus time.]

Figure 14.17 shows a sequence plot of a connection clearly dominated by an unusually smooth and slow middle period. What has occurred is that a single packet was dropped at about T = 0.7. Enough additional packets were in flight that 4 duplicate acks came back to the sender. The first 3 sufficed to trigger ``fast retransmit'' (§ 9.2.7), and the congestion window was such that the 4th led to the transmission of an additional packet carrying new data via the ``fast recovery'' mechanism (§ 9.2.7). However, the first packet retransmitted via fast retransmit was also dropped, while the fast-recovery packet carrying new data arrived successfully. This meant that the TCP receiver still had a sequence hole reflecting the original lost packet, so it sent another dup ack. The arrival of that duplicate then liberated another packet via fast recovery, and the cycle repeated 50 more times, until the original lost packet was finally retransmitted again, this time due to a timeout. Its retransmission filled the sequence hole and the connection proceeded normally from that point on.

Since the connection had an RTT of about 22 msec and only one fast recovery packet or dup ack was in flight at any given time, during the retransmission epoch the connection transmitted using ``stop-and-go,'' with an effective rate of:

    $\frac{512 \text{ bytes}}{0.022 \text{ sec}} = 23 \text{ Kbyte/sec}.$

PBM finds this peak rather than the true bottleneck of 1 Mbyte/sec, because the true bottleneck is obscured by the receiver's 1 msec clock resolution.

The TCP dynamics shown in the figure are quite striking. We note, however, that use of the SACK selective acknowledgement option [MMFR96], now in the TCP standardization pipeline, will give the sender enough information to avoid situations like this one. We also note that, while this sort of TCP behavior is not exceptionally rare, this was the only such trace that we know PBM to have misanalyzed.

14.8 Efficacy of other estimation techniques

We finish with a look at how other, simpler bottleneck estimation techniques perform compared to PBM. Since PBM is quite complex, it would be useful to know if we can use a simpler method to get comparably sound results. In this context, the development of PBM serves as a way to calibrate the other methods. We confine our analysis to those traces for which PBM found a single bottleneck, as the other techniques all assume such a situation to begin with.
We further associate error bars with each PBM estimate. These either span the range of ``consistent'' estimates we found, where estimates are considered consistent if they lie within ±20% of the main PBM estimate (§ 14.6.2); or, if larger, the error bars reflect the inherent uncertainty in the PBM estimate due to limited clock resolution (§ 14.4.2). If another technique produces an estimate lying within the error bars, then we consider it as performing as well as PBM, and otherwise not.

14.8.1 Efficacy of PR

In this section we evaluate the ``conservative'' and ``optimistic'' peak-rate (PR) estimators developed in § 14.5. These estimators were developed primarily as calibration checks for PBM, and we noted in their discussion that they will tend to underestimate the true bottleneck rate. Still, since they are simple to compute, it behooves us to evaluate their efficacy. We only evaluate them for N_2, since they rely on the sending TCP enjoying a large enough window that it could ``fill the pipe'' and send at a rate equal to or exceeding the bottleneck rate (§ 9.3).

As we might expect, we find that the conservative estimate $\widehat{PR}_c$ given by Eqn 14.7 often underestimates the bottleneck: 60% of the time in N_2, $\widehat{PR}_c$ was below the lower bound given by PBM; 39% of the time, it was in agreement; and 2% of the time it exceeded the upper bound, due to packet compression effects (§ 16.3). Unfortunately, the more optimistic estimate $\widehat{PR}_o$ given by Eqn 14.8 fares only slightly better, underestimating 43% of the time, agreeing 52% of the time, and overestimating 5% of the time. We conclude that neither peak-rate estimator is trustworthy: they both often underestimate, because connections fail to fill the pipe due to congestion levels high enough to preclude an RTT's worth of access to the full link bandwidth.

14.8.2 Efficacy of RBPP

Receiver-based packet pair (§ 14.3) is equivalent to PBM with the extent limited to k = 2. (That is, it uses PBM's clustering algorithm to pick the best k = 2 estimate.) Consequently, we would expect it to do quite well in terms of agreeing with PBM, with disagreement potentially arising only due to: clock resolution limitations for k = 2 (§ 14.4.2); delay noise on very short time scales, such that pairs of packets are perturbed and do not yield a clear bandwidth estimate peak, but larger extents do; and multi-channel bottlenecks (not further evaluated in this section), one of the main motivations for PBM in the first place.

We find the RBPP estimate is almost always within ±20% of PBM's, disagreeing by more only 2--3% of the time in N_1 and N_2. The two estimates are identical about 80% of the time, indicating that PBM was usually unable to further hone RBPP's estimate by considering larger extents. Thus, if (1) PBM's general clustering and filtering algorithms are applied to packet pair, (2) we do packet pair estimation at the receiver, (3) the receiver benefits from sender timing information, so it can reliably detect out-of-order delivery and lack of bottleneck ``expansion,'' and (4) we are not concerned with multi-channel effects, then packet pair is a viable and relatively simple means of estimating the bottleneck bandwidth.

14.8.3 Efficacy of SBPP

We finish with an evaluation of one form of sender-based packet pair (SBPP). SBPP is of considerable interest because a sender can use it without any cooperation from the receiver. This property makes SBPP greatly appealing for use by TCP in the Internet, because it works with only partial deployment.
That is, SBPP can enhance a TCP implementation's decision-making for every transfer it makes, even if the receiver is an old, unmodified TCP. We expect SBPP to have difficulties, though, due to noise induced by networking delays experienced by the acks, as well as variations in the TCP receiver's response delays in generating the acks themselves (§ 11.6.4).

The bottleneck bandwidth estimators previously studied are both sender-based [Bo93, CC96a]. They differ from how sender-based TCP packet pair would work in that those schemes use ``echo'' packets. As noted in the discussion of Figure 14.14, Internet paths do not always have symmetric bottlenecks in the forward and reverse directions. Consequently, echo-based techniques will sometimes perforce give erroneous answers for the forward path's bottleneck rate. For TCP's use, however, the ``echo'' is the acknowledgement of the data packet. Except for connections sending data in both directions simultaneously, which are rare, these echoes are therefore returned in quite small ack packets. Consequently, bottleneck asymmetry will not in general perturb SBPP for TCP.

Another significant difference is that, for TCP, usually an echo is only generated for every other data packet (§ 11.6.1). Consequently, the interval between each pair of acks arriving at the sender echoes the difference in time between the arrivals of two data packets at the receiver, rather than the arrivals of consecutive data packets. Because of this loss of fine-scaled timing information, TCP SBPP cannot detect the presence of multi-channel links, since doing so requires observing per-packet timing differences. (It will instead see timings corresponding to an extent of k = 4, which, for 2-channel and 3-channel links, is in fact the true bottleneck rate.)

To fairly evaluate SBPP, we assume use by the sender of the following considerations for generating ``good'' bandwidth estimates:

1. The sender always correctly determines how many user data bytes arrived at the receiver between when it sent the two acks.

2. The sender does not consider pairs of acks if the first ack was for all the outstanding data, as such a pair is guaranteed to have a spurious RTT delay between the first and second ack.

3. The sender never bases an estimate on an ack that is for only a single packet's worth of data (MSS), as these often are delayed acks, and the sender lacks sufficient information to remove the timer-induced additional delay.

4. The sender never bases an estimate on an ack that does not acknowledge new data. This prevents the sender from using inaccurate timing information due to packet loss or reordering.

5. The sender keeps track of the sending times for its data packets, so it can determine the sender expansion factor (§ 14.5):

    $e_{s,s} = \frac{\Delta T_a + C_s}{\Delta T_d + C_s},$

where $\Delta T_a$ is the elapsed time between the arrival of successive acks, $\Delta T_d$ is the elapsed time between the departure of the first and last data packet being acknowledged, and $C_s$ is the sender's clock resolution. The sender rejects an estimate if $e_{s,s} < 0.9$. We use 0.9 instead of 1.0 as a ``fudge factor'' to account for self-clocking, which sometimes occurs at exactly the bottleneck rate.

The sender also computes ``acceptable'' estimates, which are those that do not conform to all of the above considerations, but at least conform to the first two. (These estimates will be used if SBPP cannot form enough ``good'' estimates.)
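The considerations above amount to a filtering step applied to each pair of arriving acks. The sketch below is a simplified illustration under our own assumptions (the AckPair record, its field names, and the helper functions are ours, not tcpanaly's); it classifies a pair as ``good,'' ``acceptable,'' or rejected, applying the sender expansion factor test with the 0.9 fudge factor, and then applies the selection rule described in the following paragraph.

    from dataclasses import dataclass
    from typing import List, Optional

    @dataclass
    class AckPair:
        # Illustrative fields; the names are ours.
        bytes_acked: int         # user data bytes newly covered by the 2nd ack
        dt_ack: float            # arrival spacing of the two acks (sec)
        dt_data: float           # departure spacing of first/last data pkt acked (sec)
        first_ack_for_all: bool  # 1st ack covered all outstanding data
        acks_new_data: bool      # 2nd ack acknowledges new data
        single_mss_only: bool    # 2nd ack covers only one MSS (possible delayed ack)

    def classify(pair: AckPair, clock_res: float) -> Optional[str]:
        """Return 'good', 'acceptable', or None (rejected), per the listed
        considerations; 'acceptable' pairs satisfy at least the first two."""
        if pair.dt_ack <= 0 or pair.first_ack_for_all:
            return None                           # considerations 1-2
        if not pair.acks_new_data or pair.single_mss_only:
            return 'acceptable'                   # fails 3 or 4
        expansion = (pair.dt_ack + clock_res) / (pair.dt_data + clock_res)
        if expansion < 0.9:                       # consideration 5, with fudge factor
            return 'acceptable'
        return 'good'

    def estimate(pair: AckPair) -> float:
        """Raw SBPP bandwidth sample: bytes acked over the ack spacing."""
        return pair.bytes_acked / pair.dt_ack

    def sbpp_bottleneck(good: List[float], acceptable: List[float]) -> Optional[float]:
        """95th percentile of >=5 'good' samples, else the median of the
        'acceptable' samples, else None (no sound estimate)."""
        if len(good) >= 5:
            s = sorted(good)
            return s[min(len(s) - 1, int(0.95 * len(s)))]
        if acceptable:
            s = sorted(acceptable)
            return s[len(s) // 2]
        return None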
After collecting ``good'' and ``acceptable'' estimates for the entire trace, we then see whether we managed to collect 5 or more ``good'' estimates. If so, we take their 95th percentile as the bottleneck estimate (allowing for the last 5% to have been corrupted by ack compression, per § 16.3.1). If not, then we take the median of the ``acceptable'' estimates as our best guess.

We find, unfortunately, that SBPP does not work especially well. In both datasets, the SBPP bottleneck estimate lies within ±20% of the PBM estimate only about 60% of the time. About one third of the estimates are too low, reflecting inaccuracies induced by excessive delays incurred by the acks on their return, with the median amount of underestimation being a factor of two (and the mean, more than a factor of four). The remaining 5--6% are overestimates, reflecting frequent ack compression (§ 16.3.1), with an N_1 median overestimation of 60% and a mean of 175%, though in N_2 these dropped to 45% and 75%.

A final interesting phenomenon in N_2 is that, about 2% of the time, SBPP was unable to form any sound estimate. These cases all entailed connections to receivers that generated only one ack for each entire slow-start ``flight'' (§ 11.6.1). Since one of the considerations outlined above requires that the first ack of a pair not be an ack for all outstanding data (to avoid introducing a round-trip time lull that has nothing to do with the bottleneck spacing), if the network does not drop any data packets, then such a receiver will only generate acks for all outstanding data, and so the SBPP algorithm above fails to find any acceptable measurements.

14.8.4 Summary of different bottleneck estimators

In our evaluation of the different bottleneck rate estimators, we found that PBM overall appears quite strong. It produces many bandwidth estimates that accord with known link speeds, and produces few erroneous results, except for a tendency to misdiagnose a multiple-channel bottleneck link in the presence of considerable delay noise.

Using PBM then as our benchmark, we found that the stressful ``peak rate'' (PR) techniques perform poorly, frequently underestimating the bottleneck, as we surmised they probably would when developing them in § 14.5. They did, however, serve as useful calibration tests when developing PBM, since they pointed up traces for which we needed to investigate why PBM produced an estimate less than that of the conservative PR technique, or greater than that of the optimistic PR technique.

We also found that receiver-based packet pair (RBPP) performs virtually identically to PBM, provided that we observe the requirements outlined in § 14.8.2, and are not concerned with detecting multi-channel bottleneck links. Unfortunately, one requirement for RBPP is sender cooperation in timestamping the packets it sends, so the receiver can detect out-of-order delivery and data packet compression. We have not investigated the degree to which these requirements can be eased, but this would be a natural area for future work.

We unfortunately found that sender-based packet pair (SBPP) does not fare nearly as well as RBPP. Even taking care to use only measurements the sender can deduce should be solid, SBPP suffers from ack arrival timings perturbed by queueing delays and ack compression. As a result, it renders accurate results less than two-thirds of the time.
Thus, receiver-based bottleneck measurement appears to hold intrinsic advantages over sender-based measurement, and fairly simple receiver packet pair techniques, with sender cooperation, gain all of the advantages of the more complex PBM, unless we are concerned with detecting multi-channel bottleneck links.

Finally, a particularly interesting question for future work to address is how quickly these techniques can form solid estimates. If we envision a transport connection using an estimate of the bottleneck bandwidth to aid in its transmission decisions, then we would want to form these estimates as early in the connection as possible, particularly since most TCP connections are short-lived and hence have little opportunity to adapt to network conditions they observe [DJCME92, Pa94a].

Chapter 15

Packet Loss

In a packet-switched network that does not provide mechanisms for reserving resources within the network on behalf of a particular packet ``flow,'' loss is inevitable under conditions of load. The Internet is such a network. According to traditional network traffic theory, based on Poisson models that emphasize at most fleeting correlations between packet arrivals, one can generally engineer a packet-switched network to have as low a packet loss rate as desired. Operational experience, however, has been quite contrary and brutal to the Poisson framework [JR86, G90, FL91, DJCME92, PF95], which appears woefully inadequate for accurately predicting actual network behavior. Recent years have seen the rise of self-similar traffic models, in which correlations are extremely long-lived and have a fractal structure, leading to ``burstiness on all time scales'' [LTWW94]. Fractal models predict that packet loss is extremely hard to avoid, due to the great burstiness of network traffic, and, more generally, due to the lack of a single burst time scale that one could then engineer the network to accommodate.

We should note that packet loss is not unequivocally a problem. TCP makes splendid use of packet loss as an implicit signal that the network is under stress and the TCP sender should reduce its sending rate [Ja88]. If the network had immense buffering within it to avoid packet loss, this over-engineering would defeat TCP's congestion signal. Furthermore, such buffering does not guarantee that the network can promise to always deliver useful throughput [Na87]; indeed, things would be worse off, since TCP senders then could not adapt their transmission rates to the limited capacity of the bottleneck link.

In this chapter we look at what our measurements tell us about packet loss in the Internet: how frequently it occurs and with what general patterns (§ 15.1); differences between loss rates of data packets and acks (§ 15.2); the degree to which it occurs in bursts (§ 15.3); the degree to which losses occur at the bottleneck link (§ 15.4); how loss rates evolve over time (§ 15.5); and how well TCP retransmission matches genuine loss (§ 15.6).

15.1 Loss rates

A fundamental issue in measuring packet loss is to avoid confusing measurement drops with genuine losses. Doing so can often be difficult unless the measurement apparatus takes pains to accurately report measurement drops. As we saw in § 10.3.1, some do and some do not.
Here is one of the analysis areas where the effort to ensure that tcpanaly understands the details of the many TCP implementations in our study pays off extremely well. Because we can determine whether traces suffer from measurement drops, we can exclude those that do from our packet loss analysis and avoid what could otherwise be significant inaccuracies. Since, for the most part, measurement drops will be uncorrelated with the presence of true network drops, excluding these tainted traces should not bias our subsequent analysis. An exception would be if the measurement drops are due to large bursts of traffic on the local network overrunning the packet filter's ability to record the burst, and if such bursts were coupled with true loss on the local network. Since our interest lies in loss in the Internet-in-the-large, and not in loss in local networks (even though local loss also contributes to the end-to-end chain), we regard this source of bias as minor.

Our measurements do, however, suffer from one form of bias: due to their limited duration (§ 9.3), we will fail to successfully measure and analyze connections that suffered such high packet loss rates that they required more than 10 minutes to transfer 100 Kbyte. When these measurement attempts reach the 10-minute lifetime without having successfully completed, the entire measurement attempt is aborted, and no trace data is retrieved from the NPDs conducting the measurement. Unfortunately, due to the centralized control of the experiment, we cannot accurately assess how often a measurement failed for this reason, and how often for a different reason, such as a loss of connectivity between npd control and one of the remote NPDs (§ 5.2, § 9.3). Thus, the statistics presented in this section will underestimate Internet packet loss rates somewhat.

We argue, however, that the bias is, overall, fairly small. Figure 15.1 shows the distribution of the connection durations for N_1 (solid line) and N_2 (dotted line). The vertical line on the right-hand side of the plot marks the 10-minute maximum duration. The x-axis is logarithmically scaled, so we see that a large number of the connections in our study completed much sooner than the 10-minute upper lifetime. This in turn suggests that the lifetime was generally not a limitation. At the end of this section, however, we show that it did significantly bias European loss rates towards underestimation.

[Figure 15.1: Connection durations for N_1 (solid) and N_2 (dotted). X-axis: connection duration (sec), log-scaled.]

We begin our analysis by looking at aggregate packet loss over the course of entire connections. In N_1, out of about 714 thousand packets (data and ack) transmitted, 3.0% failed to arrive at the other end. In N_2, for 4.66 million packets, the figure rose to 4.6%, a significant increase that merits further investigation.

One immediate question is whether the use of additional sites in N_2 (and the absence of a few of the N_1 sites) skewed these basic numbers. Indeed it did, but towards underestimating the increase! Of the sites in common, in N_1, 2.7% of the packets were lost, while in N_2 this figure nearly doubled to 5.2%. Conventional wisdom among TCP researchers holds that a loss rate of 5% has a significant adverse effect on TCP performance, because it will greatly limit the size of the congestion window and hence the transfer rate, while 3% is often substantially less serious.
Thus, it behooves us to try to understand the circumstances and details of the increase as much as possible.

First, we need to address the question of whether the increase in loss rate was due to the use of bigger windows in N_2 than in N_1 (§ 9.3). Such could easily be the case, since with larger windows the transfers will often have significantly more data in flight and, consequently, will load the router queues along the path much more. We can assess the impact of larger windows by looking at loss rates of data packets versus those for ack packets. Data packets contribute to queueing, and having more in flight stresses the forward path. On the other hand, the rate at which a TCP transmits data packets adapts to current conditions. Ack packets contribute almost no additional load along the reverse path, other than occupying a buffer when queued, so having more of them in flight at one time should not significantly alter the loss rate they suffer. They do not adapt to current conditions, except during periods of heavy congestion, when an entire window's worth of acks is lost, forcing a timeout retransmission.¹ Thus, to compare changes in loss rates between N_1 and N_2, using the ack loss rates should eliminate the bias caused by the different window sizes. We discuss more issues concerning data packet loss versus ack loss in § 15.2.

¹ The transmission rate of acks can also adapt to current conditions if the loss conditions along both directions of the path are correlated, since the rate at which a TCP transmits acks reflects the rate at which it receives data packets. In § 15.2 below, however, we find that loss rates in the two directions are nearly uncorrelated.

Overall, in N_1, acks were actually slightly more likely (3.16%) to be lost than data packets (2.96%), while in N_2 the ordering is the opposite (4.25% for acks versus 4.75% for data packets). Restricting the comparison to the sites in common, however, changed the discrepancy between data packets and acks, with 2.88% for acks versus 2.65% for data packets in N_1, and 5.14% versus 5.28% for the same N_2 figures. So, even if we restrict ourselves to the ack loss rates for the common sites, which should be quite sound to compare, we observe a 78% increase in the loss rate, from 2.88% to 5.14%.

Another interesting loss rate figure is how the rate changes if we condition on observing at least one loss during the connection. Here we make a tacit assumption that a network path has two basic states: ``quiescent,'' during which connections tend not to suffer any loss, and ``busy,'' during which they tend to suffer loss. The first corresponds to, overall, light or steady enough load that the router buffers suffice to avoid packet loss, and the second to sufficient load, overall, to occasionally overflow the buffers. We would expect to find that ``busy'' states coincide with the usual peak usage times of working hours, and quiescent states with off-peak times. We return to this point below, in the discussion of Figure 15.3 and Figure 15.4.

In N_1, 52% of the connections between the common sites did not lose a single ack packet. However, only 28% of the connections losing at least one ack lost exactly one. For N_2, the corresponding figures are 49% and 20%. We see that part of the change in the higher N_2 ack loss rates stems from greater loss during busy periods. The proportion of quiescent periods remains virtually unchanged.
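The ``quiescent'' versus ``busy'' bookkeeping used here reduces to a simple per-connection tally. The following minimal sketch (our own code and naming; it assumes per-connection totals of acks sent and acks lost are available as input) computes the fraction of quiescent connections and the ack loss rate conditioned on at least one loss.

    def quiescent_and_conditional(acks_sent, acks_lost):
        """acks_sent[i], acks_lost[i]: per-connection totals.
        Returns (fraction of quiescent connections, conditional ack loss rate),
        where the conditional rate aggregates only connections with >= 1 lost ack."""
        assert len(acks_sent) == len(acks_lost)
        quiescent = sum(1 for lost in acks_lost if lost == 0)
        busy_sent = sum(s for s, l in zip(acks_sent, acks_lost) if l > 0)
        busy_lost = sum(l for l in acks_lost if l > 0)
        frac_quiescent = quiescent / len(acks_lost)
        cond_rate = busy_lost / busy_sent if busy_sent else 0.0
        return frac_quiescent, cond_rate

    # Toy example: three connections, one of them quiescent.
    print(quiescent_and_conditional([100, 120, 80], [0, 6, 4]))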
Similarly, for the common sites, if we condition on a connection suffering at least one loss, then the ack loss rate for an N_1 connection climbs from 2.88% to 5.69%, while for N_2 the increase goes from 5.14% to 9.16%. Thus, even in N_1, if the network path was busy (using our simplistic definition above), loss rates were quite high, and for N_2 they shot upward to a level that in general will seriously impede TCP performance.

These increases give us strong evidence that networking conditions in one important respect degraded during the course of 1995, similar to our earlier finding that several aspects of Internet routing degraded during 1995 (§ 6.10, § 8.5). Since bottleneck link rates generally increased during 1995 (§ 14.7.1), we cannot tell from the loss rate statistic alone whether users perceived the network as delivering better or worse service.

A basic measure of perceived level of service is how long it takes to transfer a given amount of data. However, when comparing such durations we need to keep in mind that the use of bigger windows in N_2 gave N_2 connections more opportunity both to ``fill the pipe'' and to utilize fast retransmission (§ 9.2.7), which gives them performance advantages that have little to do with how the network service changed. (For the sites in common, in N_1 the mean number of fast retransmissions was 0.98, while in N_2 it climbed to 1.64.)

Still, we find the comparison illuminating. Figure 15.2 shows the distribution of the durations of connections between sites common to both N_1 (solid line) and N_2 (dotted line). For the sites in common, the median connection duration diminished from 11.8 sec in N_1 to 10.7 sec in N_2, a rather modest improvement. That single figure does not tell the entire story, though, since we see from the figure that the distribution of durations did not unilaterally slide a bit to the left. Instead, N_2 connections were likely to be 20% shorter than those in N_1 if they were short, meaning that we condition on the duration being < 12 sec, and 50% longer if we condition on the duration being > 12 sec. It seems likely that the differences are due to a higher prevalence of fast retransmission in N_2, with the bigger windows more often keeping enough packets in flight to facilitate fast retransmission.²

² Note that, had we not restricted ourselves to the sites common to the two datasets, but instead interpreted Figure 15.1 in this regard, then we would have drawn a considerably different, less sound conclusion.

[Figure 15.2: Connection durations for sites common to N_1 (solid) and N_2 (dotted). X-axis: connection duration (sec), log-scaled.]

So far, we have treated the Internet as a single aggregated network in our loss analysis. Geography, however, plays a crucial role in the prevalence of packet loss. To study geographic effects, we partition the connections between the sites common to N_1 and N_2 into four primary groups: ``European,'' ``North American,'' ``Into Europe,'' and ``Into North America.'' European connections are those with both a European sender and a European receiver. North American connections have both sender and receiver in Canada or the United States (but see below). ``Into Europe'' are connections with European data senders and North American data receivers. The terminology is backwards here because what we will assess are ack loss rates, and these are generated by the receiver. Hence, ``Into Europe'' loss rates reflect those experienced by packet streams traveling from North America into Europe. Similarly, ``Into North America'' are connections with North American data senders, European data receivers, and ack streams traveling from Europe into North America.
This partition does not include connections to or from Australia, because we had only one Australian site common to both N_1 and N_2, so it would be difficult to gauge the generality of loss rates involving it. We note, however, that it experienced a rise of more than a factor of two in the loss rates of acks traveling into and out of Australia, from 3.3% in N_1 to 7.8% in N_2.

While the above grouping was our original intent, upon examining the data we made one further distinction. The sole Canadian site, umont, was a major outlier for packet loss in N_2, so large that its presence as one of the 13 North American hosts sufficed to significantly skew the overall North American findings. (It was not, however, an outlier in N_1.) Since we had no other Canadian sites in our study, we cannot gauge whether this reflects a problem unique to umont or a more general problem with Canadian Internet service. Consequently, we removed umont from our notion of ``North America'' as described above; so, in fact, all of the North American sites discussed below are in the United States. We also summarize below connections from U.S. sites to or from umont, to illustrate its atypical loss rates.

Table XXI shows the loss rates of ack packets for the different regions. The second and third columns give the number of N_1 and N_2 connections that occurred in the region. There were 6 common European sites and 12 North American sites plus umont. The fourth and fifth columns give the overall loss rate for the ack packets sent during all of the region's connections, and the final column indicates the loss rate change between N_1 and N_2.

    Region               # N_1    # N_2    N_1 loss rate    N_2 loss rate      Δ
    Europe                 104      734        2.8%             2.8%          ...
    North America          ...      ...         ...              ...          ...
    (umont)                 75      562        1.5%             5.8%        +287%
    Into Europe            255    1,243        6.2%            11.7%         +88%
    Into North America     320    1,544        3.5%             3.2%          ...

    Table XXI: Ack loss rates for different connection geographies

Clearly:

• Europe suffered considerably higher packet loss rates than did North America, but the loss rate appears stable. However, we show below that the European figures are biased towards underestimation;

• North American loss rates were fairly low and, while the trend is increasing, it is not doing so at an ominous rate;

• umont suffered a tremendous increase in packet loss rate, although we lack sufficient data to tell if this is a general problem for Canadian networks or specific to the University of Montreal or its local region;

• the trans-Atlantic links carrying European traffic to North America had fairly high loss rates, but the situation is perhaps improving; and

• the links carrying North American traffic to Europe were a compounding disaster.

We note that since Europe's rates are significantly lower than those of trans-Atlantic traffic heading into Europe, it must be the case that most traffic between two European sites stays inside Europe, rather than transiting through North America, even though we sometimes observed such routes in § 6.9.

Table XXII looks at loss rates for the same regions, but now with conditioning on whether any acks were lost.

    Region           N_1 quies.   N_2 quies.   N_1 cond. loss   N_2 cond. loss     Δ
    Europe               48%          58%           5.3%             5.9%        +11%
    North America        66%          69%           3.6%             4.4%        +21%
    (umont)              60%          15%           3.7%             6.8%        +81%
    Into Europe          40%          31%           9.8%            16.9%        +73%
    Into N.A.            35%          52%           4.9%             6.0%        +22%
    All regions          53%          52%           5.6%             8.7%        +54%

    Table XXII: Conditional ack loss rates for different connection geographies
The second and third columns give the proportion of quiescent connections, where ``quiescent'' is defined as above to mean connections that did not lose any acks. We see that, except for umont and the trans-Atlantic links going into North America, the proportion of quiescent connections was fairly stable, suggesting that perhaps changes in loss rate are confined to already-loaded ``busy'' periods of heavy load. We investigate this possibility in more detail shortly. The fourth and fifth columns list the proportion of acks lost, given that at least one ack was lost, and the final column summarizes the relative change.

None of the conditional loss rates is especially heartening, and the trends are all increasing. During N_2, the trans-Atlantic links into Europe were close to unusable during busy periods, with a loss rate of nearly 17%. This matches anecdotal reports, such as requests the author received to mail hardcopies of papers to European researchers since they could not viably retrieve them over the network. In summary, we note that, for every region, loss rates for busy connections increased between N_1 and N_2.

Within regions, we find considerable site-to-site variation in loss rates, as well as variation between loss rates for packets inbound to the site and those outbound (§ 15.2). We did not, however, find any other outliers as dramatic as umont in N_2, so we kept the regions otherwise intact.

The last aggregate loss statistics we look at are variations of loss rate over the course of the day. We expect to find a diurnal cycle, as numerous studies have noted significant hourly variation in connection and packet arrival rates ([PF95] and many others). It was this expectation that led us to postulate that the distinction made above between ``busy'' and ``quiescent'' connections is broadly meaningful.

Figures 15.3 and 15.4 show the hourly loss rates for the N_2 connections internal to North America and Europe, respectively. The North American loss rates, with the x-axis reflecting the hour in the Eastern Standard Time zone, clearly follow the oft-observed pattern of activity increasing over the morning hours and falling off during the late afternoon. [PF95] notes a pickup in evening FTP traffic, which agrees with the secondary peak. One unusual facet of Figure 15.3 is that it does not exhibit a noon-time ``dip.'' However, this is almost certainly due to the North American traffic spanning three time zones, effectively spreading out lunch-related lulls over several hours. The apparent discontinuity between the 23rd hour at the right and the midnight hour at the left, however, is puzzling. We have verified that, as one approaches midnight, the rates come closer together. We do not, though, have an explanation as to why midnight EST would serve as such a sharp transition point, given that it corresponds to 9PM Pacific Standard Time, when presumably we still see considerable user activity.

[Figure 15.3: Hourly variation in ack loss rate for North American connections. X-axis: hour (Eastern Standard Time); y-axis: mean loss rate.]

Figure 15.4 differs considerably from Figure 15.3. Here the x-axis reflects the hour in the Greenwich Mean Time zone. We observe a morning rise in loss rate, but a considerable noontime dip lasting several hours, followed by a striking increase in the late afternoon. Again, the evening hours are elevated compared to the early morning hours, with a sharp transition occurring around midnight. The late afternoon hours may in part also reflect increasing background traffic from North American sites, since late afternoon GMT coincides with noon and early afternoon EST, which we see in Figure 15.3 is the peak North American period.

[Figure 15.4: Hourly variation in ack loss rate for European connections. X-axis: hour (Greenwich Mean Time); y-axis: mean loss rate.]
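The hourly curves in Figures 15.3 and 15.4 correspond to a simple binning computation. The following is a minimal sketch under our own assumptions (the function and variable names are ours; each connection is summarized by its starting hour in the time zone of interest, plus its ack counts, and losses are aggregated within each hourly bin rather than averaging per-connection rates, a choice we make here for simplicity).

    from collections import defaultdict

    def hourly_loss_rates(connections):
        """connections: iterable of (hour_of_day, acks_sent, acks_lost) tuples,
        where hour_of_day is 0..23 in the time zone of interest.
        Returns {hour: ack loss rate}, aggregating over all connections
        that started in that hour."""
        sent = defaultdict(int)
        lost = defaultdict(int)
        for hour, n_sent, n_lost in connections:
            sent[hour] += n_sent
            lost[hour] += n_lost
        return {h: lost[h] / sent[h] for h in sorted(sent) if sent[h] > 0}

    # Toy example: two morning connections and one evening connection.
    print(hourly_loss_rates([(9, 200, 8), (9, 180, 2), (21, 150, 3)]))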
We must exercise caution, however, in interpreting Figure 15.4, due to our measurement bias against very long-lived connections (discussed at the beginning of this section). We can test for the presence of this bias by examining how many successful measurements we made for each hour of the day. Because of our Poisson sampling methodology (§ 9.1), measurement attempts were uniformly distributed over the course of the day.

Figure 15.5 shows a histogram of the number of successful North American measurements made for each distinct hour of the day. The distribution appears fairly even, and, indeed, the measurement times pass the powerful Anderson-Darling A² goodness-of-fit test for uniformity [DS86], using 5% significance (and, indeed, higher significance levels).

[Figure 15.5: Successful North American measurements, per hour. X-axis: hour (Eastern Standard Time); y-axis: number of successful measurements.]

Figure 15.6 shows the same histogram for the European measurements. The bias towards the less busy early morning and late evening hours immediately stands out. The distribution fails A² at all significance levels, as one might expect. The bias is strongest against the 11AM to 1PM periods, and eases somewhat in the later afternoon, so the apparent difference between the two corresponding peaks in Figure 15.4 may be simply due to measurement bias and not reflect a true underlying difference. However, we can certainly conclude, based on Figure 15.6, that our analyses of European loss rates are in general underestimates.

[Figure 15.6: Successful European measurements, per hour. X-axis: hour (Greenwich Mean Time); y-axis: number of successful measurements.]
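The uniformity check can be reproduced directly from the measurement timestamps using the standard Anderson-Darling A² statistic for a fully specified null distribution (here, uniform over the day). The sketch below is our own illustration, not the thesis's test code; the 5% critical value of roughly 2.49 for a fully specified null is taken from standard tables, and the helper name and the endpoint clamp are our additions.

    import math

    def anderson_darling_uniform(times_of_day, period=86400.0):
        """A^2 statistic for the hypothesis that the given times of day
        (seconds since midnight) are uniform over [0, period)."""
        u = sorted(t / period for t in times_of_day)
        n = len(u)
        # Guard against log(0) at the interval endpoints.
        eps = 1e-12
        u = [min(max(x, eps), 1.0 - eps) for x in u]
        s = sum((2 * i + 1) * (math.log(u[i]) + math.log(1.0 - u[n - 1 - i]))
                for i in range(n))
        return -n - s / n

    # Values above roughly 2.49 reject uniformity at the 5% level for a
    # fully specified null distribution.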
15.2 Data packet loss vs. ack loss

We noted in the previous section that analyzing data packet loss rates can be complicated because the size of the data packets and the tendency for them to be sent closely together both add to queueing load along the network path. We expect that this load in turn leads to a greater likelihood of the data packets being lost, though, because TCP can unfairly distribute available bandwidth [FJ92], this is not necessarily the case. We saw in § 15.1 that, in N_1, acks were actually slightly more likely to be lost than data packets, though in N_2 the pattern reverses, which we (at least partially) attribute to the use of bigger windows in N_2 (§ 9.3). In this section we take a closer look at the loss rates of data packets versus those of acks.

We consider any packet carrying one or more bytes of user data as a data packet. We would expect to observe some differences between different-sized data packets. Unfortunately, it would prove difficult to explore this effect with our data. Some of the sites in our study always used a maximum segment size (MSS) of 512 bytes, the common default value, while others used larger sizes whenever the opportunity to do so arose. But the site-specific nature of the MSS used means that, for each site, the samples of data packet loss rates generally reflect only a small number of packet sizes, sometimes only one. Since in § 15.1 we showed that ack loss rates exhibit strong regional variation, we could easily conflate a spurious MSS size effect in data loss rates with a genuine, separate effect due to the regions. Thus, we confine ourselves to a simple definition of ``data packet'' as one carrying any user data whatsoever.

In addition, however, we make a key distinction between ``loaded'' and ``unloaded'' data packets. A ``loaded'' data packet is one that presumably queued at the bottleneck link behind one of the connection's previous packets, while an unloaded data packet is one that we know did not have to queue at the bottleneck behind a predecessor. Here we are abstracting the intricate, multi-element network path to a presumably equivalent model of a single element that forwards at the bottleneck rate, and at which all significant queueing occurs.

To tell if a packet is unloaded, we first form an estimate of the bottleneck bandwidth using the methodology developed in Chapter 14. If the methodology indicates a bottleneck change or the possible presence of a multi-channel bottleneck, then we refrain from further packet-loss analysis. If, however, the methodology produces a single bottleneck estimate, $\rho_B$, as is generally the case, then the methodology also associates lower and upper bounds, $\rho_B^-$ and $\rho_B^+$, with $\rho_B$ (Eqn 14.12). A packet of size b then requires at most

    $\phi_b^+ = b / \rho_B^-$    (15.3)

to transit the bottleneck, where b is the size of the packet; this is also the (maximal) load of the connection's first packet. Subsequent packets have a load

    $\lambda_i^+ = \phi_b^+ + \max\left[\, (T_{i-1}^s + \lambda_{i-1}^+) - T_i^s,\; 0 \,\right]$,    (15.4)

that is, the time to carry packet i itself across the bottleneck link, plus the time required to carry its preceding packets across the bottleneck link, if packet i will arrive at the bottleneck before they have completed transmission. The latter will be the case if

    $T_i^s < T_{i-1}^s + \lambda_{i-1}^+$,    (15.5)

where $T_i^s$ is the time at which the sender transmits packet i, and hence, modulo a fixed propagation delay, the time at which it reaches the bottleneck.

If Eqn 15.5 applies to packet i, then we will say that packet i was ``loaded,'' meaning that it had to wait for pending transmission of earlier packets. Otherwise, we term it ``unloaded.'' The development of the maximal load $\lambda_i^+$ has natural analogs $\lambda_i$ and $\lambda_i^-$, and corresponding definitions of ``loaded'' and ``unloaded,'' depending on whether we use the maximal, central, or minimal definitions for $\lambda_i$.³ In this section, we exercise conservatism and only consider a packet as unloaded per the definition in terms of the maximal $\lambda_i^+$. Presently, our interest in whether a packet is loaded or unloaded comes just from analyzing whether the two types have different loss patterns. In § 16.2.6 we look in more detail at the coupling between $\lambda_i$ and the variation in packet transit times.

³ $\rho_B^+$ is associated with the minimal load $\lambda_i^-$, and $\rho_B^-$ with the maximal load $\lambda_i^+$.

In both N_1 and N_2, about two-thirds of the data packets were loaded. We might at first expect more loaded packets in N_2, due to its use of bigger windows. Window size, however, determines whether the bottleneck link might continuously remain loaded. Even for a relatively small window size, the TCP sender will transmit a number of packets (equal to the window size) over a fairly short amount of time, and all of these but the first will be loaded. Once the entire window is in flight, a lull comes, equal to the mismatch between the small window and the bandwidth-delay product corresponding to the bottleneck rate and the RTT. Then the acks for the window arrive in short order, and self-clocking leads to another flight of all-but-one loaded packets. Thus, window size does not have a great deal of impact on the proportion of loaded packets.
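The loaded/unloaded classification reduces to a short recursion over the packets' sending times. The sketch below is our own illustration of the definitions given above (names are ours), using the conservative, maximal-load convention in which the lower bottleneck bound gives the longest transit time; a packet is counted as loaded exactly when the queueing term in the max is positive.

    def classify_loaded(send_times, sizes, rho_lower):
        """send_times: packet departure times (sec); sizes: bytes;
        rho_lower: lower bound on the bottleneck bandwidth (bytes/sec).
        Returns (loads, loaded_flags) under the maximal-load convention:
        load = own bottleneck transmission time plus any residual load left
        by predecessors still occupying the bottleneck on arrival."""
        loads, loaded = [], []
        prev_finish = None            # time by which the bottleneck clears
        for t, b in zip(send_times, sizes):
            phi = b / rho_lower       # own transmission time at the bottleneck
            queue = max(prev_finish - t, 0.0) if prev_finish is not None else 0.0
            loads.append(phi + queue)
            loaded.append(queue > 0.0)
            prev_finish = t + queue + phi
        return loads, loaded

    # Toy example: 512-byte packets over a 64 KByte/sec bottleneck; the second
    # and third packets are sent soon enough to queue behind their predecessors.
    times = [0.000, 0.002, 0.004, 0.100]
    loads, flags = classify_loaded(times, [512] * 4, 64_000)
    print(flags)   # -> [False, True, True, False]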
Figure 15.7 shows the distributions of loss rates during N_2 for unloaded data packets, loaded data packets, and acks. All three distributions show considerable probability of zero loss.⁴

[Figure 15.7: N_2 loss rates for data packets and acks. Distributions of per-connection packet loss rate (%) for unloaded data packets, loaded data packets, and acks.]

⁴ The distributions also show small jumps just above a loss rate of 0%. These reflect the fact that the loss rate is computed in terms of k packets lost out of a total of n, hence 1/n is the minimum possible positive loss rate. Since, for most connections, n ≈ 200 packets, we observe a minimum possible loss rate of around 0.5%.

From the figure, we immediately see that loaded packets are much more likely to be dropped than unloaded packets, as we would expect. In addition, we see that acks are consistently more likely than unloaded packets to be dropped, but generally less likely to be dropped than loaded packets, except during times of severe loss, above about 14%, which make up the upper 10% of the distributions.

We interpret the difference between ack and data loss rates as reflecting the fact that, while an ack stream presents a much lighter load to the network than a data packet stream (particularly a series of loaded data packets), the ack stream does not adapt to the current network conditions, while the data packet stream does. Thus, unloaded data packets gain the twin benefits of traveling at a time when the connection is not itself significantly contributing to load along the network path, and also lowering their transmission rate during times of congestion. Loaded data packets stress the network path, but at least they adapt, and, during periods of heavy congestion, their adaptive behavior outweighs the advantages of ack streams that otherwise favor acks during periods of lower congestion.

The equivalent set of distributions for N_1 is qualitatively the same, though the distance between the three distributions is narrower. This likely reflects both the overall lower loss rates in N_1 (§ 15.1) and the use of smaller windows limiting loss rates for loaded packets.

It is interesting to note the extremes that packet loss can reach. In N_2, the largest unloaded data packet loss rate we observed was about 47%. For loaded packets it climbed to 65%. As we would expect, these connections suffered egregiously, achieving overall data throughput rates in the low hundreds of bytes per second due to lengthy, backed-off timeout periods. However, they did manage to successfully complete their transfers within their allotted ten minutes, a testimony to TCP's tenacity. For both of these extremes, no acks were lost in the reverse direction! The largest ack loss rate was even higher, 68%. Starved for confirmation of forward progress, this connection also managed only a few hundred bytes per second. Ironically, no data packets were lost in the forward direction!

As indicated by these extreme cases, clearly packet losses on the forward and reverse paths are sometimes completely independent. Indeed, the coefficient of correlation between combined (loaded and unloaded) data packet loss rates and ack loss rates in N_1 was about 0.21, with the correlation for connections within North America falling to 0.13. In N_2, however, the loss rates become uncorrelated (coefficient of 0.02), perhaps due to the greater prevalence of significant routing asymmetry (Chapter 8).

Another form of asymmetry is the degree to which loss correlates with the connection's throughput. We would expect that data packet loss rates correlate more strongly, and negatively, with throughput, since each loss requires a retransmission that subsequently cuts the sender's transmission rate, and perhaps entails a lengthy timeout lull. Ack loss, on the other hand, may go unnoticed, if light, since acks are cumulative, and, if another ack arrives shortly, the connection will not stall for any appreciable amount of time.
To fairly gauge the correlation, we need first to account for the different maximum throughput rates due to the different bottleneck bandwidth rates. We do so by dividing the achieved throughput over the entire connection (total bytes transferred divided by total duration) by the estimated bottleneck bandwidth. We then compute the coefficient of correlation between the logarithm of the normalized throughput and the loss rates of interest, where the logarithmic transformation is used to reduce the otherwise dominating effect of throughput outliers.

For N_2, we find that the correlation with the overall data loss rate is quite large (and negative), and larger still for unloaded data packet loss rates. Presumably this latter effect arises because backed-off timeout retransmissions, which have the greatest deleterious effect on connection throughput, always generate unloaded data packets, and further back-off occurs when these packets are then lost. The corresponding correlation for ack loss rates is also fairly strong. It is not due to any coupling between the ack loss rate and the data packet loss rate, because the two are generally uncorrelated, as shown previously. Instead, the correlation is probably due to the coupling between the ack loss rate and the possibility of losing an entire flight's worth of acks, which then unavoidably leads to a timeout retransmission (§ 15.6). The significant correlation between ack loss rates and normalized connection throughput indicates that, when attempting to predict a connection's throughput along a particular forward path, it pays to have information about conditions along the reverse path, too. For the North American region (as defined in § 15.1), the correlations weaken somewhat.

Further investigating the distributions, one striking feature we find is that the non-zero portion of both the unloaded and loaded data packet loss rate distributions is almost exactly exponential, while that for acks is not nearly so close a match. Figures 15.8, 15.9, and 15.10 show logarithmically scaled complementary distribution plots of the unloaded, loaded, and ack loss rates, conditioned on observing at least one loss. A straight line on such a plot corresponds to an exponential distribution. We have added least-squares fits to each plot. We see that, for both unloaded and loaded data packets, the loss rate distribution is quite close to exponential, but for acks it deviates considerably more. The effect is widespread: it is also present for N_1, and for the North American and European subsets of N_2.

[Figure 15.8: Complementary distribution plot of N_2 unloaded data packet loss rate. X-axis: loss rate (%); y-axis: P(X >= x), log-scaled.]

[Figure 15.9: Complementary distribution plot of N_2 loaded data packet loss rate.]

[Figure 15.10: Complementary distribution plot of N_2 ack loss rate.]
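The exponential comparison in Figures 15.8--15.10 amounts to checking whether log P(X >= x) is close to linear in x. The following is a minimal sketch of that computation (our own helper names; numpy is assumed available; the least-squares line here is a simple fit through the logarithm of the empirical complementary distribution, not necessarily the thesis's exact fitting procedure).

    import numpy as np

    def log_ccdf_fit(loss_rates):
        """loss_rates: per-connection loss rates (e.g., in percent), conditioned
        on being > 0.  Returns (x, log_ccdf, slope, intercept); for an exactly
        exponential distribution the slope is -1/mean."""
        x = np.sort(np.asarray([r for r in loss_rates if r > 0], dtype=float))
        n = len(x)
        ccdf = 1.0 - np.arange(n) / n          # empirical P(X >= x_i)
        logc = np.log(ccdf)
        slope, intercept = np.polyfit(x, logc, 1)
        return x, logc, slope, intercept

    # Synthetic check: exponential data with mean 5 (%) should give a slope
    # near -0.2.
    rng = np.random.default_rng(0)
    x, logc, slope, intercept = log_ccdf_fit(rng.exponential(5.0, size=2000))
    print(round(slope, 3))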
While striking, interpreting the fit to the exponential distribution is difficult. If, for example, packet loss occurred independently and with a constant probability, then we would expect the loss rate to reflect a binomial distribution, but that is not what we observe. (We also know from the results in § 15.1 that there is not a single Internet packet loss rate, or anything approaching such a situation.) It seems likely that the better exponential fit for both loaded and unloaded data loss rates than for ack loss rates holds a clue. The most salient difference between the transmission of data packets and that of acks is that the rate at which the sender transmits data packets adapts to the current network conditions, and furthermore it adapts based on observing data packet loss. Thus, if we passively measure the loss rate by observing the fate of a connection's TCP data packets, then we are in fact making measurements using a mechanism whose goal is to lower the value of what we are measuring (by spacing out the measurements). Consequently, we need to take care to distinguish between measuring overall Internet packet loss rates, which is best done using non-adaptive sampling, and measuring loss rates experienced by a transport connection's packets---the two can be quite different.

15.3 Loss bursts

In this section we look at the degree to which packet loss occurs in bursts of more than one consecutive loss. Analytic models of network behavior often assume individual packet losses occur at a fixed rate but independently from other losses, as this assumption aids in keeping the models tractable. Accordingly, to gauge the strength of these models we need to address the issue of the soundness of this assumption.

As with loss rates, we expect that the size of loss bursts depends on whether we analyze losses of loaded data packets, unloaded data packets, or acks. These each correspond to a different transmission rate, and, furthermore, the first two are generated at a rate dynamically adapted to the frequency of previously observed packet loss, while acks are not.

The first question we address is the degree to which packet losses are well modeled as independent. In [Bo93], Bolot investigated this question by comparing the unconditional loss probability, which we denote as $P_l^u$ (ulp in Bolot's paper), with the conditional loss probability, $P_l^c$ (clp), where $P_l^c$ is conditioned on the fact that the previous packet was also lost. He found that $P_l^c \geq P_l^u$ always held, which one would expect, as it would be surprising if loss of the previous packet made loss of the next packet less likely. He investigated the relationship between $P_l^u$ and $P_l^c$ for different packet spacings δ, ranging from 8 msec to 500 msec. He found that $P_l^c$ approaches $P_l^u$ as δ increases, indicating that loss correlations are short-lived, and concluded that ``losses of probe packets are essentially random as long as the probe traffic uses less than 10% of the available capacity of the connection over which the probes are sent.'' He also observed that $P_l^u$ stabilized at about 10%, quite a high loss rate, though the path being studied included a heavily loaded trans-Atlantic link, and also a mid-level network known to have previously experienced 3% loss rates unrelated to congestion.
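Computing these two quantities from a connection's loss record is straightforward. The sketch below is our own minimal illustration (function name and toy data are ours); it takes a per-packet boolean loss sequence in transmission order and returns the unconditional loss rate and the rate conditioned on the previous packet having been lost.

    def loss_probabilities(lost):
        """lost: sequence of booleans, True if the i-th packet (in transmission
        order) was lost.  Returns (P_u, P_c): the unconditional loss rate and
        the loss rate conditioned on the previous packet also having been lost.
        P_c is None if no packet was preceded by a loss."""
        n = len(lost)
        p_u = sum(lost) / n if n else 0.0
        after_loss = [cur for prev, cur in zip(lost, lost[1:]) if prev]
        p_c = (sum(after_loss) / len(after_loss)) if after_loss else None
        return p_u, p_c

    # Toy example: a burst of two losses followed by an isolated loss.
    print(loss_probabilities([False, True, True, False, False, True, False]))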
Table XXIII summarizes $P_l^u$ and $P_l^c$ for the different types of packets and our two datasets. $P_l^c$ conditions on whether the connection's previous packet was lost, even if it is a different type than its successor (e.g., a loaded packet lost followed by an unloaded).

                           P_l^u               P_l^c
    Type of loss         N_1     N_2         N_1     N_2
    Loaded data pkt      2.8%    4.5%        49%     50%
    Unloaded data pkt    3.3%    5.3%        20%     25%
    Ack                  3.2%    4.3%        25%     31%

    Table XXIII: Unconditional and conditional loss rates for different packet types

Clearly, for TCP packets (which have a large range of interarrival intervals), we must discard the assumption that loss events are well modeled as independent. Even for the low-burden, relatively low-rate ack packets, the loss probability jumps by a factor of seven if the previous ack was lost. We would expect to find the disparity strongest for loaded data packets, as these must contend for buffers with the connection's own previous packets, as well as any additional traffic, and indeed this is the case. We find the effect least strong for unloaded data packets, which accords with these not having to contend with the connection's previous packets. It is interesting to observe that loaded packets are unconditionally less likely to be lost than unloaded packets. We suspect this reflects the fact that lengthy periods of heavy loss or outages will lead to timeout retransmissions, and these are unloaded, so they contribute to the loss probability of unloaded packets rather than loaded packets.

The relative differences between $P_l^u$ and $P_l^c$ in Table XXIII all exceed those computed by Bolot by a large factor. His greatest observed ratio of $P_l^c$ to $P_l^u$ was about 2.5:1. However, his values of $P_l^u$ were all much higher than those in Table XXIII, even for δ = 500 msec, suggesting that the path he measured differed considerably from a ``typical'' path in our study. (We also note that, since TCP packet loss events are not well modeled as independent, it behooves us in general to avoid discussing unconditional packet loss in terms of probability, since for networking analysis this stochastic term often carries with it an implicit assumption of independence among the events. We advocate instead consistent use of the term packet loss rate, since this term downplays the implication of independence.)

Given that packet losses occur in bursts, the next natural question is: how big? To address this question, we grouped successive packet losses into outages, and computed for each outage the number of packets lost and the duration of the outage, in terms of the difference between the sending times of the two successfully arriving packets delimiting the outage. (Note that a data packet outage can encompass both loaded and unloaded packets.)
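The grouping of consecutive losses into outages can be expressed compactly. The sketch below is our own illustration (names are ours); it takes, for each packet in transmission order, its sending time and whether it arrived, and returns one (packets lost, duration) record per outage, with the duration measured between the sending times of the two successfully arriving packets that delimit the outage.

    def loss_outages(send_times, arrived):
        """send_times: packet sending times in transmission order (sec);
        arrived: parallel booleans, False if the packet was lost.
        Returns a list of (packets_lost, duration) per outage, where the
        duration is the spacing between the sending times of the delimiting
        successfully arriving packets (an upper bound on the outage length).
        Runs of losses at the very start of the trace are skipped, since
        they lack a delimiting packet on one side."""
        outages = []
        last_ok = None          # sending time of the last packet that arrived
        run = 0                 # consecutive losses seen so far
        for t, ok in zip(send_times, arrived):
            if ok:
                if run and last_ok is not None:
                    outages.append((run, t - last_ok))
                run = 0
                last_ok = t
            else:
                run += 1
        return outages

    # Toy example: packets every 10 msec, with loss runs of length 2 and 1;
    # yields roughly [(2, 0.03), (1, 0.02)] (modulo floating point).
    times = [i * 0.01 for i in range(8)]
    ok = [True, False, False, True, True, False, True, True]
    print(loss_outages(times, ok))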
Figure 15.12 shows the distributions conditioned on the outage exceeding 200 msec, which removes the effect of the N2 data packets observing more short-lived outages. (The x-axis extends only to 50 sec even though all of the distributions have some larger points. The plotting truncation lets us focus on the main body of the distribution in more detail than we could if we included the entire upper tail.) We see that, for outages of this length or longer, all four distributions agree fairly closely.

[Figure 15.12: Distribution of packet loss outage durations exceeding 200 msec. X-axis: outage duration (sec), 0.5 to 50; y-axis: cumulative probability; curves for N1 data, N1 acks, N2 data, N2 acks.]

It is clear from Figure 15.11 that outage durations span several orders of magnitude. For example, 10% of the N2 ack outages were 33 msec or shorter, while another 10% were 3.2 sec or longer, a factor of a hundred larger. Furthermore, the upper tails of the distributions are consistent with those of Pareto distributions. Figure 15.13 shows a complementary distribution plot of the duration of N2 ack outages, for those lasting more than 2 sec (about 16% of all the outages). Both axes are log-scaled, so a straight line on the plot corresponds to a Pareto distribution. We see the long outages fit quite well to a Pareto distribution with shape parameter α = 1.06, except for the extreme upper tail, to which we will return in a moment. A shape parameter α ≤ 2 means that the distribution has infinite variance, indicating immense variability. Pareto distributions for activity and inactivity periods play key roles in some models of self-similar traffic [WTSW95, WP97, WPT97]. We do not attempt further analysis here of the possible role of packet loss outages in contributing to self-similar correlations in aggregate network traffic, but note that it may prove a fruitful area for further research.

[Figure 15.13: Log-log complementary distribution plot of N2 ack outage durations. X-axis: ack outage duration (sec), 5 to 500; y-axis: P[X >= x], 0.0001 to 1.]

However, it is clear in the plot that the extreme upper tail does not fit the same Pareto distribution. This discrepancy could simply be because the uppermost tail is subject to truncation, due to the 600-second lifetime to which our connections were limited (§ 9.3). But the discrepancy could instead reflect two different loss mechanisms. We showed in § 6.8 that ``temporary outages'' observed by traceroute measurements appear well-described using exponential distributions, which are much less volatile than Pareto distributions. That analysis, however, was confined to time scales of 30 sec or longer, and, for R2 (corresponding in time to N2), we found a mixture of exponentials, with the second only fitting to outages exceeding 75 sec in duration. This latter fit corresponds to the extreme upper tail in Figure 15.13. This in turn leads us to speculate that the distribution of outage durations might reflect a Pareto distribution for losses due to heavy congestion, and an exponential distribution for losses due to routing outages.
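The kind of tail check underlying Figure 15.13 can be sketched as follows; this is only an illustration of the idea behind the plot (a least-squares line through the log-log empirical complementary distribution), not the estimator used in the thesis, and a maximum-likelihood (Hill-style) estimator would be more principled.

    # Sketch: for a Pareto tail, log P[X >= x] is roughly linear in log x
    # with slope -alpha.
    import math

    def ccdf_slope(durations, threshold):
        """Least-squares slope of the log-CCDF vs. log-duration above a threshold."""
        tail = sorted(d for d in durations if d >= threshold)
        n = len(tail)
        if n < 2:
            return float("nan")
        xs = [math.log(d) for d in tail]
        ys = [math.log((n - i) / n) for i in range(n)]   # empirical P[X >= x]
        mx, my = sum(xs) / n, sum(ys) / n
        num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        den = sum((x - mx) ** 2 for x in xs)
        return -num / den    # rough estimate of the shape parameter alpha

An estimate well below 2 indicates infinite variance, consistent with the α = 1.06 fit reported above for the long ack outages.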
We could test this hypothesis by gathering packet loss measurements made over longer periods of time, which would eliminate the ambiguities presented by the 600­second lifetime truncating the upper tail of our measurements. We might also consider analyzing the number of lost packets in an outage, rather than the duration of the outage. This value, however, is much more subject to fluctuation due to the particulars of how many packets the TCP had in flight prior to the outage, or how many acks it had to generate during the outage in response to incoming data packets. We note that the mean number of packets lost during an outage was around 1.5, slightly lower for acks and higher for data packets. The loss extremes we observed were 68 consecutive data packets and 40 consecutive acks (most of which were dups in response to a large number of incoming packets). These extremes are less interesting than the extreme outage durations, because the former are specific to the structure of the TCP connections---both occurred due to very large numbers of data packets in flight, We also note that the patterns of loss bursts we observe might be greatly shaped by use of ``drop­tail'' queueing. With the drop­tail policy, a router queues incoming packets until the available buffer space is exhausted, and then drops any additional arrivals until sufficient space becomes available again. Routers using drop­tail comprise the vast majority of Internet routers, no doubt because it is very simple to implement. Simulations show that drop­tail leads to large bursts of losses when a flight of closely­ spaced packets arrive at a router with no available buffers, and the entire flight is dropped [FJ93]. Related to this problem is a basic unfairness in how packets are dropped: a connection may suffer a large number of losses because a different connection is occupying all of the router's buffer. In response to these problems, [FJ93] developed the ``Random Early Drop'' (RED) policy, in which the router drops (or marks) incoming arrivals before all of the buffer has been exhausted. These drops are made with probabilities reflecting the proportion of the router's resources used by the connection, so the policy is much more fair than drop­tail. Because RED spreads out losses over time, widespread deployment of RED could significantly alter loss patterns and the corresponding connection dynamics. A final loss burst pattern we investigated was the presence of periodic losses: outages oc­ curring a fixed interval apart. Floyd and Jacobson observed periodic losses and described how they could arise due to global synchronization of the times at which routers exchange updates [FJ94]. They showed how fixed­interval timers such as thirty second update periods act as resonant fre­ quencies, which can synchronize in phase to other events occurring at the same frequency. Periodic losses are thus possibly symptomatic of widespread synchronization in the network, which can have debilitating effects on network performance, especially since large loss periods can in turn synchro­ nize all of the TCP senders that suffer a loss during the period. Unfortunately, our measurements are ill­suited to detecting periodic loss. Rather than having fixed intervals between our loss ``probes'' (i.e., the individual packets of a single TCP con­ 310 nection), which would then lend themselves nicely to frequency­domain analysis, we have variable intervals. 
Furthermore, we used much larger, variable intervals between groups of measurements (connections), precisely to avoid problems with the measurements synchronizing to any periodicities present in the network. Thus, while we can analyze the timing of all of the lost packets in our measurements, the measurements themselves are sparse, and also are cluttered with a great deal of loss that is clearly not periodic.

We attempted to analyze for periodic loss by first identifying a North American subset of our sites with clocks highly synchronized to each other. We identified the day with the most connections between those sites and extracted from the traces a dataset giving the times L_i of each packet loss during those connections. We then constructed plots of L_i versus L_i mod T, varying the candidate period T through the range 1, 2, ..., 120 sec. We hoped to find a T for which many of the L_i mod T clustered about a particular value. However, no compelling modulus emerged. We repeated the analysis for data packets sent to Europe, shown in Table XXII as the most loss-prone Internet path, to test whether perhaps their heavy losses are due in part to a periodic component rather than congestion. Again, we did not find persuasive evidence of frequent periodic losses. We conclude that periodic losses do not strongly dominate TCP packet losses. However, the mismatch between our measurements and those needed to thoroughly examine the question of periodic losses is great enough that we cannot from our evidence conclude that such losses do not regularly occur.

15.4 Loss location

We discussed in Chapter 14 how each network path contains one (or more) ``bottleneck'' element(s) that limit the maximum rate a connection using the path can achieve. It is natural to assume that this bottleneck element is also the point of congestion along the path, because it has the least amount of one of the network's most important resources, namely bandwidth. Consequently, for a given load in terms of volume of packets to forward along a network path, the bottleneck elements will be the most stressed of those along the path, since they require the most time to service the load. With this assumption, we are again (as in § 15.2) abstracting the intricate, multi-element network path to a presumably equivalent model of a single element that forwards at the bottleneck rate, and at which all significant queueing occurs.

One might think that, with only end-to-end measurements, one lacks sufficient information to verify whether in fact loss occurs at the bottleneck or at some other element. Sometimes, however, we can, as illustrated by Figure 15.14 and Figure 15.15. Both sequence plots reflect data packet arrivals at the receiver, with the packets flowing in steadily at the bottleneck rate. In each plot, one packet has been lost, and the circle indicates where it would have arrived had it not been lost, and had it likewise arrived at the bottleneck rate. In Figure 15.14, its successor arrives in the position where the lost packet would have otherwise arrived. This indicates its successor did not queue behind the lost packet, but instead behind the lost packet's predecessor; hence the lost packet must never have made it across the bottleneck link. In Figure 15.15, however, the successor arrives in the same position that it would have, had the lost packet safely arrived too.
Thus, the successor did queue behind the lost packet at the bottleneck, and we conclude that the lost packet did indeed make it across the bottleneck link, only to be dropped later.

[Figure 15.14: Receiver sequence plot showing packet lost at or before bottleneck link. X-axis: time (sec); y-axis: sequence number.]
[Figure 15.15: Receiver sequence plot showing packet lost after bottleneck link. X-axis: time (sec); y-axis: sequence number.]

In general, we prefer that packets are dropped before the bottleneck, so they do not fruitlessly consume the (usually) scarce bottleneck resource. In this section, we analyze how often this occurs.

We first clarify our terminology. We will refer to a packet lost after it has been successfully forwarded by the bottleneck element as occurring ``after the bottleneck,'' while one lost earlier as occurring ``before the bottleneck.'' These latter may have been lost because of a full queue just before the bottleneck element, or may have been lost further upstream. Some network paths may have multiple bottlenecks, meaning a number of elements with the same limiting rate. Since our analysis is based on the patterns in Figures 15.14 and 15.15, in the case of multiple bottlenecks we consider only loss after or before the first of the bottlenecks. Loss prior to subsequent bottlenecks will still appear at the receiver as in Figure 15.15, since the data packets will have already been spread out by the first bottleneck.

Our analysis is doomed to be inexact, since effects such as data packet compression (§ 16.3.2) and spurious extra delay often obscure the patterns so clearly evinced in Figures 15.14 and 15.15. But we still aspire to attempt some sort of meaningful analysis, since the basic question of position of loss is an intriguing one, with the potential to reshape our abstractions when analyzing networks.

We proceed as follows. For each lost data packet, we check whether both its predecessor and successor arrived successfully. If not, then we ignore the packet for our analysis, which removes from our possible results the effects of loss bursts. Since we know from § 15.3 that loss bursts are not uncommon, the resulting bias means our results will at best be only qualitative. (We attempted to extend the analysis to include loss bursts, but the ambiguities of whether the next successful packet had to queue behind only some of the packets lost in the burst proved too difficult to remove.) If both predecessor and successor arrived, then we check whether the lost packet was sufficiently ``loaded'' (§ 15.2) that, upon arriving at the bottleneck, it would find its predecessor waiting in the queue, not yet having begun its service. If not, then we again ignore the packet for our analysis. Doing so assures we only analyze lost packets that would nominally have occupied a full ``slot'' at the queue, and not a partial slot due to arriving while its predecessor was in the process of transmission across the bottleneck. If the lost packet was sufficiently loaded, then we check whether its successor was sent soon enough after that, had the lost packet queued at the bottleneck, its successor would have arrived at the bottleneck before the lost packet began its bottleneck transmission, and thus the successor would have been delayed a full ``slot'' in the queue, too. If the successor was sent too late, we again ignore the lost packet for our analysis.
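The candidate-selection steps just described can be summarized in a rough sketch; the field names and the spacing tests below are simplified stand-ins for the thesis's more careful ``loaded'' bookkeeping (not tcpanaly code), and the ±25% arrival-time comparison that actually classifies a surviving candidate is described next in the text.

    # Sketch: selecting lost data packets whose receiver arrival pattern can
    # indicate loss before vs. after the bottleneck.  `pkts` is a send-ordered
    # list of objects with .send_time, .size, .lost (assumed attributes);
    # bottleneck_rate is bytes/sec.

    def loss_location_candidates(pkts, bottleneck_rate):
        slot = lambda p: p.size / bottleneck_rate   # bottleneck service time
        candidates = []
        for i, p in enumerate(pkts):
            if not p.lost or i == 0 or i == len(pkts) - 1:
                continue
            pred, succ = pkts[i - 1], pkts[i + 1]
            if pred.lost or succ.lost:
                continue                 # skip loss bursts entirely
            # rough "loaded" test: p sent before pred's bottleneck slot elapsed
            if p.send_time - pred.send_time >= slot(pred):
                continue
            # successor sent soon enough to have queued a full slot behind p
            if succ.send_time - p.send_time >= slot(p):
                continue
            candidates.append(i)
        return candidates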
If the successor was sent sufficiently soon after the lost packet, then we next inspect the arrival time of the successor. If it is within \Sigma25% of the time expected had the lost packet never been transmitted (no bottleneck ``load'' incurred), then we consider the lost packet as having been dropped before the bottleneck. If the successor arrives within \Sigma25% of the time expected had the lost packet indeed loaded the bottleneck, then we consider the loss as occurring after the bottleneck. If the successor's arrival is between these two ranges, then its arrival is ``ambiguous,'' and if its arrival is after (or before) both ranges, then its arrival is ``inconsistent,'' meaning the simple packets­arriving­ at­the­bottleneck­rate scenario we envision is inadequate, probably due to downstream queueing. In both N 1 and N 2 , about a third of the losses fit the ``inconsistent'' category, and almost none were ``ambiguous.'' Of the remaining two­thirds, we find that, in N 1 , fully 48% of the losses occurred after the bottleneck. In N 2 , the figure falls to 28%. These figures, however, are less than solid in two important ways. First, if a packet is lost before the bottleneck, but its successor queues 313 behind a packet from another connection at the bottleneck, then we will still obtain the signature of an after­bottleneck loss. It is difficult to see how to quantify the frequency of this effect given only end­to­end measurement data. Second, our analysis is somewhat skewed by the presence of sites in our study with low­speed Internet connections. For connections involving these sites, the bottleneck will often be immediately at the sender (or before the receiver), so there is little opportunity for loss before (or after) the bottleneck. If we restrict our analysis to only connections with a bottleneck rate exceeding 100 Kbyte/sec, then in N 1 we find 36% of the losses occur after the bottleneck, and 26% in N 2 . From this analysis, we conclude that, for isolated packet losses (not bursts), the assump­ tion that loss occurs at or before the bottleneck link is certainly true more often than not. But if loss position is critical to some analysis, then one must accommodate the possibility of loss occurring after the bottleneck. We also conclude that perhaps 25% of packet loss occurs regretfully late in the network path, meaning that an upstream bottleneck link spent its scarce resources carrying a doomed packet. 15.5 Evolution of packet loss rate In this section we look at how packet loss rates along an Internet path evolve over time. Our goal is to determine how fruitful it might be to cache packet loss information for Internet paths to better estimate the service we might expect from the paths in the future. For each path in our study, we analyze the evolution of the ack loss rate along the path in several different ways. Clearly, there will be great variation among some of the paths in how the loss rate evolves over time. But we presently limit ourselves to investigating overall patterns of loss rate evolution, aggregated over all of the N 2 connections. We do not analyze the N 1 connections because few of the N 1 paths were measured frequently enough to allow solid analysis. We first look at how well observing no loss along the path for a 100 Kbyte connection predicts experiencing no loss along the path for another such connection at some point in the future. 
For each zero-loss connection, c, we compute the pair ⟨ΔT_c, I^z_c⟩, where ΔT_c is the time between that connection and the next successful connection, c', we observed along that path; and I^z_c is an indicator function with a value of 1 if c' also experienced no loss, and 0 if it did. After constructing these pairs, we sort them on ΔT_c and then compute the probability P^z(ΔT) that a connection that comes an interval ΔT after a zero-loss connection will also be zero-loss, as follows. Let I^z_(i) denote the ith indicator, sorted on ΔT_c. Beginning with P̂^z_0 = 1, we run an exponentially-weighted moving average (EWMA) with α = 0.01 through the sorted indicators, where the ith value of the average is computed as

    P̂^z_i = (1 − α) · P̂^z_{i−1} + α · I^z_(i).

With α = 0.01, the estimate thus primarily reflects the preceding 100 values of I^z, though earlier values still contribute to the smoothing. Our goal is to turn the indicator values into meaningful probability estimates, while still allowing for effects that are localized to different time intervals.

Figure 15.16 shows how P̂^z(ΔT) evolves with time. The x-axis gives the time between the first zero-loss connection and the subsequent connection, logarithmically scaled, and the y-axis gives the smoothed probability that the subsequent connection is also zero-loss.

[Figure 15.16: Evolution of how well observing a zero-loss connection predicts that a future connection will also be zero-loss. X-axis: interval between connections (sec), 10^2 to 10^6; y-axis: probability also zero-loss, 0.7 to 1.0.]

We had very few successive connections in our study separated by less than 60 sec, because the NPDs reuse TCP connection identifiers (to aid in filtering the traffic, per § A.2), and most TCP implementations set a minimum waiting interval on reusing identifiers of 1 minute or more.[6]

[6] The TCP specification sets this time at 4 minutes, though it provides exceptions for which it can be bypassed [Br89].

Because of the combination of exponential smoothing and very few closely-spaced successive connections, the leftmost portion of the plot exhibits an artifact in terms of a steep dip from probability 1.0 to probability 0.8. Had we instead used an initial probability of P̂^z_0 = 0.8, then this spike would disappear. Putting aside the spike, we see that the probability of again observing a zero-loss connection stays at about 0.75 for intervals on the order of a few minutes to a few hours. Above about 6 hours, it approaches what appears to be a ``steady state'' of 0.70, which continues all the way out to several weeks. Thus, observing a zero-loss connection remains a good predictor of observing future zero-loss connections, even for points in time quite far in the future.
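The smoothing just described can be sketched as follows; a minimal illustration (not the thesis's analysis code), assuming the (ΔT_c, I^z_c) pairs have already been formed.

    # Sketch: EWMA-smoothed estimate of P^z(DeltaT), the probability that a
    # connection coming DeltaT after a zero-loss connection is also zero-loss.
    # `pairs` is a list of (delta_t_seconds, indicator) tuples (assumed input).

    def smoothed_pz(pairs, alpha=0.01, p0=1.0):
        est = p0
        curve = []
        for dt, indicator in sorted(pairs):
            est = (1.0 - alpha) * est + alpha * indicator
            curve.append((dt, est))
        return curve   # (interval, smoothed probability), as plotted in Fig. 15.16

Starting from p0 = 0.8 rather than 1.0 removes the initial dip noted above, since it is purely an artifact of the initial value interacting with the scarcity of closely spaced connections.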
Figure 15.17 shows the same evolution except for the predictive power of observing a non-zero-loss connection rather than a zero-loss connection. The pattern is similar, though the steady state shows signs of declining on time scales of weeks. The ``notch'' at about 6 hours (21,600 sec) is somewhat puzzling, though it is perhaps simply an artifact, as the region surrounding the notch contains only about 200 points. The notch at four minutes is likewise puzzling: it contains 20% of all of the points, and hence is clearly not spurious, but it is difficult to see what mechanism would lead to less correlation between connections 3-5 minutes apart compared to those further apart. (The comparable notch in Figure 15.16 occurs instead at two minutes, and contains only 3% of the points, so it is perhaps spurious.)

[Figure 15.17: Evolution of how well observing a non-zero-loss connection predicts that a future connection will also be non-zero-loss. X-axis: interval between connections (sec), 10^2 to 10^6; y-axis: probability also non-zero-loss, 0.7 to 1.0.]

The final aspect of packet loss evolution we look at is how loss rates change over time. For each connection, we compute ⟨T_c, λ_c⟩, where T_c is the time when the connection began and λ_c is the ack loss rate. We then compute for consecutive connections c1 and c2 along the same path the pair ⟨ΔT_{1,2}, Λ_{1,2}⟩, where ΔT_{1,2} = T_{c2} − T_{c1} is the time between the two connections, and Λ_{1,2} is the magnitude of the difference between their ack loss rates. Figure 15.18 shows how the smoothed value of Λ_{1,2} evolves as ΔT_{1,2} increases, where the smoothing is done with α = 0.01 and with an initial value of Λ_{0,0} = 0. We see an almost immediate jump to a mean difference of ±2% in loss rate, followed by a steady climb up to a difference of ±4% at about 10 hours, followed by a jump to the ±6% level for the largest intervals.

In summary, we find that observing a lack of loss along a path is a good predictor that we will continue to not observe loss along the path, even far into the future; that the same holds almost as strongly for observing loss predicting we will observe future loss; but that the farther into the future we wish to project, the more difficult it is to accurately assess the magnitude of the loss rate based on the magnitude of the currently observed loss rate. These findings support the notion developed earlier in this chapter that network paths have two general states, a tendency towards loss-free connections (``quiescent''), and a tendency towards lossy connections (``busy''), and provide evidence that both states are long-lived, on time scales of hours, presumably because they are functions of whether the path has adequate capacity for the aggregate traffic delivered to it, and aggregate traffic rates generally change on time scales of hours [PF95]. We also find that, while we may predict future loss rates fairly accurately for time scales of minutes to hours, as time scales grow beyond, our predictive power diminishes.

[Figure 15.18: Evolution of the mean difference in loss-rate between successive connections along the same path. X-axis: interval between connections (sec), 10^2 to 10^6; y-axis: mean difference in loss rate, 0.0 to 0.08.]

15.6 Efficacy of TCP retransmission

The final aspect of packet loss we investigate is how efficiently TCP deals with it. Ideally, TCP retransmits any lost data until it is successfully received, but never retransmits unnecessarily, as that would waste network resources. However, the transmitting TCP lacks perfect information, and consequently will sometimes indeed retransmit unnecessarily. For example, TCP acknowledgements are not transmitted reliably; so, if a flight of data packets all arrive successfully at the receiver, but all of the corresponding acknowledgements are lost, then the TCP has no choice but to retransmit when the retransmission timer expires.

We analyzed the efficacy of retransmission by the different TCPs in our study as follows. For each connection, we examine each retransmitted packet P_r to see if the data contained in P_r had already been successfully sent.[7] Note that the earlier, successful transmission may not have arrived yet at the receiver at the time of the retransmission; we consider it successful, however, if an earlier transmission of the data ever arrives at the receiver. If P_r contained data that had not previously been successfully transmitted to the receiver, then we term P_r ``necessary,'' otherwise we term it ``redundant.''

[7] The exact test is whether all of the data in P_r had been successfully sent. This fine point can be important if different portions of P_r's data were earlier sent in different packets.
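One way to picture this bookkeeping (an illustration only, not tcpanaly itself) is to track which byte ranges were ever carried by a transmission that eventually arrived, and test each retransmission against that set; the tuple representation below is an assumption of the sketch, and partial overlaps between packets, which the exact test in the footnote handles, are ignored here.

    # Sketch: classifying each retransmission as "necessary" or "redundant".
    # Packets are (start_seq, end_seq, arrived) in sending order; `arrived`
    # is the packet's eventual fate at the receiver.

    def classify_retransmissions(packets):
        delivered = []     # byte ranges carried by a transmission that arrived
        seen = set()       # byte ranges sent at least once (to spot rexmits)
        verdicts = []
        for start, end, arrived in packets:
            is_rexmit = any(s <= start and end <= e for s, e in seen)
            if is_rexmit:
                covered = any(s <= start and end <= e for s, e in delivered)
                verdicts.append("redundant" if covered else "necessary")
            if arrived:
                delivered.append((start, end))
            seen.add((start, end))
        return verdicts

    # Example: the second copy of bytes [0, 512) is redundant, since the
    # first copy (eventually) arrived.
    print(classify_retransmissions([(0, 512, True), (0, 512, True)]))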
In both N1 and N2, about 40% of the retransmissions were redundant! As an aggregate statistic, this is not a happy number. It means that two times out of five, the TCP should (1) not have retransmitted, and (2) not have cut its congestion window, if the retransmission led it to do so. However, we need to investigate the 40% figure better, since there are a number of different reasons why a TCP might send redundant retransmissions (RRs):

unavoidable: We mentioned earlier that, if the network drops all of the acks for a flight of data packets, then the TCP sender has no choice but to retransmit, since no further feedback will be forthcoming from the receiver.

pathological: The packet was a timeout retransmission, but the interval between the data's earlier transmission and this packet's was less than the minimum round-trip time ever seen. Hence, the retransmission timeout used by the TCP was absolutely broken---the receiver did not even have a chance to acknowledge the data---and, furthermore, a simple test by the TCP to make sure that at least the minimum RTT had elapsed would have prevented the redundant retransmission.

coarse feedback: Since TCP acknowledgements simply give the highest data sequence number received in-order, when a TCP retransmits with a window larger than one packet (such as during slow-start after a timeout), it may transmit unnecessary packets because the receiver lacks a fine enough feedback mechanism to tell it which above-sequence packets have already arrived. Figures 15.19 and 15.20 illustrate the problem.

[Figure 15.19: Receiver sequence plot showing large number of sequence holes. X-axis: time (sec); y-axis: sequence number.]
[Figure 15.20: Redundant retransmissions subsequent to previous figure. X-axis: time (sec); y-axis: sequence number.]

In the first sequence plot (measured at the data receiver), we see that the sender has a large amount of data in flight, which until about T = 0.47 has steadily streamed in. At that point, however, the packet with sequence number 59,905 is lost. Many more packets continue streaming in, but they contain numerous holes where some were lost. The new arrivals generate a torrent of duplicate acks in response. Since, however, the acks only provide coarse feedback to the sender, all the sender really knows is that sequence 59,905 was lost, and many more packets safely arrived---but it does not know which. The sender retransmits the first missing packet via fast retransmission (§ 9.2.7), and this packet arrives at the receiver just before T = 0.6. The receiver duly acknowledges up to the next hole, and even generates some duplicate acks for new data arriving at sequence 90,625 and above (sent due to fast recovery). These in turn lead to a fast retransmission for the next hole, arriving at T = 0.63. At this point, however, the sender does not see any more incoming acks allowing it to send more data via fast recovery (and it has halved its congestion window twice, once per fast retransmission event, so it will take a while for more dup acks to inflate the window far enough to enable fast recovery).
Consequently, self-clocking ceases and the sender stalls until a retransmission timeout occurs. Until now, the retransmissions have all been necessary. The retransmissions after the timeout, however, are a disaster, as shown in Figure 15.20. The first packet retransmitted after the timeout was also necessary. Unfortunately, the acks generated by it (shown as large squares in the plot) rapidly open the sender's congestion window due to slow start, and it sends larger and larger flights of packets. Nearly all of these retransmitted packets are unnecessary---all that is really needed is to fill the sporadic holes shown in Figure 15.19. Every duplicate ack in Figure 15.20 corresponds to an unnecessary retransmission, yet because the sender lacks fine-grain information regarding which above-sequence packets the receiver already has, it continues retransmitting to fill the known holes (as indicated by the latest ack it has received), as well as pouring additional, unnecessary packets into the network---23, all told. The TCP research community has long known about this problem and is in the midst of standardizing a TCP extension to remedy it. With the extension, a ``selective acknowledgement'' (SACK) option, acks can carry additional information concerning above-sequence packets that have arrived at the receiver (§ 13.1.3). The sender then uses this information to select which packets require retransmission. We consider an RR as reflecting TCP's ``coarse feedback'' problem if it occurred after the arrival of an ack that itself was sent after the original copy of the data arrived at the receiver. Presumably, had we used SACK, this ack could have conveyed to the sender that the data had already arrived, and the sender would have avoided the RR.

bad RTO: If the RR was prompted by a timeout, and if an acknowledgment for the previously sent data arrives after the timeout retransmission, then the TCP selected too low a value for its retransmission timeout (RTO). The RR could have been avoided simply by waiting longer.

Table XXIV summarizes the prevalence of the different types of RRs in N1 and N2. The second and third columns give the overall percentage of the N1 and N2 RRs due to each type. The fourth and fifth columns give the same figures if we restrict the analysis to just Solaris TCP senders, since in § 11.5.10 we discussed how it is prone to underestimating RTO and consequently retransmitting too early, so we would expect it to exhibit a higher frequency of ``pathological'' and ``bad RTO'' types of RRs than the other TCPs in our study. The final two columns summarize the frequency of each type of retransmission for the non-Solaris TCPs.[8]

[8] In § 11.5.8 we identified the Linux 1.0 TCP as suffering from many RRs due to its practice of retransmitting all the unacknowledged packets rather than just the first. However, in § 10.5 we discussed how many of the Linux traces could not be unambiguously paired in terms of packet departures and arrivals, precisely because of this retransmission problem. In this section, we confine our retransmission analysis to those traces that we could unambiguously pair, so we can distinguish between the different types of RRs (in particular, ``coarse feedback,'' which depends on whether the original data arrived before a subsequently transmitted and received ack). Consequently, we analyzed very few Linux 1.0 traces and thus their presence does not significantly affect the statistics in Table XXIV.

  Type of RR          N1 total   N2 total   N1 Solaris   N2 Solaris   N1 Other   N2 Other
  % all packets          2%         3%          6%           6%          1%         2%
  % retransmissions     43%        38%         66%          59%         26%        28%
  Unavoidable           25%        25%         14%          33%         44%        17%
  Pathological           2%         7%          3%          11%          0%         2%
  Coarse feedback       18%        41%          1%           1%         51%        80%
  Bad RTO               55%        28%         81%          55%          4%         1%

  Table XXIV: Proportion of redundant retransmissions (RRs) due to different causes

We see that a fair proportion of the RRs were unavoidable. (Some of these might, however, have been avoidable had the receiving TCP generated more acks.) We note that for N2, which, with its bigger windows (§ 9.3), had more opportunity to successfully transmit an ack for part of the window, only about 1/6 of the RRs for non-Solaris TCPs were unavoidable. Clearly it is worth our efforts to first eliminate the avoidable 5/6's.
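The taxonomy in Table XXIV can be pictured with a small sketch (illustrative only, not tcpanaly): the per-RR facts it consults, and the precedence among the tests, are assumptions of the sketch rather than the thesis's exact procedure.

    # Sketch: assigning a redundant retransmission (RR) to one of the four
    # categories.  The attributes of `rr` (all_acks_for_flight_lost,
    # is_timeout_rexmit, elapsed_since_prior_send, min_rtt,
    # informative_ack_seen, ack_for_original_arrived_later) stand for facts a
    # trace analyzer could extract.

    def classify_rr(rr):
        if rr.all_acks_for_flight_lost:
            return "unavoidable"
        if rr.is_timeout_rexmit and rr.elapsed_since_prior_send < rr.min_rtt:
            return "pathological"
        if rr.informative_ack_seen:
            # an ack sent after the original data arrived reached the sender
            # before the RR; SACK information would have suppressed it
            return "coarse feedback"
        if rr.is_timeout_rexmit and rr.ack_for_original_arrived_later:
            return "bad RTO"
        return "other"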
Pathological RRs could be eliminated with a simple test: if the packet being retransmitted was previously transmitted (or retransmitted) less than one RTT in the past, then simply do not retransmit it. Aside from Solaris, most pathological RRs occur within retransmission epochs, during which earlier RRs lead to enough duplicate acks that the TCP resends data it sent shortly before due to the window advancing. For Solaris, many occurred due to the problems the Solaris TCP timer has with adapting to the true round-trip time, cf. § 11.5.10 and § 11.5.1.

``Coarse feedback'' RRs would presumably all be fixed using SACK. The increase in non-Solaris coarse feedback RRs in N2 is no doubt due to the use of bigger windows in N2, and hence more opportunity for acks (and, thus, finer feedback) to potentially inform the sending TCP of what packets the receiver already has. It is encouraging to see that, aside from Solaris TCPs, deployment of SACK remedies almost all of the avoidable RRs. It makes almost no difference for Solaris TCP, since many of its RRs occur before any ack for the previous transmission of data has arrived from the receiver, due to the Solaris timer adaption problems.

``Bad RTO'' RRs indicate that the TCP's computation of the retransmission timeout was erroneous. These are the bane of Solaris TCP, as noted above. More than half of its RRs were due to miscalculating the timeout. Fixing the calculation eliminates 4-5% of all of the data traffic generated by the TCP.[9]

[9] We note that this problem has already been fixed in Solaris 2.5.1.

The TCP standard requires use of Jacobson's exponentially-weighted moving average (EWMA) round-trip time (RTT) estimate and associated variance estimate ([Br89, 4.2.2.15] and [Ja88]), along with Karn's algorithm for eliminating ambiguous RTT estimates [KP87]. If we assume that the non-Solaris TCPs do in fact implement this algorithm, then from Table XXIV we see that it performs quite well.
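For reference before looking at the failure case below, the standard computation can be sketched as follows; this is a simplified rendering of the [Ja88] estimator with the commonly used gain constants, ignoring clock granularity and timer backoff, and reducing Karn's rule to a comment rather than code.

    # Sketch of the standard RTO computation: Jacobson's EWMA of the RTT and
    # its mean deviation, with RTO = srtt + 4 * rttvar.  Per Karn's algorithm,
    # RTT samples taken from retransmitted packets are ambiguous and skipped.

    class RtoEstimator:
        def __init__(self):
            self.srtt = None
            self.rttvar = None

        def sample(self, rtt):
            if self.srtt is None:
                self.srtt = rtt
                self.rttvar = rtt / 2.0
            else:
                err = rtt - self.srtt
                self.srtt += 0.125 * err                         # gain 1/8
                self.rttvar += 0.25 * (abs(err) - self.rttvar)   # gain 1/4
            return self.rto()

        def rto(self):
            return self.srtt + 4.0 * self.rttvar

Because the estimator tracks the smoothed RTT rather than its current trajectory, a rapid, sustained rise in RTT (as in the example below) can outrun the computed timeout.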
Figure 15.21 shows an instance where it failed, or at least where HP/UX 9.05's implementation of it failed. Here the receiving TCP is offering a very large window, to which the sending TCP is rapidly opening its congestion window in the face of no packet loss. The bottleneck link, however, can only support about 7.3 Kbyte/sec, and so the window represents a large mismatch with the correct window size needed to avoid overloading the bottleneck. Consequently, the RTT rises rapidly as packets queue behind their predecessors. During the last five round trips, starting at time T = 10, the RTT increases by about 1 second during each trip. The RTO estimation algorithm fails to track this rapid increase, and at time T = 23 a retransmission timeout occurs, even though the corresponding ack is just about to arrive. Subsequent acks for the first transmissions of the data then rapidly feed the slow-start sequence begun by the timeout retransmission, and the sending TCP promptly resends 63 packets, all redundant. However, we found pathological behavior like that shown in the figure quite rare.

[Figure 15.21: Sender sequence plot showing failure of RTO adaption. X-axis: time (sec); y-axis: sequence number.]

While the standard RTO estimation almost never leads to an unnecessary timeout retransmission, a separate question, unanswered by these statistics, is whether it could be safely modified to be less conservative. At present the timeout often occurs after much more than an RTT elapses. A more aggressive RTO algorithm could potentially lead to higher connection throughput, because timeout lulls would be less costly than they currently are. Yet, if the more aggressive algorithm leads to excessive retransmission during times of RTT fluctuation, then it could contribute to congestion collapse, a major disaster. Answering the question of how the RTO estimation might be reengineered is a complex problem. The current timer uses coarse-grained (as much as 500 msec granularity) measurements with some subtle adjustments to compensate for the granularity, as well as timing only one packet per flight. A revised timer might take advantage of both higher-resolution clocks and the opportunity to time multiple packets per flight. The first affects the adjustment factors used by the current algorithm, and the second changes the constant used in the EWMA estimator. Because the issues are complex, we leave this interesting question for future work.

In summary: assuring standard-conformant RTO calculations and deploying the SACK option together eliminate virtually all of the avoidable redundant retransmissions. The few remaining RRs are rare enough to not present, overall, any serious performance problems.

The last aspects of TCP retransmission we investigate are the patterns of packet loss during fast recovery sequences. The TCP fast recovery mechanism, described in § 9.2.7, works best when only a single packet out of a flight is lost. When multiple packets in one flight are lost, the fast recovery mechanism generally will not suffice to retransmit all of the missing packets, and the TCP transfer will subsequently stall until a retransmission timeout, seriously diminishing throughput [FF96, Ho96]. It was this problem that motivated the development of the SACK option, which allows a TCP to efficiently recover from multiple losses. A separate fast recovery problem occurs when the retransmitted packet is also lost.[10] When this happens, the TCP will again stall until a retransmission timeout expires. In some circumstances, and depending on the algorithm used by a TCP to act upon information it acquires by using the SACK option, a TCP using SACK can avoid this timeout by determining that the retransmitted packet was itself lost, and retransmitting it again. While these problems have been recognized for quite a while, no hard data has been available in order to gauge the degree to which they actually present difficulties for Internet connections. We analyzed the N1 and N2 measurements to provide such data, as follows.

[10] This problem also occurs for TCPs that implement ``fast retransmit'' (§ 9.2.7) but not fast recovery. However, for simplicity, we will only use the term ``fast recovery'' in our discussion.
For each packet retransmitted using the fast recovery mechanism, we tallied whether the retransmitted packet was lost or successfully arrived at the receiver, and also counted the number of outstanding (unacknowledged) packets at the time of the retransmission that were lost. In N1, out of 1,178 packets retransmitted using fast recovery, only 3.9% were themselves lost. In N2, 15,444 packets were retransmitted using fast recovery (a significantly higher proportion of all of the retransmissions than in N1, due to the use of bigger windows in N2, per § 9.3). Of these, only 4.5% were also lost. (These proportions are quite close to the unconditional loss rates we examined in § 15.1, and much lower than the conditional loss rates examined in § 15.3, indicating that congestion often drains on time scales of RTTs.) Thus, we conclude that the second concern discussed above is, in practice, not an especially serious problem.

However, in both N1 and N2, one third of the time more than one packet was lost in the flight prior to a fast recovery, and about 15% of the time, more than two packets were lost. These proportions are high enough to give solid support for refining the fast recovery mechanism (such as by adding SACK, or the modifications discussed by Hoe [Ho96]) in order to better cope with multiple packet losses within a single flight.

Chapter 16

Packet Delay

The final aspect of Internet packet dynamics we analyze is that of packet delay. Delay variation is arguably the most complex element of network behavior to analyze---with loss, for example, the packet either shows up at the receiver or it does not, while with delay there are many shades of possibility and meaning in the time required for a packet to arrive. Likewise, delay variation is potentially the richest source of information about the network, as one of the principal elements contributing to delay is queueing within the network, which is of vital importance in understanding how network capacities evolve over time.

Any accurate assessment of delay must first deal with the issue of clock accuracy, as all delay measurement stems from clock measurements. Unless we tightly calibrate the clocks used for delay measurement, or, equally important, recognize which clocks cannot be well calibrated and discard the corresponding measurements, we cannot know that the subsequent analysis reflects true network behavior and not spurious or misleading clock artifacts. It was these considerations that led us to the lengthy efforts developed in Chapter 12.

We proceed as follows. In § 16.1 we briefly discuss round-trip time (RTT) variation in our measurements, which plays a central role in transport protocol behavior. From the point of view of network path analysis, however, a packet's one-way transit time (OTT) is more fundamental, particularly since RTT measurements conflate delays along the forward and reverse path. Consequently, we devote the remainder of the chapter to OTT analysis. In § 16.2, we discuss OTT variation in large-scale terms. We then in § 16.3 turn to packet timing compression---network events in which a group of packets arrive at the receiver more closely spaced together than when they were sent.
Compression is a significant event because it introduces potentially misleading discrepancies between the timing of events at the sender and at the receiver, clouding the ability of one endpoint to assess conditions perceived at the other. In § 16.4 we then tackle estimation of the amount of queueing packets encounter during their transit. We attempt to determine the time scales associated with queueing, but find wide variation. Finally, in § 16.5 we look at the relationship between queueing delays and available bandwidth---the transfer rate the network can sustain for a connection, given the network's current load.

16.1 RTT variation

16.1.1 The role of RTTs

A transport connection's round-trip time (RTT) plays a central role in the connection's behavior. First, a reliable transport protocol such as TCP needs to decide how long to wait for an acknowledgement of data it has sent before retransmitting the data. There is a basic tension between wanting to wait long enough to assure that the protocol does not retransmit unnecessarily, versus not wanting to wait too long so as to unduly delay the connection when in fact retransmission is needed. Our analysis of the Solaris 2.3/2.4 TCP in § 11.5.10 highlights how unfortunate it can be to err on the side of retransmitting too quickly. Network researchers have made considerable efforts in studying how to set a connection's retransmission timeout (RTO), and early problems with TCP's RTO computation identified by Zhang [Zh86] have for the most part been rectified by the work of Karn and Partridge in eliminating ambiguous RTT measurements [KP87], and by that of Jacobson in introducing exponentially-weighted moving averages to estimate both RTT and its variance [Ja88].

The second way in which a connection's RTT influences the connection's behavior concerns the important notion of bandwidth-delay product (BDP). A connection's BDP is the product of ρ_A, the available bandwidth, measured in bytes/sec, with τ, the RTT, measured in seconds. The result is a number B = ρ_A · τ of bytes indicating how much data the connection must have in flight to fully utilize the available bandwidth. A simple way to understand this relationship is to consider that, to fully utilize the available bandwidth, the connection must send ρ_A bytes every second, and thus it must send ρ_A · τ bytes every round-trip time in order to achieve this goal. A round-trip time, however, exactly corresponds to one cycle of send-and-receive feedback. This relationship, in turn, is directly reflected in the connection's window (§ 9.2.2)---the current window controls how much data the connection can have in flight at any given moment, and the window can only change due to feedback for the currently in-flight packets after one RTT has elapsed, since no feedback can arrive sooner than that. Thus, B gives the size of the window the connection must use to fully utilize a bandwidth of ρ_A.

We must, however, make a crucial distinction between these two different roles of RTT in a connection's behavior. For the first role, regulating retransmission, the RTT of interest is how long it might take for a packet to reach the receiver and the corresponding acknowledgement to return to the sender---the maximum RTT. For the second role, the RTT of interest is the minimum time required for packets to traverse the network path to the receiver and for acks to return. The larger values possibly observed for the actual amount of time required in general reflect queueing along the network path. It does not improve a connection's throughput to use such a larger RTT when computing B; it instead only adds to queueing along the path. This observation motivated the development of TCP Vegas [BOP94], in which a significant increase in measured RTT is interpreted as due to using too large a window and adding to queueing along the path, and thus calling for a decrease in the window size to diminish the queueing.

16.1.2 RTT measurement considerations

When discussing RTTs, we must bear in mind that larger packets require larger transmission times, inversely proportional to the bottleneck bandwidth. The effect, naturally, is most apparent on slow links. Accordingly, we need to make sure we do not confuse RTT variation due to packet size with RTT variation due to queueing.
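To make these two quantities concrete, a small sketch with illustrative numbers (not measurements from our datasets): the per-packet transmission time at the bottleneck is the packet size divided by the bottleneck bandwidth, while the window needed to fill the path is B = ρ_A · τ.

    # Sketch with illustrative numbers only.
    def transmission_time(pkt_bytes, bottleneck_bytes_per_sec):
        return pkt_bytes / bottleneck_bytes_per_sec

    def bandwidth_delay_product(avail_bytes_per_sec, rtt_sec):
        return avail_bytes_per_sec * rtt_sec        # B = rho_A * tau

    # On a 7.3 Kbyte/sec bottleneck, a 512-byte data packet adds about 70 msec
    # of transmission time, while a 40-byte ack adds about 5 msec -- a
    # difference that must not be mistaken for queueing delay.
    print(transmission_time(512, 7300), transmission_time(40, 7300))

    # A path with 100 Kbyte/sec available bandwidth and a 100 msec RTT needs
    # about 10 Kbyte in flight to stay fully utilized.
    print(bandwidth_delay_product(100_000, 0.1))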
The larger values possibly observed for the actual amount of time required in general reflect queueing along the network path. It does not improve a connection's throughput to use such a larger RTT when computing B; it instead only adds to queueing along the path. This observation motivated the development of TCP Vegas [BOP94], in which a significant increase in measured RTT is interpreted as due to using too large a window and adding to queueing along the path, and thus calling for a decrease in the window size to diminish the queueing. 16.1.2 RTT measurement considerations When discussing RTT times, we must bear in mind that larger packets require larger transmission times, proportional to the bottleneck bandwidth. The effect, naturally, is most apparent on slow links. Accordingly, we need to make sure we do not confuse RTT variation due to packet 325 size with RTT variation due to queueing. Another consideration is that, if we measure RTT as simply the difference in time be­ tween when a packet is sent and when a corresponding reply returns, then we will include in the measurements ``response delays'' at the receiver (x 11.6.4). For many purposes, doing so is appro­ priate, since the roles played by RTT above both concern quantifying the feedback time scale, and this includes both the network's delays and those of the receiver. If, however, we wish to discuss only the network's contribution to the feedback time scale, then we need to deduct the response delay from the measured RTT. tcpanaly can do this since it knows how to pair packets with their responses. However, we argue that the network's contribution to delay is best studied in terms of one­way transit times (OTT), since doing so allows for the possibility of asymmetries along the two directions of the network path, which we find in x 16.2.3 are in fact common. So, for our RTT analysis, we do not deduct response delays from the measurements, that we might study the entire ``closed loop.'' Finally, we note that RTT can be measured in two different ways: as the amount of time elapsed between when a TCP sends a packet and when it receives an acknowledgement in response to that packet, or as the time between when a TCP sends an acknowledgement and when it receives the packet liberated by that acknowledgement (x 11.3.1). As we might expect, overall we find these two values to be very close to one another, except for variations due to ``response delays'' (x 11.6.4). (They also can appear different if the clocks at the sender and receiver run at significantly different rates, per x 12.7.7.) In the remainder of this section, we confine our analysis to RTTs measured at the sender. 16.1.3 RTT extremes Extremes of network behavior are always interesting to consider, since they sometimes challenge the assumptions made by our mental models of how networks ``really'' work. For example, some might find RTTs larger than a few hundred milliseconds exceedingly unlikely---where could a packet spend all that time?---and thus best treated as pathological events rather than part of the regime we must accommodate as ``normal.'' (We saw how dangerous this can be in Figure 11.9.) Our data is inappropriate for exploring the full range of RTTs one finds in the Internet, since the set of sites in our study is small, and we would expect RTT extremes to be governed for the most part by geography. 
This is especially the case for network paths that include satellite links, as these can add hundreds of milliseconds due to the propagation delays up to and back down from the satellite. However, while geography certainly dominates upper RTT extremes, it is not the only factor. To our surprise, we found that one site in our study, oce, experiences extremely high delays for many of its connections. 50% of its connections had a minimum RTT of over 1 sec. oce is sited in the Netherlands. One striking connection came from wustl in North America. It never observed an RTT less than 4.4 sec! 1 Another came from unij, never experiencing an RTT below 2.3 sec---yet unij is also in the Netherlands! A traceroute from unij to oce reveals that the route stays wholly within the Netherlands. Furthermore, it shows that all of the delay occurs at the hop between NLNet, the Netherlands Internet backbone, and the oce site itself. The 1 Alas, wustl is a Solaris 2.4 site. Its RTO timer had great difficulty accommodating the large RTT, per Figure 11.9. During the first minute of the connection, before the timer finally adapted, it sent 31 new data packets and 51 retransmis­ sions, all but one unnecessary. One packet was retransmitted seven times! 326 cause of this large delay, which we discussed in x 12.7.8, remains unexplained, despite investigative efforts by staff at the oce site. It highlights, however, how commonplace---and often correct--- assumptions concerning network behavior can be violated in unexpected ways. Even after eliminating oce, we still find some striking RTT extremes. Connections in­ volving austr2 experienced minimum RTTs as high as 1.85 sec (to a host in California). 2 If we remove austr2, then, curiously, the next highest extremes involved not international traffic but connections with both endpoints in the United States. One, from wustl to adv, never saw an RTT lower than 1.2 sec, even though a connection ten minutes earlier had a minimum of 156 msec, and one 25 minutes later was back to the typical value of 47 msec. Unfortunately, we do not have a traceroute measured right at the time of the anomalous connection. Ones fifteen minutes earlier and 80 minutes later show no anomalies and both report an RTT of about 44 msec. The most extreme RTT connection in N 1 involved not korea, for which we might ex­ pect high RTTs (and, indeed, it had plenty), but nrao and bsdi, in Virginia and Colorado. This connection had a minimum RTT of 1.4 sec and a median value of 2.1 sec. While in x 6.9 we gave an example of a circuitous route involving bsdi, traceroute reported its RTT as only about 160 msec, much less than observed by this connection; so, we do not have an explanation for what took the packets so long. So far in this section we have focussed on the minimum RTT observed during a connec­ tion, which is important for correctly determining B, the bandwidth­delay product. For computing RTO, the connection's retransmission timeout, we instead are interested in the maximum RTT, which we now look at briefly. (As discussed in x 15.6, we do not undertake a detailed analysis of how we might modify TCP's RTO algorithms to increase their performance, as this is a complex problem.) We would expect that RTT maxima can rise very high for connections with slow bottle­ neck links and many available buffers at the bottleneck. In such cases, the sending TCP will not receive a packet loss signal until it has exhausted the available buffer. 
For a slow link, a significant amount of buffer can translate into a huge delay as packets finally wend their way through the queue. The largest apparent RTT we ever observed was 23.8 sec, for a SYN packet and its accom­ panying SYN­ack. This was not, however, a true RTT: the receiving SunOS 4.1 TCP generating the SYN­ack was retransmitting it in an attempt to establish the connection, and its timer backed off first to 6 sec and then to 24 sec. At the same time, the sender, also a SunOS 4.1 TCP, was backing off its retransmission timer for the original SYN. The two timers were slightly out of phase. Consequently, just before the sender reached the 24 sec retransmission, a SYN­ack arrived from the receiver, lead­ ing to the huge apparent RTT. We mention this anomaly because some modifications to TCP such as Hoe's in [Ho96] suggest using the RTT timing for the SYN packet as a quick estimate of the path's true RTT. Such schemes must take care not to get fooled by SYN­ack retransmissions. In this particular case, use of Karn's algorithm would have discarded the RTT measurement as ambiguous [KP87]. However, had the retransmitted SYN­ack arrived just before the first retransmission of the SYN (i.e., just before the 6 sec timer expired), then even Karn's algorithm would have accepted the measurement, since the algorithm is predicated upon the assumption that acks are not retransmitted. Finally, we note that Hoe's scheme uses the RTT to estimate B, the bandwidth­delay product. Using a value of 6 sec instead of the correct value of 220 msec would grossly overestimate B, leading to the connection overestimating the window it should use. Hoe's scheme, however, could be easily modified to use a more robust initial RTT estimate, since it does not make any decisions based on 2 austr2, alas, is also a Solaris 2.4 site : : : 327 B until it has received a flight of 3 closely spaced acks. At that point, there should have been ample opportunity to estimate RTT better. Putting aside anomalies due to SYN­ack retransmissions, we find that the largest true RTT in our study was 15.1 sec, for a connection involving oce. We discussed above oce's peculiarly large RTTs, and in x 12.7.8 the puzzling interplay between the transit times of packets and acks in its connections, so we will not further analyze oce­connection RTTs here. If we eliminate oce, then we find the next largest RTT comes from the 12­second packet reordering event discussed in x 13.6. Putting aside this pathology, we finally find a ``normal'' extreme RTT, not due to any unusual network dynamics, of 7.9 sec (involving a connection to lbli, which has a low­speed Internet link with a lot of buffer space). A few others range above 6 sec, including one from a high­speed connection between sintef2 in Norway and austr in Australia. 16.1.4 RTT variation during a connection Another way to characterize RTT extremes is in terms of the variation we observe in RTT over the course of a connection. Our interest lies in whether we can develop a ``rule of thumb'' such as ``it is rare to observe a maximum RTT more than double the minimum RTT.'' This sort of empirical finding would aid in considering how transport protocols can best adapt to network conditions. We first note that connections with slow bottlenecks can often experience great swings in RTT as their own packets pile up at the queue for the bottleneck (x 16.1.3). 
While such connections are an important consideration for general-purpose transport protocols, for our purposes we eliminate any connection with an estimated ρ_B less than 100 Kbyte/sec, so that we might focus on RTT variations not heavily dominated by the connection's own behavior. We also eliminate connections between sintef1 and sintef2, as they are sited very close together and thus much more easily exhibit large relative swings in RTT, even though in absolute terms the swings are quite small.[3] After these eliminations, in N2 we are left with 12,486 connections.

[3] The pair lbl and lbli do not exhibit this problem because lbli's low-bandwidth ISDN link leads to fairly large RTTs between the two sites.

Figure 16.1 shows the distribution of the ratio between their maximum RTT and minimum RTT, log-scaled. We compute RTTs from the TCP sender's perspective, using the time required to receive an acknowledgement for a full-sized packet. The distribution shows great variation, with a median ratio of 2.2:1 (mean of 3:1), but the upper 5% have ratios of 6.7:1 and higher. The entire upper 50% fits closely to a Pareto distribution with α = 2.1, shown with a log-log complementary distribution plot in Figure 16.2 (we discussed these plots in § 15.3). A value of α > 2 means that the ratio has finite variance, and this is probably due to the fact that the maximum RTT is bounded by the amount of buffer space available along the network path. However, the great degree of variation means that, without additional information, we cannot accurately predict the relationship between the minimum and maximum RTTs.

The ratios exhibit one other striking distribution. If we instead consider the ratio of the minimum RTT to the maximum, then the corresponding distribution is very nearly normal. Figure 16.3 shows this distribution, with a normal distribution fitted to the mean and variance shown by a dotted line. Figure 16.4 shows a Q-Q plot of the same fitted normal, with the line corresponding to slope 1 and offset 0. Clearly, the agreement is quite good except in the tails. Unfortunately, an

[Figure 16.1: Distribution of the ratio between a connection's maximum RTT to minimum RTT. X-axis: ratio of max RTT to min RTT, 1 to 100, log-scaled; y-axis: cumulative probability.]
[Figure 16.2: Log-log complementary distribution plot of max-min RTT ratio. X-axis: ratio of max RTT to min RTT, 5 to 100; y-axis: P[X >= x], 0.0001 to 1.]
[Figure 16.3: Distribution of inverse ratio (minimum RTT to maximum RTT). X-axis: ratio of min RTT to max RTT, 0.0 to 0.8; y-axis: cumulative probability.]
[Figure 16.4: Q-Q plot of the min-to-max RTT ratio against the fitted normal distribution.]
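The two ratio distributions examined above can be computed with a short sketch; the per-connection RTT samples and their keying by connection id are assumed input formats for the illustration, not the thesis's data layout.

    # Sketch: per-connection RTT extremes and the two ratios of this section.
    import statistics

    def rtt_ratios(rtts_by_conn):
        max_over_min, min_over_max = [], []
        for rtts in rtts_by_conn.values():
            lo, hi = min(rtts), max(rtts)
            max_over_min.append(hi / lo)
            min_over_max.append(lo / hi)
        return max_over_min, min_over_max

    def normal_fit(xs):
        # mean/standard deviation fit used for the dotted curve in Figure 16.3
        return statistics.mean(xs), statistics.stdev(xs)

A Q-Q comparison in the spirit of Figure 16.4 would then plot the sorted min/max ratios against the corresponding quantiles of this fitted normal, with close agreement appearing as points along the line of slope 1 and offset 0.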
Unfortunately, an interpretation for this normal fit (or for the corresponding Pareto fit) eludes us. As with the elusive exponential fit to data packet loss rates (§ 15.2), we mention the fit here in hopes that it might stimulate further research.

We finish with a look at less extreme RTT variation: the interquartile range (75th percentile minus 25th percentile), IQR. This range gives a much more robust statistic, in the sense of being insensitive to extreme values. We are particularly interested in IQR as an aid in estimating the maximum RTT, as this has immediate applications for computing retransmission timeouts (RTOs). Figure 16.5 shows the distribution of IQR, and Figure 16.6 shows the distribution if we normalize to the minimum RTT. Both plots use a logarithmic scale on the x-axis. We see a wide range of variation: the lower and upper 5% tails of the absolute range span 6 msec up to 106 msec, and, with normalization, the same tails range from a factor of 0.046 up to a factor of 1.23.

Figure 16.5: Distribution of RTT interquartile range.
Figure 16.6: Distribution of RTT interquartile range, normalized to minimum RTT.

The interquartile range is in many ways analogous to a robust version of the standard deviation [Ri95]. Consequently, we interpret the wide range of variation as supporting the argument that RTT estimation (for purposes of computing timeouts, for example) must include a notion of variation in addition to estimating the mean or minimum value. Jacobson's estimator does exactly this for TCP [Ja88].
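For concreteness, the following sketch shows the usual form of Jacobson's estimator [Ja88]: exponentially weighted moving averages of the RTT and of its mean deviation, combined into a retransmission timeout. The gains of 1/8 and 1/4 and the variation multiplier of 4 are the conventional constants; the class below is a minimal illustration of the published algorithm, not the behavior of any particular TCP implementation in our study.

    class JacobsonRtoEstimator:
        """Minimal sketch of Jacobson's RTT/RTO estimator [Ja88]:
        EWMAs of the smoothed RTT and of its mean deviation."""

        def __init__(self, g=1/8, h=1/4, k=4):
            self.g = g          # gain for the smoothed RTT
            self.h = h          # gain for the mean deviation
            self.k = k          # variation multiplier in the RTO
            self.srtt = None    # smoothed RTT estimate (seconds)
            self.rttvar = None  # smoothed mean deviation (seconds)

        def update(self, measured_rtt):
            if self.srtt is None:           # first sample initializes both terms
                self.srtt = measured_rtt
                self.rttvar = measured_rtt / 2
            else:
                err = measured_rtt - self.srtt
                self.srtt += self.g * err
                self.rttvar += self.h * (abs(err) - self.rttvar)
            return self.rto()

        def rto(self):
            return self.srtt + self.k * self.rttvar

    if __name__ == "__main__":
        est = JacobsonRtoEstimator()
        for rtt in [0.110, 0.120, 0.115, 0.300, 0.130, 0.125]:
            print(f"sample={rtt:.3f}s  rto={est.update(rtt):.3f}s")

Because both the smoothed RTT and the deviation term adapt as samples arrive, a single delay spike (the 300 msec sample above) inflates the timeout only temporarily.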
In Figure 16.2 we found that maximum RTTs often are much larger than minimum RTTs. We might wonder, though, whether this discrepancy can be reduced if expressed in terms of RTT variation. For example, it could be the case that the maximum is generally less than n times IQR above the minimum. Unfortunately, this does not appear to be the case. Figure 16.7 shows the distribution of the difference between the maximum and minimum RTT, normalized by dividing by IQR. Again, the x-axis is scaled logarithmically, indicating a wide range of variation. Furthermore, normalization has diminished but not eliminated the Pareto distribution for the upper tail. Instead of occupying a full 50% of the distribution, it now occupies the upper 20%, with α = 1.84, within the domain of infinite variance. Finally, these results do not change appreciably if we look at the normalized difference between the maximum RTT and the median RTT, rather than the minimum RTT.

Figure 16.7: Distribution of the difference between maximum RTT and minimum RTT, normalized by the interquartile range.

From Figure 16.7 it appears that the combination of minimum RTT and interquartile range is inadequate for estimating the maximum RTT. TCP RTO estimation is based on similar information, i.e., the estimated RTT mean and standard deviation. Yet, we should not conclude from this that TCP's estimation algorithm cannot work, because the algorithm updates its estimates as the connection progresses, using exponentially weighted moving averages to incorporate new information. Consequently, it has opportunities to adapt, while the preceding analysis is static. Again, as discussed in § 15.6, we do not undertake here a detailed analysis of how well TCP's RTT estimation algorithm performs, as doing so involves a number of subtle issues.

16.2 OTT variation

For the remainder of this chapter, we focus on one-way transit times (OTTs). Any accurate assessment of delay must first deal with the issue of clock accuracy, from which all delay measurement stems. This problem is particularly pronounced when measuring OTTs, since doing so involves comparing measurements from two separate clocks. It was primarily to this end that we undertook the efforts described in Chapter 12, aimed at assuring that we can soundly gauge the trustworthiness of the packet timestamps. All of the subsequent analysis was done only after first using those algorithms to reject or adjust traces with clock errors.
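One reason variations in OTT remain measurable even when the absolute OTT values are suspect is that a fixed offset between the two hosts' clocks shifts every measured OTT by the same constant, leaving variation measures such as the IQR or the max-min spread unchanged, whereas a clock adjustment or skew during a connection does distort them. The toy example below, which is ours and not taken from the thesis, illustrates the point with synthetic OTTs.

    import random

    def iqr(xs):
        """Interquartile range via simple order statistics."""
        s = sorted(xs)
        n = len(s)
        return s[(3 * n) // 4] - s[n // 4]

    random.seed(2)
    # Synthetic "true" OTTs: 40 msec base delay plus exponential queueing (seconds).
    true_otts = [0.040 + random.expovariate(1 / 0.005) for _ in range(1000)]

    offset = 17.3     # constant clock offset between the two hosts (seconds)
    skew = 1e-4       # clock drift per packet interval (seconds)

    with_offset = [t + offset for t in true_otts]
    with_skew = [t + offset + skew * i for i, t in enumerate(true_otts)]

    print("IQR true      :", round(iqr(true_otts), 6))
    print("IQR w/ offset :", round(iqr(with_offset), 6))  # identical: the offset cancels
    print("IQR w/ skew   :", round(iqr(with_skew), 6))    # inflated: skew does not cancel

Under this view, it is the detection of clock adjustments and skew, rather than absolute synchronization, that matters for the variation analysis that follows.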
OTT variation was previously analyzed by Claffy and colleagues in a study of four Internet paths [CPB93a]. They found that mean OTTs are often not well approximated by dividing RTTs in half, and that variations in the paths' OTTs are often asymmetric. From our data we cannot confirm their first finding, but we discuss the asymmetry finding shortly.

16.2.1 Why we do not analyze OTT extremes

We do not investigate extreme OTT variation, as we did for RTTs in § 16.1.3, for two reasons. First, most of the RTT extremes are due to network delays, and, in particular, to extreme OTTs, so the OTT results are very similar to the RTT results. (This would not have been the case if RTT extremes were due to delays by the TCP endpoint, or to combined increases in delay along the two directions of the network path; but neither of these is the dominant effect.) Second, our absolute OTT values were derived using the approximation that we could rectify clocks in our study by dividing RTTs in half (Eqn 12.5 in § 12.5.1). We know from the Claffy et al. study, and from our earlier results on routing asymmetries (§ 8), that this approximation is often erroneous, and we noted in the derivation that consequently we must refrain from analyzing the absolute OTT values themselves.

16.2.2 Range of OTT variation

Our measurements do, however, let us accurately assess variations in OTT. In doing so, we will always distinguish between ack OTTs and data packet OTTs, as we expect the latter to show significantly more variation due to their queueing load. Figure 16.8 shows the distributions of IQR and max-min variations in OTTs for N_2 data packets and acks. Again, we have limited our analysis to connections with a bottleneck bandwidth exceeding 100 Kbyte/sec, and have removed those between sintef1 and sintef2.

Figure 16.8: Distribution of interquartile and max-min OTT variation (curves: Ack IQR, Data IQR, Ack Max-Min, Data Max-Min).

The x-axis reflects logarithmic scaling; so, as with many aspects of RTT variation, we see a wide range of variation. For example, for data packets the median ratio between the max-min variation and IQR is 3.5:1, and the upper 5% tail exceeds 13:1. For acks, the numbers are higher, the median being 5:1 and the upper 5% tail at 29:1. The difference lies in data packets having a larger IQR to begin with, due to OTT variation caused by the connection's own queueing; for acks, IQR is fairly tame, so the same absolute OTT extreme will be relatively larger when compared to the IQR. As with normalized RTT variation (Figure 16.7), much of the distribution of the ratio between maximum OTT variation and IQR fits well to a Pareto distribution, for both data packets and acks. Here, the fit is to the entire upper 50% of the distribution, and the α's are well below 2, reflecting sometimes enormous variation.
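The per-connection statistics summarized in Figure 16.8 are straightforward to compute once each packet's OTT is in hand. The sketch below, with hypothetical OTT values and our own function names, computes the IQR, the max-min spread, and their ratio separately for a connection's data packets and its acks.

    def quartiles(xs):
        s = sorted(xs)
        n = len(s)
        return s[n // 4], s[(3 * n) // 4]

    def ott_variation(otts):
        """Return (IQR, max-min spread, ratio of max-min to IQR) for a list of OTTs."""
        q1, q3 = quartiles(otts)
        iqr = q3 - q1
        spread = max(otts) - min(otts)
        return iqr, spread, (spread / iqr if iqr > 0 else float("inf"))

    def summarize_connection(data_otts, ack_otts):
        for name, otts in (("data", data_otts), ("ack", ack_otts)):
            iqr, spread, ratio = ott_variation(otts)
            print(f"{name:4s}  IQR={iqr*1e3:6.2f} ms  max-min={spread*1e3:6.2f} ms  "
                  f"ratio={ratio:4.1f}:1")

    if __name__ == "__main__":
        # Hypothetical OTTs (seconds) for one connection's data packets and acks.
        data_otts = [0.052, 0.055, 0.049, 0.071, 0.090, 0.054, 0.060, 0.120, 0.051, 0.058]
        ack_otts  = [0.048, 0.049, 0.050, 0.048, 0.051, 0.049, 0.095, 0.050, 0.049, 0.048]
        summarize_connection(data_otts, ack_otts)

In this made-up example the ack OTTs have a small IQR, so a single late ack produces a large max-min/IQR ratio, the same effect described above for the measured acks.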
16.2.3 Path symmetry of OTT variation

We now turn to the relationship between OTT variation on the forward path and that on the reverse path. For N_2, we find that the coefficient of correlation between the max-min OTT variations of the data packets and the corresponding acks is about 0.1, which is quite weak, though not negligible. For IQR, it drops to 0.06, and for the max-min variation divided by IQR, it drops still further, to 0.02. However, these statistics do not tell the whole story. As noted above, the forward path is often perturbed by the queueing load of the connection's data packets. We can instead look at OTT variation for only unloaded packets (where a packet is considered unloaded if it does not satisfy Eqn 15.5). Such packets did not queue behind their predecessors, unless cross traffic delayed their predecessors.
If we analyze only unloaded packets on the forward path, then the coefficient of correlation between the IQRs of the forward and reverse variations rises to 0.18, considerably more substantial. The coefficient for the logarithms of the IQRs is 0.55, indicating that the order of magnitude of the variation along one path is a good predictor of the order of magnitude of the variation encountered along the other. (If we normalize the IQRs by the round-trip times, these coefficients do not change much, rising to 0.22 and falling to 0.50, respectively.) Figure 16.9 shows a scatter plot of the forward path IQR variation for unloaded data packets, versus the ack IQR variation. Note that both axes are log-scaled.

Figure 16.9: Scatter plot of interquartile ranges of unloaded data packet OTT variations (x-axis) versus acks (y-axis).

The correlations appear to indicate that delay variations along both directions of an Internet path are indeed coupled, albeit weakly. However, we must investigate a bit further. It could be the case that only some Internet paths have coupled variations, while most do not. In particular, we found in § 15.1 that European sites have higher loss rates than those in the United States, and that the paths from Europe to the U.S., and, particularly, from the U.S. to Europe, have the highest loss rates. So it could easily be that traffic between the U.S. and Europe, which traverses in each direction the highly congested trans-Atlantic links, experiences similar delay variations in both directions, while other traffic does not. To test this effect, we repeated the above analysis with only those N_2 connections between two sites in the U.S. We found that the correlations were only slightly weaker, indicating that the effect has only a mild influence.

In summary: if we know the OTT variation along one direction of a path, then we can fairly well predict the order of magnitude of the variation along the other direction. Predicting the variation to a finer degree is difficult. However, if we are interested not in the intrinsic delays along the path, but in the delays actually experienced by a TCP connection, which include variations induced by the connection's own load (i.e., its packets queueing behind their predecessors), then prediction is very difficult: the two directions are nearly uncorrelated.
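The coupling just described can be quantified with an ordinary Pearson coefficient of correlation, computed either on the raw IQRs or on their logarithms (the latter asks only whether the orders of magnitude agree). The sketch below uses made-up per-connection values in place of the measured ones.

    import math

    def pearson(xs, ys):
        """Pearson coefficient of correlation between two equal-length samples."""
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        vx = sum((x - mx) ** 2 for x in xs)
        vy = sum((y - my) ** 2 for y in ys)
        return cov / math.sqrt(vx * vy)

    # Hypothetical per-connection IQRs (seconds) of unloaded-data-packet OTT
    # variation (forward path) and ack OTT variation (reverse path).
    fwd_iqr = [0.002, 0.015, 0.0008, 0.040, 0.006, 0.090, 0.003, 0.012]
    rev_iqr = [0.003, 0.009, 0.0012, 0.025, 0.020, 0.060, 0.001, 0.030]

    print("correlation of IQRs      :", round(pearson(fwd_iqr, rev_iqr), 2))
    print("correlation of log(IQRs) :",
          round(pearson([math.log(x) for x in fwd_iqr],
                        [math.log(y) for y in rev_iqr]), 2))

Taking logarithms before correlating is what turns the question from "do the magnitudes agree?" into "do the orders of magnitude agree?", which is the weaker form of predictability found above.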
16.2.4 Relationship between loss rate and OTT variation

It is natural to expect that delay variation might be closely correlated with packet loss, because, whenever packets are delayed in the network, they must be stored somewhere, and that storage has finite capacity. Thus, if delay climbs high enough, loss ensues as buffers become exhausted. However, this relationship can be obscured if routers have enough buffers to absorb considerable delay variations. It can also be obscured because delay variation derives from the end-to-end concatenation of variations at each hop along a path, while loss is presumed to be governed by one or perhaps a few overloaded elements along the path. Hence, many elements will contribute to delay variation but not to loss.

To investigate the relationship between delay variation and loss, we look at how the IQR of ack OTT variation correlates with the loss rate experienced by the acks. (We confine our analysis to acks to avoid the complications introduced by higher data packet loss rates due to the load they present to the forward path, per § 15.2.) Overall, we find a coefficient of correlation of 0.22, indicating a definite, but not overly strong, linkage. However, much of the linkage comes from low OTT variation being coupled with experiencing no loss, a situation we referred to in § 15.1 as ``quiescence.'' If we confine our analysis to those connections experiencing at least one loss (``busy''), then the correlation drops to 0.12.

Figure 16.10 shows the corresponding scatter plot. The plot shows some apparent structure: the region corresponding to a very low loss rate (on the y-axis) appears separate from the rest of the plot. However, this difference is a granularity artifact. The log scale highlights the difference between losing a single ack and losing two acks, since the latter corresponds to twice the ack loss rate of the former.

Setting aside this artifact, we conclude that there is no strong relationship between OTT variation and loss rate. If we log-transform both the IQR and the loss rate, then the correlation climbs to 0.35, indicating that the order of magnitude of the IQR is a fairly good predictor of the order of magnitude of the loss rate, but nothing finer. These statistics are virtually unchanged if we confine our analysis to connections between U.S. sites, so the effect is not being skewed by the trans-Atlantic or European sites, which differ in their loss patterns (§ 15.1). Finally, if we normalize the delay variation IQR by the connection's round-trip time, then the correlation decreases, and, for ``busy'' connections, the two become essentially uncorrelated. This weakness is consistent with the effects noted above: routers having large amounts of buffer space, or the end-to-end chain accruing a number of small variations into a single, considerably larger variation.

16.2.5 Evolution of OTT variation

We now look briefly at how OTT variation evolves with time. To do so, we follow the methodology used in § 15.5 to assess how loss rates evolve with time. For each connection c between the same source and destination, we compute the pair ⟨ΔT_c, |Δσ_c|⟩, where ΔT_c is the time between that connection and the next successful connection, c′, we observed along that path; and |Δσ_c| is the absolute value of the difference between the IQRs of the ack OTT variations for c and c′, where each IQR is normalized by the connection's round-trip time.
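The pair construction just described can be sketched as follows. This is an illustrative rendering, not the thesis's analysis code; the per-connection record fields (source, destination, start time, and normalized ack OTT IQR) are assumed, and unsuccessful connections are assumed to have been filtered out already.

    # Illustrative sketch: build the <Delta T_c, |Delta sigma_c|> pairs from
    # successive successful connections along the same source/destination path.
    from collections import defaultdict

    def build_evolution_pairs(conns):
        """conns: iterable of dicts with keys 'src', 'dst', 'start_time' (sec),
        and 'norm_ack_ott_iqr' (IQR of ack OTT variation divided by the RTT)."""
        by_path = defaultdict(list)
        for c in conns:
            by_path[(c['src'], c['dst'])].append(c)

        pairs = []  # (Delta T_c, |Delta sigma_c|)
        for path_conns in by_path.values():
            path_conns.sort(key=lambda c: c['start_time'])
            for cur, nxt in zip(path_conns, path_conns[1:]):
                delta_t = nxt['start_time'] - cur['start_time']
                delta_sigma = abs(nxt['norm_ack_ott_iqr'] - cur['norm_ack_ott_iqr'])
                pairs.append((delta_t, delta_sigma))
        return pairs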
[Figure 16.10: Scatter plot of ack loss rate (y-axis, 0.005-0.5) versus interquartile ack OTT variation (x-axis, 0.001-1.0 sec), for N_2 connections that lost at least one ack.]
[Figure 16.11: Evolution of how the interquartile range of normalized ack OTT variation differs with time. X-axis: interval between connections (sec), 10^2-10^6; y-axis: magnitude of difference between normalized ack OTT variation (IQR), 0.04-0.07.]

After constructing these pairs, we sort them on ΔT_c and then use an exponentially-weighted moving average (EWMA) with α = 0.01 to smooth how |Δσ_c| evolves as a function of ΔT_c.
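The sorting and smoothing step might look like the following sketch, which consumes the pairs built above; the function name is assumed, and the choice of initial value is discussed in the next paragraph.

    def ewma_smooth(pairs, alpha=0.01, init=0.04):
        """Sort the (Delta T_c, |Delta sigma_c|) pairs on Delta T_c and smooth
        the second component with an exponentially-weighted moving average.
        Returns parallel lists of Delta T_c values and smoothed |Delta sigma_c|."""
        pairs = sorted(pairs)            # sorts on Delta T_c, the first element
        times, smoothed = [], []
        avg = init                       # 0.04 was used for the final computation
        for delta_t, delta_sigma in pairs:
            avg = alpha * delta_sigma + (1.0 - alpha) * avg
            times.append(delta_t)
            smoothed.append(avg)
        return times, smoothed

With α = 0.01, each new pair contributes only 1% to the running average, so the smoothed value at any given ΔT_c reflects roughly the hundred or so preceding pairs.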
We first computed the EWMA with an initial value of 0, but inspecting the resulting plot indicated that, even for very small ΔT_c's, |Δσ_c| was around 0.04, so we used 0.04 as the initial value in our final computation.

Figure 16.11 shows how the smoothed |Δσ_c| evolves with time. The horizontal line corresponds to the median normalized ack OTT interquartile range for a single connection: a bit over 5% of the RTT. Note that the y-axis ranges only from 0.04 to 0.07. Thus, the change in normalized variation slowly ranges from a bit below the median variation to a bit above, across a wide range of time scales.

Figure 16.12 shows the same plot except for ``raw'' ack OTT variations, that is, the IQR of the variations without normalizing by the connection's round-trip time. Again, we see a rapid rise followed by a slowly increasing regime between 6-9 msec (keep in mind that this plot is heavily averaged; some paths have IQR variations far higher than 10 msec). The horizontal line corresponds to the median IQR variation for a single connection---just under 6 msec---which is quickly exceeded.

Since even the minimum |Δσ_c| is not a great deal below the median normalized OTT variation, and the raw IQR differences rapidly exceed the median raw OTT variation, we conclude that a connection's ack OTT variation is not a very good predictor of future variation. This compares with Figure 15.18, which shows that a connection's loss rate is not a very good predictor of its future loss rate, either. Both figures argue that caching detailed network path information will prove beneficial only in the near term, meaning on the order of a few minutes into the future.
Figure 16.12: Evolution of how the interquartile range of raw ack OTT variation differs with time. (Axes: interval between connections, in seconds, versus magnitude of the difference between raw ack OTT variation (IQR).)

16.2.6 Removing load from OTTs

In § 15.2 we developed the notion of "loaded" data packets, namely those which would have to queue behind their predecessors at the bottleneck due to the spacing between the time of their transmission and that of their predecessors. In this section we look at the subtle problem of removing the packet's load, as given by Eqns 15.3 and 15.4.

The main problem we face in doing so is that the estimated bottleneck bandwidth given by Eqn 14.12 in Chapter 14 is inexact. In particular, our methodology produces an error range associated with the estimate. Depending on which value within this range we use, Eqns 15.3 and 15.4 (or, more accurately, their counterparts for the particular estimate we use) can in some circumstances produce considerably different values for a packet's load. Thus, if we subtract that load from the packet's OTT, we can easily under- or over-estimate the packet's "true" OTT, meaning its OTT if it did not have to queue behind its predecessors at the bottleneck.

We can partially address this uncertainty using a self-consistency check for the estimated bottleneck bandwidth. In particular, we can test the soundness of the central estimate of the bottleneck bandwidth, ρ_B, as follows. We first compute for each connection ΔQ, the difference between the minimum and maximum OTTs for the connection's loaded data packets. (The difference is presumably due to queueing, hence the notation ΔQ.) We then subtract out each packet's load (as given by Eqns 15.3 and 15.4, using the central estimate ρ_B), and compute ΔQ̃, the difference between the minimum and maximum adjusted OTTs. ΔQ̃ is thus the counterpart to ΔQ for the loaded packet OTTs, after adjusting for the connection's own contribution to the delays. We would expect to find ΔQ̃ ≤ ΔQ, since a connection's extra, self-induced delay should only increase the OTT extremes it experienced.

Figure 16.13: OTT plot revealing a "broken" bottleneck estimate: one that is too low. Solid squares mark unadjusted OTTs; hollow squares mark OTTs adjusted to remove load based on the bottleneck estimate.
If, however, we find that

    ΔQ̃ > 1.1 · ΔQ,    (16.1)

and if the difference between the two is also larger than twice the joint clock resolution R_{s,r} (§ 12.3), to assure that it is not just due to measurement noise, then we consider the bandwidth estimate ρ_B "broken": likely wrong, since using it to subtract out queueing effects actually increases the range of OTTs we observe.

This check is not foolproof. It can generate both false positives and false negatives. For example, it may be that the packet with the greatest OTT had little load to subtract out, while that with the least OTT happened to have more load, leading to an erroneous determination that ρ_B is broken. Using a factor of 1.1 in Eqn 16.1 helps avoid these sorts of false positives, by only flagging a ρ_B estimate as broken if using it leads to a significant increase in adjusted delay. The check might also fail, generating a false negative, if ρ_B is indeed quite inaccurate, but subtracting out the inaccurate loads from the OTTs still happens to reduce their range. We find that these false negatives are much more likely to occur if ρ_B is too high, since an overestimate leads to relatively little (but still some) load being subtracted. If ρ_B is an underestimate, then excessive load is removed, which tends to lead to some packets having grossly under-adjusted OTTs, widening ΔQ̃.

The test is worth making because it detects two situations of interest. First, as noted above, if ρ_B is too low, then the calculated packet loads will be too high, and subtracting them out will often expand the range. Figure 16.13 shows an instance of this occurring. The solid squares show the OTTs of the connection's data packets, and the hollow squares correspond to the OTTs adjusted for the (erroneously too small) bottleneck bandwidth. The trend towards progressively lower adjusted OTTs indicates that the low estimate leads to removing more and more spurious load as the connection transmits more packets that are erroneously judged to queue behind one another. We particularly want to detect the case of ρ_B too low, because later in this chapter we will use load computations as a basis for determining the degree of available bandwidth in the network (§ 16.5), and we want these computations based on solid estimates of packet loads.

The other case that the test can detect is the presence of an undiagnosed bottleneck change. If ρ_B corresponds to the slower of the two bottleneck rates, then tcpanaly will compute excessive loads for the packets transmitted during the era of the faster bottleneck rate. Figure 16.14 illustrates this happening. The estimated ρ_B is fairly accurate for most of the trace (a bit too high, as indicated by the slowly rising adjusted OTTs: not enough load is being removed). However, at T = 12 sec, when the bottleneck rate doubles, the estimate becomes much too low, and leads to removing too much load. (PBM does not detect this bottleneck change because it comes so close to the end of the trace.)

Figure 16.14: Another OTT plot revealing a "broken" bottleneck estimate: one that failed to detect a change in the bottleneck rate. Solid squares mark unadjusted OTTs; hollow squares mark OTTs adjusted to remove load based on the bottleneck estimate.

Table XVIII in § 14.7 summarizes how often this check detected a broken bottleneck rate estimate in N1 and N2. It was not very often, which contributes to our faith in the PBM algorithm for detecting bottleneck rates (§ 14.6), but it did detect some problems, indicating it is worth the effort to perform the test.
As the lefthand portion of Figure 16.14 indicates, a slight mismatch in ρ_B can lead to definite, spurious trends in the adjusted OTTs. Such trends are apparent even when the estimated ρ_B is quite good. Figure 16.15 shows an OTT plot in which the bottleneck estimate is clearly quite good, as it accounts for virtually all of the variation in the OTTs (the adjusted times, shown with hollow squares, are nearly constant). Yet, if we zoom in on just the adjusted OTTs, shown in Figure 16.16, we see a clear downward trend. The trend corresponds to 500 µsec over about 300 msec, or about 1 part in 600. Consequently, we see that, even though our estimated ρ_B is quite good, it is not sufficiently exact to avoid introducing an artificial trend in the adjusted OTTs.

Figure 16.15: OTT plot showing virtually all OTT variation due to the connection's own queueing load.
Figure 16.16: Enlargement of the adjusted OTTs from the previous figure.

Because of this problem, we abandoned our original goal of trying to treat loaded packets the same as unloaded packets by adjusting their OTTs, as doing so requires extremely precise estimation of ρ_B. If the estimate is off, we introduce systematic errors that could easily be confused with genuine network effects.

16.2.7 Periodicity in OTTs

In § 15.3 we discussed our efforts at testing whether packet loss patterns exhibit periodicity. We might expect them to do so due to synchronization effects known to sometimes plague Internet routers, resulting in periodic packet forwarding outages. These lead to lengthy delays and perhaps loss, if buffers become exhausted during the outage [FJ94]. In this section we briefly discuss evidence in our data for periodic variations in packet delays.

In attempting to assess delay periodicities, we run into the same problems as when assessing loss periodicities: our data unfortunately are not suited for a proper investigation of the question. § 15.3 outlines the reasons for this and we will not repeat them here. We did, however, attempt the same analysis as in § 15.3: we selected connections between the North American sites exhibiting the highest degree of clock synchronization, singled out the busiest day among them, and analyzed their connections to determine the time at which each connection's largest delay occurred. We then studied plots of the peak delay time versus the same time modulo different possible periodicity intervals. This effort did not find any conclusive evidence of global periodicities.

However, phenomenological inspection of other traces shows that delay periodicities definitely do occur. Figure 16.17 shows a plot of ack OTT times for a connection from connix to ucl. The distance between the first OTT peak at about 1000 msec and the second such peak (we are ignoring the striking, 2000 msec peak) is 10.07 sec, while that between the second peak and the third such peak (at about 1200 msec) is 19.92 sec. Furthermore, two acks were sent about 10 sec after the second peak, but both were lost (hence, they do not appear on the plot). Thus, this trace exhibits strong evidence of a 10-second periodicity. We find a number of other traces to ucl with the same spacing between delay peaks, suggesting that this is an ongoing phenomenon. We observed other traces with apparent 5-second and 30-second periodicities in delay spikes, involving different hosts, indicating that the phenomenon is not confined to only ucl.

Figure 16.17: Ack OTT plot showing 10-second periodicities.

On the other hand, we did not find strong evidence above of global periodic delay variation among the highly-synchronized North American sites. Thus, we conclude that the phenomenon is definitely present, but, if widespread, at least not globally synchronized.
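Purely as an illustration of the peak-delay analysis described above (the assessment in this study was done graphically, by plotting the peak times against their values modulo candidate periods), a minimal sketch follows. It assumes each connection is available as a list of (time, OTT) pairs timed against a well-synchronized clock; the names are hypothetical.

    def peak_delay_phases(connections, candidate_periods=(5.0, 10.0, 30.0, 60.0)):
        """For each connection (a list of (time_sec, ott) pairs), find the time
        of its largest delay, then reduce that time modulo each candidate
        period.  If delays are driven by a global periodic process with period
        P, the phases for that P should cluster rather than spread uniformly.
        Returns {P: [phase, ...]} for plotting or a simple uniformity test."""
        peak_times = [max(conn, key=lambda p: p[1])[0] for conn in connections]
        return {P: [t % P for t in peak_times] for P in candidate_periods}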
16.3 Timing compression

Packet timing compression occurs when a flight of packets is sent over an interval ΔT_s but arrives at the receiver over an interval ΔT_r, with ΔT_r < ΔT_s. To first order, compression should not occur, since the main mechanism at work in the network for altering the spacing between packets is queueing, which in general expands flights of packets, as later ones have to wait behind the transmission of earlier ones (§ 14.2). However, compression can occur if a flight of packets is at some point held up by the network, such that transmission of the first packet stalls and the later packets have time to catch up.

Zhang et al. predicted from theory and simulation that acks could be compressed ("ack compression") if a flight arrived at a busy router (one with a significant queue), and if no intervening packets arrived between the different acks [ZSC91]. As the acks queue behind one another, the potentially large spacing between them due to self-clocking (§ 9.2.5) and ack-every-other policies (§ 11.6) would then be lost when the acks were later transmitted back-to-back upon reaching the front of the queue. This situation corresponds to a draining queue: a router that was busy when the first ack arrived (and hence could not service it before the others arrived), and yet new arrivals from other traffic sources are sporadic. If instead new arrivals were steady, then they would occupy slots in the queue between the acks in the flight, and their spacing would be (roughly) preserved, rather than compressed.

Mogul subsequently analyzed a trace of Internet traffic and confirmed (among other phenomena) the presence of ack compression [Mo92]. His definition of ack compression is somewhat complex, involving significant deviations from the median inter-ack spacing, since he had to infer endpoint behavior from an observation point inside the network (a vantage point problem, per § 10.4). But he clearly detected the presence of ack compression. He found that compression was correlated with packet loss but considerably more rare. His study was limited, however, to a single 5-hour traffic trace.

Since we can readily compute from our data both ΔT_s and ΔT_r for any flight of packets, we can use a simpler definition of compression than that employed by Mogul. In this section we characterize three different types of compression: ack compression (§ 16.3.1), data packet compression (§ 16.3.2), and receiver compression (§ 16.3.3). We show that all three types of compression occur within the Internet, though each is limited in its effects.
16.3.1 Ack compression

If ack compression is frequent, it presents two problems. First, as acks arrive they advance TCP's sliding window and "clock out" new data packets at the rate reflected by their arrival (§ 9.2.5). For compressed acks, this means that the data packets go out faster than previously, which can result in network stress. Second, sender-based measurement techniques such as SBPP (§ 14.3) can misinterpret compressed acks as reflecting greater bandwidth than truly available. On the other hand, some researchers argue that occasional ack compression is beneficial, since it provides an opportunity for self-clocking to discover newly-available bandwidth.

To detect ack compression, for each group of at least 3 acks we compute:

    ξ = (ΔT_r + C_r) / ΔT_s,    (16.2)

where ΔT_s is the interval over which the group of acks was sent, ΔT_r is the interval over which it arrived, and the clock-resolution term C_r added to the numerator makes ξ conservative. We consider a group of acks compressed if ξ < 0.75. We term such a group a compression event.

In N1, 50% of the connections experienced at least one compression event, and in N2, 60% did. In both, the mean number of events per connection was around 2, and 1% of the connections experienced 15 or more. Almost all compression events are small, however, with only 5% spanning more than five acks. Figure 16.18 shows a paired sequence plot of one of the larger events, in which eleven acks were compressed. The solid squares indicate when the data packets were sent, and the arrows stemming from them point to their arrival times at the receiver. The corresponding acks (offset downward a bit, for legibility) are shown with hollow squares. The arrows from these squares all stop at virtually the same point in time, T = 1.51, indicating that, even though the acks were sent over an interval of 77 msec, they arrived all together, about 760 µsec apart, compressed by a factor of 100.

Figure 16.18: Paired sequence plot showing ack compression.

We also note that a significant minority (10--25%) of the compression events occurred for dup acks. These are sent with less spacing between them than regular acks sent by ack-every-other policies, so it takes less timing perturbation to compress them. Compressed dup acks are only slightly more likely to occur in a large burst than compressed regular acks. In N2, overall 5.1% of the compression events consisted of six or more acks: 4.8% of the regular-ack compression events, and 6.0% of the dup-ack events.

Finally, we classify compression events as "major" if the compression results in the acks arriving at the data sender with a spacing less than that corresponding to the bottleneck bandwidth; otherwise, we term the event "minor." Major events are significant because they reflect a breakdown in self-clocking (the sender will transmit in response to them at a rate exceeding the bottleneck bandwidth), and they also make sender-based bottleneck estimation difficult, since, unless detected, they will lead to overestimates. Let ρ_B^+ be the upper bound on the estimated bottleneck bandwidth, per Eqn 14.12. If a flight of k packets arrives during an interval ΔT_r, and they together acknowledge a total of b bytes, then we consider the flight to reflect a major compression event if:

    b / ΔT_r > ρ_B^+.

We apply this test to each ack compression event detected by tcpanaly, except we omit the final ack of the event. The reason for this omission is that tcpanaly finds compression events by constructing groups of acks for which ξ < 0.75, and sometimes the final ack of the group is relatively uncompressed compared to the others (i.e., it raised ξ from a small value to a value near 0.75).
Consequently, we omit this final ack to avoid skewing the assessment of "major" events by our methodology for grouping acks into events.

We find that in both N1 and N2, about 75% of the compression events are major. This figure only slightly diminishes if we confine our analysis to compressed "regular" acks, eliminating compressed dup acks. Of the major compression events, 80% reflect acks arriving at a rate corresponding to more than twice ρ_B^+. Thus, when compression occurs, it is usually large enough to result in a significant overestimate of the bottleneck bandwidth.

From these findings, we conclude that ack compression definitely occurs in the Internet, but rarely enough as to not pose a significant problem by corrupting self-clocking or causing excessive burstiness. That it occurs for more than half the connections, however, and that most of these events are "major," indicates that a sender-based measurement scheme must employ filtering to remove extreme values from its bottleneck estimates, as otherwise it is very likely to overestimate the bottleneck bandwidth, with perhaps disastrous consequences.

16.3.2 Data packet timing compression

For data packet timing compression, our concerns are different. Sometimes a flight of data packets is sent at a high rate due to a sudden advance in the receiver's offered window. Normally these flights are spread out by the bottleneck and arrive at the receiver with a distance Q_b between each packet (§ 14.2). If after the bottleneck their timing is compressed, then use of Eqn 16.2 will not detect this fact unless they are compressed to a greater degree than their sending rate. Figure 16.19 illustrates this concern: the flights of data packets arrived at the receiver at 170 Kbyte/sec (T1 rate), except for the central flight, which arrived at Ethernet speed. However, it was also sent at Ethernet speed, so, for it, ξ ≈ 1.

Figure 16.19: Data packet timing compression.

Consequently, we consider a group of data packets as "compressed" if they arrive at greater than twice the upper bound on the estimated bottleneck bandwidth, ρ_B^+. We only consider groups of at least four data packets, as these, coupled with ack-every-other policies, have the potential to then elicit a pair of acks reflecting the compressed timing, leading to bogus self-clocking. These compression events are more rare than ack compression, occurring in only 3% of the N1 traces and 7% of those in N2.

We were interested in whether some paths might be plagued by repeated compression events due to either peculiar router architectures or network dynamics. Only 25--30% of the traces with an event had more than one, and 3% had more than five, suggesting that such phenomena are rare. But those connections with multiple events are dominated by a few host pairs, indicating that some paths are indeed prone to timing compression. Figure 16.20 shows an example. Here, the bottleneck rate is T1, which corresponds closely with the flatter slopes in the plot.

Thus, it appears that data packet timing compression is rare enough not to present a significant problem. That it does occur, though, again highlights the necessity for outlier-filtering when conducting timing measurements. (It also has a measurement benefit: from the arrival rate of the compressed packets, we can estimate the downstream bottleneck rate.)
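To make the two detection rules of the last two subsections concrete, here is a minimal sketch (hypothetical names; times in seconds, sizes in bytes). It illustrates the tests themselves, not the tcpanaly implementation.

    def ack_compression(send_times, arrival_times, clock_resolution):
        """Eqn 16.2 test for a group of >= 3 acks: xi < 0.75 means compressed."""
        assert len(send_times) >= 3 and len(send_times) == len(arrival_times)
        dT_s = send_times[-1] - send_times[0]        # spacing when sent
        dT_r = arrival_times[-1] - arrival_times[0]  # spacing on arrival
        if dT_s <= 0:
            return False                             # degenerate group
        xi = (dT_r + clock_resolution) / dT_s        # clock term keeps xi conservative
        return xi < 0.75

    def major_ack_compression(arrival_times, bytes_acked, rho_B_upper):
        """An event is "major" if the acked bytes arrive faster than the upper
        bound on the bottleneck bandwidth (the final, possibly uncompressed,
        ack of the event is assumed to have been dropped already)."""
        dT_r = arrival_times[-1] - arrival_times[0]
        return dT_r > 0 and bytes_acked / dT_r > rho_B_upper

    def data_compression(arrival_times, bytes_carried, rho_B_upper):
        """A group of >= 4 data packets is "compressed" if it arrives at more
        than twice the upper bound on the estimated bottleneck bandwidth."""
        assert len(arrival_times) >= 4
        dT_r = arrival_times[-1] - arrival_times[0]
        return dT_r > 0 and bytes_carried / dT_r > 2 * rho_B_upper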
Figure 16.20: Rampant data packet timing compression.
Figure 16.21: Receiver sequence plot showing major receiver compression.

16.3.3 Receiver compression

A third type of timing compression occurs when the receiver delays in generating acks in response to incoming data packets, and then generates a whole series of acks at one time. The timing of these acks appears compressed to the sender, though not for reasons of network dynamics, but instead due to lulls at its remote peer. Figure 16.21 shows the most striking example in our traces, in which the lbl receiver compressed 25 of its acks, sending them over a 2 msec interval instead of over the 83 msec interval corresponding to the data packets they acknowledged. (Slightly earlier, the receiver also compressed 6 other acks, as seen in the figure.)

Since receiver compression is an endpoint effect, its presence tells us nothing about the dynamics of the connection's Internet path. However, receiver compression remains quite interesting because it is an additional noise element that any sender-only measurement scheme must contend with. It also leads to the same consequences as true ack compression, namely a breakdown of a connection's self-clocking.

To assess receiver compression, we compute:

    ξ' = (ΔT_a + C_r) / (ΔT_d − C_r),

where ΔT_a is the interval over which the receiver generated a group of acks, ΔT_d is the interval over which the corresponding data packets arrived, and C_r is the receiver's clock resolution; the addition of C_r in the numerator and its subtraction in the denominator makes ξ' conservative. We consider ξ' < 0.75 as indicating a receiver compression event. Note that our earlier analysis of ack compression uses ΔT_a as the original spacing of a flight of acks, and then checks whether that spacing was compressed while the packets were in flight. Consequently, that analysis does not confuse ack compression with receiver compression: the earlier ack compression analysis only evaluates compression due to network behavior.

We include delayed acks in our analysis, as these affect self-clocking. Sender-based measurement techniques can generally detect delayed acks, using the rule that an ack for less than two full segments was presumably delayed. (This heuristic could fail in the future, if TCPs begin to ack every packet, which they might do to accelerate the slow-start process.)

In both N1 and N2, we find that about 10% of the connections included a receiver compression event of at least three acks. Of these, about three-quarters experienced only one receiver compression event, and, in N1, none experienced more than four, though, in N2, the upper limit was 15. Almost all events were only 3 acks in size (95% in N1, 80% in N2).

While these statistics indicate that receiver compression is fairly rare, and even less often significant, we must note that, because receiver compression is an endpoint effect, these statistics are not necessarily representative of its frequency in the Internet as a whole. In particular, we find that just a few sites cause the majority of the receiver compression events in our study, so we have no way of telling whether other sites would tend overall towards more receiver compression or less.

Given this caveat, we note that we find receiver compression, like other forms of timing compression, to be fairly rare. In particular, in our datasets it appears more rare than ack compression, so, if this is a representative finding, then sender-based assessment of ack compression caused by network dynamics will not be terribly skewed by the presence of receiver compression. If the sender-based measurement employs filtering to remove outliers, as it needs to do anyway to deal with ack compression, then receiver compression does not make the measurement significantly harder.
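A corresponding sketch of the receiver-compression test follows; the exact clock-resolution corrections are an assumption carried over from the conservative form given above.

    def receiver_compression(data_arrival_times, ack_send_times, clock_resolution):
        """Receiver-compression test: the receiver generated a group of acks over
        a much shorter interval than the one over which the corresponding data
        packets arrived at it.  Clock-resolution corrections follow the
        conservative form assumed in the text above."""
        dT_d = data_arrival_times[-1] - data_arrival_times[0]  # data arrival spread
        dT_a = ack_send_times[-1] - ack_send_times[0]          # ack generation spread
        if dT_d <= clock_resolution:
            return False                                       # too short to judge
        xi_prime = (dT_a + clock_resolution) / (dT_d - clock_resolution)
        return xi_prime < 0.75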
16.4 Queueing analysis

In this section we develop a rough estimate of the time scales over which queueing occurs. If we take care to eliminate suspect clocks (Chapter 12), reordered packets (§ 13.1), compressed packets (§ 16.3), and traces exhibiting TTL shifts (which indicate routing changes, per § 7.7), then we argue that the remaining measured OTT variation is mostly due to queueing. Hence, we can estimate queueing time scales by analyzing the time scales of OTT variations.

For a given time scale, τ, we compute the queueing variation on that time scale as follows. First, we partition the packets sent by a TCP into intervals of length τ. For each interval, let n_l and n_r be the number of successfully-arriving packets in the left and right halves of the interval. If either is zero, or if n_l < (1/4) n_r, or vice versa, then we reject the interval as containing too few measurements or too much imbalance between the halves. Otherwise, let m_l and m_r be the median OTTs of the two halves. We then define the interval's queueing variation as |m_l − m_r|. This computation reflects only changes that occurred on the time scale of τ. Changes that occurred on smaller time scales will in general all occur within either the left or right half, and the median of the half will not reflect the smaller-time-scale change. Changes occurring on larger time scales will not in general result in variation between the two halves, and so likewise will not enter into the computation. By using medians, we attempt to reduce the effects of occasionally very large OTTs. We found that means and standard deviations can often be unduly skewed by a small set of large OTTs.

The question remains how to summarize the interval changes. We investigate two different summaries. In the first, we define ΔQ_τ as the median of |m_l − m_r| over all of the connection's accepted intervals of length τ, which diminishes the influence of occasional non-queueing effects, or transient queueing spikes. In addition, we compute Q^max_τ, the maximum observed difference across any two halves of an interval of length τ. ΔQ_τ thus summarizes sustained variation on the time scale τ, while Q^max_τ summarizes bursts of variation on the time scale τ.

We now analyze ΔQ_τ and Q^max_τ for different values of τ, confining ourselves to variations in ack OTTs, as these are not clouded by self-interference and adaptive transmission rate effects (§ 15.2). The question we wish to address is: are there particular τ's on which most queueing variation occurs? This question is particularly interesting because of its potential implications for engineering transport protocols. For example, if the dominant τ is less than a connection's RTT, then it is pointless for the connection to try to adapt to queueing fluctuations, since it cannot acquire feedback quickly enough to do so. Or if, for example, the dominant τ is on the order of 1 sec, then that constant helps us determine the related constants (such as the α's for EWMA estimators) governing how a transport connection should update its RTT estimate in order to compute its retransmission timeout.
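As an illustration only (not the implementation used in this study), a minimal sketch of the per-time-scale computation might look as follows, assuming each connection provides parallel lists of ack send times and OTTs.

    def queueing_variation(send_times, otts, tau):
        """Compute (delta_Q_tau, Q_max_tau) for one connection and one time scale.

        send_times : packet send times (sec), sorted
        otts       : corresponding one-way transit times (successfully arriving
                     packets only)
        tau        : time scale (sec)
        """
        start, end = send_times[0], send_times[-1]
        diffs = []
        t = start
        while t < end:
            # Split the interval [t, t + tau) into left and right halves.
            left = [o for s, o in zip(send_times, otts) if t <= s < t + tau / 2]
            right = [o for s, o in zip(send_times, otts) if t + tau / 2 <= s < t + tau]
            t += tau
            n_l, n_r = len(left), len(right)
            # Reject intervals with too few packets or too much imbalance.
            if n_l == 0 or n_r == 0 or n_l < n_r / 4 or n_r < n_l / 4:
                continue
            m_l = sorted(left)[n_l // 2]   # median OTT of the left half
            m_r = sorted(right)[n_r // 2]  # median OTT of the right half
            diffs.append(abs(m_l - m_r))
        if not diffs:
            return None, None
        delta_q_tau = sorted(diffs)[len(diffs) // 2]  # sustained variation
        q_max_tau = max(diffs)                        # peak variation
        return delta_q_tau, q_max_tau

Sweeping τ over a range of candidate values and taking the value that maximizes either summary then yields the dominant time scale examined next.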
For each connection, we range through τ = 2^4, 2^5, ..., 2^16 msec to find τ̂, the value of τ for which ΔQ_τ or Q^max_τ is greatest. τ̂ reflects the time scale for which the connection experienced the greatest OTT variation, where the variation is sustained if computed for ΔQ_τ, and momentary if computed for Q^max_τ. Figure 16.22 shows a plot of the ack OTTs for a connection with τ̂ = 4 sec for ΔQ_τ, indicating that maximum sustained variation occurs on 4-second time scales. Figure 16.23 shows a connection with τ̂ = 1 sec for Q^max_τ, which emphasizes the large increase in delay at T = 20 sec. For the first connection, the maximal Q^max_τ occurs for τ̂ = 64 msec, corresponding to the sharp spike just after T = 1 sec. For the second, the maximal ΔQ_τ occurs for τ̂ = 4 sec, due to the sustained variation on 4-second time scales (for this connection, other time scales have large ΔQ_τ, too, but the largest is for τ = 4 sec). Clearly, the time scales of maximum sustained burstiness versus those of maximum peak burstiness can differ considerably.

Figure 16.22: Ack OTT plot for a connection with τ̂ = 4 sec for ΔQ_τ.
Figure 16.23: Ack OTT plot for a connection with τ̂ = 1 sec for Q^max_τ.

Before looking at the range of τ̂'s for our measurements, a natural calibration question is what sort of τ̂'s we find for synthetic variations. We investigated this question by simulating 10,000 independent and identically distributed (i.i.d.) OTT variations. Each variation was simulated as a random variable drawn from an exponential distribution with λ = 1 (the results are independent of λ, since λ only determines the size of the identically-distributed variations, but not the time scales of the variations among them), corresponding to an OTT variation computed for one unit of time (the equivalent of 2^4 msec for the preceding discussion). For 100 simulation runs, τ̂ was always ≤ 2 units of time for ΔQ_τ, and ≤ 4 units of time for Q^max_τ. Thus, we see that τ̂ correctly indicates that the variation is confined to small time scales. If we simulate i.i.d. Pareto variations with α = 1.01 (so, infinite variance and, just barely, finite mean), we still find τ̂ confined to small time scales, never exceeding 4 units of time. Again, this is what we would expect, because the fundamental time scale of change is one time unit, since the variations are independent.

Figure 16.24 shows the normalized proportion of the connections in N1 and N2 exhibiting different values of τ̂ for ΔQ_τ. Normalization is done by dividing the number of connections that exhibited τ̂ by the number that had durations at least as long as τ̂, so that the prevalence of short connections does not skew the distribution. For both datasets, time scales of 128--2048 msec primarily dominate. This range, though, spans more than an order of magnitude, and also exceeds typical RTT values. Furthermore, while less prevalent, τ̂ values all the way up to 65 sec remain common, with N1 having a strong peak at 65 sec. (Manual inspection of traces with τ̂ = 65 sec indicates that they do indeed exhibit their maximum variation on that time scale, addressing the concern that perhaps the peaks were due to some other effect, and hence spurious.) Consequently, the figure indicates that sustained Internet delay variations occur primarily on time scales of 0.1--2 sec, but extend out quite frequently to much larger time scales.

Figure 16.24: Proportion (normalized) of connections with a given time scale of maximum sustained delay variation (τ̂).

Figure 16.25 shows the same figure but for Q^max_τ. Here we see that basically the same time scales dominate variation peaks, ranging from 128 to 1024 msec. Smaller time scales clearly contribute, however, and so do larger time scales up to about 4 sec, with N1 exhibiting a trend towards still larger time scales, while N2 does not.

Figure 16.25: Proportion (normalized) of connections with a given time scale of maximum peak delay variation (τ̂).
We interpret the figure as indicating that peak Internet delay variations also occur primarily on time scales of 0.1--1 sec, but they too extend to larger time scales, and quite often to smaller time scales. Consequently, it appears clear that there is no single time scale of "burstiness," which accords with the recent "self-similar" models of network traffic [LTWW94], though, as a rule of thumb, most variation occurs on time scales of a quarter-second to a half-second, a bit above usual connection round-trip times. Thus, it appears that transport connections can feasibly adapt to queueing changes, but to do so they must act quickly, within a few RTTs, or else it will often be too late.

16.5 Available bandwidth

The last aspect of delay variation we look at is an interpretation of how it reflects the available bandwidth. In a packet-switched network, available bandwidth is a somewhat elusive notion. The amount of bandwidth a connection might fruitfully use varies with time, as other cross-traffic connections come and go. From § 16.4 we know that significant OTT fluctuations often occur on time scales of 100--1000 msec, and for the upper end of this range (which actually extends appreciably to much larger time scales), no doubt most of the fluctuations are due to connections beginning or ending, rather than flights of packets within single connections beginning or ending.

Two existing approaches for estimating available bandwidth are cprobe [CC96b] and Treno [MM96]. cprobe works in conjunction with bprobe [CC96a], which we discussed in § 14.2. To estimate available bandwidth along a network path, cprobe first uses bprobe to estimate the bottleneck bandwidth along the path. cprobe then transmits four groups of probes, each probe consisting of 10 ICMP echo packets (as with bprobe). The echo packets are sent at a rate exceeding that of the estimated bottleneck bandwidth, to make sure they attempt to fully utilize the bottleneck. cprobe then computes from the timing of the ICMP echo replies the achieved throughput, and considers the ratio between this and the bottleneck bandwidth to be the utilization (similar to the value β which we define in Eqn 16.4 below), which indicates how much of the bottleneck bandwidth was actually available.

cprobe has three limitations that we attempt to address. The first is that it requires sending a fairly large flight of packets at a rate known to exceed what the network path can support, so cprobe can be viewed as fairly stressful to a network path.
The second is that, because its probes use ICMP echo packets, which elicit same-sized replies, the throughput the probes achieve will reflect the minimum of the available bandwidth along the forward and reverse paths. As we have seen that many path properties are asymmetric, it would not be surprising to find that available bandwidth is, too, and thus, for a unidirectional connection, cprobe might produce too pessimistic an estimate.

The third limitation is that the pattern in which the probe packets are sent differs from that in which a TCP sender will transmit its data packets. We have seen in § 15.2 that, because TCP adapts its transmission rate to the presence of packet loss along the forward path, network conditions observed by TCP data packets can differ significantly from those observed by TCP ack packets. Thus, we suspect that available bandwidth estimates produced by cprobe might not closely reflect the throughput that a TCP would actually achieve.

The second point is addressed by the developers of the Treno utility [MM96]. Treno also uses ICMP echo packets to probe network paths, but it sends them using an algorithm equivalent to that used by TCP congestion control (§ 9.2). In addition, Treno can probe hop-by-hop available bandwidth by using increasing TTL (time-to-live) values in the IP headers of the echo packets it sends, just as does traceroute (§ 4.2.1). When doing so, it receives in response from each hop (except the last) not a full-sized echo reply, but a short ICMP Time Exceeded message. Thus, even if the available bandwidth along the return path is less than that along the forward path, Treno will still primarily observe the forward-path available bandwidth, just as would a TCP connection that receives only data-less acks in response to its data packets.

The main drawback of Treno is that it is a stressful technique. It estimates how fast a TCP could transfer data over a given network path by seeing how fast it itself can transfer data over the path, using a standard-conformant, but well-tuned, implementation of the TCP congestion control algorithm.

Ideally, we would like to estimate available TCP bandwidth without fully stressing the network path to do so. We do not achieve this goal in our present work. Instead, in this section we analyze our TCP transfer data both to characterize available bandwidths in the Internet, and to explore how we might perhaps in the future develop a non-stressful available-bandwidth estimation technique, based on fine-scale analysis of TCP packet timings. For this technique, the hope is that by carefully scrutinizing the delays of individual TCP packets, we might form a good estimate of the bandwidth available along the path over which they were sent, without requiring that we send the packets at a rate that saturates the path for any lengthy period of time.

We proceed as follows. First, we need to define what we mean by available bandwidth. We might argue that, if we know that a connection is competing with k other connections, then its fair share of the network resources is 1/(k+1). In particular, the connection's fair share of the bottleneck bandwidth, ρ_B, is ρ_B/(k+1). These simple notions, however, quickly run into difficulties. First, during a connection's lifetime, competing connections come and go, so there is no single value to assign to k.
Second, the competing connections do not in general compete along the entire end-to-end path, but only for a portion of it, so there may in fact be a great number of competing connections, but each competing for different resources. Finally, "fairness" itself is an elusive notion: it might well be the case that, for policy reasons (such as who is paying for what), or due to different traffic types, the simple each-gets-an-equal-share division of the resources is deemed inappropriate. (See [Fl91] for further discussion of the difficulty of defining a single notion of fairness.)

With these considerations in mind, we now strive to develop a notion of "equivalent competing connections," in order to talk in general terms about available resources. To do so, we attempt to characterize the network resources available to a connection as a fraction of the total resources in use. The term we will use to capture this notion is "available bandwidth." Here we presume that connections push on the network to extract as much resource from it as they can; TCP's slow start does exactly this. (We do not, however, presume that the measurement technique for estimating how much bandwidth is available must also do so.) Therefore, if a connection pushes on the network and we observe that it consumed m units of resources, and we can determine that other connections consumed n units of the same resources, then we will consider the available bandwidth as m/(m+n); or, equivalently, that, over its lifetime, the connection competed with the equivalent of n/m other connections like itself.

We will use as our unit of resource the amounts of buffer space and transmission time the connection consumed at the bottleneck link. In § 15.2 we developed the notion of data packet i's "load," λ_i, meaning how much delay it incurs due to queueing at the bottleneck behind its predecessors, plus its own bottleneck transmission time, φ_i, which is directly determined by the packet's size and the bottleneck bandwidth. Let

    ψ_i = λ_i − φ_i    (16.3)

denote the delay packet i incurs purely from queueing behind its predecessors at the bottleneck, and let γ_i denote the packet's measured delay variation, i.e., its OTT relative to the connection's minimum observed OTT. If the network path is completely unloaded except for the connection's load itself (no competing traffic), then we should have ψ_i = γ_i, i.e., all of i's delay variation is due to queueing behind its predecessors. More generally, define:

    β = Σ_i (ψ_i + φ_i) / Σ_j (γ_j + φ_j).    (16.4)

β then reflects the proportion of the packets' delay due to the connection's own loading of the network. If β ≈ 1, then all of the delay variation is due to the connection's own queueing load on the network, while, if β ≈ 0, then the connection's load is insignificant compared to that of other traffic in the network.

Put differently, Σ_i (ψ_i + φ_i) reflects the resources consumed by the connection, while Σ_j (γ_j + φ_j) reflects the total resources in use, by the connection and the competing traffic together. Including the φ terms matters for the following limiting case. A connection that never has to load the bottleneck link (perhaps because its transmission perfectly matches the bottleneck rate) will exhibit Σ_i ψ_i = 0. In this case, any slight variation in its OTTs, i.e., Σ_j γ_j = ε > 0, would, without the φ terms, result in a ratio of 0. But in this limiting case we want our evaluation to indicate that almost all the resource was available (as indicated by Σ_j γ_j being small), and this is exactly the limiting behavior of Eqn 16.4.

Thus, β captures the proportion of the total resources that were consumed by the connection itself, and we interpret β as reflecting the available bandwidth. Values of β close to 1 mean that the entire bottleneck bandwidth was available, and values close to 0 mean that almost none of it was actually available.
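Purely as an illustration of Eqn 16.4, the following minimal sketch (hypothetical function and variable names) computes β for one connection, assuming the per-packet loads λ_i, bottleneck transmission times φ_i, and delay variations γ_i have already been computed from a given bottleneck estimate.

    def inferred_available_bandwidth(loads, xmit_times, delay_variations):
        """Eqn 16.4: beta = sum_i(psi_i + phi_i) / sum_j(gamma_j + phi_j).

        loads            : lambda_i, each packet's load (queueing behind its
                           predecessors plus its own bottleneck transmission time)
        xmit_times       : phi_i, each packet's bottleneck transmission time
        delay_variations : gamma_i, each packet's OTT minus the connection's
                           minimum OTT
        """
        psis = [lam - phi for lam, phi in zip(loads, xmit_times)]
        consumed = sum(psi + phi for psi, phi in zip(psis, xmit_times))
        total = sum(g + phi for g, phi in zip(delay_variations, xmit_times))
        if total == 0:
            return 1.0  # degenerate case: no delay variation and no load observed
        beta = consumed / total
        # Slight errors in the bottleneck estimate can push beta outside [0, 1];
        # clamp such values rather than discarding the connection.
        return min(max(beta, 0.0), 1.0)

Recomputing with the loads implied by the PBM error bounds rather than the central estimate would then give corresponding bounds on β.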
Note that we can have β ≈ 1 even if the connection does not consume all of the network path's capacity. All that is required is that, to the degree that the connection did attempt to consume network resources, they were readily available. This observation provides the basis for hoping that we might be able to use β to estimate available bandwidth without fully stressing the network path.

We can gauge how well β truly reflects available bandwidth by computing the coefficient of correlation between β and the connection's overall throughput (normalized by dividing by the bottleneck bandwidth). For N1, this is 0.44, while, for N2, it rises to 0.55. We conjecture that the difference is due to the use of bigger windows in N2 (§ 9.3), which lead to more opportunities for fast retransmission. Any time a connection times out, its overall throughput becomes greatly diluted by the lengthy timeout lull. Thus, the correlations, particularly for N2, indicate that β is indeed a solid predictor of a connection's likely overall performance. It is not a perfect predictor, however, nor would we expect it to be: a TCP connection's overall throughput is affected by the number of retransmissions it incurs, whether any of these are timeout retransmissions, the receiver's offered window, the sender's internal window (§ 11.3.2), how the TCP manages the congestion window, and the acking policy used by its remote peer, which determines how fast the slow-start sequence increases the window (§ 11.6.1).

Figure 16.26 and Figure 16.27 show the density of β for N1 and N2. Values less than zero and greater than one, which can result from erroneous estimates of ρ_B, have been adjusted to zero and one, respectively. (We do not discard these connections because sometimes only a slight error in ρ_B will lead to an "out of range" estimate for β, if the connection occurred at a time during which very little or almost all of the bandwidth was available. This point will be developed in more depth shortly.) Clearly, Internet connections encounter a broad range of available bandwidths, ranging from very little to almost all. N1's main mode lies at 0.30--0.35, corresponding to about two equivalent competing connections, while for N2 this shifts considerably downward, to about 0.10--0.15, or eight equivalent competing connections. The overall decrease in β between N1 and N2 is clear, though the N2 density diminishes less quickly than that of N1, indicating that for it, especially, the range of available bandwidth was indeed very broad.

Figure 16.26: Distribution of N1 inferred available bandwidth (β).
Figure 16.27: Distribution of N2 inferred available bandwidth (β).

Unfortunately, it is difficult from these statistics to make a definitive statement about how available bandwidth changed over the course of 1995, because the use of bigger windows (§ 9.3) in N2 means that the notion of "equivalent connection" is different between the two datasets. It is not clear how we could adjust for this difference in order to directly compare the two.

Both densities exhibit two "edge" effects: a greatly diminished density at 0.0--0.05, and a second mode at 0.95--1.0. The first most likely reflects the measurement bias our experiment suffers from due to the limited lifetimes of each connection (§ 9.3): those connections for which very little bandwidth was available often did not finish within the allotted ten minutes, and thus do not figure into the measured distribution of β.
The second mode at 0.95--1.0 at first appears to indicate that sometimes a network path is completely quiescent, and packets sail along it without any cross traffic perturbing them. This, however, turns out to only sometimes be the case. Closer inspection of those connections with β ≈ 1 reveals that many are connections with low bottleneck bandwidths. These connections very often are able to completely fill the bottleneck link, because, even if the network can provide only a few non-bottleneck resources to the connection, these still suffice to drive the bottleneck at capacity. That is, the connection requires only modest resources available elsewhere to saturate the bottleneck link and achieve the maximum possible end-to-end performance. We summarize this effect as: if you only want to go slowly, the network often can provide enough resources for doing so.

Figure 16.28 and Figure 16.29 show the same densities if we restrict the analysis to connections with ρ_B ≥ 100 Kbyte/sec. We see that, for N1, doing so completely eliminates the secondary "all bandwidth available" peak, though, for N2, it only slightly diminishes it. The difference again appears due to the use of bigger windows in N2. Figure 16.30 shows the N2 density if we restrict ourselves to ρ_B ≥ 250 Kbyte/sec. Doing so eliminates the T1- and E1-limited connections, which with the bigger windows the N2 connections could often fill to capacity, much as the N1 connections could for the slower bottleneck links. Now, the second peak has disappeared, indicating that, at these speeds, the connections could no longer often utilize the entire bottleneck bandwidth. (The depression at 0.0--0.05 has grown, too, a change likely due to the fact that, for high-bandwidth paths, a TCP connection can transfer 100 Kbyte in 10 minutes even in the face of many competing connections, so the measurement bias discussed earlier does not apply to such a large degree.) We see that, overall, as path bandwidths increase, proportionally less bandwidth is available to connections using the path. This observation is not too surprising: higher bandwidths naturally attract higher traffic loads.

Figure 16.28: Distribution of N1 inferred available bandwidth (β) for connections with bottleneck rates exceeding 100 Kbyte/sec.
Figure 16.29: Distribution of N2 inferred available bandwidth (β) for connections with bottleneck rates exceeding 100 Kbyte/sec.
Figure 16.30: Distribution of N2 inferred available bandwidth (β) for connections with bottleneck rates exceeding 250 Kbyte/sec.

Our observations so far have been based on the load, λ_i, and the bottleneck transmission time, φ_i, per Eqn 16.3. Both are computed using the central bottleneck bandwidth estimate, ρ_B. The PBM algorithm, however, produces upper and lower bounds on the estimate, too, denoted ρ_B^+ and ρ_B^-, respectively.
From these bounds we can likewise compute β^- and β^+, the minimum and maximum inferred available bandwidth. Figures 16.31 and 16.32 show their distributions for N1 connections with ρ_B ≥ 100 Kbyte/sec. The density of β^+ is shifted about 0.1 to the right of that of β, except for a striking spike at β^+ ≈ 1. This spike is telling: three-quarters of it is for β^+ > 1, which is an unphysical situation, namely, that the connection's load on the path exceeds the total variation observed on the path. Thus, the spike indicates that ρ_B^- is sometimes erroneously too low. Because it is too low, the corresponding loads, λ_i^+, are too high. Furthermore, the loads can rapidly become much too high, due to self-clocking: if the connection is indeed transmitting at exactly the bottleneck rate, which self-clocking will promote in the absence of significant cross-traffic, then each packet's load will be zero, or perhaps will correspond to one additional packet at the bottleneck link if the receiver uses ack-every-other (so the window advances by two packets at a time). In this case, a slightly low estimate of ρ_B leads to computing spurious load for nearly every packet, quickly inflating β^+. Consequently, we should not trust the variation between β and β^+ as reflecting the true error-bar range in β's density; the variation between β and β^- is sounder, and that level of error is not large enough to alter any of the conclusions drawn above.

Figure 16.31: Distribution of N1 minimum inferred available bandwidth (β) for connections with bottleneck rates exceeding 100 Kbyte/sec.
Figure 16.32: Distribution of N1 maximum inferred available bandwidth (β) for connections with bottleneck rates exceeding 100 Kbyte/sec.

As we might expect, we find that β is inversely correlated with the data packet loss rate. For both N1 and N2, for connections with ρ_B ≥ 100 Kbyte/sec, the coefficient of correlation between β and the loss rate is clearly negative, reflecting a link between delay variation and packet loss, which agrees with the widely held assumption that most packet loss in the Internet is due to congestion (which will first lead to delay variations as queues build up). The connection is not overwhelmingly strong, however, which we would also expect, because delay variation need not lead to packet loss if the congested element contains sufficient buffer space to absorb the variation.

That β and loss rate are negatively correlated suggests that we might find significant regional variation in β, much as we did for loss rates in § 15.1. Indeed, we do. Figure 16.33 shows β for connections with ρ_B ≥ 100 Kbyte/sec and with both sender and receiver sited in the United States. Figure 16.34 shows the same for sender and receiver both sited in Europe. Clearly, European sites suffer from much lower β's than their U.S. counterparts, with the mean (and median) European β at 40%, while for the U.S. connections it is just under 60%.

Figure 16.33: Distribution of N2 inferred available bandwidth (β) for U.S. connections.
Figure 16.34: Distribution of N2 inferred available bandwidth (β) for European connections.
The last aspect of available bandwidth we investigate is how it evolves over time. To do so, we group connections with the same source and destination hosts together, after eliminating any with ρ_B < 100 Kbyte/sec. For successive connections c and c' in each group, we compute the pair ⟨ΔT_c, |Δβ_c|⟩, where ΔT_c is the time between c and c', and |Δβ_c| is the magnitude of the difference between the computed β's for each connection. After constructing these pairs, we sort them on ΔT_c and then compute |Δβ_c| smoothed using an exponentially-weighted moving average with α = 0.01 and an initial value of 0. Figure 16.35 shows the resulting smoothed evolution for the N2 dataset. (The N1 dataset exhibits a similar evolution.) We see that |Δβ_c| almost immediately rises to about 0.12, which is somewhat higher than the error range we estimated for β above, but not greatly higher. (The exponential smoothing, along with starting the average at a value of 0, limits how rapidly the plot can reach this level; this creates the plotting artifact of what appears to be a rapid climb, falsely suggesting that |Δβ_c| is significantly smaller for very low inter-connection times. A sounder interpretation is that, even for very low inter-connection times, we will usually find |Δβ_c| already quite close to 0.12.) This level is sus-
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ... . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . ... . . .. . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . .. . . . . . . . . . . . . ... . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . ... . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . .. . . . . . . . .. . . . .. . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . .. . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . .. . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ... . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . .. .. . . . . . . . . . . . . . . . . . . .. . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . .. . . . . .. . . . . .. . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . .. . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . 
. . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . .. . . .. . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . .. . . . . . . . .. .. . . . . . . . . . . . . . . . .. . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . .. . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ... .. . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . .. . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . .. . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . .. . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . .. . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . .. . . . . . . . . . . . . . . . . . . . .. . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . .. . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
[Figure 16.35: Evolution of the difference between inferred available bandwidth (β) for successive connections; x-axis: Time Between Connections (sec), y-axis: |Δβ|.]

tained for a number of hours, after which it increases markedly, by about 50%. The transition no doubt coincides with the diurnal cycle we noted in § 15.1: the network is much more congested during working hours than during off hours. Since the predictive power is, qualitatively, fairly good for time scales of several hours, we conclude that transport connections can fruitfully cache information regarding a path's available bandwidth for use in subsequent connections.

Chapter 17

Summary

We endeavored in this work to characterize a number of aspects of end-to-end Internet dynamics in general, meaningful ways. The Internet's great diversity makes this undertaking immensely challenging.

At the heart of our study lies the NPD measurement framework, in which a number of sites around the Internet run a specialized daemon that provides measurement services to authenticated users. The key scaling property of this framework is that, for N participating sites, it can probe O(N^2) Internet paths. This scaling enabled us to probe over 1,000 Internet paths, due to the participation of 37 sites (a worked count appears below). Consequently, the data for our analysis is more than an order of magnitude richer than that available for previous end-to-end studies, and a serious argument can be made that we can indeed extrapolate our findings to conclusions about Internet paths in general.

17.1 The routing study

In Part I, we used the NPD framework to study the dynamics of end-to-end routing in the Internet, using two experimental runs, one at the end of 1994 and one at the end of 1995. The results were discussed in Chapter 2; here, we briefly summarize them.

We began by characterizing routing pathologies, as we must first identify anomalies before proceeding to analysis of more typical behavior, lest they skew our results. We cataloged a number of pathologies, including loops, outages, and flutter. Furthermore, the prevalence of pathologies significantly increased between the 1994 dataset and the 1995 dataset, indicating that routing degraded over the course of 1995.

We next analyzed routing stability, first developing a distinction between two orthogonal types of stability, routing ``prevalence'' and routing ``persistence.'' We found that most Internet paths are heavily dominated by a single dominant route, but that the length of time over which routes persist varies greatly, from seconds to many days.

We finished our look at routing with an assessment of routing symmetry. While asymmetries have little direct impact on end-to-end performance, they introduce significant measurement problems, because they cloud the accuracy of the easiest form of measurement, ``sender-only'' measurement, in which no receiver cooperation is required. We found that about half of all Internet routes exhibited a major asymmetry, in which at least one city differed between the route from A to B versus that from B to A.

17.2 The packet dynamics study

The goal of Part II of our study was to use the NPD framework to measure end-to-end Internet packet dynamics. We recorded over 20,000 TCP transfers at both sender and receiver, again in two experimental runs. Faced with such a large volume of data, we adopted the strategy of developing an analysis tool, tcpanaly, for automating the ``micro-analysis'' of individual connections.
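As a concrete check on the framework's scaling property noted at the start of this chapter (the worked count promised above): counting the two directions of each site pair separately, since routes frequently differ by direction,

    \text{paths}(N) = N(N-1), \qquad \text{paths}(37) = 37 \times 36 = 1332 > 1000.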
Our goal was to develop meaningful characterizations of end-to-end packet delays. To do so required a great deal of preparatory work, to assure that the analysis rested upon sound measurements.

17.2.1 Measurement calibration and TCP behavior

We first needed to devise techniques for calibrating the measurement data, to assure that we did not mistake measurement artifacts for bona fide networking effects. The measurement process could fail in two basic ways: by misrecording which packets traversed the network, and by misrecording the times at which they appeared.

We found that packet filters can: fail to record packets; record packets more than once; truncate the beginning or end of trace files; and rearrange the sequencing of packets. Accordingly, we developed tests so that tcpanaly can detect these events. We further developed the important notion of a packet filter's ``vantage point,'' meaning where in the network path it observed the traffic. A filter's vantage point can introduce ambiguities in the apparent chain of cause-and-effect, which can only be removed with considerable care.

Hand-in-hand with calibrating the integrity of the traffic traces comes the problem of identifying the exact behavior of the TCP implementations used by the sending and receiving hosts. Often, the only way to accurately gauge the integrity of a traffic trace is by knowing in intimate detail how the TCPs participating in the connection behave and respond. Apparent deviations from this behavior then indicate a likely lack of integrity in the traffic trace, if the behavior has indeed been correctly characterized.

tcpanaly holds promise as a valuable tool for analyzing TCP behavior, useful both in its own right for diagnosing performance and congestion problems, and also as a way to account for the separate effects on a connection's dynamics of the behavior of the TCP endpoints versus that of the connection's Internet path. In the course of its development, we found a wide range of TCP behaviors, some of which have major, negative performance and stability implications for the associated TCPs. The most serious problems include excessive retransmissions and failures to correctly diminish the transmission rate during periods of congestion. Indeed, if some of these TCPs were ubiquitous in the Internet, the network would quite simply cease to function, due to ``congestion collapse.''

In the process of this analysis, we observed that the TCPs with the most serious problems were the only two in our study written independently of the ``BSD-derived'' implementations that directly benefited from much of the fundamental TCP research. To investigate this observation, we analyzed three additional implementations, finding a mid-level performance problem in one, a major performance problem in another (but one possibly due to use of a specific network interface card), and severe performance and stability problems in the third. Thus, our findings strongly argue that implementing TCP correctly is exceptionally difficult. Given that Internet stability relies on TCP correctness, it therefore behooves the Internet community to take energetic steps towards providing analysis tools and reference implementations to aid the efforts of implementors.

17.2.2 Timing calibration

Armed with the ability to detect inaccurate packet traces and to distinguish between TCP-induced effects and networking effects, we next turned to the difficult problem of calibrating the packet timings.
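As a simplified illustration of one facet of this problem (the sketch is ours, not the dissertation's algorithms, which are considerably more robust): if the receiver's clock runs at a slightly different rate than the sender's, the measured one-way transit times (OTTs) acquire a spurious linear trend over the course of a connection. On clean data a plain least-squares line fit suffices to expose and remove such a trend; real traces also require detecting clock adjustments and resisting outliers.

    # Hypothetical helper, assuming parallel, non-empty lists of send times
    # and measured OTTs (both in seconds).  Fits OTT = a + b * send_time;
    # the slope b estimates the relative clock skew, and subtracting the
    # fitted line (while keeping the mean level) detrends the delays.
    def remove_relative_skew(send_times, otts):
        if not send_times:
            return 0.0, []
        n = len(send_times)
        mean_t = sum(send_times) / n
        mean_o = sum(otts) / n
        cov = sum((t - mean_t) * (o - mean_o) for t, o in zip(send_times, otts))
        var = sum((t - mean_t) ** 2 for t in send_times)
        skew = cov / var if var else 0.0
        intercept = mean_o - skew * mean_t
        detrended = [o - (intercept + skew * t) + mean_o
                     for t, o in zip(send_times, otts)]
        return skew, detrended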
The effort continued to be driven by the ultimate goal of analyzing end-to-end packet delays. To do so requires comparing pairs of unsynchronized clocks, namely those used by the tracing programs at the sender and receiver. We developed algorithms for (1) estimating clock resolution, (2) synchronizing clocks post facto, (3) detecting clock adjustments, and (4) detecting and removing relative clock skew. This last is particularly important because, if undetected, relative clock skew leads to variations in apparent packet delays quite similar to those of genuine networking effects. We found that it is fairly common for a pair of clocks to exhibit discernible relative skew. We also found that the fact that two clocks agree quite closely does not eliminate the possibility that the clocks suffer from problems such as adjustments and relative skew.

17.2.3 Network pathologies

With our measurements fully calibrated, we could then turn to analyzing packet dynamics. We began by characterizing packet-forwarding pathologies: out-of-order delivery, packet replication, and packet corruption. We found that the frequency with which packets arrive in a different order than sent varies enormously among Internet paths. While reordering often occurs in conjunction with the route ``flutter'' pathology, we also observed numerous instances in which it occurred in the absence of flutter, and some instances in which massive reordering events occurred due to ``pauses'' in router forwarding.

Finally, the possibility of reordering limits how quickly a TCP sender can infer a packet loss using the ``fast retransmission'' mechanism. We investigated whether, based on our data, this mechanism could be altered to retransmit more efficiently. We found that we could only do so if we required changes at both the TCP sender and receiver. Consequently, we might as well instead change the sender and receiver to use the more sophisticated TCP ``selective acknowledgement'' extension, now being standardized [MMFR96].

We found that the curious phenomenon of packet replication---the network delivering a single packet more than once---does indeed occur, but it is exceptionally rare. On the other hand, our analysis of packet corruption suggests that, overall, about 1 Internet data packet in 5,000 arrives with data different than what was originally sent. This rate is high enough that, given TCP's 16-bit checksum (which an arbitrary corruption eludes with probability about 1 in 2^16 = 65,536, and 5,000 x 65,536 is roughly 3.3 x 10^8), about one packet in 300,000,000 will be accepted with undetected errors. The Internet carries many more packets than this each day.

17.2.4 Estimating bottleneck bandwidth

We next turned to the problem of identifying a network path's bottleneck bandwidth. We needed to do so before analyzing packet loss and delay because the bottleneck bandwidth determines what we call the ``self-interference time constant,'' Q_b. Two data packets of size b sent less than an interval Q_b apart must necessarily queue at the bottleneck element of the network path. Thus, knowledge of Q_b enables us to determine which of our measurement probes were perforce correlated. It further plays a major role in assessing packet loss, because we want to distinguish between the loss of data packets that we know had to queue behind their predecessors (``self-interference''), versus those lost even though they did not have to queue on account of the connection's own loading of the network path.

We discussed how the main existing technique for estimating bottleneck bandwidth, ``packet pair,'' could produce incorrect estimates.
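To fix notation for the two quantities just discussed (the symbols are ours, introduced only for illustration): writing rho_B for the bottleneck bandwidth and b for the packet size, the natural definitions implied by the text are

    Q_b = \frac{b}{\rho_B}, \qquad \hat{\rho}_B = \frac{b}{\Delta t_{\mathrm{recv}}},

where Delta t_recv is the spacing with which a back-to-back pair of b-byte packets arrives at the receiver. For example, 512-byte packets crossing a 64 kbit/s bottleneck give Q_b = (512 x 8)/64,000 = 64 msec, and a pair of such packets arriving 64 msec apart yields a packet-pair estimate of 64 kbit/s.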
Such incorrect estimates can occur in the presence of: excessive noise; packet reordering; changes in the bottleneck bandwidth; or network paths in which the bottleneck is comprised of multiple, separate channels or links. This last case is particularly interesting, because it leads to erroneously large bottleneck estimates even if the network is completely quiescent. The problem lies in the fundamental assumption made by packet pair that packets must queue behind one another at the bottleneck and be served by it one at a time. For a multi-channel or multi-link bottleneck, however, this assumption does not in fact apply, and a pair of packets can traverse the bottleneck without it altering the spacing between them.

These observations motivated us to devise a robust algorithm for estimating bottleneck bandwidth, based on ``packet bunch modes'' (PBM). By focussing on identifying multiple modes in the distribution of the estimated bottleneck bandwidth, PBM can accommodate errors introduced by noise, as well as detecting changes in bottleneck bandwidth and the presence of multi-channel links. By using receiver-based measurement, it also can cope with packet reordering, and with the possibility of asymmetries in the bottleneck bandwidths along the two directions of a network path.

We calibrated PBM by testing whether we could associate known, common link speeds with its estimates. We found that we could almost always do so. Once we had faith in PBM's accuracy, we could then test other estimation methods against PBM to see how well they perform. We found that receiver-based packet pair performs almost as well, if we can tolerate failing to detect shifts in bottleneck bandwidth or multi-channel links, both of which prove rare. Sender-based packet pair, however, does not perform nearly as well, due to the additional noise incurred by measuring timings that reflect the traversal of packets in both of a path's directions. Finally, we find that about 20% of the time, a path's two directions have asymmetric bottleneck bandwidths, but that, along a single direction, the bottleneck generally remains constant over lengthy periods of time.

One drawback with PBM is that it is ad hoc to an unsatisfying degree. It uses a considerable number of heuristics that can only be defended on the basis that they appear to work well in practice. We found this acceptable (if regrettable), because for our study bottleneck bandwidth estimation was fundamentally only a stepping stone to the later analysis, and not an end in itself. We hope, however, that the basic ideas underlying PBM---searching for multiple modes and interpreting the ways they overlap in terms of bottleneck changes and multi-channel paths---might be revisited in the future, in an attempt to develop them in a more systematic fashion.

17.2.5 Packet loss

We now could turn to analyzing patterns of packet loss in the Internet. We found that over the course of 1995, packet loss rates nearly doubled, indicating a marked degradation in service. However, these rates required further inspection to understand their implications.

We first developed the notion of the network having two general states, ``quiescent,'' corresponding to periods of no loss, and ``busy,'' corresponding to periods in which connections observe at least one loss. The proportion of quiescent connections did not change appreciably during 1995; instead, the loss rate increases were due to higher levels of loss during busy periods.
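A minimal sketch of this two-state summary, assuming per-connection loss counts are available (the function name and input representation are ours, not the dissertation's tooling):

    # Hypothetical summary of connections as "quiescent" (no loss) vs. "busy".
    # `conns` is a list of (packets_sent, packets_lost) pairs, one per
    # connection.  Returns the fraction of quiescent connections and the
    # aggregate loss rate over the busy connections alone.
    def quiescent_busy_summary(conns):
        if not conns:
            return float("nan"), float("nan")
        quiescent = sum(1 for sent, lost in conns if lost == 0)
        busy = [(sent, lost) for sent, lost in conns if lost > 0]
        frac_quiescent = quiescent / len(conns)
        busy_loss_rate = (sum(lost for _, lost in busy) /
                          sum(sent for sent, _ in busy)) if busy else 0.0
        return frac_quiescent, busy_loss_rate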
We also distinguished between three different types of lost packets: ``loaded'' data packets, meaning those that necessarily had to wait at the bottleneck behind one or more of their predecessors; ``unloaded'' data packets, meaning those that did not have to queue behind predecessors, unless cross traffic arrived and delayed their predecessors; and acknowledgements. We found that loaded packets are much more likely to suffer high loss rates than unloaded packets, which is not surprising, since they encounter not only the ambient network load but that of their predecessors; and that acks are more likely to be lost than unloaded packets (or even loaded packets, for high loss rates). We interpret these findings as reflecting the fundamental difference between data packets being sent at a rate that adapts in an effort to diminish packet loss, and acks being sent at a rate that does not adapt to the rate at which acks are lost. This finding highlights how the loss rates observed by a TCP connection's data packets differ from the unconditional loss rates along the path they traverse.

The last comparison between data packet and ack loss rates we made was to determine the degree of correlation between the two rates for a single connection. We found that the two are nearly uncorrelated, indicating that this fundamental property of a network path is asymmetric.

We next found that different major regions of the Internet---the United States, Europe, and connections from one to the other---experienced very different loss rates. Then, after showing that loss rates follow the well-known diurnal cycle reflecting working hours and off-work hours, we analyzed variations in the time of day during which our measurement apparatus succeeded in executing a measurement. For North American sites, these successes were uniformly spread over the 24 hours of each day. For European sites, though, the frequency of successes dipped to low points in patterns that closely matched the loss-rate cycle, indicating that our European measurements suffered from a discernible bias towards underestimating loss rates.

Another question we investigated was whether packet loss events are well-modeled as independent, since this assumption is sometimes made when theorizing about network behavior. We found that loss events are instead strongly correlated (a minimal version of such a check is sketched below). Furthermore, the duration of loss ``outages'' exhibits infinite variance, which accords with a recent model of how individual connection behavior can give rise to ``self-similar'' aggregate traffic behavior [WTSW95].

We then looked at the question of where packets are lost along an Internet path, in particular whether they are lost before or after the bottleneck element. From careful analysis of timing information we can sometimes distinguish between these two. We found that, while most losses occur at or before the bottleneck, a significant minority (roughly 25%) occur after.

We next evaluated how packet loss rates evolve over time, with an eye towards gauging the efficacy of caching packet loss statistics associated with a path in order to predict future path performance. We found that a path's state, in terms of ``quiescent'' or ``busy,'' is a good predictor of its future state for many hours, but a path's observed loss rate is not a good predictor of its future loss rate.

We then investigated how efficiently TCP implementations retransmit. We found that, for some implementations, the large majority of their retransmissions are unnecessary.
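Returning to the independence question raised above, the following is a minimal illustration of such a check, not the dissertation's methodology: compare a connection's unconditional loss rate with the loss rate conditioned on the immediately preceding packet having been lost. Under an independence model the two are roughly equal; strongly correlated losses show up as a conditional rate well above the unconditional one. The function and its input representation are assumptions for this sketch.

    # Illustrative only.  `lost` is a per-packet sequence of booleans, in
    # transmission order, with True marking a lost packet.  Returns the
    # (unconditional loss rate, loss rate given the previous packet was lost).
    def loss_independence_check(lost):
        if len(lost) < 2:
            return None
        unconditional = sum(lost) / len(lost)
        after_loss = [curr for prev, curr in zip(lost, lost[1:]) if prev]
        conditional = (sum(after_loss) / len(after_loss)
                       if after_loss else float("nan"))
        return unconditional, conditional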
Fixing these problematic implementations and deploying the SACK extension would eliminate nearly all of the unnecessary retransmissions.

17.2.6 Packet delay

We finished our study with an analysis of end-to-end packet transit delays. We found that both round-trip times (RTTs) and one-way transit times (OTTs) exhibit great ``peak-to-peak'' variation. OTT variations for the most part are asymmetric. The only clear correlation occurs between the order-of-magnitude (logarithm) variation in the two directions. On the other hand, OTT variation is clearly correlated with packet loss rates, as we would expect. We further found that OTT variation is not a good predictor of future OTT variation, in accord with the finding that packet loss rates are not good predictors of future loss rates.

We then turned to an assessment of packet timing compression, in which a group of packets arrives at their receiver more closely spaced than when they were sent. We identify three types of compression: ack compression, data packet compression, and receiver compression. Each requires somewhat different assessment considerations. Overall, none of the three types occur frequently enough to pose a significant problem in terms of network performance and stability. Their presence does, however, complicate path measurement efforts, which must use judicious filtering to avoid mistaking compression events for different network effects, such as a temporary increase in bottleneck bandwidth.

We next investigated the time scales over which queueing occurs, by determining on which time scales we observed the maximum sustained and peak OTT variations. We found that both occur most frequently on time scales of about 100--1000 msec, though, as with many Internet phenomena, we also found a wide range of behavior beyond this region. (In particular, we sometimes found maximal queueing occurring on much longer time scales.)

The last aspect of packet delay we analyzed was the degree to which it reflects available bandwidth. We did this by studying the ratio between the delay a packet incurred due to its connection's own loading of the network path, versus the total delay it incurred. This ratio correlates well with the overall throughput achieved by a connection. However, we also showed that the accuracy of the ratio is diminished by the presence of errors in estimating the bottleneck bandwidth. We observed a distinct decrease in available bandwidth over the course of 1995, though we also observed significant regional variation, with U.S. sites enjoying considerably more available bandwidth than European sites. Finally, we investigated how available bandwidth evolves over time. We found that a connection's available bandwidth is a fairly good predictor of future available bandwidth out to time scales of hours.

17.3 Future research

There are three general areas of future work suggested by our research.

First, our original goal when proposing the research was to use end-to-end measurements to drive the development of new algorithms for how transport protocols can adapt to changing network conditions. We had to abandon this goal once the scope of analyzing the measurements themselves became apparent, but clearly an important potential benefit of end-to-end characterization such as we have undertaken is to better optimize how connections use the network.

Closely related to developing such new algorithms is the question of fast estimation of Internet path behavior.
The algorithms we developed for calibrating network clocks (Chapter 12), estimating bottleneck bandwidth (Chapter 14), and assessing queueing time scales and available bandwidth (Chapter 16) all in their present form analyze entire connection traces. Yet, transport connections clearly need to make decisions based on path properties quickly, and cannot afford the luxury of analyzing the fate of several hundred packets. Our work, though, can play a key role in developing fast estimation techniques, because the algorithms we developed can then be used to calibrate the faster algorithms.

Finally, the NPD framework serves well to address the issue of capturing reasonably representative samples of a cross-section of Internet path behavior. Another important form of Internet heterogeneity, however, is how Internet traffic changes with time. Only longitudinal studies can address such ``temporal'' heterogeneity. We have attempted to touch on this issue by capturing two datasets spaced a year apart. Clearly, though, we need longer-term studies to develop solid conclusions about traffic trends. We believe this goal can be met in conjunction with the development of an Internet ``measurement infrastructure,'' that is, large-scale deployment of NPD-like measurement platforms. We do not claim that the NPD framework can simply be scaled up to serve as this infrastructure; indeed, the problem of an infrastructure that can scale to the full Internet is the key research problem for the infrastructure. But, if accomplished, such an infrastructure could serve, through the accumulated archives of its measurements, as the basis for longitudinal studies; and, even more significantly, as a mechanism for assessing and improving the overall health of the network.

17.4 Themes of the work

Several themes emerge from our study:

• The N^2 scaling property of our measurement framework serves to measure a sufficiently diverse set of Internet paths that we might plausibly interpret the resulting analysis as accurately reflecting general Internet behavior.

• To cope with such large-scale measurements requires attention to calibration using self-consistency checks; robust statistics to avoid skewing by outliers; and automated ``micro-analysis,'' such as that performed by tcpanaly, that we might see the forest as well as the trees.

• With due diligence to remove packet filter errors and TCP effects, TCP-based measurement provides a viable means for assessing end-to-end packet dynamics.

• We find wide ranges of behavior, so we must exercise great caution in regarding any aspect of packet dynamics as ``typical.''

• Some common assumptions such as in-order packet delivery, FIFO bottleneck queueing, independent loss events, single congestion time scales, and path symmetries are all violated, sometimes frequently.

• The combination of path asymmetries and reverse-path noise renders sender-only measurement techniques markedly inferior to those that include receiver cooperation.

This last point argues that, when the measurement of interest concerns a unidirectional path---be it for measurement-based adaptive transport techniques such as TCP Vegas [BOP94], or general Internet performance metrics such as those in development by the IPPM effort [A+96, Pa96a]---the extra complications incurred by coordinating the sender and receiver are worth the effort.
Finally, we believe an important aspect of this work is how it might contribute towards developing a ``measurement infrastructure'' for the Internet: one that proves ubiquitous, informative, and sound. 372 Bibliography [A+96] G. Almes et al., ``Framework for IP Provider Metrics,'' Internet draft, ftp://ftp.isi.edu/ internet­drafts/draft­ietf­bmwg­ippm­framework­00.txt, Nov. 1996. [AW96] M. Arlitt and C. Williamson, ``Web Server Workload Characterization: The Search for Invariants,'' Proceedings of SIGMETRICS '96, Philadelphia, May 23­26, 1996. [Aw90] B. Awerbuch, ``Shortest Paths and Loop­Free Routing in dynamic networks (Extended Abstract),'' Proceedings of SIGCOMM '90, pp. 177­187, September 1990. [Ba95] F. Baker, Ed., ``Requirements for IP Version 4 Routers,'' RFC 1812, DDN Network Infor­ mation Center, June 1995. [Ba94] A. Banerjea, Fault Management for Realtime Networks, Ph.D. thesis, University of Cali­ fornia, Berkeley, 1994. [BDG95] C. Baransel, W. Dobosiewicz, and P. Gburzynski, ``Routing in Multihop Packet Switching Networks: Gb/s Challenge,'' IEEE Network, 9(3), pp. 38­61, May/June 1995. [BCW88] R. Becker, J. Chambers, and A. Wilks, The New S Language, Wadsworth & Brooks/Cole, 1988. [Be95] S. Bellovin, ``Using the Domain Name System for System Break­ins,'' Proceedings of the 5th USENIX UNIX Security Symposium, Salt Lake City, June 1995. [BCLF+] T. Berners­Lee et al., ``The World­Wide Web,'' Communications of the ACM, 37(8), pp. 76­82, August 1994. [Be82] D. Bertsekas, ``Dynamic Behavior of Shortest Path Routing Algorithms for Communica­ tion Networks,'' IEEE Transactions on Automatic Control, AC­27, pp. 60­74, February 1982. [BM92] I. Bilinskis and A. Mikelsons, Randomized Signal Processing, Prentice Hall Interna­ tional, 1992. [Bi95] P.G. Bilse, private communication, October 16, 1995. [Bo93] J­C. Bolot, ``End­to­End Packet Delay and Loss Behavior in the Internet,'' Proceedings of SIGCOMM '93, pp. 289­298, September 1993. 373 [BCG95] J­C. Bolot, H. Cr’epin, and A.V. Garcia, ``Analysis of Audio Packet Loss in the Internet,'' Proceedings of the 5th International Workshop on Network and Operating System Support for Digital Audio and Video, Durham, New Hampshire, April 1995. [BBJ92] D. Borman, R. Braden and V. Jacobson, ``TCP Extensions for High Performance,'' RFC 1323, Network Information Center, SRI International, Menlo Park, CA, May 1992. [BJ88] R. Braden and V. Jacobson, ``TCP extensions for long­delay paths,'' RFC 1072, Network Information Center, SRI International, Menlo Park, CA, October 1988. [Br89] R. Braden, Ed., ``Requirements for Internet Hosts---Communication Layers,'' RFC 1122, Network Information Center, SRI International, Menlo Park, CA, October 1989. [Br94] R. Braden, ``T/TCP --- TCP Extensions for Transactions: Functional Specification,'' RFC 1644, DDN Network Information Center, July 1994. [BCS94] R. Braden, D. Clark, and S. Shenker, ``Integrated Services in the Internet Architecture: an Overview,'' RFC 1633, DDN Network Information Center, June 1994. [BOP94] L. Brakmo, S. O'Malley, and L. Peterson, ``TCP Vegas: New Techniques for Congestion Detection and Avoidance,'' Proceedings of SIGCOMM '94, pp. 24­35, September 1994. [BP95a] L. Brakmo and L. Peterson, ``TCP Vegas: End to End Congestion Avoidance on a Global Internet,'' IEEE JSAC, 13(8), pp. 1465­1480, October 1995. [BP95b] L. Brakmo and L. Peterson, ``Performance Problems in BSD4.4 TCP,'' Computer Com­ munication Review, 25(5), pp. 69­84, October 1995. [BE90] L. Breslau and D. 
Estrin, ``Design of Inter­Administrative Domain Routing Protocols,'' Proceedings of SIGCOMM '90, pp. 231­241, September 1990. [BMR97] N. Brownlee, C. Mills, and G. Ruth, ``Traffic Flow Measurement: Architecture,'' RFC 2063, DDN Network Information Center, January 1997. [CC96a] R. Carter and M. Crovella, ``Measuring Bottleneck Link Speed in Packet­Switched Net­ works,'' Technical Report BU­CS­96­006, Computer Science Department, Boston Uni­ versity, March 1996. [CC96b] R. Carter and M. Crovella, ``Dynamic Server Selection using Bandwidth Probing in Wide­Area Networks,'' Technical Report BU­CS­96­007, Computer Science Department, Boston University, March 1996. [Cha95] C. Chatfield, ``Model uncertainty, data mining and statistical inference,'' Journal of the Royal Statistical Society A, 158:419--466, 1995. [Che95] E. Chen, ``Symmetric routing in a Multi­Provider Internet,'' North American Network Operators' Group, May 1995 Meeting Notes, S. Barber, Ed., http://www.academ.com/ nanog/may1995/symmetric.html. 374 [CB94] W. Cheswick and S. Bellovin, Firewalls and Internet Security: Repelling the Wily Hacker, Addison­Wesley, 1994. [Ch93] B. Chinoy, ``Dynamics of Internet Routing Information,'' Proceedings of SIGCOMM '93, pp. 45­52, September 1993. [CBP94] K. Claffy, H­W. Braun and G. Polyzos, ``Tracking Long­Term Growth of the NSFNET,'' Communications of the ACM, 37(8), pp. 34­45, Aug. 1994. [CBP95] K. Claffy, H­W. Braun and G. Polyzos, ``A Parameterizable Methodology for Internet Traffic Flow Profiling,'' IEEE JSAC, 13(8), pp. 1481­1494 October 1995. [CPB93a] K. Claffy, G. Polyzos and H­W. Braun, ``Measurement Considerations for Assessing Unidirectional Latencies,'' Internetworking: Research and Experience, 4 (3), pp. 121­ 132, September 1993. [CPB93b] K. Claffy, G. Polyzos, and H­W. Braun, ``Traffic Characteristics of the T1 NSFNET Backbone,'' Proceedings of INFOCOM '93, San Francisco, March, 1993. [Cl82] D. Clark, ``Window and Acknowledgement Strategy in TCP,'' RFC 813, Network Infor­ mation Center, SRI International, Menlo Park, CA, July 1982. [Cl88] D. Clark, ``The Design Philosophy of the DARPA Internet Protocols,'' Proceedings of SIGCOMM '88, pp. 106­114, August 1988. [CJRS89] D. Clark, V. Jacobson, J. Romkey, and H. Salwen, ``An Analysis of TCP Processing Overhead,'' IEEE Communications, pp. 23­29, June 1989. [CSZ92] D. Clark, S. Shenker, and L. Zhang, ``Supporting Real­Time Applications in an Integrated Services Packet Network: Architecture and Mechanism,'' Proceedings of SIGCOMM '92, pp. 14­26, August 1992. [Co90] S. Cohn, ``Arpanet Routing,'' in Fault­Tolerant Distributed Computing, B. Simons and A. Spector, editors, Springer­Verlag, 1990. [CL94] D. Comer and J. Lin, ``Probing TCP Implementations,'' Proceedings of the 1994 Summer USENIX Conference, Boston, MA. [Co91­95] A. Cooper, Ed., ``Internet Monthly Reports,'' http://www.isi.edu:80/in­notes/imr/. [CB96] M. Crovella and A. Bestavros, ``Self­Similarity in World Wide Web Traffic: Evidence and Possible Causes,'' Proceedings of SIGMETRICS '96, Philadelphia, May 23­26, 1996. [CW91] J. Crowcroft and I. Wakeman, ``Traffic Analysis of some UK­US Academic Network Data,'' Proceedings of INET '91, Copenhagen, June 1991. [DS86] R. B. D'Agostino and M. A. Stephens, editors, Goodness­of­Fit Techniques, Marcel Dekker, Inc., 1986. [DHS93] P. Danzig, R. Hall, and M. Schwartz, ``A Case for Caching File Objects Inside Internet­ works,'' Proceedings of SIGCOMM '93, San Francisco, September 1993. 375 [DJCME92] P. Danzig, S. Jamin, R. C’aceres, D. 
Mitzel, and D. Estrin, ``An Empirical Workload Model for Driving Wide­area TCP/IP Network Simulations,'' Internetworking: Research and Experience, 3(1), pp. 1­26, 1992. [DOK92] P. Danzig, K. Obraczka, and A. Kumar, ``An Analysis of Wide­Area Name Server Traf­ fic,'' Proceedings of SIGCOMM '92, Baltimore, Aug. 1992. [DJM97] S. Dawson, F. Jahanian, and T. Mitton, ``Experiments on Six Commercial TCP Imple­ mentations Using a Software Fault Injection Tool,'' to appear in Software: Practice & Experience. [DC90] S. Deering and D. Cheriton, ``Multicast Routing in Datagram Internetworks and Extended LANs,'' ACM Transactions on Computer Systems, 8(2), pp. 85­110, May 1990. [DEFJLW94] S. Deering, D. Estrin, D. Farinacci, V. Jacobson, C­G. Liu, and L. Wei, ``An Archi­ tecture for Wide­Area Multicast Routing,'' Proceedings of SIGCOMM '94, pp. 126­135, September 1994. [DB95] L. Delgrossi and L. Berger, Ed., ``Internet Stream Protocol Version 2 (ST2), Proto­ col Specification --- Version ST2+,'' RFC 1819, DDN Network Information Center, August 1995. [DISA95] Defense Information Systems Agency DDN Program Office, ftp://nic.ddn.mil/netinfo/ asn.txt; August 1995. [Do95] S. Doran, ``Route Flapping,'' with notes by Stan Barber, http://www.merit.edu/ routing.arbiter/NANOG/2.95.NANOG.notes/route­flapping.html. [DMT96] R. Durst, G. Miller and E. Travis, ``TCP Extensions for Space Communications,'' Pro­ ceedings of MOBICOM '96, pp. 15­26, November 1996. [El96] R. Elz, private communication, January 12, 1996. [ERH92] D. Estrin, Y. Rekhter and S. Hotz, ``Scalable Inter­Domain Routing Architecture,'' Pro­ ceedings of SIGCOMM '92, pp. 40­52, August 1992. [EHS92] D. Ewing, R. Hall, and M. Schwartz, ``A Measurement Study of Internet File Transfer Traffic,'' Report CU­CS­571­92, Department of Computer Science, University of Col­ orado, Boulder, 1992. [FF96] K. Fall and S. Floyd, ``Simulation­based Comparisons of Tahoe, Reno, and SACK TCP,'' Computer Communication Review, 26(3), pp. 5­21, July 1996. [Fe90] D. Ferrari, ``Client Requirements for Real­Time Communication Services,'' IEEE Com­ munications, pp. 65­72, November 1990. [FBZ94] D. Ferrari, A. Banerjea and H. Zhang, ``Network support for multimedia: A discussion of the Tenet approach,'' Computer Networks and ISDN Systems, 26(10), pp. 1267­1280, July 1994. 376 [FGV95] D. Ferrari, A. Gupta and G. Ventre, ``Distributed Advance Reservation of Real­Time Con­ nections,'' Proceedings of the 5th International Workshop on Network and Operating Sys­ tem Support for Digital Audio and Video, Durham, New Hampshire, April 1995. [Fl91] S. Floyd, ``Connections with Multiple Congested Gateways in Packet­Switched Net­ works, Part 1: One­way Traffic,'' Computer Communication Review, 21(5), pp. 30­47, October 1991. [FJ92] S. Floyd and V. Jacobson, ``On Traffic Phase Effects in Packet­switched Gateways,'' In­ ternetworking: Research and Experience, 3(3), pp. 115­156, September 1992. [FJ93] S. Floyd and V. Jacobson, ``Random Early Detection Gateways for Congestion Avoid­ ance,'' IEEE/ACM Transactions on Networking, 1(4), pp. 397­413, August 1993. [FJ94] S. Floyd and V. Jacobson, ``The Synchronization of Periodic Routing Messages,'' IEEE/ACM Transactions on Networking, 2(2), pp. 122­136, April 1994. [FL91] H. Fowler and W. Leland, ``Local Area Network Traffic Characteristics, with Implica­ tions for Broadband Network Congestion Management,'' IEEE JSAC, 9(7), pp. 1139­ 1149, September 1991. [FJ70] E. Fuchs and P. E. 
Jackson, ``Estimates of Distributions of Random Variables for Certain Computer Communications Traffic Models,'' Communications of the ACM, 13(12), pp. 752­757, December 1970. [GR95] A. Gupta and K. Rothermel, ``Fault handling for multi­party real­time communication,'' Technical Report TR­95­059, International Computer Science Institute, University of Cal­ ifornia, Berkeley, October 1995. [G90] R. Gusella, ``A Measurement Study of Diskless Workstation Traffic on an Ethernet,'' IEEE Transactions on Communications, 38(9), pp. 1557­1568, September 1990. [GK97] E. Gustafsson and G. Karlsson, ``A Literature Survey on Traffic Dispersion,'' IEEE Net­ work, 11(2), pp. 28­36, March/April 1997. [HK89] S. Hares and D. Katz, ``Administrative Domains and Routing Domains: A Model for Rout­ ing in the Internet,'' RFC 1136, Network Information Center, SRI International, Menlo Park, CA, December, 1989. [HSF85] K. Harrenstien, M. Stahl, and E. Feinler, ``NICNAME/WHOIS,'' RFC 954, Network In­ formation Center, SRI International, Menlo Park, CA, 1985. [He90] S. Heimlich, ``Traffic Characterization of the NSFNET National Backbone,'' Proceedings of the 1990 Winter USENIX Conference, Washington, D.C. [HMT83] D. Hoaglin, F. Mosteller, and J. Tukey, Ed., ``Understanding Robust and Exploratory Data Analysis,'' John Wiley & Sons, 1983. [Ho96] J. Hoe, ``Improving the Start­up Behavior of a Congestion Control Scheme for TCP,'' Proceedings of SIGCOMM '96, pp. 270­280, August 1996. 377 [Hu95] C. Huitema, Routing in the Internet, Prentice Hall PTR, 1995. [HP91] N. Hutchinson and L. Peterson, ``The x­kernel: An architecture for implementing network protocols,'' IEEE Transactions on Software Engineering, 17(1), pp. 64­76, January 1991. [Ja88] V. Jacobson, ``Congestion Avoidance and Control,'' Proceedings of SIGCOMM '88, pp. 314­329, August 1988. [Jac89] V. Jacobson, traceroute, ftp://ftp.ee.lbl.gov/traceroute.tar.Z, 1989. [JLM89] V. Jacobson, C. Leres, and S. McCanne, tcpdump, available via anonymous ftp to ftp.ee.lbl.gov, June 1989. [Jac90] V. Jacobson, ``Compressing TCP/IP headers for low­speed serial links,'' RFC 1144, Net­ work Information Center, SRI International, Menlo Park, CA, February 1990. [JLM97] F. Jahanian, C. Labovitz, and G. Malan, ``Internet Routing Instability,'' to appear in Pro­ ceedings of SIGCOMM '97, September 1997. [Jai89] R. Jain, ``A Delay­Based Approach for Congestion Avoidance in Interconnected Hetero­ geneous Computer Networks,'' Computer Communication Review, 19(5), pp. 56­71, Oc­ tober 1989. [Jai90] R. Jain, ``Performance Analysis of FDDI Token Ring Networks: Effect of Parameters and Guidelines for Setting TTRT,'' Proceedings of SIGCOMM '90, pp. 264­275, Septem­ ber 1990. [JR86] R. Jain and S. Routhier, ``Packet Trains --- Measurements and a New Model for Computer Network Traffic,'' IEEE JSAC, 4(6), pp. 986­995, September, 1986. [KP87] P. Karn and C. Partridge. ``Estimating round­trip times in reliable transport protocols,'' Proceedings of SIGCOMM '87, August 1987. [Ke91] S. Keshav, ``A Control­Theoretic Approach to Flow Control,'' Proceedings of SIG­ COMM '91, pp. 3­15, September 1991. [KZ89] A. Khanna and J. Zinky, ``The Revised ARPANET Routing Metric,'' Proceedings of SIG­ COMM '89, pp. 45­56, September 1989. [Kl76] L. Kleinrock, ``Queueing Systems, Volume II: Computer Applications,'' John Wiley & Sons, 1976. [LTWW94] W. Leland, M. Taqqu, W. Willinger, and D. Wilson, ``On the Self­Similar Nature of Ethernet Traffic (Extended Version),'' IEEE/ACM Transactions on Networking, 2(1), pp. 
1­15, February 1994. [Lid96] K. Lidl, private communication, January 3, 1996. [Lin96] T. Lindgreen, private communication, January 12, 1996. 378 [Li89] M. Little, ``Goals and Functional Requirements for Inter­Autonomous System Rout­ ing,'' RFC 1126, Network Information Center, SRI International, Menlo Park, CA, October, 1989. [LMG95] D. Long, A. Muir and R. Golding, ``A longitudinal survey of Internet host reliability,'' Technical Report UCSC­CRL­95­16, University of California, Santa Cruz, 1995. [Lo95] M. Lottor, ftp://nic.merit.edu/nsfnet/statistics; October 1995. [Lo97] M. Lottor, http://www.nw.com/zone/WWW/top.html; February 1997. [MMFR96] M. Mathis, J. Mahdavi, S. Floyd and A. Romanow, ``TCP Selective Acknowledgment Options,'' RFC 2018, DDN Network Information Center, Oct. 1995. [MM96] M. Mathis and J. Mahdavi, ``Diagnosing Internet Congestion with a Transport Layer Per­ formance Tool,'' Proceedings of INET '96, Montreal, June 1996. [MJ93] S. McCanne and V. Jacobson, ``The BSD Packet Filter: A New Architecture for User­level Packet Capture,'' Proceedings of the 1993 Winter USENIX Conference, San Diego, CA. [MLJ94] S. McCanne, C. Leres and V. Jacobson, libpcap, available via anonymous ftp to ftp.ee.lbl.gov, 1994. [MFR78] J. McQuillan, G. Falk and I. Richer, ``A Review of the Development and Performance of the ARPANET Routing Algorithm,'' IEEE Transactions on Communications, 26(12), pp. 1802­1811, December 1978. [MRR80] J. McQuillan, I. Richer and E. Rosen, ``The New Routing Algorithm for the ARPANET,'' IEEE Transactions on Communications, 28(5), pp. 711­719, May 1980. [Me95a] Merit Network, Inc., ftp://nic.merit.edu/nsfnet/statistics/history.nets; May 1995. [Me95b] Merit Network, Inc., ``The Routing Arbiter'' home page. http://www.merit.edu/ routing.arbiter/RA/ describes the project as a whole, and http://nic.merit.edu/ routing.arbiter/RA/statistics/flap.html gives ``route flap'' statistics. [Mi83] D. Mills, ``Internet Delay Experiments,'' RFC 889, Network Information Center, SRI In­ ternational, Menlo Park, CA, 1983. [Mi92a] D. Mills, ``Network Time Protocol (Version 3): Specification, Implementation and Anal­ ysis,'' RFC 1305, Network Information Center, SRI International, Menlo Park, CA, March 1992. [Mi92b] D. Mills, ``Modelling and Analysis of Computer Network Clocks,'' Technical Report 92­ 5­2, Electrical Engineering Department, University of Delaware, May 1992. [MD88] P. Mockapetris and K. Dunlap, ``Development of the Domain Name System,'' Proceedings of SIGCOMM '88, pp. 123­133, August 1988. [Mo92] J. Mogul, ``Observing TCP Dynamics in Real Networks,'' Proceedings of SIGCOMM '92, pp. 305­317, August 1992. 379 [Mo95] J. Moy, ``Link­State Routing,'' in [St95]. [Mu94] A. Mukherjee, ``On the Dynamics and Significance of Low Frequency Components of Internet Load,'' Internetworking: Research and Experience, Vol. 5, pp. 163­205, Decem­ ber 1994. [My95] C. Myers, private communication, October 16, 1995. [Na84] J. Nagle, ``Congestion Control in IP/TCP Internetworks,'' RFC 896, Network Information Center, SRI International, Menlo Park, CA, January 1984. [Na87] J. Nagle, ``On Packet Switches with Infinite Storage,'' IEEE Transactions on Communica­ tions, 35(4), pp. 435­438, 1987. [PaFe94] C. Parris and D. Ferrari, ``A Dynamic Connection Management Scheme for Guaranteed Performance Services in Packet­Switching Integrated Services Networks,'' Proceedings of INFOCOM '94, Toronto, June 1994. [PHS95] C. Partridge, J. Hughes, and J. 
[Pa94a] V. Paxson, ``Empirically-Derived Analytic Models of Wide-Area TCP Connections,'' IEEE/ACM Transactions on Networking, 2(4), pp. 316-336, August 1994.
[Pa94b] V. Paxson, ``Growth Trends in Wide-Area TCP Connections,'' IEEE Network, 8(4), pp. 8-17, July/August 1994.
[PF95] V. Paxson and S. Floyd, ``Wide-Area Traffic: The Failure of Poisson Modeling,'' IEEE/ACM Transactions on Networking, 3(3), pp. 226-244, June 1995.
[Pa96a] V. Paxson, ``Towards a Framework for Defining Internet Performance Metrics,'' Proceedings of INET '96, Montreal, June 1996.
[Pa96b] V. Paxson, ``End-to-End Routing Behavior in the Internet,'' Proceedings of SIGCOMM '96, pp. 25-38, August 1996.
[PV88] R. Perlman and G. Varghese, ``Pitfalls in the Design of Distributed Routing Algorithms,'' Proceedings of SIGCOMM '88, pp. 43-54, August 1988.
[Pe91] R. Perlman, ``A comparison between two routing protocols: OSPF and IS-IS,'' IEEE Network, 5(5), pp. 18-24, September 1991.
[Pe92] R. Perlman, Interconnections: Bridges and Routers, Addison-Wesley, 1992.
[Po80] J. Postel, ``User Datagram Protocol,'' RFC 768, Network Information Center, SRI International, Menlo Park, CA, August 1980.
[Po81a] J. Postel, ``Internet Protocol,'' RFC 791, Network Information Center, SRI International, Menlo Park, CA, September 1981.
[Po81b] J. Postel, ``Internet Control Message Protocol,'' RFC 792, Network Information Center, SRI International, Menlo Park, CA, September 1981.
[Po81c] J. Postel, ``Transmission Control Protocol,'' RFC 793, Network Information Center, SRI International, Menlo Park, CA, September 1981.
[PFTV86] W. Press, B. Flannery, S. Teukolsky, and W. Vetterling, Numerical Recipes, Cambridge University Press, 1986.
[RJ90] K. Ramakrishnan and R. Jain, ``A Binary Feedback Scheme for Congestion Avoidance in Computer Networks,'' ACM Transactions on Computer Systems, 8(2), pp. 158-181, May 1990.
[RY94] K. Ramakrishnan and H. Yang, ``The Ethernet Capture Effect: Analysis and Solution,'' Proceedings of IEEE 19th Conference on Local Computer Networks, October 1994.
[Re89] J. Rekhter, ``EGP and Policy Based Routing in the New NSFNET Backbone,'' RFC 1092, Network Information Center, SRI International, Menlo Park, CA, February 1989.
[RC92] Y. Rekhter and B. Chinoy, ``Injecting Inter-autonomous System Routes into Intra-autonomous System Routing: a Performance Analysis,'' Internetworking: Research and Experience, Vol. 3, pp. 189-202, 1992.
[Re95] Y. Rekhter, ``Inter-Domain Routing: EGP, BGP, and IDRP,'' in [St95].
[RL95] Y. Rekhter and T. Li, ``A Border Gateway Protocol 4 (BGP-4),'' RFC 1771, DDN Network Information Center, March 1995.
[RG95] Y. Rekhter and P. Gross, ``Application of the Border Gateway Protocol in the Internet,'' RFC 1772, DDN Network Information Center, March 1995.
[Ri95] J. Rice, Mathematical Statistics and Data Analysis, 2nd edition, Duxbury Press, 1995.
[Ri92] R. Rivest, ``The MD5 Message-Digest Algorithm,'' RFC 1321, DDN Network Information Center, April 1992.
[Ro82] E. Rosen, ``Exterior Gateway Protocol (EGP),'' RFC 827, Network Information Center, SRI International, Menlo Park, CA, October 1982.
[Ro83] S. Ross, Stochastic Processes, John Wiley & Sons, 1983.
[SRC84] J. Saltzer, D. Reed and D. Clark, ``End-To-End Arguments in System Design,'' ACM Transactions on Computer Systems, 2(4), pp. 277-288, November 1984.
[SAGJ93] D. Sanghi, A.K. Agrawal, O. Gudmundsson, and B.N. Jain, ``Experimental Assessment of End-to-end Behavior on Internet,'' Proceedings of INFOCOM '93, San Francisco, March 1993.
[Sc77] M. Schwartz, Computer Communication Network Design and Analysis, Prentice Hall, 1977.
[SS80] M. Schwartz and T. Stern, ``Routing Techniques Used in Computer Communication Networks,'' IEEE Transactions on Communications, 28(4), pp. 539-552, April 1980.
[SFANC93] D. Sidhu, T. Fu, S. Abdallah, R. Nair, and R. Coltun, ``Open Shortest Path First (OSPF) Routing Protocol Simulation,'' Proceedings of SIGCOMM '93, pp. 53-62, September 1993.
[St95] M. Steenstrup, editor, Routing in Communications Networks, Prentice-Hall, 1995.
[St94] W.R. Stevens, TCP/IP Illustrated, Volume 1: The Protocols, Addison-Wesley, 1994.
[St96] W.R. Stevens, TCP/IP Illustrated, Volume 3: TCP for Transactions, HTTP, NNTP, and the UNIX Domain Protocols, Addison-Wesley, 1996.
[St97] W.R. Stevens, ``TCP Slow Start, Congestion Avoidance, Fast Retransmit, and Fast Recovery Algorithms,'' RFC 2001, DDN Network Information Center, January 1997.
[Ta96] A.S. Tanenbaum, Computer Networks, 3rd edition, Prentice Hall, 1996.
[Tr95a] P. Traina, ``Experience with the BGP-4 Protocol,'' RFC 1773, DDN Network Information Center, March 1995.
[Tr95b] P. Traina, editor, ``BGP-4 Protocol Analysis,'' RFC 1774, DDN Network Information Center, March 1995.
[Va95] Y. Vardi, ``Network Tomography I: Estimating Source-Destination Traffic Intensities from Link Data (Fixed Routing),'' under revision for publication in Journal of the American Statistical Association, 1995.
[WLC92] I. Wakeman, D. Lewis, and J. Crowcroft, ``Traffic Analysis of Trans-Atlantic Traffic,'' Proceedings of INET '92, Kyoto, Japan, 1992.
[WC91] Z. Wang and J. Crowcroft, ``A New Congestion Control Scheme: Slow Start and Search (Tri-S),'' Computer Communication Review, 21(1), pp. 32-43, January 1991.
[WC92] Z. Wang and J. Crowcroft, ``Eliminating Periodic Packet Losses in the 4.3-Tahoe BSD TCP Congestion Control Algorithm,'' Computer Communication Review, 22(2), pp. 9-16, April 1992.
[WTSW95] W. Willinger, M. Taqqu, R. Sherman, and D. Wilson, ``Self-Similarity Through High-Variability: Statistical Analysis of Ethernet LAN Traffic at the Source Level,'' Proceedings of SIGCOMM '95, pp. 100-113, Cambridge, MA, September 1995.
[WP97] W. Willinger and V. Paxson, ``Discussion of `Heavy Tail Modeling and Teletraffic Data' by S.R. Resnick,'' to appear in Annals of Statistics, 1997.
[WPT97] W. Willinger, V. Paxson and M. Taqqu, ``Self-Similarity and Heavy Tails: Structural Modeling of Network Traffic,'' in A Practical Guide To Heavy Tails: Statistical Techniques for Analysing Heavy Tailed Distributions, to be published by Birkhauser, 1997.
[Wo82] R. Wolff, ``Poisson Arrivals See Time Averages,'' Operations Research, 30(2), pp. 223-231, 1982.
[WS95] G. Wright and W. Stevens, TCP/IP Illustrated, Volume 2: The Implementation, Addison-Wesley, 1995.
[ZG-LA91] W. Zaumen and J.J. Garcia-Luna Aceves, ``Dynamics of Distributed Shortest-Path Routing Algorithms,'' Proceedings of SIGCOMM '91, pp. 31-42, September 1991.
[ZG-LA92] W. Zaumen and J.J. Garcia-Luna Aceves, ``Dynamics of Link-state and Loop-free Distance-vector Routing Algorithms,'' Internetworking: Research and Experience, Vol. 3, pp. 161-188, 1992.
[Zh86] L. Zhang, ``Why TCP timers don't work well,'' Proceedings of SIGCOMM '86, August 1986.
[ZSC91] L. Zhang, S. Shenker, and D. Clark, ``Observations on the Dynamics of a Congestion Control Algorithm: The Effects of Two-Way Traffic,'' Proceedings of SIGCOMM '91, pp. 133-147, September 1991.
[ZDESZ93] L. Zhang, S. Deering, D. Estrin, S. Shenker, and D. Zappala, ``RSVP: A New Resource ReSerVation Protocol,'' IEEE Network, 7(5), pp. 8-18, September 1993.

Appendix A

The Network Probe Daemon

NPD (Network Probe Daemon) is a framework for probing paths through the Internet by tracing the routes corresponding to the paths, and by sending TCP packets along the paths and tracing the arrivals of both the packets and their acknowledgements. NPD consists of a daemon (npd) that services authenticated requests for tracing and generating probes, and a control program (npd_control), which is run only at the site conducting the probe experiments. The following sections discuss the daemon's operation (§ A.1) and the steps taken to address security concerns (§ A.2).

A.1 Daemon operation

A site participates in the network probe experiment by running the network probe daemon npd on a Unix workstation connected to the Internet. The workstation does not need any special location in the network topology (e.g., it does not need to be located on the wide-area gateway network). The npd process is run by the Internet services daemon inetd whenever a connection appears for the ``npd'' service (TCP port 7504, by default). This means that installing the daemon requires editing /etc/services to add the ``npd'' service, and /etc/inetd.conf to add the service with the given port number. Once running, npd responds to the following requests:

trace-route X
    Run the traceroute utility [Jac89] to measure the path to host X and send back the results.

begin-trace X Y
    Begin tracing ``discard'' or npd-to-npd packets and their acknowledgements between hosts X and Y.

terminate-trace
    Stop the trace and send back the results.

sink s
    Accept a connection on the ``npd'' port, using a socket receive buffer of s bytes, and read from it until the connection is closed.

source X p n s
    Send n bytes to the discard or ``npd'' port (as indicated by p being ``discard'' or ``npd'') of host X, using a socket send buffer of s bytes. npd sources and sinks always use a local TCP port of 7505 (that they both do so has security benefits, as discussed in § A.2 below). If the bytes are sent to the ``discard'' port, then no remote npd need run; the inetd process on the remote machine will instead handle discarding the data packets itself.

restart-log
    Mail the current log to a preconfigured address and, upon success, clear it.

self-test
    Perform a self-test and report the results.

quit
    Terminate the connection.

On some operating systems, the packet filter cannot capture traffic generated by the same host that is running the filter. In particular, Sun workstations using SunOS and the stock ``NIT'' (Network Interface Tap) interface do not capture their own outbound traffic. Because SunOS is quite popular, it was necessary to accommodate this deficiency. For the traceroute experiment it makes no difference, but for the packet dynamics (probe) experiment it is crucial that the TCP traffic comprising the probe be recorded at both endpoints. NPD can thus be configured at a site to run on two workstations: a source/sink host that sources or sinks TCP probes, and a trace host that runs traceroute or tcpdump, depending on the experiment. For a given site A, we refer here to these machines as $A_s$ (source) and $A_t$ (trace), respectively. For many sites, $A_s = A_t$, as summarized in Table XIV.
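To make the request interface concrete, the following sketch shows how a control-side program might issue these requests over a TCP connection to the npd port. It is only an illustration under stated assumptions: the newline-terminated ASCII command encoding, the example hostnames, and the omission of the authentication exchange (§ A.2.2) are assumptions, not details taken from the NPD sources.

    import socket

    NPD_PORT = 7504          # default "npd" service port (see above)

    def npd_request(host, *commands, port=NPD_PORT):
        """Connect to npd on `host`, send each command, and return whatever
        the daemon writes back before closing the connection."""
        with socket.create_connection((host, port)) as s:
            # ... the authentication exchange of Section A.2.2 would go here ...
            for cmd in commands:
                s.sendall((cmd + "\n").encode("ascii"))
            s.shutdown(socket.SHUT_WR)       # no further requests
            reply = b""
            while True:
                chunk = s.recv(4096)
                if not chunk:
                    break
                reply += chunk
        return reply.decode("ascii", errors="replace")

    # Example (hypothetical hostnames): measure the route from a site's
    # trace host to host B, then terminate the control connection.
    # print(npd_request("npd-a.example.edu", "trace-route b.example.edu", "quit"))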
To conduct a traceroute experiment measuring the route from site A to site B, the NPD master program (npd_control) connects to the npd daemon at host $A_t$ and (after authentication) issues:

    trace-route B
    quit

and reads back the traceroute output, if successful.

To conduct a probe experiment of b bytes between A and B, using send and receive buffer sizes of s and r, npd_control executes the following steps (assuming each preceding step is successful):

1. Send the request begin-trace A B to $A_t$ and $B_t$, and wait for them to indicate they are ready.
2. Send the request sink r to $B_s$ and wait for it to indicate it is ready.
3. Send the request source B npd b r to $A_s$.
4. Wait for $A_s$ and $B_s$ to indicate they have finished sourcing/sinking the data stream.
5. Wait two more seconds, to allow any packets still traveling inside the network to arrive at the endpoints.
6. Send the request terminate-trace to $A_t$ and $B_t$.
7. Receive the trace and error files from $A_t$ and $B_t$.
8. Send the request quit to $A_s$ and $B_s$, and to $A_t$ and $B_t$ if different.

A.2 Security issues

Allowing a program to originate and trace network traffic at an Internet site naturally raises important security issues. To this end, we took a number of steps to make NPD secure:

• A host attempting to make NPD requests must first authenticate itself, as explained below.

• npd does not need to be installed with any privilege, other than being able to exec tcpdump and traceroute. A site can also configure it so it can only run a special, restricted version of tcpdump (rtcpdump; see below).

• npd is hardwired to only be able to trace TCP ``discard'' traffic, or traffic between two npd's. This is done by constructing a tcpdump filter of (RESTRICTION) and (XXX) whenever npd is asked to trace traffic using the filter XXX, where RESTRICTION is:

    (tcp port 9) or (tcp src port 7505 and tcp dst port 7505)

i.e., only allow traffic involving either the TCP discard port, or both an npd sender and receiver. (TCP port 7505 is the well-known port used by npd for sourcing and sinking traffic; see § A.1.) A sketch of this construction appears at the end of § A.2.1 below.

• npd logs all of its connections and activity. If writing to the log fails, or if npd cannot lock the log for exclusive access, npd exits.

• The log file can only be reset if npd first succeeds in mailing the previous log to a preset Internet mail address. Sites can configure this address to include a local address.

• The only files created by npd (other than the log file) are temporary files created using the Unix tmpfile(3) library routine, which are guaranteed to disappear when npd exits, and also to be unreadable by other local processes.

• When executed, npd forks a child process that sleeps for a fixed amount of time (10 minutes). When the child process wakes up, it kills its parent process. This mechanism acts as a crude ``fail-safe.'' Normally, after npd successfully completes its requests, it kills the child process prior to exiting itself. But if for any reason npd fails to do so (for example, if the network connection between npd and npd_control is lost), the fail-safe guarantees that npd will at some point cease consuming resources on the host.

A.2.1 Using rtcpdump instead of tcpdump

The NPD sources include rtcpdump, a version of tcpdump that is restricted to capturing TCP discard packets (or npd-to-npd packets, as described above). rtcpdump can only capture live, restricted packets (it cannot read existing trace files), and only writes to stdout, which is under the full control of npd. Thus, a site can safely give rtcpdump ``setgid'' or ``setuid'' privilege to the Unix ``group id'' or ``user id'' necessary for packet capture on the tracing host, without needing to give the tracing group-id or user-id to npd itself. rtcpdump terminates whenever its stdin is closed, which happens automatically when npd exits.
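As a concrete illustration of the traffic restriction enforced by npd (and hard-coded into rtcpdump), the sketch below shows how a requested tcpdump filter expression might be wrapped with the RESTRICTION clause of § A.2. The RESTRICTION string is taken verbatim from the text above; the function name, variable names, and example hosts are hypothetical.

    # Sketch only: wrap a caller-supplied tcpdump filter so that it can
    # match nothing beyond TCP discard traffic or npd-to-npd traffic.
    RESTRICTION = "(tcp port 9) or (tcp src port 7505 and tcp dst port 7505)"

    def restricted_filter(requested_filter):
        """Combine the requested filter XXX with the npd restriction,
        yielding (RESTRICTION) and (XXX)."""
        return "({0}) and ({1})".format(RESTRICTION, requested_filter)

    # For instance (hypothetical hosts),
    #   restricted_filter("host a.example.edu and host b.example.edu")
    # produces a filter that matches packets between the two hosts only if
    # they also satisfy the RESTRICTION clause.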
A.2.2 NPD authentication

An important aspect of NPD security is the use of fairly strong authentication to restrict use of npd at a site to only authorized remote sites. npd authenticates a remote site in the following manner:

1. The IP address of the remote host must translate to a hostname that in turn translates back to the given IP address. To illicitly pass this test, an attacker must subvert a Domain Name System (DNS) name server [MD88] (which, unfortunately, is possible [Be95]).

2. As part of the authentication procedure, the host must identify itself using a DNS hostname. The host's claimed identity must then translate to the host's IP address. Like the previous step, this step requires that an attacker subvert a DNS name server.

3. The host's claimed identity must appear in npd's directory of secret keys. For an attacker to pass this test, they must successfully subvert a DNS name server authoritative for one of the sites appearing in the directory of secret keys; more difficult than the subversions above, but still possible.

4. npd challenges the remote host to prove its identity by sending it a random bit-string. The remote site must successfully XOR this bit-string with the secret key and send to npd the MD5 checksum [Ri92] of the result. npd then verifies that the result matches its own local computation of what the checksum should be. If so, then the remote site is presumed to know the secret key and is authenticated. For an attacker to successfully pass this test essentially requires that they know the secret key, since MD5 checksums take on $2^{128} \approx 10^{38}$ possible values.

Since the secret key never crosses the network,¹ to acquire the secret key requires either subverting the npd_control site or the npd site, or computing the key by observing previous authentication exchanges as they crossed the network. This latter attack is believed infeasible due to the presumed non-invertibility of MD5 [Ri92].

¹Except when distributing the NPD sources to a remote site, or if npd retrieves the key using NFS.
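The challenge/response computation in step 4 above is simple enough to summarize in a few lines. The sketch below shows both the responder's computation and npd's verification; the function names, the example key, and the assumption that the challenge and key have equal length are illustrative only, not taken from the NPD sources.

    import hashlib
    import hmac
    import os

    def md5_response(challenge: bytes, secret_key: bytes) -> bytes:
        """Responder side: XOR the challenge with the secret key and
        return the MD5 digest of the result."""
        # Assumes challenge and key have the same length; NPD's handling
        # of differing lengths is not described here.
        mixed = bytes(c ^ k for c, k in zip(challenge, secret_key))
        return hashlib.md5(mixed).digest()

    def verify(challenge: bytes, secret_key: bytes, reply: bytes) -> bool:
        """Challenger (npd) side: recompute the expected digest locally
        and compare it against the remote host's reply."""
        return hmac.compare_digest(md5_response(challenge, secret_key), reply)

    # Example exchange with a hypothetical 16-byte shared key:
    # key = b"0123456789abcdef"
    # challenge = os.urandom(len(key))     # npd's random bit-string
    # assert verify(challenge, key, md5_response(challenge, key))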