There's another issue that the other answers are not addressing. In applications like this one you're often not interested in the standard deviation, because it is a non-robust statistic with a breakdown point of 0%: for a large sample, changing a negligible fraction of the data can change the statistic's value arbitrarily. Instead, consider using quantiles, or statistics derived from them such as the interquartile range (IQR), which are far more robust. The 25th and 75th percentiles, for example, each have a breakdown point of 25%, because you need to change at least 25% of the data to shift them arbitrarily.
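For example, instead of reporting mean ± standard deviation for your timing runs, you could report the median and the IQR. A minimal sketch using Python's standard library (the sample values are just placeholders):

```python
import statistics

# Measured round-trip times in milliseconds (placeholder values).
delays_ms = [12.1, 11.8, 12.4, 13.0, 11.9, 12.2, 12.3, 12.0, 12.5, 12.7]

# 25th, 50th and 75th percentiles; n=4 asks for the quartile cut points.
q1, median, q3 = statistics.quantiles(delays_ms, n=4)
iqr = q3 - q1

print(f"median = {median:.2f} ms")
print(f"IQR    = {iqr:.2f} ms  (Q1 = {q1:.2f}, Q3 = {q3:.2f})")
```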
This is particularly important in your case, for a number of reasons:
- Large communication delays are often caused by one-off events that result in downtime rather than a normal delay, and such downtimes are very long in comparison. For example, think of power outages, server crashes, even sabotage...
- Even if there are no downtimes in your data, other factors that are completely irrelevant to your application could still have a significant impact on your measurements. For example, other processes running in the background might slow down your application, or memory caching might improve the speed of some but not all runs. There might even be occasional hardware activity that affects your application's speed only now and then.
- Usually people judge a system's responsiveness based on the typical case, not the average over all cases. Most will accept that, in a minority of cases, an operation may fail completely and never return a response. An excellent example is the HTTP request: a small but nonzero proportion of packets are simply dropped, giving the request a theoretically infinite response time. Obviously people don't care and just press "Refresh" after a while.
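To see the difference in practice, you can contaminate a set of normal measurements with a handful of "failed" requests that were cut off at a timeout. The mean and standard deviation are dominated by those few values, while the median and IQR barely move. A sketch with made-up numbers (the distribution and timeout are assumptions, not part of your data):

```python
import random
import statistics

random.seed(0)

# 100 "normal" response times around 120 ms (made-up distribution).
normal = [random.gauss(120, 10) for _ in range(100)]

# The same data plus 3 requests that effectively never returned
# and were recorded at a 30-second timeout.
contaminated = normal + [30_000.0] * 3

for label, data in [("clean", normal), ("with 3 timeouts", contaminated)]:
    q1, med, q3 = statistics.quantiles(data, n=4)
    print(f"{label:>16}: mean = {statistics.mean(data):8.1f} ms, "
          f"stdev = {statistics.stdev(data):8.1f} ms, "
          f"median = {med:6.1f} ms, IQR = {q3 - q1:5.1f} ms")
```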