I need to start with the caveat that I am not a networking expert by any stretch of the imagination, nor am I a Linux expert, nor anything beyond a PC expert (if that). Consequently, the information contained within this article could be wrong or misleading, and could potentially cause problems should you decide to follow in my footsteps. I myself was reluctant to tinker with any of these settings until it became clear that my ISP is not able to immediately resolve my connectivity issues, nor can they provide me with a clear ETR. Regardless, through my endeavors I have seen some performance improvements, but there are still problems. To give you an indication, I have been working on this post for a few days, partly because access to my website is periodically cut off throughout the day as a result of these issues.
What I am is a fairly active computer user. I am not a "power user" - I don't download torrents 24 hours a day, and I don't host a video server with an open portal to the internet. I absolutely don't engage in Peer-to-Peer file sharing. I download what I need at will, but not like a crazy hoarder of all things digital, preparing for the Ragnarok of the Internet to begin. Oddly enough, it may already be here, and we're all just starting to notice it.
My ISP is a local family-owned provider, which has been pretty much my ISP of choice ever since I can remember. I did briefly experience the perils of corporation-provided broadband service in the form of COX Digital Cable internet while we lived in New Orleans, but their support team left a lot to be desired. I tend to prefer locally owned businesses for anything because I happen to like being treated like a real person, and not just a replaceable statistic. As it is, CenturyLink has been overbilling me on my telephone service for seven months now, despite my attempts to get them to correct the problem, and if there were a local alternative I would drop them in a heartbeat.
But that's another story, even though CenturyLink owns the telephone lines my service is transferred over.

Starting sometime during the Christmas holiday, I began experiencing severe connectivity issues. I would not lose service, but latency would spike up to something ridiculous, sometimes as high as 1500ms on a ping test to www.google.com, and during these periods the same test would show up to 25% packet loss. While it was possible to surf the web (as long as I was willing to wait up to five minutes for a page to load, with multiple refreshes in my browser as it kept timing out - I would classify this as painfully surfing the web), it was not possible to stream video or connect to online gaming servers such as Star Trek Online. I've spent enough time on a help desk to know that sometimes this is just the DSL modem or router building up issues, so I reset both, and found the problem persisted after they came back up. This is unusual - power cycling the equipment at the affected site will typically clear up connectivity issues almost immediately.
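For anyone who wants to put numbers on this kind of test, the summary lines printed by the Linux ping utility can be pulled apart with a few lines of Python. This is only a rough sketch: the function name parse_ping_summary is my own invention, the sample text is made up to resemble the figures above rather than captured from my connection, and the patterns assume the summary format printed by ping on Ubuntu.

```python
import re

def parse_ping_summary(output):
    """Extract packet loss (%) and average round-trip time (ms) from ping output.

    Assumes the summary format printed by ping on Ubuntu, for example:
      4 packets transmitted, 3 received, 25% packet loss, time 3004ms
      rtt min/avg/max/mdev = 812.1/1203.4/1498.7/250.2 ms
    """
    loss = re.search(r"([\d.]+)% packet loss", output)
    rtt = re.search(r"= [\d.]+/([\d.]+)/", output)
    return {
        "loss_percent": float(loss.group(1)) if loss else None,
        "avg_ms": float(rtt.group(1)) if rtt else None,
    }

# Made-up sample resembling a bad evening, not real captured output.
SAMPLE = """\
4 packets transmitted, 3 received, 25% packet loss, time 3004ms
rtt min/avg/max/mdev = 812.1/1203.4/1498.7/250.2 ms
"""

if __name__ == "__main__":
    print(parse_ping_summary(SAMPLE))
```

Feeding it the saved output of something like ping -c 20 www.google.com at different times of day would make it easier to see whether the bad periods really do line up with the evening hours.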
For those of you unfamiliar with this process, the best way to power cycle your equipment (assuming you have a high speed DSL or Cable modem and a router) is to remove the power cords from both devices, wait at least 10 seconds, plug in the high speed modem first, wait for the lights to stop blinking and stabilize (verify a light is present for your service type - bear in mind not all modems will have this indicator), and then plug in your router and give the network about 1-2 minutes to come back online. In cases of a service outage it may take longer to reconnect - and if you are unable to do so after trying these steps it is probably worth calling your ISP to see if there is an outage in your region.
In my case, however, service was immediately restored yet continued to have high latency and packet drops. This continued from about 8:00pm until 11:00pm and then went away. I presumed the issue was somewhere on the ISP end and decided to not worry about it, figuring that the problem had cleared up because they had resolved it. It is never safe to assume.
This problem returned the following night, and almost every night since. It also began cropping up in the middle of the day at various intervals, would persist for an hour or two, and then clear up. I started to notice a pattern, and submitted a ticket to my ISP for support but their initial response was that my router might be going bad and should be replaced. They claimed their tests showed the connection remained steady to the DSL modem, and since the router was my own equipment it therefore was my own responsibility to resolve. I had been using a D-Link wireless router, but I acquired a Linksys BFE series and decided to give that a shot and see what happened.
Not entirely to my surprise, the next evening yielded the same performance issues. I had expected this result even with a different router because the issues seemed to crop up intermittently, yet during almost the same time frame each night. My hypothesis at this point was that even though my DSL line is supposed to be a dedicated, constant connection, the network was overburdened during these hours by heavy traffic, most likely from the growing number of Netflix and Hulu subscribers taking advantage of video on demand streaming services that are now being offered with modern television systems and in add-on set-top boxes. There is a clear shift away from cable television and toward these on-demand type services, and consequently there is a conflict of interest with many major ISPs who own these lines.
CenturyLink, Time Warner, Cox, Comcast and others all offer digital subscriber television service that is now in direct competition with Netflix, Hulu, YouTube and others. CenturyLink, Time Warner, Cox & Comcast also offer the internet service that provides access to Netflix, Hulu, YouTube and others. From a financial perspective, it is in their best interest to restrict access to internet based video streaming services for two reasons: 1) it will prevent the loss of subscribers to their digital television services and 2) it will reduce the overhead of keeping their internet service operational for their customers. Whether these companies are actively engaging in restricting traffic (look up Net Neutrality for more details, this article is about bufferbloat mitigation for now) is open to debate.
My point is that ISPs may have reached the point where they now need to make a choice: either invest in infrastructure that will support this new demand, or else begin restricting traffic for their subscribers. The reason they need to make this choice is an increase in video streaming, combined with an increase in new systems - the iPad has seen tremendous success and the Windows 7 operating system is now at over 20% market share. I predict most ISPs will go with the second option (restricting traffic) because it is the easiest, fastest and least expensive solution to implement, in the short term. What these ISPs may not understand, however, is that their customers will drop them given half a chance and switch over to another provider that does not restrict traffic, should such a thing happen. The number one rule to remember is that taking away something from your customers will result in their loss of loyalty to your brand.
To get back on topic, this problem had now been present for over a week and a half, and based on my limited dialog with my ISP thus far two things had become apparent - I was not the only customer suffering with this issue, and they did not seem to have any idea what the source of the problem was or how to resolve it. Since that communication I have received only one additional response from my ISP, indicating that they had a plan to resolve the issue and would call me when it is repaired.
While all this was going on, I came across an article on Slashdot pertaining to bufferbloat, which was a term I had never heard before. But it seemed to coincide with what I was experiencing - and pointed back toward my network as a potential culprit in this bandwidth issue. Wireless connectivity was also a potential culprit, though for the time being I have eliminated that from my network entirely and still experience trouble.
Jim Gettys, the author of the above bufferbloat article and others, indicates that the problem can't be assigned to one source. This is a problem at the ISP level, but also present in the equipment being used at the sites in question. The problem is due to a decrease in the price of RAM, and the mistaken idea that adding RAM to high speed modems, routers, switches and various other networking equipment will make them perform better. In reality it allows these devices to have massive buffers, which can cause many of these issues I have been describing. The problem is also present in modern operating systems that automatically scale the size of the buffer used to transfer network data by using available RAM, and since most modern computer systems have plenty to spare, these buffers can easily become enormous.
Think of buffers in terms of markets, versus supermarkets, versus Super Walmart. Markets would be roughly the equivalent of hardware and systems from eight to ten years ago, with limited options but to this day they are still efficient in how they handle their customers at the cash register. If you do not have an old-fashioned market available to you where you live, you might compare them to what is now Walgreens or CVS - there are only a few registers and cashiers, and yet when you shop and then check out your total time in the store rarely exceeds fifteen minutes.
Supermarkets have more choices, and therefore attract more customers. To handle the additional customers most supermarkets use several checkout lanes, but only keep those lanes occupied by cashiers during periods of high traffic. They also provide "fast checkout" lanes for customers who are only purchasing a few items. Even with this attempt to mitigate the increased traffic, it is common for a shopping trip to a supermarket to take thirty minutes or more. Part of this is due to the size of the store - it is bigger so it takes longer to find what you are looking for. Part of this is also due to the increased number of items purchased - due to the length of time it takes to travel to a supermarket and the time it takes to find what you want, people tend to purchase more of what they need in order to reduce the number of trips they need to make. Part of this is also due to the customer's perception of which line will be the fastest, which can easily prove to be incorrect when the customer ahead of them throws a temper tantrum about not being able to double up the 10 cent coupons on their ice cream and demands a manager intervene, therefore holding up the line.
Super Walmart takes all of the above, and multiplies it a few times in terms of scope. Now your shopping trip takes an hour or more. People mistake Super Walmart as being convenient because everything they "need" is in one place, but do not realize that a few short trips to different small markets would actually take less time, and probably cause less stress as well. Financial savings (obviously) are another concern, but that does not really fall into the scope of this analogy.
So applying the analogy above, think of the buffer size or RAM size as the size of the store you are shopping at. Think of the checkout lanes as the queue. Think of the amount of product in your shopping cart as the size of your data packets. Now we should be starting to get a clearer picture of what may be going on.
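To put a rough number on the analogy: the worst-case wait added by a full buffer is simply its size divided by the speed of the line draining it. The sketch below assumes buffers of 64 KB and 256 KB purely for illustration (I do not know the real buffer sizes in my modem or router) and a 384Kbps upload link, which is what my service tier is rated at. The function name is my own.

```python
def drain_time_seconds(buffer_bytes, link_bits_per_second):
    """Time needed to empty a completely full buffer over the given link."""
    return buffer_bytes * 8 / link_bits_per_second

UPLOAD_BPS = 384_000  # 384Kbps upstream

if __name__ == "__main__":
    # Buffer sizes are hypothetical examples, not measured values.
    for size_kb in (64, 256):
        seconds = drain_time_seconds(size_kb * 1024, UPLOAD_BPS)
        print(f"{size_kb} KB buffer at 384Kbps: {seconds:.2f} s to drain")
```

Under those assumptions a full 64 KB buffer adds roughly 1.4 seconds of delay and a full 256 KB buffer roughly 5.5 seconds, which is the same order of magnitude as the ping times I was seeing.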
To attempt to mitigate this problem I started with my router. If I had a DSL modem set to routing mode I would start there, but because it is in bridge mode (and therefore a pass-through gateway only) I did not think I needed to make any adjustments there. The first stop in my router's setup is on the main configuration page, and specifically the MTU setting. Because my router connects to my ISP using PPPoE I specify a Manual MTU of 1492 (which is the recommended size for PPPoE connections). This was already set up correctly, so I then switched to the Applications & Gaming section. If your MTU is set to Automatic, however, this could potentially be one problem point with your internet connection, and you should contact your ISP to find out what they recommend you specify based on your internet connection. Generally the MTU size should be no smaller than 576, and generally (for home internet connections) should not exceed 1500. Leaving this to automatic SHOULD not allow the size to grow above 1500, but as stated above, this may no longer be the case.
For those of you not in IT, the word "should" is the most heavily used word by IT professionals, referring to how some sort of tech has been designed. For various reasons, such as equipment failure, improper voltages, strange bugs and other things too numerous to mention, tech does not always work as designed. "Should" is our mantra, our caveat and our legalese. If you hear the word "should" uttered by an IT professional, you ought to assume that things may not go as expected, and take steps to prevent a total disaster.
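As for where the 1492 figure for PPPoE comes from: standard Ethernet carries 1500 bytes of payload per frame, and PPPoE uses 8 of those bytes for its own headers. The short sketch below works through that arithmetic and also shows how much actual TCP data fits in each packet at a few MTU values. The constant and function names are mine, and the header sizes assume IPv4 and TCP without options (20 bytes each).

```python
ETHERNET_MTU = 1500     # standard Ethernet payload size in bytes
PPPOE_OVERHEAD = 8      # 6-byte PPPoE header + 2-byte PPP protocol ID
IPV4_HEADER = 20        # without options
TCP_HEADER = 20         # without options

def pppoe_mtu(ethernet_mtu=ETHERNET_MTU):
    """Largest IP packet that fits in one Ethernet frame over PPPoE."""
    return ethernet_mtu - PPPOE_OVERHEAD

def tcp_payload(mtu):
    """TCP data bytes per packet (the MSS) for a given MTU."""
    return mtu - IPV4_HEADER - TCP_HEADER

if __name__ == "__main__":
    for mtu in (ETHERNET_MTU, pppoe_mtu(), 576):
        print(f"MTU {mtu}: {tcp_payload(mtu)} bytes of TCP data per packet")
```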
The next step was to configure QoS, which was not an option on the D-Link wireless router (most likely due to the fact that the latest firmware available from D-Link for this router dates back to 2007), but blessedly became available on the Linksys with the 2009 firmware update that I had installed while configuring it for my network. During my initial testing at Speedtest.net I had noticed that while my download speed would range from 0.10Mbps up to my rated 1.5Mbps, my upload speed would consistently be over 0.412Mbps, which was higher than the amount allocated by my service (I have not yet verified this with my ISP but according to their website my tier should have a maximum upload speed of 0.384Mbps).
An overloaded upstream was one of the potential sources of trouble hinted at by Jim Gettys on his blog, so I specified my upload speed accordingly as 384Kbps (0.384Mbps) in this section. I also changed the download speed on each of the four ports to the lowest possible setting (256Kbps) even though only three ports are being actively used (I will need to alter this since port one of the router is now being shared with three separate devices via an old network hub I had lying around and port four is currently not attached to anything). I gave SSH high priority and HTTP medium priority, and left the rest of the applications listed at low priority.
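I do not know how the router's firmware implements these priorities internally, but the general idea can be illustrated with a toy model: packets wait in a queue, and whenever the line is free the highest-priority packet goes first. Everything in the sketch below (the class name, the priority table, treating unlisted applications as low) is my own illustration of strict priority queuing that mirrors the settings I chose, not a description of the Linksys firmware.

```python
import heapq
import itertools

# Lower number = sent first. Mirrors the router settings described above.
PRIORITY = {"ssh": 0, "http": 1}
LOW = 2  # everything else

class PriorityQueueModel:
    """Toy strict-priority queue; not how any real firmware is written."""

    def __init__(self):
        self._heap = []
        self._order = itertools.count()  # keeps arrival order within a priority

    def enqueue(self, application):
        rank = PRIORITY.get(application, LOW)
        heapq.heappush(self._heap, (rank, next(self._order), application))

    def dequeue(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)

if __name__ == "__main__":
    q = PriorityQueueModel()
    # Hypothetical arrival order of packets from different applications.
    for app in ["netflix", "http", "ssh", "http", "ssh"]:
        q.enqueue(app)
    print([q.dequeue() for _ in range(len(q))])
```

In this model the two SSH packets leave first, then the two HTTP packets, and the video packet goes last even though it arrived first. That is the behavior I was hoping for from the router: my remote sessions stay responsive even while something else is hogging the line.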
My next step was to replace my network switch (being the newest network technology still in use at that point) with an old network hub. It had long been my belief that switches were superior to hubs in that they use intelligent traffic management and send traffic directly to where it is intended, whereas a hub simply repeats everything it receives to every device connected to it. In a large scale network a switch is certainly more efficient, but in a small network like the one in my home, the hub appeared to be more efficient in this task. Replacing the switch removed a potential contributor toward bufferbloat and visibly (although minutely) decreased the latency on my home network.
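For reference, the textbook difference between the two devices can be modeled in a few lines: a hub repeats every incoming frame out of every other port, while a switch learns which address lives on which port and forwards only there once it knows. The classes below are a simplified illustration of that behavior with made-up addresses, not a measurement of my own equipment, and they ignore buffering entirely.

```python
class Hub:
    def __init__(self, port_count):
        self.ports = range(port_count)

    def forward(self, in_port, src, dst):
        """A hub repeats the frame on every port except the one it arrived on."""
        return [p for p in self.ports if p != in_port]

class Switch(Hub):
    def __init__(self, port_count):
        super().__init__(port_count)
        self.table = {}  # learned address -> port

    def forward(self, in_port, src, dst):
        """A switch learns the sender's port and floods only unknown destinations."""
        self.table[src] = in_port
        if dst in self.table:
            return [self.table[dst]]
        return [p for p in self.ports if p != in_port]

if __name__ == "__main__":
    hub, switch = Hub(4), Switch(4)
    # Hypothetical devices "A" on port 0 and "B" on port 1.
    print(hub.forward(0, "A", "B"))     # all other ports
    print(switch.forward(0, "A", "B"))  # B not yet known, so flooded
    print(switch.forward(1, "B", "A"))  # A was learned, so one port only
    print(switch.forward(0, "A", "B"))  # B is now known as well
```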
Networking equipment had been replaced and better configured, so I turned to my systems attached to the network. One might think that the router should be able to manage and shape all the traffic on the network, but I found that little improvement to my network's health had been made, and during the intermittent periods of latency my systems were effectively crippled as far as the internet was concerned. So it was time to see what else was contributing to bufferbloat.
As it turns out, Windows XP has (this may not be a 100% accurate figure) a default MTU size of 1480 specified on network adapters. Windows Vista, Windows 7, OS X and modern releases of Linux do not have MTU sizes specified. I have a Linux server (upon which this website resides) and a Linux desktop system (and also a Linux netbook that has been temporarily disabled thanks to coffee spilled on the keyboard...grr...), but the other computers in the house run Windows XP (Brigitte's laptop and the "family" computer now shared by the kids). We also have a Playstation 3 that we like to use to watch Netflix streaming movies, which is running a proprietary "modern" OS that also does not specify MTU size by default.
Specifying MTU on the Playstation 3 and Ubuntu desktop was a fairly simple process. On the Playstation 3, navigate to the Network configuration section, choose manual configuration, and then continue through the menus until you reach MTU - switch it from auto to manual and then set your preferred value. I opted to go with an MTU of 1480 to coincide with the XP systems already on my network.
On my Ubuntu desktop, I right-clicked on the tray icon for my network connection on the top-right portion of the screen and selected Network Configuration. I then selected the IPv4 tab and selected my NIC from the menu. I immediately had an option to place a checkmark into a box next to MTU and specify 1480 in the box directly next to it.
My Ubuntu server does not have a monitor, mouse or keyboard because I only access it remotely - it only has a power cord and a CAT5 cable connecting it to the network hub. From my desktop I accessed my primary account with gterm + ssh and used sudo to open /etc/network/interfaces so I could add the following line inside the eth0 section (the lines that follow iface eth0):
    post-up /sbin/ifconfig eth0 mtu 1480
You would want to change eth0 to your network adapter as necessary. I then reset the network adapter with the following command:
sudo /etc/init.d/networking restart
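To confirm the new value actually took effect after the restart, Linux exposes each interface's current MTU as a small text file under /sys/class/net. The helper below is a sketch of reading it. The function name and the sysfs_root parameter (there only so it can be pointed at a test directory) are my own additions, and eth0 is assumed as the interface name.

```python
import os

def read_mtu(interface="eth0", sysfs_root="/sys/class/net"):
    """Return the interface's current MTU as an int, read from sysfs."""
    path = os.path.join(sysfs_root, interface, "mtu")
    with open(path) as handle:
        return int(handle.read().strip())

if __name__ == "__main__":
    try:
        print("eth0 MTU is", read_mtu("eth0"))
    except OSError as error:
        # The interface name differs or this is not a Linux system.
        print("could not read MTU:", error)
```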
Strangely, I saw more impact after restricting my systems than after reconfiguring my router. Unfortunately this only lessened the effects of the severe latency, which continues to plague me to this day. While my ISP assures me they are working on the issue and will contact me when it is resolved, the fact that this has been a problem for longer than two weeks is worrisome, and I wonder if I might be forced to find another provider for my internet service to ultimately resolve this once and for all. But for those of you who were curious what this means - while suffering extreme latency it is now possible to stream video on one system on the network, with occasional buffering, but not on more than one. It is possible to play Star Trek Online on one system, so long as I don't mind periodic lag and nobody is engaging in video streaming on another system at the same time. While not perfect, it IS an improvement, and I hope it will have more of an impact on my network's overall performance once the latency issues are resolved.
I am also interested in changing my router to one that would support OpenWRT + Gargoyle, which would then give me a finer level of control over network traffic. While I might re-purpose a spare PC to fill that role, I would rather use new equipment to prevent weird issues cropping up as a result of aging and failing components.