<!doctype html>
<html>
<head>
<meta charset="UTF-8">
</head>
<body>
<div>
Hi Stefan,
</div>
<div class="default-style">
</div>
<div class="default-style">
Some guesses at a possible cause:
</div>
<div class="default-style">
</div>
<div class="default-style">
An MTU issue somewhere in this path, possibly limited to one member of a link bundle along the way.
</div>
<div class="default-style">
Given that you only see this with Ziggo users, it is likely inside the Ziggo network.
</div>
<div class="default-style">
</div>
<div class="default-style">
You might try limiting the MTU on your server to something like 1420 bytes.
</div>
<div class="default-style">
If that fixes it, you have a clear indication that MTU is the cause.
</div>
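<div class="default-style">
</div>
<div class="default-style">
For reference, a rough sketch of what lowering the MTU means for TCP: the largest segment payload (MSS) is the MTU minus the IP and TCP headers. This assumes IPv4 and no IP or TCP options (options such as timestamps take more space); tcp_mss is just a name I made up for the illustration.
</div>
<pre>
```python
IPV4_HEADER = 20  # bytes, assuming no IP options
TCP_HEADER = 20   # bytes, assuming no TCP options


def tcp_mss(mtu):
    """Largest TCP payload per segment for a given IPv4 MTU."""
    return mtu - IPV4_HEADER - TCP_HEADER


for mtu in (1500, 1420):
    print(mtu, tcp_mss(mtu))  # prints "1500 1460" and "1420 1380"
```
</pre>
<div class="default-style">
So with the MTU at 1420 your server would send segments carrying at most 1380 bytes of data, which should fit through most tunnels and slightly undersized links.
</div>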
<div class="default-style">
</div>
<div class="default-style">
Normally path MTU discovery should work (not to mention the expectation that the whole internet is fine up to 1500 bytes), but path MTU discovery relies on routers accurately knowing the MTU of the links to their neighbors.
</div>
<div class="default-style">
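If a link between two routers drops packets that are smaller than the MTU configured on the router interfaces, this behaviour is very hard to detect. The routers believe the packet fits, so they never send the ICMP message that path MTU discovery depends on, and the link becomes a black hole for larger packets with no alert whatsoever.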
</div>
<div class="default-style">
</div>
<div class="default-style">
It gets worse when "the" link between two routers is a bundle and one member of that bundle has errors or perhaps does not support the full MTU size.
</div>
<div class="default-style">
Usually traffic is balanced across the link members based on a hash of source IP, destination IP and source/destination ports.
</div>
<div class="default-style">
That means the same client talking to the same destination may, depending on 'chance' (the source port here), hit the problem link or not.
</div>
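<div class="default-style">
</div>
<div class="default-style">
To illustrate the hashing point, here is a minimal sketch. Real routers use their own vendor-specific hash, so SHA-256 is only a stand-in here, and pick_member, the client address 192.0.2.10 and the bundle size of 4 are all made up for the example.
</div>
<pre>
```python
import hashlib


def pick_member(src_ip, dst_ip, src_port, dst_port, members):
    """Map a flow to one bundle member using a hash of its addresses and ports.

    SHA-256 is only a stand-in; actual routers use vendor-specific hashes.
    """
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}".encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big") % members


client, server = "192.0.2.10", "172.104.202.142"  # client address is hypothetical

# Same client and same server, only the ephemeral source port differs.
chosen = [pick_member(client, server, port, 80, 4) for port in range(50000, 50200)]
print(sorted(set(chosen)))  # the distinct bundle members these flows landed on
```
</pre>
<div class="default-style">
Each new connection gets a new source port, so it may land on a different member. If only one member is bad, that would look like "it fails sometimes" from the user's side.
</div>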
<div class="default-style">
</div>
<div class="default-style">
Make sure that your server logging captures not only the source IP but also the source port.
</div>
<div class="default-style">
</div>
<div class="default-style">
If you have a knowledgeable 'friendly user' on Ziggo who has the problem and whom you can work with for troubleshooting, I suggest a packet capture of their traffic (and 'all ICMP') on your end, and ideally also on the user side.
</div>
<div class="default-style">
</div>
<div class="default-style">
You want to capture 'all ICMP' (not filtered on source IP), as any path MTU ("packet too big" / "fragmentation needed") ICMP messages are sourced from some router IP along the path, not from the client.
</div>
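<div class="default-style">
</div>
<div class="default-style">
What you would look for in such a capture is ICMP type 3 (destination unreachable) with code 4 (fragmentation needed), which carries the next-hop MTU in its header. A small sketch of picking that field out of the raw ICMP bytes (IPv4 only; next_hop_mtu is a made-up helper and the sample message is hand-built, not from a real capture):
</div>
<pre>
```python
import struct

ICMP_DEST_UNREACHABLE = 3  # ICMPv4 type
ICMP_FRAG_NEEDED = 4       # code: fragmentation needed and DF set


def next_hop_mtu(icmp_bytes):
    """Return the next-hop MTU from an ICMPv4 'fragmentation needed' message.

    Expects the ICMP message without the outer IP header.
    Returns None for any other ICMP message or a truncated one.
    """
    if len(icmp_bytes) >= 8:
        # type, code, checksum, unused, next-hop MTU (all network byte order)
        icmp_type, code, _checksum, _unused, mtu = struct.unpack("!BBHHH", icmp_bytes[:8])
        if icmp_type == ICMP_DEST_UNREACHABLE and code == ICMP_FRAG_NEEDED:
            return mtu
    return None


# Hand-built sample header; the checksum is left at 0 and is not validated here.
sample = struct.pack("!BBHHH", 3, 4, 0, 0, 1420)
print(next_hop_mtu(sample))  # prints 1420
```
</pre>
<div class="default-style">
If you never see any such message while large responses stall, that points towards a black hole rather than a properly signalled smaller MTU.
</div>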
<div class="default-style">
</div>
<div class="default-style">
It also helps to do a full traceroute from both ends, as traffic may take different forward and return paths.
</div>
<div class="default-style">
</div>
<div class="default-style">
Do you have any indication whether it is "all of Ziggo", or perhaps limited to some IP ranges within Ziggo?
</div>
<div class="default-style">
</div>
<div class="default-style">
You can also use ping to manually find the MTU that works towards clients, and vice versa.
</div>
<div class="default-style">
[I always need to stop and double-check whether the size argument is the payload or the full IP packet, and how big the Ethernet headers are again.]
</div>
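<div class="default-style">
</div>
<div class="default-style">
To save the double-checking, a sketch of the arithmetic. The MTU counts the whole IP packet and does not include the Ethernet header, so for IPv4 the ping payload is the MTU minus 20 bytes of IP header and 8 bytes of ICMP header. With Linux iputils ping, as far as I know, -s takes the payload size and -M do sets don't-fragment; other ping implementations differ. The helper names and the simulated probe below are made up; a real probe would run ping with don't-fragment set and report whether a reply came back.
</div>
<pre>
```python
IPV4_HEADER = 20  # bytes, assuming no IP options
ICMP_HEADER = 8   # bytes, ICMP echo request header


def ping_payload_for_mtu(mtu):
    """ICMP echo payload size that yields an IPv4 packet of exactly mtu bytes."""
    return mtu - IPV4_HEADER - ICMP_HEADER


def find_path_mtu(probe, low=576, high=1500):
    """Binary search for the largest packet size that probe(size) reports as delivered.

    Assumes a packet of size low gets through and that delivery is monotonic in size.
    """
    while high > low:
        mid = (low + high + 1) // 2
        if probe(mid):
            low = mid
        else:
            high = mid - 1
    return low


print(ping_payload_for_mtu(1500))  # prints 1472
print(ping_payload_for_mtu(1420))  # prints 1392

# Simulated path that silently drops anything above 1420 bytes.
print(find_path_mtu(lambda size: 1420 >= size))  # prints 1420
```
</pre>
<div class="default-style">
Keep in mind that with a bad bundle member the result may differ from run to run, so repeat the test a number of times before drawing conclusions.
</div>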
<div class="default-style">
</div>
<div class="default-style">
Best regards, Boudewijn
</div>
<div class="default-style">
</div>
<blockquote type="cite">
<div>
On 23-08-2023 14:18 CEST, Stefan van den Oord <stefan+nlnog@medicinemen.eu> wrote:
</div>
<div>
</div>
<div>
</div>
<div style="overflow-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;">
Dear NLNOG community,
<div>
</div>
<div>
I’d like to present to you a problem that we’re experiencing. To us it is very strange, and we are out of ideas. Of course solutions would be much appreciated, but ways of diagnosing this and workarounds are very welcome too.
</div>
<div>
</div>
<div>
Background: we’re a small Dutch company developing the Viduet platform: a platform to help chronically ill patients better manage their wellbeing together with their care providers. Our product is web-based and we also have a mobile app.
</div>
<div>
</div>
<div>
The problem: for almost two weeks we have been getting reports from users who sometimes (!) get connection timeouts in their browsers/apps when they connect to our web platform. We have narrowed down the potential sources of the issue and found a small setup to reproduce it:
</div>
<div>
<ul class="MailOutline">
<li>We set up a clean Linode node in Frankfurt (smallest type, shared CPU) with Apache (just `apt install apache2`). Requesting the default Apache index.html using `curl -i <a href="http://172.104.202.142/">http://172.104.202.142/</a>` (or <a href="http://172.104.202.142/large.html">http://172.104.202.142/large.html</a>) causes timeouts in more than 1 in 10 tries for some users, who have in common that they are all customers of the Ziggo internet provider.</li>
<li>Doing the same with a Linode node in Paris has the same result.</li>
<li>Doing the same with a Linode node in London works as you would expect, so does not have this strange behaviour.</li>
<li>Running `mtr -rwzbc100 172.104.202.142` (the Frankfurt node mentioned above) shows no packet loss, nothing out of the ordinary.</li>
<li>Running nginx instead of apache makes no difference: same issue</li>
<li>Forcing apache to use http/1.0 makes no difference: same issue (but frankly I don’t think it goes wrong on this level of the protocol stack)</li>
<li>There seems to be a relationship with the content length of the HTTP response. For shorter HTML files like the index.html on the Frankfurt node, it sometimes times out, and sometimes succeeds after a delay of a few seconds (and sometimes returns as fast as you’d expect). Using slightly larger HTML files it just times out.</li>
<li>I’m hesitating whether this is relevant at all, but when using HTTPS instead of HTTP, the problem also manifests in TLS handshake errors.</li>
</ul>
<div>
</div>
</div>
<div>
We have been in touch with Linode support and they found nothing out of the ordinary on their side. They point to the Ziggo network, saying:
</div>
<div>
</div>
<div>
<blockquote type="cite">
<p style="box-sizing: inherit; white-space: pre-line; line-height: 1.4; margin: 16px 0px 0px; caret-color: #606469; color: #606469; font-family: LatoWeb, sans-serif; font-size: 14.4px;">The reason that the issue only exists to the Frankfurt data center and not London is likely because the problem is related to the particular route that the traffic takes from one place to the other. While the MTRs you shared look good, I did find evidence in another ticket of a particular hop within Ziggo's network showing issues, but we can't say for sure what the issue is. Here is the information about the hop in case that helps with your communication with Ziggo:</p>
<pre style="font-size: 1rem; box-sizing: inherit; line-height: 1.3; overflow-x: auto; max-width: 100%; background-color: #f9fafa; caret-color: #606469; color: #606469;"><code class="hljs apache" style="box-sizing: inherit; display: block; overflow-x: auto; background: #fefefe; color: #444444; padding: 0.5em;">AS33915 asd-tr0021-cr101-be64.core.as9143.net (213.51.64.193)</code></pre>
</blockquote>
</div>
<div>
<div dir="auto" style="overflow-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;">
<div dir="auto" style="overflow-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;">
<div dir="auto" style="overflow-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;">
<div>
Again, any help is much appreciated!
</div>
<div>
</div>
<div>
Kind regards,
</div>
<div>
</div>
<div>
—
<br>Stefan van den Oord
<br>CTO @ Medicine Men B.V.
<br>
<br><em>Not in the office on Wednesdays</em>
<br>
<br>Regulierenring 22
<br>3981 LB Bunnik
<br>The Netherlands
<br>+31 85 1307020
</div>
<div>
</div>
</div>
</div>
</div>
</div>
</div> _______________________________________________
<br>NLNOG mailing list
<br>NLNOG@nlnog.net
<br>http://mailman.nlnog.net/listinfo/nlnog
</blockquote>
</body>
</html>