[NLNOG] [SPAM] Curious problem with connections from Ziggo customers to Linode nodes in some data centers

Sabri Berisha sabri at cluecentral.net
Wed Aug 23 20:59:47 CEST 2023


Hi, 

In the past I've used nping to troubleshoot (L3) ECMP issues. For example: 

for i in `seq 10000 60000` ; do /bin/echo -n "$i: " ; nping -c 1 -g $i --tcp-connect -p 80 --dest-ip 172.104.202.142 | grep Failed ; done 

If you get a failure, use: traceroute -n -T -p 80 --sport=<whatever source port is failing> 172.104.202.142 

This will not work very well on link aggregation; it is most effective on IP ECMP. 
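A small helper for working through the loop's output, as a sketch. It assumes each saved line ends in nping's connect-mode summary of the form `Failed: N (x%)`, picks out the ports with N > 0, and prints a per-port TCP traceroute for each (Linux traceroute: `-T` for TCP SYN probes, `-p` for the destination port, `--sport` for a fixed source port). The function names `failing_ports` and `trace_commands` are made up for illustration.

```shell
# failing_ports: read the saved output of the nping loop on stdin and print
# each source port whose line reports at least one failed connection.
# Assumes lines look like "10001: ... | Failed: 1 (100.00%)".
failing_ports() {
    awk -F': ' '/Failed: / {
        n = $0
        sub(/.*Failed: /, "", n)   # keep the text after "Failed: "
        sub(/ .*/, "", n)          # keep only the count
        if (n + 0 > 0) print $1    # $1 is the port printed by the loop
    }'
}

# trace_commands DEST [DPORT]: for each port read on stdin, print a Linux
# traceroute command that sends TCP SYNs (-T) to DPORT (default 80) from
# that fixed source port (--sport), so the probes should hash like the
# failing connection did.
trace_commands() {
    tc_dest=$1
    tc_dport=${2:-80}
    while read -r tc_port; do
        printf 'traceroute -n -T -p %s --sport=%s %s\n' "$tc_dport" "$tc_port" "$tc_dest"
    done
}
```

Usage: redirect the loop's output to a file (append `> nping.log` after `done`), then run `failing_ports < nping.log | trace_commands 172.104.202.142`.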

Thanks, 

Sabri 

----- On Aug 23, 2023, at 6:30 AM, Boudewijn Visser (nlnog) <bvisser-nlnog at xs4all.nl> wrote: 

> Hi Stefan,
> Some guesses for a possible cause:
> An MTU issue somewhere in this path, possibly limited to one member of a link
> bundle somewhere in the path.
> Given that you only see this with Ziggo users, it is likely within the Ziggo network.
> You might try limiting the MTU on your server to something like 1420 bytes.
> If that fixes the problem, you have a clear indication that MTU is the cause.
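A sketch of that test on a Linux server, assuming iproute2's `ip` command and an interface called `eth0` (check the real name with `ip link`); `set_mtu`, `tcp_mss` and the `DRY_RUN` switch are hypothetical. The second helper shows why a lower MTU can help: with MTU 1420 the server's own IPv4 packets carry at most 1380 bytes of TCP payload per segment instead of 1460, assuming no TCP options.

```shell
# set_mtu DEV [MTU]: set the MTU of network device DEV (default 1420).
# With DRY_RUN=1 it only prints the command instead of running it.
set_mtu() {
    sm_dev=$1
    sm_mtu=${2:-1420}
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "ip link set dev $sm_dev mtu $sm_mtu"
    else
        ip link set dev "$sm_dev" mtu "$sm_mtu"
    fi
}

# tcp_mss MTU [4|6]: largest TCP payload per segment for that MTU, i.e.
# MTU minus the IP header (20 bytes IPv4, 40 bytes IPv6) minus the
# 20-byte TCP header (TCP options not counted).
tcp_mss() {
    if [ "${2:-4}" = 6 ]; then
        echo $(( $1 - 60 ))
    else
        echo $(( $1 - 40 ))
    fi
}
```

`set_mtu eth0` applies the test value and `set_mtu eth0 1500` reverts it (both need root). A change made this way does not survive a reboot.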
> Normally path MTU discovery should work (not to mention the expectation that the
> whole internet is OK up to 1500 bytes), but path MTU discovery expects that routers
> accurately know the MTU of their links to each neighbor.
> If a link between two routers drops packets that are smaller than the MTU
> configured on the router interfaces (but larger than what the link really carries),
> this behaviour is very hard to detect: it has become a black hole for oversized
> packets, with no alert whatsoever.
> Worse is when "the" link between two routers is a bundle and only one member of
> this bundle has errors, or perhaps does not support the full MTU size.
> Traffic is usually balanced across the link members based on a hash of source IP,
> destination IP and source/destination ports.
> That means the same client and the same destination may or may not encounter the
> problem link, depending on 'chance' (here, the source port).
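To illustrate the hashing point, a toy model: the flow's addresses and ports are hashed and the result is taken modulo the number of bundle members. `cksum` (a CRC) stands in for the router's hash, which is vendor-specific, so this will not predict which physical member a real flow uses; it only shows that changing nothing but the source port can move a flow to another member. `pick_member` is a hypothetical name, and 192.0.2.10 in the usage line is a documentation address standing in for a client.

```shell
# pick_member SRC DST SPORT DPORT N: map a flow to one of N bundle members
# (numbered 0..N-1) by hashing its 4-tuple. cksum is only a stand-in for
# whatever hash a real router applies.
pick_member() {
    pm_hash=$(printf '%s|%s|%s|%s' "$1" "$2" "$3" "$4" | cksum | cut -d ' ' -f 1)
    echo $(( pm_hash % $5 ))
}
```

For example, `for p in 50000 50001 50002 50003; do echo "$p -> member $(pick_member 192.0.2.10 172.104.202.142 $p 80 4)"; done` shows the same client and server landing on different members per source port.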
> Make sure your server logging captures not only the source IP but also the source
> port.
> If you have a knowledgeable 'friendly user' on Ziggo who has the problem and whom
> you can work with for troubleshooting, I suggest a packet capture of their
> traffic (and 'all ICMP') on your end, and ideally also on the user side.
> You want to capture all ICMP (not filtered on source IP), as any path MTU ICMP
> messages ("fragmentation needed" / "packet too big") are sourced from "some router
> IP along the path".
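A possible server-side capture, as a sketch: the pcap filter combines the test user's address with all ICMP and ICMPv6, so path MTU messages are kept no matter which router sends them. The interface `eth0`, the client address 192.0.2.10 and the helper names are assumptions for illustration.

```shell
# capture_filter CLIENT_IP: pcap filter for the client's traffic plus all
# ICMP and ICMPv6, regardless of which router on the path sent it.
capture_filter() {
    printf 'host %s or icmp or icmp6\n' "$1"
}

# start_capture DEV CLIENT_IP FILE: write whole packets (-s 0) to FILE
# without DNS lookups (-n), for later analysis in e.g. Wireshark.
start_capture() {
    tcpdump -n -i "$1" -s 0 -w "$3" "$(capture_filter "$2")"
}
```

Usage: `start_capture eth0 192.0.2.10 ziggo.pcap` (needs root), let the user reproduce a timeout, then stop tcpdump with Ctrl-C and open the file.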
> It also helps to do a full traceroute from both ends, as traffic may flow
> differently on the forward and return paths.
> Do you have any indication that it is "all of Ziggo", or is it perhaps limited to
> some Ziggo IP ranges?
> You can try to find the MTU towards the clients manually with ping, and vice
> versa.
> [I always need to stop and double-check whether the size argument is the payload or
> the full IP packet, and how big the Ethernet headers are again.]
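For Linux (iputils) ping, `-s` sets the ICMP data size, so the IP packet is that value plus 8 bytes of ICMP header plus 20 bytes of IPv4 header (40 for IPv6); the 14-byte Ethernet header is not part of the IP MTU. A sketch that does the arithmetic and bisects the largest unfragmentable size with `-M do` (don't fragment), assuming iputils ping, IPv4, and a target that answers pings; `ping_payload`, `max_payload`, `probe_host` and `TARGET` are hypothetical names.

```shell
# ping_payload MTU [4|6]: value for `ping -s` so that the IP packet is
# exactly MTU bytes (ICMP header 8, IPv4 header 20 or IPv6 header 40).
ping_payload() {
    if [ "${2:-4}" = 6 ]; then
        echo $(( $1 - 48 ))
    else
        echo $(( $1 - 28 ))
    fi
}

# max_payload PROBE LO HI: binary-search the largest SIZE in [LO, HI] for
# which "PROBE SIZE" succeeds. Assumes LO succeeds and that every size up
# to the limit succeeds (a monotonic pass/fail boundary).
max_payload() {
    mp_probe=$1 mp_lo=$2 mp_hi=$3
    while [ "$mp_lo" -lt "$mp_hi" ]; do
        mp_mid=$(( (mp_lo + mp_hi + 1) / 2 ))   # round up so lo always advances
        if "$mp_probe" "$mp_mid"; then
            mp_lo=$mp_mid
        else
            mp_hi=$(( mp_mid - 1 ))
        fi
    done
    echo "$mp_lo"
}

# probe_host SIZE: one IPv4 ping with DF set (-M do), 1 s timeout, to
# $TARGET (a placeholder for the client or server address).
probe_host() {
    ping -c 1 -W 1 -M do -s "$1" "$TARGET" >/dev/null 2>&1
}
```

Usage: after checking that a plain ping works, set `TARGET=192.0.2.10` (placeholder) and run `max_payload probe_host 0 1472`; adding 28 to the result gives the usable IPv4 path MTU.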
> Best regards, Boudewijn

>> On 23-08-2023 14:18 CEST, Stefan van den Oord
>> <stefan+nlnog at medicinemen.eu> wrote:
>> Dear NLNOG community,
>> I’d like to present a problem that we’re experiencing. To us it is very
>> strange, and we are out of ideas. Solutions would of course be much appreciated,
>> but so would ways of diagnosing this and work-arounds.
>> Background: we’re a small Dutch company developing the Viduet platform: a
>> platform to help chronically ill patients better manage their wellbeing
>> together with their care providers. Our product is web-based and we also have a
>> mobile app.
>> The problem: for almost two weeks we have been getting reports from users who
>> sometimes (!) get connection timeouts in their browsers/apps when they connect
>> to our web platform. We have narrowed down the potential sources of the issue
>> and found a small setup that reproduces it:

>>    * We set up a clean Linode node in Frankfurt (smallest type, shared CPU) with
>>      Apache (just `apt install apache2`). Requesting the default Apache index.html
>>      using `curl -i http://172.104.202.142` (or http://172.104.202.142/large.html)
>>      causes timeouts in more than 1 in 10 tries for some users, who have in common
>>      that they are all customers of the Ziggo internet provider.
>>    * Doing the same with a Linode node in Paris has the same result.
>>    * Doing the same with a Linode node in London works as expected and does not
>>      show this strange behaviour.
>>    * Running `mtr -rwzbc100 172.104.202.142` (the Frankfurt node mentioned above)
>>      shows no packet loss and nothing out of the ordinary.
>>    * Running nginx instead of Apache makes no difference: same issue.
>>    * Forcing Apache to use HTTP/1.0 makes no difference: same issue (but frankly I
>>      don’t think it goes wrong at this level of the protocol stack).
>>    * There seems to be a relationship with the content length of the HTTP response.
>>      For shorter HTML files, like the index.html on the Frankfurt node, it sometimes
>>      times out, sometimes succeeds after a delay of a few seconds, and sometimes
>>      returns as fast as you’d expect. With slightly larger HTML files it just
>>      times out.
>>    * I’m not sure whether this is relevant at all, but when using HTTPS instead
>>      of HTTP, the problem also manifests as TLS handshake errors.
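To put a number on "more than 1 in 10 tries", the test can be scripted. A sketch: `count_failures` (a hypothetical helper) runs a command N times and counts non-zero exits; with `curl --max-time`, a hung request counts as a failure because curl exits with code 28 when the time limit expires.

```shell
# count_failures N CMD [ARGS...]: run CMD N times, discarding its output,
# and print how many runs exited with a non-zero status.
count_failures() {
    cf_n=$1
    shift
    cf_fail=0 cf_i=0
    while [ "$cf_i" -lt "$cf_n" ]; do
        "$@" >/dev/null 2>&1 || cf_fail=$(( cf_fail + 1 ))
        cf_i=$(( cf_i + 1 ))
    done
    echo "$cf_fail"
}
```

Usage: `count_failures 20 curl -s -o /dev/null --max-time 10 http://172.104.202.142/large.html`. Adding curl's `--local-port` option pins the source port, which makes it possible to check whether failures follow specific ports.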
>> We have been in touch with Linode support and they found nothing out of the
>> ordinary on their side. They point to the Ziggo network, saying:

>>> The reason that the issue only exists to/from the Frankfurt data center and not
>>> London is likely because the problem is related to the particular route that
>>> the traffic takes from one place to the other. While the MTRs you shared look
>>> good, I did find evidence in another ticket of a particular hop within Ziggo's
>>> network showing issues, but we can't say for sure what the issue is. Here is
>>> the information about the hop in case that helps with your communication with
>>> Ziggo:
>>> AS33915 asd-tr0021-cr101-be64.core.as9143.net (213.51.64.193)

>> Again, any help is much appreciated!
>> Kind regards,
>> Stefan van den Oord
>> CTO @ Medicine Men B.V.

>> Not in the office on Wednesdays

>> Regulierenring 22
>> 3981 LB Bunnik
>> The Netherlands
>> +31 85 1307020
>> _______________________________________________
>> NLNOG mailing list
>> NLNOG at nlnog.net
>> http://mailman.nlnog.net/listinfo/nlnog