“legal” ARP poisoning by machine aggregating 2 NICs crashes us
Strange things are afoot, threats are being made, and we need to sort this problem out.
The situation:
Our device (a network camera) streams video over a network to a recorder/server (using Live555 / WIS Streamer). The video is sent as UDP packets.
On one particular site using one particular server, every so often (~24 hours) one thread of the Live555 streamer locks up whilst sending video. Other threads keep going, and we still have connectivity to the camera over IP - see web pages from it, PING it, etc.
We suspect the server: it has two network ports and aggregates them, so it has two MACs but one IP address. In Wireshark we see the camera streaming to one port (let's call it A); we then get an ARP from the other port (let's call it B), our device stops squirting packets to MAC A, squirts one packet up the wire to MAC B and then appears to stop in its tracks.
Further info: The server seems to corrupt ARP packets from the "wrong" port, possibly as a result of a misconfiguration or somesuch, but those packets still get read and acted upon by our device, possibly because our driver or kernel networking is misconfigured or skips checksums to save CPU cycles.
So this messy situation raises a few questions:
- Where in the kernel networking code should I be looking to check the packet checksum or enable checking? Our hardware is fixed, being an embedded device, so a tweak made to the driver is not the worst idea ever.
- Can anyone guess the failure mechanism that causes a process to lock up when it's constantly send()ing data on a port and the ARP tables shift underneath it?
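The failure mechanism itself is not known here, so the following is only a defensive sketch, assuming the lock-up is a blocking send on the UDP socket. It makes the socket non-blocking and waits for writability with a timeout, so that a stalled transmit path shows up as a `False` return rather than a hung thread. The function name `send_with_timeout` and the 0.5 s timeout are hypothetical, and the real streamer is Live555 (C++), so the logic would have to be ported.

```python
import select
import socket


def send_with_timeout(sock, payload, addr, timeout_s=0.5):
    """Try to send one UDP datagram; return False instead of blocking forever.

    Hypothetical helper: assumes the stall is in a blocking send, which has
    not been confirmed for the problem described above.
    """
    sock.setblocking(False)
    # Wait until the kernel reports the socket writable, or give up.
    _, writable, _ = select.select([], [sock], [], timeout_s)
    if not writable:
        return False  # caller can log the stall, drop the frame or reopen the socket
    try:
        sock.sendto(payload, addr)
        return True
    except (BlockingIOError, InterruptedError):
        # The socket stopped being writable between select() and sendto().
        return False
```

A caller would treat a run of `False` results as a stalled stream and could log the event or recreate the socket before the system watchdog fires.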
Edited to add: We now suspect that the ARPs are not really corrupt, just that Wireshark is not correctly identifying the packet (it thinks the packet is long enough that there must be an FCS word, but we now think it's just zero-padding). That really just leaves part 2 of this question: what can we do to prevent this change in the ARP table knocking a transmitting process over?
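The zero-padding explanation is consistent with the frame arithmetic: an IPv4-over-Ethernet ARP message is 28 bytes, the Ethernet header is 14 bytes, and the minimum Ethernet frame is 60 bytes before the 4-byte FCS, so 18 bytes of padding are expected. The sketch below builds such a frame to show the numbers; the MAC and IP values are made up for illustration, and whether Wireshark treats trailing bytes as an FCS depends on its own settings and heuristics.

```python
import struct

ETH_MIN_NO_FCS = 60   # minimum Ethernet frame length, excluding the FCS
ETH_FCS_LEN = 4       # frame check sequence appended by the sending NIC

# Hypothetical addresses, for illustration only.
MAC_B = bytes.fromhex("02000000000b")
MAC_CAM = bytes.fromhex("0200000000c1")
IP_SERVER = bytes([192, 0, 2, 10])
IP_CAM = bytes([192, 0, 2, 20])


def build_arp_reply(src_mac, src_ip, dst_mac, dst_ip):
    """Return (frame_without_fcs, padding_len) for an IPv4 ARP reply."""
    eth = dst_mac + src_mac + struct.pack("!H", 0x0806)  # EtherType = ARP
    # htype=Ethernet, ptype=IPv4, hlen=6, plen=4, oper=2 (reply)
    arp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, 2)
    arp += src_mac + src_ip + dst_mac + dst_ip
    frame = eth + arp
    padding = max(0, ETH_MIN_NO_FCS - len(frame))
    return frame + b"\x00" * padding, padding


frame, padding = build_arp_reply(MAC_B, IP_SERVER, MAC_CAM, IP_CAM)
print(len(frame), padding, len(frame) + ETH_FCS_LEN)
```

So a 60-byte ARP frame with 18 trailing zero bytes is normal, and those bytes are padding rather than a checksum.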
Edit to further add: I don't want people to think I'm ignoring questions about port states or process states. The issue happens very rarely (on average maybe once per 24h) and only on one (remote) installation that we can't easily get access to. We're trying hard to replicate it in the lab so we can do more detailed diagnostics, but the system watchdog resets within ~3 mins of the problem occurring, so by the time the news reaches us it has already rebooted and started working OK.
Edit to add Wireshark info:
I'm not sure of the best way to summarise Wireshark captures here (very hard to upload ~1Tb of captured packets!) but I'll try. Cam:X & Cam:Y are two streams of RTSP video streamed by two identical instances of Live555 WIS Streamer from different ports. Server 'A' and 'B' are the MACs of the two NICs on the server.
The sequence of packets goes like this:
UDP Packet from Cam:X -> Server 'A'
UDP Packet from Cam:Y -> Server 'A'
UDP Packet from Cam:X -> Server 'A'
UDP Packet from Cam:Y -> Server 'A'
UDP Packet from Cam:X -> Server 'A'
UDP Packet from Cam:Y -> Server 'A'
ARP Packet to Cam from Server 'B' "<my IP> is now on 'B'"
Intel ANS Probe broadcast from Server 'B', Sender ID '1' team ID 'B'
Intel ANS Probe broadcast from Server 'A', Sender ID '2' team ID 'B'
<silence> from Cam:X
UDP Packet from Cam:Y -> Server 'B'
UDP Packet from Cam:Y -> Server 'B'
UDP Packet from Cam:Y -> Server 'B'
There are no other packets in the stream at or around this time. The Intel ANS packets do not always coincide with the ARPs from the NIC but I thought I'd include them for the sake of completeness.
The issue seems to be VERY sensitive to timing: we see these "team" ARPs regularly from the server and only once in a blue moon do they cause us an issue, as if there's a particular point in the network stack code that's sensitive to the ARP table changing. It's not always the same stream instance that falls over, and notably the other instance (as well as all other net traffic, HTTP etc.) continues to work fine.
It sounds like teamed NICs "should not" ARP like this mid-session, but of course they won't be aware of any session when the traffic is all UDP.
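Because the watchdog reboots the device before anyone can look at it, one option is to record ARP table changes as they happen. The sketch below, assuming a Linux kernel with procfs mounted, parses two snapshots in the `/proc/net/arp` text format and reports any IP whose MAC changed. The sample addresses are made up, and on the camera the same logic would need porting to whatever language the firmware uses.

```python
# Hypothetical snapshots in the /proc/net/arp text format.
SAMPLE_BEFORE = (
    "IP address       HW type     Flags       HW address            Mask     Device\n"
    "192.0.2.10       0x1         0x2         02:00:00:00:00:0a     *        eth0\n"
)
SAMPLE_AFTER = (
    "IP address       HW type     Flags       HW address            Mask     Device\n"
    "192.0.2.10       0x1         0x2         02:00:00:00:00:0b     *        eth0\n"
)


def parse_proc_net_arp(text):
    """Map each IP address to its MAC address, skipping the header line."""
    table = {}
    for line in text.splitlines()[1:]:
        fields = line.split()
        if len(fields) >= 6:
            table[fields[0]] = fields[3].lower()
    return table


def mac_changes(before, after):
    """Return {ip: (old_mac, new_mac)} for entries whose MAC differs."""
    return {
        ip: (before[ip], mac)
        for ip, mac in after.items()
        if ip in before and before[ip] != mac
    }


changes = mac_changes(parse_proc_net_arp(SAMPLE_BEFORE),
                      parse_proc_net_arp(SAMPLE_AFTER))
print(changes)
```

Polling the table and writing each change with a timestamp to persistent storage would show whether a stall lines up with a switch from MAC A to MAC B.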
networking ip arp
An IP is like the name in a phone book, and the MAC the actual file number...so if it indeed changes, the call is gone.
– Rui F Ribeiro, Feb 15 '16 at 19:11
Well, yes, but I'm curious about how a send() call can block/lock/crash when the ARP table changes rather than failing gracefully?
– John U, Feb 15 '16 at 19:19
The thread may stay in the sleep state waiting for data that never arrives, I guess. Hard to tell. Do you have a console? What does ps axms say about the thread when it happens?
– Rui F Ribeiro, Feb 16 '16 at 9:00
We don't have the -axms option on ps; it's an embedded system running busybox, so a relatively limited command set. Currently we can't reproduce the issue on demand as it's a bit challenging to craft corrupted packets of the correct form to order.
– John U, Feb 16 '16 at 11:25
Link aggregation may be the cause, if configured incorrectly. Do you get ICMP answers shortly before the transmission stops, and shortly after the single packet to "B"?
– gerhard d., Feb 17 '16 at 11:04
edited 39 mins ago by Rui F Ribeiro
asked Feb 15 '16 at 18:53 by John U
1 Answer
Well, if only to give some closure to this: the customer reconfigured their dodgy network card and everything worked. Unfortunately for the curious, that means no-one is going to pay anyone to look too closely at what could've been done to fix that case.
answered Mar 16 '16 at 15:11 by John U