NAPs going offline when modifications are made to unrelated IP interfaces

Hi

For a long time, we have been experiencing a strange issue on our baremetal installation of ProSBC (3.1.147.16).

Whenever you make a modification to any IP interface/address, a large number of NAPs go down, even if the changed IP addresses have no relation to the NAPs in question.

Once the NAPs have gone down, besides rebooting the ProSBC, another workaround that usually helps is to go into any IP interface, modify the IP address temporarily, activate the configuration, then revert the IP interface to its previous (correct) settings. This works about 95% of the time.

We are at our wit’s end with this issue. It was reported to TelcoBridges before (under the previous support model) but was not resolved (perhaps we were unable to communicate it correctly). So while we might be able to ask management to purchase support, the concern is: could we be sure it will be solved?

Something I have observed is that it is less likely to happen if the proxy address set in the NAPs is exactly the same as the Gateway IP address specified on the IP interface.
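
To audit this, something like the rough sketch below could compare each NAP's proxy address with the gateway of the IP interface it uses. The data is hypothetical and typed in by hand; the names and fields are assumptions for illustration, not a Toolpack API or export format.

```python
import ipaddress

# Hypothetical inventory copied by hand from the configuration.
# Field names are illustrative only, not a Toolpack export format.
interfaces = {
    "LAN1": {"gateway": "10.0.0.1"},
    "WAN1": {"gateway": "203.0.113.1"},
}
naps = [
    {"name": "NAP_SOFTSWITCH", "interface": "LAN1", "proxy": "10.0.0.1"},
    {"name": "NAP_CUSTOMER_A", "interface": "WAN1", "proxy": "198.51.100.20"},
]


def naps_with_proxy_not_gateway(naps, interfaces):
    """Return the names of NAPs whose proxy address differs from the
    gateway configured on the IP interface they use."""
    mismatched = []
    for nap in naps:
        gateway = ipaddress.ip_address(interfaces[nap["interface"]]["gateway"])
        proxy = ipaddress.ip_address(nap["proxy"])
        if proxy != gateway:
            mismatched.append(nap["name"])
    return mismatched


print(naps_with_proxy_not_gateway(naps, interfaces))
```

The NAPs it lists would be the ones to watch first after an IP interface change, if the observation above holds.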

The physical interfaces on the server are as follows:

04:00.0 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5720 2-port Gigabit Ethernet PCIe
04:00.1 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5720 2-port Gigabit Ethernet PCIe
18:00.0 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
18:00.1 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
18:00.2 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
18:00.3 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)

For SIP/RTP, we are using three of the I350 ports (one is facing our softswitch, one is facing customers connected over the internet, and the third is facing customers connected over point-to-point links). One of the BCM5720 is used for management.
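
In case it helps anyone compare their own hardware, here is a small sketch that groups the PCI addresses from the listing above by controller model. It only parses the text shown in this post; the regular expression is an assumption about this particular output format and may need adjusting for other `lspci` output.

```python
import re
from collections import defaultdict

# The listing from this post, pasted verbatim.
LSPCI_OUTPUT = """\
04:00.0 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5720 2-port Gigabit Ethernet PCIe
04:00.1 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5720 2-port Gigabit Ethernet PCIe
18:00.0 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
18:00.1 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
18:00.2 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
18:00.3 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
"""

# PCI address, then the device class, then the model with an optional "(rev xx)" suffix.
LINE_RE = re.compile(
    r"^(?P<addr>[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]) Ethernet controller: "
    r"(?P<model>.+?)(?: \(rev [0-9a-f]+\))?$"
)


def group_ports_by_model(text):
    """Map each controller model to the list of PCI addresses that use it."""
    groups = defaultdict(list)
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            groups[match.group("model")].append(match.group("addr"))
    return dict(groups)


for model, addrs in group_ports_by_model(LSPCI_OUTPUT).items():
    print(f"{len(addrs)} port(s): {model} -> {', '.join(addrs)}")
```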

We are now wondering whether the BCM5720 could be the cause of the instability, although the documentation seems to suggest that the NIC whitelist is only important for SIP/RTP.

If anyone has any ideas or suggestions, we would appreciate them. We can provide any additional information that may be helpful.

Thank you for the feedback.

I recommend starting with a review of the documentation below, in particular the supported NIC types:
Baremetal Installation | ProSBC Documentation
List of Supported Network Interface Cards | ProSBC Documentation

If you face unexpected behavior, please send an email to support@telcobridges.com.

BCM5720 can be used for “management” (mgmt0) and I350 for “LAN/WAN” ports, used for SIP traffic. Other interfaces should be marked as “unused”.

We notice in https://telcobridges.atlassian.net/wiki/spaces/PUBLICDOC/pages/1200554023/3.3.10.59 that there is:

“Tracking 26270 - Inconsistent behavior with multiple network interfaces sharing the same network link - TB-5816
More robust network interface management. Now able to handle complex scenarios where multiple subnets share the same network interface, with or without their default route.”

This sounds related to our issue, so we will schedule an upgrade soon.

Is it possible to get more information about the tickets/issues referenced in that note?

Hi,

We’ve seen this behavior as well, mostly after version 3.1.145. In our experience, possible causes include:

  • IP interfaces without associated NAPs,
  • multiple aliases configured on the same interface,
  • default gateways pointing to different routers,
  • default gateways configured on IPs that do not respond to ARP (ARP resolution failures can affect other interfaces and lead to intermittent NAP flapping, especially when interfaces share the same physical NIC or VLAN),
  • or even different IPs from the same subnet assigned to multiple interfaces.
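
As a rough illustration, most of the items above can be checked offline from a hand-written inventory of the IP interfaces. The sketch below assumes a hypothetical list of dictionaries (the field names `port`, `cidr`, `gateway` and `naps` are made up for this example and are not a Toolpack format). It does not cover the ARP case, which needs a live check on the network.

```python
import ipaddress
from collections import defaultdict

# Hypothetical inventory typed in by hand; names and fields are illustrative.
ip_interfaces = [
    {"name": "VOIP_CORE", "port": "eth1", "cidr": "10.0.0.10/24",
     "gateway": "10.0.0.1", "naps": 12},
    {"name": "VOIP_CORE_ALIAS", "port": "eth1", "cidr": "10.0.1.10/24",
     "gateway": "10.0.1.1", "naps": 0},
    {"name": "VOIP_P2P", "port": "eth3", "cidr": "10.0.0.20/24",
     "gateway": "10.0.0.1", "naps": 30},
]


def audit(ip_interfaces):
    """Return a list of findings matching the risk factors listed above."""
    findings = []
    names_by_port = defaultdict(list)
    ports_by_subnet = defaultdict(set)
    gateways = set()

    for itf in ip_interfaces:
        subnet = ipaddress.ip_interface(itf["cidr"]).network
        names_by_port[itf["port"]].append(itf["name"])
        ports_by_subnet[subnet].add(itf["port"])
        if itf["gateway"]:
            gateways.add(itf["gateway"])
        if itf["naps"] == 0:
            findings.append(f"{itf['name']}: no NAP associated")

    for port, names in names_by_port.items():
        if len(names) > 1:
            findings.append(f"{port}: multiple IP interfaces/aliases {names}")

    for subnet, ports in ports_by_subnet.items():
        if len(ports) > 1:
            findings.append(f"{subnet}: same subnet on several ports {sorted(ports)}")

    if len(gateways) > 1:
        findings.append(f"gateways point to different routers: {sorted(gateways)}")

    return findings


for finding in audit(ip_interfaces):
    print(finding)
```

A finding here is only a hint about where to simplify, not proof of the cause.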

One recommendation is to keep the IP interface configuration as simple as possible. Also, whenever you change something related to IP interfaces, it often helps to reboot the SBC, even if Toolpack does not explicitly request it (similar to what is required after a software update).

This issue tends to occur more frequently on SBCs with a large number of NAPs (50+).

We’re also planning upgrades, since in release 3.3.10 TelcoBridges addressed “more robust network interface management” for complex scenarios involving multiple subnets and routes (TB-5816). It might be worth testing if upgrading to 3.3.10 improves stability in your case as well.

Let us know if you see any difference after upgrading.

Best regards,
Allan

Replying to mention that 3.3.10.59 seems to have resolved this issue for us.