Completed -
The switch replacement was completed successfully, and clients should no longer experience InfiniBand connectivity issues. Our team is actively monitoring the cluster for any recurrence of the problem.
If you encounter any issues, please contact the Nebius Support team.
Dec 3, 19:32 CET
Update -
After monitoring the previously rebooted InfiniBand switch, we found that the issue persists, so the switch will be replaced. During this work, clients may experience InfiniBand connectivity downtime on the same 72 affected GPU nodes for approximately 30–40 minutes.
We apologize for the inconvenience and recommend verifying your workloads once the maintenance is completed. If you encounter any issues, please contact the Nebius Support team.
Dec 3, 17:28 CET
In progress -
Scheduled maintenance is currently in progress. We will provide updates as necessary.
Dec 3, 17:11 CET
Scheduled -
We will reboot one XDR (InfiniBand) switch between now and 20:00 CET (19:00 GMT) to improve interconnect stability. Up to 72 GPU nodes may be affected. During this time, customers may experience partial InfiniBand connectivity interruptions of up to 20 minutes.
We apologize for any inconvenience and kindly ask you to contact the Nebius Support team if you encounter issues.
Dec 3, 17:10 CET