Cluster 1 Issues November 22, 2024
This morning we saw a recurrence of the same CPU spikes that caused yesterday's issues. Once again, the problem is limited to a single application server Cluster.
Despite working through millions of lines of logs yesterday, we have not yet identified the Root Cause. However, we have formulated and enacted our mitigation strategies, so we expect to restore full service much more quickly this time. We have also enabled additional logging to give us more information for tracking down the cause of the current issues.
Please be assured that we are treating this as our highest priority, and thank you for bearing with us.
ACTIONS
- We are spinning up a new server in the Cluster;
- using it, we will attempt to isolate each site in turn to identify and confirm which sites are causing the problem;
- we are continuing to prioritise stabilisation first, and Root Cause next.