Issue:
Vault performance degrades quickly and consistently across the entire organization, affecting all users. Users report that basic Vault tasks take an inordinate amount of time to complete, or fail to complete altogether.
Typical troubleshooting steps, such as rebooting the server, restarting IIS and SQL, and repairing the ADMS, have no meaningful impact on the performance issue.
In one case, the server CPU at times stayed pegged at 100% utilization, with most of the resources consumed by the IIS Worker Process (Vault).
Causes:
A single root cause was never determined; however, the following contributing factors were identified:
• Total Vault size was in the double-digit terabytes.
• Several thousand users had been bulk imported via Active Directory groups.
• An overwhelming number of open connections degraded overall performance.
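To gauge whether open connections are piling up on a server, a rough count of established TCP connections can be useful. The following is a minimal sketch, not a tool used in the original investigation. It assumes a Windows server where `netstat -an` prints the usual four-column TCP layout, and it assumes ports 80 and 443 for IIS and 1433 for a default SQL Server instance; the function names and the port list are hypothetical and should be adjusted to the actual environment.

```python
import subprocess
from collections import Counter

# Assumed ports (not from the original article): 80/443 for IIS,
# 1433 for a default SQL Server instance. Adjust for your server.
WATCHED_PORTS = {80, 443, 1433}


def count_established(netstat_output, ports=WATCHED_PORTS):
    """Count ESTABLISHED TCP connections per watched local port.

    Expects Windows-style `netstat -an` text, where each TCP row is:
    Proto, Local Address, Foreign Address, State.
    """
    counts = Counter()
    for line in netstat_output.splitlines():
        parts = line.split()
        # Skip headers, UDP rows, and anything not fully established.
        if len(parts) < 4 or parts[0].upper() != "TCP":
            continue
        if parts[3].upper() != "ESTABLISHED":
            continue
        # The local port is whatever follows the last colon,
        # which also handles bracketed IPv6 addresses like [::1]:443.
        port_text = parts[1].rsplit(":", 1)[-1]
        if port_text.isdigit() and int(port_text) in ports:
            counts[int(port_text)] += 1
    return dict(counts)


def snapshot():
    """Run `netstat -an` on the local machine and summarize it."""
    result = subprocess.run(
        ["netstat", "-an"], capture_output=True, text=True, check=True
    )
    return count_established(result.stdout)
```

Calling `print(snapshot())` on the server at intervals gives a simple trend line. A count that keeps climbing while user activity stays flat would be consistent with the open-connection buildup described above, although it would not prove it.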
Solution:
The final resolution was to migrate the virtual server from its existing VM host to a new host. Once this was done, performance returned to acceptable, pre-incident levels. A subsequent move back to the original VM host did not bring back the degraded performance. We speculate that connection instability, combined with the high number of open connections, strongly contributed to the overall performance degradation. Switching VM hosts appeared to reset the problem components and allowed proper functionality to return.
About the Author
Heath White