I had a major issue with our company's primary Hyper-V server. It was hosting our primary AD controller, user storage and shares, and two other VMs. The symptoms were heavy disk read/write I/O, reports of disconnects, and loss of access to AD and user files. According to our log files, we were getting iANSMiniport and Intel Nvmestor errors.
Here are some log file samples
According to the logs, the system started giving Event ID 129 warnings, which recurred every 10 seconds and affected our DHCP, DNS, and user logons. After a forced shutdown, everything was fine according to the logs until later that morning, when users started to have lag and login issues.
You can read more about Event ID 129 here.
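To spot this kind of repeating warning in an exported event log, something like the following Python sketch could help. It assumes the log entries have already been exported and parsed into (timestamp, event ID) pairs; the function name `find_event_bursts`, the 15-second gap tolerance, and the sample data are all hypothetical and only there for illustration.

```python
from datetime import datetime, timedelta


def find_event_bursts(records, event_id=129,
                      max_gap=timedelta(seconds=15), min_count=3):
    """Group occurrences of `event_id` into bursts.

    `records` is an iterable of (timestamp, event_id) pairs.
    A burst is a run of matching events where each one follows the
    previous within `max_gap`. Returns a list of (first, last, count)
    tuples for runs with at least `min_count` events.
    """
    times = sorted(ts for ts, eid in records if eid == event_id)
    bursts = []
    run = []
    for ts in times:
        # A gap larger than max_gap closes the current run.
        if run and ts - run[-1] > max_gap:
            if len(run) >= min_count:
                bursts.append((run[0], run[-1], len(run)))
            run = []
        run.append(ts)
    # Close the final run.
    if len(run) >= min_count:
        bursts.append((run[0], run[-1], len(run)))
    return bursts


if __name__ == "__main__":
    # Made-up sample data: six warnings spaced 10 seconds apart,
    # plus one unrelated event that should be ignored.
    start = datetime(2000, 1, 1, 2, 0, 0)
    sample = [(start + timedelta(seconds=10 * i), 129) for i in range(6)]
    sample.append((start + timedelta(seconds=5), 7036))
    for first, last, count in find_event_bursts(sample):
        print(f"{count} x Event ID 129 between {first} and {last}")
```

A long burst with a steady interval, like the 10-second pattern described above, points at a persistent storage timeout rather than a one-off hiccup.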
We sent the server in for diagnostics, and the Authorized Service Depot could not find anything out of date except for the BIOS.
Since this was our main production server, we decided to move the virtual AD controller from the production server to a dev server and run it there until the cause of the drive failure could be determined, so the organization could continue to operate.
After a second forced shutdown and reboot, I was able to shut down the virtual machines running on the server and do a full export of the data to an external drive. This ensured we did not lose any data, but it did inconvenience some users because the data had to be copied back from the dev server.
Symptoms
When this issue occurs, your cluster may experience any of the following symptoms:
- Slow workload performance
- Virtual disks in the cluster that have an Operational Status value of Detached or No Redundancy.
- Physical disks that report a status of Lost Communication or IO Error.
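The symptom list above can be turned into a quick check. This is a minimal Python sketch, assuming the disk information has already been exported from the server as a list of records with a name and an operational status string; the field names `FriendlyName` and `OperationalStatus`, the function `flag_disks`, and the sample records are assumptions made for illustration.

```python
# Status values taken from the symptom list above.
VIRTUAL_DISK_BAD = {"Detached", "No Redundancy"}
PHYSICAL_DISK_BAD = {"Lost Communication", "IO Error"}


def flag_disks(disks, bad_statuses):
    """Return the names of disks whose status is in `bad_statuses`.

    `disks` is a list of dicts with (assumed) keys 'FriendlyName'
    and 'OperationalStatus'.
    """
    return [d["FriendlyName"] for d in disks
            if d["OperationalStatus"] in bad_statuses]


if __name__ == "__main__":
    # Made-up records, not taken from the affected server.
    physical = [
        {"FriendlyName": "disk-a", "OperationalStatus": "OK"},
        {"FriendlyName": "disk-b", "OperationalStatus": "Lost Communication"},
    ]
    virtual = [
        {"FriendlyName": "vdisk-1", "OperationalStatus": "No Redundancy"},
    ]
    print("Physical disks to check:", flag_disks(physical, PHYSICAL_DISK_BAD))
    print("Virtual disks to check:", flag_disks(virtual, VIRTUAL_DISK_BAD))
```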
I haven't had a chance to verify that the issues have been 100% corrected, but I have done some major stress testing on the Storage Spaces setup, using Hyper-V to do mass exports of VMs to the storage array, with no issues at all. HD Tune and CrystalDiskMark have also shown the Storage Spaces array to be in good shape.
https://downloadcenter.intel.com/download/28175/Intel-Solid-State-Drive-Toolbox?_ga=2.179938403.692315852.1538156647-366274783.1537922699
https://downloadcenter.intel.com/download/28035/Intel-SSD-Data-Center-Tool-Intel-SSD-DCT-?_ga=2.179938403.692315852.1538156647-366274783.1537922699
https://downloadcenter.intel.com/download/27517/Datacenter-NVMe-Microsoft-Windows-Drivers-for-Intel-SSDs?_ga=2.192759661.692315852.1538156647-366274783.1537922699