Drive 1 of 4 failed in the middle of D4 Repair

Setup:
Model: DS920+
DSM: 7.2
RAID: SHR
Drives: started with (4) 4TB IronWolf drives in 1 volume
Current: (4) 14TB IronWolf drives inserted
NVMe: (2) 1TB NVMe drives configured as a second volume, both Healthy

  • D1 - Critical
  • D2 - Healthy
  • D3 - Healthy
  • D4 - Repairing (39% done, currently incrementing at about 0.08% per 24-hour period; a quick way to watch this over SSH is sketched below)

—–
A week ago it had (4) IronWolf 4TB drives.
I bought 4 new IronWolf 14TB drives.
No backup.
I replaced Drives 1-3 and the volume repaired just fine each time.
During the Drive 4 replacement, Drive 1 went CRITICAL with UNC errors.
The system slowed to barely usable. Simple actions like logging into the web interface took 5-10 minutes and often failed outright.
SSH also took 5-10 minutes to log in, and then another 5 minutes when I did a sudo -i.

  • D1 utilization is at 100%
  • All other drives in the volume are typically at 0% but occasionally show some usage
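
Side note on the repair speed: since SHR sits on top of Linux md, the rebuild progress and speed can also be read over SSH from /proc/mdstat. A rough sketch of how to watch it; treat md2 as a placeholder, since the md device numbering varies between units:

```
# Show all md arrays, including rebuild/resync progress and current speed
cat /proc/mdstat

# Re-check once a minute to see how fast the percentage actually moves
# (md2 is a placeholder; pick the right array from the output above)
while true; do date; grep -A 3 '^md2' /proc/mdstat; sleep 60; done
```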

I have done a number of things to try and work through this.

  • Backing up in its current state is possible, but it is so slow that I think it would take months at the current rate.
  • I removed the UNC errors from the database and restarted the machine to try to force a Healthy status in case it was a false error. Within 10 minutes of the system coming back online, new UNC errors showed up, so the drive really is going bad. (A sketch of checking the raw SMART counters over SSH follows.)
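
For anyone following along, one way to sanity-check whether those UNC errors match what the drive itself reports is smartctl, which as far as I know is already present on DSM. A minimal sketch; /dev/sata1 is a placeholder, and older DSM versions expose drives as /dev/sda-style names instead:

```
# Full SMART report for the suspect drive (run as root)
# /dev/sata1 is a placeholder - check `ls /dev/sata* /dev/sd*` first
smartctl -a /dev/sata1

# The counters that matter most for a dying disk:
#   5   Reallocated_Sector_Ct
#   197 Current_Pending_Sector
#   198 Offline_Uncorrectable
smartctl -A /dev/sata1 | grep -E 'Reallocated|Pending|Uncorrect'
```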

I don’t know what to do at this point. I want to back up in the fastest way possible, but I can’t seem to speed that process up because D4 can’t be used in the volume yet. Anyone have ideas on how to proceed?
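
To be clear about what I mean by backing up in its current state: the idea is to pull the most important shares off first rather than everything at once. A minimal sketch of that kind of copy, assuming rsync over SSH to another Linux box; the user, hostname, and paths are placeholders:

```
# Pull the critical share first; --partial makes the copy resumable if the NAS stalls
# (backupuser, nas.local, and the paths are placeholders)
rsync -avh --partial --progress \
    backupuser@nas.local:/volume1/docker/ \
    /mnt/backup/docker/

# Then the bulkier, less critical shares
rsync -avh --partial --progress \
    backupuser@nas.local:/volume1/media/ \
    /mnt/backup/media/
```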

Difficult. What you did not mention (and it is the most relevant piece of information in the entire story) is the RAID type of your storage pool. I assume it is SHR or RAID 5 with 1-drive fault tolerance.

The fact that you do not have a backup is something you probably regret very much. But if I may ask, what was your reasoning for not making one regularly? I am trying to understand the logic people follow.

Hi Paul,
You are correct, it is SHR. I’ll edit my original post to include that important bit of info.

Looking back, I can see what happened. The NAS was something I had wanted for my home for some time. I wanted to move my HTPC (Plex) and the surrounding services onto a NAS running everything in Docker. I picked up the Synology because of the price; I got lucky on eBay and got it for $400 including the four 4TB drives. I then began the evolving journey of building a sweet home setup, and I think I forgot along the way how hard it would be to rebuild if I lost everything. Nothing magic, it just evolved into a more complex system over time. At the beginning I was playing around and felt like I would simply rebuild if I screwed something up. Now, with this issue showing up, I am making a list of what it would take to rebuild, and I sincerely wish I had a full backup.

Side note: I set up an S3 Glacier bucket for this purpose about 2 weeks ago and was pushing small amounts of data up to it to test and see what the costs looked like. Again, I kinda wish I had pushed the whole thing now.
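
For what it's worth, the test uploads don't need anything fancy. With the plain AWS CLI, pushing a folder straight into a Glacier storage class looks roughly like this; the bucket name and paths are placeholders, and this is just a sketch, not a claim that it's the best route from DSM:

```
# One-time setup: store access key, secret, and region for the CLI
aws configure

# Push a test folder; objects are written with the Glacier storage class
# (my-nas-archive and /volume1/photos are placeholders)
aws s3 sync /volume1/photos s3://my-nas-archive/photos --storage-class GLACIER

# Rough cost sanity check: total size uploaded so far
aws s3 ls s3://my-nas-archive --recursive --summarize --human-readable
```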

Hi Boyd,
Thanks for sharing. Makes sense. Rebuilding a RAID onto a new HDD often goes well, but not always, because the rebuild pushes all of the remaining disks to their limits.

With clients, I always make a backup first or demand it be made before proceeding with a drive swap.
