NVMe cache failure strategies

Here’s a topic I have never seen covered.
What’s the recovery strategy in the following three cases, listed in order of increasing severity?

  1. A single read-only NVMe cache drive failing.
  2. A pair of read-write NVMe cache drives failing.
  3. A pair of read-write NVMe cache drives with pinned metadata failing.

I would imagine the first case is simply a matter of shutting down the DiskStation and either replacing the NVMe drive with a new one or setting DSM to no longer use a cache for the volume.

For a read-write cache failure, I’ve seen people say the volumes on the SATA drives require a rebuild. I don’t know if that’s true.

For a read-write cache failure with pinned metadata, the only comment I found in searches said it meant data loss and a restore from an external backup.

I have never tested or confirmed any of those 3 failure modes, so I don’t know what the outcomes are.

There are so many “guides” online that say to add caches, but never talk about the risk to your data volumes when the caches fail.

Does anyone have any insight here?
Thanks!

Hi,
Good questions. I agree that few guides (if any) discuss recovering when functionality is lost.

I assume a read-only cache failure has little or no impact, since a read cache only holds copies of data that already exist on the volume. A failing read-write cache is a different story. It may contain data that has not yet been committed to disk. I expect that this may corrupt the volume, but each case might be different depending on the activity at the time of failure, the kind of failure, and so on. To what extent pinned metadata affects the result, I have no idea. Is the metadata stored in the cache in read-only mode or read-write mode? I do not know.
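To illustrate the reasoning above, here is a toy Python model of the difference between a read-only cache and a write-back (read-write) cache. It is only a sketch under my own assumptions: `ToyVolume`, `write_back`, and `fail_cache` are made-up names, and nothing here reflects how DSM actually implements its SSD cache.

```python
class ToyVolume:
    """Hypothetical model: `disk` stands in for the HDD volume,
    `cache` for the NVMe cache. Not DSM's real behaviour."""

    def __init__(self, write_back):
        self.write_back = write_back  # True = read-write cache, False = read-only cache
        self.disk = {}
        self.cache = {}
        self.dirty = set()  # blocks that exist only in the cache

    def write(self, block, data):
        if self.write_back:
            # Write-back: the write lands in the cache first, the disk gets it later.
            self.cache[block] = data
            self.dirty.add(block)
        else:
            # Read-only cache: the disk always gets the write; the cache is just a copy.
            self.disk[block] = data
            self.cache[block] = data

    def flush(self):
        # Commit all dirty blocks from the cache to the disk.
        for block in self.dirty:
            self.disk[block] = self.cache[block]
        self.dirty.clear()

    def fail_cache(self):
        # Simulate sudden loss of the cache; return the blocks that never reached the disk.
        lost = sorted(self.dirty)
        self.cache.clear()
        self.dirty.clear()
        return lost


read_only = ToyVolume(write_back=False)
read_only.write(1, "a")
print(read_only.fail_cache())   # []

read_write = ToyVolume(write_back=True)
read_write.write(1, "a")
read_write.flush()
read_write.write(2, "b")        # not flushed before the failure
print(read_write.fail_cache())  # [2]
```

In this model the read-only case loses nothing, while the read-write case loses whatever was written after the last flush. Whether a real volume ends up corrupted would depend on what those lost blocks contained, which the toy model does not try to capture.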

I would love to test this at some point, and I have a spare device with NVMe SSD modules. However, I would only remove the modules while the device is powered off, which limits what the test can show.