I recently moved my files to a new zfs-pool and used that chance to properly configure my datasets.
This led me to discover ZFS deduplication.
Most of my storage is used by my Jellyfin library (~7-8 TB), which is mostly uncompressed Blu-ray rips, so I thought I might be able to save some space by using deduplication in addition to compression.
Has anyone here used that for similar files before? What was your experience with it?
I am not too worried about performance. The dataset in question rarely changes, basically only when I add more media every couple of months. I also overshot my CPU target when originally configuring my server, so there is a lot of headroom there. I have 32 GB of RAM, which is not fully utilized either (and I would not mind upgrading to 64 GB too much).
My main concern is that I am unsure whether it is useful. I suspect that, just because of the amount of data and the similarity in type, there would statistically be a lot of block-level duplication, but I could not find any real-world data or experiences on that.
You should maybe read about the use cases for deduplication before using it. Here’s one recent article:
https://despairlabs.com/blog/posts/2024-10-27-openzfs-dedup-is-good-dont-use-it/
If you mostly store legit Blu-ray rips, the answer is probably no, you should not use zfs deduplication.
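If you would rather measure than guess, here is a rough sketch that hashes fixed-size blocks of every file under a directory and counts how many are unique. It only approximates what ZFS would see: the 128 KiB block size is an assumed recordsize, and ZFS dedups records after compression, so treat the number as a ballpark. The set of hashes lives in RAM, so try it on a sample of the library first. `zdb -S <pool>` can simulate dedup on an existing pool and is probably the more accurate option.

```python
import hashlib
import os
import sys


def estimate_dedup(root, block_size=128 * 1024):
    """Return (total_blocks, unique_blocks) for all files under root.

    block_size is an assumption standing in for the dataset's recordsize.
    """
    seen = set()
    total = 0
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as fh:
                    while True:
                        block = fh.read(block_size)
                        if not block:
                            break
                        total += 1
                        seen.add(hashlib.sha256(block).digest())
            except OSError:
                # Skip files that cannot be read instead of aborting the scan.
                continue
    return total, len(seen)


if __name__ == "__main__" and len(sys.argv) > 1:
    total, unique = estimate_dedup(sys.argv[1])
    if unique:
        print(f"{total} blocks, {unique} unique, ratio {total / unique:.3f}x")
```

A ratio close to 1.000x means dedup would save next to nothing on that data.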
I was also going to link this. I started using zfs 10-ish years ago and used dedup when it came out, and it was really not worth it except for archiving a bunch of stuff I knew had gigs of duplicate data. Performance was so poor.
ZFS dedup is memory constrained: the dedup table holds one entry per unique block hash, so memory use scales with the number of blocks stored.
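To put a rough number on that, here is a back-of-the-envelope sketch. It assumes about 320 bytes of RAM per dedup-table entry, which is a commonly quoted rule of thumb and not a figure from this thread; the real cost depends on the OpenZFS version and pool layout. It also assumes the worst case for media, where every block is unique.

```python
# Assumption: ~320 bytes of RAM per dedup-table entry (rule of thumb only).
DDT_ENTRY_BYTES = 320

TIB = 1024 ** 4
GIB = 1024 ** 3


def ddt_ram_bytes(data_bytes, recordsize_bytes, entry_bytes=DDT_ENTRY_BYTES):
    """Worst case: every block is unique, so each one gets its own table entry."""
    blocks = -(-data_bytes // recordsize_bytes)  # ceiling division
    return blocks * entry_bytes


for recordsize in (128 * 1024, 1024 * 1024):
    ram = ddt_ram_bytes(8 * TIB, recordsize)
    print(f"recordsize {recordsize // 1024:>4} KiB -> ~{ram / GIB:.1f} GiB")
# recordsize  128 KiB -> ~20.0 GiB
# recordsize 1024 KiB -> ~2.5 GiB
```

Under these assumptions, 8 TiB at the default 128 KiB recordsize would want most of a 32 GB machine just for the table, while a 1 MiB recordsize cuts that by a factor of eight.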
If performance isn’t a concern, you’re better off compressing your media. You’ll get similar storage efficiency with less crash consistency risk.
You’re better off enabling compression on the dataset.
Dedup, even with the recent improvements, has huge overhead and generally degrades in performance as the dataset grows, because it needs to keep a ‘routing’ table in RAM to redirect requests for deduplicated blocks to the actually stored data. Apparently the latest OpenZFS release reduces the speed loss on larger datasets, but it’s still subpar compared to compressed data.
Video files are already heavily compressed. To save space on them, you’d be better off transcoding to a more efficient codec, like H.265 (x265) or AV1.
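As a minimal sketch of what such a transcode could look like, the helper below builds an ffmpeg command that re-encodes the video to H.265 and copies audio and subtitles untouched. It assumes ffmpeg is installed with libx265 support; the function names, the CRF of 20 and the "slow" preset are illustrative choices, not recommendations from this thread. Keep in mind that re-encoding is lossy, so test on one file before touching the whole library.

```python
import subprocess


def build_transcode_cmd(src, dst, crf=20, preset="slow"):
    """Build an ffmpeg command: video to H.265 via libx265, everything else copied."""
    return [
        "ffmpeg",
        "-i", src,
        "-map", "0",        # keep all streams (audio tracks, subtitles, ...)
        "-c", "copy",       # copy streams unchanged by default
        "-c:v", "libx265",  # but re-encode video with x265
        "-crf", str(crf),   # quality target; lower means larger files
        "-preset", preset,
        dst,
    ]


def transcode(src, dst, **kwargs):
    """Run the transcode; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(build_transcode_cmd(src, dst, **kwargs), check=True)
```

Usage would be something like `transcode("movie.mkv", "movie.x265.mkv")`, with both file names being placeholders.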