Trying to figure out if there is a way to do this without zfs send transferring a ton of data. I have:

  • s/test1, inside it are folders:
    • folder1
    • folder2

I have this pool backed up remotely by sending snapshots.

I’d like to split this up into:

  • s/test1, inside it is the folder:
    • folder1
  • s/test2, inside it is the folder:
    • folder2

I’m trying to figure out if there is some combination of clone and promote that would limit the amount of data that needs to be sent over the network.
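Roughly, what I had in mind is something like this (completely untested; the snapshot names, the remote host "backuphost" and the pool "tank" are placeholders, and I'm assuming the pool is mounted at /s):

    # snapshot the existing dataset and clone it; locally the clone shares blocks
    zfs snapshot s/test1@split
    zfs clone s/test1@split s/test2

    # drop the half that doesn't belong in each dataset
    rm -rf /s/test1/folder2 /s/test2/folder1

    # then, hopefully, an incremental send from the shared snapshot, so the
    # remote (which would already have test1@split from the normal backup)
    # doesn't need a full copy of s/test2. No idea whether this is valid when
    # the incremental source is the clone's origin -- that's basically my question.
    zfs snapshot s/test2@post-split
    zfs send -i s/test1@split s/test2@post-split | ssh backuphost zfs receive tank/test2

    # and possibly promote afterwards so s/test2 no longer depends on s/test1
    zfs promote s/test2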

Or maybe there is some record/replay method I could use with snapshots that I’m not aware of.

Thoughts?

  • tvcvt@lemmy.ml · 10 days ago

    I can’t think of a way offhand that matches your scenario, but I’ve heard ideas suggested that come close. This is exactly the type of question you should ask at practicalzfs.com.

    If you don’t know it, that’s Jim Salter’s forum (author of sanoid and syncoid) and there are some sharp ZFS experts hanging out there.

  • ReversalHatchery@beehaw.org · 10 days ago (edited)

    What is your goal with this?

    Do you still want to keep all the data in a single pool?
    If so, you could create datasets in the pool and move the top-level directories into them. Datasets are basically directories that can have their own settings for how they are handled.
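    Roughly like this (the dataset names and the /s mountpoint are just examples):

        # create a new dataset and move the directory into it
        zfs create s/test2
        mv /s/test1/folder2 /s/test2/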

    Ninja edit: now that I think about it, moving data across datasets probably causes it to be resent.
    It would be easier to give advice knowing why you want to do this.

    • fmstrat@lemmy.nowsci.com (OP) · 9 days ago

      Yeah, your edit is the problem, unfortunately. Moving across datasets would incur disk reads/writes and resending terabytes of data.

      The goal in separating them out is that I want to be able to zfs send folder1 somewhere independently, without including folder2. Poor choice of dataset layout on my part when I built the array.
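      In other words, after the split I want to be able to do something like this (the remote host and pool names are made up):

          # back up folder1's dataset on its own, without touching folder2
          zfs snapshot s/test1@backup1
          zfs send s/test1@backup1 | ssh backuphost zfs receive -u tank/test1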

      • ReversalHatchery@beehaw.org · 8 days ago

        Hmm, I see. And why do you want that? Balancing storage usage between backup sites? Is one of them too small for the whole pool?

        For now I don’t have a better idea, sorry. Maybe this is the second-best time to think up a proper structure for the datasets and move everything into it.
        But if the reason is the latter, that one backup site can’t hold the whole pool, you may need to reorganize again in the future. And that’s not an easy thing, because then you’ll have the same data (files of the same category) scattered around the FS tree even locally. Maybe you could ease that with something like mergerfs, having it write each new file to the dataset with the lowest storage usage.
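        Something like this fstab entry, if I remember mergerfs’ options right (the mountpoints are made up; lus should be its “least used space” create policy):

            # present both dataset mountpoints as one tree; new files land on the
            # branch with the least used space
            /s/test1:/s/test2  /srv/merged  fuse.mergerfs  allow_other,category.create=lus  0 0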

        If you are ready to reorganize, think about what kinds (and subkinds) of files you are likely to store in larger amounts, like media/video and media/image, and don’t forget to take advantage of per-dataset storage settings, such as compression, recordsize, and maybe caching. Not everything needs its own custom recordsize, but for contiguously read files a higher value can be better, and the same goes for data that isn’t accessed too often where you want a better compression ratio, since compression (and checksumming!) happens per record. Video is sometimes compressible, or rather some of the larger data blobs inside the container are.
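        For example, something along these lines; the dataset names and values are only illustrative, tune them to your data (zstd needs OpenZFS 2.0+):

            # big, sequentially read video files: large records, cheap compression
            zfs create s/media
            zfs create -o recordsize=1M -o compression=lz4 s/media/video
            # smaller, rarely accessed, compressible stuff: default 128K records, stronger compression
            zfs create -o recordsize=128K -o compression=zstd s/documents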