Cthulu

Cthulu is a filesystem project of mine. I started it because my Pi’s power supply would undervolt and reset my hard drives, forcing btrfs to repair itself constantly. Btrfs has the weakness (or design choice) of making the admin repair things manually before a remount, so degraded mounts are not automatic.

TODO: copy over manual page and examples. Maybe put mdbook inside here…

Cthulu

Cthulu is a transparent overlay FUSE of remote sane directories. It balances, duplicates, & repairs in the background per your subpath-specific directives.

Data is durable once it has safely reached the first remote, and it becomes more durable over time as it replicates. The cache, however, is dangerous: data that has not yet reached a remote is not durable.

  • remotes are sane filesystems on their own (not databases, not erasure-coded)
    • add remotes using ssh paths (ssh://[email protected]:/path), binaries copy over & start automatically
    • files are stored in the same layout as your fuse, minus permissions/attributes.
    • A remote may not necessarily have all files in any folder. To trade-off between striping on remote adds/removes and export speed, folders are spread out to 2 x num_copies hosts.
  • degraded/self-healing mounts come standard
    • all operations may occur whether mounted or not
    • As long as one copy is live, the file path is live.
    • The current version of a file locks to one remote until closed for 10 minutes, then it replicates per your specifications.
    • minimum of one copy at all times
    • repairs happen automatically unless the cluster has a conflict, which is left for an admin to resolve.
  • per path directives are followed in the background
    • import or export individual paths to & from remotes
    • copies per path may be set cluster-wide (default 3)
    • quotas per path may be set per remote (default 90%)
    • herding (require/drain/balance) per path may be set per remote or cluster wide (default balance)
    • at least one copy must always exist. Beyond that: quota outranks copies, copies outrank drain, and quota outranks require.
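
The precedence rules in the last bullet can be sketched in Go as below. This is only an illustration of one way to read those rules, not Cthulu's actual code: the `Remote` struct, the `OverQuota` flag, and the `PlanCopies` function are hypothetical names invented for the example, and the choice to always include a within-quota `require` remote (even past the copies target) is an assumption.

```go
package main

import (
	"fmt"
	"sort"
)

// Herding values are ordered by placement preference (hypothetical encoding).
type Herding int

const (
	Require Herding = iota // want the path on this remote
	Balance                // no preference
	Drain                  // do not want the path on this remote
)

// Remote is a hypothetical summary of one remote's state for a single path.
type Remote struct {
	Name      string
	Herding   Herding
	OverQuota bool
}

// PlanCopies returns the names of the remotes that should hold the file.
func PlanCopies(remotes []Remote, copies int) []string {
	// Quota outranks copies and require: drop over-quota remotes first.
	var ok []Remote
	for _, r := range remotes {
		if !r.OverQuota {
			ok = append(ok, r)
		}
	}
	// Prefer require, then balance, then drain (stable keeps input order).
	sort.SliceStable(ok, func(i, j int) bool { return ok[i].Herding < ok[j].Herding })

	var chosen []string
	for _, r := range ok {
		// Assumption: required remotes are always taken. Others, including
		// draining ones, are taken only while the copies target is unmet,
		// which is how "copies outrank drain" is modeled here.
		if r.Herding == Require || len(chosen) < copies {
			chosen = append(chosen, r.Name)
		}
	}
	// At least one copy must exist, even if every remote is over quota.
	if len(chosen) == 0 && len(remotes) > 0 {
		chosen = append(chosen, remotes[0].Name)
	}
	return chosen
}

func main() {
	remotes := []Remote{
		{Name: "a", Herding: Balance},
		{Name: "b", Herding: Drain},
		{Name: "c", Herding: Require, OverQuota: true},
		{Name: "d", Herding: Balance},
	}
	fmt.Println(PlanCopies(remotes, 3)) // prints [a d b]
}
```

In the example, remote c is skipped although it is marked require, because it is over quota, and the draining remote b is still used because the copies target of 3 cannot be met without it.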

commands

  • cluster: (1)create,(2)destroy,(3)mount,(4)umount,(5)copies
  • remote: (6)add,(7)remove,(8)import,(9)export,(10)quota,(11)herding
  • debug: (12)info,(13)check,(14)fix
  • internals: (15)fuse,(16)api
  1. cluster-create [cluster] - create a named volume (local kv at ~/.cthulu)
  2. cluster-destroy [cluster] - destroy a named volume (removes entry from ~/.cthulu)
  3. cluster-mount [cluster] ([mount_path]) - forks client FUSE mount
  4. cluster-umount [cluster|mount_path] - tells client FUSE daemon socket to stop client FUSE mount
  5. cluster-copies [cluster] [inside_path] ([number|unset]) - set the cluster-wide target number of copies for a path and contents. sticks with the path on renames or moves. if blank it returns the value, otherwise you can set it.
  6. remote-add [cluster] [remote] [remote_uri] - adds a remote by SSH-ing into it and starting a daemon that listens on a random HTTPS port. files already at that path are gracefully imported
  7. remote-remove [cluster] [remote|remote_uri] - removes a remote from consideration by all peers, remote will gracefully stop and uninstall itself, files remain in place
  8. remote-import [cluster] [remote] [external_path] [internal_path] - locks path, moves files in from a remote, each file becomes available once its SHA is read
  9. remote-export [cluster] [remote] [internal_path] [external_path] - locks path, moves files out to a remote, each path is available once its files have moved off
  10. remote-quota [cluster] [remote|all] [inside_path] ([-gb|gb|%|unset]) - sets the quota of disk space (excluding space taken by somebody else) as a percentage or as gb. blank value gets the setting, provide a value to set it
  11. remote-herding [cluster] [remote|all] ([require|drain|balance|unset]) - set whether you want, or don’t want, a path on a remote. otherwise, follow cluster setting. blank value gets the setting, provide a value to set it
  12. debug-info [cluster] ([remote]) ([inside_path]) - tells the cached info for a specific location (narrow it down as much as you like)
  13. debug-check [cluster] ([remote]) ([inside_path]) - forces remotes to re-check the info for a specific location (narrow it down as much as you like)
  14. debug-fix [cluster] [remote] [split|remote|newer-file|older-file] - manually resolve conflicts by deciding which version of a path’s conflicts to keep. “remote” makes the version stored at that remote win. split allows keeping both versions but splits even non-conflicting files at the level specified by your command
  • internals-startfuse (called by mount)
  • internals-startremote (called by ssh-ing in, or service file)
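
As an illustration of how a per-path directive such as cluster-copies could apply to "a path and contents", here is a minimal Go sketch that resolves the copies target for a file by walking up to the nearest ancestor with a setting, falling back to the default of 3. The `CopiesFor` function and the plain map keyed by path string are assumptions made for the example. The real store would also have to follow a path across renames and moves, which this sketch ignores.

```go
package main

import (
	"fmt"
	"path"
)

// defaultCopies is the cluster-wide default from the feature list.
const defaultCopies = 3

// CopiesFor returns the copies target for p: the value set on p itself or
// on its nearest ancestor, or the default when nothing is set.
func CopiesFor(directives map[string]int, p string) int {
	p = path.Clean("/" + p) // normalize to an absolute inside_path
	for {
		if n, ok := directives[p]; ok {
			return n
		}
		if p == "/" {
			return defaultCopies
		}
		p = path.Dir(p) // step up to the parent directory
	}
}

func main() {
	// Hypothetical settings, as if made with cluster-copies.
	directives := map[string]int{"/photos": 2, "/photos/raw": 1}

	fmt.Println(CopiesFor(directives, "/photos/2020/a.jpg")) // prints 2
	fmt.Println(CopiesFor(directives, "/photos/raw/x.dng"))  // prints 1
	fmt.Println(CopiesFor(directives, "/docs/a.txt"))        // prints 3
}
```

The same nearest-ancestor lookup would fit the quota and herding directives, with one such map per remote in addition to the cluster-wide one.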

dependencies

  • github.com/spf13/cobra (command line options)
  • github.com/cockroachdb/pebble (key value store)
  • github.com/hanwen/go-fuse/v2 (FUSE)
  • golang.org/x/crypto/ssh (ssh connections)
  • net/http (API servers) - switch to grpc
  • db diff serialization
  • (rsync binary format)

notes

  • directive communication goes through the client socket (~/.cthulu/socket), whether mounted or not. (client socket daemon)
  • fuse daemon runs separately
  • api server runs separately (not exposed by default)
  • raft-like (but not exact) consensus by liveness
  • should switch to gRPC
  • should switch database to bolthold?
  • could possibly use SSH for gRPC - https://github.com/johnsiilver/serveonssh

Benchmarks

https://fio.readthedocs.io/en/latest/fio_doc.html#job-file-format bonnie++

https://blog.ja-ke.tech/2019/08/27/nas-performance-sshfs-nfs-smb.html

#!/bin/bash
# does not include bench.file...
OUT="$HOME/logs"
mkdir -p "$OUT"  # make sure the log directory exists

# sequential write, then sequential read: 128 KB blocks, queue depth 4
fio --name=job-w --rw=write --size=2G --ioengine=libaio --iodepth=4 --bs=128k --direct=1 --filename=bench.file --output-format=normal,terse --output="$OUT/fio-write.log"
sleep 5
fio --name=job-r --rw=read --size=2G --ioengine=libaio --iodepth=4 --bs=128K --direct=1 --filename=bench.file --output-format=normal,terse --output="$OUT/fio-read.log"
sleep 5
# random write, then random read: 4 KB blocks, queue depth 32
fio --name=job-randw --rw=randwrite --size=2G --ioengine=libaio --iodepth=32 --bs=4k --direct=1 --filename=bench.file --output-format=normal,terse --output="$OUT/fio-randwrite.log"
sleep 5
fio --name=job-randr --rw=randread --size=2G --ioengine=libaio --iodepth=32 --bs=4K --direct=1 --filename=bench.file --output-format=normal,terse --output="$OUT/fio-randread.log"

The first two are classic sequential write/read tests, with a 128 KB block size and a queue depth of 4. The last two are small 4 KB random writes/reads, but with a queue depth of 32. The direct flag means direct I/O, to make sure that no caching happens on the client.

For the real-world tests I used rsync in archive mode (-a, equivalent to -rlptgoD) and its built-in progress measurements:

rsync --info=progress2 -a sshfs/TMU /tmp/TMU

redesign

  • .cache/cthulu is just a cache of guesses
  • .config/cthulu.yml with viper? serialize objects?