Cthulu

Cthulu is a filesystem project I started because my Pi’s power supply would undervolt and reset my hard drives, forcing btrfs to repair constantly. Btrfs has the weakness (or design choice) of making the admin repair things manually before a remount, so degraded mounts are not automatic.

TODO: copy over manual page and examples. Maybe put mdbook inside here…

get it:
- cthulu-arm 11M
- cthulu-x86 11M
- cthulu-arm-small 3.7M
- cthulu-x86-small 4.3M
- cthulu-src.tar.xz 49k
- cthulu-bins.tar.xz 14M

Cthulu

Cthulu is a transparent overlay FUSE of remote sane directories. It balances, duplicates, & repairs in the background per your subpath-specific directives.

Data is durable once it has safely reached the first remote, and gets more durable with time. The cache, however, is dangerous.

remotes are sane filesystems on their own (not databases/erasure-coded)
- add remotes using ssh paths (ssh://user@host:/path), binaries copy over & start automatically
- files are stored in the same layout as your fuse, minus permissions/attributes.
- A remote does not necessarily hold all files of any given folder. To trade off striping on remote adds/removes against export speed, folders are spread out to 2 x num_copies hosts.
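The "2 x num_copies hosts" spread can be illustrated with a small sketch. The width calculation follows the rule above; picking the hosts by rendezvous hashing is only an assumption for illustration (the text does not say how hosts are chosen), and all names here are hypothetical.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// spreadWidth returns how many remotes a folder is spread across:
// 2 x numCopies, capped at the number of remotes available.
func spreadWidth(numCopies, numRemotes int) int {
	w := 2 * numCopies
	if w > numRemotes {
		w = numRemotes
	}
	return w
}

// score is a hypothetical rendezvous-hash weight for a (folder, remote) pair.
func score(folder, remote string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(folder))
	h.Write([]byte{0}) // separator so "a"+"bc" and "ab"+"c" differ
	h.Write([]byte(remote))
	return h.Sum64()
}

// pickRemotes returns the remotes a folder would be spread across.
// ASSUMPTION: highest-score-wins selection, not necessarily what Cthulu does.
func pickRemotes(folder string, remotes []string, numCopies int) []string {
	ranked := append([]string(nil), remotes...) // copy, leave input untouched
	sort.Slice(ranked, func(i, j int) bool {
		si, sj := score(folder, ranked[i]), score(folder, ranked[j])
		if si != sj {
			return si > sj
		}
		return ranked[i] < ranked[j] // stable tie-break by name
	})
	return ranked[:spreadWidth(numCopies, len(ranked))]
}

func main() {
	remotes := []string{"pi1", "pi2", "pi3", "pi4", "pi5", "pi6", "pi7", "pi8"}
	// With 3 copies, the folder lands on 6 of the 8 remotes.
	fmt.Println(pickRemotes("/photos/2023", remotes, 3))
}
```

With this kind of selection, adding or removing one remote only changes placement for folders that ranked it inside their own window, which is the striping trade-off the bullet describes.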
degraded/self-healing mounts come standard
- all operations may occur whether mounted or not
- As long as one copy is live, the file path is live.
- The current version of a file locks to one remote until closed for 10 minutes, then it replicates per your specifications.
- minimum of one copy at all times
- repairs happen automatically unless the cluster has a conflict, in which case admins resolve it manually.
per path directives are followed in the background
- import or export individual paths to & from remotes
- copies per path may be set cluster-wide (default 3)
- quotas per path may be set per remote (default 90%)
- herding (require/drain/balance) per path may be set per remote or cluster wide (default balance)
- at least one copy must always exist. otherwise quota outranks copies, copies outranks drain, and quota outranks require.
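The precedence between these directives can be written out as a single decision. The sketch below is one possible reading of the rules, assuming the question being asked is "should a remote that already holds a copy keep it?"; the types and the keepCopy function are hypothetical names.

```go
package main

import "fmt"

// Herding mirrors the require/drain/balance directive.
type Herding int

const (
	Balance Herding = iota // default
	Require
	Drain
)

// Remote is a hypothetical view of one remote for one path.
type Remote struct {
	Name      string
	Herding   Herding
	OverQuota bool
}

// keepCopy decides whether a remote that currently holds a copy keeps it.
// liveCopies counts every live copy (including this one); target is the
// configured number of copies. Cases are checked in precedence order.
func keepCopy(r Remote, liveCopies, target int) (bool, string) {
	switch {
	case liveCopies <= 1:
		return true, "last live copy" // at least one copy must exist
	case r.OverQuota:
		return false, "over quota" // quota outranks copies and require
	case r.Herding == Require:
		return true, "required on this remote"
	case r.Herding == Drain && liveCopies <= target:
		return true, "needed to meet copies" // copies outranks drain
	case r.Herding == Drain:
		return false, "draining"
	default:
		return true, "balance"
	}
}

func main() {
	r := Remote{Name: "pi2", Herding: Drain}
	keep, why := keepCopy(r, 3, 3)
	fmt.Println(r.Name, keep, why)
}
```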

commands

cluster: (1)create,(2)destroy,(3)mount,(4)umount,(5)copies
remote: (6)add,(7)remove,(8)import,(9)export,(10)quota,(11)herding
debug: (12)info,(13)check,(14)fix
internals: (15)fuse,(16)api

cluster-create [cluster] - create a named volume (local kv at ~/.cthulu)
cluster-destroy [cluster] - destroy a named volume (removes entry from ~/.cthulu)
cluster-mount [cluster] ([mount_path]) - forks client FUSE mount
cluster-umount [cluster|mount_path] - tells client FUSE daemon socket to stop client FUSE mount
cluster-copies [cluster] [inside_path] ([number|unset]) - get or set the cluster-wide target number of copies for a path and its contents. the setting sticks with the path on renames or moves. blank value gets the setting, provide a value to set it
remote-add [cluster] [remote] [remote_uri] - adds a remote by SSH-ing into it and starting a daemon that listens on a random HTTPS port. files already at that path are gracefully imported
remote-remove [cluster] [remote|remote_uri] - removes a remote from consideration by all peers, remote will gracefully stop and uninstall itself, files remain in place
remote-import [cluster] [remote] [external_path] [internal_path] - locks path, moves files in from a remote, each file becomes available once its SHA is read
remote-export [cluster] [remote] [internal_path] [external_path] - locks path, moves files out to a remote, each path becomes available once its files are moved off
remote-quota [cluster] [remote|all] [inside_path] ([-gb|gb|%|unset]) - sets the quota of disk space (excluding space taken by somebody else) as a percentage or as gb. blank value gets the setting, provide a value to set it
remote-herding [cluster] [remote|all] ([require|drain|balance|unset]) - set whether you want, or don’t want, a path on a remote. otherwise, follow cluster setting. blank value gets the setting, provide a value to set it
debug-info [cluster] ([remote]) ([inside_path]) - tells the cached info for a specific location (narrow it down as much as you like)
debug-check [cluster] ([remote]) ([inside_path]) - forces remotes to re-check the info for a specific location (narrow it down as much as you like)
debug-fix [cluster] [remote] [split|remote|newer-file|older-file] - manually resolve conflicts by deciding which version of a path’s conflicts to keep. “remote” makes the version stored at that remote win. split allows keeping both versions but splits even non-conflicting files at the level specified by your command
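The value forms accepted by remote-quota ([-gb|gb|%|unset]) could be parsed along these lines. This is a minimal sketch: the type and function names are hypothetical, and reading a negative gb value as "leave this many GB free" is an assumption, since the text does not spell out its meaning.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// QuotaKind says how a quota value should be interpreted.
type QuotaKind int

const (
	QuotaUnset   QuotaKind = iota // fall back to the default
	QuotaPercent                  // e.g. "90%"
	QuotaGB                       // e.g. "500gb": use at most this much
	QuotaFreeGB                   // e.g. "-10gb": ASSUMED to mean "leave this much free"
)

// Quota is a hypothetical parsed form of a remote-quota value.
type Quota struct {
	Kind  QuotaKind
	Value int
}

// parseQuota parses "unset", "N%", "Ngb" or "-Ngb" (case-insensitive).
func parseQuota(s string) (Quota, error) {
	s = strings.ToLower(strings.TrimSpace(s))
	switch {
	case s == "unset":
		return Quota{Kind: QuotaUnset}, nil
	case strings.HasSuffix(s, "%"):
		v, err := strconv.Atoi(strings.TrimSuffix(s, "%"))
		if err != nil || v < 0 || v > 100 {
			return Quota{}, fmt.Errorf("invalid percentage %q", s)
		}
		return Quota{Kind: QuotaPercent, Value: v}, nil
	case strings.HasSuffix(s, "gb"):
		v, err := strconv.Atoi(strings.TrimSuffix(s, "gb"))
		if err != nil {
			return Quota{}, fmt.Errorf("invalid size %q", s)
		}
		if v < 0 {
			return Quota{Kind: QuotaFreeGB, Value: -v}, nil
		}
		return Quota{Kind: QuotaGB, Value: v}, nil
	}
	return Quota{}, fmt.Errorf("unrecognised quota %q", s)
}

func main() {
	for _, arg := range []string{"90%", "500gb", "-10gb", "unset", "lots"} {
		q, err := parseQuota(arg)
		fmt.Println(arg, q, err)
	}
}
```

A blank argument is not handled here on purpose: per the command description, a blank value means "get the setting", so the caller would check for it before parsing.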

internals-startfuse (called by mount)
internals-startremote (called by ssh-ing in, or service file)

dependencies

github.com/spf13/cobra (command line options)
github.com/cockroachdb/pebble (key value store)
github.com/hanwen/go-fuse/v2 (FUSE)
golang.org/x/crypto/ssh (ssh connections)
net/http (API servers) - switch to grpc
db diff serialization
(rsync binary format)

notes

directives are sent through the client socket (~/.cthulu/socket), whether mounted or not. (client socket daemon)
fuse daemon runs separately
api server runs separately (not exposed by default)
raft-like (but not exact) consensus by liveness
should switch to gRPC
should switch database to bolthold?
could possibly use SSH for gRPC - https://github.com/johnsiilver/serveonssh

Benchmarks

https://fio.readthedocs.io/en/latest/fio_doc.html#job-file-format bonnie++

https://github.com/nxsre/sshfs-go
NFS
- https://github.com/davecheney/nfs
- https://github.com/willscott/go-nfs
going for 20 MB/s read/write minimum, 40 MB/s write, 60 MB/s read max
2 ms read/write latency
10-20k IOPS goal for random I/O
sequential 75-100 MB/s read/write

https://blog.ja-ke.tech/2019/08/27/nas-performance-sshfs-nfs-smb.html

#!/bin/bash
# does not include bench.file...
OUT=$HOME/logs
# make sure the log directory exists, otherwise fio cannot write its output
mkdir -p "$OUT"

fio --name=job-w --rw=write --size=2G --ioengine=libaio --iodepth=4 --bs=128k --direct=1 --filename=bench.file --output-format=normal,terse --output=$OUT/fio-write.log
sleep 5
fio --name=job-r --rw=read --size=2G --ioengine=libaio --iodepth=4 --bs=128K --direct=1 --filename=bench.file --output-format=normal,terse --output=$OUT/fio-read.log
sleep 5
fio --name=job-randw --rw=randwrite --size=2G --ioengine=libaio --iodepth=32 --bs=4k --direct=1 --filename=bench.file --output-format=normal,terse --output=$OUT/fio-randwrite.log
sleep 5
fio --name=job-randr --rw=randread --size=2G --ioengine=libaio --iodepth=32 --bs=4K --direct=1 --filename=bench.file --output-format=normal,terse --output=$OUT/fio-randread.log

The first two are classic sequential read/write tests, with a 128 KB block size and a queue depth of 4. The last two are small 4 KB random reads/writes, but with a 32-deep queue. The direct flag means direct IO, to make sure that no caching happens on the client.

For the real-world tests I used rsync in archive mode (-rlptgoD) and its built-in measurements:

rsync --info=progress2 -a sshfs/TMU /tmp/TMU

redesign

.cache/cthulu is just a cache of guesses
.config/cthulu.yml with viper? serialize objects?