Reconfiguring Mastodon to use Object Storage
If you ask anybody that runs a small Mastodon server, you're quite likely to hear that media storage can be a bit of a pain.
Although there are various `tootctl` incantations that can be used to help free up space (I've included mine further down), as the fediverse grows, so do a server's storage demands.
When I originally set up my Mastodon server, I used a locally attached volume for media storage.
Over time, though, the instance's needs have started to out-grow the storage that I've made available, and the volume has hit 100% a couple of times (although I have alerts, it's not something I'm going to rush back home to fix if I happen to be out).
Recently, I decided to move to using object storage instead: that way, the server could (within reason) consume whatever space it needed without costing a fortune.
The Mastodon documentation does a fantastic job of describing how to enable object storage, but information on how to move an existing server is much more limited.
There is, however, an excellent post by Thomas Leister on moving to Amazon S3.
I ran into some issues along the way though, so I thought it was still worth putting some information up about my experience. I also use DigitalOcean Spaces rather than AWS S3, which slightly changes the config that's required.
`tootctl` Magic
Before we get into the migration, it's worth sharing the `tootctl` incantations that I run periodically:
```shell
docker exec -it web tootctl media remove --days=7
docker exec -it web tootctl media remove --days=7 --prune-profiles
docker exec -it web tootctl media remove --remove-headers
docker exec -it web tootctl preview_cards remove --days 7

# Remove any accounts that no longer exist on the remotes
# best not to run this too regularly as it polls remote servers
docker exec -it web tootctl accounts cull

# Remove any files that exist but aren't referred to in the db
docker exec -it web tootctl media remove-orphans
```
Although object storage is (theoretically) unbounded, we still have to pay for the space that's used, so these commands are no less relevant than they were before.
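Because these commands are safe to repeat, they lend themselves to scheduling. A hedged sketch of some crontab entries (the timings are arbitrary, and note the dropped `-t` flag: cron jobs have no TTY for `docker exec -it` to attach to):

```shell
# Hypothetical crontab entries: weekly cleanup, early Sunday morning.
# -t is omitted because there's no TTY under cron.
0 4 * * 0  docker exec web tootctl media remove --days=7
20 4 * * 0 docker exec web tootctl media remove --remove-headers
40 4 * * 0 docker exec web tootctl media remove-orphans
```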
Object Storage Bucket
The first thing to do, unsurprisingly, is to create a bucket.
I used DigitalOcean Spaces: it's $5/month for up to 250GB of storage. Backblaze B2 is also quite popular with fedi admins.
Note that the name you choose needs to be globally unique - if another DO customer has used the name, you can't reuse it.
Tick the `Enable CDN` option - there's no extra cost (at time of writing) and it speeds up responses a bit.
Initial File Sync
Although you could cut straight over to using object storage, any historic images (including avatars and header images) will be broken. So, unless you're really gasping for space, it's best to run an initial sync before reconfiguring Mastodon.
To push the data up, I used `s3cmd` (although I used it via a docker container).
You'll need a few bits of information:
- The bucket URL (visible at https://cloud.digitalocean.com/spaces - also visible in the screenshot above)
- The region - you can take this from the bucket URL (mine, for example, is `ams3`)
- Access keys - in DigitalOcean, these are managed at https://cloud.digitalocean.com/account/api/spaces. After creating one, you should end up with a secret key and an access key.
You can then use that information to build a configuration file for `s3cmd` to use - remember to replace the labelled bits:
```shell
mkdir conf
cat << EOM > conf/.s3cfg
[default]
access_key = <your access key>
secret_key = <your secret key>
host_base = <region>.digitaloceanspaces.com
host_bucket = %(bucket)s.<region>.digitaloceanspaces.com
EOM
```
This can then be used to list the contents of your bucket (which'll be empty):
```shell
docker run \
  --rm \
  -v $PWD/conf:/root \
  d3fk/s3cmd \
  ls s3://<your bucket name>
```
Assuming that that doesn't return an error, you're ready to start pushing files up.
The utility first has to iterate through the files on disk, and there tend to be a lot of them, so I opted to break the work up into chunks by running slightly different commands in separate terminals.
Assuming that Mastodon's `public` directory is at `/usr/local/share/mastodon/files/public/`:
```shell
BUCKET=myfedibucket
docker run \
  --rm \
  -v /usr/local/share/mastodon/files/public/system:/s3 \
  -v /root/conf:/root \
  d3fk/s3cmd \
  sync --recursive -P --acl-public ./accounts s3://$BUCKET/
```
```shell
BUCKET=myfedibucket
docker run \
  --rm \
  -v /usr/local/share/mastodon/files/public/system:/s3 \
  -v /root/conf:/root \
  d3fk/s3cmd \
  sync --recursive -P --acl-public ./custom_emojis s3://$BUCKET/
```
```shell
BUCKET=myfedibucket
docker run \
  --rm \
  -v /usr/local/share/mastodon/files/public/system:/s3 \
  -v /root/conf:/root \
  d3fk/s3cmd \
  sync --recursive -P --acl-public ./media_attachments s3://$BUCKET/
```
The `cache` directory has a similar structure and is where all the fedi-originated content lives, so it really does take quite a while (you could consider breaking that up too if you wanted):
```shell
BUCKET=myfedibucket
docker run \
  --rm \
  -v /usr/local/share/mastodon/files/public/system:/s3 \
  -v /root/conf:/root \
  d3fk/s3cmd \
  sync --recursive -P --acl-public ./cache s3://$BUCKET/
```
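The four `sync` invocations above differ only in the source directory, so they could be driven from a single function. A sketch, assuming the same mounts and bucket as above (`sync_dir` is a name I've made up):

```shell
#!/bin/sh
# BUCKET and the mount paths match the commands above.
BUCKET=myfedibucket

# sync_dir <dir>: push one top-level directory under public/system
# up to the bucket, using the same containerised s3cmd as before.
sync_dir() {
  docker run \
    --rm \
    -v /usr/local/share/mastodon/files/public/system:/s3 \
    -v /root/conf:/root \
    d3fk/s3cmd \
    sync --recursive -P --acl-public "./$1" "s3://$BUCKET/"
}
```

Then `for dir in accounts custom_emojis media_attachments cache; do sync_dir "$dir"; done` runs them one after another - though each could equally be run in its own terminal, as above.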
Mastodon Configuration
With your uploads done (or, if you're brave, underway), it's time to tell Mastodon to use object storage.
This is simply a case of adding some environment variables.
With my docker setup, there's a file called `.env.production` that's referenced by my docker-compose config:
```yaml
web:
  image: tootsuite/mastodon:v4.2.0
  restart: always
  container_name: web
  env_file: /usr/local/share/meiko/files/mastodon/files/.env.production
```
So, I just needed to update that file to add some lines:
```shell
S3_ENABLED=true
S3_PROTOCOL=https
S3_ENDPOINT=https://myfedibucket.ams3.digitaloceanspaces.com
S3_HOSTNAME=myfedibucket.ams3.digitaloceanspaces.com
S3_BUCKET=
AWS_ACCESS_KEY_ID=<YOUR ACCESS KEY>
AWS_SECRET_ACCESS_KEY=<YOUR SECRET KEY>
S3_ALIAS_HOST=myfedibucket.ams3.cdn.digitaloceanspaces.com
```
Now, there are a couple of subtle gotchas here.
- The hostname used for DigitalOcean includes the bucket name, so you must not provide a value for `S3_BUCKET` (if you do, Mastodon will create a subdirectory using the bucket name but won't include that in URLs provided to the browser/apps, leading to broken images)
- The `S3_ALIAS_HOST` option should use the CDN URL rather than the bucket origin URL.
With that change made though, it was just a case of recreating the containers:
```shell
docker-compose up -d web
docker-compose up -d streaming
docker-compose up -d sidekiq
```
Once it had come up, Mastodon started writing new media into the DigitalOcean Space.
Browser Breakage
Although files were being uploaded to the correct location, I immediately ran into an issue.
On my laptop, I use Firefox to access Mastodon's UI, but none of the images (whether old or new) were loading.
Hitting `F12` to open Developer Tools revealed that there was an issue: the requests were failing with `NS_ERROR_INTERCEPTION_FAILED`. There wasn't much on the net to go on, other than a hint that this message has something to do with service workers.
So, I re-opened Developer Tools, hit the `Application` tab, and saw that there was, indeed, a service worker listed.
I hit the `Unregister` button to remove the worker and then refreshed the browser tab.
A new worker was stood up and all of the images started loading.
Final Sync
If (like me), you had Mastodon running whilst doing your initial sync, it's almost inevitable that some additional media will have been stored just before you cut over to using object storage.
So, before removing the local storage, it's well worth running the syncs once again - `s3cmd` will only upload anything that's new or has changed (though it's still got to iterate through everything to check).
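If you want to see what that final pass would transfer before letting it loose, `s3cmd` has a `--dry-run` flag that reports planned uploads without copying anything. A sketch reusing the container invocation from earlier (`check_sync` is my own name for it):

```shell
#!/bin/sh
# Mounts and bucket match the earlier sync commands.
BUCKET=myfedibucket

# check_sync <dir>: report what a sync of <dir> would upload,
# without actually uploading anything.
check_sync() {
  docker run \
    --rm \
    -v /usr/local/share/mastodon/files/public/system:/s3 \
    -v /root/conf:/root \
    d3fk/s3cmd \
    sync --dry-run --recursive -P --acl-public "./$1" "s3://$BUCKET/"
}
```

For example, `check_sync media_attachments` lists anything stored locally since the initial sync.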
Conclusion
Despite there being a couple of odd gotchas along the way, there were no real headaches involved in reconfiguring Mastodon to use object storage.
As a result, I've been able to move from paying $10/month for 100GB storage to $5/month for up to 250GB.
What I probably do need to look at doing, though, is creating a `telegraf` plugin to invoke `tootctl media usage` so that I can avoid bill-shock by monitoring storage usage.
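In the meantime, something along these lines could feed telegraf's `exec` input plugin. This is very much a sketch: I'm assuming `tootctl media usage` prints `label: value` lines (the format may vary between Mastodon versions), and `media_usage_metrics` is a name I've invented:

```shell
#!/bin/sh
# media_usage_metrics: turn `tootctl media usage` output into InfluxDB
# line protocol, one line per "<label>: <value>" pair reported.
media_usage_metrics() {
  docker exec web tootctl media usage \
    | awk -F': ' 'NF == 2 {
        gsub(/ /, "_", $1)                  # make the label tag-safe
        printf "mastodon_media %s=\"%s\"\n", tolower($1), $2
      }'
}
```

Pointing an `[[inputs.exec]]` stanza at a script like this would get the numbers into the same place as the rest of my metrics.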