Reconfiguring Mastodon to use Object Storage

If you ask anybody that runs a small Mastodon server, you're quite likely to hear that media storage can be a bit of a pain.

Although there are various tootctl incantations that are used to help free up space (I've included mine further down), as the fediverse grows so do a server's storage demands.

When I originally set up my Mastodon server, I used a locally attached volume for media storage.

Over time, though, the instance's needs have started to outgrow the storage that I've made available, and the volume has hit 100% a couple of times (although I have alerts, it's not something I'm going to rush back home to fix if I happen to be out).

Recently, I decided to move to using object storage instead: that way, the server could (within reason) consume whatever space it needed without costing a fortune.

The Mastodon documentation does a fantastic job of describing how to enable object storage, but information on how to move an existing server is much more limited.

There is, however, an excellent post by Thomas Leister on moving to Amazon S3.

I ran into some issues along the way though, so thought it was still worth putting some information up on my experience. I also use DigitalOcean Spaces rather than AWS S3, which slightly changes the config that's required.


tootctl Magic

Before we get into the migration, it's worth sharing the tootctl incantations that I run periodically:

 # Remove locally cached remote media older than 7 days
 docker exec -it web tootctl media remove --days=7
 # Also prune cached avatars and headers belonging to remote profiles
 docker exec -it web tootctl media remove --days=7 --prune-profiles
 # Remove cached remote profile header images
 docker exec -it web tootctl media remove --remove-headers
 # Remove link preview cards older than 7 days
 docker exec -it web tootctl preview_cards remove --days 7

 # Remove any accounts that no longer exist on the remotes
 # best not to run this too regularly as it polls remote servers
 docker exec -it web tootctl accounts cull

 # Remove any files that exist but aren't referred to in the db
 docker exec -it web tootctl media remove-orphans

Although object storage is (theoretically) unbounded, we still have to pay for the space that's used, so these commands are no less relevant than they were before.
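For reference, a minimal sketch of how these could be wrapped up for cron (the script name and schedule are arbitrary; the -it flags are dropped because cron doesn't provide a TTY):

#!/bin/sh
# media-cleanup.sh - the same tootctl housekeeping, but cron friendly
# (no -it flags because there's no TTY when run from cron)
docker exec web tootctl media remove --days=7
docker exec web tootctl media remove --days=7 --prune-profiles
docker exec web tootctl media remove --remove-headers
docker exec web tootctl preview_cards remove --days 7
docker exec web tootctl media remove-orphans

I've left accounts cull out of that so it can still be run by hand every now and then.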


Object Storage Bucket

The first thing to do, unsurprisingly, is to create a bucket.

I used DigitalOcean Spaces, which costs $5/month for up to 250GB of storage. Backblaze B2 is also quite popular with fedi admins.

Screenshot of the DigitalOcean bucket creation screen; I've provided the name myfedibucket for the bucket

Note that the name you choose needs to be globally unique - if another DO customer has used the name, you can't reuse it.

Tick the Enable CDN option - there's no extra cost (at time of writing) and it speeds up responses a bit.
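If you'd rather work from a terminal, the bucket itself can also be created with s3cmd once it's configured as described in the Initial File Sync section below - a sketch, assuming the bucket name myfedibucket (the CDN option still needs to be enabled from the control panel):

docker run \
--rm \
-v $PWD/conf:/root \
d3fk/s3cmd \
mb s3://myfedibucket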


Initial File Sync

Although you could cut straight over to using object storage, any historic images (including avatars and header images) will be broken. So, unless you're really gasping for space, it's best to run an initial sync before reconfiguring Mastodon.

To push the data up, I used s3cmd (run via a Docker container).

You'll need a few bits of information:

  • An access key and secret key for Spaces (generated in the DigitalOcean control panel)
  • The region that the bucket was created in (for example, ams3)
  • The bucket's name

You can then use that information to build a configuration file for s3cmd to use - remember to replace the labelled bits:

mkdir conf
cat << EOM > conf/.s3cfg
[default]
access_key = <your access key>
secret_key = <your secret key>
host_base = <region>.digitaloceanspaces.com
host_bucket = %(bucket)s.<region>.digitaloceanspaces.com
EOM

This can then be used to list the contents of your bucket (which'll be empty):

docker run \
--rm \
-v $PWD/conf:/root \
d3fk/s3cmd \
ls s3://<your bucket name>

Assuming that that doesn't return an error, you're ready to start pushing files up.

The utility first has to iterate through the files on disk and there tends to be a lot of them, so I opted to break the work up into chunks by running slightly different commands in separate terminals.

Assuming that Mastodon's public directory is at /usr/local/share/mastodon/files/public/:

BUCKET=myfedibucket

docker run \
--rm \
-v /usr/local/share/mastodon/files/public/system:/s3 \
-v /root/conf:/root \
d3fk/s3cmd \
sync --recursive -P --acl-public ./accounts s3://$BUCKET/

BUCKET=myfedibucket

docker run \
--rm \
-v /usr/local/share/mastodon/files/public/system:/s3 \
-v /root/conf:/root \
d3fk/s3cmd \
sync --recursive -P --acl-public ./custom_emojis s3://$BUCKET/

BUCKET=myfedibucket

docker run \
--rm \
-v /usr/local/share/mastodon/files/public/system:/s3 \
-v /root/conf:/root \
d3fk/s3cmd \
sync --recursive -P --acl-public ./media_attachments s3://$BUCKET/

The cache directory has a similar structure and is where all of the fedi-originated content lives, so it really does take quite a while (you could consider breaking that up too if you wanted):

BUCKET=myfedibucket

docker run \
--rm \
-v /usr/local/share/mastodon/files/public/system:/s3 \
-v /root/conf:/root \
d3fk/s3cmd \
sync --recursive -P --acl-public ./cache s3://$BUCKET/
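If you'd rather not juggle terminals, the same chunks can also be kicked off in parallel from a single shell - a rough sketch using the same paths and bucket as above (adding --dry-run to the s3cmd arguments will show what would be transferred without actually uploading anything):

BUCKET=myfedibucket
for dir in accounts custom_emojis media_attachments cache; do
    docker run \
    --rm \
    -v /usr/local/share/mastodon/files/public/system:/s3 \
    -v /root/conf:/root \
    d3fk/s3cmd \
    sync --recursive -P --acl-public ./$dir s3://$BUCKET/ &
done
wait

The same pattern could also be pointed at cache's subdirectories if you wanted to break that step up further.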


Mastodon Configuration

With your uploads done (or, if you're brave, underway), it's time to tell Mastodon to use object storage.

This is simply a case of adding some environment variables.

With my docker setup, there's a file called .env.production that's referenced by my docker-compose:

  web:
    image: tootsuite/mastodon:v4.2.0
    restart: always
    container_name: web
    env_file: /usr/local/share/meiko/files/mastodon/files/.env.production

So, I just needed to update that file to add some lines:

S3_ENABLED=true
S3_PROTOCOL=https
S3_ENDPOINT=https://myfedibucket.ams3.digitaloceanspaces.com
S3_HOSTNAME=myfedibucket.ams3.digitaloceanspaces.com
S3_BUCKET=
AWS_ACCESS_KEY_ID=<YOUR ACCESS KEY>
AWS_SECRET_ACCESS_KEY=<YOUR SECRET KEY>
S3_ALIAS_HOST=myfedibucket.ams3.cdn.digitaloceanspaces.com

Now, there are a couple of subtle gotchas here.

  • The hostname used for DigitalOcean includes the bucket name, so you must not provide a value for S3_BUCKET (if you do, Mastodon will create a subdirectory named after the bucket but won't include it in the URLs given to the browser/apps, leading to broken images)
  • The S3_ALIAS_HOST option should use the CDN URL rather than the bucket's origin URL.

With that change made, though, it was just a case of recreating the containers:

docker-compose up -d web
docker-compose up -d streaming
docker-compose up -d sidekiq

Once it had come up, Mastodon started writing new media into the DigitalOcean Space.
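One way to confirm that new media really is landing in the Space is to reuse the s3cmd config from earlier and watch the object count tick up - a quick sketch (bucket name as above):

docker run \
--rm \
-v /root/conf:/root \
d3fk/s3cmd \
ls --recursive s3://myfedibucket/media_attachments/ | wc -l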


Browser Breakage

Although files were being uploaded to the correct location, I immediately ran into an issue.

On my laptop, I use Firefox to access Mastodon's UI, but none of the images (whether old or new) were loading.

Hitting F12 to open Developer Tools revealed the cause: the requests were failing with NS_ERROR_INTERCEPTION_FAILED. There wasn't much on the net to go on, other than a hint that this message has something to do with service workers.

So, I re-opened Developer Tools, hit Application, and saw that there was, indeed, a service worker listed.

Screenshot of Firefox showing a service worker for mastodon. Admittedly, I took this screenshot after resolving the problem.

I hit the Unregister button to remove the worker and then refreshed the browser tab.

A new worker was stood up and all of the images started loading.


Final Sync

If (like me) you had Mastodon running whilst doing your initial sync, it's almost inevitable that some additional media will have been stored locally before you cut over to using object storage.

So, before removing the local storage, it's well worth running the syncs once again - s3cmd will only upload anything that's new or has changed (though it's still got to iterate through everything to check).
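Once that final pass completes, s3cmd's du command gives a rough way of sanity-checking what's in the Space against what's on disk before deleting anything - a sketch, using the same paths and bucket as before:

du -sh /usr/local/share/mastodon/files/public/system
docker run \
--rm \
-v /root/conf:/root \
d3fk/s3cmd \
du -H s3://myfedibucket/

The two numbers won't match exactly (new media will keep arriving on the Space side), but it's a useful gut check.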


Conclusion

Despite there being a couple of odd gotchas along the way, there were no real headaches involved in reconfiguring Mastodon to use object storage.

As a result, I've been able to move from paying $10/month for 100GB storage to $5/month for up to 250GB.

What I probably do need to look at doing, though, is creating a Telegraf plugin to invoke tootctl media usage, so that I can monitor storage usage and avoid bill-shock.
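In the meantime, a rough sketch of the sort of thing a Telegraf exec input could call - the measurement name is made up, and the parsing of s3cmd's du output may need adjusting for your version:

#!/bin/sh
# Report the Space's usage in InfluxDB line protocol
# (assumes the s3cmd config from earlier lives at /root/conf)
BYTES=$(docker run --rm -v /root/conf:/root d3fk/s3cmd du s3://myfedibucket/ | awk 'NR==1 {print $1}')
echo "mastodon_media_storage,bucket=myfedibucket bytes=${BYTES}i"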