When dealing with Kubernetes, backing up an application's data is not as straightforward as when the application is installed directly on your system. Due to its design, Kubernetes orchestrates applications to make them resilient by horizontally scaling them as needed and restarting them upon failure. This often results in applications being started or restarted on different worker nodes in an unpredictable manner.
Moreover, it is often not possible to attach a container running the backup process to an already running pod. Given these factors, it is clear that backing up applications on Kubernetes introduces additional challenges. This post, Kubernetes Backup - perform Gitea backup on S3 with MinIO, shows how to run backups of Kubernetes workloads, using the backup of Gitea to S3 as a working example.
Traditional Backup
In traditional environments, configuring backups requires addressing several key challenges:
- Permissions Management: The backup process must run with sufficient privileges to access the data. To mitigate security risks, grant it only the minimum permissions it actually needs, in line with the principle of least privilege.
- Data Consistency: This is particularly challenging when backing up data that undergoes unpredictable write operations, such as databases. During the backup read cycle, ongoing writes can result in an inconsistent snapshot, leading to a corrupted backup copy. The issue is exacerbated in clustered environments, where multiple instances on separate nodes may concurrently modify shared data. Solutions typically involve application-specific backup tools or storage-level snapshotting, if supported.
- Temporary Scratch Storage: Backups are often directed to removable or remote storage with high latency, which can significantly prolong the process. A common workaround is to mount a temporary scratch disk for staging the backup data, followed by a secondary transfer to the final destination (e.g., tapes or cloud object storage like S3 buckets).
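To make the scratch-space pattern concrete, here is how such a two-stage backup typically looks in a traditional environment - a minimal, hypothetical sketch assuming a PostgreSQL database named appdb and an already configured MinIO client alias named backup-s3 (both placeholders, unrelated to the lab that follows):

#!/bin/sh
# Hypothetical two-stage backup: dump to a fast local scratch area first,
# then transfer the staged archive to slower long-term storage (an S3 bucket).
set -e
SCRATCH=/scratch/db-backup
TIMESTAMP=$(date +%Y%m%d-%H%M%S)
DUMP="${SCRATCH}/appdb-${TIMESTAMP}.sql.gz"

mkdir -p "${SCRATCH}"
pg_dump appdb | gzip > "${DUMP}"          # stage 1: consistent dump to the scratch disk
mc cp "${DUMP}" backup-s3/backups/appdb/  # stage 2: move it to the final destination
rm -f "${DUMP}"                           # free the scratch space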
Back Up On Kubernetes
As we just said, Kubernetes' resilient orchestration introduces additional complexities for backups. More specifically:
- Ephemeral Pod Scheduling: Unless node affinity rules are in place (and those are the exception rather than the rule), there is no guarantee that a container holding the data to back up will be scheduled on the same node across runs. This makes it impractical to rely on the host filesystem for scratch space without cumbersome workarounds.
- Dynamic Persistent Storage: For non-trivial workloads, data is typically provided via Container Storage Interface (CSI)-mounted Persistent Volumes (PVs). These are managed dynamically by storage operators, which provision volumes on demand and deallocate them when no longer needed. Backing up such data requires configuring a Persistent Volume Claim (PVC) for the backup process that can attach to the application's PV.
- CSI Snapshot Limitations: While CSI snapshots can simplify backups, support varies by storage provider. Additionally, managing the snapshot lifecycle (creation, retention, and deletion) demands ongoing operational attention.
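For reference, when the storage provider does support them, CSI snapshots are requested declaratively - a minimal sketch, assuming the external-snapshotter CRDs are installed, a hypothetical VolumeSnapshotClass named csi-snapclass provided by your driver, and a hypothetical PVC named app-data-pvc to snapshot:

# Hypothetical sketch: requesting a CSI snapshot of an existing PVC
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: app-data-snapshot
  namespace: my-namespace
spec:
  volumeSnapshotClassName: csi-snapclass        # assumption: provided by your CSI driver
  source:
    persistentVolumeClaimName: app-data-pvc     # assumption: the PVC holding the data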
In summary, backing up Kubernetes workloads requires meticulous handling of storage access, application state, data movement, and security to achieve reliable, consistent backups across diverse workload types.
For these reasons, the most straightforward and reliable approach to data backups on Kubernetes is to spin up a dedicated pod for the backup process. This pod should use a PVC that mounts the same PV used by the workload being backed up, enabling direct access to its data without complex intermediaries.
Example Lab - Gitea Backup
To illustrate these concepts in practice, we can walk through a real-world scenario, such as configuring a backup process for a Gitea instance running on Kubernetes.
Gitea is a full-featured, self-hosted Source Code Management (SCM) application inspired by GitHub. It allows you to run a GitHub-like SCM platform either on-premises or in the cloud, complete with features like issue tracking, wikis, and pull requests. Gitea also supports advanced capabilities, such as Action Workflows, which provide nearly the same feature set as GitHub Actions for CI/CD automation. It can be deployed as a standalone application, via containers using Docker or Podman, or easily on Kubernetes using official Helm charts.
The recommended way to run a backup of a Gitea instance is to use the gitea command-line tool with the dump action. This generates a single zipped archive containing:
- A snapshot of the Git repositories stored on the filesystem
- A dump of the underlying database (e.g., PostgreSQL, MySQL, or SQLite)
This process ensures data consistency by leveraging Gitea's built-in mechanisms to quiesce operations temporarily if needed.
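For reference, on a standalone (non-Kubernetes) installation the same dump can be taken directly with the command-line tool - a minimal sketch, assuming the gitea binary is in the PATH, the configuration lives in /etc/gitea/app.ini and the service runs as the git user (assumptions you may need to adapt):

# Run the dump as the same user Gitea runs as, pointing it at its app.ini
su - git -c 'gitea dump -c /etc/gitea/app.ini -f /tmp/gitea_dump.zip'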
The process does not support storing data directly on long-term backup storage - such as tape or an S3 bucket. In our scenario, we address this shortcoming by using temporary scratch storage to hold the zipped archive while it is created; we then transfer it to the S3 bucket in a second step, using the MinIO S3 client. This aligns with the traditional challenges of scratch space and data movement discussed earlier.
The last problem to address, since we are in a Kubernetes scenario, is where and how to run the gitea dump process followed by the copy to S3 with the MinIO command-line tool: as we said, this platform introduces unique challenges (e.g., ephemeral scheduling and dynamic storage).
The solution is to run both processes in a dedicated Kubernetes pod which mounts the necessary volumes. This approach avoids relying on host filesystems and ensures the pod can access Gitea's Persistent Volume (PV) for the repositories and database data.
More specifically, we must:
- create a dedicated Persistent Volume (PV) used as a temporary scratch disk for storing the data before it is moved to the S3 bucket
- define a Persistent Volume Claim (PVC) suitable to be bound to the above Persistent Volume (PV), so that it can be mounted in the container running the backup process
- define the Kubernetes Job instantiating the pod that runs the backup process.

The above description assumes a Kubernetes instance with a CSI driver that automatically provisions Persistent Volumes as needed and purges them when no longer necessary. In this post, to maximize the portability of this lab, I'm instead using Persistent Volumes (PVs) mounted on the local filesystem. This lets us illustrate the key concepts through a hands-on experience that can easily be reproduced and then adapted to the CSI driver of the storage in your own environment. Of course, this approach requires us to assume that the Gitea deployment has been configured to run only on a specific worker node, so we can avoid the need for a complex CSI driver to handle shared storage.
Gitea Backup PV
Let's start setting up the lab by provisioning the Persistent Volume (PV) used as temporary storage during the backup process. As explained in the box above, in this post, to keep things simple, easy to reproduce in small environments and independent of any specific CSI implementation, we use a filesystem-based PV. The PV in this example gets mounted to a path on the host filesystem of the Kubernetes worker node matching the node selector criteria.
Again, in a real-world scenario this PV should instead be dynamically provisioned (e.g., via a StorageClass for local SSDs or cloud block storage) or a statically pre-provisioned volume.
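Just as a hint of what the dynamic alternative would look like, the backup PVC could simply reference a StorageClass backed by your CSI driver instead of a static PV - a minimal sketch, assuming a hypothetical provisioner named csi.example.com:

# Hypothetical sketch: a StorageClass backed by a CSI driver, so scratch volumes
# get provisioned on demand and deleted when the claim goes away
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: backup-scratch
provisioner: csi.example.com            # assumption: replace with your CSI driver's provisioner
reclaimPolicy: Delete                   # scratch volumes can be thrown away after use
volumeBindingMode: WaitForFirstConsumer # bind only when the backup pod is scheduled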
Create the Persistent Volume using the following YAML definition:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: gitea-backup-pv
spec:
  capacity:
    storage: 20Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Delete
  storageClassName: local-storage
  claimRef:
    namespace: gitea-t1
    name: gitea-backup-pvc
  local:
    path: /scratch/gitea-backup
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - kubew-ca-uta001.t1.carcano.corp
The above definition creates the gitea-backup-pv Persistent Volume with the following settings:
- Node Affinity Restriction: Due to the nodeAffinity rules, this Persistent Volume can only be accessed from the node kubew-ca-uta001.t1.carcano.corp - if you are using a CSI driver that automatically provisions the volume, you don't need this node affinity rule
- Storage Type and Host Path: It is defined as a Filesystem type PV using the local-storage storage class. The volume is mounted to the /scratch/gitea-backup directory on the host that matches the node affinity restriction - if you are using a CSI driver, you must tweak this part to fit your specific scenario
- Claim Reference: It includes a claimRef that restricts binding to only the gitea-backup-pvc defined in the gitea-t1 namespace, ensuring exclusivity and preventing accidental use by other PVCs - as you can infer, this is a critical mandatory restriction
- Access Mode and Capacity: It provides ReadWriteOnce (read-write by a single node) access to a storage capacity of 20 GiB. Of course, you must adjust this value to suit your specific needs.
- Reclaim Policy: The persistentVolumeReclaimPolicy is set to Delete, which would (ideally) make Kubernetes wipe the storage automatically when the bound Persistent Volume Claim (PVC) is deleted - this is not honored for a statically provisioned local PV like this one, so I am mentioning it here only as a hint
Note that you must create the /scratch/gitea-backup directory on the host's filesystem beforehand, and make sure it is writable by the system user running Gitea (e.g., via appropriate permissions like chown and chmod).
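For example, on the kubew-ca-uta001.t1.carcano.corp worker node the directory could be prepared as follows - a minimal sketch, assuming UID/GID 1000 matches the rootless Gitea user used later by the backup Job:

# On the worker node matching the nodeAffinity rule
sudo mkdir -p /scratch/gitea-backup
sudo chown 1000:1000 /scratch/gitea-backup   # assumption: UID/GID of the rootless Gitea container
sudo chmod 0750 /scratch/gitea-backup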
Gitea Backup PVC
We can then move to the creation of a Persistent Volume Claim (PVC) that binds to the gitea-backup-pv Persistent Volume (PV) we just created.
This is the YAML to use:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: gitea-backup-pvc
  namespace: gitea-t1
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi
  storageClassName: local-storage
Once submitted to Kubernetes, it creates the gitea-backup-pvc Persistent Volume Claim (PVC).
Not only must its requirements (storageClassName, accessModes, and requested storage size) match those defined in the gitea-backup-pv Persistent Volume: because of the claimRef clause in the PV definition, the PVC name (gitea-backup-pvc) and namespace (gitea-t1) must match as well. Otherwise, Kubernetes will not bind this PVC to the gitea-backup-pv Persistent Volume.
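Before moving on, it is worth checking that the claim actually bound to the Persistent Volume, for example by running:

kubectl -n gitea-t1 get pvc gitea-backup-pvc
kubectl get pv gitea-backup-pv

Both commands must report a Bound status; if the PVC stays Pending, double-check the storageClassName, the requested size and the claimRef settings.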
MinIO Kubernetes Secret
As anticipated, in our scenario we are setting up a process to move the zipped dump file from the Gitea backup to an S3-compatible bucket. To implement this, we use a MinIO container to push the zipped dump to the S3 endpoint.
This requires providing the MinIO client with the configuration settings that define an alias for the correct S3 endpoint, which we will then use to push the archive into the right bucket.
For this purpose, create the config.json file with the following contents:
{
  "version": "10",
  "aliases": {
    "minio-t1": {
      "url": "http://minio-ca-ut1a001.t1.carcano.corp:9000",
      "accessKey": "gitat-t1",
      "secretKey": "th3.P6wr0D",
      "api": "s3v4",
      "path": "auto"
    }
  }
}
This configures an alias named minio-t1 that can be referenced by the MinIO client (mc) command-line tool to connect to the http://minio-ca-ut1a001.t1.carcano.corp:9000 endpoint, using the provided access key and secret key for authentication - the alias also configures S3v4 API compatibility and auto-detection of the path style for bucket operations.
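If you prefer not to handcraft the JSON, an equivalent config.json can be generated with the mc client itself on any workstation and then copied from its configuration directory - for example, using the same lab credentials:

mc alias set minio-t1 http://minio-ca-ut1a001.t1.carcano.corp:9000 gitat-t1 'th3.P6wr0D' --api s3v4
cat ~/.mc/config.json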
Once done, we can store this configuration in the Kubernetes Secret named minio within the gitea-t1 namespace.
Assuming the config.json file is in the current directory, just run:
kubectl -n gitea-t1 create secret generic minio --from-file=config.json
This command creates the Secret, which can then be mounted into your backup pod or MinIO client container to access the configuration.
For the sake of completeness, let's ensure the Secret was created successfully by running:
kubectl -n gitea-t1 get secret minio -o yaml
This setup will enable seamless S3 uploads from the Gitea backup process running inside the containers.
The Batch Job
We can finally define the actual Kubernetes Batch Job to perform the Gitea backup and upload to S3: for this purpose we create a Job named gitea-backup-job in the gitea-t1 namespace.
To handle the two-step process (the gitea dump stage followed by the push to the S3 bucket), we configure a Kubernetes initContainer to generate the Gitea dump, and a main container to upload it to the S3-compatible bucket using the MinIO command-line client.
The Job is designed to run once (backoffLimit: 1) and never restart on failure (restartPolicy: Never).
To access Gitea's data directory (including the app.ini configuration needed for database connectivity), the Job assumes an existing, separate Persistent Volume Claim (PVC) named gitea-shared-storage that shares the original PV used by the pods running the Gitea instances in the Kubernetes deployment.
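You can quickly verify this assumption holds in your environment by running:

kubectl -n gitea-t1 get pvc gitea-shared-storage

The claim must exist and be in Bound status; mind also that its access mode must allow the backup pod to mount it alongside the Gitea pods (for example ReadWriteMany, or ReadWriteOnce with the backup pod scheduled on the same node).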
Here is the example YAML for the gitea-backup-job:
apiVersion: batch/v1
kind: Job
metadata:
  name: gitea-backup-job
  namespace: gitea-t1
spec:
  template:
    spec:
      initContainers:
        - name: gitea
          image: docker.gitea.com/gitea:1.24.2-rootless
          command: ["/bin/sh", "-c"]
          securityContext:
            runAsUser: 1000
          args:
            - |
              [ -f /tmp/gitea_dump.zip ] && rm -f /tmp/gitea_dump.zip
              [ -d /tmp/gitea ] && rm -rf /tmp/gitea
              mkdir /tmp/gitea
              /usr/local/bin/gitea manager flush-queues -c /data/gitea/conf/app.ini
              /usr/local/bin/gitea dump -c /data/gitea/conf/app.ini -f /tmp/gitea_dump.zip
          volumeMounts:
            - name: scratch
              mountPath: /tmp
            - name: data
              mountPath: /data
      containers:
        - name: minio-client
          image: minio/mc:latest
          command: ["/bin/sh", "-c"]
          args:
            - |
              mc cp /scratch/gitea_dump.zip minio-t1/carcano/gitea-testing/
          volumeMounts:
            - name: scratch
              mountPath: /scratch
            - name: minio
              mountPath: /root/.mc/config.json
              readOnly: true
              subPath: config.json
      restartPolicy: Never
      volumes:
        - name: scratch
          persistentVolumeClaim:
            claimName: gitea-backup-pvc
        - name: data
          persistentVolumeClaim:
            claimName: gitea-shared-storage
        - name: minio
          secret:
            secretName: minio
            defaultMode: 0644
            optional: false
            items:
              - key: config.json
                path: config.json
  backoffLimit: 1
In detail, it defines the following containers:
InitContainer (gitea backup process)
It uses the official Gitea rootless image and runs as a non-root user (UID 1000) to improve security. It mounts the scratch volume at /tmp and the data volume at /data - a mount point that gives access both to Gitea's configuration (the app.ini file) and to the repositories directory tree. The app.ini file is then used by the process to connect to the database to back up.
It is mandatory that the release in the container tag matches the release of the Gitea instance you are backing up, otherwise you are very likely to run into compatibility issues, such as a mismatch with the data structures in the database being backed up.
Once started, the process generates the backup data into the scratch volume mounted on the /tmp directory.
Upon successful execution, the initContainer generates the gitea_dump.zip zipped dump file, which includes repositories, database, and configuration files.
It is worth mentioning that, before starting, it removes the dump file if it already exists and recreates the /tmp/gitea temporary directory (deleting the one left over from previous executions), which holds the temporary data that ends up in the zipped dump file. It then flushes Gitea's queues to reach a mostly consistent state, and runs the gitea dump command to perform the actual dump.
Main Container (MinIO client process):
It uses the MinIO client image, mounting the scratch volume at /scratch and, read-only, the minio Secret at /root/.mc/config.json (via subPath), so as to expose the config.json file. This allows mc to use the configured alias without manual setup.
The process then runs the MinIO client (mc) to upload the /scratch/gitea_dump.zip file generated in the scratch volume by the initContainer through the minio-t1 alias: the file is pushed to the http://minio-ca-ut1a001.t1.carcano.corp:9000 endpoint and stored under the gitea-testing path of the carcano bucket.
In this lab, the container runs as root by default: if you decide to take inspiration from this example and bring it to production, remember to consider security hardening as needed. Also keep in mind that, while this setup provides a reliable, one-off backup process, before bringing it to production you should at least add error handling (e.g., a check that the dump succeeded before the upload - a sketch follows below), scheduling via a CronJob, and notifications on completion/failure.
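As a hint for the error-handling part, the upload step of the main container could be hardened to fail fast when the dump is missing or empty - a minimal sketch of an alternative args block (same shell, same paths as in the Job above):

# Alternative args for the minio-client container: abort with a clear error
# if the init container did not produce a usable dump file
set -e
if [ ! -s /scratch/gitea_dump.zip ]; then
  echo "gitea_dump.zip is missing or empty - aborting upload" >&2
  exit 1
fi
mc cp /scratch/gitea_dump.zip minio-t1/carcano/gitea-testing/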
Monitoring The Outcome Of the Batch Job
We can monitor the outcome of the gitea-backup-job job by checking the logs of the spawned pods related to it.
First, let's get the pod list by running:
kubectl get pods -n gitea-t1 -l job-name=gitea-backup-job
on my system, the output is:
NAME                     READY   STATUS      RESTARTS   AGE
gitea-backup-job-vxtd9   0/1     Completed   0          53m
We can then inspect the outcome by checking the gitea-backup-job-vxtd9 pod's log as follows:
kubectl -n gitea-t1 logs gitea-backup-job-vxtd9
on my system, the output is:
Defaulted container "minio-client" out of: minio-client, gitea (init)
`/scratch/gitea_dump.zip` -> `carcano/gitea-testing/gitea_dump.zip`
┌────────────┬─────────────┬──────────┬─────────────┐
│ Total      │ Transferred │ Duration │ Speed       │
│ 283.49 KiB │ 283.49 KiB  │ 00m00s   │ 14.11 MiB/s │
└────────────┴─────────────┴──────────┴─────────────┘
This confirms that the gitea_dump.zip file containing all the backup data was successfully transferred to the S3 bucket.
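If anything looks suspicious, you can also inspect the log of the dump stage and double-check the uploaded object directly on the bucket - for example:

# log of the init container (named "gitea") that runs the gitea dump
kubectl -n gitea-t1 logs gitea-backup-job-vxtd9 -c gitea
# list the uploaded object from any host where the minio-t1 alias is configured
mc ls minio-t1/carcano/gitea-testing/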
Footnotes
We have reached the end of this post. As we saw, backing up an application's data when it's running on Kubernetes is not as straightforward as when the application is installed directly on your system - it introduces additional challenges due to the platform's dynamic nature.
This tutorial demonstrated how to back up a Gitea instance on Kubernetes by addressing core challenges like ephemeral pod scheduling and data consistency. You explored provisioning scratch storage via Persistent Volumes, securing credentials with Secrets, and orchestrating backups using a Batch Job with initContainers for dumping data and main containers for S3 uploads via MinIO - ensuring reliable, automated transfers without host dependencies.
In the fast-paced DevOps and DevSecOps world, where applications scale dynamically and downtime is costly, mastering Kubernetes-native backups ensures data resilience and compliance. These techniques promote automation, security (e.g., least-privilege access), and portability across environments, reducing risks from failures or migrations. By leveraging tools like Jobs and PVCs, teams can achieve consistent, efficient workflows that align with CI/CD pipelines, ultimately supporting reliable software delivery and disaster recovery.
I hope this post sparked some inspiration and equipped you with practical skills for your DevSecOps toolkit. If it did, why not fuel the fire? Drop a tip in the small cup below - blogs like this thrive on community support, and every contribution keeps the knowledge flowing!
