archive.sh

A script that will monitor archive directories on a btrfs file system for changes, and call a user-defined bash function with a list of changed files and directories.

Requires:

  • A list of archive directories, one full path per line, that will be recursively monitored for changes: archives
  • A bash helper script containing configuration parameters and helper functions: archive-config.sh

Keep these in a directory, together with archive.sh:

.
├── archives
├── archive-config.sh
└── archive.sh
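
Before the first run, it can help to sanity-check the archives list. The following is a minimal sketch, not part of the original scripts: checkArchivesList is a hypothetical helper that reports every line that is not a full path, or that does not point to an existing directory.

```shell
#!/bin/bash

# Hypothetical helper, not part of archive.sh: checks that every non-empty
# line of the given list file is a full path to an existing directory.
# Prints each offending line and returns 1 if any line fails the check.
function checkArchivesList {
    local list_file="$1" dir status=0

    # The "|| [[ -n ... ]]" part also handles a last line without newline
    while IFS= read -r dir || [[ -n "$dir" ]]; do
        [[ -z "$dir" ]] && continue
        if [[ "$dir" != /* ]]; then
            echo "Not a full path: $dir"
            status=1
        elif [[ ! -d "$dir" ]]; then
            echo "Not a directory: $dir"
            status=1
        fi
    done < "$list_file"

    return $status
}
```

One possible use is to run archive.sh only if the check passes: checkArchivesList ./archives && ./archive.sh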

Also required:

  • btrfs (btrfs-progs)
  • btrbk. We suggest you always install the most recent version, as described in its README
  • A working btrbk setup that periodically takes snapshots containing the archive directories specified in archives

What the script will do:

  1. Using the btrbk configuration given, collect information about existing btrfs snapshots containing archive directories
  2. Based on differences between the latest snapshot and the current state, determine lists of archive files and directories that have changed since the last snapshot was taken
  3. Call a user-defined helper function handleChanges with those lists, to allow for automated archive creation / processing; alternatively, call handleUnchanged if no changes were detected in any of the archive directories.
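
When more than one btrbk source contains an archive directory, the script listed further down assigns the directory to the most specific source, i.e. the one with the longest matching path. A minimal sketch of that matching follows; the function name mostSpecificSource and all paths are hypothetical and only serve as illustration.

```shell
#!/bin/bash

# Hypothetical helper: prints the longest source url that is a path prefix
# of the given archive directory, or an empty line if no source contains it.
function mostSpecificSource {
    local archive_dir="$1" best="" source_url
    shift

    # Compare with trailing slashes, so /mnt/data does not match /mnt/database
    [[ "$archive_dir" != */ ]] && archive_dir="$archive_dir/"
    for source_url in "$@"; do
        [[ "$source_url" != */ ]] && source_url="$source_url/"
        if [[ "$archive_dir" == "$source_url"* && ${#source_url} -gt ${#best} ]]; then
            best="$source_url"
        fi
    done

    echo "$best"
}

# Example paths are assumptions for illustration only
mostSpecificSource "/mnt/data/home/alice/archive" "/mnt/data" "/mnt/data/home" "/mnt/backup"
# prints /mnt/data/home/
```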

How to call it:

archive.sh

archive-config.sh

Set BTRBK_SNAPSHOTTING_CONF to the path of your btrbk configuration file, and define three helper functions:

  1. ignoreFile() is called for every changed file. Return 0 to have the file treated as unchanged anyway. This is useful for files created during the archival process, so they don’t trigger the archival process again in an endless loop; e.g.: .par2 parity files, .tsq and .tsr RFC 3161 timestamping files, …
  2. handleChanges() is called with two arguments: the path of a temporary file listing all changed files, and the path of a temporary file listing all changed directories. Both files are deleted as soon as the function returns, so if processing the changes might take a long time, copy the lists and schedule a separate task here: Task Spooler is an unobtrusive, simple solution available as a package in many Linux distributions.
  3. handleUnchanged() is called when no changes within archive directories have been detected since the last snapshot was taken.
#!/bin/bash
BTRBK_SNAPSHOTTING_CONF="/etc/btrbk-snapshot.conf"

# Check whether a file should be treated as unchanged, even though it was modified
# Returns 0 if file is to be ignored
function ignoreFile() {
    file_path="$1"
    extension="${file_path##*.}"

    declare -a IGNORED_EXTENSIONS=("par2" "tsq" "tsr")

    # Quote the expansions, so unusual extensions cannot break the comparison
    for ext in "${IGNORED_EXTENSIONS[@]}"; do
        if [ "${extension,,}" = "${ext,,}" ]; then
            return 0
        fi
    done

    return 1
}

# The temporary files containing lists of changed files and changed directories
# are deleted immediately after this function returns.
function handleChanges {
    changed_files_list="$1"
    changed_directories_list="$2"

    changed_dirs_count=$(wc -l < "$changed_directories_list")
    echo "Archive directories changed: $changed_dirs_count."
}

function handleUnchanged {
    echo "No archive changes detected."
}
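
Because the temporary list files are deleted as soon as handleChanges returns, a long-running job scheduled from there needs its own copies. The following variant is a sketch under assumptions, not part of the original configuration: it queues the job with Task Spooler, whose command is installed as tsp on Debian-based systems and as ts on others. TASK_SPOOLER_CMD, ARCHIVE_JOB and JOB_LIST_DIR are hypothetical names, and /usr/local/bin/process-archives.sh stands for whatever job you want to run.

```shell
#!/bin/bash

# All three names are assumptions for illustration; adjust to your setup.
TASK_SPOOLER_CMD="${TASK_SPOOLER_CMD:-tsp}"
ARCHIVE_JOB="${ARCHIVE_JOB:-/usr/local/bin/process-archives.sh}"
JOB_LIST_DIR="${JOB_LIST_DIR:-/var/tmp}"

function handleChanges {
    local changed_files_list="$1"
    local changed_directories_list="$2"
    local job_dir

    # Copy both lists, because the originals vanish when this function returns
    job_dir=$(mktemp -d "$JOB_LIST_DIR/archive-job.XXXXXXXXXXXX") || return 1
    cp "$changed_files_list" "$job_dir/changedfiles" || return 1
    cp "$changed_directories_list" "$job_dir/changeddirs" || return 1

    # Queue the job; it is expected to remove $job_dir once it is done
    "$TASK_SPOOLER_CMD" "$ARCHIVE_JOB" "$job_dir/changedfiles" "$job_dir/changeddirs"
}
```

The queued job receives the two copied list files as its arguments, so archive.sh can return right away while the processing runs in the background.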

archive.sh

#!/bin/bash

# Look for the configuration files in the same directory
SCRIPT_DIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" &> /dev/null && pwd)
ARCHIVE_DIRS_LIST="$SCRIPT_DIR/archives"
source "$SCRIPT_DIR/archive-config.sh"

TMP_DIR=$(mktemp -d /tmp/tmpdir.XXXXXXXXXXXX)

BTRBK_SOURCES="$TMP_DIR/sources"
BTRBK_LATEST_SNAPSHOTS="$TMP_DIR/latest"
DIFF_LIST="$TMP_DIR/diff"

CHANGED_FILES_LIST="$TMP_DIR/changedfiles"
CHANGED_DIRS_LIST="$TMP_DIR/changeddirs"

function fillKeyValueArrayFromLine {
  declare -n the_array="$1"
  line="$2"

  for pair in $line; do
    set -- $(echo "$pair" | tr '=' ' ')
    VALUE="${2%\'}"
    VALUE="${VALUE#\'}"
    the_array["$1"]="$VALUE"
  done
}

# Collect all btrbk sources, create maps from btrbk source urls
# to source subvolumes, snapshot paths and snapshot names
declare -A SOURCE_URL2SOURCE_SUBVOLUME
declare -A SOURCE_URL2SNAPSHOT_PATH
declare -A SOURCE_URL2SNAPSHOT_NAME

btrbk list source --format=raw -c "$BTRBK_SNAPSHOTTING_CONF" > "$BTRBK_SOURCES"
while read current_source; do
    declare -A PAIR
    fillKeyValueArrayFromLine PAIR "$current_source"

    key="${PAIR[source_url]}"
    [[ "$key" != */ ]] && key="$key/"
    SOURCE_URL2SOURCE_SUBVOLUME["$key"]="${PAIR[source_subvolume]}"
    SOURCE_URL2SNAPSHOT_PATH["$key"]="${PAIR[snapshot_path]}"
    SOURCE_URL2SNAPSHOT_NAME["$key"]="${PAIR[snapshot_name]}"

done < "$BTRBK_SOURCES"


# Map archive directories to btrbk source urls
declare -A SOURCE_URL_SET
declare -A ARCHIVE_DIR2SOURCE_URL

while read current_archive_dir; do
    for current_source_url in "${!SOURCE_URL2SOURCE_SUBVOLUME[@]}"; do
        [[ "$current_archive_dir" != */ ]] && current_archive_dir="$current_archive_dir/"
        [[ "$current_source_url" != */ ]] && current_source_url="$current_source_url/"
        if [[ "$current_archive_dir" == "$current_source_url"* ]]; then
            if [[ ${#current_source_url} -gt ${#ARCHIVE_DIR2SOURCE_URL["$current_archive_dir"]} ]]; then
                SOURCE_URL_SET["$current_source_url"]=1
                ARCHIVE_DIR2SOURCE_URL["$current_archive_dir"]="$current_source_url"
            fi
        fi
    done
done < "$ARCHIVE_DIRS_LIST"


# Determine archive files and directories that have changed since last snapshot
declare -A SOURCE_SUBVOLUME2SNAPSHOT_SUBVOLUME

[[ -f "$CHANGED_DIRS_LIST" ]] && rm "$CHANGED_DIRS_LIST"
touch "$CHANGED_DIRS_LIST"
[[ -f "$CHANGED_FILES_LIST" ]] && rm "$CHANGED_FILES_LIST"
touch "$CHANGED_FILES_LIST"

btrbk list latest --format=raw -c "$BTRBK_SNAPSHOTTING_CONF"  > "$BTRBK_LATEST_SNAPSHOTS"
while read current_latest; do
    declare -A PAIR
    fillKeyValueArrayFromLine PAIR "$current_latest"

    source_subvolume=${PAIR[source_subvolume]}

    for current_source_url in "${!SOURCE_URL_SET[@]}"; do

        current_subvolume=${SOURCE_URL2SOURCE_SUBVOLUME["$current_source_url"]}

        if [[ "$current_subvolume" == "$source_subvolume" ]]; then
            snapshot_subvolume=${PAIR[snapshot_subvolume]}

            btrbk diff --format=col:h:FILE "$snapshot_subvolume" "$current_subvolume" | sed -e "/Total size/d" -e '/^[[:space:]]*$/d' > "$DIFF_LIST"
            # -r keeps backslashes in file names intact
            while read -r changed_file; do
                file_path="${current_source_url}${changed_file}"
                if ! ignoreFile "$file_path"; then
                    if [ -f "$file_path" ]; then
                        # Quote the path, so spaces and wildcards in names survive
                        printf '%s\n' "$file_path" >> "$CHANGED_FILES_LIST"
                        dirname "$file_path" >> "$CHANGED_DIRS_LIST"
                    fi
                fi
            done < "$DIFF_LIST"
            break
        fi
    done
done < "$BTRBK_LATEST_SNAPSHOTS"

if [ -s "$CHANGED_FILES_LIST" ]; then
    sort -o "$CHANGED_FILES_LIST" "$CHANGED_FILES_LIST"
    sort -u -o "$CHANGED_DIRS_LIST" "$CHANGED_DIRS_LIST"

    handleChanges "$CHANGED_FILES_LIST" "$CHANGED_DIRS_LIST"
else
    handleUnchanged
fi

rm -rf "$TMP_DIR"
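
To see what fillKeyValueArrayFromLine does with one line of btrbk's raw output, the function can be tried in isolation. This is only a sketch: the input line below is hypothetical and merely shaped like key='value' pairs; real btrbk output contains more fields and may differ between versions.

```shell
#!/bin/bash

# Copy of the parsing function from archive.sh (needs bash 4.3+ for declare -n)
function fillKeyValueArrayFromLine {
  declare -n the_array="$1"
  line="$2"

  for pair in $line; do
    # Split key=value at the equals sign, then strip the single quotes
    set -- $(echo "$pair" | tr '=' ' ')
    VALUE="${2%\'}"
    VALUE="${VALUE#\'}"
    the_array["$1"]="$VALUE"
  done
}

# Hypothetical input line, for illustration only
sample_line="source_url='/mnt/data/home' snapshot_name='home'"

declare -A PAIR
fillKeyValueArrayFromLine PAIR "$sample_line"

echo "${PAIR[source_url]}"     # prints /mnt/data/home
echo "${PAIR[snapshot_name]}"  # prints home
```

Note that the function splits on whitespace and on the equals sign, so values containing either of them are not handled.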

Image Credits:
Papirus icon for Terminal (modified) | GNU General Public License, version 3

Licensing:
This content is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
For your attributions to us please use the word »tuxwise«, and the link https://tuxwise.net.