rsync files on a removable device

It’s not uncommon to have to synchronize files to a removable USB device, and automating the task usually raises two problems:

  • finding where the USB device has actually been mounted
  • transferring files efficiently, given that USB storage usually has no notion of file permissions or owners

You can check out this post if you’re looking for tips on partitioning and formatting a removable device.

Finding the USB device

Most Linux distributions, if not all, ship an automounter that names the mount directory under /media after the partition label. To recognize a partition reliably without depending on its label, it is better to use the partition’s UUID.

Find the device’s UUID

Once you’ve plugged in your USB device, you should find its UUID in /dev/disk/by-uuid/:

$ find /dev/disk/by-uuid/ -type l -printf '%f -> %l\n'
F865-6420 -> ../../sdc1
c82f2904-b4d7-49d4-9a4d-57fc40d98455 -> ../../dm-2
7a414b3f-08a7-4d15-89f8-cd98d3a7e4f3 -> ../../dm-1
5ed15afa-c236-48e8-bf6b-dbc93d1cbca6 -> ../../sda1
22206685-4471-4dd0-8e40-cbd0eca4b62e -> ../../sda5

To make things easier, you can just set a variable with your UUID value:

$ UUID="F865-6420"

Find the directory where the device was mounted

If the UUID is found, then print the corresponding MOUNTPOINT:

$ test ! -h /dev/disk/by-uuid/$UUID \
   || find /dev/disk/by-uuid/$UUID -exec lsblk -nro MOUNTPOINT {} \;
/media/usb
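The lookup above can be wrapped in a small function so that callers can tell "not plugged in" apart from "plugged in but not mounted". This is only a sketch: the function name usb_mountpoint and its return codes are made up for illustration, and it assumes lsblk from util-linux is available.

```shell
# usb_mountpoint UUID
# Prints the mount point of the partition with the given UUID.
# Returns 1 if the device is not plugged in (or lsblk fails),
# and 2 if it is plugged in but not mounted.
# (Hypothetical helper; name and return codes are illustrative.)
usb_mountpoint() {
    dev="/dev/disk/by-uuid/$1"
    [ -h "$dev" ] || return 1
    # -r prints raw output, so unusual characters in the path are escaped
    mnt=$(lsblk -nro MOUNTPOINT "$dev") || return 1
    [ -n "$mnt" ] || return 2
    printf '%s\n' "$mnt"
}

# Possible usage:
#   MNT=$(usb_mountpoint "F865-6420") && echo "mounted on $MNT"
```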

Synchronizing files

Using rsync

Copying every file each time is not efficient; rsync should be used instead, since it only transfers what has changed. First, create an example directory to reuse later:

$ cp -r /usr/share/doc/rsync/ /tmp/mydata

Usually the removable device is not aware of file permissions or owners (because it is formatted in FAT32 or NTFS), so the only metadata that matters is the file timestamps, which let rsync recognize outdated files:

$ rsync -rhvt -- /tmp/mydata /media/usb
sending incremental file list
mydata/
mydata/README.gz
mydata/TODO.gz
mydata/changelog.Debian.gz
mydata/copyright
mydata/tech_report.tex.gz
mydata/examples/
mydata/examples/logrotate.conf.rsync
mydata/examples/rsyncd.conf
mydata/scripts/
mydata/scripts/atomic-rsync.gz
mydata/scripts/cull_options.gz
mydata/scripts/cvs2includes.gz
mydata/scripts/file-attr-restore.gz
mydata/scripts/files-to-excludes.gz
mydata/scripts/git-set-file-times.gz
mydata/scripts/logfilter.gz
mydata/scripts/lsh.gz
mydata/scripts/mnt-excl.gz
mydata/scripts/munge-symlinks.gz
mydata/scripts/rrsync.gz
mydata/scripts/rsyncstats.gz

sent 33.72K bytes  received 385 bytes  68.22K bytes/sec
total size is 32.41K  speedup is 0.95
$ ls /media/usb/mydata/
changelog.Debian.gz  copyright  examples  README.gz  scripts  tech_report.tex.gz  TODO.gz
$ rsync -rhvt -- /tmp/mydata /media/usb
sending incremental file list

sent 489 bytes  received 15 bytes  1.01K bytes/sec
total size is 32.41K  speedup is 64.30

Nothing was copied during the second run, so it works :) Now we can check that deletion also works as expected:

$ rm -vf /tmp/mydata/examples/rsyncd.conf
removed `/tmp/mydata/examples/rsyncd.conf'
$ rsync -rhvt --delete --delete-before -- /tmp/mydata /media/usb
building file list ... done
deleting mydata/examples/rsyncd.conf
mydata/examples/

sent 465 bytes  received 15 bytes  960.00 bytes/sec
total size is 31.36K  speedup is 65.34

rsync deletes from the removable device the files that no longer exist in the source directory mydata/.
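Since --delete removes files on the device, it can be worth previewing a run first. The sketch below adds two standard rsync options that the commands above do not use: --dry-run, which only reports what would happen, and --modify-window=1, which the rsync manual suggests for FAT filesystems because they store timestamps with a 2-second resolution. The function name preview_sync is made up for illustration.

```shell
# preview_sync SRC DEST
# Shows what a sync would copy or delete without touching DEST.
# (Hypothetical helper; the rsync options themselves are standard.)
preview_sync() {
    rsync -rhvt --dry-run --modify-window=1 \
        --delete --delete-before -- "$1" "$2"
}

# Possible usage:
#   preview_sync /tmp/mydata /media/usb
```

Dropping --dry-run from the command turns the preview into the real synchronization.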

Synchronization script

The script below automatically finds the USB device and synchronizes the directory if the device is mounted, and does nothing otherwise:

#!/bin/sh
#
# Synchronizes a directory on a removable device only if it is mounted
#

UUID='F865-6420'
SRCDIR='/tmp/mydata'

if [ -h "/dev/disk/by-uuid/$UUID" ]; then
   # Look up the mount point and run rsync against it
   find "/dev/disk/by-uuid/$UUID" -exec lsblk -nro MOUNTPOINT {} \; \
      | xargs -r -I{} -- rsync -rhvt --log-file="{}/rsync.log" \
         --delete --delete-before -- "$SRCDIR" {}/
fi

Each run is logged to rsync.log at the root of the device:

$ cat /media/usb/rsync.log
2013/02/12 21:09:44 [4284] building file list
2013/02/12 21:09:44 [4284] done
2013/02/12 21:09:44 [4284] sent 462 bytes  received 12 bytes  948.00 bytes/sec
2013/02/12 21:09:44 [4284] total size is 31.36K  speedup is 66.17

Crontab Integration

If you want your removable device to be synchronized automatically without bothering with USB events, just add this line to your crontab:

*/10 * * * *      /full/path/to/synchronize-directory.sh

It should update the USB device every 10 minutes, without being told to :)
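If a synchronization ever takes longer than 10 minutes (a slow USB stick, a large directory), cron will start a second run on top of the first. One possible guard, assuming flock from util-linux is installed, is to take a non-blocking lock around the script. The function name run_exclusive and the lock file path are made up for illustration.

```shell
# run_exclusive LOCKFILE COMMAND [ARGS...]
# Runs COMMAND only if no other process holds a lock on LOCKFILE;
# otherwise fails immediately instead of waiting.
# (Hypothetical helper built on flock -n.)
run_exclusive() {
    lockfile=$1
    shift
    flock -n "$lockfile" "$@"
}

# The same idea written directly as a crontab entry (lock path is an assumption):
# */10 * * * *  flock -n /tmp/synchronize-directory.lock /full/path/to/synchronize-directory.sh
```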

Delete old files while keeping at least a few of them

One can remove files older than, say, 30 days with this simple command:

$ find . -type f -mtime +30 -delete

However, if no new file is produced for 30 days, all the files are deleted and the directory ends up empty.

Keep at least 100 files and delete others when older than 30 days

To avoid this situation, a condition must be added on the file count:

$ find . -type f -printf '%Ts\t%p\n' \
   | sort -rnk1 \
   | awk -v threshold="$(date -d '30 days ago' +%s)" 'NR > 100 && $1 < threshold' \
   | cut -f 2- \
   | xargs -r -d '\n' rm

  • find: print the timestamp and path of every file
  • sort: sort lines by timestamp, newest first
  • awk: print only the lines after the 100th whose timestamp is older than 30 days
  • cut: remove the timestamp from the line
  • xargs: remove the files (one path per line, so names containing spaces are handled)
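Before deleting anything, it may help to see which files the pipeline would pick. Here is a sketch of the same pipeline as a function that only lists the candidates. The name list_expired and its parameters are made up for illustration, and it assumes GNU find and GNU date, like the command above.

```shell
# list_expired DIR KEEP DAYS
# Lists the files under DIR that are older than DAYS days,
# always sparing the KEEP newest files. Nothing is deleted.
# (Hypothetical helper; file names containing newlines are not supported.)
list_expired() {
    dir=$1
    keep=$2
    days=$3
    threshold=$(date -d "$days days ago" +%s)
    find "$dir" -type f -printf '%Ts\t%p\n' \
        | sort -rnk1 \
        | awk -v threshold="$threshold" -v keep="$keep" \
            'NR > keep && $1 < threshold' \
        | cut -f 2-
}

# Possible usage: review the list, then delete for real
#   list_expired . 100 30
#   list_expired . 100 30 | xargs -r -d '\n' rm
```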

Set up an HTTP Server in 10 seconds with Python3

A quick way to share files on a local network :)

Start the server

The following command creates an HTTP server listening on port 8888 and exposing the current directory as a webpage:

$ python3 -m http.server 8888

Connect a client

You can open the webpage with your favorite browser:

$ x-www-browser http://localhost:8888

Note: if accessing the webpage from another computer, you should replace localhost with the actual IP address of the server.
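By default the server listens on all network interfaces and serves the directory it was started from. If you would rather pick both explicitly, http.server accepts --bind and --directory (the latter needs Python 3.7 or later). The wrapper below is only a sketch; the function name serve_dir and its defaults are made up for illustration.

```shell
# serve_dir DIR [PORT] [ADDRESS]
# Serves DIR over HTTP; defaults to port 8888 on the loopback address only.
# (Hypothetical wrapper around python3 -m http.server.)
serve_dir() {
    python3 -m http.server "${2:-8888}" \
        --bind "${3:-127.0.0.1}" \
        --directory "$1"
}

# Possible usage:
#   serve_dir /tmp/mydata                # local access only
#   serve_dir /tmp/mydata 8888 0.0.0.0   # reachable from the local network
```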