This repository has been archived on 2022-12-04. You can view files and clone it, but cannot push or open issues or pull requests.
rsync-docker/README.md

120 lines
5.9 KiB
Markdown
Raw Normal View History

2020-04-12 21:43:33 +00:00
# rsync
rsync is an open source utility that provides fast incremental file transfer.
If you are into Dockerizing everything or you just want to have a better view
over the rsync process (e.g., to see it in your cluster visualizer),
this repository provides you with a Docker image to run the rsync process.
The provided image is open-source and built from scratch with the goal to
enable you to run a stateless and an immutable container, according to the best
practices.
With this image you can:
* run a simple one-time rsync job for local drives
* run a rsync job for remote drives using ssh (included in the image)
* run scheduled rsync jobs using cron (included in the image)
2020-04-12 21:43:33 +00:00
Supported architectures:
* the image supports multiple architectures: `x86-64` and `arm32`
* the docker manifest is used for multi-platform awareness
* by simply pulling or running `ogivuk/rsync:latest`, the correct image for your architecture will be retreived
| Tag | Transmission Version and Architecture |
| :--- | :---- |
| `:latest` | supports both `x64` and `arm32v7` architectures |
| `:x64` | targeted to the `x64` architecture |
| `:arm32v7` | targeted to the `arm32v7` architecture |
The image is based on `alpine` image and it includes:
* `rsync`
* `openssh-client` for remote sync over ssh
* `tzdata` for easier setting up of local timezone (for file timestamps)
* `cron` (included in the alpine) for scheduling regular back-ups
* `rsync.sh` script that prepares the cron job and starts the cron daemon
2020-04-12 21:43:33 +00:00
## Usage
### Quick Start: one time run and sync of local folder
2020-04-12 21:43:33 +00:00
```bash
docker run --rm \
--name=rsync \
--volume /path/to/source/data:/data/src \
--volume /path/to/destination/data:/data/dst \
ogivuk/rsync [OPTIONS] /data/src/ /data/dst/
```
Replace:
* `/path/to/source/data` with the source folder to be copied or backed-up
* `/path/to/destination/data` with the destination folder
* `[OPTIONS]` with desired rsync optional arguments
### Use with cron or ssh
#### Step 1. Prepare the setup (_First time only_)
* create a folder, for example `~/rsync`, that will be later mounted in the container
* this folder is intended to hold supporting files such as log files, crontab file, and any supporting rsync files (e.g., list of files to rsync)
* the subfolder `logs`, for example `~/rsync/logs` is suggested for the log files
* important note: for Docker Swarm, this directory **needs to be available on all nodes in Docker swarm**, e.g., via network shared storage
```bash
mkdir -p ~/rsync/logs
```
* you can replace `~/rsync` with any other desired location
#### Step 2. Run
* run as a container:
```bash
docker run --rm \
--name=rsync \
--env TZ="Europe/Zurich" \
--env RSYNC_CRONTAB="crontab" \
--volume ~/rsync:/rsync
--volume /path/to/source/data:/data/src \
--volume /path/to/destination/data:/data/dst \
ogivuk/rsync
```
* run as a swarm service:
```bash
docker service create \
--name=rsync \
--env TZ="Europe/Zurich" \
--env RSYNC_CRONTAB="crontab" \
--mount type=bind,src=~/rsync,dst=/rsync \
--mount type=bind,src=/path/to/source/data,dst=/data/src \
--mount type=bind,src=/path/to/destination/data,dst=/data/dst \
ogivuk/rsync
```
| Parameter | Explanation | When to Use |
| :-------- | :---------- | :---------- |
| `--env TZ="Europe/Zurich"` | Sets the timezone in the container, which is important for the correct timestamping of logs. Replace `Europe/Zurich` with your own timezone from the list of available [timezones](https://en.wikipedia.org/wiki/List_of_tz_database_time_zones). | Always |
| `--env RSYNC_CRONTAB="crontab"` | Specifies that the rsync is to be run as one or multiple cron jobs, and that the jobs are defined in the `crontab` file located in the mount-binded `~/rsync` folder. The rsync parameters used in the crontab must be mindful of the data locations in the container. | When using cron for regular rsync jobs |
| `~/rsync` | Specifies the local folder `~/rsync` that is mounted to the container at `/rsync`. Change `~/rsync` if another location is chosen in Step 1. | When using cron or ssh |
| `/path/to/source/data` | Specifies the source folder for sync and is mounted to the container at `/data/src`. Change to the appropriate folder. Multiple folders can be mounted in this way. | If any source is local |
| `/path/to/destination/data` | Specifies the destination folder for sync and is mounted to the container at `/data/src`. Change to the appropriate folder. Multiple folders can be mounted in this way. | If any destination is local |
| `--env RSYNC_UID=$(id -u)` | Provides the UID of the user starting the container so that the ownership of the files that rsync copies belong to that user. | If the rsync option for preserving ownership is not selected |
| `--env RSYNC_GID=$(id -g)` | Provides the GID of the user starting the container so that the ownership of the files that rsync copies belong to that user group. | If the rsync option for preserving ownership is not selected |
Remarks:
* **rsync will not be run by default**, you need to be specify the rsync command with all its arguments in the crontab, or in a script called in the crontab
* **any later changes to the crontab file require the service to be restarted**, and that's why consider to define the rsync job in a script that is called in the crontab
* when defining the rsync arguments, including source and destination, do that from the perspective of the container (/data/src, /data/dst)
* more volumes can be mount binded if needed
* the ssh client is included in the image in case your source or destination is a remote host
* ssh required files (private key, known_hosts, ssh_config) needs to be stored in a folder mounted to the container, for example in `~/rsync/ssh/`
* you can define the ssh connection in a `ssh_config` file
* rsync option `-e "ssh -F /rsync/ssh/ssh_config"` instructs rsync to use the ssh with the `ssh_config` for the remote sync