The recipe in this article covers the following use case:
- You need to deliver a large array of files (millions)
- New files keep being added to the array
- No files are removed from the array
- You need to keep it synchronized
- The source and target systems don't have enough RAM (around 2GB per million files) to hold the whole file tree in RAM
This is achieved in two steps. First, the whole array is delivered to the remote agents without consuming all the RAM. Second, the delivered files are kept in sync as new files are added.
Step 1 - Initial files synchronization
- Count the subfolders (recursively) in the folder you are going to synchronize.
- Add the custom parameter "transfer_job_files_limit" to the Job profile of the agents that will sync the array. This value is the number of file/folder entries synced per iteration. Its lower bound is the folder count from the previous step: the value MUST be greater than the number of folders. Its upper bound is set by available RAM: each file/folder entry consumes up to 2KB of RAM. For example, with 4GB of RAM, set this value to 1500000 at most (1500000 entries x 2KB = 3GB, leaving about 1GB for the OS itself).
- Configure and start a distribution job.
- Note the time when the job started.
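The folder count and the RAM ceiling from the steps above can be estimated with a short script. This is only a sketch: the function names, the decimal units (2KB = 2000 bytes, 1GB = 10^9 bytes, chosen to match the article's round numbers) and the 1GB OS reserve are assumptions for illustration, not part of the product.

```python
import os
from typing import Tuple

# Upper bound of RAM per file/folder entry, as stated in the article (2KB).
# Decimal units are an assumption made to reproduce the article's arithmetic.
BYTES_PER_ENTRY = 2_000


def count_subfolders(root: str) -> int:
    """Count all subfolders under root recursively (root itself is excluded)."""
    total = 0
    for _dirpath, dirnames, _filenames in os.walk(root):
        total += len(dirnames)
    return total


def files_limit_bounds(folder_count: int, ram_bytes: int,
                       os_reserve_bytes: int) -> Tuple[int, int]:
    """Return (lower, upper) bounds for transfer_job_files_limit."""
    lower = folder_count + 1  # must be strictly greater than the folder count
    upper = (ram_bytes - os_reserve_bytes) // BYTES_PER_ENTRY
    if lower > upper:
        raise ValueError("not enough RAM for this many folders")
    return lower, upper


if __name__ == "__main__":
    # Hypothetical machine: 4GB of RAM, 1GB reserved for the OS.
    folders = count_subfolders(".")
    print(folders, files_limit_bounds(folders, 4_000_000_000, 1_000_000_000))
```

Any value between the two bounds satisfies both constraints; values closer to the upper bound mean fewer iterations at the cost of more RAM per iteration.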
Once the job has completed the initial file transfer, proceed to the second step.
Step 2 - Keeping the array of data synchronized
- Add the custom parameter "scanner_max_file_age" to the Job profile.
- This parameter makes the Agent sync only files that have changed during the last X seconds. For example, setting it to 86400 tells the Agent to sync only files changed during the last 24 hours. Make sure the resulting window reaches back at least to the job start time you noted in Step 1.
- Set up a Sync job.
Agents will only recheck the files that fall into the configured time range. This speeds up syncing, because agents don't have to recheck all the millions of files.
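To pick a "scanner_max_file_age" value that still covers the job start time noted in Step 1, the elapsed time can be computed as in the sketch below. The helper names and the one-hour safety margin are assumptions, and in_sliding_window only models the sliding-window rule described above; it is not the Agent's actual implementation.

```python
from datetime import datetime, timezone
from typing import Optional


def min_sliding_window(job_started: datetime,
                       now: Optional[datetime] = None,
                       margin_seconds: int = 3600) -> int:
    """Smallest window (in seconds) that still reaches back to job_started."""
    now = now or datetime.now(timezone.utc)
    elapsed = int((now - job_started).total_seconds())
    if elapsed < 0:
        raise ValueError("job start time lies in the future")
    return elapsed + margin_seconds  # margin is an assumed safety buffer


def in_sliding_window(mtime_epoch: float, now_epoch: float,
                      max_file_age: int) -> bool:
    """Model of the rule: a file qualifies if it changed in the last max_file_age seconds."""
    return now_epoch - mtime_epoch <= max_file_age


if __name__ == "__main__":
    # Hypothetical job start time; replace with the time noted in Step 1.
    started = datetime(2019, 5, 25, 9, 0, tzinfo=timezone.utc)
    print(min_sliding_window(started))
```

If the Sync job is set up a day after the distribution job started, this yields a value slightly above 86400, so files added while the initial transfer was running are still picked up.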
Enforcing an absolute time window for files to be synced
In more complex setups, the administrator might need Step 2 to scan files within an absolute time window instead of a sliding one. This can be done with two custom parameters:
- scanner_min_file_age - the most recent bound (end) of the window in UNIXTIME format
- scanner_max_file_age - the oldest bound (start) of the window in UNIXTIME format
For example, to sync only files from May 25 2019 9:00am (UTC) to May 29 2019 9:00am (UTC), set:
- scanner_min_file_age = 1559120400
- scanner_max_file_age = 1558774800
You can use any online Unix time converter for the UNIXTIME conversion.
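The conversion can also be done locally. The sketch below uses only the Python standard library; to_unixtime is a hypothetical helper name, and the two dates are the ones from the example above.

```python
from datetime import datetime, timezone


def to_unixtime(year: int, month: int, day: int,
                hour: int = 0, minute: int = 0) -> int:
    """Convert a UTC calendar time to UNIXTIME (seconds since 1970-01-01 UTC)."""
    moment = datetime(year, month, day, hour, minute, tzinfo=timezone.utc)
    return int(moment.timestamp())


if __name__ == "__main__":
    print(to_unixtime(2019, 5, 25, 9))  # 1558774800
    print(to_unixtime(2019, 5, 29, 9))  # 1559120400
```

Passing tzinfo=timezone.utc explicitly matters: without it, Python interprets the date in the local time zone of the machine running the script, which would shift the window.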