Because all data delivery is performed by agents (not by the Management Console), an agent must take the role of a "cloud" agent when an admin needs to work with data in cloud storage. This article explains the specifics of its behavior and gives recommendations on configuring the cloud agent for better performance.
Technically, an agent can be set up to sync files on local storage in some jobs and on cloud storage in others. However, the latter requires fine-tuning the agent's settings in ways that may negatively affect the former. It's therefore highly advisable to isolate such an agent: let it work only with a specific cloud storage and assign it a dedicated profile.
Create a new group, add the agent to it, and assign the Cloud storage profile to the group, or apply the "Cloud storage" preset to a separate profile for this group. Perform all further settings adjustments in that isolated profile only.
**Cloud storage path for a group is not supported.** Don't use a group for a cloud storage path. Use the agent from this group when configuring the job.
Some of the behavior patterns are hardcoded and cannot be changed manually. Others can be adjusted in the isolated profile, as advised above.
A cloud agent does not try to resolve filename conflicts and will upload a file to storage as-is, which can end up with two objects. Non-cloud agents act in accordance with their system specifics and profile settings in this regard.
A cloud agent does not compare file pieces to benefit from copying matching local pieces; an updated file is always fully re-synced.
Cloud storages don't provide the usual file-system notifications about new or updated files, so in a continuous Synchronization job the cloud agent relies on a periodic folder scan to discover file updates. In the pre-configured Cloud storage profile, the "Rescan interval" parameter is set to one hour. If files there change frequently, decrease the rescan interval further; otherwise, use the force-rescan option.
If the cloud agent acts as the destination in a Distribution or Consolidation job, set the rescan interval to 0 (= disable) so that the agents do not check local pre-seeded objects.
RAM requirements are the same as for synchronization to non-cloud storage. See this article for details.
Special Profile settings
As mentioned above, an agent that will transfer files to or from cloud storage requires some special settings in its profile:
1) Increase the "Number of disk I/O threads" parameter (agent profile) to 20 for your cloud agent. It's already set in the pre-configured Cloud storage profile.
2) When working with Google Cloud Storage and uploading large files there, increase "Min size of torrent block" to 8388608 (= 8 MB).
3) If the cloud agent is the destination agent in a Distribution or Consolidation job, disable the periodic folder scan: set "Rescan interval" to 0 and restart the agent.
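The block-size value in step 2 above is simply 8 MiB expressed in bytes; a quick check:

```python
# "Min size of torrent block" recommended for Google Cloud Storage:
# 8 MiB expressed in bytes.
min_block = 8 * 1024 * 1024
print(min_block)  # 8388608
```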
| Scenario | Recommendation |
| --- | --- |
| No changes in cloud expected | Set the "Rescan interval" parameter to zero (= disable it) |
| Cloud agent is Linux | Run the following commands to allow the agent to control buffers for better performance. Add the following custom settings to the profile and change these settings as below: |
| Cloud agent is Windows, performance over 1 Gbps | Add the following settings to the profile and change these settings as below: |
| Distribution or Consolidation job, cloud is the destination | Set the "Rescan interval" parameter (agent profile) to zero |
| Delivery to S3, performance up to 1 Gbps | EC2 Linux instance type t3.xlarge or below, general-purpose drive |
| Delivery to S3, performance above 1 Gbps | EC2 Linux instance above t3.xlarge with guaranteed 10 Gbps performance (e.g., m5.8xlarge), provisioned drive |
| Delivery to Azure, performance up to 1 Gbps | Linux general-purpose virtual machine, D4s v3 or better |
| Delivery to Azure, performance above 1 Gbps | Linux memory-optimized virtual machine, D14 v2 storage |
| Delivery to Google Cloud, performance above 2 Gbps | Linux machine type n1-standard-64 or better |
Pre-seeded folder specifics
No cloud solution allows explicitly adjusting the modification time of files uploaded to the cloud. This forces agents to follow a complex logic to decide whether to download data from the cloud or deliver data to the cloud when the same filenames already exist on the destination.
Basically, agents rely on file size to determine whether any syncing needs to be performed.
If the file size matches on all agents, local and cloud, the file on the local file system is assumed newer if its timestamp is newer, and it will be uploaded to the cloud. If the timestamp on the local file system is older, nothing is synced. If all files are on cloud storage, files are considered equal when their sizes match.
If the sizes differ, the file with the latest timestamp is uploaded to the others. Once synced, the agent keeps the true mtime of files in the cloud in its own database.
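The size/timestamp decision logic above can be sketched roughly as follows. This is a simplified illustration for a single local/cloud pair, not the actual agent code; the function and field names are hypothetical:

```python
from dataclasses import dataclass

@dataclass
class FileState:
    size: int      # file size in bytes
    mtime: float   # modification time (Unix timestamp)

def decide_sync(local: FileState, cloud: FileState) -> str:
    """Sketch of how a cloud agent might decide on syncing a pre-seeded file.

    Cloud storage can't store an arbitrary mtime, so file size is the
    primary signal; the timestamp only decides direction.
    """
    if local.size == cloud.size:
        # Same size: upload only if the local copy looks newer.
        if local.mtime > cloud.mtime:
            return "upload to cloud"
        return "do nothing"
    # Sizes differ: the copy with the latest timestamp wins.
    if local.mtime >= cloud.mtime:
        return "upload to cloud"
    return "download from cloud"
```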
If you synchronize data with an RW -> RO pattern, please see the behavior described in the "Transfer jobs" section for the initial synchronization.
Distribution / consolidation jobs
| | Mtime on SRC is newer | Mtime on SRC is older |
| --- | --- | --- |
| Cloud is DESTINATION | Upload file to cloud | Do nothing \* |
| Cloud is SOURCE | Do nothing | Download file from cloud \*\* (or set TJET=0 to do nothing) |

TJET = transfer_job_exact_timestamps

\* Set the "transfer_job_exact_timestamps" custom parameter to 0 (i.e., "Disable the mtime optimization feature") to override this and DO upload all files to the cloud on every job run.

\*\* Set "transfer_job_exact_timestamps" to 0 (i.e., "Enable the mtime optimization feature for regular disks, too") to override this and NOT download the file from the cloud.
**Only job profile for TJET.** Set the "transfer_job_exact_timestamps" custom parameter in the job profile, not in the agent profile.
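The decision table above can be sketched as a small function. This is a simplified illustration only; the function name is hypothetical, and `tjet` models the "transfer_job_exact_timestamps" parameter (1 = default behavior, 0 = the override described in the footnotes):

```python
def transfer_decision(cloud_is_destination: bool,
                      src_mtime_newer: bool,
                      tjet: int = 1) -> str:
    """Sketch of the Distribution/Consolidation decision table."""
    if cloud_is_destination:
        if src_mtime_newer:
            return "upload to cloud"
        # Mtime on SRC is older: nothing by default,
        # but TJET=0 forces an upload on every run.
        return "upload to cloud" if tjet == 0 else "do nothing"
    # Cloud is the SOURCE.
    if src_mtime_newer:
        return "do nothing"
    # Mtime on SRC is older: download by default,
    # TJET=0 suppresses the download.
    return "do nothing" if tjet == 0 else "download from cloud"
```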
What happens when the agent decides to upload/download a file to/from the cloud
As opposed to a standard "Disk-to-disk" transfer, the agent WILL NOT hash the file in the cloud to decide which pieces it needs to upload or download, as that is equivalent to a full file download from the cloud and costs money. Instead:
- When the agent wants to upload a new version of a file to the cloud, it simply overwrites the existing one in the cloud.
- When the agent wants to download a new version of a file from the cloud, it simply downloads the complete file and overwrites the local one.
Agents WILL hash newly added files as they are uploaded to the cloud. These hashes are used when merging with other agents.
Archive and .sync folder specifics
Agents in the cloud can create the service .sync directory in cloud storage, with the following specifics:
- Amazon S3 storage won't create it until it's necessary to store an object deleted on remote agents in its .sync/Archive. No other service files will be created in .sync.
- Azure Files will create .sync with Archive and an ID file there.
- Azure Blob creates .sync, but Archive is created only when it's necessary to store an object deleted on a remote agent.
- Google Cloud Storage automatically creates the .sync folder.
List of limitations for jobs working with data in the cloud
- Renaming/moving files is not supported. Renamed files are simply re-uploaded to the cloud.
- Zero-sized files cannot be uploaded to cloud storage.
- Archive is enabled but does not store file versions; it only stores deleted objects. Use your cloud provider's versioning capabilities to store file versions.
- Selective sync is not supported for cloud storage.
- Real-time notifications for cloud storage are not supported. New files and updates are detected only via rescan.
- Cloud storage-specific attributes are not synchronized.
- Folder picker cannot browse cloud storage.
- Configuring cloud storage for a group of agents is not supported, only for a single agent.
- File attribute synchronization is not supported (both basic and extended).
- Seeding of partially downloaded files is not supported. A cloud agent will only seed a file once it has the full file content.
- Symbolic link synchronization is supported with peculiarities, and not supported at all for S3 storage.
- Checking file content before uploading to the cloud is not supported (files are always uploaded in full).