
SciLo: Long Term Data Archiving

SciLo is a long-term archiving service at ACCRE based on the Spectra Logic BlackPearl converged storage solution. With SciLo you can archive data at very low cost and with minimal system administrator intervention. Data is moved from ACCRE (or anywhere else) to SciLo and back with a command line client (ds3_java_cli) or, if you are using the portal, with a GUI (dsb-gui). Other options are available, including Cyberduck and an API with a number of SDKs, depending on your expertise. Spectra Logic also offers the Eon Browser GUI for Windows, Mac, and Linux. All of these are based on Spectra S3, which uses the standard HTTP S3 command set plus expanded commands designed to optimize moving data objects to and from tape.

Getting Started

Signing up for SciLo requires the creation of an account on our BlackPearl and the issuance of an ID and secret key. To request access, open a helpdesk ticket with ACCRE. You will receive confirmation of account setup along with your ID and secret key.

Recommended Server

Archiving can take a long time, so running it on a regular cluster gateway will not work. If your group has a custom gateway connected to the cluster, you can use that. (I always recommend using screen or tmux so that you can log out and reattach to the session later.)

We also have a gateway dedicated to archiving for groups that don't have a custom gateway. When your access is created, your login credentials will work on that gateway, and your ticket will be updated with that server's login information.

You can also back up other sources (not just the cluster), so the tools can also be run from personal servers and desktops.

Environment

Once you have access, add this information to your environment. The environment variables $DS3_ACCESS_KEY, $DS3_SECRET_KEY, and $DS3_ENDPOINT are special variables that ds3_java_cli reads by default; $s3bucket is only a convenience for your own commands. These can be overridden with options (see -a, -k, and -e below) if that fits your workflow better.

~ $ export DS3_ACCESS_KEY=<Assigned s3 id>
~ $ export DS3_SECRET_KEY=<Assigned secret s3 key>
~ $ export DS3_ENDPOINT=archive1.accre.vanderbilt.edu
~ $ export s3bucket=<Assigned bucket>
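Before starting a long transfer, it can be worth confirming that these variables are actually set in the current shell. The following is a minimal bash sketch; check_scilo_env is a hypothetical helper, not part of ds3_java_cli, and it only inspects the environment without contacting the BlackPearl.

```shell
#!/usr/bin/env bash
# check_scilo_env: hypothetical helper (not part of ds3_java_cli).
# Prints "missing: NAME" for every required variable that is unset or empty
# and returns non-zero if any of them are missing.
check_scilo_env() {
    local missing=0 var
    for var in DS3_ACCESS_KEY DS3_SECRET_KEY DS3_ENDPOINT s3bucket; do
        # ${!var:-} is bash indirect expansion: the value of the variable named by $var
        if [ -z "${!var:-}" ]; then
            echo "missing: $var"
            missing=1
        fi
    done
    return "$missing"
}
```

Running check_scilo_env after the exports above should print nothing and return 0; otherwise it names each missing variable. Drop s3bucket from the list if you always pass -b explicitly.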

Load the Software

The command line client (as well as the GUI, if you are on the portal) is available in our Lmod setup. Run the following to load them into your environment:

~$ module load GCC
~$ module load scilo-cli  #for the command line ds3_java_cli
~$ module load scilo-gui  # loads the GUI (dsb-gui) for the portal or X11 forwarding

Command Line

The command line interface is ds3_java_cli. Below are some examples:

~$ ds3_java_cli --help #displays a general help listing
~$ ds3_java_cli -c get_service #get a list of available buckets
+-------------------------------------------------------+--------------------------+
|                      Bucket Name                      |       Creation Date      |
+-------------------------------------------------------+--------------------------+
| my_bucket                                             | 2019-03-07T00:08:24.000Z |
+-------------------------------------------------------+--------------------------+

~$ ds3_java_cli --http -c put_bulk -b my_bucket -p /home/myusername/ -d /home/myusername/archivedirectory/ --sync -nt 6 --checksum #archive a directory with 6 threads, sending only newer or missing files
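If you script against get_service, you may want just the bucket names, for example to confirm that $s3bucket exists. A minimal sketch, assuming the table layout shown above (data rows start with '|', borders with '+'); bucket_names is a hypothetical helper, and --output-format json may be sturdier if you know the structure of its output.

```shell
#!/usr/bin/env bash
# bucket_names: hypothetical helper. Reads get_service table output on stdin
# and prints one bucket name per line, skipping border lines and the header row.
bucket_names() {
    awk -F'|' '
        /^[|]/ {                      # only table rows start with "|"
            name = $2                 # first column holds the bucket name
            gsub(/^ +| +$/, "", name) # trim the column padding
            if (name != "" && name != "Bucket Name") print name
        }'
}
```

For example: ~$ ds3_java_cli -c get_service | bucket_names | grep -qxF "$s3bucket" && echo "found $s3bucket"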
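For scripted archive runs, building the command as a bash array keeps paths containing spaces intact and makes it easy to print the exact command before running it. This is a sketch, not an official wrapper: scilo_archive and its DRY_RUN switch are hypothetical, it assumes $s3bucket from the Environment section, and it reuses only the flags from the put_bulk example above.

```shell
#!/usr/bin/env bash
# scilo_archive: hypothetical wrapper (not part of ds3_java_cli) around the
# put_bulk example above. Usage: scilo_archive <source_dir> <prefix> [threads]
# With DRY_RUN=1 (the default) it prints the command instead of running it.
scilo_archive() {
    local src="$1" prefix="$2" threads="${3:-6}"
    if [ -z "${s3bucket:-}" ]; then
        echo "s3bucket is not set" >&2
        return 1
    fi
    local -a cmd
    cmd=(
        ds3_java_cli --http
        -c put_bulk
        -b "$s3bucket"     # target bucket
        -p "$prefix"       # object name prefix
        -d "$src"          # local directory to archive
        --sync             # only send newer or non-extant files
        -nt "$threads"     # number of transfer threads
        --checksum         # carried over from the example above
    )
    if [ "${DRY_RUN:-1}" = "1" ]; then
        # %q shell-quotes each word so the printed line can be pasted back into a shell
        printf '%q' "${cmd[0]}"
        printf ' %q' "${cmd[@]:1}"
        echo
    else
        "${cmd[@]}"
    fi
}
```

For example, DRY_RUN=1 scilo_archive /home/myusername/archivedirectory/ /home/myusername/ prints the command, and DRY_RUN=0 runs it (ideally inside screen or tmux).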

usage: ds3_java_cli

Option            Help
-a                Access key ID, or have "DS3_ACCESS_KEY" set as an environment variable.
-bs               Set the buffer size in bytes. The default is 1MB.
-c                The command to execute. For possible values, use '--help list_commands'.
--debug           Debug (more verbose) output to console.
-e                The ds3 endpoint to connect to, or have "DS3_ENDPOINT" set as an environment variable.
-h                Help menu.
--help            Command help (provide the command name from -c).
--http            Send all requests over standard HTTP.
--insecure        Ignore SSL certificate verification.
-k                Secret access key, or have "DS3_SECRET_KEY" set as an environment variable.
--log-debug       Debug (more verbose) output to log file.
--log-trace       Trace (most verbose) output to log file.
--log-verbose     Log output to log file.
--output-format   Configure how the output should be displayed. Possible values: [cli, json].
-r                Specifies how many times puts and gets will be attempted before failing the request. The default is 5.
--trace           Trace (most verbose) output to console.
--verbose         Log output to console.
--version         Print version information.
-x                The URL of the proxy server to use, or have "http_proxy" set as an environment variable.

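In general, ds3_java_cli commands follow this pattern (-e, -a, and -k can be omitted when the corresponding environment variables are set):

ds3_java_cli -e <endpoint> -a <access key> -k <secret key> --http -c <command> -o <object, if used by command> -b <bucket, if used by command>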

Available Commands

Command: Help
delete_bucket: Deletes an empty bucket.
  Requires the '-b' parameter to specify bucket (by name or UUID).
  Use the '--force' flag to delete a bucket and all its contents.
  Use the get_service command to retrieve a list of buckets.
delete_folder: Deletes a folder and all its contents.
  Requires the '-b' parameter to specify bucket (name or UUID).
  Requires the '-d' parameter to specify folder name.
delete_job: Terminates and removes a current job.
  Requires the '-i' parameter with the UUID of the job.
  Use the '--force' flag to remove objects already loaded into cache.
  Use the get_jobs command to retrieve a list of jobs.
delete_object: Permanently deletes an object.
  Requires the '-b' parameter to specify bucket name.
  Requires the '-i' parameter to specify object name (UUID or name).
  Use the get_service command to retrieve a list of buckets.
  Use the get_bucket command to retrieve a list of objects.
delete_tape: Deletes the specified tape, which has been permanently lost, from the BlackPearl database.
  Any data lost as a result is marked degraded to trigger a rebuild.
  Requires the '-i' parameter to specify tape ID (UUID or barcode).
  Use the get_tapes command to retrieve a list of tapes.
delete_tape_drive: Deletes the specified offline tape drive.
  This request is useful when a tape drive is permanently removed from a partition.
  Requires the '-i' parameter to specify tape drive ID.
  Use the get_tape_drives command to retrieve a list of tape drives.
delete_tape_failure: Deletes a tape failure from the failure list.
  Requires the '-i' parameter to specify tape failure ID (UUID).
  Use the get_tape_failure command to retrieve a list of IDs.
delete_tape_partition: Deletes the specified offline tape partition from the BlackPearl gateway configuration.
  Any tapes in the partition that have data on them are disassociated from the partition.
  Any tapes without data on them, and all tape drives associated with the partition, are deleted from the BlackPearl gateway configuration.
  This request is useful if the partition should never have been associated with the BlackPearl gateway or if the partition was deleted from the library.
  Requires the '-i' parameter to specify tape partition.
get_bucket: Returns bucket details plus a list of objects contained.
  Requires the '-b' parameter to specify bucket name or UUID.
  Use the get_service command to retrieve a list of buckets.
get_bulk: Retrieves multiple objects from a bucket.
  Requires the '-b' parameter to specify bucket (name or UUID).
  Optional '-d' parameter to specify restore directory (default '.').
  Optional '-p' parameter to specify prefix or directory name. Separate multiple values with spaces, e.g., -p prefix1 prefix2.
  Optional '--sync' flag to retrieve only newer or non-extant files.
  Optional '--file-metadata' flag restores file metadata to the values extant when archived.
  Optional '-nt' parameter to specify number of threads.
system_information: Retrieves basic system information: software version, build, and system serial number.
  Useful to test communication.
get_config_summary: Runs multiple commands to capture configuration information.
get_data_policy: Returns information about the specified data policy.
  Requires the '-i' parameter to specify data policy (UUID or name).
  Use the get_data_policies command to retrieve a list of policies.
get_data_policies: Returns a list of all data policies.
get_job: Retrieves information about a current job.
  Requires the '-i' parameter with the UUID of the job.
  Use the get_jobs command to retrieve a list of jobs.
get_jobs: Retrieves a list of all current jobs.
get_object: Retrieves a single object from a bucket.
  Requires the '-b' parameter to specify bucket (name or UUID).
  Requires the '-o' parameter to specify object (name or UUID).
  Optional '-d' parameter to specify restore directory (default '.').
  Optional '--sync' flag to retrieve only newer or non-extant files.
  Optional '--file-metadata' flag restores file metadata to the values extant when archived.
  Optional '-nt' parameter to specify number of threads.
  Use the get_service command to retrieve a list of buckets.
  Use the get_bucket command to retrieve a list of objects.
get_objects_on_tape: Returns a list of the contents of a single tape.
  Requires the '-i' parameter to specify tape (barcode or UUID).
  Use the get_tapes command to retrieve a list of tapes.
get_physical_placement: Returns the location of a single object on tape.
  Requires the '-b' parameter to specify bucket (name or UUID).
  Requires the '-o' parameter to specify object (name or UUID).
  Use the get_service command to retrieve a list of buckets.
  Use the get_bucket command to retrieve a list of objects.
get_service: Returns a list of buckets on the device.
get_tape_failure: Returns a list of tape failures.
get_tapes: Returns a list of all tapes.
get_user: Returns information about an individual user.
  Requires the '-i' parameter to specify user (name or UUID).
  Use the get_users command to retrieve a list of users.
get_users: Returns a list of all users.
head_object: Returns metadata but does not retrieve an object from a bucket.
  Requires the '-b' parameter to specify bucket (name or UUID).
  Requires the '-o' parameter to specify object (name or UUID).
  Useful to determine if an object exists and you have permission to access it.
modify_data_policy: Alters parameters for the specified data policy.
  Requires the '-i' parameter to specify data policy (UUID or name).
  Requires the '--modify-params' parameter to be set.
  Use key:value pairs: key:value,key2:value2,...
  Legal keys: name, checksum_type, default_blob_size, default_get_job_priority, default_put_job_priority, default_verify_job_priority, rebuild_priority, end_to_end_crc_required, versioning.
  See the API documentation for possible values.
  Use the get_data_policies command to retrieve a list of policies and current values.
modify_user: Alters information about an individual user.
  Requires the '-i' parameter to specify user (name or UUID).
  Requires the '--modify-params' parameter to be set.
  Use key:value pairs: key:value,key2:value2,...
  Legal keys: default_data_policy_id.
  Use the get_users command to retrieve a list of users.
performance: For internal testing.
  Generates mock file streams for put, and a discard (/dev/null) stream for get.
  Useful for testing network and system performance.
  Requires the '-b' parameter with a unique bucket name to be used for the test.
  Requires the '-n' parameter with the number of files to be used for the test.
  Requires the '-s' parameter with the size of each file in MB for the test.
  Optional '-bs' parameter with the buffer size in bytes (default 1MB).
  Optional '-nt' parameter with the number of threads.
put_bucket: Creates a new empty bucket.
  Requires the '-b' parameter to specify bucket name.
put_bulk: Puts multiple objects from a directory or pipe into a bucket.
  Requires the '-b' parameter to specify bucket (name or UUID).
  Requires the '-d' parameter (unless input is piped) to specify source directory.
  Optional '-p' parameter (unless input is piped) to specify prefix or directory name.
  Optional '--sync' flag to put only newer or non-extant files.
  Optional '--file-metadata' flag archives file metadata with files.
  Optional '-nt' parameter to specify number of threads.
  Optional '--ignore-errors' flag to continue on errors.
  Optional '--follow-symlinks' flag to follow symlinks (default is to disregard them).
reclaim_cache: Forces a full reclaim of all caches, and waits until the reclaim completes.
  Cache contents that need to be retained because they are a part of an active job are retained.
  Any cache contents that can be reclaimed will be.
  This operation may take a very long time to complete, depending on how much of the cache can be reclaimed and how many blobs the cache is managing.
verify_bulk_job: A verify job reads data from the permanent data store and verifies that the CRC of the data read matches the expected CRC.
  Verify jobs ALWAYS read from the data store, even if the data currently resides in cache.
  Requires the '-b' parameter to specify bucket (name or UUID).
  Requires the '-o' parameter to specify object (name or UUID).
  Optional '-p' parameter to specify prefix or directory name.
get_data_path_backend: Gets configuration information about the data path backend.
get_cache_state: Gets the utilization and state information for all cache filesystems.
get_system_failure
get_capacity_summary: Gets a summary of the BlackPearl Deep Storage Gateway system-wide capacity.
verify_system_health: Verifies that the system appears to be online and functioning normally and that there is adequate free space for the database file system.
verify_all_tapes: Verifies the integrity of all the tapes in the BlackPearl.
verify_tape
get_suspect_objects
get_suspect_blob_tapes
modify_data_path
verify_pool
verify_all_pools
get_detailed_objects: Filters an object list by size or creation date.
  Returns one line for each object.
  Optional '-b' parameter to specify bucket name.
  Optional '--filter-params' to filter results.
  Use key:value pairs: key:value,key2:value2,...
  Legal keys:
    newerthan, olderthan: relative date from now, in format d1.h2.m3.s4 (zero values can be omitted; separate with '.')
    before, after: absolute UTC date, in format Y2016.M11.D9.h12.ZPDT (zero values or UTC time zone can be omitted; separate with '.')
    owner: owner name
    contains: string to match in object name
    largerthan, smallerthan: object size in bytes
  Note: '-b' restricts the values returned, whereas '--filter-params' transfers the (potentially large) object list and filters it client-side.
get_detailed_objects_physical: Gets a list of objects on tape, filtered by size or creation date.
  Returns one line for each instance on tape.
  Optional '-b' parameter to specify bucket name.
  Optional '--filter-params' to filter results.
  Use key:value pairs: key:value,key2:value2,...
  Legal keys:
    newerthan, olderthan: relative date from now, in format d1.h2.m3.s4 (zero values can be omitted; separate with '.')
    before, after: absolute UTC date, in format Y2016.M11.D9.h12.ZPDT (zero values or UTC time zone can be omitted; separate with '.')
    owner: owner name
    contains: string to match in object name
    largerthan, smallerthan: object size in bytes
  Note: '-b' restricts the values returned, whereas '--filter-params' transfers the (potentially large) object list and filters it client-side.
eject_storage_domain: Ejects all eligible tapes within the specified storage domain.
  Tapes are not eligible for ejection if mediaEjectionAllowed=FALSE for the storage domain.
  If a tape is being used for a job, it is ejected once it is no longer in use.
  Use the get_storage_domains command to retrieve a list of storage domains.
get_storage_domains: Gets information about all storage domains.
  Optional '-i' (UUID or name) restricts output to one storage domain.
  Optional '--writeOptimization' (capacity or performance) filters results to those matching the write optimization.
get_tape: Returns information on a single tape.
  If the tape has been ejected, the ejection information is also displayed.
  Requires the '-i' parameter to specify tape barcode or ID.
get_bucket_details: Returns bucket details by either UUID or bucket name.
  Requires the '-b' parameter to specify bucket name or UUID.
  Useful to get the name by ID or the ID by name.
  Use the get_service command to retrieve a list of buckets.
eject_tape: Ejects the tape uniquely identified by ID.
  Tapes are not eligible for ejection if mediaEjectionAllowed=FALSE for the storage domain.
  If a tape is being used for a job, it is ejected once it is no longer in use.
  Use the get_tapes command or get_detailed_objects_physical to find the tape ID.
modify_job
recover_put_bulk: Recovers a put_bulk job.
  Requires the '-b' parameter to specify bucket (name or UUID).
  Requires the '-d' parameter (unless input is piped) to specify source directory.
  Requires the '-i' parameter with the UUID for the interrupted or failed job.
  Optional '--file-metadata' flag archives file metadata with files.
  Other parameters should match the original put_bulk.
recover_get_bulk: Recovers a get_bulk job.
  Requires the '-b' parameter to specify bucket (name or UUID).
  Requires the '-i' parameter with the UUID for the interrupted or failed job.
  Optional '--file-metadata' flag restores file metadata to the values extant when archived.
  Other parameters should match the original get_bulk.
cancel_verify_all_tapes: Cancels a previous request to verify all the tapes in the DS3 appliance.
cancel_verify_tape: Cancels a previous request to verify a tape in the DS3 appliance.
  Requires the '-i' parameter to specify tape ID (barcode, name, or UUID).
get_pools: Returns all pools matching the optional filter criteria.
get_pool: Returns information on a single pool.
  Requires the '-i' parameter to specify pool name or ID.
cancel_verify_pool: Cancels a previous request to verify a pool in the DS3 appliance.
  Requires the '-i' parameter to specify pool ID (name or UUID).
cancel_verify_all_pools: Cancels a previous request to verify all the pools in the DS3 appliance.
recover: Recovers a failed or interrupted put_bulk or get_bulk job using recover files.
  Recover files are written to temp space on put_bulk and get_bulk.
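The relative-date syntax accepted by get_detailed_objects and get_detailed_objects_physical (d1.h2.m3.s4, with zero fields omitted) and the key:value,key2:value2 list syntax are easy to mistype by hand. The sketch below assumes only the formats described in the help text above; relative_age and kv_list are hypothetical helper names, not ds3_java_cli features.

```shell
#!/usr/bin/env bash
# relative_age: hypothetical helper. Formats days, hours, minutes and seconds
# as the relative-date string from the help text, e.g. 1 2 0 4 -> d1.h2.s4
relative_age() {
    local -a units=(d h m s) vals=("$@") parts=()
    local i
    for i in 0 1 2 3; do
        # Zero or missing fields are omitted, as the help text allows
        if [ "${vals[i]:-0}" -ne 0 ]; then
            parts+=("${units[i]}${vals[i]}")
        fi
    done
    # Join the remaining fields with '.'
    local IFS=.
    echo "${parts[*]}"
}

# kv_list: hypothetical helper. Joins alternating key/value arguments into
# the key:value,key2:value2 form, e.g. newerthan d7 -> newerthan:d7
kv_list() {
    local out=""
    while [ "$#" -ge 2 ]; do
        out+="${out:+,}$1:$2"   # add a comma before every pair except the first
        shift 2
    done
    echo "$out"
}
```

For example, assuming newerthan counts back from the current time as described above, this would list objects in your bucket created within the last 7 days and larger than 1 GiB (1073741824 bytes):

~$ ds3_java_cli -c get_detailed_objects -b "$s3bucket" --filter-params "$(kv_list newerthan "$(relative_age 7)" largerthan 1073741824)"

The same kv_list helper produces the key:value list expected by --modify-params.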
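Because recover_put_bulk expects its other parameters to match the original put_bulk, one way to avoid drift is to keep those arguments in a single bash array and reuse it for both commands. A minimal sketch: ARCHIVE_ARGS, is_uuid and recover_cmd are hypothetical names, the UUID check assumes the job ID is a standard 8-4-4-4-12 hexadecimal UUID (as listed by get_jobs), and recover_cmd only prints the command rather than running it.

```shell
#!/usr/bin/env bash
# Hypothetical: the arguments of the original job, run as
#   ds3_java_cli -c put_bulk "${ARCHIVE_ARGS[@]}"
ARCHIVE_ARGS=(--http -b my_bucket -d /home/myusername/archivedirectory/ -p /home/myusername/ -nt 6)

# is_uuid: succeeds if $1 looks like a canonical 8-4-4-4-12 hex UUID
is_uuid() {
    [[ "$1" =~ ^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$ ]]
}

# recover_cmd: prints a recover_put_bulk command that reuses ARCHIVE_ARGS
recover_cmd() {
    local job_id="$1"
    if ! is_uuid "$job_id"; then
        echo "not a job UUID: $job_id" >&2
        return 1
    fi
    printf '%s' ds3_java_cli
    # %q shell-quotes each argument so the line can be pasted back into a shell
    printf ' %q' -c recover_put_bulk -i "$job_id" "${ARCHIVE_ARGS[@]}"
    echo
}
```

Once you have the interrupted job's UUID from get_jobs, paste the printed command into your shell (inside screen or tmux).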