SciLo: Long Term Data Archiving
SciLo is a long-term archiving service at ACCRE based on the Spectra Logic BlackPearl converged storage solution. With SciLo you can archive data at very low cost and with minimal system administrator intervention. Data is moved from ACCRE (or anywhere) to SciLo and back with a command line client (ds3_java_cli) or, if you are using the portal, an available GUI (dsb-gui). Other options include Cyberduck and an API with a number of SDKs, depending on your expertise. All of this is based on Spectra S3, which uses the standard HTTP S3 command set plus expanded commands designed to optimize moving data objects to and from tape.
Getting Started
Initial sign-up for SciLo requires the creation of an account on our BlackPearl and the issuance of an ID and secret key. Open a helpdesk ticket with ACCRE requesting access. You will receive confirmation of account setup along with the ID and secret key.
Recommended Server
Archiving can take some time. As such, running on a regular gateway will not work. If your group has a custom gateway connected to the cluster, you can use that (we recommend always using screen or tmux so that you can log out and back in later).
We do have a gateway dedicated to archiving in the case where you don’t have a custom gateway to use. When your access is created your login credentials will work on that gateway and your ticket will be updated with that server’s login information.
Environment
Once you have access, add this information to your environment. The environment variables $DS3_ACCESS_KEY, $DS3_SECRET_KEY, and $DS3_ENDPOINT are special variables that ds3_java_cli uses by default. These can be overridden with options (see -a, -k, and -e below) if that fits your workflow better.
~ $ export DS3_ACCESS_KEY=<Assigned s3 id>
~ $ export DS3_SECRET_KEY=<Assigned secret s3 key>
~ $ export DS3_ENDPOINT=archive1.accre.vanderbilt.edu
~ $ export s3bucket=<Assigned bucket>
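Before starting a long transfer, it can help to confirm that the three variables above are actually set in the current shell. The following is a minimal sketch; the function name check_ds3_env is hypothetical and is not part of ds3_java_cli.

```shell
# Hypothetical helper: report which of the three DS3 variables are unset.
# It only inspects the environment; it does not contact the archive.
check_ds3_env() {
    missing=""
    for name in DS3_ACCESS_KEY DS3_SECRET_KEY DS3_ENDPOINT; do
        # Indirect lookup of the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            missing="$missing $name"
        fi
    done
    if [ -n "$missing" ]; then
        echo "missing:$missing" >&2
        return 1
    fi
    echo "DS3 environment looks complete"
}
```

Calling check_ds3_env after the export lines should print the confirmation message; a non-zero return status means at least one variable still needs to be exported.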
Load the Software
The command line client (and the GUI, if you are on the portal) is available through our Lmod setup. Run the following to load them into your environment:
~$ module load GCC
~$ module load scilo-cli # loads the command line client, ds3_java_cli
~$ module load scilo-gui # loads the GUI, dsb-gui, for the portal or X11 forwarding
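To confirm that a module actually placed a client on your PATH, a small check like the one below may be useful. This is a sketch only; require_cmd is a hypothetical helper name, and the hint in its error message assumes the module names shown above.

```shell
# Hypothetical helper: verify that a command is available on PATH.
require_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1 found at $(command -v "$1")"
    else
        echo "$1 not found; did you load the matching scilo module?" >&2
        return 1
    fi
}

# Example (run after 'module load scilo-cli'):
#   require_cmd ds3_java_cli
```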
Command Line
The command line interface is ds3_java_cli. Below are some examples:
~$ ds3_java_cli --help #displays a general help listing
~$ ds3_java_cli -c get_service #get a list of available buckets
+-------------------------------------------------------+--------------------------+
| Bucket Name | Creation Date |
+-------------------------------------------------------+--------------------------+
| my_bucket | 2019-03-07T00:08:24.000Z |
+-------------------------------------------------------+--------------------------+
~$ ds3_java_cli --http -c put_bulk -b mybucket -p /home/myusername/ -d /home/myusername/archivedirectory/ --sync -nt 6 --checksum
usage: ds3_java_cli
Option | Option Help |
---|---|
-a | Access Key ID or have "DS3_ACCESS_KEY" set as an environment variable |
-bs | Set the buffer size in bytes. The default is 1MB |
-c | The Command to execute. For Possible values, use '--help list_commands.' |
--debug | Debug (more verbose) output to console. |
-e | The ds3 endpoint to connect to or have "DS3_ENDPOINT" set as an environment variable. |
-h | Help Menu |
--help | Command Help (provide command name from -c) |
--http | Send all requests over standard HTTP |
--insecure | Ignore ssl certificate verification |
-k | Secret access key or have "DS3_SECRET_KEY" set as an environment variable |
--log-debug | Debug (more verbose) output to log file. |
--log-trace | Trace (most verbose) output to log file. |
--log-verbose | Log output to log file. |
--output-format | Configure how the output should be displayed. Possible values: [cli, json] |
-r | Specifies how many times puts and gets will be attempted before failing the request. The default is 5 |
--trace | Trace (most verbose) output to console. |
--verbose | Log output to console. |
--version | Print version information |
-x | The URL of the PROXY server to use or have "http_proxy" set as an environment variable |
Generally, a ds3_java_cli invocation follows this pattern:
ds3_java_cli -e <endpoint> -a <access key> -k <secret key> --http -c <command> -o <object, if used by command> -b <bucket, if used by command>
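As an illustration of that pattern, the sketch below wraps a put_bulk call in a small function. It assumes the DS3 environment variables are already exported, so -e, -a, and -k are omitted. The names archive_dir, DS3_CLI, and DRY_RUN are hypothetical conveniences and are not part of ds3_java_cli; only flags that appear on this page are used.

```shell
# Hypothetical wrapper: archive one directory into a bucket with put_bulk.
# DS3_CLI and DRY_RUN are illustrative names, not ds3_java_cli features.
archive_dir() {
    bucket=$1
    src_dir=$2
    threads=${3:-6}    # thread count for -nt; 6 mirrors the example above
    # Assemble the command as positional parameters to keep quoting intact.
    set -- "${DS3_CLI:-ds3_java_cli}" --http -c put_bulk \
        -b "$bucket" -d "$src_dir" --sync -nt "$threads"
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$@"      # print the command instead of running it
    else
        "$@"
    fi
}

# Example: preview the command without transferring anything.
#   DRY_RUN=1 archive_dir mybucket /home/myusername/archivedirectory/
```

Previewing with DRY_RUN set is a cheap way to check the bucket and path before starting a transfer that may run for hours inside screen or tmux.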
Available Commands
Command | Command Help |
---|---|
delete_bucket | Deletes an empty bucket. Requires the '-b' parameter to specify bucket (by name or UUID). Use the '--force' flag to delete a bucket and all its contents. Use the get_service command to retrieve a list of buckets |
delete_folder | Deletes a folder and all its contents. Requires the '-b' parameter to specify bucket (name or UUID). Requires the '-d' parameter to specify folder name |
delete_job | Terminates and removes a current job. Requires the '-i' parameter with the UUID of the job. Use the '--force' flag to remove objects already loaded into cache. Use the get_jobs command to retrieve a list of jobs |
delete_object | Permanently deletes an object. Requires the '-b' parameter to specify bucket name. Requires the '-i' parameter to specify object name (UUID or name). Use the get_service command to retrieve a list of buckets. Use the get_bucket command to retrieve a list of objects |
delete_tape | Deletes the specified tape which has been permanently lost from the BlackPearl database. Any data lost as a result is marked degraded to trigger a rebuild. Requires the '-i' parameter to specify tape ID (UUID or barcode). Use the get_tapes command to retrieve a list of tapes |
delete_tape_drive | Deletes the specified offline tape drive. This request is useful when a tape drive is permanently removed from a partition. Requires the '-i' parameter to specify tape drive ID. Use the get_tape_drives command to retrieve a list of tape drives |
delete_tape_failure | Deletes a tape failure from the failure list. Requires the '-i' parameter to specify tape failure ID (UUID). Use the get_tape_failure command to retrieve a list of IDs |
delete_tape_partition | Deletes the specified offline tape partition from the BlackPearl gateway configuration. Any tapes in the partition that have data on them are disassociated from the partition. Any tapes without data on them and all tape drives associated with the partition are deleted from the BlackPearl gateway configuration. This request is useful if the partition should never have been associated with the BlackPearl gateway or if the partition was deleted from the library. Requires the '-i' parameter to specify tape partition |
get_bucket | Returns bucket details plus a list of objects contained. Requires the '-b' parameter to specify bucket name or UUID. Use the get_service command to retrieve a list of buckets |
get_bulk | Retrieve multiple objects from a bucket. Requires the '-b' parameter to specify bucket (name or UUID). Optional '-d' parameter to specify restore directory (default '.'). Optional '-p' parameter to specify prefix or directory name. Separate multiple values with spaces, e.g., -p prefix1 prefix2. Optional '--sync' flag to retrieve only newer or non-extant files. Optional '--file-metadata' flag restores file metadata to the values extant when archived. Optional '-nt' parameter to specify number of threads |
system_information | Retrieves basic system information: software version, build, and system serial number. Useful to test communication |
get_config_summary | Runs multiple commands to capture configuration information |
get_data_policy | Returns information about the specified data policy. Requires the '-i' parameter to specify data policy (UUID or name). Use the get_data_policies command to retrieve a list of policies |
get_data_policies | Returns information about all data policies |
get_job | Retrieves information about a current job. Requires the '-i' parameter with the UUID of the job. Use the get_jobs command to retrieve a list of jobs |
get_jobs | Retrieves a list of all current jobs |
get_object | Retrieves a single object from a bucket. Requires the '-b' parameter to specify bucket (name or UUID). Requires the '-o' parameter to specify object (name or UUID). Optional '-d' parameter to specify restore directory (default '.'). Optional '--sync' flag to retrieve only newer or non-extant files. Optional '--file-metadata' flag restores file metadata to the values extant when archived. Optional '-nt' parameter to specify number of threads. Use the get_service command to retrieve a list of buckets. Use the get_bucket command to retrieve a list of objects |
get_objects_on_tape | Returns a list of the contents of a single tape. Requires the '-i' parameter to specify tape (barcode or UUID). Use the get_tapes command to retrieve a list of tapes |
get_physical_placement | Returns the location of a single object on tape. Requires the '-b' parameter to specify bucket (name or UUID). Requires the '-o' parameter to specify object (name or UUID). Use the get_service command to retrieve a list of buckets. Use the get_bucket command to retrieve a list of objects |
get_service | Returns a list of buckets on the device |
get_tape_failure | Returns a list of tape failures |
get_tapes | Returns a list of all tapes |
get_user | Returns information about an individual user. Requires the '-i' parameter to specify user (name or UUID). Use the get_users command to retrieve a list of users |
get_users | Returns a list of all users |
head_object | Returns metadata but does not retrieve an object from a bucket. Requires the '-b' parameter to specify bucket (name or UUID). Requires the '-o' parameter to specify object (name or UUID). Useful to determine if an object exists and you have permission to access it |
modify_data_policy | Alter parameters for the specified data policy. Requires the '-i' parameter to specify data policy (UUID or name). Requires the '--modify-params' parameter to be set. Use key:value pairs (key:value,key2:value2, ...). Legal values: name, checksum_type, default_blob_size, default_get_job_priority, default_put_job_priority, default_verify_job_priority, rebuild_priority, end_to_end_crc_required, versioning (see API documentation for possible values). Use the get_data_policies command to retrieve a list of policies and current values |
modify_user | Alters information about an individual user. Requires the '-i' parameter to specify user (name or UUID). Requires the '--modify-params' parameter to be set. Use key:value pairs (key:value,key2:value2, ...). Legal values: default_data_policy_id. Use the get_users command to retrieve a list of users |
performance | For internal testing. Generates mock file streams for put, and a discard (/dev/null) stream for get. Useful for testing network and system performance. Requires the '-b' parameter with a unique bucket name to be used for the test. Requires the '-n' parameter with the number of files to be used for the test. Requires the '-s' parameter with the size of each file in MB for the test. Optional '-bs' parameter with the buffer size in bytes (default 1MB). Optional '-nt' parameter with the number of threads |
put_bucket | Create a new empty bucket. Requires the '-b' parameter to specify bucket name |
put_bulk | Put multiple objects from a directory or pipe into a bucket. Requires the '-b' parameter to specify bucket (name or UUID). Requires the '-d' parameter (unless input is piped) to specify source directory. Optional '-p' parameter (unless input is piped) to specify prefix or directory name. Optional '--sync' flag to put only newer or non-extant files. Optional '--file-metadata' flag archives file metadata with files. Optional '-nt' parameter to specify number of threads. Optional '--ignore-errors' flag to continue on errors. Optional '--follow-symlinks' flag to follow symlinks (default is disregard) |
reclaim_cache | Forces a full reclaim of all caches, and waits until the reclaim completes. Cache contents that need to be retained because they are a part of an active job are retained. Any cache contents that can be reclaimed will be. This operation may take a very long time to complete, depending on how much of the cache can be reclaimed and how many blobs the cache is managing |
verify_bulk_job | A verify job reads data from the permanent data store and verifies that the CRC of the data read matches the expected CRC. Verify jobs ALWAYS read from the data store, even if the data currently resides in cache. Requires the '-b' parameter to specify bucket (name or UUID). Requires the '-o' parameter to specify object (name or UUID). Optional '-p' parameter to specify prefix or directory name |
get_data_path_backend | Gets configuration information about the data path backend |
get_cache_state | Gets the utilization and state information for all cache filesystems |
get_system_failure | |
get_capacity_summary | Get a summary of the BlackPearl Deep Storage Gateway system-wide capacity |
verify_system_health | Verifies that the system appears to be online and functioning normally and that there is adequate free space for the database file system |
verify_all_tapes | Verify the integrity of all the tapes in the BlackPearl |
verify_tape | |
get_suspect_objects | |
get_suspect_blob_tapes | |
modify_data_path | |
verify_pool | |
verify_all_pools | |
get_detailed_objects | Filter an object list by size or creation date. Returns one line for each object. Optional '-b' bucket_name. Optional '--filter-params' to filter results. Use key:value pairs (key:value,key2:value2, ...). Legal values: newerthan, olderthan: relative date from now in format d1.h2.m3.s4 (zero values can be omitted, separate with '.'); before, after: absolute UTC date in format Y2016.M11.D9.h12.ZPDT (zero values or UTC time zone can be omitted, separate with '.'); owner: owner name; contains: string to match in object name; largerthan, smallerthan: object size in bytes. Note: bucket will restrict values returned; filter-params will transfer the (potentially large) object list and filter client-side |
get_detailed_objects_physical | Get a list of objects on tape, filtered by size or creation date. Returns one line for each instance on tape. Optional '-b' bucket_name. Optional '--filter-params' to filter results. Use key:value pairs (key:value,key2:value2, ...). Legal values: newerthan, olderthan: relative date from now in format d1.h2.m3.s4 (zero values can be omitted, separate with '.'); before, after: absolute UTC date in format Y2016.M11.D9.h12.ZPDT (zero values or UTC time zone can be omitted, separate with '.'); owner: owner name; contains: string to match in object name; largerthan, smallerthan: object size in bytes. Note: bucket will restrict values returned; filter-params will transfer the (potentially large) object list and filter client-side |
eject_storage_domain | Ejects all eligible tapes within the specified storage domain. Tapes are not eligible for ejection if mediaEjectionAllowed=FALSE for the storage domain. If a tape is being used for a job, it is ejected once it is no longer in use. Use the get_storage_domains command to retrieve a list of storage domains |
get_storage_domains | Get information about all storage domains. Optional -i (UUID or name) restricts output to one storage domain. Optional --writeOptimization (capacity or performance) filters results to those matching write optimization |
get_tape | Returns information on a single tape. If the tape has been ejected, then the ejection information will also be displayed. Required '-i' tape barcode or i |
get_bucket_details | Returns bucket details by either UUID or bucket name. Requires the '-b' parameter to specify bucket name or UUID. Useful to get name by ID or ID by name. Use the get_service command to retrieve a list of buckets |
eject_tape | Ejects the tape uniquely identified by ID. Tapes are not eligible for ejection if mediaEjectionAllowed=FALSE for the storage domain. If a tape is being used for a job, it is ejected once it is no longer in use. Use the get_tapes command or get_detailed_objects_physical to find tape id |
modify_job | |
recover_put_bulk | Recovers a put_bulk job. Requires the '-b' parameter to specify bucket (name or UUID). Requires the '-d' parameter (unless input is piped) to specify source directory. Requires the '-i' parameter with the UUID for the interrupted or failed job. Optional '--file-metadata' flag archives file metadata with files. Other parameters should match the original put_bulk |
recover_get_bulk | Recovers a get_bulk job. Requires the '-b' parameter to specify bucket (name or UUID). Requires the '-i' parameter with the UUID for the interrupted or failed job. Optional '--file-metadata' flag restores file metadata to the values extant when archived. Other parameters should match the original get_bulk |
cancel_verify_all_tapes | Cancel a previous request to verify all the tapes in the DS3 appliance |
cancel_verify_tape | Cancel a previous request to verify a tape in the DS3 appliance. Required '-i' tape id (barcode, name, UUID) |
get_pools | Returns all pools matching option filter criteria |
get_pool | Returns information on a single pool. Required '-i' pool name or i |
cancel_verify_pool | Cancel a previous request to verify a pool in the DS3 appliance. Required '-i' pool id (name or UUID) |
cancel_verify_all_pools | Cancel previous request to verify all the pools in the DS3 appliance |
recover | Recover a failed or interrupted put_bulk or get_bulk job using recover files. Recover files are written to temp space on put_bulk and get_bulk |
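The '--filter-params' syntax described for get_detailed_objects takes comma-separated key:value pairs, with sizes in bytes and relative dates such as d7. The sketch below builds such a string from friendlier units. The helper name build_filter is hypothetical, and the exact date syntax accepted by your client version is an assumption based on the table above.

```shell
# Hypothetical helper: build a --filter-params value that matches objects
# larger than a size given in MiB and newer than a given number of days.
build_filter() {
    size_mib=$1
    days=$2
    bytes=$((size_mib * 1024 * 1024))   # largerthan expects bytes
    echo "largerthan:${bytes},newerthan:d${days}"
}

# Example: objects over 100 MiB archived within the last 7 days.
#   ds3_java_cli --http -c get_detailed_objects -b "$s3bucket" \
#       --filter-params "$(build_filter 100 7)"
```

Because filtering happens client-side, passing '-b' to restrict the listing to one bucket keeps the transferred object list smaller.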