Pluggable On-Disk Back-end APIs

The internal REST API used between the proxy server and the account, container and object servers is almost identical to the public Swift REST API, but with a few internal extensions (for example, updating an account with a new container).

The pluggable back-end APIs for the three REST API servers (account, container, object) abstract the needs of servicing the various REST APIs from the details of how data is laid out and stored on-disk.

The APIs are documented in the reference implementations for all three servers. For historical reasons, the object server backend reference implementation module is named diskfile, while the account and container server backend reference implementation modules are named backend (swift.account.backend and swift.container.backend).

This API is still under development and not yet finalized.

Back-end API for Account Server REST APIs

Pluggable Back-end for Account Server

class swift.account.backend.AccountBroker(db_file, timeout=25, logger=None, account=None, container=None, pending_timeout=None, stale_reads_ok=False)

Encapsulates working with an account database.

create_account_stat_table(conn, put_timestamp)

Create account_stat table which is specific to the account DB. Not a part of Pluggable Back-ends, internal to the baseline code.

Parameters:
  • conn – DB connection object
  • put_timestamp – put timestamp
create_container_table(conn)

Create container table which is specific to the account DB.

Parameters: conn – DB connection object
create_policy_stat_table(conn)

Create policy_stat table which is specific to the account DB. Not a part of Pluggable Back-ends, internal to the baseline code.

Parameters: conn – DB connection object
empty()

Check if the account DB is empty.

Returns: True if the database has no active containers.
get_info()

Get global data for the account.

Returns: dict with keys: account, created_at, put_timestamp, delete_timestamp, status_changed_at, container_count, object_count, bytes_used, hash, id
get_policy_stats(do_migrations=False)

Get global policy stats for the account.

Parameters: do_migrations – boolean, if True the policy stat dicts will always include the ‘container_count’ key; otherwise it may be omitted on legacy databases until they are migrated.
Returns: dict of policy stats where the key is the policy index and the value is a dictionary like {‘object_count’: M, ‘bytes_used’: N, ‘container_count’: L}
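As a sketch of how a caller might consume the get_policy_stats() return value, the hypothetical helper total_policy_stats below (not part of Swift) sums the per-policy dictionaries into account-wide totals. Because the ‘container_count’ key may be missing on unmigrated legacy databases, the sketch reports that total as None rather than guessing. The sample numbers are made up.

```python
from typing import Dict, Optional


def total_policy_stats(
        policy_stats: Dict[int, Dict[str, int]]) -> Dict[str, Optional[int]]:
    """Sum per-policy stats into account-wide totals (hypothetical helper)."""
    entries = list(policy_stats.values())
    totals: Dict[str, Optional[int]] = {
        "object_count": sum(s["object_count"] for s in entries),
        "bytes_used": sum(s["bytes_used"] for s in entries),
    }
    if all("container_count" in s for s in entries):
        totals["container_count"] = sum(s["container_count"] for s in entries)
    else:
        # Legacy, unmigrated databases may omit container_count entirely.
        totals["container_count"] = None
    return totals


if __name__ == "__main__":
    stats = {
        0: {"object_count": 10, "bytes_used": 2048, "container_count": 2},
        1: {"object_count": 5, "bytes_used": 100, "container_count": 1},
    }
    print(total_policy_stats(stats))
```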
is_status_deleted()

Returns True only if the status field is set to DELETED.

list_containers_iter(limit, marker, end_marker, prefix, delimiter)

Get a list of containers sorted by name starting at marker onward, up to limit entries. Entries will begin with the prefix and will not have the delimiter after the prefix.

Parameters:
  • limit – maximum number of entries to get
  • marker – marker query
  • end_marker – end marker query
  • prefix – prefix query
  • delimiter – delimiter for query
Returns:

list of tuples of (name, object_count, bytes_used, 0)
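The marker, end_marker, prefix and delimiter semantics described above can be modeled over a plain sorted list of names. The following is a hypothetical in-memory sketch, not the SQL query AccountBroker runs: it returns (name, is_subdir) pairs instead of the real tuples, and it assumes, following Swift's listing behavior, that names containing the delimiter after the prefix are rolled up into a single pseudo-directory entry ending in the delimiter.

```python
from typing import List, Tuple


def list_names(names: List[str], limit: int, marker: str = "",
               end_marker: str = "", prefix: str = "",
               delimiter: str = "") -> List[Tuple[str, bool]]:
    """Return up to `limit` (name, is_subdir) pairs in sorted order."""
    results: List[Tuple[str, bool]] = []
    for name in sorted(names):
        if len(results) >= limit:
            break
        if marker and name <= marker:
            continue  # listing starts strictly after the marker
        if end_marker and name >= end_marker:
            break  # listing stops before the end marker
        if not name.startswith(prefix):
            continue
        if delimiter:
            idx = name.find(delimiter, len(prefix))
            if idx >= 0:
                # Collapse everything past the delimiter into one subdir
                # entry; names sharing it are adjacent in sorted order.
                subdir = name[:idx + len(delimiter)]
                if not results or results[-1][0] != subdir:
                    results.append((subdir, True))
                continue
        results.append((name, False))
    return results
```

For example, listing ["photos/2014/a.jpg", "photos/2015/b.jpg", "photos/readme"] with prefix "photos/" and delimiter "/" yields two subdir entries followed by "photos/readme".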

merge_items(item_list, source=None)

Merge items into the container table.

Parameters:
  • item_list – list of dictionaries of {‘name’, ‘put_timestamp’, ‘delete_timestamp’, ‘object_count’, ‘bytes_used’, ‘deleted’, ‘storage_policy_index’}
  • source – if defined, update incoming_sync with the source
put_container(name, put_timestamp, delete_timestamp, object_count, bytes_used, storage_policy_index)

Create a container with the given attributes.

Parameters:
  • name – name of the container to create
  • put_timestamp – put_timestamp of the container to create
  • delete_timestamp – delete_timestamp of the container to create
  • object_count – number of objects in the container
  • bytes_used – number of bytes used by the container
  • storage_policy_index – the storage policy for this container

Back-end API for Container Server REST APIs

Pluggable Back-ends for Container Server

class swift.container.backend.ContainerBroker(db_file, timeout=25, logger=None, account=None, container=None, pending_timeout=None, stale_reads_ok=False)

Encapsulates working with a container database.

create_container_info_table(conn, put_timestamp, storage_policy_index)

Create the container_info table which is specific to the container DB. Not a part of Pluggable Back-ends, internal to the baseline code. Also creates the container_stat view.

Parameters:
  • conn – DB connection object
  • put_timestamp – put timestamp
  • storage_policy_index – storage policy index
create_object_table(conn)

Create the object table which is specific to the container DB. Not a part of Pluggable Back-ends, internal to the baseline code.

Parameters:conn – DB connection object
create_policy_stat_table(conn, storage_policy_index=0)

Create policy_stat table.

Parameters:
  • conn – DB connection object
  • storage_policy_index – the policy_index the container is being created with
delete_object(name, timestamp, storage_policy_index=0)

Mark an object deleted.

Parameters:
  • name – object name to be deleted
  • timestamp – timestamp when the object was marked as deleted
  • storage_policy_index – the storage policy index for the object
empty()

Check if the container DB is empty.

Returns: True if the database has no active objects, False otherwise
get_info()

Get global data for the container.

Returns: dict with keys: account, container, created_at, put_timestamp, delete_timestamp, status_changed_at, object_count, bytes_used, reported_put_timestamp, reported_delete_timestamp, reported_object_count, reported_bytes_used, hash, id, x_container_sync_point1, x_container_sync_point2, and storage_policy_index.
get_info_is_deleted()

Get the is_deleted status and info for the container.

Returns: a tuple of the form (info, is_deleted), where info is a dict as returned by get_info and is_deleted is a boolean.
get_misplaced_since(start, count)

Get a list of objects which are in a storage policy different from the container’s storage policy.

Parameters:
  • start – last reconciler sync point
  • count – maximum number of entries to get
Returns:

list of dicts with keys: name, created_at, size, content_type, etag, storage_policy_index
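A rough in-memory model of get_misplaced_since() may help clarify the start and count parameters. This sketch assumes that the reconciler sync point is a database row id (modeled here as a hypothetical "ROWID" key on each row) and that rows are scanned in row-id order; the function name misplaced_since is illustrative only.

```python
from typing import Dict, List


def misplaced_since(rows: List[Dict], container_policy_index: int,
                    start: int, count: int) -> List[Dict]:
    """Return up to `count` rows after `start` whose policy differs.

    Assumption: each row dict carries a 'ROWID' standing in for the
    database row id used as the reconciler sync point.
    """
    misplaced = [
        row for row in sorted(rows, key=lambda r: r["ROWID"])
        if row["ROWID"] > start
        and row["storage_policy_index"] != container_policy_index
    ]
    return misplaced[:count]
```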

list_objects_iter(limit, marker, end_marker, prefix, delimiter, path=None, storage_policy_index=0)

Get a list of objects sorted by name starting at marker onward, up to limit entries. Entries will begin with the prefix and will not have the delimiter after the prefix.

Parameters:
  • limit – maximum number of entries to get
  • marker – marker query
  • end_marker – end marker query
  • prefix – prefix query
  • delimiter – delimiter for query
  • path – if defined, will set the prefix and delimiter based on the path
  • storage_policy_index – storage policy index for the query
Returns:

list of tuples of (name, created_at, size, content_type, etag)
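The path parameter acts as shorthand for a prefix plus a "/" delimiter. The sketch below isolates that derivation; the function name prefix_and_delimiter is hypothetical, and the normalization (collapsing trailing slashes to exactly one) is an assumption about how the reference implementation treats the path.

```python
from typing import Optional, Tuple


def prefix_and_delimiter(prefix: str, delimiter: str,
                         path: Optional[str]) -> Tuple[str, str]:
    """Derive the effective (prefix, delimiter) for an object listing."""
    if path is not None:
        prefix = path
        if path:
            # Normalize so the prefix ends in exactly one '/'.
            prefix = path.rstrip("/") + "/"
        # Listing by path only returns direct children of that "directory".
        delimiter = "/"
    return prefix, delimiter
```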

merge_items(item_list, source=None)

Merge items into the object table.

Parameters:
  • item_list – list of dictionaries of {‘name’, ‘created_at’, ‘size’, ‘content_type’, ‘etag’, ‘deleted’, ‘storage_policy_index’}
  • source – if defined, update incoming_sync with the source
put_object(name, timestamp, size, content_type, etag, deleted=0, storage_policy_index=0)

Creates an object in the DB with its metadata.

Parameters:
  • name – object name to be created
  • timestamp – timestamp of when the object was created
  • size – object size
  • content_type – object content-type
  • etag – object etag
  • deleted – if True, marks the object as deleted and sets the deleted_at timestamp to timestamp
  • storage_policy_index – the storage policy index for the object
reported(put_timestamp, delete_timestamp, object_count, bytes_used)

Update reported stats, available with container’s get_info.

Parameters:
  • put_timestamp – put_timestamp to update
  • delete_timestamp – delete_timestamp to update
  • object_count – object_count to update
  • bytes_used – bytes_used to update
set_storage_policy_index(policy_index, timestamp=None)

Update the container_stat policy_index and status_changed_at.

Back-end API for Object Server REST APIs

Disk File Interface for the Swift Object Server

The DiskFile, DiskFileWriter and DiskFileReader classes together define the on-disk abstraction layer for supporting the object server REST API interfaces (excluding REPLICATE). Other implementations wishing to provide an alternative backend for the object server must implement all three classes. An example alternative implementation can be found in the mem_server.py and mem_diskfile.py modules alongside this one.

The DiskFileManager is a reference implementation specific class and is not part of the backend API.

The remaining methods in this module are considered implementation specific and are also not considered part of the backend API.

class swift.obj.diskfile.AuditLocation(path, device, partition, policy)

Represents an object location to be audited.

Other than being a bucket of data, the only useful thing this does is stringify to a filesystem path so the auditor’s logs look okay.

class swift.obj.diskfile.BaseDiskFile(mgr, device_path, threadpool, partition, account=None, container=None, obj=None, _datadir=None, policy=None, use_splice=False, pipe_size=None, **kwargs)

Manage object files.

This specific implementation manages object files on a disk formatted with a POSIX-compliant file system that supports extended attributes as metadata on a file or directory.

Note

The arguments to the constructor are considered implementation specific. The API does not define the constructor arguments.

The following path format is used for data file locations: <devices_path>/<device_dir>/<datadir>/<partdir>/<suffixdir>/<hashdir>/<datafile>.<ext>

Parameters:
  • mgr – associated DiskFileManager instance
  • device_path – path to the target device or drive
  • threadpool – thread pool to use for blocking operations
  • partition – partition on the device in which the object lives
  • account – account name for the object
  • container – container name for the object
  • obj – object name for the object
  • _datadir – override the full datadir otherwise constructed here
  • policy – the StoragePolicy instance
  • use_splice – if true, use zero-copy splice() to send data
  • pipe_size – size of pipe buffer used in zero-copy operations
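To make the data file path format given above for BaseDiskFile concrete, the sketch below assembles a path from its components. It assumes that the datadir is objects for policy 0 and objects-<index> for other policies (consistent with the objects-5 example under extract_policy), and that the suffix directory is the last three hex characters of the object's name hash. Computing the hash and the partition requires the ring and cluster configuration, so both are taken as inputs; data_file_path is a hypothetical name.

```python
import posixpath


def data_file_path(devices_path: str, device_dir: str, policy_index: int,
                   partition: str, name_hash: str, timestamp: str,
                   ext: str) -> str:
    """Build <devices_path>/<device_dir>/<datadir>/<partdir>/<suffixdir>/
    <hashdir>/<datafile>.<ext> (illustrative sketch)."""
    # Assumed datadir naming: "objects" for policy 0, "objects-N" otherwise.
    datadir = "objects" if policy_index == 0 else "objects-%d" % policy_index
    # Assumed suffix directory: last three hex digits of the name hash.
    suffix = name_hash[-3:]
    return posixpath.join(devices_path, device_dir, datadir, partition,
                          suffix, name_hash, timestamp + ext)
```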
create(size=None)

Context manager to create a file. We create a temporary file first, and then return a DiskFileWriter object to encapsulate the state.

Note

An implementation is not required to perform on-disk preallocations even if the parameter is specified. But if it does and it fails, it must raise a DiskFileNoSpace exception.

Parameters: size – optional initial size of file to explicitly allocate on disk
Raises DiskFileNoSpace:
 if a size is specified and allocation fails
delete(timestamp)

Delete the object.

This implementation creates a tombstone file using the given timestamp, and removes any older versions of the object file. Any file that has an older timestamp than timestamp will be deleted.

Note

An implementation is free to use or ignore the timestamp parameter.

Parameters: timestamp – timestamp to compare with each file
Raises DiskFileError:
 this implementation will raise the same errors as the create() method.
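The delete behavior described above (write a tombstone named after the delete timestamp, then remove files older than it) can be sketched as a pure planning step. This is a simplification: plan_delete is a hypothetical name, and timestamps are compared as floats parsed from the file names, whereas real Swift timestamps may also carry an offset suffix.

```python
import os
from typing import List, Tuple


def plan_delete(filenames: List[str], timestamp: str) -> Tuple[str, List[str]]:
    """Return (tombstone_name, files_to_remove) for a delete at `timestamp`."""
    tombstone = timestamp + ".ts"
    cutoff = float(timestamp)
    # Any file whose timestamp is older than the delete timestamp goes away.
    older = [name for name in filenames
             if float(os.path.splitext(name)[0]) < cutoff]
    return tombstone, sorted(older)
```

A newer .meta file (here 0000002500.00000.meta in a delete at 0000002000.00000) is left for the regular cleanup logic to judge.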
get_datafile_metadata()

Provide the datafile metadata for a previously opened object as a dictionary. This is metadata that was included when the object was first PUT, and does not include metadata set by any subsequent POST.

Returns: object’s datafile metadata dictionary
Raises DiskFileNotOpen:
 if the swift.obj.diskfile.DiskFile.open() method was not previously invoked
get_metadata()

Provide the metadata for a previously opened object as a dictionary.

Returns: object’s metadata dictionary
Raises DiskFileNotOpen:
 if the swift.obj.diskfile.DiskFile.open() method was not previously invoked
get_metafile_metadata()

Provide the metafile metadata for a previously opened object as a dictionary. This is metadata that was written by a POST and does not include any persistent metadata that was set by the original PUT.

Returns: object’s .meta file metadata dictionary, or None if there is no .meta file
Raises DiskFileNotOpen:
 if the swift.obj.diskfile.DiskFile.open() method was not previously invoked
open()

Open the object.

This implementation opens the data file representing the object, reads the associated metadata from the extended attributes, and combines it with any metadata from fast-POST .meta files.

Note

An implementation is allowed to raise any of the following exceptions, but is only required to raise DiskFileNotExist when the object representation does not exist.

Raises:
  • DiskFileCollision – on name mismatch with metadata
  • DiskFileNotExist – if the object does not exist
  • DiskFileDeleted – if the object was previously deleted
  • DiskFileQuarantined – if, while reading the file’s metadata, some data did not pass cross checks
Returns:

itself for use as a context manager

read_metadata()

Return the metadata for an object without requiring the caller to open the object first.

Returns: metadata dictionary for an object
Raises DiskFileError:
 this implementation will raise the same errors as the open() method.
reader(keep_cache=False, _quarantine_hook=<function <lambda>>)

Return a swift.common.swob.Response class compatible “app_iter” object as defined by swift.obj.diskfile.DiskFileReader.

For this implementation, the responsibility of closing the open file is passed to the swift.obj.diskfile.DiskFileReader object.

Parameters:
  • keep_cache – caller’s preference for keeping data read in the OS buffer cache
  • _quarantine_hook – 1-arg callable called when obj quarantined; the arg is the reason for quarantine. Default is to ignore it. Not needed by the REST layer.
Returns:

a swift.obj.diskfile.DiskFileReader object

write_metadata(metadata)

Write a block of metadata to an object without requiring the caller to create the object first. Supports fast-POST behavior semantics.

Parameters: metadata – dictionary of metadata to be associated with the object
Raises DiskFileError:
 this implementation will raise the same errors as the create() method.
class swift.obj.diskfile.BaseDiskFileManager(conf, logger)

Management class for devices, providing a common place for shared parameters and methods not provided by the DiskFile class (which primarily services the object server REST API layer).

The get_diskfile() method is how this implementation creates a DiskFile object.

Note

This class is reference implementation specific and not part of the pluggable on-disk backend API.

Note

TODO(portante): Not sure what the right name to recommend here is, as “manager” seemed generic enough, though suggestions are welcome.

Parameters:
  • conf – caller provided configuration object
  • logger – caller provided logger
cleanup_ondisk_files(hsh_path, reclaim_age=604800, **kwargs)

Clean up on-disk files that are obsolete and gather the set of valid on-disk files for an object.

Parameters:
  • hsh_path – object hash path
  • reclaim_age – age in seconds at which to remove tombstones
  • frag_index – if set, search for a specific fragment index .data file, otherwise accept the first valid .data file
Returns:

a dict that may contain: valid on disk files keyed by their filename extension; a list of obsolete files stored under the key ‘obsolete’; a list of files remaining in the directory, reverse sorted, stored under the key ‘files’.

construct_dev_path(device)

Construct the path to a device without checking if it is mounted.

Parameters: device – name of target device
Returns: full path to the device
gather_ondisk_files(files, include_obsolete=False, verify=False, **kwargs)

Given a simple list of file names, iterate over them to determine the files that constitute a valid object, and optionally determine the files that are obsolete and could be deleted. Note that some files may fall into neither category.

Parameters:
  • files – a list of file names.
  • include_obsolete – By default the iteration will stop when a valid file set has been found. Setting this argument to True will cause the iteration to continue in order to find all obsolete files.
  • verify – if True verify that the ondisk file contract has not been violated, otherwise do not verify.
Returns:

a dict that may contain: valid on disk files keyed by their filename extension; a list of obsolete files stored under the key ‘obsolete’.
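gather_ondisk_files is policy specific. As a deliberately simplified illustration of the replication-policy case only, the hypothetical gather_files below assumes fixed-width timestamp file names (so reverse lexicographic order is newest first), lets the newest .data or .ts file win, keeps the newest .meta file newer than the winner, and marks everything older as obsolete. It does not model EC fragment indexes or .durable files, which the reference implementation handles.

```python
import os
from typing import Dict, List, Union


def gather_files(files: List[str]) -> Dict[str, Union[str, List[str]]]:
    """Simplified, replication-only choice of an object's valid files."""
    valid: Dict[str, Union[str, List[str]]] = {}
    obsolete: List[str] = []
    for name in sorted(files, reverse=True):  # newest first
        ext = os.path.splitext(name)[1]
        if ".data" in valid or ".ts" in valid:
            obsolete.append(name)  # older than the winning .data/.ts file
        elif ext == ".meta":
            if ".meta" in valid:
                obsolete.append(name)  # superseded by a newer .meta file
            else:
                valid[".meta"] = name
        elif ext in (".data", ".ts"):
            valid[ext] = name
        # Files with other extensions newer than the winner are ignored.
    valid["obsolete"] = obsolete
    return valid
```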

get_dev_path(device, mount_check=None)

Return the path to a device, first checking, depending on the mount_check configuration option, that it is either a proper mount point or at least a directory.

Parameters:
  • device – name of target device
  • mount_check – whether or not to check mountedness of device. Defaults to bool(self.mount_check).
Returns:

full path to the device, None if the path to the device is not a proper mount point or directory.
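The mount check described above can be sketched with the standard library's os.path.ismount and os.path.isdir. This free function is a hypothetical stand-in for the manager method: it takes the devices root explicitly and a plain boolean instead of reading self.mount_check.

```python
import os
from typing import Optional


def get_dev_path(devices: str, device: str, mount_check: bool) -> Optional[str]:
    """Return devices/device if it passes the check, else None (sketch)."""
    path = os.path.join(devices, device)
    if mount_check:
        ok = os.path.ismount(path)  # must be a real mount point
    else:
        ok = os.path.isdir(path)  # a plain directory is good enough
    return path if ok else None
```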

get_diskfile_from_hash(device, partition, object_hash, policy, **kwargs)

Returns a DiskFile instance for an object at the given object_hash. Note for anyone refactoring this method: DiskFileDeleted must not be raised; the DiskFile instance representing the tombstoned object is returned instead.

Raises DiskFileNotExist:
 if the object does not exist
get_ondisk_files(files, datadir, **kwargs)

Given a simple list of file names, determine the files to use.

Parameters:
  • files – simple set of files as a python list
  • datadir – directory name files are from for convenience
Returns:

dict of files to use having keys ‘data_file’, ‘ts_file’, ‘meta_file’ and optionally other policy specific keys

hash_cleanup_listdir(hsh_path, reclaim_age=604800)

List contents of a hash directory and clean up any old files. For EC policy, delete files older than a .durable or .ts file.

Parameters:
  • hsh_path – object hash path
  • reclaim_age – age in seconds at which to remove tombstones
Returns:

list of files remaining in the directory, reverse sorted

parse_on_disk_filename(filename)

Parse an on disk file name.

Parameters: filename – the data file name including extension
Returns: a dict with keys for timestamp and ext:
  • timestamp is a swift.common.utils.Timestamp
  • ext is a string, the file extension including the leading dot, or the empty string if the filename has no extension.

Subclasses may add further keys to the returned dict.

Raises DiskFileError:
 if any part of the filename is not able to be validated.
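A minimal sketch of the parsing described above, using os.path.splitext to separate the timestamp from the extension. The function name parse_filename is hypothetical, it returns the timestamp as a float rather than a swift.common.utils.Timestamp, and it raises ValueError where the reference implementation raises DiskFileError.

```python
import os


def parse_filename(filename: str) -> dict:
    """Split e.g. '1401811134.873649.data' into timestamp and ext."""
    base, ext = os.path.splitext(filename)
    try:
        timestamp = float(base)
    except ValueError:
        raise ValueError("Invalid timestamp in file name: %r" % filename) from None
    return {"timestamp": timestamp, "ext": ext}
```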
replication_lock(device)

A context manager that will lock on the device given, if configured to do so.

Raises ReplicationLockTimeout:
 If the lock on the device cannot be granted within the configured timeout.
yield_hashes(device, partition, policy, suffixes=None, **kwargs)

Yields tuples of (full_path, hash_only, timestamps) for object information stored for the given device, partition, and (optionally) suffixes. If suffixes is None, all stored suffixes will be searched for object hashes. Note that if suffixes is not None but empty, such as [], then nothing will be yielded.

timestamps is a dict which may contain items mapping:
  • ts_data – timestamp of the data or tombstone file
  • ts_meta – timestamp of the meta file, if one exists

where timestamps are instances of Timestamp.

yield_suffixes(device, partition, policy)

Yields tuples of (full_path, suffix_only) for suffixes stored on the given device and partition.

class swift.obj.diskfile.BaseDiskFileReader(fp, data_file, obj_size, etag, threadpool, disk_chunk_size, keep_cache_size, device_path, logger, quarantine_hook, use_splice, pipe_size, diskfile, keep_cache=False)

Encapsulation of the WSGI read context for servicing GET REST API requests. Serves as the context manager object for the swift.obj.diskfile.DiskFile class’s swift.obj.diskfile.DiskFile.reader() method.

Note

The quarantining behavior of this method is considered implementation specific, and is not required of the API.

Note

The arguments to the constructor are considered implementation specific. The API does not define the constructor arguments.

Parameters:
  • fp – open file object pointer reference
  • data_file – on-disk data file name for the object
  • obj_size – verified on-disk size of the object
  • etag – expected metadata etag value for entire file
  • threadpool – thread pool to use for read operations
  • disk_chunk_size – size of reads from disk in bytes
  • keep_cache_size – maximum object size that will be kept in cache
  • device_path – on-disk device path, used when quarantining an obj
  • logger – logger caller wants this object to use
  • quarantine_hook – 1-arg callable called w/reason when quarantined
  • use_splice – if true, use zero-copy splice() to send data
  • pipe_size – size of pipe buffer used in zero-copy operations
  • diskfile – the diskfile creating this DiskFileReader instance
  • keep_cache – should resulting reads be kept in the buffer cache
app_iter_range(start, stop)

Returns an iterator over the data file for range (start, stop)
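Reading a byte range from an open file can be sketched as a chunked generator. This assumes a half-open range [start, stop), and the name iter_range is hypothetical; the reference reader additionally handles quarantine checks, buffer-cache management and etag verification, none of which appear here.

```python
import io
from typing import BinaryIO, Iterator


def iter_range(fp: BinaryIO, start: int, stop: int,
               chunk_size: int = 65536) -> Iterator[bytes]:
    """Yield the bytes in [start, stop) of an open file in chunks."""
    fp.seek(start)
    remaining = stop - start
    while remaining > 0:
        chunk = fp.read(min(chunk_size, remaining))
        if not chunk:
            break  # file is shorter than the requested range
        remaining -= len(chunk)
        yield chunk


if __name__ == "__main__":
    data = io.BytesIO(b"0123456789")
    print(b"".join(iter_range(data, 2, 7, chunk_size=2)))  # b'23456'
```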

app_iter_ranges(ranges, content_type, boundary, size)

Returns an iterator over the data file for a set of ranges

close()

Close the open file handle if present.

For this specific implementation, this method will handle quarantining the file if necessary.

zero_copy_send(wsockfd)

Uses splice() and tee() to move data from disk to the network without copying it through userspace.

Parameters: wsockfd – file descriptor (integer) of the socket through which to send data
class swift.obj.diskfile.BaseDiskFileWriter(name, datadir, fd, tmppath, bytes_per_sync, threadpool, diskfile)

Encapsulation of the write context for servicing PUT REST API requests. Serves as the context manager object for the swift.obj.diskfile.DiskFile class’s swift.obj.diskfile.DiskFile.create() method.

Note

It is the responsibility of the swift.obj.diskfile.DiskFile.create() method context manager to close the open file descriptor.

Note

The arguments to the constructor are considered implementation specific. The API does not define the constructor arguments.

Parameters:
  • name – name of object from REST API
  • datadir – on-disk directory the object will end up in when swift.obj.diskfile.DiskFileWriter.put() is called
  • fd – open file descriptor of temporary file to receive data
  • tmppath – full path name of the opened file descriptor
  • bytes_per_sync – number of bytes written between sync calls
  • threadpool – internal thread pool to use for disk operations
  • diskfile – the diskfile creating this DiskFileWriter instance
commit(timestamp)

Perform any operations necessary to mark the object as durable. For replication policy type this is a no-op.

Parameters: timestamp – object put timestamp, an instance of Timestamp
put(metadata)

Finalize writing the file on disk.

Parameters: metadata – dictionary of metadata to be associated with the object
write(chunk)

Write a chunk of data to disk. All invocations of this method must come before invoking the :func:

For this implementation, the data is written into a temporary file.

Parameters: chunk – the chunk of data to write as a string object
Returns: the total number of bytes written to an object
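The call order implied by DiskFile.create(), DiskFileWriter.write(), put() and commit() can be illustrated with a toy in-memory backend. InMemoryDiskFile and InMemoryWriter are hypothetical classes written for this sketch (they are not the mem_diskfile.py implementation): data becomes visible only when put() is called, and commit() is a no-op as it is for the replication policy.

```python
from contextlib import contextmanager
from typing import Dict, Iterator, List, Optional


class InMemoryWriter:
    """Hypothetical writer mirroring the write/put/commit call order."""

    def __init__(self, store: Dict[str, dict], name: str) -> None:
        self._store = store
        self._name = name
        self._chunks: List[bytes] = []
        self._upload_size = 0

    def write(self, chunk: bytes) -> int:
        # Buffer the chunk (a real backend writes to a temporary file).
        self._chunks.append(chunk)
        self._upload_size += len(chunk)
        return self._upload_size  # running total, as write() documents

    def put(self, metadata: dict) -> None:
        # Finalize: only now does the object become visible in the store.
        self._store[self._name] = {"data": b"".join(self._chunks),
                                   "metadata": dict(metadata)}

    def commit(self, timestamp: str) -> None:
        # Replication-style durability: nothing extra to do.
        pass


class InMemoryDiskFile:
    def __init__(self, store: Dict[str, dict], name: str) -> None:
        self._store = store
        self._name = name

    @contextmanager
    def create(self, size: Optional[int] = None) -> Iterator[InMemoryWriter]:
        # `size` is accepted but ignored; preallocation is optional.
        yield InMemoryWriter(self._store, self._name)


store: Dict[str, dict] = {}
df = InMemoryDiskFile(store, "/a/c/o")
with df.create() as writer:
    writer.write(b"hello ")
    total = writer.write(b"world")
    writer.put({"X-Timestamp": "1401811134.87365",
                "Content-Length": str(total)})
    writer.commit("1401811134.87365")
```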
swift.obj.diskfile.extract_policy(obj_path)

Extracts the policy for an object (based on the name of the objects directory) given the device-relative path to the object. Returns None in the event that the path is malformed in some way.

The device-relative path is everything after the mount point; for example:

/srv/node/d42/objects-5/179/485dc017205a81df3af616d917c90179/1401811134.873649.data

would have device-relative path:

objects-5/179/485dc017205a81df3af616d917c90179/1401811134.873649.data

Parameters: obj_path – device-relative path of an object, or the full path
Returns: a BaseStoragePolicy or None
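The datadir-name parsing behind extract_policy can be sketched by locating the last "objects" component in the path, which works for both device-relative and full paths. This hypothetical extract_policy_index returns an integer index rather than a BaseStoragePolicy and does not check that the index belongs to a configured policy; the "objects"/"objects-N" naming is the assumption it relies on.

```python
from typing import Optional

DATADIR_BASE = "objects"  # assumed datadir prefix


def extract_policy_index(obj_path: str) -> Optional[int]:
    """Return the policy index encoded in an object path, or None."""
    try:
        portion = obj_path[obj_path.rindex(DATADIR_BASE):]
        dirname = portion[:portion.index("/")]
    except ValueError:
        return None  # no datadir component, or nothing after it
    if dirname == DATADIR_BASE:
        return 0  # plain "objects" is policy 0
    suffix = dirname[len(DATADIR_BASE):]
    if suffix.startswith("-") and suffix[1:].isdigit():
        return int(suffix[1:])
    return None  # malformed datadir name
```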
swift.obj.diskfile.invalidate_hash(suffix_dir)

Invalidates the hash for a suffix_dir in the partition’s hashes file.

Parameters: suffix_dir – absolute path to suffix dir whose hash needs invalidating
swift.obj.diskfile.object_audit_location_generator(devices, mount_check=True, logger=None, device_dirs=None)

Given a devices path (e.g. “/srv/node”), yield an AuditLocation for all objects stored under that directory if device_dirs isn’t set. If device_dirs is set, only yield AuditLocation for the objects under the entries in device_dirs. The AuditLocation only knows the path to the hash directory, not to the .data file therein (if any). This is to avoid a double listdir(hash_dir); the DiskFile object will always do one, so we don’t.

Parameters:
  • devices – parent directory of the devices to be audited
  • mount_check – flag to check if a mount check should be performed on devices
  • logger – a logger object
  • device_dirs – a list of directories under devices to traverse

swift.obj.diskfile.quarantine_renamer(device_path, corrupted_file_path)

In the case that a file is corrupted, move it to a quarantined area to allow replication to fix it.

Parameters:
  • device_path – the path to the device the corrupted file is on
  • corrupted_file_path – the path to the file you want quarantined
Returns: path (str) of directory the file was moved to
Raises OSError: re-raises non errno.EEXIST / errno.ENOTEMPTY exceptions from rename
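The rename-with-fallback behavior described above can be sketched with os.rename. This simplified quarantine_hash_dir (a hypothetical name) moves the hash directory containing the corrupted file to <device_path>/quarantined/<datadir>/<hash>, and on EEXIST or ENOTEMPTY retries with a unique suffix. Unlike the reference function, it takes the datadir as an argument instead of deriving it from the path, and it does not invalidate the partition's suffix hash.

```python
import errno
import os
import uuid


def quarantine_hash_dir(device_path: str, corrupted_file_path: str,
                        datadir: str = "objects") -> str:
    """Move the hash dir holding a corrupted file into quarantine."""
    from_dir = os.path.dirname(corrupted_file_path)
    to_dir = os.path.join(device_path, "quarantined", datadir,
                          os.path.basename(from_dir))
    os.makedirs(os.path.dirname(to_dir), exist_ok=True)
    try:
        os.rename(from_dir, to_dir)
    except OSError as e:
        if e.errno not in (errno.EEXIST, errno.ENOTEMPTY):
            raise
        # A quarantined copy already exists: fall back to a unique name.
        to_dir = "%s-%s" % (to_dir, uuid.uuid4().hex)
        os.rename(from_dir, to_dir)
    return to_dir
```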
swift.obj.diskfile.read_metadata(fd)

Helper function to read the pickled metadata from an object file.

Parameters: fd – file descriptor or filename to load the metadata from
Returns: dictionary of metadata
swift.obj.diskfile.strip_self(f)

Wrapper to attach module-level functions to a base class.

swift.obj.diskfile.write_metadata(fd, metadata, xattr_size=65536)

Helper function to write pickled metadata for an object file.

Parameters:
  • fd – file descriptor or filename to write the metadata
  • metadata – metadata to write
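The xattr_size parameter suggests that large pickled metadata is split across several extended attributes. The sketch below models that chunking with a plain dict standing in for a file's xattrs, so it runs anywhere. It assumes the key scheme "user.swift.metadata", "user.swift.metadata1", "user.swift.metadata2", ... for successive chunks; the function names are hypothetical, and the dict is assumed to start empty (stale higher-numbered chunks would otherwise be read back).

```python
import pickle
from typing import Dict

METADATA_KEY = "user.swift.metadata"  # assumed xattr name prefix


def write_metadata_chunks(attrs: Dict[str, bytes], metadata: dict,
                          xattr_size: int = 65536) -> None:
    """Store pickled metadata across keys of at most xattr_size bytes."""
    metastr = pickle.dumps(metadata, protocol=2)
    key = 0
    while metastr:
        # First chunk uses the bare key; later ones append 1, 2, ...
        attrs[METADATA_KEY + str(key or "")] = metastr[:xattr_size]
        metastr = metastr[xattr_size:]
        key += 1


def read_metadata_chunks(attrs: Dict[str, bytes]) -> dict:
    """Concatenate chunks until a key is missing, then unpickle.

    Only unpickle data you wrote yourself; pickle is unsafe on
    untrusted input.
    """
    metastr = b""
    key = 0
    while METADATA_KEY + str(key or "") in attrs:
        metastr += attrs[METADATA_KEY + str(key or "")]
        key += 1
    return pickle.loads(metastr)
```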