Techniques are provided for on-demand serverless disaster recovery. A primary node may host a primary volume. Snapshots of the primary volume may be backed up to an object store. In response to failure, a secondary node and/or an on-demand volume may be created on-demand. The secondary node may provide clients with failover access to the on-demand volume while a restore process restores a snapshot of the primary volume to the on-demand volume. In some embodiments, there was no secondary node and/or on-demand volume while the primary node was operational. This conserves computing resources that would be wasted by otherwise hosting the secondary node and/or on-demand volume while clients were able to access the primary volume through the primary node. Modifications directed to the on-demand volume are incrementally backed up to the object store for subsequently restoring the primary volume after recovery.
G06F 11/14 - Error detection or correction of the data by redundancy in operation, e.g. by using different operation sequences leading to the same result
G06F 11/16 - Error detection or correction of the data by redundancy in hardware
G06F 11/20 - Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
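For illustration, here is a minimal sketch of the on-demand disaster-recovery flow described in the abstract above. All classes and names are hypothetical stand-ins, not the patented implementation; the key point is that the secondary node and volume do not exist until failure occurs.

```python
"""Minimal sketch of on-demand disaster recovery (hypothetical names)."""

class ObjectStore:
    def __init__(self):
        self.snapshots = []      # periodic snapshots of the primary volume
        self.incremental = []    # changes logged while running on the secondary

    def backup_snapshot(self, blocks):
        self.snapshots.append(dict(blocks))

    def latest_snapshot(self):
        return self.snapshots[-1]

class SecondaryNode:
    """Created only after the primary fails; nothing exists before that."""
    def __init__(self, store):
        self.store = store
        self.volume = {}         # the on-demand volume, initially empty

    def restore(self):
        # Background restore; failover writes take precedence over snapshot data.
        for block, data in self.store.latest_snapshot().items():
            self.volume.setdefault(block, data)

    def write(self, block, data):
        # Failover writes are applied locally and incrementally backed up,
        # so the primary can be resynchronized after recovery.
        self.volume[block] = data
        self.store.incremental.append((block, data))

store = ObjectStore()
store.backup_snapshot({"b0": "base data"})       # taken while the primary is up

# Primary fails: the secondary node and on-demand volume are created now.
secondary = SecondaryNode(store)
secondary.write("b1", "write during failover")   # clients have access immediately
secondary.restore()                              # restore completes in background
print(secondary.volume, store.incremental)
```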
Techniques are provided for restoring a directory from a snapshot of a volume backed up to an object store. The snapshot may be backed up from a node to the object store, such as a cloud computing environment. A user may want to restore the directory within the volume without having to restore the entire volume, which otherwise would waste computing resources, storage, network bandwidth, and time. Accordingly, the techniques provided herein are capable of restoring just the directory from the snapshot that is stored within the object store. Because snapshot data of the snapshot may be stored across multiple objects within the object store, certain objects are identified as comprising snapshot data (backup data) of the directory and content items within the directory. In this way, the snapshot data of the directory is restored from these objects to a restore directory at a restore target.
G06F 11/14 - Error detection or correction of the data by redundancy in operation, e.g. by using different operation sequences leading to the same result
3.
DISTRIBUTED CONTROL PLANE FOR REFORMATTING COMMUNICATION BETWEEN A CONTAINER ORCHESTRATION PLATFORM AND A DISTRIBUTED STORAGE ARCHITECTURE
Techniques are provided for implementing a distributed control plane to facilitate communication between a container orchestration platform and a distributed storage architecture. The distributed storage architecture hosts worker nodes that manage distributed storage that can be made accessible to applications within the container orchestration platform through the distributed control plane. However, the worker nodes may support an imperative model of programming commands, but the container orchestration platform and applications therein utilize a declarative model of programming commands not supported by the distributed storage architecture. Accordingly, the distributed control plane is configured with control plane controllers that are paired with the worker nodes and are configured to reformat commands between the imperative model and the declarative model. In this way, the control plane controllers can facilitate communication and performance of commands between the applications of the container orchestration platform and the worker nodes of the distributed storage architecture.
Methods and systems for a networked storage environment are provided. One method includes generating a first and a second batch of entries in response to scanning a source file system, each batch of entries associated with one or more directories of the source file system and indicating a path to a file associated with each entry; determining, by a first worker process, a first checksum for data associated with the first batch of entries loaded in a first buffer; appending the first buffer contents processed by the first worker process to a first archive file; and generating an archive data structure having a manifest file storing metadata for the first batch and the second batch of entries with the first checksum determined by the first worker process and a second checksum determined by a second worker process, and data from the first archive file and a second archive file.
G06F 16/11 - File system administration, e.g. details of archiving or snapshots
G06F 11/14 - Error detection or correction of the data by redundancy in operation, e.g. by using different operation sequences leading to the same result
5.
SCALABLE SOLID-STATE STORAGE SYSTEM AND METHODS THEREOF
Methods and systems for solid state drives are provided, including assigning a first namespace to a first instance of a storage operating system and a second instance of the storage operating system for enabling read access to a first portion of a flash storage system by the first instance, and read and write access by the second instance; allocating a second namespace to the first instance for exclusive read and write access within a second portion of the flash storage system; generating, by the first instance, a request for the second instance to transfer a data object from the second portion owned by the first instance to the first portion; storing, by the second instance, the data object at the first portion; and updating metadata of the data object at the second portion, the metadata indicating a storage location at the second portion where the data object is stored.
In one embodiment, distributed data storage systems and methods are described for integrating a change tracking manager with scalable databases. According to one embodiment, a computer implemented method comprises managing storage of objects and continuously tracking changes of the objects in a distributed object storage database, creating a record for an object having an object name, the object being stored in a bucket of the distributed object storage database, linking the bucket to a peer bucket based on a directive, generating a peer marker field for the record to store one peer marker of multiple different peer markers depending on a relationship between the bucket and the peer bucket; and automatically adding a work item for the object to the secondary index of a chapter database based on the record being created in the bucket and the peer marker for the peer bucket.
Techniques are provided for implementing a snapshot copy operation between endpoints. One or more snapshots (e.g., snapshots of an on-premise volume) are stored within a source endpoint, such as a source bucket of an object store. A post operation is executed to copy objects comprising snapshot data of a snapshot from the source endpoint to a destination endpoint. A get operation and a tracking object such as a cookie is used to track progress of copying the objects from the source endpoint to the destination endpoint. The tracking object is used to restart the copying of the objects from a point where the copying left off (e.g., in the event there is a failure) without having to restart from the beginning.
G06F 11/14 - Error detection or correction of the data by redundancy in operation, e.g. by using different operation sequences leading to the same result
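A minimal sketch of the resumable copy described above, assuming in-memory dicts in place of real buckets and a plain object name as the tracking cookie; these names are illustrative, not the patent's API.

```python
"""Sketch of resumable object copying with a tracking cookie (hypothetical)."""

def copy_snapshot_objects(source, destination, cookie=None):
    # 'source' and 'destination' are dicts of object_name -> bytes.
    # 'cookie' is the last object name successfully copied, or None.
    names = sorted(source)                     # stable order makes resume safe
    start = names.index(cookie) + 1 if cookie else 0
    for name in names[start:]:
        destination[name] = source[name]       # the copy ("post") step
        cookie = name                          # persist progress per object
    return cookie

src = {f"obj-{i:03d}": f"data-{i}".encode() for i in range(5)}
dst = {}
# Simulate a failure after two objects, then resume from the cookie.
partial = copy_snapshot_objects({k: src[k] for k in sorted(src)[:2]}, dst)
resumed = copy_snapshot_objects(src, dst, cookie=partial)
print(sorted(dst) == sorted(src), resumed)     # True obj-004
```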
Systems, methods, and machine-readable media are disclosed for detecting sensitive personal information in a file. An entity extractor may extract a set of terms and a set of person candidates from a sentence in a file. Each term of the set of terms belongs to a set of sensitive categories. For each verb in the sentence, a relationship builder may determine a relationship between the respective verb, a subject, and an object in the sentence, where the respective verb, the subject, or the object includes a term. An event detector may determine, based on the relationship, that the term relates to a person from the set of person candidates and to a sensitive category. The event detector may create an event specifying the term, the sensitive category, and the person in response to determining that the term relates to the person and to the sensitive category.
Techniques are provided for storing immutable snapshot copies in write once read many (WORM) storage. A snapshot of a volume may be stored into one or more objects formatted according to an object format. An expiry time may be assigned to the snapshot and the one or more objects based upon a creation time of the snapshot and a retention time. The one or more objects may be stored within a remote object store. The one or more objects are retained in an immutable state and cannot be deleted until expiration of the expiry time. In response to identifying an existing object within the remote object store comprising shared snapshot data referenced by the snapshot, an assigned expiry time of the existing object may be modified based upon the expiry time of the snapshot to create a modified expiry time for the existing object.
G06F 11/14 - Error detection or correction of the data by redundancy in operation, e.g. by using different operation sequences leading to the same result
G06F 3/06 - Digital input from, or digital output to, record carriers
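The expiry arithmetic described above reduces to two small rules, sketched here under assumed datetime representations: expiry is creation plus retention, and a shared object's expiry only ever moves later.

```python
"""Sketch of WORM expiry assignment and extension (illustrative values)."""

from datetime import datetime, timedelta

def assign_expiry(created: datetime, retention: timedelta) -> datetime:
    return created + retention

def extend_shared_object_expiry(existing_expiry: datetime,
                                snapshot_expiry: datetime) -> datetime:
    # A shared object must outlive every snapshot that references it,
    # so its expiry is extended, never shortened.
    return max(existing_expiry, snapshot_expiry)

snap_created = datetime(2024, 1, 1)
snap_expiry = assign_expiry(snap_created, timedelta(days=30))
old_object_expiry = datetime(2024, 1, 15)
print(extend_shared_object_expiry(old_object_expiry, snap_expiry))  # 2024-01-31
```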
10.
FLEXIBLE TIERING OF SNAPSHOTS TO ARCHIVAL STORAGE IN REMOTE OBJECT STORES
Techniques are provided for tiering snapshots to archival storage in remote object stores. A restore time metric, indicating that objects comprising snapshot data of snapshots created within a threshold timespan are to be available within a storage tier of a remote object store for performing restore operations, may be identified. A scanner may be executed to evaluate snapshots using the restore time metric to identify a set of candidate snapshots for archival from the storage tier to an archival storage tier of the remote object store. For each candidate snapshot within the set of candidate snapshots, the scanner may evaluate metadata associated with the candidate snapshot to identify one or more objects eligible for archival from the storage tier to the archival storage tier, and may archive the one or more objects from the storage tier to the archival storage tier.
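A minimal sketch of the scanner's selection and archival steps, assuming simple dicts as tiers and a timedelta as the restore-time threshold; a real scanner would also verify that no recent snapshot still shares an object before moving it.

```python
"""Sketch of restore-time-driven snapshot archival (hypothetical structures)."""

from datetime import datetime, timedelta

def find_archive_candidates(snapshots, now, threshold):
    # snapshots: list of dicts with 'name', 'created', and 'objects'.
    return [s for s in snapshots if now - s["created"] > threshold]

def archive(candidates, storage_tier, archival_tier):
    for snap in candidates:
        for obj in snap["objects"]:
            if obj in storage_tier:          # eligible: still in standard tier
                archival_tier[obj] = storage_tier.pop(obj)

now = datetime(2024, 6, 1)
snaps = [{"name": "daily", "created": now - timedelta(days=2), "objects": ["o1"]},
         {"name": "old", "created": now - timedelta(days=90), "objects": ["o2", "o3"]}]
tier, cold = {"o1": b"...", "o2": b"...", "o3": b"..."}, {}
archive(find_archive_candidates(snaps, now, timedelta(days=30)), tier, cold)
print(sorted(tier), sorted(cold))            # ['o1'] ['o2', 'o3']
```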
Methods and systems for using a hierarchical consistency group (CG) in a storage system are provided. A parent CG is associated with at least a first child CG having a plurality of storage volumes. An atomic application programming interface (API) provisions the parent CG and the first child CG by allocating storage and storing policies for the parent CG and the first CG. A storage service selected from a backup service, a replication service and a cloning service for the parent CG and the first CG is executed based on the stored policies.
G06F 11/14 - Error detection or correction of the data by redundancy in operation, e.g. by using different operation sequences leading to the same result
G06F 11/20 - Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
G06F 16/27 - Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
12.
DATA CONNECTOR COMPONENT FOR IMPLEMENTING MANAGEMENT REQUESTS
Techniques are provided for implementing management requests associated with objects of an object store. A data connector component may be instantiated as a container for processing management requests associated with backup data stored within an object store as an object according to an object format. A management request associated with the backup data may be received by the data connector component. A structure associated with the object having the object format may be traversed by the data connector component to identify the backup data. The management request may be implemented by the data connector component upon the backup data stored within the object.
G06F 3/06 - Digital input from, or digital output to, record carriers
G06F 11/14 - Error detection or correction of the data by redundancy in operation, e.g. by using different operation sequences leading to the same result
G06F 16/11 - File system administration, e.g. details of archiving or snapshots
13.
DATA CONNECTOR COMPONENT FOR IMPLEMENTING DATA REQUESTS
Techniques are provided for implementing data requests associated with objects of an object store. A data connector component may be instantiated as a container for processing data requests associated with backup data stored within objects of an object store. The data connector component may evaluate the object store to identify snapshots stored as the backup data within the objects of the object store according to an object format. The data connector component may provide a client device with access to backup data of the snapshots.
G06F 11/14 - Error detection or correction of the data by redundancy in operation, e.g. by using different operation sequences leading to the same result
G06F 3/06 - Digital input from, or digital output to, record carriers
H04L 67/1097 - Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
14.
INTERRUPTION PREDICTIONS FOR CLOUD COMPUTE INSTANCES
Systems, methods, and machine-readable media for predicting interruptions to the use of spare cloud resources and rebalancing based on those predictions are disclosed. A computing platform collects data for customers over time. The computing platform runs a machine learning algorithm on the historical data to generate a prediction classifier. The prediction classifier relates to a time window for prediction into the future, on the order of minutes or hours. The prediction classifier is run on monitored data from ongoing activity with a cloud provider to generate a risk score. Each risk score may identify an amount of risk that a spare cloud resource related to new resource metrics data will be interrupted within the future time frame corresponding to that prediction classifier. If an interruption is predicted, the customer may be assisted in rebalancing to other resources. As a result, interruptions can be predicted hours into the future.
A system, method, and machine-readable storage medium for determining an amount of unique data in a distributed storage system are provided. In some embodiments, a combined efficiency set for a first data set stored in the distributed storage system, such as at a volume, may be generated. The first data set may include a first subset of data and a second subset of data in the distributed storage system. Additionally, a set of efficiency sets for the first subset of data may be generated. A set difference based on the combined efficiency set and the set of efficiency sets may be computed. An amount of memory used for storing unique data of the second subset of data may be estimated based on the set difference. The unique data may be present in the second subset of data but absent from the first subset of data.
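The set-difference estimate described above can be made concrete with a small sketch. Real efficiency sets are typically compact probabilistic sketches; plain Python sets are used here purely to make the arithmetic visible, and the block size is an assumption.

```python
"""Sketch of unique-data estimation via efficiency-set difference."""

BLOCK = 4096  # assumed block size in bytes, for the memory estimate

def block_ids(data_set):
    return {hash(block) for block in data_set}

subset1 = ["a", "b", "c"]                 # first subset of data
subset2 = ["b", "c", "d", "e"]            # second subset of data
combined = block_ids(subset1 + subset2)   # "combined efficiency set"
first_only = block_ids(subset1)           # efficiency set for the first subset

# Present in the second subset but absent from the first subset.
unique_to_second = combined - first_only
print(len(unique_to_second) * BLOCK, "bytes of unique data (estimated)")  # 8192
```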
Techniques are provided for incremental backup to an object store. A request may be received from an application to perform a backup from a volume hosted by a node to a backup target within the object store. A set of changed files within the volume since a prior backup of the volume was performed to the backup target is identified, along with metadata associated with the set of changed files. The metadata is utilized to identify changed data blocks comprising data of the set of changed files that was modified since the prior backup. The changed data blocks are backed up to the object store.
G06F 11/14 - Error detection or correction of the data by redundancy in operation, e.g. by using different operation sequences leading to the same result
G06F 3/06 - Digital input from, or digital output to, record carriers
Techniques are provided for a layout format for compressed data. A first set of data blocks are grouped into a first group based upon a first frequency of access to the first set of data blocks. A second set of data blocks are grouped into a second group based upon a second frequency of access to the second set of data blocks. The first set of data blocks are compressed into a first compression group using a first compression algorithm. The second set of data blocks are compressed into a second compression group using a second compression algorithm.
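A minimal sketch of the frequency-based grouping above, with zlib compression levels standing in for the two different algorithms and an assumed hot/cold threshold: cheap compression for frequently accessed data, dense compression for cold data.

```python
"""Sketch of access-frequency compression grouping (illustrative threshold)."""

import zlib

HOT_THRESHOLD = 100   # accesses; at or above this a block counts as "hot"

def compress_groups(blocks, access_counts):
    hot = b"".join(b for b, c in zip(blocks, access_counts) if c >= HOT_THRESHOLD)
    cold = b"".join(b for b, c in zip(blocks, access_counts) if c < HOT_THRESHOLD)
    # Hot group: fast, light compression (cheap to decompress on every access).
    # Cold group: slower, denser compression (rarely read, so space wins).
    return zlib.compress(hot, level=1), zlib.compress(cold, level=9)

blocks = [b"block-A" * 64, b"block-B" * 64, b"block-C" * 64]
counts = [500, 3, 250]
hot_group, cold_group = compress_groups(blocks, counts)
print(len(hot_group), len(cold_group))
```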
Techniques are provided for implementing a persistent memory storage tier to manage persistent memory of a node. The persistent memory is managed by the persistent memory storage tier at a higher level within a storage operating system storage stack than a level at which a storage file system of the node is managed. The persistent memory storage tier intercepts an operation targeting the storage file system. The persistent memory storage tier retargets the operation from targeting the storage file system to targeting the persistent memory. The operation is transmitted to the persistent memory.
A system, method, and machine-readable storage medium for performing garbage collection in a distributed storage system are provided. In some embodiments, an efficiency level of a garbage collection process is monitored. The garbage collection process may include removal of one or more data blocks of a set of data blocks that is referenced by a set of content identifiers. The set of slice services and the set of data blocks may reside in a cluster, and a set of filters may indicate whether the set of data blocks is in-use. At least one parameter of a filter of the set of filters may be adjusted (e.g., increased or reduced) if the efficiency level is below the efficiency threshold. Garbage collection may be performed on the set of data blocks in accordance with the set of filters.
G06F 12/0864 - Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches using pseudo-associative means, e.g. set-associative or hashing
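The feedback loop in the garbage-collection abstract above can be sketched briefly. An exact set plus a simulated false-positive rate stands in for a probabilistic (e.g., Bloom) filter, whose false-positive rate is the tunable parameter; the numbers are illustrative.

```python
"""Sketch of feedback-tuned garbage collection (hypothetical filter)."""

import random

class InUseFilter:
    def __init__(self, false_positive_rate):
        self.fp_rate = false_positive_rate
        self.members = set()

    def add(self, block_id):
        self.members.add(block_id)

    def maybe_contains(self, block_id):
        # Probabilistic "in use" answer: never a false negative,
        # occasionally a false positive (blocks GC could have freed).
        return block_id in self.members or random.random() < self.fp_rate

def collect(blocks, in_use):
    kept = {b for b in blocks if in_use.maybe_contains(b)}
    removed = len(blocks) - len(kept)
    efficiency = removed / max(1, len(blocks) - len(in_use.members))
    return kept, efficiency

random.seed(0)
flt = InUseFilter(false_positive_rate=0.3)
flt.add("live-1")
blocks = {"live-1"} | {f"dead-{i}" for i in range(100)}
kept, eff = collect(blocks, flt)
if eff < 0.9:                       # efficiency below threshold:
    flt.fp_rate = 0.01              # tighten the filter parameter and rerun
    kept, eff = collect(kept, flt)
print(round(eff, 2))
```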
The disclosed technology relates to determining physical zone data, within a zoned namespace solid state drive (SSD), that is associated with logical zone data included in a first received input-output operation, based on a mapping data structure within a namespace of the zoned namespace SSD. A second input-output operation specific to the determined physical zone data is generated, wherein the second input-output operation and the received input-output operation are of the same type. The generated second input-output operation is completed using the determined physical zone data within the zoned namespace SSD.
A technique provides efficient management of policies for objects of a distributed storage architecture configured to service storage requests issued by one or more clients of a storage cluster. The objects may include volumes for storing data served by storage nodes of the cluster and the policies may include quality of service (QoS) policies. The technique enables dynamic grouping of the volumes as management domains and applying attributes, such as performance settings of the QoS policies, to the management domains. A group of volumes may be organized as a management domain and a QoS policy may be applied to the domain. If membership of the management domain is modified, the QoS policy is automatically applied to the added volume or stripped from the removed volume. If a performance setting of the policy is modified, the modification is atomically applied and propagated to each volume of the management domain.
The disclosed technology relates to managing input-output operations in a zoned storage system and includes identifying a first physical zone and a second physical zone, within a zoned namespace solid-state drive, associated with a logical zone to perform a received write operation. Data to be written in the received write operation is temporarily staged in a zone random write area associated with the identified second physical zone. Based on a storage threshold of the zone random write area, a determination is made regarding when to transfer the temporarily staged data to the identified second physical zone. When the storage threshold of the zone random write area is determined to have been exceeded, the temporarily staged data is transferred to the identified second physical zone.
A system, method, and machine-readable storage medium for providing a set of recommended quality of service (QoS) settings are provided. In some embodiments, providing the recommendation includes receiving a set of QoS settings of a volume for a client, a set of measured QoS metrics of the volume for the client, and a measure of load for a slice service corresponding to the volume. Providing the recommendation further includes determining a predicted QoS metric of the volume and a predicted load of the slice service. Providing the recommendation also includes determining, based on the predicted QoS metric, the predicted load, and the set of QoS settings, a set of recommended QoS settings for the client. The set of QoS settings of the volume for the client is then updated with the set of recommended QoS settings.
Computing technology for managing support requests is provided. The technology includes a processor executable application programming interface (API) that receives a support case indicating a problem associated with a device. The API utilizes a training model to predict a problem category for the support case. The training model predicts the problem category based on a feature extracted from information included in the support case. The training model further identifies a plurality of proximate support cases based on a distance between the support case and the proximate support cases within a virtual space assigned to the predicted problem category; determines relevance of each proximate support case to the support case; and outputs a resolution code for the support case based on the determined relevance of each proximate support case.
An augmented reality (AR) diagnostic tool embodied as a software application on a portable device employs AR infrastructure to enable a user to locate a failed/malfunctioning node of a cluster and, with minimal interaction, diagnose causes and provide recommendations to repair the node. The portable device may be a computer embodied as visualization technology and configured to execute the software application. Once installed, the AR diagnostic (ARD) tool is ready for use by the user, e.g., a customer service technician, to locate and repair one or more failed cluster nodes. In response to a failure/malfunction, the cluster node sends diagnostic and configuration information (i.e., failure/malfunction information) of the failed node to an analytics service. The failure information informs the technician of the cluster failure. The technician may then activate the ARD tool and AR infrastructure to locate and repair the failed node.
Methods, non-transitory machine readable media, and computing devices that provide file backup catalogs with improved scalability are disclosed. With this technology, a sequence number is incremented and an entry for a snapshot associated with obtained metadata for the snapshot is generated. The snapshot entry comprises a snapshot identifier for the snapshot and the incremented sequence number. A current version flag is then set in another entry for a file associated with a create event identified in the metadata. The file entry includes a file identifier for the file, a create attribute comprising the incremented sequence number, and a delete attribute. The file and snapshot entries are then inserted into index(es) in a catalog database. Based on the schema of the index(es), this technology provides a lightweight, elegant, and highly scalable catalog that more efficiently facilitates full path global file search and restore functionality with reduced resource utilization.
G06F 16/11 - File system administration, e.g. details of archiving or snapshots
G06F 11/14 - Error detection or correction of the data by redundancy in operation, e.g. by using different operation sequences leading to the same result
27.
METHODS FOR DYNAMIC THROTTLING TO SATISFY MINIMUM THROUGHPUT SERVICE LEVEL OBJECTIVES AND DEVICES THEREOF
Methods, non-transitory machine readable media, and computing devices that dynamically throttle non-priority workloads to satisfy minimum throughput service level objectives (SLOs) are disclosed. With this technology, a determination is made when a number of detection intervals with a violation within a detection window exceeds a threshold, when a current one of the detection intervals is outside an observation area. The detection intervals are identified as violated based on an average throughput for priority workloads within the detection intervals falling below a minimum throughput SLO. A throttle is then set to rate-limit non-priority workloads, when the number of violated detection intervals within the detection window exceeds the threshold. Advantageously, throughput for priority workloads is more effectively managed and utilized with this technology such that throttling oscillations are reduced, throttling is not deployed in conditions in which it would not improve throughput, and throttling is minimally deployed to maximize throughput.
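The detection-window logic above lends itself to a short sketch. The SLO value, window length, and violation threshold here are assumed numbers, not values from the patent.

```python
"""Sketch of detection-window throttling (hypothetical parameters)."""

from collections import deque

MIN_SLO_MBPS = 100          # minimum throughput SLO for priority workloads
WINDOW = 10                 # detection window, in intervals
VIOLATION_THRESHOLD = 3     # violated intervals that trigger throttling

window = deque(maxlen=WINDOW)
throttled = False

def observe_interval(priority_throughput_mbps):
    global throttled
    window.append(priority_throughput_mbps < MIN_SLO_MBPS)
    violations = sum(window)
    # Throttle only when violations exceed the threshold; release otherwise,
    # which keeps throttling oscillations short-lived.
    throttled = violations > VIOLATION_THRESHOLD

for tput in [120, 95, 90, 88, 85, 130]:
    observe_interval(tput)
    print(tput, "->", "throttle non-priority" if throttled else "no throttle")
```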
Techniques are provided for orchestrating operations between a storage environment and a computing environment hosting virtual machines. A virtual machine proxy, associated with a computing environment hosting a virtual machine, is accessed by an orchestrator to identify the virtual machine and properties of the virtual machine. A storage proxy, associated with a storage environment comprising a volume within which snapshots of the virtual machine are to be stored, is accessed by the orchestrator to initialize a backup procedure. The orchestrator utilizes the virtual machine proxy to create a snapshot of the virtual machine. The orchestrator utilizes the storage proxy to back up the snapshot to the volume using the backup procedure.
G06F 11/14 - Error detection or correction of the data by redundancy in operation, e.g. by using different operation sequences leading to the same result
An optimistic and failsafe technique validates network configurations of storage and compute nodes deployed as a cluster. An optimistic aspect of the technique does not pre-validate an expected network configuration (state) of the deployed nodes. Instead, an initial network configuration state of each node is saved as a "failsafe" state and an expected network end-state of a data model is applied to each node. According to a validation procedure aspect of the technique, each node employs the data model as a test to validate, inter alia, connectivity with other nodes in the cluster. In response to every validating node responding to a coordinating node that the validation test succeeded, the coordinating node sends an "all-clear" message to all of the nodes (including itself) instructing each node to maintain its newly applied expected network end-state. If any node is unreachable due to a network configuration validation test failure, then a failsafe aspect of the technique is invoked wherein the coordinating storage node does not send the all-clear message before expiration of a predetermined timeout value, in response to which the remaining nodes of the cluster automatically "roll-back" and impose the initial failsafe network state.
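A minimal sketch of the optimistic/failsafe flow described above, with message passing and the timeout collapsed into direct calls; all names and the all-or-nothing validation are simplifying assumptions.

```python
"""Sketch of optimistic apply, validate, all-clear-or-rollback (hypothetical)."""

class Node:
    def __init__(self, name, reachable_after_change=True):
        self.name = name
        self.config = "initial"
        self.failsafe = self.config
        self.reachable = reachable_after_change

    def apply_end_state(self, end_state):
        self.failsafe = self.config      # save failsafe state before applying
        self.config = end_state

    def validate(self):
        # Stand-in for the connectivity test against the data model.
        return self.reachable

    def on_timeout_without_all_clear(self):
        self.config = self.failsafe      # automatic roll-back

def deploy(nodes, end_state):
    for n in nodes:
        n.apply_end_state(end_state)     # optimistic: no pre-validation
    if all(n.validate() for n in nodes):
        return "all-clear sent; nodes keep the new end-state"
    for n in nodes:                      # no all-clear before the timeout
        n.on_timeout_without_all_clear()
    return "timeout; nodes rolled back to failsafe state"

good = [Node("a"), Node("b")]
bad = [Node("a"), Node("b", reachable_after_change=False)]
print(deploy(good, "new-network-config"), [n.config for n in good])
print(deploy(bad, "new-network-config"), [n.config for n in bad])
```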
Techniques are provided for timestamp consistency. An operation targeting a first storage object having a synchronous replication relationship with a second storage object is intercepted. A timestamp is assigned to the operation. A replication operation is created as a replication of the operation. The same timestamp is assigned to the replication operation. The operation is implemented upon the first storage object and the replication operation is implemented upon the second storage object.
A technique is configured to provide data protection, such as replication and erasure coding, of content driven distribution of data blocks served by storage nodes of a cluster. When providing data protection in the form of replication (redundancy), a slice service of the storage node generates one or more copies or replicas of a data block for storage on the cluster. Each replicated data block is illustratively organized within a bin that is maintained by block services of the nodes for storage on storage devices. When providing data protection in the form of erasure coding, the block services may select data blocks to be erasure coded. A set of data blocks for erasure coding may then be grouped together to form a write group. According to the technique, EC group membership is guided by varying bin groups so the data is resilient against failure. Slice services of the storage nodes assign data blocks of different bins and replicas to a write group.
G06F 11/10 - Adding special bits or symbols to the coded information, e.g. parity check, casting out nines or elevens
G06F 11/18 - Error detection or correction of the data by redundancy in hardware using passive fault-masking of the redundant circuits, e.g. by quadding or by majority decision circuits
32.
IMPROVING AVAILABLE STORAGE SPACE IN A SYSTEM WITH VARYING DATA REDUNDANCY SCHEMES
A technique is configured to provide various data protection schemes, such as replication and erasure coding, for data blocks of volumes served by storage nodes of a cluster configured to perform deduplication of the data blocks. Additionally, the technique is configured to ensure that each deduplicated data block complies with data redundancy guarantees of the data protection schemes, while improving storage space of the storage nodes. In order to satisfy the data integrity guarantees while improving available storage space, the storage nodes perform periodic garbage collection for data blocks to optimize storage in accordance with currently applicable data protection schemes.
G06F 11/10 - Adding special bits or symbols to the coded information, e.g. parity check, casting out nines or elevens
G06F 11/08 - Error detection or correction by redundancy in data representation, e.g. by using checking codes
G06F 11/20 - Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
33.
EFFICIENT MEMORY FOOTPRINT IN DEDUPLICATED SYSTEM STORING WITH CONTENT BASED ADDRESSING
A technique is configured to reduce an amount of memory (i.e., memory footprint) usage by each storage node of a cluster needed to store metadata while providing fast and efficient servicing of data in accordance with storage requests issued by a client of the cluster. Illustratively, a block identifier (ID) is used to identify a block of data serviced by the storage node. Metadata embodied as mappings between block IDs and locations of data blocks in the cluster are illustratively maintained in map fragments. A map fragment may be embodied as an "active" map fragment or a "frozen" map fragment. An active map fragment refers to a map fragment that has space available to store a mapping, whereas a frozen map fragment refers to a map fragment that is full, i.e., has no available space for storing a mapping. In order to reduce the memory footprint of each storage node, yet still provide fast and efficient servicing of data by the node, the active map fragments are preferably maintained in memory as "in-core" data structures, whereas the frozen map fragments are paged-out and stored on storage devices of the cluster as "on-disk" map fragment structures.
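A minimal sketch of the active/frozen map-fragment structure described above. The tiny fragment capacity and the list standing in for on-disk storage are assumptions made for illustration.

```python
"""Sketch of active vs. frozen map fragments (illustrative sizes)."""

FRAGMENT_CAPACITY = 4        # mappings per fragment (tiny, for illustration)

class MapFragment:
    def __init__(self):
        self.mappings = {}   # block ID -> data location

    def full(self):
        return len(self.mappings) >= FRAGMENT_CAPACITY

class FragmentStore:
    def __init__(self):
        self.active = MapFragment()   # in-core
        self.frozen = []              # stand-in for on-disk fragment structures

    def record(self, block_id, location):
        if self.active.full():
            self.frozen.append(self.active)   # page out the full fragment
            self.active = MapFragment()
        self.active.mappings[block_id] = location

    def lookup(self, block_id):
        if block_id in self.active.mappings:  # fast in-core path
            return self.active.mappings[block_id]
        for frag in self.frozen:              # slower on-disk path
            if block_id in frag.mappings:
                return frag.mappings[block_id]
        return None

store = FragmentStore()
for i in range(10):
    store.record(f"blk-{i}", ("segment-1", i))
print(len(store.frozen), store.lookup("blk-2"))   # 2 ('segment-1', 2)
```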
A technique is configured to log and update metadata in a log-structured file system to facilitate recovery and restart in response to failure of a storage node of a cluster. A block identifier (ID) is used to identify a block of data serviced by the storage node. Metadata embodied as mappings between block IDs and locations of data blocks in the cluster are illustratively maintained in "active" and "frozen" map fragments. An active map fragment refers to a map fragment that has space available to store a mapping, whereas a frozen map fragment refers to a map fragment that has no available space for storing a mapping. The active map fragments are maintained in memory as "in-core" data structures, whereas the frozen map fragments are paged-out and stored on storage devices of the cluster as "on-disk" map fragment structures. Each frozen map fragment written to a segment includes a pointer to a last written frozen map fragment to form a chain (e.g., linked-list) of on-disk frozen map fragments. Each time a data block is persisted on a segment of the storage devices, an active map fragment is populated in-core and a metadata write marker is recorded on the segment (on-disk) indicating the location of the data block that was written to the segment. If a storage node crashes when the active map fragment is only partially populated, the metadata write markers facilitate rebuild of the active map fragment upon recovery and restart of a storage service of the node.
G06F 11/14 - Error detection or correction of the data by redundancy in operation, e.g. by using different operation sequences leading to the same result
… tokens; receiving a text corpus, wherein said text corpus is segmented into tokens; and automatically matching each token in said text corpus against said populated probabilistic data representation model, wherein said matching comprises sequentially querying each said BF pair in the order of said indexing, to determine a match.
Methods and systems for a networked storage system are provided. One method includes creating a first snapshot for data units stored at a persistent memory of a computing device, the data units managed by a first file system; transferring metadata associated with the data units and the data units stored at the persistent memory to a storage device managed by a second file system using a logical object, the second file system executed by a storage system interfacing with the computing device; and generating a second snapshot of the logical object at the storage device, the second snapshot including data units and associated metadata of the first snapshot.
Techniques are provided for resynchronizing a synchronous replication relationship. Asynchronous incremental transfers are performed to replicate data of a storage object to a replicated storage object. Incoming write requests, targeting the storage object, are logged into a dirty region log during a last asynchronous incremental transfer. Metadata operations, executed on the storage object, are logged into a metadata log during the last asynchronous incremental transfer. Sequence numbers are assigned to the metadata operations based upon an order of execution. The metadata operations are replicated to the replicated storage object for execution according to the sequence numbers, and the dirty regions are replicated to the replicated storage object in response to the metadata operations having been replicated to the replicated storage object. The storage object and replicated storage object are transitioned to a synchronous replication state where incoming operations are synchronously replicated to the replicated storage object.
G06F 11/20 - Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
G06F 11/16 - Error detection or correction of the data by redundancy in hardware
Techniques are provided for synchronous replication based cutover. An asynchronous replication process is executed to perform asynchronous incremental transfers of data of a storage object from a first computing environment to a replicated storage object at a second computing environment until a cutover criteria is met. A synchronous replication process is executed to synchronously replicate operations, targeting the storage object, to the replicated storage object based upon the cutover criteria being met. A cutover is performed to direct operations from targeting the storage object to targeting the replicated storage object based upon the synchronous replication process reaching a steady state of synchronous replication for sub-objects of the storage object, where operations are committed to both the storage object and the replicated storage object.
A technique is configured to utilize frames generated by a first layer of a protocol stack for a first network to configure network parameters associated with a second layer of the protocol stack for a second network. The frames are illustratively beacon frames generated by a data link layer of a Transmission Control Protocol/Internet Protocol (TCP/IP) stack for a wireless network, and the network parameters are illustratively IP addresses associated with a network layer of the TCP/IP stack for a wired network. Notably, the beacon frames of the wireless network may be utilized for two-way communication exchange on a per node basis for each node in the wired network.
The present technology relates to managing workload within a storage system. A quality of service parameter proposal associated with managing incoming network traffic is generated and provided to a plurality of nodes. The generated quality of service parameter proposal to manage the incoming network traffic is modified based on a response received from the nodes. The incoming network traffic is serviced using the data from the modified quality of service parameter proposal.
A method comprising operating at least one hardware processor for: receiving, as input, a plurality of electronic documents, training a machine learning classifier based, at least in part, on a training set comprising: (i) labels associated with the electronic documents, (ii) raw text from each of said plurality of electronic documents, and (iii) a rasterized version of each of said plurality of electronic documents, and applying said machine learning classifier to classify one or more new electronic documents.
A method, non-transitory computer readable medium, and device that assists with managing cloud storage includes identifying a portion of data in a data unit identified for deletion in the metadata. The identified portion of the data identified for deletion is compared to a threshold amount. Deletion of the data unit from a first storage object is deferred when the determined portion of data identified for deletion is less than the threshold amount. A second storage object with a portion of data unmarked for deletion in the data unit is generated when the determined portion of data marked for deletion is equal to the threshold amount, wherein the second storage object has a same identifier as the first storage object.
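A short sketch of the threshold-deferred deletion above: the fraction of a data unit marked for deletion decides whether to defer (cheap) or rewrite the object without the deleted portion. The 50% threshold and the dict-based store are assumptions.

```python
"""Sketch of threshold-deferred deletion (illustrative threshold)."""

THRESHOLD = 0.5

def apply_deletions(storage, object_id, deleted_blocks):
    blocks = storage[object_id]
    marked = len(deleted_blocks & set(blocks))
    if marked / len(blocks) < THRESHOLD:
        return "deferred"                       # not worth rewriting yet
    # Rewrite: a new object keeps only the unmarked data, under the same ID.
    storage[object_id] = {k: v for k, v in blocks.items()
                          if k not in deleted_blocks}
    return "rewritten"

store = {"unit-1": {f"b{i}": f"data{i}" for i in range(4)}}
print(apply_deletions(store, "unit-1", {"b0"}))          # deferred (25% < 50%)
print(apply_deletions(store, "unit-1", {"b0", "b1"}))    # rewritten (50%)
print(sorted(store["unit-1"]))                           # ['b2', 'b3']
```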
Methods, non-transitory computer readable media, and computing devices that group objects with different service level objectives for an application include receiving a request including service level data to provision a volume. One or more aggregates for the received service level are identified, and a resource pool including the identified one or more aggregates is generated. The volume including the generated resource pool with the identified one or more aggregates for the received service level is provisioned.
Techniques are provided for replay of metadata and data operations. During initial execution of operations, identifiers of objects modified by the execution of each operation are identified and stored in association with the operations. When the operations are to be replayed (e.g., executed again, such as part of a replication operation or as part of flushing content from a cache to persistent storage), the identifiers are evaluated to determine which operations are independent with respect to one another and which operations are dependent with respect to one another. In this way, independent operations are executed in parallel and dependent operations are executed serially with respect to the operations upon which the dependent operations depend.
G06F 11/20 - Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
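A minimal sketch of the dependency analysis described above: operations carrying the object IDs they modified are grouped into parallel batches, and an overlap in object IDs forces a new batch (serial ordering). The batching scheme is one simple way to realize this, not the patent's exact algorithm.

```python
"""Sketch of replay planning from modified-object identifiers."""

def plan_replay(ops):
    # ops: list of (name, set_of_modified_object_ids), in original order.
    batches, current, touched = [], [], set()
    for name, objs in ops:
        if objs & touched:               # depends on an earlier op in this batch
            batches.append(current)
            current, touched = [], set()
        current.append(name)             # independent: joins the parallel batch
        touched |= objs
    if current:
        batches.append(current)
    return batches                       # each inner list can run in parallel

ops = [("op1", {"inodeA"}), ("op2", {"inodeB"}),   # disjoint -> parallel
       ("op3", {"inodeA", "inodeC"}),              # overlaps op1 -> new batch
       ("op4", {"inodeD"})]
print(plan_replay(ops))   # [['op1', 'op2'], ['op3', 'op4']]
```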
45.
METHODS FOR ACCELERATING STORAGE MEDIA ACCESS AND DEVICES THEREOF
Methods, non-transitory computer readable media, and computing devices that accelerate data access requests are disclosed. With this technology, a hierarchy of a plurality of objects is inserted into a location database. Each of at least a subset of the plurality of objects comprises a physical storage location for data stored in a filesystem. One or more of the plurality of objects includes an object version number and a parent version number of a parent one of the plurality of objects. A determination is made when an invalidation event has occurred in the filesystem. The invalidation event is associated with one of the plurality of objects. The object version number for the one of the plurality of objects is modified to invalidate one or more of the subset of the objects, when the determining indicates that the invalidation event has occurred in the filesystem.
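A sketch of the version-based invalidation above, under assumed structures: a child's cached location is valid only while its recorded parent version matches the parent's current version, so bumping one version invalidates a whole subtree without touching each entry.

```python
"""Sketch of version-number invalidation in a location database."""

class LocationEntry:
    def __init__(self, name, parent=None, location=None):
        self.name, self.parent, self.location = name, parent, location
        self.version = 0
        # Snapshot of the parent's version at insert time.
        self.parent_version = parent.version if parent else None

    def valid(self):
        if self.parent is None:
            return True
        return self.parent_version == self.parent.version and self.parent.valid()

root = LocationEntry("fs-root")
directory = LocationEntry("dir", parent=root)
file_entry = LocationEntry("file", parent=directory, location=("disk0", 42))

print(file_entry.valid())    # True: cached location may be used
directory.version += 1       # invalidation event on the directory
print(file_entry.valid())    # False: location must be re-resolved
```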
Methods, non-transitory computer readable media, and computing devices that manage distributed snapshots for low latency storage include obtaining one or more snapshots from one or more solid state devices (SSDs), wherein the obtained one or more snapshots are stored in a snapshot allocated capacity of the one or more SSDs. A data transfer operation is initiated from a primary storage to a secondary storage using the obtained one or more snapshots. It is determined whether the initiated data transfer operation is completed, and when it is determined to be completed, the obtained one or more snapshots stored in the snapshot allocated capacity of the one or more SSDs are deleted.
Methods, non-transitory computer readable media, and computing devices that receive data from a primary storage node are disclosed. The data is stored in a primary volume within a primary composite aggregate hosted by the primary storage node. A determination is made when the data is tagged to indicate that the data is stored in the primary volume on a remote data storage device of the primary composite aggregate. The data is stored on another remote data storage device without storing the data in a local data storage device, when the determining indicates that the data is tagged to indicate that the data is stored in the primary volume on a remote data storage device of the primary composite aggregate. Accordingly, this technology allows data placement to remain consistent across primary and secondary volumes and facilitates efficient operation of secondary storage nodes by eliminating two-phase writes for data stored on cloud storage devices.
G06F 12/0888 - Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches using selective caching, e.g. bypass
G06F 3/06 - Digital input from, or digital output to, record carriers
G06F 12/0868 - Data transfer between cache memory and other subsystems, e.g. storage devices or host systems
G06F 12/0866 - Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches for peripheral storage systems, e.g. disk cache
G06F 11/20 - Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
48.
ITERATIVE OBJECT SCANNING FOR INFORMATION LIFECYCLE MANAGEMENT
To effectively implement ILM policies and account for unreliability in a geographically distributed large-scale storage system, "scanners" and "ILM rules appliers" can be deployed on nodes throughout the storage system for large scale ILM implementation. Each scanner is programmed to deterministically self-assign a region of object namespace and scan that region of object namespace. To "scan" a region, a scanner accesses metadata of each object that has an identifier within the scanner's region and inserts the object metadata into one of a set of queues for ILM evaluation. An ILM rules applier dequeues object metadata for evaluation against ILM rules and determines whether an ILM task is to be performed for ILM rule compliance.
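The deterministic self-assignment above can be sketched with a simple even partition of a hashed object-ID space; every node computes the same partition from the sorted node list alone, so no coordination is needed. The hashing and partitioning scheme here is an assumption for illustration.

```python
"""Sketch of deterministic region self-assignment for ILM scanners."""

import hashlib

def my_region(node_id, all_nodes, namespace_bits=32):
    nodes = sorted(all_nodes)            # same view on every node
    span = (1 << namespace_bits) // len(nodes)
    idx = nodes.index(node_id)
    return idx * span, (idx + 1) * span - 1     # inclusive [start, end]

def owns(node_id, all_nodes, object_id):
    start, end = my_region(node_id, all_nodes)
    h = int(hashlib.sha256(object_id.encode()).hexdigest(), 16) % (1 << 32)
    return start <= h <= end

nodes = ["node-a", "node-b", "node-c", "node-d"]
# Exactly one node self-assigns each object, with no messages exchanged.
print([n for n in nodes if owns(n, nodes, "object-123")])
```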
A method, non-transitory computer readable medium, and device that assists with performing global data deduplication on data blocks across different volumes includes identifying at least two data blocks stored in two or more storage volumes. It is determined whether the at least two data blocks are classified as a shared data block. A new data volume signature is created when the at least two data blocks are determined to be shared. One of the at least two data blocks that are determined to be shared is deleted, and the other one of the at least two data blocks and the created signature are stored in one of the two or more storage volumes.
Data traffic of different customers or tenants can be efficiently handled at a shared node while still being isolated from each other. An application instance can create multiple network stacks that are isolated from each other and intelligently manage threads across the isolated network stack instances. To intelligently manage the threads across the network stack instances, each thread maintains data that identifies the network stack to which the thread is assigned. With this information, the application can intelligently use a thread already assigned to a network stack that will process the data traffic and avoid the performance impact of a system call to assign the thread to the network stack.
H04L 12/713 - Route fault prevention or recovery, e.g. rerouting, route redundancy, virtual router redundancy protocol [VRRP] or hot standby router protocol [HSRP] using node redundancy, e.g. VRRP
H04L 29/08 - Transmission control procedure, e.g. data link level control procedure
H04L 29/06 - Communication control; Communication processing characterised by a protocol
Techniques are provided for providing a storage abstraction layer for a composite aggregate architecture. A storage abstraction layer is utilized as an indirection layer between a file system and a storage environment. The storage abstraction layer obtains characteristics of a plurality of storage providers that provide access to heterogeneous types of storage of the storage environment (e.g., solid state storage, high availability storage, object storage, hard disk drive storage, etc.). The storage abstraction layer generates storage bins to manage storage of each storage provider. The storage abstraction layer generates a storage aggregate from the heterogeneous types of storage as a single storage container. The storage aggregate is exposed to the file system as the single storage container that abstracts away from the file system the management and physical storage details of data of the storage aggregate.
Techniques are provided for selectively storing data into allocation areas using streams. A set of allocation areas (e.g., ranges of block numbers such as virtual block numbers) are defined for a storage device. Data having particular characteristics (e.g., user data, metadata, hot data, cold data, randomly accessed data, sequentially accessed data, etc.) will be sent to the storage device for selective storage in corresponding allocation areas. For example, when a file system receives a write stream of hot data, the hot data may be assigned to a stream. The stream will be tagged using a stream identifier that is used as an indicator to the storage device to process data of the stream using an allocation area defined for hot data. In this way, data having different characteristics will be stored/confined within particular allocation areas of the storage device to reduce fragmentation and write amplification.
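A minimal sketch of stream-tagged allocation areas as described above: each data temperature maps to a stream ID, and each stream ID maps to a block-number range on the device. The ranges, stream IDs, and cursor-based allocator are assumptions for illustration.

```python
"""Sketch of stream-tagged writes confined to allocation areas."""

ALLOCATION_AREAS = {              # stream ID -> virtual block-number range
    "hot": range(0, 1000),
    "cold": range(1000, 2000),
    "metadata": range(2000, 3000),
}

class Device:
    def __init__(self):
        self.blocks = {}
        self.cursors = {sid: r.start for sid, r in ALLOCATION_AREAS.items()}

    def write(self, stream_id, data):
        # Confine the write to the stream's allocation area, which keeps data
        # of one temperature together and reduces fragmentation.
        block = self.cursors[stream_id]
        assert block in ALLOCATION_AREAS[stream_id], "area exhausted"
        self.blocks[block] = data
        self.cursors[stream_id] += 1
        return block

dev = Device()
print(dev.write("hot", b"frequently updated"))   # 0
print(dev.write("metadata", b"inode"))           # 2000
print(dev.write("hot", b"more hot data"))        # 1
```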
A method, non-transitory computer readable medium, and device that assists with performing data deduplication on data blocks includes receiving a plurality of data blocks, wherein each of the received plurality of data blocks is of an equal memory size. Each of the received plurality of data blocks is split into a plurality of segments with a segment size less than the equal memory size. Duplicate data is identified within each of the plurality of segments for each of the received plurality of data blocks. One occurrence of the identified duplicate data is stored from each of the received plurality of data blocks into a new data block.
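A short sketch of the segment-level deduplication above: equal-sized blocks are split into smaller segments, and only one copy of each distinct segment is kept, together with per-block recipes for reassembly. The tiny sizes are illustrative.

```python
"""Sketch of sub-block (segment) deduplication (illustrative sizes)."""

import hashlib

SEGMENT_SIZE = 4   # bytes per segment (tiny, for illustration)

def dedupe(blocks):
    segment_store = {}           # segment hash -> segment bytes (one copy each)
    recipes = []                 # per block: list of hashes to reassemble it
    for block in blocks:
        recipe = []
        for i in range(0, len(block), SEGMENT_SIZE):
            seg = block[i:i + SEGMENT_SIZE]
            digest = hashlib.sha256(seg).hexdigest()
            segment_store.setdefault(digest, seg)
            recipe.append(digest)
        recipes.append(recipe)
    return segment_store, recipes

blocks = [b"AAAABBBBCCCC", b"AAAAXXXXCCCC"]      # two 12-byte blocks
store, recipes = dedupe(blocks)
print(len(store))                                 # 4 distinct segments, not 6
print(b"".join(store[h] for h in recipes[1]))     # b'AAAAXXXXCCCC'
```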
Methods and systems for an object based storage are provided. As an example, a method for generating a metadata object for an archive data container having a plurality of data containers is disclosed. The method includes generating a first metadata signature for the archive data container using an archive data container identifier, a number of data containers within the archive data container, and placement information of each data container within the archive data container; assigning a plurality of blocks for storing data for the plurality of data containers at an object based storage to an intermediate logical object; updating a payload signature with placement information of the plurality of blocks within the intermediate logical object; and placing the first metadata signature and the updated payload signature within the metadata object, wherein the metadata object is used to retrieve location information for a specific data container within the archive data container.
A storage layer based orchestration method can efficiently migrate a virtualized, enterprise scale system across disparate virtualization environments. A copy of a source logical storage container with multiple virtual disks of virtual machines (VMs) can be created in a public cloud destination as a destination logical storage container. Each of the VMs is associated with at least one virtual disk that includes boot data ("boot disk") for the VM. With application programming interface function calls and/or scripted task automation and configuration management commands, the orchestration method coordinates different applications and tools to convert the boot disks into canonical storage representations (e.g., logical unit numbers (LUNs)), to instantiate VMs in the destination environment, and to chain load the boot disks to launch the VMs in a different virtualization environment.
Methods, non-transitory computer readable media, and devices that dynamically adjust a logical unit number fault domain in a distributed storage area network environment include determining when at least one of a plurality of nodes of a cluster is cut off from others of the plurality of nodes of the cluster. Any logical unit numbers (LUNs) owned by each of the plurality of nodes are identified. A fault domain for any of the identified LUNs owned by the at least one of the plurality of nodes determined to be cut off is adjusted from a distributed task set mode (DTM) of operation to a single task set mode (STM) of operation. This adjustment from the DTM of operation to the STM of operation is made without any communication to any of the one or more host computing devices interacting with the cluster.
G06F 3/06 - Digital input from, or digital output to, record carriers
G06F 11/14 - Error detection or correction of the data by redundancy in operation, e.g. by using different operation sequences leading to the same result
H04L 29/08 - Transmission control procedure, e.g. data link level control procedure
57.
REDUCING STABLE DATA EVICTION WITH SYNTHETIC BASELINE SNAPSHOT AND EVICTION STATE REFRESH
With a forever incremental snapshot configuration and a typical caching policy (e.g., least recently used), a storage appliance may evict stable data blocks of an older snapshot, perhaps unchanged data blocks of the snapshot baseline. If stable data blocks have been evicted, restore of a recent snapshot will suffer the time penalty of downloading the stable blocks for restoring the recent snapshot. Creating synthetic baseline snapshots and refreshing eviction data of stable data blocks can avoid eviction of stable data blocks and reduce the risk of violating a recovery time objective.
G06F 11/14 - Error detection or correction of the data by redundancy in operation, e.g. by using different operation sequences leading to the same result
G06F 17/30 - Information retrieval; Database structures therefor
58.
SNAPSHOT METADATA ARRANGEMENT FOR CLOUD INTEGRATION
A storage appliance arranges snapshot data and snapshot metadata into different structures, and arranges the snapshot metadata to facilitate efficient snapshot manipulation, which may be for snapshot management or snapshot restore. The storage appliance receives snapshots according to a forever incremental configuration and arranges snapshot metadata into different types of records. The storage appliance stores these records in key-value stores maintained for each defined data collection (e.g., volume). The storage appliance arranges the snapshot metadata into records for inode information, records for directory information, and records that map source descriptors of data blocks to snapshot file descriptors. The storage appliance uses a locally generated snapshot identifier as a key prefix for the records to conform to a sort constraint of the key-value store, which allows the efficiency of the key-value store to be leveraged. The snapshot metadata arrangement facilitates efficient snapshot restore, file restore, and snapshot reclamation.
G06F 11/14 - Error detection or correction of the data by redundancy in operation, e.g. by using different operation sequences leading to the same result
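A minimal sketch of the key-prefixed layout described above: using a locally generated snapshot ID as the key prefix keeps all records of one snapshot contiguous under the store's sort order, so a single prefix scan retrieves them. The record shapes and the list-backed store are assumptions.

```python
"""Sketch of snapshot metadata records under a sorted key-value store."""

import bisect

class SortedKV:
    """Stand-in for a sorted key-value store."""
    def __init__(self):
        self.keys, self.values = [], []

    def put(self, key, value):
        i = bisect.bisect_left(self.keys, key)
        self.keys.insert(i, key)
        self.values.insert(i, value)

    def scan(self, prefix):
        i = bisect.bisect_left(self.keys, prefix)
        while i < len(self.keys) and self.keys[i].startswith(prefix):
            yield self.keys[i], self.values[i]
            i += 1

kv = SortedKV()
kv.put("0001/inode/17", {"size": 4096})
kv.put("0001/dir/root", {"entries": ["f1"]})
kv.put("0002/inode/17", {"size": 8192})          # next incremental snapshot

# One prefix scan fetches every record of snapshot 0001, e.g., for a restore.
print(list(kv.scan("0001/")))
```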
59.
MULTIPLE NODE REPAIR USING HIGH RATE MINIMUM STORAGE REGENERATION ERASURE CODE
A distributed storage system can use a high rate MSR erasure code to repair multiple nodes when multiple node failures occur. An encoder constructs m r-ary trees to determine the symbol arrays for the parity nodes. These symbol arrays are used to generate the parity data according to parity definitions or parity equations. The m r-ary trees are also used to identify a set of recovery rows across helper nodes for repairing a systematic node. When failed systematic nodes correspond to different ones of the m r-ary trees, a decoder may select additional recovery rows. The decoder selects additional recovery rows when the parity definitions do not provide a sufficient number of independent linear equations to solve the unknown symbols of the failed nodes. The decoder can select recovery rows contiguous to the already identified recovery rows for access efficiency.
G06F 11/10 - Adding special bits or symbols to the coded information, e.g. parity check, casting out nines or elevens
H03M 13/03 - Error detection or forward error correction by redundancy in data representation, i.e. code words containing more digits than the source words
60.
METHODS FOR MINIMIZING FRAGMENTATION IN SSD WITHIN A STORAGE SYSTEM AND DEVICES THEREOF
A method, non-transitory computer readable medium, and device that assists with reducing memory fragmentation in solid state devices includes identifying an allocation area within an address range to write data from a cache. Next, it is determined whether the identified allocation area includes previously stored data. The previously stored data is read from the identified allocation area when it is determined that the identified allocation area comprises previously stored data. Next, both the write data from the cache and the read previously stored data are written back into the identified allocation area sequentially through the address range.
A method, non-transitory computer readable medium and storage server computing device that stores an identifier for a file system block evicted from a buffer cache in an entry in a table. The file system block is inserted into a victim cache hosted by an ephemeral block-level storage device by invoking a function provided by an application programming interface (API). The API exposes the ephemeral block-level storage device to a virtual storage appliance via an operating system of the storage server computing device. The entry in the table is updated to include location(s) on the ephemeral block-level storage device at which one or more portions of the file system block are stored, the location(s) returned in response to the function invocation. By this technology, performance of the virtual storage appliance is significantly improved, resulting in lower latency for client devices requesting data in a cloud storage environment.
G06F 12/0804 - Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches with main memory updating
G06F 12/0868 - Data transfer between cache memory and other subsystems, e.g. storage devices or host systems
G06F 12/0897 - Caches characterised by their organisation or structure with two or more cache hierarchy levels
G06F 12/128 - Replacement control using replacement algorithms adapted to multidimensional cache systems, e.g. set-associative, multicache, multiset or multilevel
To decrease a load on a network and a storage system, encryption operations can be offloaded to a server locally connected to the storage system. The server receives requests to perform encryption operations, such as LUN encryption or file encryption, for a host. The server obtains an encryption key unique to the host and performs the encryption operation using the encryption key. The server then notifies the host that an encrypted LUN or encrypted file is available for use. The host is able to utilize the encrypted data because the encryption was performed with the host's unique key. Since the server is locally connected to the storage system, offloading encryption requests to the server reduces the load on a network by reducing the amount of traffic transmitted between a host and the storage system.
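The offload flow can be sketched as follows, using the third-party `cryptography` package's Fernet construction purely as a stand-in for whatever LUN or file encryption the server actually performs; the per-host key store and notification callback are illustrative assumptions.

```python
# Minimal sketch: the server looks up a key unique to the requesting host,
# performs the encryption operation, and notifies the host.
from cryptography.fernet import Fernet

host_keys = {"host-a": Fernet.generate_key()}   # one key per host

def handle_encryption_request(host, data, notify):
    key = host_keys[host]                  # encryption key unique to the host
    ciphertext = Fernet(key).encrypt(data)
    notify(host, "encrypted object ready")
    return ciphertext

blob = handle_encryption_request(
    "host-a", b"LUN or file contents",
    notify=lambda h, msg: print(f"{h}: {msg}"))

# The host can use the encrypted data because its own unique key was used.
print(Fernet(host_keys["host-a"]).decrypt(blob))
```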
One or more techniques and/or computing devices are provided for cross- platform replication. For example, a replication relationship may be established between a first storage endpoint and a second storage endpoint, where at least one of the storage endpoints, such as the first storage endpoint, lacks or has incompatible functionality to perform and manage replication because the storage endpoints have different storage platforms that store data differently, use different control operations and interfaces, etc. Accordingly, replication destination workflow, replication source workflow, and/or a proxy representing the first storage endpoint may be implemented at the second storage endpoint comprising the replication functionality. In this way, replication, such as snapshot replication, may be implemented between the storage endpoints by the second storage endpoint using the replication destination workflow, the replication source workflow, and/or the proxy that either locally executes tasks or routes tasks to the first storage endpoint such as for data access.
G06F 11/20 - Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
64.
NAMESPACE HIERARCHY PRESERVATION WITH MULTIPLE OBJECT STORAGE OBJECTS
To leverage the attributes of object storage for applications/systems created to interface with a network file system, an object storage backed file system can accept the defined file system commands from the applications/systems and transform the file system commands into requests that target object storage. The file system is "backed" by object storage because attributes and content of file system entities are stored in objects. For instance, content data and metadata of a file are stored in objects in object storage. This object storage backed file system can be considered a bridge between a client perceived hierarchical file system namespace and a flat namespace of an object storage.
An archival cloud storage service can be created with cost efficient components for large scale data storage and can efficiently use these components. A frontend of the cloud storage service presents an asynchronous storage interface to consuming devices of the cloud storage service. Providing an asynchronous storage service interface avoids at least some of the state data overhead that accompanies a time constrained interface (e.g., a request-response based interface with timeouts in seconds). Backend nodes of the cloud storage service periodically query the frontend servers to select requests that the backend nodes can fulfill. Each backend node selects requests based on its own characteristics information, which is likely dynamic. Thus, the storage system underlying the cloud storage service can be considered a self-organizing storage system.
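A toy version of the pull-based selection might look like this; the request queue, the claim protocol, and the single "free capacity" characteristic are all simplifying assumptions.

```python
# Minimal sketch: backend nodes periodically poll frontend request queues
# and claim only the requests they can currently fulfill.
pending_requests = [
    {"id": 1, "size_gb": 10, "kind": "write"},
    {"id": 2, "size_gb": 900, "kind": "write"},
    {"id": 3, "size_gb": 5, "kind": "read"},
]

class BackendNode:
    def __init__(self, name, free_gb):
        self.name = name
        self.free_gb = free_gb   # a (dynamic) backend characteristic

    def poll(self, frontend_queue):
        """Select and claim requests this node can currently fulfill."""
        claimed = [r for r in frontend_queue if r["size_gb"] <= self.free_gb]
        for r in claimed:
            frontend_queue.remove(r)
            self.free_gb -= r["size_gb"]
        return claimed

node = BackendNode("backend-7", free_gb=100)
print(node.poll(pending_requests))   # claims requests 1 and 3, leaves 2
```

Because each backend decides what to pull based on its own state, no central placement logic is needed, which is what makes the underlying system self-organizing.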
In some embodiments, a cluster computing system notifies a host system that a first path to a resource in the cluster computing system is optimized and that a second path to the resource is non-optimized. The resource is owned or managed by a first computing node of the cluster computing system. The first path includes the first computing node. The second path includes a second computing node and an intra-cluster connection between the second computing node and the first computing node. A disruption in the intra-cluster connection, which prevents communication between the first and second computing nodes via the intra-cluster connection, is identified. During a time period in which the disruption exists, the host system is notified that the first path is optimized and that the second path is unavailable, and input/output operations between the host system and the resource via the first path are continued.
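The reporting behavior reduces to a small decision, sketched below with hypothetical path and state names (the states resemble ALUA-style access states, though the abstract does not name a standard): while the intra-cluster connection is disrupted, the indirect path is reported "unavailable" rather than "non-optimized".

```python
# Minimal sketch: path-state reporting during an intra-cluster disruption.
def report_path_states(intra_cluster_connection_up):
    states = {"path-via-owner-node": "optimized"}
    if intra_cluster_connection_up:
        states["path-via-partner-node"] = "non-optimized"
    else:
        # I/O continues on the optimized path only.
        states["path-via-partner-node"] = "unavailable"
    return states

print(report_path_states(intra_cluster_connection_up=True))
print(report_path_states(intra_cluster_connection_up=False))
```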
One or more techniques and/or computing devices are provided for inline deduplication. For example, a checksum hash table and/or a block number hash table may be maintained within memory (e.g., a storage controller may maintain the hash tables in-core). The checksum hash table may be utilized for inline deduplication to identify potential donor blocks that may comprise the same data as an incoming storage operation. Data within an in-core buffer cache is eligible as potential donor blocks so that inline deduplication may be performed using data from the in-core buffer cache, which may mitigate disk access to underlying storage for which the in-core buffer cache is used for caching. The block number hash table may be used for updating or removing entries from the hash tables, such as for blocks that are no longer eligible as potential donor blocks (e.g., deleted blocks, blocks evicted from the in-core buffer cache, etc.).
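A minimal in-core model of the two tables is sketched below; the checksum function, eligibility rules, and donor verification step are illustrative assumptions.

```python
# Minimal sketch: inline dedup with a checksum table (find donors) and a
# block-number table (purge entries when a block is no longer eligible).
import hashlib

checksum_table = {}   # checksum -> block number of a potential donor
blockno_table = {}    # block number -> checksum (reverse lookup)
buffer_cache = {}     # block number -> data (in-core, so no disk access)

def write_block(block_no, data):
    csum = hashlib.sha256(data).hexdigest()
    donor = checksum_table.get(csum)
    if donor is not None and buffer_cache.get(donor) == data:
        return ("dedup-share", donor)         # share with the donor block
    checksum_table[csum] = block_no
    blockno_table[block_no] = csum
    buffer_cache[block_no] = data
    return ("stored", block_no)

def evict_block(block_no):
    """Block no longer eligible as a donor: purge both tables."""
    csum = blockno_table.pop(block_no, None)
    if csum is not None and checksum_table.get(csum) == block_no:
        del checksum_table[csum]
    buffer_cache.pop(block_no, None)

print(write_block(1, b"hello"))   # ('stored', 1)
print(write_block(2, b"hello"))   # ('dedup-share', 1)
```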
One or more techniques and/or computing devices are provided for managing an arbitrary set of storage items using a granset. For example, a storage controller may host a plurality of storage items and/or logical unit numbers (LUNs). A subset of the storage items are grouped into a consistency group. A granset is created for tracking, managing, and/or providing access to the storage items within the consistency group. For example, the granset comprises application programming interfaces (APIs) and/or properties used to provide certain levels of access to the storage items (e.g., read access, write access, no access), redirect operations to access either data of an active file system or to a snapshot, fence certain operations (e.g., rename and delete operations), and/or other properties that apply to each storage item within the consistency group. Thus, the granset provides a persistent on-disk layout used to manage an arbitrary set of storage items.
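The following sketch models a granset as a single object whose properties gate every item in the group; the property names and access levels are illustrative assumptions, and the persistent on-disk layout is not modeled.

```python
# Minimal sketch: group-wide access level, redirection target, and fencing
# applied uniformly to an arbitrary set of storage items.
class Granset:
    def __init__(self, items):
        self.items = set(items)            # the arbitrary set of storage items
        self.access = "read-write"         # "read", "read-write", or "none"
        self.redirect_to = "active-fs"     # or a snapshot identifier
        self.fenced_ops = {"rename", "delete"}

    def check(self, item, op):
        """Gate an operation against the group-wide properties."""
        if item not in self.items:
            return "not-managed"
        if op in self.fenced_ops:
            return "fenced"
        if self.access == "none" or (op == "write" and self.access == "read"):
            return "denied"
        return f"allowed (served from {self.redirect_to})"

g = Granset(["lun0", "lun1", "file7"])
print(g.check("lun0", "read"))     # allowed (served from active-fs)
print(g.check("lun1", "rename"))   # fenced
```

Keeping these properties in one structure means a consistency group can be fenced, redirected to a snapshot, or re-opened with one property change rather than per-item updates.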
One or more techniques and/or computing devices are provided for replicating virtual machine disk clones. For example, a first storage controller, hosting first storage, may have a synchronous replication relationship with a second storage controller hosting second storage. A virtual machine, within the first storage, may be specified as having synchronous replication protection. Accordingly, virtual machine disk clones of a virtual machine disk of the virtual machine may be replicated from the first storage to the second storage. For example, virtual machine disk clones may be synchronous replicated, replicated by a resync process invoked by a hypervisor agent, and/or stored and replicated from a clone backup directory.
G06F 11/14 - Error detection or correction of the data by redundancy in operation, e.g. by using different operation sequences leading to the same result
G06F 11/20 - Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
One or more techniques and/or computing devices are provided for secure data replication. For example, a first storage controller (116) may host first storage (128) within which storage resources (e.g., files, logical unit numbers (LUNs), volumes, etc.) are stored. The first storage controller (116) may establish an access policy with a second storage controller (118) to which data is to be replicated from the first storage (128). The access policy may define an authentication mechanism for the first storage controller (116) to authenticate the second storage controller (118), an authorization mechanism specifying a type of access that the second storage controller (118) has for a storage resource, and an access control mechanism specifying how the second storage controller's access to data of the storage resource is to be controlled. In this way, data replication requests may be authenticated and authorized so that data may be provided, according to the access control mechanism, in a secure manner.
A method and system for managing backup storage of file system entities. In an aspect, a file system catalog includes a database populator tool that generates records within a metadata table that may be maintained within a database. In response to detecting a replication cycle, the populator tool reads a stream of replication operations. For each of the replication operations, the populator tool determines the type of operation and, in response to determining that a directory inode is an operand of the replication operation, generates one or more catalog records. Each of the generated records includes and logically associates data entries corresponding to an inode number, a parent inode number, an entity type, a point-in-time-image (PTI) ID, an absolute path, and an operation.
G06F 17/30 - Information retrieval; Database structures therefor
G06F 11/14 - Error detection or correction of the data by redundancy in operation, e.g. by using different operation sequences leading to the same result
72.
MULTIPLE DATASET BACKUP VERSIONS ACROSS MULTI-TIERED STORAGE
A storage tier manager creates different versions of a dataset backup for different retention periods. Each of the versions is distinctly identifiable despite initially representing a same dataset backup. One version can be referred to as a cached version of the dataset backup and another version can be referred to as a cloud version of the dataset backup. When the retention period expires for the cached version of the dataset backup, the storage tier manager migrates the cloud version of the dataset backup from the caching storage tier to the cloud storage tier. The storage tier manager can then recover storage space occupied by data that has been migrated, as long as that data is not shared with other cached versions of other dataset backups due to deduplication.
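A reference-counted sketch of the reclamation rule is shown below; retention periods, tier names, and the block-level dedup model are illustrative assumptions.

```python
# Minimal sketch: two versions of one backup with different retentions;
# space is reclaimed only for blocks no other cached backup still shares.
from collections import Counter

cache_refcounts = Counter()   # deduplicated block -> count of cached backups

class BackupVersion:
    def __init__(self, backup_id, tier, blocks, retention_days):
        self.backup_id, self.tier = backup_id, tier
        self.blocks, self.retention_days = blocks, retention_days

def create_backup(backup_id, blocks):
    for b in blocks:
        cache_refcounts[b] += 1
    # Both versions start on the caching tier but are distinctly identifiable.
    return (BackupVersion(backup_id, "cache", blocks, retention_days=30),
            BackupVersion(backup_id, "cache", blocks, retention_days=365))

def expire_cached_version(cached, cloud, migrate_to_cloud):
    migrate_to_cloud(cloud)            # cloud version moves to the cloud tier
    cloud.tier = "cloud"
    reclaimed = []
    for b in cached.blocks:            # reclaim only unshared blocks
        cache_refcounts[b] -= 1
        if cache_refcounts[b] == 0:
            reclaimed.append(b)
    return reclaimed

cached, cloud = create_backup("nightly-01", blocks=["b1", "b2"])
other_cached, _ = create_backup("nightly-02", blocks=["b2", "b3"])
print(expire_cached_version(cached, cloud, migrate_to_cloud=lambda v: None))
# ['b1'] -- 'b2' stays because nightly-02's cached version still shares it
```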
G06F 11/14 - Error detection or correction of the data by redundancy in operation, e.g. by using different operation sequences leading to the same result
A method and system for replicating file system entities in a manner that preserves block-level access and file system efficiency mechanisms such as deduplication are disclosed. In an embodiment, a replication engine receives a stream of file system entities that includes file system inodes and file system data blocks. The replication engine generates object-based storage (OBS) objects based on data and reference information specified by the file system entities. As part of generating the OBS objects, the replication engine generates at least one inode file object that associates file block numbers of a file system inode file and the inode numbers. The replication engine uses inode information to generate reference objects that logically associate file block numbers with data block numbers in a per-inode manner. The replication engine further generates data objects that contain the file system data blocks and that associate the data blocks with corresponding data block numbers.
G06F 3/06 - Digital input from, or digital output to, record carriers
G06F 17/30 - Information retrieval; Database structures therefor
G06F 11/14 - Error detection or correction of the data by redundancy in operation, e.g. by using different operation sequences leading to the same result
One or more techniques and/or computing devices are provided for granular replication for data protection. For example, a first storage controller may host a first volume. A consistency group, comprising a subset of files, logical unit numbers, and/or other data of the first volume, is defined through a consistency group configuration. A baseline transfer, using a baseline snapshot of the first volume, is used to create a replicated consistency group within a second volume hosted by a second storage controller. In this way, an arbitrary level of granularity is used to synchronize/replicate a subset of the first volume to the second volume. If a synchronous replication relationship is specified, then one or more incremental transfers are performed and a synchronous replication engine is implemented. If an asynchronous replication relationship is specified, then snapshots are used to identify delta data of the consistency group for updating the replicated consistency group.
G06F 11/14 - Error detection or correction of the data by redundancy in operation, e.g. by using different operation sequences leading to the same result
G06F 11/20 - Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
75.
ADAPTIVE, SELF LEARNING CONSISTENCY POINT TRIGGERS
Presented herein are methods, non-transitory computer readable media, and devices for allocating resources to a particular volume and triggering a consistency point based on the amount given to each volume, thus providing dynamic consistency point triggers. Methods for providing dynamic consistency point triggers are disclosed which include: determining a volume's capacity to utilize resources based on the volume's performance; receiving a portion of the divided resources based on total system resources available within the storage server and the volume's performance; and triggering a consistency point upon exhausting a threshold percentage of the received portion of resources.
One or more techniques and/or computing devices are provided for utilizing snapshots for data integrity validation and/or faster application recovery. For example, a first storage controller, hosting first storage, has a synchronous replication relationship with a second storage controller hosting second storage. A snapshot replication policy rule is defined to specify that a replication label is to be used for snapshot create requests, targeting the first storage, that are to be replicated to the second storage. A snapshot creation policy is created to issue snapshot create requests comprising the replication label. Thus, a snapshot of the first storage and a replication snapshot of the second storage are created based upon a snapshot create request comprising the replication label. The snapshot and the replication snapshot may be compared for data integrity validation (e.g., determine whether the snapshots comprise the same data) and/or quickly recovering an application after a disaster.
G06F 11/14 - Error detection or correction of the data by redundancy in operation, e.g. by using different operation sequences leading to the same result
G06F 11/20 - Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
G06F 11/16 - Error detection or correction of the data by redundancy in hardware
77.
METHODS FOR MANAGING ARRAY LUNS IN A STORAGE NETWORK WITH A MULTI-PATH CONFIGURATION AND DEVICES THEREOF
Methods, non-transitory computer readable media, and storage management computing devices that obtain and store a plurality of LUN ranges and an indication of a plurality of ports of a bridge device corresponding to the plurality of LUN ranges. A LUN is received from the bridge device. The one of the plurality of ports of the bridge device is identified to which a storage device associated with the received LUN is attached, via inclusion in one of a plurality of stacks communicably coupled to that port. The received LUN is within one of the plurality of LUN ranges corresponding to the identified port. An action is initiated based on the identified one of the plurality of ports of the bridge device.
One or more techniques and/or computing devices are provided for resilient replication of storage operations. For example, a first storage controller may host first storage having a replication relationship with second storage hosted by a second storage controller. To improve resiliency against transient network issues of a network between the storage controllers, the first storage controller may implement a queue and retry mechanism to retry replication operations not acknowledged by the second storage controller within a threshold time. The second storage controller may maintain a cumulative sequence number of the latest replication operation performed in order, an operation response map of replication operations performed out of order, and an operation finder map identifying currently implemented replication operations, which may be used to process incoming replication operations. Single write semantics, write order consistency, and reduction of write amplification may be provided.
A method, a computing device, and a non-transitory machine-readable medium for assessing data segments for garbage collection are provided. In some embodiments, the method includes identifying a plurality of data segments. A first rate at which data within each of the plurality of data segments has been invalidated since a first point in time is determined, and a second rate at which data within each of the plurality of data segments has been invalidated since a second point in time subsequent to the first point in time is determined. The second rate is compared to the first rate for each of the plurality of data segments, and a garbage collection score is assigned to the respective data segment based on the comparison. The garbage collection score may be further based on a utilization of the respective data segment and/or an age of the respective data segment.
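One plausible reading of the scoring is sketched below; the weights, the comparison of the two rates, and the utilization and age terms are assumptions rather than the claimed formula.

```python
# Minimal sketch: score a segment by comparing a long-horizon invalidation
# rate with a more recent one, folding in utilization and age.
import time

def gc_score(segment, now=None):
    now = now or time.time()
    t1, t2 = segment["t1"], segment["t2"]          # t2 is later than t1
    rate1 = segment["invalidated_since_t1"] / max(now - t1, 1e-9)
    rate2 = segment["invalidated_since_t2"] / max(now - t2, 1e-9)

    # If invalidation is slowing (rate2 < rate1), the segment has likely gone
    # "cold": little more data will self-invalidate, so it is a good target.
    cooling = rate1 - rate2
    utilization = segment["live_blocks"] / segment["total_blocks"]
    age = now - segment["created_at"]
    return cooling + (1.0 - utilization) + 0.001 * age

seg = {"t1": 0.0, "t2": 500.0, "created_at": 0.0,
       "invalidated_since_t1": 400, "invalidated_since_t2": 20,
       "live_blocks": 100, "total_blocks": 1000}
print(gc_score(seg, now=1000.0))   # higher score => better GC candidate
```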
Embodiments address the problem of disk fragmentation by using the heuristics of write operations to assign block sizes. As write requests are received, a storage system may register a size of the write request. Using the registered sizes, the storage system may identify one or more clusters of sizes at which write requests are particularly prevalent. The storage system may calculate a distribution or variance for block sizes centered on each cluster. The distribution or variance may be used to distribute the block sizes such that the block sizes change by a small amount in the vicinity of the cluster, and by a larger amount as the blocks move away from the center of the cluster. When it comes time to allocate new blocks, the clusters and distribution may be consulted to determine what sizes of blocks to allocate, and how many blocks of each size.
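A toy version of this heuristic is sketched below, with assumed bucket and step parameters: register write sizes, find the most prevalent clusters, then space candidate block sizes densely near each cluster center and coarsely away from it.

```python
# Minimal sketch: mine write-request sizes for clusters, then derive block
# sizes whose granularity is fine near a cluster and coarse far from it.
from collections import Counter

def record_sizes(write_sizes, bucket=512):
    """Register write sizes into coarse buckets."""
    return Counter((s // bucket) * bucket for s in write_sizes)

def block_sizes_for_cluster(center, spread=4096, near_step=512, far_step=4096):
    """Fine-grained sizes near the cluster center, coarser further out."""
    sizes = {center}
    for d in range(near_step, spread, near_step):       # dense near center
        sizes.update({center - d, center + d})
    for d in range(spread, 4 * spread, far_step):       # sparse further out
        sizes.update({center - d, center + d})
    return sorted(s for s in sizes if s > 0)

hist = record_sizes([4096, 4100, 4608, 4099, 65536, 65600])
clusters = [size for size, count in hist.most_common(2)]
for c in clusters:
    print(c, block_sizes_for_cluster(c)[:6])
```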
A system and method for improving storage system performance by reducing or avoiding load spike amplification when performing garbage collection is disclosed. A storage controller in a storage system tracks system load including write load and read load, as well as available free segments. The storage controller uses these tracked values as inputs and, with these inputs, generates a garbage collection rate. Where read load is included, a scaled portion of the read load is taken into consideration so that, as the number of free segments nears the desired minimum, the read load is gradually excluded from the garbage collection rate determination to prevent garbage collecting too slowly. The garbage collection rate is therefore responsive to system load so that, in times of high system load, the rate is reduced as much as is safe so that the write load takes priority for the storage controller's computing resources.
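The scaling idea can be sketched as below; the thresholds, the linear headroom function, and the load units are illustrative assumptions, not the patented rate calculation.

```python
# Minimal sketch: derive a GC rate from write load, read load, and free
# segments; near the minimum, read load stops suppressing GC.
def gc_rate(write_load, read_load, free_segments,
            min_free=100, comfort_free=1000, max_rate=1.0):
    """Loads are normalized to [0, 1]; rate is segments cleaned per tick."""
    # Headroom is 1.0 with plenty of free segments, 0.0 at the minimum.
    headroom = (free_segments - min_free) / max(comfort_free - min_free, 1)
    headroom = min(max(headroom, 0.0), 1.0)

    # Scale read load by headroom: under pressure it is gradually excluded,
    # so garbage collection never falls dangerously behind.
    effective_load = write_load + headroom * read_load
    return max_rate * (1.0 - min(effective_load, 1.0))

print(gc_rate(write_load=0.3, read_load=0.4, free_segments=900))  # relaxed
print(gc_rate(write_load=0.3, read_load=0.4, free_segments=120))  # urgent
```

With identical loads, the second call yields a higher rate because the scarce free segments cause the read load to be mostly excluded.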
One or more techniques and/or computing devices are provided for synchronous replication. For example, synchronous replication relationships are established between a first storage object (e.g., a file, a logical unit number (LUN), a consistency group, etc.), hosted by a first storage controller, and a plurality of replication storage objects hosted by other storage controllers. In this way, a write operation to the first storage object is implemented in parallel upon the first storage object and the replication storage objects in a synchronous manner, such as using a zero-copy operation to reduce overhead otherwise introduced by performing copy operations. Reconciliation is performed in response to a failure so that the first storage object and the replication storage objects comprise consistent data. Failed write operations and replication write operations are retried, while enforcing a single write semantic. Dependent write order consistency is enforced for dependent write operations, such as overlapping write operations.
G06F 11/20 - Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
A request to generate a storage system model is received. The storage system model represents at least a portion of a storage system. In response to receiving the request, a storage system interface configuration is loaded. The storage system interface configuration comprises an attribute of an entity model. The attribute corresponds to an attribute of a storage system entity of the storage system. Further in response to receiving the request, the entity model is identified as representing the storage system entity. In response to identifying the entity model as representing the storage system entity, the entity model is instantiated.
High-Rate MSR (n, k) erasure codes (HMSR) for application in distributed data storage systems are generated using m r-ary trees, where n = k x m and r = n - k. Nodes in the tree structures represent systematic and parity storage nodes. Each parity symbol for the HMSR erasure codes will be a linear combination of at most k + k/r systematic symbols. The tree structures show that when a systematic node fails, its original systematic symbols can be recovered by accessing beta symbols for each of its leaf nodes from each of the remaining nodes. Traversing the m r-ary trees to design a codeword array will provide the linear equations needed to solve for and recover the lost systematic symbols. When forming the linear equations, random number or other coefficients can be added to the systematic symbols to construct the parity symbols. The parities of the HMSR erasure code will ensure access-optimal, help-by-transfer recovery of any systematic node failure by using only a minimum bandwidth.
H03M 13/03 - Error detection or forward error correction by redundancy in data representation, i.e. code words containing more digits than the source words
A persistence management system performs, at a server, operations associated with a number of applications. At the server, a persistence manager can intercept a file system call from one of the applications, wherein the file system call specifies a file located on a remote persistent storage device separate from the server. The persistence manager can determine that data belonging to the file requested by the file system call is stored on a local persistent storage device at the server, retrieve the data from the local persistent storage, and respond to the file system call from the application with the data.
One or more techniques and/or computing devices are provided for implementing synchronous replication. For example, a synchronous replication relationship may be established between a local storage controller hosting local storage and a remote storage controller hosting remote storage (e.g., replication may be specified at a file, logical unit number (LUN), or any other level of granularity). Data file operations may be implemented in parallel upon the local storage and the remote storage. Independent metadata file operations may be independently implemented from data file operations upon the local storage, and upon local completion may be remotely implemented upon the remote storage. In-flight data file operations may be drained before dependent metadata file operations are locally implemented upon the local storage, and upon local completion may be remotely implemented upon the remote storage.
G06F 3/06 - Digital input from, or digital output to, record carriers
G06F 11/20 - Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
87.
SYNCHRONOUS REPLICATION FOR STORAGE AREA NETWORK PROTOCOL STORAGE
One or more techniques and/or computing devices are provided for implementing synchronous replication. For example, a synchronous replication relationship may be established between a first storage controller hosting local storage and a second storage controller hosting remote storage (e.g., replication may be specified at a file, logical unit number (LUN), or any other level of granularity). Data operations and offloaded operations may be implemented in parallel upon the local storage and the remote storage. Error handling operations may be implemented upon the local storage and implemented in parallel as a best effort on the remote storage, and a reconciliation may be performed to identify any data divergence from the best effort parallel implementation. Storage area network (SAN) operations may be implemented upon the local storage, and upon local completion may be remotely implemented upon the remote storage.
G06F 3/06 - Digital input from, or digital output to, record carriers
G06F 11/20 - Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
One or more techniques and/or computing devices are provided for directory level incremental replication. For example, a first storage controller may evaluate a base snapshot and an incremental snapshot of a source subdirectory to generate a set of operations that can be used by a second storage controller for reconstructing a mirror of the source subdirectory as reflected by the incremental snapshot. Accordingly, the first storage controller may send the set of operations and/or source data to the second storage controller for constructing a destination directory structure mirroring the source subdirectory. In this way, replication may be achieved at an arbitrary level of granularity, such as to replicate a particular subdirectory of a volume.
One or more techniques and/or computing devices are provided for non-disruptively establishing a synchronous replication relationship between a primary volume and a secondary volume and/or for resynchronizing the primary volume and the secondary volume. For example, a baseline snapshot and one or more incremental snapshots of the primary volume are used to construct and incrementally update the secondary volume with data from the primary volume. A dirty region log is used to track modifications to the primary volume. A splitter object is used to split client write requests to the primary volume and to the secondary volume. A synchronous transfer engine session is initiated to process incoming client write requests using the dirty region log. A cutover scanner is used to transfer dirty data from the primary volume to the secondary volume. In this way, a synchronous replication relationship is established between the primary volume and the secondary volume.
A system and method for recovering a dataset is provided that analyzes the dataset as it currently exists in order to determine those portions that do not need to be recovered. In some embodiments, the method includes identifying a dataset stored on a set of storage devices and corresponding to a first point in time. A request to restore the dataset to a second point in time is received, and a subset of the dataset is identified that is different between the first point in time and the second point in time. Data associated with the subset is selectively retrieved that corresponds to the second point in time, and the retrieved data is merged with the dataset stored on the set of storage devices. The two points in time may have any relationship, and in various examples, the method performs a roll-back or a roll-forward of the dataset.
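A dictionary-based sketch of the diff-and-merge restore follows; block granularity, the checksum manifest, and the fetch callback are illustrative stand-ins for the actual dataset and backup formats.

```python
# Minimal sketch: restore only the blocks that differ between the dataset's
# current point in time (t1) and the requested one (t2), then merge in place.
def restore_to_point_in_time(current, snapshot_at_t2, fetch_block):
    """current: block_id -> data as stored now (time t1).
    snapshot_at_t2: block_id -> checksum at the requested time t2.
    fetch_block: retrieves a block's t2 contents (e.g., from backup)."""
    changed = [bid for bid, csum in snapshot_at_t2.items()
               if hash(current.get(bid)) != csum]      # differing subset only
    for bid in changed:
        current[bid] = fetch_block(bid)                # selective retrieval
    for bid in set(current) - set(snapshot_at_t2):
        del current[bid]                               # blocks absent at t2
    return changed

dataset = {0: b"a", 1: b"b", 2: b"junk"}
manifest = {0: hash(b"a"), 1: hash(b"B")}
print(restore_to_point_in_time(dataset, manifest, fetch_block=lambda b: b"B"))
# [1] -- block 0 was already correct, block 2 is removed, only block 1 fetched
```

Nothing in the sketch depends on t2 being earlier or later than t1, which matches the abstract's point that the same mechanism performs a roll-back or a roll-forward.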
Disclosed are systems, computer-readable mediums, and methods for transforming data in a file system. As part of a recycling process, a determination is made that transformations should be attempted. A data block is determined to be in use by at least one user of the storage system. Whether a transformation should be attempted on the data block is determined. Parameters associated with the performance of the file system can be used in this determination. A type of transformation to be done is determined. The data block is transformed based upon the selected transformation. The transformed data block is written to the storage system. Because the transformation is performed as part of the recycling process, it requires no additional input/output requests.
Exemplary embodiments provide methods, mediums, and systems for replicating metafiles between a source and a destination. The metafile may be subdivided into blocks. The contents of the metafile may be transferred by locating the blocks which are changed between the source version of the metafile and the destination version of the metafile. The changed blocks may be examined to retrieve the contents of the changed blocks. The records in the changed blocks may be evaluated to determine whether to create a corresponding record at the destination, delete a corresponding record at the destination, or update a corresponding record at the destination. Accordingly, the metafile may be replicated in a logical manner, by transferring only changed records rather than the entirety of a changed block. Moreover, the transfer is conducted efficiently because unchanged blocks are eliminated from consideration at the outset.
One or more techniques and/or computing devices are provided for determining whether to perform a switchover operation between computing nodes. A first computing node and a second computing node, configured as disaster recovery partners, may be deployed within a computing environment. The first computing node and the second computing node may be configured to provide operational state information (e.g., normal operation, a failure, etc.) to a cloud environment node state provider and/or cloud persistent storage accessible through a cloud storage service. Accordingly, a computing node may obtain operational state information of a partner node from the cloud environment node state provider and/or the cloud storage service notwithstanding a loss of internode communication and/or an infrastructure failure that may otherwise appear as a failure of the partner node. In this way, the computing node may accurately determine whether the partner node has failed.
G06F 11/20 - Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
G06F 11/07 - Responding to the occurrence of a fault, e.g. fault tolerance
One or more techniques and/or computing devices are provided for automatic switchover implementation. For example, a first storage controller, of a first storage cluster, may have a disaster recovery relationship with a second storage controller of a second storage cluster. In the event the first storage controller fails, the second storage controller may automatically switchover operation from the first storage controller to the second storage controller for providing clients with failover access to data previously accessible to the clients through the first storage controller. The second storage controller may detect, cross-cluster, a failure of the first storage controller utilizing remote direct memory access (RDMA) read operations to access heartbeat information, heartbeat information stored within a disk mailbox, and/or service processor traps. In this way, the second storage controller may efficiently detect failure of the first storage controller to trigger automatic switchover for non-disruptive client access to data.
G06F 11/20 - Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
A system, method, and machine-readable storage medium for recovering data in a distributed storage system are provided. In some embodiments, the method includes identifying a failing storage device of a first storage node having an inaccessible data segment. When it is determined that the inaccessible data segment cannot be recovered using a first data protection scheme, a first chunk of data associated with the inaccessible data segment is identified and a group associated with the first chunk of data is identified. A second chunk of data associated with the group is selectively retrieved from a second storage node such that data associated with an accessible data segment of the first storage node is not retrieved. The inaccessible data segment is recovered by recovering the first chunk of data using a second data protection scheme and the second chunk of data.
G06F 11/08 - Error detection or correction by redundancy in data representation, e.g. by using checking codes
G06F 11/16 - Error detection or correction of the data by redundancy in hardware
G06F 11/20 - Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
G06F 15/16 - Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
G06F 15/177 - Initialisation or configuration control
96.
CACHE FLUSHING AND INTERRUPTED WRITE HANDLING IN STORAGE SYSTEMS
Systems and techniques for cache management are disclosed that provide improved cache performance by prioritizing particular storage stripes for cache flush operations. The systems and techniques may also leverage features of the storage devices to provide atomicity without the overhead of inter-controller mirroring. In some embodiments, the systems and techniques include a storage controller that stores data in a cache. The data is associated with one or more sectors of a storage stripe that is defined over a plurality of storage devices. The storage controller identifies a locality of dirty sectors of the one or more sectors, classifies the storage stripe into a category based on the locality, provides a category ordering of the category relative to at least one other category, and flushes the storage stripe from the cache to the plurality of storage devices according to the category ordering.
To leverage the attributes of object storage for applications/systems configured to interface with a network-type file system, an object storage backed file system can accept the defined file system commands from the applications/systems and transform the file system commands into requests that target object storage. The file system is "backed" by object storage because attributes and content of file system entities are stored in objects. For instance, content data and metadata of a file are stored in objects in object storage. This object storage backed file system can be considered a bridge between a client perceived hierarchical file system namespace and a flat namespace of an object storage.
One or more techniques and/or computing devices are provided for data synchronization. For example, an in-flight log may be maintained to track storage operations that are received by a first storage node, but have not been committed to both first storage of the first storage node and second storage of a second storage node that has a replication relationship, such as a disaster recovery relationship, with the first storage node. A dirty region log may be maintained to track regions within the first storage that have been modified by storage operations that have not been replicated to the second storage. Accordingly, a catchup synchronization phase (e.g., asynchronous replication by a resync scanner) may be performed to replicate storage operations (e.g., replicate data within dirty regions of the first storage that were modified by such storage operations) to the second storage until the first storage and the second storage are synchronized.
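The interplay of the two logs can be sketched with sets, as below; the operation and region identifiers, and the commit callbacks, are illustrative assumptions.

```python
# Minimal sketch: an in-flight log of operations not yet committed on both
# nodes, and a dirty region log of modified-but-unreplicated regions.
in_flight = {}        # op id -> regions it touches, until committed both sides
dirty_regions = set() # regions modified but not replicated to the second node

def receive_op(op_id, regions):
    in_flight[op_id] = regions

def committed_locally_only(op_id):
    """Replication failed or lagged: remember the regions as dirty."""
    dirty_regions.update(in_flight.pop(op_id))

def committed_both(op_id):
    in_flight.pop(op_id, None)

def catchup_sync(replicate_region):
    """Resync scanner: asynchronously replicate dirty regions until synced."""
    while dirty_regions:
        replicate_region(dirty_regions.pop())

receive_op(1, regions={10, 11})
committed_locally_only(1)
catchup_sync(replicate_region=lambda r: print("replicating region", r))
```

Tracking dirty regions rather than individual operations is what lets the catchup phase ship only the modified data instead of replaying every missed write.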
G06F 17/30 - Information retrieval; Database structures therefor
G06F 11/14 - Error detection or correction of the data by redundancy in operation, e.g. by using different operation sequences leading to the same result
G06F 11/20 - Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
A method for migration of operations between CPU cores, the method includes: processing, by a source core, one or more tasks and one or more interrupt service routines; accessing a mapping corresponding to a task of the one or more tasks and an interrupt service routine of the one or more interrupt service routines; identifying, based on the mapping, a target core that corresponds to the task and the interrupt service routine; blocking the task from being processed by the source core in response to identifying the target core; in response to identifying the target core, disabling an interrupt corresponding to the interrupt service routine; in response to identifying the target core, assigning the task and the interrupt to the target core; after assigning the interrupt to the target core, enabling the interrupt; and after assigning the task to the target core, processing the task by the target core.
G06F 11/07 - Responding to the occurrence of a fault, e.g. fault tolerance
G06F 15/80 - Architectures of general purpose stored program computers comprising an array of processing units with common control, e.g. single instruction multiple data processors
H04L 12/803 - Load balancing, e.g. traffic distribution over multiple links
100.
DYNAMIC RESOURCE ALLOCATION BASED UPON NETWORK FLOW CONTROL
One or more techniques and/or devices are provided for dynamic resource allocation based upon network flow control. For example, a first counter, corresponding to a count of communication availability signals provided by a network interface to a storage process, may be maintained. A second counter, corresponding to a count of communication unavailability signals provided by the network interface to the storage process, may be maintained. Responsive to the first counter exceeding a resource allocation threshold, additional resources may be dynamically allocated to the storage process during operation of the storage process. Responsive to the second counter exceeding a resource deallocation threshold, resources may be dynamically deallocated from the storage process during operation of the storage process. In this way, resource allocation for the storage process may be dynamically adjusted based upon real-time network flow control information indicative of whether the storage process is efficiently utilizing network communication channel availability.
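A compact sketch of the two-counter scheme follows; the threshold values, the reset-after-trigger behavior, and the notion of a resource "unit" are illustrative assumptions.

```python
# Minimal sketch: count availability/unavailability signals from the network
# interface and adjust the storage process's resources at the thresholds.
class FlowControlAllocator:
    def __init__(self, alloc_threshold=100, dealloc_threshold=100):
        self.avail = 0            # first counter: availability signals
        self.unavail = 0          # second counter: unavailability signals
        self.alloc_threshold = alloc_threshold
        self.dealloc_threshold = dealloc_threshold
        self.resources = 4        # units currently allocated to the process

    def on_signal(self, channel_available):
        if channel_available:
            self.avail += 1
            if self.avail > self.alloc_threshold:
                self.resources += 1       # allocate during operation
                self.avail = 0
        else:
            self.unavail += 1
            if self.unavail > self.dealloc_threshold:
                self.resources = max(self.resources - 1, 1)  # deallocate
                self.unavail = 0

alloc = FlowControlAllocator(alloc_threshold=3)
for _ in range(4):
    alloc.on_signal(channel_available=True)
print(alloc.resources)   # 5: availability counter exceeded its threshold
```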