A method for storing data, comprising receiving, by a file system (FS) client executing on an offload component, a first request from a translation module, wherein the translation module translated a second request that was to be performed on an emulated block device into the first request, wherein the first request is specified using file semantics, wherein the first request is associated with data, wherein the offload component is located in a hardware layer of a client application node, and wherein the translation module is located on the offload component, and processing the first request by the FS client and a memory hypervisor module, wherein the FS client and the memory hypervisor module are executing in a modified client FS container on the offload component, wherein processing the first request results in at least a portion of the data being stored in a location in a storage pool.
A method for securing data, the method including obtaining, from a metadata node and by a file system (FS) client executing on a client application node, a data layout and an encryption key, encrypting, by the client application node, the data stored on the client application node using the encryption key to obtain encrypted data, generating, by a memory hypervisor module executing on the client application node, at least one input/output (I/O) request, wherein the at least one I/O request specifies a location in a storage pool, wherein the location is determined using the data layout, and issuing, by the memory hypervisor module, the at least one I/O request to the storage pool, wherein processing the at least one I/O request results in at least a portion of the encrypted data being stored at the location.
G06F 21/53 - Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems during program execution, e.g. stack integrity, buffer overflow or preventing unwanted data erasure by executing in a restricted environment, e.g. sandbox or secure virtual machine
G06F 12/14 - Protection against unauthorised use of memory
G06F 21/62 - Protecting access to data via a platform, e.g. using keys or access control rules
An aspect of the present disclosure relates to one or more techniques to identify and resolve storage array errors. In embodiments, an error notification related to a computing device can be received. One or more threads related to the error notification can further be identified. Additionally, an error resolution technique can be performed based on each identified thread.
Direct read in clustered file systems is described herein. A method as described herein can include determining, for a write operation on a resource stored by a data storage system, as initiated by an initiator node, a reference count for the resource, the reference count comprising a number of target storage regions of the data storage system to be modified by write data during the write operation; facilitating conveying, from the initiator node to a lock coordinator node, the reference count for the resource; facilitating conveying, from the initiator node to respective participant nodes that are respectively assigned to the target storage regions, the write data and a key value for the write operation; and facilitating causing the respective participant nodes to convey respective notifications that comprise the key value in response to the respective participant nodes writing the write data to the target storage regions.
The present disclosure relates to one or more memory management techniques. In embodiments, one or more regions of storage class memory (SCM) of a storage array are provisioned as expanded global memory. The one or more regions can correspond to SCM persistent cache memory regions. The storage array's global memory and expanded global memory can be used to execute one or more storage-related services connected to servicing (e.g., executing) an input/output (IO) operation.
One or more aspects of the present disclosure relate to recovering at least one failed disk. In embodiments, a storage reserve capacity allocated for recovering at least one storage device of a storage array is determined. Zero or more storage portions from each storage device of at least one storage cluster are adaptively assigned for disk recovery based on the storage reserve capacity. In response to detecting a failing and/or failed disk, the disk is recovered using the assigned storage portions.
Techniques for managing data patterns involve: acquiring multiple sets of data patterns respectively associated with multiple collection devices, wherein a set of data patterns in the multiple sets of data patterns represents patterns of duplicate data in data from one of the multiple collection devices; dividing the multiple collection devices into multiple groups based on clusters of the multiple sets of data patterns; and determining, based on sets of data patterns associated with collection devices in a group in the multiple groups, a set of shared data patterns for sharing among the collection devices in the group. Accordingly, data patterns that can be shared among multiple collection devices can be determined in a more accurate and effective manner, thereby facilitating the removal of duplicate data from the multiple collection devices.
Load balancing may include: receiving I/O workloads of storage server entities that service I/O operations received for logical devices, wherein each logical device has an owner that is one of the storage server entities that processes I/O operations directed to the logical device; determining normalized I/O workloads corresponding to the I/O workloads of the storage server entities; determining, in accordance with utilization criteria, imbalance criteria and the normalized I/O workloads, whether to rebalance the I/O workloads of the storage server entities; and responsive to determining to rebalance the I/O workloads of the storage server entities, performing processing to alleviate a detected I/O workload imbalance between two storage server entities. The processing may include moving a logical device from a first storage server entity to a second storage server entity; and transferring ownership of the logical device from the first to the second storage server entity.
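The rebalancing decision described above can be sketched roughly as follows. This is only an illustration under stated assumptions: workloads are normalized by dividing by a per-entity capacity, the utilization and imbalance criteria are simple thresholds, and the least busy logical device on the busiest entity is the one moved. The function name, thresholds, and sample figures are all hypothetical and do not come from the abstract.

```python
def rebalance_once(owner, device_iops, capacity,
                   min_utilization=0.5, max_gap=0.2):
    """Move one logical device if the entities look imbalanced.

    owner:       logical device -> owning storage server entity
    device_iops: logical device -> observed I/O workload
    capacity:    entity -> workload it can sustain (assumed normalizer)
    Returns (device, from_entity, to_entity), or None if no move is made.
    """
    # Sum the raw workload handled by each entity.
    load = {entity: 0.0 for entity in capacity}
    for device, entity in owner.items():
        load[entity] += device_iops[device]

    # Normalize so entities of different sizes are comparable.
    normalized = {e: load[e] / capacity[e] for e in capacity}
    busiest = max(normalized, key=normalized.get)
    idlest = min(normalized, key=normalized.get)

    # Hypothetical utilization and imbalance criteria.
    if normalized[busiest] < min_utilization:
        return None
    if normalized[busiest] - normalized[idlest] < max_gap:
        return None

    # Assumed policy: move the least busy device off the busiest entity
    # and transfer its ownership to the idlest entity.
    candidates = [d for d, e in owner.items() if e == busiest]
    device = min(candidates, key=lambda d: device_iops[d])
    owner[device] = idlest
    return device, busiest, idlest


owner = {"lun1": "A", "lun2": "A", "lun3": "B"}
device_iops = {"lun1": 600, "lun2": 200, "lun3": 100}
capacity = {"A": 1000, "B": 1000}
move = rebalance_once(owner, device_iops, capacity)
```

Here entity A runs at 0.8 of its assumed capacity and B at 0.1, so both criteria are met and `lun2` changes owner from A to B.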
Dictionary-based compression is performed to compress data units using a similar data unit as the base unit (i.e., dictionary) for each candidate data unit. Similarity may be determined between data units by applying a locality-sensitive hashing scheme to each candidate data unit to produce a hash value, and by determining whether there is a matching value in a hash index of hash values for existing data units on the system. If there is a matching hash value, the candidate data unit may be compressed using the data unit corresponding to the matching hash value as the dictionary. Only a representative portion of the data unit may be hashed to produce the hash value, the portion comprised of chunks of the data unit, where each chunk is a continuous, uninterrupted section of data. The chunks themselves may not be (in some embodiments likely are not) contiguous to one another.
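A minimal sketch of the idea above, using Python's standard `zlib` preset-dictionary support for the compression step. The similarity key here is a loud simplification: it is an ordinary SHA-256 over fixed-offset, non-contiguous chunks, standing in for a real locality-sensitive hashing scheme, so it only matches units whose sampled chunks are identical. `CHUNK`, `STRIDE`, and the function names are hypothetical.

```python
import hashlib
import zlib

CHUNK = 16     # bytes per sampled chunk (hypothetical value)
STRIDE = 256   # distance between sampled chunks (hypothetical value)


def similarity_key(unit: bytes) -> str:
    """Hash a representative portion made of non-contiguous chunks.

    Stand-in for a locality-sensitive hash: an exact hash of the samples.
    """
    sample = b"".join(unit[i:i + CHUNK] for i in range(0, len(unit), STRIDE))
    return hashlib.sha256(sample).hexdigest()


def compress_unit(unit: bytes, index: dict):
    """Compress against a similar existing unit if the hash index has one."""
    key = similarity_key(unit)
    base = index.get(key)
    if base is None:
        index[key] = unit                  # becomes a future base unit
        return zlib.compress(unit), None
    comp = zlib.compressobj(zdict=base)    # base unit acts as the dictionary
    return comp.compress(unit) + comp.flush(), base


def decompress_unit(blob: bytes, base):
    if base is None:
        return zlib.decompress(blob)
    decomp = zlib.decompressobj(zdict=base)
    return decomp.decompress(blob) + decomp.flush()


# Deterministic, poorly compressible 4096-byte unit for the demonstration.
base_unit = b"".join(hashlib.sha256(str(i).encode()).digest() for i in range(128))
candidate = bytearray(base_unit)
candidate[100] ^= 0xFF                     # change one byte outside the samples
candidate = bytes(candidate)

index = {}
first_blob, first_base = compress_unit(base_unit, index)
second_blob, second_base = compress_unit(candidate, index)
```

Because the candidate differs from the base unit by a single byte, compressing it with the base unit as the dictionary yields a far smaller output than compressing the base unit on its own.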
One example method includes correlating trust scoring with authentication levels. Trust scores are protected in a computing system such that devices can be validated. Authentication levels are based on the verified trust scores.
A method and a non-transitory storage medium having stored therein instructions that are executable by one or more hardware processors to perform operations comprising: receiving from a user, by a trust algorithm, primary input that comprises a user query that specifies search parameters, a list of one or more trust factor definitions, and a respective user-specified weighting for each trust factor definition; receiving secondary system inputs and, based on the search parameters, retrieving data from the secondary system inputs; running, on the data retrieved from the secondary system inputs, one or more trust factor functions, each of which generates a respective trust factor; generating a trust score by running a trust score function on the trust factors; aggregating the data with the trust score to create a result set; and storing the result set.
Rebalancing the workload of logical devices across multiple nodes may include dynamically modifying preferred paths for one or more logical devices in order to rebalance the I/O workload of the logical devices among the nodes of the data storage system. Determining whether to rebalance the I/O workload between the two nodes may be performed in accordance with one or more criteria. Processing may include monitoring the current workloads of both nodes over time and periodically evaluating, in accordance with the one or more criteria, whether the current workloads of the nodes are imbalanced. Responsive to determining, in accordance with the criteria, that rebalancing of workload between the nodes is needed, the rebalancing may be performed. A notification may be sent to the host regarding any path state changes made as a result of the workload rebalancing.
Embodiments of the present disclosure provide a method for distributing virtual visual content, including: sending a first content portion in virtual visual content to be interacted with in user equipment to a plurality of edge devices; selecting at least one edge device from the plurality of edge devices; and sending a second content portion in the virtual visual content to the selected at least one edge device, the second content portion having a higher change rate than the first content portion in the interaction. According to the embodiments of the present disclosure, a portion of virtual visual content can be distributed to selected edge devices in advance, and there is no need to distribute the virtual visual content to all edge devices, thereby easing network burden, reducing bandwidth requirements, and improving distribution efficiency.
In an information processing system with at least a first node and a second node separated from the first node, and each of the first node and the second node configured to execute an application in accordance with at least one entity that moves from a proximity of the first node to a proximity of the second node, a method maintains, as part of a context at the first node, a set of status indicators for a set of computations associated with a computation graph representing at least a portion of the execution of the application at the first node. Further, the method causes the transfer of the context from the first node to the second node to enable the second node to continue execution of the application using the transferred context from the first node.
Managing lock coordinator rebalance in distributed file systems is provided herein. A node device of a cluster of node devices can comprise a processor and a memory that stores executable instructions that, when executed by the processor, facilitate performance of operations. The operations can comprise determining an occurrence of a group change between a cluster of node devices and executing a probe function based on the occurrence of the group change. Further, the operations can comprise reasserting first locks of a group of locks based on a result of the probe function indicating reassertion of the first locks. The second locks of the group of locks, other than the first locks, are not reasserted based on the result of the probe function. The cluster of node devices can operate as a distributed file system.
Determining and using deduplication estimates may include: determining two deduplication sample indexes (DSIs) for two logical device sets each including one or more logical devices, determining a Jaccard Similarity for the two DSIs, wherein the Jaccard Similarity denotes a measurement of similarity and mutual deduplication between the two logical device sets; determining, in accordance with one or more criteria, whether the two logical device sets should be located in different data storage systems or a same data storage system that performs data deduplication, wherein the one or more criteria uses the Jaccard Similarity in determining whether to locate the two logical device sets in the same data storage system or the different data storage systems; and responsive to determining that the two logical device sets should be located in the same data storage system, locating the two logical device sets in the same data storage system.
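The Jaccard Similarity step above can be illustrated with a short sketch, assuming each deduplication sample index (DSI) can be modeled as a set of sampled chunk hashes. The function names, the 0.3 threshold used as the placement criterion, and the sample hash labels are hypothetical.

```python
def jaccard_similarity(dsi_a: set, dsi_b: set) -> float:
    """Return |A intersect B| / |A union B|; 0.0 when both sets are empty."""
    union = dsi_a | dsi_b
    if not union:
        return 0.0
    return len(dsi_a & dsi_b) / len(union)


def should_colocate(dsi_a: set, dsi_b: set, threshold: float = 0.3) -> bool:
    # Hypothetical criterion: place both logical device sets on the same
    # deduplicating system when their sampled hashes overlap enough.
    return jaccard_similarity(dsi_a, dsi_b) >= threshold


dsi_1 = {"h1", "h2", "h3", "h4"}
dsi_2 = {"h3", "h4", "h5", "h6"}
similarity = jaccard_similarity(dsi_1, dsi_2)   # 2 shared hashes out of 6
```

A higher value suggests more data that the two logical device sets could deduplicate against each other if stored together.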
Systems and methods facilitating fault tolerance for transaction mirroring are described herein. A method as described herein includes: receiving a commit command for a data transaction from an initiator node of the system, wherein the data transaction is associated with a first failure domain, and wherein the commit command is directed to a primary participant node and a secondary participant node of the system; determining whether a response to the commit command has been received at the primary participant node from the secondary participant node in response to the receiving; and, in response to determining that the response to the commit command was not received at the primary participant node, indicating that the secondary participant node is invalid in a data store associated with a second failure domain that is distinct from the first failure domain.
Techniques for caching may include: determining an update to a first data page of a first cache on a first node, wherein a second node includes a second cache and wherein the second cache includes a copy of the first data page; determining, in accordance with one or more criteria, whether to send the update from the first node to the second node; responsive to determining, in accordance with the one or more criteria, to send the update, sending the update from the first node to the second node; and responsive to determining not to send the update, sending an invalidate request from the first node to the second node, wherein the invalidate request instructs the second node to invalidate the copy of the first data page stored in the second cache of the second node.
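The update-or-invalidate choice above might look like the following sketch. The abstract does not state the criteria, so a single size threshold is assumed here purely for illustration; `NodeCache`, `propagate`, and `max_update_bytes` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class NodeCache:
    """Toy stand-in for the second node's cache."""
    pages: dict = field(default_factory=dict)

    def apply_update(self, page_id: str, data: bytes) -> None:
        self.pages[page_id] = data

    def invalidate(self, page_id: str) -> None:
        self.pages.pop(page_id, None)


def propagate(page_id: str, update: bytes, peer: NodeCache,
              max_update_bytes: int = 64) -> str:
    """Send the update to the peer, or tell it to invalidate its copy."""
    # Assumed criterion: small updates are cheap enough to ship in full.
    if len(update) <= max_update_bytes:
        peer.apply_update(page_id, update)
        return "update"
    peer.invalidate(page_id)
    return "invalidate"


peer = NodeCache(pages={"p1": b"old", "p2": b"old"})
small_result = propagate("p1", b"new", peer)
large_result = propagate("p2", b"x" * 1024, peer)
```

After these two calls the peer holds the new contents of `p1` and no longer holds a copy of `p2`.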
Techniques are provided for consistent entity tags with multiple protocol data access. In an example, a file storage system is configured to process data according to file storage protocol(s) and object storage protocol(s). An object storage protocol can utilize entity tags that indicate whether an object (represented with a file in the file storage system) has changed. Where a file storage protocol is utilized to modify a file, an indication may be stored that indicates that the file lacks a valid entity tag. If an object storage operation is made to retrieve an object, and if the object corresponds to a valid entity tag, then that entity tag can be returned as part of the response. If the object does not correspond to a valid entity tag, then the file storage system can generate a new entity tag and return the newly generated entity tag as part of the response.
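As a rough sketch of the behavior above: a write through the file protocol marks the entity tag as invalid, and the next object-protocol read generates a fresh one. The class and method names are hypothetical, and deriving the tag from a SHA-256 of the contents is an assumption, since the abstract does not say how a new entity tag is generated.

```python
import hashlib


class MultiProtocolStore:
    """Toy store reachable through a file protocol and an object protocol."""

    def __init__(self):
        self._data = {}
        self._etags = {}   # path -> valid entity tag; absent means no valid tag

    def file_write(self, path: str, data: bytes) -> None:
        # A file-protocol write leaves the file without a valid entity tag.
        self._data[path] = data
        self._etags.pop(path, None)

    def has_valid_etag(self, path: str) -> bool:
        return path in self._etags

    def object_get(self, path: str):
        # Return the stored tag if valid; otherwise generate and store one.
        data = self._data[path]
        etag = self._etags.get(path)
        if etag is None:
            etag = hashlib.sha256(data).hexdigest()   # assumed tag scheme
            self._etags[path] = etag
        return data, etag


store = MultiProtocolStore()
store.file_write("/docs/a.txt", b"version 1")
_, etag_first = store.object_get("/docs/a.txt")
_, etag_repeat = store.object_get("/docs/a.txt")
store.file_write("/docs/a.txt", b"version 2")
valid_after_write = store.has_valid_etag("/docs/a.txt")
_, etag_second = store.object_get("/docs/a.txt")
```

Repeated object reads return the same tag until a file-protocol write intervenes, after which a new tag is produced on demand.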
One example method includes performing, as part of a planned failover procedure, operations that include connecting a replica OS disk to a replica VM, powering up the replica VM, booting an OS of the replica VM, disconnecting a source VM from a network, and connecting replica data disks to the replica VM. IOs issued by an application at the source VM continue to be processed by the source VM while the replica OS disk is connected, the replica VM is powered up, and the OS of the replica VM is booted.
Techniques are provided for processing user input/output (I/O) write requests in a fault-tolerant data storage system (e.g., a RAID storage system) by selecting between performing a degraded write operation or a write operation to spare capacity, when the fault-tolerant data storage system is operating in a degraded mode. A method includes receiving a user I/O write request comprising data to be written to a RAID array operating in a degraded mode, and determining whether spare capacity has been allocated for rebuilding missing data of an inaccessible storage device of the RAID array and whether a missing data block, which is associated with the I/O write request, has been rebuilt to the spare capacity. A degraded write operation is performed without using the spare capacity, when the missing data block, which is associated with the data of the I/O write request, has not been rebuilt to the allocated spare capacity.
A method includes retrieving, with a masker controller job, an object and an associated object ID from a masking bucket that is defined in storage, making a copy of the object, with a masker worker microservice, masking the copy of the object to create a masked object, transmitting the masked object to an object access microservice, with the object access microservice, transmitting the masked object to a deduplication microservice, with the deduplication microservice, deduplicating the masked object, and storing the masked object in the storage.
Data protection operations including replication operations that dynamically adapt a topology of replica virtual machines. A data protection system may implement a machine model that is trained using, as input, characteristics of virtual machines. When a failure is predicted, a topology of the replica virtual machines is changed. The topology may also change when changes in the environment are detected. The changes may include redistributing the protected applications to the replica virtual machines and/or scaling the replica virtual machines.
On-the-fly point-in-time recovery operations are disclosed. During a recovery operation, the PiT being restored can be changed on-the-fly or during the existing recovery operation without restarting the recovery process from the beginning. In one example, this improves the recovery time objective (RTO) and allows aspects of the recovery operation to be avoided when changing to a different PiT.
Data protection operations including replication operations are disclosed. Virtual machines, applications, and/or application data are replicated according to at least one strategy. The replication strategy can improve performance of the recovery operation.
One example method includes contacting, by a client, a service, receiving a credential from the service, obtaining trust information from a trust broker, comparing the credential with the trust information, and either connecting to the service if the credential and trust information match, or declining to connect to the service if the credential and the trust information do not match. Other than by way of the trust information obtained from the trust broker, the client may have no way to verify whether or not the service can be trusted.
One example method includes performing a cloning process that includes cloning an OS disk and a data disk of a source VM to create a replica VM having an OS disk and a data disk that correspond, respectively, to the OS disk and data disk of the source VM. A replication process is then performed that includes replicating an application from the data disk of the source VM to the data disk of the replica VM, and the replication process does not include any replication of the OS disk of the source VM to the OS disk of the replica VM. Finally, the replica VM is powered up so that the OS of the replica VM is running, and the application is running on the replica VM. When performing a recovery with the replica VM, the RTO does not include an OS boot time.
Generating any point in time backups without native snapshot generation. Production data is split such that a journal stream is sent to a data protection system, which may be local or remote. The journal stream includes a data stream and a metadata stream. Backups are synthesized at the data protection system by rolling at least a portion of the journal. A backup for any point in time represented in the journal can be synthesized.
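Rolling a journal to synthesize a backup can be sketched as below. This is a simplification under stated assumptions: each journal entry carries its own metadata (timestamp and offset) alongside the data rather than arriving as separate data and metadata streams, and the backup is a flat byte image. `JournalEntry` and `synthesize_backup` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JournalEntry:
    timestamp: int   # metadata: when the write happened
    offset: int      # metadata: where the write landed
    data: bytes      # the written bytes


def synthesize_backup(base: bytes, journal: list, point_in_time: int) -> bytes:
    """Roll journal entries up to point_in_time onto a copy of the base image."""
    image = bytearray(base)
    for entry in sorted(journal, key=lambda e: e.timestamp):
        if entry.timestamp > point_in_time:
            break   # later writes are not part of this point in time
        end = entry.offset + len(entry.data)
        image[entry.offset:end] = entry.data
    return bytes(image)


base_image = b"AAAAAAAA"
journal = [
    JournalEntry(timestamp=10, offset=0, data=b"BB"),
    JournalEntry(timestamp=20, offset=4, data=b"CC"),
    JournalEntry(timestamp=30, offset=0, data=b"DD"),
]
backup_at_20 = synthesize_backup(base_image, journal, point_in_time=20)
backup_at_30 = synthesize_backup(base_image, journal, point_in_time=30)
```

Any timestamp covered by the journal can be passed as `point_in_time`, which is what lets a backup be produced without taking a snapshot at that moment.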
Masking a data rate of transmitted data is disclosed. As data is transmitted from a production site to a secondary site, the data rate is masked. Masking the data rate can include transmitting at a fixed rate, a random rate, or an adaptive rate. Each mode of data transmission masks or obscures the actual data rate and thus prevents others from gaining information about the data or the data owner from the data transfer rate.
An aspect of implementing capacity reduction in a storage system includes, for each of a candidate page and a target page in the storage system, identifying a subset of sectors having identical data or a minimum amount of non-identical data, performing a bit-wise exclusive OR (XOR) operation on sectors of the candidate page and the target page, and determining entropy from results of the XOR operation. Upon determining the entropy is less than or equal to a threshold value, an aspect includes building a reference page from an XOR sector containing results of the bit-wise XOR operation, and performing a compression operation on the reference page.
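The XOR-and-entropy test above can be sketched as follows, assuming whole pages are compared rather than a selected subset of sectors, that entropy means Shannon entropy in bits per byte, and that `zlib` stands in for the compression operation. The function names and the 1.0 threshold are hypothetical.

```python
import math
import zlib
from collections import Counter


def xor_bytes(a: bytes, b: bytes) -> bytes:
    if len(a) != len(b):
        raise ValueError("pages must be the same length")
    return bytes(x ^ y for x, y in zip(a, b))


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte (0.0 for empty input)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def reduce_candidate(candidate: bytes, target: bytes, threshold_bits: float = 1.0):
    """Return the compressed XOR page if its entropy is low enough, else None."""
    xored = xor_bytes(candidate, target)
    if shannon_entropy(xored) <= threshold_bits:
        return zlib.compress(xored)   # mostly zero bytes, so it compresses well
    return None


target_page = bytes(range(256)) * 2
near_copy = bytearray(target_page)
near_copy[10] ^= 0xFF                 # one differing byte
near_copy = bytes(near_copy)

reduced = reduce_candidate(near_copy, target_page)
unrelated = reduce_candidate(target_page, bytes(512))
```

The candidate can be rebuilt by decompressing the stored result and XOR-ing it with the target page again, since XOR is its own inverse.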
Techniques for handling data with different lifetime characteristics in stream-aware data storage systems. The data storage systems can include a file system that has a log-based architecture design, and can employ one or more solid state drives (SSDs) that provide log-based data storage, which can include a data log divided into a series of storage segments. The techniques can be employed in the data storage systems to control the placement of data in the respective segments of the data log based at least on the lifetime of the data, significantly reducing the processing overhead associated with performing garbage collection functions within the SSDs.
G06F 12/06 - Addressing a physical block of locations, e.g. base addressing, module addressing, address space extension, memory dedication
G06F 3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
One example method includes receiving, from an entity, a proposed entry for a ledger, where the ledger is shared and accessible by multiple users and includes a whitelist and a blacklist, determining, or assigning, a credibility score and rate limiter value for the entity, comparing the credibility score and rate limiter value with respective credibility score and rate limiter value thresholds, determining that the credibility score and rate limiter value meet or exceed the respective credibility score and rate limiter value thresholds, and submitting the proposed entry to the ledger.
One example method includes receiving from a node, in an HSAN that includes multiple nodes, an ADD_DATA request to add an entry to a distributed ledger of the HSAN, the request comprising a user ID that identifies the node, a hash of a data segment, and a storage location of the data segment at the node, performing a challenge-and-response process with the node to verify that the node has a copy of the data that was the subject of the entry, making a determination that a replication factor X has not been met, and adding the entry to the distributed ledger upon successful conclusion of the challenge-and-response process.
Systems and methods for providing any point in time functionality. With a storage system such as a VSAN, any point in time protection is achieved by combining a metadata stream with snapshots of the storage system. This allows snapshots to be generated in hindsight such that any point in time functionality is provided.
One example method includes receiving input concerning a mobile IoT device, and the input includes information about a location of the mobile IoT device, information about whether the mobile IoT device is moving, and, when the mobile IoT device is moving, information about the range, speed, and bearing of the mobile IoT device. Next, the method includes generating a predicted location of the mobile IoT device based on the inputs received, using the predicted location of the mobile IoT device and a map of nodes in an environment where the mobile IoT device is located to make a migration decision concerning an application used by the mobile IoT device, and migrating the application from a present location to a node expected to be accessible by the mobile IoT device when the mobile IoT device reaches the predicted location, and the node and present location are physically separated by a distance.
Systems and methods for performing data protection operations including garbage collection operations and copy forward operations. For deduplicated data stored in a cloud-based storage or in a cloud tier that stores containers containing dead and live segments or dead and live regions such as compression regions, the dead compression regions are deleted by copying the live compression regions into new containers and then deleting the old containers. The copy forward is based on a recipe from a data protection system and is performed using a serverless approach.
Systems and methods for performing data protection operations including replication operations. A replication operation may automatically learn and predict when a replication system will need to switch modes, such as to a protective mode or to a fast-forward mode. The replication operation ensures that the data is replicated in a manner that optimizes the ability to retain data needed to perform point in time recovery operations while prioritizing the replication operation of new data.
Systems and methods for performing data protection operations including garbage collection operations and copy forward operations. For deduplicated data stored in a cloud-based storage or in a cloud tier that stores containers containing dead and live regions such as compression regions, the dead segments in the dead compression regions are deleted by copying the live compression regions into new containers and then deleting the old containers. The copy forward is based on a recipe from a data protection system and is performed using a microservices based approach.
One example method includes chunking a respective disk of each of a plurality of virtual machines (VM) to create a respective plurality of chunks associated with each of the VMs, creating, based on the chunking process, a cluster comprising one or more of the VMs, creating a VM template whose data and disk structure match respective data and disk structures of each of the VMs in the cluster, and in response to a file operation involving a first one of the VM disks, defragmenting the first VM disk so that a disk structure of the first VM disk is the same as a disk structure of the VM template.
Systems and methods for performing data protection operations including garbage collection operations and copy forward operations. For deduplicated data stored in a cloud-based storage or in a cloud tier that stores containers containing dead and live segments, the dead segments are deleted by copying live segments into new containers and then deleting the old containers. The copy forward is based on a recipe from a data protection system and is performed using microservices that can be run as needed in the cloud.
One example method includes receiving a set of filesystem parameters, creating a simulated filesystem based on the filesystem parameters, receiving a set of target characteristics for a file collection, based on the target characteristics, slicing a datastream into a grouping of data slices, populating the simulated files with the data slices to create the file collection, and forward or reverse morphing the file collection from one generation to another without rewriting the entire file collection.
One example method includes receiving a first data stream that has a compressibility greater than zero, receiving a second data stream that has a compressibility that is different from the compressibility of the first data stream, receiving a compressibility merging parameter N, creating a mixed data stream having a compressibility of N by mixing data from the first data stream with data from the second data stream, and outputting the mixed data stream.
One example method includes receiving 'n' data streams, where 'n' is ≥ 2, receiving a commonality parameter 'F', creating a mixed data stream having a commonality of 'F' by mixing data from the 'n' data streams together, and outputting the mixed data stream. The mixed data stream may be provided to a deduplication engine for deduplication of stream data that is common to one or more other data streams.
Systems and methods for replicating a device image to a storage such as the cloud. The cloud is seeded with a base image that corresponds to the device. Changes between the contents of the device and the base image are identified, uploaded to the cloud, and applied to the image. The changes are tracked continuously and the image in the cloud can thus be used to restore the device to any point in time. The cloud image can also be used in a cloud based virtual machine that provides a user of the device with access to the device's contents via the cloud based image.
One example includes performing a VM restore instance type discovery process, creating a test VM with a VM restore instance type matching a VM restore instance type identified during discovery, using the test VM to create a test restore VM at a cloud storage site, restoring the test VM at the cloud storage site using the test restore VM, generating a 4-D baseline vector based on the restoration of the test VM, the 4-D baseline vector identifying a particular VM restore instance type, generating a 5-D vector based on the 4-D baseline vector, ranking the 5-D vector relative to other 5-D vectors, the 5-D vectors identifying the same production site VM, and restoring, at the cloud storage site, the production site VM identified in the 5-D vectors, wherein the production site VM restored at the cloud storage site has a VM restore instance type identified in the highest ranked 5-D vector.
An apparatus in an illustrative embodiment comprises at least one processing device comprising a processor coupled to a memory. The apparatus is configured to maintain a snapshot tree data structure having a plurality of volume nodes corresponding to respective ones of (i) a root volume and (ii) multiple snapshots related directly or indirectly to the root volume. The apparatus is further configured to receive a request to read a data item from a given volume offset of a particular one of the volume nodes, to determine a set of data descriptors for the given volume offset, to determine a set of volume nodes of interest for the particular volume node, to determine a contribution set based at least in part on the set of data descriptors and the set of volume nodes of interest, to determine a read address for the data item as a function of the contribution set, and to read the data item from the read address.
An apparatus in an illustrative embodiment comprises at least one processing device comprising a processor coupled to a memory. The apparatus is configured to maintain a snapshot tree data structure having a plurality of volume nodes corresponding to respective ones of (i) a root volume and (ii) multiple snapshots related directly or indirectly to the root volume. The apparatus is further configured to determine a set of data descriptors for a given volume offset, to determine a set of reader volume nodes that are readers of a corresponding data item based at least in part on the set of data descriptors, to adjust one or more of the data descriptors in the set of data descriptors based at least in part on the set of reader volume nodes, and to reclaim storage space previously allocated to the data item responsive to the adjusting of the one or more data descriptors.
Systems and methods for marking similarity groups impacted by a garbage collection operation are disclosed. Similarity groups are used to identify segments associated with objects in a computing system. Using deletion records that identify objects to be deleted, the similarity groups impacted by the deletion records can be identified. The live segments associated with the impacted similarity groups are also identified. This allows segments that are associated with the deleted objects and that are not associated with any live objects to be removed.
Systems and methods for cleaning a storage system. A deduplicated storage system is cleaned by identifying structures that include dead or unreferenced segments. This includes processing recipes to identify the segments that are no longer part of a live object recipe. Then, the dead segments are removed. This is accomplished by copying forward the live segments and then deleting, as a whole, the structure that included the dead segments.
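The copy-forward cleaning described above can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the function name `clean`, the in-memory dictionaries standing in for storage structures, and the fingerprint strings are all assumptions.

```python
def clean(containers, live_recipes):
    """Copy live segments forward and drop structures holding dead segments.

    containers:   dict of container_id (int) -> dict of fingerprint -> bytes
    live_recipes: iterable of fingerprint lists, one per live object
    Both shapes are hypothetical stand-ins for the real on-disk structures.
    """
    live = {fp for recipe in live_recipes for fp in recipe}
    cleaned = {}
    next_id = max(containers, default=-1) + 1
    for cid, segments in containers.items():
        if all(fp in live for fp in segments):
            cleaned[cid] = segments          # nothing dead: keep as-is
            continue
        survivors = {fp: d for fp, d in segments.items() if fp in live}
        if survivors:
            cleaned[next_id] = survivors     # copy live segments forward
            next_id += 1
        # the old container is deleted as a whole by not carrying it over
    return cleaned
```

Deleting whole structures instead of individual segments avoids in-place edits, at the cost of rewriting the surviving live segments.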
A consumption request, comprising a stack parameter and a resource characteristic parameter, is accessed. The stack parameter specifies at least one type of storage asset that is requested. The resource characteristic parameter specifies at least one functional capability required of the storage asset. Based on the stack parameter, a set of one or more first storage assets able to satisfy the consumption request, is determined. For each first storage asset in the set that is not deployed, a first workflow is generated, the first workflow configured to deploy the respective first storage asset in the set that is not deployed. For each second storage asset in the set that lacks the resource characteristic parameter, a second workflow, configured to implement that resource characteristic in the respective second storage asset, is generated. The set of storage assets is configured to satisfy the consumption request, by running the first and second workflows.
One example method includes receiving, at a blockchain node of an auditing cloud service, information associated with one or more data management transactions, registering, at the blockchain node, the information received concerning the data management transactions, receiving, by the cloud auditing service, a request for access to the information registered at the blockchain node, and enabling, by the cloud auditing service, access to the requested information.
Systems, apparatus and methods for protecting data stored in the cloud or other storage are provided. A distributed ledger is used to record transactions between a client and an object store. The distributed ledger records the transaction and also attests to the object authenticity. Thus, the transactions can be verified and may assist in resolving issues that arise with respect to the stored objects. The ledger and entries therein allow risk of loss associated with the data to be evaluated and allow the data to be insured against loss and/or for liability.
One example method includes creating a backup of data, creating metadata associated with the backup, hashing the backup to create a backup hash, obtaining a key from a blockchain, generating an aggregate hash of a combination that includes the key and the backup hash, and transmitting the aggregate hash to a blockchain network. Because the aggregate hash is not modifiable when stored in a blockchain, an immutable record exists that establishes when a particular backup was created.
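A minimal sketch of the aggregate hash step, assuming SHA-256 as the hash function and simple concatenation as the way the key and the backup hash are combined. The abstract specifies neither, and the function name `aggregate_hash` is hypothetical.

```python
import hashlib


def aggregate_hash(backup: bytes, chain_key: bytes) -> str:
    """Hash the backup, then hash the key combined with that backup hash.

    SHA-256 and concatenation are assumptions made for illustration.
    """
    backup_hash = hashlib.sha256(backup).digest()
    return hashlib.sha256(chain_key + backup_hash).hexdigest()
```

Because the result depends on both inputs, a verifier holding the same key can recompute it later and detect any change to the backup.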
One example method includes performing a data management transaction, such as a data read operation, a data write operation, or a data delete operation, generating transaction metadata relating to the data management transaction, transmitting the transaction metadata to a blockchain network, and receiving, from the blockchain network, confirmation that the transaction metadata has been stored in a distributed ledger associated with the blockchain network.
An apparatus comprises a host device configured to communicate over a network with a storage system comprising a plurality of storage devices. The host device comprises a set of input-output queues and a multi-path input-output driver configured to select input-output operations from the set of input-output queues for delivery to the storage system over the network. The multi-path input-output driver is further configured to maintain payload size counters to track outstanding command payload for respective ones of a plurality of paths from the host device to the storage system, to detect an oversubscription condition relating to at least one of the paths based at least in part on values of one or more of the payload size counters, and to initiate one or more automated actions responsive to the detected oversubscription condition. For example, automated deployment of one or more additional paths associated with respective spare communication links between the host device and the storage system may be initiated.
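The per-path payload size counters can be illustrated with a short sketch. The class name `PathPayloadTracker`, the fixed byte threshold, and the method names are assumptions; the abstract does not say how the oversubscription condition is evaluated.

```python
from collections import defaultdict


class PathPayloadTracker:
    """Tracks outstanding command payload per path (hypothetical helper)."""

    def __init__(self, threshold_bytes: int):
        self.threshold_bytes = threshold_bytes  # assumed oversubscription limit
        self.outstanding = defaultdict(int)     # path name -> bytes in flight

    def on_dispatch(self, path: str, payload_bytes: int) -> None:
        # A command was sent down this path: count its payload as outstanding.
        self.outstanding[path] += payload_bytes

    def on_complete(self, path: str, payload_bytes: int) -> None:
        # The command completed: its payload is no longer outstanding.
        self.outstanding[path] -= payload_bytes

    def oversubscribed_paths(self) -> list:
        # Paths whose outstanding payload exceeds the assumed threshold.
        return sorted(p for p, n in self.outstanding.items()
                      if n > self.threshold_bytes)
```

A driver could poll `oversubscribed_paths()` and, when it returns a non-empty list, trigger an automated action such as bringing up a spare path.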
Deduplication of files or other data or objects in a manner that is aware of a format of the file and the application. A file may be chunked based on the format. The chunks are more consistent and lead to higher deduplication ratios. The file may be presented as a single file. However, the file is stored in chunks or subfiles and deduplication is performed with respect to the chunks. When the file is read, the file is rebuilt from its respective chunks. The files may also be compressed using differential compression, which leverages the content of similar files to compress a current file.
One example method includes implementing a function as a service (FaaS) at a datacenter by performing operations including receiving an application program interface (API) gateway call from a client application, wherein the API gateway call is associated with an object PUT request, and automatically triggering, with the API gateway call, performance of an object insertion function. The object insertion function includes retrieving, from backend object storage, a previous version of the object, differentially compressing the object relative to the previous version of the object so as to generate a differential, and storing the differential in the backend object storage.
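As a rough illustration of compressing an object relative to its previous version, the sketch below uses a zlib preset dictionary as a stand-in for a real differential compressor. That substitution is an assumption, as are the function names. The deflate window is limited to 32 KiB, so this only helps for small objects.

```python
import zlib


def make_differential(new: bytes, previous: bytes) -> bytes:
    """Compress `new`, priming the compressor with the previous version."""
    comp = zlib.compressobj(zdict=previous)
    return comp.compress(new) + comp.flush()


def apply_differential(diff: bytes, previous: bytes) -> bytes:
    """Rebuild the new version from the differential and the previous one."""
    decomp = zlib.decompressobj(zdict=previous)
    return decomp.decompress(diff) + decomp.flush()
```

Only the differential is stored, so reading the object back requires the previous version to be available as well.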
Systems, apparatus, and methods for any point in time replication to the cloud. Data is replicated by replicating data to a remote storage or a data bucket in the cloud. At the same time, a metadata stream is generated and stored. The metadata stream establishes a relationship between the data and offsets of the data in the production volume. This allows continuous replication without having to maintain a replica volume. The replica volume can be generated during a rehydration operation that uses the metadata stream to construct the production volume from the cloud data.
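The rehydration step can be sketched as replaying the metadata stream up to a chosen point in time. The tuple layout `(timestamp, offset, object_key)`, the dictionary standing in for the cloud bucket, and the function name `rehydrate` are assumptions made for illustration.

```python
def rehydrate(metadata_stream, bucket, volume_size, point_in_time):
    """Rebuild a volume image as of `point_in_time`.

    metadata_stream: iterable of (timestamp, offset, object_key), assumed
                     to be ordered by timestamp
    bucket:          mapping of object_key -> bytes (stand-in for cloud data)
    """
    volume = bytearray(volume_size)
    for timestamp, offset, object_key in metadata_stream:
        if timestamp > point_in_time:
            break                      # later writes are not part of this image
        data = bucket[object_key]
        volume[offset:offset + len(data)] = data
    return bytes(volume)
```

Choosing a different `point_in_time` yields a different image from the same cloud data, which is why no replica volume has to be maintained.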
Systems, apparatus and methods for managing an object's lifecycle in an object store. A distributed ledger is used to record transactions between a client and an object store. The distributed ledger records the transaction and also attests to the object authenticity. Thus, the transactions can be verified and may assist in resolving issues that arise with respect to the stored objects.
One example method includes creating an empty reconstruction stream database, identifying a data time interval, identifying data sources in which data was stored during the data time interval, reading data from the data sources, where the data read out from the data sources are associated with respective timestamps that fall within the data time interval, inserting the read out data into the empty reconstruction stream database so as to create a high resolution data stream, where the data are ordered in the empty reconstruction stream database according to timestamp, processing the data in the high resolution data stream and, based on the processing of the data, identifying and resolving a problem relating to an operating environment in which the data was initially generated.
Systems and methods for allocating resources are disclosed. Resources such as streams are allocated using restore credits. Credits are issued to the clients in a manner that ensures the system is operating in a safe allocation state. The credits can be used not only to allocate resources but also to throttle clients where necessary. Credits can be granted fully, partially, and in a number greater than requested. Zero or negative credits can also be issued to throttle clients. Restore credits are associated with reads and may be allocated by determining how many credits a CPU/cores can support. This maximum number may be divided amongst clients connected with the server.
Systems and methods for allocating resources are disclosed. Resources such as processing time, writes, or reads are allocated. Credits are issued to the clients in a manner that ensures the system is operating in a safe allocation state. The credits can be used not only to allocate resources but also to throttle clients where necessary. Credits can be granted fully, partially, and in a number greater than requested. Zero or negative credits can also be issued to throttle clients. Segment credits are associated with identifying unique fingerprints or segments and may be allocated by determining how many credits a CPU/cores can support. This maximum number may be divided amongst clients connected with the server.
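One way to picture the credit arithmetic is the sketch below. It assumes an even split of the server maximum across clients and a fixed number of credits per core. Both are illustrative choices, and the function names are hypothetical. Grants larger than the request are not modeled here.

```python
def fair_share(cores: int, credits_per_core: int, clients: int) -> int:
    """Server maximum divided evenly among connected clients (assumed policy)."""
    return (cores * credits_per_core) // max(clients, 1)


def grant(requested: int, held: int, share: int) -> int:
    """Credits to issue: full, partial, zero, or negative to throttle."""
    if held > share:
        return share - held            # negative: client must give credits back
    return min(requested, share - held)
```

For example, with a share of 8, a client holding 6 credits and asking for 5 receives a partial grant of 2, while a client holding 10 receives -2.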
One example method includes exposing a block storage which is distributed across a group of multiple sites, receiving a primary write request that identifies data to be stored, separating data identified in the primary write request into multiple data pieces, encoding the data pieces by creating multiple new blocks of data based on the multiple data pieces, where the data pieces are encoded in such a way that when a sufficient number, but fewer than all, of the multiple new blocks of data are retrieved, the data identified in the write request is recoverable by decoding, and writing the new blocks of data to different respective sites of the group, where writing of the new blocks of data is performed in conjunction with a plurality of secondary write requests, each of which corresponds to one of the new blocks of data.
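The property that the data is recoverable from fewer than all of the new blocks can be demonstrated with a single XOR parity block. This is a deliberately simple code chosen for illustration, not the encoding the method necessarily uses, and the function names are assumptions.

```python
def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> list:
    """Split data into k padded pieces and append one XOR parity block."""
    size = -(-len(data) // k)                      # ceiling division
    pieces = [data[i * size:(i + 1) * size].ljust(size, b"\0")
              for i in range(k)]
    parity = bytes(size)
    for piece in pieces:
        parity = _xor(parity, piece)
    return pieces + [parity]


def decode(blocks: dict, k: int, length: int) -> bytes:
    """Recover the data from any k of the k+1 blocks.

    blocks maps block index to content; index k is the parity block.
    """
    missing = [i for i in range(k) if i not in blocks]
    if len(missing) > 1 or (missing and k not in blocks):
        raise ValueError("not enough blocks to recover the data")
    if missing:
        rebuilt = blocks[k]
        for i in range(k):
            if i != missing[0]:
                rebuilt = _xor(rebuilt, blocks[i])
        blocks = {**blocks, missing[0]: rebuilt}
    return b"".join(blocks[i] for i in range(k))[:length]
```

Each element of the list returned by `encode` would be written to a different site by its own secondary write request, so the loss of any one site leaves the data recoverable.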
Systems and methods for allocating resources are disclosed. Resources such as streams are allocated using a stream credit system. Credits are issued to the clients in a manner that ensures the system is operating in a safe allocation state. The credits can be used not only to allocate resources but also to throttle clients where necessary. Credits can be granted fully, partially, and in a number greater than requested. Zero or negative credits can also be issued to throttle clients.
An apparatus in one embodiment comprises a host device that includes a set of input-output (IO) queues and a current multi-path input-output (MPIO) driver configured to select IO operations from the set of IO queues for delivery to the storage system. The current MPIO driver is configured to group a plurality of paths from the host device to a logical unit number of the storage system into a multi-path logical device. The host device is configured to install a target MPIO driver and to migrate control of the multi-path logical device to the target MPIO driver from the current MPIO driver where the migration comprises transferring IO entry points of the multi-path logical device from the current MPIO driver to the target MPIO driver. The host device is configured to deliver IO operations selected by the target MPIO driver to the storage system using the multi-path logical device.
An apparatus in one embodiment comprises a host device configured to communicate over a network with a storage system comprising a plurality of storage devices. The host device includes a set of input-output queues and a multi-path input-output driver configured to select input-output operations from the set of input-output queues for delivery to the storage system. The multi-path input-output driver is configured to analyze an input-output load pattern of the host device for a predetermined period of time and to categorize the input-output load pattern into one of a plurality of predetermined load pattern categories based at least in part on the analysis. The multi-path input-output driver is configured to transmit information specifying the categorization of the input-output load pattern to the storage system. The storage system is configured to adjust its processing of input-output operations based at least in part on the categorization of the input-output load pattern.
An illustrative embodiment includes a host device configured to communicate over a network with a storage system comprising a plurality of storage devices. The host device comprises a set of input-output queues and a multi-path input-output driver configured to select input-output operations from the set of input-output queues for delivery to the storage system over the network. The multi-path input-output driver is further configured to determine fabric identifiers for respective ones of a plurality of paths from the host device to the storage system, and to select particular ones of the paths for delivery of the input-output operations to the storage system based at least in part on the fabric identifiers. The fabric identifiers may be determined for the respective paths, for example, based at least in part on responses to a predetermined command sent over the paths by the multi-path input-output driver.
Execution of a workflow that has a set of subworkflows is managed. The set of subworkflows is optimized using a deep neural network, where each subworkflow of the set has a set of tasks. Each task requires resources from a set of resources, and each task may depend on another task of the sets of tasks. The deep neural network is trained by executing the set of subworkflows, collecting provenance data from the execution, and collecting monitoring data that represents the state of the set of resources. Training causes the neural network to learn relationships between the states of the set of resources, the sets of tasks, their parameters, and the obtained performance. Based on the deep neural network output, an allocation of resources to each task is optimized to ensure compliance with a user-defined quality metric.
Technology for proactively allocating data storage resources to a storage object. A rate at which host I/O operations directed to the storage object are received and/or processed is monitored during a monitored time period, and a high activity time range is identified. An anticipatory time range is defined that is a range of time immediately preceding the high activity time range. During the anticipatory time range within a subsequent time period following the monitored time period, high performance non-volatile storage is allocated to the storage object that is available for processing host I/O operations directed to the storage object at the beginning of and throughout the high activity time range. A low activity time range may also be identified, and lower performance non-volatile storage may be allocated to the storage object within an anticipatory time range immediately preceding the low activity time range.
A computing device includes a persistent storage and a processor. The persistent storage includes an asset. The processor obtains a computation request for the asset, instantiates an executable entity based on a computation prototype and a manifest associated with the asset, performs the computation request using the instantiated executable entity and metadata specified by the manifest associated with the asset to obtain a computation result; and provides the obtained computation result.
A computing device includes a persistent storage and a processor. The processor includes a local storage. The local storage includes blocks and an address space. The address space includes a first portion of entries that specify blocks of the local storage and a second portion of entries that specify blocks of the remote data storage. The processor obtains data for storage and makes a determination that the data cannot be stored in the local storage. In response to the determination, the processor stores the data in the remote storage using the second portion of entries.
A technique manages file data of a file system. The technique involves provisioning a first LUN with slice locations from a heterogeneous storage pool created from a solid state device (SSD) storage tier formed from SSD memory and a hard disk drive (HDD) storage tier formed from HDD memory. The technique further involves provisioning a second LUN with slice locations from the heterogeneous storage pool. The technique further involves, while different LUN level policies are applied to the first LUN and the second LUN, moving file data of a file system from the slice locations of the first LUN to the slice locations of the second LUN. Such a technique enables effective auto-tiering at the file level where active file data moves to a higher performance storage tier and inactive file data moves to a cost effective lower performance storage tier thus optimizing operation.
A technique operates multiple data storage tiers including a solid state drive (SSD) storage tier having SSD storage components and a hard disk drive (HDD) storage tier having magnetic disk devices. The technique involves establishing write quotas for the SSD storage components of the SSD storage tier. Each write quota identifies an amount of data that is permitted to be written to a respective SSD storage component during a predefined amount of time. The technique further involves consuming the write quotas in response to write operations performed on the SSD storage components of the SSD storage tier. The technique further involves, in response to a particular write quota for a particular SSD storage component of the SSD storage tier becoming fully consumed, performing a set of remedial activities on the multiple storage tiers to protect operation of the particular SSD storage component of the SSD storage tier.
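The write quota bookkeeping can be sketched as below. Redirecting writes to the HDD tier once a quota is fully consumed is only one possible remedial activity and is an assumption here, as are the names `SsdWriteQuota` and `route_write`.

```python
class SsdWriteQuota:
    """Bytes permitted to be written to one SSD component per period."""

    def __init__(self, quota_bytes: int):
        self.quota_bytes = quota_bytes
        self.written = 0

    def consume(self, nbytes: int) -> None:
        self.written += nbytes

    @property
    def exhausted(self) -> bool:
        return self.written >= self.quota_bytes

    def start_new_period(self) -> None:
        self.written = 0               # quota renews each predefined period


def route_write(quotas: dict, ssd_id: str, nbytes: int) -> str:
    """Return the tier that takes the write (assumed remedial policy)."""
    quota = quotas[ssd_id]
    if quota.exhausted:
        return "hdd"                   # protect the SSD once its quota is spent
    quota.consume(nbytes)
    return "ssd"
```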
A technique is directed to performing a tuning operation in data storage equipment. The technique involves generating, while the data storage equipment performs input/output (I/O) transactions, an observed I/O statistics profile based on performance of at least some of the I/O transactions. The technique further involves performing a comparison operation that compares the observed I/O statistics profile to an expected I/O statistics profile which is defined by a set of operating settings that controls operation of the data storage equipment. The technique further involves operating the data storage equipment in a normal state when a result of the comparison operation indicates that the observed I/O statistics profile matches the expected I/O statistics profile and in a remedial state which is different from the normal state when the result of the comparison operation indicates that the observed I/O statistics profile does not match the expected I/O statistics profile.
A technique processes read requests from a set of requesters. The technique involves providing, while a first data element and a second data element are stored in secondary storage, the first data element from the secondary storage to the set of requesters in response to a first request to read the first data element from the set of requesters. The technique further involves providing, after the first data element is provided to the set of requesters in response to the first request, the second data element to the set of requesters in response to a second request to read the second data element from the set of requesters. The technique further involves maintaining, in response to detecting that the first data element and the second data element match, a single copy of the first and second data elements in a read cache for subsequent read access.
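Keeping a single cached copy for matching data elements can be sketched with a content-addressed map. Detecting a match by comparing SHA-256 digests is an assumption, and the class name `DedupReadCache` is hypothetical.

```python
import hashlib


class DedupReadCache:
    """Read cache that stores one copy per distinct content."""

    def __init__(self):
        self._by_digest = {}   # content digest -> the single cached copy
        self._by_key = {}      # data element key -> content digest

    def insert(self, key, data: bytes) -> None:
        digest = hashlib.sha256(data).digest()
        self._by_digest.setdefault(digest, data)   # reuse an existing copy
        self._by_key[key] = digest

    def get(self, key):
        digest = self._by_key.get(key)
        return None if digest is None else self._by_digest[digest]

    def copies(self) -> int:
        return len(self._by_digest)
```

Two elements read from secondary storage at different times end up sharing one cache entry when their contents match, which frees cache space for other data.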
A technique manages data in slices of different sizes within different storage tiers. The technique involves, based on access activity for first data currently residing within a first slice having a first size, selecting a target set of storage devices within which to store the first data from among multiple sets of storage devices. The technique further involves moving the first data from the first slice having the first size to a second slice having a second size that is different from the first size. The technique further involves, after the first data is moved from the first slice to the second slice, storing the second slice in the target set of storage devices.
A technique manages data within solid state device (SSD) storage. The technique involves, in response to writing data to a set of SSD storage components, consuming a set of recurring write quotas for the set of SSD storage components. Each recurring write quota identifies an amount of remaining usefulness for a respective SSD storage component, e.g., periodically allocated budgets for write operations based on measured (or counted) reliability and/or healthiness factors. The technique further involves, as the set of recurring write quotas are consumed, performing a set of quota evaluation operations to evaluate the set of recurring write quotas. The technique further involves, in response to a set of results from the set of quota evaluation operations, performing a set of remedial activities to control access to the data that was written to the set of SSD storage components.
A technique balances data storage activity within a mapped-RAID environment. The technique involves selecting, by processing circuitry, a source slice of storage from multiple slices of storage of the mapped-RAID environment, the source slice containing particular data to be relocated. The technique further involves selecting, by the processing circuitry, a destination slice of storage from the multiple slices of storage of the mapped-RAID environment. The technique further involves relocating, by the processing circuitry, the particular data from the source slice to the destination slice to balance data storage activity within the mapped-RAID environment. The mapped-RAID environment includes multiple storage devices. Each storage device provides multiple non-overlapping device extents. Each slice of the multiple slices of storage of the mapped-RAID environment is formed of storage stripes extending across device extents provided by a group of storage devices that includes less than all of the storage devices of the mapped-RAID environment.
A technique performs best-effort deduplication. The technique involves activating a front-end log deduplication service that is configured and operative to perform deduplication operations on data in front-end log-based storage prior to that data reaching back-end storage that is different from the front-end log-based storage. The technique further involves, after the front-end log deduplication service is activated, receiving new data in the front-end log-based storage. The technique further involves providing the front-end log deduplication service to perform a data deduplication operation on the new data while the new data resides within the front-end log-based storage. The technique further involves, after the data deduplication operation is performed on the new data, updating the back-end storage to indicate storage of the new data within the back-end storage.
Techniques manage data within computerized memory. The techniques involve, in response to receiving host data in a write cache, updating a data order log that holds order information indicating a temporal order for the host data. The temporal order initially is the order that the host data was received in the write cache. The techniques further involve transferring the host data from the write cache to secondary storage. The techniques further involve, after the host data is transferred from the write cache to secondary storage, providing a garbage collection service that consolidates the host data within the secondary storage in accordance with the data order log that holds the order information indicating the temporal order for the host data. With the temporal order of the host data generally preserved, data access operations may enjoy various optimizations such as improved prefetching, more sequential reads, improved auto-tiering, and so on.
A system that manages an object storage may include frontend micro-services and backend micro-services. The frontend micro-services may obtain a request to store data in an object storage and divide the data into slices. The backend micro-services may generate a sketch of each slice, match each slice to a similarity group of a plurality of similarity groups, obtain meta-data associated with each matched similarity group, and add at least a portion of a slice of the slices to a compression region using the meta-data.
A system for managing an object storage includes frontend micro-services and backend micro-services. The frontend micro-services obtain a request to store data in an object storage; divide the data into slices; send a slice analysis request, based on a slice of the slices, to the backend micro-services; obtain, from the plurality of backend micro-services, a list of segments of the slice that are not stored in the object storage; and add a segment specified by the list of segments to a compression region. The backend micro-services identify segments of the slice specified by the slice analysis request that are not stored in the object storage and generate the list of segments of the slice based on the identified segments.
Techniques for performing storage tiering in a data storage system taking into account the write endurance of flash drives and the frequencies with which data are written to storage extents in a data storage system are disclosed. Such storage tiering tends to maximize data temperature of a flash tier by selecting hot extents for placement thereon, but subject to a constraint that doing so does not cause flash drives in the flash tier to operate beyond their endurance levels.
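A simple way to picture endurance-constrained placement is a greedy pass over extents ordered by temperature. The greedy strategy, the tuple layout, and the single aggregate write budget are assumptions for illustration. The abstract does not state how the selection is computed, and a greedy pass is not guaranteed to maximize total temperature.

```python
def select_for_flash(extents, capacity: int, write_budget: int) -> list:
    """Pick hot extents for the flash tier without exceeding endurance.

    extents:      iterable of (extent_id, temperature, writes_per_day)
    capacity:     number of extents the flash tier can hold
    write_budget: writes per day the flash drives can sustain (assumed unit)
    """
    chosen, writes = [], 0
    for extent_id, _temp, extent_writes in sorted(
            extents, key=lambda e: e[1], reverse=True):
        if len(chosen) < capacity and writes + extent_writes <= write_budget:
            chosen.append(extent_id)
            writes += extent_writes
    return chosen
```

A hot but write-heavy extent can be skipped in favor of a slightly cooler one that fits the remaining budget.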
An apparatus in one embodiment comprises an ingestion manager, a plurality of ingestion engines associated with the ingestion manager, and an analytics platform configured to receive data from the ingestion engines under the control of the ingestion manager. The ingestion manager is configured to interact with one or more of the ingestion engines in conjunction with providing data to a given one of a plurality of analytics workspaces of the analytics platform. For example, the analytics workspaces of the analytics platform are illustratively configured to receive data from respective potentially disjoint subsets of the ingestion engines under the control of the ingestion manager. Additionally or alternatively, the ingestion manager may be configured to implement data-as-a-service functionality for one or more of the analytics workspaces of the analytics platform.
A technique for managing cache in a data storage system is disclosed. Data storage system cache memory is arranged into multiple input/output (IO) cache macroblocks, where a first set of IO cache macroblocks are configured as compressed IO cache macroblocks, each compressed IO cache macroblock storing a plurality of variable sized compressed IO data blocks, and a second set of IO cache macroblocks are configured as non-compressed IO cache macroblocks, each non-compressed IO cache macroblock storing a plurality of fixed sized non-compressed IO data blocks. A write request is received at the data storage system. If the IO data associated with the write request is determined to be compressible, the IO data is compressed in-line and written to an IO data block in a compressed IO cache macroblock, otherwise non-compressed IO data is written to an IO data block in a non-compressed IO cache macroblock.
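The write-path decision can be sketched as follows. Treating data as compressible when zlib output is smaller than the input, the 4096-byte fixed block size, and the function name `place_write` are all assumptions. The abstract does not define the compressibility test.

```python
import zlib


def place_write(io_data: bytes, block_size: int = 4096):
    """Choose the macroblock type for a write and return the stored block.

    Returns ("compressed", variable-sized bytes) or
            ("uncompressed", bytes padded to the fixed block size).
    """
    compressed = zlib.compress(io_data)
    if len(compressed) < len(io_data):            # assumed compressibility test
        return ("compressed", compressed)
    return ("uncompressed", io_data.ljust(block_size, b"\0"))
```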