A non-autoregressive transformer model is improved to maintain output quality while reducing the number of iterative applications of the model by training parameters of a student model based on a teacher model. The teacher model is applied for several iterations to a masked output and the student model is applied for one iteration, such that the respective output token predictions for the masked positions can be compared and a loss propagated to the student. The loss may be based on token distributions rather than the specific output tokens alone, and may additionally consider hidden-state losses. The teacher model may also be updated for use in further training based on the updated student model, for example, by updating its parameters as a moving average.
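As an illustrative sketch of the two mechanisms named in this abstract — a distribution-level distillation loss at masked positions and a moving-average teacher update — the following minimal functions show the arithmetic. The function names and list-based parameters are illustrative, not part of the disclosure:

```python
import math

def kl_div(p, q):
    # KL(p || q) between teacher and student token distributions
    # at one masked position (distribution-level, not argmax tokens)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def distillation_loss(teacher_dists, student_dists):
    # mean KL over the masked positions
    return sum(kl_div(t, s) for t, s in zip(teacher_dists, student_dists)) / len(teacher_dists)

def ema_update(teacher_params, student_params, decay=0.99):
    # teacher parameters track the updated student as a moving average
    return [decay * t + (1.0 - decay) * s
            for t, s in zip(teacher_params, student_params)]
```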
A model evaluation system evaluates the extent to which privacy-aware training processes affect the direction of training gradients for groups. A modified differential-privacy ("DP") training process provides per-sample gradient adjustments with parameters that may be adaptively modified for different data batches. Per-sample gradients are modified with respect to a reference bound and a clipping bound. A scaling factor may be determined for each per-sample gradient based on the higher of the reference bound or the magnitude of the per-sample gradient. Per-sample gradients may then be adjusted based on the ratio of the clipping bound to the scaling factor. A relative privacy cost between groups may be determined as excess training risk, based on the difference between a group's gradient direction relative to the unadjusted batch gradient and its direction relative to the batch gradient adjusted according to the privacy-aware training.
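The per-sample adjustment rule described above (scaling factor as the higher of the reference bound or the gradient magnitude, then rescaling by the ratio of the clipping bound to that factor) can be sketched directly; the function name and pure-Python vectors are illustrative, not from the disclosure:

```python
import math

def adjust_per_sample_gradient(grad, reference_bound, clipping_bound):
    # scaling factor: the higher of the reference bound or the gradient magnitude
    norm = math.sqrt(sum(g * g for g in grad))
    scaling_factor = max(reference_bound, norm)
    # adjust by the ratio of the clipping bound to the scaling factor
    ratio = clipping_bound / scaling_factor
    return [g * ratio for g in grad]
```

With this rule, gradients below the reference bound are left unchanged (rather than scaled up, as plain normalization would do), while large gradients are clipped to the clipping bound.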
Probability density modeling, such as for generative modeling, for data on a manifold of a high-dimensional space is performed with an implicitly-defined manifold, such that the points belonging to the manifold are the zero set of a manifold-defining function. A model is trained to learn an energy function that, evaluated on the manifold, describes a probability density for the manifold. As such, the relevant portions of the energy function are "filtered through" the defined manifold for training and in application. The combined energy function and manifold-defining function provide an "energy-based implicit manifold" that can more effectively model probability densities of a manifold in the high-dimensional space. As the manifold-defining function and the energy function are defined across the high-dimensional space, they may more effectively learn geometries and avoid distortions due to changes in dimension that occur for models that model the manifold in a lower-dimensional space.
A computer model is trained to account for data samples in a high-dimensional space as lying on different manifolds, rather than a single manifold to represent the data set, accounting for the data set as a whole as a union of manifolds. Different data samples that may be expected to belong to the same underlying manifold are determined by grouping the data. For generative models, a generative model may be trained that includes a sub-model for each group trained on that group's data samples, such that each sub-model can account for the manifold of that group. The overall generative model includes information describing the frequency to sample from each sub-model to correctly represent the data set as a whole in sampling. Multi-class classification models may also use the grouping to improve classification accuracy by weighing group data samples according to the estimated latent dimensionality of the group.
Model training systems collaborate on model training without revealing their respective private data sets. For each private data set, a set of client weights is learned for a set of computer models that are also learned during training. Inference for a particular private data set is determined as a mixture of the computer model parameters according to the client weights. During training, at each iteration, the client weights are updated in one step based on how well sampled models represent the private data set. In another step, gradients are determined for each sampled model and may be weighed according to the client weight for that model, relatively increasing the gradient contribution of a private data set for model parameters that correspond more strongly to that private data set.
A text-video recommendation model determines relevance of a text to a video in a text-video pair (e.g., as a relevance score) with a text embedding and a text-conditioned video embedding. The text-conditioned video embedding is a representation of the video used for evaluating the relevance of the video to the text, where the representation itself is a function of the text it is evaluated for. As such, the input text may be used to weigh or attend to different frames of the video in determining the text-conditioned video embedding. The representation of the video may thus differ for different input texts for comparison. The text-conditioned video embedding may be determined in various ways, such as with a set of the most-similar frames to the input text (the top-k frames) or may be based on an attention function based on query, key, and value projections.
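The top-k variant mentioned above — where the video representation itself depends on the text it is scored against — can be sketched minimally. The function names and list-based embeddings are illustrative assumptions, not from the disclosure:

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def text_conditioned_video_embedding(text_emb, frame_embs, k=2):
    # rank frames by similarity to the text; average the top-k frames
    top = sorted(frame_embs, key=lambda f: dot(text_emb, f), reverse=True)[:k]
    return [sum(f[i] for f in top) / len(top) for i in range(len(text_emb))]

def relevance_score(text_emb, frame_embs, k=2):
    # relevance of the text-video pair via the text-conditioned embedding
    return dot(text_emb, text_conditioned_video_embedding(text_emb, frame_embs, k))
```

Note that the same video yields different representations for different query texts, since the top-k frame selection depends on the text.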
G06F 16/783 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
G06F 16/71 - Indexing; Data structures therefor; Storage structures
G06V 20/40 - Scenes; Scene-specific elements in video content
A model evaluation system evaluates the effect of a feature value at a particular time in a time-series data record on predictions made by a time-series model. The time-series model may make predictions with black-box parameters that can impede explainability of the relationship between predictions for a data record and the values of the data record. To determine the relative importance of a feature occurring at a time and evaluated at an evaluation time, the model predictions are determined on the unmasked data record at the evaluation time and on the data record with feature values masked within a window between the time and the evaluation time, permitting comparison of the evaluation with the features and without the features. In addition, the contribution at the initial time in the window may be determined by comparing the score with another score determined by masking the values except for the initial time.
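The window-masking comparison described above can be sketched with a toy model treated as a black box; the function names, the list-based record, and the zero mask value are illustrative assumptions, not from the disclosure:

```python
def mask_window(record, start, end, mask_value=0.0):
    # replace feature values inside [start, end) with a mask value
    return [mask_value if start <= t < end else v for t, v in enumerate(record)]

def window_importance(model, record, t_feature, t_eval, mask_value=0.0):
    # prediction at the evaluation time on the unmasked record, minus the
    # prediction with values masked from the feature time through t_eval
    full = model(record, t_eval)
    masked = model(mask_window(record, t_feature, t_eval + 1, mask_value), t_eval)
    return full - masked
```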
G16H 50/30 - ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for individual health risk assessment
8.
CORRECTING MANIFOLD OVERFITTING OF PROBABILISTIC MODELS
To effectively learn a probability density from a data set in a high-dimensional space without manifold overfitting, a computer model first learns an autoencoder model that can transform data from a high-dimensional space to a low-dimensional space, and then learns a probability density model in the low-dimensional space that may be effectively learned with maximum likelihood. By separating these components, different types of models can be employed for each portion (e.g., manifold learning and density learning), permitting effective modeling of high-dimensional data sets that lie along a manifold representable with fewer dimensions, thus effectively learning both the density and the manifold and permitting effective data generation and density estimation.
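The two-step structure can be illustrated with a toy example: a hand-specified "autoencoder" for 2-D points lying on a 1-D line, followed by a maximum-likelihood Gaussian fit in the low-dimensional space. Both components are illustrative stand-ins (a real system would learn the encoder and could use a richer density model):

```python
def encode(x):
    # toy encoder for points on the 1-D manifold y = 2x in 2-D space
    return x[0]

def decode(z):
    # toy decoder mapping latent coordinates back onto the manifold
    return (z, 2.0 * z)

def fit_gaussian_mle(zs):
    # maximum-likelihood Gaussian fit in the low-dimensional space
    mu = sum(zs) / len(zs)
    var = sum((z - mu) ** 2 for z in zs) / len(zs)
    return mu, var
```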
A model training system protects data leakage of private data in a federated learning environment by training a private model in conjunction with a proxy model. The proxy model is trained with protections for the private data and may be shared with other participants. Proxy models from other participants are used to train the private model, enabling the private model to benefit from parameters based on other models' private data without privacy leakage. The proxy model may be trained with a differentially private algorithm that quantifies a privacy cost for the proxy model, enabling a participant to measure the potential exposure of private data and drop out. Iterations may include training the proxy and private models and then mixing the proxy models with other participants. The mixing may include updating and applying a bias to account for the weights of other participants in the received proxy models.
An autoencoder model includes an encoder portion and a decoder portion. The encoder encodes an input token sequence to an input sequence representation that is decoded by the decoder to generate an output token sequence. The autoencoder model may decode multiple output tokens in parallel, such that the decoder may be applied iteratively. The decoder may receive an output estimate from a prior iteration to predict output tokens. To improve positional representation and reduce positional errors and repetitive tokens, the autoencoder may include a trained layer for combining token embeddings with positional encodings. In addition, the model may be trained with a corrective loss based on output predictions when the model receives a masked input as the output estimate.
A recommendation system generates item recommendations for a user based on a distance between a user embedding and item embeddings. To train the item and user embeddings, the recommendation system uses user-item pairs as training data, focusing on difficult items based on the positive and negative items with respect to individual users in the training set. In training, the weight of an individual user-item pair in affecting the user and item embeddings may be determined based on the distance between that pair's user embedding and item embedding, as well as the comparative distance for other items of the same type for that user and the distances of user-item pairs for other users, which may regulate the distances across types and across the training batch.
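One simple way to realize distance-dependent pair weighting of this kind is a softmax over the pair distances in a batch, so that (for positive pairs) a larger user-item distance — a harder example — receives a larger training weight. This is a hedged sketch of one plausible weighting scheme, not the specific formula of the disclosure:

```python
import math

def pair_weights(pair_distances, temperature=1.0):
    # softmax over user-item embedding distances within a batch:
    # harder pairs (larger distance for positives) get larger weights
    exps = [math.exp(d / temperature) for d in pair_distances]
    total = sum(exps)
    return [e / total for e in exps]
```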
An object detection model and relationship prediction model are jointly trained with parameters that may be updated through a joint backbone. The object detection model predicts object locations based on keypoint detection, such as a heatmap local peak, enabling disambiguation of objects. The relationship prediction model may predict a relationship between detected objects and be trained with a joint loss with the object detection model. The loss may include terms for object connectedness and model confidence, enabling training to focus first on highly-connected objects and later on lower-confidence items.
G06V 10/44 - Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
G06V 10/46 - Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
G06V 10/77 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
13.
SYSTEM AND METHOD FOR DETERMINING EXPECTED LOSS USING A MACHINE LEARNING FRAMEWORK
A computing device for predicting an expected loss for a set of claim transactions is provided. The computing device predicts, at a first machine learning model, a claim frequency of the set of claim transactions over a given time period, the first machine learning model trained using historical frequency data and based on a segment type defining a type of claim, each segment type having corresponding peril types. The computing device also predicts, at a second machine learning model, a claim severity of the set of claim transactions during the given time period, the second machine learning model trained using historical severity data and based on the segment type and the corresponding peril types. The computing device then determines the expected loss for the set of claim transactions over the given time period as the product of the predictions of the first and second machine learning models.
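The frequency-times-severity combination can be sketched with the two models treated as black boxes; the function names and toy feature dictionary are illustrative assumptions, not from the disclosure:

```python
def expected_loss(frequency_model, severity_model, claim_features, period):
    # expected loss = predicted claim frequency x predicted claim severity
    frequency = frequency_model(claim_features, period)  # e.g., claims per period
    severity = severity_model(claim_features, period)    # e.g., cost per claim
    return frequency * severity
```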
A model visualization system analyzes model behavior to identify clusters of data instances with similar behavior. For a selected feature, data instances are modified to set the selected feature to different values evaluated by a model to determine corresponding model outputs. The feature values and outputs may be visualized in an instance-feature variation plot. The instance-feature variation plots for the different data instances may be clustered to identify latent differences in behavior of the model with respect to different data instances when varying the selected feature. The number of clusters for the clustering may be automatically determined, and the clusters may be further explored by identifying another feature which may explain the different behavior of the model for the clusters, or by identifying outlier data instances in the clusters.
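The instance-feature variation and clustering steps can be sketched as follows; the greedy distance-threshold clustering here is an illustrative stand-in for whatever clustering method the system uses, and the function names are assumptions:

```python
def instance_feature_variation(model, instance, feature_idx, values):
    # model outputs as the selected feature is swept over candidate values
    curve = []
    for v in values:
        modified = list(instance)
        modified[feature_idx] = v
        curve.append(model(modified))
    return curve

def greedy_cluster(curves, eps):
    # group variation curves whose maximum pointwise gap is within eps
    clusters = []
    for c in curves:
        for cluster in clusters:
            if max(abs(a - b) for a, b in zip(cluster[0], c)) <= eps:
                cluster.append(c)
                break
        else:
            clusters.append([c])
    return clusters
```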
A computer model represents high-dimensional data with a low-dimensional manifold in conjunction with a low-dimensional base probability density. A first transform (a manifold transform) may be used to transform the high-dimensional data to a low-dimensional manifold, and a second transform (a density transform) may be used to transform the low-dimensional manifold to a low-dimensional probability distribution. To enable the model to tractably learn the manifold transformation from the high-dimensional to low-dimensional spaces, the manifold transformation includes conformal flows, which simplify the probabilistic volume transform and enable tractable learning of the transform. This may also allow the manifold transform to be jointly learned with the density transform.
A video localization system localizes actions in videos based on a classification model and an actionness model. The classification model is trained to make predictions of which segments of a video depict an action and to classify the actions in the segments. The actionness model predicts whether any action is occurring in each segment, rather than predicting a particular type of action. This reduces the likelihood that the video localization system over-relies on contextual information in localizing actions in video. Furthermore, the classification model and the actionness model are trained based on weakly-labeled data, thereby reducing the cost and time required to generate training data for the video localization system.
G06V 10/764 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
17.
PREDICTING OCCURRENCES OF FUTURE EVENTS USING TRAINED ARTIFICIAL-INTELLIGENCE PROCESSES AND NORMALIZED FEATURE DATA
In some examples, computer-implemented systems and processes facilitate a prediction of occurrences of future events using trained artificial intelligence processes and normalized feature data. For instance, an apparatus may generate an input dataset based on elements of interaction data that characterize an occurrence of a first event during a first temporal interval, and that include at least one element of normalized data. Based on an application of a trained artificial intelligence process to the input dataset, the apparatus may generate output data representative of a predicted likelihood of an occurrence of a second event during a second temporal interval. The apparatus may also transmit at least a portion of the output data to a computing system, which may perform operations consistent with the portion of the output data.
G06Q 10/04 - Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
G06Q 40/02 - Banking, e.g. interest calculation or account maintenance
18.
PREDICTING TARGETED, AGENCY-SPECIFIC RECOVERY EVENTS USING ADAPTIVELY TRAINED ARTIFICIAL-INTELLIGENCE PROCESSES
The disclosed embodiments include computer-implemented systems and methods that predict targeted, agency-specific recovery events using trained machine-learning or artificial-intelligence processes. For example, an apparatus may generate an input dataset based on elements of interaction data associated with an occurrence of a first event. Based on an application of a trained artificial intelligence process to the input dataset, the apparatus may generate elements of output data indicative of an expected occurrence of a corresponding one of a plurality of targeted second events involving each of a plurality of candidate event assignments during a future temporal interval. The apparatus may transmit at least a portion of the generated output data to a computing system via a communications interface, and the computing system may perform operations that assign the first event to a corresponding one of the candidate event assignments based on the elements of output data.
The disclosed embodiments include computer-implemented systems and processes that predict activity-specific engagement events using trained artificial-intelligence processes. For example, an apparatus may generate an input dataset based on elements of first interaction data associated with an activity and a first temporal interval. Based on an application of a trained artificial intelligence process to the input dataset, the apparatus may generate output data representative of a predicted likelihood of an occurrence of an engagement event associated with a cessation of the activity during a second temporal interval, which may be disposed subsequent to the first temporal interval and separated from the first temporal interval by a corresponding buffer interval. The apparatus may transmit at least a portion of the generated output data to a computing system, which may perform operations based on the portion of the output data.
G06Q 10/04 - Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
G06Q 40/02 - Banking, e.g. interest calculation or account maintenance
20.
PREDICTING TARGETED REDEMPTION EVENTS USING TRAINED ARTIFICIAL-INTELLIGENCE PROCESSES
The disclosed embodiments include computer-implemented systems and methods that facilitate a prediction of future occurrences of redemption events using adaptively trained artificial intelligence processes. For example, an apparatus may generate an input dataset based on elements of first interaction data associated with a first temporal interval. Based on an application of a trained artificial intelligence process to the input dataset, the apparatus may generate output data representative of a predicted likelihood of an occurrence of each of a plurality of targeted events during a second temporal interval. The apparatus may also transmit at least a portion of the output data and explainability data associated with the trained artificial intelligence process to a computing system, which may perform operations based on the portion of the output data and the explainability data.
The disclosed embodiments include computer-implemented processes that predict service-specific attrition events using trained artificial-intelligence processes. For example, an apparatus may generate an input dataset based on elements of first interaction data associated with a first temporal interval. The elements of first interaction data include an element of geographic data or an element of engagement data. Based on an application of a trained artificial-intelligence process to the input dataset, the apparatus may generate output data representative of a predicted likelihood of an occurrence of an attrition event during a second temporal interval that is subsequent to the first temporal interval and separated from the first temporal interval by a corresponding buffer interval. The apparatus may transmit at least a portion of the generated output data to a computing system, which may perform operations based on the portion of the output data.
G06Q 10/04 - Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
G06Q 40/02 - Banking, e.g. interest calculation or account maintenance
22.
PREDICTING OCCURRENCES OF TARGETED ATTRITION EVENTS USING TRAINED ARTIFICIAL-INTELLIGENCE PROCESSES
The disclosed embodiments include computer-implemented systems and processes that predict occurrences of targeted attrition events using trained artificial-intelligence processes. For example, an apparatus may generate an input dataset based on elements of first interaction data associated with a targeted participant during a first temporal interval. Based on an application of a trained artificial-intelligence process to the input dataset, the apparatus may generate output data representative of a predicted likelihood of an occurrence of an attrition event involving the targeted participant during a second temporal interval that is disposed subsequent to the first temporal interval, and that is separated from the first temporal interval by a buffer interval. The apparatus may transmit at least a portion of the generated output data and explainability data associated with the trained artificial-intelligence process to a computing system, which may perform operations based on the portion of the output data and the explainability data.
G06Q 10/04 - Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
G06Q 40/02 - Banking, e.g. interest calculation or account maintenance
23.
PREDICTING PRODUCT-SPECIFIC EVENTS DURING TARGETED TEMPORAL INTERVALS USING TRAINED ARTIFICIAL-INTELLIGENCE PROCESSES
The disclosed embodiments relate to computer-implemented systems and processes that facilitate a prediction of occurrences of product-specific events during targeted temporal intervals using trained artificial intelligence processes. For example, an apparatus may generate an input dataset based on elements of first interaction data associated with an occurrence of a first event. Based on an application of a trained artificial intelligence process to the input dataset, the apparatus may generate elements of output data representative of a predicted likelihood of an occurrence of each of a plurality of second events during a target temporal interval associated with the first event. The apparatus may also transmit the elements of output data to a computing system, which may perform operations that are consistent with the elements of output data.
G06Q 10/04 - Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
G06Q 40/02 - Banking, e.g. interest calculation or account maintenance
24.
PREDICTING FUTURE EVENTS OF PREDETERMINED DURATION USING ADAPTIVELY TRAINED ARTIFICIAL-INTELLIGENCE PROCESSES
The disclosed embodiments include computer-implemented systems and methods that dynamically predict future occurrences of events using adaptively trained machine-learning or artificial-intelligence processes. For example, an apparatus may generate an input dataset based on elements of interaction data associated with an extraction interval. Based on an application of a trained artificial intelligence process to the input dataset, the apparatus may generate output data representative of a predicted likelihood of an occurrence of a first event during a first portion of a target interval, which may be separated from the extraction interval by a second portion of the target interval. The first event may be associated with a predetermined temporal duration within the first portion of the target interval. The apparatus may transmit a portion of the generated output data to a computing system, and the computing system may be configured to perform operations based on the portion of the output data.
G06Q 10/04 - Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
G06F 17/00 - Digital computing or data processing equipment or methods, specially adapted for specific functions
A computing device is configured to communicate with a central server in order to predict the likelihood of fraud in current transactions for a target claim. The computing device extracts, from information stored in the central server (relating to the target claim and past transactions for past claims, including those marked as fraud), a plurality of distinct sets of features: text-based features derived from descriptions of communications between the requesting device and the endpoint device; graph-based features derived from information relating to a network of claims and policies connected through shared information; and tabular features derived from details related to claim information and exposure details. The features are input into a machine learning model that generates a likelihood of fraud in the current transactions and triggers an action based on the likelihood of fraud (e.g., stopping subsequent transactions related to the target claim).
A cumulative accessibility estimation (CAE) system estimates the probability that an agent will reach a goal state within a time horizon to determine which actions the agent should take. The CAE system receives agent data from an agent and estimates the probability that the agent will reach a goal state within a time horizon based on the agent data. The CAE system may use a CAE model trained to estimate a cumulative accessibility function, which gives the probability that the agent will reach the goal state within the time horizon. The CAE system may use the CAE model to identify an optimal action for the agent based on the agent data. The CAE system may then transmit the optimal action to the agent for the agent to perform.
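The action-selection step can be sketched with the CAE model treated as a black box returning the estimated probability of reaching the goal within the horizon; the function name and toy model are illustrative assumptions, not from the disclosure:

```python
def best_action(cae_model, state, goal, horizon, actions):
    # pick the action with the highest estimated probability of
    # reaching the goal state within the time horizon
    return max(actions, key=lambda a: cae_model(state, a, goal, horizon))
```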
The disclosed embodiments include computer-implemented apparatuses and processes that dynamically predict future occurrences of targeted classes of events using adaptively trained machine-learning or artificial-intelligence processes. For example, an apparatus may generate an input dataset based on interaction data associated with a prior temporal interval, and may apply a trained, gradient-boosted, decision-tree process to the input dataset. Based on the application of the trained, gradient-boosted, decision-tree process to the input dataset, the apparatus may generate output data representative of an expected occurrence of a corresponding one of a plurality of targeted events during a future temporal interval, which may be separated from the prior temporal interval by a corresponding buffer interval. The apparatus may also transmit a portion of the generated output data to a computing system, and the computing system may transmit digital content to a device associated with the expected occurrence based on the portion of the output data.
A recommendation system generates recommendations for user-item pairs based on embeddings in hyperbolic space. Each user and item may be associated with a local hyperbolic embedding representing the user or item in hyperbolic space. The hyperbolic embedding may be modified by neighborhood information. Because the hyperbolic space may have no closed form for combining neighbor information, the local embedding may be converted to a tangent space for neighborhood aggregation information and converted back to hyperbolic space for a neighborhood-aware embedding to be used in the recommendation score.
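The round trip between the hyperbolic space and the tangent space at the origin — used because neighbor aggregation has no closed form in hyperbolic space — can be sketched for the Poincaré ball model. The Poincaré ball (with curvature parameter `c`) is one common choice; the disclosure may use a different hyperbolic model, and the function names are illustrative:

```python
import math

def _norm(v):
    return math.sqrt(sum(x * x for x in v))

def log0(x, c=1.0):
    # map a point on the Poincare ball to the tangent space at the origin
    n = _norm(x)
    if n == 0.0:
        return list(x)
    coef = math.atanh(math.sqrt(c) * n) / (math.sqrt(c) * n)
    return [coef * xi for xi in x]

def exp0(v, c=1.0):
    # map a tangent vector at the origin back onto the Poincare ball
    n = _norm(v)
    if n == 0.0:
        return list(v)
    coef = math.tanh(math.sqrt(c) * n) / (math.sqrt(c) * n)
    return [coef * vi for vi in v]

def neighborhood_aware_embedding(local_emb, neighbor_embs, c=1.0):
    # aggregate neighbors with a Euclidean mean in tangent space,
    # then map back to hyperbolic space
    tangent = [log0(e, c) for e in [local_emb] + neighbor_embs]
    dim = len(local_emb)
    mean = [sum(t[i] for t in tangent) / len(tangent) for i in range(dim)]
    return exp0(mean, c)
```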
H04N 21/25 - Management operations performed by the server for facilitating the content distribution or administrating data related to end-users or client devices, e.g. end-user or client device authentication or learning user preferences for recommending movies
29.
PREDICTION OF FUTURE OCCURRENCES OF EVENTS USING ADAPTIVELY TRAINED ARTIFICIAL-INTELLIGENCE PROCESSES
The disclosed embodiments include computer-implemented apparatuses and processes that dynamically predict future occurrences of events using adaptively trained machine-learning or artificial-intelligence processes. For example, an apparatus may generate an input dataset based on first interaction data associated with a prior temporal interval, and may apply an adaptively trained, gradient-boosted, decision-tree process to the input dataset. Based on the application of the adaptively trained, gradient-boosted, decision-tree process to the input dataset, the apparatus may generate output data representative of a predicted likelihood of an occurrence of an event during a future temporal interval, which may be separated from the prior temporal interval by a corresponding buffer interval. The apparatus may also transmit a portion of the generated output data to a computing system, and the computing system may be configured to generate or modify second interaction data based on the portion of the output data.
G06Q 10/04 - Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
G06Q 40/02 - Banking, e.g. interest calculation or account maintenance
30.
PREDICTING TARGETED FUTURE ENGAGEMENT USING TRAINED ARTIFICIAL INTELLIGENCE PROCESSES
The disclosed embodiments include computer-implemented processes that determine, in real time, a likelihood of a targeted future engagement using trained artificial intelligence processes. For example, an apparatus may generate a first input dataset based on elements of first interaction data associated with a first temporal interval, and based on an application of a trained first artificial intelligence process to the first input dataset, generate output data representative of a predicted likelihood of an occurrence of each of a plurality of target events during a second temporal interval. The second temporal interval is subsequent to the first temporal interval and is separated from the first temporal interval by a corresponding buffer interval. Further, the apparatus may transmit at least a portion of the output data to a computing system, which may generate notification data associated with the predicted likelihood, and provision the notification data to a device.
G06Q 10/04 - Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
The disclosed embodiments include computer-implemented apparatuses and processes that dynamically predict future occurrences of events using adaptively trained artificial-intelligence processes and contextual data. For example, an apparatus may generate an input dataset based on first interaction data and contextual data associated with a prior temporal interval, and may apply an adaptively trained, gradient-boosted, decision-tree process to the input dataset. Based on the application of the adaptively trained, gradient-boosted, decision-tree process to the input dataset, the apparatus may generate output data representative of a predicted likelihood of an occurrence of an event during a future temporal interval, which may be separated from the prior temporal interval by a corresponding buffer interval. The apparatus may also transmit a portion of the generated output data to a computing system, and the computing system may be configured to generate or modify second interaction data based on the portion of the output data.
G06Q 10/04 - Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
G06Q 40/02 - Banking, e.g. interest calculation or account maintenance
32.
PREDICTING OCCURRENCES OF TEMPORALLY SEPARATED EVENTS USING ADAPTIVELY TRAINED ARTIFICIAL-INTELLIGENCE PROCESSES
The disclosed embodiments include computer-implemented apparatuses and methods that predict occurrences of temporally separated events using adaptively trained artificial intelligence processes. For example, an apparatus may generate an input dataset based on first interaction data that characterizes an occurrence of a first event, and may apply a trained artificial intelligence process to the input dataset. Based on the application of the trained artificial intelligence process to the input dataset, the apparatus may generate output data representative of a predicted likelihood of an occurrence of a second event within a predetermined time period subsequent to the occurrence of the first event, and may transmit the output data to a computing system. The computing system may generate second interaction data specifying an operation associated with the occurrence of the first event based on the output data, and perform the operation in accordance with the second interaction data.
G06Q 10/04 - Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
G06Q 40/02 - Banking, e.g. interest calculation or account maintenance
33.
DYNAMIC ANALYSIS AND MONITORING OF MACHINE LEARNING PROCESSES
The disclosed embodiments include computer-implemented processes that flexibly and dynamically analyze a machine learning process, and that generate analytical output characterizing an operation of the machine learning process across multiple analytical periods. For example, an apparatus may receive an identifier of a dataset associated with the machine learning process and feature data that specifies an input feature of the machine learning process. The apparatus may access at least a portion of the dataset based on the received identifier, and obtain, from the accessed portion of the dataset, a feature vector associated with the machine learning process. The apparatus may generate a plurality of modified feature vectors based on the obtained feature vector, and based on an application of the machine learning process to the obtained and modified feature vectors, generate and transmit, to a device, first explainability data associated with the specified input feature for presentation within a digital interface.
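The perturbation-and-compare flow in this abstract (obtain a feature vector, generate modified variants, apply the model to both, and derive per-feature explainability data) can be sketched as follows. This is a minimal illustration, not the patented method; `explainability_score` and the toy model are hypothetical names introduced here.

```python
import random

def explainability_score(model, feature_vector, feature_index,
                         num_perturbations=50, seed=0):
    """Estimate one feature's influence by perturbing it and measuring the
    change in model output (hypothetical sketch of the modify-and-compare idea)."""
    rng = random.Random(seed)
    baseline = model(feature_vector)
    deltas = []
    for _ in range(num_perturbations):
        modified = list(feature_vector)
        # Generate a modified feature vector that differs only at the
        # specified input feature.
        modified[feature_index] += rng.uniform(-1.0, 1.0)
        deltas.append(abs(model(modified) - baseline))
    # Average output change serves as simple explainability data.
    return sum(deltas) / num_perturbations

# Toy model: output depends twice as strongly on feature 0 as on feature 1.
model = lambda x: 2.0 * x[0] + 1.0 * x[1]
score_0 = explainability_score(model, [0.5, 0.5], 0)
score_1 = explainability_score(model, [0.5, 0.5], 1)
```

With a shared random seed the two scores use identical perturbations, so the more influential feature scores exactly twice as high here.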
A recommendation system uses a trained two-headed attention fused autoencoder to generate likelihood scores indicating a likelihood that a user will interact with a content item if that content item is suggested or otherwise presented to the user. The autoencoder is trained to jointly learn features from two sets of training data, including user review data and implicit feedback data. One or more fusion stages generate a set of fused feature representations that include aggregated information from both the user reviews and user preferences. The fused feature representations are inputted into a preference decoder for making predictions by generating a set of likelihood scores. The system may train the autoencoder by including an additional NCE decoder that further helps with reducing popularity bias. The trained parameters are stored and used in a deployment process for making predictions, where only the reconstruction results from the preference decoder are used as predictions.
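A single fusion stage of the kind described above, combining a review-derived feature vector with an implicit-feedback feature vector into one fused representation, might look like the following. This is an illustrative sketch only; the blend weight `alpha` and the function name are assumptions, and the patented autoencoder would learn its fusion jointly rather than use a fixed average.

```python
def fuse(review_features, feedback_features, alpha=0.5):
    """Hypothetical fusion stage: blend review features with implicit-feedback
    features element-wise into a single fused feature representation."""
    assert len(review_features) == len(feedback_features)
    return [alpha * r + (1 - alpha) * f
            for r, f in zip(review_features, feedback_features)]

# Fuse a review feature vector with an implicit-feedback feature vector.
fused = fuse([1.0, 0.0, 2.0], [0.0, 1.0, 2.0])
```

The fused vector carries aggregated information from both inputs and would feed the preference decoder in the described architecture.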
An online system trains a transformer architecture with an initialization method that allows the transformer architecture to be trained without normalization layers or learning rate warmup, resulting in significant improvements in computational efficiency for transformer architectures. Specifically, an attention block included in an encoder or a decoder of the transformer architecture generates a set of attention representations by applying a key matrix to the input key, a query matrix to the input query, and a value matrix to the input value to generate an output, and applying an output matrix to the output to generate the set of attention representations. The initialization method may be performed by scaling the parameters of the value matrix and the output matrix by a factor that is inverse to the number of encoders or the number of decoders.
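The scaled initialization this abstract describes can be sketched as below: key and query matrices get a standard initialization, while the value and output matrices are scaled down by a factor depending on the layer count. The exact factor (here `1/num_layers`) and the uniform initialization scheme are assumptions for illustration.

```python
import math
import random

def init_attention_params(d_model, num_layers, seed=0):
    """Initialize attention matrices; the value and output matrices are scaled
    by a factor inverse to the number of layers (illustrative sketch)."""
    rng = random.Random(seed)

    def matrix(scale=1.0):
        # Uniform initialization in [-scale/sqrt(d_model), +scale/sqrt(d_model)].
        bound = scale / math.sqrt(d_model)
        return [[rng.uniform(-bound, bound) for _ in range(d_model)]
                for _ in range(d_model)]

    scale = 1.0 / num_layers  # assumed per-layer scaling factor
    return {
        "key": matrix(),
        "query": matrix(),
        "value": matrix(scale),   # scaled down at initialization
        "output": matrix(scale),  # scaled down at initialization
    }

params = init_attention_params(d_model=8, num_layers=4)
```

Shrinking the value and output matrices at initialization limits how much each residual branch perturbs the signal, which is the property that lets such schemes drop normalization layers and warmup.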
A content retrieval system uses a graph neural network architecture to determine images relevant to an image designated in a query. The graph neural network learns a new descriptor space that can be used to map images in the repository to image descriptors and the query image to a query descriptor. The image descriptors characterize the images in the repository as vectors in the descriptor space, and the query descriptor characterizes the query image as a vector in the descriptor space. The content retrieval system obtains the query result by identifying a set of relevant images whose image descriptors have a similarity with the query descriptor above a similarity threshold.
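The final retrieval step (keep every image whose descriptor exceeds a similarity threshold with the query descriptor) reduces to a simple filter once descriptors exist. A minimal sketch, assuming cosine similarity as the comparison; in the described system the descriptors would be produced by the graph neural network rather than written by hand:

```python
import math

def cosine(a, b):
    """Cosine similarity between two descriptor vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def retrieve(query_descriptor, image_descriptors, threshold=0.8):
    """Return ids of images whose descriptor similarity to the query
    descriptor is above the threshold (illustrative only)."""
    return [image_id for image_id, desc in image_descriptors.items()
            if cosine(query_descriptor, desc) > threshold]

descriptors = {
    "img_a": [1.0, 0.0],   # identical direction to the query
    "img_b": [0.9, 0.1],   # nearly aligned with the query
    "img_c": [0.0, 1.0],   # orthogonal to the query
}
result = retrieve([1.0, 0.0], descriptors)
```

Only the two descriptors aligned with the query pass the 0.8 threshold.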
A modeling system trains a recurrent machine-learned model by determining a latent distribution and a prior distribution for a latent state. The parameters of the model are trained based on a divergence loss that penalizes significant deviations between the latent distribution and the prior distribution. The latent distribution for a current observation is a distribution for the latent state given a value of the current observation and the latent state for the previous observation. The prior distribution for a current observation is a distribution for the latent state given the latent state for the previous observation independent of the value of the current observation, and represents a belief about the latent state before input evidence is taken into account.
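A common concrete choice for such a divergence loss is the KL divergence between Gaussian latent and prior distributions; the abstract does not specify the divergence, so the closed-form univariate Gaussian KL below is an assumed illustration.

```python
import math

def gaussian_kl(mu_q, sigma_q, mu_p, sigma_p):
    """KL(q || p) between univariate Gaussians: a divergence loss penalizing
    deviation of the latent distribution q from the prior distribution p."""
    return (math.log(sigma_p / sigma_q)
            + (sigma_q ** 2 + (mu_q - mu_p) ** 2) / (2 * sigma_p ** 2)
            - 0.5)

# Identical distributions incur zero loss; a shifted mean incurs a positive loss.
zero_loss = gaussian_kl(0.0, 1.0, 0.0, 1.0)
shifted_loss = gaussian_kl(1.5, 1.0, 0.0, 1.0)
```

The loss grows with the squared mean shift, so the trained model is discouraged from letting the observation-conditioned latent distribution stray far from the prior belief.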
An image retrieval system receives an image for which to identify relevant images from an image repository. Relevant images may depict the same environment or object and share features and other characteristics. Images in the repository are represented in an image retrieval graph by a set of image nodes connected by edges to other related image nodes, with edge weights representing the similarity of the nodes to each other. Based on the received image, the image retrieval system identifies an image in the image retrieval graph and alternately explores and traverses (also termed "exploits") the image nodes based on the edge weights. In the exploration step, image nodes in an exploration set are evaluated to identify connected nodes that are added to a traversal set of image nodes. In the traversal step, the relevant nodes in the traversal set are added to the exploration set and a query result set.
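The alternating explore/traverse loop over a weighted image graph can be sketched as follows. This is a simplified toy (one node promoted per traversal step, a plain dict as the graph); the patented system's selection criteria may differ.

```python
import heapq

def retrieve(graph, start, k=3):
    """Alternate exploration (gather neighbors of the frontier) and traversal
    (promote the strongest candidate into the result set). `graph` maps a
    node id to a list of (neighbor, edge_weight) pairs. Sketch only."""
    result = [start]
    exploration = {start}
    visited = {start}
    while len(result) < k and exploration:
        # Exploration step: evaluate frontier nodes, collecting weighted
        # candidates from their connected, unvisited neighbors.
        candidates = []
        for node in exploration:
            for neighbor, weight in graph.get(node, []):
                if neighbor not in visited:
                    heapq.heappush(candidates, (-weight, neighbor))
        # Traversal step: move the best candidate into the result set
        # and make it the next exploration frontier.
        exploration = set()
        if candidates:
            _, best = heapq.heappop(candidates)
            visited.add(best)
            result.append(best)
            exploration.add(best)
    return result

graph = {
    "q": [("a", 0.9), ("b", 0.4)],
    "a": [("c", 0.8), ("q", 0.9)],
    "c": [("d", 0.7)],
}
hits = retrieve(graph, "q", k=3)
```

Starting from node "q", the loop follows the highest-weight edges outward, yielding the chain of most similar images rather than just the query's immediate neighbors.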
A recommendation system models unknown preferences as samples from a noise distribution to generate recommendations for an online system. Specifically, the recommendation system obtains latent user and item representations from preference information that are representations of users and items in a lower-dimensional latent space. A recommendation for a user and item with an unknown preference can be generated by combining the latent representation for the user with the latent representation for the item. The latent user and item representations are learned to discriminate between observed interactions and unobserved noise samples in the preference information by increasing estimated predictions for known preferences in the ratings matrix, and decreasing estimated predictions for unobserved preferences sampled from the noise distribution.
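The discriminative objective described above, raising predicted scores for observed interactions while lowering them for sampled noise items, resembles a noise-contrastive loss over dot products of latent vectors. The sketch below is one plausible form, assumed for illustration; the patented system's exact objective and noise distribution are not specified here.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def nce_loss(user_vec, observed_item, noise_items):
    """Noise-contrastive sketch: reward a high score for the observed
    (user, item) interaction and low scores for sampled noise items."""
    sigmoid = lambda x: 1.0 / (1.0 + math.exp(-x))
    # Observed interaction: maximize its predicted probability.
    loss = -math.log(sigmoid(dot(user_vec, observed_item)))
    # Noise samples: minimize their predicted probabilities.
    for item in noise_items:
        loss -= math.log(1.0 - sigmoid(dot(user_vec, item)))
    return loss

# A user vector aligned with the observed item yields a lower loss
# than one aligned with the noise item.
good = nce_loss([1.0, 0.0], [1.0, 0.0], [[-1.0, 0.0]])
bad = nce_loss([-1.0, 0.0], [1.0, 0.0], [[-1.0, 0.0]])
```

Gradient descent on this loss pushes the latent user and item representations to discriminate observed interactions from noise, as the abstract describes.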
H04N 21/466 - Learning process for intelligent management, e.g. learning user preferences for recommending movies
G06Q 30/06 - Buying, selling or leasing transactions
H04N 21/258 - Client or end-user data management, e.g. managing client capabilities, user preferences or demographics or processing of multiple end-users preferences to derive collaborative data
H04N 21/482 - End-user interface for program selection
40.
LEARNING DOCUMENT EMBEDDINGS WITH CONVOLUTIONAL NEURAL NETWORK ARCHITECTURES
A document analysis system trains a document embedding model configured to receive a set of word embeddings for an ordered set of words in a document and generate a document embedding for the document. The document embedding is a representation of the document in a latent space that characterizes the document with respect to properties such as structure, content, and sentiment. The document embedding may represent a prediction of a set of words that follow the last word in the ordered set of words of the document. The document embedding model may be associated with a convolutional neural network (CNN) architecture that includes one or more convolutional layers. The CNN architecture of the document embedding model allows the document analysis system to overcome various difficulties of existing document embedding models, and allows the document analysis system to easily process variable-length documents that include a variable number of words.
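The variable-length property comes from convolving over word positions and pooling the results into a fixed-size vector. A toy sketch, with an assumed fixed kernel and one output channel per embedding dimension rather than learned filters:

```python
def conv1d_embed(word_embeddings, kernel):
    """Toy document embedding: slide a 1-D kernel over the ordered word
    embeddings, then max-pool over positions so any document length yields
    a fixed-size vector (illustrative of the CNN approach)."""
    width = len(kernel)
    dim = len(word_embeddings[0])
    feature_maps = []
    for start in range(len(word_embeddings) - width + 1):
        window = word_embeddings[start:start + width]
        # One convolution output per embedding dimension, for simplicity.
        feature_maps.append([
            sum(kernel[i] * window[i][d] for i in range(width))
            for d in range(dim)
        ])
    # Max-pooling over positions produces the document embedding.
    return [max(fm[d] for fm in feature_maps) for d in range(dim)]

doc_short = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
doc_long = doc_short + [[0.5, 0.5], [0.0, 0.0]]
emb_short = conv1d_embed(doc_short, kernel=[0.5, 0.5])
emb_long = conv1d_embed(doc_long, kernel=[0.5, 0.5])
```

Both documents, despite different word counts, map to embeddings of the same dimensionality, which is the property the abstract highlights.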
A recommendation system generates recommendations for an online system using one or more neural network models that predict preferences of users for items in the online system. The neural network models generate a latent representation of a user and of an item that can be combined to determine the expected preference of the user for the item. By using neural network models, the recommendation system can generate predictions in real-time for new users and items without the need to re-calibrate the models. Moreover, the recommendation system can easily incorporate forms of information other than preference information to generate improved preference predictions by including the additional information to generate the latent description of the user or item.
G06Q 30/02 - Marketing; Price estimation or determination; Fundraising
G06F 19/00 - Digital computing or data processing equipment or methods, specially adapted for specific applications (specially adapted for specific functions G06F 17/00; data processing systems or methods specially adapted for administrative, commercial, financial, managerial, supervisory or forecasting purposes G06Q; healthcare informatics G16H)