Methods and apparatuses for encoding/decoding an image or a video using neural networks are disclosed. In some embodiments, side-information that allows for adapting a first neural-network-based decoder is decoded from a bitstream. The decoded side-information and coded data representative of an image or a video, obtained from the bitstream or a separate bitstream, are provided as inputs to the first neural-network-based decoder, and a reconstructed image or video is obtained from an output of the first neural-network-based decoder.
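As a hedged illustration of the side-information idea, the sketch below decodes a per-channel scale and bias as side-information and uses them to modulate the latent fed to a decoder. All names and the scale/bias form are assumptions for illustration, not details from the disclosure:

```python
def adapt_latent(latent, scale, bias):
    """FiLM-style modulation: a per-channel scale and bias, decoded as
    side-information, adapt the latent fed to a neural-network-based
    decoder (hypothetical sketch; names are not from the disclosure)."""
    return [[[v * s + b for v in row] for row in channel]
            for channel, s, b in zip(latent, scale, bias)]

# toy latent: 2 channels of 4x4 samples, all ones
latent = [[[1.0] * 4 for _ in range(4)] for _ in range(2)]
adapted = adapt_latent(latent, scale=[2.0, 0.5], bias=[0.0, 1.0])
```

The modulated latent would then pass through the (unchanged) decoder network, so a single trained decoder can be adapted per stream without retransmitting weights.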
Video encoding and decoding is implemented with autoencoders using luminance information to derive motion information for chrominance prediction. In one embodiment, YUV 4:2:0 video is encoded and decoded in which luminance information is downsampled to generate predictions from chrominance components of a reference frame. In a related embodiment, more than one reference frame is used for predictions. In another embodiment, convolutions and transpose convolutions implement the derivation of motion information.
H04N 19/537 - Motion estimation other than block-based
H04N 19/573 - Motion compensation with multiple frame prediction using two or more reference frames in a given prediction direction
H04N 19/58 - Motion compensation with long-term prediction, i.e. the reference frame for a current frame not being the temporally closest one
H04N 19/82 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals - Details of filtering operations specially adapted for video compression, e.g. for pixel interpolation involving filtering within a prediction loop
H04N 19/132 - Sampling, masking or truncation of coding units, e.g. adaptive resampling, frame skipping, frame interpolation or high-frequency transform coefficient masking
H04N 19/177 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being a group of pictures [GOP]
H04N 19/186 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being a colour or a chrominance component
H04N 19/52 - Processing of motion vectors by encoding by predictive encoding
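A minimal sketch of the luma-to-chroma motion derivation for 4:2:0 video, assuming the luma motion field is simply subsampled and rescaled to chroma resolution (the function name and the nearest-sample subsampling are illustrative assumptions; the disclosure describes learned convolutions/transpose convolutions):

```python
def luma_flow_to_chroma(flow_y):
    """Derive chroma motion from luma motion for YUV 4:2:0: chroma planes
    are half resolution in each dimension, so the luma flow is subsampled
    2x horizontally and vertically and each displacement is halved."""
    return [[[v * 0.5 for v in row[::2]] for row in comp[::2]]
            for comp in flow_y]

# uniform 4-sample luma motion over an 8x8 field (2 components: dx, dy)
flow_y = [[[4.0] * 8 for _ in range(8)] for _ in range(2)]
flow_c = luma_flow_to_chroma(flow_y)
```

The halving reflects that one chroma sample spans two luma samples, so the same physical displacement is half as many chroma samples.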
3.
TEMPORAL STRUCTURE-BASED CONDITIONAL CONVOLUTIONAL NEURAL NETWORKS FOR VIDEO COMPRESSION
Video encoding and decoding is implemented with autoencoders using luminance information to derive motion information for chrominance prediction. In one embodiment, conditional convolutions are used to encode motion flow information. A current condition, for example, GOP structure, is used as input to a succession of fully connected layers to implement the conditional convolution. In a related embodiment, more than one reference frame is used to encode motion flow information.
H04N 19/537 - Motion estimation other than block-based
H04N 19/573 - Motion compensation with multiple frame prediction using two or more reference frames in a given prediction direction
H04N 19/58 - Motion compensation with long-term prediction, i.e. the reference frame for a current frame not being the temporally closest one
H04N 19/82 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals - Details of filtering operations specially adapted for video compression, e.g. for pixel interpolation involving filtering within a prediction loop
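A hedged sketch of the conditional-convolution idea: a coding condition (here assumed to be a one-hot GOP-structure vector) passes through fully connected layers to produce one gain per feature channel, which then modulates the convolutional feature maps. The tiny weights and the gain-only modulation are illustrative assumptions:

```python
def conditional_modulate(feat, cond, w1, w2):
    """Map a condition vector through two fully connected layers (FC + ReLU,
    then FC) to per-channel gains, and scale each feature channel by its
    gain -- a minimal stand-in for a conditional convolution."""
    h = [max(sum(c * w for c, w in zip(cond, col)), 0.0)   # FC layer + ReLU
         for col in zip(*w1)]
    gain = [sum(hv * w for hv, w in zip(h, col))           # second FC layer
            for col in zip(*w2)]
    return [[[v * g for v in row] for row in ch] for ch, g in zip(feat, gain)]

cond = [1.0, 0.0]                  # one-hot GOP-structure condition (assumed)
w1 = [[1.0, 0.0], [0.0, 1.0]]      # first FC layer (2x2 identity for clarity)
w2 = [[2.0, 3.0], [0.0, 0.0]]      # maps hidden units to per-channel gains
feat = [[[1.0, 1.0], [1.0, 1.0]] for _ in range(2)]
out = conditional_modulate(feat, cond, w1, w2)
```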
4.
LEARNED VIDEO COMPRESSION FRAMEWORK FOR MULTIPLE MACHINE TASKS
Processing of a compressed representation of a video signal is optimized for multiple tasks, such as object detection, viewing of displayed video, or other machine tasks. In one embodiment, multiple analysis stages and a single synthesis stage are performed as part of a coding/decoding operation, with training of an encoder-side analysis and, optionally, a corresponding machine task. In another embodiment, multiple synthesis operations are performed on the decoding side, so that respective analysis, synthesis, and task stages are optimized. Other embodiments comprise feeding decoded feature maps to tasks, predictive coding, and using hyperprior-based models.
A processing module, or connector, adapts an output of a codec, or a decoded output, to a form suitable for an alternate task. In one embodiment, the output of a codec is used for a machine task and the connector adapts this output to a form suitable for a video display. In another embodiment, metadata accompanies the codec output, which can instruct the connector how to adapt the codec output for an alternate task. In other embodiments, the processing module performs averaging over an N×M window, or a convolution.
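The averaging variant of the connector can be sketched as a plain N×M average pooling over a decoded feature map (the function name and edge handling are illustrative assumptions):

```python
def connector_avg_pool(x, n, m):
    """Adapt a decoded 2-D output to another task by averaging over an
    N x M window; partial windows at the borders average fewer samples."""
    H, W = len(x), len(x[0])
    out = []
    for i in range(0, H, n):
        row = []
        for j in range(0, W, m):
            win = [x[a][b]
                   for a in range(i, min(i + n, H))
                   for b in range(j, min(j + m, W))]
            row.append(sum(win) / len(win))
        out.append(row)
    return out
```

For example, a 2×2 map pooled with a 2×2 window collapses to its mean, while a 1×2 window averages along rows only.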
A method and apparatus include receiving a timed‑metadata track identifying point cloud tiles corresponding to one or more spatial regions within a point cloud scene. A decoding device determines one or more point cloud tiles to be used for rendering an image. One or more geometry tile tracks are retrieved, via a communications network, corresponding to the determined one or more point cloud tiles. Each geometry tile track comprises point cloud geometry data for a respective tile. The retrieved geometry tile tracks are processed to render the image.
H04N 21/218 - Source of audio or video content, e.g. local disk arrays
H04N 21/44 - Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to MPEG-4 scene graphs
In an embodiment, an adaptive streaming client may be configured to receive a media presentation description (MPD) of a V3C content. The MPD may comprise a plurality of adaptation set elements, where different adaptation set elements may be associated with different spatial regions of the V3C content. The adaptive streaming client may be further configured to select an adaptation set element associated with a spatial region for requesting at least one media file corresponding to the selected adaptation set element.
H04N 19/597 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding specially adapted for multi-view video sequence encoding
H04N 19/70 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards
The disclosure relates, according to a first aspect, to a method for compressing data including encoding at least one information representative of a use, during the compression, of a compressed sparse format. The disclosure relates, according to a second aspect, to a method for decompressing input data comprising obtaining information representative of zero or non-zero values in at least a part of the input data, and using only the non-zero values of the zero or non-zero values for a further processing of the part of the input data, based on the representative information. Corresponding devices, system, non-transitory program product, computer storage medium and signal are also disclosed.
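The second aspect can be illustrated with a minimal significance-mask scheme: the compressed sparse format carries a zero/non-zero map plus only the non-zero values, and decompression touches only the signalled non-zero entries. The exact syntax is not specified in the abstract; the mask-plus-values layout below is an assumption:

```python
def encode_sparse(values):
    """Split data into a zero/non-zero significance mask and the list of
    non-zero values only (minimal sketch of a compressed sparse format)."""
    mask = [1 if v != 0 else 0 for v in values]
    nonzeros = [v for v in values if v != 0]
    return mask, nonzeros

def decode_sparse(mask, nonzeros):
    """Rebuild the data, processing only the non-zero values as signalled
    by the representative information (the mask)."""
    it = iter(nonzeros)
    return [next(it) if m else 0 for m in mask]

mask, nz = encode_sparse([0, 5, 0, 0, 7])
restored = decode_sparse(mask, nz)
```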
Apparatus and methods for implementing a real-time Versatile Video Coding (VVC) decoder use multiple threads to address the limitations of existing parallelization techniques and fully utilize the available CPU computation resources without compromising coding efficiency. The proposed multi-threaded (MT) framework uses CTU-level parallel processing techniques without compromising memory bandwidth. Picture-level parallel processing separates the sequence into temporal levels by considering each picture's referencing hierarchy. Embodiments are provided using various optimization techniques to achieve real-time VVC decoding on heterogeneous platforms with multi-core CPUs, for bitstreams generated using a VVC reference encoder with a default configuration.
H04N 19/436 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals - characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation using parallelised computational arrangements
H04N 19/91 - Entropy coding, e.g. variable length coding [VLC] or arithmetic coding
H04N 19/127 - Prioritisation of hardware or computational resources
H04N 19/157 - Assigned coding mode, i.e. the coding mode being predefined or preselected to be further used for selection of another element or parameter
Systems, methods, and instrumentalities are disclosed that relate to the processing of a media container file associated with 3D video data. The media container file may indicate that certain video-based point cloud compression (V-PCC) component tracks may be played together as a playout group. These V-PCC component tracks may represent respective encoded versions of one or more V-PCC components, and a video decoding device may play the tracks together in response to determining that the tracks belong to the same playout track group. The video decoding device may also determine from the media container file that certain PCC component tracks include tile groups that correspond to different objects in a point cloud or different parts of a same object in the point cloud. The video decoding device may decode these tile groups independently from each other so that a subset of the objects or parts of the point cloud may be accessed without also accessing the rest of the objects or parts.
H04N 19/70 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards
H04N 21/4402 - Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to MPEG-4 scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display
H04N 21/2343 - Processing of video elementary streams, e.g. splicing of video streams or manipulating MPEG-4 scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
An apparatus may be configured to determine a reference picture listed in a first reference picture list and a reference picture listed in a second reference picture list, for a coding block. The apparatus may be configured to determine whether to perform bi-directional optical flow (BDOF) for the coding block based at least in part on whether a distance between a picture associated with the coding block and the reference picture listed in the first reference picture list differs from a distance between the picture associated with the coding block and the reference picture listed in the second reference picture list. The apparatus may be configured to decode the coding block based on the determination of whether to perform BDOF for the coding block.
H04N 19/176 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock
H04N 19/577 - Motion compensation with bidirectional frame interpolation, i.e. using B-pictures
H04N 19/70 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards
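The described BDOF enablement check can be sketched with picture order counts (POCs). This hedged version enables BDOF only when the current picture lies between its two references at equal temporal distances; requiring the references to straddle the current picture is an assumption consistent with bi-directional optical flow, not a quoted condition:

```python
def bdof_enabled(poc_cur, poc_ref0, poc_ref1):
    """Enable BDOF only when the distance to the list-0 reference equals
    the distance to the list-1 reference (and one reference precedes and
    one follows the current picture)."""
    d0 = poc_cur - poc_ref0        # distance to list-0 (past) reference
    d1 = poc_ref1 - poc_cur        # distance to list-1 (future) reference
    return d0 > 0 and d1 > 0 and d0 == d1
```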
A filter may be applied to a subset of components associated with a sample in a coding block. The output of the filter may be used to modify values for other component(s). For example, a filter may be applied to a selected (for example, dominant) component(s). The output of the filter may be used to modify a value for one of the other components (for example, non-dominant components). The output of the filter may be used, for example, after a weighting factor is applied to the filter output, to modify a value for another one of the other components. A joint refinement signal may be obtained, for example, as the filtered output signal minus the filter input signal of the selected component(s). A properly weighted version of the joint refinement signal may be applied to modify the other components.
H04N 19/117 - Filters, e.g. for pre-processing or post-processing
H04N 19/136 - Incoming video signal characteristics or properties
H04N 19/176 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock
H04N 19/186 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being a colour or a chrominance component
H04N 19/70 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards
H04N 19/82 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals - Details of filtering operations specially adapted for video compression, e.g. for pixel interpolation involving filtering within a prediction loop
H04N 19/86 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression involving reduction of coding artifacts, e.g. of blockiness
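The joint refinement described above can be sketched on 1-D sample rows: the refinement signal is the filtered output minus the filter input of the selected (dominant) component, and a weighted copy modifies another component. Using a single scalar weight is an illustrative assumption:

```python
def cross_component_refine(dominant, dominant_filtered, other, weight):
    """Joint refinement signal = filtered output minus filter input of the
    selected component; a weighted version is added to another component."""
    refinement = [f - o for f, o in zip(dominant_filtered, dominant)]
    return [c + weight * r for c, r in zip(other, refinement)]

luma = [100, 100]
luma_filtered = [104, 96]      # hypothetical in-loop filter output
chroma = [50, 50]
chroma_refined = cross_component_refine(luma, luma_filtered, chroma, 0.5)
```

Here the luma filter changed samples by +4 and -4, so the chroma samples move by half of that, +2 and -2.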
Described herein are systems, methods, and instrumentalities associated with video coding. The signaling of certain syntax elements may be moved from a slice header to a picture header and/or a layer access unit delimiter (AUD). The dependency between AUD and one or more parameter sets may be explored. Syntax elements may be signaled to enable wrap-around motion compensation for certain sub-picture(s) and specify wrap-around motion compensation offsets for the sub-picture(s).
H04N 19/597 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding specially adapted for multi-view video sequence encoding
H04N 19/55 - Motion estimation with spatial constraints, e.g. at image or region borders
H04N 19/70 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards
H04N 19/176 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock
H04N 19/105 - Selection of the reference unit for prediction within a chosen coding or prediction mode, e.g. adaptive choice of position and number of pixels used for prediction
Systems, methods, and instrumentalities may be used for decoding and/or encoding a coding unit (CU). An intra-prediction mode for a CU may be determined. A split mode may be determined based on the intra-prediction mode, to generate a plurality of sub-partitions in the CU. A prediction for a first sub-partition of the plurality of sub-partitions in the CU may be based on a reference sample in a second sub-partition of the plurality of sub-partitions in the CU. The CU may be decoded and/or encoded, for example, based on the determined split mode.
H04N 19/119 - Adaptive subdivision aspects e.g. subdivision of a picture into rectangular or non-rectangular coding blocks
H04N 19/105 - Selection of the reference unit for prediction within a chosen coding or prediction mode, e.g. adaptive choice of position and number of pixels used for prediction
H04N 19/157 - Assigned coding mode, i.e. the coding mode being predefined or preselected to be further used for selection of another element or parameter
H04N 19/176 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock
H04N 19/70 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards
H04N 19/82 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals - Details of filtering operations specially adapted for video compression, e.g. for pixel interpolation involving filtering within a prediction loop
H04N 19/11 - Selection of coding mode or of prediction mode among a plurality of spatial predictive coding modes
15.
ADAPTIVE INTERPOLATION FILTER FOR MOTION COMPENSATION
A video processing apparatus may comprise one or more processors that are configured to determine an interpolation filter length for an interpolation filter associated with a coding unit (CU) based on a size of the CU. The one or more processors may be configured to determine an interpolated reference sample based on the determined interpolation filter length for the interpolation filter and a reference sample for the CU. The one or more processors may be configured to predict the CU based on the interpolated reference sample. For example, if a first CU has a size that is greater than the size of a second CU, the one or more processors may be configured to use a shorter interpolation filter for the first CU than for the second CU.
H04N 19/117 - Filters, e.g. for pre-processing or post-processing
H04N 19/157 - Assigned coding mode, i.e. the coding mode being predefined or preselected to be further used for selection of another element or parameter
H04N 19/176 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock
H04N 19/523 - Motion estimation or motion compensation with sub-pixel accuracy
H04N 19/82 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals - Details of filtering operations specially adapted for video compression, e.g. for pixel interpolation involving filtering within a prediction loop
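The size-to-length mapping can be sketched as a threshold rule. Following the example in the abstract, the larger CU gets the shorter filter here; the threshold (256 samples) and tap counts (4 vs. 8) are purely illustrative assumptions, not values from the disclosure:

```python
def interp_filter_taps(cu_width, cu_height, threshold=256, short=4, long=8):
    """Choose an interpolation-filter length from the CU size: CUs larger
    than the threshold use the shorter filter, per the described example."""
    return short if cu_width * cu_height > threshold else long
```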
16.
CLUSTERING-BASED QUANTIZATION FOR NEURAL NETWORK COMPRESSION
Systems, methods, and instrumentalities are disclosed for clustering-based quantization for neural network (NN) compression. A distribution of weights in weight tensors in NN layers may be analyzed to identify cluster outliers. Cluster inliers may be coded separately from cluster outliers, for example, using scalar and/or vector quantization. Weight-rearrangement may rearrange weights for higher dimensional weight tensors into lower dimensional matrices. For example, weight rearrangement may flatten a convolutional kernel into a vector. Correlation between kernels may be preserved, for example, by treating a filter or kernels across a channel as a point. A tensor may be split into multiple subspaces, for example, along an input and/or an output channel. Predictive coding may be performed for a current block of weights or weight matrix based on a reshaped or previously coded block or matrix. Arrangement, inlier, outlier, and/or prediction information may be signaled to a decoder for reconstruction of a compressed NN.
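The inlier/outlier analysis can be illustrated with a simple stand-in: flag weights far from the mean as outliers so the two groups can be coded with different quantizers. The k-sigma rule below is an assumption for illustration; the disclosure describes a clustering-based analysis:

```python
import statistics

def split_inliers_outliers(weights, k=2.0):
    """Separate weight values into inliers and outliers by distance from
    the mean (illustrative stand-in for the clustering analysis)."""
    mu = statistics.mean(weights)
    sd = statistics.pstdev(weights)
    inliers = [w for w in weights if abs(w - mu) <= k * sd]
    outliers = [w for w in weights if abs(w - mu) > k * sd]
    return inliers, outliers

w = [0.1, -0.1, 0.05, 0.0, -0.05, 0.2, -0.2, 0.0, 0.1, 8.0]
inl, outl = split_inliers_outliers(w)
```

Inliers would then go through scalar or vector quantization tuned to their narrow range, while the few outliers are coded separately at higher fidelity.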
A method of encoding or decoding a video comprising a current picture, a first reference picture, and a weight tensor associated with a trained neural network (NN) model is provided. The method includes generating any number of kernel tensors, input channels and output channels associated with the weight tensor, each kernel tensor being associated with any of: a layer type, an input signal type, and a tree partition type, and each kernel tensor including weight coefficients, generating, for each of the any number of kernel tensors, tree partitions for any of a coding tree unit (CTU), a coding unit (CU), a prediction unit (PU), and a transform unit (TU) according to respective tree partition types associated with each of the any number of kernel tensors, and generating a compressed representation of the trained NN model by compressing and coding the any number of kernel tensors.
A media content processing device may decode visual volumetric content based on one or more messages, which may indicate which attribute sub-bitstream of one or more attribute sub-bitstreams indicated in a parameter set is active. The parameter set may include a visual volumetric video-based parameter set. The message indicating one or more active attribute sub-bitstreams may be received by the decoder. A decoder may perform decoding, such as determining which attribute sub-bitstream to use for decoding visual media content, based on the one or more messages. The one or more messages may be generated and sent to a decoder, for example, to indicate the deactivation of the one or more attribute sub-bitstreams. The decoder may determine an inactive attribute sub-bitstream and skip the inactive attribute sub-bitstream for decoding the visual media content based on the one or more messages.
H04N 19/597 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding specially adapted for multi-view video sequence encoding
H04N 19/70 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards
H04N 21/235 - Processing of additional data, e.g. scrambling of additional data or processing content descriptors
Systems and methods are described for refining motion compensated predictions in block-based video coding. In an example embodiment, motion-compensated prediction is used to generate predicted sample values in a current block of samples. A precision difference value and a motion vector refinement for the current block are signaled in the bitstream. For each sample in the current block, a spatial gradient is calculated at the sample, and a scalar product is calculated between the spatial gradient and the motion vector refinement. The scalar product is scaled (e.g. bit-shifted) by an amount indicated by the precision difference value to generate a sample difference value, and the sample difference value is added to the predicted sample value to generate a refined sample value.
H04N 19/105 - Selection of the reference unit for prediction within a chosen coding or prediction mode, e.g. adaptive choice of position and number of pixels used for prediction
H04N 19/147 - Data rate or code amount at the encoder output according to rate distortion criteria
H04N 19/182 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being a pixel
H04N 19/523 - Motion estimation or motion compensation with sub-pixel accuracy
H04N 19/70 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards
H04N 19/54 - Motion estimation other than block-based using feature points or meshes
H04N 19/577 - Motion compensation with bidirectional frame interpolation, i.e. using B-pictures
H04N 19/567 - Motion estimation based on rate distortion criteria
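The per-sample refinement described above reduces to a scalar product and a shift. This integer sketch assumes the gradient, refinement, and precision values are already available in compatible fixed-point units (names are illustrative):

```python
def refine_sample(pred, grad_x, grad_y, mvr_x, mvr_y, precision_shift):
    """Refine a motion-compensated sample: scalar product of the spatial
    gradient and the signalled motion-vector refinement, bit-shifted by
    the signalled precision difference, then added to the prediction."""
    dot = grad_x * mvr_x + grad_y * mvr_y
    return pred + (dot >> precision_shift)
```

For example, with gradient (4, 2), refinement (3, 1), and a shift of 1, the scalar product is 14, the scaled difference is 7, and a prediction of 100 becomes 107.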
20.
BLOCK BOUNDARY PREDICTION REFINEMENT WITH OPTICAL FLOW
Systems, methods, and instrumentalities are disclosed for sub-block/block refinement, including sub-block/block boundary refinement, such as block boundary prediction refinement with optical flow (BBPROF). A block comprising a current sub-block may be decoded based on a sample value for a first pixel that is obtained based on, for example, an MV for a current sub-block, an MV for a sub-block adjacent the current sub-block, and a sample value for a second pixel adjacent the first pixel. BBPROF may include determining spatial gradients at pixel(s)/sample location(s). An MV difference may be calculated between a current sub-block and one or more neighboring sub-blocks. An MV offset may be determined at pixel(s)/sample location(s) based on the MV difference. A sample value offset for the pixel in a current sub-block may be determined. The prediction for a reference picture list may be refined by adding the calculated sample value offset to the sub-block prediction.
H04N 19/583 - Motion compensation with overlapping blocks
H04N 19/82 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals - Details of filtering operations specially adapted for video compression, e.g. for pixel interpolation involving filtering within a prediction loop
H04N 19/86 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression involving reduction of coding artifacts, e.g. of blockiness
H04N 19/176 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock
Systems, methods, and instrumentalities are disclosed herein that relate to video-based point cloud streams in one or more ISO Base Media File Format (ISOBMFF) container files. A container format for point cloud data is provided, and the container format indicates at least a relationship between a 3D region of the point cloud and one or more video-based point cloud compression (V-PCC) tracks. The V-PCC tracks may be grouped together and linked to the 3D region to allow spatial access to the 3D region.
H04N 19/70 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards
H04N 21/4402 - Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to MPEG-4 scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display
22.
CONTENT ADAPTIVE TRANSFORM PRECISION FOR VIDEO CODING
Systems, methods, and instrumentalities are disclosed for obtaining coded video data comprising quantized transform coefficients for a plurality of blocks, obtaining a first precision factor associated with a first block for performing at least one decoding function on the first block, obtaining a second precision factor associated with a second block for performing the at least one decoding function on the second block, and performing the at least one decoding function on the quantized transform coefficients for the first block using the first precision factor and on the quantized transform coefficients for the second block using the second precision factor.
H04N 19/126 - Quantisation - Details of normalisation or weighting functions, e.g. normalisation matrices or variable uniform quantisers
H04N 19/14 - Coding unit complexity, e.g. amount of activity or edge presence estimation
H04N 19/18 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being a set of transform coefficients
H04N 19/60 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
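The per-block precision factor can be sketched as fixed-point dequantization where each block carries its own number of fractional bits. The fixed-point representation below is an assumption; the point is that two blocks may apply the same decoding function at different precisions:

```python
def dequantize_block(coeffs, qstep_fp, prec_bits):
    """Inverse-scale quantized coefficients with a block-specific precision
    factor: qstep_fp is the quantization step in fixed point with prec_bits
    fractional bits; results are rounded back to integers."""
    half = 1 << (prec_bits - 1)          # rounding offset
    return [(c * qstep_fp + half) >> prec_bits for c in coeffs]

# the same nominal step (1.5) represented at two per-block precisions
block_a = dequantize_block([4, -4], qstep_fp=12, prec_bits=3)   # 1.5 * 2^3
block_b = dequantize_block([4, -4], qstep_fp=48, prec_bits=5)   # 1.5 * 2^5
```

A higher-precision block keeps more fractional accuracy in the intermediate product before rounding, at the cost of wider arithmetic.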
23.
METHODS AND APPARATUS FOR PREDICTION REFINEMENT FOR DECODER SIDE MOTION VECTOR REFINEMENT WITH OPTICAL FLOW
Methods, devices, apparatus, systems, architectures and interfaces to improve motion vector (MV) refinement-based sub-block (SB) level motion compensated prediction are provided. A decoding method includes receiving a bitstream of encoded video data, the bitstream including at least one block of video data including a plurality of SBs; performing an MV derivation, including a decoder-side motion vector refinement (DMVR) process, for at least one SB in the block to generate a refined MV for each SB; performing SB-based motion compensation on the at least one sub-block to generate an SB-based prediction within each SB; obtaining a spatial gradient for the prediction within each SB; determining an MV offset for each pixel in each SB; obtaining an intensity change in each SB based on the spatial gradients and MV offsets via an optical flow equation; and refining the prediction within each SB based on the obtained intensity changes.
H04N 19/44 - Decoders specially adapted therefor, e.g. video decoders which are asymmetric with respect to the encoder
H04N 19/136 - Incoming video signal characteristics or properties
H04N 19/176 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock
H04N 19/109 - Selection of coding mode or of prediction mode among a plurality of temporal predictive coding modes
24.
INTER PREDICTION MEMORY ACCESS BANDWIDTH REDUCTION METHOD WITH OPTICAL FLOW COMPENSATION
Systems and methods are described for video coding. In some embodiments, inter prediction of a sample in a current block is performed by rounding an initial motion vector and determining a rounding error vector caused by the rounding. An unrefined prediction of the sample is generated using the rounded motion vector. Unrefined predictions are similarly generated for other samples in the current block. Based on the unrefined predictions, a spatial gradient is determined for each sample position in the block. A refined prediction is generated for each sample position by adding, to the unrefined prediction, a scalar product between the spatial gradient and the rounding error vector at the sample position. Example methods can reduce the number of reference pixels used to predict a current block and thus may reduce memory access bandwidth.
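The rounding-compensation step can be sketched per sample: round the motion vector to integer precision (so fewer reference pixels are fetched), then recover the dropped fractional motion via the gradient and the rounding-error vector. The first-order gradient correction mirrors the description; the concrete numbers are illustrative:

```python
def predict_with_rounding_compensation(pred_at_rounded_mv, grad_x, grad_y,
                                       mv_x, mv_y):
    """Refine a prediction made with a rounded MV by adding the scalar
    product of the spatial gradient and the rounding-error vector."""
    err_x = mv_x - round(mv_x)       # fractional part lost by rounding
    err_y = mv_y - round(mv_y)
    return pred_at_rounded_mv + grad_x * err_x + grad_y * err_y

# sample predicted at the rounded MV, then compensated for the rounding
refined = predict_with_rounding_compensation(100.0, 8.0, 4.0, 2.25, -1.5)
```

Because prediction uses only integer-pel fetches, no sub-pel interpolation window around the block is needed, which is the source of the bandwidth saving.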
Systems and methods are described for video coding using adaptive Hadamard filtering of reconstructed blocks, such as coding units. In some embodiments, where Hadamard filtering might otherwise encompass samples outside the current coding unit, extrapolated samples are generated for use in the filtering. Reconstructed samples from neighboring blocks may be used in the filtering where available (e.g. in a line buffer). In some embodiments, different filter strengths are applied to different spectrum components in the transform domain. In some embodiments, filter strength is based on position of filtered samples within the block. In some embodiments, filter strength is based on the prediction mode used to code the current block.
H04N 19/82 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals - Details of filtering operations specially adapted for video compression, e.g. for pixel interpolation involving filtering within a prediction loop
H04N 19/48 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using compressed domain processing techniques other than decoding, e.g. modification of transform coefficients, variable length coding [VLC] data or run-length data
H04N 19/117 - Filters, e.g. for pre-processing or post-processing
H04N 19/60 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
H04N 19/50 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
H04N 19/18 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being a set of transform coefficients
H04N 19/182 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being a pixel
H04N 19/157 - Assigned coding mode, i.e. the coding mode being predefined or preselected to be further used for selection of another element or parameter
Intra sub-partitions (ISP) may be enabled for a current block, for example, based on an ISP indication. The block may be partitioned into multiple sub-partitions, and a sub-partition may belong to a prediction unit (PU). A sub-partition width for the current block and a minimum prediction block width may be obtained. A PU corresponding to a current sub-partition may be determined based on the sub-partition width and the minimum prediction block width. For example, when the sub-partition width is less than the minimum prediction block width, the PU may include multiple sub-partitions. In examples, the minimum prediction block width may be four samples. Reference samples may be determined, and the PU may be predicted using the reference samples.
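The grouping rule described above (sub-partitions narrower than the minimum prediction block width share one PU) can be sketched as follows; the function name and the simple grouping arithmetic are assumptions for illustration:

```python
def sub_partitions_per_pu(sub_partition_width, min_pred_block_width=4):
    """Return how many ISP sub-partitions form one prediction unit (PU).

    Sketch of the described rule: when the sub-partition width is less
    than the minimum prediction block width (four samples in the
    examples), several sub-partitions are grouped into a single PU.
    """
    if sub_partition_width >= min_pred_block_width:
        return 1  # each sub-partition is its own PU
    # Group enough sub-partitions to reach the minimum PU width.
    return min_pred_block_width // sub_partition_width
```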
H04N 19/11 - Selection of coding mode or of prediction mode among a plurality of spatial predictive coding modes
H04N 19/593 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving spatial prediction techniques
H04N 19/105 - Selection of the reference unit for prediction within a chosen coding or prediction mode, e.g. adaptive choice of position and number of pixels used for prediction
H04N 19/176 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock
27.
METHODS AND APPARATUS FOR ADAPTIVE STREAMING OF POINT CLOUDS
Methods, apparatus, and systems directed to adaptive streaming of V-PCC (Video-based Point Cloud Compression) data using an adaptive HTTP streaming protocol, such as MPEG DASH, are provided. A method includes signaling the data of the point cloud in a DASH MPD including: a main AdaptationSet for the point cloud, including at least (1) a @codecs attribute that is set to a unique value signifying that the corresponding AdaptationSet corresponds to V-PCC data and (2) an initialization segment containing at least one V-PCC sequence parameter set for a representation of the point cloud; and a plurality of component AdaptationSets, each corresponding to one of the V-PCC components and including at least (1) a VPCCComponent descriptor identifying a type of the corresponding V-PCC component and (2) at least one property of the V-PCC component; and transmitting the DASH bitstream over the network.
H04N 19/70 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards
28.
METHODS AND APPARATUS FOR SUB-PICTURE ADAPTIVE RESOLUTION CHANGE
Methods and apparatus relating to picture and video coding in communication systems are provided. Included therein is a method comprising determining one or more layers associated with a parameter set, generating a syntax element including an indication indicating whether the one or more layers associated with the parameter set are independently coded, and generating a message including the syntax element.
H04N 19/59 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving spatial sub-sampling or interpolation, e.g. alteration of picture size or resolution
H04N 19/597 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding specially adapted for multi-view video sequence encoding
H04N 19/70 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards
Systems and methods described herein employ a high-level syntax design that supports a sub-picture extraction and reposition process. An input video may be encoded into multiple representations, each representation may be represented as a layer. A layer picture may be partitioned into multiple sub-pictures. Each sub-picture may have its own tile partitioning, resolution, color format and bit depth. Each sub-picture is encoded independently from other sub-pictures of the same layer, but it may be inter-predicted from the corresponding sub-pictures from its dependent layers. Each sub-picture may refer to a sub-picture parameter set where sub-picture properties such as resolution and coordinates are signaled. Each sub-picture parameter set may refer to a PPS where the resolution of the entire picture is signaled.
H04N 19/70 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards
Systems, devices, and methods are described herein for symmetric merge mode motion vector coding. Symmetric bi-prediction (bi-pred) motion vectors (MVs) may be constructed from available candidates in a merge candidate list for regular inter prediction merge mode and/or affine prediction merge mode. Available MV merge candidates may be symmetrically extended or mapped in either direction (e.g., between reference pictures before and after a current picture), for example, when coding a picture that allows bi-directional motion compensation prediction (MCP). A symmetric bi-pred merge candidate may be selected among merge candidates for predicting the motion information of a current prediction unit (PU). The symmetric mapping construction may be repeated by a decoder (e.g., based on a coded index of the MV merge candidate list), for example, to obtain the same merge candidates and coded MV at an encoder.
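The symmetric extension described above can be sketched as a POC-distance-based mirroring of a uni-directional merge candidate into a bi-prediction pair; the linear scaling and the function name are illustrative assumptions, not the exact construction of the abstract:

```python
def mirror_mv(mv, poc_cur, poc_ref0, poc_ref1):
    """Symmetrically extend a uni-directional MV into a bi-pred pair.

    A merge candidate's MV pointing at reference picture ref0 is linearly
    scaled toward the opposite direction (ref1) based on picture order
    count (POC) distances, mirroring the candidate around the current
    picture.
    """
    d0 = poc_cur - poc_ref0          # signed temporal distance to ref0
    d1 = poc_cur - poc_ref1          # signed temporal distance to ref1
    scale = d1 / d0                  # mirror/scale by the POC distance ratio
    mvx, mvy = mv
    return (mvx, mvy), (round(mvx * scale), round(mvy * scale))
```

Because the construction depends only on the candidate list and POC distances, a decoder can repeat it from the coded candidate index and obtain the same MV pair as the encoder.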
Systems and methods are described for video coding using affine motion prediction. In an example method, motion vector gradients are determined from respective motion vectors of a plurality of neighboring sub-blocks neighboring a current block. An estimate of at least one affine parameter for the current block is determined based on the motion vector gradients. An affine motion model is determined based at least in part on the estimated affine parameter(s), and a prediction of the current block is generated using the affine motion model. The estimated parameter(s) may be used in the affine motion model itself. Alternatively, the estimated parameter(s) may be used in a prediction of the affine motion model. In some embodiments, only neighboring sub-blocks above and/or to the left of the current block are used in estimating the affine parameter(s).
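The parameter estimation described above can be sketched for the sub-blocks above the current block; uniform unit spacing of the sub-blocks and the function name are assumptions for illustration:

```python
def estimate_affine_params(mvs_above):
    """Estimate affine parameters from MVs of neighboring sub-blocks.

    mvs_above: left-to-right list of (mvx, mvy) for sub-blocks above the
    current block, assumed a fixed unit stride apart. The average
    horizontal MV gradient estimates the affine parameters that vary
    with x (two of the four parameters of a 4-parameter affine model).
    """
    dmvx = [b[0] - a[0] for a, b in zip(mvs_above, mvs_above[1:])]
    dmvy = [b[1] - a[1] for a, b in zip(mvs_above, mvs_above[1:])]
    n = len(dmvx)
    # d(mvx)/dx and d(mvy)/dx, averaged over neighboring sub-block pairs
    return sum(dmvx) / n, sum(dmvy) / n
```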
H04N 19/196 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the adaptation method, adaptation tool or adaptation type used for the adaptive coding being specially adapted for the computation of encoding parameters, e.g. by averaging previously computed encoding parameters
Methods, apparatus, and systems are disclosed. In one embodiment, a method of decoding includes obtaining a sub-block based motion prediction signal for a current block of the video; obtaining one or more spatial gradients of the sub-block based motion prediction signal or one or more motion vector difference values; obtaining a refinement signal for the current block based on the one or more obtained spatial gradients or the one or more obtained motion vector difference values; obtaining a refined motion prediction signal for the current block based on the sub-block based motion prediction signal and the refinement signal; and decoding the current block based on the refined motion prediction signal.
H04N 19/577 - Motion compensation with bidirectional frame interpolation, i.e. using B-pictures
H04N 19/117 - Filters, e.g. for pre-processing or post-processing
H04N 19/139 - Analysis of motion vectors, e.g. their magnitude, direction, variance or reliability
H04N 19/182 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being a pixel
33.
IMPROVED INTRA PLANAR PREDICTION USING MERGE MODE MOTION VECTOR CANDIDATES
Methods, procedures, architectures, apparatuses, systems, devices, interfaces, and computer program products for encoding/decoding data (e.g. a data stream) are provided. A video coding method for predicting a current block includes identifying a first block adjacent to the current block, the first block having motion information, performing motion compensation using the motion information to generate a set of reference samples adjacent to the current block, identifying a first line of reference samples from the set of generated reference samples to be used for intra prediction of the current block, and performing intra prediction of the current block using at least the first line of reference samples.
H04N 19/105 - Selection of the reference unit for prediction within a chosen coding or prediction mode, e.g. adaptive choice of position and number of pixels used for prediction
H04N 19/11 - Selection of coding mode or of prediction mode among a plurality of spatial predictive coding modes
H04N 19/139 - Analysis of motion vectors, e.g. their magnitude, direction, variance or reliability
H04N 19/176 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock
H04N 19/46 - Embedding additional information in the video signal during the compression process
H04N 19/593 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving spatial prediction techniques
Systems, methods, and instrumentalities are disclosed for combined inter and intra prediction. A video coding device may receive a merge mode with motion vector difference (MMVD) indication that indicates whether MMVD mode is used to generate inter prediction of a coding unit (CU). The video coding device may receive a combined inter merge/intra prediction (CIIP) indication, for example, when the MMVD mode indication indicates that MMVD mode is not used to generate the inter prediction of the CU. The video coding device may determine whether to use triangle merge mode for the CU, for example, based on the MMVD mode indication and/or the CIIP indication. On a condition that the CIIP indication indicates that CIIP is applied for the CU or the MMVD mode indication indicates that MMVD mode is used to generate the inter prediction, the video coding device may disable the triangle merge mode for the CU.
H04N 19/103 - Selection of coding mode or of prediction mode
H04N 19/70 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards
H04N 19/176 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock
Systems, methods, and instrumentalities are disclosed for processing history-based motion vector prediction (HMVP). A video coding device may generate a history-based motion vector prediction (HMVP) list for a current block. The video coding device may derive an HMVP candidate from a previously coded block. The HMVP candidate may include motion information associated with a neighboring block of the current block, one or more reference indices, and a bi-prediction weight index. The video coding device may add the HMVP candidate to the HMVP list for motion compensated prediction of a motion vector associated with the current block. The video coding device may use one HMVP candidate selected from the HMVP list to perform motion compensated prediction of the current block. The motion compensated prediction may be performed using the motion information associated with the neighboring block of the current block, the one or more reference indices, and the bi-prediction weight index.
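The list maintenance implied above can be sketched as a bounded FIFO with duplicate pruning; the table size and the function name are assumptions for illustration, and the candidate (motion information, reference indices, bi-prediction weight index) is treated as an opaque value:

```python
def update_hmvp_list(hmvp_list, candidate, max_size=5):
    """Update a history-based motion vector prediction (HMVP) table.

    Sketch: identical entries are pruned (moved to the newest position)
    and the table behaves as a FIFO of bounded size.
    """
    if candidate in hmvp_list:
        hmvp_list.remove(candidate)     # prune the duplicate, re-add as newest
    hmvp_list.append(candidate)
    if len(hmvp_list) > max_size:
        hmvp_list.pop(0)                # drop the oldest entry (FIFO)
    return hmvp_list
```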
H04N 19/52 - Processing of motion vectors by encoding by predictive encoding
H04N 19/577 - Motion compensation with bidirectional frame interpolation, i.e. using B-pictures
H04N 19/70 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards
36.
METHODS, ARCHITECTURES, APPARATUSES AND SYSTEMS DIRECTED TO IMPROVED LINEAR MODEL ESTIMATION FOR TEMPLATE BASED VIDEO CODING
Procedures, methods, architectures, apparatuses, systems, devices, and computer program products directed to improved linear model estimation for template-based video coding are provided. Included therein is a method comprising determining minimum and maximum ("min/max") values of luma and chroma samples neighboring a coding block, wherein the min/max chroma values correspond to the min/max luma values; determining a first linear model parameter of a template-based video coding technique (i) based on a single look-up table and the min/max chroma values and (ii) at a precision no greater than 16 bits; determining a second linear model parameter of the template-based video coding technique (i) based on the first linear model parameter and the minimum chroma and luma values and (ii) at a precision no greater than 16 bits; and predicting chroma samples of the coding block based on reconstructed luma samples of the coding block and the first and second linear model parameters.
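The parameter derivation described above can be sketched in fixed-point arithmetic; the 16-bit reciprocal layout stands in for the single look-up table of the abstract, and the function names are illustrative assumptions:

```python
def cclm_parameters(min_luma, max_luma, min_chroma, max_chroma, shift=16):
    """Derive linear-model parameters for chroma-from-luma prediction.

    Sketch: the slope (first parameter) is the chroma range divided by
    the luma range, realized with a fixed-point reciprocal of the kind a
    single look-up table would store, at no more than 16 bits; the
    offset (second parameter) follows from the slope and the minimums.
    """
    diff_luma = max_luma - min_luma
    if diff_luma == 0:
        alpha = 0
    else:
        # (1 << shift) // diff_luma is what a reciprocal LUT entry would hold.
        alpha = (max_chroma - min_chroma) * ((1 << shift) // diff_luma)
    beta = min_chroma - ((alpha * min_luma) >> shift)
    return alpha, beta

def predict_chroma(rec_luma, alpha, beta, shift=16):
    """Predict a chroma sample from a reconstructed luma sample."""
    return ((rec_luma * alpha) >> shift) + beta
```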
H04N 19/42 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals - characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
H04N 19/186 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being a colour or a chrominance component
H04N 19/593 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving spatial prediction techniques
Bi-directional optical flow (BDOF) may be bypassed, for a current coding block, based on whether symmetric motion vector difference (SMVD) is used in motion vector coding for the current coding block. A coding device (e.g., an encoder or a decoder) may determine whether to bypass BDOF for the current coding block based at least in part on an SMVD indication for the current coding block. The coding device may obtain the SMVD indication that indicates whether SMVD is used in motion vector coding for the current coding block. If the SMVD indication indicates that SMVD is used in the motion vector coding for the current coding block, the coding device may bypass BDOF for the current coding block. The coding device may reconstruct the current coding block without performing BDOF if it determines to bypass BDOF for the current coding block.
H04N 19/176 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock
H04N 19/577 - Motion compensation with bidirectional frame interpolation, i.e. using B-pictures
H04N 19/70 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards
A system may identify a defined rectangular picture area and render video corresponding to the defined rectangular picture area. The system may receive a video bitstream comprising a picture having a header and may receive data specifying a structure of the picture. The system may parse the data specifying the structure of the picture for an identifier corresponding to a defined rectangular area in the picture and for a tile index of a top left tile in the defined rectangular area. The system may determine one or more tiles comprised in the defined rectangular area based on the identifier corresponding to the defined rectangular area and the tile index of the top left tile. The system may reconstruct the picture including a sub-picture that comprises the defined rectangular area based upon the identifier corresponding to the defined rectangular area. The system may render the sub-picture in the defined rectangular area.
H04N 19/17 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
H04N 19/70 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards
H04N 19/174 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a slice, e.g. a line of blocks or a group of blocks
H04N 19/46 - Embedding additional information in the video signal during the compression process
Systems, methods, and instrumentalities are disclosed for performing horizontal geometry padding on a current sample based on receiving a wraparound enabled indication that indicates whether a horizontal wraparound motion compensation is enabled. If the horizontal wraparound motion compensation is enabled based on the wraparound enabled indication, a video coding device may determine a reference sample wraparound offset of a current sample in a picture. The reference sample wraparound offset may indicate a face width of the picture. The video coding device may determine a reference sample location for the current sample based on the reference sample wraparound offset, a picture width of the picture, and a current sample location. The video coding device may predict the current sample based on the reference sample location in a horizontal direction. Repetitive padding or clipping may be used in the vertical direction.
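The horizontal reference-location derivation described above can be sketched as below; integer sample positions and a simple modulo wrap are simplifying assumptions, not the exact clipping formula:

```python
def wraparound_ref_x(x, mv_x, pic_width, wrap_offset, wrap_enabled):
    """Locate the horizontal reference sample position -- a sketch.

    With horizontal wraparound motion compensation enabled, positions
    falling off the left or right picture edge wrap by the reference
    sample wraparound offset (the face width for ERP-style content);
    otherwise they are clipped (repetitive padding).
    """
    ref_x = x + mv_x
    if wrap_enabled:
        return ref_x % wrap_offset          # wrap around the face width
    return max(0, min(pic_width - 1, ref_x))  # repetitive padding / clipping
```

As the abstract notes, only the horizontal direction wraps; the vertical direction would use repetitive padding or clipping.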
H04N 19/597 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding specially adapted for multi-view video sequence encoding
H04N 19/55 - Motion estimation with spatial constraints, e.g. at image or region borders
A coding system may combine coding modes, coding techniques, and/or coding tools (e.g., generalized bi-prediction (GBi)). The coding system may include a wireless transmit/receive unit (WTRU). For example, the coding system may combine BDOF and bi-prediction with CU weights (BCW). BDOF may include refining a motion vector associated with a current CU based at least in part on gradients associated with a location in the current CU. The coding system may determine that BDOF is enabled, and/or that bi-prediction with CU weights is enabled for the current CU. The coding system's determination that bi-prediction with CU weights is enabled and/or that BDOF is enabled may be based on one or more indications.
Methods, apparatus, systems, architectures and interfaces for encoding and/or decoding point cloud bitstreams including coded point cloud sequences are provided. Included among such methods, apparatuses, systems, architectures, and interfaces is an apparatus that may include a processor and memory. A method may include any of: mapping components of the point cloud bitstream into tracks; generating information identifying any of geometry streams or texture streams according to the mapping of the components; generating information associated with layers corresponding to respective geometry component streams; and generating information indicating operation points associated with the point cloud bitstream.
Systems and methods are described for reducing the complexity of using bi-directional optical flow (BIO) in video coding. In some embodiments, bit-width reduction steps are introduced in the BIO motion refinement process to reduce the maximum bit-width used for BIO calculations. In some embodiments, simplified interpolation filters are used to generate predicted samples in an extended region around a current coding unit. In some embodiments, different interpolation filters are used for vertical versus horizontal interpolation. In some embodiments, BIO is disabled for coding units with small heights and/or for coding units that are predicted using a sub-block level inter prediction technique, such as advanced temporal motion vector prediction (ATMVP) or affine prediction.
H04N 19/176 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock
H04N 19/52 - Processing of motion vectors by encoding by predictive encoding
H04N 19/577 - Motion compensation with bidirectional frame interpolation, i.e. using B-pictures
43.
AFFINE MOTION ESTIMATION FOR AFFINE MODEL-BASED VIDEO CODING
Systems, methods, and instrumentalities for affine motion estimation for affine model-based video coding are disclosed herein. A first motion vector (MV) set including one or more MVs may be derived for a first coding block. The MVs may be control point MVs (CPMVs) and the MVs may be derived by performing affine motion estimation (ME) associated with the first coding block. The first MV set may be added to a recently-estimated MV list. A head of the recently-estimated MV list may be set to the first MV set. The recently-estimated MV list may be empty or may contain one or more previously-added MV sets.
Methods and apparatus for using flexible grid regions in picture or video frames are disclosed. In one embodiment, a method includes receiving a set of first parameters that defines a plurality of first grid regions comprising a frame. For each first grid region, the method includes receiving a set of second parameters that defines a plurality of second grid regions, and the plurality of second grid regions partitions the respective first grid region. The method further includes partitioning the frame into the plurality of first grid regions based on the set of first parameters, and partitioning each first grid region into the plurality of second grid regions based on the respective set of second parameters.
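The two-level partitioning described above can be sketched with rectangles; representing each grid region as (x, y, w, h) with second-level coordinates relative to their first-level region is an illustrative assumption:

```python
def partition_frame(first_regions, second_grids):
    """Two-level flexible grid partitioning -- a sketch.

    first_regions: list of (x, y, w, h) first-level grid regions tiling
                   the frame (from the set of first parameters).
    second_grids:  per first-level region, a list of (x, y, w, h)
                   second-level regions relative to that region (from
                   the respective sets of second parameters).
    Returns the absolute rectangles of all second-level regions.
    """
    regions = []
    for (fx, fy, _fw, _fh), subs in zip(first_regions, second_grids):
        for (sx, sy, sw, sh) in subs:
            # Offset each second-level region by its first-level origin.
            regions.append((fx + sx, fy + sy, sw, sh))
    return regions
```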
H04N 19/119 - Adaptive subdivision aspects e.g. subdivision of a picture into rectangular or non-rectangular coding blocks
H04N 19/167 - Position within a video image, e.g. region of interest [ROI]
H04N 19/174 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a slice, e.g. a line of blocks or a group of blocks
H04N 19/563 - Motion estimation with padding, i.e. with filling of non-object values in an arbitrarily shaped picture block or region for estimation purposes
45.
ADAPTIVE MOTION VECTOR PRECISION FOR AFFINE MOTION MODEL BASED VIDEO CODING
Systems and methods are described for video coding using affine motion models with adaptive precision. In an example, a block of video is encoded in a bitstream using an affine motion model, where the affine motion model is characterized by at least two motion vectors. A precision is selected for each of the motion vectors, and the selected precisions are signaled in the bitstream. In some embodiments, the precisions are signaled by including in the bitstream information that identifies one of a plurality of elements in a selected predetermined precision set. The identified element indicates the precision of each of the motion vectors that characterize the affine motion model. In some embodiments, the precision set to be used is signaled expressly in the bitstream; in other embodiments, the precision set may be inferred, e.g., from the block size, block shape or temporal layer.
H04N 19/109 - Selection of coding mode or of prediction mode among a plurality of temporal predictive coding modes
H04N 19/13 - Adaptive entropy coding, e.g. adaptive variable length coding [AVLC] or context adaptive binary arithmetic coding [CABAC]
H04N 19/134 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
H04N 19/147 - Data rate or code amount at the encoder output according to rate distortion criteria
H04N 19/176 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock
H04N 19/184 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being bits, e.g. of the compressed video stream
H04N 19/52 - Processing of motion vectors by encoding by predictive encoding
H04N 19/523 - Motion estimation or motion compensation with sub-pixel accuracy
H04N 19/54 - Motion estimation other than block-based using feature points or meshes
46.
METHODS AND APPARATUS FOR REDUCING THE CODING LATENCY OF DECODER-SIDE MOTION REFINEMENT
Embodiments of video coding systems and methods are described for reducing coding latency introduced by decoder-side motion vector refinement (DMVR). In one example, two non-refined motion vectors are identified for coding of a first block of samples (e.g. a first coding unit) using bi-prediction. One or both of the non-refined motion vectors are used to predict motion information for a second block of samples (e.g. a second coding unit). The two non-refined motion vectors are refined using DMVR, and the refined motion vectors are used to generate a prediction signal of the first block of samples. Such embodiments allow the second block of samples to be coded substantially in parallel with the first block without waiting for completion of DMVR on the first block. In additional embodiments, optical-flow-based techniques are described for motion vector refinement.
H04N 19/436 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals - characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation using parallelised computational arrangements
47.
ADAPTIVE CONTROL POINT SELECTION FOR AFFINE MOTION MODEL BASED VIDEO CODING
Systems, methods, and instrumentalities are disclosed for motion vector clipping when affine motion mode is enabled for a video block. A video coding device may determine that an affine mode for a video block is enabled. The video coding device may determine a plurality of control point affine motion vectors associated with the video block. The video coding device may clip the control point affine motion vectors and store the plurality of clipped control point affine motion vectors for motion vector prediction of a neighboring control point affine motion vector. The video coding device may derive a sub-block motion vector associated with a sub-block of the video block, clip the derived sub-block motion vector, and store it for spatial motion vector prediction or temporal motion vector prediction. For example, the video coding device may clip the derived sub-block motion vector based on a motion field range that may be based on a bit depth value.
A video coding device may be configured to perform directional Bi-directional optical flow (BDOF) refinement on a coding unit (CU). The device may determine the direction in which to perform directional BDOF refinement. The device may calculate the vertical direction gradient difference and the horizontal direction gradient difference for the CU. The vertical direction gradient difference may indicate the difference between the vertical gradients for a first reference picture and the vertical gradients for a second reference picture. The horizontal direction gradient difference may indicate the difference between the horizontal gradients for the first reference picture and the horizontal gradients for the second reference picture. The video coding device may determine the direction in which to perform directional BDOF refinement based on the vertical direction gradient difference and the horizontal direction gradient difference. The video coding device may perform directional BDOF refinement in the determined direction.
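The direction decision described above can be sketched by comparing summed gradient differences between the two reference predictions; the ratio threshold and function name are assumptions for illustration:

```python
import numpy as np

def bdof_direction(gx0, gy0, gx1, gy1, ratio=2.0):
    """Choose the direction for directional BDOF refinement -- a sketch.

    gx0/gy0 and gx1/gy1 are the horizontal/vertical gradient fields of
    the predictions from the first and second reference pictures. The
    direction whose summed gradient difference dominates (by an assumed
    ratio threshold) is refined alone; otherwise both directions are.
    """
    horiz_diff = np.abs(gx0 - gx1).sum()  # horizontal direction gradient difference
    vert_diff = np.abs(gy0 - gy1).sum()   # vertical direction gradient difference
    if horiz_diff > ratio * vert_diff:
        return "horizontal"
    if vert_diff > ratio * horiz_diff:
        return "vertical"
    return "both"
```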
H04N 19/139 - Analysis of motion vectors, e.g. their magnitude, direction, variance or reliability
H04N 19/176 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock
H04N 19/103 - Selection of coding mode or of prediction mode
H04N 19/577 - Motion compensation with bidirectional frame interpolation, i.e. using B-pictures
G06T 7/269 - Analysis of motion using gradient-based methods
Methods are described herein for signaling information regarding different viewpoints in a multi-viewpoint omnidirectional media presentation. In some embodiments, a container file (which may use the ISO Base Media File Format) is generated containing several tracks. The tracks are grouped using a track-group identifier, where each track-group identifier is associated with a different viewpoint. In some embodiments, a manifest (such as an MPEG-DASH MPD) is generated, where the manifest includes viewpoint identifiers that identify the viewpoint associated with each stream. In some embodiments, metadata included in a container file and/or in a manifest provides information on the position of each viewpoint, the intervals during which each viewpoint is available, transition effects for transitions between viewpoints, and/or recommended projection formats for corresponding field-of-view ranges.
H04N 13/117 - Transformation of image signals corresponding to virtual viewpoints, e.g. spatial image interpolation the virtual viewpoint locations being selected by the viewers or determined by viewer tracking
H04N 21/845 - Structuring of content, e.g. decomposing content into time segments
H04N 21/235 - Processing of additional data, e.g. scrambling of additional data or processing content descriptors
H04N 21/218 - Source of audio or video content, e.g. local disk arrays
50.
TEMPLATE-BASED INTER PREDICTION TECHNIQUES BASED ON ENCODING AND DECODING LATENCY REDUCTION
Video coding methods are described for reducing latency in template-based inter coding. In some embodiments, a method is provided for coding a video that includes a current picture and at least one reference picture. For at least a current block in the current picture, a respective predicted value is generated (e.g. using motion compensated prediction) for each sample in a template region adjacent to the current block. Once the predicted values are generated for each sample in the template region, a process is invoked to determine a template-based inter prediction parameter by using predicted values in the template region and sample values of the reference picture. This process can be invoked without waiting for reconstructed sample values in the template region. Template-based inter prediction of the current block is then performed using the determined template-based inter prediction parameter.
H04N 19/43 - Hardware specially adapted for motion estimation or compensation
H04N 19/436 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals - characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation using parallelised computational arrangements
H04N 19/105 - Selection of the reference unit for prediction within a chosen coding or prediction mode, e.g. adaptive choice of position and number of pixels used for prediction
H04N 19/109 - Selection of coding mode or of prediction mode among a plurality of temporal predictive coding modes
H04N 19/159 - Prediction type, e.g. intra-frame, inter-frame or bidirectional frame prediction
H04N 19/176 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock
H04N 19/52 - Processing of motion vectors by encoding by predictive encoding
Systems and methods described herein provide for rendering and quality monitoring of rendering of a 360-degree video, where the video has a plurality of representations with different levels of quality in different regions. In an exemplary method, a client device tracks a position of a viewport with respect to the 360-degree video and renders to the viewport a selected set of the representations. The client adaptively adds and removes representations from the selected set based on the viewport position. The client also measures and reports a viewport switching latency. In some embodiments, the latency for a viewport switch is a comparable-quality viewport switch latency that represents the time it takes after a viewport switch to return to a quality comparable to the pre-switch viewport quality.
H04N 21/218 - Source of audio or video content, e.g. local disk arrays
H04N 21/442 - Monitoring of processes or resources, e.g. detecting the failure of a recording device, monitoring the downstream bandwidth, the number of times a movie has been viewed or the storage space available from the internal hard disk
H04N 21/2343 - Processing of video elementary streams, e.g. splicing of video streams or manipulating MPEG-4 scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
H04N 21/845 - Structuring of content, e.g. decomposing content into time segments
52.
GENERALIZED BI-PREDICTION FOR VIDEO CODING WITH REDUCED CODING COMPLEXITY
Exemplary embodiments include systems and methods for coding a video comprising a plurality of pictures including a current picture, a first reference picture, and a second reference picture, where each picture includes a plurality of blocks. In one method, for at least a current block in the current picture, a number of available bi-prediction weights is determined based at least in part on a temporal layer and/or a quantization parameter of the current picture. From among the available bi-prediction weights, a pair of weights is identified. Using the identified weights, the current block is then predicted as a weighted sum of a first reference block in the first reference picture and a second reference block in the second reference picture. Encoding techniques are also described for efficient searching and selection of a pair of bi-prediction weights to use for prediction of a block.
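The weighted-sum prediction can be sketched in fixed point. The 1/8-unit weight precision and the rounding offset are illustrative assumptions, not values taken from the abstract.

```python
def gbi_predict(block0, block1, w1, precision=3):
    """Generalized bi-prediction sketch: weighted average of two reference
    blocks, with weights in 1/(2**precision) units (an assumption)."""
    scale = 1 << precision          # weight denominator, e.g. 8
    w0 = scale - w1                 # weights are complementary
    return [[(w0 * a + w1 * b + scale // 2) >> precision
             for a, b in zip(r0, r1)]
            for r0, r1 in zip(block0, block1)]
```

With `w1 = scale // 2` this degenerates to the ordinary averaging bi-prediction; other weights bias the prediction toward one reference.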
H04N 19/577 - Motion compensation with bidirectional frame interpolation, i.e. using B-pictures
H04N 19/31 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability in the temporal domain
H04N 19/573 - Motion compensation with multiple frame prediction using two or more reference frames in a given prediction direction
Systems and methods are described for selecting a motion vector (MV) to use in frame-rate up conversion (FRUC) coding of a block of video. In one embodiment, a first set of motion vector candidates is identified for FRUC prediction of the block. A search center is defined based on the first set of motion vector candidates, and a search window is determined, the search window having a selected width and being centered on the search center. A search for a selected MV is performed within the search window. In some embodiments, an initial set of MVs is processed with a clustering algorithm to generate a smaller number of MVs that are used as the first set. The selected MV may be subject to a motion refinement search, which may also be performed over a constrained search range. In additional embodiments, search iterations are constrained to limit complexity.
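The clustering step that shrinks the initial MV set might look like this greedy sketch; the L1 distance metric and the threshold are assumptions (the abstract does not name a clustering algorithm).

```python
def cluster_mvs(candidates, threshold=4):
    """Reduce an MV candidate set by greedy clustering: each vector joins
    the first cluster whose centroid is within `threshold` (L1 distance),
    otherwise it starts a new cluster. Returns cluster centroids."""
    clusters = []  # each cluster: [sum_x, sum_y, count]
    for x, y in candidates:
        for c in clusters:
            cx, cy = c[0] / c[2], c[1] / c[2]
            if abs(x - cx) + abs(y - cy) <= threshold:
                c[0] += x; c[1] += y; c[2] += 1
                break
        else:
            clusters.append([x, y, 1])
    return [(c[0] / c[2], c[1] / c[2]) for c in clusters]
```

The centroids then serve as the (smaller) first set of candidates around which the search center and search window are defined.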
A device may signal frame packing configuration information (e.g., face layout and/or face rotation parameters) associated with a RAS. The device may receive a plurality of pictures, which may each comprise a plurality of faces. The pictures may be grouped into a plurality of RASs. The device may select a frame packing configuration with the lowest cost for a first RAS. For example, the cost of a frame packing configuration may be determined based on the first picture of the first RAS. The device may select a frame packing configuration for a second RAS. The frame packing configuration for the first RAS may be different than the frame packing configuration for the second RAS. The frame packing configuration for the first RAS and the frame packing configuration for the second RAS may be signaled in the video bitstream.
H04N 19/597 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding specially adapted for multi-view video sequence encoding
55.
MOTION COMPENSATED BI-PREDICTION BASED ON LOCAL ILLUMINATION COMPENSATION
Systems, methods, and instrumentalities are described herein for calculating local illumination compensation (LIC) parameters for a bi-predicted coding unit (CU). The LIC parameters may be used to generate adjusted samples for the current CU and to address local illumination changes that may exist among temporal neighboring pictures. LIC parameters may be calculated based on bi-predicted reference template samples and template samples for a current CU. Bi-predicted reference template samples may be generated based on reference template samples neighboring temporal reference CUs. For example, the bi-predicted reference template samples may be generated based on averaging the reference template samples. The reference template samples may correspond to template samples for the current CU. A CU may be or may include a coding block and/or a sub-block that may be derived by dividing the coding block.
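A sketch of the two stated steps: averaging the reference templates, then fitting scale/offset LIC parameters. Averaging is stated in the abstract; ordinary least squares is an assumed solver, not taken from it.

```python
def bipred_ref_template(t0, t1):
    """Bi-predicted reference template: per-sample average of the
    template samples from the two temporal reference CUs."""
    return [(a + b) / 2 for a, b in zip(t0, t1)]

def lic_params(ref_template, cur_template):
    """Fit LIC scale (alpha) and offset (beta) so that
    cur ~ alpha * ref + beta over the template samples (least squares)."""
    n = len(ref_template)
    sx, sy = sum(ref_template), sum(cur_template)
    sxx = sum(x * x for x in ref_template)
    sxy = sum(x * y for x, y in zip(ref_template, cur_template))
    denom = n * sxx - sx * sx
    if denom == 0:
        return 1.0, 0.0  # degenerate (flat) template: identity mapping
    alpha = (n * sxy - sx * sy) / denom
    beta = (sy - alpha * sx) / n
    return alpha, beta
```

The fitted `alpha`/`beta` are then applied to the bi-predicted samples of the current CU to compensate local illumination changes.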
H04N 19/105 - Selection of the reference unit for prediction within a chosen coding or prediction mode, e.g. adaptive choice of position and number of pixels used for prediction
H04N 19/176 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock
H04N 19/137 - Motion inside a coding unit, e.g. average field, frame or block difference
H04N 19/583 - Motion compensation with overlapping blocks
H04N 19/577 - Motion compensation with bidirectional frame interpolation, i.e. using B-pictures
56.
FACE DISCONTINUITY FILTERING FOR 360-DEGREE VIDEO CODING
Systems, methods, and instrumentalities are disclosed for discontinuous face boundary filtering for 360-degree video coding. A face discontinuity may be filtered (e.g., to reduce seam artifacts) in whole or in part, for example, using coded samples or padded samples on either side of the face discontinuity. Filtering may be applied, for example, as an in-loop filter or a post-processing step. 2D positional information related to two sides of the face discontinuity may be signaled in a video bitstream so that filtering may be applied independent of projection formats and/or frame packing techniques.
H04N 19/597 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding specially adapted for multi-view video sequence encoding
H04N 19/176 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock
H04N 19/134 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
H04N 19/117 - Filters, e.g. for pre-processing or post-processing
H04N 19/82 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals - Details of filtering operations specially adapted for video compression, e.g. for pixel interpolation involving filtering within a prediction loop
H04N 19/86 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression involving reduction of coding artifacts, e.g. of blockiness
H04N 19/88 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression involving rearrangement of data among different coding units, e.g. shuffling, interleaving, scrambling or permutation of pixel data or permutation of transform coefficient data among different blocks
57.
360-DEGREE VIDEO CODING USING FACE-BASED GEOMETRY PADDING
A frame-packed picture for 360-degree video content may be received. A group of continuous faces in the frame-packed picture may be identified based on frame packing information for the frame-packed picture. A sample location in the group of continuous faces may be identified. Whether a neighboring sample location associated with the identified sample location is located outside of a discontinuous edge of the group of continuous faces may be determined. If the neighboring sample location is located outside of the discontinuous edge of the group of continuous faces, geometry padding may be performed on the identified sample location. If the neighboring sample location is not located outside of the discontinuous edge, geometry padding may be skipped. The 360-degree video content may be processed based on the geometry padding.
H04N 19/597 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding specially adapted for multi-view video sequence encoding
H04N 19/563 - Motion estimation with padding, i.e. with filling of non-object values in an arbitrarily shaped picture block or region for estimation purposes
H04N 21/218 - Source of audio or video content, e.g. local disk arrays
58.
METHODS FOR SIMPLIFYING ADAPTIVE LOOP FILTER IN VIDEO CODING
Systems, methods and instrumentalities are disclosed for adaptively selecting an adaptive loop filter (ALF) procedure for a frame based on which temporal layer the frame is in. ALF procedures may vary in computational complexity. One or more frames including the current frame may be in a temporal layer of a coding scheme. The decoder may determine the current frame's temporal layer level within the coding scheme. The decoder may select an ALF procedure based on the current frame's temporal layer level. If the current frame's temporal layer level is higher within the coding scheme than some other temporal layer levels, an ALF procedure that is less computationally complex may be selected for the current frame. Then the decoder may perform the selected ALF procedure on the current frame.
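The layer-based selection reduces to a small decision rule. The two-tier split and the procedure names below are assumptions for illustration; the abstract only states that higher temporal layers may get a less complex ALF procedure.

```python
def select_alf_procedure(temporal_layer, num_layers):
    """Pick an ALF variant by the frame's temporal layer level: frames in
    the upper half of the hierarchy get the cheaper filter (sketch)."""
    if temporal_layer > (num_layers - 1) // 2:
        return "reduced_complexity_alf"
    return "full_alf"
```

Frames at higher temporal layers are referenced less (or not at all), so spending less filtering effort on them costs little in propagated quality.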
H04N 19/82 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals - Details of filtering operations specially adapted for video compression, e.g. for pixel interpolation involving filtering within a prediction loop
H04N 19/117 - Filters, e.g. for pre-processing or post-processing
H04N 19/14 - Coding unit complexity, e.g. amount of activity or edge presence estimation
H04N 19/187 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being a scalable video layer
External overlapped block motion compensation (OBMC) may be performed for samples of a coding unit (CU) located along an inter-CU boundary of the CU while internal OBMC may be performed separately for samples located along inter-sub-block boundaries inside the CU. External OBMC may be applied based on substantially similar motion information associated with multiple external blocks neighboring the CU. The external blocks may be treated as a group to provide OBMC for multiple boundary samples together in an external OBMC operation. Internal OBMC may be applied using the same sub-block size used for sub-block level motion derivation. Internal OBMC may be disabled for the CU, for example, if the CU is coded in a spatial-temporal motion vector prediction (STMVP) mode.
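Boundary blending for external OBMC can be sketched as a per-row weighted mix of the current-MV prediction with the neighbor-MV prediction; the weights here are illustrative, not the normative ones.

```python
def obmc_blend(cur_pred, nbr_pred, row_weights):
    """Blend a top-boundary region: row r (nearest the boundary first)
    mixes the current-MV prediction with the neighbor-MV prediction
    using weight row_weights[r] for the current prediction (sketch)."""
    out = [row[:] for row in cur_pred]  # rows beyond the weights are kept
    for r, w in enumerate(row_weights):
        out[r] = [w * c + (1 - w) * n
                  for c, n in zip(cur_pred[r], nbr_pred[r])]
    return out
```

Treating several neighboring blocks with substantially similar motion as one group means this blend runs once per boundary region rather than once per neighbor.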
A block may be identified. The block may be partitioned into one or more (e.g., two) sibling nodes (e.g., sibling nodes B0 and B1). A partition direction and a partition type for the block may be determined. If the partition type for the block is binary tree (BT), one or more (e.g., two) partition parameters may be determined for sibling node B0. A partition parameter (e.g., a first partition parameter) may be determined for sibling node B1. A decoder may determine whether to receive an indication of a second partition parameter for B1 based on, for example, the partition direction for the block, the partition type for the block, and the first partition parameter for B1. The decoder may derive the second partition parameter based on, for example, the partition direction and type for the block, and the first partition parameter for B1.
H04N 19/119 - Adaptive subdivision aspects e.g. subdivision of a picture into rectangular or non-rectangular coding blocks
H04N 19/70 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards
H04N 19/463 - Embedding additional information in the video signal during the compression process by compressing encoding parameters before transmission
H04N 19/14 - Coding unit complexity, e.g. amount of activity or edge presence estimation
H04N 19/176 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock
61.
SUB-BLOCK MOTION DERIVATION AND DECODER-SIDE MOTION VECTOR REFINEMENT FOR MERGE MODE
Systems, methods, and instrumentalities for sub-block motion derivation and motion vector refinement for merge mode may be disclosed herein. Video data may be coded (e.g., encoded and/or decoded). A collocated picture for a current slice of the video data may be identified. The current slice may include one or more coding units (CUs). One or more neighboring CUs may be identified for a current CU. A neighboring CU (e.g., each neighboring CU) may correspond to a reference picture. A (e.g., one) neighboring CU may be selected to be a candidate neighboring CU based on the reference pictures and the collocated picture. A motion vector (MV) (e.g., collocated MV) may be identified from the collocated picture based on an MV (e.g., a reference MV) of the candidate neighboring CU. The current CU may be coded (e.g., encoded and/or decoded) using the collocated MV.
A device may receive a 360-degree video comprising one or more frames. The frames may comprise multiple faces and/or may be associated with one or more parameterized transform functions. The one or more parameterized transform functions may be associated with a transform function parameter. For example, a transform function parameter and/or parameterized transform function may be defined for each face and/or in each direction. The device may search through a parameter space for a first transform function parameter for a first frame. The device may determine a progressive search range (PSR) which may be relative to the first transform function parameter. For example, the PSR may include a range that surrounds the first transform function parameter. The device may search through the PSR to find a second transform function parameter for a second frame. The device may signal the first and the second transform function parameters in a video bitstream.
H04N 19/117 - Filters, e.g. for pre-processing or post-processing
H04N 19/167 - Position within a video image, e.g. region of interest [ROI]
H04N 19/172 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a picture, frame or field
H04N 19/46 - Embedding additional information in the video signal during the compression process
A device may be configured to render at least one spatial region of 360-degree media content, which may include two or more spatial regions. The device may include a receiver configured to receive the 360-degree media content and metadata associated with the 360-degree content. The metadata may include a classification of a respective spatial region of the 360-degree media content. The device may further include a memory configured to store a user preference and a sensor configured to detect a user movement. The device may include a processor configured to determine that the user movement is associated with a rendering of the respective spatial region. The processor may further determine whether the classification complies with the user preference and alter the rendering of the respective spatial region if the classification violates the user preference.
H04N 21/218 - Source of audio or video content, e.g. local disk arrays
H04N 21/2343 - Processing of video elementary streams, e.g. splicing of video streams or manipulating MPEG-4 scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
H04N 21/258 - Client or end-user data management, e.g. managing client capabilities, user preferences or demographics or processing of multiple end-users preferences to derive collaborative data
H04N 21/262 - Content or additional data distribution scheduling, e.g. sending additional data at off-peak times, updating software modules, calculating the carousel transmission frequency, delaying a video stream transmission or generating play-lists
H04N 21/41 - Structure of client; Structure of client peripherals
H04N 21/422 - Input-only peripherals, e.g. global positioning system [GPS]
H04N 21/431 - Generation of visual interfaces; Content or additional data rendering
H04N 21/442 - Monitoring of processes or resources, e.g. detecting the failure of a recording device, monitoring the downstream bandwidth, the number of times a movie has been viewed or the storage space available from the internal hard disk
H04N 21/45 - Management operations performed by the client for facilitating the reception of or the interaction with the content or administrating data related to the end-user or to the client device itself, e.g. learning user preferences for recommending movies
H04N 21/454 - Content filtering, e.g. blocking advertisements
H04N 21/4545 - Input to filtering algorithms, e.g. filtering a region of the image
H04N 21/462 - Content or additional data management e.g. creating a master electronic program guide from data received from the Internet and a Head-end or controlling the complexity of a video stream by scaling the resolution or bit-rate based on the client capabilities
Overlapped block motion compensation (OBMC) may be performed for a current video block based on motion information associated with the current video block and motion information associated with one or more neighboring blocks of the current video block. Under certain conditions, some or all of these neighboring blocks may be omitted from the OBMC operation of the current block. For instance, a neighboring block may be skipped during the OBMC operation if the current video block and the neighboring block are both uni-directionally or bi-directionally predicted, if the motion vectors associated with the current block and the neighboring block refer to a same reference picture, and if a sum of absolute differences between those motion vectors is smaller than a threshold value. Further, OBMC may be conducted in conjunction with regular motion compensation and may use simpler filters than traditionally allowed.
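The three skip conditions above enumerate directly in code. The dictionary layout and the default threshold are assumptions used only to make the sketch self-contained.

```python
def skip_obmc_neighbor(cur, nbr, threshold=1):
    """Decide whether a neighboring block may be skipped for OBMC:
    same prediction direction, same reference picture(s), and a sum of
    absolute MV-component differences below a threshold (sketch)."""
    if cur["bi"] != nbr["bi"]:        # both uni- or both bi-predicted
        return False
    if cur["refs"] != nbr["refs"]:    # same reference picture(s)
        return False
    sad = sum(abs(a - b)
              for mv_c, mv_n in zip(cur["mvs"], nbr["mvs"])
              for a, b in zip(mv_c, mv_n))
    return sad < threshold
```

Skipping near-identical neighbors avoids blending two predictions that would be almost the same anyway, saving computation with negligible quality impact.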
Systems, methods, and instrumentalities may be provided for discounting reconstructed samples and/or coding information from spatial neighbors across face discontinuities. Whether a current block is located at a face discontinuity may be determined. The face discontinuity may be a face boundary between two or more adjoining blocks that are not spherical neighbors. The coding availability of a neighboring block of the current block may be determined, e.g., based on whether the neighboring block is on the same side of the face discontinuity as the current block. For example, the neighboring block may be determined to be available for decoding the current block if it is on the same side of the face discontinuity as the current block, and unavailable if it is not on the same side of the face discontinuity. The neighboring block may be a spatial neighboring block or a temporal neighboring block.
H04N 19/597 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding specially adapted for multi-view video sequence encoding
H04N 19/593 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving spatial prediction techniques
H04N 19/82 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals - Details of filtering operations specially adapted for video compression, e.g. for pixel interpolation involving filtering within a prediction loop
H04N 19/86 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression involving reduction of coding artifacts, e.g. of blockiness
Systems, methods, and instrumentalities are disclosed for dynamic picture-in-picture (PIP) by a client. The client may reside on any device. The client may receive video content from a server, and identify an object within the video content using at least one of object recognition or metadata. The metadata may include information that indicates a location of an object within a frame of the video content. The client may receive a selection of the object by a user, and determine positional data of the object across frames of the video content using at least one of object recognition or metadata. The client may display an enlarged and time-delayed version of the object within a PIP window across the frames of the video content. Alternatively or additionally, the location of the PIP window within each frame may be fixed or may be based on the location of the object within each frame.
H04N 21/4728 - End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification or for manipulating displayed content for selecting a ROI [Region Of Interest], e.g. for requesting a higher resolution version of a selected region
H04N 21/431 - Generation of visual interfaces; Content or additional data rendering
H04N 21/44 - Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to MPEG-4 scene graphs
When a FRUC mode is enabled for a current coding unit (CU), motion vector (MV) candidates may be derived for the current CU. One or more search MVs may be derived from the MV candidates so that an initial motion search may be performed for the current CU using the search MVs. The search MVs, which may be fewer than the MV candidates for the CU, may be derived based on one or more attributes of the MV candidates. At the sub-CU level, sub-CU MV candidates may be determined for a current sub-CU. Sub-CU search MVs may be derived from the sub-CU MV candidates for the current sub-CU so that a motion search may be performed for the current sub-CU using the sub-CU search MVs. The number of the sub-CU search MVs may be smaller than the number of the sub-CU MV candidates.
H04N 19/56 - Motion estimation with initialisation of the vector search, e.g. estimating a good candidate to initiate a search
H04N 19/139 - Analysis of motion vectors, e.g. their magnitude, direction, variance or reliability
H04N 19/176 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock
H04N 19/70 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards
H04N 19/109 - Selection of coding mode or of prediction mode among a plurality of temporal predictive coding modes
68.
MOTION-COMPENSATION PREDICTION BASED ON BI-DIRECTIONAL OPTICAL FLOW
A device may determine whether to enable or disable bi-directional optical flow (BIO) for a current coding unit (CU) (e.g., block and/or sub-block). Prediction information for the CU may be identified and may include prediction signals associated with a first reference block and a second reference block (e.g., or a first reference sub-block and a second reference sub-block). A prediction difference may be calculated and may be used to determine the similarity between the two prediction signals. The CU may be reconstructed based on the similarity. For example, whether to reconstruct the CU with BIO enabled or BIO disabled may be based on whether the two prediction signals are similar. It may be determined to enable BIO for the CU when the two prediction signals are determined to be dissimilar. For example, the CU may be reconstructed with BIO disabled when the two prediction signals are determined to be similar.
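The similarity test can be sketched with mean absolute difference as the metric; the abstract speaks only of a "prediction difference", so the metric and threshold here are assumptions.

```python
def bio_enabled(pred0, pred1, threshold):
    """Return True (enable BIO) when the two prediction signals are
    dissimilar: mean absolute difference at or above a threshold
    (metric and threshold are illustrative assumptions)."""
    diff = sum(abs(a - b)
               for r0, r1 in zip(pred0, pred1)
               for a, b in zip(r0, r1))
    count = sum(len(r) for r in pred0)
    return diff / count >= threshold  # similar signals -> BIO disabled
```

When the two predictions already agree, the optical-flow refinement has little to correct, so skipping BIO saves its per-sample gradient computations.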
H04N 19/139 - Analysis of motion vectors, e.g. their magnitude, direction, variance or reliability
H04N 19/176 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock
H04N 19/103 - Selection of coding mode or of prediction mode
H04N 19/577 - Motion compensation with bidirectional frame interpolation, i.e. using B-pictures
G06T 7/269 - Analysis of motion using gradient-based methods
69.
WEIGHTED TO SPHERICALLY UNIFORM PSNR FOR 360-DEGREE VIDEO QUALITY EVALUATION USING CUBEMAP-BASED PROJECTIONS
360-degree video content may be coded. A sampling position in a projection format may be determined to code 360-degree video content. For example, a sampling position in a target projection format and a sampling position in a reference projection format may be identified. The sample position in the target projection format may be related to the corresponding sample position in the reference projection format via a transform function. A parameter weight (e.g., a reference parameter weight) for the sampling position in the reference projection format may be identified. An adjustment factor associated with the parameter weight for the sampling position in the reference projection format may be determined. The parameter weight (e.g., adjusted parameter weight) for the sampling position in the target projection format may be calculated. The calculated adjusted parameter weight may be applied to the sampling position in the target projection format when coding the 360-degree video content.
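The weighted metric itself can be sketched over flattened samples; the per-position weights below stand in for the derived, adjusted parameter weights, and the flattened layout is a simplifying assumption.

```python
import math

def ws_psnr(ref, test, weights, max_val=255.0):
    """Weighted-to-spherically-uniform PSNR sketch: per-position squared
    errors are weighted before averaging, then converted to dB."""
    wmse = (sum(w * (a - b) ** 2 for a, b, w in zip(ref, test, weights))
            / sum(weights))
    if wmse == 0:
        return float("inf")  # identical signals
    return 10 * math.log10(max_val ** 2 / wmse)
```

For a cubemap-based target format, each sampling position's weight would be the reference-format weight scaled by the adjustment factor that the transform function implies.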
H04N 19/597 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding specially adapted for multi-view video sequence encoding
H04N 19/126 - Quantisation - Details of normalisation or weighting functions, e.g. normalisation matrices or variable uniform quantisers
H04N 19/176 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock
H04N 19/16 - Assigned coding mode, i.e. the coding mode being predefined or preselected to be further used for selection of another element or parameter for a given display mode, e.g. for interlaced or progressive display mode
H04N 19/46 - Embedding additional information in the video signal during the compression process
G06T 3/00 - Geometric image transformation in the plane of the image
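The weight derivation above can be sketched as follows. The cosine-of-latitude row weight is the standard WS-PSNR weighting for an equirectangular reference format; the `transform` and `adjustment` callables are illustrative placeholders for the transform function and adjustment factor described in the abstract.

```python
import math

def erp_weight(row, height):
    """Spherical sampling weight for a row of an equirectangular
    (reference) projection: cosine of latitude."""
    return math.cos((row + 0.5 - height / 2.0) * math.pi / height)

def target_weight(x, y, transform, adjustment, height):
    # Map the target sampling position into the reference format,
    # look up the reference weight there, and apply the adjustment
    # factor associated with the mapping (placeholders throughout).
    _, ref_row = transform(x, y)
    return erp_weight(ref_row, height) * adjustment(x, y)
```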
70.
LOCAL ILLUMINATION COMPENSATION USING GENERALIZED BI-PREDICTION
Based on the prediction mode used for the current block, a decoder may determine whether to parse an illumination compensation indication for the current block. The illumination compensation indication may indicate whether to enable an illumination compensation process for the current block. If the prediction mode is indicative of continuous motion changes between the current block and one or more of the reference blocks, the decoder may bypass parsing the illumination compensation indication. The decoder may disable the illumination compensation process on the current block based on the determination to bypass parsing the illumination compensation indication for the current block.
Systems, procedures, and instrumentalities may be provided for adaptively adjusting quantization parameters (QPs) for 360-degree video coding. For example, a first luma QP for a first region may be identified. Based on the first luma QP, a first chroma QP for the first region may be determined. A QP offset for a second region may be identified. A second luma QP for the second region may be determined based on the first luma QP and/or the QP offset for the second region. A second chroma QP of the second region may be determined based on the first chroma QP and/or the QP offset for the second region. An inverse quantization may be performed for the second region based on the second luma QP for the second region and/or the second chroma QP for the second region. The QP offset may be adapted based on a spherical sampling density.
H04N 19/176 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock
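The region-QP derivation above reduces to simple offset arithmetic. A minimal sketch, assuming (hypothetically) that the same offset applies to both luma and chroma and that QPs are clipped to the conventional [0, 51] range:

```python
def derive_region_qps(luma_qp_1, chroma_qp_1, qp_offset):
    """Derive the second region's luma and chroma QPs from the first
    region's QPs and a (e.g., sampling-density-adapted) QP offset."""
    clip = lambda q: max(0, min(51, q))  # conventional QP range (assumption)
    luma_qp_2 = clip(luma_qp_1 + qp_offset)
    chroma_qp_2 = clip(chroma_qp_1 + qp_offset)
    return luma_qp_2, chroma_qp_2
```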
A camera may be configured to provide a real-time image in a virtual reality (VR) session. A VR system may discover available external cameras. The VR system may determine positions of the external cameras. An active camera may be selected automatically based on at least one of the positions of the external cameras, a motion or environmental change, or an object detection. The active camera may be updated periodically based on one or more of user movement tracking information, user gesture, or a user input device. The VR system may establish a video connection with the active camera. Images from the active camera may be received and displayed during the VR session. The images may comprise an inset view showing a self-view of the VR user or an inset view showing the VR user's environment.
Systems, methods, and instrumentalities are disclosed for 360-degree video streaming. A video streaming device may receive a 360-degree video stream from a network node. The video streaming device may determine a viewport associated with the video streaming device and/or the 360-degree video stream. The video streaming device may determine (e.g., based on the viewport) to request in advance a first segment and a second segment of the 360-degree video stream. The video streaming device may determine a relative priority order for the first segment and the second segment. The video streaming device may generate an anticipated requests message. The anticipated requests message may indicate the determined relative priority order, for example, by listing the first segment and the second segment in decreasing relative priority based on the determined relative priority order. The video streaming device may send the anticipated requests message to the network node.
H04N 21/00 - Selective content distribution, e.g. interactive television or video on demand [VOD]
H04N 21/2343 - Processing of video elementary streams, e.g. splicing of video streams or manipulating MPEG-4 scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
H04N 21/218 - Source of audio or video content, e.g. local disk arrays
H04N 21/4728 - End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification or for manipulating displayed content for selecting a ROI [Region Of Interest], e.g. for requesting a higher resolution version of a selected region
H04N 21/6587 - Control parameters, e.g. trick play commands or viewpoint selection
Video tracking systems and methods are disclosed that allow for tracking of one or more objects in video even if the objects are occluded or otherwise unavailable to optical tracking methods. In one such method, video of a scene is captured with a camera-equipped device. A selected object in the captured video is optically tracked to determine an optically-tracked location within the captured video. A position and orientation of the camera is determined. The device wirelessly receives coordinates that indicate the position of the selected object. Based on the position and orientation of the camera, the received coordinates are mapped to a mapped location in the captured video, which may be represented by pixel coordinates. In response to a determination that the selected object is obscured in the captured video, the mapped location is used to track the selected object.
A system, method, and/or instrumentality may be provided for coding a 360-degree video. A picture of the 360-degree video may be received. The picture may include one or more faces associated with one or more projection formats. A first projection format indication may be received that indicates a first projection format may be associated with a first face. A second projection format indication may be received that indicates a second projection format may be associated with a second face. Based on the first projection format, a first transform function associated with the first face may be determined. Based on the second projection format, a second transform function associated with the second face may be determined. At least one decoding process may be performed on the first face using the first transform function and/or at least one decoding process may be performed on the second face using the second transform function.
H04N 19/597 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding specially adapted for multi-view video sequence encoding
H04N 19/563 - Motion estimation with padding, i.e. with filling of non-object values in an arbitrarily shaped picture block or region for estimation purposes
H04N 19/119 - Adaptive subdivision aspects e.g. subdivision of a picture into rectangular or non-rectangular coding blocks
H04N 19/593 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving spatial prediction techniques
H04N 13/00 - PICTORIAL COMMUNICATION, e.g. TELEVISION - Details thereof
76.
HIGHER-ORDER MOTION MODELS AND GRADUATED MOTION PARAMETER ESTIMATION FOR VIDEO CODING
Systems, methods, and instrumentalities are disclosed for higher-order motion models and graduated motion parameter estimation for video coding. Motion compensated prediction may be performed on a block level using one or more orthogonal basis functions (e.g., Legendre polynomial functions). A motion parameter count indication associated with a current block may be received. An order of an orthogonal basis function for motion modeling associated with the current block may be determined. The order of the orthogonal basis function may be determined based on the motion parameter count indication. Motion parameter values for the orthogonal basis function associated with the current block may be determined, and the current block may be predicted based on the orthogonal basis function having the determined motion parameter values.
H04N 19/105 - Selection of the reference unit for prediction within a chosen coding or prediction mode, e.g. adaptive choice of position and number of pixels used for prediction
H04N 19/176 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock
H04N 19/51 - Motion estimation or motion compensation
H04N 19/463 - Embedding additional information in the video signal during the compression process by compressing encoding parameters before transmission
H04N 19/136 - Incoming video signal characteristics or properties
H04N 19/182 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being a pixel
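The higher-order motion model above can be sketched as a two-dimensional Legendre expansion whose order is derived from the signaled motion parameter count. The separable coefficient-grid parameter layout below is an illustrative assumption.

```python
import numpy as np

def legendre_motion_field(params_x, params_y, width, height):
    """Evaluate a higher-order motion model built from Legendre polynomial
    basis functions (sketch). params_x/params_y are coefficient grids of
    shape (order+1, order+1) for the horizontal/vertical motion components."""
    # Normalize block coordinates to [-1, 1], the natural Legendre domain
    xs = np.linspace(-1.0, 1.0, width)
    ys = np.linspace(-1.0, 1.0, height)
    order = params_x.shape[0] - 1
    # Legendre basis values per coordinate and per order
    Px = np.polynomial.legendre.legvander(xs, order)  # (W, order+1)
    Py = np.polynomial.legendre.legvander(ys, order)  # (H, order+1)
    # Motion components as a 2-D expansion: Py @ C @ Px^T
    mvx = Py @ params_x @ Px.T
    mvy = Py @ params_y @ Px.T
    return mvx, mvy  # each of shape (H, W)
```

With order 0 this degenerates to a pure translation; higher orders add affine-like and curved motion terms.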
A coding device (e.g., that may be or may include an encoder and/or a decoder) may receive a frame-packed picture of 360-degree video. The coding device may identify a face in the frame-packed picture that the current block belongs to. The coding device may determine that a current block is located at a boundary of the face that the current block belongs to. The coding device may identify multiple spherical neighboring blocks of the current block. The coding device may identify a cross-face boundary neighboring block. The coding device may identify a block in the frame-packed picture that corresponds to the cross-face boundary neighboring block. The coding device may determine whether to use the identified block to code the current block based on availability of the identified block. The coding device may code the current block based on the determination to use the identified block.
H04N 19/597 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding specially adapted for multi-view video sequence encoding
H04N 19/105 - Selection of the reference unit for prediction within a chosen coding or prediction mode, e.g. adaptive choice of position and number of pixels used for prediction
H04N 19/176 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock
H04N 19/61 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding in combination with predictive coding
H04N 19/593 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving spatial prediction techniques
H04N 19/167 - Position within a video image, e.g. region of interest [ROI]
78.
METRICS AND MESSAGES TO IMPROVE EXPERIENCE FOR 360-DEGREE ADAPTIVE STREAMING
A method for receiving and displaying media content may be provided. The method may include requesting a set of DASH video segments that are associated with various viewports and qualities. The method may include displaying the DASH video segments. The method may include determining a latency metric based on a time difference between the display of a DASH video segment and one of: a device beginning to move, the device ceasing to move, the device determining that the device has begun to move, the device determining that the device has stopped moving, or the display of a different DASH video segment. The different DASH video segment may be associated with one or more of a different quality or a different viewport.
H04L 29/06 - Communication control; Communication processing characterised by a protocol
G06F 3/01 - Input arrangements or combined input and output arrangements for interaction between user and computer
G06F 3/0481 - Interaction techniques based on graphical user interfaces [GUI] based on specific properties of the displayed interaction object or a metaphor-based environment, e.g. interaction with desktop elements like windows or icons, or assisted by a cursor's changing behaviour or appearance
H04N 21/442 - Monitoring of processes or resources, e.g. detecting the failure of a recording device, monitoring the downstream bandwidth, the number of times a movie has been viewed or the storage space available from the internal hard disk
H04N 21/4728 - End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification or for manipulating displayed content for selecting a ROI [Region Of Interest], e.g. for requesting a higher resolution version of a selected region
H04N 21/6587 - Control parameters, e.g. trick play commands or viewpoint selection
H04N 19/597 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding specially adapted for multi-view video sequence encoding
H04N 13/344 - Displays for viewing with the aid of special glasses or head-mounted displays [HMD] with head-mounted left-right displays
H04N 13/117 - Transformation of image signals corresponding to virtual viewpoints, e.g. spatial image interpolation the virtual viewpoint locations being selected by the viewers or determined by viewer tracking
79.
PREDICTIVE CODING FOR 360-DEGREE VIDEO BASED ON GEOMETRY PADDING
A video coding system (e.g., an encoder and/or a decoder) may perform face-based sub-block motion compensation for 360-degree video to predict samples (e.g., of a sub-block). The video coding system may receive a 360-degree video content. The 360-degree video content may include a current block. The current block may include a plurality of sub-blocks. The system may determine whether a sub-block mode is used for the current block. The system may predict a sample in the current block based on the sub-block level face association. For a first sub-block in the current block, the system may identify a first location of the first sub-block. The system may associate the first sub-block with a first face based on the identified first location of the first sub-block. The system may predict a first sample in the first sub-block based on the first face that is associated with the first sub-block.
H04N 19/563 - Motion estimation with padding, i.e. with filling of non-object values in an arbitrarily shaped picture block or region for estimation purposes
H04N 19/597 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding specially adapted for multi-view video sequence encoding
H04N 19/105 - Selection of the reference unit for prediction within a chosen coding or prediction mode, e.g. adaptive choice of position and number of pixels used for prediction
H04N 19/176 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock
80.
FLOATING POINT TO INTEGER CONVERSION FOR 360-DEGREE VIDEO PROJECTION FORMAT CONVERSION AND SPHERICAL METRICS CALCULATION
A system, method, and/or instrumentality may convert content of a first projection format to content of a second projection format. A sample position associated with the content of the first projection format may be identified and/or represented as a floating point value. A scaling factor for converting the floating point value to a fixed point value may be identified. The scaling factor may be less than a scaling limit divided by a floating point computation precision limit. The fixed point value may be converted to an integer value. The integer value may be the top-left integer sampling position of the fixed point value. An interpolation filter coefficient may be determined based on a distance between the fixed point value and the integer value. The content of the first projection format may be converted to the content of the second projection format based on the interpolation filter coefficient.
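The conversion above can be sketched as follows. The 14-bit scaling factor is an illustrative assumption chosen to stay within the description's precision constraint; the fractional phase returned here is the distance that would select the interpolation filter coefficient.

```python
def to_fixed_point(fp_pos, scale_bits=14):
    """Convert a floating-point sampling position to its top-left integer
    sampling position plus a fractional phase (sketch)."""
    scale = 1 << scale_bits                    # scaling factor (assumption)
    fixed = int(round(fp_pos * scale))         # fixed-point value
    top_left = fixed >> scale_bits             # top-left integer position (floor)
    phase = fixed - (top_left << scale_bits)   # distance to the integer position
    return top_left, phase / scale
```

The arithmetic right shift floors toward negative infinity, so positions left of zero still map to the correct top-left integer sample.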
Systems and methods are described for video encoding for devices equipped with two video cameras, particularly where one of the video cameras is a zoom camera. Videos of a scene are simultaneously captured from both video cameras. Motion information (such as a motion field and/or motion vectors) collected from one video stream is used for the encoding of the other. For example, a motion vector from one video may be transformed into a grid of the other video. The transformed motion vector may be used to predict a block of pixels in the other video, or it may be used as a candidate or starting point in an algorithm for selecting a motion vector. The transformation of the motion vector may comprise aligning and scaling the vector, or other linear or nonlinear transformations may be used.
H04N 19/597 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding specially adapted for multi-view video sequence encoding
H04N 19/56 - Motion estimation with initialisation of the vector search, e.g. estimating a good candidate to initiate a search
H04N 19/30 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability
H04N 19/194 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the adaptation method, adaptation tool or adaptation type used for the adaptive coding the adaptation method, adaptation tool or adaptation type being iterative or recursive involving only two passes
H04N 13/25 - Image signal generators using stereoscopic image cameras using image signals from one sensor to control the characteristics of another sensor
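The align-and-scale transformation mentioned above can be sketched with a linear grid mapping. Note that for a displacement (motion) vector the alignment offset cancels between the vector's two endpoints, leaving a pure scaling; the function names and parameters are illustrative.

```python
def map_position(p, zoom_ratio, offset):
    """Map a pixel position from one camera's grid into the other's
    (align-and-scale, sketch)."""
    return (p[0] * zoom_ratio + offset[0], p[1] * zoom_ratio + offset[1])

def map_motion_vector(mv, zoom_ratio):
    """Map a motion vector between the two grids: under an align-and-scale
    mapping the offset cancels, so only the zoom ratio applies."""
    return (mv[0] * zoom_ratio, mv[1] * zoom_ratio)
```

The mapped vector could then seed the other encoder's motion search as a candidate or starting point, as the description suggests.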
82.
ZOOM CODING USING SIMULTANEOUS AND SYNCHRONOUS MULTIPLE-CAMERA CAPTURES
Systems and methods described herein disclose use of simultaneous and synchronous multiple-camera captures for a zoom region. An exemplary method using two field of view (FOV) video streams of a scene, where the second FOV is narrower than the first, comprises: tracking an object captured within the first FOV; responsive to determining that the object is entirely within the second FOV, outputting video corresponding to the second FOV; and responsive to determining that the object is outside the second FOV, outputting a cropped and up-scaled representation of video corresponding to the first FOV. Systems and methods disclosed herein, prior to tracking the object, display video captured for the first FOV and receive user input indicating selection of an object to be tracked in the displayed video for the first FOV.
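The output-selection step above is a rectangle-containment test. A minimal sketch, with rectangles given as (x, y, w, h) and the stream labels standing in for the actual video outputs:

```python
def rect_contains(outer, inner):
    # True when `inner` lies entirely within `outer`; both are (x, y, w, h)
    ox, oy, ow, oh = outer
    ix, iy, iw, ih = inner
    return ox <= ix and oy <= iy and ix + iw <= ox + ow and iy + ih <= oy + oh

def choose_stream(obj_bbox, narrow_fov):
    """Output the narrow-FOV capture while the tracked object fits inside
    it; otherwise fall back to a cropped, up-scaled wide-FOV view."""
    return "narrow" if rect_contains(narrow_fov, obj_bbox) else "wide_cropped"
```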
Systems and methods are described for enabling a consumer of streaming video to obtain different views of the video, such as zoomed views of one or more objects of interest. In an exemplary embodiment, a client device receives an original video stream along with data identifying objects of interest and their spatial locations within the original video. In one embodiment, in response to user selection of an object of interest, the client device switches to display of a cropped and scaled version of the original video to present a zoomed video of the object of interest. The zoomed video tracks the selected object even as the position of the selected object changes with respect to the original video. In some embodiments, the object of interest and the appropriate zoom factor are both selected with a single expanding-pinch gesture on a touch screen.
H04N 21/234 - Processing of video elementary streams, e.g. splicing of video streams or manipulating MPEG-4 scene graphs
H04N 21/2343 - Processing of video elementary streams, e.g. splicing of video streams or manipulating MPEG-4 scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
H04N 21/236 - Assembling of a multiplex stream, e.g. transport stream, by combining a video stream with other content or additional data, e.g. inserting a URL [Uniform Resource Locator ] into a video stream, multiplexing software data into a video stream; Remultiplexing of multiplex streams; Insertion of stuffing bits into the multiplex stream, e.g. to obtain a constant bit-rate; Assembling of a packetised elementary stream
H04N 21/4402 - Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to MPEG-4 scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display
H04N 21/4728 - End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification or for manipulating displayed content for selecting a ROI [Region Of Interest], e.g. for requesting a higher resolution version of a selected region
H04N 21/845 - Structuring of content, e.g. decomposing content into time segments
Intra planar approach(es) may be used to predict a pixel(s) in a current block. The current block may be associated with a reconstructed left reference line, a reconstructed top reference line, and a non-reconstructed reference line to be predicted. The reconstructed reference lines may have been decoded and may be available. The non-reconstructed reference lines to be predicted may include a non-reconstructed right and/or a non-reconstructed bottom reference line. A pivot reference pixel may be identified and may be located on an extension of the reconstructed left and/or top reference lines. A reference pixel may be determined and may be located on the reconstructed top and/or left reference lines. Pixels on the non-reconstructed reference line(s) may be predicted based on the pivot reference pixel and the reference pixel. Pixels of the current block may be predicted using the predicted pixels on the right and the bottom reference lines.
H04N 19/593 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving spatial prediction techniques
H04N 19/11 - Selection of coding mode or of prediction mode among a plurality of spatial predictive coding modes
H04N 19/182 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being a pixel
H04N 19/59 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving spatial sub-sampling or interpolation, e.g. alteration of picture size or resolution
H04N 19/105 - Selection of the reference unit for prediction within a chosen coding or prediction mode, e.g. adaptive choice of position and number of pixels used for prediction
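The planar approach above can be sketched in two stages: first fill the non-reconstructed right and bottom reference lines from a pivot pixel, then bilinearly interpolate each block pixel from the four reference lines. The choice of pivot (average of the extension pixels here) and the integer blending are illustrative assumptions; `top` and `left` are reference arrays of length N+1, including one extension pixel each.

```python
import numpy as np

def planar_predict(top, left, size):
    """Simplified intra planar prediction with predicted right/bottom
    reference lines (sketch)."""
    N = size
    pivot = (int(top[N]) + int(left[N])) // 2  # pivot on the extended lines (assumption)
    # Predict the right and bottom reference lines as linear blends toward the pivot
    right = [(int(top[N]) * (N - 1 - y) + pivot * (y + 1)) // N for y in range(N)]
    bottom = [(int(left[N]) * (N - 1 - x) + pivot * (x + 1)) // N for x in range(N)]
    pred = np.zeros((N, N), dtype=np.int32)
    for y in range(N):
        for x in range(N):
            h = (N - 1 - x) * int(left[y]) + (x + 1) * right[y]   # horizontal blend
            v = (N - 1 - y) * int(top[x]) + (y + 1) * bottom[x]   # vertical blend
            pred[y, x] = (h + v) // (2 * N)
    return pred
```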
85.
GEOMETRY CONVERSION AND FRAME PACKING ASSOCIATED WITH 360-DEGREE VIDEOS
Conversion between different projection formats of a 360-degree video may be performed in a uniform way. The geometric characteristics of the different projection formats may be considered when applying 3D-to-2D and 2D-to-3D mapping. Parameters reflective of the geometric characteristics of the different projection formats may be determined and used in the mapping and/or conversion. The parameters may include a normal vector that is perpendicular to a projection plane, a reference point in the projection plane, and/or unit vectors defined in the projection plane. An architecture with consolidated modules for handling the various projection formats may be provided.
A client device adaptively streams a 360-degree video. A first segment is displayed based on a first viewing direction at a first time, where the first viewing direction is associated with a first viewport. The client requests a first base buffer segment based on the first viewport. The first base buffer segment has a presentation time after the first segment. At a second time, the viewing direction changes to a second viewing direction associated with a second viewport. The client requests, prior to the presentation time, a first viewport buffer segment based on the second viewport, with the same presentation time. The client device displays a second segment at the presentation time, wherein the second segment is either the first viewport buffer segment or the first base buffer segment. The client provides reports on viewport switching latency and on the most-requested segments.
Systems and methods are described to enable video clients to zoom in to a region or object of interest without substantial loss of resolution. In an exemplary method, a server transmits a manifest, such as a DASH MPD, to a client device. The manifest identifies a plurality of sub-streams, where each sub-stream represents a respective spatial portion of a source video. The manifest also includes information associating an object of interest with a plurality of the spatial portions. To view high-quality zoomed video, the client requests the sub-streams that are associated with the object of interest and renders the requested sub-streams. In some embodiments, different sub-streams are available with different zoom ratios.
H04N 21/2343 - Processing of video elementary streams, e.g. splicing of video streams or manipulating MPEG-4 scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
H04L 29/06 - Communication control; Communication processing characterised by a protocol
H04N 21/262 - Content or additional data distribution scheduling, e.g. sending additional data at off-peak times, updating software modules, calculating the carousel transmission frequency, delaying a video stream transmission or generating play-lists
H04N 21/2662 - Controlling the complexity of the video stream, e.g. by scaling the resolution or bitrate of the video stream based on the client capabilities
H04N 21/414 - Specialised client platforms, e.g. receiver in car or embedded in a mobile appliance
H04N 21/431 - Generation of visual interfaces; Content or additional data rendering
H04N 21/4402 - Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to MPEG-4 scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display
H04N 21/462 - Content or additional data management e.g. creating a master electronic program guide from data received from the Internet and a Head-end or controlling the complexity of a video stream by scaling the resolution or bit-rate based on the client capabilities
H04N 21/4728 - End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification or for manipulating displayed content for selecting a ROI [Region Of Interest], e.g. for requesting a higher resolution version of a selected region
H04N 21/6377 - Control signals issued by the client directed to the server or network components directed to server
H04N 21/658 - Transmission by the client directed to the server
H04N 21/6587 - Control parameters, e.g. trick play commands or viewpoint selection
H04N 21/845 - Structuring of content, e.g. decomposing content into time segments
H04N 19/70 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards
H04N 19/46 - Embedding additional information in the video signal during the compression process
H04N 19/10 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
A normal broadcast video viewing experience may be augmented by providing access to enhanced views, such as zoomed or highlighted views of particular regions of interest, or partial or complete views of content with high resolution, high frame rate, high bit depth, or customized tone mapping. Such enhanced views, or zoom coded streams, may be made available over a source other than broadcast, such as a packet-switched network. Information, such as metadata, identifying the available zoom coded streams may be provided in-band in the broadcast video. A second video stream may be requested over the network using the received metadata. The second video stream may be received over the network and then displayed.
H04N 21/4728 - End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification or for manipulating displayed content for selecting a ROI [Region Of Interest], e.g. for requesting a higher resolution version of a selected region
H04N 21/858 - Linking data to content, e.g. by linking an URL to a video object or by creating a hotspot
H04N 21/236 - Assembling of a multiplex stream, e.g. transport stream, by combining a video stream with other content or additional data, e.g. inserting a URL [Uniform Resource Locator ] into a video stream, multiplexing software data into a video stream; Remultiplexing of multiplex streams; Insertion of stuffing bits into the multiplex stream, e.g. to obtain a constant bit-rate; Assembling of a packetised elementary stream
H04N 21/262 - Content or additional data distribution scheduling, e.g. sending additional data at off-peak times, updating software modules, calculating the carousel transmission frequency, delaying a video stream transmission or generating play-lists
H04N 21/61 - Network physical structure; Signal processing
H04N 21/43 - Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronizing decoder's clock; Client middleware
H04N 21/434 - Disassembling of a multiplex stream, e.g. demultiplexing audio and video streams or extraction of additional data from a video stream; Remultiplexing of multiplex streams; Extraction or processing of SI; Disassembling of packetised elementary stream
89.
METHODS AND APPARATUS FOR CODED BLOCK FLAG CODING IN QUAD-TREE PLUS BINARY-TREE BLOCK PARTITIONING
Systems and methods are proposed herein for coded block flag (CBF) signaling. In some embodiments, a hierarchical signaling method is used to signal the CBFs of chroma components for the quad-tree plus binary tree (QTBT) structure. A CBF flag may be signaled at each QTBT node level for each chroma component, indicating whether any descendent QTBT leaf node under the current level is associated with a non-zero coefficient. In some embodiments, for inter-coded pictures, a flag at the QTBT root node may indicate whether there are non-zero transform coefficients in the descendent leaf nodes that originate from the current root node. When the flag is equal to 1, the coefficients of the descendent leaf nodes under the current node may be signaled; otherwise, no further residual information is transmitted and all the transform coefficients are inferred to be 0.
H04N 19/70 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards
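The hierarchical CBF inference described in the abstract above can be sketched in a few lines. This is an illustrative model, not the actual QTBT parsing process: the `Node` class, `read_flag` callback, and `has_residual` field are hypothetical names, and entropy decoding is reduced to pulling bits from an iterator.

```python
class Node:
    """A QTBT node; leaves correspond to transform blocks."""
    def __init__(self, children=None):
        self.children = children or []
        self.has_residual = None

    def is_leaf(self):
        return not self.children

    def leaves(self):
        if self.is_leaf():
            return [self]
        out = []
        for child in self.children:
            out.extend(child.leaves())
        return out


def decode_cbf_tree(node, read_flag):
    """Decode one CBF per node, top-down. A zero flag means no descendant
    leaf carries non-zero coefficients, so all of them are inferred to
    have no residual and no further flags are parsed under this node."""
    if read_flag() == 0:
        for leaf in node.leaves():
            leaf.has_residual = False
        return
    if node.is_leaf():
        node.has_residual = True  # residual coefficients would follow
    else:
        for child in node.children:
            decode_cbf_tree(child, read_flag)


# Demo: root splits into a two-leaf subtree (CBF 0) and a leaf (CBF 1).
a1, a2, b = Node(), Node(), Node()
root = Node([Node([a1, a2]), b])
flags = iter([1, 0, 1])          # preorder: root, left subtree, leaf b
decode_cbf_tree(root, lambda: next(flags))
```

With these flags, both leaves under the zero-flagged subtree are inferred residual-free without consuming any further bits, while the leaf with CBF 1 is marked as carrying coefficients.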
Coding techniques for 360-degree video are described. An encoder selects a projection format and maps the 360-degree video to a 2D planar video using the selected projection format. The encoder encodes the 2D planar video in a bitstream and further signals, in the bitstream, parameters identifying the projection format. The parameters identifying the projection format may be signaled in a video parameter set, sequence parameter set, and/or picture parameter set of the bitstream. Different projection formats that may be signaled include formats using geometries such as equirectangular, cubemap, equal-area, octahedron, icosahedron, cylinder, and user-specified polygon. Other parameters that may be signaled include different arrangements of geometric faces or different encoding quality for different faces. Corresponding decoders are also described. In some embodiments, projection parameters may further include relative geometry rotation parameters that define an orientation of the projection geometry.
H04N 19/597 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding specially adapted for multi-view video sequence encoding
H04N 19/46 - Embedding additional information in the video signal during the compression process
H04N 19/167 - Position within a video image, e.g. region of interest [ROI]
Secondary content such as an advertisement may be inserted based on users' interests in 360 degree video streaming. Users may have different interests and may watch different areas within a 360 degree video. Information about the area(s) of 360 degree scenes that users watch the most may be used to select ad(s) relevant to their interests. One or more secondary content viewports may be defined within a 360 degree video frame. Secondary content viewport parameter(s) may be tracked. For example, statistics of the user's head orientation for some time leading up to the presentation of the ad(s) may be collected. Secondary content may be determined based on the tracked secondary content viewport parameter(s).
G06T 19/00 - Manipulating 3D models or images for computer graphics
H04N 21/258 - Client or end-user data management, e.g. managing client capabilities, user preferences or demographics or processing of multiple end-users preferences to derive collaborative data
92.
SYSTEMS AND METHODS FOR INTEGRATING AND DELIVERING OBJECTS OF INTEREST IN VIDEO
Systems and methods are described for providing clear areas related to objects of interest in a video display. In accordance with an embodiment, a method includes capturing, with a camera, a video frame of a scene; determining a camera orientation and camera location of the camera capturing the video; determining a location of an object of interest; mapping the location of the object of interest to a location on the video frame; determining an object-of-interest area based on the location of the object of interest on the video frame; determining a clear area on the video frame; transmitting a location of the clear area to a client device; and displaying the video frame and metadata associated with the object of interest in the clear area.
H04N 21/431 - Generation of visual interfaces; Content or additional data rendering
H04N 21/44 - Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to MPEG-4 scene graphs
H04N 21/858 - Linking data to content, e.g. by linking an URL to a video object or by creating a hotspot
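The step of mapping an object-of-interest location onto the video frame can be illustrated with a minimal pinhole-camera model. The function name, the yaw-only orientation (no pitch or roll), and the field-of-view parameterisation are assumptions for illustration, not the patented method.

```python
import math

def map_object_to_frame(obj, cam, yaw_deg, fov_deg, w, h):
    """Project a world-space object location to pixel coordinates:
    translate into the camera frame, rotate by the camera yaw about the
    vertical axis, then project with a focal length derived from the
    horizontal field of view. Returns None if the object is behind the
    camera."""
    dx, dy, dz = (obj[i] - cam[i] for i in range(3))
    yaw = math.radians(yaw_deg)
    x = dx * math.cos(yaw) - dz * math.sin(yaw)
    z = dx * math.sin(yaw) + dz * math.cos(yaw)
    if z <= 0:
        return None  # behind the camera plane
    f = (w / 2) / math.tan(math.radians(fov_deg) / 2)  # focal length, pixels
    return (w / 2 + f * x / z, h / 2 - f * dy / z)

# An object 10 units straight ahead of an un-rotated camera lands at the
# frame centre.
centre = map_object_to_frame((0, 0, 10), (0, 0, 0), 0, 90, 1000, 500)
```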
93.
QUALITY EVALUATION SYSTEM AND METHOD FOR 360-DEGREE VIDEO
Systems and methods are described herein for determining a distortion metric for encoding of spherical video. In spherical video, there is a mapping between a given geometry of samples and respective points on a unit sphere. In some embodiments, distortion is measured at each sample of interest, and the distortion of each sample is weighted by the area on the unit sphere associated with the sample. In some embodiments, a plurality of points on the unit sphere are selected, and the points are mapped to a nearest sample on the given geometry. Distortion is calculated at the nearest sample points and is weighted by a latitude-dependent weighting based on the latitude of the respective nearest sample point. The latitude-dependent weighting may be based on a viewing probability for that latitude.
H04N 19/597 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding specially adapted for multi-view video sequence encoding
H04N 19/154 - Measured or subjectively estimated visual quality after decoding, e.g. measurement of distortion
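The area-weighted distortion described above can be sketched for the equirectangular case, where the area a sample covers on the unit sphere is proportional to the cosine of its latitude (the weighting also used in WS-PSNR). A minimal sketch, assuming an equirectangular layout and plain Python lists:

```python
import math

def weighted_mse_erp(ref, dist):
    """Weighted MSE over an equirectangular frame: each sample's squared
    error is weighted by cos(latitude), i.e. by the relative area the
    sample covers on the unit sphere, then normalised by the weight sum."""
    h, w = len(ref), len(ref[0])
    num = den = 0.0
    for j in range(h):
        # Row j sits at latitude (j + 0.5 - h/2) * pi / h on the sphere.
        weight = math.cos((j + 0.5 - h / 2) * math.pi / h)
        for i in range(w):
            num += weight * (ref[j][i] - dist[j][i]) ** 2
            den += weight
    return num / den
```

A uniform error yields the same value as unweighted MSE, while errors near the poles (where equirectangular frames over-sample the sphere) are down-weighted.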
Processing video data may include capturing the video data with multiple cameras and stitching the video data together to obtain a 360-degree video. A frame-packed picture may be provided based on the captured and stitched video data. A current sample location may be identified in the frame-packed picture. Whether a neighboring sample location is located outside of a content boundary of the frame-packed picture may be determined. When the neighboring sample location is located outside of the content boundary, a padding sample location may be derived based on at least one circular characteristic of the 360-degree video content and the projection geometry. The 360-degree video content may be processed based on the padding sample location.
H04N 19/563 - Motion estimation with padding, i.e. with filling of non-object values in an arbitrarily shaped picture block or region for estimation purposes
H04N 19/593 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving spatial prediction techniques
G06T 17/10 - Volume description, e.g. cylinders, cubes or using CSG [Constructive Solid Geometry]
G06T 17/30 - Surface description, e.g. polynomial surface description
H04N 13/00 - PICTORIAL COMMUNICATION, e.g. TELEVISION - Details thereof
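For the equirectangular case, the circular characteristics mentioned in the abstract above reduce to two rules: longitude wraps modulo the picture width, and crossing a pole reflects the latitude while shifting the longitude by half the width. A minimal sketch of deriving a padding sample location (the function name and integer arithmetic are illustrative assumptions):

```python
def erp_padding_location(x, y, w, h):
    """Map a neighbouring sample location (x, y) that may fall outside a
    w-by-h equirectangular picture to a valid padding location using the
    spherical wrap-around of 360-degree content."""
    if y < 0:                 # crossed the north pole
        y = -y - 1
        x += w // 2           # opposite longitude
    elif y >= h:              # crossed the south pole
        y = 2 * h - 1 - y
        x += w // 2
    x %= w                    # longitude wraps around the picture
    return x, y
```

For example, stepping one sample left of column 0 lands on the rightmost column of the same row, and stepping one sample above row 0 lands on row 0 at the opposite longitude.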
95.
SYSTEMS AND METHODS FOR REGION-OF-INTEREST TONE REMAPPING
Systems and methods are described for providing viewers of adaptive bit rate (ABR) streaming video with the option to view alternative streams in which an alternative tone mapping is applied to one or more regions of interest. The availability of streams with alternative tone mappings may be identified in a media presentation description (MPD) in an MPEG-DASH system. In some embodiments, the streaming video is divided into slices, and alternative tone mappings are applied to regions of interest within the slices. When a server receives a request from a client device for alternative tone mappings of different regions, slices with the appropriate mapping may be assembled on demand and delivered to the requestor as a single video stream. Tone mappings may be used, for example, to highlight particular players in a sporting event.
H04N 21/2343 - Processing of video elementary streams, e.g. splicing of video streams or manipulating MPEG-4 scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
H04N 19/20 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using video object coding
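The per-region remapping idea above can be sketched as applying an alternative transfer curve inside a rectangular region of interest while leaving the rest of the frame on the default mapping. The function and the example curve are illustrative assumptions, not the patented pipeline:

```python
def remap_roi(frame, roi, curve):
    """Apply an alternative tone-mapping curve to a rectangular region of
    interest. `frame` is a 2-D list of 8-bit luma values; `roi` is
    (x0, y0, x1, y1) with exclusive upper bounds; pixels outside the ROI
    keep their original values."""
    x0, y0, x1, y1 = roi
    out = [row[:] for row in frame]   # leave the input frame untouched
    for y in range(y0, y1):
        for x in range(x0, x1):
            out[y][x] = curve(out[y][x])
    return out

# Example curve: boost mid-tones (e.g. to highlight a player), clamped
# to the 8-bit range.
boost = lambda v: min(255, int(v * 1.5))
```

In the described system a server would assemble slices carrying such remapped regions on demand into a single delivered stream.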
Systems and methods described herein relate to providing fast switching between different available video streams. In an exemplary embodiment, a user viewing a selected channel of video content receives a manifest file (such as a DASH MPD) that identifies various representations of the selected channel. The manifest file also identifies channel-change streams for one or more alternate channels. The channel-change streams may have a shorter segment size than regular streaming content. While displaying the selected content, a client also retrieves the channel-change streams of the alternate channels. If the client changes to one of the alternate channels, the client displays the appropriate channel-change stream while a regular representation of the alternate channel is being retrieved.
H04N 21/2343 - Processing of video elementary streams, e.g. splicing of video streams or manipulating MPEG-4 scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
H04N 21/235 - Processing of additional data, e.g. scrambling of additional data or processing content descriptors
H04N 21/266 - Channel or content management, e.g. generation and management of keys and entitlement messages in a conditional access system or merging a VOD unicast channel into a multicast channel
H04N 21/435 - Processing of additional data, e.g. decrypting of additional data or reconstructing software from modules extracted from the transport stream
H04N 21/438 - Interfacing the downstream path of the transmission network originating from a server, e.g. retrieving MPEG packets from an IP network
H04N 21/462 - Content or additional data management e.g. creating a master electronic program guide from data received from the Internet and a Head-end or controlling the complexity of a video stream by scaling the resolution or bit-rate based on the client capabilities
H04N 21/4728 - End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification or for manipulating displayed content for selecting a ROI [Region Of Interest], e.g. for requesting a higher resolution version of a selected region
H04N 21/61 - Network physical structure; Signal processing
Processing a 360-degree video content for video coding may include receiving the video content in a first geometry. The video content may include unaligned chroma and luma components associated with a first chroma sampling scheme. The unaligned chroma and luma components may be aligned to a sampling grid associated with a second chroma sampling scheme that has aligned chroma and luma components. A geometric conversion may be performed on the video content: the video content in the first geometry, which may comprise the aligned chroma and luma components, may be converted to a second geometry. The first geometry may be a stitched geometry, and the second geometry may be a coding geometry. The converted video content in the second geometry may include the chroma and luma components aligned to the sampling grid associated with the second chroma sampling scheme.
H04N 19/597 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding specially adapted for multi-view video sequence encoding
H04N 19/186 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being a colour or a chrominance component
H04N 19/59 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving spatial sub-sampling or interpolation, e.g. alteration of picture size or resolution
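Aligning an off-grid chroma component to an aligned sampling grid amounts to a fractional-phase resampling. A minimal one-dimensional sketch using linear interpolation and a half-sample phase (as in the vertical offset of common 4:2:0 grids); a real converter would use longer interpolation filters, and the function name is an assumption:

```python
def align_chroma_rows(chroma, phase=0.5):
    """Resample a 1-D list of chroma samples sitting `phase` of a sample
    off the target grid onto aligned positions, via linear interpolation
    between each sample and its lower neighbour (edge sample repeated)."""
    out = []
    for i in range(len(chroma)):
        nxt = chroma[min(i + 1, len(chroma) - 1)]  # clamp at the border
        out.append((1 - phase) * chroma[i] + phase * nxt)
    return out
```

With the default half-sample phase each output value is the midpoint of two neighbouring input samples, which moves the chroma grid onto the luma-aligned positions before the geometric conversion step.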
98.
METHODS AND APPARATUS OF VIEWPORT ADAPTIVE 360 DEGREE VIDEO DELIVERY
Systems, methods, and instrumentalities are disclosed for client-centric service quality control. A first viewport of a 360 degree video may be determined. The 360 degree video may comprise one or more of an equirectangular, a cube-map, a cylindrical, a pyramidal, and/or a spherical projection mapping. The first viewport may be associated with a spatial region of the 360 degree video. An adjacent area that extends around the spatial region may be determined. A second viewport of the 360 degree video may be determined. A bitstream associated with the 360 degree video may be received. One or more enhanced regions may be included in the bitstream. The one or more enhanced regions may correspond to the first and/or second viewport. A high coding bitrate may be associated with the first viewport and/or the second viewport.
H04N 19/597 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding specially adapted for multi-view video sequence encoding
H04N 19/70 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards
H04N 21/2343 - Processing of video elementary streams, e.g. splicing of video streams or manipulating MPEG-4 scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
H04N 21/8543 - Content authoring using a description language, e.g. MHEG [Multimedia and Hypermedia information coding Expert Group] or XML [eXtensible Markup Language]
Systems, methods, and instrumentalities are disclosed for managing a service quality for data consumption with a wireless transmit/receive unit (WTRU), comprising determining a cost associated with obtaining the data, determining an amount of unused data in a monthly data plan, determining a preference for a content type related to the data, determining an amount of congestion in a network over which the data will be received, determining a desired service quality value based upon the cost, unused data, preference, and network congestion, comparing the desired service quality value to a set of representations of the data, wherein each of the representations is associated with a different service quality (for example, each of the representations may have an associated bitrate, and each bitrate may be associated with a different service quality), and requesting the data at a representation having a quality closest to the desired service quality value.
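The selection logic described above — combine cost, unused data, preference, and congestion into a desired quality value, then request the representation closest to it — can be sketched as follows. The weights and the normalised quality scale are invented for illustration and are not from the disclosure:

```python
def desired_quality(cost, unused_data, preference, congestion):
    """Fold the four factors (each normalised to [0, 1]) into one desired
    service quality value: more unused data and stronger preference push
    quality up; higher cost and congestion push it down. The weights are
    illustrative assumptions."""
    score = 0.4 * unused_data + 0.3 * preference - 0.2 * cost - 0.1 * congestion
    return max(0.0, min(1.0, score))

def pick_representation(desired, representations):
    """Return the (bitrate, quality) pair whose quality value is closest
    to the desired service quality value."""
    return min(representations.items(), key=lambda kv: abs(kv[1] - desired))

# Three available representations, each bitrate mapped to a quality value.
reps = {500: 0.3, 1500: 0.6, 3000: 0.9}
```

For a user with plenty of unused data and a strong content preference but some cost sensitivity, the mid-bitrate representation ends up closest to the desired value.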
Systems and methods are described for enabling a client device to request video streams with different bit depth remappings for different viewing conditions. In an embodiment, information indicating the availability of additional remapped profiles is sent in a manifest file. Alternative bit-depth remappings may be optimized for different regions of interest in the image or video content, or for different viewing conditions, such as different display technologies and different ambient illumination. Some embodiments based on the DASH protocol perform multiple depth mappings at the encoder and also perform ABR-encoding for distribution. The manifest file contains information indicating additional remapping profiles. The remapping profiles are associated with different transformation functions used to convert from a higher bit-depth to a lower bit-depth.
H04N 21/2343 - Processing of video elementary streams, e.g. splicing of video streams or manipulating MPEG-4 scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
H04N 21/258 - Client or end-user data management, e.g. managing client capabilities, user preferences or demographics or processing of multiple end-users preferences to derive collaborative data
H04N 19/102 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
H04N 19/16 - Assigned coding mode, i.e. the coding mode being predefined or preselected to be further used for selection of another element or parameter for a given display mode, e.g. for interlaced or progressive display mode
H04N 19/179 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being a scene or a shot
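The transformation functions mentioned above, which convert content from a higher bit depth to a lower one, can be sketched as a normalise / curve / requantise pipeline. A minimal sketch; the linear default and any particular curve choice are illustrative assumptions, not the profiles a real manifest would describe:

```python
def remap_bit_depth(samples, src_bits, dst_bits, curve=None):
    """Convert integer samples from `src_bits` to `dst_bits` of precision
    through an optional transfer curve defined on [0, 1]. With no curve
    this is plain linear requantisation; a curve can allocate more output
    code values to the tonal range a given viewing condition needs
    (e.g. shadows on a bright display)."""
    src_max = (1 << src_bits) - 1
    dst_max = (1 << dst_bits) - 1
    curve = curve or (lambda v: v)          # identity: linear remapping
    return [round(curve(s / src_max) * dst_max) for s in samples]

# An alternative remapping profile could e.g. brighten shadows with a
# gamma-style curve before requantising 10-bit content to 8 bits.
shadow_boost = lambda v: v ** (1 / 2.2)
low = remap_bit_depth([64, 512, 1023], 10, 8, shadow_boost)
```

Each remapping profile advertised in the manifest would correspond to a different `curve`, letting the client request the variant suited to its display and ambient illumination.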