An electronic device for encoding a picture is described. The electronic device includes a processor and memory in electronic communication with the processor, the memory storing instructions. The instructions are executable to encode a step-wise temporal sub-layer access (STSA) sample grouping. The instructions are further executable to send and/or store the STSA sample grouping.
H04N 19/31 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability in the temporal domain
H04N 19/172 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a picture, frame or field
H04N 21/2343 - Processing of video elementary streams, e.g. splicing of video streams or manipulating MPEG-4 scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
H04N 21/235 - Processing of additional data, e.g. scrambling of additional data or processing content descriptors
2.
SIGNALING SCALABILITY INFORMATION IN A PARAMETER SET
A system for decoding a video bitstream receives a frame of the video that includes at least one slice and at least one tile, where the slices and the tiles are not all aligned with one another.
H04N 19/44 - Decoders specially adapted therefor, e.g. video decoders which are asymmetric with respect to the encoder
H04N 13/161 - Encoding, multiplexing or demultiplexing different image signal components
H04N 19/597 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding specially adapted for multi-view video sequence encoding
H04N 19/70 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards
H04N 19/30 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability
H04N 19/112 - Selection of coding mode or of prediction mode according to a given display mode, e.g. for interlaced or progressive display mode
H04N 19/103 - Selection of coding mode or of prediction mode
A method for coding includes: segmenting an image into blocks; grouping blocks into a number of subsets; coding, using an entropy coding module, each subset, by associating digital information with symbols of each block of a subset, including, for the first block of the image, initializing state variables of the coding module; and generating a data sub-stream representative of at least one of the coded subsets of blocks. Where a current block is the first block to be coded of a subset, symbol occurrence probabilities for that block are determined based on those for a coded and decoded predetermined block of at least one other subset. Where the current block is the last coded block of the subset: writing, in the sub-stream representative of the subset, all of the digital information associated with the symbols during coding of the blocks of the subset, and implementing the initializing sub-step.
H04N 19/13 - Adaptive entropy coding, e.g. adaptive variable length coding [AVLC] or context adaptive binary arithmetic coding [CABAC]
H04N 19/196 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the adaptation method, adaptation tool or adaptation type used for the adaptive coding being specially adapted for the computation of encoding parameters, e.g. by averaging previously computed encoding parameters
H04N 19/50 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
H04N 19/503 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
H04N 19/51 - Motion estimation or motion compensation
H04N 19/61 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding in combination with predictive coding
H04N 19/593 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving spatial prediction techniques
H04N 19/137 - Motion inside a coding unit, e.g. average field, frame or block difference
H04N 19/174 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a slice, e.g. a line of blocks or a group of blocks
H04N 19/44 - Decoders specially adapted therefor, e.g. video decoders which are asymmetric with respect to the encoder
H04N 19/436 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals - characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation using parallelised computational arrangements
H04N 19/625 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding using discrete cosine transform [DCT]
H04N 19/176 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock
H04N 19/91 - Entropy coding, e.g. variable length coding [VLC] or arithmetic coding
H04N 19/136 - Incoming video signal characteristics or properties
H04N 19/70 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards
H04N 19/119 - Adaptive subdivision aspects e.g. subdivision of a picture into rectangular or non-rectangular coding blocks
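The cross-subset initialization described in the abstract above — the first block of a subset starting from symbol occurrence probabilities learned at a predetermined block of another subset, as in wavefront-style entropy coding — can be roughly sketched as follows. The adaptation-by-counting stand-in for arithmetic coding and the choice of the second block as the predetermined block are illustrative assumptions, not the patented design:

```python
def code_subsets(subsets, initial_probs):
    """Code each subset of blocks into its own sub-stream.

    Sketch only: symbol occurrence 'probabilities' are modeled as
    adaptive counts, and emitting the symbol itself stands in for
    the arithmetic-coded digital information.
    """
    streams, saved_probs = [], dict(initial_probs)
    for idx, blocks in enumerate(subsets):
        # First subset starts fresh; later subsets inherit the state
        # saved at the predetermined block of a previous subset.
        probs = dict(initial_probs) if idx == 0 else dict(saved_probs)
        bits = []
        for b, block in enumerate(blocks):
            for symbol in block:
                bits.append(symbol)                       # stand-in for coding
                probs[symbol] = probs.get(symbol, 0) + 1  # adapt counts
            if b == 1:  # assumed predetermined block (e.g. the second one)
                saved_probs = dict(probs)
        streams.append(bits)  # one data sub-stream per subset
    return streams
```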
5.
Method of Coding and Decoding Images, Coding and Decoding Device and Computer Programs Corresponding Thereto
A method of coding at least one image, comprising the steps of splitting the image into a plurality of blocks, grouping said blocks into a predetermined number of subsets of blocks, and coding each of said subsets of blocks in parallel, the blocks of a given subset being coded according to a predetermined sequential order of traversal. The coding step comprises, for a current block of a subset considered, the sub-step of predictive coding of said current block with respect to at least one previously coded and decoded block, and the sub-step of entropy coding of said current block on the basis of at least one probability of appearance of a symbol.
H04N 19/593 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving spatial prediction techniques
H04N 19/50 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
H04N 19/13 - Adaptive entropy coding, e.g. adaptive variable length coding [AVLC] or context adaptive binary arithmetic coding [CABAC]
H04N 19/61 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding in combination with predictive coding
H04N 19/91 - Entropy coding, e.g. variable length coding [VLC] or arithmetic coding
H04N 19/174 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a slice, e.g. a line of blocks or a group of blocks
H04N 19/436 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals - characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation using parallelised computational arrangements
H04N 19/25 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using video object coding with scene description coding, e.g. binary format for scenes [BIFS] compression
H04N 19/184 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being bits, e.g. of the compressed video stream
H04N 19/51 - Motion estimation or motion compensation
6.
ACOUSTIC ECHO CANCELLATION CONTROL FOR DISTRIBUTED AUDIO DEVICES
An audio processing method may involve receiving output signals from each microphone of a plurality of microphones in an audio environment, the output signals corresponding to a current utterance of a person, and determining, based on the output signals, one or more aspects of context information relating to the person, including an estimated current proximity of the person to one or more microphone locations. The method may involve selecting two or more loudspeaker-equipped audio devices based, at least in part, on the one or more aspects of the context information, determining one or more types of audio processing changes to apply to audio data being rendered to loudspeaker feed signals for the audio devices, and causing the one or more types of audio processing changes to be applied. In some examples, the audio processing changes have the effect of increasing a speech-to-echo ratio at one or more microphones.
H04M 9/08 - Two-way loud-speaking telephone systems with means for conditioning the signal, e.g. for suppressing echoes for one or both directions of traffic
G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialog
A system utilizing a high throughput coding mode for CABAC in HEVC is described. The system may include an electronic device configured to obtain a block of data to be encoded using an arithmetic based encoder; to generate a sequence of syntax elements using the obtained block; to compare an Absolute-3 value of the sequence or a parameter associated with the Absolute-3 value to a preset value; and to convert the Absolute-3 value to a codeword using a first code or a second code that is different than the first code, according to a result of the comparison.
H04N 19/60 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
H04N 19/70 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards
H03M 7/40 - Conversion to or from variable length codes, e.g. Shannon-Fano code, Huffman code, Morse code
H04N 19/91 - Entropy coding, e.g. variable length coding [VLC] or arithmetic coding
H04N 19/44 - Decoders specially adapted therefor, e.g. video decoders which are asymmetric with respect to the encoder
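The comparison-driven switch between two codes in the abstract above might look like the following sketch. The threshold value and the particular binarizations chosen here (unary and first-order Exp-Golomb) are assumptions for illustration, not the codes specified by the patent:

```python
def unary(value: int) -> str:
    """Unary codeword: `value` ones terminated by a zero."""
    return "1" * value + "0"

def exp_golomb(value: int, k: int = 0) -> str:
    """k-th order Exp-Golomb codeword as a bit string."""
    value += 1 << k
    num_bits = value.bit_length()
    # zero prefix of length (num_bits - k - 1), then the value itself
    return "0" * (num_bits - k - 1) + format(value, "b")

PRESET = 3  # assumed threshold for switching between the two codes

def encode_abs3(abs3: int) -> str:
    """Convert an Absolute-3 value to a codeword using a first or
    second code, according to a comparison against a preset value."""
    if abs3 < PRESET:
        return unary(abs3)                     # first code: cheap for small values
    return exp_golomb(abs3 - PRESET, k=1)      # second code: bounded growth for large values
```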
A method of audio processing includes generating harmonics in a hybrid complex quadrature mirror filter domain. Generating the harmonics may include multiplication, using a feedback delay loop, and dynamic compression. The harmonics may be generated based on one or more hybrid sub-bands of the complex transform domain signal.
An audio session management method may involve: determining, by an audio session manager, one or more first media engine capabilities of a first media engine of a first smart audio device, the first media engine being configured for managing one or more audio media streams received by the first smart audio device and for performing first smart audio device signal processing for the one or more audio media streams according to a first media engine sample clock; receiving, by the audio session manager and via a first application communication link, first application control signals from the first application; and controlling the first smart audio device according to the first media engine capabilities, by the audio session manager, via first audio session management control signals transmitted to the first smart audio device via a first smart audio device communication link and without reference to the first media engine sample clock.
H04S 7/00 - Indicating arrangements; Control arrangements, e.g. balance control
H04R 1/32 - Arrangements for obtaining desired frequency or directional characteristics for obtaining desired directional characteristic only
H04R 1/40 - Arrangements for obtaining desired frequency or directional characteristics for obtaining desired directional characteristic only by combining a number of identical transducers
Described herein is a method of processing audio content for rendering in a three-dimensional audio scene, wherein the audio content comprises a sound source at a source position, the method comprising: obtaining a voxelized representation of the three-dimensional audio scene, wherein the voxelized representation indicates volume elements in which sound can propagate and volume elements by which sound is occluded; generating a two-dimensional projection map for the audio scene based on the voxelized representation by applying a projection operation to the voxelized representation that projects onto a horizontal plane; and determining parameters indicating a virtual source position of a virtual sound source based on the source position, a listener position, and the projection map, to simulate, by rendering a virtual source signal from the virtual source position, an impact of acoustic diffraction by the three-dimensional audio scene on a source signal of the sound source at the source position. A corresponding apparatus and corresponding computer program products are also described.
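The projection step above can be given a minimal sketch, assuming a boolean occupancy grid and an "any voxel in the vertical column is open" projection rule (the actual projection operation in the patent may differ):

```python
import itertools

def projection_map(voxels, nx, ny, nz):
    """Project a voxelized scene onto the horizontal (x, y) plane.

    `voxels[x][y][z]` is True where sound can propagate and False
    where it is occluded (z assumed vertical). A map cell is marked
    traversable if any voxel in its vertical column is open.
    """
    proj = [[False] * ny for _ in range(nx)]
    for x, y in itertools.product(range(nx), range(ny)):
        proj[x][y] = any(voxels[x][y][z] for z in range(nz))
    return proj
```

Path finding over such a 2-D map is then what supplies the virtual (diffracted) source positions.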
Embodiments are disclosed for automatic leveling of speech content. In an embodiment, a method comprises: receiving, using one or more processors, frames of an audio recording including speech and non-speech content; for each frame: determining, using the one or more processors, a speech probability; analyzing, using the one or more processors, a perceptual loudness of the frame; obtaining, using the one or more processors, a target loudness range for the frame; computing, using the one or more processors, gains to apply to the frame based on the target loudness range and the perceptual loudness analysis, where the gains include dynamic gains that change frame-by-frame and that are scaled based on the speech probability; and applying the gains to the frame so that a resulting loudness range of the speech content in the audio recording fits within the target loudness range.
G10L 21/0364 - Speech enhancement, e.g. noise reduction or echo cancellation by changing the amplitude for improving intelligibility
G10L 25/30 - Speech or voice analysis techniques not restricted to a single one of groups characterised by the analysis technique using neural networks
G10L 25/21 - Speech or voice analysis techniques not restricted to a single one of groups characterised by the type of extracted parameters the extracted parameters being power information
G10L 25/84 - Detection of presence or absence of voice signals for discriminating voice from noise
G10L 17/20 - Pattern transformations or operations aimed at increasing system robustness, e.g. against channel noise or different working conditions
G10L 21/028 - Voice signal separating using properties of sound source
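The per-frame gain computation in the leveling abstract above could be sketched like this; the dB-domain clamping toward the target range and the linear scaling of the gain by speech probability are assumptions:

```python
def frame_gain_db(loudness_db, target_lo, target_hi, speech_prob):
    """Dynamic gain (in dB) steering a frame toward a target
    loudness range, scaled by the frame's speech probability
    so that non-speech frames are left mostly untouched."""
    if loudness_db < target_lo:
        raw = target_lo - loudness_db    # boost quiet speech
    elif loudness_db > target_hi:
        raw = target_hi - loudness_db    # attenuate loud speech
    else:
        raw = 0.0                        # already inside the range
    return raw * speech_prob
```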
A system utilizing a high throughput coding mode for CABAC in HEVC is described. The system may include an electronic device configured to obtain a block of data to be encoded using an arithmetic based encoder; to generate a sequence of syntax elements using the obtained block; to compare an Absolute-3 value of the sequence or a parameter associated with the Absolute-3 value to a preset value; and to convert the Absolute-3 value to a codeword using a first code or a second code that is different than the first code, according to a result of the comparison.
H04N 19/60 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
H04N 19/70 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards
H03M 7/40 - Conversion to or from variable length codes, e.g. Shannon-Fano code, Huffman code, Morse code
H04N 19/91 - Entropy coding, e.g. variable length coding [VLC] or arithmetic coding
H04N 19/44 - Decoders specially adapted therefor, e.g. video decoders which are asymmetric with respect to the encoder
Methods and systems for improving the coding/decoding efficiency of video by providing a syntax modeler, a buffer, and a decoder. The syntax modeler may associate a first sequence of symbols with syntax elements. The buffer may store tables, each represented by a symbol in the first sequence, and each used to associate a respective symbol in a second sequence of symbols with encoded data. The decoder decodes the data into a bitstream using the second sequence retrieved from a table.
H04N 19/13 - Adaptive entropy coding, e.g. adaptive variable length coding [AVLC] or context adaptive binary arithmetic coding [CABAC]
H04N 19/91 - Entropy coding, e.g. variable length coding [VLC] or arithmetic coding
H03M 7/40 - Conversion to or from variable length codes, e.g. Shannon-Fano code, Huffman code, Morse code
H03M 7/42 - Conversion to or from variable length codes, e.g. Shannon-Fano code, Huffman code, Morse code using table look-up for the coding or decoding process, e.g. using read-only memory
H03M 7/30 - Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
H04N 19/52 - Processing of motion vectors by encoding by predictive encoding
H04N 19/176 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock
H04N 19/70 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards
H04N 19/46 - Embedding additional information in the video signal during the compression process
H04N 19/61 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding in combination with predictive coding
H04N 19/15 - Data rate or code amount at the encoder output by monitoring actual compressed data size at the memory before deciding storage at the transmission buffer
Embodiments are disclosed for noise floor estimation and noise reduction. In an embodiment, a method comprises: obtaining an audio signal; dividing the audio signal into a plurality of buffers; determining time-frequency samples for each buffer of the audio signal; for each buffer and for each frequency, determining a median (or mean) and a measure of an amount of variation of energy based on the samples in the buffer and samples in neighboring buffers that together span a specified time range of the audio signal; combining the median (or mean) and the measure of the amount of variation of energy into a cost function; for each frequency: determining a signal energy of a particular buffer of the audio signal that corresponds to a minimum value of the cost function; selecting the signal energy as the estimated noise floor of the audio signal; and reducing, using the estimated noise floor, noise in the audio signal.
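For a single frequency bin, the buffer-selection logic described above might be sketched as follows; using the population standard deviation as the variation measure and an equal-weight cost combination are assumptions:

```python
import statistics

def estimate_noise_floor(energies, half_window=2):
    """Per-frequency noise-floor estimate over buffered energies.

    For each buffer, the median energy of a neighborhood of buffers
    is combined with a measure of its variation into a cost; the
    buffer minimizing the cost supplies the noise-floor estimate.
    The weight ALPHA is an assumed tuning constant.
    """
    ALPHA = 1.0
    best_cost, noise_floor = float("inf"), None
    for i in range(len(energies)):
        lo, hi = max(0, i - half_window), min(len(energies), i + half_window + 1)
        window = energies[lo:hi]
        med = statistics.median(window)
        variation = statistics.pstdev(window)  # stand-in for the variation measure
        cost = med + ALPHA * variation         # assumed combination
        if cost < best_cost:
            best_cost, noise_floor = cost, energies[i]
    return noise_floor
```

Low-energy, low-variation stretches (steady background noise) minimize the cost, so transient dips or loud bursts do not get mistaken for the noise floor.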
A method for decoding a video bitstream is disclosed. The method comprises: entropy decoding a first portion of the video bitstream, wherein the first portion of the video bitstream is associated with a video frame, thereby producing a first portion of decoded data; entropy decoding a second portion of the video bitstream, wherein the second portion of the video bitstream is associated with the video frame, thereby producing a second portion of decoded data, and wherein entropy decoding the second portion of the video bitstream is independent of entropy decoding the first portion of the video bitstream; and reconstructing a first portion of the video frame associated with the video bitstream using the first portion of decoded data and the second portion of decoded data.
H04N 19/91 - Entropy coding, e.g. variable length coding [VLC] or arithmetic coding
H04N 19/70 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards
H04N 19/119 - Adaptive subdivision aspects e.g. subdivision of a picture into rectangular or non-rectangular coding blocks
H04N 19/15 - Data rate or code amount at the encoder output by monitoring actual compressed data size at the memory before deciding storage at the transmission buffer
H04N 19/192 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the adaptation method, adaptation tool or adaptation type used for the adaptive coding the adaptation method, adaptation tool or adaptation type being iterative or recursive
H04N 19/46 - Embedding additional information in the video signal during the compression process
H04N 19/61 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding in combination with predictive coding
H04N 19/174 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a slice, e.g. a line of blocks or a group of blocks
H04N 19/17 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
H04N 19/44 - Decoders specially adapted therefor, e.g. video decoders which are asymmetric with respect to the encoder
H04N 19/436 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals - characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation using parallelised computational arrangements
H04N 19/159 - Prediction type, e.g. intra-frame, inter-frame or bidirectional frame prediction
H04N 19/43 - Hardware specially adapted for motion estimation or compensation
H04N 19/136 - Incoming video signal characteristics or properties
H04N 19/40 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using video transcoding, i.e. partial or full decoding of a coded input stream followed by re-encoding of the decoded output stream
H04N 19/172 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a picture, frame or field
H04N 19/80 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals - Details of filtering operations specially adapted for video compression, e.g. for pixel interpolation
H04N 19/184 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being bits, e.g. of the compressed video stream
16.
Apparatus and method for processing an input audio signal using cascaded filterbanks
Fraunhofer-Gesellschaft zur Foerderung der angewandten Forschung (Germany)
Dolby International AB (Netherlands)
Inventor
Villemoes, Lars
Ekstrand, Per
Disch, Sascha
Nagel, Frederik
Wilde, Stephan
Abstract
An apparatus for processing an input audio signal relies on a cascade of filterbanks, the cascade having a synthesis filterbank for synthesizing an audio intermediate signal from the input audio signal, the input audio signal being represented by a plurality of first subband signals generated by an analysis filterbank, wherein a number of filterbank channels of the synthesis filterbank is smaller than a number of channels of the analysis filterbank. The apparatus furthermore has a further analysis filterbank for generating a plurality of second subband signals from the audio intermediate signal, wherein the further analysis filterbank has a number of channels being different from the number of channels of the synthesis filterbank, so that a sampling rate of a subband signal of the plurality of second subband signals is different from a sampling rate of a first subband signal of the plurality of first subband signals.
G10L 19/008 - Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
G10L 21/038 - Speech enhancement, e.g. noise reduction or echo cancellation using band spreading techniques
G10L 19/02 - Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
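The sampling-rate relationships in the cascade can be illustrated with simple arithmetic, assuming critically sampled filterbanks (an analysis bank decimates by its channel count, a synthesis bank interpolates by its channel count):

```python
def subband_rates(input_rate, analysis_channels, synthesis_channels,
                  further_analysis_channels):
    """Trace subband sampling rates through the cascade:
    analysis -> (smaller) synthesis -> further analysis."""
    first_subband = input_rate / analysis_channels        # first subband signals
    intermediate = first_subband * synthesis_channels     # audio intermediate signal
    second_subband = intermediate / further_analysis_channels
    return first_subband, second_subband
```

With, say, a 64-channel analysis bank at 48 kHz, a 32-channel synthesis bank, and a 16-channel further analysis bank (illustrative numbers), the first subband signals run at 750 Hz and the second subband signals at 1500 Hz, matching the abstract's statement that the two subband sampling rates differ.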
17.
METHODS, APPARATUS AND SYSTEMS FOR 6DOF AUDIO RENDERING AND DATA REPRESENTATIONS AND BITSTREAM STRUCTURES FOR 6DOF AUDIO RENDERING
The present disclosure relates to methods, apparatus and systems for encoding an audio signal into a bitstream, in particular at an encoder, comprising: encoding or including audio signal data associated with 3DoF audio rendering into one or more first bitstream parts of the bitstream, and encoding or including metadata associated with 6DoF audio rendering into one or more second bitstream parts of the bitstream. The present disclosure further relates to methods, apparatus and systems for decoding an audio signal and audio rendering based on the bitstream.
H04S 7/00 - Indicating arrangements; Control arrangements, e.g. balance control
G10L 19/008 - Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
Described herein is a method for controlling media data playout on a client device, wherein the method includes the steps of: (a) retrieving, by the client device, media data comprising a plurality of segments subdivided into one or more chunks for playout from at least one media server; (b) analyzing a current chunk of the one or more chunks of a current segment; and (c) adapting the playout of the media data in response to the result of the analysis prior to fully retrieving the current chunk. Further described herein are a client device implementing a media player application configured to perform said method and a computer program product with instructions adapted to cause a device having processing capability to carry out said method.
H04N 21/4402 - Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to MPEG-4 scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display
H04N 21/845 - Structuring of content, e.g. decomposing content into time segments
19.
COMPANDING SYSTEM AND METHOD TO REDUCE QUANTIZATION NOISE USING ADVANCED SPECTRAL EXTENSION
Embodiments are directed to a companding method and system for reducing coding noise in an audio codec. A compression process reduces the original dynamic range of an initial audio signal by dividing the initial audio signal into a plurality of segments using a defined window shape, calculating a wideband gain in the frequency domain using a non-energy-based average of frequency-domain samples of the initial audio signal, and applying individual gain values to amplify segments of relatively low intensity and attenuate segments of relatively high intensity. The compressed audio signal is then expanded back to substantially the original dynamic range through an expansion process that applies inverse gain values to amplify segments of relatively high intensity and attenuate segments of relatively low intensity. A QMF filterbank is used to analyze the initial audio signal to obtain a frequency-domain representation.
H04B 1/66 - TRANSMISSION - Details of transmission systems not characterised by the medium used for transmission for improving efficiency of transmission
G10L 19/02 - Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
G10L 19/008 - Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
G10L 25/18 - Speech or voice analysis techniques not restricted to a single one of groups characterised by the type of extracted parameters the extracted parameters being spectral information of each sub-band
G10L 19/032 - Quantisation or dequantisation of spectral components
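The wideband-gain idea above — a non-energy-based average of frequency-domain magnitudes driving a compressive gain — might be sketched as follows; the exponent and the unit reference level are assumed values, not those of the patent:

```python
def wideband_gain(spectrum, exponent=1.0 / 3.0):
    """Compression gain for one segment from the mean magnitude
    (|X|, not |X|^2, i.e. a non-energy-based average) of its
    frequency-domain samples: quiet segments get gain > 1,
    loud segments get gain < 1."""
    REF = 1.0  # assumed reference level
    mean_mag = sum(abs(x) for x in spectrum) / len(spectrum)
    return (REF / max(mean_mag, 1e-12)) ** (1.0 - exponent)
```

The expansion stage would apply the reciprocal gain to restore the original dynamic range.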
A method for adaptive streaming of media content with bitrate switching is described, wherein the media content comprises a plurality of consecutive media segments. The method comprises, at a media streaming server: transmitting a segment of the media content encoded in a first coding mode having a first bitrate; receiving an indication for a coding mode switch to a second coding mode having a second bitrate and, in response, transmitting a transition segment for transitioning between the first coding mode and the second coding mode; and transmitting another segment of the media content encoded in the second coding mode.
H04N 21/2662 - Controlling the complexity of the video stream, e.g. by scaling the resolution or bitrate of the video stream based on the client capabilities
21.
VIDEO DECODER WITH REDUCED DYNAMIC RANGE TRANSFORM WITH INVERSE TRANSFORM SHIFTING MEMORY
A method for decoding video includes receiving quantized coefficients representative of a block of video representative of a plurality of pixels. The quantized coefficients are dequantized based upon a function of a remainder. The dequantized coefficients are inverse transformed to determine a decoded residue.
H04N 19/132 - Sampling, masking or truncation of coding units, e.g. adaptive resampling, frame skipping, frame interpolation or high-frequency transform coefficient masking
H04N 19/184 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being bits, e.g. of the compressed video stream
H04N 19/44 - Decoders specially adapted therefor, e.g. video decoders which are asymmetric with respect to the encoder
H04N 19/176 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock
H04N 19/182 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being a pixel
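A minimal sketch of remainder-based dequantisation in the spirit of the abstract above: the quantisation parameter's remainder modulo 6 selects a scale factor, and its quotient a left shift. The scale table resembles HEVC's levelScale but is used purely for illustration; a real decoder also normalises by transform size and bit depth:

```python
# Illustrative dequantiser: scale chosen by qp % 6 (the "remainder"),
# dynamic range restored by shifting with qp // 6.
def dequantize(levels, qp):
    scales = [40, 45, 51, 57, 64, 72]  # per-remainder scale table (HEVC-style)
    scale = scales[qp % 6]
    shift = qp // 6
    return [(lvl * scale) << shift for lvl in levels]
```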
22.
SYSTEMS, METHODS AND APPARATUS FOR CONVERSION FROM CHANNEL-BASED AUDIO TO OBJECT-BASED AUDIO
Embodiments are disclosed for channel-based audio (CBA) (e.g., 22.2-ch audio) to object-based audio (OBA) conversion. The conversion includes converting CBA metadata to object audio metadata (OAMD) and reordering the CBA channels based on channel shuffle information derived in accordance with channel ordering constraints of the OAMD. The OBA with reordered channels is rendered in a playback device using the OAMD or in a source device, such as a set-top box or audio/video recorder. In an embodiment, the CBA metadata includes signaling that indicates a specific OAMD representation to be used in the conversion of the metadata. In an embodiment, pre-computed OAMD is transmitted in a native audio bitstream (e.g., AAC) for transmission (e.g., over HDMI) or for rendering in a source device. In an embodiment, pre-computed OAMD is transmitted in a transport layer bitstream (e.g., ISO BMFF, MPEG4 audio bitstream) to a playback device or source device.
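The channel reordering step, driven by channel shuffle information, reduces to a permutation. This is a sketch with an assumed shuffle representation (output position maps to input-channel index), not the OAMD-defined ordering:

```python
# Reorder channel-based audio channels according to shuffle information
# derived from the OAMD channel-ordering constraints (representation assumed).
def reorder_channels(channels, shuffle):
    """shuffle[i] gives the input-channel index placed at output position i."""
    return [channels[src] for src in shuffle]
```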
A projection system and method includes a light source configured to emit a light in response to an image data; a phase light modulator configured to receive the light from the light source and to apply a spatially-varying phase modulation on the light; and a controller configured to determine, for a frame of the image data, a plurality of phase configurations, respective ones of the plurality of phase configurations corresponding to solutions of a phase algorithm and representing the same image with a different modulation pattern, and provide a phase control signal to the phase light modulator, the phase control signal configured to cause the phase light modulator to modulate the plurality of phase configurations in a time-divisional manner within a time period of the frame, thereby to project a series of subframes within the time period.
G10L 19/022 - Blocking, i.e. grouping of samples in time; Choice of analysis windows; Overlap factoring
G10L 19/24 - Variable rate codecs, e.g. for generating different qualities using a scalable representation such as hierarchical encoding or layered encoding
G10L 21/038 - Speech enhancement, e.g. noise reduction or echo cancellation using band spreading techniques
G10L 19/02 - Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
25.
METHODS AND DEVICES FOR PROVIDING PERSONALIZED AUDIO TO A USER
The present application describes a method (400) for providing personalized audio to a user. The method (400) comprises receiving (401) a manifest file (140) for a media element from which audio is to be rendered, wherein the manifest file (140) comprises a description (141) for a plurality of different presentations (152) of audio content of the media element. In addition, the method (400) comprises selecting (402) a presentation (152) from the plurality of presentations (152) based on the manifest file (140). The method (400) further comprises receiving (403) a list of audio track objects comprised within the media element, and selecting (404) an audio track object from the list of audio track objects, in dependence of the selected presentation (152).
H04N 21/2343 - Processing of video elementary streams, e.g. splicing of video streams or manipulating MPEG-4 scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
H04N 21/439 - Processing of audio elementary streams
H04N 21/2662 - Controlling the complexity of the video stream, e.g. by scaling the resolution or bitrate of the video stream based on the client capabilities
H04N 21/84 - Generation or processing of descriptive data, e.g. content descriptors
H04N 21/8352 - Generation of protective data, e.g. certificates involving content or source identification data, e.g. UMID [Unique Material Identifier]
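A minimal sketch of the selection steps (402) and (404) above, assuming a simplified dictionary-based manifest; field names such as `presentations` and `track_ids` are illustrative, not taken from the specification:

```python
# Hypothetical manifest-driven selection of a presentation and its track object.
def select_presentation(manifest, preferred_language):
    """Pick the first presentation matching the preferred language, else the first."""
    for pres in manifest["presentations"]:
        if pres.get("language") == preferred_language:
            return pres
    return manifest["presentations"][0]

def select_track(track_objects, presentation):
    """Pick an audio track object referenced by the selected presentation."""
    return next(t for t in track_objects if t["id"] in presentation["track_ids"])
```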
26.
METHODS, APPARATUS AND SYSTEM FOR RENDERING AN AUDIO PROGRAM
A method for generating a bitstream indicative of an object based audio program is described. The bitstream comprises a sequence of containers. A first container of the sequence of containers comprises a plurality of substream entities for a plurality of substreams of the object based audio program and a presentation section. The method comprises determining a set of object channels. The method further comprises providing a set of object related metadata for the set of object channels. In addition, the method comprises inserting a first set of object channel frames and a first set of object related metadata frames into a respective set of substream entities of the first container. Furthermore, the method comprises inserting presentation data into the presentation section.
G10L 19/008 - Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
G11B 27/034 - Electronic editing of digitised analogue information signals, e.g. audio or video signals on discs
G11B 27/32 - Indexing; Addressing; Timing or synchronising; Measuring tape travel by using information detectable on the record carrier by using information signals recorded by the same method as the main recording on separate auxiliary tracks of the same or an auxiliary record carrier
G10L 19/018 - Audio watermarking, i.e. embedding inaudible data in the audio signal
A speech separation server comprises a deep-learning encoder with nonlinear activation. The encoder is programmed to take a mixture audio waveform in the time domain, learn generalized patterns from the mixture audio waveform, and generate an encoded representation that effectively characterizes the mixture audio waveform for speech separation.
G10L 21/028 - Voice signal separating using properties of sound source
G10L 25/30 - Speech or voice analysis techniques not restricted to a single one of groups characterised by the analysis technique using neural networks
G06N 3/04 - Architecture, e.g. interconnection topology
28.
METHODS AND SYSTEM FOR WAVEFORM CODING OF AUDIO SIGNALS WITH A GENERATIVE MODEL
Described herein is a method of waveform decoding, the method including the steps of: (a) receiving, by a waveform decoder, a bitstream including a finite bitrate representation of a source signal; (b) waveform decoding the finite bitrate representation of the source signal to obtain a waveform approximation of the source signal; (c) providing the waveform approximation of the source signal to a generative model that implements a probability density function, to obtain a probability distribution for a reconstructed signal of the source signal; and (d) generating the reconstructed signal of the source signal based on the probability distribution. Described are further a method and system for waveform coding and a method of training a generative model.
G10L 19/00 - Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
29.
Picture coding method, picture decoding method, picture coding apparatus, picture decoding apparatus, and program thereof
A picture coding method of the present invention codes a picture signal and a ratio of the number of luminance pixels to the number of chrominance pixels for the picture signal; one coding method out of at least two coding methods is then selected depending on the ratio. Next, data related to a picture size is coded in accordance with the selected coding method. The data related to the picture size indicates a size of the picture corresponding to the picture signal, or an output area, which is a pixel area to be output in decoding within the whole pixel area coded in the picture signal coding.
H04N 7/12 - Systems in which the television signal is transmitted via one channel or a plurality of parallel channels, the bandwidth of each channel being less than the bandwidth of the television signal
H04N 19/16 - Assigned coding mode, i.e. the coding mode being predefined or preselected to be further used for selection of another element or parameter for a given display mode, e.g. for interlaced or progressive display mode
H04N 19/103 - Selection of coding mode or of prediction mode
H04N 19/105 - Selection of the reference unit for prediction within a chosen coding or prediction mode, e.g. adaptive choice of position and number of pixels used for prediction
H04N 19/122 - Selection of transform size, e.g. 8x8 or 2x4x8 DCT; Selection of sub-band transforms of varying structure or type
H04N 19/182 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being a pixel
H04N 19/186 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being a colour or a chrominance component
H04N 19/59 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving spatial sub-sampling or interpolation, e.g. alteration of picture size or resolution
H04N 19/85 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression
30.
TRANSFORMING AUDIO SIGNALS CAPTURED IN DIFFERENT FORMATS INTO A REDUCED NUMBER OF FORMATS FOR SIMPLIFYING ENCODING AND DECODING OPERATIONS
The disclosed embodiments enable converting audio signals captured in various formats by various capture devices into a limited number of formats that can be processed by an audio codec (e.g., an Immersive Voice and Audio Services (IVAS) codec). In an embodiment, a simplification unit of the audio device receives an audio signal captured by one or more audio capture devices coupled to the audio device. The simplification unit determines whether the audio signal is in a format that is supported by an encoding unit of the audio device. Based on the determining, the simplification unit converts the audio signal into a format that is supported by the encoding unit. In an embodiment, if the simplification unit determines that the audio signal is in a spatial format, the simplification unit can convert the audio signal into a spatial “mezzanine” format supported by the encoding unit.
G10L 19/008 - Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
The present document relates to audio source coding systems which make use of a harmonic transposition method for high frequency reconstruction (HFR), as well as to digital effect processors, e.g. exciters, where generation of harmonic distortion adds brightness to the processed signal, and to time stretchers where a signal duration is prolonged with maintained spectral content. A system and method configured to generate a time stretched and/or frequency transposed signal from an input signal is described. The system comprises an analysis filterbank configured to provide an analysis subband signal from the input signal; wherein the analysis subband signal comprises a plurality of complex valued analysis samples, each having a phase and a magnitude. Furthermore, the system comprises a subband processing unit configured to determine a synthesis subband signal from the analysis subband signal using a subband transposition factor Q and a subband stretch factor S. The subband processing unit performs a block based nonlinear processing wherein the magnitudes of samples of the synthesis subband signal are determined from the magnitudes of corresponding samples of the analysis subband signal and a predetermined sample of the analysis subband signal. In addition, the system comprises a synthesis filterbank configured to generate the time stretched and/or frequency transposed signal from the synthesis subband signal.
G10L 21/038 - Speech enhancement, e.g. noise reduction or echo cancellation using band spreading techniques
G10L 19/02 - Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
G10L 19/022 - Blocking, i.e. grouping of samples in time; Choice of analysis windows; Overlap factoring
G10L 25/18 - Speech or voice analysis techniques not restricted to a single one of groups characterised by the type of extracted parameters the extracted parameters being spectral information of each sub-band
G10L 19/032 - Quantisation or dequantisation of spectral components
32.
DYNAMICS PROCESSING ACROSS DEVICES WITH DIFFERING PLAYBACK CAPABILITIES
Individual loudspeaker dynamics processing configuration data, for each of a plurality of loudspeakers of a listening environment, may be obtained. Listening environment dynamics processing configuration data may be determined, based on the individual loudspeaker dynamics processing configuration data. Dynamics processing may be performed on received audio data based on the listening environment dynamics processing configuration data, to generate processed audio data. The processed audio data may be rendered for reproduction via a set of loudspeakers that includes at least some of the plurality of loudspeakers, to produce rendered audio signals. The rendered audio signals may be provided to, and reproduced by, the set of loudspeakers.
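One way to derive listening-environment dynamics configuration from per-speaker data, sketched under the assumption that a single linear limiter threshold per speaker is the only parameter (the patent covers far richer configuration data), is to limit the whole mix at the weakest speaker's threshold:

```python
# Hypothetical aggregation of per-speaker dynamics configuration:
# the environment-wide limiter is set by the most constrained speaker.
def environment_dynamics_config(speaker_configs):
    return {"limiter_threshold": min(c["limiter_threshold"] for c in speaker_configs)}

def apply_limiter(samples, threshold):
    """Hard-clip stand-in for a real dynamics processor."""
    return [max(-threshold, min(threshold, s)) for s in samples]
```

Other combining policies (e.g., per-band minima, or averaging with per-speaker headroom) fall under the same pattern.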
Many portable playback devices cannot decode and playback encoded audio content having wide bandwidth and wide dynamic range with consistent loudness and intelligibility unless the encoded audio content has been prepared specially for these devices. This problem can be overcome by including with the encoded content some metadata that specifies a suitable dynamic range compression profile by either absolute values or differential values relative to another known compression profile. A playback device may also adaptively apply gain and limiting to the playback audio. Implementations in encoders, in transcoders and in decoders are disclosed.
G10L 19/02 - Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
H03G 7/00 - Volume compression or expansion in amplifiers
H03G 3/32 - Automatic control in amplifiers having semiconductor devices the control being dependent upon ambient noise level or sound level
An audio session management method for an audio environment having multiple audio devices may involve receiving, from a first device implementing a first application and by a device implementing an audio session manager, a first route initiation request to initiate a first route for a first audio session. The first route initiation request may indicate a first audio source and a first audio environment destination. The first audio environment destination may correspond with at least a first person in the audio environment, but in some instances will not indicate an audio device. The method may involve establishing a first route corresponding to the first route initiation request. Establishing the first route may involve determining a first location of at least the first person in the audio environment, determining at least one audio device for a first stage of the first audio session and initiating or scheduling the first audio session.
A rendering mode may be determined for received audio data, including audio signals and associated spatial data. The audio data may be rendered for reproduction via a set of loudspeakers of an environment according to the rendering mode, to produce rendered audio signals. Rendering the audio data may involve determining relative activation of a set of loudspeakers in an environment. The rendering mode may be variable between a reference spatial mode and one or more distributed spatial modes. The reference spatial mode may have an assumed listening position and orientation. In the distributed spatial mode(s), one or more elements of the audio data may each be rendered in a more spatially distributed manner than in the reference spatial mode and spatial locations of remaining elements of the audio data may be warped such that they span a rendering space of the environment more completely than in the reference spatial mode.
Some disclosed methods involve encoding or decoding directional audio data. Some encoding methods may involve receiving a mono audio signal corresponding to an audio object and a representation of a radiation pattern corresponding to the audio object. The radiation pattern may include sound levels corresponding to a plurality of sample times, a plurality of frequency bands and a plurality of directions. The methods may involve encoding the mono audio signal and encoding the source radiation pattern to determine radiation pattern metadata. Encoding the radiation pattern may involve determining a spherical harmonic transform of the representation of the radiation pattern and compressing the spherical harmonic transform to obtain encoded radiation pattern metadata.
G10L 19/008 - Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
H04S 7/00 - Indicating arrangements; Control arrangements, e.g. balance control
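Assuming the spherical-harmonic transform itself is computed elsewhere, the compression step can be as simple as truncating the coefficient list to a lower order (an order-N expansion has (N+1)² coefficients). A sketch with illustrative names only:

```python
# Compress a spherical-harmonic representation of a radiation pattern by
# keeping only coefficients up to a chosen order (truncation as compression).
def compress_sh(sh_coeffs, keep_order):
    n = (keep_order + 1) ** 2  # number of SH coefficients up to keep_order
    return sh_coeffs[:n]
```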
37.
RENDERING AUDIO OVER MULTIPLE SPEAKERS WITH MULTIPLE ACTIVATION CRITERIA
Methods for rendering audio for playback by two or more speakers are disclosed. The audio includes one or more audio signals, each with an associated intended perceived spatial position. Relative activation of the speakers may be a cost function of a model of perceived spatial position of the audio signals when played back over the speakers, a measure of proximity of the intended perceived spatial position of the audio signals to positions of the speakers, and one or more additional dynamically configurable functions. The dynamically configurable functions may be based on at least one or more properties of the audio signals, one or more properties of the set of speakers and/or one or more external inputs.
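A toy one-dimensional version of this cost-based activation: each speaker's cost combines proximity to the intended perceived position with an extra dynamically configurable penalty, and the gains are normalised inverse costs. This is an illustrative stand-in, not the disclosed cost function:

```python
# Hypothetical speaker-activation sketch: gains fall off with a cost that
# mixes proximity to the target position and per-speaker extra penalties.
def speaker_gains(target, speaker_positions, proximity_weight=1.0, extra_costs=None):
    extra_costs = extra_costs or [0.0] * len(speaker_positions)
    costs = [proximity_weight * abs(pos - target) + extra
             for pos, extra in zip(speaker_positions, extra_costs)]
    weights = [1.0 / (1.0 + c) for c in costs]
    total = sum(weights)
    return [w / total for w in weights]
```

Raising a speaker's extra cost (e.g., from an external input such as a "do not disturb" zone) shifts activation toward the other speakers.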
Embodiments relate to an audio processing unit that includes a buffer, bitstream payload deformatter, and a decoding subsystem. The buffer stores at least one block of an encoded audio bitstream. The block includes a fill element that begins with an identifier followed by fill data. The fill data includes at least one flag identifying whether enhanced spectral band replication (eSBR) processing is to be performed on audio content of the block. A corresponding method for decoding an encoded audio bitstream is also provided.
G10L 19/24 - Variable rate codecs, e.g. for generating different qualities using a scalable representation such as hierarchical encoding or layered encoding
G10L 21/038 - Speech enhancement, e.g. noise reduction or echo cancellation using band spreading techniques
A reference picture information decoding unit (13) omits decoding of a reference list sorting presence or absence flag and/or a reference list sorting order based on the number of current picture referable pictures.
H04N 19/44 - Decoders specially adapted therefor, e.g. video decoders which are asymmetric with respect to the encoder
H04N 19/50 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
H04N 19/51 - Motion estimation or motion compensation
H04N 19/52 - Processing of motion vectors by encoding by predictive encoding
H04N 19/537 - Motion estimation other than block-based
H04N 19/573 - Motion compensation with multiple frame prediction using two or more reference frames in a given prediction direction
H04N 19/577 - Motion compensation with bidirectional frame interpolation, i.e. using B-pictures
H04N 19/61 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding in combination with predictive coding
H04N 19/70 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards
The present document relates to a method of layered encoding of a frame of a compressed higher-order Ambisonics, HOA, representation of a sound or sound field. The compressed HOA representation comprises a plurality of transport signals. The method comprises assigning the plurality of transport signals to a plurality of hierarchical layers, the plurality of layers including a base layer and one or more hierarchical enhancement layers, generating, for each layer, a respective HOA extension payload including side information for parametrically enhancing a reconstructed HOA representation obtainable from the transport signals assigned to the respective layer and any layers lower than the respective layer, assigning the generated HOA extension payloads to their respective layers, and signaling the generated HOA extension payloads in an output bitstream. The present document further relates to a method of decoding a frame of a compressed HOA representation of a sound or sound field, an encoder and a decoder for layered coding of a compressed HOA representation, and a data structure representing a frame of a compressed HOA representation of a sound or sound field.
G10L 19/008 - Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
G10L 19/24 - Variable rate codecs, e.g. for generating different qualities using a scalable representation such as hierarchical encoding or layered encoding
41.
LAYERED CODING FOR COMPRESSED SOUND OR SOUND FIELD REPRESENTATIONS
The present document relates to a method of layered encoding of a compressed sound representation of a sound or sound field. The compressed sound representation comprises a basic compressed sound representation comprising a plurality of components, basic side information for decoding the basic compressed sound representation to a basic reconstructed sound representation of the sound or sound field, and enhancement side information including parameters for improving the basic reconstructed sound representation. The method comprises sub-dividing the plurality of components into a plurality of groups of components and assigning each of the plurality of groups to a respective one of a plurality of hierarchical layers, the number of groups corresponding to the number of layers, and the plurality of layers including a base layer and one or more hierarchical enhancement layers, adding the basic side information to the base layer, and determining a plurality of portions of enhancement side information from the enhancement side information and assigning each of the plurality of portions of enhancement side information to a respective one of the plurality of layers, wherein each portion of enhancement side information includes parameters for improving a reconstructed sound representation obtainable from data included in the respective layer and any layers lower than the respective layer. The document further relates to a method of decoding a compressed sound representation of a sound or sound field, wherein the compressed sound representation is encoded in a plurality of hierarchical layers that include a base layer and one or more hierarchical enhancement layers, as well as to an encoder and a decoder for layered coding of a compressed sound representation.
G10L 19/008 - Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
G10L 19/24 - Variable rate codecs, e.g. for generating different qualities using a scalable representation such as hierarchical encoding or layered encoding
H04S 7/00 - Indicating arrangements; Control arrangements, e.g. balance control
42.
METHODS AND DEVICES FOR GENERATION AND PROCESSING OF MODIFIED AUDIO BITSTREAMS
Described herein is a method for generating a modified bitstream on a source device, wherein the method includes the steps of: a) receiving, by a receiver, a bitstream including coded media data; b) generating, by an embedder, payload of additional media data and embedding the payload in the bitstream for obtaining, as an output from the embedder, a modified bitstream including the coded media data and the payload of the additional media data; and c) outputting the modified bitstream to a sink device. Described is further a method for processing said modified bitstream on a sink device. Described are moreover a respective source device and sink device as well as a system of a source device and a sink device and respective computer program products.
G10L 19/083 - Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters the excitation function being an excitation gain
Described herein is a method of encoding an audio signal. The method comprises: generating a plurality of subband audio signals based on the audio signal; determining a spectral envelope of the audio signal; for each subband audio signal, determining autocorrelation information for the subband audio signal based on an autocorrelation function of the subband audio signal; and generating an encoded representation of the audio signal, the encoded representation comprising a representation of the spectral envelope of the audio signal and a representation of the autocorrelation information for the plurality of subband audio signals. Further described are methods of decoding the audio signal from the encoded representation, as well as corresponding encoders, decoders, computer programs, and computer-readable recording media.
G10L 19/02 - Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
G10L 25/18 - Speech or voice analysis techniques not restricted to a single one of groups characterised by the type of extracted parameters the extracted parameters being spectral information of each sub-band
G10L 19/24 - Variable rate codecs, e.g. for generating different qualities using a scalable representation such as hierarchical encoding or layered encoding
G10L 25/06 - Speech or voice analysis techniques not restricted to a single one of groups characterised by the type of extracted parameters the extracted parameters being correlation coefficients
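A compact sketch of the encoded representation described above: a spectral envelope plus per-subband autocorrelation estimates up to a small maximum lag. The biased estimator and the dictionary layout are illustrative choices, not the codec's bitstream format:

```python
# Per-subband autocorrelation information plus a spectral envelope,
# bundled as an (illustrative) encoded representation of the audio signal.
def autocorr(x, max_lag):
    """Biased autocorrelation estimates for lags 0..max_lag."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) / n
            for k in range(max_lag + 1)]

def encode(subband_signals, envelope, max_lag=2):
    return {"envelope": envelope,
            "autocorr": [autocorr(s, max_lag) for s in subband_signals]}
```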
44.
MANAGING PLAYBACK OF MULTIPLE STREAMS OF AUDIO OVER MULTIPLE SPEAKERS
A multi-stream rendering system and method may render and play simultaneously a plurality of audio program streams over a plurality of arbitrarily placed loudspeakers. At least one of the program streams may be a spatial mix. The rendering of said spatial mix may be dynamically modified as a function of the simultaneous rendering of one or more additional program streams. The rendering of one or more additional program streams may be dynamically modified as a function of the simultaneous rendering of the spatial mix.
A method is provided for coding at least one image split up into partitions, a current partition to be coded containing data, at least one data item of which is allotted a sign. The coding method includes, for the current partition, the following steps: calculating the value of a function representative of the data of the current partition with the exclusion of the sign; comparing the calculated value with a predetermined value of the sign; as a function of the result of the comparison, modifying or not modifying at least one of the data items of the current partition; and, in the case of modification, coding the at least one modified data item.
H04N 19/50 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
H04N 19/176 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock
H04N 19/463 - Embedding additional information in the video signal during the compression process by compressing encoding parameters before transmission
H04N 19/467 - Embedding additional information in the video signal during the compression process characterised by the embedded information being invisible, e.g. watermarking
H04N 19/91 - Entropy coding, e.g. variable length coding [VLC] or arithmetic coding
H04N 19/117 - Filters, e.g. for pre-processing or post-processing
H04N 19/18 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being a set of transform coefficients
H04N 19/48 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using compressed domain processing techniques other than decoding, e.g. modification of transform coefficients, variable length coding [VLC] data or run-length data
H04N 19/147 - Data rate or code amount at the encoder output according to rate distortion criteria
H04N 19/46 - Embedding additional information in the video signal during the compression process
H04N 19/196 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the adaptation method, adaptation tool or adaptation type used for the adaptive coding being specially adapted for the computation of encoding parameters, e.g. by averaging previously computed encoding parameters
H04N 19/167 - Position within a video image, e.g. region of interest [ROI]
H04N 19/85 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression
H04N 19/13 - Adaptive entropy coding, e.g. adaptive variable length coding [AVLC] or context adaptive binary arithmetic coding [CABAC]
46.
AUDIO DE-ESSER INDEPENDENT OF ABSOLUTE SIGNAL LEVEL
Methods, systems, and computer program products for automatic de-essing are disclosed. An automatic de-esser can be used without manually setting parameters and can perform reliable sibilance detection and reduction regardless of absolute signal level, singer gender, and other extraneous factors. An audio processing device divides input audio signals into buffers, each containing a number of samples, with the buffers overlapping one another. The audio processing device transforms each buffer from the time domain into the frequency domain and implements de-essing as a multi-band compressor that acts only on a designated sibilance band. The audio processing device determines an amount of attenuation in the sibilance band based on a comparison of the energy level in the sibilance band of a buffer to the broadband energy level in a previous buffer. The amount of attenuation is also determined based on a zero-crossing rate, as well as the slope and onset of a compression curve.
G10L 21/0264 - Noise filtering characterised by the type of parameter measurement, e.g. correlation techniques, zero crossing techniques or predictive techniques
H03G 3/30 - Automatic control in amplifiers having semiconductor devices
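The detection rule described above — comparing a buffer's sibilance-band energy against the previous buffer's broadband energy so that the decision is independent of absolute level — can be sketched in a few lines. This is a minimal illustration, not the patented implementation; the frame size, band edges, threshold and ratio are assumed values, and the zero-crossing-rate and compression-curve refinements are omitted:

```python
import numpy as np

def deess(signal, sr, frame=1024, hop=512,
          sib_lo=5000.0, sib_hi=9000.0, ratio=4.0, threshold_db=-10.0):
    """Attenuate the sibilance band of each overlapping buffer based on the
    ratio of its sibilance-band energy to the previous buffer's broadband
    energy (so detection is independent of absolute signal level)."""
    window = np.hanning(frame)
    out = np.zeros(len(signal))
    norm = np.zeros(len(signal))
    freqs = np.fft.rfftfreq(frame, 1.0 / sr)
    sib = (freqs >= sib_lo) & (freqs <= sib_hi)   # designated sibilance band
    prev_broadband = None
    for start in range(0, len(signal) - frame + 1, hop):
        buf = signal[start:start + frame] * window
        spec = np.fft.rfft(buf)
        energy = np.abs(spec) ** 2
        broadband = energy.sum() + 1e-12
        if prev_broadband is not None:
            # sibilance level relative to previous buffer's broadband energy
            rel_db = 10.0 * np.log10(energy[sib].sum() / prev_broadband + 1e-12)
            over = rel_db - threshold_db
            if over > 0.0:
                gain_db = -over * (1.0 - 1.0 / ratio)   # downward compression
                spec[sib] *= 10.0 ** (gain_db / 20.0)
        prev_broadband = broadband
        out[start:start + frame] += np.fft.irfft(spec, frame) * window
        norm[start:start + frame] += window ** 2
    return out / np.maximum(norm, 1e-12)   # overlap-add normalization
```

Non-sibilant material (low energy in the band relative to the previous buffer's total) passes through unchanged by the windowed overlap-add.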
47.
The present document discloses a method for playback of media content via a delivery channel. The delivery channel may generally refer to the channels through which audio or video programs are delivered (transmitted) to the user (receiver). The media content may generally comprise consecutive media programs. In particular, for a specific media program within the media content, a respective content type for that specific media program is also provided. The method may comprise receiving an indication of the sensitivity of a media program to playback latency. The method may further comprise receiving at least a portion of the media program. The method may yet further comprise adapting the playback of the media program based on the indication of its sensitivity to playback latency.
H04N 21/4402 - Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to MPEG-4 scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display
H04N 21/435 - Processing of additional data, e.g. decrypting of additional data or reconstructing software from modules extracted from the transport stream
48.
SELECTABLE LINEAR PREDICTIVE OR TRANSFORM CODING MODES WITH ADVANCED STEREO CODING
Methods and systems for advanced stereo processing of an audio signal are disclosed. The methods and systems include selecting a coding mode of either transform coding or linear predictive coding and performing advanced stereo processing when in the selected coding mode. Both encoding and decoding operations are provided.
H04S 5/02 - Pseudo-stereo systems, e.g. in which additional channel signals are derived from monophonic signals by means of phase shifting, time delay or reverberation of the pseudo four-channel type, e.g. in which rear channel signals are derived from two-channel stereo signals
H04S 5/00 - Pseudo-stereo systems, e.g. in which additional channel signals are derived from monophonic signals by means of phase shifting, time delay or reverberation
H04S 3/02 - Systems employing more than two channels, e.g. quadraphonic of the matrix type, i.e. in which input signals are combined algebraically, e.g. after having been phase shifted with respect to each other
G10L 19/008 - Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
49.
METHODS AND DEVICES FOR GENERATION AND PROCESSING OF MODIFIED BITSTREAMS
Described herein is a method for generating a modified bitstream on a source device, wherein the method includes the steps of: a) receiving, by a receiver, a bitstream including coded media data; b) generating, by an embedder, a payload of additional media data and embedding the payload in the bitstream to obtain, as an output from the embedder, a modified bitstream including the coded media data and the payload of the additional media data; and c) outputting the modified bitstream to a sink device. Further described is a method for processing said modified bitstream on a sink device, as well as a respective source device and sink device, a system of a source device and a sink device, and respective computer program products.
G10L 19/018 - Audio watermarking, i.e. embedding inaudible data in the audio signal
H04W 4/80 - Services using short range communication, e.g. near-field communication [NFC], radio-frequency identification [RFID] or low energy communication
50.
System and method for displaying high quality images in a dual modulation projection system
A novel high efficiency image projection system includes a beam-steering modulator, an amplitude modulator, and a controller. In a particular embodiment the controller generates beam-steering drive values from image data and uses the beam-steering drive values to drive the beam-steering modulator. Additionally, the controller utilizes the beam-steering drive values to generate a lightfield simulation of a lightfield projected onto the amplitude modulator by the beam-steering modulator. The controller utilizes the lightfield simulation to generate amplitude drive values for driving the amplitude modulator in order to project a high quality version of the image described by the image data.
51.
Dialogue enhancement of an audio signal, comprising: obtaining a set of time-varying parameters configured to estimate a dialogue component present in said audio signal; estimating the dialogue component from the audio signal; applying a compressor only to the estimated dialogue component to generate a processed dialogue component; and applying a user-determined gain to the processed dialogue component to provide an enhanced dialogue component. The processing of the estimated dialogue component may be performed on the decoder side or the encoder side. The invention enables improved dialogue enhancement.
G10L 21/0308 - Voice signal separating characterised by the type of parameter measurement, e.g. correlation techniques, zero crossing techniques or predictive techniques
G10L 19/008 - Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
H03G 9/18 - Combinations of two or more types of control, e.g. gain control and tone control in untuned amplifiers having semiconductor devices for tone control and volume expansion or compression
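The chain described above — estimate the dialogue component with time-varying parameters, compress only that component, then apply a user-determined gain before recombining — might be sketched as follows. The per-block scalar parameters and the compressor settings are illustrative assumptions, not values from the disclosure:

```python
import numpy as np

def enhance_dialogue(audio, params, user_gain_db=6.0,
                     comp_threshold_db=-20.0, comp_ratio=3.0):
    """Per-block dialogue enhancement: estimate the dialogue component from
    time-varying parameters, compress only that component, apply a
    user-determined gain, and recombine with the residual."""
    out = np.empty_like(audio)
    n = len(audio) // len(params)
    for i, p in enumerate(params):
        block = audio[i * n:(i + 1) * n]
        dialogue = p * block                 # parametric dialogue estimate
        residual = block - dialogue
        rms_db = 20 * np.log10(np.sqrt(np.mean(dialogue ** 2)) + 1e-12)
        if rms_db > comp_threshold_db:       # compress only the dialogue
            gain_db = (comp_threshold_db - rms_db) * (1 - 1 / comp_ratio)
            dialogue = dialogue * 10 ** (gain_db / 20)
        dialogue *= 10 ** (user_gain_db / 20)  # user-determined gain
        out[i * n:(i + 1) * n] = residual + dialogue
    out[len(params) * n:] = audio[len(params) * n:]  # pass any tail through
    return out
```

A block with an estimation parameter of zero contributes no dialogue estimate, so it passes through unmodified; only the estimated dialogue is compressed and boosted.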
52.
METHOD AND APPARATUS FOR DECODING STEREO LOUDSPEAKER SIGNALS FROM A HIGHER-ORDER AMBISONICS AUDIO SIGNAL
Decoding of Ambisonics representations for a stereo loudspeaker setup is known for first-order Ambisonics audio signals. But such first-order Ambisonics approaches have either high negative side lobes or poor localisation in the frontal region. The invention deals with the processing for stereo decoders for higher-order Ambisonics (HOA).
H04S 3/00 - Systems employing more than two channels, e.g. quadraphonic
H04S 7/00 - Indicating arrangements; Control arrangements, e.g. balance control
H04S 3/02 - Systems employing more than two channels, e.g. quadraphonic of the matrix type, i.e. in which input signals are combined algebraically, e.g. after having been phase shifted with respect to each other
G10L 19/00 - Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
53.
Layered coding for compressed sound or sound field representations
The present document relates to a method of layered encoding of a compressed sound representation of a sound or sound field. The compressed sound representation comprises a basic compressed sound representation comprising a plurality of components, basic side information for decoding the basic compressed sound representation to a basic reconstructed sound representation of the sound or sound field, and enhancement side information including parameters for improving the basic reconstructed sound representation. The method comprises: sub-dividing the plurality of components into a plurality of groups and assigning each group to a respective one of a plurality of hierarchical layers, the number of groups corresponding to the number of layers, and the plurality of layers including a base layer and one or more hierarchical enhancement layers; adding the basic side information to the base layer; and determining a plurality of portions of enhancement side information from the enhancement side information and assigning each portion to a respective one of the plurality of layers, wherein each portion of enhancement side information includes parameters for improving a reconstructed sound representation obtainable from data included in the respective layer and any lower layers. The document further relates to a method of decoding a compressed sound representation of a sound or sound field, wherein the compressed sound representation is encoded in a plurality of hierarchical layers that include a base layer and one or more hierarchical enhancement layers, as well as to an encoder and a decoder for layered coding of a compressed sound representation.
G10L 19/008 - Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
H04S 3/00 - Systems employing more than two channels, e.g. quadraphonic
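The layer-assignment step can be illustrated with a small sketch: the components are sub-divided into as many groups as there are layers, the base layer additionally carries the basic side information, and every layer carries its portion of enhancement side information. The dictionary layout is a hypothetical container, not the codec's actual syntax:

```python
def build_layers(components, group_sizes, basic_side_info, enh_portions):
    """Sub-divide `components` into len(group_sizes) groups, one group per
    hierarchical layer (layer 0 = base layer). The base layer additionally
    carries the basic side information; every layer carries the portion of
    enhancement side info for the signal reconstructable up to that layer."""
    assert len(group_sizes) == len(enh_portions)   # one group per layer
    assert sum(group_sizes) == len(components)
    layers, pos = [], 0
    for i, (size, enh) in enumerate(zip(group_sizes, enh_portions)):
        layer = {"components": components[pos:pos + size],
                 "enhancement_side_info": enh}
        if i == 0:
            layer["basic_side_info"] = basic_side_info   # base layer only
        layers.append(layer)
        pos += size
    return layers
```

A decoder receiving only the first k layers would then use the basic side information plus the enhancement portions of layers 0..k-1, matching the "respective layer and any lower layers" constraint.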
54.
Method and apparatus for processing of auxiliary media streams embedded in an MPEG-H 3D audio stream
The disclosure relates to methods, apparatus and systems for side-load processing of packetized media streams. In an embodiment, the apparatus comprises: a receiver for receiving a bitstream, and a splitter for identifying a packet type in the bitstream and splitting the bitstream, based on the identified value of the packet type, into a main stream and an auxiliary stream.
G10L 19/008 - Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
H04N 21/434 - Disassembling of a multiplex stream, e.g. demultiplexing audio and video streams or extraction of additional data from a video stream; Remultiplexing of multiplex streams; Extraction or processing of SI; Disassembling of packetised elementary stream
H04N 21/4363 - Adapting the video stream to a specific local network, e.g. a IEEE 1394 or Bluetooth® network
H04N 21/439 - Processing of audio elementary streams
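The splitter's behaviour — routing each packet to a main or auxiliary stream based on the value of its packet type — reduces to a simple dispatch. The `(packet_type, payload)` tuple format is an assumption for illustration, not the actual packet syntax:

```python
def split_streams(packets, aux_types):
    """Split a packetized media stream into a main stream and an auxiliary
    stream by inspecting each packet's type value."""
    main, aux = [], []
    for ptype, payload in packets:
        (aux if ptype in aux_types else main).append((ptype, payload))
    return main, aux
```

Downstream, the main stream would go to the primary decoder and the auxiliary stream to the side-load processor.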
55.
METHOD AND APPARATUS FOR UPDATING A NEURAL NETWORK
Described herein is a method of generating a media bitstream to transmit parameters for updating a neural network implemented in a decoder, wherein the method includes the steps of: (a) determining at least one set of parameters for updating the neural network; (b) encoding the at least one set of parameters and media data to generate the media bitstream; and (c) transmitting the media bitstream to the decoder for updating the neural network with the at least one set of parameters. Described herein are further a method for updating a neural network implemented in a decoder, an apparatus for generating a media bitstream to transmit parameters for updating a neural network implemented in a decoder, an apparatus for updating a neural network implemented in a decoder and computer program products comprising a computer-readable storage medium with instructions adapted to cause the device to carry out said methods when executed by a device having processing capability.
56.
In some embodiments, a pitch filter for filtering a preliminary audio signal generated from an audio bitstream is disclosed. The pitch filter has an operating mode selected from one of either: (i) an active mode where the preliminary audio signal is filtered using filtering information to obtain a filtered audio signal, and (ii) an inactive mode where the pitch filter is disabled. The preliminary audio signal is generated in an audio encoder or audio decoder having a coding mode selected from at least two distinct coding modes, and the pitch filter is capable of being selectively operated in either the active mode or the inactive mode while operating in the coding mode based on control information.
G10L 19/00 - Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
G10L 19/20 - Vocoders using multiple modes using sound class specific coding, hybrid encoders or object based coding
G10L 19/12 - Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters the excitation function being a code excitation, e.g. in code excited linear prediction [CELP] vocoders
G10L 19/22 - Mode decision, i.e. based on audio signal content versus external parameters
G10L 21/007 - Changing voice quality, e.g. pitch or formants characterised by the process used
G10L 19/032 - Quantisation or dequantisation of spectral components
G10L 19/02 - Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
G10L 19/107 - Sparse pulse excitation, e.g. by using algebraic codebook
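The selective operation of such a pitch filter — enabled or disabled per coding mode by control information — can be sketched with a basic long-term comb filter standing in for the actual filter. The control-information layout and the lag/gain filtering information are illustrative assumptions:

```python
def apply_pitch_filter(preliminary, coding_mode, control_info, lag, gain):
    """Selectively run a long-term (pitch) filter: control information
    decides, per coding mode, whether the filter is active or disabled."""
    if not control_info.get(coding_mode, False):
        return list(preliminary)          # inactive mode: pass through
    out = list(preliminary)
    for n in range(lag, len(out)):
        # simple long-term comb filter as a stand-in for the pitch filter
        out[n] = preliminary[n] + gain * out[n - lag]
    return out
```

With the control bit cleared for a mode, the preliminary signal is returned untouched; with it set, past samples at the pitch lag are fed back.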
57.
A system for decoding a video bitstream includes receiving a bitstream and a plurality of enhancement bitstreams together with receiving a video parameter set and a video parameter set extension. The system also receives an output layer set change message including information indicating a change in at least one output layer set.
H04N 19/70 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards
H04N 19/30 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability
H04N 19/12 - Selection from among a plurality of transforms or standards, e.g. selection between discrete cosine transform [DCT] and sub-band transform or selection between H.263 and H.264
H04N 19/44 - Decoders specially adapted therefor, e.g. video decoders which are asymmetric with respect to the encoder
H04N 19/423 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals - characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation characterised by memory arrangements
H04N 19/187 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being a scalable video layer
H04N 21/845 - Structuring of content, e.g. decomposing content into time segments
H04N 21/2343 - Processing of video elementary streams, e.g. splicing of video streams or manipulating MPEG-4 scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
H04N 21/44 - Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to MPEG-4 scene graphs
H04N 19/172 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a picture, frame or field
H04N 19/177 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being a group of pictures [GOP]
H04N 19/597 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding specially adapted for multi-view video sequence encoding
58.
METHODS AND DEVICES FOR CONTROLLING AUDIO PARAMETERS
A method of controlling headphones having external microphone signal pass-through functionality may involve controlling a display to present a geometric shape on the display and receiving an indication of digit motion from a sensor system associated with the display. The sensor system may include a touch sensor system or a gesture sensor system. The indication may be an indication of a direction of digit motion relative to the display. The method may involve controlling the display to present a sequence of images indicating that the geometric shape either enlarges or contracts, depending on the direction of digit motion, and changing a headphone transparency setting according to a current size of the geometric shape. The headphone transparency setting may correspond to an external microphone signal gain setting and/or a media signal gain setting of the headphones.
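The mapping described above — shape size driven by digit motion, transparency gain driven by shape size — can be sketched as two small helpers. The size range, step, and dB limits are assumed values, not taken from the disclosure:

```python
def update_size(size, direction, step=0.05, min_size=0.2, max_size=1.0):
    """Enlarge the geometric shape on outward digit motion, contract it on
    inward motion, clamped to the allowed size range."""
    delta = step if direction == "outward" else -step
    return min(max(size + delta, min_size), max_size)

def transparency_gain_db(shape_size, min_size=0.2, max_size=1.0,
                         min_gain_db=-40.0, max_gain_db=0.0):
    """Map the current size of the on-screen shape to an external microphone
    pass-through gain (linear interpolation in dB)."""
    size = min(max(shape_size, min_size), max_size)
    frac = (size - min_size) / (max_size - min_size)
    return min_gain_db + frac * (max_gain_db - min_gain_db)
```

A fully enlarged shape yields full pass-through (0 dB); the smallest shape effectively mutes the external microphone signal.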
59.
A system and method comprise a light source; a spatial light modulator including a substantially transparent material layer and a phase modulation layer; an imaging device configured to receive light from the light source as reflected by the spatial light modulator and to generate image data; and a controller. The controller provides a phase-drive signal to the spatial light modulator and determines an attenuating wavefront of the substantially transparent material layer based on the image data.
G09G 3/36 - Control arrangements or circuits, of interest only in connection with visual indicators other than cathode-ray tubes for presentation of an assembly of a number of characters, e.g. a page, by composing the assembly by combination of individual elements arranged in a matrix by control of light from an independent source using liquid crystals
G02B 26/06 - Optical devices or arrangements for the control of light using movable or deformable optical elements for controlling the phase of light
G03H 1/04 - Processes or apparatus for producing holograms
G03H 1/22 - Processes or apparatus for obtaining an optical image from holograms
60.
Methods, Apparatus and Systems for Dual-Ended Media Intelligence
A method of encoding audio content comprises performing a content analysis of the audio content, generating classification information indicative of a content type of the audio content based on the content analysis, encoding the audio content and the classification information in a bitstream, and outputting the bitstream. A method of decoding audio content from a bitstream including audio content and classification information for the audio content, wherein the classification information is indicative of a content classification of the audio content, comprises receiving the bitstream, decoding the audio content and the classification information, and selecting, based on the classification information, a post processing mode for performing post processing of the decoded audio content. Selecting the post processing mode can involve calculating one or more control weights for post processing of the decoded audio content based on the classification information.
G10L 19/008 - Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
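The final selection step — deriving post-processing control weights from the decoded classification information — might look like the following sketch. The content-type keys and the weight formulas are hypothetical; the disclosure only states that control weights are calculated from the classification information:

```python
def control_weights(classification):
    """Derive post-processing control weights from decoder-side
    classification information (hypothetical content-type confidences
    in the range [0, 1])."""
    music = classification.get("music", 0.0)
    speech = classification.get("speech", 0.0)
    effects = classification.get("effects", 0.0)
    return {
        "dialogue_enhancer": speech,                      # favour speech
        "intelligent_equalizer": music * (1.0 - speech),  # avoid speech
        "surround_virtualizer": max(music, effects),
    }
```

The decoder would then scale or enable each post-processing stage by its weight, so the same bitstream drives different post processing for, say, a podcast and a film soundtrack.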
61.
Methods for enhancing dialogue in audio content, comprising: providing a first audio signal presentation of the audio components; providing a second audio signal presentation; receiving a set of dialogue estimation parameters configured to enable estimation of dialogue components from the first audio signal presentation; applying said set of dialogue estimation parameters to said first audio signal presentation to form a dialogue presentation of the dialogue components; and combining the dialogue presentation with said second audio signal presentation to form a dialogue-enhanced audio signal presentation for reproduction on the second audio reproduction system, wherein at least one of said first and second audio signal presentations is a binaural audio signal presentation.
H04S 3/00 - Systems employing more than two channels, e.g. quadraphonic
H04S 7/00 - Indicating arrangements; Control arrangements, e.g. balance control
H04S 3/02 - Systems employing more than two channels, e.g. quadraphonic of the matrix type, i.e. in which input signals are combined algebraically, e.g. after having been phase shifted with respect to each other
62.
A method for encoding an image that has been cut up into partitions. The method includes: predicting data of a current partition based on an already encoded and then decoded reference partition, generating a predicted partition; and determining residual data by comparing data relating to the current partition with the predicted partition, the residual data being associated with various digital data items. Prior to producing a signal containing the encoded information, the following steps are performed: determining, from the previously determined residual data, a subset containing residual data capable of being modified; calculating the value of a function representative of the residual data; comparing the calculated value with a value of at least one of the digital data items; based on the comparison, modifying or not modifying at least one of the residual data items of the subset; and, in the event of a modification, entropy encoding the at least one modified residual data item.
H04N 19/91 - Entropy coding, e.g. variable length coding [VLC] or arithmetic coding
H04N 19/105 - Selection of the reference unit for prediction within a chosen coding or prediction mode, e.g. adaptive choice of position and number of pixels used for prediction
H04N 19/117 - Filters, e.g. for pre-processing or post-processing
H04N 19/119 - Adaptive subdivision aspects e.g. subdivision of a picture into rectangular or non-rectangular coding blocks
H04N 19/122 - Selection of transform size, e.g. 8x8 or 2x4x8 DCT; Selection of sub-band transforms of varying structure or type
H04N 19/13 - Adaptive entropy coding, e.g. adaptive variable length coding [AVLC] or context adaptive binary arithmetic coding [CABAC]
H04N 19/14 - Coding unit complexity, e.g. amount of activity or edge presence estimation
H04N 19/147 - Data rate or code amount at the encoder output according to rate distortion criteria
H04N 19/159 - Prediction type, e.g. intra-frame, inter-frame or bidirectional frame prediction
H04N 19/172 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a picture, frame or field
H04N 19/176 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock
H04N 19/18 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being a set of transform coefficients
H04N 19/184 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being bits, e.g. of the compressed video stream
H04N 19/44 - Decoders specially adapted therefor, e.g. video decoders which are asymmetric with respect to the encoder
H04N 19/46 - Embedding additional information in the video signal during the compression process
H04N 19/463 - Embedding additional information in the video signal during the compression process by compressing encoding parameters before transmission
H04N 19/467 - Embedding additional information in the video signal during the compression process characterised by the embedded information being invisible, e.g. watermarking
H04N 19/48 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using compressed domain processing techniques other than decoding, e.g. modification of transform coefficients, variable length coding [VLC] data or run-length data
H04N 19/593 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving spatial prediction techniques
63.
Methods and apparatus for rate quality scalable coding with generative models
Described herein is a method of decoding an audio or speech signal, the method including the steps of: (a) receiving, by a decoder, a coded bitstream including the audio or speech signal and conditioning information; (b) providing, by a bitstream decoder, decoded conditioning information in a format associated with a first bitrate; (c) converting, by a converter, the decoded conditioning information from the format associated with the first bitrate to a format associated with a second bitrate; and (d) providing, by a generative neural network, a reconstruction of the audio or speech signal according to a probabilistic model conditioned by the conditioning information in the format associated with the second bitrate. Described are further an apparatus for decoding an audio or speech signal, a respective encoder, a system of the encoder and the apparatus for decoding an audio or speech signal as well as a respective computer program product.
G10L 19/06 - Determination or coding of the spectral characteristics, e.g. of the short-term prediction coefficients
G10L 19/032 - Quantisation or dequantisation of spectral components
G10L 19/24 - Variable rate codecs, e.g. for generating different qualities using a scalable representation such as hierarchical encoding or layered encoding
G10L 25/30 - Speech or voice analysis techniques not restricted to a single one of groups characterised by the analysis technique using neural networks
64.
Tracking a reference picture on an electronic device
A method for tracking a reference picture on an electronic device is described. The method includes receiving a bitstream. The method also includes decoding a portion of the bitstream to produce a decoded reference picture. The method further includes tracking the decoded reference picture in a decoded picture buffer (DPB) with reduced overhead referencing. The method additionally includes decoding a picture based on the decoded reference picture.
H04N 19/00 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
H04N 19/423 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals - characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation characterised by memory arrangements
H04N 19/44 - Decoders specially adapted therefor, e.g. video decoders which are asymmetric with respect to the encoder
H04N 19/503 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
H04N 19/573 - Motion compensation with multiple frame prediction using two or more reference frames in a given prediction direction
H04N 19/58 - Motion compensation with long-term prediction, i.e. the reference frame for a current frame not being the temporally closest one
H04N 19/587 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal sub-sampling or interpolation, e.g. decimation or subsequent interpolation of pictures in a video sequence
H04N 19/70 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards
65.
The disclosure herein generally relates to capturing, acoustic pre-processing, encoding, decoding, and rendering of directional audio of an audio scene. In particular, it relates to a device adapted to modify a directional property of a captured directional audio in response to spatial data of a microphone system capturing the directional audio. The disclosure further relates to a rendering device configured to modify a directional property of a received directional audio in response to received spatial data.
66.
This disclosure falls within the field of audio coding; in particular, it relates to providing a framework for loudness consistency among differing audio output signals. Specifically, the disclosure relates to methods, computer program products and apparatus for encoding and decoding audio data bitstreams in order to attain a desired loudness level of an output audio signal.
G10L 19/24 - Variable rate codecs, e.g. for generating different qualities using a scalable representation such as hierarchical encoding or layered encoding
67.
A system utilizing a high-throughput coding mode for CABAC in HEVC is described. The system may include an electronic device configured to obtain a block of data to be encoded using an arithmetic-based encoder; to generate a sequence of syntax elements using the obtained block; to compare an Absolute-3 value of the sequence, or a parameter associated with the Absolute-3 value, to a preset value; and to convert the Absolute-3 value to a codeword using a first code or a second code that is different from the first code, according to the result of the comparison.
H04N 19/60 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
H04N 19/70 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards
H03M 7/40 - Conversion to or from variable length codes, e.g. Shannon-Fano code, Huffman code, Morse code
H04N 19/91 - Entropy coding, e.g. variable length coding [VLC] or arithmetic coding
H04N 19/44 - Decoders specially adapted therefor, e.g. video decoders which are asymmetric with respect to the encoder
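The two-code scheme — a first code for small Absolute-3 values and a different second code once a preset value is reached — can be illustrated with a unary prefix plus a fixed-length escape. The concrete codes, preset, and escape width here are assumptions for illustration; the actual HEVC binarization differs:

```python
def unary(n):
    """Unary code: n ones followed by a terminating zero."""
    return "1" * n + "0"

def fixed(n, bits):
    """Fixed-length binary code of the given width."""
    return format(n, "0{}b".format(bits))

def encode_abs3(value, preset=8, bits=6):
    """Choose between two codes for an Absolute-3 value: the first (unary)
    code for values below the preset, and a second code (unary escape plus
    fixed-length suffix) once the preset is reached — trading a little
    compression for higher decoding throughput."""
    if value < preset:
        return unary(value)                              # first code
    return unary(preset) + fixed(value - preset, bits)   # second code
```

The fixed-length suffix of the second code can be decoded without per-bit context updates, which is the point of a high-throughput mode.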
68.
The present disclosure provides methods, devices and computer program products for encoding and decoding of a vector of parameters in an audio coding system. The disclosure further relates to a method and apparatus for reconstructing an audio object in an audio decoding system. According to the disclosure, a modulo differential approach for coding and decoding a vector of a non-periodic quantity may improve the coding efficiency and provide encoders and decoders with lower memory requirements. Moreover, an efficient method for encoding and decoding a sparse matrix is provided.
G10L 19/008 - Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
G10L 19/00 - Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
G10L 19/038 - Vector quantisation, e.g. TwinVQ audio
H04S 3/02 - Systems employing more than two channels, e.g. quadraphonic of the matrix type, i.e. in which input signals are combined algebraically, e.g. after having been phase shifted with respect to each other
G10L 19/032 - Quantisation or dequantisation of spectral components
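The modulo differential approach mentioned in the abstract above can be sketched as follows. This is a minimal illustration only, not the patented coder: the modulus `M`, the wrapping of each difference into the smallest-magnitude residue, and the zero initial predictor are all assumptions made for the example.

```python
def encode_modulo_diff(values, M):
    # Encode each value as the difference from its predecessor,
    # wrapped into the range [-M/2, M/2) so small codewords suffice.
    out, prev = [], 0
    for v in values:
        d = (v - prev) % M
        if d >= M // 2:
            d -= M
        out.append(d)
        prev = v
    return out

def decode_modulo_diff(deltas, M):
    # Undo the differential coding by accumulating deltas modulo M.
    out, prev = [], 0
    for d in deltas:
        prev = (prev + d) % M
        out.append(prev)
    return out
```

Because the residues are wrapped, a jump from 1 to 15 with `M = 16` is coded as −2 rather than +14, which is what makes the scheme attractive for entropy coding of vectors without a natural period.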
69.
Harmonic transposition in an audio coding method and system
G10L 19/24 - Variable rate codecs, e.g. for generating different qualities using a scalable representation such as hierarchical encoding or layered encoding
G10L 19/02 - Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
G10L 19/022 - Blocking, i.e. grouping of samples in time; Choice of analysis windows; Overlap factoring
70.
METHODS AND DEVICES FOR GENERATING OR DECODING A BITSTREAM COMPRISING IMMERSIVE AUDIO SIGNALS
The present document describes a method (500) for generating a bitstream (101), wherein the bitstream (101) comprises a sequence of superframes (400) for a sequence of frames of an immersive audio signal (111). The method (500) comprises, repeatedly for the sequence of superframes (400), inserting (501) coded audio data (206) for one or more frames of one or more downmix channel signals (203) derived from the immersive audio signal (111), into data fields (411, 421, 412, 422) of a superframe (400); and inserting (502) metadata (202, 205) for reconstructing one or more frames of the immersive audio signal (111) from the coded audio data (206), into a metadata field (403) of the superframe (400).
G10L 19/008 - Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
71.
Processing of audio signals during high frequency reconstruction
The application relates to HFR (High Frequency Reconstruction/Regeneration) of audio signals. In particular, the application relates to a method and system for performing HFR of audio signals having large variations in energy level across the low frequency range which is used to reconstruct the high frequencies of the audio signal. A system configured to generate a plurality of high frequency subband signals covering a high frequency interval from a plurality of low frequency subband signals is described. The system comprises means for receiving the plurality of low frequency subband signals; means for receiving a set of target energies, each target energy covering a different target interval within the high frequency interval and being indicative of the desired energy of one or more high frequency subband signals lying within the target interval; means for generating the plurality of high frequency subband signals from the plurality of low frequency subband signals and from a plurality of spectral gain coefficients associated with the plurality of low frequency subband signals, respectively; and means for adjusting the energy of the plurality of high frequency subband signals using the set of target energies.
G10L 21/038 - Speech enhancement, e.g. noise reduction or echo cancellation using band spreading techniques
G10L 19/00 - Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
G10L 19/02 - Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
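The energy-adjustment step described in the abstract above can be illustrated with a small sketch. It is not the patented method; the grouping of subbands into target intervals and the square-root gain law are assumptions for the example.

```python
import math

def adjust_subband_energies(subbands, intervals, target_energies):
    # subbands: list of high-frequency subband signals (lists of samples).
    # intervals: for each target interval, the indices of the subbands it covers.
    # target_energies: desired total energy per target interval.
    out = [list(s) for s in subbands]
    for idx, e_target in zip(intervals, target_energies):
        # Measure the current energy within the target interval.
        e_meas = sum(x * x for i in idx for x in subbands[i]) or 1e-12
        # Scale so the measured energy matches the target energy.
        g = math.sqrt(e_target / e_meas)
        for i in idx:
            out[i] = [g * x for x in out[i]]
    return out
```

Each target interval gets a single gain, so subbands within one interval keep their relative levels while the interval as a whole is driven to the signalled target energy.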
A system for decoding a video bitstream includes receiving a frame of the video that includes at least one slice and at least one tile and where each of the at least one slice and the at least one tile are not all aligned with one another.
H04N 19/44 - Decoders specially adapted therefor, e.g. video decoders which are asymmetric with respect to the encoder
H04N 13/161 - Encoding, multiplexing or demultiplexing different image signal components
H04N 19/597 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding specially adapted for multi-view video sequence encoding
H04N 19/70 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards
H04N 19/30 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability
H04N 19/112 - Selection of coding mode or of prediction mode according to a given display mode, e.g. for interlaced or progressive display mode
H04N 19/103 - Selection of coding mode or of prediction mode
Multiple virtual source locations may be defined for a volume within which audio objects can move. A set-up process for rendering audio data may involve receiving reproduction speaker location data and pre-computing gain values for each of the virtual sources according to the reproduction speaker location data and each virtual source location. The gain values may be stored and used during “run time,” during which audio reproduction data are rendered for the speakers of the reproduction environment. During run time, for each audio object, contributions from virtual source locations within an area or volume defined by the audio object position data and the audio object size data may be computed. A set of gain values for each output channel of the reproduction environment may be computed based, at least in part, on the computed contributions. Each output channel may correspond to at least one reproduction speaker of the reproduction environment.
H04R 5/02 - Spatial or constructional arrangements of loudspeakers
H04S 7/00 - Indicating arrangements; Control arrangements, e.g. balance control
H04S 3/00 - Systems employing more than two channels, e.g. quadraphonic
H04S 5/00 - Pseudo-stereo systems, e.g. in which additional channel signals are derived from monophonic signals by means of phase shifting, time delay or reverberation
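The two-phase scheme in the abstract above — pre-computing per-virtual-source gains at set-up, then summing contributions at run time — can be sketched as follows. The inverse-distance panning law, the power normalization, and the cubic region test are assumptions for illustration, not the patented renderer.

```python
import math

def precompute_gains(virtual_sources, speakers):
    # Set-up phase: for each virtual source, a power-normalized gain per speaker.
    # Inverse-distance weighting is an assumed stand-in for the real panning law.
    table = []
    for vs in virtual_sources:
        raw = [1.0 / (1e-6 + math.dist(vs, spk)) for spk in speakers]
        norm = math.sqrt(sum(g * g for g in raw))
        table.append([g / norm for g in raw])
    return table

def render_gains(obj_pos, obj_size, virtual_sources, gain_table):
    # Run time: sum pre-computed contributions from virtual sources that fall
    # within the region defined by the object position and size.
    contrib = None
    for vs, gains in zip(virtual_sources, gain_table):
        if all(abs(v - p) <= obj_size / 2 for v, p in zip(vs, obj_pos)):
            contrib = gains if contrib is None else [a + b for a, b in zip(contrib, gains)]
    if contrib is None:
        return [0.0] * len(gain_table[0])
    norm = math.sqrt(sum(g * g for g in contrib)) or 1.0
    return [g / norm for g in contrib]
```

The point of the split is that the expensive per-position gain computation happens once at set-up; run time reduces to a lookup-and-sum over the virtual sources inside the object's volume.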
75.
METHODS, APPARATUS AND SYSTEMS FOR GENERATION, TRANSPORTATION AND PROCESSING OF IMMEDIATE PLAYOUT FRAMES (IPFS)
Described herein is an audio decoder for decoding a bitstream of encoded audio data, wherein the bitstream of encoded audio data represents a sequence of audio sample values and comprises a plurality of frames, wherein each frame comprises associated encoded audio sample values, the audio decoder comprising: a determiner configured to determine whether a frame of the bitstream of encoded audio data is an immediate playout frame comprising encoded audio sample values associated with a current frame and additional information; and an initializer configured to initialize the decoder if the determiner determines that the frame is an immediate playout frame, wherein initializing the decoder comprises decoding the encoded audio sample values comprised by the additional information before decoding the encoded audio sample values associated with the current frame. Further described are a method for decoding said bitstream of encoded audio data, as well as an audio encoder, a system of audio encoders, and a method for generating said bitstream of encoded audio data with immediate playout frames. Also described are an apparatus for generating immediate playout frames in a bitstream of encoded audio data, or for removing immediate playout frames from a bitstream of encoded audio data, and respective non-transitory digital storage media.
An electronic device for encoding a picture is described. The electronic device includes a processor and instructions stored in memory that are in electronic communication with the processor. The instructions are executable to encode a step-wise temporal sub-layer access (STSA) sample grouping. The instructions are further executable to send and/or store the STSA sample grouping.
H04N 19/31 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability in the temporal domain
H04N 21/2343 - Processing of video elementary streams, e.g. splicing of video streams or manipulating MPEG-4 scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
H04N 21/235 - Processing of additional data, e.g. scrambling of additional data or processing content descriptors
H04N 19/172 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a picture, frame or field
77.
Methods, apparatus and systems for low latency audio discontinuity fade out
The present document discloses a method for fading discontinued audio feeds for replay by a speaker. In particular, the method may first comprise receiving an input audio feed comprising a plurality of samples. The method may further comprise determining whether the input audio feed is discontinued. When discontinuity of the input audio feed is detected, the method may comprise generating an intermediate audio signal comprising a plurality of samples based on the discontinued input audio feed. In particular, the intermediate audio signal may be generated based on a last portion of the discontinued input audio feed that has been output for replay. In addition, the method may further comprise applying a fadeout function to the intermediate audio signal to generate a fadeout audio signal. Finally, the method may comprise outputting the fadeout audio signal for replay by the speaker.
H04R 3/04 - Circuits for transducers for correcting frequency response
H04R 1/40 - Arrangements for obtaining desired frequency or directional characteristics for obtaining desired directional characteristic only by combining a number of identical transducers
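The fadeout step described in the abstract above can be sketched as follows. The repetition of the last received portion to form the intermediate signal and the raised-cosine fadeout window are assumptions for the example, not the disclosed method.

```python
import math

def generate_fadeout(last_samples, fade_len):
    # Build an intermediate signal by cyclically repeating the last
    # received portion of the discontinued feed (an assumed strategy).
    n = len(last_samples)
    intermediate = [last_samples[i % n] for i in range(fade_len)]
    # Apply a raised-cosine fadeout from full scale down to silence.
    return [s * 0.5 * (1.0 + math.cos(math.pi * i / (fade_len - 1)))
            for i, s in enumerate(intermediate)]
```

The window starts at gain 1 and reaches exactly 0 at the final sample, so the speaker output decays to silence without an audible click.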
78.
Dynamic range control for a wide variety of playback environments
In an audio encoder, for audio content received in a source audio format, default gains are generated based on a default dynamic range compression (DRC) curve, and non-default gains are generated for a non-default gain profile. Based on the default gains and non-default gains, differential gains are generated. An audio signal comprising the audio content, the default DRC curve, and differential gains is generated. In an audio decoder, the default DRC curve and the differential gains are identified from the audio signal. Default gains are re-generated based on the default DRC curve. Based on the combination of the re-generated default gains and the differential gains, operations are performed on the audio content extracted from the audio signal.
H03G 7/00 - Volume compression or expansion in amplifiers
H03G 9/00 - Combinations of two or more types of control, e.g. gain control and tone control
H03G 9/02 - Combinations of two or more types of control, e.g. gain control and tone control in untuned amplifiers
H03G 9/18 - Combinations of two or more types of control, e.g. gain control and tone control in untuned amplifiers having semiconductor devices for tone control and volume expansion or compression
H03G 3/30 - Automatic control in amplifiers having semiconductor devices
H03G 9/12 - Combinations of two or more types of control, e.g. gain control and tone control in untuned amplifiers having semiconductor devices
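The differential-gain scheme in the abstract above can be sketched in a few lines. The dB-domain gains, the per-block application, and the exact combination rule are assumptions for the example, not the patented codec.

```python
def encode_differential_gains(default_gains, non_default_gains):
    # Encoder side: transmit only the difference between the non-default
    # profile gains and the gains derived from the default DRC curve.
    return [nd - d for nd, d in zip(non_default_gains, default_gains)]

def apply_decoded_gains(blocks, default_gains, diff_gains):
    # Decoder side: re-generate the default gains from the default DRC
    # curve, add the transmitted differentials, and apply per block
    # (gains assumed to be in dB).
    gains_db = [d + x for d, x in zip(default_gains, diff_gains)]
    return [[s * 10 ** (g / 20.0) for s in blk]
            for blk, g in zip(blocks, gains_db)]
```

Because non-default gains usually track the default curve closely, the differentials are small and cheap to code, while the decoder can still reconstruct the full non-default profile.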
Audio content coded for a reference speaker configuration is downmixed to downmix audio content coded for a specific speaker configuration. One or more gain adjustments are performed on individual portions of the downmix audio content coded for the specific speaker configuration. Loudness measurements are then performed on the individual portions of the downmix audio content. An audio signal that comprises the audio content coded for the reference speaker configuration and downmix loudness metadata is generated. The downmix loudness metadata is created based at least in part on the loudness measurements on the individual portions of the downmix audio content.
H04S 7/00 - Indicating arrangements; Control arrangements, e.g. balance control
G10L 19/008 - Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
G10L 21/0324 - Speech enhancement, e.g. noise reduction or echo cancellation by changing the amplitude - Details of processing therefor
Various embodiments are disclosed for (possibly simultaneously) applying EQ and DRC to audio signals. In an embodiment, a method comprises: dividing an input audio signal into n frames, where n is a positive integer greater than one; dividing each frame of the input audio signal into Nb frequency bands, where Nb is a positive integer greater than one; for each frame n: computing an input level of the input audio signal in each band f, resulting in an input audio level distribution for the input audio signal; computing a gain for each band f based at least in part on a mapping of one or more properties of the input audio level distribution to a reference audio level distribution computed from one or more reference audio signals; and applying each computed gain for each band f to each corresponding band f of the input audio signal.
G10L 21/028 - Voice signal separating using properties of sound source
G10L 25/18 - Speech or voice analysis techniques not restricted to a single one of groups characterised by the type of extracted parameters the extracted parameters being spectral information of each sub-band
G10L 25/51 - Speech or voice analysis techniques not restricted to a single one of groups specially adapted for particular use for comparison or discrimination
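The per-band gain mapping described in the abstract above can be sketched as follows. Using the mean level per band as the mapped property of the distribution is an assumption for the example; the patent's mapping may use other distribution properties.

```python
def compute_eq_gains(frames_db, reference_db):
    # frames_db: per-frame lists of per-band input levels in dB.
    # reference_db: per-band target levels from the reference distribution.
    n_bands = len(reference_db)
    # Property of the input level distribution: the mean level per band.
    means = [sum(frame[b] for frame in frames_db) / len(frames_db)
             for b in range(n_bands)]
    # Gain per band maps the input distribution onto the reference.
    return [reference_db[b] - means[b] for b in range(n_bands)]
```

Each returned gain (in dB) would then be applied to the corresponding band of every frame, pulling the input's level distribution toward the reference.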
A system for decoding a video bitstream includes receiving a bitstream and a plurality of enhancement bitstreams together with receiving a video parameter set and a video parameter set extension. The system also receives an output layer set change message including information indicating a change in at least one output layer set.
H04N 19/70 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards
H04N 19/30 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability
H04N 19/44 - Decoders specially adapted therefor, e.g. video decoders which are asymmetric with respect to the encoder
H04N 19/172 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a picture, frame or field
H04N 19/423 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals - characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation characterised by memory arrangements
H04N 19/177 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being a group of pictures [GOP]
A method of playing out media from a media engine run on a receiving apparatus, the method comprising: at the receiving apparatus, receiving a media data structure comprising audio or video content formatted in a plurality of layers, including at least a first layer comprising the audio or video content encoded according to an audio or video encoding scheme respectively, and a second layer encapsulating the encoded content in one or more media containers according to a media container format; determining that at least one of the media containers further encapsulates runnable code for processing at least some of the formatting of the media data structure in order to support playout of the audio or video content by the media engine; running the code on a code engine of the receiving apparatus in order to perform the processing of the media data structure for input to the media engine.
H04N 21/434 - Disassembling of a multiplex stream, e.g. demultiplexing audio and video streams or extraction of additional data from a video stream; Remultiplexing of multiplex streams; Extraction or processing of SI; Disassembling of packetised elementary stream
H04N 21/439 - Processing of audio elementary streams
H04N 21/4402 - Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to MPEG-4 scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display
Described herein is a method for creating object-based audio content from a text input for use in audio books and/or audio play, the method including the steps of: a) receiving the text input; b) performing a semantic analysis of the received text input; c) synthesizing speech and effects based on one or more results of the semantic analysis to generate one or more audio objects; d) generating metadata for the one or more audio objects; and e) creating the object-based audio content including the one or more audio objects and the metadata. Described herein are further a computer-based system including one or more processors configured to perform said method and a computer program product comprising a computer-readable storage medium with instructions adapted to carry out said method when executed by a device having processing capability.
A method for encoding an input audio stream including the steps of obtaining a first playback stream presentation of the input audio stream intended for reproduction on a first audio reproduction system, obtaining a second playback stream presentation of the input audio stream intended for reproduction on a second audio reproduction system, determining a set of transform parameters suitable for transforming an intermediate playback stream presentation to an approximation of the second playback stream presentation, wherein the transform parameters are determined by minimization of a measure of a difference between the approximation of the second playback stream presentation and the second playback stream presentation, and encoding the first playback stream presentation and the set of transform parameters for transmission to a decoder.
G10L 19/008 - Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
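The minimization described in the abstract above — choosing transform parameters that minimize a difference measure between the transformed presentation and the target presentation — has a natural least-squares sketch. Treating the presentations as channel-by-sample matrices and using the Frobenius norm as the difference measure are assumptions for the example.

```python
import numpy as np

def transform_parameters(intermediate, target):
    # intermediate: (n_in_channels, n_samples) playback stream presentation.
    # target: (n_out_channels, n_samples) second playback stream presentation.
    # Find M minimizing ||M @ intermediate - target||_F via least squares.
    x, *_ = np.linalg.lstsq(intermediate.T, target.T, rcond=None)
    return x.T  # shape (n_out_channels, n_in_channels)
```

The decoder would then approximate the second presentation as `M @ intermediate`, so only the first presentation plus the small matrix `M` needs to be transmitted.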
Audio objects are associated with positional metadata. A received downmix signal comprises downmix channels that are linear combinations of one or more audio objects and are associated with respective positional locators. In a first aspect, the downmix signal, the positional metadata and frequency-dependent object gains are received. An audio object is reconstructed by applying the object gain to an upmix of the downmix signal in accordance with coefficients based on the positional metadata and the positional locators. In a second aspect, audio objects have been encoded together with at least one bed channel positioned at a positional locator of a corresponding downmix channel. The decoding system receives the downmix signal and the positional metadata of the audio objects. A bed channel is reconstructed by suppressing the content representing audio objects from the corresponding downmix channel on the basis of the positional locator of the corresponding downmix channel.
G10L 19/00 - Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
G10L 19/008 - Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
G10L 19/20 - Vocoders using multiple modes using sound class specific coding, hybrid encoders or object based coding
H04S 5/00 - Pseudo-stereo systems, e.g. in which additional channel signals are derived from monophonic signals by means of phase shifting, time delay or reverberation
H04S 7/00 - Indicating arrangements; Control arrangements, e.g. balance control
G10L 19/02 - Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
G10L 25/06 - Speech or voice analysis techniques not restricted to a single one of groups characterised by the type of extracted parameters the extracted parameters being correlation coefficients
H04S 3/00 - Systems employing more than two channels, e.g. quadraphonic
H04S 3/02 - Systems employing more than two channels, e.g. quadraphonic of the matrix type, i.e. in which input signals are combined algebraically, e.g. after having been phase shifted with respect to each other
86.
METHODS, APPARATUS AND SYSTEMS FOR OPTIMIZING COMMUNICATION BETWEEN SENDER(S) AND RECEIVER(S) IN COMPUTER-MEDIATED REALITY APPLICATIONS
The present invention is directed to systems, methods and apparatus for processing media content for reproduction by a first apparatus. The method includes obtaining pose information indicative of a position and/or orientation of a user. The pose information is transmitted to a second apparatus that provides the media content. The media content is rendered based on the pose information to obtain rendered media content. The rendered media content is transmitted to the first apparatus for reproduction. The present invention may include a first apparatus for reproducing media content and a second apparatus storing the media content. The first apparatus is configured to obtain pose information indicative of a position and/or orientation of the user and transmit the pose information to the second apparatus; and the second apparatus is adapted to: render the media content based on the pose information to obtain rendered media content; and transmit the rendered media content to the first apparatus for reproduction.
A63F 13/428 - Processing input control signals of video game devices, e.g. signals generated by the player or derived from the environment by mapping the input signals into game commands, e.g. mapping the displacement of a stylus on a touch screen to the steering angle of a virtual vehicle involving motion or position input signals, e.g. signals representing the rotation of an input controller or a player's arm motions sensed by accelerometers or gyroscopes
A63F 13/213 - Input arrangements for video game devices characterised by their sensors, purposes or types comprising photodetecting means, e.g. cameras, photodiodes or infrared cells
G06F 3/01 - Input arrangements or combined input and output arrangements for interaction between user and computer
H04L 29/06 - Communication control; Communication processing characterised by a protocol
H04S 3/00 - Systems employing more than two channels, e.g. quadraphonic
H04S 7/00 - Indicating arrangements; Control arrangements, e.g. balance control
G10L 21/038 - Speech enhancement, e.g. noise reduction or echo cancellation using band spreading techniques
G10L 19/02 - Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
G10L 19/022 - Blocking, i.e. grouping of samples in time; Choice of analysis windows; Overlap factoring
88.
Transforming audio signals captured in different formats into a reduced number of formats for simplifying encoding and decoding operations
The disclosed embodiments enable converting audio signals captured in various formats by various capture devices into a limited number of formats that can be processed by an audio codec (e.g., an Immersive Voice and Audio Services (IVAS) codec). In an embodiment, a simplification unit of the audio device receives an audio signal captured by one or more audio capture devices coupled to the audio device. The simplification unit determines whether the audio signal is in a format that is or is not supported by an encoding unit of the audio device. Based on the determining, the simplification unit converts the audio signal into a format that is supported by the encoding unit. In an embodiment, if the simplification unit determines that the audio signal is in a spatial format, the simplification unit can convert the audio signal into a spatial “mezzanine” format supported by the encoding unit.
G10L 19/008 - Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
H04S 3/00 - Systems employing more than two channels, e.g. quadraphonic
89.
Speech/Dialog Enhancement Controlled by Pupillometry
The present disclosure relates to methods for processing a decoded audio signal and for selectively applying speech/dialog enhancement to the decoded audio signal. The present disclosure also relates to a method of operating a headset for computer-mediated reality. A method of processing a decoded audio signal comprises obtaining a measure of a cognitive load of a listener that listens to a rendering of the audio signal, determining whether speech/dialog enhancement shall be applied based on the obtained measure of the cognitive load, and performing speech/dialog enhancement based on the determination. A method of operating a headset for computer-mediated reality comprises obtaining eye-tracking data of a wearer of the headset, determining a measure of a cognitive load of the wearer of the headset based on the eye-tracking data, and outputting an indication of the cognitive load of the wearer of the headset. The present disclosure further relates to corresponding apparatus and systems, and to methods of operating such apparatus and systems.
G10L 21/02 - Speech enhancement, e.g. noise reduction or echo cancellation
G10L 19/02 - Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
G06F 3/01 - Input arrangements or combined input and output arrangements for interaction between user and computer
G06T 19/00 - Manipulating 3D models or images for computer graphics
Methods for generating an AV bitstream (e.g., an MPEG-2 transport stream or bitstream segment having adaptive streaming format) such that the AV bitstream includes at least one video I-frame synchronized with at least one audio I-frame, e.g., including by re-authoring at least one video or audio frame (as a re-authored I-frame or a re-authored P-frame). Typically, a segment of content of the AV bitstream which includes the re-authored frame starts with an I-frame and includes at least one subsequent P-frame. Other aspects are methods for adapting such an AV bitstream, audio/video processing units configured to perform any embodiment of the inventive method, and audio/video processing units which include a buffer memory which stores at least one segment of an AV bitstream generated in accordance with any embodiment of the inventive method.
H04N 7/16 - Analogue secrecy systems; Analogue subscription systems
H04N 21/435 - Processing of additional data, e.g. decrypting of additional data or reconstructing software from modules extracted from the transport stream
H04N 21/2343 - Processing of video elementary streams, e.g. splicing of video streams or manipulating MPEG-4 scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
H04N 21/61 - Network physical structure; Signal processing
The present document relates to audio coding systems. In particular, the present document relates to efficient methods and systems for parametric multi-channel audio coding. An audio encoding system configured to generate a bitstream indicative of a downmix signal and spatial metadata for generating a multi-channel upmix signal from the downmix signal is described. The system comprises a downmix processing unit configured to generate the downmix signal from a multi-channel input signal, wherein the downmix signal comprises m channels and the multi-channel input signal comprises n channels, n and m being integers with m &lt; n.
G10L 19/008 - Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
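The channel-reducing downmix described in the abstract above amounts to applying an m-by-n mixing matrix to the n input channels. The concrete coefficients below are assumptions for illustration; the actual downmix matrix is codec-specific.

```python
def downmix(input_channels, coeffs):
    # input_channels: n lists of samples (one list per input channel).
    # coeffs: m rows of n mixing weights each, with m < n.
    # Returns m downmix channels, each a weighted sum of the inputs.
    n_samples = len(input_channels[0])
    return [[sum(c * ch[t] for c, ch in zip(row, input_channels))
             for t in range(n_samples)]
            for row in coeffs]
```

For example, a stereo-to-mono downmix uses the single row `[0.5, 0.5]`; the spatial metadata carried alongside the downmix is what lets the decoder reconstruct an n-channel upmix from the m transmitted channels.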
This disclosure falls into the field of adapting external content to a video stream, and more specifically it is related to analyzing the video stream to define a suitable narrative model, and adapting the external content based on this narrative model.
G11B 27/031 - Electronic editing of digitised analogue information signals, e.g. audio or video signals
G11B 27/28 - Indexing; Addressing; Timing or synchronising; Measuring tape travel by using information detectable on the record carrier by using information signals recorded by the same method as the main recording
An audio object including audio content and object metadata is received. The object metadata indicates an object spatial position of the audio object to be rendered by audio speakers in a playback environment. Based on the object spatial position and source spatial positions of the audio speakers, initial gain values for the audio speakers are determined. The initial gain values can be used to select a set of audio speakers from among the audio speakers. Based on the object spatial position and a set of source spatial positions at which the set of audio speakers are respectively located in the playback environment, a set of non-negative optimized gain values for the set of audio speakers is determined. The audio object at the object spatial position is rendered with the set of optimized gain values for the set of audio speakers.
A method for decoding video includes receiving quantized coefficients representative of a block of video representative of a plurality of pixels. The quantized coefficients are dequantized based upon a function of a remainder. The dequantized coefficients are inverse transformed to determine a decoded residue.
H04N 19/132 - Sampling, masking or truncation of coding units, e.g. adaptive resampling, frame skipping, frame interpolation or high-frequency transform coefficient masking
H04N 19/184 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being bits, e.g. of the compressed video stream
H04N 19/44 - Decoders specially adapted therefor, e.g. video decoders which are asymmetric with respect to the encoder
H04N 19/176 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock
H04N 19/182 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being a pixel
95.
Network-based processing and distribution of multimedia content of a live musical performance
Methods, systems, and computer program products for network-based processing and distribution of multimedia content of a live performance are disclosed. In some implementations, recording devices can be configured to record a multimedia event (e.g., a musical performance). The recording devices can provide the recordings to a server while the event is ongoing. The server automatically synchronizes, mixes and masters the recordings. The server performs the automatic mixing and mastering using reference audio data previously captured during a rehearsal. The server streams the mastered recording to multiple end users through the Internet or other public or private network. The streaming can be live streaming.
G10H 1/00 - ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE - Details of electrophonic musical instruments
H04N 21/2343 - Processing of video elementary streams, e.g. splicing of video streams or manipulating MPEG-4 scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
A multi-channel decoder generates a binaural signal from a downmix signal using upmix rule information on an energy-error-introducing upmix rule. A gain factor is calculated based on the upmix rule information and on characteristics of head-related transfer function based filters corresponding to the upmix channels. The one or more gain factors are used by a filter processor for filtering the downmix signal, so that an energy-corrected binaural signal having a left binaural channel and a right binaural channel is obtained.
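One way to read the energy correction is as a ratio between the energy a lossless upmix would produce and the energy the upmix/HRTF-filter chain actually produces. The formula below is an illustrative assumption along those lines, not the patent's derivation.

```python
import math

def energy_correction_gain(upmix_coeffs, filter_energies):
    # Energy produced by filtering the downmix with the combined
    # upmix-coefficient / HRTF-filter chain for each upmix channel.
    produced = sum(c * c * e for c, e in zip(upmix_coeffs, filter_energies))
    # Target: total energy of the HRTF-filtered channels without upmix loss.
    target = sum(filter_energies)
    return math.sqrt(target / produced)

# An upmix that halves each channel's amplitude needs a gain of 2 to restore energy.
g = energy_correction_gain([0.5, 0.5], [1.0, 1.0])
```

The gain would then be applied in the filter processor when filtering the downmix, yielding the energy-corrected left and right binaural channels.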
H04S 7/00 - Indicating arrangements; Control arrangements, e.g. balance control
G10L 19/008 - Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
97.
Methods, devices and systems for parallel video encoding and decoding
A method for decoding a video bitstream is disclosed. The method comprises: entropy decoding a first portion of the video bitstream, wherein the first portion is associated with a video frame, thereby producing a first portion of decoded data; entropy decoding a second portion of the video bitstream, wherein the second portion is associated with the same video frame, thereby producing a second portion of decoded data, and wherein entropy decoding the second portion is independent of entropy decoding the first portion; and reconstructing a first portion of the video frame using the first portion of decoded data and the second portion of decoded data.
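The key property is that the two portions carry no entropy-decoding dependency, so they can be decoded concurrently. The sketch below illustrates this with a toy run-length decoder standing in for a real entropy coder; names and the codec are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor

def entropy_decode(portion):
    """Toy 'entropy' decoder: each (count, value) pair expands to count values."""
    out = []
    for count, value in portion:
        out.extend([value] * count)
    return out

def decode_frame(portion_a, portion_b):
    # The portions are independent, so they may be decoded in parallel.
    with ThreadPoolExecutor(max_workers=2) as pool:
        data_a, data_b = pool.map(entropy_decode, (portion_a, portion_b))
    # Reconstruct the frame portion using both sets of decoded data.
    return data_a + data_b

frame = decode_frame([(3, 7)], [(2, 9)])
```

In a real decoder the independent portions would correspond to units such as slices or entropy slices, each with its own decoder state.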
H04N 19/91 - Entropy coding, e.g. variable length coding [VLC] or arithmetic coding
H04N 19/70 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards
H04N 19/119 - Adaptive subdivision aspects e.g. subdivision of a picture into rectangular or non-rectangular coding blocks
H04N 19/15 - Data rate or code amount at the encoder output by monitoring actual compressed data size at the memory before deciding storage at the transmission buffer
H04N 19/192 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the adaptation method, adaptation tool or adaptation type used for the adaptive coding the adaptation method, adaptation tool or adaptation type being iterative or recursive
H04N 19/46 - Embedding additional information in the video signal during the compression process
H04N 19/61 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding in combination with predictive coding
H04N 19/174 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a slice, e.g. a line of blocks or a group of blocks
H04N 19/17 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
H04N 19/44 - Decoders specially adapted therefor, e.g. video decoders which are asymmetric with respect to the encoder
H04N 19/436 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals - characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation using parallelised computational arrangements
H04N 19/159 - Prediction type, e.g. intra-frame, inter-frame or bidirectional frame prediction
H04N 19/43 - Hardware specially adapted for motion estimation or compensation
H04N 19/136 - Incoming video signal characteristics or properties
H04N 19/40 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using video transcoding, i.e. partial or full decoding of a coded input stream followed by re-encoding of the decoded output stream
H04N 19/172 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a picture, frame or field
H04N 19/80 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals - Details of filtering operations specially adapted for video compression, e.g. for pixel interpolation
H04N 19/184 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being bits, e.g. of the compressed video stream
98.
Methods, apparatus and systems for 6DoF audio rendering and data representations and bitstream structures for 6DoF audio rendering
The present disclosure relates to methods, apparatus and systems for encoding an audio signal into a bitstream, in particular at an encoder, comprising: encoding or including audio signal data associated with 3DoF audio rendering into one or more first bitstream parts of the bitstream, and encoding or including metadata associated with 6DoF audio rendering into one or more second bitstream parts of the bitstream. The present disclosure further relates to methods, apparatus and systems for decoding an audio signal and audio rendering based on the bitstream.
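The two-part bitstream layout can be sketched as tagged parts: 3DoF audio payload in first-type parts and 6DoF metadata in second-type parts, so a 3DoF-only decoder can skip the latter. The part IDs and length-prefixed framing below are assumptions, not the disclosed syntax.

```python
import struct

PART_3DOF_AUDIO = 0x01  # first bitstream parts: 3DoF audio signal data
PART_6DOF_META = 0x02   # second bitstream parts: 6DoF rendering metadata

def write_part(part_id, payload):
    # Each part: 1-byte ID, 4-byte big-endian length, then the payload.
    return struct.pack(">BI", part_id, len(payload)) + payload

def read_parts(bitstream):
    parts, offset = [], 0
    while offset < len(bitstream):
        part_id, size = struct.unpack_from(">BI", bitstream, offset)
        offset += 5
        parts.append((part_id, bitstream[offset:offset + size]))
        offset += size
    return parts

bs = write_part(PART_3DOF_AUDIO, b"audio") + write_part(PART_6DOF_META, b"pose")
```

A 3DoF decoder would consume only the `PART_3DOF_AUDIO` parts, while a 6DoF renderer would also read the metadata parts.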
H04S 7/00 - Indicating arrangements; Control arrangements, e.g. balance control
G10L 19/008 - Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
The present document relates to time-alignment of encoded data of an audio encoder with associated metadata, such as spectral band replication (SBR) metadata. An audio decoder configured to determine a reconstructed frame of an audio signal from an access unit of a received data stream is described. The access unit comprises waveform data and metadata, wherein the waveform data and the metadata are associated with the same reconstructed frame of the audio signal. The audio decoder comprises a waveform processing path configured to generate a plurality of waveform subband signals from the waveform data, and a metadata processing path configured to generate decoded metadata from the metadata.
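The time-alignment property means a single access unit carries both the waveform data and the metadata for the same reconstructed frame, so the two processing paths need no cross-frame realignment. The sketch below assumes a dictionary-shaped access unit and a trivial metadata application; both are illustrative, not the codec's actual format.

```python
def decode_access_unit(access_unit):
    # Waveform data and metadata in one access unit refer to the same frame.
    waveform_data = access_unit["waveform"]
    metadata = access_unit["metadata"]
    # Waveform path + metadata path: here the decoded metadata is just a gain
    # applied to the waveform subband samples (stand-in for SBR processing).
    samples = [s * metadata["gain"] for s in waveform_data]
    return {"frame": access_unit["frame"], "samples": samples}

au = {"frame": 7, "waveform": [1.0, 2.0], "metadata": {"gain": 0.5}}
out = decode_access_unit(au)
```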
G10L 21/0388 - Speech enhancement, e.g. noise reduction or echo cancellation using band spreading techniques - Details of processing therefor
G10L 19/018 - Audio watermarking, i.e. embedding inaudible data in the audio signal
G10L 19/02 - Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
G10L 19/032 - Quantisation or dequantisation of spectral components
100.
Integration of high frequency reconstruction techniques with reduced post-processing delay
A method for decoding an encoded audio bitstream is disclosed. The method includes receiving the encoded audio bitstream and decoding the audio data to generate a decoded lowband audio signal. The method further includes extracting high frequency reconstruction metadata and filtering the decoded lowband audio signal with an analysis filterbank to generate a filtered lowband audio signal. The method also includes extracting a flag indicating whether either spectral translation or harmonic transposition is to be performed on the audio data and regenerating a highband portion of the audio signal using the filtered lowband audio signal and the high frequency reconstruction metadata in accordance with the flag. The high frequency regeneration is performed as a post-processing operation with a delay of 3010 samples per audio channel.
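The flag-driven choice between the two regeneration modes can be sketched as below; both regenerators are simplified stand-ins for the real eSBR tools, and the function names are illustrative.

```python
def spectral_translation(lowband):
    # SBR-style patching: copy the lowband content up unchanged.
    return list(lowband)

def harmonic_transposition(lowband, factor=2):
    # Crude stand-in for transposition by `factor`: keep every factor-th sample.
    return lowband[::factor]

def regenerate_highband(filtered_lowband, use_harmonic_transposition):
    # The bitstream flag selects which regeneration tool to apply.
    if use_harmonic_transposition:
        return harmonic_transposition(filtered_lowband)
    return spectral_translation(filtered_lowband)
```

In the described decoder this step runs as post-processing on the filtered lowband signal, guided by the extracted high frequency reconstruction metadata.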