G10L 11/00
|
Determination or detection of speech or audio characteristics not restricted to a single one of groups G10L 15/00-G10L 21/00 |
G10L 11/02
|
Detection of presence or absence of speech signals |
G10L 11/04
|
Pitch determination of speech signals |
G10L 11/06
|
Discriminating between voiced and unvoiced parts of speech signals (G10L 11/04 takes precedence) |
G10L 13/00
|
Speech synthesis; Text to speech systems |
G10L 13/02
|
Methods for producing synthetic speech; Speech synthesisers |
G10L 13/04
|
Methods for producing synthetic speech; Speech synthesisers - Details of speech synthesis systems, e.g. synthesiser structure or memory management |
G10L 13/06
|
Elementary speech units used in speech synthesisers; Concatenation rules |
G10L 13/07
|
Concatenation rules |
G10L 13/08
|
Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination |
G10L 13/10
|
Prosody rules derived from text; Stress or intonation |
G10L 13/027
|
Concept to speech synthesisers; Generation of natural phrases from machine-based concepts |
G10L 13/033
|
Voice editing, e.g. manipulating the voice of the synthesiser |
G10L 13/047
|
Architecture of speech synthesisers |
G10L 15/00
|
Speech recognition |
G10L 15/01
|
Assessment or evaluation of speech recognition systems |
G10L 15/02
|
Feature extraction for speech recognition; Selection of recognition unit |
G10L 15/04
|
Segmentation; Word boundary detection |
G10L 15/05
|
Word boundary detection |
G10L 15/06
|
Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice |
G10L 15/07
|
Adaptation to the speaker |
G10L 15/08
|
Speech classification or search |
G10L 15/10
|
Speech classification or search using distance or distortion measures between unknown speech and reference templates |
G10L 15/12
|
Speech classification or search using dynamic programming techniques, e.g. dynamic time warping [DTW] |
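As an illustrative aside, the dynamic time warping [DTW] technique named by G10L 15/12 can be sketched as a simple dynamic program aligning two 1-D feature sequences (a minimal sketch for orientation only, not part of the classification scheme; the function name is hypothetical):

```python
def dtw_distance(a, b):
    """DTW distance via the classic O(len(a)*len(b)) dynamic program
    over absolute differences between scalar features."""
    n, m = len(a), len(b)
    INF = float("inf")
    # cost[i][j] = minimal accumulated cost aligning a[:i] with b[:j]
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # step in a only
                                 cost[i][j - 1],      # step in b only
                                 cost[i - 1][j - 1])  # step in both
    return cost[n][m]
```

In template-matching recognisers, `a` and `b` would be per-frame feature values (e.g. energies) of an unknown utterance and a reference template; DTW tolerates tempo differences that a rigid frame-by-frame distance would penalise.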
G10L 15/14
|
Speech classification or search using statistical models, e.g. Hidden Markov Models [HMM] |
G10L 15/16
|
Speech classification or search using artificial neural networks |
G10L 15/18
|
Speech classification or search using natural language modelling |
G10L 15/19
|
Grammatical context, e.g. disambiguation of recognition hypotheses based on word sequence rules |
G10L 15/20
|
Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise or of stress induced speech |
G10L 15/22
|
Procedures used during a speech recognition process, e.g. man-machine dialog |
G10L 15/24
|
Speech recognition using non-acoustical features |
G10L 15/25
|
Speech recognition using non-acoustical features using position of the lips, movement of the lips or face analysis |
G10L 15/26
|
Speech to text systems |
G10L 15/28
|
Constructional details of speech recognition systems |
G10L 15/30
|
Distributed recognition, e.g. in client-server systems, for mobile phones or network applications |
G10L 15/32
|
Multiple recognisers used in sequence or in parallel; Score combination systems therefor, e.g. voting systems |
G10L 15/34
|
Adaptation of a single recogniser for parallel processing, e.g. by use of multiple processors or cloud computing |
G10L 15/065
|
Adaptation |
G10L 15/183
|
Speech classification or search using natural language modelling using context dependencies, e.g. language models |
G10L 15/187
|
Phonemic context, e.g. pronunciation rules, phonotactical constraints or phoneme n-grams |
G10L 15/193
|
Formal grammars, e.g. finite state automata, context free grammars or word networks |
G10L 15/197
|
Probabilistic grammars, e.g. word n-grams |
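The word n-grams cited by G10L 15/197 reduce, in the simplest bigram case, to conditional counts. A minimal maximum-likelihood sketch (illustrative only; the function name is hypothetical and no smoothing is applied):

```python
from collections import Counter

def bigram_probs(tokens):
    """Maximum-likelihood bigram model: P(w2 | w1) = count(w1 w2) / count(w1)."""
    pair_counts = Counter(zip(tokens, tokens[1:]))
    # Count each word only in the "history" position (all tokens but the last)
    unigram_counts = Counter(tokens[:-1])
    return {(w1, w2): c / unigram_counts[w1]
            for (w1, w2), c in pair_counts.items()}
```

A recogniser would combine such language-model probabilities with acoustic scores to rank competing word-sequence hypotheses; real systems smooth the counts to assign nonzero probability to unseen pairs.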
G10L 17/00
|
Speaker identification or verification |
G10L 17/02
|
Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction |
G10L 17/04
|
Training, enrolment or model building |
G10L 17/06
|
Decision making techniques; Pattern matching strategies |
G10L 17/08
|
Use of distortion metrics or a particular distance between probe pattern and reference templates |
G10L 17/10
|
Multimodal systems, i.e. based on the integration of multiple recognition engines or fusion of expert systems |
G10L 17/12
|
Score normalisation |
G10L 17/14
|
Use of phonemic categorisation or speech recognition prior to speaker recognition or verification |
G10L 17/16
|
Hidden Markov models [HMM] |
G10L 17/18
|
Artificial neural networks; Connectionist approaches |
G10L 17/20
|
Pattern transformations or operations aimed at increasing system robustness, e.g. against channel noise or different working conditions |
G10L 17/22
|
Interactive procedures; Man-machine interfaces |
G10L 17/24
|
the user being prompted to utter a password or a predefined phrase |
G10L 17/26
|
Recognition of special voice characteristics, e.g. for use in lie detectors; Recognition of animal voices |
G10L 19/00
|
Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis |
G10L 19/002
|
Dynamic bit allocation |
G10L 19/03
|
Spectral prediction for preventing pre-echo; Temporary noise shaping [TNS], e.g. in MPEG2 or MPEG4 |
G10L 19/04
|
Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques |
G10L 19/005
|
Correction of errors induced by the transmission channel, if related to the coding algorithm |
G10L 19/06
|
Determination or coding of the spectral characteristics, e.g. of the short-term prediction coefficients |
G10L 19/07
|
Line spectrum pair [LSP] vocoders |
G10L 19/008
|
Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing |
G10L 19/09
|
Long term prediction, i.e. removing periodical redundancies, e.g. by using adaptive codebook or pitch predictor |
G10L 19/10
|
Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters the excitation function being a multipulse excitation |
G10L 19/012
|
Comfort noise or silence coding |
G10L 19/13
|
Residual excited linear prediction [RELP] |
G10L 19/14
|
Details not provided for in groups G10L 19/06-G10L 19/12, e.g. gain coding, post filtering design or vocoder structure |
G10L 19/16
|
Vocoder architecture |
G10L 19/018
|
Audio watermarking, i.e. embedding inaudible data in the audio signal |
G10L 19/20
|
Vocoders using multiple modes using sound class specific coding, hybrid encoders or object based coding |
G10L 19/022
|
Blocking, i.e. grouping of samples in time; Choice of analysis windows; Overlap factoring |
G10L 19/24
|
Variable rate codecs, e.g. for generating different qualities using a scalable representation such as hierarchical encoding or layered encoding |
G10L 19/025
|
Detection of transients or attacks for time/frequency resolution switching |
G10L 19/26
|
Pre-filtering or post-filtering |
G10L 19/028
|
Noise substitution, e.g. substituting non-tonal spectral components by noisy source |
G10L 19/032
|
Quantisation or dequantisation of spectral components |
G10L 19/035
|
Scalar quantisation |
G10L 19/038
|
Vector quantisation, e.g. TwinVQ audio |
G10L 19/083
|
Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters the excitation function being an excitation gain |
G10L 19/087
|
Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters using mixed excitation models, e.g. MELP, MBE, split band LPC or HVXC |
G10L 19/093
|
Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters using sinusoidal excitation models |
G10L 19/097
|
Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters using prototype waveform decomposition or prototype waveform interpolative [PWI] coders |
G10L 19/107
|
Sparse pulse excitation, e.g. by using algebraic codebook |
G10L 19/113
|
Regular pulse excitation |
G10L 19/125
|
Pitch excitation, e.g. pitch synchronous innovation CELP [PSI-CELP] |
G10L 19/135
|
Vector sum excited linear prediction [VSELP] |
G10L 21/00
|
Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility |
G10L 21/01
|
Correction of time axis |
G10L 21/02
|
Speech enhancement, e.g. noise reduction or echo cancellation |
G10L 21/003
|
Changing voice quality, e.g. pitch or formants |
G10L 21/04
|
Time compression or expansion |
G10L 21/06
|
Transformation of speech into a non-audible representation, e.g. speech visualisation or speech processing for tactile aids |
G10L 21/007
|
Changing voice quality, e.g. pitch or formants characterised by the process used |
G10L 21/10
|
Transforming into visible information |
G10L 21/12
|
Transforming into visible information by displaying time domain information |
G10L 21/013
|
Adapting to target pitch |
G10L 21/14
|
Transforming into visible information by displaying frequency domain information |
G10L 21/16
|
Transforming into a non-visible representation |
G10L 21/18
|
Transformation of speech into a non-audible representation, e.g. speech visualisation or speech processing for tactile aids - Details of the transformation process |
G10L 21/028
|
Voice signal separating using properties of sound source |
G10L 21/034
|
Automatic adjustment |
G10L 21/038
|
Speech enhancement, e.g. noise reduction or echo cancellation using band spreading techniques |
G10L 21/043
|
Time compression or expansion by changing speed |
G10L 21/045
|
Time compression or expansion by changing speed using thinning out or insertion of a waveform |
G10L 21/047
|
Time compression or expansion by changing speed using thinning out or insertion of a waveform characterised by the type of waveform to be thinned out or inserted |
G10L 21/049
|
Time compression or expansion by changing speed using thinning out or insertion of a waveform characterised by the interconnection of waveforms |
G10L 21/055
|
Time compression or expansion for synchronising with other signals, e.g. video signals |
G10L 21/057
|
Time compression or expansion for improving intelligibility |
G10L 21/0208
|
Noise filtering |
G10L 21/0216
|
Noise filtering characterised by the method used for estimating noise |
G10L 21/0224
|
Processing in the time domain |
G10L 21/0232
|
Processing in the frequency domain |
G10L 21/0264
|
Noise filtering characterised by the type of parameter measurement, e.g. correlation techniques, zero crossing techniques or predictive techniques |
G10L 21/0272
|
Voice signal separating |
G10L 21/0308
|
Voice signal separating characterised by the type of parameter measurement, e.g. correlation techniques, zero crossing techniques or predictive techniques |
G10L 21/0316
|
Speech enhancement, e.g. noise reduction or echo cancellation by changing the amplitude |
G10L 21/0324
|
Speech enhancement, e.g. noise reduction or echo cancellation by changing the amplitude - Details of processing therefor |
G10L 21/0332
|
Speech enhancement, e.g. noise reduction or echo cancellation by changing the amplitude - Details of processing therefor involving modification of waveforms |
G10L 21/0356
|
Speech enhancement, e.g. noise reduction or echo cancellation by changing the amplitude for synchronising with other signals, e.g. video signals |
G10L 21/0364
|
Speech enhancement, e.g. noise reduction or echo cancellation by changing the amplitude for improving intelligibility |
G10L 21/0388
|
Speech enhancement, e.g. noise reduction or echo cancellation using band spreading techniques - Details of processing therefor |
G10L 23/00
|
Speech analysis not provided for in other groups of this subclass |
G10L 25/00
|
Speech or voice analysis techniques not restricted to a single one of groups G10L 15/00-G10L 21/00 |
G10L 25/03
|
Speech or voice analysis techniques not restricted to a single one of groups characterised by the type of extracted parameters |
G10L 25/06
|
Speech or voice analysis techniques not restricted to a single one of groups characterised by the type of extracted parameters the extracted parameters being correlation coefficients |
G10L 25/09
|
Speech or voice analysis techniques not restricted to a single one of groups characterised by the type of extracted parameters the extracted parameters being zero crossing rates |
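The zero crossing rate parameter named by G10L 25/09 is among the cheapest speech features to compute: the fraction of adjacent sample pairs whose signs differ. A minimal sketch (illustrative only; the function name is hypothetical):

```python
def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs in `frame` whose signs differ.
    Zero is treated as nonnegative."""
    crossings = sum(
        1 for x, y in zip(frame, frame[1:])
        if (x >= 0) != (y >= 0)
    )
    return crossings / (len(frame) - 1)
```

High-frequency (e.g. fricative) frames cross zero often while voiced frames cross rarely, which is why this parameter appears in voiced/unvoiced and speech/silence discrimination.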
G10L 25/12
|
Speech or voice analysis techniques not restricted to a single one of groups characterised by the type of extracted parameters the extracted parameters being prediction coefficients |
G10L 25/15
|
Speech or voice analysis techniques not restricted to a single one of groups characterised by the type of extracted parameters the extracted parameters being formant information |
G10L 25/18
|
Speech or voice analysis techniques not restricted to a single one of groups characterised by the type of extracted parameters the extracted parameters being spectral information of each sub-band |
G10L 25/21
|
Speech or voice analysis techniques not restricted to a single one of groups characterised by the type of extracted parameters the extracted parameters being power information |
G10L 25/24
|
Speech or voice analysis techniques not restricted to a single one of groups characterised by the type of extracted parameters the extracted parameters being the cepstrum |
G10L 25/27
|
Speech or voice analysis techniques not restricted to a single one of groups characterised by the analysis technique |
G10L 25/30
|
Speech or voice analysis techniques not restricted to a single one of groups characterised by the analysis technique using neural networks |
G10L 25/33
|
Speech or voice analysis techniques not restricted to a single one of groups characterised by the analysis technique using fuzzy logic |
G10L 25/36
|
Speech or voice analysis techniques not restricted to a single one of groups characterised by the analysis technique using chaos theory |
G10L 25/39
|
Speech or voice analysis techniques not restricted to a single one of groups characterised by the analysis technique using genetic algorithms |
G10L 25/45
|
Speech or voice analysis techniques not restricted to a single one of groups characterised by the type of analysis window |
G10L 25/48
|
Speech or voice analysis techniques not restricted to a single one of groups specially adapted for particular use |
G10L 25/51
|
Speech or voice analysis techniques not restricted to a single one of groups specially adapted for particular use for comparison or discrimination |
G10L 25/54
|
Speech or voice analysis techniques not restricted to a single one of groups specially adapted for particular use for comparison or discrimination for retrieval |
G10L 25/57
|
Speech or voice analysis techniques not restricted to a single one of groups specially adapted for particular use for comparison or discrimination for processing of video signals |
G10L 25/60
|
Speech or voice analysis techniques not restricted to a single one of groups specially adapted for particular use for comparison or discrimination for measuring the quality of voice signals |
G10L 25/63
|
Speech or voice analysis techniques not restricted to a single one of groups specially adapted for particular use for comparison or discrimination for estimating an emotional state |
G10L 25/66
|
Speech or voice analysis techniques not restricted to a single one of groups specially adapted for particular use for comparison or discrimination for extracting parameters related to health condition |
G10L 25/69
|
Speech or voice analysis techniques not restricted to a single one of groups specially adapted for particular use for evaluating synthetic or decoded voice signals |
G10L 25/72
|
Speech or voice analysis techniques not restricted to a single one of groups specially adapted for particular use for transmitting results of analysis |
G10L 25/75
|
Speech or voice analysis techniques not restricted to a single one of groups for modelling vocal tract parameters |
G10L 25/78
|
Detection of presence or absence of voice signals |
G10L 25/81
|
Detection of presence or absence of voice signals for discriminating voice from music |
G10L 25/84
|
Detection of presence or absence of voice signals for discriminating voice from noise |
G10L 25/87
|
Detection of discrete points within a voice signal |
G10L 25/90
|
Pitch determination of speech signals |
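One common family of pitch-determination methods under G10L 25/90 picks the autocorrelation peak within a plausible fundamental-frequency band. A minimal sketch under that assumption (illustrative only; the function name and band limits are hypothetical):

```python
def estimate_pitch(frame, sample_rate, fmin=50.0, fmax=500.0):
    """Estimate fundamental frequency (Hz) of a voiced frame by the
    lag of the largest autocorrelation value in [fmin, fmax]."""
    lag_min = int(sample_rate / fmax)          # shortest candidate period
    lag_max = int(sample_rate / fmin)          # longest candidate period
    best_lag, best_corr = 0, 0.0
    for lag in range(lag_min, min(lag_max, len(frame) - 1) + 1):
        corr = sum(frame[i] * frame[i + lag]
                   for i in range(len(frame) - lag))
        if corr > best_corr:
            best_corr, best_lag = corr, lag
    return sample_rate / best_lag if best_lag else 0.0
```

Practical pitch trackers add voicing decisions, normalisation, and octave-error correction; this bare form illustrates only the core correlation search.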
G10L 25/93
|
Discriminating between voiced and unvoiced parts of speech signals |
G10L 99/00
|
Subject matter not provided for in other groups of this subclass |