Disclosed herein are system, method, and computer program product aspects for remediating characteristics of recorded content. According to some aspects, a computing device (e.g., a server, a cloud-based device, an application-service device, etc.) may identify a characteristic of content received via a recording application on a user device (e.g., a mobile device, a smart device, a computing device, etc.). A type of the user device may be determined based on an identifier received with the content. Based on the type of the user device, an instruction may be sent to the user device that causes a change in an operational state of a component of the user device that is utilized by the recording application. Remediation instructions that remediate the characteristic of the content may be sent to the user device based on an indication of the change in the operational state of the component.
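A minimal sketch of the flow described above: the server maps the received identifier to a device type, selects a state-change instruction for the component used by the recording application, and prepares remediation to send once the state change is confirmed. All identifiers, component names, and actions here are invented for illustration; the disclosure does not name specific values.

```python
# Hypothetical identifier -> device-type map (illustrative values only)
DEVICE_TYPES = {"idX100": "phone_a", "idY200": "tablet_b"}

# Per-type instruction that changes the operational state of the
# component used by the recording application (invented actions)
STATE_INSTRUCTIONS = {
    "phone_a": {"component": "microphone", "action": "disable_agc"},
    "tablet_b": {"component": "microphone", "action": "reset_driver"},
}

def plan_remediation(content_characteristic, device_identifier):
    """Return (state-change instruction, remediation to send after confirmation)."""
    device_type = DEVICE_TYPES.get(device_identifier, "unknown")
    instruction = STATE_INSTRUCTIONS.get(device_type)
    # Remediation is sent once the device reports the state change
    remediation = {"characteristic": content_characteristic,
                   "apply": "gain_normalization"}
    return instruction, remediation

instruction, remediation = plan_remediation("clipping", "idX100")
```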
Techniques have been developed to facilitate (1) the capture and pitch correction of vocal performances on handheld or other portable computing devices and (2) the mixing of such pitch-corrected vocal performances with backing tracks for audible rendering on targets that include such portable computing devices as well as desktops, workstations, gaming stations, even telephony targets. Implementations of the described techniques employ signal processing techniques and allocations of system functionality that are suitable given the generally limited capabilities of such handheld or portable computing devices and that facilitate efficient encoding and communication of the pitch-corrected vocal performances (or precursors or derivatives thereof) via wireless and/or wired bandwidth-limited networks for rendering on portable computing devices or other targets.
G10L 21/0356 - Speech intelligibility improvement, e.g. noise reduction or echo cancellation, by changing the amplitude for synchronisation with other signals, e.g. video signals
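The continuous pitch-correction step described in the abstract above can be illustrated with a toy snap-to-scale function (a sketch only; the disclosure does not specify this particular algorithm): a detected fundamental frequency is converted to a MIDI note number and quantized to the nearest note of a target scale.

```python
import math

A4_HZ = 440.0  # reference pitch for MIDI note 69

def nearest_scale_pitch(f0_hz, scale_midi):
    """Snap a detected fundamental to the nearest note of a scale (MIDI numbers)."""
    midi = 69 + 12 * math.log2(f0_hz / A4_HZ)           # Hz -> fractional MIDI
    target = min(scale_midi, key=lambda n: abs(n - midi))  # nearest scale note
    return A4_HZ * 2 ** ((target - 69) / 12)            # MIDI -> Hz

c_major_octave = [60, 62, 64, 65, 67, 69, 71, 72]  # C4..C5
corrected = nearest_scale_pitch(446.0, c_major_octave)  # slightly sharp A4
```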
3.
AUDIOVISUAL COLLABORATION METHOD WITH LATENCY MANAGEMENT FOR WIDE-AREA BROADCAST
Techniques have been developed to facilitate the livestreaming of group audiovisual performances. Audiovisual performances including vocal music are captured and coordinated with performances of other users in ways that can create compelling user and listener experiences. For example, in some cases or embodiments, duets with a host performer may be supported in a sing-with-the-artist style audiovisual livestream in which aspiring vocalists request or queue particular songs for a live radio show entertainment format. The developed techniques provide a communications latency-tolerant mechanism for synchronizing vocal performances captured at geographically-separated devices (e.g., at globally-distributed, but network-connected mobile phones or tablets or at audiovisual capture devices geographically separated from a live studio).
H04N 21/43 - Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
H04N 21/4788 - Supplemental services, e.g. displaying phone caller identification or shopping application, communicating with other users, e.g. chatting
H04N 21/462 - Content or additional data management, e.g. creating a master electronic programme guide from data received via the Internet and a Head-end, or controlling the complexity of a video stream by scaling the resolution or bit-rate based on the client capabilities
H04L 65/611 - Network streaming of media packets for supporting one-way streaming services, e.g. Internet radio, for multicast or broadcast
H04L 65/612 - Network streaming of media packets for supporting one-way streaming services, e.g. Internet radio, for unicast
H04L 65/75 - Media network packet handling
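One simple way to picture the latency-tolerant synchronization described in this entry is timestamp compensation: a guest's captured frames, timestamped on the guest's clock, are shifted onto the host timeline by a measured one-way latency before being mixed into the broadcast. This is a hedged sketch, not the patented mechanism itself; field names are illustrative.

```python
# Guest frames carry capture timestamps on the guest's clock; subtracting a
# measured one-way latency places them on the host timeline before mixing.

def align_to_host_timeline(guest_frames, measured_latency_ms):
    """guest_frames: list of (guest_capture_ms, samples) tuples."""
    return [(t - measured_latency_ms, samples) for t, samples in guest_frames]

frames = [(1000, "g0"), (1020, "g1")]
aligned = align_to_host_timeline(frames, measured_latency_ms=120)
```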
4.
AUDIO-VISUAL EFFECTS SYSTEM FOR AUGMENTATION OF CAPTURED PERFORMANCE BASED ON CONTENT THEREOF
Visual effects schedules are applied to audiovisual performances with differing visual effects applied in correspondence with differing elements of musical structure. Segmentation techniques applied to one or more audio tracks (e.g., vocal or backing tracks) are used to compute some of the components of the musical structure. In some cases, applied visual effects schedules are mood-denominated and may be selected by a performer as a component of his or her visual expression or determined from an audiovisual performance using machine learning techniques.
G11B 27/02 - Editing, e.g. varying the order of information signals recorded on, or reproduced from, record carriers
Audiovisual performances, including vocal music, are captured and coordinated with those of other users in ways that create compelling user experiences. In some cases, the vocal performances of individual users are captured (together with performance synchronized video) on mobile devices, television-type display and/or set-top box equipment in the context of karaoke-style presentations of lyrics in correspondence with audible renderings of a backing track. Contributions of multiple vocalists are coordinated and mixed in a manner that selects for visually prominent presentation performance synchronized video of one or more of the contributors. Prominence of particular performance synchronized video may be based, at least in part, on computationally-defined audio features extracted from (or computed over) captured vocal audio. Over the course of a coordinated audiovisual performance timeline, these computationally-defined audio features are selective for performance synchronized video of one or more of the contributing vocalists.
Disclosed herein are computer-implemented method, system, and computer-readable storage-medium embodiments for implementing template-based excerpting and rendering of multimedia performances technologies. An embodiment includes at least one computer processor configured to retrieve a first content instance and apply a template that transforms the first content instance. The first content instance may include a plurality of structural elements. The first content instance may be transformed by a rendering engine running on the at least one computer processor and/or transmitted to a content-playback device. An embodiment of transforming the first content instance includes trimming the content instance based on requirements provided by social media platforms.
Disclosed herein are computer-implemented method, system, and computer-readable storage-medium embodiments for implementing template-based excerpting and rendering of multimedia performances technologies. An embodiment includes at least one computer processor configured to retrieve a first content instance and corresponding first metadata. The first content instance may include a first plurality of structural elements, with at least one structural element corresponding to at least part of the first metadata. The first content instance may be transformed by a rendering engine running on the at least one computer processor and/or transmitted to a content-playback device.
User interface techniques provide user vocalists with mechanisms for seeding subsequent performances by other users (e.g., joiners). A seed may be a full-length seed spanning much or all of a pre-existing audio (or audiovisual) work and mixing, to seed further contributions of one or more joiners, a user's captured media content for at least some portions of the audio (or audiovisual) work. A short seed may span less than all (and in some cases, much less than all) of the audio (or audiovisual) work. For example, a verse, chorus, refrain, hook or other limited “chunk” of an audio (or audiovisual) work may constitute a seed. A seeding user's call invites other users to join the full-length or short-form seed by singing along, singing a particular vocal part or musical section, singing harmony or other duet part, rapping, talking, clapping, recording video, adding a video clip from camera roll, etc. The resulting group performance, whether full-length or just a chunk, may be posted, livestreamed, or otherwise disseminated in a social network.
Techniques have been developed to facilitate the livestreaming of group audiovisual performances. Audiovisual performances including vocal music are captured and coordinated with performances of other users in ways that can create compelling user and listener experiences. For example, in some cases or embodiments, duets with a host performer may be supported in a sing-with-the-artist style audiovisual livestream in which aspiring vocalists request or queue particular songs for a live radio show entertainment format. The developed techniques provide a communications latency-tolerant mechanism for synchronizing vocal performances captured at geographically-separated devices (e.g., at globally-distributed, but network-connected mobile phones or tablets or at audiovisual capture devices geographically separated from a live studio).
H04N 21/233 - Processing of audio elementary streams
H04N 21/43 - Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
H04L 65/75 - Media network packet handling
10.
Crowd-sourced technique for pitch track generation
Digital signal processing and machine learning techniques can be employed in a vocal capture and performance social network to computationally generate vocal pitch tracks from a collection of vocal performances captured against a common temporal baseline such as a backing track or an original performance by a popularizing artist. In this way, crowd-sourced pitch tracks may be generated and distributed for use in subsequent karaoke-style vocal audio captures or other applications. Large numbers of performances of a song can be used to generate a pitch track. Computationally determined pitch trackings from individual audio signal encodings of the crowd-sourced vocal performance set are aggregated and processed as an observation sequence of a trained Hidden Markov Model (HMM) or other statistical model to produce an output pitch track.
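A toy version of the crowd-sourced aggregation might look as follows: per-frame pitch observations from many performances vote for MIDI notes, and a Viterbi pass with a constant note-switch penalty smooths the votes into a single output track. The abstract describes a trained HMM or other statistical model; the vote-based emission scores and penalty used here are invented for illustration.

```python
def crowd_pitch_track(performances, notes, switch_penalty=1.0):
    """performances: list of per-frame MIDI note tracks of equal length."""
    n_frames = len(performances[0])
    # Emission score: number of performances voting for each note at each frame
    votes = [{n: sum(1 for p in performances if p[f] == n) for n in notes}
             for f in range(n_frames)]
    # Viterbi over notes with a constant penalty for changing notes
    score = {n: float(votes[0][n]) for n in notes}
    back = []
    for f in range(1, n_frames):
        new_score, choices = {}, {}
        for n in notes:
            prev, s = max(((p, score[p] - (switch_penalty if p != n else 0.0))
                           for p in notes), key=lambda x: x[1])
            new_score[n] = s + votes[f][n]
            choices[n] = prev
        back.append(choices)
        score = new_score
    # Backtrace from the best final note
    best = max(score, key=score.get)
    track = [best]
    for choices in reversed(back):
        best = choices[best]
        track.append(best)
    return list(reversed(track))

perfs = [[60, 60, 62], [60, 61, 62], [60, 60, 62]]
track = crowd_pitch_track(perfs, notes=[60, 61, 62])
```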
Disclosed herein are computer-implemented method, system, and computer-readable storage-medium embodiments for implementing user-generated templates for segmented multimedia performances. An embodiment includes at least one computer processor configured to transmit a first version of a content instance and corresponding metadata. The first version of the content instance may include a plurality of structural elements, with at least one structural element corresponding to at least part of the metadata. The first content instance may be transformed by a rendering engine triggered by the at least one computer processor.
Vocal audio of a user together with performance synchronized video is captured and coordinated with audiovisual contributions of other users to form composite duet-style or glee club-style or window-paned music video-style audiovisual performances. In some cases, the vocal performances of individual users are captured (together with performance synchronized video) on mobile devices, television-type display and/or set-top box equipment in the context of karaoke-style presentations of lyrics in correspondence with audible renderings of a backing track. Contributions of multiple vocalists are coordinated and mixed in a manner that selects for presentation, at any given time along a given performance timeline, performance synchronized video of one or more of the contributors. Selections are in accord with a visual progression that codes a sequence of visual layouts in correspondence with other coded aspects of a performance score such as pitch tracks, backing audio, lyrics, sections and/or vocal parts.
G11B 27/02 - Editing, e.g. varying the order of information signals recorded on, or reproduced from, record carriers
G11B 27/031 - Electronic editing of digitised analogue information signals, e.g. audio or video signals
G11B 27/10 - Indexing; Addressing; Timing or synchronising; Measuring tape travel
G11B 27/28 - Indexing; Addressing; Timing or synchronising; Measuring tape travel by using information detectable on the record carrier by using information signals recorded by the same method as the main recording
Captured vocals may be automatically transformed using advanced digital signal processing techniques that provide captivating applications, and even purpose-built devices, in which mere novice user-musicians may generate, audibly render and share musical performances. In some cases, the automated transformations allow spoken vocals to be segmented, arranged, temporally aligned with a target rhythm, meter or accompanying backing tracks and pitch corrected in accord with a score or note sequence. Speech-to-song music applications are one such example. In some cases, spoken vocals may be transformed in accord with musical genres such as rap using automated segmentation and temporal alignment techniques, often without pitch correction. Such applications, which may employ different signal processing and different automated transformations, may nonetheless be understood as speech-to-rap variations on the theme.
G10L 19/02 - Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis, using spectral analysis, e.g. transform vocoders or subband vocoders
G10L 21/055 - Time compression or expansion for synchronisation with other signals, e.g. video signals
G10L 19/00 - Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
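The temporal-alignment step of the speech-to-song transformation described above can be sketched as quantizing each spoken segment to a whole number of beats and computing the time-stretch factor that makes it fit. Segmentation itself is assumed to have happened upstream, and the whole-beat quantization policy here is illustrative, not taken from the disclosure.

```python
def align_segments_to_beats(segment_durations, beat_period):
    """Return (start_time_s, stretch_factor) per segment on a fixed beat grid."""
    aligned, t = [], 0.0
    for dur in segment_durations:
        beats = max(1, round(dur / beat_period))  # quantize to whole beats
        stretch = (beats * beat_period) / dur     # time-stretch factor to fit
        aligned.append((t, stretch))
        t += beats * beat_period
    return aligned

plan = align_segments_to_beats([0.45, 1.1, 0.2], beat_period=0.5)
```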
14.
Augmented Reality Filters for Captured Audiovisual Performances
Visual effects, including augmented reality-type visual effects, are applied to audiovisual performances with differing visual effects and/or parameterizations thereof applied in correspondence with computationally determined audio features or elements of musical structure coded in temporally-synchronized tracks or computationally determined therefrom. Segmentation techniques applied to one or more audio tracks (e.g., vocal or backing tracks) are used to compute some of the components of the musical structure. In some cases, applied visual effects are based on an audio feature computationally extracted from a captured audiovisual performance or from an audio track temporally-synchronized therewith.
G10H 1/00 - ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE - Details of electrophonic musical instruments
15.
WIRELESS HANDHELD AUDIO CAPTURE DEVICE AND MULTI-VOCALIST METHOD FOR AUDIOVISUAL MEDIA APPLICATION
Embodiments described herein relate generally to systems comprising a display device, a computing platform coupled to the display device, a mobile device in communication with the computing platform, and a content server. Methods and techniques for the capture and/or processing of audiovisual performances are described and, in particular, techniques suitable for use in connection with display-device-connected computing platforms for rendering vocal performances captured by a handheld computing device.
G06F 1/16 - ELECTRIC DIGITAL DATA PROCESSING - Details not covered by groups and - Constructional details or arrangements
H04M 1/72412 - User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality, by interfacing with external accessories using two-way short-range wireless interfaces
16.
CROWD-SOURCED DEVICE LATENCY ESTIMATION FOR SYNCHRONIZATION OF RECORDINGS IN VOCAL CAPTURE APPLICATIONS
Latency on different devices (e.g., devices of differing brand, model, vintage, etc.) can vary significantly and tens of milliseconds can affect human perception of lagging and leading components of a performance. As a result, use of a uniform latency estimate across a wide variety of devices is unlikely to provide good results, and hand-estimating round-trip latency across a wide variety of devices is costly and would constantly need to be updated for new devices. Instead, a system has been developed for crowdsourcing latency estimates.
G10L 25/60 - Speech or voice analysis techniques not restricted to a single one of groups, specially adapted for particular use, for comparison or discrimination, for measuring the quality of voice signals
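The crowdsourced estimation described in this entry can be pictured as a per-device-model aggregation: field reports of round-trip latency are grouped by model and reduced with a robust statistic such as the median, replacing a single uniform estimate. Field names and values are illustrative.

```python
from statistics import median

def per_model_latency(reports):
    """reports: iterable of (device_model, latency_ms). Returns model -> median ms."""
    by_model = {}
    for model, latency_ms in reports:
        by_model.setdefault(model, []).append(latency_ms)
    # Median resists outliers from noisy individual measurements
    return {model: median(vals) for model, vals in by_model.items()}

reports = [("modelA", 92), ("modelA", 110), ("modelA", 95), ("modelB", 41)]
estimates = per_model_latency(reports)
```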
17.
COORDINATING AND MIXING VOCALS CAPTURED FROM GEOGRAPHICALLY DISTRIBUTED PERFORMERS
Despite many practical limitations imposed by mobile device platforms and application execution environments, vocal musical performances may be captured and continuously pitch-corrected for mixing and rendering with backing tracks in ways that create compelling user experiences. Based on the techniques described herein, even mere amateurs are encouraged to share with friends and family or to collaborate and contribute vocal performances as part of virtual “glee clubs.” In some implementations, these interactions are facilitated through social network- and/or eMail-mediated sharing of performances and invitations to join in a group performance. Using uploaded vocals captured at clients such as a mobile device, a content server (or service) can mediate such virtual glee clubs by manipulating and mixing the uploaded vocal performances of multiple contributing vocalists.
Visual effects, including augmented reality-type visual effects, are applied to audiovisual performances with differing visual effects and/or parameterizations thereof applied in correspondence with computationally determined audio features or elements of musical structure coded in temporally-synchronized tracks or computationally determined therefrom. Segmentation techniques applied to one or more audio tracks (e.g., vocal or backing tracks) are used to compute some of the components of the musical structure. In some cases, applied visual effects are based on an audio feature computationally extracted from a captured audiovisual performance or from an audio track temporally-synchronized therewith.
User interface techniques provide user vocalists with mechanisms for forward and backward traversal of audiovisual content, including pitch cues, waveform- or envelope-type performance timelines, lyrics and/or other temporally-synchronized content at record-time, during edits, and/or in playback. Recapture of selected performance portions, coordination of group parts, and overdubbing may all be facilitated. Direct scrolling to arbitrary points in the performance timeline, lyrics, pitch cues and other temporally-synchronized content allows users to conveniently move through a capture or audiovisual edit session. In some cases, a user vocalist may be guided through the performance timeline, lyrics, pitch cues and other temporally-synchronized content in correspondence with group part information, such as in a guided short-form capture for a duet. A scrubber allows user vocalists to conveniently move forward and backward through the temporally-synchronized content.
G10H 1/00 - ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE - Details of electrophonic musical instruments
20.
Audiovisual collaboration method with latency management for wide-area broadcast
Techniques have been developed to facilitate the livestreaming of group audiovisual performances. Audiovisual performances including vocal music are captured and coordinated with performances of other users in ways that can create compelling user and listener experiences. For example, in some cases or embodiments, duets with a host performer may be supported in a sing-with-the-artist style audiovisual livestream in which aspiring vocalists request or queue particular songs for a live radio show entertainment format. The developed techniques provide a communications latency-tolerant mechanism for synchronizing vocal performances captured at geographically-separated devices (e.g., at globally-distributed, but network-connected mobile phones or tablets or at audiovisual capture devices geographically separated from a live studio).
H04N 21/43 - Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
H04N 21/4788 - Supplemental services, e.g. displaying phone caller identification or shopping application, communicating with other users, e.g. chatting
H04N 21/242 - Synchronisation processes, e.g. processing of programme clock references [PCR]
H04N 21/462 - Content or additional data management, e.g. creating a master electronic programme guide from data received via the Internet and a Head-end, or controlling the complexity of a video stream by scaling the resolution or bit-rate based on the client capabilities
H04L 65/611 - Network streaming of media packets for supporting one-way streaming services, e.g. Internet radio, for multicast or broadcast
H04L 65/612 - Network streaming of media packets for supporting one-way streaming services, e.g. Internet radio, for unicast
H04L 65/75 - Media network packet handling
21.
User-generated templates for segmented multimedia performance
Disclosed herein are computer-implemented method, system, and computer-readable storage-medium embodiments for implementing user-generated templates for segmented multimedia performances. An embodiment includes at least one computer processor configured to transmit a first version of a content instance and corresponding metadata. The first version of the content instance may include a plurality of structural elements, with at least one structural element corresponding to at least part of the metadata. The first content instance may be transformed by a rendering engine triggered by the at least one computer processor.
Social music system and method with continuous, real-time pitch correction of vocal performance and dry vocal capture for subsequent re-rendering based on selectively applicable vocal effect(s) schedule(s)
Embodiments described provide a method for mixing vocal performances from different vocalists. A vocal score temporally synchronized with a corresponding backing track and lyrics is retrieved via a communications interface of a portable computing device. A first vocal performance of a user is captured, via a microphone interface of the portable computing device, and in correspondence with the backing track. An open call indication for soliciting, from a second vocalist, a second vocal performance to be mixed for audible rendering with the first vocal performance is transmitted. A mix to one of the user and the second vocalist is provided by selecting, based on to whom the mix is provided, the mix from alternative mixes each having a different prominent vocal performance.
G10H 1/10 - Circuits for establishing the harmonic content of tones by combining tones to obtain chorus, celeste or ensemble effects
G10L 21/013 - Adapting to target pitch
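One plausible policy for the recipient-dependent mix selection described above (the abstract does not specify which vocal is prominent for which recipient) is that each vocalist receives the alternative mix in which the partner's part is prominent. The gain values and the policy itself are invented for illustration.

```python
def select_mix(recipient, first_vocalist, second_vocalist):
    """Return per-part gains; the recipient hears their partner prominently."""
    prominent = second_vocalist if recipient == first_vocalist else first_vocalist
    return {
        first_vocalist: 1.0 if prominent == first_vocalist else 0.6,
        second_vocalist: 1.0 if prominent == second_vocalist else 0.6,
        "backing": 0.8,  # illustrative backing-track gain
    }

mix_for_host = select_mix("host", "host", "guest")
```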
23.
Template-based excerpting and rendering of multimedia performance
Disclosed herein are computer-implemented method, system, and computer-readable storage-medium embodiments for implementing template-based excerpting and rendering of multimedia performances technologies. An embodiment includes at least one computer processor configured to retrieve a first content instance and corresponding first metadata. The first content instance may include a first plurality of structural elements, with at least one structural element corresponding to at least part of the first metadata. The first content instance may be transformed by a rendering engine running on the at least one computer processor and/or transmitted to a content-playback device.
H04N 9/80 - Transformation of the television signal for recording, e.g. modulation, frequency changing; Inverse transformation for playback
Techniques have been developed to facilitate (1) the capture and pitch correction of vocal performances on handheld or other portable computing devices and (2) the mixing of such pitch-corrected vocal performances with backing tracks for audible rendering on targets that include such portable computing devices as well as desktops, workstations, gaming stations, even telephony targets. Implementations of the described techniques employ signal processing techniques and allocations of system functionality that are suitable given the generally limited capabilities of such handheld or portable computing devices and that facilitate efficient encoding and communication of the pitch-corrected vocal performances (or precursors or derivatives thereof) via wireless and/or wired bandwidth-limited networks for rendering on portable computing devices or other targets.
G10H 1/02 - Means for controlling the tone frequencies, e.g. attack or decay; Means for producing special musical effects, e.g. vibratos or glissandos
G10L 21/013 - Adapting to target pitch
G10L 21/0356 - Speech intelligibility improvement, e.g. noise reduction or echo cancellation, by changing the amplitude for synchronisation with other signals, e.g. video signals
25.
Coordinating and mixing audiovisual content captured from geographically distributed performers
Audiovisual performances, including vocal music, are captured and coordinated with those of other users in ways that create compelling user experiences. In some cases, the vocal performances of individual users are captured (together with performance synchronized video) on mobile devices, television-type display and/or set-top box equipment in the context of karaoke-style presentations of lyrics in correspondence with audible renderings of a backing track. Contributions of multiple vocalists are coordinated and mixed in a manner that selects for visually prominent presentation performance synchronized video of one or more of the contributors. Prominence of particular performance synchronized video may be based, at least in part, on computationally-defined audio features extracted from (or computed over) captured vocal audio. Over the course of a coordinated audiovisual performance timeline, these computationally-defined audio features are selective for performance synchronized video of one or more of the contributing vocalists.
Digital signal processing and machine learning techniques can be employed in a vocal capture and performance social network to computationally generate vocal pitch tracks from a collection of vocal performances captured against a common temporal baseline such as a backing track or an original performance by a popularizing artist. In this way, crowd-sourced pitch tracks may be generated and distributed for use in subsequent karaoke-style vocal audio captures or other applications. Large numbers of performances of a song can be used to generate a pitch track. Computationally determined pitch trackings from individual audio signal encodings of the crowd-sourced vocal performance set are aggregated and processed as an observation sequence of a trained Hidden Markov Model (HMM) or other statistical model to produce an output pitch track.
Coordinated audio and video filter pairs are applied to enhance artistic and emotional content of audiovisual performances. Such filter pairs, when applied in audio and video processing pipelines of an audiovisual application hosted on a portable computing device (such as a mobile phone or media player, a computing pad or tablet, a game controller or a personal digital assistant or book reader) can allow user selection of effects that enhance both audio and video coordinated therewith. Coordinated audio and video are captured, filtered and rendered at the portable computing device using camera and microphone interfaces, using digital signal processing software executable on a processor and using storage, speaker and display devices of, or interoperable with, the device. By providing audiovisual capture and personalization on an intimate handheld device, social interactions and postings of a type made popular by modern social networking platforms can now be extended to audiovisual content.
G10L 21/055 - Time compression or expansion for synchronisation with other signals, e.g. video signals
G11B 27/031 - Electronic editing of digitised analogue information signals, e.g. audio or video signals
G06F 3/0482 - Interaction with lists of selectable items, e.g. menus
G06F 3/04842 - Selection of displayed objects or displayed text elements
G10L 21/003 - Changing the voice quality, e.g. pitch or formants
G10H 1/00 - ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE - Details of electrophonic musical instruments
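The coordinated audio/video filter pairs described in this entry can be sketched as a single effect name that maps to one audio transform and one stylistically matched video transform, so that selecting the effect applies both. The effect name and parameters below are invented for illustration.

```python
# One named effect -> (audio transform, video transform) kept in lockstep.
FILTER_PAIRS = {
    "vintage": (
        lambda samples: [s * 0.8 for s in samples],              # audio: soften gain
        lambda pixels: [min(255, int(v * 1.1)) for v in pixels], # video: brighten/warm
    ),
}

def apply_pair(name, audio_samples, video_pixels):
    """Apply the matched audio and video filters of one effect pair."""
    audio_fx, video_fx = FILTER_PAIRS[name]
    return audio_fx(audio_samples), video_fx(video_pixels)

audio_out, video_out = apply_pair("vintage", [1.0, -0.5], [100, 200])
```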
Vocal audio of a user together with performance synchronized video is captured and coordinated with audiovisual contributions of other users to form composite duet-style or glee club-style or window-paned music video-style audiovisual performances. In some cases, the vocal performances of individual users are captured (together with performance synchronized video) on mobile devices, television-type display and/or set-top box equipment in the context of karaoke-style presentations of lyrics in correspondence with audible renderings of a backing track. Contributions of multiple vocalists are coordinated and mixed in a manner that selects for presentation, at any given time along a given performance timeline, performance synchronized video of one or more of the contributors. Selections are in accord with a visual progression that codes a sequence of visual layouts in correspondence with other coded aspects of a performance score such as pitch tracks, backing audio, lyrics, sections and/or vocal parts.
G11B 27/02 - Editing, e.g. varying the order of information signals recorded on, or reproduced from, record carriers
G11B 27/031 - Electronic editing of digitised analogue information signals, e.g. audio or video signals
G11B 27/10 - Indexing; Addressing; Timing or synchronising; Measuring tape travel
G11B 27/28 - Indexing; Addressing; Timing or synchronising; Measuring tape travel by using information detectable on the record carrier by using information signals recorded by the same method as the main recording
Disclosed herein are computer-implemented method, system, and computer-readable storage-medium embodiments for implementing template-based excerpting and rendering of multimedia performances. An embodiment includes at least one computer processor configured to retrieve a first content instance and corresponding first metadata. The first content instance may include a first plurality of structural elements, with at least one structural element corresponding to at least part of the first metadata. An embodiment may further include selecting a first template comprising a first set of parameters. A parameter of the first set of parameters may be applicable to the at least one structural element. Applicable parameter(s) of the first template may be actively associated with the at least part of the first metadata corresponding to the at least one structural element. The first content instance may be transformed by a rendering engine running on the at least one computer processor.
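The template-to-element association described in the abstract can be illustrated with a minimal, hypothetical sketch. The dict shapes, key names, and the `apply_template` helper below are illustrative assumptions, not the patented implementation: parameters of a selected template are attached to each structural element whose metadata they apply to, yielding input suitable for a rendering engine.

```python
def apply_template(content, template):
    """Associate template parameters with matching structural elements.

    `content` is a dict with a list of "elements", each tagged with metadata
    keys; `template` maps a metadata key to rendering parameters. Elements
    whose metadata mentions a templated key get that parameter attached,
    ready for a downstream rendering engine to transform.
    """
    rendered = []
    for element in content["elements"]:
        params = {k: v for k, v in template.items()
                  if k in element.get("metadata", {})}
        rendered.append({**element, "params": params})
    return rendered
```

A chorus-only template, for example, would leave verse elements with empty `params`, so the renderer applies effects only where the metadata matches.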
H04N 9/80 - Transformation of the television signal for recording, e.g. modulation, frequency changing; Inverse transformation for playback
Captured vocals may be automatically transformed using advanced digital signal processing techniques that provide captivating applications, and even purpose-built devices, in which mere novice user-musicians may generate, audibly render and share musical performances. In some cases, the automated transformations allow spoken vocals to be segmented, arranged, temporally aligned with a target rhythm, meter or accompanying backing tracks and pitch corrected in accord with a score or note sequence. Speech-to-song music applications are one such example. In some cases, spoken vocals may be transformed in accord with musical genres such as rap using automated segmentation and temporal alignment techniques, often without pitch correction. Such applications, which may employ different signal processing and different automated transformations, may nonetheless be understood as speech-to-rap variations on the theme.
G10L 21/055 - Time compression or expansion for synchronising with other signals, e.g. video signals
G10L 19/02 - Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis, using spectral analysis, e.g. transform vocoders or subband vocoders
G10L 19/00 - Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
31.
Coordinating and mixing vocals captured from geographically distributed performers
Despite many practical limitations imposed by mobile device platforms and application execution environments, vocal musical performances may be captured and continuously pitch-corrected for mixing and rendering with backing tracks in ways that create compelling user experiences. Based on the techniques described herein, even mere amateurs are encouraged to share with friends and family or to collaborate and contribute vocal performances as part of virtual “glee clubs.” In some implementations, these interactions are facilitated through social network- and/or eMail-mediated sharing of performances and invitations to join in a group performance. Using uploaded vocals captured at clients such as a mobile device, a content server (or service) can mediate such virtual glee clubs by manipulating and mixing the uploaded vocal performances of multiple contributing vocalists.
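The content server's role of manipulating and mixing uploaded vocals can be sketched in a few lines. This is a simplified illustration under stated assumptions (tracks as plain lists of samples, a hypothetical `mix_glee_club` helper); a real service would work on encoded audio and apply per-vocal effects before summing.

```python
def mix_glee_club(backing, vocal_takes, vocal_gain=0.8):
    """Server-side mix: sum several uploaded vocal takes over a backing track.

    Dividing each take by the number of contributors keeps the summed
    vocals from growing without bound as more performers join the
    virtual glee club.
    """
    n = max(len(backing), *(len(v) for v in vocal_takes))
    out = []
    for i in range(n):
        s = backing[i] if i < len(backing) else 0.0
        for take in vocal_takes:
            if i < len(take):
                s += vocal_gain * take[i] / len(vocal_takes)
        out.append(s)
    return out
```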
In an application that manipulates audio (or audiovisual) content, automated music creation technologies may be employed to generate new musical content using digital signal processing software hosted on handheld and/or server (or cloud-based) compute platforms to intelligently process and combine a set of audio content captured and submitted by users of modern mobile phones or other handheld compute platforms. The user-submitted recordings may contain speech, singing, musical instruments, or a wide variety of other sound sources, and the recordings may optionally be preprocessed by the handheld devices prior to submission.
G10L 21/00 - Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
33.
Non-linear media segment capture and edit platform
User interface techniques provide user vocalists with mechanisms for forward and backward traversal of audiovisual content, including pitch cues, waveform- or envelope-type performance timelines, lyrics and/or other temporally-synchronized content at record-time, during edits, and/or in playback. Recapture of selected performance portions, coordination of group parts, and overdubbing may all be facilitated. Direct scrolling to arbitrary points in the performance timeline, lyrics, pitch cues and other temporally-synchronized content allows users to move conveniently through a capture or audiovisual edit session. In some cases, a user vocalist may be guided through the performance timeline, lyrics, pitch cues and other temporally-synchronized content in correspondence with group part information, such as in a guided short-form capture for a duet. A scrubber allows user vocalists to conveniently move forward and backward through the temporally-synchronized content.
H04N 21/439 - Processing of audio elementary streams
H04N 21/4402 - Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to MPEG-4 scene graphs, involving reformatting operations of video signals for household redistribution, storage or real-time display
H04N 21/466 - Learning process for intelligent management, e.g. learning user preferences for recommending movies
H04N 21/43 - Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
34.
Audiovisual collaboration system and method with seed/join mechanic
User interface techniques provide user vocalists with mechanisms for seeding subsequent performances by other users (e.g., joiners). A seed may be a full-length seed spanning much or all of a pre-existing audio (or audiovisual) work and mixing, to seed further contributions of one or more joiners, a user's captured media content for at least some portions of the audio (or audiovisual) work. A short seed may span less than all (and in some cases, much less than all) of the audio (or audiovisual) work. For example, a verse, chorus, refrain, hook or other limited “chunk” of an audio (or audiovisual) work may constitute a seed. A seeding user's call invites other users to join the full-length or short-form seed by singing along, singing a particular vocal part or musical section, singing harmony or other duet part, rapping, talking, clapping, recording video, adding a video clip from camera roll, etc. The resulting group performance, whether full-length or just a chunk, may be posted, livestreamed, or otherwise disseminated in a social network.
Latency on different devices (e.g., devices of differing brand, model, vintage, etc.) can vary significantly and tens of milliseconds can affect human perception of lagging and leading components of a performance. As a result, use of a uniform latency estimate across a wide variety of devices is unlikely to provide good results, and hand-estimating round-trip latency across a wide variety of devices is costly and would constantly need to be updated for new devices. Instead, a system has been developed for crowdsourcing latency estimates.
G06F 17/00 - ELECTRIC DIGITAL DATA PROCESSING Digital computing or data processing equipment or methods, specially adapted for specific functions
H04R 29/00 - Monitoring arrangements; Testing arrangements
G10L 25/60 - Speech or voice analysis techniques not restricted to a single one of groups specially adapted for particular use for comparison or discrimination for measuring the quality of voice signals
36.
Social music system and method with continuous, real-time pitch correction of vocal performance and dry vocal capture for subsequent re-rendering based on selectively applicable vocal effect(s) schedule(s)
Vocal musical performances may be captured and, in some cases or embodiments, pitch-corrected and/or processed in accord with a user selectable vocal effects schedule for mixing and rendering with backing tracks in ways that create compelling user experiences. In some cases, the vocal performances of individual users are captured on mobile devices in the context of a karaoke-style presentation of lyrics in correspondence with audible renderings of a backing track. Such performances can be pitch-corrected in real-time at the mobile device in accord with pitch correction settings. Vocal effects schedules may also be selectively applied to such performances. In these ways, even amateur user/performers with imperfect pitch are encouraged to take a shot at “stardom” and/or take part in a game play, social network or vocal achievement application architecture that facilitates musical collaboration on a global scale and/or, in some cases or embodiments, to initiate revenue generating in-application transactions.
G10H 1/10 - Circuits for establishing the harmonic content of tones by combining tones for obtaining chorus, celeste or ensemble effects
G10L 21/013 - Adapting to target pitch
37.
Crowd-sourced device latency estimation for synchronization of recordings in vocal capture applications
Latency on different devices (e.g., devices of differing brand, model, vintage, etc.) can vary significantly and tens of milliseconds can affect human perception of lagging and leading components of a performance. As a result, use of a uniform latency estimate across a wide variety of devices is unlikely to provide good results, and hand-estimating round-trip latency across a wide variety of devices is costly and would constantly need to be updated for new devices. Instead, a system has been developed for crowdsourcing latency estimates.
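The crowdsourcing approach above can be sketched as a per-device-model aggregation of user-reported measurements. This is a minimal illustration under stated assumptions (the `estimate_latencies` helper and its report format are hypothetical); using the median makes each model's estimate robust to outlier measurements from noisy capture environments.

```python
from statistics import median

def estimate_latencies(reports, min_samples=3):
    """Aggregate crowd-sourced round-trip latency reports (ms) per device model.

    `reports` is an iterable of (device_model, latency_ms) pairs. Models
    with fewer than `min_samples` reports map to None so a caller can fall
    back to a global default until enough measurements arrive.
    """
    by_model = {}
    for model, latency_ms in reports:
        by_model.setdefault(model, []).append(latency_ms)
    return {
        model: (median(vals) if len(vals) >= min_samples else None)
        for model, vals in by_model.items()
    }
```

As new device models appear, their estimates accumulate automatically from user reports rather than requiring hand-measurement.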
G06F 17/00 - ELECTRIC DIGITAL DATA PROCESSING Digital computing or data processing equipment or methods, specially adapted for specific functions
H04R 29/00 - Monitoring arrangements; Testing arrangements
G10L 25/60 - Speech or voice analysis techniques not restricted to a single one of groups specially adapted for particular use for comparison or discrimination for measuring the quality of voice signals
Visual effects schedules are applied to audiovisual performances with differing visual effects applied in correspondence with differing elements of musical structure. Segmentation techniques applied to one or more audio tracks (e.g., vocal or backing tracks) are used to compute some of the components of the musical structure. In some cases, applied visual effects schedules are mood-denominated and may be selected by a performer as a component of his or her visual expression or determined from an audiovisual performance using machine learning techniques.
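Once segmentation has labeled the musical structure, applying a mood-denominated effects schedule reduces to a lookup over segments. The sketch below is illustrative only (the `effects_timeline` helper and label names are assumptions, not the patented method): each structural label selects an effect, with a default for unrecognized sections.

```python
def effects_timeline(segments, schedule):
    """Expand a mood-denominated visual effects schedule over song structure.

    `segments` is a list of (label, start_s, end_s) tuples produced by audio
    segmentation; `schedule` maps a structural label (verse, chorus, ...) to
    an effect name. Labels absent from the schedule fall back to its
    "default" entry.
    """
    return [(start, end, schedule.get(label, schedule["default"]))
            for label, start, end in segments]
```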
G11B 27/02 - Editing, e.g. varying the order of information signals recorded on, or reproduced from, record carriers
Vocal audio of a user together with performance synchronized video is captured and coordinated with audiovisual contributions of other users to form composite duet-style or glee club-style or window-paned music video-style audiovisual performances. In some cases, the vocal performances of individual users are captured (together with performance synchronized video) on mobile devices, television-type display and/or set-top box equipment in the context of karaoke-style presentations of lyrics in correspondence with audible renderings of a backing track. Contributions of multiple vocalists are coordinated and mixed in a manner that selects for presentation, at any given time along a given performance timeline, performance synchronized video of one or more of the contributors. Selections are in accord with a visual progression that codes a sequence of visual layouts in correspondence with other coded aspects of a performance score such as pitch tracks, backing audio, lyrics, sections and/or vocal parts.
G11B 27/02 - Editing, e.g. varying the order of information signals recorded on, or reproduced from, record carriers
50.
Audiovisual collaboration method with latency management for wide-area broadcast
Techniques have been developed to facilitate the livestreaming of group audiovisual performances. Audiovisual performances including vocal music are captured and coordinated with performances of other users in ways that can create compelling user and listener experiences. For example, in some cases or embodiments, duets with a host performer may be supported in a sing-with-the-artist style audiovisual livestream in which aspiring vocalists request or queue particular songs for a live radio show entertainment format. The developed techniques provide a communications latency-tolerant mechanism for synchronizing vocal performances captured at geographically-separated devices (e.g., at globally-distributed, but network-connected mobile phones or tablets or at audiovisual capture devices geographically separated from a live studio).
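A latency-tolerant mixer can be sketched by shifting each remote contribution back onto the host's media timeline before summing. The `align_guest_mix` helper and frame representation below are illustrative assumptions, not the patented mechanism: the key idea is that each guest stream carries (or is tagged with) its measured capture offset, so network delay never shows up as musical lag.

```python
def align_guest_mix(host_track, guest_track, guest_offset_ms, frame_ms=10):
    """Realign a guest's vocal frames onto the host's performance timeline.

    `guest_offset_ms` is the guest device's measured capture latency relative
    to the shared media timeline. The mixer shifts the guest stream earlier
    by that amount (in whole frames) before summing with the host track;
    frames that land before the start of the timeline are dropped.
    """
    shift = round(guest_offset_ms / frame_ms)
    mixed = list(host_track)
    for i, v in enumerate(guest_track):
        j = i - shift
        if 0 <= j < len(mixed):
            mixed[j] = mixed[j] + v
    return mixed
```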
H04N 21/43 - Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
H04N 21/4788 - Supplemental services, e.g. displaying phone caller identification or shopping application communicating with other users, e.g. chatting
H04L 29/08 - Transmission control procedure, e.g. data link level control procedure
H04L 29/06 - Communication control; Communication processing characterised by a protocol
H04N 21/242 - Synchronization processes, e.g. processing of program clock references [PCR]
H04N 21/462 - Content or additional data management, e.g. creating a master electronic programme guide from data received via the Internet and a head-end or controlling the complexity of a video stream by scaling the resolution …
Audiovisual performances, including vocal music, are captured and coordinated with those of other users in ways that create compelling user experiences. In some cases, the vocal performances of individual users are captured (together with performance synchronized video) on mobile devices, television-type display and/or set-top box equipment in the context of karaoke-style presentations of lyrics in correspondence with audible renderings of a backing track. Contributions of multiple vocalists are coordinated and mixed in a manner that selects, for visually prominent presentation, performance synchronized video of one or more of the contributors. Prominence of particular performance synchronized video may be based, at least in part, on computationally-defined audio features extracted from (or computed over) captured vocal audio. Over the course of a coordinated audiovisual performance timeline, these computationally-defined audio features are selective for performance synchronized video of one or more of the contributing vocalists.
In some examples, a system includes a first portable computing device that audibly renders a backing track, captures and pitch corrects a vocal performance of a first user, and transmits the first user's pitch corrected vocal performance. The system may also include a second portable computing device including a data communications interface that receives the first user's pitch corrected vocal performance, an audio transducer that audibly renders a mix of the backing track and the first user's pitch corrected vocal performance, a display for concurrent presentation of lyrics temporally synchronized with a vocal score and the backing track, a microphone interface that captures a vocal performance of a second user, and pitch correction code executable on the second portable computing device to pitch correct the second user's vocal performance in accord with the vocal score to produce a composite multi-vocal performance.
Techniques have been developed to facilitate (1) the capture and pitch correction of vocal performances on handheld or other portable computing devices and (2) the mixing of such pitch-corrected vocal performances with backing tracks for audible rendering on targets that include such portable computing devices as well as desktops, workstations, gaming stations, even telephony targets. Implementations of the described techniques employ signal processing techniques and allocations of system functionality that are suitable given the generally limited capabilities of such handheld or portable computing devices and that facilitate efficient encoding and communication of the pitch-corrected vocal performances (or precursors or derivatives thereof) via wireless and/or wired bandwidth-limited networks for rendering on portable computing devices or other targets.
G10H 1/02 - Means for controlling the tone frequencies, e.g. attack or decay; Means for producing special musical effects, e.g. vibratos or glissandos
G10L 21/0356 - Speech intelligibility enhancement, e.g. noise reduction or echo cancellation, by changing the amplitude for synchronising with other signals, e.g. video signals
54.
Coordinating and mixing vocals captured from geographically distributed performers
Despite many practical limitations imposed by mobile device platforms and application execution environments, vocal musical performances may be captured and continuously pitch-corrected for mixing and rendering with backing tracks in ways that create compelling user experiences. Based on the techniques described herein, even mere amateurs are encouraged to share with friends and family or to collaborate and contribute vocal performances as part of virtual “glee clubs.” In some implementations, these interactions are facilitated through social network- and/or eMail-mediated sharing of performances and invitations to join in a group performance. Using uploaded vocals captured at clients such as a mobile device, a content server (or service) can mediate such virtual glee clubs by manipulating and mixing the uploaded vocal performances of multiple contributing vocalists.
G10L 21/00 - Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
G10L 21/013 - Adapting to target pitch
Advanced, but user-friendly composition and editing environments for musical scores may be provided using the types, and in some cases the instances, of computing devices that will in turn consume musical score content so generated. Indeed, by integrating musical composition facilities within synthetic musical instruments that can be widely deployed on hand-held or portable computing devices, a social music network that includes such synthetic musical instruments gains access to a large, and potentially prolific, population of authors, editors and reviewers, as well as to the community-sourced musical scores that they can generate. By curating such content and/or by applying crowd-sourcing or other computational techniques to maintain quality, a social music network may rapidly deploy the new and ever evolving content that its user community desires.
G10H 1/00 - ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE Details of electrophonic musical instruments
G10H 1/32 - ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE Details of electrophonic musical instruments - Constructional details
Vocal musical performances may be captured and continuously pitch-corrected at a mobile device for mixing and rendering with backing tracks in ways that create compelling user experiences. In some cases, the vocal performances of individual users are captured in the context of a karaoke-style presentation of lyrics in correspondence with audible renderings of a backing track. Such performances can be pitch-corrected in real-time at the mobile device in accord with pitch correction settings. In some cases, such pitch correction settings code a particular key or scale for the vocal performance or for portions thereof. In some cases, pitch correction settings include a score-coded melody sequence of note targets supplied with, or for association with, the lyrics and/or backing track. In some cases, pitch correction settings are dynamically variable based on gestures captured at a user interface.
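Pitch correction in accord with a key or scale can be sketched as snapping a detected pitch to the nearest allowed note. This is a minimal illustration, not the patented real-time implementation; the `correct_pitch` helper and its default major-scale degrees are assumptions, and a score-coded melody would replace the scale with explicit note targets.

```python
import math

A4_HZ, A4_MIDI = 440.0, 69

def correct_pitch(freq_hz, scale=(0, 2, 4, 5, 7, 9, 11), root=0):
    """Snap a detected frequency (Hz) to the nearest note of a key/scale.

    Convert the frequency to a fractional MIDI note, enumerate the allowed
    pitch classes within +/- one octave, pick the closest, and convert back.
    """
    midi = A4_MIDI + 12 * math.log2(freq_hz / A4_HZ)
    base = int(midi // 12) * 12
    candidates = [base + oct_ * 12 + (root + d) % 12
                  for oct_ in (-1, 0, 1) for d in scale]
    nearest = min(candidates, key=lambda n: abs(n - midi))
    return A4_HZ * 2 ** ((nearest - A4_MIDI) / 12)
```

Applied frame by frame to a stream of pitch estimates, the same snapping step yields continuous correction; dynamic settings (key changes, gesture-controlled strength) would vary `scale` and `root` over time.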
G10L 21/0356 - Speech intelligibility enhancement, e.g. noise reduction or echo cancellation, by changing the amplitude for synchronising with other signals, e.g. video signals
57.
Display screen or portion thereof with graphical user interface
Digital signal processing and machine learning techniques can be employed in a vocal capture and performance social network to computationally generate vocal pitch tracks from a collection of vocal performances captured against a common temporal baseline such as a backing track or an original performance by a popularizing artist. In this way, crowd-sourced pitch tracks may be generated and distributed for use in subsequent karaoke-style vocal audio captures or other applications. Large numbers of performances of a song can be used to generate a pitch track. Computationally determined pitch trackings from individual audio signal encodings of the crowd-sourced vocal performance set are aggregated and processed as an observation sequence of a trained Hidden Markov Model (HMM) or other statistical model to produce an output pitch track.
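The aggregation step can be illustrated with a deliberately simplified stand-in for the trained HMM: a per-frame median over the voiced estimates of many performances. The `crowd_pitch_track` helper and its frame format are assumptions for illustration; the median already suppresses individual singers' errors, which is the intuition the statistical model formalizes with temporal smoothing.

```python
from statistics import median

def crowd_pitch_track(trackings):
    """Aggregate per-frame pitch estimates from many captured performances.

    `trackings` is a list of equal-length lists of MIDI pitch estimates,
    with None marking unvoiced frames. Each output frame is the median of
    the voiced estimates at that position, or None if no performance was
    voiced there.
    """
    n_frames = len(trackings[0])
    track = []
    for i in range(n_frames):
        voiced = [t[i] for t in trackings if t[i] is not None]
        track.append(median(voiced) if voiced else None)
    return track
```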
Synthetic multi-string musical instruments have been developed for capturing and rendering musical performances on handheld or other portable devices in which a multi-touch sensitive display provides one of the input vectors for an expressive performance by a user or musician. Visual cues may be provided on the multi-touch sensitive display to guide the user in a performance based on a musical score. Alternatively, or in addition, uncued freestyle modes of operation may be provided. In either case, it is not the musical score that drives digital synthesis and audible rendering of the synthetic multi-string musical instrument. Rather, it is the stream of user gestures captured at least in part using the multi-touch sensitive display that drives the digital synthesis and audible rendering.
G10H 1/06 - Circuits for establishing the harmonic content of tones
G10H 1/00 - ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE Details of electrophonic musical instruments
G06F 3/041 - Digitisers, e.g. for touch screens or touch pads, characterised by the transducing means
62.
Display screen or portion thereof with animated graphical user interface
Captured vocals may be automatically transformed using advanced digital signal processing techniques that provide captivating applications, and even purpose-built devices, in which mere novice user-musicians may generate, audibly render and share musical performances. In some cases, the automated transformations allow spoken vocals to be segmented, arranged, temporally aligned with a target rhythm, meter or accompanying backing tracks and pitch corrected in accord with a score or note sequence. Speech-to-song music applications are one such example. In some cases, spoken vocals may be transformed in accord with musical genres such as rap using automated segmentation and temporal alignment techniques, often without pitch correction. Such applications, which may employ different signal processing and different automated transformations, may nonetheless be understood as speech-to-rap variations on the theme.
G10L 21/055 - Time compression or expansion for synchronising with other signals, e.g. video signals
G01L 21/04 - Vacuum gauges having a compression chamber in which the gas to be measured is compressed, wherein the chamber is sealed off by a liquid; Vacuum gauges of the McLeod type
G10L 19/02 - Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis, using spectral analysis, e.g. transform vocoders or subband vocoders
G10L 19/00 - Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
Social music system and method with continuous, real-time pitch correction of vocal performance and dry vocal capture for subsequent re-rendering based on selectively applicable vocal effect(s) schedule(s)
Vocal musical performances may be captured and, in some cases or embodiments, pitch-corrected and/or processed in accord with a user selectable vocal effects schedule for mixing and rendering with backing tracks in ways that create compelling user experiences. In some cases, the vocal performances of individual users are captured on mobile devices in the context of a karaoke-style presentation of lyrics in correspondence with audible renderings of a backing track. Such performances can be pitch-corrected in real-time at the mobile device in accord with pitch correction settings. Vocal effects schedules may also be selectively applied to such performances. In these ways, even amateur user/performers with imperfect pitch are encouraged to take a shot at “stardom” and/or take part in a game play, social network or vocal achievement application architecture that facilitates musical collaboration on a global scale and/or, in some cases or embodiments, to initiate revenue generating in-application transactions.
G10L 21/00 - Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
Techniques have been developed for transmitting and receiving information conveyed through the air from one portable device to another as a generally unperceivable coding within an otherwise recognizable acoustic signal. For example, in some embodiments in accordance with the present invention(s), information is acoustically communicated from a first handheld device toward a second by encoding the information in a signal that, when converted into acoustic energy at an acoustic transducer of the first handheld device, is characterized in that the acoustic energy is discernable to a human ear yet the encoding of the information therein is generally not perceivable by the human. The acoustic energy is transmitted from the acoustic transducer of the first handheld device toward the second handheld device across an air gap that constitutes a substantially entirety of the distance between the devices. Acoustic energy received at the second handheld device may then be processed using signal processing techniques tailored to detection of the particular information encodings employed.
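A sketch of one way such an encoding can work, offered as an assumption-laden illustration rather than the patented scheme: bits are rendered as low-amplitude near-ultrasonic tones (two-frequency FSK) that common speakers and microphones can reproduce but listeners barely perceive, and the receiver compares per-symbol energy at the two carriers to recover the bits.

```python
import math

RATE = 44100
F0, F1 = 18000.0, 18500.0   # near-ultrasonic carriers: hard to perceive,
BIT_SAMPLES = 2048          # yet within common speaker/mic bandwidth

def encode_bits(bits, amplitude=0.05):
    """Render a bit string as a quiet two-tone (FSK) sample stream that
    could be mixed into otherwise recognizable audio."""
    samples = []
    for bit in bits:
        f = F1 if bit == "1" else F0
        samples += [amplitude * math.sin(2 * math.pi * f * n / RATE)
                    for n in range(BIT_SAMPLES)]
    return samples

def decode_bits(samples):
    """Recover bits by comparing per-symbol energy at the two carriers
    (a single-bin, Goertzel-style correlation)."""
    bits = ""
    for i in range(0, len(samples), BIT_SAMPLES):
        chunk = samples[i:i + BIT_SAMPLES]
        def energy(f):
            re = sum(s * math.cos(2 * math.pi * f * n / RATE)
                     for n, s in enumerate(chunk))
            im = sum(s * math.sin(2 * math.pi * f * n / RATE)
                     for n, s in enumerate(chunk))
            return re * re + im * im
        bits += "1" if energy(F1) > energy(F0) else "0"
    return bits
```

Real deployments would add synchronization, error correction, and robustness to room acoustics, but the round trip already works over an ideal channel.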
Embodiments described herein relate generally to systems comprising a display device, a display device-coupled computing platform, a mobile device in communication with the computing platform, and a content server. Methods and techniques for capturing and/or processing audiovisual performances are described and, in particular, techniques suitable for use with display device-connected computing platforms that render vocal performances captured by a handheld computing device.
H04N 5/775 - Interface circuits between an apparatus for recording and another apparatus between a recording apparatus and a television receiver
G06F 3/0488 - Interaction techniques based on graphical user interfaces [GUI] using specific features provided by the input device, e.g. functions controlled by the rotation of a mouse with dual sensing arrangements, or by the nature of the input device, e.g. gestures based on the pressure exerted, using a touch screen or digitiser, e.g. input of commands through traced gestures
G06F 3/0346 - Pointing devices displaced or positioned by the user; Accessories therefor with detection of the device orientation or free movement in a 3D space, e.g. 3D mice, six-degrees-of-freedom [6-DOF] pointers, using gyroscopes, accelerometers or tilt sensors
G06F 3/0484 - Interaction techniques based on graphical user interfaces [GUI] for the control of specific functions or operations, e.g. selecting or manipulating an object, an image or a displayed text element, setting a parameter value or selecting a range
Embodiments described herein relate generally to systems comprising a display device, a display device-coupled computing platform, a mobile device in communication with the computing platform, and a content server. Methods and techniques for capturing and/or processing audiovisual performances are described and, in particular, techniques suitable for use with display device-connected computing platforms that render vocal performances captured by a handheld computing device.
G06F 1/16 - ELECTRIC DIGITAL DATA PROCESSING - Details not covered by groups and - Constructional details or arrangements
H04M 1/72412 - User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality by interfacing with an external accessory, using two-way short-range wireless interfaces
G10L 25/57 - Speech or voice analysis techniques not restricted to a single one of groups specially adapted for particular use for comparison or discrimination for processing of video signals
G10L 21/013 - Adapting to target pitch
H04M 1/72442 - User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality for playing music files
78.
Coordinated audio and video capture and sharing framework
Coordinated audio and video filter pairs are applied to enhance artistic and emotional content of audiovisual performances. Such filter pairs, when applied in audio and video processing pipelines of an audiovisual application hosted on a portable computing device (such as a mobile phone or media player, a computing pad or tablet, a game controller or a personal digital assistant or book reader) can allow user selection of effects that enhance both audio and video coordinated therewith. Coordinated audio and video are captured, filtered and rendered at the portable computing device using camera and microphone interfaces, using digital signal processing software executable on a processor and using storage, speaker and display devices of, or interoperable with, the device. By providing audiovisual capture and personalization on an intimate handheld device, social interactions and postings of a type made popular by modern social networking platforms can now be extended to audiovisual content.
G11B 27/031 - Electronic editing of digitised analogue information signals, e.g. audio or video signals
G06F 3/0482 - Interaction with lists of selectable items, e.g. menus
G06F 3/0484 - Interaction techniques based on graphical user interfaces [GUI] for the control of specific functions or operations, e.g. selecting or manipulating an object, an image or a displayed text element, setting a parameter value or selecting a range of values
G10L 21/003 - Changing voice quality, e.g. pitch or formants
G10H 1/00 - ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE - Details of electrophonic musical instruments
Notwithstanding practical limitations imposed by mobile device platforms and applications, truly captivating musical instruments may be synthesized in ways that allow musically expressive performances to be captured and rendered in real-time. In some cases, synthetic musical instruments can provide a game, grading or instructional mode in which one or more qualities of a user's performance are assessed relative to a musical score. By constantly adapting such modes to actual performance characteristics and, in some cases, to the level of a given user musician's skill, user interactions with synthetic musical instruments can be made more engaging and may capture user interest and economic opportunities (e.g., for in-app purchase and/or social networking) over generally longer periods of time.
Notwithstanding practical limitations imposed by mobile device platforms and applications, truly captivating musical instruments may be synthesized in ways that allow musically expressive performances to be captured and rendered in real-time. Synthetic musical instruments that provide a game, grading or instructional mode are described in which one or more qualities of a user's performance are assessed relative to a musical score. By providing a range of modes (from score-assisted to fully user-expressive), user interactions with synthetic musical instruments are made more engaging and tend to capture user interest over generally longer periods of time. Synthetic musical instruments are described in which force dynamics of user gestures (such as finger contact forces applied to a multi-touch sensitive display or surface and/or the temporal extent and applied pressure of sustained contact thereon) are captured and drive the digital synthesis in ways that enhance expressiveness of user performances.
G10H 1/00 - ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE - Details of electrophonic musical instruments
Vocal audio of a user together with performance synchronized video is captured and coordinated with audiovisual contributions of other users to form composite duet-style or glee club-style or window-paned music video-style audiovisual performances. In some cases, the vocal performances of individual users are captured (together with performance synchronized video) on mobile devices, television-type display and/or set-top box equipment in the context of karaoke-style presentations of lyrics in correspondence with audible renderings of a backing track. Contributions of multiple vocalists are coordinated and mixed in a manner that selects for presentation, at any given time along a given performance timeline, performance synchronized video of one or more of the contributors. Selections are in accord with a visual progression that codes a sequence of visual layouts in correspondence with other coded aspects of a performance score such as pitch tracks, backing audio, lyrics, sections and/or vocal parts.
G11B 27/02 - Editing, e.g. varying the order of information signals recorded on, or reproduced from, record or information carriers
82.
Automatic estimation of latency for synchronization of recordings in vocal capture applications
Latency on different devices (e.g., devices of differing brand, model, vintage, etc.) can vary significantly and tens of milliseconds can affect human perception of lagging and leading components of a performance. As a result, use of a uniform latency estimate across a wide variety of devices is unlikely to provide good results, and hand-estimating round-trip latency across a wide variety of devices is costly and would constantly need to be updated for new devices. Instead, a system has been developed for automatically estimating latency through audio subsystems using feedback recording and analysis of recorded audio.
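The feedback-based approach can be sketched as a simple cross-correlation: emit a known probe signal through the speaker, record it back through the microphone, and locate the lag at which the recording best matches the probe. The chirp probe, delay value and noise level below are illustrative stand-ins, not parameters of the described system.

```python
import numpy as np

def estimate_latency(emitted: np.ndarray, recorded: np.ndarray, sample_rate: int) -> float:
    """Estimate round-trip latency (seconds) as the lag maximising the
    cross-correlation of the recorded feedback with the emitted probe."""
    corr = np.correlate(recorded, emitted, mode="full")
    lag = int(np.argmax(corr)) - (len(emitted) - 1)
    return max(lag, 0) / sample_rate

# Synthetic check: a short chirp probe delayed by a known number of samples.
sr = 48_000
t = np.linspace(0.0, 0.05, int(0.05 * sr), endpoint=False)
probe = np.sin(2 * np.pi * (500 + 8000 * t) * t)   # linear chirp, 500-1300 Hz
delay_samples = 1234                               # simulated round-trip delay
recording = np.concatenate([np.zeros(delay_samples), probe, np.zeros(500)])
recording += 0.01 * np.random.default_rng(0).standard_normal(recording.size)

print(round(estimate_latency(probe, recording, sr) * 1000, 1))  # latency in ms (~25.7 here)
```

In a real calibration pass, the probe would be played and re-captured through the device's actual audio subsystem so that driver and hardware buffering are included in the measured lag.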
G10L 21/00 - Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or intelligibility
Vocal musical performances may be captured and continuously pitch-corrected at a mobile device for mixing and rendering with backing tracks in ways that create compelling user experiences. In some cases, the vocal performances of individual users are captured in the context of a karaoke-style presentation of lyrics in correspondence with audible renderings of a backing track. Such performances can be pitch-corrected in real-time at the mobile device in accord with pitch correction settings. In some cases, such pitch correction settings code a particular key or scale for the vocal performance or for portions thereof. In some cases, pitch correction settings include a score-coded melody sequence of note targets supplied with, or for association with, the lyrics and/or backing track. In some cases, pitch correction settings are dynamically variable based on gestures captured at a user interface.
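A minimal sketch of pitch correction against score-coded note targets, assuming a hypothetical target set expressed as MIDI note numbers (the detected frequency would come from a real-time pitch tracker in practice):

```python
import numpy as np

# Hypothetical score-coded note targets: a C-major scale in MIDI numbers (60 = middle C).
SCALE_MIDI = np.array([60, 62, 64, 65, 67, 69, 71, 72])

def hz_to_midi(f: float) -> float:
    return 69.0 + 12.0 * np.log2(f / 440.0)

def midi_to_hz(m: float) -> float:
    return 440.0 * 2.0 ** ((m - 69.0) / 12.0)

def correct_pitch(f_detected: float, strength: float = 1.0) -> float:
    """Pull a detected frequency toward the nearest note target.
    strength=1.0 snaps hard; smaller values correct more gently."""
    m = hz_to_midi(f_detected)
    target = SCALE_MIDI[np.argmin(np.abs(SCALE_MIDI - m))]
    return midi_to_hz(m + strength * (target - m))

print(round(correct_pitch(268.0), 2))  # → 261.63 (a sharp C4 pulled down to pitch)
```

The dynamically variable settings the abstract mentions could map a user-interface gesture to the `strength` parameter, varying correction intensity during the performance.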
G10L 21/0356 - Speech intelligibility improvement, e.g. noise reduction or echo cancellation, by changing the amplitude for synchronisation with other signals, e.g. video signals
84.
Coordinating and mixing audiovisual content captured from geographically distributed performers
Audiovisual performances, including vocal music, are captured and coordinated with those of other users in ways that create compelling user experiences. In some cases, the vocal performances of individual users are captured (together with performance synchronized video) on mobile devices, television-type display and/or set-top box equipment in the context of karaoke-style presentations of lyrics in correspondence with audible renderings of a backing track. Contributions of multiple vocalists are coordinated and mixed in a manner that selects for visually prominent presentation performance synchronized video of one or more of the contributors. Prominence of particular performance synchronized video may be based, at least in part, on computationally-defined audio features extracted from (or computed over) captured vocal audio. Over the course of a coordinated audiovisual performance timeline, these computationally-defined audio features are selective for performance synchronized video of one or more of the contributing vocalists.
Techniques have been developed to facilitate (1) the capture and pitch correction of vocal performances on handheld or other portable computing devices and (2) the mixing of such pitch-corrected vocal performances with backing tracks for audible rendering on targets that include such portable computing devices as well as desktops, workstations, gaming stations, even telephony targets. Implementations of the described techniques employ signal processing techniques and allocations of system functionality that are suitable given the generally limited capabilities of such handheld or portable computing devices and that facilitate efficient encoding and communication of the pitch-corrected vocal performances (or precursors or derivatives thereof) via wireless and/or wired bandwidth-limited networks for rendering on portable computing devices or other targets.
G10H 1/02 - Means for controlling the tone frequencies, e.g. attack or decay; Means for producing special musical effects, e.g. vibratos or glissandos
G10L 21/0356 - Speech intelligibility improvement, e.g. noise reduction or echo cancellation, by changing the amplitude for synchronisation with other signals, e.g. video signals
86.
System and method for capture and rendering of performance on synthetic string instrument
Synthetic multi-string musical instruments have been developed for capturing and rendering musical performances on handheld or other portable devices in which a multi-touch sensitive display provides one of the input vectors for an expressive performance by a user or musician. Visual cues may be provided on the multi-touch sensitive display to guide the user in a performance based on a musical score. Alternatively, or in addition, uncued freestyle modes of operation may be provided. In either case, it is not the musical score that drives digital synthesis and audible rendering of the synthetic multi-string musical instrument. Rather, it is the stream of user gestures captured at least in part using the multi-touch sensitive display that drives the digital synthesis and audible rendering.
G10H 1/00 - ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE - Details of electrophonic musical instruments
G06F 3/041 - Digitisers, e.g. for touch screens or touch pads, characterised by the transducing means
87.
System and method for communication between mobile devices using digital/acoustic techniques
Techniques have been developed for transmitting and receiving information conveyed through the air from one portable device to another as a generally unperceivable coding within an otherwise recognizable acoustic signal. For example, in some embodiments in accordance with the present invention(s), information is acoustically communicated from a first handheld device toward a second by encoding the information in a signal that, when converted into acoustic energy at an acoustic transducer of the first handheld device, is characterized in that the acoustic energy is discernable to a human ear yet the encoding of the information therein is generally not perceivable by the human. The acoustic energy is transmitted from the acoustic transducer of the first handheld device toward the second handheld device across an air gap that constitutes substantially the entirety of the distance between the devices. Acoustic energy received at the second handheld device may then be processed using signal processing techniques tailored to detection of the particular information encodings employed.
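The abstract describes data hidden imperceptibly inside a recognizable acoustic signal; the sketch below illustrates only the simpler underlying idea of device-to-device acoustic signalling, using plain two-tone FSK with made-up frequencies and symbol length (no psychoacoustic masking):

```python
import numpy as np

SR = 44_100
BIT_DUR = 0.05                 # 50 ms per bit (illustrative)
F0, F1 = 2400.0, 3200.0        # hypothetical "0"/"1" tones; both fit an
                               # integer number of cycles per symbol, so the
                               # two references are orthogonal

def encode(bits, amplitude=0.05):
    """Encode bits as low-amplitude tones that could ride under other audio."""
    n = int(BIT_DUR * SR)
    t = np.arange(n) / SR
    return np.concatenate(
        [amplitude * np.sin(2 * np.pi * (F1 if b else F0) * t) for b in bits]
    )

def decode(signal, n_bits):
    """Recover bits by comparing each symbol's energy at the two tone frequencies."""
    n = int(BIT_DUR * SR)
    t = np.arange(n) / SR
    ref0 = np.exp(-2j * np.pi * F0 * t)
    ref1 = np.exp(-2j * np.pi * F1 * t)
    out = []
    for i in range(n_bits):
        sym = signal[i * n:(i + 1) * n]
        out.append(1 if abs(sym @ ref1) > abs(sym @ ref0) else 0)
    return out

bits = [1, 0, 1, 1, 0, 0, 1, 0]
print(decode(encode(bits), len(bits)))  # → [1, 0, 1, 1, 0, 0, 1, 0]
```

A real over-the-air implementation would additionally need synchronisation, error correction, and shaping of the carrier so the coding remains unnoticeable to listeners, as the patent contemplates.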
Building on a set of captured audiovisual segments clipped to provide a palette of audio and synched video, techniques and implementations described herein facilitate a new and highly-personalized (and in some cases crowd-sourced or sourcable) genre of audiovisual sampling and musical performance. Using a palette of captured and/or imported audio and associated video, users can remix to create a coordinated audiovisual performance. Because the audiovisual sampling and musical performance capabilities can be hosted on ubiquitous handheld or other portable computing devices such as smartphones and/or pad-type computers, user/musicians can, in essence, creatively remix their life.
H04N 5/92 - Transformation of the television signal for recording, e.g. modulation, frequency changing; Inverse transformation for playback
G06F 3/041 - Digitisers, e.g. for touch screens or touch pads, characterised by the transducing means
G06F 3/01 - Input arrangements or combined input and output arrangements for interaction between user and computer
G06F 3/0484 - Interaction techniques based on graphical user interfaces [GUI] for the control of specific functions or operations, e.g. selecting or manipulating an object, an image or a displayed text element, setting a parameter value or selecting a range of values
G06F 3/0481 - Interaction techniques based on graphical user interfaces [GUI] based on specific properties of the displayed interaction object or a metaphor-based environment, e.g. interaction with desktop elements such as windows or icons, or with the aid of a cursor changing behaviour
G06F 3/14 - Digital output to a display device
G06F 3/023 - Arrangements for converting discrete items of information into a coded form, e.g. arrangements for interpreting keyboard-generated codes as alphanumeric codes, operand codes or instruction codes
89.
Coordinated audiovisual montage from selected crowd-sourced content with alignment to audio baseline
A generally diverse set of audiovisual clips is sourced from one or more repositories for use in preparing a coordinated audiovisual work. In some cases, audiovisual clips are retrieved using tags such as user-assigned hashtags or metadata. Pre-existing associations of such tags can be used as hints that certain audiovisual clips are likely to share correspondence with an audio signal encoding of a particular song or other audio baseline. Clips are evaluated for computationally determined correspondence with an audio baseline track. In general, comparisons of audio power spectra, of rhythmic features, tempo, pitch sequences and other extracted audio features may be used to establish correspondence. For clips exhibiting a desired level of correspondence, computationally determined temporal alignments of individual clips with the baseline audio track are used to prepare a coordinated audiovisual work that mixes the selected audiovisual clips with the audio track.
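One way to realise the computationally determined temporal alignment described above is to cross-correlate coarse energy envelopes of clip and baseline. The RMS envelope and frame sizes below are illustrative stand-ins for the richer spectral, rhythmic and pitch features the abstract mentions.

```python
import numpy as np

def energy_envelope(audio, frame=1024, hop=512):
    """Short-time RMS energy: a coarse proxy for rhythmic/spectral features."""
    starts = range(0, len(audio) - frame + 1, hop)
    return np.array([np.sqrt(np.mean(audio[s:s + frame] ** 2)) for s in starts])

def align_offset(clip, baseline, hop=512, sr=44_100):
    """Estimate, in seconds, where the clip best lines up along the baseline."""
    e_clip = energy_envelope(clip, hop=hop)
    e_base = energy_envelope(baseline, hop=hop)
    e_clip = e_clip - e_clip.mean()
    e_base = e_base - e_base.mean()
    corr = np.correlate(e_base, e_clip, mode="valid")
    return int(np.argmax(corr)) * hop / sr

# Synthetic check: the "clip" is a slice of the baseline starting 22016
# samples (~0.499 s) in, so the estimator should recover roughly that offset.
rng = np.random.default_rng(1)
envelope = np.interp(np.linspace(0, 19, 2 * 44_100), np.arange(20), rng.random(20))
baseline = envelope * rng.standard_normal(2 * 44_100)
clip = baseline[22_016:22_016 + 44_100]
print(round(align_offset(clip, baseline), 2))  # ≈ 0.5
```

The peak correlation value itself can serve as the "desired level of correspondence" threshold for deciding whether a crowd-sourced clip belongs to the song at all.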
G10L 25/57 - Speech or voice analysis techniques not restricted to a single one of groups specially adapted for particular use for comparison or discrimination for processing of video signals
G11B 27/031 - Electronic editing of digitised analogue information signals, e.g. audio or video signals
G11B 27/10 - Indexing; Addressing; Timing or synchronising; Measuring tape travel
G06F 16/68 - Retrieval of data characterised by the use of metadata, e.g. metadata not derived from the content or manually generated metadata
G10L 25/48 - Speech or voice analysis techniques not restricted to a single one of groups specially adapted for particular use
G06F 3/0481 - Interaction techniques based on graphical user interfaces [GUI] based on specific properties of the displayed interaction object or a metaphor-based environment, e.g. interaction with desktop elements such as windows or icons, or with the aid of a cursor changing behaviour
H04N 21/414 - Specialised client platforms, e.g. receiver in car or embedded in a mobile appliance
G10L 25/54 - Speech or voice analysis techniques not restricted to a single one of groups specially adapted for particular use for comparison or discrimination for retrieval
90.
Coordinating and mixing vocals captured from geographically distributed performers
Despite many practical limitations imposed by mobile device platforms and application execution environments, vocal musical performances may be captured and continuously pitch-corrected for mixing and rendering with backing tracks in ways that create compelling user experiences. Based on the techniques described herein, even mere amateurs are encouraged to share with friends and family or to collaborate and contribute vocal performances as part of virtual “glee clubs.” In some implementations, these interactions are facilitated through social network- and/or eMail-mediated sharing of performances and invitations to join in a group performance. Using uploaded vocals captured at clients such as a mobile device, a content server (or service) can mediate such virtual glee clubs by manipulating and mixing the uploaded vocal performances of multiple contributing vocalists.
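Server-side mixing of the uploaded vocal performances can be reduced, in its simplest form, to a gain-weighted sum with peak normalisation. The function below is a minimal illustration of that idea, not the content server's actual pipeline; the gains and signals are made up.

```python
import numpy as np

def mix_performances(vocals, backing, gains=None):
    """Sum several uploaded vocal takes over a backing track, trimming to the
    shortest signal and normalising the peak to avoid clipping."""
    gains = gains if gains is not None else [1.0] * len(vocals)
    length = min(len(backing), *(len(v) for v in vocals))
    mix = backing[:length].astype(float).copy()
    for vocal, gain in zip(vocals, gains):
        mix += gain * vocal[:length]
    peak = np.max(np.abs(mix))
    return mix / peak if peak > 1.0 else mix

# Two synthetic "vocal takes" over a quiet backing track.
t = np.arange(44_100) / 44_100
vocals = [0.8 * np.sin(2 * np.pi * 220 * t), 0.8 * np.sin(2 * np.pi * 277.18 * t)]
backing = 0.3 * np.sin(2 * np.pi * 110 * t)
mix = mix_performances(vocals, backing, gains=[0.7, 0.7])
print(mix.shape, float(np.max(np.abs(mix))) <= 1.0)  # → (44100,) True
```

A production mixer would of course also time-align takes (see the latency-estimation entry above in spirit), apply per-voice effects, and stream the result rather than return one array.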
G10L 21/00 - Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or intelligibility
G10L 21/013 - Adapting to target pitch
Notwithstanding practical limitations imposed by mobile device platforms and applications, truly captivating musical instruments may be synthesized in ways that allow musically expressive performances to be captured and rendered in real-time. In some cases, synthetic musical instruments can provide a game, grading or instructional mode in which one or more qualities of a user's performance are assessed relative to a musical score. By constantly adapting such modes to actual performance characteristics and, in some cases, to the level of a given user musician's skill, user interactions with synthetic musical instruments can be made more engaging and may capture user interest and economic opportunities (e.g., for in-app purchase and/or social networking) over generally longer periods of time.
G09B 15/02 - Keyboards or similar means for indicating the notes
G10H 1/00 - ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE - Details of electrophonic musical instruments
G10H 7/00 - Instruments in which the tones are synthesised from a data store, e.g. computer organs
92.
Pitch-correction of vocal performance in accord with score-coded harmonies
Despite many practical limitations imposed by mobile device platforms and application execution environments, vocal musical performances may be captured and continuously pitch-corrected for mixing and rendering with backing tracks in ways that create compelling user experiences. In some cases, the vocal performances of individual users are captured on mobile devices in the context of a karaoke-style presentation of lyrics in correspondence with audible renderings of a backing track. Such performances can be pitch-corrected in real-time at a portable computing device (such as a mobile phone, personal digital assistant, laptop computer, notebook computer, pad-type computer or netbook) in accord with pitch correction settings. In some cases, pitch correction settings include a score-coded melody and/or harmonies supplied with, or for association with, the lyrics and backing tracks. Harmony notes or chords may be coded as explicit targets or relative to the score-coded melody or even actual pitches sounded by a vocalist, if desired.
In an application that manipulates audio (or audiovisual) content, automated music creation technologies may be employed to generate new musical content using digital signal processing software hosted on handheld and/or server (or cloud-based) compute platforms to intelligently process and combine a set of audio content captured and submitted by users of modern mobile phones or other handheld compute platforms. The user-submitted recordings may contain speech, singing, musical instruments, or a wide variety of other sound sources, and the recordings may optionally be preprocessed by the handheld devices prior to submission.
G10L 21/00 - Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or intelligibility
G10L 25/87 - Detection of discrete points within a voice signal
G10L 13/033 - Methods for producing synthetic speech; Speech synthesisers - Voice editing, e.g. manipulating the synthesiser's voice
94.
Score-directed string retuning and gesture cueing in synthetic multi-string musical instrument
Despite practical limitations imposed by mobile device platforms and applications, truly captivating musical instruments may be synthesized in ways that allow musically expressive performances to be captured and rendered in real-time. Visual cues presented on a multi-touch sensitive display provide the user with temporally sequenced string excitation cues. Note or chord soundings are indicated by user gestures (e.g., pluck-type gestures, strum-type gestures, chord selections, etc.) captured at the multi-touch sensitive display. Those captured gestures, rather than simply the score itself, are used as inputs to a digital synthesis of the musical instrument.
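As an illustration of gesture-driven string synthesis (not the patent's particular method), a pluck-type gesture might trigger a classic Karplus-Strong string model, with the gesture's touch position selecting the string and hence the pitch:

```python
import numpy as np

def pluck(freq, duration=0.5, sr=44_100, decay=0.996):
    """Karplus-Strong string synthesis: a noise burst (the 'pluck') circulates
    through a delay line whose length sets the pitch; a two-point average in
    the feedback path acts as a lowpass, so the tone darkens and decays."""
    n = int(sr / freq)                                  # delay-line length
    buf = np.random.default_rng(0).uniform(-1.0, 1.0, n)
    out = np.empty(int(duration * sr))
    for i in range(out.size):
        out[i] = buf[i % n]
        buf[i % n] = decay * 0.5 * (buf[i % n] + buf[(i + 1) % n])
    return out

# A strum-type gesture might sound several strings in quick succession,
# e.g. an A-minor triad (standard-tuning frequencies):
chord = [pluck(f) for f in (220.0, 261.63, 329.63)]
```

Driving such a model from captured gestures rather than directly from the score is exactly the distinction the abstract draws: the score only cues the player, while the gesture stream excites the synthesis.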
A63F 13/814 - Musical performances, e.g. evaluating the player on his or her ability to follow a notation
A63F 13/426 - Processing input control signals of video game devices, e.g. signals generated by the player or derived from the environment, by mapping the input signals into game commands, e.g. mapping the displacement of a stylus on a touch screen to the steering angle of a virtual vehicle, involving on-screen position information, e.g. the screen coordinates of an area at which the player aims with a light gun
A63F 13/2145 - Input arrangements for video game devices characterised by their sensors, purposes or types, for locating contacts on a surface, e.g. floor mats or touch pads, the surface being also a display device, e.g. touch screens
A63F 13/5375 - Controlling output signals based on the game progress, involving additional visual information provided to the game scene, e.g. by overlay to simulate a head-up display [HUD] or to display a laser sight in a shooting game, using indicators, e.g. showing the physical condition of a game character on screen, for graphically or textually suggesting an action, e.g. by displaying an arrow indicating a turn in a driving game
A63F 13/54 - Controlling output signals based on the game progress, involving acoustic signals, e.g. for simulating engine noise dependent on revolutions per minute [RPM] in a driving game, or reverberation against a virtual wall
A63F 13/31 - Communication aspects specific to video games, e.g. between several handheld game devices at close range
A63F 13/44 - Processing input control signals of video game devices, e.g. signals generated by the player or derived from the environment, involving the duration or timing of operations, e.g. performing an action within a certain time window
A63F 13/92 - Video game devices specially adapted to be hand-held while playing
A63F 13/792 - Security or management aspects of the game involving player-related data, e.g. identities, accounts, preferences or play histories, for payment, e.g. monthly subscriptions
95.
System and method for communication between mobile devices using digital/acoustic techniques
Techniques have been developed for transmitting and receiving information conveyed through the air from one portable device to another as a generally unperceivable coding within an otherwise recognizable acoustic signal. For example, in some embodiments in accordance with the present invention(s), information is acoustically communicated from a first handheld device toward a second by encoding the information in a signal that, when converted into acoustic energy at an acoustic transducer of the first handheld device, is characterized in that the acoustic energy is discernable to a human ear yet the encoding of the information therein is generally not perceivable by the human. The acoustic energy is transmitted from the acoustic transducer of the first handheld device toward the second handheld device across an air gap that constitutes substantially the entirety of the distance between the devices. Acoustic energy received at the second handheld device may then be processed using signal processing techniques tailored to detection of the particular information encodings employed.
Coordinated audio and video filter pairs are applied to enhance artistic and emotional content of audiovisual performances. Such filter pairs, when applied in audio and video processing pipelines of an audiovisual application hosted on a portable computing device (such as a mobile phone or media player, a computing pad or tablet, a game controller or a personal digital assistant or book reader) can allow user selection of effects that enhance both audio and video coordinated therewith. Coordinated audio and video are captured, filtered and rendered at the portable computing device using camera and microphone interfaces, using digital signal processing software executable on a processor and using storage, speaker and display devices of, or interoperable with, the device. By providing audiovisual capture and personalization on an intimate handheld device, social interactions and postings of a type made popular by modern social networking platforms can now be extended to audiovisual content.
G06F 3/0482 - Interaction with lists of selectable items, e.g. menus
G06F 3/0484 - Interaction techniques based on graphical user interfaces [GUI] for the control of specific functions or operations, e.g. selecting or manipulating an object, an image or a displayed text element, setting a parameter value or selecting a range of values
G10L 21/003 - Changing voice quality, e.g. pitch or formants
G10L 21/013 - Adapting to target pitch
H04N 21/414 - Specialised client platforms, e.g. receiver in car or embedded in a mobile appliance
97.
System and method for capture and rendering of performance on synthetic musical instrument
Techniques have been developed for capturing and rendering musical performances on handheld or other portable devices. The developed techniques facilitate the capture, encoding and use of gesture streams for rendering of a musical performance. In some embodiments, a gesture stream encoding facilitates audible rendering of the musical performance locally on the portable device on which the musical performance is captured, typically in real time. In some embodiments, a gesture stream efficiently codes the musical performance for transmission from the portable device on which the musical performance is captured to (or toward) a remote device on which the musical performance is (or can be) rendered. Indeed, in some embodiments, a gesture stream so captured and encoded may be rendered both locally and on remote devices using substantially identical or equivalent instances of a digital synthesis of the musical instrument executing on the local and remote devices.
Captured vocals may be automatically transformed using advanced digital signal processing techniques that provide captivating applications, and even purpose-built devices, in which mere novice user-musicians may generate, audibly render and share musical performances. In some cases, the automated transformations allow spoken vocals to be segmented, arranged, temporally aligned with a target rhythm, meter or accompanying backing tracks and pitch corrected in accord with a score or note sequence. Speech-to-song music applications are one such example. In some cases, spoken vocals may be transformed in accord with musical genres such as rap using automated segmentation and temporal alignment techniques, often without pitch correction. Such applications, which may employ different signal processing and different automated transformations, may nonetheless be understood as speech-to-rap variations on the theme.
G10L 21/055 - Time compression or expansion for synchronisation with other signals, e.g. video signals
G10L 19/02 - Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals using source-filter models or psychoacoustic analysis, using spectral analysis, e.g. transform vocoders or subband vocoders
G10L 19/00 - Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals using source-filter models or psychoacoustic analysis
Social music system and method with continuous, real-time pitch correction of vocal performance and dry vocal capture for subsequent re-rendering based on selectively applicable vocal effect(s) schedule(s)
Vocal musical performances may be captured and, in some cases or embodiments, pitch-corrected and/or processed in accord with a user selectable vocal effects schedule for mixing and rendering with backing tracks in ways that create compelling user experiences. In some cases, the vocal performances of individual users are captured on mobile devices in the context of a karaoke-style presentation of lyrics in correspondence with audible renderings of a backing track. Such performances can be pitch-corrected in real-time at the mobile device in accord with pitch correction settings. Vocal effects schedules may also be selectively applied to such performances. In these ways, even amateur user/performers with imperfect pitch are encouraged to take a shot at “stardom” and/or take part in a game play, social network or vocal achievement application architecture that facilitates musical collaboration on a global scale and/or, in some cases or embodiments, to initiate revenue generating in-application transactions.
G10L 21/00 - Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or intelligibility
G10L 21/013 - Adapting to target pitch
Captured vocals may be automatically transformed using advanced digital signal processing techniques that provide captivating applications, and even purpose-built devices, in which mere novice user-musicians may generate, audibly render and share musical performances. In some cases, the automated transformations allow spoken vocals to be segmented, arranged, temporally aligned with a target rhythm, meter or accompanying backing tracks and pitch corrected in accord with a score or note sequence. Speech-to-song music applications are one such example. In some cases, spoken vocals may be transformed in accord with musical genres such as rap using automated segmentation and temporal alignment techniques, often without pitch correction. Such applications, which may employ different signal processing and different automated transformations, may nonetheless be understood as speech-to-rap variations on the theme.
G10L 19/02 - Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals using source-filter models or psychoacoustic analysis, using spectral analysis, e.g. transform vocoders or subband vocoders
G10L 19/00 - Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals using source-filter models or psychoacoustic analysis
G10L 21/055 - Time compression or expansion for synchronisation with other signals, e.g. video signals