Koyal Bird Voice



The Asian koel (koyal) is best known for its voice. Young birds have upper plumage more like that of the male, and a black beak. The birds are very vocal during the breeding season (March to August in the Indian subcontinent), with a range of different calls: the familiar song of the male is a repeated koo-Ooo, while the female makes a shrill kik-kik-kik. Calls vary across populations.

Researchers at Google claim to have developed a machine learning model that can separate a sound source from noisy, single-channel audio based on only a short sample of the target source. In a paper, they say their SoundFilter system can be tuned to filter arbitrary sound sources, even those it hasn’t seen during training.

The researchers believe a noise-eliminating system like SoundFilter could be used to create a range of useful technologies. For instance, Google drew on audio from thousands of its own meetings and YouTube videos to train the noise-canceling algorithm in Google Meet. Meanwhile, a team of Carnegie Mellon researchers created a “sound-action-vision” corpus to anticipate where objects will move when subjected to physical force.

SoundFilter treats the task of sound separation as a one-shot learning problem. The model receives as input the audio mixture to be filtered and a single short example of the kind of sound to be filtered out. Once trained, SoundFilter is expected to extract this kind of sound from the mixture if present.
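
To make the one-shot setup concrete, the toy sketch below filters a mixture using nothing but a short conditioning clip: it builds a soft spectral mask from the conditioning clip's average magnitude spectrum and applies it to the mixture. This crude, non-learned stand-in is only meant to show what "filter by example" means; it is not SoundFilter, which is a learned wave-to-wave model sketched further below, and all names here are illustrative.

```python
import numpy as np
from scipy.signal import stft, istft

def filter_by_example(mixture, conditioning, sr=16000, nperseg=512):
    """Crude example-conditioned filter: keep the mixture's energy in the
    frequency bins where the conditioning clip has most of its energy.
    Intuition-only stand-in; SoundFilter learns this end to end."""
    _, _, Zmix = stft(mixture, fs=sr, nperseg=nperseg)
    _, _, Zcond = stft(conditioning, fs=sr, nperseg=nperseg)
    # Average magnitude spectrum of the conditioning clip, one weight per bin.
    profile = np.abs(Zcond).mean(axis=1)
    mask = (profile / (profile.max() + 1e-8))[:, None]   # soft mask in [0, 1]
    _, filtered = istft(Zmix * mask, fs=sr, nperseg=nperseg)
    return filtered[: len(mixture)]

# Toy mixture: a 440 Hz tone (the "target" source) plus broadband noise,
# with a short clean clip of the same tone as the conditioning example.
sr = 16000
t = np.arange(sr) / sr
target = np.sin(2 * np.pi * 440 * t)
mixture = target + 0.5 * np.random.randn(sr)
conditioning = np.sin(2 * np.pi * 440 * np.arange(sr // 2) / sr)
filtered = filter_by_example(mixture, conditioning, sr)
print(filtered.shape)  # same length as the input mixture
```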

SoundFilter adopts what’s known as a wave-to-wave neural network architecture that can be trained using audio samples without requiring labels that denote the type of source. A conditioning encoder takes the conditioning audio and computes the corresponding embedding (i.e., numerical representation), while a conditional generator takes the mixture audio and the conditioning embedding as input and produces the filtered output. The system assumes that the original audio collection consists of many clips a few seconds in length that contain the same type of sound for the whole duration. Beyond this, SoundFilter assumes that each such clip contains a single audio source, such as one speaker, one musical instrument, or one bird singing.
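
To make the two-component architecture concrete, here is a minimal PyTorch sketch, assuming a small 1-D convolutional conditioning encoder and FiLM-style (scale-and-shift) conditioning of a wave-to-wave generator. The layer sizes, the FiLM mechanism, and all names are illustrative assumptions rather than the paper's exact design; the point is the data flow: conditioning audio → embedding, then (mixture, embedding) → filtered waveform.

```python
import torch
import torch.nn as nn

class ConditioningEncoder(nn.Module):
    """Maps a short conditioning waveform to a fixed-size embedding."""
    def __init__(self, emb_dim: int = 128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(1, 32, kernel_size=16, stride=4), nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=16, stride=4), nn.ReLU(),
            nn.Conv1d(64, emb_dim, kernel_size=16, stride=4), nn.ReLU(),
        )

    def forward(self, wav: torch.Tensor) -> torch.Tensor:
        # wav: (batch, samples) -> embedding: (batch, emb_dim)
        h = self.conv(wav.unsqueeze(1))   # (batch, emb_dim, frames)
        return h.mean(dim=-1)             # average over time

class ConditionalGenerator(nn.Module):
    """Wave-to-wave generator whose features are modulated (FiLM-style)
    by the conditioning embedding."""
    def __init__(self, emb_dim: int = 128, channels: int = 64):
        super().__init__()
        self.inp = nn.Conv1d(1, channels, kernel_size=15, padding=7)
        self.film = nn.Linear(emb_dim, 2 * channels)   # per-channel scale and shift
        self.body = nn.Conv1d(channels, channels, kernel_size=15, padding=7)
        self.out = nn.Conv1d(channels, 1, kernel_size=15, padding=7)

    def forward(self, mixture: torch.Tensor, emb: torch.Tensor) -> torch.Tensor:
        h = torch.relu(self.inp(mixture.unsqueeze(1)))
        scale, shift = self.film(emb).chunk(2, dim=-1)
        h = h * scale.unsqueeze(-1) + shift.unsqueeze(-1)   # FiLM modulation
        h = torch.relu(self.body(h))
        return self.out(h).squeeze(1)     # filtered waveform, same length

encoder, generator = ConditioningEncoder(), ConditionalGenerator()
mixture = torch.randn(2, 16000)        # batch of 1 s mixtures at 16 kHz
conditioning = torch.randn(2, 8000)    # 0.5 s conditioning clips
filtered = generator(mixture, encoder(conditioning))
print(filtered.shape)                  # torch.Size([2, 16000])
```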

The model is trained to produce the target audio, given the mixture and the conditioning audio as inputs. A SoundFilter training example consists of three parts, illustrated in the sketch after the list:

  1. The target audio, which contains only one sound
  2. A mixture, which contains two different sounds, one of which is the target audio
  3. A conditioning audio signal, which is another example containing the same kind of sound as the target audio
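
A minimal data-preparation sketch of that three-part example follows. Because the article states that each clip is assumed to hold a single source for its whole duration, one label-free way to obtain a conditioning example "of the same kind" is simply to crop a different segment of the same clip; that is the assumption made here, and the function names are illustrative rather than taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_crop(clip: np.ndarray, length: int) -> np.ndarray:
    """Take a random window of `length` samples from a longer clip."""
    start = rng.integers(0, len(clip) - length + 1)
    return clip[start:start + length]

def make_training_example(clip_a: np.ndarray, clip_b: np.ndarray,
                          length: int = 16000):
    """Build (target, mixture, conditioning) from two single-source clips.

    clip_a: the clip whose source we want to extract.
    clip_b: a clip of a different source, used as interference.
    The conditioning example is a different crop of clip_a, so no class
    labels are needed (assumes each clip holds one source throughout).
    """
    target = random_crop(clip_a, length)
    conditioning = random_crop(clip_a, length)   # same source, other segment
    interference = random_crop(clip_b, length)   # a different source
    mixture = target + interference              # simple additive mix
    return target, mixture, conditioning

# Example with synthetic 3 s clips at 16 kHz standing in for real audio.
clip_a = rng.standard_normal(3 * 16000).astype(np.float32)
clip_b = rng.standard_normal(3 * 16000).astype(np.float32)
target, mixture, conditioning = make_training_example(clip_a, clip_b)
print(target.shape, mixture.shape, conditioning.shape)
```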

In experiments, the researchers trained SoundFilter on two open source datasets: FSD50K (a collection of over 50,000 sounds) and LibriSpeech (around 1,000 hours of English speech). They report that the conditioning encoder learned to produce embeddings that represent the acoustic characteristics of the conditioning audio, enabling SoundFilter to separate voices from mixtures of speakers, sounds from mixtures of sounds, and individual speakers or sounds from mixtures of speakers and sounds.
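
Given the pieces above, the training loop is short to sketch: sample (target, mixture, conditioning) triples, encode the conditioning clip, run the generator on the mixture, and push the output toward the target. The sketch below reuses the ConditioningEncoder, ConditionalGenerator, and make_training_example sketches from earlier, substitutes random clips for real FSD50K or LibriSpeech audio, and uses a plain L1 waveform loss as a stand-in; the paper's actual loss, optimizer, and hyperparameters are not given in this article and may well differ.

```python
import numpy as np
import torch

# Assumes ConditioningEncoder, ConditionalGenerator, and
# make_training_example from the sketches above are in scope.
encoder, generator = ConditioningEncoder(), ConditionalGenerator()
optimizer = torch.optim.Adam(
    list(encoder.parameters()) + list(generator.parameters()), lr=1e-4)

rng = np.random.default_rng(1)
# Stand-in "dataset": random 3 s clips; in practice these would be real
# audio clips, each containing a single source for its whole duration.
clips = [rng.standard_normal(3 * 16000).astype(np.float32) for _ in range(8)]

for step in range(10):
    a, b = rng.choice(len(clips), size=2, replace=False)
    target, mixture, conditioning = make_training_example(clips[a], clips[b])
    target = torch.from_numpy(target)[None]          # (1, samples)
    mixture = torch.from_numpy(mixture)[None]
    conditioning = torch.from_numpy(conditioning)[None]

    emb = encoder(conditioning)                      # conditioning embedding
    estimate = generator(mixture, emb)               # filtered waveform
    loss = torch.mean(torch.abs(estimate - target))  # L1 waveform loss (stand-in)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```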

Here’s one sample before SoundFilter processed it:

https://venturebeat.com/wp-content/uploads/2020/11/download-1.wav

Here’s the sample post-processing:

https://venturebeat.com/wp-content/uploads/2020/11/download.wav

Here’s another sample:

https://venturebeat.com/wp-content/uploads/2020/11/download-6.wav

And here’s the post-processed result:

https://venturebeat.com/wp-content/uploads/2020/11/download-7.wav

“Our work could be extended by exploring how to use the embedding learned as part of SoundFilter as a representation for an audio event classifier,” the researchers wrote. “In addition, it would be of interest to extend our approach from one-shot to many-shot.”




