Re: [LAD] Looking for library to categorise audio

To: Philipp Überbacher <murks@...>
Cc: Linux Audio Developers <linux-audio-dev@...>
Date: Wednesday, October 16, 2013 - 7:10 pm

On Wed, Oct 16, 2013 at 6:51 PM, Philipp Überbacher wrote:

> I was hoping for something that requires less DSP knowledge.

I think we all do... Note that although I dabble in DSP, I won't claim to
"know" DSP...

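> However given that those low-level tools are available,
> hints on how to combine them or on possibly useful algorithms etc.
> would be appreciated as well.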

Of the three categories you mentioned (speech, music, noise), speech is
probably the easiest to find...
FFT the whole track (in windows of... 8192 samples or so perhaps), then check
for frequency content in the speech range[1]: 300 - 3,400 Hz.
If the content stays steadily within that range (allowing for some
FFT windowing error), then classifying it as speech should be ok.
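
A rough sketch of that check in Python, standard library only, in case it
helps to see it spelled out. Everything named here is made up for
illustration: the function names, the Hann taper, and the two thresholds
(min_ratio, min_fraction) are guesses that would need tuning on real
material. Music also has plenty of energy in this band, so treat it as a
first filter rather than a classifier.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def speech_band_ratio(window, sample_rate, lo=300.0, hi=3400.0):
    """Fraction of spectral energy between lo and hi Hz in one window."""
    n = len(window)
    # Hann taper to reduce leakage from the window edges.
    tapered = [
        s * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)))
        for i, s in enumerate(window)
    ]
    spectrum = fft(tapered)
    total = 0.0
    band = 0.0
    for k in range(1, n // 2):  # positive frequencies only, DC skipped
        power = abs(spectrum[k]) ** 2
        total += power
        if lo <= k * sample_rate / n <= hi:
            band += power
    return band / total if total > 0 else 0.0


def looks_like_speech(samples, sample_rate, size=8192,
                      min_ratio=0.8, min_fraction=0.9):
    """True if most windows keep most of their energy in the speech band.

    min_ratio and min_fraction are assumed values, not measured ones.
    """
    ratios = [
        speech_band_ratio(samples[i:i + size], sample_rate)
        for i in range(0, len(samples) - size + 1, size)
    ]
    if not ratios:
        return False
    steady = sum(1 for r in ratios if r >= min_ratio)
    return steady / len(ratios) >= min_fraction
```

The pure-Python FFT is only there to keep the sketch self-contained; with
real tracks an FFT library would be much faster.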

Music (depending on type) is generally rhythmical, so transients should be
present, and somewhat evenly spaced. Easier to detect if the music hasn't
been compressed to a brick-wall.
Noise (depending on type) is generally *not* rhythmical, so transients
should be present but not evenly spaced...
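
One possible way to put a number on "evenly spaced", again as a sketch
only: find transients as sudden jumps in short-frame energy, then look at
how much the gaps between them vary. The frame size, the rise factor, and
the helper names are all assumptions for illustration; a proper onset
detector would be more robust, especially on heavily compressed material.

```python
import statistics


def frame_energies(samples, frame=512):
    """Sum of squared samples for each non-overlapping frame."""
    return [
        sum(s * s for s in samples[i:i + frame])
        for i in range(0, len(samples) - frame + 1, frame)
    ]


def onset_frames(energies, rise=4.0, floor=1e-6):
    """Indices of frames whose energy jumps by `rise` times or more
    over the previous frame (a crude transient detector)."""
    onsets = []
    for i in range(1, len(energies)):
        previous = max(energies[i - 1], floor)
        if energies[i] > floor and energies[i] > rise * previous:
            onsets.append(i)
    return onsets


def spacing_variation(onsets):
    """Coefficient of variation of the gaps between onsets.

    Near 0 means evenly spaced (rhythmical); larger means irregular.
    Returns None when there are too few onsets to judge.
    """
    gaps = [b - a for a, b in zip(onsets, onsets[1:])]
    if len(gaps) < 2:
        return None
    return statistics.pstdev(gaps) / statistics.mean(gaps)
```

Usage would be something like
spacing_variation(onset_frames(frame_energies(samples))), with a low value
hinting at music and a high value hinting at noise. Where to draw the line
between the two is something to find by experiment.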

The above is a suggestion only: I don't know if it is the best way to go.
Depending on the content, you'll have some success with the above approach.
Advice on "music-information-retrieval" or content analysis is probably
better on the Music-DSP mailing list, perhaps ask there?

HTH, -Harry

[1]: Voice frequencies, http://en.wikipedia.org/wiki/Voice_frequency



Messages in current thread:
[LAD] Looking for library to categorise audio, Philipp Überbacher, (Sun Oct 13, 8:50 pm)
Re: [LAD] Looking for library to categorise audio, Harry van Haaren, (Sun Oct 13, 8:59 pm)
Re: [LAD] Looking for library to categorise audio, Philipp Überbacher, (Wed Oct 16, 5:51 pm)
Re: [LAD] Looking for library to categorise audio, Harry van Haaren, (Wed Oct 16, 7:10 pm)