On Saturday 23 February 2008, Sebastian Tschöpel wrote:
I think that would be a matter of training, basically; much like many
people can "hear" what something will sound like before playing it.
Untrained ears have a much harder time picking out and following
individual instruments from a mix, but this is something that
improves over time, in my experience. It's a lot like learning to
read by recognizing the words rather than the individual letters. It
would seem like the same basic mechanisms at play, though I don't
know if that's actually how the human brain implements it.
Now, theoretically, if you can make out the individual instruments
from a mix, and know what these instruments would sound like on their
own, you should basically be able to recreate any combination of
these instruments in your mind.
Anyway, from the strictly technical POV, there's this major
overlapping problem. Getting the individual frequency components out
of a fragment of music is trivial (relatively speaking) - but how do
you know which ones belong to which instrument?
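To make the "trivial" half concrete, here is a minimal sketch of pulling frequency components out of a short fragment with a naive DFT. The test signal, sample rate, threshold, and function names are all made up for illustration; a real analyzer would use an FFT with windowing.

```python
import cmath
import math

def dft_magnitudes(samples):
    """Naive O(n^2) DFT; returns normalized magnitudes for bins 0..n/2."""
    n = len(samples)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n)
                  for t in range(n))
        mags.append(abs(acc) / n)
    return mags

def peak_frequencies(samples, sample_rate, threshold=0.1):
    """Return the frequencies (Hz) of local-maximum bins above threshold."""
    mags = dft_magnitudes(samples)
    n = len(samples)
    peaks = []
    for k in range(1, len(mags) - 1):
        if (mags[k] > threshold
                and mags[k] >= mags[k - 1]
                and mags[k] >= mags[k + 1]):
            peaks.append(k * sample_rate / n)
    return peaks

# Hypothetical fragment: one 110 Hz "bass" partial plus 440 Hz and 880 Hz
# "melody" partials. With 400 samples at 4000 Hz the bins are 10 Hz apart,
# so every test frequency falls exactly on a bin.
SAMPLE_RATE = 4000
N = 400
signal = [
    math.sin(2 * math.pi * 110 * t / SAMPLE_RATE)
    + 0.8 * math.sin(2 * math.pi * 440 * t / SAMPLE_RATE)
    + 0.4 * math.sin(2 * math.pi * 880 * t / SAMPLE_RATE)
    for t in range(N)
]
peaks = peak_frequencies(signal, SAMPLE_RATE)
print(peaks)
```

The output is just a flat list of frequencies. Nothing in it says which partial came from which instrument, and that is the hard half of the problem.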
Well, consider a simple example with a solo melody voice over a simple
bass line. As long as you can make out the fundamentals (which can
turn out to be quite hard enough in real applications!), you can look
at the spectrum and figure out which components follow what melody.
This would require multiple analysis passes (to learn the instrument
sounds and melodies), and/or a database of "familiar sounds"; it's
not something you can just do frame by frame on unknown data.
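A rough sketch of that bass-plus-melody example, assuming the fundamentals have already been found somehow (which is the hard part). Each spectral peak is matched against the harmonic series of each known fundamental. The peak list, voice names, and tolerance are invented for illustration.

```python
def assign_partials(peaks_hz, fundamentals_hz, tolerance=0.03):
    """Map each spectral peak to the voices whose harmonic series contains it.

    tolerance is a fraction of the voice's fundamental. A peak may match
    more than one voice; those are the overlapping components.
    """
    assignment = {}
    for peak in peaks_hz:
        owners = []
        for voice, f0 in fundamentals_hz.items():
            harmonic = round(peak / f0)  # nearest harmonic number
            if harmonic >= 1 and abs(peak - harmonic * f0) <= tolerance * f0:
                owners.append((voice, harmonic))
        assignment[peak] = owners
    return assignment

# Hypothetical single analysis frame: bass at 110 Hz, melody at 440 Hz.
peaks = [110.0, 220.0, 330.0, 440.0, 880.0, 1320.0]
fundamentals = {"bass": 110.0, "melody": 440.0}
result = assign_partials(peaks, fundamentals)

for peak, owners in result.items():
    print(peak, owners)
```

The peaks at 110, 220 and 330 Hz can only belong to the bass, but 440, 880 and 1320 Hz fit both voices. Splitting the energy of those shared partials is where knowledge of each instrument's typical spectrum would have to come in, hence the learning passes or the database.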
...and of course, there's a million "minor" issues around this that
make it a lot harder than it appears to be. Logically, it has to be
possible, but maybe the first step would be to dispel a few
confusing myths about how the human brain does this stuff. I think
the brain has access to a lot of data that algorithms of this sort
generally don't have.
For example, I don't think it's realistically possible to do this
without some sort of database of "familiar sounds", and/or some model
of how the "average musically trained" human brain infers
fundamentals from audible spectra. (There are plenty of instruments
that have very little energy around the fundamental frequency, which
makes even "simple" pitch tracking non-trivial.) Considering how the
brain appears to work, I don't think there's a strict distinction
between a "database" and a "model" in this regard. A neural network
might be the proper model for software, and its state after
appropriate training would be the "database."
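To illustrate the point about instruments with little energy at the fundamental: the sketch below builds a tone from the 2nd, 3rd and 4th harmonics of 100 Hz, with nothing at 100 Hz itself, and still recovers 100 Hz using plain autocorrelation. This is one textbook pitch estimator, not a claim about how the brain does it and not the trained model suggested above. The signal, the search range, and the function name are assumptions for the example.

```python
import math

def autocorr_pitch(samples, sample_rate, fmin=60.0, fmax=500.0):
    """Estimate pitch as the lag with the strongest autocorrelation.

    Only lags corresponding to fmin..fmax are searched; each score is
    normalized by the number of overlapping samples.
    """
    n = len(samples)
    min_lag = int(sample_rate / fmax)
    max_lag = int(sample_rate / fmin)
    best_lag, best_score = None, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        score = sum(samples[t] * samples[t + lag] for t in range(n - lag))
        score /= (n - lag)
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag

# Hypothetical tone: harmonics 2, 3 and 4 of 100 Hz, no 100 Hz component.
SAMPLE_RATE = 8000
tone = [
    math.sin(2 * math.pi * 200 * t / SAMPLE_RATE)
    + math.sin(2 * math.pi * 300 * t / SAMPLE_RATE)
    + math.sin(2 * math.pi * 400 * t / SAMPLE_RATE)
    for t in range(1600)
]
pitch = autocorr_pitch(tone, SAMPLE_RATE)
print(pitch)
```

A peak picker on the spectrum would report 200 Hz as the lowest component here, while the waveform repeats every 1/100 s. On clean synthetic input the periodicity is easy to find; with real instruments, noise, and several voices at once, the search range and octave errors become the kind of "minor" issues mentioned earlier.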
//David Olofson - Programmer, Composer, Open Source Advocate
.------- http://olofson.net - Games, SDL examples -------.
| http://zeespace.net - 2.5D rendering engine |
| http://audiality.org - Music/audio engine |
| http://eel.olofson.net - Real time scripting |
'-- http://www.reologica.se - Rheology instrumentation --'
Linux-audio-user mailing list