On Tue, 2009-08-11 at 19:54 +0200, Fons Adriaensen wrote:
The priorities are always even... and then again, not necessarily.
Say warp A (or "process" A) must do four smaller workloads while warp B
is doing one bigger workload. The way to go would then be for warp B to
call __syncthreads() when 25% of its work is done (both warps being in the
same thread block, since __syncthreads() is a block-wide barrier), thus
ensuring that warp A will be given all of the GPU until it has caught up
at the end of its first workload and also calls __syncthreads(), which
gives warp B the green light to continue. This assumes that warp A hasn't
already finished its part and is waiting for B to catch up.
Repeat the procedure at 50% and 75%.
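A CPU-side sketch of the 25/50/75% scheme, assuming Python's threading.Barrier as a stand-in for __syncthreads() (this is not GPU code, and the workloads are hypothetical placeholders). In CUDA, __syncthreads() waits for every thread of the block and is only well defined inside conditional code if the condition evaluates the same way across the whole block, so here both roles reach one shared barrier call at the end of each phase:

```python
import threading

PHASES = 4  # warp B pauses at 25%, 50%, 75% and at the end of its workload

def run_lockstep():
    """Run two threads standing in for warps A and B and return the ordered
    event log as (who, phase) tuples."""
    barrier = threading.Barrier(2)  # stand-in for __syncthreads(): both must arrive
    log = []
    lock = threading.Lock()

    def worker(who):
        for p in range(PHASES):
            if who == "A":
                pass  # hypothetical: small workload number p
            else:
                pass  # hypothetical: quarter p of the one big workload
            with lock:
                log.append((who, p))
            barrier.wait()  # one shared call site, as with a single __syncthreads()

    threads = [threading.Thread(target=worker, args=(w,)) for w in ("A", "B")]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log

log = run_lockstep()
phases = [p for _, p in log]
print(phases == sorted(phases))  # True: no phase p+1 entry precedes a phase p entry
```

Neither worker can start phase p+1 before both have logged phase p, so the log always comes out sorted by phase, which is the lockstep behaviour the scheme relies on.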
> How to structure a convolution engine to run on a graphics
Locally on the card, for use by other parts of the complex, unless a
routing directive says it is to be read from or written to the arrays
that are transferred back and forth between the GPU and the host at each
kernel launch.
> > How much jconv would something like a 300Mhz Pentium Pro buy me? (Just
I found a measurement of ~1 s for a 128K-point FFT on a 200 MHz PPro.
Would that be helpful for a guesstimate?
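For the guesstimate, a crude scaling sketch, assuming FFT time grows as N*log2(N) and shrinks linearly with clock speed. Both are assumptions: a 128K-point FFT probably does not fit in a PPro's cache, so memory speed may matter as much as the clock. The function name is made up for illustration:

```python
import math

def scaled_fft_time(t_ref, n_ref, mhz_ref, n, mhz):
    """Scale a measured FFT time to another size and clock, assuming cost
    proportional to N*log2(N) and time inversely proportional to clock."""
    work_ratio = (n * math.log2(n)) / (n_ref * math.log2(n_ref))
    return t_ref * work_ratio * (mhz_ref / mhz)

# Reference point: ~1 s for a 128K-point FFT at 200 MHz.
REF = (1.0, 128 * 1024, 200)

print(round(scaled_fft_time(*REF, 128 * 1024, 300), 2))   # 0.67 (s, 300 MHz PPro)
print(round(scaled_fft_time(*REF, 4096, 300) * 1000, 1))  # 14.7 (ms, 4096-point FFT)
```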
The thing is also that, although the first thing that comes to mind
might be a nice convolution reverb with a two-second decay, having
32 shorter impulses instead, all different, opens up another universe.
You could put an increasing delay in front of each of them, giving the
illusion that they are all parts of the same (huge) impulse response, or
you could use keyboard triggers and routing to play them like an
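A minimal time-domain sketch of the "increasing delay" idea, in the simplest case, assuming the 32 impulses are consecutive slices of one long response and each is delayed by its own offset: the sum of the 32 delayed convolutions is then exactly the convolution with the whole response, the same identity block-partitioned convolution is built on. With 32 genuinely different impulses the structure is identical, only the sum no longer corresponds to one recorded response. Names and test data are made up, and a real engine would use FFTs instead of direct convolution:

```python
def convolve(x, h):
    """Plain direct-form convolution, len(x) + len(h) - 1 output samples."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y

def convolve_delayed_segments(x, segments, delays):
    """Sum of convolutions with shorter impulses, each behind its own pre-delay."""
    length = max(len(x) + d + len(h) - 1 for h, d in zip(segments, delays))
    y = [0.0] * length
    for h, d in zip(segments, delays):
        for k, v in enumerate(convolve(x, h)):
            y[k + d] += v
    return y

# Cut one long impulse into 32 equal pieces and delay piece i by i * seg samples.
seg = 4
long_ir = [((7 * n) % 11 - 5) / 10 for n in range(32 * seg)]  # arbitrary test data
pieces = [long_ir[i * seg:(i + 1) * seg] for i in range(32)]
delays = [i * seg for i in range(32)]
x = [1.0, -0.5, 0.25, 0.0, 0.3]

a = convolve(x, long_ir)
b = convolve_delayed_segments(x, pieces, delays)
print(len(a) == len(b) and max(abs(p - q) for p, q in zip(a, b)) < 1e-12)  # True
```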
Still, 500 ms would be very useful, and 32 convolutions is, mmm...
perhaps a little overkill. There might be ways for two or four threads
to share one load. IIRC, library routines for SSE-enabled FFTs exist
which could be copied more or less verbatim across four adjacent
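One way two or four threads could share one convolution, sketched under assumptions: the output samples are dealt out round-robin, so worker k computes samples k, k+4, k+8, ... and adjacent workers handle adjacent samples without ever writing the same index. The function name is hypothetical, and in CPython the GIL keeps these threads from actually running in parallel; the sketch only shows the partitioning, which carries over to pthreads or GPU threads:

```python
import threading

def convolve_shared(x, h, n_workers=4):
    """One convolution split over n_workers threads: worker k computes output
    samples k, k + n_workers, k + 2*n_workers, ..., so no index is shared."""
    n_out = len(x) + len(h) - 1
    y = [0.0] * n_out

    def worker(k):
        for n in range(k, n_out, n_workers):
            acc = 0.0
            # y[n] = sum of h[j] * x[n - j] over indices valid in both arrays
            for j in range(max(0, n - len(x) + 1), min(n, len(h) - 1) + 1):
                acc += h[j] * x[n - j]
            y[n] = acc

    threads = [threading.Thread(target=worker, args=(k,)) for k in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return y

print(convolve_shared([1.0, 2.0, 3.0], [1.0, 1.0]))  # [1.0, 3.0, 5.0, 3.0]
```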
Linux-audio-dev mailing list