DSP comparison request
Re: DSP comparison request
If you take a CPU (a Pentium 4, which has about 1/8 the power of a current quad-core) and put it on a real-time OS, you will be able to do everything a Korg OASYS (non-PCI) can do, which is basically what that is.
Re: DSP comparison request
There is also the Muse Receptor, V-Machines, Neko...
The OASYS runs on Linux and I think it uses DSP for effects (the OASYS PCI worked that way).
The Neko uses two 64-bit Opterons. The V-Machine, well, at Musikmesse, the one I tested didn't want to output a sound
(but they work OK apparently).
Scope is 10 years old, and allows synths, effects and mixing on the same board... none of the above is able to do it without an OS (to run Cubase etc., like on the Neko)... The price is also not the same compared to XITE; I think XITE has the best price/power flexibility...

Re: DSP comparison request
From what I know, the OASYS does not even have any DSP, just an MSI motherboard with a dumb sound card and a DRM chip to stop people running OASYS on any old PC.
Re: DSP comparison request
spacef wrote: (the oasys pci worked that way)
ah, you noticed by yourself

Re: DSP comparison request
The XITE system has 12 x ADSP-21369 at 333 MHz - each DSP is capable of 1998 MFLOPS. The XITE has 6 additional ADSP-21065L at 66 MHz, which are capable of 198 MFLOPS each.
The total processing power is therefore around 25 GFLOPS.
A Core i7-920 can reach 42+ GFLOPS.
The new 8-core Mac Pros have A LOT of power - 100+ GFLOPS (depending on the model). Most people are waiting for something like Larrabee, which is capable of 2000+ GFLOPS with 32 x86 cores.
The above is peak performance and not sustained performance.
DSPs are not better than a CPU at anything - but DSPs are quite cheap compared to the performance they deliver, and the circuit boards are simpler and therefore cheaper.
But there's more to it than raw power - to achieve the numbers above you'll have to use 128-bit/512-bit SIMD vector units, and the current compilers don't automatically optimize for SSE/AVX etc., so that has to be done manually. Some VST/AU developers have started (but most haven't yet), and there's a long way to go before they master parallelism.
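The 25 GFLOPS total follows directly from the per-chip figures quoted in this post. A minimal Python sketch of that arithmetic, using only the numbers given here (the names such as `XITE_DSPS` are made up for illustration, and these are peak figures, not sustained ones):

```python
# Peak-FLOPS arithmetic from the post above. All figures are the quoted
# peak numbers; names like XITE_DSPS are illustrative only.

XITE_DSPS = [
    # (chip, count, peak MFLOPS per chip)
    ("ADSP-21369 @ 333 MHz", 12, 1998),
    ("ADSP-21065L @ 66 MHz", 6, 198),
]

CORE_I7_920_GFLOPS = 42.0  # "42+ GFLOPS" as quoted, taken as a lower bound


def total_mflops(dsps):
    """Sum count * per-chip peak over all DSP types."""
    return sum(count * mflops for _chip, count, mflops in dsps)


xite_mflops = total_mflops(XITE_DSPS)  # 12*1998 + 6*198
xite_gflops = xite_mflops / 1000.0

print(f"XITE peak: {xite_gflops:.1f} GFLOPS")
print(f"Core i7-920 / XITE: {CORE_I7_920_GFLOPS / xite_gflops:.2f}x")
```

On these quoted peaks the Core i7-920 comes out well under twice the XITE total, which is why the rest of the post turns on how much of that peak real code can reach.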
- Sounddesigner
- Posts: 1085
- Joined: Sat Jun 02, 2007 11:06 pm
Re: DSP comparison request
Warp69 wrote:
DSPs are not better than a CPU at anything -
It may not be the hardware that causes the weakness of computers, but most VSTs/VSTis I have will not permit low-latency performance. Either a single plugin or a small combination will at times deliver worse performance on a computer than I get with a 6-DSP SCOPE card. As long as Windows XP, ASIO and the host programs are tied to computers, they're not going to function as well. This is always the case, even with the latest and greatest computers. Core i7s were touted for low-latency performance, yet I must operate at a 12 ms buffer, and an even larger one in some cases. There are too many demanding plugins to sustain a realtime environment on a native platform, and it is easily knocked out of it unless one resorts to compromises. Native and computers are tied together. When people tout low latencies from their computers, they're avoiding any demanding plugins (which I'm sure are in most people's arsenal) and/or workflows. Even the Receptor doesn't function at latencies as low as DSPs, and because one is forced to use UniWire, it brings MUCH latency when connected to your main DAW.
There are also so many extremely power-hungry plugins that, for many people, they cancel out the growing power of computers. I just bought Voxengo Voxformer, which uses up to 8x oversampling, and a single instance of it can eat up 17% CPU on a Core i7. In the past, the more cores a computer had, the worse its low-latency performance was, too.
Last edited by Sounddesigner on Wed Apr 15, 2009 11:20 am, edited 8 times in total.
Re: DSP comparison request
I've made a screenshot of the results page of the aforementioned TigerSHARC versus PPC G4 comparison. The tests were performed on dedicated processor boards with a handwritten 'operating system', so just the FFT task and memory and I/O management were running.
Even if these processors are different from what we typically use they can be assumed as valid 'prototypes' representing their class.
cheers, Tom
Attachment: DSP-CPU.jpg (267.2 KiB)
Re: DSP comparison request
astroman wrote: I've made a screenshot of the results page of the aforementioned TigerSHARC versus PPC G4 comparison. The tests were performed on dedicated processor boards with a handwritten 'operating system', so just the FFT task and memory and I/O management were running.
Even if these processors are different from what we typically use they can be assumed as valid 'prototypes' representing their class.
Hmmm - you can make tests that show whatever you want them to show. TigerSHARCs have a wonderful memory system, several DMA channels, a lot of bandwidth. But no support for 64-bit. I could make a test that shows how many 2nd-order Butterworth filters with a cutoff frequency of 0.5 Hz each processor could handle: the PPC has native support for 64-bit, but the TigerSHARC doesn't, so you would have to use some tricks to get the same precision, and that would use cycles.
That document is from 2003 - a lot has changed on the CPU side, but not much has changed on the DSP side; more and more people are in fact using FPGAs instead of DSPs. No matter which high-end DSP you choose, it would fail in any test against the new generation of CPUs, except on price. It's hard to beat a price of 32 dollars for a 2 GFLOPS DSP.
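The 0.5 Hz Butterworth case is a good stress test because the filter coefficients sit extremely close to -2 and 1. A rough Python sketch of why 32-bit floats run out of precision there; the 44.1 kHz sample rate, the direct-form coefficient layout and the helper names are assumptions for illustration, not anything stated in the post:

```python
import math
import struct


def to_float32(x):
    """Round a Python float (64-bit) to the nearest IEEE 32-bit float."""
    return struct.unpack("f", struct.pack("f", x))[0]


def butterworth_lowpass(f0_hz, fs_hz):
    """2nd-order Butterworth low-pass via the bilinear transform.

    Returns (b0, b1, b2, a1, a2), normalised so that a0 == 1.
    """
    w0 = 2.0 * math.pi * f0_hz / fs_hz
    alpha = math.sin(w0) / math.sqrt(2.0)  # Q = 1/sqrt(2) gives Butterworth
    cos_w0 = math.cos(w0)
    a0 = 1.0 + alpha
    b1 = (1.0 - cos_w0) / a0
    b0 = b2 = b1 / 2.0
    return b0, b1, b2, -2.0 * cos_w0 / a0, (1.0 - alpha) / a0


def dc_denominator(coeffs):
    """1 + a1 + a2, the denominator of the DC gain (b0+b1+b2)/(1+a1+a2)."""
    _b0, _b1, _b2, a1, a2 = coeffs
    return 1.0 + a1 + a2


FS_HZ = 44100.0  # assumed sample rate, not stated in the post
coeffs64 = butterworth_lowpass(0.5, FS_HZ)
coeffs32 = tuple(to_float32(c) for c in coeffs64)

den64 = dc_denominator(coeffs64)
den32 = dc_denominator(coeffs32)
rel_error = abs(den32 - den64) / den64

print(f"1 + a1 + a2 with 64-bit coefficients: {den64:.3e}")
print(f"1 + a1 + a2 with 32-bit coefficients: {den32:.3e}")
print(f"relative error after rounding to 32 bits: {rel_error:.2f}")
```

The quantity `1 + a1 + a2` is a few parts per billion here, smaller than the spacing between adjacent 32-bit floats near 1, so rounding the coefficients changes it by at least half its own value. That is one way to see why such a filter needs either wider arithmetic or a different filter structure, the kind of "tricks" the post says would cost cycles.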
Re: DSP comparison request
Sounddesigner wrote: It may not be the hardware that causes the weakness of computers, but most VSTs/VSTis I have will not permit low-latency performance. Either a single plugin or a small combination will at times deliver worse performance on a computer than I get with a 6-DSP SCOPE card. As long as Windows XP, ASIO and the host programs are tied to computers, they're not going to function as well. This is always the case, even with the latest and greatest computers. Core i7s were touted for low-latency performance, yet I must operate at a 12 ms buffer, and an even larger one in some cases. There are too many demanding plugins to sustain a realtime environment on a native platform, and it is easily knocked out of it unless one resorts to compromises. Native and computers are tied together. When people tout low latencies from their computers, they're avoiding any demanding plugins (which I'm sure are in most people's arsenal) and/or workflows. Even the Receptor doesn't function at latencies as low as DSPs, and because one is forced to use UniWire, it brings MUCH latency when connected to your main DAW.
There are also so many extremely power-hungry plugins that, for many people, they cancel out the growing power of computers. I just bought Voxengo Voxformer, which uses up to 8x oversampling, and a single instance of it can eat up 17% CPU on a Core i7. In the past, the more cores a computer had, the worse its low-latency performance was, too.
I was talking about raw power, and no DSP can match any Core i7. 17% of a Core i7?? That probably equals something like 10 Eventide H8000 units - I think that says more about that plugin than it does about the Core i7. Not many plugins are optimized (if any) - not many audio developers would last a day in the graphics/game segment.
The Core i7 can easily handle realtime processing - let us see how many realtime effects XITE can handle after you have implemented an OS running on the DSPs.

I'm not against XITE in any way - I'll be using one soon (I hope) as my primary rig, no doubt about that. And I would not have any second thoughts about recommending it to someone else (already did).
- siriusbliss
- Posts: 3118
- Joined: Fri Apr 06, 2001 4:00 pm
- Location: Cupertino, California US
- Contact:
Re: DSP comparison request
sonolive wrote:hi greg,
have a look here ...
http://www.dspguide.com/ch28.htm
no answer like "it is twice as powerful as ..." but a good introduction to DSPs vs. other CPUs
cheerz
olive
All good stuff guys, thanks!
Greg
Xite rig - ADK laptop - i7 975 3.33 GHz Quad w/HT 8meg cache /MDR3-4G/1066SODIMM / VD-GGTX280M nVidia GeForce GTX 280M w/1GB DDR3
Re: DSP comparison request
spacef wrote: Leads to another question: what is the minimal ULLI/ASIO latency on XITE?
I didn't check that at Musikmesse...
Yes, another good, legitimate question.
Greg
Re: DSP comparison request
astroman wrote: your layman's bottom line as requested:
CPU power is specced by artificial tests with data already loaded into the CPU core.
As soon as your app has to move it around - and a realtime audio processor has got to move lots of stuff... you can just forget about it. Seriously.
Mind you: the example above dealt with hand-optimized machine code by industry experts on a rather simple (and well-known) subject. Expect a typical audio application to be anywhere between 5 and 20 times less efficient.
Good!
Greg
Re: DSP comparison request
The problem with comparisons is that DSP "raw power" and CPU "raw power" are different things. The DSP is designed specifically for doing DSP activities.
You can see this by simply looking at a specific DSP instruction like the multiply-and-accumulate. This instruction will load an operand and a multiplier, multiply them, and add the result to the accumulator. This is done in a single instruction. On a standard CPU it would take the following instructions: load operand, load operator, multiply, add, increment loop counter, test, branch back if not done. This is 7 instructions, give or take. Now each of these instructions may take multiple steps depending on the hardware. The increment is probably 1 or 2 cycles, but the multiply will be a lot more. The Pentiums used to be quoted at an average of 3-4 cycles per instruction. So the CPU takes maybe 21-28 cycles (maybe more, maybe less) to evaluate, but the DSP takes one.
So if the cycle time of the DSP and the CPU were the same, it would take more than 20 times longer to evaluate with a CPU. In most cases the cycle time will be shorter for the CPU, but overall the DSP will perform this sequence a lot faster, as shown in the table comparing FFT times in a previous post.
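The multiply-and-accumulate loop and the cycle estimate in this post can be written out as a small Python sketch. It only reproduces the post's own arithmetic (7 instructions at 3-4 cycles each versus 1 cycle on the DSP); the function and constant names are illustrative, and the model ignores pipelining and parallel execution units, which other replies in this thread argue change the picture on current CPUs:

```python
def dot_product_mac(samples, coeffs):
    """FIR-style inner loop: one multiply-accumulate per tap."""
    acc = 0.0
    for x, h in zip(samples, coeffs):
        acc += x * h  # the step a DSP performs as a single MAC instruction
    return acc


# The per-tap instruction sequence listed in the post for a standard CPU.
CPU_STEPS_PER_TAP = [
    "load operand", "load operator", "multiply", "add",
    "increment loop counter", "test", "branch back if not done",
]
CPU_CYCLES_PER_INSTRUCTION = (3, 4)  # "3-4 cycles" average quoted in the post
DSP_CYCLES_PER_TAP = 1               # single-cycle MAC, as claimed in the post


def cpu_cycles_per_tap():
    """Low and high cycle estimates for one tap on the CPU."""
    n = len(CPU_STEPS_PER_TAP)
    return tuple(n * c for c in CPU_CYCLES_PER_INSTRUCTION)


low, high = cpu_cycles_per_tap()
print(f"CPU: {low}-{high} cycles per tap, DSP: {DSP_CYCLES_PER_TAP}")
print(dot_product_mac([1.0, 2.0, 3.0], [0.5, 0.25, 0.125]))
```

Under this model the CPU would need a clock 21 to 28 times faster than the DSP just to break even on this loop.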
mark winger
Re: DSP comparison request
Warp69 wrote: The XITE system has 12 x ADSP-21369 at 333 MHz - each DSP is capable of 1998 MFLOPS. The XITE has 6 additional ADSP-21065L at 66 MHz, which are capable of 198 MFLOPS each.
The total processing power is therefore around 25 GFLOPS.
A Core i7-920 can reach 42+ GFLOPS.
The new 8-core Mac Pros have A LOT of power - 100+ GFLOPS (depending on the model). Most people are waiting for something like Larrabee, which is capable of 2000+ GFLOPS with 32 x86 cores.
The above is peak performance and not sustained performance.
DSPs are not better than a CPU at anything - but DSPs are quite cheap compared to the performance they deliver, and the circuit boards are simpler and therefore cheaper.
But there's more to it than raw power - to achieve the numbers above you'll have to use 128-bit/512-bit SIMD vector units, and the current compilers don't automatically optimize for SSE/AVX etc., so that has to be done manually. Some VST/AU developers have started (but most haven't yet), and there's a long way to go before they master parallelism.
Yes! Good straightforward explanation.
I understand that a lot of horsepower is lost in the applications and VST integration, etc.
The geek side of me is happy.

Greg
Re: DSP comparison request
Latencies are not inherent to the processors. The latency comes from the software, and from how the data is moved from the hardware I/O to the application that is using the data.
mark winger
Re: DSP comparison request
Well - first of all - you can do multiply-add with SSE3+4 on multiple data in one instruction now. Second, the next generation of SSE (AVX) is 256-bit (and later 512-bit) SIMD which supports fused multiply-add: this means that AVX can perform multiply-add -> accumulator with one rounding on 8 32-bit floats in one instruction, which takes around 2-3 cycles, and remember that we are running at 2+ GHz and not 400 MHz. The DSP segment has to move extremely fast within this year to catch up (well, that's probably impossible).
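To put the figures in this post side by side, here is a back-of-the-envelope Python sketch. It assumes the numbers exactly as quoted in the thread (8 floats per instruction, 2-3 cycles, a 2 GHz CPU, and a 400 MHz DSP doing one multiply-accumulate per cycle), treats the 2-3 cycles as the full cost of each instruction with no overlap, and counts a single core only; the function name and constants are illustrative:

```python
def macs_per_second(clock_hz, lanes, cycles_per_instruction):
    """Multiply-accumulates per second for one core under a simple model:
    each instruction handles `lanes` values and occupies the core for
    `cycles_per_instruction` cycles, with no overlap between instructions."""
    return clock_hz * lanes / cycles_per_instruction


DSP_CLOCK_HZ = 400e6  # "400MHz" as mentioned in the post
CPU_CLOCK_HZ = 2e9    # "2+GHz" as mentioned in the post, taken as 2 GHz

dsp = macs_per_second(DSP_CLOCK_HZ, lanes=1, cycles_per_instruction=1)
simd_slow = macs_per_second(CPU_CLOCK_HZ, lanes=8, cycles_per_instruction=3)
simd_fast = macs_per_second(CPU_CLOCK_HZ, lanes=8, cycles_per_instruction=2)

print(f"DSP:  {dsp / 1e9:.2f} G MAC/s")
print(f"SIMD: {simd_slow / 1e9:.2f} to {simd_fast / 1e9:.2f} G MAC/s per core")
print(f"ratio: {simd_slow / dsp:.1f}x to {simd_fast / dsp:.1f}x")
```

This is a peak estimate in the same spirit as the GFLOPS figures earlier in the thread; it says nothing about whether the data can be fed to the vector unit fast enough.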
Re: DSP comparison request
Warp69 wrote: ...Hmmm - you can make tests that show whatever you want them to show.
of course you can... I only posted the results page

the company that did the test sells both kinds of boards - some with DSPs, some with CPUs
as you can see in the paragraph about applications, music isn't even mentioned, but radar, ultrasound etc.
The 'test' was done to point out the specific strength of each architecture to give customers some decision support. There is no bias whatsoever in the paper.
Warp69 wrote: That document is from 2003 - a lot of things have changed on the CPU side, but not much has changed on the DSP side ...
I had some concerns about the 'age', too...

but then it's the only comparison I know that goes into such detail of each architecture.
And regardless of what has changed with CPUs their argument of feeding data still holds.
We're not talking about a quad-core with a custom realtime OS, but a chip running either M$ or Apple's general-purpose stuff.
I've used Macs for decades, so I have a pretty good comparison of real-world results from M68K, PPC and Intel Core Duo CPUs, ranging from 8 MHz to 2.5 GHz clocks.
OS X itself must eat up a significant portion of the available resources, otherwise I couldn't explain the relatively small difference in overall responsiveness of the machines compared to a 500 MHz PPC with a 66 MHz bus and SDRAM.
As in the DSP versus CPU comparison, there is a huge gap between my expectations and the real-world results I experience every day.
IMHO there are countless people raving about the power of these CPUs, but no one ever tells you how the data gets into the core(s).
You may have noticed (in the article above) that the actual processing of the FFT didn't influence the PPC's result at all. The performance was entirely determined by memory transfer and cache handling

by pure reasoning I tend to assume that it may not be relevant at all for audio processing - but then there's the everyday office experience where a machine with supposedly 6 times the processing power 'feels' barely twice as fast...
cheers, Tom
Re: DSP comparison request
winger wrote: Latencies are not inherent to the processors. The latency comes from the software, and how the data is moved from the hardware IO to application that is using the data.
You are 100% right, but allow me to completely disagree

not about the CPU, but about the relation to the hardware, i.e. an ensemble of I/Os, connections, CPU, RAM etc.
Here is what I see happening:
- Load a softsynth on a PC: it doesn't output any sound, because you need an ASIO driver for the best latency. But you have latency anyway, and it is difficult to go below 1 ms latency even with very good drivers and exceptional soundcard hardware (1 ms at 44.1 kHz is approx. 44.1 samples). Without a driver, the softsynth often doesn't even run, as in "I can see the interface".
- Load a softsynth on Scope: you have sound immediately, no need for an additional driver, ASIO or whatever. It runs on the DSP of the card (latency is lower than 20 samples, I think more around [5-15 samples] max). Scope has been way below ASIO latency (whatever the chosen ASIO latency), because the DSP (the "CPU" of Scope) and the design of the Scope hardware allow this easily.
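The two items above mix milliseconds and samples, so the conversion is worth spelling out. A minimal Python sketch (the function names are made up for illustration; the 44.1 kHz rate and the 1 ms and 20-sample figures are the ones mentioned in this post, and 3 ms and 12 ms are the buffer sizes that come up elsewhere in the thread):

```python
def ms_to_samples(latency_ms, sample_rate_hz):
    """Number of sample periods that fit into a latency given in ms."""
    return latency_ms * sample_rate_hz / 1000.0


def samples_to_ms(samples, sample_rate_hz):
    """Duration in ms of a given number of sample periods."""
    return samples * 1000.0 / sample_rate_hz


for ms in (1, 3, 12):
    print(f"{ms} ms @ 44.1 kHz = {ms_to_samples(ms, 44100):.1f} samples")

print(f"20 samples @ 44.1 kHz = {samples_to_ms(20, 44100):.2f} ms")
```

So a path of fewer than 20 samples corresponds to less than half a millisecond at 44.1 kHz, below the 1 ms floor described for ASIO on a PC.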
About asio drivers:
- ASIO drivers are software, but they are bound to the hardware's capabilities (chips, design etc.)
- Many examples:
- Scope generation I boards: you get a minimum of 12/13 ms latency (11.4 at a 48 kHz sampling rate, if I remember correctly). The hardware design doesn't allow for less.
- Scope generation II boards allow 3 ms latency. It is not a change of driver that allowed this, but a modification of the hardware... and this is the same whatever the PC/Mac power...
- Behringer FCA202: reaches 4 ms ASIO latency with no problem on a Celeron 1.5 GHz with 512 MB RAM; it is FireWire. Lower latencies cause glitches and dropouts... due to what? FireWire or the PC it is connected to? The PC, I tend to think, or the Behringer itself...
- Focusrite Saffire: FireWire, 6 ms latency on the same computer (Celeron 1.5 GHz / 512 MB RAM). I should run tests to see if it can cope with lower latencies than the Behringer FCA202 (but the Saffire is a friend's, so I can't always have it - when I tested it, it appeared less efficient than the FCA202).
- Noah on usb 2: dropouts and an unsynced USB/ASIO driver happen below 12 to 7 ms latency - depending on what you run on the PC - while allowing only 2 ins/2 outs. I am not sure it is only because of the low speed of usb 1.
- RME Fireface: 1 ms latency takes a lot more resources on the computer than using a higher latency... (I didn't test this directly, but I have no reason to doubt the person I got this info from, who had a Scope board and ran into the same limitations, resource consumption and instabilities when trying to go down to the 1 ms promised by RME)...
So there is an ensemble of things to take into consideration.
Scope has to deal with ASIO buffers to manage low latencies, which are themselves part of the PCI way of communicating with the computer, roughly speaking, and Scope uses its own hardware to achieve more or less the same ASIO latency whatever PC/CPU is connected to it: Pentium 3, 4 or Core 2 Duo... it is always the same... How good the driver is and how fast the machine is are not the most relevant factors; it is primarily a matter of the design of the card.
And it doesn't need ASIO to run in the first place. By itself, it is faster/realtime etc...
So I think latency, even though not bound to the CPU/DSP themselves, is hardware dependent, and in this field the superiority of Scope is certain when it comes to synths and effects, mixing and all things that are nice to do on a realtime basis, and that's been so for a long time (more than 10 years)...
no ?
Re: DSP comparison request
astroman wrote: As in the DSP versus CPU comparison, there is a huge gap between my expectations and the real-world results I experience every day.
IMHO there are countless people raving about the power of these CPUs, but no one ever tells you how the data gets into the core(s).
This generation of CPUs (Core i7) has absolutely no problem with moving data between external RAM and the cache - way faster than any DSP. But all that power gets wasted on badly optimized code, no support for SSE (or other special instructions) and a large OS - I don't think that will change, unfortunately.