STI Cell processor against x86...
Nice and informative article:
http://www.blachford.info/computer/Cells/Cell0.html
Author asks himself: Has the PC finally met its match?
But:
(quote) Cell may be vastly more powerful than existing x86 processors but history has shown the PC's ability to overcome even vastly better systems. Being faster alone is not enough to topple the PC.
[ This Message was edited by: hubird on 2005-02-08 08:10 ]
The CELL is no match for a PC. An IBM quote from an article published by heise.de:
"CELL is up to ten times faster than the CPUs currently used in entertainment and gaming systems."
Marketing speak. Keep in mind, IBM had a prototype CELL running at 4.6GHz (but didn't mention the cooling solution - I assume it was liquid nitrogen, Peltier, or at least water cooling)... And what does '[...] CPUs currently used [...]' mean? I'll be nice and assume they compared it to the Celeron 733 currently used in the Xbox.
OK, now I'll try to translate:
"CELL, heavily overclocked to 4.6GHz, is ten times faster than an Intel Celeron 733."
Wow! Really!? That's, like, 3% faster than an Opteron 150? But what if I overclock the Opteron, too? And when will this thing be available? Next year? Too bad; by then, a top-of-the-line x86-64 dual-core CPU will be even faster...
CELL is a vector CPU - one could even go as far as calling it a DSP. It's fast as hell, but only for very few operations. It's completely unsuited for most uses, except for gaming (AI, physics, 3D), video encoding and maybe audio processing...
The CPUs commonly used in PCs are far better all-round machines, and it is very possible (and common) to add dedicated DSPs for specific uses (SHARC, CS601, EnLight256)...
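The vector-vs-all-round distinction can be sketched in a few lines of Python: a vector/DSP unit wins only when work can be packed into identical, branch-free lanes. The `width=4` lane count here is an arbitrary illustration, not the Cell's actual SIMD width:

```python
def scalar_dot(a, b):
    """One multiply-accumulate per 'instruction' - what a plain scalar core does."""
    acc, ops = 0.0, 0
    for x, y in zip(a, b):
        acc += x * y
        ops += 1
    return acc, ops

def vector_dot(a, b, width=4):
    """'width' multiply-accumulates per 'instruction' - what a vector/DSP unit does.
    This only works because the loop has no branches or cross-lane dependencies."""
    acc, ops = 0.0, 0
    for i in range(0, len(a), width):
        acc += sum(x * y for x, y in zip(a[i:i + width], b[i:i + width]))
        ops += 1
    return acc, ops

a = [1.0] * 64
b = [2.0] * 64
print(scalar_dot(a, b))  # (128.0, 64)
print(vector_dot(a, b))  # (128.0, 16)
```

Branchy code (AI decision logic, OS kernels, office apps) can't be packed into lanes like this, which is why the speedup only shows up for the workloads listed above.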
I will not drop into discussions about clock rates, since those have been nonsense since the days of the C64.

On 2005-02-08 12:31, wsippel wrote:
..., 'till then, a top-of-the-line x86-64 dual core CPU will be even faster...
...
The CPU's commonly used for PC's are far better all-round machines, ...

But I suspect those current CPUs you mention also only shine in rather few 'specialized' applications (code segments).
On the other hand, regular 'Windows' code on x86 is (must be considered) total crap if you compare its execution speed to that of an emulated version running on a PowerPC chip...

cheers, Tom
Actually, if you look at the Cell spec, it looks like they have a central CPU-style unit and 12 vector processors, each capable of doing 32 GFLOPS. That's a pretty completely sick amount of processing power per Cell.
Current CPUs work the same way: the Velocity Engine in the PowerPC is essentially a DSP core, and it contributes a pretty solid chunk of the processing power. Same with the P4 - without the embedded DSP-style units, those CPUs wouldn't be very fast.
Tom,
you're right that clockspeed talk is nonsense. It's easy to see by comparing AMD and Intel CPUs (especially the P4), Intel's P4 and Pentium M, or even the PPC970/G5 and AMD's amd64 line.
When it comes to pure number-crunching, amd64 CPUs, still based on a heavily improved x86 core, outperform the PPC970/G5 per cycle in almost every case. So I'd say the x86 architecture is far better than most people would think:
1x AMD Opteron 1.8GHz, benchmarked by AMD:
CINT2000_base: 1095
CFP2000_base: 1120
1x Apple G5 2.0GHz, benchmarked by Apple:
CINT2000_base: 800
CFP2000_base: 820
2x AMD Opteron 1.8GHz, benchmarked by AMD:
CINT2000_rate_base: 25.0
CFP2000_rate_base: 24.7
2x Apple G5 2.0GHz, benchmarked by Apple:
CINT2000_rate_base: 17.2
CFP2000_rate_base: 15.7
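To back up the 'per cycle' claim, the single-CPU scores above can be normalized by clock speed. This is a rough metric at best, since SPEC results also depend heavily on the compiler and memory subsystem:

```python
# Single-CPU SPEC CINT2000/CFP2000 base scores quoted above, divided by clock (GHz)
systems = {
    "Opteron 1.8GHz": {"int": 1095, "fp": 1120, "ghz": 1.8},
    "G5 2.0GHz":      {"int": 800,  "fp": 820,  "ghz": 2.0},
}

for name, s in systems.items():
    print(f"{name}: {s['int'] / s['ghz']:.0f} int/GHz, {s['fp'] / s['ghz']:.0f} fp/GHz")
# Opteron 1.8GHz: 608 int/GHz, 622 fp/GHz
# G5 2.0GHz: 400 int/GHz, 410 fp/GHz
```

By this crude measure, the Opteron delivers roughly 1.5x the per-GHz throughput of the G5 on both integer and floating-point base scores.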
One benefit of the x86 architecture is that it's very well known, so it's easier to improve compilers for revised cores.
And it's somewhat unfair to consider PPCs more powerful just because VirtualPC _feels_ fast. It isn't; you wouldn't want to do complex stuff in VirtualPC. Give it a while: PearPC is supposed to add a recompiler for amd64 CPUs, so maybe OS X on Linux/amd64 will feel even faster than the other way around...

When the PlayStation 2 was on its way out, one commonly quoted statistic was that it was 10x faster than a PC and able to push "66 million 3D transformations per second, and can render images at 2.4 billion pixels per second", which was a rather inaccurate way to portray its performance. Those numbers really only apply to the raw throughput the Emotion Engine has in terms of memory bus bandwidth. What they ignore is that the pipeline is largely lacking in cache, and required such a dramatic change from what programmers were used to that the machine has only recently been showing its potential (and programmers still complain that you can't fit a decent-sized model or texture completely into the pipeline).
So now we're seeing similar marketing spin put on the cell-processor phenomenon (which also isn't new, although it is at this price point). In my opinion, the most exciting thing about this is that it may affect future PowerPC designs (although with the Power5 already shipping, it's most likely that the next PowerPC chip will be a scaled-down version of that, much as the PowerPC 970 draws largely from the IBM Power4).
I think a major point is being missed here. The design concept is not analogous to traditional workstations. These are low-power devices designed from the bottom up to cluster.
Yes, you can network a bunch of machines together, but what if the mechanism used to synchronize the various systems were part of the CPU design?
Now, deathmatch-style video games are just one venue for this type of application. Sony and IBM are shooting for the living room with this concept, but I also imagine a heavily optimised, FXTeleport-like style of process sharing, suitable for all types of computing. Imagine having a Cell-based PVR/DVD player/recorder in the front room and/or a PS3. Then you have an office machine in the den or bedroom, and your workstation in the studio. Each machine can, theoretically, dynamically borrow cycles from the others depending on the priority assigned to each device. Even better, you can just keep stacking Cells onto your audio workstation to keep up with the Joneses.
But the other end of the concept is large grid arrays for IBM's Enterprise class projects. Their vision is to build expandable low-power grids that you can keep stacking cells onto as the load increases.
So a 1:1 processor comparison totally misses the point of the exercise.
Outside of the infamous 'DeathStar' GXP 60GB drives, I have yet to see IBM make bad kit. They just have a perpetual marketing problem (their marketing team should really be pushing mops). Sony, however, does not. I look forward to this with great anticipation.
Sam
Sam,
I know that what you said is pretty much what STI had in mind with CELL. I simply think it's not feasible. How is the communication between the nodes supposed to work? Large cluster systems usually depend on Myrinet or InfiniBand; everything else lacks the necessary low latency and high bandwidth, and Myrinet and InfiniBand are extremely complicated and expensive technologies. Ethernet doesn't cut it, except maybe 10Gb Ethernet (also very expensive), and wireless is completely out of the question. Am I supposed to connect my PC, PS3, DVD player, TV, microwave oven and kitchen sink using Fibre Channel? And what am I supposed to do with all that processing power? Weather forecasts? Nuclear weapon simulations? Lottery number prediction? A cluster is not exactly suited for realtime stuff (and vector CPUs like CELL are also unsuited for most server applications)...
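The realtime objection can be made concrete with some back-of-envelope arithmetic. The per-hop round-trip figures below are illustrative assumptions for the sake of the comparison, not measurements:

```python
def buffer_deadline_ms(samples, rate_hz):
    """Time available to process one audio buffer before the next one is due."""
    return samples / rate_hz * 1000.0

deadline = buffer_deadline_ms(64, 44100)
print(f"deadline: {deadline:.2f} ms")  # ~1.45 ms for a 64-sample buffer at 44.1kHz

# Hypothetical round-trip times per network hop (assumed, order-of-magnitude only):
rtt_ms = {"100Mb Ethernet": 0.2, "wireless": 3.0, "Myrinet/InfiniBand": 0.01}
for link, rtt in rtt_ms.items():
    # A remote node must receive the job, compute, and reply inside the deadline,
    # so transport overhead eats directly into the processing budget.
    print(f"{link}: {rtt / deadline:.0%} of the budget spent on transport alone")
```

With the assumed numbers, a wireless hop alone blows the low-latency audio budget several times over, which is the point being made about consumer interconnects.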
If you want a supercomputer for your living room, you might want to check out Orion Multisystems' 96-node minicluster (300 GFLOPS peak, up to 192GB RAM, up to 9.6TB storage - all at the size of a big tower):
http://www.orionmulti.com/products/
[ This Message was edited by: wsippel on 2005-02-08 23:54 ]
I know from personal experience that most games actually LOSE about 10% performance on my duals, and the same often goes for non-SMP-aware audio apps. Also, the only benefit I ever saw for Nuendo/SX was added instability, although I think this varies depending on your working mode and which plugins you tend to use. Softsynth code, at least in the 1.x and earlier 2.x versions, was never properly balanced against the main MIDI/program engine and the VST/DX plugin processing, so people who do more audio tracking and straight mixing (versus sample playback and synthesis) benefited more. The only tasks that really benefit from having more than one CPU are 3D rendering (when broken into tiles, or when scanlines are handled independently) and similar 2D processes (filters in 2D paint and compositing applications).
I point this out in support of wsippel's statement above. In today's computers we're moving closer and closer to a point where the entire machine is more or less a small network unto itself, and the IBM Cell design takes this into the CPU itself. But the more things are broken into a networked (or switched) set of smaller computing devices, the higher your latency will go, unless you dramatically simplify the hardware design and push the task of scheduling further into software. And, as has been stated (again), you'll need the bandwidth to spare between the devices.
In regards to gaming, it would seem to me that the backend duties (managing AI, physics calculations, etc.) would be much more likely candidates for offloading from the main client and/or server than the hardcore duties (graphics calculations, updating positions of actors in a scene, pushing geometry, etc.). One arena that might benefit a lot would be MMORPGs. I can see it being used to dramatically increase the realism of a given game universe. Insignificant things that would never have been given dedicated CPU time (or would have been scheduled at very low priority) before will now be possible to simulate, including all those positions people never seem to want to fill in an online community (janitor, anyone?). I doubt, however, that distributed computing will bring more frames per second to the client (or better pixel shaders) in the near future, for reasons that wsippel already outlined.
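The split described above (latency-critical work stays local, low-priority simulation gets offloaded) amounts to a priority cutoff. The task names and priority scale below are made up purely for illustration:

```python
# (priority, task) - lower number means more latency-critical
tasks = [
    (0, "update actor positions"),
    (0, "push geometry to GPU"),
    (5, "NPC pathfinding"),
    (7, "physics for distant objects"),
    (9, "simulate the janitor"),
]

OFFLOAD_THRESHOLD = 5  # tasks at or above this priority can tolerate network latency

local, remote = [], []
for prio, task in tasks:
    # Frame-rate-critical work must finish every frame; ambient simulation
    # can survive tens of milliseconds of round-trip delay to another node.
    (remote if prio >= OFFLOAD_THRESHOLD else local).append(task)

print("local: ", local)
print("remote:", remote)
```

The interesting design question is where the threshold sits for a given interconnect: the slower the link, the more of the game has to stay on the client.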
(edit) wanted to add this from the end of part 2 of an Ars Technica article covering the IBM Cell CPU. Part 1 of the article is here: http://arstechnica.com/articles/paedia/cpu/cell-1.ars

The Cell and Apple
Finally, before signing off, I should clarify my earlier remarks to the effect that I don't think that Apple will use this CPU. I originally based this assessment on the fact that I knew that the SPUs would not use VMX/Altivec. However, the PPC core does have a VMX unit. Nonetheless, I expect this VMX to be very simple, and roughly comparable to the Altivec unit of the first G4. Everything on this processor is stripped down to the bare minimum, so don't expect a ton of VMX performance out of it, and definitely not anything comparable to the G5. Furthermore, any Altivec code written for the new G4 or G5 would have to be completely reoptimized due to the in-order nature of the PPC core's issue.
So the short answer is, Apple's use of this chip is within the realm of conceivability, but it's extremely unlikely in the short and medium term. Apple is just too heavily invested in Altivec, and this processor is going to be a relative weakling in that department. Sure, it'll pack a major SIMD punch, but that will not be a double-precision Altivec-type punch.

So he has countered my supposition that it may eventually find its way into an Apple computer. Either way, I still think that the next PowerPC will be versioned off the Power5 core.

[ This Message was edited by: valis on 2005-02-09 02:01 ]
Well, who knows what it will actually end up doing; I was merely pointing out an omission in the realm of possibilities being discussed.
I'm always hopeful about new technologies, just because getting comfortable with the old ones usually produces such good work. Obviously the continuous updating can throw the efficiency/competency curve a bit here and there.
In terms of how it will scale, just think of blade servers. They are all just tiny, low-power, super-micro-ATX machines. I think this is just taking the idea a little bit further, and in a slightly different direction.
As for the supercomputing bit, I don't think that's really the point. It's certainly not what most people will get out of the package. What I envision is being able to wander from room to room and have the same television stream playing on LCD wall panels, and a pop-up message telling you the dry cycle on your laundry is done. Things like that. Obviously the capabilities are already sort of in place, but it's a bit of a hack job to get X10 chips into the electrical system and program your house from the ground up. I think the Cell is going to try to become the around-the-house chip. The PS3 is just a giant red carpet with which it can prove its worthiness. WinXP MCE is such a pain in the ass, and by its design limits what you can do to Microsoft's profit-regulated imagination. Linux is up and coming, but most people won't be able to use it practically in the living room for a while yet. But if the PS3 already demonstrates some level of integration and interaction with other PS3s on the network, computers, etc., then it may inspire some dedicated application development.
I'm not so naive as to think the world will explode in orgiastic celebration as soon as the chip hits the channel (nor do I think anyone attributed such an idea to me), but like Tom said earlier, it's nice to see companies playing the game a little differently. We probably won't see these rollouts provide anything interesting besides Tekken 5 for a couple of months, while people figure out what the hell it actually is and what its limitations are. The possibilities are exciting, though.
Sam