Well, garyb, after 14 years of sterling service, my Pulsar board has finally died and it's time to move on, so I guess I won't be coming here again. I didn't want my last post to be a big "No, you're wrong" type of thing, because I know how much you've helped me and countless other people over the years, and I didn't want to repay that with a riposte rather than a thank you. But you've made a number of statements, and as someone who has been programming PCs professionally ever since there was such a thing as a PC, I couldn't let some of them go unchallenged. Bear in mind that much of the information floating around about ht is dated, and Intel and MS have made significant improvements to the way it's handled over the years. But let's just take a step back and look at ht in general.
So, you start off with n physical cores, so once ht is enabled you have 2n virtual cores instead (not, as seems a common belief, n real cores and n virtual ones). Windows by and large just sees the number of cores as doubling, and so is able to schedule twice as many threads. And btw, there's nothing magic about 2 threads per core, it just depends on the CPU architecture - PowerPC runs 3 threads per core, and the Intel Xeon Phi runs 4 threads per core (which is insane when you have 60+ physical cores to start with).

So what's in a core? It's basically a computer of its own, with a simple instruction set (the micro-ops or μops), an "OS" in ROM, various types of arithmetic processing units, several different types of RAM, and lots of I/O. Its main task is to fetch the complex x86 instructions (macro-ops) and break them down into μops, which it queues up and executes. It will try every trick in the book to make sure it's not sat there doing nothing - for example, it works out which μops don't depend on the results of others so it can execute 2 or more simultaneously, depending on which processing units they need, or it can even bring a μop forward right to the front of the queue if it sees there's an appropriate processing unit free and nothing else depends on the results. It will even try to guess the result of a conditional branch operation and fetch (and even start executing) macro instructions at the predicted branch target. But of course the processing resources are limited, and there may well be μops just stuck waiting to execute.
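As an aside, you can see that doubling for yourself by asking Windows for both counts. Here's a minimal sketch in C++ against the Win32 API (it assumes Windows 7 or later for GetLogicalProcessorInformationEx); with ht on, the first number it prints is double the second, and with ht off they match.

```cpp
#include <windows.h>
#include <cstdio>
#include <vector>

int main()
{
    // Logical processors: what Windows schedules threads onto (2n with ht on).
    SYSTEM_INFO si;
    GetSystemInfo(&si);
    printf("Logical processors: %lu\n", si.dwNumberOfProcessors);

    // Physical cores: one RelationProcessorCore record is returned per core.
    DWORD len = 0;
    GetLogicalProcessorInformationEx(RelationProcessorCore, nullptr, &len);
    std::vector<BYTE> buffer(len);
    auto* info = reinterpret_cast<PSYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX>(buffer.data());
    if (GetLogicalProcessorInformationEx(RelationProcessorCore, info, &len))
    {
        int cores = 0;
        for (DWORD offset = 0; offset < len; )
        {
            auto* rec = reinterpret_cast<PSYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX>(buffer.data() + offset);
            ++cores;            // each record here describes one physical core
            offset += rec->Size;
        }
        printf("Physical cores:     %d\n", cores);
    }
    return 0;
}
```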
If a core had a longer μop queue then it would have more to choose from and there would be more of a chance it could find one that it could execute. Intel went down that road with its Netburst architecture, but that was a dead end because the longer the queue, the more severe the time penalty for mis-predicting a branch, and so the performance gain is negated. With the Nehalem architecture, Intel did a radical redesign and re-introduced hyper-threading, allowing one core to process 2 execution threads. Each core can now have twice as many μops in its queue without doubling the penalty for a branch mis-prediction. A further refinement with Haswell and beyond was to segregate the μop queue between the 2 threads so that one of them wouldn't monopolize it to the detriment of the other. And occasionally Intel will throw another processing unit into the core to improve things yet further.
The best analogy I've seen is that of a carpenter. He has his tools, and at any one time he's using one or more of them, so maybe a clamp and a drill, or a bench and a saw. The problem here is that there are expensive tools sitting there unused, so what you do is you get another carpenter along and say they both have to share the same tools. While carpenter #1 is drilling, carpenter #2 can be sawing, and so you are making the best use of the resources. So if carpenter #1 is making a chair and #2 is making a bird table then this probably works pretty well as mostly they won't be needing the same tools at the same time. Where it may not work so well is if they're both making chairs and starting at the same time so they need the same tools in the same order, or maybe #1's chair is a rush job but you haven't told either of them and so he sometimes waits for tools that #2 is using because neither of them knows that #1 ought to be in a hurry.
Taking this back to CPUs and ht, if you have a lot of active threads all doing the same thing, like, oh, say they were all streaming audio, then ht may not be the best thing as they may end up all waiting for the same resources at the same time. Or if you had some really important threads, like, oh, say they were all streaming audio, then again ht may not be the best thing as they may end up having to wait for resources being used by some less important threads, because it's only the OS that knows the thread priorities, not the cores. Time for a curveball, though. Even if you don't have ht enabled, don't think that your threads run uninterrupted on the same cores, as Windows may well spread a single thread across the available cores. Say you have 4 physical cores and ht is off, and you run a single-threaded process. You may well see that even though the process is using 25% of your CPU, if you look at each individual core there's not one running at 100% with the others idle - they are ALL running at around 25%. So even if you're just making a single chair, there are 4 carpenters actually working on it in turns!
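If you want to watch that happening, here's a small sketch (C++ on the Win32 API, assuming Vista or later for GetCurrentProcessorNumber) that keeps one thread busy and reports every time Windows moves it to a different logical processor. Run it unpinned and you'll usually see it hop around; pin it (as described next) and it stays put.

```cpp
#include <windows.h>
#include <cstdio>

int main()
{
    DWORD lastCpu = MAXDWORD;

    for (int pass = 0; pass < 1000; ++pass)
    {
        // Busy work so the thread is always runnable, never sleeping.
        volatile double x = 0.0;
        for (int i = 0; i < 5000000; ++i)
            x += i * 0.5;

        DWORD cpu = GetCurrentProcessorNumber();
        if (cpu != lastCpu)
        {
            // The scheduler has migrated this thread to another logical processor.
            printf("pass %d: now on logical processor %lu\n", pass, cpu);
            lastCpu = cpu;
        }
    }
    return 0;
}
```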
So how do you fix this and make ht work the way your software needs it to? Many years ago, MS introduced the SetThreadAffinityMask and SetProcessAffinityMask OS calls. What these do is specify which core(s), or virtual core(s) if ht is enabled, your process or thread is allowed to use. So if you don't want your thread smeared across different cores, ask for it to run on just one of them. If you have lots of threads all doing the same thing at the same time, ask for them to be allocated to virtual cores in a way that will minimise internal resource waits.
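In code it's just a bit mask, one bit per virtual core. Here's a minimal sketch of pinning the calling thread to a single logical processor (the choice of processor 2 is purely for illustration; real software would pick its targets from the actual core layout):

```cpp
#include <windows.h>
#include <cstdio>

int main()
{
    // Affinity masks are bit masks: bit N = logical processor N.
    // Pin the calling thread to logical processor 2 (an arbitrary choice).
    DWORD_PTR previousMask = SetThreadAffinityMask(GetCurrentThread(), (DWORD_PTR)1 << 2);

    if (previousMask == 0)
        printf("SetThreadAffinityMask failed: %lu\n", GetLastError());
    else
        printf("Pinned; previous mask was 0x%llx\n", (unsigned long long)previousMask);

    // From here on this thread stays on that one logical processor -
    // Windows can no longer smear it across the others.
    return 0;
}
```

SetProcessAffinityMask works the same way but applies to every thread in the process at once. Now we're ready for: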
garyb wrote:it has nothing to do with Sonar
No, it has everything to do with Sonar, or whatever DAW you are using. That software should be using the affinity masks to mitigate the ht downsides and maximize the upsides. Or they could just chicken out and tell you to turn off ht (which is fine if you are a professional with dedicated music PCs). You can actually see that Sonar is using affinity masks because, at least on my PC with ht enabled, it is only using 7 of my 8 virtual cores most of the time. Why exactly it should be doing that I can't tell you because affinity masks are tweaky things and Cakewalk probably spent ages working out the best settings.
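I can only guess at what Cakewalk actually does, but a crude sketch of "use 7 of the 8 virtual cores" might look like the following, assuming the core being given up is logical processor 0 (that choice, and the idea that this is how Sonar goes about it, are purely my assumptions):

```cpp
#include <windows.h>
#include <cstdio>

int main()
{
    DWORD_PTR processMask = 0, systemMask = 0;
    GetProcessAffinityMask(GetCurrentProcess(), &processMask, &systemMask);

    // Drop logical processor 0 from the allowed set, keeping the other 7 of 8.
    DWORD_PTR newMask = systemMask & ~(DWORD_PTR)1;

    if (SetProcessAffinityMask(GetCurrentProcess(), newMask))
        printf("Process restricted to affinity mask 0x%llx\n", (unsigned long long)newMask);
    else
        printf("SetProcessAffinityMask failed: %lu\n", GetLastError());
    return 0;
}
```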
garyb wrote:ht only helps offline processes
No, ht can help with any process if used correctly because it's maximizing the use of processor resources.
garyb wrote:it takes REAL cores to process virtual cores.
True, but as far as Windows is concerned, your "real cores" are still pretty virtual because of the way it smears threads between them unless your software tells it not to.
garyb wrote:in a realtime system, ht will make most processes late, which means lost data and clicks and pops.
Windows has never been and will never be a realtime OS. I've worked with proper realtime OSes like OS-9, and they are very, very different beasts from Windows. OS-9 doesn't even allow multiple cores, let alone ht. In a non-realtime OS supporting multiple cores like Windows, you just have to make the best of a bad job, and that means not letting processor resources lie idle.
But now we come to the crux of it:
garyb wrote:... if your use allows ht without problems, great!
I'm glad I can end with something of yours that I can wholeheartedly agree with. If ht works in your setup then it works, otherwise it doesn't, whether it ought to or not.
So on that note, I bid you farewell and wish you (and everyone else here) all the best for the future.