A major issue with any multi-processor system is bus contention. Adding cores can only really be beneficial if the memory or I/O buses have unused communications capacity when the existing complement of processors is making full use of them. Otherwise, it's like trying to increase the traffic-carrying capacity of the road network by adding vehicles; once the roads are full, more cars don't equal more throughput.
Given that the quad-core chips in the referred-to article are described as pin-compatible replacements for dual-core chips, it doesn't surprise me that the performance boost is substantially less than 100%: it's unlikely that the existing buses can carry twice as much data as the dual-core chips need.
Intel found that going above about three processors on a single bus gave not just diminishing returns but could actually be counter-productive, because of the overhead of bus arbitration: the number of times each CPU is denied access to memory or I/O because another CPU is using it. This is why their Xeon architecture tops out at four, as I understand it. (AMD get more performance by adopting the multiple-bus architecture from DEC's Alpha chip, which is more scalable because it has parallel channels to memory. But that's more expensive to make, since every motherboard carries the cost of all those duplicate connections.)
One potential benefit of adding cores, though, is better cache utilisation. Each CPU has some cache associated with it, either dedicated to it or shared with other cores. If you have more active tasks on the machine than there are CPUs available, then those tasks will periodically be stopped, taken off the CPU they were running on, and another program given the CPU instead. When the first task gets the chance to run again, perhaps only a millisecond or two later, the cache of data it had built up may have been scribbled over by the other task. This can completely annihilate the performance of both tasks compared with running them isolated from one another. With more cores, fewer tasks have to share each CPU, so each task's cached data has a better chance of surviving until it next runs.
A separate problem is when the first task gets allocated to the first available CPU, and it turns out to be a different one from the CPU it ran on a moment ago; in this case, even if its cached data were still intact, they're not accessible from the new CPU. 'Processor affinity' is a technique used to mitigate this, where long-running tasks are kept on the same CPU where possible. I don't know how well OS X handles this, or whether tasks like individual instruments know how to ask for it, but that would certainly be a factor in getting the best out of any multi-CPU machine.