I have made this suggestion myself a couple of times. Then I decided to look into this kind of development myself and ran a few experiments, just to get an idea of the potential.
The biggest problem with using these cards, especially graphics cards, is that they are built around the fact that video pictures can be processed asynchronously.
This means that you can cut the picture into pieces, hand them to various processors, collect the results at the end, assemble the final picture and show the frame.
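To make that concrete, here is a minimal sketch in Python of the "cut up, process anywhere, assemble at the end" idea. It is only an illustration, not how a graphics card is actually programmed: the function names, the strip size and the invert effect are all made up for the example, and a thread pool stands in for the many processors.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def invert_tile(tile):
    # Hypothetical per-pixel effect: invert 8-bit brightness values.
    return [[255 - px for px in row] for row in tile]


def process_frame(frame, tile_rows=2, workers=4):
    # Cut the frame into horizontal strips; no strip needs data from another.
    tiles = [frame[i:i + tile_rows] for i in range(0, len(frame), tile_rows)]
    results = [None] * len(tiles)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(invert_tile, t): idx for idx, t in enumerate(tiles)}
        # Strips may finish in any order; each result goes back to its own slot.
        for fut in as_completed(futures):
            results[futures[fut]] = fut.result()
    # Only now, at the very end, is the final picture put together.
    return [row for tile in results for row in tile]


# Small made-up test frame: 6 rows of 8 grey values.
frame = [[(x + y) % 256 for x in range(8)] for y in range(6)]
out = process_frame(frame)
```

The order in which the strips finish does not matter, because each one only depends on its own pixels. That independence is what this kind of hardware is good at.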
With audio, you always have to be in sync, in time. You cannot play notes or audio ahead of time and assemble them later. So music/audio is always a function of time. That is what the basic Fourier formulas seem to be about. (If I remember the forum correctly, Dietz had some philosophical discussions with Fraunhofer about this.)
And that is where things like Nvidia fall short. Getting the GeForce (and all the co-cards) to calculate audio effects serially, in sync and in time, is the problem.
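One way to picture the serial dependency is a simple feedback effect, where every output sample needs the output sample before it. The sketch below is my own toy example in Python (the function name, the feedback value and the test signal are assumptions, not taken from any real plug-in): two blocks of audio only give the right result if the second block receives the state left behind by the first.

```python
def feedback_filter(samples, feedback=0.5, state=0.0):
    # Toy feedback effect: y[n] = x[n] + feedback * y[n-1].
    # 'state' carries y[n-1] across block boundaries.
    out = []
    for x in samples:
        state = x + feedback * state
        out.append(state)
    return out, state


# Made-up test signal: a single impulse followed by silence.
impulse = [1.0, 0.0, 0.0, 0.0]

# Reference: process everything in order, as one stream.
whole, _ = feedback_filter(impulse)

# Block by block, handing the state from the first block to the second.
block_a, carry = feedback_filter(impulse[:2])
block_b, _ = feedback_filter(impulse[2:], state=carry)

# Treating the second block as independent (like a picture tile) loses the tail.
block_b_independent, _ = feedback_filter(impulse[2:])
```

Here `block_a + block_b` matches `whole`, while `block_b_independent` is pure silence: the echo tail from the first block is gone. So the second block cannot start until the first one has finished, and on top of that every block has to be ready before the sound card needs it.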
Or you can use the Motorola DSPs, but look at Creamware, Kyma, TC Powercore and all the others: they have the problem that, as "all-purpose PCs" get faster and faster, the advantage of dedicated hardware shrinks more and more.
So I can imagine that, if you are in it for the long run (as I would guess VSL is), your best bet is common HW.