@cm said:
ok, i was understanding the term *sample* in the sense it is used in the VSL product description (violin staccato has xy samples, etc.).
if we look at it in a digital manner we have to clarify some definitions first:
- the headers of a PCM/RIFF file are using a 32bit data format (this is the reason why a RIFF file is limited to 4 GB maximum, usually even 2 GB because of the difference between signed and unsigned integer)
- the sampling bit-depth can be 8/16/24/32 bit (VSL is 16 bit currently; of course theoretically it can be lower than 8 bit but i'm sure we wouldn't like that. additionally 8 bit is stored as unsigned integer, whereas 16 bit is stored as signed integer and 32 bit would typically be floating point)
- the sampling rate can be any value (VSL is using 44100 Hz currently - /2 = 22050, /4 = 11025, /8 = 5512.5 ... oops, here we would leave the integers and would have to start rounding, because i can't display half a sample)
- PCM data is stored interleaved in frames and each frame consists of sequential data for the used channels (2 in case of VSL because of stereo)
i'm basically referring to this description
- the Nyquist-Shannon theorem says that unambiguous reconstruction is possible if the signal is bandlimited and the sampling frequency is greater than twice the signal bandwidth. that's why telephone lines don't sound too good and i think a sampling frequency of 11025 Hz would leave us with an unacceptable *quality*.
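To put numbers on the halving and the Nyquist limit mentioned above, here is a small Python sketch. The function name `halved_rates` is made up for illustration; the only inputs taken from the post are the 44100 Hz base rate and the /2, /4, /8 divisors.

```python
from fractions import Fraction

BASE_RATE = 44100  # Hz, the rate quoted above for VSL


def halved_rates(base_rate, steps):
    """Return (divisor, rate, nyquist_limit) for divisors 1, 2, 4, ...

    Fractions are used so that a non-integer rate such as 44100/8
    shows up exactly instead of being rounded silently.
    """
    rows = []
    for k in range(steps):
        divisor = 2 ** k
        rate = Fraction(base_rate, divisor)
        rows.append((divisor, rate, rate / 2))
    return rows


for divisor, rate, nyquist in halved_rates(BASE_RATE, 4):
    kind = "integer" if rate.denominator == 1 else "not an integer"
    print(f"/{divisor}: {float(rate):8.1f} Hz ({kind}), "
          f"signal bandwidth must stay below {float(nyquist):.2f} Hz")
```

At /4 the rate is still a whole number (11025 Hz), but only content below roughly 5.5 kHz can be represented; at /8 the rate itself stops being an integer.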
one could possibly now *invent* (=interpolate) the missing samples to stay with 44100 or repeat the *reference sample* but i've never tried that and probably i wouldn't like to hear the result. do you have any examples for that to compare the quality using a human ear?
based on this - where would you cut the amount of information in half (and further to /4, /8, etc.) without losing an acceptable sound?
another issue is the needed amount of preloaded samples. this is basically only necessary to fill the various buffers (soundcard, audio application, operating system, harddisk I/O) and *hide* the various latencies.
128 kB (= 32,768 samples 16 bit stereo) is a common value for most sampling engines - ViennaInstruments needs far less (and is 24 bit!).
256 Byte (= 64 samples 16 bit stereo) is a common value for soundcards - many users need to use a higher value because their system cannot deliver such a data stream continuously.
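The two buffer figures above can be checked with a few lines of Python. This is only a sketch: the helper names are hypothetical, and "sample" is taken to mean one stereo frame (one 16-bit value per channel), which is what makes the numbers in the post come out.

```python
def frames_in_buffer(buffer_bytes, bits_per_sample=16, channels=2):
    """Number of complete frames (one value per channel) that fit in the buffer."""
    bytes_per_frame = (bits_per_sample // 8) * channels
    return buffer_bytes // bytes_per_frame


def buffer_duration_ms(buffer_bytes, sample_rate=44100,
                       bits_per_sample=16, channels=2):
    """Playback time covered by the buffer, in milliseconds."""
    frames = frames_in_buffer(buffer_bytes, bits_per_sample, channels)
    return 1000.0 * frames / sample_rate


for label, size in (("preload buffer", 128 * 1024), ("soundcard buffer", 256)):
    print(f"{label}: {size} bytes = {frames_in_buffer(size)} frames "
          f"= {buffer_duration_ms(size):.2f} ms at 44100 Hz")
```

So the 128 kB preload covers a bit under three quarters of a second per sample, while the 256 byte soundcard buffer covers about a millisecond and a half.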
any efforts in this direction (reducing the amount of preloaded data) are useless IMO and stay theoretical unless we get storage media with an almost-zero seek time like compact-flash or solid state disks with enough capacity for a reasonable price.
christian
Thanks for the detailed reply Christian. I think we're getting to an interesting place now...
This is how I envisaged this working (and I'm fully prepared to be told that it doesn't work - it is just an idea)
1) You have a 'quality' value that can be controlled on the VI which defaults to 1, but can be changed to 2 or 4. As you say, values larger than 4 would not only probably sound bad, but would also start to cause rounding issues, as 44100/8 is not a whole number. That isn't to say it isn't possible of course.
2) When pre-loading or streaming samples from disk, you use this value to only load every nth sample value in the file, skipping the samples in between. In other words with a value set to 4, you take sample 0,4,8,12,16 etc.
This in turn will reduce the pre-load buffer by a factor of n. In other words, with a value of 4, we could theoretically load 4x as many instruments into memory. What you end up loading is a less accurate sample waveform, but one that is much smaller.
3) When playing back the sample (at 44100), you repeat the same sample n times so that it appears to be a sample recorded at 44100. At no point does the VI change its sample rate. It always remains at 44100.
There are improvements on the above. As you say, you could interpolate the samples rather than playing 4 identical samples; you still save on memory but synthetically create slightly different values for each played sample so that it sounds closer to the original.
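As a rough sketch of steps 2 and 3 plus the interpolation variant, something like the following Python would do for a single channel held as a list of integers. The function names are invented for illustration and nothing here reflects how the VI actually works. Note that simply skipping samples applies no low-pass filter first, so content above the new Nyquist limit is not removed but folds back as aliasing.

```python
def decimate(samples, n):
    """Step 2: keep samples 0, n, 2n, ... and drop the rest (no filtering)."""
    return samples[::n]


def hold(reduced, n, length):
    """Step 3: repeat each kept sample n times, trimmed to the original length."""
    out = []
    for value in reduced:
        out.extend([value] * n)
    return out[:length]


def interpolate_linear(reduced, n, length):
    """Variant: draw a straight line between neighbouring kept samples.

    The last kept sample has no right-hand neighbour, so it is held.
    """
    out = []
    for i, value in enumerate(reduced):
        nxt = reduced[i + 1] if i + 1 < len(reduced) else value
        for k in range(n):
            out.append(round(value + (nxt - value) * k / n))
    return out[:length]


original = [0, 10, 20, 30, 40, 50, 60, 70, 80]
reduced = decimate(original, 4)
print(reduced)                                            # [0, 40, 80]
print(hold(reduced, 4, len(original)))                    # [0, 0, 0, 0, 40, 40, 40, 40, 80]
print(interpolate_linear(reduced, 4, len(original)))      # [0, 10, 20, 30, 40, 50, 60, 70, 80]
```

The ramp used here is a best case for linear interpolation, since it reconstructs a straight line exactly; real instrument recordings will not be this forgiving.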
After all, this is how computer animation works. Quite often frames are interpolated from the frames around it to speed up workflow.
I would like to experiment with this... although I'll have to read up on how to read a wav file programmatically... what would be an interesting test would be to read a 44100 wav file into memory and write out different versions of the same file using different settings as above. Also try out the interpolation. I may look into this and get back to the forum and post the results for people's assessment of the quality.
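For the experiment itself, Python's standard `wave` and `array` modules are probably enough to get started. The sketch below assumes uncompressed 16-bit PCM input; the helper names and the file names in the usage comment are placeholders. It keeps every nth frame and repeats it n times, per channel, so the output file stays at the original rate and length.

```python
import array
import sys
import wave


def read_wav_16bit(path):
    """Return (params, samples); samples is an interleaved array of int16."""
    with wave.open(path, "rb") as wf:
        params = wf.getparams()
        if params.sampwidth != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        raw = wf.readframes(params.nframes)
    samples = array.array("h")
    samples.frombytes(raw)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    return params, samples


def reduce_and_hold(samples, channels, n):
    """Keep every nth frame and repeat it n times (same length out).

    Frames are interleaved (L, R, L, R, ...), so indexing goes by
    frame number times channel count, plus the channel offset.
    """
    out = array.array("h", samples)
    frames = len(samples) // channels
    for f in range(frames):
        src = (f // n) * n  # index of the frame that was "kept"
        for c in range(channels):
            out[f * channels + c] = samples[src * channels + c]
    return out


def write_wav_16bit(path, params, samples):
    """Write interleaved int16 samples using the source channel count and rate."""
    data = array.array("h", samples)
    if sys.byteorder == "big":
        data.byteswap()
    with wave.open(path, "wb") as wf:
        wf.setnchannels(params.nchannels)
        wf.setsampwidth(2)
        wf.setframerate(params.framerate)
        wf.writeframes(data.tobytes())


# Example usage (file names are placeholders):
# params, samples = read_wav_16bit("violin.wav")
# for n in (2, 4):
#     reduced = reduce_and_hold(samples, params.nchannels, n)
#     write_wav_16bit(f"violin_div{n}.wav", params, reduced)
```

Swapping the hold step for an interpolating one would then give a second set of files to compare by ear against the original.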