@Aldo Esplay said:
I have to agree with you, Julian, in the sense that if you simply overlaid two samples, even if the velocity was identical and the phases were aligned as closely as possible at the beginning of the sample, the two would be out of phase by the end. However, mdovey's approach would need little modification to do a pretty good job. Given Christian's test with zipping one of the files, I doubt anything this cumbersome is needed, but here's the idea.

Most of the energy of the string is at some fundamental frequency. Since a good number of the keys on a piano have three strings, technically you'd have to determine three close, but not identical, frequencies for these and their phase alignment. For a given key at a given velocity, the amplitude is probably easily described as a function of time. Applying this envelope to the phase-aligned fundamental-frequency mashup and subtracting it from the sample would probably remove most of the energy of the sample. All of the data that was removed could be represented by a handful of parameters that would take only a few bits and could be stored as metadata, and the residual sample data could then be expressed with far fewer bits.

As big a pain in the butt as this would be, all the heavy lifting is on the encoding side; the decoding is much simpler. A lot more refinement would be needed to achieve a 10:1 compression ratio. Neat idea, but why waste the time developing some crazy algorithm from scratch when you can just tailor something like zip and get the same result? I think Football is right, this thread is nuts. We'd probably need a new thread just to debate the compression algorithm.
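A minimal sketch of the "subtract the fundamental, keep the residual" idea described in the quote above, assuming a crude one-partial-per-frame model; the function names, frame size, and parameterization are illustrative guesses, not anything from the thread or from VSL:

```python
import numpy as np

def encode_sample(x, sr, frame=2048):
    """Per frame: estimate the strongest partial, subtract it, keep the residual.
    Returns a few parameters per frame (the 'metadata') plus the residual signal."""
    params = []
    residual = np.array(x, dtype=np.float64)
    win = np.hanning(frame)
    for start in range(0, len(x) - frame + 1, frame):
        seg = residual[start:start + frame]
        spec = np.fft.rfft(seg * win)
        k = np.argmax(np.abs(spec[1:])) + 1          # strongest non-DC bin
        freq = k * sr / frame                        # rough frequency estimate
        amp = 2.0 * np.abs(spec[k]) / np.sum(win)    # rough amplitude estimate
        phase = np.angle(spec[k])
        t = np.arange(frame) / sr
        partial = amp * np.cos(2 * np.pi * freq * t + phase)
        residual[start:start + frame] -= partial     # remove most of the frame's energy
        params.append((freq, amp, phase))            # a handful of numbers per frame
    return params, residual

def decode_sample(params, residual, sr, frame=2048):
    """Decoding is much simpler: re-synthesize each partial and add it back."""
    x = np.array(residual, dtype=np.float64)
    t = np.arange(frame) / sr
    for i, (freq, amp, phase) in enumerate(params):
        start = i * frame
        x[start:start + frame] += amp * np.cos(2 * np.pi * freq * t + phase)
    return x

# Round-trip check on a synthetic decaying "string" tone.
sr = 44100
t = np.arange(sr) / sr
tone = np.exp(-3 * t) * np.sin(2 * np.pi * 440 * t)
params, res = encode_sample(tone, sr)
restored = decode_sample(params, res, sr)
print(np.max(np.abs(restored - tone)))   # only floating-point rounding remains
```

Because the decoder adds back exactly what the encoder subtracted, the round trip loses nothing; the hoped-for saving would come from the residual being much quieter, and therefore cheaper to code, than the original sample.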
I may be misinterpreting this, but weren't Christian's results a saving of just 8% with the zip and 29% with the rar? That is lossless, in that the complete file is re-created on decoding (un-zipping). So the question is: is the VSL system actually a zip-like function where the exact file is reconstructed on decoding (which I would guess is unlikely at 10:1), or in fact data compression where 90% of the original data is removed forever? And if that is the case, how much is it changing the original sound quality? I would have thought most good musicians/engineers would detect a 10:1 compression ratio in an A/B comparison, however good the algorithms used - particularly when there is low-level ambience involved (the room mic recordings).
Julian
Hmm. That'd change everything. Zip primarily uses "deflate", which makes use of LZ77, a dictionary coder. Christian hinted that the VSL is using a dictionary-matching coder, so I'd guess it's a lot like zip. I'm guessing that either the sliding window is larger to increase the chances of a match, or that a static dictionary is used instead to reduce the dictionary size. With this being a fixed library and the compression being geared entirely to this specific library, I'd guess the latter. Word size makes a big difference: at 24 bits, if you can represent 10 consecutive samples (240 bits) with a single 24-bit dictionary reference, then you achieve 10:1 for that one run of samples. With the entire library available, an extremely optimized static dictionary could be built. Although it would take forever to build, as long as the entire dictionary can be read into memory (say 2 MB), the CPU can assemble the actual sample file as it is read from disk.
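For illustration, here is a toy static-dictionary coder along the lines guessed at above; the dictionary contents, token layout, and sizes are made-up assumptions, not VSL's actual format or zip's deflate:

```python
from typing import Dict, List, Tuple

def encode(samples: List[int], dictionary: List[Tuple[int, ...]]) -> List[Tuple[str, int]]:
    """Greedy longest match against a fixed dictionary of sample runs.
    Emits ('ref', index) for a matched run and ('lit', sample) otherwise."""
    by_first: Dict[int, List[int]] = {}
    for idx, entry in enumerate(dictionary):       # index entries by their first sample
        by_first.setdefault(entry[0], []).append(idx)

    tokens: List[Tuple[str, int]] = []
    i = 0
    while i < len(samples):
        best_idx, best_len = -1, 1
        for idx in by_first.get(samples[i], []):
            entry = dictionary[idx]
            if len(entry) > best_len and tuple(samples[i:i + len(entry)]) == entry:
                best_idx, best_len = idx, len(entry)
        if best_idx >= 0:
            tokens.append(("ref", best_idx))       # e.g. a 24-bit index covering best_len samples
            i += best_len
        else:
            tokens.append(("lit", samples[i]))     # no match: store the 24-bit sample as a literal
            i += 1
    return tokens

def decode(tokens: List[Tuple[str, int]], dictionary: List[Tuple[int, ...]]) -> List[int]:
    """Reconstruction is just lookups and copies: expand references, pass literals through."""
    out: List[int] = []
    for kind, value in tokens:
        if kind == "ref":
            out.extend(dictionary[value])
        else:
            out.append(value)
    return out

# Tiny usage example with a made-up 10-sample dictionary entry.
dictionary = [(0, 1, 2, 3, 4, 5, 6, 7, 8, 9)]
stream = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 42]
tokens = encode(stream, dictionary)            # [('ref', 0), ('lit', 42)]
assert decode(tokens, dictionary) == stream    # bit-exact reconstruction
```

The arithmetic behind the 10:1 figure: a ('ref', index) token covering ten 24-bit samples replaces 240 bits with a single 24-bit index, i.e. 10:1 for that run; the overall ratio depends on how often the coder has to fall back to literals. And since decoding only looks entries up and copies them, the reconstruction is bit-exact, which fits the lossless scenario.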
If this is the case, then it would be lossless. Hopefully VSL has been able to optimize the dictionary to do it in a lossless manner; I'm sure they wouldn't go through all the trouble of sampling this much data just to screw it up with compression. More likely they had a goal of compressing as much as possible without losing data, and the magic ratio came out to around 10:1.
Aldo