Vienna Symphonic Library Forum

  • ok, let me try to explain 😊

    I have to admit that I haven't used VSL yet, I only have experience with Synthogy's Ivory, but I guess the way it basically works is the same.

    My understanding of how it currently works is as follows: When the engine initially starts, it reads the first part of each sample (of each sample set the user has selected) into RAM, let's say the first 10ms. This amounts to quite a few GB. Additionally, the engine allocates a buffer for each voice of polyphony. These buffers can hold a much longer period, let's say 100ms. These buffers are cheap, because you need only very few of them compared to the number of buffers you need for the first 10ms of each sample.

    The moment a (MIDI) event arrives, the engine can start playing the sample right away, because it has 10ms buffered in RAM. It starts playing from this buffer. Simultaneously it allocates one of the 100ms buffers and directs the HDD driver to fill it with the sample data from 10ms-110ms. After the first 10ms have been played, hopefully enough data has arrived from HDD to continue from the larger buffer. This buffer will constantly be refilled from HDD until the sample ends or playback has been terminated. Afterwards the 100ms buffer will be released. If the sample is played again, the data starting from 10ms will have to be fetched from HDD again. The 10ms is chosen so that the HDD has enough time to respond.
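The preload-then-stream scheme described above can be sketched as a toy model. This is only an illustration under stated assumptions: the class name `StreamingEngine`, the frame counts and the in-memory "disk" are all hypothetical, and the reads happen one after another here, whereas a real engine would overlap disk access with playback.

```python
class StreamingEngine:
    """Toy model: 'disk' is a dict holding one list of frames per sample."""

    def __init__(self, disk, preload_frames, stream_frames):
        self.disk = disk
        self.preload_frames = preload_frames
        self.stream_frames = stream_frames
        self.disk_reads = 0
        # At startup, copy the head of every sample into RAM.
        self.heads = {name: frames[:preload_frames] for name, frames in disk.items()}

    def play(self, name):
        """Yield the frames of one note: RAM head first, then streamed chunks."""
        yield from self.heads[name]
        pos = self.preload_frames
        data = self.disk[name]
        while pos < len(data):
            # Simulated disk access refilling the per-voice stream buffer.
            chunk = data[pos:pos + self.stream_frames]
            self.disk_reads += 1
            yield from chunk
            pos += len(chunk)


disk = {"c4_forte": list(range(25))}          # hypothetical sample of 25 frames
engine = StreamingEngine(disk, preload_frames=10, stream_frames=6)
played = list(engine.play("c4_forte"))
print(played == disk["c4_forte"])  # True
print(engine.disk_reads)           # 3
```

The point of the sketch is only that the RAM cost grows with the number of samples (one head each), while the stream buffer cost grows with the number of voices playing.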

    Now the same in my 3 layer model:

    At installation time, we copy the first 10ms of each installed sample to SSD. This data will stay there until the sample set is uninstalled.

    At engine startup, the engine reads the first 1ms of each sample into RAM. Additionally it allocates the 100ms buffers as above.

    The moment a (MIDI) event arrives, the engine can start playing the sample right away, because it has 1ms buffered in RAM. It starts playing from this buffer. Simultaneously it allocates one of the 100ms buffers and directs the SSD driver to fill the first 9ms of the buffer with the sample data from 1ms to 10ms. Also, it directs the HDD driver to fill the rest of the buffer with the sample data from 10ms-101ms. After the first 1ms has been played, hopefully enough data has arrived from SSD to continue from the larger buffer. After 10ms the data from HDD will have arrived so that there will be no disruption in playback. After that, this buffer will constantly be refilled from HDD in the same way as above until the sample ends or playback has been terminated. Afterwards the 100ms buffer will be released. If the sample is played again, the data will have to be fetched from SSD and HDD again. The 1ms is chosen so that the SSD has enough time to respond, the 1+9=10ms is chosen so that the HDD has enough time to respond.
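A minimal sketch of the three-layer timing, assuming the example figures from the post (1ms in RAM, 1ms to 10ms on SSD, the rest on HDD) and assuming both read requests are issued at note-on. The function names and the latency values passed in are hypothetical; they are not measurements.

```python
def source_tier(offset_ms, ram_ms=1, ssd_ms=10):
    """Which layer supplies the audio at a given offset into the sample."""
    if offset_ms < ram_ms:
        return "RAM"
    if offset_ms < ssd_ms:
        return "SSD"
    return "HDD"


def late_tiers(ssd_latency_ms, hdd_latency_ms, ram_ms=1, ssd_ms=10):
    """Layers whose data would arrive after playback already needs it.

    SSD data is needed once the RAM part is used up (at ram_ms),
    HDD data once the SSD part is used up (at ssd_ms).
    """
    late = []
    if ssd_latency_ms > ram_ms:
        late.append("SSD")
    if hdd_latency_ms > ssd_ms:
        late.append("HDD")
    return late


print([source_tier(t) for t in (0, 0.5, 1, 9.9, 10, 50)])
# ['RAM', 'RAM', 'SSD', 'SSD', 'HDD', 'HDD']
print(late_tiers(ssd_latency_ms=0.2, hdd_latency_ms=8))    # []
print(late_tiers(ssd_latency_ms=0.2, hdd_latency_ms=12))   # ['HDD']
```

An empty list means no dropout at the two hand-over points; it says nothing about the later steady-state refills from HDD.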

    The model described here is based on how I would implement sample streaming on first impulse, because I have never done it. Please correct me if some (or all 😉) of my basic assumptions are wrong.

    @Another User said:

     

    pps: it doesn't matter if it is PPC, intel, sparc, alpha, windows, OS X, solaris, irix, BSD, linux, whatever .... sample streaming has its rules everywhere ...

    Sample streaming has, but latency has not. Clearly there are well-behaved systems and systems that are not. In my experience Windows is a nightmare regarding latency.


  • so if i understood your model right now, you would use the flash just as *doubling* the *second part* of the sample header. but then again the *problem* as known starts as soon as the flash buffer is empty, then the usual harddisk access starts ("buffer filled constantly from the harddisk" - that's as it is done now)

    down into the math (not 100% accurate for all possible cases) - say one instrument is A/B blended and has release samples: we need 4 x 64KB sample header (per note and velocity layer), which affects only the needed memory for the moment, so a loaded preset/matrix/patch needs between a few and 800 MB or even more. (note: this amount could only be reduced if the sample header = preload buffer can be configured shorter).
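As a rough check of the memory figures above: preload memory scales with the number of note/velocity slots. The sketch takes the 4 x 64 KB per slot from the post; the slot count of 3200 is a hypothetical value, chosen only because it reproduces the 800 MB upper figure, and `preload_mb` is an illustrative name.

```python
HEADER_KB = 64          # preload buffer ("sample header") per sample, from the post
SAMPLES_PER_SLOT = 4    # A/B blend plus release samples, from the post


def preload_mb(slots, header_kb=HEADER_KB, samples_per_slot=SAMPLES_PER_SLOT):
    """Preload memory in MB (1 MB = 1024 KB) for a number of note/velocity slots."""
    return slots * samples_per_slot * header_kb / 1024


# 3200 is a hypothetical slot count, not a figure from the thread.
print(preload_mb(3200))                # 800.0
print(preload_mb(3200, header_kb=8))   # 100.0
```

The second line shows the effect of a header that is eight times shorter, which is the only lever the post names for reducing this amount.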

     

    on the other hand (since almost all VI-instruments are monophonic so far) we only need to stream 4 samples at a given time (using the example above) more or less in the way you describe it - preload buffer gets emptied on the front and refilled from harddisk on the back.

    simplified: the preload buffer with 64 KB equals ~370 ms (thanks to the on-the-fly decompression, otherwise it would be only ~250 ms) so this buffer needs to be filled with a new 32KB portion after 175 ms, otherwise the buffer would run empty during the next cycle. this results in 22.8 access events per second (for a single *track* only), each of which has to be executed and finished within the 175 ms above ... in fact it is a) more and b) we don't have as much time.

     

    seen the other way round: with 10ms in RAM + 90ms on flash (for each sample in the instrument) + a 4 x 100ms buffer (reserved for the currently played *sample set*) we are fine for the first 100 ms, but then the process described in the paragraph above starts again (access from harddisk), just with the difference that we now have 85 access events per second and only 47 ms left to execute and finish the refill process of the buffer.

     

    given we use a modern SATA II drive with an average seek time of 4.5 ms, we spend 10% of the time left just for allowing the harddisk to look where the needed portion resides - then we can start with the actual process to read and write the data and begin thinking about file system, harddisk interface and memory buffers, I/O waitstates and everything else.
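The arithmetic behind these figures can be reproduced in a few lines. This is a sketch that simply takes the refill deadlines (175 ms and 47 ms), the 4 concurrent streams and the 4.5 ms seek time from the posts as inputs; the function names are illustrative.

```python
def access_events_per_second(streams, refill_deadline_ms):
    """Each stream needs one refill per deadline interval."""
    return streams * 1000 / refill_deadline_ms


def seek_share(seek_ms, refill_deadline_ms):
    """Fraction of the refill deadline spent on the average seek alone."""
    return seek_ms / refill_deadline_ms


print(round(access_events_per_second(4, 175), 1))  # 22.9
print(round(access_events_per_second(4, 47), 1))   # 85.1
print(round(seek_share(4.5, 47) * 100, 1))         # 9.6
```

The first value is 22.857 before rounding, which the post truncates to 22.8; the last value is the roughly 10% of the remaining time that goes into seeking.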

     

    the principle as such is of course worth thinking about and would reduce the needed memory, say for a setup needing 24 GB RAM now, to 650 MB + 6.5 GB flash memory + a harddisk system capable of handling at least 5,000 such I/O processes per second, and everything without *crapping* the CPU with I/O alone (since the CPU still has its main job rendering the audio according to the rules within the VI player)

     

    so my point of view is it would only help to have everything on flash and then reduce the preload buffer (presumably divided by 8), given the various other latencies allow a preload buffer of 46 ms. some streaming engines currently allow reducing the preload to 16 KB per sample under certain conditions, but we have to keep in mind it has to work under every condition, especially under high load (many tracks, complex instruments)

     

    anyway, thanks for your input on this topic, clearly there are more routes traveling to rome ;-)

    christian


    and remember: only a CRAY can run an endless loop in just three seconds.
  • I see you're still not completely enthusiastic, so please give me one last shot ;-)

    I made the mistake of naming some numbers without knowing the real ones. So please take my '1, 10, 100ms' only as an example and replace them with the numbers you really use. My intent was not to increase the I/Os per second. I'll try again:

    Currently,

     - you preload 64k for every possible stream. This gives you 350ms for the first HDD access
     - the HDD is accessed in portions of 32k every 175ms

    Change it to the following:

     - you preload 8k for every possible stream. This gives you 45ms for the first _SSD_ access. You agreed that reducing the preload buffer to 1/8 should be possible in the case of SSD
     - read 2x32k from SSD to fill your buffer. This gives you 300ms (given 40ms access time for the first 32k and 10ms for the second) for the first HDD access
     - beginning from the third access, read from the HDD in 32k portions every 175ms as usual

    So I don't want to add any accesses, just direct the first two to the SSD instead of the HDD. There will be no additional I/O or CPU load. Also you don't need much bandwidth to the SSD, because you need it for the first 64k of each sample only. Given a bandwidth of 50MB/s stated in an earlier post this would give you 780 events/samples per second, independent of the polyphony. For a massive polyphony you only need fast harddrives. Experimenting with the buffers might show that it is sufficient to read only 48k instead of 64k, yielding 1000 samples/s.

    As a further optimization, as mentioned in my last post, you can start the HDD access right away and not only after the data from SSD has arrived.

    Cheers,

    Arne
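The figures in Arne's post can be checked with a short calculation. The sketch assumes 1 MB = 1000 KB, which is what reproduces the "780" in the post; the function names are illustrative and the access times are the ones he names, not measurements.

```python
def note_starts_per_second(ssd_kb_per_s, head_kb):
    """How many sample heads per second the SSD bandwidth can deliver."""
    return ssd_kb_per_s / head_kb


def hdd_budget_ms(buffer_ms, ssd_read_times_ms):
    """Time left for the first HDD access once the SSD reads are done."""
    return buffer_ms - sum(ssd_read_times_ms)


print(note_starts_per_second(50_000, 64))          # 781.25
print(round(note_starts_per_second(50_000, 48)))   # 1042
print(hdd_budget_ms(350, [40, 10]))                # 300
```

The limit applies to note starts per second only; sustained polyphony still depends on the harddrives, as the post says.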


  • I'd like to go back to the motivation, that led to my initial proposal. My goal was to save Steffen from selling his car to buy RAM, because without a car he cannot give me piano lessons anymore.

    If I understand him correctly he edits his arrangement in Cubase and plays it back to review the results. He does not play any instrument live, so latency does not matter to him. To save RAM you currently have two options:

    a) bounce some tracks to disk so they do not have to be rendered on every playback

    b) only preload the notes/velocities he really needs.

    Both approaches are a bit cumbersome and time-consuming.

    Why not add a 'high latency mode', where nothing is preloaded? One could delay playback for about 300ms and use the time to load the samples needed. This way you need not choose in advance which instruments/notes/velocities you need, you'd just have your complete library at your disposal, without the need for a single GB of RAM.

    AFAIK Cubase is able to compensate for the latency, so it should be possible to even play some Instruments live in the conventional way. The only drawback I can see is that you have to wait 300ms after hitting 'play' before playback starts.
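A minimal sketch of such a high latency mode, assuming the engine sees each MIDI event at its nominal time and delays the audio by a fixed lookahead. The names `schedule` and `fits` and the event tuples are hypothetical; only the 300ms comes from the post.

```python
def schedule(events, lookahead_ms=300):
    """events: list of (time_ms, note).

    Returns (load_at_ms, sound_at_ms, note): loading starts when the event
    is seen, the note sounds one lookahead later.
    """
    return [(t, t + lookahead_ms, note) for t, note in sorted(events)]


def fits(load_time_ms, lookahead_ms=300):
    """True if a sample head can be fetched from disk within the lookahead."""
    return load_time_ms <= lookahead_ms


events = [(500, "E4"), (0, "C4")]      # hypothetical MIDI events
print(schedule(events))                # [(0, 300, 'C4'), (500, 800, 'E4')]
print(fits(load_time_ms=120))          # True
print(fits(load_time_ms=450))          # False
```

Since every note is shifted by the same amount, the relative timing inside the arrangement stays intact; only the start of playback moves.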

    Just a thought.

    Arne 


  • @arne said:

    There will be no additional I/O or CPU load.

     

    There will be: you just have to add the SSD I/Os to the current workflow, so the I/Os in CPU/RAM will be roughly doubled. And there's still the issue that the user would be able to do anything on his SSD drive via Explorer, which you don't have any control over... so missing files there will easily crash the system.

    The way I understood it now for a 4 part sample, each letter representing a 32kb part of the buffer, the brackets [..] the 64kb preload amount:

    [AB] [CD] [EF] [GH]

    When the sample of [AB] is accessed, you will read A into CPU, and request the following 32kb (which are needed after B) from HDD to be filled in A while B is transferred to CPU?

    I thought that [AB] would be left untouched and for each voice of polyphony there is a new buffer [XY] in RAM where X is the 32kb after B, and Y the 32kb after X, and meanwhile X will get filled with the next 32kb after current Y.

    How else would you deal with the situation if a sample is retriggered again within that first 350ms or while the sample is still streaming? In case you don't want any data loss...
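The second variant (preload [AB] left untouched, one private [XY] buffer per voice) can be sketched as follows. This is a toy model under stated assumptions: `CHUNK` stands in for 32 KB, the "disk" is a list of frames, the class name `Voice` is hypothetical, and reads are done in sequence rather than in parallel with playback.

```python
CHUNK = 4  # frames per half buffer; stands in for one 32 KB part


class Voice:
    """One playing voice with its own two-part stream buffer [XY]."""

    def __init__(self, preload, disk):
        self.preload = preload                  # shared [AB], only ever read
        self.disk = disk
        start = len(preload)
        self.halves = [self._read(start), self._read(start + CHUNK)]  # X, Y
        self.next_offset = start + 2 * CHUNK

    def _read(self, offset):
        return self.disk[offset:offset + CHUNK]  # simulated disk access

    def render(self):
        out = list(self.preload)
        i = 0
        while self.halves[i]:
            out.extend(self.halves[i])           # play X (or Y) ...
            self.halves[i] = self._read(self.next_offset)  # ... then refill it
            self.next_offset += CHUNK
            i = 1 - i
        return out


disk = list(range(18))
preload = tuple(disk[:2 * CHUNK])       # [AB], loaded once at startup
first, retrigger = Voice(preload, disk), Voice(preload, disk)
print(first.render() == disk, retrigger.render() == disk)  # True True
print(preload == tuple(disk[:8]))                            # True
```

Because no voice writes into [AB], a retrigger inside the first 350ms or during streaming simply gets a second [XY] buffer.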

    PolarBear

    PS: Funny thing - with brackets alone [] forum software won't display the second one...


  • @arne said:

    - you preload 64k for every possible stream. This gives you 350ms for the first HDD access

    not really ... because of the nature of buffers harddisk access has to begin after 175ms ... at the latest ... if it only started at 350 ms the buffer would already be empty. also, dividing the buffer into only two portions was just an analogy to soundcard buffers (the ones for output of wave data) to simplify the math in my example. the same would apply to any buffer filled from SSD, so it is already getting rather complex when and how to switch the source, without even starting to think about the behaviour of the threads handling all this.

    our developers already optimized the engine in such a detailed way (including the monolithic data format as source) that only one plain principle will lead to significantly more performance: much helps much - in this case speed.

    christian


  • @arne said:

    The only drawback I can see is that you have to wait 300ms after hitting 'play' before playback starts.

    this would be unacceptable for a large portion of our userbase, at least for those who need to add tracks using a keyboard ...

    such a high latency mode would be reserved for the next link in the chain, the MIR - buffers need to be much larger there by design and realtime playing (without or with almost no latency) would be possible there only at the expense of quality.

    christian


  • @Another User said:


    our developers already optimized the engine in such a detailed way (including the monolithic data format as source) that only one plain principle will lead to significantly more performance: much helps much - in this case speed.

    christian

    You should ask your developers if they thought about this. If I didn't already have a good job I'd build a business case around these two ideas...


  • @arne said:

    The only drawback I can see is that you have to wait 300ms after hitting 'play' before playback starts.

    this would be unacceptable for a large portion of our userbase, at least for those who need to add tracks using a keyboard ...

    christian

    Hi Christian,

    I admit I have been reading this thread with great interest in the meantime, though I am of course not that deep into software optimization. But as far as I understand Arne, he seems to propose that we won't need any exaggerated RAM hype if we accept the 300ms latency for loading the samples entirely from HDD.

    Doesn't that sound promising: working with half a terabyte of samples on one machine without being limited by any RAM capacity, just accepting that one can't use the MIDI keyboard in realtime.

    Why not allow the software to do that as an option for those who work mainly with large scores in sequencers? For now, working with large sophisticated scores, which VSL is already well prepared to cope with musically, still forces you to make more or less extensive use of the RAM optimization function.

    I admit the current VI RAM optimization is - compared to former days - an ingenious feature. But it would still be more comfortable to keep all necessary Level II articulations and velocities available for minor tweaks and adjustments than to optimize other patches in order to reload those you want to change, just because there are some users who sometimes like to program their MIDI events by playing in realtime.

    OK, nobody should be forced to accept 300ms latency if he prefers to boost his hardware setup to play all his MIDI events in real time, but why should everyone need little servers to run VSL just because some like to have that realtime option? (No one would play a whole Mahler score or anything like it in realtime ... at least that would presumably neither really speed up nor improve the workflow in any way 😉)

    VSL already seems to be mentally prepared to sacrifice that realtime principle in order to get MIR to work for your user base.

    So I suppose the necessary latency for MIR would be accepted even more easily, the earlier one can already experience, as an option, its real advantages for the efficiency of hardware usage and its impact on the workflow.

    Meanwhile I won't be forced to sell any car to upgrade my hardware in any way, and I am actually just waiting for the moment when I can rely on some more experience as to which hardware setup would make the best use of 64-bit systems. But since you opened up the VSL user base to those who are lucky enough to afford the SE, this option might make life easier for those who won't buy server hardware to use their SE or Download VIs.

    In short, a total direct-from-disk mode as Arne proposed it still seems to me an interesting additional option, if at all possible.

    best

    Steffen


  • steffen, maybe my statement it would be *worth thinking about* has been drowned out by the amount of involved details ...

     

    direct from disk is also what NI calls its (patented?) technology, which likewise cannot do without buffers for sample headers .. such a *read ahead* technology is a totally different approach to sample based arrangement - we could do a lot of midi detection and processing during such 300 ms, which would finally result in a kind of *sample rendering* engine and might also be worth thinking about.

    the necessary latency for the MIR is caused by the nature of convolution and i'd say there is nothing to be sacrificed ... compare with CGI (Computer Generated Images) ... to draw a simple shadow works easily in realtime, to show even the first reflections (from objects within a scene) works too, but to render a realistic movie containing several layers, a dozen light sources and reflections (meaning the depth of reflecting light on surfaces, not the number of reflections) is impossible in realtime - you would either need incredible CPU power and a high delay or accept offline rendering.

    AFAIK convolving in sequoia adds at least 1024 or even 2048 to the latency, and the impulses are neither very many nor very long in this case ....

    christian


  • @arne said:

    You should ask your developers if they thought about this. If I didn't already have a good job I'd build a business case around these two ideas...

    If you are really considering this idea, I'd make my user base a lot, if not indefinitely, larger by providing SSD benefits for all possible applications - you'd need to buffer/copy the first portion of every file to SSD to overcome HDD seek time and have a little RAID-like controller manage your files and the read and write operations, e.g. in an encapsulated external bay. I think a 64GB SSD drive should be enough for buffering 1 million files; a typical 500GB drive should hold around 300k files at most, usually fewer.
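A quick calculation of what those numbers mean per file. The sketch assumes decimal units (1 GB = 10^9 bytes) and an even split of the SSD across all files; `head_bytes_per_file` is an illustrative name.

```python
GB = 10**9  # decimal gigabyte, an assumption


def head_bytes_per_file(ssd_bytes, file_count):
    """Bytes of each file's start that fit on the SSD with an even split."""
    return ssd_bytes // file_count


print(head_bytes_per_file(64 * GB, 1_000_000))  # 64000
print(head_bytes_per_file(64 * GB, 300_000))    # 213333
```

So 1 million files get about 64 KB each, which happens to be the preload size discussed earlier in the thread, and 300k files would get roughly 213 KB each.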

    Actually the drawback here is, that it wouldn't really work with monolithic files 😉

    PolarBear 


  • btw ... this reminds me of something else (related to my initial misunderstanding of the three-tier model above) ....

    AFAIK all notebooks with the VISTA logo need to have hybrid-disks (normal harddrive + some flash memory to hold often used data) since june 2007 ....

    now lately i've read a report about an extensive test how much such hybrid-disks in fact do speed up system boot / application start / loading data ...

    guess what the result has been ... a shattering one .. no or almost no effect ... the difference between *slower* and *faster* drives was much more significant than any performance gain using the hybrid-thingies ....

    a not so shattering but nevertheless relatively poor result for waking up from hibernate mode (memory written to disk when the computer *sleeps*) if using flash instead of normal harddrives.

    christian


  • @arne said:

    The only drawback I can see is that you have to wait 300ms after hitting 'play' before playback starts.

    this would be unacceptable for a large portion of our userbase, at least for those who need to add tracks using a keyboard ...

     

    Of course this wouldn't be a global option but a per-instrument option, so the user will be able to add tracks in the usual low-latency way.


  • @PolarBear said:

    If you are really considering this idea, I'd make my user base a lot, if not indefinitely, larger by providing SSD benefits for all possible applications - you'd need to buffer/copy the first portion of every file to SSD to overcome HDD seek time and have a little RAID-like controller manage your files and the read and write operations, e.g. in an encapsulated external bay.

    Actually the drawback here is, that it wouldn't really work with monolithic files 😉

     

    This is a nice idea, but hard to implement at device level. Normally the device doesn't know anything about files, it only knows about sectors. Of course you can teach your device how FAT32, NTFS, UFS and ZFS work, but you can imagine how error-prone that is. Also, if the user builds a RAID with these devices, each device only sees partial filesystems.

    A different approach would be to add a 'learn' button. As long as this button is held down, every read access is mirrored to flash. Otherwise the content is untouched.

    This would lead to the following usage scenario: the user installs his libs to our drive. Then he starts his VST host and loads one instrument after the other, always pressing the button while the application reads the sample headers. This will give the drive exactly the information it needs, regardless of whether the data is organized in single or monolithic files.
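The 'learn' button can be sketched at sector level, without any knowledge of files. This is a toy model: the class name `LearningDrive`, the sector contents and the read counter are hypothetical, and writes (which would have to invalidate mirrored sectors) are left out.

```python
class LearningDrive:
    """Toy hybrid drive: sectors read while 'learning' are mirrored to flash."""

    def __init__(self, hdd_sectors):
        self.hdd = hdd_sectors      # sector number -> data
        self.flash = {}
        self.learning = False       # state of the 'learn' button
        self.hdd_reads = 0

    def read(self, sector):
        if sector in self.flash:
            return self.flash[sector]        # fast path, no seek
        self.hdd_reads += 1
        data = self.hdd[sector]
        if self.learning:
            self.flash[sector] = data        # mirror while the button is held
        return data


drive = LearningDrive({n: f"sector-{n}" for n in range(8)})
drive.learning = True
drive.read(0), drive.read(1)     # host loads the sample headers
drive.learning = False
drive.read(0), drive.read(5)     # later playback
print(sorted(drive.flash))       # [0, 1]
print(drive.hdd_reads)           # 3
```

Sector 5 is read from the HDD but not mirrored, because the button was released; that is what keeps the flash content under the user's control.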

    Cheers,

    Arne


  • @cm said:

    now lately i've read a report about an extensive test how much such hybrid-disks in fact do speed up system boot / application start / loading data ...

    guess what the result has been ... a shattering one .. no or almost no effect ...

     

    I could only guess, but so far the SSD management is done purely by Vista and not implemented at hardware level. Cache sizes are usually 256MB, which is really small for OS plus apps. And then again - Vista would have to know which files are going to be loaded in order to get them to the SSD portion before it really requests them. It would also have to know which application you were going to launch next or which application data to precache when booted. So I don't think the problem is that SSD is involved, but more a structural one: that it's not supported or used to maximum (or in this case: any noticeable) effect.

    PolarBear 


  • Arne, that "learn on startup" idea is nice, but that would only work for monolithic sample files where we know we will have 100% of all possible read accesses done in the loading phase... anyway... this thing is a task of its own and I seriously doubt I'm gonna go and do it.

    PolarBear 


  • So how are those flash disks going with VSL in 2009?

    can't find recent posts about it

    will the vsl instruments be able to play back sounds from these disks without loading all the samples into ram ?


  • Excuse me if I miss the obvious - but what do you mean when you write "loading all sample in RAM"?

    /Dietz - Vienna Symphonic Library
  • mmm maybe i'm wrong but ... when you load an instrument all the sounds go into the ram, right ?

    that's why you need a huge amount of ram.

    with flash disks maybe it would be possible to imagine that the sounds stay on the drive until you play them. Then they are loaded very fast to ram or directly to the sound card (or not, i dunno, i'm not an engineer).

    anyway i can't find recent posts about flash disks and vsl ... i imagine you can load sounds very fast from those drives, but what do they bring for the streaming ?

    i'm about to buy a new computer for my samples. about 10 GB of ram. or maybe there's something new coming with the flash disks..


  • Oh, you got this wrong, obviously.

    Vienna Instruments stream their samples directly from disk, only the first few KB of data are loaded into RAM. It's just the fact that there are so many samples within one instrument which gives you the impression that _everything_ gets loaded into RAM. :-)

    Kind regards,

    /Dietz - Vienna Symphonic Library