Flash-Disks?

fahl5

I've talked with a friend, working as software engineer for one of the leading german Internet providers about the use of little servers for the use of larger RAM for VSL. He asked me if VSL-Software is able to use Flash-disks, which might - as he put it - replace expensive Investments for RAM and Servers. Here is his Mail. Would this be an inexpensiv option to handle our half-terabyte samples on one machine?

best

Steffen

"Hi Steffen,

diskutieren deine VSL-Kollegen eigentlich nur darueber, wie sie mehr RAM in ihre Rechner kriegen, oder hat schonmal jemand darueber nachgedacht, Flashdisks zu verwenden? Das viele RAM braucht man doch nur, um von jedem erdenklichen Sample die ersten paar hundert Millisekunden im direkten Zugriff zu haben. Da Flash aber wesentlich schneller im Zugriff ist als Festplatten es sind, wuerde es dann reichen, nur 1-2ms im RAM zu halten. Die naechsten paar hundert ms sind dann im Flash, der Rest wie gehabt auf Platte. Wenn die Software das unterstuetzt, koennte man damit sehr viel Geld fuer RAM und Server sparen. Das muss man den VSL-Leuten nur schmackhaft machen. Eine 32 GB SSD kostet 150,-, und damit spart man sich 32GB RAM und entsprechend teuere Mainboards.

Gruss,

Arne"

PolarBear

Hi Steffen,

Some SSD discussions around here:

http://community.vsl.co.at/forums/p/17871/127291.aspx#127291

http://community.vsl.co.at/forums/p/17920/127595.aspx#127595

http://community.vsl.co.at/forums/p/10836/80906.aspx#80906

The point of all these is, we're not there with capacities yet. Storing all of VSL will take about 600GB of disk space. That's 10x 64GB SSD drives at a price around 6000 Euro pls two controllers at around 300 each. It's still less expensive to build that 32GB RAM server machine for around 3000 Euro that is providing the same result.

Another issue is that currently the VI software has a fixed preload value of 64kb per sample, which could not be adjusted by the user. That also means that is yet not possible to load two different instances of VI's with different preload settings, so that one could spread the load onto SSD and conventional HDD.

Using SSD as another buffer in the line won't really work because of their life span would decrease massively with that many write cycles involved.

All the best,

PolarBear

cm

steffen (and arne), no need to make VSL palatable to such a consideration ... ;-)

a few drawbacks: the 32 GB flash drives for 150.- are not the fastest ... (assumably 40 MB/s range) ... the big advantage would be the low average seektime (~0.1 ms) ... the problem is that conventional flash drives are based on the IDE protocol (like pATA is)

a pre-series model i have received of a 64 GB drive (pure sATA) delivers ~ 150 MB/s with also ~ 0.1 ms seektime and astonishingly problems show up actually reducing the preload buffer to 32 KB ... so there must be another hidden bottleneck somewhere.

one of the most *expensive* processes on a computer is access, read and write data (using kernel time and adding load to the chipset), so such a 3-tier model would double this load (harddisk to flash, flash to RAM, RAM to processor, processor to audio device) and add another buffer (the flash disk) what usually also adds latency.

polar bear is also right when he mentions the write-and-read-cycles which would reduce lifetime or at least MTBF of the *flash buffer* significantly.

IMO the next step which would make sense to test is a stripe (raid 0) of 4 128 GB models when they become somehow affordable.

christian

PolarBear

Well I'd say the 40MB/s limitation isn't that crucial as you woldn't hit it too often when you have several of these drives RAIDed. But I also read the cheaper models aren't suited for RAID, only the more expensive industry models - can't imagine why though.

Maybe the bottleneck with 32kb is that some system access is performed meanwhile on the sys drive or the SSD being connected via a busy southbridge line? Perhaps you should test this also with the system drive being an SSD drive, dunno.

PolarBear

arne

Hi,

@cm said:
one of the most *expensive* processes on a computer is access, read and write data (using kernel time and adding load to the chipset), so such a 3-tier model would double this load (harddisk to flash, flash to RAM, RAM to processor, processor to audio device) and add another buffer (the flash disk) what usually also adds latency.

I feel you have mistaken my proposal a bit: I do not want to write the data from harddisk *through* the flash. The idea is to organize the data in a way that let's say the first 64k of each sample reside on flash disk, the rest solely on HDD. There won't be any writes to the flash disk.

Also because you only read a small amount of each sample from flash, you don't need that much bandwidth. The overwhelming amount of data still come directly from HDD.

There are no additional accesses and no additional buffers. At the moment you have 2 sources for the data, first RAM (very low latency), second HDD (high latency). Just add a third source, SSD (low latency).

Regarding latency: I cannot imagine that the access time to SSD is > 0.1ms on a well tuned system as long as the drive is not overloaded, but I don't have experience with windows there, I'm a SUN/Solaris guy 😉

I don't think an experiment with _all_ data on SSD does prove anything in regard to my proposal because in this case you'll run quickly into bandwidth limitation which you won't if you follow the procedure outlined above.

cheers,

Arne

PolarBear

@arne said:
I feel you have mistaken my proposal a bit: I do not want to write the data from harddisk *through* the flash. The idea is to organize the data in a way that let's say the first 64k of each sample reside on flash disk, the rest solely on HDD. There won't be any writes to the flash disk.
Also because you only read a small amount of each sample from flash, you don't need that much bandwidth. The overwhelming amount of data still come directly from HDD.
There are no additional accesses and no additional buffers. At the moment you have 2 sources for the data, first RAM (very low latency), second HDD (high latency). Just add a third source, SSD (low latency).

I also played with this idea in my head some while ago. But what you propose would mean to write a IDE driver from the scratch. How could you verify things keep updated? That every sample still or already has the header saved in the SSD portion? And then again, there is no buffering model like you (and I) thought of with the SSD being an additional read acces medium - the way it is now is, that "streaming" still means that little chunks of data (likely to be 64kb blocks of sample audio data, hence polyphony limitations) are transferred into RAM and then from there go to the CPU. Pre-buffered chunks in RAM aren't touched therefore.

So you would have to make sure you are reading the first chunk of data from the SSD while already accessing the next chunk of data at the SATA/IDE drive at the same time. You'd have to double the amount of current voice polyphony buffering to serve both SSD and conventional harddrive. Last but not least I doubt it's an easy task to do this cross platform or maybe even for the different Windows versions of 2k, XP, XP64, Vista32 and Vista64, ... Ultimately it's easier to wait the year or two until such a development would be able to hit the market and by then SSD will be already reasonable enough in price so that everything could be substituted. Just my opinion though, it won't work as easy as the hybrid drives that are out there already with around 120GB capacity and 256MB SSD drive portion e.g. - they seem to cache the whole data and are most likely working on heuristic methods for pre-caching if they do any at all...

All the best,

PolarBear

cm

welcome arne, maybe i misunderstood your proposal, but then you need to explain:

why should it help to have a) very little in RAM b) something on the flash disk c) the overwhelming amount on the disk

as soon as the data from the flash disk is empty the streaming process again happens from the harddrive and we are in the same situation as we are now (without flash)

and because data from the harddrive comes *so slow* we still need the same amount of buffer in RAM ... we will win nothing

i did not say flash has more than 0.1 ms latency (here: seektime) - it _is_ ~ 0.1 ms

the only design which would make sense IMO is to have all sample data on flash to receive 0.1 ms seektime for the whole amount of sample data, otherwise i couldn't reduce the buffer size for streaming in RAM

christian

ps: directory management would become very complicated if we would spread sample data across flash and harddrive, since flash would be also accessed as drive/volume

pps: it doesn't matter if it is PPC, intel. sparc, alpha, windows, OS X, solaris, irix, BSD, linux, whatever .... sample streaming has it's rules everywhere ...

arne

ok, let me try to explain 😊

I have to admit that I haven't used VSL yet, I only have experience with Synthogy's Ivory, but I guess the way it basically works is the same.

My understand how it currently works is as follows: When the engine initially starts, it reads the first part of each sample (of each sampleset the user has selected) into RAM, let's say the first 10ms. This amounts to quite some GBs. Additionally, the engine allocates a buffer for each polyphony level. These buffers can hold a much longer period, let's say 100ms. These buffers are cheap, because you only need very few of them compared to the amount of buffers you need for the first 10ms of each sample.

The moment a (MIDI) event arrives, the engine can start playing the sample right away, because it has 10ms buffered in RAM. It starts playing from this buffer. Simultaneously it allocates one of the 100ms buffers and directs the HDD driver to fill it with the sample data from 10ms-110ms. After the first 10ms have been played, hopefully enough data has arrived from HDD to continue from the larger buffer. This buffer will constantly be refilled from HDD until the sample ends or playback has been terminated. Afterwards the 100ms buffer will be released. If the sample is being played again, the data starting from 10ms will have to be fetched from HDD again. The 10ms are chosen in a way the HDD has enough time to respond.

Now the same in my 3 layer model:

At installation time, we copy the first 10ms of each installed sample to SSD. This data will stay there until the sampleset will be deinstalled.

At engine startup, the engine reads the first 1ms of each sample into RAM. Additionally it allocates the 100ms buffers as above.

The moment a (MIDI) event arrives, the engine can start playing the sample right away, because it has 1ms buffered in RAM. It starts playing from this buffer. Simultaneously it allocates one of the 100ms buffers and directs the SDD driver to fill the first 9ms of the buffer with the sample data from 1ms to 10ms. Also, it direct the HDD driver to fill it with the sample data from 10ms-101ms. After the first ms have been played, hopefully enough data has arrived from SDD to continue from the larger buffer. After 10ms the data from HDD will have arrived so that there will be no disruption in playback. After that, this buffer will constantly be refilled from HDD in the same way as above until the sample ends or playback has been terminated. Afterwards the 100ms buffer will be released. If the sample is being played again, the data will have to be fetched from SSD and HDD again. The 1ms is chosen in a way the SSD has enough time to respond, the 1+9=10ms are chosen in a way the HDD has enough time to respond.

The model described here bases on how I would implement sample streaming on an first impulse, because I have never done it. Please correct me if some (or all 😉 of my basic assumptions are wrong.

@Another User said:

pps: it doesn't matter if it is PPC, intel. sparc, alpha, windows, OS X, solaris, irix, BSD, linux, whatever .... sample streaming has it's rules everywhere ...

Sample streaming has, but latency has not. Clearly there are well behaved systems and systems that are not. From my experience Windows is nightmare regarding latency.

cm

so if i understood your model right now, you would use the flash just as *doubling* the *second part* of the sample header. but then again the *problem* as known starts as soon as the flash buffer is empty, then the usual harddisk access starts ("buffer filled constantly from the harddisk" - thats as it is done now)

down into the math (not 100% accurate for all possible cases) - say ome instrument is A/B blended and has release samples we need 4 x 64KB sample header (per note and velocity layer) which affects only needed memory for the moment, so a loaded preset/matrix/patch needs between a few up to 800 MB or even more. (note: this amount could only be reduced if the sample header = preload buffer can be configured shorter).

on the other hand (since almost all VI-instruments are monophonic so far) we only need to stream 4 samples at a given time (using the example above) more or less in the way you describe it - preload buffer gets emptied on the front and refilled from harddisk on the back.

simplified: the preload buffer with 64 KB equals ~370 ms (thanks to the on-the-fly decompresssion, otherwise it would be only ~250 ms) so this buffer needs to be filled with a new 32KB portion after 175 ms otherwise the buffer would run empty during the next cycle. this result in 22.8 access events per second (for a single *track* only), each has to be executed and finished within the 175 ms above ... in fact it is a) more and b) we don't have as much time.

counterwise look: to have 10ms in RAM + 90ms on flash (for each sample in the instrument) + a 4 x 100ms buffer (reserved for the currently played *sample set*) we are fine for the first 100 ms, but then the process described in the paragraph above starts again (access from harddisk), just with the difference we now have 85 access events per second and only 47 ms left to execute and finish the refill process of the buffer.

given we use a modern sATA II drive with average 4,5 ms seektime we spend 10% of the time left just for allowing the harddisk to look where the needed portion resides - then we can start with the actual process to read and write the data and begin thinking about file system, harddisk interface and memory buffers, I/O waitstates and everything else.

the priciple as such is of course worth to think about it and would reduce the needed memory, say for a setup needing 24 GB RAM now to 650 MB + 6.5 GBG flash memory + a harddisk system capable of handling min. 5.000 of such I/O processes per second and everything without *crapping* the CPU only with I/O (since the CPU still has its main job rendering the audio according to the rules within the VI player)

so my point of view is it would only help to have everything on flash and then reduce the preload buffer (assumably divided by 8) given the various other latencies will allow to use a preload buffer of 46 ms. some streaming engines currently allow to reduce the preload to 16 KB per sample under certain conditions, but we have to keep in mind it has to work under every condition, especially under high load (many tracks, complex instruments)

anyway, thanks for your input on this topic, clearly there are more routes traveling to rome ;-)

christian

arne

I see you're still not completely enthusiastic, so please give me one last shot ;-) I made the mistake to name some numbers without knowing the real ones. So please take my '1, 10, 100ms' only as an example and replace them by the numbers you really use. My intend was not to increase the IO/s. I'll try again: currently, - you preload 64k for every possible stream. This gives you 350ms for the first HDD access - the HDD is accessed in portions of 32k every 175ms change it to the following: - you preload 8k for every possible stream. This gives you 45ms for the first _SDD_ access. You consented reducing the preload buffer to 1/8 in case of SDD should be possible - read 2x32k from SDD to fill your buffer. This give you 300ms (given 40ms access time for the 32k and 10ms for the second) for the first HDD access - beginning from the third access, read from the HDD in 32k portions every 175ms as usual So I don't want to add any accesses, just direct the first two to the SDD instead of HDD. There will be no additional I/O or CPU load. Also you don't need much bandwidth to the SDD, because you need it for the first 64k of each sample only. Given a bandwidth of 50MB/s stated in an earlier post this would give you 780 events/samples per second, independant of the polyphony. For a massive polyphony you only need fast harddrives. Experimenting with the buffers might show that it is sufficient to read only 48k instead of 64k, yielding 1000 samples/s. As a further optimization, as mentioned in my last post, you can start the HDD access right away and not only after the data from SDD arrived. Cheers,

Arne

arne

I'd like to go back to the motivation, that led to my initial proposal. My goal was to save Steffen from selling his car to buy RAM, because without a car he cannot give me piano lessons anymore.

If I understand him correctly he edits his arrangement in Cubase and plays it back to review the results. He does not play any instrument live, so latency does not matter to him. To save RAM you currently have two options:

a) bounce some tracks to disk so they have not to be rendered on every playback

b) only preload the notes/velocities he really needs.

Both approaches are a bit cumbersome and time-consuming.

Why not add a 'high latency mode', where nothing is preloaded? One could delay playback for about 300ms and use the time to load the samples needed. This way you need not chose which instruments/notes/velocities you need in advance, you'd just have your complete library at your disposal, without the need for a single GB of RAM.

AFAIK Cubase is able to compensate for the latency, so it should be possible to even play some Instruments live in the conventional way. The only drawback I can see is that you have to wait 300ms after hitting 'play' before playback starts.

Just a thought.

Arne

PolarBear

@arne said:
There will be no additional I/O or CPU load.

There will be: You just have to add the SSD I/Os to the current workflow, so the I/Os in CPU/RAM will be roughly doubled. And there's still that thing that the user would be able to do everything on his SSD drive via his Explorer which you don't have any control over... so missing files here will crush the system easily.

The way I understood it now for a 4 part sample, each letter representing a 32kb part of the buffer, the brackets [..] the 64kb preload amount:

[AB] [CD] [EF] [GH]

When the sample of [AB] is accessed, you will read A into CPU, and request the following 32kb (which are needed after B) from HDD to be filled in A while B is transferred to CPU?

I thought that [AB] would be left untouched and for each voice of polyphony there is a new buffer [XY] in RAM where X is the 32kb after B, and Y the 32kb after X, and meanwhile X will get filled with the next 32kb after current Y.

How else would you deal with the situation if a sample is retriggered again within that first 350ms or while the sample is still streaming? In case you don't want any data loss...

PolarBear

PS: Funny thing - with brackets alone [] forum software won't display the second one...

cm

@arne said:
- you preload 64k for every possible stream. This gives you 350ms for the first HDD access

not really ... because of the nature of buffers harddisk access had to beginn after 175ms ... at latest ... if it would only start at 350 ms the buffer would be already empty. also dividing the buffer in two portions only was just an analogy to soundcard buffers (the one for output of wave data) to simplify the math in my example. the same would apply for any buffer filled by SDD, so its geting already rather complex when and how to switch the source without starting to think about the behaviour of threads handling all this.

our developers already optimized the engine in such a detailed way (including the monolithic data format as source) that only one plain priciple will lead to significantly more performance: much helps much - in this case speed.

christian

cm

@arne said:
The only drawback I can see is that you have to wait 300ms after hitting 'play' before playback starts.

this would be unacceptable for a large portion of our userbase, at least for those who need to add tracks using a keyboard ...

such a high latency mode would be reserved for the next link in the chain, the MIR - buffers need to be much larger there by design and realtime playing (without or with almost no latency) would be possible there only at the expense of quality.

christian

arne

@Another User said:

our developers already optimized the engine in such a detailed way (including the monolithic data format as source) that only one plain priciple will lead to significantly more performance: much helps much - in this case speed.
christian

You should ask your developers if they thought about this. If I hadn't already have a good job I'd build a business case around these two ideas...

fahl5

@arne said:
The only drawback I can see is that you have to wait 300ms after hitting 'play' before playback starts.

this would be unacceptable for a large portion of our userbase, at least for those who need to add tracks using a keyboard ...

christian

Hi Christian,

I admit meanwhile reading this thread very intrested, I am of course not that far inside software optimization. But as far I understand Arne right, he seems to propose, we wont need no exagerated RAM-Hype if we accept the 300ms latency for loading the Samples totally from HDD.

Does'nt that sound promissing: working with half a terabyte samples on one machine without being limited by any RAM-capacity just admitting that one can't use the midikeyboard in realtime.

Why not allowing the software to do that optionwise for those, who work mainly with large scores in Sequencers. For now, working with large sophisticated scores, which VSL is already well prepared to cope with muscally, still forces you to more or less extensive use of the RAM-optimization function.

I admit the current VI-RAM-Optimization is - compared to former days - an ingenius feature, But still it would be more comfortable to keep all necessary Level II articulations, velocitys available for minor tweaks and adjustments than to optimize other Patches to reload those you want to change, just because there are some users that sometimes like to programm their midievents with realtime playing.

OK, nobody should be forced to accept 300ms Latency if he preferres to boost his Hardwaresetup to play all his midievents in real time, but why should all need little servers to run VSL just because some like to have that realtime option, (No one would play a whole Mahlerscore or anything like this in realtime ...at least that would presumably neither really speed up nor ameilorate the workflow in any way 😉

VSL already seem to be mentaly prepared to sacrifice that realtime-principle in order to get MIR-work for your Userbase,

So I suppose the necessary latency for MIR would be even easier accepted, the earlier one can already optionwise experience its real advantages for the efficiency of Hardware-usage and impacts on the workflow.

Meanwhile I wont be forced to sell any car to upgrade my hardware in anyway and actually just wait for the moment that I can rely on some more experiences which would be the suitable Hardwaresetup to make the best use of 64-Systems. But since you opend up the VSL-Userbase for those who are lucky that they can afford the SE, this option might make live easier, for those who wont buy server-hardware to use their SE or Download-VI's.

In short a total Direct-from-Disc as Arne proposed it, still seems to me an interesting additional option if ever possible.

best

Steffen

cm

steffen, maybe my statement it would be *worth thinking about* has been drowned out by the amount of involved details ...

direct from disk is also called (patented?) the technology from NI which also cannot cut out buffers for sample headers .. such a *read ahead* technology is a totally other approach to sample based arrangement - we could do a lot of midi detection and processing during such 300 ms which would finally result in a kind of *sample rendering* engine and might also be worth to think about it.

the neccessary latency for the MIR is caused by the nature of convolving and i'd say there is nothing to be sacrificed ... compare with CGI (Computer Generated Images) ... to draw a simple shadow works easily realtime, to show even the first reflections (from objects within a scene) works too, but to render a realistic movie containing several layers, a dozend light sources and reflections (meaning the depth of reflecting light on surfaces, not the number of reflections) is impossible realtime - you would either need incredible CPU power and a high delay or accept offline rendering.

AFAIK convolving in sequoia adds at least 1024 or even 2048 to the latency, and the impulses are not very much and long in this case ....

christian

PolarBear

@arne said:
You should ask your developers if they thought about this. If I hadn't already have a good job I'd build a business case around these two ideas...

If you actually really consider this idea I'd make my userbase a lot if not indefinitely larger by providing SSD benefits for all possible applications - you'd need to buffer/copy the first portion of every file to SSD to overcome HDD seektime and have a little RAID-like controller manage your files and managing read and write operations, e.g. in an encapsulated external bay. I think a 64GB SSD drive should be enough for buffering 1million files, typical on a 500GB drive should be around 300k at max. and less.

Actually the drawback here is, that it wouldn't really work with monolithic files 😉

PolarBear

cm

btw ... this reminds me to something else (related to my initial misunderstanding of the three-tier-model above) ....

AFAIK all notebooks with the VISTA logo need to have hybrid-disks (normal harddrive + some flash memory to hold often used data) since june 2007 ....

now lately i've read a report about an extensive test how much such hybrid-disks in fact do speed up system boot / application start / loading data ...

guess what the result has been ... a shattering one .. no or almost no effect ... the difference between *slower* and *faster* drives was much more sigmificant than any performance gain using the hybrid-thingies ....

a not so shattering but nevertheless relatively poor result for waking up from the hibernate mode (memory written to disk when computer *sleeps*) if using flash instead of normal harddrives.

christian

arne

@arne said:
The only drawback I can see is that you have to wait 300ms after hitting 'play' before playback starts.

this would be unacceptable for a large portion of our userbase, at least for those who need to add tracks using a keyboard ...

Of course this wouldn't be a global option but a per-instrument option, so the user will be able to add tracks in the usual low-latency way.

Products

Deals

Music

Forum

Support

Academy

Vienna Symphonic Library Forum

Forum Statistics

Flash-Disks?