Regarding "perfect sync"
Choosing from the list above, I'd guess #3 works best, but it is super time-consuming since the delay for each region would have to be set by hand.
In my case we have decided not to use VEPro for various reasons, mostly relating to Logic X. We do use track delays, within the limits of what Logic will actually execute.
While changing DAWs might help, Cubase, and every DAW, has inconsistent MIDI output timing, especially when many MIDI events occur simultaneously.
There is also the fact that round-robin sampler setups mean that any single track delay will be inaccurate, by some margin, for some individual samples, since each round-robin variation can have a slightly different amount of time before its attack. In some cases freezing or re-setting the round robins can help address this. We just fudge it and set an average track delay, as sketched below.
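As a rough illustration of that "fudge," here is a minimal sketch, assuming you have already measured how much dead air precedes the attack of each round-robin sample in a patch (for example by rendering each variation and inspecting it in a wave editor). The variable names and values are hypothetical.

```python
# Hypothetical measured onset offsets (ms of dead air before the attack)
# for each round-robin variation of one patch.
rr_onset_offsets_ms = [38.0, 42.5, 35.2, 47.8, 40.1]

# The average becomes the (negative) track delay; the worst deviation from
# that average is the residual error you accept for any individual hit.
average_offset = sum(rr_onset_offsets_ms) / len(rr_onset_offsets_ms)
worst_error = max(abs(x - average_offset) for x in rr_onset_offsets_ms)

print(f"Suggested track delay: -{average_offset:.1f} ms")
print(f"Worst-case residual error for an individual hit: {worst_error:.1f} ms")
```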
In general, the more "realistic" a synth orchestra is, the more likely every MIDI playback will be somewhat different.
In the case of full orchestra MIDI mockups it is worth considering what the intended outcome is. Say there's a pronounced ensemble "hit" at an exact point, like beat 1. The graphical depiction of the MIDI in the session should be right at the GUI beat one line. How should it sound?
In the "real world" ensembles of instruments will speak at different times based on the player and a host of other variables. This range of variability is probably as wide or wider than the inconsistencies of MIDI output from/processing inside a DAW.
I suggest that if a real ensemble somehow managed to play such that all of their sounds reached a listener starting at an exact point in time, the listener would react to the effect as "fake" or "mechanical." So, unless a fake or mechanical effect is desired, this kind of synchrony is undesirable.
In fact, many of the advancements in sample libraries have been intended to reduce just this effect. Trying to program that natural variability back out is bound to fail.
Sometimes the samples within a sample library are edited poorly. In these cases, if extremely accurate timing is a high priority, the library is unusable.
So, if the answer to "how should it sound?" is "somewhat natural, or believable," then perfectly timed MIDI processing is probably not the most important consideration.
Maybe the correct question is not "how should it sound?" but is rather "how should the MIDI, once translated into audio, be captured as an audio file? And how will that interact with a video synchronized to the same sync protocol used when creating the MIDI data?" This is the need of people using synth orchestras in a soundtrack. It isn't the "MIDI," or even the immediate audio output. It's the capture of audio output as a PCM file that matters--and, most importantly, matters when placing that audio into a video project.
To address these questions the following considerations must be made:
1. How is the synth orchestra synchronizing to the video track? MTC, LTC, or internal to the DAW?
2. How accurate is the timing relationship between the DAW musical timeline (bars, beats, units, or bbu) and the DAW video timeline (hours:minutes:seconds:frames, or hh:mm:ss:ff)? A minimal conversion sketch follows this list.
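To make the bbu-to-timecode relationship concrete, here is a minimal sketch, assuming a constant tempo and a simple 4/4 grid; the function name and values are illustrative, not any DAW's actual API.

```python
# Convert a bar/beat position to hh:mm:ss:ff timecode, assuming constant
# tempo and a fixed number of beats per bar.
def bars_beats_to_timecode(bar, beat, bpm, fps=24, beats_per_bar=4):
    total_beats = (bar - 1) * beats_per_bar + (beat - 1)
    seconds = total_beats * 60.0 / bpm
    frames_total = round(seconds * fps)      # quantize to the nearest frame
    ff = frames_total % fps
    ss = (frames_total // fps) % 60
    mm = (frames_total // (fps * 60)) % 60
    hh = frames_total // (fps * 3600)
    return f"{hh:02d}:{mm:02d}:{ss:02d}:{ff:02d}"

# Bar 33, beat 1 at 120 BPM lands at 00:01:04:00 at 24 fps.
print(bars_beats_to_timecode(33, 1, bpm=120))
```

Note the rounding step: a musical position rarely lands exactly on a frame boundary, so any timecode representation of it is already quantized to the frame (or, for MTC, the quarter frame).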
Regarding (1), many professional systems use MTC to sync the musical and video timelines. MTC has a resolution of one quarter frame to two frames, roughly 10 to 83 ms at 24 fps (the arithmetic is below). That range is large enough to introduce musically significant asynchrony. Given this coarse timing, creating a truly tight (say 5 ms or less) audio output from a MIDI DAW session is mostly unrealistic. LTC, which is not native to most DAW systems, can be significantly more accurate, in my tests within a few samples. Keeping the entire project inside one DAW is also much better for overall timing accuracy.
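For reference, the quick arithmetic behind those numbers: MTC transmits quarter-frame messages, and a complete timecode value is spread across eight of them, i.e. two frames.

```python
# MTC timing granularity at 24 fps.
fps = 24
frame_ms = 1000.0 / fps            # one frame at 24 fps ~= 41.7 ms
quarter_frame_ms = frame_ms / 4    # finest MTC granularity, ~= 10.4 ms
full_message_ms = frame_ms * 2     # a complete timecode value spans 8 quarter frames, ~= 83.3 ms

print(f"Quarter frame: {quarter_frame_ms:.1f} ms, full MTC message: {full_message_ms:.1f} ms")
```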
Regarding (2), one of the big surprises of working with Logic has been the discovery that its MTC output was influenced by the musical timeline. This has been resolved in Logic 10.2.2. Another consideration is the fact that the MIDI tempo map may be interpreted differently by different MIDI file readers. As a result, a tempo map created in Logic may run differently when opened in Pro Tools.
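One plausible mechanism behind that divergence (a sketch, not a claim about how Logic or Pro Tools specifically behave): a Standard MIDI File stores tempo as an integer number of microseconds per quarter note, so most BPM values can only be approximated, and each reader reconstructs a slightly different tempo from the stored integer.

```python
# Tempo in a Standard MIDI File is an integer count of microseconds per
# quarter note, so a BPM like 143 is stored approximately.
bpm = 143.0
usec_per_quarter = round(60_000_000 / bpm)   # stored in the file as 419580
bpm_as_read = 60_000_000 / usec_per_quarter  # ~143.000143 BPM when read back

beats = 4 * 200                              # a 200-bar cue in 4/4
seconds_at_written_tempo = beats * 60.0 / bpm
seconds_at_read_tempo = beats * 60.0 / bpm_as_read
drift_ms = (seconds_at_written_tempo - seconds_at_read_tempo) * 1000
print(f"End-of-cue drift from tempo rounding alone: {drift_ms:.2f} ms")
```

This particular rounding error is tiny on its own; the point is just that the same tempo map can yield slightly different absolute event times in different readers, and differences in how tempo changes and curves are interpreted can compound it.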
So, the answer to "how should the MIDI, once translated into audio, be captured as an audio file? And how will that interact with a video synchronized to the same MTC used when creating the MIDI data?" is: it depends on a lot of factors, but in most cases the interaction with the MTC will be inaccurate and will vary from pass to pass.
Essentially this is a shortcoming of:
- Sample libraries that use real human performances--if the sample libraries were all fully digital the issue of variable sample starts could be resolved fairly easily.
- Outdated synchronization protocols and proprietary synchronization schemes.
And the solutions:
- More carefully edited sample sets. Samplers that account for round-robin sample start variables.
- An industry move toward open-source (non-Dante) audio-over-IP solutions, namely AES67, and a universal packet-based MIDI protocol. This could reduce MIDI playback variability to a matter of nanoseconds.