Check that the audio server interval in the runtime library is not set too long (for example, 30 fps).
 When a Cue is played back, synchronous processing introduces a delay of 1V (one server interval).
 The length of 1V can be configured per game. On low-performance machines, the server frequency may be set to a low value by default to reduce the game's processing load.
 Example: at 30 fps, 1V is about 33 msec; at 60 fps, 1V is about 17 msec.
 Latency is generally said to become noticeable at around 50 msec, but in some games it must be kept to about 17 msec or players will perceive it.
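The 1V delay is simply the length of one server interval, 1000 / (server frequency) msec. The minimal sketch below does that arithmetic and compares the result with the two thresholds mentioned above. It assumes the Cue-start delay is exactly 1V and ignores other latency sources (mixer buffers, output hardware); the function and constant names are illustrative only, not part of any CRI API.

```python
# Sketch: how the audio server frequency bounds the extra Cue-start delay.
# Assumption: the synchronous-processing delay is exactly one server
# interval (1V); mixer buffers and output hardware are not modeled.

PERCEIVED_GENERAL_MS = 50.0  # commonly cited threshold
PERCEIVED_STRICT_MS = 17.0   # stricter threshold some games need


def one_v_ms(server_frequency_hz: float) -> float:
    """Length of one server interval (1V) in milliseconds."""
    if server_frequency_hz <= 0:
        raise ValueError("server frequency must be positive")
    return 1000.0 / server_frequency_hz


def within_threshold(server_frequency_hz: float, threshold_ms: float) -> bool:
    """True if the 1V delay alone stays within the given threshold."""
    return one_v_ms(server_frequency_hz) <= threshold_ms


if __name__ == "__main__":
    for fps in (30, 60):
        print(f"{fps} fps: 1V = {one_v_ms(fps):.1f} msec, "
              f"within 50 msec: {within_threshold(fps, PERCEIVED_GENERAL_MS)}, "
              f"within 17 msec: {within_threshold(fps, PERCEIVED_STRICT_MS)}")
    # 30 fps: 1V = 33.3 msec, within 50 msec: True, within 17 msec: False
    # 60 fps: 1V = 16.7 msec, within 50 msec: True, within 17 msec: True
```

At 30 fps the 1V delay alone already uses up most of a 50 msec budget and exceeds a 17 msec one, which is why a long server interval is the first thing to check.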
 
In addition, if you know in advance which sound will be played, you can reduce latency with PREP playback (starting playback in the paused state, then releasing the pause when the sound is needed).
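The idea behind PREP playback is to move the preparation work out of the moment the sound is triggered. The toy model below illustrates that pattern. `HypotheticalVoice`, `prepare_paused`, and `resume` are hypothetical names, not the CRI ADX2 API, and it assumes that resuming a prepared sound still waits one server interval (1V) while a cold start pays an extra, game-specific preparation cost `prep_ms`.

```python
from dataclasses import dataclass


@dataclass
class HypotheticalVoice:
    """Hypothetical playback handle used only to model PREP playback.

    prep_ms: assumed time to get the sound ready (e.g. loading/decoding).
    v_ms:    one audio server interval (1V).
    """
    prep_ms: float
    v_ms: float
    state: str = "idle"

    def start(self) -> float:
        """Cold start: prepare and play in one step. Returns delay in msec."""
        self.state = "playing"
        return self.prep_ms + self.v_ms

    def prepare_paused(self) -> None:
        """PREP: do the preparation ahead of time, leaving playback paused."""
        self.state = "paused"

    def resume(self) -> float:
        """Release the pause; only the 1V delay remains (assumption)."""
        if self.state != "paused":
            raise RuntimeError("call prepare_paused() first")
        self.state = "playing"
        return self.v_ms


if __name__ == "__main__":
    # prep_ms=40.0 is an illustrative value, not a measurement.
    cold = HypotheticalVoice(prep_ms=40.0, v_ms=1000.0 / 60)
    print(f"cold start delay: {cold.start():.1f} msec")

    prepped = HypotheticalVoice(prep_ms=40.0, v_ms=1000.0 / 60)
    prepped.prepare_paused()  # done earlier, e.g. while loading a scene
    print(f"PREP resume delay: {prepped.resume():.1f} msec")
```

The trade-off is that a prepared, paused sound holds its resources until it is resumed or stopped, so this pattern fits sounds whose timing is known in advance.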
 On smartphones (especially Android), the mixer's sound buffer can add several hundred msec of latency on some devices.
 Although it is a special-case approach, you can reduce this latency by using the CRI Atom layer (the low-level playback runtime).
 (Essentially, only the decoded waveform is played back; the program itself controls volume and pitch.)
 This approach involves trade-offs in CPU load and in ease of programming and design.
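The mixer buffer latency on smartphones can be estimated with simple arithmetic: frames per buffer × number of queued buffers ÷ sample rate. The sketch below assumes every queued buffer must drain before new audio is heard; the buffer sizes in the usage lines are hypothetical configurations chosen for illustration, not values for any particular device.

```python
def buffer_latency_ms(frames_per_buffer: int, buffer_count: int,
                      sample_rate_hz: int) -> float:
    """Approximate latency added by a queue of mixer output buffers.

    Assumes all queued buffers must play out before new audio is heard.
    """
    if frames_per_buffer <= 0 or buffer_count <= 0 or sample_rate_hz <= 0:
        raise ValueError("all parameters must be positive")
    return 1000.0 * frames_per_buffer * buffer_count / sample_rate_hz


if __name__ == "__main__":
    # Hypothetical large-buffer configuration.
    print(f"{buffer_latency_ms(4096, 4, 48000):.1f} msec")
    # Hypothetical small-buffer configuration.
    print(f"{buffer_latency_ms(480, 2, 48000):.1f} msec")
```

A large, deeply queued buffer alone can account for several hundred msec, which is why this latency can dwarf the 1V server delay on such devices.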