Japanese 5 Vowel Animation Curve File Output (.adxlip)

When the argument -enable_adxlip_compat is specified, an .adxlip file is written as a separate file, in addition to the file at the path specified with -out.

The file format is compatible with files previously generated by CriLipsMake.exe.
Only valid values for the five Japanese vowels are recorded. The frame rate is fixed at 100 FPS.

File Contents

The analysis results are output as a comma-separated text file. The file format is as follows:

// input: created by CriLipsMake2
// framerate: 100 [fps]
// frame count,msec,width(0-1 def=0.583),height(0-1 def=0.000),toungue(0-1 def=0.000),A(0-1),I(0-1),U(0-1),E(0-1),O(0-1),Vol(dB)
0,0.0000,0.5830,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,-96.0000
1,10.0000,0.5830,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,-96.0000
2,20.0000,0.5830,0.0000,0.0000,0.0000,0.0615,0.0000,0.0000,0.0000,-96.0000
3,30.0000,0.5830,0.0000,0.0000,0.0000,0.1988,0.0000,0.0000,0.0000,-96.0000
4,40.0000,0.5830,0.0000,0.0000,0.0000,0.3530,0.0000,0.0000,0.0000,-96.0000
5,50.0000,0.5830,0.0000,0.0000,0.0000,0.5046,0.0000,0.0000,0.0000,-96.0000
6,60.0000,0.5830,0.0000,0.0000,0.0000,0.6423,0.0000,0.0000,0.0000,-96.0000
7,70.0000,0.5830,0.0000,0.0000,0.0000,0.7618,0.0000,0.0000,0.0000,-96.0000
8,80.0000,0.5830,0.0000,0.0000,0.0000,0.8605,0.0000,0.0000,0.0000,-96.0000
9,90.0000,0.5830,0.0000,0.0000,0.0000,0.9326,0.0000,0.0000,0.0000,-96.0000
10,100.0000,0.5830,0.0000,0.0000,0.0000,0.9700,0.0000,0.0000,0.0000,-96.0000

Lines starting with '//' are header lines.
In the current version, three header lines are output.
Each line after the header lines contains the mouth pattern analysis result for one frame.
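
As an illustration of the layout described above, the following is a minimal Python sketch of a reader for this format. The names LipFrame and parse_adxlip are hypothetical and are not part of CriLipsMake2. The sketch assumes that every data line has exactly the 11 comma-separated columns listed in the third header line.

```python
from typing import List, NamedTuple


class LipFrame(NamedTuple):
    """One data line of an .adxlip file (hypothetical container type)."""
    frame: int
    msec: float
    width: float
    height: float
    tongue: float
    a: float
    i: float
    u: float
    e: float
    o: float
    vol_db: float


def parse_adxlip(text: str) -> List[LipFrame]:
    """Parse .adxlip text into frames, skipping '//' header lines."""
    frames = []
    for line in text.splitlines():
        line = line.strip()
        # Header lines start with '//'; blank lines carry no data.
        if not line or line.startswith("//"):
            continue
        fields = line.split(",")
        if len(fields) != 11:
            raise ValueError(f"expected 11 columns, got {len(fields)}: {line!r}")
        frames.append(LipFrame(int(fields[0]), *map(float, fields[1:])))
    return frames
```

Because header lines are recognized only by the leading '//', the sketch does not depend on the number of header lines, which may differ in other versions.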

frame count

The frame count is the index of the analysis frame. It starts at 0 and is incremented by one per frame at the fixed frame rate of 100 FPS.

msec

This is the audio time (in milliseconds) corresponding to the mouth pattern data of this line.
When output from CriLipsMake2, the interval between lines is fixed at 10 milliseconds (0, 10, 20, ...).
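
Since the frame rate is fixed at 100 FPS, the frame count and the time are related by a factor of 10 milliseconds. The sketch below shows one way an application might convert between the two when looking up the line for a given playback time. The function names are hypothetical, and clamping to the first and last frame is an assumption about how out-of-range times should be handled.

```python
FRAME_RATE_FPS = 100
MSEC_PER_FRAME = 1000.0 / FRAME_RATE_FPS  # 10.0 ms per analysis frame


def frame_to_msec(frame: int) -> float:
    """Return the audio time in milliseconds for a frame count."""
    return frame * MSEC_PER_FRAME


def msec_to_frame(msec: float, total_frames: int) -> int:
    """Return the frame count covering a playback time, clamped to the file."""
    index = int(msec // MSEC_PER_FRAME)
    # Assumption: times before the start or past the end use the nearest frame.
    return max(0, min(index, total_frames - 1))
```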

width, height

Not supported in CriLipsMake2. The closed-mouth default values are always written (width 0.583, height 0.0).

tongue position

A value according to the analysis result is written.

A, I, U, E, O

Each value means:

Blend amount of morph target "A" (between 0.0f and 1.0f)
Blend amount of morph target "I" (between 0.0f and 1.0f)
Blend amount of morph target "U" (between 0.0f and 1.0f)
Blend amount of morph target "E" (between 0.0f and 1.0f)
Blend amount of morph target "O" (between 0.0f and 1.0f)

If your model data has morph targets that can be blended, use these output values for morphing.

Of the five blend amounts output here, at most two can be greater than 0 at the same time.
This is to prevent mouth shapes from breaking when combining three or more blend shapes.

The total of all blend amounts does not exceed 1.0.
If all "A I U E O" are 0.0, it means "the speaker's mouth is closed."
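
The rules above (at most two non-zero blend amounts, a total of at most 1.0, and all zeros meaning a closed mouth) can be checked in application code. The following is a minimal sketch, not part of CriLipsMake2. The function names and the tolerance eps are assumptions, the latter chosen to absorb rounding in the four-decimal text values.

```python
from typing import List, Sequence

VOWELS = ("A", "I", "U", "E", "O")


def is_mouth_closed(weights: Sequence[float], eps: float = 1e-6) -> bool:
    """True when all five blend amounts are (effectively) 0.0."""
    return all(abs(w) <= eps for w in weights)


def check_vowel_weights(weights: Sequence[float], eps: float = 1e-6) -> List[str]:
    """Return a list of violated rules; an empty list means the line is valid."""
    if len(weights) != len(VOWELS):
        return ["expected 5 blend amounts (A, I, U, E, O)"]
    problems = []
    if any(w < -eps or w > 1.0 + eps for w in weights):
        problems.append("blend amount outside 0.0 to 1.0")
    if sum(1 for w in weights if w > eps) > 2:
        problems.append("more than two blend amounts are greater than 0")
    if sum(weights) > 1.0 + eps:
        problems.append("total of blend amounts exceeds 1.0")
    return problems
```

A check like this may be useful when reading files that were edited by hand or produced by other tools, because the morph targets are expected to satisfy these rules.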

Vol

The volume in decibels. 0.0 means maximum volume, and the more negative the value, the quieter the audio. In this output, -96 means silence.
Note that the range from -96 to 0.0 is not linear, because decibels are a logarithmic unit.
This value is output as a reference for applications.
For example, you may use it to make the mouth open wider when the volume is high.
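
As a sketch of how an application might use this value, the code below converts decibels to a linear 0.0 to 1.0 amplitude and derives a scale factor for the mouth opening. It assumes that Vol is an amplitude-style decibel value (so that amplitude = 10^(dB/20)), which this document does not state. The function names and the parameters floor_db and min_scale are hypothetical tuning values.

```python
SILENCE_DB = -96.0  # value written for silence in the .adxlip output


def db_to_linear(vol_db: float) -> float:
    """Convert a decibel value to a linear amplitude in the range 0.0 to 1.0."""
    if vol_db <= SILENCE_DB:
        return 0.0
    # Assumption: amplitude-style decibels, so amplitude = 10^(dB / 20).
    return 10.0 ** (min(vol_db, 0.0) / 20.0)


def mouth_open_scale(vol_db: float, floor_db: float = -40.0,
                     min_scale: float = 0.6) -> float:
    """Map volume to a scale factor from min_scale (quiet) to 1.0 (loud)."""
    clamped = max(floor_db, min(vol_db, 0.0))
    t = (clamped - floor_db) / (0.0 - floor_db)  # 0.0 at floor_db, 1.0 at 0 dB
    return min_scale + (1.0 - min_scale) * t
```

Multiplying the A, I, U, E, O blend amounts by mouth_open_scale keeps quiet speech from opening the mouth fully. Working in decibels for the scale factor, instead of linear amplitude, is a design choice that follows perceived loudness more closely.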
