Note: The following applies to VOCALOID2 and later. The original VOCALOID works in a similar fashion, but some details may not apply to it or may work differently.
The samples are gathered by recording the voice provider reading out a script in various keys. The recordings are then transferred into a library from which the VOCALOID software draws its sounds. The library consists of the various sounds recorded and separated for use with the software.
For Japanese, the script is much simpler, and each phonetic sample can be divided across the notes with little trouble. For English VOCALOIDs, however, the phonetic data has to be separated by cutting sections out of the recorded samples, because some sounds simply cannot be gathered unless they are spoken as part of a word. This makes separating sounds much harder for English VOCALOIDs. As a result, Japanese VOCALOIDs are often more precise than English ones in their diphonetic sounds.
VOCALOID uses a method called Frequency-domain Singing Articulation Splicing and Shaping, a kind of concatenative synthesis. It takes a series of sustained sounds and diphonetic and triphonetic samples from the sample library, as specified by the phonetic system, and reassembles them to reconstruct a word according to how it would be pronounced phonetically.
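The splicing idea behind concatenative synthesis can be illustrated with a minimal sketch. This is not VOCALOID's actual implementation: the real engine works in the frequency domain and also uses sustained sounds and triphonetic samples, whereas this toy version joins diphonetic units in the time domain with a simple linear crossfade. The library contents, the unit names, the `#` silence marker, and all function names are hypothetical, and sine tones stand in for recorded samples.

```python
import math

SAMPLE_RATE = 8000  # Hz; kept low so the toy example stays small


def tone(freq_hz, n_samples):
    """Stand-in for a recorded sample: a plain sine wave."""
    return [math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE)
            for i in range(n_samples)]


# Hypothetical sample library keyed by diphonetic unit name.
# "#" marks silence at the start or end of a word (an assumption of this sketch).
LIBRARY = {
    "#-s": tone(300, 400),
    "s-a": tone(400, 400),
    "a-#": tone(350, 400),
}


def units_for(phonemes):
    """Expand a phoneme sequence into the diphonetic units needed to render it."""
    padded = ["#"] + list(phonemes) + ["#"]
    return [f"{a}-{b}" for a, b in zip(padded, padded[1:])]


def crossfade_concat(segments, overlap):
    """Join segments, blending `overlap` samples linearly at each seam."""
    out = list(segments[0])
    for seg in segments[1:]:
        for k in range(overlap):
            w = (k + 1) / (overlap + 1)      # weight of the incoming segment
            idx = len(out) - overlap + k
            out[idx] = (1 - w) * out[idx] + w * seg[k]
        out.extend(seg[overlap:])
    return out


def synthesize(phonemes, overlap=50):
    """Look up each unit in the library and splice the samples together."""
    names = units_for(phonemes)
    missing = [n for n in names if n not in LIBRARY]
    if missing:
        raise KeyError(f"units not in library: {missing}")
    return crossfade_concat([LIBRARY[n] for n in names], overlap)


if __name__ == "__main__":
    print(units_for(["s", "a"]))
    print(len(synthesize(["s", "a"])), "samples")
```

The lookup step also shows why library coverage matters: a word can only be rendered if every unit it needs was recorded and separated, which is the part the text describes as harder for English.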