Description
(CV: Natalie Van Sistine) 40K, hop length 8. Trained on 28 minutes of in-game dialogue.
Comments

chose 1000 epochs because it had less artifacting

can't do covers cause i'm abroad rn lol

wait, isn't this overtrained to oblivion?

listen to this

what step was e250 at?

that's strange that it sounds better?

is that inferenced over the original voiceline?

yes

the voice sounds fine even at 1000 epochs, so that's why i kept it

¯\_(ツ)_/¯

also because i was using mangio to infer

with rmvpe these problems are usually non-existent

i was testing using worst-case scenario (in terms of sibilant artifacts)

240 epochs was 27k steps, so i'm assuming 250 was at around 28-29k
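The estimate above is a linear extrapolation: 27k steps over 240 epochs is about 112.5 steps per epoch, so 250 epochs lands near 28.1k. A minimal sketch, assuming the number of steps per epoch stays constant (same dataset and batch size for the whole run) and that the 27k figure is itself rounded; `estimate_step` is a hypothetical helper, not part of any training tool.

```python
def estimate_step(target_epoch: int, known_epoch: int = 240, known_step: int = 27_000) -> int:
    # Assumption: steps per epoch are constant (fixed dataset size and batch size)
    steps_per_epoch = known_step / known_epoch  # 27000 / 240 = 112.5
    return round(target_epoch * steps_per_epoch)


print(estimate_step(250))  # 28125
```

Because 27k is a rounded reading, the result is only good to a few hundred steps, which is consistent with the "28-29k" guess.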

i think the graph smoothing was too much

do you still have a 0.95 graph?

nope

unfortunate then

it does display the trend properly though

it's not like this is an exact science so

sometimes the graph skews up at the start quite a bit when at 0.999

nah, it's a normal trend for all these voices

Sampo, Natasha, and Tingyun graphs btw

all at 0.999 smoothing
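Why 0.999 smoothing can hide detail that 0.95 shows: the smoothing slider is, as far as we can tell, an exponential moving average with bias correction (assumed here to match TensorBoard's scalar smoothing; check its source for the exact formula). A minimal sketch on a made-up loss curve, where `smooth` and the synthetic `raw` data are hypothetical illustrations, not anyone's actual training logs:

```python
def smooth(values: list[float], weight: float) -> list[float]:
    """Bias-corrected exponential moving average (assumed TensorBoard-style)."""
    out = []
    last = 0.0
    for i, v in enumerate(values, start=1):
        last = last * weight + (1 - weight) * v
        # Debias so early points are not dragged toward the 0.0 starting value
        out.append(last / (1 - weight ** i))
    return out


# Hypothetical curve: flat at 30, a 50-point dip to 20, then back to 30
raw = [30.0] * 200 + [20.0] * 50 + [30.0] * 200
light = smooth(raw, 0.95)
heavy = smooth(raw, 0.999)
print(min(light), min(heavy))  # roughly 20.8 and 27.8
```

With 0.95 the dip stays clearly visible, while at 0.999 it almost disappears. Heavy smoothing also lets the early, high-loss points dominate the curve for longer, which is one plausible reason a 0.999 graph can skew up at the start.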

seems like there are always two dips

yep

although a 50k Serval one might be good

i'll have to test more

idk why my models are quite fine at low epochs

i was using mangio-crepe to infer, which struggles a lot when it comes to sibilants

rmvpe usually has no issues with that

although i don't think inferencing over the original audio is a good idea, cause it might overfit?

yeah it's just an easy way to determine how "faithful" the model is

that's why i add multiple examples

like a singing test, a TTS test, and the original audio test

to show the model's stability
Samples
1. Singing: Male, English
2. Singing: Female, English
3. Singing (Dry): Female, English
4. Singing (High): Female, English
5. Singing 2: Male, English
6. Singing (Dry): Male, English
7. Singing (Dry, High): Male, English