This guide shows you how to start using RVC. I'm new to this myself, but I have already trained a few models. For in-depth knowledge, look for more advanced guides. This is just a "how to start".
Disclaimer: Sorry, but I use Google Colab. If you know how to install RVC locally and want to help by adding the installation process to this guide, please do. The WebUI should be the same (I guess).
Links to other guides are down below.
The corresponding GitHub repository:
https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI
How to install RVC v2 locally: https://docs.google.com/document/d/1KKKE7hoyGXMw-Lg0JWx16R8xz3OfxADjwEYJTqzDO1k/edit
And the Colab notebook that I use:
https://colab.research.google.com/drive/1TU-kkQWVf-PLO_hSa2QCMZS1XF5xVHqs
1. COLAB SETUP
You can skip these steps if you want to go straight to the WebUI part.
Step 1. Click on that button
Step 2.
Make a folder named "dataset" in your Google Drive. Put the audio you want to train the model on in it. The higher the audio quality, the better the results, so WAV is preferable (FLAC didn't work for me, I don't know why).
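Before uploading, it can help to check that everything in the folder really is a readable WAV file. Here is a minimal sketch using only Python's standard library. The function name summarize_dataset is made up for this guide and is not part of RVC or the Colab. Note that the wave module only reads plain PCM WAV, so other files (FLAC, MP3, some float WAVs) end up in the "skipped" list.

```python
import wave
from pathlib import Path


def summarize_dataset(folder):
    """Inspect the files directly inside `folder`.

    Returns (wav_infos, skipped):
      wav_infos: list of (file name, sample rate in Hz, duration in seconds)
      skipped:   names of files the wave module could not read as PCM WAV
    """
    wav_infos, skipped = [], []
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file():
            continue  # ignore sub-folders
        try:
            with wave.open(str(path), "rb") as wav:
                rate = wav.getframerate()
                duration = wav.getnframes() / rate
        except (wave.Error, EOFError):
            # Not a PCM WAV file (for example FLAC or MP3), or a truncated file
            skipped.append(path.name)
            continue
        wav_infos.append((path.name, rate, round(duration, 2)))
    return wav_infos, skipped


# Example call; the path below is a hypothetical local copy of the folder:
# infos, skipped = summarize_dataset("dataset")
# print(infos)
# print("Not usable as WAV:", skipped)
```

If a file lands in the skipped list, convert it to WAV with your audio editor of choice before putting it in the "dataset" folder.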
Step 3.
Run the UI. Wait until a https://%nonsencewifipassword%.gradio.live link appears in the console, then click on it.
2. The UI. Training
You can skip these steps if you want to go straight to the processing (generating) part.
You can just copy the settings from the screenshots.
1. This is the first thing you see. Go to the Train tab.
2. Training goes step by step. Follow the steps strictly (I separated them with a red line) or you will get an error!
Project name
Sample rate 40k (48k gave me an error)
I just leave that set to true; I think the base model includes it
Version V2
Click on Process data. Wait until it finishes processing (check the console). It will likely be very fast.
3. Next step
1. Select mangio-crepe (this is not a must; I just know that mangio-crepe is one of the best, along with harvest, but harvest is slow)
2. Write 128.
3. Click on Feature extraction and WAIT UNTIL IT FINISHES processing (check the console).
4. Last step
1. Save every n epochs. Each saved epoch is ~55 MB, so adjust this to your Google Drive capacity.
1.1. Total training epochs: Colab will likely cut you off before you finish training everything. I usually set up to 400 epochs, but that doesn't really matter. You will look for the best one yourself.
2. Batch size for the GPU in Colab is 16.
3. Click train.
Check the console for "Epoch" lines. And wait.
...
...
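While you wait, here is a quick way to sanity-check the "save every n epochs" setting against your Google Drive space. This is only a rough sketch: it assumes every save adds one new file of about 55 MB (the size I see) and that nothing gets overwritten or deleted. The function name is made up for this guide.

```python
def checkpoint_storage_mb(total_epochs, save_every, mb_per_checkpoint=55):
    """Rough estimate of the space taken by saved epochs, in megabytes.

    Assumes one new file of `mb_per_checkpoint` MB per save and no overwriting.
    """
    if total_epochs < 0 or save_every < 1:
        raise ValueError("total_epochs must be >= 0 and save_every must be >= 1")
    saved_files = total_epochs // save_every  # a save happens every `save_every` epochs
    return saved_files * mb_per_checkpoint


print(checkpoint_storage_mb(400, 10))  # 40 saves * 55 MB = 2200
print(checkpoint_storage_mb(400, 50))  # 8 saves * 55 MB = 440
```

So 400 epochs saved every 10 epochs would need roughly 2.2 GB, while saving every 50 epochs would need under half a gigabyte, at the cost of fewer epochs to choose from later.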
After it has trained successfully, you can get your model from the folder "/content/Retrieval-based-Voice-Conversion-WebUI/weights". The Colab also saves the models to your Google Drive in the "RVC backup" folder, so if you close your Colab you won't lose them.
The voice model file extension is .pth
5. The Graph. How to get the best epochs.
Right under the Colab console, click and set:
SCALARS
Ignore outliers in chart scaling
Smoothing 0.999
Type in g/total
"like here's an example of a voice i built earlier. i trained it for a while, so this is max smoothing but with ignore outliers turned on because eventually you do need to ignore them (but will turn it off again to find the best point to pull from)
in this graph you can see where the sharp dips are, and where it clearly starts trending upward at around 33k.
those dips are where i'd pull my .pth files from to test against each other and see which sounds best" (C) mojo.zone
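The idea in the quote (smooth the g/total curve, then look for dips) can be sketched in plain Python. This is only an illustration, not what the graph tool actually runs: I'm assuming the smoothing slider behaves roughly like a debiased exponential moving average, and that you have the curve as two lists, steps and values, however you obtained them. The function names smooth and find_dips are made up for this guide.

```python
def smooth(values, weight=0.999):
    """Debiased exponential moving average (assumed to approximate the slider).

    `weight` must be in the range [0, 1); higher means smoother.
    """
    if not 0 <= weight < 1:
        raise ValueError("weight must be in [0, 1)")
    smoothed = []
    running = 0.0
    for i, value in enumerate(values, start=1):
        running = weight * running + (1 - weight) * value
        # Divide by (1 - weight**i) so the first points are not pulled toward 0
        smoothed.append(running / (1 - weight ** i))
    return smoothed


def find_dips(steps, values):
    """Return (step, value) pairs where the value is lower than both neighbours."""
    dips = []
    for i in range(1, len(values) - 1):
        if values[i] < values[i - 1] and values[i] < values[i + 1]:
            dips.append((steps[i], values[i]))
    return dips


# Tiny made-up example: dips at steps 100 and 300
print(find_dips([0, 100, 200, 300, 400], [5, 3, 4, 2, 6]))  # [(100, 3), (300, 2)]
```

In practice you would run find_dips on the smoothed values and then test the saved .pth files closest to those steps against each other by ear, as described in the quote.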
3. The processing. Testing.
Refresh models. Choose the epoch you want. Write the path to the MP3 (audio file; better bitrate means better quality*). Leave everything else at default. Click convert.
You are on your own now.
You successfully made an AI voice. Now read more detailed guides and work out the best options yourself.
#EXTRA:
How to make an AI cover using an existing model on RVC v2: https://docs.google.com/document/d/13_l1bd1Osgz7qlAZn-zhklCbHpVRk6bYOuAuB78qmsE/edit
How to train a new voice model using RVC v2:
https://docs.google.com/document/d/13ebnzmeEBc6uzYCMt-QVFQk-whVrK4zw8k7_Lw3Bv_A/edit
How to install RVC v2 locally: https://docs.google.com/document/d/1KKKE7hoyGXMw-Lg0JWx16R8xz3OfxADjwEYJTqzDO1k/edit