Step by Step: Real Time Voice Cloning Demo Setup

Submitted by Xilodyne on Tue, 04/14/2020 - 17:26

Step by Step Guide

Installing and Running the Real Time Voice Cloning demo from CorentinJ

on Windows 10 Pro, CPU only

 

TL;DR

 

From your conda prompt, create and activate the environment, install the two pip-only packages, navigate to your code, and run python demo_cli.py:

Create and run the conda environment:

(base) > conda create --name rtvc --file spec-file_rtvc.txt
(base) > conda activate rtvc
(rtvc) > pip install webrtcvad
(rtvc) > pip install PyQT5
(rtvc) > python demo_cli.py
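Note: spec-file_rtvc.txt is a conda explicit spec file. If you don't have one, it can be generated from a working environment (such as the one built step by step below) with:

(rtvc) > conda list --explicit > spec-file_rtvc.txt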

 

Source Code and Instructions

https://github.com/CorentinJ/Real-Time-Voice-Cloning

Download the repository as a zip file, or clone:  https://github.com/CorentinJ/Real-Time-Voice-Cloning.git

Note:  The original repository did not work correctly in a CPU-only environment (see below); use the fork from: https://github.com/shawwn/Real-Time-Voice-Cloning

 

Development Environment

Windows 10 Pro in a VMware image

PyCharm 2020.1 (Community Edition)

Anaconda3-2020.02 x64

 

Steps

In the development folder, unzip Real-Time-Voice-Cloning-master.zip

The directions say to run pip install -r requirements.txt to install the necessary packages. However, I will install the packages manually.

 

Open an Anaconda prompt (Windows Key --> Anaconda3 (64-bit) --> Anaconda Prompt)

 

Create work environment

>conda create --name rtvc python=3.7

>conda activate rtvc

 

Install packages from requirements.txt

>conda install -c conda-forge tensorflow=1.14

Note: this installs the CPU-only build of TensorFlow, which is sufficient for evaluation purposes

  • Test the tensorflow install
test_tf-1.0.py (TensorFlow 1.x Hello World)

import os

# Restrict TensorFlow to CUDA device 0; this has no effect on a
# machine with no GPU.
os.environ['CUDA_VISIBLE_DEVICES'] = "0"

import tensorflow as tf

hello = tf.constant('Hello, TensorFlow!')
sess = tf.Session()
print(sess.run(hello))

 

Results (success) test_tf-1.0.py

2020-04-13 09:42:06.769691: I tensorflow/core/common_runtime/process_util.cc:115] Creating new thread pool with default inter op setting: 4. Tune using inter_op_parallelism_threads for best performance.
b'Hello, TensorFlow!'

Process finished with exit code 0
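To double-check that this really is the CPU build, TensorFlow can report whether it sees a GPU. A quick check of my own (not part of the original directions; tf.test.is_gpu_available() is the TF 1.x API):

test_tf_gpu_check.py

import tensorflow as tf

# On a CPU-only install this should print False.
print("GPU available:", tf.test.is_gpu_available())
print("TensorFlow version:", tf.__version__)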

 

> conda install -c pytorch pytorch  (https://anaconda.org/pytorch/pytorch)

PyTorch defaults to the GPU build

Collecting package metadata (repodata.json): done
Solving environment: done

## Package Plan ##

  environment location: C:\python-programs\Anaconda3\envs\rtvc

  added / updated specs:
    - pytorch


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    certifi-2020.4.5.1         |           py37_0         159 KB
    cudatoolkit-10.1.243       |       h74a9793_0       456.2 MB
    ninja-1.9.0                |   py37h74a9793_0         263 KB
    pytorch-1.4.0              |py3.7_cuda101_cudnn7_0       472.8 MB  pytorch
    ------------------------------------------------------------
                                           Total:       929.4 MB

The following NEW packages will be INSTALLED:

  cudatoolkit        pkgs/main/win-64::cudatoolkit-10.1.243-h74a9793_0
  ninja              pkgs/main/win-64::ninja-1.9.0-py37h74a9793_0
  pytorch            pytorch/win-64::pytorch-1.4.0-py3.7_cuda101_cudnn7_0

The following packages will be UPDATED:

  certifi                                 2019.11.28-py37_1 --> 2020.4.5.1-py37_0

 

As I'm running CPU only (no GPU available), this requires the CPU build of PyTorch:

> conda install pytorch torchvision cpuonly -c pytorch  (https://pytorch.org/get-started/locally/)
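Before the larger test below, a quick way to confirm the CPU-only build took effect is to ask PyTorch directly (a minimal sketch of my own; torch.cuda.is_available() should return False):

test_torch_cpu.py

import torch

# The cpuonly build reports no CUDA support and places tensors on the CPU.
print("CUDA available:", torch.cuda.is_available())
print(torch.ones(3).device)  # expected: cpu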

  • Test PyTorch (from command line, conda env rtvc, python pytorch_hello_world.py)
pytorch_hello_world.py

#https://nestedsoftware.com/2019/08/15/pytorch-hello-world-37mo.156165.html

import torch
import torch.nn as nn
import torch.optim as optim


class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.hidden_layer = nn.Linear(1, 1)
        self.hidden_layer.weight = torch.nn.Parameter(torch.tensor([[1.58]]))
        self.hidden_layer.bias = torch.nn.Parameter(torch.tensor([-0.14]))

        self.output_layer = nn.Linear(1, 1)
        self.output_layer.weight = torch.nn.Parameter(torch.tensor([[2.45]]))
        self.output_layer.bias = torch.nn.Parameter(torch.tensor([-0.11]))

    def forward(self, x):
        x = torch.sigmoid(self.hidden_layer(x))
        x = torch.sigmoid(self.output_layer(x))
        return x


net = Net()
print(f"network topology: {net}")

print(f"w_l1 = {round(net.hidden_layer.weight.item(), 4)}")
print(f"b_l1 = {round(net.hidden_layer.bias.item(), 4)}")
print(f"w_l2 = {round(net.output_layer.weight.item(), 4)}")
print(f"b_l2 = {round(net.output_layer.bias.item(), 4)}")

# run input data forward through network
input_data = torch.tensor([0.8])
output = net(input_data)
print(f"a_l2 = {round(output.item(), 4)}")

# backpropagate gradient
target = torch.tensor([1.])
criterion = nn.MSELoss()
loss = criterion(output, target)
net.zero_grad()
loss.backward()

# update weights and biases
optimizer = optim.SGD(net.parameters(), lr=0.1)
optimizer.step()

print(f"updated_w_l1 = {round(net.hidden_layer.weight.item(), 4)}")
print(f"updated_b_l1 = {round(net.hidden_layer.bias.item(), 4)}")
print(f"updated_w_l2 = {round(net.output_layer.weight.item(), 4)}")
print(f"updated_b_l2 = {round(net.output_layer.bias.item(), 4)}")

output = net(input_data)
print(f"updated_a_l2 = {round(output.item(), 4)}")

This produces a result that matches https://nestedsoftware.com/2019/08/15/pytorch-hello-world-37mo.156165.html

Results (success) pytorch_hello_world.py

C:\python-programs\Anaconda3\envs\rtvc\python.exe "D:/Projects/Voice Cloning/SV2TTS - Corentine/Real Time Voice Cloning (python)/pytorch_hello_world.py"
network topology: Net(
  (hidden_layer): Linear(in_features=1, out_features=1, bias=True)
  (output_layer): Linear(in_features=1, out_features=1, bias=True)
)
w_l1 = 1.58
b_l1 = -0.14
w_l2 = 2.45
b_l2 = -0.11
a_l2 = 0.8506
updated_w_l1 = 1.5814
updated_b_l1 = -0.1383
updated_w_l2 = 2.4529
updated_b_l2 = -0.1062
updated_a_l2 = 0.8515

Process finished with exit code 0

>conda install -c zeus1942 umap-learn  (https://anaconda.org/zeus1942/umap-learn)

>pip install webrtcvad

webrtcvad requires Microsoft Visual C++:

error: Microsoft Visual C++ 14.0 is required. Get it with "Build Tools for Visual Studio": https://visualstudio.microsoft.com/downloads/
  ----------------------------------------
  ERROR: Failed building wheel for webrtcvad
  Running setup.py clean for webrtcvad

After installing the "Build Tools for Visual Studio" from the link in the error message, retry:

>pip install webrtcvad

Successfully installed webrtcvad-2.0.10
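To verify the wheel actually works, webrtcvad can be exercised on a synthetic frame (a small sketch of my own; it feeds 30 ms of 16-bit mono silence at 16 kHz):

test_webrtcvad.py

import webrtcvad

vad = webrtcvad.Vad(2)  # aggressiveness mode, 0 (least) to 3 (most)
sample_rate = 16000     # webrtcvad accepts 8, 16, 32, or 48 kHz
# 30 ms of 16-bit mono silence: 480 samples x 2 bytes
frame = b"\x00\x00" * int(sample_rate * 0.03)
print("speech detected:", vad.is_speech(frame, sample_rate))  # expected: False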

>conda install -c conda-forge librosa

matplotlib already installed

numpy already installed

scipy already installed

>conda install tqdm

>conda install -c conda-forge python-sounddevice

>conda install unidecode

>conda install inflect

pyqt is already installed.

Note, however, that conda's pyqt 5.9.2 is not the same package as PyQt5 from PyPI (see below), so also:

>pip install PyQT5

>conda install -c conda-forge multiprocess

>conda install numba

>conda install -c conda-forge visdom
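With everything installed, a quick import smoke test catches any missing or broken package before running the demo (my own sanity check, not from the original directions):

test_imports.py

# Each module below corresponds to one of the packages installed above.
import tensorflow, torch, torchvision, umap, webrtcvad, librosa
import matplotlib, numpy, scipy, tqdm, sounddevice, unidecode
import inflect, PyQt5, multiprocess, numba, visdom
print("all requirement imports OK")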

Try Demo

  • Launch PyCharm, create a new project, and select the Real-Time-Voice-Cloning-master folder
  • Acknowledge the pop-up dialog asking to create the project from existing sources
  • Use conda env:  rtvc

 

From CorentinJ Read Me

Preliminary

Before you download any dataset, you can begin by testing your configuration with:
python demo_cli.py
If all tests pass, you're good to go.

Launch PyCharm, run demo_cli.py

Results (failure: CPU-only not supported) demo_cli.py

OMP: Info #212: KMP_AFFINITY: decoding x2APIC ids.
OMP: Info #210: KMP_AFFINITY: Affinity capable, using global cpuid leaf 11 info
OMP: Info #154: KMP_AFFINITY: Initial OS proc set respected: 0-3
OMP: Info #156: KMP_AFFINITY: 4 available OS procs
OMP: Info #157: KMP_AFFINITY: Uniform topology
OMP: Info #179: KMP_AFFINITY: 1 packages x 4 cores/pkg x 1 threads/core (4 total cores)
OMP: Info #214: KMP_AFFINITY: OS proc to physical thread map:
OMP: Info #171: KMP_AFFINITY: OS proc 0 maps to package 0 core 0
OMP: Info #171: KMP_AFFINITY: OS proc 1 maps to package 0 core 1
OMP: Info #171: KMP_AFFINITY: OS proc 2 maps to package 0 core 2
OMP: Info #171: KMP_AFFINITY: OS proc 3 maps to package 0 core 3
OMP: Info #250: KMP_AFFINITY: pid 2444 tid 4876 thread 0 bound to OS proc set 0
Arguments:
    enc_model_fpath:   encoder\saved_models\pretrained.pt
    syn_model_dir:     synthesizer\saved_models\logs-pretrained
    voc_model_fpath:   vocoder\saved_models\pretrained\pretrained.pt
    low_mem:           False
    no_sound:          False

Your PyTorch installation is not configured to use CUDA. If you have a GPU ready for deep learning, ensure that the drivers are properly installed, and that your CUDA version matches your PyTorch installation. CPU-only inference is currently not supported.
Running a test of your configuration...


Process finished with exit code -1

Performed the same test on an Ubuntu 18.04 installation (VMware). Same error.

Fix from https://github.com/CorentinJ/Real-Time-Voice-Cloning/issues/174

Fix for failed demo_cli.py in CPU-only environment

JQuezada0 commented on Dec 10, 2019

Is this project only usable with a nvidia gpu? I'm getting this error as well, but I have an intel gpu so I can't use CUDA.

Edit: Nevermind, this fork works fine https://github.com/shawwn/Real-Time-Voice-Cloning

Download source code from https://github.com/shawwn/Real-Time-Voice-Cloning

Run demo_cli.py in PyCharm

Results (success) demo_cli.py

OMP: Info #212: KMP_AFFINITY: decoding x2APIC ids.
OMP: Info #210: KMP_AFFINITY: Affinity capable, using global cpuid leaf 11 info
OMP: Info #154: KMP_AFFINITY: Initial OS proc set respected: 0-3
OMP: Info #156: KMP_AFFINITY: 4 available OS procs
OMP: Info #157: KMP_AFFINITY: Uniform topology
OMP: Info #179: KMP_AFFINITY: 1 packages x 4 cores/pkg x 1 threads/core (4 total cores)
OMP: Info #214: KMP_AFFINITY: OS proc to physical thread map:
OMP: Info #171: KMP_AFFINITY: OS proc 0 maps to package 0 core 0
OMP: Info #171: KMP_AFFINITY: OS proc 1 maps to package 0 core 1
OMP: Info #171: KMP_AFFINITY: OS proc 2 maps to package 0 core 2
OMP: Info #171: KMP_AFFINITY: OS proc 3 maps to package 0 core 3
OMP: Info #250: KMP_AFFINITY: pid 944 tid 8284 thread 0 bound to OS proc set 0
Arguments:
    enc_model_fpath:   encoder\saved_models\pretrained.pt
    syn_model_dir:     synthesizer\saved_models\logs-pretrained
    voc_model_fpath:   vocoder\saved_models\pretrained\pretrained.pt
    low_mem:           False
    no_sound:          False

Running a test of your configuration...

Preparing the encoder, the synthesizer and the vocoder...
Your PyTorch installation is not configured to use CUDA. If you have a GPU ready for deep learning, ensure that the drivers are properly installed, and that your CUDA version matches your PyTorch installation. CPU-only inference is currently not supported.
Traceback (most recent call last):
  File "D:/Projects/Voice Cloning/SV2TTS - shawwn/Real-Time-Voice-Cloning-master (python)/demo_cli.py", line 61, in <module>
    encoder.load_model(args.enc_model_fpath)
  File "D:\Projects\Voice Cloning\SV2TTS - shawwn\Real-Time-Voice-Cloning-master (python)\encoder\inference.py", line 33, in load_model
    checkpoint = torch.load(weights_fpath, map_location=_device)
  File "C:\python-programs\Anaconda3\envs\rtvc\lib\site-packages\torch\serialization.py", line 525, in load
    with _open_file_like(f, 'rb') as opened_file:
  File "C:\python-programs\Anaconda3\envs\rtvc\lib\site-packages\torch\serialization.py", line 212, in _open_file_like
    return _open_file(name_or_buffer, mode)
  File "C:\python-programs\Anaconda3\envs\rtvc\lib\site-packages\torch\serialization.py", line 193, in __init__
    super(_open_file, self).__init__(open(name, mode))
FileNotFoundError: [Errno 2] No such file or directory: 'encoder\\saved_models\\pretrained.pt'

Process finished with exit code 1

Indeed, the pretrained model files are missing and must be downloaded separately:

https://github.com/CorentinJ/Real-Time-Voice-Cloning/wiki/Pretrained-models

Pretrained Models

Pretrained models come as an archive that contains all three models (speaker encoder, synthesizer, vocoder). The archive comes with the same directory structure as the repo, and you're expected to merge its contents with the root of the repository. For reference, the GPUs used for training are GTX 1080 Ti.

Initial commit (latest release) [Google drive] [MEGA]

Encoder: trained 1.56M steps (20 days with a single GPU) with a batch size of 64
Synthesizer: trained 256k steps (1 week with 4 GPUs) with a batch size of 144
Vocoder: trained 428k steps (4 days with a single GPU) with a batch size of 100

Downloaded https://drive.google.com/file/d/1n1sPXvT34yXFLT47QZA6FIRGrwMeSsZc/view and unzipped.
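Since the archive must be merged into the repository root, it is worth confirming the three models landed where demo_cli.py expects them (a small sketch of my own using the default paths from the Arguments printout above; run it from the repository root):

check_pretrained_models.py

from pathlib import Path

# Default model paths from demo_cli.py's Arguments printout.
expected = [
    Path("encoder/saved_models/pretrained.pt"),
    Path("synthesizer/saved_models/logs-pretrained"),
    Path("vocoder/saved_models/pretrained/pretrained.pt"),
]
for p in expected:
    print(p, "OK" if p.exists() else "MISSING")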

Results (success) demo_cli.py

C:\python-programs\Anaconda3\envs\rtvc\python.exe "D:/Projects/Voice Cloning/SV2TTS - shawwn/Real-Time-Voice-Cloning-master (python)/demo_cli.py"

(Tensorflow warnings removed...)

OMP: Info #212: KMP_AFFINITY: decoding x2APIC ids.
OMP: Info #210: KMP_AFFINITY: Affinity capable, using global cpuid leaf 11 info
OMP: Info #154: KMP_AFFINITY: Initial OS proc set respected: 0-3
OMP: Info #156: KMP_AFFINITY: 4 available OS procs
OMP: Info #157: KMP_AFFINITY: Uniform topology
OMP: Info #179: KMP_AFFINITY: 1 packages x 4 cores/pkg x 1 threads/core (4 total cores)
OMP: Info #214: KMP_AFFINITY: OS proc to physical thread map:
OMP: Info #171: KMP_AFFINITY: OS proc 0 maps to package 0 core 0
OMP: Info #171: KMP_AFFINITY: OS proc 1 maps to package 0 core 1
OMP: Info #171: KMP_AFFINITY: OS proc 2 maps to package 0 core 2
OMP: Info #171: KMP_AFFINITY: OS proc 3 maps to package 0 core 3
OMP: Info #250: KMP_AFFINITY: pid 8996 tid 5892 thread 0 bound to OS proc set 0
Arguments:
    enc_model_fpath:   encoder\saved_models\pretrained.pt
    syn_model_dir:     synthesizer\saved_models\logs-pretrained
    voc_model_fpath:   vocoder\saved_models\pretrained\pretrained.pt
    low_mem:           False
    no_sound:          False

Your PyTorch installation is not configured to use CUDA. If you have a GPU ready for deep learning, ensure that the drivers are properly installed, and that your CUDA version matches your PyTorch installation. CPU-only inference is currently not supported.
Running a test of your configuration...

Preparing the encoder, the synthesizer and the vocoder...
OMP: Info #250: KMP_AFFINITY: pid 8996 tid 424 thread 1 bound to OS proc set 1
OMP: Info #250: KMP_AFFINITY: pid 8996 tid 1888 thread 2 bound to OS proc set 2
OMP: Info #250: KMP_AFFINITY: pid 8996 tid 9136 thread 3 bound to OS proc set 3
Loaded encoder "pretrained.pt" trained to step 1564501
Found synthesizer "pretrained" trained to step 278000
Building Wave-RNN
Trainable Parameters: 4.481M
Loading model weights at vocoder\saved_models\pretrained\pretrained.pt
Testing your configuration with small inputs.
    Testing the encoder...
WARNING:tensorflow:From D:\Projects\Voice Cloning\SV2TTS - shawwn\Real-Time-Voice-Cloning-master (python)\synthesizer\inference.py:57: The name tf.reset_default_graph is deprecated. Please use tf.compat.v1.reset_default_graph instead.

(Tensorflow warnings removed...)

...initialisation done /gpu:0
Initialized Tacotron model. Dimensions (? = dynamic shape):
  Train mode:               False
  Eval mode:                False
  GTA mode:                 False
  Synthesis mode:           True
  Input:                    (?, ?)
  device:                   0
  embedding:                (?, ?, 512)
  enc conv out:             (?, ?, 512)
  encoder out (cond):       (?, ?, 768)
  decoder out:              (?, ?, 80)
  residual out:             (?, ?, 512)
  projected residual out:   (?, ?, 80)
  mel out:                  (?, ?, 80)
  <stop_token> out:         (?, ?)
  Tacotron Parameters       28.439 Million.
Loading checkpoint: synthesizer\saved_models\logs-pretrained\taco_pretrained\tacotron_model.ckpt-278000
WARNING:tensorflow:From D:\Projects\Voice Cloning\SV2TTS - shawwn\Real-Time-Voice-Cloning-master (python)\synthesizer\models\tacotron.py:286: The name tf.trainable_variables is deprecated. Please use tf.compat.v1.trainable_variables instead.

2020-04-13 11:12:21.391252: I tensorflow/core/common_runtime/process_util.cc:115] Creating new thread pool with default inter op setting: 4. Tune using inter_op_parallelism_threads for best performance.
WARNING:tensorflow:From D:\Projects\Voice Cloning\SV2TTS - shawwn\Real-Time-Voice-Cloning-master (python)\synthesizer\tacotron2.py:62: The name tf.train.Saver is deprecated. Please use tf.compat.v1.train.Saver instead.

WARNING:tensorflow:From C:\python-programs\Anaconda3\envs\rtvc\lib\site-packages\tensorflow\python\training\saver.py:1276: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to check for files with this prefix.
OMP: Info #250: KMP_AFFINITY: pid 8996 tid 1432 thread 4 bound to OS proc set 0
OMP: Info #250: KMP_AFFINITY: pid 8996 tid 756 thread 5 bound to OS proc set 1
OMP: Info #250: KMP_AFFINITY: pid 8996 tid 2012 thread 6 bound to OS proc set 2
OMP: Info #250: KMP_AFFINITY: pid 8996 tid 7152 thread 7 bound to OS proc set 3
OMP: Info #250: KMP_AFFINITY: pid 8996 tid 7340 thread 8 bound to OS proc set 0
OMP: Info #250: KMP_AFFINITY: pid 8996 tid 7184 thread 9 bound to OS proc set 1
OMP: Info #250: KMP_AFFINITY: pid 8996 tid 4888 thread 10 bound to OS proc set 2
OMP: Info #250: KMP_AFFINITY: pid 8996 tid 3912 thread 11 bound to OS proc set 3
OMP: Info #250: KMP_AFFINITY: pid 8996 tid 4496 thread 12 bound to OS proc set 0
OMP: Info #250: KMP_AFFINITY: pid 8996 tid 8696 thread 13 bound to OS proc set 1
OMP: Info #250: KMP_AFFINITY: pid 8996 tid 124 thread 14 bound to OS proc set 2
OMP: Info #250: KMP_AFFINITY: pid 8996 tid 4064 thread 15 bound to OS proc set 3
OMP: Info #250: KMP_AFFINITY: pid 8996 tid 2920 thread 16 bound to OS proc set 0
OMP: Info #250: KMP_AFFINITY: pid 8996 tid 5044 thread 17 bound to OS proc set 1
OMP: Info #250: KMP_AFFINITY: pid 8996 tid 3748 thread 19 bound to OS proc set 3
OMP: Info #250: KMP_AFFINITY: pid 8996 tid 4052 thread 18 bound to OS proc set 2
    Testing the vocoder...
All test passed! You can now synthesize speech.


This is a GUI-less example of interface to SV2TTS. The purpose of this script is to show how you can interface this project easily with your own. See the source code for an explanation of what is happening.

Interactive generation loop
Reference voice: enter an audio filepath of a voice to be cloned (mp3, wav, m4a, flac, ...):

When pointing to my mp3 file, I got the error: Caught exception: NoBackendError()

This appears to be librosa (via audioread) missing a backend that can decode compressed audio such as mp3 or ogg

test_librosa.py

import librosa

# librosa's bundled example file is an OGG; decoding it (like decoding
# an mp3) requires an audioread backend such as ffmpeg.
y, sr = librosa.load(librosa.util.example_audio_file())

 

No Backend Error for librosa

Traceback (most recent call last):
  File "D:/Projects/Voice Cloning/TestLibRosa (python)/test_librosa.py", line 3, in <module>
    y, sr = librosa.load(librosa.util.example_audio_file())
  File "C:\python-programs\Anaconda3\envs\rtvc\lib\site-packages\librosa\core\audio.py", line 119, in load
    with audioread.audio_open(os.path.realpath(path)) as input_file:
  File "C:\python-programs\Anaconda3\envs\rtvc\lib\site-packages\audioread\__init__.py", line 116, in audio_open
    raise NoBackendError()
audioread.exceptions.NoBackendError

Process finished with exit code 1

Installed ffmpeg as recommended here: https://github.com/librosa/librosa/issues/219

Unzipped ffmpeg-4.2.2-win64-static.zip and added ffmpeg-4.2.2-win64-static\bin to the Windows PATH.

Rebooted the VM.

Verify that ffmpeg is on the Windows PATH:

C:\Users\aholiday>echo %PATH%
C:\Windows\system32;C:\Windows;C:\Windows\System32\Wbem;C:\Windows\System32\WindowsPowerShell\v1.0\;C:\Windows\System32\OpenSSH\;C:\java-programs\jdk-11.0.3\bin\java.exe;D:\Projects\Voice Cloning\ffmpeg-4.2.2-win64-static\bin;C:\python-programs\Anaconda3;C:\python-programs\Anaconda3\Library\mingw-w64\bin;C:\python-programs\Anaconda3\Library\usr\bin;C:\python-programs\Anaconda3\Library\bin;C:\python-programs\Anaconda3\Scripts;C:\Users\aholiday\AppData\Local\Microsoft\WindowsApps

C:\Users\aholiday>ffmpeg
ffmpeg version 4.2.2 Copyright (c) 2000-2019 the FFmpeg developers
  built with gcc 9.2.1 (GCC) 20200122
  configuration: --enable-gpl --enable-version3 --enable-sdl2 --enable-fontconfig --enable-gnutls --enable-iconv --enable-libass --enable-libdav1d --enable-libbluray --enable-libfreetype --enable-libmp3lame --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-libopus --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libtheora --enable-libtwolame --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libzimg --enable-lzma --enable-zlib --enable-gmp --enable-libvidstab --enable-libvorbis --enable-libvo-amrwbenc --enable-libmysofa --enable-libspeex --enable-libxvid --enable-libaom --enable-libmfx --enable-amf --enable-ffnvcodec --enable-cuvid --enable-d3d11va --enable-nvenc --enable-nvdec --enable-dxva2 --enable-avisynth --enable-libopenmpt
  libavutil      56. 31.100 / 56. 31.100
  libavcodec     58. 54.100 / 58. 54.100
  libavformat    58. 29.100 / 58. 29.100
  libavdevice    58.  8.100 / 58.  8.100
  libavfilter     7. 57.100 /  7. 57.100
  libswscale      5.  5.100 /  5.  5.100
  libswresample   3.  5.100 /  3.  5.100
  libpostproc    55.  5.100 / 55.  5.100
Hyper fast Audio and Video encoder
usage: ffmpeg [options] [[infile options] -i infile]... {[outfile options] outfile}...

Use -h to get full help or, even better, run 'man ffmpeg'

C:\Users\aholiday>
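The same check can also be done from Python, confirming that the interpreter's environment sees ffmpeg too (a minimal check of my own using only the standard library):

check_ffmpeg_path.py

import shutil

# Prints the full path to ffmpeg if it is on PATH, otherwise None.
print(shutil.which("ffmpeg"))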

Rerun test_librosa.py

Results (success) test_librosa.py

C:\python-programs\Anaconda3\envs\rtvc\python.exe "D:/Projects/Voice Cloning/TestLibRosa (python)/test_librosa.py"

Process finished with exit code 0

Running demo_cli.py with ffmpeg installed, using a 38 MB mp3 voice sample file

Results (failure: low memory error) demo_cli.py

This is a GUI-less example of interface to SV2TTS. The purpose of this script is to show how you can interface this project easily with your own. See the source code for an explanation of what is happening.

Interactive generation loop
Reference voice: enter an audio filepath of a voice to be cloned (mp3, wav, m4a, flac, ...):
..\..\voicefiles\ADH_sample.mp3
Loaded file succesfully
Caught exception: RuntimeError('[enforce fail at ..\\c10\\core\\CPUAllocator.cpp:72] data. DefaultCPUAllocator: not enough memory: you tried to allocate 1005568 bytes. Buy new RAM!\n')
Restarting

Current VM memory: 6.1 GB, with no memory disk cache.

Changed the VM memory to 12.1 GB (no memory disk cache); 9.4 GB was available with PyCharm launched.

Results (success) demo_cli.py

This is a GUI-less example of interface to SV2TTS. The purpose of this script is to show how you can interface this project easily with your own. See the source code for an explanation of what is happening.

Interactive generation loop
Reference voice: enter an audio filepath of a voice to be cloned (mp3, wav, m4a, flac, ...):
..\..\voicefiles\ADH_sample.mp3
Loaded file succesfully
Created the embedding
Write a sentence (+-20 words) to be synthesized:
Hi there, this is Austin
Created the mel spectrogram
Synthesizing the waveform:
{| ████████████████ 76000/76800 | Batch Size: 8 | Gen Rate: 0.9kHz | }float64

Saved output as demo_output_00.wav
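To listen to the generated file without leaving Python, the already-installed librosa and sounddevice packages will do (a small sketch of my own; run it from the directory containing the output file):

play_demo_output.py

import librosa
import sounddevice as sd

# sr=None preserves the file's native sample rate.
y, sr = librosa.load("demo_output_00.wav", sr=None)
sd.play(y, sr)
sd.wait()  # block until playback finishes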

Run:  demo_toolbox.py

Results (failure: PyQT5 Error) demo_toolbox.py

Traceback (most recent call last):
  File "D:/Projects/Voice Cloning/SV2TTS - shawwn/Real-Time-Voice-Cloning-master (python)/demo_toolbox.py", line 2, in <module>
    from toolbox import Toolbox
  File "D:\Projects\Voice Cloning\SV2TTS - shawwn\Real-Time-Voice-Cloning-master (python)\toolbox\__init__.py", line 1, in <module>
    from toolbox.ui import UI
  File "D:\Projects\Voice Cloning\SV2TTS - shawwn\Real-Time-Voice-Cloning-master (python)\toolbox\ui.py", line 1, in <module>
    from matplotlib.backends.backend_qt5agg import FigureCanvasQTAgg as FigureCanvas
  File "C:\python-programs\Anaconda3\envs\rtvc\lib\site-packages\matplotlib\backends\backend_qt5agg.py", line 11, in <module>
    from .backend_qt5 import (
  File "C:\python-programs\Anaconda3\envs\rtvc\lib\site-packages\matplotlib\backends\backend_qt5.py", line 15, in <module>
    import matplotlib.backends.qt_editor.figureoptions as figureoptions
  File "C:\python-programs\Anaconda3\envs\rtvc\lib\site-packages\matplotlib\backends\qt_editor\figureoptions.py", line 12, in <module>
    from matplotlib.backends.qt_compat import QtGui
  File "C:\python-programs\Anaconda3\envs\rtvc\lib\site-packages\matplotlib\backends\qt_compat.py", line 168, in <module>
    raise ImportError("Failed to import any qt binding")
ImportError: Failed to import any qt binding

Process finished with exit code 1

 

>conda list pyqt*

# Name                    Version                   Build  Channel

pyqt                      5.9.2            py37h6538335_4    conda-forge

Apparently conda's pyqt v5.9.2 is not the same package as PyQt5 from PyPI.

>pip install PyQT5

>conda list pyqt*
# Name                    Version                   Build  Channel
pyqt                      5.9.2            py37h6538335_4    conda-forge
pyqt5                     5.14.2                   pypi_0    pypi
pyqt5-sip                 12.7.2                   pypi_0    pypi

Run:  demo_toolbox.py

Results (success, but missing dataset) demo_toolbox.py

WARNING:tensorflow:From D:\Projects\Voice Cloning\SV2TTS - shawwn\Real-Time-Voice-Cloning-master (python)\synthesizer\models\modules.py:91: The name tf.nn.rnn_cell.RNNCell is deprecated. Please use tf.compat.v1.nn.rnn_cell.RNNCell instead.

Arguments:
    datasets_root:    None
    enc_models_dir:   encoder\saved_models
    syn_models_dir:   synthesizer\saved_models
    voc_models_dir:   vocoder\saved_models
    low_mem:          False

Warning: you did not pass a root directory for datasets as argument.
The recognized datasets are:
    LibriSpeech/dev-clean
    LibriSpeech/dev-other
    LibriSpeech/test-clean
    LibriSpeech/test-other
    LibriSpeech/train-clean-100
    LibriSpeech/train-clean-360
    LibriSpeech/train-other-500
    LibriTTS/dev-clean
    LibriTTS/dev-other
    LibriTTS/test-clean
    LibriTTS/test-other
    LibriTTS/train-clean-100
    LibriTTS/train-clean-360
    LibriTTS/train-other-500
    LJSpeech-1.1
    VoxCeleb1/wav
    VoxCeleb1/test_wav
    VoxCeleb2/dev/aac
    VoxCeleb2/test/aac
    VCTK-Corpus/wav48
Feel free to add your own. You can still use the toolbox by recording samples yourself.
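The warning is harmless for a quick test: samples can be recorded directly in the toolbox. To use a downloaded dataset instead, pass the datasets root when launching (per the warning above; the path below is a hypothetical example, and the exact argument form may vary by version):

> python demo_toolbox.py D:\datasets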

Conda Environment

>conda list

(rtvc) C:\Users\aholiday>conda list
# packages in environment at C:\python-programs\Anaconda3\envs\rtvc:
#
# Name                    Version                   Build  Channel
_tflow_select             2.3.0                       mkl
absl-py                   0.9.0            py37hc8dfbb8_1    conda-forge
astor                     0.7.1                      py_0    conda-forge
audioread                 2.1.8            py37hc8dfbb8_2    conda-forge
blas                      1.0                         mkl
ca-certificates           2020.4.5.1           hecc5488_0    conda-forge
certifi                   2020.4.5.1       py37hc8dfbb8_0    conda-forge
cffi                      1.14.0           py37h7a1dbc1_0
chardet                   3.0.4           py37hc8dfbb8_1006    conda-forge
cpuonly                   1.0                           0    pytorch
cryptography              2.8              py37hb32ad35_1    conda-forge
cycler                    0.10.0                     py_2    conda-forge
decorator                 4.4.2                      py_0    conda-forge
dill                      0.3.1.1          py37hc8dfbb8_1    conda-forge
freetype                  2.9.1                ha9979f8_1
gast                      0.3.3                      py_0    conda-forge
grpcio                    1.23.0           py37h3f65fb1_1    conda-forge
h5py                      2.10.0          nompi_py37h422b98e_102    conda-forge
hdf5                      1.10.5          nompi_ha405e13_1104    conda-forge
icc_rt                    2019.0.0             h0cc432a_1
icu                       58.2                 ha66f8fd_1
idna                      2.9                        py_1    conda-forge
importlib_metadata        1.5.0                    py37_0
inflect                   4.1.0                    py37_0
intel-openmp              2019.4                      245
joblib                    0.14.1                     py_0
jpeg                      9b                   hb83a4c4_2
keras-applications        1.0.8                      py_1    conda-forge
keras-preprocessing       1.1.0                      py_0    conda-forge
kiwisolver                1.2.0            py37heaa310e_0    conda-forge
libblas                   3.8.0                    14_mkl    conda-forge
libcblas                  3.8.0                    14_mkl    conda-forge
liblapack                 3.8.0                    14_mkl    conda-forge
libmklml                  2019.0.5                      0
libpng                    1.6.37               h2a8f88b_0
libprotobuf               3.11.4               h1a1b453_0    conda-forge
librosa                   0.6.3                      py_0    conda-forge
libsodium                 1.0.17               h2fa13f4_0    conda-forge
libtiff                   4.1.0                h56a325e_0
llvmlite                  0.31.0           py37ha925a31_0
m2w64-gcc-libgfortran     5.3.0                         6
m2w64-gcc-libs            5.3.0                         7
m2w64-gcc-libs-core       5.3.0                         7
m2w64-gmp                 6.1.0                         2
m2w64-libwinpthread-git   5.0.0.4634.697f757               2
markdown                  3.2.1                      py_0    conda-forge
matplotlib                3.2.1                         0    conda-forge
matplotlib-base           3.2.1            py37h911224e_0    conda-forge
mkl                       2019.4                      245
mkl-service               2.3.0            py37hfa6e2cd_0    conda-forge
msys2-conda-epoch         20160418                      1
multiprocess              0.70.9           py37h8055547_1    conda-forge
ninja                     1.9.0            py37h74a9793_0
numba                     0.48.0           py37h47e9c7a_0
numpy                     1.18.1           py37h90d3380_1    conda-forge
olefile                   0.46                     py37_0
openssl                   1.1.1f               hfa6e2cd_0    conda-forge
pillow                    7.0.0            py37hcc1f983_0
pip                       20.0.2                   py37_1
portaudio                 19.6.0               hca4a3dc_2    conda-forge
protobuf                  3.11.4           py37h5fe3f0a_1    conda-forge
pycparser                 2.20                       py_0
pyopenssl                 19.1.0                     py_1    conda-forge
pyparsing                 2.4.7              pyh9f0ad1d_0    conda-forge
pyqt                      5.9.2            py37h6538335_4    conda-forge
pyqt5                     5.14.2                   pypi_0    pypi
pyqt5-sip                 12.7.2                   pypi_0    pypi
pyreadline                2.1                   py37_1001    conda-forge
pysocks                   1.7.1            py37hc8dfbb8_1    conda-forge
python                    3.7.7           h60c2a47_0_cpython
python-dateutil           2.8.1                      py_0    conda-forge
python-sounddevice        0.3.15             pyh8c360ce_0    conda-forge
python_abi                3.7                     1_cp37m    conda-forge
pytorch                   1.4.0               py3.7_cpu_0  [cpuonly]  pytorch
pyzmq                     19.0.0           py37h8c16cda_1    conda-forge
qt                        5.9.7            vc14h73c81de_0
requests                  2.23.0             pyh8c360ce_2    conda-forge
resampy                   0.2.2                      py_0    conda-forge
scikit-learn              0.22.1           py37h6288b17_0
scipy                     1.3.1            py37h29ff71c_0    conda-forge
setuptools                46.1.3                   py37_0
sip                       4.19.8          py37h6538335_1000    conda-forge
six                       1.14.0                     py_1    conda-forge
sqlite                    3.31.1               he774522_0
tbb                       2020.0               h74a9793_0
tensorboard               1.14.0                   py37_0    conda-forge
tensorflow                1.14.0          mkl_py37h7908ca0_0
tensorflow-base           1.14.0          mkl_py37ha978198_0
tensorflow-estimator      1.14.0           py37h5ca1d4c_0    conda-forge
termcolor                 1.1.0                      py_2    conda-forge
tk                        8.6.8                hfa6e2cd_0
torchfile                 0.1.0                      py_0    conda-forge
torchvision               0.5.0                  py37_cpu  [cpuonly]  pytorch
tornado                   6.0.4            py37hfa6e2cd_0    conda-forge
tqdm                      4.44.1                     py_0
umap-learn                0.3.10                   py37_1    zeus1942
unidecode                 1.1.1                      py_0
urllib3                   1.25.8           py37hc8dfbb8_1    conda-forge
vc                        14.1                 h0510ff6_4
visdom                    0.1.8.9                       0    conda-forge
vs2015_runtime            14.16.27012          hf0eaf9b_1
webrtcvad                 2.0.10                   pypi_0    pypi
websocket-client          0.57.0           py37hc8dfbb8_1    conda-forge
werkzeug                  1.0.1              pyh9f0ad1d_0    conda-forge
wheel                     0.34.2                   py37_0
win_inet_pton             1.1.0                    py37_0    conda-forge
wincertstore              0.2                      py37_0
wrapt                     1.12.1           py37h8055547_1    conda-forge
xz                        5.2.4                h2fa13f4_4
zeromq                    4.3.2                h6538335_2    conda-forge
zipp                      2.2.0                      py_0
zlib                      1.2.11            h2fa13f4_1006    conda-forge
zstd                      1.3.7                h508b16e_0