Step by Step: Real Time Voice Cloning Demo Setup

Submitted by Xilodyne on Tue, 04/14/2020 - 17:26

Step by Step Guide

Installing and Running the Real Time Voice Cloning demo from CorentinJ

on Windows 10 Pro, CPU only

 

TL;DR

 

From your conda prompt, create and activate the environment, install the two pip-only packages, navigate to your code, and run python demo_cli.py:

Create and run the conda environment:

(base) > conda create --name rtvc --file spec-file_rtvc.txt
(base) > conda activate rtvc
(rtvc) > pip install webrtcvad
(rtvc) > pip install PyQT5
(rtvc) > python demo_cli.py
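Note: spec-file_rtvc.txt is a conda explicit spec file. If you don't have one, it can be generated from a working environment (such as the one built step by step below) with:

(rtvc) > conda list --explicit > spec-file_rtvc.txt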

 

Source Code and Instructions

https://github.com/CorentinJ/Real-Time-Voice-Cloning

Download the repository as a zip file, or clone:  https://github.com/CorentinJ/Real-Time-Voice-Cloning.git

Note:  The original repository did not work correctly in a CPU-only environment (see below); use the fork from: https://github.com/shawwn/Real-Time-Voice-Cloning

 

Development Environment

Windows 10 Pro in a VMware image

PyCharm 2020.1 (Community Edition)

Anaconda3-2020.02 x64

 

Steps

In the development folder, unzip Real-Time-Voice-Cloning-master.zip

The directions say to run pip install -r requirements.txt to install the necessary packages. However, I will install the packages manually.

 

Open an Anaconda prompt (Windows Key --> Anaconda3 (64-bit) --> Anaconda Prompt)

 

Create work environment

>conda create --name rtvc python=3.7

>conda activate rtvc

 

Install packages from requirements.txt

>conda install -c conda-forge tensorflow=1.14

Note: this installs the CPU-only build of TensorFlow, which is sufficient for evaluation purposes

  • Test the tensorflow install
test_tf-1.0.py (TensorFlow 1.x Hello World)

import os

# Restrict TensorFlow to CUDA device 0; this has no effect on a
# machine with no GPU.
os.environ['CUDA_VISIBLE_DEVICES'] = "0"

import tensorflow as tf

hello = tf.constant('Hello, TensorFlow!')
sess = tf.Session()
print(sess.run(hello))

 

Results (success) test_tf-1.0.py

2020-04-13 09:42:06.769691: I tensorflow/core/common_runtime/process_util.cc:115] Creating new thread pool with default inter op setting: 4. Tune using inter_op_parallelism_threads for best performance.
b'Hello, TensorFlow!'

Process finished with exit code 0
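To double-check that this really is the CPU build, TensorFlow can report whether it sees a GPU. A quick check of my own (not part of the original directions; tf.test.is_gpu_available() is the TF 1.x API):

test_tf_gpu_check.py

import tensorflow as tf

# On a CPU-only install this should print False.
print("GPU available:", tf.test.is_gpu_available())
print("TensorFlow version:", tf.__version__)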

 

> conda install -c pytorch pytorch  (https://anaconda.org/pytorch/pytorch)

PyTorch defaults to the GPU build

Collecting package metadata (repodata.json): done
Solving environment: done

## Package Plan ##

  environment location: C:\python-programs\Anaconda3\envs\rtvc

  added / updated specs:
    - pytorch


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    certifi-2020.4.5.1         |           py37_0         159 KB
    cudatoolkit-10.1.243       |       h74a9793_0       456.2 MB
    ninja-1.9.0                |   py37h74a9793_0         263 KB
    pytorch-1.4.0              |py3.7_cuda101_cudnn7_0       472.8 MB  pytorch
    ------------------------------------------------------------
                                           Total:       929.4 MB

The following NEW packages will be INSTALLED:

  cudatoolkit        pkgs/main/win-64::cudatoolkit-10.1.243-h74a9793_0
  ninja              pkgs/main/win-64::ninja-1.9.0-py37h74a9793_0
  pytorch            pytorch/win-64::pytorch-1.4.0-py3.7_cuda101_cudnn7_0

The following packages will be UPDATED:

  certifi                                 2019.11.28-py37_1 --> 2020.4.5.1-py37_0

 

As I'm running CPU only (no GPU available), this requires the CPU build of PyTorch:

> conda install pytorch torchvision cpuonly -c pytorch  (https://pytorch.org/get-started/locally/)
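Before the larger test below, a quick way to confirm the CPU-only build took effect is to ask PyTorch directly (a minimal sketch of my own; torch.cuda.is_available() should return False):

test_torch_cpu.py

import torch

# The cpuonly build reports no CUDA support and places tensors on the CPU.
print("CUDA available:", torch.cuda.is_available())
print(torch.ones(3).device)  # expected: cpu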

  • Test PyTorch (from command line, conda env rtvc, python pytorch_hello_world.py)
pytorch_hello_world.py

#https://nestedsoftware.com/2019/08/15/pytorch-hello-world-37mo.156165.html

import torch
import torch.nn as nn
import torch.optim as optim


class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.hidden_layer = nn.Linear(1, 1)
        self.hidden_layer.weight = torch.nn.Parameter(torch.tensor([[1.58]]))
        self.hidden_layer.bias = torch.nn.Parameter(torch.tensor([-0.14]))

        self.output_layer = nn.Linear(1, 1)
        self.output_layer.weight = torch.nn.Parameter(torch.tensor([[2.45]]))
        self.output_layer.bias = torch.nn.Parameter(torch.tensor([-0.11]))

    def forward(self, x):
        x = torch.sigmoid(self.hidden_layer(x))
        x = torch.sigmoid(self.output_layer(x))
        return x


net = Net()
print(f"network topology: {net}")

print(f"w_l1 = {round(net.hidden_layer.weight.item(), 4)}")
print(f"b_l1 = {round(net.hidden_layer.bias.item(), 4)}")
print(f"w_l2 = {round(net.output_layer.weight.item(), 4)}")
print(f"b_l2 = {round(net.output_layer.bias.item(), 4)}")

# run input data forward through network
input_data = torch.tensor([0.8])
output = net(input_data)
print(f"a_l2 = {round(output.item(), 4)}")

# backpropagate gradient
target = torch.tensor([1.])
criterion = nn.MSELoss()
loss = criterion(output, target)
net.zero_grad()
loss.backward()

# update weights and biases
optimizer = optim.SGD(net.parameters(), lr=0.1)
optimizer.step()

print(f"updated_w_l1 = {round(net.hidden_layer.weight.item(), 4)}")
print(f"updated_b_l1 = {round(net.hidden_layer.bias.item(), 4)}")
print(f"updated_w_l2 = {round(net.output_layer.weight.item(), 4)}")
print(f"updated_b_l2 = {round(net.output_layer.bias.item(), 4)}")

output = net(input_data)
print(f"updated_a_l2 = {round(output.item(), 4)}")

This produces a result that matches https://nestedsoftware.com/2019/08/15/pytorch-hello-world-37mo.156165.html

Results (success) pytorch_hello_world.py

C:\python-programs\Anaconda3\envs\rtvc\python.exe "D:/Projects/Voice Cloning/SV2TTS - Corentine/Real Time Voice Cloning (python)/pytorch_hello_world.py"
network topology: Net(
  (hidden_layer): Linear(in_features=1, out_features=1, bias=True)
  (output_layer): Linear(in_features=1, out_features=1, bias=True)
)
w_l1 = 1.58
b_l1 = -0.14
w_l2 = 2.45
b_l2 = -0.11
a_l2 = 0.8506
updated_w_l1 = 1.5814
updated_b_l1 = -0.1383
updated_w_l2 = 2.4529
updated_b_l2 = -0.1062
updated_a_l2 = 0.8515

Process finished with exit code 0

>conda install -c zeus1942 umap-learn  (https://anaconda.org/zeus1942/umap-learn)

>pip install webrtcvad

webrtcvad requires Microsoft Visual C++:

error: Microsoft Visual C++ 14.0 is required. Get it with "Build Tools for Visual Studio": https://visualstudio.microsoft.com/downloads/
  ----------------------------------------
  ERROR: Failed building wheel for webrtcvad
  Running setup.py clean for webrtcvad

After installing the "Build Tools for Visual Studio" from the link in the error message, retry:

>pip install webrtcvad

Successfully installed webrtcvad-2.0.10
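To verify the wheel actually works, webrtcvad can be exercised on a synthetic frame (a small sketch of my own; it feeds 30 ms of 16-bit mono silence at 16 kHz):

test_webrtcvad.py

import webrtcvad

vad = webrtcvad.Vad(2)  # aggressiveness mode, 0 (least) to 3 (most)
sample_rate = 16000     # webrtcvad accepts 8, 16, 32, or 48 kHz
# 30 ms of 16-bit mono silence: 480 samples x 2 bytes
frame = b"\x00\x00" * int(sample_rate * 0.03)
print("speech detected:", vad.is_speech(frame, sample_rate))  # expected: False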

>conda install -c conda-forge librosa

matplotlib already installed

numpy already installed

scipy already installed

>conda install tqdm

>conda install -c conda-forge python-sounddevice

>conda install unidecode

>conda install inflect

pyqt is already installed.

Note, however, that conda's pyqt 5.9.2 is not the same package as PyQt5 from PyPI (see below), so also:

>pip install PyQT5

>conda install -c conda-forge multiprocess

>conda install numba

>conda install -c conda-forge visdom
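With everything installed, a quick import smoke test catches any missing or broken package before running the demo (my own sanity check, not from the original directions):

test_imports.py

# Each module below corresponds to one of the packages installed above.
import tensorflow, torch, torchvision, umap, webrtcvad, librosa
import matplotlib, numpy, scipy, tqdm, sounddevice, unidecode
import inflect, PyQt5, multiprocess, numba, visdom
print("all requirement imports OK")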

Try Demo

  • Launch PyCharm, create a new project, and select the Real-Time-Voice-Cloning-master folder
  • Acknowledge the pop-up dialog asking to create the project from existing sources
  • Use conda env:  rtvc

 

From CorentinJ Read Me

Preliminary

Before you download any dataset, you can begin by testing your configuration with:
python demo_cli.py
If all tests pass, you're good to go.

Launch PyCharm, run demo_cli.py

Results (failure: CPU-only not supported) demo_cli.py

OMP: Info #212: KMP_AFFINITY: decoding x2APIC ids.
OMP: Info #210: KMP_AFFINITY: Affinity capable, using global cpuid leaf 11 info
OMP: Info #154: KMP_AFFINITY: Initial OS proc set respected: 0-3
OMP: Info #156: KMP_AFFINITY: 4 available OS procs
OMP: Info #157: KMP_AFFINITY: Uniform topology
OMP: Info #179: KMP_AFFINITY: 1 packages x 4 cores/pkg x 1 threads/core (4 total cores)
OMP: Info #214: KMP_AFFINITY: OS proc to physical thread map:
OMP: Info #171: KMP_AFFINITY: OS proc 0 maps to package 0 core 0
OMP: Info #171: KMP_AFFINITY: OS proc 1 maps to package 0 core 1
OMP: Info #171: KMP_AFFINITY: OS proc 2 maps to package 0 core 2
OMP: Info #171: KMP_AFFINITY: OS proc 3 maps to package 0 core 3
OMP: Info #250: KMP_AFFINITY: pid 2444 tid 4876 thread 0 bound to OS proc set 0
Arguments:
    enc_model_fpath:   encoder\saved_models\pretrained.pt
    syn_model_dir:     synthesizer\saved_models\logs-pretrained
    voc_model_fpath:   vocoder\saved_models\pretrained\pretrained.pt
    low_mem:           False
    no_sound:          False

Your PyTorch installation is not configured to use CUDA. If you have a GPU ready for deep learning, ensure that the drivers are properly installed, and that your CUDA version matches your PyTorch installation. CPU-only inference is currently not supported.
Running a test of your configuration...


Process finished with exit code -1

Performed the same test on an Ubuntu 18.04 installation (VMware). Same error.

Fix from https://github.com/CorentinJ/Real-Time-Voice-Cloning/issues/174

Fix for failed demo_cli.py in CPU-only environment

JQuezada0 commented on Dec 10, 2019

Is this project only usable with a nvidia gpu? I'm getting this error as well, but I have an intel gpu so I can't use CUDA.

Edit: Nevermind, this fork works fine https://github.com/shawwn/Real-Time-Voice-Cloning

Download source code from https://github.com/shawwn/Real-Time-Voice-Cloning

Run demo_cli.py in PyCharm

Results (success) demo_cli.py

OMP: Info #212: KMP_AFFINITY: decoding x2APIC ids.
OMP: Info #210: KMP_AFFINITY: Affinity capable, using global cpuid leaf 11 info
OMP: Info #154: KMP_AFFINITY: Initial OS proc set respected: 0-3
OMP: Info #156: KMP_AFFINITY: 4 available OS procs
OMP: Info #157: KMP_AFFINITY: Uniform topology
OMP: Info #179: KMP_AFFINITY: 1 packages x 4 cores/pkg x 1 threads/core (4 total cores)
OMP: Info #214: KMP_AFFINITY: OS proc to physical thread map:
OMP: Info #171: KMP_AFFINITY: OS proc 0 maps to package 0 core 0
OMP: Info #171: KMP_AFFINITY: OS proc 1 maps to package 0 core 1
OMP: Info #171: KMP_AFFINITY: OS proc 2 maps to package 0 core 2
OMP: Info #171: KMP_AFFINITY: OS proc 3 maps to package 0 core 3
OMP: Info #250: KMP_AFFINITY: pid 944 tid 8284 thread 0 bound to OS proc set 0
Arguments:
    enc_model_fpath:   encoder\saved_models\pretrained.pt
    syn_model_dir:     synthesizer\saved_models\logs-pretrained
    voc_model_fpath:   vocoder\saved_models\pretrained\pretrained.pt
    low_mem:           False
    no_sound:          False

Running a test of your configuration...

Preparing the encoder, the synthesizer and the vocoder...
Your PyTorch installation is not configured to use CUDA. If you have a GPU ready for deep learning, ensure that the drivers are properly installed, and that your CUDA version matches your PyTorch installation. CPU-only inference is currently not supported.
Traceback (most recent call last):
  File "D:/Projects/Voice Cloning/SV2TTS - shawwn/Real-Time-Voice-Cloning-master (python)/demo_cli.py", line 61, in <module>
    encoder.load_model(args.enc_model_fpath)
  File "D:\Projects\Voice Cloning\SV2TTS - shawwn\Real-Time-Voice-Cloning-master (python)\encoder\inference.py", line 33, in load_model
    checkpoint = torch.load(weights_fpath, map_location=_device)
  File "C:\python-programs\Anaconda3\envs\rtvc\lib\site-packages\torch\serialization.py", line 525, in load
    with _open_file_like(f, 'rb') as opened_file:
  File "C:\python-programs\Anaconda3\envs\rtvc\lib\site-packages\torch\serialization.py", line 212, in _open_file_like
    return _open_file(name_or_buffer, mode)
  File "C:\python-programs\Anaconda3\envs\rtvc\lib\site-packages\torch\serialization.py", line 193, in __init__
    super(_open_file, self).__init__(open(name, mode))
FileNotFoundError: [Errno 2] No such file or directory: 'encoder\\saved_models\\pretrained.pt'

Process finished with exit code 1

Indeed, the pretrained model files are missing and must be downloaded separately:

https://github.com/CorentinJ/Real-Time-Voice-Cloning/wiki/Pretrained-models

Pretrained Models

Pretrained models come as an archive that contains all three models (speaker encoder, synthesizer, vocoder). The archive comes with the same directory structure as the repo, and you're expected to merge its contents with the root of the repository. For reference, the GPUs used for training are GTX 1080 Ti.

Initial commit (latest release) [Google drive] [MEGA]

Encoder: trained 1.56M steps (20 days with a single GPU) with a batch size of 64
Synthesizer: trained 256k steps (1 week with 4 GPUs) with a batch size of 144
Vocoder: trained 428k steps (4 days with a single GPU) with a batch size of 100

Downloaded https://drive.google.com/file/d/1n1sPXvT34yXFLT47QZA6FIRGrwMeSsZc/view and unzipped.
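Since the archive must be merged into the repository root, it is worth confirming the three models landed where demo_cli.py expects them (a small sketch of my own using the default paths from the Arguments printout above; run it from the repository root):

check_pretrained_models.py

from pathlib import Path

# Default model paths from demo_cli.py's Arguments printout.
expected = [
    Path("encoder/saved_models/pretrained.pt"),
    Path("synthesizer/saved_models/logs-pretrained"),
    Path("vocoder/saved_models/pretrained/pretrained.pt"),
]
for p in expected:
    print(p, "OK" if p.exists() else "MISSING")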

Results (success) demo_cli.py

C:\python-programs\Anaconda3\envs\rtvc\python.exe "D:/Projects/Voice Cloning/SV2TTS - shawwn/Real-Time-Voice-Cloning-master (python)/demo_cli.py"

(Tensorflow warnings removed...)

OMP: Info #212: KMP_AFFINITY: decoding x2APIC ids.
OMP: Info #210: KMP_AFFINITY: Affinity capable, using global cpuid leaf 11 info
OMP: Info #154: KMP_AFFINITY: Initial OS proc set respected: 0-3
OMP: Info #156: KMP_AFFINITY: 4 available OS procs
OMP: Info #157: KMP_AFFINITY: Uniform topology
OMP: Info #179: KMP_AFFINITY: 1 packages x 4 cores/pkg x 1 threads/core (4 total cores)
OMP: Info #214: KMP_AFFINITY: OS proc to physical thread map:
OMP: Info #171: KMP_AFFINITY: OS proc 0 maps to package 0 core 0
OMP: Info #171: KMP_AFFINITY: OS proc 1 maps to package 0 core 1
OMP: Info #171: KMP_AFFINITY: OS proc 2 maps to package 0 core 2
OMP: Info #171: KMP_AFFINITY: OS proc 3 maps to package 0 core 3
OMP: Info #250: KMP_AFFINITY: pid 8996 tid 5892 thread 0 bound to OS proc set 0
Arguments:
    enc_model_fpath:   encoder\saved_models\pretrained.pt
    syn_model_dir:     synthesizer\saved_models\logs-pretrained
    voc_model_fpath:   vocoder\saved_models\pretrained\pretrained.pt
    low_mem:           False
    no_sound:          False

Your PyTorch installation is not configured to use CUDA. If you have a GPU ready for deep learning, ensure that the drivers are properly installed, and that your CUDA version matches your PyTorch installation. CPU-only inference is currently not supported.
Running a test of your configuration...

Preparing the encoder, the synthesizer and the vocoder...
OMP: Info #250: KMP_AFFINITY: pid 8996 tid 424 thread 1 bound to OS proc set 1
OMP: Info #250: KMP_AFFINITY: pid 8996 tid 1888 thread 2 bound to OS proc set 2
OMP: Info #250: KMP_AFFINITY: pid 8996 tid 9136 thread 3 bound to OS proc set 3
Loaded encoder "pretrained.pt" trained to step 1564501
Found synthesizer "pretrained" trained to step 278000
Building Wave-RNN
Trainable Parameters: 4.481M
Loading model weights at vocoder\saved_models\pretrained\pretrained.pt
Testing your configuration with small inputs.
    Testing the encoder...
WARNING:tensorflow:From D:\Projects\Voice Cloning\SV2TTS - shawwn\Real-Time-Voice-Cloning-master (python)\synthesizer\inference.py:57: The name tf.reset_default_graph is deprecated. Please use tf.compat.v1.reset_default_graph instead.

(Tensorflow warnings removed...)

...initialisation done /gpu:0
Initialized Tacotron model. Dimensions (? = dynamic shape):
  Train mode:               False
  Eval mode:                False
  GTA mode:                 False
  Synthesis mode:           True
  Input:                    (?, ?)
  device:                   0
  embedding:                (?, ?, 512)
  enc conv out:             (?, ?, 512)
  encoder out (cond):       (?, ?, 768)
  decoder out:              (?, ?, 80)
  residual out:             (?, ?, 512)
  projected residual out:   (?, ?, 80)
  mel out:                  (?, ?, 80)
  <stop_token> out:         (?, ?)
  Tacotron Parameters       28.439 Million.
Loading checkpoint: synthesizer\saved_models\logs-pretrained\taco_pretrained\tacotron_model.ckpt-278000
WARNING:tensorflow:From D:\Projects\Voice Cloning\SV2TTS - shawwn\Real-Time-Voice-Cloning-master (python)\synthesizer\models\tacotron.py:286: The name tf.trainable_variables is deprecated. Please use tf.compat.v1.trainable_variables instead.

2020-04-13 11:12:21.391252: I tensorflow/core/common_runtime/process_util.cc:115] Creating new thread pool with default inter op setting: 4. Tune using inter_op_parallelism_threads for best performance.
WARNING:tensorflow:From D:\Projects\Voice Cloning\SV2TTS - shawwn\Real-Time-Voice-Cloning-master (python)\synthesizer\tacotron2.py:62: The name tf.train.Saver is deprecated. Please use tf.compat.v1.train.Saver instead.

WARNING:tensorflow:From C:\python-programs\Anaconda3\envs\rtvc\lib\site-packages\tensorflow\python\training\saver.py:1276: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to check for files with this prefix.
OMP: Info #250: KMP_AFFINITY: pid 8996 tid 1432 thread 4 bound to OS proc set 0
OMP: Info #250: KMP_AFFINITY: pid 8996 tid 756 thread 5 bound to OS proc set 1
OMP: Info #250: KMP_AFFINITY: pid 8996 tid 2012 thread 6 bound to OS proc set 2
OMP: Info #250: KMP_AFFINITY: pid 8996 tid 7152 thread 7 bound to OS proc set 3
OMP: Info #250: KMP_AFFINITY: pid 8996 tid 7340 thread 8 bound to OS proc set 0
OMP: Info #250: KMP_AFFINITY: pid 8996 tid 7184 thread 9 bound to OS proc set 1
OMP: Info #250: KMP_AFFINITY: pid 8996 tid 4888 thread 10 bound to OS proc set 2
OMP: Info #250: KMP_AFFINITY: pid 8996 tid 3912 thread 11 bound to OS proc set 3
OMP: Info #250: KMP_AFFINITY: pid 8996 tid 4496 thread 12 bound to OS proc set 0
OMP: Info #250: KMP_AFFINITY: pid 8996 tid 8696 thread 13 bound to OS proc set 1
OMP: Info #250: KMP_AFFINITY: pid 8996 tid 124 thread 14 bound to OS proc set 2
OMP: Info #250: KMP_AFFINITY: pid 8996 tid 4064 thread 15 bound to OS proc set 3
OMP: Info #250: KMP_AFFINITY: pid 8996 tid 2920 thread 16 bound to OS proc set 0
OMP: Info #250: KMP_AFFINITY: pid 8996 tid 5044 thread 17 bound to OS proc set 1
OMP: Info #250: KMP_AFFINITY: pid 8996 tid 3748 thread 19 bound to OS proc set 3
OMP: Info #250: KMP_AFFINITY: pid 8996 tid 4052 thread 18 bound to OS proc set 2
    Testing the vocoder...
All test passed! You can now synthesize speech.


This is a GUI-less example of interface to SV2TTS. The purpose of this script is to show how you can interface this project easily with your own. See the source code for an explanation of what is happening.

Interactive generation loop
Reference voice: enter an audio filepath of a voice to be cloned (mp3, wav, m4a, flac, ...):

When pointing to my mp3 file, I got the error: Caught exception: NoBackendError()

This appears to be librosa (via audioread) missing a backend that can decode compressed audio such as mp3 or ogg

test_librosa.py

import librosa

# librosa's bundled example file is an OGG; decoding it (like decoding
# an mp3) requires an audioread backend such as ffmpeg.
y, sr = librosa.load(librosa.util.example_audio_file())

 

No Backend Error for librosa

Traceback (most recent call last):
  File "D:/Projects/Voice Cloning/TestLibRosa (python)/test_librosa.py", line 3, in <module>
    y, sr = librosa.load(librosa.util.example_audio_file())
  File "C:\python-programs\Anaconda3\envs\rtvc\lib\site-packages\librosa\core\audio.py", line 119, in load
    with audioread.audio_open(os.path.realpath(path)) as input_file:
  File "C:\python-programs\Anaconda3\envs\rtvc\lib\site-packages\audioread\__init__.py", line 116, in audio_open
    raise NoBackendError()
audioread.exceptions.NoBackendError

Process finished with exit code 1

Installed ffmpeg as recommended here: https://github.com/librosa/librosa/issues/219

Unzipped ffmpeg-4.2.2-win64-static.zip and added ffmpeg-4.2.2-win64-static\bin to the Windows PATH.

Rebooted the VM.

Verify that ffmpeg is on the Windows PATH:

C:\Users\aholiday>echo %PATH%
C:\Windows\system32;C:\Windows;C:\Windows\System32\Wbem;C:\Windows\System32\WindowsPowerShell\v1.0\;C:\Windows\System32\OpenSSH\;C:\java-programs\jdk-11.0.3\bin\java.exe;D:\Projects\Voice Cloning\ffmpeg-4.2.2-win64-static\bin;C:\python-programs\Anaconda3;C:\python-programs\Anaconda3\Library\mingw-w64\bin;C:\python-programs\Anaconda3\Library\usr\bin;C:\python-programs\Anaconda3\Library\bin;C:\python-programs\Anaconda3\Scripts;C:\Users\aholiday\AppData\Local\Microsoft\WindowsApps

C:\Users\aholiday>ffmpeg
ffmpeg version 4.2.2 Copyright (c) 2000-2019 the FFmpeg developers
  built with gcc 9.2.1 (GCC) 20200122
  configuration: --enable-gpl --enable-version3 --enable-sdl2 --enable-fontconfig --enable-gnutls --enable-iconv --enable-libass --enable-libdav1d --enable-libbluray --enable-libfreetype --enable-libmp3lame --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-libopus --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libtheora --enable-libtwolame --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libzimg --enable-lzma --enable-zlib --enable-gmp --enable-libvidstab --enable-libvorbis --enable-libvo-amrwbenc --enable-libmysofa --enable-libspeex --enable-libxvid --enable-libaom --enable-libmfx --enable-amf --enable-ffnvcodec --enable-cuvid --enable-d3d11va --enable-nvenc --enable-nvdec --enable-dxva2 --enable-avisynth --enable-libopenmpt
  libavutil      56. 31.100 / 56. 31.100
  libavcodec     58. 54.100 / 58. 54.100
  libavformat    58. 29.100 / 58. 29.100
  libavdevice    58.  8.100 / 58.  8.100
  libavfilter     7. 57.100 /  7. 57.100
  libswscale      5.  5.100 /  5.  5.100
  libswresample   3.  5.100 /  3.  5.100
  libpostproc    55.  5.100 / 55.  5.100
Hyper fast Audio and Video encoder
usage: ffmpeg [options] [[infile options] -i infile]... {[outfile options] outfile}...

Use -h to get full help or, even better, run 'man ffmpeg'

C:\Users\aholiday>
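The same check can also be done from Python, confirming that the interpreter's environment sees ffmpeg too (a minimal check of my own using only the standard library):

check_ffmpeg_path.py

import shutil

# Prints the full path to ffmpeg if it is on PATH, otherwise None.
print(shutil.which("ffmpeg"))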

Rerun test_librosa.py

Results (success) test_librosa.py

C:\python-programs\Anaconda3\envs\rtvc\python.exe "D:/Projects/Voice Cloning/TestLibRosa (python)/test_librosa.py"

Process finished with exit code 0

Running demo_cli.py with ffmpeg installed, using a 38 MB mp3 voice sample file

Results (failure: low memory error) demo_cli.py

This is a GUI-less example of interface to SV2TTS. The purpose of this script is to show how you can interface this project easily with your own. See the source code for an explanation of what is happening.

Interactive generation loop
Reference voice: enter an audio filepath of a voice to be cloned (mp3, wav, m4a, flac, ...):
..\..\voicefiles\ADH_sample.mp3
Loaded file succesfully
Caught exception: RuntimeError('[enforce fail at ..\\c10\\core\\CPUAllocator.cpp:72] data. DefaultCPUAllocator: not enough memory: you tried to allocate 1005568 bytes. Buy new RAM!\n')
Restarting

Current VM memory: 6.1 GB, with no memory disk cache.

Changed the VM memory to 12.1 GB (no memory disk cache); 9.4 GB was available with PyCharm launched.

Results (success) demo_cli.py

This is a GUI-less example of interface to SV2TTS. The purpose of this script is to show how you can interface this project easily with your own. See the source code for an explanation of what is happening.

Interactive generation loop
Reference voice: enter an audio filepath of a voice to be cloned (mp3, wav, m4a, flac, ...):
..\..\voicefiles\ADH_sample.mp3
Loaded file succesfully
Created the embedding
Write a sentence (+-20 words) to be synthesized:
Hi there, this is Austin
Created the mel spectrogram
Synthesizing the waveform:
{| ████████████████ 76000/76800 | Batch Size: 8 | Gen Rate: 0.9kHz | }float64

Saved output as demo_output_00.wav
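To listen to the generated file without leaving Python, the already-installed librosa and sounddevice packages will do (a small sketch of my own; run it from the directory containing the output file):

play_demo_output.py

import librosa
import sounddevice as sd

# sr=None preserves the file's native sample rate.
y, sr = librosa.load("demo_output_00.wav", sr=None)
sd.play(y, sr)
sd.wait()  # block until playback finishes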

Run:  demo_toolbox.py

Results (failure: PyQT5 Error) demo_toolbox.py

Traceback (most recent call last):
  File "D:/Projects/Voice Cloning/SV2TTS - shawwn/Real-Time-Voice-Cloning-master (python)/demo_toolbox.py", line 2, in <module>
    from toolbox import Toolbox
  File "D:\Projects\Voice Cloning\SV2TTS - shawwn\Real-Time-Voice-Cloning-master (python)\toolbox\__init__.py", line 1, in <module>
    from toolbox.ui import UI
  File "D:\Projects\Voice Cloning\SV2TTS - shawwn\Real-Time-Voice-Cloning-master (python)\toolbox\ui.py", line 1, in <module>
    from matplotlib.backends.backend_qt5agg import FigureCanvasQTAgg as FigureCanvas
  File "C:\python-programs\Anaconda3\envs\rtvc\lib\site-packages\matplotlib\backends\backend_qt5agg.py", line 11, in <module>
    from .backend_qt5 import (
  File "C:\python-programs\Anaconda3\envs\rtvc\lib\site-packages\matplotlib\backends\backend_qt5.py", line 15, in <module>
    import matplotlib.backends.qt_editor.figureoptions as figureoptions
  File "C:\python-programs\Anaconda3\envs\rtvc\lib\site-packages\matplotlib\backends\qt_editor\figureoptions.py", line 12, in <module>
    from matplotlib.backends.qt_compat import QtGui
  File "C:\python-programs\Anaconda3\envs\rtvc\lib\site-packages\matplotlib\backends\qt_compat.py", line 168, in <module>
    raise ImportError("Failed to import any qt binding")
ImportError: Failed to import any qt binding

Process finished with exit code 1

 

>conda list pyqt*

# Name                    Version                   Build  Channel

pyqt                      5.9.2            py37h6538335_4    conda-forge

Apparently conda's pyqt v5.9.2 is not the same package as PyQt5 from PyPI.

>pip install PyQT5

>conda list pyqt*
# Name                    Version                   Build  Channel
pyqt                      5.9.2            py37h6538335_4    conda-forge
pyqt5                     5.14.2                   pypi_0    pypi
pyqt5-sip                 12.7.2                   pypi_0    pypi

Run:  demo_toolbox.py

Results (success, but missing dataset) demo_toolbox.py

WARNING:tensorflow:From D:\Projects\Voice Cloning\SV2TTS - shawwn\Real-Time-Voice-Cloning-master (python)\synthesizer\models\modules.py:91: The name tf.nn.rnn_cell.RNNCell is deprecated. Please use tf.compat.v1.nn.rnn_cell.RNNCell instead.

Arguments:
    datasets_root:    None
    enc_models_dir:   encoder\saved_models
    syn_models_dir:   synthesizer\saved_models
    voc_models_dir:   vocoder\saved_models
    low_mem:          False

Warning: you did not pass a root directory for datasets as argument.
The recognized datasets are:
    LibriSpeech/dev-clean
    LibriSpeech/dev-other
    LibriSpeech/test-clean
    LibriSpeech/test-other
    LibriSpeech/train-clean-100
    LibriSpeech/train-clean-360
    LibriSpeech/train-other-500
    LibriTTS/dev-clean
    LibriTTS/dev-other
    LibriTTS/test-clean
    LibriTTS/test-other
    LibriTTS/train-clean-100
    LibriTTS/train-clean-360
    LibriTTS/train-other-500
    LJSpeech-1.1
    VoxCeleb1/wav
    VoxCeleb1/test_wav
    VoxCeleb2/dev/aac
    VoxCeleb2/test/aac
    VCTK-Corpus/wav48
Feel free to add your own. You can still use the toolbox by recording samples yourself.
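The warning is harmless for a quick test: samples can be recorded directly in the toolbox. To use a downloaded dataset instead, pass the datasets root when launching (per the warning above; the path below is a hypothetical example, and the exact argument form may vary by version):

> python demo_toolbox.py D:\datasets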

Conda Environment

>conda list

(rtvc) C:\Users\aholiday>conda list
# packages in environment at C:\python-programs\Anaconda3\envs\rtvc:
#
# Name                    Version                   Build  Channel
_tflow_select             2.3.0                       mkl
absl-py                   0.9.0            py37hc8dfbb8_1    conda-forge
astor                     0.7.1                      py_0    conda-forge
audioread                 2.1.8            py37hc8dfbb8_2    conda-forge
blas                      1.0                         mkl
ca-certificates           2020.4.5.1           hecc5488_0    conda-forge
certifi                   2020.4.5.1       py37hc8dfbb8_0    conda-forge
cffi                      1.14.0           py37h7a1dbc1_0
chardet                   3.0.4           py37hc8dfbb8_1006    conda-forge
cpuonly                   1.0                           0    pytorch
cryptography              2.8              py37hb32ad35_1    conda-forge
cycler                    0.10.0                     py_2    conda-forge
decorator                 4.4.2                      py_0    conda-forge
dill                      0.3.1.1          py37hc8dfbb8_1    conda-forge
freetype                  2.9.1                ha9979f8_1
gast                      0.3.3                      py_0    conda-forge
grpcio                    1.23.0           py37h3f65fb1_1    conda-forge
h5py                      2.10.0          nompi_py37h422b98e_102    conda-forge
hdf5                      1.10.5          nompi_ha405e13_1104    conda-forge
icc_rt                    2019.0.0             h0cc432a_1
icu                       58.2                 ha66f8fd_1
idna                      2.9                        py_1    conda-forge
importlib_metadata        1.5.0                    py37_0
inflect                   4.1.0                    py37_0
intel-openmp              2019.4                      245
joblib                    0.14.1                     py_0
jpeg                      9b                   hb83a4c4_2
keras-applications        1.0.8                      py_1    conda-forge
keras-preprocessing       1.1.0                      py_0    conda-forge
kiwisolver                1.2.0            py37heaa310e_0    conda-forge
libblas                   3.8.0                    14_mkl    conda-forge
libcblas                  3.8.0                    14_mkl    conda-forge
liblapack                 3.8.0                    14_mkl    conda-forge
libmklml                  2019.0.5                      0
libpng                    1.6.37               h2a8f88b_0
libprotobuf               3.11.4               h1a1b453_0    conda-forge
librosa                   0.6.3                      py_0    conda-forge
libsodium                 1.0.17               h2fa13f4_0    conda-forge
libtiff                   4.1.0                h56a325e_0
llvmlite                  0.31.0           py37ha925a31_0
m2w64-gcc-libgfortran     5.3.0                         6
m2w64-gcc-libs            5.3.0                         7
m2w64-gcc-libs-core       5.3.0                         7
m2w64-gmp                 6.1.0                         2
m2w64-libwinpthread-git   5.0.0.4634.697f757               2
markdown                  3.2.1                      py_0    conda-forge
matplotlib                3.2.1                         0    conda-forge
matplotlib-base           3.2.1            py37h911224e_0    conda-forge
mkl                       2019.4                      245
mkl-service               2.3.0            py37hfa6e2cd_0    conda-forge
msys2-conda-epoch         20160418                      1
multiprocess              0.70.9           py37h8055547_1    conda-forge
ninja                     1.9.0            py37h74a9793_0
numba                     0.48.0           py37h47e9c7a_0
numpy                     1.18.1           py37h90d3380_1    conda-forge
olefile                   0.46                     py37_0
openssl                   1.1.1f               hfa6e2cd_0    conda-forge
pillow                    7.0.0            py37hcc1f983_0
pip                       20.0.2                   py37_1
portaudio                 19.6.0               hca4a3dc_2    conda-forge
protobuf                  3.11.4           py37h5fe3f0a_1    conda-forge
pycparser                 2.20                       py_0
pyopenssl                 19.1.0                     py_1    conda-forge
pyparsing                 2.4.7              pyh9f0ad1d_0    conda-forge
pyqt                      5.9.2            py37h6538335_4    conda-forge
pyqt5                     5.14.2                   pypi_0    pypi
pyqt5-sip                 12.7.2                   pypi_0    pypi
pyreadline                2.1                   py37_1001    conda-forge
pysocks                   1.7.1            py37hc8dfbb8_1    conda-forge
python                    3.7.7           h60c2a47_0_cpython
python-dateutil           2.8.1                      py_0    conda-forge
python-sounddevice        0.3.15             pyh8c360ce_0    conda-forge
python_abi                3.7                     1_cp37m    conda-forge
pytorch                   1.4.0               py3.7_cpu_0  [cpuonly]  pytorch
pyzmq                     19.0.0           py37h8c16cda_1    conda-forge
qt                        5.9.7            vc14h73c81de_0
requests                  2.23.0             pyh8c360ce_2    conda-forge
resampy                   0.2.2                      py_0    conda-forge
scikit-learn              0.22.1           py37h6288b17_0
scipy                     1.3.1            py37h29ff71c_0    conda-forge
setuptools                46.1.3                   py37_0
sip                       4.19.8          py37h6538335_1000    conda-forge
six                       1.14.0                     py_1    conda-forge
sqlite                    3.31.1               he774522_0
tbb                       2020.0               h74a9793_0
tensorboard               1.14.0                   py37_0    conda-forge
tensorflow                1.14.0          mkl_py37h7908ca0_0
tensorflow-base           1.14.0          mkl_py37ha978198_0
tensorflow-estimator      1.14.0           py37h5ca1d4c_0    conda-forge
termcolor                 1.1.0                      py_2    conda-forge
tk                        8.6.8                hfa6e2cd_0
torchfile                 0.1.0                      py_0    conda-forge
torchvision               0.5.0                  py37_cpu  [cpuonly]  pytorch
tornado                   6.0.4            py37hfa6e2cd_0    conda-forge
tqdm                      4.44.1                     py_0
umap-learn                0.3.10                   py37_1    zeus1942
unidecode                 1.1.1                      py_0
urllib3                   1.25.8           py37hc8dfbb8_1    conda-forge
vc                        14.1                 h0510ff6_4
visdom                    0.1.8.9                       0    conda-forge
vs2015_runtime            14.16.27012          hf0eaf9b_1
webrtcvad                 2.0.10                   pypi_0    pypi
websocket-client          0.57.0           py37hc8dfbb8_1    conda-forge
werkzeug                  1.0.1              pyh9f0ad1d_0    conda-forge
wheel                     0.34.2                   py37_0
win_inet_pton             1.1.0                    py37_0    conda-forge
wincertstore              0.2                      py37_0
wrapt                     1.12.1           py37h8055547_1    conda-forge
xz                        5.2.4                h2fa13f4_4
zeromq                    4.3.2                h6538335_2    conda-forge
zipp                      2.2.0                      py_0
zlib                      1.2.11            h2fa13f4_1006    conda-forge
zstd                      1.3.7                h508b16e_0