Merlin: The Neural Network (NN) based Speech Synthesis System

This repository contains the Neural Network (NN) based Speech Synthesis System developed at the Centre for Speech Technology Research (CSTR), University of Edinburgh.

Merlin is a toolkit for building Deep Neural Network models for statistical parametric speech synthesis. It must be used in combination with a front-end text processor (e.g., Festival) and a vocoder (e.g., STRAIGHT or WORLD).

The system is written in Python and relies on Keras and TensorFlow. Merlin comes with recipes (in the spirit of the Kaldi automatic speech recognition toolkit) that show you how to build state-of-the-art systems.

Merlin is free software, distributed under the Apache License, Version 2.0, allowing unrestricted commercial and non-commercial use alike.

Read the documentation at cstr-edinburgh.github.io/merlin.

Note: This repository is a fork. The original repository is located at https://github.com/CSTR-Edinburgh/merlin. This fork contains updates and modifications; refer to the commit history and any open pull requests for details on changes made.

Merlin is compatible with Python 3.10 (see the notes below).

Coffee is a proven love language. If this has proven helpful, consider supporting the fork via "Buy Me A Coffee".

Installation

Merlin uses the following dependencies:

  • python >= 3.8 (ideally 3.10)
  • Keras >= 3.X
  • tensorflow (optional, required if you use tensorflow models)
  • pytorch (optional, required if you use torch models)
  • sklearn, scipy, h5py (optional, required if you use keras models)
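
Before installing, it can help to see which of the dependencies listed above are already present in the active environment. The following is a minimal sketch, not part of Merlin itself: the helper names (`is_installed`, `dependency_report`) are hypothetical, Keras is treated as required only because the list does not mark it optional, and the module names are the usual import names (scikit-learn imports as `sklearn`, PyTorch as `torch`).

```python
import importlib.util

# Import names for the dependencies listed above. Treating Keras as the only
# required package is an assumption based on it not being marked "optional".
REQUIRED = ["keras"]
OPTIONAL = {
    "tensorflow": "TensorFlow models",
    "torch": "PyTorch models",
    "sklearn": "Keras models",
    "scipy": "Keras models",
    "h5py": "Keras models",
}


def is_installed(module_name):
    """Return True if the top-level module can be located (without importing it)."""
    return importlib.util.find_spec(module_name) is not None


def dependency_report():
    """Map each dependency's import name to True (found) or False (missing)."""
    return {name: is_installed(name) for name in REQUIRED + list(OPTIONAL)}


if __name__ == "__main__":
    for name, found in dependency_report().items():
        if name in REQUIRED:
            note = "required"
        else:
            note = "optional, needed for " + OPTIONAL[name]
        status = "found" if found else "MISSING"
        print(f"{name:<12} {status:<8} ({note})")
```

Run it inside the virtual environment you intend to use for Merlin; a missing optional package only matters if you plan to use the corresponding model type.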

To install Merlin, cd into the merlin directory and run the steps below:

  • Create a virtual environment (recommended):

    python3 -m venv .venv
    source .venv/bin/activate
  • Install dependencies:

    bash ./install.sh
    pip install -r requirements.txt # if something is missing

For detailed build instructions, see INSTALL and the CSTR blog post.
These instructions are valid for UNIX systems, including various flavors of Linux.

Important Notes on Python and TensorFlow:

  • Due to the rapid evolution of TensorFlow and its compatibility with various Python versions, it's crucial to use compatible versions. The officially supported Python versions for this fork are 3.8-3.10. Later versions might work, but are not guaranteed.
  • TensorFlow versions prior to 2.0 are not supported.
  • If you encounter issues, double-check your Python and TensorFlow versions.
  • Consider using a virtual environment to manage dependencies and avoid conflicts.
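
The version constraints in the notes above can be checked with a short script. This is a sketch under stated assumptions rather than a tool shipped with this fork: the function names are hypothetical, and it looks TensorFlow up under the distribution name `tensorflow`, so variants installed under a different name (for example a CPU-only build) would be reported as not installed.

```python
import sys
from importlib import metadata

# Officially supported range for this fork, as stated in the notes above.
SUPPORTED_PYTHON = ((3, 8), (3, 10))


def python_supported(version=None):
    """Return True if the (major, minor) version lies within 3.8-3.10."""
    major_minor = tuple((version or sys.version_info)[:2])
    return SUPPORTED_PYTHON[0] <= major_minor <= SUPPORTED_PYTHON[1]


def tensorflow_supported(version_string):
    """Return True for TensorFlow 2.0 or later; 1.x releases are not supported."""
    return int(version_string.split(".")[0]) >= 2


def installed_version(distribution):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    py_status = "supported" if python_supported() else "outside 3.8-3.10, not guaranteed"
    print(f"Python {sys.version.split()[0]}: {py_status}")

    tf_version = installed_version("tensorflow")
    if tf_version is None:
        print("TensorFlow: not installed (optional)")
    else:
        tf_status = "supported" if tensorflow_supported(tf_version) else "unsupported (< 2.0)"
        print(f"TensorFlow {tf_version}: {tf_status}")
```

A result of "outside 3.8-3.10" does not mean Merlin will fail, only that the interpreter is outside the range this fork officially supports.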

Getting started with Merlin

To run the example system builds, see egs/README.txt

As a first demo, please follow the scripts in egs/slt_arctic or egs/build_your_own_voice/s1[_python]. The simplest demo is to do the following:

cd egs/build_your_own_voice/s1_python
python run_merlin_workflow.py --setup-data --train-tts --run-tts
# The resulting wav files are then at
cd experiments/slt_arctic/test_synthesis/wav
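
Once the workflow has finished, a quick sanity check is to list the synthesized files and their durations. The sketch below is illustrative only: the helper names are hypothetical, it assumes the script is run from egs/build_your_own_voice/s1_python so that the relative path above resolves, and it assumes the output is uncompressed PCM WAV, which is what Python's standard `wave` module can read.

```python
import wave
from pathlib import Path


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(str(path), "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def summarize_wavs(directory):
    """Return (file name, duration in seconds) for each .wav file, sorted by name."""
    return [
        (wav_path.name, wav_duration_seconds(wav_path))
        for wav_path in sorted(Path(directory).glob("*.wav"))
    ]


if __name__ == "__main__":
    # Output location named in the demo above, relative to s1_python.
    wav_dir = Path("experiments/slt_arctic/test_synthesis/wav")
    results = summarize_wavs(wav_dir)
    if not results:
        print(f"No wav files found in {wav_dir}")
    for name, seconds in results:
        print(f"{name}: {seconds:.2f} s")
```

An empty listing usually means the synthesis step did not run or wrote its output elsewhere; files with a duration near zero are worth inspecting before listening tests.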

You can also follow Josh Meyer's blog post for detailed instructions on how to install Merlin and build the SLT demo voice. This blog post might be outdated, so use it with caution.

For a more in-depth tutorial about building voices with Merlin, you can check out:

Synthetic speech samples

Listen to synthetic speech samples from our SLT arctic voice.

Development pattern for contributors

  1. Create a personal fork of the main Merlin repository in GitHub.
  2. Make your changes in a named branch other than master, e.g. create a branch called my-new-feature.
  3. Generate a pull request through the Web interface of GitHub.

Important Considerations for this Fork:

  • To contribute to this fork, follow the same pattern against this repository.
  • Before submitting pull requests, ensure that your changes are compatible with the changes introduced in this fork. Carefully review the commit history and any open pull requests.
  • Specify clearly in your pull request description the purpose of your changes and how they relate to the original repository and this fork.

Contact Us

Post your questions, suggestions, and discussions to GitHub Issues. For issues related to this specific fork, open the issue in this repository or contact rrmhearts, and clearly indicate in the issue title or description that it concerns the fork.

Citation

If you publish work based on Merlin, please cite:

Zhizheng Wu, Oliver Watts, Simon King, "Merlin: An Open Source Neural Network Speech Synthesis System" in Proc. 9th ISCA Speech Synthesis Workshop (SSW9), September 2016, Sunnyvale, CA, USA.

About

This is the UNOFFICIAL location of the Merlin project, but it works well on Python 3.10 with Keras 3.
