This notebook demonstrates how SynthesisFilters.jl works. Below we provide synthesized audio examples (Japanese) so that you can compare synthesis filters in your browser. Please read on.
In this notebook, the following synthesis filters are demonstrated.
using PyCall
matplotlib = pyimport("matplotlib")
PyDict(matplotlib["rcParams"])["figure.figsize"] = (12, 5)
using PyPlot
WARNING: using PyPlot.matplotlib in module Main conflicts with an existing identifier.
# https://gist.github.com/jfsantos/a39ed69a7894876f1e04#file-audiodisplay-jl
# Thanks, @jfsantos
include("AudioDisplay.jl")
inline_audioplayer (generic function with 2 methods)
using WAV
using DSP
using MelGeneralizedCepstrums # to estimate spectral envelope parameters
using SynthesisFilters
# plotting utilities
function wavplot(x; label="a waveform", x_label="sample")
plot(1:endof(x), x, "b", label=label)
xlim(1, endof(x))
xlabel(x_label)
legend()
end
function wavcompare(x, y; label="synthesized waveform", x_label="sample")
plot(1:endof(y), y, "r-+", label=label)
plot(1:endof(x), x, label="original speech signal")
xlim(1, endof(x))
xlabel(x_label)
legend()
end
wavcompare (generic function with 1 method)
In this notebook, we use the following audio data for analysis and re-synthesis. Let's look at and listen to the example.
x, fs = wavread(joinpath(dirname(@__FILE__), "data", "test16k.wav"), format="native")
x = convert(Vector{Float64}, vec(x))
fs = convert(Int, fs)
wavplot(x)
inline_audioplayer(map(Int16, x), fs)
To synthesize a waveform, you basically need two speech parameters: an excitation (source) signal and spectral envelope parameters.
MelGeneralizedCepstrums.jl supports extracting a wide range of spectral envelope parameters.
In this notebook, we use a pre-extracted excitation signal for test16k.wav in the example directory.
# Note about excitation
# fs: 16000
# frame period: 5.0 ms
# F0 analysis: estimated by WORLD.dio and WORLD.stonemask
# Excitation generation: periodic pulses for voiced segments and Gaussian random
# values for unvoiced segments
base_excitation = vec(readdlm(joinpath(dirname(@__FILE__), "data", "test16k_excitation.txt")))
wavplot(base_excitation)
inline_audioplayer(base_excitation ./ maximum(base_excitation), fs)
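The note above summarizes how the pre-extracted excitation was built: periodic pulses where F0 is voiced, Gaussian noise where it is unvoiced. As a rough illustration (not the exact procedure used to produce test16k_excitation.txt), a minimal pulse/noise excitation generator might look like the following, where `generate_excitation` is a hypothetical helper and the pulse amplitude `sqrt(fs/f0)` is a common energy normalization:

```julia
# Generate a pulse/noise excitation from a per-frame F0 contour.
# f0[t] > 0 means frame t is voiced (F0 in Hz); f0[t] == 0 means unvoiced.
function generate_excitation(f0, fs, hopsize)
    excitation = zeros(length(f0) * hopsize)
    phase = 0.0
    for t in 1:length(f0)
        offset = (t - 1) * hopsize
        if f0[t] > 0.0
            # voiced: emit a pulse each time the accumulated phase wraps
            for n in 1:hopsize
                phase += f0[t] / fs
                if phase >= 1.0
                    phase -= 1.0
                    excitation[offset + n] = sqrt(fs / f0[t])
                end
            end
        else
            # unvoiced: Gaussian noise, and reset the pulse phase
            excitation[offset+1:offset+hopsize] = randn(hopsize)
            phase = 0.0
        end
    end
    excitation
end
```

Carrying `phase` across voiced frames keeps the pulse train continuous even when F0 changes from frame to frame.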
This is a basic step before mel-generalized cepstrum analysis. Note that windowing is essential for mel-generalized cepstrum analysis.
framelen = 512
hopsize = 80 # 5.0 ms for fs 16000
noverlap = framelen - hopsize
# Note that mgcep analysis basically assumes power-normalized window so that Σₙ w(n)² = 1
win = DSP.blackman(framelen) ./ sqrt(sumabs2(DSP.blackman(framelen)))
@assert isapprox(sumabs2(win), 1.0)
# create a windowed-signal matrix in which each column represents a windowed time slice
as = arraysplit(x, framelen, noverlap)
xw = Array(Float64, framelen, length(as))
for t=1:length(as)
xw[:,t] = as[t]
end
# col-wise windowing
xw .*= win;
@show size(xw)
size(xw) = (512,753)
(512,753)
You can extract many spectral parameters using MelGeneralizedCepstrums.jl. In the following example, we extract mel-cepstrum from the windowed signal and then show the spectral envelope estimate.
c = estimate(MelCepstrum(20, mcepalpha(fs)), xw)
imshow(c, origin="lower", aspect="auto")
colorbar()
PyObject <matplotlib.colorbar.Colorbar instance at 0x7fcc0f712998>
# Let's see spectral envelope estimate
imshow(real(mgc2sp(c, framelen)), origin="lower", aspect="auto")
colorbar()
PyObject <matplotlib.colorbar.Colorbar instance at 0x7fcc0e247368>
Let's compare waveforms synthesized with various synthesis filters.
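In every example below, `synthesis` drives a waveform filter frame by frame. SynthesisFilters.jl implements efficient dedicated filters internally (e.g. an MLSA digital filter for mel-cepstrum), but as a rough mental model, not the library's actual implementation, a naive per-frame all-pole synthesis can be sketched like this, with `naive_allpole_synthesis` being a hypothetical name:

```julia
# Naive frame-wise all-pole (LPC-style) synthesis sketch.
# A is a (p+1) × nframes matrix: A[1,t] is the gain of frame t and
# A[2:end,t] are the p all-pole coefficients for that frame.
function naive_allpole_synthesis(excitation, A, hopsize)
    p = size(A, 1) - 1
    y = zeros(length(excitation))
    for t in 1:size(A, 2)
        g = A[1, t]
        # filter the samples belonging to frame t with frame t's coefficients
        for n in ((t - 1) * hopsize + 1):min(t * hopsize, length(excitation))
            acc = g * excitation[n]
            for k in 1:p
                n - k >= 1 && (acc -= A[k + 1, t] * y[n - k])
            end
            y[n] = acc
        end
    end
    y
end
```

Because the recursion reads back past output samples `y[n-k]`, the filter state carries across frame boundaries automatically, which is what makes the frame-wise coefficient updates sound smooth.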
c = estimate(LinearCepstrum(25), xw)
y = synthesis(base_excitation, c, hopsize)
wavcompare(x, y, label="Cepstrum-based synthesized waveform")
inline_audioplayer(round(Int16, clamp(y, typemin(Int16), typemax(Int16))), fs)
c = estimate(MelCepstrum(25, mcepalpha(fs)), xw)
y = synthesis(base_excitation, c, hopsize)
wavcompare(x, y, label="Mel-cepstrum-based synthesized waveform")
inline_audioplayer(round(Int16, clamp(y, typemin(Int16), typemax(Int16))), fs)
c = estimate(MelGeneralizedCepstrum(25, mcepalpha(fs), -1/4), xw)
y = synthesis(base_excitation, c, hopsize)
wavcompare(x, y, label="Mel-generalized cepstrum based synthesized waveform")
inline_audioplayer(round(Int16, clamp(y, typemin(Int16), typemax(Int16))), fs)
l = estimate(LinearPredictionCoef(25), xw, use_mgcep=true)
y = synthesis(base_excitation, l, hopsize)
wavcompare(x, y, label="LPC-based synthesized waveform")
inline_audioplayer(round(Int16, clamp(y, typemin(Int16), typemax(Int16))), fs)
l = lpc2par(estimate(LinearPredictionCoef(25), xw))
y = synthesis(base_excitation, l, hopsize)
wavcompare(x, y, label="PARCOR-based synthesized waveform")
inline_audioplayer(round(Int16, clamp(y, typemin(Int16), typemax(Int16))), fs)
l = lpc2lsp(estimate(LinearPredictionCoef(15), xw))
y = synthesis(base_excitation, l, hopsize)
wavcompare(x, y, label="LSP-based synthesized waveform")
inline_audioplayer(round(Int16, clamp(y, typemin(Int16), typemax(Int16))), fs)