# Source Separation with Sparsity¶


This numerical tour explores local Fourier analysis of sounds and its application to source separation from stereo measurements.

In [ ]:
using PyPlot
using NtToolBox
using WAV


## Sound Mixing¶

We load 3 sounds and simulate a stereo recording by performing a linear blending of the sounds.

In [ ]:
n = 1024*16 #length of the signals
s = 3 #number of sounds (sources)
p = 2 #number of micros (stereo measurements)

x = zeros(n,s) #the s sounds are stored in the columns of x (loading code not shown)


Normalize the energy of the signals.

In [ ]:
x = x./repeat(std(x,1), outer=(n,1));
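In current Julia the same per-column normalization can be written with the `dims` keyword of `std`. A standalone sketch (toy random columns stand in for the sounds; not part of the tour):

```julia
using Statistics  # std

# Normalize each column (each sound) to unit standard deviation.
n, s = 1000, 3
x = randn(n, s) .* [1.0 5.0 0.2]   # columns with very different energies
x = x ./ std(x, dims=1)            # broadcasting divides each column by its std

@assert all(isapprox.(std(x, dims=1), 1.0))
```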


We mix the sounds using a $2 \times 3$ transformation matrix. Here the directions are well-spaced, but you can try with more complicated mixing matrices.

Compute the mixing matrix

In [ ]:
theta = Array(linspace(0, pi, s + 1)); theta = theta[1:s]
theta[1] = 0.2
M = vcat(cos(theta)', sin(theta)');


Compute the mixed sources.

In [ ]:
y = x*M';
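Concretely, each micro receives a weighted blend of the sources: $y_{\cdot,j} = \sum_i M_{j,i}\, x_{\cdot,i}$. A minimal sketch of the mixing model (current Julia syntax, random signals standing in for the sounds):

```julia
# Toy check of the mixing model y = x*M' with s = 3 sources, p = 2 micros.
s, p, n = 3, 2, 8
x = randn(n, s)                       # random stand-ins for the sounds

theta = [0.2, pi/3, 2pi/3]            # well-spaced mixing directions
M = vcat(cos.(theta)', sin.(theta)')  # p x s mixing matrix

y = x * M'                            # n x p micro signals

# Micro j is the blend sum_i M[j,i]*x[:,i] of the sources.
y1 = sum(M[1,i] .* x[:,i] for i in 1:s)
@assert isapprox(y[:,1], y1)
```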


Display of the sounds and their mix.

In [ ]:
figure(figsize = (10,10))

for i in 1:s
    subplot(s, 1, i)
    plot(x[:, i])
    xlim(0,n)
    title("Source #$i")
end


Display of the micro output.

In [ ]:
figure(figsize = (10,7))

for i in 1:p
    subplot(p, 1, i)
    plot(y[:, i])
    xlim(0,n)
    title("Micro #$i")
end


## Local Fourier Analysis of Sound¶

In order to perform the separation, one performs a local Fourier analysis of the sound. The hope is that the sources will be well separated in the Fourier domain, because the sources are sparse after an STFT.
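The sparsity claim can be illustrated on a single windowed frame: a pure tone multiplied by a Hann window has its energy concentrated in a handful of Fourier coefficients. A dependency-free sketch (a naive DFT in place of the tour's `perform_stft`; all parameters are illustrative):

```julia
# One STFT frame of a pure tone: energy concentrates in few coefficients.
w = 64
t = 0:w-1
win = 0.5 .- 0.5 .* cos.(2pi .* t ./ w)   # Hann window
frame = win .* sin.(2pi .* 8 .* t ./ w)   # tone centered on frequency bin 8

# Naive O(w^2) DFT, to keep the sketch free of FFT dependencies.
F = [sum(frame .* exp.(-2im * pi * k .* t ./ w)) for k in 0:w-1]
E = abs.(F).^2

# A Hann-windowed integer-bin tone occupies only 6 of the 64 bins
# (bins 7-9 and their conjugates), so 6 coefficients carry ~all the energy.
ratio = sum(sort(E, rev=true)[1:6]) / sum(E)
@assert ratio > 0.99
```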

First set up parameters for the STFT.

In [ ]:
w = 128   #size of the window
q = Base.div(w,4);  #hop between consecutive windows (they overlap by w - q samples)


Compute the STFT of the sources.

In [ ]:
X = complex(zeros(w,4*w+1,s))
Y = complex(zeros(w,4*w+1,p))

for i in 1:s
    X[:,:,i] = perform_stft(x[:,i],w,q,n)
    figure(figsize = (15,10))
    plot_spectrogram(X[:,:,i],"Source #$i")
end


Exercise 1

Compute the STFT of the micros, and store them in the matrix Y.

In [ ]:
include("NtSolutions/audio_2_separation/exo1.jl")

In [ ]:
## Insert your code here.


## Estimation of Mixing Direction by Clustering¶

Since the sources are quite sparse over the Fourier plane, the directions are well estimated by looking at the directions emerging from the point cloud of transformed coefficients.

First we compute the positions of the points in the cloud.

In [ ]:
mf = size(Y)[1]
mt = size(Y)[2]
P = reshape(Y, (mt*mf,p))
P = vcat(real(P), imag(P));


Then we keep only the 5% of points with the largest energy.

Display some points in the original (spatial) domain.

Number of displayed points.

In [ ]:
npts = 6000;


Display the original points.

In [ ]:
sel = randperm(n)
sel = sel[1:npts]
figure(figsize = (7,5))
plot(y[sel,1], y[sel,2], ".", ms = 3)
xlim(-5,5)
ylim(-5,5)
title("Time domain");


Exercise 2

Display some points of $P$ in the transformed (time/frequency) domain.

In [ ]:
include("NtSolutions/audio_2_separation/exo2.jl");

In [ ]:
## Insert your code here.


We compute the angle associated with each point over the transformed domain. The histogram shows the main directions of mixing.

In [ ]:
nrow = size(P)[1]
Theta = zeros(nrow)

for i in 1:nrow
    Theta[i] = mod(atan2(P[i,2],P[i,1]), pi)
end


Display the histogram.

In [ ]:
nbins = 100
h,t = plt[:hist](Theta, nbins)
h = h/sum(h)
clf()
bar(t[1:end-1], h, width = pi/nbins)
xlim(0,pi);


Exercise 3

The histogram computed from the whole set of points is not peaked enough. To stabilize the detection of the mixing directions, compute a histogram from a reduced set of points that have the largest amplitude: compute the energy of each point and extract only a small subset.

In [ ]:
include("NtSolutions/audio_2_separation/exo3.jl");

In [ ]:
## Insert your code here.
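The angle-histogram idea can be exercised on synthetic points drawn along two known directions (a self-contained sketch; the two directions and the bin count are arbitrary choices, not the tour's data):

```julia
# Recover two mixing directions from the histogram of point angles.
theta_true = [0.4, 2.0]              # two mixing directions in [0, pi)
npts = 2000
P = zeros(npts, 2)
for i in 1:npts
    th = theta_true[1 + (i % 2)]     # alternate between the two directions
    r = randn()                      # signed amplitude along the direction
    P[i, :] = r .* [cos(th), sin(th)]
end

# Angle of each point, folded into [0, pi) (atan replaces the old atan2).
Theta = [mod(atan(P[i,2], P[i,1]), pi) for i in 1:npts]

# Coarse histogram over [0, pi).
nbins = 50
h = zeros(Int, nbins)
for th in Theta
    h[min(nbins, 1 + floor(Int, th / pi * nbins))] += 1
end

# The two most populated bins sit at the true directions.
peaks = sort(sortperm(h, rev=true)[1:2])
centers = (peaks .- 0.5) .* (pi / nbins)
@assert all(abs.(centers .- sort(theta_true)) .< pi / nbins)
```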
Exercise 4

Detect the directions $M_1$ approximating the true directions $M$ by looking at the local maxima of the histogram: first detect the set of local maxima, then keep only the three largest, sorted in descending order.

In [ ]:
include("NtSolutions/audio_2_separation/exo4.jl")

In [ ]:
## Insert your code here.


## Separation of the Sources using Clustering¶

Once the mixing directions are known, one can project the sources on these directions.

We compute the projection of the coefficients Y on each estimated direction.

In [ ]:
A = reshape(Y, (mt*mf,p));


Compute the projection of the coefficients on the directions.

In [ ]:
C = abs(M1'*A');


At each point $x$, the index $I(x)$ is the direction which creates the largest projection; $I$ is thus the index of the closest source.

In [ ]:
tmp, I = compute_max(C,1)
I = reshape(I, (mf,mt));


An additional denoising is achieved by removing small coefficients.

In [ ]:
T = .05
D = sqrt(sum(abs(Y).^2, 3))[:,:,1]
I = I.*(D .> T);


We can display the segmentation of the time-frequency plane.

In [ ]:
figure(figsize = (15,10))
imshow(I[1:Base.div(mf,2),:], cmap = get_cmap("jet"), interpolation = "nearest");


The recovered coefficients are obtained by projection.

In [ ]:
Proj = M1'*A'
Xr = complex(zeros(w,4*w+1,s))

for i in 1:s
    Xr[:,:,i] = reshape(Proj[i,:], (mf,mt)).*(I .== i)
end


The estimated signals are obtained by inverting the STFT.

In [ ]:
xr = zeros(n,s)

for i in 1:s
    xr[:,i] = perform_stft(Xr[:,:,i], w, q, n)
end


One can display the recovered signals.

In [ ]:
figure(figsize = (10,10))

for i in 1:s
    subplot(s,1,i)
    plot(xr[:,i])
    xlim(0,n)
    title("Estimated source #$i")
end
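The cluster-then-project pipeline can be checked end-to-end on toy sparse coefficients, where each coefficient is active in exactly one source (a sketch in current Julia syntax, assuming the directions are known exactly):

```julia
# Separation by nearest-direction projection on toy sparse coefficients.
s, p = 3, 2
theta = [0.2, pi/3, 2pi/3]
M1 = vcat(cos.(theta)', sin.(theta)')    # p x s known directions

# Toy "sparse" coefficients: each one is active in a single source.
nc = 300
X0 = zeros(nc, s)
for k in 1:nc
    X0[k, 1 + (k % s)] = randn()
end
A = X0 * M1'                             # mixed coefficients, nc x p

# Projection on each direction; I[k] = index of the best-aligned direction.
C = abs.(M1' * A')                       # s x nc
I = [argmax(C[:, k]) for k in 1:nc]

# Recover each source from the coefficients assigned to it.
Proj = M1' * A'                          # s x nc signed projections
Xr = zeros(nc, s)
for k in 1:nc
    Xr[k, I[k]] = Proj[I[k], k]
end

# With exactly sparse coefficients and exact directions, recovery is exact.
@assert isapprox(Xr, X0)
```

With truly sparse coefficients the assignment is exact; on real STFT coefficients the sources only approximately satisfy this one-active-source assumption, which is why the tour adds the thresholding step.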


One can listen to the recovered sources.

In [ ]:
i = 1
WAV.wavplay(x[:,i], 15000) # supported back-ends: AudioQueue (macOS) and PulseAudio (Linux, libpulse-simple)
# There is no native backend for Windows yet.

In [ ]:
WAV.wavplay(xr[:,i], 15000)