The Swiss Roll Matching Example
This example loads the 3D nonlinear Swiss roll together with the 2D linear data that generates it, performs manifold matching, plots the matched embeddings, and computes the distance correlation and testing power for several nonlinear embedding algorithms.
Original Swiss Roll Data
To start, take the 3D Swiss roll and its corresponding 2D points for matching.
clear;
load SwissRoll.mat
X_data=X_data(:,1:5000); %The 3D Swiss Roll
Y_data=Y_data(:,1:5000); %The 2D Plane (Generating data of Swiss Roll)
Y_data=[Y_data' zeros(1, 5000)']'; %Pad the 2D plane with a zero row so its ambient dimension is 3
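Padding the 2D plane with a zero third coordinate does not change any pairwise Euclidean distance, so the padded data carry the same geometry as the original plane. A minimal check on synthetic points (the sample size and variable names here are illustrative, not part of SwissRoll.mat):

```matlab
% Check that zero-padding a 2D point set to 3D leaves all pairwise
% Euclidean distances unchanged (synthetic data, not SwissRoll.mat).
rng(0);                          % fixed seed so the sketch is repeatable
n = 50;                          % hypothetical small sample size
Y2 = rand(2, n);                 % 2 x n points in the plane
Y3 = [Y2; zeros(1, n)];          % 3 x n: same points with a zero third coordinate

% Pairwise squared distances via the Gram-matrix identity, base MATLAB only.
sqdist = @(P) bsxfun(@plus, sum(P.^2, 1)', sum(P.^2, 1)) - 2 * (P' * P);
maxDiff = max(max(abs(sqdist(Y2) - sqdist(Y3))));   % should be ~0
```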
Check the input data by scatter plots for validation.
figure
scatter3(X_data(1,:),X_data(2,:),X_data(3,:),30,color(1:5000),'o');
title('3D Nonlinear Swiss Roll')
figure
scatter3(Y_data(1,:),Y_data(2,:),Y_data(3,:),30,color(1:5000),'o');
title('2D Linear Manifold in 3D')
Manifold Matching without Nonlinear Algorithm
Set up the parameters: tran=1000 is the number of training pairs, numData=2 is the number of datasets to match, dim=2 is the matching dimension, 2*tesn is the number of testing (out-of-sample, OOS) points, K=10 is the neighborhood size, and iter=-1 uses classical MDS whenever MDS is involved.
tran=1000;numData=2;dim=2;tesn=100;K=10;iter=-1;
cc=color(1:tran+3*tesn); %Take the color scheme of Swiss roll for involved data
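For readers unfamiliar with classical MDS, the following is a minimal textbook sketch of the double-centering algorithm on synthetic points. It is not the code that ManifoldMatching runs when iter=-1, and all variable names are illustrative.

```matlab
% Classical MDS sketch: recover a dim-dimensional configuration from an
% n x n matrix of squared Euclidean distances by double centering.
rng(0);
n = 30; dim = 2;
P = randn(dim, n);                                  % synthetic ground-truth points
G0 = P' * P;
D2 = bsxfun(@plus, diag(G0), diag(G0)') - 2 * G0;   % squared pairwise distances
J = eye(n) - ones(n) / n;                           % centering matrix
B = -0.5 * J * D2 * J;                              % double-centered Gram matrix
B = (B + B') / 2;                                   % symmetrize against round-off
[V, E] = eig(B);
[vals, order] = sort(diag(E), 'descend');
Z = diag(sqrt(max(vals(1:dim), 0))) * V(:, order(1:dim))';   % dim x n embedding

% The embedding reproduces the input distances (up to rotation/translation).
GZ = Z' * Z;
D2hat = bsxfun(@plus, diag(GZ), diag(GZ)') - 2 * GZ;
mdsErr = max(max(abs(D2hat - D2)));
```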
Arrange the data for input. The first tran=1000 pairs are matched training pairs, the next tesn=100 pairs are matched testing pairs, and the last tesn=100 pairs are unmatched testing pairs (each X point is paired with a Y point taken from a different index).
disEuc=[X_data(:,1:tran+2*tesn) [Y_data(:,1:tran+tesn) Y_data(:,tran+2*tesn+1:tran+3*tesn)]];
dis=squareform(pdist(disEuc')); %Form the distance matrix
ss=size(dis,2)/2;
dis=[dis(1:ss, 1:ss) dis(ss+1:end, ss+1:end)]; %Flatten the distance matrix for our manifold matching algorithm
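The flattening step keeps only the two within-dataset distance blocks and places them side by side; the cross-dataset distances are discarded. A toy version with four points per dataset makes the resulting shape explicit (synthetic data and illustrative names; distances are computed with base MATLAB instead of pdist):

```matlab
% Toy version of the flattening step: two datasets with m points each,
% stacked as columns of one 3 x 2m matrix (synthetic data).
rng(0);
m = 4;
A = rand(3, m);  B = rand(3, m);
both = [A B];

% Full 2m x 2m Euclidean distance matrix.
G = both' * both;
Dfull = sqrt(max(bsxfun(@plus, diag(G), diag(G)') - 2 * G, 0));

% Keep only the two within-dataset blocks, side by side (m x 2m).
flat = [Dfull(1:m, 1:m) Dfull(m+1:end, m+1:end)];
```

Column m+j of flat therefore holds distances to point j of the second dataset, measured only against other points of the second dataset.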
First, we do Procrustes matching directly, without any nonlinear embedding. Note that 2*tesn points are used for testing and are embedded by the out-of-sample (OOS) technique.
options = struct('nonlinear',0,'match',1,'neighborSize',K,'jointSelection',0,'numData',numData,'oos',2*tesn,'maxIter',iter);
[sol, dCorr]=ManifoldMatching(dis,dim,options);
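As background, orthogonal Procrustes matching finds the orthogonal map that best aligns one configuration to another over matched points. The sketch below is the textbook SVD solution on a synthetic rotated copy; it is an illustration under that assumption and not the internals of ManifoldMatching.

```matlab
% Orthogonal Procrustes sketch: find the orthogonal Q minimizing
% ||Q*Y - X||_F for matched columns of X and Y.
rng(0);
d = 2; n = 40;
X = randn(d, n);                       % target configuration
theta = pi / 3;
R = [cos(theta) -sin(theta); sin(theta) cos(theta)];
Y = R * X;                             % rotated copy of X (hypothetical test case)

[U, ~, V] = svd(X * Y');               % SVD of the cross-product matrix
Q = U * V';                            % optimal orthogonal map
procErr = norm(Q * Y - X, 'fro');      % ~0 here because Y is an exact rotation of X
```

Because Procrustes only rotates and reflects, it cannot unroll a nonlinear manifold, which is why the linear-only matching in this section performs poorly.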
After matching, we check training data, testing matched data, and testing unmatched data using scatter plots.
figure
hold on
scatter(sol(1,1:tran),sol(2,1:tran),20,cc(1:tran),'o'); %training matched
scatter(sol(1,tran+2*tesn+1:2*tran+2*tesn),sol(2,tran+2*tesn+1:2*tran+2*tesn),20,cc(1:tran),'+');
title('Training Matched Data');
xlim([-60 60]); ylim([-30 30]);
hold off
figure
hold on
scatter(sol(1,tran+1:tran+tesn),sol(2,tran+1:tran+tesn),30,cc(tran+1:tran+tesn),'o'); %testing matched
scatter(sol(1,2*tran+2*tesn+1:2*tran+3*tesn),sol(2,2*tran+2*tesn+1:2*tran+3*tesn),30,cc(tran+1:tran+tesn),'+');
title('Testing Matched Data');
xlim([-60 60]); ylim([-30 30]);
hold off
figure
hold on
scatter(sol(1,tran+tesn+1:tran+2*tesn),sol(2,tran+tesn+1:tran+2*tesn),30,cc(tran+tesn+1:tran+2*tesn),'o'); %testing unmatched
scatter(sol(1,2*tran+3*tesn+1:2*tran+4*tesn),sol(2,2*tran+3*tesn+1:2*tran+4*tesn),30,cc(tran+2*tesn+1:tran+3*tesn),'+');
title('Testing Unmatched Data');
xlim([-60 60]); ylim([-30 30]);
hold off
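The index expressions in the plotting code above imply a column layout for sol: the first tran+2*tesn columns belong to dataset 1 and the next tran+2*tesn to dataset 2, each ordered as training, testing matched, testing unmatched. A small sketch that names these ranges (the struct idx and its field names are illustrative, not part of the package):

```matlab
% Column layout of sol inferred from the plotting code: dataset 1 occupies
% the first tran+2*tesn columns, dataset 2 the next tran+2*tesn.
tran = 1000; tesn = 100;
n1 = tran + 2*tesn;                       % columns per dataset

idx.train1     = 1:tran;
idx.matched1   = tran+1 : tran+tesn;
idx.unmatched1 = tran+tesn+1 : tran+2*tesn;
idx.train2     = n1 + idx.train1;         % same positions, shifted by one dataset
idx.matched2   = n1 + idx.matched1;
idx.unmatched2 = n1 + idx.unmatched1;
```

With these names, sol(:,idx.matched1) and sol(:,idx.matched2) are the two sides of the matched testing pairs.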
If we check the matching by connecting each pair with a black line, the result is a mess: the matched pairs are not embedded close to each other.
plotVelocity([sol(:,1:tran) sol(:,tran+2*tesn+1:2*tran+2*tesn)],options.numData);
title('Training Matched Data');
xlim([-60 60]); ylim([-30 30]);
plotVelocity([sol(:,tran+1:tran+tesn) sol(:,2*tran+2*tesn+1:2*tran+3*tesn)],options.numData);
title('Testing Matched Data');
xlim([-60 60]); ylim([-30 30]);
plotVelocity([sol(:,tran+tesn+1:tran+2*tesn) sol(:,2*tran+3*tesn+1:2*tran+4*tesn)],options.numData);
title('Testing Unmatched Data');
xlim([-60 60]); ylim([-30 30]);
We can check the distance correlation of the training data, as well as the matching test power of the testing data at critical level 0.05. Neither metric is high.
dCorr
p=plotPower(sol,numData,tesn,20);
p(2)
dCorr =
    0.6404
ans =
    0.4900
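The internals of plotPower are not shown in this example. As a rough illustration of what a matching test power at level 0.05 can mean, the sketch below assumes the test statistic is the distance between the two embedded points of a pair, takes the critical value from the matched (null) pairs, and reports the fraction of unmatched pairs that exceed it. The distances are synthetic and every name is hypothetical.

```matlab
% Hypothetical sketch of a matching test; this is an assumption about the
% general idea, not the implementation of plotPower.
rng(0);
tesn = 100; alpha = 0.05;
dMatched   = abs(0.1 * randn(1, tesn));        % synthetic distances, matched pairs
dUnmatched = abs(2 + 0.5 * randn(1, tesn));    % synthetic distances, unmatched pairs

% Critical value: a threshold exceeded by at most a fraction alpha of the
% matched (null) distances.
sortedNull = sort(dMatched);
critVal = sortedNull(ceil((1 - alpha) * tesn));

% Power: fraction of unmatched pairs whose distance exceeds the threshold.
testPower = mean(dUnmatched > critVal);
typeI = mean(dMatched > critVal);              % empirical type I error, at most alpha
```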
Manifold Matching using Joint Isomap
Then we repeat the same procedure using joint Isomap with Procrustes matching.
options = struct('nonlinear',1,'match',1,'neighborSize',K,'jointSelection',1,'numData',numData,'oos',2*tesn,'maxIter',iter);
[sol, dCorr]=ManifoldMatching(dis,dim,options);
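Isomap replaces Euclidean distances with shortest-path (geodesic) distances on a K-nearest-neighbor graph before applying MDS. The sketch below shows only that geodesic step, in its textbook form, on a synthetic arc; how ManifoldMatching combines the two datasets when jointSelection=1 is not shown in this example, and the names below are illustrative.

```matlab
% Geodesic-distance step of Isomap (textbook version, not the package code):
% build a K-nearest-neighbor graph, then run Floyd-Warshall shortest paths.
n = 60; K = 4;
t = linspace(0, 3*pi/2, n);
P = [cos(t); sin(t)];                          % points along a 3/4 circle arc

G0 = P' * P;
D = sqrt(max(bsxfun(@plus, diag(G0), diag(G0)') - 2 * G0, 0));   % Euclidean distances
D(1:n+1:end) = 0;                              % exact zeros on the diagonal

% Keep each point's K nearest neighbors; all other edges start at Inf.
W = inf(n);
[~, nbr] = sort(D, 2);
for i = 1:n
    j = nbr(i, 2:K+1);                         % skip column 1 (the point itself)
    W(i, j) = D(i, j);
    W(j, i) = D(j, i);                         % symmetrize the graph
end
W(1:n+1:end) = 0;                              % zero self-distance

% Floyd-Warshall: allow paths through each intermediate node k in turn.
for k = 1:n
    W = min(W, bsxfun(@plus, W(:, k), W(k, :)));
end
geo = W(1, n);                                 % graph geodesic between arc endpoints
euc = D(1, n);                                 % straight-line distance
```

On this arc the geodesic distance follows the curve (close to 3*pi/2) while the Euclidean distance cuts across it (sqrt(2)), which is the effect that lets Isomap unroll the Swiss roll.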
After matching, we again check training data, testing matched data, and testing unmatched data using scatter plots. They are matched much better, which also suggests that our embedding and OOS code recovers the geometry correctly.
figure
hold on
scatter(sol(1,1:tran),sol(2,1:tran),20,cc(1:tran),'o'); %training matched
scatter(sol(1,tran+2*tesn+1:2*tran+2*tesn),sol(2,tran+2*tesn+1:2*tran+2*tesn),20,cc(1:tran),'+');
title('Training Matched Data');
xlim([-60 60]); ylim([-30 30]);
hold off
figure
hold on
scatter(sol(1,tran+1:tran+tesn),sol(2,tran+1:tran+tesn),30,cc(tran+1:tran+tesn),'o'); %testing matched
scatter(sol(1,2*tran+2*tesn+1:2*tran+3*tesn),sol(2,2*tran+2*tesn+1:2*tran+3*tesn),30,cc(tran+1:tran+tesn),'+');
title('Testing Matched Data');
xlim([-60 60]); ylim([-30 30]);
hold off
figure
hold on
scatter(sol(1,tran+tesn+1:tran+2*tesn),sol(2,tran+tesn+1:tran+2*tesn),30,cc(tran+tesn+1:tran+2*tesn),'o'); %testing unmatched
scatter(sol(1,2*tran+3*tesn+1:2*tran+4*tesn),sol(2,2*tran+3*tesn+1:2*tran+4*tesn),30,cc(tran+2*tesn+1:tran+3*tesn),'+');
title('Testing Unmatched Data');
xlim([-60 60]); ylim([-30 30]);
hold off
If we check the matching by connecting each pair with a black line, it is almost perfect (except for two pairs): matched data are matched in both training and testing, and testing unmatched data are far apart.
plotVelocity([sol(:,1:tran) sol(:,tran+2*tesn+1:2*tran+2*tesn)],options.numData);
title('Training Matched Data');
plotVelocity([sol(:,tran+1:tran+tesn) sol(:,2*tran+2*tesn+1:2*tran+3*tesn)],options.numData);
title('Testing Matched Data');
plotVelocity([sol(:,tran+tesn+1:tran+2*tesn) sol(:,2*tran+3*tesn+1:2*tran+4*tesn)],options.numData);
title('Testing Unmatched Data');
The distance correlation and the testing power at 0.05 are perfect.
dCorr
p=plotPower(sol,numData,tesn,20);
p(2)
dCorr =
    1.0000
ans =
     1
Manifold Matching using Separate LLE
Next we repeat the same procedure using separate LLE with Procrustes matching.
options = struct('nonlinear',2,'match',1,'neighborSize',K,'jointSelection',0,'numData',numData,'oos',2*tesn,'maxIter',iter);
[sol, dCorr]=ManifoldMatching(dis,dim,options);
After matching, we check training data, testing matched data, and testing unmatched data using scatter plots as usual.
figure
hold on
scatter(sol(1,1:tran),sol(2,1:tran),20,cc(1:tran),'o'); %training matched
scatter(sol(1,tran+2*tesn+1:2*tran+2*tesn),sol(2,tran+2*tesn+1:2*tran+2*tesn),20,cc(1:tran),'+');
title('Training Matched Data');
xlim([-3 3]); ylim([-3 3]);
hold off
figure
hold on
scatter(sol(1,tran+1:tran+tesn),sol(2,tran+1:tran+tesn),30,cc(tran+1:tran+tesn),'o'); %testing matched
scatter(sol(1,2*tran+2*tesn+1:2*tran+3*tesn),sol(2,2*tran+2*tesn+1:2*tran+3*tesn),30,cc(tran+1:tran+tesn),'+');
title('Testing Matched Data');
xlim([-3 3]); ylim([-3 3]);
hold off
figure
hold on
scatter(sol(1,tran+tesn+1:tran+2*tesn),sol(2,tran+tesn+1:tran+2*tesn),30,cc(tran+tesn+1:tran+2*tesn),'o'); %testing unmatched
scatter(sol(1,2*tran+3*tesn+1:2*tran+4*tesn),sol(2,2*tran+3*tesn+1:2*tran+4*tesn),30,cc(tran+2*tesn+1:tran+3*tesn),'+');
title('Testing Unmatched Data');
xlim([-3 3]); ylim([-3 3]);
hold off
If we check the matching by connecting each pair with a black line, it is better than without a nonlinear algorithm, but the pairs are not exactly matched and the result is worse than joint Isomap.
plotVelocity([sol(:,1:tran) sol(:,tran+2*tesn+1:2*tran+2*tesn)],options.numData);
title('Training Matched Data');
xlim([-3 3]); ylim([-3 3]);
plotVelocity([sol(:,tran+1:tran+tesn) sol(:,2*tran+2*tesn+1:2*tran+3*tesn)],options.numData);
title('Testing Matched Data');
xlim([-3 3]); ylim([-3 3]);
plotVelocity([sol(:,tran+tesn+1:tran+2*tesn) sol(:,2*tran+3*tesn+1:2*tran+4*tesn)],options.numData);
title('Testing Unmatched Data');
xlim([-3 3]); ylim([-3 3]);
The distance correlation and the testing power are worse than with joint Isomap but better than without a nonlinear algorithm. Note that if we change the jointSelection option to 1 for LLE, it exhibits perfect matching, as joint Isomap does.
dCorr
p=plotPower(sol,numData,tesn,20);
p(2)
dCorr =
    0.9433
ans =
    0.8600
Manifold Matching using Laplacian Eigenmaps
Finally, we show how to use Laplacian eigenmaps for matching. We use the code from Laurens van der Maaten (http://lvdmaaten.github.io/drtoolbox/), with its outlier detection step deleted for our matching purpose. OOS embedding is not used here, so all testing points are embedded in-sample; please check our paper for the reasons. Setting the oos option to 2*tesn is still supported, at the cost of slightly lower power; similarly, the oos option in the previous sections can be set to 0 for in-sample embedding.
disEuc=[X_data(:,1:tran+2*tesn) [Y_data(:,1:tran+tesn) Y_data(:,tran+2*tesn+1:tran+3*tesn)]]; %nonlinear vs linear
options = struct('nonlinear',4,'match',1,'neighborSize',K,'jointSelection',0,'weight',1,'scaling',0,'numData',numData,'oos',0,'maxIter',iter);
sol=ManifoldMatchingEuc(disEuc,dim,options);
After matching, as usual, we check training data, testing matched data, and testing unmatched data using scatter plots.
figure
hold on
scatter(sol(1,1:tran),sol(2,1:tran),20,cc(1:tran),'o'); %training matched
scatter(sol(1,tran+2*tesn+1:2*tran+2*tesn),sol(2,tran+2*tesn+1:2*tran+2*tesn),20,cc(1:tran),'+');
title('Training Matched Data');
xlim([-0.02 0.02]); ylim([-0.02 0.02]);
hold off
figure
hold on
scatter(sol(1,tran+1:tran+tesn),sol(2,tran+1:tran+tesn),30,cc(tran+1:tran+tesn),'o'); %testing matched
scatter(sol(1,2*tran+2*tesn+1:2*tran+3*tesn),sol(2,2*tran+2*tesn+1:2*tran+3*tesn),30,cc(tran+1:tran+tesn),'+');
title('Testing Matched Data');
xlim([-0.02 0.02]); ylim([-0.02 0.02]);
hold off
figure
hold on
scatter(sol(1,tran+tesn+1:tran+2*tesn),sol(2,tran+tesn+1:tran+2*tesn),30,cc(tran+tesn+1:tran+2*tesn),'o'); %testing unmatched
scatter(sol(1,2*tran+3*tesn+1:2*tran+4*tesn),sol(2,2*tran+3*tesn+1:2*tran+4*tesn),30,cc(tran+2*tesn+1:tran+3*tesn),'+');
title('Testing Unmatched Data');
xlim([-0.02 0.02]); ylim([-0.02 0.02]);
hold off
If we check the matching by connecting each pair with a black line, it is better than without a nonlinear algorithm, but worse than joint Isomap.
plotVelocity([sol(:,1:tran) sol(:,tran+2*tesn+1:2*tran+2*tesn)],options.numData);
title('Training Matched Data');
xlim([-0.02 0.02]); ylim([-0.02 0.02]);
plotVelocity([sol(:,tran+1:tran+tesn) sol(:,2*tran+2*tesn+1:2*tran+3*tesn)],options.numData);
title('Testing Matched Data');
xlim([-0.02 0.02]); ylim([-0.02 0.02]);
plotVelocity([sol(:,tran+tesn+1:tran+2*tesn) sol(:,2*tran+3*tesn+1:2*tran+4*tesn)],options.numData);
title('Testing Unmatched Data');
xlim([-0.02 0.02]); ylim([-0.02 0.02]);
Our function does not return the distance correlation in this case, so we show only the testing power. It is better than separate LLE but not perfect.
p=plotPower(sol,numData,tesn,20);
p(2)
ans =
    0.9400
All of the above simulations can be repeated; in our paper we repeat them 100 times, each time on randomly selected partial data for testing.