Singular Value Decomposition (SVD)

Introduction

The SVD is a matrix factorization technique that decomposes any matrix into a product of three matrices (unique up to the ordering and signs of its components). The SVD is used for dimension reduction, trend analysis, and potentially for the clustering of a multivariate dataset. SVD is an exploratory approach to data analysis and therefore an unsupervised method. In other words, you only need the X block matrix. However, where a Y matrix/vector is available, it can be used for building composite models or for assessing the quality of the clustering.

How?

In SVD the matrix $X_{m \times n}$ is decomposed into the matrices $U_{m \times n}$, $D_{n \times n}$, and $V_{n \times n}^{T}$. The matrix $U_{m \times n}$ is the left singular matrix and it represents a rotation in the matrix space. The matrix $D_{n \times n}$ is a diagonal matrix that contains the singular values; it may also be denoted with other symbols such as $\Sigma_{n \times n}$. Geometrically, $D_{n \times n}$ represents an act of stretching, where each singular value is the degree and/or weight of stretching along one axis. We use the notation $D_{n \times n}$ to remind ourselves that this is a diagonal matrix. Finally, $V_{n \times n}^{T}$ is called the right singular matrix and is also associated with a rotation. Overall, SVD geometrically is a combination of a rotation, a stretching, and a second rotation.

The two matrices $U_{m \times n}$ and $V_{n \times n}^{T}$ are very special due to their orthonormal columns: for the thin SVD used here, $U^{T}U = I$ (for a full, square $U$ one also has $UU^{T} = I$), while $V$ is a square orthogonal matrix.

\[ U^{T} \times U = I\\ V^{T} \times V = V \times V^{T} = I \]

Therefore the general matrix expression of SVD is the following:

\[X = UDV^{T}. \]
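A quick numerical check of these properties on a small hypothetical matrix may help (this is only a sketch; it uses the builtin function svd(-) from LinearAlgebra.jl, which is discussed in more detail below):

using LinearAlgebra              # provides svd, Diagonal, and I

A = [3.0 1.0; 1.0 2.0; 0.0 1.0]  # a small hypothetical 3×2 matrix
F = svd(A)
F.U * Diagonal(F.S) * F.Vt ≈ A   # true: the three factors reconstruct A
F.U' * F.U ≈ Matrix(I, 2, 2)     # true: the columns of U are orthonormal
F.V' * F.V ≈ Matrix(I, 2, 2)     # true: V is an orthogonal matrix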

To deal with non-square matrices, we work with the square matrix $X^{T} \times X$ instead of $X$ itself. Substituting the SVD of $X$, our equation becomes the following:

\[X^{T}X = (UDV^{T})^{T} \times UDV^{T}. \]

And after a little bit of linear algebra:

\[ X^{T}X = VD^{T}DV^{T} \quad \textrm{and} \quad XV = UD. \]

This is a system of two equations that can be solved for the remaining factors. Before looking at an example of such a system, let's remind ourselves that $VD^{T}DV^{T}$ is exactly the eigenvalue/eigenvector decomposition of $X^{T}X$: the columns of $V$ are the eigenvectors and the diagonal entries of $D^{T}D$ (the squared singular values) are the eigenvalues. This means that both D and $V^{T}$ can be obtained by calculating the eigenvalues and eigenvectors of $X^{T}X$. Therefore we can calculate D and V as follows:

\[ D = \sqrt{eigenvalues(X^{T}X)}\\ V = eigenvectors(X^{T}X). \]

Once we know V, we can use it together with the second equation of SVD to calculate the last part, i.e. the matrix U.

\[U = XVD^{-1} \]

Please note that $D^{-1}$ denotes the inverse or pseudo-inverse of the matrix D.

Practical Example

Let's do the SVD calculations together for the below matrix:

using ACS

X = [5 -5;-1 7;1 10]
3×2 Matrix{Int64}:
  5  -5
 -1   7
  1  10

Step 1: $X^{T}X$

# The function transpose(-) is part of the LinearAlgebra.jl package, which is automatically installed via the ACS.jl package.
# Not all functions of LinearAlgebra.jl are exported within the ACS.jl environment.
XtX = transpose(X)*X
2×2 Matrix{Int64}:
  27  -22
 -22  174

Step 2: Calculation of D, V, and U

D = diagm(sqrt.(eigvals(XtX))) # A diagonal matrix is generated
2×2 Matrix{Float64}:
 4.87628   0.0
 0.0      13.3125
V = eigvecs(XtX) # Right singular matrix
2×2 Matrix{Float64}:
 -0.989446  -0.144904
 -0.144904   0.989446
U = X*V*pinv(D)	# Left singular matrix
3×2 Matrix{Float64}:
 -0.865969    -0.426048
 -0.00510321   0.531158
 -0.500072     0.732362

Builtin Function

The same calculations can be done with the function svd(-), provided to the ACS package via the LinearAlgebra.jl package.

 out = svd(X)
LinearAlgebra.SVD{Float64, Float64, Matrix{Float64}}
U factor:
3×2 Matrix{Float64}:
  0.426048  0.865969
 -0.531158  0.00510321
 -0.732362  0.500072
singular values:
2-element Vector{Float64}:
 13.312471610995619
  4.8762792789621585
Vt factor:
2×2 Matrix{Float64}:
 0.144904  -0.989446
 0.989446   0.144904
 D = diagm(out.S) # The singular value matrix
2×2 Matrix{Float64}:
 13.3125  0.0
  0.0     4.87628
 U = out.U # Left singular matrix
3×2 Matrix{Float64}:
  0.426048  0.865969
 -0.531158  0.00510321
 -0.732362  0.500072
 V = transpose(out.Vt) # Right singular matrix
2×2 transpose(::Matrix{Float64}) with eltype Float64:
  0.144904  0.989446
 -0.989446  0.144904

Please note that the builtin function sorts the singular values in descending order, and the columns of the other two matrices are re-ordered accordingly. Additionally, the builtin function may return the singular vectors with flipped signs (i.e. a mirror image of the manually calculated U and V matrices). These differences do not impact your calculations at all, since the factors still satisfy $X = UDV^{T}$.

Step 3: Calculation of $\hat{X}$

Using either the manual method or the builtin function, you can calculate $\hat{X}$ with the operation below.

X_hat = U*D*transpose(V)
3×2 Matrix{Float64}:
  5.0  -5.0
 -1.0   7.0
  1.0  10.0

Applications

As mentioned above, SVD has several applications in different fields. Here we will focus on three, namely: dimension reduction, clustering/trend analysis, and multivariate regression. These will be illustrated with a dataset that contains five variables (i.e. columns) and 150 measurements (i.e. rows), where the last variable "Species" is a categorical variable defining the flower species.

Dimension Reduction

To showcase the power of SVD in dimension reduction we will use the Iris dataset from Rdatasets.

using ACS

data = dataset("datasets", "iris")
describe(data) # Summarizes the dataset
5×7 DataFrame
 Row │ variable     mean     min     median  max        nmissing  eltype
     │ Symbol       Union…   Any     Union…  Any        Int64     DataType
─────┼─────────────────────────────────────────────────────────────────────
   1 │ SepalLength  5.84333  4.3     5.8     7.9               0  Float64
   2 │ SepalWidth   3.05733  2.0     3.0     4.4               0  Float64
   3 │ PetalLength  3.758    1.0     4.35    6.9               0  Float64
   4 │ PetalWidth   1.19933  0.1     1.3     2.5               0  Float64
   5 │ Species               setosa          virginica         0  CategoricalValue{String, UInt8}

Here we show how SVD is used for dimension reduction with the iris dataset. First we need to convert the dataset from a table (i.e. a dataframe) to a matrix. For this we can use the function Matrix(-) built into the Julia core language.

Y = data[!,"Species"]
X = Matrix(data[:,1:4]); # The first four columns are selected for this
150×4 Matrix{Float64}:
 5.1  3.5  1.4  0.2
 4.9  3.0  1.4  0.2
 4.7  3.2  1.3  0.2
 4.6  3.1  1.5  0.2
 5.0  3.6  1.4  0.2
 5.4  3.9  1.7  0.4
 4.6  3.4  1.4  0.3
 5.0  3.4  1.5  0.2
 4.4  2.9  1.4  0.2
 4.9  3.1  1.5  0.1
 ⋮              
 6.9  3.1  5.1  2.3
 5.8  2.7  5.1  1.9
 6.8  3.2  5.9  2.3
 6.7  3.3  5.7  2.5
 6.7  3.0  5.2  2.3
 6.3  2.5  5.0  1.9
 6.5  3.0  5.2  2.0
 6.2  3.4  5.4  2.3
 5.9  3.0  5.1  1.8

Now we can perform SVD on the X and try to assess the underlying trends in the data.

 out = svd(X)
LinearAlgebra.SVD{Float64, Float64, Matrix{Float64}}
U factor:
150×4 Matrix{Float64}:
 -0.0616168   0.129611    0.0021386    0.00163819
 -0.0580709   0.11102     0.0706724    0.051757
 -0.056763    0.117966    0.00434255   0.00955702
 -0.0566534   0.105308    0.00592467  -0.0416439
 -0.0612302   0.13109    -0.0318811   -0.0322148
 -0.0675032   0.130885   -0.0685372   -0.0113642
 -0.0574821   0.116598   -0.0664137   -0.0267434
 -0.0609726   0.120943    0.00543027  -0.0240567
 -0.0537612   0.0999415   0.0176366   -0.0165154
 -0.0588267   0.112043    0.0649689   -0.030472
  ⋮                                   
 -0.0975766  -0.0421663  -0.0477535    0.269324
 -0.0866823  -0.0643397  -0.0672473   -0.0101401
 -0.101467   -0.0726079  -0.0954496    0.0314228
 -0.100361   -0.0670195  -0.157083     0.128363
 -0.0961497  -0.0524346  -0.0589711    0.226609
 -0.0892692  -0.0585064   0.0460294    0.134135
 -0.0940593  -0.0498297  -0.04144      0.0728945
 -0.0948896  -0.0561012  -0.212978     0.0231635
 -0.0884784  -0.0515697  -0.0957528   -0.0835063
singular values:
4-element Vector{Float64}:
 95.95991387196453
 17.76103365732857
  3.460930930386972
  1.884826305918045
Vt factor:
4×4 Matrix{Float64}:
 -0.751108  -0.380086  -0.513009   -0.167908
  0.284175   0.546745  -0.708665   -0.343671
  0.502155  -0.675243  -0.0591662  -0.537016
  0.320814  -0.317256  -0.480745    0.751872
 D = diagm(out.S) # The singular value matrix
4×4 Matrix{Float64}:
 95.9599   0.0    0.0      0.0
  0.0     17.761  0.0      0.0
  0.0      0.0    3.46093  0.0
  0.0      0.0    0.0      1.88483
 U = out.U # Left singular matrix
150×4 Matrix{Float64}:
 -0.0616168   0.129611    0.0021386    0.00163819
 -0.0580709   0.11102     0.0706724    0.051757
 -0.056763    0.117966    0.00434255   0.00955702
 -0.0566534   0.105308    0.00592467  -0.0416439
 -0.0612302   0.13109    -0.0318811   -0.0322148
 -0.0675032   0.130885   -0.0685372   -0.0113642
 -0.0574821   0.116598   -0.0664137   -0.0267434
 -0.0609726   0.120943    0.00543027  -0.0240567
 -0.0537612   0.0999415   0.0176366   -0.0165154
 -0.0588267   0.112043    0.0649689   -0.030472
  ⋮                                   
 -0.0975766  -0.0421663  -0.0477535    0.269324
 -0.0866823  -0.0643397  -0.0672473   -0.0101401
 -0.101467   -0.0726079  -0.0954496    0.0314228
 -0.100361   -0.0670195  -0.157083     0.128363
 -0.0961497  -0.0524346  -0.0589711    0.226609
 -0.0892692  -0.0585064   0.0460294    0.134135
 -0.0940593  -0.0498297  -0.04144      0.0728945
 -0.0948896  -0.0561012  -0.212978     0.0231635
 -0.0884784  -0.0515697  -0.0957528   -0.0835063
 V = transpose(out.Vt) # Right singular matrix
4×4 transpose(::Matrix{Float64}) with eltype Float64:
 -0.751108   0.284175   0.502155    0.320814
 -0.380086   0.546745  -0.675243   -0.317256
 -0.513009  -0.708665  -0.0591662  -0.480745
 -0.167908  -0.343671  -0.537016    0.751872

As you may have noticed, there are four variables in the original data and four non-zero singular values. Each column in the left singular matrix is associated with one singular value and one column in the V matrix (i.e. one row of $V^{T}$). For example, the first column of the sorted U matrix (i.e. via the builtin function) is directly connected to the first singular value of 95.9 and the first column of the matrix V. With all four singular values we can describe 100% of the variance in the data (i.e. $\hat{X} = X$). By removing the smaller or less important singular values we can reduce the number of dimensions in the data. We can calculate the variance explained by each singular value via two different approaches.

 var_exp = diag(D) ./ sum(D) # diag() selects the diagonal values in a matrix
4-element Vector{Float64}:
 0.8059340691495327
 0.1491687679800496
 0.029067159767294893
 0.015830003103122974
 var_exp_cum = cumsum(diag(D)) ./ sum(D) # cumsum() calculates the cumulative sum
4-element Vector{Float64}:
 0.8059340691495327
 0.9551028371295822
 0.9841699968968772
 1.0000000000000002
 scatter(1:length(var_exp),var_exp,label="Individual")
 plot!(1:length(var_exp),var_exp,label=false)

 scatter!(1:length(var_exp),var_exp_cum,label="Cumulative")
 plot!(1:length(var_exp),var_exp_cum,label=false)
 xlabel!("Nr Singular Values")
 ylabel!("Variance Explained")
[Figure: individual and cumulative variance explained versus the number of singular values]

Given that the first two singular values explain more than 95% of the variance in the data, they are considered enough for modeling our dataset. The next step is to plot the scores (i.e. the left singular matrix) of the first and second singular values against each other to see whether we have a model or not. Each column in the $U$ matrix represents a set of scores associated with a singular value (e.g. the first column for the first singular value).

 scatter(U[:,1],U[:,2],label=false)
 xlabel!("First Singular value (81%)")
 ylabel!("Second Singular value (15%)")
[Figure: scores of the first singular value versus the second singular value]

At this point we are assuming that we do not have any idea about the plant species included in our dataset. Now we need to connect the singular values to the individual variables. For that, similarly to PCA, we take advantage of the loadings, which in this case are the columns of $V$ or the rows of $V^{T}$.

 bar(V[:,1] ./ sum(abs.(V)),label="First SV")
 bar!(V[:,2] ./ sum(abs.(V)),label="Second SV")
 xlabel!("Variable Nr")
 ylabel!("Importance")
 #ylims!(-0.1,0.1)
[Figure: loadings (importance) of each variable for the first and second SV]

The sign of each loading value shows the relationship between the variable and the model. For example, based on the first SV, variables one and two both have a negative impact on the final model (i.e. the scores of SV1). A positive impact indicates that the final model scores increase with the variable, while a negative impact means that the score values decrease as the variable increases.

 p1 = scatter(X[:,1],U[:,1],label=false)
 xlabel!("SepalLength")
 ylabel!("Scores U1")

  p2 = scatter(X[:,2],U[:,1],label=false)
 xlabel!("SepalWidth")
 ylabel!("Scores U1")

  p3 = scatter(X[:,2],U[:,2],label=false)
 xlabel!("SepalWidth")
 ylabel!("Scores U2")

 p4 = scatter(X[:,3],U[:,2],label=false)
 xlabel!("PetalLength")
 ylabel!("Scores U2")

 plot(p1,p2,p3,p4,layout = (2,2))
[Figure: scores U1 and U2 plotted against SepalLength, SepalWidth, and PetalLength]

In this particular case, SV1 is a linear combination of SepalLength and SepalWidth while SV2 is a linear combination of all four variables. This implies that we can cover the variance present in $X$ with two variables, namely $U1$ and $U2$. For this dataset, the reduction from 4 variables to 2 may not look impressive. However, this can be a very useful technique when dealing with a large number of variables (see the octane example under Additional Example).
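To see how much information is lost by this reduction, we can rebuild $X$ from only the first two singular values and compare it with the original. Below is a minimal sketch, assuming the U, D, and V calculated above with the builtin function and the norm(-) function of LinearAlgebra.jl:

X_hat2 = U[:,1:2] * D[1:2,1:2] * transpose(V)[1:2,:] # rank-2 reconstruction of X
norm(X - X_hat2) / norm(X) # relative reconstruction error, roughly 0.04 (i.e. about 4%)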

Clustering

When we perform cluster analysis or most modeling approaches, we need to divide our data into training and test sets. We usually go for a division of 80% for the training set and 20% for the test set. More details are provided in the cross-validation chapter. Let's randomly select 15 data points to put aside as the test set.

n = 15 # number of points to be selected

rand_ind = rand(1:size(X,1),n) # generate a set of n random indices between 1 and size(X,1)
ind_tr = ones(size(X,1))       # generate a vector of ones as indices
ind_tr[rand_ind] .= 0          # set the indices of the test set rows to zero
X_tr = X[ind_tr .== 1,:]       # select the training set
X_ts = X[rand_ind,:]           # select the test set
data[rand_ind,:]
15×5 DataFrame
 Row │ SepalLength  SepalWidth  PetalLength  PetalWidth  Species
     │ Float64      Float64     Float64      Float64     Cat…
─────┼───────────────────────────────────────────────────────────
   1 │         5.6         2.8          4.9         2.0  virginica
   2 │         5.5         2.4          3.8         1.1  versicolor
   3 │         7.6         3.0          6.6         2.1  virginica
   4 │         6.1         3.0          4.9         1.8  virginica
   5 │         6.3         2.7          4.9         1.8  virginica
   6 │         5.0         3.2          1.2         0.2  setosa
   7 │         6.5         2.8          4.6         1.5  versicolor
   8 │         6.1         2.8          4.0         1.3  versicolor
   9 │         5.0         3.6          1.4         0.2  setosa
  10 │         5.7         3.8          1.7         0.3  setosa
  11 │         6.9         3.1          5.4         2.1  virginica
  12 │         5.0         3.6          1.4         0.2  setosa
  13 │         5.4         3.9          1.7         0.4  setosa
  14 │         5.4         3.4          1.7         0.2  setosa
  15 │         5.1         3.4          1.5         0.2  setosa
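Note that rand(1:size(X,1),n) samples the indices with replacement, so the same measurement can end up in the test set more than once. A minimal alternative sketch (assuming the Random standard library is available) that avoids duplicates and makes the split reproducible is shown below:

using Random
Random.seed!(1234)                  # hypothetical seed, only for reproducibility
rand_ind = randperm(size(X,1))[1:n] # n unique random indices (sampled without replacement)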

Now that we have the training and test sets separated, we can build our model using the training set. This implies that the model has never seen the values in the test set. It should be noted that we always want a homogeneous distribution of measurements in the test set. Also, each iteration here will result in a different test set, as a new set of random numbers is generated. Now let's build our model with only $X_{tr}$, following the same procedure as before.

 out = svd(X_tr)
LinearAlgebra.SVD{Float64, Float64, Matrix{Float64}}
U factor:
136×4 Matrix{Float64}:
 -0.0644915   0.139274    0.00390965   0.00347969
 -0.0607864   0.119401    0.0743097    0.0553671
 -0.0594122   0.126776    0.00601556   0.0114573
 -0.0593075   0.113304    0.00775086  -0.0408228
 -0.060169    0.12535    -0.0668757   -0.0263947
 -0.063823    0.130036    0.00733603  -0.0227442
 -0.0562792   0.107528    0.0196769   -0.0151021
 -0.061579    0.120506    0.0687161   -0.0286915
 -0.0683386   0.146767    0.0069212   -0.00466566
 -0.0627493   0.122368   -0.0242302   -0.0839512
  ⋮                                   
 -0.102313   -0.0426679  -0.0487663    0.27788
 -0.0909141  -0.0665037  -0.0682495   -0.00819832
 -0.106418   -0.0749638  -0.0972876    0.0344471
 -0.105255   -0.0690282  -0.161203     0.132766
 -0.100826   -0.0536231  -0.0602775    0.23408
 -0.093617   -0.0602601   0.0482396    0.14055
 -0.0986344  -0.0509049  -0.0416952    0.077145
 -0.0995139  -0.0575247  -0.218568     0.0244605
 -0.0927887  -0.0528727  -0.0973246   -0.0834407
singular values:
4-element Vector{Float64}:
 91.54301665593304
 16.69359148734842
  3.353705321118695
  1.8446586145388477
Vt factor:
4×4 Matrix{Float64}:
 -0.749506  -0.37855   -0.516091   -0.16909
  0.28665    0.54883   -0.706691   -0.342354
  0.500962  -0.672587  -0.0567141  -0.541707
  0.324206  -0.321111  -0.480648    0.748836
 D = diagm(out.S) # The singular value matrix
4×4 Matrix{Float64}:
 91.543   0.0     0.0      0.0
  0.0    16.6936  0.0      0.0
  0.0     0.0     3.35371  0.0
  0.0     0.0     0.0      1.84466
 U = out.U # Left singular matrix
136×4 Matrix{Float64}:
 -0.0644915   0.139274    0.00390965   0.00347969
 -0.0607864   0.119401    0.0743097    0.0553671
 -0.0594122   0.126776    0.00601556   0.0114573
 -0.0593075   0.113304    0.00775086  -0.0408228
 -0.060169    0.12535    -0.0668757   -0.0263947
 -0.063823    0.130036    0.00733603  -0.0227442
 -0.0562792   0.107528    0.0196769   -0.0151021
 -0.061579    0.120506    0.0687161   -0.0286915
 -0.0683386   0.146767    0.0069212   -0.00466566
 -0.0627493   0.122368   -0.0242302   -0.0839512
  ⋮                                   
 -0.102313   -0.0426679  -0.0487663    0.27788
 -0.0909141  -0.0665037  -0.0682495   -0.00819832
 -0.106418   -0.0749638  -0.0972876    0.0344471
 -0.105255   -0.0690282  -0.161203     0.132766
 -0.100826   -0.0536231  -0.0602775    0.23408
 -0.093617   -0.0602601   0.0482396    0.14055
 -0.0986344  -0.0509049  -0.0416952    0.077145
 -0.0995139  -0.0575247  -0.218568     0.0244605
 -0.0927887  -0.0528727  -0.0973246   -0.0834407
 V = transpose(out.Vt) # Right singular matrix
4×4 transpose(::Matrix{Float64}) with eltype Float64:
 -0.749506   0.28665    0.500962    0.324206
 -0.37855    0.54883   -0.672587   -0.321111
 -0.516091  -0.706691  -0.0567141  -0.480648
 -0.16909   -0.342354  -0.541707    0.748836

Let's plot our results for the first two SVs, as we did before. However, this time we will take the knowledge of the different species into account.

 var_exp = diag(D) ./ sum(D) # variance explained
 Y_tr = data[ind_tr .== 1,"Species"] # species labels of the training set
 Y_ts = data[rand_ind,"Species"]     # species labels of the test set, in the same order as X_ts
 scatter(U[:,1],U[:,2],label=["Setosa" "Versicolor" "Virginica"], group = Y_tr)
 xlabel!("First Singular value (81%)")
 ylabel!("Second Singular value (14%)")
[Figure: U1 versus U2 scores of the training set, grouped by species]

As can be seen, this model is very similar to our previous model based on the full dataset. Now we first need to define thresholds for each class based on the score values in the $U1$ and $U2$ space. This is typically difficult to assess. However, for this case the main separating factor is the $U2$ values (e.g. $U2 \geq 0.05$ implies Setosa).

 scatter(U[:,1],U[:,2],label=["Setosa" "Versicolor" "Virginica"], group = Y_tr)
 plot!([-0.15,0],[0.05,0.05],label="Setosa")
 plot!([-0.15,0],[-0.04,-0.04],label="Virginica")
 xlabel!("First Singular value (81%)")
 ylabel!("Second Singular value (14%)")
[Figure: training set scores with the Setosa and Virginica threshold lines]

The next step is to calculate the score values for the measurements in the test set. This will enable us to estimate the class associated with each data point in the test set. To do this we need to do a little bit of linear algebra.

\[X = UDV^{T}\\ U_{test} = X_{test} \times (DV^{T})^{-1} \]

In practice:

 U_test = X_ts * pinv(D*transpose(V))
15×4 Matrix{Float64}:
 -0.0887473  -0.0602345    -0.13095     0.0319505
 -0.0784106  -0.0100786     0.098307    0.00527021
 -0.115718   -0.0933336     0.0827905  -0.0537177
 -0.0932987  -0.0409719    -0.0640672   0.00382242
 -0.0936956  -0.0474006     0.0259731   0.0911961
 -0.0613047   0.13616       0.0525194   0.0902396
 -0.0935012  -0.0218264     0.0893239   0.0653232
 -0.0864742   0.000806589   0.0720251   0.0701692
 -0.0640863   0.140844     -0.031083   -0.0315033
 -0.0725206   0.144689      0.0121442   0.0191353
 -0.103635   -0.0512662    -0.0215346   0.118521
 -0.0640863   0.140844     -0.031083   -0.0315033
 -0.0706626   0.140774     -0.0688761  -0.0104036
 -0.0682256   0.128438      0.0637042  -0.00455512
 -0.0646418   0.131753      0.0222736  -0.00516886

 scatter(U[:,1],U[:,2],label=["Setosa" "Versicolor" "Virginica"], group = Y_tr)
 plot!([-0.15,0],[0.05,0.05],label="Setosa")
 plot!([-0.15,0],[-0.04,-0.04],label="Virginica")
 scatter!(U_test[Y_ts[:] .== "setosa" ,1],U_test[Y_ts[:] .== "setosa",2],label="Setosa",marker=:d)
 scatter!(U_test[Y_ts[:] .== "versicolor",1],U_test[Y_ts[:] .== "versicolor",2],label="Versicolor",marker=:d)
 scatter!(U_test[Y_ts[:] .== "virginica",1],U_test[Y_ts[:] .== "virginica",2],label="Virginica",marker=:d)
 xlabel!("First Singular value (81%)")
 ylabel!("Second Singular value (14%)")

As can be seen from the results of the test set, our model is not perfect but it does well for most cases. It should be noted that steps such as data pre-treatment and the use of supervised methods may improve the results of your cluster analysis. The use of SVD for prediction is not recommended; it should mainly be used for dimension reduction and data exploration.
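To quantify this, a minimal sketch is to turn the thresholds drawn above into a simple classification rule and apply it to the test set scores. The cut-off values 0.05 and -0.04 are only the rough visual thresholds from the plots and may change with a different random split:

pred = [u2 >= 0.05 ? "setosa" : (u2 <= -0.04 ? "virginica" : "versicolor") for u2 in U_test[:,2]]
sum(pred .== Y_ts) / length(Y_ts) # fraction of test points assigned to the correct species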

Regression

If you have a dataset (e.g. the octane dataset in the additional example) where SVD is used to reduce the dimensions, we can perform a least-squares regression using the selected columns of $U$ rather than the original $X$. For example, in the case of the iris dataset, $U1$ and $U2$ can be used to replace $X$.

 X_svr = U[:,1:m] # m is the number of selected SVs
 Y_svr            # does not exist for the iris dataset; for the octane dataset it is the octane column
 b = pinv(transpose(X_svr) * X_svr) * transpose(X_svr) * Y_svr # simple least-squares solution
 y_hat = X_svr * b # predicted values (y_hat)
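Since the iris dataset has no continuous response, the block above cannot be run as-is. The following minimal sketch uses a simulated response purely for illustration (for the octane dataset, Y_svr would instead be the measured octane values):

m = 2                           # number of selected SVs (hypothetical choice)
X_svr = U[:,1:m]                # scores of the retained SVs
Y_svr = X_svr * [2.0, -1.0] .+ 0.05 .* randn(size(X_svr,1)) # simulated response, for illustration only
b = pinv(transpose(X_svr) * X_svr) * transpose(X_svr) * Y_svr # simple least-squares solution
y_hat = X_svr * b               # predicted values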

Trend Analysis

We can also assess the trend represented by each SV in our model. This is typically done by setting all singular values except one to zero. The new $D$ is then used to calculate $\hat{X}$, and the different variables are plotted against each other for both the original $X$ and $\hat{X}$.

 D_temp = copy(out.S) # copy, so that the singular values stored in out are not modified
 D_temp[2:end] .= 0   # keep only the first singular value
 D_n = diagm(D_temp)  # the new singular value matrix
4×4 Matrix{Float64}:
 91.543  0.0  0.0  0.0
  0.0    0.0  0.0  0.0
  0.0    0.0  0.0  0.0
  0.0    0.0  0.0  0.0

Then the $\hat{X}$ is calculated.

 X_h  = U * D_n * transpose(V)
 X_h[1:5,:]
5×4 Matrix{Float64}:
 4.42489  2.23486  3.04687  0.998263
 4.17068  2.10647  2.87182  0.940912
 4.07639  2.05885  2.8069   0.91964
 4.06921  2.05522  2.80195  0.918019
 4.12832  2.08507  2.84265  0.931354

Now if we plot SepalLength vs SepalWidth, we can clearly see a 1-to-2 relationship between the two variables, which is detected by the first SV. This can be done for other variables and SVs.

 scatter(X[:,1],X[:,2],label="X")
 scatter!(X_h[:,1],X_h[:,2],label="X_h")
 xlabel!("SepalLength")
 ylabel!("SepalWidth")
[Figure: SepalLength versus SepalWidth for the original X and the rank-one reconstruction X_h]
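Since the rank-one reconstruction keeps only the first SV, every row of X_h is a multiple of the same loading vector, so the ratio between its first two columns is constant. A quick check (assuming the X_h calculated above):

ratio = X_h[:,1] ./ X_h[:,2] # SepalLength divided by SepalWidth in the rank-one reconstruction
extrema(ratio)               # minimum and maximum are both ≈ 1.98, i.e. the ratio is constant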

Additional Example

If you are interested in practicing more, you can use the NIR.csv file provided in the dataset folder of the ACS.jl GitHub repository. Please note that this is an SVR problem, where you can first use SVD for the dimension reduction and then use the selected SVs for the regression.

If you are interested in the math behind SVD and would like to know more, you can check this MIT course material.