library(rsvddpd)
library(microbenchmark)
library(matrixStats)
library(pcaMethods)
#> Loading required package: Biobase
#> Loading required package: BiocGenerics
#> Loading required package: generics
#> 
#> Attaching package: 'generics'
#> The following objects are masked from 'package:base':
#> 
#>     as.difftime, as.factor, as.ordered, intersect, is.element, setdiff,
#>     setequal, union
#> 
#> Attaching package: 'BiocGenerics'
#> The following objects are masked from 'package:stats':
#> 
#>     IQR, mad, sd, var, xtabs
#> The following object is masked from 'package:utils':
#> 
#>     data
#> The following objects are masked from 'package:base':
#> 
#>     anyDuplicated, aperm, append, as.data.frame, basename, cbind,
#>     colnames, dirname, do.call, duplicated, eval, evalq, Filter, Find,
#>     get, grep, grepl, is.unsorted, lapply, Map, mapply, match, mget,
#>     order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank,
#>     rbind, Reduce, rownames, sapply, saveRDS, scale, sequence, table,
#>     tapply, transform, unique, unsplit, which.max, which.min
#> Welcome to Bioconductor
#> 
#>     Vignettes contain introductory material; view with
#>     'browseVignettes()'. To cite Bioconductor, see
#>     'citation("Biobase")', and for packages 'citation("pkgname")'.
#> 
#> Attaching package: 'Biobase'
#> The following objects are masked from 'package:matrixStats':
#> 
#>     anyMissing, rowMedians
#> 
#> Attaching package: 'pcaMethods'
#> The following object is masked from 'package:stats':
#> 
#>     loadings

Introduction

Singular Value Decomposition (SVD) is a very popular technique which is abundantly used in different applications from Bioinformatics, Image and Signal processing, Textual Analysis, Dimensional Reduction techniques etc.

However, it is often the case that the data matrix, on which SVD is generally applied on, contains outliers which are not in accord with the data generating mechanism. In such a case, usual SVD performs poorly in a sense that the singular values and the left and right singular vectors are found to be very different from the ones that would have been obtained if the data matrix was free of outliers. Hence, the dire need of a robust version of SVD is extremely prevalent, since hardly any data in practice becomes free of any type of outliers.

For illustration, consider the simple \(4\times 3\) matrix, where the elements go from \(1\) to \(12\).

X <- matrix(1:12, nrow = 4, ncol = 3, byrow = TRUE)
X
#>      [,1] [,2] [,3]
#> [1,]    1    2    3
#> [2,]    4    5    6
#> [3,]    7    8    9
#> [4,]   10   11   12

and the singular value decomposition turns out the singular values as approximately \(25, 1.3\) and \(0\).

svd(X)
#> $d
#> [1] 2.546241e+01 1.290662e+00 1.809728e-15
#> 
#> $u
#>            [,1]        [,2]       [,3]
#> [1,] -0.1408767 -0.82471435  0.5470490
#> [2,] -0.3439463 -0.42626394 -0.7091593
#> [3,] -0.5470159 -0.02781353 -0.2228286
#> [4,] -0.7500855  0.37063688  0.3849388
#> 
#> $v
#>            [,1]        [,2]       [,3]
#> [1,] -0.5045331  0.76077568 -0.4082483
#> [2,] -0.5745157  0.05714052  0.8164966
#> [3,] -0.6444983 -0.64649464 -0.4082483

Now, note what happens when we contaminate a single entry of the matrix by a large outlying value.

X[2, 2] <- 100
svd(X)
#> $d
#> [1] 101.431313  18.313121   1.148165
#> 
#> $u
#>             [,1]       [,2]          [,3]
#> [1,] -0.02260136  0.1500488  0.9516017926
#> [2,] -0.98805726 -0.1540849  0.0008289283
#> [3,] -0.08969187  0.5758535  0.1322532569
#> [4,] -0.12323712  0.7887559 -0.2774210109
#> 
#> $v
#>             [,1]        [,2]         [,3]
#> [1,] -0.05752705  0.62535728 -0.778215212
#> [2,] -0.99499917 -0.09966888 -0.006539692
#> [3,] -0.08165348  0.77394728  0.627963626

All the singular values are now much different, being \(101.4, 18.3\) and \(1.14\). However, in practical cases, where \(X\) actually represent a data matrix, this can pose a serious problem.

On the other hand, using rSVDdpd function from rsvddpd package enables us a mitigate the effect of this outlier.

rSVDdpd(X, alpha = 0.3, nd = min(dim(X)))
#> $d
#> [1] 25.4729849  1.2901302  0.0281141
#> 
#> $u
#>           [,1]        [,2]       [,3]
#> [1,] 0.1408239 -0.82504019  0.4775446
#> [2,] 0.3450082 -0.42538715 -0.8366691
#> [3,] 0.5467886 -0.02772993  0.2395725
#> [4,] 0.7497734  0.37092520  0.1205864
#> 
#> $v
#>           [,1]        [,2]       [,3]
#> [1,] 0.5043111  0.76326256  0.4038571
#> [2,] 0.5749972  0.05210479 -0.8164946
#> [3,] 0.6442426 -0.64398397  0.4125968

Since the function does some randomized initialization under the hood, the result might not be exactly same when you run the code again. However, you should get the singular values pretty close to the singular values of the original \(X\) before we added the outlier.

Theoretical Background

Let us take a look at what rSVDdpd does under the hood. Before that, singular value decomposition (SVD) of a matrix \(X\) is splitting it as;

\[X_{n\times p} = U_{n \times r} D_{r\times r}V_{p\times r}^T\] Here, \(r\) is the rank of the matrix \(X\), \(D\) is a diagonal matrix with non-negative real entries, and \(U, V\) are orthogonal matrices. Since, we usually observe data matrix \(X\) with errors, the model ends up being \(X = UDV^T + \epsilon\), where \(\epsilon\) is the errors.

For simplicity, we consider \(r = 1\), i.e. \(X \approx \lambda ab^T\), where \(a, b\) are vectors of appropriate dimensions. The usual SVD can be viewed as solving the problem \(\sum_{i, j} (X_{ij} - \lambda a_i b_j)^2\), with respect to the choices of \(a_i, b_j\)’s and \(\lambda\). This \(L_2\) norm is essentially susceptible to outliers, hence people have generally tried to use \(L_1\) norm instead and tried to minimize that.

Here, we use Density Power Divergence (which is popularly used in robust estimation techniques bridging robustness and efficiency) to quantify the norm of the error. In particular, we try to minimize the function,

\[ H = \int \phi\left( \dfrac{x - \lambda a_ib_j}{\sigma} \right)^{(1 + \alpha)}dx - \dfrac{1}{np} \sum_{i=1}^{n} \sum_{j = 1}^{p} \phi\left( \dfrac{X_{ij} - \lambda a_ib_j}{\sigma} \right)^{\alpha} \] with respect to the unknowns \(\lambda, a_i, b_j\) and \(\sigma^2\), where \(\phi(\cdot)\) is the standard normal density function. However, since the above problem is actually non-convex, but is convex when one of \(a_i\)’s or \(b_j\)’s are held fixed, we iterate between situations fixing \(a_i\)’s and \(b_j\)’s and finding minimum of the other quantities respectively.

Features

Underflow and Overflow

Because of the usage of standard normal density, and exponential functions, the usual algorithm suffers from underflow and overflow and the estimates tend to become NAN or Inf in some iterations for reasonably large or reasonably small values in the data matrix. To deal with this, rSVDdpd function first scales all elements of the data matrix to a suitable range, and then perform the robust SVD algorithm. Finally, the scaling factor can be adjusted to obtain the original singular values.

rSVDdpd(X * 1e6, alpha  = 0.3, nd = min(dim(X)))
#> $d
#> [1] 25472982.72  1290130.41    28103.39
#> 
#> $u
#>           [,1]        [,2]       [,3]
#> [1,] 0.1408240  0.82504006  0.4775448
#> [2,] 0.3450075  0.42538758 -0.8366691
#> [3,] 0.5467887  0.02773002  0.2395721
#> [4,] 0.7497736 -0.37092500  0.1205858
#> 
#> $v
#>           [,1]       [,2]       [,3]
#> [1,] 0.5043112 -0.7632614  0.4038593
#> [2,] 0.5749971 -0.0521072 -0.8164945
#> [3,] 0.6442427  0.6439852  0.4125948
rSVDdpd(X * 1e-6, alpha = 0.3, nd = min(dim(X)))
#> $d
#> [1] 2.547298e-05 1.290130e-06 2.810305e-08
#> 
#> $u
#>           [,1]        [,2]       [,3]
#> [1,] 0.1408240 -0.82504005  0.4775448
#> [2,] 0.3450075 -0.42538759 -0.8366691
#> [3,] 0.5467887 -0.02773002  0.2395721
#> [4,] 0.7497736  0.37092499  0.1205858
#> 
#> $v
#>           [,1]        [,2]       [,3]
#> [1,] 0.5043112  0.76326133  0.4038593
#> [2,] 0.5749971  0.05210728 -0.8164945
#> [3,] 0.6442427 -0.64398523  0.4125947

As it can be seen, the function rSVDdpd handles the very large or very small elements nicely.

Permutation Invariance

Y <- X[, c(3, 1, 2)]
rSVDdpd(Y, alpha = 0.3, nd = min(dim(Y)))
#> $d
#> [1] 25.4729908  1.2901295  0.0281431
#> 
#> $u
#>           [,1]        [,2]       [,3]
#> [1,] 0.1408238 -0.82504056  0.4775440
#> [2,] 0.3450100 -0.42538598 -0.8366689
#> [3,] 0.5467882 -0.02772969  0.2395734
#> [4,] 0.7497729  0.37092574  0.1205881
#> 
#> $v
#>           [,1]        [,2]       [,3]
#> [1,] 0.6442425 -0.64398064  0.4126022
#> [2,] 0.5043110  0.76326582  0.4038511
#> [3,] 0.5749974  0.05209824 -0.8164948

As expected, the singular values do not change when the columns of the data matrix is permuted, however, the singular vector permutes in the same manner of the permutation of the columns.

Orthogonality of Left and Right Singular vectors

An important property of SVD is that the matrix corresponding to the left and right singular vectors are orthogonal matrices. A sanity check of this property can also be verified very easily.

crossprod(rSVDdpd(X, alpha = 0.3, nd = min(dim(X)))$u)
#>              [,1]         [,2]         [,3]
#> [1,] 1.000000e+00 1.790184e-17 5.949705e-17
#> [2,] 1.790184e-17 1.000000e+00 2.229803e-17
#> [3,] 5.949705e-17 2.229803e-17 1.000000e+00

As it seems, the off diagonal entries are very small values. This is ensured by introducing a Gram Schimdt Orthogonalization step between successive iterations of the algorithm.

Effect of Robustness Parameter

In presence of outliers with large deviation, the performance of rSVDdpd is fairly robust to the choice of \(\alpha\), the robustness parameter. With \(\alpha = 0\), rSVDdpd corresponds to usual svd function from base package. However, with increasing \(\alpha\), the robustness increases, i.e. even a smaller deviation would not affect the singular values, while with higher \(\alpha\), the variance of the estimators generally increase.

To demonstrate the effect of \(\alpha\) on time complexity, microbenchmark package will be used.

microbenchmark::microbenchmark(svd(X), 
                               rSVDdpd(X, alpha = 0, nd = min(dim(X))), 
                               rSVDdpd(X, alpha = 0.25, nd = min(dim(X))), 
                               rSVDdpd(X, alpha = 0.5, nd = min(dim(X))), 
                               rSVDdpd(X, alpha = 0.75, nd = min(dim(X))), 
                               rSVDdpd(X, alpha = 1, nd = min(dim(X))), times = 30)
#> Unit: microseconds
#>                                        expr    min     lq     mean  median
#>                                      svd(X) 21.716 22.985 27.00220 24.4995
#>     rSVDdpd(X, alpha = 0, nd = min(dim(X))) 58.455 62.050 66.48873 63.5805
#>  rSVDdpd(X, alpha = 0.25, nd = min(dim(X))) 61.119 63.700 67.72633 65.0255
#>   rSVDdpd(X, alpha = 0.5, nd = min(dim(X))) 64.058 65.938 70.73547 67.5605
#>  rSVDdpd(X, alpha = 0.75, nd = min(dim(X))) 64.195 65.638 69.89830 67.4305
#>     rSVDdpd(X, alpha = 1, nd = min(dim(X))) 65.196 68.484 78.82820 70.0765
#>      uq     max neval
#>  27.016  76.213    30
#>  65.774 100.318    30
#>  67.598 117.060    30
#>  74.313  90.802    30
#>  69.845 119.757    30
#>  74.162 266.342    30

Therefore, the execution time slightly increases with higher \(\alpha\).

Comparison with existing packages

To compare performances of usual SVD algorithm with that of rSVDdpd, one can use simSVD function, which is used to simulate data matrices based on a model and then obtain an estimate of Bias and MSE of the estimates using a Monte Carlo approach.

First, we create the true data matrix, with singular vectors taken from coefficients of orthogonal polynomials.

U <- as.matrix(stats::contr.poly(10)[, 1:3])
V <- as.matrix(stats::contr.poly(4)[, 1:3])
trueSVD <- list(d = c(10, 5, 3), u = U, v = V)  # true svd of the data matrix

We can now call simSVD function to see the performance of usual SVD algorithm under contamination from outlier.

res <- simSVD(trueSVD, svdfun = svd, B = 100, seed = 2021, outlier = TRUE, out_value = 25, tau = 0.9)

res
#> $Bias
#> [1] 28.20376 21.39073 11.20089
#> 
#> $MSE
#> [1] 845.4570 527.2264 208.7107
#> 
#> $Variance
#> [1] 50.00523 69.66283 83.25073
#> 
#> $Left
#> [1] 0.6713628 0.7145263 0.7491816
#> 
#> $Right
#> [1] 0.5724405 0.5660693 0.6095304

Following is the performance of robustSvd function from pcaMethods package.

res <- simSVD(trueSVD, svdfun = pcaMethods::robustSvd, B = 100, seed = 2021, outlier = TRUE, out_value = 25, tau = 0.9)

res
#> $Bias
#> [1] 17.98868 14.14000 17.30844
#> 
#> $MSE
#> [1] 534.3642 358.2193 433.7734
#> 
#> $Variance
#> [1] 210.7715 158.2798 134.1911
#> 
#> $Left
#> [1] 0.4884228 0.6558614 0.6970538
#> 
#> $Right
#> [1] 0.4289678 0.5361160 0.5130100

Now we compare rSVDdpd function’s performance with the other SVD implementations.

rSVDdpd_max <- function(X, alpha) {
  return(rSVDdpd(X, alpha = alpha, nd = min(dim(X))))
}
res <- simSVD(trueSVD, svdfun = rSVDdpd_max, B = 100, seed = 2021, outlier = TRUE, out_value = 25, tau = 0.9, alpha = 0.25)

res
#> $Bias
#> [1]  3.47466190  0.05686266 -0.22769722
#> 
#> $MSE
#> [1] 104.9849391   0.6843969   0.2378147
#> 
#> $Variance
#> [1] 92.9116638  0.6811636  0.1859687
#> 
#> $Left
#> [1] 0.08385757 0.09691123 0.10080858
#> 
#> $Right
#> [1] 0.05255743 0.07546599 0.08511308

And with \(\alpha = 0.75\), we have;

res <- simSVD(trueSVD, svdfun = rSVDdpd_max, B = 100, seed = 2021, outlier = TRUE, out_value = 25, tau = 0.9, alpha = 0.75)

res
#> $Bias
#> [1]  0.1903417 -0.1843705 -0.2716241
#> 
#> $MSE
#> [1] 0.4822773 0.1357128 0.1671783
#> 
#> $Variance
#> [1] 0.44604734 0.10172035 0.09339867
#> 
#> $Left
#> [1] 0.01421574 0.03210005 0.04942867
#> 
#> $Right
#> [1] 0.006691835 0.020217527 0.029774606

As it can be seen, the bias and MSE are much lesser in rSVDdpd algorithm.

Introduction to rSVDdpd