Structured Matrix Factorization
One application of this work is segmenting videos of neural calcium imaging data into temporal signals and spatial segmentations.
Matrix factorization models are widespread in machine learning and encompass many classical and fundamental algorithms, with Principal Component Analysis (PCA), Non-Negative Matrix Factorization (NMF), and Sparse Dictionary Learning (SDL) being a few well-known examples. In their most basic form, matrix factorization models simply mean that given some data matrix $Y$ we wish to find an approximation (or exact factorization) $Y \approx U V^\top$. More generally, we might also wish to perform linear regression or make some other prediction about $Y$ via a predictor $Y \approx f(UV^\top, X)$ that depends on some additional data $X$. These problems¹ can be captured via an optimization problem of the form:
$$\min_{U,V} \ell(Y, f(UV^\top, X)) + \lambda \Theta(U,V)$$
where $\ell$ is some loss function which measures the distance between the prediction $\widehat Y = f(UV^\top, X)$ and $Y$, and $\Theta$ is some function which promotes various properties or constraints in $(U,V)$. A key challenge in matrix factorization problems is that the above optimization problem is inherently non-convex, largely regardless of the choice of $\ell$, $f$, or $\Theta$, because the matrix product $UV^\top$ destroys convexity.
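As a minimal illustrative sketch (not the method proposed in this work), consider the special case $f(UV^\top, X) = UV^\top$ with squared loss and $\Theta(U,V) = \tfrac{1}{2}(\|U\|_F^2 + \|V\|_F^2)$; alternating gradient steps on $U$ and $V$ then look like the following. All names and hyperparameters here are illustrative.

```python
import numpy as np

# Sketch of the non-convex factorization problem for the special case
#   min_{U,V} 0.5 * ||Y - U V^T||_F^2 + (lam/2) * (||U||_F^2 + ||V||_F^2)
# solved by alternating gradient steps. Illustrative only.

def factorize(Y, r, lam=0.1, lr=1e-2, iters=2000, seed=0):
    rng = np.random.default_rng(seed)
    m, n = Y.shape
    U = 0.1 * rng.standard_normal((m, r))
    V = 0.1 * rng.standard_normal((n, r))
    for _ in range(iters):
        R = U @ V.T - Y                 # residual of the current approximation
        U -= lr * (R @ V + lam * U)     # gradient step in U
        R = U @ V.T - Y                 # recompute residual with updated U
        V -= lr * (R.T @ U + lam * V)   # gradient step in V
    return U, V
```

Even in this simple quadratic case the objective is non-convex in $(U,V)$, which motivates the convex analysis developed below.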
For some problems, such as PCA, one is fortunate and the resulting non-convex optimization problem can still be solved efficiently, but for most other matrix factorization models the optimization problem is challenging to solve.
Structured Factorization
Beyond the inherent non-convexity, in many matrix factorization problems we do not want just a generic factorization $UV^\top$; we also want $U$ and $V$ to satisfy various properties. For example, in NMF $U$ and $V$ are constrained to be non-negative, while in SDL we require one of the factors to be sparse (i.e., most of its entries are 0). These requirements are usually accomplished through an appropriate choice of the $\Theta$ function, as in the sketch below.
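Illustrative column-wise penalties of this kind might look like the following; these are standard textbook examples, not necessarily the exact choices analyzed in this work.

```python
import numpy as np

def theta_nmf(u, v):
    # NMF-style: non-negativity as a hard constraint,
    # encoded as an infinite penalty when violated.
    return 0.0 if (u >= 0).all() and (v >= 0).all() else np.inf

def theta_sdl(u, v):
    # SDL-flavored: bound the dictionary atom u in l2 norm while
    # promoting sparsity in the coefficients v via the l1 norm.
    return np.linalg.norm(u, 2) * np.linalg.norm(v, 1)
```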
In this work we study the optimization problems associated with structured matrix factorization by linking them to a related convex optimization problem in the product space (i.e., $Z = UV^\top$). Specifically, we consider the case where $\Theta$ can be defined as a sum of a function that acts on the columns of $(U,V)$,
$$ \Theta(U,V) = \sum_{i=1}^r \theta(U_i,V_i) $$
where $(U_i, V_i)$ denotes the $i^\text{th}$ columns of $(U,V)$ and $r$ is the number of columns in $(U,V)$. If $\theta$ satisfies a few basic properties, then it is possible to define a convex function (convex w.r.t. $Z$)
$$ \Omega_{\theta}(Z) \equiv \min_{r,U,V} \sum_{i=1}^r \theta(U_i,V_i) \quad \text{s.t.} \quad Z = UV^\top $$
which, given a matrix $Z$, searches over all possible factorizations $Z = UV^\top$ for the one that minimizes the sum of $\theta$ terms on the columns of $(U,V)$.
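As a concrete, well-known special case (a standard fact about variational forms of the nuclear norm, not a claim about this work's specific choices), taking $\theta(U_i,V_i) = \|U_i\|_2 \|V_i\|_2$ makes $\Omega_\theta(Z)$ equal to the nuclear norm $\|Z\|_*$, the sum of the singular values of $Z$. A quick NumPy check that a balanced SVD factorization achieves this value:

```python
import numpy as np

# For theta(u, v) = ||u||_2 * ||v||_2, Omega_theta(Z) = ||Z||_* (nuclear norm).
# A balanced factorization built from the SVD achieves the minimum.

rng = np.random.default_rng(0)
Z = rng.standard_normal((6, 4))

A, s, Bt = np.linalg.svd(Z, full_matrices=False)
U = A * np.sqrt(s)      # U_i = sqrt(s_i) * a_i
V = Bt.T * np.sqrt(s)   # V_i = sqrt(s_i) * b_i

nuclear_norm = s.sum()
column_cost = sum(np.linalg.norm(U[:, i]) * np.linalg.norm(V[:, i])
                  for i in range(len(s)))

assert np.allclose(U @ V.T, Z)                # valid factorization Z = U V^T
assert np.isclose(column_cost, nuclear_norm)  # attains the nuclear norm
```

The check works because taking $U_i = \sqrt{s_i}\, a_i$ and $V_i = \sqrt{s_i}\, b_i$ from the SVD $Z = \sum_i s_i a_i b_i^\top$ gives $\|U_i\|_2 \|V_i\|_2 = s_i$ for each column.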
From this, one can then define an optimization problem on $Z$, which is tightly linked with our original problem of interest on $(U,V)$:
$$ \min_{Z} \ell(Y, f(Z, X)) + \lambda \Omega_{\theta}(Z) $$
Further, because $\Omega_{\theta}(Z)$ is convex w.r.t. $Z$, if $\ell(Y, f(Z,X))$ is also convex w.r.t. $Z$ (as is often the case), then the overall problem above is convex w.r.t. $Z$ as well. From the definition of $\Omega_{\theta}$, this objective is also tightly linked to our non-convex objective of interest w.r.t. $(U,V)$. As a result, we can use tools from convex analysis to study the convex objective (w.r.t. $Z$) and, using the link provided by $\Omega_{\theta}$, provide theoretical guarantees about the non-convex problem over $(U,V)$.
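For intuition about the convex problem, consider the special case $f(Z,X) = Z$ with squared loss and $\Omega_\theta(Z) = \|Z\|_*$ (the nuclear-norm case above): the solution is then given in closed form by singular value thresholding, a standard result in convex optimization rather than anything specific to this work. A minimal sketch:

```python
import numpy as np

# Closed-form solution of min_Z 0.5 * ||Y - Z||_F^2 + lam * ||Z||_*
# via singular value thresholding (SVT).

def svt(Y, lam):
    A, s, Bt = np.linalg.svd(Y, full_matrices=False)
    s_thr = np.maximum(s - lam, 0.0)   # shrink singular values toward zero
    return (A * s_thr) @ Bt

Y = np.random.default_rng(1).standard_normal((8, 5))
Z = svt(Y, lam=0.5)
print(np.linalg.matrix_rank(Z))  # typically lower rank than Y
```

For general $\theta$, however, $\Omega_\theta$ typically cannot be evaluated in closed form, which is one reason the link back to the factorized problem over $(U,V)$ is useful.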
This whole approach can also be generalized in some situations to factorizations involving more than two matrices (e.g., $Z = R U V^\top$, where we optimize over $R$, $U$, and $V$), which arises in applications like separable dictionary learning, where we can provide polynomial-time guarantees for solving the non-convex objective.
References
1. Note this can be generalized even further if there are additional variables $Q$ which are 'unfactorized'. For example, in Robust PCA there is a matrix of sparse outliers. This can also be handled fairly simply if the overall objective is jointly convex w.r.t. $(Q,Z)$.