Structured Matrix Factorization

One application of this work is segmenting videos of neural calcium imaging data into temporal signals and spatial segmentations.

Matrix factorization models are widespread in machine learning and encompass many classical and fundamental algorithms, with Principal Component Analysis (PCA), Non-Negative Matrix Factorization (NMF), and Sparse Dictionary Learning (SDL) being a few well-known examples. In the most basic form, matrix factorization simply means that given some data matrix $Y$ we wish to find an approximation (or exact factorization) $Y \approx U V^\top$. More generally, we might also wish to perform linear regression or make some other prediction about $Y$ via a predictor $Y \approx f(UV^\top,X)$ that depends on some additional data $X$. Such problems¹ can be captured via an optimization problem of the form:

$$\min_{U,V} \ell(Y, f(UV^\top, X)) + \lambda \Theta(U,V)$$

where $\ell$ is some loss function which measures the distance between the prediction $\widehat Y = f(UV^\top, X)$ and $Y$, and $\Theta$ is some function which promotes various properties or constraints on $(U,V)$. A key challenge in matrix factorization problems is that the above optimization problem is inherently non-convex, largely regardless of the choice of $\ell$, $f$, or $\Theta$, because the matrix product $UV^\top$ destroys convexity.
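
As a rough illustration, the following is a minimal sketch of this objective, assuming a squared loss, the identity predictor $f(UV^\top, X) = UV^\top$, and a squared Frobenius-norm regularizer for $\Theta$ (illustrative choices, not the specific ones used in the referenced papers), optimized by plain gradient descent on $(U,V)$:

```python
import numpy as np

# Minimal sketch: squared loss, f(U V^T, X) = U V^T, and
# Theta(U, V) = ||U||_F^2 + ||V||_F^2.  These are illustrative choices.

def objective(Y, U, V, lam):
    residual = Y - U @ V.T
    return 0.5 * np.sum(residual ** 2) + lam * (np.sum(U ** 2) + np.sum(V ** 2))

def gradient_step(Y, U, V, lam, step=5e-3):
    # Gradients of the objective w.r.t. U and V.
    residual = U @ V.T - Y
    grad_U = residual @ V + 2 * lam * U
    grad_V = residual.T @ U + 2 * lam * V
    return U - step * grad_U, V - step * grad_V

rng = np.random.default_rng(0)
Y = rng.standard_normal((50, 40))
U, V = rng.standard_normal((50, 5)), rng.standard_normal((40, 5))
for _ in range(1000):
    U, V = gradient_step(Y, U, V, lam=0.1)
print(objective(Y, U, V, lam=0.1))
```

Because the problem is non-convex in $(U,V)$, a local method like this carries no a priori guarantee of reaching a global minimizer, which is exactly the difficulty at issue here.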

For some problems, such as PCA, one is fortunate and the resulting non-convex problem can still be solved globally, but for most other matrix factorization models the optimization is much more challenging.
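
For instance, in the plain low-rank approximation case (squared loss, no regularizer), the Eckart–Young theorem says a global optimum of the non-convex problem is given by the truncated SVD; a quick numerical check of this fact:

```python
import numpy as np

# For  min_{U, V} ||Y - U V^T||_F^2  with U, V having r columns,
# a global optimum is given by the truncated SVD (Eckart-Young),
# despite the non-convexity of the objective in (U, V).

rng = np.random.default_rng(0)
Y = rng.standard_normal((30, 20))
r = 3

Uf, s, Vft = np.linalg.svd(Y, full_matrices=False)
U = Uf[:, :r] * s[:r]      # absorb the singular values into U
V = Vft[:r, :].T
best_error = np.sum((Y - U @ V.T) ** 2)

# The optimal value equals the sum of the squared discarded singular values.
print(np.isclose(best_error, np.sum(s[r:] ** 2)))   # True
```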

Structured Factorization

Beyond the inherent non-convexity, in many matrix factorization problems we don't just want a generic factorization $UV^\top$; we also want $U$ and $V$ to satisfy various properties. For example, in NMF $U$ and $V$ are constrained to be non-negative, while in SDL one of the factors is required to be sparse (i.e., most of its entries are 0). Such requirements are usually encoded through an appropriate choice of the $\Theta$ function.
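
As a hedged sketch, two possible ways to encode these kinds of structure through $\Theta$ (illustrative formulations, not necessarily the exact ones used in the referenced papers) could look like:

```python
import numpy as np

# Illustrative choices of Theta.
#   NMF-style: hard non-negativity constraints, written as an indicator function.
#   SDL-style: an l1 penalty to promote sparsity in one factor, with a
#              Frobenius-norm penalty keeping the other factor bounded.

def Theta_nmf(U, V):
    # 0 if both factors are non-negative, +inf otherwise.
    return 0.0 if (U >= 0).all() and (V >= 0).all() else np.inf

def Theta_sdl(U, V, gamma=1.0):
    # Bound the dictionary U and encourage sparse codes V.
    return np.sum(U ** 2) + gamma * np.sum(np.abs(V))
```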

In this work we study the optimization problems associated with structured matrix factorization by linking them to a related convex optimization problem in the product space (i.e., over $Z = UV^\top$). Specifically, we consider the case where $\Theta$ can be written as a sum of a function $\theta$ applied to the columns of $(U,V)$,

$$ \Theta(U,V) = \sum_{i=1}^r \theta(U_i,V_i) $$

where $(U_i, V_i)$ denotes the $i^\text{th}$ columns of $U$ and $V$, and $r$ is the number of columns in $U$ and $V$. If $\theta$ satisfies a few basic properties, then it is possible to define a convex function (convex w.r.t. $Z$)

$$ \Omega_{\theta}(Z) \equiv \min_{r,U,V} \ \sum_{i=1}^r \theta(U_i,V_i) \ \ \text{s.t.} \ \ Z = UV^\top $$

which, given a matrix $Z$, searches over all possible factorizations $Z = UV^\top$ for the one that minimizes the sum of $\theta$ terms on the columns of $(U,V)$. From this, one can then define an optimization problem on $Z$ which is tightly linked with our original problem of interest on $(U,V)$:

$$ \min_{Z} \ell(Y, f(Z, X)) + \lambda \Omega_{\theta}(Z) $$
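
A classical special case makes this concrete: if $\theta(u,v) = \|u\|_2 \|v\|_2$, then $\Omega_{\theta}$ is the nuclear norm $\|Z\|_*$, and the minimum in the definition of $\Omega_{\theta}$ is attained by a factorization built from the SVD. The sketch below (assuming this particular choice of $\theta$) numerically checks that the SVD-based factorization attains $\|Z\|_*$ while another factorization of the same $Z$ does not do better:

```python
import numpy as np

# Special case theta(u, v) = ||u||_2 * ||v||_2, for which Omega_theta(Z) = ||Z||_*.

def column_cost(U, V):
    # sum_i ||U_i||_2 * ||V_i||_2 for one particular factorization Z = U V^T.
    return np.sum(np.linalg.norm(U, axis=0) * np.linalg.norm(V, axis=0))

rng = np.random.default_rng(0)
Z = rng.standard_normal((15, 10))

# Factorization built from the SVD, splitting each singular value evenly.
Usvd, s, Vt = np.linalg.svd(Z, full_matrices=False)
U_opt = Usvd * np.sqrt(s)
V_opt = Vt.T * np.sqrt(s)

# A different factorization of the same Z, obtained via an invertible mixing matrix.
r = len(s)
A = rng.standard_normal((r, r)) + 5 * np.eye(r)
U_alt = U_opt @ A
V_alt = V_opt @ np.linalg.inv(A).T

nuclear = np.sum(s)
print(np.allclose(U_alt @ V_alt.T, Z))                   # same Z
print(np.isclose(column_cost(U_opt, V_opt), nuclear))    # attains ||Z||_*
print(column_cost(U_alt, V_alt) >= nuclear - 1e-9)       # never smaller
```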

Further, because $\Omega_{\theta}(Z)$ is convex w.r.t. $Z$, if $\ell(Y, f(Z,X))$ is also convex w.r.t. $Z$ (as is often the case), then the overall problem above is convex w.r.t. $Z$ as well. By the definition of $\Omega_{\theta}$, this objective is also tightly linked to our non-convex objective of interest w.r.t. $(U,V)$. As a result, we can use tools from convex analysis to study the convex objective (w.r.t. $Z$) and, through the link provided by $\Omega_{\theta}$, provide theoretical guarantees for the non-convex problem over $(U,V)$.
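
To illustrate, here is a minimal sketch of solving the convex problem in $Z$ by proximal gradient descent, assuming $f(Z,X) = Z$, a squared loss, and $\theta(u,v) = \|u\|_2\|v\|_2$, so that $\Omega_{\theta}(Z) = \|Z\|_*$ and its proximal operator is singular value thresholding; these modeling choices are illustrative rather than the specific ones used in the referenced papers:

```python
import numpy as np

def svt(Z, tau):
    # Proximal operator of tau * ||.||_*: soft-threshold the singular values.
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def solve_convex(Y, lam, step=1.0, iters=100):
    # Proximal gradient descent on  0.5 * ||Y - Z||_F^2 + lam * ||Z||_*.
    Z = np.zeros_like(Y)
    for _ in range(iters):
        grad = Z - Y                       # gradient of the smooth loss term
        Z = svt(Z - step * grad, step * lam)
    return Z

rng = np.random.default_rng(0)
Y = rng.standard_normal((40, 30))
Z_hat = solve_convex(Y, lam=8.0)
print(np.linalg.matrix_rank(Z_hat))        # rank reduced by the nuclear-norm penalty
```

Singular value thresholding is particular to the nuclear-norm case; other choices of $\theta$ lead to different (and sometimes less tractable) proximal operators for $\Omega_{\theta}$.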

In some situations this approach can also be generalized to factorizations involving more than two matrices (e.g., $Z = R U V^\top$, where we optimize over $R$, $U$, and $V$), which arises in applications such as separable dictionary learning, where we can provide polynomial-time guarantees for solving the non-convex objective.

References

Haeffele and Vidal. “Structured low-rank matrix factorization: Global optimality, algorithms, and applications.” IEEE PAMI. (2019)

Schwab, Haeffele, Vidal, and Charon. “Global optimality in separable dictionary learning with applications to the analysis of diffusion MRI.” SIAM Journal on Imaging Sciences. (2019)

Haeffele, Young, and Vidal. “Structured Low-Rank Matrix Factorization: Optimality, Algorithm, and Applications to Image Processing.” ICML (2014)


  1. Note this can be generalized even further if there are additional variables $Q$ which are ‘unfactorized’. For example, in Robust PCA there is a matrix of sparse outliers. This can also be handled fairly simply if the overall objective is jointly convex w.r.t. $(Q,Z)$.
