Tensor Decompositions

The 2021 SIAM ALA conference took place May 24-28 virtually. Nick Vannieuwenhoven and Grey Ballard organized a three-part minisymposium on improving efficiency and accuracy of algorithms for computing various tensor decompositions. The conference website is here. The speakers have agreed to share their slides on this webpage.

**Abstract.**
There has been much recent progress in improving efficiency and accuracy of algorithms
for computing tensor decompositions, including canonical polyadic decomposition, Tucker,
Tensor Train, Tensor Networks, and data fusion models. The goal of this minisymposium is
to highlight these algorithmic advances and bring together a diverse set of approaches
with a variety of decompositions to establish connections across fields and identify new
techniques.

- Grey Ballard (Wake Forest University)
- Nick Vannieuwenhoven (Katholieke Universiteit Leuven)

**Abstract.**
We discuss the a randomized variant of the matricized-tensor-times-Khari-Rao-product (MTTKRP) tensor kernel and its theoretical properties.

**Abstract.**
We describe a simple, black-box compression format for tensors with a multiscale structure. By representing the tensor as a sum of compressed tensors defined on increasingly coarse grids, we capture low-rank structures on each grid-scale, and we show how this leads to an increase in compression for a fixed accuracy. We devise an alternating algorithm to represent a given tensor in the multiresolution format and prove local convergence guarantees. In two dimensions, we provide examples that show that this approach can beat the Eckart-Young theorem, and for dimensions higher than two, we achieve higher compression than the tensor-train format on six real-world datasets. We also provide results on the closedness and stability of the tensor format and discuss how to perform common linear algebra operations on the level of the compressed tensors.

**Abstract.**
We describe two sets of approximated algorithms for accelerating alternating least squares (ALS) for tensor decompositions. First, we introduce a sketched ALS algorithm for Tucker decomposition, which solves a sequence of sketched rank-constrained linear least-squares subproblems. This new ALS algorithm, combined with a new initialization scheme based on randomized range finder, yields up to 22.0% relative decomposition residual improvement compared to the state-of-the-art sketched randomized algorithm for Tucker decomposition of various synthetic datasets. Second, we introduce a novel family of algorithms, called pairwise perturbation, that uses perturbative corrections to the subproblems rather than recomputing the tensor contractions. Results show improvements of up to 3.1X of pairwise perturbation with respect to state-of-the-art alternating least squares approaches for various model tensor problems and real datasets.

**Abstract.**
This talk introduces a number of recent advances on the computational front of large-scale low-rank tensor decomposition with structured latent factors (or structured tensor factorization). Structured tensor factorization is well-motivated for a number of reasons, e.g., establishing model uniqueness and enhancing factor interpretability. Nonetheless, from an optimization viewpoint, the mathematical programming problems associated with structured factorization are challenging. These problems are nonconvex (and often nonsmooth)—and in many cases they even do not admit ‘nice’ properties such as the gradient-Lipschitz continuity. Applying general-purpose computational tools may not be able to attain satisfactory performance in terms of speed and accuracy, especially in the era of data deluge. In recent years, a series of algorithms have been proposed for the problem of interest. These methods integrate the ideas from large-scale optimization for machine learning, classic computational linear algebra, and tensor multilinear algebra to come up with effective and scalable algorithms. In this talk, we offer a unified nonconvex optimization perspective for a number of recently developed representative structured tensor decomposition algorithms and discuss their properties, e.g., computational complexity, memory cost, and convergence characterizations.

**Abstract.**
We characterize the sensitivity of the output in low-rank matrix approximation with respect to perturbations in the input. The characterization is in terms of a condition number. We show that the condition number for low-rank approximation depends on the singular value gap of the input matrix. This complements prior results by Hackbusch and by Drineas and Ipsen on the sensitivity of low-rank matrix approximation. In addition, we characterize the sensitivity of approximating an incomplete matrix by a low-rank matrix. Incomplete means that some entries of the input matrix are not known or not accessible. We give an algorithm for computing the associated condition number and demonstrate in experiments how the rate of missing data affects it.

**Abstract.**
The canonical polyadic decomposition (CPD) of a low rank tensor plays a major role in data analysis and signal processing by allowing for unique recovery of underlying factors. However, it is well known a tensor may fail to have a best rank low rank CPD approximation. In the case a best rank low rank approximation does exist, an important question is how to compute this approximation.

We view the CPD model as a “joint generalized eigenvalue" problem. Roughly speaking, a low rank CPD can be viewed as a diagonalization in this sense, and, mirroring the classical case, a tensor is guaranteed to be diagonalizable (i.e. have low rank) if the tensor has distinct joint generalized eigenvalues. Thus existence can be understood in terms of perturbation theory for the joint generalized eigenvalue problem. Using this approach we provide computable deterministic bounds which guarantee that a given tensor has a best low rank approximation over either the real or complex field.

In addition, our approach suggests a natural extension of the classical generalized eigenvalue decomposition for algebraic computation of a CPD to a “generalized eigenspace decomposition". While the classical method uses a single pencil to compute a CPD and suffers from pencil based instability, our method uses multiple pencils from the tensor and performs only stable computations in each pencil. The proposed method significantly outperforms the classical method in terms of factor matrix error.

**Abstract.**
An increasing amount of data collected are high-dimensional and efficient learning algorithms must utilize the tensorial structure as much as possible. The ever-present curse of dimensionality for high dimensional data and the loss of structure when vectorizing the data motivates the use of tailored low-rank tensor methods. We propose an algorithm of computing a Canonical Polyadic (CP) decomposition by avoiding the NP-hard issue of finding the best CP rank by computing first a Tensor Train (TT) decomposition and call it TT-CP factorization. Along with it, we define a nonlinear classification machine learning model. We build a full Gradient Descent Primal (GDP) optimization problem which takes initialized variables from the partial GDP model optimized via Support Tensor Machines (STM). In turn, the full GDP enhances a potential suboptimal CP decomposition computed in the first step. This leads to better classification accuracy and a reliable deterministic algorithm for computing the nonlinear boundary, each step of which admits a reasonable explanation. Hence, the full GDP can thus be seen as both a tensor decomposition method tailored to the classification problem, and a classification method that exploits the low-rank model of the data. With numerical examples, we show that this approach benchmarked than other state-of-the-art techniques.

**Abstract.**
Computing good low rank approximations of high-order tensors is an important problem in numerical optimization, known to be both theoretically and practically challenging in general. Block-coordinate descent methods constitute a large part of the literature on this problem. Speeding up these methods by means of parallel computing, sketching/randomization or extrapolation has been an intensive research topic of the last decade.

In this work, we provide a framework to potentially speed-up any block-coordinate descent algorithm for computing constrained approximate CPD. We show that we may accelerate any block-coordinate scheme. We propose to extrapolate the various block estimates at each outer loop (by opposition to using extrapolation when solving each block individually) with restart. The extrapolation technique is inspired from [Accelerating nonnegative matrix factorization algorithms using extrapolation, A. M. S. Ang and N. Gillis, 2019], already shown to accelerate block-coordinate algorithms for nonnegative matrix factorization.

We show on a large set of synthetic data and real-world data that, empirically, the proposed approach accelerates any block-coordinate descent algorithm we could think of, in particular when computing nonnegative CPD. However the proposed framework is a heuristic without convergence guaranties. We will also discuss the various links between the proposed heuristic work and recent/older extrapolated block-coordinate descent algorithms.

**Abstract.**
We present efficient and scalable parallel algorithms for performing mathematical operations for low-rank tensors represented in the tensor train (TT) format. We consider algorithms for addition, elementwise multiplication, computing norms and inner products, orthogonalization, and rounding (rank truncation). These are the kernel operations for applications such as iterative Krylov solvers that exploit the TT structure. The parallel algorithms are designed for distributed-memory computation, and we use a data distribution and strategy that parallelizes computations for individual cores within the TT format. We analyze the computation and communication costs of the proposed algorithms to show their scalability, and we present numerical experiments that demonstrate their efficiency on both shared-memory and distributed-memory parallel systems. For example, we observe better single-core performance than the existing MATLAB TT-Toolbox in rounding a 2GB TT tensor, and our implementation achieves a 34× speedup using all 40 cores of a single node. We also show nearly linear parallel scaling on larger TT tensors up to over 10,000 cores for all mathematical operations.

**Abstract.**
To reduce complexity and facilitate the treatment of large-scale data sets, multiway data can be approximately represented by a low-rank tensor decomposition. However, the search for a low-rank representation for a given tensor may be challenging due to measurement noise present in the data. In this talk, we approach the low-rank tensor approximation problem from a Bayesian perspective. This leads to a probabilistic interpretation of the well-known iterative algorithm alternating linear scheme (ALS). In this way, the assumptions on the measurement noise are considered and the uncertainty of each tensor decomposition component is quantified. Furthermore, prior knowledge can be explicitly taken into account and the resulting low-rank tensor comes with a measure of uncertainty.

**Abstract.**
The Tensor Train (TT) format is a highly compact low-rank representation for high-dimensional tensors. TT is useful in particular in representing approximations to the solution of certain types of parametrized partial differential equations. The fundamental operation used to maintain feasible memory and computational time is called rounding, which truncates the internal ranks of a tensor already in TT format. We propose several randomized algorithms for this task that are generalizations of randomized low-rank matrix approximation algorithms and provide significant reduction in computation compared to deterministic TT rounding algorithms. Randomization is particularly effective in the case of rounding a sum of TT tensors, which is the bottleneck computation in the adaptation of GMRES to vectors in TT format. In this talk, we will present the randomized algorithms and compare their empirical accuracy and computational time with deterministic alternatives.

**Abstract.**
Many applications in integral equations, spatial statistics, and machine learning yield dense matrices whose entries model pairwise interactions (using a problem-specific kernel) between a set of points in two or three spatial dimensions. When the number of points are large, these matrices are challenging or even prohibitively expensive to work with. We can efficiently approximate these dense matrices using rank-structured matrices that compress and store certain off-diagonal blocks in a low-rank format. When compressing the off-diagonal blocks to obtain this format, the rank-r SVD gives the optimal rank-r approximation, but it is computationally expensive. Fast low-rank algorithms have been developed, but these methods can still be expensive with regards to computational cost and storage. In this presentation, we show new, tensor-based approaches for computing low-rank matrix representations that are less expensive computationally than standard methods. We will discuss the details of the method as well as numerical results supporting our algorithm.