Constrained Low-Rank
Matrix and Tensor Approximations

MS04 at SIAM Conference on Applied Linear Algebra 2018

The 2018 SIAM ALA conference took place May 4-8 at Hong Kong Baptist University in Hong Kong. We organized a two-part minisymposium on constrained low-rank approximations of matrices and tensors. The conference website is here. The speakers have agreed to share their slides on this webpage.

Abstract. Constrained low-rank matrix and tensor approximations are extremely useful in large-scale data analytics with applications across data mining, signal processing, statistics, and machine learning. Tensors are multidimensional arrays, or generalizations of matrices to more than two dimensions. The talks in this minisymposium will span various matrix and tensor decompositions and discuss applications and algorithms, as well as available software, with a particular focus on computing solutions that satisfy application-dependent constraints.

Organizers

Joint Nonnegative Matrix Factorization for Hybrid Clustering based on Content and Connection Structure

Haesun Park (Georgia Institute of Technology)

Abstract. A hybrid method called JointNMF is presented for latent information discovery from data sets that contain both text content and connection structure information. The method jointly optimizes an integrated objective function, which is a combination of the Nonnegative Matrix Factorization (NMF) objective function for handling text content and the Symmetric NMF (SymNMF) objective function for handling relation/connection information. An effective algorithm for the joint NMF objective function is proposed utilizing the block coordinate descent (BCD) framework. The proposed hybrid method simultaneously discovers content associations and related latent connections without any need for post-processing or additional clustering. It is shown that JointNMF can also be applied when the text content is associated with hypergraph edges. An additional capability is prediction of unknown connection information, which is illustrated using real-world problems such as citation recommendations of papers and leader detection in organizations. This is joint work with Rundong Du and Barry Drake.
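To make the combined objective concrete, here is a minimal numpy sketch of the kind of joint function described above: an NMF term for a content matrix X plus a SymNMF term for a connection matrix S, coupled through a shared factor H. The function name, the coupling weight alpha, and the exact weighting are our assumptions for illustration, not the authors' code.

```python
import numpy as np

def joint_nmf_objective(X, S, W, H, alpha=1.0):
    """Evaluate a JointNMF-style objective (illustrative sketch):
        ||X - W H||_F^2 + alpha * ||S - H^T H||_F^2
    X (m x n): text content, S (n x n): connections, W, H >= 0.
    """
    content_term = np.linalg.norm(X - W @ H, "fro") ** 2        # NMF part
    connection_term = np.linalg.norm(S - H.T @ H, "fro") ** 2   # SymNMF part
    return content_term + alpha * connection_term
```

A BCD scheme would alternate nonnegative updates of W and H against this single function; the key point is that the same H clusters both the content and the connections.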

Tensor Decompositions for Big Multi-Aspect Data Analytics

Vagelis Papalexakis (University of California Riverside)

Abstract. Tensors and tensor decompositions have been very popular and effective tools for analyzing multi-aspect data in a wide variety of fields, ranging from Psychology to Chemometrics, and from Signal Processing to Data Mining and Machine Learning. Using tensors in the era of big data poses the challenge of scalability and efficiency. In this talk, I will discuss recent techniques for tackling this challenge by parallelizing and speeding up tensor decompositions, especially for very sparse datasets (such as the ones encountered, for example, in online social network analysis). In addition to scalability, I will also touch upon the challenge of unsupervised quality assessment, where, in the absence of ground truth, we seek to automatically select the decomposition model that best captures the structure in our data. The talk will conclude with a discussion of future research directions and open problems in tensors for big data analytics.

Speeding Up Tensor Contractions Through Extended BLAS Kernels

Yang Shi (University of California Irvine)

Abstract. Tensor contractions constitute a key computational ingredient of numerical multi-linear algebra. However, as the order and dimension of tensors grow, the time and space complexities of tensor-based computations grow quickly. Existing approaches for tensor contractions typically involve explicit copy and transpose operations. In this work, we propose and evaluate a new BLAS-like primitive StridedBatchedGemm that is capable of performing a wide range of tensor contractions on CPU and GPU efficiently. Through systematic benchmarking, we demonstrate the advantages of our approach over conventional approaches. Concretely, we implement the Tucker decomposition and show that using our kernels yields a 100x speedup compared to an implementation using existing state-of-the-art libraries.
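The idea behind the batched-GEMM primitive can be illustrated in a few lines: a mode contraction such as C[m, n, p] = sum_k A[m, k, p] * B[k, n] is just one GEMM per slice p, applied to strided views with no copies or transposes. A real StridedBatchedGemm issues all slice-GEMMs in a single batched call; the Python loop below is only a readable stand-in for that batch, and all names here are ours.

```python
import numpy as np

def contract_batched(A, B):
    """Contract C[m, n, p] = sum_k A[m, k, p] * B[k, n] as batched GEMMs.
    Each slice A[:, :, p] is a strided view; no data is copied or transposed."""
    M, K, P = A.shape
    K2, N = B.shape
    assert K == K2
    C = np.empty((M, N, P))
    for p in range(P):              # one GEMM per batch entry
        C[:, :, p] = A[:, :, p] @ B
    return C
```

The result agrees with `np.einsum("mkp,kn->mnp", A, B)`; the benefit of the batched formulation is that a GPU or multicore BLAS can execute all P small GEMMs in one launch.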

SUSTain: Scalable Unsupervised Scoring for Tensors and Its Application to Phenotyping

Kimis Perros (Georgia Institute of Technology)

Abstract. We present a new method, which we call SUSTain, that extends real-valued matrix and tensor factorizations to data where the values are integers. Such data are common when the values correspond to event counts or ordinal measures. The conventional approach is to treat integer data as real, and then apply real-valued factorizations. However, doing so fails to preserve important characteristics of the original data, thereby making it hard to interpret the results. SUSTain outperforms several baselines on both synthetic and real Electronic Health Records (EHR) data, showing either a better fit or orders-of-magnitude speedups at a comparable fit. We apply SUSTain to EHR datasets to extract patient phenotypes (i.e., clinically meaningful patient clusters). Furthermore, 87% of the extracted phenotypes were validated as clinically meaningful phenotypes related to heart failure by a cardiologist.
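As a toy illustration of the integer-constrained setting (in the spirit of, but not identical to, SUSTain), one can alternate least-squares factor updates and project every factor entry onto an allowed integer range, e.g. an ordinal scale {0, ..., 4}. The projection rule, range, and update scheme below are our assumptions for a matrix case only:

```python
import numpy as np

def project_int(M, lo=0, hi=4):
    """Project entries onto the integer range {lo, ..., hi}."""
    return np.clip(np.rint(M), lo, hi)

def integer_factorize(X, rank, iters=20, seed=0):
    """Toy integer-constrained factorization X ~ W H with integer factors.
    Alternates unconstrained LS solves with integer projection; this is an
    illustrative heuristic, not the SUSTain algorithm."""
    rng = np.random.default_rng(seed)
    W = rng.integers(0, 5, size=(X.shape[0], rank)).astype(float)
    H = rng.integers(0, 5, size=(rank, X.shape[1])).astype(float)
    for _ in range(iters):
        W = project_int(X @ np.linalg.pinv(H))   # update W, keep it integral
        H = project_int(np.linalg.pinv(W) @ X)   # update H, keep it integral
    return W, H
```

Keeping factors on the same integer scale as the data is what makes the resulting components (e.g., phenotypes scored 0-4) directly interpretable.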

Accelerating the Tucker Decomposition with Compressed Sparse Tensors

George Karypis (University of Minnesota)

Abstract. The Tucker decomposition is a higher-order analogue of the singular value decomposition and is a popular method of performing analysis on multi-way data (i.e., tensors). Computing the Tucker decomposition of a sparse tensor is demanding in terms of both memory and computational resources. The primary kernel of the factorization is a chain of tensor-matrix multiplications (TTMc). State-of-the-art algorithms accelerate the underlying computations by trading off memory to memoize the intermediate results of TTMc in order to reuse them across iterations. We present an algorithm based on a compressed data structure for sparse tensors and show that many computational redundancies during TTMc can be identified and pruned without the memory overheads of memoization. In addition, our algorithm can further reduce the number of operations by exploiting an additional amount of user-specified memory. We evaluate our algorithm on a collection of real-world and synthetic datasets and demonstrate up to 20.7x speedup while using 28.5x less memory than the state-of-the-art parallel algorithm.
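For readers unfamiliar with the kernel being accelerated, here is a dense numpy sketch of a TTM chain (TTMc) for one mode of a 3-way tensor: the tensor is contracted against every factor matrix except the one being updated. Real Tucker codes operate on sparse tensors and it is exactly the intermediates of this chain that memoization trades memory to reuse; the function below is only illustrative and its names are ours.

```python
import numpy as np

def ttmc_mode0(X, U1, U2):
    """TTM chain for mode 0 of a 3-way tensor X (I0 x I1 x I2):
        Y = X x_1 U1 x_2 U2, contracting modes 1 and 2,
    with factor matrices U1 (I1 x R1) and U2 (I2 x R2)."""
    Y = np.tensordot(X, U1, axes=(1, 0))   # shape (I0, I2, R1)
    Y = np.tensordot(Y, U2, axes=(1, 0))   # shape (I0, R1, R2)
    return Y
```

In sparse Tucker, most of the cost and memory pressure lives in chains like this, one per mode per iteration, which is why pruning redundant partial contractions pays off.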

Efficient CP-ALS and Reconstruction from CP Form

Jed Duersch (Sandia National Laboratories)

Abstract. The Canonical Polyadic (CP) decomposition is an essential tool in the analysis of multi-way datasets. Just as the singular value decomposition (SVD) can be used to analyze and approximate matrices, the CP decomposition generalizes the SVD to multi-dimensional arrays. It can be used for data compression, for feature extraction, and to fill in missing entries. The alternating least-squares algorithm CP-ALS is the standard means of computing the CP decomposition. We present a high-performance reformulation of CP-ALS that substantially reduces memory requirements and improves performance on large dense tensors. This is accomplished by reformulating a critical kernel called the matricized-tensor times Khatri-Rao product (MTTKRP). This operation is restructured to avoid large intermediate Khatri-Rao products. We then exploit the restructured formulation to reuse partial computations as the algorithm alternates through factor matrix updates. This new approach runs up to 20 times faster. We apply the same technique to reconstruct the full tensor from the CP form. The new reconstruction mechanism also reduces both memory requirements and run time by an order of magnitude over the previous approach.
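The restructuring idea can be seen in a small dense example. A naive mode-0 MTTKRP materializes the full Khatri-Rao product; the reformulated version contracts the tensor against one factor at a time and never forms it. The two functions below are our own illustrative sketch (mode 0 of a 3-way tensor), not the talk's implementation:

```python
import numpy as np

def mttkrp_naive(X, B, C):
    """Mode-0 MTTKRP via an explicit Khatri-Rao product (memory-hungry)."""
    I0, I1, I2 = X.shape
    R = B.shape[1]
    # Khatri-Rao product of B (I1 x R) and C (I2 x R): shape (I1*I2, R)
    KR = np.einsum("jr,kr->jkr", B, C).reshape(I1 * I2, R)
    return X.reshape(I0, I1 * I2) @ KR

def mttkrp_fused(X, B, C):
    """Same result, contracting one mode at a time; the large
    Khatri-Rao intermediate is never materialized."""
    T = np.einsum("ijk,jr->ikr", X, B)     # partial contraction, (I0, I2, R)
    return np.einsum("ikr,kr->ir", T, C)   # finish with C, (I0, R)
```

The partial contraction T is also the kind of intermediate that can be reused across consecutive factor updates, which is where the further speedup comes from.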

Non-Negative Sparse Tensor Decomposition on Distributed Systems

Jiajia Li (Georgia Institute of Technology)

Abstract. Sparse tensor decompositions have emerged as a promising analysis technique in a variety of applications for real-world sparse data. This work focuses on modeling sparse count data, which is described by a Poisson distribution. Two distributed parallel sparse CANDECOMP/PARAFAC alternating Poisson regression (CP-APR) implementations are developed based on two sparse tensor formats: the coordinate (COO) format and our recently proposed hierarchical coordinate (HiCOO) format. We achieve up to 95x speedup on 320 IBM POWER8 cores over the sequential code.
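To give a feel for the Poisson modeling, here is the matrix analogue of the objective CP-APR targets: nonnegative factorization under a Poisson (KL-divergence) loss rather than least squares, solved with the classic multiplicative updates. This dense matrix sketch is our own simplification for illustration; CP-APR itself works on sparse tensors.

```python
import numpy as np

def poisson_nmf(X, rank, iters=200, seed=0, eps=1e-12):
    """Nonnegative factorization X ~ W H under a Poisson/KL loss,
    via multiplicative updates (matrix analogue of CP-APR's objective)."""
    rng = np.random.default_rng(seed)
    W = rng.random((X.shape[0], rank)) + 0.1
    H = rng.random((rank, X.shape[1])) + 0.1
    for _ in range(iters):
        # Multiplicative updates keep W, H nonnegative throughout.
        W *= ((X / (W @ H + eps)) @ H.T) / (H.sum(axis=1) + eps)
        H *= (W.T @ (X / (W @ H + eps))) / (W.sum(axis=0)[:, None] + eps)
    return W, H
```

For count data, fitting this likelihood instead of squared error is what justifies the "alternating Poisson regression" in CP-APR's name.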

Communication-Optimal Algorithms for CP Decompositions of Dense Tensors

Grey Ballard (Wake Forest University)

Abstract. The CP decomposition is a generalization of the matrix SVD to tensors, or multidimensional data arrays. We will present communication lower bounds and optimal algorithms for the fundamental computations used within algorithms for computing CP decompositions. We will consider both sequential and parallel memory models, and we will discuss the communication costs of existing algorithms for variations of the matricized-tensor times Khatri-Rao product (MTTKRP) computation.