# GREY BALLARD

The software provides parallel algorithms for performing arithmetic
on tensors in tensor train (TT) format. The arithmetic operations include addition,
elementwise multiplication, and norm computation, which are performed in way that exploits
and maintains the TT structure of the tensors. The truncation of TT ranks is computed via
a rounding routine that produces an approximation satisfying a specified error tolerance. It is written in C and MPI and relies on BLAS and LAPACK.

PLANC solves the nonnegative matrix factorization problem:
approximating a matrix as a product of two matrices whose entries are nonnegative.
The code is written in C++ and uses OpenMP for shared-memory parallelism and MPI for
distributed-memory parallelism. It incorporates multiple alternating-updating techniques,
including multiplicative updates, hierarchical alternating least squares, and block
principal pivoting. It uses Armadillo to interface with BLAS and LAPACK libraries.

TuckerMPI computes the Tucker decomposition of dense tensors using
the Sequentially Truncated Higher-Order Singular Value Decomposition algorithm. It is
designed for use in distributed memory but can also be used on a single node. The code
is written in C++ with MPI. It also relies on BLAS and LAPACK libraries.

This software contains Matlab and C implementations for computing
only the largest entries of a matrix-matrix product. The C implementations rely on a
mex interface to MATLAB and use subroutines from the CSparse library. The main algorithm
is randomized, based on wedge or diamond sampling, but it can also compute the largest
entries deterministically (with reduced memory footprint).

This software contains implementations of fast matrix multiplication
algorithms (those that require fewer than the classical O(n^3) arithmetic operations)
for sequential and shared-memory parallel environments. The code is written in C++ and
can incorporate any fast algorithm that can be expressed by three matrices U, V, and W.
It relies on BLAS's DGEMM for the implementation of the classical algorithm.

This code is used to help solve the symmetric eigenvalue problem
by using similarity transformations to reduce a symmetric band matrix to symmetric
tridiagonal form (maintaining the eigenvalues). It is written in C and uses OpenMP
for shared-memory parallelization. It also includes MATLAB code for visualization of
the algorithm.

### Other Software

I've also made minor contributions to the following libraries: