GPU implementation of singular value decomposition for high rank tensors

Abstract

Programming through a Python API (application programming interface) offers some advantages over using compiled languages. Here we implement a high-rank tensor decomposition routine with the TensorFlow library, which natively supports multi-core CPU, GPU, and TPU hardware. Specifically, we performed a singular value decomposition (SVD) on a rank-5 tensor. We compared the performance of this Python implementation with that of a known C++ library written specifically for tensor manipulation but lacking native GPU support. We report use cases in which the implementation, running on a consumer-grade GPU, was empirically faster than the C++ library when the rank-5 tensor had more than 2 × 10^6 elements. Given the acceptable performance of this implementation, a native implementation of tensor network operations in TensorFlow may be beneficial.
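The abstract does not state how a matrix SVD is applied to a rank-5 tensor. A common approach, assumed here rather than taken from the paper, is to split the tensor's indices into a "row" group and a "column" group, reshape (matricize) the tensor into a matrix, take a thin SVD, and reshape the factors back into tensors. The sketch below does this with TensorFlow's `tf.linalg.svd`; the function name `tensor_svd` and its `n_row_axes` parameter are hypothetical illustrations, not the authors' code.

```python
import numpy as np
import tensorflow as tf


def tensor_svd(t, n_row_axes):
    """Thin SVD of a tensor across a bipartition of its axes (hypothetical helper).

    The first `n_row_axes` axes form the matrix rows; the remaining axes form
    the columns. Returns tensor-shaped factors:
      u:  row_dims + [k]
      s:  [k]  (sorted from largest to smallest)
      vh: [k] + col_dims
    where k = min(prod(row_dims), prod(col_dims)).
    """
    t = tf.convert_to_tensor(t)
    shape = t.shape.as_list()
    row_dims, col_dims = shape[:n_row_axes], shape[n_row_axes:]
    m, n = int(np.prod(row_dims)), int(np.prod(col_dims))
    k = min(m, n)

    # Matricize: group row axes and column axes into an (m, n) matrix.
    mat = tf.reshape(t, [m, n])

    # tf.linalg.svd returns (s, u, v) such that mat = u @ diag(s) @ adjoint(v).
    s, u, v = tf.linalg.svd(mat, full_matrices=False)

    # Restore the original index structure on each factor.
    u = tf.reshape(u, row_dims + [k])
    vh = tf.reshape(tf.linalg.adjoint(v), [k] + col_dims)
    return u, s, vh


if __name__ == "__main__":
    # A small rank-5 tensor; the row/column split (2 | 3 axes) is arbitrary.
    t = tf.random.normal((4, 4, 4, 4, 4), dtype=tf.float64, seed=0)
    u, s, vh = tensor_svd(t, n_row_axes=2)

    # Contract u * s with vh over the shared k index to rebuild the tensor.
    rec = tf.tensordot(u * s, vh, axes=1)
    print(u.shape, s.shape, vh.shape)
    print("max reconstruction error:", float(tf.reduce_max(tf.abs(rec - t))))
```

The choice of which axes go into the row group determines which bond the decomposition cuts; in tensor network algorithms one would typically also truncate `s`, `u`, and `vh` to the largest few singular values, which this sketch omits.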