Wednesday, July 20, 2016 • 4:30pm - 5:00pm
AD: Efficient Primitives for Standard Tensor Linear Algebra


This paper presents the design and implementation of a low-level library for computing general sums and products over multi-dimensional arrays (tensors). Using only three low-level functions, the API both generalizes core BLAS levels 1-3 and eliminates the need for most tensor transpositions. Despite their relatively low operation count, we show that these transposition steps can become performance-limiting in typical use cases of BLAS on tensors. The implementation of the present API achieves peak performance on the same order of magnitude (teraflops) as vendor-optimized GEMM by using a code generator to emit CUDA source code for all computational kernels. The outline of these kernels is a multi-dimensional generalization of the MAGMA BLAS matrix multiplication on GPUs. Separate transposition steps can be skipped because every kernel allows arbitrary multi-dimensional transpositions of its arguments. The library, including its methodology and programming techniques, is made available in SLACK. Future improvements to the library include a high-level interface that translates directly from a \LaTeX{}-like equation syntax to a data-parallel computation.
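The idea of folding transpositions into the contraction itself, rather than running a separate transposition kernel first, can be illustrated with a small sketch. This is not the library's actual API (which the abstract does not spell out); it uses NumPy's `einsum` purely to show how arbitrary index permutations of each argument can be expressed inside a single general contraction call:

```python
import numpy as np

# Hypothetical illustration: compute C[m,n] = sum_k A[k,m] * B[n,k].
# Both operands are stored in "transposed" layouts, yet the contraction
# is done in one call -- no explicit transpose step is materialized.
A = np.arange(6.0).reshape(3, 2)   # shape (k=3, m=2)
B = np.arange(12.0).reshape(4, 3)  # shape (n=4, k=3)

# The index string encodes the per-argument transpositions, analogous to
# kernels that accept arbitrary multi-dimensional transpositions.
C = np.einsum("km,nk->mn", A, B)

# Reference path: transpose first, then a standard GEMM.
C_ref = A.T @ B.T
assert np.allclose(C, C_ref)
```

The reference path allocates intermediate transposed copies (`A.T @ B.T` via explicit layouts would), which is exactly the memory-bound overhead the abstract argues can dominate in typical tensor use of BLAS.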


Chopin Ballroom
