In this thesis, we have proposed efficient implementations of linear algebra kernels such as matrix-vector and matrix-matrix multiplication by formulating the arithmetic in terms of matrix diagonals. The resulting computational scheme is orientation-neutral: it does not depend on whether the matrix is stored in column-major or row-major layout. Matrix elements are accessed with stride 1, and no indirect referencing is involved. Operating on the transposed matrix requires no additional effort. The proposed storage scheme handles dense matrices and matrices with special structure, such as banded and symmetric matrices, in a uniform manner. Results from numerical experiments with an OpenMP implementation are promising. We have also shown that, within our diagonal framework, Java native arrays can yield superior computational performance, and we have presented two alternative implementations of matrix-matrix multiplication in Java. The results of the numerical tests demonstrate the advantage of the proposed methods.
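
As an illustration of the diagonal formulation summarized above, the following is a minimal sketch of a diagonal-wise matrix-vector product in Java. It is not the storage scheme or code developed in the thesis: the class name `DiagonalMatVec`, the methods `toDiagonals` and `multiply`, and the layout (one contiguous `double[]` per diagonal, indexed by the offset k = j - i) are assumptions made for this example, and only square dense matrices are covered. The sketch shows that each diagonal, the input vector, and the output vector are all traversed with stride 1 and without index arrays, and that the transposed product reuses the same diagonals with the roles of the row and column offsets exchanged.

```java
import java.util.Arrays;

public class DiagonalMatVec {

    // Pack a dense n-by-n matrix into its 2n-1 diagonals (illustrative layout).
    // diags[k + n - 1] holds diagonal k, i.e. the elements a[i][i + k],
    // ordered by increasing row index i.
    public static double[][] toDiagonals(double[][] a) {
        int n = a.length;
        double[][] diags = new double[2 * n - 1][];
        for (int k = -(n - 1); k <= n - 1; k++) {
            int len = n - Math.abs(k);
            int rowStart = Math.max(0, -k); // first row touched by diagonal k
            double[] d = new double[len];
            for (int t = 0; t < len; t++) {
                int i = rowStart + t;
                d[t] = a[i][i + k];
            }
            diags[k + n - 1] = d;
        }
        return diags;
    }

    // Computes y = A x (transpose == false) or y = A^T x (transpose == true).
    // The inner loop reads d, x and y with stride 1 and uses no index arrays.
    public static double[] multiply(double[][] diags, double[] x, boolean transpose) {
        int n = x.length;
        double[] y = new double[n];
        for (int k = -(n - 1); k <= n - 1; k++) {
            double[] d = diags[k + n - 1];
            int rowStart = Math.max(0, -k);
            int colStart = Math.max(0, k);
            // For A x, element a[i][j] contributes to y[i] using x[j];
            // for A^T x, the same element contributes to y[j] using x[i].
            int src = transpose ? rowStart : colStart;
            int dst = transpose ? colStart : rowStart;
            for (int t = 0; t < d.length; t++) {
                y[dst + t] += d[t] * x[src + t];
            }
        }
        return y;
    }

    public static void main(String[] args) {
        double[][] a = {{1, 2, 3}, {4, 5, 6}, {7, 8, 9}};
        double[] x = {1, 0, -1};
        double[][] diags = toDiagonals(a);
        System.out.println(Arrays.toString(multiply(diags, x, false)));
        System.out.println(Arrays.toString(multiply(diags, x, true)));
    }
}
```

In this sketch, a banded matrix would simply leave the diagonals outside the band unallocated and skip them in the outer loop; that extension is likewise an assumption about how such a layout could be specialized, not a description of the thesis implementation.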