Part 1 -- Basic Concepts and Skills
1. Introduction
2. Data-Parallel Programming
3. Data Parallel Execution Models
4. Memory Hierarchy
5. Performance Considerations
Part 2 -- Parallel Algorithm Patterns
6. Parallel Programming and Computational Thinking
7. Parallel Patterns -- Stencil Computation
8. Parallel Patterns -- Reduction Trees and Prefix Sum, with an introduction to work efficiency of parallel algorithms
9. Parallel Patterns -- Sorting, with an introduction to load balance considerations
10. Parallel Patterns -- Sparse Matrix-Vector Multiplication, with an introduction to data compression
11. Parallel Patterns -- Parallel Histogramming, with an introduction to atomic operations and privatization
12. Parallel Patterns -- Parallel Graph Algorithms, with an introduction to dynamic parallelism
13. Computational Neural Networks
14. Numerical Issues in Parallel Algorithms -- Floating-Point Considerations
15. Conclusion and Future Outlook
Appendix A. Introduction to OpenCL
Appendix B. Parallel Programming with OpenACC
Appendix C. A Productivity-Oriented Library for CUDA
Appendix D. CUDA FORTRAN
Appendix E. An Introduction to C++ AMP
Appendix F. Programming a Heterogeneous Cluster
Appendix G. New Features in Kepler
Appendix H. Matrix Multiplication Host-Only Version Source Code
Appendix I. GPU Compute Capabilities