2 The Weight Finder - An Advanced Profiler for Fortran Programs.- 2.1 Introduction.- 2.2 Prerequisite.- 2.3 The Weight Finder.- 2.3.1 Choosing sequential program parameters.- 2.3.2 Instrumentation.- 2.3.3 Optimization.- 2.3.4 Compile and Execute.- 2.3.5 Attribute and Visualize.- 2.4 Adaptation of Profile Data.- 2.4.1 Program transformations.- 2.4.2 Problem Size.- 2.5 Conclusion and Future Work.- 3 Predicting Execution Times of Sequential Scientific Kernels.- 3.1 Motivation.- 3.2 Deriving time formulae for code fragments.- 3.3 Obtaining a platform model.- 3.4 Examples.- 3.4.1 Fragment A.- 3.4.2 Fragment B.- 3.4.3 Fragment C.- 3.4.4 Fragment D.- 3.4.5 Fragment E.- 3.4.6 Fragment F.- 3.4.7 Summary of results.- 3.5 Discussion and Further Work.- 4 Isolating the Reasons for the Performance of Parallel Machines on Numerical Programs.- 4.1 Introduction.- 4.2 Micro Measurements.- 4.2.1 Micro Measurements for a Node Processor.- 4.2.2 Micro Measurements for Communication Networks.- 4.3 Measurements.- 4.3.1 Measurements of the Serial Kernels.- 4.3.2 Measurements of the Parallel Kernels.- 4.4 Algorithms.- 4.4.1 CG-method.- 4.4.2 PDE1-method.- 4.4.3 PDE2-method.- 4.5 Analysis of the Programs.- 4.5.1 Serial Versions.- 4.5.2 Parallel Versions.- 4.6 Conclusion.- 5 Targeting Transputer Systems, Past and Future.- 5.1 Introduction.- 5.2 The T800 family.- 5.3 The T9000 family.- 5.4 The Chameleon family.- 6 Adaptor: A Compilation System for Data Parallel Fortran Programs.- 6.1 Introduction.- 6.2 The Adaptor Compilation System.- 6.2.1 Properties of Adaptor.- 6.2.2 Overview of Adaptor.- 6.2.3 The Input Language.- 6.2.4 Programming Models for the Generated Programs.- 6.2.5 Interactive Source-to-Source Transformation.- 6.2.6 Realization of the Translation.- 6.2.7 Distributed Array Library.- 6.2.8 Visualization of the Run Time Behavior.- 6.2.9 Availability.- 6.2.10 Related Work.- 6.3 Results of Benchmark Codes.- 6.3.1 The Purdue Set.- 6.3.2 Comparison of Sequential and Parallel Version.- 
6.3.3 Efficiency and Scalability.- 6.3.4 Adaptor vs. hand-coded message passing programs.- 6.3.5 Full vs. Loosely Synchronous Execution.- 6.4 Results of Application Codes.- 6.4.1 HYDFLO: a CM Fortran Code for Fluid Dynamics.- 6.4.2 ESM: a Fortran 90 Code for Circulation.- 6.4.3 IFS: a Fortran 77 Code for Weather Prediction.- 6.5 Summary.- 7 SNAP! Prototyping a Sequential and Numerical Application Parallelizer.- 7.1 Introduction.- 7.2 Compiler.- 7.2.1 Front-End for FORTRAN.- 7.2.2 Dependence Analysis.- 7.2.3 Alignment analysis.- 7.2.4 Parallelizer.- 7.2.5 Code generation.- 7.3 Conclusions.- 8 Knowledge-Based Automatic Parallelization by Pattern Recognition.- 8.1 Introduction and Overview.- 8.2 Preprocessing the Source Code.- 8.3 Which Patterns are Supported?.- 8.4 Pattern Recognition: A Detailed View.- 8.4.1 Program Representation.- 8.4.2 Pattern Hierarchy Graph.- 8.4.3 The Matching Algorithm.- 8.4.4 Standard Pattern Matching: A simple example.- 8.4.5 Removing redundant IF statements.- 8.4.6 Loop Rerolling.- 8.4.7 Difference Stars.- 8.4.8 Beyond standard matching: Identification of multigrid hierarchies.- 8.5 A Parallel Algorithm for each Pattern.- 8.6 Alignment and Partitioning.- 8.7 Determining Cost Functions: Estimating and Benchmarking.- 8.8 Implementation and Future Extensions.- 8.9 Conclusions.- 9 Automatic Data Layout for Distributed-Memory Machines in the D Programming Environment.- 9.1 Introduction.- 9.2 Compilation system.- 9.3 Dynamic Data Layout: Two Examples.- 9.4 Towards Dynamic Data Layout.- 9.4.1 Alignment Analysis.- 9.4.2 Distribution Analysis.- 9.4.3 Inter-Phase Decomposition Analysis.- 9.5 Related Work.- 9.6 Summary and Future Work.- 10 Subspace Optimizations.- 10.1 Introduction.- 10.1.1 Data Optimization.- 10.1.2 Shapes.- 10.2 Subspaces.- 10.3 Subspace Changes.- 10.3.1 Scalars.- 10.3.2 Control Expressions.- 10.3.3 Array Sections.- 10.3.4 Explicit Dimensions.- 10.3.5 Reductions.- 10.4 Subspace Optimizations.- 10.4.1 Relative Costs.-
10.4.2 Subspace Minimization.- 10.4.3 Subspace Minimization with other Types of Expansion.- 10.4.4 Combining Multiple Expansions.- 10.4.5 Expansion Strength Reduction.- 10.4.6 Expansion Costs.- 10.4.7 Reducing the Computation within Expansions.- 10.5 Subspaces Optimization Compared to Alignment.- 10.6 Summary.- 10.7 Acknowledgments.- 11 Data and Process Alignment in Modula-2*.- 11.1 Introduction.- 11.2 Modula-2*.- 11.2.1 FORALL statement.- 11.2.2 Allocation of array data.- 11.3 Alignment in Modula-2*.- 11.3.1 Data Alignment.- 11.3.2 Process Alignment.- 11.4 Arrangement Graphs and Conflicts.- 11.4.1 Type and Structure.- 11.4.2 Conflicts.- 11.5 Cost Considerations.- 11.6 Example.- 11.7 Conclusion.- 12 Automatic Parallelization for Distributed Memory Multiprocessors.- 12.1 Introduction.- 12.2 Related Work.- 12.3 Overview.- 12.4 Parallelization Strategy.- 12.5 Branch-and-Bound Algorithm.- 12.5.1 Basic Approach.- 12.5.2 Distribution Graph.- 12.5.3 Redistribution during Program Execution.- 12.6 Performance Estimator.- 12.6.1 Transfer costs.- 12.6.2 Combining the transfer costs.- 12.6.3 Data Transfer Graph.- 12.7 Prototype Implementation and Results.- 12.7.1 Implementation.- 12.7.2 Livermore Loops.- 12.7.3 Gauss-Seidel Relaxation.- 12.7.4 Jacobi Relaxation.- 12.8 Conclusions and Further Research.- 12.9 Acknowledgements.- A Trademarks.
Topical subject (common noun or common noun phrase)
Uncontrolled subject term
Coding.
Uncontrolled subject term
Parallel processing (Electronic computers)
Library of Congress Classification
Class number
QA76.58
Book number
C475
1994
Personal name as main entry (primary intellectual responsibility)