GPU Accelerator, Part 3
The GPU Accelerator capability in ANSYS 13.0 can also be used with the PCG iterative solver. However, one will usually see a much greater performance boost when using GPU Accelerator with the sparse direct solver. One reason is that running GPU Accelerator with the PCG solver may be constrained by the problem size, since the matrix must fit into the graphics card’s memory.
The PCG solver always runs in memory, unlike the sparse direct solver, which has in-core and out-of-core modes. With the GPU Accelerator capability at 13.0, the MSAVE,ON command cannot presently be used, so more memory is required to keep the assembled global stiffness matrix. This matrix must be small enough to fit in memory on the graphics card; otherwise, one will get the warning message below, indicating that GPU Accelerator, though requested, has been skipped:
*** WARNING *** The assembled matrix size (3683.8 MB) exceeds the physical memory available on the GPU accelerator device (2014.4 MB). The GPU accelerator device option is disabled.
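The check behind this warning amounts to comparing the assembled matrix size against the memory available on the device. Below is a minimal Python sketch of that comparison, using the two figures from the warning above. The function name `fits_on_gpu` is hypothetical and not part of ANSYS; the solver's actual check may also account for workspace that is not modeled here.

```python
def fits_on_gpu(matrix_mb: float, gpu_available_mb: float) -> bool:
    """Return True if the assembled matrix fits in the available GPU memory.

    Hypothetical helper for illustration only; ANSYS performs its own
    check internally and may reserve additional memory.
    """
    return matrix_mb <= gpu_available_mb


# Figures taken from the warning message above.
matrix_mb = 3683.8
gpu_available_mb = 2014.4

if fits_on_gpu(matrix_mb, gpu_available_mb):
    print("Matrix fits; GPU Accelerator can be used")
else:
    shortfall = matrix_mb - gpu_available_mb
    print(f"Matrix exceeds GPU memory by {shortfall:.1f} MB; GPU Accelerator is skipped")
```

With these numbers the matrix is roughly 1.8 times larger than the memory reported as available, so the model would have to shrink considerably before GPU Accelerator could be used on this card.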
The Tesla C2050 has 3 GB of memory while the Tesla C2070 has 6 GB of memory. Consequently, if one uses the Tesla C2050 or any other GPU that has 3 GB of memory or less, the size of the problem may be limited. For the medium-sized problems that do fit, which already run quite fast on 2 cores, including GPU Acceleration may only make the solution run 20-30% faster.
As a point of comparison, the author’s graphics card only has 2 GB of memory. An 880k DOF structural model was solved that fit in the 2 GB of memory, but with two cores, the solver time was only reduced from 33 seconds to 27 seconds. The overall time was reduced by the same 6 seconds, as expected, from 63 to 57 seconds. Hence, a model that already runs quite fast won’t see much benefit with GPU Accelerator.
The JCG solver also supports GPU Acceleration. The JCG solver is an option used for thermal analyses, and its preconditioner is much simpler than that of the PCG solver. Consequently, one can solve a 2 million DOF thermal model with GPU Acceleration on a graphics card with 2 GB of memory. With 2 cores, the solver time with and without GPU Acceleration was 130 vs. 260 seconds, so we see a factor of 2 difference in this case.
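To put the PCG and JCG timings quoted above on the same footing, they can be converted to speedup factors. The short Python sketch below does that arithmetic; the helper name `speedup` is introduced here for illustration only, and the inputs are the times in seconds reported in the text.

```python
def speedup(without_gpu_s: float, with_gpu_s: float) -> float:
    """Ratio of run time without GPU Accelerator to run time with it."""
    return without_gpu_s / with_gpu_s


# Times in seconds, as quoted in the text (2 cores in each case).
pcg_solver = speedup(33.0, 27.0)    # 880k DOF structural model, solver time
pcg_overall = speedup(63.0, 57.0)   # same model, overall time
jcg_solver = speedup(260.0, 130.0)  # 2 million DOF thermal model, solver time

print(f"PCG solver speedup:  {pcg_solver:.2f}x")
print(f"PCG overall speedup: {pcg_overall:.2f}x")
print(f"JCG solver speedup:  {jcg_solver:.2f}x")
```

The PCG solver speedup of about 1.2x shrinks to about 1.1x overall because the non-solver portion of the run is unchanged, while the JCG case reaches the factor of 2 noted above.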
The GPU Accelerator capability can be used for iterative solvers, not just direct solvers. However, unlike with direct solvers, the entire matrix must fit in the graphics card’s memory, so unless one is using the Tesla C2070 GPU with 6 GB of memory, one may be limited in the problem size that can be used with GPU Accelerator.