As noted in the Mechanical APDL Help manual (Advanced Analysis Techniques Guide, Ch. 16, “GPU Accelerator Capability”), GPUs do not accelerate the entire solution; only the equation solver portion is off-loaded to the GPU. GPUs cannot process data in the same form as CPUs, so the data must first be prepared and transferred to the GPU. Because of this overhead, it does not make sense to keep sending data back and forth between the CPU and GPU for every phase of the analysis, since that could degrade performance rather than improve it.
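Because only the solver phase is accelerated, the whole-run gain is capped by how much of the run that phase occupies. The sketch below applies Amdahl's law, a general parallel-computing formula that is not taken from the Ansys documentation; the function name and the example numbers are hypothetical.

```python
def overall_speedup(solver_fraction: float, solver_speedup: float) -> float:
    """Amdahl's-law estimate of whole-run speedup when only one phase is accelerated.

    solver_fraction: share of the CPU-only wall time spent in the equation
        solver (0.0 to 1.0); a hypothetical input you would measure per model.
    solver_speedup: how many times faster that phase runs with the GPU.
    """
    if not 0.0 <= solver_fraction <= 1.0:
        raise ValueError("solver_fraction must be between 0 and 1")
    if solver_speedup <= 0.0:
        raise ValueError("solver_speedup must be positive")
    # Phases that stay on the CPU (matrix formulation, assembly, stress recovery, ...)
    unaccelerated = 1.0 - solver_fraction
    return 1.0 / (unaccelerated + solver_fraction / solver_speedup)


if __name__ == "__main__":
    # Hypothetical: the solver is 60% of a CPU-only run and the GPU makes it 4x faster.
    print(f"{overall_speedup(0.6, 4.0):.2f}x")
```

Even a large solver speedup yields a modest overall gain when the solver is only part of the run, which is why adding CPU-GPU transfers for the other phases would not pay off.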

When using the sparse direct solver, check the jobname.BCS file to see how much the GPU helps. For example, run an analysis with 2 cores, then run the same analysis with 2 cores plus a GPU. Most of the lines reporting solution time will be very similar between the two runs, except for the time (cpu & wall) for numeric factor. That entry gives the actual matrix factorization time, which is the part of the solution the GPU accelerates.
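To make the comparison of the two runs explicit, the sketch below takes phase timings that you copy by hand from the two jobname.BCS files and ranks the phases by how much they sped up. It does not parse the BCS file; the function name, the phase labels, and the timing values are all hypothetical.

```python
def compare_runs(baseline: dict[str, float], gpu_run: dict[str, float]) -> list[tuple[str, float]]:
    """Return (phase, baseline_time / gpu_time) pairs, largest ratio first.

    Both dicts map a phase label of your choosing to a wall time in seconds,
    copied manually from the CPU-only and CPU+GPU jobname.BCS files.
    """
    ratios: list[tuple[str, float]] = []
    for phase, base_time in baseline.items():
        gpu_time = gpu_run.get(phase)
        if gpu_time is None or gpu_time <= 0.0:
            continue  # phase missing from the GPU run, or not measurable
        ratios.append((phase, base_time / gpu_time))
    return sorted(ratios, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical wall times in seconds, not measured results.
    two_cores = {"numeric factor": 120.0, "other solver steps": 15.0}
    two_cores_gpu = {"numeric factor": 30.0, "other solver steps": 14.5}
    for phase, ratio in compare_runs(two_cores, two_cores_gpu):
        print(f"{phase}: {ratio:.2f}x")
```

If the GPU is doing its job, the numeric factor entry should head the list while the other phases stay close to 1x.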

When comparing 4 cores with GPU Accelerator against 2 cores with GPU Accelerator, you may notice that the factorization time does not change significantly, because the same GPU performs the factorization in both cases. However, with 4 cores, the other phases of the solution (e.g., matrix formulation and assembly, stress recovery) run on more cores, so 4 cores with GPU Accelerator should still be faster than 2 cores with GPU Accelerator for medium- and large-sized models. (The same reasoning explains why GPU Accelerator can make a 1-core run at least 3-4x faster: the factorization is done on tens or hundreds of GPU cores instead of on a single CPU core.)
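The core-count effect can be illustrated with a toy wall-time model: the CPU-side phases shrink as cores are added, while the GPU factorization time stays fixed because the same GPU is used. The perfect CPU scaling is an optimistic, hypothetical assumption, and the function name and timings are illustrative only.

```python
def estimated_wall_time(cpu_phases_1core_s: float, gpu_factor_s: float, cores: int) -> float:
    """Toy estimate of wall time for an N-core + GPU run.

    Hypothetical assumptions, for illustration only:
    - the CPU-side phases (matrix formulation, assembly, stress recovery, ...)
      take cpu_phases_1core_s seconds on one core and scale perfectly with cores;
    - the factorization runs on the same GPU regardless of core count,
      so it takes gpu_factor_s seconds in every case.
    """
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return cpu_phases_1core_s / cores + gpu_factor_s


if __name__ == "__main__":
    for cores in (1, 2, 4):
        print(f"{cores} cores + GPU: {estimated_wall_time(400.0, 50.0, cores):.0f} s")
```

In this model the factorization term is identical for 2 and 4 cores, yet the total still drops, matching the behavior described above. Real scaling of the CPU phases will be less than perfect.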

For small problems, GPU Accelerator may show no performance benefit, or even a slight performance loss. An individual GPU core usually has much less processing power than a CPU core, but a GPU has many more cores than a CPU. For small models, therefore, using the GPU may be no faster than the CPU, or even slower (recall that the data must also be packaged and sent to the GPU). For large models, however, distributing the equation solution across tens or hundreds of GPU cores can speed it up considerably. When benchmarking, consider the number of degrees of freedom (DOF) in the model: a model with fewer than about 250k DOF may be too small to show the benefit of GPU Accelerator.
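The small-versus-large trade-off can be sketched with a toy model in which the GPU has a higher aggregate throughput but pays a fixed cost to package and transfer data. The rates, overhead, and work units below are hypothetical and do not describe how Mechanical APDL actually schedules work; only the 250k DOF rule of thumb comes from this article.

```python
RULE_OF_THUMB_DOF = 250_000  # guideline from this article, not a hard limit


def gpu_worth_benchmarking(dof: int, threshold: int = RULE_OF_THUMB_DOF) -> bool:
    """True if the model is at or above the rule-of-thumb size for a GPU benefit."""
    return dof >= threshold


def toy_factor_times(work: float, cpu_rate: float, gpu_rate: float, overhead_s: float) -> tuple[float, float]:
    """Return (cpu_seconds, gpu_seconds) for `work` units of factorization.

    Hypothetical model: the GPU's many slower cores are lumped into one aggregate
    throughput gpu_rate, and every GPU run pays a fixed overhead_s for packaging
    and transferring data.
    """
    cpu_seconds = work / cpu_rate
    gpu_seconds = overhead_s + work / gpu_rate
    return cpu_seconds, gpu_seconds


def break_even_work(cpu_rate: float, gpu_rate: float, overhead_s: float) -> float:
    """Work at which both paths take equal time in the toy model."""
    if gpu_rate <= cpu_rate:
        raise ValueError("the GPU never catches up unless gpu_rate > cpu_rate")
    return overhead_s / (1.0 / cpu_rate - 1.0 / gpu_rate)


if __name__ == "__main__":
    for work in (4.0, 400.0):  # a "small" and a "large" problem, arbitrary units
        cpu_s, gpu_s = toy_factor_times(work, cpu_rate=1.0, gpu_rate=4.0, overhead_s=5.0)
        print(f"work={work}: CPU {cpu_s:.1f} s, GPU {gpu_s:.1f} s")
    print("break-even work:", break_even_work(1.0, 4.0, 5.0))
    print("benchmark GPU for a 100k-DOF model?", gpu_worth_benchmarking(100_000))
```

Below the break-even point the fixed overhead dominates and the GPU path is slower; above it the higher throughput wins, which is the pattern the DOF guideline is meant to capture.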