Numerical Solutions of Heat and Mass Transfer with the First Kind Boundary and Initial Conditions in Capillary Porous Cylinder Using Programmable Graphics Hardware

Heat and mass transfer simulation has become increasingly important in various engineering fields. Such simulation is needed to analyze how heat and mass are transferred in a thermal environment. However, obtaining numerical solutions to heat and mass transfer equations is very time-consuming. Therefore, in this paper, an acceleration technique developed in the graphics community that exploits the graphics processing unit (GPU) is applied to the numerical solution of heat and mass transfer equations. The nVidia Compute Unified Device Architecture (CUDA) programming model provides a straightforward means of describing inherently parallel computations. This paper improves the performance of numerically solving heat and mass transfer equations over a capillary porous cylinder with boundary and initial conditions of the first kind by running the computation on a GPU. The heat and mass transfer simulation is implemented using the CUDA platform on an nVidia Quadro FX 4800. Our experimental results show that the GPU can perform the heat and mass transfer simulation accurately and can significantly accelerate it, with a maximum observed speedup of about 10 times. The GPU is therefore a good approach for accelerating heat and mass transfer simulation.
Keywords-General: Numerical Solution; Heat and Mass Transfer; General Purpose Graphics Processing Unit; CUDA


I. INTRODUCTION
During the last half century, many scientists and engineers working on heat and mass transfer processes have put considerable effort into finding solutions analytically, numerically, and experimentally. To analyze the physical behavior of a heat and mass environment precisely, it is important to simulate heat and mass transfer phenomena such as heat conduction, convection, and radiation. Such simulations are accomplished by using parallel computing resources. With the help of computers, sequential solutions were found first; later, when high-end computers became available, fast solutions to heat and mass transfer problems were obtained. However, heat and mass transfer simulation requires much more computing resources than other simulations. Acceleration is therefore essential for practical heat and mass transfer simulation with large data sizes. This paper uses the parallel computing power of GPUs to speed up heat and mass transfer simulation. GPUs are very efficient in terms of theoretical peak floating-point operation rates [1].
Compared with a supercomputer, the GPU is a powerful co-processor on a common PC that is ready to simulate large-scale heat and mass transfer with fewer resources. For highly parallel, computation-intensive workloads, the GPU has several advantages over CPU architectures, including higher bandwidth and higher floating-point throughput. The GPU can be an attractive alternative to clusters or supercomputers in high-performance computing.
CUDA [2], developed by nVidia, provides both a programming model and a memory model. CUDA is a parallel, C-like programming language and application programming interface (API) that bypasses the rendering interface and avoids the difficulties of classic GPGPU programming. Parallel computations are expressed as general-purpose, C-like kernels operating in parallel over all the points in an application. This paper develops numerical solutions to Two-point Initial-Boundary Value Problems (TIBVP) of heat and mass transfer with boundary and initial conditions of the first kind in a capillary porous cylinder.
These problems arise in drying processes, space science, absorption of nutrients, transpiration cooling of space vehicles at the re-entry phase, and many other scientific and engineering applications. Although some traditional parallel processing approaches to some of these problems have been investigated, no one seems to have explored high-performance computing solutions to heat and mass transfer problems using the compact multiprocessing capabilities of the GPU, which integrates multiple processors on a chip. With the advantages of this compact technology, we developed algorithms to solve the TIBVP with boundary and initial conditions of the first kind and compared the results with some existing solutions to the same problems. All of our experimental results show significant performance speedups. The maximum observed speedup is about 10 times.
www.ijacsa.thesai.org
The rest of the paper is organized as follows: Section II briefly introduces some closely related work; Section III describes the basics of the GPU and CUDA; Section IV presents the mathematical model of heat and mass transfer and the numerical solutions to the heat and mass transfer equations; Section V presents our experimental results; and Section VI concludes the paper and gives some possible directions for future work.

II. RELATED WORK
The simulation of heat and mass transfer has been a very active topic for many years, and there is a large body of related work, such as fluid and air flow simulation. We refer here only to some of the most recent work close to this field.
The Soviet Union was at the forefront of exploring coupled heat and mass transfer in porous media, and major advances were made at the Heat and Mass Transfer Institute in Minsk, BSSR. Later, England and India took the lead and made further contributions to analytical and numerical solutions of certain problems. Narang [4][5][6][7][8][9] explored wavelet solutions to heat and mass transfer equations, and Ambethkar [10] explored numerical solutions to some of these problems.
Krüger et al. [11] solved basic linear algebra problems using the programmable fragment features of the GPU, and further solved 2D wave equations and the Navier-Stokes equations (NSEs) on the GPU. Bolz et al. [12] mapped sparse matrices to textures on the GPU and used the multigrid method to solve the fluid problem. In the meantime, Goodnight et al. [13] used the multigrid method to solve boundary value problems on the GPU. Harris [14,15] solved the PDEs of fluid dynamics to produce cloud animation.
Other researchers have also used the GPU to solve other kinds of PDEs. Kim et al. [16] solved crystal formation equations on the GPU. Lefohn et al. [17] mapped level-set isosurface data to a dynamic sparse texture format. Another creative use was to pack the information about the next active tiles into a vector message, which was used to control the vertices and texture coordinates that needed to be sent from the CPU to the GPU. More applications of general-purpose computation on GPUs can be found in [18].

III. AN OVERVIEW OF CUDA ARCHITECTURE
The GPU used in our implementation is nVidia's Quadro FX 4800, which is DirectX 10 compliant. The Quadro FX 4800 is one of nVidia's fastest processors that support the CUDA API. All CUDA-compatible devices support 32-bit integer processing. An important consideration for GPU performance is the level of occupancy. Occupancy refers to the number of threads available for execution at any one time. A high level of occupancy is normally desirable because it helps hide memory latency.
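As a rough illustration of the occupancy idea, the sketch below expresses occupancy in its commonly used ratio form: resident warps on one multiprocessor divided by the maximum number of warps it can hold. The function name `occupancy` and the default limits (1024 resident threads per multiprocessor, 32 threads per warp) are assumptions chosen for illustration and are not taken from the paper. Real occupancy also depends on register and shared memory usage, which this sketch ignores.

```python
def occupancy(threads_per_block, blocks_per_sm,
              max_threads_per_sm=1024, warp_size=32):
    """Fraction of one multiprocessor's warp slots that are in use (0.0 to 1.0).

    The default limits are illustrative assumptions, not values from the paper.
    """
    # A partially filled warp still occupies a whole warp slot.
    warps_per_block = -(-threads_per_block // warp_size)  # ceiling division
    max_warps = max_threads_per_sm // warp_size
    # The hardware cannot hold more warps than it has slots for.
    active_warps = min(warps_per_block * blocks_per_sm, max_warps)
    return active_warps / max_warps


print(occupancy(128, 2))  # 0.25
print(occupancy(256, 4))  # 1.0
```

With these assumed limits, two resident blocks of 128 threads fill a quarter of the warp slots, leaving fewer warps to switch to while others wait on memory.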
The GPU memory architecture is shown in figure 1.

A. Mathematical Model
Consider heat and mass transfer through a porous cylinder with boundary conditions of the first kind. Let the z-axis be directed upward along the cylinder and the r-axis along the radius of the cylinder. Let u and v be the velocity components along the z- and r-axes, respectively. Then the heat and mass transfer equations in the Boussinesq approximation are:
With a prescribed constant temperature and concentration supplied by the hot plate at the left end X = 0 of the cylinder, the initial and boundary conditions of the problem are:
From Equation (1) we observe that $v_1$ is independent of the space coordinates and may be taken as constant. We define the following non-dimensional variables and parameters.

B. Numerical Solutions
Here we seek a solution using an implicit finite difference technique, namely the Crank-Nicolson implicit finite difference method, which is always convergent and stable. This method has been used to solve Equations (9) and (10) subject to the conditions given by (11), (12), and (13). To obtain the difference equations, the region of heat flow is divided into a grid or mesh of lines parallel to the z and r axes. The finite difference approximations of Equations (9) and (10) are obtained by substituting Equation (14) into Equations (9) and (10) and multiplying both sides by $\Delta t$. After simplifying, we let $\Delta t / (\Delta z)^2 = r' = 1$ (the method is always stable and convergent); under this condition the above equations can be written as:
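To make the Crank-Nicolson idea concrete, the following is a minimal sketch for the one-dimensional model equation $u_t = u_{zz}$ with fixed (first-kind) boundary values and the mesh ratio $r = \Delta t / (\Delta z)^2 = 1$. This model equation is a simplified stand-in for the paper's Equations (9) and (10), which are not reproduced here. The function names `thomas` and `crank_nicolson_step` are hypothetical, and the Thomas algorithm is one common way to solve the resulting tridiagonal system, not necessarily the solver used by the authors.

```python
def thomas(a, b, c, d):
    """Solve a tridiagonal system (a: sub-, b: main, c: super-diagonal, d: right side)."""
    n = len(d)
    cp = [0.0] * n
    dp = [0.0] * n
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    # Forward elimination.
    for i in range(1, n):
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    # Back substitution.
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


def crank_nicolson_step(u, r=1.0):
    """Advance u by one time level; u[0] and u[-1] are fixed boundary values."""
    n = len(u) - 2  # number of interior nodes
    a = [-r / 2] * n
    b = [1 + r] * n
    c = [-r / 2] * n
    # Explicit half of the scheme, evaluated at the known time level.
    d = [r / 2 * u[i - 1] + (1 - r) * u[i] + r / 2 * u[i + 1]
         for i in range(1, n + 1)]
    # Implicit half: boundary values at the new time level are known.
    d[0] += r / 2 * u[0]
    d[-1] += r / 2 * u[-1]
    return [u[0]] + thomas(a, b, c, d) + [u[-1]]


# Hot end held at 1, far end held at 0, interior initially cold.
u = [1.0, 0.0, 0.0, 0.0, 0.0]
for _ in range(200):
    u = crank_nicolson_step(u)
print(u)
```

For this model problem the iterates settle toward the linear steady-state profile between the two boundary values.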

A. Experimental Setup and Device Configuration
The experiment was executed using the CUDA Runtime Library, a Quadro FX 4800 graphics card, and an Intel Core 2 Duo. The programming interface used was Visual Studio.
The experiments were performed on a 64-bit Lenovo ThinkStation D20 with an Intel Xeon E5520 CPU running at 2.27 GHz and 4.00 GB of physical RAM. The Graphics Processing Unit (GPU) used was an NVIDIA Quadro FX 4800 with the following specifications:
Device-to-Device Bandwidth: 57509.6 MB/s
In the experiments, we solved the heat and mass transfer differential equations in a capillary porous cylinder with boundary conditions of the first kind using numerical methods. Our main purpose was to obtain numerical solutions for the temperature T and concentration C distributions at various points in the cylinder as heat and mass are transferred from one end of the cylinder to the other. We compared the CPU and GPU results for similarity, and we also compared the CPU and GPU processing times.
In the experimental setup, we are given the initial temperature $T_0$ and concentration $C_0$ at the point z = 0 on the cylinder. In addition, a constant temperature and concentration $N_0$ is applied continuously to the surface of the cylinder. The temperature at the other end of the cylinder, where z = ∞, is assumed to be the ambient temperature (taken to be zero). Likewise, the concentration at the other end of the cylinder, where z = ∞, is assumed to be negligible (≈ 0). Our initial problem was to derive the temperature $T_1$ and concentration $C_1$ associated with the initial temperature and concentration, respectively. We did this using the finite difference technique. Hence, we obtained a total initial temperature of $(T_0 + T_1)$ and a total initial concentration of $(C_0 + C_1)$ at z = 0. These total initial conditions were then used to perform the calculations.
For the purpose of implementation, we assumed a fixed cylinder length and varied the number of nodal points N in the cylinder. Since N is inversely proportional to the step size ∆z, increasing N decreases ∆z, and therefore more accurate results are obtained with larger values of N. For ease of implementation in Visual Studio, we employed the Forward Euler Method (FEM) to advance the temperature and concentration distributions at each nodal point on both the CPU and the GPU. For a given array of size N, the nodal points are calculated iteratively until the values of temperature and concentration become stable.
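The forward (explicit) Euler update described above can be sketched as follows for the one-dimensional model equation $u_t = u_{zz}$ with fixed end values. This is an illustration under assumptions, not the authors' code: the function names `forward_euler_step` and `solve`, the mesh ratio `r = 0.25`, and the boundary values are hypothetical. For this model equation the explicit scheme is stable only when $r = \Delta t / (\Delta z)^2 \le 1/2$. Each new nodal value depends only on values from the previous time level, which is why one node can be assigned to one GPU thread.

```python
def forward_euler_step(u, r):
    """One explicit time step; the end values u[0] and u[-1] stay fixed."""
    new = u[:]
    for i in range(1, len(u) - 1):
        # Each node reads only old values, so all nodes could be updated in parallel.
        new[i] = u[i] + r * (u[i - 1] - 2 * u[i] + u[i + 1])
    return new


def solve(n_nodes, steps, r=0.25, left=1.0, right=0.0):
    """Iterate from a cold interior with first-kind (fixed value) boundaries."""
    u = [0.0] * n_nodes
    u[0], u[-1] = left, right
    for _ in range(steps):
        u = forward_euler_step(u, r)
    return u


profile = solve(n_nodes=5, steps=10)
print(profile)
```

In this sketch the value decays monotonically from the heated end toward the far end, and with enough steps it approaches the linear steady state.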
In this experiment, we performed the iteration for 10 time steps. After the tenth step, the values of the temperature and concentration became stable and were recorded. We ran the tests for several different values of N and ∆z, and the error between the GPU and CPU results became smaller as N increased. Finally, our results were normalized on both the GPU and the CPU.
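One simple way to quantify the agreement between two result arrays is the largest absolute difference between corresponding nodal values, as sketched below. The paper does not state how its results were normalized or how the error was measured, so both helpers are assumptions: `normalize` scales by the largest magnitude, and `max_abs_error` is a hypothetical name for the comparison.

```python
def normalize(values):
    """Scale values so the largest magnitude becomes 1 (an assumed convention)."""
    peak = max(abs(v) for v in values)
    return [v / peak for v in values] if peak else list(values)


def max_abs_error(a, b):
    """Largest absolute difference between two equally sized result arrays."""
    if len(a) != len(b):
        raise ValueError("result arrays must have the same length")
    return max(abs(x - y) for x, y in zip(a, b))


# Made-up numbers for illustration only; they are not results from the paper.
cpu_result = normalize([2.0, 1.0, 0.0])
gpu_result = normalize([2.0, 1.5, 0.0])
print(max_abs_error(cpu_result, gpu_result))  # 0.25
```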

B. Experimental Results
The normalized temperature and concentration distributions at various points in the cylinder are given in Table 1 and Table 2, respectively. We can immediately see that, at each point in the cylinder, the CPU and GPU results are similar. In addition, the temperature is highest and the concentration is lowest at the point on the cylinder where the heat and mass sources are constantly applied. As we move away from this point, the temperature decreases and the concentration increases. Near the designated end of the cylinder, the temperature approaches zero and the concentration approaches one. Furthermore, we evaluated the performance of the GPU (nVIDIA Quadro FX 4800) in solving the heat and mass transfer equations by comparing its execution time to that of the CPU (Intel Xeon E5520).
To measure the execution time, the same functions were implemented on both the device (GPU) and the host (CPU) to initialize the temperature and concentration and to compute the numerical solutions. We measured the processing time for different values of N. The graph in Figure 5 depicts the performance of the GPU versus the CPU in terms of processing time. We ran the test for N from 15 to 10000, and in general the GPU performed the calculations much faster than the CPU.

- When N < 2500, the CPU performed faster than the GPU.
- When N > 2500, the GPU performance began to increase considerably.
Figure 5 shows some of our experimental results. Finally, the accuracy of our numerical solution depended on the number of iterations performed in calculating each nodal point: more iterations give more accurate results. In our experiment, we observed that after 9 or 10 iterations the solution to the heat and mass equations at a given point became stable. For optimal performance, and to keep the number of iterations the same for both the CPU and the GPU, we used 10 iterations.
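The execution-time comparison described above amounts to timing the same computation on both processors and taking the ratio. The sketch below shows a host-side version of that measurement. The names `time_call` and `speedup` are hypothetical, and the workload is a placeholder rather than the paper's solver. When timing a real GPU kernel, the host must wait for the device to finish before stopping the clock, which this CPU-only sketch does not need to do.

```python
import time


def time_call(fn, *args, repeats=3):
    """Best wall-clock time in seconds over several runs of fn(*args)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - start)
    return best


def speedup(cpu_seconds, gpu_seconds):
    """Ratio above 1 means the GPU run was faster than the CPU run."""
    return cpu_seconds / gpu_seconds


# Placeholder workload standing in for the solver at a given problem size N.
elapsed = time_call(sum, range(10000))
print(elapsed >= 0.0)  # True
```

Taking the best of several runs reduces noise from other activity on the machine, which matters most for small N where run times are short.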

VI. CONCLUSION AND FUTURE WORK
We have presented our work on the numerical solution of the heat and mass transfer equations with boundary and initial conditions of the first kind using the finite difference method on GPGPUs. We implemented the numerical solutions using GPGPU computing with nVidia CUDA. We have demonstrated that the GPU can perform significantly faster than the CPU for massive calculations. Our experimental results indicate that our GPU-based implementation offers a significant speedup over the CPU-based implementation, with a maximum observed speedup of about 10 times.
There are several directions for future work. We would like to test our implementations on new-generation GPUs and explore the performance improvements they offer. It would also be interesting to run more experiments with larger data sets. Finally, further attempts will be made to explore more complicated problems, such as different boundary conditions and hollow cylinder geometry.

Fig. 1. GPU Memory Architecture [2]
Since the cylinder is assumed to be porous, $v_1$ is the velocity of the fluid, $T_p$ the temperature of the fluid near the cylinder, $T_\infty$ the temperature of the fluid far away from the cylinder, $C_p$ the concentration near the cylinder, $C_\infty$ the concentration far away from the cylinder, $g$ the acceleration due to gravity, $\beta$ the coefficient of volume expansion for heat transfer, $\beta'$ the coefficient of volume expansion for concentration, $\nu$ the kinematic viscosity, $\sigma$ the scalar electrical conductivity, $\omega$ the frequency of oscillation, and $k$ the thermal conductivity.
The mesh lines are parallel to the z and r axes. Solutions of the difference equations are obtained at the intersections of these mesh lines, called nodes. The values of the dependent variables T and C at the nodal points along the plane z = 0 are known from the boundary conditions. In Figure 2, $\Delta z$ and $\Delta r$ are the constant mesh sizes along the z and r directions, respectively. We need an algorithm that finds the values at the next time level in terms of known values at an earlier time level. A forward difference approximation is used for the first-order partial derivatives of T and C, and a central difference approximation is used for the second-order partial derivatives of T and C. On introducing finite difference approximations for:

Figure 3. Temperature distribution in the cylinder for four different normalized radii

TABLE I. COMPARISON OF GPU AND CPU RESULTS (TEMPERATURE)