
Parallel Control-volume Method Based on Compact Local Integrated RBFs for the Solution of Fluid Flow Problems


N. Pham-Sy, C.-D. Tran, N. Mai-Duy and T. Tran-Cong

1 Introduction

The Integrated Radial Basis Function (IRBF) method was proposed by Mai-Duy and Tran-Cong (2001) as an alternative to the RBF interpolation of scattered data by Kansa (1990), which is here referred to as the differential RBF (DRBF) method. Thanks to the integration approach, the IRBF method has been shown to have a higher order of accuracy than the DRBF method. The convergence rate was then boosted by using local/compact local schemes to form tridiagonal system matrices [Hoang-Trieu, Mai-Duy, and Tran-Cong (2012); Chandhini and Sanyasiraju (2007)].

The IRBF method has been further developed in combination with the control-volume (CV) method [Patankar (1980)] to improve the accuracy of the solution in non-rectangular domains [Mai-Duy and Tran-Cong (2010)]. With the inherent conservation of mass, momentum and energy over control volumes, this method has been shown to be a very effective way to deal with domains with complex boundaries. In this paper, the CV method is employed in combination with two-dimensional (2D) IRBF to simulate fluid flow in the triangular cavity problem [Kohno and Bathe (2006)].

Since the scale of practical engineering problems is huge in terms of degrees of freedom, modern computational mechanics has begun to embrace parallel paradigms. With the help of parallelisation, the shortage of memory and computational power is being addressed. What remains challenging is the design of parallel algorithms, which is the main focus of this paper. The method being considered is a domain decomposition (DD) method, which is one of the most popular methods for solving large-scale problems [Quarteroni and Valli (1999)]. The DD method splits the computational domain into smaller sub-domains which are solved separately. Originally, sub-domains were solved sequentially, one after another. In order to improve the throughput of the simulation, in this work the sub-domains are solved in parallel. The DD method being used is the Schwarz additive overlapping DD method, and the communication between parallel sub-domains is handled by the Matlab-supported Message Passing Interface (MPI).

The paper is organised as follows. In Sections 2 and 3, a brief review of the Compact Local IRBF (CLIRBF) and CV methods is presented. The DD method is described in Section 4. Numerical results are then given and discussed in Section 5, with a conclusion in Section 6.

2 Local methods based on Integrated Radial Basis Function

In this section, a brief review of several IRBF local approaches, including one-dimensional (1D) IRBF and two-dimensional (2D) local 9-point IRBF stencils, is provided.

2.1 1D-IRBF method

Consider the Poisson partial differential equation (PDE) of a simple 2D problem as follows:

$$\nabla^2 u(\mathbf{x}) = f(\mathbf{x}), \quad \mathbf{x} \in \Omega, \quad (1)$$

where $u$ is the field variable; $\mathbf{x}$ the position vector; $\Omega$ the considered domain and $f$ a known function of $\mathbf{x}$.

By means of IRBF, the highest-order derivatives of the PDE, second order in this case, are approximated by a weighted set of RBFs. In one dimension,

$$\frac{\partial^2 u(x)}{\partial x^2} = \sum_{i=1}^{n} w_i G_i(x), \quad (2)$$

where the $G_i$ are the RBFs and the $w_i$ the network weights. The first derivative and the function itself are then recovered by analytic integration,

$$\frac{\partial u(x)}{\partial x} = \sum_{i=1}^{n} w_i H_i(x) + c_1, \quad (3)$$

$$u(x) = \sum_{i=1}^{n} w_i \bar{H}_i(x) + c_1 x + c_2, \quad (4)$$

where $H_i(x) = \int G_i(x)\,dx$, $\bar{H}_i(x) = \int H_i(x)\,dx$, and $c_1$, $c_2$ are constants of integration, which provide extra degrees of freedom for the imposition of boundary conditions.
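The 1D-IRBF construction can be sketched numerically. The following Matlab fragment (Matlab being the implementation language of the present work) is a minimal sketch assuming a multiquadric basis, a uniform grid and a simple test function; the shape parameter and the minimum-norm solve are illustrative choices, not the paper's exact formulation.

```matlab
% 1D-IRBF sketch: express d2u/dx2 as a sum of multiquadric (MQ) RBFs
% and recover du/dx and u by analytic integration (Eqs. (2)-(4)).
n = 21;                                 % number of collocation points
x = linspace(0, 1, n)';                 % uniform grid (assumed)
a = 2*(x(2) - x(1));                    % MQ shape parameter (assumed)
u = sin(pi*x);                          % test function

G    = @(r) sqrt(r.^2 + a^2);                          % MQ basis
H    = @(r) r.*sqrt(r.^2 + a^2)/2 + a^2/2*asinh(r/a);  % int G dr
Hbar = @(r) (r.^2 + a^2).^(3/2)/6 + ...
       a^2/2*(r.*asinh(r/a) - sqrt(r.^2 + a^2));       % int H dr

R = bsxfun(@minus, x, x');              % pairwise x_i - x_j
% Conversion system, Eq. (4) collocated at the nodes:
% u(x_i) = sum_j w_j Hbar(x_i - x_j) + c1*x_i + c2
A  = [Hbar(R), x, ones(n, 1)];
wc = pinv(A)*u;                         % minimum-norm [w; c1; c2]
                                        % (BCs close the system in practice)
d2u = G(R)*wc(1:n);                     % Eq. (2): second derivative
max(abs(d2u + pi^2*sin(pi*x)))          % error against exact -pi^2*sin(pi*x)
```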

2.2 2D IRBF local stencil scheme

In this work, a 9-point stencil scheme is employed to overcome the problem of an ill-conditioned system matrix, which is an inherent problem in the global approach. According to this scheme, a local 9-point stencil for an arbitrary grid point $\mathbf{x}_{i,j}$ ($2 \le i \le n_x-1$; $2 \le j \le n_y-1$) consists of the node and its eight surrounding neighbours (Fig. 1),

where $n_x \times n_y$ is the Cartesian grid density of the considered domain. More details can be found in Hoang-Trieu, Mai-Duy, and Tran-Cong (2012). This approximation, coupled with the control volume method, will be presented in the next sections.
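As a concrete illustration of the stencil indexing, a minimal sketch follows; the grid size and node are arbitrary, and Matlab's default column-wise linear numbering is assumed.

```matlab
% Gather the linear indices of the 9-point stencil centred at the
% interior node (i,j) of an nx-by-ny Cartesian grid.
nx = 7; ny = 5;                           % grid density (assumed)
i = 3; j = 4;                             % 2 <= i <= nx-1, 2 <= j <= ny-1
[I, J] = ndgrid(i-1:i+1, j-1:j+1);        % the 3-by-3 neighbourhood
stencil = sub2ind([nx, ny], I(:), J(:));  % 9 linear node indices
```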

3 A control volume method based on 2D IRBFs

In this approach, each grid point is surrounded by a CV and the conservative governing equations are integrated within this volume. Figure 2 shows the CV formation for a regular 2D domain. In this figure, CVs are bounded by lines parallel to the grid lines, passing through the mid-points between the reference point and its neighbours.

Figure 1: A 9-point local stencil.

Figure 2: CV formation in 2D.

Consider the 2D Poisson equation (1). This equation is integrated over a CV, and then by applying the divergence theorem to the resultant equation one gets

$$\oint_{\Gamma_s} \nabla u \cdot \hat{\mathbf{n}}\, d\Gamma = \left. \int \frac{\partial u}{\partial x}\, dy \right|_{r} - \left. \int \frac{\partial u}{\partial x}\, dy \right|_{l} + \left. \int \frac{\partial u}{\partial y}\, dx \right|_{t} - \left. \int \frac{\partial u}{\partial y}\, dx \right|_{b} = \int_{\Omega_s} f\, d\Omega, \quad (8)$$

where $\Omega_s$ and $\Gamma_s$ are the CV under consideration and its surface, respectively; $\hat{\mathbf{n}}$ the outward normal unit vector; and $(.)|_s$ $(s = l, r, t, b)$ denotes integrals over the left, right, top and bottom faces of the CV, respectively.

Using a 5-point Gaussian quadrature scheme to discretise Eq. (8), each face integral is replaced by a weighted sum of flux values at the Gauss points, e.g. for the right face

$$\left. \int \frac{\partial u}{\partial x}\, dy \right|_{r} \approx \frac{\Delta y}{2} \sum_{k=1}^{5} \alpha_k \left. \frac{\partial u}{\partial x} \right|_{r}(\eta_k), \quad (9)$$

where the $\alpha_k$ and $\eta_k$ are the Gaussian weights and Gauss points, respectively.
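The face quadrature of Eq. (9) can be sketched in Matlab as follows; the face geometry and the flux function are placeholders standing in for the IRBF-supplied derivative.

```matlab
% 5-point Gauss-Legendre quadrature of the flux du/dx over the right
% face of a CV (vertical face of length dy centred at (xf, yc)).
eta   = [-0.9061798459, -0.5384693101, 0.0, 0.5384693101, 0.9061798459];
alpha = [ 0.2369268850,  0.4786286705, 0.5688888889, ...
          0.4786286705,  0.2369268850];

xf = 0.5; yc = 0.5; dy = 0.1;        % face location and length (assumed)
dudx = @(x, y) 2*x.*y;               % placeholder for the IRBF derivative
yq = yc + 0.5*dy*eta;                % Gauss points mapped onto the face
fluxR = 0.5*dy*sum(alpha .* dudx(xf, yq));   % right-face integral
```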

The 2D local IRBF approximation scheme mentioned in Section 2.2 is used to approximate the field variable and its derivatives in Eq. (9). Thus, in this approach the governing equations are forced to be satisfied locally over the CVs, while the boundary conditions are directly imposed using the 1D-IRBF approximant. The procedure leads to an algebraic equation system for the unknown nodal values of the field variable as follows.

The conversion of the network-weight space into the physical space is achieved by inverting Eq. (10).

By substituting Eq. (16) into Eqs. (3) and (4), the first-order derivatives of $u$ with respect to $x$ and $y$ and the function itself over a local stencil are determined.

In the case of non-rectangular domains, the CV formation for interior points is carried out in a similar way but with some extra treatment for non-rectangular boundaries, which will be detailed in Section 5.2.

4 Parallel domain decomposition method

Domain decomposition has been successfully used to overcome the resource limitations associated with large-scale problems. Its primary objective is to split a large problem domain into smaller ones, called sub-domains, in which the problem can be solved more effectively in terms of memory and computing power (Quarteroni and Valli, 1999; Tran, Phillips, and Tran-Cong, 2009). A notable advantage of the DD method in solving numerical problems is that it helps to decrease the condition numbers of the system matrices in the sub-domains. As a result, the DD method helps to achieve a more stable and accurate solution. Furthermore, with the advance of parallel computing, the DD technique has proved highly amenable to parallelisation. That potential has encouraged intensive research into DD methods in recent decades.

Over the last two decades, researchers have developed parallel algorithms, aided by the simplicity of grid generation, to significantly increase the throughput of numerical solutions. For example, Singh and Jain (2005) used an Element-Free Galerkin method with a moving least-squares approximant to solve fluid flow problems. They were able to achieve high efficiency, e.g. 91.27% for a 2D problem with 8 CPUs. Shirazaki and Yagawa (1999) proposed a Mesh-Free-method-based parallel algorithm to solve incompressible viscous flow. They obtained a stable solution to a model with three million degrees of freedom. However, when the speed-up was separated into two parts, namely the construction of the system equations and the time integration, the efficiency of the first part was very high, and even super-linear with a high number of CPUs, while the efficiency of the second part did not scale to a high number of CPUs. Indeed, it dropped from approximately 98% with 16 CPUs to around 50% with 64 CPUs. Ingber, Chen, and Tanski (2004) combined the method of fundamental solutions and the particular solution method to solve transient heat conduction problems. The approach was developed using a Schwarz Neumann-Neumann DD based parallel scheme. Although the authors successfully demonstrated the accuracy of the parallel algorithm in comparison with the non-parallel version, unfortunately, information regarding the efficiency of the parallelisation was not given.

This work is a further development of the Schwarz additive overlapping DD technique (Pham-Sy, Tran, Hoang-Trieu, Mai-Duy, and Tran-Cong, 2013) using local stencil IRBF approximants. Here, the local stencil 2D-IRBF based CV method presented in the previous sections is used to develop a parallel algorithm for solving fluid flow problems.

The additive overlapping DD method is rather simple but effective. It also has a high potential for parallelisation, as the computation in each sub-domain is independent within a time step. In this approach, the original domain is divided into several overlapping sub-domains. The function values on the artificial boundaries (ABs) are initially unknown and are set to zero (initial guess). In each iterative step, the boundary value problem is solved separately in each sub-domain. Then the function values on the artificial boundary of one sub-domain are updated by the solution from the other sub-domains. This procedure is repeated until a desired tolerance is achieved. Several specific details on the use of the additive overlapping DD technique will be presented briefly in the next sections.

4.1 Sub-domain formation and neighbour identification

The sub-domain formation task is straightforward with a rectangular domain, as can be seen in Fig. 3. This formation has been reported in our previous work; however, it is also presented here for completeness.

The following Matlab notation will be used:

• lab/worker: a computing node in the distributed system.

• lab-index: the lab's identification in the distributed system. This lab-index is used by labs to communicate with each other.

For example, in Fig. 3 the original domain is decomposed into $N_x \times N_y = 4 \times 3 = 12$ sub-domains. These sub-domains' lab-indices are enumerated from 1 to 12, from bottom to top and left to right. Each sub-domain also has a 2D index (i, j) that determines its position in the original domain. In this example, the 2D index of sub-domain 7 is (3, 2). In order to determine its neighbours, an arbitrary lab (i, j) simply checks the following cases:

1. if (j − 1) > 0, its left neighbour is lab(i, j − 1);

2. if (j + 1) ≤ N_y, its right neighbour is lab(i, j + 1);

3. if (i + 1) ≤ N_x, its top neighbour is lab(i + 1, j);

4. if (i − 1) > 0, its bottom neighbour is lab(i − 1, j).

Figure 3: Enumeration in a system of N_x × N_y = 4 × 3 sub-domains of a rectangular domain.

Unfortunately, with a non-rectangular domain (for example, Fig. 4) the above algorithm will not work, because some sub-domains would lie outside the considered domain. For example, in Fig. 4 sub-domains 8 and 9 are outside the triangular domain Ω. To overcome this situation, one first needs to create a list of sub-domains (LSD), along with their 2D indices (as usual, from bottom to top and left to right). Then, inside each sub-domain, the following conditions are checked to determine the lab-indices of a sub-domain's neighbours. Consider a sub-domain whose 2D index is (i, j):

1. if lab(i, j − 1) exists in the LSD then it is the left neighbour of lab(i, j);

2. if lab(i, j + 1) exists in the LSD then it is the right neighbour of lab(i, j);

3. if lab(i + 1, j) exists in the LSD then it is the top neighbour of lab(i, j);

4. if lab(i − 1, j) exists in the LSD then it is the bottom neighbour of lab(i, j).

Figure 4: Enumeration in a system of 7 sub-domains of a triangular domain.

A detailed example of this process is provided in Table 1 for the 7 sub-domains of the triangular domain shown in Fig. 4; a sketch of the look-up follows.
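The following Matlab fragment sketches the look-up; the LSD layout is a hypothetical example, not the actual enumeration of Fig. 4.

```matlab
% Neighbour identification from the list of sub-domains (LSD).
% Row k of LSD holds the 2D index [i, j] of the sub-domain whose
% lab-index is k (hypothetical layout of 7 sub-domains).
LSD = [1 1; 1 2; 1 3; 1 4; 2 2; 2 3; 3 2];

k  = 5;                                % this lab's lab-index
ij = LSD(k, :);
find_lab = @(p) find(LSD(:,1) == p(1) & LSD(:,2) == p(2)); % [] if absent
left   = find_lab(ij + [ 0 -1]);       % lab(i, j-1)
right  = find_lab(ij + [ 0  1]);       % lab(i, j+1)
top    = find_lab(ij + [ 1  0]);       % lab(i+1, j)
bottom = find_lab(ij + [-1  0]);       % lab(i-1, j)
```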

4.2 Communication and Synchronisation

In the additive overlapping DD method, one of the critical tasks is the communication between the sub-domains, as the function values on the artificial boundary of one sub-domain are obtained from the solution in its neighbouring sub-domains at the previous step. In the present implementation, Matlab's built-in parallel communication methods are utilised. The Matlab communication functions allow an array of data to be sent to Matlab workers in a synchronised way, which means the sender must wait until the receiver has fully received a message. This mechanism itself guarantees the synchronisation between sub-domains, and no extra care is needed to ensure that all sub-domains are always executing the same iterative step. More information about the Matlab-supported MPI implementation can be found in MATLAB (2012).
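A minimal sketch of the exchange using the Parallel Computing Toolbox primitives is given below; for brevity only the left/right exchange of a one-dimensional chain of sub-domains is shown, and the AB data are placeholders. labSendReceive pairs each send with the matching receive, which avoids deadlock.

```matlab
spmd   % run with a pool of workers open, one worker per sub-domain
    left  = labindex - 1;            % 0 if there is no left neighbour
    right = labindex + 1;            % numlabs+1 if no right neighbour
    myAB  = rand(10, 1);             % placeholder AB values

    if right <= numlabs              % exchange with the right neighbour
        fromRight = labSendReceive(right, right, myAB);
    end
    if left >= 1                     % exchange with the left neighbour
        fromLeft = labSendReceive(left, left, myAB);
    end
end
```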

Table 1: Neighbour sub-domain (NB) determination for a triangular domain.

4.3 Termination

Since the parallel algorithm presented in this paper is a distributed computing algorithm, it needs a termination detection process. This process has been investigated and classified into a distinct class of algorithms called Distributed Termination Detection (DTD). In this paper, the bitmap DTD algorithm presented by Pham-Sy, Tran, Hoang-Trieu, Mai-Duy, and Tran-Cong (2013) is employed. This algorithm has several advantages, such as symmetric detection, decentralised control and low termination-detection delay, and thus ideally suits the implementation of the parallel algorithm in the present work.
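The bitmap DTD algorithm itself is described in Pham-Sy, Tran, Hoang-Trieu, Mai-Duy, and Tran-Cong (2013). As a simpler stand-in for illustration, a termination test can be written as a global AND reduction over all labs using gop; this is not the bitmap scheme, merely the effect it achieves.

```matlab
DDTol = 1e-6;                        % DD tolerance used in Section 5
spmd
    localResidual = rand*1e-7;       % placeholder for the AB residual
    localDone = localResidual < DDTol;
    % true on every lab if and only if all sub-domains have converged
    allDone = gop(@and, localDone);
end
```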

4.4 Algorithm of the present procedure

The present parallel method, based on the combination of the local stencil 2D-IRBF CV approach and the DD technique presented in the previous sections, can now be described by an overall algorithm whose flowchart is shown in Fig. 5.

5 Numerical results

The proposed method is verified through the simulation of the lid-driven cavity (LDC) fluid flow problem for the two cases of rectangular and non-rectangular domains. The efficiency of the present method is then analysed.

The lid-driven cavity flow has been commonly used for the verification of numerical methods, owing to the availability of benchmark solutions in the literature. The problem has also been quite popular in the meshless community, e.g. Lin and Atluri (2001) with the meshless Local Petrov-Galerkin (MLPG) method; Shu, Ding, and Yeo (2005) with a local RBF-based Differential Quadrature method; Chinchapatnam, Djidjeli, and Nair (2007) with RBFs; and Kim, Kim, Jun, and Lee (2007) with a meshfree point collocation method. Therefore, in this paper the lid-driven cavity flow is also employed to investigate the accuracy as well as the efficiency of the present parallel scheme.

Figure 5: Algorithm of the parallel domain decomposition method using the local IRBF based control volume approach.

The problem is defined in the stream-function-vorticity formulation as follows:

$$\frac{\partial \omega}{\partial t} + u \frac{\partial \omega}{\partial x} + v \frac{\partial \omega}{\partial y} = \frac{1}{Re} \nabla^2 \omega, \quad (17)$$

$$\nabla^2 \psi = -\omega, \quad (18)$$

where $Re$ is the Reynolds number; $\psi$ the stream function; $\omega$ the vorticity and $t$ the time. The $x$- and $y$-velocity components are given by $u = \partial\psi/\partial y$ and $v = -\partial\psi/\partial x$. The problem is solved using the local 9-point stencil 2D-IRBF scheme as presented in Section 2.2, with the time derivative discretised using a first-order Euler scheme and the diffusive terms treated implicitly. The boundary condition for $\omega$ is computed through Eq. (18) using 1D global IRBF as described in Section 2.1.
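With the stated choices (first-order Euler in time, implicit diffusion, explicit convection), one $\omega$-update can be written as the following reconstruction:

$$\frac{\omega^{n+1} - \omega^{n}}{\Delta t} + u^{n}\frac{\partial \omega^{n}}{\partial x} + v^{n}\frac{\partial \omega^{n}}{\partial y} = \frac{1}{Re}\nabla^2 \omega^{n+1},$$

so that each time step requires the solution of one Helmholtz-type system, $\left(I - \frac{\Delta t}{Re}\nabla^2\right)\omega^{n+1} = \omega^{n} - \Delta t\left(u^{n}\partial_x + v^{n}\partial_y\right)\omega^{n}$, for the new vorticity field.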

The general procedure for solving an LDC problem is as follows.

1. Guess the initial values of ω;

2. Solve (18) for ψ;

3. Approximate the values of ω on the boundaries and the convective terms;

4. Solve (17) for ω;

5. Check the convergence measure for ω.

5.1 Square cavity

For the square cavity problem, the geometry of the analysis domain and the chosen coordinate system are shown in Fig. 6. The boundary conditions are given in terms of the stream function as

$$\psi = 0 \ \text{on all walls}, \qquad \frac{\partial \psi}{\partial n} = 0 \ \text{on the three stationary walls}, \qquad \frac{\partial \psi}{\partial y} = 1 \ \text{on the moving lid},$$

where $n$ denotes the direction normal to the boundary.

Figure 6: The square lid-driven cavity fluid flow problem: geometry and boundary conditions. No slip is assumed between the fluid and solid surfaces. The top lid moves from left to right with a speed of 1.

The Dirichlet boundary condition on ψ is used to solve Eq. (18) in step 2, while the Neumann boundary condition on ψ is used to approximate the values of ω on the boundaries. The values of ω on the boundaries, in turn, are used as boundary conditions to solve Eq. (17) in step 4 above.

The iterative procedure for solving the square cavity problem with the parallel DD method is as follows.

1. Divide the analysis domain into a number of sub-domains and guess the initial boundary conditions on the ABs;

2. Solve the fluid flow problem in each and every sub-domain as described above;

3. Exchange the values of ψ and ω at the interfaces with the neighbours;

4. Calculate the convergence measure on all interfaces;

5. Check for the termination condition.
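A skeleton of this iteration on one lab is sketched below; solveSubdomain and exchangeAB are hypothetical helpers standing for the CV solver of Sections 2-3 and the AB exchange of Section 4.2, and nAB is an assumed number of AB nodes.

```matlab
% Additive Schwarz loop on one lab (runs inside spmd).
DDTol = 1e-6;  nAB = 10;                     % nAB: assumed AB size
psiAB = zeros(nAB, 1);  omAB = zeros(nAB, 1);% step 1: zero initial guess
done  = false;
while ~done
    [psi, om] = solveSubdomain(psiAB, omAB); % step 2: local LDC solve
    [psiNew, omNew] = exchangeAB(psi, om);   % step 3: neighbour values
    res  = max(abs([psiNew - psiAB; omNew - omAB]));  % step 4
    done = gop(@and, res < DDTol);           % step 5: global agreement
    psiAB = psiNew;  omAB = omNew;           % update the AB values
end
```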

In this paper, the square cavity problem is simulated for a range of Reynolds numbers (Re = 100, 400, 1000 and 3200). Figure 7 depicts the streamlines of the flow obtained by the present parallel method using a grid of 151×151 collocation points, Δt = 1.E−03, DDTol = 1.E−06, CMTol = 1.E−06 and β = 2, with 4 sub-domains associated with 4 CPUs for Re = 100, 400 and 1000 (Figs. 7(a)-7(c)) and 2 sub-domains for Re = 3200 (Fig. 7(d)). The results are in very good agreement with those presented in Ghia, Ghia, and Shin (1982) as well as in Botella and Peyret (1998). Similar results can be found for the vorticity contours in Fig. 8. Furthermore, Fig. 9 provides the profiles of the velocities along the vertical and horizontal centrelines by the present method, along with the benchmark values from Ghia, Ghia, and Shin (1982). As can be seen, the results match the benchmark solution very well.

Figure 7: The square LDC fluid flow problem. Stream-function (ψ) contours of the flow for several Reynolds numbers (Re = 100, 400, 1000 and 3200) by the present parallel CV method, using 4 sub-domains for Re = 100, 400 and 1000 and 2 sub-domains for Re = 3200, with the specifications: grid 151×151, Δt = 1.E−03, DDTol = 1.E−06, CMTol = 1.E−06 and β = 2.

Figure 8: The square LDC fluid flow problem. Vorticity (ω) contours of the flow for several Reynolds numbers (Re = 100, 400, 1000 and 3200) by the present parallel CV method, using 4 sub-domains for Re = 100, 400 and 1000 and 2 sub-domains for Re = 3200. The other parameters are given in Fig. 7.

The efficiency of the present parallel method is assessed using the following criteria: the number of iterations (i), the computation time (t), the speed-up (spd), i.e. the ratio of the computation time using one processor to that using multiple processors, and the efficiency (eff), i.e. the ratio of the speed-up to the number of CPUs used. A fixed grid of 151×151 is chosen to run the problem with various numbers of CPUs, and the results are provided in Tables 2-5 for the different Reynolds numbers. The results on the left-hand side of Tables 2-5 show that the computation time of the present parallel method (the P-CV method) for the time-dependent square lid-driven cavity problem decreases tremendously as the number of sub-domains increases. An interpretation of this significant improvement in throughput can be found in Pham-Sy, Tran, Hoang-Trieu, Mai-Duy, and Tran-Cong (2013) for the parallel collocation method; a similar interpretation is applicable here. Again, there is always some threshold (called cpus_opt) beyond which increasing the number of CPUs has an insignificant influence on the performance in terms of all criteria (t, spd and eff). For example, the improvement in efficiency (eff) is no longer significant once the number of CPUs exceeds 49, 64, 30 and 49 (Tables 2-5) for Re = 100, 400, 1000 and 3200, respectively, using the grid of 151×151. Furthermore, a similar tendency in computational efficiency is found with the present parallel algorithm using the collocation method (the P-C method) (right-hand side of Tables 2-5).
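In symbols, with $t_1$ the single-CPU computation time and $t_p$ the time on $p$ CPUs, the two derived criteria are

$$\mathrm{spd} = \frac{t_1}{t_p}, \qquad \mathrm{eff} = \frac{\mathrm{spd}}{p} = \frac{t_1}{p\, t_p}.$$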

Figure 9: The square LDC fluid flow problem. Profiles of the u velocity along the vertical centreline and the v velocity along the horizontal centreline (solid lines) for several Reynolds numbers (Re = 100, 400, 1000 and 3200) by the present parallel CV method, in comparison with the corresponding results of Ghia, Ghia, and Shin (1982) (□ for the u velocity and ○ for the v velocity). The parameters of the present method are given in Fig. 7.

Table 2: The square LDC fluid flow problem. Comparison between the parallel CV (P-CV) and parallel collocation (P-C) methods with Re = 100, grid = 151×151, Δt = 1.E−03, DDTol = 1.E−06, CMTol = 1.E−06, β = 2. CPUs: number of CPUs (sub-domains); i: number of iterations; t(m): elapsed time (minutes); spd: speed-up; eff: efficiency. The observed super-linear speed-up can be explained in terms of reduced matrix condition numbers (see main text).

Table 3: The square LDC fluid flow problem. Comparison between the P-CV and P-C methods with Re = 400, grid = 151×151. CPUs: number of CPUs (sub-domains); i: number of iterations; t(m): elapsed time (minutes); spd: speed-up; eff: efficiency. Other parameters are given in Table 2.

The efficiency, speed-up and throughput of the present parallel method can be seen visually in Figs. 10(a), 10(c) and 10(e). These figures also depict the influence of the Reynolds number on the above criteria with respect to the number of CPUs. For example, the efficiency of the present parallel method is higher for the lower Reynolds numbers. While the throughput increases gradually with the number of sub-domains/CPUs (Fig. 10(e)), the gradients of the time curves decrease once the number of CPUs exceeds around 20. This is also indicated by the efficiency curves given in Fig. 10(a). Similar trends in the efficiency, speed-up and throughput are also obtained by the present parallel algorithm using the collocation method, as given in Figs. 10(b), 10(d) and 10(f). This shows that the choice of the scale for the sub-domains/CPUs plays an important role in the performance of parallel computation schemes for a given problem.

Table 4: The square LDC fluid flow problem. Comparison between the P-CV and P-C methods with Re = 1000, grid = 151×151. CPUs: number of CPUs (sub-domains); i: number of iterations; t(m): elapsed time (minutes); spd: speed-up; eff: efficiency. Other parameters are given in Table 2.

Table 5: The square LDC fluid flow problem. Comparison between the P-CV and P-C methods with Re = 3200, grid = 151×151. CPUs: number of CPUs (sub-domains); i: number of iterations; t(m): elapsed time (minutes); spd: speed-up; eff: efficiency. Other parameters are given in Table 2.

Table 6: The square LDC fluid flow problem. Condition numbers CN_ω and CN_ψ in the single-domain and parallel solutions with Re = 100 and grid = 151×151. CPUs: number of CPUs (sub-domains).

It is observed that super-linear speed-up is achieved using 30, 36, 42 and 49 CPUs, with corresponding efficiencies of 101%, 110%, 111% and 115%, for the Reynolds number Re = 100 (Table 2). This behaviour, where the speed-up is higher than the number of CPUs used in a parallel algorithm, is unusual and sometimes considered controversial in classical parallel computing. For these cases, the super-linear speed-up is considered to be related to the decrease in the condition number of each sub-domain system, which plays a crucial role in the stability of a numerical method. Indeed, by decomposing the domain, the sub-problem in each sub-domain is not only smaller in terms of degrees of freedom but also has a smaller condition number (see Table 6).
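The effect can be illustrated with a toy Matlab computation, using a generic 1D finite-difference Laplacian rather than the system matrices of the present method: restricting the matrix to the block a sub-domain would own reduces its condition number sharply.

```matlab
% Condition number of a 1D Laplacian versus the per-sub-domain block size.
n = 400;
A = full(gallery('tridiag', n, -1, 2, -1));  % generic model matrix
for p = [1 4 16]                             % number of sub-domains
    m = n/p;                                 % nodes per sub-domain
    fprintf('p = %2d   cond = %.3e\n', p, cond(A(1:m, 1:m)));
end
```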

The efficiency of the algorithm for large-scale problems is also investigated. For testing purposes, the flow at Re = 1000 is simulated using very fine grids, namely grid-1 = 401×401 and grid-2 = 601×601, with the parameters DDTol = 1.E−06, CMTol = 1.E−06, β = 2, and Δt = 1.E−03 for grid-1 and 5.E−04 for grid-2.

While Fig. 11(a) shows a gradual increase in throughput with respect to the number of CPUs for the different problem scales by the present P-CV method, Fig. 11(b) depicts the influence of the grid density on the efficiency with respect to the number of CPUs. Indeed, the time curves are steeper for the finer grids, which again indicates that the efficiency of the present parallel method will be higher for larger-scale problems.

Figure 10: The square LDC fluid flow problem. Comparison of the parallel performance of the P-C and P-CV methods for several Reynolds numbers (Re = 100, 400, 1000 and 3200) with a grid of 151×151: the efficiency, speed-up and throughput of the two methods as functions of the number of CPUs. Other parameters are given in Tables 2-5.

Figure 11: The square LDC fluid flow problem. Throughput of the P-CV method with Re = 1000 using different grids (151×151, 401×401 and 601×601) as a function of the number of CPUs.

5.2 Triangular cavity

The triangular cavity has been proposed as a test case for the numerical algorithm in the case of a non-rectangular domain. The domain is an equilateral triangle, with the left and right sides fixed and the top side (also called the lid) moving at a constant velocity from left to right. The problem's geometry and boundary conditions are shown in Fig. 12.

It is noted that when implementing CVs in non-rectangular domains, one needs to take extra care with points close to the boundary, to make sure that the CVs intersect neither each other nor the boundary. Figure 13 shows an example of control volume formation for a triangular domain. One simple way to enforce this is sketched below.
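The sketch assumes an equilateral triangle given by its three vertices: the corners of a candidate CV are tested with an edge-sign (barycentric) test, and the CV half-widths are shrunk until all corners fall inside the domain. This is an illustrative device, not necessarily the construction used in the paper.

```matlab
% Shrink a candidate CV until all four corners lie inside the triangle.
V = [0 0; 4 0; 2 2*sqrt(3)];         % triangle vertices, CCW (assumed)
E = V([2 3 1], :) - V;               % edge vectors
inTri = @(p) all(E(:,1).*(p(2) - V(:,2)) - E(:,2).*(p(1) - V(:,1)) >= 0);

xc = [2.0 0.3];  hx = 0.2;  hy = 0.2;   % CV centre and half-widths
while true
    C = [xc + [-hx -hy]; xc + [hx -hy]; xc + [hx hy]; xc + [-hx hy]];
    ok = true;
    for k = 1:4, ok = ok && inTri(C(k, :)); end
    if ok, break; end                % CV fits inside the domain
    hx = hx/2;  hy = hy/2;           % otherwise shrink and retest
end
```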

The boundary conditions are given in terms of the stream function analogously to the square cavity, namely

$$\psi = 0 \ \text{on all sides}, \qquad \frac{\partial \psi}{\partial n} = 0 \ \text{on the two fixed sides}, \qquad \frac{\partial \psi}{\partial y} = 1 \ \text{on the lid},$$

where the variables are as defined before. The solving procedure remains the same as for the square cavity problem. However, when approximating the boundary values of ω, the following two cases must be considered.

First, for boundary points that lie on both an x grid-line and a y grid-line, the approximation can be carried out normally by using 1D-IRBF in the two directions, following Eq. (18).

Second, for boundary points that lie on only an x grid-line or a y grid-line, the approximation is thus available in only one direction. In this case, the equivalent formulas provided by Le-Cao, Mai-Duy, and Tran-Cong (2009) are used, with separate expressions for points on an x grid-line and for points on a y grid-line, where $t_x$ and $t_y$ denote the x- and y-components of the unit vector tangential to the boundary.

Figure 12: The triangular lid-driven cavity flow problem: geometry and boundary conditions, Q = 3. No slip is assumed between the fluid and solid surfaces. The top lid moves from left to right with a speed of 1.

Figure 13: CV formation for a triangular domain.

In this paper, a range of Reynolds numbers (Re = 100, 200, 500 and 1000) is investigated. Again, the streamlines (Fig. 14), vorticity contours (Fig. 15) and velocity profiles along the central horizontal line y = 2 and vertical line x = 0 (Fig. 16) obtained by the present CV method with 4 sub-domains agree very well with those of Kohno and Bathe (2006) using the flow-condition-based finite element method.

Figure 14: The triangular LDC fluid flow problem. Stream-function (ψ) contours of the flow for several Reynolds numbers by the present parallel CV method using 4 sub-domains, with a grid of 24697 nodes, Δt = 5.E−04, DDTol = 1.E−06, CMTol = 1.E−06 and β = 1.

Figure 15: The triangular LDC fluid flow problem. Vorticity (ω) contours of the flow for several Reynolds numbers by the present parallel CV method. Other parameters are given in Fig. 14.

Figure 16: The triangular LDC fluid flow problem. Velocity profiles along the vertical line (x = 0) and horizontal line (y = 2) for several Reynolds numbers by the present parallel CV method, in comparison with the corresponding results of Kohno and Bathe (2006) (□ for the u velocity and ○ for the v velocity). Other parameters of the present method are given in Fig. 14.

Figure 17: The triangular LDC fluid flow problem. Parallel performance of the P-CV method for several Reynolds numbers using a grid of 24697 nodes: the efficiency, speed-up and throughput as functions of the number of CPUs. Other parameters are given in Table 7.

Table 7: The triangular LDC fluid flow problem. Results by the present P-CV method for several Reynolds numbers (Re = 100, 200, 500 and 1000) with a grid of 24697 nodes, Δt = 5.E−04, DDTol = 1.E−06, CMTol = 1.E−06, β = 1. CPUs: number of CPUs (sub-domains); i: number of iterations; t(m): elapsed time (minutes); spd: speed-up; eff: efficiency.

In terms of parallel efficiency, Table 7 gives detailed results of the parallel algorithm for several Reynolds numbers with a grid of 205×205 (or 24697 grid points). Visual forms can be found in Fig. 17. For each Reynolds number, although the results show that the computation time decreases gradually as the number of sub-domains (CPUs) increases (Fig. 17(c)), the optimum number of CPUs (cpus_opt) of the parallel method, as described by the efficiency (eff), is not clear for each case (Fig. 17(a)). This can be explained by the influence of the domain decomposition for a non-rectangular domain problem, where the numbers of collocation points/CVs in the sub-domains are not equal, resulting in significant variation in the amount of work to be completed from sub-domain to sub-domain. Thus, the results show that the sub-domain formation plays an important role in parallel computation schemes.

6 Conclusion

In this paper, we have proposed a parallel distributed DD method coupled with a local IRBF CV approach. The proposed method was successfully implemented to simulate the lid-driven cavity flow in rectangular and non-rectangular domains. It has been shown that the results produced by the method are in excellent agreement with the spectral benchmark solutions of Botella and Peyret (1998) and Ghia, Ghia, and Shin (1982) for the square domain, and of Kohno and Bathe (2006) for the triangular domain. A very important achievement of this paper is the high time-efficiency of the parallel algorithm, including the speed-up. It is shown that the speed-up grows steadily with the number of CPUs, which indicates excellent scalability of the method. Moreover, a super-linear efficiency has been observed in several cases; this phenomenon is best explained by the decrease of the condition numbers in the sub-domains. The parallel algorithm performs well with both the collocation and CV methods. Indeed, the trend in efficiency with increasing number of CPUs for several Reynolds numbers is consistent with the results achieved by the collocation method reported in Pham-Sy, Tran, Hoang-Trieu, Mai-Duy, and Tran-Cong (2013).

Acknowledgement: The first author would like to thank the CESRC, Faculty of Health, Engineering and Sciences, University of Southern Queensland, for a Ph.D. scholarship. This research was supported by the Australian Research Council. The computation was performed on the USQ high performance computing cluster supported by the Queensland Cyber Infrastructure Foundation (QCIF). The authors would like to thank the reviewers for their helpful comments.

Botella, O.; Peyret, R. (1998): Benchmark spectral results on the lid-driven cavity flow. Computers & Fluids, vol. 27, pp. 421-433.

Chandhini, G.; Sanyasiraju, Y. (2007): Local RBF-FD solutions for steady convection-diffusion problems. International Journal for Numerical Methods in Engineering, vol. 72, no. 3, pp. 352-378.

Chinchapatnam, P. P.; Djidjeli, K.; Nair, P. B. (2007): Radial basis function meshless method for the steady incompressible Navier-Stokes equations. International Journal of Computer Mathematics, vol. 84, no. 10, pp. 1509-1521.

Ghia, U.; Ghia, K. N.; Shin, C. T. (1982): High-Re solutions for incompressible flow using the Navier-Stokes equations and a multigrid method. Journal of Computational Physics, vol. 48, pp. 387-411.

Hoang-Trieu, T.-T.; Mai-Duy, N.; Tran-Cong, T. (2012): Several compact local stencils based on integrated RBFs for fourth-order ODEs and PDEs. CMES: Computer Modeling in Engineering & Sciences, vol. 84, no. 2, pp. 171-203.

Ingber, M.; Chen, C.; Tanski, J. (2004): A mesh free approach using radial basis functions and parallel domain decomposition for solving three-dimensional diffusion equations. International Journal for Numerical Methods in Engineering, vol. 60, pp. 2183-2201.

Kansa, E. J. (1990): Multiquadrics - A scattered data approximation scheme with applications to computational fluid-dynamics - I. Surface approximations and partial derivative estimates. Computers & Mathematics with Applications, vol. 19, no. 8-9, pp. 127-145.

Kim, Y.; Kim, D. W.; Jun, S.; Lee, J. H. (2007): Meshfree point collocation method for the stream-vorticity formulation of 2D incompressible Navier-Stokes equations. Computer Methods in Applied Mechanics and Engineering, vol. 196, no. 33-34, pp. 3095-3109.

Kohno, H.; Bathe, K.-J. (2006): A flow-condition-based interpolation finite element procedure for triangular grids. International Journal for Numerical Methods in Fluids, vol. 51, no. 6, pp. 673-699.

Le-Cao, K.; Mai-Duy, N.; Tran-Cong, T. (2009): An effective integrated-RBFN Cartesian-grid discretization for the streamfunction-vorticity-temperature formulation in nonrectangular domains. Numerical Heat Transfer, Part B: Fundamentals, vol. 55, no. 6, pp. 480-502.

Lin, H.; Atluri, S. (2001): The Meshless Local Petrov-Galerkin (MLPG) method for solving incompressible Navier-Stokes equations. CMES: Computer Modeling in Engineering & Sciences, vol. 2, no. 2, pp. 117-142.

Mai-Duy, N.; Tran-Cong, T. (2001): Numerical solution of differential equations using multiquadric radial basis function networks. Neural Networks, vol. 14, pp. 185-199.

Mai-Duy, N.; Tran-Cong, T. (2010): A numerical study of 2D integrated RBFNs incorporating Cartesian grids for solving 2D elliptic differential problems. Numerical Methods for Partial Differential Equations, vol. 26, pp. 1443-1462.

MATLAB (2012): Parallel Computing Toolbox User's Guide R2012b. The MathWorks Inc., Natick, Massachusetts, United States, 6.1 edition.

Patankar, S. V. (1980): Numerical Heat Transfer and Fluid Flow. CRC Press.

Pham-Sy, N.; Tran, C.-D.; Hoang-Trieu, T.-T.; Mai-Duy, N.; Tran-Cong, T. (2013): Compact local IRBF and domain decomposition method for solving PDEs using a distributed termination detection based parallel algorithm. CMES: Computer Modeling in Engineering & Sciences, vol. 92, no. 1, pp. 1-31.

Quarteroni, A.; Valli, A. (1999): Domain Decomposition Methods for Partial Differential Equations. Clarendon Press.

Shirazaki, M.; Yagawa, G. (1999): Large-scale parallel flow analysis based on free mesh method: a virtually meshless method. Computer Methods in Applied Mechanics and Engineering, vol. 174, no. 3-4, pp. 419-431.

Shu, C.; Ding, H.; Yeo, K. (2005): Computation of incompressible Navier-Stokes equations by local RBF-based differential quadrature method. CMES: Computer Modeling in Engineering & Sciences, vol. 7, no. 2, pp. 195-206.

Singh, I. V.; Jain, P. K. (2005): Parallel meshless EFG solution for fluid flow problems. Numerical Heat Transfer, Part B: Fundamentals, vol. 48, no. 1, pp. 45-66.

Tran, C.-D.; Phillips, D. G.; Tran-Cong, T. (2009): Computation of dilute polymer solution flows using BCF-RBFN based method and domain decomposition technique. Korea-Australia Rheology Journal, vol. 21, no. 1, pp. 1-12.
