Modeling Liver Cancer and Leukemia Data Using Arcsine-Gaussian Distribution

2021-12-16 07:51:06 Farouq Mohammad Alam, Sharifah Alrajhi, Mazen Nassar and Ahmed Afify

Computers, Materials & Continua, 2021, Issue 5

Farouq Mohammad A. Alam1, Sharifah Alrajhi1, Mazen Nassar1,2 and Ahmed Z. Afify3

1 Department of Statistics, Faculty of Science, King Abdulaziz University, Jeddah, 21589, Saudi Arabia

2 Department of Statistics, Faculty of Commerce, Zagazig University, Zagazig, 44511, Egypt

3 Department of Statistics, Mathematics and Insurance, Benha University, Benha, 13511, Egypt

Abstract: The main objective of this paper is to discuss a general family of distributions generated from the symmetrical arcsine distribution. The considered family includes various asymmetrical and symmetrical probability distributions as special cases. A particular case of a symmetrical probability distribution from this family is the Arcsine-Gaussian distribution. Key statistical properties of this distribution, including the quantile function, mean residual life, order statistics, and moments, are derived. The Arcsine-Gaussian parameters are estimated using two classical estimation methods, namely the method of moments and the maximum likelihood method. A simulation study, which uses the asymptotic distributions of all considered point estimators to construct 90% and 95% asymptotic confidence intervals, is performed to examine the estimation efficiency of the considered methods numerically. The simulation results show that both the biases and the variances of the estimators tend to zero as the sample size increases, i.e., the estimators are asymptotically consistent. Also, as the sample size increases, the coverage probabilities of the confidence intervals increase to the nominal levels, while the corresponding lengths decrease and approach zero. Two real data sets from the medical field are used to illustrate the flexibility of the Arcsine-Gaussian distribution as compared with the normal, logistic, and Cauchy models. The proposed distribution is very versatile in fitting real applications and can be used as a good alternative to the traditional gaussian distribution.

Keywords: Liver cancer data; leukemia data; normal distribution; moments estimation; maximum likelihood estimation

1 Introduction

In the last two decades, several methods have been proposed to generate continuous distributions. Many of these methods are discussed in [1]. They generate new distributions by adding parameters to an existing distribution or by combining existing distributions; see [2,3] for more details. The beta distribution is an important model for the analysis of proportions, which are common in many fields of science such as toxicology [4]. A particular case of the beta distribution is the symmetric arcsine distribution, which is a beta distribution with both shape parameters equal to one half. In the field of stochastic processes, the arcsine distribution is associated with the arcsine laws of random walks and Brownian motion [5]. For more comprehensive details about the beta distribution, see [4,6].

A continuous random variable X is said to follow the standard arcsine distribution if its cumulative distribution function (CDF) is given by:

$$F(x)=\frac{2}{\pi}\arcsin\left(\sqrt{x}\right),\quad 0\le x\le 1. \tag{1}$$

Now, notice that the x term on the right-hand side of (1) is actually the CDF of a standard uniform distribution. Hence, by simply replacing this term with the CDF of any other continuous probability distribution, say G(·), one obtains an extended arcsine distribution with the following CDF:

$$F(x)=\frac{2}{\pi}\arcsin\left(\sqrt{G(x)}\right). \tag{2}$$

Clearly, this extension of the arcsine distribution can generate lifetime distributions and elliptically contoured distributions (i.e., symmetrical distributions in R) by simply replacing G(·) with the corresponding CDF of the considered probability distribution. For more details about different kinds of univariate continuous distributions, see [7]. In the statistical literature, researchers have proposed generalizations and extensions for many continuous probability distributions. When modeling real data, a generalization or an extension of a model of interest provides a more flexible version of the model which may fit the data more appropriately. For instance, the exponential distribution is extensively used for reliability data with a constant failure rate. In practice, however, many reliability data sets have monotonic failure (hazard) rates. Thus, a well-known generalization of the exponential distribution, namely the Weibull distribution, is considered instead. Although constant and monotonic failure rates might be encountered in reality, many real-life data have non-monotonic failure rates. Consequently, researchers have considered various generalizations of the Weibull distribution; see the concise article [8] in this connection.

Recently, several generalizations of the normal distribution have been developed. Examples include the beta-normal distribution [9], the generalized normal distribution [10], the skew-normal distribution [11], and the truncated normal distribution [12].

This study considers the extension of the arcsine distribution based on a gaussian kernel, which is henceforth called the arcsine-gaussian (AG) distribution. The motivations for proposing the AG distribution are: (1) To develop various shapes for the density and hazard rate function of the distribution. (2) To increase the flexibility of the classical gaussian distribution in modelling different real-life applications. (3) To increase the flexibility of the traditional gaussian distribution properties like mean, variance, skewness and kurtosis. (4) The analysis of two real data sets shows that the AG distribution provides a better fit than the traditional gaussian distribution and some of its competitive models. Although the idea of this paper is not new, since the AG distribution is a special case of the beta-normal distribution [9], the novelty of this study lies in the fact that, to the best of the authors' knowledge, no previous research has been conducted on this probability distribution despite the importance and popularity of the gaussian distribution in modeling many real-life applications. The remaining sections of this article are organized as follows. Sections 2 and 3 discuss the distributional and statistical properties of the AG distribution, respectively. In Section 4, estimators are derived for the model parameters, and their finite-sample efficiencies are numerically examined using Monte Carlo simulations in Section 5. Two real-life data sets are analyzed in Section 6 to illustrate that the AG distribution is a suitable fit for the two data sets under analysis, in comparison with some well-known distributions. Finally, the paper is concluded in Section 7.

2 Distributional Properties of the AG Distribution

In this section, the distributional properties of the AG distribution are discussed.

2.1 The CDF and the Survival Function

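By replacing G(·) in expression (2) with the CDF of the gaussian distribution, denoted by Φ(·), one can say that a random variable X follows the AG distribution with location parameter μ∈R and scale parameter σ>0 (i.e., X ~ AG(μ,σ)) if the CDF and the survival function (SF) have the following forms, respectively:

$$F(x;\mu,\sigma)=\frac{2}{\pi}\arcsin\left(\sqrt{\Phi\left(\frac{x-\mu}{\sigma}\right)}\right),\quad x\in\mathbb{R}, \tag{3}$$

and

$$S(x;\mu,\sigma)=1-\frac{2}{\pi}\arcsin\left(\sqrt{\Phi\left(\frac{x-\mu}{\sigma}\right)}\right),\quad x\in\mathbb{R}. \tag{4}$$

Clearly, the AG distribution is a location-scale model; i.e., if Z ~ AG(0,1), then X=μ+σZ ~ AG(μ,σ).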
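The CDF and SF can be evaluated with a few lines of code. The following Python sketch assumes F(x)=(2/π)arcsin(√Φ((x−μ)/σ)), i.e., expression (2) with G replaced by the gaussian CDF; the function names `ag_cdf` and `ag_sf` are illustrative and are not taken from the paper. Only the standard library is used, with `statistics.NormalDist` supplying Φ.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard gaussian; .cdf plays the role of Phi


def ag_cdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """CDF assumed here: F(x) = (2/pi) * arcsin(sqrt(Phi((x - mu) / sigma)))."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (x - mu) / sigma
    return (2.0 / math.pi) * math.asin(math.sqrt(_STD_NORMAL.cdf(z)))


def ag_sf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Survival function S(x) = 1 - F(x).

    Because arcsin(sqrt(p)) + arcsin(sqrt(1 - p)) = pi/2, S(x) can be written as
    (2/pi) * arcsin(sqrt(Phi(-z))), which avoids cancellation in the upper tail.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (x - mu) / sigma
    return (2.0 / math.pi) * math.asin(math.sqrt(_STD_NORMAL.cdf(-z)))
```

For example, `ag_cdf(mu + sigma * z, mu, sigma)` and `ag_cdf(z)` agree for any `z`, which is the location-scale property stated above.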

2.2 Quantile Function

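It is well-known that the quantile function finds the value x such that

Pr(X≤x)=u

for a probability 0<u<1. Solving expression (3) for x gives the quantile function of the AG distribution:

$$x_u=\mu+\sigma\,\Phi^{-1}\left(\sin^2\left(\frac{\pi u}{2}\right)\right), \tag{5}$$

where Φ⁻¹(·) is the quantile function of the standard gaussian distribution. Note that one can verify that the median of the AG distribution is equal to μ by setting u=0.5 in expression (5).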
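Inverting the CDF also gives a simple way to simulate AG variates. The sketch below assumes the quantile function x_u=μ+σΦ⁻¹(sin²(πu/2)), obtained by solving F(x)=u with F(x)=(2/π)arcsin(√Φ((x−μ)/σ)); the names `ag_quantile` and `ag_sample` are hypothetical helpers introduced for illustration. For u>0.5 the code uses the symmetry about μ so that the argument of Φ⁻¹ never rounds to 1 in floating point.

```python
import math
import random
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # .inv_cdf plays the role of Phi^{-1}


def ag_quantile(u: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Quantile assumed here: x_u = mu + sigma * Phi^{-1}(sin^2(pi * u / 2))."""
    if not 0.0 < u < 1.0:
        raise ValueError("u must lie strictly between 0 and 1")
    if u > 0.5:
        # Symmetry about mu: Q(u) = 2 * mu - Q(1 - u).
        return 2.0 * mu - ag_quantile(1.0 - u, mu, sigma)
    p = math.sin(0.5 * math.pi * u) ** 2
    return mu + sigma * _STD_NORMAL.inv_cdf(p)


def ag_sample(n: int, mu: float = 0.0, sigma: float = 1.0, seed=None) -> list:
    """Inverse-transform sampling: apply the quantile function to uniform draws."""
    rng = random.Random(seed)
    draws = []
    while len(draws) < n:
        u = rng.random()
        if u > 0.0:  # random() can return exactly 0.0, which is outside (0, 1)
            draws.append(ag_quantile(u, mu, sigma))
    return draws
```

Setting `u = 0.5` returns `mu`, in line with the remark on the median.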

2.3 The Probability Density Function

By differentiating both sides of (3) with respect to x, one can show that the probability density function (PDF) of the AG distribution with location parameter μ and scale parameter σ is given by:

$$f(x;\mu,\sigma)=\frac{\varphi(z)}{\pi\sigma\sqrt{\Phi(z)\,\bar{\Phi}(z)}},\quad z=\frac{x-\mu}{\sigma},\quad x\in\mathbb{R}, \tag{6}$$

where φ(z), Φ(z) and Φ̄(z)=1−Φ(z) are the PDF, CDF, and SF of the standard gaussian (normal) distribution, respectively. Clearly, the distribution is symmetric and this fact is proven in the following lemma.

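Lemma 1 AG(μ,σ) is symmetric about its location parameter μ.

Proof. A continuous probability distribution is said to be symmetric about its location parameter μ if and only if f(μ−x)=f(μ+x) for all x∈R. Clearly, since φ(−z)=φ(z) and Φ(−z)=Φ̄(z),

$$f(\mu-x)=\frac{\varphi(-x/\sigma)}{\pi\sigma\sqrt{\Phi(-x/\sigma)\,\bar{\Phi}(-x/\sigma)}}=\frac{\varphi(x/\sigma)}{\pi\sigma\sqrt{\bar{\Phi}(x/\sigma)\,\Phi(x/\sigma)}}=f(\mu+x).$$

Note that additional statistical properties are addressed in the following section. Fig. 1 shows some possible shapes of the AG density for various values of μ and σ.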
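The density and its symmetry can be checked numerically. The following Python sketch assumes the PDF f(x)=φ(z)/(πσ√(Φ(z)Φ̄(z))) with z=(x−μ)/σ, which is the derivative of (2/π)arcsin(√Φ(z)); `ag_pdf` and `trapezoid` are illustrative names, not functions from the paper.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def ag_pdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """PDF assumed here: phi(z) / (pi * sigma * sqrt(Phi(z) * (1 - Phi(z))))."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (x - mu) / sigma
    lower = _STD_NORMAL.cdf(z)    # Phi(z)
    upper = _STD_NORMAL.cdf(-z)   # 1 - Phi(z), computed without cancellation
    if lower == 0.0 or upper == 0.0:
        return 0.0  # far tail: the density underflows in double precision
    return _STD_NORMAL.pdf(z) / (math.pi * sigma * math.sqrt(lower * upper))


def trapezoid(func, a: float, b: float, steps: int = 4800) -> float:
    """Composite trapezoidal rule for func on [a, b]."""
    h = (b - a) / steps
    total = 0.5 * (func(a) + func(b))
    for i in range(1, steps):
        total += func(a + i * h)
    return total * h
```

Evaluating `trapezoid(ag_pdf, -12.0, 12.0)` gives a quick check that the density integrates to one, and comparing `ag_pdf(mu - x, mu, sigma)` with `ag_pdf(mu + x, mu, sigma)` illustrates Lemma 1.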

Figure 1: The PDFs of the AG distribution for different values of μ and σ

2.4 The Hazard Rate Function

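The AG distribution is not only symmetric like the gaussian distribution, but also inherits the behavior of its hazard function (HF). That is, the HF is increasing; see Fig. 2. The HF of the AG distribution with location parameter μ and scale parameter σ is given by:

$$h(x;\mu,\sigma)=\frac{f(x;\mu,\sigma)}{S(x;\mu,\sigma)}=\frac{\varphi(z)}{\pi\sigma\sqrt{\Phi(z)\,\bar{\Phi}(z)}\left[1-\frac{2}{\pi}\arcsin\left(\sqrt{\Phi(z)}\right)\right]},\quad z=\frac{x-\mu}{\sigma}. \tag{7}$$

A probability distribution with an increasing HF is suitable for modeling lifetime data observed due to the wear-out of lifeless objects or the aging of living entities. Mathematically speaking, this can be proven as follows.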

Theorem 1 AG(μ,σ) has an increasing hazard rate.

Proof. Without loss of generality, consider the standard AG distribution. Notice that expression (7) can be rewritten as:

The term involving φ(z)/Φ̄(z), the HF of the standard gaussian distribution, is proven to be increasing by [13], while the third is the cumulative HF of the AG distribution, which is increasing by definition. Notice that: As previously mentioned, φ(z)/Φ̄(z) is increasing, and so is −[Φ(z)]⁻¹ since Φ(z) is increasing, Φ̄(z) is decreasing, and [Φ̄(z)]⁻¹ is increasing. Because the second term is increasing for all values of z, log h(z) is increasing, since all of its components are increasing by the definition of increasing functions.

According to Theorem 1, the HF of the AG model is an increasing function of x, as displayed graphically in Fig. 2 for various values of μ and σ.
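The monotonicity claimed in Theorem 1 can be inspected numerically. The sketch below assumes h(x)=f(x)/S(x) with the AG density φ(z)/(πσ√(Φ(z)Φ̄(z))) and survival function (2/π)arcsin(√Φ̄(z)); the name `ag_hazard` and the cut-off of 37 standard units are choices made here for illustration. A grid check of this kind is only a sanity test, not a substitute for the proof.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def ag_hazard(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Hazard h(x) = f(x) / S(x) of the AG distribution (forms assumed above)."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (x - mu) / sigma
    if abs(z) > 37.0:
        # Beyond this range Phi(-|z|) underflows in double precision.
        raise ValueError("|z| is too large for a stable evaluation")
    lower = _STD_NORMAL.cdf(z)    # Phi(z)
    upper = _STD_NORMAL.cdf(-z)   # 1 - Phi(z)
    density = _STD_NORMAL.pdf(z) / (math.pi * sigma * math.sqrt(lower * upper))
    survival = (2.0 / math.pi) * math.asin(math.sqrt(upper))
    return density / survival


def is_increasing_on_grid(start: float, stop: float, points: int) -> bool:
    """Check that the standard AG hazard increases along an equally spaced grid."""
    step = (stop - start) / (points - 1)
    values = [ag_hazard(start + i * step) for i in range(points)]
    return all(a < b for a, b in zip(values, values[1:]))
```

Calling `is_increasing_on_grid(-6.0, 6.0, 49)` evaluates the hazard at 49 points between −6 and 6 and reports whether the values are strictly increasing.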

2.5 The Mean Residual Life

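In reliability analysis, the mean residual life (MRL) is an important characteristic of a lifetime model. Let m(t;μ,σ) denote the MRL of the AG distribution; then:

$$m(t;\mu,\sigma)=E(X-t\mid X>t)=\frac{1}{S(t)}\int_{t}^{\infty}S(x)\,dx=\frac{1}{S(t)}\int_{t}^{\infty}x\,f(x)\,dx-t, \tag{8}$$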

Figure 2: The HRFs of the AG distribution for different values of μ and σ

where S(·) and f(·) are the SF and the PDF of the probability distribution of interest. Notice that expression (8) can be rewritten in terms of the HF as follows:

$$m(t;\mu,\sigma)=\int_{t}^{\infty}\exp\left\{-\int_{t}^{x}h(\tau;\mu,\sigma)\,d\tau\right\}dx, \tag{9}$$

where h(τ;μ,σ) is the HF of the considered distribution; see [14,15] in this connection. Hence, from expression (9), one can easily infer that the MRL in the case of the AG distribution has the opposite behavior to that of the HF, i.e., it is decreasing for all x. This observation is confirmed in Fig. 3.

Figure 3: The MRL of the AG, normal and Cauchy distributions with μ=0 and σ=1
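Since the MRL has no closed form here, it can be approximated by numerical integration. The following Python sketch assumes m(t)=(1/S(t))∫ₜ^∞ S(x)dx with S(x)=(2/π)arcsin(√Φ̄((x−μ)/σ)); the function `ag_mrl`, the truncation point μ+15σ, and the number of Simpson steps are illustrative choices rather than anything prescribed by the paper.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def ag_sf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Survival function assumed here: (2/pi) * arcsin(sqrt(1 - Phi(z)))."""
    z = (x - mu) / sigma
    return (2.0 / math.pi) * math.asin(math.sqrt(_STD_NORMAL.cdf(-z)))


def ag_mrl(t: float, mu: float = 0.0, sigma: float = 1.0,
           upper_z: float = 15.0, steps: int = 4000) -> float:
    """Mean residual life via Simpson's rule on [t, mu + upper_z * sigma]."""
    if steps % 2:
        raise ValueError("steps must be even for Simpson's rule")
    upper = mu + upper_z * sigma  # the survival function is negligible beyond this
    if t >= upper:
        raise ValueError("t is beyond the truncation point")
    h = (upper - t) / steps
    total = ag_sf(t, mu, sigma) + ag_sf(upper, mu, sigma)
    for i in range(1, steps):
        weight = 4.0 if i % 2 else 2.0
        total += weight * ag_sf(t + i * h, mu, sigma)
    return (total * h / 3.0) / ag_sf(t, mu, sigma)
```

Evaluating `ag_mrl` on an increasing sequence of `t` values gives a numerical view of the decreasing behavior described above.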

2.6 Order Statistics

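In this section, the PDF of the r-th order statistic is derived. Let X(1),...,X(n) be the order statistics for a random sample X1,...,Xn of size n from the AG distribution. It is known that the PDF of the r-th order statistic takes the form

$$f_{r:n}(x)=\frac{n!}{(r-1)!\,(n-r)!}\,[F(x)]^{r-1}\,[1-F(x)]^{n-r}\,f(x),$$

where F(x) and f(x) are the CDF and PDF of the AG distribution. From (3) and (6), the PDF of the r-th order statistic of the AG distribution is given by

$$f_{r:n}(x)=\frac{n!}{(r-1)!\,(n-r)!}\,\frac{\varphi(z)}{\pi\sigma\sqrt{\Phi(z)\,\bar{\Phi}(z)}}\left[\frac{2}{\pi}\arcsin\sqrt{\Phi(z)}\right]^{r-1}\left[1-\frac{2}{\pi}\arcsin\sqrt{\Phi(z)}\right]^{n-r},\quad z=\frac{x-\mu}{\sigma}.$$

In particular, the PDFs of the first and last order statistics can be derived directly from the last equation as follows:

$$f_{1:n}(x)=\frac{n\,\varphi(z)}{\pi\sigma\sqrt{\Phi(z)\,\bar{\Phi}(z)}}\left[1-\frac{2}{\pi}\arcsin\sqrt{\Phi(z)}\right]^{n-1}$$

and

$$f_{n:n}(x)=\frac{n\,\varphi(z)}{\pi\sigma\sqrt{\Phi(z)\,\bar{\Phi}(z)}}\left[\frac{2}{\pi}\arcsin\sqrt{\Phi(z)}\right]^{n-1}.$$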
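The order-statistic density can be coded directly from the general formula n!/((r−1)!(n−r)!)·F^(r−1)·(1−F)^(n−r)·f. The sketch below assumes the AG forms F(x)=(2/π)arcsin(√Φ(z)) and f(x)=φ(z)/(πσ√(Φ(z)Φ̄(z))); the name `ag_order_stat_pdf` is hypothetical.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def ag_order_stat_pdf(x: float, r: int, n: int,
                      mu: float = 0.0, sigma: float = 1.0) -> float:
    """PDF of the r-th order statistic in a sample of size n from AG(mu, sigma)."""
    if not 1 <= r <= n:
        raise ValueError("r must satisfy 1 <= r <= n")
    z = (x - mu) / sigma
    lower = _STD_NORMAL.cdf(z)    # Phi(z)
    upper = _STD_NORMAL.cdf(-z)   # 1 - Phi(z)
    if lower == 0.0 or upper == 0.0:
        return 0.0  # far tail: underflow in double precision
    cdf = (2.0 / math.pi) * math.asin(math.sqrt(lower))
    sf = (2.0 / math.pi) * math.asin(math.sqrt(upper))
    pdf = _STD_NORMAL.pdf(z) / (math.pi * sigma * math.sqrt(lower * upper))
    # n! / ((r - 1)! (n - r)!) equals r * C(n, r).
    constant = r * math.comb(n, r)
    return constant * cdf ** (r - 1) * sf ** (n - r) * pdf
```

With `r = 1` or `r = n` the function reduces to the densities of the sample minimum and maximum, and by symmetry the minimum's density at −x matches the maximum's density at x when μ=0.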

3 Statistical Properties of the AG Distribution

This section presents several statistical properties of the AG distribution, which are obtained from the following lemma and theorem.

Lemma 2

1. The quantile function of the standard normal distribution, namely Φ⁻¹(u), is increasing for all u∈(0,1).

2. If U ~ Beta(0.5,0.5), then −∞<E{[Φ⁻¹(U)]^k}<∞ for all k∈N.

Proof.

1. If q=Φ⁻¹(u), then u=Φ(q). Differentiating both sides of the latter equation with respect to q yields:

$$\frac{du}{dq}=\varphi(q)>0,\quad\text{so that}\quad\frac{dq}{du}=\frac{1}{\varphi(q)}>0.$$

Hence, Φ⁻¹(u) is an increasing function for all u∈(0,1).

2. Recall that Φ⁻¹(u) is increasing and let δ>0; thus, since k∈N. Hence, the proof is completed by taking the expected value on all sides of the inequality and making use of the properties of the expectation operator (see [16] in this connection), and by taking the limit δ→0.

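Theorem 2 If Z ~ AG(0,1), then the kth moment exists for k=1,2,... and it is given by:

$$E(Z^k)=\begin{cases}0, & k=1,3,\ldots,\\ \xi_k, & k=2,4,\ldots,\end{cases}$$

such that

$$\xi_k=E\left\{\left[\Phi^{-1}(U)\right]^k\right\}=\frac{1}{\pi}\int_0^1\frac{\left[\Phi^{-1}(u)\right]^k}{\sqrt{u(1-u)}}\,du,\quad U\sim\text{Beta}(0.5,0.5).$$

Proof. For any value of k, it is clear that:

$$E(Z^k)=\int_{-\infty}^{\infty}z^k f(z;0,1)\,dz=\frac{1}{\pi}\int_0^1\frac{\left[\Phi^{-1}(u)\right]^k}{\sqrt{u(1-u)}}\,du.$$

Hence, E(Z^k) is finite according to Lemma 2. However, if k is a positive odd integer (i.e., k=1,3,...), then the term

$$z^k f(z;0,1)$$

is clearly an odd function since the AG distribution is symmetric according to Lemma 1. Hence, E(Z^k)=0 for k=1,3,....

By making use of Theorem 2 and the fact that the AG distribution is a location-scale and symmetric family of distributions, its properties are straightforwardly obtained as follows.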

Corollary 1 If X ~ AG(μ,σ), then the measures of central tendency, namely the mean, the median, and the mode, are equal to μ.

Corollary 2 If X ~ AG(μ,σ), then the second, third, and fourth moments of X are given by:

E(X²)=μ²+ξ₂σ², E(X³)=μ³+3ξ₂μσ² and E(X⁴)=μ⁴+6ξ₂μ²σ²+ξ₄σ⁴,

respectively.

Corollary 3 If X ~ AG(μ,σ), then the variance is given by:

$$\text{Var}(X)=\xi_2\sigma^2.$$

Corollary 4 Let γ₁ (γ₂) denote the coefficient of skewness (kurtosis). If X ~ AG(μ,σ), then γ₁=0, while γ₂=ξ₄/ξ₂²=2.86158.

It is to be noted that the above corollaries agree with the results of [9].
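The constants ξ₂ and ξ₄ have no closed form, but they are easy to approximate numerically. The sketch below assumes ξ_k=E(Z^k) for Z ~ AG(0,1) with density φ(z)/(π√(Φ(z)Φ̄(z))), and integrates z^k f(z) on a symmetric grid; the names `ag_std_pdf`, `ag_std_moment` and `ag_kurtosis`, as well as the integration range and step, are choices made here for illustration.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def ag_std_pdf(z: float) -> float:
    """Density of AG(0, 1) assumed here: phi(z) / (pi * sqrt(Phi(z) * Phi(-z)))."""
    lower = _STD_NORMAL.cdf(z)
    upper = _STD_NORMAL.cdf(-z)
    if lower == 0.0 or upper == 0.0:
        return 0.0
    return _STD_NORMAL.pdf(z) / (math.pi * math.sqrt(lower * upper))


def ag_std_moment(k: int, half_width: float = 14.0,
                  steps_per_side: int = 2800) -> float:
    """Approximate E(Z^k) by an equally spaced sum over [-half_width, half_width].

    The grid is symmetric about zero, so odd moments cancel up to rounding error.
    """
    h = half_width / steps_per_side
    total = 0.0
    for i in range(-steps_per_side, steps_per_side + 1):
        z = i * h
        total += (z ** k) * ag_std_pdf(z)
    return total * h


def ag_kurtosis() -> float:
    """Kurtosis gamma_2 = xi_4 / xi_2^2 (location and scale cancel out)."""
    xi2 = ag_std_moment(2)
    xi4 = ag_std_moment(4)
    return xi4 / xi2 ** 2
```

The value returned by `ag_kurtosis()` can be compared with γ₂ in Corollary 4, and `ag_std_moment(2)` gives the factor ξ₂ by which the variance exceeds σ².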

4 Model Parameters Estimation

In this section, two methods are considered to estimate the parameters of the AG distribution, namely the method of moments and the maximum likelihood method.

4.1 Moments Estimators

Suppose that X1,X2,...,Xn represent a random sample from the AG distribution with location parameter μ and scale parameter σ. By employing the method of moments, the corresponding moments estimator (ME) for μ, say μ̃, is the sample mean X̄, while the ME for σ, say σ̃, is given by:

$$\tilde{\sigma}=\sqrt{\frac{1}{n\,\xi_2}\sum_{i=1}^{n}\left(X_i-\bar{X}\right)^2}. \tag{11}$$

Notice that one can obtain a Monte Carlo moments estimator (MCME) for σ based on expression (11) using Monte Carlo integration (MCI). To improve the approximation, the variance is reduced using antithetic variates. For more information about the latter method and MCI, see [17]. The MCME of σ is calculated by approximating the term ξ₂ in (11) as follows:

1. Generate a random sample U1,...,UM from Beta(0.5,0.5).
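A plain Monte Carlo version of this idea can be sketched as follows. The code assumes that ξ₂=E{[Φ⁻¹(U)]²} with U ~ Beta(0.5,0.5) and that the ME of σ is the square root of the sample variance (with divisor n) divided by ξ₂. It generates U as sin²(πV/2) with V uniform on (0,1), which has the Beta(0.5,0.5) distribution, and it does not reproduce the antithetic-variates refinement mentioned above. All function names are hypothetical.

```python
import math
import random
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def ag_std_draw(rng: random.Random) -> float:
    """Draw Z = Phi^{-1}(U) with U = sin^2(pi * V / 2), so that Z ~ AG(0, 1)."""
    v = rng.random()
    while v == 0.0:  # avoid Phi^{-1}(0)
        v = rng.random()
    if v > 0.5:
        # Symmetry: Phi^{-1}(U) = -Phi^{-1}(1 - U); keeps the argument away from 1.
        return -_STD_NORMAL.inv_cdf(math.sin(0.5 * math.pi * (1.0 - v)) ** 2)
    return _STD_NORMAL.inv_cdf(math.sin(0.5 * math.pi * v) ** 2)


def mc_xi2(num_draws: int, seed=None) -> float:
    """Monte Carlo approximation of xi_2 = E{[Phi^{-1}(U)]^2}."""
    rng = random.Random(seed)
    return sum(ag_std_draw(rng) ** 2 for _ in range(num_draws)) / num_draws


def moment_estimates(sample, xi2: float):
    """Moments estimates: sample mean for mu, sqrt(sample variance / xi_2) for sigma."""
    n = len(sample)
    mean = sum(sample) / n
    variance = sum((x - mean) ** 2 for x in sample) / n
    return mean, math.sqrt(variance / xi2)
```

A typical use is `moment_estimates(data, mc_xi2(20000, seed=7))`, where `data` is any list of observations; a deterministic quadrature value of ξ₂ could be substituted for the Monte Carlo one.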

Theorem 3 If X1,...,Xn is a random sample from AG(μ,σ), then the asymptotic joint sampling distribution of the MEs of μ and σ is a bivariate normal (BN) distribution with mean vector θ=[μ σ]ᵀ (the superscript T denotes the matrix transpose operator) and variance-covariance matrix, i.e.,

such that 0 is the zero vector and

Proof. Recall that X1,...,Xn represent a random sample (i.e., independent and identically distributed random variables) from AG(μ,σ). Suppose that:

$$M_1=\frac{1}{n}\sum_{i=1}^{n}X_i\quad\text{and}\quad M_2=\frac{1}{n}\sum_{i=1}^{n}X_i^2.$$

By the strong law of large numbers, M1 and M2 converge almost surely to E(X) and E(X²), respectively. Furthermore, by the central limit theorem, both M1 and M2 are asymptotically normally distributed. Also, any linear combination of M1 and M2, say c1M1+c2M2, is asymptotically normally distributed for all c1 and c2. Accordingly,

where

such that

and

by making use of the corollaries in the previous section. Now, the aim is to find the asymptotic joint sampling distribution of

such that ψ1(u,v)=u and ψ2(u,v)=√((v−u²)/ξ₂).

Notice that:

Hence, by making use of a Taylor series expansion, one can easily verify that

where
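The delta-method step of this proof can be worked through explicitly. The following is only a sketch: it assumes ψ2(u,v)=√((v−u²)/ξ₂), as implied by the method of moments, together with the moments E(X)=μ, E(X²)=μ²+ξ₂σ², E(X³)=μ³+3ξ₂μσ² and E(X⁴)=μ⁴+6ξ₂μ²σ²+ξ₄σ⁴. The symbols Σ and J are introduced here for illustration, and the result is derived from these assumptions rather than quoted from the original display.

```latex
% Covariance matrix of (X, X^2) for X ~ AG(mu, sigma):
%   Var(X)      = xi_2 sigma^2
%   Cov(X, X^2) = E(X^3) - E(X) E(X^2) = 2 xi_2 mu sigma^2
%   Var(X^2)    = E(X^4) - [E(X^2)]^2 = sigma^4 (xi_4 - xi_2^2) + 4 xi_2 mu^2 sigma^2
\Sigma=
\begin{bmatrix}
\xi_2\sigma^2 & 2\xi_2\mu\sigma^2\\
2\xi_2\mu\sigma^2 & \sigma^4(\xi_4-\xi_2^2)+4\xi_2\mu^2\sigma^2
\end{bmatrix}

% Jacobian of (psi_1, psi_2) evaluated at (u, v) = (E(X), E(X^2)), where psi_2 = sigma:
J=
\begin{bmatrix}
1 & 0\\
-\dfrac{\mu}{\xi_2\sigma} & \dfrac{1}{2\xi_2\sigma}
\end{bmatrix}

% Asymptotic variance-covariance matrix of the moments estimators:
\frac{1}{n}\,J\,\Sigma\,J^{\mathsf{T}}
=\frac{\sigma^2}{n}
\begin{bmatrix}
\xi_2 & 0\\
0 & \dfrac{\xi_4-\xi_2^2}{4\xi_2^2}
\end{bmatrix}
```

Under these assumptions the two moments estimators are asymptotically uncorrelated, and the asymptotic variance of the estimator of σ can also be written as σ²(γ₂−1)/(4n), with γ₂ the kurtosis from Corollary 4.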

4.2 Maximum Likelihood Estimators

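Recall the PDF of the AG distribution, which was given by expression (6). Also, suppose that x=[x1:n ··· xn:n]ᵀ are the observed order statistics. The likelihood function based on x is then

$$L(\mu,\sigma;\mathbf{x})=n!\prod_{i=1}^{n}\frac{\varphi(z_{i:n})}{\pi\sigma\sqrt{\Phi(z_{i:n})\,\bar{\Phi}(z_{i:n})}},$$

such that z_{i:n}=σ⁻¹(x_{i:n}−μ). Accordingly, the log-likelihood function is as follows:

$$\ell(\mu,\sigma;\mathbf{x})=\log n!-n\log(\pi\sigma)+\sum_{i=1}^{n}\log\varphi(z_{i:n})-\frac{1}{2}\sum_{i=1}^{n}\log\Phi(z_{i:n})-\frac{1}{2}\sum_{i=1}^{n}\log\bar{\Phi}(z_{i:n}), \tag{13}$$

where n!=n×(n−1)×···×1. From (13), the likelihood (normal) equations for μ and σ are, respectively:

$$\sum_{i=1}^{n}z_{i:n}-\frac{1}{2}\sum_{i=1}^{n}\frac{\varphi(z_{i:n})}{\bar{\Phi}(z_{i:n})}+\frac{1}{2}\sum_{i=1}^{n}\frac{\varphi(z_{i:n})}{\Phi(z_{i:n})}=0$$

and

$$\sum_{i=1}^{n}z_{i:n}^2-\frac{1}{2}\sum_{i=1}^{n}z_{i:n}\frac{\varphi(z_{i:n})}{\bar{\Phi}(z_{i:n})}+\frac{1}{2}\sum_{i=1}^{n}z_{i:n}\frac{\varphi(z_{i:n})}{\Phi(z_{i:n})}=n,$$

such that φ(z)/Φ̄(z) and φ(z)/Φ(z) are the HF and the reversed HF of the standard gaussian distribution, respectively. The latter function is decreasing for all real numbers, and this can be proven using a methodology similar to that of [13].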
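Because the likelihood equations cannot be solved in closed form, the log-likelihood has to be maximized numerically. The sketch below is one simple possibility and not the procedure used in the paper: it assumes the log-likelihood −n log(πσ)+Σ log φ(zᵢ)−½Σ log Φ(zᵢ)−½Σ log Φ̄(zᵢ) (the constant log n! is dropped because it does not affect the maximizer) and maximizes it with a basic compass search started from the sample median and standard deviation. The helper `ag_draw` is only there to generate demonstration data; all names are hypothetical.

```python
import math
import random
from statistics import NormalDist, median, stdev

_STD_NORMAL = NormalDist()
_LOG_SQRT_2PI = 0.5 * math.log(2.0 * math.pi)


def ag_loglik(mu: float, sigma: float, data) -> float:
    """AG log-likelihood up to the additive constant log(n!)."""
    if sigma <= 0.0:
        return -math.inf
    total = -len(data) * math.log(math.pi * sigma)
    for x in data:
        z = (x - mu) / sigma
        lower, upper = _STD_NORMAL.cdf(z), _STD_NORMAL.cdf(-z)
        if lower <= 0.0 or upper <= 0.0:
            return -math.inf  # numerically outside the representable range
        log_phi = -0.5 * z * z - _LOG_SQRT_2PI
        total += log_phi - 0.5 * (math.log(lower) + math.log(upper))
    return total


def ag_mle(data, tol: float = 1e-6):
    """Compass (pattern) search over (mu, sigma); a derivative-free local maximizer."""
    mu, sigma = median(data), stdev(data)
    step = 0.5 * sigma
    best = ag_loglik(mu, sigma, data)
    while step > tol:
        improved = False
        for d_mu, d_sigma in ((step, 0.0), (-step, 0.0), (0.0, step), (0.0, -step)):
            candidate = ag_loglik(mu + d_mu, sigma + d_sigma, data)
            if candidate > best:
                mu, sigma, best = mu + d_mu, sigma + d_sigma, candidate
                improved = True
                break
        if not improved:
            step *= 0.5  # no neighbor is better: refine the search grid
    return mu, sigma


def ag_draw(rng: random.Random, mu: float, sigma: float) -> float:
    """Demonstration helper: inverse-transform draw from AG(mu, sigma)."""
    v = rng.random()
    while v == 0.0:
        v = rng.random()
    if v > 0.5:
        return mu - sigma * _STD_NORMAL.inv_cdf(math.sin(0.5 * math.pi * (1.0 - v)) ** 2)
    return mu + sigma * _STD_NORMAL.inv_cdf(math.sin(0.5 * math.pi * v) ** 2)
```

A compass search is slow compared with Newton-type methods, but it needs no derivatives and keeps the example short; any general-purpose optimizer applied to `ag_loglik` would serve the same purpose.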

Clearly, the values of the maximum likelihood estimators (MLEs) need to be determined numerically. Nevertheless, by following an approach similar to that of [13], one can derive the following approximate MLEs (AMLEs) for both μ and σ based on the observed order statistics x1:n,...,xn:n:

such that

while

where

and

Unfortunately, it is not easy to derive the exact distributions of the MLEs and their approximated counterparts. However, one can derive asymptotic confidence intervals for the model parameters under some regularity conditions. For more information about the asymptotic properties of the MLEs, see [16]. Now, according to the latter reference, as n→∞:

and

5 Simulation Outcomes

To compare the estimation methods in terms of efficiency, extensive Monte Carlo (MC) simulations are carried out. The outcomes of the MC simulation study are reported in this section. Without loss of generality, the standard AG distribution is considered, while the sample sizes of interest are n=10(10)100, i.e., n=10,20,...,100. The numerical results of this study were determined from 10,000 MC simulation runs. This number of simulations gives an accuracy of the order ±(10,000)^(−0.5)=±0.01 (see [18]); all numerical outcomes of this study are reported to three decimal digits.

Tab. 1 presents the outcomes associated with the estimators of μ, namely the ME, the AMLE, and the MLE. On the other hand, Tab. 2 summarizes the simulation results of the estimators of σ, namely the MCME, the ME, the AMLE, and the MLE. By making use of the asymptotic distributions of all considered point estimators, 90% and 95% asymptotic confidence intervals are obtained and evaluated according to their observed coverage probabilities and lengths. Interestingly, all estimators had similar performance indicators. Furthermore, the simulated variances of the estimators and their counterparts calculated from the variance-covariance matrix (VCM) of the asymptotic joint sampling distributions were quite close.

Table 1: Bias and variance for the estimators of μ alongside the observed coverage probability (CPr) and length (L) of 90% and 95% confidence intervals (CIs)

Table 2: Bias and variance for the estimators of σ alongside the observed coverage probability (CPr) and length (L) of 90% and 95% confidence intervals (CIs)

In terms of estimation efficiency, as n→∞, both the biases and variances of the estimators tend to zero, i.e., the estimators are asymptotically consistent. Moreover, as the sample size increases, the coverage probabilities of the confidence intervals increase to the nominal levels, while the corresponding lengths decrease and approach zero.

6 Data Analysis

In this section, two real data sets from the medical field are analyzed to illustrate the application of the AG distribution in practice. The first data set represents the lifetimes (in days) of 39 patients suffering from liver cancer; the data were reported by the Elminia Cancer Center, Ministry of Health, Egypt, in 1999 [19]. The data are: 10, 14, 14, 14, 14, 14, 15, 17, 18, 20, 20, 20, 20, 20, 23, 23, 24, 26, 30, 30, 31, 40, 49, 51, 52, 60, 61, 67, 71, 74, 75, 87, 96, 105, 107, 107, 107, 116, 150.

The second data set consists of the lifetimes of 20 leukemia patients who were treated with a certain drug [20,21]. The data are: 1.013, 1.034, 1.109, 1.169, 1.226, 1.509, 1.533, 1.563, 1.716, 1.929, 1.965, 2.061, 2.344, 2.546, 2.626, 2.778, 2.951, 3.413, 4.118, 5.136.

Practically, the logarithmic counterparts of the log-normal and log-logistic lifetime models are the normal distribution and the logistic distribution, respectively. Hence, the logarithms of the data are analyzed using the latter distributions alongside the Cauchy distribution and the AG distribution.

To determine which model fits the log-data appropriately, the negative observed log-likelihood (−ℓ), the Akaike information criterion (AIC=2k−2ℓ) [22] and the Bayesian information criterion (BIC=k log(n)−2ℓ) [23], where k is the number of model parameters and n is the sample size, are calculated for each model as shown in Tabs. 3 and 4. For both data sets, the results of the AG distribution are very close to those of the normal, logistic, and Cauchy distributions. We conclude that the AG and normal distributions have outperformed the remaining ones.

Table 3: Estimators of the location μ and scale σ2 parameters with associated SEs and the corresponding information criteria for liver cancer data

Table 4: Estimators of the location μ and scale σ2 parameters with associated SEs and the corresponding information criteria for leukemia data
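The information criteria used for the comparison are simple functions of the maximized log-likelihood. The following Python sketch implements AIC=2k−2ℓ and BIC=k log(n)−2ℓ and applies them to a normal fit of the log-transformed leukemia data listed above. The normal model is used only because its maximum likelihood estimates are available in closed form; the function names are illustrative, and no output values are quoted here since the fitted tables are not reproduced in this text.

```python
import math
from statistics import NormalDist, fmean, pstdev

# Leukemia lifetimes as listed in the data analysis section.
LEUKEMIA = [1.013, 1.034, 1.109, 1.169, 1.226, 1.509, 1.533, 1.563, 1.716, 1.929,
            1.965, 2.061, 2.344, 2.546, 2.626, 2.778, 2.951, 3.413, 4.118, 5.136]


def aic(loglik: float, k: int) -> float:
    """Akaike information criterion: 2k - 2 * log-likelihood."""
    return 2.0 * k - 2.0 * loglik


def bic(loglik: float, k: int, n: int) -> float:
    """Bayesian information criterion: k * log(n) - 2 * log-likelihood."""
    return k * math.log(n) - 2.0 * loglik


def normal_loglik(data) -> float:
    """Maximized normal log-likelihood (ML estimates: mean and population sd)."""
    fitted = NormalDist(fmean(data), pstdev(data))
    return sum(math.log(fitted.pdf(x)) for x in data)


log_data = [math.log(x) for x in LEUKEMIA]
loglik_normal = normal_loglik(log_data)
aic_normal = aic(loglik_normal, k=2)
bic_normal = bic(loglik_normal, k=2, n=len(log_data))
```

The same two functions can be applied to the maximized log-likelihood of any competing model, including the AG distribution, with k=2 for a location-scale model.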

Furthermore, the empirical survival function (ESF) and the theoretical survival functions (TSF) of the AG, normal, logistic, and Cauchy distributions are compared graphically, for the two real data sets, in Fig. 4. The fitted functions of the AG model for the two real data sets, including the PDF, CDF, SF and PP plots, are displayed in Fig. 5.

Figure 4: ESF vs. TSF of the compared distributions for liver cancer data (left) and leukemia data (right)

Figure 5: The fitted functions of the AG distribution for liver cancer data (left) and leukemia data (right)

7 Conclusions

In this paper, the AG distribution is considered. The relation between this distribution and the beta-normal distribution is the same as the relation between the arcsine distribution and the beta distribution. Both the distributional and the statistical properties of the AG distribution are intuitive and easy to verify. Point estimators for the corresponding model parameters have been obtained using the method of moments and the maximum likelihood method, and their asymptotic sampling distributions were discussed as well. In terms of performance, a simulation study has been conducted, and its outcomes indicated that both point and interval estimators are quite similar in terms of efficiency and are asymptotically consistent as the sample size increases. In terms of data analysis, the AG distribution provided a better fit to the considered data sets.

Availability of Data and Materials: The data sets used in this paper are provided within the main body of the manuscript.

Funding Statement: The authors received no specific funding for this study.

Conflicts of Interest: The authors declare that they have no conflicts of interest to report regarding the present study.
