Bayesian Finite Mixture Negative Binomial Model for Over-dispersed Count Data with Application to DMFT Index Data
International Journal of Data Science and Analysis
Volume 5, Issue 5, October 2019, Pages: 104-110
Received: Oct. 8, 2019; Accepted: Oct. 23, 2019; Published: Oct. 30, 2019
Authors
Kipngetich Gideon, Department of Statistics and Actuarial Science, Jomo Kenyatta University of Agriculture and Technology, Nairobi, Kenya
Anthony Wanjoya, Department of Statistics and Actuarial Science, Jomo Kenyatta University of Agriculture and Technology, Nairobi, Kenya
Samuel Mwalili, Department of Statistics and Actuarial Science, Jomo Kenyatta University of Agriculture and Technology, Nairobi, Kenya
Abstract
The DMFT index is important in oral health studies, but establishing a viable statistical model for DMFT index data is difficult when the data are over-dispersed. Over-dispersion caused by unobserved heterogeneity makes it hard to fit the more common count data models, and failure to account for such heterogeneity can undermine the validity of the empirical results. Because other count data models are limited in their ability to account for over-dispersion arising from heterogeneity in DMFT index data, this paper formulates an alternative model that captures the heterogeneity: the Bayesian finite mixture negative binomial (BFMNB) regression model. The model is first applied to simulated over-dispersed count data to determine the number of negative binomial components to be mixed, and is then applied to DMFT index data. The 3-component model (BFMNB-3) is useful because the data were collected from a heterogeneous population. Simulation results show that the 3-component Bayesian finite mixture of NB regression models converges and is sufficient to model the simulated over-dispersed count data. When applied to the DMFT index data, BFMNB-3 captures the heterogeneity in the data and identifies three methods as the best for preventing tooth decay in seven-year-old children in Belo Horizonte (Brazil): all the treatments together, mouthwash with 0.2% sodium fluoride, and oral hygiene. The Bayesian negative binomial (BNB) model, because of the heterogeneity present across methods, identifies only all the treatments together and mouthwash with 0.2% sodium fluoride as the best methods, although these two were not the only significant methods. The results therefore show that BFMNB-3 is superior to the BNB model.
R statistical software was used to accomplish the objectives of this paper.
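As a rough illustration of why mixing negative binomial components produces over-dispersion, the sketch below evaluates a 3-component negative binomial mixture and compares its variance with its mean. It is written in Python rather than the R used in the paper, it is not the authors' model (there are no covariates, priors, or MCMC), and the weights, means, and dispersion values are hypothetical numbers chosen only for illustration. The mean/dispersion form of the negative binomial (variance = mu + mu^2 / r) is also an assumption, since the abstract does not state which parameterization was used.

```python
import math

def nb_pmf(y, mu, r):
    """Negative binomial pmf in mean/dispersion form (Var = mu + mu^2 / r)."""
    log_p = (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
             + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu)))
    return math.exp(log_p)

def mixture_pmf(y, weights, mus, rs):
    """Weighted sum of the component pmfs evaluated at count y."""
    return sum(w * nb_pmf(y, mu, r) for w, mu, r in zip(weights, mus, rs))

def mixture_mean_var(weights, mus, rs):
    """Mean and variance of the mixture from its first two moments."""
    mean = sum(w * mu for w, mu in zip(weights, mus))
    # E[Y^2] of each component is its variance plus its squared mean.
    second = sum(w * (mu + mu**2 / r + mu**2)
                 for w, mu, r in zip(weights, mus, rs))
    return mean, second - mean**2

# Hypothetical parameters, not estimates from the paper.
weights = [0.5, 0.3, 0.2]   # mixing proportions, sum to 1
mus = [1.0, 4.0, 9.0]       # component means
rs = [2.0, 3.0, 5.0]        # component dispersion (size) parameters

mean, var = mixture_mean_var(weights, mus, rs)
total = sum(mixture_pmf(y, weights, mus, rs) for y in range(400))

print("mixture mean:", mean)
print("mixture variance:", var)
print("variance / mean:", var / mean)
print("pmf mass on 0..399:", total)
```

A variance-to-mean ratio well above 1 is the signature of over-dispersion; a single Poisson model would force that ratio to equal 1.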
Keywords
BFMNB-3 Model, DMFT Index Data, BNB
To cite this article
Kipngetich Gideon, Anthony Wanjoya, Samuel Mwalili, Bayesian Finite Mixture Negative Binomial Model for Over-dispersed Count Data with Application to DMFT Index Data, International Journal of Data Science and Analysis. Vol. 5, No. 5, 2019, pp. 104-110. doi: 10.11648/j.ijdsa.20190505.15
Copyright
Copyright © 2019 Authors retain the copyright of this article.
This article is an open access article distributed under the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/) which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.