Statistical practice requires addressing various imperfections that stem from the nature of the data. Data affected by different types of measurement error and by irregularities such as missing observations have to be modelled accordingly. This paper studies the application of the expectation-maximisation (EM) algorithm to the calculation of maximum likelihood estimates, using an autoregressive model as an example. The model describes a process that is observed only through measurements of limited precision and through more than one data series. The studied series are affected by measurement error and are interrupted in some periods, which makes the information available for parameter estimation, and later for prediction, less precise. The presented technique aims to compensate for such missing data in time series, which take the form of breaks in the source of the signal. To this end, the EM algorithm has been extended to a hybrid version supplemented by the Newton-Raphson method, which allows more complex models to be estimated. The paper outlines the formulation of the substantive model of an autoregressive process affected by noise, as well as the adjustment introduced to overcome the issue of missing data. The extended algorithm has been verified using data sampled from a model representing the examined process. The verification demonstrated that the combined EM and Newton-Raphson algorithm converged within a relatively small number of iterations and recovered the information lost due to missing data, providing more accurate predictions than the original algorithm. The study also presents an application of the extended algorithm to empirical data: forecasting the demand for newspapers.
missing data, multivariate time series, expectation-maximisation algorithm, Newton-Raphson algorithm
C13, C19, C61
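The estimation problem summarised in the abstract can be illustrated with a minimal sketch of a plain EM iteration for a state-space model. It assumes a scalar AR(1) latent process observed through several series with independent Gaussian measurement noise, a fixed initial state distribution, and missing entries coded as `None`. The E-step runs a Kalman filter (skipping the update for missing entries) and a Rauch-Tung-Striebel smoother; the M-step uses closed-form updates. It does not reproduce the paper's Newton-Raphson extension, because every M-step in this simplified model has a closed form. The function names (`kalman_smoother`, `em_fit`, `forecast`), the starting values and the simulated data are hypothetical illustrations, not the author's code.

```python
import math
import random


def kalman_smoother(y, phi, q, r, mu0=0.0, s0=1.0):
    """Filter and smooth the latent AR(1) state x_t = phi * x_{t-1} + w_t,
    observed through several series y[t][j] = x_t + v_tj (None = missing).

    Returns smoothed means and variances (index 0 is the initial state),
    lag-one smoothed covariances and the log-likelihood of the observed data.
    """
    n = len(y)
    xp, pp = [0.0] * (n + 1), [0.0] * (n + 1)      # one-step predictions
    xf, pf = [mu0] + [0.0] * n, [s0] + [0.0] * n   # filtered moments
    loglik = 0.0
    for t in range(1, n + 1):
        xp[t] = phi * xf[t - 1]
        pp[t] = phi * phi * pf[t - 1] + q
        x, p = xp[t], pp[t]
        # Assumed diagonal noise covariance: update one series at a time and
        # skip missing entries, so a break in a series simply means no update.
        for j, obs in enumerate(y[t - 1]):
            if obs is None:
                continue
            s, e = p + r[j], obs - x               # innovation variance, innovation
            loglik -= 0.5 * (math.log(2 * math.pi * s) + e * e / s)
            k = p / s                              # Kalman gain
            x, p = x + k * e, (1 - k) * p
        xf[t], pf[t] = x, p
    # Rauch-Tung-Striebel backward pass.
    xs, ps = xf[:], pf[:]
    pcov = [0.0] * (n + 1)                         # Cov(x_t, x_{t-1} | all data)
    for t in range(n, 0, -1):
        g = pf[t - 1] * phi / pp[t]
        xs[t - 1] = xf[t - 1] + g * (xs[t] - xp[t])
        ps[t - 1] = pf[t - 1] + g * g * (ps[t] - pp[t])
        pcov[t] = g * ps[t]
    return xs, ps, pcov, loglik


def em_fit(y, phi=0.5, q=1.0, iters=100):
    """Hypothetical helper: closed-form EM updates for (phi, q, r_1..r_m)."""
    n, m = len(y), len(y[0])
    r = [1.0] * m
    history = []
    for _ in range(iters):
        # E-step: smoothed moments under the current parameters.
        xs, ps, pcov, loglik = kalman_smoother(y, phi, q, r)
        history.append(loglik)
        s11 = sum(xs[t] ** 2 + ps[t] for t in range(1, n + 1))
        s10 = sum(xs[t] * xs[t - 1] + pcov[t] for t in range(1, n + 1))
        s00 = sum(xs[t - 1] ** 2 + ps[t - 1] for t in range(1, n + 1))
        # M-step: maximisers of the expected complete-data log-likelihood.
        phi = s10 / s00
        q = (s11 - phi * s10) / n
        for j in range(m):
            # Only the observed entries of series j enter its complete data.
            terms = [(row[j] - xs[t + 1]) ** 2 + ps[t + 1]
                     for t, row in enumerate(y) if row[j] is not None]
            r[j] = sum(terms) / len(terms)
    return phi, q, r, history


def forecast(x_last, p_last, phi, q, h):
    """h-step-ahead means and variances of the latent state."""
    out = []
    for _ in range(h):
        x_last, p_last = phi * x_last, phi * phi * p_last + q
        out.append((x_last, p_last))
    return out


# Demo on simulated data: two noisy series of one AR(1) process, with a
# break in series 0 and scattered gaps in series 1.
rng = random.Random(1)
x, data = 0.0, []
for t in range(400):
    x = 0.8 * x + rng.gauss(0.0, 1.0)
    row = [x + rng.gauss(0.0, math.sqrt(0.5)), x + rng.gauss(0.0, math.sqrt(2.0))]
    if 100 <= t < 150:
        row[0] = None
    if t % 7 == 0:
        row[1] = None
    data.append(row)

phi, q, r, hist = em_fit(data)
xs, ps, _, _ = kalman_smoother(data, phi, q, r)
print(round(phi, 3), round(q, 3), [round(v, 3) for v in r])
print(forecast(xs[-1], ps[-1], phi, q, 3))
```

Running the demo prints the fitted parameters and a three-step forecast of the latent process. The recorded log-likelihoods should never decrease from one iteration to the next, which is a quick check that the E- and M-steps are consistent. Processing the series one at a time relies on the measurement errors being independent across series; with correlated errors, the full matrix form of the update would be needed.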