Technical note: A procedure to clean, decompose, and aggregate time series
[Abstract] Errors, gaps, and outliers complicate and sometimes invalidate the analysis of time series. While most fields have developed their own strategies to clean raw data, no generic procedure has been promoted to standardize pre-processing. This lack of harmonization makes the inter-comparison of studies difficult and leads to screening methods that can be arbitrary or case-specific. This study provides a generic pre-processing procedure implemented in R (ctbi, for cyclic/trend decomposition using bin interpolation) dedicated to univariate time series. Ctbi is based on data binning and decomposes the time series into a long-term trend and a cyclic component (quantified by a new metric, the Stacked Cycles Index) before aggregating the data. Outliers are flagged with an enhanced box plot rule called Logbox, which corrects biases due to sample size and is adapted to non-Gaussian residuals. Three different Earth science datasets (contaminated with gaps and outliers) are successfully cleaned and aggregated with ctbi. This illustrates the robustness of the procedure, which can be valuable to any discipline.
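The general bin, detrend, flag, aggregate workflow described in the abstract can be illustrated with a minimal Python sketch. It is not the ctbi package or its API. The function name `bin_detrend_flag` and its parameters are hypothetical, the per-bin median is a crude stand-in for ctbi's interpolated long-term trend, and outliers are flagged with the classic box plot rule (fences at 1.5 times the interquartile range) rather than Logbox, whose exact form is not given here. The cyclic component and the Stacked Cycles Index are not computed.

```python
import math
import statistics


def bin_detrend_flag(times, values, bin_width, k=1.5):
    """Illustrative only: bin the series, subtract each bin's median,
    flag residual outliers with the classic box plot rule (not Logbox),
    and aggregate the remaining points by bin mean."""
    # Group the indices of non-missing points by bin number.
    bins = {}
    for i, (t, v) in enumerate(zip(times, values)):
        if v is None or math.isnan(v):
            continue  # gaps are skipped, not filled
        bins.setdefault(int(t // bin_width), []).append(i)

    # Residual = value minus the median of its own bin.
    residuals = {}
    for idx in bins.values():
        med = statistics.median([values[i] for i in idx])
        for i in idx:
            residuals[i] = values[i] - med

    # Classic box plot fences computed on the pooled residuals.
    q1, _, q3 = statistics.quantiles(list(residuals.values()), n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    outliers = sorted(i for i, r in residuals.items() if r < low or r > high)

    # Aggregate: mean of the non-flagged points in each bin,
    # keyed by the bin's start time.
    flagged = set(outliers)
    aggregates = {}
    for b, idx in sorted(bins.items()):
        kept = [values[i] for i in idx if i not in flagged]
        if kept:
            aggregates[b * bin_width] = statistics.mean(kept)
    return outliers, aggregates
```

Called on a series of 20 points with one gap and one spike, with `bin_width=5`, the function returns the index of the spike and one mean per bin. A fixed constant `k` is the weakness the abstract attributes to the plain box plot rule: it ignores sample size and assumes roughly Gaussian residuals.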
[Subject classification] Obstetrics and Gynecology