Data cleaning and missing values: time series

 How to Deal with Missing Data:

  • First, try to extract meaningful insights from understanding why the data is missing.
  • Try to identify the causes. Some common causes are:
    • Human errors
    • Machine errors (software)
    • Refusal to answer or provide data
    • Drop-outs
    • Data transfer and conversion
    • Merging of unrelated datasets
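
Before looking for causes, it helps to measure how much is missing and where the gaps sit in the series. The following is a minimal sketch in plain Python, assuming missing values are stored as None; the function name summarize_missing and the sample readings are hypothetical and only serve as illustration.

```python
from itertools import groupby

def summarize_missing(values):
    """Return the count, share, and gap lengths of missing (None) entries."""
    is_missing = [v is None for v in values]
    # Length of each run of consecutive missing values, in order of appearance.
    gaps = [sum(1 for _ in run) for missing, run in groupby(is_missing) if missing]
    n_missing = sum(is_missing)
    return {
        "n_missing": n_missing,
        "share_missing": n_missing / len(values) if values else 0.0,
        "gap_lengths": gaps,
        "longest_gap": max(gaps, default=0),
    }

# Hypothetical hourly sensor readings; None marks a missing value.
readings = [20.1, None, None, 20.7, 21.0, None, 21.4, 21.9]
summary = summarize_missing(readings)
print(summary)
```

Long gaps that line up with a known event (a sensor outage, a file conversion, a dataset merge) often point to the cause more directly than the overall share of missing values does.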

Implications

  • Missing data can reduce overall statistical power and bias the estimates.
  • It can lead to incorrect conclusions.

Patterns of missing data

  • univariate/multivariate
  • monotone (common in longitudinal studies)
  • non-monotone
  • connected/unconnected
  • planned
  • random 
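
As an illustration of one of these patterns, a monotone pattern is usually taken to mean that once a value is missing for a subject, every later measurement for that subject is missing too (typical of drop-outs). A minimal sketch of such a check, assuming each row holds one subject's measurements in time order with None for missing; is_monotone and the sample rows are hypothetical.

```python
def is_monotone(rows):
    """True if, in every row, no observed value follows a missing one."""
    for row in rows:
        seen_missing = False
        for value in row:
            if value is None:
                seen_missing = True
            elif seen_missing:
                # An observed value after a gap breaks the monotone pattern.
                return False
    return True

# Hypothetical longitudinal data: one row per subject, one column per visit.
dropout_rows = [[1.0, 2.0, 3.0], [1.5, 2.5, None], [0.9, None, None]]
intermittent_rows = [[1.0, None, 3.0]]

print(is_monotone(dropout_rows))
print(is_monotone(intermittent_rows))
```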

 Types of missing data

  •  Missing Completely At Random (MCAR)
  •  Missing At Random (MAR)
  •  Missing Not At Random (MNAR)
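
The three types differ in what drives the missingness: under MCAR it is unrelated to any data, under MAR it depends only on values that were observed, and under MNAR it depends on the missing value itself. The simulation below is a rough sketch of that difference, not a formal test. The variables x and y, the missingness probabilities, and the sample size are all assumptions chosen for illustration.

```python
import random
from statistics import mean

random.seed(0)  # fixed seed so the run is repeatable

n = 20_000
# Hypothetical data: x is always observed, y is the variable that may go missing.
x = [random.gauss(0, 1) for _ in range(n)]
y = [xi + random.gauss(0, 1) for xi in x]

def observed_mean(values, keep):
    """Mean of the values that were not removed (complete-case mean)."""
    return mean(v for v, k in zip(values, keep) if k)

# MCAR: every value has the same 30% chance of being missing.
mcar_keep = [random.random() > 0.3 for _ in range(n)]
# MAR: missingness depends only on the observed x (more likely when x > 0).
mar_keep = [random.random() > (0.6 if xi > 0 else 0.1) for xi in x]
# MNAR: missingness depends on the unobserved y itself (more likely when y > 0).
mnar_keep = [random.random() > (0.6 if yi > 0 else 0.1) for yi in y]

true_mean = mean(y)
mcar_mean = observed_mean(y, mcar_keep)
mar_mean = observed_mean(y, mar_keep)
mnar_mean = observed_mean(y, mnar_keep)

print(f"full data: {true_mean:.3f}")
print(f"MCAR:      {mcar_mean:.3f}")
print(f"MAR:       {mar_mean:.3f}")
print(f"MNAR:      {mnar_mean:.3f}")
```

In this setup the MCAR mean should stay close to the full-data mean, while the MAR and MNAR means shift downward. The MAR shift can in principle be corrected by using x, whereas the MNAR shift cannot be corrected without assumptions about why the values are missing.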

 Handling approaches

  • Drop observations with missing values
  • Imputation
    • Single-value (mean, a constant value x, etc.)
    • Multiple imputation (using the data distribution)
    • Hot-deck imputation (using values from matching non-missing cases)
    • Last Observation Carried Forward (LOCF)
  • Model-based methods
    • Interpolation
    • K-Nearest Neighbors 
    • MICE (Multivariate Imputation by Chained Equations)
    • Regression – Linear, Logistic and Stochastic
    • Support vector machine
    • Decision Tree
    • Clustering imputation
    • Ensemble methods
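
Four of the simpler approaches above (dropping, mean imputation, LOCF, and linear interpolation) can be sketched in plain Python for a single evenly spaced series. This is a minimal illustration, assuming missing values are stored as None; the function names and the sample series are hypothetical.

```python
from statistics import mean

def drop_missing(series):
    """Drop observations with missing values."""
    return [v for v in series if v is not None]

def mean_impute(series):
    """Single-value imputation: replace each gap with the observed mean."""
    fill = mean(v for v in series if v is not None)
    return [fill if v is None else v for v in series]

def locf(series):
    """Last Observation Carried Forward; leading gaps stay None."""
    filled, last = [], None
    for v in series:
        if v is not None:
            last = v
        filled.append(last)
    return filled

def linear_interpolate(series):
    """Fill interior gaps on a straight line between neighbours; edge gaps stay None."""
    filled = list(series)
    known = [i for i, v in enumerate(series) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (series[right] - series[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = series[left] + step * (i - left)
    return filled

# Hypothetical evenly spaced series with two gaps.
series = [10.0, None, None, 16.0, None, 20.0]

print(drop_missing(series))
print(mean_impute(series))
print(locf(series))
print(linear_interpolate(series))
```

For a time series, dropping rows breaks the regular spacing and mean imputation ignores the ordering, so LOCF or interpolation is often the more natural starting point. In pandas, the corresponding calls would be dropna, fillna, ffill, and interpolate.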


 Reference sources:

Princeton University. (n.d.). In R: Missing data. Princeton University. Retrieved from https://libguides.princeton.edu/R-Missingdata

Alayo, B. (2023, February 12). Missing data: Causes, types, and handling techniques. LinkedIn. Retrieved from https://www.linkedin.com/pulse/missing-data-causes-types-handling-techniques-bilikis-alayo-ho9if/ 

Masters in Data Science. (n.d.). How to deal with missing data. Retrieved from https://www.mastersindatascience.org/learning/how-to-deal-with-missing-data/

Cook, A. B. (2023, August 12). Missing values. Kaggle. Retrieved from https://www.kaggle.com/code/alexisbcook/missing-values 

National Center for Biotechnology Information. (2019). Types of missing data. In NCBI Bookshelf. Retrieved from https://www.ncbi.nlm.nih.gov/books/NBK493614/


 
