A Comparative Analysis of Time Series Prediction Techniques a Systematic Literature Review (SLR)

  • Conference paper
  • First Online: 22 December 2023

  • Sawssen Briki,
  • Nesrine Khabou &
  • Ismael Bouassida Rodriguez

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 14396))

Included in the following conference series:

  • International Conference on Model and Data Engineering

This paper highlights the significance of systematic literature reviews and explores the different techniques employed in these reviews, including statistical methods, machine learning, deep learning, and hybrid methods. The study aims to understand the performance and effectiveness of these techniques in the context of literature reviews. Statistical methods offer quantitative insights and analysis, while machine learning and deep learning techniques enable automation and uncover complex patterns in large volumes of data. However, hybrid methods, which integrate multiple techniques, have shown superior performance in systematic literature reviews, combining the strengths of different methodologies to achieve more comprehensive and accurate outcomes. Further development and refinement of hybrid methods can enhance the quality and effectiveness of literature review processes.

Acknowledgments

This work was partially supported by the LABEX-TA project MeFoGL: “Méthodes Formelles pour le Génie Logiciel”.

Author information

Authors and affiliations.

ReDCAD, ENIS, University of Sfax, Sfax, Tunisia

Sawssen Briki, Nesrine Khabou & Ismael Bouassida Rodriguez

Corresponding author

Correspondence to Sawssen Briki.

Editor information

Editors and affiliations.

Bordeaux INP, Talence, France

Mohamed Mosbah

Dublin City University, Dublin, Ireland

Tahar Kechadi

ENSMA, Poitiers, France

Ladjel Bellatreche

University of Sfax, Sfax, Tunisia

Faiez Gargouri

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper.

Briki, S., Khabou, N., Bouassida Rodriguez, I. (2024). A Comparative Analysis of Time Series Prediction Techniques a Systematic Literature Review (SLR). In: Mosbah, M., Kechadi, T., Bellatreche, L., Gargouri, F. (eds) Model and Data Engineering. MEDI 2023. Lecture Notes in Computer Science, vol 14396. Springer, Cham. https://doi.org/10.1007/978-3-031-49333-1_1

DOI : https://doi.org/10.1007/978-3-031-49333-1_1

Published : 22 December 2023

Publisher Name : Springer, Cham

Print ISBN : 978-3-031-49332-4

Online ISBN : 978-3-031-49333-1

eBook Packages : Computer Science, Computer Science (R0)

Applications of time series analysis in epidemiology: Literature review and our experience during COVID-19 pandemic

Affiliations.

  • 1 Department of Informatics, New Bulgarian University, Sofia 1618, Bulgaria.
  • 2 Department of Diagnostic Imaging, Medical University Plovdiv, Plovdiv 4000, Bulgaria.
  • 3 Department of Genetics, Faculty of Biology, Sofia University "St. Kliment Ohridski", Sofia 1164, Bulgaria.
  • 4 Department of Epidemiology and Disaster Medicine, Medical University, University Hospital "St George", Plovdiv 4000, Bulgaria.
  • 5 Department of Medical Faculty, Sofia University, St. Kliment Ohridski, Sofia 1407, Bulgaria.
  • PMID: 37946767
  • PMCID: PMC10631421
  • DOI: 10.12998/wjcc.v11.i29.6974

Time series analysis is a valuable tool in epidemiology that complements the classical epidemiological models in two different ways: prediction and forecasting. Prediction is related to explaining past and current data based on various internal and external influences that may or may not have a causative role. Forecasting is an exploration of the possible future values based on the predictive ability of the model and hypothesized future values of the external and/or internal influences. The time series analysis approach has the advantage of being easier to use (in the case of more straightforward, linear models such as Auto-Regressive Integrated Moving Average). Still, it is limited in its forecasting horizon, unlike classical models such as Susceptible-Exposed-Infectious-Removed. Its applicability in forecasting comes from its better accuracy for short-term prediction. In its basic form, it does not assume much theoretical knowledge of the mechanisms of spreading and mutating pathogens or of the reaction of people and regulatory structures (governments, companies, etc.). Instead, it estimates from the data directly. Its predictive ability allows testing hypotheses about different factors that positively or negatively contribute to the pandemic spread, be it school closures, emerging variants, etc. It can be used in mortality or hospital risk estimation from new cases, seroprevalence studies, assessing properties of emerging variants, and estimating excess mortality and its relationship with a pandemic.

Keywords: Auto-regressive integrated moving average; COVID-19; Epidemiology; Excess mortality; Pandemic; Seroprevalence; Time series analysis.

©The Author(s) 2023. Published by Baishideng Publishing Group Inc. All rights reserved.

Conflict of interest statement

Conflict-of-interest statement: The authors declare no conflict of interest.

Trop Med Health, v.43(1); 2015 Mar

A Systematic Review of Methodology: Time Series Regression Analysis for Environmental Factors and Infectious Diseases

Chisato Imai

1 Department of Pediatric Infectious Diseases, Institute of Tropical Medicine, Nagasaki University, 1-12-4 Sakamoto, Nagasaki, Japan 852-8523 (CI and MH)

2 Research Fellow of Japan Society for the Promotion of Science, Japan

Masahiro Hashizume

Background: Time series analysis is suitable for investigating relatively direct and short-term effects of exposures on outcomes. In environmental epidemiology, it has been one of the standard approaches for assessing the impacts of environmental factors on acute non-infectious diseases (e.g. cardiovascular deaths), conventionally with generalized linear or generalized additive models (GLMs and GAMs). However, the same analysis practices are often applied to infectious diseases despite substantial differences from non-infectious diseases that may result in analytical challenges. Methods: Following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines, a systematic review was conducted to elucidate important issues in assessing the associations between environmental factors and infectious diseases using time series analysis with GLMs and GAMs. Published studies on the associations between weather factors and malaria, cholera, dengue, and influenza were targeted. Findings: Our review raised issues regarding the estimation of the susceptible population and exposure lag times, the adequacy of seasonal adjustments, the presence of strong autocorrelations, and the lack of a smaller observation time unit for outcomes (i.e. daily data). These concerns may be attributable to features specific to infectious diseases, such as transmission among individuals and complicated causal mechanisms. Conclusion: The consequence of not taking adequate measures to address these issues is distortion of the risk estimates for exposure factors. Future studies should pay careful attention to details and examine alternative models or methods that improve studies using time series regression analysis for environmental determinants of infectious diseases.

Introduction

Time series regression analysis is one of the most common methods in environmental epidemiology. It usually follows one population or community throughout the study period and requires health outcome (dependent) and exposure (independent) variables measured repeatedly over time at a fixed interval (e.g. daily or weekly). The impacts of exposures on outcomes are evaluated by comparing changes over time in the rates of outcome occurrence with the corresponding levels of exposure. Because a comparison within one community does not require denominator data unless the targeted population changes over time [ 1 ], an advantage of the analysis is that individual-level confounders and uncertainty about the area covered by the study are not considered problems. Instead, time-varying covariates are considered important confounding factors.

Time series analysis is typically suitable for investigating relatively direct and short-term effects of exposures. In environmental epidemiology, it has long been applied to assess the impacts of air pollution and meteorological variability on acute non-infectious outcomes that are routinely collected in databases, that is, deaths and hospital admissions or visits [ 2 ]. Conventionally, generalized linear models (GLMs) and generalized additive models (GAMs) are the standard models for these analyses [ 1 – 3 ].

Although time series analysis in environmental epidemiology has been widely used for non-infectious diseases, it is also being used for infectious diseases in the same manner. Infectious diseases differ substantially from acute non-infectious diseases (e.g. cardiovascular deaths, cardiac arrests, asthma attacks) in the nature of their causal mechanisms and the population at risk. More precisely, the incidence of an infectious disease often depends on transmission among individuals, the presence of intermediaries (e.g. vectors), and temporary or permanent immunity. These differences might result in statistical challenges when the conventional time series method is applied to infectious diseases, yet no study to date has summarized the potential considerations. The present article reviews the literature on studies in which associations between infectious diseases and environmental factors are evaluated with GLMs and GAMs, aiming to characterize the potential methodological challenges involved in the analyses. Other time series methods developed in econometrics [ 4 ] and forecasting, such as the autoregressive integrated moving average (ARIMA), are not considered here because of their different modeling structure and required model components. The literature review was conducted following the guidelines of Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) [ 5 ].

Time series regression model

Here we first give a brief overview of the time series regression model. The outcome of interest is usually a count of disease occurrences. The outcome counts and the measured exposure factors should be ordered in time and recorded at a fixed interval in the dataset. The most common regression model is the Poisson regression model, also known as a GLM with a Poisson distribution, which can be expressed as follows:

$$Y_t \sim \mathrm{Poisson}(\mu_t)$$

$$\log(\mu_t) = \zeta_0 + \zeta x_t + \sum_p \eta_p z_{p,t} + f(t)$$

where $Y_t$ is the disease count at time $t$, $\mu_t$ is its expected value, $\zeta_0$ is the intercept, $x_t$ represents the exposure factors with coefficient $\zeta$, $\sum_p \eta_p z_{p,t}$ denotes other time-varying covariates, and $f(t)$ denotes the smoothing function of time that removes the effects of seasonality and long-term trend [ 6 ]. Adjustments for seasonal variation and long-term trend characterize the traditional time series method and are required to separate their effects from the short-term associations between the exposure factors and the outcome of interest. For seasonal adjustment, time stratification and trigonometric (Fourier) terms are widely used alternatives. Further details about time series regression models are described elsewhere [ 6 ].
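The model above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: f(t) is represented by a single annual pair of Fourier terms, there are no additional covariates z, the weekly series is simulated, and the coefficient values are arbitrary. The helper names (`fourier_terms`, `design_row`, `solve`, `fit_poisson`, `poisson_draw`) are hypothetical, and the Newton-Raphson routine is a plain textbook solver rather than the software used in any reviewed study.

```python
import math
import random


def fourier_terms(t, period=52.0, harmonics=1):
    """Sine/cosine pairs used here as a stand-in for the smooth function f(t)."""
    terms = []
    for k in range(1, harmonics + 1):
        angle = 2.0 * math.pi * k * t / period
        terms.extend([math.sin(angle), math.cos(angle)])
    return terms


def design_row(t, x_t):
    """One design-matrix row: intercept, exposure x_t, then the f(t) terms."""
    return [1.0, x_t] + fourier_terms(t)


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_poisson(rows, counts, iterations=25):
    """Newton-Raphson for log(mu_t) = rows[t] . beta (column 0 is the intercept)."""
    p = len(rows[0])
    beta = [math.log(sum(counts) / len(counts))] + [0.0] * (p - 1)
    for _ in range(iterations):
        mu = [math.exp(sum(b * v for b, v in zip(beta, row))) for row in rows]
        # Score vector X'(y - mu) and information matrix X' diag(mu) X.
        grad = [sum(r[j] * (y - m) for r, y, m in zip(rows, counts, mu))
                for j in range(p)]
        info = [[sum(r[j] * r[k] * m for r, m in zip(rows, mu))
                 for k in range(p)] for j in range(p)]
        beta = [b + s for b, s in zip(beta, solve(info, grad))]
    return beta


def poisson_draw(rng, lam):
    """Knuth's multiplication method; adequate for the moderate means used here."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


# Simulated weekly data; every number below is an illustrative assumption.
rng = random.Random(0)
true_beta = [math.log(20.0), 0.05, 0.5, -0.3]  # zeta_0, zeta, sine, cosine
rows = [design_row(t, rng.gauss(0.0, 2.0)) for t in range(520)]
counts = [poisson_draw(rng, math.exp(sum(b * v for b, v in zip(true_beta, r))))
          for r in rows]

beta_hat = fit_poisson(rows, counts)
print("estimated zeta:", round(beta_hat[1], 3))
print("rate ratio per unit of exposure:", round(math.exp(beta_hat[1]), 3))
```

Exponentiating the exposure coefficient gives the rate ratio per unit increase in exposure, which is how short-term associations from this kind of model are usually reported.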

Literature search strategy

Our aim was to summarize the analytical characteristics of studies using GLMs or GAMs to assess associations between infectious diseases and environmental factors. We conducted a systematic review of published articles indexed in the online database PubMed (http://www.ncbi.nlm.nih.gov/pubmed). Since the exposure factors of interest were climate and weather, we limited the review to climate-sensitive infectious diseases, namely malaria, cholera, dengue, and influenza. The PubMed search included the following key terms: “weather” OR “climate” OR “temperature” OR “rainfall” OR “precipitation” OR “humidity” AND the name of each disease (“malaria”, “dengue”, “cholera” and “influenza”). To narrow the results further, studies were restricted to journal articles written in English and targeting human health outcomes, using the additional PubMed filters “article types”, “language” and “species”. Publications dated from January 1st, 1995 to November 5th, 2013, identified as of December 4th, 2013, were included in the search.
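The Boolean search described above can be written out as one query string per disease. The sketch below is a hedged illustration: the function name `build_query` is hypothetical, and it assumes the exposure terms are grouped in parentheses before being combined with the disease name, which the text implies but does not spell out. The article-type, language, species and date filters are not reproduced.

```python
# Exposure terms and disease names as listed in the search description.
EXPOSURE_TERMS = ["weather", "climate", "temperature",
                  "rainfall", "precipitation", "humidity"]
DISEASES = ["malaria", "dengue", "cholera", "influenza"]


def build_query(disease, exposures=EXPOSURE_TERMS):
    """Return one Boolean search string for a single disease."""
    exposure_clause = " OR ".join(f'"{term}"' for term in exposures)
    # Assumption: the OR group is parenthesised so AND applies to the whole group.
    return f'({exposure_clause}) AND "{disease}"'


queries = {disease: build_query(disease) for disease in DISEASES}
for disease, query in queries.items():
    print(disease, "->", query)
```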

Selection of articles

A total of 2,598 reports were found through the designated search of the online database. Because of this large number, screening and eligibility assessment followed a defined procedure ( Fig. 1 ). After duplicates were removed, two authors screened the titles to determine whether each study examined associations between infectious diseases and weather or climate factors. The articles selected by either of the two authors in the title screening were then pooled, and eligibility was assessed in two steps by one author. First, the abstract and method sections were examined to determine whether GLMs or GAMs were used as analysis methods, and studies that clearly used irrelevant methods were discarded. Second, the full text of the remaining studies was reviewed to confirm that the purpose and analysis method of each study were suitable for our literature review.

Fig. 1. PRISMA flow diagram of the systematic review.

Review schemes for study designs and analytical methods

To review the analytical methodology systematically, we defined 13 review schemes: author and publication year; study period; study location; age and group of the targeted population; outcome of interest; exposure factors; statistical models; time unit of data; confounder controls (season, trend, and others); variation in the susceptible population; autocorrelation; lag estimate of exposure factors; and overdispersion.
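One way to hold these 13 schemes is a record type with one field per scheme. The sketch below is an assumption about how such an extraction sheet could look; the class and field names are hypothetical. The example values are transcribed from the Dhaka dengue row of Table 1 as far as its cells can be read, and fields that the table does not report are left as `None`.

```python
from dataclasses import dataclass, fields
from typing import Dict, List, Optional


@dataclass
class ReviewRecord:
    """One reviewed study, with one field per review scheme (13 in total)."""
    author_year: str
    study_period: str
    study_location: str
    target_population: Optional[str]
    outcome: str
    exposure_factors: List[str]
    statistical_model: str
    time_unit: str
    confounder_controls: Dict[str, Optional[str]]  # season, trend, others
    susceptible_population: Optional[str]
    autocorrelation: Optional[str]
    exposure_lag: Optional[str]
    overdispersion: Optional[str]


# Values read from the Dhaka dengue row of Table 1; None marks unreported items.
example = ReviewRecord(
    author_year="Hashizume, et al., 2012",
    study_period="2005-2009",
    study_location="Dhaka (Bangladesh)",
    target_population=None,
    outcome="dengue",
    exposure_factors=["river levels", "temperature", "rainfall"],
    statistical_model="GLM Poisson",
    time_unit="weekly",
    confounder_controls={"season": "Fourier terms", "trend": "year",
                         "others": "public holidays"},
    susceptible_population=None,
    autocorrelation="AR(1) included",
    exposure_lag="assessed up to 26 weeks",
    overdispersion="Poisson regression allowing for overdispersion",
)

# List the schemes that this study leaves unreported.
missing = [f.name for f in fields(example) if getattr(example, f.name) is None]
print(missing)
```

Listing the unreported schemes per study makes gaps, such as an unaddressed susceptible population, easy to tabulate across the whole review.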

Of the 2,598 reports initially identified by our designated electronic search on PubMed, 33 articles were selected for our review at the end of the eligibility evaluation. These 33 articles consist of 9 malaria [ 7 – 15 ], 13 dengue [ 16 – 28 ], 9 cholera [ 29 – 37 ], and 2 influenza [ 38 , 39 ] studies (Table 1). Table 2 shows the locations in which the reviewed studies were conducted. The study locations are mostly low- and middle-income countries in the tropics, as our targeted diseases, except for influenza, are most prevalent in these areas [ 40 ].
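As a quick consistency check on these figures, the per-disease counts can be recomputed from the cited reference ranges. The short sketch below uses only the numbers stated in the paragraph above; the variable names are illustrative.

```python
# Inclusive reference ranges cited for each disease in the text.
citation_ranges = {"malaria": (7, 15), "dengue": (16, 28),
                   "cholera": (29, 37), "influenza": (38, 39)}
reported = {"malaria": 9, "dengue": 13, "cholera": 9, "influenza": 2}

# Number of references in each inclusive range.
counts = {d: last - first + 1 for d, (first, last) in citation_ranges.items()}
total = sum(counts.values())
retained = total / 2598  # share of the initially identified reports

print(counts == reported)   # the ranges agree with the stated counts
print(total)                # 33
print(f"{retained:.2%}")    # 1.27%
```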

Table 1.

Ref.Author, yearStudy period (year)City (Country)ExposureStatistical modelUnit of
data
Confounder controlVariation in susceptible populationAutocorrelation*Assessed Lag*Overdispersion
SeasonTrendOthers
Malaria Kim, et al., 20122001–2009the capital region (Korea)temperature, RH, diurnal temperature range (DTR), duration of sunshineGLM PoissonweeklyFourier termsyear0 to 8 weeks single lag (SL) for all cliamte parameters, rainfall 0 to 60 days (SL)Overdispersion parameter included
Jusot, et al., 20112000–2003Magaria (Niger)rainfallGAM negative binomial (NB)dailypenalised cubic regression splinereligious celebrations, days of the week, holidays, min & max temp, RHpenalised cubic regression spline is to minimize the autocorrelation0 to 40 days (SL)NB distribution model
Haque, et al., 20101989–2008Rangamati
district, (Bangladesh)
temperature, rainfall, humidity, normalized difference vegetation index (NDVI), SST of the Bay of Bengal, NINO3GLM NBmonthlymonthyearAR(1) includedall (except NINO): 0 to 3 months moving average (MA), NINO3: 0 to 3, 4 to 7, 8 to 11 (MA)NB distribution model
Xiao, et al., 20101995–2006Hinan (China)temperature, rainfall, RHPoisson regressionmonthlypopulationthe cases for the previous months0 to 3 months (SL)
Olson, et al., 20091996–1999Brazilian Amazon regiontemperature, rainfallPoisson regressionmonthlynatural cubic splinepopulation (offset)
Hashizume, et al., 20081982–2011western Kenyan highlandsDMI (diapole mode index), NINO3, rainfallGLM Poissonmonthlymonthyearpopulation not considered since trends in malaria rates are included in the modelAR(1) included0 to 6 months (SL)included overdispersion parameter
Teklehaimanot, et al., 20041990–2000Ethiopiatemperature, rainfallPoisson regressionweeklyweek (of the year)AR included (based on a moving average of the number of cases four, five and six weeks before)rainfall: 4 to 12 weeks (MA) temperature: 4 to 10 weeks (MA)
Teklehaimanot, et al., 20041990–2000Ethiopiatemperature, rainfallPoisson regressionweeklytime variabledistrict, interaction between time and districtrainfall: 4 to 12 weeks (MA) temperature: 3 to 10 weeks (MA)
Abeku, et al., 20031986–1993Ethiopiatemperature, rainfallGLMM (mixed model)monthlylog (numer of cases in the previous month) was included as sector-specific random effectslog (numer of cases in the previous month) as sector-specific random effects handles spatial and temporal autocorrelations.rainfall: 1 and 2 months distributed lag (DL) temperature: 1 month (SL)
Dengue Hii, et al., 20122000–2011Singaporetemperature, rainfallPoisson regressionweeklyseason parametertrend parameterpopulation (offset)the past number of cases12 to 24 weeks (SL)developed Poisson regression model that allowed overdispersion
Gomes, et al., 20122001–2009Rio de Janeiro (Brazil)rainfall, temperature, proportions of days in the month: mean temperature < 22(°C), 22 ≤ mean temperature < 26, 26 ≤ mean temperatureGLM Poisson & NBmonthlyyearpopulation × the number of days in the month (offset)1 and 2 months (SL)NB distribution model
Lowe, et al., 20112001–2009Southeast Brazilrainfall, temperature, Oceanic Niño Index (ONI)GLMM NBmonthlymonthexpected number (offset): the population × global dengue rate. cartographic, demographic, and economic variablesinclusion of unstructured random effect to be surrogate for not only population immunity, but quality of healthcare services and local health interventionsthe log standardised morbidity ratio lagged by 3 months was included in the model.temperature and rain: 3 month (MA), ONI: 4 month (SL)NB distribution model
Hashizume, et al., 20122005–2009Dhaka (Bangladesh)river levels, temperature, rainfallGLM PoissonweeklyFourier termsyearpublic holidaysAR(1) includedassessed up to 26 weeksused generalized linear Poisson regression models allowing for overdispersion
Earnest, et al., 20122001–2008Singaporetemperature, rainfall, RH, hours of sunshine and hours of cloud, Southern Oscillation Index (SOI)Poisson regressionweeklysinusoidal termsAR(2) included0 to 12 week (SL)included overdispersion parameter
Pham, et al., 20112004–2008Dak Lak province, Vietnamtemperature, duration of sunshine, rainfall, RH, larval index (household index, the container index, and the Breteau index)Poisson regressionmonthlySeasonal componentsTrend componentsAR(1) included
Pinto, et al., 20112000–2007Singaporerainfall, temperature, RHPoisson regressionweekly0 to 40 week (SL)
Shang, et al., 20101998–20073 areas in Southern Taiwan (Tainan, Kaohsiung, and Pingtung)temperature, RH, wind speed, rainfall, rainy hours, sunshine accumulation hours, sunshine rate (from sunrise to sunset), sunshine total flux, imported dengue casesPoisson regression, and GLM NBbi-weeklyFourier termsarea, population densityassessed 1 to 12 bi-weeks which is equivalent to 2 to 24 weeks (SL)NB distribution model
Chen, et al., 20101998–2008Taipei and Kaohsiung (Taiwan)temperatures, rainfall intensity, RHPoisson regression, GEEmonthlythe percentage of monthly Breteau index (BI) levels > 2 (index for the potential transmission risk)0 to 4 months (SL)
Tipayamongkholgul, et al., 20091996–2005all provinces in Thailandthe multivariate ENSO index (MEI), the sea level pressure index (SLP), temperatures, RH, wind speedquasi-Poisson or NBmonthlysinusoidal termspopulation (offset), province, population densitythe cases of the previous month1 to 12 months (SL)used quasi-Poisson or NB
Lu, et al., 20092001–2006Guangzhou (China)temperatures, rainfall, RH, wind velocityPoisson regression, GEEmonthlyAR(1) included0 to 3 months (SL)included overdispersion parameter
Johansson, et al., 20091986–2006all municipalities in Puerto Ricotemperatures, rainfallPoisson regressionmonthlynatural cubic spline on observational timepopulation (offset), % of population below the poverty linetemperature: 0 to 2 month (DL), rain: 1 to 2 (DL)
Thammapalo, et al., 20051978–199773 provinces in Thailandrainfall, rainy days, temperatures, RHPoisson regressionmonthlyFourier termstime in month (t) and (t) the lagged residual series is includednone
Cholera Hashizume, et al., 20111993–2007Dhaka (Bangladesh)DMI, NINO3, SST and SSH of the northern Bay of BengalGLM negative binomial (NB)monthlymonthyearnot consideredlagged model residual included (Brumback method)0–3, 4–7, 8–11 months (MA)NB distribution model
Rajendran, et al., 20111996–2008Kolkata (India)temperature, RH, rainfallGLM, SARIMAdailyexponential smoothing function
Hashizume, et al., 20101983–2008Dhaka (Bangladesh)temperature, rainfallGLM PoissonweeklyFourier termsyearsampling proportionhigh rain: 0–8 (MA), low rain: 0–16 (MA), temperature: 0–4 (MA)included overdispersion parameter
Paz, 20091971–20068 African countries: Uganda, Kenya, Rwanda, Burundi, Tanzania, Malawi, Zambia, and Mozambiqueair temperature, sea surface temperature (the western Indian Ocean), anomaly air temperaturePoisson regressionyearlyAR1 = cor (Yt, Yt-1) is taken into account in the estimation using generalized estimating equations.0 and 1 year (SL)
Constantin de Magny, et al., 20081997–2006Matlab (Bangladesh) and Kolkata (India)SST, rain, chlorophyll a concentrationGLM quasi-Poissonmonthlyquarter periods of a yearlog (number of cases for the previous month)0 and 1 month (SL)quasi-Poisson model
Martinez-Urtaza, et al., 20081994–2005PeruSST, sea height anomaly, heat content above 20°CGAM NB & ridge regression with penalties to identify zero-inflationweeklythin plate regression splinesobservational time × smoothing (when autocorrelation was seen in residuals) included1 to 5 weeks (SL)NB distribution model
Luque Fernández, et al., 20082003–2006Lusaka (Zambia)temperature, rainfallGLM Poissonweeklysinusoidal termsthe cases for the previous week.temperature 6 weeks (SL), rainfall 3 weeks (SL)standard errors were scaled using the square root of the Pearson chi-square dispersion.
Hashizume, et al., 20081996–2002Dhaka (Bangladesh)rainfall, river level, temperatureGLM PoissonweeklyFourier termsyearpublic holidaysAR(1) includedrainfall: 0 to 16 weeks (MA), river level: 0 to 4 weeks (MA)
Huq, et al., 20051997–20005 different cities (Bangladesh)water temperature, air temperature, water depth, pH, rainfallPoisson regressionbimonthly0, 2, 4, 6, 8 months (SL)
Influenza Hu, et al., 20122009Brisbane (Australia)temperature, rainfall, interactionPoisson regression, spatiotemporal analysis (CAR)weeklysinusoidal termssocio-economic index, population (offset), spatially structured random effectAR(1) included1 week single lag (SL)
Jusot, et al., 20112009–2010Nigertemperature, relative humidity (RH), wind speed, visibilityGAMdailyseasonal componentstrend componentsday of the week, holidays, religious festival, and pilgrimage

Blank cells indicate that the article made no statement about that category. Otherwise, the cell states whether and how the item was considered.

* SL: single lag, MA: moving average, DL: distributed lag, AR: autoregressive term

Table 2.

Study locations.

Region | Countries | Number of studies (n = 33)
Africa | Burundi, Ethiopia, Kenya, Niger, Malawi, Rwanda, Tanzania, Uganda, Zambia | 8
East Asia | China, Taiwan, Korea | 5
Southeast Asia | Thailand, Vietnam, Singapore | 6
South Asia | India, Bangladesh | 8
Central/South America | Peru, Puerto Rico, Brazil | 5
Oceania | Australia | 1

The outcome disease counts used in the studies were mostly weekly or monthly (29 studies). Daily and yearly counts were less common, appearing in only 3 and 1 studies respectively (Table 3).

Table 3.

Summary of modelling characteristics

Characteristic | Number of studies (n = 33)
Unit of outcome data |
  Daily | 3
  Weekly (including bi-weekly) | 13
  Monthly (including bi-monthly) | 16
  Yearly | 1
Regression models |
  GLM (Poisson, quasi-Poisson, negative binomial) | 28
  GAM (Poisson, negative binomial) | 3
  Mixed models | 2
Control of seasonality and long term trend |
  Some adjustments were included in the model | 25
  No adjustments / not described | 8
Autocorrelation |
  Examined / included parameters to control autocorrelation | 21
  No specific measures / not described | 12
Lag effects of exposure |
  Lag effects of weather variables were assessed | 28
  No lag effect assessments | 5

As specified in the review criteria, the regression models were GLM and GAM with different distribution models, i.e. Poisson, quasi-Poisson, and negative binomial (31 studies). The other two studies integrated mixed models. Among the studies, 18 used models allowing for overdispersion, if any, by inclusion of an overdispersion parameter or selection of different distribution models (e.g. quasi-Poisson or negative binomial).
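One common way to check for overdispersion, and the one noted for a study in Table 1 that scaled standard errors by the square root of the Pearson chi-square dispersion, can be sketched as follows. This is a minimal illustration only: the function names, the counts, and the fitted means are hypothetical and are not taken from any reviewed study.

```python
import math


def pearson_dispersion(observed, fitted, n_params):
    """Pearson chi-square statistic divided by the residual degrees of freedom.

    Values well above 1 suggest overdispersion relative to a Poisson model.
    """
    if len(observed) != len(fitted):
        raise ValueError("observed and fitted must have the same length")
    dof = len(observed) - n_params
    if dof <= 0:
        raise ValueError("need more observations than parameters")
    chi2 = sum((y - mu) ** 2 / mu for y, mu in zip(observed, fitted))
    return chi2 / dof


def scale_standard_error(se, dispersion):
    """Quasi-Poisson style correction: inflate a Poisson standard error."""
    return se * math.sqrt(dispersion)


# Hypothetical weekly counts and fitted Poisson means (illustration only).
observed = [2, 9, 1, 12, 4, 15, 3, 10]
fitted = [5.0, 6.0, 5.0, 8.0, 6.0, 9.0, 6.0, 7.0]
phi = pearson_dispersion(observed, fitted, n_params=2)
```

With these made-up numbers the dispersion estimate is well above 1, which is the situation in which a quasi-Poisson or negative binomial model would usually be preferred over a plain Poisson model.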

As mentioned above, adjustment for seasonal variation and long-term trend is part of the standard approach in time series regression. In our review, 25 of the 33 studies (76%) included model terms that allow for seasonality and trend, using natural spline functions of time, trigonometric functions, or month and year indicator variables. Beyond adjustments for cyclic seasonality and long-term trend, more than half of the reviewed studies (21 studies) considered or attempted to control autocorrelation. Such adjustments may be necessary because time series generally show strong autocorrelation: observations that are close together in time tend to be similar. In those 21 studies, the most common method of controlling autocorrelation was to incorporate autoregressive terms, including lagged outcome values, the logarithm of lagged outcome values, and lagged model residuals (19 studies).
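The trigonometric (Fourier) terms mentioned above can be sketched as columns of a regression design matrix. The helper names, the 52-week year, and the choice of two harmonics are assumptions made for illustration, not settings reported by the reviewed studies.

```python
import math


def fourier_terms(t, period, n_harmonics):
    """Return [sin, cos] pairs for harmonics 1..n_harmonics at time index t."""
    terms = []
    for k in range(1, n_harmonics + 1):
        angle = 2.0 * math.pi * k * t / period
        terms.append(math.sin(angle))
        terms.append(math.cos(angle))
    return terms


def design_row(week_index, weeks_per_year=52, n_harmonics=2):
    """One row of a design matrix: intercept, year index (trend), Fourier terms.

    Assumes weekly data and, for simplicity, exactly 52 weeks per year.
    """
    year = week_index // weeks_per_year
    return [1.0, float(year)] + fourier_terms(week_index, weeks_per_year, n_harmonics)


# Rows for three years of hypothetical weekly observations.
design = [design_row(w) for w in range(3 * 52)]
```

Each row could then be passed, together with the weather covariates, to whatever GLM routine is in use; the seasonal columns repeat every 52 rows while the year column captures a stepwise long-term trend.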

Other covariates were also included in many studies, including spatial factors where studies covered different geographical areas, population size, risk-related indices, and holiday indicators. In assessing the risk from exposure factors, time lag effects were considered in the majority of the reviewed studies (28 studies). However, we found that the form of lag analyzed (i.e. single lag, moving average lag, or distributed lag) and the lag length varied between studies even when they targeted the same disease. Where lag lengths were predetermined, they were often supported by literature reviews and biological plausibility, but many studies gave no rationale for the lag lengths they assessed. In some exploratory studies, on the other hand, long lag lengths were investigated to observe the full course of exposure effects over time. Another finding in our review was that, even though infectious diseases generally confer temporary or permanent immunity, the susceptible or immune population was rarely addressed in study models. No studies computed or integrated an estimate of the susceptible population; a few studies instead included proxies (e.g. vaccination rate) to account for the susceptibility of the target population.
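The three lag forms named above differ only in how past exposure values are turned into regression covariates. A minimal sketch, using hypothetical function names and a made-up rainfall series:

```python
def single_lag(x, lag):
    """Exposure value `lag` steps earlier; None where no history exists."""
    return [x[t - lag] if t >= lag else None for t in range(len(x))]


def moving_average_lag(x, start, end):
    """Mean exposure over lags start..end (inclusive); None where history is short."""
    out = []
    for t in range(len(x)):
        if t < end:
            out.append(None)
        else:
            window = [x[t - lag] for lag in range(start, end + 1)]
            out.append(sum(window) / len(window))
    return out


def distributed_lag_terms(x, max_lag):
    """One covariate column per lag 0..max_lag, each with its own coefficient."""
    return [single_lag(x, lag) for lag in range(max_lag + 1)]


# Hypothetical weekly rainfall totals (illustration only).
rain = [10.0, 0.0, 30.0, 20.0, 0.0, 40.0]
rain_sl2 = single_lag(rain, 2)
rain_ma02 = moving_average_lag(rain, 0, 2)
rain_dl2 = distributed_lag_terms(rain, 2)
```

A single lag enters the model as one column, a moving average lag as one averaged column, and a distributed lag as several columns fitted jointly, which is why the three forms can lead to different estimated effects for the same disease and exposure.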

While time series analysis with GLMs or GAMs is the established method in environmental epidemiology research, our review draws attention to several potential issues that arise when the traditional approach developed for non-infectious diseases is applied unchanged to infectious diseases.

First, immune protection, which is one of the unique features of infectious diseases, can lead to rapid changes in the underlying population at risk over the course of the study period, but few studies have addressed the susceptible or immune population in their models. Information on the immune population can be critical because host immune competence (an intrinsic factor) and environmental (extrinsic) factors are both important contributors to seasonal disease activity [ 41 ]. In particular, the importance of the interplay of intrinsic and extrinsic factors is illustrated by one cholera study in which outbreaks failed to develop, even under environmental conditions favorable to the disease, when the susceptible population was small [ 42 ]. The consequence of not taking the susceptible population into account in a model is misquantification of the effects of environmental exposures. However, since estimates of immune or susceptible individuals within a population seldom exist in the data, it is often necessary to create alternative measures to increase the precision of the analysis. The alternative approaches may include, but are not limited to, reconstructing the susceptible population with deterministic models (e.g. susceptible-infected-recovered models) and using proxy indicators such as vaccination rates.
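The dependence of outbreaks on the size of the susceptible population can be illustrated with a very small discrete-time susceptible-infected-recovered (SIR) simulation. This is a sketch under stated assumptions: the function name, the transmission and recovery parameters, and the population sizes are hypothetical and are not taken from the cited cholera study.

```python
def simulate_sir(s0, i0, r0, beta, gamma, steps):
    """Discrete-time deterministic SIR model.

    Returns the susceptible and infected trajectories as two lists.
    Assumes gamma <= 1 so that the infected count stays non-negative.
    """
    n = s0 + i0 + r0
    s, i, r = float(s0), float(i0), float(r0)
    susceptible, infected = [s], [i]
    for _ in range(steps):
        new_infections = min(beta * s * i / n, s)  # cannot exceed S
        new_recoveries = gamma * i
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        susceptible.append(s)
        infected.append(i)
    return susceptible, infected


# Same "favorable environment" (same beta) but different susceptible pools.
BETA, GAMMA = 0.6, 0.2
_, outbreak = simulate_sir(9000, 10, 990, BETA, GAMMA, 60)
_, fizzle = simulate_sir(1000, 10, 8990, BETA, GAMMA, 60)
```

With 90% of the population susceptible the infected count grows into an outbreak, while with 10% susceptible it declines from the first step even though the transmission parameter is identical. A reconstructed susceptible trajectory of this kind is one candidate covariate for a regression model.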

Secondly, while adjustments for seasonal variation and long-term trend were common, about one quarter of the reviewed articles (8 of 33) did not include or describe such adjustments in their models. The reason is unknown, but one possibility is that seasonal variation in disease activity was less apparent. For instance, while temperate regions have regular winter epidemics of influenza, malaria often shows a less obvious seasonal pattern. In general, adjustment for seasonal variation in traditional time series analysis serves two purposes: eliminating the effects of unknown time-varying covariates and satisfying the regression assumption of independence. Satisfying the independence assumption is particularly important in time series analysis, because observations of a variable that are close in time tend to be similar and are generally correlated (i.e. autocorrelation) [ 1 ]. When seasonality is not visible in the outcome data at a glance, the question naturally arises whether seasonal adjustment is needed in the model at all. However, given the serial correlation that may naturally exist in time series data, the decision on whether to include seasonal adjustments should be examined carefully using statistical validation (e.g. model fit and residuals).

Another concern regarding autocorrelation arises when its strength and potential underlying cause are considered. In our literature review, autoregressive terms were commonly included in addition to seasonal adjustments to control autocorrelation (19 studies), which may imply that adjustment for seasonal variation alone is not sufficient. In general, imperfect control of autocorrelation suggests that other significant time-varying covariates have been omitted from a model [ 43 ]. However, given the characteristics of infectious diseases, autocorrelation that persists after seasonality has been controlled may reflect actual correlation in the outcome observations due to disease transmission among individuals. In other words, true dependence among neighboring observations can be present in infectious disease data because the number of newly infected individuals depends on the number of previously infected individuals in the population. In fact, some studies [ 15 , 16 ] included autoregressive terms (e.g. a lagged outcome or the logarithm of a lagged outcome) to account for this dependency in infectious disease data. This correlation is also known as “true contagion” [ 44 ], and the resulting violation of the assumption of independence will cause biases not in the regression coefficients but in the estimates of standard errors [ 43 ]. Thus, the discussion again returns to the importance of implementing adequate seasonality adjustments with statistical validation and the need for additional measures if autocorrelation remains in the model residuals. To address the autocorrelation resulting from true contagion or transmissibility of infectious diseases, it might be worthwhile in the future to explore approaches that are not only statistically effective but also biologically compelling in terms of disease mechanisms.
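Checking whether autocorrelation remains in model residuals, and building an autoregressive covariate from the lagged outcome, can be sketched as follows. The function names are hypothetical, and adding 1 before taking the logarithm is an assumption made here to handle zero counts rather than a convention taken from the reviewed studies.

```python
import math


def autocorrelation(values, lag):
    """Sample autocorrelation of a series at the given lag."""
    n = len(values)
    mean = sum(values) / n
    denom = sum((v - mean) ** 2 for v in values)
    if denom == 0:
        raise ValueError("series has zero variance")
    num = sum((values[t] - mean) * (values[t - lag] - mean) for t in range(lag, n))
    return num / denom


def log_lagged_outcome(counts, lag=1):
    """Autoregressive covariate: log(count + 1) from `lag` steps earlier.

    The +1 offset is an assumption to avoid log(0); None marks missing history.
    """
    return [math.log(counts[t - lag] + 1) if t >= lag else None
            for t in range(len(counts))]


# Hypothetical residuals and case counts (illustration only).
residuals = [1.0, 2.0, 3.0, 4.0, 5.0]
r1 = autocorrelation(residuals, 1)
ar_term = log_lagged_outcome([0, 3, 7])
```

A lag-1 residual autocorrelation far from zero after seasonal adjustment would be one signal that further terms, such as the lagged-outcome covariate above, deserve consideration.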

Thirdly, in estimating the lag effects of exposure factors, the lag timings evaluated varied between studies even for the same targeted disease. This may be because the quantitative evidence needed to establish optimal lag timings remains elusive for most diseases, although there may be qualitatively convincing ideas. The difficulty of estimating optimal lag times may be especially severe in vector-borne diseases. In these diseases, the transmission mechanisms become highly complicated due to the mediating effects of vectors, which drive the strong disease seasonality [ 45 ], and they can also be highly context-dependent. For instance, the association patterns and lags of rainfall effects in malaria vary widely by region and climate conditions (e.g. whether the region is generally dry or has abundant rain) [ 46 ]. More importantly, however, time lags and association patterns can be more complicated in infectious diseases than in non-infectious diseases because the mechanism of disease manifestation (e.g. incubation period) and the transmission dynamics of pathogenic microorganisms (e.g. bacteria, viruses, parasites, or fungi) play a critical role in the causal pathway. Therefore, an understanding of biological mechanisms can be of great help in estimating lags and association patterns. If no reliable prior knowledge exists or complicated transmission pathways are expected, then strategic exploratory approaches are required to find the optimal estimates.

Lastly, most of our reviewed studies conducted their analysis using weekly or monthly data (including bi-weekly and bi-monthly). Unlike in studies of non-infectious diseases, daily count outcomes were much less common. This concern applies only to certain infectious diseases, but it is worth noting that using a longer time unit may sometimes lead to an underestimation of risk factors when the optimal time lags of exposure effects and disease incubation periods are short (e.g. monthly data are used for analysis when the exposure effect is expected at a lag of one week). Wherever possible, the most statistically robust and biologically plausible time unit should be selected for analysis.
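The effect of the time unit on a short lag can be seen with simple aggregation arithmetic. In the sketch below the function name and both series are hypothetical: a single exposure spike is followed seven days later by a rise in cases.

```python
def aggregate_counts(daily_counts, days_per_period):
    """Sum daily counts into consecutive periods; an incomplete last period is dropped."""
    n_full = len(daily_counts) // days_per_period
    return [sum(daily_counts[p * days_per_period:(p + 1) * days_per_period])
            for p in range(n_full)]


# Hypothetical 28-day series: exposure spike on day 3, case peak on day 10.
exposure = [0] * 28
exposure[3] = 1
cases = [1] * 28
cases[10] = 9

weekly_exposure = aggregate_counts(exposure, 7)
weekly_cases = aggregate_counts(cases, 7)
monthly_exposure = aggregate_counts(exposure, 28)
monthly_cases = aggregate_counts(cases, 28)
```

In the weekly series the exposure falls in the first week and the case peak in the second, so a one-week lag is still visible. In the 28-day series both collapse into a single period, and the timing information needed to estimate a one-week lag is lost.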

Our study has some limitations. The first is that, among all the diseases potentially linked to weather variability, only four diseases were selected for the review. As a result, we may have excluded studies that could have offered insightful analytical approaches. In view of our aim to characterize methodological trends, however, the selected diseases were probably sufficient because they cover different types of infectious diseases, including water-borne, vector-borne, and air-borne diseases. Another limitation is that GLMs and GAMs were the only targeted models, even though other methods such as autoregressive integrated moving average models can also fall into the category of time series regression models. Those other time series methods might have provided solutions for the concerns raised here, but we believe that the issues we have examined are shared across these methods and deserve careful attention and awareness. In conclusion, careful implementation of time series regression analysis is required in the study of environmental determinants of infectious diseases. Further studies are required to explore alternative models and to address methods that will improve time series analysis.

Acknowledgements

We sincerely thank Ben Armstrong for his insights that formed the basis of this study.

Conflict of Interest

None to declare.

  • DOI: 10.6084/M9.FIGSHARE.1163874.V1
  • Corpus ID: 2018357

Time Series Data Analysis for Forecasting - A Literature Review

  • IJMER Journal, Ms. Neelam, Er. Abhinav Jain
  • Published 6 September 2014
  • Computer Science, Engineering, Environmental Science

In today's world there is ample opportunity to leverage the numerous sources of time series data available for decision making. This time-ordered data can be used to improve decision making if the data is converted into information and then into knowledge, a process called knowledge discovery. Data Mining (DM) methods are increasingly used for prediction with time series data, in addition to traditional statistical approaches. This paper presents a literature review of the use of DM and statistical approaches with time series data, focusing on weather prediction. This is an area that has been attracting a great deal of attention from researchers in the field.

Related Papers

Nashra Javed

IRJMETS Publication

International Research Journal of Modernization in Engineering Technology and Science (IRJMETS)

Climate change is having an increasingly serious impact on the environment. Climate change refers to extreme changes in weather conditions and poses a major threat to human beings. Drastic changes in weather make it difficult to predict climatic conditions. Therefore, scientific techniques such as machine learning algorithms are required to predict weather conditions. Many tools and techniques are available for collecting weather data. Of the several techniques used for weather forecasting, the data mining approach is considered the most feasible. This paper analyses the various applications of data mining in weather forecasting.

International Journal for Research in Applied Science and Engineering Technology

Sameer Kaul

International Journal of engineering Research and science

ashwini mandale

Weather forecasting is an important application in meteorology and has been one of the most scientifically and technologically challenging problems around the world. In this paper, we analyse the use of data mining techniques in forecasting weather. This can be carried out using Artificial Neural Network and Decision Tree algorithms and meteorological data collected over a specific period. The performance of these algorithms was compared using standard performance metrics, and the algorithm that gave the best results was used to generate classification rules for the mean weather variables. The results show that, given enough case data, data mining techniques can be used for weather forecasting.

International Journal IJRITCC

Data mining is the computer-assisted process of digging through and analysing enormous sets of data and then extracting the meaningful data. Data mining tools predict behaviours and future trends, allowing businesses to make proactive decisions. They can answer questions that traditionally were very time consuming to resolve, and can therefore be used to predict meteorological data, that is, for weather prediction. Weather prediction is a vital application in meteorology and has been one of the most scientifically and technologically challenging problems across the world in the last century. Predicting the weather is essential to help prepare for the best and the worst of the climate. Many weather predictions, such as rainfall prediction, thunderstorm prediction, and prediction of cloud conditions, are major challenges for atmospheric research. This paper presents a review of data mining techniques for weather prediction and studies the benefit of using them. The paper provides a survey of the available literature on algorithms employed by different researchers to apply various data mining techniques to weather prediction. The work done by various researchers in this field has been reviewed and compared in tabular form. For weather prediction, decision tree and k-means clustering prove to be good, with higher prediction accuracy than other data mining techniques.

Himanshu Arora

Time series data available in huge amounts can be used in decision-making. Such time series data can be converted into information to be used for forecasting. Various techniques are available for prediction and forecasting on the basis of time series data. Presently, the use of data mining techniques for this purpose is increasing day by day. In the present study, a comprehensive survey of data mining approaches and statistical techniques for rainfall prediction on time series data was conducted. A detailed comparison of different relevant techniques was also conducted and some plausible solutions are suggested for efficient time series data mining techniques for future algorithms.

Dr. Divya Chauhan

International Journal of Information Engineering and Electronic Business

Folorunsho Olaiya

Risul Islam Rasel

Weather forecasting for an area where weather and climate changes occur spontaneously is a challenging task. Weather is a non-linear system because various components, such as humidity, wind speed, sea level, and air density, have a great impact on climate change. A strong forecasting system can play a vital role in different sectors such as business, agriculture, tourism, transportation, and construction. This paper examines the performance of data mining and machine learning techniques, using Support Vector Regression (SVR) and Artificial Neural Networks (ANN), for robust weather prediction. For the experiments, a 6-year historical weather dataset of rainfall and temperature for the Chittagong metropolitan area was collected from the Bangladesh Meteorological Department (BMD). The finding from this study is that SVR can outperform the ANN in rainfall prediction and ANN can produce better results than the SVR.

IJCSNS International Journal of Computer Science and Network Security

Somia A . Asklany

In the meteorological field, where huge databases are involved, weather prediction is a vital process as it affects people's daily life. In the last century, the accuracy of weather predictions has been one of the most challenging concerns facing meteorologists around the world. Atmospheric dust is considered a harmful air pollutant that causes respiratory diseases and infections on the one hand and affects the earth's energy budget on the other, so early prediction of dust phenomena can be very useful in reducing its harmful effects. Data mining is mainly a machine learning process for extracting useful information from extremely large databases, as it is capable of handling huge, noisy, ambiguous, random, and missing data, so it is a very helpful tool for predicting different weather elements. The virtue of data mining techniques is that they not only analyse huge historical databases but also learn from them for future predictions. In this work, we investigate the use of data mining techniques in forecasting different atmospheric phenomena, especially atmospheric dust, using Decision Tree, k-NN, and Naïve Bayes algorithms, and compare them by evaluating the results of each model. The proposed models are implemented using the open source data mining tool Rapidminer.

Generative Adversarial Networks in Time Series: A Systematic Literature Review

1 Introduction

2 Related Work

3 Generative Adversarial Networks

3.1 Background

3.2 Challenges

3.3 Popular Datasets

Name (Year) | Data Type | Instances | Attributes
Oxford-Man Institute “realized library” (updated daily) | Real multivariate time series | >2,689,487 | 5
EEG Motor Movement/Imagery Dataset (2004) | Real multivariate time series | 1,500 | 64
ECG 200 (2001) | Real univariate time series | 200 | 1
Epileptic Seizure Recognition Dataset (2001) | Real multivariate time series | 11,500 | 179
TwoLeadECG (2015) | Real multivariate time series | 1,162 | 2
MIMIC-III (2016) | Real, integer, and categorical multivariate time series | |
EPILEPSIAE project database (2012) | Real multivariate time series | 30 |
PhysioNet/CinC (2015) | Real multivariate time series | 750 | 4
Wrist PPG During Exercise (2017) | Real multivariate time series | 19 | 14
MIT-BIH Arrhythmia Database (2001) | Real multivariate time series | 201 | 2
PhysioNet/CinC (2012) | Real, integer, and categorical multivariate time series | 12,000 | 43
KDD Cup Dataset (2018) | Real, integer, and categorical multivariate time series | 282 | 3
PeMS Database (updated daily) | Integer and categorical multivariate time series | | 8
Nottingham Music Database (2003) | Special text format time series | 1,000 |

4 Classification of Time Series Based GANs

4.1 Discrete-Variant GANs

4.1.1 Sequence GAN (SeqGAN) (Sept. 2016).

4.1.2 Quant GAN (July 2019).

4.2 Continuous-Variant GANs

4.2.1 Continuous RNN-GAN (C-RNN-GAN) (Nov. 2016).

4.2.2 Noise Reduction GAN (NR-GAN) (Oct. 2019).

4.2.3 TimeGAN (Dec. 2019).

4.2.4 Conditional Sig-Wasserstein GAN (SigCWGAN) (June 2020).

4.2.5 Decision-Aware Time Series Conditional GAN (DAT-CGAN) (Sept. 2020).

4.2.6 Recurrent Conditional GAN (RCGAN) (2017).

4.2.7 Sequentially Coupled GAN (SC-GAN) (April 2019).

4.2.8 Synthetic Biomedical Signals GAN (SynSigGAN) (Dec. 2020).

5 Applications

5.1 Data Augmentation

5.2 Imputation

5.3 Denoising

5.4 Anomaly Detection

5.5 Other Applications

6 Evaluation Metrics

Application | GAN Architecture(s) | Dataset(s) | Evaluation Metrics
Medical/physiological generation | LSTM-LSTM, LSTM-CNN, BiLSTM-CNN, BiGridLSTM-CNN, CNN-CNN, AE-CNN, FCNN | EEG, ECG, EHRs, PPG, EMG, speech, NAF, MNIST, synthetic sets | TSTR, MMD, reconstruction error, DTW, PCC, IS, FID, ED, S-WD, RMSE, MAE, FD, PRD, averaging samples, WA, UAR, MV-DTW
Financial time series generation/prediction | TimeGAN, SigCWGAN, DAT-GAN, QuantGAN | S&P 500 index (SPX), Dow Jones index (DJI), ETFs | Marginal distributions, dependencies, TSTR, Wasserstein distance, EM distance, DY metric, ACF score, leverage effect score, discriminative score, predictive score
Time series estimation/prediction | LSTM-NN, LSTM-CNN, LSTM-MLP | Meteorological data, Truven MarketScan dataset | RMSE, MAE, NS, WI, LMI
Audio generation | C-RNN-GAN, TGAN (variant), RNN-FCN, DCGAN (variant), CNN-CNN | Nottingham dataset, midi music files, MIR-1K, TheSession, speech | Human perception, polyphony, scale consistency, tone span, repetitions, NSDR, SIR, SAR, FD, t-SNE, distribution of notes
Time series imputation/repairing | MTS-GAN, CNN-CNN, DCGAN (variant), AE-GRUI, RGAN, FCN-FCN, GRUI-GRUI | TEP, point machine, wind turbine data, PeMS, PhysioNet Challenge 2012, KDD CUP 2018, parking lot data | Visually, MMD, MAE, MSE, RMSE, MRE, spatial similarity, AUC score
Anomaly detection | LSTM-LSTM, LSTM-(LSTM&CNN), LSTM-LSTM (MAD-GAN) | SET50, NYC taxi data, ECG, SWaT, WADI | Manipulated data used as a test set, ROC curve, precision, recall, F1, accuracy
Other time series generation | VAE-CNN | Fixed length time series “vehicle and engine speed” | DTW, SSIM
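Toy Sine Dataset

Architecture | Loss Function | MMD | DTW | MSE
LSTM-LSTM | BCE | 0.9527 | 91.1071 | 0.2308
LSTM-LSTM | MSE | 0.0078 | 54.1644 |
BiLSTM-LSTM | BCE | 0.1215 | 428.4310 | 3.0700
BiLSTM-LSTM | MSE | 0.9515 | 79.5607 | 0.2362
LSTM-CNN | BCE | 0.0065 | 5.3620 | 0.3154
LSTM-CNN | MSE | 0.5757 | 86.7357 | 0.5643
BiLSTM-CNN | BCE | | 129.9257 | 0.9193
BiLSTM-CNN | MSE | 0.4891 | 43.2694 | 0.1869
GRU-CNN | BCE | 0.0244 | | 0.2303
GRU-CNN | MSE | 0.3727 | 42.7348 | 0.22823
FC-CNN | BCE | 0.0039 | 58.3565 | 0.3048
FC-CNN | MSE | 0.0117 | 43.3611 | 0.2972

MIT-BIH Arrhythmia Dataset

Architecture | Loss Function | MMD | DTW | MSE
LSTM-LSTM | BCE | 0.9931 | 30.1816 | 0.0867
LSTM-LSTM | MSE | 0.8842 | 44.4553 | 0.1389
BiLSTM-LSTM | BCE | 0.9916 | 22.8634 | 0.0699
BiLSTM-LSTM | MSE | 0.9737 | 23.5533 | 0.0806
LSTM-CNN | BCE | 0.5519 | |
LSTM-CNN | MSE | | 24.7306 | 0.0457
BiLSTM-CNN | BCE | 0.9246 | 117.3994 | 0.2272
BiLSTM-CNN | MSE | 0.0687 | 22.6740 | 0.0586
GRU-CNN | BCE | 0.0055 | 20.4845 | 0.0335
GRU-CNN | MSE | 0.7704 | 108.4124 | 0.1948
FC-CNN | BCE | 0.2068 | 23.9910 | 0.0309
FC-CNN | MSE | 0.3082 | 18.2340 | 0.0212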

7.1 Differential Privacy

7.2 Decentralized/Federated Learning

7.3 Assessment of Privacy Preservation

8 Discussion

9 Conclusion


Index Terms

Computing methodologies

Machine learning

Machine learning approaches


Published in

ACM Computing Surveys

University of Sydney, Australia

Association for Computing Machinery

New York, NY, United States

Author Tags

  • Generative adversarial networks
  • time series
  • discrete-variant GANs
  • continuous-variant GANs

Funding Sources

  • Science Foundation Ireland



  • Open access
  • Published: 28 May 2023

Time series big data: a survey on data stream frameworks, analysis and algorithms

  • Ana Almeida 1 , 2 ,
  • Susana Brás 2 , 3 ,
  • Susana Sargento 1 , 2 &
  • Filipe Cabral Pinto 1 , 4  

Journal of Big Data volume 10, Article number: 83 (2023)

Big data plays a substantial role nowadays, and its importance has significantly increased over the last decade. Big data’s biggest advantages are providing knowledge, supporting the decision-making process, and improving the use of resources, services, and infrastructures. The potential of big data increases when we apply it in real-time, providing real-time analysis, predictions, and forecasts, among many other applications. Our goal with this article is to provide a viewpoint on how to build a system capable of processing big data in real-time, performing analysis, and applying algorithms. Such a system should be designed to handle vast amounts of data and provide valuable knowledge through analysis and algorithms. This article explores the current approaches and how they can be used for real-time operations and predictions.

Introduction

The concept of big data was mentioned for the first time in a paper published in 1997 [ 1 ]. The authors called the problem of dealing with large data sets “the problem of big data”. These large data sets were characterized by not fitting in main memory, making it challenging or even impossible to analyze and visualize them. Even 25 years later, most computers cannot load 100 GB into memory, let alone process it.

In the current era, data is produced at high rates and information has a decisive role, yet most computers cannot process vast amounts of data; thus, it was necessary to create new ways to process the data. These needs were the main impulse for the appearance of big data technologies.

The first approach to dealing with big data sets was to divide them into smaller segments. However, even then, the segments could be very large in most cases. Besides, few computers were able to perform this type of processing. To tackle this issue, frameworks started to appear to deal with batches of data. Nevertheless, none of these approaches deals with one big problem: what can be done if the data set keeps growing, and data continues to be received over time? To answer this question, several frameworks that deal with data streams have appeared.

The main goals of using big data in multidimensional and large sample-sized datasets are (1) predicting future events, and (2) gaining insights and discovering relationships [ 2 ]. However, these goals bring challenges in terms of computation and methods.

Predicting future events is also known as forecasting. Forecasting tasks involve time series data. Processing and analyzing time series data in real-time can be a game-changer for an organization. This article will focus on time series data. Three tasks stand out in the analysis and prediction of time series data: monitoring, forecasting, and anomaly detection. These tasks benefit from being executed in real-time. Moreover, these tasks can be applied to many contexts and use cases. Therefore, it is important to use a streaming framework to process data as it arrives.

Anomaly detection in data streams is essential for organizations to detect problems before they grow: for instance, to notice an intrusion before the intruder can steal or damage data. Another example is detecting unexpected traffic congestion and alerting the responsible authorities. Therefore, anomaly detection in time series data will also be addressed in this article.

Using data streams in different contexts allows us to extract knowledge and make decisions in real-time (or near real-time). This article will explore how we can deal with big data, particularly time series big data. This article will also analyse which algorithms can be applied to data to make forecasts and detect anomalies.

The main contributions of this work can be summarized as follows:

  • A comparative analysis of Stream Processing Engines (SPEs), including their characteristics and provenance, processing techniques, delivery of events, performance, and popularity.
  • A discussion on forecasting algorithms, including statistical and Machine Learning (ML) algorithms, and the advantages and disadvantages of using each type of algorithm.
  • A discussion on anomaly detection algorithms, the challenges of working with datasets containing anomalies, and the methods used to detect anomalies, such as statistical and ML approaches.

A comparative analysis of SPEs led us to conclude that Spark is the most popular framework; however, Flink is better for data-intensive applications, and Heron scales better. Forecasting and anomaly detection methods bring value to organizations. While forecasting can allow better management of resources, anomaly detection can mitigate and eliminate problems. Regarding the type of methods used, statistical methods are usually lighter and more explainable, while machine learning methods are better when the data contains complex hidden patterns. The most recently published papers show a preference for deep learning techniques.

Working with huge amounts of streaming time series data can be a challenging task. With this in mind, we want to guide the reader on how this can be achieved. We will focus on three key relevant aspects:

  • Stream processing frameworks: these frameworks make it possible to process huge amounts of data, perform analysis, and apply algorithms in real-time.
  • Forecasting algorithms: these algorithms make it possible to predict future events. Therefore, they are essential for many organizations to make informed decisions, manage resources, and improve services, among others.
  • Anomaly detection algorithms: these algorithms make it possible to identify abnormal or unusual patterns, which can be early symptoms of something wrong and deserve attention. They help us to improve security, quality, and efficiency.

Although the main focus of this work is the literature review on streaming frameworks, since we aim to work with time series data, we will also review the forecasting and anomaly detection algorithms; they play a crucial role in taking advantage of real-time processing capabilities. Therefore, with this survey, we aim to:

  • Identify the most relevant state-of-the-art work regarding both data streams and algorithms.
  • Evaluate and compare different frameworks and methods to highlight the strengths, weaknesses, and limitations of each, and when each should be applied.
  • Provide a guide for future research by identifying gaps in the current literature, areas that need further investigation, and other opportunities.

Related work

This subsection provides an overview of other related surveys presented in the literature. Table  1 summarizes the subjects mentioned in the works presented in this article, both surveys and research works. In this section we will address the survey articles.

This article presents a literature review on how to process huge amounts of time series that are continuously being produced over time and need to be processed in real-time. Therefore, in Table  1 , we consider papers regarding big data, stream processing, real-time processing, machine learning and deep learning, forecasting, and anomaly detection. In addition, we reviewed both surveys and research articles. Unfortunately, to the best of our knowledge, no single paper analyzes all these topics. Nevertheless, we will compare our study with the most relevant works.

The most significant difference from work [ 9 ], which concerns big data streams, is that its authors compared several tools, technologies, methods and techniques regarding data streams, whereas we are more focused on data stream processing frameworks. In addition, the authors of [ 3 ] also discussed the concept of real-time associated with the processing of data streams, while the authors of [ 10 ] only performed a brief comparison of streaming processing frameworks. The authors of [ 10 ] conducted some practical evaluations of the streaming processing frameworks, whereas our survey presents a literature review. Similar to the work presented in [ 11 ], we are also researching progress in big data-oriented stream data mining; however, we focus on time series related problems, namely forecasting and anomaly detection.

Article structure

The remainder of this article is organized as follows. " Big data stream processing frameworks " section is focused on big data and data stream processing frameworks. It starts by discussing the problem definition, followed by existing solutions, and then presents the elaborations and a summary. This section characterizes big data and discusses its relationship with data streams, forecasting methods, and anomaly detection. We also present frameworks for processing data streams, compare them, and discuss some example cases where each one can be applied. Next, " Analysis and algorithms for streaming data " section discusses algorithms that can be applied in the context of big data, namely forecasting concepts and methods (" Time series forecasting " section) and anomaly detection strategies (" Anomaly detection " section). In this section, we focus on statistical, ML, and Deep Learning (DL) methods and their advantages and disadvantages. Each of these two sections follows a similar organization. Finally, " Conclusions and future research directions " section presents the conclusions and the challenges envisaged for future work, as well as some future research directions.

Big data stream processing frameworks

Problem definition

The evolution of traditional systems to streaming systems brings new processing and analysis capabilities and challenges. Firstly, we are no longer limited to bounded data, since we can process bounded and unbounded data. We are no longer required to divide or process data into multiple steps. Usually, a single step is enough. Besides, we no longer have to wait long periods for data to be processed. As we receive data, we process and obtain results and insights.

Designing the architecture of an application is an important task that should be well thought out. Since stream processing is part of an entire system, the first step in deploying this component is to analyze the system requirements and evaluate task priorities. Choosing an SPE is no different. Some of the requirements that might be considered for real-time data stream processing are:

  • Process large volumes of data;
  • Integrate data from multiple data sources;
  • Deal with data with different properties (multi-dimensional data, multiple entities, spatial-temporal dependencies);
  • Deal with bounded and unbounded data streams;
  • Deal with unsorted data, or delayed data;
  • Detect data anomalies;
  • Meet computation performance metrics (low latency, high throughput, high availability, high scalability).

As we stated before, the true value of big data comes from extracting insights from the data and helping decision-makers. Therefore, efficient and precise algorithms implemented on scalable frameworks are needed to explore the data's potential. If we consider ML and DL in our analysis, we might add the model performance (error and training time) to the list. In the context of forecasting, metrics such as the Mean Squared Error (MSE) or the \(R^2\) score can be useful [ 38 ]. In the case of anomaly detection, we may favor a method with high accuracy, high precision, or even high recall [ 16 ]. Since explanations play a crucial role in decision-making, the explainability of the ML model should also be considered [ 78 ].
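As an illustration of the metrics named above, the following minimal sketch computes MSE and the R² score for a forecast, and precision and recall for an anomaly detector. The function names and the toy values are assumptions made for this example; they do not come from the surveyed works.

```python
from typing import Sequence, Tuple


def mean_squared_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average of the squared differences between actual and forecast values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """1 minus the ratio of residual variation to total variation around the mean."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def precision_recall(labels: Sequence[bool], flagged: Sequence[bool]) -> Tuple[float, float]:
    """Precision: share of flagged samples that are real anomalies.
    Recall: share of real anomalies that were flagged."""
    tp = sum(1 for l, f in zip(labels, flagged) if l and f)
    fp = sum(1 for l, f in zip(labels, flagged) if not l and f)
    fn = sum(1 for l, f in zip(labels, flagged) if l and not f)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical forecast of four consecutive observations.
actual = [10.0, 12.0, 14.0, 16.0]
forecast = [11.0, 12.0, 13.0, 17.0]
mse = mean_squared_error(actual, forecast)
r2 = r2_score(actual, forecast)

# Hypothetical anomaly labels and detector output for five samples.
is_anomaly = [False, True, False, True, False]
flagged = [False, True, True, False, False]
precision, recall = precision_recall(is_anomaly, flagged)
```

Which of precision or recall to favor depends on the application: missing an intrusion is usually costlier than a false alarm, which argues for high recall, whereas alert fatigue argues for high precision.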

There are several SPEs. Each SPE provides different features and has different properties. Moreover, each one can be more or less adequate according to the application.

The concept of big data has evolved through the years. First, big data started being depicted as a massive amount of data that does not fit in the main memory and requires more sophisticated ways of processing and visualizing [ 1 ]. This definition remains true; however, it is incomplete, since it is always being updated due to the data explosion [ 18 ] that occurred during the last decades. Defining big data is not a simple task because of its complexity. Figure  1 summarizes big data characteristics, challenges and opportunities.

Figure 1. Big data taxonomy (information collected from [ 2 , 5 , 15 , 17 , 19 ])

As previously mentioned, this massive amount of data is characterized by massive sample size and high dimensionality [ 2 ]. Besides, data can arrive at high velocities and different flow rates [ 19 ]. Moreover, data can come from different sources [ 2 ], making it more complex. Data stream frameworks can receive data from multiple sources and process huge volumes of data, continuously arriving at high velocities. Several factors increase the complexity of dealing with big data, such as the variety of data that can be received [ 19 ]. For example, we can receive numerical values, text, images, sounds, video, or a combination of more than one type. In addition, our data can have a temporal component that brings additional complexity to the problem.

The maximum potential of big data is achieved when we trust the data and take advantage of it by analyzing it. Thus, we must identify inaccurate and uncertain data and deal with it [ 19 ]. In this context, the importance of anomaly detection methods is highlighted, especially the real-time detection of anomalies in data streams to mitigate anomalies as soon as they happen.

Some of these characteristics bring statistical, computational, and visualization problems. Statistical problems include algorithm instability, noise accumulation, spurious correlation, incidental endogeneity, and measurement errors [ 2 ]. Computational problems include storage, scalability, and bottleneck problems [ 2 , 79 ]. Finally, visualization can be complex or even impossible when we have high-dimensional data.

Statistical problems can have dangerous consequences, since they can lead to wrong statistical inferences or false scientific discoveries. For instance, an excellent example of a spurious correlation is the strong correlation (99.79%) between “US spending on science, space, and technology” and “Suicides by hanging, strangulation and suffocation” [ 80 ]. Clearly, these two phenomena are unrelated. This illustrates a well-known principle in statistics: correlation does not imply causation. However, spurious correlations can go unnoticed depending on the context and the available knowledge.

To summarize, big data requires demanding computational resources, and its potential is unlocked through trust in data analysis. Therefore, several streaming frameworks emerged to process big amounts of data with low latency, high throughput, and high scalability. Furthermore, anomaly detection methods are essential in data streams [ 19 ], since streams can suffer security attacks, devices can malfunction, or something unexpected may occur. We can also execute these methods in batch; however, when applied to real-time streaming data, they achieve their full potential. Besides, big data makes it possible to (1) forecast future events, and (2) gain insights and discover relationships in data [ 2 ], both being important tasks, especially for decision-makers.
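A spurious correlation of this kind is easy to reproduce. The sketch below is an illustration under stated assumptions, not the data from [ 80 ]: it generates two synthetic yearly series that are unrelated except that both trend upward, and computes their Pearson correlation before and after differencing. The series, seed, and coefficients are invented for this example.

```python
import random
from math import sqrt
from typing import List, Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def differences(series: Sequence[float]) -> List[float]:
    """Year-over-year changes, which remove a linear trend."""
    return [later - earlier for earlier, later in zip(series, series[1:])]


rng = random.Random(0)  # fixed seed so the run is repeatable
years = range(20)

# Two synthetic series with independent noise; the only thing
# they share is that both grow over time.
series_a = [100 + 5.0 * t + rng.gauss(0, 3.0) for t in years]
series_b = [20 + 0.8 * t + rng.gauss(0, 0.5) for t in years]

r_levels = pearson(series_a, series_b)
r_changes = pearson(differences(series_a), differences(series_b))

print(f"correlation of levels:  {r_levels:.3f}")
print(f"correlation of changes: {r_changes:.3f}")
```

The correlation of the raw levels is driven by the shared trend, while the correlation of the year-over-year changes reflects only the independent noise. Comparing the two is one simple check before reading meaning into a strong correlation between trending time series.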

Big data analysis, forecasting, and anomaly detection are achieved through statistical, machine learning, or deep learning methods. Note that deep learning is a subset of machine learning. Figure  2 depicts Google search trends through the years, by keyword. Big data, machine learning, and deep learning show a growing trend over the years. On the other hand, anomaly detection had a very slight increase. The search trend for forecasting decreases and reaches its peak in 2022; however, other terms, such as prediction, can also be used to express forecasting. Note that Google Trends does not allow complex queries.

Figure 2. Google search trends over time (data collected from [ 81 ])

We can apply big data to a vast amount of scientific fields. We will present examples of use cases and applications for analyzing time series data streams in real-time. We will also include some examples that benefit from forecasting or anomaly detection methods.

In finance and economics, monitoring the stock market, detecting fraud, and forecasting the performance of assets are highly relevant tasks. In [ 25 ], the authors used Artificial Neural Networks (ANNs) and data streams to forecast stock prices. Monika Arya et al. [ 21 ] proposed a real-time method to detect credit card fraud in data streams, using ANNs with ensemble trees.

Regarding health care and well-being, monitoring patients and having real-time processing capabilities can save lives. For instance, Leo Kobayashi et al. [ 82 ] created a patient monitor system using streams and multimodal data fusion. Their approach allowed them to analyse the data, conduct experiments and develop and apply algorithms. Another interesting application is to monitor and forecast the spread of infectious diseases. For instance, Ensheng Dong et al. [ 83 ] created an interactive dashboard to monitor COVID-19 using data streams.

We can also find works in informatics and communications that benefit from using frameworks to process data streams, for tasks such as monitoring resource usage or detecting security attacks. In [ 4 ], the authors propose an internet traffic monitoring system using streaming frameworks. In [ 7 ], Liu et al. perform resource management and scheduling.

Other main areas with big data characteristics are smart cities and industry 4.0. One significant advantage is that they allow the creation of living labs, creating a space for learning and innovation. We can find several works to monitor and improve urban mobility, monitor water consumption and detect water leaks [ 84 ], and forecast traffic flow [ 38 ], among many others. Leonhard Hennig et al. [ 23 ] built a system to extract mobility and industry events from data streams. Qinglong Dai et al. [ 13 ] used a data stream framework with customized changes to process data from smart grids. Still, in the context of energy systems, Philsy Baban [ 24 ] could process and validate real-time streaming data. In [ 8 ], Sahal et al. discussed streaming frameworks and other tools to perform predictive maintenance for railway transportation and wind energy.

As can be observed, we can find big data applications in several different fields. Society can benefit greatly from big data; however, big data can also be dangerous. For instance, it can serve for mass surveillance and persecution or increase the disparities affecting minorities. In this article, we will not explore this “dark side” of big data; we hope that governments and institutions use big data for good. In this context, a new research area has emerged: “fair AI”, whose main goal is to combat racism, sexism, and other types of discrimination against minorities [ 85 ].

Real-time data stream processing

We use the term “big data” to define huge amounts of data [ 1 ] and the term “stream” to express data continuously being created and arriving [ 86 ]. This data can come from different sources and have different formats; its processing is not always trivial, especially if it is required in real-time.

Big data applications can have five types of components: data sources, a messaging platform, a processing module, a storage mechanism, and a presentation module. The data sources can be, among others, Internet of Things (IoT) sensors and social networks. These sources of information usually come from users, devices or activity logs. The messaging platform is responsible for sending data between modules. The processing module can be a streaming processing framework to ensure real-time processing capabilities. The storage mechanism can be a database or a data warehouse. Processed data can be presented in different ways, such as a web application, a mobile application, and a technical report. Figure  3 depicts the components of big data applications.

Figure 3. Components of big data applications

Existing solutions

Fundamental concepts

In " Problem definition " we mentioned application requirements that can restrict the choice of an SPE. Now, we will discuss fundamental concepts that make it possible to have different data-processing techniques.

We may consider three types of processing: batch-based, stream-based, or event-based [ 87 ]. Batch processing is characterized by processing bounded data streams with a beginning and an ending. On the contrary, stream processing is characterized by the processing of unbounded data streams that do not have a known end. Besides, the data processing is performed as data arrives. If our application requires that we generate alerts or triggers if our data meets some conditions, we have event-based processing.
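The three processing types can be contrasted with a small sketch. It is a plain-Python illustration under assumed names and toy sensor values, not the API of any SPE: the batch function needs the whole bounded data set, the stream function emits a result per arriving sample, and the event-based function emits output only when a condition is met.

```python
from typing import Iterable, Iterator, List


def batch_average(readings: List[float]) -> float:
    """Batch: the input is bounded, so all of it is read before answering."""
    return sum(readings) / len(readings)


def stream_average(readings: Iterable[float]) -> Iterator[float]:
    """Stream: an updated result is emitted as each sample arrives;
    the input may never end."""
    total = 0.0
    for count, value in enumerate(readings, start=1):
        total += value
        yield total / count


def threshold_alerts(readings: Iterable[float], limit: float) -> Iterator[str]:
    """Event-based: an alert is emitted only when a sample meets a condition."""
    for index, value in enumerate(readings):
        if value > limit:
            yield f"alert: sample {index} = {value} exceeds {limit}"


# Hypothetical temperature readings.
readings = [20.0, 22.0, 21.0, 35.0]

batch_result = batch_average(readings)
running_results = list(stream_average(iter(readings)))
alerts = list(threshold_alerts(readings, limit=30.0))
```

The last running average equals the batch result, but the stream version made intermediate answers available without waiting for the end of the data.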

Concerning the processing model, we also have three types: at most once, at least once, and exactly once. At most once processing does not guarantee that the data is processed or persisted. In case of failure, we may have to deal with missing data. Usually, applications that choose at most once processing are more concerned with latency than reliability. On the contrary, at least once processing may process or persist duplicated data, but it guarantees that every record is processed or persisted at least one time. Finally, exactly once processing processes or persists each record exactly one time.
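The difference between the three guarantees can be shown with a scripted simulation. This is a sketch under stated assumptions: the `deliver` function, the outcome labels, and the failure plan are hypothetical, and exactly once is approximated here by retrying and letting the sink discard message IDs it has already stored, which is one common way to obtain the effect rather than the mechanism of any particular SPE.

```python
from typing import Iterable, List, Tuple

Message = Tuple[int, str]  # (message ID, payload)


def deliver(messages: Iterable[Message], plan: List[str],
            retry: bool, deduplicate: bool) -> List[str]:
    """Send each message to a sink and return what the sink stored.

    Each attempt consumes one scripted outcome from `plan`:
      "ok"       - message stored and acknowledged
      "lost"     - message dropped before reaching the sink
      "ack_lost" - message stored, but the acknowledgment is dropped
    """
    outcomes = iter(plan)
    stored: List[str] = []
    seen_ids = set()
    for msg_id, payload in messages:
        while True:
            outcome = next(outcomes)
            if outcome != "lost":  # the message reached the sink
                if not (deduplicate and msg_id in seen_ids):
                    stored.append(payload)
                    seen_ids.add(msg_id)
            if outcome == "ok" or not retry:
                break  # acknowledged, or the sender never retries
    return stored


messages = [(1, "a"), (2, "b"), (3, "c")]
plan = ["ok", "lost", "ok", "ack_lost", "ok"]  # hypothetical failure script

at_most_once = deliver(messages, plan, retry=False, deduplicate=False)
at_least_once = deliver(messages, plan, retry=True, deduplicate=False)
exactly_once = deliver(messages, plan, retry=True, deduplicate=True)
```

With this script, the run without retries loses a message, the run with retries stores one message twice because its acknowledgment was lost, and the run with retries plus deduplication stores each message once.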

Window mechanisms specify how to divide the stream in order to aggregate time series data. There are six main processing techniques [ 26 , 88 ]. The most basic mechanism is single-pass processing, in which we process each new sample only once. The remaining techniques are windowing mechanisms, and a window can be defined as a function of time or of the number of events [ 27 ]. A sliding window is a window with a fixed size that slides over the data stream [ 26 ]. Tumbling windows are non-overlapping sliding windows [ 88 ]. Session windows are similar to tumbling windows; however, in session windows, there is a gap between windows [ 88 ]. In a landmark window, a sample is specified from which the window keeps growing [ 26 ]. This sample can be updated from time to time. Finally, the damped window mechanism uses fading: the most recent samples have a bigger weight and, as time goes by, samples lose their weight [ 26 ]. Figure  4 represents some of these window mechanisms.

Figure 4. Processing window mechanisms
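The window mechanisms described above can be sketched over an in-memory list. This is a simplified illustration with assumed function names and parameters; real SPEs apply the same ideas incrementally to unbounded streams. The session window here closes when the time between consecutive events exceeds a gap, and the damped window is realized as an exponentially weighted average, both of which are assumptions about how to instantiate the descriptions.

```python
from typing import List, Sequence, Tuple

Event = Tuple[float, str]  # (timestamp, payload)


def sliding_windows(samples: Sequence[int], size: int, step: int = 1) -> List[List[int]]:
    """Fixed-size windows that advance by `step` samples."""
    return [list(samples[i:i + size])
            for i in range(0, len(samples) - size + 1, step)]


def tumbling_windows(samples: Sequence[int], size: int) -> List[List[int]]:
    """Fixed-size windows that do not overlap."""
    return [list(samples[i:i + size]) for i in range(0, len(samples), size)]


def session_windows(events: Sequence[Event], gap: float) -> List[List[Event]]:
    """Group time-ordered events; a new session starts after a silence longer than `gap`."""
    sessions: List[List[Event]] = []
    for timestamp, payload in events:
        if sessions and timestamp - sessions[-1][-1][0] <= gap:
            sessions[-1].append((timestamp, payload))
        else:
            sessions.append([(timestamp, payload)])
    return sessions


def landmark_windows(samples: Sequence[int], landmark: int) -> List[List[int]]:
    """Windows that start at index `landmark` and grow with each new sample."""
    return [list(samples[landmark:end])
            for end in range(landmark + 1, len(samples) + 1)]


def damped_average(samples: Sequence[float], decay: float) -> float:
    """Weighted average in which a sample's weight is multiplied by
    `decay` (between 0 and 1) each time a newer sample arrives."""
    weighted_sum = 0.0
    weight_total = 0.0
    for value in samples:
        weighted_sum = decay * weighted_sum + value
        weight_total = decay * weight_total + 1.0
    return weighted_sum / weight_total


samples = [1, 2, 3, 4, 5, 6]
sliding = sliding_windows(samples, size=3)
tumbling = tumbling_windows(samples, size=2)
landmark = landmark_windows(samples, landmark=3)
sessions = session_windows([(0, "a"), (1, "b"), (10, "c"), (11, "d")], gap=5)
recent_weighted = damped_average([1.0, 2.0, 3.0], decay=0.5)
```

Calling `sliding_windows(samples, size=2, step=2)` yields the same windows as `tumbling_windows(samples, size=2)`, which matches the description of tumbling windows as non-overlapping sliding windows.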

Regarding stream-based processing, its methods can be considered stateless or stateful. If the processing is stateless, then the state is not preserved. We can use stateless processing if we want to know how many people buy a specific game per month. On the other hand, if the state is retained, the processing is stateful. This can be useful to measure how many people buy the game over time in a cumulative manner.
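A minimal sketch of the distinction, using hypothetical purchase events: the stateless function answers from the current event alone, while the stateful generator carries a running count from one event to the next. The event layout and function names are assumptions for this example.

```python
from typing import Dict, Iterable, Iterator

Purchase = Dict[str, str]


def is_game_purchase(event: Purchase, game: str) -> bool:
    """Stateless: the result depends only on the current event."""
    return event["item"] == game


def cumulative_purchases(events: Iterable[Purchase], game: str) -> Iterator[int]:
    """Stateful: the running count is state kept between events."""
    count = 0
    for event in events:
        if is_game_purchase(event, game):
            count += 1
        yield count


# Hypothetical stream of purchase events.
events = [{"item": "chess"}, {"item": "go"}, {"item": "chess"}, {"item": "chess"}]

flags = [is_game_purchase(event, "chess") for event in events]
totals = list(cumulative_purchases(events, "chess"))
```

Because the stateful operator must not lose `count` when a worker fails, stateful processing is where fault tolerance and the delivery guarantees discussed earlier matter most.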

Data processing frameworks

As aforementioned, we will discuss and compare different SPEs. We selected six SPEs: Apache Spark, Apache Flink, Apache Storm, Apache Heron, Apache Samza, and Amazon Kinesis. Besides, we decided to include Apache Hadoop for historical reasons.

Hadoop Footnote 1 was the first framework that appeared to process large datasets using the MapReduce programming model. Hadoop is very scalable, since it can run on a single cluster, in a single machine, or spread on several clusters in multiple machines. Moreover, Hadoop takes advantage of distributed storage to improve performance by transmitting the code that processes the data instead of the data [ 89 ]. Besides, Hadoop provides high availability and high throughput. However, it can have efficiency problems when dealing with small files.

The major drawback of using Hadoop is that it does not support real-time stream processing. To deal with this problem, Apache Spark emerged. Spark Footnote 2 is a framework for processing batch and streaming data, and allows distributed processing. According to Matei Zaharia [ 90 ], the creator of Spark, Spark was designed to respond to three big problems of Hadoop:

Support iterative algorithms that make several passes through the data;

Allow real-time streaming;

Allow interactive queries.

Instead of MapReduce, Spark uses Resilient Distributed Datasets (RDDs) that are fault-tolerant and can be processed in parallel. Spark also provides scalability, and since its early releases, it has proved to outperform Hadoop [ 33 ]. Spark is helpful for data science related projects. Besides its main component, Spark provides several libraries for Exploratory Data Analysis (EDA), ML, graph analysis, stream processing and SQL analytics.

Two years later, Apache Flink Footnote 3 and Apache Storm Footnote 4 were created. While Spark uses micro-batches for stream processing, Flink and Storm can perform stream processing natively. Flink can process batch and streaming data. In Flink, we can process streams with specific temporal requirements. For example, we may consider processing time or event time. In the case of event time, Flink can deal with delayed events. Besides, Flink provides watermark support, allowing a trade-off between latency and completeness of data. Storm and Flink are similar frameworks, which has generated some discussion regarding their differences [ 91 ], of which the following stand out:

Storm only allows stream processing;

They both can perform stream processing with low latency;

The API offered by Flink is more high-level and provides more functionalities;

They have different strategies to provide fault tolerance (Storm employs record-level acknowledgements while Flink uses a snapshot algorithm).
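The trade-off between latency and completeness that watermarks introduce in event-time processing can be illustrated with a framework-independent sketch. It assumes a watermark that trails the largest event time seen so far by a fixed `max_delay`; the function `split_by_watermark` is hypothetical and is not the Flink API.

```python
def split_by_watermark(events, max_delay):
    """Classify events as on-time or late using a simple watermark.

    events: iterable of (event_time, value) pairs in arrival order.
    """
    watermark = float("-inf")
    on_time, late = [], []
    for event_time, value in events:
        if event_time < watermark:
            late.append((event_time, value))      # arrived after the watermark passed it
        else:
            on_time.append((event_time, value))
        # The watermark never moves backwards.
        watermark = max(watermark, event_time - max_delay)
    return on_time, late


arrivals = [(1, "a"), (5, "b"), (2, "c"), (4, "d"), (9, "e"), (3, "f")]
print(split_by_watermark(arrivals, max_delay=2))
print(split_by_watermark(arrivals, max_delay=10))
```

A larger `max_delay` accepts more out-of-order events (more complete results), but a window can only be closed once the watermark has passed its end, so results are emitted later.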

Storm is a good streaming framework; however, its capabilities to scale are not enough for more demanding applications. Besides this, debugging and managing Storm can be complex tasks. In this context, Apache Heron Footnote 5 emerges, as the successor of Storm. A paper published in 2015 [ 34 ] announced this transition at Twitter.

Apache Samza Footnote 6 is a framework that provides real-time processing, event-based applications, and Extract, Transform and Load capabilities. Samza provides several APIs and presents an architecture similar to Hadoop, but instead of using MapReduce, it has the Samza API, and it uses Kafka instead of the Hadoop Distributed File System.

Finally, Amazon Kinesis Footnote 7 is the only framework presented in this article that does not belong to the Apache Software Foundation. Kinesis is actually a set of four frameworks rather than a single data stream framework. For simplicity, in this work we use Amazon Kinesis to refer to the Kinesis Data Streams framework. Kinesis can easily be integrated with Flink.

Elaboration

The processing frameworks present different properties, which makes it challenging to choose one framework without understanding the differences. Therefore, we should choose the framework that best suits our use case.

Firstly, we decided to look at the nature of each framework. Although several frameworks belong to the Apache ecosystem, most were not created by Apache. They were later integrated into the Apache family through The Apache Incubator. Footnote 8 Table  2 summarizes the nature of each one of them.

Table  3 contains information about the processing techniques available (batch or stream) and the delivery of events (at most once, at least once, exactly once). As we already mentioned, Hadoop only provides batch processing. Storm and Heron only provide stream processing. All other frameworks offer both batch and stream processing. However, Spark provides stream processing through micro-batches. Regarding the delivery of events, most frameworks guarantee that the events are processed exactly once or at least once. Heron offers three types of delivery: the two mentioned above and at most once. Besides, these frameworks provide drivers for several programming languages, of which the most popular are Python and Java.

Performance-wise, some experiments have been conducted to compare the different SPEs. Note that it is difficult to make a fair comparison due to the lack of experiments that contemplate all frameworks. Therefore, we started with a performance comparison based on the information available in the official documentation of each framework, which is presented in Table  4 . One of the most important characteristics when choosing a framework is the ability to process information in real-time. However, a consensual definition of what real-time means is still lacking. Gomes et al. [ 3 ] focused their study on this concept in the context of data streams and big data. According to the authors, there are different intents when discussing real-time. For example, real-time could mean an immediate response. Another possibility is the guarantee of low latency: some consider the time within which the system should answer, while others refer to the time within which the system must answer. For a fairer comparison, in this discussion, we will focus on real-time as the property of having low latency.

Most of these frameworks present low latency, which is good when we are processing significant amounts of data and want to process it in real-time. Hadoop is the only one that is considered to have high latency. All frameworks present high throughput and high scalability. However, Hadoop only allows scaling vertically. Regarding fault tolerance mechanisms, all frameworks deal with fault tolerance.

After this initial study, we looked for works that compare some of these frameworks to make an unbiased comparison. In 2015, Namiot et al. [ 10 ] made an introductory comparison of the properties of Storm, Spark, Samza, Apache Flume, Apache Kafka, Amazon Kinesis, and IBM InfoSphere.

Besides the noticeable differences between Hadoop and Spark, Pooja Choudhary et al. [ 28 ] conducted some experiments to compare these two frameworks. They concluded that Spark uses more memory than Hadoop but needs less execution time. However, the authors of [ 35 ] mentioned that Spark might not be the best framework if our application requires low latency and high throughput.

The authors of [ 29 ] compared the performance of Spark, Flink, and Storm under saturation conditions (the maximum streaming load that the frameworks could support without delay). This comparison is insightful if we want to choose the best framework for a data-intensive application. Flink presented the highest saturation level, while Storm had the worst CPU usage. When failure recovery mechanisms are activated, Storm performance decreases by 50%, while Flink performance only decreases by 10%. Nevertheless, Spark can surpass Flink if we are not concerned with latency.

Inoubli et al. [ 12 ] performed experiments in which they compared Spark, Storm, Flink, and Samza. They observed that Spark achieved the worst processing rates compared to the other three frameworks. Flink and Samza were more efficient, especially when messages were larger. Flink CPU usage was lower; however, Flink could outperform Storm if the CPU consumption allowed was increased. Spark requires more RAM and less disk access, is slower in processing messages, and uses less bandwidth.

In 2019, in the context of a smart city, Hamid Nasiri et al. [ 30 ] evaluated three different frameworks: Spark, Flink and Storm. They started by fixing the input rate and compared the performance with two nodes versus eight nodes. With two nodes, Flink presented the lowest latency and the highest throughput. Flink delivered a similar performance with a slightly higher throughput with eight nodes. The improvements on Spark and Storm were more significant, but Flink was still the best. On the other hand, Spark had the worst latency. With eight nodes, Spark presented a similar throughput to Flink; however, it reached the highest throughput peaks. They analyzed the impact of changing the input rate and the number of worker nodes. We can conclude that the performance of Flink is similar to Storm, even when using no acknowledgements in Storm. The most significant difference is the throughput in which Flink is better than Storm; however, Storm seems to scale better, and with eight nodes, Spark is the best of them all in terms of throughput. At last, they measured CPU and network utilization. Flink achieved the lowest CPU utilization and the highest network utilization. Storm and Spark achieved similar performances.

Kolajo et al. [ 9 ] compared 19 tools and technologies for data streaming; however, only half of them supported both batch and streaming processing. In another work [ 31 ], in 2019, the authors compared the performance of five stream processing systems: Storm, Flink, Spark, Kafka Stream, and Hazelcast Jet. Storm has the best memory consumption and presents good stability. Flink presents the lowest latency. Spark presents the highest throughput and has good compatibility with ML libraries.

In 2020, LinkedIn published a post [ 92 ] showing some improvements performed on Samza. These improvements provided Samza with higher throughput capabilities when compared with Flink.

Later in 2021, Krzysztof Wecel et al. [ 32 ] selected six frameworks, but chose to focus their analysis on comparing Spark and Flink. They concluded that Spark is more memory efficient while Flink is more CPU efficient. The authors also mentioned that, while performing their experiment, they found a problem that led to delays in the implementation phase: missing detailed documentation. We were already aware of this problem, especially with Flink.

Heron brings an extensive set of advantages to users that want to transition from Storm to a more scalable framework. The API available for Heron is compatible with the one available for Storm. Heron requires fewer resources (less CPU usage) and provides performance improvements (more throughput and less latency). Currently, Heron is in the incubating phase at The Apache Incubator [ 93 ].

To understand the frameworks’ popularity, we decided to perform two experiments using Scopus. Footnote 9 These experiments were performed on August 9th, 2022. In the first experiment, we try to understand the popularity of the different frameworks over the years. In the second experiment, we try to perceive how many publications exist when we consider different criteria.

For the first experiment, we created three queries. The example below contains the queries for the Apache Hadoop framework. Similar queries were performed for the remaining frameworks.

apache w/ hadoop

TITLE-ABS-KEY (apache w/ hadoop)

TITLE-ABS-KEY (apache w/ hadoop) AND (LIMIT-TO (SUBJAREA,“COMP”) OR LIMIT-TO (SUBJAREA,“ENGI”))

Firstly, we perform a general search using only the framework’s name. Secondly, we restrict the search to papers with the framework’s name in the title, abstract or keywords. Lastly, we limit the subject area to papers published in the engineering or computer science fields.
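The three queries shown above for Hadoop can be generated for the remaining frameworks with a small helper. This is a sketch: the function `scopus_queries` is hypothetical, the `vendor` parameter (for instance "amazon" for Kinesis) is an assumption about how the queries were adapted, and straight quotation marks are used in place of the typographic ones.

```python
def scopus_queries(framework, vendor="apache"):
    """Return the three query strings of the first experiment for one framework."""
    base = f"{vendor} w/ {framework}"           # general search on the name
    in_fields = f"TITLE-ABS-KEY ({base})"       # name in title, abstract or keywords
    by_area = (                                 # restricted to two subject areas
        f'{in_fields} AND (LIMIT-TO (SUBJAREA,"COMP") '
        f'OR LIMIT-TO (SUBJAREA,"ENGI"))'
    )
    return [base, in_fields, by_area]


for name in ["hadoop", "spark", "flink", "storm", "heron", "samza"]:
    for query in scopus_queries(name):
        print(query)
```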

Figure  5 contains the results of the first query. We can see that Hadoop is the dominant framework in the first years. This happens because Hadoop is the oldest, and most frameworks did not exist or did not belong to the Apache Software Foundation at the time. The most popular streaming framework is Spark. Following Spark, the popularity of Flink and Storm is similar. Finally, Heron, Samza and Kinesis are the least popular frameworks.

figure 5

Data processing frameworks: Popularity over the years first query

Figure  6 presents the results of the second query. When we restrict the search to papers with the framework’s name in the title, abstract or keywords, we can see that Spark is the dominant framework. This might indicate that most papers that mention Hadoop only mention it because it was the first relevant framework. Another explanation is that Hadoop is the framework used in the study, but was not the subject of the study. Therefore, this second query is more focused on studying the framework, not its usage.

figure 6

Data processing frameworks: Popularity over the years second query

What we can visualize in Fig.  6 is intensified in Fig.  7 when we limit the subject area. Figure  7 shows the results of the third query.

figure 7

Data processing frameworks: Popularity over the years third query

In the second experiment, we evaluate the number of papers that considered stream-related concepts and algorithms. Our goal is to understand, for instance, how many articles that addressed forecasting also addressed streams. We started with two basic queries. First, query 4 helps to understand how many papers contain the word forecast or other words derived from it, such as forecasting or forecasts. Query 5 helps to understand how many papers include anomaly detection or outlier detection. Query 6 is an additional query to understand how many papers also include ML or DL.

(anomaly w/ detection) OR (outlier w/ detection)

(machine w/ learning) OR (deep w/ learning)

Figure  8 contains the results for forecasting terms. We start by performing query 4, which we named forecast-term. Then, we also included query 6, which we called ML-term. Then, we selected only the papers that had both terms in the title, abstract, or keywords. The next step was to limit by subject area (as in the first experiment). Then, we limited the search to the years from 2012 until 2023. Finally, we included different terms in order to answer our initial question. We separated the term stream from the several frameworks. As we can see, we started with 1.5 million papers, and in the end, only 1 thousand had terms related to streams.

figure 8

Forecast versus Stream

Figure  9 contains the results for anomaly detection. The only thing that changed with respect to Fig. 8 was the initial term that, in this case, was the anomaly detection term, query 5. As we can see, we began with 136 thousand papers, and in the end, only five hundred had terms related to streams.

figure 9

Anomaly detection versus Stream

Only a few papers consider both streaming and forecasting concepts, which is surprising because a forecasting algorithm, to provide the most benefits, should perform real-time forecasting. A possible explanation is that, given the complexity of implementing both a stream-based forecasting system and a forecasting algorithm, researchers focus on developing only one of these tasks when they publish their work. The same can be applied to anomaly detection concepts and other applications.

Choosing the best SPE is a critical engineering task that should consider the following. Foremost, only Spark, Flink, Samza and Kinesis allow both batch and stream processing. In addition, Spark and Flink do not allow missing or repeated data. However, Heron enables the choice of any delivery. Flink is the best framework for data-intensive applications, presenting the lowest latency and highest throughput. However, Storm seems to scale better. Recent studies have indicated that Samza has a better throughput than Flink, and that Heron scales better than Storm. Nevertheless, Spark and Storm are the most popular stream frameworks. Heron is a good substitute for Storm, allowing Storm users to transition easily.

Analysis and algorithms for streaming data

In the scope of ML, several tasks can take advantage of streaming technologies, such as regression, classification, clustering, forecasting, anomaly detection, and frequent pattern mining.

In this section, we decided to focus on two tasks related to time series: forecasting (" Time series forecasting " section) and anomaly detection (" Anomaly detection " section).

Time series forecasting

Humans are constantly trying to predict the future. Thousands of years ago, when we started counting time, we also began to make predictions. One of the questions that most haunts humanity, and that several societies, religions and individuals have tried to guess, is when doomsday will occur. Several dates have been proposed over the years, but until now, none of them has been correct.

Forecasting is a prediction task in which we try to predict future events accurately. To make good forecasts, we should understand the phenomenon and the causes that influence the phenomenon. We can use historical data, events that may occur, and other information that may contribute to the forecasting task [ 94 ]. For example, when we look at the sky and see dark clouds, we can (most certainly) guess it will rain.

Depending on the domain of our problem, we should look for data other than the phenomenon’s data. For instance, Wasiat Khan et al. [ 45 ] used data from social media and financial news to predict the stock market’s performance. However, the authors recognize that not all stocks are influenced the same way. Besides, the authors noticed that some stocks were more influenced by social media news, while others were more influenced by financial news. Ahmad Ali et al. [ 46 ] considered the spatial-temporal dependencies and several temporal patterns (current, daily, and weekly) to predict crowd flows. The use of external factors, such as weather conditions, holidays, and events was also crucial in this context.

Forecasting tasks can be classified as short, medium or long-term forecasts [ 94 ]. These terms are used if the forecast is made for the near future, medium future or distant future. For instance, we may want to predict how many people will travel to a tourist destination in the next hour, in the next week, or in the next year.

Usually, short-term forecasting is only relevant in a short interval. Therefore, we might benefit from performing the forecasting in real-time or near-real-time. On the other hand, medium and long-term forecasting is not needed immediately; therefore, we can perform them offline.

Forecasting problems use time series data. A time series is the evolution of one variable (or more) over time. A time series is a stochastic process, time-indexed, thus making statistical properties relevant. When we only have one variable, we have a univariate time series. We have a multivariate time series when we have more than one variable. Usually, when we are in the presence of a univariate time series, we call it a time series [ 94 , 95 , 96 ].

figure 10

Forecasting methods

Time series data is similar to streaming data, since we can look at the data arriving from the streaming with a temporal component and a sequential order. However, this does not mean that all data from streams are time series, even though they might have a timestamp associated.

There are three types of forecasting methods: historical, statistical, and ML. Historical methods only look at past values to forecast new ones. The most popular historical method is the Historical Average (HA), which can be found in the literature [ 47 ], especially as a baseline. Statistical methods are mainly based on the Auto Regressive (AR) method. They are also usually used as baselines. For instance, we can find Auto Regressive Integrated Moving Average (ARIMA) in [ 47 ]. ML approaches, particularly DL, have been highlighted more recently, and several novel methods have been proposed.
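The two baseline families mentioned above can be sketched in a few lines. The example assumes the simplest possible variants: a Historical Average over all past values and a first-order AR model, x[t] = c + phi * x[t-1], fitted by ordinary least squares. The function names are hypothetical, and a full ARIMA model would additionally include differencing and moving-average terms.

```python
def historical_average(series, horizon=1):
    """Forecast every future step as the mean of all past values."""
    mean = sum(series) / len(series)
    return [mean] * horizon


def fit_ar1(series):
    """Least-squares fit of x[t] = c + phi * x[t-1] (needs a non-constant series)."""
    x, y = series[:-1], series[1:]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    var = sum((a - mx) ** 2 for a in x)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    phi = cov / var
    c = my - phi * mx
    return c, phi


def forecast_ar1(series, horizon=1):
    """Iterate the fitted AR(1) recursion `horizon` steps ahead."""
    c, phi = fit_ar1(series)
    last, out = series[-1], []
    for _ in range(horizon):
        last = c + phi * last
        out.append(last)
    return out


history = [12.0, 13.5, 13.0, 14.2, 15.1, 14.8, 15.9]
print(historical_average(history, horizon=3))
print(forecast_ar1(history, horizon=3))
```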

We can find forecasting works related to energy consumption and pricing. Bangzhu Zhu et al. [ 48 ] used an SVM-based method with mixture kernels to forecast carbon prices. Razak Olu-Ajayi et al. [ 49 ] predicted the energy consumption of buildings using ML and DL models, and concluded that ANNs are more suitable to make predictions. In [ 50 ], Zhang et al. proposed a Multi-view Ensemble Learning Model (MELM) to forecast traffic of base stations to save power in cellular networks. Their multi-view methods had four views: a temporal, a spatial, one dedicated to events, and the last view for residual information. For the temporal component, they analyzed the auto-correlation, the trend, and the seasonality of the data, and they used the Seasonal Auto Regressive Integrated Moving Average (SARIMA) to perform short and long-term forecasting. They used a spreading model based on a grid system to observe and capture the spatial dependencies. The authors observed that different regions have a different number of users, and they observed mobility transferring from nearby regions. They used a decision tree to capture the influence of events, since they cause changes in traffic. They considered four types of events (holidays, weather, concerts, and news). For the residual information, they used a top-k regression tree.

Another explored topic is related to traffic. To predict the flow of crowds, a framework called Forecasting Citywide Crowd Flows (FCCF) is proposed in [ 51 ]. The authors used human mobility data, weather conditions, and road network data. First, they divided the human mobility data into two edge flow categories: inflow and outflow. Besides that, they split the region into small regions. Then, they decomposed the flows into seasonal, trend, and residual and built a model for each one of the flows. For the seasonal and trend components, they created an Intrinsic Gaussian Markov-Random-Field (IGMRF) for each component. For the residual, they explored the spatiotemporal dependence and built a spatiotemporal residual model that uses a Bayesian network. Then, the models were aggregated to give the final prediction.
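The idea of splitting a series into seasonal, trend, and residual components can be illustrated with a classical additive decomposition. This sketch is not the IGMRF-based model of [ 51 ]; it assumes a moving-average trend and a per-phase mean as the seasonal component, and the function `decompose` and the toy series are hypothetical.

```python
def decompose(series, period):
    """Additive decomposition: series = trend + seasonal + residual.

    Assumes len(series) >= period.
    """
    n = len(series)
    half = period // 2
    # Trend: moving average over roughly one period (window clipped at the edges).
    trend = []
    for i in range(n):
        window = series[max(0, i - half):min(n, i + half + 1)]
        trend.append(sum(window) / len(window))
    # Seasonal: mean detrended value at each position in the cycle.
    detrended = [s - t for s, t in zip(series, trend)]
    phase_mean = [
        sum(detrended[p::period]) / len(detrended[p::period]) for p in range(period)
    ]
    seasonal = [phase_mean[i % period] for i in range(n)]
    # Residual: whatever trend and seasonality do not explain.
    residual = [s - t - c for s, t, c in zip(series, trend, seasonal)]
    return trend, seasonal, residual


flows = [10, 20, 30, 12, 22, 32, 14, 24, 34, 16, 26, 36]
trend, seasonal, residual = decompose(flows, period=3)
print(trend)
print(seasonal)
print(residual)
```

Each component can then be modelled separately and the per-component predictions summed to obtain the final forecast.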

The authors in [ 52 ] proposed a multi-view network model called Deep Multi-View Spatial-Temporal Network (DMVST-NET). They observed that, in most cases, including a region that presents a weak correlation with the region we want to predict decreases the model’s performance. Usually, distant regions are less correlated, but this is not always true. Considering all this, the authors chose to create three views: a view for the temporal component, another for the spatial component (they only consider nearby regions), and the last one for semantic relations (the regions are far away but present similar demands). They used a Long Short-Term Memory (LSTM) for the temporal component, a Convolutional Neural Network (CNN) for the spatial component, and a Graph Neural Network (GNN) to capture the semantic relations.

In [ 53 ], the Multi-Task Learning Temporal Convolutional Neural Network (MTLTCNN) method is proposed for short-term passenger demand prediction. The authors started by using a Spatio-Temporal Dynamic Time Warping (ST-DTW) algorithm to select the most relevant features. The proposed method is multi-task, having one task per region. Each task comprises a Temporal Convolutional Neural Network (TCNN), and the tasks share information between them, namely spatiotemporal correlations. Ahmad Ali et al. [ 46 ] proposed an ANN model based on graphs and convolution to predict crowd flows. In addition, they explored spatiotemporal dependencies and external factors. The authors of [ 47 ] proposed an architecture that uses graphs, convolution, and recurrency to forecast traffic. Their approach explores spatiotemporal dependencies.

In 2018, Spyros Makridakis et al. [ 39 ] published the results of the fourth edition of a forecasting accuracy competition. This competition discouraged the submission of complicated ML models that required high computational capabilities. Most of the best methods were combinations of statistical models. One of the best methods was a hybrid of ML (using a Recurrent Neural Network (RNN)) and a statistical approach (exponential smoothing). In contrast, some of the submitted methods were based only on ML and achieved the worst results. Later in 2021, Spyros Makridakis et al. [ 40 ] published the results of the fifth edition of the forecasting accuracy competition. The goal was to predict the sales of a retail company represented by 42,840 time series. Most of the competitors used LightGBM-based methods, an ML method based on trees. In the top five, the first two methods were essentially weighted combinations of LightGBM models, the third was a weighted combination of Neural Networks (NNs), the fourth was a non-recursive LightGBM, and the fifth was a recursive LightGBM.
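Two ingredients that recur in the winning entries, exponential smoothing and weighted combinations of models, can be sketched as follows. The sketch assumes simple (single) exponential smoothing and a plain weighted average; the function names, the smoothing factor `alpha`, and the weights are hypothetical and do not reproduce any submitted method.

```python
def exponential_smoothing(series, alpha=0.5):
    """Simple exponential smoothing; returns the one-step-ahead forecast."""
    level = series[0]
    for x in series[1:]:
        level = alpha * x + (1 - alpha) * level  # recent values weigh more
    return level


def combine(forecasts, weights):
    """Weighted average of several models' forecasts for the same step."""
    total = sum(weights)
    return sum(w * f for w, f in zip(weights, forecasts)) / total


sales = [100, 104, 101, 110, 108]
smooth = exponential_smoothing(sales, alpha=0.3)
naive = sales[-1]                      # last observed value as a second model
print(combine([smooth, naive], weights=[0.7, 0.3]))
```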

A literature review on deep learning methods for financial time series forecasting [ 43 ] presented eight methods commonly used: Deep Multi Layer Perceptrons (DMLPs), RNNs, LSTMs, CNNs, Restricted Boltzmann Machines (RBMs), Deep Belief Networks (DBNs), Autoencoders (AEs), and Deep Reinforcement Learning (DRL). The authors highlight the preference of researchers for using RNNs, especially LSTMs, with financial data. However, as the authors identified, CNNs and Graph-based networks still need to be explored when using financial data. Meanwhile, Masini et al. [ 44 ] reviewed both ML and DL methods for financial forecasting; their main focus was NN, regression trees, bagging, and regression. The authors emphasized the use of ML models (including DL models) in the presence of large datasets.

Table 5 summarizes the reviewed works. In this comparison, we did not include the survey articles. As we can see, different approaches emerged over the last years for both ML and DL methods. Most of the authors used more than one metric to compare the methods.

Figure  10 contains some of the methods used in forecast tasks. Forecasting may be accomplished using statistical methods or DL-based methods. Both approaches have advantages and disadvantages. Depending on the context, statistical methods may be more advantageous than DL methods and vice-versa. Statistical methods are explainable and are usually more robust in short-term predictions, where they present the best results. However, they are usually not suitable for long-term forecasting.

ANNs present some disadvantages. The first problem is to find the weights of the inputs. The training process will update the model weights in each iteration; however, the optimization algorithm used may not lead to the minimum error or loss and can lead to overfitting. The training process can be extensive, making its adoption difficult in some contexts. ANNs also require a lot of information and great computational power when compared with statistical methods.

One of the big problems with ML algorithms is the lack of transparency, especially in ANNs. ANNs are often seen as “black boxes” [ 41 ]. In order to solve this issue, a new topic has emerged in the scope of ML: explainable models. Explainability plays a crucial role in the understanding of a particular problem. A correct prediction is not always enough, since it can have real impacts in terms of security, ethics, mismatched objectives, privacy, and others [ 42 ].

The most relevant advantage of using DL-based methods is the possibility of working with multidimensional data, in some cases exploring the relationships between space, time, and other factors that may influence the prediction. Statistical methods may be more beneficial for forecasting with real-time stream processing, since they are lighter. However, we should consider the application requirements, the data, and the trade-off between execution time and other performance metrics.

We decided to compare the type of methods used in forecasting in terms of popularity over the years, highlighting the last years. Figure  11 contains the relationship between the number of documents retrieved from Scopus when we perform the query example Q7. As we can observe, the use of machine learning and deep learning for forecasting increased over the last few years.

figure 11

Evolution of the popularity of type of methods regarding forecasting over the years. ML stands for Machine Learning, DL for Deep Learning, SL for Statistical Learning, and RL for Reinforcement Learning

TITLE-ABS-KEY ( forecasting AND ( “machine learning” OR “ml”) )

We also compared the methods used. Figure  12 contains the obtained results. Before 2018, the most mentioned type of method was the ANN. This can happen for two reasons: the generic ANN architecture was used, or the authors used the word when referring to a specific type of ANN. For instance, an LSTM is a type of ANN. Over the years, we can observe an increase in the use of LSTMs, CNNs, RNNs, AEs, and GNNs. The popularity of Deep Learning methods does not mean that the statistical ones are not important. It just reflects the evolution and trends of research methods.

figure 12

Evolution of the popularity of methods regarding forecasting over the years. ANN stands for Artificial Neural Network, SVM for Support Vector Machine, LSTM for Long Short-Term Memory, A&S for ARIMA and SARIMA, RNN for Recurrent Neural Network, CNN for Convolutional Neural Network, FNN for Feedforward Neural Network, AE for Autoencoder, GNN for Graph Neural Network, DBN for Deep Belief Network, LGBM for LightGBM, HA for Historical Average and RBM for Restricted Boltzmann Machine

Forecasting is an essential task when working with time series datasets. We can have different forecasting horizons, such as short, medium, and long-term. We can apply this type of method to different contexts and use cases.

Classical methods are mainly based on Auto-Regression. Regarding machine learning methods, LightGBM proved to be efficient. In the case of deep learning methods, the most used are based on LSTMs, CNNs, AEs, and GNNs. As we discussed, all methods have their positive and negative aspects. In addition, the application and intent of the problem can make the choice of technique easier.

Anomaly detection

An anomaly occurs when something unexpected happens. We can observe anomalies in our daily lives, for instance, a cold day (as if it were winter) in the middle of the summer. We can also visualize anomalies in data. If we look at a chart that contains the daily temperatures measured in the summer, we would see an anomalous point in relation to the other points. However, not all anomalies are expressed in the same way. Anomalies can be classified by their nature: they can be a point anomaly, a contextual anomaly, or a collective anomaly [ 54 ].

A point anomaly can be identified when we compare it with the rest of the data [ 55 ]. Remembering the “cold day in the middle of the summer” example, if we only had data from the summer, we would have a point anomaly if the observed temperature was very different from all others.

A contextual anomaly happens in a particular context [ 55 ]. If we had data from the entire year, we would observe that in the winter there are low temperatures. The point is anomalous because it happens in the summer and not in the winter. This is similar to a conditional anomaly, which depends on the context to be classified as an anomaly.
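The difference between a point anomaly and a contextual anomaly can be sketched with the temperature example. The sketch assumes a simple z-score rule with a hypothetical threshold `k`, applied either to the whole dataset or within each context (here, the season); the function names and the temperature values are invented for illustration.

```python
from statistics import mean, stdev


def zscore_outliers(values, k=2.0):
    """Indices of values more than k standard deviations away from the mean."""
    mu, sigma = mean(values), stdev(values)
    if sigma == 0:
        return []
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > k]


def contextual_outliers(records, k=2.0):
    """Apply the same rule separately inside each context.

    records: list of (context, value) pairs; returns indices into `records`.
    """
    groups = {}
    for i, (context, value) in enumerate(records):
        groups.setdefault(context, []).append((i, value))
    flagged = []
    for members in groups.values():
        values = [v for _, v in members]
        flagged += [members[j][0] for j in zscore_outliers(values, k)]
    return sorted(flagged)


winter = [4, 5, 6, 5, 4, 6, 5, 4, 5, 6]
summer = [28, 29, 27, 30, 28, 29, 27, 28, 29, 6]   # last day is unusually cold
records = [("winter", t) for t in winter] + [("summer", t) for t in summer]

print(zscore_outliers([t for _, t in records]))   # whole year
print(contextual_outliers(records))               # per season
```

Over the whole year, a temperature of 6 degrees is unremarkable because it is typical of winter, so no point is flagged; within the summer context the same value stands out and index 19 is reported.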

A collective anomaly is a collection of points that are considered anomalous when compared with the remaining dataset [ 56 ]. It can be, for instance, an abrupt change in the temperature of the summer. Another example would be a day on which a smaller variation of temperatures is observed. As we know, temperatures are higher in the summer. However, we can have fluctuation throughout the day. From the examples above, we can conclude that anomalies can also be present in time series, and can be isolated outliers or abrupt changes.

There are several challenges associated with the detection of anomalies. Anomalies are not always known or noticeable, and it is difficult to define what should be considered anomalous. Besides that, there is always some noise associated with anomaly detection. As an example, network attacks can change, evolve, and adapt, which makes this a complex problem in which false negatives and false positives in the analysis can have negative impacts [ 54 , 57 ].

Anomalies are known for being rare in datasets; it is because of that property that they are considered anomalies. Consequently, if our goal is to identify the anomalies in a dataset, we face a class imbalance problem. This problem is amplified when dealing with big data. There are three types of techniques to address this issue [ 16 ]:

  • Data-based techniques: using sampling methods, we can reduce the level of imbalance;

  • Algorithm-based techniques: we can reduce the bias towards the majority group;

  • Hybrid techniques.
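As a rough illustration of the first two families, the sketch below shows random undersampling (a data-based technique) and inverse-frequency class weights (an algorithm-based technique). The function names, the 95/5 split, and the seed are hypothetical. The weight formula n / (k * count) is one common heuristic, not a method prescribed by [ 16 ].

```python
import random
from collections import Counter

def random_undersample(samples, labels, seed=0):
    """Data-based: keep as many samples per class as the rarest class has."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    n_min = min(len(xs) for xs in by_class.values())
    balanced = [(x, y) for y, xs in by_class.items()
                for x in rng.sample(xs, n_min)]
    rng.shuffle(balanced)
    return balanced

def balanced_class_weights(labels):
    """Algorithm-based: weight each class inversely to its frequency."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * count) for c, count in counts.items()}

# Hypothetical dataset: 95 normal samples (label 0) and 5 anomalies (label 1).
samples = list(range(100))
labels = [0] * 95 + [1] * 5

balanced = random_undersample(samples, labels)
weights = balanced_class_weights(labels)

print(Counter(y for _, y in balanced))  # both classes now have 5 samples
print(weights)                          # the rare class gets the larger weight
```

Undersampling discards most of the majority class, which can be acceptable with big data but loses information on small datasets. Class weights keep all samples and instead change how much each error costs during training.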

Some learners, such as decision trees and logistic regression, can have difficulties identifying anomalies, especially in highly imbalanced datasets [ 16 ]. Moreover, some evaluation metrics are more sensitive to imbalanced classes than others. Metrics such as accuracy and error rate are highly affected and are not recommended. Other metrics, such as precision and recall, can be used, but on their own they are usually not enough [ 16 ]. The F-measure is a weighted harmonic mean of precision and recall and is widely used in this context.
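The effect of class imbalance on these metrics can be seen in a small worked example. The labels and the two detectors below are hypothetical. The F-measure is computed with the usual F-beta formula, where beta = 1 weights precision and recall equally.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn), treating label 1 as the anomaly class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn

def scores(y_true, y_pred, beta=1.0):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    b2 = beta * beta
    denom = b2 * precision + recall
    f_measure = (1 + b2) * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f_measure": f_measure}

# Hypothetical test set: 10 anomalies among 1000 samples.
y_true = [1] * 10 + [0] * 990

# Detector A never raises an alarm.
pred_a = [0] * 1000
# Detector B finds 8 of the 10 anomalies and raises 12 false alarms.
pred_b = [1] * 8 + [0] * 2 + [1] * 12 + [0] * 978

result_a = scores(y_true, pred_a)
result_b = scores(y_true, pred_b)
print(result_a)
print(result_b)
```

Detector A reaches an accuracy of 0.99 without detecting a single anomaly, while detector B has a slightly lower accuracy (0.986) but a recall of 0.8 and an F-measure of about 0.53. Accuracy alone would rank the useless detector first.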

To detect anomalies, statistical learning approaches can be used. In [ 58 ], Hochenbaum et al. used seasonal decomposition to extract the trend and the seasonal components. They proposed two techniques: the seasonal Extreme Studentized Deviate (ESD), and the seasonal hybrid ESD, which adds the median and the Median Absolute Deviation (MAD).
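The role of the median and the MAD can be sketched as follows. This is a deliberately simplified illustration and not the algorithm of [ 58 ]: it removes a per-position seasonal median and applies a fixed threshold to robust scores, whereas ESD is an iterative statistical test. The series, the period of 4, the threshold of 3, and the consistency constant 1.4826 (which makes the MAD comparable to a standard deviation for normally distributed data) are all assumptions.

```python
from statistics import median

def robust_seasonal_anomalies(series, period, threshold=3.0):
    """Flag indices whose deseasonalized value has a large MAD-based score."""
    # Seasonal component: median of the values at each position in the cycle.
    seasonal = [median(series[i::period]) for i in range(period)]
    residuals = [x - seasonal[i % period] for i, x in enumerate(series)]

    med = median(residuals)
    mad = median([abs(r - med) for r in residuals])
    if mad == 0:
        raise ValueError("MAD is zero; robust scores are undefined")
    scale = 1.4826 * mad
    return [i for i, r in enumerate(residuals)
            if abs(r - med) / scale > threshold]

# Hypothetical series with a period of 4 and one spike at index 14.
series = [
    10.0, 20.4, 29.7, 20.2,
     9.5, 20.1, 30.3, 19.8,
    10.2, 19.6, 30.1, 20.5,
    10.4, 20.2, 45.0, 19.9,
     9.8, 19.9, 29.8, 20.3,
    10.1, 20.3, 30.2, 19.7,
]

anomalies = robust_seasonal_anomalies(series, period=4)
print(anomalies)
```

Because the median and the MAD are barely influenced by the spike itself, the spike does not inflate the scale used to judge it, which is the motivation for preferring them over the mean and the standard deviation.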

Some methods to detect anomalies are signal-based. In [ 59 ], the authors could effectively detect sharp increases in the local variance using wavelet filters and pseudo-spline filters. In [ 97 ], Muñoz et al. used correlation-based techniques.

Principal Component Analysis (PCA) based approaches were explored in [ 60 , 61 ]. In [ 60 ], the authors applied wavelet transformations to network traffic data. Then, PCA is applied to extract the nature of the anomalies. Finally, a mapping function is used to detect the anomalies. In [ 62 ], the authors could also localize the source of anomalies by incorporating network structure information into the PCA model. They used the Karhunen-Loève expansion to capture spatial and temporal correlations. In [ 61 ], the authors proposed the use of the Minimum Covariance Determinant (MCD) with Robust Principal Component Analysis (rPCA). Since PCA can be affected by outliers leaking into the estimated subspace, rPCA tackles this issue at a higher computational cost, which the use of MCD helps to ease.
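A generic version of the PCA idea, scoring each observation by its distance to the principal subspace, can be sketched in plain Python. This is not the pipeline of [ 60 ], [ 61 ], or [ 62 ]: there is no wavelet step, no mapping function, and no robust estimation. The two-dimensional data, the use of a single principal component, and the power-iteration solver are assumptions made to keep the example self-contained.

```python
import math

def principal_axis(rows, iters=100):
    """Return (column means, first principal direction) via power iteration."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return means, v

def residual_norms(rows, means, axis):
    """Distance of each row from its projection onto the principal axis."""
    out = []
    for r in rows:
        c = [x - m for x, m in zip(r, means)]
        proj = sum(a * b for a, b in zip(c, axis))
        resid = [a - proj * b for a, b in zip(c, axis)]
        out.append(math.sqrt(sum(x * x for x in resid)))
    return out

# Hypothetical measurements that roughly follow y = 2x, plus one outlier.
rows = [(1, 2.1), (2, 3.9), (3, 6.0), (4, 8.1), (5, 9.9), (6, 12.0),
        (3, 11.0)]  # last row breaks the correlation

means, axis = principal_axis(rows)
errors = residual_norms(rows, means, axis)
most_anomalous = max(range(len(errors)), key=errors.__getitem__)
print(most_anomalous, [round(e, 2) for e in errors])
```

The outlier still receives the largest residual, but it also tilts the estimated axis away from the direction of the normal points. This contamination of the subspace is the weakness of plain PCA that robust variants aim to reduce.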

Approaches based on the k-Nearest Neighbors (KNN) algorithm can also be found in the literature. In [ 63 ], the authors proposed a Transductive Confidence Machine (TCM) with KNN for online anomaly detection, and improved their results by applying instance selection. The authors of [ 22 ] compared Naive Bayes, Support Vector Machine (SVM), and decision trees, and Naive Bayes is used in [ 36 ].
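The intuition behind neighbour-based detection is that anomalies lie far from their nearest neighbours. The sketch below scores each point by its mean distance to its k nearest neighbours. It is a generic distance-based score, not the TCM-KNN method of [ 63 ], and the points and the value k = 3 are hypothetical.

```python
import math

def knn_scores(points, k=3):
    """Mean Euclidean distance from each point to its k nearest neighbours."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores

# Hypothetical feature vectors: a tight cluster and one distant point.
points = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (-0.1, 0.2), (0.3, 0.2),
          (5.0, 5.0)]

knn = knn_scores(points)
most_isolated = max(range(len(knn)), key=knn.__getitem__)
print(most_isolated, [round(s, 2) for s in knn])
```

This brute-force version compares every pair of points, so its cost grows quadratically with the number of points. That cost is one reason why reducing the set of stored instances, as instance selection does, matters for online use.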

Figure 13. Anomaly detection methods

Several works are based on ANNs, such as [ 37 , 64 , 65 , 66 , 67 , 68 , 69 , 70 , 71 , 72 , 73 , 74 ]. In [ 64 ], motivated by the high rate of false alarms and the need to improve accuracy, Hussain et al. proposed a FeedForward Neural Network (FNN) to detect anomalies in cellular networks. They achieved high accuracy and a low False Positive Rate (FPR), showing the usefulness of FNNs. The work in [ 65 ] used an LSTM to detect network attacks through the anomalies present in the data. They tested two types of baselines. In the first one, the model was trained only on clean data (without anomalies). In the second one, the model was trained on dirty data (with anomalies). They concluded that the dirty baseline models achieved the best results, which is useful when no completely clean dataset exists. In [ 66 ], the Parallel Subagging-GRU-based network (PSB-GRU) method is proposed. The model uses a Gated Recurrent Unit (GRU) network to capture long-term dependencies, a genetic algorithm to optimize the training process, the Spark platform to improve training efficiency, and subagging to improve the model’s generalization.

In [ 67 ], the performance of several RNN-based methods is compared. The authors concluded that LSTM networks achieve the best performance; however, the other RNN-based networks also achieved good results. The works in [ 65 , 66 , 67 ] suggest that sequential NNs are suitable for detecting anomalies. In [ 68 ], a CNN-based method is proposed to extract spatio-temporal and other features from data, combined with a threshold-based separation method to detect anomalies. The architecture had four convolutional layers. The authors achieved good results; however, they recognize that a more lightweight method is needed to perform online anomaly detection. The authors of [ 74 ] also used a CNN. In some cases, architectures with one convolutional layer achieved better performance than architectures with two or three convolutional layers. However, their methods did not outperform RNN-based methods. The authors of [ 69 ] explored how CNNs can fail and concluded that a one-pixel attack can mislead CNN-based networks. Increasing the number of layers (three convolution and three pooling layers) and retraining contribute to a more robust detection.

The authors of [ 70 ] proposed an ensemble method based on RBM and SVM. They tested their method in real time and achieved good performance. The work in [ 71 ] used Self-Organizing-Maps (SOM). Their model is computationally light and presents results with a very low delay. In [ 37 ], the authors also use SOM, together with k-medoids, in a two-step clustering. They achieved fast online detection and a multistage decision to distinguish different anomalies. In [ 72 ], an autoencoder-based method with convolution is proposed. The use of autoencoders allowed the authors to capture non-linear correlations between features, and the use of convolution reduced the training time. In [ 73 ], stacked autoencoders are used with a one-class classification model. The use of autoencoders allows the selection of the most relevant features and the reduction of data dimensionality.

Other approaches, such as the ones proposed in [ 75 , 76 ], are tensor-based. A tensor is a structure similar to a multidimensional array with three or more dimensions. When we have one dimension, we have a vector (denoted as a first-order tensor), and if we have two dimensions, we have a matrix (second-order tensor) [ 76 ]. In [ 75 ], the proposed method is based on tensor decomposition. The method in [ 76 ] is based on tensor factorization and performs a two-phase anomaly detection. Tensor-based methods are useful for complex, high-order data.

Table 6 summarizes the reviewed works for anomaly detection, which cover different types of methods. In anomaly detection, one of the most important tasks is the fair evaluation of the methods. Usually, an anomaly detection problem involves class imbalance, as mentioned above. To better compare the evaluation metrics used, we created Table 7. False Positive Rate, True Positive Rate, and accuracy are the most frequently used metrics. However, class imbalance strongly affects accuracy, so this metric should be avoided, especially when reported without other metrics.

Figure 13 contains some methods used in anomaly detection. Traditional statistical methods can fail in the face of big data and data with several dimensions. On the other hand, ML methods can deal with high dimensionality. Supervised methods achieve good performance in detecting anomalies [ 6 ]. However, they have problems detecting new, unseen types of anomalies. Unsupervised methods are good at detecting new anomalies [ 14 ].

Figure 14 shows the evolution of the popularity of the types of anomaly detection methods over the last few years. The use of statistical methods decreased while the use of deep learning methods increased. Currently, most of the published works use machine learning and deep learning. Similarly, Fig. 15 shows the evolution of the popularity of techniques over the last few years. As we can observe, methods such as PCA, SVM, and KNN lost popularity over time, while the focus shifted to the use of CNNs, RNNs, LSTMs, and AEs.

Figure 14. Evolution of the popularity of type of methods regarding anomaly detection over the years. ML stands for Machine Learning, DL for Deep Learning, SL for Statistical Learning, and RL for Reinforcement Learning

Figure 15. Evolution of the popularity of methods regarding anomaly detection over the years. ESD stands for Extreme Studentized Deviate, PCA for Principal Component Analysis, rPCA for Robust Principal Component Analysis, MCD for Minimum Covariance Determinant, KNN for k-Nearest Neighbors, NB for Naive Bayes, SVM for Support Vector Machine, DT for Decision Trees (and includes random forest), ANN for Artificial Neural Network, FNN for Feedforward Neural Network, LSTM for Long Short-Term Memory, RNN for Recurrent Neural Network, CNN for Convolutional Neural Network, SOM for Self-Organizing-Maps, RBM for Restricted Boltzmann Machines, AE for Autoencoder and DBSCAN for Density-Based Spatial Clustering of Applications with Noise

As can be concluded from the above, several methods can be applied to anomaly detection. Regardless of the chosen method, we must take into consideration some problems associated with the nature of the data. The first class of problems to which the methods can be vulnerable is data poisoning attacks. In this context, a data poisoning attack occurs when abnormal data appears in the training phase and is therefore learned as normal. In [ 77 ], the authors deal with this problem by separating the training phase from the learning process.

Different methods should be considered when dealing with anomalies in data streams, since no single method is able to detect all types of anomalies. Furthermore, data streams are very susceptible to data poisoning attacks, since supervised models have not seen the most recent data and need to be regularly updated. Moreover, we should evaluate, once more, the trade-off between execution time and other performance metrics. Finally, in the context of big data and ML, we should take into account that we are dealing with a class imbalance problem.
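The need to keep a detector up to date on a stream, and the risk of learning from contaminated data, can be illustrated with a sliding-window sketch. It is not a method from the reviewed works. The class name, the window size, the warm-up length of 5, and the threshold of 3 are assumptions, and excluding flagged values from the baseline is only one possible design choice.

```python
from collections import deque
from statistics import mean, stdev

class SlidingWindowDetector:
    """Flags values that deviate strongly from a window of recent values."""

    def __init__(self, window=20, threshold=3.0, warmup=5):
        self.window = deque(maxlen=window)  # oldest values drop out automatically
        self.threshold = threshold
        self.warmup = warmup

    def update(self, x):
        is_anomaly = False
        if len(self.window) >= self.warmup:
            mu, sigma = mean(self.window), stdev(self.window)
            if sigma > 0 and abs(x - mu) / sigma > self.threshold:
                is_anomaly = True
        # Only values judged normal update the baseline, so a flagged
        # value cannot shift what the detector considers normal.
        if not is_anomaly:
            self.window.append(x)
        return is_anomaly

# Hypothetical stream with one abnormal reading at index 10.
stream = [10, 11, 10, 12, 11, 10, 11, 12, 10, 11, 30, 11, 10, 12]

detector = SlidingWindowDetector()
flagged = [i for i, x in enumerate(stream) if detector.update(x)]
print(flagged)
```

Each update costs time proportional to the window size, so a larger window gives a more stable baseline at the price of slower processing and slower adaptation. Excluding flagged values protects the baseline from isolated abnormal readings, but it also prevents the detector from adapting to a genuine and lasting change in level.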

Conclusions and future research directions

Data by itself may have no value for organizations and society. However, we can transform data into knowledge and improve decision-making through analysis. Nevertheless, dealing with big data can be a complex problem, especially when the data keeps growing over time. In this context, Stream Processing Engines emerged. They are an essential tool for processing big data in real time. In this work, we presented some frameworks to process data streams in real time, and we compared them. Spark is not a native streaming framework since it uses micro-batches, which brings some performance issues. However, Spark is the most popular framework, with several exploratory data analysis and machine learning modules. On the other hand, Flink can deal better with data-intensive applications, while Heron seems to scale better.

We also presented approaches to deal with common big data problems, such as forecasting and anomaly detection in real time. Applying these algorithms in real time can be very beneficial for organizations. For instance, the use of forecasting can help organizations to optimize the use of services and resources. On the other hand, using anomaly detection algorithms can prevent or minimize problems, such as network attacks, before they happen. Finally, we discussed statistical, machine learning, and deep learning approaches. Statistical methods are more explainable and computationally lighter. On the other hand, machine learning methods deal better with complex data and can forecast over longer horizons.

As future research directions, we suggest real-time analytics and algorithms over big data time series streams, namely, having time series related machine learning and deep learning algorithms take advantage of online learning to provide real-time analysis, forecasts, and anomaly detection. Another possible research direction is the development of explainable methods focused on time series.

Availability of data and materials

Not applicable.

https://hadoop.apache.org/ .

https://spark.apache.org/ .

https://flink.apache.org/ .

https://storm.apache.org/ .

https://heron.apache.org/ .

https://samza.apache.org/ .

https://aws.amazon.com/kinesis/ .

https://incubator.apache.org/ .

www.scopus.com.

Cox M, Ellsworth D. Application-controlled demand paging for out-of-core visualization. In: Proceedings of the 8th Conference on Visualization ’97. VIS ’97, pp. 235–244. IEEE Computer Society Press, Washington, DC, USA, 1997. https://doi.org/10.1109/VISUAL.1997.663888

Fan J, Han F, Liu H. Challenges of Big Data analysis. Natl Sci Rev. 2014;1(2):293–314. https://doi.org/10.1093/nsr/nwt032 .

Gomes EHA, Plentz PDM, Rolt CRD, Dantas MAR. A survey on data stream, big data and real-time. Int J Netw Virtual Organ. 2019;20(2):143–67. https://doi.org/10.1504/IJNVO.2019.097631 .

Zhou B, Li J, Wang X, Gu Y, Xu L, Hu Y, Zhu L. Online internet traffic monitoring system using spark streaming. Big Data Mining Anal. 2018;1(1):47–56. https://doi.org/10.26599/BDMA.2018.9020005 .

Thudumu S, Branch P, Jin J, Singh J. A comprehensive survey of anomaly detection techniques for high dimensional big data. J Big Data. 2020. https://doi.org/10.1186/s40537-020-00320-x .

Es-Samaali H, Outchakoucht A, Benhadou S, Mounnan O, Abou El Kalam A. Anomaly detection for big data security: a benchmark. In: 2021 the 3rd International Conference on Big Data Engineering and Technology (BDET). BDET 2021, Association for Computing Machinery, New York, NY, USA 2021, pp. 35–39. https://doi.org/10.1145/3474944.3474950

Liu X, Buyya R. Resource management and scheduling in distributed stream processing systems: a taxonomy, review, and future directions. ACM Comput Surv. 2020. https://doi.org/10.1145/3355399 .

Sahal R, Breslin JG, Ali MI. Big data and stream processing platforms for industry 4.0 requirements mapping for a predictive maintenance use case. J Manuf Syst. 2020;54:138–51. https://doi.org/10.1016/j.jmsy.2019.11.004 .

Kolajo T, Daramola O, Adebiyi A. Big data stream analysis: a systematic literature review. J Big Data. 2019;6(1):47. https://doi.org/10.1186/s40537-019-0210-7 .

Namiot D. On big data stream processing. Int J Open Info Technol. 2015;3(8):48–51.

Wu Y. Network big data: a literature survey on stream data mining. J Softw. 2014. https://doi.org/10.4304/jsw.9.9.2427-2434 .

Inoubli W, Aridhi S, Mezni H, Maddouri M, Mephu Nguifo E. A comparative study on streaming frameworks for big data. In: Ziviani A, Hara CS, Ogasawara ES, de Macêdo JAF, Valduriez P, editors. LADaS@VLDB. Rio de Janeiro: CEUR-WS.org; 2018. p. 17–24.

Dai Q, Qian J. A distributed stream data processing platform design and implementation in smart cities. In: 2020 IEEE 3rd International Conference on Electronic Information and Communication Technology (ICEICT), 2020, pp. 688–693. https://doi.org/10.1109/ICEICT51264.2020.9334234

Ahmed M, Choudhury N, Uddin S. Anomaly detection on big data in financial markets. In: 2017 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), 2017, pp. 998–1001

L’Heureux A, Grolinger K, Elyamany HF, Capretz MAM. Machine learning with big data: challenges and approaches. IEEE Access. 2017;5:7776–97. https://doi.org/10.1109/ACCESS.2017.2696365 .

Johnson J, Khoshgoftaar T. Survey on deep learning with class imbalance. J Big Data. 2019;6:27. https://doi.org/10.1186/s40537-019-0192-5 .

Luo Y, Du X, Sun Y. Survey on real-time anomaly detection technology for big data streams. In: 2018 12th IEEE International Conference on Anti-counterfeiting, Security, and Identification (ASID), 2018, pp. 26–30. https://doi.org/10.1109/ICASID.2018.8693216

Zhu Y, Zhong XY. Data explosion, data nature and dataology. Brain Inform. 2009;5819:147–58. https://doi.org/10.1007/978-3-642-04954-5_25 .

Gandomi A, Haider M. Beyond the hype: big data concepts, methods, and analytics. Int J Inf Manag. 2015;35(2):137–44. https://doi.org/10.1016/j.ijinfomgt.2014.10.007 .

Trifunovic N, Milutinovic V, Salom J, Kos A. Paradigm shift in big data supercomputing: dataflow vs. controlflow. J Big Data. 2015. https://doi.org/10.1186/s40537-014-0010-z .

Arya M, Sastry GH. Deal-’deep ensemble algorithm’ framework for credit card fraud detection in real-time data stream with google tensorflow. Smart Sci. 2020;8(2):71–83. https://doi.org/10.1080/23080477.2020.1783491 .

Zhao S, Chandrashekar M, Lee Y, Medhi D. Real-time network anomaly detection system using machine learning. In: 2015 11th International Conference on the Design of Reliable Communication Networks (DRCN), 2015, pp. 267–270. https://doi.org/10.1109/DRCN.2015.7149025

Hennig L, Thomas P, Ai R, Kirschnick J, Wang H, Pannier J, Zimmermann N, Schmeier S, Xu F, Ostwald J, Uszkoreit H. Real-time discovery and geospatial visualization of mobility and industry events from large-scale, heterogeneous data streams. In: Proceedings of ACL-2016 System Demonstrations. Association for Computational Linguistics, Berlin, Germany 2016, pp. 37–42. https://doi.org/10.18653/v1/P16-4007. https://aclanthology.org/P16-4007

Baban P. Pre-processing and data validation in IOT data streams. In: Proceedings of the 14th ACM International Conference on Distributed and Event-Based Systems. DEBS ’20. Association for Computing Machinery, New York, NY, USA 2020, pp. 226–229. https://doi.org/10.1145/3401025.3406443

Kovacs A, Bogdandy B, Toth Z. Predict stock market prices with recurrent neural networks using NASDAQ data stream, 2021, pp. 449–454. https://doi.org/10.1109/SACI51354.2021.9465634

Bahri M, Bifet A, Gama J, Gomes HM, Maniu S. Data stream analysis: foundations, major tasks and tools. WIREs Data Min Knowl Discov. 2021;11(3):1405. https://doi.org/10.1002/widm.1405 .

Namiot D, Sneps-Sneppe M, Pauliks R. On data stream processing in IOT applications. In: Galinina O, Andreev S, Balandin S, Koucheryavy Y, editors. Internet of things, smart spaces, and next generation networks and systems. Cham: Springer; 2018. p. 41–51.

Choudhary P, Garg K. Comparative analysis of spark and hadoop through imputation of data on big datasets. In: 2021 IEEE Bombay Section Signature Conference (IBSSC), 2021, pp. 1–6. https://doi.org/10.1109/IBSSC53889.2021.9673461

Karakaya Z, Yazici A, Alayyoub M. A comparison of stream processing frameworks. In: 2017 International Conference on Computer and Applications (ICCA), 2017, pp. 1–12 . https://doi.org/10.1109/COMAPP.2017.8079733

Nasiri H, Nasehi S, Goudarzi M. Evaluation of distributed stream processing frameworks for IOT applications in smart cities. J Big Data. 2019. https://doi.org/10.1186/s40537-019-0215-2 .

Shahverdi E, Awad A, Sakr S. Big stream processing systems: an experimental evaluation. In: 2019 IEEE 35th International Conference on Data Engineering Workshops (ICDEW), 2019, pp. 53–60. https://doi.org/10.1109/ICDEW.2019.00-35

Wecel K, Szmydt M, Stróżyna M. Stream processing tools for analyzing objects in motion sending high-volume location data. Bus Inf Syst. 2021;1:257–68. https://doi.org/10.52825/bis.v1i.41 .

Zaharia M, Chowdhury M, Franklin MJ, Shenker S, Stoica I. Spark: cluster computing with working sets. In: Proceedings of the 2nd USENIX Conference on Hot Topics in Cloud Computing. HotCloud’10. USENIX Association, USA 2010, p. 10

Kulkarni S, Bhagat N, Fu M, Kedigehalli V, Kellogg C, Mittal S, Patel JM, Ramasamy K, Taneja S. Twitter heron: stream processing at scale. In: Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data. SIGMOD ’15. Association for Computing Machinery, New York, NY, USA 2015, pp. 239–250. https://doi.org/10.1145/2723372.2742788

Salloum S, Dautov R, Chen X, Peng PX, Huang JZ. Big data analytics on apache spark. Int J Data Sci Anal. 2016;1:145–64. https://doi.org/10.1007/s41060-016-0027-9 .

Ding N, Gao H, Bu H, Ma H. Radm:real-time anomaly detection in multivariate time series based on bayesian network. In: 2018 IEEE International Conference on Smart Internet of Things (SmartIoT), 2018, pp. 129–134. https://doi.org/10.1109/SmartIoT.2018.00-13

Qin X, Tang S, Chen X, Miao D, Wei G. Sqoe kqis anomaly detection in cellular networks: fast online detection framework with hourglass clustering. China Commun. 2018;15(10):25–37. https://doi.org/10.1109/CC.2018.8485466 .

Almeida A, Brás S, Oliveira I, Sargento S. Vehicular traffic flow prediction using deployed traffic counters in a city. Futur Gener Comput Syst. 2022;128:429–42. https://doi.org/10.1016/j.future.2021.10.022 .

Makridakis S, Spiliotis E, Assimakopoulos V. The m4 competition: results, findings, conclusion and way forward. Int J Forecast. 2018;34(4):802–8. https://doi.org/10.1016/j.ijforecast.2018.06.001 .

Makridakis S, Spiliotis E, Assimakopoulos V. M5 accuracy competition: results, findings, and conclusions. Int J Forecast. 2022. https://doi.org/10.1016/j.ijforecast.2021.11.013 .

Karlaftis MG, Vlahogianni EI. Statistical methods versus neural networks in transportation research: differences, similarities and some insights. Transp Res Part C Emerg Technol. 2011;19(3):387–99. https://doi.org/10.1016/j.trc.2010.10.004 .

Carvalho DV, Pereira EM, Cardoso JS. Machine learning interpretability: a survey on methods and metrics. Electronics (Switzerland). 2019. https://doi.org/10.3390/electronics8080832 .

Sezer OB, Gudelek MU, Ozbayoglu AM. Financial time series forecasting with deep learning: a systematic literature review: 2005–2019. Appl Soft Comput. 2020;90: 106181. https://doi.org/10.1016/j.asoc.2020.106181 .

Masini RP, Medeiros MC, Mendes EF. Machine learning advances for time series forecasting. J Econ Surv. 2023;37(1):76–111. https://doi.org/10.1111/joes.12429 .

Khan W, Ghazanfar MA, Azam MA, Karami A, Alyoubi K, Alfakeeh A. Stock market prediction using machine learning classifiers and social media news. J Ambient Intell Humaniz Comput. 2022. https://doi.org/10.1007/s12652-020-01839-w .

Ali A, Zhu Y, Zakarya M. Exploiting dynamic spatio-temporal graph convolutional neural networks for citywide traffic flows prediction. Neural Netw. 2022;145:233–47. https://doi.org/10.1016/j.neunet.2021.10.021 .

Guo K, Hu Y, Qian Z, Liu H, Zhang K, Sun Y, Gao J, Yin B. Optimized graph convolution recurrent neural network for traffic prediction. IEEE Trans Intell Transp Syst. 2021;22(2):1138–49. https://doi.org/10.1109/TITS.2019.2963722 .

Zhu B, Ye S, Wang P, Chevallier J, Wei Y-M. Forecasting carbon price using a multi-objective least squares support vector machine with mixture kernels. J Forecast. 2022;41(1):100–17.

Olu-Ajayi R, Alaka H, Sulaimon I, Sunmola F, Ajayi S. Building energy consumption prediction for residential buildings using deep learning and other machine learning techniques. J Build Eng. 2022;45: 103406. https://doi.org/10.1016/j.jobe.2021.103406 .

Zhang S, Zhao S, Yuan M, Zeng J, Yao J, Lyu MR, King I. Traffic prediction based power saving in cellular networks: a machine learning method. In: Proceedings of the 25th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems. SIGSPATIAL ’17. Association for Computing Machinery, New York, NY, USA 2017. https://doi.org/10.1145/3139958.3140053

Hoang MX, Zheng Y, Singh AK. FCCF: Forecasting citywide crowd flows based on big data. In: Proceeding of the 24rd ACM International Conference on Advances in Geographical Information Systems (ACM SIGSPATIAL 2016). ACM SIGSPATIAL 2016, 2016. https://www.microsoft.com/en-us/research/publication/forecasting-citywide-crowd-flows-based-big-data/

Yao H, Wu F, Ke J, Tang X, Jia Y, Lu S, Gong P, Ye J, Li Z. Deep multi-view spatial-temporal network for taxi demand prediction. In: McIlraith, S.A., Weinberger, K.Q. (eds.) Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, (AAAI-18), the 30th Innovative Applications of Artificial Intelligence (IAAI-18), and the 8th AAAI Symposium on Educational Advances in Artificial Intelligence (EAAI-18), New Orleans, Louisiana, USA, February 2-7, 2018, pp. 2588–2595. AAAI Press, 2018. https://www.aaai.org/ocs/index.php/AAAI/AAAI18/paper/view/16069

Zhang K, Liu Z, Zheng L. Short-term prediction of passenger demand in multi-zone level: temporal convolutional neural network with multi-task learning. IEEE Trans Intell Transp Syst. 2020;21(4):1480–90. https://doi.org/10.1109/TITS.2019.2909571 .

Junior G, Rodrigues J, Carvalho L, Al-Muhtadi J, Proença M. A comprehensive survey on network anomaly detection. Telecommun Syst. 2019. https://doi.org/10.1007/s11235-018-0475-8 .

Chandola V, Banerjee A, Kumar V. Anomaly detection: a survey. ACM Comput Surv 2009. https://doi.org/10.1145/1541880.1541882

Ahmed M, Naser Mahmood A, Hu J. A survey of network anomaly detection techniques. J Netw Comput Appl. 2016;60:19–31. https://doi.org/10.1016/j.jnca.2015.11.016 .

Zhu M, Ye K, Xu C-Z. Network anomaly detection and identification based on deep learning methods. In: Luo M, Zhang L-J, editors. Cloud computing–CLOUD 2018. Cham: Springer; 2018. p. 219–34.

Hochenbaum J, Vallis OS, Kejariwal A. Automatic anomaly detection in the cloud via statistical learning. CoRR abs/1704.07706 (2017).

Barford P, Kline J, Plonka D, Ron A. A signal analysis of network traffic anomalies. In: Proceedings of the 2nd ACM SIGCOMM Workshop on Internet Measurment. IMW ’02. Association for Computing Machinery, New York, NY, USA 2002, pp. 71–82. https://doi.org/10.1145/637201.637210

Jiang D, Yao C, Xu Z, Qin W. Multi-scale anomaly detection for high-speed network traffic. Trans Emerg Telecommun Technol. 2015;26(3):308–17. https://doi.org/10.1002/ett.2619 .

Matsuda T, Morita T, Kudo T, Takine T. Traffic anomaly detection based on robust principal component analysis using periodic traffic behavior. IEICE Trans Commun E100.B(5), 2017, pp. 749–761 . https://doi.org/10.1587/transcom.2016EBP3239 .

Jiang R, Fei H, Huan J. A family of joint sparse PCA algorithms for anomaly localization in network data streams. IEEE Trans Knowl Data Eng. 2013;25(11):2421–33. https://doi.org/10.1109/TKDE.2012.176 .

Li Y, Lu T, Guo L, Tian Z, Qi L. Optimizing network anomaly detection scheme using instance selection mechanism. In: GLOBECOM 2009–2009 IEEE Global Telecommunications Conference, 2009, pp. 1–7. https://doi.org/10.1109/GLOCOM.2009.5425547

Hussain B, Du Q, Zhang S, Imran A, Imran MA. Mobile edge computing-based data-driven deep learning framework for anomaly detection. IEEE Access. 2019;7:137656–67. https://doi.org/10.1109/ACCESS.2019.2942485 .

Radford BJ, Apolonio LM, Trias AJ, Simpson JA. Network traffic anomaly detection using recurrent neural networks. CoRR 2018.

Tao X, Peng Y, Zhao F, Yang C, Qiang B, Wang Y, Xiong Z. Gated recurrent unit-based parallel network traffic anomaly detection using subagging ensembles. Ad Hoc Netw. 2021. https://doi.org/10.1016/j.adhoc.2021.102465 .

Ravi V, Kp S, Poornachandran P. Evaluation of recurrent neural network and its variants for intrusion detection system (IDs). Int J Inf Syst Model Des. 2017;8:43–63. https://doi.org/10.4018/IJISMD.2017070103 .

Nie L, Li Y, Kong X. Spatio-temporal network traffic estimation and anomaly detection based on convolutional neural network in vehicular ad-hoc networks. IEEE Access. 2018;6:40168–76. https://doi.org/10.1109/ACCESS.2018.2854842 .

Ogawa Y, Kimura T, Cheng J. Vulnerability assessment for machine learning based network anomaly detection system. In: 2020 IEEE International Conference on Consumer Electronics–Taiwan (ICCE-Taiwan), 2020, pp. 1–2. https://doi.org/10.1109/ICCE-Taiwan49838.2020.9258068

Garg S, Kaur K, Kumar N, Rodrigues JJPC. Hybrid deep-learning-based anomaly detection scheme for suspicious flow detection in SDN: a social multimedia perspective. IEEE Trans Multimedia. 2019;21(3):566–78. https://doi.org/10.1109/TMM.2019.2893549 .

Sarasamma ST, Zhu QA, Huff J. Hierarchical kohonenen net for anomaly detection in network security. IEEE Trans Syst Man Cybern Syst. 2005;35(2):302–12. https://doi.org/10.1109/TSMCB.2005.843274 .

Chen Z, Yeo C, Lee B-S, Lau C. Autoencoder-based network anomaly detection. 2018 Wireless Telecommunications Symposium (WTS), 2018, p. 1–5. https://doi.org/10.1109/WTS.2018.8363930 .

Dai S, Yan J, Wang X, Zhang L. A deep one-class model for network anomaly detection. IOP Conf Ser Mater Sci Eng. 2019;563: 042007. https://doi.org/10.1088/1757-899X/563/4/042007 .

Kwon D, Natarajan K, Suh S, Kim H, Kim J. An empirical study on network anomaly detection using convolutional neural networks. In: 2018 IEEE 38th International Conference on Distributed Computing Systems (ICDCS), 2018, pp. 1595–1598. https://doi.org/10.1109/ICDCS.2018.00178

Kasai H, Kellerer W, Kleinsteuber M. Network volume anomaly detection and identification in large-scale networks based on online time-structured traffic tensor tracking. IEEE Trans Netw Serv Manag. 2016;13(3):636–50. https://doi.org/10.1109/TNSM.2016.2598788 .



Acknowledgements

This work is supported by FEDER, through POR LISBOA 2020 and COMPETE 2020 of the Portugal 2020 Project CityCatalyst POCI-01-0247-FEDER-046119. Ana Almeida acknowledges the Doctoral Grant from Fundação para a Ciência e Tecnologia (2021.06222.BD). Susana Brás is funded by national funds, European Regional Development Fund, FSE, through COMPETE2020 and FCT, in the scope of the framework contract foreseen in the numbers 4, 5 and 6 of the article 23, of the Decree-Law 57/2016, of August 29, changed by Law 57/2017, of July 19.

Author information

Authors and affiliations.

Instituto de Telecomunicações, Aveiro, Portugal

Ana Almeida, Susana Sargento & Filipe Cabral Pinto

Departamento de Eletrónica, Telecomunicações e Informática, Universidade de Aveiro, Aveiro, Portugal

Ana Almeida, Susana Brás & Susana Sargento

IEETA, DETI, LASI, Universidade de Aveiro, Aveiro, Portugal

Susana Brás

Altice Labs, Aveiro, Portugal

Filipe Cabral Pinto


Contributions

Conceptualization: AA; Data curation: AA; Formal analysis: AA; Investigation: AA; Methodology: AA; Software: AA; Validation: AA, SB; Visualization: AA; Writing—original draft: AA; Funding acquisition: SS; Project administration: SS; Supervision: SB, SS, FCP; Writing—review & editing: SB, SS, FCP. All authors read the final manuscript.

Corresponding author

Correspondence to Ana Almeida.

Ethics declarations

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .


About this article

Cite this article.

Almeida, A., Brás, S., Sargento, S. et al. Time series big data: a survey on data stream frameworks, analysis and algorithms. J Big Data 10, 83 (2023). https://doi.org/10.1186/s40537-023-00760-1


Received: 12 October 2022

Accepted: 08 May 2023

Published: 28 May 2023

DOI: https://doi.org/10.1186/s40537-023-00760-1


  • Time series
  • Stream processing engines
  • Forecasting
  • Machine learning


Advancements in Deep Learning Techniques for Time Series Forecasting in Maritime Applications: A Comprehensive Review


1. Introduction

2. Literature Collection Procedure

  • Search scope: Titles, Keywords, and Abstracts
  • Keywords 1: ‘deep’ AND ‘learning’, AND
  • Keywords 2: ‘time’ AND ‘series’, AND
  • Keywords 3: ‘maritime’, OR
  • Keywords 4: ‘vessel’, OR
  • Keywords 5: ‘shipping’, OR
  • Keywords 6: ‘marine’, OR
  • Keywords 7: ‘ship’, OR
  • Keywords 8: ‘port’, OR
  • Keywords 9: ‘terminal’
  • Retain only articles related to maritime operations. For example, studies on ship-surrounding weather and risk prediction based on ship data will be kept, while research solely focused on marine weather or wave prediction that is unrelated to any aspect of maritime operations will be excluded.
  • Exclude neural network studies that do not employ deep learning techniques, such as ANN or MLP with only one hidden layer.
  • The language of the publications must be English.
  • The original data used in the papers must include time series sequences.
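
The search string above can be read as a boolean filter over each record's title, keywords, and abstract. The following is a minimal Python sketch of that reading, not the procedure the authors actually ran. It assumes that Keywords 1 and 2 must all match, that Keywords 3 to 9 form a single OR group, and that matching is case-insensitive on whole words. The function name `matches_search` and the sample record are hypothetical, and a real bibliographic database may apply different rules (for example stemming). The inclusion and exclusion criteria in the last four bullets require reading the papers and are not modelled here.

```python
import re

# Terms that must all be present (Keywords 1 and 2).
REQUIRED_TERMS = ("deep", "learning", "time", "series")
# Domain terms of which at least one must be present (Keywords 3 to 9).
# Treating them as one OR group is an assumption about operator precedence.
DOMAIN_TERMS = ("maritime", "vessel", "shipping", "marine", "ship", "port", "terminal")


def matches_search(title: str, keywords: str, abstract: str) -> bool:
    """Return True if the record satisfies the assumed boolean query."""
    text = " ".join((title, keywords, abstract)).lower()
    # Whole-word matching: split the text into alphabetic tokens.
    words = set(re.findall(r"[a-z]+", text))
    has_required = all(term in words for term in REQUIRED_TERMS)
    has_domain = any(term in words for term in DOMAIN_TERMS)
    return has_required and has_domain


# Hypothetical record, for illustration only.
record = {
    "title": "Vessel trajectory prediction with deep learning",
    "keywords": "AIS; LSTM; time series",
    "abstract": "We forecast vessel positions from AIS sequences.",
}
print(matches_search(**record))
```

Because matching is on whole words, a record that mentions only "ships" or "ports" would not match "ship" or "port" in this sketch; adding plural forms or a stemmer would be a separate design decision.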

3. Deep Learning Algorithms

3.1. Artificial Neural Network (ANN)
3.1.1. Multilayer Perceptron (MLP)/Deep Neural Networks (DNN)
3.1.2. WaveNet
3.1.3. Randomized Neural Network
3.2. Convolutional Neural Network (CNN)
3.3. Recurrent Neural Network (RNN)
3.3.1. Long Short-Term Memory (LSTM)
3.3.2. Gated Recurrent Unit (GRU)
3.4. Attention Mechanism (AM)/Transformer
3.5. Overview of Algorithms Usage
4. Time Series Forecasting in Maritime Applications
4.1. Ship Operation-Related Applications
4.1.1. Ship Trajectory Prediction
4.1.2. Meteorological Factor Prediction
4.1.3. Ship Fuel Consumption Prediction
4.1.4. Others
4.2. Port Operation-Related Applications
4.3. Shipping Market-Related Applications
4.4. Overview of Time Series Forecasting in Maritime Applications
5. Overall Analysis
5.1. Literature Description
5.1.1. Literature Distribution
5.1.2. Literature Classification
5.2. Data Utilized in Maritime Research
5.2.1. Automatic Identification System Data (AIS Data)
5.2.2. High-Frequency Radar Data and Sensor Data
5.2.3. Container Throughput Data
5.2.4. Other Datasets
5.3. Evaluation Parameters
5.4. Real-World Application Examples
5.5. Future Research Directions
5.5.1. Data Processing and Feature Extraction
5.5.2. Model Optimization and Application of New Technologies
5.5.3. Specific Application Scenarios
5.5.4. Practical Applications and Long-Term Predictions
5.5.5. Environmental Impact, Fault Prediction, and Cross-Domain Applications
6. Conclusions
Author Contributions
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest

  • UNCTAD. Review of Maritime Transport 2023 ; United Nations Conference on Trade and Development: Geneva, Switzerland, 2023; Available online: https://www.un-ilibrary.org/content/books/9789213584569 (accessed on 1 April 2024).
  • Liang, M.; Liu, R.W.; Zhan, Y.; Li, H.; Zhu, F.; Wang, F.Y. Fine-Grained Vessel Traffic Flow Prediction With a Spatio-Temporal Multigraph Convolutional Network. IEEE Trans. Intell. Transp. Syst. 2022 , 23 , 23694–23707. [ Google Scholar ] [ CrossRef ]
  • Liu, R.W.; Liang, M.; Nie, J.; Lim, W.Y.B.; Zhang, Y.; Guizani, M. Deep Learning-Powered Vessel Trajectory Prediction for Improving Smart Traffic Services in Maritime Internet of Things. IEEE Trans. Netw. Sci. Eng. 2022 , 9 , 3080–3094. [ Google Scholar ] [ CrossRef ]
  • Dui, H.; Zheng, X.; Wu, S. Resilience analysis of maritime transportation systems based on importance measures. Reliab. Eng. Syst. Saf. 2021 , 209 , 107461. [ Google Scholar ] [ CrossRef ]
  • Liang, M.; Li, H.; Liu, R.W.; Lam, J.S.L.; Yang, Z. PiracyAnalyzer: Spatial temporal patterns analysis of global piracy incidents. Reliab. Eng. Syst. Saf. 2024 , 243 , 109877. [ Google Scholar ] [ CrossRef ]
  • Chen, Z.S.; Lam, J.S.L.; Xiao, Z. Prediction of harbour vessel emissions based on machine learning approach. Transp. Res. Part D Transp. Environ. 2024 , 131 , 104214. [ Google Scholar ] [ CrossRef ]
  • Chen, Z.S.; Lam, J.S.L.; Xiao, Z. Prediction of harbour vessel fuel consumption based on machine learning approach. Ocean Eng. 2023 , 278 , 114483. [ Google Scholar ] [ CrossRef ]
  • Liang, M.; Weng, L.; Gao, R.; Li, Y.; Du, L. Unsupervised maritime anomaly detection for intelligent situational awareness using AIS data. Knowl.-Based Syst. 2024 , 284 , 111313. [ Google Scholar ] [ CrossRef ]
  • Dave, V.S.; Dutta, K. Neural network based models for software effort estimation: A review. Artif. Intell. Rev. 2014 , 42 , 295–307. [ Google Scholar ] [ CrossRef ]
  • Uslu, S.; Celik, M.B. Prediction of engine emissions and performance with artificial neural networks in a single cylinder diesel engine using diethyl ether. Eng. Sci. Technol. Int. J. 2018 , 21 , 1194–1201. [ Google Scholar ] [ CrossRef ]
  • Chaudhary, L.; Sharma, S.; Sajwan, M. Systematic Literature Review of Various Neural Network Techniques for Sea Surface Temperature Prediction Using Remote Sensing Data. Arch. Comput. Methods Eng. 2023 , 30 , 5071–5103. [ Google Scholar ] [ CrossRef ]
  • Dharia, A.; Adeli, H. Neural network model for rapid forecasting of freeway link travel time. Eng. Appl. Artif. Intell. 2003 , 16 , 607–613. [ Google Scholar ] [ CrossRef ]
  • Hecht-Nielsen, R. Applications of counterpropagation networks. Neural Netw. 1988 , 1 , 131–139. [ Google Scholar ] [ CrossRef ]
  • Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning ; MIT Press: Cambridge, MA, USA, 2016. [ Google Scholar ]
  • Veerappa, M.; Anneken, M.; Burkart, N. Evaluation of Interpretable Association Rule Mining Methods on Time-Series in the Maritime Domain. Springer International Publishing: Cham, Switzerland, 2021; pp. 204–218. [ Google Scholar ]
  • Frizzell, J.; Furth, M. Prediction of Vessel RAOs: Applications of Deep Learning to Assist in Design. In Proceedings of the SNAME 27th Offshore Symposium, Houston, TX, USA, 22 February 2022. [ Google Scholar ] [ CrossRef ]
  • Van Den Oord, A.; Dieleman, S.; Zen, H.; Simonyan, K.; Vinyals, O.; Graves, A.; Kalchbrenner, N.; Senior, A.; Kavukcuoglu, K. Wavenet: A generative model for raw audio. arXiv 2016 , arXiv:1609.03499. [ Google Scholar ]
  • He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [ Google Scholar ]
  • Ning, C.X.; Xie, Y.Z.; Sun, L.J. LSTM, WaveNet, and 2D CNN for nonlinear time history prediction of seismic responses. Eng. Struct. 2023 , 286 , 116083. [ Google Scholar ] [ CrossRef ]
  • Schmidt, W.F.; Kraaijveld, M.A.; Duin, R.P. Feed forward neural networks with random weights. In International Conference on Pattern Recognition ; IEEE Computer Society Press: Washington, DC, USA, 1992; pp. 1–4. [ Google Scholar ]
  • Huang, G.B.; Zhu, Q.Y.; Siew, C.K. Extreme learning machine: Theory and applications. Neurocomputing 2006 , 70 , 489–501. [ Google Scholar ] [ CrossRef ]
  • Pao, Y.H.; Park, G.H.; Sobajic, D.J. Learning and generalization characteristics of the random vector functional-link net. Neurocomputing 1994 , 6 , 163–180. [ Google Scholar ] [ CrossRef ]
  • Zhang, L.; Suganthan, P.N. A comprehensive evaluation of random vector functional link networks. Inf. Sci. 2016 , 367 , 1094–1105. [ Google Scholar ] [ CrossRef ]
  • Huang, G.; Huang, G.B.; Song, S.J.; You, K.Y. Trends in extreme learning machines: A review. Neural Netw. 2015 , 61 , 32–48. [ Google Scholar ] [ CrossRef ]
  • Shi, Q.S.; Katuwal, R.; Suganthan, P.N.; Tanveer, M. Random vector functional link neural network based ensemble deep learning. Pattern Recognit. 2021 , 117 , 107978. [ Google Scholar ] [ CrossRef ]
  • Du, L.; Gao, R.B.; Suganthan, P.N.; Wang, D.Z.W. Graph ensemble deep random vector functional link network for traffic forecasting. Appl. Soft Comput. 2022 , 131 , 109809. [ Google Scholar ] [ CrossRef ]
  • Rehman, A.; Xing, H.L.; Hussain, M.; Gulzar, N.; Khan, M.A.; Hussain, A.; Mahmood, S. HCDP-DELM: Heterogeneous chronic disease prediction with temporal perspective enabled deep extreme learning machine. Knowl.-Based Syst. 2024 , 284 , 111316. [ Google Scholar ] [ CrossRef ]
  • Gao, R.B.; Li, R.L.; Hu, M.H.; Suganthan, P.N.; Yuen, K.F. Online dynamic ensemble deep random vector functional link neural network for forecasting. Neural Netw. 2023 , 166 , 51–69. [ Google Scholar ] [ CrossRef ]
  • Lecun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 1998 , 86 , 2278–2324. [ Google Scholar ] [ CrossRef ]
  • Palaz, D.; Magimai-Doss, M.; Collobert, R. End-to-end acoustic modeling using convolutional neural networks for HMM-based automatic speech recognition. Speech Commun. 2019 , 108 , 15–32. [ Google Scholar ] [ CrossRef ]
  • Fang, W.; Love, P.E.D.; Luo, H.; Ding, L. Computer vision for behaviour-based safety in construction: A review and future directions. Adv. Eng. Inform. 2020 , 43 , 100980. [ Google Scholar ] [ CrossRef ]
  • Qin, L.; Yu, N.; Zhao, D. Applying the convolutional neural network deep learning technology to behavioural recognition in intelligent video. Teh. Vjesn. 2018 , 25 , 528–535. [ Google Scholar ]
  • Hoseinzade, E.; Haratizadeh, S. CNNpred: CNN-based stock market prediction using a diverse set of variables. Expert Syst. Appl. 2019 , 129 , 273–285. [ Google Scholar ] [ CrossRef ]
  • Rasp, S.; Dueben, P.D.; Scher, S.; Weyn, J.A.; Mouatadid, S.; Thuerey, N. WeatherBench: A Benchmark Data Set for Data-Driven Weather Forecasting. J. Adv. Model. Earth Syst. 2020 , 12 , e2020MS002203. [ Google Scholar ] [ CrossRef ]
  • Crivellari, A.; Beinat, E.; Caetano, S.; Seydoux, A.; Cardoso, T. Multi-target CNN-LSTM regressor for predicting urban distribution of short-term food delivery demand. J. Bus. Res. 2022 , 144 , 844–853. [ Google Scholar ] [ CrossRef ]
  • Bai, S.; Kolter, J.Z.; Koltun, V. An empirical evaluation of generic convolutional and recurrent networks for sequence modeling. arXiv 2018 , arXiv:1803.01271. [ Google Scholar ]
  • Lin, Z.; Yue, W.; Huang, J.; Wan, J. Ship Trajectory Prediction Based on the TTCN-Attention-GRU Model. Electronics 2023 , 12 , 2556. [ Google Scholar ] [ CrossRef ]
  • He, K.; Zhang, X.; Ren, S.; Sun, J. Identity Mappings in Deep Residual Networks. In Computer Vision–ECCV 2016 ; Leibe, B., Matas, J., Sebe, N., Welling, M., Eds.; Springer International Publishing: Cham, Switzerland, 2016; pp. 630–645. [ Google Scholar ]
  • Srivastava, N.; Hinton, G.; Krizhevsky, A.; Sutskever, I.; Salakhutdinov, R. Dropout: A simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 2014 , 15 , 1929–1958. [ Google Scholar ]
  • Bin Syed, M.A.; Ahmed, I. A CNN-LSTM Architecture for Marine Vessel Track Association Using Automatic Identification System (AIS) Data. Sensors 2023 , 23 , 6400. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Li, M.-W.; Xu, D.-Y.; Geng, J.; Hong, W.-C. A hybrid approach for forecasting ship motion using CNN–GRU–AM and GCWOA. Appl. Soft Comput. 2022 , 114 , 108084. [ Google Scholar ] [ CrossRef ]
  • Zhang, B.; Wang, S.; Deng, L.; Jia, M.; Xu, J. Ship motion attitude prediction model based on IWOA-TCN-Attention. Ocean Eng. 2023 , 272 , 113911. [ Google Scholar ] [ CrossRef ]
  • Elman, J.L. Finding Structure in Time. Cogn. Sci. 1990 , 14 , 179–211. [ Google Scholar ] [ CrossRef ]
  • Shan, F.; He, X.; Armaghani, D.J.; Sheng, D. Effects of data smoothing and recurrent neural network (RNN) algorithms for real-time forecasting of tunnel boring machine (TBM) performance. J. Rock Mech. Geotech. Eng. 2024 , 16 , 1538–1551. [ Google Scholar ] [ CrossRef ]
  • Apaydin, H.; Feizi, H.; Sattari, M.T.; Colak, M.S.; Shamshirband, S.; Chau, K.-W. Comparative Analysis of Recurrent Neural Network Architectures for Reservoir Inflow Forecasting. Water 2020 , 12 , 1500. [ Google Scholar ] [ CrossRef ]
  • Ma, Z.; Zhang, H.; Liu, J. MM-RNN: A Multimodal RNN for Precipitation Nowcasting. IEEE Trans. Geosci. Remote Sens. 2023 , 61 , 1–14. [ Google Scholar ] [ CrossRef ]
  • Lu, M.; Xu, X. TRNN: An efficient time-series recurrent neural network for stock price prediction. Inf. Sci. 2024 , 657 , 119951. [ Google Scholar ] [ CrossRef ]
  • Bengio, Y.; Simard, P.; Frasconi, P. Learning long-term dependencies with gradient descent is difficult. IEEE Trans. Neural Netw. 1994 , 5 , 157–166. [ Google Scholar ] [ CrossRef ]
  • Kratzert, F.; Klotz, D.; Brenner, C.; Schulz, K.; Herrnegger, M. Rainfall–runoff modelling using Long Short-Term Memory (LSTM) networks. Hydrol. Earth Syst. Sci. 2018 , 22 , 6005–6022. [ Google Scholar ] [ CrossRef ]
  • Hochreiter, S. Untersuchungen zu dynamischen neuronalen Netzen. Diploma Tech. Univ. München 1991 , 91 , 31. [ Google Scholar ]
  • Hochreiter, S.; Schmidhuber, J. Long Short-Term Memory. Neural Comput. 1997 , 9 , 1735–1780. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Gers, F.A.; Schmidhuber, J.; Cummins, F. Learning to Forget: Continual Prediction with LSTM. Neural Comput. 2000 , 12 , 2451–2471. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Witten, I.H.; Frank, E. Data mining: Practical machine learning tools and techniques with Java implementations. SIGMOD Rec. 2002 , 31 , 76–77. [ Google Scholar ] [ CrossRef ]
  • Mo, J.X.; Gao, R.B.; Liu, J.H.; Du, L.; Yuen, K.F. Annual dilated convolutional LSTM network for time charter rate forecasting. Appl. Soft Comput. 2022 , 126 , 109259. [ Google Scholar ] [ CrossRef ]
  • Cho, K.; Van Merriënboer, B.; Bahdanau, D.; Bengio, Y. On the properties of neural machine translation: Encoder-decoder approaches. arXiv 2014 , arXiv:1409.1259. [ Google Scholar ]
  • Yang, S.; Yu, X.; Zhou, Y. LSTM and GRU Neural Network Performance Comparison Study: Taking Yelp Review Dataset as an Example. In Proceedings of the 2020 International Workshop on Electronic Communication and Artificial Intelligence (IWECAI), Shanghai, China, 12–14 June 2020; pp. 98–101. [ Google Scholar ] [ CrossRef ]
  • Zhao, Z.N.; Yun, S.N.; Jia, L.Y.; Guo, J.X.; Meng, Y.; He, N.; Li, X.J.; Shi, J.R.; Yang, L. Hybrid VMD-CNN-GRU-based model for short-term forecasting of wind power considering spatio-temporal features. Eng. Appl. Artif. Intell. 2023 , 121 , 105982. [ Google Scholar ] [ CrossRef ]
  • Pan, N.; Ding, Y.; Fu, J.; Wang, J.; Zheng, H. Research on Ship Arrival Law Based on Route Matching and Deep Learning. J. Phys. Conf. Ser. 2021 , 1952 , 022023. [ Google Scholar ] [ CrossRef ]
  • Ma, J.; Li, W.K.; Jia, C.F.; Zhang, C.W.; Zhang, Y. Risk Prediction for Ship Encounter Situation Awareness Using Long Short-Term Memory Based Deep Learning on Intership Behaviors. J. Adv. Transp. 2020 , 2020 , 8897700. [ Google Scholar ] [ CrossRef ]
  • Suo, Y.F.; Chen, W.K.; Claramunt, C.; Yang, S.H. A Ship Trajectory Prediction Framework Based on a Recurrent Neural Network. Sensors 2020 , 20 , 5133. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Bahdanau, D.; Cho, K.; Bengio, Y. Neural machine translation by jointly learning to align and translate. arXiv 2014 , arXiv:1409.0473. [ Google Scholar ] [ CrossRef ]
  • Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. Adv. Neural Inf. Process. Syst. 2017 , 30 , 03762. [ Google Scholar ]
  • Nascimento, E.G.S.; de Melo, T.A.C.; Moreira, D.M. A transformer-based deep neural network with wavelet transform for forecasting wind speed and wind energy. Energy 2023 , 278 , 127678. [ Google Scholar ] [ CrossRef ]
  • Zhang, L.; Zhang, J.; Niu, J.; Wu, Q.M.J.; Li, G. Track Prediction for HF Radar Vessels Submerged in Strong Clutter Based on MSCNN Fusion with GRU-AM and AR Model. Remote Sens. 2021 , 13 , 2164. [ Google Scholar ] [ CrossRef ]
  • Zhang, X.; Fu, X.; Xiao, Z.; Xu, H.; Zhang, W.; Koh, J.; Qin, Z. A Dynamic Context-Aware Approach for Vessel Trajectory Prediction Based on Multi-Stage Deep Learning. IEEE Trans. Intell. Veh. 2024 , 1–16. [ Google Scholar ] [ CrossRef ]
  • Jiang, D.; Shi, G.; Li, N.; Ma, L.; Li, W.; Shi, J. TRFM-LS: Transformer-Based Deep Learning Method for Vessel Trajectory Prediction. J. Mar. Sci. Eng. 2023 , 11 , 880. [ Google Scholar ] [ CrossRef ]
  • Violos, J.; Tsanakas, S.; Androutsopoulou, M.; Palaiokrassas, G.; Varvarigou, T. Next Position Prediction Using LSTM Neural Networks. In Proceedings of the 11th Hellenic Conference on Artificial Intelligence, Athens, Greece, 2–4 September 2020; pp. 232–240. [ Google Scholar ] [ CrossRef ]
  • Hoque, X.; Sharma, S.K. Ensembled Deep Learning Approach for Maritime Anomaly Detection System. In Proceedings of the 1st International Conference on Emerging Trends in Information Technology (ICETIT), Inst Informat Technol & Management, New Delhi, India, 21–22 June 2020; In Lecture Notes in Electrical Engineering. Volume 605, pp. 862–869. [ Google Scholar ]
  • Wang, Y.; Zhang, M.; Fu, H.; Wang, Q. Research on Prediction Method of Ship Rolling Motion Based on Deep Learning. In Proceedings of the 2020 39th Chinese Control Conference (CCC), Shenyang, China, 27–29 July 2020; pp. 7182–7187. [ Google Scholar ] [ CrossRef ]
  • Choi, J. Predicting the Frequency of Marine Accidents by Navigators’ Watch Duty Time in South Korea Using LSTM. Appl. Sci. 2022 , 12 , 11724. [ Google Scholar ] [ CrossRef ]
  • Li, T.; Li, Y.B. Prediction of ship trajectory based on deep learning. J. Phys. Conf. Ser. 2023 , 2613 , 012023. [ Google Scholar ] [ CrossRef ]
  • Chondrodima, E.; Pelekis, N.; Pikrakis, A.; Theodoridis, Y. An Efficient LSTM Neural Network-Based Framework for Vessel Location Forecasting. IEEE Trans. Intell. Transp. Syst. 2023 , 24 , 4872–4888. [ Google Scholar ] [ CrossRef ]
  • Long, Z.; Suyuan, W.; Zhongma, C.; Jiaqi, F.; Xiaoting, Y.; Wei, D. Lira-YOLO: A lightweight model for ship detection in radar images. J. Syst. Eng. Electron. 2020 , 31 , 950–956. [ Google Scholar ] [ CrossRef ]
  • Cheng, X.; Li, G.; Skulstad, R.; Zhang, H.; Chen, S. SpectralSeaNet: Spectrogram and Convolutional Network-based Sea State Estimation. In Proceedings of the IECON 2020 the 46th Annual Conference of the IEEE Industrial Electronics Society, Singapore, 18–21 October 2020; pp. 5069–5074. [ Google Scholar ] [ CrossRef ]
  • Wang, K.; Cheng, X.; Shi, F. Learning Dynamic Graph Structures for Sea State Estimation with Deep Neural Networks. In Proceedings of the 2023 6th International Conference on Intelligent Autonomous Systems (ICoIAS), Qinhuangdao, China, 22–24 September 2023; pp. 161–166. [ Google Scholar ]
  • Yu, J.; Huang, D.; Shi, X.; Li, W.; Wang, X. Real-Time Moving Ship Detection from Low-Resolution Large-Scale Remote Sensing Image Sequence. Appl. Sci. 2023 , 13 , 2584. [ Google Scholar ] [ CrossRef ]
  • Ilias, L.; Kapsalis, P.; Mouzakitis, S.; Askounis, D. A Multitask Learning Framework for Predicting Ship Fuel Oil Consumption. IEEE Access 2023 , 11 , 132576–132589. [ Google Scholar ] [ CrossRef ]
  • Selimovic, D.; Hrzic, F.; Prpic-Orsic, J.; Lerga, J. Estimation of sea state parameters from ship motion responses using attention-based neural networks. Ocean Eng. 2023 , 281 , 114915. [ Google Scholar ] [ CrossRef ]
  • Ma, J.; Jia, C.; Yang, X.; Cheng, X.; Li, W.; Zhang, C. A Data-Driven Approach for Collision Risk Early Warning in Vessel Encounter Situations Using Attention-BiLSTM. IEEE Access 2020 , 8 , 188771–188783. [ Google Scholar ] [ CrossRef ]
  • Ji, Z.; Gan, H.; Liu, B. A Deep Learning-Based Fault Warning Model for Exhaust Temperature Prediction and Fault Warning of Marine Diesel Engine. J. Mar. Sci. Eng. 2023 , 11 , 1509. [ Google Scholar ] [ CrossRef ]
  • Liu, Y.; Gan, H.; Cong, Y.; Hu, G. Research on fault prediction of marine diesel engine based on attention-LSTM. Proc. Inst. Mech. Eng. Part M J. Eng. Marit. Environ. 2023 , 237 , 508–519. [ Google Scholar ] [ CrossRef ]
  • Li, M.W.; Xu, D.Y.; Geng, J.; Hong, W.C. A ship motion forecasting approach based on empirical mode decomposition method hybrid deep learning network and quantum butterfly optimization algorithm. Nonlinear Dyn. 2022 , 107 , 2447–2467. [ Google Scholar ] [ CrossRef ]
  • Yang, C.H.; Chang, P.Y. Forecasting the Demand for Container Throughput Using a Mixed-Precision Neural Architecture Based on CNN–LSTM. Mathematics 2020 , 8 , 1784. [ Google Scholar ] [ CrossRef ]
  • Zhang, W.; Wu, P.; Peng, Y.; Liu, D. Roll Motion Prediction of Unmanned Surface Vehicle Based on Coupled CNN and LSTM. Future Internet 2019 , 11 , 243. [ Google Scholar ] [ CrossRef ]
  • Kamal, I.M.; Bae, H.; Sunghyun, S.; Yun, H. DERN: Deep Ensemble Learning Model for Short- and Long-Term Prediction of Baltic Dry Index. Appl. Sci. 2020 , 10 , 1504. [ Google Scholar ] [ CrossRef ]
  • Li, M.Z.; Li, B.; Qi, Z.G.; Li, J.S.; Wu, J.W. Enhancing Maritime Navigational Safety: Ship Trajectory Prediction Using ACoAtt–LSTM and AIS Data. ISPRS Int. J. Geo-Inform. 2024 , 13 , 85. [ Google Scholar ] [ CrossRef ]
  • Yu, T.; Zhang, Y.; Zhao, S.; Yang, J.; Li, W.; Guo, W. Vessel trajectory prediction based on modified LSTM with attention mechanism. In Proceedings of the 2024 4th International Conference on Neural Networks, Information and Communication Engineering, NNICE, Guangzhou, China, 19–21 January 2024; pp. 912–918. [ Google Scholar ] [ CrossRef ]
  • Xia, C.; Peng, Y.; Qu, D. A pre-trained model specialized for ship trajectory prediction. In Proceedings of the IEEE Advanced Information Technology, Electronic and Automation Control Conference (IAEAC), Chongqing, China, 15–17 March 2024; pp. 1857–1860. [ Google Scholar ] [ CrossRef ]
  • Cheng, X.; Li, G.; Skulstad, R.; Chen, S.; Hildre, H.P.; Zhang, H. Modeling and Analysis of Motion Data from Dynamically Positioned Vessels for Sea State Estimation. In Proceedings of the 2019 International Conference on Robotics and Automation (ICRA), Montreal, QC, Canada, 20–24 May 2019; pp. 6644–6650. [ Google Scholar ] [ CrossRef ]
  • Xia, C.; Qu, D.; Zheng, Y. TATBformer: A Divide-and-Conquer Approach to Ship Trajectory Prediction Modeling. In Proceedings of the 2023 IEEE 11th Joint International Information Technology and Artificial Intelligence Conference (ITAIC), Chongqing, China, 8–10 December 2023; pp. 335–339. [ Google Scholar ] [ CrossRef ]
  • Ran, Y.; Shi, G.; Li, W. Ship Track Prediction Model based on Automatic Identification System Data and Bidirectional Cyclic Neural Network. In Proceedings of the 2021 4th International Symposium on Traffic Transportation and Civil Architecture, ISTTCA, Suzhou, China, 12–14 November 2021; pp. 297–301. [ Google Scholar ] [ CrossRef ]
  • Yang, C.H.; Wu, C.H.; Shao, J.C.; Wang, Y.C.; Hsieh, C.M. AIS-Based Intelligent Vessel Trajectory Prediction Using Bi-LSTM. IEEE Access 2022 , 10 , 24302–24315. [ Google Scholar ] [ CrossRef ]
  • Sadeghi, Z.; Matwin, S. Anomaly detection for maritime navigation based on probability density function of error of reconstruction. J. Intell. Syst. 2023 , 32 , 20220270. [ Google Scholar ] [ CrossRef ]
  • Perumal, V.; Murugaiyan, S.; Ravichandran, P.; Venkatesan, R.; Sundar, R. Real time identification of anomalous events in coastal regions using deep learning techniques. Concurr. Comput. Pract. Exp. 2021 , 33 , e6421. [ Google Scholar ] [ CrossRef ]
  • Xie, J.L.; Shi, W.F.; Shi, Y.Q. Research on Fault Diagnosis of Six-Phase Propulsion Motor Drive Inverter for Marine Electric Propulsion System Based on Res-BiLSTM. Machines 2022 , 10 , 736. [ Google Scholar ] [ CrossRef ]
  • Han, P.; Li, G.; Skulstad, R.; Skjong, S.; Zhang, H. A Deep Learning Approach to Detect and Isolate Thruster Failures for Dynamically Positioned Vessels Using Motion Data. IEEE Trans. Instrum. Meas. 2021 , 70 , 1–11. [ Google Scholar ] [ CrossRef ]
  • Cheng, X.; Wang, K.; Liu, X.; Yu, Q.; Shi, F.; Ren, Z.; Chen, S. A Novel Class-Imbalanced Ship Motion Data-Based Cross-Scale Model for Sea State Estimation. IEEE Trans. Intell. Transp. Syst. 2023 , 24 , 15907–15919. [ Google Scholar ] [ CrossRef ]
  • Lei, L.; Wen, Z.; Peng, Z. Prediction of Main Engine Speed and Fuel Consumption of Inland Ships Based on Deep Learning. J. Phys. Conf. Ser. 2021 , 2025 , 012012. [ Google Scholar ]
  • Ljunggren, H. Using Deep Learning for Classifying Ship Trajectories. In Proceedings of the 21st International Conference on Information Fusion (FUSION), Cambridge, UK, 10–13 July 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 2158–2164. [ Google Scholar ]
  • Kulshrestha, A.; Yadav, A.; Sharma, H.; Suman, S. A deep learning-based multivariate decomposition and ensemble framework for container throughput forecasting. J. Forecast. 2024 , in press . [ Google Scholar ] [ CrossRef ]
  • Shankar, S.; Ilavarasan, P.V.; Punia, S.; Singh, S.P. Forecasting container throughput with long short-term memory networks. Ind. Manag. Data Syst. 2020 , 120 , 425–441. [ Google Scholar ] [ CrossRef ]
  • Lee, E.; Kim, D.; Bae, H. Container Volume Prediction Using Time-Series Decomposition with a Long Short-Term Memory Models. Appl. Sci. 2021 , 11 , 8995. [ Google Scholar ] [ CrossRef ]
  • Cuong, T.N.; You, S.-S.; Long, L.N.B.; Kim, H.-S. Seaport Resilience Analysis and Throughput Forecast Using a Deep Learning Approach: A Case Study of Busan Port. Sustainability 2022 , 14 , 13985. [ Google Scholar ] [ CrossRef ]
  • Song, X.; Chen, Z.S. Shipping market time series forecasting via an Ensemble Deep Dual-Projection Echo State Network. Comput. Electr. Eng. 2024 , 117 , 109218. [ Google Scholar ] [ CrossRef ]
  • Li, X.; Hu, Y.; Bai, Y.; Gao, X.; Chen, G. DeepDLP: Deep Reinforcement Learning based Framework for Dynamic Liner Trade Pricing. In Proceedings of the Proceedings of the 2023 17th International Conference on Ubiquitous Information Management and Communication, IMCOM, Seoul, Republic of Korea, 3–5 January 2023; pp. 1–8. [ Google Scholar ] [ CrossRef ]
  • Alqatawna, A.; Abu-Salih, B.; Obeid, N.; Almiani, M. Incorporating Time-Series Forecasting Techniques to Predict Logistics Companies’ Staffing Needs and Order Volume. Computation 2023 , 11 , 141. [ Google Scholar ] [ CrossRef ]
  • Lim, S.; Kim, S.J.; Park, Y.; Kwon, N. A deep learning-based time series model with missing value handling techniques to predict various types of liquid cargo traffic. Expert Syst. Appl. 2021 , 184 , 115532. [ Google Scholar ] [ CrossRef ]
  • Cheng, R.; Gao, R.; Yuen, K.F. Ship order book forecasting by an ensemble deep parsimonious random vector functional link network. Eng. Appl. Artif. Intell. 2024 , 133 , 108139. [ Google Scholar ] [ CrossRef ]
  • Xiao, Z.; Fu, X.J.; Zhang, L.Y.; Goh, R.S.M. Traffic Pattern Mining and Forecasting Technologies in Maritime Traffic Service Networks: A Comprehensive Survey. IEEE Trans. Intell. Transp. Syst. 2020 , 21 , 1796–1825. [ Google Scholar ] [ CrossRef ]
  • Yan, R.; Wang, S.A.; Psaraftis, H.N. Data analytics for fuel consumption management in maritime transportation: Status and perspectives. Transp. Res. Part E Logist. Transp. Rev. 2021 , 155 , 102489. [ Google Scholar ] [ CrossRef ]
  • Filom, S.; Amiri, A.M.; Razavi, S. Applications of machine learning methods in port operations—A systematic literature review. Transp. Res. Part E-Logist. Transp. Rev. 2022 , 161 , 102722. [ Google Scholar ] [ CrossRef ]
  • Ksciuk, J.; Kuhlemann, S.; Tierney, K.; Koberstein, A. Uncertainty in maritime ship routing and scheduling: A Literature review. Eur. J. Oper. Res. 2023 , 308 , 499–524. [ Google Scholar ] [ CrossRef ]
  • Jia, H.; Prakash, V.; Smith, T. Estimating vessel payloads in bulk shipping using AIS data. Int. J. Shipp. Transp. Logist. 2019 , 11 , 25–40. [ Google Scholar ] [ CrossRef ]
  • Yang, D.; Wu, L.X.; Wang, S.A.; Jia, H.Y.; Li, K.X. How big data enriches maritime research—A critical review of Automatic Identification System (AIS) data applications. Transp. Rev. 2019 , 39 , 755–773. [ Google Scholar ] [ CrossRef ]
  • Liu, M.; Zhao, Y.; Wang, J.; Liu, C.; Li, G. A Deep Learning Framework for Baltic Dry Index Forecasting. Procedia Comput. Sci. 2022 , 199 , 821–828. [ Google Scholar ] [ CrossRef ]
  • Wang, Y.C.; Wang, H.; Zou, D.X.; Fu, H.X. Ship roll prediction algorithm based on Bi-LSTM-TPA combined model. J. Mar. Sci. Eng. 2021 , 9 , 387. [ Google Scholar ] [ CrossRef ]
  • Xie, H.T.; Jiang, X.Q.; Hu, X.; Wu, Z.T.; Wang, G.Q.; Xie, K. High-efficiency and low-energy ship recognition strategy based on spiking neural network in SAR images. Front. Neurorobotics 2022 , 16 , 970832. [ Google Scholar ] [ CrossRef ]
  • Muñoz, D.U.; Ruiz-Aguilar, J.J.; González-Enrique, J.; Domínguez, I.J.T. A Deep Ensemble Neural Network Approach to Improve Predictions of Container Inspection Volume. In Proceedings of the 15th International Work-Conference on Artificial Neural Networks (IWANN), Gran Canaria, Spain, 12–14 June 2019; Springer: Berlin/Heidelberg, Germany, 2019; Volume 11506, pp. 806–817. [ Google Scholar ] [ CrossRef ]
  • Velasco-Gallego, C.; Lazakis, I. Mar-RUL: A remaining useful life prediction approach for fault prognostics of marine machinery. Appl. Ocean Res. 2023 , 140 , 103735. [ Google Scholar ] [ CrossRef ]
  • Zhang, X.; Zheng, K.; Wang, C.; Chen, J.; Qi, H. A novel deep reinforcement learning for POMDP-based autonomous ship collision decision-making. Neural Comput. Appl. 2023 , 1–15. [ Google Scholar ] [ CrossRef ]
  • Guo, X.X.; Zhang, X.T.; Lu, W.Y.; Tian, X.L.; Li, X. Real-time prediction of 6-DOF motions of a turret-moored FPSO in harsh sea state. Ocean Eng. 2022 , 265 , 112500. [ Google Scholar ] [ CrossRef ]
  • Kim, D.; Kim, T.; An, M.; Cho, Y.; Baek, Y.; IEEE. Edge AI-based early anomaly detection of LNG Carrier Main Engine systems. In Proceedings of the OCEANS Conference, Limerick, Ireland, 5–8 June 2023. [ Google Scholar ] [ CrossRef ]
  • Theodoropoulos, P.; Spandonidis, C.C.; Giannopoulos, F.; Fassois, S. A Deep Learning-Based Fault Detection Model for Optimization of Shipping Operations and Enhancement of Maritime Safety. Sensors 2021 , 21 , 5658. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Huang, B.; Wu, B.; Barry, M. Geographically and temporally weighted regression for modeling spatio-temporal variation in house prices. Int. J. Geogr. Inf. Sci. 2010 , 24 , 383–401. [ Google Scholar ] [ CrossRef ]
  • Zhang, W.; Xu, Y.; Streets, D.G.; Wang, C. How does decarbonization of the central heating industry affect employment? A spatiotemporal analysis from the perspective of urbanization. Energy Build. 2024 , 306 , 113912. [ Google Scholar ] [ CrossRef ]
  • Zhang, D.; Li, X.; Wan, C.; Man, J. A novel hybrid deep-learning framework for medium-term container throughput forecasting: An application to China’s Guangzhou, Qingdao and Shanghai hub ports. Marit. Econ. Logist. 2024 , 26 , 44–73. [ Google Scholar ] [ CrossRef ]
  • Wang, Y.; Wang, H.; Zhou, B.; Fu, H. Multi-dimensional prediction method based on Bi-LSTMC for ship roll. Ocean Eng. 2021 , 242 , 110106. [ Google Scholar ] [ CrossRef ]


| Ref. | Architecture | Dataset | Advantage |
|---|---|---|---|
| [ ] | MSCNN-GRU-AM | HF radar | It is applicable for high-frequency radar ship track prediction in environments with significant clutter and interference |
| [ ] | CNN-BiLSTM-Attention | 6L34DF dual fuel diesel engine | The high prediction accuracy and early warning timeliness can provide interpretable fault prediction results |
| [ ] | LSTM | Two LNG carriers | Enables early anomaly detection in new ships and new equipment |
| [ ] | LSTM | Sensors | Better and high-precision effects |
| [ ] | Self-Attention-BiLSTM | A real military ship | Not only can it better capture complex ship attitude changes, but it also shows greater accuracy and stability in long-term forecasting tasks |
| [ ] | CNN–GRU–AM | A C11 containership | Better accuracy of forecasting |
| [ ] | GRU | A scaled model test | Good prediction accuracy |
| [ ] | CNN | A bulk carrier | Good prediction accuracy |

Wang, M.; Guo, X.; She, Y.; Zhou, Y.; Liang, M.; Chen, Z.S. Advancements in Deep Learning Techniques for Time Series Forecasting in Maritime Applications: A Comprehensive Review. Information 2024 , 15 , 507. https://doi.org/10.3390/info15080507


  • Open access
  • Published: 28 August 2024

Historical and practical aspects of macular buckle surgery in the treatment of myopic tractional maculopathy: case series and literature review

  • Francyne Veiga Reis Cyrino 1 ,
  • Moisés Moura de Lucena 1 ,
  • Letícia de Oliveira Audi 1 ,
  • José Afonso Ribeiro Ramos Filho 1 ,
  • João Pedro Romero Braga 1 ,
  • Thais Marino de Azeredo Bastos 1 ,
  • Igor Neves Coelho 1 &
  • Rodrigo Jorge 1  

International Journal of Retina and Vitreous volume 10, Article number: 60 (2024)


Uncorrected myopia is a leading cause of blindness globally, with a rising prevalence in recent decades. Pathological myopia, often seen in individuals with increased axial length (AXL), can result in severe structural changes in the posterior pole, including myopic tractional maculopathy (MTM). MTM arises from tractional forces at the vitreoretinal interface, leading to progressive macular retinoschisis, macular holes, and retinal detachment (RD). This study aims to outline preoperative evaluation and surgical indication criteria for MTM, based on the MTM staging system, and to share our Brazilian experience with three cases of macular buckle (MB) surgery, all with over a year of follow-up.

We conducted a retrospective analysis of three cases of MTM-associated RD treated with MB surgery, with or without pars plana vitrectomy. Preoperative evaluations included optical coherence tomography (OCT) and ultrasonography (USG) to assess the extent of macular involvement and retinal detachment. Surgical indications were determined based on the MTM staging system. The MB was assembled using customizable and accessible materials. Surgical procedures varied according to the specific needs of each case. An informed consent form regarding the surgical procedure was appropriately obtained for each case. The study was conducted with the proper approval of the institution’s ethics committee.

All three cases demonstrated successful retinal attachment during the mean follow-up of eighteen months. In the first case, combined phacoemulsification, vitrectomy, and MB were performed for MTM with macular hole and RD. The second case required MB and vitrectomy after two failed RD surgeries. In the third case, a macular detachment with an internal lamellar hole was treated with MB alone. These cases highlight the efficacy of MB surgery in managing MTM in highly myopic eyes.

Conclusions

MB surgery is an effective treatment option for MTM-associated RD in highly myopic eyes, providing long-term retinal attachment. Our experience demonstrates that with proper preoperative evaluation and surgical planning, MB can be successfully implemented using accessible materials, offering a viable solution in resource-limited settings. Further studies with larger sample sizes are warranted to validate these findings and refine surgical techniques.

Introduction

Uncorrected myopia is considered one of the leading causes of blindness worldwide [ 1 ], and its prevalence has grown significantly in recent decades [ 2 ]. Specifically, in myopic individuals with increased axial length (AXL), structural changes may occur in the posterior pole that characterizes pathological myopia, including posterior staphyloma, myopic macular degeneration, optic neuropathy associated with myopia, and myopic tractional maculopathy (MTM) [ 3 , 4 ]. The incidence of pathological myopia increases with age but can also occur in younger patients [ 5 ]. The impact of myopic maculopathy lies in its frequent occurrence in both eyes, its irreversibility, and its potential to affect individuals of working age [ 6 ].

MTM is a specific condition of pathological myopia secondary to tangential and anteroposterior tractional alterations at the vitreoretinal interface, where the retina is unable to adapt to the progressive increase in AXL and ends up undergoing structural changes. Characteristically, it involves a progressive combination of macular retinoschisis, lamellar or full-thickness macular holes, and, ultimately, retinal detachment (RD) [ 1 ]. Hence, while antiangiogenic therapy is used to treat neovascular membranes and there is no treatment for atrophic changes, MTM and its complications require precise surgical intervention. Macular buckle (MB) surgery, with or without vitrectomy, is one of the available surgical techniques.

In this study, we present the historical aspects of MB and discuss preoperative evaluation and criteria for surgical indication. We also report our experience with MB surgery cases and describe the assembly of a customizable MB using accessible materials.

Historical context and evolution of the macular buckle

The surgical treatment of RD has undergone revolutionary advancements following the theory developed by Jules Gonin in 1921, which involved surgically blocking tears and breaks in the retina [ 2 ]. However, it was soon understood that cases of surgical failure were related to the traction exerted by the vitreous on areas of retinal discontinuity, perpetuating the infiltration of subretinal fluid [ 3 , 4 ]. In an attempt to alleviate this traction by approximating the underlying choroid to the detached retina, several authors proposed techniques such as subchoroidal injection of plasma, transient indentation with gauze, or even a piece of plastic sutured to the sclera near the treated area [ 5 , 6 ]. In 1957, Schepens conceived the technique now known as scleral buckling, revolutionizing retinal surgery, and also proposing some adaptations for the treatment of the macular region in cases of retinal detachment associated with macular holes by positioning the buckle beneath the macular region [ 6 ].

Over time, other MB techniques were developed by different authors [ 7 , 8 , 9 , 10 , 11 , 12 ]. In 1980, Ando [ 13 ] created the first solid silicone MB, facilitating its implantation without the need for muscle disinsertion or suturing of the implant to the thinned posterior sclera. However, it presented limitations such as the adjustment of force and interference in imaging exams due to the presence of embedded metal [ 14 ]. In 2012, Stirpe et al. developed a new MB that did not contain metal wires and had adjustable sutures [ 15 ], while Mateo et al. proposed the coupling of an illuminated probe to facilitate the precise positioning of Ando’s MB beneath the macula [ 16 ].

Unfortunately, Ando’s device presents limitations regarding shape, tension adjustment, and posterior suture thus hindering its reproducibility. Hence, certain authors explored alternative methods to tailor their implants, such as utilizing silicone sponges internally coated with stainless steel [ 17 ] or employing a titanium stent [ 18 , 19 ], as described by Parolini et al. (2013). In their report, Parolini et al. detailed three cases where they utilized MB exclusively for macular detachment unrelated to macular holes. Additionally, they introduced a novel L-shaped design of MB devoid of posterior sutures, enhancing its feasibility for surgical implementation [ 18 ].

In Brazil, there are no commercially available MBs, so we chose to manufacture one following the descriptions provided by Parolini et al. [ 18 ], as we will describe throughout this article.

Preoperative evaluation, imaging exams in myopic tractional maculopathy, and their role in the surgical indication of macular buckle

Macular buckle surgery requires a comprehensive preoperative ophthalmological assessment and complementary imaging exams to assist in the classification of MTM and surgical planning. Here, we highlight and discuss ocular ultrasonography (USG) and optical coherence tomography (OCT).

Ocular ultrasonography

The importance of USG in the surgical planning of MB procedures lies in its ability to assess vitreous and retinal conditions, such as the presence of anteroposterior vitreoretinal tractions (VMT) and/or tears, and to locate and estimate the extent of RD. OCT can also be useful for identifying VMT, but standard OCT does not have sufficient width and depth to capture the entire retinal detachment. Sometimes, in eyes with very high myopia, it is challenging to acquire images of macular holes; in these cases, examining the patient while they wear contact lenses can improve image acquisition. As wide-field OCT is not available in Brazil, USG is very useful in these situations.

USG also aids in selecting the appropriate surgical technique and determining the indication for MB [ 18 , 19 ]. Moreover, it facilitates the measurement of AXL in cases where optical biometry is unreliable, allows for the accurate calculation of intraocular lens power using the immersion technique to avoid corneal compression [ 21 ], assists in identifying structures in cases of media opacity, and ensures accurate intraoperative positioning and postoperative follow-up of the MB. Regarding the anesthetic procedure, USG is essential in evaluating the size of the staphyloma, helping to select the most suitable anesthetic method for highly myopic eyes (retrobulbar block or subtenon anesthesia) to avoid complications such as ocular perforation or intraocular injection of anesthetic in significantly large eyes [ 22 , 23 , 24 ].

Optical coherence tomography

The diagnosis and monitoring of MTM can be challenging due to the atrophic changes associated with pathological myopia. In this context, OCT has emerged as a fundamental diagnostic method for the non-invasive and detailed evaluation of the vitreoretinal interface, retinal layers, the retinal pigment epithelium, and the choroid, allowing for a better understanding and classification of these structures, as described below [ 25 , 26 , 27 , 28 ].

Classification and criteria for surgical indication in MTM based on OCT findings

The evaluation of OCT and the correct interpretation of findings are essential steps in surgical indication in MTM. In 2021, Parolini et al. [ 27 , 28 , 29 , 30 ] introduced a new OCT classification for MTM, which has strong reproducibility between examiners and is intended to streamline information sharing and improve understanding of disease progression [ 29 ]. The MTM staging system (MSS) categorizes findings into two types of evolution: perpendicular and tangential. Perpendicular evolution describes the anatomical sequence of predominantly internal or inner retinoschisis (stage 1), predominantly external retinoschisis (stage 2), retinoschisis with macular detachment (stage 3), and complete macular detachment without schisis (stage 4). Tangential evolution, in turn, describes the anatomical sequence of preserved foveal contour (a), internal lamellar macular hole (b), and full-thickness macular hole (c). This classification allows for the combination of evolution types, facilitating disease categorization. The occurrence of external lamellar macular holes is described in the classification as “O”, which can happen at any stage, while the presence of epiretinal abnormalities is indicated as “Plus” [ 28 ].
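The way an MSS label combines the two axes can be illustrated with a short sketch. The written label format used here (for example "3b", "2a O" or "4c O Plus", separated by spaces) and the name describe_stage are assumptions made for illustration only; the source defines the stages and modifiers but not a notation for software.

```python
# Illustrative sketch only: the label format ("3b O Plus") and the function
# name are assumptions, not part of the published staging system.
PERPENDICULAR = {
    "1": "predominantly internal retinoschisis",
    "2": "predominantly external retinoschisis",
    "3": "retinoschisis with macular detachment",
    "4": "complete macular detachment without schisis",
}
TANGENTIAL = {
    "a": "preserved foveal contour",
    "b": "internal lamellar macular hole",
    "c": "full-thickness macular hole",
}
MODIFIERS = {"o", "plus"}


def describe_stage(label: str) -> dict:
    """Decode a label such as '3b', '2a O' or '4c O Plus' into its components."""
    tokens = label.lower().split()
    if not tokens:
        raise ValueError("empty stage label")
    core, modifiers = tokens[0], set(tokens[1:])
    if len(core) != 2 or core[0] not in PERPENDICULAR or core[1] not in TANGENTIAL:
        raise ValueError(f"unrecognized stage: {label!r}")
    unknown = modifiers - MODIFIERS
    if unknown:
        raise ValueError(f"unrecognized modifier(s): {sorted(unknown)}")
    return {
        "perpendicular": PERPENDICULAR[core[0]],
        "tangential": TANGENTIAL[core[1]],
        "outer_lamellar_hole": "o" in modifiers,        # the "O" modifier
        "epiretinal_abnormalities": "plus" in modifiers,  # the "Plus" modifier
    }


if __name__ == "__main__":
    print(describe_stage("3b O"))
```

Keeping the two axes in separate lookup tables mirrors the classification itself, in which any perpendicular stage can be combined with any tangential stage.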

Based on the MSS, a surgical management approach for MTM was proposed. The idea is that comparing MB vitrectomy and pars plana vitrectomy (PPV) alone does not make sense, as each approach has its value in treatment. Early-stage cases warrant observation (stages 1a and 2a), while intervention is reserved for those who experience a progressive decline in visual acuity (stages 1b and 2b). When tangential forces predominate, PPV alone presents good results in stages 1a, with significant epiretinal membrane, and 1b and 1c.

In cases where perpendicular evolution predominates, MB alone has proven effective in stages 2b, 3a, 3b, 4a, and 4b. If epiretinal abnormalities are considered clinically significant for visual improvement after the MB procedure, a subsequent PPV remains a viable option. Finally, in cases where both perpendicular and tangential forces are present, leading to macular involvement and/or macular or retinal detachment, MB + PPV is indicated (stages 2c, 3c, and 4c). The presence of “plus” alterations may require surgical intervention to relieve metamorphopsia. Table 1 summarizes OCT findings and their implications in surgical indication [ 30 ].
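The stage-based indications described in the two preceding paragraphs can be condensed into a lookup table, sketched below. This is a simplified reading of the text, not a clinical tool: the names MANAGEMENT and suggest_management and the significant_erm flag are hypothetical, and the sketch ignores factors the text also mentions, such as progressive decline in visual acuity and “plus” alterations.

```python
# Simplified sketch of the stage-to-approach mapping summarized in the text.
# All names are hypothetical; the mapping omits visual-acuity trends and
# "Plus" findings, which the text says also influence the decision.
OBSERVE = "observation"
PPV = "PPV alone"
MB = "MB alone"
MB_PPV = "MB + PPV"

MANAGEMENT = {
    "1a": OBSERVE, "2a": OBSERVE,                         # early stages
    "1b": PPV, "1c": PPV,                                 # tangential forces predominate
    "2b": MB, "3a": MB, "3b": MB, "4a": MB, "4b": MB,      # perpendicular evolution predominates
    "2c": MB_PPV, "3c": MB_PPV, "4c": MB_PPV,             # both force vectors present
}


def suggest_management(stage: str, significant_erm: bool = False) -> str:
    """Return the approach the text associates with an MSS stage such as '3c'."""
    key = stage.strip().lower()
    if key not in MANAGEMENT:
        raise ValueError(f"unrecognized stage: {stage!r}")
    # Per the text, stage 1a with a significant epiretinal membrane
    # responds well to PPV alone rather than observation.
    if key == "1a" and significant_erm:
        return PPV
    return MANAGEMENT[key]


if __name__ == "__main__":
    for s in ("1a", "2b", "3c"):
        print(s, "->", suggest_management(s))
```

For example, suggest_management("4b") returns "MB alone", which matches the approach reported for the patient in stage 4b in this series.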

Based on the criteria outlined by Parolini et al. [ 28 , 29 , 30 ], we sought to share our experience in this small case series, where all patients underwent MB surgery, with or without PPV, and have been followed up for over a year. Additionally, we will outline the methodology employed for the MB procedure and offer a concise analysis of the results, correlating them with the current literature.

This retrospective study analyzed three patients with MTM-associated RD treated with MB surgery, with or without PPV. Preoperative evaluations used OCT and USG to determine macular involvement and the extent of RD. Surgical indications were guided by the MTM staging system, and the MB was assembled using customizable materials. Procedures were tailored to the specific needs of each patient. All participants provided written informed consent. The study received approval from the ethics committee of the Clinical Hospital of the University of São Paulo, Ribeirão Preto, SP, Brazil, and adhered to the principles of the Declaration of Helsinki.

Case reports

We describe the surgical management of three cases of highly myopic eyes with MTM, where MB surgery was performed. In cases 1 and 2, RD was associated with a macular hole (MH). In case 2, the indication for MB was due to two previous failures of vitreoretinal surgery (PPV) for the treatment of retinal detachment with a macular hole. In case 3, a macular detachment was associated with an internal lamellar hole. Table  2 summarizes the main findings of each case, and Figs.  1 , 2 and 3 illustrate them.

Fig. 1

a: Color fundus photographs of wide-field preoperative imaging, showing retinal detachment in the posterior pole with a macular hole in the left eye (OS); b: Postoperative color fundus photography of the OS with attached retina and a residual gas bubble; c: Preoperative USG evidencing retinal detachment and posterior staphyloma; d: Intraoperative USG evidencing correct positioning of the buckle flattening the posterior staphyloma; e: Preoperative OCT showing a retinal detachment with associated macular hole; f: Postoperative OCT showing a reattached retina with a grade 2 macular hole closure exhibiting applied edges (grade 2 closure, Kang et al.’s classification [ 31 ])

Fig. 2

a: Ultrasound of the left eye shows retinal detachment; b: Postoperative OCT reveals attached retina; c: Postoperative color fundus photography of the left eye demonstrates a reattached retina

Fig. 3

a: Preoperative USG showing a large posterior staphyloma with macular detachment (arrow); b: Postoperative USG evidencing flattening of the posterior staphyloma due to the positioning of the buckle; c: Preoperative OCT showing an internal lamellar hole with macular detachment and nasal macular retinoschisis. Vitreomacular adhesion can also be observed; d: Postoperative OCT evidencing flattening of the posterior staphyloma, resolution of the lamellar hole, and macular detachment, as well as reduction of retinoschisis; the vitreomacular adhesion remains stable; e: Fundus retinography showing attached retina

Description of implant fabrication and the surgical technique

The following materials were used: one 1.5-mm titanium microplate for osteosynthesis containing 8 holes, Traumec ® (Medical Support, Brazil); one 270 sleeve-type band (Labitician, USA); one 506G oval sponge (Labitician, USA); one 15-degree blade; pliers; and strong scissors (Fig. 4 a).

Implant fabrication

We used a titanium osteosynthesis plate containing 16 holes, which was cut in half (8 holes) using strong scissors (or pliers), creating the ideal size for our implant. This plate was then inserted into a 270 sleeve-type band (sleeve), covering its entire surface, with the help of Kelly forceps to open the sleeve and facilitate plate insertion, preventing any tearing. Approximately 2.0 mm of the band should be left beyond the plate on the vertical portion to protect the extremity and prevent conjunctival erosion after fixation. The plate is then bent into an “L” shape using pliers, leaving 3 holes horizontally (short arm of the L) and 5 holes vertically (long arm of the L). Next, a tunnel is made in the middle of the linear length of the 506G sponge with a 15-degree blade, ensuring it is longer than the short arm of the titanium plate to cover it, and without letting the tunnel pierce the sponge (to avoid plate exposure). Finally, the short arm of the L-shaped plate is inserted into the 506G sponge through the tunnel, and the 506G sponge should then be cut to cover the short arm of the implant, leaving at least 1.0 mm beyond the implant length to prevent exposure beyond the sponge (Fig.  4 a-c).

Surgical technique

The initial steps are the same whether isolated MB surgery or combined surgery with vitrectomy is performed. The procedure begins with a temporal peritomy at the limbus of the conjunctiva and Tenon’s capsule from 11 to 4 o’clock. The lateral and superior rectus muscles are isolated using 2.0 silk sutures (Ethicon, Johnson & Johnson, Brazil) to facilitate eye motility. Before the implant is positioned, anterior chamber paracentesis is performed to reduce intraocular pressure (IOP) and minimize pressure changes when positioning the MB. Next, the implant is placed in the upper temporal quadrant, with the shorter arm positioned under the macula and the longer arm inserted parallel to the lateral rectus muscle (Fig. 4 d). Afterward, a 25-gauge chandelier optic fiber (Alcon Constellation Vision System, USA) is positioned at 6 o’clock to enable visualization of the fundus.

Subsequently, we confirm the proper positioning of the implant under the macular region using a panoramic visualization system coupled to a microscope (Resight 500 ® , Zeiss) with delicate manipulation of the implant. Once the MB positioning is confirmed, the vertical portion of the device (long arm) is sutured to the sclera using 5.0 Mersilene ® suture (Ethicon, Johnson & Johnson, Brazil) with 2 separate stitches. To further confirm the positioning of the MB, we perform intraoperative USG, covering the USG probe and cable with a sterile plastic cover; at the same time, the AXL can be measured for comparison.

Fig. 4

a: Material to be used for the fabrication of the macular buckle; b: Schematic figure of the shape to be molded for the buckle; c: MB fabricated in the operating room for the described cases; d: Postoperative aspect of the correctly positioned macular buckle; it can be observed under the conjunctiva in the upper temporal quadrant

As reported above, in the two cases where retinal detachment was associated with MH (cases 1 and 2), we performed combined MB and PPV surgery, carrying out the PPV after positioning the MB. In case 1, phacoemulsification was also performed, and C3F8 was chosen as the vitreous substitute. In case 2, due to the history of previous PPV and retinal re-detachment with MH, silicone oil was used as the vitreous substitute in addition to MB. The case presenting an internal lamellar hole (stage 4b) with macular detachment and nasal macular retinoschisis (patient 3) was managed with MB alone, despite slight vitreomacular adherence, which was not considered significant.

In the immediate postoperative period, the three patients operated at our service presented with slight hyperemia and mild pain that improved with an analgesic (dipyrone), and none showed increased IOP. Patient 3 presented with retinal hemorrhage in the posterior pole in the immediate postoperative period, probably due to the significant reduction of the large preoperative staphyloma after MB implantation. Management was expectant; the hemorrhage was completely absorbed and the subretinal fluid was progressively reabsorbed, leading to repositioning of the macula over the following months, although a stable vitreomacular adhesion can still be seen. In patient 1, an attached retina and grade 2 closure of the macular hole (according to Kang et al.’s classification) were observed during follow-up [ 31 ]. Patient 2 also progressed with an attached retina and macular hole closure, with silicone oil in place. None of the operated patients reported diplopia or limitations in ocular mobility.

All three patients (100%) showed visual acuity improvement after surgery, with the retina remaining attached and vision stable for more than a year of follow-up. None of the patients experienced complications such as conjunctival erosion, displacement/rotation of the MB, endophthalmitis, or anterior chamber reactions throughout the follow-up period.

The use of MB surgery significantly decreased in the 1980s with the advancement of vitrectomy, primarily because of technical difficulties and the lack of related scientific studies at that time [ 32 , 33 ]. Nonetheless, in highly myopic eyes with posterior staphyloma, PPV can result in surgical failures in 26.7 to 50% of cases due to the inability to alter the axial length of the eye and reduce the anteroposterior forces exerted by the staphyloma [ 34 ]. The use of MB in these circumstances can reduce the anteroposterior force, providing positive results. This evidence, combined with the relevant study by Sasoh et al., which demonstrated good results and safety of MB use in the early 2000s, encouraged the resumption of studies and the development of the MB technique [ 35 ].

In 2001, Ripandelli et al. [ 36 ] compared highly myopic patients with retinal detachment and macular holes undergoing pars plana vitrectomy (group A) and MB surgery (group B). They observed a surgical success rate of 73.3% in group A and 93.3% in group B, with group B also showing a significant improvement in vision, unlike the vitrectomy group. These results suggested anatomical and functional superiority when MB was used. Similarly, Ando et al., in 2007, reported anatomical success in the MB group in 93.3% of cases after the first surgery and 100% after the second procedure, while only 50% of the cases treated with vitrectomy achieved retinal reattachment in the first procedure, and 86% in the second approach, which was associated with MB [ 37 ].

In a literature review, Alkabes and Mateo [ 32 ] showed that after MB surgery, the retinal reattachment rate ranged from 81.8 to 100%, while the MH closure rate ranged from 40 to 93.3%. Although persistent MH was identified as a risk factor for retinal re-detachment, eyes with persistent MH that underwent MB did not experience retinal re-detachment. Furthermore, the literature indicates that patients with AXLs greater than 30 mm have a higher risk of early retinal re-detachment after PPV. Several studies have shown statistically significant higher rates of retinal re-detachment after PPV for treating RD associated with MH in patients with AXL > 30 mm [ 38 , 39 , 40 ]. For these patients, when undergoing the MB procedure, the retina was reattached in 100% of cases and the MH closure rate ranged from 40 to 100%. Notably, no re-detachment was observed in cases of persistent MH [ 32 ]. In our two cases involving RD and MH that underwent combined surgery, both achieved successful outcomes with retinal reattachment and macular hole closure, with no retinal re-detachment observed.

In general, both PPV and MB procedures have been shown to be effective in improving retinal anatomy and visual acuity. However, PPV, particularly when combined with internal limiting membrane (ILM) peeling, is associated with a higher incidence of postoperative MH. Due to the lack of randomized studies, it is challenging to determine whether MB or PPV is superior for treating progressive macular foveoschisis. Given its progressive nature and potential for RD with MH, surgical intervention should be considered if the schisis progresses or visual acuity decreases. Regular OCT monitoring and early interventions based on physician experience are recommended [ 32 , 41 , 42 ].

Regarding complications, patient 3 experienced retinal hemorrhage following MB surgery, which resolved spontaneously within one month. This patient had a deep staphyloma of the posterior pole, and after MB, the AXL was significantly reduced by 7.9 mm. Although a paracentesis was performed at the beginning of the procedure, no hypotony was observed. We attributed the retinal hemorrhage to the pronounced reduction in AXL. Mateo and colleagues previously described cases where excessive compression of the choroidal vessels could lead to increased local hydrostatic pressure and changes in the RPE, resulting in subretinal fluid and, in some cases, macular atrophy [ 32 , 43 ]. However, we did not observe any of these complications in patient 3 or the other patients.

Other potential complications reported in various case series include scleral perforation, orbital fat prolapse, improper positioning of the explant, and ocular muscle disinsertion during buckle placement [ 32 ]. During the mean follow-up period of eighteen months, no issues such as intraocular pressure changes, strabismus, eye movement restriction, explant displacement, choroidal effusion, choroidal detachment, or posterior pole atrophy were observed.

As demonstrated by Parolini et al., the management of MTM can range from using MB alone to performing combined surgeries. When full-thickness macular holes and macular or retinal detachment are present, a combination of PPV and MB is recommended, as each surgical method targets different force vectors affecting MTM [ 29 , 30 ].

Despite the positive outcomes demonstrated in this report and the literature, MB can present complications. It is essential to carefully evaluate the risk-benefit ratio and reserve its use for cases where it is truly necessary, based on an appropriate classification system. Therefore, we recommend considering MB + PPV surgery as the first choice for highly myopic patients with macular RD associated with MH, given the high rates of retinal re-detachment. In our small case series reported herein, success was achieved with combined surgery in two of our cases and MB alone in one case, proving to be effective in improving anatomical and functional outcomes without the need for additional interventions. None of the patients experienced re-RD with combined surgery or MB alone, which is consistent with the literature.

Finally, it is important to emphasize that the contralateral eye of all three patients continues to be followed up with OCT and fundoscopy. Macular buckling should be considered if any anatomical or visual deterioration occurs, depending on the classification of tractional maculopathy.

In our limited experience, MB has proven effective in managing MTM, whether alone or in conjunction with PPV. Its indication should consider the pathophysiological mechanism of MTM, which is influenced by tangential and anteroposterior forces, so PPV often needs to be combined with MB. Decision-making should be based on the patient’s evolution regarding symptoms of decreased vision, anatomical findings on fundoscopy and ocular ultrasound, and the OCT classification. The postoperative results reported here and in the literature show good anatomical and functional outcomes without recurrence of retinal detachment, indicating that the macular buckle can contribute to better results in eyes with very long axial lengths.

Data availability

No datasets were generated or analysed during the current study.

Abbreviations

Axial length

Counting fingers

Intraocular pressure

Macular buckle

Macular hole

MTM staging system

Myopic tractional maculopathy

Pars plana vitrectomy

Retinal detachment

Silicone oil

Phacoemulsification

Ultrasonography

Visual acuity

References

1. Flitcroft DI, He M, Jonas JB, Jong M, Naidoo K, Ohno-Matsui K, et al. IMI – defining and classifying myopia: a proposed set of standards for clinical and epidemiologic studies. Invest Ophthalmol Vis Sci. 2019;60(3):M20–30.

2. Morais FB. Jules Gonin and the Nobel Prize: pioneer of retinal detachment surgery who almost received a Nobel Prize in medicine. Int J Retina Vitreous. 2018;4.

3. Schepens CL. Progress in detachment surgery. Trans Am Acad Ophthalmol Otolaryngol. 1951;55.

4. Schepens CL. Clinical aspects of pathologic changes in the vitreous body. Am J Ophthalmol. 1954;38(1 Part 2).

5. Custodis E. Die Behandlung der Netzhautablösung durch umschriebene Diathermiekoagulation und einer mittels Plombenaufnähung erzeugten Eindellung der Sklera im Bereich des Risses. Klin Monbl Augenheilkd Augenarztl Fortbild. 1956;129(4).

6. Schepens CL, Okamura ID, Brockhurst RJ. The scleral buckling procedures. Surgical techniques and management. AMA Arch Ophthalmol. 1957;58(6):797–811. http://archopht.jamanetwork.com/

7. Rosengren B. The silver plomb method in macular holes. Trans Ophthalmol Soc U K. 1966;86:49–53.

8. Theodossiadis GP. [A simplified technique for the surgical treatment of retinal detachments resulting from macula holes (author’s transl)]. Klin Monbl Augenheilkd. 1973;162(6):719–28.

9. Siam A. Macular hole with central retinal detachment in high myopia with posterior staphyloma. Br J Ophthalmol. 1969;53(1):62–3.

10. Klöti R. Silver clip for central retinal detachments with macular hole. Mod Probl Ophthalmol. 1974;12(0):330–6.

11. Feman SS, Hepler RS, Straatsma BR. Rhegmatogenous retinal detachment due to macular hole. Management with cryotherapy and a Y-shaped sling. Arch Ophthalmol. 1974;91(5):371–2.

12. Landolfo V, Albini L, Romano A. Macular hole-induced retinal detachment: treatment with an armed-silicone implant. Ophthalmic Surg. 1986;17(12):810–2.

13. Ando F. Use of a special macular explant in surgery for retinal detachment with macular hole. Jpn J Ophthalmol. 1980;24:29–34.

14. Susvar P, Sood G. Current concepts of macular buckle in myopic traction maculopathy. Indian J Ophthalmol. 2018;66:1772–84.

15. Stirpe M, Ripandelli G, Rossi T, Cacciamani A, Orciuolo M. A new adjustable macular buckle designed for highly myopic eyes. Retina. 2012;32(7):1424–7. https://doi.org/10.1097/IAE.0b013e3182550648

16. Mateo C, Marco, Medeiros D, Alkabes M, Burés-Jelstrup A, Postorino M, et al. Illuminated Ando plombe for optimal positioning in highly myopic eyes with vitreoretinal diseases secondary to posterior staphyloma. http://archopht.jamanetwork.com/

17. Mortada HA. A novel episcleral macular buckling: wire-strengthened sponge exoplant for recurrent macular hole and retinal detachment in high myopic eyes. Discovery & Innovation Ophthalmology Journal. 2013;2.

18. Parolini B, Frisina R, Pinackatt S, Mete M. A new L-shaped design of macular buckle to support a posterior staphyloma in high myopia. Retina. 2013;33(7):1466–70. https://doi.org/10.1097/IAE.0b013e31828e69ea

19. Parolini B, Frisina R, Pinackatt S, Gasparotti R, Gatti E, Baldi A, et al. Indications and results of a new L-shaped macular buckle to support a posterior staphyloma in high myopia. Retina. 2015.

20. Ahmed J, Shaikh F, Rizwan A, Memon MF, Ahmad J. Evaluation of vitreo-retinal pathologies using B-scan ultrasound. Pak J Ophthalmol. 2009;25(4).

21. Gulkilik G, Ustuner A, Ozdamar A. Comparison of optical coherence biometry and applanation ultrasound biometry in high-myopic eyes with posterior pole staphyloma. Ann Ophthalmol. 2007;39(3).

22. Palte HD. Ophthalmic regional blocks: management, challenges, and solutions. Local Reg Anesth. 2015;8.

23. Qureshi MA, Laghari K. Role of B-scan ultrasonography in pre-operative cataract patients. Int J Health Sci (Qassim). 2010;4(1).

24. Shinar Z, Chan L, Orlinsky M. Use of ocular ultrasound for the evaluation of retinal detachment. J Emerg Med. 2011;40(1).

25. Alanazi R, Schellini S, AlSheikh O, Elkhamary S. Scleral buckle induce orbital cellulitis and scleritis – a case report and literature review. Saudi J Ophthalmol. 2019;33(4).

26. Huang D, Swanson EA, Lin CP, Schuman JS, Stinson WG, Chang W, et al. Optical coherence tomography. Science. 1991;254(5035).

27. Panozzo G, Mercanti A. Optical coherence tomography findings in myopic traction maculopathy. Arch Ophthalmol. 2004;122.

28. Parolini B, Palmieri M, Finzi A, Besozzi G, Lucente A, Nava U, et al. The new myopic traction maculopathy staging system. Eur J Ophthalmol. 2021;31(3).

29. Parolini B, Arevalo JF, Hassan T, Kaiser P, Rezaei KA, Singh R, et al. International validation of myopic traction maculopathy staging system. Ophthalmic Surg Lasers Imaging Retina. 2023;54(3).

30. Parolini B, Palmieri M, Finzi A, Frisina R. Proposal for the management of myopic traction maculopathy based on the new MTM staging system. Eur J Ophthalmol. 2021;31(6).

31. Kang SW, Ahn K, Ham DI. Types of macular hole closure and their clinical implications. Br J Ophthalmol. 2003;87(8).

32. Alkabes M, Mateo C. Macular buckle technique in myopic traction maculopathy: a 16-year review of the literature and a comparison with vitreous surgery. Graefe’s Arch Clin Exp Ophthalmol. 2018;256:863–77.

33. Gonvers M, Machemer R. A new approach to treating retinal detachment with macular hole. Am J Ophthalmol. 1982;94(4):468–72.

34. Ikuno Y, Sayanagi K, Oshima T, Gomi F, Kusaka S, Kamei M, et al. Optical coherence tomographic findings of macular holes and retinal detachment after vitrectomy in highly myopic eyes. Am J Ophthalmol. 2003;136(3):477–81.

35. Sasoh M, Yoshida S, Ito Y, Matsui K, Osawa S, Uji Y. Macular buckling for retinal detachment due to macular hole in highly myopic eyes with posterior staphyloma. Retina. 2000;20(5):445–9.

36. Ripandelli G, Coppé AM, Fedeli R, Parisi V, D’Amico DJ, Stirpe M. Evaluation of primary surgical procedures for retinal detachment with macular hole in highly myopic eyes: a randomized comparison of vitrectomy versus posterior episcleral buckling surgery. Ophthalmology. 2001;108(12):2258–64.

37. Ando F, Ohba N, Touura K, Hirose H. Anatomical and visual outcomes after episcleral macular buckling compared with those after pars plana vitrectomy for retinal detachment caused by macular hole in highly myopic eyes. Retina. 2007;27(1):37–44.

38. Suda K, Hangai M, Yoshimura N. Axial length and outcomes of macular hole surgery assessed by spectral-domain optical coherence tomography. Am J Ophthalmol. 2011;151(1).

39. Nadal J, Verdaguer P, Canut MI. Treatment of retinal detachment secondary to macular hole in high myopia: vitrectomy with dissection of the inner limiting membrane to the edge of the staphyloma and long-term tamponade. Retina. 2012;32(8).

40. Arias L, Caminal JM, Rubio MJ, Cobos E, Garcia-Bru P, Filloy A, et al. Autofluorescence and axial length as prognostic factors for outcomes of macular hole retinal detachment surgery in high myopia. Retina. 2015;35(3).

41. Jo Y, Ikuno Y, Nishida K. Retinoschisis: a predictive factor in vitrectomy for macular holes without retinal detachment in highly myopic eyes. Br J Ophthalmol. 2012;96(2).

42. Sun CB, Liu Z, Xue AQ, Yao K. Natural evolution from macular retinoschisis to full-thickness macular hole in highly myopic eyes. Eye. 2010;24(12).

43. Mateo C, Burés-Jelstrup A. Macular buckling with Ando plombe may increase choroidal thickness and mimic serous retinal detachment seen in the tilted disk syndrome. Retin Cases Brief Rep. 2016 Fall;10(4):327–30.

Acknowledgements

We would like to express our sincere gratitude to Dr. Barbara Parolini for her invaluable contributions to the field of macular buckling surgery. Her pioneering work in describing the staging system for myopic tractional maculopathy and the surgical techniques for macular buckling has been instrumental in the execution and development of our study.

The authors received no financial support for the research, authorship, and/or publication of this article.

Author information

Authors and affiliations.

Department of Ophthalmology, Ribeirão Preto Medical School, University of São Paulo, 3900, Bandeirantes Ave, Ribeirão Preto, SP, 14049-900, Brazil

Francyne Veiga Reis Cyrino, Moisés Moura de Lucena, Letícia de Oliveira Audi, José Afonso Ribeiro Ramos Filho, João Pedro Romero Braga, Thais Marino de Azeredo Bastos, Igor Neves Coelho & Rodrigo Jorge

Contributions

F.C., J.R.F., and R.J. were primarily responsible for the research design. F.C., J.R.F., J.B., T.B., and I.C. were responsible for data acquisition. M.L., L.A., and I.C. performed the data analysis and drafted the initial manuscript. F.C., J.R.F. and R.J. provided critical revisions and contributed to the refinement of the manuscript. All authors reviewed and approved the final version of the manuscript.

Corresponding author

Correspondence to Francyne Veiga Reis Cyrino.

Ethics declarations

Ethics approval and consent to participate.

The institutional review board and ethics committee of the Division of Ophthalmology, Ribeirão Preto Medical School, University of São Paulo, Ribeirão Preto, Brazil, approved this study (CAAE: 79706624.6.0000.5440).

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article.

Cyrino, F.V.R., de Lucena, M.M., de Oliveira Audi, L. et al. Historical and practical aspects of macular buckle surgery in the treatment of myopic tractional maculopathy: case series and literature review. Int J Retin Vitr 10, 60 (2024). https://doi.org/10.1186/s40942-024-00578-w

Received: 04 June 2024

Accepted: 20 August 2024

Published: 28 August 2024

DOI: https://doi.org/10.1186/s40942-024-00578-w

International Journal of Retina and Vitreous

ISSN: 2056-9920

time series analysis literature review

COMMENTS

  1. A LITERATURE REVIEW ON TIME SERIES FORECASTING METHODS

    Abstract: The purpose of this study is to review time series forecasting methods and briefly explain the working of time series forecasting methods. We ...

  2. Applications of time series analysis in epidemiology: Literature review

    Applications of time series analysis in epidemiology: Literature review and our experience during COVID-19 pandemic ... Time series analysis is a valuable tool in epidemiology that complements the classical epidemiological models in two different ways: Prediction and forecast. ... Viboud C, Ajelli M, Leung DT, Yu H. Serological evidence of ...

  3. Time series analysis with explanatory variables: A systematic

    Abstract. Time series analysis with explanatory variables encompasses methods to model and predict correlated data taking into account additional information, known as exogenous variables. A thorough search in literature returned a dearth of systematic literature reviews (SLR) on time series models with explanatory variables.

  4. A Comparative Analysis of Time Series Prediction Techniques a

    Time series analysis plays a crucial role in understanding and predicting the behavior of data that evolves over time. It finds applications in various domains, including finance, economics, weather forecasting, stock market analysis, and many others. ... The aim of this paper is to conduct a systematic literature review on time series ...

  5. Applications of time series analysis in epidemiology: Literature review

    Time series analysis is a valuable tool in epidemiology that complements the classical epidemiological models in two different ways: Prediction and forecast. ... resources and literature review ...

  6. Applications of time series analysis in epidemiology: Literature review

    Time series analysis is a valuable tool in epidemiology that complements the classical epidemiological models in two different ways: Prediction and forecast. ... Applications of time series analysis in epidemiology: Literature review and our experience during COVID-19 pandemic World J Clin Cases. 2023 Oct 16;11 ...

  7. A Systematic Review of Time Series Classification Techniques Used in

    The motivation for this review came from the observation that the types of algorithms explored and the depth of analysis performed in time series biomedical data science have not been well described. ... The literature search was limited to the last six years for a manageable scope of review. ... Flow chart of the common steps in time series ...

  8. A Systematic Review of Methodology: Time Series Regression Analysis for

    The literature review was conducted following the guidelines of Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) . Time series regression model. ... While time series analysis with GLMs or GAMs is the established method in environmental epidemiology research, our review brings attention to several potential issues ...

  9. Time series analysis with explanatory variables: A systematic

    A time series analysis literature review by Macaira et al. (2018) shows that regression model is the method with the highest number of applications, followed by artificial neural networks (ANNs ...

  10. Interval time series forecasting: A systematic literature review

    To achieve this goal, we have conducted a systematic literature review, comprising search strategy planning, screening mechanism determination, document analysis, and report generation. During the search strategy planning stage, eight literature search libraries are selected to obtain the most extensive studies (total of 525 targets).

  11. PDF Literature review of modern time series forecasting methods

    Literature review of modern time series forecasting methods (This document covers the stochastic linear model approaches) By Paul Karapanagiotidis July 31, 2012 ... Nerlove and Grether (1970)), the 1970s were to become dominated by the time-domain analysis techniques advocated by Box and Jenkins (1970). There are various reasons for this.

  12. A Comparative Analysis of Time Series Prediction Techniques a

    This paper highlights the significance of systematic literature reviews and explores the different techniques employed in these reviews, including statistical methods, machine learning, deep learning, and hybrid methods. ... Evaluation of multivariate transductive neuro-fuzzy inference system for multivariate time-series analysis and modelling ...

  13. Time Series Analysis and Modeling to Forecast: a Survey

    View a PDF of the paper titled Time Series Analysis and Modeling to Forecast: a Survey, by Fatoumata Dama and 1 other authors. Time series modeling for predictive purpose has been an active research area of machine learning for many years. However, no sufficiently comprehensive and meanwhile substantive survey was offered so far.

  14. Time series analysis with explanatory variables: A systematic

    Time series analysis with explanatory variables encompasses methods to model and predict correlated data taking into account additional information, known as exogenous variables. A thorough search in literature returned a dearth of systematic literature reviews (SLR) on time series models with explanatory variables. The main objective is to fill this gap by applying a rigorous and reproducible ...

  15. PDF A Literature Survey of Time Series Forecasting Approaches

    A LITERATURE SURVEY OF TIME SERIES FORECASTING APPROACHES Smitkumar Arvindbhai Patel, 2023 Abstract: This literature review offers a comprehensive analysis of time series forecasting techniques. It explores traditional methods such as autoregressive integrated moving average (ARIMA) and exponential smoothing, focusing on their strengths and ...

  16. Time Series Data Analysis for Forecasting

    A literature review of the use of DM and statistical approaches with time series data, focusing on weather prediction, is presented, attracting a great deal of attention from researchers in the field. In today's world there is ample opportunity to clout the numerous sources of time series data available for decision making. This time ordered data can be used to improve decision making if the ...

  17. Time Series Data Analysis for Forecasting

    An analysis of history—a time series—can be used by management to make current decisions and plans based on long-term forecasting. One usually assumes that past patterns will continue into the future. Long-term forecasts extend more than 1 year into the future; 5-, 10-, 15-, and 20-year projections are common.

  18. A Systematic Review of Packages for Time Series Analysis

    This paper presents a systematic review of Python packages with a focus on time series analysis. The objective is to provide (1) an overview of the different time series analysis tasks and preprocessing methods implemented, and (2) an overview of the development characteristics of the packages (e.g., documentation, dependencies, and community size). This review is based on a search of ...

  19. Interval time series forecasting: A systematic literature review

    The purpose of this research is to identify the most widely used definition of interval time series; classify existing research into mature research, current research focus, and research gaps ...

  20. Generative Adversarial Networks in Time Series: A Systematic Literature

    We provide a review of current state-of-the-art and novel time series GANs and their solutions to real-world problems with time series data. GANs have been gaining a lot of traction within the deep learning research community since their inception in 2014 [ 38 ].

  21. A systematic review of Python packages for time series analysis

    The objective is to provide (1) an overview of the different time series analysis tasks and preprocessing methods implemented, and (2) an overview of the development characteristics of the packages (e.g., documentation, dependencies, and community size). This review is based on a search of literature databases as well as GitHub repos.

  22. Time series big data: a survey on data stream frameworks, analysis and

    This article presents a literature review on how to process huge amounts of time series that are continuously being produced over time and need to be processed in real-time. Therefore, in Table 1 , we consider papers regarding big data, stream processing, real-time processing, machine learning and deep learning, forecasting, and anomaly detection.

  23. Advancements in Deep Learning Techniques for Time Series Forecasting in

    This paper reviews deep learning applications in time series analysis within the maritime industry, focusing on three areas: ship operation-related, port operation-related, and shipping market-related topics. ... M. Systematic Literature Review of Various Neural Network Techniques for Sea Surface Temperature Prediction Using Remote Sensing Data ...

  24. Historical and practical aspects of macular buckle surgery in the

    In a literature review, Alkabes and Mateo showed that after MB surgery, the retinal reattachment rate ranged from 81.8 to 100%, while the MH closure rate ranged from 40 to 93.3%. Although persistent MH was identified as a risk factor for retinal re-detachment, eyes with persistent MH that underwent MB did not experience retinal re-detachment.

  25. Big data and time series: A literature review paper

    This paper has a goal to go through literature that refers to big data, time series and different big data analytics methods using data mining. Keywords: big data, time series, data mining ...