A. Lazcano, P. Hidalgo, J. E. Sandubete Galán
Deep learning techniques have significantly advanced time series prediction by effectively modeling temporal dependencies, particularly for datasets with many observations. The deep learning models used in this study are the multi-layer perceptron (MLP), the long short-term memory network (LSTM) and the Transformer. In this paper we consider four time series: (i) prices from the Nominated Electricity Market Operator (NEMO) managing the daily and intraday electricity markets, (ii) the West Texas Intermediate (WTI) crude oil price index, (iii) the Apple stock price, and (iv) the gold price via the XAU/USD pair. Although larger datasets are generally associated with improved accuracy, the results of this study show that this assumption does not always hold. By progressively increasing the amount of training data in a controlled experimental setup, the best predictive metrics were achieved at intermediate iterations, with variations of up to 66% in root mean square error (RMSE) and 44% in mean absolute percentage error (MAPE) across models and datasets. These findings challenge the notion that more data necessarily leads to better generalization, showing that additional observations can yield diminishing returns or even degrade predictive performance. The results emphasize the importance of strategically balancing dataset size and model optimization to achieve robust and efficient performance. Such insights offer valuable guidance for time series forecasting, especially in contexts where computational efficiency and predictive accuracy must both be optimized.
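The controlled setup described above can be sketched as an experiment in which the training window grows by a fixed number of observations per iteration while the hold-out test set stays fixed, and RMSE and MAPE are recorded at each step. The sketch below is a minimal, hypothetical illustration of that protocol: a simple linear-trend extrapolation stands in for the MLP/LSTM/Transformer models of the study, and the synthetic price series, window sizes and function names are assumptions, not the paper's actual data or code.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean square error."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent."""
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100)

def growing_train_experiment(series, initial_size, step, test_size):
    """Evaluate forecasts on a fixed hold-out set while the training
    window grows by `step` observations per iteration.

    A linear-trend extrapolation is a placeholder for the neural
    models (MLP/LSTM/Transformer) used in the study.
    Returns a list of (train_size, RMSE, MAPE) tuples.
    """
    train_end = len(series) - test_size
    test = series[train_end:]
    t_test = np.arange(train_end, train_end + test_size)
    results = []
    for n in range(initial_size, train_end + 1, step):
        # Most recent n observations form the training window.
        t_train = np.arange(train_end - n, train_end)
        train = series[train_end - n:train_end]
        # Fit a degree-1 trend and extrapolate over the test horizon.
        slope, intercept = np.polyfit(t_train, train, 1)
        pred = slope * t_test + intercept
        results.append((n, rmse(test, pred), mape(test, pred)))
    return results

# Synthetic price-like series (random walk around 100) for illustration.
rng = np.random.default_rng(0)
prices = 100 + np.cumsum(rng.normal(0, 1, 1000))
for n, r, m in growing_train_experiment(prices, 200, 200, 50):
    print(f"train={n:4d}  RMSE={r:7.3f}  MAPE={m:6.2f}%")
```

Inspecting the printed table shows how the error metrics move as the training window grows; under the paper's finding, the minimum need not occur at the largest window.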
Keywords: Time series forecasting, Preprocessing techniques, Multi-layer perceptron neural network, Long short-term memory neural network, Transformer neural network.
Scheduled
Time Series
June 13, 2025, 11:00
Sala VIP Jaume Morera i Galícia