Incomplete time-series data forecasting based on clustering fill-in technique and ensembling neural network model

Sirapat Chiewchanwattana

Please use this identifier to cite or link to this item: https://cuir.car.chula.ac.th/handle/123456789/67494

Title:	Incomplete time-series data forecasting based on clustering fill-in technique and ensembling neural network model
Other Titles:	การพยากรณ์ข้อมูลอนุกรมเวลาที่ไม่สมบูรณ์โดยใช้วิธีเติมเต็มแบบจัดกลุ่มข้อมูลให้สมบูรณ์ และวิธีประสานผลของตัวแบบโครงข่ายประสาท
Authors:	Sirapat Chiewchanwattana
Advisors:	Chidchanok Lursinsap
Other author:	Chulalongkorn University. Faculty of Science
Advisor's Email:	[email protected]
Subjects:	Time-series analysis; Evolutionary computation; Neural networks (Computer science); การวิเคราะห์อนุกรมเวลา; การคำนวณเชิงวิวัฒนาการ; นิวรัลเน็ตเวิร์ค (วิทยาการคอมพิวเตอร์)
Issue Date:	2005
Publisher:	Chulalongkorn University
Abstract:	This dissertation addresses the problem of forecasting from incomplete time-series data by modeling several natural and social phenomena. The modeling consists of two main steps. The first step estimates the missing values in the collected incomplete data. The second step predicts new data from the completed series produced by the first step. Our solution is to develop new neural network models that forecast incomplete time-series data and improve prediction accuracy. Two neural network models are proposed. In the first, several versions of an EM-based algorithm and smoothing spline interpolation are used to preprocess the incomplete data sets. The individual networks are multilayer perceptrons (MLP) trained in a supervised manner with extended Kalman filtering, and an ensemble construction combines the individual networks. We name this type of network the Fill-in - Generalized Ensemble Method (FI-GEM) network. In the second, each individual network uses a Finite Impulse Response model to perform the prediction, and the outputs of all individual networks are combined by the genetic algorithm-based selective neural network ensemble method (GASEN). We denote this network the reconstructed missing data-finite impulse response selective ensemble (RMD-FSE) network. In addition, we propose a new fill-in technique that improves the estimation of missing values by using clustering to characterize the patterns of incomplete time-series data. The main idea is to divide the time series into separate subsequences of different sizes, so that each subsequence can be viewed as a window. Missing samples are imputed by finding a complete subsequence similar to the subsequence that contains the missing samples and imputing the missing samples from that complete subsequence. The imputation accuracy of the proposed algorithm, the varied window clustering (WDC) algorithm, is comparable to or better than that of traditional methods such as spline interpolation, multiple imputation (MI), and the optimal completion strategy fuzzy c-means algorithm (OCSFCM) on non-stationary time-series data.
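The window-matching idea in the abstract can be illustrated with a minimal Python sketch. It is not the WDC algorithm itself: it assumes a single fixed window size and a plain nearest-neighbour search in place of clustering over varied window sizes, and the function name `impute_by_similar_window` and its `window` parameter are hypothetical names chosen for illustration.

```python
def impute_by_similar_window(series, window=4):
    """Fill None entries by copying from the most similar complete window.

    Simplifying assumptions (not from the thesis): one fixed window size
    and a nearest-neighbour search instead of clustering.
    """
    n = len(series)
    if window > n:
        raise ValueError("window must not exceed the series length")

    # Every fully observed window of the chosen length is a candidate donor.
    donors = [series[i:i + window] for i in range(n - window + 1)
              if all(v is not None for v in series[i:i + window])]

    filled = list(series)
    for t, value in enumerate(series):
        if value is not None or not donors:
            continue
        # Place a window around the gap, clipped to the series bounds.
        start = min(max(t - window // 2, 0), n - window)
        target = series[start:start + window]
        observed = [k for k, v in enumerate(target) if v is not None]

        # Squared distance measured only on positions the target observed.
        def distance(donor):
            return sum((donor[k] - target[k]) ** 2 for k in observed)

        best = min(donors, key=distance)
        # Copy the donor's value at the position of the missing sample.
        filled[t] = best[t - start]
    return filled


# A periodic toy series with one missing sample at index 10.
series = [1, 2, 3, 4, 1, 2, 3, 4, 1, 2, None, 4, 1, 2, 3, 4]
print(impute_by_similar_window(series)[10])  # 3
```

Measuring the distance only on observed positions lets an incomplete window be compared with complete ones directly. A fixed window is the simplest choice here; the abstract describes subsequences of different sizes, which this sketch does not attempt.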
Other Abstract:	This thesis presents the forecasting of incomplete time-series data using neural network modeling, which can be divided into two steps. The first step fills in the incomplete time-series data to make it complete. The second step forecasts the time-series data obtained from the first step. The solution in this work is to develop new neural network models for forecasting incomplete time-series data that also give improved forecasting accuracy. Two neural network models are presented. The first uses several variants of EM fill-in and spline fill-in. The multiple data sets filled in by these methods are used to train MLP neural networks with extended Kalman filtering, and the outputs of all the networks are then combined. This network model is named the FI-GEM network. The second switches to FIR neural networks for forecasting, and the outputs of all the networks are then combined using a genetic algorithm-based network selection method. This network model is named the RMD-FSE network. In addition, a new fill-in method is presented to improve the accuracy of the estimated missing values. It uses a clustering technique based on the characteristics of the patterns in the actual data. The main idea is to cut the time series into many pieces of different sizes. Missing values are computed from the piece most similar to the piece that contains the missing values. The method is named the WDC algorithm, and it gives results equal to or better than other methods such as EM, MI, OCSFCM, and spline in the case of non-stationary time-series data.
Description:	Thesis (Ph.D.)--Chulalongkorn University, 2005
Degree Name:	Doctor of Philosophy
Degree Level:	Doctoral Degree
Degree Discipline:	Computer Science
URI:	http://cuir.car.chula.ac.th/handle/123456789/67494
ISBN:	9741767501
Type:	Thesis
Appears in Collections:	Sci - Theses

Files in This Item:

File	Description	Size	Format
Sirapat_ch_front_p.pdf	Cover and abstract	1.08 MB	Adobe PDF
Sirapat_ch_ch1_p.pdf	Chapter 1	817.41 kB	Adobe PDF
Sirapat_ch_ch2_p.pdf	Chapter 2	759.73 kB	Adobe PDF
Sirapat_ch_ch3_p.pdf	Chapter 3	1.5 MB	Adobe PDF
Sirapat_ch_ch4_p.pdf	Chapter 4	4.33 MB	Adobe PDF
Sirapat_ch_ch5_p.pdf	Chapter 5	671.03 kB	Adobe PDF
Sirapat_ch_back_p.pdf	Bibliography and appendices	975.99 kB	Adobe PDF
