การทำดัชนีที่แม่นยำสำหรับการค้นข้อมูลอนุกรมเวลาตามความคล้ายภายใต้ไทม์วอร์ปปิงด้วยการเข้าถึงข้อมูลแบบลำดับโดยใช้ดัชนี

พงศกร เรืองรองหิรัญญา

Please use this identifier to cite or link to this item: https://cuir.car.chula.ac.th/handle/123456789/31241

Title:	การทำดัชนีที่แม่นยำสำหรับการค้นข้อมูลอนุกรมเวลาตามความคล้ายภายใต้ไทม์วอร์ปปิงด้วยการเข้าถึงข้อมูลแบบลำดับโดยใช้ดัชนี
Other Titles:	Exact indexing for similarity search on time series data under time warping using indexed sequential access
Authors:	พงศกร เรืองรองหิรัญญา
Advisors:	โชติรัตน์ รัตนามหัทธนะ
Other author:	จุฬาลงกรณ์มหาวิทยาลัย. คณะวิศวกรรมศาสตร์
Advisor's Email:	[email protected]
Subjects:	การวิเคราะห์อนุกรมเวลา การทำดัชนี
Issue Date:	2551
Publisher:	จุฬาลงกรณ์มหาวิทยาลัย
Abstract:	การค้นคืนข้อมูลอนุกรมเวลาตามความคล้ายเป็นสิ่งที่สำคัญสำหรับงานประยุกต์มากมาย เช่น การค้นคืนข้อมูลมัลติมีเดียและการจำแนกข้อมูลเหล่านั้น สำหรับงานประยุกต์ดังกล่าวนั้นไดนามิกไทม์วอร์ปปิงจัดเป็นมาตรวัดที่ใช้สำหรับการค้นคืนข้อมูลอนุกรมเวลาที่มีความแม่นยำมากที่สุดวิธีหนึ่ง อย่างไรก็ตามการใช้ไดนามิกไทม์วอร์ปปิงสำหรับการค้นคืนข้อมูลนั้นต้องใช้เวลาในการคำนวณสูง ซึ่งปัญหาดังกล่าวเป็นประเด็นที่มีนักวิจัยมากมายสนใจที่จะพัฒนาให้มีความรวดเร็วมากยิ่งขึ้น จนเมื่อไม่นานมานี้ได้มีงานวิจัยที่ได้เสนอวิธีการแก้ปัญหาดังกล่าวด้วยฟังก์ชันขอบเขตล่างซึ่งสามารถลดทอนการคำนวณไดนามิกไทม์วอร์ปปิงและสามารถเพิ่มความเร็วในการค้นคืนข้อมูลได้อย่างมีประสิทธิภาพ แต่สำหรับในงานด้านฐานข้อมูล ประเด็นสำคัญนั้นไม่ได้อยู่เพียงแค่การลดเวลาในการคำนวณสำหรับการค้นคืนข้อมูลเท่านั้น แต่อยู่ที่จะทำอย่างไรเพื่อที่จะลดการเข้าถึงข้อมูลหรืออินพุต / เอาต์พุตใหได้มากที่สุด ซึ่งวิธีที่ใช้กันทั่วไปก็คือวิธีการทำดัชนี แต่จนถึงปัจจุบันยังไม่มีวิธีการทำดัชนีข้อมูลอนุกรมเวลาที่มีประสิทธิภาพเนื่องจากต้องเผชิญกับปัญหาสำคัญอยู่สองประการ ประการแรกเกิดจากการที่ข้อมูลอนุกรมเวลานั้นมีจำนวนมิติที่สูงมาก และในประการที่สองเนื่องจากการคำนวณไดนามิกไทม์วอร์ปปิงนั้นจุดข้อมูลหนึ่งสามารถมีความสัมพันธ์กับจุดข้อมูลที่อยู่คนละมิติได้ ด้วยเหตุนี้วิธีการทำดัชนีข้อมูลอนุกรมเวลาที่มีอยู่ในปัจจุบันยังคงไม่สามารถลดทอนการเข้าถึงข้อมูลให้ได้มากเพียงพอกับเวลาที่ต้องเสียเพิ่มเติมจากการเข้าถึงข้อมูลแบบสุ่ม กล่าวคือการกราดตรวจข้อมูลตามลำดับนั้นสามารถค้นคืนข้อมูลได้เร็วกว่าการทำดัชนีทุกวิธีที่มีอยู่ในปัจจุบัน ดังนั้นงานวิจัยนี้จึงนำเสนอวิธีการทำดัชนีข้อมูลอนุกรมเวลาที่มีประสิทธิภาพด้วยการนำวิธีการจับกลุ่มข้อมูลมาประยุกต์ใช้ โดยใช้แนวคิดในการจัดลำดับการเข้าถึงข้อมูลในแต่ละกลุ่มข้อมูลโดยเรียงตามค่าระยะทางขอบเขตล่างที่ได้นำเสนอ ซึ่งเป็นผลทำให้การค้นคืนข้อมูลสามารถเข้าถึงข้อมูลที่มีความคล้ายได้ในช่วงต้นของกระบวนการค้น นอกจากนี้ยังสามารถลดทอนการเข้ากลุ่มข้อมูลหลายกลุ่มได้ด้วยค่าระยะทางขอบเขตล่างดังกล่าว ในการทดลองนั้นวิธีการทำดัชนีที่ได้นำเสนอสามารถค้นคืนข้อมูลได้เร็วกว่าวิธีกราดตรวจ โดยลดการเข้าถึงข้อมูลได้หลายสิบเท่า
Other Abstract:	As time series has become one of the most prevalent types of data in this digital age, similarity search occurs to be crucial for these applications, including multimedia data retrieval and classification. Dynamic time warping (DTW) is one of the most accurate distance measure exploited in the search. However, it is known to suffer from such a high computational cost, raising great amount of interest in trying to speed it up. Recently, this obstacle has been alleviated using efficient existing lower bounding functions, e.g., FTW and LB_Keogh which can greatly speed up the distance calculation for the similarity search. However, in database field, the crux of the matter is how to minimize the data access or I/O cost. The most typical approach is through indexing. Unlike other types of data, time series indexing is almost impractical due to their explosively high number of dimensions. Moreover, unlike typical Euclidean distance measure, in DTW distance measure, a data point can be in relation to data points from other dimensions. As a result, time series indexing is beyond challenging since sequential scan unexpectedly outperforms all existing indexing methods. This research proposes an efficient time series indexing structure. With clustering technique on data preprocessing, the order of the data access on each data cluster is sorted by the proposed lower bounding distance which indicates the similarity between query and data cluster. Therefore, the similar candidates can be accessed early in the search process. Moreover, many data clusters can be pruned by the proposed lower bounding distance. In the experiments, the proposed indexing method outperforms the sequential scan by over an order of magnitude in terms of data access reduction.
Description:	วิทยนิพนธ์ (วศ.ม.)--จุฬาลงกรณ์มหาวิทยาลัย, 2551
Degree Name:	วิศวกรรมศาสตรมหาบัณฑิต
Degree Level:	ปริญญาโท
Degree Discipline:	วิศวกรรมคอมพิวเตอร์
URI:	http://cuir.car.chula.ac.th/handle/123456789/31241
URI:	http://doi.org/10.14457/CU.the.2008.779
metadata.dc.identifier.DOI:	10.14457/CU.the.2008.779
Type:	Thesis
Appears in Collections:	Eng - Theses

Files in This Item:

File	Description	Size	Format
Pongsakorn_Ru.pdf		3.27 MB	Adobe PDF	View/Open

Show full item record