Comparison of Variable Screening Methods for Multi-Sample Splitting to Find P-Values for High-Dimensional Data

ศุภวัฒน์ อังคะสี

Please use this identifier to cite or link to this item: https://cuir.car.chula.ac.th/handle/123456789/43951

Title:	Comparison of Variable Screening Methods for Multi-Sample Splitting to Find P-Values for High-Dimensional Data
Other Titles:	COMPARISON OF THE VARIABLES SCREENING METHODS FOR MULTI-SAMPLE SPLIT TO FIND P-VALUES FOR HIGH-DIMENSIONAL DATA
Authors:	ศุภวัฒน์ อังคะสี
Advisors:	วิฐรา พึ่งพาพงศ์
Other author:	Chulalongkorn University. Faculty of Commerce and Accountancy
Subjects:	Statistical analysis; Variables (Mathematics); Regression analysis
Issue Date:	2556 B.E. (2013)
Publisher:	Chulalongkorn University
Abstract:	This research compares Lasso, Adaptive Lasso, Elastic net and SCAD as variable screening methods in the multi-split (Multi-Split) procedure for obtaining p-values in regression analysis of high-dimensional data. The methods are compared by the number of independent-variable coefficients found to be non-zero, and by the false positives and false negatives that remain after control with the False Discovery Rate (FDR) method. Data were simulated under different settings. The sample sizes were 10, 100 and 200. The number of non-zero coefficients was 10, 20 and 50 percent of the sample size, and the correlation among the independent variables was 0, 0.5 and 0.9. Simulation and analysis were carried out in R 3.0.3. Comparison and performance evaluation use three measures: the false positives (FP), the false negatives (FN), and the number of independent-variable coefficients declared non-zero by hypothesis testing under FDR control. Within this scope, consider first a sample size of 10. There, the table of the number of coefficients declared non-zero under FDR control and the FN table point in the same direction: screening with Adaptive Lasso is the most appropriate. The FP table instead favors Lasso, although the difference there is not clear-cut. For sample sizes of 100 and 200, screening with Adaptive Lasso and SCAD is the most appropriate, while the FP table favors Lasso and EN (Elastic net). This indicates that Lasso and EN screen variables less effectively than Adaptive Lasso and SCAD.
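The record does not say how the Multi-Split procedure combines per-split results into a single p-value. Below is a minimal sketch that assumes the quantile-aggregation rule commonly used for multi-sample splitting. The setup works like this:

- On each of B random splits, a screening method (Lasso, Adaptive Lasso, Elastic net or SCAD) selects a set S on the first half.
- Ordinary least squares on the second half then gives raw p-values for the variables in S.
- Variables outside S get p-value 1.
- The B per-split values are combined through a quantile.

The screening and least-squares fits are assumed to be done already. The function names, the quantile convention, the grid over γ, γ_min = 0.05 and the example numbers are all illustrative assumptions, not details taken from the thesis.

```python
import math


def per_split_pvalue(raw_p, selected, j, bonferroni=True):
    """Adjusted p-value of variable j from a single split (hypothetical helper).

    raw_p:    dict {variable index: least-squares p-value from the second half}
    selected: set of variables kept by the screening step on the first half
    """
    if j not in selected:
        return 1.0  # a variable dropped by screening gets p-value 1 for this split
    p = raw_p[j]
    if bonferroni:
        p *= len(selected)  # Bonferroni over the |S| variables tested in this split
    return min(1.0, p)


def empirical_quantile(values, gamma):
    """gamma-quantile taken as the ceil(gamma * B)-th smallest value (one convention)."""
    xs = sorted(values)
    # small tolerance so that e.g. 0.3 * 10 is not rounded up to 4 by float error
    k = max(1, math.ceil(gamma * len(xs) - 1e-9))
    return xs[k - 1]


def q_gamma(pvals, gamma):
    """Fixed-gamma aggregate: Q(gamma) = min(1, gamma-quantile of p / gamma)."""
    return min(1.0, empirical_quantile([p / gamma for p in pvals], gamma))


def multi_split_pvalue(pvals, gamma_min=0.05, grid_size=96):
    """Adaptive aggregate: min(1, (1 - ln gamma_min) * min over gamma of Q(gamma)).

    The infimum over gamma in [gamma_min, 1) is approximated on an even grid.
    """
    step = (1.0 - gamma_min) / grid_size
    gammas = [gamma_min + i * step for i in range(grid_size)]
    best = min(q_gamma(pvals, g) for g in gammas)
    return min(1.0, (1.0 - math.log(gamma_min)) * best)


# Hypothetical example: variable 3 over B = 10 splits; it is screened in 8 times.
splits = [({3: 0.0004}, {1, 3, 5, 7})] * 8 + [({}, {1, 5, 7, 9})] * 2
per_split = [per_split_pvalue(raw, sel, 3) for raw, sel in splits]
print(multi_split_pvalue(per_split))
```

The `bonferroni` flag encodes a further assumption. The factor |S| suits family-wise error control. When FDR is the target, as in this study, one option is to aggregate unadjusted per-split p-values (`bonferroni=False`) and then apply an FDR procedure across variables.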
Other Abstract:	This research compares Lasso, Adaptive Lasso, Elastic net and SCAD as variable screening methods for the Multi-Split procedure used to find p-values in regression analysis of high-dimensional data. The number of non-zero coefficients, false positives and false negatives after controlling the False Discovery Rate (FDR) were collected and analyzed on simulated data. The sample sizes are 10, 100 and 200. The number of non-zero coefficients is set to 10, 20 and 50 percent of the sample size, and the correlation among independent variables is 0, 0.5 or 0.9. Data were simulated and analyzed in R 3.0.3. Three measures serve for comparison and performance measurement: the False Positive (FP) count, the False Negative (FN) count, and the number of independent-variable coefficients declared non-zero by hypothesis testing after FDR control. For a sample size of 10, the table of the number of non-zero coefficients after FDR control and the FN table move in the same direction. Both show that screening with Adaptive Lasso is the most appropriate. The FP table, on the other hand, favors Lasso, but the difference is not very clear. For sample sizes of 100 and 200, screening with Adaptive Lasso and SCAD is the most appropriate, while the FP table favors Lasso and EN. This shows that Lasso and EN are less effective at variable screening than Adaptive Lasso and SCAD.
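The abstract scores each screening method by three quantities after FDR control: the number of coefficients declared non-zero, FP and FN. The record does not name the FDR procedure. The sketch below assumes the Benjamini-Hochberg step-up procedure, a common choice. `benjamini_hochberg` and `screening_summary` are hypothetical helpers, and the p-values are made up for illustration. In a simulation study like this one, the true support of the coefficients is known, which is what makes FP and FN countable.

```python
def benjamini_hochberg(pvals, q=0.05):
    """Indices rejected by the Benjamini-Hochberg step-up procedure at FDR level q."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices by increasing p-value
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * q / m:
            k = rank  # keep the largest rank that meets its bound (step-up)
    return set(order[:k])


def screening_summary(rejected, true_nonzero):
    """Number declared non-zero, FP and FN, given the true support of the coefficients."""
    return {
        "nonzero": len(rejected),
        "FP": len(rejected - true_nonzero),  # declared non-zero but truly zero
        "FN": len(true_nonzero - rejected),  # truly non-zero but not declared
    }


# Hypothetical aggregated p-values for 10 variables; variables 0, 1, 2 are truly non-zero.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
rejected = benjamini_hochberg(pvals, q=0.05)
print(sorted(rejected), screening_summary(rejected, {0, 1, 2}))
# prints [0, 1] {'nonzero': 2, 'FP': 0, 'FN': 1}
```

Because the procedure is step-up, a p-value that fails its own bound can still be rejected if a larger p-value later meets its bound. For example, `[0.04, 0.04]` at q = 0.05 rejects both.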
Description:	Thesis (M.Sc.)--Chulalongkorn University, 2556 B.E. (2013)
Degree Name:	Master of Science
Degree Level:	Master's degree
Degree Discipline:	Statistics
URI:	http://cuir.car.chula.ac.th/handle/123456789/43951
URI:	http://doi.org/10.14457/CU.the.2013.1404
metadata.dc.identifier.DOI:	10.14457/CU.the.2013.1404
Type:	Thesis
Appears in Collections:	Acctn - Theses

Files in This Item:

File	Description	Size	Format
5581609426.pdf		5.14 MB	Adobe PDF