Table of Contents

Title Page

Abstract

Abstract (in Korean)

Contents

Chapter 1. Introduction 14

1.1. Properties of fraud 14

1.2. Related studies 15

1.3. Fraud detection 16

1.4. SVM for fraud detection 16

1.5. Outline of Thesis 17

Chapter 2. L₁-penalized Fraud Detection Support Vector Machines 18

2.1. Introduction 18

2.2. Piecewise linearity of L₁-FDSVM Estimator 24

2.3. Solution Path Algorithm of L₁-FDSVM 27

2.3.1. Initialization 28

2.3.2. Finding t^(ℓ+1) 29

2.3.3. Update Related Quantities at t^(ℓ+1) 30

2.4. Selection of Tuning Parameter t 32

2.5. Simulation 35

2.6. Illustration to Wafer Data 40

2.7. Conclusion 43

Chapter 3. Fraud Detection Support Vector Machines with Functional Predictors 45

3.1. Introduction 45

3.2. FDSVM 48

3.3. F²DSVM: Extension to Functional Predictor 50

3.4. Wafer Data illustration 53

3.5. Conclusion 58

Chapter 4. Variable Selection for Fraud Detection Support Vector Machines via Smoothing Spline ANOVA 60

4.1. Introduction 60

4.2. The COSSO FDSVM via Smoothing Spline ANOVA 63

4.2.1. Smoothing Spline ANOVA 63

4.2.2. COSSO FDSVM 64

4.3. Algorithm for COSSO FDSVM 65

4.3.1. Update (b,c) 66

4.3.2. Update θ 68

4.3.3. Overall procedure 68

4.3.4. Subset basis algorithm 69

4.4. Simulations 71

4.5. Real Data Illustration 74

4.6. Conclusion 76

Chapter 5. Conclusion and Future Research 77

Bibliography 80

Table 2.1. Simulation results for forward models: Classification performances of the L₁-FDSVM measured in sensitivity, specificity, accuracy, and weighted G-... 38

Table 2.2. Variable selection performance for forward models: The averaged numbers of correctly selected (CS) and incorrectly selected (IS) variables are... 39

Table 2.3. The averaged performance over thirty repetitions with the random partitioning of Wafer data into 30% training, 70% test sets. The numbers in parentheses... 42

Table 3.1. Comparison of averaged computing time between the grid search with quadratic programming (QP) for F²DSVM and its path algorithm over fifty repeti-... 55

Table 3.2. The averages of sensitivity, specificity, accuracy, and G-mean over 50 iterations with random sampling (70% for training and 30% for testing) from wafer... 58

Table 4.1. The averaged performance over thirty repetitions with 2-fold cross-validation of simulation data. The numbers in parentheses are standard errors. Both... 72

Table 4.2. The averaged performance over thirty repetitions with the random splitting of three datasets into 2-fold cross-validation. The numbers in parentheses... 75

Figure 2.1. Trajectories of sensitivity and specificity as a function of t for the test set of Wafer data we analyzed in Section 3.4. The dotted horizontal lines represent... 34

Figure 2.2. The regularization paths of L₁-FDSVM, i.e., the trajectories of β (red solid lines) and β₀ (green dashed line) as a function of t for the training set of... 41

Figure 2.3. Variable Selection Results for L₁-FDSVM for the Wafer data: 130 variables are never selected in any of the thirty repetitions, which means that about one... 43

Figure 3.1. The 405 nm emission of all 1,194 wafers over the measurement time. Normal wafers are shown as green solid lines and abnormal wafers as blue dotted lines. 54

Figure 3.2. Trajectories of the dual solution as a function of λ for the training set of wafer data analyzed in Section 3.4. The left plot represents trajectories of... 56

Figure 3.3. Trajectories of sensitivity and specificity as a function of 1/λ for the test set of wafer data. For the left plot, the dotted horizontal lines represent the... 57

Figure 4.1. The averaged variable selection results for COSSO FDSVM and L₁-FDSVM for the simulation data: In nonlinear models (NL1, NL2), COSSO... 73