I. Introduction
Support Vector Data Description (SVDD) is a one-class classification algorithm that separates target samples from non-target samples. For a detailed description of the algorithm, see:
(1) Tax D M J, Duin R P W. Support vector domain description[J]. Pattern Recognition Letters, 1999, 20(11-13): 1191-1199.
(2) Tax D M J, Duin R P W. Support vector data description[J]. Machine Learning, 2004, 54(1): 45-66.
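For reference, the standard SVDD problem from reference (2) can be summarized as below, with N training samples x_i, kernel map phi, sphere center a = sum_i alpha_i phi(x_i), radius R, slack variables xi_i and penalty C. This is the textbook formulation; the libsvm SVDD implementation used later may differ in details such as how the parameters are scaled.

```latex
\begin{aligned}
&\text{Primal:} && \min_{R,\,\mathbf{a},\,\boldsymbol{\xi}} \; R^2 + C\sum_{i=1}^{N}\xi_i
  \quad \text{s.t.}\quad \|\phi(\mathbf{x}_i)-\mathbf{a}\|^2 \le R^2 + \xi_i,\;\; \xi_i \ge 0 \\
&\text{Dual:} && \max_{\boldsymbol{\alpha}} \; \sum_{i=1}^{N}\alpha_i K(\mathbf{x}_i,\mathbf{x}_i)
  - \sum_{i=1}^{N}\sum_{j=1}^{N}\alpha_i\alpha_j K(\mathbf{x}_i,\mathbf{x}_j)
  \quad \text{s.t.}\quad 0 \le \alpha_i \le C,\;\; \sum_{i=1}^{N}\alpha_i = 1 \\
&\text{Decision:} && \mathbf{z}\ \text{is a target if}\;\;
  K(\mathbf{z},\mathbf{z}) - 2\sum_{i=1}^{N}\alpha_i K(\mathbf{z},\mathbf{x}_i)
  + \sum_{i=1}^{N}\sum_{j=1}^{N}\alpha_i\alpha_j K(\mathbf{x}_i,\mathbf{x}_j) \le R^2
\end{aligned}
```

With libsvm's RBF kernel (-t 2), K(u, v) = exp(-g*||u - v||^2), so g is the kernel width parameter and C plays the role of c. Note that in this form the constraints 0 <= alpha_i <= C and sum_i alpha_i = 1 can only hold together when C >= 1/N.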
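The libsvm toolbox developed by Professor Lin Chih-Jen (林智仁) of National Taiwan University and his colleagues provides a MATLAB interface to the SVDD algorithm, in which two key parameters, the penalty parameter c and the RBF kernel parameter g, directly affect the one-class classification result. Building on this, I introduce the Whale Optimization Algorithm (WOA) to optimize these parameters of the SVDD algorithm in the libsvm toolbox.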
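To make the roles of c and g more concrete, the sketch below evaluates the SVDD decision rule with libsvm's RBF kernel, K(u, v) = exp(-g*||u - v||^2). It is a minimal illustration written for this post, not libsvm code: the support vectors SV, the coefficients alpha and the squared radius R2 are hypothetical values, not the output of a trained model.

```matlab
% Minimal sketch (not libsvm code): SVDD decision rule with an RBF kernel.
% SV, alpha and R2 are hypothetical illustrative values, not a trained model.
g     = 0.5;                  % RBF kernel parameter (libsvm's -g)
SV    = [0 0; 1 0; 0 1];      % hypothetical support vectors, one per row
alpha = [0.4; 0.3; 0.3];      % hypothetical dual coefficients, sum(alpha) = 1
R2    = 0.6;                  % hypothetical squared radius

% Pairwise squared Euclidean distances and RBF kernel between rows of U and V
sqdist = @(U, V) bsxfun(@plus, sum(U.^2, 2), sum(V.^2, 2)') - 2 * (U * V');
rbf    = @(U, V) exp(-g * max(sqdist(U, V), 0));

Z = [0.3 0.3; 3 3];           % one query point near the data, one far away

% Squared kernel distance to the centre a = sum_i alpha_i*phi(x_i);
% K(z, z) = 1 for the RBF kernel.
d2 = 1 - 2 * rbf(Z, SV) * alpha + alpha' * rbf(SV, SV) * alpha;
inside = d2 <= R2;            % true -> target (label 1), false -> outlier (-1)
disp([d2 inside])
```

Because K(z, z) = 1 for the RBF kernel, the distance of a point far from every support vector approaches its maximum, 1 + alpha'*K*alpha. A larger g makes the kernel more local, so the boundary follows the training data more tightly. In the standard formulation, c caps each alpha_i, and since training points outside the sphere have alpha_i = c while the alphas sum to 1, at most 1/c training points can lie outside the sphere.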
WOA is described in:
(1) Mirjalili S, Lewis A. The whale optimization algorithm[J]. Advances in Engineering Software, 2016, 95: 51-67.

The authors of the algorithm have open-sourced their code on MathWorks.
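Notes: (1) I have renamed the svmtrain and svmpredict functions of the libsvm toolbox to libsvmtrain and libsvmpredict, respectively, which also avoids a clash with the svmtrain function shipped with older versions of MATLAB's Statistics Toolbox.
(2) Like other swarm-intelligence optimization algorithms, WOA can get trapped in local optima. If the optimization result looks abnormal, try running it several times.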
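The core of WOA is its position-update rule. The loop below is a minimal sketch of the update equations in Mirjalili and Lewis (2016), minimizing a hypothetical test function; it is not the author's WOA.m or the original MathWorks code, and choices such as drawing A per agent and l uniformly from [-1, 1] follow one reading of the paper.

```matlab
% Minimal WOA sketch after Mirjalili & Lewis (2016); not the author's WOA.m.
% The objective (a sphere function) and all settings are illustrative assumptions.
fobj    = @(x) sum(x.^2);     % hypothetical objective to minimize
dim     = 2;                  % number of decision variables
agents  = 10;                 % number of whales (search agents)
maxIter = 100;                % number of iterations
lb      = [-10 -10];          % lower bounds
ub      = [ 10  10];          % upper bounds
b       = 1;                  % constant defining the logarithmic spiral

% Random initial positions inside [lb, ub]
X = bsxfun(@plus, lb, bsxfun(@times, rand(agents, dim), ub - lb));
bestPos   = zeros(1, dim);
bestScore = inf;

for t = 1:maxIter
    % Keep whales inside the bounds, evaluate them and track the best one (the "prey")
    for i = 1:agents
        X(i,:) = min(max(X(i,:), lb), ub);
        f = fobj(X(i,:));
        if f < bestScore
            bestScore = f;
            bestPos   = X(i,:);
        end
    end

    a = 2 - 2 * (t - 1) / maxIter;   % decreases linearly from 2 towards 0
    for i = 1:agents
        A = 2 * a * rand - a;        % coefficient A in [-a, a]
        C = 2 * rand;                % coefficient C in [0, 2]
        p = rand;                    % chooses between encircling and spiral moves
        l = 2 * rand - 1;            % spiral parameter l in [-1, 1]
        if p < 0.5
            if abs(A) >= 1
                Xref = X(randi(agents), :);   % exploration: move relative to a random whale
            else
                Xref = bestPos;               % exploitation: encircle the best solution
            end
            D = abs(C * Xref - X(i,:));
            X(i,:) = Xref - A * D;
        else
            D = abs(bestPos - X(i,:));        % distance to the best solution
            X(i,:) = D * exp(b * l) * cos(2 * pi * l) + bestPos;   % bubble-net spiral
        end
    end
end

fprintf('best score %.3g at [%s]\n', bestScore, num2str(bestPos));
```

Every run draws fresh random numbers, so repeated runs give different results; this is why note (2) suggests re-running when a result looks abnormal. In the SVDD setting, fobj would instead train a model with the candidate [c, g] and presumably return a classification error.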
II. Example 1 (heart_scale data provided with the libsvm toolbox)
1. Data description
This dataset has 13 attributes and 270 samples: 120 positive and 150 negative. In this example, the positive samples form the training set (label 1) and the negative samples form the test set (label -1).
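The author's prepareData function is not shown. A hypothetical split following the description above could look like this; the toy matrix stands in for the heart_scale features (with the libsvm MATLAB interface, [label, inst] = libsvmread('heart_scale') would return the labels and a sparse feature matrix).

```matlab
% Hypothetical sketch of the split described above; the author's prepareData
% is not shown, so the toy data and variable handling here are assumptions.
X = [randn(6, 3) + 1; randn(4, 3) - 1];   % toy stand-in for the features
y = [ones(6, 1); -ones(4, 1)];            % labels in {+1, -1}

isPos      = (y == 1);
traindata  = X(isPos, :);                 % target (positive) samples only
trainlabel = ones(nnz(isPos), 1);         % training label 1
testdata   = X(~isPos, :);                % non-target (negative) samples only
testlabel  = -ones(nnz(~isPos), 1);       % test label -1
fprintf('train: %d x %d, test: %d x %d\n', size(traindata), size(testdata));
```

With this split, the test accuracy only measures how well non-target samples are rejected; it does not measure how many unseen target samples are accepted.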
2. Main program code
```matlab
clc
clear all
close all
addpath(genpath(pwd))
global traindata trainlabel
% heart_scale data
[traindata, testdata, trainlabel, testlabel] = prepareData;
% Parameter setting of WOA
agent = 10;              % Number of search agents
iteration = 20;          % Maximum number of iterations
lb = [10^-3, 2^-4];      % Lower bounds of 'c' and 'g'
ub = [10^0, 2^4];        % Upper bounds of 'c' and 'g'
dim = 2;                 % Number of parameters
fobj = @woa_obj;         % Objective function
% Parameter optimization using WOA
[Best_score, Best_pos, ~] = WOA(agent, iteration, lb, ub, dim, fobj);
% Train the SVDD hypersphere using the optimal parameters
cmd = ['-s 5 -t 2 ', '-c ', num2str(Best_pos(1,1)), ' -g ', ...
       num2str(Best_pos(1,2)), ' -q'];
model = libsvmtrain(trainlabel, traindata, cmd);
% Test
[predictlabel, accuracy, ~] = libsvmpredict(testlabel, testdata, model);
```
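In the command string, -t 2 selects the RBF kernel, -c and -g set c and g, and -q suppresses training output; -s 5 is not one of the standard libsvm SVM types and here presumably selects SVDD in the SVDD-enabled libsvm build the author uses. As a sketch, the same string can be built with sprintf, which avoids the rounding of num2str's default format (num2str(pi), for example, returns 3.1416). The values of c and g below are placeholders.

```matlab
% Placeholder values standing in for Best_pos(1,1) and Best_pos(1,2)
c = 0.5;
g = 2;
% '%.10g' keeps up to 10 significant digits instead of num2str's default rounding
cmd = sprintf('-s 5 -t 2 -c %.10g -g %.10g -q', c, g);
disp(cmd)   % -s 5 -t 2 -c 0.5 -g 2 -q
```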
Output of the last iteration and the final classification result:
```
ans =
   19.0000    0.0667
Accuracy = 80% (96/120) (classification)
Accuracy = 66.6667% (80/120) (classification)
Accuracy = 60% (72/120) (classification)
Accuracy = 80% (96/120) (classification)
Accuracy = 53.3333% (64/120) (classification)
Accuracy = 54.1667% (65/120) (classification)
Accuracy = 42.5% (51/120) (classification)
Accuracy = 35% (42/120) (classification)
Accuracy = 80% (96/120) (classification)
Accuracy = 35% (42/120) (classification)
ans =
   20.0000    0.0667
Accuracy = 100% (150/150) (classification)
```
From these results, the SVDD model built with the optimized parameters achieves a training accuracy of 93.33% and a test accuracy of 100%.
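The log does not print the training accuracy directly. Assuming, as the numbers suggest, that the second value after "ans =" is the best objective value and that woa_obj (not shown in the post) returns the training error rate, the reported 93.33% can be recovered with a little arithmetic:

```matlab
% Assumes the logged best objective value is the training error rate.
nTrain    = 120;      % training samples (positive heart_scale samples)
bestScore = 0.0667;   % best objective value logged after iteration 20
nCorrect  = round((1 - bestScore) * nTrain);   % correctly accepted training samples
trainAcc  = 100 * nCorrect / nTrain;           % training accuracy in percent
testAcc   = 100 * 150 / 150;                   % from "Accuracy = 100% (150/150)"
fprintf('train %.2f%% (%d/%d), test %.2f%%\n', trainAcc, nCorrect, nTrain, testAcc);
```

This prints a training accuracy of 93.33% (112/120), matching the text.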
III. Example 2 (industrial process data)
1. Data description
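This example uses data from an industrial process. The dataset has 10 attributes; the training set contains 400 positive samples, and the test set contains 80 samples (the first 40 positive, the last 40 negative).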
2. Main program code
```matlab
clc
clear all
addpath(genpath(pwd))
global traindata trainlabel
% Industrial process data (expected to provide traindata, trainlabel, testdata, testlabel)
load(fullfile('data', 'data_2.mat'))
% Parameter setting of WOA
agent = 10;              % Number of search agents
iteration = 30;          % Maximum number of iterations
lb = [10^-3, 2^-7];      % Lower bounds of 'c' and 'g'
ub = [10^0, 2^7];        % Upper bounds of 'c' and 'g'
dim = 2;                 % Number of parameters
fobj = @woa_obj;         % Objective function
% Parameter optimization using WOA
[Best_score, Best_pos, ~] = WOA(agent, iteration, lb, ub, dim, fobj);
% Train the SVDD hypersphere using the optimal parameters
cmd = ['-s 5 -t 2 ', '-c ', num2str(Best_pos(1,1)), ' -g ', ...
       num2str(Best_pos(1,2)), ' -q'];
model = libsvmtrain(trainlabel, traindata, cmd);
% Test
[predictlabel, accuracy, ~] = libsvmpredict(testlabel, testdata, model);
% Visualize the results
plotResult(testlabel, predictlabel)
```
Output of the last iteration and the final classification result:
```
Accuracy = 99.5% (398/400) (classification)
Accuracy = 99.25% (397/400) (classification)
Accuracy = 99.75% (399/400) (classification)
Accuracy = 99.75% (399/400) (classification)
Accuracy = 99.5% (398/400) (classification)
Accuracy = 99.25% (397/400) (classification)
Accuracy = 99.75% (399/400) (classification)
Accuracy = 99.75% (399/400) (classification)
Accuracy = 99.5% (398/400) (classification)
Accuracy = 99.5% (398/400) (classification)
ans =
   30.0000    0.0025
Accuracy = 93.75% (75/80) (classification)
```
From these results, the SVDD model built with the optimized parameters achieves a training accuracy of 99.75% and a test accuracy of 93.75%.
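Since this test set mixes 40 target and 40 non-target samples, the overall accuracy alone does not show whether the 5 errors are rejected targets or accepted outliers. The sketch below computes both rates from testlabel and predictlabel; the predictions are invented for illustration (chosen only so that 75 of 80 are correct) and are not the actual output of the program above.

```matlab
% Per-class rates for a one-class test set; predictlabel is made-up data.
testlabel    = [ones(40, 1); -ones(40, 1)];   % first 40 targets, last 40 outliers
predictlabel = testlabel;
predictlabel([3 17])     = -1;                % pretend 2 targets were rejected
predictlabel([45 60 71]) = 1;                 % pretend 3 outliers were accepted

acc = mean(predictlabel == testlabel);                   % overall accuracy
tpr = mean(predictlabel(testlabel == 1) == 1);           % targets accepted
tnr = mean(predictlabel(testlabel == -1) == -1);         % outliers rejected
fprintf('accuracy %.2f%%, target acceptance %.2f%%, outlier rejection %.2f%%\n', ...
        100 * acc, 100 * tpr, 100 * tnr);
```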
The visualization is shown below:

[Figure: visualization of the test-set classification results]