I. Introduction
Support Vector Data Description (SVDD) is a one-class classification algorithm that separates target samples from non-target samples. For a detailed description of the algorithm, see the following references:
(1) Tax D M J, Duin R P W. Support vector domain description[J]. Pattern recognition letters, 1999, 20(11-13): 1191-1199.
(2) Tax D M J, Duin R P W. Support vector data description[J]. Machine learning, 2004, 54(1): 45-66.

The libsvm toolbox, developed by Prof. Chih-Jen Lin of National Taiwan University and colleagues, provides a MATLAB interface to the SVDD algorithm. Two key parameters, c and g, directly affect the one-class classification result of SVDD. Building on this, the author introduces the Whale Optimization Algorithm (WOA) to optimize these two parameters for the SVDD algorithm in the libsvm toolbox.
For a detailed description of WOA, see the following reference:
(1) Mirjalili S, Lewis A. The whale optimization algorithm[J]. Advances in engineering software, 2016, 95: 51-67.

The authors of the algorithm have released its code as open source on MathWorks.

Notes: (1) The author has renamed the svmtrain and svmpredict functions of the libsvm toolbox to libsvmtrain and libsvmpredict, respectively.
(2) Like other swarm intelligence optimization algorithms, WOA is prone to getting trapped in local optima. If the optimization result looks abnormal, try running it several more times.
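
Note (2) can be turned into a small restart loop. The following is a minimal sketch and not part of the original post: `runWithRestarts` is a hypothetical helper name, and it assumes only that the optimizer is wrapped in a zero-argument function handle that returns the best score first and the best position second. Lower scores are treated as better.

```matlab
function [bestScore, bestPos] = runWithRestarts(optimizer, nRuns)
% runWithRestarts  Run a stochastic optimizer several times and keep the best run.
%   optimizer : zero-argument function handle returning [score, position]
%   nRuns     : number of independent runs
% Hypothetical helper for illustration; not part of libsvm or the WOA code.
bestScore = inf;
bestPos = [];
for k = 1:nRuns
    [score, pos] = optimizer();   % one independent optimization run
    if score < bestScore          % a smaller objective value is better
        bestScore = score;
        bestPos = pos;
    end
end
end
```

With the variables of the main programs in this post, a call could look like `[Best_score, Best_pos] = runWithRestarts(@() WOA(agent, iteration, lb, ub, dim, fobj), 5);`, where the run count of 5 is an arbitrary choice.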

II. Example 1 (heart_scale data provided with the libsvm toolbox)

1. Data description
The data set has 13 attributes and 270 samples, of which 120 are positive and 150 are negative. In this example, the positive samples form the training set and are labeled 1; the negative samples form the test set and are labeled -1.
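
The `prepareData` function called in the main program below is not listed in the post. Here is a minimal sketch of what it might look like, assuming that the `heart_scale` file is on the current path and that the split follows the description above. `splitByLabel` is a hypothetical helper name; `libsvmread` is the reader shipped with the libsvm MATLAB interface.

```matlab
function [traindata, testdata, trainlabel, testlabel] = prepareData()
% Assumed reconstruction of prepareData (the original is not shown).
% Reads heart_scale and splits it into positive (train) and negative (test).
[label, inst] = libsvmread('heart_scale');   % file location is an assumption
[traindata, testdata, trainlabel, testlabel] = splitByLabel(label, inst);
end

function [traindata, testdata, trainlabel, testlabel] = splitByLabel(label, inst)
% Samples labeled 1 form the training set; all others form the test set.
isPos = (label == 1);
traindata  = inst(isPos, :);
trainlabel = ones(nnz(isPos), 1);
testdata   = inst(~isPos, :);
testlabel  = -ones(nnz(~isPos), 1);
end
```

Keeping the split in its own function makes it easy to check on a few hand-written rows before running it on the real file.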

2. Main program code

    clc
    clear all
    close all
    addpath(genpath(pwd))
    global traindata trainlabel
    % heart_scale data
    [traindata, testdata, trainlabel, testlabel] = prepareData;
    % Parameter settings of WOA
    agent = 10;            % Number of search agents
    iteration = 20;        % Maximum number of iterations
    lb = [10^-3, 2^-4];    % Lower bounds of 'c' and 'g'
    ub = [10^0, 2^4];      % Upper bounds of 'c' and 'g'
    dim = 2;               % Number of parameters
    fobj = @woa_obj;       % Objective function
    % Parameter optimization using WOA
    [Best_score, Best_pos, ~] = WOA(agent, iteration, lb, ub, dim, fobj);
    % Train the SVDD hypersphere using the optimal parameters
    cmd = ['-s 5 -t 2 ', '-c ', num2str(Best_pos(1,1)), ' -g ', ...
           num2str(Best_pos(1,2)), ' -q'];
    model = libsvmtrain(trainlabel, traindata, cmd);
    % Test
    [predictlabel, accuracy, ~] = libsvmpredict(testlabel, testdata, model);
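
The objective function `woa_obj` is not listed in the post either. The logs in this post report a best score of 0.0667 alongside a training accuracy of 93.33%, and print one "Accuracy" line per search agent, which is consistent with an objective of 1 minus the training accuracy. The following is a sketch under that assumption, reusing the `-s 5 -t 2` options from the main program; it should not be read as the author's actual implementation.

```matlab
function fitness = woa_obj(pos)
% Assumed reconstruction of the WOA objective (the original is not shown).
% pos(1) is the candidate 'c', pos(2) is the candidate 'g'.
global traindata trainlabel
cmd = ['-s 5 -t 2 -c ', num2str(pos(1)), ' -g ', num2str(pos(2)), ' -q'];
model = libsvmtrain(trainlabel, traindata, cmd);
% No '-q' is passed here, so each call prints one "Accuracy = ..." line.
[~, accuracy, ~] = libsvmpredict(trainlabel, traindata, model);
fitness = 1 - accuracy(1) / 100;   % accuracy(1) is the accuracy in percent
end
```

Because the fitness is computed on the training set only, it rewards hyperspheres that enclose the positive samples; it says nothing about how well negative samples are rejected, which is why the test set is evaluated separately.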

Output of the last iteration and the final classification result:

    ans =
       19.0000    0.0667
    Accuracy = 80% (96/120) (classification)
    Accuracy = 66.6667% (80/120) (classification)
    Accuracy = 60% (72/120) (classification)
    Accuracy = 80% (96/120) (classification)
    Accuracy = 53.3333% (64/120) (classification)
    Accuracy = 54.1667% (65/120) (classification)
    Accuracy = 42.5% (51/120) (classification)
    Accuracy = 35% (42/120) (classification)
    Accuracy = 80% (96/120) (classification)
    Accuracy = 35% (42/120) (classification)
    ans =
       20.0000    0.0667
    Accuracy = 100% (150/150) (classification)

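As can be seen, the SVDD model built with the optimized parameters reaches 93.33% accuracy on the training set, which is consistent with the best objective value of 0.0667 in the log (1 - 0.0667 ≈ 0.9333), and 100% accuracy on the test set (150/150). Because the test set contains only negative samples, the latter figure means that every negative sample was rejected.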

III. Example 2 (industrial process data)

1. Data description
This example uses data from an industrial process. The data set has 10 attributes; the training set contains 400 positive samples, and the test set contains 80 samples (the first 40 are positive and the last 40 are negative).
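
The label vectors implied by this description can be written out directly. This is only a sketch of the expected layout: the variables actually stored in `data_2.mat` are not shown in the post, and the 1 / -1 label coding is assumed to carry over from Example 1.

```matlab
% Expected label layout for Example 2, following the description above.
% The 1 / -1 coding is an assumption carried over from Example 1.
nTrain = 400;                               % all training samples are positive
trainlabel = ones(nTrain, 1);
testlabel = [ones(40, 1); -ones(40, 1)];    % 40 positive, then 40 negative
fprintf('train: %d, test: %d (%d positive, %d negative)\n', ...
    numel(trainlabel), numel(testlabel), ...
    nnz(testlabel == 1), nnz(testlabel == -1));
```

Comparing these vectors against the loaded `trainlabel` and `testlabel` is a quick way to confirm that the data file matches the description before starting the optimization.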

2. Main program code

    clc
    clear all
    addpath(genpath(pwd))
    global traindata trainlabel
    % Industrial process data
    load(fullfile('data', 'data_2.mat'))
    % Parameter settings of WOA
    agent = 10;            % Number of search agents
    iteration = 30;        % Maximum number of iterations
    lb = [10^-3, 2^-7];    % Lower bounds of 'c' and 'g'
    ub = [10^0, 2^7];      % Upper bounds of 'c' and 'g'
    dim = 2;               % Number of parameters
    fobj = @woa_obj;       % Objective function
    % Parameter optimization using WOA
    [Best_score, Best_pos, ~] = WOA(agent, iteration, lb, ub, dim, fobj);
    % Train the SVDD hypersphere using the optimal parameters
    cmd = ['-s 5 -t 2 ', '-c ', num2str(Best_pos(1,1)), ' -g ', ...
           num2str(Best_pos(1,2)), ' -q'];
    model = libsvmtrain(trainlabel, traindata, cmd);
    % Test
    [predictlabel, accuracy, ~] = libsvmpredict(testlabel, testdata, model);
    % Visualize the results
    plotResult(testlabel, predictlabel)
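
The `plotResult` function is not listed in the post, and the original figure is not available. The following is a hypothetical sketch of one way to visualize the comparison: it plots the true and predicted labels against the test sample index and marks the misclassified samples. The marker styles and axis labels are illustrative choices, not the author's.

```matlab
function plotResult(testlabel, predictlabel)
% Assumed reconstruction of plotResult (the original is not shown).
% Plots true and predicted labels per test sample and marks the errors.
testlabel = testlabel(:);
predictlabel = predictlabel(:);
idx = (1:numel(testlabel))';
wrong = testlabel ~= predictlabel;          % misclassified samples
figure
plot(idx, testlabel, 'bo', 'DisplayName', 'True label')
hold on
plot(idx, predictlabel, 'r*', 'DisplayName', 'Predicted label')
plot(idx(wrong), predictlabel(wrong), 'ks', 'MarkerSize', 12, ...
    'DisplayName', 'Misclassified')
hold off
ylim([-1.5 1.5])
xlabel('Test sample index')
ylabel('Label')
legend('Location', 'best')
grid on
end
```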

Output of the last iteration and the final classification result:

    Accuracy = 99.5% (398/400) (classification)
    Accuracy = 99.25% (397/400) (classification)
    Accuracy = 99.75% (399/400) (classification)
    Accuracy = 99.75% (399/400) (classification)
    Accuracy = 99.5% (398/400) (classification)
    Accuracy = 99.25% (397/400) (classification)
    Accuracy = 99.75% (399/400) (classification)
    Accuracy = 99.75% (399/400) (classification)
    Accuracy = 99.5% (398/400) (classification)
    Accuracy = 99.5% (398/400) (classification)
    ans =
       30.0000    0.0025
    Accuracy = 93.75% (75/80) (classification)

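As can be seen, the SVDD model built with the optimized parameters reaches 99.75% accuracy on the training set, which is consistent with the best objective value of 0.0025 in the log (1 - 0.0025 = 0.9975), and 93.75% accuracy on the test set (75/80).
The visualization result is as follows:
[Figure: visualization of the test-set classification result]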