A STUDY TO IMPROVE A LEARNING ALGORITHM
OF NEURAL NETWORKS
Cong Huu Nguyen*1, Thanh Nga Thi Nguyen2, Ngoc Van Dong3
1Thai Nguyen University, 2College of Technology – TNU,
3Ha Noi Vocational College of electrical mechanical
ABSTRACT
Since the mid-twentieth century, the study of optimization algorithms, especially with the development of digital computers, has increasingly become an important branch of mathematics. Nowadays, these mathematical tools are applied in practice to neural network training. In the search for an optimal algorithm that minimizes the convergence time of the solution or avoids weak minima and local minima, the problems start from studying the characteristics of the error surface. For a complex error surface such as a cleft error surface, whose contours are stretched and bent to form a cleft and a cleft axis, the old algorithms cannot cope. This paper proposes an algorithm that improves the convergence of the solution and the ability to escape from undesired areas on the error surface.
Keywords: neural networks, special error surface, local minima, optimization, algorithms
BACKGROUND
In the search for an optimal algorithm that minimizes the convergence time of the solution or avoids weak minima and local minima, the problems start from studying the characteristics of the error surface and take it as a starting point for improving or proposing a new training algorithm. When neural networks are discussed, the quality of the trained network is usually of interest (supervised learning). This leads to a quality function and, from it, to the concept of the network quality surface. Sometimes the quality surface is also called by other terms: the error surface or the performance surface. Figure 1 shows such an error surface. There are some special features to note for this surface, such as a slope that changes drastically over the parameter space. For this reason, it is difficult to choose an appropriate learning step for algorithms such as steepest descent or conjugate gradient. Some areas of the error surface are very flat, allowing a large learning rate, while other regions have steep slopes and require a small learning rate. Other methods, such as the momentum rule and the adaptive-learning-rate VLBP (Variable
Learning Rate Backpropagation) algorithm, are not effective for this problem [5].
Thus, complex quality surfaces make the search for the optimal weights more difficult, and if the quality surface has a cleft shape, the search can still get stuck at the cleft axis before reaching the minimum point. A possible strategy to solve this problem is the following: after getting close to the cleft axis by a gradient method with a step computed by line minimization (or with a specified learning step), we move along the bottom of the narrow cleft by gradually approximating its geometry, assuming that this geometry is a line or approximately a quadratic curve.
The objective of this paper is to study and apply the cleft-overstep algorithm to compute the learning step used to find the optimal weights of neural networks in solving the control problem.
CLEFT-OVERSTEP ALGORITHM FOR
NEURAL NETWORK TRAINING
Cleft-overstep principle
Consider the unconstrained minimization problem:
$$J(u) \to \min, \quad u \in E^n \qquad (1)$$
where u is the minimizing vector in an n-dimensional Euclidean space and J(u) is the target function, which satisfies
$$\lim_{\|u\| \to \infty} J(u) = \infty \qquad (2)$$
The optimization algorithm for problem (1)
has the iteration equation as follows:
$$u^{k+1} = u^k + \alpha_k s^k, \quad k = 0, 1, \ldots \qquad (3)$$
where u^k and u^{k+1} are the starting point and the ending point of the k-th iteration step, s^k is the vector giving the direction of change of the variables in n-dimensional space, and α_k is the step length. α_k is determined according to the cleft-overstep principle and is called the "cleft-overstep" step, and equation (3) is called the cleft-overstep algorithm.
The basic difference between the cleft-overstep method and other methods lies in the principle of step adjustment. According to this principle, the step length of the searching point at each iteration is not smaller than the smallest step length at which the target function reaches its (local) minimum value in the moving direction at that iteration. The searching optimization trajectory of the cleft-overstep principle thus creates a geometric picture in which the searching point "oversteps" the cleft bottom at each iteration step. To specify the cleft-overstep principle, we examine a function of one variable at each iteration step [4]:
$$h(\alpha) = J(u^k + \alpha s^k) \qquad (4)$$
Suppose that s^k is a descent direction of the target function at the point u^k. According to condition (2), there is a smallest value α* > 0 at which h(α) reaches a minimum:
$$\alpha^* = \arg\min_{\alpha > 0} h(\alpha) \qquad (5)$$
If J(u^k), and hence h(α), is continuously differentiable, we can define the cleft-overstep step as follows:
$$h'(\alpha^v) > 0, \qquad h(\alpha^v) \le h(0) \qquad (6)$$
(α^v is the overstep step, meaning that it oversteps the cleft).
The variation of the function h(α) as the optimization trajectory moves from the starting point u^k to the ending point u^{k+1} is illustrated in Figure 2. We can see that, as the value of α increases from 0, passes through the minimum point α* of h(α), and reaches the value α^v, the corresponding optimization trajectory moves forward parallel to s^k according to relation (3) and takes a step of length α = α^v ≥ α*. The graph also shows that, along the moving direction, the target function first changes in the descending direction from the point u^k, but when it reaches the point u^{k+1} it changes to the ascending direction.
If we use moving steps according to condition (5), we may be trapped at the cleft axis and the corresponding optimization algorithm may also be trapped at that point. If, instead, the optimization process follows condition (6), the searching point is not allowed to lie on the cleft bottom before the optimal solution is obtained and, at the same time, it always draws a trajectory that oversteps the cleft bottom. In order to obtain an effective and stable iteration process, condition (6) is replaced by condition (7):
$$\alpha^v \ge \alpha^* = \arg\min_{\alpha > 0} h(\alpha), \qquad h^v - h^* \le \lambda\,(h^0 - h^*) \qquad (7)$$
where 0 < λ < 1 is called the overstep coefficient, $h^v = h(\alpha^v)$, $h^* = h(\alpha^*)$ and $h^0 = h(0)$.
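As a concrete illustration (a sketch of ours, not the authors' implementation; the function name, the grid-search estimate of α* and the default λ = 0.5 are assumptions), the following Python fragment checks whether a candidate step α^v satisfies condition (7) for a one-dimensional cross-section h:

```python
import numpy as np

def satisfies_overstep(h, alpha_v, lam=0.5, alpha_max=None, n_grid=400):
    """Numerically check condition (7) for a candidate step alpha_v:
    alpha_v >= alpha* and h(alpha_v) - h* <= lam * (h(0) - h*).
    alpha* and h* are approximated by a coarse grid search here; the paper
    deliberately avoids computing h* exactly (see the next subsection)."""
    if alpha_max is None:
        alpha_max = 2.0 * alpha_v
    grid = np.linspace(1e-6, alpha_max, n_grid)
    values = np.array([h(a) for a in grid])
    alpha_star = grid[values.argmin()]              # approximate arg min of h
    h_star, h0, hv = values.min(), h(0.0), h(alpha_v)
    return alpha_v >= alpha_star and (hv - h_star) <= lam * (h0 - h_star)

# Example with h(alpha) = (alpha - 1)^2, whose cleft bottom is at alpha = 1:
# satisfies_overstep(lambda a: (a - 1.0) ** 2, 1.3)   -> True  (oversteps, but not too far)
# satisfies_overstep(lambda a: (a - 1.0) ** 2, 0.5)   -> False (has not yet reached the bottom)
```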
Determining the cleft-overstep step
Figure 1: Cleft-similar error surface
Choosing the length of the learning step in the cleft problem is very important. If this length is too short, the computer running time will be long. If it is too long, the searching process may run into difficulties because it is hard to follow the curvature of the cleft. Therefore, an adaptive learning step for the cleft problem is essential in the search for the optimal solution. In this section, we propose a simple yet effective way to find the cleft-overstep step. Suppose that J(u) is continuous and satisfies the condition J(u) → ∞ as u → ∞, and that at iteration k the point u^{k-1} and the moving vector s^{k-1} have already been determined. We need to determine the step length α_k that satisfies condition (7).
If, instead of h* in (7), we use an estimate $\bar h$ with $\bar h \approx h^*$, $\bar h > h^*$, we still obtain a cleft-overstep step by definition. Therefore, to simplify programming, it is enough to take the smallest easily computed value of h at each iteration, without determining h* accurately. This also significantly reduces the number of objective-function evaluations. The step α_k is then determined from the following condition:
$$h'(\alpha_k)\, h'(0) = \left[ J'(u^{k})^{T} s^{k-1} \right] \left[ J'(u^{k-1})^{T} s^{k-1} \right] < 0, \qquad u^{k} = u^{k-1} + \alpha_k s^{k-1} \qquad (8)$$
PROGRAM AND RESULTS
To illustrate the above remarks, we present a neural network trained with the backpropagation procedure, with the learning step computed according to the cleft-overstep principle. In the example, given an input vector, the network has to answer which character it represents. The software uses a network structure with 35 input neurons, 5 hidden-layer neurons and 10 output-layer neurons. The sigmoid activation function is used; a characteristic of this function is that it easily produces a cleft-shaped error surface [1].
Figure 2: Determining the cleft-overstep step α^v

Figure 3: Flow chart of the algorithm that determines the cleft-overstep learning step (starting from α = 0 with an initial trial step a = 0.5, accuracy ε > 0 and coefficient γ = 0.1, the trial step is expanded by β = 1.5β while h keeps decreasing, and the interval [α, β] is then refined with θ = α + γ(β − α) until |α − β| ≤ ε).
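Reading the flow chart as a bracket-and-refine line search, the following Python sketch is our reconstruction of Figure 3 (the function name and some branch details that are hard to recover from the extracted flow chart are assumptions): the trial step is expanded by a factor of 1.5 while the cross-section h keeps decreasing, and the bracket [α, β] is then contracted with the interior point θ = α + γ(β − α) until |α − β| ≤ ε, returning a step that lies just beyond the cleft bottom.

```python
import numpy as np

def cleft_overstep_step(J, u, s, a=0.5, gamma=0.1, eps=1e-3, max_iter=100):
    """Sketch of the step-length search of Figure 3 (our reconstruction,
    not the authors' code). h(alpha) = J(u + alpha * s) is the
    cross-section (4); the bracket [alpha, beta] is expanded by a factor
    of 1.5 while h keeps decreasing, then contracted with the interior
    point theta = alpha + gamma * (beta - alpha) until |alpha - beta| <= eps."""
    h = lambda alpha: J(u + alpha * s)
    alpha, beta = 0.0, a
    if h(beta) < h(alpha):
        # Still descending at beta: push the bracket forward.
        for _ in range(max_iter):
            if h(1.5 * beta) >= h(beta):
                break
            alpha, beta = beta, 1.5 * beta
        beta = 1.5 * beta
    # The cleft bottom along s now lies inside [alpha, beta]; shrink the
    # bracket while keeping its right end just past the bottom.
    while abs(alpha - beta) > eps:
        theta = alpha + gamma * (beta - alpha)
        if h(theta) <= h(alpha):
            alpha = theta
        else:
            beta = theta
    return beta  # a step length that "oversteps" the minimum along s

# Hypothetical usage on a quadratic objective with minimum at u = (1, 1):
J = lambda u: float(np.sum((u - 1.0) ** 2))
u0 = np.zeros(2)
s0 = -2.0 * (u0 - 1.0)                   # negative gradient at u0
print(cleft_overstep_step(J, u0, s0))    # close to (and slightly beyond) 0.5
```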
The network is trained by the backpropagation procedure combined with the learning step computed by the cleft-overstep principle; the cleft-overstep algorithm was presented in section 2. Since the steepest-descent method is used to update the network weights, we need the partial derivative of the error function with respect to each weight; that is, we must determine the update formulas and algorithms for the weights of the hidden layer and the output layer. For a given sample set, the derivative of the error function is obtained by summing the derivatives over the samples in that set. The analysis and the derivatives are based on the chain rule. The slope of the tangent to the error curve in the cross-section along the w-axis is the partial derivative of the error function J with respect to that weight, denoted ∂J/∂w; using the chain rule we have
$$\frac{\partial J}{\partial w} = \frac{\partial J}{\partial w_1}\,\frac{\partial w_1}{\partial w_2}\cdots\frac{\partial w_n}{\partial w}$$
Adjusting the weights of the output layer:
Define:
b: a weight of the output layer
z: the output of the output layer
t: the desired target value
y_j: the output of neuron j in the hidden layer
v: the weighted sum $v = \sum_{j=0}^{M-1} b_j y_j$, so ∂v/∂b_j = y_j (ignoring the index of the neuron in the output layer)
We use J = 0.5 (z − t)^2, so ∂J/∂z = (z − t). The activation function of the output-layer neuron is the sigmoid z = g(v), with ∂z/∂v = z(1 − z).
We have:
$$\frac{\partial J}{\partial b} = \frac{\partial J}{\partial z}\,\frac{\partial z}{\partial v}\,\frac{\partial v}{\partial b} = (z - t)\, z (1 - z)\, y \qquad (9)$$
From this we obtain the update formula for the output-layer weights (ignoring the indices):
$$\Delta b = \alpha\,(z - t)\, z (1 - z)\, y \qquad (10)$$
We will use formula (10) [4] in the procedure DIEUCHINHTRONGSO() to adjust the weights of the output layer; the learning rate α is computed according to the cleft-overstep principle.
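As an illustration of how formula (10) can be applied inside such a weight-adjustment step, here is a short Python sketch (our own naming and array shapes; the paper's DIEUCHINHTRONGSO() routine itself is written in Visual C++ and is not reproduced here, and the descent sign convention is our assumption):

```python
import numpy as np

def adjust_output_weights(b, y, z, t, alpha):
    """Update the output-layer weights using formula (10):
    delta_b = alpha * (z - t) * z * (1 - z) * y, per output neuron.

    b: (n_hidden, n_out) output-layer weight matrix
    y: (n_hidden,) hidden-layer outputs
    z: (n_out,) network outputs, t: (n_out,) desired targets
    alpha: learning step from the cleft-overstep rule."""
    delta = (z - t) * z * (1.0 - z)           # (z - t) z (1 - z) for each output neuron
    return b - alpha * np.outer(y, delta)     # subtract to descend the error surface
```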
Adjusting the weights of the hidden layer:
The derivative of the objective function with respect to a weight of the hidden layer is computed by the chain rule:
$$\frac{\partial J}{\partial a} = \frac{\partial J}{\partial y}\,\frac{\partial y}{\partial u}\,\frac{\partial u}{\partial a}$$
Define:
a: a weight of the hidden layer
y: the output of the neuron in the hidden layer
x_i: the components of the input vector of the input layer
u: the weighted sum $u = \sum_{i=0}^{N-1} a_i x_i$, so ∂u/∂a_i = x_i
k: the index of a neuron in the output layer
The derivative of the objective function with respect to a hidden-layer weight is
$$\frac{\partial J}{\partial a_i} = \sum_{k=0}^{K-1} (z_k - t_k)\, z_k (1 - z_k)\, b_k\, y (1 - y)\, x_i \qquad (11)$$
From this, the adjustment formula for the hidden-layer weights is
$$\Delta a_i = \alpha \sum_{k=0}^{K-1} (z_k - t_k)\, z_k (1 - z_k)\, b_k\, y (1 - y)\, x_i \qquad (12)$$
Figure 4: Structure of neural network for
recognition
In these formulas, the index i denotes the i-th neuron of the input layer and the index k denotes the k-th neuron of the output layer. We will use formula (12) [4] in the procedure DIEUCHINHTRONGSO() to adjust the weights of the hidden layer; the learning rate α is computed according to the cleft-overstep principle.
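A matching sketch for formula (12), under the same assumptions on naming, shapes and sign convention as the output-layer sketch above, might look like this:

```python
import numpy as np

def adjust_hidden_weights(a, x, y, z, t, b, alpha):
    """Update the hidden-layer weights using formula (12):
    delta_a_i = alpha * sum_k (z_k - t_k) z_k (1 - z_k) b_k * y (1 - y) * x_i.

    a: (n_in, n_hidden) hidden-layer weight matrix
    x: (n_in,) input vector, y: (n_hidden,) hidden outputs
    z, t: (n_out,) outputs and targets, b: (n_hidden, n_out) output weights
    alpha: learning step from the cleft-overstep rule."""
    delta_out = (z - t) * z * (1.0 - z)           # back-propagated output error
    delta_hid = (b @ delta_out) * y * (1.0 - y)   # sum over output neurons, times y(1 - y)
    return a - alpha * np.outer(x, delta_hid)     # subtract to descend the error surface
```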
The network structure
We use the sigmoid activation function, which is prone to produce a narrow cleft in the network quality surface:
$$f(x) = \frac{1}{1 + \exp(-x)} \qquad (13)$$
Example
The characters to be recognized are the digits 0, 1, ..., 9 [1]. We compare the convergence of three learning-step methods: the cleft-overstep principle, a fixed step, and a gradually decreasing step.
We use a 5 × 7 = 35 matrix to encode each character. Each input vector x is thus a vector of size 35 × 1 whose components take the value 0 or 1, so we choose an input layer with 35 inputs. To distinguish ten characters, the output layer has 10 neurons. For the hidden layer, five neurons are selected, so:
Hidden-layer weight matrix W1,1, size 35 × 5
Output-layer weight matrix W2,1, size 5 × 10
Input vector x, size 35 × 1
Hidden-layer output vector y, size 5 × 1
Output-layer output vector z, size 10 × 1
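To make the stated dimensions concrete, the following Python sketch wires the 35-5-10 sigmoid network together for a forward pass (the weights are random placeholders rather than trained values, biases are omitted for brevity, and the example input is hypothetical):

```python
import numpy as np

def sigmoid(x):
    # Activation function (13): f(x) = 1 / (1 + exp(-x))
    return 1.0 / (1.0 + np.exp(-x))

# Weight matrices with the sizes stated in the text (random placeholders;
# the real weights are obtained by training).
rng = np.random.default_rng(0)
W1 = rng.uniform(-0.5, 0.5, size=(35, 5))    # hidden-layer weights, 35 x 5
W2 = rng.uniform(-0.5, 0.5, size=(5, 10))    # output-layer weights, 5 x 10

def forward(x):
    """Forward pass of the 35-5-10 recognition network.
    x: 35-element 0/1 vector encoding a 5 x 7 character bitmap."""
    y = sigmoid(x @ W1)   # hidden-layer output, size 5
    z = sigmoid(y @ W2)   # output-layer output, size 10, one value per digit
    return y, z

# Hypothetical usage: a 5 x 7 bitmap flattened into a 35-element vector.
x = np.zeros(35)
y, z = forward(x)
print(np.argmax(z))       # index of the most active output neuron (recognized digit)
```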
After compiling the program with Visual C++ and running it, we trained the network in turn by the three methods: fixed learning step, gradually decreasing step, and cleft-overstep. With each method we trained 20 times. The results are given in the following table:
Table 1: Input sample file records {0 1 2 3 4 5 6 7 8 9}
No  Fixed step 0.2  Gradually decreasing step from 1  Cleft-overstep
1 Fail Fail Fail
2 7902 (iteration) 3634 (iteration) 23 (iteration)
3 7213 2416 50
4 Fail 2908 34
5 12570 2748 31
6 9709 3169 42
7 Fail 2315 43
8 9173 2375 33
9 Fail Fail 34
10 8410 2820 33
11 10333 2618 32
12 12467 2327 39
13 Fail 3238 44
14 9631 2653 Fail
15 12930 2652 31
16 10607 Fail 53
17 Fail 2792 31
18 7965 2322 42
19 11139 2913 42
20 Fail 2689 33
Summary: Fixed step: average 10003 iterations, 7 fails/20. Gradually decreasing step: average 2740 iterations, 3 fails/20. Cleft-overstep: average 35 iterations, 2 fails/20.
Comments:
We trained the network with the three different methods and found that learning by the cleft-overstep principle converges much faster, and the number of failed runs is also reduced.
One drawback of the cleft-overstep principle is that the computation time per iteration is long, because the constant FD = 1e-4 that we defined is small. However, the total network training time is still more favorable.
CONCLUSION
In this paper, the authors have successfully proposed the use of the "cleft-overstep" algorithm to improve the training of neural networks with special error surfaces and have illustrated it specifically through a handwriting-recognition application. Research and experimental results have shown that, for a network structure whose error surface has a deep cleft shape, keeping the backpropagation procedure but applying the "cleft-overstep" step to train the network gives higher accuracy and faster convergence than the gradient method.
The "cleft-overstep" algorithm can be applied to train any neural network structure that has such a special error surface. Thus, the results of this study can be applied to many other problems in telecommunications, control, and information technology.
Further research is needed on identifying the search-direction vector in the "cleft-overstep" algorithm and on changing the evaluation criterion of the quality function in order to reduce the computational complexity on the computer [6]. However, the results of this study have initially confirmed the correctness of the proposed algorithm and revealed possibilities for practical application.
REFERENCES
[1]. Cong Huu Nguyen, Thanh Nga Thi Nguyen, Phuong Huy Nguyen (2011), "Research on the application of genetic algorithm combined with the 'cleft-overstep' algorithm for improving learning process of MLP neural network with special error surface", Natural Computation (ICNC), 2011 Seventh International Conference on, 26-28 July 2011, pp. 222-227.
[2]. Maciej Lawrynczuk (2010), "Training of neural models for predictive control", Institute of Control and Computation Engineering, Faculty of Electronics and Information Technology, Warsaw University of Technology, ul. Nowowiejska 15/19, 00-665 Warsaw, Poland, Neurocomputing 73.
[3]. Thuc Nguyen Dinh & Hai Hoang Duc, Artificial Intelligence – Neural Network, Method and Application, Educational Publisher, Ha Noi.
[4]. Nguyen Van Manh and Bui Minh Tri, "Method of 'cleft-overstep' by perpendicular direction for solving the unconstrained nonlinear optimization problem", Acta Mathematica Vietnamica, vol. 15, no. 2, 1990.
[5]. Hagan, M.T., H.B. Demuth and M.H. Beale, Neural Network Design, PWS Publishing Company, Boston, 1996.
[6]. R.K. Al Seyab, Y. Cao (2007), "Nonlinear system identification for predictive control using continuous time recurrent neural networks and automatic differentiation", School of Engineering, Cranfield University, College Road, Cranfield, Bedford MK43 0AL, UK, Science Direct.
SUMMARY
A STUDY TO IMPROVE A LEARNING ALGORITHM OF NEURAL NETWORKS
Nguyễn Hữu Công*1, Nguyễn Thị Thanh Nga2, Đồng Văn Ngọc3
1Thai Nguyen University, 2College of Technology – TNU, 3Ha Noi Vocational College of Electrical Mechanics
Since the middle of the twentieth century, the study of optimization algorithms, especially with the development of digital computers, has increasingly become an important field of mathematics. Nowadays, these mathematical tools are applied to training neural networks. The search for an optimal algorithm that minimizes the convergence time or avoids weak minima and local minima starts from studying the characteristics of the error surface. For complex error surfaces such as cleft-shaped surfaces, whose contours are stretched and bent to form a cleft and a cleft axis, the old algorithms cannot cope. This paper proposes an algorithm to improve convergence and the ability to escape from undesired areas on special error surfaces.
Keywords: neural networks, special error surface, local minima, optimization, learning algorithms