Elman Net

Dynamics of RNN

\bm{x}_o^{(t)} &= \bm{a}_o( \bm{u}_o^{(t)} ) \\
\bm{x}_c^{(t)} &= \bm{a}_c( \bm{u}_c^{(t)} ) \\
\bm{u}_o^{(t)} &= W_{oc}\, \bm{x}_c^{(t)} + \bm{b}_o \\
\bm{u}_c^{(t)} &= (\bm{1}-\bm{e}_c)\bm{u}_c^{(t-1)} +
\bm{e}_c (W_{cc}\, \bm{x}_c^{(t-1)} + W_{ci}\, \bm{x}_i^{(t)} + \bm{b}_c)

Note that products of vectors are taken element-wise, (\bm{x}\bm{u})_i = x_i u_i, and \bm{1} = (1,1,\cdots)^T .

Network states and biases:

Layer    Activated values  Potentials  Activation func.  Bias
Output   \bm{x}_o          \bm{u}_o    \bm{a}_o          \bm{b}_o
Context  \bm{x}_c          \bm{u}_c    \bm{a}_c          \bm{b}_c
Input    \bm{x}_i          -           -                 -

Network weights:

To layer  From layer  Weights
Output    Context     W_{oc}
Context   Context     W_{cc}
Context   Input       W_{ci}

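\bm{e}_c is a time-constant-like variable: each component sets how much of the newly computed drive is mixed into the context potential at every step. For a typical RNN, set \bm{e}_c = \bm{1}, which reduces the update to \bm{u}_c^{(t)} = W_{cc}\, \bm{x}_c^{(t-1)} + W_{ci}\, \bm{x}_i^{(t)} + \bm{b}_c.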
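
The dynamics above can be sketched as a single NumPy time step. This is a minimal illustration, not PyRNN's own code: the function name elman_step, the choice of tanh for \bm{a}_c, the identity for \bm{a}_o, and the random parameter values are all assumptions made for the example.

```python
import numpy as np

def elman_step(u_c_prev, x_i, W_oc, W_cc, W_ci, b_o, b_c, e_c,
               a_c=np.tanh, a_o=lambda u: u):
    """One time step: returns (x_o^{(t)}, u_c^{(t)}).

    u_c_prev is u_c^{(t-1)} and x_i is the input x_i^{(t)}.
    a_c = tanh and a_o = identity are assumptions of this sketch.
    """
    x_c_prev = a_c(u_c_prev)                      # x_c^{(t-1)}
    drive = W_cc @ x_c_prev + W_ci @ x_i + b_c    # recurrent + input + bias
    u_c = (1.0 - e_c) * u_c_prev + e_c * drive    # element-wise leaky update
    x_o = a_o(W_oc @ a_c(u_c) + b_o)              # output reads x_c^{(t)}
    return x_o, u_c

# Hypothetical sizes and random parameters, for illustration only.
rng = np.random.default_rng(0)
n_i, n_c, n_o = 3, 4, 2
W_oc = rng.normal(size=(n_o, n_c))
W_cc = rng.normal(size=(n_c, n_c))
W_ci = rng.normal(size=(n_c, n_i))
b_o, b_c = rng.normal(size=n_o), rng.normal(size=n_c)
u_prev, x_in = rng.normal(size=n_c), rng.normal(size=n_i)

# e_c = 1: the old potential is fully replaced (typical RNN).
x_out, u_plain = elman_step(u_prev, x_in, W_oc, W_cc, W_ci, b_o, b_c,
                            e_c=np.ones(n_c))
# e_c = 0: the potential never changes.
_, u_frozen = elman_step(u_prev, x_in, W_oc, W_cc, W_ci, b_o, b_c,
                         e_c=np.zeros(n_c))
```

Values of \bm{e}_c between 0 and 1 interpolate between these two cases, unit by unit.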

BPTT

The delta error dE/d u_c^{(t)} is propagated backward in time as follows:

\frac{dE}{d u_c^{(t-1)}} &=
\sum_{o} \frac{dE}{d u_{o}^{(t-1)}} w_{oc} a_c'(u_c^{(t-1)})
+ \sum_{c'} \frac{dE}{d u_{c'}^{(t)}} \frac{d u_{c'}^{(t)}}{d u_c^{(t-1)}} \\
\frac{d u_{c'}^{(t)}}{d u_c^{(t-1)}} &= (1 - e_{c'}) \delta_{c' c}
+ e_{c'} w_{c' c} a_c'(u_c^{(t-1)})

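Here a_c' denotes the derivative of a_c, and I assume d a_c(u_c)/d u_{c'} = 0 if c \neq c', i.e. each context unit's activation depends only on its own potential.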
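
As a worked step, the Jacobian d u_{c'}^{(t)} / d u_c^{(t-1)} can be obtained by writing the context update component-wise and differentiating term by term. The summation index c'' is introduced here only for this illustration.

```latex
u_{c'}^{(t)} &= (1 - e_{c'})\, u_{c'}^{(t-1)}
+ e_{c'} \Bigl( \sum_{c''} w_{c' c''}\, a_{c''}(u_{c''}^{(t-1)})
+ \sum_{i} w_{c' i}\, x_i^{(t)} + b_{c'} \Bigr) \\
\frac{d u_{c'}^{(t)}}{d u_c^{(t-1)}} &= (1 - e_{c'})\, \delta_{c' c}
+ e_{c'} \sum_{c''} w_{c' c''}\, a_{c''}'(u_{c''}^{(t-1)})\, \delta_{c'' c} \\
&= (1 - e_{c'})\, \delta_{c' c} + e_{c'}\, w_{c' c}\, a_c'(u_c^{(t-1)})
```

The input and bias terms do not depend on u_c^{(t-1)} and drop out, and the element-wise activation assumption is what collapses the sum over c'' to the single term c'' = c.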

Gradient with respect to each parameter:

\frac{dE}{d w_{oc}}  &= \sum_{t} \frac{dE}{d u_{o}^{(t)}} x_c^{(t)}\\
\frac{dE}{d b_{o}}   &= \sum_{t} \frac{dE}{d u_{o}^{(t)}}\\
\frac{dE}{d w_{cc'}} &= \sum_{t} \frac{dE}{d u_{c}^{(t)}} e_c x_{c'}^{(t-1)}\\
\frac{dE}{d w_{ci}}  &= \sum_{t} \frac{dE}{d u_{c}^{(t)}} e_c x_{i}^{(t)}\\
\frac{dE}{d b_{c}}   &= \sum_{t} \frac{dE}{d u_{c}^{(t)}} e_c\\
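
The recursion and the gradient sums above can be checked numerically. The following is a minimal sketch under stated assumptions, not PyRNN's implementation: it assumes a_c = tanh, an identity output activation, a squared-error loss E = (1/2) \sum_t |x_o^{(t)} - y^{(t)}|^2 (so that dE/d u_o^{(t)} = x_o^{(t)} - y^{(t)}), and a zero initial potential. The names forward, loss, bptt and numeric_grad are hypothetical.

```python
import numpy as np

def forward(p, xs, u0):
    """Return us (us[t] = u_c^{(t)}, us[0] = u0) and outs (outs[t-1] = u_o^{(t)})."""
    us, outs = [u0], []
    for x in xs:
        drive = p["W_cc"] @ np.tanh(us[-1]) + p["W_ci"] @ x + p["b_c"]
        us.append((1 - p["e_c"]) * us[-1] + p["e_c"] * drive)
        outs.append(p["W_oc"] @ np.tanh(us[-1]) + p["b_o"])
    return us, outs

def loss(p, xs, ys, u0):
    _, outs = forward(p, xs, u0)
    return 0.5 * sum(np.sum((o - y) ** 2) for o, y in zip(outs, ys))

def bptt(p, xs, ys, u0):
    us, outs = forward(p, xs, u0)
    g = {k: np.zeros_like(v) for k, v in p.items() if k != "e_c"}
    d_next = np.zeros_like(u0)            # dE/du_c^{(t+1)}, zero past the end
    for t in range(len(xs), 0, -1):
        d_o = outs[t - 1] - ys[t - 1]     # dE/du_o^{(t)} for the assumed loss
        da = 1 - np.tanh(us[t]) ** 2      # a_c'(u_c^{(t)})
        # Output term plus the transposed Jacobian applied to d_next.
        d_c = ((p["W_oc"].T @ d_o) * da
               + (1 - p["e_c"]) * d_next
               + (p["W_cc"].T @ (p["e_c"] * d_next)) * da)
        g["W_oc"] += np.outer(d_o, np.tanh(us[t]))
        g["b_o"] += d_o
        g["W_cc"] += np.outer(d_c * p["e_c"], np.tanh(us[t - 1]))
        g["W_ci"] += np.outer(d_c * p["e_c"], xs[t - 1])
        g["b_c"] += d_c * p["e_c"]
        d_next = d_c
    return g

def numeric_grad(p, xs, ys, u0, name, idx, h=1e-6):
    """Central finite difference of the loss w.r.t. one parameter entry."""
    q = {k: v.copy() for k, v in p.items()}
    q[name][idx] += h
    up = loss(q, xs, ys, u0)
    q[name][idx] -= 2 * h
    return (up - loss(q, xs, ys, u0)) / (2 * h)

# Hypothetical sizes and random data, for illustration only.
rng = np.random.default_rng(0)
n_i, n_c, n_o, T = 3, 4, 2, 5
p = {"W_oc": rng.normal(size=(n_o, n_c)), "W_cc": rng.normal(size=(n_c, n_c)),
     "W_ci": rng.normal(size=(n_c, n_i)), "b_o": rng.normal(size=n_o),
     "b_c": rng.normal(size=n_c), "e_c": rng.uniform(0.2, 1.0, size=n_c)}
xs = [rng.normal(size=n_i) for _ in range(T)]
ys = [rng.normal(size=n_o) for _ in range(T)]
u0 = np.zeros(n_c)

g = bptt(p, xs, ys, u0)
max_err = max(abs(g[k][idx] - numeric_grad(p, xs, ys, u0, k, idx))
              for k in g for idx in np.ndindex(g[k].shape))
print("largest |analytic - numeric| gradient difference:", max_err)
```

If the formulas are implemented correctly, the printed difference should be small compared with the gradients themselves. The backward loop stores only one delta vector at a time, since each step of the recursion needs just the delta from the following time step.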
