API documentation for all exported functions is listed here:
Chains
NNHelferlein.AbstractNN
— Typeabstract type AbstractNN
Mother type for AbstractNN hierarchy with implementation for a chain of layers.
Signatures:
(m::AbstractNN)(x): run the AbstractArray x through all layers and return the output.
(m::AbstractNN)(x,y): calculate the loss for one minibatch x and teaching input y.
(m::AbstractNN)(d::Knet.Data): calculate the loss for all minibatches in d.
(m::AbstractNN)(d::Tuple): calculate the loss for all minibatches in d.
(m::AbstractNN)(d::NNHelferlein.DataLoader): calculate the loss for all minibatches in d if teaching input is included (i.e. elements of d are tuples). Otherwise return the output of all minibatches as one array with samples as columns.
NNHelferlein.AbstractChain
— Typeabstract type AbstractChain
Mother type for AbstractChain hierarchy with implementation for a chain of layers. By default every AbstractChain has a property layers with an iterable list of AbstractLayers or AbstractChains that are executed recursively.
Non-standard chains in which layers are not executed sequentially (such as ResNetBlocks) must provide a custom implementation with the signature chain(x).
Signatures:
(m::AbstractChain)(x): run the AbstractArray x through all layers and return the output.
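The recursive execution described above can be illustrated with a minimal sketch in plain Julia. The types SketchChain and SketchSkipBlock and the helper functions are hypothetical stand-ins, not the NNHelferlein types; the sketch only assumes that every layer is callable.

```julia
# Hypothetical stand-in for a sequential chain (not the NNHelferlein type).
struct SketchChain
    layers::Vector{Any}   # callable layers or nested chains
end

# Run x through all layers in order; nested chains recurse via the same call.
(c::SketchChain)(x) = foldl((h, layer) -> layer(h), c.layers; init=x)

# A non-sequential block provides its own call implementation:
# here the input is added to the output of the main path (skip connection).
struct SketchSkipBlock
    layers::Vector{Any}
end
(b::SketchSkipBlock)(x) = x .+ foldl((h, layer) -> layer(h), b.layers; init=x)

double(x) = 2 .* x
inc(x) = x .+ 1

inner = SketchChain([double, inc])
outer = SketchChain([inner, double])   # a chain containing a chain
block = SketchSkipBlock([double])
```

Calling outer(x) first runs the nested chain and then the remaining layer, while block(x) shows why a non-sequential chain needs a custom call method.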
NNHelferlein.add_layer!
— Functionfunction add_layer!(n::Union{NNHelferlein.AbstractNN, NNHelferlein.AbstractChain}, l)
Add a layer l or a chain to a model n. The layer is always added at the end of the chains. The modified model is returned.
Base.:+
— Functionfunction +(n::Union{NNHelferlein.AbstractNN, NNHelferlein.AbstractChain}, l::Union{AbstractLayer, AbstractChain})
function +(l1::AbstractLayer, l2::Union{AbstractLayer, AbstractChain})
The plus operator is overloaded so that layers and chains can be added to a network.
The second form returns a new chain when two layers are added.
Example:
julia> mdl = Classifier() + Dense(2,5)
julia> print_network(mdl)
NNHelferlein neural network summary:
Classifier with 1 layers, 15 params
Details:
Dense layer 2 → 5 with sigm, 15 params
Total number of layers: 1
Total number of parameters: 15
julia> mdl = mdl + Dense(5,5) + Dense(5,1, actf=identity)
julia> print_network(mdl)
NNHelferlein neural network summary:
Classifier with 3 layers, 51 params
Details:
Dense layer 2 → 5 with sigm, 15 params
Dense layer 5 → 5 with sigm, 30 params
Dense layer 5 → 1 with identity, 6 params
Total number of layers: 3
Total number of parameters: 51
NNHelferlein.Classifier
— Typestruct Classifier <: AbstractNN
Classifier with default nll loss. An alternative loss function can be supplied as keyword argument. The function must provide a signature to be called as loss(model(x), y).
Constructors:
Classifier(layers...; loss=Knet.nll)
Signatures:
(m::Classifier)(x,y) = m.loss(m(x), y)
NNHelferlein.Regressor
— Typestruct Regressor <: AbstractNN
Regression network with square loss as loss function.
Constructors:
Regressor(layers...; loss=mean_squared_error)
Signatures:
(m::Regressor)(x,y) = mean(abs2, Array(m(x)) - y)
NNHelferlein.Transformer
— Typemutable struct Transformer
A Bert-like transformer network consisting of an encoder and a decoder stack.
Constructor:
Transformer(n_layers, depth, heads; drop_rate=0.1)
n_layers: number of layers in encoder and decoder
depth: embedding depth
heads: number of heads for the multi-head attention
drop_rate: dropout rate used in all layers
Signature:
(tf::Transformer)(x, y; enc_mask=nothing, dec_mask=nothing)
The transformer is called with two 3-d arrays of embedded sequences x and y of size [depth, seq_len, n_minibatch] and returns a tensor of size [depth, seq_len_y, n_minibatch]. Sequences x and y may be of different lengths; the output always has the same dimensions as y.
Attention factors of the last run are stored in the field α of the transformer object.
enc_mask and dec_mask are optional padding masks for the encoder and decoder input, respectively. They must be of size [seq_len, n_minibatch].
NNHelferlein.TokenTransformer
— Typemutable struct TokenTransformer
A wrapper around the Transformer object that takes sequences of token ids as input.
Constructor:
TokenTransformer(n_layers, depth, heads,
x_vocab, y_vocab;
drop_rate=0.1)
n_layers: number of layers in encoder and decoder
depth: embedding depth
heads: number of heads for the multi-head attention
x_vocab: vocabulary size of the input sequences as integer value or a WordTokenizer object
y_vocab: vocabulary size of the output sequences as integer value or a WordTokenizer object
drop_rate: dropout rate used in all layers
Signature:
(tt::TokenTransformer)(x, y; enc_mask=nothing, dec_mask=nothing,
embedded=true)
The transformer is called with two 2-d arrays of token ids x and y of size [seq_len, n_minibatch], which may be of different lengths. It returns a tensor of size [y_vocab, seq_len_y, n_minibatch] with the raw activations of the output neurons or, if embedded is set to false, a 2-d array of size [seq_len_y, n_minibatch] with the sequences of generated tokens.
NNHelferlein.Chain
— Typestruct Chain <: AbstractChain
Simple wrapper to chain layers and execute them one after another.
NNHelferlein.VAE
— Typestruct VAE <: AbstractNN
Type for a generic variational autoencoder.
Constructor:
VAE(encoder, decoder)
Separate predefined chains (ideally, but not necessarily, of type Chain) for encoder and decoder must be specified. The VAE needs the 2 parameters mean and variance to define the distribution of each code neuron in the bottleneck layer. In consequence, the encoder output must be 2 times the size of the decoder input (in case of dense layers: if the encoder output is an 8-value vector, 4 codes are defined and the decoder input is a 4-value vector; in case of convolutional layers the number of encoder output channels must be 2 times the number of decoder input channels - see the examples).
Signatures:
(vae::VAE)(x)
(vae::VAE)(x,y)
Called with one argument, predict will be executed; with two arguments (args x and y should be identical for the autoencoder) the loss will be returned.
Details:
The loss is calculated as the sum of element-wise error squares plus the Kullback-Leibler-Divergence to adapt the distributions of the bottleneck codes:
\[\mathcal{L} = \frac{1}{2} \sum_{i=1}^{n_{outputs}} (t_{i}-o_{i})^{2} - \frac{1}{2} \sum_{j=1}^{n_{codes}}(1 + \ln\sigma_{c_j}^{2}-\mu_{c_j}^{2}-\sigma_{c_j}^{2}) \]
Output of the autoencoder is cropped to the size of input before loss calculation (and before prediction); i.e. the output has always the same dimensions as the input, even if the last layer generates a bigger shape.
KL-training parameters:
The parameter β is by default set to 1.0, i.e. mean-squared error and KL divergence have the same weight. The functions set_beta!(vae, beta) and get_beta(vae) can be used to set and get the β used in training. With β=0.0 no KL loss is used.
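The loss formula above can be traced with a small sketch in plain Julia. The function name vae_loss_sketch is hypothetical, and the layout of the encoder output (first half means, second half log-variances) is an assumption made for illustration; the library may organise the codes differently.

```julia
# Sketch of the VAE loss from the formula above (not the library implementation).
# t: targets, o: outputs, h: encoder output with 2 values per code.
# Assumption: h holds the means first and the log-variances second.
function vae_loss_sketch(t, o, h; β=1.0)
    n_codes = length(h) ÷ 2
    μ = h[1:n_codes]
    logvar = h[n_codes+1:end]

    # 1/2 * sum of element-wise squared errors
    reconstruction = 0.5 * sum(abs2, t .- o)

    # -1/2 * sum(1 + ln σ² - μ² - σ²), with σ² = exp(logvar)
    kl = -0.5 * sum(1 .+ logvar .- μ .^ 2 .- exp.(logvar))

    return reconstruction + β * kl
end

t = [1.0, 2.0]
o = [1.0, 0.0]
h = [1.0, 0.0, 0.0, 0.0]   # 2 codes: μ = [1, 0], ln σ² = [0, 0]
```

With β=0.0 only the reconstruction term remains, which matches the statement that no KL loss is used in that case.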
NNHelferlein.get_beta
— Functionfunction get_beta(vae::VAE; ramp=false)
Return a Dict with the current VAE parameters beta and ramp-up.
Arguments:
ramp=false: if true, a vector of β for all ramp-up steps is returned. This way, the ramp-up phase can be visualised (figure: ./assets/vae-beta-range.png).
NNHelferlein.set_beta!
— Functionfunction set_beta!(vae::VAE, βmax; ramp_up=false, steps=0)
Helper to set the current value of the VAE-parameter beta and ramp-up settings.
VAE loss is calculated as (mean of error squares) + β * (mean of KL divergence).
Ramp-up:
In case of ramp_up=true, β starts with almost 0.0 (sigm(-10.0) ≈ 4.5e-5) and reaches almost 1.0 after steps steps, following a sigmoid curve. steps should be more than 25 to avoid rounding errors in the calculation of the derivative of the sigmoid function.
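A sigmoid ramp-up of this kind could look like the following sketch. The function beta_ramp_sketch is hypothetical, and the assumption that the sigmoid argument runs linearly from -10 to +10 over the ramp is inferred from the sigm(-10.0) start value; the actual schedule used by the library may differ.

```julia
# Logistic sigmoid, defined locally to keep the sketch self-contained.
sigm_sketch(x) = 1 / (1 + exp(-x))

# Hypothetical ramp: the sigmoid argument moves linearly from -10 to +10,
# so β rises from almost 0 to almost βmax within `steps` steps.
function beta_ramp_sketch(βmax, steps)
    args = range(-10.0, stop=10.0, length=steps)
    return [βmax * sigm_sketch(a) for a in args]
end

ramp = beta_ramp_sketch(1.0, 50)
```

Plotting ramp against the step index gives the S-shaped curve that the ramp-up follows.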
Layers
NNHelferlein.AbstractLayer
— Typeabstract type AbstractLayer
abstract type Layer
Mother type for layers hierarchy. (The type Layer is kept for backward compatibility.)
Fully connected layers
NNHelferlein.Dense
— Typestruct Dense <: AbstractLayer
Default Dense layer.
Constructors:
Dense(w, b, actf): default constructor, w are the weights and b the bias.
Dense(i::Int, j::Int; actf=sigm, init=..): layer of j neurons with i inputs. Initialiser is xavier_uniform for actf=sigm and xavier_normal otherwise.
Dense(h5::HDF5.File, group::String; trainable=false, actf=sigm): kernel and bias are loaded by the specified group.
Dense(h5::HDF5.File, kernel::String, bias::String; trainable=false, actf=sigm): layer imported from an HDF5 file from TensorFlow with the HDF object h5 and the group name group.
NNHelferlein.Linear
— Typestruct Linear <: AbstractLayer
Almost standard dense layer, but functionality inspired by the TensorFlow-layer:
- capable of working with input tensors of any number of dimensions
- default activation function identity
- optionally without biases.
The shape of the input tensor is preserved; only the size of the first dim is changed from in to out.
Constructors:
Linear(i::Int, j::Int; bias=true, actf=identity, init=xavier_normal): where i is fan-in and j is fan-out.
Keyword arguments:
bias=true: if false, biases are fixed to 0.0
actf=identity: activation function.
NNHelferlein.Embed
— Typestruct Embed <: AbstractLayer
Simple type for an embedding layer to embed a virtual onehot-vector into a smaller number of neurons by linear combination. The onehot-vector is virtual, because not the vector, but only the index of the "one" in the vector has to be provided as Integer value (or a minibatch of integers) with values between 1 and the vocab size.
Constructors:
Embed(v,d; actf=identity, mask=nothing): with vocab size v, embedding depth d and default activation function identity. mask defines the padding token (see below).
Signatures:
(l::Embed)(x): default embedding of input tensor x.
Value:
The embedding is constructed by adding a first dimension to the input tensor with number of rows = embedding depth. If x is a column vector, the value is a matrix. If x is a row vector or a matrix, the value is a 3-d array, etc.
Padding and masking:
If a token value is defined as mask, occurrences are embedded as zero vectors. This can be used for padding sequences with zeros. The masking/padding token counts towards the vocab size. If padding tokens are not masked, their embedding will be optimised during training (which is not recommended but still possible for many applications).
Zero may be used as padding token, but it must count towards the vocab size (i.e. the vocab size must be one larger than the number of tokens) and the keyword arg mask=0 must be specified.
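The "virtual one-hot" idea and the shape rule can be sketched in plain Julia by indexing the columns of a weight matrix. The function embed_sketch and the weight layout [depth, vocab] are assumptions for illustration, not the library code; the mask token in this sketch is a regular index between 1 and the vocab size.

```julia
# Embedding as column lookup: W has size [depth, vocab], x holds token ids.
# The result has size [depth, size(x)...], i.e. one new first dimension.
embed_sketch(W, x) = W[:, x]

# Variant with a masked token: its occurrences are embedded as zero vectors.
function embed_sketch(W, x, mask)
    e = W[:, x]
    keep = reshape(x .!= mask, 1, size(x)...)   # false at masked positions
    return e .* keep
end

W = reshape(collect(1.0:12.0), 3, 4)   # depth 3, vocab size 4
x = [2, 4, 1]                          # one sequence of 3 tokens
X = [1 2; 3 4]                         # a matrix of token ids
```

Here embed_sketch(W, x) is a 3×3 matrix and embed_sketch(W, X) a 3×2×2 array, in line with the rule that a first dimension of embedding depth is added.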
Convolutional
NNHelferlein.Conv
— Typestruct Conv <: AbstractLayer
Default Conv layer.
Constructors:
Conv(w1::Int, w2::Int, i::Int, o::Int; actf=relu, kwargs...): layer with o kernels of size (w1,w2) for an input of i channels.
Conv(w1::Int, w2::Int, w3::Int, i::Int, o::Int; actf=relu, kwargs...): layer with 3-dimensional kernels for 3D convolution (requires 5-dimensional input).
Conv(w1::Int, i::Int, o::Int; actf=relu, kwargs...): layer with o kernels of size (1,w1) for an input of i channels. This 1-dimensional convolution uses a 2-dimensional kernel with a first dimension of size 1. Input and output contain an empty first dimension of size 1. If padding, stride or dilation are specified, 2-tuples must be specified to correspond with the 2-dimensional kernel (e.g. padding=(0,1) for a 1-padding along the 1D sequence).
Constructors to read parameters from Tensorflow/Keras HDF-files:
Conv(h5::HDF5.File, kernel::String, bias::String; trainable=false, actf=Knet.relu, use_bias=true, kwargs...): import parameters from HDF file h5 with kernel and bias specifying the full path to weights and biases, respectively.
Conv(h5::HDF5.File, group::String; trainable=false, actf=relu, tf=true, use_bias=true): import a conv layer from a default TF/Keras HDF5 file. If tf=false, group defines the full path to the parameters group/kernel:0 and group/bias:0. If tf=true, group defines only the group name and parameters are addressed as model_weights/group/group/kernel:0 and model_weights/group/group/bias:0.
Keyword arguments:
padding=0: the number of extra zeros implicitly concatenated at the start and end of each dimension.
stride=1: the number of elements to slide to reach the next filtering window.
dilation=1: dilation factor for each dimension.
...: see the Knet documentation for details: https://denizyuret.github.io/Knet.jl/latest/reference/#Convolution-and-Pooling. All keywords to the Knet function conv4() are supported.
NNHelferlein.DeConv
— Typestruct DeConv <: AbstractLayer
Default deconvolution layer.
Constructors:
DeConv(w, b, actf, kwargs...): default constructor
DeConv(w1::Int, w2::Int, i::Int, o::Int; actf=relu, kwargs...): layer with o kernels of size (w1,w2) for an input of i channels.
DeConv(w1::Int, w2::Int, w3::Int, i::Int, o::Int; actf=relu, kwargs...): layer with o kernels of size (w1,w2,w3) for an input of i channels.
Keyword arguments:
padding=0: the number of extra zeros implicitly concatenated at the start and end of each dimension (applied to the output).
stride=1: the number of elements to slide to reach the next filtering window (applied to the output).
...: see the Knet documentation for details: https://denizyuret.github.io/Knet.jl/latest/reference/#Convolution-and-Pooling. All keywords to the Knet function deconv4() are supported.
NNHelferlein.ResNetBlock
— Typestruct ResNetBlock <: AbstractChain
Executable type for one block of a ResNet-type network.
Constructors:
ResNetBlock(layers; shortcut=[identity], post=[identity]): 3 chains to form the block: the main chain, the shortcut and a chain of layers to be added after the confluence. All chains must be specified as lists, even if they are empty ([]) or comprise only one layer ([BatchNorm]).
NNHelferlein.DepthwiseConv
— TypeDepthwiseConv <: AbstractLayer
Conv layer with separate filters per input channel. o output feature maps will be created by performing a convolution on only one input channel. o must be a multiple of i.
Constructors:
DepthwiseConv(w, b, actf; kwargs): default constructor
DepthwiseConv(w1::Int, w2::Int, i::Int, o::Int; actf=relu, kwargs...): layer with o kernels of size (w1,w2) for every input channel of a 2-d input of i layers. o must be a multiple of i; if o == i, each output feature map is generated from one channel. If o == n*i, n feature maps are generated from each channel.
Keyword arguments:
padding=0: the number of extra zeros implicitly concatenated at the start and end of each dimension.
stride=1: the number of elements to slide to reach the next filtering window.
dilation=1: dilation factor for each dimension.
NNHelferlein.Pool
— Typestruct Pool <: AbstractLayer
Pooling layer.
Constructors:
Pool(;kwargs...): max pooling; without kwargs, 2-pooling is performed.
Keyword arguments:
window=2: pooling window size (same for all directions)
...: see the Knet documentation for details: https://denizyuret.github.io/Knet.jl/latest/reference/#Convolution-and-Pooling. All keywords to the Knet function pool are supported.
NNHelferlein.UnPool
— Typestruct UnPool <: AbstractLayer
Unpooling layer.
Constructors:
UnPool(;kwargs...)
: user-defined unpooling
NNHelferlein.Pad
— Typestruct Pad <: AbstractLayer
Pad an n-dimensional array along dims with one of the types supported by Flux.NNlib.
Constructors:
Pad(padding::Int; type=:zeros, dims=nothing): pad with padding along all dims.
Keyword arguments:
type: one of :zeros (zero-padding), :ones (one-padding), :repeat (repeat values on the border), :reflect (reflect values across the border)
dims: Tuple of dims to be padded. If dims==nothing, all except the last 2 dimensions (i.e. channel and minibatch dimension for convolution layers) are padded.
Recurrent
NNHelferlein.RecurrentUnit
— Typeabstract type RecurrentUnit end
Supertype for all recurrent unit types. Self-defined recurrent units which are a child of RecurrentUnit can be used inside the 'Recurrent' layer.
Interface
All subtypes of RecurrentUnit must provide the following:
- a constructor with signature Type(n_inputs, n_units; kwargs) and arbitrary keyword arguments.
- an implementation of signature (o::Recurrent)(x) where x is a 3d- or 2d-array of shape [fan-in, mb-size, 1] or [fan-in, mb-size]. The function must compute one forward step, return the hidden state and set the internal fields h and optionally c.
- a field h (to store the last hidden state)
- an optional field c, if the cell state is to be stored such as in an LSTM unit.
NNHelferlein.Recurrent
— Typestruct Recurrent <: AbstractLayer
One-layer RNN that works with minibatches of (time) series data. A minibatch can be a 2- or 3-dimensional Array. If 2-d, inputs for one step are in one column and the Array has as many columns as steps. If 3-d, the last dimension iterates the samples of the minibatch.
The result is an array with the output of the units of all steps for all samples of the minibatch (with model depth as first and samples of the minibatch as last dimension).
Constructors:
Recurrent(n_inputs::Int, n_units::Int; u_type=:lstm,
bidirectional=false, allow_mask=false, o...)
n_inputs: number of inputs
n_units: number of units
u_type: unit type can be one of the Knet unit types (:relu, :tanh, :lstm, :gru) or a type which must be a subtype of RecurrentUnit and fulfill the respective interface (see the docs for RecurrentUnit).
bidirectional=false: if true, 2 layers of n_units units will be defined and run in forward and backward direction respectively. The hidden state is [2*n_units, mb] or [2*n_units, steps, mb] if return_all==true.
allow_mask=false: if masking is allowed, a slower algorithm is used to be able to ignore any masked step. Arbitrary sequence positions may be masked for any sequence.
Any keyword argument of Knet.RNN or a self-defined RecurrentUnit type may be provided.
Signatures:
function (rnn::Recurrent)(x; c=nothing, h=nothing, return_all=false,
mask=nothing)
The layer is called either with a 2-dimensional array of the shape [fan-in, steps] or a 3-dimensional array of [fan-in, steps, batchsize].
Arguments:
c=0, h=0: initialise the hidden and cell state. If nothing, states h or c keep their values. If c=0 or h=0, the states are reset to 0; otherwise an array of states of the correct dimensions can be supplied to be used as initial states.
return_all=false: if true, an array with all hidden states of all steps is returned (size is [units, time-steps, minibatch]). Otherwise only the hidden states of the last step are returned ([units, minibatch]).
mask: optional mask for the input sequence minibatch of shape [steps, minibatch]. Values in the mask must be 1.0 for masked positions or 0.0 otherwise and of type Float32 or CuArray{Float32} for GPU context. Appropriate masks can be generated with the NNHelferlein function mk_padding_mask().
Bidirectional layers can be constructed by specifying bidirectional=true
, if the unit-type supports it (Knet.RNN does). Please be aware that the actual number of units is 2 x n_units for bidirectional layers and the output dimension is [2 x units, steps, mb] or [2 x units, mb].
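The mask convention described above (shape [steps, minibatch], 1.0 for masked positions, Float32) can be illustrated with a small sketch. The helper padding_mask_sketch and the padding id 0 are assumptions for this example; it is not the mk_padding_mask() function of the library.

```julia
# Build a Float32 mask of the same shape as the token matrix x [steps, minibatch]:
# 1.0 where x holds the padding token, 0.0 elsewhere.
# Hypothetical helper, assuming the padding token id is passed as `pad`.
padding_mask_sketch(x; pad=0) = Float32.(x .== pad)

# Two sequences of length 3 (columns); the padding token is 0.
x = [5 7;
     3 0;
     0 0]
m = padding_mask_sketch(x; pad=0)
```

Each column of m marks the padded steps of one sequence, so shorter sequences in a minibatch simply carry more ones at their end.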
function get_hidden_states(l::<RNN_Type>; flatten=true)
Return the hidden states of one or more layers of an RNN. <RNN_Type> is one of NNHelferlein.Recurrent, Knet.RNN.
Arguments:
flatten=true: if the states tensor is 3d with a 3rd dim > 1, the array is transformed to [units, mb, 1] to represent all current states after the last step.
NNHelferlein.get_cell_states
— Functionfunction get_cell_states(l::<RNN_Type>; unbox=true, flatten=true)
Return the cell states of one or more layers of an RNN only if it is a LSTM (Long short-term memory).
Arguments:
unbox=true: by default, c is unboxed when called in @diff context (while AutoGrad is recording) to avoid unwanted dependencies of the computation graph (backprop should run via the hidden states, not the cell states).
flatten=true: if the states tensor is 3d with a 3rd dim > 1, the array is transformed to [units, mb, 1] to represent all current states after the last step.
function set_hidden_states!(l::<RNN_Type>, h)
Set the hidden states of one or more layers of an RNN to h.
NNHelferlein.set_cell_states!
— Functionfunction set_cell_states!(l::<RNN_Type>, c)
Set the cell states of one or more layers of an RNN to c.
function reset_hidden_states!(l::<RNN_Type>)
Reset the hidden states of one or more layers of an RNN to 0.
NNHelferlein.reset_cell_states!
— Functionfunction reset_cell_states!(l::<RNN_Type>)
Reset the cell states of one or more layers of an RNN to 0.
Transformers
NNHelferlein.TFEncoder
— TypeTFEncoder
A Bert-like encoder to be used as part of a transformer. The encoder is built as a stack of TFEncoderLayers which is entered after embedding, positional encoding and generation of a padding mask.
Constructor:
TFEncoder(n_layers, depth, n_heads; drop_rate=0.1)
Signature:
(e::TFEncoder)(x)
The encoder is called with an array of embedded tokens of size [depth, seq_len, n_minibatch] and returns a tensor of size [depth, seq_len, n_minibatch].
NNHelferlein.TFEncoderLayer
— TypeTFEncoderLayer
A Bert-like encoder layer to be used as part of a Bert-like transformer. The layer consists of a multi-head attention sub-layer followed by a feed-forward network of size depth -> 4*depth -> depth. Both parts have separate residual connections and layer normalisation.
The design follows the original paper "Attention is all you need" by Vaswani, 2017.
Constructor:
TFEncoderLayer(depth, n_heads, drop)
depth: embedding depth
n_heads: number of heads for the multi-head attention
drop: dropout rate
Signature:
(el::TFEncoderLayer)(x; mask=nothing)
Objects of type TFEncoderLayer are callable and expect a 3-dimensional array of size [embedding_depth, seq_len, minibatch_size] as input. The optional mask must be of size [seq_len, minibatch_size] and mark masked positions with 1.0.
It returns a tensor of the same size as the input and the self-attention factors of size [seq_len, seq_len, minibatch_size].
NNHelferlein.TFDecoder
— TypeTFDecoder
A Bert-like decoder to be used as part of a transformer. The decoder is built as a stack of TFDecoderLayers which is entered after embedding, positional encoding and generation of a padding mask and a peek-ahead mask.
Constructor:
TFDecoder(n_layers, depth, n_heads, vocab_size;
pad_id=NNHelferlein.TOKEN_PAD, drop_rate=0.1)
Signature:
(e::TFDecoder)(x)
The decoder is called with a matrix of token ids of size [seq_len, n_minibatch] and returns a tensor of size [depth, seq_len, n_minibatch] and the attention factors.
NNHelferlein.TFDecoderLayer
— TypeTFDecoderLayer
A Bert-like decoder layer to be used as part of a Bert-like transformer. The layer consists of a multi-head self-attention sub-layer, a multi-head attention sub-layer followed by a feed-forward network of size depth -> 4*depth -> depth. All three parts have separate residual connections and layer normalisation.
The design follows the original paper "Attention is all you need" by Vaswani, 2017.
Constructor:
TFDecoderLayer(depth, n_heads, drop)
depth: embedding depth
n_heads: number of heads for the multi-head attention
drop: dropout rate
Signature:
(el::TFDecoderLayer)(x, h_encoder; enc_m_pad=nothing, m_combi=nothing)
Objects of type TFDecoderLayer are callable and expect a minibatch of embedded sequences as input.
x: 3-dimensional array of size [embedding_depth, seq_len, minibatch_size]
h_encoder: output of the encoder stack
enc_m_pad: optional padding mask for the encoder output
m_combi: optional mask for the decoder self-attention combining padding and peek-ahead mask.
It returns a tensor of the same size as the input, the self-attention factors and the decoder-encoder attention factors.
These layers are used by the Transformer and TokenTransformer types to build Bert-like transformer networks.
Others
NNHelferlein.Flat
— Typestruct Flat <: AbstractLayer
Default flatten layer.
Constructors:
Flat()
: with no options.
NNHelferlein.flatten
— Functionflatten(x)
Flatten a tensor to a matrix, preserving the last dimension.
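A flatten operation of this kind can be written as a single reshape in plain Julia. The name flatten_sketch is hypothetical; the sketch only assumes that the last dimension is the minibatch dimension.

```julia
# Collapse all dimensions except the last one into the rows of a matrix.
flatten_sketch(x) = reshape(x, :, size(x, ndims(x)))

# 2 samples, each a 2x3x2 stack of feature maps.
x = reshape(collect(1:24), 2, 3, 2, 2)
f = flatten_sketch(x)
```

The result f has one column per sample, so it can be passed directly to a fully connected layer.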
NNHelferlein.PyFlat
— Typestruct PyFlat <: AbstractLayer
Flatten layer with optional Python-style flattening (row-major). This layer can be used if pre-trained weight matrices from TensorFlow are applied after the flatten layer.
Constructors:
PyFlat(; python=true): if true, row-major flatten is performed.
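The difference between Julia's column-major flattening and a row-major flattening can be seen in a short sketch. The function rowmajor_flatten_sketch is hypothetical: it reverses the order of all but the last (minibatch) dimension before reshaping, which is one way to obtain the row-major order; the library may implement this differently.

```julia
# Row-major flatten of each sample: reverse the non-batch dimensions,
# then use the usual column-major reshape.
function rowmajor_flatten_sketch(x)
    n = ndims(x)
    perm = (reverse(1:n-1)..., n)          # e.g. (2, 1, 3) for a 3-d array
    return reshape(permutedims(x, perm), :, size(x, n))
end

# 2 samples; the first sample is the matrix [1 3 5; 2 4 6].
x = reshape(collect(1:12), 2, 3, 2)
r = rowmajor_flatten_sketch(x)
```

For the first sample the column-major order would be 1, 2, 3, ..., whereas the row-major order reads the matrix row by row.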
NNHelferlein.FeatureSelection
— Typestruct FeatureSelection <: AbstractLayer
Simple feature selection layer that maps input to output with one-by-one connections; i.e. a layer of size 128 has 128 weights (plus optional biases).
Biases and activation functions are disabled by default.
Constructors:
FeatureSelection(i; bias=false, actf=identity): with the same input and output size i, where i is an integer or a Tuple of the input dimensions.
NNHelferlein.Activation
— Typestruct Activation <: AbstractLayer
Simple activation layer with the desired activation function as argument.
Constructors:
Activation(actf)
Relu()
Sigm()
Swish()
NNHelferlein.Softmax
— Typestruct Softmax <: AbstractLayer
Simple softmax layer to compute softmax probabilities.
Constructors:
Softmax()
NNHelferlein.Logistic
— Typestruct Logistic <: AbstractLayer
Logistic (sigmoid) activation layer with an additional temperature parameter to control the slope of the curve. Low temperatures (such as T=0.001) result in a step-like activation function, whereas high temperatures (such as T=10) make the activation almost linear.
Constructors:
Logistic(; T=1.0)
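The effect of the temperature can be checked numerically with a minimal sketch. The function logistic_sketch is hypothetical and assumes that the input is divided by T before the sigmoid is applied.

```julia
# Logistic function with temperature T (assumption: x is scaled by 1/T).
logistic_sketch(x; T=1.0) = 1 / (1 + exp(-x / T))

steep = logistic_sketch(0.01; T=0.001)   # close to 1: step-like behaviour
flat = logistic_sketch(0.01; T=10.0)     # close to 0.5: almost linear around 0
```

A small positive input is already mapped close to 1 at low temperature, while at high temperature it stays near 0.5.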
NNHelferlein.Dropout
— Typestruct Dropout <: AbstractLayer
Dropout layer. Implemented with help of Knet's dropout() function that evaluates AutoGrad.recording() to detect if in training or in prediction. Dropout is applied only in training.
Constructors:
Dropout(p): with the dropout rate p.
NNHelferlein.BatchNorm
— Typestruct BatchNorm <: AbstractLayer
Batchnormalisation layer. Implemented with help of Knet's batchnorm() function that evaluates AutoGrad.recording() to detect if in training or in prediction. In training the moments are updated to record the running averages; in prediction the moments are applied, but not modified.
In addition, optional trainable factor a and bias b are applied:
\[y = a \cdot \frac{(x - \mu)}{(\sigma + \epsilon)} + b\]
Constructors:
BatchNorm(; scale=true, channels=0): will initialise the moments with Knet.bnmoments() and trainable parameters β and γ only if scale==true (in this case, the number of channels must be defined - for CNNs this is the number of feature maps).
Constructors to read parameters from Tensorflow/Keras HDF-files:
BatchNorm(h5::HDF5.File, β_path, γ_path, μ_path, var_path; scale=false, trainable=true, momentum=0.1, ε=1e-5, dims=4): import parameters from HDF file h5 with β_path, γ_path, μ_path and var_path specifying the full path to β, γ, μ and variance respectively.
BatchNorm(h5::HDF5.File, group::String; scale=false, trainable=true, momentum=0.1, ε=1e-5, dims=4, tf=true): import parameters from HDF file h5 with parameters in the group group. Paths to β, γ, μ and variance are constructed if tf=true as model_weights/group/group/beta:0, etc. If tf=false, group must define the full group path: group/beta:0.
dims specifies the number of dimensions of the input and may be 2, 4 or 5. The default (4) applies to standard CNNs (imgsize, imgsize, channels, batchsize).
Keyword arguments:
scale=true: if true, the trainable scale parameters β and γ are used.
trainable=true: only used with hdf5-import. If true, the parameters β and γ are initialised as Param and trained in training.
Details:
2d, 4d and 5d inputs are supported. Mean and variance are computed over dimensions (2), (1,2,4) and (1,2,3,5) for 2d, 4d and 5d arrays, respectively.
If scale=true and channels != 0, trainable parameters β and γ will be initialised for each channel.
If scale=true and channels == 0 (i.e. BatchNorm(scale=true)), the params β and γ are not initialised by the constructor. Instead, the number of channels is inferred when the first minibatch is normalised as:
2d: size(x)[1]
4d: size(x)[3]
5d: size(x)[4]
or 0 otherwise.
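The normalisation formula and the reduction dimensions given in the details can be combined in a short sketch in plain Julia. The function batchnorm_sketch is hypothetical: it normalises with the statistics of the current minibatch only, uses scalar a and b, and does not keep running averages, so it is an illustration of the formula rather than the Knet-based implementation.

```julia
using Statistics

# Reduction dimensions per input rank, as listed in the details above.
const BN_DIMS_SKETCH = Dict(2 => (2,), 4 => (1, 2, 4), 5 => (1, 2, 3, 5))

# y = a * (x - μ) / (σ + ϵ) + b with per-channel μ and σ of the minibatch.
function batchnorm_sketch(x; a=1.0, b=0.0, ϵ=1e-5)
    dims = BN_DIMS_SKETCH[ndims(x)]                 # KeyError for other ranks
    μ = mean(x; dims=dims)
    σ = sqrt.(mean((x .- μ) .^ 2; dims=dims))       # population std
    return a .* (x .- μ) ./ (σ .+ ϵ) .+ b
end

# 4 images of size 2x2 with 3 channels.
x = reshape(Float64.(1:48), 2, 2, 3, 4)
y = batchnorm_sketch(x)
```

After normalisation every channel of y has a mean of about 0 and a standard deviation of about 1 over the minibatch.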
NNHelferlein.LayerNorm
— Typestruct LayerNorm <: AbstractLayer
Simple layer normalisation (inspired by TFs LayerNormalization). Implementation is from Deniz Yuret's answer to feature request 429 (https://github.com/denizyuret/Knet.jl/issues/492).
The layer performs a normalisation within each sample, not batchwise. Normalisation is modified by two trainable parameters a and b (variance and mean) added to every value of the sample vector.
Constructors:
LayerNorm(depth; eps=1e-6): depth is the number of activations for one sample of the layer.
Signatures:
function (l::LayerNorm)(x; dims=1): normalise x along the given dimensions. The size of the specified dimension must fit with the initialised depth.
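In contrast to batch normalisation, the statistics here are computed within each sample. The following sketch shows the idea for a matrix with samples as columns; layernorm_sketch is a hypothetical name, and the way a and b enter (a as scale, b as shift) is an assumption for illustration.

```julia
using Statistics

# Normalise every sample (column for dims=1) with its own mean and std,
# then apply the scale a and shift b (vectors with one entry per activation).
function layernorm_sketch(x, a, b; dims=1, ϵ=1e-6)
    μ = mean(x; dims=dims)
    σ = std(x; dims=dims, corrected=false)
    return a .* (x .- μ) ./ (σ .+ ϵ) .+ b
end

x = [1.0 2.0;
     3.0 6.0;
     5.0 10.0]          # 2 samples with 3 activations each
y = layernorm_sketch(x, ones(3), zeros(3))
```

The second sample is the first one scaled by 2, and both end up with practically the same normalised values, which shows that the result does not depend on the other samples of the minibatch.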
NNHelferlein.GaussianNoise
— Typestruct GaussianNoise
Gaussian noise layer. Multiplies each training value by a Gaussian-distributed random value with mean = 1.0 and sigma = σ.
Constructors:
GaussianNoise(σ; train_only=true)
Arguments:
σ: standard deviation for the distribution of noise
train_only=true: if true, noise will only be applied in training.
NNHelferlein.GlobalAveragePooling
— Typestruct GlobalAveragePooling <: AbstractLayer
Layer to return a matrix with the mean values of all but the last two dimensions for each sample of the minibatch. If the input is a stack of feature maps from a convolutional layer, the result can be seen as the mean value of each feature map. Number of output-rows equals number of input-featuremaps; number of output-columns equals size of minibatch.
Constructors:
GlobalAveragePooling()
NNHelferlein.global_average_pooling
— Functionglobal_average_pooling(x)
Function to return a matrix with the mean values of all but the last two dimensions for each sample of the minibatch.
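The reduction can be reproduced in plain Julia with a mean over all but the last two dimensions. The name gap_sketch is hypothetical; the sketch assumes the layout [..., channels, minibatch].

```julia
using Statistics

# Mean over all but the last two dimensions; result is [channels, minibatch].
function gap_sketch(x)
    n = ndims(x)
    m = mean(x; dims=Tuple(1:n-2))
    return reshape(m, size(x, n - 1), size(x, n))
end

# 2 samples with 2 feature maps of size 2x2 each.
x = reshape(Float64.(1:16), 2, 2, 2, 2)
g = gap_sketch(x)
```

Each entry of g is the mean of one feature map of one sample, for example the mean of the values 1 to 4 for the first map of the first sample.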
Attention Mechanisms
NNHelferlein.AttentionMechanism
— Typeabstract type AttentionMechanism
Attention mechanisms follow the same interface and common signatures.
If possible, the algorithm allows precomputing of the projections of the context vector generated by the encoder in an encoder-decoder architecture (i.e. in case of an RNN encoder the accumulated encoder hidden states).
By default attention scores are scaled according to Vaswani et al., 2017 (Vaswani et al., Attention Is All You Need, CoRR, 2017).
All algorithms use soft attention.
Constructors:
Attn*Mechanism*(dec_units, enc_units; scale=true)
Attn*Mechanism*(units; scale=true)
The one-argument version can be used, if encoder dimensions and decoder dimensions are the same.
Common Signatures:
function (attn::AttentionMechanism)(h_t, h_enc; reset=false, mask=nothing)
function (attn::AttentionMechanism)(; reset=false)
Arguments:
h_t: decoder hidden state. If $h_t$ is a vector, its length equals the number of decoder units. If it is a matrix, $h_t$ includes the states for a minibatch of samples and has the size [units, mb].
h_enc: encoder hidden states, 2d or 3d. If 2d: $h_{enc}$ is a matrix [units, steps] with the hidden states of all encoder steps. If 3d: [units, mb, steps] encoder states for all minibatches.
mask: optional mask (e.g. padding mask) for masking input steps of dimensions [mb, steps]. Attention factors for masked steps will be set to 0.0.
reset=false: if the keyword argument is set to true, projections of the encoder states are computed. By default projections are stored in the object and reused until the object is reset. For attention mechanisms that do not allow precomputation the argument is ignored.
The short form (::AttentionMechanism)(reset=true) can be used to reset the precomputed projections.
Return values
All functions return c and α where α is a matrix of size [mb,steps] with the attention factors for each step and minibatch. c is a matrix of size [units, mb] with the context vector for each sample of the minibatch, calculated as the α-weighted sum of all encoder hidden states $h_{enc}$ for each minibatch.
Attention Mechanisms:
All attention mechanisms calculate attention factors α from scores derived from projections of the encoder hidden states:
\[\alpha = \mathrm{softmax}(\mathrm{score}(h_{t},h_{enc}) \cdot 1/\sqrt{n})\]
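For a single sample, the computation of α and the context vector c can be sketched in plain Julia. The sketch uses the dot-product score as the simplest case; the names softmax_sketch and dot_attention_sketch are hypothetical, n is assumed to be the number of units, and the mask is assumed to hold 1.0 at masked steps. It does not precompute or store projections as the library types do.

```julia
# Numerically stable softmax over a vector of scores.
function softmax_sketch(s)
    e = exp.(s .- maximum(s))
    return e ./ sum(e)
end

# h_t: decoder state [units]; h_enc: encoder states [units, steps].
# Returns the context vector c [units] and the attention factors α [steps].
function dot_attention_sketch(h_t, h_enc; scale=true, mask=nothing)
    scores = h_enc' * h_t                          # one score per encoder step
    if scale
        scores = scores ./ sqrt(length(h_t))       # assumption: n = number of units
    end
    if mask !== nothing
        scores[mask .== 1] .= -Inf                 # masked steps get α = 0.0
    end
    α = softmax_sketch(scores)
    c = h_enc * α                                  # α-weighted sum of encoder states
    return c, α
end

h_enc = [1.0 0.0 0.0;
         0.0 1.0 0.0]                              # 2 units, 3 steps
h_t = [1.0, 0.0]
c, α = dot_attention_sketch(h_t, h_enc)
cm, αm = dot_attention_sketch(h_t, h_enc; mask=[0.0, 0.0, 1.0])
```

The first encoder step is most similar to h_t and therefore receives the largest attention factor; with the mask, the third step is excluded and the remaining factors still sum to 1.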
Attention mechanisms implemented:
NNHelferlein.AttnBahdanau
— Typemutable struct AttnBahdanau <: AttentionMechanism
Bahdanau-style (additive, concat) attention mechanism according to the paper:
D. Bahdanau, K. Cho, Y. Bengio, Neural Machine Translation by Jointly Learning to Align and Translate, ICLR, 2015.
\[\mathrm{score}(h_{t},h_{enc}) = v_{a}^{\top}\cdot\tanh(W[h_{t},h_{enc}])\]
Constructors:
AttnBahdanau(dec_units, enc_units; scale=true)
AttnBahdanau(units; scale=true)
NNHelferlein.AttnLuong
— Typemutable struct AttnLuong <: AttentionMechanism
Luong-style (multiplicative) attention mechanism according to the paper (referred to there as general-type attention): M.-T. Luong, H. Pham, C.D. Manning, Effective Approaches to Attention-based Neural Machine Translation, CoRR, 2015.
\[\mathrm{score}(h_{t},h_{enc}) = h_{t}^{\top} W h_{enc}\]
Constructors:
AttnLuong(dec_units, enc_units; scale=true)
AttnLuong(units; scale=true)
NNHelferlein.AttnDot
— Typemutable struct AttnDot <: AttentionMechanism
Dot-product attention (without trainable parameters) according to the Luong, et al. (2015) paper.
$\mathrm{score}(h_{t},h_{enc}) = h_{t}^{\top} h_{enc}$
Constructors:
AttnDot(; scale=true)
NNHelferlein.AttnLocation
— Typemutable struct AttnLocation <: AttentionMechanism
Location-based attention that only depends on the current decoder state $h_t$ and not on the encoder states, according to the Luong, et al. (2015) paper.
$\mathrm{score}(h_{t}) = W h_{t}$
Constructors:
AttnLocation(len, dec_units; scale=true)
len
: maximum sequence length of the encoder to be considered for attention. If the actual length of $h_{enc}$ is bigger than the length of α, attention factors for the remaining states are set to 0.0. If the actual length of $h_{enc}$ is smaller than the length of α, only the matching attention factors are applied.
dec_units
: number of decoder units.
NNHelferlein.AttnInFeed
— Typemutable struct AttnInFeed <: AttentionMechanism
Input-feeding attention that depends on the current decoder state $h_t$ and the next input to the decoder $i_{t+1}$, according to the Luong, et al. (2015) paper.
Infeed attention provides a semantic attention that depends on the next input token.
$\mathrm{score}(h_{t}, i_{t+1}) = W_h h_{t} + W_i i_{t+1} = W [h_t, i_{t+1}]$
Constructors:
AttnInFeed(len, dec_units, fan_in; scale=true)
len
: maximum sequence length of the encoder to be considered for attention. If the actual length of $h_{enc}$ is bigger than the length of α, attention factors for the remaining states are set to 0.0. If the actual length of $h_{enc}$ is smaller than the length of α, only the matching attention factors are applied.
dec_units
: number of decoder units.
fan_in
: size of the decoder input.
Signature:
function (attn::AttnInFeed)(h_t, inp, h_enc; mask=nothing)
h_t
: decoder hidden state. If $h_t$ is a vector, its length equals the number of decoder units. If it is a matrix, $h_t$ includes the states for a minibatch of samples and has the size [units, mb].
inp
: next decoder input $i_{t+1}$ (e.g. next embedded token of the sequence).
h_enc
: encoder hidden states, 2d or 3d. If 2d, $h_{enc}$ is a matrix [units, steps] with the hidden states of all encoder steps. If 3d: [units, mb, steps] encoder states for all minibatches.
mask
: optional mask for input states of shape [mb, steps].
Data providers
NNHelferlein.DataLoader
— Typeabstract type DataLoader
Mother type for minibatch iterators.
NNHelferlein.SequenceData
— Typestruct SequenceData <: DataLoader
Type for a generic minibatch iterator. All NNHelferlein models accept minibatches of type DataLoader.
Constructors:
SequenceData(x; shuffle=true)
x
: List, Array or other iterable object with the minibatches.
shuffle
: if true, minibatches are shuffled every epoch.
Iteration utilities
NNHelferlein.PartialIterator
— Typestruct PartialIterator <: DataLoader
The PartialIterator wraps any iterator and will only iterate the states specified in the list indices.
Constructors:
PartialIterator(inner, indices; shuffle=true)
The type of the states must match the states of the wrapped iterator inner. A nothing element may be given to specify the first iterator element.
If shuffle==true, the list of indices is shuffled every time the PartialIterator is started.
NNHelferlein.split_minibatches
— Functionfunction split_minibatches(it, at=0.8; shuffle=true)
Return 2 iterators of type PartialIterator which iterate only parts of the states of the iterator it. Be aware that the partial iterators will not contain copies of the data but instead forward the data provided by the iterator it.
The function can be used to split an iterator of minibatches into training and validation iterators without copying any data. As the PartialIterator objects work with the states of the inner iterator, it is important not to shuffle the inner iterator (otherwise the composition of the partial iterators would change, and training and validation data may be mixed!).
Arguments:
it
: Iterator to be split. The list of allowed states is created by performing a full iteration once.
at
: Split point. The first returned iterator will include the given fraction (default: 80%) of the states.
shuffle
: If true, the elements are shuffled at each restart of the iterator.
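The idea of splitting without copying can be sketched in plain Julia as follows. This is an illustration only and not the library code: the helper `split_indices_sketch` is hypothetical, and a vector of arrays stands in for a minibatch iterator.

```julia
using Random

# Hypothetical helper: split the positions 1..n into two index lists.
function split_indices_sketch(n, at=0.8; shuffle=true, rng=Random.default_rng())
    idx = collect(1:n)
    shuffle && Random.shuffle!(rng, idx)
    n_train = round(Int, at * n)
    return idx[1:n_train], idx[n_train+1:end]
end

minibatches = [rand(Float32, 3, 4) for _ in 1:10]   # stand-in for a minibatch iterator
trn_idx, vld_idx = split_indices_sketch(length(minibatches), 0.8)

# The two partial "iterators" only look up elements; no data is copied.
trn = (minibatches[i] for i in trn_idx)
vld = (minibatches[i] for i in vld_idx)
```

Because both generators refer to positions of the same underlying collection, reordering that collection would change which minibatches end up in which part. This is the reason for the warning about shuffling the inner iterator.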
NNHelferlein.MBNoiser
— Typetype MBNoiser
Iterator to wrap any Knet.Data iterator of minibatches in order to add random noise. Each value will be multiplied by a random value drawn from a Gaussian distribution with mean=1.0 and sd=σ.
Constructors:
MBNoiser(mbs::Knet.Data, σ)
MBNoiser(mbs::Knet.Data; σ=0.01)
mbs
: iterator with minibatches
σ
: standard deviation of the Gaussian noise
Example:
julia> trn = minibatch(x)
julia> tb_train!(mdl, Adam, MBNoiser(trn, σ=0.1))
julia> mbs_noised = MBNoiser(mbs, 0.05)
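What the noise does to a single minibatch can be sketched in plain Julia. The function `noise_sketch` is hypothetical and only mirrors the description above (each value is multiplied by a factor drawn from a Gaussian with mean 1.0 and standard deviation σ); it is not the library implementation.

```julia
using Random

# Multiply every value by a random factor drawn from N(1, σ²).
function noise_sketch(x::AbstractArray; σ=0.01, rng=Random.default_rng())
    T = eltype(x)
    factors = one(T) .+ T(σ) .* randn(rng, T, size(x))
    return x .* factors
end

x = ones(Float32, 3, 5)
x_noised = noise_sketch(x, σ=0.1)
```

With σ=0 the factors are all exactly 1 and the data is returned unchanged.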
NNHelferlein.MBMasquerade
— Typestruct MBMasquerade <: DataLoader
Iterator wrapper to partially mask training data of a minibatch iterator of type Knet.Data or NNHelferlein.DataLoader.
Constructors:
MBMasquerade(it, rho=0.1; mode=:noise, value=0)
MBMasquerade(it; ρ=0.1, mode=:noise, value=0)
The constructor may be called with the density rho as a positional argument or ρ as a keyword argument.
Arguments:
it
: Minibatch iterator that must deliver (x,y)-tuples of minibatches.
ρ=0.1 or rho
: Density of the mask; a value of 1.0 will mask everything, a value of 0.0 nothing.
value=0
: the value with which the masking is done.
mode=:noise
: type of masking (only :noise is implemented so far):
:noise
: randomly distributed single values of the training data will be overwritten with value.
Examples:
julia> dtrn
26-element Knet.Train20.Data{Tuple{CuArray{Float32}, Array{UInt8}}}
julia> mtrn = MBMasquerade(dtrn, 0.5, value=2.0)
MBMasquerade(26-element Knet.Train20.Data{Tuple{CuArray{Float32}, Array{UInt8}}}, 0.5, 2.0, :noise)
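The `:noise` masking mode can be sketched for a single array in plain Julia. The function `mask_sketch` is hypothetical and only follows the description above; the actual implementation may select the masked positions differently.

```julia
using Random

# Overwrite a randomly chosen fraction ρ of the values of x with `value`.
function mask_sketch(x::AbstractArray; ρ=0.1, value=0, rng=Random.default_rng())
    masked = copy(x)
    hit = rand(rng, size(x)...) .< ρ   # Boolean mask with density ρ
    masked[hit] .= value
    return masked
end

x = ones(Float32, 3, 5)
x_masked = mask_sketch(x, ρ=0.5, value=2)
```

In this sketch ρ=0.0 leaves the data untouched and ρ=1.0 overwrites every value, which corresponds to the two extremes named in the argument list.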
NNHelferlein.GPUIterator
— TypeGPUIterator(iterator)
Wraps any iterator and makes it return CuArrays. Element types are preserved, except for Float types, which are cast to Float32 (for performance reasons).
Constructor:
GPUIterator(iterator; y=:cpu)
iterator
: any iterator
y
: if :gpu, the labels of the iterator are also converted to CuArray{}. If :cpu, the labels are not converted. For a classifier (labels are integers), keeping labels on the cpu is more efficient. For regression (labels are Floats), keeping labels on the gpu is recommended.
Deprecation warning:
Use of GPUIterator is deprecated in favour of CUDA.CuIterator, which offers similar functionality.
Tabular data
Tabular data is normally provided in table form (csv, ods) row-wise, i.e. one sample per row. The helper functions can read the tables and generate Knet compatible iterators of minibatches.
NNHelferlein.dataframe_read
— Functiondataframe_read(fname; o...)
Read a data table from a CSV file with one sample per row and return a DataFrame with the data. (ODS support has been removed because of PyCall compatibility issues of the OdsIO package.)
All keyword arguments accepted by CSV.File() can be used.
NNHelferlein.dataframe_minibatch
— Functiondataframe_minibatch(data::DataFrames.DataFrame; size=256,
ignore=[], teaching=nothing,
verbose=1, o...)
dataframe_minibatches()
Make Knet-conformant minibatches of type Knet.Data from a dataframe with one sample per row.
dataframe_minibatches() is an alias kept for backward compatibility.
Arguments:
ignore
: defines a list of column names to be ignored.
teaching=nothing
: defines the column name with teaching input. teaching is handled differently, depending on its type: If Int, the teaching input is interpreted as class IDs and directly used for training (this assumes that the values range from 1..n). If the type is a String, values are interpreted as class labels and converted to numeric class IDs by calling mk_class_ids(). The list of valid labels and their order can be created by calling mk_class_ids(data.y)[2]. If teaching is a scalar value, regression context is assumed, and the value is used unchanged for training. If teaching is nothing, no teaching input is used and minibatches of x-data only are returned.
verbose=1
: if > 0, a summary of how the dataframe is used is echoed.
other keyword arguments
: all keyword arguments accepted by Knet.minibatch() may be used.
Allowed column definitions for ignore and teaching include names (as Strings), column names (as Symbols) or column indices (as Integer values).
NNHelferlein.dataframe_split
— Functionfunction dataframe_split(df::DataFrames.DataFrame;
teaching="y", split=0.8, balanced=true)
Split data, organised row-wise in a DataFrame, into training and validation sets.
Arguments:
df
: data
teaching="y"
: name or index of the column with teaching input "y"
split=0.8
: fraction of data to be used for the first returned subdataframe
shuffle=true
: shuffle the rows of the dataframe.
balanced=true
: if true, result datasets will be balanced by oversampling. Returned datasets will be bigger than expected but include the same number of samples for each class.
NNHelferlein.mk_class_ids
— Functionfunction mk_class_ids(labels)
Take a list with n class labels for n instances and return a list of n class IDs (of type Int) and an array of labels in which the array index of each label corresponds to its ID.
Arguments:
labels
: List of labels (typically Strings)
Result values:
- array of class-IDs in the same order as the input
- array of unique class labels ordered by their ID.
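The mapping from labels to IDs can be sketched in a few lines of plain Julia. The helper `class_ids_sketch` is hypothetical; it assumes that IDs are assigned in sorted label order, which is inferred from the example in this entry and not confirmed by the source.

```julia
# Hypothetical sketch: assign class IDs by position in the sorted list of unique labels.
function class_ids_sketch(labels)
    classes = sort(unique(labels))                       # ID = index in this list
    lookup = Dict(c => i for (i, c) in enumerate(classes))
    ids = [lookup[l] for l in labels]
    return ids, classes
end

labels = ["blue", "red", "red", "red", "green", "blue", "blue"]
ids, classes = class_ids_sketch(labels)
```

Under this assumption, `classes[ids[i]]` recovers the original label of instance `i`.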
Examples:
julia> labels = ["blue", "red", "red", "red", "green", "blue", "blue"]
7-element Array{String,1}:
"blue"
"red"
"red"
"red"
"green"
"blue"
"blue"
julia> mk_class_ids(labels)[1]
7-element Array{Int64,1}:
1
3
3
3
2
1
1
julia> mk_class_ids(labels)[2]
3-element Array{String,1}:
"blue"
"green"
"red"
Image data
Images as data should be provided in directories with the directory names denoting the class labels. The helpers read from the root of a directory tree in which the first level of sub-dirs tells the class label. All images in the tree under a class label are read as instances of the respective class. The following tree will generate the classes daisy, rose and tulip:
image_dir/
├── daisy
│ ├── 01
│ │ ├── 01
│ │ ├── 02
│ │ └── 03
│ ├── 02
│ │ ├── 01
│ │ └── 02
│ └── others
├── rose
│ ├── big
│ └── small
└── tulip
NNHelferlein.ImageLoader
— Typestruct ImageLoader <: DataLoader
dir
i_paths
i_classes
classes
batchsize
shuffle
train
aug_pipl
pre_proc
pre_load
i_images
end
Iterable image loader to provide minibatches of images as 4-d-arrays (x,y,rgb,mb).
NNHelferlein.mk_image_minibatch
— Functionfunction mk_image_minibatch(dir, batchsize; split=false, at=0.8,
balanced=false, shuffle=true, train=true,
pre_load=false,
aug_pipl=nothing, pre_proc=nothing)
Return one or two iterable image-loader objects that provide minibatches of images. For training, each minibatch is a tuple (x,y) with x: 4-d-array with the minibatch of data and y: vector of class IDs as Int.
Arguments:
dir
: base directory of the image dataset. The first level of sub-dirs is used as class names.
batchsize
: size of minibatches
Keyword arguments:
split
: return two iterators for training and validation
at
: split fraction (for training; the rest is for validation).
balanced
: return balanced data (i.e. same number of instances for all classes). Balancing is achieved via oversampling.
shuffle
: if true, shuffle the images every time the iterator restarts
train
: if true, minibatches with (x,y) tuples are provided; if false, only x (for prediction)
aug_pipl
: augmentation pipeline for Augmentor.jl. Augmentation is performed before the pre_proc function is applied.
pre_proc
: function with preprocessing and augmentation algorithms of type x = f(x). In contrast to the augmentation, which modifies images, pre_proc works on arrays of Float32.
pre_load=false
: read all images from disk once when populating the loader (requires loads of memory, but speeds up training).
NNHelferlein.get_class_labels
— Functionfunction get_class_labels(d::DataLoader)
Extracts a list of class labels from a DataLoader.
NNHelferlein.image2array
— Functionfunction image2array(img)
Take an image and return a 3d-array for RGB and a 2d-array for grayscale images with the colour channels as last dimension.
NNHelferlein.array2image
— Functionfunction array2image(arr)
Take a 3d-array with colour channels as last dimension or a 2d-array and return an array of RGB or of Gray as Image.
NNHelferlein.array2RGB
— Functionfunction array2RGB(arr)
Take a 3d-array with colour channels as last dimension or a 2d-array and return always an array of RGB as Image.
Text data
NNHelferlein.WordTokenizer
— Typemutable struct WordTokenizer
len
w2i
i2w
end
Create a word-based vocabulary: every unique word of a String or a list of Strings is assigned to a unique number. The created object includes a list of words (i2w, ordered by their numbers) and a dictionary w2i with the words as keys.
The constants TOKEN_START, TOKEN_END, TOKEN_PAD and TOKEN_UNKOWN are exported.
The WordTokenizer implements length, so length(vt::WordTokenizer) returns the number of words in the vocabulary.
Constructor:
function WordTokenizer(texts; len=nothing, add_ctls=true)
With arguments:
texts
: AbstractArray or iterable collection of AbstractArrays to be analysed.
len=nothing
: maximum number of different words in the vocabulary. Additional words in texts will be encoded as unknown. If nothing, all words of the texts are included.
add_ctls=true
: if true, control words are added in front of the vocabulary (extending the maximum length by 4): "<start>"=>1, "<end>"=>2, "<pad>"=>3 and "<unknown>"=>4.
Signatures:
function (t::WordTokenizer)(w::T; split_words=false, add_ctls=false)
where {T <: AbstractString}
Encode a word and return the corresponding number in the vocabulary or the highest number (i.e. "<unknown>") if the word is not in the vocabulary.
The encode signature accepts the keyword arguments split_words and add_ctls. If split_words==true, the input is treated as a sentence and split into single words, and an array of integers with the encoded sequence is returned. If add_ctls==true, the sequence will be framed by <start> and <end> tokens.
function (t::WordTokenizer)(i::Integer)
Decode a word by returning the word corresponding to i or "<unknown>" if the number is out of range of the vocabulary.
function (t::WordTokenizer)(s::AbstractArray{T}; add_ctls=false)
where {T <: AbstractString}
Called with an Array of Strings, the tokeniser splits the strings into words and returns an Array of Array{Integer} with each of the input strings represented by a sequence of Integer values.
function (t::WordTokenizer)(seq::AbstractArray{T}; add_ctls=false)
where {T <: Integer}
Called with an Array of Integer values a single string is returned with the decoded token-IDs as words (space-separated).
Base Signatures:
function length(t::WordTokenizer)
Return the length of the vocab.
Examples:
julia> vocab = WordTokenizer(["I love Julia", "They love Python"]);
julia> vocab(8)
"Julia"
julia> vocab("love")
5
julia> vocab.(split("I love Julia"))
3-element Array{Int64,1}:
6
5
8
julia> vocab.i2w
9-element Array{String,1}:
"<start>"
"<end>"
"<pad>"
"<unknown>"
"love"
"I"
"They"
"Julia"
"Python"
julia> vocab.w2i
Dict{String,Int64} with 9 entries:
"I" => 6
"<end>" => 2
"<pad>" => 3
"They" => 7
"Julia" => 8
"love" => 5
"Python" => 9
"<start>" => 1
"<unknown>" => 4
julia> vocab.([7,5,8])
3-element Array{String,1}:
"They"
"love"
"Julia"
julia> vocab.("I love Scala", split_words=true)
3-element Array{Int64,1}:
6
5
4
julia> vocab.([6,5,4])
3-element Array{String,1}:
"I"
"love"
"<unknown>"
julia> vocab("I love Python", split_words=true, add_ctls=true)
5-element Array{Int64,1}:
1
6
5
9
2
julia> vocab(["They love Julia", "I love Julia"])
2-element Array{Array{Int64,1},1}:
[7, 5, 8]
[6, 5, 8]
NNHelferlein.get_tatoeba_corpus
— Functionfunction get_tatoeba_corpus(lang; force=false,
url="https://www.manythings.org/anki/")
Download and read a bilingual text corpus from Tatoeba (provided by ManyThings, https://www.manythings.org). All corpora are English-language pairs of differing size and quality. Languages worth considering include:
fra
: French-English, 180 000 sentences
deu
: German-English, 227 000 sentences
heb
: Hebrew-English, 126 000 sentences
por
: Portuguese-English, 170 000 sentences
tur
: Turkish-English, 514 000 sentences
The function returns two lists with corresponding sentences in both languages. Sentences are not processed/normalised/cleaned, but exactly as provided by Tatoeba.
The data is stored in the package directory and only downloaded once.
Arguments:
lang
: language code
force=false
: if true, the corpus is downloaded even if a data file is already saved.
url
: base url of ManyThings.
NNHelferlein.sequence_minibatch
— Functionfunction sequence_minibatch(x, [y], batchsize;
pad=NNHelferlein.TOKEN_PAD,
seq2seq=true, pad_y=pad,
x_padding=false,
shuffle=true, partial=false)
Return an iterator of type DataLoader with (x,y) sequence minibatches from two lists of sequences.
All sequences within a minibatch in x and y are brought to the same length by padding with the token provided as pad.
The sequences are sorted by length before building minibatches in order to reduce padding (i.e. sequences of similar length are combined into a minibatch). If the same sequence length is needed for all minibatches, the sequences must be truncated or padded before calling sequence_minibatch() (see functions truncate_sequence() and pad_sequence()).
Arguments:
x
: List of sequences of Int
y
: List of sequences of Int or list of target values (i.e. teaching input)
batchsize
: size of minibatches
pad=NNHelferlein.TOKEN_PAD, pad_y=pad
: tokens used for padding. The token must be compatible with the type of the sequence elements. If pad_y is omitted, it is set equal to pad.
seq2seq=true
: if true and y is provided, sequence-to-sequence minibatches are created. Otherwise y is treated as scalar teaching input.
shuffle=true
: The minibatches are shuffled as the last step. If false, the minibatches with short sequences will be at the beginning of the dataset.
partial=false
: If true, a partial minibatch will be created if necessary to include all input data.
x_padding=false
: if true, pad sequences in x to make minibatches of the demanded size, even if there are not enough sequences of the same length in x. If false, partial minibatches are built (if partial == true) or remaining sequences are skipped (if partial == false).
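The sort-then-pad strategy can be sketched in plain Julia. The helper `bucket_sketch` is hypothetical and not the library code. It assumes the padding token 3 (the position of "<pad>" in the WordTokenizer vocabulary), a [steps, mb] layout with one column per sequence, and that a final partial minibatch is kept.

```julia
# Group sequences of similar length into minibatches and pad each batch
# to the length of its longest sequence (hypothetical helper, not library code).
function bucket_sketch(seqs, batchsize; pad=3)
    sorted = sort(seqs, by=length)
    batches = Matrix{Int}[]
    for start in 1:batchsize:length(sorted)
        chunk = sorted[start:min(start + batchsize - 1, end)]
        maxlen = maximum(length, chunk)
        mb = fill(pad, maxlen, length(chunk))   # assumed layout: [steps, mb]
        for (j, s) in enumerate(chunk)
            mb[1:length(s), j] .= s
        end
        push!(batches, mb)
    end
    return batches
end

seqs = [[6, 5, 8], [7, 5], [1, 6, 5, 9, 2], [4]]
batches = bucket_sketch(seqs, 2)
```

Sorting first means that the short sequences `[4]` and `[7, 5]` share a minibatch that needs a single padding token, instead of being padded to the length of the longest sequence in the dataset.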
NNHelferlein.pad_sequence
— Functionfunction pad_sequence(s, len; token=NNHelferlein.TOKEN_PAD)
Stretch a sequence to length len by adding the padding token.
NNHelferlein.truncate_sequence
— Functionfunction truncate_sequence(s, len; end_token=nothing)
Truncate a sequence to the length len. If not isnothing(end_token), the last token of the sequence is overwritten by the token.
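Padding and truncation can be sketched together in plain Julia. The functions `pad_sketch` and `truncate_sketch` are hypothetical. It is an assumption that sequences which already have the requested length (or are longer, for padding, or shorter, for truncation) are returned unchanged, and that the padding token is 3.

```julia
# Append `token` until the sequence has length `len` (longer sequences are left unchanged).
pad_sketch(s, len; token=3) =
    length(s) >= len ? copy(s) : vcat(s, fill(token, len - length(s)))

# Cut the sequence to `len` elements; optionally overwrite the last one with `end_token`.
function truncate_sketch(s, len; end_token=nothing)
    length(s) <= len && return copy(s)
    t = s[1:len]
    isnothing(end_token) || (t[end] = end_token)
    return t
end

padded = pad_sketch([1, 6, 5], 5)
cut = truncate_sketch([1, 6, 5, 9, 2], 3; end_token=2)
```

Overwriting the last token keeps an explicit end marker in a truncated sequence, which is useful when the model expects every sequence to terminate with `<end>`.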
NNHelferlein.clean_sentence
— Functionfunction clean_sentence(s)
Cleaning a sentence in some simple steps:
- normalise Unicode
- remove punctuation
- remove duplicate spaces
- strip
Training
NNHelferlein.tb_train!
— Functionfunction tb_train!(mdl, opti, trn, vld=nothing; epochs=1, split=nothing,
lr_decay=nothing, lrd_steps=5, lrd_linear=false,
l2=nothing, l1=nothing,
eval_size=0.2, eval_freq=1,
acc_fun=nothing,
mb_loss_freq=100,
checkpoints=nothing, cp_dir="checkpoints",
tb_dir="logs", tb_name="run",
tb_text="""Description of tb_train!() run.""",
resume=true, tensorboard=true, return_stats=false,
opti_args...)
Train function with TensorBoard integration. TB logs are written with the TensorBoardLogger.jl package. The model is updated (in-place) and the trained model is returned.
Arguments:
mdl
: model; i.e. forward function for the net
opti
: Knet-style optimiser type
trn
: training data; iterator to provide (x,y)-tuples with minibatches
vld
: validation data; iterator to provide (x,y)-tuples with minibatches. Set to nothing if not defined.
Keyword arguments:
Optimiser:
epochs=1
: number of epochs to train
resume=true
: if true, optimiser parameters (momentum or gradient moving average) from a previous run are used to enable a seamless continuation of the training. Be aware that in a resumed training, the original optimiser will be used, even if a different one is specified for the continuation.
lr_decay=nothing
: do a learning rate decay if not nothing: the value given is the final learning rate after lrd_steps steps of decay (lr_decay may be bigger than lr; in this case the learning rate is increased). lr_decay is only applied if both the start learning rate lr and the final learning rate lr_decay are defined explicitly. Example: lr=0.01, lr_decay=0.001 will reduce the lr from 0.01 to 0.001 during the training (by default in 5 steps). lr_decay is applied to l1 and l2 with the same decay rate.
lrd_steps=5
: number of learning rate decay steps. Default is 5, i.e. modify the lr 4 times during the training (resulting in 5 different learning rates).
lrd_linear=false
: type of learning rate decay; if false, lr is modified by a constant factor (e.g. 0.9), resulting in an exponential decay. If true, lr is modified by the same step size, i.e. linearly.
l1=nothing
: L1 regularisation; implemented as weight decay per parameter. If learning rate decay is used, L1 and L2 are also decayed.
l2=nothing
: L2 regularisation; implemented as weight decay per parameter
opti_args...
: optional keyword arguments for the optimiser can be specified (e.g. lr, gamma, ...).
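The two decay types can be made concrete with a short plain-Julia sketch that computes the sequence of learning rates. The function `lr_schedule_sketch` is hypothetical and only follows the description of lr_decay, lrd_steps and lrd_linear above; how the library distributes the steps over the epochs is not shown here.

```julia
# Learning rates for a decay from `lr` to `lr_final` in `steps` stages
# (i.e. steps - 1 modifications of the learning rate).
function lr_schedule_sketch(lr, lr_final; steps=5, linear=false)
    if linear
        Δ = (lr_final - lr) / (steps - 1)            # constant step size
        return [lr + (i - 1) * Δ for i in 1:steps]
    else
        factor = (lr_final / lr)^(1 / (steps - 1))   # constant factor
        return [lr * factor^(i - 1) for i in 1:steps]
    end
end

exp_rates = lr_schedule_sketch(0.01, 0.001)               # exponential decay
lin_rates = lr_schedule_sketch(0.01, 0.001; linear=true)  # linear decay
```

For lr=0.01 and lr_decay=0.001 with 5 steps, the exponential variant multiplies the rate by roughly 0.56 each time, while the linear variant subtracts 0.00225 each time. Both start at 0.01 and end at 0.001.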
Model evaluation:
split=nothing
: if no validation data is specified and split is a fraction (between 0.0 and 1.0), the training dataset is split at the specified point (e.g. if split=0.8, 80% of the minibatches are used for training and 20% for validation).
eval_size=0.2
: fraction of validation data to be used for calculating loss and accuracy for training and validation data during training.
eval_freq=1
: frequency of evaluation; default=1 means evaluation is calculated after each epoch. With eval_freq=10, evaluation is calculated 10 times per epoch.
acc_fun=nothing
: function to calculate accuracy. The function must implement the following signature: fun(model; data), where data is an iterator that provides (x,y)-tuples of minibatches. For classification tasks, accuracy from the Knet package is a good choice. For regression, a correlation or mean error may be preferred.
mb_loss_freq=100
: frequency of training loss reporting. default=100 means that 100 loss values per epoch will be logged to TensorBoard. If mb_loss_freq is greater than the number of minibatches, loss is logged for each minibatch.
checkpoints=nothing
: frequency of model checkpoints written to disk. Default is nothing, i.e. no checkpoints are written. To write the model after each epoch with name model, use checkpoints=1; to write every second epoch, checkpoints=2, etc.
cp_dir="checkpoints"
: directory for checkpoints
return_stats=false
: if true, a dictionary with losses and accuracies of training and validation data is returned instead of the model.
TensorBoard:
The TensorBoard log directory is created from 3 parts: tb_dir/tb_name/<current date time>.
tensorboard=true
: if true, TensorBoard logs are written
tb_dir="logs"
: root directory for TensorBoard logs.
tb_name="run"
: name of the training run. tb_name will be used as directory name and should not include whitespace
tb_text
: description to be included in the TensorBoard log as text log.
Evaluation and accuracy
NNHelferlein.focal_nll
— Functionfunction focal_nll(scores, labels::AbstractArray{<:Integer}; γ=2.0, dims=1)
function focal_nll(mdl; data, γ=2.0, dims=1)
Calculate the negative log-likelihood (i.e. cross entropy) with increased weights on weakly classified samples. The focal nll for sample j is defined as
\[- (1 - p_{j})^{\gamma} \cdot \ln p_{j} = (1 - p_{j})^{\gamma} \cdot \mathrm{nll}(p_{j})\]
where p is the softmax-scaled likelihood for the true class of the j-th sample. The sample weight is high if the predicted p << 1.
The second signature can be used to calculate the mean focal nll for a dataset of minibatches of (x,y)-tuples.
Arguments:
scores
: unnormalised scores (i.e. activations of output neurons without applying an activation function), typically of a classifier with one neuron per class
labels
: ground truth as integer values
γ=2.0
: The parameter γ controls the strength of the effect: for γ=0, all weights become exactly 1.0; with higher values for γ, the focus on misclassified or weakly classified samples is increased.
dims=1
: dimension in which the instances are organised.
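The definition can be checked with a small plain-Julia sketch. The function `focal_nll_sketch` is hypothetical and not the library implementation; it assumes a score matrix with one column per sample and one row per class, and it returns the mean over all samples.

```julia
# scores: [classes, samples] unnormalised scores; labels: class IDs in 1..n
function focal_nll_sketch(scores::AbstractMatrix, labels::AbstractVector{<:Integer}; γ=2.0)
    # softmax per column, stabilised by subtracting the column maximum
    e = exp.(scores .- maximum(scores, dims=1))
    p = e ./ sum(e, dims=1)
    # probability assigned to the true class of each sample
    p_true = [p[labels[j], j] for j in eachindex(labels)]
    # mean of the weighted negative log-likelihoods
    return sum(@. -(1 - p_true)^γ * log(p_true)) / length(labels)
end

scores = [2.0 0.0; 0.0 2.0]   # two samples, both classified correctly
labels = [1, 2]
loss_focal = focal_nll_sketch(scores, labels; γ=2.0)
loss_plain = focal_nll_sketch(scores, labels; γ=0.0)   # reduces to the ordinary nll
```

Since both samples are classified correctly with a fairly high probability, the weight $(1 - p_j)^{\gamma}$ is small and the focal loss is well below the plain negative log-likelihood.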
NNHelferlein.focal_bce
— Functionfunction focal_bce(scores, labels::AbstractArray{<:Integer};
function focal_bce(mdl; data, γ=2.0, dims=1)
Calculate the binary cross-entropy with increased weights on weakly classified samples. The focal bce for sample j is defined as
\[(1 - p_{j})^{\gamma} \cdot \mathrm{bce}(p_{j})\]
where p is the softmax-scaled likelihood for the true class of the j-th sample. The sample weight is high if the predicted p << 1.
The second signature can be used to calculate the mean focal bce for a dataset of minibatches of (x,y)-tuples.
For arguments and details, please refer to the documentation of focal_nll.
NNHelferlein.predict
— Functionfunction predict(mdl; data, softmax=false)
function predict(mdl, x; softmax=false )
Return the prediction for minibatches of data. The signature follows the standard call predict(model, data=xxx). The second signature predicts a single Array of data.
Arguments:
mdl
: executable network model
data=iterator
: iterator providing minibatches of input data; if the minibatches include y-values (i.e. teaching input), predictions (i.e. index of the class with the highest value) and the y-values will be returned.
x
: single Array of input data (i.e. input for one minibatch)
softmax
: if true or if the model is of type Classifier, the predicted softmax probabilities are returned instead of raw activations.
NNHelferlein.predict_top5
— Functionfunction predict_top5(mdl; data, top_n=5, classes=nothing)
Run the model mdl for data in minibatches data and print the top 5 predictions as softmax probabilities.
Arguments:
top_n
: print top n hits
classes
: optional list of human-readable class labels.
NNHelferlein.minibatch_eval
— Functionfunction minibatch_eval(mdl, fun, data; o...)
Given an accuracy or loss function fun(p, y) that returns an accuracy measure for n-dimensional arrays of predictions p and teaching input y (i.e. one minibatch of data), minibatch_eval() applies fun() to all minibatches supplied by the minibatch iterator data.
Arguments:
mdl
: model to compute predictions
fun
: evaluation function for one minibatch that returns the mean of results for all samples of the minibatch
data
: iterator that supplies a Tuple of (x,y) for each minibatch
o...
: all additional keyword arguments are forwarded to fun().
NNHelferlein.squared_error_acc
— Functionfunction squared_error_acc(mdl; data)
Return the mean squared error between the predictions of the model mdl and the corresponding teaching input by providing the standard signature fun(model, data=iterator).
Arguments:
mdl
: model with the signature mdl(x) to generate predictions for one minibatch (i.e. array) of data.
data
: iterator providing (x,y)-tuples of training or validation data.
NNHelferlein.abs_error_acc
— Functionfunction abs_error_acc(mdl; data)
Return the mean absolute error between the predictions of the model mdl and the corresponding teaching input by providing the standard signature fun(model, data=iterator).
Arguments:
mdl
: model with the signature mdl(x) to generate predictions for one minibatch (i.e. array) of data.
data
: iterator providing (x,y)-tuples of training or validation data.
NNHelferlein.hamming_dist
— Functionfunction hamming_dist(p, t; accuracy=false,
ignore_ctls=false, vocab=nothing,
start=nothing, stop=nothing, pad=nothing, unk=nothing)
function hamming_acc(p, t; o...)
function hamming_acc(mdl; data=data, o...)
Return the Hamming distance between two sequences or two minibatches of sequences. Predicted sequences p and teaching input sequences t may be of different length, but the number of sequences in the minibatch must be the same.
Arguments:
p, t
: n-dimensional arrays of type Int with predictions and teaching input for a minibatch of sequences. The shape of the arrays must be identical except for the first dimension (i.e. the sequence length), which may differ between p and t.
accuracy=false
: if false, the mean Hamming distance in the minibatch is returned (i.e. the average number of differences in the sequences). If true, the accuracy is returned for all non-padded positions in a range (0.0 - 1.0).
ignore_ctls=false
: a vocab is used to replace all '<start>, <end>, <unknown>, <pad>' tokens by <pad>. If true, padding and other control tokens are treated as normal codes and are not ignored.
vocab=nothing
: target language vocabulary of type NNHelferlein.WordTokenizer. If defined, the padding token of vocab is used to mask all control tokens in the sequences (i.e. '<start>, <end>, <unknown>, <pad>').
start, stop, pad, unk
: may be used to define individual control tokens. Default is nothing.
Details:
The function hamming_acc() is a shortcut to return the accuracy instead of the distance. The signature hamming_acc(mdl; data=data, o...) is for compatibility with acc functions called by train.
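For a single pair of sequences, the distance can be sketched in plain Julia. The function `hamming_sketch` is hypothetical; it is an assumption that a shorter sequence is filled up with the padding token (here 3) before the comparison, which may differ from how the library handles sequences of different length.

```julia
# Number of positions at which two integer sequences differ.
function hamming_sketch(p::AbstractVector{<:Integer}, t::AbstractVector{<:Integer}; pad=3)
    n = max(length(p), length(t))
    pp = vcat(p, fill(pad, n - length(p)))   # fill up the shorter sequence
    tt = vcat(t, fill(pad, n - length(t)))
    return count(pp .!= tt)
end

d_same = hamming_sketch([6, 5, 8], [6, 5, 8])
d_subst = hamming_sketch([6, 5, 8], [6, 5, 9])
d_short = hamming_sketch([6, 5], [6, 5, 8])
```

Here one substituted token and one missing token each count as a single difference.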
NNHelferlein.peak_finder_acc
— Functionfunction peak_finder_acc(p, t; ret=:f1, verbose=0,
tolerance=1, limit=0.5)
function peak_finder_acc(mdl; data=data, o...)
Calculate an accuracy-like measure for data series consisting mainly of zeros and rare peaks. The function counts the number of peaks in y detected by p (true positives), peaks not detected (false negatives) and the number of peaks in p not present in y (false positives).
It is assumed that peaks in y are marked by a single value higher than the limit (typically 1.0). Peaks in p may be broader and are defined as local maxima with a value above the limit. If the tolerance is set to > 0, it may happen that the peaks at the first or last step are not evaluated (because evaluation stops at end-tolerance).
If requested, f1, G-mean and intersection over union are calculated from the raw values.
Arguments:
p, t
: Predictions p and teaching input t (i.e. y) are minibatches of 1-d series of data. The sequence must be in the 1st dimension (column). All other dims are treated as separate windows of length size(p/t,1).
ret
: return value as Symbol; one of :peaks, :recall, :precision, :miss_rate, :f1, :g_mean, :iou or :all. If :all, a named tuple is returned.
verbose=0
: if 0, no additional output is generated; if 1, composite measures are printed to stdout; if 2, all raw counts are printed.
tolerance=1
: peak finder tolerance: the peak is defined as correct if it is detected within the tolerance.
limit=0.5
: Only maxima with values above the limit are considered.
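How composite measures follow from the raw counts can be sketched in plain Julia using the standard definitions of precision, recall, F1 and intersection over union. The function `peak_metrics_sketch` is hypothetical; the G-mean is left out because the source does not state which definition the library uses.

```julia
# Composite measures from true positives, false positives and false negatives.
function peak_metrics_sketch(tp, fp, fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    iou = tp / (tp + fp + fn)
    return (precision=precision, recall=recall, miss_rate=1 - recall, f1=f1, iou=iou)
end

m = peak_metrics_sketch(8, 2, 2)   # 8 peaks found, 2 spurious, 2 missed
```

None of these measures uses true negatives. That suits series that consist mainly of zeros, where almost every step would count as a true negative.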
NNHelferlein.confusion_matrix
— Functionfunction confusion_matrix(mdl; data, labels=nothing, pretty_print=true, accuracy=true)
function confusion_matrix(y, p; labels=nothing, pretty_print=true, accuracy=true)
Compute and display the confusion matrix of (x,y)-minibatches. Predictions are calculated with the model mdl, for which a signature mdl(x) must exist.
The second signature generates the confusion matrix from the 2 vectors ground truth y and predictions p.
The function is an interface to the function confusmat provided by the package MLBase.
Arguments:
mdl
: mdl with signaturemdl(x)
to generate predictionsdata
: minibatches of (x,y)-tuplespretty_print=true
: iftrue
, the matrix will be displayed to stdoutlabels=nothing
: a vector of human readable labels can be providedaccuracy=true
: iftrue
, accuracy, precision and recall are printed for all classes.
ImageNet tools
NNHelferlein.preproc_imagenet_vgg
— Functionfunction preproc_imagenet_vgg(img)
function preproc_imagenet_resnetv2(img)
Image preprocessing for pre-trained ImageNet examples. Preprocessing includes
- bringing RGB colour values into a range 0-255
- standardising colour values by subtracting mean colour values (103.939, 116.779, 123.68) from RGB
- changing colour channel sequence from RGB to BGR
- normalising or scaling colour values.
Resize is not done, because this may be part of the augmentation pipeline.
Details
Unfortunately, image preprocessing is not consistent across pretrained Tensorflow/Keras applications. As a result, different preprocessing functions must be used for different pretrained applications:
- VGG16, VGG19:
preproc_imagenet_vgg
(colour space: BGR, values: 0 - 255, centered according to the imagenet training set) - RESNET:
preproc_imagenet_resnet
(identical to vgg) - RESNET V2:
preproc_imagenet_resnetv2
(colour space: RGB, values: -1.0 - 1.0, scaled for each sample individually)
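The VGG-style steps listed above can be sketched in plain Julia as follows. This is only an illustration under stated assumptions, not the code of preproc_imagenet_vgg: the function name vgg_style_preproc is hypothetical, the input is assumed to be a (height, width, 3) array in RGB order with values in 0-1, and the mean values are assumed to be given in B, G, R order as in the Keras "caffe" preprocessing mode.

```julia
# Hypothetical sketch of VGG-style preprocessing (not the NNHelferlein code).
# Assumption: img has size (height, width, 3), RGB order, values in 0..1.
function vgg_style_preproc(img::AbstractArray{<:Real,3})
    x = Float32.(img) .* 255f0          # bring colour values into the range 0-255
    x = x[:, :, [3, 2, 1]]              # change channel sequence from RGB to BGR
    # subtract the mean colour values (assumed to be in B, G, R order)
    x .-= reshape(Float32[103.939, 116.779, 123.68], 1, 1, 3)
    return x
end

img = zeros(Float32, 2, 2, 3)
img[:, :, 1] .= 1f0                     # a pure red image
y = vgg_style_preproc(img)              # red ends up in the third (last) channel
```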
Examples:
The function can be used with the image loader; for prediction with a trained model as:
pipl = CropRatio(ratio=1.0) |> Resize(224,224)
images = mk_image_minibatch("./example_pics", 16;
shuffle=false, train=false,
aug_pipl=pipl,
pre_proc=preproc_imagenet_vgg)
And for training something like:
pipl = Either(1=>FlipX(), 1=>FlipY(), 2=>NoOp()) |>
Rotate(-5:5) |>
ShearX(-5:5) * ShearY(-5:5) |>
RCropSize(224,224)
dtrn, dvld = mk_image_minibatch("./example_pics", 16;
split=true, at=0.8, balanced=false,
shuffle=true, train=true,
aug_pipl=pipl,
pre_proc=preproc_imagenet_vgg)
NNHelferlein.preproc_imagenet_resnet
— Functionpreproc_imagenet_resnet(img)
See documentation of preproc_imagenet_vgg
.
NNHelferlein.preproc_imagenet_resnetv2
— Functionpreproc_imagenet_resnetv2(img)
See documentation of preproc_imagenet_vgg
.
NNHelferlein.predict_imagenet
— Functionfunction predict_imagenet(mdl; data, top_n=5)
Predict the ImageNet-class of images from the predefined list of class labels.
NNHelferlein.get_imagenet_classes
— Functionfunction get_imagenet_classes()
Return a list of all 1000 ImageNet class labels.
Other utils
Layers and helpers for transformers
NNHelferlein.PositionalEncoding
— Typestruct PositionalEncoding <: AbstractLayer
Positional encoding layer. Only sincos-style (according to Vaswani, et al., NIPS 2017) is implemented.
The layer takes an array of any number of dimensions (>=2), calculates the Vaswani-2017-style positional encoding and adds the encoding to each plane of the array.
NNHelferlein.positional_encoding_sincos
— Functionfunction positional_encoding_sincos(n_embed, n_seq)
Calculate and return a matrix of size [n_embed, n_seq]
of positional encoding values following the sin and cos style in the paper Vaswani, A. et al.; Attention Is All You Need; 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA, 2017.
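For orientation, the sin/cos scheme of the cited paper can be written as a short plain-Julia sketch. The name sincos_encoding is hypothetical, and counting positions from 0 is an assumption; the library function may differ in such details.

```julia
# Hypothetical sketch of the sin/cos positional encoding (Vaswani et al., 2017).
# Returns a matrix of size [n_embed, n_seq]; positions are counted from 0 here.
function sincos_encoding(n_embed::Int, n_seq::Int)
    pe = zeros(Float32, n_embed, n_seq)
    for pos in 0:n_seq-1, k in 0:n_embed-1
        # even and odd rows of a pair share the same frequency
        angle = pos / 10000.0^(2 * (k ÷ 2) / n_embed)
        pe[k+1, pos+1] = iseven(k) ? sin(angle) : cos(angle)
    end
    return pe
end

pe = sincos_encoding(4, 3)   # 4×3 matrix; the first column is [0, 1, 0, 1]
```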
NNHelferlein.mk_padding_mask
— Functionfunction mk_padding_mask(x; pad=TOKEN_PAD, add_dims=false)
Make a padding mask; i.e. return an Array of type KnetArray{Float32}
(or Array{Float32}
) similar to x
but with two additional dimensions of size 1 in the middle (these represent the 2nd seq_len and the number of heads in multi-head attention) and the value 1.0
at each position where x
is pad
and 0.0
otherwise.
The function can be used for creating padding masks for attention mechanisms.
Arguments:
x
: Array of sequences (typically a matrix with ncols sequences of length nrows)pad
: value for the token to be maskedadd_dims
: iftrue
, 2 additional dimensions are inserted to return a 4-D-array as needed for transformer architectures. Otherwise the size of the returned array is similar tox
.
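The behaviour described above can be mimicked on the CPU with a few lines of plain Julia. The sketch below is an assumption-laden illustration, not the library code: padding_mask is a hypothetical name, the pad token must be passed explicitly (the value of TOKEN_PAD is not stated here), and the two extra dimensions are assumed to be inserted directly after the first dimension.

```julia
# Hypothetical CPU-only sketch of a padding mask (not the NNHelferlein code).
function padding_mask(x::AbstractArray{<:Integer}; pad, add_dims=false)
    mask = Float32.(x .== pad)           # 1.0 where x is the pad token, else 0.0
    if add_dims
        # assumed layout: [n_seq, n_mb] becomes [n_seq, 1, 1, n_mb]
        mask = reshape(mask, size(x, 1), 1, 1, size(x)[2:end]...)
    end
    return mask
end

x = [5 7; 6 0; 0 0]                      # two sequences as columns, 0 used as pad
m2 = padding_mask(x, pad=0)              # 3×2 matrix
m4 = padding_mask(x, pad=0, add_dims=true)   # 3×1×1×2 array
```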
NNHelferlein.mk_peek_ahead_mask
— Functionfunction mk_peek_ahead_mask(x; dim=1)
function mk_peek_ahead_mask(n_seq)
Return a matrix of size [n_seq, n_seq]
filled with 1.0 and the upper triangle set to 0.0. Type is CuArray{Float32}
in GPU context, Array{Float32}
otherwise. The matrix can be used as peek-ahead mask in transformers.
dim=1
specifies the dimension in which the sequence length is represented. For un-embedded data this is normally 1
, i.e. the shape of x
is [n_seq, n_mb]. After embedding the shape probably is [depth, n_seq, n_mb].
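A CPU-only sketch of such a mask is given below. The name peek_ahead_mask is hypothetical, and it is an assumption of this sketch that the diagonal belongs to the zeroed upper triangle (so that each position can still attend to itself when 1.0 marks masked positions).

```julia
using LinearAlgebra

# Hypothetical sketch: ones below the diagonal, zeros on and above it.
peek_ahead_mask(n_seq::Int) = Float32.(tril(ones(n_seq, n_seq), -1))

m = peek_ahead_mask(3)
# 3×3 Matrix{Float32}:
#  0.0  0.0  0.0
#  1.0  0.0  0.0
#  1.0  1.0  0.0
```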
NNHelferlein.dot_prod_attn
— Functionfunction dot_prod_attn(q, k, v; mask=nothing)
Generic scaled dot product attention following the paper of Vaswani et al., (2017), Attention Is All You Need.
Arguments:
q
: query of size[depth, n_seq_q, ...]
k
: key of size[depth, n_seq_v, ...]
v
: value of size[depth, n_seq_v, ...]
mask
: mask for attention factors; it may have different shapes but must be broadcastable for addition to the scores tensor (which has the same size as alpha[n_seq_v, n_seq_q, ...]
). In transformer context typical masks are one of: padding mask of size[n_seq_v, ...]
or a peek-ahead mask of size[n_seq_v, n_seq_v]
(which is only possible in case of self-attention when all sequence lengths are identical).
q, k, v
must have matching leading dimensions (i.e. same depth or embedding). k
and v
must have the same sequence length.
Return values:
c
: context as alpha-weighted sum of values with size [depth, n_seq_v, ...]alpha
: attention factors of size [n_seq_v, n_seq_q, ...]
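For a single sequence (2-d matrices without a minibatch dimension), scaled dot product attention can be sketched in plain Julia as below. This is a simplified illustration and not the implementation of dot_prod_attn: attn_sketch and softmax_cols are hypothetical names, and it is assumed that a mask marks masked positions with 1.0 and is applied by adding a large negative value to the scores. In this sketch the context has one column per query position.

```julia
# Hypothetical 2-d sketch of scaled dot product attention (one sequence only).
function softmax_cols(s)
    e = exp.(s .- maximum(s, dims=1))    # subtract the column maximum for stability
    return e ./ sum(e, dims=1)
end

function attn_sketch(q, k, v; mask=nothing)
    depth = size(k, 1)
    scores = (k' * q) ./ sqrt(Float32(depth))    # size [n_seq_v, n_seq_q]
    if mask !== nothing
        scores = scores .+ mask .* -1f9          # assumed: 1.0 marks masked positions
    end
    alpha = softmax_cols(scores)                 # each column sums to 1
    c = v * alpha                                # alpha-weighted sum of the values
    return c, alpha
end

q = k = v = Float32[1 0; 0 1]
c, alpha = attn_sketch(q, k, v)
# the first query may only see the first key/value position:
cm, alpham = attn_sketch(q, k, v, mask=Float32[0 0; 1 0])
```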
NNHelferlein.MultiHeadAttn
— Typestruct MultiHeadAttn <: AbstractLayer
Multi-headed attention layer, designed following the Vaswani, 2017 paper.
Constructor:
MultiHeadAttn(depth, n_heads)
depth
: Embedding depthn_heads
: number of heads for the attention.
Signature:
function(mha::MultiHeadAttn)(q, k, v; mask=nothing)
q, k, v
are 3-dimensional tensors of the same size ([depth, seq_len, n_minibatch]); the optional mask must be of size [seq_len, n_minibatch] and mark masked positions with 1.0.
NNHelferlein.separate_heads
— Functionfunction separate_heads(x, n)
Helper function for multi-headed attention mechanisms: an additional second dimension is added to a tensor of minibatches by splitting the first (i.e. depth).
NNHelferlein.merge_heads
— Functionfunction merge_heads(x)
Helper to merge the result of multi-headed attention back to full depth.
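Splitting the depth into heads and merging it back can be expressed with reshape alone. The sketch below uses the hypothetical names split_heads and join_heads and assumes the layout [depth, n_seq, n_mb]; it illustrates the idea and is not the library code.

```julia
# Hypothetical sketch: split the first dimension (depth) into n heads ...
split_heads(x, n) = reshape(x, size(x, 1) ÷ n, n, size(x)[2:end]...)
# ... and merge the first two dimensions back to the full depth.
join_heads(x) = reshape(x, size(x, 1) * size(x, 2), size(x)[3:end]...)

x = reshape(collect(Float32, 1:24), 4, 3, 2)   # [depth=4, n_seq=3, n_mb=2]
h = split_heads(x, 2)                          # size (2, 2, 3, 2)
y = join_heads(h)                              # size (4, 3, 2), identical to x
```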
Utils for array manipulation
NNHelferlein.crop_array
— Functionfunction crop_array(x, crop_sizes)
Crop an n-dimensional array to the given size. Cropping is always centered (i.e. a margin is removed).
Arguments:
x
: n-dim AbstractArraycrop_sizes
: Tuple of target sizes to which the array is cropped. Allowed values are Int or:
. Ifcrop_sizes
defines fewer dims than x has, the remaining dims will not be cropped (assuming:
). If a requested crop size is bigger than the actual size of x, it is ignored.
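The rules above (centered cropping, missing dimensions treated as :, oversized requests ignored) can be sketched in plain Julia. The function center_crop is a hypothetical stand-in, and how an odd margin is distributed between both sides is an assumption of this sketch.

```julia
# Hypothetical sketch of centered cropping (not the NNHelferlein code).
function center_crop(x::AbstractArray, crop_sizes::Tuple)
    ranges = map(1:ndims(x)) do d
        full = size(x, d)
        # dimensions without a requested size are treated like `:`
        want = d <= length(crop_sizes) ? crop_sizes[d] : Colon()
        # keep the full range for `:` or for requests bigger than the array
        (want isa Colon || want >= full) && return 1:full
        start = (full - want) ÷ 2 + 1        # odd margins: extra element is cut at the end
        return start:start + want - 1
    end
    return x[ranges...]
end

x = reshape(1:20, 4, 5)
a = center_crop(x, (2, 3))     # rows 2:3 and columns 2:4 of x
b = center_crop(x, (2,))       # only the first dimension is cropped
c = center_crop(x, (10, :))    # request bigger than x is ignored
```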
NNHelferlein.blowup_array
— Functionfunction blowup_array(x, n)
Blow up an array x
with an additional dimension and repeat the content of the array n
times.
Arguments:
x
: Array of any dimensionn
: number of repeats. n=1 will return an array with an additional dimension of size 1.
Examples:
julia> x = [1,2,3,4]; blowup_array(x, 3)
4×3 Array{Int64,2}:
1 1 1
2 2 2
3 3 3
4 4 4
julia> x = [1 2; 3 4]; blowup_array(x, 3)
2×2×3 Array{Int64,3}:
[:, :, 1] =
1 2
3 4
[:, :, 2] =
1 2
3 4
[:, :, 3] =
1 2
3 4
NNHelferlein.recycle_array
— Functionfunction recycle_array(x, n; dims=dims(x))
Recycle an array x
along the specified dimension (default the last dimension) and repeat the content of the array n
times. The number of dims stays unchanged, but the array values are repeated n
times.
Arguments:
x
: Array of any dimensionn
: number of repeats. n=1 will return an unchanged arraydims
: dimension to be repeated.
Examples:
julia> recycle_array([1,2],3)
6-element Array{Int64,1}:
1
2
1
2
1
2
julia> x = [1 2; 3 4]
2×2 Array{Int64,2}:
1 2
3 4
julia> recycle_array(x,3)
2×6 Array{Int64,2}:
1 2 1 2 1 2
3 4 3 4 3 4
julia> recycle_array([1 2 3],3, dims=1)
3×3 Array{Int64,2}:
1 2 3
1 2 3
1 2 3
NNHelferlein.de_embed
— Functionfunction de_embed(x; remove_dim=false)
Replace the maximum of the first dimension of an n-dimensional array by its index (aka argmax()). If remove_dim
is true, the result has the first dimension removed; otherwise the returned array has the first dimension with size 1 (default).
Examples:
> x = [1 1 1
2 1 1
1 2 1
1 1 2]
> de_embed(x)
1×3 Matrix{Int64}:
2 3 4
> de_embed(x, remove_dim=true)
3-element Vector{Int64}:
2
3
4
Utils for fixing types in GPU context
NNHelferlein.init0
— Functionfunction init0(siz...)
Initialise a vector or array of size siz
with zeros. If a GPU is detected, the type of the returned value is KnetArray{Float32}
, otherwise Array{Float32}
.
Examples:
julia> init0(2,10)
2×10 Array{Float32,2}:
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
julia> init0(0,10)
0×10 Array{Float32,2}
NNHelferlein.convert2CuArray
— Functionfunction convert2CuArray(x, innerType=Float32)
function convert2KnetArray(x, innerType=Float32)
function ifgpu(x, innerType=Float32)
Convert an array x
to a CuArray{Float32}
or whatever is specified as innerType, but only in GPU context (if CUDA.functional()
) or to an Array{Float32}
otherwise. By converting, the data is copied to the GPU.
convert2KnetArray()
is kept as an alias for backward compatibility.
ifgpu()
is an alias/shortcut to convert2KnetArray()
.
NNHelferlein.convert2KnetArray
— Functionfunction convert2CuArray(x, innerType=Float32)
function convert2KnetArray(x, innerType=Float32)
function ifgpu(x, innerType=Float32)
See documentation of convert2CuArray
.
NNHelferlein.ifgpu
— Functionfunction convert2CuArray(x, innerType=Float32)
function convert2KnetArray(x, innerType=Float32)
function ifgpu(x, innerType=Float32)
See documentation of convert2CuArray
.
NNHelferlein.emptyCuArray
— Functionfunction emptyCuArray(size...=(0,0);innerType=Float32)
function emptyKnetArray(size...=(0,0);innerType=Float32)
Return an empty CuArray with the specified dimensions. The array may be empty (i.e. one dimension 0) or elements will be undefined.
By default an empty matrix is returned.
Examples:
>>> emptyKnetArray(0,0)
0×0 Knet.KnetArrays.KnetMatrix{Float32}
>>> emptyKnetArray()
0×0 Knet.KnetArrays.KnetMatrix{Float32}
>>> emptyKnetArray(0)
0-element Knet.KnetArrays.KnetVector{Float32}
Utils for Bioinformatics
NNHelferlein.aminoacid_tokenizer
— Functionaminoacid_tokenizer(sec; ignore_unknown=true)
Tokenize a protein sequence into amino acids using the following table:
Amino acid | Token | Description
--------------------------------
C | 1 | Cysteine
S | 2 | Serine
T | 3 | Threonine
A | 4 | Alanine
G | 5 | Glycine
P | 6 | Proline
D | 7 | Aspartic acid
E | 8 | Glutamic acid
Q | 9 | Glutamine
N | 10 | Asparagine
H | 11 | Histidine
R | 12 | Arginine
K | 13 | Lysine
M | 14 | Methionine
I | 15 | Isoleucine
L | 16 | Leucine
V | 17 | Valine
W | 18 | Tryptophan
Y | 19 | Tyrosine
F | 20 | Phenylalanine
B | 21 | Aspartic acid or Asparagine
Z | 22 | Glutamic acid or Glutamine
J | 23 | Leucine or Isoleucine
U | 24 | Selenocysteine
X | 25 | Unknown amino acid
. | 26 | padding token
Arguments:
sec
: A string containing the protein sequence in uppercase or lowercase. All other letters or symbols will be converted to the unknown token.ignore_unknown
: Iftrue
, unknown amino acids (i.e. "X") will be converted to the padding token. Iffalse
, the embedding for "X" will be trained as for all other amino acids.
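The token table above can be turned into a small plain-Julia sketch of the mapping. The names AA_ORDER, AA_UNKNOWN, AA_PAD and tokenize_aa are hypothetical; the sketch only mirrors the table and the argument description and is not the library implementation.

```julia
# Hypothetical sketch of the tokenization table above (not the NNHelferlein code).
const AA_ORDER   = "CSTAGPDEQNHRKMILVWYFBZJUX"   # one-letter codes for tokens 1..25
const AA_UNKNOWN = 25                            # token of "X"
const AA_PAD     = 26                            # padding token

function tokenize_aa(seq::AbstractString; ignore_unknown=true)
    map(collect(uppercase(seq))) do ch
        idx = findfirst(==(ch), AA_ORDER)
        tok = idx === nothing ? AA_UNKNOWN : idx   # other symbols become "unknown"
        (ignore_unknown && tok == AA_UNKNOWN) ? AA_PAD : tok
    end
end

tokenize_aa("acX?")                          # [4, 1, 26, 26]
tokenize_aa("acX?", ignore_unknown=false)    # [4, 1, 25, 25]
```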
NNHelferlein.embed_blosum62
— Functionembed_blosum62(x)
Embed a protein sequence into a 21-dimensional vector using the BLOSUM62 amino acid substitution matrix. Amino acids are encoded as with NNHelferlein's aminoacid tokenizer
function. x
can be any AbstractArray
of Int
and a dimension of size 21 will be added as the first dimension.
NNHelferlein.embed_vhse8
— Functionembed_vhse8(x)
Embed a protein sequence into an 8-dimensional vector using the VHSE8 amino acid embedding scheme. Amino acids are encoded as with NNHelferlein's aminoacid tokenizer
function. x
can be any AbstractArray
of Int
and a dimension of size 8 will be added as the first dimension.
NNHelferlein.EmbedAminoAcids
— TypeEmbedAminoAcids <: AbstractLayer
Embed a protein sequence into a 21-dimensional vector using the BLOSUM62 amino acid substitution matrix or as an 8-dimensional vector using the VHSE8 parameters. Amino acids must be encoded according to NNHelferlein's aminoacid tokenizer
function.
Layer input is an n-dimensional array of an Integer type. Output is an (n+1)-dimensional array of Float32 type with a first (added) dimension of size 21 or 8.
Constructor:
EmbedAminoAcids(embedding::Symbol=:blosum62)
:embedding=:blosum62
: Either:blosum62
or:vhse8
to select the embedding scheme.
Saving, loading and inspection of models
NNHelferlein.save_network
— Functionsave_network(fname, mdl)
Save a model as jld2-file.
Arguments:
fname
: filename; if the name does not end with the extension.jld2
, it will be added.mdl
: network model to be saved. The model will be copied to a cpu-based model viacopy_network(mdl, to=:cpu)
before saving, to remove hardware dependencies of parameters on the gpu.
NNHelferlein.load_network
— Functionload_network(fname; to=:gpu)
Load a model from a jld2-file.
Arguments:
fname
: filename; if the name does not end with the extension.jld2
, it will be added.to=:gpu
: by default, parameters are loaded as CuArrays, if a functional gpu is detected. Ifto=:cpu
is specified parameters are loaded as cpu-arrays.
NNHelferlein.copy_network
— Functioncopy_network(mdl::AbstractNN; to=:gpu)
Returns a copy of a Helferlein model. Caveat: the copy is generated by Adapt.adapt()
and is not a deep copy!
Arguments:
mdl
: Network model of typeAbstractNN
.to=:gpu
: by default all parameters of the copy areCuArrays
for GPU usage. Ifto=:cpu
is specified, parameters are Arrays and the model will be processed on the CPU.
Base.summary
— Functionfunction summary(mdl)
Print a network summary of any model of Type AbstractNN
, AbstractChain
or AbstractLayer
.
NNHelferlein.print_network
— Functionfunction print_network(mdl::AbstractNN)
Alias to summary()
, kept for backward compatibility only.
Datasets
NNHelferlein.dataset_mit_nsr
— Functionfunction dataset_mit_nsr(records=nothing; force=false)
Retrieve the Physionet ECG data set: "MIT-BIH Normal Sinus Rhythm Database". If necessary the data is downloaded from Zenodo (and stored in the NNHelferlein data directory).
All 18 recordings are returned as a list of DataFrames.
ECGs from the MIT-NSR database with some modifications to make them more suitable as playground data set for machine learning.
- all 18 ECGs are trimmed to approx. 50000 heart beats from a region without recording errors
- scaled to a range -1 to 1 (non-linear/tanh)
- heart beats annotation as time series with value 1.0 at the point of the annotated beat and 0.0 for all other times
- additional heart beat column smoothed by applying a gaussian filter
- provided as csv with columns "time in sec", "channel 1", "channel 2", "beat" and "smooth".
Arguments:
force=false
: iftrue
the download will be forced and local data will be overwritten.records
: list of records names to be downloaded.
Examples:
nsr_16265 = dataset_mit_nsr("16265")
nsr_16265 = dataset_mit_nsr(["16265", "19830"])
nsr_all = dataset_mit_nsr()
NNHelferlein.dataset_mnist
— Functionfunction dataset_mnist(; force=false)
Download the MNIST dataset with help of MLDatasets.jl
from Yann LeCun's official website. 4 arrays xtrn, ytrn, xtst, ytst
are returned.
xtrn
and xtst
will be the images as a multi-dimensional array, and ytrn
and ytst
the corresponding labels as integers.
The image(s) is/are returned in the horizontal-major memory layout as a single numeric array of eltype Float32
. The values are scaled to be between 0 and 1. The labels are returned as a vector of Int8
.
In the teaching input (i.e. y
) the digit 0
is encoded as 10
.
The data is stored in the Helferlein data directory and only downloaded if the files are not already saved.
Ref.: Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. "Gradient-based learning applied to document recognition." Proceedings of the IEEE, 86(11):2278-2324, November 1998 http://yann.lecun.com/exdb/mnist/.
Arguments:
force=false
: iftrue
, the dataset download will be forced.
NNHelferlein.dataset_fashion_mnist
— Functionfunction dataset_fashion_mnist(; force=false)
Download Zalando's Fashion-MNIST dataset with help of MLDatasets.jl
from https://github.com/zalandoresearch/fashion-mnist.
4 arrays xtrn, ytrn, xtst, ytst
are returned in the same structure as the original MNIST dataset.
The data is stored in the Helferlein data directory and only downloaded if the files are not already saved.
Authors: Han Xiao, Kashif Rasul, Roland Vollgraf
Arguments:
force=false
: iftrue
, the dataset download will be forced.
NNHelferlein.dataset_iris
— Functionfunction dataset_iris()
Return Fisher's iris dataset of 150 records as dataframe.
Ref: Fisher,R.A. "The use of multiple measurements in taxonomic problems" Annual Eugenics, 7, Part II, 179-188 (1936); also in "Contributions to Mathematical Statistics" (John Wiley, NY, 1950). https://archive.ics.uci.edu/ml/datasets/Iris
NNHelferlein.dataset_pfam
— Functionfunction dataset_pfam(records; force=false)
Retrieve the curated PFAM protein families database from Zenodo including 46872 sequences from 62 families. Sequences are between 100 and 1000 amino acids long and families have between 100 and 200 members. Training and test data are padded to a length of 1000 amino acids with the padding token of the amino acid tokenizer (26).
More information about the data set can be found at https://zenodo.org/record/8138939, including PDB sequence IDs for each data table.
Available records:
:raw
: dataframe with all (46872) rows of data and the columns ID (PDB-ID), family (family name) and sequence (amino acid sequence):families
: list of all family names as dataframe with the columns class (numeric class ID 1-62), family (family name) and count (number of family members in the dataset):aminoacids
: list of amino acid tokens as dataframe with the columns Token (aa token 1-26), One-Letter (one-letter code of the amino acid), and Amino acid (full name of the amino acid):train
: dataframe with 42187 rows of training data and labels with the class ID as first column and the amino acid tokens as columns 2-1001 (padded to 1000 amino acids):test
: dataframe with 4687 rows of test data in the same format as the training data:balanced_train
: dataframe with 111601 rows of balanced training data in the same format as the training data. The data is balanced by sampling 1800 sequences from each family.:balanced_test
: dataframe with 12401 rows of balanced test data in the same format as the training data.
Pretrained networks
NNHelferlein.get_vgg16
— Functionfunction get_vgg16(; filters_only=false, trainable=true)
Return a VGG16 model with pretrained parameters from Tensorflow/Keras applications API. For details about original model and training see Keras Applications
.
Arguments
filters_only=false
: iftrue
, only the filterstack is returned (without Flatten() and classifier) to be integrated into any chain.trainable=true
: iftrue
, the filterstack is set trainable, otherwise only the classifier part is trainable and the filter weights are fixed.
Details:
The model weights are imported from the respective Keras Application, which is trained with preprocessed images of size 224x224 pixel. Image data format must be colour channels BGR
and colour values 0.0 - 1.0
.
This can be re-built by using a preprocessing pipeline and the Helferlein-function preproc_imagenet_vgg()
from a directory img_path
with images:
pipl = CropRatio(ratio=1.0) |> Resize(224,224)
mini_batches = mk_image_minibatch(img_path, 2, train=false,
aug_pipl=pipl, pre_proc=preproc_imagenet_vgg)
Model structure is: VGG16 topology plot created by netron
NNHelferlein.get_resnet50v2
— Functionfunction get_resnet50v2(; filters_only=false, trainable=true)
Return a ResNet50 v2 model with pretrained parameters from Tensorflow/Keras applications API. For details about original model and training see Keras Applications
.
Arguments
filters_only=false
: iftrue
, only the filterstack is returned (without Flatten() and classifier) to be integrated into any chain.trainable=true
: iftrue
, the filterstack is set trainable, otherwise only the classifier part is trainable and the filter weights are fixed.
Details:
The model weights are imported from the respective Keras Application, which is trained with images of size 224x224 pixel. Caveat: The training set images have not been preprocessed with the imagenet default procedure! In contrast, image data format must be colour channels RGB
and colour values 0.0 - 1.0
.
This can be re-built by using a preprocessing pipeline with application preproc_imagenet_resnetv2()
from a directory img_path
with images:
pipl = CropRatio(ratio=1.0) |> Resize(224,224)
mini_batches = mk_image_minibatch(img_path, 2, train=false,
aug_pipl=pipl, pre_proc=preproc_imagenet_resnetv2)
Model structure is: ResNet50 V2 topology plot created by netron