The API documentation of all exported functions is listed here:

Chains

NNHelferlein.AbstractNN — Type

abstract type AbstractNN

Mother type for AbstractNN hierarchy with implementation for a chain of layers.

Signatures:

(m::AbstractNN)(x): run the AbstractArray x through all layers and return the output
(m::AbstractNN)(x,y): Calculate the loss for one minibatch x and teaching input y
(m::AbstractNN)(d::Knet.Data): Calculate the loss for all minibatches in d
(m::AbstractNN)(d::Tuple): Calculate the loss for all minibatches in d
(m::AbstractNN)(d::NNHelferlein.DataLoader): Calculate the loss for all minibatches in d if teaching input is included (i.e. elements of d are tuples). Otherwise return the output of all minibatches as one array with samples as columns.

source

NNHelferlein.AbstractChain — Type

abstract type AbstractChain

Mother type for AbstractChain hierarchy with implementation for a chain of layers. By default every AbstractChain has a property layers with an iterable list of AbstractLayers or AbstractChains that are executed recursively.

Non-standard Chains in which Layers are not executed sequentially (such as ResnetBlocks) must provide a custom implementation with the signature chain(x).

Signatures:

(m::AbstractChain)(x): run the AbstractArray x through all layers and return the output

source

NNHelferlein.add_layer! — Function

function add_layer!(n::Union{NNHelferlein.AbstractNN, NNHelferlein.AbstractChain}, l)

Add a layer l or a chain to a model n. The layer is always added at the end of the chains. The modified model is returned.

source

Base.:+ — Function

function +(n::Union{NNHelferlein.AbstractNN, NNHelferlein.AbstractChain}, l::Union{AbstractLayer, AbstractChain})
function +(l1::AbstractLayer, l2::Union{AbstractLayer, AbstractChain})

The plus-operator is overloaded to be able to add layers and chains to a network.

The second form returns a new chain if 2 Layers are added.

Example:

julia> mdl = Classifier() + Dense(2,5)
julia> print_network(mdl)

NNHelferlein neural network summary:
Classifier with 1 layers,                                           15 params
Details:
 
    Dense layer 2 → 5 with sigm,                                    15 params
 
Total number of layers: 1
Total number of parameters: 15


julia> mdl = mdl + Dense(5,5) + Dense(5,1, actf=identity)
julia> print_network(mdl)

NNHelferlein neural network summary:
Classifier with 3 layers,                                           51 params
Details:
 
    Dense layer 2 → 5 with sigm,                                    15 params
    Dense layer 5 → 5 with sigm,                                    30 params
    Dense layer 5 → 1 with identity,                                 6 params
 
Total number of layers: 3
Total number of parameters: 51
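
The parameter counts in the two summaries can be checked by hand: a dense layer with i inputs and j neurons has i*j weights plus j biases. A minimal sketch of that arithmetic (the helper dense_params is hypothetical and not part of the package):

```julia
# Hypothetical helper: weights (i*j) plus biases (j) of a dense layer i → j.
dense_params(i, j) = i * j + j

dense_params(2, 5)                                            # 15
dense_params(2, 5) + dense_params(5, 5) + dense_params(5, 1)  # 51
```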

source

NNHelferlein.Classifier — Type

struct Classifier <: AbstractNN

Classifier with default nll loss. An alternative loss function can be supplied as keyword argument. The function must provide a signature to be called as loss(model(x), y).

Constructors:

Classifier(layers...; loss=Knet.nll)

Signatures:

(m::Classifier)(x,y) = m.loss(m(x), y)

source

NNHelferlein.Regressor — Type

struct Regressor <: AbstractNN

Regression network with square loss as loss function.

Constructors:

Regressor(layers...; loss=mean_squared_error)

Signatures:

(m::Regressor)(x,y) = mean(abs2, Array(m(x)) - y)

source

NNHelferlein.Transformer — Type

mutable struct Transformer

A Bert-like transformer network consisting of an encoder and a decoder stack.

Constructor:

Transformer(n_layers, depth, heads; drop_rate=0.1)

n_layers: number of layers in encoder and decoder
depth: embedding depth
heads: number of heads for the multi-head attention
drop_rate: dropout rate used in all layers

Signature:

(tf::Transformer)(x, y; enc_mask=nothing, dec_mask=nothing)

The transformer is called with two 3-d-arrays of embedded sequences x and y of size [depth, seq_len, n_minibatch] and returns a tensor of size [depth, seq_len_y, n_minibatch]. Sequences x and y may be of different lengths; the output always has the same dimensions as y.

Attention factors of the last run are stored in the field α of the transformer object.

enc_mask and dec_mask are optional padding masks for the encoder and decoder input, respectively. They must be of size [seq_len, n_minibatch].

source

NNHelferlein.TokenTransformer — Type

mutable struct TokenTransformer

A wrapper around the Transformer object that takes sequences of token ids as input.

Constructor:

TokenTransformer(n_layers, depth, heads, 
                 x_vocab, y_vocab;
                 drop_rate=0.1)

n_layers: number of layers in encoder and decoder
depth: embedding depth
heads: number of heads for the multi-head attention
x_vocab: vocabulary size of the input sequences as integer value or a WordTokenizer object
y_vocab: vocabulary size of the output sequences as integer value or a WordTokenizer object
drop_rate: dropout rate used in all layers

Signature:

    (tt::TokenTransformer)(x, y; enc_mask=nothing, dec_mask=nothing,
                           embedded=true)

The transformer is called with two 2-d-arrays of token ids x and y of size [seq_len, n_minibatch] which may be of different lengths. It returns a tensor of size [y_vocab, seq_len_y, n_minibatch] with the raw activations of output neurons or, if embedded is set to false, a 2-d-array of size [seq_len_y, n_minibatch] with the sequences of generated tokens.

source

NNHelferlein.Chain — Type

struct Chain <: AbstractChain

Simple wrapper to chain layers and execute them one after another.

source

NNHelferlein.VAE — Type

struct VAE   <: AbstractNN

Type for a generic variational autoencoder.

Constructor:

VAE(encoder, decoder)

Separate predefined chains (ideally, but not necessarily of type Chain) for encoder and decoder must be specified. The VAE needs the 2 parameters mean and variance to define the distribution of each code-neuron in the bottleneck-layer. In consequence the encoder output must be 2 times the size of the decoder input (in case of dense layers: if the encoder output is an 8-value vector, 4 codes are defined and the decoder input is a 4-value vector; in case of convolutional layers the number of encoder output channels must be 2 times the number of decoder input channels - see the examples).

Signatures:

(vae::VAE)(x)
(vae::VAE)(x,y)

Called with one argument, predict will be executed; with two arguments (args x and y should be identical for the autoencoder) the loss will be returned.

Details:

The loss is calculated as the sum of element-wise error squares plus the Kullback-Leibler-Divergence to adapt the distributions of the bottleneck codes:

\[\mathcal{L} = \frac{1}{2} \sum_{i=1}^{n_{outputs}} (t_{i}-o_{i})^{2} - \frac{1}{2} \sum_{j=1}^{n_{codes}}(1 + \ln\sigma_{c_j}^{2}-\mu_{c_j}^{2}-\sigma_{c_j}^{2}) \]

Output of the autoencoder is cropped to the size of input before loss calculation (and before prediction); i.e. the output has always the same dimensions as the input, even if the last layer generates a bigger shape.

KL-training parameters:

The parameter β is by default set to 1.0, i.e. mean-squared error and KL have the same weight. The functions set_beta!(vae, beta) and get_beta(vae) can be used to set and get the β used in training. With β=0.0 no KL-loss will be used.
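
The loss formula in the Details above can be written out in a few lines of plain Julia. This is only a sketch under stated assumptions: the helper vae_loss_sketch is hypothetical and not part of NNHelferlein, it handles a single sample, it expects the encoder to deliver the code means μ and the log-variances logvar, and it applies β as a weight on the KL term.

```julia
# Hypothetical helper, not part of NNHelferlein: evaluates the loss formula
# for one sample from targets t, outputs o, code means μ and log-variances logvar.
function vae_loss_sketch(t, o, μ, logvar; β=1.0)
    reconstruction = 0.5 * sum(abs2, t .- o)
    kl = -0.5 * sum(1 .+ logvar .- μ .^ 2 .- exp.(logvar))
    return reconstruction + β * kl
end

t = [1.0, 0.0, 1.0]
o = [0.5, 0.0, 1.0]

# codes that already match the standard normal prior (μ = 0, σ² = 1) add no KL penalty
vae_loss_sketch(t, o, zeros(2), zeros(2))          # 0.125

# shifting one code mean to 1.0 adds μ²/2 = 0.5 to the loss
vae_loss_sketch(t, o, [1.0, 0.0], zeros(2))        # 0.625

# with β = 0.0 only the reconstruction error remains
vae_loss_sketch(t, o, [1.0, 0.0], zeros(2); β=0.0) # 0.125
```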

source

NNHelferlein.get_beta — Function

function get_beta(vae::VAE; ramp=false)

Return a Dict with the current VAE-parameters beta and ramp-up.

Arguments:

ramp=false: if true, a vector of β for all ramp-up steps is returned. This way, the ramp-up phase can be visualised (figure: ./assets/vae-beta-range.png).

source

NNHelferlein.set_beta! — Function

function set_beta!(vae::VAE, βmax; ramp_up=false, steps=0)

Helper to set the current value of the VAE-parameter beta and ramp-up settings.

VAE loss is calculated as (mean of error squares) + β * (mean of KL divergence).

Ramp-up:

In case of ramp_up=true, β starts with almost 0.0 (sigm(-10.0) ≈4.5e-5) and reaches almost 1.0 after steps steps, following a sigmoid curve. steps should be more than 25, to avoid rounding errors in the calculation of the derivative of the sigmoid function.
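
A sigmoid ramp of this kind could look as follows. The schedule below is an assumption made for illustration (β follows the logistic function over the interval [-10, 10], spread evenly over the given number of steps); the helper beta_ramp_sketch is hypothetical and the package may compute its schedule differently.

```julia
# Logistic function, written out here so the sketch needs no packages.
sigm(x) = 1 / (1 + exp(-x))

# Hypothetical schedule, an assumption for illustration only:
# β follows sigm over the interval [-10, 10], spread evenly over `steps` steps.
beta_ramp_sketch(β_max, steps) =
    [β_max * sigm(x) for x in range(-10, stop=10, length=steps)]

βs = beta_ramp_sketch(1.0, 50)
first(βs)   # ≈ 4.5e-5, i.e. sigm(-10.0)
last(βs)    # ≈ 0.99995, almost β_max
```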

source

Layers

NNHelferlein.AbstractLayer — Type

abstract type AbstractLayer
abstract type Layer

Mother type for layers hierarchy. (The type Layer is kept for backward compatibility)

source

Fully connected layers

NNHelferlein.Dense — Type

struct Dense  <: AbstractLayer

Default Dense layer.

Constructors:

Dense(w, b, actf): default constructor, w are the weights and b the bias.
Dense(i::Int, j::Int; actf=sigm, init=..): layer of j neurons with i inputs. Initialiser is xavier_uniform for actf=sigm and xavier_normal otherwise.
Dense(h5::HDF5.File, group::String; trainable=false, actf=sigm): kernel and bias are loaded by the specified group.
Dense(h5::HDF5.File, kernel::String, bias::String; trainable=false, actf=sigm): layer imported from a TensorFlow HDF5 file with the HDF object h5; kernel and bias name the entries that hold the weights and biases.

source

NNHelferlein.Linear — Type

struct Linear  <: AbstractLayer

Almost standard dense layer, but functionality inspired by the TensorFlow-layer:

capable of working with input tensors of any number of dimensions
default activation function identity
optionally without biases.

The shape of the input tensor is preserved; only the size of the first dim is changed from in to out.

Constructors:

Linear(i::Int, j::Int; bias=true, actf=identity, init=xavier_normal) where i is fan-in and j is fan-out.

Keyword arguments:

bias=true: if false biases are fixed to 0.0
actf=identity: activation function.
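
How a dense layer can preserve the shape of an input with any number of dimensions is sketched below: all trailing dimensions are folded into columns, an ordinary matrix multiplication is applied, and the result is unfolded again. The function linear_sketch is a hypothetical stand-in without activation function, not the implementation used by the package.

```julia
# Hypothetical stand-in for the layer's forward pass (no activation function):
# w has size [out, in], b has length out, x has size [in, d2, d3, ...].
function linear_sketch(w, b, x)
    in_size = size(x, 1)
    x2d = reshape(x, in_size, :)          # fold all trailing dims into columns
    y2d = w * x2d .+ b                    # ordinary dense layer on a matrix
    return reshape(y2d, size(w, 1), size(x)[2:end]...)
end

w = ones(3, 4)                            # fan-in 4, fan-out 3
b = zeros(3)
x = ones(4, 5, 2)                         # e.g. [depth, seq_len, minibatch]
size(linear_sketch(w, b, x))              # (3, 5, 2)
```

Only the first dimension changes (from 4 to 3); the remaining dimensions are passed through untouched.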

source

NNHelferlein.Embed — Type

struct Embed <: AbstractLayer

Simple type for an embedding layer to embed a virtual onehot-vector into a smaller number of neurons by linear combination. The onehot-vector is virtual, because not the vector, but only the index of the "one" in the vector has to be provided as Integer value (or a minibatch of integers) with values between 1 and the vocab size.

Constructors:

Embed(v,d; actf=identity, mask=nothing): with vocab size v, embedding depth d and default activation function identity. mask defines the padding token (see below).

Signatures:

(l::Embed)(x): default embedding of input tensor x.

Value:

The embedding is constructed by adding a first dimension to the input tensor with number of rows = embedding depth. If x is a column vector, the value is a matrix. If x is a row-vector or a matrix, the value is a 3-d array, etc.

Padding and masking:

If a token value is defined as mask, occurrences are embedded as zero vector. This can be used for padding sequences with zeros. The masking/padding token counts to the vocab size. If padding tokens are not masked, their embedding will be optimised during training (which is not recommended but still possible for many applications).

Zero may be used as padding token, but it must count to the vocab size (i.e. the vocab size must be one larger than the number of tokens) and the keyword arg mask=0 must be specified.
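
The lookup and the masking described above can be sketched in plain Julia. The function embed_sketch is hypothetical and only illustrates the idea: it indexes the columns of an embedding matrix directly, so it assumes token ids of at least 1 and does not cover the mask=0 case.

```julia
# Hypothetical stand-in for the embedding lookup: w is an embedding matrix of
# size [depth, vocab]; ids holds token ids between 1 and the vocab size.
function embed_sketch(w, ids; mask=nothing)
    depth = size(w, 1)
    out = reshape(w[:, vec(ids)], depth, size(ids)...)   # new first dim = depth
    if mask !== nothing
        keep = reshape(ids .!= mask, 1, size(ids)...)    # false at padding positions
        out = out .* keep                                # zero vectors for masked tokens
    end
    return out
end

w = reshape(collect(1.0:12.0), 3, 4)      # depth 3, vocab size 4
ids = [1 4; 2 4]                          # matrix of token ids, 4 used as padding
e = embed_sketch(w, ids; mask=4)
size(e)                                   # (3, 2, 2)
e[:, 1, 1]                                # [1.0, 2.0, 3.0], column 1 of w
e[:, 1, 2]                                # [0.0, 0.0, 0.0], masked token
```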

source

Convolutional

NNHelferlein.Conv — Type

struct Conv  <: AbstractLayer

Default Conv layer.

Constructors:

Conv(w1::Int, w2::Int, i::Int, o::Int; actf=relu, kwargs...): layer with o kernels of size (w1,w2) for an input of i channels.
Conv(w1::Int, w2::Int, w3::Int, i::Int, o::Int; actf=relu, kwargs...): layer with 3-dimensional kernels for 3D convolution (requires 5-dimensional input)
Conv(w1::Int, i::Int, o::Int; actf=relu, kwargs...): layer with o kernels of size (1,w1) for an input of i channels. This 1-dimensional convolution uses a 2-dimensional kernel with a first dimension of size 1. Input and output contain an empty first dimension of size 1. If padding, stride or dilation are specified, 2-tuples must be specified to correspond with the 2-dimensional kernel (e.g. padding=(0,1) for a 1-padding along the 1D sequence).

Constructors to read parameters from Tensorflow/Keras HDF-files:

Conv(h5::HDF5.File, kernel::String, bias::String; trainable=false, actf=Knet.relu, use_bias=true, kwargs...): Import parameters from HDF file h5 with kernel and bias specifying the full path to weights and biases, respectively.
Conv(h5::HDF5.File, group::String; trainable=false, actf=relu, tf=true, use_bias=true): Import a conv-layer from a default TF/Keras HDF5 file. If tf=false, group defines the full path to the parameters group/kernel:0 and group/bias:0. If tf=true, group defines only the group name and parameters are addressed as model_weights/group/group/kernel:0 and model_weights/group/group/bias:0.

Keyword arguments:

padding=0: the number of extra zeros implicitly concatenated at the start and end of each dimension.
stride=1: the number of elements to slide to reach the next filtering window.
dilation=1: dilation factor for each dimension.
... See the Knet documentation for Details: https://denizyuret.github.io/Knet.jl/latest/reference/#Convolution-and-Pooling. All keywords to the Knet function conv4() are supported.

source

NNHelferlein.DeConv — Type

struct DeConv  <: AbstractLayer

Default deconvolution layer.

Constructors:

DeConv(w, b, actf, kwargs...): default constructor
DeConv(w1::Int, w2::Int, i::Int, o::Int; actf=relu, kwargs...): layer with o kernels of size (w1,w2) for an input of i channels.
DeConv(w1::Int, w2::Int, w3::Int, i::Int, o::Int; actf=relu, kwargs...): layer with o kernels of size (w1,w2,w3) for an input of i channels.

Keyword arguments:

padding=0: the number of extra zeros implicitly concatenated at the start and end of each dimension (applied to the output).
stride=1: the number of elements to slide to reach the next filtering window (applied to the output).
... See the Knet documentation for Details: https://denizyuret.github.io/Knet.jl/latest/reference/#Convolution-and-Pooling. All keywords to the Knet function deconv4() are supported.

source

NNHelferlein.ResNetBlock — Type

struct ResNetBlock <: AbstractChain

Executable type for one block of a ResNet-type network.

Constructors:

ResNetBlock(layers; shortcut=[identity], post=[identity]): 3 chains to form the block: the main chain, the shortcut and a chain of layers to be added after the confluence. All chains must be specified as lists, even if they are empty ([]) or comprise only one layer ([BatchNorm]).

source

NNHelferlein.DepthwiseConv — Type

DepthwiseConv  <: AbstractLayer

Conv layer with separate filters per input channel. o output feature maps will be created by performing a convolution on only one input channel. o must be a multiple of i.

Constructors:

DepthwiseConv(w, b, actf; kwargs): default constructor
DepthwiseConv(w1::Int, w2::Int, i::Int, o::Int; actf=relu, kwargs...): layer with o kernels of size (w1,w2) for every input channel of a 2-d input of i channels. o must be a multiple of i; if o == i, each output feature map is generated from one channel. If o == n*i, n feature maps are generated from each channel.

Keyword arguments:

padding=0: the number of extra zeros implicitly concatenated at the start and end of each dimension.
stride=1: the number of elements to slide to reach the next filtering window.
dilation=1: dilation factor for each dimension.

source

NNHelferlein.Pool — Type

struct Pool <: AbstractLayer

Pooling layer.

Constructors:

Pool(;kwargs...): max pooling; without kwargs, 2-pooling is performed.

Keyword arguments:

window=2: pooling window size (same for all directions)
...: See the Knet documentation for Details: https://denizyuret.github.io/Knet.jl/latest/reference/#Convolution-and-Pooling. All keywords to the Knet function pool are supported.

source

NNHelferlein.UnPool — Type

struct UnPool <: AbstractLayer

Unpooling layer.

Constructors:

UnPool(;kwargs...): user-defined unpooling

source

NNHelferlein.Pad — Type

struct Pad     <: AbstractLayer

Pad an n-dimensional array along dims with one of the types supported by Flux.NNlib.

Constructors:

Pad(padding::Int; type=:zeros, dims=nothing): Pad with padding along all dims.

Keyword arguments:

type: one of
- :zeros: zero-padding
- :ones: one-padding
- :repeat: repeat values on the border
- :reflect: reflect values across the border
dims: Tuple of dims to be padded. If dims==nothing all except the last 2 dimensions (i.e. channel and minibatch dimension for convolution layers) are padded.

source

Recurrent

NNHelferlein.RecurrentUnit — Type

abstract type RecurrentUnit end

Supertype for all recurrent unit types. Self-defined recurrent units which are a child of RecurrentUnit can be used inside the 'Recurrent' layer.

Interface

All subtypes of RecurrentUnit must provide the following:

a constructor with signature Type(n_inputs, n_units; kwargs) and arbitrary keyword arguments.
an implementation of signature (o::Recurrent)(x) where x is a 3d- or 2d-array of shape [fan-in, mb-size, 1] or [fan-in, mb-size]. The function must return the result of one forward computation for one step and return the hidden state and set the internal fields h and optionally c.
a field h (to store the last hidden state)
an optional field c, if the cell state is to be stored such as in a lstm unit.

source

NNHelferlein.Recurrent — Type

struct Recurrent <: AbstractLayer

One-layer RNN that works with minibatches of (time) series data. A minibatch can be a 2- or 3-dimensional Array. If 2-d, inputs for one step are in one column and the Array has as many columns as steps. If 3-d, the last dimension iterates the samples of the minibatch.

The result is an array with the output of the units of all steps for all samples of the minibatch (with model depth as first and samples of the minibatch as last dimension).

Constructors:

Recurrent(n_inputs::Int, n_units::Int; u_type=:lstm, 
          bidirectional=false, allow_mask=false, o...)

n_inputs: number of inputs
n_units: number of units
u_type: unit type can be one of the Knet unit types (:relu, :tanh, :lstm, :gru) or a type which must be a subtype of RecurrentUnit and fulfil the respective interface (see the docs for RecurrentUnit).
bidirectional=false: if true, 2 layers of n_units units will be defined and run in forward and backward direction respectively. The hidden state is [2*n_units, mb] or [2*n_units, steps, mb] if return_all==true.
allow_mask=false: if masking is allowed, a slower algorithm is used to be able to ignore any masked step. Arbitrary sequence positions may be masked for any sequence.

Any keyword argument of Knet.RNN or a self-defined RecurrentUnit type may be provided.

Signatures:

function (rnn::Recurrent)(x; c=nothing, h=nothing, return_all=false, 
          mask=nothing)

The layer is called either with a 2-dimensional array of the shape [fan-in, steps] or a 3-dimensional array of [fan-in, steps, batchsize].

Arguments:

c=0, h=0: initialises the hidden and cell state. If nothing, states h or c keep their values. If c=0 or h=0, the states are reset to 0; otherwise an array of states of the correct dimensions can be supplied to be used as initial states.
return_all=false: if true an array with all hidden states of all steps is returned (size is [units, time-steps, minibatch]). Otherwise only the hidden states of the last step are returned ([units, minibatch]).
mask: optional mask for the input sequence minibatch of shape [steps, minibatch]. Values in the mask must be 1.0 for masked positions or 0.0 otherwise and of type Float32 or CuArray{Float32} for GPU context. Appropriate masks can be generated with the NNHelferlein function mk_padding_mask().
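
The mask format described in the arguments above (1.0 for masked positions, 0.0 otherwise, Float32, shape [steps, minibatch]) is sketched below for sequences of token ids. The helper padding_mask_sketch is hypothetical and the padding id 0 is an assumption; it is not the library's mk_padding_mask(), whose exact signature is not shown here.

```julia
# Hypothetical helper (not the library's mk_padding_mask): marks every
# position that holds the padding id with 1.0 and all others with 0.0.
padding_mask_sketch(ids; pad=0) = Float32.(ids .== pad)

# two sequences of up to 4 steps, stored as [steps, minibatch];
# 0 is assumed to be the padding id here
ids = [5 7;
       3 2;
       9 0;
       0 0]
mask = padding_mask_sketch(ids)
size(mask)        # (4, 2), i.e. [steps, minibatch]
sum(mask)         # 3.0f0, three padded positions
```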

Bidirectional layers can be constructed by specifying bidirectional=true, if the unit-type supports it (Knet.RNN does). Please be aware that the actual number of units is 2 x n_units for bidirectional layers and the output dimension is [2 x units, steps, mb] or [2 x units, mb].

source

NNHelferlein.get_hidden_states — Function

function get_hidden_states(l::<RNN_Type>; flatten=true)

Return the hidden states of one or more layers of an RNN. <RNN_Type> is one of NNHelferlein.Recurrent, Knet.RNN.

Arguments:

flatten=true: if the states tensor is 3d with a 3rd dim > 1, the array is transformed to [units, mb, 1] to represent all current states after the last step.

source

NNHelferlein.get_cell_states — Function

function get_cell_states(l::<RNN_Type>; unbox=true, flatten=true)

Return the cell states of one or more layers of an RNN only if it is a LSTM (Long short-term memory).

Arguments:

unbox=true: By default, c is unboxed when called in @diff context (while AutoGrad is recording) to avoid unwanted dependencies of the computation graph s2s.attn(reset=true) (backprop should run via the hidden states, not the cell states).
flatten=true: if the states tensor is 3d with a 3rd dim > 1, the array is transformed to [units, mb, 1] to represent all current states after the last step.

source

NNHelferlein.set_hidden_states! — Function

function set_hidden_states!(l::<RNN_Type>, h)

Set the hidden states of one or more layers of an RNN to h.

source

NNHelferlein.set_cell_states! — Function

function set_cell_states!(l::<RNN_Type>, c)

Set the cell states of one or more layers of an RNN to c.

source

NNHelferlein.reset_hidden_states! — Function

function reset_hidden_states!(l::<RNN_Type>)

Reset the hidden states of one or more layers of an RNN to 0.

source

NNHelferlein.reset_cell_states! — Function

function reset_cell_states!(l::<RNN_Type>)

Reset the cell states of one or more layers of an RNN to 0.

source

Transformers

NNHelferlein.TFEncoder — Type

TFEncoder

A Bert-like encoder to be used as part of a transformer. The encoder is built as a stack of TFEncoderLayers which is entered after embedding, positional encoding and generation of a padding mask.

Constructor:

TFEncoder(n_layers, depth, n_heads; drop_rate=0.1)

Signature:

(e::TFEncoder)(x)

The encoder is called with a matrix of embedded tokens of size [depth, seq_len, n_minibatch] and returns a tensor of size [depth, seq_len, n_minibatch].

source

NNHelferlein.TFEncoderLayer — Type

TFEncoderLayer

A Bert-like encoder layer to be used as part of a Bert-like transformer. The layer consists of a multi-head attention sub-layer followed by a feed-forward network of size depth -> 4*depth -> depth. Both parts have separate residual connections and layer normalisation.

The design follows the original paper "Attention is all you need" by Vaswani et al., 2017.

Constructor:

TFEncoderLayer(depth, n_heads, drop)

depth: Embedding depth
n_heads: number of heads for the multi-head attention
drop: dropout rate

Signature:

(el::TFEncoderLayer)(x; mask=nothing)

Objects of type TFEncoderLayer are callable and expect a 3-dimensional array of size [embedding_depth, seq_len, minibatch_size] as input. The optional mask must be of size [seq_len, minibatch_size] and mark masked positions with 1.0.

It returns a tensor of the same size as the input and the self-attention factors of size [seq_len, seq_len, minibatch_size].

source

NNHelferlein.TFDecoder — Type

TFDecoder

A Bert-like decoder to be used as part of a transformer. The decoder is built as a stack of TFDecoderLayers which is entered after embedding, positional encoding and generation of a padding mask and a peek-ahead mask.

Constructor:

TFDecoder(n_layers, depth, n_heads, vocab_size; 
          pad_id=NNHelferlein.TOKEN_PAD, drop_rate=0.1)

Signature:

(e::TFDecoder)(x)

The decoder is called with a matrix of token ids of size [seq_len, n_minibatch] and returns a tensor of size [depth, seq_len, n_minibatch] and the attention factors.
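
A peek-ahead mask, as mentioned above, prevents a sequence position from attending to later positions. A minimal sketch follows; the helper peek_ahead_mask_sketch is hypothetical, and the orientation of the matrix is an assumption (rows are the attending positions, columns the attended positions), with 1.0 marking masked entries as for the padding masks.

```julia
# Hypothetical helper: entry [i, j] is 1.0 when position i would look at a
# later position j > i, and 0.0 otherwise (1.0 marks masked positions).
peek_ahead_mask_sketch(n) = Float32[j > i ? 1 : 0 for i in 1:n, j in 1:n]

m = peek_ahead_mask_sketch(3)
# 3×3 Matrix{Float32}:
#  0.0  1.0  1.0
#  0.0  0.0  1.0
#  0.0  0.0  0.0
```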

source

NNHelferlein.TFDecoderLayer — Type

TFDecoderLayer

A Bert-like decoder layer to be used as part of a Bert-like transformer. The layer consists of a multi-head self-attention sub-layer, a multi-head attention sub-layer followed by a feed-forward network of size depth -> 4*depth -> depth. All three parts have separate residual connections and layer normalisation.

The design follows the original paper "Attention is all you need" by Vaswani et al., 2017.

Constructor:

TFDecoderLayer(depth, n_heads, drop)

depth: Embedding depth
n_heads: number of heads for the multi-head attention
drop: dropout rate

Signature:

(el::TFDecoderLayer)(x, h_encoder; enc_m_pad=nothing, m_combi=nothing)

Objects of type TFDecoderLayer are callable and expect a minibatch of embedded sequences as input.

x: 3-dimensional array of size [embedding_depth, seq_len, minibatch_size]
h_encoder: output of the encoder stack
enc_m_pad: optional padding mask for the encoder output
m_combi: optional mask for the decoder self-attention combining padding and peek-ahead mask.

It returns a tensor of the same size as the input, the self-attention factors and the decoder-encoder attention factors.

source

These layers are used by the Transformer and TokenTransformer types to build Bert-like transformer networks.

Others

NNHelferlein.Flat — Type

struct Flat <: AbstractLayer

Default flatten layer.

Constructors:

Flat(): with no options.

source

NNHelferlein.flatten — Function

flatten(x)

Flatten a tensor to a matrix, preserving the last dimension.

source

NNHelferlein.PyFlat — Type

struct PyFlat <: AbstractLayer

Flatten layer with optional Python-style flattening (row-major). This layer can be used if pre-trained weight matrices from tensorflow are applied after the flatten layer.

Constructors:

PyFlat(; python=true): if true, row-major flatten is performed.
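
The difference between the default (column-major) flattening and a row-major flattening can be illustrated in plain Julia. Both functions below are hypothetical sketches, not the package's implementation; they assume that the last dimension is the minibatch dimension.

```julia
# Column-major flatten, as plain reshape does in Julia:
flatten_sketch(x) = reshape(x, :, size(x)[end])

# Row-major flatten: reverse the order of the non-batch dims first, so that
# the last of them varies fastest, as in NumPy's default layout.
function pyflat_sketch(x)
    n = ndims(x)
    perm = (reverse(1:n-1)..., n)
    return reshape(permutedims(x, perm), :, size(x, n))
end

x = reshape(collect(1:6), 2, 3, 1)   # one sample of size 2×3: [1 3 5; 2 4 6]
vec(flatten_sketch(x))               # [1, 2, 3, 4, 5, 6]
vec(pyflat_sketch(x))                # [1, 3, 5, 2, 4, 6]
```

The second result reads the sample row by row, which is the order a weight matrix trained after a row-major flatten would expect.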

source

NNHelferlein.FeatureSelection — Type

struct FeatureSelection  <: AbstractLayer

Simple feature selection layer that maps input to output with one-by-one connections; i.e. a layer of size 128 has 128 weights (plus optional biases).

Biases and activation functions are disabled by default.

Constructors:

FeatureSelection(i; bias=false, actf=identity): with the same input- and output-size i, where i is an integer or a Tuple of the input dimensions.

source

NNHelferlein.Activation — Type

struct Activation <: AbstractLayer

Simple activation layer with the desired activation function as argument.

Constructors:

Activation(actf)
Relu()
Sigm()
Swish()

source

NNHelferlein.Softmax — Type

struct Softmax <: AbstractLayer

Simple softmax layer to compute softmax probabilities.

Constructors:

Softmax()

source

NNHelferlein.Logistic — Type

struct Logistic <: AbstractLayer

Logistic (sigmoid) layer activation with additional Temperature parameter to control the slope of the curve. Low temperatures (such as T=0.001) result in a step-like activation function, whereas high temperatures (such as T=10) make the activation almost linear.

Constructors:

Logistic(; T=1.0)
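
The effect of the temperature can be tried out with a one-line function. The form below, in which the input is divided by T before the sigmoid is applied, is an assumption made for illustration; logistic_sketch is a hypothetical name.

```julia
# Assumed form for illustration: the input is divided by T before the sigmoid.
logistic_sketch(x; T=1.0) = 1 / (1 + exp(-x / T))

logistic_sketch(0.5)              # ≈ 0.622, standard sigmoid
logistic_sketch(0.5; T=0.001)     # ≈ 1.0, step-like
logistic_sketch(0.5; T=10.0)      # ≈ 0.512, nearly linear around 0
```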

source

NNHelferlein.Dropout — Type

struct Dropout <: AbstractLayer

Dropout layer. Implemented with help of Knet's dropout() function that evaluates AutoGrad.recording() to detect if in training or in prediction. Dropout is applied only in training.

Constructors:

Dropout(p) with the dropout rate p.

source

NNHelferlein.BatchNorm — Type

struct BatchNorm <: AbstractLayer

Batchnormalisation layer. Implemented with help of Knet's batchnorm() function that evaluates AutoGrad.recording() to detect if in training or in prediction. In training the moments are updated to record the running averages; in prediction the moments are applied, but not modified.

In addition, optional trainable factor a and bias b are applied:

\[y = a \cdot \frac{(x - \mu)}{(\sigma + \epsilon)} + b\]

Constructors:

BatchNorm(; scale=true, channels=0) will initialise the moments with Knet.bnmoments() and trainable parameters β and γ only if scale==true (in this case, the number of channels must be defined - for CNNs this is the number of feature maps).

Constructors to read parameters from Tensorflow/Keras HDF-files:

BatchNorm(h5::HDF5.File, β_path, γ_path, μ_path, var_path; scale=false, trainable=true, momentum=0.1, ε=1e-5, dims=4): Import parameters from HDF file h5 with β_path, γ_path, μ_path and var_path specifying the full path to β, γ, μ and variance respectively.
BatchNorm(h5::HDF5.File, group::String; scale=false, trainable=true, momentum=0.1, ε=1e-5, dims=4, tf=true): Import parameters from HDF file h5 with parameters in the group group. Paths to β, γ, μ and variance are constructed if tf=true as model_weights/group/group/beta:0, etc. If tf=false group must define the full group path: group/beta:0. dims specifies the number of dimensions of the input and may be 2, 4 or 5. The default (4) applies to standard CNNs (imgsize, imgsize, channels, batchsize).

Keyword arguments:

scale=true: if true, the trainable scale parameters β and γ are used.
trainable=true: only used with hdf5-import. If true the parameters β and γ are initialised as Param and trained in training.

Details:

2d, 4d and 5d inputs are supported. Mean and variance are computed over dimensions (2), (1,2,4) and (1,2,3,5) for 2d, 4d and 5d arrays, respectively.

If scale=true and channels != 0, trainable parameters β and γ will be initialised for each channel.

If scale=true and channels == 0 (i.e. BatchNorm(scale=true)), the params β and γ are not initialised by the constructor. Instead, the number of channels is inferred when the first minibatch is normalised as: 2d: size(x)[1] 4d: size(x)[3] 5d: size(x)[4] or 0 otherwise.
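
For a 4d input, computing the moments over the dimensions (1,2,4) yields one mean and one standard deviation per channel. The sketch below applies the formula given above with scalar a and b; batchnorm_sketch is a hypothetical function that ignores running averages and is not the Knet implementation.

```julia
using Statistics

# Hypothetical forward pass for a 4d input [width, height, channels, batch],
# following the formula as written above (ε is added to the standard deviation).
function batchnorm_sketch(x; a=1.0, b=0.0, ε=1e-5)
    μ = mean(x, dims=(1, 2, 4))                     # one mean per channel
    σ = std(x, dims=(1, 2, 4), corrected=false)     # one std per channel
    return a .* (x .- μ) ./ (σ .+ ε) .+ b
end

x = randn(8, 8, 3, 16) .* 5 .+ 2
y = batchnorm_sketch(x)
size(mean(y, dims=(1, 2, 4)))    # (1, 1, 3, 1): per-channel statistics
```

After normalisation every channel of y has a mean close to 0 and a standard deviation close to 1.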

source

NNHelferlein.LayerNorm — Type

struct LayerNorm  <: AbstractLayer

Simple layer normalisation (inspired by TFs LayerNormalization). Implementation is from Deniz Yuret's answer to feature request 429 (https://github.com/denizyuret/Knet.jl/issues/492).

The layer performs a normalisation within each sample, not batchwise. Normalisation is modified by two trainable parameters a and b (variance and mean) added to every value of the sample vector.

Constructors:

LayerNorm(depth; eps=1e-6): depth is the number of activations for one sample of the layer.

Signatures:

function (l::LayerNorm)(x; dims=1): normalise x along the given dimensions. The size of the specified dimension must fit with the initialised depth.
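
In contrast to batch normalisation, the statistics are taken within each sample. A minimal sketch of the idea, assuming the common form with a trainable scale a and shift b (layernorm_sketch is hypothetical and may differ in detail from the package's implementation):

```julia
using Statistics

# Hypothetical forward pass: normalise each sample (column) on its own,
# then apply the trainable scale a and shift b.
function layernorm_sketch(x, a, b; dims=1, eps=1e-6)
    μ = mean(x, dims=dims)
    σ = std(x, dims=dims)
    return a .* (x .- μ) ./ (σ .+ eps) .+ b
end

x = [1.0 10.0; 2.0 20.0; 3.0 30.0]            # 2 samples with depth 3
y = layernorm_sketch(x, ones(3), zeros(3))
isapprox(y[:, 1], y[:, 2]; atol=1e-5)         # true: the scale of a sample does not matter
```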

source

NNHelferlein.GaussianNoise — Type

struct GaussianNoise

Gaussian noise layer. Multiplies each training value by a Gaussian-distributed random value with mean = 1.0 and sigma = σ.

Constructors:

GaussianNoise(σ; train_only=true)

Arguments:

σ: Standard deviation for the distribution of noise
train_only=true: if true, noise will only be applied in training.

source

NNHelferlein.GlobalAveragePooling — Type

struct GlobalAveragePooling  <: AbstractLayer

Layer to return a matrix with the mean values of all but the last two dimensions for each sample of the minibatch. If the input is a stack of feature maps from a convolutional layer, the result can be seen as the mean value of each feature map. Number of output-rows equals number of input-featuremaps; number of output-columns equals size of minibatch.

Constructors:

GlobalAveragePooling()

source

NNHelferlein.global_average_pooling — Function

global_average_pooling(x)

Function to return a matrix with the mean values of all but the last two dimensions for each sample of the minibatch.
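
The operation can be sketched with functions from the Statistics standard library. The function gap_sketch is a hypothetical equivalent written for illustration, assuming that the last two dimensions are channels and minibatch.

```julia
using Statistics

# Hypothetical equivalent: average over every dimension except the last two
# (channels and minibatch) and drop the averaged dimensions.
function gap_sketch(x)
    d = Tuple(1:ndims(x)-2)
    return dropdims(mean(x, dims=d), dims=d)
end

x = ones(28, 28, 16, 8)      # 16 feature maps for a minibatch of 8
size(gap_sketch(x))          # (16, 8)
```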

source

Attention Mechanisms

NNHelferlein.AttentionMechanism — Type

abstract type AttentionMechanism

Attention mechanisms follow the same interface and common signatures.

If possible, the algorithm allows precomputing of the projections of the context vector generated by the encoder in an encoder-decoder-architecture (i.e. in case of an RNN encoder the accumulated encoder hidden states).

By default attention scores are scaled according to Vaswani et al., 2017 (Vaswani et al., Attention Is All You Need, CoRR, 2017).

All algorithms use soft attention.

Constructors:

Attn*Mechanism*(dec_units, enc_units; scale=true)
Attn*Mechanism*(units; scale=true)

The one-argument version can be used, if encoder dimensions and decoder dimensions are the same.

Common Signatures:

function (attn::AttentionMechanism)(h_t, h_enc; reset=false, mask=nothing)
function (attn::AttentionMechanism)(; reset=false)

Arguments:

h_t: decoder hidden state. If $h_t$ is a vector, its length equals the number of decoder units. If it is a matrix, $h_t$ includes the states for a minibatch of samples and has the size [units, mb].
h_enc: encoder hidden states, 2d or 3d. If 2d, $h_{enc}$ is a matrix [units, steps] with the hidden states of all encoder steps. If 3d: [units, mb, steps] encoder states for all samples of the minibatch.
mask: optional mask (e.g. padding mask) of dimensions [mb, steps] for masking input steps. Attention factors for masked steps will be set to 0.0.
reset=false: If the keyword argument is set to true, projections of the encoder states are recomputed. By default, projections are stored in the object and reused until the object is reset. For attention mechanisms that do not allow precomputation, the argument is ignored.

The short form (::AttentionMechanism)(reset=true) can be used to reset the precomputed projections.

Return values

All functions return c and α where α is a matrix of size [mb,steps] with the attention factors for each step and minibatch. c is a matrix of size [units, mb] with the context vector for each sample of the minibatch, calculated as the α-weighted sum of all encoder hidden states $h_{enc}$ for each minibatch.

Attention Mechanisms:

All attention mechanisms calculate attention factors α from scores derived from projections of the encoder hidden states:

\[\alpha = \mathrm{softmax}(\mathrm{score}(h_{t},h_{enc}) \cdot 1/\sqrt{n})\]

Attention mechanisms implemented:

source

NNHelferlein.AttnBahdanau — Type

mutable struct AttnBahdanau <: AttentionMechanism

Bahdanau-style (additive, concat) attention mechanism according to the paper:

D. Bahdanau, K. Cho, Y. Bengio, Neural Machine Translation by Jointly Learning to Align and Translate, ICLR, 2015.

\[\mathrm{score}(h_{t},h_{enc}) = v_{a}^{\top}\cdot\tanh(W[h_{t},h_{enc}])\]

Constructors:

AttnBahdanau(dec_units, enc_units; scale=true)
AttnBahdanau(units; scale=true)

source

NNHelferlein.AttnLuong — Type

mutable struct AttnLuong <: AttentionMechanism

Luong-style (multiplicative) attention mechanism according to the paper (referred to as general-type attention): M.-T. Luong, H. Pham, C.D. Manning, Effective Approaches to Attention-based Neural Machine Translation, CoRR, 2015.

\[\mathrm{score}(h_{t},h_{enc}) = h_{t}^{\top} W h_{enc}\]

Constructors:

AttnLuong(dec_units, enc_units; scale=true)
AttnLuong(units; scale=true)

source

NNHelferlein.AttnDot — Type

mutable struct AttnDot <: AttentionMechanism

Dot-product attention (without trainable parameters) according to the Luong, et al. (2015) paper.

$\mathrm{score}(h_{t},h_{enc}) = h_{t}^{\top} h_{enc}$

Constructors:

AttnDot(; scale=true)
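
For a single sample, scaled dot-product attention can be sketched as follows. This is a minimal illustration in plain Julia, not the package code: the names `softmax_vec` and `dot_attention` are hypothetical, and the scaling factor n is assumed to be the number of units.

```julia
# Numerically stable softmax over a vector.
softmax_vec(s) = (e = exp.(s .- maximum(s)); e ./ sum(e))

# Hypothetical sketch for one sample:
# h_t is a vector [units], h_enc a matrix [units, steps].
function dot_attention(h_t::AbstractVector, h_enc::AbstractMatrix; scale=true)
    scores = h_enc' * h_t                       # one score per encoder step
    if scale
        scores = scores ./ sqrt(length(h_t))    # assumption: n = number of units
    end
    α = softmax_vec(scores)                     # attention factors, sum to 1
    c = h_enc * α                               # α-weighted sum of encoder states
    return c, α
end

h_enc = Float32[1 0 0; 0 1 0]    # 2 units, 3 encoder steps
h_t = Float32[1, 0]
c, α = dot_attention(h_t, h_enc)
```

The first encoder state is the one most similar to `h_t`, so it receives the largest attention factor.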

source

NNHelferlein.AttnLocation — Type

mutable struct AttnLocation <: AttentionMechanism

Location-based attention that only depends on the current decoder state $h_t$ and not on the encoder states, according to the Luong, et al. (2015) paper.

$\mathrm{score}(h_{t}) = W h_{t}$

Constructors:

AttnLocation(len, dec_units; scale=true)

len: maximum sequence length of the encoder to be considered for attention. If the actual length of $h_{enc}$ is bigger than the length of α, attention factors for the remaining states are set to 0.0. If the actual length of h_enc is smaller than α, only the matching attention factors are applied.
dec_units: number of decoder units.

source

NNHelferlein.AttnInFeed — Type

mutable struct AttnInFeed <: AttentionMechanism

Input-feeding attention that depends on the current decoder state $h_t$ and the next input to the decoder $i_{t+1}$, according to the Luong, et al. (2015) paper.

Infeed attention provides a semantic attention that depends on the next input token.

$\mathrm{score}(h_{t}, i_{t+1}) = W_h h_{t} + W_i i_{t+1} = W [h_t, i_{t+1}]$

Constructors:

AttnInFeed(len, dec_units, fan_in; scale=true)

len: maximum sequence length of the encoder to be considered for attention. If the actual length of $h_{enc}$ is bigger than the length of α, attention factors for the remaining states are set to 0.0. If the actual length of h_enc is smaller than α, only the matching attention factors are applied.
dec_units: number of decoder units.
fan_in: size of the decoder input.

Signature:

function (attn::AttnInFeed)(h_t, inp, h_enc; mask=nothing)

h_t: decoder hidden state. If $h_t$ is a vector, its length equals the number of decoder units. If it is a matrix, $h_t$ includes the states for a minibatch of samples and has the size [units, mb].
inp: next decoder input $i_{t+1}$ (e.g. next embedded token of sequence)
h_enc: encoder hidden states, 2d or 3d. If 2d, $h_{enc}$ is a matrix [units, steps] with the hidden states of all encoder steps. If 3d: [units, mb, steps] encoder states for all samples of the minibatch.
mask: Optional mask for input states of shape [mb, steps].

source

Data providers

NNHelferlein.DataLoader — Type

abstract type DataLoader

Mother type for minibatch iterators.

source

NNHelferlein.SequenceData — Type

struct SequenceData <: DataLoader

Type for a generic minibatch iterator. All NNHelferlein models accept minibatches of type DataLoader.

Constructors:

SequenceData(x; shuffle=true)

x: List, Array or other iterable object with the minibatches
shuffle: if true, minibatches are shuffled every epoch.

source

Iteration utilities

NNHelferlein.PartialIterator — Type

struct PartialIterator <: DataLoader

The PartialIterator wraps any iterator and will only iterate the states specified in the list indices.

Constructors:

PartialIterator(inner, indices; shuffle=true)

Type of the states must match the states of the wrapped iterator inner. A nothing element may be given to specify the first iterator element.

If shuffle==true, the list of indices is shuffled every time the PartialIterator is started.

source

NNHelferlein.split_minibatches — Function

function split_minibatches(it, at=0.8; shuffle=true)

Return 2 iterators of type PartialIterator which iterate only parts of the states of the iterator it. Be aware that the partial iterators will not contain copies of the data but instead forward the data provided by the iterator it.

The function can be used to split an iterator of minibatches into train- and validation iterators, without copying any data. As the PartialIterator objects work with the states of the inner iterator, it is important not to shuffle the inner iterator (in this case the composition of the partial iterators would change and training and validation data may be mixed!).

Arguments:

it: Iterator to be split. The list of allowed states is created by performing a full iteration once.
at: Split point. The first returned iterator will include the given fraction (default: 80%) of the states.
shuffle: If true, the elements are shuffled at each restart of the iterator.
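
The idea of splitting without copying data can be sketched with index lists. Note the assumptions: the real PartialIterator works with iterator states, whereas this sketch uses positions in a vector, and the helper name `split_indices` is hypothetical.

```julia
using Random

# Hypothetical sketch: split the positions 1..n into two disjoint index lists.
function split_indices(n::Integer, at=0.8; shuffle=true, rng=Random.default_rng())
    idx = collect(1:n)
    shuffle && shuffle!(rng, idx)
    n_first = round(Int, at * n)
    return idx[1:n_first], idx[n_first+1:end]
end

minibatches = [(rand(Float32, 4, 8), rand(1:3, 8)) for _ in 1:10]
trn_idx, vld_idx = split_indices(length(minibatches), 0.8)

# Lazy generators forward the original minibatches; nothing is copied.
trn = (minibatches[i] for i in trn_idx)
vld = (minibatches[i] for i in vld_idx)
```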

source

NNHelferlein.MBNoiser — Type

type MBNoiser

Iterator to wrap any Knet.Data iterator of minibatches in order to add random noise. Each value will be multiplied by a random value drawn from a Gaussian distribution with mean=1.0 and sd=σ.

Constructors:

MBNoiser(mbs::Knet.Data, σ)
MBNoiser(mbs::Knet.Data; σ=0.01)

mbs: iterator with minibatches
σ: standard deviation for the Gaussian noise

Example:

julia> trn = minibatch(x)
julia> tb_train!(mdl, Adam, MBNoiser(trn, σ=0.1))
julia> mbs_noised = MBNoiser(mbs, 0.05)

source

NNHelferlein.MBMasquerade — Type

struct MBMasquerade  <: DataLoader

Iterator wrapper to partially mask training data of a minibatch iterator of type Knet.Data or NNHelferlein.DataLoader.

Constructors:

MBMasquerade(it, rho=0.1; mode=:noise, value=0)
MBMasquerade(it; ρ=0.1, mode=:noise, value=0)

The constructor may be called with the density rho as normal argument or ρ as keyword argument.

Arguments:

it: Minibatch iterator that must deliver (x,y)-tuples of minibatches
ρ=0.1 or rho: Density of mask; a value of 1.0 will mask everything, a value of 0.0 nothing.
value=0: the value with which the masking is done.
mode=:noise: type of masking (only :noise implemented yet):
- :noise: randomly distributed single values of the training data will be overwritten with value.
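
The :noise mode can be pictured as drawing a random mask with density ρ and overwriting the selected entries. The following is a sketch only. The helper name `mask_values` is hypothetical, and it works on a single array rather than on an iterator of minibatches.

```julia
using Random

# Hypothetical sketch: overwrite a fraction ρ of the values with `value`.
function mask_values(x::AbstractArray, ρ; value=0, rng=Random.default_rng())
    mask = rand(rng, size(x)...) .< ρ    # each entry is true with probability ρ
    x_masked = copy(x)                   # leave the input untouched
    x_masked[mask] .= value
    return x_masked
end

x = ones(Float32, 5, 5)
x_half = mask_values(x, 0.5; value=2)    # about half of the values become 2
```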

Examples:

julia> dtrn 
26-element Knet.Train20.Data{Tuple{CuArray{Float32}, Array{UInt8}}}

julia> mtrn = MBMasquerade(dtrn, 0.5, value=2.0)
MBMasquerade(26-element Knet.Train20.Data{Tuple{CuArray{Float32}, Array{UInt8}}}, 0.5, 2.0, :noise)

source

NNHelferlein.GPUIterator — Type

GPUIterator(iterator)

Wraps any iterator and makes it return CuArrays. Element types are preserved, except for Float types, which are cast to Float32 for performance reasons.

Constructor:

GPUIterator(iterator; y=:cpu)

iterator: any iterator
y: if :gpu, the labels of the iterator are also converted to CuArray{}. If :cpu, the labels are not converted. For a classifier (labels are integers), keeping labels on the cpu is more efficient. For regression (labels are Floats), keeping labels on the gpu is recommended.

Deprecation warning:

Use of GPUIterator is deprecated in favour of CUDA.CuIterator, which offers similar functionality.

source

Tabular data

Tabular data is normally provided in table form (csv, ods) row-wise, i.e. one sample per row. The helper functions can read the tables and generate Knet compatible iterators of minibatches.

NNHelferlein.dataframe_read — Function

dataframe_read(fname; o...)

Read a data table from a CSV file with one sample per row and return a DataFrame with the data. (ODS support was removed because of PyCall compatibility issues of the OdsIO package.)

All keyword arguments accepted by CSV.File() can be used.

source

NNHelferlein.dataframe_minibatch — Function

dataframe_minibatch(data::DataFrames.DataFrame; size=256, 
                    ignore=[], teaching=nothing, 
                    verbose=1, o...)

dataframe_minibatches()

Make Knet-conform minibatches of type Knet.Data from a dataframe with one sample per row.

dataframe_minibatches() is an alias kept for backward compatibility.

Arguments:

ignore: defines a list of column names to be ignored
teaching=nothing: defines the column name with teaching input. teaching is handled differently, depending on its type: If Int, the teaching input is interpreted as class IDs and directly used for training (this assumes that the values range from 1..n). If type is a String, values are interpreted as class labels and converted to numeric class IDs by calling mk_class_ids(). The list of valid labels and their order can be created by calling mk_class_ids(data.y)[2]. If teaching is a scalar value, regression context is assumed, and the value is used unchanged for training. If teaching is nothing, no teaching input is used and minibatches of x-data only are returned.
verbose=1: if > 0, a summary of how the dataframe is used is echoed.
other keyword arguments: all keyword arguments accepted by Knet.minibatch() may be used.

Allowed column definitions for ignore and teaching include names (as Strings), column names (as Symbols) or column indices (as Integer values).

source

NNHelferlein.dataframe_split — Function

function dataframe_split(df::DataFrames.DataFrame;
                         teaching="y", split=0.8, balanced=true)

Split data, organised row-wise in a DataFrame into train and validation sets.

Arguments:

df: data
teaching="y": name or index of column with teaching input "y"
split=0.8: fraction of data to be used for the first returned subdataframe
shuffle=true: shuffle the rows of the dataframe.
balanced=true: if true, result datasets will be balanced by oversampling. Returned datasets will be larger than expected but include the same number of samples for each class.

source

NNHelferlein.mk_class_ids — Function

function mk_class_ids(labels)

Take a list with n class labels for n instances and return a list of n class IDs (of type Int) and an array of labels in which the array index of each label corresponds to its ID.

Arguments:

labels: List of labels (typically Strings)

Result values:

array of class-IDs in the same order as the input
array of unique class labels, ordered by their ID.
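
The mapping between labels and IDs can be sketched in a few lines of plain Julia. The function name `class_ids_sketch` is hypothetical, and sorting the unique labels alphabetically is an assumption.

```julia
# Hypothetical sketch: map labels to integer IDs and return the label list.
function class_ids_sketch(labels)
    classes = sort(unique(labels))                          # label at index i has ID i
    lookup = Dict(c => i for (i, c) in enumerate(classes))
    ids = [lookup[l] for l in labels]
    return ids, classes
end

labels = ["blue", "red", "red", "red", "green", "blue", "blue"]
ids, classes = class_ids_sketch(labels)
# ids     == [1, 3, 3, 3, 2, 1, 1]
# classes == ["blue", "green", "red"]
```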

Examples:

julia> labels = ["blue", "red", "red", "red", "green", "blue", "blue"]
7-element Array{String,1}:
 "blue"
 "red"
 "red"
 "red"
 "green"
 "blue"
 "blue"

julia> mk_class_ids(labels)[1]
7-element Array{Int64,1}:
 1
 3
 3
 3
 2
 1
 1

julia> mk_class_ids(labels)[2]
3-element Array{String,1}:
 "blue"
 "green"
 "red"

source

Image data

Images as data should be provided in directories with the directory names denoting the class labels. The helpers read from the root of a directory tree in which the first level of sub-dirs tell the class label. All images in the tree under a class label are read as instances of the respective class. The following tree will generate the classes daisy, rose and tulip:

image_dir/
├── daisy
│   ├── 01
│   │   ├── 01
│   │   ├── 02
│   │   └── 03
│   ├── 02
│   │   ├── 01
│   │   └── 02
│   └── others
├── rose
│   ├── big
│   └── small
└── tulip

NNHelferlein.ImageLoader — Type

struct ImageLoader <: DataLoader
    dir
    i_paths
    i_classes
    classes
    batchsize
    shuffle
    train
    aug_pipl
    pre_proc
    pre_load
    i_images
end

Iterable image loader to provide minibatches of images as 4-d-arrays (x,y,rgb,mb).

source

NNHelferlein.mk_image_minibatch — Function

function mk_image_minibatch(dir, batchsize; split=false, at=0.8,
                            balanced=false, shuffle=true, train=true,
                            pre_load=false,
                            aug_pipl=nothing, pre_proc=nothing)

Return one or two iterable image-loader objects that provide minibatches of images. For training, each minibatch is a tuple (x,y) with x: 4-d-array with the minibatch of data and y: vector of class IDs as Int.

Arguments:

dir: base-directory of the image dataset. The first level of sub-dirs are used as class names.
batchsize: size of minibatches

Keyword arguments:

split: return two iterators for training and validation
at: split fraction (for training; the rest is for validation).
balanced: return balanced data (i.e. same number of instances for all classes). Balancing is achieved via oversampling
shuffle: if true, shuffle the images every time the iterator restarts
train: if true, minibatches with (x,y) tuples are provided, if false only x (for prediction)
aug_pipl: augmentation pipeline for Augmentor.jl. Augmentation is performed before the pre_proc-function is applied
pre_proc: function with preprocessing and augmentation algorithms of type x = f(x). In contrast to the augmentation, which modifies images, pre_proc works on Array{Float32}.
pre_load=false: read all images from disk once when populating the loader (requires loads of memory, but speeds up training).

source

NNHelferlein.get_class_labels — Function

function get_class_labels(d::DataLoader)

Extracts a list of class labels from a DataLoader.

source

NNHelferlein.image2array — Function

function image2array(img)

Take an image and return a 3d-array for RGB and a 2d-array for grayscale images with the colour channels as last dimension.

source

NNHelferlein.array2image — Function

function array2image(arr)

Take a 3d-array with colour channels as last dimension or a 2d-array and return an array of RGB or of Gray as Image.

source

NNHelferlein.array2RGB — Function

function array2RGB(arr)

Take a 3d-array with colour channels as last dimension or a 2d-array and return always an array of RGB as Image.

source

Text data

NNHelferlein.WordTokenizer — Type

mutable struct WordTokenizer
    len
    w2i
    i2w
end

Create a word-based vocabulary: every unique word of a String or a list of Strings is assigned to a unique number. The created object includes a list of words (i2w, ordered by their numbers) and a dictionary w2i with the words as keys.

The constants TOKEN_START, TOKEN_END, TOKEN_PAD and TOKEN_UNKOWN are exported.

The WordTokenizer implements length, so length(vt::WordTokenizer) returns the number of words in the vocabulary.

Constructor:

function WordTokenizer(texts; len=nothing, add_ctls=true)

With arguments:

texts: AbstractArray or iterable collection of AbstractArrays to be analysed.
len=nothing: maximum number of different words in the vocabulary. Additional words in texts will be encoded as unknown. If nothing, all words of the texts are included.
add_ctls=true: if true, control words are added in front of the vocabulary (extending the maximum length by 4): "<start>"=>1, "<end>"=>2, "<pad>"=>3 and "<unknown>"=>4.

Signatures:

function (t::WordTokenizer)(w::T; split_words=false, add_ctls=false)
                            where {T <: AbstractString}

Encode a word and return the corresponding number in the vocabulary or the highest number (i.e. "<unknown>") if the word is not in the vocabulary.

The encode-signature accepts the keyword arguments split_words and add_ctls. If split_words==true, the input is treated as a sentence and split into single words, and an array of integers with the encoded sequence is returned. If add_ctls==true, the sequence will be framed by <start> and <end> tokens.

function (t::WordTokenizer)(i::Integer)

Decode a word by returning the word corresponding to i or "<unknown>" if the number is out of range of the vocabulary.

function (t::WordTokenizer)(s::AbstractArray{T}; add_ctls=false)
                           where {T <: AbstractString}

Called with an Array of Strings the tokeniser splits the strings into words and returns an Array of Array{Integer} with each of the input strings represented by a sequence of Integer values.

function (t::WordTokenizer)(seq::AbstractArray{T}; add_ctls=false)
                                 where {T <: Integer}

Called with an Array of Integer values a single string is returned with the decoded token-IDs as words (space-separated).

Base Signatures:

    function length(t::WordTokenizer)

Return the length of the vocab.

Examples:

julia> vocab = WordTokenizer(["I love Julia", "They love Python"]);
julia> vocab(8)
"Julia"

julia> vocab("love")
5

julia> vocab.(split("I love Julia"))
3-element Array{Int64,1}:
 5
 6
 8

julia> vocab.i2w
9-element Array{String,1}:
 "<start>"
 "<end>"
 "<pad>"
 "<unknown>"
 "love"
 "I"
 "They"
 "Julia"
 "Python"

julia> vocab.w2i
Dict{String,Int64} with 9 entries:
  "I"         => 6
  "<end>"     => 2
  "<pad>"     => 3
  "They"      => 7
  "Julia"     => 8
  "love"      => 5
  "Python"    => 9
  "<start>"   => 1
  "<unknown>" => 4

julia> vocab.([7,5,8])
3-element Array{String,1}:
 "They"
 "love"
 "Julia"

julia> vocab.("I love Scala", split_words=true)
3-element Array{Int64,1}:
 6
 5
 4

julia> vocab.([6,5,4])
3-element Array{String,1}:
 "I"
 "love"
 "<unknown>"

julia> vocab("I love Python", split_words=true, add_ctls=true)
5-element Array{Int64,1}:
 1
 6
 5
 9
 2

julia> vocab(["They love Julia", "I love Julia"])
2-element Array{Array{Int64,1},1}:
 [7, 5, 8]
 [6, 5, 8]

source

NNHelferlein.get_tatoeba_corpus — Function

function get_tatoeba_corpus(lang; force=false,
            url="https://www.manythings.org/anki/")

Download and read a bilingual text corpus from Tatoeba, provided by ManyThings (https://www.manythings.org). All corpora are pairs of English and another language, with different size and quality. Notable languages include:

fra: French-English, 180 000 sentences
deu: German-English, 227 000 sentences
heb: Hebrew-English, 126 000 sentences
por: Portuguese-English, 170 000 sentences
tur: Turkish-English, 514 000 sentences

The function returns two lists with corresponding sentences in both languages. Sentences are not processed/normalised/cleaned, but exactly as provided by Tatoeba.

The data is stored in the package directory and only downloaded once.

Arguments:

lang: language code
force=false: if true, the corpus is downloaded even if a data file is already saved.
url: base url of ManyThings.

source

NNHelferlein.sequence_minibatch — Function

function sequence_minibatch(x, [y], batchsize; 
                            pad=NNHelferlein.TOKEN_PAD, 
                            seq2seq=true, pad_y=pad,
                            x_padding=false,
                            shuffle=true, partial=false)

Return an iterator of type DataLoader with (x,y) sequence minibatches from two lists of sequences.

All sequences within a minibatch in x and y are brought to the same length by padding with the token provided as pad.

The sequences are sorted by length before building minibatches in order to reduce padding (i.e. sequences of similar length are combined to a minibatch). If the same sequence length is needed for all minibatches, the sequences must be truncated or padded before calling sequence_minibatch() (see functions truncate_sequence() and pad_sequence()).

Arguments:

x: List of sequences of Int
y: List of sequences of Int or list of target values (i.e. teaching input)
batchsize: size of minibatches
pad=NNHelferlein.TOKEN_PAD,
pad_y=pad: tokens used for padding x and y. The token must be compatible with the type of the sequence elements. If pad_y is omitted, it is set equal to pad.
seq2seq=true: if true and y is provided, sequence-to-sequence minibatches are created. Otherwise y is treated as scalar teaching input.
shuffle=true: The minibatches are shuffled as last step. If false the minibatches with short sequences will be at the beginning of the dataset.
partial=false: If true, a partial minibatch will be created if necessary to include all input data.
x_padding=false: if true, pad sequences in x to make minibatches of the demanded size, even if there are not enough sequences of the same length in x. If false, partial minibatches are built (if partial == true) or remaining sequences are skipped (if partial == false).

source

NNHelferlein.pad_sequence — Function

function pad_sequence(s, len; token=NNHelferlein.TOKEN_PAD)

Stretch a sequence to length len by adding the padding token.

source

NNHelferlein.truncate_sequence — Function

function truncate_sequence(s, len; end_token=nothing)

Truncate a sequence to the length len. If not isnothing(end_token), the last token of the sequence is overwritten by the token.
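
Padding and truncation can be sketched as below. These are hypothetical helpers (`pad_sketch`, `truncate_sketch`), not the package functions. Two assumptions are made: padding is appended at the end of the sequence, and the padding token is 3 (the ID of "<pad>" in a WordTokenizer with control tokens).

```julia
# Hypothetical sketch: append `token` until the sequence has length `len`.
function pad_sketch(s::AbstractVector, len; token=3)
    length(s) >= len && return copy(s)
    return vcat(s, fill(token, len - length(s)))
end

# Hypothetical sketch: cut the sequence to `len` and optionally overwrite
# the last remaining element with `end_token`.
function truncate_sketch(s::AbstractVector, len; end_token=nothing)
    length(s) <= len && return copy(s)
    t = s[1:len]
    if !isnothing(end_token)
        t[end] = end_token
    end
    return t
end

padded = pad_sketch([1, 6, 5, 2], 6)                     # [1, 6, 5, 2, 3, 3]
cut = truncate_sketch([1, 6, 5, 9, 2], 3; end_token=2)   # [1, 6, 2]
```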

source

NNHelferlein.clean_sentence — Function

function clean_sentence(s)

Cleaning a sentence in some simple steps:

normalise Unicode
remove punctuation
remove duplicate spaces
strip
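
The four steps can be sketched with the Unicode standard library and regular expressions. This is an illustration, not the package implementation: the name `clean_sketch`, the normalisation form NFC and the exact regular expressions are assumptions.

```julia
using Unicode

# Hypothetical sketch of the cleaning steps listed above.
function clean_sketch(s::AbstractString)
    s = Unicode.normalize(s, :NFC)       # normalise Unicode (NFC is an assumption)
    s = replace(s, r"\p{P}" => "")       # remove punctuation
    s = replace(s, r"\s+" => " ")        # collapse duplicate whitespace
    return strip(s)                      # strip leading and trailing spaces
end

cleaned = clean_sketch("  Hello,   world! ")   # "Hello world"
```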

source

Training

NNHelferlein.tb_train! — Function

function tb_train!(mdl, opti, trn, vld=nothing; epochs=1, split=nothing,
                  lr_decay=nothing, lrd_steps=5, lrd_linear=false,
                  l2=nothing, l1=nothing,
                  eval_size=0.2, eval_freq=1,
                  acc_fun=nothing,
                  mb_loss_freq=100,
                  checkpoints=nothing, cp_dir="checkpoints",
                  tb_dir="logs", tb_name="run",
                  tb_text="""Description of tb_train!() run.""",
                  resume=true, tensorboard=true, return_stats=false,
                  opti_args...)

Train function with TensorBoard integration. TB logs are written with the TensorBoardLogger.jl package. The model is updated (in-place) and the trained model is returned.

Arguments:

mdl: model; i.e. forward-function for the net
opti: Knet-style optimiser type
trn: training data; iterator to provide (x,y)-tuples with minibatches
vld: validation data; iterator to provide (x,y)-tuples with minibatches. Set to nothing, if not defined.

Keyword arguments:

Optimiser:

epochs=1: number of epochs to train
resume=true: if true, optimiser parameters (momentum or gradient moving average) from a previous run are used to enable a seamless continuation of the training. Be aware that in a resumed training, the original optimiser will be used, even if a different one is specified for the continuation.
lr_decay=nothing: do a learning rate decay if not nothing: the value given is the final learning rate after lrd_steps steps of decay (lr_decay may be bigger than lr; in this case the learning rate is increased). lr_decay is only applied if both start learning rate lr and final learning rate lr_decay are defined explicitly. Example: lr=0.01, lr_decay=0.001 will reduce the lr from 0.01 to 0.001 during the training (by default in 5 steps). lr_decay is applied to l1 and l2 with the same decay rate.
lrd_steps=5: number of learning rate decay steps. Default is 5, i.e. modify the lr 4 times during the training (resulting in 5 different learning rates).
lrd_linear=false: type of learning rate decay; If false, lr is modified by a constant factor (e.g. 0.9) resulting in an exponential decay. If true, lr is modified by the same step size, i.e. linearly.
l1=nothing: L1 regularisation; implemented as weight decay per parameter. If learning-rate decay is used, L1 and L2 are also decayed.
l2=nothing: L2 regularisation; implemented as weight decay per parameter
opti_args...: optional keyword arguments for the optimiser can be specified (i.e. lr, gamma, ...).
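
The two decay types can be illustrated by computing the sequence of learning rates. This sketch is an assumption about how the schedule could look, not the code used by tb_train!(). The function name `lr_schedule` is hypothetical and lrd_steps is assumed to be at least 2.

```julia
# Hypothetical sketch: learning rates for lrd_steps steps from lr to lr_decay.
function lr_schedule(lr, lr_decay; lrd_steps=5, lrd_linear=false)
    if lrd_linear
        # constant step size between consecutive learning rates
        return collect(range(lr, lr_decay, length=lrd_steps))
    else
        # constant factor between consecutive learning rates
        factor = (lr_decay / lr)^(1 / (lrd_steps - 1))
        return [lr * factor^(i - 1) for i in 1:lrd_steps]
    end
end

lin = lr_schedule(0.01, 0.001; lrd_linear=true)
expo = lr_schedule(0.01, 0.001)
```

Both schedules start at 0.01 and end at 0.001. They differ only in the intermediate values.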

Model evaluation:

split=nothing: if no validation data is specified and split is a fraction (between 0.0 and 1.0), the training dataset is split at the specified point (e.g.: if split=0.8, 80% of the minibatches are used for training and 20% for validation).
eval_size=0.2: fraction of validation data to be used for calculating loss and accuracy for train and validation data during training.
eval_freq=1: frequency of evaluation; default=1 means evaluation is calculated after each epoch. With eval_freq=10 evaluation is calculated 10 times per epoch.
acc_fun=nothing: function to calculate accuracy. The function must implement the following signature: fun(model; data) where data is an iterator that provides (x,y)-tuples of minibatches. For classification tasks, accuracy from the Knet package is a good choice. For regression a correlation or mean error may be preferred.
mb_loss_freq=100: frequency of training loss reporting. default=100 means that 100 loss values per epoch will be logged to TensorBoard. If mb_loss_freq is greater than the number of minibatches, loss is logged for each minibatch.
checkpoints=nothing: frequency of model checkpoints written to disk. Default is nothing, i.e. no checkpoints are written. To write the model after each epoch with name model use checkpoints=1; to write every second epoch use checkpoints=2, etc.
cp_dir="checkpoints": directory for checkpoints
return_stats=false: if true, a dictionary with losses and accuracies of training and validation data is returned instead of the model.

TensorBoard:

TensorBoard log-directory is created from 3 parts: tb_dir/tb_name/<current date time>.

tensorboard=true: if true, TensorBoard logs are written
tb_dir="logs": root directory for TensorBoard logs.
tb_name="run": name of training run. tb_name will be used as directory name and should not include whitespace
tb_text: description to be included in the TensorBoard log as text log.

source

Evaluation and accuracy

NNHelferlein.focal_nll — Function

function focal_nll(scores, labels::AbstractArray{<:Integer}; γ=2.0, dims=1)
function focal_nll(mdl; data, γ=2.0, dims=1)

Calculate the negative log-likelihood (i.e. cross entropy) with increased weights on weakly classified samples. Focal nll for sample j is defined as

\[- (1 - p_{j})^{\gamma} \cdot \ln p_{j} =\]

\[(1 - p_{j})^{\gamma} \cdot nll(p_{j})\]

where p is the softmax-scaled likelihood for the true class of the j-th sample. The sample weight is high, if predicted p << 1.

The second signature can be used to calculate the mean focal nll for a dataset of minibatches of (x,y)-tuples.

Arguments:

scores: unnormalised scores (i.e. activations of output neurons without applying an activation function), typically of a classifier with one neuron per class
labels: ground truth as integer values
γ=2.0: The parameter γ controls the strength of the effect: for γ=0, all weights become exactly 1.0; with higher values for γ, focus on mis-classified or weakly classified sample is increased.

dims=1: dimension in which the instances are organised.
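
A minimal sketch of the formula in plain Julia, assuming scores of shape [classes, samples] (one column per sample). The function name `focal_nll_sketch` is hypothetical and the dims keyword is not covered.

```julia
using Statistics

# Hypothetical sketch: mean focal nll for scores [classes, samples].
function focal_nll_sketch(scores::AbstractMatrix,
                          labels::AbstractVector{<:Integer}; γ=2.0)
    # column-wise, numerically stable log-softmax
    shifted = scores .- maximum(scores, dims=1)
    logp = shifted .- log.(sum(exp.(shifted), dims=1))
    # log-probability and probability of the true class for every sample
    logp_true = [logp[labels[j], j] for j in eachindex(labels)]
    p_true = exp.(logp_true)
    # weight each sample's nll by (1 - p)^γ and average
    return mean(-((1 .- p_true) .^ γ) .* logp_true)
end

scores = Float64[2 0; 0 2; 0 0]   # 3 classes, 2 samples
labels = [1, 3]                   # sample 1 is classified well, sample 2 poorly
loss_focal = focal_nll_sketch(scores, labels)
loss_plain = focal_nll_sketch(scores, labels; γ=0.0)   # ordinary mean nll
```

With γ=0 all weights are 1, so the result equals the ordinary mean negative log-likelihood. With γ=2 the well-classified sample contributes much less to the loss.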

source

NNHelferlein.focal_bce — Function

function focal_bce(scores, labels::AbstractArray{<:Integer}; 
function focal_bce(mdl; data, γ=2.0, dims=1)

Calculate the binary cross-entropy with increased weights on weakly classified samples. Focal bce for sample j is defined as

\[(1 - p_{j})^{\gamma} \cdot bce(p_{j})\]

where p is the softmax-scaled likelihood for the true class of the j-th sample. The sample weight is high, if predicted p << 1.

The second signature can be used to calculate the mean focal bce for a dataset of minibatches of (x,y)-tuples.

For arguments and details, please refer to the documentation of focal_nll.

source

NNHelferlein.predict — Function

function predict(mdl; data, softmax=false)
function predict(mdl, x; softmax=false )

Return the prediction for minibatches of data. The signature follows the standard call predict(model, data=xxx). The second signature predicts a single Array of data.

Arguments:

mdl: executable network model
data=iterator: iterator providing minibatches of input data; if the minibatches include y-values (i.e. teaching input), predictions (i.e. index of class with highest value) and the y-values will be returned.
data: single Array of input data (i.e. input for one minibatch)
softmax: if true or if model is of type Classifier the predicted softmax probabilities are returned instead of raw activations.

source

NNHelferlein.predict_top5 — Function

function predict_top5(mdl; data, top_n=5, classes=nothing)

Run the model mdl for data in minibatches data and print the top 5 predictions as softmax probabilities.

Arguments:

top_n: print top n hits
classes: optional list of human readable class labels.

source

NNHelferlein.minibatch_eval — Function

function minibatch_eval(mdl, fun, data; o...)

Given an accuracy or loss function fun(p, y) that returns an accuracy measure for n-dimensional arrays of predictions p and teaching input y (i.e. one minibatch of data), minibatch_eval() applies fun() to all minibatches supplied by the minibatch iterator data.

Arguments:

mdl: model to compute predictions
fun: evaluation function for one minibatch that returns the mean of results for all samples of the minibatch
data: iterator that supplies a Tuple of (x,y) for each minibatch

o...: all additional keyword arguments are forwarded to fun().
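
The evaluation loop can be sketched as follows. This is an illustration only. The name `eval_sketch` is hypothetical, and averaging the per-minibatch results with equal weight is an assumption about how the results are combined.

```julia
using Statistics

# Hypothetical sketch: apply fun(p, y) to every minibatch and average the results.
function eval_sketch(mdl, fun, data; o...)
    return mean(fun(mdl(x), y; o...) for (x, y) in data)
end

mdl = x -> 2 .* x                           # toy model
mae = (p, y) -> mean(abs.(p .- y))          # mean absolute error of one minibatch
data = [([1.0, 2.0], [2.0, 4.0]),           # predicted exactly: error 0.0
        ([1.0, 1.0], [3.0, 3.0])]           # off by one: error 1.0
result = eval_sketch(mdl, mae, data)        # 0.5
```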

source

NNHelferlein.squared_error_acc — Function

function squared_error_acc(mdl; data)

Return the mean squared error between the predictions of the model mdl and the corresponding teaching input by providing the standard signature fun(model, data=iterator).

Arguments

mdl: model with the signature mdl(x) to generate predictions for one minibatch (i.e. array) of data.
data: iterator, providing (x,y)-tuples of training or validation data.

source

NNHelferlein.abs_error_acc — Function

function abs_error_acc(mdl; data)

Return the mean absolute error between the predictions of the model mdl and the corresponding teaching input by providing the standard signature fun(model, data=iterator).

Arguments

mdl: model with the signature mdl(x) to generate predictions for one minibatch (i.e. array) of data.
data: iterator, providing (x,y)-tuples of training or validation data.

source

NNHelferlein.hamming_dist — Function

function hamming_dist(p, t; accuracy=false, 
                            ignore_ctls=false, vocab=nothing, 
                            start=nothing, stop=nothing, pad=nothing, unk=nothing)


function hamming_acc(p, t; o...)

function hamming_acc(mdl; data=data, o...)

Return the Hamming distance between two sequences or two minibatches of sequences. Predicted sequences p and teaching input sequences t may be of different length but the number of sequences in the minibatch must be the same.

Arguments:

p, t: n-dimensional arrays of type Int with predictions and teaching input for a minibatch of sequences. Shape of the arrays must be identical except of the first dimension (i.e. the sequence length) that may differ between p and t.
accuracy=false: if false, the mean Hamming distance in the minibatch is returned (i.e. the average number of differences in the sequences). If true, the accuracy is returned for all not padded positions in a range (0.0 - 1.0).
ignore_ctls=false: a vocab is used to replace all '<start>, <end>, <unknown>, <pad>' tokens by <pad>. If true, padding and other control tokens are treated as normal codes and are not ignored.
vocab=nothing: target language vocabulary of type NNHelferlein.WordTokenizer. If defined, the padding token of vocab is used to mask all control tokens in the sequences (i.e. '<start>, <end>, <unknown>, <pad>').
start, stop, pad, unk: may be used to define individual control tokens. default is nothing.

Details:

The function hamming_acc() is a shortcut to return the accuracy instead of the distance. The signature hamming_acc(mdl; data=data, o...) is for compatibility with acc functions called by train.
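The idea of a Hamming distance between sequences of different length can be sketched in plain Julia. This is not the library implementation: hamming_sketch is a hypothetical helper, and it assumes that the shorter sequence is filled up with a padding token (here 0) before the positions are compared.

```julia
# Sketch: Hamming distance between one predicted and one teaching sequence.
# Assumption: the shorter sequence is padded with `pad` to the longer length.
function hamming_sketch(p::Vector{Int}, t::Vector{Int}; pad=0)
    n = max(length(p), length(t))
    pp = vcat(p, fill(pad, n - length(p)))
    tt = vcat(t, fill(pad, n - length(t)))
    return count(pp .!= tt)            # number of differing positions
end

# Position 2 differs and position 4 exists only in the prediction:
d = hamming_sketch([5, 7, 9, 3], [5, 8, 9])
```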

source

NNHelferlein.peak_finder_acc — Function

function peak_finder_acc(p, t; ret=:f1, verbose=0, 
                         tolerance=1, limit=0.5)

function peak_finder_acc(mdl; data=data, o...)

Calculate an accuracy-like measure for data series consisting mainly of zeros and rare peaks. The function counts the number of peaks in y detected by p (true positives), peaks not detected (false negatives) and the number of peaks in p not present in y (false positives).

It is assumed that peaks in y are marked by a single value higher than the limit (typically 1.0). Peaks in p may be broader and are defined as local maxima with a value above the limit. If the tolerance is set to > 0, it may happen that peaks at the first or last step are not evaluated (because evaluation stops at end-tolerance).

If requested, f1, G-mean and intersection over union are calculated from the raw values.

Arguments:

p, t: Predictions p and teaching input t (i.e. y) are mini-batches of 1-d series of data. The sequence must be in the 1st dimension (column). All other dims are treated as separate windows of length size(p/t,1).
ret: return value as Symbol; one of :peaks, :recall, :precision, :miss_rate, :f1, :g_mean, :iou or :all. If :all a named tuple is returned.
verbose=0: if 0, no additional output is generated; if 1, composite measures are printed to stdout; if 2, all raw counts are printed.
tolerance=1: peak finder tolerance: The peak is defined as correct if it is detected within the tolerance.
limit=0.5: Only maxima with values above the limit are considered.
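The counting scheme described above can be sketched for a single 1-d series in plain Julia. The helpers find_peaks_sketch and peak_f1_sketch are hypothetical and only approximate the documented behaviour; in particular, the handling of plateaus and of peaks at the series borders is an assumption.

```julia
# Local maxima of p above `limit` (a plateau is counted once; the first and
# last step are never evaluated in this sketch).
function find_peaks_sketch(p; limit=0.5)
    peaks = Int[]
    for i in 2:length(p)-1
        if p[i] > limit && p[i] > p[i-1] && p[i] >= p[i+1]
            push!(peaks, i)
        end
    end
    return peaks
end

# Count true positives, false negatives and false positives and derive f1.
function peak_f1_sketch(p, t; tolerance=1, limit=0.5)
    pred = find_peaks_sketch(p; limit=limit)
    truth = findall(>(limit), t)       # peaks in t are single values above limit
    tp = count(i -> any(abs(i - j) <= tolerance for j in pred), truth)
    fn = length(truth) - tp
    fp = count(j -> !any(abs(i - j) <= tolerance for i in truth), pred)
    f1 = 2 * tp / (2 * tp + fp + fn)
    return (tp=tp, fn=fn, fp=fp, f1=f1)
end

t = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0]
p = [0.0, 0.2, 0.6, 0.9, 0.3, 0.0, 0.0, 0.1, 0.0, 0.0]
res = peak_f1_sketch(p, t)
```

In this made-up example the first peak is detected one step late (within the tolerance) and the second peak is missed, so one true positive and one false negative are counted.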

source

NNHelferlein.confusion_matrix — Function

function confusion_matrix(mdl; data, labels=nothing, pretty_print=true, accuracy=true)
function confusion_matrix(y, p; labels=nothing, pretty_print=true, accuracy=true)

Compute and display the confusion matrix of (x,y)-minibatches. Predictions are calculated with model mdl for which a signature mdl(x) must exist.

The second signature generates the confusion matrix from the 2 vectors ground truth y and predictions p.

The function is an interface to the function confusmat provided by the package MLBase.

Arguments:

mdl: mdl with signature mdl(x) to generate predictions
data: minibatches of (x,y)-tuples
pretty_print=true: if true, the matrix will be displayed to stdout
labels=nothing: a vector of human readable labels can be provided
accuracy=true: if true, accuracy, precision and recall are printed for all classes.
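What a confusion matrix of two label vectors contains can be sketched without any package. The function confusion_sketch is hypothetical; it assumes integer class labels starting at 1 and puts the ground truth in the rows and the predictions in the columns.

```julia
# Sketch: count how often ground truth class i was predicted as class j.
function confusion_sketch(y, p; nclasses=maximum(vcat(y, p)))
    cm = zeros(Int, nclasses, nclasses)
    for (truth, pred) in zip(y, p)
        cm[truth, pred] += 1
    end
    return cm
end

y = [1, 1, 2, 2, 3, 3]                 # ground truth
p = [1, 2, 2, 2, 3, 1]                 # predictions
cm = confusion_sketch(y, p)

# Overall accuracy: correctly classified samples (diagonal) over all samples.
acc = sum(cm[i, i] for i in 1:size(cm, 1)) / sum(cm)
```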

source

ImageNet tools

NNHelferlein.preproc_imagenet_vgg — Function

function preproc_imagenet_vgg(img)
function preproc_imagenet_resnetv2(img)

Image preprocessing for pre-trained ImageNet examples. Preprocessing includes

bringing RGB colour values into a range 0-255
standardising colour values by subtracting the mean colour values (103.939, 116.779, 123.68) from RGB
changing colour channel sequence from RGB to BGR
normalising or scaling colour values.

Resize is not done, because this may be part of the augmentation pipeline.

Details

Unfortunately, image preprocessing is not consistent between all pretrained Tensorflow/Keras applications. As a result, different preprocessing functions must be used for different pretrained applications:

VGG16, VGG19: preproc_imagenet_vgg (colour space: BGR, values: 0 - 255, centered according to the imagenet training set)
RESNET: preproc_imagenet_resnet (identical to vgg)
RESNET V2: preproc_imagenet_resnetv2 (colour space: RGB, values: -1.0 - 1.0, scaled for each sample individually)
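The VGG-style steps can be sketched on a plain array. The function vgg_preproc_sketch is hypothetical and not the library code; it assumes an input of size (height, width, 3) in RGB order with values 0.0 - 1.0, and it assumes that the three mean values refer to the B, G and R channel in that order, as in the Keras preprocessing for these models.

```julia
# Sketch of VGG-style preprocessing on a Float32 array (height, width, 3).
function vgg_preproc_sketch(img::Array{Float32,3})
    x = img .* 255f0                   # bring values into the range 0-255
    x = x[:, :, [3, 2, 1]]             # change channel sequence from RGB to BGR
    # Assumption: mean values are given in B, G, R order.
    means = reshape(Float32[103.939, 116.779, 123.68], 1, 1, 3)
    return x .- means                  # centre each colour channel
end

img = ones(Float32, 2, 2, 3)           # a tiny all-white dummy image
out = vgg_preproc_sketch(img)
```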

Examples:

The function can be used with the image loader; for prediction with a trained model as:

pipl = CropRatio(ratio=1.0) |> Resize(224,224)
images = mk_image_minibatch("./example_pics", 16;
                    shuffle=false, train=false,
                    aug_pipl=pipl,
                    pre_proc=preproc_imagenet_vgg)

And for training something like:

pipl = Either(1=>FlipX(), 1=>FlipY(), 2=>NoOp()) |>
       Rotate(-5:5) |>
       ShearX(-5:5) * ShearY(-5:5) |>
       RCropSize(224,224)

dtrn, dvld = mk_image_minibatch("./example_pics", 16;
                    split=true, at=0.8, balanced=false,
                    shuffle=true, train=true,
                    aug_pipl=pipl,
                    pre_proc=preproc_imagenet_vgg)

source

NNHelferlein.preproc_imagenet_resnet — Function

preproc_imagenet_resnet(img)

See documentation of preproc_imagenet_vgg.

source

NNHelferlein.preproc_imagenet_resnetv2 — Function

preproc_imagenet_resnetv2(img)

See documentation of preproc_imagenet_vgg.

source

NNHelferlein.predict_imagenet — Function

function predict_imagenet(mdl; data, top_n=5)

Predict the ImageNet-class of images from the predefined list of class labels.

source

NNHelferlein.get_imagenet_classes — Function

function get_imagenet_classes()

Return a list of all 1000 ImageNet class labels.

source

Other utils

Layers and helpers for transformers

NNHelferlein.PositionalEncoding — Type

struct PositionalEncoding <: AbstractLayer

Positional encoding layer. Only sincos-style (according to Vaswani, et al., NIPS 2017) is implemented.

The layer takes an array of any number of dimensions (>=2), calculates the Vaswani-2017-style positional encoding and adds the encoding to each plane of the array.

source

NNHelferlein.positional_encoding_sincos — Function

function positional_encoding_sincos(n_embed, n_seq)

Calculate and return a matrix of size [n_embed, n_seq] of positional encoding values following the sin and cos style in the paper Vaswani, A. et al.; Attention Is All You Need; 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA, 2017.
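The sin/cos scheme of the paper can be sketched as follows. The function pe_sincos_sketch is hypothetical; it assumes zero-based positions and interleaved rows (sin in the first row of each pair, cos in the second), which may differ from the row layout used by the library.

```julia
# Sketch: PE(pos, 2i) = sin(pos / 10000^(2i/d)), PE(pos, 2i+1) = cos(pos / 10000^(2i/d))
# with d = n_embed; the result has size [n_embed, n_seq].
function pe_sincos_sketch(n_embed, n_seq)
    pe = zeros(Float32, n_embed, n_seq)
    for pos in 0:n_seq-1, k in 0:n_embed-1
        angle = pos / 10000.0^(2 * (k ÷ 2) / n_embed)
        pe[k+1, pos+1] = iseven(k) ? sin(angle) : cos(angle)
    end
    return pe
end

pe = pe_sincos_sketch(4, 3)
```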

source

NNHelferlein.mk_padding_mask — Function

function mk_padding_mask(x; pad=TOKEN_PAD, add_dims=false)

Make a padding mask; i.e. return an array of type KnetArray{Float32} (or Array{Float32}) similar to x, with the value 1.0 at each position where x is pad and 0.0 otherwise. Optionally, two additional dimensions of size 1 are inserted in the middle (they represent the 2nd seq_len and the number of heads in multi-head attention).

The function can be used for creating padding masks for attention mechanisms.

Arguments:

x: Array of sequences (typically a matrix with ncols sequences of length nrows)
pad: value for the token to be masked
add_dims: if true, 2 additional dimensions are inserted to return a 4-D-array as needed for transformer architectures. Otherwise the size of the returned array is similar to x.
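A CPU-only sketch of the mask construction is given here. The function padding_mask_sketch is hypothetical, and the padding token 0 is an assumption made for the example (the actual value of TOKEN_PAD is not stated in this section).

```julia
# Sketch: 1.0 where x equals the padding token, 0.0 otherwise.
function padding_mask_sketch(x; pad=0, add_dims=false)
    mask = Float32.(x .== pad)
    if add_dims
        # insert two dimensions of size 1 after the sequence dimension
        mask = reshape(mask, size(x, 1), 1, 1, size(x)[2:end]...)
    end
    return mask
end

# Two sequences (columns) of length 3, padded with 0:
x = [5 7; 6 0; 0 0]
m2 = padding_mask_sketch(x)
m4 = padding_mask_sketch(x; add_dims=true)
```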

source

NNHelferlein.mk_peek_ahead_mask — Function

function mk_peek_ahead_mask(x; dim=1)
function mk_peek_ahead_mask(n_seq)

Return a matrix of size [n_seq, n_seq] filled with 1.0 and the upper triangle set to 0.0. Type is CuArray{Float32} in GPU context, Array{Float32} otherwise. The matrix can be used as peek-ahead mask in transformers.

dim=1 specifies the dimension in which the sequence length is represented. For un-embedded data this is normally 1, i.e. the shape of x is [nseq, nmb]. After embedding the shape probably is [depth, nseq, nmb].
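The shape of such a mask can be sketched on the CPU. The function peek_ahead_sketch is hypothetical, and it assumes that the diagonal belongs to the zeroed upper triangle (i.e. a position may attend to itself); the description above does not state how the library treats the diagonal.

```julia
# Sketch: 1.0 below the diagonal, 0.0 on and above it (assumption, see text).
peek_ahead_sketch(n_seq) = Float32[i > j ? 1 : 0 for i in 1:n_seq, j in 1:n_seq]

mask = peek_ahead_sketch(3)
```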

source

NNHelferlein.dot_prod_attn — Function

function dot_prod_attn(q, k, v; mask=nothing)

Generic scaled dot product attention following the paper of Vaswani et al., (2017), Attention Is All You Need.

Arguments:

q: query of size [depth, n_seq_q, ...]
k: key of size [depth, n_seq_v, ...]
v: value of size [depth, n_seq_v, ...]
mask: mask for attention factors. It may have different shapes but must be broadcastable for addition to the scores tensor (which has the same size as alpha [n_seq_v, n_seq_q, ...]). In transformer context typical masks are one of: padding mask of size [n_seq_v, ...] or a peek-ahead mask of size [n_seq_v, n_seq_v] (which is only possible in case of self-attention when all sequence lengths are identical).

q, k, v must have matching leading dimensions (i.e. same depth or embedding). k and v must have the same sequence length.

Return values:

c: context as alpha-weighted sum of values with size [depth, n_seq_v, ...]
alpha: attention factors of size [n_seq_v, n_seq_q, ...]
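Scaled dot product attention can be sketched for a single sample (plain matrices, no minibatch dimension). The functions softmax_cols and attn_sketch are hypothetical; the sketch assumes that masked positions are marked with 1.0 and are suppressed by adding a large negative number to the scores before the softmax.

```julia
# Column-wise softmax (normalises over the key/value positions).
function softmax_cols(s)
    e = exp.(s .- maximum(s, dims=1))
    return e ./ sum(e, dims=1)
end

# Sketch: q is [depth, n_seq_q]; k and v are [depth, n_seq_v].
function attn_sketch(q, k, v; mask=nothing)
    depth = size(q, 1)
    scores = (k' * q) ./ sqrt(Float32(depth))   # size [n_seq_v, n_seq_q]
    if mask !== nothing
        scores = scores .+ mask .* -1f9          # masked positions get ~zero weight
    end
    alpha = softmax_cols(scores)                 # attention factors
    c = v * alpha                                # weighted sum of values, one column per query
    return c, alpha
end

# Self-attention on two positions with depth 2; position 1 must not see position 2.
q = k = v = Float32[1 0; 0 1]
mask = Float32[0 0; 1 0]
c, alpha = attn_sketch(q, k, v; mask=mask)
```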

source

NNHelferlein.MultiHeadAttn — Type

struct MultiHeadAttn <: AbstractLayer

Multi-headed attention layer, designed following the Vaswani, 2017 paper.

Constructor:

MultiHeadAttn(depth, n_heads)

depth: Embedding depth
n_heads: number of heads for the attention.

Signature:

function(mha::MultiHeadAttn)(q, k, v; mask=nothing)

q, k, v are 3-dimensional tensors of the same size ([depth, seqlen, nminibatch]) and the optional mask must be of size [seqlen, nminibatch] and mark masked positions with 1.0.

source

NNHelferlein.separate_heads — Function

function separate_heads(x, n)

Helper function for multi-headed attention mechanisms: an additional second dimension is added to a tensor of minibatches by splitting the first (i.e. depth).

source

NNHelferlein.merge_heads — Function

function merge_heads(x)

Helper to merge the result of multi-headed attention back to full depth.
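Splitting and merging heads can be sketched with reshape alone. The two helpers are hypothetical, and the layout [depth ÷ n, n, seq_len, n_minibatch] is an assumption derived from the description of separate_heads; the library may order the dimensions differently.

```julia
# Sketch: split the first dimension (depth) into n heads ...
separate_heads_sketch(x, n) = reshape(x, size(x, 1) ÷ n, n, size(x)[2:end]...)

# ... and merge the first two dimensions back to full depth.
merge_heads_sketch(x) = reshape(x, size(x, 1) * size(x, 2), size(x)[3:end]...)

x = reshape(collect(Float32, 1:48), 8, 3, 2)   # [depth=8, seq_len=3, n_minibatch=2]
h = separate_heads_sketch(x, 4)                # 4 heads of depth 2
y = merge_heads_sketch(h)
```

Because reshape does not move any data, merging directly after separating returns the original tensor.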

source

Utils for array manipulation

NNHelferlein.crop_array — Function

function crop_array(x, crop_sizes)

Crop an n-dimensional array to the given size. Cropping is always centered (i.e. a margin is removed).

Arguments:

x: n-dim AbstractArray
crop_sizes: Tuple of target sizes to which the array is cropped. Allowed values are Int or :. If crop_sizes defines fewer dims than x has, the remaining dims will not be cropped (assuming :). If a demanded crop size is bigger than the actual size of x, it is ignored.
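The rules for crop_sizes can be sketched in plain Julia. The function center_crop_sketch is hypothetical; how an odd margin is distributed between both sides is an assumption (here the smaller part is removed at the start).

```julia
# Sketch: centered crop with the rules described above.
function center_crop_sketch(x, crop_sizes)
    ranges = map(1:ndims(x)) do d
        full = size(x, d)
        want = d <= length(crop_sizes) ? crop_sizes[d] : Colon()
        # `:` or a too large size leaves the dimension uncropped
        (want isa Colon || want >= full) && return 1:full
        start = (full - want) ÷ 2 + 1
        return start:start+want-1
    end
    return x[ranges...]
end

x = reshape(collect(1:36), 6, 6)
a = center_crop_sketch(x, (4, :))      # crop rows only
b = center_crop_sketch(x, (10, 2))     # 10 > 6 is ignored, columns 3:4 are kept
```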

source

NNHelferlein.blowup_array — Function

function blowup_array(x, n)

Blow up an array x with an additional dimension and repeat the content of the array n times.

Arguments:

x: Array of any dimension
n: number of repeats. n=1 will return an array with an additional dimension of size 1.

Examples:

julia> x = [1,2,3,4]; blowup_array(x, 3)
4×3 Array{Int64,2}:
 1  1  1
 2  2  2
 3  3  3
 4  4  4

julia> x = [1 2; 3 4]; blowup_array(x, 3)
2×2×3 Array{Int64,3}:
[:, :, 1] =
 1  2
 3  4

[:, :, 2] =
 1  2
 3  4

[:, :, 3] =
 1  2
 3  4

source

NNHelferlein.recycle_array — Function

function recycle_array(x, n; dims=dims(x))

Recycle an array x along the specified dimension (default the last dimension) and repeat the content of the array n times. The number of dims stays unchanged, but the array values are repeated n times.

Arguments:

x: Array of any dimension
n: number of repeats. n=1 will return an unchanged array
dims: dimension to be repeated.

Examples:

julia> recycle_array([1,2],3)
6-element Array{Int64,1}:
 1
 2
 1
 2
 1
 2

julia> x = [1 2; 3 4]
2×2 Array{Int64,2}:
 1  2
 3  4

julia> recycle_array(x,3)
2×6 Array{Int64,2}:
 1  2  1  2  1  2
 3  4  3  4  3  4

julia> recycle_array([1 2 3],3, dims=1)
3×3 Array{Int64,2}:
 1  2  3
 1  2  3
 1  2  3

source

NNHelferlein.de_embed — Function

function de_embed(x; remove_dim=false)

Replace the maximum of the first dimension of an n-dimensional array by its index (aka argmax()). If remove_dim is true, the result has the first dimension removed; otherwise the returned array has the first dimension with size 1 (default).

Examples:

> x = [1 1 1
       2 1 1
       1 2 1
       1 1 2]
> de_embed(x)
1×3 Matrix{Int64}:
 2  3  4

> de_embed(x, remove_dim=true)
3-element Vector{Int64}:
 2
 3
 4

source

Utils for fixing types in GPU context

NNHelferlein.init0 — Function

function init0(siz...)

Initialise a vector or array of size siz with zeros. If a GPU is detected, the type of the returned value is KnetArray{Float32}, otherwise Array{Float32}.

Examples:

julia> init0(2,10)
2×10 Array{Float32,2}:
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0

julia> init0(0,10)
0×10 Array{Float32,2}

source

NNHelferlein.convert2CuArray — Function

function convert2CuArray(x, innerType=Float32)
function convert2KnetArray(x, innerType=Float32)
function ifgpu(x, innerType=Float32)

Convert an array x to a CuArray{Float32} (or to whatever is specified as innerType) in GPU context (i.e. if CUDA.functional()), or to an Array{Float32} otherwise. When converting to a CuArray, the data is copied to the GPU.

convert2KnetArray() is kept as an alias for backward compatibility.

ifgpu() is an alias/shortcut to convert2KnetArray().

source

NNHelferlein.convert2KnetArray — Function

function convert2CuArray(x, innerType=Float32)
function convert2KnetArray(x, innerType=Float32)
function ifgpu(x, innerType=Float32)

convert2KnetArray() is kept as an alias for backward compatibility.

ifgpu() is an alias/shortcut to convert2KnetArray().

source

NNHelferlein.ifgpu — Function

function convert2CuArray(x, innerType=Float32)
function convert2KnetArray(x, innerType=Float32)
function ifgpu(x, innerType=Float32)

convert2KnetArray() is kept as an alias for backward compatibility.

ifgpu() is an alias/shortcut to convert2KnetArray().

source

NNHelferlein.emptyCuArray — Function

function emptyCuArray(size...=(0,0);innerType=Float32)
function emptyKnetArray(size...=(0,0);innerType=Float32)

Return an empty CuArray with the specified dimensions. The array may be empty (i.e. one dimension 0) or elements will be undefined.

By default an empty matrix is returned.

Examples:

>>> emptyKnetArray(0,0)
0×0 Knet.KnetArrays.KnetMatrix{Float32}

>>> emptyKnetArray()
0×0 Knet.KnetArrays.KnetMatrix{Float32}

>>> emptyKnetArray(0)
0-element Knet.KnetArrays.KnetVector{Float32}

source

Utils for Bioinformatics

NNHelferlein.aminoacid_tokenizer — Function

aminoacid_tokenizer(sec; ignore_unknown=true)

Tokenize a protein sequence into amino acids using the following table:

    Amino acid | Token | Description
    --------------------------------
    C          | 1     | Cysteine
    S          | 2     | Serine
    T          | 3     | Threonine 
    A          | 4     | Alanine
    G          | 5     | Glycine
    P          | 6     | Proline
    D          | 7     | Aspartic acid
    E          | 8     | Glutamic acid
    Q          | 9     | Glutamine
    N          | 10    | Asparagine
    H          | 11    | Histidine
    R          | 12    | Arginine
    K          | 13    | Lysine
    M          | 14    | Methionine
    I          | 15    | Isoleucine
    L          | 16    | Leucine
    V          | 17    | Valine
    W          | 18    | Tryptophan
    Y          | 19    | Tyrosine
    F          | 20    | Phenylalanine

    B          | 21    | Aspartic acid or Asparagine
    Z          | 22    | Glutamic acid or Glutamine
    J          | 23    | Leucine or Isoleucine
    U          | 24    | Selenocysteine
    X          | 25    | Unknown amino acid
    .          | 26    | padding token

Arguments:

sec: A string containing the protein sequence in uppercase or lowercase. All other letters or symbols will be converted to the unknown token.
ignore_unknown: If true, unknown amino acids (i.e. "X") will be converted to the padding token. If false, the embedding for "X" will be trained as for all other amino acids.
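The token table and the two arguments can be sketched in a few lines. AA_ALPHABET and aa_tokenize_sketch are hypothetical names; the sketch returns a plain vector of Int, which may differ from the return type of the library function.

```julia
# Position of a letter in this string equals its token in the table above.
const AA_ALPHABET = "CSTAGPDEQNHRKMILVWYFBZJUX."

function aa_tokenize_sketch(seq; ignore_unknown=true)
    unk, pad = 25, 26
    tokens = map(collect(uppercase(seq))) do c
        i = findfirst(==(c), AA_ALPHABET)
        i === nothing ? unk : i        # any other symbol becomes "unknown"
    end
    # optionally convert unknown amino acids to the padding token
    return ignore_unknown ? replace(tokens, unk => pad) : tokens
end

t1 = aa_tokenize_sketch("Mkx?")
t2 = aa_tokenize_sketch("Mkx?"; ignore_unknown=false)
```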

source

NNHelferlein.embed_blosum62 — Function

embed_blosum62(x)

Embed a protein sequence into a 21-dimensional vector using the BLOSUM62 amino acid substitution matrix. Amino acids are encoded as with NNHelferlein's aminoacid tokenizer function. x can be any AbstractArray of Int and a dimension of size 21 will be added as the first dimension.

source

NNHelferlein.embed_vhse8 — Function

embed_vhse8(x)

Embed a protein sequence into an 8-dimensional vector using the VHSE8 amino acid embedding scheme. Amino acids are encoded as with NNHelferlein's aminoacid tokenizer function. x can be any AbstractArray of Int and a dimension of size 8 will be added as the first dimension.

source

NNHelferlein.EmbedAminoAcids — Type

EmbedAminoAcids <: AbstractLayer

Embed a protein sequence into a 21-dimensional vector using the BLOSUM62 amino acid substitution matrix or into an 8-dimensional vector using the VHSE8 parameters. Amino acids must be encoded according to NNHelferlein's aminoacid tokenizer function.

Layer input is an n-dimensional array of an Integer type. Output is an (n+1)-dimensional array of Float32 type with a first (added) dimension of size 21 or 8.

Constructor:

EmbedAminoAcids(embedding::Symbol=:blosum62):
- embedding=:blosum62: Either :blosum62 or :vhse8 to select the embedding scheme.

source

Saving, loading and inspection of models

NNHelferlein.save_network — Function

save_network(fname, mdl)

Save a model as jld2-file.

Arguments:

fname: filename; if the name does not end with the extension .jld2, it will be added.
mdl: network model to be saved. The model will be copied to a cpu-based model via copy_network(mdl, to=:cpu) before saving, to remove hardware dependencies of parameters on the gpu.

source

NNHelferlein.load_network — Function

load_network(fname; to=:gpu)

Load a model from a jld2-file.

Arguments:

fname: filename; if the name does not end with the extension .jld2, it will be added.
to=:gpu: by default, parameters are loaded as CuArrays, if a functional gpu is detected. If to=:cpu is specified parameters are loaded as cpu-arrays.

source

NNHelferlein.copy_network — Function

copy_network(mdl::AbstractNN; to=:gpu)

Returns a copy of a Helferlein model. Caution: the copy is generated by Adapt.adapt() and is not a deep copy!

Arguments:

mdl: Network model of type AbstractNN.
to=:gpu: by default all parameters of the copy are CuArrays for GPU usage. If to=:cpu is specified, parameters are Arrays and the model will be processed in the cpu.

source

Base.summary — Function

function summary(mdl)

Print a network summary of any model of Type AbstractNN, AbstractChain or AbstractLayer.

source

NNHelferlein.print_network — Function

function print_network(mdl::AbstractNN)

Alias to summary(), kept for backward compatibility only.

source

Datasets

NNHelferlein.dataset_mit_nsr — Function

function dataset_mit_nsr(records=nothing; force=false)

Retrieve the Physionet ECG data set: "MIT-BIH Normal Sinus Rhythm Database". If necessary, the data is downloaded from Zenodo (and stored in the NNHelferlein data directory).

All 18 recordings are returned as a list of DataFrames.

The ECGs are taken from the MIT-NSR database, with some modifications to make them more suitable as a playground data set for machine learning:

all 18 ECGs are trimmed to approx. 50000 heart beats from a region without recording errors
scaled to a range -1 to 1 (non-linear/tanh)
heart beats annotation as time series with value 1.0 at the point of the annotated beat and 0.0 for all other times
additional heart beat column smoothed by applying a gaussian filter
provided as csv with columns "time in sec", "channel 1", "channel 2", "beat" and "smooth".

Arguments:

force=false: if true, the download will be forced and local data will be overwritten.
records: list of record names to be downloaded.

Examples:

nsr_16265 = dataset_mit_nsr("16265")
nsr_16265 = dataset_mit_nsr(["16265", "19830"])
nsr_all = dataset_mit_nsr()

source

NNHelferlein.dataset_mnist — Function

function dataset_mnist(; force=false)

Download the MNIST dataset with help of MLDatasets.jl from Yann LeCun's official website. 4 arrays xtrn, ytrn, xtst, ytst are returned.

xtrn and xtst will be the images as a multi-dimensional array, and ytrn and ytst the corresponding labels as integers.

The image(s) is/are returned in the horizontal-major memory layout as a single numeric array of eltype Float32. The values are scaled to be between 0 and 1. The labels are returned as a vector of Int8.

In the teaching input (i.e. y) the digit 0 is encoded as 10.
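Since Julia arrays are 1-based, a class ID of 10 for the digit 0 can be used directly as an index. Converting between class IDs and digit values is a one-liner; the following sketch uses made-up labels and hypothetical variable names.

```julia
y = Int8[10, 1, 5, 10]                 # labels in the encoding described above

# class ID -> digit value (10 becomes 0, all others stay unchanged)
digit_values = mod.(y, 10)

# digit value -> class ID
class_ids = [d == 0 ? 10 : d for d in digit_values]
```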

The data is stored in the Helferlein data directory and only downloaded if the files are not already saved.

Ref.: Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. "Gradient-based learning applied to document recognition." Proceedings of the IEEE, 86(11):2278-2324, November 1998 http://yann.lecun.com/exdb/mnist/.

Arguments:

force=false: if true, the dataset download will be forced.

source

NNHelferlein.dataset_fashion_mnist — Function

function dataset_fashion_mnist(; force=false)

Download Zalando's Fashion-MNIST dataset with the help of MLDatasets.jl from https://github.com/zalandoresearch/fashion-mnist.

4 arrays xtrn, ytrn, xtst, ytst are returned in the same structure as the original MNIST dataset.

The data is stored in the Helferlein data directory and only downloaded if the files are not already saved.

Authors: Han Xiao, Kashif Rasul, Roland Vollgraf

Arguments:

force=false: if true, the dataset download will be forced.

source

NNHelferlein.dataset_iris — Function

function dataset_iris()

Return Fisher's iris dataset of 150 records as dataframe.

Ref: Fisher,R.A. "The use of multiple measurements in taxonomic problems" Annual Eugenics, 7, Part II, 179-188 (1936); also in "Contributions to Mathematical Statistics" (John Wiley, NY, 1950). https://archive.ics.uci.edu/ml/datasets/Iris

source

NNHelferlein.dataset_pfam — Function

function dataset_pfam(records; force=false)

Retrieve the curated PFAM protein families database from Zenodo including 46872 sequences from 62 families. Sequences are between 100 and 1000 amino acids long and families have between 100 and 200 members. Training and test data are padded to a length of 1000 amino acids with the padding token of the amino acid tokenizer (26).

More information about the data set can be found at https://zenodo.org/record/8138939, including PDB sequence IDs for each data table.

Available records:

:raw: dataframe with all (46872) rows of data and the columns ID (PDB-ID), family (family name) and sequence (amino acid sequence)
:families: list of all family names as dataframe with the columns class (numeric class ID 1-62), family (family name) and count (number of family members in the dataset)
:aminoacids: list of amino acid tokens as dataframe with the columns Token (aa token 1-26), One-Letter (one-letter code of the amino acid), and Amino acid (full name of the amino acid)
:train: dataframe with 42187 rows of training data and labels with the class ID as first column and the amino acid tokens as columns 2-1001 (padded to 1000 amino acids)
:test: dataframe with 4687 rows of test data in the same format as the training data
:balanced_train: dataframe with 111601 rows of balanced training data in the same format as the training data. The data is balanced by sampling 1800 sequences from each family.
:balanced_test: dataframe with 12401 rows of balanced test data in the same format as the training data.

source

Pretrained networks

NNHelferlein.get_vgg16 — Function

function get_vgg16(; filters_only=false, trainable=true)

Return a VGG16 model with pretrained parameters from Tensorflow/Keras applications API. For details about original model and training see Keras Applications.

Arguments

filters_only=false: if true, only the filterstack is returned (without Flatten() and classifier) to be integrated into any chain.
trainable=true: if true, the filterstack is set trainable, otherwise only the classifier part is trainable and the filter weights are fixed.

Details:

The model weights are imported from the respective Keras Application, which is trained with preprocessed images of size 224x224 pixel. Image data format must be colour channels BGR and colour values 0.0 - 1.0.

This can be re-built by using a preprocessing pipeline and the Helferlein-function preproc_imagenet_vgg() from a directory img_path with images:

pipl = CropRatio(ratio=1.0) |> Resize(224,224)
mini_batches = mk_image_minibatch(img_path, 2, train=false, 
        aug_pipl=pipl, pre_proc=preproc_imagenet_vgg)

Model structure is: VGG16 topology plot created by netron

source

NNHelferlein.get_resnet50v2 — Function

function get_resnet50v2(; filters_only=false, trainable=true)

Return a ResNet50 v2 model with pretrained parameters from Tensorflow/Keras applications API. For details about original model and training see Keras Applications.

Arguments

filters_only=false: if true, only the filterstack is returned (without Flatten() and classifier) to be integrated into any chain.
trainable=true: if true, the filterstack is set trainable, otherwise only the classifier part is trainable and the filter weights are fixed.

Details:

The model weights are imported from the respective Keras Application, which is trained with images of size 224x224 pixel. Caution: The training set images have not been preprocessed with the imagenet default procedure! In contrast, the image data format must be colour channels RGB and colour values 0.0 - 1.0.

This can be re-built by using a preprocessing pipeline with application preproc_imagenet_resnetv2() from a directory img_path with images:

pipl = CropRatio(ratio=1.0) |> Resize(224,224)
mini_batches = mk_image_minibatch(img_path, 2, train=false, 
        aug_pipl=pipl, pre_proc=preproc_imagenet_resnetv2)

Model structure is: ResNet50 V2 topology plot created by netron

source