Note:
$$\Large\begin{array}{c|l|l} \textbf{Parameter} & \textbf{Variable} & \textbf{Type} \\\hline\text{Inputs} & \text{$x_i$ ($i^{th}$ input)} & \text{Real or binary} \\\hline\text{Synaptic weights} & \text{$w_i$ associated with $x_i$} & \text{Real, initialized with random values} \\\hline\text{Threshold bias} & \text{$\theta$} & \text{Real, initialized with random values} \\\hline\text{Output} & \text{$y$} & \text{Binary} \\\hline\text{Activation function} & \text{$g(\cdot)$} & \text{Step or bipolar step function} \\\hline\text{Training} & \text{--} & \text{Supervised} \\\hline\text{Learning rule} & \text{--} & \text{Determined by the chosen training algorithm} \\\end{array}$$where,
$$y = g(u), \qquad u = \sum_{i=1}^{n} w_i x_i - \theta$$where we note the following:
(Here the activation functions are categorized into two groups, namely partially differentiable and fully differentiable functions, when considered over their full definition domains.)
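The forward pass above can be sketched in a few lines. This is a minimal illustration, not the notes' reference implementation; the function names (`step`, `perceptron_output`) and the AND-gate weights are chosen for the example.

```python
def step(u):
    # Heaviside step activation: 1 if u >= 0, else 0
    return 1 if u >= 0 else 0

def perceptron_output(x, w, theta, g=step):
    # y = g(u), with u = sum_i w_i * x_i - theta
    u = sum(wi * xi for wi, xi in zip(w, x)) - theta
    return g(u)

# Example weights/threshold realizing a logical AND on binary inputs
w, theta = [1.0, 1.0], 1.5
print(perceptron_output([1, 1], w, theta))  # 1
print(perceptron_output([1, 0], w, theta))  # 0
```

Swapping `step` for a bipolar step (returning -1 instead of 0) covers the other activation listed in the table.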
Given the density $p_y(y)$ associated with the output $y \in (0, 1)$, namely
$$\Large p_y(y) = \frac{1}{y(1 - y)\sqrt{2\pi\sigma^2}}\exp\Bigg\{-\frac{\big(\ln\frac{y}{1 - y} - \mu\big)^2}{2\sigma^2}\Bigg\}$$
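As a sanity check on this logit-normal density, the sketch below evaluates it numerically and confirms it integrates to 1 over $(0, 1)$. The function name `p_y` and the default parameters $\mu = 0$, $\sigma = 1$ are assumptions for the example; the midpoint rule is used only to avoid the endpoints, where the density is undefined.

```python
import math

def p_y(y, mu=0.0, sigma=1.0):
    # logit-normal density: change of variables y = g(z), z ~ N(mu, sigma^2)
    t = math.log(y / (1.0 - y)) - mu
    norm = y * (1.0 - y) * math.sqrt(2.0 * math.pi * sigma**2)
    return math.exp(-t * t / (2.0 * sigma**2)) / norm

# midpoint-rule check that the density integrates to ~1 over (0, 1)
n = 200000
h = 1.0 / n
total = sum(p_y((k + 0.5) * h) for k in range(n)) * h
print(round(total, 4))  # 1.0
```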
we proceed to compute its differential entropy:
Substituting $y = g(z)$, so that $z = \ln\frac{y}{1 - y}$ and $g'(z) = g(z)\big(1 - g(z)\big)$ for the logistic function, we obtain
$$\Large H(y) \doteq-\int\limits_0^1 p_y(y) \ln p_y(y)\,dy = \Large\int\limits_{-\infty}^{\infty}\frac{e^{-\frac{(z-\mu)^2}{2\sigma^2}}}{\sqrt{2\pi\sigma^2}} \Bigg\{ \frac{(z - \mu)^2}{2\sigma^2} + \ln \bigg[g(z)\big(1 - g(z)\big) \sqrt{2\pi\sigma^2}\bigg]\Bigg\}dz $$$$ = \Large\int\limits_{-\infty}^{\infty}\frac{e^{-\frac{(z-\mu)^2}{2\sigma^2}}}{\sqrt{2\pi\sigma^2}}\Bigg\{\frac{(z - \mu)^2}{2\sigma^2} + \ln\sqrt{2\pi\sigma^2} + \ln g'(z)\Bigg\}dz = H(z) + E_z\Big\{\ln g'(z)\Big\}$$Here $H(z)$, $E_z\big\{\cdot\big\}$, and $g'(z)$ are, respectively, the entropy, the expectation operator, and the derivative of $g(\cdot)$, computed on the variable $z$.
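The identity $H(y) = H(z) + E_z\{\ln g'(z)\}$ can be verified numerically. The sketch below compares a Monte Carlo estimate of the right-hand side against a direct estimate of $H(y) = -E\{\ln p_y(g(z))\}$ from the same Gaussian samples; all names (`g`, `ln_p_y`) and the choice $\mu = 0$, $\sigma = 1$ are assumptions for the example.

```python
import math
import random

def g(z):
    # logistic sigmoid, so g'(z) = g(z) * (1 - g(z))
    return 1.0 / (1.0 + math.exp(-z))

mu, sigma = 0.0, 1.0
H_z = 0.5 * math.log(2.0 * math.pi * math.e * sigma**2)  # Gaussian entropy

random.seed(0)
samples = [random.gauss(mu, sigma) for _ in range(200000)]

# right-hand side: H(z) + Monte Carlo estimate of E_z{ln g'(z)}
E_ln_gprime = sum(math.log(g(z) * (1.0 - g(z))) for z in samples) / len(samples)
H_y_identity = H_z + E_ln_gprime

# left-hand side: direct estimate H(y) = -E{ln p_y(g(z))}
def ln_p_y(y):
    t = math.log(y / (1.0 - y)) - mu
    return -t * t / (2.0 * sigma**2) - math.log(y * (1.0 - y) * math.sqrt(2.0 * math.pi * sigma**2))

H_y_direct = -sum(ln_p_y(g(z)) for z in samples) / len(samples)
print(H_y_identity, H_y_direct)  # the two estimates agree up to Monte Carlo error
```

The two estimates differ only by sampling noise in the $(z-\mu)^2/(2\sigma^2)$ term, which is exactly the step the derivation absorbs into $H(z)$.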