
$\newcommand{\vect}[1]{\boldsymbol{\mathbf{#1}}}$ The loss function for logistic regression is the negative log-likelihood $$J\left( \vect \theta \right) = - \log p\left( \vect y \mid \vect X, \vect \theta \right)= -\sum _{i=1} ^n \left[ y_i \log \pi_i + (1-y_i)\log (1 - \pi_i) \right]$$ where $\pi_i=\frac{1}{1+\exp(-\vect \theta^T \vect x_i)} = \frac{\exp(\vect \theta^T \vect x_i)}{1+\exp(\vect \theta^T \vect x_i)}$, $\vect y \in \{0,1\}^{n}$, $\vect \theta \in \mathbb{R}^{d \times 1}$, and $\vect X \in \mathbb{R}^{n \times d}$ with rows $\vect x_i^T$.

Using $\log \pi_i = \vect \theta^T \vect x_i - \log(1 + \exp(\vect \theta^T \vect x_i))$ and $\log(1 - \pi_i) = -\log(1 + \exp(\vect \theta^T \vect x_i))$, the loss can be rewritten as $$J(\vect \theta) = \sum _{i=1} ^n \log(\exp(\vect \theta^T \vect x_i) + 1) - y_i \vect \theta^T \vect x_i$$
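As a quick numerical sanity check of this identity, both forms of $J(\vect \theta)$ can be evaluated on synthetic data (the data and variable names below are illustrative, not part of the derivation):

```python
import numpy as np

# Illustrative synthetic data (sizes and seed are assumptions)
rng = np.random.default_rng(0)
n, d = 50, 3
X = rng.normal(size=(n, d))
y = rng.integers(0, 2, size=n).astype(float)
theta = rng.normal(size=d)

z = X @ theta                  # z_i = theta^T x_i
pi = 1.0 / (1.0 + np.exp(-z))  # pi_i

# Original form: negative log-likelihood
J1 = -np.sum(y * np.log(pi) + (1 - y) * np.log(1 - pi))
# Simplified form after the algebraic manipulation
J2 = np.sum(np.log(np.exp(z) + 1) - y * z)

print(np.allclose(J1, J2))  # the two forms agree
```

In practice the term $\log(\exp(z) + 1)$ is better computed with `np.logaddexp(0.0, z)` to avoid overflow for large $z$.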

Then the gradient is $$\vect g(\vect \theta) = \frac{\partial}{\partial \vect \theta} J(\vect \theta) = \sum _{i=1}^n \frac{\exp(\vect \theta^T \vect x_i)}{\exp(\vect \theta^T \vect x_i) + 1} \vect x_i - y_i\vect x_i$$ $$=\sum_{i=1}^n \vect x_i (\pi_i - y_i) = \vect X^T (\vect \pi - \vect y)$$ where $\vect \pi = (\pi_1, \dots, \pi_n)^T$.
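The analytic gradient $\vect X^T (\vect \pi - \vect y)$ can be checked against central finite differences of the loss; this is a sketch on synthetic data (function and variable names are assumptions):

```python
import numpy as np

def loss(theta, X, y):
    """J(theta) in the simplified form: sum log(exp(z)+1) - y*z."""
    z = X @ theta
    return np.sum(np.logaddexp(0.0, z) - y * z)  # stable log(exp(z)+1)

def grad(theta, X, y):
    """Analytic gradient X^T (pi - y)."""
    pi = 1.0 / (1.0 + np.exp(-(X @ theta)))
    return X.T @ (pi - y)

# Illustrative synthetic data
rng = np.random.default_rng(1)
n, d = 40, 4
X = rng.normal(size=(n, d))
y = rng.integers(0, 2, size=n).astype(float)
theta = rng.normal(size=d)

# Central finite differences, one coordinate direction at a time
eps = 1e-6
num = np.array([(loss(theta + eps * e, X, y) - loss(theta - eps * e, X, y)) / (2 * eps)
                for e in np.eye(d)])
print(np.allclose(num, grad(theta, X, y), atol=1e-5))
```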

And the Hessian is $$\vect H(\vect \theta) = \frac{\partial \vect g(\vect \theta)}{\partial \vect \theta^T} = \frac{\partial}{\partial \vect \theta^T} \sum _{i=1}^n \left( \frac{1}{1+\exp(-\vect \theta^T \vect x_i)} - y_i\right) \vect x_i$$ $$= \sum _{i=1}^n \frac {\exp(-\vect \theta^T \vect x_i) } {\left( 1 + \exp(-\vect \theta^T \vect x_i) \right)^2} \vect x_i \vect x_i^T = \sum _{i=1}^n \pi_i (1 - \pi_i)\vect x_i \vect x_i^T= \vect X^T \operatorname{diag}\left(\pi_i(1-\pi_i)\right)\vect X$$ Since $\pi_i(1-\pi_i) > 0$, the Hessian is positive semidefinite, so $J$ is convex.
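With the gradient and Hessian in closed form, $J$ can be minimized by Newton's method. The following is a minimal sketch on synthetic data (function names, sizes, and the fixed iteration count are assumptions; note the Hessian can become singular and the iterates diverge on perfectly separable data):

```python
import numpy as np

def newton_logistic(X, y, iters=20):
    """Newton's method using the derivations above:
    g = X^T (pi - y),  H = X^T diag(pi * (1 - pi)) X."""
    theta = np.zeros(X.shape[1])
    for _ in range(iters):
        pi = 1.0 / (1.0 + np.exp(-(X @ theta)))
        g = X.T @ (pi - y)
        w = pi * (1 - pi)              # diagonal of the weight matrix
        H = X.T @ (w[:, None] * X)     # X^T diag(w) X without forming diag(w)
        theta -= np.linalg.solve(H, g) # Newton step: theta <- theta - H^{-1} g
    return theta

# Illustrative synthetic data drawn from a known theta
rng = np.random.default_rng(2)
n, d = 200, 3
X = rng.normal(size=(n, d))
theta_true = np.array([1.0, -1.0, 0.5])
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-(X @ theta_true)))).astype(float)

theta_hat = newton_logistic(X, y)
```

Because the labels are noisy, `theta_hat` recovers `theta_true` only approximately, but the gradient at the solution is essentially zero.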