Regression problems are a class of machine learning problems in which the output is a continuous variable (denoted by \(y\)). The features can be either continuous or categorical. We will denote a feature vector by \(\mathbf{x} \in \mathbb{R}^n\) for \(n\) features, i.e. \(\mathbf{x} = (x_1, \dots, x_n)\).
More specifically, we will find the \(\hat{\mathbf{w}}\) that minimizes the loss function by differentiating it with respect to \(\mathbf{w}\) and setting the derivative equal to zero:
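As a concrete instance (assuming a linear model \(h_\mathbf{w}(\mathbf{x}) = \mathbf{w}^T\mathbf{x}\) and the squared-error loss, which the text does not fix explicitly), this yields the familiar normal equations:

\[
L(\mathbf{w}) = \lVert \mathbf{y} - X\mathbf{w} \rVert^2, \qquad
\nabla_{\mathbf{w}} L = -2 X^T (\mathbf{y} - X\mathbf{w}) = 0
\;\Longrightarrow\;
\hat{\mathbf{w}} = (X^T X)^{-1} X^T \mathbf{y},
\]

where \(X\) is the matrix whose rows are the data points, \(\mathbf{y}\) is the vector of targets, and \(X^T X\) is assumed to be invertible.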
In more complicated cases, where no closed-form solution exists, we can use gradient-based methods to find the optimal weights, as shown below (\(\alpha\) denotes the learning rate).
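The update referenced above is the standard gradient descent step, repeated until convergence:

\[
\mathbf{w} \leftarrow \mathbf{w} - \alpha \, \nabla_{\mathbf{w}} L(\mathbf{w}).
\]

A minimal sketch of how this looks in practice, assuming the linear model and squared-error loss from above (the function name and hyperparameter defaults are illustrative):

```python
import numpy as np

def gradient_descent(X, y, alpha=0.01, n_iters=1000):
    """Batch gradient descent for linear regression with squared-error loss."""
    m, n = X.shape
    w = np.zeros(n)
    for _ in range(n_iters):
        # Gradient of the mean squared error over the full dataset
        grad = -2.0 / m * X.T @ (y - X @ w)
        # Step in the direction of steepest descent, scaled by the learning rate
        w -= alpha * grad
    return w
```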
If our dataset contains a large number of data points, say \(m\), then computing the full gradient as above in each iteration of the gradient descent algorithm might be too computationally expensive. As such, approaches like stochastic and mini-batch gradient descent have been proposed.
In stochastic gradient descent, at each iteration of the algorithm we use only a single data point to compute the gradient; that data point is sampled uniformly at random from the dataset. Given that we use only one data point to estimate the gradient, stochastic gradient descent produces noisy gradient estimates and can thus make convergence harder.
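A minimal sketch of this procedure, again assuming the linear model and squared-error loss from above (names and defaults are illustrative):

```python
import numpy as np

def sgd(X, y, alpha=0.01, n_iters=10000, seed=0):
    """Stochastic gradient descent for linear regression."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    w = np.zeros(n)
    for _ in range(n_iters):
        # Sample a single data point uniformly at random
        i = rng.integers(m)
        # Gradient of the squared error on that one point: -2 (y_i - w^T x_i) x_i
        grad = -2.0 * (y[i] - X[i] @ w) * X[i]
        w -= alpha * grad
    return w
```

Because each step uses a noisy one-sample estimate of the gradient, the iterates tend to oscillate around the optimum; in practice the learning rate is often decayed over time to dampen this.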
Logistic regression allows us to turn a linear combination of our input features into a probability using the logistic function: \(h_\mathbf{w}(\mathbf{x})=\frac{1}{1+e^{-\mathbf{w}^T\mathbf{x}}}\). It is important to note that, despite its name, logistic regression is a misnomer: it is used to solve classification problems, not regression problems, since the logistic function squashes any real-valued input into the interval \((0, 1)\), which we interpret as a class probability.
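A short sketch of the hypothesis and a conventional decision rule (the 0.5 threshold and the function names are illustrative, not fixed by the text):

```python
import numpy as np

def sigmoid(z):
    """Logistic function: squashes any real input into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(w, X):
    """h_w(x) = sigmoid(w^T x): probability of the positive class for each row of X."""
    return sigmoid(X @ w)

def predict(w, X, threshold=0.5):
    """Hard class labels obtained by thresholding the predicted probability."""
    return (predict_proba(w, X) >= threshold).astype(int)
```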