- When updating the weights, we multiply the learning rate by the derivative of the loss function with respect to the weights, then subtract the result from the weights.
- The loss function is a function of the independent variables, X, and the weights.
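- Cross-entropy loss is useful for classification problems, where there is no meaningful notion of how close a wrong class is to the correct one. What it measures is how much probability the model assigned to the correct class.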
- Softmax is an activation function that squashes each output to between 0 and 1 and makes the outputs sum to 1. It is somewhat like the Sigmoid, which also constrains its output to a fixed range.
- Some discussion on when to use one or the other of these: Softmax vs Sigmoid function in Logistic classifier?
- You generally want Cross-entropy loss and Softmax for single-label multi-class classification problems. They go well together.
- Regularization techniques help you avoid over-fitting. Examples:
  - Weight decay.
  - Batch Norm.
  - Data augmentation.
- One way to avoid over-fitting is to use fewer parameters. However, Jeremy proposes that we instead use a lot of parameters but penalize complexity. Weight decay is one way to do the latter.
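
A minimal sketch of the ideas in these notes, in plain Python: softmax, cross-entropy, and a single weight update with weight decay. The function names, the learning rate `lr`, and the decay factor `wd` are illustrative assumptions, not taken from the lecture or from any library. Weight decay is written here in its common L2-penalty form, where `wd * w` is added to the gradient; this is an assumption about the formulation, and real optimizers may implement it differently.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities, each in (0, 1), that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, target):
    """Negative log of the probability assigned to the correct class index."""
    return -math.log(probs[target])

def sgd_step(weights, grads, lr=0.1, wd=0.0):
    """One update: w <- w - lr * (dLoss/dw + wd * w).

    The wd * w term is the gradient of the penalty (wd / 2) * sum(w ** 2),
    so larger weights are pulled toward zero more strongly.
    """
    return [w - lr * (g + wd * w) for w, g in zip(weights, grads)]

# Softmax followed by cross-entropy for a 3-class example.
probs = softmax([2.0, 1.0, 0.1])
loss_right = cross_entropy(probs, 0)  # target is the highest-scoring class
loss_wrong = cross_entropy(probs, 2)  # target is the lowest-scoring class

# The same gradient step without and with weight decay (made-up numbers).
plain = sgd_step([1.0, -2.0], [0.5, 0.5], lr=0.1)
decayed = sgd_step([1.0, -2.0], [0.5, 0.5], lr=0.1, wd=0.1)
```

With the same gradients, the `decayed` weights end up smaller in magnitude than the `plain` ones, which is the sense in which weight decay penalizes complexity. The loss is also lower when the target is the class that softmax gave the most probability to.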