Today I learned

Fastai:

When updating the weights, we multiply the learning rate by the derivative (gradient) of the loss function with respect to the weights, then subtract the result from the current weights.
The loss function is a function of the independent variables, X, and the weights.
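A minimal sketch of that update in plain Python, to make the arithmetic concrete. The helper name `sgd_step` and the toy loss `(w - 3)^2` are made up for illustration and are not from the course.

```python
def sgd_step(weights, grads, lr):
    """Return new weights after one gradient-descent update: w - lr * dL/dw."""
    return [w - lr * g for w, g in zip(weights, grads)]

# Toy loss (an assumption for this sketch): L(w) = (w - 3)^2,
# whose derivative with respect to w is 2 * (w - 3).
w = [0.0]
for _ in range(50):
    grad = [2 * (w[0] - 3)]
    w = sgd_step(w, grad, lr=0.1)

print(w[0])  # moves toward 3, the minimum of the toy loss
```

A larger learning rate takes bigger steps per update; if it is too large, the updates can overshoot the minimum instead of settling into it.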
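Cross-entropy loss is useful for classification problems, where there is no meaningful notion of a prediction being numerically close to the target. Instead, it penalizes the model according to how little probability it assigned to the correct class.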
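Softmax is an activation function that maps the outputs to values between 0 and 1 that sum to 1, so they can be read as class probabilities. It is somewhat like the sigmoid, which also squashes its output into the range 0 to 1, but does so for each output independently.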
1. Some discussion on when to use one or the other of these: Softmax vs Sigmoid function in Logistic classifier?
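To see the difference between the two, here is a small sketch using only the standard library. The function names and the example scores are my own, chosen for illustration.

```python
import math

def sigmoid(x):
    """Squash a single score into (0, 1), independently of other scores."""
    return 1.0 / (1.0 + math.exp(-x))

def softmax(logits):
    """Turn a list of scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]                    # made-up raw scores for 3 classes
probs = softmax(logits)                     # classes compete: sums to 1
independent = [sigmoid(z) for z in logits]  # each in (0, 1), no shared sum

print(probs, sum(probs))
print(independent, sum(independent))
```

Softmax makes the classes compete for a fixed total of 1, which fits "exactly one label is right". Sigmoid scores each output on its own, which fits cases where several labels can be true at once.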
You generally want Cross-entropy loss and Softmax for single-label multi-class classification problems. They go well together.
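A rough sketch of why the pair fits together: cross-entropy is the negative log of the softmax probability given to the correct class, and the two can be computed in one step. The function name and the example scores below are assumptions for illustration, not fastai's API.

```python
import math

def cross_entropy(logits, target):
    """Negative log of the softmax probability of the target class.

    Computed as log(sum(exp(logits))) - logits[target], with the max
    subtracted inside the exponentials for numerical stability.
    """
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]

# Made-up scores for a 3-class problem.
confident_right = cross_entropy([5.0, 0.0, 0.0], target=0)
unsure = cross_entropy([0.0, 0.0, 0.0], target=0)
confident_wrong = cross_entropy([5.0, 0.0, 0.0], target=1)

print(confident_right, unsure, confident_wrong)
```

The loss is small when the model is confident and right, moderate when it is unsure, and large when it is confident and wrong.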
Regularization techniques allow you to avoid over-fitting.
1. Weight decay.
2. Dropout.
3. Batch Norm.
4. Data augmentation.
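One way to avoid over-fitting is to use fewer parameters. However, Jeremy proposes that we instead use a lot of parameters but penalize complexity. Weight decay is one way to do the latter: it adds a penalty on the size of the weights to the loss, so large weights have to earn their keep.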
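A minimal sketch of weight decay as a penalty on the sum of squared weights. The function names, the `wd` value and the example numbers are assumptions for illustration.

```python
def loss_with_weight_decay(data_loss, weights, wd):
    """Add wd * sum(w^2) to the ordinary loss."""
    return data_loss + wd * sum(w * w for w in weights)

def grad_with_weight_decay(data_grads, weights, wd):
    """Gradient of the penalized loss: each weight contributes 2 * wd * w."""
    return [g + 2 * wd * w for g, w in zip(data_grads, weights)]

weights = [3.0, -4.0]  # made-up weights
total = loss_with_weight_decay(data_loss=1.0, weights=weights, wd=0.01)
grads = grad_with_weight_decay([0.0, 0.0], weights, wd=0.01)

print(total)  # data loss plus a penalty of 0.01 * (9 + 16)
print(grads)  # each gradient points away from zero, so the update shrinks the weight
```

Even when the data gradient is zero, the extra term pulls every weight toward zero on each update, which is where the name "decay" comes from. Some implementations fold the factor of 2 into `wd`.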