Today I learned


  1. While updating weights, we multiply the learning rate by the derivative of the loss function with respect to the weights, and subtract the result from the current weights.
  2. The loss function is a function of the independent variables, X, and the weights.
  3. The cross-entropy loss function is useful for classification problems, where, unlike in regression, you don’t care how numerically close your prediction was, only whether you assigned high probability to the right class.
  4. Softmax is an activation function that squashes your outputs into the range 0 to 1 and makes them sum to 1, so they can be read as probabilities. It is somewhat like the Sigmoid, which also conforms each output to that range, but independently, without the sum-to-1 constraint.
    1. Some discussion on when to use one or the other of these: Softmax vs Sigmoid function in Logistic classifier?
  5. You generally want Cross-entropy loss and Softmax for single-label multi-class classification problems. They go well together.
  6. Regularization techniques allow you to avoid over-fitting.
    1. Weight decay.
    2. Dropout.
    3. Batch Norm.
    4. Data augmentation.
  7. One way to avoid over-fitting is to use fewer parameters. However, Jeremy proposes that we instead use a lot of parameters but penalize complexity. Weight decay is one way to do the latter.
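The update rule in item 1 can be sketched in a few lines. This is a toy example, not real training code: the loss (w - 3)² and its gradient are assumptions chosen so the minimum is easy to see.

```python
# One-parameter gradient descent on the toy loss (w - 3)^2.
def grad(w):
    return 2 * (w - 3.0)  # derivative of (w - 3)^2 with respect to w

lr = 0.1   # learning rate
w = 0.0    # initial weight

for _ in range(100):
    # Multiply the learning rate by the gradient, subtract from the weight.
    w -= lr * grad(w)

# w converges toward the loss minimum at w = 3.
```

The same shape applies to real networks, except w is a tensor and the gradient comes from backpropagation.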
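Items 3 to 5 can be made concrete with a minimal softmax plus cross-entropy sketch. The logits and the target class index here are made-up example values.

```python
import math

def softmax(logits):
    # Subtract the max for numerical stability, exponentiate, normalize.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, target):
    # Negative log-probability of the correct class: only the confidence
    # assigned to the right class matters, not numerical closeness.
    return -math.log(probs[target])

logits = [2.0, 1.0, 0.1]     # raw model outputs (example values)
probs = softmax(logits)      # probabilities between 0 and 1, summing to 1
loss = cross_entropy(probs, target=0)
```

Libraries typically fuse the two for numerical stability, which is part of why they go well together in practice.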