PyTorch:

  • You can initialize parameters with the functions in the torch.nn.init module, which modify a tensor in place.
  • The weight decay coefficient is passed to the optimizer (the weight_decay argument of optimizers such as torch.optim.SGD), not added to the loss function.
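
As a minimal sketch of both points, the snippet below initializes a toy layer with nn.init and sets weight decay on the optimizer. The model shape, the choice of Xavier initialization, the learning rate, the decay value 1e-4, and the random data are all assumptions made for illustration, not recommendations.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical toy model: a single linear layer with 4 inputs and 2 outputs.
model = nn.Linear(4, 2)

# nn.init functions ending in "_" overwrite the given tensor in place.
nn.init.xavier_uniform_(model.weight)
nn.init.zeros_(model.bias)

# Snapshots taken right after initialization, before any update.
weight_after_init = model.weight.detach().clone()
bias_after_init = model.bias.detach().clone()

# Weight decay is set on the optimizer (values here are illustrative).
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, weight_decay=1e-4)

# Random stand-in data; the loss itself contains no penalty term.
x = torch.randn(8, 4)
y = torch.randn(8, 2)
loss = nn.functional.mse_loss(model(x), y)

# One training step: the optimizer applies the decay during step().
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

Because model.parameters() is passed as a single group, the decay here applies to the bias as well as the weights. Excluding some parameters would require separate parameter groups with their own weight_decay values.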