PyTorch: You can initialize parameters with the functions in the torch.nn.init module. The weight decay coefficient is passed to the optimizer (i.e. the weight_decay argument of the optimizers in torch.optim), not added to the loss function.
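
As a rough illustration of both points, here is a minimal sketch. The model, layer sizes, the choice of Xavier-uniform initialization, the learning rate, and the decay value are all assumptions made up for this example. The helper name init_weights is hypothetical as well.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical two-layer model, used only for illustration.
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))


def init_weights(module):
    # Initialize Linear layers in place: Xavier-uniform weights, zero biases.
    # (Xavier is an arbitrary choice here; nn.init offers other schemes.)
    if isinstance(module, nn.Linear):
        nn.init.xavier_uniform_(module.weight)
        nn.init.zeros_(module.bias)


# Module.apply calls init_weights on every submodule, including model itself.
model.apply(init_weights)

# Weight decay is an optimizer argument; the loss below is a plain MSE
# with no penalty term added to it.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, weight_decay=1e-4)
loss_fn = nn.MSELoss()

# Random data standing in for a real batch.
x = torch.randn(16, 4)
y = torch.randn(16, 1)

# One training step.
optimizer.zero_grad()
loss = loss_fn(model(x), y)
loss.backward()
optimizer.step()
```

Note that passing model.parameters() applies the same decay to every parameter, biases included. If that is not wanted, torch.optim optimizers also accept a list of parameter groups, each with its own weight_decay value.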