Freezing Layers in PyTorch So They Are Not Updated During Training

As we know, the parameters of a deep network are updated during back-propagation using computed gradients; that is how good parameters are learned. Sometimes, however, we want to keep the parameters of certain layers fixed and exclude them from back-propagation. For example, when fine-tuning, we may want to freeze the parameters loaded from a pretrained model and update only the final classifier layer. How should this be done?
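For instance, the typical fine-tuning pattern is: freeze everything first, then unfreeze only the classifier head. A minimal sketch, using a small hypothetical stand-in model (with a real pretrained network such as a torchvision ResNet the idea is the same):

```python
import torch.nn as nn

# Hypothetical stand-in for "pretrained backbone + classifier head"
model = nn.Sequential(
    nn.Linear(8, 16),   # plays the role of the pretrained feature extractor
    nn.ReLU(),
    nn.Linear(16, 10),  # classifier head we actually want to fine-tune
)

# Freeze everything first...
for param in model.parameters():
    param.requires_grad = False
# ...then unfreeze only the classifier head.
for param in model[2].parameters():
    param.requires_grad = True

trainable = [name for name, p in model.named_parameters() if p.requires_grad]
print(trainable)  # only the head's parameters remain trainable
```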

Defining the network

    import torch
    import torch.nn as nn
    import torch.optim as optim

    # Define a simple network
    class net(nn.Module):
        def __init__(self, num_class=10):
            super(net, self).__init__()
            self.fc1 = nn.Linear(8, 4)
            self.fc2 = nn.Linear(4, num_class)

        def forward(self, x):
            return self.fc2(self.fc1(x))

Case 1: no layers frozen

    model = net()

    # Case 1: no parameters frozen
    loss_fn = nn.CrossEntropyLoss()
    optimizer = optim.SGD(model.parameters(), lr=1e-2)  # all parameters are passed in

    # Model parameters before training
    print("model.fc1.weight", model.fc1.weight)
    print("model.fc2.weight", model.fc2.weight)

    for epoch in range(10):
        x = torch.randn((3, 8))
        label = torch.randint(0, 10, [3]).long()
        output = model(x)
        loss = loss_fn(output, label)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    # Model parameters after training
    print("model.fc1.weight", model.fc1.weight)
    print("model.fc2.weight", model.fc2.weight)

Result:

    (bbn) jyzhang@admin2-X10DAi:~/test$ python -u "/home/jyzhang/test/net.py"
    model.fc1.weight Parameter containing:
    tensor([[ 0.3362, -0.2676, -0.3497, -0.3009, -0.1013, -0.2316, -0.0189,  0.1430],
            [-0.2486,  0.2900, -0.1818, -0.0942,  0.1445,  0.2410, -0.1407, -0.3176],
            [-0.3198,  0.2039, -0.2249,  0.2819, -0.3136, -0.2794, -0.3011, -0.2270],
            [ 0.3376, -0.0842,  0.2747, -0.0232,  0.0768,  0.3160, -0.1185,  0.2911]],
           requires_grad=True)
    model.fc2.weight Parameter containing:
    tensor([[ 0.4277,  0.0945,  0.1768,  0.3773],
            [-0.4595, -0.2447,  0.4701,  0.2873],
            [ 0.3281, -0.1861, -0.2202,  0.4413],
            [-0.1053, -0.1238,  0.0275, -0.0072],
            [-0.4448, -0.2787, -0.0280,  0.4629],
            [ 0.4063, -0.2091,  0.0706,  0.3216],
            [-0.2287, -0.1352, -0.0502,  0.3434],
            [-0.2946, -0.4074,  0.4926, -0.0832],
            [-0.2608,  0.0165,  0.0501, -0.1673],
            [ 0.2507,  0.3006,  0.0481,  0.2257]], requires_grad=True)
    model.fc1.weight Parameter containing:
    tensor([[ 0.3316, -0.2628, -0.3391, -0.2989, -0.0981, -0.2178, -0.0056,  0.1410],
            [-0.2529,  0.2991, -0.1772, -0.0992,  0.1447,  0.2480, -0.1370, -0.3186],
            [-0.3246,  0.2055, -0.2229,  0.2745, -0.3158, -0.2750, -0.2994, -0.2295],
            [ 0.3366, -0.0877,  0.2693, -0.0182,  0.0807,  0.3117, -0.1184,  0.2946]],
           requires_grad=True)
    model.fc2.weight Parameter containing:
    tensor([[ 0.4189,  0.0985,  0.1723,  0.3804],
            [-0.4593, -0.2356,  0.4772,  0.2784],
            [ 0.3269, -0.1874, -0.2173,  0.4407],
            [-0.1061, -0.1248,  0.0309, -0.0062],
            [-0.4322, -0.2868, -0.0319,  0.4647],
            [ 0.4048, -0.2150,  0.0692,  0.3228],
            [-0.2252, -0.1353, -0.0433,  0.3396],
            [-0.2936, -0.4118,  0.4875, -0.0782],
            [-0.2625,  0.0192,  0.0509, -0.1670],
            [ 0.2474,  0.3056,  0.0418,  0.2265]], requires_grad=True)
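Both fc1 and fc2 changed, as expected. Rather than eyeballing the printed tensors, the update can also be checked programmatically by snapshotting the weights before training and comparing afterwards; a sketch reproducing the same setup with an equivalent nn.Sequential model and a fixed seed:

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(8, 4), nn.Linear(4, 10))  # same shapes as the net above
fc1_before = model[0].weight.clone()
fc2_before = model[1].weight.clone()

loss_fn = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=1e-2)
for _ in range(10):
    x = torch.randn(3, 8)
    label = torch.randint(0, 10, (3,))
    loss = loss_fn(model(x), label)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

fc1_changed = not torch.equal(model[0].weight, fc1_before)
fc2_changed = not torch.equal(model[1].weight, fc2_before)
print(fc1_changed, fc2_changed)  # both layers were updated
```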

Case 2: freezing the fc1 layer with approach 1

Approach 1

1. Pass all of the parameters to the optimizer:

    optimizer = optim.SGD(model.parameters(), lr=1e-2)  # all parameters are passed in

2. Set requires_grad of the parameters of the layer to be frozen to False:

    for name, param in model.named_parameters():
        if "fc1" in name:
            param.requires_grad = False

Code:

    # Case 2: freezing the fc1 layer with approach 1
    loss_fn = nn.CrossEntropyLoss()
    optimizer = optim.SGD(model.parameters(), lr=1e-2)  # all parameters are passed to the optimizer

    # Model parameters before training
    print("model.fc1.weight", model.fc1.weight)
    print("model.fc2.weight", model.fc2.weight)

    # Freeze the parameters of the fc1 layer
    for name, param in model.named_parameters():
        if "fc1" in name:
            param.requires_grad = False

    for epoch in range(10):
        x = torch.randn((3, 8))
        label = torch.randint(0, 10, [3]).long()
        output = model(x)
        loss = loss_fn(output, label)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    # Model parameters after training
    print("model.fc1.weight", model.fc1.weight)
    print("model.fc2.weight", model.fc2.weight)

Result:

    (bbn) jyzhang@admin2-X10DAi:~/test$ python -u "/home/jyzhang/test/net.py"
    model.fc1.weight Parameter containing:
    tensor([[ 0.3163, -0.1592, -0.2360,  0.1436,  0.1158,  0.0406, -0.0627,  0.0566],
            [-0.1688,  0.3519,  0.2464, -0.2693,  0.1284,  0.0544, -0.0188,  0.2404],
            [ 0.0738,  0.2013,  0.0868,  0.1396, -0.2885,  0.3431, -0.1109,  0.2549],
            [ 0.1222, -0.1877,  0.3511,  0.1951,  0.2147, -0.0427, -0.3374, -0.0653]],
           requires_grad=True)
    model.fc2.weight Parameter containing:
    tensor([[-0.1830, -0.3147, -0.1698,  0.3235],
            [-0.1347,  0.3096,  0.4895,  0.1221],
            [ 0.2735, -0.2238,  0.4713, -0.0683],
            [-0.3150, -0.1905,  0.3645,  0.3766],
            [-0.0340,  0.3212,  0.0650,  0.1380],
            [-0.2500,  0.1128, -0.3338, -0.4151],
            [ 0.0446, -0.4776, -0.3655,  0.0822],
            [-0.1871, -0.0602, -0.4855, -0.3604],
            [-0.3296,  0.0523, -0.3424,  0.2151],
            [-0.2478,  0.1424,  0.4547, -0.1969]], requires_grad=True)
    model.fc1.weight Parameter containing:
    tensor([[ 0.3163, -0.1592, -0.2360,  0.1436,  0.1158,  0.0406, -0.0627,  0.0566],
            [-0.1688,  0.3519,  0.2464, -0.2693,  0.1284,  0.0544, -0.0188,  0.2404],
            [ 0.0738,  0.2013,  0.0868,  0.1396, -0.2885,  0.3431, -0.1109,  0.2549],
            [ 0.1222, -0.1877,  0.3511,  0.1951,  0.2147, -0.0427, -0.3374, -0.0653]])
    model.fc2.weight Parameter containing:
    tensor([[-0.1821, -0.3155, -0.1637,  0.3213],
            [-0.1353,  0.3130,  0.4807,  0.1245],
            [ 0.2731, -0.2206,  0.4687, -0.0718],
            [-0.3138, -0.1925,  0.3561,  0.3809],
            [-0.0344,  0.3152,  0.0606,  0.1332],
            [-0.2501,  0.1154, -0.3267, -0.4137],
            [ 0.0400, -0.4723, -0.3586,  0.0808],
            [-0.1823, -0.0667, -0.4854, -0.3543],
            [-0.3285,  0.0547, -0.3388,  0.2166],
            [-0.2497,  0.1410,  0.4551, -0.2008]], requires_grad=True)
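As the output shows, fc1 stays exactly the same (and its second printout no longer shows requires_grad=True), while fc2 keeps updating. The mechanism is worth spelling out: once requires_grad is False, backward() never populates that parameter's .grad, and optimizer.step() skips parameters whose .grad is None, even though they were passed to the optimizer. A minimal sketch on a toy two-layer model (not the exact net above):

```python
import torch
import torch.nn as nn
import torch.optim as optim

model = nn.Sequential(nn.Linear(8, 4), nn.Linear(4, 10))
for param in model[0].parameters():
    param.requires_grad = False           # freeze the first layer

optimizer = optim.SGD(model.parameters(), lr=1e-2)  # frozen params are still passed in
fc1_before = model[0].weight.clone()

loss = nn.CrossEntropyLoss()(model(torch.randn(3, 8)), torch.randint(0, 10, (3,)))
optimizer.zero_grad()
loss.backward()

print(model[0].weight.grad)               # None: no gradient computed for the frozen layer
print(model[1].weight.grad is None)       # False: the second layer still gets a gradient

optimizer.step()                          # step() skips parameters whose .grad is None
print(torch.equal(model[0].weight, fc1_before))  # True: the frozen layer is untouched
```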

Approach 2

Pass only the parameters of the unfrozen fc2 layer to the optimizer:

    optimizer = optim.SGD(model.fc2.parameters(), lr=1e-2)  # only fc2's parameters are passed in

Note: with this approach there is no need to set requires_grad of the frozen layer's parameters to False.

Code:

    # Case 3: freezing the fc1 layer with approach 2
    loss_fn = nn.CrossEntropyLoss()
    optimizer = optim.SGD(model.fc2.parameters(), lr=1e-2)  # only fc2's parameters are passed to the optimizer
    print("model.fc1.weight", model.fc1.weight)
    print("model.fc2.weight", model.fc2.weight)

    for epoch in range(10):
        x = torch.randn((3, 8))
        label = torch.randint(0, 3, [3]).long()
        output = model(x)
        loss = loss_fn(output, label)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    print("model.fc1.weight", model.fc1.weight)
    print("model.fc2.weight", model.fc2.weight)
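A caveat of approach 2: because the fc1 parameters still have requires_grad=True, backward() keeps computing and storing their gradients on every step; they are simply never applied by the optimizer, which wastes both time and memory. A minimal sketch on a toy two-layer model illustrating this:

```python
import torch
import torch.nn as nn
import torch.optim as optim

model = nn.Sequential(nn.Linear(8, 4), nn.Linear(4, 10))
optimizer = optim.SGD(model[1].parameters(), lr=1e-2)  # only the second layer is optimized

loss = nn.CrossEntropyLoss()(model(torch.randn(3, 8)), torch.randint(0, 10, (3,)))
optimizer.zero_grad()
loss.backward()

# The first layer's gradient is computed and stored even though it is never used.
print(model[0].weight.grad is not None)  # True
```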

There are two ways to achieve this goal: one is to set requires_grad to False for the layers whose parameters should not be updated; the other is to pass only the parameters that should be updated when defining the optimizer.

The best approach is to pass only the parameters with requires_grad=True to the optimizer; this uses slightly less memory and is more efficient.

Best practice

Set requires_grad of the parameters that should not be updated to False, and at the same time do not pass those parameters to the optimizer.

1. Set requires_grad of the parameters that should not be updated to False:

    # Freeze the parameters of the fc1 layer
    for name, param in model.named_parameters():
        if "fc1" in name:
            param.requires_grad = False

2. Do not pass the frozen parameters to the optimizer:

    # Use a filter so that only parameters with requires_grad=True are passed in
    optimizer = optim.SGD(filter(lambda p: p.requires_grad, model.parameters()), lr=1e-2)
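Putting the two steps together, a complete runnable sketch of the best practice on an equivalent toy two-layer model (same shapes as the example network, with a fixed seed added for reproducibility):

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(8, 4), nn.Linear(4, 10))

# Step 1: stop computing gradients for the frozen layer
for param in model[0].parameters():
    param.requires_grad = False

# Step 2: hand the optimizer only the parameters that still require gradients
optimizer = optim.SGD(filter(lambda p: p.requires_grad, model.parameters()), lr=1e-2)

fc1_before = model[0].weight.clone()
fc2_before = model[1].weight.clone()

loss_fn = nn.CrossEntropyLoss()
for _ in range(10):
    x = torch.randn(3, 8)
    loss = loss_fn(model(x), torch.randint(0, 10, (3,)))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print(torch.equal(model[0].weight, fc1_before))  # True: frozen layer untouched
print(torch.equal(model[1].weight, fc2_before))  # False: the head was trained
```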

Conclusion

The best practice both saves GPU memory and improves speed:

Saves memory: the parameters that are not updated are not passed to the optimizer.
Improves speed: setting requires_grad of the parameters that are not updated to False skips computing their gradients during back-propagation.

 

 
