Models, Parameters, Nonlinearity, Forward Propagation, Backward Partial Derivatives | Deep Learning Basics

Toutiao ID: 錢多多先森. Follow for more on AI, computer vision (CV), digital gadgets and personal finance; follow me and let's grow together.

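In deep learning, data, models, parameters, nonlinearity, prediction by forward propagation, and parameter updates via backward partial derivatives are the foundations of the field. What exactly are these basic pieces, how do they work, and how can they be implemented in Python? That is what this article covers.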

Sigmoid activation function

```python
import numpy as np


def sigmoid(x):
    """
    Compute the sigmoid of x

    Arguments:
    x -- A scalar or numpy array of any size.

    Return:
    s -- sigmoid(x)
    """
    s = 1 / (1 + np.exp(-x))
    return s
```
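Backpropagation needs the derivative of each activation. A minimal sketch for the sigmoid, using the identity σ'(x) = σ(x)(1 - σ(x)); the helper name `sigmoid_derivative` is hypothetical and not part of the article's code (in backward_propagation() this derivative is folded into `dZ3 = A3 - Y`, so the helper is for illustration only):

```python
import numpy as np


def sigmoid(x):
    """Logistic sigmoid, same formula as above."""
    return 1 / (1 + np.exp(-x))


def sigmoid_derivative(x):
    """Hypothetical helper: derivative of the sigmoid, s(x) * (1 - s(x))."""
    s = sigmoid(x)
    return s * (1 - s)


x = np.array([-2.0, 0.0, 2.0])
print(sigmoid(x))             # all values lie in (0, 1); sigmoid(0) = 0.5
print(sigmoid_derivative(x))  # largest at x = 0, where it equals 0.25
```

The derivative never exceeds 0.25, which is one reason gradients shrink when many sigmoid layers are stacked.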

ReLU activation function

```python
def relu(x):
    """
    Compute the relu of x

    Arguments:
    x -- A scalar or numpy array of any size.

    Return:
    s -- relu(x)
    """
    s = np.maximum(0, x)  # element-wise max(0, x)
    return s
```
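ReLU's derivative is 1 for positive inputs and 0 for negative ones; at exactly 0 it is undefined. The sketch below (hypothetical helper `relu_derivative`) simply uses 0 there, which matches the `np.int64(A > 0)` mask used later in backward_propagation():

```python
import numpy as np


def relu(x):
    return np.maximum(0, x)


def relu_derivative(x):
    """Hypothetical helper: 1 where x > 0, else 0 (0 is chosen at x == 0)."""
    return (np.asarray(x) > 0).astype(float)


x = np.array([-1.5, 0.0, 2.0])
print(relu(x))             # negative inputs are clipped to 0
print(relu_derivative(x))  # gradient passes through only where x > 0
```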

Initializing layer parameters

Initializing the layer parameters simply means assigning starting values to the weights and biases inside the network model.

```python
def initialize_parameters(layer_dims):
    """
    Arguments:
    layer_dims -- python list containing the dimensions of each layer in our network

    Returns:
    parameters -- python dictionary containing your parameters "W1", "b1", ..., "WL", "bL":
                    Wl -- weight matrix of shape (layer_dims[l], layer_dims[l-1])
                    bl -- bias vector of shape (layer_dims[l], 1)

    Tips:
    - For example, layer_dims = [2, 2, 1] gives W1 of shape (2, 2), b1 of shape (2, 1),
      W2 of shape (1, 2) and b2 of shape (1, 1).
    - In the for loop, use parameters['W' + str(l)] to access Wl, where l is the loop index.
    """
    np.random.seed(3)  # fixed seed so the initialization is reproducible
    parameters = {}
    L = len(layer_dims)  # number of layers in the network, including the input layer

    for l in range(1, L):
        # Gaussian weights scaled by 1/sqrt(fan_in); biases start at zero
        parameters['W' + str(l)] = np.random.randn(layer_dims[l], layer_dims[l-1]) / np.sqrt(layer_dims[l-1])
        parameters['b' + str(l)] = np.zeros((layer_dims[l], 1))

        # Compare full shape tuples so the checks can actually fail
        assert parameters['W' + str(l)].shape == (layer_dims[l], layer_dims[l-1])
        assert parameters['b' + str(l)].shape == (layer_dims[l], 1)

    return parameters
```
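Why divide by `np.sqrt(layer_dims[l-1])`? A quick experiment, with arbitrarily chosen sizes and unit-variance random inputs as assumptions: without scaling, each entry of Z = WX sums n_in products and its variance grows to roughly n_in; with the 1/sqrt(n_in) factor (the fan-in scaling commonly associated with Xavier/LeCun-style initialization) it stays near 1.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_out, m = 500, 256, 2000  # fan-in, layer width, number of examples (arbitrary)

X = rng.standard_normal((n_in, m))             # unit-variance inputs (assumption)
W_plain = rng.standard_normal((n_out, n_in))   # unscaled Gaussian weights
W_scaled = W_plain / np.sqrt(n_in)             # same scaling as initialize_parameters()

var_plain = np.var(W_plain @ X)    # roughly n_in
var_scaled = np.var(W_scaled @ X)  # roughly 1
print(var_plain, var_scaled)
```

For ReLU networks, He initialization instead scales by `np.sqrt(2 / fan_in)`; the article does not say which variant its scaling was chosen to match.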

Forward propagation (FP)

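The process that runs from the network's input to its final output is called forward propagation. It involves three parts: the input, the network's intermediate parameters, and the output. In the three-layer network used here, each layer computes a linear step Z = WA + b followed by an activation (ReLU in the hidden layers, sigmoid in the output layer), as shown in the figure below: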

```python
def forward_propagation(X, parameters):
    """
    Implements the forward propagation LINEAR -> RELU -> LINEAR -> RELU -> LINEAR -> SIGMOID.

    Arguments:
    X -- input dataset, of shape (input size, number of examples)
    parameters -- python dictionary containing your parameters "W1", "b1", "W2", "b2", "W3", "b3":
                    Wl -- weight matrix of shape (layer_dims[l], layer_dims[l-1])
                    bl -- bias vector of shape (layer_dims[l], 1)

    Returns:
    A3 -- output of the final sigmoid, of shape (output size, number of examples)
    cache -- tuple of intermediate values reused by backward_propagation()
    """
    # retrieve parameters
    W1 = parameters["W1"]
    b1 = parameters["b1"]
    W2 = parameters["W2"]
    b2 = parameters["b2"]
    W3 = parameters["W3"]
    b3 = parameters["b3"]

    # LINEAR -> RELU -> LINEAR -> RELU -> LINEAR -> SIGMOID
    Z1 = np.dot(W1, X) + b1
    A1 = relu(Z1)
    Z2 = np.dot(W2, A1) + b2
    A2 = relu(Z2)
    Z3 = np.dot(W3, A2) + b3
    A3 = sigmoid(Z3)

    cache = (Z1, A1, W1, b1, Z2, A2, W2, b2, Z3, A3, W3, b3)

    return A3, cache
```
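forward_propagation() hard-codes three layers. A sketch of the same computation as a loop over any number of layers, assuming the parameter naming produced by initialize_parameters(); `forward_L_layers` and its per-layer cache layout are hypothetical and are not compatible with backward_propagation() as written:

```python
import numpy as np


def relu(x):
    return np.maximum(0, x)


def sigmoid(x):
    return 1 / (1 + np.exp(-x))


def forward_L_layers(X, parameters):
    """Hypothetical generalization: ReLU for hidden layers, sigmoid for the last layer."""
    L = len(parameters) // 2  # number of layers that have weights
    A = X
    caches = []
    for l in range(1, L + 1):
        W, b = parameters["W" + str(l)], parameters["b" + str(l)]
        Z = W @ A + b  # (n_l, n_{l-1}) @ (n_{l-1}, m) + (n_l, 1), broadcast over columns
        A_prev = A
        A = sigmoid(Z) if l == L else relu(Z)
        caches.append((A_prev, W, b, Z))  # what a layer-wise backward pass would need
    return A, caches
```

With three weight layers this reproduces the LINEAR -> RELU -> LINEAR -> RELU -> LINEAR -> SIGMOID chain above.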

Backpropagation (BP)

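Backpropagation is used to solve the network's optimization problem. Starting from the deviation between the output layer's result and the ground truth, it applies the chain rule to compute the gradient of the cost with respect to each layer's parameters, working backward layer by layer, so that every parameter can be adjusted. Learning is an iterative process: forward pass, backward pass and parameter update are repeated many times.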
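The gradients computed in backward_propagation() follow from the chain rule. A sketch of the derivation, taking A^{[0]} = X and the per-example cross-entropy loss ℒ = -[y log a + (1 - y) log(1 - a)] with a = σ(z). For the sigmoid output layer:

$$
\frac{\partial \mathcal{L}}{\partial z^{[3]}} = \left(-\frac{y}{a^{[3]}} + \frac{1-y}{1-a^{[3]}}\right) a^{[3]}\left(1-a^{[3]}\right) = a^{[3]} - y
$$

Stacking the m examples as columns and averaging over them gives the matrix form used in the code:

$$
\begin{aligned}
dZ^{[3]} &= A^{[3]} - Y \\
dW^{[l]} &= \frac{1}{m}\, dZ^{[l]} A^{[l-1]\top}, \qquad db^{[l]} = \frac{1}{m}\sum_{i=1}^{m} dZ^{[l](i)} \\
dA^{[l-1]} &= W^{[l]\top} dZ^{[l]} \\
dZ^{[l]} &= dA^{[l]} \odot \mathbb{1}\left[A^{[l]} > 0\right], \qquad l = 1, 2 \ \text{(ReLU layers)}
\end{aligned}
$$

For ReLU, A > 0 exactly when Z > 0, so testing A2 > 0 and A1 > 0 in the code is equivalent to testing the pre-activations.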

```python
def backward_propagation(X, Y, cache):
    """
    Implement the backward propagation for the three-layer network above.

    Arguments:
    X -- input dataset, of shape (input size, number of examples)
    Y -- true "label" vector of 0/1 values, same shape as A3
    cache -- cache output from forward_propagation()

    Returns:
    gradients -- A dictionary with the gradients with respect to each parameter, activation and pre-activation variables
    """
    m = X.shape[1]
    (Z1, A1, W1, b1, Z2, A2, W2, b2, Z3, A3, W3, b3) = cache

    dZ3 = A3 - Y  # error at the sigmoid output (cross-entropy gradient w.r.t. Z3)
    dW3 = 1./m * np.dot(dZ3, A2.T)  # matrix product
    db3 = 1./m * np.sum(dZ3, axis=1, keepdims=True)

    dA2 = np.dot(W3.T, dZ3)
    # Element-wise product with the ReLU derivative (1 where A2 > 0, else 0); same shape as dA2
    dZ2 = np.multiply(dA2, np.int64(A2 > 0))
    dW2 = 1./m * np.dot(dZ2, A1.T)
    db2 = 1./m * np.sum(dZ2, axis=1, keepdims=True)

    dA1 = np.dot(W2.T, dZ2)
    dZ1 = np.multiply(dA1, np.int64(A1 > 0))
    dW1 = 1./m * np.dot(dZ1, X.T)
    db1 = 1./m * np.sum(dZ1, axis=1, keepdims=True)

    gradients = {"dZ3": dZ3, "dW3": dW3, "db3": db3,
                 "dA2": dA2, "dZ2": dZ2, "dW2": dW2, "db2": db2,
                 "dA1": dA1, "dZ1": dZ1, "dW1": dW1, "db1": db1}

    return gradients
```
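A common way to verify a hand-written backward pass is gradient checking: compare the analytic gradient with centered finite differences (J(θ + ε) - J(θ - ε)) / 2ε. To stay short, the sketch below checks a single sigmoid layer (logistic regression) on random data rather than the full three-layer network; all function names are hypothetical:

```python
import numpy as np


def sigmoid(x):
    return 1 / (1 + np.exp(-x))


def cost(W, b, X, Y):
    """Cross-entropy cost of one sigmoid layer (same formula as compute_cost)."""
    A = sigmoid(W @ X + b)
    m = Y.shape[1]
    return -np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A)) / m


def analytic_grads(W, b, X, Y):
    """Backprop for one sigmoid layer: dZ = A - Y, then dW and db as in backward_propagation()."""
    m = X.shape[1]
    dZ = sigmoid(W @ X + b) - Y
    return dZ @ X.T / m, np.sum(dZ, axis=1, keepdims=True) / m


def numerical_grad_W(W, b, X, Y, eps=1e-6):
    """Centered finite differences, perturbing one weight at a time."""
    grad = np.zeros_like(W)
    for idx in np.ndindex(W.shape):
        W_plus, W_minus = W.copy(), W.copy()
        W_plus[idx] += eps
        W_minus[idx] -= eps
        grad[idx] = (cost(W_plus, b, X, Y) - cost(W_minus, b, X, Y)) / (2 * eps)
    return grad


rng = np.random.default_rng(0)
X = rng.standard_normal((3, 8))
Y = (rng.random((1, 8)) > 0.5).astype(float)
W = rng.standard_normal((1, 3)) * 0.1
b = np.zeros((1, 1))

dW, db = analytic_grads(W, b, X, Y)
dW_num = numerical_grad_W(W, b, X, Y)
rel_err = np.linalg.norm(dW - dW_num) / (np.linalg.norm(dW) + np.linalg.norm(dW_num))
print(rel_err)  # should be tiny if the analytic gradient is correct
```

The same loop works for the three-layer network by perturbing each entry of W1 to W3 and b1 to b3 and recomputing the cost through forward_propagation(); it is slow, so it is only used for debugging.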

Updating the model parameters (weights W and biases b)

```python
def update_parameters(parameters, grads, learning_rate):
    """
    Update parameters using gradient descent

    Arguments:
    parameters -- python dictionary containing your parameters:
                    parameters['W' + str(i)] = Wi
                    parameters['b' + str(i)] = bi
    grads -- python dictionary containing your gradients for each parameters:
                    grads['dW' + str(i)] = dWi
                    grads['db' + str(i)] = dbi
    learning_rate -- the learning rate, scalar.

    Returns:
    parameters -- python dictionary containing your updated parameters
    """
    n = len(parameters) // 2  # number of layers in the neural network

    # Update rule for each parameter: theta = theta - learning_rate * d_theta
    for k in range(n):
        parameters["W" + str(k+1)] = parameters["W" + str(k+1)] - learning_rate * grads["dW" + str(k+1)]
        parameters["b" + str(k+1)] = parameters["b" + str(k+1)] - learning_rate * grads["db" + str(k+1)]

    return parameters
```
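The learning rate sets the step size of the update rule θ = θ - learning_rate · dθ. A one-parameter illustration on the toy function f(w) = (w - 3)², chosen purely for illustration; `gradient_descent` is a hypothetical helper:

```python
def gradient_descent(learning_rate, steps=50, w0=0.0):
    """Minimize f(w) = (w - 3)**2 with the update w <- w - learning_rate * f'(w)."""
    w = w0
    for _ in range(steps):
        grad = 2 * (w - 3)           # f'(w)
        w = w - learning_rate * grad  # same form as update_parameters()
    return w


print(gradient_descent(0.1))  # ends very close to the minimum at w = 3
print(gradient_descent(1.1))  # overshoots more on every step and diverges
```

Each step multiplies w - 3 by (1 - 2 · learning_rate), so this toy problem converges only when 0 < learning_rate < 1. For a neural network the safe range is generally not known in closed form and is usually found by experiment.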

Prediction with forward propagation

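The network runs forward propagation; every predicted probability greater than the threshold (0.5 here) is set to 1, and every other prediction is set to 0.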

```python
def predict(X, y, parameters):
    """
    This function is used to predict the results of the three-layer neural network.

    Arguments:
    X -- data set of examples you would like to label
    y -- true labels, of shape (1, number of examples), used to report accuracy
    parameters -- parameters of the trained model

    Returns:
    p -- predictions for the given dataset X
    """
    m = X.shape[1]
    p = np.zeros((1, m), dtype=int)  # built-in int; the np.int alias no longer exists in NumPy >= 1.24

    # Forward propagation
    a3, caches = forward_propagation(X, parameters)

    # convert probabilities to 0/1 predictions
    for i in range(0, a3.shape[1]):
        if a3[0, i] > 0.5:
            p[0, i] = 1
        else:
            p[0, i] = 0

    # print results
    #print("predictions: " + str(p[0,:]))
    #print("true labels: " + str(y[0,:]))
    print("Accuracy: " + str(np.mean(p[0, :] == y[0, :])))

    return p
```
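The element-by-element loop in predict() can be replaced by a single vectorized comparison. A sketch with hypothetical helper names `predict_from_probs` and `accuracy`; a probability of exactly 0.5 maps to 0, as with the original `> 0.5` test:

```python
import numpy as np


def predict_from_probs(a3, threshold=0.5):
    """Vectorized thresholding: 1 where a3 > threshold, else 0."""
    return (a3 > threshold).astype(int)


def accuracy(p, y):
    """Fraction of predictions that match the labels."""
    return float(np.mean(p == y))


a3 = np.array([[0.9, 0.2, 0.5, 0.51]])
y = np.array([[1, 0, 1, 1]])
p = predict_from_probs(a3)
print(p)               # [[1 0 0 1]]
print(accuracy(p, y))  # 0.75
```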

Computing the cost function

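Taking the cross-entropy loss as an example, the cost over m examples is computed as:

$$J = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log a^{[3](i)} + \left(1-y^{(i)}\right)\log\left(1-a^{[3](i)}\right)\right]$$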

```python
def compute_cost(a3, Y):
    """
    Implement the cost function

    Arguments:
    a3 -- post-activation, output of forward propagation
    Y -- "true" labels vector, same shape as a3

    Returns:
    cost -- value of the cost function
    """
    m = Y.shape[1]

    logprobs = np.multiply(-np.log(a3), Y) + np.multiply(-np.log(1 - a3), 1 - Y)
    # np.nansum skips NaN terms, which appear as inf * 0 when a3 is exactly 0 or 1
    cost = 1./m * np.nansum(logprobs)

    return cost
```
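np.nansum in compute_cost() only drops the NaN that appears when a3 is exactly 0 or 1 and the label agrees; if a3 is exactly 0 for a label of 1, the cost is still inf. A common alternative, swapped in here and not the article's method, is to clip the predictions away from 0 and 1 before taking logs. A sketch, with the hypothetical name `compute_cost_clipped` and an assumed `eps` of 1e-12:

```python
import numpy as np


def compute_cost_clipped(a3, Y, eps=1e-12):
    """Cross-entropy cost with predictions clipped to [eps, 1 - eps] so log() never sees 0."""
    m = Y.shape[1]
    a3 = np.clip(a3, eps, 1 - eps)
    logprobs = -(Y * np.log(a3) + (1 - Y) * np.log(1 - a3))
    return float(np.sum(logprobs) / m)


Y = np.array([[1, 0, 1]])
print(compute_cost_clipped(np.array([[0.8, 0.1, 0.6]]), Y))
print(compute_cost_clipped(np.array([[1.0, 0.0, 0.0]]), Y))  # large but finite, not inf
```

For predictions already inside (eps, 1 - eps) the result is identical to the unclipped formula.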

Conclusion

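After reading this article, you should have a fresh understanding of the foundational building blocks of deep learning: data, models, parameters, nonlinearity, prediction by forward propagation, and parameter updates via backward partial derivatives. When studying, it is not enough to know that calling tf.sigmoid adds a nonlinearity; understanding the underlying code in more depth deepens your grasp of deep learning.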


