Neural Networks (NN) and Fully Connected Layers (FCL)

Loading Small Classic Datasets

Common datasets in keras.datasets

keras.datasets downloads datasets from Google-hosted sources, which may require a proxy in some regions. A dataset that has been downloaded once is cached locally, so it will not be downloaded a second time. (Readers who need help with access can reach me by email via the About me page.)

  • boston housing: Boston house-price regression
  • mnist / fashion_mnist: handwritten digits (mnist) and clothing images in the same format (fashion_mnist)
  • cifar10/100: small-image classification; cifar100 is the finer-grained, 100-class counterpart of cifar10
  • imdb: sentiment classification of movie reviews

MNIST

70k [28×28] images in total; 60k are used for training and 10k for testing.

In [2]: import tensorflow as tf
   ...: from tensorflow import keras
In [4]: (x,y),(x_test,y_test) = keras.datasets.mnist.load_data()  # returns two tuples of numpy arrays
In [7]: x.shape,y.shape
Out[7]: ((60000, 28, 28), (60000,))
In [9]: x.min(),x.max(),x.mean()  # numpy's min/max; pixel values are in [0, 255]
Out[9]: (0, 255, 33.318421449829934)
In [10]: x_test.shape,y_test.shape
Out[10]: ((10000, 28, 28), (10000,))

In [13]: y_onehot = tf.one_hot(y, depth=10)  # convert the labels to one-hot encoding

In [15]: y[:2],y_onehot[0:2]
Out[15]:
(array([5, 0], dtype=uint8),
 <tf.Tensor: shape=(2, 10), dtype=float32, numpy=
 array([[0., 0., 0., 0., 0., 1., 0., 0., 0., 0.],
        [1., 0., 0., 0., 0., 0., 0., 0., 0., 0.]], dtype=float32)>)
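
Going the other way (my own addition, not part of the original session): tf.argmax recovers the integer labels from the one-hot rows.

labels = tf.argmax(y_onehot, axis=1)  # -> [5, 0, ...], dtype int64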

CIFAR10/100

60k [32×32×3] images in total; 50k are used for training and 10k for testing.

In [16]: (x,y),(x_test,y_test) = keras.datasets.cifar10.load_data()
In [17]: x.shape,y.shape,x_test.shape,y_test.shape
Out[17]: ((50000, 32, 32, 3), (50000, 1), (10000, 32, 32, 3), (10000, 1))

tf.data.Dataset

We need the chain numpy -> tensor -> iterator; tf.data.Dataset is the class built specifically for iterating over datasets.

from_tensor_slices: convert arrays directly into a Dataset object

In [19]: (x,y),(x_test,y_test) = keras.datasets.cifar10.load_data()

In [20]: db = tf.data.Dataset.from_tensor_slices(x)
In [21]: next(iter(db)).shape  # iter creates an iterator over db, then each next call advances it by one element
Out[21]: TensorShape([32, 32, 3])

In [28]: db = tf.data.Dataset.from_tensor_slices((x,y))
In [29]: next(iter(db))[0].shape
Out[29]: TensorShape([32, 32, 3])
In [30]: next(iter(db))[1].shape
Out[30]: TensorShape([1])

.shuffle: random shuffling

In [31]: db = tf.data.Dataset.from_tensor_slices((x_test,y_test))  # x and y are shuffled together, so pairs stay aligned
In [32]: db = db.shuffle(10000)  # buffer size -- can be set fairly large; shuffles within a window of 10000 elements
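
To see what the buffer size means, here is a toy sketch (my own illustration, assuming a recent TF 2.x): a buffer at least as large as the dataset gives a full shuffle, while a smaller buffer only shuffles locally.

toy = tf.data.Dataset.range(10)
print(list(toy.shuffle(10).as_numpy_iterator()))  # full shuffle, e.g. [3, 7, 0, ...]
print(list(toy.shuffle(2).as_numpy_iterator()))   # only locally shuffled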

.map: data preprocessing

map(f) applies the function f to every element of db.

In [33]: def preprocess(x,y):
    ...:     x = tf.cast(x, dtype=tf.float32) / 255.
    ...:     y = tf.cast(y, dtype=tf.int32)
    ...:     y = tf.one_hot(y, depth=10)
    ...:     return x, y
    ...:

In [34]: db2 = db.map(preprocess)

In [35]: res = next(iter(db2))

In [36]: res[0].shape,res[1].shape
Out[36]: (TensorShape([32, 32, 3]), TensorShape([1, 10]))

In [49]: res[1]
Out[49]: <tf.Tensor: shape=(1, 10), dtype=float32, numpy=array([[1., 0., 0., 0., 0., 0., 0., 0., 0., 0.]], dtype=float32)>

.batch

When reading from a dataset we usually want multiple samples at a time, i.e. a batch.

In [50]: db3 = db2.batch(32)

In [51]: res = next(iter(db3))

In [52]: res[0].shape,res[1].shape
Out[52]: (TensorShape([32, 32, 32, 3]), TensorShape([32, 1, 10]))
# we don't need the 1 in [32, 1, 10], so use tf.squeeze in the preprocessing step to remove it

In [61]: def preprocess(x,y):
    ...:     x = tf.cast(x, dtype=tf.float32) / 255.
    ...:     y = tf.cast(y, dtype=tf.int32)
    ...:     y = tf.one_hot(y, depth=10)
    ...:     y = tf.squeeze(y)
    ...:     return x, y
In [62]: db2 = db.map(preprocess)
In [63]: db3 = db2.batch(32)
In [64]: res = next(iter(db3))
In [65]: res[0].shape,res[1].shape
Out[65]: (TensorShape([32, 32, 32, 3]), TensorShape([32, 10]))
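
One more detail worth knowing (my addition; drop_remainder is a standard argument of tf.data's batch): when the dataset size is not divisible by the batch size, the last batch comes out smaller. Passing drop_remainder=True discards it so every batch has a fixed shape.

db3 = db2.batch(32, drop_remainder=True)  # every batch is exactly 32 samples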

StopIteration

Iterating with

for x,y in db:
    # calls next internally

works fine.

But if you drive the iterator manually:

In [54]: while True:
    ...:     next(db_iter)

it raises a StopIteration error.

That is, once the loop has consumed all 50k samples, the next call raises the error. If you need to iterate over the data several times, use repeat.
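
If you do want manual iteration, a minimal sketch (my own addition, not from the original session) is to catch the exception explicitly:

db_iter = iter(db3)
while True:
    try:
        x, y = next(db_iter)   # one batch per call
    except StopIteration:
        break                  # raised once the dataset is exhausted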

.repeat

In [66]: db4 = db3.repeat(10)

When db4 is iterated with a for loop, it passes over the 50k samples 10 times.
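
As a side note (my addition; this is standard tf.data behavior): repeat() with no argument repeats indefinitely, which is handy when the number of training steps is controlled inside the loop instead.

db5 = db3.repeat()              # repeats forever
for step, (x, y) in enumerate(db5):
    if step >= 1000:            # the stop condition lives in the loop
        break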

Complete pipeline

import tensorflow as tf
from tensorflow import keras

def prepare_mnist_features_and_labels(x, y):
    x = tf.cast(x, tf.float32) / 255.    # scale pixels to [0, 1]
    y = tf.cast(y, tf.int64)
    return x, y

def mnist_datasets():
    (x, y), (x_val, y_val) = keras.datasets.fashion_mnist.load_data()
    y = tf.one_hot(y, depth=10)          # labels become one-hot before slicing
    y_val = tf.one_hot(y_val, depth=10)

    ds = tf.data.Dataset.from_tensor_slices((x, y))
    ds = ds.map(prepare_mnist_features_and_labels)
    ds = ds.shuffle(60000).batch(100)
    ds_val = tf.data.Dataset.from_tensor_slices((x_val, y_val))
    ds_val = ds_val.map(prepare_mnist_features_and_labels)
    ds_val = ds_val.shuffle(60000).batch(100)
    return ds, ds_val
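
A minimal sketch of consuming these datasets (my own illustration; the training step itself is a placeholder):

ds, ds_val = mnist_datasets()
for epoch in range(2):
    for step, (x, y) in enumerate(ds):
        # x: [100, 28, 28] float32 in [0, 1]; y: [100, 10] one-hot
        pass  # forward pass, loss, and gradient update would go here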

Fully Connected Layer

Layers

  • Input
  • Hidden
  • Output
In [3]: x = tf.random.normal([4,784])

In [4]: net = tf.keras.layers.Dense(512)
In [5]: out = net(x)  # w and b are created automatically from the shape of the input x

In [6]: out.shape
Out[6]: TensorShape([4, 512])

In [7]: net.kernel.shape,net.bias.shape
Out[7]: (TensorShape([784, 512]), TensorShape([512]))
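
Under the hood, a Dense layer with no activation is just a matrix multiply plus a bias. A quick check (my own addition, reusing net, x, and out from above):

manual = x @ net.kernel + net.bias          # out = x W + b
print(tf.reduce_max(tf.abs(manual - out)))  # ~0.0
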
In [2]: net = tf.keras.layers.Dense(10)
In [3]: net.bias
AttributeError: 'Dense' object has no attribute 'bias'
# declaring the Dense layer does not yet create w and b
In [7]: net.get_weights()
Out[7]: []
In [8]: net.weights
Out[8]: []
# build() creates w and b explicitly
In [9]: net.build(input_shape=(None,4))
In [10]: net.kernel.shape,net.bias.shape
Out[10]: (TensorShape([4, 10]), TensorShape([10]))
# build can be called again, re-creating the kernel parameters
In [11]: net.build(input_shape=(2,4))
In [12]: net.kernel
Out[12]:
<tf.Variable 'kernel:0' shape=(4, 10) dtype=float32, numpy=
array([[ 0.61441875,  0.24404484,  0.46651304,  0.19085598, -0.05145264,
        -0.35335562, -0.10202849, -0.15380013,  0.01670462,  0.41096544],
       [-0.57477844,  0.335864  ,  0.02894145, -0.6324929 ,  0.3016789 ,
         0.38328493,  0.33733964, -0.5588818 ,  0.20204544, -0.15296638],
       [-0.56863743,  0.53329456,  0.38212597, -0.29313013,  0.5511124 ,
         0.22399694, -0.13377267, -0.24024266,  0.6475775 , -0.61608607],
       [ 0.51299465, -0.19775617, -0.0596118 ,  0.13451362,  0.5777488 ,
         0.02472413, -0.5219021 , -0.19751549, -0.62549543,  0.17085516]],
      dtype=float32)>

Calling net(x) earlier triggered net.build() automatically.

If the input shape you set manually in advance differs from the shape of the actual input, an error is raised.

A layer in which every node is connected to every node of the neighboring layer is "fully connected".

Multi-Layers

  • keras.Sequential([layer1, layer2, layer3]): a container; hand it a list of Dense layers, and a single forward call propagates the data through each layer in turn.
In [14]: x = tf.random.normal([2,4])
In [15]: model = keras.Sequential([])

In [16]: model = keras.Sequential([keras.layers.Dense(2,activation='relu'),
    ...:                           keras.layers.Dense(2,activation='relu'),
    ...:                           keras.layers.Dense(2)])

In [17]: model.build(input_shape=[None,4])

In [18]: model.summary()  # convenient way to inspect the network structure
Model: "sequential_1"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================

dense_1 (Dense) multiple 10
_________________________________________________________________
dense_2 (Dense) multiple 6
_________________________________________________________________
dense_3 (Dense) multiple 6
=================================================================

Total params: 22
Trainable params: 22
Non-trainable params: 0
_________________________________________________________________

In [19]: for p in model.trainable_variables:
    ...:     print(p.name,p.shape)
    ...:
dense_1/kernel:0 (4, 2)
dense_1/bias:0 (2,)
dense_2/kernel:0 (2, 2)
dense_2/bias:0 (2,)
dense_3/kernel:0 (2, 2)
dense_3/bias:0 (2,)
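
These trainable_variables are exactly what gets differentiated during training. A minimal sketch of one gradient step (my own addition, using the standard TF 2.x GradientTape API; the loss here is just a placeholder):

optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)
with tf.GradientTape() as tape:
    out = model(x)                          # forward pass through all three Dense layers
    loss = tf.reduce_mean(tf.square(out))   # placeholder loss
grads = tape.gradient(loss, model.trainable_variables)
optimizer.apply_gradients(zip(grads, model.trainable_variables))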

Loss Computation

MSE

  • the standard regression loss

  • can be computed from the L2 norm: the squared norm divided by the number of elements
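
Written out, the three computations below all evaluate the same quantity:

$$\text{MSE} = \frac{1}{N}\sum_{i=1}^{N}(y_i - \text{out}_i)^2 = \frac{\lVert y - \text{out} \rVert_2^2}{N}$$

where N = 5 × 4 here (number of samples times number of classes).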

In [22]: y = tf.constant([1,2,3,0,2])
In [23]: y = tf.one_hot(y,depth=4)
In [24]: y = tf.cast(y,dtype=tf.float32)

In [25]: out = tf.random.normal([5,4])

In [26]: loss1 = tf.reduce_mean(tf.square(y-out))
In [27]: loss2 = tf.square(tf.norm(y-out))/(5*4)
In [28]: loss3 = tf.reduce_mean(tf.losses.MSE(y,out))
# tf.losses.MSE(y, out) returns the per-instance MSE, shape [b]
In [29]: loss1,loss2,loss3
Out[29]:
(<tf.Tensor: shape=(), dtype=float32, numpy=0.5689168>,
 <tf.Tensor: shape=(), dtype=float32, numpy=0.5689168>,
 <tf.Tensor: shape=(), dtype=float32, numpy=0.5689168>)

Cross Entropy Loss

Entropy

A concept from information theory:

  • a measure of the uncertainty of information
  • measure of surprise
  • lower entropy -> more certainty
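
Concretely, $H(p) = -\sum_i p_i \log p_i$. A quick sketch (my own illustration) in TensorFlow:

p = tf.constant([0.25, 0.25, 0.25, 0.25])
h = -tf.reduce_sum(p * tf.math.log(p) / tf.math.log(2.))  # log base 2 -> bits
# h == 2.0: a uniform distribution over 4 outcomes is maximally uncertain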

Cross Entropy

Cross entropy is defined between two distributions p and q, and decomposes into the entropy of p plus the KL divergence between p and q (a measure of the distance between p and q; it is 0 when p = q).

When training with cross entropy, we drive the divergence between p and q, that is, between y and out, toward 0 -- exactly the state we want.
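
Written out: $H(p, q) = -\sum_i p_i \log q_i = H(p) + D_{KL}(p \parallel q)$. For one-hot labels $H(p) = 0$, so minimizing the cross entropy between y and out is exactly minimizing their KL divergence, as the examples below show.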

In [30]: tf.losses.categorical_crossentropy([0,1,0,0],[0.25,0.25,0.25,0.25])
Out[30]: <tf.Tensor: shape=(), dtype=float32, numpy=1.3862944>

In [31]: tf.losses.categorical_crossentropy([0,1,0,0],[0.1,0.8,0.05,0.05])
Out[31]: <tf.Tensor: shape=(), dtype=float32, numpy=0.22314353>

In [32]: tf.losses.categorical_crossentropy([0,1,0,0],[0.01,0.97,0.01,0.01])
Out[32]: <tf.Tensor: shape=(), dtype=float32, numpy=0.030459179>
  • Compared with MSE: sigmoid + MSE can cause gradients to vanish
  • Cross entropy converges relatively quickly when the prediction is badly wrong
  • It still depends on the problem; meta-learning, for example, tends to be more stable with MSE

"Logits" refers to the output of the last layer with no activation applied. Running softmax and then cross entropy as separate steps is numerically unstable, so the two are merged into a single function that handles the optimization internally.

In [34]: tf.losses.categorical_crossentropy([0,1,0,0],logits,from_logits=True)  # logits: raw output of the last layer

from_logits=True must be set, and what you pass in must be the raw logits, not values that have already been through softmax.
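
Putting it together, a minimal sketch (my own; it assumes model ends in a plain Dense layer with no activation and that y is one-hot):

logits = model(x)   # raw scores; no softmax in the last layer
loss = tf.losses.categorical_crossentropy(y, logits, from_logits=True)
# Avoid: probs = tf.nn.softmax(logits) followed by from_logits=False --
# doing softmax and cross entropy separately is numerically unstable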
