Neural Networks (NN) and Fully Connected Layers (FCL)

Loading Small Classic Datasets

Common datasets in keras.datasets

keras.datasets downloads datasets from Google-hosted sources, which may require a proxy in some regions. A dataset that has been downloaded once is cached locally, so it will not be downloaded a second time. (Readers who need help with access can reach me by email via the About me page.)

  • boston housing: Boston house-price regression
  • mnist / fashion_mnist: handwritten digits (mnist) and clothing images in the same format (fashion_mnist)
  • cifar10/100: small-image classification; cifar100 is the finer-grained, 100-class counterpart of cifar10
  • imdb: sentiment classification of movie reviews

MNIST

70k [28×28] images in total; 60k are used for training and 10k for testing.

In [2]: import tensorflow as tf
   ...: from tensorflow import keras
In [4]: (x,y),(x_test,y_test) = keras.datasets.mnist.load_data()  # returns two tuples of numpy arrays
In [7]: x.shape,y.shape
Out[7]: ((60000, 28, 28), (60000,))
In [9]: x.min(),x.max(),x.mean()  # numpy's min/max; pixel values are in [0, 255]
Out[9]: (0, 255, 33.318421449829934)
In [10]: x_test.shape,y_test.shape
Out[10]: ((10000, 28, 28), (10000,))

In [13]: y_onehot = tf.one_hot(y, depth=10)  # convert the labels to one-hot encoding

In [15]: y[:2],y_onehot[0:2]
Out[15]:
(array([5, 0], dtype=uint8),
 <tf.Tensor: shape=(2, 10), dtype=float32, numpy=
 array([[0., 0., 0., 0., 0., 1., 0., 0., 0., 0.],
        [1., 0., 0., 0., 0., 0., 0., 0., 0., 0.]], dtype=float32)>)
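
Going the other way (my own addition, not part of the original session): tf.argmax recovers the integer labels from the one-hot rows.

labels = tf.argmax(y_onehot, axis=1)  # -> [5, 0, ...], dtype int64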

CIFAR10/100

60k [32×32×3] images in total; 50k are used for training and 10k for testing.

In [16]: (x,y),(x_test,y_test) = keras.datasets.cifar10.load_data()
In [17]: x.shape,y.shape,x_test.shape,y_test.shape
Out[17]: ((50000, 32, 32, 3), (50000, 1), (10000, 32, 32, 3), (10000, 1))

tf.data.Dataset

We need the chain numpy -> tensor -> iterator; tf.data.Dataset is the class built specifically for iterating over datasets.

from_tensor_slices: convert arrays directly into a Dataset object

In [19]: (x,y),(x_test,y_test) = keras.datasets.cifar10.load_data()

In [20]: db = tf.data.Dataset.from_tensor_slices(x)
In [21]: next(iter(db)).shape  # iter creates an iterator over db, then each next call advances it by one element
Out[21]: TensorShape([32, 32, 3])

In [28]: db = tf.data.Dataset.from_tensor_slices((x,y))
In [29]: next(iter(db))[0].shape
Out[29]: TensorShape([32, 32, 3])
In [30]: next(iter(db))[1].shape
Out[30]: TensorShape([1])

.shuffle: random shuffling

In [31]: db = tf.data.Dataset.from_tensor_slices((x_test,y_test))  # x and y are shuffled together, so pairs stay aligned
In [32]: db = db.shuffle(10000)  # buffer size -- can be set fairly large; shuffles within a window of 10000 elements
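
To see what the buffer size means, here is a toy sketch (my own illustration, assuming a recent TF 2.x): a buffer at least as large as the dataset gives a full shuffle, while a smaller buffer only shuffles locally.

toy = tf.data.Dataset.range(10)
print(list(toy.shuffle(10).as_numpy_iterator()))  # full shuffle, e.g. [3, 7, 0, ...]
print(list(toy.shuffle(2).as_numpy_iterator()))   # only locally shuffled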

.map: data preprocessing

map(f) applies the function f to every element of db.

In [33]: def preprocess(x,y):
    ...:     x = tf.cast(x, dtype=tf.float32) / 255.
    ...:     y = tf.cast(y, dtype=tf.int32)
    ...:     y = tf.one_hot(y, depth=10)
    ...:     return x, y
    ...:

In [34]: db2 = db.map(preprocess)

In [35]: res = next(iter(db2))

In [36]: res[0].shape,res[1].shape
Out[36]: (TensorShape([32, 32, 3]), TensorShape([1, 10]))

In [49]: res[1]
Out[49]: <tf.Tensor: shape=(1, 10), dtype=float32, numpy=array([[1., 0., 0., 0., 0., 0., 0., 0., 0., 0.]], dtype=float32)>

.batch

When reading from a dataset we usually want multiple samples at a time, i.e. a batch.

In [50]: db3 = db2.batch(32)

In [51]: res = next(iter(db3))

In [52]: res[0].shape,res[1].shape
Out[52]: (TensorShape([32, 32, 32, 3]), TensorShape([32, 1, 10]))
# we don't need the 1 in [32, 1, 10], so use tf.squeeze in the preprocessing step to remove it

In [61]: def preprocess(x,y):
    ...:     x = tf.cast(x, dtype=tf.float32) / 255.
    ...:     y = tf.cast(y, dtype=tf.int32)
    ...:     y = tf.one_hot(y, depth=10)
    ...:     y = tf.squeeze(y)
    ...:     return x, y
In [62]: db2 = db.map(preprocess)
In [63]: db3 = db2.batch(32)
In [64]: res = next(iter(db3))
In [65]: res[0].shape,res[1].shape
Out[65]: (TensorShape([32, 32, 32, 3]), TensorShape([32, 10]))
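
One more detail worth knowing (my addition; drop_remainder is a standard argument of tf.data's batch): when the dataset size is not divisible by the batch size, the last batch comes out smaller. Passing drop_remainder=True discards it so every batch has a fixed shape.

db3 = db2.batch(32, drop_remainder=True)  # every batch is exactly 32 samples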

StopIteration

Iterating with

for x,y in db:
    # calls next internally

works fine.

But if you drive the iterator manually:

In [54]: while True:
    ...:     next(db_iter)

it raises a StopIteration error.

That is, once the loop has consumed all 50k samples, the next call raises the error. If you need to iterate over the data several times, use repeat.
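
If you do want manual iteration, a minimal sketch (my own addition, not from the original session) is to catch the exception explicitly:

db_iter = iter(db3)
while True:
    try:
        x, y = next(db_iter)   # one batch per call
    except StopIteration:
        break                  # raised once the dataset is exhausted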

.repeat

In [66]: db4 = db3.repeat(10)

When db4 is iterated with a for loop, it passes over the 50k samples 10 times.
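
As a side note (my addition; this is standard tf.data behavior): repeat() with no argument repeats indefinitely, which is handy when the number of training steps is controlled inside the loop instead.

db5 = db3.repeat()              # repeats forever
for step, (x, y) in enumerate(db5):
    if step >= 1000:            # the stop condition lives in the loop
        break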

Complete pipeline

import tensorflow as tf
from tensorflow import keras

def prepare_mnist_features_and_labels(x, y):
    x = tf.cast(x, tf.float32) / 255.    # scale pixels to [0, 1]
    y = tf.cast(y, tf.int64)
    return x, y

def mnist_datasets():
    (x, y), (x_val, y_val) = keras.datasets.fashion_mnist.load_data()
    y = tf.one_hot(y, depth=10)          # labels become one-hot before slicing
    y_val = tf.one_hot(y_val, depth=10)

    ds = tf.data.Dataset.from_tensor_slices((x, y))
    ds = ds.map(prepare_mnist_features_and_labels)
    ds = ds.shuffle(60000).batch(100)
    ds_val = tf.data.Dataset.from_tensor_slices((x_val, y_val))
    ds_val = ds_val.map(prepare_mnist_features_and_labels)
    ds_val = ds_val.shuffle(60000).batch(100)
    return ds, ds_val
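
A minimal sketch of consuming these datasets (my own illustration; the training step itself is a placeholder):

ds, ds_val = mnist_datasets()
for epoch in range(2):
    for step, (x, y) in enumerate(ds):
        # x: [100, 28, 28] float32 in [0, 1]; y: [100, 10] one-hot
        pass  # forward pass, loss, and gradient update would go here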

Fully Connected Layer

Layers

  • Input
  • Hidden
  • Output
In [3]: x = tf.random.normal([4,784])

In [4]: net = tf.keras.layers.Dense(512)
In [5]: out = net(x)  # w and b are created automatically from the shape of the input x

In [6]: out.shape
Out[6]: TensorShape([4, 512])

In [7]: net.kernel.shape,net.bias.shape
Out[7]: (TensorShape([784, 512]), TensorShape([512]))
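
Under the hood, a Dense layer with no activation is just a matrix multiply plus a bias. A quick check (my own addition, reusing net, x, and out from above):

manual = x @ net.kernel + net.bias          # out = x W + b
print(tf.reduce_max(tf.abs(manual - out)))  # ~0.0
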
In [2]: net = tf.keras.layers.Dense(10)
In [3]: net.bias
AttributeError: 'Dense' object has no attribute 'bias'
# declaring the Dense layer does not yet create w and b
In [7]: net.get_weights()
Out[7]: []
In [8]: net.weights
Out[8]: []
# build() creates w and b explicitly
In [9]: net.build(input_shape=(None,4))
In [10]: net.kernel.shape,net.bias.shape
Out[10]: (TensorShape([4, 10]), TensorShape([10]))
# build can be called again, re-creating the kernel parameters
In [11]: net.build(input_shape=(2,4))
In [12]: net.kernel
Out[12]:
<tf.Variable 'kernel:0' shape=(4, 10) dtype=float32, numpy=
array([[ 0.61441875,  0.24404484,  0.46651304,  0.19085598, -0.05145264,
        -0.35335562, -0.10202849, -0.15380013,  0.01670462,  0.41096544],
       [-0.57477844,  0.335864  ,  0.02894145, -0.6324929 ,  0.3016789 ,
         0.38328493,  0.33733964, -0.5588818 ,  0.20204544, -0.15296638],
       [-0.56863743,  0.53329456,  0.38212597, -0.29313013,  0.5511124 ,
         0.22399694, -0.13377267, -0.24024266,  0.6475775 , -0.61608607],
       [ 0.51299465, -0.19775617, -0.0596118 ,  0.13451362,  0.5777488 ,
         0.02472413, -0.5219021 , -0.19751549, -0.62549543,  0.17085516]],
      dtype=float32)>

Calling net(x) earlier triggered net.build() automatically.

If the input shape you set manually in advance differs from the shape of the actual input, an error is raised.

A layer in which every node is connected to every node of the neighboring layer is "fully connected".

Multi-Layers

  • keras.Sequential([layer1, layer2, layer3]): a container; hand it a list of Dense layers, and a single forward call propagates the data through each layer in turn.
In [14]: x = tf.random.normal([2,4])
In [15]: model = keras.Sequential([])

In [16]: model = keras.Sequential([keras.layers.Dense(2,activation='relu'),
    ...:                           keras.layers.Dense(2,activation='relu'),
    ...:                           keras.layers.Dense(2)])

In [17]: model.build(input_shape=[None,4])

In [18]: model.summary()  # convenient way to inspect the network structure
Model: "sequential_1"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================

dense_1 (Dense) multiple 10
_________________________________________________________________
dense_2 (Dense) multiple 6
_________________________________________________________________
dense_3 (Dense) multiple 6
=================================================================

Total params: 22
Trainable params: 22
Non-trainable params: 0
_________________________________________________________________

In [19]: for p in model.trainable_variables:
    ...:     print(p.name,p.shape)
    ...:
dense_1/kernel:0 (4, 2)
dense_1/bias:0 (2,)
dense_2/kernel:0 (2, 2)
dense_2/bias:0 (2,)
dense_3/kernel:0 (2, 2)
dense_3/bias:0 (2,)
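
These trainable_variables are exactly what gets differentiated during training. A minimal sketch of one gradient step (my own addition, using the standard TF 2.x GradientTape API; the loss here is just a placeholder):

optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)
with tf.GradientTape() as tape:
    out = model(x)                          # forward pass through all three Dense layers
    loss = tf.reduce_mean(tf.square(out))   # placeholder loss
grads = tape.gradient(loss, model.trainable_variables)
optimizer.apply_gradients(zip(grads, model.trainable_variables))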

Loss Computation

MSE

  • the standard regression loss

  • can be computed from the L2 norm: the squared norm divided by the number of elements
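
Written out, the three computations below all evaluate the same quantity:

$$\text{MSE} = \frac{1}{N}\sum_{i=1}^{N}(y_i - \text{out}_i)^2 = \frac{\lVert y - \text{out} \rVert_2^2}{N}$$

where N = 5 × 4 here (number of samples times number of classes).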

In [22]: y = tf.constant([1,2,3,0,2])
In [23]: y = tf.one_hot(y,depth=4)
In [24]: y = tf.cast(y,dtype=tf.float32)

In [25]: out = tf.random.normal([5,4])

In [26]: loss1 = tf.reduce_mean(tf.square(y-out))
In [27]: loss2 = tf.square(tf.norm(y-out))/(5*4)
In [28]: loss3 = tf.reduce_mean(tf.losses.MSE(y,out))
# tf.losses.MSE(y, out) returns the per-instance MSE, shape [b]
In [29]: loss1,loss2,loss3
Out[29]:
(<tf.Tensor: shape=(), dtype=float32, numpy=0.5689168>,
 <tf.Tensor: shape=(), dtype=float32, numpy=0.5689168>,
 <tf.Tensor: shape=(), dtype=float32, numpy=0.5689168>)

Cross Entropy Loss

Entropy

A concept from information theory:

  • a measure of the uncertainty of information
  • measure of surprise
  • lower entropy -> more certainty
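
Concretely, $H(p) = -\sum_i p_i \log p_i$. A quick sketch (my own illustration) in TensorFlow:

p = tf.constant([0.25, 0.25, 0.25, 0.25])
h = -tf.reduce_sum(p * tf.math.log(p) / tf.math.log(2.))  # log base 2 -> bits
# h == 2.0: a uniform distribution over 4 outcomes is maximally uncertain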

Cross Entropy

Cross entropy is defined between two distributions p and q, and decomposes into the entropy of p plus the KL divergence between p and q (a measure of the distance between p and q; it is 0 when p = q).

When training with cross entropy, we drive the divergence between p and q, that is, between y and out, toward 0 -- exactly the state we want.
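
Written out: $H(p, q) = -\sum_i p_i \log q_i = H(p) + D_{KL}(p \parallel q)$. For one-hot labels $H(p) = 0$, so minimizing the cross entropy between y and out is exactly minimizing their KL divergence, as the examples below show.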

In [30]: tf.losses.categorical_crossentropy([0,1,0,0],[0.25,0.25,0.25,0.25])
Out[30]: <tf.Tensor: shape=(), dtype=float32, numpy=1.3862944>

In [31]: tf.losses.categorical_crossentropy([0,1,0,0],[0.1,0.8,0.05,0.05])
Out[31]: <tf.Tensor: shape=(), dtype=float32, numpy=0.22314353>

In [32]: tf.losses.categorical_crossentropy([0,1,0,0],[0.01,0.97,0.01,0.01])
Out[32]: <tf.Tensor: shape=(), dtype=float32, numpy=0.030459179>
  • Compared with MSE: sigmoid + MSE can cause gradients to vanish
  • Cross entropy converges relatively quickly when the prediction is badly wrong
  • It still depends on the problem; meta-learning, for example, tends to be more stable with MSE

"Logits" refers to the output of the last layer with no activation applied. Running softmax and then cross entropy as separate steps is numerically unstable, so the two are merged into a single function that handles the optimization internally.

In [34]: tf.losses.categorical_crossentropy([0,1,0,0],logits,from_logits=True)  # logits: raw output of the last layer

from_logits=True must be set, and what you pass in must be the raw logits, not values that have already been through softmax.
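
Putting it together, a minimal sketch (my own; it assumes model ends in a plain Dense layer with no activation and that y is one-hot):

logits = model(x)   # raw scores; no softmax in the last layer
loss = tf.losses.categorical_crossentropy(y, logits, from_logits=True)
# Avoid: probs = tf.nn.softmax(logits) followed by from_logits=False --
# doing softmax and cross entropy separately is numerically unstable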
