Study Notes TF034: Implementing Word2Vec


Trends in convolutional neural networks. The Perceptron, proposed by Frank Rosenblatt in 1957, is the ancestor of neural networks. The Neocognitron, a multi-layer neural network proposed by the Japanese scientist Kunihiko Fukushima in the 1980s, achieved a degree of visual pattern recognition and inspired convolutional neural networks. LeNet-5, proposed in 1998 by Yann LeCun, the father of CNNs, was the first multi-layer cascaded convolutional structure and recognized handwritten digits effectively. In 2012, Hinton's student Alex Krizhevsky won the ILSVRC 2012 competition with an 8-layer convolutional neural network. AlexNet successfully applied the ReLU activation function, Dropout, overlapping max pooling, LRN layers, and GPU acceleration, inspiring later innovations; convolutional neural network research entered the fast lane.

After AlexNet, convolutional neural networks evolved along two lines: one improving and adjusting network structure, the other increasing network depth.

Perceptron (1957)
Neocognitron (198x)
NIN (2013)            VGG (2014)
Inception V1 (2014)   MSRANet (2014)
Inception V2 (2015)   ResNet (2015)
Inception V3 (2015)   ResNet V2 (2015)
Inception ResNet V2 (2016)

In 2013, Professor Yan Shuicheng's group first published Network in Network, optimizing convolutional network structure and popularizing the 1×1 convolution. In 2014, Google's Inception Net V1 introduced the Inception Module, repeatedly stacking an efficient convolutional structure, and won ILSVRC 2014. In early 2015, Inception V2 introduced Batch Normalization, speeding up training and improving network performance. In late 2015, Inception V3 applied the idea of Factorization into Small Convolutions, decomposing large convolutions into several small or one-dimensional convolutions.

In 2014, the ILSVRC runner-up VGGNet used 3×3 convolutions throughout, a 19-layer network; MSRA-Net (Microsoft) was also a deep network. In 2015, Microsoft's ResNet, a 152-layer network, won ILSVRC 2015 with a top-5 error rate of 3.57%. ResNet V2 uses Batch Normalization, removes the extra activation layers, and applies Identity Mapping or Preactivation, improving network performance. Inception ResNet V2 fuses the Inception Net structure with ResNet's residual learning modules for training very deep networks.

GPU computing resources and open-source tools drove this progress.

循环神经互联网(奥德赛NN),NLP(Nature Language
Processing,自然语言处理)最常用神经互连网结构。Word2Vec,语言字词转化稠密向量(Dense
Vector)。

Word2Vec, also known as Word Embeddings (word vectors or word embeddings), is a model that expresses language words as vectors (Vector Representations). Images are dense matrices of pixel values; audio is spectral data of sound signals.

One-Hot Encoder turns words into discrete, isolated symbols. One word corresponds to one vector; an entire article corresponds to a sparse matrix. In the Bag of Words text-classification model, the sparse matrix is merged into a single vector, with each word's component counting its occurrences, used as a feature. The feature encoding is arbitrary: it carries no association information and no relationships between words. Sparse vectors need more data to train; training is inefficient and computation cumbersome (see the toy sketch below).
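
To make the sparsity concrete, here is a minimal sketch (a made-up toy vocabulary, not part of the original code) of one-hot vectors and their bag-of-words sum:

    import numpy as np

    vocab = ['the', 'cat', 'sat', 'on', 'mat']        # toy vocabulary
    word_to_id = {w: i for i, w in enumerate(vocab)}

    def one_hot(word):
        # One word -> one sparse vector with a single 1.
        v = np.zeros(len(vocab), dtype=np.int32)
        v[word_to_id[word]] = 1
        return v

    def bag_of_words(sentence):
        # Merge the one-hot rows into a single count vector (word counts as features).
        v = np.zeros(len(vocab), dtype=np.int32)
        for w in sentence.split():
            v[word_to_id[w]] += 1
        return v

    print(one_hot('cat'))                          # [0 1 0 0 0]
    print(bag_of_words('the cat sat on the mat'))  # [2 1 1 1 1]

With a real 50,000-word vocabulary these vectors are almost entirely zeros, which is why the text calls them sparse and inefficient.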

Vector Representations and Vector Space Models turn words into continuous-valued vectors, mapping words with similar meanings to nearby positions in the vector space. Vector space models in NLP rely on the Distributional Hypothesis: words appearing in the same contexts have similar meanings. They fall into two classes: count-based models (e.g., Latent Semantic Analysis) and predictive models (Neural Probabilistic Language Models). Count-based models count the frequency of neighboring words in a corpus and convert the counts into small dense matrices (sketched below); predictive models infer a word and its space vector from the words surrounding it.
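
A count-based model can be illustrated in a few lines (a toy sketch with a hypothetical corpus, not from the original text): build a word-by-word co-occurrence matrix, which methods like Latent Semantic Analysis then factor into small dense matrices.

    import numpy as np

    corpus = 'the cat sat on the mat the dog sat on the rug'.split()
    vocab = sorted(set(corpus))
    idx = {w: i for i, w in enumerate(vocab)}

    # Count neighbors within a window of 1 on each side.
    cooc = np.zeros((len(vocab), len(vocab)), dtype=np.int32)
    for i, w in enumerate(corpus):
        for j in (i - 1, i + 1):
            if 0 <= j < len(corpus):
                cooc[idx[w], idx[corpus[j]]] += 1

    # LSA-style step: truncated SVD turns counts into small dense vectors.
    U, S, Vt = np.linalg.svd(cooc.astype(np.float64))
    dense = U[:, :2] * S[:2]        # 2-dimensional word vectors
    print(dict(zip(vocab, np.round(dense, 2))))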

Word2Vec is a computationally efficient predictive model that learns word vectors from raw corpora. The CBOW (Continuous Bag of Words) mode infers the target word from the original sentence and suits smaller datasets; Skip-Gram infers the original sentence (context) from the target word and suits large corpora. Words with similar meanings end up close together in the vector space.

Predictive models (Neural Probabilistic Language Models) are usually trained by maximum likelihood: given the preceding context h, maximize the probability of the target word Wt. The computation is heavy, since the probability of every word in the vocabulary must be evaluated. Word2Vec's CBOW model instead trains only a binary classifier that distinguishes the real target word from fabricated words (noise). Estimating with a small number of noise words resembles Monte Carlo simulation.
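
The following numpy sketch (illustrative only; the function and variable names are assumptions, and the real implementation used later is tf.nn.nce_loss) shows the shape of this binary objective: logistic loss on one true target word against k sampled noise words.

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def noise_contrastive_loss(h, w_true, W_noise):
        # h: context vector; w_true: true target word vector;
        # W_noise: k sampled noise word vectors, shape (k, dim).
        pos = np.log(sigmoid(h @ w_true))              # true word should get label 1
        neg = np.sum(np.log(sigmoid(-(W_noise @ h))))  # noise words should get label 0
        return -(pos + neg)  # minimized when the true word scores high, noise low

    rng = np.random.default_rng(0)
    dim, k = 128, 64
    loss = noise_contrastive_loss(rng.normal(size=dim),
                                  rng.normal(size=dim),
                                  rng.normal(size=(k, dim)))
    print(loss)

Only k + 1 dot products are needed per example, instead of one score per vocabulary word as in a full softmax.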

The model should assign high probability to the real target word and low probability to other, noise words; training optimizes this learning objective. Training against fabricated noise words is Negative Sampling. Computing the loss function is very efficient, requiring only the k randomly chosen words, so training is fast. TensorFlow provides Noise-Contrastive Estimation (NCE) Loss as tf.nn.nce_loss.

Word2Vec Skip-Gram mode. Construct a mapping between contexts and target words. The context includes the words to the left and right of a word; here the sliding-window size is 1. The Skip-Gram model predicts the context from the target word. Random words are drawn as negative samples (noise). The predicted probability distribution should be as high as possible on positive samples and as low as possible on randomly generated negative samples. An optimization algorithm (SGD) updates the model's Word Embedding parameters to make the probability-distribution loss (NCE Loss) as small as possible. Each word's embedded vector is adjusted as training proceeds until it best fits the corpus. When the loss is minimal, the vectors fit the corpus best and the probability of predicting the correct word is highest. The toy example below shows the resulting (target, context) pairs.
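
As a toy illustration (the sentence is hypothetical, not drawn from the dataset), with skip_window = 1 Skip-Gram derives (target, context) pairs like these:

    sentence = 'the quick brown fox jumped over the lazy dog'.split()
    skip_window = 1
    pairs = []
    for i, target in enumerate(sentence):
        lo, hi = max(0, i - skip_window), min(len(sentence), i + skip_window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((target, sentence[j]))
    print(pairs[:4])
    # [('the', 'quick'), ('quick', 'the'), ('quick', 'brown'), ('brown', 'quick')]

The generate_batch function below produces exactly these kinds of pairs, batched and encoded as word IDs.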

Load the dependency libraries.

Define a function to download the text data: urllib.request.urlretrieve downloads the compressed data file, and the file size is verified. If already downloaded, skip.

Decompress the downloaded file and convert the data into a list of words with tf.compat.as_str. The data becomes a list of 17,005,207 words.

Build the vocabulary. collections.Counter counts the frequency of each word in the word list, and the most_common method takes the top 50,000 words as the vocabulary. Create a dict and put the top-50,000 vocabulary into dictionary for fast lookup (Python dict lookup is O(1), so performance is good). Convert all words to IDs (numbered by frequency rank). Words outside the top 50,000 are classified as Unknown (UNK), numbered 0, and counted. Traverse the word list; for each word, check whether it appears in dictionary: if so, convert it to its ID, otherwise to 0. Return the converted codes (data), the word-frequency counts (count), the vocabulary (dictionary), and its inverse mapping (reverse_dictionary).

Delete the original word list to save memory. Print the highest-frequency words in the vocabulary and their counts (including UNK): the 'UNK' class appears 418,391 times, 'the' 1,061,396 times, 'of' 593,677 times. The first 10 words of data are ['anarchism', 'originated', 'as', 'a', 'term', 'of', 'abuse', 'first', 'used', 'against'], with IDs [5235, 3084, 12, 6, 195, 2, 3137, 46, 59, 156].

Generate Word2Vec training samples in Skip-Gram mode (inferring context from the target word). Define the function generate_batch to produce a batch of training data. Parameter batch_size is the batch size. skip_window is the farthest distance at which words can be associated; setting it to 1 means samples are generated only with the two immediately adjacent words. num_skips is the number of samples generated per word; it cannot exceed twice skip_window, and batch_size must be an integer multiple of it, ensuring each batch contains all samples for a given word.

The word index data_index is a global variable, because generate_batch is called repeatedly and data_index must be modifiable inside generate_batch. asserts ensure num_skips and batch_size satisfy the conditions above. np.ndarray initializes batch and labels as arrays. Define span, the number of words used to create samples for one target word, including the target and the words before and after it: span = 2 * skip_window + 1. Create a deque (double-ended queue) with maximum capacity span; the deque's append method keeps only the last span variables inserted.

Starting at index data_index, read span words in order into buffer as initial values. Since buffer is a deque of capacity span, it is now full, and subsequent data replaces the earlier data.

The first loop (batch_size // num_skips iterations) generates the samples for each target word. buffer holds the target word and all its related words. Define target = skip_window: the skip_window-th variable in buffer is the target word. Define the list of words to avoid when generating samples, targets_to_avoid, which initially includes the skip_window-th word (the target), because we predict context words, not the target word itself.

The second loop (num_skips iterations) generates one sample per context word. First draw random numbers until one is not in targets_to_avoid, indicating a usable context word, then generate the sample: the feature is the target word buffer[skip_window] and the label is buffer[target]. Once a context word has been used, add it to targets_to_avoid to filter it out. When all num_skips samples for the target word have been generated, read in the next word (dropping the first word in buffer): the sliding window moves back one position, the target word advances by one, the context words all shift, and generation begins for the next target word's training samples.

When both loops finish, we have batch_size training samples. Return batch and labels.

Call generate_batch to test: with batch_size set to 8, num_skips to 2 and skip_window to 1, run generate_batch to get batch and labels, and print them.

Define the training batch_size as 128 and embedding_size as 128. embedding_size is the dimension of the dense vector a word is converted to, usually 50–1000. skip_window, the farthest association distance between words, is set to 1; num_skips, the number of samples extracted per target word, is set to 2. Generate the validation data valid_examples: randomly draw some of the highest-frequency words and check whether their nearest neighbors in vector space are strongly related. valid_size = 16 is the number of validation words drawn; valid_window = 100 means the validation words are drawn from the 100 most frequent words; np.random.choice draws them at random. num_sampled is the number of negative (noise) words sampled during training.

Define the Skip-Gram Word2Vec model's network structure. Create a tf.Graph and set it as the default graph. Create placeholders for the training inputs and labels, and convert the randomly generated valid_examples into a TensorFlow constant. with tf.device('/cpu:0') restricts all computation to the CPU. tf.random_uniform randomly initializes the word vectors embeddings for all words, with vocabulary size 50,000 and vector dimension 128; tf.nn.embedding_lookup looks up the vectors embed corresponding to the input train_inputs. tf.truncated_normal initializes the weights nce_weights of the training objective, NCE Loss, and nce_biases is initialized to 0. tf.nn.nce_loss computes the loss of learning the word embeddings on this batch of training data, and tf.reduce_mean aggregates it.

Define the optimizer: SGD with learning rate 1.0. Compute the L2 norm of the embedding vectors, norm, and divide embeddings by the L2 norm to obtain the normalized normalized_embeddings. tf.nn.embedding_lookup looks up the validation words' embedding vectors and computes the similarity of each validation embedding with all words in the vocabulary. tf.global_variables_initializer initializes all model parameters.

Define a maximum of 100,000 iterations, create and set the default session, and run parameter initialization. In each iteration, generate_batch produces a batch of inputs and labels used to build feed_dict. session.run() executes the optimizer op (parameter update) and the loss computation, and the training loss is accumulated into average_loss.

Every 2,000 iterations, compute and display the average loss.

Every 10,000 iterations, compute the similarity between the validation words and all words, and display the 8 words most similar to each validation word.

The trained model identifies similar words for nouns, verbs and adjectives quite accurately. Skip-Gram Word2Vec vector representations (Vector Representations) are of very high quality; near-synonyms land very close together in the vector space.

Define the Word2Vec visualization function. low_dim_embs holds the word vectors reduced to 2 dimensions; plot each word's position on a chart. plt.scatter (matplotlib.pyplot) draws the scatter plot (word positions), plt.annotate displays the word itself, and plt.savefig saves the figure to a local file.

sklearn.manifold.TSNE performs the dimensionality reduction, reducing the original 128-dimensional embedding vectors to 2, and the plot_with_labels function displays them. Only the 200 highest-frequency words are visualized (plot_only = 200 in the code).

Words that lie close together have high semantic similarity: single letters cluster in one region (upper left), articles in another. For evaluating Word2Vec quality, besides visual inspection, Analogical Reasoning directly tests semantic and contextual relations by answering fill-in-the-blank analogy questions (see the sketch below). With a large corpus, tune the parameters to the best-fitting values.
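
As a minimal sketch of the Analogical Reasoning idea (assuming the final_embeddings, dictionary and reverse_dictionary produced by the full listing below; the example words are hypothetical and must lie inside the top-50,000 vocabulary), an analogy "a is to b as c is to ?" can be answered by vector arithmetic over the L2-normalized embeddings:

    import numpy as np

    def analogy(a, b, c, final_embeddings, dictionary, reverse_dictionary, top_k=4):
        va, vb, vc = (final_embeddings[dictionary[w]] for w in (a, b, c))
        query = vb - va + vc                 # e.g. king - man + woman -> ~queen
        query /= np.linalg.norm(query)
        sims = final_embeddings @ query      # cosine similarity: rows are unit-norm
        nearest = (-sims).argsort()[:top_k]
        return [reverse_dictionary[i] for i in nearest
                if reverse_dictionary[i] not in (a, b, c)]

    # Usage after training, e.g.:
    # analogy('man', 'king', 'woman', final_embeddings, dictionary, reverse_dictionary)

A small model trained on text8 will answer only some analogies correctly; larger corpora and tuned parameters improve the results.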

 

     import collections
     import math
     import os
     import random
     import zipfile
     import numpy as np
     import urllib.request
     import tensorflow as tf
     # Step 1: Download the data.
     url = 'http://mattmahoney.net/dc/'
     def maybe_download(filename, expected_bytes):
       if not os.path.exists(filename):
         filename, _ = urllib.request.urlretrieve(url + filename, filename)
       statinfo = os.stat(filename)
       if statinfo.st_size == expected_bytes:
         print('Found and verified', filename)
       else:
         print(statinfo.st_size)
         raise Exception(
        'Failed to verify ' + filename + '. Can you get to it with a browser?')
       return filename
     filename = maybe_download('text8.zip', 31344016)
     # Read the data into a list of strings.
     def read_data(filename):
       with zipfile.ZipFile(filename) as f:
         data = tf.compat.as_str(f.read(f.namelist()[0])).split()
       return data
     words = read_data(filename)
     print('Data size', len(words))
     # Step 2: Build the dictionary and replace rare words with UNK token.
     vocabulary_size = 50000
     def build_dataset(words):
       count = [['UNK', -1]]
       count.extend(collections.Counter(words).most_common(vocabulary_size - 1))
       dictionary = dict()
       for word, _ in count:
         dictionary[word] = len(dictionary)
       data = list()
       unk_count = 0
       for word in words:
         if word in dictionary:
           index = dictionary[word]
         else:
           index = 0  # dictionary['UNK']
           unk_count += 1
         data.append(index)
       count[0][1] = unk_count
       reverse_dictionary = dict(zip(dictionary.values(), dictionary.keys()))
       return data, count, dictionary, reverse_dictionary
     data, count, dictionary, reverse_dictionary = build_dataset(words)
     del words  # Hint to reduce memory.
     print('Most common words (+UNK)', count[:5])
     print('Sample data', data[:10], [reverse_dictionary[i] for i in data[:10]])
     data_index = 0
     # Step 3: Function to generate a training batch for the skip-gram model.
     def generate_batch(batch_size, num_skips, skip_window):
       global data_index
       assert batch_size % num_skips == 0
       assert num_skips <= 2 * skip_window
       batch = np.ndarray(shape=(batch_size), dtype=np.int32)
       labels = np.ndarray(shape=(batch_size, 1), dtype=np.int32)
       span = 2 * skip_window + 1 # [ skip_window target skip_window ]
       buffer = collections.deque(maxlen=span)
       for _ in range(span):
         buffer.append(data[data_index])
         data_index = (data_index + 1) % len(data)
       for i in range(batch_size // num_skips):
         target = skip_window  # target label at the center of the buffer
         targets_to_avoid = [ skip_window ]
         for j in range(num_skips):
           while target in targets_to_avoid:
             target = random.randint(0, span - 1)
            targets_to_avoid.append(target)
            batch[i * num_skips + j] = buffer[skip_window]
            labels[i * num_skips + j, 0] = buffer[target]
          buffer.append(data[data_index])
          data_index = (data_index + 1) % len(data)
       return batch, labels
     batch, labels = generate_batch(batch_size=8, num_skips=2, skip_window=1)
     for i in range(8):
       print(batch[i], reverse_dictionary[batch[i]],
           '->', labels[i, 0], reverse_dictionary[labels[i, 0]])
     # Step 4: Build and train a skip-gram model.
     batch_size = 128
     embedding_size = 128  # Dimension of the embedding vector.
     skip_window = 1       # How many words to consider left and right.
     num_skips = 2         # How many times to reuse an input to generate a label.
     valid_size = 16     # Random set of words to evaluate similarity on.
     valid_window = 100  # Only pick dev samples in the head of the distribution.
     valid_examples = np.random.choice(valid_window, valid_size, replace=False)
     num_sampled = 64    # Number of negative examples to sample.
     graph = tf.Graph()
     with graph.as_default():
       # Input data.
       train_inputs = tf.placeholder(tf.int32, shape=[batch_size])
       train_labels = tf.placeholder(tf.int32, shape=[batch_size, 1])
       valid_dataset = tf.constant(valid_examples, dtype=tf.int32)
       # Ops and variables pinned to the CPU because of missing GPU implementation
        with tf.device('/cpu:0'):
          # Look up embeddings for inputs.
          embeddings = tf.Variable(
              tf.random_uniform([vocabulary_size, embedding_size], -1.0, 1.0))
          embed = tf.nn.embedding_lookup(embeddings, train_inputs)
          # Construct the variables for the NCE loss
          nce_weights = tf.Variable(
              tf.truncated_normal([vocabulary_size, embedding_size],
                                  stddev=1.0 / math.sqrt(embedding_size)))
          nce_biases = tf.Variable(tf.zeros([vocabulary_size]))
       loss = tf.reduce_mean(
           tf.nn.nce_loss(weights=nce_weights,
                          biases=nce_biases,
                          labels=train_labels,
                          inputs=embed,
                          num_sampled=num_sampled,
                          num_classes=vocabulary_size))
       # Construct the SGD optimizer using a learning rate of 1.0.
       optimizer = tf.train.GradientDescentOptimizer(1.0).minimize(loss)
       # Compute the cosine similarity between minibatch examples and all embeddings.
       norm = tf.sqrt(tf.reduce_sum(tf.square(embeddings), 1, keep_dims=True))
       normalized_embeddings = embeddings / norm
        valid_embeddings = tf.nn.embedding_lookup(
            normalized_embeddings, valid_dataset)
        similarity = tf.matmul(
            valid_embeddings, normalized_embeddings, transpose_b=True)
       # Add variable initializer.
       init = tf.global_variables_initializer()
     # Step 5: Begin training.
     num_steps = 100001
     with tf.Session(graph=graph) as session:
       init.run()
       print("Initialized")
       average_loss = 0
       for step in range(num_steps):
         batch_inputs, batch_labels = generate_batch(
             batch_size, num_skips, skip_window)
         feed_dict = {train_inputs : batch_inputs, train_labels : batch_labels}
         _, loss_val = session.run([optimizer, loss], feed_dict=feed_dict)
         average_loss += loss_val
         if step % 2000 == 0:
           if step > 0:
             average_loss /= 2000
           # The average loss is an estimate of the loss over the last 2000 batches.
           print("Average loss at step ", step, ": ", average_loss)
           average_loss = 0
         # Note that this is expensive (~20% slowdown if computed every 500 steps)
         if step % 10000 == 0:
           sim = similarity.eval()
           for i in range(valid_size):
             valid_word = reverse_dictionary[valid_examples[i]]
             top_k = 8 # number of nearest neighbors
             nearest = (-sim[i, :]).argsort()[1:top_k+1]
             log_str = "Nearest to %s:" % valid_word
             for k in range(top_k):
               close_word = reverse_dictionary[nearest[k]]
               log_str = "%s %s," % (log_str, close_word)
             print(log_str)
       final_embeddings = normalized_embeddings.eval()
     # Step 6: Visualize the embeddings.
     def plot_with_labels(low_dim_embs, labels, filename='tsne.png'):
       assert low_dim_embs.shape[0] >= len(labels), "More labels than embeddings"
       plt.figure(figsize=(18, 18))  #in inches
       for i, label in enumerate(labels):
         x, y = low_dim_embs[i,:]
         plt.scatter(x, y)
         plt.annotate(label,
                      xy=(x, y),
                      xytext=(5, 2),
                      textcoords='offset points',
                      ha='right',
                      va='bottom')
       plt.savefig(filename)
       #%%
     try:
       from sklearn.manifold import TSNE
       import matplotlib.pyplot as plt
       tsne = TSNE(perplexity=30, n_components=2, init='pca', n_iter=5000)
       plot_only = 200
       low_dim_embs = tsne.fit_transform(final_embeddings[:plot_only,:])
       labels = [reverse_dictionary[i] for i in range(plot_only)]
       plot_with_labels(low_dim_embs, labels)
     except ImportError:
       print("Please install sklearn, matplotlib, and scipy to visualize embeddings.")

 

 

References:
《TensorFlow实战》

Paid consulting welcome (150 yuan per hour). My WeChat: qingxingfengzi



