All posts have been migrated to my homepage: http://zhenlianghe.com
My GitHub: https://github.com/LynnHo
input_producer
input_producer(input_tensor, element_shape=None, num_epochs=None, shuffle=True, seed=None, capacity=32, shared_name=None, summary_name=None, name=None, cancel_op=None)
- The rough pipeline is: input_tensor (-> shuffle) -> FIFOQueue
- As you can see, the shuffle happens before elements enter the queue (dequeuing is not random, since this is a first-in, first-out queue)
- capacity is the capacity of the queue
- The return value is a queue object
- Note: if num_epochs is not None, this function creates local counter epochs. Use local_variables_initializer() to initialize local variables.
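A minimal runnable sketch of the flow above (TF 1.x queue API; the 2-D input tensor here is made up for illustration):

```python
import tensorflow as tf

# input_producer enqueues the rows of input_tensor into a FIFOQueue;
# with shuffle=False the rows come out in order, and they cycle forever
# because num_epochs is None.
input_tensor = tf.constant([[1, 2], [3, 4], [5, 6]])
queue = tf.train.input_producer(input_tensor, shuffle=False)
row = queue.dequeue()

sess = tf.Session()
coord = tf.train.Coordinator()
threads = tf.train.start_queue_runners(sess=sess, coord=coord)
print(sess.run(row))  # [out] [1 2]
print(sess.run(row))  # [out] [3 4]
coord.request_stop()
coord.join(threads)
```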
string_input_producer
- Simply a thin wrapper around input_producer
- Takes a string tensor as input and returns a queue of string tensors; for example:

```python
sess = tf.Session()
a = tf.train.string_input_producer(['a', 'b'], shuffle=False)
b = a.dequeue_many(4)
coord = tf.train.Coordinator()
threads = tf.train.start_queue_runners(sess=sess, coord=coord)  # note: call this first to start all the queues
print(sess.run(b))  # [out] ['a' 'b' 'a' 'b']
```

- Note: if num_epochs is not None, this function creates local counter epochs. Use local_variables_initializer() to initialize local variables.
range_input_producer(limit, ...)
- Again just a thin wrapper around input_producer
- Takes a limit and enqueues the integers in [0, limit) (in random order, if shuffle is on)
- Returns a queue object
- Note: if num_epochs is not None, this function creates local counter epochs. Use local_variables_initializer() to initialize local variables.
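A small sketch of the behavior, with shuffle=False so the order is visible (limit=5 is an arbitrary choice):

```python
import tensorflow as tf

# range_input_producer enqueues the integers [0, limit) into a queue;
# with the default shuffle=True the order would be random instead.
queue = tf.train.range_input_producer(limit=5, shuffle=False)
index = queue.dequeue()

sess = tf.Session()
coord = tf.train.Coordinator()
threads = tf.train.start_queue_runners(sess=sess, coord=coord)
print([sess.run(index) for _ in range(5)])  # [out] [0, 1, 2, 3, 4]
coord.request_stop()
coord.join(threads)
```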
slice_input_producer(tensor_list, ...)
- A wrapper around range_input_producer
- tensor_list is a list in which every element has shape (N, ...)
- Unlike the producers above, this function returns a list of tensors rather than a queue; in effect the dequeue has already happened, and only one element is dequeued at a time. The source is as follows:
```python
with ops.name_scope(name, "input_producer", tensor_list):
    tensor_list = ops.convert_n_to_tensor_or_indexed_slices(tensor_list)
    if not tensor_list:
        raise ValueError(
            "Expected at least one tensor in slice_input_producer().")
    range_size = array_ops.shape(tensor_list[0])[0]
    # TODO(josh11b): Add an assertion that the first dimension of
    # everything in TensorList matches. Maybe just check the inferred shapes?
    queue = range_input_producer(range_size, num_epochs=num_epochs,
                                 shuffle=shuffle, seed=seed, capacity=capacity,
                                 shared_name=shared_name)
    index = queue.dequeue()
    output = [array_ops.gather(t, index) for t in tensor_list]
    return output
```

- As the source above shows, the return value is a list -> [t[i] for t in tensor_list]
- Note: if num_epochs is not None, this function creates local counter epochs. Use local_variables_initializer() to initialize local variables.
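A usage sketch (the toy images/labels tensors are made up; note that what comes back is a pair of already-dequeued tensors, not a queue):

```python
import tensorflow as tf

# Every element of tensor_list shares the first dimension N = 3;
# each run of [image, label] yields one slice [images[i], labels[i]].
images = tf.constant([[1.0], [2.0], [3.0]])  # shape (3, 1)
labels = tf.constant([10, 20, 30])           # shape (3,)
image, label = tf.train.slice_input_producer([images, labels], shuffle=False)

sess = tf.Session()
coord = tf.train.Coordinator()
threads = tf.train.start_queue_runners(sess=sess, coord=coord)
for _ in range(3):
    print(sess.run([image, label]))  # [out] [array([1.], dtype=float32), 10] ...
coord.request_stop()
coord.join(threads)
```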
batch
batch(tensors, ...)
- The input tensors is a list or dictionary; think of tensors as one sample together with its different attributes, e.g. tensors = {'img': img, 'label': label}
- Maintains a FIFOQueue; tensors are enqueued and come out first-in, first-out, with no randomness
- The return value is a batch of tensors after dequeuing; for the tensors above, the return is {'img': imgs, 'label': labels}
- This function has a num_threads parameter, i.e. multi-threaded enqueuing! In that case the timing of the threads is nondeterministic, so the enqueue order is also nondeterministic, which amounts to a small shuffle (see the sketch below)
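A sketch of the dictionary form (the tensors and sizes here are made up; pairing slice_input_producer with batch is one common pattern, not the only one):

```python
import tensorflow as tf

# One sample = {'img': image, 'label': label}; tf.train.batch assembles
# batch_size samples through a FIFOQueue. With num_threads > 1 the
# enqueue order becomes nondeterministic (the "small shuffle" above).
images = tf.reshape(tf.range(100 * 4, dtype=tf.float32), [100, 4])
labels = tf.range(100)
image, label = tf.train.slice_input_producer([images, labels], shuffle=False)
batch = tf.train.batch({'img': image, 'label': label},
                       batch_size=8, num_threads=2, capacity=32)

sess = tf.Session()
coord = tf.train.Coordinator()
threads = tf.train.start_queue_runners(sess=sess, coord=coord)
out = sess.run(batch)
print(out['img'].shape, out['label'].shape)  # [out] (8, 4) (8,)
coord.request_stop()
coord.join(threads)
```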
shuffle_batch(tensors, ...)
- Basically the same as above
- Maintains a RandomShuffleQueue, which dequeues randomly!
- Likewise has a num_threads parameter for multi-threaded enqueuing
- So the shuffle shows up in two places (see the sketch below):
  - Multi-threaded enqueuing; but this shuffle only spans about as many elements as there are threads, so its randomizing effect is almost negligible
  - Random dequeuing! This is where the real shuffling power comes from; note, however, that if the queue holds too few elements the randomness shrinks, so the function provides min_after_dequeue to enforce a minimum number of elements in the queue
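A sketch showing both knobs (the numbers are arbitrary; capacity must be larger than min_after_dequeue, and the TF docs suggest roughly min_after_dequeue + (num_threads + a small safety margin) * batch_size):

```python
import tensorflow as tf

# shuffle_batch dequeues randomly from a RandomShuffleQueue;
# min_after_dequeue keeps at least that many elements buffered after
# each dequeue, which is what gives the shuffle its strength.
data = tf.range(100)
sample = tf.train.slice_input_producer([data], shuffle=False)[0]
batch = tf.train.shuffle_batch([sample], batch_size=8, num_threads=2,
                               capacity=64, min_after_dequeue=32)[0]

sess = tf.Session()
coord = tf.train.Coordinator()
threads = tf.train.start_queue_runners(sess=sess, coord=coord)
print(sess.run(batch))  # [out] e.g. [41  3 77 12 58  9 65 20] -- shuffled
coord.request_stop()
coord.join(threads)
```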
shuffle_batch_join(tensors_list, ...)
- The two batch ops above are each given a single set of tensors, and the function internally defines the enqueue op for those tensors
- This function is instead given a list of tensor sets and enqueues them with multiple threads: there are as many threads as entries in the list, and each thread handles the enqueuing of one entry
- This function can therefore shuffle samples across files; see http://wiki.jikexueyuan.com/project/tensorflow-zh/how_tos/reading_data.html for details
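A sketch, with two toy in-memory sources standing in for two files (each entry of tensors_list gets its own enqueue thread, so samples from both sources end up mixed in one shared RandomShuffleQueue):

```python
import tensorflow as tf

# tensors_list has one [tensor] entry per source; shuffle_batch_join
# starts one enqueue thread per entry, mixing elements of both ranges.
source_a = tf.train.slice_input_producer([tf.range(0, 50)], shuffle=False)
source_b = tf.train.slice_input_producer([tf.range(50, 100)], shuffle=False)
batch = tf.train.shuffle_batch_join([source_a, source_b], batch_size=8,
                                    capacity=64, min_after_dequeue=32)[0]

sess = tf.Session()
coord = tf.train.Coordinator()
threads = tf.train.start_queue_runners(sess=sess, coord=coord)
print(sess.run(batch))  # [out] e.g. [ 3 61 17 84  9 55 30 72]
coord.request_stop()
coord.join(threads)
```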