
Reading mechanism

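The data reading mechanism in TensorFlow is shown in the figure below.

This figure has already been explained in great detail in another article. In short, so that data reading does not become the time bottleneck of the code, TensorFlow uses two queues to read files: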

Filename queue: created with tf.train.string_input_producer(). The filename queue does not hold the contents of the files; it only records their names, which is why this function can set the number of epochs over the files and shuffle them. The function only creates the filename queue and registers a queue runner for the enqueue operation. Filenames actually start flowing once tf.train.start_queue_runners() is called: it starts the queue-runner threads and returns them as a list, and from that point the filename queue begins to be filled.
Memory queue: the user does not create this queue by hand. Given the filename queue, once start_queue_runners has been called TensorFlow maintains the memory queue itself and keeps it supplied so that there is always data to read.
Typical code looks like this:

import tensorflow as tf

# Create a Session
with tf.Session() as sess:
    # We want to read three images: A.jpg, B.jpg, C.jpg
    filename = ['A.jpg', 'B.jpg', 'C.jpg']
    # string_input_producer creates a filename queue
    filename_queue = tf.train.string_input_producer(filename, shuffle=False, num_epochs=5)
    # The reader reads data from the filename queue; the method for this is reader.read
    reader = tf.WholeFileReader()
    key, value = reader.read(filename_queue)
    # string_input_producer defines a local epoch variable, which must be initialized
    tf.local_variables_initializer().run()
    # The queues only start being filled once start_queue_runners is called
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)
    i = 0
    try:
        while not coord.should_stop():
            i += 1
            # Fetch the image data and save it (the 'read' directory must already exist)
            image_data = sess.run(value)
            with open('read/test_%d.jpg' % i, 'wb') as f:
                f.write(image_data)
    except tf.errors.OutOfRangeError:
        # Raised once all 5 epochs over the 3 files have been consumed
        pass
    finally:
        coord.request_stop()
        coord.join(threads)
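
To make the two-queue mechanism described above more concrete, here is a minimal pure-Python model of it that uses only the standard library. It is not how TensorFlow implements its queues internally; the names fill_filename_queue, read_files, fake_read, and run_pipeline are hypothetical and exist only for this illustration. One thread plays the queue runner that fills the filename queue for a fixed number of epochs, a second thread plays the reader that turns filenames into contents and puts them into the memory queue, and the main thread consumes from the memory queue.

```python
import queue
import threading

SENTINEL = None  # marks the end of the stream in this toy model


def fill_filename_queue(filenames, num_epochs, filename_queue):
    """Enqueue every filename once per epoch, then signal the end."""
    for _ in range(num_epochs):
        for name in filenames:
            filename_queue.put(name)
    filename_queue.put(SENTINEL)


def fake_read(name):
    """Stand-in for reading a file from disk (hypothetical)."""
    return "contents of " + name


def read_files(filename_queue, memory_queue):
    """Take filenames, 'read' them, and enqueue the contents."""
    while True:
        name = filename_queue.get()
        if name is SENTINEL:
            memory_queue.put(SENTINEL)
            return
        memory_queue.put((name, fake_read(name)))


def run_pipeline(filenames, num_epochs):
    filename_queue = queue.Queue(maxsize=32)
    memory_queue = queue.Queue(maxsize=8)
    threads = [
        threading.Thread(target=fill_filename_queue,
                         args=(filenames, num_epochs, filename_queue)),
        threading.Thread(target=read_files,
                         args=(filename_queue, memory_queue)),
    ]
    # Nothing moves until the threads start, which mirrors the role of
    # tf.train.start_queue_runners in the TensorFlow code.
    for t in threads:
        t.start()
    results = []
    while True:
        item = memory_queue.get()
        if item is SENTINEL:
            break
        results.append(item)
    for t in threads:
        t.join()
    return results


records = run_pipeline(['A.jpg', 'B.jpg', 'C.jpg'], num_epochs=2)
print([name for name, _ in records])
```

Because there is a single producer and a single reader here, the names come out in file order, once per epoch. With several reader threads the order would no longer be guaranteed.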

Note that the shuffle in string_input_producer() works at the file level. If the files being read are TFRecord files, each of which holds thousands of records or more, this shuffle is not the same thing as the per-example shuffle we usually mean when training.
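
The difference between the two kinds of shuffling can be seen in a small pure-Python sketch. The file names and record labels are made up for the illustration, and the record-level shuffle shown here is a plain random.shuffle over all records, not the buffer-based shuffling that TensorFlow's queues perform.

```python
import random

# Hypothetical dataset: three files, each holding four records.
files = {
    'part-0': ['0a', '0b', '0c', '0d'],
    'part-1': ['1a', '1b', '1c', '1d'],
    'part-2': ['2a', '2b', '2c', '2d'],
}


def file_level_shuffle(files, rng):
    """Shuffle only the order of files; records inside a file keep their order."""
    names = list(files)
    rng.shuffle(names)
    return [record for name in names for record in files[name]]


def record_level_shuffle(files, rng):
    """Shuffle all records together, ignoring file boundaries."""
    records = [record for name in sorted(files) for record in files[name]]
    rng.shuffle(records)
    return records


rng = random.Random(0)
by_file = file_level_shuffle(files, rng)
by_record = record_level_shuffle(files, rng)

print(by_file)    # records of one file always stay next to each other
print(by_record)  # records from different files may be interleaved
```

With file-level shuffling a model still sees every record of one file in its stored order before moving on to the next file, which is why a further per-record shuffle is usually wanted for training.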

TODO: code that assembles the data read this way into batches

slim data reading interface

Reading data with slim takes the following steps:

1. Give the file names of the data source and build a slim.Dataset from them. Logically the Dataset contains all of the data, although physically of course it does not.
2. Build a DatasetDataProvider from the slim.Dataset. This class provides an interface for fetching records from the Dataset one at a time.
3. Use the DatasetDataProvider's get interface to obtain the ops that fetch the data, and apply any necessary preprocessing to the data.
4. Build batches from the data obtained from the provider. At this point you can shuffle the data, choose the batch_size, and so on.
5. Build a prefetch_queue from the batches.
6. The prefetch_queue has a dequeue op; each time dequeue is executed it returns one batch of data.

Below we go through the code for each step in turn.
1. Create the slim.Dataset
According to the official documentation, a slim.Dataset consists of five parts: data_sources, reader, decoder, num_samples, and items_to_descriptions. data_sources is a set of file names that together make up the whole dataset; reader is the reader appropriate for the file type; decoder converts the data stored in the files into Tensors; num_samples states how many records the dataset contains; items_to_descriptions holds optional notes describing the data. Below is a typical piece of code that builds a Dataset. It assumes the data consists of several TFRecord files, each storing a number of records, and that each record in a TFRecord file is a TFExample:

import os

import tensorflow as tf

slim = tf.contrib.slim


def get_split(split_name, dataset_dir, file_pattern, num_samples, reader=None,
              labels_to_names=None):
    # labels_to_names: optional dict mapping label ids to class names
    dataset_dir = os.path.abspath(os.path.expanduser(dataset_dir))

    if '%' in file_pattern:
        # Multi-file case: fill split_name into the placeholder in file_pattern
        file_pattern = os.path.join(dataset_dir, file_pattern % split_name)
    else:
        file_pattern = os.path.join(dataset_dir, file_pattern)
    # Allowing None in the signature so that dataset_factory can use the default.
    if reader is None:
        reader = tf.TFRecordReader
    keys_to_features = {
        'image/encoded': tf.FixedLenFeature((), tf.string, default_value=''),
        'image/format': tf.FixedLenFeature((), tf.string, default_value='jpeg'),
        'image/filename': tf.FixedLenFeature((), tf.string, default_value=''),
        'image/shape': tf.FixedLenFeature([3], tf.int64),
        # One label per object, so the length varies from image to image
        'image/object/bbox/label': tf.VarLenFeature(dtype=tf.int64),
    }
    items_to_handlers = {
        'image': slim.tfexample_decoder.Image('image/encoded', 'image/format'),
        'shape': slim.tfexample_decoder.Tensor('image/shape'),
        'filename': slim.tfexample_decoder.Tensor('image/filename'),
        'object/label': slim.tfexample_decoder.Tensor('image/object/bbox/label')
    }
    # TFExampleDecoder takes two dicts: the first specifies how to parse each Example,
    # the second can apply simple further processing to the parsed data or combine
    # it into the items you need
    decoder = slim.tfexample_decoder.TFExampleDecoder(keys_to_features, items_to_handlers)

    items_to_descriptions = {
        'image': 'A color image of varying height and width.',
        'shape': 'Shape of the image',
        'object/label': 'A list of labels, one per each object.',
    }
    # Create and return a Dataset
    return slim.dataset.Dataset(
            data_sources=file_pattern,
            reader=reader,
            decoder=decoder,
            num_samples=num_samples,
            items_to_descriptions=items_to_descriptions,
            num_classes=2,
            labels_to_names=labels_to_names)
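
The division of labor between the two dicts passed to the decoder can be modelled in plain Python. This is only a conceptual sketch and not slim's implementation: decode, keys_to_defaults, the handler lambdas, and the sample record are all hypothetical, and a record is represented as an ordinary dict and not as a serialized Example.

```python
# Stage 1: which raw keys to pull out of a record, with optional defaults.
keys_to_defaults = {
    'image/encoded': b'',
    'image/format': 'jpeg',
    'image/shape': None,          # required: no default
}

# Stage 2: how to turn the parsed raw features into the items users ask for.
items_to_handlers = {
    'image': lambda f: (f['image/format'], f['image/encoded']),
    'shape': lambda f: tuple(f['image/shape']),
}


def decode(record, items=None):
    """Parse the raw keys, then build the requested items from them."""
    parsed = {}
    for key, default in keys_to_defaults.items():
        if key in record:
            parsed[key] = record[key]
        elif default is not None:
            parsed[key] = default
        else:
            raise KeyError('missing required feature: ' + key)
    if items is None:
        items = list(items_to_handlers)
    return [items_to_handlers[name](parsed) for name in items]


record = {'image/encoded': b'\xff\xd8', 'image/shape': [32, 64, 3]}
image, shape = decode(record, ['image', 'shape'])
print(image)  # ('jpeg', b'\xff\xd8')
print(shape)  # (32, 64, 3)
```

The first stage works on raw feature keys, the second on user-facing item names, which is why an item such as 'image' can be built from more than one raw key.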

2. Create the DatasetDataProvider

# dataset is the slim.dataset.Dataset built above. num_readers sets how many readers
# read the data files in parallel to fill the provider's queue. common_queue_capacity
# is the size of the queue the provider maintains itself: get acts as the dequeue
# side, and the enqueue side is handled by the provider.
provider = slim.dataset_data_provider.DatasetDataProvider(
    dataset, num_readers=5, common_queue_capacity=10, common_queue_min=1, shuffle=True)
# get returns one record at a time. As before, what it returns are symbolic Tensor
# ops in the graph, not concrete values.
[image, shape, label] = provider.get(['image', 'shape', 'object/label'])

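3. Any necessary preprocessing

# Preprocessing can be done here (preprocess is a user-defined function). The data
# is a single record, so there is no leading batch dimension.
[image, shape, label] = preprocess(image, shape, label)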

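4. Build batches
According to the official documentation, tf.train.batch maintains its own queue, so it can also start several threads that fetch data from the provider. That is what num_threads means, and capacity is the size of that queue.

# There are also official interfaces such as tf.train.shuffle_batch that additionally shuffle the data
b_image, b_label = tf.train.batch([image, label], batch_size=32, num_threads=4, capacity=200)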
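
As a rough mental model of what a batching stage with its own queue does, the sketch below uses standard-library threads and queues: several worker threads pull single examples from an upstream source and put them into a bounded queue, and a batch is formed by taking batch_size items from that queue. It is an illustration under simplifying assumptions, not the implementation of tf.train.batch; example_source, worker, and next_batch are hypothetical names.

```python
import queue
import threading


def worker(source, source_lock, example_queue):
    """Pull single examples from the shared source and enqueue them."""
    while True:
        with source_lock:
            example = next(source, None)
        if example is None:
            return
        example_queue.put(example)


def next_batch(example_queue, batch_size):
    """Dequeue batch_size examples and return them as one batch."""
    return [example_queue.get() for _ in range(batch_size)]


# Hypothetical upstream provider: yields 12 single examples.
example_source = iter(range(12))
source_lock = threading.Lock()
example_queue = queue.Queue(maxsize=8)   # plays the role of `capacity`

threads = [threading.Thread(target=worker,
                            args=(example_source, source_lock, example_queue))
           for _ in range(4)]            # plays the role of `num_threads`
for t in threads:
    t.start()

batches = [next_batch(example_queue, batch_size=4) for _ in range(3)]
for t in threads:
    t.join()

# With several threads the order inside and across batches is not guaranteed.
print(batches)
```

Every example ends up in exactly one batch, but which batch it lands in depends on thread scheduling.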

5. Create the prefetch_queue

batch_queue = slim.prefetch_queue.prefetch_queue([b_image, b_label], capacity = 20)

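Actually, something puzzles me here. Step 4 has already put the batched data into a queue, so in theory running the op returned by batch should give the data directly. Why wrap another queue around it to produce batch_queue? According to the official explanation, the purpose of prefetch_queue is to assemble the batched data so that the user does not have to spend time assembling it when reading.
So TensorFlow apparently anticipated this, and presumably there is a reason for the extra layer, but as I understand it the data is already assembled after batching. I do not know how its batch operation works internally; I will come back to this after studying the code. (TODO)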
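
For what it is worth, the general idea of prefetching can be sketched in plain Python: a background thread runs the possibly slow step that produces a batch and keeps a few finished batches in a bounded queue, so the consumer only has to dequeue. This sketch does not settle how slim's prefetch_queue differs internally from the queue inside tf.train.batch; slow_make_batch and prefetch are hypothetical names, and the sleep merely simulates the cost of assembling a batch.

```python
import queue
import threading
import time


def slow_make_batch(index):
    """Stand-in for an expensive step that assembles one batch."""
    time.sleep(0.01)
    return [index * 10 + k for k in range(3)]


def prefetch(make_batch, num_batches, capacity):
    """Start a background thread that keeps up to `capacity` batches ready."""
    ready = queue.Queue(maxsize=capacity)

    def producer():
        for i in range(num_batches):
            ready.put(make_batch(i))   # blocks while the queue is full

    thread = threading.Thread(target=producer, daemon=True)
    thread.start()
    return ready, thread


batch_queue, producer_thread = prefetch(slow_make_batch, num_batches=5, capacity=2)

# The consumer only dequeues; assembly happens in the background thread.
consumed = [batch_queue.get() for _ in range(5)]
producer_thread.join()
print(consumed[0])   # [0, 1, 2]
print(consumed[-1])  # [40, 41, 42]
```

If the consumer spends time on its own work between dequeues, as a training step would, the producer can use that time to get the next batches ready.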

6. Run the dequeue op to get the data

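b_images, b_labels = batch_queue.dequeue()
with tf.Session() as sess:
    # Start the queue-runner threads; without them dequeue would block forever
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)
    images, labels = sess.run([b_images, b_labels])
    print(images)
    print(labels)
    coord.request_stop()
    coord.join(threads)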

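tf.data.Dataset interface

The data reading interface that slim provides is still not all that concise: the six steps in the previous section show that the process is somewhat tedious, and it is hard to use fluently without some understanding of TensorFlow's implementation. tf.data.Dataset is different. It hides all the details of how TensorFlow handles the data flow, so the user can read data in a few simple steps, which makes data reading easier to pick up and the resulting code more concise and readable. tf.data.Dataset will be covered in a separate article.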
