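In large-scale deep learning, being able to process and parse TFRecord files quickly has become a basic requirement. This post records how to quickly preview and parse a TFRecord.
Import the relevant packages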
import tensorflow as tf
from tensorflow.python.ops.parsing_ops import FixedLenFeature
# TF 1.x: eager execution must be enabled explicitly so that tensors expose .numpy()
tf.enable_eager_execution()
tf.logging.set_verbosity(tf.logging.INFO)
Load the TFRecord (here I am loading a GZIP-compressed file)
filenames = 'data/20210830/part-r-00000.gz'
raw_dataset = tf.data.TFRecordDataset(filenames=filenames, compression_type='GZIP')
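Quickly parse and preview the contents
for raw_record in raw_dataset.take(1):
    # Decode the serialized bytes into a tf.train.Example proto and print it as text
    example = tf.train.Example()
    example.ParseFromString(raw_record.numpy())
    print(example)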
The output is as follows:
features {
  feature {
    key: "album_fea"
    value {
      float_list {
        value: 906.0
        value: 1957.0
      }
    }
  }
  feature {
    key: "albumid"
    value {
      bytes_list {
        value: "41595773"
      }
    }
  }
  feature {
    key: "is_click"
    value {
      float_list {
        value: 0.0
      }
    }
  }
}
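When a record has many features, reading the printed proto by eye gets tedious. The following is a minimal sketch, not part of the original workflow, that summarizes each feature's kind and value count from a `tf.train.Example`. The helper name `summarize_example` is hypothetical, and the example record is built by hand to mirror the output above so that the snippet runs without the data file.

```python
import tensorflow as tf


def summarize_example(example):
    """Map each feature key to (kind, number_of_values).

    `summarize_example` is a hypothetical helper name used for illustration.
    """
    summary = {}
    for key, feature in example.features.feature.items():
        # A Feature holds exactly one of: bytes_list, float_list, int64_list
        kind = feature.WhichOneof("kind")
        if kind is None:
            summary[key] = (None, 0)  # feature present but empty
            continue
        summary[key] = (kind, len(getattr(feature, kind).value))
    return summary


# Hand-built record that mirrors the printed output above
example = tf.train.Example(features=tf.train.Features(feature={
    "album_fea": tf.train.Feature(
        float_list=tf.train.FloatList(value=[906.0, 1957.0])),
    "albumid": tf.train.Feature(
        bytes_list=tf.train.BytesList(value=[b"41595773"])),
    "is_click": tf.train.Feature(
        float_list=tf.train.FloatList(value=[0.0])),
}))

summary = summarize_example(example)
for key in sorted(summary):
    print(key, summary[key])
```

Each kind corresponds to one dtype when parsing: `float_list` to `tf.float32`, `bytes_list` to `tf.string`, and `int64_list` to `tf.int64`. The value count gives the `shape` only if every record in the file has the same number of values for that key, which a single record cannot confirm.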
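The code above only decodes the record into a tf.train.Example and prints it as text. To parse it into tensors that TensorFlow can operate on, you need to define a feature_description that matches the data format shown in the output.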
# Keys, shapes and dtypes must match what the printed Example shows
feature_description = {
    'albumid': FixedLenFeature(shape=[1], dtype=tf.string),
    'album_fea': FixedLenFeature(shape=[2], dtype=tf.float32),
    'is_click': FixedLenFeature(shape=[1], dtype=tf.float32)
}
for serialized_example in raw_dataset.take(1):
    features = tf.io.parse_single_example(serialized_example, feature_description)
    print(features)
The output is as follows:
{'album_fea': <tf.Tensor: id=748, shape=(2,), dtype=float32, numpy=array([ 906., 1957.], dtype=float32)>,
'albumid': <tf.Tensor: id=749, shape=(1,), dtype=string, numpy=array([b'41595773'], dtype=object)>,
'is_click': <tf.Tensor: id=750, shape=(1,), dtype=float32, numpy=array([0.], dtype=float32)>
}
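Parsing one record in a loop is enough for a preview. For training input, the same parsing function is usually applied with `Dataset.map` and followed by batching. The sketch below assumes TensorFlow 1.14+ or 2.x and is not taken from the original post. To stay self-contained it writes a small GZIP TFRecord to a temporary file instead of reading `data/20210830/part-r-00000.gz`; the second record's values and the helper names `make_example` and `parse_fn` are made up for illustration.

```python
import os
import tempfile

import tensorflow as tf

# Assumption: TF 1.14+ or 2.x. Has no effect when eager mode is already on.
tf.compat.v1.enable_eager_execution()

feature_description = {
    "albumid": tf.io.FixedLenFeature(shape=[1], dtype=tf.string),
    "album_fea": tf.io.FixedLenFeature(shape=[2], dtype=tf.float32),
    "is_click": tf.io.FixedLenFeature(shape=[1], dtype=tf.float32),
}


def make_example(albumid, album_fea, is_click):
    """Build one record with the same three features as the post (hypothetical helper)."""
    return tf.train.Example(features=tf.train.Features(feature={
        "albumid": tf.train.Feature(
            bytes_list=tf.train.BytesList(value=[albumid])),
        "album_fea": tf.train.Feature(
            float_list=tf.train.FloatList(value=album_fea)),
        "is_click": tf.train.Feature(
            float_list=tf.train.FloatList(value=[is_click])),
    }))


def parse_fn(serialized):
    """Turn one serialized Example into a dict of tensors."""
    return tf.io.parse_single_example(serialized, feature_description)


# Write two sample records to a temporary GZIP-compressed TFRecord file.
# The second record's values are invented for this sketch.
path = os.path.join(tempfile.mkdtemp(), "sample.gz")
options = tf.io.TFRecordOptions(compression_type="GZIP")
with tf.io.TFRecordWriter(path, options) as writer:
    writer.write(make_example(b"41595773", [906.0, 1957.0], 0.0).SerializeToString())
    writer.write(make_example(b"41595774", [12.0, 34.0], 1.0).SerializeToString())

# Read, parse every record, and group the parsed records into batches of 2
dataset = (
    tf.data.TFRecordDataset(filenames=path, compression_type="GZIP")
    .map(parse_fn)
    .batch(2)
)

for batch in dataset.take(1):
    shapes = {key: tuple(value.shape.as_list()) for key, value in batch.items()}
    labels = batch["is_click"].numpy().tolist()
    print(shapes)
```

Each parsed feature gains a leading batch dimension, so `album_fea` goes from shape `(2,)` per record to `(2, 2)` per batch of two.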