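CDH ships with parquet-tools installed by default. On my CDH 6.2 installation the path is /opt/cloudera/parcels/CDH-6.2.0-1.cdh6.2.0.p0.967373/bin/parquet-tools
If you cannot find it there, use the following command to locate it: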
find / -name parquet-tools
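Scanning from / can be slow and prints a lot of permission errors, so here is a minimal sketch that checks the cheaper places first. The function name `find_parquet_tools` and the `PARCEL_ROOT` variable are made up for illustration, and the `CDH*/bin` layout is only generalised from the single path shown above, so adjust it if your parcels live elsewhere.

```shell
# Hypothetical helper: print the path of parquet-tools, or return 1 if not found.
# PARCEL_ROOT defaults to the parcel directory seen in this post (an assumption
# for other clusters); override it if your parcels are installed elsewhere.
find_parquet_tools() {
    parcel_root="${PARCEL_ROOT:-/opt/cloudera/parcels}"

    # 1. Already on PATH?
    if command -v parquet-tools >/dev/null 2>&1; then
        command -v parquet-tools
        return 0
    fi

    # 2. Look inside each CDH parcel's bin directory.
    for candidate in "$parcel_root"/CDH*/bin/parquet-tools; do
        if [ -x "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done

    # 3. Fall back to the full filesystem scan, hiding permission errors.
    found=$(find / -name parquet-tools -type f 2>/dev/null | head -n 1)
    if [ -n "$found" ]; then
        printf '%s\n' "$found"
        return 0
    fi
    return 1
}
```

Calling `find_parquet_tools` prints the first match, which can then be stored in a variable and reused instead of typing the long parcel path each time.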
Run it directly:
/opt/cloudera/parcels/CDH-6.2.0-1.cdh6.2.0.p0.967373/bin/parquet-tools
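This prints the help information. Looking through it, the main subcommands are:
- cat: prints all records (like the Linux command of the same name)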
parquet-tools cat:
Prints the content of a Parquet file. The output contains only the data, no
metadata is displayed
usage: parquet-tools cat [option...] <input>
where option is one of:
--debug Enable debug output
-h,--help Show this help string
-j,--json Show records in JSON format.
--no-color Disable color output even if supported
where <input> is the parquet file to print to stdout
- head: prints the first few records (5 by default), like the Linux command of the same name
parquet-tools head:
Prints the first n record of the Parquet file
usage: parquet-tools head [option...] <input>
where option is one of:
--debug Enable debug output
-h,--help Show this help string
-n,--records <arg> The number of records to show (default: 5)
--no-color Disable color output even if supported
where <input> is the parquet file to print to stdout
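As a rough sketch of how cat and head might be wrapped for everyday use, the two helpers below only use flags listed in the help output above. The names `preview_parquet` and `parquet_to_json` and the `PARQUET_TOOLS` variable are hypothetical. Note that only cat lists a `--json` flag in this version, so the JSON helper goes through cat rather than head.

```shell
# PARQUET_TOOLS may hold the full parcel path; it defaults to the name on PATH.
PARQUET_TOOLS="${PARQUET_TOOLS:-parquet-tools}"

# Hypothetical helper: show the first N records of a file.
# N defaults to 5, the same default the tool itself documents.
preview_parquet() {
    file="$1"
    count="${2:-5}"
    "$PARQUET_TOOLS" head -n "$count" "$file"
}

# Hypothetical helper: print every record in JSON format.
parquet_to_json() {
    "$PARQUET_TOOLS" cat --json "$1"
}
```

For example, `preview_parquet data.parquet 20` would show twenty records, where `data.parquet` is a placeholder file name.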
- schema: prints the schema of a Parquet file
parquet-tools schema:
Prints the schema of Parquet file(s)
usage: parquet-tools schema [option...] <input>
where option is one of:
-d,--detailed Show detailed information about the schema.
--debug Enable debug output
-h,--help Show this help string
--no-color Disable color output even if supported
-o,--originalType Print logical types in OriginalType representation.
where <input> is the parquet file containing the schema to show
- meta: prints the metadata (this shows whether the file is compressed and which codec was used)
parquet-tools meta:
Prints the metadata of Parquet file(s)
usage: parquet-tools meta [option...] <input>
where option is one of:
--debug Enable debug output
-h,--help Show this help string
--no-color Disable color output even if supported
-o,--originalType Print logical types in OriginalType representation.
where <input> is the parquet file to print to stdout
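To answer the "is it compressed, and how" question quickly, one option is to filter the meta output for codec names. This is only a sketch: it assumes the codec appears in the output as an uppercase word such as SNAPPY, GZIP or UNCOMPRESSED, which I have not verified against every version. The `list_codecs` name and the `PARQUET_TOOLS` variable are hypothetical.

```shell
# PARQUET_TOOLS may hold the full parcel path; it defaults to the name on PATH.
PARQUET_TOOLS="${PARQUET_TOOLS:-parquet-tools}"

# Hypothetical helper: count how often each codec name appears in the meta output.
# Assumption: codec names are printed as standalone uppercase words.
list_codecs() {
    "$PARQUET_TOOLS" meta "$1" \
        | grep -o -w -E 'UNCOMPRESSED|SNAPPY|GZIP|LZO|BROTLI|LZ4|ZSTD' \
        | sort | uniq -c
}
```

If the helper prints nothing, the assumption about the output format probably does not hold for your version, and reading the raw meta output is the safer route.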
- dump: prints both the content and the metadata of a file
parquet-tools dump:
Prints the content and metadata of a Parquet file
usage: parquet-tools dump [option...] <input>
where option is one of:
-c,--column <arg> Dump only the given column, can be specified more than
once
-d,--disable-data Do not dump column data
--debug Enable debug output
-h,--help Show this help string
-m,--disable-meta Do not dump row group and page metadata
-n,--disable-crop Do not crop the output based on console width
--no-color Disable color output even if supported
where <input> is the parquet file to print to stdout
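Since `-c` can be given more than once, a small wrapper can turn a list of column names into repeated `-c` options. The sketch below is one way to do that in plain sh. The `dump_columns` name and the `PARQUET_TOOLS` variable are hypothetical, and `--disable-crop` is added only so wide values are not cut off at the console width.

```shell
# PARQUET_TOOLS may hold the full parcel path; it defaults to the name on PATH.
PARQUET_TOOLS="${PARQUET_TOOLS:-parquet-tools}"

# Hypothetical helper: dump only the named columns of a file.
# usage: dump_columns <file> <column> [<column> ...]
dump_columns() {
    file="$1"
    shift
    # The loop words are expanded once, up front. Each pass appends
    # "-c <column>" to the argument list and drops the bare name from the
    # front, so "id name" ends up as "-c id -c name".
    for column in "$@"; do
        set -- "$@" -c "$column"
        shift
    done
    "$PARQUET_TOOLS" dump --disable-crop "$@" "$file"
}
```

Adding `--disable-data` to the final command would limit the output to row group and page metadata for those columns.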
- merge: judging from the description, this merges multiple files into one (I have not actually tried it)
parquet-tools merge:
Merges multiple Parquet files into one. The command doesn't merge row groups,
just places one after the other. When used to merge many small files, the
resulting file will still contain small row groups, which usually leads to bad
query performance.
usage: parquet-tools merge [option...] <input> [<input> ...] <output>
where option is one of:
--debug Enable debug output
-h,--help Show this help string
--no-color Disable color output even if supported
where <input> is the source parquet files/directory to be merged
<output> is the destination parquet file
- rowcount: prints the number of records
parquet-tools rowcount:
Prints the count of rows in Parquet file(s)
usage: parquet-tools rowcount [option...] <input>
where option is one of:
-d,--detailed Detailed rowcount of each matching file
--debug Enable debug output
-h,--help Show this help string
--no-color Disable color output even if supported
where <input> is the parquet file to count rows to stdout
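When checking several files at once, a loop that labels each count with its file name can be handy. This is just a sketch: the `count_rows_each` name and the `PARQUET_TOOLS` variable are hypothetical, and the exact text rowcount prints may differ between versions.

```shell
# PARQUET_TOOLS may hold the full parcel path; it defaults to the name on PATH.
PARQUET_TOOLS="${PARQUET_TOOLS:-parquet-tools}"

# Hypothetical helper: print "<file>: <rowcount output>" for each file given.
count_rows_each() {
    for file in "$@"; do
        printf '%s: ' "$file"
        "$PARQUET_TOOLS" rowcount "$file"
    done
}
```

For a single directory of files, the tool's own `-d` option reports a count per matching file and may be all you need.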
- size: prints the file size
parquet-tools size:
Prints the size of Parquet file(s)
usage: parquet-tools size [option...] <input>
where option is one of:
-d,--detailed Detailed size of each matching file
--debug Enable debug output
-h,--help Show this help string
--no-color Disable color output even if supported
-p,--pretty Pretty size
-u,--uncompressed Uncompressed size