hash & crc32 & conv
數(shù)值轉(zhuǎn)換 sql
SELECT
md5("foo") md5_str
,conv(md5("foo"), 16, 10) md5_10base
,substring(md5("foo"), 0, 16) md5_sub_16
,conv(substring(md5("foo"), 0, 16), 16, 10) md5_sub_16_10base
,cast(conv(substring(md5("foo"), 0, 16), 16, 10) as BIGINT) md5_sub_16_10base_cast
output
| md5_str | md5_10base | md5_sub_16 | md5_sub_16_10base | md5_sub_16_10base_cast |
|---|---|---|---|---|
| acbd18db4cc2f85cedef654fccc4a4d8 | 18446744073709551615 | acbd18db4cc2f85c | 12447132275286669404 | NULL |
hash 散列 sql
SELECT
crc32("hello") crc32_value
,hash("hello") hash_value
,md5("hello") md5_value
,reflect("java.util.UUID", "randomUUID") uuid
output
| crc32_value | hash_value | md5_value | uuid |
|---|---|---|---|
| 907060870 | 99162322 | 5d41402abc4b2a76b9719d911017c592 | d28488e4-e395-47e4-8f39-4450cf28f701 |
length
sql
select
length("中國(guó)") col1
,length("1234") col2
,char_length("中國(guó)") col3
,reverse("1234") col4
output
| col1 | col2 | col3 | col4 |
|---|---|---|---|
| 2 | 4 | 2 | 4321 |
散列統(tǒng)計(jì)
字符串散列 去重 統(tǒng)計(jì)
SELECT count(1) cnt
,count(DISTINCT crc32(keyword)) crc32_cnt
,count(DISTINCT hash(keyword)) hash_cnt
,count(DISTINCT sha(keyword)) sha_cnt
,count(DISTINCT sha1(keyword)) sha1_cnt
,count(DISTINCT sha2(keyword, 256)) sha2_cnt
FROM tb_keyword
LIMIT 10
ouput
搜索:
| cnt | crc32_cnt | hash_cnt | sha_cnt | sha1_cnt | sha2_cnt |
|---|---|---|---|---|---|
| 41491690 | 41292280 | 41265272 | 41491690 | 41491690 | 41491690 |
regexp_extract
select regexp_extract('你好,李先生你的電話是15622150839','電話是([0-9])+', 0)
輸出
電話是15622150839
regexp_replace
第二個(gè)參數(shù),注意轉(zhuǎn)義
Note that some care is necessary in using predefined character classes: using '\s' as the second argument will match the letter s; '\s' is necessary to match whitespace, etc.
示例1
select
regexp_replace(poi, '\\n', '')
from (
select '
2015' poi
union
select '
2016' poi
) t1
輸出
2015
2016
示例2
select
regexp_replace(poi, '\\s', '')
from (
select '2015 ab' poi
union
select '2016 77' poi
) t1
輸出
2015ab
201677
regexp
示例
select
word,
word regexp('aa|bb|cc') reg
FROM (
SELECT "aaA" word
UNION
SELECT "bbB" word
UNION
SELECT "ccC" word
UNION
SELECT "xyz" word
) t1
輸出
word reg
aaA true
bbB true
ccC true
xyz false
case when
case when then 條件添加 and or
case
when (t.batchid <>' ' and t.batchid is not null) then 1
else 0
end
hdfs文件上傳
2019-12-12
若hdfs上已經(jīng)存在文件,要強(qiáng)制覆蓋,用 -f 命令,如:
hadoop fs -put -f src_file /home/test/
create table as
2021-01-29
how do I “create table as select..” with partitions from original table?
P1
No, this is not possible, because Create Table As Select (CTAS) has restrictions:
The target table cannot be a partitioned table.
The target table cannot be an external table.
The target table cannot be a list bucketing table.
You can create table separately and then insert overwrite it.
P2
Creating partitioned table as select is not supported. You can do it in two steps:
create table my_table like dlk.big_table; This will create table with the same schema.
Load data.
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
insert overwrite table my_table partition (day, month, year)
select * from dlk.big_table;
P3
There has been some development since this question was originally asked and answered. As per hive documentation: Starting with Hive 3.2.0, CTAS statements can define a partitioning specification for the target table (HIVE-20241).
You can also see the related ticket here. It has been resolved back in July 2018.
Therefore if your hive is of 3.2.0 or higher, then you can simply do
CREATE TABLE test_extract PARTITIONED BY (year string, month string) AS
SELECT
col1,
col2,
year,
month
FROM master_extract