Tuesday, July 6, 2021

Hive Study Notes, Part 7: Built-in Functions

Welcome to my GitHub

https://github.com/zq2599/blog_demos

Contents: a categorized index of all my original articles with accompanying source code, covering Java, Docker, Kubernetes, DevOps, and more;

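Hive Study Notes: series navigation

  1. Basic data types
  2. Complex data types
  3. Internal tables and external tables
  4. Partitioned tables
  5. Bucketing
  6. HiveQL basics
  7. Built-in functions
  8. Sqoop
  9. Basic UDF
  10. User-defined aggregate functions (UDAF)
  11. UDTF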

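Overview of this article

  • This is the seventh article in the Hive Study Notes series. The previous article covered the common HiveQL statements; this one takes a quick pass through the commonly used built-in functions, in the following parts:
  1. Math
  2. String
  3. JSON processing
  4. Conversion
  5. Date
  6. Conditional
  7. Aggregate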

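Preparing the data

  1. This exercise needs two tables, a student table and an address table. Both have only a few simple fields. The student table has an addressid field that holds the unique ID of a record in the address table:

[Figure: fields of the student table and the address table]
  2. First create the address table:

create table address (addressid int, province string, city string) row format delimited fields terminated by ',';
  3. Create the file address.txt with the following content, one record per line:
1,guangdong,guangzhou
2,guangdong,shenzhen
3,shanxi,xian
4,shanxi,hanzhong
6,jiangshu,nanjing
  4. Load the data into the address table:
load data local inpath '/home/hadoop/temp/202010/25/address.txt' into table address;
  5. Create the student table, whose addressid field refers to the addressid field of the address table:
create table student (name string, age int, addressid int) row format delimited fields terminated by ',';
  6. Create the file student.txt with the following content, one record per line:
tom,11,1
jerry,12,2
mike,13,3
john,14,4
mary,15,5
  7. Load the data into the student table:
load data local inpath '/home/hadoop/temp/202010/25/student.txt' into table student;
  8. The data needed for this exercise is now ready, as shown below:
hive> select * from address;
OK
1	guangdong	guangzhou
2	guangdong	shenzhen
3	shanxi	xian
4	shanxi	hanzhong
6	jiangshu	nanjing
Time taken: 0.043 seconds, Fetched: 5 row(s)
hive> select * from student;
OK
tom	11	1
jerry	12	2
mike	13	3
john	14	4
mary	15	5
Time taken: 0.068 seconds, Fetched: 5 row(s)
  • Now let's try the built-in functions;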
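Both tables are declared with fields terminated by ',', so each record in address.txt and student.txt has to sit on its own line. As a minimal sketch (written in Python rather than HiveQL so it runs without a cluster), the two files could be generated as follows. The helper name write_rows and the temporary output directory are assumptions made for illustration and are not part of the original setup:

```python
import csv
import tempfile
from pathlib import Path

# Sample rows copied from the article's address.txt and student.txt.
ADDRESS_ROWS = [
    (1, "guangdong", "guangzhou"),
    (2, "guangdong", "shenzhen"),
    (3, "shanxi", "xian"),
    (4, "shanxi", "hanzhong"),
    (6, "jiangshu", "nanjing"),
]
STUDENT_ROWS = [
    ("tom", 11, 1),
    ("jerry", 12, 2),
    ("mike", 13, 3),
    ("john", 14, 4),
    ("mary", 15, 5),
]


def write_rows(path, rows):
    """Write rows as comma-separated text, one record per line (hypothetical helper)."""
    with open(path, "w", newline="") as f:
        csv.writer(f, lineterminator="\n").writerows(rows)


# Assumption: any writable local directory works; a temp directory keeps the sketch portable.
out_dir = Path(tempfile.mkdtemp())
address_file = out_dir / "address.txt"
student_file = out_dir / "student.txt"
write_rows(address_file, ADDRESS_ROWS)
write_rows(student_file, STUDENT_ROWS)

print(address_file.read_text())
print(student_file.read_text())
```

The generated files would then be copied to whatever path the load data local inpath statements point at.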

Function overview

  1. Open the hive console;
  2. Run show functions; to list the built-in functions:
hive> show functions;
OK
!
!=
%
&
*
+
-
/
<
<=
<=>
<>
=
==
>
>=
^
abs
acos
add_months
and
array
array_contains
ascii
asin
assert_true
atan
avg
base64
between
bin
case
cbrt
ceil
ceiling
coalesce
collect_list
collect_set
compute_stats
concat
concat_ws
context_ngrams
conv
corr
cos
count
covar_pop
covar_samp
create_union
cume_dist
current_database
current_date
current_timestamp
current_user
date_add
date_format
date_sub
datediff
day
dayofmonth
decode
degrees
dense_rank
div
e
elt
encode
ewah_bitmap
ewah_bitmap_and
ewah_bitmap_empty
ewah_bitmap_or
exp
explode
factorial
field
find_in_set
first_value
floor
format_number
from_unixtime
from_utc_timestamp
get_json_object
greatest
hash
hex
histogram_numeric
hour
if
in
in_file
index
initcap
inline
instr
isnotnull
isnull
java_method
json_tuple
lag
last_day
last_value
lcase
lead
least
length
levenshtein
like
ln
locate
log
log10
log2
lower
lpad
ltrim
map
map_keys
map_values
matchpath
max
min
minute
month
months_between
named_struct
negative
next_day
ngrams
noop
noopstreaming
noopwithmap
noopwithmapstreaming
not
ntile
nvl
or
parse_url
parse_url_tuple
percent_rank
percentile
percentile_approx
pi
pmod
posexplode
positive
pow
power
printf
radians
rand
rank
reflect
reflect2
regexp
regexp_extract
regexp_replace
repeat
reverse
rlike
round
row_number
rpad
rtrim
second
sentences
shiftleft
shiftright
shiftrightunsigned
sign
sin
size
sort_array
soundex
space
split
sqrt
stack
std
stddev
stddev_pop
stddev_samp
str_to_map
struct
substr
substring
sum
tan
to_date
to_unix_timestamp
to_utc_timestamp
translate
trim
trunc
ucase
unbase64
unhex
unix_timestamp
upper
var_pop
var_samp
variance
weekofyear
when
windowingtablefunction
xpath
xpath_boolean
xpath_double
xpath_float
xpath_int
xpath_long
xpath_number
xpath_short
xpath_string
year
|
~
Time taken: 0.003 seconds, Fetched: 216 row(s)
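  3. Taking lower as an example, run describe function lower; to see the description of that function:
hive> describe function lower;
OK
lower(str) - Returns str with all characters changed to lowercase
Time taken: 0.005 seconds, Fetched: 1 row(s)
  • Next, we try the commonly used functions, starting with the math functions;
  • First run the following command so that query results include the column names:
set hive.cli.print.header=true;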

Math functions

  1. Addition (+):
hive> select name, age, age+1 as add_value from student;
OK
name	age	add_value
tom	11	12
jerry	12	13
mike	13	14
john	14	15
mary	15	16
Time taken: 0.098 seconds, Fetched: 5 row(s)
  2. Subtraction (-), multiplication (*), and division (/) are used the same way as addition, so they are not repeated here;
  3. Rounding to the nearest integer with round:
hive> select round(1.1), round(1.6);
OK
_c0	_c1
1.0	2.0
Time taken: 0.028 seconds, Fetched: 1 row(s)
  4. Rounding up with ceil:
hive> select ceil(1.1);
OK
_c0
2
Time taken: 0.024 seconds, Fetched: 1 row(s)
  5. Rounding down with floor:
hive> select floor(1.1);
OK
_c0
1
Time taken: 0.024 seconds, Fetched: 1 row(s)
  6. Exponentiation with pow; for example, pow(2,3) is 2 to the third power, which equals 8:
hive> select pow(2,3);
OK
_c0
8.0
Time taken: 0.027 seconds, Fetched: 1 row(s)
  7. Modulo with pmod, which returns the positive value of a mod b:
hive> select pmod(10,3);
OK
_c0
1
Time taken: 0.059 seconds, Fetched: 1 row(s)
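The difference between pmod and a plain remainder only shows up with negative operands, which the queries above do not exercise. The following Python sketch mirrors the math functions locally. It assumes that Hive's round rounds halves up and that pmod is computed as ((a % b) + b) % b with a remainder whose sign follows the dividend; the names hive_round, truncated_rem, and pmod are illustrative helpers, not Hive APIs:

```python
import math
from decimal import Decimal, ROUND_HALF_UP


def hive_round(x):
    """Round half up to the nearest integer and return a float (assumed to mirror round)."""
    return float(Decimal(str(x)).quantize(Decimal("1"), rounding=ROUND_HALF_UP))


def truncated_rem(a, b):
    """Integer remainder whose sign follows the dividend (unlike Python's % operator)."""
    r = abs(a) % abs(b)
    return -r if a < 0 else r


def pmod(a, b):
    """Positive modulus built from the truncated remainder: ((a % b) + b) % b."""
    return truncated_rem(truncated_rem(a, b) + b, b)


print(hive_round(1.1), hive_round(1.6))   # 1.0 2.0, as in the round query
print(math.ceil(1.1), math.floor(1.1))    # 2 1, as in the ceil and floor queries
print(math.pow(2, 3))                     # 8.0, as in the pow query
print(pmod(10, 3))                        # 1, as in the pmod query
print(truncated_rem(-10, 3), pmod(-10, 3))  # -1 2: only pmod stays non-negative
```

With a negative dividend and a positive divisor, the truncated remainder is negative while pmod is not, which is the case pmod exists for.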

String functions

  1. Convert to lowercase with lower and to uppercase with upper:
hive> select lower(name), upper(name) from student;
OK
_c0	_c1
tom	TOM
jerry	JERRY
mike	MIKE
john	JOHN
mary	MARY
Time taken: 0.051 seconds, Fetched: 5 row(s)
  2. String length with length:
hive> select name, length(name) from student;
OK
tom	3
jerry	5
mike	4
john	4
mary	4
Time taken: 0.322 seconds, Fetched: 5 row(s)
  3. String concatenation with concat:
hive> select concat("prefix_", name) from student;
OK
prefix_tom
prefix_jerry
prefix_mike
prefix_john
prefix_mary
Time taken: 0.106 seconds, Fetched: 5 row(s)
  4. Substring with substr. Positions are 1-based: substr(xxx,2) returns everything from the second character to the end, and substr(xxx,2,3) returns three characters starting at the second character:
hive> select substr("0123456",2);
OK
123456
Time taken: 0.067 seconds, Fetched: 1 row(s)
hive> select substr("0123456",2,3);
OK
123
Time taken: 0.08 seconds, Fetched: 1 row(s)
  5. Remove leading and trailing spaces with trim:
hive> select trim(" 123 ");
OK
123
Time taken: 0.065 seconds, Fetched: 1 row(s)
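The 1-based position of substr is easy to get wrong when coming from a 0-based language. The sketch below mirrors the string functions above in Python. hive_substr is a hypothetical helper that models only positive start positions; other inputs are deliberately rejected rather than guessed at:

```python
def hive_substr(s, pos, length=None):
    """Return the substring starting at 1-based position pos (positive positions only)."""
    if pos < 1:
        raise ValueError("this sketch only models positive, 1-based positions")
    if length is not None and length < 0:
        raise ValueError("length must not be negative")
    start = pos - 1  # convert the 1-based position to Python's 0-based index
    end = len(s) if length is None else start + length
    return s[start:end]


name = "tom"
print(name.lower(), name.upper())   # tom TOM
print(len(name))                    # 3
print("prefix_" + name)             # prefix_tom
print(hive_substr("0123456", 2))    # 123456
print(hive_substr("0123456", 2, 3)) # 123
print(" 123 ".strip(" "))           # 123 (only spaces are stripped here)
```

The index shift in hive_substr (pos - 1) is the whole difference between the two conventions: position 2 in the query corresponds to index 1 in Python.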

JSON processing (get_json_object)

To try the JSON functions, first prepare some data:

  1. Create table t15, which has a single field for storing a string:
create table t15(json_raw string) row format delimited;
  2. Create the file 015.txt with the following content, one JSON object per line:
{"name":"tom","age":"10"}
{"name":"jerry","age":"11"}
  3. Load the data into table t15:
load data local inpath '/home/hadoop/temp/202010/25/015.txt' into table t15;
  4. Use get_json_object to parse the json_raw field and extract the name and age attributes:
select get_json_object(json_raw, "$.name"), get_json_object(json_raw, "$.age") from t15;

The result:

hive> select
    > get_json_object(json_raw, "$.name"),
    > get_json_object(json_raw, "$.age")
    > from t15;
OK
tom	10
jerry	11
Time taken: 0.081 seconds, Fetched: 2 row(s)
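To make the "$.name" path notation concrete, here is a minimal Python sketch of the same extraction over the two sample rows. get_json_field is a hypothetical helper that handles only top-level paths of the form $.key and returns None for invalid JSON; it is not a reimplementation of get_json_object:

```python
import json

# The two records from the sample file, one JSON object per line.
ROWS = [
    '{"name":"tom","age":"10"}',
    '{"name":"jerry","age":"11"}',
]


def get_json_field(raw, path):
    """Return the top-level field addressed as '$.key', or None if it cannot be read."""
    if not path.startswith("$."):
        raise ValueError("this sketch only supports paths of the form $.key")
    try:
        doc = json.loads(raw)
    except json.JSONDecodeError:
        return None  # invalid JSON yields no value
    if not isinstance(doc, dict):
        return None
    return doc.get(path[2:])  # strip the leading "$." to get the key


for row in ROWS:
    # Tab-separated, like the console output in the article.
    print(get_json_field(row, "$.name"), get_json_field(row, "$.age"), sep="\t")
```

In the sample data both values are JSON strings, so age comes back as the string "10" rather than a number, matching the untyped output of the query.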

Date functions

  1. Get the current date with current_date:
hive> select current_date();
OK
2020-11-02
Time taken: 0.052 seconds, Fetched: 1 row(s)
  2. Get the current timestamp with current_timestamp:
hive> select current_timestamp();
OK
2020-11-02 10:07:58.967
Time taken: 0.049 seconds, Fetched: 1 row(s)
  3. Get the year (year), month (month), and day (day): ......
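The two outputs above differ only in precision: a date in yyyy-MM-dd form and a timestamp with millisecond resolution. As a rough Python sketch of those two formats, using the timestamp from the console output as a fixed sample value (the function names are illustrative, and the last line shows Python's datetime fields rather than Hive output):

```python
from datetime import datetime


def format_current_date(now):
    """Format a datetime as a date string like 2020-11-02."""
    return now.strftime("%Y-%m-%d")


def format_current_timestamp(now):
    """Format a datetime with millisecond precision like 2020-11-02 10:07:58.967."""
    # strftime's %f gives six-digit microseconds, so milliseconds are formatted by hand.
    return now.strftime("%Y-%m-%d %H:%M:%S.") + f"{now.microsecond // 1000:03d}"


# Fixed sample taken from the current_timestamp() output in the article.
sample = datetime(2020, 11, 2, 10, 7, 58, 967000)
print(format_current_date(sample))
print(format_current_timestamp(sample))
print(sample.year, sample.month, sample.day)  # the components of the sample date
```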

    Reposted from: http://www.shaoqun.com/a/849222.html
