2021 Big Data Hive (4): Hive query syntax

2021 Big Data Hive (4): Hive query syntax

The most detailed series of Hive articles in the entire network, it is strongly recommended to collect and pay attention!

Later updated articles will list historical article directories to help you review the key points of knowledge.

table of Contents

Series of historical articles

Preface

hive query syntax

1. the SELECT statement

1. Statement structure

2. Full table query

3. Select a specific column to query

4. Column alias

5. Commonly used functions

6, LIMIT statement

7, WHERE statement

2. the operator

1. Comparison operators

2. Logical operators

3. grouping

1. GROUP BY statement

2, HAVING statement

4. JOIN statement

1, inner join (INNER JOIN)

2, LEFT OUTER JOIN

3, RIGHT OUTER JOIN

4, FULL OUTER JOIN

5, multi-table connection

5. sort

1. Order By-Global Sorting

2. Sort By- Local sorting within each MapReduce

3, Distribute By- partition sort

4 Cluster By


Series of historical articles

Big Data Hive in 2021 (12): A comprehensive case of Hive! ! !

Hive large data 2021 (eleven): Hive Tuning

Big Data Hive in 2021 (ten): Hive's data storage format

2021 Big Data Hive (nine): Hive data compression

2021 Big Data Hive (8): Hive custom functions

Big Data Hive in 2021 (7): Hive's windowing function

2021 big data Hive (6): Hive's table generation function

2021 Big Data Hive (5): Hive's built-in functions (mathematics, string, date, condition, conversion, row to column)

2021 Big Data Hive (4): Hive query syntax

Big Data Hive in 2021 (3): Teach you how to thoroughly understand Hive database and table operations (learn how to master the variable warehouse in seconds)

Big Data Hive in 2021 (2): 3.Hive Installation Modes Used with MySQL

2021 Big Data Hive (1): Basic Concepts of Hive

Preface

 A high-quality blog in the field of big data in 2021 will take you from entry to proficiency. The blog is updated every day to gradually improve articles on various knowledge systems of big data to help everyone learn more efficiently.

Those interested in big data can follow the WeChat public account: 3.Gangs of Big Data

hive query syntax

1. the SELECT statement

1. Statement structure

Basic syntax:

SELECT [ALL | DISTINCT]select_expr, select_expr, ... FROM table_reference [WHERE where_condition] [GROUP BY col_list] [HAVING where_condition] [ORDER BY col_list] [CLUSTER BY col_list   | [DISTRIBUTE BY col_list] [SORT BY col_list] ] [LIMIT number] Copy code

1. ORDER BY is used for global sorting, that is, to perform global sorting on all the specified sort keys, use ORDER BY query statements, and finally use a Reduce Task to complete the global sorting. Explanation:

2. Sort by is used for sorting within the partition, that is, sorting within each Reduce task** . **, sort by only guarantees that the output of each reducer is in order, and does not guarantee global order.

3. ** distribute by ( ** field) divide the data into different reducers according to the specified field, and the distribution algorithm is hash.

4. ** cluster by ( ** field) In addition to the function of Distribute by, it also has the sorting function of sort by. .

Therefore, if the distribute by and sort by fields are the same, at this time, cluster by = distribute by + sort by

2. Full table query

select * from score; copy code

3. Select a specific column to query

select sid ,cid from score; copy code

4. Column alias

select sid as myid ,cid from score; copy code

5, commonly used functions

1) Find the total number of rows (count)  select count(1) from score; 2) Find the maximum value of the score (max)  select max(sscore) from score; 3) Find the minimum value of the score (min)  select min(sscore) from score; 4) Find the sum of scores (sum)  select sum(sscore) from score; 5) Find the average of scores (avg)  select avg(sscore) from score; Copy code

6, LIMIT statement

A typical query will return multiple rows of data. The LIMIT clause is used to limit the number of rows returned.

select * from score limit 3; copy the code

7, WHERE statement

1) Use the WHERE clause to filter out rows that do not meet the conditions.

2) The WHERE clause immediately follows the FROM clause.

3) Case practice

Query data with a score greater than 60

select * from score where sscore> 60; copy the code

2. the operator

1, comparison operators operators

1. Operator

OperatorSupported data typesdescription
A=BBasic data typeIf A is equal to B, it returns TRUE, otherwise it returns FALSE
A<=>BBasic data typeIf both A and B are NULL, it returns TRUE, the other is the same as the result of the equal sign (=) operator, if either is NULL, the result is NULL
A<>B, A!=BBasic data typeIf A or B is NULL, it returns NULL; if A is not equal to B, it returns TRUE, otherwise it returns FALSE
A****Basic data typeIf A or B is NULL, it returns NULL; if A is less than B, it returns TRUE, otherwise it returns FALSE
A<=BBasic data typeIf A or B is NULL, it returns NULL; if A is less than or equal to B, it returns TRUE, otherwise it returns FALSE
A>BBasic data typeIf A or B is NULL, it returns NULL; if A is greater than B, it returns TRUE, otherwise it returns FALSE
A>=BBasic data typeIf A or B is NULL, it returns NULL; if A is greater than or equal to B, it returns TRUE, otherwise it returns FALSE
A [NOT] BETWEEN B AND CBasic data typeIf any of A, B or C is NULL, the result is NULL. If the value of A is greater than or equal to B and less than or equal to C, the result is TRUE, otherwise it is FALSE. If the NOT keyword is used, the opposite effect can be achieved.
A IS NULLAll data typesIf A is NULL, it returns TRUE, otherwise it returns FALSE
A IS NOT NULLAll data typesIf A is not equal to NULL, it returns TRUE, otherwise it returns FALSE
IN (number 1, value 2)All data typesUse the IN operation to display the values in the list
A [NOT] LIKE BSTRING typeB is a simple regular expression in SQL. If A matches it, it returns TRUE; otherwise, it returns FALSE. The expression of B is explained as follows:'x%' means A must start with the letter'x','%x' means A must end with the letter'x', and'%x%' means A contains the letter'x', It can be at the beginning, end or in the middle of the string. If the NOT keyword is used, the opposite effect can be achieved.
A RLIKE B, A REGEXP BSTRING typeB is a regular expression. If A matches it, it returns TRUE; otherwise, it returns FALSE. The matching is implemented using the regular expression interface in the JDK, because the regular expression is also based on the rules. For example, the regular expression must match the entire string A, not just its string.

2. Case practice

(1) Query all data with a score equal to 80 select * from score where sscore = 80; (2) Query all data with a score of 80 to 100 select * from score where sscore between 80 and 100; (3) Query all data with empty results select * from score where sscore is null; (4) Query data whose score is 80 or 90 select * from score where sscore in(80,90); Copy code

3. LIKE and RLIKE

1) Use the LIKE operation to select similar values

2) The selection criteria can include characters or numbers:

% Represents zero or more characters (any number of characters).

_ Represents a character.

3) RLIKE clause is an extension of this function in Hive, which can specify matching conditions through Java's regular expression, a more powerful language.

4) Case practice

(1) Find all scores starting with 8 select * from score where sscore like '8%'; (2) Find all score data with the second value of 9 select * from score where sscore like'_9%'; (3) Find all score information with 1 in the id select * from score where sid rlike'[1]'; Copy code

2, logical operators operators

Operatormeaning
ANDLogical union
ORLogical OR
NOTLogical negation

Case practice

(1) Query data whose score is greater than 80 and sid is 01 select * from score where sscore >80 and sid = '01'; (2) The query result is greater than 80, or the sid is a number of 01 select * from score where sscore> 80 or sid = '01'; (3) Query students whose sid is not 01 and 02 select * from score where sid not in ('01','02'); Copy code

3. grouping

1. GROUP BY statement

The GROUP BY statement is usually used with aggregate functions to group by one or more queued results, and then perform aggregation operations on each group. Note that after using group by grouping, the fields after select can only be grouped fields and aggregate functions.

Case practice:

1) Calculate the average score of each student

select sid ,avg(sscore) from score group by sid; copy code

2) Calculate the highest score of each student

select sid ,max(sscore) from score group by sid; copy code

2, HAVING statement

1. The difference between having and where

(1) Where plays a role for the columns in the table to query data; having plays a role for the columns in the query results to filter data.

(2) The grouping function cannot be written after where, and the grouping function can be used after having.

(3) Having is only used for group by grouping statistics statement.

** 2. Case practice: **

- Find the average score of each student select sid ,avg(sscore) from score group by sid; - Find the people whose average score is greater than 85 for each student select sid ,avg(sscore) avgscore from score group by sid having avgscore> 85; Copy code

4. JOIN statement

Hive's join operation only supports equal joins

1, inner join (INNER JOIN)

Inner join: Only the data that matches the join conditions in the two tables that are connected will be retained.

select * from teacher t, course c where t.tid = c.tid; #Implicit inner connection select * from teacher t inner join course c on t.tid = c.tid; #Explicit inner join select * from teacher t join course c on t.tid = c.tid; Copy code

2, LEFT OUTER JOIN

Left outer join: All records in the table on the left side of the JOIN operator that meet the WHERE clause will be returned.

Query the courses corresponding to the teacher

select * from teacher t left join course c on t.tid = c.tid; Copy code

3, RIGHT OUTER JOIN

Right outer join: All records in the table to the right of the JOIN operator that meet the WHERE clause will be returned.

select * from teacher t right join course c on t.tid = c.tid; copy code

4, FULL OUTER JOIN

Full outer join: All records in all tables that meet the conditions of the WHERE statement will be returned. If the specified field of any table does not have a qualified value, then the NULL value is used instead.

SELECT * FROM teacher t FULL JOIN course c ON t.tid = c.tid ; Copy code

5, multi-table connection

Note: To connect n tables, at least n-1 connection conditions are required. For example: to connect three tables, at least two connection conditions are required.

Multi-table join query, query the teacher's corresponding course, and the corresponding score, the corresponding student

select * from teacher t left join course c on t.tid = c.tid left join score s on s.cid = c.cid left join student stu on s.sid = stu.sid; Copy code

In most cases, Hive will start a MapReduce task for each pair of JOIN connection objects. In this example, a MapReduce job will be started to connect the table teacher and the table course, and then a MapReduce job will be started to connect the output of the first MapReduce job with the table score; for the connection operation.

5. sort

1. Order By-Global Sorting

Order By: global ordering, one reduce

1. Sort using the ORDER BY clause

ASC (ascend): Ascending order (default)

DESC (descend): descending order

2. The ORDER BY clause is at the end of the SELECT statement.

3. Case practice

(1) Query the students' scores and sort them in descending order of scores

SELECT * FROM student s LEFT JOIN score sco ON s.sid = sco.sid ORDER BY sco.sscore DESC; duplicated code

(2) Sort according to the average of the scores

select sid ,avg(sscore) avg from score group by sid order by avg; Copy code

(3) Sort by student id and average grade

select sid ,avg(sscore) avg from score group by sid order by sid,avg; copy code

2, the Sort MapReduce inside each partial sorting By-

Sort By: Sorting is performed internally in each MapReduce, not sorting for the global result set.

1) Set the number of reduce  set mapreduce.job.reduces=3; 2) View the number of reduce settings  set mapreduce.job.reduces; 3) Query results are sorted in descending order of results  select * from score sort by sscore; 4) Import the query results into the file (arranged in descending order of grades)  insert overwrite local directory'/export/data/exporthive/sort' select * from score sort by sscore; Copy code

3, Distribute By-partition sort

Distribute By: Similar to partition in MR, it is partitioned and used in combination with sort by.

Note that Hive requires the DISTRIBUTE BY statement to be written before the SORT BY statement.

For the test of distribute by, it is necessary to allocate multiple reduce for processing, otherwise the effect of distribute by cannot be seen.

Case practice:

First is divided according to student id area , and then sorted according to student achievement.

1) Set the number of reduce, divide our corresponding sid into the corresponding reduce set mapreduce.job.reduces=7; 2) Partition of data through distribute by insert overwrite local directory'/export/data/exporthive/distribute' select * from score distribute by sid sort by sscore;   Copy code

4, Cluster By

When the distribute by and sort by fields are the same, the cluster by method can be used.

In addition to the function of distribute by, cluster by also has the function of sort by. But the sorting can only be in ascending order, and the sorting rule cannot be specified as ASC or DESC.

The following two ways of writing are equivalent:

select * from score cluster by sid;  select * from score distribute by sid sort by sid; Copy code

  • Blog homepage: lansonli.blog.csdn.net
  • Welcome to like it Collection Leave a message Please correct me if there is any mistake!
  • This article was originally written by Lansonli and first published on the CSDN blog
  • The series of big data articles will be updated every day. When you stop and rest, don t forget that others are still running. I hope you all hurry up to learn and go all out for a better life.