Practical Hive : a guide to Hadoop's data warehouse system
Table of Contents:
- At a Glance; Contents; About the Authors; About the Technical Reviewers; Acknowledgments; Introduction; Chapter 1: Setting the Stage for Hive: Hadoop; An Elephant Is Born; Hadoop Mechanics; Data Redundancy; Traditional High Availability; Hadoop High Availability; Processing with MapReduce; Beyond MapReduce; YARN and the Modern Data Architecture; Hadoop and the Open Source Community; Where Are We Now; Chapter 2: Introducing Hive; Hadoop Distributions; Cluster Architecture; Hive Installation; Finding Your Way Around; Hive CLI; Chapter 3: Hive Architecture; Hive Components; HCatalog.
- HiveServer2; Client Tools; Execution Engine: Tez; Chapter 4: Hive Tables DDL; Schema-on-Read; Hive Data Model; Schemas/Databases; Why Use Multiple Schemas/Databases; Creating Databases; Altering Databases; Dropping Databases; List Databases; Data Types in Hive; Primitive Data Types; Choosing Data Types; Complex Data Types; Arrays; Maps; Structs; Unions; Tables; Creating Tables; Listing Tables; Internal/External Tables; External Tables; Internal or Managed Tables; External/Internal Table Example; Table Properties; Generating a Create Table Command for Existing Tables; Partitioning and Bucketing.
- Partitioning; Partitioning Considerations; Efficiently Partitioning on Date Columns; Bucketing; Bucketing Considerations; Temporary Tables; Altering Tables; Renaming Tables; Modifying a Table's Storage Properties; ORC File Format; Merging a Table's Files; Altering Table Partitions; Add Partition; Rename Partition; Modifying Columns; Adding Columns; Dropping Tables/Partitions; Drop Tables; Dropping Partitions; Protecting Tables/Partitions; Other Create Table Command Options; Create Table as Select (CTAS); Create Table Like; Chapter 5: Data Manipulation Language (DML); Loading Data into Tables.
- Loading Data Using Files Stored on the Hadoop Distributed File System; Using Hive to Upload a Data File; Loading Data Using Queries; Using an Existing Table to Create a New Table; Writing Data into the File System from Queries; Using an Existing Table to Create an Output Directory; Inserting Values Directly into Tables; Adding Extra Records to an Existing Table; Updating Data Directly in Tables; Updating Records in an Existing Table; Deleting Data Directly in Tables; Updating Records in an Existing Table; Creating a Table with the Same Structure.
- Using an Existing Table to Create a New Table with the Same Structure; Joins; Using Equality Joins to Combine Tables; Joining Tables in Hive; Using Outer Joins; Joining Tables in Hive Using Left Join; Joining Tables in Hive Using Right Join; Joining Tables in Hive Using a Full Outer Join; Using Left Semi-Joins; Performing a Semi-Join; Using Join with Single MapReduce; Joining Three Tables in One MapReduce; Using Largest Table Last; Transactions; What Is ACID and Why Use It?; Hive Configuration; Chapter 6: Loading Data into Hive; Design Considerations Before Loading Data; Loading Data into HDFS.