Pig installation and basic use

Pig installation

Download and unzip the installation package

Download the latest Pig release package from the Apache Pig download page. Clicking the download link leads to a page that recommends the fastest mirror site for your location.

Configure the environment

Unzip the package to the installation path, then edit the /etc/profile file. The variables to add depend on which working mode Pig will run in.

Pig working modes

Local mode: only the PATH environment variable needs to be configured, by adding ${PIG_HOME}/bin to it. This mode is suitable for testing.

MapReduce mode: also add the environment variable PIG_CLASSPATH, pointing to Hadoop's configuration directory (${HADOOP_HOME}/conf/ on older Hadoop layouts). I use Hadoop 2.6, where this directory is /usr/local/hadoop/etc/hadoop.

sudo vi /etc/profile

Add the following lines:

export PIG_HOME=/app/pig-0.13.0
export PIG_CLASSPATH=/usr/local/hadoop/etc/hadoop
export PATH=$PATH:$PIG_HOME/bin
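After reloading the profile (for example with source /etc/profile), a quick check can confirm which working mode the environment is set up for. The following is only a sketch: pig_mode is a hypothetical helper name, not part of Pig, and it simply applies the two rules described above (PATH for local mode, PIG_CLASSPATH for MapReduce mode).

```shell
#!/bin/sh
# pig_mode: hypothetical helper (not part of Pig) that reports which
# working mode the current environment variables are configured for.
pig_mode() {
    # Both modes need ${PIG_HOME}/bin on PATH.
    case ":$PATH:" in
        *":${PIG_HOME:-/nonexistent}/bin:"*) ;;
        *) echo "unconfigured"; return 0 ;;
    esac
    # MapReduce mode additionally needs PIG_CLASSPATH to point at an
    # existing directory (Hadoop's configuration directory).
    if [ -n "${PIG_CLASSPATH:-}" ] && [ -d "$PIG_CLASSPATH" ]; then
        echo "mapreduce"
    else
        echo "local"
    fi
}

pig_mode
```

If this prints local, the grunt shell can be started with pig -x local; if it prints mapreduce, pig -x mapreduce (or plain pig) runs against the cluster.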

Basic use

Copy the test data to HDFS (test data download):

hadoop fs -put ncdc_data.txt /input/in1
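Pig's LOAD statement needs the full HDFS path of the uploaded file, so it helps to know exactly where -put places it. The sketch below assumes /input/in1 is meant to be a directory. hdfs_target is a hypothetical helper, and the hadoop commands run only if a hadoop client is installed. Creating the directory first is a precaution: if it does not exist, -put may store the data as a file named /input/in1 rather than inside it.

```shell
#!/bin/sh
# hdfs_target: hypothetical helper, not a Hadoop command.
# $1 = local file name, $2 = HDFS directory; prints the resulting HDFS path.
hdfs_target() {
    printf '%s/%s\n' "${2%/}" "$(basename "$1")"
}

target=$(hdfs_target ncdc_data.txt /input/in1)
echo "LOAD path: $target"

# Only touch HDFS when a hadoop client is actually available.
if command -v hadoop >/dev/null 2>&1; then
    hadoop fs -mkdir -p /input/in1            # make sure the directory exists
    hadoop fs -put ncdc_data.txt /input/in1   # the file lands at $target
    hadoop fs -ls "$target"                   # confirm before running LOAD
fi
```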

Use Pig Latin to find the annual maximum temperature

1. Load weather data

The path must match the location the file was uploaded to. My first attempt used the wrong path, and the file could not be found.

grunt> A = LOAD '/input/in1/ncdc_data.txt' USING PigStorage(':') AS (year:int, temp:int, quality:int);

2. Filter data

grunt> B = FILTER A BY temp != 9999 AND ((chararray)quality matches '[01459]');

Or, equivalently:

grunt> B = FILTER A BY temp != 9999 AND (quality == 0 OR quality == 1 OR quality == 4 OR quality == 5 OR quality == 9);
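The two FILTER forms select the same rows. This can be checked locally with awk on a few made-up rows, without starting Pig. It is only a sketch: the sample rows are hypothetical and assume the year:temp:quality layout from the LOAD schema. Pig's matches operator has to match the whole string, which is why the awk pattern is anchored with ^ and $.

```shell
#!/bin/sh
# Hypothetical sample rows in the assumed year:temp:quality layout.
sample() {
    printf '%s\n' '1901:317:1' '1901:9999:1' '1902:261:4' '1902:300:2' '1903:278:14'
}

# Form 1: regular expression on the quality field (anchored, like Pig's matches).
filter_regex() {
    awk -F: '$2 != 9999 && $3 ~ /^[01459]$/'
}

# Form 2: explicit comparison against each accepted quality code.
filter_or() {
    awk -F: '$2 != 9999 && ($3 == 0 || $3 == 1 || $3 == 4 || $3 == 5 || $3 == 9)'
}

sample | filter_regex
```

Both functions keep only 1901:317:1 and 1902:261:4. The row with temp 9999 and the rows with quality 2 and 14 are dropped.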

Group weather data by year

grunt> C = GROUP B BY year;

For each group, output the year (the group key) and the maximum temperature recorded in that year

grunt> D = FOREACH C GENERATE group, MAX(B.temp) AS max_temp;
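To make the GROUP and MAX steps concrete, the same computation can be sketched locally with awk: keep one running maximum per year, then print one (year,max) pair per group. The input rows are hypothetical and assumed to be already filtered. max_by_year is an illustrative name, not a Pig or Hadoop command.

```shell
#!/bin/sh
# max_by_year: hypothetical local stand-in for GROUP ... BY year
# followed by MAX(temp). Reads year:temp:quality rows on stdin.
max_by_year() {
    awk -F: '
        # First row of a year, or a higher temperature: update that year.
        !($1 in best) || $2 + 0 > best[$1] { best[$1] = $2 + 0 }
        # One output tuple per group, in the same shape DUMP prints.
        END { for (y in best) printf "(%s,%d)\n", y, best[y] }
    ' | sort
}

printf '%s\n' '1901:317:1' '1901:250:1' '1902:261:4' '1902:244:1' | max_by_year
```

On the four sample rows this prints (1901,317) and (1902,261), one tuple per year.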

Output result

grunt> DUMP D;
2016-11-20 06:02:41,902 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: GROUP_BY,FILTER
2016-11-20 06:02:42,053 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2016-11-20 06:02:42,054 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2016-11-20 06:02:42,067 [main] INFO org.apache.pig.data.SchemaTupleBackend - Key [pig.schematuple] was not set... will not generate code.
2016-11-20 06:02:42,069 [main] INFO org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, ConstantCalculator, GroupByConstParallelSetter, LimitOptimizer, LoadTypeCastInserterOptimizer, PartitionFilterForEach, PredicatePushdownOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter]}
2016-11-20 06:02:42,107 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2016-11-20 06:02:42,114 [main] INFO org.apache.pig.backend.hadoop.executionengine.util.CombinerOptimizerUtil - Choosing to move algebraic foreach to combiner
2016-11-20 06:02:42,140 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2016-11-20 06:02:42,140 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2016-11-20 06:02:42,241 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2016-11-20 06:02:42,250 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at TEST/192.168.1.124:8032
2016-11-20 06:02:42,263 [main] INFO org.apache.pig.tools.pigstats.mapreduce.MRScriptState - Pig script settings are added to the job
2016-11-20 06:02:42,278 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2016-11-20 06:02:42,280 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Reduce phase detected, estimating # of required reducers.
2016-11-20 06:02:42,280 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Using reducer estimator: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator
2016-11-20 06:02:42,308 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator - BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=3673672
2016-11-20 06:02:42,308 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting Parallelism to 1
2016-11-20 06:02:42,308 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - This job cannot be converted run in-process
2016-11-20 06:02:43,095 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/pig/pig-0.16.0-core-h2.jar to DistributedCache through /tmp/temp-60624248/tmp72750994/pig-0.16.0-core-h2.jar
2016-11-20 06:02:43,367 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/pig/lib/automaton-1.11-8.jar to DistributedCache through /tmp/temp-60624248/tmp-2105835473/automaton-1.11-8.jar
2016-11-20 06:02:43,518 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/pig/lib/antlr-runtime-3.4.jar to DistributedCache through /tmp/temp-60624248/tmp1218719075/antlr-runtime-3.4.jar
2016-11-20 06:02:43,701 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/pig/lib/joda-time-2.9.3.jar to DistributedCache through /tmp/temp-60624248/tmp-2048402576/joda-time-2.9.3.jar
2016-11-20 06:02:43,707 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2016-11-20 06:02:43,710 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Key [pig.schematuple] is false, will not generate code.
2016-11-20 06:02:43,710 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Starting process to move generated code to distributed cacche
2016-11-20 06:02:43,710 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Setting key [pig.schematuple.classes] with classes to deserialize []
2016-11-20 06:02:43,840 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2016-11-20 06:02:43,847 [JobControl] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at TEST/192.168.1.124:8032
2016-11-20 06:02:44,029 [JobControl] WARN org.apache.hadoop.mapreduce.JobSubmitter - No job jar file set. User classes may not be found. See Job or Job#setJar(String).
2016-11-20 06:02:44,159 [JobControl] INFO org.apache.pig.builtin.PigStorage - Using PigTextInputFormat
2016-11-20 06:02:44,172 [JobControl] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process: 1
2016-11-20 06:02:44,172 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process: 1
2016-11-20 06:02:44,350 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process: 1
2016-11-20 06:02:44,709 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - number of splits:1
2016-11-20 06:02:47,105 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - Submitting tokens for job: job_1479576092520_0006
2016-11-20 06:02:47,816 [JobControl] INFO org.apache.hadoop.mapred.YARNRunner - Job jar is not present. Not adding any jar to the list of resources.
2016-11-20 06:02:53,694 [JobControl] INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted application application_1479576092520_0006
2016-11-20 06:02:54,016 [JobControl] INFO org.apache.hadoop.mapreduce.Job - The url to track the job: http://TEST:8088/proxy/application_1479576092520_0006/
2016-11-20 06:02:54,017 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_1479576092520_0006
2016-11-20 06:02:54,017 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases A,B,C,D
2016-11-20 06:02:54,017 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: A[5,4],A[-1,-1],B[6,4],D[8,4],C[7,4] C: D[8,4],C[7,4] R: D[8,4]
2016-11-20 06:02:54,252 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2016-11-20 06:02:54,252 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1479576092520_0006]
2016-11-20 06:04:53,944 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 5% complete
2016-11-20 06:04:53,945 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1479576092520_0006]
2016-11-20 06:04:56,974 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 21% complete
2016-11-20 06:04:56,978 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1479576092520_0006]
2016-11-20 06:05:04,031 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 33% complete
2016-11-20 06:05:04,031 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1479576092520_0006]
2016-11-20 06:05:24,319 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 50% complete
2016-11-20 06:05:24,320 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1479576092520_0006]
2016-11-20 06:10:06,870 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 66% complete
2016-11-20 06:10:06,870 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1479576092520_0006]
2016-11-20 06:10:14,258 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 83% complete
2016-11-20 06:10:14,258 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1479576092520_0006]
2016-11-20 06:10:22,325 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1479576092520_0006]
2016-11-20 06:11:03,514 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at TEST/192.168.1.124:8032
2016-11-20 06:11:03,646 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2016-11-20 06:11:49,363 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at TEST/192.168.1.124:8032
2016-11-20 06:11:49,434 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2016-11-20 06:11:49,883 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at TEST/192.168.1.124:8032
2016-11-20 06:11:49,910 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2016-11-20 06:11:50,354 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2016-11-20 06:11:50,367 [main] INFO org.apache.pig.tools.pigstats.mapreduce.SimplePigStats - Script Statistics:

HadoopVersion PigVersion UserId StartedAt FinishedAt Features
2.6.0 0.16.0 chb 2016-11-20 06:02:42 2016-11-20 06:11:50 GROUP_BY,FILTER

Success!

Job Stats (time in seconds):
JobId Maps Reduces MaxMapTime MinMapTime AvgMapTime MedianMapTime MaxReduceTime MinReduceTime AvgReduceTime MedianReducetime Alias Feature Outputs
job_1479576092520_0006 1 1 88 88 88 88 310 310 310 310 A,B,C,D GROUP_BY,COMBINER hdfs://192.168.1.124:9000/tmp/temp-60624248/tmp-1087782019,

Input(s):
Successfully read 321146 records (3674048 bytes) from: "/input/in1/ncdc_data.txt"

Output(s):
Successfully stored 43 records (430 bytes) in: "hdfs://192.168.1.124:9000/tmp/temp-60624248/tmp-1087782019"

Counters:
Total records written: 43
Total bytes written: 430
Spillable Memory Manager spill count: 0
Total bags proactively spilled: 0
Total records proactively spilled: 0

Job DAG:
job_1479576092520_0006

2016-11-20 06:11:50,377 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at TEST/192.168.1.124:8032
2016-11-20 06:11:50,397 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2016-11-20 06:11:50,554 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at TEST/192.168.1.124:8032
2016-11-20 06:11:50,573 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2016-11-20 06:11:51,275 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at TEST/192.168.1.124:8032
2016-11-20 06:11:51,349 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2016-11-20 06:11:52,066 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
2016-11-20 06:11:52,068 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2016-11-20 06:11:52,069 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2016-11-20 06:11:52,070 [main] INFO org.apache.pig.data.SchemaTupleBackend - Key [pig.schematuple] was not set... will not generate code.
2016-11-20 06:11:52,528 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process: 1
2016-11-20 06:11:52,528 [main] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process: 1
(1901,317)
(1902,261)
(1903,278)
(1904,194)
(1905,278)
(1906,283)
(1907,300)
(1908,322)
(1909,350)
(1910,322)
(1911,322)
(1912,411)
(1913,361)
(1914,378)
(1915,411)
(1916,289)
(1917,478)
(1918,450)
(1919,428)
(1920,344)
(1921,417)
(1922,400)
(1923,394)
(1924,456)
(1925,322)
(1926,411)
(1928,161)
(1929,178)
(1930,311)
(1931,450)
(1932,322)
(1933,411)
(1934,300)
(1935,311)
(1936,389)
(1937,339)
(1938,411)
(1939,433)
(1940,433)
(1941,462)
(1942,278)
(1949,367)
(1953,400)
grunt>

Store results to file

The output path is relative, so Pig resolves it against the user's HDFS home directory; the job statistics below report it as hdfs://192.168.1.124:9000/user/chb/max_temp.

grunt> STORE D INTO 'max_temp' USING PigStorage(':');
2016-11-20 06:28:32,644 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2016-11-20 06:28:32,645 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2016-11-20 06:28:32,925 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.textoutputformat.separator is deprecated. Instead, use mapreduce.output.textoutputformat.separator
2016-11-20 06:28:33,159 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: GROUP_BY,FILTER
2016-11-20 06:28:33,444 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2016-11-20 06:28:33,444 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2016-11-20 06:28:33,447 [main] INFO org.apache.pig.data.SchemaTupleBackend - Key [pig.schematuple] was not set... will not generate code.
2016-11-20 06:28:33,448 [main] INFO org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, ConstantCalculator, GroupByConstParallelSetter, LimitOptimizer, LoadTypeCastInserterOptimizer, PartitionFilterForEach, PredicatePushdownOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter]}
2016-11-20 06:28:33,496 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2016-11-20 06:28:33,520 [main] INFO org.apache.pig.backend.hadoop.executionengine.util.CombinerOptimizerUtil - Choosing to move algebraic foreach to combiner
2016-11-20 06:28:33,546 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2016-11-20 06:28:33,546 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2016-11-20 06:28:33,751 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2016-11-20 06:28:33,773 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at TEST/192.168.1.124:8032
2016-11-20 06:28:33,781 [main] INFO org.apache.pig.tools.pigstats.mapreduce.MRScriptState - Pig script settings are added to the job
2016-11-20 06:28:33,804 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2016-11-20 06:28:33,806 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Reduce phase detected, estimating # of required reducers.
2016-11-20 06:28:33,806 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Using reducer estimator: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator
2016-11-20 06:28:33,826 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator - BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=3673672
2016-11-20 06:28:33,826 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting Parallelism to 1
2016-11-20 06:28:33,826 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - This job cannot be converted run in-process
2016-11-20 06:28:36,502 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/pig/pig-0.16.0-core-h2.jar to DistributedCache through /tmp/temp-60624248/tmp-1199985731/pig-0.16.0-core-h2.jar
2016-11-20 06:28:36,765 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/pig/lib/automaton-1.11-8.jar to DistributedCache through /tmp/temp-60624248/tmp721246289/automaton-1.11-8.jar
2016-11-20 06:28:37,076 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/pig/lib/antlr-runtime-3.4.jar to DistributedCache through /tmp/temp-60624248/tmp341502194/antlr-runtime-3.4.jar
2016-11-20 06:28:37,560 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/pig/lib/joda-time-2.9.3.jar to DistributedCache through /tmp/temp-60624248/tmp-587981636/joda-time-2.9.3.jar
2016-11-20 06:28:37,567 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2016-11-20 06:28:37,574 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Key [pig.schematuple] is false, will not generate code.
2016-11-20 06:28:37,574 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Starting process to move generated code to distributed cacche
2016-11-20 06:28:37,574 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Setting key [pig.schematuple.classes] with classes to deserialize []
2016-11-20 06:28:37,907 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2016-11-20 06:28:37,943 [JobControl] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at TEST/192.168.1.124:8032
2016-11-20 06:28:38,104 [JobControl] WARN org.apache.hadoop.mapreduce.JobSubmitter - No job jar file set. User classes may not be found. See Job or Job#setJar(String).
2016-11-20 06:28:38,208 [JobControl] INFO org.apache.pig.builtin.PigStorage - Using PigTextInputFormat
2016-11-20 06:28:38,233 [JobControl] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process: 1
2016-11-20 06:28:38,234 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process: 1
2016-11-20 06:28:38,249 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process: 1
2016-11-20 06:28:38,887 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - number of splits:1
2016-11-20 06:28:39,586 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - Submitting tokens for job: job_1479576092520_0007
2016-11-20 06:28:39,610 [JobControl] INFO org.apache.hadoop.mapred.YARNRunner - Job jar is not present. Not adding any jar to the list of resources.
2016-11-20 06:28:39,843 [JobControl] INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted application application_1479576092520_0007
2016-11-20 06:28:39,945 [JobControl] INFO org.apache.hadoop.mapreduce.Job - The url to track the job: http://TEST:8088/proxy/application_1479576092520_0007/
2016-11-20 06:28:39,945 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_1479576092520_0007
2016-11-20 06:28:39,947 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases A,B,C,D
2016-11-20 06:28:39,947 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: A[5,4],A[-1,-1],B[6,4],D[8,4],C[7,4] C: D[8,4],C[7,4] R: D[8,4]
2016-11-20 06:28:40,011 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2016-11-20 06:28:40,011 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1479576092520_0007]
2016-11-20 06:30:39,691 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 9% complete
2016-11-20 06:30:39,704 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1479576092520_0007]
2016-11-20 06:30:44,340 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 33% complete
2016-11-20 06:30:44,340 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1479576092520_0007]
2016-11-20 06:30:54,464 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 50% complete
2016-11-20 06:30:54,465 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1479576092520_0007]
2016-11-20 06:32:23,937 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 83% complete
2016-11-20 06:32:23,937 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1479576092520_0007]
2016-11-20 06:32:29,164 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1479576092520_0007]
2016-11-20 06:32:51,921 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at TEST/192.168.1.124:8032
2016-11-20 06:32:52,670 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2016-11-20 06:33:02,007 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at TEST/192.168.1.124:8032
2016-11-20 06:33:02,123 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2016-11-20 06:33:02,537 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at TEST/192.168.1.124:8032
2016-11-20 06:33:02,561 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2016-11-20 06:33:02,822 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2016-11-20 06:33:02,824 [main] INFO org.apache.pig.tools.pigstats.mapreduce.SimplePigStats - Script Statistics:

HadoopVersion PigVersion UserId StartedAt FinishedAt Features
2.6.0 0.16.0 chb 2016-11-20 06:28:33 2016-11-20 06:33:02 GROUP_BY,FILTER

Success!

Job Stats (time in seconds):
JobId Maps Reduces MaxMapTime MinMapTime AvgMapTime MedianMapTime MaxReduceTime MinReduceTime AvgReduceTime MedianReducetime Alias Feature Outputs
job_1479576092520_0007 1 1 35 35 35 35 97 97 97 97 A,B,C,D GROUP_BY,COMBINER hdfs://192.168.1.124:9000/user/chb/max_temp,

Input(s):
Successfully read 321146 records (3674048 bytes) from: "/input/in1/ncdc_data.txt"

Output(s):
Successfully stored 43 records (387 bytes) in: "hdfs://192.168.1.124:9000/user/chb/max_temp"

Counters:
Total records written: 43
Total bytes written: 387
Spillable Memory Manager spill count: 0
Total bags proactively spilled: 0
Total records proactively spilled: 0

Job DAG:
job_1479576092520_0007

2016-11-20 06:33:02,847 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at TEST/192.168.1.124:8032
2016-11-20 06:33:02,884 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2016-11-20 06:33:03,175 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at TEST/192.168.1.124:8032
2016-11-20 06:33:03,209 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2016-11-20 06:33:03,469 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at TEST/192.168.1.124:8032
2016-11-20 06:33:03,491 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2016-11-20 06:33:03,725 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
grunt>

View the results:

grunt> cat max_temp
1901:317
1902:261
1903:278
1904:194
1905:278
1906:283
1907:300
1908:322
1909:350
1910:322
1911:322
1912:411
1913:361
1914:378
1915:411
1916:289
1917:478
1918:450
1919:428
1920:344
1921:417
1922:400
1923:394
1924:456
1925:322
1926:411
1928:161
1929:178
1930:311
1931:450
1932:322
1933:411
1934:300
1935:311
1936:389
1937:339
1938:411
1939:433
1940:433
1941:462
1942:278
1949:367
1953:400
grunt>
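
Because the stored file is plain year:max_temp text, ordinary command-line tools can post-process it once it has been copied out of HDFS. As a small sketch, the snippet below ranks a handful of the lines shown above by temperature. results and hottest are hypothetical helper names, and only a subset of the 43 lines is included to keep the example short.

```shell
#!/bin/sh
# A few year:max_temp lines taken from the cat output of max_temp.
results() {
    printf '%s\n' '1901:317' '1917:478' '1924:456' '1928:161' '1941:462' '1953:400'
}

# hottest N: sort numerically, descending, on the second colon-separated
# field and keep the first N lines.
hottest() {
    sort -t: -k2,2nr | head -n "$1"
}

results | hottest 3
```

For this subset the three highest entries are 1917:478, 1941:462 and 1924:456.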