How to use 'insert overwrite [local] directory' correctly?


How to use 'insert overwrite [local] directory' correctly?

Bang Xiao
Spark 2.3.0 supports INSERT OVERWRITE DIRECTORY for writing query results
directly to the filesystem.

I ran into a problem with this SQL:

"INSERT OVERWRITE DIRECTORY '/tmp/test-insert-spark'
SELECT vrid, query, url, loc_city FROM custom.common_wap_vr
WHERE logdate >= '2018073000' AND logdate <= '2018073023' AND vrid = '11000801'
GROUP BY vrid, query, loc_city, url;"

This creates an empty file /tmp/test-insert-spark in HDFS rather than a
directory.

But if I add 'USING json' to the SQL:

"INSERT OVERWRITE DIRECTORY '/tmp/test-insert-spark'
USING json
SELECT vrid, query, url, loc_city FROM custom.common_wap_vr
WHERE logdate >= '2018073000' AND logdate <= '2018073023' AND vrid = '11000801'
GROUP BY vrid, query, loc_city, url;"

it creates the /tmp/test-insert-spark directory correctly and writes JSON
files into it.

Am I using it the wrong way? Is there detailed documentation on how to use
it?
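For comparison, the two variants can be kept side by side in .sql files. A minimal sketch follows; the spark-sql runs are commented out because they assume a Spark 2.3+ install and the custom.common_wap_vr table from the post, and the file names (/tmp/no_format.sql, /tmp/with_format.sql) are made up for the demo:

```shell
# Write both variants of the query to files for side-by-side comparison.
# The only difference is the USING json clause naming an output format.
cat > /tmp/no_format.sql <<'SQL'
INSERT OVERWRITE DIRECTORY '/tmp/test-insert-spark'
SELECT vrid, query, url, loc_city FROM custom.common_wap_vr
WHERE logdate >= '2018073000' AND logdate <= '2018073023' AND vrid = '11000801'
GROUP BY vrid, query, loc_city, url;
SQL

cat > /tmp/with_format.sql <<'SQL'
INSERT OVERWRITE DIRECTORY '/tmp/test-insert-spark'
USING json
SELECT vrid, query, url, loc_city FROM custom.common_wap_vr
WHERE logdate >= '2018073000' AND logdate <= '2018073023' AND vrid = '11000801'
GROUP BY vrid, query, loc_city, url;
SQL

# On a cluster (not runnable here):
# spark-sql -f /tmp/no_format.sql     # left an empty file in the author's test
# spark-sql -f /tmp/with_format.sql   # produced a directory of JSON part files
```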
 



--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/

---------------------------------------------------------------------
To unsubscribe e-mail: [hidden email]


Re: How to use 'insert overwrite [local] directory' correctly?

Bang Xiao
Spark needs the target directory to be created first, while Hive creates the
directory automatically.
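A minimal sketch of that workaround: create the target directory before running the query. The hdfs command assumes an HDFS client on the PATH, so the runnable part below uses a plain local path instead:

```shell
# Pre-create the output directory so INSERT OVERWRITE DIRECTORY has an
# existing directory to write into rather than a path it must create.
TARGET=/tmp/test-insert-spark
mkdir -p "$TARGET"   # on a real cluster: hdfs dfs -mkdir -p /tmp/test-insert-spark
test -d "$TARGET" && echo "directory ready"
```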



--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/

---------------------------------------------------------------------
To unsubscribe e-mail: [hidden email]


Re: How to use 'insert overwrite [local] directory' correctly?

Bang Xiao
In reply to this post by Bang Xiao
I solved the problem by creating the directory on HDFS before executing the
SQL, but I hit a new error when I run:

INSERT OVERWRITE LOCAL DIRECTORY '/search/odin/test'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
SELECT vrid, query, url, loc_city FROM custom.common_wap_vr
WHERE logdate >= '2018073000' AND logdate <= '2018073023' AND vrid = '11000801'
GROUP BY vrid, query, loc_city, url;

The Spark command is: spark-sql --master yarn --deploy-mode client -f test.sql

Here is the stack trace:
18/08/27 17:16:21 ERROR util.Utils: Aborting task
org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: Mkdirs failed to create file:/user/hive/datadir-tmp_hive_2018-08-27_17-14-45_908_2829491226961893146-1/-ext-10000/_temporary/0/_temporary/attempt_20180827171619_0002_m_000000_0 (exists=false, cwd=file:/search/hadoop09/yarn_local/usercache/ultraman/appcache/application_1535079600137_133521/container_e09_1535079600137_133521_01_000051)
        at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getHiveRecordWriter(HiveFileFormatUtils.java:249)
        at org.apache.spark.sql.hive.execution.HiveOutputWriter.<init>(HiveFileFormat.scala:123)
        at org.apache.spark.sql.hive.execution.HiveFileFormat$$anon$1.newInstance(HiveFileFormat.scala:103)
        at org.apache.spark.sql.execution.datasources.FileFormatWriter$SingleDirectoryWriteTask.newOutputWriter(FileFormatWriter.scala:367)
        at org.apache.spark.sql.execution.datasources.FileFormatWriter$SingleDirectoryWriteTask.execute(FileFormatWriter.scala:378)
        at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:269)
        at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:267)
        at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1414)
        at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:272)
        at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:197)
        at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:196)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
        at org.apache.spark.scheduler.Task.run(Task.scala:109)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: Mkdirs failed to create file:/user/hive/datadir-tmp_hive_2018-08-27_17-14-45_908_2829491226961893146-1/-ext-10000/_temporary/0/_temporary/attempt_20180827171619_0002_m_000000_0 (exists=false, cwd=file:/search/hadoop09/yarn_local/usercache/ultraman/appcache/application_1535079600137_133521/container_e09_1535079600137_133521_01_000051)
        at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:447)
        at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:433)
        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:925)
        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:818)
        at org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat.getHiveRecordWriter(HiveIgnoreKeyTextOutputFormat.java:80)
        at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getRecordWriter(HiveFileFormatUtils.java:261)
        at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getHiveRecordWriter(HiveFileFormatUtils.java:246)
        ... 16 more
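One detail worth noting in the trace: the failing path uses the file: scheme, so the Hive staging directory under /user/hive/ was being resolved on the task's local filesystem, not on HDFS, and the mkdirs call there failed. As a rough illustration of one way a Hadoop-style mkdirs can fail (the paths below are made up for the demo, and a plain file blocking a path component is only one possible mechanism, not necessarily the exact cause on this cluster):

```shell
# Simulate a mkdirs failure: a plain file sits where a directory is needed,
# so creating the staging subtree beneath it must fail.
demo=$(mktemp -d)
touch "$demo/staging"                      # file, not directory
if mkdir -p "$demo/staging/_temporary/0" 2>/dev/null; then
  echo "mkdirs succeeded"
else
  echo "Mkdirs failed to create $demo/staging/_temporary/0"
fi
rm -rf "$demo"
```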



--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/

---------------------------------------------------------------------
To unsubscribe e-mail: [hidden email]


Re: How to use 'insert overwrite [local] directory' correctly?

Xiao Li
Open a JIRA?

Bang Xiao <[hidden email]> wrote on Monday, Aug 27, 2018 at 2:46 AM:
[quoted message elided; identical to the previous post in the thread]



--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/

---------------------------------------------------------------------
To unsubscribe e-mail: [hidden email]