File not found exceptions on S3 while running spark jobs

File not found exceptions on S3 while running spark jobs

Nagendra Darla
Hello All,
I am converting an existing Parquet table (size: 50 GB) into Delta format. It took around 1 hour 45 minutes to convert.
I see a lot of FileNotFoundExceptions in the logs:
Caused by: java.io.FileNotFoundException: No such file or directory: s3a://old-data/delta-data/PL1/output/denorm_table/part-00031-183e54ef-50bc-46fc-83a3-7836baa28f86-c000.snappy.parquet
How do I fix these errors? I am using the options below in the spark-submit command:
spark-submit --packages io.delta:delta-core_2.11:0.6.0,org.apache.hadoop:hadoop-aws:2.8.5 --conf spark.delta.logStore.class=org.apache.spark.sql.delta.storage.S3SingleDriverLogStore --class Pipeline1 Pipeline.jar
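
For context, the conversion step itself is essentially the following (a minimal sketch assuming Spark 2.4 with Delta Lake 0.6.0, the versions from the command above; the actual Pipeline1 code may differ):

import io.delta.tables.DeltaTable
import org.apache.spark.sql.SparkSession

object Pipeline1 {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("Pipeline1")
      .getOrCreate()

    // Convert the existing Parquet directory in place to Delta format;
    // the path is the table directory from the error message.
    DeltaTable.convertToDelta(
      spark,
      "parquet.`s3a://old-data/delta-data/PL1/output/denorm_table`")

    spark.stop()
  }
}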
Thank You,
Nagendra Darla

Re: File not found exceptions on S3 while running spark jobs

Hulio andres
https://examples.javacodegeeks.com/java-io-filenotfoundexception-how-to-solve-file-not-found-exception/

Are you a programmer?

Regards,

Hulio




Re: File not found exceptions on S3 while running spark jobs

Nagendra Darla
Hi,

Thanks, I know about FileNotFoundException.

This error comes from S3 buckets having a delay in showing newly created files; the files eventually show up after some time.

These errors come up while converting a Parquet table into a Delta table.

My question is more about avoiding this error with Spark jobs that create/update/delete lots of files on S3 buckets.
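
To illustrate the delay: a freshly written object can return 404 for a while and then appear, so even a blunt retry with backoff around a read will eventually succeed (a hypothetical spark-shell sketch to show the behavior, not a real fix; Spark's own reads fail inside the executors, where a helper like this would not apply):

import java.io.FileNotFoundException
import org.apache.hadoop.fs.{FSDataInputStream, FileSystem, Path}

// Hypothetical helper: retry opening an S3 object that may 404 shortly after
// being written, doubling the delay on each attempt. Illustrative only.
def openWithRetries(fs: FileSystem, path: Path,
                    attempts: Int = 5, delayMs: Long = 1000): FSDataInputStream =
  try fs.open(path)
  catch {
    case _: FileNotFoundException if attempts > 1 =>
      Thread.sleep(delayMs)
      openWithRetries(fs, path, attempts - 1, delayMs * 2)
  }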

On Thu, Jul 16, 2020 at 10:28 PM Hulio andres <[hidden email]> wrote:
> [...]

--
Sent from iPhone

Re: File not found exceptions on S3 while running spark jobs

Slava Rodionov
Hi,
These are only my thoughts, not a solution, but I hope they help.

First of all, we need a full stack trace, not just the exception, to draw a conclusion.
I see you're using s3a. Where do you run your job? Is it EMR? Normally you need to make S3 more consistent before it is usable for this, which means putting a consistency layer in front of it, e.g. EMRFS consistent view on EMR or S3Guard on vanilla Hadoop; Databricks uses DBFS for the same purpose, and there are others. I'm not sure Delta Lake can work with S3 directly without such a layer, even though I can see in their code that they are trying to.
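
On vanilla Hadoop, turning on S3Guard would look roughly like this (a hedged sketch: S3Guard requires Hadoop 2.9+, so the hadoop-aws version is bumped from your 2.8.5 and must match the cluster's Hadoop, and the DynamoDB table name and region are illustrative):

spark-submit \
  --packages io.delta:delta-core_2.11:0.6.0,org.apache.hadoop:hadoop-aws:2.9.2 \
  --conf spark.delta.logStore.class=org.apache.spark.sql.delta.storage.S3SingleDriverLogStore \
  --conf spark.hadoop.fs.s3a.metadatastore.impl=org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore \
  --conf spark.hadoop.fs.s3a.s3guard.ddb.table=s3guard-metadata \
  --conf spark.hadoop.fs.s3a.s3guard.ddb.table.create=true \
  --conf spark.hadoop.fs.s3a.s3guard.ddb.region=us-east-1 \
  --class Pipeline1 Pipeline.jar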

Best regards,
Viacheslav Rodionov


On Fri, 17 Jul 2020, 18:03 Nagendra Darla, <[hidden email]> wrote:
> [...]

Re: File not found exceptions on S3 while running spark jobs

Hulio andres
Most likely it's a directory write-permission problem rather than a missing file: the app user doesn't have permission to write files to that directory.


> Sent: Friday, July 17, 2020 at 6:03 PM
> From: "Nagendra Darla" <[hidden email]>
> [...]
