How Fault Tolerance is achieved in Spark ??


How Fault Tolerance is achieved in Spark ??

NikhilP

Hello Techie’s,

 

How fault tolerance is achieved in Spark when data is read from HDFS and is in form of RDD (Memory).

 

Regards

Nikhil


"Confidentiality Warning: This message and any attachments are intended only for the use of the intended recipient(s), are confidential and may be privileged. If you are not the intended recipient, you are hereby notified that any review, re-transmission, conversion to hard copy, copying, circulation or other use of this message and any attachments is strictly prohibited. If you are not the intended recipient, please notify the sender immediately by return email and delete this message and any attachments from your system.

Virus Warning: Although the company has taken reasonable precautions to ensure no viruses are present in this email. The company cannot accept responsibility for any loss or damage arising from the use of this email or attachment."


Re: How Fault Tolerance is achieved in Spark ??

Naresh Dulam
Hi Nikhil,


Fault tolerance means data is not lost in case of failures, and it is achieved in different ways in different systems.
In HDFS, fault tolerance is achieved by replicating each block across different nodes.
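For reference, the per-file replication factor in HDFS is controlled by the `dfs.replication` property (the usual default is 3), set in `hdfs-site.xml`:

```xml
<!-- hdfs-site.xml: each HDFS block is stored on this many DataNodes,
     so losing a node does not lose the data -->
<property>
  <name>dfs.replication</name>
  <value>3</value>
</property>
```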
In Spark, fault tolerance is achieved through the DAG (lineage). Let me put it in simple words:
you create RDD1 by reading data from HDFS, then apply a couple of transformations to produce two new RDDs:

RDD1-->RDD2--> RDD3.

Let's assume you have cached RDD3, and after some time RDD3 is evicted from the cache to make room for a newly created and cached RDD4.

Now you want to access RDD3, but it is no longer in the cache. Spark uses the lineage recorded in the DAG to recompute RDD3 from its parents. In this way, the data in RDD3 is always available.
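The recovery idea above can be sketched with a small toy model (illustrative only; `ToyRDD` is a made-up class, not Spark's API): each "RDD" remembers its parent and the transformation that produced it, so an evicted result can be recomputed by walking back along the lineage chain.

```python
# Toy model of lineage-based recovery, mirroring the RDD1 --> RDD2 --> RDD3
# example above. This is NOT Spark's API, just the idea behind it.

class ToyRDD:
    def __init__(self, data=None, parent=None, transform=None):
        self.parent = parent        # upstream RDD in the lineage (a DAG edge)
        self.transform = transform  # function that produced this RDD from its parent
        self._data = data           # materialized data, if currently cached

    def compute(self):
        """Return the data, recomputing from lineage if it is not cached."""
        if self._data is None:
            # Cache miss: walk back along the lineage and recompute.
            self._data = self.transform(self.parent.compute())
        return self._data

    def evict(self):
        """Simulate the cache dropping this RDD to make room (e.g. for RDD4)."""
        if self.parent is not None:  # source data itself is not evictable here
            self._data = None

rdd1 = ToyRDD(data=[1, 2, 3, 4])                                   # "read from HDFS"
rdd2 = ToyRDD(parent=rdd1, transform=lambda xs: [x * 10 for x in xs])
rdd3 = ToyRDD(parent=rdd2, transform=lambda xs: [x + 1 for x in xs])

print(rdd3.compute())  # [11, 21, 31, 41]  (now cached)
rdd3.evict()           # cache cleared to make room for a new RDD4
print(rdd3.compute())  # [11, 21, 31, 41]  recomputed via the lineage/DAG
```

Note that real Spark does the same bookkeeping automatically: transformations record lineage, and lost or evicted partitions are recomputed on demand.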


Hope this answers your question.

Thank you,
Naresh 


On Tue, Dec 12, 2017 at 12:51 AM <[hidden email]> wrote:
