Complex mapping question

I have the following input file:

Tx ID, Dest Node ID, Original Tx ID, Amount

For every line that has an Original Tx ID, there is another line whose Tx ID equals it.
In other words, if transactions are edges going into nodes, then every edge going out of a node was preceded by an edge going into that node.
Sample Data:
Tx1, node A, null , 100
Tx2, node B, Tx1, 50
Tx3, node C, Tx1, 50
Tx4, node B, null, 100
Tx5, node C, Tx4, 75
Tx6, node B, Tx4, 25
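A minimal sketch of reading this format into records, assuming comma-separated fields with surrounding whitespace, the literal string "null" for a missing Original Tx ID, and a "node " prefix on the destination that is dropped so it matches the "A", "B", "C" in the desired output (that prefix handling is my assumption, not something the post specifies). The names `Tx` and `parse_line` are hypothetical.

```python
from typing import NamedTuple, Optional


class Tx(NamedTuple):
    tx_id: str
    dest_node: str
    orig_tx_id: Optional[str]  # None when the input says "null"
    amount: int


def parse_line(line: str) -> Tx:
    """Parse one 'Tx ID, Dest Node ID, Original Tx ID, Amount' line."""
    tx_id, dest, orig, amount = (field.strip() for field in line.split(","))
    # Assumption: destinations look like "node A"; keep only the node name.
    if dest.startswith("node "):
        dest = dest[len("node "):]
    return Tx(tx_id, dest, None if orig == "null" else orig, int(amount))


SAMPLE = """Tx1, node A, null , 100
Tx2, node B, Tx1, 50
Tx3, node C, Tx1, 50
Tx4, node B, null, 100
Tx5, node C, Tx4, 75
Tx6, node B, Tx4, 25"""

records = [parse_line(line) for line in SAMPLE.splitlines()]
for rec in records:
    print(rec)
```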

I want to build a Spark program that produces a file with the following structure:

Source Node, Tx ID (edge), Dest Node, Amount

Sample Data:
ROOT, Tx1, A, 100
A, Tx2, B, 50
A, Tx3, C, 50
ROOT, Tx4, B, 100
B, Tx5, C, 75
B, Tx6, B, 25

The logic that needs to be implemented is:
for each node N:
    for each row R whose Original Tx ID is a transaction with Dest Node N:
        write: N, R.TxID, R.DestNode, R.Amount
and every row with no Original Tx ID is written with ROOT as its source node.
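Put another way, the source node of each transaction is the Dest Node of the transaction named in its Original Tx ID, which makes this a lookup (a self-join) of the data against itself. The sketch below shows that logic in plain Python on the sample data, not Spark; the function name `build_edges` and the tuple layout are hypothetical. It assumes, as the post states, that every Original Tx ID refers to an existing Tx ID.

```python
from typing import Dict, List, Optional, Tuple

# Each input row: (tx_id, dest_node, orig_tx_id or None, amount)
Row = Tuple[str, str, Optional[str], int]
Edge = Tuple[str, str, str, int]


def build_edges(rows: List[Row]) -> List[Edge]:
    """Emit (source_node, tx_id, dest_node, amount) for every transaction.

    The source of a transaction is the dest node of the transaction it
    came from (its Original Tx ID); transactions without one start at ROOT.
    """
    # Index: Tx ID -> the node that transaction went into.
    dest_of: Dict[str, str] = {tx_id: dest for tx_id, dest, _, _ in rows}
    edges: List[Edge] = []
    for tx_id, dest, orig, amount in rows:
        # Raises KeyError if an Original Tx ID has no matching row.
        source = "ROOT" if orig is None else dest_of[orig]
        edges.append((source, tx_id, dest, amount))
    return edges


rows: List[Row] = [
    ("Tx1", "A", None, 100),
    ("Tx2", "B", "Tx1", 50),
    ("Tx3", "C", "Tx1", 50),
    ("Tx4", "B", None, 100),
    ("Tx5", "C", "Tx4", 75),
    ("Tx6", "B", "Tx4", 25),
]

for edge in build_edges(rows):
    print(", ".join(map(str, edge)))
```

The in-memory dictionary only works when all rows fit on one machine. In Spark, the same lookup would typically be a left outer join of the data with itself on Original Tx ID = Tx ID, followed by replacing a missing parent with "ROOT"; in PySpark, assuming columns named `tx_id`, `dest_node`, `orig_tx_id`, and `amount`, that could look like `df.alias("c").join(df.alias("p"), F.col("c.orig_tx_id") == F.col("p.tx_id"), "left").select(F.coalesce(F.col("p.dest_node"), F.lit("ROOT")).alias("source_node"), "c.tx_id", "c.dest_node", "c.amount")`.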

Any idea how I can do this using Spark?

Eran | CTO