S3-SQS vs Auto Loader With Apache Spark Structured Streaming

Rachana Srivastava

Problem Statement: I want to read files from S3 and write files to S3 using Spark Structured Streaming. I looked at the reference architecture recommended by the Spark team, which uses S3 -> SNS -> SQS with the S3-SQS file source.

Questions:

  1. S3-SQS file source: Is the S3-SQS file source available in Apache Spark, or do we need to use Apache Bahir's SQS implementation? https://github.com/apache/bahir/tree/master/sql-streaming-sqs
  2. Auto Loader: This article recommends using Auto Loader. Is Auto Loader available in Apache Spark? https://docs.databricks.com/spark/latest/structured-streaming/sqs.html
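
For reference, the S3-to-S3 pipeline in question can be sketched with only the built-in file source of open-source Spark, which discovers new files by listing the input directory rather than through SQS notifications. This is a minimal sketch, not the S3-SQS source or Auto Loader. The helper names (s3a_uri, start_s3_to_s3_stream), the schema, and the JSON/Parquet formats are hypothetical, and it assumes the hadoop-aws (s3a) connector and AWS credentials are already configured.

```python
def s3a_uri(bucket: str, prefix: str) -> str:
    """Build an s3a:// directory URI from a bucket name and key prefix."""
    return f"s3a://{bucket}/{prefix.strip('/')}/"


def start_s3_to_s3_stream(spark, src: str, dst: str, checkpoint: str):
    """Stream JSON files from src into Parquet files under dst.

    Uses the built-in file source, which lists src on every trigger.
    """
    # Imported here so s3a_uri stays usable without a Spark installation.
    from pyspark.sql.types import StringType, StructField, StructType

    # Streaming file sources need an explicit schema; this one is hypothetical.
    schema = StructType([
        StructField("id", StringType()),
        StructField("payload", StringType()),
    ])

    events = (
        spark.readStream
        .schema(schema)
        .option("maxFilesPerTrigger", 100)  # cap on new files per micro-batch
        .json(src)
    )

    # The file sink requires a checkpoint location to track committed output.
    return (
        events.writeStream
        .format("parquet")
        .option("path", dst)
        .option("checkpointLocation", checkpoint)
        .start()
    )
```

With a SparkSession in hand, this would be started as start_s3_to_s3_stream(spark, s3a_uri("input-bucket", "incoming"), s3a_uri("output-bucket", "processed"), s3a_uri("output-bucket", "checkpoints")), followed by awaitTermination() on the returned query. The bucket names are placeholders. Directory listing can become slow and costly on large S3 prefixes, which is the cost the notification-based sources are meant to avoid.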