DataSource API v2 & Spark-SQL


DataSource API v2 & Spark-SQL

Lavelle, Shawn

Hello Spark community,

   I have a custom data source on the v1 API that I’m trying to port to the v2 API, in Java.  Currently the data source is registered via catalog.createTable(name, <package>, schema, options map).  When I try to do this with data source API v2, I get an error saying my class (package) isn’t a valid data source.  Can you help me out?

Spark version is 3.0.0 with Scala 2.12; the artifacts are spark-core, spark-sql, spark-hive, spark-hive-thriftserver and spark-catalyst.

 

Here’s the data source definition:  public class LogTableSource implements TableProvider, SupportsRead, DataSourceRegister, Serializable
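For context, a minimal sketch of what such a provider can look like against the Spark 3.0 connector API (the method bodies and the "logtable" short name are illustrative assumptions, not the poster's actual code; note that in the 3.0 API, SupportsRead is implemented by the Table returned from getTable(), not by the provider itself):

```java
// Hypothetical skeleton of a Spark 3.0 DSv2 source. Bodies are
// placeholders; only the interfaces and signatures come from the
// Spark 3.0 connector API.
import java.util.Collections;
import java.util.Map;
import java.util.Set;

import org.apache.spark.sql.connector.catalog.SupportsRead;
import org.apache.spark.sql.connector.catalog.Table;
import org.apache.spark.sql.connector.catalog.TableCapability;
import org.apache.spark.sql.connector.catalog.TableProvider;
import org.apache.spark.sql.connector.expressions.Transform;
import org.apache.spark.sql.connector.read.ScanBuilder;
import org.apache.spark.sql.sources.DataSourceRegister;
import org.apache.spark.sql.types.StructType;
import org.apache.spark.sql.util.CaseInsensitiveStringMap;

public class LogTableSource implements TableProvider, DataSourceRegister {

    @Override
    public String shortName() {
        return "logtable";          // usable as .format("logtable")
    }

    @Override
    public StructType inferSchema(CaseInsensitiveStringMap options) {
        return new StructType();    // derive the real schema here
    }

    @Override
    public Table getTable(StructType schema, Transform[] partitioning,
                          Map<String, String> properties) {
        // In the 3.0 API, SupportsRead belongs on the returned Table,
        // not on the provider class itself.
        return new LogTable(schema);
    }

    static class LogTable implements Table, SupportsRead {
        private final StructType schema;

        LogTable(StructType schema) { this.schema = schema; }

        @Override public String name() { return "log_table"; }
        @Override public StructType schema() { return schema; }

        @Override
        public Set<TableCapability> capabilities() {
            return Collections.singleton(TableCapability.BATCH_READ);
        }

        @Override
        public ScanBuilder newScanBuilder(CaseInsensitiveStringMap options) {
            throw new UnsupportedOperationException("scan not sketched here");
        }
    }
}
```

With this in place, and the class name listed in META-INF/services/org.apache.spark.sql.sources.DataSourceRegister, the source is discoverable by its short name.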

 

I’m guessing that I’m missing one of the required interfaces.  Note that I also tried renaming the LogTableSource class to “DefaultSource”, but the behavior is the same.  Also, I keep reading about a DataSourceV2 marker interface, but it seems to be deprecated?

 

Also, I tried to add DataSourceV2ScanRelation, but that won’t compile:

    Output() in DataSourceV2ScanRelation cannot override Output() in QueryPlan:
    return type Seq<AttributeReference> is not compatible with Seq<Attribute>

 

  I’m fairly stumped – everything I’ve read online says there’s a marker interface of some kind and yet I can’t find it in my package list.

 

  Looking forward to hearing from you,

 

~ Shawn

OSI
Shawn Lavelle

Software Development
 
4101 Arrowhead Drive
Medina, Minnesota 55340-9457
Phone: 763 551 0559
Email: [hidden email]
Website: www.osii.com

Re: DataSource API v2 & Spark-SQL

Russell Spitzer
That's a bad error message. Basically, you can't create a Spark native catalog reference for a DSv2 source; you have to use that data source's catalog or the programmatic API. Both the DSv1 and DSv2 programmatic APIs work (plus or minus some options).
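The programmatic path Russell refers to looks roughly like this (a sketch only: the "logtable" format name, the com.example package, and the path option are assumptions, not confirmed by the thread):

```java
// Hypothetical usage sketch: loading a DSv2 source programmatically
// through DataFrameReader instead of registering it with
// catalog.createTable. Format names and options are assumed.
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class ReadLogTable {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("dsv2-read")
                .master("local[*]")
                .getOrCreate();

        // Either the short name registered via DataSourceRegister...
        Dataset<Row> byShortName = spark.read()
                .format("logtable")
                .option("path", "/some/log/dir")      // hypothetical option
                .load();

        // ...or the fully qualified provider class as the format string.
        Dataset<Row> byClassName = spark.read()
                .format("com.example.LogTableSource") // hypothetical package
                .load();

        byShortName.printSchema();
        byClassName.printSchema();
        spark.stop();
    }
}
```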

On Mon, Aug 3, 2020, 7:28 AM Lavelle, Shawn <[hidden email]> wrote:


RE: DataSource API v2 & Spark-SQL

Lavelle, Shawn

Thanks for clarifying, Russell.  Is a Spark native catalog reference on the roadmap for DSv2, or should I be trying to use something else?

~ Shawn

 

From: Russell Spitzer [mailto:[hidden email]]
Sent: Monday, August 3, 2020 8:27 AM
To: Lavelle, Shawn <[hidden email]>
Cc: user <[hidden email]>
Subject: Re: DataSource API v2 & Spark-SQL

 

