Shark does not give any results with SELECT count(*) command

qingyang li
Hi, I installed Spark 0.9.0 and Shark 0.9 on 3 nodes. When I run select * from src, I get results, but when I run select count(*) from src or select * from src limit 1, no result is output.

I found a similar problem on Google Groups:
https://groups.google.com/forum/#!searchin/spark-users/Shark$20does$20not$20give$20any$20results$20with$20SELECT$20command/spark-users/oKMBPBWim0U/_hbDCi4m-xUJ
but no solution was posted there.

Has anyone else encountered this problem?

Re: Shark does not give any results with SELECT count(*) command

qingyang li
I have found the cause. My problem was:
the format of the slaves file was not correct, so tasks ran only on the master.

I am explaining it here to help others who run into a similar problem.
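
For anyone checking the same thing, here is a minimal sketch of a sanity check for the slaves file. It assumes the usual standalone layout of one worker hostname (or IP) per line, with blank lines and `#` comments ignored; the function name `check_slaves` and the sample hostnames are hypothetical, and the thread does not say what exactly was wrong with the original file.

```python
import re

# Assumption: each non-comment line of conf/slaves holds exactly one
# hostname or IP address, with no separators such as spaces or commas.
HOSTNAME = re.compile(r"^[A-Za-z0-9]([A-Za-z0-9.-]*[A-Za-z0-9])?$")


def check_slaves(text):
    """Return (hosts, problems) for the contents of a slaves file.

    hosts is the list of well-formed entries; problems is a list of
    (line_number, raw_line) pairs for lines that do not look like a
    single hostname.
    """
    hosts, problems = [], []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if HOSTNAME.match(line):
            hosts.append(line)
        else:
            problems.append((lineno, raw))
    return hosts, problems


if __name__ == "__main__":
    # Hypothetical file contents: the second line wrongly lists two hosts.
    sample = "worker1\nworker2, worker3\n"
    good, bad = check_slaves(sample)
    print("hosts:", good)
    print("suspicious lines:", bad)
```

If any line is reported as suspicious, the workers named on it may never be started, which would be consistent with tasks running only on the master.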



Re: Shark does not give any results with SELECT count(*) command

qingyang li
I am reopening this thread because I have run into the problem again.
Here is my environment:
scala 2.10.3
spark 0.9.0    standalone mode
shark 0.9.0    downloaded the source code and built it myself
hive hive-shark-0.11
I copied hive-site.xml from my Hadoop cluster, whose Hive version is 0.12. After copying it, I deleted some properties from hive-site.xml.

When I run select count(*) from xxx, there is no result and no error output.

Can someone give me some suggestions for debugging this?


Re: Shark does not give any results with SELECT count(*) command

Praveen R
Hi Qingyang Li,

Shark 0.9.0 uses a patched version of Hive 0.11, so using the configuration/metastore of Hive 0.12 could be incompatible.

May I know why you are using a hive-site.xml from a different Hive version (to use an existing metastore?). Otherwise, you might just leave hive-site.xml blank. Something like this:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
</configuration>

In any case, you could run ./bin/shark-withdebug to check for any errors.
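
To see what a copied hive-site.xml actually sets before deciding whether to blank it, a small sketch like the following could help. It assumes the standard Hadoop-style layout of `<property>` elements with `<name>` and `<value>` children under `<configuration>`; the function name `hive_properties` is hypothetical and the property in the usage example is only a sample.

```python
import xml.etree.ElementTree as ET


def hive_properties(xml_text):
    """Return {name: value} for each <property> under <configuration>."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        if name is not None:
            # A missing <value> is treated as an empty string.
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


if __name__ == "__main__":
    # Sample content only; real files usually contain many more properties.
    sample = (
        '<?xml version="1.0"?>'
        "<configuration>"
        "<property><name>hive.metastore.warehouse.dir</name>"
        "<value>/user/hive/warehouse</value></property>"
        "</configuration>"
    )
    for key, value in sorted(hive_properties(sample).items()):
        print(key, "=", value)
```

An empty result means the file is effectively blank, as in the example above.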

Regards,
Praveen


Re: Shark does not give any results with SELECT count(*) command

qingyang li
Hi Praveen, thanks for replying.

I am using the hive-0.11 that comes from AMPLab. At the beginning, the AMPLab hive-site.xml was empty, so I copied a hive-site.xml from my cluster, then removed some properties and also added some.

I do not think that is the cause of my problem.
I think the cause is that Shark is running in local mode, not cluster mode. When I run bin/shark on bigdata001, it cannot get the result, which exists on bigdata003, whereas when I run bin/shark on bigdata003, I do get the result.

Even if that is the cause, I still cannot understand why the result is on bigdata003 (the master is bigdata001).


Re: Shark does not give any results with SELECT count(*) command

Praveen R-2
OK. You need to run the Shark server on bigdata001 to use it from other machines.
./bin/shark --service sharkserver  # runs shark server on port 10000

You can connect to the Shark server with ./bin/shark -h <bigdata001>; this should work unless a firewall is blocking it. You might run telnet bigdata001 10000 from bigdata003 to check whether the port is accessible. Hope that helps.
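
If telnet is not installed, the same reachability check can be sketched in a few lines of Python. This only tests that a TCP connection to the port can be opened; it says nothing about whether the Shark server behind it is healthy. The function name `port_open` is hypothetical.

```python
import socket


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


# Example, run from bigdata003 (hostname and port taken from the thread):
#     print(port_open("bigdata001", 10000))
```

A False result from bigdata003 but True from bigdata001 itself would point towards a firewall or binding problem rather than a Shark problem.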



Re: Shark does not give any results with SELECT count(*) command

qingyang li
Hi Praveen, I can start the server on bigdata001 using "./bin/shark --service sharkserver", and I can also connect to it using "./bin/shark -h bigdata001".
But the problem is still there:
run "select count(*) from b" on bigdata001: no result, no error.
run "select count(*) from b" on bigdata002: no result, no error.
run "select count(*) from b" on bigdata004: no result, no error.
run "select count(*) from b" on bigdata003: result is returned.



Re: Shark does not give any results with SELECT count(*) command

qingyang li
I found these log lines on bigdata003:
14/03/25 17:08:49 INFO network.ConnectionManager: Accepted connection from [bigdata001/192.168.1.101]
14/03/25 17:08:49 INFO network.ConnectionManager: Accepted connection from [bigdata002/192.168.1.102]
14/03/25 17:08:49 INFO network.ConnectionManager: Accepted connection from [bigdata004/192.168.1.104]

and also these on bigdata004, bigdata002, and bigdata001:

09/01/13 09:32:29 INFO network.ConnectionManager: Accepted connection from [bigdata003/192.168.1.103]
09/01/13 09:32:29 INFO network.SendingConnection: Initiating connection to [bigdata003/192.168.1.103:39848]
09/01/13 09:32:29 INFO network.SendingConnection: Connected to [bigdata003/192.168.1.103:39848], 1 messages pending

From the log, it seems bigdata003 has become the master, but I configured bigdata001 as the master.

Another clue:
sometimes, after I restart the Spark cluster, it works again: I get the result on bigdata001, but it fails on bigdata003.
So does Spark randomly choose one node to store the result?

If I have not described the problem clearly, please let me know. Thanks.
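
For comparing the logs across nodes, a rough sketch like the one below can list which peers each node reports in its "Accepted connection from" lines. The regular expression is written against the log lines quoted above only, so treat the format as an assumption; the function name `accepted_peers` is hypothetical.

```python
import re

# Matches e.g. "Accepted connection from [bigdata001/192.168.1.101]".
ACCEPTED = re.compile(r"Accepted connection from \[([^/\]]+)/([0-9.]+)\]")


def accepted_peers(log_text):
    """Return the sorted, de-duplicated (hostname, ip) pairs that connected."""
    return sorted(set(ACCEPTED.findall(log_text)))


if __name__ == "__main__":
    # Log lines copied from the bigdata003 excerpt above.
    log = (
        "14/03/25 17:08:49 INFO network.ConnectionManager: "
        "Accepted connection from [bigdata001/192.168.1.101]\n"
        "14/03/25 17:08:49 INFO network.ConnectionManager: "
        "Accepted connection from [bigdata002/192.168.1.102]\n"
        "14/03/25 17:08:49 INFO network.ConnectionManager: "
        "Accepted connection from [bigdata004/192.168.1.104]\n"
    )
    for host, ip in accepted_peers(log):
        print(host, ip)
```

Running this over the log of every node after each restart would show whether the node that all others connect to changes between runs.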

