BOOK review of Spark: WARNING to spark users


emma davis

Book: Machine Learning with Apache Spark Quick Start Guide
Publisher: Packt


Following the Getting Started with Python in VS Code guide, I realised that
Jillur Qudus has written and published a book without any knowledge of the
subject matter, Python among other things.


The proof is highlighted below, with further details further down the email.

import findspark  # these two lines are unnecessary; see the link above for the correct setup
findspark.init()

Setting SPARK_HOME or any other Spark variables is unnecessary, because Spark, like any framework, is self-contained and has its own conf directory for persistent startup configuration settings. Obviously the software finds its own current directory upon starting, i.e. sbin/start-master.sh.
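For comparison, here is a minimal sketch of the setup I mean, assuming PySpark was installed as an ordinary package with pip install pyspark, which bundles Spark's own distribution (the master setting and app name are my own choices): no SPARK_HOME and no findspark are needed at all.

# Minimal session, assuming `pip install pyspark`; pip puts Spark's
# bundled distribution on the Python path, so neither SPARK_HOME nor
# findspark.init() is required.
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[*]").appName("hello").getOrCreate()
print(spark.range(5).count())  # trivial job to confirm the session works
spark.stop()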

Spark is a BIG DATA tool (heavy distributed, parallel processing), so clearly you would expect its hello-world demo programs to demonstrate that.

What is the point of setting num_samples = 100? Something like 10**10 would make sense to test performance.
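For reference, the demo in question is the Pi-estimation hello world from the Spark website; a sketch with a sample count big enough to exercise the parallelism (10**8 is my own choice here, scale it to taste):

import random

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("EstimatePi").getOrCreate()
num_samples = 10**8  # large enough to make the distributed work visible

def inside(_):
    # Monte Carlo estimate: count points falling inside the unit quarter-circle
    x, y = random.random(), random.random()
    return x * x + y * y < 1

count = spark.sparkContext.parallelize(range(num_samples)).filter(inside).count()
print("Pi is roughly", 4.0 * count / num_samples)
spark.stop()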


This is my warning: do not end up wasting your valuable time as I did. I feel your time is valuable. I realised the scam once I got a better understanding of the product simply by doing the correct hello-world program from the correct source.

“Research by CISQ found that, in 2018, poor quality software cost organizations $2.8 trillion in the US alone.”

I attribute this to the Indian IT industry claiming it can do the job better than the natives [US, Europeans], implying that Indian education or IT people are superior. For example, people like me: born, living and educated in western Europe.

https://www.it-cisq.org/the-cost-of-poor-quality-software-in-the-us-a-2018-report/The-Cost-of-Poor-Quality-Software-in-the-US-2018-Report.pdf


Contributors: About the Author
“Jillur Qudus is a lead technical architect, polyglot software engineer and data scientist with over 10 years of hands-on experience in architecting and engineering distributed, scalable, high-performance ... to combat serious organised crime. Jillur has extensive experience working with government, intelligence, law enforcement and banking, and has worked across the world including Japan, Singapore, Malaysia, Hong Kong and New Zealand ... founder of Keisan, a UK-based company specializing in open source distributed technologies and machine learning ...”
This obviously means a lot to many, but look at his work and judge for yourself based on the evidence.

Page 54
<quote>
Additional Python Packages
> conda install -c conda-forge findspark
> conda install -c conda-forge pykafka
...
</quote>

The remainder of the program was copied from the Spark website, so that wasn't wrong.
Page 63

<quote>
> cd /etc/profile.d
> vi spark.sh
export SPARK_HOME=/opt/spark-2.3.2-bin-hadoop2.7
> source spark.sh

... in order for the SPARK_HOME environment variable to be successfully recognized and registered by findspark ...

We are now ready to write our first Spark application in Python! ...

# (1) Import the required Python dependencies
import findspark
findspark.init()

(3)
...
num_samples = 100
</quote>
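For context, everything findspark.init() does (a rough sketch of the project's documented behaviour, not taken from the book) is resolve SPARK_HOME and put Spark's bundled Python bindings on sys.path:

import glob
import os
import sys

# Roughly what findspark.init() does: locate the Spark installation via
# SPARK_HOME and expose its bundled pyspark/py4j libraries to a plain
# Python interpreter.
spark_home = os.environ["SPARK_HOME"]
sys.path.insert(0, os.path.join(spark_home, "python"))
# py4j ships as a versioned zip inside the Spark distribution
py4j = glob.glob(os.path.join(spark_home, "python", "lib", "py4j-*-src.zip"))[0]
sys.path.insert(0, py4j)

import pyspark  # now importable without pip-installing pyspark

Which is exactly why it is redundant once pyspark is installed as an ordinary Python package.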


emma davis
[hidden email]

Re: BOOK review of Spark: WARNING to spark users

Jacek Laskowski
Hi Emma,

I'm curious about the purpose of the email. Mind elaborating?

On Wed, May 20, 2020 at 10:43 PM emma davis <[hidden email]> wrote:
