pmml with augustus


pmml with augustus

filipus
Hello guys,

Has anybody had experience with the Augustus library as a serializer for scoring models?

It looks very promising, and I even found a hint about a connection between Augustus and Spark.

All the best

Re: pmml with augustus

sowen
It's worth mentioning that Augustus is a Python-based library. On a
related note, in Java-land, I have had good experiences with jpmml's
projects:



Re: pmml with augustus

sowen
On Tue, Jun 10, 2014 at 7:59 AM, Sean Owen <[hidden email]> wrote:
> It's worth mentioning that Augustus is a Python-based library. On a
> related note, in Java-land, I have had good experiences with jpmml's
> projects:

https://github.com/jpmml

in particular

https://github.com/jpmml/jpmml-model
https://github.com/jpmml/jpmml-evaluator

I have not used OpenScoring yet.

Re: pmml with augustus

filipus
Thank you very much

I hadn't come across the Cascading project at all until now.

This project is very interesting.

I also see the point of using Scala as the language for Spark: if I understand it correctly, JVM-based libraries can be integrated very easily and naturally.

Hmm... but I could also use Spark as the model engine, Augustus as the serializer, and a third-party product such as JPMML as the prediction engine.

Hmm... I get the feeling that I need to do Java, Scala, and Python at the same time...

First things first -> Augustus for PMML output from Spark :-)


Re: pmml with augustus

Evan R. Sparks
I should point out that if you don't want to take a polyglot approach to languages and would rather reside solely in the JVM, then you can just use plain old Java serialization on the Model objects that come out of MLlib's APIs from Java or Scala, load them up in another process, and call the relevant .predict() method when it comes time to serve. The same approach would probably also work for models trained via MLlib's Python APIs, but I haven't tried that.
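
As a minimal sketch of that approach (assuming a LogisticRegressionModel; any Serializable MLlib model object would work the same way, and the class and method names below are purely illustrative, not from MLlib itself):

import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

import org.apache.spark.mllib.classification.LogisticRegressionModel;
import org.apache.spark.mllib.linalg.Vector;

public class ModelStore {

        // Persist the trained model object with plain Java serialization.
        public static void save(LogisticRegressionModel model, String path) throws IOException {
                try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(path))) {
                        out.writeObject(model);
                }
        }

        // In the serving process: read the model back and score a single feature vector.
        public static double loadAndScore(String path, Vector features) throws IOException, ClassNotFoundException {
                try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(path))) {
                        LogisticRegressionModel model = (LogisticRegressionModel) in.readObject();
                        return model.predict(features);
                }
        }
}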

Native PMML serialization would be a nice feature to add to MLlib as a mechanism to transfer models to other environments for further analysis/serving. There's a JIRA discussion about this here: https://issues.apache.org/jira/browse/SPARK-1406



Re: pmml with augustus

Paco Nathan
That's a good point about polyglot. Given that Spark is incorporating a range of languages (Scala, Java, Python, R, SQL), it becomes a trade-off whether to centralize support or to integrate with native options. Going with the latter implies more standardization and less tech debt.

The big win with PMML, however, is migration: e.g., regulated industries may have a strong requirement to train in one place that is auditable (e.g., SAS) but then score at scale (e.g., Spark). Migration in the opposite direction is also much in demand, e.g., to leverage training at scale through Spark.

It's worth noting that there is a PMML community. Open Data Group (Augustus) and Zementis do much work to help organize and promote it. Opinion: both of those projects seem more likely candidates for best reference implementations than JPMML -- at least they cooperate more actively within the PMML open standard community. YMMV.

If you're interested in PMML then I'd encourage you to get involved. There are workshops, e.g., generally at KDD, ACM gatherings, etc.

FWIW, I was the original lead on Cascading's PMML support -- the first rev that other firms used in production, not the rewrite on Concurrent's site that added deep Cascading dependencies.




Re: pmml with augustus

filipus
@Paco: Do I understand correctly that the most promising options for me to invest effort in, for deploying models in the Spark environment, would be Augustus and Zementis?

I just saw these slides of yours:

http://de.slideshare.net/pacoid/data-workflows-for-machine-learning-33341183

so my understanding must be wrong.

Actually, as you mention, I would have both directions of deployment. I already have models which I could transform into PMML, and I am also thinking of building more models over time using Spark... or other model engines in the Hadoop field.

When I read about MLlib and MLbase I got very interested, because they seem to handle some aspects of my actual challenge (building around 1,000 models, administering 1,000 models, calculating around 2 billion scores each week), but about the administration part I am not so sure. I also find that the field (Spark, MLlib, MLbase, ...) needs to put some effort into the transparency of the models.

As long as you just build a recommender system you probably don't need something like that, but as you mention... there are a lot of departments where analysts build the models, because the risk of spending millions in the wrong place due to a model that wasn't validated carefully... is simply too high for the managers.

....

Is there actually a direction for the administration of scores in the Spark/MLlib/MLbase field? I mean something like:

a) description of the scoring model, training data set, target variable, and intended purpose
b) quality checks, actual performance in comparison with other models
c) a version control system
d) an indicator of whether the score is active or not
e) the specific action it is used for (for instance which website, which customer group, which country, ...)

A commercial product which is in a way comparable would be the Model Manager from SAS.

Hey guys... in any case I will get involved in this field. It looks so promising.

PS: Think about integrating a MIP solver! You cannot handle everything with a statistical model; in business you quite often have discrete optimization problems when you try to manage your business with prediction models :-)
 

Re: pmml with augustus

Villu Ruusmann
In reply to this post by filipus
Hello Spark/PMML enthusiasts,

It's pretty trivial to integrate the JPMML-Evaluator library with Spark. In brief, take the following steps in your Spark application code:
1) Create a Java Map ("arguments") that represents the input data record. You need to specify a key-value mapping for every active MiningField. The key type is org.jpmml.evaluator.FieldName. The value type could be String or any Java primitive data type that can be converted to the requested PMML type.
2) Obtain an instance of org.jpmml.evaluator.Evaluator. Invoke its #evaluate(Map<FieldName, ?>) method using the argument map created in step 1.
3) Process the Java Map ("results") that represents the output data record.

Putting it all together:
JavaRDD<Map<FieldName, String>> arguments = ...
final ModelEvaluator<?> modelEvaluator = (ModelEvaluator<?>)pmmlManager.getModelManager(null, ModelEvaluatorFactory.getInstance()); // See the JPMML-Evaluator documentation
JavaRDD<Map<FieldName, ?>> results = arguments.flatMap(new FlatMapFunction<Map<FieldName, String>, Map<FieldName, ?>>(){

        @Override
        public Iterable<Map<FieldName, ?>> call(Map<FieldName, String> arguments){
                Map<FieldName, ?> result = modelEvaluator.evaluate(arguments);
                return Collections.<Map<FieldName, ?>>singletonList(result);
        }
});

Of course, it's not very elegant to be using JavaRDD<Map<K, V>> here. Maybe someone can give me a hint about making it look and feel more Spark-y?
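
As a side note on the snippet above: it leaves the construction of pmmlManager to the JPMML-Evaluator documentation. A rough sketch of that setup might look like the following, assuming the PMMLManager-based API referenced in the snippet; the exact package names and unmarshalling helpers vary between JPMML-Evaluator versions, so treat the specifics here as assumptions rather than the definitive API:

import java.io.InputStream;

import javax.xml.transform.stream.StreamSource;

import org.dmg.pmml.PMML;
import org.jpmml.manager.PMMLManager;
import org.jpmml.model.JAXBUtil;

public class PmmlLoader {

        // Unmarshal a PMML document from an InputStream and wrap it in a PMMLManager,
        // which the snippet above then turns into a ModelEvaluator.
        public static PMMLManager load(InputStream is) throws Exception {
                PMML pmml = JAXBUtil.unmarshalPMML(new StreamSource(is));
                return new PMMLManager(pmml);
        }
}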

Also, I would like to refute the earlier comment by @pacoid that JPMML-Evaluator compares poorly against the Augustus and Zementis products. First, JPMML-Evaluator fully supports PMML specification versions 3.0 through 4.2. I would specifically stress the support for PMML 4.2, which was released just a few months ago. Second, JPMML is open source. Perhaps its licensing terms could be more liberal, but it's nevertheless the most open and approachable way of bringing Java and PMML together.


VR

Re: pmml with augustus

filipus
@Villu: Thank you for your help. I promise I'm going to try it! That's cool :-) Do you also know about the other way around, from PMML to a model object in Spark?