Python API Performance

Python API Performance

Nilesh Chakraborty
Hi there,

Background:
I need to do some matrix multiplication inside the mappers, and I'm trying to choose between Python and Scala for writing the Spark MR jobs. I'm equally fluent with Python and Java, and find Scala pretty easy too, for what it's worth. Going with Python would let me use numpy + scipy, which is blazing fast compared to Java libraries like Colt etc. Configuring Java with native BLAS seems to be a pain compared to scipy (a direct apt-get or pip install).

Question:
I posted a couple of comments on this answer at StackOverflow: http://stackoverflow.com/questions/17236936/api-compatibility-between-scala-and-python. Basically it states that as of Spark 0.7.2, the Python API was slower than Scala's. What's the performance situation now? The fork issue seems to be fixed. How about serialization? Can it match the performance of Java/Scala Writable-like serialization (which knows the object type beforehand, reducing I/O)? Also, a probably silly question - loops seem to be slow in Python in general; do you think this can turn out to be an issue?

Bottom line: should I choose Python for computation-intensive algorithms like PageRank? Scipy gives me an edge, but does the framework kill it?

Any help, insights, benchmarks will be much appreciated. :)

Cheers,
Nilesh

Re: Python API Performance

Jeremy Freeman
Hi Nilesh,

We're building a data analysis library purely in PySpark that uses a fair bit of numerical computing (https://github.com/freeman-lab/thunder), and faced the same decision as you when starting out.

We went with PySpark because of NumPy and SciPy. So many functions are included, with robust implementations - signal processing, optimization, matrix math, etc. - and it's trivial to set up. In Scala, we needed different libraries for specific problems, and many are still in their early days (bugs, missing features, etc.). The PySpark API is relatively complete, though a few bits of functionality aren't there (zipping is probably the only one we're sometimes missing; it's useful for certain matrix operations). It was definitely feasible to build a functional library entirely in PySpark.

That said, there's a performance hit. In my testing (v0.8.1), a simple algorithm, KMeans (the versions included with Spark), is ~2x faster per iteration in Scala than in Python in our setup (private HPC, ~30 nodes, each with 128GB RAM and 16 cores, roughly comparable to the higher-end EC2 instances). I'm preparing more extensive benchmarks, especially on matrix calculations, where the difference may shrink (I'll post them to this forum when ready). For our purposes (purely research), things are fast enough already that the benefits of PySpark outweigh the costs, but that will depend on your use case.

I can't speak much to the current roadblocks and future plans for speed-ups, though I know Josh has mentioned he's working on new custom serializers.

-- Jeremy

Re: Python API Performance

Evan R. Sparks
In reply to this post by Nilesh Chakraborty
If you just need basic matrix operations - Spark depends on jblas (http://mikiobraun.github.io/jblas/) for fast linear algebra routines inside MLlib and GraphX. jblas does a nice job of avoiding boxing/unboxing issues when calling out to BLAS, so it might be what you're looking for. The programming patterns you'll be able to support with jblas (matrix ops on local partitions) are very similar to what you'd get with numpy, etc.

I agree that the Python libraries are more complete/feature-rich, but if you really crave high performance then I'd recommend staying pure Scala and giving jblas a try.
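To make the pattern concrete, here's a minimal (untested) sketch of multiplying each row vector in an RDD by a fixed local matrix with jblas - names and sizes are made up:

    import org.apache.spark.SparkContext
    import org.jblas.DoubleMatrix

    object JblasSketch {
      def main(args: Array[String]) {
        val sc = new SparkContext("local[4]", "jblas-sketch")
        val n = 100
        // Fixed n x n matrix, captured by the closure and shipped to the
        // workers (broadcast it instead if it's large).
        val w = DoubleMatrix.randn(n, n)
        // One length-n vector per RDD element.
        val rows = sc.parallelize(1 to 1000).map(_ => DoubleMatrix.randn(n).data)
        val products = rows.map { arr =>
          val v = new DoubleMatrix(arr) // n x 1 column vector
          w.mmul(v).toArray             // BLAS-backed multiply, no boxing
        }
        println(products.count())
        sc.stop()
      }
    }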




Re: Python API Performance

Nilesh Chakraborty
In reply to this post by Jeremy Freeman
Hi Jeremy,

Thanks for the reply.

Jeremy Freeman wrote:
> In my testing (v0.8.1), KMeans is ~2x faster per iteration in Scala than in Python in our setup.
So you measured with a Scala/Java library on Spark vs numpy/scipy on PySpark, right? Can you tell me which library you used?

A benchmark (or just an initial ballpark figure for the performance difference) on matrix calculations would be awesome - that's the thing I'm wondering about, whether the difference will even out. I'm still working on something else and will arrive at Spark/PySpark in a couple of weeks. If you guys can share the results before then, it'll save me a great deal of time/toil.

Best,
Nilesh

Re: Python API Performance

Nilesh Chakraborty
In reply to this post by Evan R. Sparks
Hi Evan,

Thanks! I didn't know that Spark depends on jblas. That's good to know. Does this mean I can use jblas directly from my Spark MR code and not worry about the painstaking setup of getting Java to recognize the native BLAS libraries on my system? Does Spark take care of that?

But then again, my particular use case deals with large sparse matrices, in which case my only option on the Java/Scala side seems to be Colt (which is pretty slow compared to both jblas and scipy/numpy). MTJ is another option, but I'm not sure how much BLAS/ATLAS setup that'll need. That's what's confusing me - I can't figure out how this will balance out until I take some time off to code some benchmarks myself. :(
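For the record, Colt is plain Java, so at least calling it from Scala is painless. A minimal sketch of the sparse API I mean (sizes and values made up, untested):

    import cern.colt.matrix.impl.SparseDoubleMatrix2D
    import cern.colt.matrix.linalg.Algebra

    // Colt stores only the non-zero entries of a sparse matrix.
    val a = new SparseDoubleMatrix2D(10000, 10000)
    a.setQuick(0, 42, 3.14)
    a.setQuick(9999, 7, 2.71)

    // A sparse 10000 x 1 "vector".
    val v = new SparseDoubleMatrix2D(10000, 1)
    v.setQuick(42, 0, 1.0)

    // Algebra.DEFAULT.mult accepts any DoubleMatrix2D, sparse or dense,
    // but there's no native BLAS underneath - hence my speed worries.
    val product = Algebra.DEFAULT.mult(a, v)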

Nilesh



--
A quest eternal, a life so small! So don't just play the guitar, build one.
You can also email me at [hidden email] or visit my website


Re: Python API Performance

Nilesh Chakraborty
In reply to this post by Jeremy Freeman
Hi Jeremy,

Can you try a comparison of the Scala ALS code (https://github.com/apache/incubator-spark/blob/master/examples/src/main/scala/org/apache/spark/examples/SparkALS.scala) and the Python ALS code (https://github.com/apache/incubator-spark/blob/master/python/examples/als.py) from the Spark repo?

This might be the easiest way to compare Scala+Colt vs Python+NumPy on Spark! Both involve sparse matrix manipulation and multiplications. If someone already has a small Spark cluster (even a standalone one) set up, please let us know how this fares.

I'll try setting up Spark on a few nodes next week.

Best,
Nilesh

Re: Python API Performance

Josh Rosen
If anyone wants to benchmark PySpark against the Scala/Java APIs, it might be nice to add Python benchmarks to the spark-perf performance testing suite: https://github.com/amplab/spark-perf.




Re: Python API Performance

Jeremy Freeman
The test I was referring to was the included KMeans algorithm, which uses NumPy in PySpark but can be done without jblas in Scala, so it tests basic performance more than matrix libraries.

I can certainly try the ALS test, though note that the Scala example you pointed to uses Colt, whereas most of MLlib at this point uses jblas, so it's probably most relevant to compare against something using jblas (or simply to rewrite that example to use jblas).

I basically agree with Evan that if you're only using matrices, and not the richer features of SciPy/NumPy, Scala is the way to go, but I'll report back with more tests. I also like Josh's suggestion of adding proper PySpark benchmarking; I'll take a stab at that.

-- Jeremy

Re: Python API Performance

Nilesh Chakraborty
In reply to this post by Josh Rosen
OK, I did some uber-basic testing of the Python ALS example and the Scala ALS example (I wouldn't call this real benchmarking because of the configuration and casual nature of the test).

CPU: i5-2500K
Memory allotted to each example with -Djava.executor.memory=2g
One master and one slave running.

Results, one run per row:

Language  Movies  Users  Features  Iterations  Slices  Time
Scala        500   2000       100           5       2  1m21s
Scala        500   2000       100           5       4  0m50s
Scala        700   2000       100           5       2  1m41s
Scala        700   2000       100           5       4  1m14s
Python       500   2000       100           5       4  8m18s
(Sorry, no more for Python, I'm pressed for time at the moment.)

I noticed that average CPU utilization on the quad-core was always 99%+ during the Scala runs (except for drops to ~90% between iterations). During the Python run, however, it was around 55-67%, and the rest was I/O wait. Evidently a lot of time was being wasted (on I/O? slow loops?).

And a stranger thing: the RMSE over the 5 iterations for Scala began at 0.82 and ended at 0.73, while the Python version started with an RMSE of 1294.1236 and ended at 210.2984. That's a pretty huge gap. Can someone verify all this, at least on a single node?

I haven't even modified any code, so Scala's using the usual Colt.

Re: Python API Performance

Aureliano Buendia
In reply to this post by Evan R. Sparks

jblas is not the top Java matrix library when it comes to performance:

https://code.google.com/p/java-matrix-benchmark/wiki/RuntimeCorei7v2600_2013_10



Re: Python API Performance

Aureliano Buendia
In reply to this post by Nilesh Chakraborty
A much (much) better solution than Python (and also Scala, if that doesn't make you upset) is Julia.

Libraries like numpy and scipy look bloated next to Julia's C-like performance. Julia comes with everything that numpy + scipy come with, plus more, minus the performance hit.

I hope we see official support for Julia on Spark very soon.




Re: Python API Performance

ankurcha
How does Julia interact with Spark? I'd be interested, mainly because I find Scala syntax a little obscure, and it would be great to see actual numbers comparing Scala, Python, and Julia workloads.



Re: Python API Performance

yinxusen
How about Breeze (http://www.scalanlp.org/)? It is written in Scala and uses netlib-java as the backend (https://github.com/scalanlp/breeze/wiki/Breeze-Linear-Algebra#wiki-performance).

I think Breeze is closer to MATLAB and numpy/scipy in terms of ease of use. That would also be a good aspect to test.
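A quick (untested) sketch, just to show how close the API stays to MATLAB/numpy:

    import breeze.linalg.{DenseMatrix, DenseVector}

    val a = DenseMatrix.rand(100, 100) // uniform-random 100 x 100 matrix
    val v = DenseVector.rand(100)
    val av = a * v                     // matrix-vector multiply (netlib-java underneath)
    val gram = a.t * a                 // transpose, then matrix-matrix multiply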



--
Best Regards
-----------------------------------
Xusen Yin    尹绪森
Beijing Key Laboratory of Intelligent Telecommunications Software and Multimedia
Beijing University of Posts & Telecommunications
Intel Labs China

Re: Python API Performance

Evan R. Sparks
We used Breeze in some early MLlib prototypes last year. It feels very "Scala", which is a huge plus, but unfortunately we found that the object overhead, and the difficulty of tracking down performance problems caused by Breeze's heavy use of implicit conversions, made writing high-performance matrix code with it difficult. Further, at least for the early algorithms, we didn't need all the extra flexibility that Breeze provides, since our use cases were pretty straightforward.
