MapReduce from Java


thomas.kuehner
Hello, I hope I'm not annoying you by asking so many questions, but here
is one more.
How does GPars support map/reduce from Java? Unfortunately, I have to
use Java. For Java, the documentation recommends:
"Parallel Collection - use jsr-166y library's Parallel Array
directly". Does this include map/reduce? I'm not sure whether that is
already the answer, because I can't find it in the docs or the API.

Here are more details. My specific, simple problem is:
I have a list of users, and each user has a list of the places he
visited (e.g. List = {A, B, C ... A, D, E, A}), so the same place can occur
multiple times. The task is to count how often each place
has been visited. As far as I know, this is a simple and typical
map/reduce problem (like counting words), so I want to take advantage
of parallel execution. As asked above, how does GPars
support solving this from Java?

best regards
Thomas Kühner




---------------------------------------------------------------------
To unsubscribe from this list, please visit:

    http://xircles.codehaus.org/manage_email



Re: MapReduce from Java

Paolo Di Tommaso
Hello, 

MapReduce is a programming model meant for distributed computing. Although you can execute it locally, I don't think that makes much sense.


Cheers,
Paolo







Re: MapReduce from Java

Russel Winder-3
On Fri, 2013-08-16 at 12:25 +0200, Paolo Di Tommaso wrote:
> Hello,
>
> MapReduce is a programming model meant for distributed computing, even
> though you can execute it locally I don't think it has much sense.

MapReduce™ may be a model for distributed computing, but map–reduce
algorithms are as important for data-parallel computation on a single
machine as for data-parallel distribution.  Multicore algorithms use
map–reduce a lot.

Note the labelling:


        Smalltalk    Most other
        Groovy       languages
        ---------    ----------
        collect      map
        inject       reduce

:-)
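
For readers on Java 8 or later, the naming correspondence in the table can be sketched with the java.util.stream API. This is only an illustration, not GPars code: the class name CollectInject, the method totalLength and the sample words are made up for this example.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical example class; not part of GPars or of the original posts.
class CollectInject {

    // Adds up the lengths of all words in two steps:
    //   map    (called "collect" in Groovy and Smalltalk) transforms each element,
    //   reduce (called "inject"  in Groovy and Smalltalk) folds the results into one value.
    static int totalLength(List<String> words) {
        return words.stream()
                .map(String::length)       // map / collect
                .reduce(0, Integer::sum);  // reduce / inject
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("map", "reduce", "collect", "inject");
        System.out.println(totalLength(words)); // prints 22 (3 + 6 + 7 + 6)
    }
}
```

The pipeline has the same shape as a Groovy collect followed by inject; only the method names differ.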

--
Russel.
=============================================================================
Dr Russel Winder      t: +44 20 7585 2200   voip: sip:[hidden email]
41 Buckmaster Road    m: +44 7770 465 077   xmpp: [hidden email]
London SW11 1EN, UK   w: www.russel.org.uk  skype: russel_winder


Re: MapReduce from Java

Russel Winder-3
In reply to this post by thomas.kuehner
On Fri, 2013-08-16 at 12:18 +0200, [hidden email] wrote:
> Hello, I hope don't annoy you for asking so much questions, but here  
> is one more.

Only one ;-)

> How does GPars support map/reduce from Java, because unfortunately I  
> have to use Java. In the Documentation it's recommended for Java:  
> "Parallel Collection - use jsr-166y library's Parallel Array  
> directly". Does this include map/reduce? I'm not sure if that's  
> already the answer because I can't find it in the doc and the API.

jsr166y is dead. I know Parallel Array is used in GPars, but this will
have to change. Java 8 introduces streams, which provide lazily evaluated
sequential or parallel execution.  GPars will have to change to
accommodate this.
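
As a rough sketch of what "lazily evaluated sequential or parallel execution" means in practice, assuming Java 8 or later: the same pipeline can be run either way, and the mapping function is not called until a terminal operation such as reduce is invoked. The class LazyStreams and its method names are invented for this illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Stream;

// Hypothetical example class; the names are not from GPars or the JDK.
class LazyStreams {

    // The same map/reduce pipeline, run sequentially or in parallel.
    static int sumOfSquares(List<Integer> numbers, boolean parallel) {
        Stream<Integer> source = parallel ? numbers.parallelStream() : numbers.stream();
        return source.map(n -> n * n).reduce(0, Integer::sum);
    }

    // Builds a pipeline without a terminal operation and reports how often
    // the mapping function has been called so far.
    static int callsBeforeTerminalOp(List<Integer> numbers) {
        AtomicInteger calls = new AtomicInteger();
        Stream<Integer> pipeline = numbers.stream().map(n -> {
            calls.incrementAndGet();
            return n * n;
        });
        // 'pipeline' is deliberately never consumed, so map() has not run.
        return calls.get();
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5);
        System.out.println(sumOfSquares(numbers, false));   // prints 55
        System.out.println(sumOfSquares(numbers, true));    // prints 55
        System.out.println(callsBeforeTerminalOp(numbers)); // prints 0
    }
}
```

Both variants give the same result here because integer addition is associative and 0 is its identity, which is what reduce requires for a parallel run to be safe.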

Currently, though, GPars has parallel collect and parallel inject (also
known as parallel map / parallel reduce).

http://gpars.org/guide/guide/single.html#dataParallelism

is the place to get the information, but it is a lengthy read: there are
11 sections.

> Here are more details. My specific/simple problem is:
> I have a list of users and each user contains a list of places he  
> visited (e.g. List ={ A,B,C...A,D,E,A}), so the same place can occur  
> multiple times. Now the actual task is to count how often each place  
> has been visited. As far is I know, this is a simple and typical  
> map/reduce problem (like counting words) and thus I want to exploit  
> the ability of executing parallel. As mentioned before, how does GPars  
> support to solve that with Java?

Why do you have to code this up in Java? Using Groovy would seem to me
to be better.

I am not convinced a map–reduce parallel algorithm is what you want; you
almost certainly need to investigate groupBy.
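
If Java really is a requirement, one possible shape of the groupBy idea, assuming Java 8 or later rather than GPars, is to flatten every user's places into one stream and group equal places while counting them. The User class below is a hypothetical stand-in for the poster's own type, and PlaceCounter and countVisits are names invented for this sketch.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical example class; not taken from the poster's code or from GPars.
class PlaceCounter {

    // Assumed, simplified stand-in for the real user type.
    static class User {
        final List<String> places;

        User(String... places) {
            this.places = Arrays.asList(places);
        }
    }

    // Counts how often each place occurs across all users.
    static Map<String, Long> countVisits(List<User> users) {
        return users.parallelStream()
                // one stream of places instead of one list per user
                .flatMap(user -> user.places.stream())
                // group equal places and count the members of each group
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }

    public static void main(String[] args) {
        List<User> users = Arrays.asList(
                new User("A", "B", "C", "A"),
                new User("D", "E", "A"));
        Map<String, Long> counts = countVisits(users);
        System.out.println(counts.get("A")); // prints 3
        System.out.println(counts.get("D")); // prints 1
    }
}
```

Whether parallelStream() pays off depends on the size of the data; for small lists a plain stream() is likely to be at least as fast, so it is worth measuring both.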

--
Russel.

Re: MapReduce from Java

thomas.kuehner
So, I was able to write a solution in Groovy, but I am not at all
sure whether it performs well, because I have very little experience
with Groovy. Could someone look at the code and suggest how to
improve it? The task to be solved is described again in info.txt.
Only the method mapreduce (10 lines), between the
//------------- Code //---------- markers, needs reviewing:
https://github.com/TommiK/MapReduceGroovy/blob/master/GPars/src/system/MapReduce.groovy





RE: MapReduce from Java

Bob Brown
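I don't know about the map/reduce parts, but instead of

def sequenceA=""
new File("SequenceA.txt").eachLine { line -> sequenceA+=line }

you can just do

def sequenceA=new File("SequenceA.txt").readLines()

(Note that readLines() returns a List of lines rather than a single
String; use readLines().join('') if you need the concatenated String.)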

You might also like to take a look at gbench, see:

http://wordpress.transentia.com.au/wordpress/2013/03/25/gorgeous-gbench/

HTH

Bob

> -----Original Message-----
> From: [hidden email] [mailto:[hidden email]]
> Sent: Thursday, 22 August 2013 6:41 PM
> To: [hidden email]
> Subject: Re: [gpars-user] MapReduce from Java
>
> So,  I was able to write a solution in Groovy, but I am absolutely not sure if my
> solution has a good performance because my experience with Groovy is very
> low. Can perhaps someone look at the code and perhaps give some
> suggestion how to improve the whole. In info.txt you can read again the task
> which has to be solved. It's just about the method
> mapreduce(10 lines) between //------------- Code //----------
> https://github.com/TommiK/MapReduceGroovy/blob/master/GPars/src/system/MapReduce.groovy
>
>