Need some tips about doing some files operations

chiquitinxx
Hello!

I want to do some work with files, and I don't know which GPars technique or approach to use. Maybe you can give me some guidance. I'll try to explain:

Main task (only one main task can run at a time; two must never run concurrently):

while (not finished)
        do some before stuff()
        checkFiles([array of files and folders],lastModificationDates)
        do some after stuff()
        wait some time()
end-while

The before and after steps are just normal functions, and the wait is simply a sleep.

function checkFiles([], lastModificationDates)
        // Here I want to read all the files, including the files in the folders
        // Check whether each file has changed and, if it has, call a function that does some 'heavy stuff'
        // That heavy stuff can (must) run concurrently, and at the same time as the file checking
end function
       
So, the questions are:
* While I'm reading files, I want to start doing that 'heavy stuff'. Actors or async?
* I want the function to return only when all files have been read and all the 'heavy stuff' is done. Dataflow? How do I detect that all the heavy stuff is done?

I hope you understand me. Thank you very much.
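
The "only one main task at a time" requirement in the loop described above could be sketched roughly as follows. This is plain Groovy with no GPars involved; the names `running`, `runMainTask`, `finished` and `round` are hypothetical and chosen only for illustration, and the guard is just one possible way to reject a second concurrent run.

```groovy
import java.util.concurrent.atomic.AtomicBoolean

// Hypothetical guard flag: true while a main task is running.
final AtomicBoolean running = new AtomicBoolean(false)

// Repeats `round` until `finished` returns true, sleeping `pauseMillis` after each round.
// Returns false immediately if another main task is already running.
def runMainTask = { Closure finished, long pauseMillis, Closure round ->
    if (!running.compareAndSet(false, true)) {
        return false
    }
    try {
        while (!finished()) {
            round()             // before stuff, checkFiles(...), after stuff
            sleep(pauseMillis)  // "wait some time"
        }
        return true
    } finally {
        running.set(false)      // always release the guard, even on errors
    }
}

// Usage: run three rounds with a short pause between them.
def rounds = 0
def completed = runMainTask({ rounds >= 3 }, 10L, { rounds++ })
println "completed=${completed}, rounds=${rounds}"
```

The compareAndSet call makes the check and the claim a single atomic step, so two callers cannot both pass the guard.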

This is my current approach. I don't like this 'number' counter in the agent, and I have no idea how to track when the long-running work is done:

import groovyx.gpars.actor.Actors
import groovyx.gpars.agent.Agent
import groovyx.gpars.dataflow.DataflowQueue
import java.util.concurrent.ConcurrentHashMap
import static groovyx.gpars.GParsPool.withPool
import static groovyx.gpars.dataflow.Dataflow.task

def items = ['.', '../', '../../groovy/data']
// Last known modification time per absolute path; written from several pool threads
def dates = new ConcurrentHashMap<String, Long>()

def doLongStuff(String name) {
    //println '  Starting long stuff ' + name
    sleep(1000)
}

def transformer
final def works = new DataflowQueue()

transformer = Actors.actor {
    def number = new Agent(0)   // count of long tasks still running
    def changes = []            // paths of the changed files received so far
    def ending = false          // set once the scan has sent 'End'

    loop {
        react { msg ->
            //println '  Msg->' + msg
            if (msg.startsWith('send')) {
                def path = msg.substring(5)   // strip the 'send ' prefix
                number.send { updateValue(it + 1) }
                changes << path
                task {
                    works << doLongStuff(path)
                }.then {
                    number.send { updateValue(it - 1) }
                    transformer << 'MaybeTerminate'
                }
            } else if (msg == 'End') {
                ending = true
                transformer << 'MaybeTerminate'
            } else if (msg == 'MaybeTerminate') {
                // Stop only when the scan is over and no long task is pending
                if (ending && number.val <= 0) {
                    terminate()
                }
            } else {
                println 'I don\'t understand message: ' + msg
            }
        }
    }
}

def checkFile = { File file ->
    def path = file.absolutePath
    def modified = file.lastModified()
    // Files without a stored date count as changed
    if (dates[path] != modified) {
        println '  File changed: ' + path
        transformer << 'send ' + path
        dates[path] = modified
    }
}

withPool {
    items.eachParallel { name ->
        def file = new File(name)
        if (file.isDirectory()) {
            // Only direct children are checked; subfolders are not descended into
            file.eachFile { File item ->
                if (item.isFile()) {
                    checkFile(item)
                }
            }
        } else if (file.isFile()) {
            checkFile(file)
        } else {
            println 'Error in file/folder ' + name
        }
    }
}

// All files have been scanned; the actor terminates once the pending work is done
transformer << 'End'

transformer.join()
---------------------------------------------------------------------
To unsubscribe from this list, please visit:

    http://xircles.codehaus.org/manage_email


Re: Need some tips about doing some files operations

Vaclav
Administrator
Hi Jorge,

To me this looks like a dataflow problem. Pipes and operators would make the code much simpler, I believe.
Your checkFile() method would feed work items into a dataflow queue, which would be read by an operator that performs your doLongStuff() action. The operator should have its maxForks property set to the desired level of parallelism.

Sending a PoisonPill would then terminate the operator.

Vaclav
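
A minimal sketch of the pipeline described above, for readers who want something to run. It assumes GPars 1.2.1 fetched through @Grab; the file names, the `processed` collection and the `done` output channel are made up for the illustration, and the short sleep only stands in for the real heavy work. An output channel is included because a later message in this thread reports a NullPointerException when the operator has none.

```groovy
@Grab('org.codehaus.gpars:gpars:1.2.1')
import groovyx.gpars.dataflow.DataflowQueue
import groovyx.gpars.dataflow.operator.PoisonPill
import static groovyx.gpars.dataflow.Dataflow.operator
import java.util.concurrent.ConcurrentLinkedQueue

// Stand-in for the real heavy work; it records each path so the result can be inspected.
final processed = new ConcurrentLinkedQueue<String>()
def doLongStuff = { String path ->
    sleep(100)
    processed << path
}

final works = new DataflowQueue()   // changed paths waiting to be processed
final done = new DataflowQueue()    // output channel; the pill is forwarded here

// Up to four work items may be processed at the same time.
final op = operator(inputs: [works], outputs: [done], maxForks: 4) { String path ->
    doLongStuff(path)
}

// The scanning side only writes paths into the queue.
['a.txt', 'b.txt', 'c.txt', 'd.txt', 'e.txt'].each { works << it }

works << PoisonPill.instance   // signals that no more work will follow
op.join()                      // returns once the operator has terminated

println "Processed ${processed.size()} files"
```

The pill travels through the same queue as the work items, so it is only seen after every path written before it has been read. That ordering is what lets op.join() double as the "everything is done" signal.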




On Mon, Feb 18, 2013 at 8:44 AM, Jorge Franco Leza <[hidden email]> wrote:
[...]

--
E-mail: [hidden email]
Blog: http://www.jroller.com/vaclav
Linkedin page: http://www.linkedin.com/in/vaclavpech
Re: Need some tips about doing some files operations

chiquitinxx
Oh! Thank you very much :)

I was missing that poison pill (haha, fun name); the code looks better now.

If the operator is given no outputs, a NullPointerException is thrown when the poison pill arrives.

    final def works = new DataflowQueue()
    final def exit = new DataflowVariable()

    final op = operator(inputs: [works], outputs: [exit], maxForks: 4) { x ->
        doLongStuff(x)
    }

    def checkFile = { File file ->
        ...
        if (change) {
            println '  File changed: '+file.absolutePath
            task {
                works << file.absolutePath
            }
            dates."${file.absolutePath}" = file.lastModified()
        }
    }

    ...

    works << PoisonPill.instance

    op.join()


On 19/02/2013, at 16:17, Václav Pech wrote:

[...]

Re: Need some tips about doing some files operations

Vaclav
Administrator
I'm glad this worked.
One minor improvement: I don't think you need to start a new task just to write the path into the dataflow queue in "works << file.absolutePath".

Thanks for reporting the exception. I'll look into it. I believe passing an empty list, "outputs: []", should work for now.

Vaclav
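
The point about not needing a task can be illustrated with a small sketch. It assumes GPars 1.2.1 via @Grab and made-up file names. Writing to a DataflowQueue buffers the value and returns without waiting for a reader, so the scanning code can write paths directly.

```groovy
@Grab('org.codehaus.gpars:gpars:1.2.1')
import groovyx.gpars.dataflow.DataflowQueue

final works = new DataflowQueue()

// Each write is buffered and returns immediately, so no surrounding task is needed.
['a.txt', 'b.txt', 'c.txt'].each { works << it }
println "Buffered items: ${works.length()}"

// Reading with .val blocks until a value is available and removes it from the queue.
def received = (1..3).collect { works.val }
println "Received: ${received}"
```

Only the reading side can block, and in the operator setup the reading is done by the operator itself.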



On Tue, Feb 19, 2013 at 8:16 PM, Jorge Franco Leza <[hidden email]> wrote:
[...]