Re: foldParallel() doesn't work as inject(). Is that correct?
You say you're planning to process large files, so parallel collections are not the best choice, IMHO: for parallel collections to work, all the elements have to be loaded into memory first.
I would rather model the process around the dataflow abstraction, with a reader process feeding the elements into a DataflowQueue, which is then read by a group of processing operators. At the end of the computation the operators would share and combine their local statistics.
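To make the shape of that pipeline concrete, here is a minimal sketch. Note that it does not use GPars: it swaps in a plain java.util.concurrent BlockingQueue and a thread pool as a stand-in for a DataflowQueue and its operators, so only the pattern carries over. The class name DataflowSketch, the method totalLength, the POISON end-of-stream marker, the queue capacity and the "sum of line lengths" statistic are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class DataflowSketch {
    // Hypothetical end-of-stream marker. It is compared by identity,
    // so it can never collide with a real line of input.
    private static final String POISON = new String("<EOF>");

    /** Sums the lengths of all lines, using `workers` consumers fed from one bounded queue. */
    public static long totalLength(Iterable<String> lines, int workers) throws Exception {
        // Bounded queue: the reader blocks when the workers fall behind,
        // so only a window of the input is held in memory at any time.
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(1024);
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<Long>> partials = new ArrayList<>();
            for (int i = 0; i < workers; i++) {
                partials.add(pool.submit(() -> {
                    long local = 0; // this worker's local statistic
                    for (String line = queue.take(); line != POISON; line = queue.take()) {
                        local += line.length();
                    }
                    return local;
                }));
            }
            // Reader: the calling thread feeds elements one by one.
            for (String line : lines) {
                queue.put(line);
            }
            // One marker per worker, so every worker terminates.
            for (int i = 0; i < workers; i++) {
                queue.put(POISON);
            }
            // Combine the local statistics once all workers are done.
            long total = 0;
            for (Future<Long> partial : partials) {
                total += partial.get();
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(totalLength(Arrays.asList("alpha", "beta", "gamma"), 3));
    }
}
```

For a real file, the Iterable would be replaced by a loop over a BufferedReader, so lines are read lazily. Each worker keeps its own local statistic and the results are merged only at the end, which avoids contention on shared state during processing.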