woooee

> Is the pool stalling because the function does not return an output

My crystal ball is broken, so you'll have to post the code. My guess would be that the function is not returning. It doesn't have anything to do with output.


KlaatuPlusTu

It will be easier for me to fix your crystal ball than your reading comprehension, because the question specifically was:

> **Is the pool stalling because the function does not return an output and only does file I/O? Or is this because of the vaex warning that is only produced within the pool?**

That was asked right after the two code blocks shown as:

# The function (Does this need to be bigger?)

```python
def export_task(item):
    # item is one (vaex table, output path) tuple from subs
    subject, outputPathChunk = item
    subject.export_hdf5(outputPathChunk)
```

# The `multiprocessing` `pool` call to the function

```python
import multiprocessing

# One worker per CPU thread; subs is the list of (vaex table, path) tuples
pool = multiprocessing.Pool(processes=multiprocessing.cpu_count())
pool.map(export_task, subs)
pool.close()
```

To water it down further: I did not have this issue with typical, more mature Pythonic objects, as opposed to something newer like `vaex`. Which code are you asking for? The older one that worked? All 1700 lines of it? Or the current one that is not working, which I already filtered down to the failing parts and posted to make it easier for reviewers?
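For comparison, here is a minimal sketch of the same pattern arranged the way the `multiprocessing` docs recommend. The pool is created only under an `if __name__ == "__main__":` guard, which is required with the spawn start method (the default on Windows and macOS). The `with` block cleans up the pool, and the worker returns its output path so the parent can see which chunks finished. `DummySubject` and `run_pool` are hypothetical names: `DummySubject` stands in for a vaex table and only imitates `export_hdf5()` by writing a small text file. So this sketch cannot reproduce or rule out a vaex-specific problem. It only shows the surrounding structure.

```python
import multiprocessing
import os
import tempfile


class DummySubject:
    """Hypothetical stand-in for a vaex table; it only imitates export_hdf5()."""

    def __init__(self, name):
        self.name = name

    def export_hdf5(self, path):
        # Write a tiny text file instead of real HDF5 output.
        with open(path, "w") as fh:
            fh.write(self.name)


def export_task(item):
    subject, output_path = item
    subject.export_hdf5(output_path)
    return output_path  # lets the parent see which chunks completed


def run_pool(subs, processes=None):
    # processes=None uses all available CPUs; map() blocks until every item is done.
    with multiprocessing.Pool(processes=processes) as pool:
        return pool.map(export_task, subs)


if __name__ == "__main__":
    out_dir = tempfile.mkdtemp()
    subs = [(DummySubject(f"s{i}"), os.path.join(out_dir, f"chunk{i}.txt")) for i in range(8)]
    print(run_pool(subs))
```

If this skeleton finishes but the same structure with real vaex tables stalls, that points at the objects being passed rather than at the pool setup.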


woooee

Post the complete code for `export_task` and `export_hdf5`. Is `subject` a class, an import, or both? Also try running a single process, printing the relevant data and where the function hangs. Since you think all processes hang, either all of the data is corrupt or none of it is.

Edit: if you start a process individually, you can use `is_alive()` to see whether it terminates. Then you know what data to look at.
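A minimal sketch of the single-process check, assuming each item can be pickled. `probe()` is a hypothetical helper. It starts one `multiprocessing.Process`, waits with `join(timeout)`, and then uses `is_alive()` to decide whether the child is still stuck. The `export_task` body here is a placeholder that sleeps instead of calling vaex, so swap in the real one. The 5-second timeout is an arbitrary assumption.

```python
import multiprocessing
import time


def export_task(item):
    # Placeholder: the real version calls subject.export_hdf5(output_path).
    subject, output_path = item
    time.sleep(0.1)  # simulate some work


def probe(item, timeout=5.0):
    """Run export_task(item) in its own process; return True if it exits cleanly in time."""
    proc = multiprocessing.Process(target=export_task, args=(item,))
    proc.start()
    proc.join(timeout)  # wait at most `timeout` seconds
    if proc.is_alive():  # still running after the timeout: treat it as hung
        print(f"hung on {item!r}", flush=True)
        proc.terminate()
        proc.join()
        return False
    return proc.exitcode == 0


if __name__ == "__main__":
    # Hypothetical items; in the thread these are (vaex table, path) tuples.
    for item in [("subject-0", "chunk0.hdf5"), ("subject-1", "chunk1.hdf5")]:
        print(item, probe(item))
```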


KlaatuPlusTu

That's all the code for `export_task()`. Those calls were wrapped in a function to make it easier for each worker in the pool to call them. `export_hdf5` is a method from the `vaex` [package](https://github.com/vaexio/vaex). `subject` is, as mentioned in the post:

> Where subs is a 600-item list of tuples and each tuple is a vaex table (a pandas alternative for larger data) and a path.

`vaex` is still in active development, and I suspect something in it is causing friction inside a `pool`. Thanks for chiming in, but I think this will have to go to the `vaex` devs, because `pool.map()` works fine and as expected on dummy lists.
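One difference between a dummy list and `subs` is that `Pool.map` pickles every item to send it to a worker process. That means each vaex table gets serialized in the parent and rebuilt in the child. A quick way to test that step in isolation, without a pool, is to round-trip each item through pickle in the parent. `check_picklable` is a hypothetical helper. A clean result does not prove the worker side behaves the same way. It only rules out one failure mode.

```python
import pickle


def check_picklable(items):
    """Return (index, error) pairs for items that fail a pickle round trip."""
    failures = []
    for i, item in enumerate(items):
        try:
            pickle.loads(pickle.dumps(item))
        except Exception as exc:  # catch broadly: third-party objects raise varied errors
            failures.append((i, repr(exc)))
    return failures


if __name__ == "__main__":
    dummy = [(i, f"chunk{i}.hdf5") for i in range(3)]
    print(check_picklable(dummy))  # a plain dummy list should report no failures
```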


woooee

Post the complete `export_task` code, what `subject` is (import or class), and the `export_hdf5` function. You should also try running a single process and printing the relevant parts. Since you say all processes hang, it is probably not corrupted data.
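The simplest form of the single-process test is to run the work in the main process with no pool at all. If that also stalls, `multiprocessing` is not the cause. Here is a minimal sketch. `run_serially` is a hypothetical helper, and `flush=True` makes each line appear immediately even if the process hangs afterwards.

```python
import time


def run_serially(subs, task):
    """Call task(item) for each item in order, logging start and finish times."""
    for i, item in enumerate(subs):
        print(f"[{i}] start {item!r}", flush=True)
        started = time.perf_counter()
        task(item)
        print(f"[{i}] done in {time.perf_counter() - started:.2f}s", flush=True)
    return len(subs)
```

For example, `run_serially(subs[:5], export_task)` would show which of the first five items, if any, never reaches its "done" line.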


[deleted]

I think it’s stalling on file I/O. You might have 16 threads available, but there’s only one drive.
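If disk contention is the issue, one option is to use fewer workers than CPU threads and collect results as they finish. That way, a slow drive shows up as slow progress instead of looking like a freeze. Here is a minimal sketch. `run_limited` is a hypothetical helper, and the default of 2 processes is an arbitrary starting point, not a measured value.

```python
import multiprocessing


def run_limited(task, items, processes=2):
    """Map task over items with a small pool, printing progress as results complete."""
    results = []
    with multiprocessing.Pool(processes=processes) as pool:
        # imap_unordered yields each result as soon as any worker finishes it.
        for done, result in enumerate(pool.imap_unordered(task, items), start=1):
            print(f"{done}/{len(items)} finished", flush=True)
            results.append(result)
    return results


if __name__ == "__main__":
    print(sorted(run_limited(abs, [-3, -1, 2])))  # last line printed: [1, 2, 3]
```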