Skip to content

Changed wait_for_group logic#1

Open
baratrion wants to merge 1 commit into
fergalwalsh:masterfrom
baratrion:master
Open

Changed wait_for_group logic#1
baratrion wants to merge 1 commit into
fergalwalsh:masterfrom
baratrion:master

Conversation

@baratrion

Copy link
Copy Markdown

Previously, the logic for waiting for tasks within a group had a small bug: it was calculating remaining time before brpoping a task, effectively only getting result of a single task that is close to timeout.

It's better to go over it with an example:

def task(x):
    time.sleep(x)
    return x

l = [1.2, 2.3, 3.1, 4.6, 4.7, 4.8, 5, 6, 7]
tasks_args = [(x,) for x in l]
task_ids = fifo.queue_tasks('example.tasks.task', tasks_args,
                            max_wait=30, result_timeout=30)


results = fifo.wait_for_group(task_ids, timeout=5)

Results in:

('2b39d9d7e53a477bbdc88b42a3ceba75', 'timeout', None)
('fff77873d32c403ba67bbb948653d52b', 'timeout', None)
('f12674188d0347ce896862c0ebdaded0', 'timeout', None)
('0751786d6f584ea78ad312879b556ec8', 'timeout', None)
('a9a69cca2cb7406fa62246648cd00f10', 'timeout', None)
('9786029043ba4cd9b74318998138f14d', 'completed', 2.3)
('b2c8df2304d64d258da1384039c8f82a', 'completed', 3.1)
('a2bad899fbea4bb395eb82468dd4996b', 'completed', 4.6)
('7c5ed51ca6b94828a6b7937b955cf654', 'completed', 1.2)

After this change the result is as follows:

('c933d3443fdd4d6fa13b3260ebb7ad4f', 'completed', 3.1)
('be707f648da54cf2a5e4747bd54fb6d1', 'timeout', None)
('28f64c68df9f412dbabfa1ea2b459908', 'completed', 1.2)
('28f0ece03f214f6c8690406ebcb9d664', 'completed', 2.3)
('c014f3c8178145829e665c940442de9c', 'completed', 5)
('0577ba31d591415fb3163d355c8982ea', 'completed', 4.6)
('d1d2983173104fe98d0289594cb72fcc', 'timeout', None)
('3bb4b55e08b147ebadcd82304adaf680', 'completed', 4.7)
('e3875a3de1be4568a22018484557314f', 'completed', 4.8)

@fergalwalsh

Copy link
Copy Markdown
Owner

Hi @baratrion , thanks, for the bug report. I have ran your test code and confirmed there is a problem but I think the source is a bit different from what you describe.

I have rechecked the logic and it seems correct for how brpop works: When any items are in the list the command will immediately return with a single item. If there are many items in the list it still only returns one each time brpop is called. If there are no items in the list then it blocks/waits until there is an item or until timeout seconds have passed.

The problem I see is that timeout is an integer (must be for redis) and the condition for continuing is if timeout > 0:. If the remaining time (deadline - time.time()) is < 1s but > 0s it doesn't wait so effectively the timeout is 1s less than the specified timeout argument.

The solution I see is to change the condition to if timeout >= 0: or to move the int casting after the condition. If you try either of these your example works fine as far as I can see. I will open an issue and make a fix for this bug.

@baratrion

Copy link
Copy Markdown
Author

Hi @fergalwalsh, yes, you've elaborated the problem nicely. However, without trying out your suggestion first, wouldn't if timeout >= 0 cause brpop to block indefinitely when timeout = 0?

Then, it'd cause it to block at least until there's an item, so wait_for_group wouldn't respect its timeout at all, no?

@fergalwalsh

Copy link
Copy Markdown
Owner

@baratrion yup right you are. That must be why I used > not >= in the first place.

Unfortunately the timeout must be an int for redis. If we round up the remaining time to 1s then it can end up waiting for more than the specified timeout in total which isn't really acceptable. On the other hand in the current situation it may not wait for the full timeout which also isn't good.

So at the moment I'm not really sure what the best solution is. Any ideas?

@baratrion

Copy link
Copy Markdown
Author

Probably the best solution would be Redis having floating timeouts, but I'd suggest taking a look at my changes also.

Do you have any objections using an interval (default 1 secs) for popping results and calculating total_elapsed every time?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants