Skip to content

Make Job Scheduling Aware of Queues the Server is Responsible For#755

Open
EProd-Rhansen wants to merge 11 commits intoHangfireIO:mainfrom
EProd-Rhansen:master
Open

Make Job Scheduling Aware of Queues the Server is Responsible For#755
EProd-Rhansen wants to merge 11 commits intoHangfireIO:mainfrom
EProd-Rhansen:master

Conversation

@EProd-Rhansen
Copy link
Copy Markdown

In working with this product in a distributed architecture, we have noticed our logs becomming inundated with job load exceptions from servers that are trying to schedule jobs when they do not have a reference to the assembly that contains the type that contains the job's logic.

This series of changes aims to isolate job scheduling such that servers only attempt to schedule or enqueue jobs that are assigned to a queue that the server is responsible for. Additionally, schedulers will now lock at the queue level which allows mutliple schedulers to run simultaneously so long as they are working from different queues.

None of these changes are breaking and can roll into any environment that is currently using the version that this change set was branched from.

@codecov-io
Copy link
Copy Markdown

codecov-io commented Dec 3, 2016

Codecov Report

Merging #755 into master will increase coverage by 2.26%.
The diff coverage is 81.84%.

Impacted Files Coverage Δ
src/Hangfire.Core/RecurringJobOptions.cs 63.63% <ø> (-36.37%) ⬇️
src/Hangfire.Core/Storage/JobStorageTransaction.cs 11.11% <ø> (ø) ⬆️
src/Hangfire.Core/Storage/JobStorageConnection.cs 8.33% <ø> (ø) ⬆️
src/Hangfire.Core/QueueAttribute.cs 100% <ø> (ø) ⬆️
src/Hangfire.Core/BackgroundJobServer.cs 0% <0%> (ø) ⬆️
src/Hangfire.SqlServer/ExpirationManager.cs 96.36% <100%> (+3.14%) ⬆️
src/Hangfire.Core/Storage/InvocationData.cs 73.62% <100%> (+0.89%) ⬆️
...rc/Hangfire.Core/Common/ReflectedAttributeCache.cs 100% <100%> (ø) ⬆️
src/Hangfire.SqlServer/SqlServerConnection.cs 99.7% <100%> (+0.02%) ⬆️
src/Hangfire.Core/States/ScheduledState.cs 100% <100%> (ø) ⬆️
... and 19 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ffc4912...6cd76d8. Read the comment docs.

@EProd-Rhansen
Copy link
Copy Markdown
Author

EProd-Rhansen commented Dec 5, 2016

Fixes: #228
Fixes: #406
Fixes: #595
Fixes: #683
Fixes: #704
Fixes: #743
Fixes: #748

@EProd-Rhansen
Copy link
Copy Markdown
Author

Is there a general feel for the willingness to move forward with accepting this PR?

@burningice2866
Copy link
Copy Markdown
Contributor

burningice2866 commented Jan 26, 2017

Unfortunately adding optional parameters to the public API is a breaking change so as i see it, all public methods/constructors now taking an optional queueName/queue parameter needs to be overloads instead for this to be feasible at all.

The reason for this is that optional parameters is a compiler trick, which injects the default value into the callee, therefor you'll get a MissingMethodException in existing code that haven't been recompiled against the updated methods.

https://lostechies.com/jimmybogard/2010/05/18/caveats-of-c-4-0-optional-parameters/

For folks shipping assemblies for a living, this means that optional argument values don’t version well, as callers have to recompile. When I used to work for a company whose product included a DLL, we avoided optional method arguments for just this reason. It’s not a reason not to use optional arguments, but it’s important to understand how they work so that you don’t run into headaches later.

@EProd-Rhansen
Copy link
Copy Markdown
Author

If you are referring to the changes to RecurringJob, then I did not add optional parameters for queue name. They already existed. The only thing I did was change the default value.

The one place that I can recall where I might have added queue name to the method signature where it did not already exist was to the creation of fire-and-forget jobs, but I am pretty sure that I implemented that with overloads rather than optional parameters.

If you can point out where I've done as you've said, then I'd be happy to look at correcting it. I generally shy away from default parameters whenever I can as I feel that they just end up bloating the method signature, on top of the fact that it is usually a pretty good indication that you're violating the single responsibility principle somewhere.

Beyond that, the problem that you've outlined (as I understand it) would only show its ugly face if someone simply dropped the updated assemblies into their production bin folder. In my mind. the only time that this could logically happen is if you're in an environment that does not treat updates to 3rd party references as a new build for your own applications - which is a terrible practice in its own right. In disciplined shops, you don't even get to production without a passing CI build - period - so this would never be a "breaking" update for them.

For a change to be a breaking one, it must necessitate a code change on your end. I've never (personally) considered the need to simply recompile my own code against an updated assembly as a breaking change due to the fact that I would NEVER perform the update in any other way. =)

@burningice2866
Copy link
Copy Markdown
Contributor

burningice2866 commented Jan 26, 2017

@EProd-Rhansen Even changing the default value can be a breaking change, since the original default value is already compiled into to existing libraries

And you're missing the scenario of a solution with multiple libraries depending on HangFire but build againts different version numbers.

(version numbers are random and only there to prove my point)

Solution is upgading to Hangfire 2.9 which contains these changed default values

  • Lib1 is built against hangfire 2.3
  • Lib2 is built against hangfire 2.5
  • Lib3 is built against hangfire 2.4

In this case yo can't update Hangfire for the solution without having to update all the libraries, and you don't control the release cycle of them.

@burningice2866
Copy link
Copy Markdown
Contributor

burningice2866 commented Jan 26, 2017

@EProd-Rhansen i'll admit that i haven't combed through all the code changes, i merely noted a signature change here https://github.com/HangfireIO/Hangfire/pull/755/files#diff-762b9c9bc1acba9c001d8544882b1d4aR34 and a range of changed default parameter values, which raised my flag since i know from experience that it can be a minefield.

After looking more carefully i suppose one can reason, that libraries not being recompiled wont fail, but one will experience unexpected behavior since the fixed queue-logic is not going into effect.

@EProd-Rhansen
Copy link
Copy Markdown
Author

Well, I was busy writing a well thought out rebuttal when you went and ruined it by coming to the conclusion all on your own! I guess that's what I get for thinking I know everything, lol.

It is splitting hairs, admittedly, but the worst that would happen in this case is that the framework exhibited the exact same behavior before and after the update was performed. The layman would just assume that their application was somehow auto-magically still using the old code until they recompiled it, whereas someone in-the-know would guess that someone cheated by just changing the default value of a public API.

For the record, I did make a (valiant) attempt at removing the optional parameters completely in favor of discrete method signatures in that class. However; that class is already overload hell. It would require some pretty dramatic restructuring of the API to remove them completely.

@hangfireCoder
Copy link
Copy Markdown

We also are currently facing this same exact issue and would like to see this implemented in the release.

@jcoutch
Copy link
Copy Markdown

jcoutch commented Apr 10, 2017

I would love to see this get into a build sooner than later. I'm getting 101,000+ "job load exception" messages between 3 servers in a 24-hour timespan. :-(

@WilkoSki
Copy link
Copy Markdown

WilkoSki commented Jun 2, 2017

Any movement on this? I agree with @jcoutch we are seeing too many job load exceptions and it just creates unnecessary noise.

@Z1ni
Copy link
Copy Markdown

Z1ni commented Sep 5, 2017

Waking this up, our server logs are filled with this exception noise.

@d3bgger
Copy link
Copy Markdown

d3bgger commented Oct 4, 2017

We want the server which added the recurrent job (or a direct job) to be the one that will execute the job!
If i understand correctly this pull request will be a solution for that.
Is there any other way to achieve this?
Thats an important issue for us.

thank you

@Tobias-H
Copy link
Copy Markdown

Tobias-H commented Oct 6, 2017

I registered account just to request you fix this issue. This is a blocker for us. We really wish to use Hangfire in more than one service per environment (multiple queues).

@d3bgger
Copy link
Copy Markdown

d3bgger commented Oct 6, 2017

Actually this seems to be working for us (we don't use retries for failed jobs):


// on app start
app.UseHangfireServer(new BackgroundJobServerOptions
{
	Queues = new[] { TaskQueueName.GetServerQueueName() }
});

// when adding recurring jobs
RecurringJob.AddOrUpdate<TaskRunner>(jobId, x => x.Run(taskId, true, null), cronExp, TimeZoneInfo.Local, TaskQueueName.GetServerQueueName());

// when executing jobs without schedule
IBackgroundJobClient hangFireClient = new BackgroundJobClient();
EnqueuedState myQueueState = new EnqueuedState(TaskQueueName.GetServerQueueName());
hangFireClient.Create<TaskRunner>(x => x.Run(id, false, null), myQueueState);

// helper for queue name
public static class TaskQueueName
{
	public static string GetServerQueueName()
	{
		return Environment.MachineName.ToLower().RemoveSpecialCharacters();
	}
}

@Tobias-H
Copy link
Copy Markdown

Tobias-H commented Oct 6, 2017

@nkwinder Strange that you claim this works for you since it contradicts the reason for this pull request (server only loading jobs in its queues).
@nkwinder Also, your code assumes you want to use same queue for all Hangfire servers on same machine, which is often not the case, at least for us.

@david-peden-q2
Copy link
Copy Markdown

We have also run into this issue. Many people here are claiming that it creates log noise. And that is true. But that is not the real problem. The real problem is that as a system's architecture gets more distributed, the chances of a successful re-run are reduced. In other words, as you scale a micro-service architecture up (each with their own dedicated queue against a single database), you inversely reduce the likelihood that a job will ever be picked up and successfully run. This is a serious design flaw. Let's move this discussion beyond the philosophical argument over the merit of optional parameters and, if necessary, considering versioning the codebase so that we can get a solution in place.

Thank you.

@burningice2866 @EProd-Rhansen @odinserj

@Z1ni
Copy link
Copy Markdown

Z1ni commented Dec 20, 2017

It seems that no progress has been made as there are some merge conflicts.
What's the "official" standpoint on this feature and integrating it to the codebase @odinserj?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Development

Successfully merging this pull request may close these issues.