Elvas Tower: Watchdog prevents start of timetable - Elvas Tower

Jump to content

  • 2 Pages +
  • 1
  • 2
  • You cannot start a new topic
  • You cannot reply to this topic

Watchdog prevents start of timetable Watchdog times out on timetable prerun phase Rate Topic: -----

#1 User is offline   roeter 

  • Vice President
  • Group: Status: Elite Member
  • Posts: 2,424
  • Joined: 25-October 11
  • Gender:Male
  • Country:

Posted 02 February 2019 - 03:22 AM

For the first time since about two years (during which time I only used a 'private' version) I decided to run the 'official' version of OR, and immediately ran into a serious problem.
My laptop is not the fastest computer in the world, and the route I'm running is quite large and I am running a very extensive, full 24 hour timetable. Starting a train later in the day, OR takes several minutes to start. But, alas, the watchdog function has no patience for this and aborts the program on time-out of the loader process.
Each of the main process threads sends a signal to the watchdog each time the process loop ends. The watchdog checks on these reports and aborts the program if, after a given time, a report is missing, assuming the process is 'hanging' in an endless loop.
The problem which occurs when starting a timetable is due to the 'prerun' process (method PreRunAI in AI.cs).

When a train is started, the prerun phase runs through the time upto the start of this train.
The prerun phase consists of three loops. The first loop runs from 00:00 (midnight) upto the activation of the first train. This process is limited, it only places trains which are started (e.g. in pools), and checks if any train must activated.

As soon as a train is activated, the second and main loop starts. This loop performs a full normal update, from the activation of the first train upto the defined (booked) start time of the player train. This is the normal update process, except there are no graphics and it runs with a higher time interval. But for large routes with lots of signals and many trains, it will take a long time, in my case several minutes. Unlike the normal updates, the prerun does not exit this loop allowing the 'top' method of the route to send a signal to the watchdog. So during the prerun, no signals are send and this will cause the watchdog to time out.

The third loop is only run if the player train is not ready at its booked time. The processing of the third loop is similar as for the second loop, and it runs until the player train is started, or when the end of the timetable has been reached. This third loop can also cause a time out.

Looking at my own private version, I avoided this problem by setting the time out for the watchdog to 100000 or so seconds, thus practically disabling the watchdog. Obviously this is no solution for the proper version.
What needs to be done is that signals need to be send to the watchdog from within the prerun loops, but I do not know enough of this logic to know if this is possible or how it can be done.
So I hope someone can help to sort this out.
If not, it's back to my private version again.

Regards,
Rob Roeterdink

#2 User is offline   Csantucci 

  • Member, Board of Directors
  • Group: Status: Elite Member
  • Posts: 6,996
  • Joined: 31-December 11
  • Gender:Male
  • Country:

Posted 02 February 2019 - 06:35 AM

Hi Rob,
I'm since ever quite convinced that the watchdog causes problems more than it avoids them. It's a case, but just yesterday I generated a version of OR MG where I added a general options checkbox allowing to disable the watchdog.

Coming to your problem, I see that method PrerunAI (the one for timetable) has already within its main loop a watchdog call that should restart watchdog count, that is
                    if (cancellation.IsCancellationRequested) return; // ping watchdog process

Doesn't this call work?

#3 User is offline   James Ross 

  • Open Rails Developer
  • Group: Status: Elite Member
  • Posts: 5,490
  • Joined: 30-June 10
  • Gender:Not Telling
  • Simulator:Open Rails
  • Country:

Posted 02 February 2019 - 10:13 AM

View Postroeter, on 02 February 2019 - 03:22 AM, said:

Looking at my own private version, I avoided this problem by setting the time out for the watchdog to 100000 or so seconds, thus practically disabling the watchdog. Obviously this is no solution for the proper version.
What needs to be done is that signals need to be send to the watchdog from within the prerun loops, but I do not know enough of this logic to know if this is possible or how it can be done.

You should pass the CancellationToken "cancellation" into any methods that could take too long; they can then call "cancellation.IsCancellationRequested" which serves two purposes:

  • Allows the code to exit faster when the user has tried to close the application
  • Pings the watchdog to indicate things are still alive

However, there are some strict requirements for calling this documented on the "Ping" method itself:

Quote

This method should be called (directly or indirectly) whenever a thread makes significant progress, such as when it is about to start processing the next file during loading. It must never be called inside a loop which is not guaranteed to exit (such as a while-true loop). It may be called within a loop that has fixed, known bounds (such as a typical for loop).

The requirements on when this should be or not be called are important: failure to call this (directly or indirectly) during a long but guaranteed-to-terminate loop will cause unnecessary hang reports, and calling this during a potentially infinite loop will result in hangs that are not reported.

So as long as the code is running a loop that is known to finish (even in the face of completely bogus values from the user or content), you can check the cancellation status at the start of each iteration.

#4 User is offline   James Ross 

  • Open Rails Developer
  • Group: Status: Elite Member
  • Posts: 5,490
  • Joined: 30-June 10
  • Gender:Not Telling
  • Simulator:Open Rails
  • Country:

Posted 02 February 2019 - 10:57 AM

View PostCsantucci, on 02 February 2019 - 06:35 AM, said:

I'm since ever quite convinced that the watchdog causes problems more than it avoids them. It's a case, but just yesterday I generated a version of OR MG where I added a general options checkbox allowing to disable the watchdog.

:(

Do you know what cases it is causing problems with? If it is crashing the program unexpectedly, the code is probably in need of checking cancellation too so it exits faster.

#5 User is offline   roeter 

  • Vice President
  • Group: Status: Elite Member
  • Posts: 2,424
  • Joined: 25-October 11
  • Gender:Male
  • Country:

Posted 03 February 2019 - 01:54 PM

View PostCsantucci, on 02 February 2019 - 06:35 AM, said:

Coming to your problem, I see that method PrerunAI (the one for timetable) has already within its main loop a watchdog call that should restart watchdog count, that is
                    if (cancellation.IsCancellationRequested) return; // ping watchdog process

Doesn't this call work?


The call is in the second loop. The problem occured on a testrun for a freighttrain which was formed of multiple entries. The starttimes for these entries were not yet properly set, and when I tried to start the last 'leg' of this train, the start was delayed. Therefor the third loop was started and as this has no watchdog reset, it timed out before the train was started.
So, an additional watchdog reset must be inserted into the third loop as well.

Quote

Do you know what cases it is causing problems with? If it is crashing the program unexpectedly, the code is probably in need of checking cancellation too so it exits faster.


I have not had any (further) troubles with the wachtdog in normal running, but the intervention of the wachtdog can be quite frustrating when debugging.
When stepping through code with 'step' (F10 or F11), one can suddenly find oneself stepping through the watchdog instead of the code which was being debugged. That's not only quite confusing but it's easy to loose track that way, which means starting all over again.
When stopping often in debug the watchdog can actually time out. And if an actual loop has accidently been created, it is clearly very frustrating when the watchdog kills the program when you are debugging the code to find to cause of the loop.

So, I would like to make a strong plea to disable the watchdog when the debugger is attached.

This can be done quite easy by replacing the lines at the head of the wachtdog process :

        void WatchdogThread()
        {
            Profiler.SetThread();
            Game.SetThreadLanguage();

            while (true)
            {


with these lines :

        void WatchdogThread()
        {
            Profiler.SetThread();
            Game.SetThreadLanguage();

            while (true && !Debugger.IsAttached)
            {


Regards,
Rob Roeterdink

#6 User is offline   James Ross 

  • Open Rails Developer
  • Group: Status: Elite Member
  • Posts: 5,490
  • Joined: 30-June 10
  • Gender:Not Telling
  • Simulator:Open Rails
  • Country:

Posted 03 February 2019 - 02:26 PM

View Postroeter, on 03 February 2019 - 01:54 PM, said:

I have not had any (further) troubles with the wachtdog in normal running, but the intervention of the wachtdog can be quite frustrating when debugging.
When stepping through code with 'step' (F10 or F11), one can suddenly find oneself stepping through the watchdog instead of the code which was being debugged. That's not only quite confusing but it's easy to loose track that way, which means starting all over again.

I don't think I've ever seen this happen while debugging, so I'm curious: what exactly is it jumping to, which line of code? And which version of Visual Studio are you using?

View Postroeter, on 03 February 2019 - 01:54 PM, said:

When stopping often in debug the watchdog can actually time out. And if an actual loop has accidently been created, it is clearly very frustrating when the watchdog kills the program when you are debugging the code to find to cause of the loop.

The watchdog won't kill the program when a debugger is attached, but it will pause in the debugger ("Debugger.Break()") after logging details (line 140, just down from the code you quoted).

View Postroeter, on 03 February 2019 - 01:54 PM, said:

So, I would like to make a strong plea to disable the watchdog when the debugger is attached.

I'd rather not, because then people will simply start creating code that triggers the watchdog but only for users and only know to fix it once bug reports start rolling in.

If it is the "Debugger.Break()" that is causing you issue while debugging, we can look for alternatives, but I do not want it to become silent. Developers do need to be alerted to possible hangs.

We might be able to figure out a way to only count time when not paused in the debugger, if that's not the case currently, so it does not trigger just because you stopped on a breakpoint or similar.

#7 User is offline   Csantucci 

  • Member, Board of Directors
  • Group: Status: Elite Member
  • Posts: 6,996
  • Joined: 31-December 11
  • Gender:Male
  • Country:

Posted 03 February 2019 - 11:50 PM

I practically always disable the watchdog when I debug, because it jumps in often, especially if you are proceeding line-by-line. With SVN I had a patch, now it's a stash that I apply before starting debug. It contains the watchdog disable as well as a comment of this line within GameStateRunActivity.cs
               SystemInfo.WriteSystemDetails(Console.Out);

because - at least in my quite slow computer - that call may last many seconds.

#8 User is offline   roeter 

  • Vice President
  • Group: Status: Elite Member
  • Posts: 2,424
  • Joined: 25-October 11
  • Gender:Male
  • Country:

Posted 04 February 2019 - 06:39 AM

View PostJames Ross, on 03 February 2019 - 02:26 PM, said:

I'd rather not, because then people will simply start creating code that triggers the watchdog but only for users and only know to fix it once bug reports start rolling in.

You mean to say that developpers always run OR with debug attached?
I certainly don't - it's way too slow. Ofcourse I can't speak for other developpers but this seems a bit far fetched.
Anyway - even if it's so, it doesn't prevent developpers from switching it off. That's the problem with developpers - they know how to develop the program, so they also know how to alter things they don't like.
The first things I do after getting 'official' code is to switch to 'console application' and disable the watchdog for debugging.

Regards,
Rob Roeterdink

#9 User is offline   Csantucci 

  • Member, Board of Directors
  • Group: Status: Elite Member
  • Posts: 6,996
  • Joined: 31-December 11
  • Gender:Male
  • Country:

Posted 04 February 2019 - 07:17 AM

Hi Rob,
that does interest me. I assume you switch to console application in order to see log messages in real time. How do you perform the switch?

#10 User is offline   roeter 

  • Vice President
  • Group: Status: Elite Member
  • Posts: 2,424
  • Joined: 25-October 11
  • Gender:Male
  • Country:

Posted 04 February 2019 - 11:30 AM

View PostCsantucci, on 04 February 2019 - 07:17 AM, said:

Hi Rob,
that does interest me. I assume you switch to console application in order to see log messages in real time. How do you perform the switch?

Hello Carlo,

that is indeed exactly why.
It's easy : in Visual Studio, select "Project" (make sure RunActivity is set as Project), select "Run Activity Properties", then "Application", and in application window, for "Output type" select 'console application' instead of 'windows application'.
Does not change anything w.r.t. building or running the program except that is shows the console window so you can see any messages the moment they are generated.

Regards,
Rob Roeterdink

  • 2 Pages +
  • 1
  • 2
  • You cannot start a new topic
  • You cannot reply to this topic

1 User(s) are reading this topic
0 members, 1 guests, 0 anonymous users