Elvas Tower: Watchdog prevents start of timetable - Elvas Tower

Jump to content

Page 1 of 1
  • You cannot start a new topic
  • You cannot reply to this topic

Watchdog prevents start of timetable Watchdog times out on timetable prerun phase Rate Topic: -----

#1 User is offline   roeter 

  • Vice President
  • Group: Posts: Elite Member
  • Posts: 2,453
  • Joined: 25-October 11
  • Gender:Male
  • Country:

Posted 02 February 2019 - 03:22 AM

For the first time since about two years (during which time I only used a 'private' version) I decided to run the 'official' version of OR, and immediately ran into a serious problem.
My laptop is not the fastest computer in the world, and the route I'm running is quite large and I am running a very extensive, full 24 hour timetable. Starting a train later in the day, OR takes several minutes to start. But, alas, the watchdog function has no patience for this and aborts the program on time-out of the loader process.
Each of the main process threads sends a signal to the watchdog each time the process loop ends. The watchdog checks on these reports and aborts the program if, after a given time, a report is missing, assuming the process is 'hanging' in an endless loop.
The problem which occurs when starting a timetable is due to the 'prerun' process (method PreRunAI in AI.cs).

When a train is started, the prerun phase runs through the time upto the start of this train.
The prerun phase consists of three loops. The first loop runs from 00:00 (midnight) upto the activation of the first train. This process is limited, it only places trains which are started (e.g. in pools), and checks if any train must activated.

As soon as a train is activated, the second and main loop starts. This loop performs a full normal update, from the activation of the first train upto the defined (booked) start time of the player train. This is the normal update process, except there are no graphics and it runs with a higher time interval. But for large routes with lots of signals and many trains, it will take a long time, in my case several minutes. Unlike the normal updates, the prerun does not exit this loop allowing the 'top' method of the route to send a signal to the watchdog. So during the prerun, no signals are send and this will cause the watchdog to time out.

The third loop is only run if the player train is not ready at its booked time. The processing of the third loop is similar as for the second loop, and it runs until the player train is started, or when the end of the timetable has been reached. This third loop can also cause a time out.

Looking at my own private version, I avoided this problem by setting the time out for the watchdog to 100000 or so seconds, thus practically disabling the watchdog. Obviously this is no solution for the proper version.
What needs to be done is that signals need to be send to the watchdog from within the prerun loops, but I do not know enough of this logic to know if this is possible or how it can be done.
So I hope someone can help to sort this out.
If not, it's back to my private version again.

Regards,
Rob Roeterdink

#2 User is offline   Csantucci 

  • Member, Board of Directors
  • Group: Posts: Elite Member
  • Posts: 7,443
  • Joined: 31-December 11
  • Gender:Male
  • Country:

Posted 02 February 2019 - 06:35 AM

Hi Rob,
I'm since ever quite convinced that the watchdog causes problems more than it avoids them. It's a case, but just yesterday I generated a version of OR MG where I added a general options checkbox allowing to disable the watchdog.

Coming to your problem, I see that method PrerunAI (the one for timetable) has already within its main loop a watchdog call that should restart watchdog count, that is
                    if (cancellation.IsCancellationRequested) return; // ping watchdog process

Doesn't this call work?

#3 User is online   James Ross 

  • Open Rails Developer
  • Group: Posts: Elite Member
  • Posts: 5,510
  • Joined: 30-June 10
  • Gender:Not Telling
  • Simulator:Open Rails
  • Country:

Posted 02 February 2019 - 10:13 AM

View Postroeter, on 02 February 2019 - 03:22 AM, said:

Looking at my own private version, I avoided this problem by setting the time out for the watchdog to 100000 or so seconds, thus practically disabling the watchdog. Obviously this is no solution for the proper version.
What needs to be done is that signals need to be send to the watchdog from within the prerun loops, but I do not know enough of this logic to know if this is possible or how it can be done.

You should pass the CancellationToken "cancellation" into any methods that could take too long; they can then call "cancellation.IsCancellationRequested" which serves two purposes:

  • Allows the code to exit faster when the user has tried to close the application
  • Pings the watchdog to indicate things are still alive

However, there are some strict requirements for calling this documented on the "Ping" method itself:

Quote

This method should be called (directly or indirectly) whenever a thread makes significant progress, such as when it is about to start processing the next file during loading. It must never be called inside a loop which is not guaranteed to exit (such as a while-true loop). It may be called within a loop that has fixed, known bounds (such as a typical for loop).

The requirements on when this should be or not be called are important: failure to call this (directly or indirectly) during a long but guaranteed-to-terminate loop will cause unnecessary hang reports, and calling this during a potentially infinite loop will result in hangs that are not reported.

So as long as the code is running a loop that is known to finish (even in the face of completely bogus values from the user or content), you can check the cancellation status at the start of each iteration.

#4 User is online   James Ross 

  • Open Rails Developer
  • Group: Posts: Elite Member
  • Posts: 5,510
  • Joined: 30-June 10
  • Gender:Not Telling
  • Simulator:Open Rails
  • Country:

Posted 02 February 2019 - 10:57 AM

View PostCsantucci, on 02 February 2019 - 06:35 AM, said:

I'm since ever quite convinced that the watchdog causes problems more than it avoids them. It's a case, but just yesterday I generated a version of OR MG where I added a general options checkbox allowing to disable the watchdog.

:(

Do you know what cases it is causing problems with? If it is crashing the program unexpectedly, the code is probably in need of checking cancellation too so it exits faster.

#5 User is offline   roeter 

  • Vice President
  • Group: Posts: Elite Member
  • Posts: 2,453
  • Joined: 25-October 11
  • Gender:Male
  • Country:

Posted 03 February 2019 - 01:54 PM

View PostCsantucci, on 02 February 2019 - 06:35 AM, said:

Coming to your problem, I see that method PrerunAI (the one for timetable) has already within its main loop a watchdog call that should restart watchdog count, that is
                    if (cancellation.IsCancellationRequested) return; // ping watchdog process

Doesn't this call work?


The call is in the second loop. The problem occured on a testrun for a freighttrain which was formed of multiple entries. The starttimes for these entries were not yet properly set, and when I tried to start the last 'leg' of this train, the start was delayed. Therefor the third loop was started and as this has no watchdog reset, it timed out before the train was started.
So, an additional watchdog reset must be inserted into the third loop as well.

Quote

Do you know what cases it is causing problems with? If it is crashing the program unexpectedly, the code is probably in need of checking cancellation too so it exits faster.


I have not had any (further) troubles with the wachtdog in normal running, but the intervention of the wachtdog can be quite frustrating when debugging.
When stepping through code with 'step' (F10 or F11), one can suddenly find oneself stepping through the watchdog instead of the code which was being debugged. That's not only quite confusing but it's easy to loose track that way, which means starting all over again.
When stopping often in debug the watchdog can actually time out. And if an actual loop has accidently been created, it is clearly very frustrating when the watchdog kills the program when you are debugging the code to find to cause of the loop.

So, I would like to make a strong plea to disable the watchdog when the debugger is attached.

This can be done quite easy by replacing the lines at the head of the wachtdog process :

        void WatchdogThread()
        {
            Profiler.SetThread();
            Game.SetThreadLanguage();

            while (true)
            {


with these lines :

        void WatchdogThread()
        {
            Profiler.SetThread();
            Game.SetThreadLanguage();

            while (true && !Debugger.IsAttached)
            {


Regards,
Rob Roeterdink

#6 User is online   James Ross 

  • Open Rails Developer
  • Group: Posts: Elite Member
  • Posts: 5,510
  • Joined: 30-June 10
  • Gender:Not Telling
  • Simulator:Open Rails
  • Country:

Posted 03 February 2019 - 02:26 PM

View Postroeter, on 03 February 2019 - 01:54 PM, said:

I have not had any (further) troubles with the wachtdog in normal running, but the intervention of the wachtdog can be quite frustrating when debugging.
When stepping through code with 'step' (F10 or F11), one can suddenly find oneself stepping through the watchdog instead of the code which was being debugged. That's not only quite confusing but it's easy to loose track that way, which means starting all over again.

I don't think I've ever seen this happen while debugging, so I'm curious: what exactly is it jumping to, which line of code? And which version of Visual Studio are you using?

View Postroeter, on 03 February 2019 - 01:54 PM, said:

When stopping often in debug the watchdog can actually time out. And if an actual loop has accidently been created, it is clearly very frustrating when the watchdog kills the program when you are debugging the code to find to cause of the loop.

The watchdog won't kill the program when a debugger is attached, but it will pause in the debugger ("Debugger.Break()") after logging details (line 140, just down from the code you quoted).

View Postroeter, on 03 February 2019 - 01:54 PM, said:

So, I would like to make a strong plea to disable the watchdog when the debugger is attached.

I'd rather not, because then people will simply start creating code that triggers the watchdog but only for users and only know to fix it once bug reports start rolling in.

If it is the "Debugger.Break()" that is causing you issue while debugging, we can look for alternatives, but I do not want it to become silent. Developers do need to be alerted to possible hangs.

We might be able to figure out a way to only count time when not paused in the debugger, if that's not the case currently, so it does not trigger just because you stopped on a breakpoint or similar.

#7 User is offline   Csantucci 

  • Member, Board of Directors
  • Group: Posts: Elite Member
  • Posts: 7,443
  • Joined: 31-December 11
  • Gender:Male
  • Country:

Posted 03 February 2019 - 11:50 PM

I practically always disable the watchdog when I debug, because it jumps in often, especially if you are proceeding line-by-line. With SVN I had a patch, now it's a stash that I apply before starting debug. It contains the watchdog disable as well as a comment of this line within GameStateRunActivity.cs
               SystemInfo.WriteSystemDetails(Console.Out);

because - at least in my quite slow computer - that call may last many seconds.

#8 User is offline   roeter 

  • Vice President
  • Group: Posts: Elite Member
  • Posts: 2,453
  • Joined: 25-October 11
  • Gender:Male
  • Country:

Posted 04 February 2019 - 06:39 AM

View PostJames Ross, on 03 February 2019 - 02:26 PM, said:

I'd rather not, because then people will simply start creating code that triggers the watchdog but only for users and only know to fix it once bug reports start rolling in.

You mean to say that developpers always run OR with debug attached?
I certainly don't - it's way too slow. Ofcourse I can't speak for other developpers but this seems a bit far fetched.
Anyway - even if it's so, it doesn't prevent developpers from switching it off. That's the problem with developpers - they know how to develop the program, so they also know how to alter things they don't like.
The first things I do after getting 'official' code is to switch to 'console application' and disable the watchdog for debugging.

Regards,
Rob Roeterdink

#9 User is offline   Csantucci 

  • Member, Board of Directors
  • Group: Posts: Elite Member
  • Posts: 7,443
  • Joined: 31-December 11
  • Gender:Male
  • Country:

Posted 04 February 2019 - 07:17 AM

Hi Rob,
that does interest me. I assume you switch to console application in order to see log messages in real time. How do you perform the switch?

#10 User is offline   roeter 

  • Vice President
  • Group: Posts: Elite Member
  • Posts: 2,453
  • Joined: 25-October 11
  • Gender:Male
  • Country:

Posted 04 February 2019 - 11:30 AM

View PostCsantucci, on 04 February 2019 - 07:17 AM, said:

Hi Rob,
that does interest me. I assume you switch to console application in order to see log messages in real time. How do you perform the switch?

Hello Carlo,

that is indeed exactly why.
It's easy : in Visual Studio, select "Project" (make sure RunActivity is set as Project), select "Run Activity Properties", then "Application", and in application window, for "Output type" select 'console application' instead of 'windows application'.
Does not change anything w.r.t. building or running the program except that is shows the console window so you can see any messages the moment they are generated.

Regards,
Rob Roeterdink

#11 User is online   James Ross 

  • Open Rails Developer
  • Group: Posts: Elite Member
  • Posts: 5,510
  • Joined: 30-June 10
  • Gender:Not Telling
  • Simulator:Open Rails
  • Country:

Posted 04 February 2019 - 03:13 PM

View PostCsantucci, on 03 February 2019 - 11:50 PM, said:

I practically always disable the watchdog when I debug, because it jumps in often, especially if you are proceeding line-by-line. With SVN I had a patch, now it's a stash that I apply before starting debug. It contains the watchdog disable as well as a comment of this line within GameStateRunActivity.cs
               SystemInfo.WriteSystemDetails(Console.Out);

because - at least in my quite slow computer - that call may last many seconds.

The system details shouldn't take that long, but we can put an exception in if that is still necessary in the current version. Does it trigger the watchdog without any code changes, just running it from Visual Studio?

View Postroeter, on 04 February 2019 - 06:39 AM, said:

You mean to say that developpers always run OR with debug attached?
I certainly don't - it's way too slow. Ofcourse I can't speak for other developpers but this seems a bit far fetched.

While developing code, I'm sure they'll run it with the debugger attached mostly. I know I do. Of course you'll also run it without, but given how I can barely tell it's running with the debugger (unless I have conditional breakpoints, now they are slow) I don't expect developers to run it without constantly.

View Postroeter, on 04 February 2019 - 06:39 AM, said:

Anyway - even if it's so, it doesn't prevent developpers from switching it off. That's the problem with developpers - they know how to develop the program, so they also know how to alter things they don't like.

I want the watchdog being enabled to be the default for developers just like it is for users; I did not say anything about preventing developers turning stuff off if they were so inclined.

View Postroeter, on 04 February 2019 - 06:39 AM, said:

The first things I do after getting 'official' code is to switch to 'console application' and disable the watchdog for debugging.

And I hope you realise this is at least partially why your code is what we're discussing here with regards the watchdog. Running with the watchdog during development greatly increases your chances of spotting these bugs before the code is "done". It obviously isn't perfect, but please at least try to understand what's going on here and help identify the precise issues.

#12 User is offline   roeter 

  • Vice President
  • Group: Posts: Elite Member
  • Posts: 2,453
  • Joined: 25-October 11
  • Gender:Male
  • Country:

Posted 10 February 2019 - 04:46 PM

Today, I had my first debug sessions using the 'official' version with VS2017.
The problem is complex so I needed a lot of stepping through long methods. After a certain time, the debugger started switching steps between the method I was testing and the watchdog code (the "while (true)" loop). It litteraly jumped between the code in TTTrain and Watchdog with each step (F10).
It is quite simply unworkable in this way, so I have put the !Debugger.isAttached back in again.

Regards,
Rob Roeterdink

#13 User is offline   roeter 

  • Vice President
  • Group: Posts: Elite Member
  • Posts: 2,453
  • Joined: 25-October 11
  • Gender:Male
  • Country:

Posted 11 February 2019 - 08:19 AM

The watchdog problem is more complex than I figured.
Today, I started my main timetable in the updated version with the proper watchdog setting and got this result.
Attached File  OpenRailsLog_190211.txt (113.99K)
Number of downloads: 288


As mentioned before, the pre-run for this timetable takes a very long time. During that time, the other threads have nothing yet to do.
It seems this causes the watchdog to time-out these threads. There can't be a loop in the render thread as the render thread is not yet active.

Regards,
Rob Roeterdink

Page 1 of 1
  • You cannot start a new topic
  • You cannot reply to this topic

1 User(s) are reading this topic
0 members, 1 guests, 0 anonymous users