
[Script] Airbag - API Crash Handler

March 14 (6 years ago)

Edited March 24 (6 years ago)
GM Michael
API Scripter

We all try to make our code as stable as possible, but sometimes crashes still happen.  API crashes can easily stall a game, lead to confusion, slow script development, break game immersion, and cause enormous frustration as you struggle to understand what actually went wrong.  Users receive no warning a crash has occurred and must instead figure it out on their own and then navigate to the API page to restart the API.  Even then, all you have is a single console line lacking formatting, making it difficult to draw conclusions regarding the source of the problem.

Airbag

Airbag is a two-part script that wraps the rest of your codebase, isolating the fragile API from exceptions thrown by your installed scripts and providing direct insight to the user when a crash occurs as well as the ability to force an API restart.

How

Roll20's API concatenates all of your scripts into a single enormous file, hence the sometimes-astronomical line numbers when you receive an exception.  Ordinarily, all scripts are self-contained and are themselves compilable, but Airbag's halves are not individually valid, instead relying on each other to function.  By sandwiching the rest of your API code in the middle, Airbag acts as an oversized try-catch block and can override API functions called by your other scripts.  This means that exceptions within your installed scripts will be caught by Airbag, rather than the API, allowing Airbag to inform the user and prompt them to restart the dead scripts.
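As a rough illustration of the sandwich (this is not Airbag's actual source), the sketch below concatenates two hypothetical halves around a deliberately broken script and evaluates the result. Neither half is valid JavaScript by itself, but together they form a single try-catch. The names startHalf, endHalf, brokenScript, and report are invented for the example.

```javascript
// Hypothetical halves: neither string is valid JavaScript by itself.
const startHalf = 'try {\n';
const endHalf = '\n} catch (err) {\n    report(err.message);\n}';

// Stand-in for an installed script that fails at startup.
const brokenScript = 'const hp = 10;\nmissingFunction(hp);';

// Roll20 concatenates script tabs; plain string concatenation imitates that here.
const combined = startHalf + brokenScript + endHalf;

const reports = [];
// Evaluate the combined source, passing in a reporter instead of a chat alert.
new Function('report', combined)((message) => reports.push(message));

console.log(reports);
```

The exception from the broken script ends up in reports instead of escaping to the host.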

Airbag does not neuter the value of in-script error handling.  Airbag will treat any unhandled exception that bubbles up to it as an unintended fatal error and do everything in its power to shut down the codebase and alert the user without killing the API as a whole.

Airbag will catch exceptions from...

  • API Main Thread: If initial API startup would immediately fail due to a logic error such as a bad reference, Airbag will deploy.
  • on(type, handler): This API function is shadowed to allow Airbag to wrap the asynchronous handler code.
  • setTimeout(): functions scheduled with setTimeout will be handled by Airbag.
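A minimal sketch of how shadowing setTimeout() might look, assuming the wrapped scripts run inside a function scope. The helper guard, the array caught, and the function name airbagScope are hypothetical and only serve the example.

```javascript
const caught = [];

// Wrap a callback so that anything it throws is recorded instead of propagating.
function guard(callback) {
    return (...args) => {
        try {
            return callback(...args);
        } catch (err) {
            caught.push(err.message);
        }
    };
}

// The real setTimeout is passed in so the shadow below can still reach it.
(function airbagScope(nativeSetTimeout) {
    // This local declaration shadows the global for every script inside the scope.
    const setTimeout = (callback, delay, ...args) =>
        nativeSetTimeout(guard(callback), delay, ...args);

    // A wrapped script calls setTimeout as usual and never sees the difference.
    setTimeout(() => { throw new Error('late failure'); }, 0);
})(setTimeout);
```

When the timer fires, the error lands in caught rather than reaching the host.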

Airbag will not catch exceptions from...

  • Infinite Loops: No support is planned at this time.  Halting on while(true) before API crash would require shadowing the while keyword.
  • Scheduled Functions (other than setTimeout): Support is planned (but is not yet implemented) for many scheduling functions, specifically setInterval(), _.delay(), and _.defer().  I am open to supporting other functions, but this effort has diminishing returns.
  • Asynchronous Get/Set: Things like reading from the gmnotes section are functions of objects that Airbag might not have access to or get the chance to shadow.  If someone can figure out a safe way to do this, even in select cases, I'm all ears, but I'm skeptical.

Operation

Should an exception occur while Airbag is installed, Airbag will catch the exception, dump the message and stack trace to the console log and chat log, and finally prompt the GM to restart the API at their leisure with a chat button.

In chat, you'll specifically see three things:

  • SRC: Airbag attempts to ascertain what source file and local line number actually threw the exception.  (This may be inaccurate if the offending script or the one before it is minified.)  This specific item will also not display unless the offending script is Marked (see below for developers).
  • MSG: This is the exception message.
  • STK: The stack trace, which is printed in global line numbers.
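The three items could be assembled along these lines. This is only a sketch: formatCrashReport and the exact text layout are assumptions, not Airbag's real output format.

```javascript
// Build the chat report. `source` is optional: { name, line } from a Marked script.
function formatCrashReport(err, source) {
    const lines = [];
    // SRC is only shown when the offending script could be identified.
    if (source) {
        lines.push(`SRC: ${source.name}, line ${source.line}`);
    }
    lines.push(`MSG: ${err.message}`);
    lines.push(`STK: ${err.stack}`);
    return lines.join('\n');
}

console.log(formatCrashReport(new Error('boom'), { name: 'MyScriptName', line: 12 }));
```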

Code

To run Airbag, you must install both scripts.  AirbagStart MUST be the very first script installed in a game (unfortunately, this means uninstalling and reinstalling all your existing scripts if you already have some, or at minimum prepending AirbagStart to your first script if you have the source).  Similarly, AirbagEnd MUST be the very last script installed.  This allows them to wrap the rest of your scripts.  If you have scripts that are outside the Airbag sandwich, Airbag will not be able to catch the exceptions they throw.

Source

Changelog

  • 1.0: Initial Release
  • 1.1: Add on() support
  • 1.2: Fix duplicate Airbag on() registration
  • 1.3: Add setTimeout and clearTimeout handling, add line localization

For Developers

Utility Functions

Airbag supports some functions to help your development.

// Call on the very first line of your file (even before boilerplate)
void MarkStart(string scriptName)
// Call on the last meaningful line of your file (whitespace afterwards won't hurt it, but isn't recommended)
void MarkStop(string scriptName)
// Converts a global line number into a local line number for a script
// Returns: {string Name, int Line} where Name is the name of the script it is from and Line is the local line number within that script.
// Requires: Your script must be Marked with MarkStart and MarkStop.
obj ConvertGlobalLineToLocal(int globalLineNumber)

By Marking your file, Airbag will know where your file starts and stops, meaning it will be able to mark your file as the source of exceptions and even tell you the line number in your file.  In case you don't trust Airbag to be installed, you can always do something like...

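// typeof avoids a ReferenceError when Airbag (and therefore MarkStart) is not installed
if (typeof MarkStart === 'function') MarkStart('MyScriptName');

What does Airbag Deployment Do?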

When Airbag detects a fatal error, it performs the following operations in order:

  1. The codebaseRunning internal flag is set to false.  (All shadowed functions check this first, so any shadowed functions called after this point will do nothing.)
  2. All registrations by other scripts to the on() function are purged.
  3. All timeouts are purged.
  4. globalconfig is set to a blank object.
  5. Error is logged.
  6. User is alerted and prompted with a [Reboot] button.
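The steps above can be sketched as follows. The airbag object, its fields, and the messages array are hypothetical stand-ins; the real script's internal structures may differ.

```javascript
// Hypothetical internal state; the real script's structures may differ.
const airbag = {
    codebaseRunning: true,
    handlers: [],          // registrations made through the shadowed on()
    timeouts: new Set(),   // ids returned by the shadowed setTimeout
    messages: [],          // stand-in for console and chat output
};
let globalconfig = { example: { setting: 1 } };

function deploy(err) {
    airbag.codebaseRunning = false;                     // 1. shadowed functions become no-ops
    airbag.handlers.length = 0;                         // 2. purge on() registrations
    airbag.timeouts.forEach((id) => clearTimeout(id));  // 3. purge timeouts
    airbag.timeouts.clear();
    globalconfig = {};                                  // 4. blank globalconfig
    airbag.messages.push(`MSG: ${err.message}`);        // 5. log the error
    airbag.messages.push('[Reboot]');                   // 6. prompt the GM
}

// Simulate a running codebase, then a fatal error.
airbag.handlers.push({ type: 'chat:message', handler: () => {} });
airbag.timeouts.add(setTimeout(() => {}, 60000));
deploy(new Error('bad reference'));
```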

Future Development

Current plans for the future are most immediately the remaining schedule functions, but I am open to providing new development tools if they are popular.  I would also like to expand the line number conversion system to include the full stack trace and even provide guesses when it detects an error outside a Marked script.

March 14 (6 years ago)
GiGs
Pro
Sheet Author
API Scripter

Interesting idea. Are scripts always loaded in the order they have been installed?

March 15 (6 years ago)

Edited March 15 (6 years ago)
GM Michael
API Scripter

Seem to be.  In my testing, I had 6 scripts running and they executed in the order they were installed despite that order being neither alpha nor anti-alpha.  Setting scripts to active/inactive once installed did not affect the order, so it really does appear to simply iterate over them in installation order.  Theoretically, you could probably change the order up a bit with hoisting, but I do not believe JS has any means by which a standalone-compilable script could escape the Airbag sandwich.

March 15 (6 years ago)
The Aaron
Roll20 Production Team
API Scripter

That is a neat idea, particularly the restart part.

It's causing my sandbox to crash.

March 15 (6 years ago)

Edited March 15 (6 years ago)
GM Michael
API Scripter

How odd...   I guess search your scripts for .N?  Maybe something's calling eval?  I can't imagine what else it might be that would do that.  Or maybe add me as a GM to your game and I can try to take a look?

March 15 (6 years ago)
The Aaron
Roll20 Production Team
API Scripter

The only thing in the repo with .N is DLEllipseDrawer:

//treehugger.js
//Author: Tim Matchen
/*Use: Simply draw an ellipse on the dynamic lighting layer and the script 
replaces the ellipse with a n-sided polygon approximating the ellipse. The
default number of sides is 20; this can be adjusted using the command 
!treehugger n, where n is the desired number of sides. For example, 
!treehugger 10 would generate 10-sided polygons instead of 20.*/

on("ready",function(){
var gc = globalconfig && globalconfig.dlellipsedrawer;
    if(isNaN(gc.N) != 1){
        var n = Math.ceil(gc.N);
    }
    else{
        var n = 20;
        log("Invalid input from globalconfig! Using n = 20")
    }
log("Treehugger is up and running!")
on("add:path",function(obj){
   /* ... */



The Aaron said:

The only thing in the repo with .N is DLEllipseDrawer:

//treehugger.js
//Author: Tim Matchen
/*Use: Simply draw an ellipse on the dynamic lighting layer and the script 
replaces the ellipse with a n-sided polygon approximating the ellipse. The
default number of sides is 20; this can be adjusted using the command 
!treehugger n, where n is the desired number of sides. For example, 
!treehugger 10 would generate 10-sided polygons instead of 20.*/

on("ready",function(){
var gc = globalconfig && globalconfig.dlellipsedrawer;
    if(isNaN(gc.N) != 1){
        var n = Math.ceil(gc.N);
    }
    else{
        var n = 20;
        log("Invalid input from globalconfig! Using n = 20")
    }
log("Treehugger is up and running!")
on("add:path",function(obj){
   /* ... */



Yea that was the culprit thank you

March 15 (6 years ago)

Edited March 15 (6 years ago)
GM Michael
API Scripter

Update: never mind.  Got PM'd about it.  Airbag is fine.  The issue is unrelated.

Original post...

How would that crash out Airbag though?

Having said that, that's such a weird way to check to see if...  Honestly, I don't even know what that's trying to do.

var gc = globalconfig && globalconfig.dlellipsedrawer;//This will try to get .dlellipsedrawer, but I can't find a definition for
// that anywhere. So this'll be falsy.
if(isNaN(gc.N) != 1){// .N won't exist.
var n = Math.ceil(gc.N);// this is just setting a pointless temp variable to something that'll crash.
}

March 16 (6 years ago)
Jakob
Sheet Author
API Scripter

Well, isNaN(undefined) is true, and true == 1, so this thing won't execute ... ah, this is JavaScript art :D:D:D.

March 18 (6 years ago)

Edited March 18 (6 years ago)
Ammo
Pro

EllipseDrawer is trying to see if globalconfig.dlellipsedrawer.N is configured (i.e. not undefined and a valid number) to use as the value of 'n' and otherwise set it to 20.   It crashes because it does not check if globalconfig.dlellipsedrawer actually exists, and it does not.  Hence this ends up being '<undefined>.N'.

It isn't a temp variable, since var declarations are function-scoped rather than block-scoped in JavaScript.   I assume the value is used further down as the default value for the command line argument 'n'.
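A defensive version of that check might look like the sketch below. The function resolveSides and its fallback parameter are hypothetical; only the globalconfig.dlellipsedrawer.N path and the default of 20 come from the script being discussed.

```javascript
// Returns the configured side count, or the fallback when the setting is missing or invalid.
function resolveSides(globalconfig, fallback = 20) {
    // Fall back to an empty object so the property read below cannot throw.
    const gc = (globalconfig && globalconfig.dlellipsedrawer) || {};
    const parsed = Number(gc.N);
    return Number.isFinite(parsed) && parsed > 0 ? Math.ceil(parsed) : fallback;
}

console.log(resolveSides({}));                                  // 20
console.log(resolveSides({ dlellipsedrawer: { N: 9.2 } }));     // 10
```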

Sandwich is a cool idea.  Do exceptions thrown in event handlers on(... , ...) bubble up back to the block where the => arrow function is defined in JavaScript?   

March 18 (6 years ago)
keithcurtis
Forum Champion
Marketplace Creator
API Scripter

Quoting the Aaron from another thread, because this is just such a useful idea for installation:

The Aaron said:

Side note, you can probably modify the first script tab you have installed into the start for api-crash-handler and just append the replaced script to the end before adding the end part of api-crash-handler.  Might be much less effort. =D



March 20 (6 years ago)

Edited March 21 (6 years ago)
GM Michael
API Scripter

keithcurtis said:

Quoting the Aaron from another thread, because this is just such a useful idea for installation:

The Aaron said:

Side note, you can probably modify the first script tab you have installed into the start for api-crash-handler and just append the replaced script to the end before adding the end part of api-crash-handler.  Might be much less effort. =D



That's a good idea!  Thanks!

March 21 (6 years ago)

Edited March 21 (6 years ago)
GM Michael
API Scripter

Ammo said:

[snip]

Sandwich is a cool idea.  Do exceptions thrown in event handlers on(... , ...) bubble up back to the block where the => arrow function is defined in JavaScript?   

After testing, it seems that event-driven functions aren't going to trigger the airbag.  They just directly crash the API because they don't go through the normal execution callstack.  :(
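The behavior can be reproduced outside Roll20 with a stub. In this sketch, on and registered are stand-ins for the API's event system, and the second try block plays the role of the host firing the event later from its own call stack.

```javascript
const registered = [];
const on = (type, handler) => registered.push({ type, handler }); // stub for Roll20's on()

const caughtBySandwich = [];
try {
    // Registration happens inside the try block and succeeds without error.
    on('chat:message', () => { throw new Error('handler failure'); });
} catch (err) {
    caughtBySandwich.push(err.message);
}

// Later, the host fires the event from its own call stack, outside the try block.
let escaped = null;
try {
    registered[0].handler();
} catch (err) {
    escaped = err.message; // this stands in for the API sandbox crashing
}

console.log(caughtBySandwich.length, escaped);
```

The outer try-catch only covers the moment of registration, so the later throw never passes through it.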

About the only way around that would be to implement some sort of event registration system that would pass the data around to any scripts that registered with airbag, but that requires developers to hook into it themselves.  The goal here was to avoid doing something like that, but it looks like it might be the only way...

Idk, maybe someone who knows more about js than I can think of some weird js quirk to do it, but I can't, so maybe some future version of airbag will support developers doing something like...

airRegister('chat:message', (msg) => {
     // do stuff that could be unsafe
});

The ultimate goal would be to minimize the effort on the author's part to encourage use of Airbag over the stock on() function.

March 21 (6 years ago)
The Aaron
Roll20 Production Team
API Scripter

Since your airbag creates a new scope around all the scripts, you could provide your own on() function that shadows the global one and seamlessly passes the registration through with a try/catch decorator wrapping it. 
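A sketch of that suggestion, using a stub in place of the real API. Here nativeOn stands in for Roll20's global on() (in the sandbox it would simply be called on), and dispatched and crashes are hypothetical bookkeeping arrays.

```javascript
const dispatched = [];
// Stub for Roll20's global on(); it just remembers what it was given.
const nativeOn = (type, handler) => dispatched.push({ type, handler });
const crashes = [];

(function airbagScope() {
    // Local declaration shadows the global on() for every script in this scope.
    const on = (type, handler) =>
        nativeOn(type, (...args) => {
            try {
                return handler(...args);
            } catch (err) {
                crashes.push(err.message);
            }
        });

    // A wrapped script registers a handler exactly as it would without Airbag.
    on('chat:message', (msg) => { throw new Error(`cannot parse ${msg.content}`); });
})();

// Simulate the host firing the event.
dispatched[0].handler({ content: '!roll' });
console.log(crashes);
```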

March 21 (6 years ago)
The Aaron
Roll20 Production Team
API Scripter

You’ll probably also want to provide new versions of setTimeout(), setInterval(), _.delay(), and _.defer(). 

March 21 (6 years ago)

Edited March 21 (6 years ago)
GM Michael
API Scripter

Good idea!

*8 hours of mostly sleep later*

V1.1 should have an operational shadow for on().  I'll add scheduling next.

March 21 (6 years ago)
Ammo
Pro

As long as you are being clear that you are only trying to catch SOME errors, then that's fine.   There are other asynchronous operations that you will not know about.   For example, in the Roll20 API there are asynchronous reads (like gmnotes on journal entries) that you won't be able to wrap.  Also, more advanced scripts can just choose to do things asynchronously, via Promise or otherwise.   It is probably ok that you don't catch these, because developers that use them are probably capable of deciding to catch their own errors if they want to.  Just clarifying that this can't ever be all of them.

I am more worried about the idea of restart.   If you restart a script in the same script host, won't you get duplicate handlers for all the events?   Won't every event now get handled twice (and then three times, etc?)    
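The concern is easy to demonstrate with a stub. In this sketch, on, handlers, and runScripts are hypothetical stand-ins; the purge mirrors the step in the original post where registrations are cleared on deployment.

```javascript
const handlers = [];
const on = (type, handler) => handlers.push({ type, handler }); // stub for Roll20's on()

// Stand-in for the whole codebase being run from the top.
function runScripts() {
    on('chat:message', () => 'handled');
}

runScripts();
runScripts();                          // naive restart: the old handler is still registered
const withoutPurge = handlers.length;

handlers.length = 0;                   // purge registrations before restarting
runScripts();
const withPurge = handlers.length;

console.log(withoutPurge, withPurge);
```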

I applaud what you are trying to do, and have at it as long as it is fun. :)   But a long term solution to this problem would be to petition Roll20 for a built-in GM command like "!roll20_api_restart" that actually tears down the script host (the software running the scripts) and starts it back up again, like restarting from the console.  

March 21 (6 years ago)
The Aaron
Roll20 Production Team
API Scripter

That double subscribe is a great point. Maybe aiming more for notification is a better goal. 

March 22 (6 years ago)

Edited March 22 (6 years ago)
GM Michael
API Scripter

Double-registration should be fixed.  Scheduling will use a similar system.

Also, I updated the OP to be more clear as to what Airbag can and can't do.

March 22 (6 years ago)
Ammo
Pro


Michael G. said:

Double-registration should be fixed.  Scheduling will use a similar system.

Also, I updated the OP to be more clear as to what Airbag can and can't do.

Looks good!   Also, you write very nice code.   I mean, except for the fact that it is in JavaScript :) :)

For increased usefulness, you and The Aaron (because people listen to him) could ask Roll20 nicely for a simple API function that allows you to ask for an array containing the script host's starting line numbers for loaded scripts.    Then you could translate the line numbers in crashes without making every script have to implement that themselves.  

It would mean we don't have to do this nonsense:

https://github.com/derammo/der20/blob/master/include/header.js.txt

https://github.com/derammo/der20/blob/947375ee8c38ba0554f6a4220b964e72cc2f8902/src/der20/plugin/main.ts#L344

It would be maximum bang for the buck to do this in one place here.   Roll20 can't do it for you, even if they ever implement line number translation on unhandled exceptions in the script host, because you no longer allow the exceptions to fly out of script host.   That means the only place this can happen now is your exception handler.





March 22 (6 years ago)

Edited March 22 (6 years ago)
GM Michael
API Scripter

Oh, if the Roll20 team just implemented chat alerts of API failure, Airbag wouldn't exist.

Having said that, having a line number grabber would be handy.  Airbag could have two functions that scripts dependent on it could use: MarkScriptStart(scriptName) and MarkScriptStop(scriptName).  You'd just call them on the first and last line of your script and it would alert Airbag what line numbers your script occupies, so it can then point to it if something goes wrong.  If something was reported outside that range, is it possible for an API script to read the API js file?  If nothing else, reporting the lines around the error would be useful.  If some installed scripts were using the mark functions, it could provide some bounds on which script failed, even if the failing script wasn't using the mark functions itself.

Also, because all this would require line number conversion functions, other scripts could just use those for their own internal error reporting.

...I get the feeling that Airbag is about to balloon into a debugging suite because now my mind is going to breakpoints and how to implement them...

March 24 (6 years ago)

Edited March 24 (6 years ago)
GM Michael
API Scripter

Further Discoveries

So in my experimenting today, I discovered that Roll20 directly concatenates files.  That is to say, there's no line break between them.  That means that any minified js files are going to be a single line, and if you have multiple adjacent files that are minified, Airbag won't be able to tell them apart.  I suppose it can just warn the user of such things, but still, that's rather disappointing.

Also, this has rather horrifying implications for if someone has a comment on the last line of their script.  O.o. (reported as a bug here)

1.3.2 Update

In other news, 1.3 is out: setTimeout() is now protected by Airbag and localized line numbers can now be detected.  To set this up for your script, call MarkStart('MyScriptName') and MarkStop('MyScriptName') on the first and last lines of your script.
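For a sense of how line localization can work once a script's range is known, here is a minimal sketch. It assumes the global start and stop lines are handed in explicitly through a hypothetical recordMark helper, whereas the real MarkStart and MarkStop take only a script name and work out the line numbers themselves.

```javascript
// Hypothetical registry of Marked scripts, stored as global line ranges.
const marks = [];

function recordMark(name, startLine, stopLine) {
    marks.push({ name, startLine, stopLine });
}

// Returns { Name, Line } for a Marked script, or null when the line is outside every mark.
function convertGlobalLineToLocal(globalLine) {
    const hit = marks.find((m) => globalLine >= m.startLine && globalLine <= m.stopLine);
    if (!hit) return null;
    // The mark sits on local line 1, so the offset is inclusive.
    return { Name: hit.name, Line: globalLine - hit.startLine + 1 };
}

recordMark('MyScriptName', 1200, 1450);
console.log(convertGlobalLineToLocal(1207)); // { Name: 'MyScriptName', Line: 8 }
```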

Development Prospects

Also, current thought for breakpoints...  I'm thinking something like...

// Dumps state, this, and globalconfig to chat, then suspends this thread's operations until it's over
Promise Breakpoint(conditional, [objects, to, dump...])
// ex:
// await Breakpoint(true, myVar, myOtherVar);

I'm thinking the user would then get a chat log of the object tree structure with the ability to expand nodes.  state, this, and globalconfig would always be included, but if a user provided other objects or variables, those would be dumped too.  Of course, because this has to arrest the current thread, that means we need to use Promises, which means your whole function now needs to be async, which might be more trouble than it's worth, but hopefully not!
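One way the suspend-and-resume part could be built on Promises is sketched below. Everything here is an assumption about a feature that does not exist yet: the pending queue, the resume callback, and the omission of the chat rendering are all simplifications.

```javascript
const pending = [];   // breakpoints waiting for the GM to resume them
const steps = [];

function Breakpoint(condition, ...objects) {
    if (!condition) return Promise.resolve();
    return new Promise((resolve) => {
        // A real version would render `objects` to chat with a resume button.
        pending.push({ objects, resume: resolve });
    });
}

async function handler() {
    steps.push('before');
    await Breakpoint(true, { hp: 3 });
    steps.push('after'); // runs only once resume() has been called
}

handler();
// Later, for example from a chat button: pending.shift().resume();
```

Only the awaiting function is paused; the rest of the sandbox keeps running, which is why the calling function has to be async.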

Relatedly, I was thinking about trying to shield promises.  In principle, you could just add a catch block after every then or catch block added by the user, but the redirection gets really weird really fast on this one, so I don't see myself ever supporting them.

March 25 (6 years ago)
Ammo
Pro

ok I have a gross suggestion:  

- hook some function that doesn't do anything harmful (like log(...)) 

- detect specific arguments passed, and use these as MarkScriptStart and MarkScriptEnd

=> now scripts that support this don't require Airbag to run

March 25 (6 years ago)

Edited March 25 (6 years ago)
GM Michael
API Scripter

I don't see how that's a whole lot better than just checking for MarkStart's existence in-line.  Plus, whatever I'd shadow to do that would take a performance hit from the string comparison.  It's clever, but I don't think it's the right way to do it.  I'm not opposed to doing something like that in the future if I were to add some debugging functions though.

March 26 (6 years ago)
Ammo
Pro

I did say that it was a "gross" suggestion.   As in, it would make me vomit a little to write something like that.  That said, you do need to create an easy way for people to support Airbag only optionally.  You don't want to have a situation where people can't use existing scripts without it.   Checking for existence and then calling the function in a one-liner is probably the best you can do?   
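A guard along these lines keeps a script working with or without Airbag. It is only a sketch: markStart and markStop are hypothetical local aliases. The typeof test matters because referencing an undeclared identifier directly throws a ReferenceError.

```javascript
// typeof is safe on undeclared identifiers, so this works when Airbag is absent.
const markStart = typeof MarkStart === 'function' ? MarkStart : () => {};
const markStop = typeof MarkStop === 'function' ? MarkStop : () => {};

markStart('MyScriptName');
// ... script body ...
markStop('MyScriptName');
```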

March 26 (6 years ago)
Ammo
Pro

Btw, I test my non-trivial code offline in Node.js.  So do some of the other devs here who write complicated stuff.

If I had any trouble debugging, I would probably focus on making mock20 better as a test platform, instead of trying to create the ability to debug in the sandbox itself.   Have you worked with that?  I don't use it myself, because I use TypeScript and can compile my classes separately in unit tests, but I feel like mock20 + a real debugger would make you happier.