Roll20 uses cookies to improve your experience on our site. Cookies enable you to enjoy certain features, social sharing functionality, and tailor message and display ads to your interests on our site and others. They also help us understand how our site is being used. By continuing to use our site, you consent to our use of cookies. Update your cookie preferences .
×
Create a free account

Recursive Descent Parser - here goes nothing (aka, There and back again: A coder's journey)

1594935184

Edited 1595003557
timmaugh
Pro
API Scripter
I'm trying something new. I know what you're thinking: You might regret that. And, you're right. I might. But, like I always say: "It's hard to regret trying something new unless... you try something new." After talking through a script idea with Aaron (MAJOR help, there... thank you!), I understood that I needed to learn recursive descent parsing (RDP) to implement one particular feature I wanted to include. And, after offering feedback on Tim M's first script, I realized that maybe there's value in logging breakthroughs and things I learn where other people can read it, too. At the very least, we can all regret this, together. Context for the problem I want to parse the api call to look for calls to functions that are internal to the script. If the user gives me an api call like: !regret --foo#getThings() --bar#3 --baz#doSomething() ... and if, in the script, I have functions declared for both "getThings()" and "doSomething()", I want to use the results of those internal functions to feed the arguments foo and baz . Make sense? Great. const getThings() => {     return "things"; }; const doSomething() => {     // ... do something }; One more thing to know, these internal functions could take arguments. Those will be demarked by '!!' in the command line, like this: !regret --foo#getThings() --bar#3 --baz#doSomething(!!a#shmacka !!bozo#true !!flag) ...where "a" will receive the value of "shmacka", "bozo" will receive "true", and "flag" is given no value. In that case, doSomething() would be declared as: const doSomething({a: a, bozo: b, flag: f} = {}) => {     // ... do something with a, b, and f }; That means that the initial arguments to the regret script are recognized by /\s+--/ (one or more white spaces followed by double hyphen), key/value transitions are found by #, and arguments to internal functions are marked by '!!'. The problem I'm trying to solve The problem is: what if an internal function is WITHIN another internal function. If the api call looks like this: !regret --bar#3 --baz#doSomething(!!a#getThings(!!what#handouts !!flubby)) ... then the arguments that should be internal to getThings() will instead LOOK like they are part of the doSomething() function... at least to a normal parsing operation. Enter recursive descent parsing. Extended Backus-Naur Form So, I will write up what I *think* these grammar rules are saying in a subsequent post, but if you're still with me, you've come through so much regret already I want to give you a break. Just know this is in extended Backus-Naur Form, which is a way of writing the rules that will drive the parsing process (without backtracking). If you're familiar with BNF, RDP, or if you have another solution to the problem, please give me any feedback. Here, the presence of a tick (`) indicates that the user wanted to just use the flat text (for instance, "doSomething" rather than the functional output of the doSomething() procedure)... expr   :     /^[^`]/ func | /.*/ func   :     /(f1|f2|...|fn)/ '(' (arg)* ')' arg    :     '!!' /[^#\s]+/ (('#' (func | `func | /[^\s]/))| (' ')*) `func  :     /`*(f 1 |f 2 |...|f n )/ '(' (`arg)* ')' `arg   :     '!!' /[^#\s]+/ (('#' `func) | (' ')*) ... more to come!
1594936089
The Aaron
Roll20 Production Team
API Scripter
Off to a great start!
1594956435
GiGs
Pro
Sheet Author
API Scripter
This sounds interesting, but I cant really grasp it at its current level of abstraction. I'd need to see it in actual use in a real script.
1595001002

Edited 1595436177
timmaugh
Pro
API Scripter
Part II GiGs said: This sounds interesting, but I cant really grasp it at its current level of abstraction. I'd need to see it in actual use in a real script. So would I! =D I will get there. Eventually. Hopefully. In writing terms, we're nearly to the midpoint of our plot already, where inaction turns to action and code hits the screen. It... just... also means there's still a lot of try/fail cycles (literally and programmatically) ahead. OK, so next up, what that Backus-Naur Form notation means, and what I think the grammar rules, above are saying. I will do my best to explain, but remember that this is written while I'm still learning, too, so go easy on me. =) Illustrating the process I found a concrete example helped me grasp what is going on, so let's start there. Imagine you're a building contractor, and you need to build a house. There are many smaller, discrete jobs that are a part of building the house, and there might be sub-contractors to whom you hand-off the work (and they might further subdivide the work and/or hand it off to another sub-contractor). There would be planning jobs like site survey, permits, and drawing up architectural plans, exterior construction jobs like excavation, foundation pouring, framing, roofing, siding, painting, windows... and all of that's before we even get to the inside jobs. Given all of that, we might define a house project (at a very high level) as: A HOUSE is PLANNING jobs followed by EXTERIOR jobs followed by INTERIOR jobs. ...or, removing all the extra verbiage: house          :        planning exterior interior To get more detail, all of those high level concepts have to be broken down. If they are non-terminating elements ("terminating" meaning no further derivation is available or required to understand them), we should proceed to define them, too. We've already mentioned what planning a house would entail: PLANNING is some order of SURVEYING , PERMITTING , and DRAWING plans. And, if you think about it, there might be rules for the order those things can come. You might survey the site before you draw plans, or you might have plans drawn and then look for a site that would work, but you definitely can't get a permit until both of those jobs are finished. We can represent that with an OR case ("|"). planning       :        ( surveying drawing | drawing surveying ) permitting Which tells us we are either going to proceed from surveying to drawing to permitting, or from drawing to surveying to permitting. Let's take it a step further and say that you might need to survey multiple sites, or you might need to draw up multiple plans (maybe the homeowners are a little high-maintenance). For the purposes of this example, we'll just stipulate that if you're in the "surveying" phase, you do all of the surveying, and if you're in the "drawing" phase, you'll do all of the drawing -- that is, you won't survey, draw, survey, draw, draw, survey survey. In that case, we need to represent optional elements. We can do this either with an optional empty (E - epsilon), or with an asterisk (think "zero or more" from regular expressions). In our case, since we need at least 1 survey before our surveying phase is done, we'll use a plus (again, like regular expressions). SURVEYING consists of at least 1 SURVEY surveying      :        survey+ (Let's assume that a survey is a known thing requiring no further definition just so we can keep the example manageable.) With that same logic, we can handle drawing up multiple plans and/or getting multiple permits: drawing        :        blueprint+ permitting     :        permit+ Now our Planning phase is complete. Our current set of rules looks like this: house          :        planning exterior interior planning       :        ( surveying drawing | drawing surveying ) permitting surveying      :        survey+ drawing        :        blueprint+ permitting     :        permit+ Next we would go on to the Exterior phase, where the jobs still come in a fairly regimented order: EXTERIOR work is EXCAVATION followed by the FOUNDATION followed by FRAMING followed by WINDOWS and ROOFING (in some order) followed by SIDING and PAINTING (in some order) exterior       : excavation foundation framing ( windows roofing | roofing windows ) ( siding painting | painting siding ) And, yes, if you wanted to, you could further divide some of these jobs (foundation is really a frame and a pour, siding is really weather-proofing and then the outside cladding). If you're doing that on your own, well done. For our purposes, remember, we're managing regret... and there's already a bunch, I think. Let's keep it simple. Great... so the Interior phase would be a bunch of potential jobs that could run in any order (painting, flooring, tile, cabinetry, etc.). Right now, we'll say that not every house requires every job: INTERIOR work is some number of JOBS from the set (j 1 = painting, j 2 = flooring, j 3 = tile, ...j n ) interior       : job* job            :        ( j 1 | j 2 | j 3 | ...j n ) The full set of rules Our full rules set would now be: house : planning exterior interior planning : ( surveying drawing | drawing surveying ) permitting surveying : survey+ drawing : blueprint+ permitting : permit+ exterior : excavation foundation framing ( windows roofing | roofing windows ) ( siding painting | painting siding ) interior       : job* job            :        ( j 1 | j 2 | j 3 | ...j n ) Turning Backus-Naur Form into code The beauty of BNF is that it is a way of thinking about the logic of recursion without having to get lost in the language or coding structures at the same time. You didn't know we were building recursion? Well, team, let's build a... house ...we'd better plan ...GiGs: I have a blueprint I like ...Aaron: I don't like that blueprint ...GiGs: I have another blueprint I like ...Aaron: No. Keith? ...GiGs: What about this one? ...Aaron: Shut it, GiGs. Keith? ...Keith: Me mum used to build houses like this (blueprint) ...Team: Ooh, let's do that ...GiGs: But... ...great, now let's get a site ...GiGs: I was doing some research, and *this* location is-- ...Aaron: Jakob, what ya got? ...Jakob: I have a site with good drainage, plenty of escape routes, and a portal to an alternate dimension ...Team: alternate dimensions are lovely, this time of year, let's do that ...on to getting our necessary permits ...Aaron: alright, who should we get to file the paperwork? ...Team: Gigs! ...now that that's done, let's get the outside work started ........ (not sure how the Aaron became the de facto team lead, Keith got a English accent, or GiGs was cast into the role of ... well, let's move on...) Every line in a BNF definition is a rule, and every rule is a function. Our "house" definition is a function, as is "planning," "surveying", etc. Every level of indentation in the example above is a deeper nest of functions: home => planning => drawing =>      blueprint (GiGs1)                                 blueprint (GiGs2)      blueprint (GiGs3)      blueprint (Keith) surveying =>    survey (GiGs1)    survey (Jakob) permitting => permit (GiGs) exterior => ... So the home() function isn't finished until it has called all of its required elements (planning(), exterior(), interior()). The planning() function isn't done until it has called its required elements (drawing(), surveying(), and permitting()). The drawing() function isn't done until it has found at least 1 blueprint; surveying(), one survey; and permitting(), one permit. For those last functions, we would probably write them to return the last blueprint, the last survey, and all of the permits. You can proceed, from there, to imagine the various ways the rest of the home-building process could be mapped using the rules/functions we have defined... I would just implore you, for the sake of comic cohesion, to continue to cast GiGs in the role we've begun to carve out for him. It doesn't contribute much to the understanding of the code, but... it is a lot more fun, that way. =) ...more to come...
1595002456
The Aaron
Roll20 Production Team
API Scripter
This is pretty great!  (Keith can do a marvelous British accent, and many others, btw... =D  And if anyone can find a portal to an alternate dimension, it's Jakob. =D  Sorry Gigs...) I'm looking forward to the next installment! 
1595005128
GiGs
Pro
Sheet Author
API Scripter
Jakob might find a portal to another dimension through the power of mathematics. I'll do it from relentless stubbornness (and invoking the power of Buckaroo Banzai).
1595021777

Edited 1595079705
timmaugh
Pro
API Scripter
PART III The Aaron  said: (Keith can do a marvelous British accent, and many others, btw... =D  And if anyone can find a portal to an alternate dimension, it's Jakob. =D   It's like a game of "What's Your Job in the Forum Apocalypse?" Some of us get Force Protection, others are Hunters or Farmers... Jakob gets portal-detection duty, and Keith composes the entire British Accent Division (Airborne Wing). It's a big job. =) Back to the problem at hand that I'm trying to solve: parsing a command line (potentially) full of (potentially) nested function references. Given a potential input like (here broken down to multiple lines for simple readability): !regret     --what#getrepeating(!!c#getchars(!!sf#4) !!col##993421 !!bf#somefunc(!!chug#getlocation(!!t#-Mjfkdsl !!rel#true) !!b#termfunc()))     --where#`getrepeating()     --how#buttonmashing(!!frantic#true !!keys#any)     --why#unrecognizedneed()     --who#GiGs ...I need to detect any call to the following bank of potential functions: getrepeating(...) getchars(...) somefunc(...) termfunc(...) buttonmashing(...) Certain steps I can handle before it gets to the parser. I can split on /\s+--/ and drop the api handle: let args = msg.content.split(/\s+--/)                      .slice(1); ...and I can further split each argument into a key/val or key/undefined pair: let args = msg.content.split(/\s+--/)                      .slice(1)                      .map(a => a.split("#"))                      .map(a => [a.slice(0)[0], a.slice(1).join("#")]); I know, that's a bit of a pet way of splitting the command line, and there's lots of ways to go about it, but at this point, I should have isolated everything, giving me an array with values like: KEY       |   VALUE =========|========================= what     |  getrepeating(!!c#getchars(!!sf#4) !!col##993421 !!bf#somefunc(!!chug#getlocation(!!t#-Mjfkdsl !!rel#true) !!b#termfunc())) where    |  `getrepeating() how      |  buttonmashing(!!frantic#true !!keys#any) why      |  unrecognizedneed() who | GiGs Now I can examine just what I need to -- the VALUE portion of every argument where any potential calls to the internal functions need to be detected. That would mean, using the house construction example of  Part II  of this thread (...a link? really? just scroll up...), each VALUE in the argument list would be like a new "house" -- the starting point for any recursion. Developing a rule-set Using the examples above, I can articulate a few guidelines: The string will (a) be empty, (b) begin with a tick (`) to signal our parser to skip over the string, or (c) be text that might or might not begin with a function requiring detection Because of how I have parsed it already, functions can  only  appear at the beginning of the string or within another function A function that appears within another function would  directly  follow '!!some-text#' or else it would not require detection Functions that appear and would otherwise be detected but which are within an escaped function should not be detected or run (that is, since the outer function is escaped, all of the contents should be escaped, too) Then I need a list of functions to check each part of the command line against. For that I will use an object where every property is a key:value pair of user-supplied-string (key) and declared-function (value). const getrepeating = () => { //... }; const getchars = () => { //... }; const somefunc = () => { //... }; const termfunc = () => { //... }; const buttonmashing = () => { //... }; const availFuncs = {     getrepeating: getrepeating,     getchars: getchars,     somefunc: somefunc,     termfunc: termfunc,     buttonmashing: buttonmashing }; And I will construct my list from the keys of that object, and use that to form a regex that will handle my testing: const getFuncRegex = (obj) => {     return new RegExp('^(' + Object.keys(obj).join("|") + ')\\(([^)]*)?\\)?');     // group 1: function from function(arguments)     // group 2: arguments from function(arguments) }; CAVEAT 1:  I could probably use a string literal for that and avoid the '+' parts, but I'd have to escape the slashes *again*... so I will have to test. That function might change by the end of the project, but for now you get the picture. CAVEAT 2:  I am anticipating some difficulty with the fact that user input might include a parenthesis, which could alter the grouping if it isn't detected and/or sanitized. A more robust rule might be required, and/or using a different character than what my script currently expects. Let's find out. The object construction of the function library also gives me the ease of calling my declared functions using the user-supplied data parsed out of the command line. | USER SUPPLIED | FED TO OBJECT | RETURNED VAL IS FUNCTION | DEFINED AS | |===============|==========================|==========================|=============================| | "getchars" | availFuncs["getchars"]() | getchars | const getchars = () => { }; | Original rules Here are my rules from the initial post again, just for reference: expr   :     /^[^`]/ func | /.*/ func : /(f 1 |f 2 |...|f n )/ '(' (arg)* ')' arg    :     '!!' /[^#\s]+/ (('#' (func | `func | /[^\s]/)) | (' ')*) `func  :     /`*(f 1 |f 2 |...|f n )/ '(' (`arg)* ')' `arg   :     '!!' /[^#\s]+/ (('#' `func) | (' ')*) Let's talk them through, one-by-one, to see if they do what I want them to. I think I'll need to tweak them a little bit. expr This rule should handle the start of the line (which, remember, is the  value  side of a detected  --key#value  in the original command line). When done with all recursion, it should hand off a finished string with any functions reduced to any string output. I think I will reorder it to look for the tick first. Shouldn't be a huge speed impact (it would still pass/fail on the first character), but it does make for better readability, I think. I also need to add the "empty" string. That should be picked up by a "get everything," where the "everything" will just be "nothing." expr   :     /^`(.*)/ | func | /(.*)/ Either we find a tick, and return everything that follows (but not the tick), or we find a function, or we grab whatever is there (even if it's nothing). func This rule/function should represent anything that we catch as being a part of our library of functions. We should be able to detect it by the getFuncRegex() function (just above). The returned regex object from that function should place the function name into the first capturing group, and what the BNF notation (below) calls "arg" into the second capturing group. I'll require that the function comes at the beginning of the string, but otherwise, I think it's good. func : /^(f 1 |f 2 |...|f n )/ '(' arg* ')' A function is defined by a recognized function name coming at the start of the string, followed by an open paren, '(', arg(s), and a closing paren, ')'. arg This rule/function should catch empty cases (nothing provided between the parens) or an argument to the bounding function (beginning with '!!'). A non-empty arg could be in the form of: !!name !!name#text !!name#function() !!name#`escapedfunction() My initial rendering of the rule did not properly account for an empty arg. I can fix that by enclosing the whole rule in parens and an asterisk. Also, it relied on white space being between the args, and so it precluded there being white space WITHIN the value. I think it's better logic to split the contents of the second capturing group of any detected function (that is, split everything between the parentheses of the detected bounding function) on white-space-then-double-exclamation-marks. (The process would be just as above, where the initial string is split on white-space-double-hyphen, but I can provide an example if necessary.) That would free the user to include spaces in their input. Rewriting the rule to incorporate these changes would look like... arg    :     ( '!!' /[^#\s]+/ (('#' (func | `func | /.*/ ) ) )* `func and `arg These two rules are included as escaped mirrors of their counterparts. The logic here is that once we're in an escaped function, we do not want to recognize or process a function, even if we would otherwise recognize it. Therefore, escaped functions (`func) can only take escaped arguments (`arg), and an escaped arg, if it contains an otherwise recognized function (func), should treat that function as if it were escaped (`func). The only way to get back to "live" code space (where a recognized function would be executed) would be for the recursion of that particular loop to stop and the control returned to the next level up in the descending levels. I'll leave `func and `arg in my rules set for now to illustrate the flow of the logic, but I'm not convinced that I will need them. I may only need to detect the tick (or, put another way, detect a function directly after a '#') if the wrapping of the rule just above this step in the process (one recursion level up) properly defines the scope of the amount of text I can "skip". We will see what we need. Next step Since all of these rules are functions in our recursive process, the next step is to write that code... using functions that call each other as appropriate, track the index of our position in the command line, and return the appropriate information. Wish me luck. ...more to come...
1595023851
The Aaron
Roll20 Production Team
API Scripter
I've got to say, I'm really enjoying this adventure!
1595087584
Jakob
Sheet Author
API Scripter
I love that this forum is a mix of "how do I make a macro to turn light on and off for my token", and then this! Anyway, this sounds pretty cool. It totally makes sense to represent the AST you get from this kind of parsing in a higher-level abstraction (like it's presented here) in your code as well, just so you don't get lost in regular expressions. I wish I had done that with ChatSetAttr instead of hacking together a bunch of regexps and objects...
1595102238

Edited 1595102390
timmaugh
Pro
API Scripter
Part IV Jakob said: I love that this forum is a mix of "how do I make a macro to turn light on and off for my token", and then this! Anyway, this sounds pretty cool. Excellent! Always room for one more in the adventuring party... especially a good dimensional-portal detector! The Aaron said: I've got to say, I'm really enjoying this adventure! I hope so, because there's a Lvl 3 Code Troll ahead. And unless Jakob is able to detect a portal or GiGs can buckaroo banzai us out of it, it could be time to roll initiative. Saving throw OK, so maybe I can make a Trap Detection roll and avoid the code troll altogether. Here's what I realized... when last we left our intrepid band of recursive heroes, I thought we wouldn't need the `func and `arg rules to denote escaped functions and their arguments. After some further thought, I think we will need them. As an example, take the command line of: !regret --what#getChars(!!filter#`getTagged(!!t#Party !!npc#getFriendly())) (Not sure the logic of what we might imagine these functions would do would make sense in this order, but the semantics of their construction will illustrate the point, well enough.) Here, getChars(), getTagged(), and getFriendly() are in our bank of usable internal functions. First, we parse on /\s+--/ , drop the api handle, and split/join the rest on '#'. That gives us an array of one item... which is an array of [key, val]: KEY | VAL =======|==================================================================== what | getChars(!!filter#`getTagged(!!t#Party !!npc#getFriendly())) So we apply our expr rule and detect a func : FUNC      ==>  getChars( ... ) We need to test the contents to see if there are any args: FUNC      ==>  getChars( ) STRING TO EXAMINE ==> !!filter#`getTagged(!!t#Party !!npc#getFriendly()) If we simply split on /!![^#\s]+/ (double exclamation mark, any non-white space characters followed by a hash), we would end up with potential args of: FUNC      ==>  getChars( ) ARG? ==> !!filter#`getTagged( ) ARG? ==> !!t#Party ARG? ==> !!npc#getFriendly() Which obviously doesn't work. This sort of parsing would give us !!t#Party as an argument to the getChars() function, when it *should* belong to the getTagged() function, even if we escaped it. We do still need to detect the getTagged() function so that we know how much it should gobble out of the command line. This line of thought is also pointing out another problem, which is simply that a straight split of the potential arg (s) on the !! will not work. We're going to need another left-to-right, top down parse. Instead of splitting on space-!!, I think we're going to have to proceed left to right looking for a '#'. The first hash puts us into the value portion of the arg element, which could contain a function. If we detect a space-!! first, we're into a new arg element. But if we do encounter a hash and a func (escaped or otherwise), we need to keep going to the right to find the proper/associated closure for that function call. More robust rules are called for. The question is, do those rules have to be stricter at the func level (finding the proper closure), at the arg level (properly detecting the division of args that apply to the func just above this in the tree), or both? So, good news, bad news, fellow adventurers. My Trap Detection is a well-honed craft passed down to be by zero-or-more generations of trap detectors. However, my Disarm Trap is... well, let's just say that we send GiGs in, first, when I think I have it. =D However, feel free to chime in with your own Trap Disarm rolls if you have an idea!
1595114985

Edited 1595115056
The Aaron
Roll20 Production Team
API Scripter
Yes, GiGs...  
1595432843

Edited 1595601394
timmaugh
Pro
API Scripter
Part V – The Finished (--ish) Product So, it took me a little longer to put the parser together because of the difficulty of dealing with nested structures (i.e., outerFunc(innerFunc(reallyInnerFunc()))…) in a left-to-right parser. As it stands right now, elements are detected in order, but if internal functions are not properly closed, they will not be executed. That is expected behavior, but probably something that can be improved upon with a little thought (giving the user better feedback about malformed command lines). I won't cloud the issue by getting into it right now, but I wanted to leave the door open to future improvement. Fiddle Before I go on, I wanted to point out this fiddle I built where you can go to see the source code (or lift it for your own purposes), or just to see the parser in action and test your own lines. Enter the argument to parse in the box and examine the process log and output, below. If you really want to play, you can add your own function to the bank of internal functions, or change what those functions return (I will write up a separate post on how to expand the fiddle in another post). Examining the rules with left-to-right limitation Without closure detection (finding the proper number of closures to get the full set of text that belongs to a function), each of the rules had to be examined in terms of what text defined the start of that element. Rules that defined terminal values (that is, rules that define themselves without using other rules), also needed to know how to detect their end. Really quickly, then, here are the final rules that define how this parser will proceed: val        :       ( func | efunc | text ) func       :       /^(f 1 |f 2 |f 3 |…f n )\((?=\)|!!)/i arg* efunc      :       /^`(f 1 |f 2 |f 3 |…f n )\((?=\)|!!)/i {closure detection} text       :       /^(.*?)(?:\)(?<!`\))|\s+!!|$)/ arg        :       key | flag key        :       /^\s*\(?!!([^\s#)]+)#[^\s]+/ flag       :       /^\s*!!([^\s)]+)(?=\s|\))/ val This is what I had called expr, previously. It will find either an internal function ( func ), an escaped internal function ( efunc ), or text . func Matches on a regex built from the list of internal functions, followed by an open paren, followed by either a close paren or '!!'. Consumes through the open paren. efunc Matches on the same regex in the same manner as func , except that it requires there to be a tick character prior to the function name. text Captures everything up to a boundary condition. Boundaries are defined as a closing paren not preceded by a tick or '!!' following at least one character of white space. arg Is either a key or a flag . key Looks for a pattern of !!key#anything to designate a key . If a key is found, we start a new val check. flag Looks for a pattern of !!flag followed by a space or a close paren. From rule to code Every rule is a function, so we are going to have a val() function, a func() function, an efunc() function, etc. Every rule function should return either the text it consumes or null if it cannot match. The func function takes that one step further in that it refers to an internal function, so once it has gathered its parameters, it runs that internal function and returns the output. Last, each function should increment us along the command line the appropriate number of characters based on how much of the input we consume. Helper functions We need a couple of helper functions, one for our zero-or-more implementations (i.e., a func contains zero or more args ), and one for the first element we find ( val should return the first thing it finds in order of looking for a func , an efunc , or text ). zeroOrMore() This function will take a function (a rule) as an argument, and aggregate the returns as many times as it can. const zeroOrMore = (f) => {     let ret = "";     let i = 0;     for (; ;) {         indent++;         temp = f();         indent--;         if (!temp) {             nestlog(`ZERO: Has built: ${ret}`);             return ret;         }         ret += temp;     } }; It runs an infinite loop in the empty for( ; ; ) construction, and only breaks when the rule we fed it (the function, assigned to temp in the loop) returns a falsey value. All of our recursive functions return null if they fail, so we're sure to eventually find a way to break out of the loop, no matter what function we feed. firstOf() This function needs to take a flexible number of arguments, each of which will be a function. The arguments we feed it are the rules it needs to check, and it needs to check them in the order the arguments are supplied. Because we are using fat arrow syntax, we don't have access to the arguments pseudo array of a normal function, but the rest parameter ( …args ) gives us the same functionality. firstOf() returns the results of the first matched rule, or it returns null. const firstOf = (...args) => { let ret; args.find(f => ret = f(), ret); return ret; }; EDIT: newer, slimmed-down, and more idiomatic version of this function thanks to Aaron's input! Other utility functions Just to understand what these functions mean when you see them in the javascript expressions of the rules, here are a couple of utility functions that the parser makes use of. getFuncRegex() The first is the regex builder for the available internal functions (this is the same basic function that was mentioned in a previous post). const getFuncRegex = (obj, e = false) => {     return new RegExp(`^${e ? '`' : ''}(${Object.keys(obj).join("|")})\\((?=\\)|!!)`, 'i');     // group 1: func from func(arg*)     // if escaped, a tick (`) must precede the func }; EDIT: Conversion of regex string to literal (for easier readability) thanks to Aaron's input nestlog() The second utility function is to give a nested look to the logging output. This function prepends the logged output with an appropriate number of characters, and highlights the slug portion of the statement. const nestlog = (stmt, ilvl = indent) => {     let l = Array(ilvl + 1).join("==") + stmt;     l = l.indexOf(":") ? '<span style="color:yellow;">' + l.slice(0, l.indexOf(":")) + ':</span>' + l.slice(l.indexOf(":") + 1) : l;     log(l); }; Rule functions Finally, here are the code expressions I've been talking about for the past few posts. val // val: ( func | efunc | text ) const val = () => {     let bt = index;     let loccmd = cmd.slice(index);     nestlog(`VAL RECEIVES: ${loccmd}`);     indent++;     let ret = firstOf(func, efunc, text);     indent--;     if (bt === 0) {         if (cmd.slice(index).length) {             nestlog(`VAL: Is getting the rest: ${cmd.slice(index)}`);         }         ret = ret + cmd.slice(index); // if this is the top level recursion and there is still something to grab, grab it     }     return ret; }; We are simple space foragers. We look for things that make us go. Well, the engine that makes us go, here, is the line that begins "let ret = …". Our rule says (loosely) that val is A or B or C. We process that by feeding the three rules into the firstOf() function. We set the "backtrack" index (bt) at the beginning of the function (think of it as an "initial index") so that we can check it later. Since the firstOf() call begins a recursion descent, the check that comes right after that line ( bt===0 ) will only occur after the recursion loops back to this procedure. However, since every time we call the val() function we are setting that instance of bt, and every time we return to the val function we will have incremented the index along the command line we're parsing, the bt variable will only equal 0 for the absolute first instance of val(). This means that when all of our recursions have completed, we will run the last few lines of code in the val() function only once. Those lines grab the remainder of the original command line (anything left after we advanced through our command line in each function). func // func: /^(f1|f2|f3|…fn)\((?=\)|!!)/i arg* const func = () => {     let loccmd = cmd.slice(index);     let f = funcrx.exec(loccmd);     if (f) {         nestlog(`FUNC DETECTS: ${f[1]}`)         let lp = /\(/.exec(loccmd).index;         index += lp + 1;         indent++;         let params = zeroOrMore(arg).trimLeft();         params = params.replace(/`\(|`\)/g, m => { return { '`(': '(', '`)': ')' }.m });         indent--;         if (cmd.charAt(index) === ')') {             nestlog(`FUNC: Running ${f[1]}(${params})`);             let retObj = obj[f[1]]({                 ...(Object.fromEntries((params || "")                     .split("!!")                     .filter(p => p !== "")                     .map(a => a.split("#"))                     .map(a => [a.slice(0)[0], a.slice(1).join("#").trim()])))             });             index++;             nestlog(`FUNC: Returning ${retObj.ret}`);             return retObj.ret;         }     }     return null; }; The first thing we do is start a new string from the current index. That gives this function a new starting point allowing for more exact regex match (you'll see this in other functions, too... even in the val() function, above). In this case, we match on a regex built from our function list. The recursion engine in this function is the line that begins "let params = zeroOrMore(…)". We feed the arg rule (function) to return as many args as we can find. If those arguments have further recursion, we'll catch that in the arg() function. Eventually this check of arg will fail (returning null). That will be the cue for zeroOrMore() to return everything it's aggregated from all of the successful arg checks. Once all of the params are gathered from the command line, we'll do a replace on parens preceded by a tick character. This was the way that a user could include a paren in their text input without having it trip our parser. Now that we're ready to output the result (or send it to the internal function call), we don't want to include the tick anymore. When that's done, we split the list into individual arguments (on '!!'), then map each non-empty argument into a two item array (splitting on '#', then joining everything after the first occurrence). Finally (you have to read the code backwards, a little bit), the resulting array is turned into an object, and then I use the spread operator to send the object to the function we detected. This might be clearer if you look at it from the perspective of the function you want to call. I'll write more up about this in the post on how to use the fiddle, but the basic idea is that by structuring the call in this way, you don't have to change this engine to add a function, nor do you have to worry about what arguments you want to pass to the receiving function. You only need to ask for a parameter in the function declaration to make it available to the user as a '!!' option. (See the getrepeating() function at line 23 in the fiddle to see this at work). (I think the internal functions, built the way they are built, use object destructuring assignment , but I have been told I might be wrong in this. Maybe we can get to the bottom of it in this thread. In any case, those functions can access whatever arguments you need them to using the '!!' method, just mentioned.) All internal functions return an object... let retObj = { ret: "", safe: true }; ...so a func has to check the ret property to get the text output. The ret property is the text returned; the safe property is... not necessary for this fiddle; it is a remnant of the larger project into which I want to write the recursive descent parser. efunc // efunc: /^`(f1|f2|f3|…fn)\((?=\)|!!)/i {closure detection} const efunc = () => {     let bt = index;     let loccmd = cmd.slice(index);     let f = efuncrx.exec(loccmd);     if (f) {         nestlog(`EFUNC DETECTS: ${f[1]}`)         let lp = /\(/.exec(loccmd).index;         let pos = lp,             pairs = 1;         while (pairs !== 0 && pos < loccmd.length) {             pos++;             if (loccmd.charAt(pos) === '(' && loccmd.charAt(pos - 1) !== "`") pairs++;             else if (loccmd.charAt(pos) === ')' && loccmd.charAt(pos - 1) !== "`") pairs--;         }         index += pos + 1;         let ret = cmd.slice(bt + 1, index);         nestlog(`EFUNC: Returning ${ret}`);         return ret;     }     return null; }; Here, again, I set a backtrack index (I'll explain in a minute) and grab a new string starting from our current position. The test for this rule is built on the same sort of regex as the func rule, except that it includes the prepended tick. The difference with this function is that instead of descending into a recursive loop when it finds the components that satisfy arg , it will just skim over everything until it detects the end of all paren closures. There's probably a quicker or more elegant way to get to the matching close paren to the detected efunc , but this does the job. Once the function finds the proper closure for the efunc , it returns everything from our initial index (bt) plus 1 for the open paren, and our new position (pos). text // text: /^(.*?)(?:\)(?<!`\))|\s+!!|$)/ const text = () => {     let loccmd = cmd.slice(index);     let tb = textrx.exec(loccmd);     if (tb) {         nestlog(`TEXT DETECTS: ${tb[1]}`);         index += tb[1].length;         return tb[1];     }     return null; }; This function is straightforward, if you've understood the process up to now. The only thing here, as far as logic is concerned, is to understand that if a val hasn't found a func or an efunc by this point, the thing we are looking for is the text portion of a key#val pair. That means we have to capture everything up to the end of that key#val . The right-bounding condition in that case could be a closing paren NOT preceded by a tick, or a space followed by '!!'. Both of these conditions are represented in the regex. Once found, we advance our index to the end of the captured characters. arg // arg: key | flag const arg = () => {     nestlog(`ARG RECEIVES: ${cmd.slice(index)}`);     indent++;     let ret = firstOf(key, flag);     indent--;     if (ret) return ret;     nestlog(`ARG: Returning null.`);     return null; }; Another straightforward function. This rule represents an OR case, so we feed our two cases (other rules, or functions) to the firstOf() function… …much like the party feeds GiGs to the Eldritch Horror summoned when GiGs fails his Sneak roll. Again. key // key: /^\s*\(?!!([^\s#)]+)#[^\s]+/ const key = () => {     let loccmd = cmd.slice(index);     let k = keyrx.exec(loccmd);     if (k) {         nestlog(`KEY DETECTS: ${k[1]}`, indent);         let hindex = loccmd.indexOf("#");         index += hindex + 1;         indent++;         let ret = ' !!' + k[1] + '#' + val();         indent--;         return ret;     }     return null; }; You can probably see the process, here. Check our regex to see if the rule matches. If it does, construct our output text (including the '!!' and the '#', because this key will be part of an argument passed to a bounding function). The thing to notice is the function calls val() again, since after a '#' we could find another func , efunc , or text . flag // flag: /^\s*!!([^\s)]+)(?=\s|\))/ const flag = () => {     let loccmd = cmd.slice(index);     let f = flagrx.exec(loccmd);     if (f) {         nestlog(`FLAG DETECTS: ${f[1]}`);         let offset = loccmd.indexOf(f[1]) + f[1].length;         index += offset;         return ` !!${f[1]}`;     }     return null; }; Look familiar? Slice a new string. Check our rule. Set our new index position based on what we find, then return anything caught up to our bounding condition (a space or a close paren). If we don't find anything, return null. (We don't worry about escaped closing parens, here, since a paren should not be allowed as part of an argument passed to an internal function, which is what a flag represents.) Time to play After all this work, it's time to play! It's time to send in GiGs... ...Mr. Even Paladins are Thieves ...Mr. My Barbarian can do Seduction ...Mr. Why Can't Ogres be Ninjas? ...Mr. That Last Trap was a Total Fluke -- Yes, Like the One Before Just kidding. I've never gamed with GiGs. But this IS how he games in my imagination. Anyway, head over to the fiddle and help me break it. Follow the guidelines for the input that I've described (the way a command line element is constructed), and see if the output is as you would expect. If you find an input that breaks it, post back and I'll take a look. I'll write up one more post about how to add more internal functions to the fiddle, because this will ultimately be for a script that could be extended to a number of potential implementations, and the method of doing that would be the same as adding a function to this fiddle. For now... everyone... single file line... following GiGs... and off... we... go.
1595433317
timmaugh
Pro
API Scripter
I want to give a big thanks to Aaron for pointing me in the right direction with recursive descent parsing. It took a little of work to get it figured, but I could not have moved forward with the script idea I had without his insight. So let's give him a hand, everybody, as we send him on after GiGs. Look at them go. The pair of them. Just magnificent. ... *crash* <from next room> It's not my fault! ... Right, then. Weapons out. Spears up. It's about to get real.
1595447909

Edited 1595452809
timmaugh
Pro
API Scripter
...ok, so I've had one person not able to run the fiddle because their browser doesn't support a particular feature of the way one of the regex statements is built. I will see if I can make it more browser friendly. Details: the negative look behind in the text rule is apparently not supported in certain browsers. Edge and Chrome work, but Firefox doesn't like it. Party-poopers. UPDATE: Regex "lookbehinds" are apparently part of the  ECMA2018 standard, which was implemented by Firefox in their July 2020 release (v78). Once I updated to the latest version, it is working. That means that the fiddle works in Edge, Firefox, and Chrome. I think I'm good, unless someone with more knowledge of the Roll20 backend can speak to a limitation there? I've read that the E18 standard requires Node.js v 6.4 (with the harmony flag), or Node.js 9 (without requiring a flag). I haven't yet ported over the parsing engine to Roll20 servers... and I'm wondering if this will break when I do.
1595600581

Edited 1595600685
GiGs
Pro
Sheet Author
API Scripter
It took me a while to escape the Eldritch Horror so I couldnt read till now. Very entertaining, but it will take me a bit of study to understand it. Your effort to provide explanations is greatly appreciated. Interesting use of the infinite for loop. Does that work okay on roll20? roll20 tends to be a bit touchy around infinite loop detection. Regarding browser support: one prinicple is you really only need to support chrome and firefox. It's nice to support others, but you shouldnt go out of your way to, since roll20 only officially supports those two, and when people have problems they will only offer support in those browsers. So people really hsould be using one of those two here. PS: I hope my English accent measures up to Keith's standard, what with me being English...
1595603386
The Aaron
Roll20 Production Team
API Scripter
Roll20's Infinite Loop Detection operates on the principle of a simple watchdog process. Conceptually, when the API Sandbox makes a call into client code, it first sets a timer for 60 seconds (or possibly 120, can't recall), then calls the code.  If it hasn't come back from the client code and canceled the timer, then when it goes off, the client code is considered to be in an infinite loop and is terminated.  If Tim's infinite loop never exited, it would get killed by the watchdog timer expiring.  However, there isn't any static analysis being applied to the code looking for potential infinite loops, so no problems there.
1595604387
The Aaron
Roll20 Production Team
API Scripter
timmaugh said: UPDATE: Regex "lookbehinds" are apparently part of the  ECMA2018 standard, which was implemented by Firefox in their July 2020 release (v78). Once I updated to the latest version, it is working. That means that the fiddle works in Edge, Firefox, and Chrome. I think I'm good, unless someone with more knowledge of the Roll20 backend can speak to a limitation there? I've read that the E18 standard requires Node.js v 6.4 (with the harmony flag), or Node.js 9 (without requiring a flag). I haven't yet ported over the parsing engine to Roll20 servers... and I'm wondering if this will break when I do. The API Sandbox runs Node 12.16.3, so as long as it supports lookbehinds, everything should be fine:
1595624876
timmaugh
Pro
API Scripter
Thanks for the confirmation and the explanation of infinite loop detection, Aaron! GiGs said: It took me a while to escape the Eldritch Horror so I couldnt read till now. :-D GiGs said: Very entertaining, but it will take me a bit of study to understand it. Your effort to provide explanations is greatly appreciated. I will try to get that use-case post up on how to use the fiddle. Meant to do that today but work got busy. Hopefully between the explanation here, the instructions for the fiddle, and actually seeing it work at the fiddle, it will be clear enough, because my grand goal is to give other scripters a way to connect into the script project I'm working on and provide their own internal functions. ... which, in the abstract, doesn't sound like I've clarified anything at all, but I promise it will make sense once I get the script together and you see the use-case. GiGs  said: PS: I hope my English accent measures up to Keith's standard, what with me being English... I didn't know that! Of course, if this forum (and our recursive team) were a sit-com, this is when the lot of us would, in shock, be demanding you say something "english," whereupon we'd curl our lips, exchange disapproving looks and declare, "Nah. Sounds fake, mate." =D