Part V – The Finished (--ish) Product So, it took me a little longer to put the parser together because of the difficulty of dealing with nested structures (i.e., outerFunc(innerFunc(reallyInnerFunc()))…) in a left-to-right parser. As it stands right now, elements are detected in order, but if internal functions are not properly closed, they will not be executed. That is expected behavior, but probably something that can be improved upon with a little thought (giving the user better feedback about malformed command lines). I won't cloud the issue by getting into it right now, but I wanted to leave the door open to future improvement. Fiddle Before I go on, I wanted to point out this fiddle I built where you can go to see the source code (or lift it for your own purposes), or just to see the parser in action and test your own lines. Enter the argument to parse in the box and examine the process log and output, below. If you really want to play, you can add your own function to the bank of internal functions, or change what those functions return (I will write up a separate post on how to expand the fiddle in another post). Examining the rules with left-to-right limitation Without closure detection (finding the proper number of closures to get the full set of text that belongs to a function), each of the rules had to be examined in terms of what text defined the start of that element. Rules that defined terminal values (that is, rules that define themselves without using other rules), also needed to know how to detect their end. Really quickly, then, here are the final rules that define how this parser will proceed: val : ( func | efunc | text ) func : /^(f 1 |f 2 |f 3 |…f n )\((?=\)|!!)/i arg* efunc : /^`(f 1 |f 2 |f 3 |…f n )\((?=\)|!!)/i {closure detection} text : /^(.*?)(?:\)(?<!`\))|\s+!!|$)/ arg : key | flag key : /^\s*\(?!!([^\s#)]+)#[^\s]+/ flag : /^\s*!!([^\s)]+)(?=\s|\))/ val This is what I had called expr, previously. It will find either an internal function ( func ), an escaped internal function ( efunc ), or text . func Matches on a regex built from the list of internal functions, followed by an open paren, followed by either a close paren or '!!'. Consumes through the open paren. efunc Matches on the same regex in the same manner as func , except that it requires there to be a tick character prior to the function name. text Captures everything up to a boundary condition. Boundaries are defined as a closing paren not preceded by a tick or '!!' following at least one character of white space. arg Is either a key or a flag . key Looks for a pattern of !!key#anything to designate a key . If a key is found, we start a new val check. flag Looks for a pattern of !!flag followed by a space or a close paren. From rule to code Every rule is a function, so we are going to have a val() function, a func() function, an efunc() function, etc. Every rule function should return either the text it consumes or null if it cannot match. The func function takes that one step further in that it refers to an internal function, so once it has gathered its parameters, it runs that internal function and returns the output. Last, each function should increment us along the command line the appropriate number of characters based on how much of the input we consume. Helper functions We need a couple of helper functions, one for our zero-or-more implementations (i.e., a func contains zero or more args ), and one for the first element we find ( val should return the first thing it finds in order of looking for a func , an efunc , or text ). zeroOrMore() This function will take a function (a rule) as an argument, and aggregate the returns as many times as it can. const zeroOrMore = (f) => { let ret = ""; let i = 0; for (; ;) { indent++; temp = f(); indent--; if (!temp) { nestlog(`ZERO: Has built: ${ret}`); return ret; } ret += temp; } }; It runs an infinite loop in the empty for( ; ; ) construction, and only breaks when the rule we fed it (the function, assigned to temp in the loop) returns a falsey value. All of our recursive functions return null if they fail, so we're sure to eventually find a way to break out of the loop, no matter what function we feed. firstOf() This function needs to take a flexible number of arguments, each of which will be a function. The arguments we feed it are the rules it needs to check, and it needs to check them in the order the arguments are supplied. Because we are using fat arrow syntax, we don't have access to the arguments pseudo array of a normal function, but the rest parameter ( …args ) gives us the same functionality. firstOf() returns the results of the first matched rule, or it returns null. const firstOf = (...args) => {
let ret;
args.find(f => ret = f(), ret);
return ret;
}; EDIT: newer, slimmed-down, and more idiomatic version of this function thanks to Aaron's input! Other utility functions Just to understand what these functions mean when you see them in the javascript expressions of the rules, here are a couple of utility functions that the parser makes use of. getFuncRegex() The first is the regex builder for the available internal functions (this is the same basic function that was mentioned in a previous post). const getFuncRegex = (obj, e = false) => { return new RegExp(`^${e ? '`' : ''}(${Object.keys(obj).join("|")})\\((?=\\)|!!)`, 'i'); // group 1: func from func(arg*) // if escaped, a tick (`) must precede the func }; EDIT: Conversion of regex string to literal (for easier readability) thanks to Aaron's input nestlog() The second utility function is to give a nested look to the logging output. This function prepends the logged output with an appropriate number of characters, and highlights the slug portion of the statement. const nestlog = (stmt, ilvl = indent) => { let l = Array(ilvl + 1).join("==") + stmt; l = l.indexOf(":") ? '<span style="color:yellow;">' + l.slice(0, l.indexOf(":")) + ':</span>' + l.slice(l.indexOf(":") + 1) : l; log(l); }; Rule functions Finally, here are the code expressions I've been talking about for the past few posts. val // val: ( func | efunc | text ) const val = () => { let bt = index; let loccmd = cmd.slice(index); nestlog(`VAL RECEIVES: ${loccmd}`); indent++; let ret = firstOf(func, efunc, text); indent--; if (bt === 0) { if (cmd.slice(index).length) { nestlog(`VAL: Is getting the rest: ${cmd.slice(index)}`); } ret = ret + cmd.slice(index); // if this is the top level recursion and there is still something to grab, grab it } return ret; }; We are simple space foragers. We look for things that make us go. Well, the engine that makes us go, here, is the line that begins "let ret = …". Our rule says (loosely) that val is A or B or C. We process that by feeding the three rules into the firstOf() function. We set the "backtrack" index (bt) at the beginning of the function (think of it as an "initial index") so that we can check it later. Since the firstOf() call begins a recursion descent, the check that comes right after that line ( bt===0 ) will only occur after the recursion loops back to this procedure. However, since every time we call the val() function we are setting that instance of bt, and every time we return to the val function we will have incremented the index along the command line we're parsing, the bt variable will only equal 0 for the absolute first instance of val(). This means that when all of our recursions have completed, we will run the last few lines of code in the val() function only once. Those lines grab the remainder of the original command line (anything left after we advanced through our command line in each function). func // func: /^(f1|f2|f3|…fn)\((?=\)|!!)/i arg* const func = () => { let loccmd = cmd.slice(index); let f = funcrx.exec(loccmd); if (f) { nestlog(`FUNC DETECTS: ${f[1]}`) let lp = /\(/.exec(loccmd).index; index += lp + 1; indent++; let params = zeroOrMore(arg).trimLeft(); params = params.replace(/`\(|`\)/g, m => { return { '`(': '(', '`)': ')' }.m }); indent--; if (cmd.charAt(index) === ')') { nestlog(`FUNC: Running ${f[1]}(${params})`); let retObj = obj[f[1]]({ ...(Object.fromEntries((params || "") .split("!!") .filter(p => p !== "") .map(a => a.split("#")) .map(a => [a.slice(0)[0], a.slice(1).join("#").trim()]))) }); index++; nestlog(`FUNC: Returning ${retObj.ret}`); return retObj.ret; } } return null; }; The first thing we do is start a new string from the current index. That gives this function a new starting point allowing for more exact regex match (you'll see this in other functions, too... even in the val() function, above). In this case, we match on a regex built from our function list. The recursion engine in this function is the line that begins "let params = zeroOrMore(…)". We feed the arg rule (function) to return as many args as we can find. If those arguments have further recursion, we'll catch that in the arg() function. Eventually this check of arg will fail (returning null). That will be the cue for zeroOrMore() to return everything it's aggregated from all of the successful arg checks. Once all of the params are gathered from the command line, we'll do a replace on parens preceded by a tick character. This was the way that a user could include a paren in their text input without having it trip our parser. Now that we're ready to output the result (or send it to the internal function call), we don't want to include the tick anymore. When that's done, we split the list into individual arguments (on '!!'), then map each non-empty argument into a two item array (splitting on '#', then joining everything after the first occurrence). Finally (you have to read the code backwards, a little bit), the resulting array is turned into an object, and then I use the spread operator to send the object to the function we detected. This might be clearer if you look at it from the perspective of the function you want to call. I'll write more up about this in the post on how to use the fiddle, but the basic idea is that by structuring the call in this way, you don't have to change this engine to add a function, nor do you have to worry about what arguments you want to pass to the receiving function. You only need to ask for a parameter in the function declaration to make it available to the user as a '!!' option. (See the getrepeating() function at line 23 in the fiddle to see this at work). (I think the internal functions, built the way they are built, use object destructuring assignment , but I have been told I might be wrong in this. Maybe we can get to the bottom of it in this thread. In any case, those functions can access whatever arguments you need them to using the '!!' method, just mentioned.) All internal functions return an object... let retObj = { ret: "", safe: true }; ...so a func has to check the ret property to get the text output. The ret property is the text returned; the safe property is... not necessary for this fiddle; it is a remnant of the larger project into which I want to write the recursive descent parser. efunc // efunc: /^`(f1|f2|f3|…fn)\((?=\)|!!)/i {closure detection} const efunc = () => { let bt = index; let loccmd = cmd.slice(index); let f = efuncrx.exec(loccmd); if (f) { nestlog(`EFUNC DETECTS: ${f[1]}`) let lp = /\(/.exec(loccmd).index; let pos = lp, pairs = 1; while (pairs !== 0 && pos < loccmd.length) { pos++; if (loccmd.charAt(pos) === '(' && loccmd.charAt(pos - 1) !== "`") pairs++; else if (loccmd.charAt(pos) === ')' && loccmd.charAt(pos - 1) !== "`") pairs--; } index += pos + 1; let ret = cmd.slice(bt + 1, index); nestlog(`EFUNC: Returning ${ret}`); return ret; } return null; }; Here, again, I set a backtrack index (I'll explain in a minute) and grab a new string starting from our current position. The test for this rule is built on the same sort of regex as the func rule, except that it includes the prepended tick. The difference with this function is that instead of descending into a recursive loop when it finds the components that satisfy arg , it will just skim over everything until it detects the end of all paren closures. There's probably a quicker or more elegant way to get to the matching close paren to the detected efunc , but this does the job. Once the function finds the proper closure for the efunc , it returns everything from our initial index (bt) plus 1 for the open paren, and our new position (pos). text // text: /^(.*?)(?:\)(?<!`\))|\s+!!|$)/ const text = () => { let loccmd = cmd.slice(index); let tb = textrx.exec(loccmd); if (tb) { nestlog(`TEXT DETECTS: ${tb[1]}`); index += tb[1].length; return tb[1]; } return null; }; This function is straightforward, if you've understood the process up to now. The only thing here, as far as logic is concerned, is to understand that if a val hasn't found a func or an efunc by this point, the thing we are looking for is the text portion of a key#val pair. That means we have to capture everything up to the end of that key#val . The right-bounding condition in that case could be a closing paren NOT preceded by a tick, or a space followed by '!!'. Both of these conditions are represented in the regex. Once found, we advance our index to the end of the captured characters. arg // arg: key | flag const arg = () => { nestlog(`ARG RECEIVES: ${cmd.slice(index)}`); indent++; let ret = firstOf(key, flag); indent--; if (ret) return ret; nestlog(`ARG: Returning null.`); return null; }; Another straightforward function. This rule represents an OR case, so we feed our two cases (other rules, or functions) to the firstOf() function… …much like the party feeds GiGs to the Eldritch Horror summoned when GiGs fails his Sneak roll. Again. key // key: /^\s*\(?!!([^\s#)]+)#[^\s]+/ const key = () => { let loccmd = cmd.slice(index); let k = keyrx.exec(loccmd); if (k) { nestlog(`KEY DETECTS: ${k[1]}`, indent); let hindex = loccmd.indexOf("#"); index += hindex + 1; indent++; let ret = ' !!' + k[1] + '#' + val(); indent--; return ret; } return null; }; You can probably see the process, here. Check our regex to see if the rule matches. If it does, construct our output text (including the '!!' and the '#', because this key will be part of an argument passed to a bounding function). The thing to notice is the function calls val() again, since after a '#' we could find another func , efunc , or text . flag // flag: /^\s*!!([^\s)]+)(?=\s|\))/ const flag = () => { let loccmd = cmd.slice(index); let f = flagrx.exec(loccmd); if (f) { nestlog(`FLAG DETECTS: ${f[1]}`); let offset = loccmd.indexOf(f[1]) + f[1].length; index += offset; return ` !!${f[1]}`; } return null; }; Look familiar? Slice a new string. Check our rule. Set our new index position based on what we find, then return anything caught up to our bounding condition (a space or a close paren). If we don't find anything, return null. (We don't worry about escaped closing parens, here, since a paren should not be allowed as part of an argument passed to an internal function, which is what a flag represents.) Time to play After all this work, it's time to play! It's time to send in GiGs... ...Mr. Even Paladins are Thieves ...Mr. My Barbarian can do Seduction ...Mr. Why Can't Ogres be Ninjas? ...Mr. That Last Trap was a Total Fluke -- Yes, Like the One Before Just kidding. I've never gamed with GiGs. But this IS how he games in my imagination. Anyway, head over to the fiddle and help me break it. Follow the guidelines for the input that I've described (the way a command line element is constructed), and see if the output is as you would expect. If you find an input that breaks it, post back and I'll take a look. I'll write up one more post about how to add more internal functions to the fiddle, because this will ultimately be for a script that could be extended to a number of potential implementations, and the method of doing that would be the same as adding a function to this fiddle. For now... everyone... single file line... following GiGs... and off... we... go.