This document provides a detailed technical specification and data sheet for the General Expression Language (GEL). The specification covers its lexing, parsing, and AST data structures, serving as a reference for both implementers and power users.
GEL (General Expression Language) is a domain-specific language intended for constructing logical and arithmetic expressions. These expressions can filter data records, define conditions, or implement advanced matching logic. The language supports:
- Logical operators (and, or, not).
- Comparison operators (eq, ne, lt, le, gt, ge, in, has).
- Dotted field references (file.name).
- Set literals ({...} syntax for in checks).
- Subscript access (arr[index] or obj[key]).

GEL expressions are written in a compact form that closely resembles pseudo-code:
not process.name eq "cmd.exe" and (file.extension in { "exe" "dll" })
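Internally, GEL undergoes two phases of processing: lexing, which turns the input string into tokens, and parsing, which builds an AST from those tokens.
Once parsed into an Expression AST, the language elements can be further analyzed or executed by an interpreter or evaluator.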
The lexer is defined in lexer_optimized.ts, providing an optimized tokenization phase.
Key optimizations include:
GelLexer Class
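The GelLexer converts an input string into an array of Token objects. Its notable methods cover producing the final Token[], reading string literals (including the raw r#"..."# syntax), and reading bytes literals prefixed with b or B.

GelLexer also tracks line and column information to aid in detailed error reporting for invalid tokens.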
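
To make the line and column tracking concrete, here is a minimal tokenizer sketch. It is not the code from lexer_optimized.ts: the SketchToken shape and the tokenizeSketch function are hypothetical, and the sketch only handles identifiers, integer literals, single-line double-quoted strings, and a handful of punctuation tokens. Lines and columns are assumed to be 1-based.

```typescript
// Hypothetical token shape for illustration; the real Token type may differ.
interface SketchToken {
  type: string;
  text: string;
  line: number;   // 1-based line where the token starts (assumption)
  column: number; // 1-based column where the token starts (assumption)
}

// Single-character punctuation, named after the TokenType values in this document.
const PUNCTUATION: { [ch: string]: string } = {
  "(": "lparen", ")": "rparen",
  "{": "lbrace", "}": "rbrace",
  ",": "comma", ".": "dot",
};

function tokenizeSketch(input: string): SketchToken[] {
  const tokens: SketchToken[] = [];
  let pos = 0;
  let line = 1;
  let column = 1;

  while (pos < input.length) {
    const ch = input.charAt(pos);
    // A newline resets the column and advances the line counter.
    if (ch === "\n") { pos++; line++; column = 1; continue; }
    if (ch === " " || ch === "\t" || ch === "\r") { pos++; column++; continue; }

    const start = pos;
    let type: string;
    if (/[A-Za-z_]/.test(ch)) {
      while (pos < input.length && /[A-Za-z0-9_]/.test(input.charAt(pos))) pos++;
      type = "identifier"; // keywords such as and/or/not are identifiers too
    } else if (/[0-9]/.test(ch)) {
      while (pos < input.length && /[0-9]/.test(input.charAt(pos))) pos++;
      type = "number";
    } else if (ch === '"') {
      pos++;
      while (pos < input.length && input.charAt(pos) !== '"') pos++;
      if (pos >= input.length) {
        throw new Error(`Unterminated string at ${line}:${column}`);
      }
      pos++; // consume the closing quote
      type = "string";
    } else if (PUNCTUATION[ch] !== undefined) {
      pos++;
      type = PUNCTUATION[ch];
    } else {
      // The tracked position is what makes this message useful.
      throw new Error(`Invalid token '${ch}' at ${line}:${column}`);
    }

    tokens.push({ type, text: input.slice(start, pos), line, column });
    column += pos - start;
  }

  tokens.push({ type: "eof", text: "", line, column });
  return tokens;
}
```

Each token records the position where it starts, so a later parse error can point at the exact token without re-scanning the input.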
The parser, defined in parser.ts (or parser_optimized.ts in your final distribution), reads the tokens produced by the lexer and constructs an Abstract Syntax Tree (AST). The parser enforces correct syntax and performs semantic checks like type matching for function calls.
GelParser Class
- Accepts a GelConfig for field/function definitions. Internally calls the lexer.
- Returns an Expression node. Throws ParseError if syntax or semantic checks fail.
- Handles the top-level parse logic.
- Implements logical operator precedence, e.g. expr1 or expr2, expr1 and expr2.
- Parses the optional not prefix for expressions.
- Parses parentheses or delegates to parseOperandOrFunctionWithOperator().
- Determines if an identifier is a field reference or a function call.
- Parses set-literal syntax ({ ... }) for in operations.

The parser also performs type inference (for instance, ensuring lt only applies to numeric types). It references this.config.fields for known field types and this.config.signatures for known function signatures.
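
The type check described above could look roughly like the following sketch. The function checkComparison is hypothetical, the FieldType union lists only the values that appear in this document, and treating le, gt, and ge as numeric-only is an assumption (the text only states it for lt).

```typescript
// Assumed subset of FieldType values; the real union may contain more.
type FieldType = "string" | "number" | "boolean" | "array";

// Operators assumed to require numeric operands. Only lt is confirmed by the text.
const NUMERIC_ONLY = ["lt", "le", "gt", "ge"];

// Hypothetical helper: validates that an operator may be applied to a known field.
function checkComparison(
  fields: { [name: string]: FieldType },
  fieldName: string,
  operator: string
): void {
  if (!Object.prototype.hasOwnProperty.call(fields, fieldName)) {
    throw new Error(`Unknown field '${fieldName}'.`);
  }
  const fieldType = fields[fieldName];
  if (NUMERIC_ONLY.indexOf(operator) !== -1 && fieldType !== "number") {
    throw new Error(
      `Operator '${operator}' requires a number, but '${fieldName}' is ${fieldType}.`
    );
  }
}

// Usage: the first call passes, the second would throw.
const exampleFields: { [name: string]: FieldType } = {
  "file.name": "string",
  "process.pid": "number",
};
checkComparison(exampleFields, "process.pid", "lt");
// checkComparison(exampleFields, "file.name", "lt"); // throws
```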
The lexer outputs tokens defined by TokenType. Below is a summary table:
TokenType | Description / Example |
---|---|
lparen, rparen | Left ( or right ) parenthesis |
lbrace, rbrace | Left { or right } brace |
lbracket, rbracket | Left [ or right ] bracket |
comma | Comma (,) separator |
dot | Period (.) used for dotted field paths or part of dotdot |
dotdot | Double-dot (..) used for numeric ranges in set literals |
star | Asterisk (*) used for subscript expansions ([*]) |
identifier | Keywords (and, or, not), user-defined fields (process), or operator tokens (eq, lt, etc.) |
string | A quoted string ("foo" or 'bar') or raw string (r#"something"#) |
bytes | A bytes literal (b"\\x41\\x42") |
number | Numeric literal (123, 3.14) |
less_than, greater_than | < or > (not generally used in the parser, but recognized by the lexer) |
eof | End of input marker |
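
One way to express the table in TypeScript is a string-literal union, sketched below. The union members come from the table; the SketchToken fields other than type, the LOGICAL_KEYWORDS list, and the isLogicalKeyword helper are assumptions for illustration. The helper shows a consequence of the table: because keywords are lexed as identifier tokens, the parser has to inspect the token's text to tell and, or, and not apart from field names.

```typescript
// Token kinds taken from the table above.
type TokenType =
  | "lparen" | "rparen" | "lbrace" | "rbrace" | "lbracket" | "rbracket"
  | "comma" | "dot" | "dotdot" | "star"
  | "identifier" | "string" | "bytes" | "number"
  | "less_than" | "greater_than" | "eof";

// Hypothetical token shape; field names other than `type` are assumptions.
interface SketchToken {
  type: TokenType;
  value: string;
  line: number;
  column: number;
}

const LOGICAL_KEYWORDS = ["and", "or", "not"];

// Keywords share the identifier token type, so the text decides.
function isLogicalKeyword(token: SketchToken): boolean {
  return token.type === "identifier" && LOGICAL_KEYWORDS.indexOf(token.value) !== -1;
}
```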
The overall grammar hierarchy is as follows (simplified BNF notation):
Expression         := OrExpression
OrExpression       := AndExpression ("or" AndExpression)*
AndExpression      := UnaryExpression ("and" UnaryExpression)*
UnaryExpression    := ("not")? PrimaryExpression
PrimaryExpression  := "(" Expression ")"
                    | OperandOrFunctionWithOperator
OperandOrFunctionWithOperator
                   := OperandOrFunction (ComparisonOperator OperandOrFunction)?
OperandOrFunction  := Operand ("(" ArgList? ")")?
                      ("[" SubscriptIndex "]")*
Operand            := NumberLiteral
                    | StringLiteral
                    | BooleanLiteral
                    | BytesLiteral
                    | InSet
                    | FieldReference
InSet              := "{" (InSetElement (InSetElement)*)? "}"
InSetElement       := NumberLiteral (".." NumberLiteral)?
                    | StringLiteral
                    | ...
ComparisonOperator := "eq" | "ne" | "lt" | "le" | "gt" | "ge" | "in" | "has"
ArgList            := Expression ("," Expression)*
SubscriptIndex     := "*" | Expression
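
The top three grammar rules map naturally onto a recursive-descent parser, where each rule becomes one method and lower-precedence rules call higher-precedence ones. The sketch below illustrates only that layering: the PrecedenceSketch class and the Sketch type are hypothetical, the input is an array of pre-split words rather than real tokens, and every operand is treated as an opaque leaf instead of a full OperandOrFunctionWithOperator.

```typescript
// Hypothetical, simplified AST: leaves are raw operand strings.
type Sketch =
  | { kind: "leaf"; text: string; negated: boolean }
  | { kind: "logical"; operator: "and" | "or"; left: Sketch; right: Sketch; negated: boolean };

class PrecedenceSketch {
  private pos = 0;
  constructor(private words: string[]) {}

  parse(): Sketch {
    const expr = this.parseOr();
    if (this.pos < this.words.length) {
      throw new Error(`Unexpected '${this.words[this.pos]}'`);
    }
    return expr;
  }

  // OrExpression := AndExpression ("or" AndExpression)*
  private parseOr(): Sketch {
    let left: Sketch = this.parseAnd();
    while (this.words[this.pos] === "or") {
      this.pos++;
      left = { kind: "logical", operator: "or", left, right: this.parseAnd(), negated: false };
    }
    return left;
  }

  // AndExpression := UnaryExpression ("and" UnaryExpression)*
  private parseAnd(): Sketch {
    let left: Sketch = this.parseUnary();
    while (this.words[this.pos] === "and") {
      this.pos++;
      left = { kind: "logical", operator: "and", left, right: this.parseUnary(), negated: false };
    }
    return left;
  }

  // UnaryExpression := ("not")? PrimaryExpression
  private parseUnary(): Sketch {
    let negate = false;
    if (this.words[this.pos] === "not") { this.pos++; negate = true; }
    const node = this.parsePrimary();
    if (negate) node.negated = !node.negated; // toggle rather than overwrite
    return node;
  }

  // PrimaryExpression := "(" Expression ")" | operand (simplified to one word)
  private parsePrimary(): Sketch {
    const word = this.words[this.pos];
    if (word === undefined) throw new Error("Unexpected end of input");
    if (word === ")" || word === "and" || word === "or" || word === "not") {
      throw new Error(`Unexpected '${word}'`);
    }
    this.pos++;
    if (word === "(") {
      const inner = this.parseOr();
      if (this.words[this.pos] !== ")") throw new Error("Expected ')'");
      this.pos++;
      return inner;
    }
    return { kind: "leaf", text: word, negated: false };
  }
}
```

Because parseOr calls parseAnd for its operands, and binds tighter than or: the words a or b and not c produce an or node whose right child is the and node.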
The parser transforms the tokens into a strongly typed AST. The core node types are:
Node Type | Role |
---|---|
LogicalNode | Represents expr1 AND expr2 or expr1 OR expr2. |
ComparisonNode | Binary operator (eq, lt, etc.) with left and right sub-expressions. |
FunctionCallNode | Function invocation (starts_with(field, "xyz")). Holds argument list and return type. |
FieldReferenceNode | Refers to a field name, possibly dotted (file.path). Has an inferred fieldType. |
LiteralNode | Represents a literal (number, string, boolean, or bytes). |
InSetNode | Used for the in operator to hold multiple possible values or expansions from numeric ranges. |
SubscriptNode | Array or map indexing (arr[idx], obj[key], or arr[*] for expansion). |
AST nodes also carry a negated field. This indicates that the node is prefixed with a not (if applicable). In practice, the parser often sets or toggles this during unary expression parsing.
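
A discriminated union is one plausible way to model these nodes in TypeScript. The sketch below is an assumption, not the parser's actual definitions: the type discriminator, the BaseNode interface, and the render helper are hypothetical, and FunctionCallNode and SubscriptNode are left out to keep it short. The render function turns a tree back into expression text, which shows how the negated flag sits on a node rather than being a separate node type.

```typescript
// Hypothetical node shapes; only operator/left/right/negated/values/fieldType
// are named in this document, everything else is an assumption.
interface BaseNode { negated: boolean; }
interface LogicalNode extends BaseNode {
  type: "logical"; operator: "and" | "or"; left: Expression; right: Expression;
}
interface ComparisonNode extends BaseNode {
  type: "comparison"; operator: string; left: Expression; right: Expression;
}
interface FieldReferenceNode extends BaseNode {
  type: "field"; name: string; fieldType: string;
}
interface LiteralNode extends BaseNode {
  type: "literal"; value: string | number | boolean;
}
interface InSetNode extends BaseNode {
  type: "in_set"; values: Array<string | number>;
}
type Expression =
  | LogicalNode | ComparisonNode | FieldReferenceNode | LiteralNode | InSetNode;

// Renders a tree back to GEL-like text, fully parenthesized.
function render(node: Expression): string {
  let text = "";
  switch (node.type) {
    case "logical":
    case "comparison":
      text = `(${render(node.left)} ${node.operator} ${render(node.right)})`;
      break;
    case "field":
      text = node.name;
      break;
    case "literal":
      text = JSON.stringify(node.value);
      break;
    case "in_set":
      text = `{ ${node.values.map(v => JSON.stringify(v)).join(" ")} }`;
      break;
  }
  // A negated node is printed with a not prefix.
  return node.negated ? `not ${text}` : text;
}
```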
The parser references a GelConfig object containing:
- fields: a map from field name to FieldType, e.g.:
{
  "file.name": "string",
  "process.pid": "number",
  "network.ports": "array"
}
- signatures: a map from function name to its signature, e.g.:
{
  "starts_with": {
    name: "starts_with",
    parameters: [
      { paramName: "haystack", allowedTypes: ["string"] },
      { paramName: "needle", allowedTypes: ["string"] }
    ],
    returnType: "boolean"
  }
}
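
The configuration shapes above could be typed and used for call checking as in the following sketch. The interface names ParameterSpec and FunctionSignature, the checkCall helper, and the FieldType members are assumptions; only the property names (fields, signatures, name, parameters, paramName, allowedTypes, returnType) come from this document.

```typescript
// Assumed subset of FieldType values; the real union may contain more.
type FieldType = "string" | "number" | "boolean" | "array";

interface ParameterSpec { paramName: string; allowedTypes: FieldType[]; }
interface FunctionSignature {
  name: string;
  parameters: ParameterSpec[];
  returnType: FieldType;
}
interface GelConfig {
  fields: { [name: string]: FieldType };
  signatures: { [name: string]: FunctionSignature };
}

// Hypothetical helper: validates a call against its signature and
// returns the call's result type for further type inference.
function checkCall(config: GelConfig, name: string, argTypes: FieldType[]): FieldType {
  if (!Object.prototype.hasOwnProperty.call(config.signatures, name)) {
    throw new Error(`Unknown function '${name}'.`);
  }
  const signature = config.signatures[name];
  if (argTypes.length !== signature.parameters.length) {
    throw new Error(
      `'${name}' expects ${signature.parameters.length} argument(s), got ${argTypes.length}.`
    );
  }
  for (let i = 0; i < argTypes.length; i++) {
    const param = signature.parameters[i];
    if (param.allowedTypes.indexOf(argTypes[i]) === -1) {
      throw new Error(`Argument '${param.paramName}' of '${name}' cannot be ${argTypes[i]}.`);
    }
  }
  return signature.returnType;
}

const exampleConfig: GelConfig = {
  fields: { "file.name": "string", "process.pid": "number" },
  signatures: {
    starts_with: {
      name: "starts_with",
      parameters: [
        { paramName: "haystack", allowedTypes: ["string"] },
        { paramName: "needle", allowedTypes: ["string"] },
      ],
      returnType: "boolean",
    },
  },
};
```

With this configuration, checkCall(exampleConfig, "starts_with", ["string", "string"]) yields "boolean", while passing a "number" argument or an unregistered function name raises an error.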
not user.is_admin eq true and file.extension in {"exe" "dll"}
After tokenization, a set of Token objects is produced, e.g. (identifier=not), (identifier=user), (dot=.), (identifier=is_admin), (identifier=eq), (identifier=true), (identifier=and), ...
The parser then builds an AST:
LogicalNode {
  operator: "and",
  left: ComparisonNode {
    operator: "eq",
    left: FieldReferenceNode("user.is_admin"),
    right: LiteralNode(boolean=true),
    negated: true
  },
  right: ComparisonNode {
    operator: "in",
    left: FieldReferenceNode("file.extension"),
    right: InSetNode { values: ["exe", "dll"] },
    negated: false
  },
  negated: false
}
starts_with(file.name, "test") or ends_with(file.name, ".txt")
If starts_with and ends_with are known functions returning booleans, the parser will produce a FunctionCallNode for each call and wrap them in a LogicalNode with operator="or".
ParseError is thrown for invalid syntax, type mismatches, or unknown functions. It captures line and column for user-friendly error messages:
throw new ParseError(
  "Unknown function 'some_bad_function'.",
  token.line,
  token.column
);
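
A minimal sketch of an error class with that constructor shape follows, together with a hypothetical formatParseError helper that uses the captured position to point at the offending token. The class body and the helper are assumptions (the real ParseError may store or format its data differently), and line and column are assumed to be 1-based.

```typescript
// Sketch of an error type matching the constructor call shown above.
class ParseError extends Error {
  line: number;
  column: number;

  constructor(message: string, line: number, column: number) {
    super(message);
    this.name = "ParseError";
    this.line = line;
    this.column = column;
  }
}

// Hypothetical helper: renders the message, the source line, and a caret
// under the reported column (1-based line and column assumed).
function formatParseError(err: ParseError, source: string): string {
  const lines = source.split("\n");
  const text = lines[err.line - 1] !== undefined ? lines[err.line - 1] : "";
  let caret = "";
  for (let i = 1; i < err.column; i++) caret += " ";
  caret += "^";
  return `${err.line}:${err.column}: ${err.message}\n${text}\n${caret}`;
}
```

For the input a or some_bad_function(b), an error reported at line 1, column 6 would be rendered with the caret under the s of some_bad_function.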
When evaluating the AST, an UndefinedFieldError may be thrown if a required field is missing from the data at runtime.
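
As an illustration of where such an error could originate, the sketch below resolves a dotted field path against a record and throws when any segment is missing. It is not taken from evaluator.ts: the class body, the lookupField helper, and the assumption that records are nested objects (rather than flat maps keyed by the full dotted name) are all hypothetical.

```typescript
// Sketch of the runtime error; the real class may carry different data.
class UndefinedFieldError extends Error {
  fieldName: string;

  constructor(fieldName: string) {
    super(`Field '${fieldName}' is not defined in the record.`);
    this.name = "UndefinedFieldError";
    this.fieldName = fieldName;
  }
}

// Hypothetical helper: walks a dotted path such as "file.name" through a
// nested record, throwing as soon as a segment cannot be resolved.
function lookupField(record: { [key: string]: unknown }, path: string): unknown {
  const parts = path.split(".");
  let current: unknown = record;
  for (let i = 0; i < parts.length; i++) {
    if (
      typeof current !== "object" ||
      current === null ||
      !Object.prototype.hasOwnProperty.call(current, parts[i])
    ) {
      throw new UndefinedFieldError(path);
    }
    current = (current as { [key: string]: unknown })[parts[i]];
  }
  return current;
}
```

Given the record { file: { name: "a.txt" } }, looking up file.name returns "a.txt", while looking up file.size throws an UndefinedFieldError whose fieldName is file.size.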
Reference | Description | Link |
---|---|---|
evaluator.ts | Implementation of GelEvaluator. | View Code |
engine.ts | Higher-level GelEngine showcasing parsing + evaluation. | View Code |