General Expression Language (GEL)

This document provides a detailed technical specification and data sheet for the General Expression Language (GEL). The specification covers its lexing, parsing, and AST data structures, serving as a reference for both implementers and power users.


Table of Contents

  1. Introduction
  2. Overview of GEL
  3. Lexer
  4. Parser
  5. Token Types
  6. Grammar and AST
  7. Configuration and Known Fields
  8. Usage Examples
  9. Error Handling
  10. References

1. Introduction

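GEL (General Expression Language) is a domain-specific language intended for constructing logical and arithmetic expressions. These expressions can filter data records, define conditions, or implement advanced matching logic. The language supports logical operators (and, or, not), comparison operators (eq, ne, lt, le, gt, ge, in, has), function calls, set literals with numeric ranges, and subscripts.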


2. Overview of GEL

GEL expressions are written in a compact form that closely resembles pseudo-code:

Example Expression:
not process.name eq "cmd.exe" and (file.extension in { "exe" "dll" })

Internally, GEL undergoes two phases of processing:

  1. Tokenization (via the Lexer)
  2. Parsing into an AST (via the Parser)

Once parsed into an Expression AST, the language elements can be further analyzed or executed by an interpreter or evaluator.


3. Lexer

The lexer is defined in lexer_optimized.ts, providing an optimized tokenization phase. Key optimizations include:

3.1 GelLexer Class

The GelLexer converts an input string into an array of Token objects. Notable methods:

Note: The GelLexer also tracks line and column information to aid in detailed error reporting for invalid tokens.
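
To illustrate how line and column tracking can work during tokenization, here is a minimal sketch. It is not the actual GelLexer: the Token shape, the tokenize function, and the reduced token set (identifiers, integer numbers, double-quoted strings without escapes, and punctuation) are assumptions made for illustration.

```typescript
// Hypothetical, simplified token shape; the real Token in lexer_optimized.ts may differ.
interface Token {
  type: string;
  value: string;
  line: number;   // 1-based line where the token starts
  column: number; // 1-based column where the token starts
}

const PUNCTUATION = new Map<string, string>([
  ["(", "lparen"], [")", "rparen"], ["{", "lbrace"], ["}", "rbrace"],
  ["[", "lbracket"], ["]", "rbracket"], [",", "comma"], [".", "dot"], ["*", "star"],
]);

function tokenize(input: string): Token[] {
  const tokens: Token[] = [];
  let pos = 0;
  let line = 1;
  let column = 1;

  // Consume characters while `test` holds and advance the column counter.
  const takeWhile = (test: (ch: string) => boolean): string => {
    const start = pos;
    while (pos < input.length && test(input.charAt(pos))) pos++;
    column += pos - start;
    return input.slice(start, pos);
  };

  while (pos < input.length) {
    const ch = input.charAt(pos);
    if (ch === "\n") { pos++; line++; column = 1; continue; }
    if (ch === " " || ch === "\t" || ch === "\r") { pos++; column++; continue; }

    // Remember where the token starts before consuming it.
    const startLine = line;
    const startColumn = column;
    const punctuation = PUNCTUATION.get(ch);

    if (/[A-Za-z_]/.test(ch)) {
      const value = takeWhile((c) => /[A-Za-z0-9_]/.test(c));
      tokens.push({ type: "identifier", value, line: startLine, column: startColumn });
    } else if (/[0-9]/.test(ch)) {
      // Integers only in this sketch; decimals such as 3.14 are not handled.
      const value = takeWhile((c) => /[0-9]/.test(c));
      tokens.push({ type: "number", value, line: startLine, column: startColumn });
    } else if (ch === '"') {
      pos++; column++; // opening quote
      const value = takeWhile((c) => c !== '"' && c !== "\n");
      if (input.charAt(pos) !== '"') {
        throw new Error(`Unterminated string at ${startLine}:${startColumn}`);
      }
      pos++; column++; // closing quote
      tokens.push({ type: "string", value, line: startLine, column: startColumn });
    } else if (ch === "." && input.charAt(pos + 1) === ".") {
      pos += 2; column += 2;
      tokens.push({ type: "dotdot", value: "..", line: startLine, column: startColumn });
    } else if (punctuation !== undefined) {
      pos++; column++;
      tokens.push({ type: punctuation, value: ch, line: startLine, column: startColumn });
    } else {
      throw new Error(`Unexpected character '${ch}' at ${line}:${column}`);
    }
  }
  tokens.push({ type: "eof", value: "", line, column });
  return tokens;
}
```

Recording the start position before consuming a token means an error message can point at the first character of the offending token rather than at wherever the lexer stopped.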

4. Parser

The parser, defined in parser.ts (or parser_optimized.ts in your final distribution), reads the tokens produced by the lexer and constructs an Abstract Syntax Tree (AST). The parser enforces correct syntax and performs semantic checks like type matching for function calls.

4.1 GelParser Class

4.1.1 Key Parsing Methods

The parser also performs type inference and type checking (for instance, ensuring lt only applies to numeric types). It consults this.config.fields for known field types and this.config.signatures for known function signatures.
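
The numeric check for lt can be sketched as follows. This is an illustrative assumption, not the parser's actual code: the FieldType names, the FieldsConfig shape, the checkComparison helper, the error messages, and the extension of the rule to le, gt, and ge are all hypothetical.

```typescript
// Hypothetical type names; the real parser may use a different representation.
type FieldType = "number" | "string" | "boolean" | "bytes";

// Assumed shape: a map from dotted field name to its declared type.
interface FieldsConfig {
  fields: Record<string, FieldType>;
}

// Assumption: all ordering operators follow the same rule the source states for lt.
const ORDERING_OPERATORS = new Set(["lt", "le", "gt", "ge"]);

// Validate `field <operator> literal`; throws when the types do not fit.
function checkComparison(
  config: FieldsConfig,
  operator: string,
  fieldName: string,
  literalType: FieldType
): void {
  const fieldType: FieldType | undefined = config.fields[fieldName];
  if (fieldType === undefined) {
    throw new Error(`Unknown field '${fieldName}'.`);
  }
  if (ORDERING_OPERATORS.has(operator) &&
      (fieldType !== "number" || literalType !== "number")) {
    throw new Error(`Operator '${operator}' requires numeric operands.`);
  }
}

// Usage: file.size and file.name are example field names, not part of GEL itself.
const exampleFields: FieldsConfig = {
  fields: { "file.size": "number", "file.name": "string" },
};
checkComparison(exampleFields, "lt", "file.size", "number"); // passes silently
```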


5. Token Types

The lexer outputs tokens defined by TokenType. Below is a summary table:

TokenType                Description / Example
lparen, rparen           Left ( or right ) parenthesis
lbrace, rbrace           Left { or right } brace
lbracket, rbracket       Left [ or right ] bracket
comma                    Comma (,) separator
dot                      Period (.) used for dotted field paths or part of dotdot
dotdot                   Double-dot (..) used for numeric ranges in set literals
star                     Asterisk (*) used for subscript expansions ([*])
identifier               Keywords (and, or, not) or user-defined fields (process) or operator tokens (eq, lt, etc.)
string                   A quoted string ("foo" or 'bar') or raw string (r#"something"#)
bytes                    A bytes literal (b"\\x41\\x42")
number                   Numeric literal (123, 3.14)
less_than, greater_than  < or > (not generally used in the parser, but recognized by the lexer)
eof                      End of input marker
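
Because keywords, operators, and field names all arrive as identifier tokens, a consumer has to classify them by their text. The sketch below shows one way to do that. The IdentifierRole names and the classifyIdentifier helper are hypothetical, and treating false as a boolean literal alongside true is an assumption.

```typescript
// Hypothetical roles an identifier token can play once its text is inspected.
type IdentifierRole = "logical" | "comparison" | "boolean" | "name";

const LOGICAL_KEYWORDS = new Set(["and", "or", "not"]);
const COMPARISON_KEYWORDS = new Set(["eq", "ne", "lt", "le", "gt", "ge", "in", "has"]);
const BOOLEAN_KEYWORDS = new Set(["true", "false"]);

// Classify the text of an identifier token; anything that is not a reserved
// word is treated as a field or function name.
function classifyIdentifier(text: string): IdentifierRole {
  if (LOGICAL_KEYWORDS.has(text)) return "logical";
  if (COMPARISON_KEYWORDS.has(text)) return "comparison";
  if (BOOLEAN_KEYWORDS.has(text)) return "boolean";
  return "name";
}
```

Keeping the lexer unaware of keywords keeps the token set small; the cost is that the parser must perform this lookup whenever it sees an identifier.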

6. Grammar and AST

The overall grammar hierarchy is as follows (simplified BNF notation):


Expression           := OrExpression

OrExpression         := AndExpression ("or" AndExpression)*
AndExpression        := UnaryExpression ("and" UnaryExpression)*
UnaryExpression      := ("not")? PrimaryExpression

PrimaryExpression    := "(" Expression ")"
                      | OperandOrFunctionWithOperator

OperandOrFunctionWithOperator
                     := OperandOrFunction (ComparisonOperator OperandOrFunction)?

OperandOrFunction    := Operand ("(" ArgList? ")" )?
                        ("[" SubscriptIndex "]")*

Operand              := NumberLiteral
                      | StringLiteral
                      | BooleanLiteral
                      | BytesLiteral
                      | InSet
                      | FieldReference

InSet                := "{" (InSetElement (InSetElement)*)? "}"

InSetElement         := NumberLiteral (".." NumberLiteral)?
                      | StringLiteral
                      | ...

ComparisonOperator   := "eq" | "ne" | "lt" | "le" | "gt" | "ge" | "in" | "has"

ArgList              := Expression ("," Expression)*

SubscriptIndex       := "*" | Expression
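
The top of this grammar maps naturally onto a recursive-descent parser with one function per rule. The following is a reduced sketch, not the GelParser implementation: it works on a pre-split array of token strings, covers only the Or/And/Unary/Primary rules with single-token operands, and uses a hypothetical Tree type with an explicit "not" wrapper instead of the negated flag described later in this section.

```typescript
// Hypothetical tree shape for this sketch only.
type Tree =
  | { kind: "logical"; operator: "and" | "or"; left: Tree; right: Tree }
  | { kind: "not"; operand: Tree }
  | { kind: "comparison"; operator: string; left: string; right: string }
  | { kind: "atom"; value: string };

const COMPARISON_OPERATORS = new Set(["eq", "ne", "lt", "le", "gt", "ge", "in", "has"]);

function parseExpression(tokens: string[]): Tree {
  let pos = 0;
  const peek = (): string | undefined => tokens[pos];
  const next = (): string => {
    const token = tokens[pos];
    if (token === undefined) throw new Error("Unexpected end of input");
    pos++;
    return token;
  };

  // OrExpression := AndExpression ("or" AndExpression)*
  function parseOr(): Tree {
    let left = parseAnd();
    while (peek() === "or") {
      next();
      left = { kind: "logical", operator: "or", left, right: parseAnd() };
    }
    return left;
  }

  // AndExpression := UnaryExpression ("and" UnaryExpression)*
  function parseAnd(): Tree {
    let left = parseUnary();
    while (peek() === "and") {
      next();
      left = { kind: "logical", operator: "and", left, right: parseUnary() };
    }
    return left;
  }

  // UnaryExpression := ("not")? PrimaryExpression
  function parseUnary(): Tree {
    if (peek() === "not") {
      next();
      return { kind: "not", operand: parsePrimary() };
    }
    return parsePrimary();
  }

  // PrimaryExpression := "(" Expression ")" | operand (ComparisonOperator operand)?
  function parsePrimary(): Tree {
    if (peek() === "(") {
      next();
      const inner = parseOr();
      if (next() !== ")") throw new Error("Expected ')'");
      return inner;
    }
    const left = next();
    const operator = peek();
    if (operator !== undefined && COMPARISON_OPERATORS.has(operator)) {
      next();
      return { kind: "comparison", operator, left, right: next() };
    }
    return { kind: "atom", value: left };
  }

  const tree = parseOr();
  if (pos !== tokens.length) throw new Error(`Unexpected token '${tokens[pos]}'`);
  return tree;
}
```

Because each rule calls the next tighter-binding rule, precedence falls out of the call structure: not binds tighter than and, which binds tighter than or.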

The parser transforms the tokens into a strongly typed AST. The core node types are:

Node Type           Role
LogicalNode         Represents expr1 AND expr2 or expr1 OR expr2.
ComparisonNode      Binary operator (eq, lt, etc.) with left and right sub-expressions.
FunctionCallNode    Function invocation (starts_with(field, "xyz")). Holds argument list and return type.
FieldReferenceNode  Refers to a field name, possibly dotted (file.path). Has an inferred fieldType.
LiteralNode         Represents a literal (number, string, boolean, or bytes).
InSetNode           Used for the in operator to hold multiple possible values or expansions from numeric ranges.
SubscriptNode       Array or map indexing (arr[idx], obj[key], or arr[*] for expansion).

Note: Most node interfaces share a negated field. This indicates that the node is prefixed with a not (if applicable). In practice, the parser often sets or toggles this during unary expression parsing.
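
A shared negated field can be modeled with a common base interface. The sketch below covers a subset of the node types. The property names other than negated, operator, left, right, and fieldType, as well as the discriminant field type and the negate helper, are assumptions for illustration and may not match the real interfaces.

```typescript
// Hypothetical base shared by the node interfaces.
interface BaseNode {
  negated: boolean; // true when the node is prefixed with "not"
}

interface LiteralNode extends BaseNode {
  type: "literal";
  value: number | string | boolean | Uint8Array;
}

interface FieldReferenceNode extends BaseNode {
  type: "field";
  name: string;      // possibly dotted, e.g. "file.path"
  fieldType: string; // inferred from the configuration
}

interface ComparisonNode extends BaseNode {
  type: "comparison";
  operator: string;
  left: Expression;
  right: Expression;
}

interface LogicalNode extends BaseNode {
  type: "logical";
  operator: "and" | "or";
  left: Expression;
  right: Expression;
}

type Expression = LiteralNode | FieldReferenceNode | ComparisonNode | LogicalNode;

// Toggle negation without mutating the input, as a unary "not" would.
function negate<T extends BaseNode>(node: T): T {
  return { ...node, negated: !node.negated };
}

// Usage: the AST fragment for `not user.is_admin eq true`.
const isAdmin: ComparisonNode = {
  type: "comparison",
  operator: "eq",
  left: { type: "field", name: "user.is_admin", fieldType: "boolean", negated: false },
  right: { type: "literal", value: true, negated: false },
  negated: false,
};
const notAdmin = negate(isAdmin);
```

Storing negation as a flag rather than a wrapper node keeps the tree shallow, and applying not twice simply toggles the flag back.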

7. Configuration and Known Fields

The parser references a GelConfig object containing fields (the known field types) and signatures (the known function signatures).
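
A configuration of this kind might look like the sketch below, together with a check of a function call against its signature. The exact shapes are assumptions: FieldType, FunctionSignature, GelConfigSketch, and resolveCall are hypothetical names, and the field names and the signatures given for starts_with and ends_with are illustrative only.

```typescript
type FieldType = "number" | "string" | "boolean" | "bytes";

// Assumed shape of a function signature: argument types and a return type.
interface FunctionSignature {
  args: FieldType[];
  returns: FieldType;
}

// Assumed shape of the configuration; the real GelConfig may differ.
interface GelConfigSketch {
  fields: Record<string, FieldType>;
  signatures: Record<string, FunctionSignature>;
}

const exampleConfig: GelConfigSketch = {
  fields: {
    "file.name": "string",
    "file.extension": "string",
    "user.is_admin": "boolean",
  },
  signatures: {
    starts_with: { args: ["string", "string"], returns: "boolean" },
    ends_with: { args: ["string", "string"], returns: "boolean" },
  },
};

// Check a call against its signature and return the call's result type.
function resolveCall(config: GelConfigSketch, name: string, argTypes: FieldType[]): FieldType {
  const signature: FunctionSignature | undefined = config.signatures[name];
  if (signature === undefined) {
    throw new Error(`Unknown function '${name}'.`);
  }
  if (signature.args.length !== argTypes.length) {
    throw new Error(`Function '${name}' expects ${signature.args.length} argument(s).`);
  }
  signature.args.forEach((expected, index) => {
    if (argTypes[index] !== expected) {
      throw new Error(`Argument ${index + 1} of '${name}' must be of type ${expected}.`);
    }
  });
  return signature.returns;
}
```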


8. Usage Examples

8.1 Simple Expression

not user.is_admin eq true and file.extension in {"exe" "dll"}

After tokenization, a set of Token objects is produced, e.g. (identifier=not), (identifier=user), (dot=.), (identifier=is_admin), (identifier=eq), (identifier=true), (identifier=and), .... The parser then builds an AST:

LogicalNode {
  operator: "and",
  left: ComparisonNode {
    operator: "eq",
    left: FieldReferenceNode("user.is_admin"),
    right: LiteralNode(boolean=true),
    negated: true
  },
  right: ComparisonNode {
    operator: "in",
    left: FieldReferenceNode("file.extension"),
    right: InSetNode { values: ["exe", "dll"] },
    negated: false
  },
  negated: false
}

8.2 Function Call

starts_with(file.name, "test") or ends_with(file.name, ".txt")

If starts_with and ends_with are known functions returning booleans, the parser will produce a FunctionCallNode for each call and wrap them in a LogicalNode with operator="or".


9. Error Handling

ParseError is thrown for invalid syntax, type mismatches, or unknown functions. It captures line and column for user-friendly error messages:

throw new ParseError(
  "Unknown function 'some_bad_function'.",
  token.line,
  token.column
);

When evaluating the AST, an UndefinedFieldError may be thrown if a required field is missing from the data at runtime.
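
The two error types could be declared along the following lines. This is a sketch under assumptions: the constructor of ParseError matches the call shown above, but the message format, the UndefinedFieldError constructor, the lookupField helper, and the flat record keyed by dotted field path are hypothetical.

```typescript
// Sketch of a ParseError carrying the source position of the offending token.
class ParseError extends Error {
  readonly line: number;
  readonly column: number;

  constructor(message: string, line: number, column: number) {
    super(`${message} (line ${line}, column ${column})`);
    // Keep instanceof working even when compiling to older targets.
    Object.setPrototypeOf(this, new.target.prototype);
    this.name = "ParseError";
    this.line = line;
    this.column = column;
  }
}

// Sketch of the runtime error raised when the data lacks a referenced field.
class UndefinedFieldError extends Error {
  readonly field: string;

  constructor(field: string) {
    super(`Field '${field}' is not defined.`);
    Object.setPrototypeOf(this, new.target.prototype);
    this.name = "UndefinedFieldError";
    this.field = field;
  }
}

// Assumption: the record is a flat object keyed by dotted field path.
function lookupField(record: Record<string, unknown>, path: string): unknown {
  if (!Object.prototype.hasOwnProperty.call(record, path)) {
    throw new UndefinedFieldError(path);
  }
  return record[path];
}
```

Separate error classes let callers tell a malformed expression, which is a parse-time problem, apart from incomplete data, which is an evaluation-time problem, using instanceof.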


10. References

Reference     Description
evaluator.ts  Implementation of GelEvaluator.
engine.ts     Higher-level GelEngine showcasing parsing + evaluation.

End of Technical Manual
For more examples, see unit tests or integration tests in the main repository.