From 6f7578182dc657eed499cac0c69b1c5ce1d90204 Mon Sep 17 00:00:00 2001
From: Yura Dupyn <2153100+omedusyo@users.noreply.github.com>
Date: Mon, 6 Apr 2026 21:57:17 +0200
Subject: [PATCH] Update README

---
 README.md | 37 ++++++++++++++++++++++++++++++-------
 1 file changed, 30 insertions(+), 7 deletions(-)

diff --git a/README.md b/README.md
index 9ad2d10..7c9a8fd 100644
--- a/README.md
+++ b/README.md
@@ -1,12 +1,35 @@
-TypeScript library for handling source code strings.
+TypeScript library for handling source code strings without having to deal with the intricacies of JS's UTF-16 encoding.
 
-It has its own String type that deals with unicode in a saner than JS's UTF16 strings.
-
-This take in JS string, and makes it into a fat string `SourceText` that handles all the insanity of UTF16 in JS (like JS leaking UTF16 internals so that there are code-points spanning multiple indices in the string array).
-- it handles NFC Normalization
+# SourceText
+A sane, UTF-16-safe string wrapper specifically designed for parsing source code, tracking line numbers, and generating CLI error messages.
+Think of it as a fat wrapper for a string that understands more about the string, such as its line structure.
 - makes the original string easy to traverse in error-free way by introducing a character abstraction
 - type `CodePoint` and its position within the SourceText called `CodePointIndex`
-- It also tracks where line start (handling various platform specific weirdness like `\r\n`)
+- tracks where lines start (handling various platform-specific weirdness like `\r\n`)
-It also allows for Spatial Tracking or verious sub-regions within the source. It introduces
+# Core: SourceText vs SourceRegion
+The most important thing to remember is the difference between `SourceText` and `SourceRegion`.
+- `SourceText`: The heavy, immutable root object. Basically a fat wrapper for a JS string. It ingests the raw string, normalizes JS's weird UTF-16 surrogate pairs into actual code points, and indexes all the line breaks. You only create this once per file.
+- `SourceRegion`: A region of source-code (think of it as a string slice covering a large part of the original source-code). This is what parsers/lexers work with. Most of the time you'll have exactly one `SourceRegion` spanning the whole source-code, but for certain languages it is advantageous to partition the code into multiple such large regions.
+
+It also allows for Spatial Tracking of various sub-regions within the source. It introduces
 - point-like `SourceLocation` abstraction (basically where a cursor could be)
 - and interval-like `Span` abstraction (basically what a mouse selection could span)
+
+# Locations and Spans
+- `SourceLocation` is basically a smart 2D coordinate equivalent to `(line, col)` (but also tracks `CodePointIndex`)
+- `Span` is an interval determined by `start` and `end` SourceLocations
+
+
+# Rendering CLI Errors
+Secondary functionality is `function renderSpan(region: SourceRegion, span: Span, contextLines = 1): LineView[]`, which renders spans of source-code as follows:
+```
+ 7 | ◊foo
+ 8 |   item1
+ 9 |     item2 (bad indent - nested text without constructor)
+     ^^^^
+10 |
+```
+
+# Warning
+Performance is currently not prioritized. But the library is written in such a way that the internal representation can be swapped out without affecting the clients.
+