Update README

Yura Dupyn 2026-04-06 21:57:17 +02:00
parent 5df8b05274
commit 6f7578182d


@@ -1,12 +1,35 @@
TypeScript library for handling source code strings without having to deal with the intricacies of JS's UTF-16 encoding.
It has its own String type that deals with Unicode in a saner way than JS's UTF-16 strings.
# SourceText
A sane, UTF-16-safe string wrapper specifically designed for parsing source code, tracking line numbers, and generating CLI error messages.
This takes in a JS string and turns it into a fat string, `SourceText`, that handles all the insanity of UTF-16 in JS (such as JS leaking UTF-16 internals, so that a single code point can span multiple indices in the string). Think of it as a fat wrapper for a string that understands more about the string, such as its line structure.
- handles NFC normalization
- makes the original string easy to traverse in an error-free way by introducing a character abstraction: the `CodePoint` type, and `CodePointIndex` for its position within the `SourceText`
- tracks where lines start (handling various platform-specific weirdness like `\r\n`)
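
To make the UTF-16 problem above concrete, here is a minimal sketch that uses only built-in JS/TS string methods. It does not use the library's API; `toCodePoints` is a hypothetical helper introduced purely for illustration.

```typescript
// Plain JS/TS built-ins only; nothing here is the library's API.
const face = "😀"; // U+1F600, outside the Basic Multilingual Plane

// UTF-16 leaks through: one code point occupies two string indices.
const utf16Length = face.length;                // 2
const codePointCount = Array.from(face).length; // 1

// NFC normalization folds "e" + combining acute (U+0301) into a single "é" (U+00E9).
const decomposed = "e\u0301";
const composed = decomposed.normalize("NFC");

// Hypothetical helper: expand a string into an array of numeric code points,
// so that index i always refers to the i-th code point of the normalized text.
function toCodePoints(text: string): number[] {
  return Array.from(text.normalize("NFC"), (ch) => ch.codePointAt(0) as number);
}
```

Indexing into such an array is what makes a code-point position unambiguous, which is presumably the role `CodePointIndex` plays.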
# Core: SourceText vs SourceRegion
The most important thing to remember is the difference between `SourceText` and `SourceRegion`.
- `SourceText`: The heavy, immutable root object. Basically a fat wrapper for a JS string. It ingests the raw string, normalizes JS's weird UTF-16 surrogate pairs into actual code points, and indexes all the line breaks. You only create this once per file.
- `SourceRegion`: A region of source code (think of it as a string slice covering a large part of the original source code). This is what parsers/lexers work with. Most of the time you'll have exactly one `SourceRegion` spanning the whole source code, but for certain languages it is advantageous to partition the code into multiple such large regions.
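
The split described above can be sketched roughly as follows. All names here (`SourceTextSketch`, `SourceRegionSketch`, `indexLineStarts`, `makeSourceText`, `wholeRegion`) are hypothetical and only illustrate the idea; the library's real types and constructors may look different. Treating a lone `\r` as a line break is also an assumption.

```typescript
// Hypothetical shapes for illustration; the library's real types may differ.
interface SourceTextSketch {
  readonly codePoints: readonly number[]; // NFC-normalized code points
  readonly lineStarts: readonly number[]; // code point index where each line begins
}

interface SourceRegionSketch {
  readonly source: SourceTextSketch;
  readonly start: number; // inclusive code point index
  readonly end: number;   // exclusive code point index
}

const LF = 0x0a;
const CR = 0x0d;

// Record the index at which every line begins.
// Assumption: "\n", "\r\n", and a lone "\r" each count as one line break.
function indexLineStarts(codePoints: readonly number[]): number[] {
  const lineStarts = [0];
  for (let i = 0; i < codePoints.length; i++) {
    const cp = codePoints[i];
    if (cp === CR) {
      // Treat "\r\n" as a single line break by skipping the "\n".
      if (codePoints[i + 1] === LF) i++;
      lineStarts.push(i + 1);
    } else if (cp === LF) {
      lineStarts.push(i + 1);
    }
  }
  return lineStarts;
}

// Built once per file: normalize, split into code points, index the lines.
function makeSourceText(raw: string): SourceTextSketch {
  const codePoints = Array.from(raw.normalize("NFC"), (ch) => ch.codePointAt(0) as number);
  return { codePoints, lineStarts: indexLineStarts(codePoints) };
}

// The common case: a single region spanning the whole source.
function wholeRegion(source: SourceTextSketch): SourceRegionSketch {
  return { source, start: 0, end: source.codePoints.length };
}
```

A region in this sketch is just a pair of indices into the shared root object, so creating several regions does not copy the text.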
It also allows for spatial tracking of various sub-regions within the source. It introduces
- point-like `SourceLocation` abstraction (basically where a cursor could be)
- and interval-like `Span` abstraction (basically what a mouse selection could span)
# Locations and Spans
- `SourceLocation` is basically a smart 2D coordinate equivalent to `(line, col)` (but also tracks `CodePointIndex`)
- `Span` is an interval determined by `start` and `end` `SourceLocation`s
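
One way to relate a code-point index to a `(line, col)` pair is a binary search over the list of line starts. The sketch below assumes 0-based lines and columns and uses hypothetical names (`LocationSketch`, `SpanSketch`, `locate`); it is not the library's actual implementation.

```typescript
// Hypothetical shapes; 0-based line and column are an assumption of this sketch.
interface LocationSketch {
  index: number;  // position counted in code points
  line: number;   // 0-based line number
  column: number; // 0-based column, counted in code points
}

interface SpanSketch {
  start: LocationSketch;
  end: LocationSketch;
}

// lineStarts must be sorted ascending and begin with 0.
function locate(lineStarts: readonly number[], index: number): LocationSketch {
  // Binary search for the last line whose start is <= index.
  let lo = 0;
  let hi = lineStarts.length - 1;
  while (lo < hi) {
    const mid = (lo + hi + 1) >> 1;
    if (lineStarts[mid] <= index) {
      lo = mid;
    } else {
      hi = mid - 1;
    }
  }
  return { index, line: lo, column: index - lineStarts[lo] };
}

// Usage: for the text "ab\r\ncd\ne", lines start at code point indices 0, 4, and 7.
const exampleLineStarts = [0, 4, 7];
const exampleSpan: SpanSketch = {
  start: locate(exampleLineStarts, 4), // the "c"
  end: locate(exampleLineStarts, 6),   // just past the "d"
};
```

Keeping the index alongside the line and column means a location can be compared or sliced cheaply while still being printable for humans.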
# Rendering CLI Errors
A secondary feature is `function renderSpan(region: SourceRegion, span: Span, contextLines = 1): LineView[]`, which renders spans of source code as follows:
```
7 | ◊foo
8 | item1
9 | item2 (bad indent - nested text without constructor)
^^^^
10 |
```
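
The gutter-and-caret layout shown above can be approximated with a few string operations. This is only a sketch of the general technique, not how `renderSpan` works internally: `renderUnderlinedLine` is a hypothetical helper, and it returns plain strings rather than the library's `LineView` values, whose shape is not described here.

```typescript
// Hypothetical helper: format one source line with a line-number gutter,
// followed by a second line of carets under columns [startColumn, endColumn).
function renderUnderlinedLine(
  lineNumber: number,
  text: string,
  startColumn: number,
  endColumn: number,
  gutterWidth: number,
): string[] {
  // Right-align the line number so gutters of different widths line up.
  const gutter = String(lineNumber).padStart(gutterWidth) + " | ";
  // Always draw at least one caret, even for an empty span.
  const caretCount = Math.max(1, endColumn - startColumn);
  const underline = " ".repeat(gutter.length + startColumn) + "^".repeat(caretCount);
  return [gutter + text, underline];
}

// Usage: underline four characters starting at column 4 of line 9.
const rendered = renderUnderlinedLine(9, "    item2", 4, 8, 2);
console.log(rendered.join("\n"));
```

The sketch assumes every column occupies exactly one terminal cell; wide or combining characters would need extra width handling.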
# Warning
Performance is currently not a priority, but the library is written so that the internal representation can be swapped out without affecting clients.