Update README

2026-04-06 21:57:17 +02:00 · 2026-04-06 21:57:17 +02:00 · 6f7578182d
commit 6f7578182d
parent 5df8b05274
1 changed file with 30 additions and 7 deletions
--- a/README.md
+++ b/README.md
@@ -1,12 +1,35 @@
-TypeScript library for handling source code strings.
+TypeScript library for handling source code strings without having to deal with the intricacies of JS's UTF-16 encoding.
-It has its own String type that deals with unicode in a saner than JS's UTF16 strings.
+# SourceText
-
+A sane, UTF-16-safe string wrapper specifically designed for parsing source code, tracking line numbers, and generating CLI error messages.
-This take in JS string, and makes it into a fat string `SourceText` that handles all the insanity of UTF16 in JS (like JS leaking UTF16 internals so that there are code-points spanning multiple indices in the string array).
+Think of it as a fat wrapper around a string that understands more about it, such as its line structure.
 - handles NFC normalization
 - makes the original string easy to traverse without errors by introducing a character abstraction: the `CodePoint` type, plus its position within the `SourceText`, called `CodePointIndex`
- It also tracks where line start (handling various platform specific weirdness like `\r\n`)
+- tracks where lines start (handling platform-specific quirks like `\r\n`)
-It also allows for Spatial Tracking or verious sub-regions within the source. It introduces
+# Core: SourceText vs SourceRegion
 The most important thing to remember is the difference between `SourceText` and `SourceRegion`.
 - `SourceText`: The heavy, immutable root object; basically a fat wrapper for a JS string. It ingests the raw string, turns JS's UTF-16 surrogate pairs into actual code points, and indexes all the line breaks. You only create this once per file.
 - `SourceRegion`: A region of source code (think of it as a string slice over a large part of the original source). This is what parsers/lexers work with. Most of the time you'll have exactly one `SourceRegion` spanning the whole source, but for certain languages it is advantageous to partition the code into multiple such large regions.
 It also supports spatial tracking of various sub-regions within the source. It introduces:
 - a point-like `SourceLocation` abstraction (basically where a cursor could be)
 - an interval-like `Span` abstraction (basically what a mouse selection could cover)
 # Locations and Spans
 - `SourceLocation` is basically a smart 2D coordinate equivalent to `(line, col)` (but also tracks `CodePointIndex`)
 - `Span` is an interval determined by its `start` and `end` `SourceLocation`s
 # Rendering CLI Errors
 The secondary feature is `function renderSpan(region: SourceRegion, span: Span, contextLines = 1): LineView[]`, which renders spans of source code like this:
 ```
 7 | ◊foo
 8 |   item1
 9 |     item2 (bad indent - nested text without constructor)
     ^^^^
 10 |
 ```
 # Warning
 Performance is not currently a priority, but the library is written so that the internal representation can be swapped out without affecting clients.
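The README's point about JS leaking UTF-16 internals can be seen with plain TypeScript and no library at all. `.length` and indexing work on UTF-16 code units, so one emoji occupies two indices. Also, two visually identical strings compare unequal until they are NFC-normalized. This sketch only uses standard `String` APIs and does not touch `SourceText` itself.

```typescript
// Plain TypeScript/JavaScript string behaviour; no SourceText involved.
const emoji = "😀"; // U+1F600, stored as a UTF-16 surrogate pair

const codeUnits = emoji.length;             // .length counts UTF-16 code units
const codePoints = Array.from(emoji).length; // Array.from iterates by code point

// Indexing by code unit can land on half of a surrogate pair.
const firstUnit = emoji.charCodeAt(0).toString(16);    // high surrogate only
const firstPoint = emoji.codePointAt(0)?.toString(16); // the full code point

// Visually identical text can differ at the code-point level.
const composed: string = "\u00e9";    // "é" as a single code point
const decomposed: string = "e\u0301"; // "e" followed by U+0301 COMBINING ACUTE ACCENT
const sameRaw = composed === decomposed;
const sameNfc = composed === decomposed.normalize("NFC");

console.log({ codeUnits, codePoints, firstUnit, firstPoint, sameRaw, sameNfc });
```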
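Here is a minimal sketch of the line-start indexing and `(line, col)` lookup the README describes. It assumes indices are code-point indices, and it treats `\r\n`, `\n`, and a lone `\r` as one line break each. The names `toCodePoints`, `computeLineStarts`, `locate`, and `LineCol` are hypothetical illustrations, not the library's API. In this picture a `Span` would simply be a pair of such locations.

```typescript
// Hypothetical sketch, not the SourceText API: line starts over code points.

/** NFC-normalize, then split into code points (one array slot per code point). */
function toCodePoints(text: string): string[] {
  return Array.from(text.normalize("NFC"));
}

/**
 * Code-point index at which each line begins.
 * Assumption: "\r\n", "\n" and a lone "\r" each count as a single line break.
 */
function computeLineStarts(cps: readonly string[]): number[] {
  const starts: number[] = [0];
  for (let i = 0; i < cps.length; i++) {
    if (cps[i] === "\r" && cps[i + 1] === "\n") {
      i++; // skip the "\n" so a "\r\n" pair yields one break, not two
      starts.push(i + 1);
    } else if (cps[i] === "\n" || cps[i] === "\r") {
      starts.push(i + 1);
    }
  }
  return starts;
}

/** 0-based (line, col) for a code-point index; roughly what a SourceLocation carries. */
interface LineCol {
  index: number;
  line: number;
  col: number;
}

/** Binary search for the last line start that is <= index. */
function locate(lineStarts: readonly number[], index: number): LineCol {
  let lo = 0;
  let hi = lineStarts.length - 1;
  while (lo < hi) {
    const mid = (lo + hi + 1) >> 1; // round up so `lo = mid` always makes progress
    if (lineStarts[mid] <= index) lo = mid;
    else hi = mid - 1;
  }
  return { index, line: lo, col: index - lineStarts[lo] };
}

// Usage
const cps = toCodePoints("a😀\r\nb\rc\nd");
const starts = computeLineStarts(cps); // [0, 4, 6, 8]
console.log(locate(starts, 1)); // { index: 1, line: 0, col: 1 }: the emoji is one column
```

Indexing the line starts once makes each location lookup a binary search over lines instead of a rescan of the text.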
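This is a rough sketch of caret-style rendering in the spirit of the README's `renderSpan` example. It assumes a single-line span given as 0-based code-point columns `[startCol, endCol)`, and it assumes one terminal column per code point (tabs and wide characters are ignored). `renderSpanSketch` is a hypothetical helper, not the library's `renderSpan`, and it returns plain strings rather than `LineView[]`.

```typescript
// Hypothetical sketch of caret-style span rendering; not the library's renderSpan.
// Assumes a single-line span with 0-based code-point columns [startCol, endCol).
function renderSpanSketch(
  source: string,
  line: number, // 0-based line containing the span
  startCol: number,
  endCol: number,
  contextLines = 1,
): string[] {
  const lines = source.split(/\r\n|\r|\n/);
  const first = Math.max(0, line - contextLines);
  const last = Math.min(lines.length - 1, line + contextLines);
  // Pad the gutter to the widest 1-based line number so the bars line up.
  const width = String(last + 1).length;
  const out: string[] = [];
  for (let i = first; i <= last; i++) {
    out.push(`${String(i + 1).padStart(width)} | ${lines[i]}`);
    if (i === line) {
      // Assumes each code point occupies exactly one terminal column.
      const lead = " ".repeat(startCol);
      const carets = "^".repeat(Math.max(1, endCol - startCol));
      out.push(`${" ".repeat(width)} | ${lead}${carets}`);
    }
  }
  return out;
}

// Usage: underline the four spaces of bad indentation before "item2".
const rendered = renderSpanSketch("foo\n  item1\n    item2\n", 2, 0, 4);
console.log(rendered.join("\n"));
```

Padding the gutter to the widest line number keeps the `|` column aligned when line numbers cross a digit boundary, for example from 9 to 10.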