Abstractions for working with raw source code of typical programming languages in TypeScript


Yura Dupyn 000949f3c3 Region.stringOf		2026-04-22 18:53:28 +02:00


A TypeScript library for handling source-code strings without having to deal with the intricacies of JavaScript's UTF-16 encoding.

SourceText

A sane, UTF-16-safe string wrapper designed specifically for parsing source code, tracking line numbers, and generating CLI error messages. Think of it as a fat wrapper around a string that knows more about the string, such as its line structure:

It makes the original string easy to traverse in an error-free way by introducing a character abstraction: the type CodePoint, along with its position within the SourceText, called CodePointIndex.
It tracks where lines start (handling various platform-specific quirks such as \r\n).
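
To make the two points above concrete, here is a minimal sketch in plain TypeScript of what decoding a string into code points and indexing line starts involves. The helpers toCodePoints and lineStarts are hypothetical and not part of this library's API; treating a lone \r as a line break is likewise an assumption of the sketch.

```typescript
// Hypothetical helpers, not this library's API.

// Decode a JS (UTF-16) string into Unicode code points.
// `for...of` iterates a string by code point, so a surrogate pair such as
// the one encoding "😀" yields one element instead of two.
function toCodePoints(s: string): number[] {
  const result: number[] = [];
  for (const ch of s) {
    result.push(ch.codePointAt(0)!);
  }
  return result;
}

// Return the code-point index at which each line starts.
// "\r\n" counts as a single break; a lone "\n" or (by assumption)
// a lone "\r" also ends a line.
function lineStarts(codePoints: readonly number[]): number[] {
  const CR = 0x0d;
  const LF = 0x0a;
  const starts = [0];
  for (let i = 0; i < codePoints.length; i++) {
    const cp = codePoints[i];
    if (cp === CR && codePoints[i + 1] === LF) {
      i++; // consume the LF of the CRLF pair
      starts.push(i + 1);
    } else if (cp === CR || cp === LF) {
      starts.push(i + 1);
    }
  }
  return starts;
}

const sample = "a😀\r\nb\nc";
console.log(sample.length); // 8 UTF-16 code units
console.log(toCodePoints(sample).length); // 7 code points
console.log(lineStarts(toCodePoints(sample))); // lines start at 0, 4 and 6
```

Note that sample.length counts UTF-16 code units, so any index derived from it drifts by one after the emoji; working in code points throughout avoids that class of bug.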

Core: SourceText vs SourceRegion

The most important thing to remember is the difference between SourceText and SourceRegion.

SourceText: the heavy, immutable root object wrapping a JS string. It ingests the raw string, decodes UTF-16 surrogate pairs into actual code points, and indexes all line breaks. You create it only once per file.
SourceRegion: a region of the source code (think of it as a slice covering a large part of the original source). This is what parsers and lexers work with. Most of the time you'll have exactly one SourceRegion spanning the whole source, but for certain languages it is advantageous to partition the code into several such large regions.
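
A rough sketch of how the two objects could relate, assuming a code-point array and a line-start table as the internal representation. MiniSourceText and MiniRegion, and their methods fullRegion, subRegion and text, are hypothetical names for illustration only; the real SourceText and SourceRegion may be structured quite differently.

```typescript
// Hypothetical stand-ins for SourceText and SourceRegion; names and
// methods are illustrative assumptions, not this library's API.

class MiniSourceText {
  readonly codePoints: readonly number[];
  readonly lineStarts: readonly number[];

  constructor(raw: string) {
    // Decode UTF-16 surrogate pairs once, up front.
    this.codePoints = Array.from(raw, (ch) => ch.codePointAt(0)!);
    // Index line starts; "\r\n" is treated as a single break.
    const cps = this.codePoints;
    const starts = [0];
    for (let i = 0; i < cps.length; i++) {
      if (cps[i] === 0x0d && cps[i + 1] === 0x0a) i++;
      if (cps[i] === 0x0d || cps[i] === 0x0a) starts.push(i + 1);
    }
    this.lineStarts = starts;
  }

  // The common case: one region spanning the whole text.
  fullRegion(): MiniRegion {
    return new MiniRegion(this, 0, this.codePoints.length);
  }
}

// A half-open slice [start, end) of a MiniSourceText, in code-point indices.
// It stores offsets only; the decoded text is shared, never copied.
class MiniRegion {
  readonly source: MiniSourceText;
  readonly start: number;
  readonly end: number;

  constructor(source: MiniSourceText, start: number, end: number) {
    this.source = source;
    this.start = start;
    this.end = end;
  }

  // Offsets are relative to this region's start.
  subRegion(from: number, to: number): MiniRegion {
    return new MiniRegion(this.source, this.start + from, this.start + to);
  }

  text(): string {
    return String.fromCodePoint(...this.source.codePoints.slice(this.start, this.end));
  }
}

const src = new MiniSourceText("let x = 1;\r\nlet 😀 = 2;");
const whole = src.fullRegion();
const secondLine = whole.subRegion(12, 22);
console.log(src.lineStarts); // second line starts at code point 12
console.log(secondLine.text()); // "let 😀 = 2;"
```

The point of this design is that a region carries only offsets into the shared source, so creating sub-regions is cheap and every index stays in code points. Spreading into String.fromCodePoint is fine for a sketch but can hit argument-count limits on very large regions.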

It also enables spatial tracking of various sub-regions within the source. It introduces:

a point-like SourceLocation abstraction (roughly, where a cursor could be)
an interval-like Span abstraction (roughly, what a mouse selection could cover)

Locations and Spans

SourceLocation: a smart 2D coordinate equivalent to (line, col) that also tracks its CodePointIndex.
Span: an interval determined by a start and an end SourceLocation.
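
One plausible way to turn a CodePointIndex into a (line, col) pair is a binary search over a line-start table like the one SourceText builds. The sketch assumes 0-based lines and columns and an exclusive span end; SketchLocation, SketchSpan, locate and spanOf are hypothetical names, not the library's SourceLocation and Span.

```typescript
// Hypothetical shapes, not this library's SourceLocation and Span.
interface SketchLocation {
  index: number; // code-point index into the source
  line: number; // 0-based
  col: number; // 0-based, counted in code points
}

interface SketchSpan {
  start: SketchLocation;
  end: SketchLocation; // exclusive
}

// Find the last line whose start is <= index by binary search over a
// sorted line-start table whose first entry is 0.
function locate(lineStarts: readonly number[], index: number): SketchLocation {
  let lo = 0;
  let hi = lineStarts.length - 1;
  while (lo < hi) {
    const mid = Math.ceil((lo + hi) / 2); // round up so `lo = mid` always progresses
    if (lineStarts[mid]! <= index) lo = mid;
    else hi = mid - 1;
  }
  return { index, line: lo, col: index - lineStarts[lo]! };
}

function spanOf(lineStarts: readonly number[], start: number, end: number): SketchSpan {
  return { start: locate(lineStarts, start), end: locate(lineStarts, end) };
}

// "let x = 1;\nlet y = 2;\n" has line starts [0, 11, 22].
const starts = [0, 11, 22];
const ySpan = spanOf(starts, 15, 16); // the "y" on the second line
console.log(ySpan.start.line, ySpan.start.col); // 1 4
```

Storing the index alongside (line, col) means a location can be mapped back into the source without repeating the search.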

Rendering CLI Errors

Secondary functionality is provided by the function renderSpan(region: SourceRegion, span: Span, contextLines = 1): LineView[], which renders a span of source code together with its surrounding context lines, as follows:

 7 | ◊foo
 8 |   item1
 9 |     item2 (bad indent - nested text without constructor)
     ^^^^
10 |
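
A simplified sketch of the gutter-and-caret layout above. renderSingleLineSpan is a hypothetical function: it takes plain lines instead of a SourceRegion, returns strings instead of LineView[], handles only single-line spans, and assumes one display column per code point.

```typescript
// Hypothetical, simplified renderer; not the library's renderSpan.
// Input line/column numbers are 0-based; printed line numbers are 1-based.
function renderSingleLineSpan(
  lines: readonly string[],
  line: number,
  startCol: number,
  endCol: number,
  contextLines = 1,
): string[] {
  const first = Math.max(0, line - contextLines);
  const last = Math.min(lines.length - 1, line + contextLines);
  const width = String(last + 1).length; // widest line number shown
  const out: string[] = [];
  for (let i = first; i <= last; i++) {
    const gutter = String(i + 1).padStart(width) + " |";
    const content = lines[i]!;
    out.push(content.length > 0 ? `${gutter} ${content}` : gutter);
    if (i === line) {
      // Skip "<number> | " (width + 3 chars), then startCol columns,
      // assuming one display column per code point (no tabs or wide chars).
      const caretCount = Math.max(1, endCol - startCol);
      out.push(" ".repeat(width + 3 + startCol) + "^".repeat(caretCount));
    }
  }
  return out;
}

// Only lines 7 to 10 matter here; the first six are empty filler.
const demo = ["", "", "", "", "", "", "◊foo", "  item1", "    item2", ""];
console.log(renderSingleLineSpan(demo, 8, 0, 4, 2).join("\n"));
```

Called as in the last line, this reproduces the gutter and caret layout of the example above, with the four carets under the indentation of line 9.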

Warning

Performance is currently not a priority, but the library is written so that its internal representation can be swapped out without affecting clients.