Abstractions for working over raw Source Code of typical Programming Languages in TypeScript

TypeScript library for handling source code strings without having to deal with the intricacies of JS's UTF-16 encoding.

SourceText

A sane, UTF-16-safe string wrapper specifically designed for parsing source code, tracking line numbers, and generating CLI error messages. Think of it as a fat wrapper for a string that understands more about the string, such as its line structure.

  • makes the original string easy to traverse without errors by introducing a character abstraction, the CodePoint type, and its position within the SourceText, called CodePointIndex
  • tracks where lines start (handling various platform-specific weirdness like \r\n)
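
To make the motivation concrete, here is a minimal sketch of the problem that a code point abstraction avoids. It uses only built-in string methods; the names `codePoints` and `codePointAtIndex` are hypothetical and are not taken from this library's API.

```typescript
// "a", an emoji outside the Basic Multilingual Plane (U+1F600), then "b".
const text = "a\u{1F600}b";

// .length counts UTF-16 code units, so the emoji is counted twice.
const codeUnitCount = text.length;

// Array.from walks the string by code point, so each entry is one whole character.
const codePoints: number[] = Array.from(text, (ch) => ch.codePointAt(0)!);

// Indexing by code unit can land on half of a surrogate pair.
const loneSurrogate = text.charCodeAt(1);

// Hypothetical helper: indexing the decoded array is always safe.
function codePointAtIndex(points: readonly number[], index: number): string {
  if (index < 0 || index >= points.length) {
    throw new RangeError(`index ${index} is out of bounds`);
  }
  return String.fromCodePoint(points[index]!);
}
```

Once the string is decoded like this, an index always refers to a whole character, which is presumably the role CodePointIndex plays.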

Core: SourceText vs SourceRegion

The most important thing to remember is the difference between SourceText and SourceRegion.

  • SourceText: The heavy, immutable root object. Basically a fat wrapper for a JS string. It ingests the raw string, normalizes JS's weird UTF-16 surrogate pairs into actual code points, and indexes all the line breaks. You only create this once per file.
  • SourceRegion: A region of source code (think of it as a string slice over a large part of the original source code). This is what parsers/lexers work with. Most of the time you'll have exactly one SourceRegion spanning the whole source code, but for certain languages it is advantageous to partition the code into multiple such large regions.
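
The split between the two can be sketched roughly as follows. Everything here is a toy stand-in: `ToySourceText`, `ToyRegion`, `makeSourceText`, `wholeRegion` and `regionToString` are made-up names, and treating a lone \r as a line break is an assumption. The real API and internal representation may differ.

```typescript
// Toy stand-in for the heavy root object: decoded once, never mutated.
interface ToySourceText {
  readonly codePoints: readonly number[];
  readonly lineStarts: readonly number[]; // code point index where each line begins
}

// Toy stand-in for a region: a cheap [start, end) view into the root object.
interface ToyRegion {
  readonly source: ToySourceText;
  readonly start: number; // inclusive code point index
  readonly end: number; // exclusive code point index
}

function makeSourceText(raw: string): ToySourceText {
  // Decode surrogate pairs into whole code points.
  const codePoints = Array.from(raw, (ch) => ch.codePointAt(0)!);
  const lineStarts = [0];
  for (let i = 0; i < codePoints.length; i++) {
    const cp = codePoints[i];
    if (cp === 0x0d && codePoints[i + 1] === 0x0a) i++; // \r\n counts as one break
    if (cp === 0x0d || cp === 0x0a) lineStarts.push(i + 1);
  }
  return { codePoints, lineStarts };
}

function wholeRegion(source: ToySourceText): ToyRegion {
  return { source, start: 0, end: source.codePoints.length };
}

function regionToString(region: ToyRegion): string {
  const slice = region.source.codePoints.slice(region.start, region.end);
  return String.fromCodePoint(...slice);
}
```

The point of the design is that the expensive work (decoding and line indexing) happens once, while regions stay small and cheap to create.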

It also allows for Spatial Tracking of various sub-regions within the source. It introduces

  • point-like SourceLocation abstraction (basically where a cursor could be)
  • and interval-like Span abstraction (basically what a mouse selection could span)

Locations and Spans

  • SourceLocation is basically a smart 2D coordinate equivalent to (line, col) (but also tracks CodePointIndex)
  • Span is an interval determined by start and end SourceLocations
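
As a rough illustration of how a location can be derived from a code point index, here is a sketch that binary-searches a sorted list of line starts. The `ToyLocation` and `ToySpan` shapes, the `locate` and `spanBetween` helpers, zero-based lines and columns, and the exclusive span end are all assumptions made for the example, not the library's actual definitions.

```typescript
interface ToyLocation {
  readonly index: number; // code point index into the whole text
  readonly line: number; // zero-based (assumption)
  readonly column: number; // zero-based, counted in code points (assumption)
}

interface ToySpan {
  readonly start: ToyLocation;
  readonly end: ToyLocation; // treated as exclusive here (assumption)
}

// lineStarts must be sorted ascending and begin with 0.
function locate(lineStarts: readonly number[], index: number): ToyLocation {
  let lo = 0;
  let hi = lineStarts.length - 1;
  // Find the last line whose start is <= index.
  while (lo < hi) {
    const mid = (lo + hi + 1) >> 1;
    if (lineStarts[mid]! <= index) lo = mid;
    else hi = mid - 1;
  }
  return { index, line: lo, column: index - lineStarts[lo]! };
}

function spanBetween(lineStarts: readonly number[], from: number, to: number): ToySpan {
  return { start: locate(lineStarts, from), end: locate(lineStarts, to) };
}
```

Keeping the code point index inside the location means the (line, col) pair is only a derived, human-friendly view of it.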

Rendering CLI Errors

A secondary feature is the function renderSpan(region: SourceRegion, span: Span, contextLines = 1): LineView[], which renders spans of source code as follows:

 7 | ◊foo
 8 |   item1
 9 |     item2 (bad indent - nested text without constructor)
     ^^^^
10 |
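
The general technique behind this kind of output can be sketched in a few lines. This is not the library's renderSpan: `renderToySpan` is a hypothetical, simplified function that takes plain string lines, handles only single-line spans, returns strings rather than LineView objects, and counts columns in UTF-16 code units for brevity.

```typescript
// Toy renderer: zero-based line and column inputs, colEnd is exclusive.
function renderToySpan(
  lines: readonly string[],
  line: number,
  colStart: number,
  colEnd: number,
  contextLines = 1,
): string[] {
  const first = Math.max(0, line - contextLines);
  const last = Math.min(lines.length - 1, line + contextLines);
  // Gutter must be wide enough for the largest line number shown.
  const width = String(last + 1).length;
  const out: string[] = [];
  for (let i = first; i <= last; i++) {
    out.push(`${String(i + 1).padStart(width)} | ${lines[i]}`);
    if (i === line) {
      // Skip the gutter (number plus " | ") and the columns before the span.
      const pad = " ".repeat(width + 3 + colStart);
      out.push(pad + "^".repeat(Math.max(1, colEnd - colStart)));
    }
  }
  return out;
}
```

Calling `renderToySpan(["alpha", "  beta", "gamma"], 1, 2, 6)` yields the three numbered lines with a row of four carets under `beta`.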

Warning

Performance is currently not a priority, but the library is written so that the internal representation can be swapped out without affecting clients.