Ken Johnson 1e253e8a78 Phase 5: attribute registry + arch / proposal / README updates
New docs/attributes-registry.md publishes the canonical attribute
key catalog in four tiers:

  1. Universal headers — msg.num, from, to, subject, date.*, addr.*,
     area, board, cost.  Every Fido format carries them.
  2. Canonical attribute bits — attr.private, attr.crash, etc.,
     mapped to/from the FTS-1 attribute word.
  3. FTSC kludges — msgid, replyid, pid, tid, flags, chrs, tzutc,
     seen-by, path, via.  Multi-line keys use #13 between lines.
  4. Format-specific — jam.*, squish.*, hudson.*, goldbase.*, ezy.*,
     pcb.*, wildcat.*, pkt.*, msg.*.  Each backend's namespace.

Plus a per-format support matrix showing which keys each backend
carries. Authoritative source remains each backend's
ClassSupportedAttributes -- the matrix can drift; SupportsAttribute()
is the runtime-correct query.

docs/architecture.md TUniMessage section rewritten:
- Documents the strict two-area model (Body + Attributes only).
- Body holds only the message text, never kludges or headers.
- Library never composes presentation -- consumers walk Attributes
  and assemble their own display.
- Adds the capabilities API section pointing at the registry.
- Removes the stale "kludge lines intact and CR-separated" promise
  the previous adapter implementations didn't honor.

docs/PROPOSAL.md flags the original Extras-bag section as
SUPERSEDED 2026-04-17, points to the registry + architecture docs
as the live design. Original text retained as historical context
since it captures the conversation that drove the redesign.

README.md:
- Features list now leads with the lossless two-area model and the
  capabilities API.
- Adds a Status note flagging 0.2 as a breaking change vs 0.1 with
  a one-paragraph migration sketch (msg.WhoFrom -> Attributes.Get
  ('from'), etc.).
- Documentation index links to the new registry doc.
2026-04-17 14:35:19 -07:00

# fpc-msgbase — plug-and-forget redesign proposal
**Status:** design handover, pre-implementation. Prepared 2026-04-15.

**Target audience:** the next implementation session.

**Precondition:** fpc-msgbase is at 0.1.0 (commit `b79e7fb` at the time
of writing); 9 format backends implemented from FTSC and format-author
specs, 7/7 tests green.

---
## TL;DR
fpc-msgbase today solves the *read* side cleanly and ports 9 formats
with tested fidelity. It's **not yet plug-and-forget** for embedders,
the way comet is. Four things are missing: a lossless round-trip
guarantee, an atomic outbound-packet builder (`TOutboundBatch`),
format-agnostic I/O injection, and a single-callback event model.

This proposal is **not a rewrite** — it's a six-week reshape in
place. The format backends stay. The scaffolding changes.

When done, the acceptance test is a five-line Hello World that opens
a path, reads messages, closes. No format name, no lock ceremony,
no event registry, no `.uni` sidecar unit, no init-order hazard.

---
## Context: where this proposal came from
This proposal grew out of a cross-project conversation between
NetReader (`../netreader`, the HPT-drop-in mail tosser) and
fpc-msgbase. NetReader has its own format layer
(`src/core/nr.msgbase.*`) hardened by a 39/39 live trial and a 54/54
CLI-parity test suite.

During that conversation we compared:

1. Do the two libraries agree on bytes when reading/writing the same
   base? (**Unverified. First action item below.**)
2. What would "plug it in and forget" look like for fpc-msgbase?

The conclusions that drove this proposal:

- fpc-msgbase's `TPacketBatch` handles the **inbound** concurrent
  tosser side cleanly. It **does not** handle the symmetric outbound
  side (writing per-link packets with size rotation and atomic
  finalize). Neither library does, today.
- `TUniMessage` as currently defined is **lossy**: JAM
  `MsgIdCRC`/`ReplyCRC`, `DateProcessed`, and every other
  format-specific header field has no canonical slot, so a caller
  that does `read → write` through the unified API silently loses
  bytes.
- The two-unit-per-format inclusion pattern (`ma.fmt.jam` +
  `ma.fmt.jam.uni`) asks callers to remember both. The `.uni`
  sidecar's only job is `initialization` registration; it belongs
  *inside* the format unit.
- `wc_sdk/` is ~66K lines of BTrieve-era Pascal pulled in
  unconditionally by any caller who uses Wildcat. Embeddability
  requires build-time gating.
- The error model mixes typed exceptions (`EMessageBase` from the
  factory), boolean returns (all other methods), and event
  fires (locking). Three mental models for one library is two too
  many.

The reshape below addresses each.

---
## Current state — file-by-file
| Unit | LOC | Purpose | Keep? |
|---|---|---|---|
| `src/ma.api.pas` | ~330 | `TMessageBase` abstract, factory | **Reshape** |
| `src/ma.types.pas` | ~600 | `TUniMessage`, `TFTNAddress`, bits | **Reshape** |
| `src/ma.events.pas` | ~100 | multi-subscriber event registry | **Replace** with single callback |
| `src/ma.lock.pas` | ~250 | 3-layer locking | **Keep**, make explicit |
| `src/ma.paths.pas` | ~180 | per-format path derivation | **Keep** |
| `src/ma.batch.pas` | 333 | inbound `TPacketBatch` | **Keep** + add `TOutboundBatch` twin |
| `src/formats/ma.fmt.<fmt>.pas` × 9 | 400–1300 each | native spec-driven backend | **Keep** — hardened |
| `src/formats/ma.fmt.<fmt>.uni.pas` × 9 | 100–200 each | adapter to `TUniMessage` | **Fold into** the format unit |
| `src/wc_sdk/` | ~40K | Wildcat BTrieve SDK | **Gate** behind `{$DEFINE MA_WITH_WILDCAT}` |

Tests: `tests/test_*.pas` → keep, extend with the round-trip
corpus below.

---
## The five-line acceptance test
When the reshape is done, this works unchanged across every format:

```pascal
program hello;
uses OpenMsgLib, OpenMsgLib.Auto;
var
  B: TMsgBase;
  M: TMsgRecord;
  i: integer;
begin
  B := OpenMsgBase('/path/to/base', momReadOnly);
  try
    for i := 0 to B.Count - 1 do
      if B.Read(i, M) then WriteLn(M.Subject);
  finally
    B.Free;
  end;
end.
```

No format name (autodetect). No lock call (default = no lock, caller
opts in). No event wiring (default = silent). No `.uni` unit (one
unit per format registers itself). No init-order hazard (single
registration point).

If a new session reads a test base and the above doesn't work
verbatim, the reshape isn't done.

---
## Target architecture
### Three-tier layering
```
Caller (BBS, tosser, editor, importer)
├── TMsgBase (unified) ← most callers live here
├── Direct format class ← drop-down when Extras aren't enough
└── Raw stream ← replay, test, encrypt, mock
Format backends speak to ITsmIO — never TFileStream directly
ITsmIO adapters: file | memory | encrypted | test harness
```
**Why add ITsmIO:**
- Test backends without hitting disk (in-memory fixtures)
- Wrap the lib in encryption/compression
- Run on network mounts with unusual locking semantics
- Replay captured corrupt frames for debugging
### `TMsgRecord` with lossless `Extras`
> **SUPERSEDED 2026-04-17.** The "named fields + Extras bag" hybrid
> below was rejected during implementation in favour of a stricter
> two-area model: `Body` holds only the message text; **everything
> else** (from/to/subject/dates/addresses + every kludge + every
> format-specific field) is an attribute. See
> [`docs/attributes-registry.md`](attributes-registry.md) for the
> key catalog and [`docs/architecture.md`](architecture.md) for the
> updated TUniMessage contract. The original Extras-bag design is
> retained below as historical context.
>
> The capabilities API discussed in this section landed essentially
> as proposed (`base.SupportsAttribute(K)` + class-level
> `ClassSupportedAttributes`).
```pascal
{$modeswitch advancedrecords}  { records with methods }
type
  { Declared first so TMsgRecord can embed it. }
  TMsgExtras = record
    function Get(const Key, Default: AnsiString): AnsiString;
    procedure SetValue(const Key, Value: AnsiString);
    function Has(const Key: AnsiString): boolean;
  end;

  TMsgRecord = record
    { Universal fields every format has: }
    Index: longint;
    From, To_: AnsiString;
    Subject: AnsiString;
    DateWritten: TDateTime;
    DateArrived: TDateTime;
    OrigAddr: TFtnAddr;
    DestAddr: TFtnAddr;
    Attr: cardinal;    { canonical MSG_ATTR_* bitset }
    Body: AnsiString;  { kludges + text, CR-separated }
    { Backend-specific fields preserved verbatim across round-trips.
      Every key a backend WRITES during a Read it MUST re-consume
      during a subsequent Write. Test harness enforces this. }
    Extras: TMsgExtras;
  end;
```
**Well-known keys** (published in `docs/extras-registry.md`, to be
written):
| Key | Type | Source | Notes |
|---|---|---|---|
| `jam.msgidcrc` | hex u32 | JAM `.jhr` fixed header | needed for NR linker `-j` fast path |
| `jam.replycrc` | hex u32 | JAM `.jhr` fixed header | same |
| `jam.dateprocessed` | unix int | JAM `.jhr` | tosser timestamp |
| `jam.passwordcrc` | hex u32 | JAM `.jhr` | per-msg password |
| `jam.cost` | int | JAM `.jhr` | |
| `squish.umsgid` | hex u32 | SQI frame | unique msg ID |
| `hudson.board` | int 1..200 | MSGINFO | board number |
| `hudson.refer` | int | HDR | refer-to ptr |
| `pcb.confnum` | int | PCB `.IDX` | conference |
| `pcb.refer` | int | PCB `.IDX` | |
| `ezy.msgflags` | hex byte | MH#####.BBS | |
| `goldbase.userid` | int | IDX | |
| `wildcat.confnum` | int | WC SDK | |
| `pkt.cost` | int | Type-2 header | |
| `pkt.flavour` | enum | outbound pkt only | crash/hold/direct/imm/norm |

**Round-trip invariant:**

> After `Read(i, M)`, calling `Write(M)` on the same or a different
> base of the same format produces bytes that round-trip through
> `Read` to an identical `TMsgRecord` (same Extras keys, same values).

Tests enforce this across the full corpus.
### Error model — one tree
```
EMessageBase
├─ EMessageBaseIO disk full, permission, corrupt read
├─ EMessageBaseLock timeout, contention, deadlock
├─ EMessageBaseFormat bad signature, truncated header
├─ EMessageBaseRange Index out of [0..Count)
└─ EMessageBaseClosed operation on a freed/closed base
```
Every method either **succeeds** or **raises**. Boolean returns mean
"nothing to do" (empty Pack, no new messages, no matching records) —
never failure. A single `try/except on E: EMessageBase do ...`
catches the whole tree.
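
Here is a minimal Free Pascal sketch of the tree. It assumes that every subclass derives directly from `EMessageBase` and that `EMessageBase` itself derives from `SysUtils.Exception`. `CheckIndex` and `CatchesAsBase` are hypothetical helpers. Their only job is to show a single `on E: EMessageBase` handler catching a subclass.

```pascal
{$mode objfpc}{$H+}
uses
  SysUtils;

type
  { Root of the library's exception tree. }
  EMessageBase       = class(Exception);
  EMessageBaseIO     = class(EMessageBase);  { disk full, permission, corrupt read }
  EMessageBaseLock   = class(EMessageBase);  { timeout, contention, deadlock }
  EMessageBaseFormat = class(EMessageBase);  { bad signature, truncated header }
  EMessageBaseRange  = class(EMessageBase);  { Index out of [0..Count) }
  EMessageBaseClosed = class(EMessageBase);  { operation on a freed/closed base }

{ Hypothetical range check a backend might run before Read(i). }
procedure CheckIndex(Index, Count: longint);
begin
  if (Index < 0) or (Index >= Count) then
    raise EMessageBaseRange.CreateFmt('Index %d out of [0..%d)', [Index, Count]);
end;

{ True when CheckIndex raised and the single root handler caught it. }
function CatchesAsBase(Index, Count: longint): boolean;
begin
  Result := False;
  try
    CheckIndex(Index, Count);
  except
    on E: EMessageBase do
      Result := True;
  end;
end;
```
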
### Locking — explicit, not implicit
```pascal
TMsgBase = class
  procedure LockForRead;
  procedure LockForWrite;
  procedure Unlock;
  function TryLockForRead (TimeoutMs: integer): boolean;
  function TryLockForWrite(TimeoutMs: integer): boolean;
  { Common-case one-liners. }
  procedure WithReadLock (AProc: TProc);
  procedure WithWriteLock(AProc: TProc);
end;
```
Default `Open` acquires **no lock**. Callers choose. A read-only
BBS frontend doesn't need cross-process locking; a tosser does.
Library doesn't guess. Common case stays a one-liner via
`WithReadLock`.
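
`WithReadLock` can keep its one-liner promise by wrapping the callback in a plain try/finally, so the lock is released even if the callback raises. The sketch below makes two assumptions. First, it uses a method-pointer type `TLockedProc` (hypothetical) instead of `TProc`. Second, it counts locks in a field rather than taking real cross-process locks.

```pascal
{$mode objfpc}{$H+}
uses
  SysUtils;

type
  { Hypothetical stand-in for TProc: a parameterless method pointer. }
  TLockedProc = procedure of object;

  TLockDemoBase = class
  private
    FReadLocks: integer;  { illustrative counter, no real file locking }
  public
    procedure LockForRead;
    procedure Unlock;
    procedure WithReadLock(AProc: TLockedProc);
    property ReadLocks: integer read FReadLocks;
  end;

procedure TLockDemoBase.LockForRead;
begin
  Inc(FReadLocks);
end;

procedure TLockDemoBase.Unlock;
begin
  if FReadLocks > 0 then
    Dec(FReadLocks);
end;

{ Acquire, run, and always release, even when AProc raises. }
procedure TLockDemoBase.WithReadLock(AProc: TLockedProc);
begin
  LockForRead;
  try
    AProc();
  finally
    Unlock;
  end;
end;
```
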
### Transactions — declared
```pascal
base.BeginTransaction;
try
  for msg in batch do
    base.WriteMessage(msg);
  base.Commit;
except
  base.Rollback;
  raise;
end;
```
Backends implement per-format:
- **JAM:** defer `.jdx` index updates in memory, flush on Commit.
- **SDM:** temp-dir shadow, rename on Commit.
- **Hudson:** in-memory index delta, flush on Commit.
- **Squish:** frame-list delta, flush on Commit.
- **PKT (write):** temp file, rename on Commit.
- **Read-only mode:** Commit is a no-op.
Callers who don't call `Begin/Commit` still work — writes flush
per-call. Transactions are opt-in for atomicity.
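
A minimal sketch of the opt-in semantics, built on a toy `TTxDemoBase` (hypothetical). Its `Flushed` list stands in for the on-disk index and its `Pending` list for the in-memory delta. It also applies the no-nesting rule proposed under the open questions, so a second `BeginTransaction` raises.

```pascal
{$mode objfpc}{$H+}
uses
  SysUtils, Classes;

type
  ETxDemo = class(Exception);

  TTxDemoBase = class
  private
    FPending, FFlushed: TStringList;
    FInTx: boolean;
  public
    constructor Create;
    destructor Destroy; override;
    procedure BeginTransaction;
    procedure WriteMessage(const Msg: AnsiString);
    procedure Commit;
    procedure Rollback;
    property Flushed: TStringList read FFlushed;
  end;

constructor TTxDemoBase.Create;
begin
  inherited Create;
  FPending := TStringList.Create;
  FFlushed := TStringList.Create;
end;

destructor TTxDemoBase.Destroy;
begin
  FPending.Free;
  FFlushed.Free;
  inherited Destroy;
end;

procedure TTxDemoBase.BeginTransaction;
begin
  if FInTx then
    raise ETxDemo.Create('nested BeginTransaction is not supported');
  FInTx := True;
end;

procedure TTxDemoBase.WriteMessage(const Msg: AnsiString);
begin
  if FInTx then
    FPending.Add(Msg)   { deferred until Commit }
  else
    FFlushed.Add(Msg);  { no transaction: flush per call }
end;

procedure TTxDemoBase.Commit;
begin
  FFlushed.AddStrings(FPending);
  FPending.Clear;
  FInTx := False;
end;

procedure TTxDemoBase.Rollback;
begin
  FPending.Clear;  { discard the deferred delta }
  FInTx := False;
end;
```
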
### Events — single callback
```pascal
TMsgEvent = record
  EventType: TMsgEventType;  { BaseOpened, MessageRead, etc. }
  Source: TMsgBase;          { may be nil for lib-global events }
  Subject: AnsiString;       { path, area tag, msgid }
  Detail: AnsiString;        { human-readable }
  LongValue: int64;          { count, size, offset }
  TimeStamp: TDateTime;
end;
TMsgEventCallback = procedure(const E: TMsgEvent) of object;

base.OnEvent := @MyHandler;  { one pointer, that's it }
```
comet does this with `OnLog`. One callback, caller multiplexes if
they need multiple observers. No multi-subscriber registry. Radical
simplification.
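
A caller that wants several observers can still give the library a single pointer. The sketch below uses a trimmed-down `TMsgEvent` and a hypothetical `TEventFanOut` helper. The helper lives in caller code, not in the library.

```pascal
{$mode objfpc}{$H+}
uses
  SysUtils;

type
  { Trimmed for illustration; the proposed record has more fields. }
  TMsgEvent = record
    Subject: AnsiString;
    Detail: AnsiString;
  end;
  TMsgEventCallback = procedure(const E: TMsgEvent) of object;

  { The library sees one callback (Handle); Handle forwards each
    event to every observer the caller registered. }
  TEventFanOut = class
  private
    FTargets: array of TMsgEventCallback;
  public
    procedure Add(ACallback: TMsgEventCallback);
    procedure Handle(const E: TMsgEvent);
  end;

procedure TEventFanOut.Add(ACallback: TMsgEventCallback);
begin
  SetLength(FTargets, Length(FTargets) + 1);
  FTargets[High(FTargets)] := ACallback;
end;

procedure TEventFanOut.Handle(const E: TMsgEvent);
var
  i: integer;
begin
  for i := 0 to High(FTargets) do
    FTargets[i](E);
end;
```

Wiring it up would then be `base.OnEvent := @FanOut.Handle;`, so the library still owns only one pointer.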
### Format detection is the default API
```pascal
{ Primary — sniff the path, pick the backend. }
function OpenMsgBase(const Path: AnsiString;
                     Mode: TMsgOpenMode): TMsgBase;

{ Escape hatch — force a specific format. }
function OpenMsgBaseAs(Format: TMsgBaseFormat;
                       const Path: AnsiString;
                       Mode: TMsgOpenMode): TMsgBase;
```
Fingerprints:
- JAM: `.jhr` + `.jdx` pair, `"JAM\0"` signature
- Squish: `.sqd` + `.sqi`
- Hudson: `MSGINFO.BBS` + `MSGHDR.BBS` + ...
- GoldBase: `MSGINFO.DAT` + ...
- PCBoard: `.IDX` + `.MSG` pair
- EzyCom: `MH*.BBS` + `MT*.BBS`
- Wildcat: WC SDK marker file
- SDM: directory full of numbered `.msg` files (fallback)
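
The fingerprint list maps naturally onto a check over a directory listing. This sketch rests on several assumptions:

- `SniffFormat` and its helpers are hypothetical.
- They look at file names only and read no signatures.
- They return a lowercase tag rather than a `TMsgBaseFormat` value.
- Wildcat is left out because its marker file isn't named here.

Check order matters: the PCBoard `.IDX` + `.MSG` pair is tried before the SDM fallback, because SDM directories also contain `.msg` files.

```pascal
{$mode objfpc}{$H+}
uses
  SysUtils, Classes;

{ True when some name has extension Ext and, if Prefix is non-empty,
  starts with Prefix. Case-insensitive. }
function AnyLike(Files: TStrings; const Prefix, Ext: AnsiString): boolean;
var
  i: integer;
  N: AnsiString;
begin
  Result := False;
  for i := 0 to Files.Count - 1 do
  begin
    N := UpperCase(ExtractFileName(Files[i]));
    if (ExtractFileExt(N) = UpperCase(Ext)) and
       ((Prefix = '') or (Pos(UpperCase(Prefix), N) = 1)) then
      Exit(True);
  end;
end;

{ True when exactly Name (case-insensitive) is present. }
function HasName(Files: TStrings; const Name: AnsiString): boolean;
var
  i: integer;
begin
  Result := False;
  for i := 0 to Files.Count - 1 do
    if SameText(ExtractFileName(Files[i]), Name) then
      Exit(True);
end;

{ True when some file is <digits>.MSG, i.e. an FTS-1 numbered message. }
function AnyNumberedMsg(Files: TStrings): boolean;
var
  i, Num: integer;
begin
  Result := False;
  for i := 0 to Files.Count - 1 do
    if SameText(ExtractFileExt(Files[i]), '.MSG') and
       TryStrToInt(ChangeFileExt(ExtractFileName(Files[i]), ''), Num) then
      Exit(True);
end;

{ Returns a format tag, or '' when nothing matches. }
function SniffFormat(Files: TStrings): AnsiString;
begin
  if AnyLike(Files, '', '.JHR') and AnyLike(Files, '', '.JDX') then
    Result := 'jam'      { a real sniffer would also check the signature }
  else if AnyLike(Files, '', '.SQD') and AnyLike(Files, '', '.SQI') then
    Result := 'squish'
  else if HasName(Files, 'MSGINFO.BBS') and HasName(Files, 'MSGHDR.BBS') then
    Result := 'hudson'
  else if HasName(Files, 'MSGINFO.DAT') then
    Result := 'goldbase'
  else if AnyLike(Files, '', '.IDX') and AnyLike(Files, '', '.MSG') then
    Result := 'pcboard'
  else if AnyLike(Files, 'MH', '.BBS') and AnyLike(Files, 'MT', '.BBS') then
    Result := 'ezycom'
  else if AnyNumberedMsg(Files) then
    Result := 'sdm'      { fallback }
  else
    Result := '';
end;
```
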
### One unit per format, self-registering
```pascal
uses
  OpenMsgLib,         { core + factory }
  OpenMsgLib.Jam,     { registers JAM format in initialization }
  OpenMsgLib.Hudson;  { registers Hudson in initialization }
```
No `.uni` split. The format unit's `initialization` block calls
`RegisterFormat(..., @Factory)`. Inclusion is the registration.
### `TOutboundBatch` — the missing half of the tosser
The symmetric twin of `TPacketBatch` (inbound). Details in the
appendix; the shape:
```pascal
TOutboundBatch = class
  constructor Create(const AOutboundDir: AnsiString;
                     const AOurAka: TFtnAddr);
  { Append a message to the outbound packet for (Target, Flavour).
    Caches the open pkt per (Target, Flavour) pair so repeat calls
    write to the same file until rotation or Flush. }
  function DispatchMessage(const Msg: TMsgRecord;
                           const Target: TFtnAddr;
                           Flavour: FlavourType): boolean;
  { Finalise every cached pkt — write terminator, rename .tmp →
    final, update .flo if configured. Idempotent. }
  procedure Flush;
  property MaxPktSizeKB: longint;  { 0 = unlimited; rotate at threshold }
  property OnEvent: TMsgEventCallback;
end;
```
Two features missing in both NR and fpc-msgbase today, baked in
here from day one:

- **Packet size rotation.** Before each write, check cached stream
  size + estimated msg size. If over threshold, close current
  (`.tmp` → `.pkt` rename), open next.
- **Atomic finalize.** Writes go to `xxxxxxxx.pkt.tmp`. Only `Flush`
  (or rotation) renames to `.pkt`. Crash mid-run leaves an orphan
  `.tmp` — not a corrupt real packet.

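
The atomic-finalize rule reduces to "write under a temporary name, rename when complete". Below is a minimal sketch with hypothetical `WriteTmp` and `FinalizeTmp` helpers. It relies on `RenameFile`, which on POSIX systems maps to `rename(2)`. That call replaces the name atomically when source and target sit on the same filesystem.

```pascal
{$mode objfpc}{$H+}
uses
  SysUtils, Classes;

{ Write Data to "<FinalPath>.tmp". Nothing named FinalPath appears
  until FinalizeTmp runs, so a crash leaves only the orphan .tmp. }
function WriteTmp(const FinalPath, Data: AnsiString): AnsiString;
var
  S: TFileStream;
begin
  Result := FinalPath + '.tmp';
  S := TFileStream.Create(Result, fmCreate);
  try
    if Data <> '' then
      S.WriteBuffer(Data[1], Length(Data));
  finally
    S.Free;
  end;
end;

{ Rename "xxxxxxxx.pkt.tmp" -> "xxxxxxxx.pkt" and return the new name. }
function FinalizeTmp(const TmpPath: AnsiString): AnsiString;
begin
  Result := ChangeFileExt(TmpPath, '');  { strips the trailing .tmp }
  if not RenameFile(TmpPath, Result) then
    raise EInOutError.CreateFmt('cannot finalize %s', [TmpPath]);
end;
```
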
Format-agnostic: writes to Type-2 / 2+ / 2.2 pkts via the existing
`ma.fmt.pkt` backend. FTN routing stays in the caller (NR's
`FindRouteForNetmail`/`GetLinkForRoute` is fidoconf-specific, not
library material).

---
## Reference map — where to look in existing code
### Inside fpc-msgbase
| Concern | File | Relevant lines |
|---|---|---|
| Factory pattern to extend | `src/ma.api.pas` | full file |
| `TUniMessage` to grow with Extras | `src/ma.types.pas` | record definition, top of file |
| Existing inbound tosser to mirror for outbound | `src/ma.batch.pas` | `TPacketBatch`, `GetOrCreateBase` (line 296) |
| Example tosser template | `examples/example_tosser.pas` | `TSimpleTosser.OnMessage` |
| Existing lock layer to make explicit | `src/ma.lock.pas` | full file |
| Event registry to replace with single callback | `src/ma.events.pas` | full file |
| Per-format native → uni adapters to fold together | `src/formats/*.uni.pas` × 9 | each is ~100-200 LOC |
| Sample data for tests (if any) | `tests/` | check what's already there |
### In NetReader (`../netreader`)
NR has already solved several of these in its own idiom. The
outbound machinery is the most reusable reference:
| Concern | File | Relevant lines |
|---|---|---|
| Cached outbound packet per dest addr | `src/core/nr.scanner.pas` | `GetOutboundPacket` line 214, `CloseAllPackets` line 291 |
| Temp-pkt filename generator | `src/core/nr.scanner.pas` | `CreateTempPktFileName` call at 256 |
| Per-message route + pack logic | `src/core/nr.scanner.pas` | `PackMsg` line 551 |
| Route resolution (stays in NR) | `src/core/nr.scanner.pas` | `FindRouteForNetmail` 398, `GetLinkForRoute` 451 |
| Priority from Attr + FLAGS kludge | `src/core/nr.scanner.pas` | `PackMsg` lines 609-636 |
| Zone-aware outbound path | `src/core/nr.arcmail.pas` | `GetOutboundDir` (around line 178), `GetFLOPath` 219 |
| IsArcMailExt helper | `src/core/nr.arcmail.pas` | `IsArcMailExt` line 155 |
| Pkt header + message write | `src/core/nr.packet.pas` | `WritePktHeader`, `WritePktMessage` |
| Scanner's NoHighWaters pattern (useful for read-only verifiers) | `src/core/nr.scanner.pas` | `ScanNMArea` 1098 |
| JAM header-only read (NR's linker CRC fast path) | `src/msgbase/nr.msgbase.jam.pas` | `TNrJamMsgBase.ReadLinkHdr`, `TNrJamLinkHdr` record |

NR's entire CLI parity pass (last 7 commits in `../netreader`) is
built on this scaffolding. The pattern should transfer cleanly.
### Reference: format specs
The FTSC document collection at `/home/ken/Source Code/ftsc/docs/` and
the format-author specs (jam.txt, squish.doc, pcboard.doc, etc.) are
the authoritative source. Every backend cites the spec it implements
in `docs/ftsc-compliance.md`.
### In comet (`../comet`)
comet is the "plug it in and forget" model. Key patterns worth
mirroring:
- Single log callback (`OnLog`), no registry
- `TStream`-centric I/O — caller controls the stream
- Config hot-reload without API surgery
- Embeddable via narrow callback surface

See the "Embeddable" bullet in `../comet/README.md`.

---
## Test corpus — the `testmsg/` folder
**Status:** to be populated. Proposal: live at
`/home/ken/Source Code/fpc-msgbase/testmsg/` (checked in, anonymized,
git-tracked so rollback is just `git checkout`).
Proposed structure:
```
testmsg/
├── README.md how-to regenerate, licensing notes
├── jam/
│ ├── small_echo/ few-hundred-msg JAM area, real echoarea
│ ├── large_echo/ 10k+ messages, stresses index growth
│ ├── deleted_mix/ area with tombstoned msgs for Pack tests
│ └── netmail/ JAM netmail w/ kludges
├── squish/
│ ├── small_echo/
│ └── netmail/
├── hudson/
│ └── 3-board/ multi-board fixture for Board field test
├── msg/ FTS-1 numbered *.msg
│ ├── netmail/
│ └── echo/ rare but tested
├── pcb/
│ └── sample_conf/
├── ezycom/
├── goldbase/
├── wildcat/
├── pkt/
│ ├── type2_plain/
│ ├── type2plus/
│ └── type2_2/
└── reference/
├── jam_small_echo.json canonical-JSON snapshot
├── jam_large_echo.json (round-trip baseline — DO NOT edit)
└── ... one per corpus base
```
**Anonymization rules:** before check-in, scrub real user addresses
and passwords from kludges. A small helper `tools/anonymize.pas`
can do this deterministically — replaces real MSGID address with
`999:9999/9999`, replaces user names with `User<N>` tokens.
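
A deterministic anonymizer can be as simple as a first-seen-order name table plus an address rewrite. `TNameMapper` and `AnonymizeMsgId` are hypothetical names for what `tools/anonymize.pas` might contain. The sketch assumes a MSGID value of the form `<address> <serial>`. It also treats names case-insensitively; that is a design choice here, not something the proposal specifies.

```pascal
{$mode objfpc}{$H+}
uses
  SysUtils, Classes;

type
  { Same input order -> same tokens; a repeated name reuses its token. }
  TNameMapper = class
  private
    FSeen: TStringList;  { list index + 1 = N in User<N> }
  public
    constructor Create;
    destructor Destroy; override;
    function Map(const RealName: AnsiString): AnsiString;
  end;

constructor TNameMapper.Create;
begin
  inherited Create;
  FSeen := TStringList.Create;
  FSeen.CaseSensitive := False;  { "John Doe" and "JOHN DOE" share a token }
end;

destructor TNameMapper.Destroy;
begin
  FSeen.Free;
  inherited Destroy;
end;

function TNameMapper.Map(const RealName: AnsiString): AnsiString;
var
  Idx: integer;
begin
  Idx := FSeen.IndexOf(RealName);
  if Idx < 0 then
    Idx := FSeen.Add(RealName);
  Result := 'User' + IntToStr(Idx + 1);
end;

{ Replace the origin address of a MSGID value, keep the serial. }
function AnonymizeMsgId(const Value: AnsiString): AnsiString;
var
  Sp: integer;
begin
  Sp := Pos(' ', Value);
  if Sp = 0 then
    Result := '999:9999/9999'
  else
    Result := '999:9999/9999' + Copy(Value, Sp, MaxInt);
end;
```
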

**Regeneration script:** `tools/regen_reference.sh` walks each corpus
base via this library and dumps canonical JSON to `testmsg/reference/`.
The committed output is the ground truth captured from a known-good build;
later test runs diff their output against the committed JSON.

**Rollback story:** `git checkout testmsg/` restores any corrupted
fixture. Keep fixtures small-ish (<50MB total across all formats)
so the repo stays cloneable on slow links.

---
## Byte-agreement cross-verifier — first actionable task
Before any redesign, **confirm the two libraries agree on the
bytes they read and write today**. Without this baseline, the reshape
could silently regress behavior that currently works.

**Tool:** `tools/cross_verify.pas` — standalone FPC program.

**What it does:**

1. Open a corpus base with the existing fpc-msgbase.
2. Read all messages → dump each to a canonical JSON record.
3. Open the same base with NetReader's `nr.msgbase.*`.
4. Read all messages → dump to same canonical JSON format.
5. Diff the two outputs. Report mismatches per field.

For write verification:

1. Fabricate 100 messages with known content.
2. Write them through fpc-msgbase to a fresh JAM base.
3. Read them back through NR's backend, diff.
4. Repeat with write via NR, read via fpc-msgbase.


Expected outcome: both should agree for universal fields. Where
they disagree is where the proposal's `Extras` story becomes
important — those are the fields each side handles differently
(or one side silently drops).

**Deliverable:** a report noting exactly which fields differ per
format, so the Extras registry (above) is anchored in real data
rather than speculation.
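
The per-field diff at the heart of the cross-verifier is small. This sketch assumes each side's canonical record has already been flattened to `key=value` lines; the JSON layer is left out. `DiffRecords` is a hypothetical name.

```pascal
{$mode objfpc}{$H+}
uses
  SysUtils, Classes;

{ A = fpc-msgbase record, B = NR record, both as "key=value" lines.
  Appends one line per mismatched or one-sided key to Report and
  returns the number of mismatches. }
function DiffRecords(A, B, Report: TStrings): integer;
var
  i: integer;
  Key: AnsiString;
begin
  Result := 0;
  for i := 0 to A.Count - 1 do
  begin
    Key := A.Names[i];
    if B.IndexOfName(Key) < 0 then
    begin
      Report.Add(Key + ': only in fpc-msgbase');
      Inc(Result);
    end
    else if A.ValueFromIndex[i] <> B.Values[Key] then
    begin
      Report.Add(Format('%s: fpc-msgbase=%s nr=%s',
        [Key, A.ValueFromIndex[i], B.Values[Key]]));
      Inc(Result);
    end;
  end;
  for i := 0 to B.Count - 1 do
    if A.IndexOfName(B.Names[i]) < 0 then
    begin
      Report.Add(B.Names[i] + ': only in nr');
      Inc(Result);
    end;
end;
```
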

This is ~2-4 hours of work and produces the single most important
input to the reshape.

---
## Implementation plan — six weeks
| Week | Deliverables |
|---|---|
| **1** | `testmsg/` corpus committed (3-5 bases per format, anonymized). `tools/cross_verify.pas` running. `tools/regen_reference.sh` producing committed canonical JSON. Baseline: which fields differ between fpc-msgbase and NR today? |
| **2** | Grow `TUniMessage` → `TMsgRecord` with `Extras` bag. Publish `docs/extras-registry.md` naming every well-known key. Backfill round-trip test: every backend reads a corpus base, writes to a fresh base, reads back, Extras map preserved. Fold `.uni` sidecars into format units. |
| **3** | Introduce `ITsmIO`. Refactor every format backend to take an `ITsmIO` instead of `TFileStream` directly. Add `TMemoryTsmIO` for tests. Run the full test suite in memory — no disk writes. |
| **4** | Land `TOutboundBatch` with size rotation + atomic finalize. Write `examples/example_multiplex` that splits inbound pkts by destination and writes per-link outbound pkts via `TOutboundBatch`. |
| **5** | Unify error model (typed `EMessageBase*` tree; boolean returns mean "nothing to do" only). Replace `TMessageEvents` registry with single `OnEvent` callback. `TMsgBase` locking becomes explicit (no implicit `LockForRead` on Open). `WithReadLock` / `WithWriteLock` helpers. |
| **6** | Documentation pass. Full API reference regen. `CHANGELOG.md` with 0.1 → 1.0 migration notes. Version constants `MA_VERSION_MAJOR`/`MINOR` + runtime `MaRequireVersion`. `{$DEFINE MA_WITH_<FORMAT>}` gates finalized. Ship 1.0. |

Each week's work ships independently — no big-bang merge.

---
## Beyond the six-week plan — things worth planning
### Version ABI discipline
```pascal
const
  MA_VERSION_MAJOR = 1;
  MA_VERSION_MINOR = 0;

{ Caller invokes at program start. Raises if the linked library is
  older than required: MA_VERSION_MAJOR < Major, or
  MA_VERSION_MAJOR = Major and MA_VERSION_MINOR < Minor. }
procedure MaRequireVersion(Major, Minor: integer);
```
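
Here is a minimal sketch of `MaRequireVersion` that implements the rule exactly as the comment states it. `EMaVersion` is a hypothetical exception class; it could just as well sit under `EMessageBase`.

```pascal
{$mode objfpc}{$H+}
uses
  SysUtils;

const
  MA_VERSION_MAJOR = 1;
  MA_VERSION_MINOR = 0;

type
  EMaVersion = class(Exception);

{ Raise when the linked library is older than the caller needs. }
procedure MaRequireVersion(Major, Minor: integer);
begin
  if (MA_VERSION_MAJOR < Major) or
     ((MA_VERSION_MAJOR = Major) and (MA_VERSION_MINOR < Minor)) then
    raise EMaVersion.CreateFmt(
      'fpc-msgbase %d.%d linked, caller requires %d.%d or later',
      [MA_VERSION_MAJOR, MA_VERSION_MINOR, Major, Minor]);
end;
```

As written, the rule accepts any newer major version. If a major bump is meant to signal an ABI break, the check would also need to reject `MA_VERSION_MAJOR > Major`.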
Libraries consumed by multiple callers (Fimail, NetReader, third-party)
need a noisy "you linked the wrong version" failure mode. Not
"weird behavior six months later."
### Read-only airtightness
`momReadOnly` should be verifiable. Regression test:
1. Copy a base to a read-only mount (`chmod -w` + `mount -o ro`).
2. `OpenMsgBase(path, momReadOnly)`.
3. Read 10k messages, every field.
4. Assert: every file's `st_mtime` is unchanged; no syscall fired
   that opens a file for write.

A BBS running concurrently with a tosser depends on this being
airtight — no "oops, we updated last-read" surprises.
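
The mtime half of the regression test can be sketched as a before/after directory snapshot. `SnapshotDir` and `SameSnapshot` are hypothetical helpers that record `TSearchRec.Time` and `Size` for each file. `TSearchRec.Time` may be coarser than the raw `st_mtime`, so a stricter harness might call `fpStat` instead. The "no write-mode open" half needs an external tracer such as `strace` and isn't shown.

```pascal
{$mode objfpc}{$H+}
uses
  SysUtils, Classes;

{ Record "name=time:size" for every regular entry in Dir (non-recursive). }
procedure SnapshotDir(const Dir: AnsiString; Into: TStringList);
var
  SR: TSearchRec;
begin
  Into.Clear;
  if FindFirst(IncludeTrailingPathDelimiter(Dir) + '*', faAnyFile, SR) = 0 then
  try
    repeat
      if (SR.Attr and faDirectory) = 0 then
        Into.Add(SR.Name + '=' + IntToStr(SR.Time) + ':' + IntToStr(SR.Size));
    until FindNext(SR) <> 0;
  finally
    FindClose(SR);
  end;
  Into.Sort;  { order-independent comparison }
end;

function SameSnapshot(Before, After: TStringList): boolean;
begin
  Result := Before.Text = After.Text;
end;
```
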
### Build-time format gating
```pascal
{ config.inc }
{$DEFINE MA_WITH_JAM}
{$DEFINE MA_WITH_HUDSON}
{$DEFINE MA_WITH_SQUISH}
{ $DEFINE MA_WITH_WILDCAT} { commented out — 66K LOC pulled in }
```
```pascal
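{ OpenMsgLib.All.pas — convenience include-all unit }
unit OpenMsgLib.All;

{$I config.inc}  { brings in the MA_WITH_* defines }

interface

uses
  OpenMsgLib
  {$IFDEF MA_WITH_JAM}, OpenMsgLib.Jam{$ENDIF}
  {$IFDEF MA_WITH_HUDSON}, OpenMsgLib.Hudson{$ENDIF}
  { ... one $IFDEF line per remaining format ... }
  ;

implementation

end.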
```
Embedders control what ships. A full tosser wants everything.
A minimal BBS UI wants JAM only. No one should be forced to
compile BTrieve-era SDK code for a JAM reader.
### Documentation structure
```
docs/
├── API.md full API reference (regen each 1.x)
├── architecture.md layered design (update to three-tier)
├── extras-registry.md well-known Extras keys per format
├── ftsc-compliance.md spec notes
├── migration-0.1-to-1.0.md for existing callers
├── embedder-guide.md for BBS/tosser authors
└── format-notes/ per-format quirks & gotchas
├── jam.md
├── squish.md
└── ...
```
### CI expectations
When the repo has CI (GitHub Actions, GitLab CI, whatever), the
test job is:
1. Build with all formats enabled.
2. Build with each format *disabled* in turn — prove conditional
   compilation holds.
3. Run `run_tests.sh`.
4. Run `tools/cross_verify.pas` against the corpus.
5. Verify `docs/extras-registry.md` lists every key any backend
   writes (grep the source for `Extras.SetValue`).

---
## Contribution path back from NetReader
NR has six weeks of HPT-parity work sitting on top of its own
msgbase. When the reshape hits 1.0, NR has a decision:
- **A.** Adopt fpc-msgbase wholesale — drop `nr.msgbase.*`, call
the lib. NR becomes a thin areafix + scanner + CLI over the
shared library. Big commit, one-time pain.
- **B.** Keep NR's backends, cherry-pick fpc-msgbase's event
dispatcher / lock model / outbound batch. Lighter touch.
- **C.** Contribute NR's improvements (JAM CRC fast-path,
case-insensitive `.msg` globbing, netmail-cfg-writer fix,
FTS-0004 tag validation) back to fpc-msgbase. Symmetric win.

Option C is the first step regardless of whether A or B lands
later. NR's recent work is format-agnostic quality improvements
that every embedder wants.

---
## Open questions for the implementation session
Before any code changes:
1. **Extras representation.** Key-value `AnsiString` bag is dead
simple but slow for 100K messages. Alternatives:
- `TDictionary<AnsiString, AnsiString>` (rtl-generics) — faster lookup
- Packed binary blob with offsets — smallest memory footprint
- Keep strings but cap each Extras to N keys (fixed-size array)

   Benchmark before picking.
2. **Transaction nesting.** Can `BeginTransaction` nest? JAM defers
`.jdx` updates in memory — nested transactions just keep
deferring. SDM's tempdir-shadow approach can't nest cleanly.
Propose: **no nesting**. Second `BeginTransaction` call raises.
3. **Thread safety of `TMsgBase`.** Inbound `TPacketBatch` shares
one base across workers, serialises via per-base CS. Works today.
Does the reshape preserve that? Answer: yes, the explicit
lock API makes it *more* obvious.
4. **Squish `.sql` lastread vs `.sqi` index.** This library treats
`.sql` as the cross-process lock sentinel (matching the convention
other Squish-aware tools use). The reshape should document this
explicitly — it's a format-specific quirk that callers shouldn't
need to know.
5. **PKT as a base vs as a stream.** `TMessageBase` abstraction
assumes random-access read. A PKT is a forward-only stream of
messages. Does PKT implement `Count`? (Reader would need to
scan ahead to count.) Propose: PKT implements `Count` but
flags `CanRandomAccess = False`, caller iterates via `MoveNext`
instead of `Read(i)`. Callers who treat PKT as a base get a
clear exception.
6. **Wildcat SDK cleanup.** Is the 40K-LOC `wc_sdk/` still needed,
or can it be replaced with a narrower interop layer? (Not a
1.0 blocker, but worth scoping for 1.1.)

Decisions on these should go in `docs/design-decisions.md` as
they land, so future sessions don't re-litigate.

---
## First-session actionable steps
If this proposal makes sense to the next implementer, the first
session should:
1. **Read this document end-to-end.** Cross-check my mapping of
fpc-msgbase's current state against the actual code — call out
anything stale.
2. **Create `testmsg/` with a corpus.** Start with ONE format
(JAM — best-spec'd, most used) and 3 bases: small echoarea
(~100 msgs), large echoarea (~10K), netmail directory. Commit.
3. **Write `tools/cross_verify.pas`.** Use NR's `nr.msgbase.jam`
and fpc-msgbase's `ma.fmt.jam` as the two readers. Dump
canonical JSON, diff. Report.
4. **Report the diff.** What fields do the two libraries disagree
on today? That diff becomes the initial `extras-registry.md`.
5. **Stop. Discuss.** Before any backend refactor, the cross-diff
   report informs every later architectural decision. If the two
   libraries disagree on 20% of fields, the Extras story is
   validated. If they agree on 100%, the Extras story is more
   about future-proofing than immediate need.

Only after the cross-diff is in hand does the week-1 plan above
make sense.

---
## Appendix: the `TOutboundBatch` design in full
(From the cross-project conversation; reproduced here for
completeness.)
```pascal
unit OpenMsgLib.Outbound;
{$mode objfpc}{$H+}
interface
uses
  Classes, SysUtils, Contnrs,  { Contnrs provides TFPHashList }
  OpenMsgLib, OpenMsgLib.Types, OpenMsgLib.Pkt;
type
  TEntry = class;  { forward declaration; full class shown below }

  TOutboundBatch = class
  private
    FOutboundDir: AnsiString;
    FOurAka: TFtnAddr;
    FMaxPktSizeKB: longint;
    FCacheCS: TRTLCriticalSection;
    FCache: TFPHashList;  { key: "zone:net/node.point|flavour" -> TEntry }
    FOnEvent: TMsgEventCallback;
    function GetOrCreateEntry(const Target: TFtnAddr;
                              Flavour: FlavourType): TEntry;
    procedure RotateIfOver(Entry: TEntry; EstimatedSize: integer);
    procedure FinalizeEntry(Entry: TEntry);  { rename .tmp → final }
  public
    constructor Create(const AOutboundDir: AnsiString;
                       const AOurAka: TFtnAddr);
    destructor Destroy; override;
    function DispatchMessage(const Msg: TMsgRecord;
                             const Target: TFtnAddr;
                             Flavour: FlavourType): boolean;
    procedure Flush;  { finalize every cached entry }
    property MaxPktSizeKB: longint read FMaxPktSizeKB write FMaxPktSizeKB;
    property OnEvent: TMsgEventCallback read FOnEvent write FOnEvent;
  end;
```
**Per-entry:**
```pascal
TEntry = class
  Key: AnsiString;          { "zone:net/node.point|flavour" }
  Target: TFtnAddr;
  Flavour: FlavourType;
  Stream: TStream;          { writer; actually holds an ITsmIO under the hood }
  TmpPath: AnsiString;      { xxxxxxxx.pkt.tmp }
  FinalPath: AnsiString;    { xxxxxxxx.pkt — set on Rotate/Flush }
  CS: TRTLCriticalSection;  { serialises writes to this one pkt }
  WrittenSize: int64;
end;
```
**Flow inside `DispatchMessage`:**

1. Look up the entry by `(Target, Flavour)` in `FCache`.
   - Miss: create entry, open `TmpPath`, write pkt header, cache.
   - Hit: enter its CS.
2. `RotateIfOver(Entry, EstimatedMsgSize)`: if `MaxPktSizeKB > 0` and
   `WrittenSize + EstimatedMsgSize > MaxPktSizeKB * 1024`:
   - Write terminator, close, rename `.tmp → .pkt`.
   - Create a new `.tmp`, write header, reset `WrittenSize`.
   - Fire `metPktRotated`.
3. Convert `TMsgRecord` → `TPktMessage` via `UniToPkt`.
4. Write via `OpenMsgLib.Pkt` writer. Update `WrittenSize`.
5. Fire `metMessageWritten`.
6. Leave entry CS.
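
Step 2 and the cache key each come down to a single computation. The sketch below uses minimal stand-in types, because the real `TFtnAddr` and `FlavourType` are defined elsewhere. The `TFtnAddr` record and the `TFlavour` enum here are assumptions; the flavour names come from the `pkt.flavour` row of the Extras table.

```pascal
{$mode objfpc}{$H+}
uses
  SysUtils;

type
  { Stand-ins for the library's real address and flavour types. }
  TFtnAddr = record
    Zone, Net, Node, Point: word;
  end;
  TFlavour = (flNorm, flCrash, flHold, flDirect, flImm);

const
  FlavourNames: array[TFlavour] of AnsiString =
    ('norm', 'crash', 'hold', 'direct', 'imm');

{ Cache key in the "zone:net/node.point|flavour" shape used by FCache. }
function MakeEntryKey(const Target: TFtnAddr; Flavour: TFlavour): AnsiString;
begin
  Result := Format('%d:%d/%d.%d|%s', [Target.Zone, Target.Net, Target.Node,
    Target.Point, FlavourNames[Flavour]]);
end;

{ Rotate only when a limit is set and the next message would push the
  packet past it. The int64 cast keeps large limits from overflowing. }
function ShouldRotate(WrittenSize: int64; EstimatedMsgSize: integer;
  MaxPktSizeKB: longint): boolean;
begin
  Result := (MaxPktSizeKB > 0) and
    (WrittenSize + EstimatedMsgSize > int64(MaxPktSizeKB) * 1024);
end;
```
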

**Flush:** iterate `FCache`; for each entry write the terminator,
close, and `FinalizeEntry`. Fire `metBatchFinalized`.

**Crash recovery:** on startup, the scanner may find orphan
`xxxxxxxx.pkt.tmp` files. Decision policy (caller-configurable):

- **Discard** (default): delete orphan tmps, assume corrupt.
- **Recover**: try to validate pkt header + terminator; if valid,
  rename to `.pkt`, else discard.
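
The **Recover** policy's "validate pkt header + terminator" can start as a purely structural check. This sketch assumes the FTS-0001 Type-2 layout:

- a 58-byte header;
- the packet-version word `2` at offset 18;
- two zero bytes terminating the packet.

`LooksLikeCompletePkt`, `HandleOrphan`, and `TOrphanPolicy` are hypothetical names. The check does not walk the messages inside the packet.

```pascal
{$mode objfpc}{$H+}
uses
  SysUtils, Classes;

const
  PKT_HEADER_SIZE = 58;  { FTS-0001 Type-2 packet header }

type
  TOrphanPolicy = (opDiscard, opRecover);

{ Header present, version word at offset 18 = 2 (little-endian),
  and the file ends with the two-byte zero terminator. }
function LooksLikeCompletePkt(const Path: AnsiString): boolean;
var
  S: TFileStream;
  Hdr: array[0..PKT_HEADER_SIZE - 1] of byte;
  Tail: array[0..1] of byte;
begin
  Result := False;
  S := TFileStream.Create(Path, fmOpenRead or fmShareDenyNone);
  try
    if S.Size < PKT_HEADER_SIZE + 2 then
      Exit;
    S.ReadBuffer(Hdr, SizeOf(Hdr));
    if (Hdr[18] <> 2) or (Hdr[19] <> 0) then
      Exit;
    S.Seek(-2, soEnd);
    S.ReadBuffer(Tail, SizeOf(Tail));
    Result := (Tail[0] = 0) and (Tail[1] = 0);
  finally
    S.Free;
  end;
end;

{ Promote a valid orphan to ".pkt" under opRecover; otherwise delete it.
  Returns True when the orphan was promoted. }
function HandleOrphan(const TmpPath: AnsiString; Policy: TOrphanPolicy): boolean;
begin
  Result := (Policy = opRecover) and LooksLikeCompletePkt(TmpPath) and
    RenameFile(TmpPath, ChangeFileExt(TmpPath, ''));
  if not Result then
    DeleteFile(TmpPath);
end;
```
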
---
## References to the cross-project conversation
Full transcript of the NR ↔ fpc-msgbase discussion lives in the
session log at
`/home/ken/.claude/projects/-home-ken-Source-Code-netreader/*.jsonl`
(dates around 2026-04-15). Relevant decisions captured in this
document; the session log has the reasoning trail if questions
arise.
Key points from that conversation, reproduced for the next session:
- **Why not fork fpc-msgbase into a new project?** Because fpc-msgbase
has tested format backends already. Forking means re-validating
them. Reshape in place preserves that investment.
- **Why the five-line Hello World test?** comet achieves
plug-and-forget with a single log callback and a TStream-based
API. That's the bar. If the lib requires more ceremony than
"open path, read, close," it's not there yet.
- **Why explicit locking, not implicit?** BBSes with their own
global mutex don't want the lib double-locking. Stateless
readers don't need cross-process locks. Library guessing leads
to surprises. Explicit means embedders can always reason about
behavior.
- **Why single callback, not registry?** comet's `OnLog` proves
one pointer is enough. Multi-observer is the caller's problem —
they can write their own fan-out if needed. Registry adds state
the library shouldn't own.
---
## Closing
This is a design proposal, not a mandate. The next implementer
should push back on anything that doesn't hold up against real
code. The testmsg corpus + cross-verifier gives us the data to
have that conversation grounded in bytes rather than opinion.

When in doubt: simpler is better. comet's model works because it
refused to do things the library didn't absolutely need to do.
fpc-msgbase 1.0 should exit with **less code, not more**, than
0.1.0 — the reshape is about architectural clarity, not feature
addition. Features come in 1.1+ on top of a clean 1.0.