New docs/attributes-registry.md publishes the canonical attribute
key catalog in four tiers:

  1. Universal headers — msg.num, from, to, subject, date.*, addr.*,
     area, board, cost. Every Fido format carries them.
  2. Canonical attribute bits — attr.private, attr.crash, etc.,
     mapped to/from the FTS-1 attribute word.
  3. FTSC kludges — msgid, replyid, pid, tid, flags, chrs, tzutc,
     seen-by, path, via. Multi-line keys use #13 between lines.
  4. Format-specific — jam.*, squish.*, hudson.*, goldbase.*, ezy.*,
     pcb.*, wildcat.*, pkt.*, msg.*. Each backend's namespace.

Plus a per-format support matrix showing which keys each backend
carries. Authoritative source remains each backend's
ClassSupportedAttributes -- the matrix can drift; SupportsAttribute()
is the runtime-correct query.

docs/architecture.md TUniMessage section rewritten:

  - Documents the strict two-area model (Body + Attributes only).
  - Body holds only the message text, never kludges or headers.
  - Library never composes presentation -- consumers walk Attributes
    and assemble their own display.
  - Adds the capabilities API section pointing at the registry.
  - Removes the stale "kludge lines intact and CR-separated" promise
    the previous adapter implementations didn't honor.

docs/PROPOSAL.md flags the original Extras-bag section as
SUPERSEDED 2026-04-17, points to the registry + architecture docs
as the live design. Original text retained as historical context
since it captures the conversation that drove the redesign.

README.md:

  - Features list now leads with the lossless two-area model and the
    capabilities API.
  - Adds a Status note flagging 0.2 as a breaking change vs 0.1 with
    a one-paragraph migration sketch (msg.WhoFrom -> Attributes.Get
    ('from'), etc.).
  - Documentation index links to the new registry doc.

# fpc-msgbase — plug-and-forget redesign proposal

**Status:** design handover, pre-implementation. Prepared 2026-04-15.
**Target audience:** the next implementation session.
**Precondition:** fpc-msgbase is at 0.1.0 (commit `b79e7fb` at the time
of writing); 9 format backends implemented from FTSC and format-author
specs, 7/7 tests green.

---

## TL;DR

fpc-msgbase today solves the *read* side cleanly and ports 9 formats
with tested fidelity. It's **not yet plug-and-forget** for embedders
like comet is. Four things are missing: a lossless round-trip
guarantee, an atomic outbound-packet builder (`TOutboundBatch`),
format-agnostic I/O injection, and a single-callback event model.

This proposal is **not a rewrite** — it's a six-week reshape in
place. The format backends stay. The scaffolding changes.

When done, the acceptance test is a five-line Hello World that opens
a path, reads messages, closes. No format name, no lock ceremony,
no event registry, no `.uni` sidecar unit, no init-order hazard.

---

## Context: where this proposal came from

A cross-project conversation between NetReader (`../netreader`, the
HPT-drop-in mail tosser) and fpc-msgbase. NetReader has its own
format layer (`src/core/nr.msgbase.*`) hardened by a 39/39 live trial
and a 54/54 CLI-parity test suite.

During that conversation we compared:

1. Do the two libraries agree on bytes when reading/writing the same
   base? (**Unverified. First action item below.**)
2. What would "plug it in and forget" look like for fpc-msgbase?

The conclusions that drove this proposal:

- fpc-msgbase's `TPacketBatch` handles the **inbound** concurrent
  tosser side cleanly. It **does not** handle the symmetric outbound
  side (writing per-link packets with size rotation and atomic
  finalize). Neither library does, today.
- `TUniMessage` as-currently-defined is **lossy**: JAM
  `MsgIdCRC`/`ReplyCRC`, `DateProcessed`, and every other
  format-specific header field has no canonical slot, so a caller
  that does `read → write` through the unified API silently loses
  bytes.
- The two-unit-per-format inclusion pattern (`ma.fmt.jam` +
  `ma.fmt.jam.uni`) asks callers to remember both. The `.uni`
  sidecar's only job is `initialization` registration — it belongs
  *inside* the format unit.
- `wc_sdk/` is ~66K lines of BTrieve-era Pascal pulled in
  unconditionally by any caller who uses Wildcat. Embeddability
  requires build-time gating.
- The error model mixes typed exceptions (`EMessageBase` from the
  factory) with boolean returns (all other methods) with event
  fires (locking). Three mental models for one library is two too
  many.

The reshape below addresses each.

---

## Current state — file-by-file

| Unit | LOC | Purpose | Keep? |
|---|---|---|---|
| `src/ma.api.pas` | ~330 | `TMessageBase` abstract, factory | **Reshape** |
| `src/ma.types.pas` | ~600 | `TUniMessage`, `TFTNAddress`, bits | **Reshape** |
| `src/ma.events.pas` | ~100 | multi-subscriber event registry | **Replace** with single callback |
| `src/ma.lock.pas` | ~250 | 3-layer locking | **Keep**, make explicit |
| `src/ma.paths.pas` | ~180 | per-format path derivation | **Keep** |
| `src/ma.batch.pas` | 333 | inbound `TPacketBatch` | **Keep** + add `TOutboundBatch` twin |
| `src/formats/ma.fmt.<fmt>.pas` × 9 | 400–1300 each | native spec-driven backend | **Keep** — hardened |
| `src/formats/ma.fmt.<fmt>.uni.pas` × 9 | 100–200 each | adapter to `TUniMessage` | **Fold into** the format unit |
| `src/wc_sdk/` | ~40K | Wildcat BTrieve SDK | **Gate** behind `{$DEFINE MA_WITH_WILDCAT}` |

Tests: `tests/test_*.pas` → keep, extend with the round-trip
corpus below.

---

## The five-line acceptance test

When the reshape is done, this works unchanged across every format:

```pascal
uses OpenMsgLib, OpenMsgLib.Auto;
var B: TMsgBase; M: TMsgRecord; i: integer;
B := OpenMsgBase('/path/to/base', momReadOnly);
try
  for i := 0 to B.Count - 1 do
    if B.Read(i, M) then WriteLn(M.Subject);
finally B.Free; end;
```

No format name (autodetect). No lock call (default = no lock, caller
opts in). No event wiring (default = silent). No `.uni` unit (one
unit per format registers itself). No init-order hazard (single
registration point).

If a new session reads a test base and the above doesn't work
verbatim, the reshape isn't done.

---

## Target architecture

### Three-tier layering

```
Caller (BBS, tosser, editor, importer)
   │
   ├── TMsgBase (unified)        ← most callers live here
   ├── Direct format class       ← drop-down when Extras aren't enough
   └── Raw stream                ← replay, test, encrypt, mock
         │
         ▼
Format backends speak to ITsmIO — never TFileStream directly
         │
         ▼
ITsmIO adapters: file | memory | encrypted | test harness
```

**Why add ITsmIO:**
- Test backends without hitting disk (in-memory fixtures)
- Wrap the lib in encryption/compression
- Run on network mounts with unusual locking semantics
- Replay captured corrupt frames for debugging

### `TMsgRecord` with lossless `Extras`

> **SUPERSEDED 2026-04-17.** The "named fields + Extras bag" hybrid
> below was rejected during implementation in favour of a stricter
> two-area model: `Body` holds only the message text; **everything
> else** (from/to/subject/dates/addresses + every kludge + every
> format-specific field) is an attribute. See
> [`docs/attributes-registry.md`](attributes-registry.md) for the
> key catalog and [`docs/architecture.md`](architecture.md) for the
> updated TUniMessage contract. The original Extras-bag design is
> retained below as historical context.
>
> The capabilities API discussed in this section landed essentially
> as proposed (`base.SupportsAttribute(K)` + class-level
> `ClassSupportedAttributes`).

```pascal
TMsgRecord = record
  { Universal fields every format has: }
  Index:       longint;
  From, To_:   AnsiString;
  Subject:     AnsiString;
  DateWritten: TDateTime;
  DateArrived: TDateTime;
  OrigAddr:    TFtnAddr;
  DestAddr:    TFtnAddr;
  Attr:        cardinal;          { canonical MSG_ATTR_* bitset }
  Body:        AnsiString;        { kludges + text, CR-separated }

  { Backend-specific fields preserved verbatim across round-trips.
    Every key a backend WRITES during a Read it MUST re-consume
    during a subsequent Write. Test harness enforces this. }
  Extras:      TMsgExtras;
end;

TMsgExtras = record
  function  Get(const Key, Default: AnsiString): AnsiString;
  procedure SetValue(const Key, Value: AnsiString);
  function  Has(const Key: AnsiString): boolean;
end;
```

**Well-known keys** (published in `docs/extras-registry.md`, to be
written):

| Key | Type | Source | Notes |
|---|---|---|---|
| `jam.msgidcrc` | hex u32 | JAM `.jhr` fixed header | needed for NR linker `-j` fast path |
| `jam.replycrc` | hex u32 | JAM `.jhr` fixed header | same |
| `jam.dateprocessed` | unix int | JAM `.jhr` | tosser timestamp |
| `jam.passwordcrc` | hex u32 | JAM `.jhr` | per-msg password |
| `jam.cost` | int | JAM `.jhr` | |
| `squish.umsgid` | hex u32 | SQI frame | unique msg ID |
| `hudson.board` | int 1..200 | MSGINFO | board number |
| `hudson.refer` | int | HDR | refer-to ptr |
| `pcb.confnum` | int | PCB `.IDX` | conference |
| `pcb.refer` | int | PCB `.IDX` | |
| `ezy.msgflags` | hex byte | MH#####.BBS | |
| `goldbase.userid` | int | IDX | |
| `wildcat.confnum` | int | WC SDK | |
| `pkt.cost` | int | Type-2 header | |
| `pkt.flavour` | enum | outbound pkt only | crash/hold/direct/imm/norm |

**Round-trip invariant:**
> After `Read(i, M)`, calling `Write(M)` on the same or a different
> base of the same format produces bytes that round-trip through
> `Read` to an identical `TMsgRecord` (same Extras keys, same values).

Tests enforce this across the full corpus.

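
As a rough illustration of the round-trip invariant, the sketch below models the Extras bag with a plain `TStringList` and checks that every key survives a write/read cycle. `FakeWrite`, `FakeRead`, and `SameExtras` are hypothetical helpers invented for this example, not part of fpc-msgbase, and the key values are made up.

```pascal
program ExtrasRoundTripSketch;
{$mode objfpc}{$H+}

uses
  Classes, SysUtils;

{ Hypothetical stand-in for a backend write: serialises the bag to
  one Key=Value line per entry. }
function FakeWrite(Extras: TStrings): AnsiString;
begin
  Result := Extras.Text;
end;

{ Hypothetical stand-in for a backend read: parses the lines back. }
procedure FakeRead(const Bytes: AnsiString; Extras: TStrings);
begin
  Extras.Text := Bytes;
end;

{ True when both bags hold exactly the same keys with the same values. }
function SameExtras(A, B: TStrings): boolean;
var
  i: integer;
begin
  Result := A.Count = B.Count;
  i := 0;
  while Result and (i < A.Count) do
  begin
    Result := (B.IndexOfName(A.Names[i]) >= 0) and
              (B.Values[A.Names[i]] = A.ValueFromIndex[i]);
    Inc(i);
  end;
end;

var
  Before, After: TStringList;
begin
  Before := TStringList.Create;
  After := TStringList.Create;
  try
    Before.Values['jam.msgidcrc'] := '1a2b3c4d';        { made-up value }
    Before.Values['jam.dateprocessed'] := '1776211200'; { made-up value }
    FakeRead(FakeWrite(Before), After);
    if SameExtras(Before, After) then
      WriteLn('round-trip preserved ', After.Count, ' keys')
    else
    begin
      WriteLn('round-trip LOST data');
      Halt(1);
    end;
  finally
    After.Free;
    Before.Free;
  end;
end.
```

A real harness would run the same comparison against each backend's actual `Read`/`Write` pair over the corpus rather than against a string serialiser.
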
### Error model — one tree

```
EMessageBase
 ├─ EMessageBaseIO          disk full, permission, corrupt read
 ├─ EMessageBaseLock        timeout, contention, deadlock
 ├─ EMessageBaseFormat      bad signature, truncated header
 ├─ EMessageBaseRange       Index out of [0..Count)
 └─ EMessageBaseClosed      operation on a freed/closed base
```

Every method either **succeeds** or **raises**. Boolean returns mean
"nothing to do" (empty Pack, no new messages, no matching records) —
never failure. A single `try/except on E: EMessageBase do ...`
catches the whole tree.

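
One way the tree above could be declared and caught is sketched here. The class names come from the tree; deriving them from `Exception`, and the `ReadAt` procedure, are assumptions made for illustration only.

```pascal
program ErrorTreeSketch;
{$mode objfpc}{$H+}

uses
  SysUtils;

type
  { Names follow the tree above; the Exception ancestry is assumed. }
  EMessageBase       = class(Exception);
  EMessageBaseIO     = class(EMessageBase);
  EMessageBaseLock   = class(EMessageBase);
  EMessageBaseFormat = class(EMessageBase);
  EMessageBaseRange  = class(EMessageBase);
  EMessageBaseClosed = class(EMessageBase);

{ Hypothetical reader: raises on a bad index instead of returning False. }
procedure ReadAt(Index, Count: longint);
begin
  if (Index < 0) or (Index >= Count) then
    raise EMessageBaseRange.CreateFmt('index %d outside [0..%d)',
                                      [Index, Count]);
  WriteLn('read message ', Index);
end;

begin
  try
    ReadAt(0, 3);   { succeeds }
    ReadAt(7, 3);   { raises EMessageBaseRange }
  except
    { One handler on the root class catches every subclass. }
    on E: EMessageBase do
      WriteLn(E.ClassName, ': ', E.Message);
  end;
end.
```

Callers that care about one failure kind can add a more specific `on E: EMessageBaseLock do` clause ahead of the root handler.
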
### Locking — explicit, not implicit

```pascal
TMsgBase = class
  procedure LockForRead;
  procedure LockForWrite;
  procedure Unlock;
  function  TryLockForRead (TimeoutMs: integer): boolean;
  function  TryLockForWrite(TimeoutMs: integer): boolean;

  { Common-case one-liners. }
  procedure WithReadLock (AProc: TProc);
  procedure WithWriteLock(AProc: TProc);
end;
```

Default `Open` acquires **no lock**. Callers choose. A read-only
BBS frontend doesn't need cross-process locking; a tosser does.
Library doesn't guess. Common case stays a one-liner via
`WithReadLock`.

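
A minimal sketch of how a `WithReadLock` helper might wrap the explicit calls follows. `TSketchBase` only counts lock depth and is a hypothetical stand-in for a real backend; `TLockedProc` replaces `TProc` with a plain procedure type so the example compiles without anonymous-function support.

```pascal
program LockWrapperSketch;
{$mode objfpc}{$H+}

type
  { Stand-in for TProc (assumption of this sketch). }
  TLockedProc = procedure;

  { Hypothetical base that only counts readers; a real backend would
    take a file or byte-range lock here. }
  TSketchBase = class
  private
    FReaders: integer;
  public
    procedure LockForRead;
    procedure Unlock;
    procedure WithReadLock(AProc: TLockedProc);
    property Readers: integer read FReaders;
  end;

procedure TSketchBase.LockForRead;
begin
  Inc(FReaders);
end;

procedure TSketchBase.Unlock;
begin
  Dec(FReaders);
end;

procedure TSketchBase.WithReadLock(AProc: TLockedProc);
begin
  LockForRead;
  try
    AProc();
  finally
    Unlock;   { released even when AProc raises }
  end;
end;

var
  Base: TSketchBase;

procedure ReadSomething;
begin
  WriteLn('inside lock, readers = ', Base.Readers);
end;

begin
  Base := TSketchBase.Create;
  try
    Base.WithReadLock(@ReadSomething);
    WriteLn('after lock, readers = ', Base.Readers);
  finally
    Base.Free;
  end;
end.
```

The `try/finally` pairing is the point of the helper: callers who lock by hand have to remember the `Unlock` on every exit path.
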
### Transactions — declared

```pascal
base.BeginTransaction;
try
  for msg in batch do base.WriteMessage(msg);
  base.Commit;
except
  base.Rollback;
  raise;
end;
```

Backends implement per-format:
- **JAM:** defer `.jdx` index updates in memory, flush on Commit.
- **SDM:** temp-dir shadow, rename on Commit.
- **Hudson:** in-memory index delta, flush on Commit.
- **Squish:** frame-list delta, flush on Commit.
- **PKT (write):** temp file, rename on Commit.
- **Read-only mode:** Commit is a no-op.

Callers who don't call `Begin/Commit` still work — writes flush
per-call. Transactions are opt-in for atomicity.

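
The opt-in behaviour can be modelled with two counters, as in the hedged sketch below: writes outside a transaction flush immediately, writes inside one are deferred until `Commit`. `TSketchBase` and `ETransactionNested` are hypothetical names, and rejecting a second `BeginTransaction` is an assumption that follows the no-nesting idea raised under the open questions.

```pascal
program TransactionSketch;
{$mode objfpc}{$H+}

uses
  SysUtils;

type
  ETransactionNested = class(Exception);  { hypothetical name }

  TSketchBase = class
  private
    FInTransaction: boolean;
    FPending: integer;    { writes deferred until Commit }
    FFlushed: integer;    { writes that reached "disk" }
  public
    procedure BeginTransaction;
    procedure WriteMessage;
    procedure Commit;
    procedure Rollback;
    property Flushed: integer read FFlushed;
  end;

procedure TSketchBase.BeginTransaction;
begin
  if FInTransaction then
    raise ETransactionNested.Create('transaction already open');
  FInTransaction := True;
  FPending := 0;
end;

procedure TSketchBase.WriteMessage;
begin
  if FInTransaction then
    Inc(FPending)     { deferred }
  else
    Inc(FFlushed);    { no transaction: flush per call }
end;

procedure TSketchBase.Commit;
begin
  Inc(FFlushed, FPending);
  FPending := 0;
  FInTransaction := False;
end;

procedure TSketchBase.Rollback;
begin
  FPending := 0;      { deferred writes are dropped }
  FInTransaction := False;
end;

var
  Base: TSketchBase;
begin
  Base := TSketchBase.Create;
  try
    Base.WriteMessage;                 { flushed immediately }
    Base.BeginTransaction;
    Base.WriteMessage;                 { deferred }
    try
      Base.BeginTransaction;           { second call raises }
    except
      on E: ETransactionNested do
        WriteLn('nested begin rejected: ', E.Message);
    end;
    WriteLn('flushed before commit = ', Base.Flushed);
    Base.Commit;
    WriteLn('flushed after commit  = ', Base.Flushed);
  finally
    Base.Free;
  end;
end.
```
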
### Events — single callback

```pascal
TMsgEvent = record
  EventType: TMsgEventType;     { BaseOpened, MessageRead, etc. }
  Source:    TMsgBase;          { may be nil for lib-global events }
  Subject:   AnsiString;        { path, area tag, msgid }
  Detail:    AnsiString;        { human-readable }
  LongValue: int64;             { count, size, offset }
  TimeStamp: TDateTime;
end;

TMsgEventCallback = procedure(const E: TMsgEvent) of object;

base.OnEvent := @MyHandler;     { one pointer, that's it }
```

comet does this with `OnLog`. One callback, caller multiplexes if
they need multiple observers. No multi-subscriber registry. Radical
simplification.

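
If a caller does need several observers, the fan-out can live entirely on the caller's side. The sketch below is one possible shape, assuming a trimmed-down `TMsgEvent` with only a `Detail` field; `TFanOut` and `TLogger` are hypothetical classes, and the `OnEvent` variable stands in for `base.OnEvent`.

```pascal
program EventFanOutSketch;
{$mode objfpc}{$H+}

type
  { Trimmed-down event record; the proposal's TMsgEvent has more fields. }
  TMsgEvent = record
    Detail: AnsiString;
  end;
  TMsgEventCallback = procedure(const E: TMsgEvent) of object;

  { Caller-side multiplexer: the library sees one callback (Dispatch),
    the application registers as many observers as it likes. }
  TFanOut = class
  private
    FObservers: array of TMsgEventCallback;
  public
    procedure Add(ACallback: TMsgEventCallback);
    procedure Dispatch(const E: TMsgEvent);
  end;

  TLogger = class
    Prefix: AnsiString;
    procedure Handle(const E: TMsgEvent);
  end;

procedure TFanOut.Add(ACallback: TMsgEventCallback);
begin
  SetLength(FObservers, Length(FObservers) + 1);
  FObservers[High(FObservers)] := ACallback;
end;

procedure TFanOut.Dispatch(const E: TMsgEvent);
var
  i: integer;
begin
  for i := 0 to High(FObservers) do
    FObservers[i](E);
end;

procedure TLogger.Handle(const E: TMsgEvent);
begin
  WriteLn(Prefix, E.Detail);
end;

var
  Fan: TFanOut;
  LogA, LogB: TLogger;
  OnEvent: TMsgEventCallback;   { stands in for base.OnEvent }
  Ev: TMsgEvent;
begin
  Fan := TFanOut.Create;
  LogA := TLogger.Create;
  LogB := TLogger.Create;
  try
    LogA.Prefix := 'log: ';
    LogB.Prefix := 'stats: ';
    Fan.Add(@LogA.Handle);
    Fan.Add(@LogB.Handle);
    OnEvent := @Fan.Dispatch;   { the library only ever sees this pointer }
    Ev.Detail := 'base opened';
    OnEvent(Ev);
  finally
    LogB.Free;
    LogA.Free;
    Fan.Free;
  end;
end.
```
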
### Format detection is the default API

```pascal
{ Primary — sniff the path, pick the backend. }
function OpenMsgBase(const Path: AnsiString;
                     Mode: TMsgOpenMode): TMsgBase;

{ Escape hatch — force a specific format. }
function OpenMsgBaseAs(Format: TMsgBaseFormat;
                       const Path: AnsiString;
                       Mode: TMsgOpenMode): TMsgBase;
```

Fingerprints:
- JAM: `.jhr` + `.jdx` pair, `"JAM\0"` signature
- Squish: `.sqd` + `.sqi`
- Hudson: `MSGINFO.BBS` + `MSGHDR.BBS` + ...
- GoldBase: `MSGINFO.DAT` + ...
- PCBoard: `.IDX` + `.MSG` pair
- EzyCom: `MH*.BBS` + `MT*.BBS`
- Wildcat: WC SDK marker file
- SDM: directory full of numbered `.msg` files (fallback)

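
To make the fingerprint idea concrete, here is a rough sketch covering only the JAM and Squish rows. `TSketchFormat`, `HasJamSignature`, and `SniffFormat` are hypothetical names, and treating the path as an extension-less base name with lower-case extensions is an assumption; the real detector may probe differently.

```pascal
program SniffSketch;
{$mode objfpc}{$H+}

uses
  Classes, SysUtils;

type
  TSketchFormat = (sfUnknown, sfJam, sfSquish);

{ True when the file starts with the 4-byte signature 'J','A','M',#0. }
function HasJamSignature(const JhrPath: AnsiString): boolean;
var
  S: TFileStream;
  Sig: array[0..3] of AnsiChar;
begin
  Result := False;
  S := TFileStream.Create(JhrPath, fmOpenRead or fmShareDenyNone);
  try
    if S.Read(Sig, SizeOf(Sig)) = SizeOf(Sig) then
      Result := (Sig[0] = 'J') and (Sig[1] = 'A') and
                (Sig[2] = 'M') and (Sig[3] = #0);
  finally
    S.Free;
  end;
end;

{ BasePath is assumed to be the area path without extension. }
function SniffFormat(const BasePath: AnsiString): TSketchFormat;
begin
  if FileExists(BasePath + '.jhr') and FileExists(BasePath + '.jdx') and
     HasJamSignature(BasePath + '.jhr') then
    Result := sfJam
  else if FileExists(BasePath + '.sqd') and FileExists(BasePath + '.sqi') then
    Result := sfSquish
  else
    Result := sfUnknown;
end;

var
  Base: AnsiString;
  S: TFileStream;
  Sig: array[0..3] of AnsiChar = ('J', 'A', 'M', #0);
begin
  { Build a throwaway fake JAM pair in the temp directory. }
  Base := GetTempDir + 'sniffsketch';
  S := TFileStream.Create(Base + '.jhr', fmCreate);
  try
    S.WriteBuffer(Sig, SizeOf(Sig));
  finally
    S.Free;
  end;
  TFileStream.Create(Base + '.jdx', fmCreate).Free;

  WriteLn('fake pair detected as JAM: ', SniffFormat(Base) = sfJam);
  WriteLn('missing base is unknown:   ',
          SniffFormat(Base + '-missing') = sfUnknown);

  DeleteFile(Base + '.jhr');
  DeleteFile(Base + '.jdx');
end.
```

Checking the signature in addition to the file pair keeps a stray `.jhr` from another tool from being mistaken for a JAM base.
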
### One unit per format, self-registering

```pascal
uses
  OpenMsgLib,            { core + factory }
  OpenMsgLib.Jam,        { registers JAM format in initialization }
  OpenMsgLib.Hudson;     { registers Hudson in initialization }
```

No `.uni` split. The format unit's `initialization` block calls
`RegisterFormat(..., @Factory)`. Inclusion is the registration.

### `TOutboundBatch` — the missing half of the tosser

The symmetric twin of `TPacketBatch` (inbound). Details in the
appendix; the shape:

```pascal
TOutboundBatch = class
  constructor Create(const AOutboundDir: AnsiString;
                     const AOurAka: TFtnAddr);

  { Append a message to the outbound packet for (Target, Flavour).
    Caches the open pkt per (Target, Flavour) pair so repeat calls
    write to the same file until rotation or Flush. }
  function DispatchMessage(const Msg: TMsgRecord;
                           const Target: TFtnAddr;
                           Flavour: FlavourType): boolean;

  { Finalise every cached pkt — write terminator, rename .tmp →
    final, update .flo if configured. Idempotent. }
  procedure Flush;

  property MaxPktSizeKB: longint;   { 0 = unlimited; rotate at threshold }
  property OnEvent: TMsgEventCallback;
end;
```

Two features missing in both NR and fpc-msgbase today, baked in
here from day one:

- **Packet size rotation.** Before each write, check cached stream
  size + estimated msg size. If over threshold, close current
  (`.tmp` → `.pkt` rename), open next.
- **Atomic finalize.** Writes go to `xxxxxxxx.pkt.tmp`. Only `Flush`
  (or rotation) renames to `.pkt`. Crash mid-run leaves an orphan
  `.tmp` — not a corrupt real packet.

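
The two bullets above reduce to a size comparison and a rename. The sketch below shows both under stated assumptions: `NeedsRotation` and `FinalizePkt` are hypothetical helper names, the packet file name is made up, and the demo writes an empty placeholder file to the temp directory instead of a real packet.

```pascal
program RotationSketch;
{$mode objfpc}{$H+}

uses
  SysUtils;

{ True when appending EstimatedSize bytes would push the packet past the
  configured limit. MaxPktSizeKB = 0 means unlimited. }
function NeedsRotation(WrittenSize, EstimatedSize: int64;
                       MaxPktSizeKB: longint): boolean;
begin
  Result := (MaxPktSizeKB > 0) and
            (WrittenSize + EstimatedSize > int64(MaxPktSizeKB) * 1024);
end;

{ Atomic finalize: the packet appears under its real name only once it
  is complete. Returns False if the rename fails. }
function FinalizePkt(const TmpPath: AnsiString): boolean;
begin
  { ChangeFileExt with '' strips the trailing '.tmp':
    0000abcd.pkt.tmp -> 0000abcd.pkt }
  Result := RenameFile(TmpPath, ChangeFileExt(TmpPath, ''));
end;

var
  Tmp: AnsiString;
  F: TextFile;
begin
  { 500 KiB written + 20 KiB pending against a 512 KiB limit }
  WriteLn('rotate at 500K+20K/512K: ',
          NeedsRotation(500 * 1024, 20 * 1024, 512));
  WriteLn('rotate at 100K+20K/512K: ',
          NeedsRotation(100 * 1024, 20 * 1024, 512));
  WriteLn('rotate when unlimited:   ',
          NeedsRotation(900 * 1024, 20 * 1024, 0));

  { Create an empty placeholder, then finalize it by rename. }
  Tmp := GetTempDir + '0000abcd.pkt.tmp';
  AssignFile(F, Tmp);
  Rewrite(F);
  CloseFile(F);
  if FinalizePkt(Tmp) then
    WriteLn('finalized ', ExtractFileName(ChangeFileExt(Tmp, '')));
  DeleteFile(ChangeFileExt(Tmp, ''));
end.
```
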
Format-agnostic: writes to Type-2 / 2+ / 2.2 pkts via the existing
`ma.fmt.pkt` backend. FTN routing stays in the caller (NR's
`FindRouteForNetmail`/`GetLinkForRoute` is fidoconf-specific, not
library material).

---

## Reference map — where to look in existing code

### Inside fpc-msgbase

| Concern | File | Relevant lines |
|---|---|---|
| Factory pattern to extend | `src/ma.api.pas` | full file |
| `TUniMessage` to grow with Extras | `src/ma.types.pas` | record definition, top of file |
| Existing inbound tosser to mirror for outbound | `src/ma.batch.pas` | `TPacketBatch`, `GetOrCreateBase` (line 296) |
| Example tosser template | `examples/example_tosser.pas` | `TSimpleTosser.OnMessage` |
| Existing lock layer to make explicit | `src/ma.lock.pas` | full file |
| Event registry to replace with single callback | `src/ma.events.pas` | full file |
| Per-format native → uni adapters to fold together | `src/formats/*.uni.pas` × 9 | each is ~100-200 LOC |
| Sample data for tests (if any) | `tests/` | check what's already there |

### In NetReader (`../netreader`)

NR has already solved several of these in its own idiom. The
outbound machinery is the most reusable reference:

| Concern | File | Relevant lines |
|---|---|---|
| Cached outbound packet per dest addr | `src/core/nr.scanner.pas` | `GetOutboundPacket` line 214, `CloseAllPackets` line 291 |
| Temp-pkt filename generator | `src/core/nr.scanner.pas` | `CreateTempPktFileName` call at 256 |
| Per-message route + pack logic | `src/core/nr.scanner.pas` | `PackMsg` line 551 |
| Route resolution (stays in NR) | `src/core/nr.scanner.pas` | `FindRouteForNetmail` 398, `GetLinkForRoute` 451 |
| Priority from Attr + FLAGS kludge | `src/core/nr.scanner.pas` | `PackMsg` lines 609-636 |
| Zone-aware outbound path | `src/core/nr.arcmail.pas` | `GetOutboundDir` (around line 178), `GetFLOPath` 219 |
| IsArcMailExt helper | `src/core/nr.arcmail.pas` | `IsArcMailExt` line 155 |
| Pkt header + message write | `src/core/nr.packet.pas` | `WritePktHeader`, `WritePktMessage` |
| Scanner's NoHighWaters pattern (useful for read-only verifiers) | `src/core/nr.scanner.pas` | `ScanNMArea` 1098 |
| JAM header-only read (NR's linker CRC fast path) | `src/msgbase/nr.msgbase.jam.pas` | `TNrJamMsgBase.ReadLinkHdr`, `TNrJamLinkHdr` record |

NR's entire CLI parity pass (last 7 commits in `../netreader`) is
built on this scaffolding. Pattern should transfer cleanly.

### Reference: format specs

The FTSC document collection at `/home/ken/Source Code/ftsc/docs/` and
the format-author specs (jam.txt, squish.doc, pcboard.doc, etc.) are
the authoritative source. Every backend cites the spec it implements
in `docs/ftsc-compliance.md`.

### In comet (`../comet`)

comet is the "plug it in and forget" model. Key patterns worth
mirroring:

- Single log callback (`OnLog`), no registry
- `TStream`-centric I/O — caller controls the stream
- Config hot-reload without API surgery
- Embeddable via narrow callback surface

See `../comet/README.md` "Embeddable" bullet.

---

## Test corpus — the `testmsg/` folder

**Status:** to be populated. Proposal: live at
`/home/ken/Source Code/fpc-msgbase/testmsg/` (checked in, anonymized,
git-tracked so rollback is just `git checkout`).

Proposed structure:

```
testmsg/
├── README.md                  how-to regenerate, licensing notes
├── jam/
│   ├── small_echo/            few-hundred-msg JAM area, real echoarea
│   ├── large_echo/            10k+ messages, stresses index growth
│   ├── deleted_mix/           area with tombstoned msgs for Pack tests
│   └── netmail/               JAM netmail w/ kludges
├── squish/
│   ├── small_echo/
│   └── netmail/
├── hudson/
│   └── 3-board/               multi-board fixture for Board field test
├── msg/                       FTS-1 numbered *.msg
│   ├── netmail/
│   └── echo/                  rare but tested
├── pcb/
│   └── sample_conf/
├── ezycom/
├── goldbase/
├── wildcat/
├── pkt/
│   ├── type2_plain/
│   ├── type2plus/
│   └── type2_2/
└── reference/
    ├── jam_small_echo.json    canonical-JSON snapshot
    ├── jam_large_echo.json    (round-trip baseline — DO NOT edit)
    └── ...                    one per corpus base
```

**Anonymization rules:** before check-in, scrub real user addresses
and passwords from kludges. A small helper `tools/anonymize.pas`
can do this deterministically — replaces real MSGID address with
`999:9999/9999`, replaces user names with `User<N>` tokens.

**Regeneration script:** `tools/regen_reference.sh` walks each corpus
base via this library, dumps canonical JSON to `testmsg/reference/`.
Committed output is the ground truth captured from a known-good build;
later test runs diff their output against the committed JSON.

**Rollback story:** `git checkout testmsg/` restores any corrupted
fixture. Keep fixtures small-ish (<50MB total across all formats)
so the repo stays cloneable on slow links.

---

## Byte-agreement cross-verifier — first actionable task

Before any redesign, **confirm the two libraries agree on the
bytes they read and write today**. Without this baseline, reshape
could silently regress behavior that currently works.

**Tool:** `tools/cross_verify.pas` — standalone FPC program.

**What it does:**

1. Open a corpus base with the existing fpc-msgbase.
2. Read all messages → dump each to a canonical JSON record.
3. Open the same base with NetReader's `nr.msgbase.*`.
4. Read all messages → dump to same canonical JSON format.
5. Diff the two outputs. Report mismatches per field.

For write verification:
1. Fabricate 100 messages with known content.
2. Write them through fpc-msgbase to a fresh JAM base.
3. Read them back through NR's backend, diff.
4. Repeat with write via NR, read via fpc-msgbase.

Expected outcome: both should agree for universal fields. Where
they disagree is where the proposal's `Extras` story becomes
important — those are the fields each side handles differently
(or one side silently drops).

**Deliverable:** a report noting exactly which fields differ per
format, so the Extras registry (above) is anchored in real data
rather than speculation.

This is ~2-4 hours of work and produces the single most important
input to the reshape.

---

## Implementation plan — six weeks

| Week | Deliverables |
|---|---|
| **1** | `testmsg/` corpus committed (3-5 bases per format, anonymized). `tools/cross_verify.pas` running. `tools/regen_reference.sh` producing committed canonical JSON. Baseline: which fields differ between fpc-msgbase and NR today? |
| **2** | Grow `TUniMessage` → `TMsgRecord` with `Extras` bag. Publish `docs/extras-registry.md` naming every well-known key. Backfill round-trip test: every backend reads a corpus base, writes to a fresh base, reads back, Extras map preserved. Fold `.uni` sidecars into format units. |
| **3** | Introduce `ITsmIO`. Refactor every format backend to take an `ITsmIO` instead of `TFileStream` directly. Add `TMemoryTsmIO` for tests. Run the full test suite in memory — no disk writes. |
| **4** | Land `TOutboundBatch` with size rotation + atomic finalize. Write `examples/example_multiplex` that splits inbound pkts by destination and writes per-link outbound pkts via `TOutboundBatch`. |
| **5** | Unify error model (typed `EMessageBase*` tree; boolean returns mean "nothing to do" only). Replace `TMessageEvents` registry with single `OnEvent` callback. `TMsgBase` locking becomes explicit (no implicit `LockForRead` on Open). `WithReadLock` / `WithWriteLock` helpers. |
| **6** | Documentation pass. Full API reference regen. `CHANGELOG.md` with 0.1 → 1.0 migration notes. Version constants `MA_VERSION_MAJOR`/`MINOR` + runtime `MaRequireVersion`. `{$DEFINE MA_WITH_<FORMAT>}` gates finalized. Ship 1.0. |

Each week's work ships independently — no big-bang merge.

---

## Beyond the six-week plan — things worth planning

### Version ABI discipline

```pascal
const
  MA_VERSION_MAJOR = 1;
  MA_VERSION_MINOR = 0;

{ Caller invokes at program start. Raises if compiled against
  major < required or (major == required AND minor < required). }
procedure MaRequireVersion(Major, Minor: integer);
```

Libraries consumed by multiple callers (Fimail, NetReader, third-party)
need a noisy "you linked the wrong version" failure mode. Not
"weird behavior six months later."

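
A possible body for `MaRequireVersion`, following the rule stated in the comment above, might look like this. The `EMaVersion` exception class and the message wording are assumptions of the sketch.

```pascal
program RequireVersionSketch;
{$mode objfpc}{$H+}

uses
  SysUtils;

const
  MA_VERSION_MAJOR = 1;
  MA_VERSION_MINOR = 0;

type
  { Hypothetical exception class for this sketch. }
  EMaVersion = class(Exception);

{ Raises when the linked library is older than the caller requires:
  major too low, or same major with minor too low. }
procedure MaRequireVersion(Major, Minor: integer);
begin
  if (MA_VERSION_MAJOR < Major) or
     ((MA_VERSION_MAJOR = Major) and (MA_VERSION_MINOR < Minor)) then
    raise EMaVersion.CreateFmt(
      'fpc-msgbase %d.%d linked, caller requires at least %d.%d',
      [MA_VERSION_MAJOR, MA_VERSION_MINOR, Major, Minor]);
end;

begin
  MaRequireVersion(1, 0);           { satisfied: passes silently }
  WriteLn('1.0 requirement met');
  try
    MaRequireVersion(1, 2);         { library is 1.0, caller wants 1.2 }
  except
    on E: EMaVersion do
      WriteLn('refused: ', E.Message);
  end;
end.
```

Note that this rule lets a newer major version through; whether a major bump should also be refused is a policy choice the proposal leaves open.
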
### Read-only airtightness

`momReadOnly` should be verifiable. Regression test:

1. Copy a base to a read-only mount (`chmod -w` + `mount -o ro`).
2. `OpenMsgBase(path, momReadOnly)`.
3. Read 10k messages, every field.
4. Assert: every file's `st_mtime` is unchanged; no syscall fired
   that opens a file for write.

A BBS running concurrently with a tosser depends on this being
airtight — no "oops, we updated last-read" surprises.

### Build-time format gating

```pascal
{ config.inc }
{$DEFINE MA_WITH_JAM}
{$DEFINE MA_WITH_HUDSON}
{$DEFINE MA_WITH_SQUISH}
{ $DEFINE MA_WITH_WILDCAT}    { commented out — 66K LOC pulled in }
```

```pascal
{ OpenMsgLib.All.pas — convenience include-all unit }
unit OpenMsgLib.All;
interface
uses
  OpenMsgLib
  {$IFDEF MA_WITH_JAM}, OpenMsgLib.Jam{$ENDIF}
  {$IFDEF MA_WITH_HUDSON}, OpenMsgLib.Hudson{$ENDIF}
  ...
  ;
end.
```

Embedders control what ships. A full tosser wants everything.
A minimal BBS UI wants JAM only. No one should be forced to
compile BTrieve-era SDK code for a JAM reader.

### Documentation structure

```
docs/
├── API.md                     full API reference (regen each 1.x)
├── architecture.md            layered design (update to three-tier)
├── extras-registry.md         well-known Extras keys per format
├── ftsc-compliance.md         spec notes
├── migration-0.1-to-1.0.md    for existing callers
├── embedder-guide.md          for BBS/tosser authors
└── format-notes/              per-format quirks & gotchas
    ├── jam.md
    ├── squish.md
    └── ...
```

### CI expectations

When the repo has CI (GitHub Actions, GitLab CI, whatever), the
test job is:

1. Build with all formats enabled.
2. Build with each format *disabled* in turn — prove conditional
   compilation holds.
3. Run `run_tests.sh`.
4. Run `tools/cross_verify.pas` against the corpus.
5. Verify `docs/extras-registry.md` lists every key any backend
   writes (grep the source for `Extras.SetValue`).

---

## Contribution path back from NetReader

NR has six weeks of HPT-parity work sitting on top of its own
msgbase. When the reshape hits 1.0, NR has a decision:

- **A.** Adopt fpc-msgbase wholesale — drop `nr.msgbase.*`, call
  the lib. NR becomes a thin areafix + scanner + CLI over the
  shared library. Big commit, one-time pain.
- **B.** Keep NR's backends, cherry-pick fpc-msgbase's event
  dispatcher / lock model / outbound batch. Lighter touch.
- **C.** Contribute NR's improvements (JAM CRC fast-path,
  case-insensitive `.msg` globbing, netmail-cfg-writer fix,
  FTS-0004 tag validation) back to fpc-msgbase. Symmetric win.

Option C is the first step regardless of whether A or B lands
later. NR's recent work is format-agnostic quality improvements
that every embedder wants.

---

## Open questions for the implementation session

Before any code changes:

1. **Extras representation.** Key-value `AnsiString` bag is dead
   simple but slow for 100K messages. Alternatives:
   - `TDictionary<AnsiString, AnsiString>` (rtl-generics) — faster lookup
   - Packed binary blob with offsets — smallest memory footprint
   - Keep strings but cap each Extras to N keys (fixed-size array)

   Benchmark before picking.

2. **Transaction nesting.** Can `BeginTransaction` nest? JAM defers
   `.jdx` updates in memory — nested transactions just keep
   deferring. SDM's tempdir-shadow approach can't nest cleanly.
   Propose: **no nesting**. Second `BeginTransaction` call raises.

3. **Thread safety of `TMsgBase`.** Inbound `TPacketBatch` shares
   one base across workers, serialises via per-base CS. Works today.
   Does the reshape preserve that? Answer: yes, the explicit
   lock API makes it *more* obvious.

4. **Squish `.sql` lastread vs `.sqi` index.** This library treats
   `.sql` as the cross-process lock sentinel (matching the convention
   other Squish-aware tools use). The reshape should document this
   explicitly — it's a format-specific quirk that callers shouldn't
   need to know.

5. **PKT as a base vs as a stream.** `TMessageBase` abstraction
   assumes random-access read. A PKT is a forward-only stream of
   messages. Does PKT implement `Count`? (Reader would need to
   scan ahead to count.) Propose: PKT implements `Count` but
   flags `CanRandomAccess = False`, caller iterates via `MoveNext`
   instead of `Read(i)`. Callers who treat PKT as a base get a
   clear exception.

6. **Wildcat SDK cleanup.** Is the 40K-LOC `wc_sdk/` still needed,
   or can it be replaced with a narrower interop layer? (Not a
   1.0 blocker, but worth scoping for 1.1.)

Decisions on these should go in `docs/design-decisions.md` as
they land, so future sessions don't re-litigate.

---

## First-session actionable steps

If this proposal makes sense to the next implementer, the first
session should:

1. **Read this document end-to-end.** Cross-check my mapping of
   fpc-msgbase's current state against the actual code — call out
   anything stale.

2. **Create `testmsg/` with a corpus.** Start with ONE format
   (JAM — best-spec'd, most used) and 3 bases: small echoarea
   (~100 msgs), large echoarea (~10K), netmail directory. Commit.

3. **Write `tools/cross_verify.pas`.** Use NR's `nr.msgbase.jam`
   and fpc-msgbase's `ma.fmt.jam` as the two readers. Dump
   canonical JSON, diff. Report.

4. **Report the diff.** What fields do the two libraries disagree
   on today? That diff becomes the initial `extras-registry.md`.

5. **Stop. Discuss.** Before any backend refactor, the cross-diff
   report informs every architectural decision in this proposal.
   If the two libraries disagree on 20% of fields, the Extras story
   is validated. If they agree on 100%, the Extras story is more
   about future-proofing than immediate need.

Only after the cross-diff is in hand does the week-1 plan above
make sense.

---

## Appendix: the `TOutboundBatch` design in full

(From the cross-project conversation; reproduced here for
completeness.)

```pascal
unit OpenMsgLib.Outbound;

{$mode objfpc}{$H+}

interface

uses
  Classes, SysUtils,
  OpenMsgLib, OpenMsgLib.Types, OpenMsgLib.Pkt;

type
  TOutboundBatch = class
  private
    FOutboundDir:  AnsiString;
    FOurAka:       TFtnAddr;
    FMaxPktSizeKB: longint;
    FCacheCS:      TRTLCriticalSection;
    FCache:        TFPHashList;  { key: "zone:net/node.point|flavour" -> TEntry }
    FOnEvent:      TMsgEventCallback;

    function  GetOrCreateEntry(const Target: TFtnAddr;
                               Flavour: FlavourType): TEntry;
    procedure RotateIfOver(Entry: TEntry; EstimatedSize: integer);
    procedure FinalizeEntry(Entry: TEntry);   { rename .tmp → final }
  public
    constructor Create(const AOutboundDir: AnsiString;
                       const AOurAka: TFtnAddr);
    destructor Destroy; override;

    function DispatchMessage(const Msg: TMsgRecord;
                             const Target: TFtnAddr;
                             Flavour: FlavourType): boolean;
    procedure Flush;              { finalize every cached entry }

    property MaxPktSizeKB: longint read FMaxPktSizeKB write FMaxPktSizeKB;
    property OnEvent: TMsgEventCallback read FOnEvent write FOnEvent;
  end;
```

**Per-entry:**

```pascal
TEntry = class
  Key:         AnsiString;           { "zone:net/node.point|flavour" }
  Target:      TFtnAddr;
  Flavour:     FlavourType;
  Stream:      TStream;              { writer, actually holds an ITsmIO under the hood }
  TmpPath:     AnsiString;           { xxxxxxxx.pkt.tmp }
  FinalPath:   AnsiString;           { xxxxxxxx.pkt — set on Rotate/Flush }
  CS:          TRTLCriticalSection;  { serialises writes to this one pkt }
  WrittenSize: int64;
end;
```

**Flow inside `DispatchMessage`:**

1. Lookup entry by `(Target, Flavour)` in `FCache`.
   - Miss: create entry, open `TmpPath`, write pkt header, cache.
   - Hit: enter its CS.
2. `RotateIfOver(Entry, EstimatedMsgSize)`:
   - If `WrittenSize + EstimatedMsgSize > MaxPktSizeKB * 1024`
     (and `MaxPktSizeKB > 0`):
     - Write terminator, close, rename `.tmp → .pkt`.
     - Create a new `.tmp`, write header, reset `WrittenSize`.
     - Fire `metPktRotated`.
3. Convert `TMsgRecord` → `TPktMessage` via `UniToPkt`.
4. Write via `OpenMsgLib.Pkt` writer. Update `WrittenSize`.
5. Fire `metMessageWritten`.
6. Leave entry CS.

**Flush:** iterate `FCache`, for each entry write terminator,
close, `FinalizeEntry`. Fire `metBatchFinalized`.

**Crash recovery:** on startup, scanner sees orphan `xxxxxxxx.pkt.tmp`
files. Decision policy (caller-configurable):
- **Discard** (default): delete orphan tmps, assume corrupt.
- **Recover**: try to validate pkt header + terminator, if valid
  rename to `.pkt`, else discard.

---

## References to the cross-project conversation

Full transcript of the NR ↔ fpc-msgbase discussion lives in the
session log at
`/home/ken/.claude/projects/-home-ken-Source-Code-netreader/*.jsonl`
(dates around 2026-04-15). Relevant decisions captured in this
document; the session log has the reasoning trail if questions
arise.

Key points from that conversation, reproduced for the next session:

- **Why not fork fpc-msgbase into a new project?** Because fpc-msgbase
  has tested format backends already. Forking means re-validating
  them. Reshape in place preserves that investment.
- **Why the five-line Hello World test?** comet achieves
  plug-and-forget with a single log callback and a TStream-based
  API. That's the bar. If the lib requires more ceremony than
  "open path, read, close," it's not there yet.
- **Why explicit locking, not implicit?** BBSes with their own
  global mutex don't want the lib double-locking. Stateless
  readers don't need cross-process locks. Library guessing leads
  to surprises. Explicit means embedders can always reason about
  behavior.
- **Why single callback, not registry?** comet's `OnLog` proves
  one pointer is enough. Multi-observer is the caller's problem —
  they can write their own fan-out if needed. Registry adds state
  the library shouldn't own.

---

## Closing

This is a design proposal, not a mandate. The next implementer
should push back on anything that doesn't hold up against real
code. The testmsg corpus + cross-verifier gives us the data to
have that conversation grounded in bytes rather than opinion.

When in doubt: simpler is better. comet's model works because it
refused to do things the library didn't absolutely need to do.
fpc-msgbase 1.0 should exit with **less code, not more**, than
0.1.0 — the reshape is about architectural clarity, not feature
addition. Features come in 1.1+ on top of a clean 1.0.