Ken Johnson 1e253e8a78 Phase 5: attribute registry + arch / proposal / README updates

New docs/attributes-registry.md publishes the canonical attribute
key catalog in four tiers:

  1. Universal headers — msg.num, from, to, subject, date.*, addr.*,
     area, board, cost.  Every Fido format carries them.
  2. Canonical attribute bits — attr.private, attr.crash, etc.,
     mapped to/from the FTS-1 attribute word.
  3. FTSC kludges — msgid, replyid, pid, tid, flags, chrs, tzutc,
     seen-by, path, via.  Multi-line keys use #13 between lines.
  4. Format-specific — jam.*, squish.*, hudson.*, goldbase.*, ezy.*,
     pcb.*, wildcat.*, pkt.*, msg.*.  Each backend's namespace.

Plus a per-format support matrix showing which keys each backend
carries. Authoritative source remains each backend's
ClassSupportedAttributes -- the matrix can drift; SupportsAttribute()
is the runtime-correct query.

docs/architecture.md TUniMessage section rewritten:
- Documents the strict two-area model (Body + Attributes only).
- Body holds only the message text, never kludges or headers.
- Library never composes presentation -- consumers walk Attributes
  and assemble their own display.
- Adds the capabilities API section pointing at the registry.
- Removes the stale "kludge lines intact and CR-separated" promise
  the previous adapter implementations didn't honor.

docs/PROPOSAL.md flags the original Extras-bag section as
SUPERSEDED 2026-04-17, points to the registry + architecture docs
as the live design. Original text retained as historical context
since it captures the conversation that drove the redesign.

README.md:
- Features list now leads with the lossless two-area model and the
  capabilities API.
- Adds a Status note flagging 0.2 as a breaking change vs 0.1 with
  a one-paragraph migration sketch (msg.WhoFrom -> Attributes.Get
  ('from'), etc.).
- Documentation index links to the new registry doc.

2026-04-17 14:35:19 -07:00

fpc-msgbase — plug-and-forget redesign proposal

Status: design handover, pre-implementation. Prepared 2026-04-15. Target audience: the next implementation session. Precondition: fpc-msgbase is at 0.1.0 (commit b79e7fb at the time of writing); 9 format backends implemented from FTSC and format-author specs, 7/7 tests green.

TL;DR

fpc-msgbase today solves the read side cleanly and ports 9 formats with tested fidelity. It is not yet plug-and-forget for embedders the way comet is. Four things are missing: a lossless round-trip guarantee, an atomic outbound-packet builder (TOutboundBatch), format-agnostic I/O injection, and a single-callback event model.

This proposal is not a rewrite — it's a six-week reshape in place. The format backends stay. The scaffolding changes.

When done, the acceptance test is a five-line Hello World that opens a path, reads messages, closes. No format name, no lock ceremony, no event registry, no .uni sidecar unit, no init-order hazard.

Context: where this proposal came from

A cross-project conversation between NetReader (../netreader, the HPT-drop-in mail tosser) and fpc-msgbase. NetReader has its own format layer (src/core/nr.msgbase.*) hardened by a 39/39 live trial and a 54/54 CLI-parity test suite.

During that conversation we asked:

- Do the two libraries agree on bytes when reading/writing the same base? (Unverified. First action item below.)
- What would "plug it in and forget" look like for fpc-msgbase?

The conclusions that drove this proposal:

- fpc-msgbase's TPacketBatch handles the inbound concurrent tosser side cleanly. It does not handle the symmetric outbound side (writing per-link packets with size rotation and atomic finalize). Neither library does, today.
- TUniMessage as-currently-defined is lossy: JAM MsgIdCRC/ReplyCRC, DateProcessed, and every other format-specific header field has no canonical slot, so a caller that does read → write through the unified API silently loses bytes.
- The two-unit-per-format inclusion pattern (ma.fmt.jam + ma.fmt.jam.uni) asks callers to remember both. The .uni sidecar's only job is initialization registration, which belongs inside the format unit.
- wc_sdk/ is ~66K lines of BTrieve-era Pascal pulled in unconditionally by any caller who uses Wildcat. Embeddability requires build-time gating.
- The error model mixes typed exceptions (EMessageBase from the factory) with boolean returns (all other methods) with event fires (locking). Three mental models for one library is two too many.

The reshape below addresses each.

Current state — file-by-file

| Unit | LOC | Purpose | Keep? |
| --- | --- | --- | --- |
| `src/ma.api.pas` | ~330 | `TMessageBase` abstract, factory | Reshape |
| `src/ma.types.pas` | ~600 | `TUniMessage`, `TFTNAddress`, bits | Reshape |
| `src/ma.events.pas` | ~100 | multi-subscriber event registry | Replace with single callback |
| `src/ma.lock.pas` | ~250 | 3-layer locking | Keep, make explicit |
| `src/ma.paths.pas` | ~180 | per-format path derivation | Keep |
| `src/ma.batch.pas` | 333 | inbound `TPacketBatch` | Keep + add `TOutboundBatch` twin |
| `src/formats/ma.fmt.<fmt>.pas` × 9 | 400–1300 each | native spec-driven backend | Keep (hardened) |
| `src/formats/ma.fmt.<fmt>.uni.pas` × 9 | 100–200 each | adapter to `TUniMessage` | Fold into the format unit |
| `src/wc_sdk/` | ~40K | Wildcat BTrieve SDK | Gate behind `{$DEFINE MA_WITH_WILDCAT}` |

Tests: tests/test_*.pas → keep, extend with the round-trip corpus below.

The five-line acceptance test

When the reshape is done, this works unchanged across every format:

program HelloMsgBase;
{$mode objfpc}{$H+}   { try/finally needs objfpc (or delphi) mode }
uses OpenMsgLib, OpenMsgLib.Auto;
var B: TMsgBase; M: TMsgRecord; i: integer;
begin
  B := OpenMsgBase('/path/to/base', momReadOnly);
  try
    for i := 0 to B.Count - 1 do
      if B.Read(i, M) then WriteLn(M.Subject);
  finally B.Free; end;
end.

No format name (autodetect). No lock call (default = no lock, caller opts in). No event wiring (default = silent). No .uni unit (one unit per format registers itself). No init-order hazard (single registration point).

If a new session reads a test base and the above doesn't work verbatim, the reshape isn't done.

Target architecture

Three-tier layering

Caller (BBS, tosser, editor, importer)
    │
    ├── TMsgBase (unified)              ← most callers live here
    ├── Direct format class             ← drop-down when Extras aren't enough
    └── Raw stream                      ← replay, test, encrypt, mock
            │
            ▼
    Format backends speak to ITsmIO — never TFileStream directly
            │
            ▼
    ITsmIO adapters: file | memory | encrypted | test harness

Why add ITsmIO:

- Test backends without hitting disk (in-memory fixtures)
- Wrap the lib in encryption/compression
- Run on network mounts with unusual locking semantics
- Replay captured corrupt frames for debugging

`TMsgRecord` with lossless `Extras`

SUPERSEDED 2026-04-17. The "named fields + Extras bag" hybrid below was rejected during implementation in favour of a stricter two-area model: Body holds only the message text; everything else (from/to/subject/dates/addresses + every kludge + every format-specific field) is an attribute. See docs/attributes-registry.md for the key catalog and docs/architecture.md for the updated TUniMessage contract. The original Extras-bag design is retained below as historical context.

The capabilities API discussed in this section landed essentially as proposed (base.SupportsAttribute(K) + class-level ClassSupportedAttributes).

{ Declared first because TMsgRecord refers to it.  Methods on a record
  need the advancedrecords modeswitch in objfpc mode. }
TMsgExtras = record
  function  Get(const Key, Default: AnsiString): AnsiString;
  procedure SetValue(const Key, Value: AnsiString);
  function  Has(const Key: AnsiString): boolean;
end;

TMsgRecord = record
  { Universal fields every format has: }
  Index:        longint;
  From, To_:    AnsiString;
  Subject:      AnsiString;
  DateWritten:  TDateTime;
  DateArrived:  TDateTime;
  OrigAddr:     TFtnAddr;
  DestAddr:     TFtnAddr;
  Attr:         cardinal;        { canonical MSG_ATTR_* bitset }
  Body:         AnsiString;      { kludges + text, CR-separated }

  { Backend-specific fields preserved verbatim across round-trips.
    Every key a backend WRITES during a Read it MUST re-consume
    during a subsequent Write.  Test harness enforces this. }
  Extras:       TMsgExtras;
end;

Well-known keys (published in docs/extras-registry.md, to be written):

| Key | Type | Source | Notes |
| --- | --- | --- | --- |
| `jam.msgidcrc` | hex u32 | JAM `.jhr` fixed header | needed for NR linker `-j` fast path |
| `jam.replycrc` | hex u32 | JAM `.jhr` fixed header | same |
| `jam.dateprocessed` | unix int | JAM `.jhr` | tosser timestamp |
| `jam.passwordcrc` | hex u32 | JAM `.jhr` | per-msg password |
| `jam.cost` | int | JAM `.jhr` | |
| `squish.umsgid` | hex u32 | SQI frame | unique msg ID |
| `hudson.board` | int 1..200 | MSGINFO | board number |
| `hudson.refer` | int | HDR | refer-to ptr |
| `pcb.confnum` | int | PCB `.IDX` | conference |
| `pcb.refer` | int | PCB `.IDX` | |
| `ezy.msgflags` | hex byte | MH#####.BBS | |
| `goldbase.userid` | int | IDX | |
| `wildcat.confnum` | int | WC SDK | |
| `pkt.cost` | int | Type-2 header | |
| `pkt.flavour` | enum | outbound pkt only | crash/hold/direct/imm/norm |

Round-trip invariant:

After Read(i, M), calling Write(M) on the same or a different base of the same format produces bytes that round-trip through Read to an identical TMsgRecord (same Extras keys, same values).

Tests enforce this across the full corpus.

Error model — one tree

EMessageBase
  ├─ EMessageBaseIO         disk full, permission, corrupt read
  ├─ EMessageBaseLock       timeout, contention, deadlock
  ├─ EMessageBaseFormat     bad signature, truncated header
  ├─ EMessageBaseRange      Index out of [0..Count)
  └─ EMessageBaseClosed     operation on a freed/closed base

Every method either succeeds or raises. Boolean returns mean "nothing to do" (empty Pack, no new messages, no matching records) — never failure. A single try/except on E: EMessageBase do ... catches the whole tree.
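
As a minimal sketch of how the tree above behaves for a caller, the program below declares the five subclasses and catches one of them through the root class. `CheckIndex` is a hypothetical helper invented for the illustration; it is not an existing fpc-msgbase routine.

```pascal
program ErrorTreeSketch;
{$mode objfpc}{$H+}
uses SysUtils;

type
  { One root, five leaves, mirroring the tree in the proposal. }
  EMessageBase       = class(Exception);
  EMessageBaseIO     = class(EMessageBase);
  EMessageBaseLock   = class(EMessageBase);
  EMessageBaseFormat = class(EMessageBase);
  EMessageBaseRange  = class(EMessageBase);
  EMessageBaseClosed = class(EMessageBase);

{ Hypothetical helper: raises when Index is outside [0..Count). }
procedure CheckIndex(Index, Count: longint);
begin
  if (Index < 0) or (Index >= Count) then
    raise EMessageBaseRange.CreateFmt('index %d out of [0..%d)', [Index, Count]);
end;

begin
  try
    CheckIndex(5, 3);
  except
    { A single handler on the root class catches every subclass. }
    on E: EMessageBase do
      WriteLn(E.ClassName, ': ', E.Message);
  end;
  { prints: EMessageBaseRange: index 5 out of [0..3) }
end.
```

Callers that care about one failure class can still put a narrower `on E: EMessageBaseLock do` handler ahead of the root handler.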

Locking — explicit, not implicit

TMsgBase = class
  procedure LockForRead;
  procedure LockForWrite;
  procedure Unlock;
  function  TryLockForRead (TimeoutMs: integer): boolean;
  function  TryLockForWrite(TimeoutMs: integer): boolean;

  { Common-case one-liners. }
  procedure WithReadLock (AProc: TProc);
  procedure WithWriteLock(AProc: TProc);
end;

Default Open acquires no lock. Callers choose. A read-only BBS frontend doesn't need cross-process locking; a tosser does. Library doesn't guess. Common case stays a one-liner via WithReadLock.
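
The one-liner helper could plausibly be a thin try/finally wrapper around the explicit calls. The sketch below assumes that shape; `TStubBase` and `TLockedProc` are stand-ins invented for the example (the stub counts lock depth instead of touching real lock files, and a plain procedure type replaces the proposal's TProc).

```pascal
program WithReadLockSketch;
{$mode objfpc}{$H+}

type
  { Stand-in for the proposal's TProc; a plain procedure keeps the sketch small. }
  TLockedProc = procedure;

  { Hypothetical stub: tracks lock depth only. }
  TStubBase = class
  private
    FDepth: integer;
  public
    procedure LockForRead;
    procedure Unlock;
    procedure WithReadLock(AProc: TLockedProc);
    property Depth: integer read FDepth;
  end;

procedure TStubBase.LockForRead;
begin
  Inc(FDepth);
end;

procedure TStubBase.Unlock;
begin
  Dec(FDepth);
end;

procedure TStubBase.WithReadLock(AProc: TLockedProc);
begin
  LockForRead;
  try
    AProc();
  finally
    Unlock;   { runs even when AProc raises }
  end;
end;

procedure ReadSomething;
begin
  WriteLn('reading under lock');
end;

var
  Base: TStubBase;
begin
  Base := TStubBase.Create;
  try
    Base.WithReadLock(@ReadSomething);
    WriteLn('depth after call: ', Base.Depth);
  finally
    Base.Free;
  end;
end.
```

The point of the wrapper is that the unlock cannot be forgotten on an exception path, which is the main hazard of handing callers explicit lock calls.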

Transactions — declared

base.BeginTransaction;
try
  for msg in batch do base.WriteMessage(msg);
  base.Commit;
except
  base.Rollback;
  raise;
end;

Backends implement per-format:

- JAM: defer .jdx index updates in memory, flush on Commit.
- SDM: temp-dir shadow, rename on Commit.
- Hudson: in-memory index delta, flush on Commit.
- Squish: frame-list delta, flush on Commit.
- PKT (write): temp file, rename on Commit.
- Read-only mode: Commit is a no-op.

Callers who don't call Begin/Commit still work — writes flush per-call. Transactions are opt-in for atomicity.
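
To make the "defer in memory, flush on Commit" idea concrete, here is a rough sketch using a stub where a TStringList stands in for the on-disk index. `TStubBase` is hypothetical, and the refusal to nest reflects the no-nesting proposal from the open questions rather than a settled decision.

```pascal
program TransactionSketch;
{$mode objfpc}{$H+}
uses Classes, SysUtils;

type
  { Hypothetical stub: a TStringList stands in for the on-disk index. }
  TStubBase = class
  private
    FIndex:   TStringList;   { "flushed" entries }
    FPending: TStringList;   { entries deferred while a transaction is open }
    FInTxn:   boolean;
  public
    constructor Create;
    destructor Destroy; override;
    procedure BeginTransaction;
    procedure WriteMessage(const Subject: AnsiString);
    procedure Commit;
    procedure Rollback;
    function  Count: integer;
  end;

constructor TStubBase.Create;
begin
  inherited Create;
  FIndex := TStringList.Create;
  FPending := TStringList.Create;
end;

destructor TStubBase.Destroy;
begin
  FPending.Free;
  FIndex.Free;
  inherited Destroy;
end;

procedure TStubBase.BeginTransaction;
begin
  if FInTxn then
    raise Exception.Create('transactions do not nest');
  FInTxn := True;
end;

procedure TStubBase.WriteMessage(const Subject: AnsiString);
begin
  if FInTxn then
    FPending.Add(Subject)   { deferred until Commit }
  else
    FIndex.Add(Subject);    { no transaction: flush per call }
end;

procedure TStubBase.Commit;
begin
  FIndex.AddStrings(FPending);
  FPending.Clear;
  FInTxn := False;
end;

procedure TStubBase.Rollback;
begin
  FPending.Clear;           { nothing deferred ever reaches the index }
  FInTxn := False;
end;

function TStubBase.Count: integer;
begin
  Result := FIndex.Count;
end;

var
  Base: TStubBase;
begin
  Base := TStubBase.Create;
  try
    Base.BeginTransaction;
    Base.WriteMessage('one');
    Base.WriteMessage('two');
    Base.Rollback;
    WriteLn('after rollback: ', Base.Count);
    Base.BeginTransaction;
    Base.WriteMessage('three');
    Base.Commit;
    WriteLn('after commit: ', Base.Count);
  finally
    Base.Free;
  end;
end.
```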

Events — single callback

TMsgEvent = record
  EventType: TMsgEventType;       { BaseOpened, MessageRead, etc. }
  Source:    TMsgBase;            { may be nil for lib-global events }
  Subject:   AnsiString;          { path, area tag, msgid }
  Detail:    AnsiString;          { human-readable }
  LongValue: int64;               { count, size, offset }
  TimeStamp: TDateTime;
end;

TMsgEventCallback = procedure(const E: TMsgEvent) of object;

base.OnEvent := @MyHandler;       { one pointer, that's it }

comet does this with OnLog. One callback, caller multiplexes if they need multiple observers. No multi-subscriber registry. Radical simplification.
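
If a caller does need several observers, the multiplexing can live entirely on the caller's side. The sketch below shows one way to do that; `TFanOut` and `TCounter` are hypothetical caller-side classes, `TMsgEvent` is trimmed to one field, and a local variable plays the role of base.OnEvent.

```pascal
program FanOutSketch;
{$mode objfpc}{$H+}

type
  { Trimmed stand-in for the proposal's TMsgEvent record. }
  TMsgEvent = record
    Detail: AnsiString;
  end;
  TMsgEventCallback = procedure(const E: TMsgEvent) of object;

  { Hypothetical caller-side helper; not part of the library. }
  TFanOut = class
  private
    FTargets: array of TMsgEventCallback;
  public
    procedure Add(ACallback: TMsgEventCallback);
    procedure Dispatch(const E: TMsgEvent);
  end;

  { Example observer that just counts the events it sees. }
  TCounter = class
  public
    Seen: integer;
    procedure Handle(const E: TMsgEvent);
  end;

procedure TFanOut.Add(ACallback: TMsgEventCallback);
begin
  SetLength(FTargets, Length(FTargets) + 1);
  FTargets[High(FTargets)] := ACallback;
end;

procedure TFanOut.Dispatch(const E: TMsgEvent);
var
  i: integer;
begin
  { Forward the single library callback to every registered observer. }
  for i := 0 to High(FTargets) do
    FTargets[i](E);
end;

procedure TCounter.Handle(const E: TMsgEvent);
begin
  Inc(Seen);
end;

var
  FanOut: TFanOut;
  A, B: TCounter;
  OnEvent: TMsgEventCallback;   { plays the role of base.OnEvent }
  Ev: TMsgEvent;
begin
  FanOut := TFanOut.Create;
  A := TCounter.Create;
  B := TCounter.Create;
  try
    FanOut.Add(@A.Handle);
    FanOut.Add(@B.Handle);
    OnEvent := @FanOut.Dispatch;   { one pointer handed to the library }
    Ev.Detail := 'base opened';
    OnEvent(Ev);
    WriteLn(A.Seen, ' ', B.Seen);
  finally
    B.Free;
    A.Free;
    FanOut.Free;
  end;
end.
```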

Format detection is the default API

{ Primary — sniff the path, pick the backend. }
function OpenMsgBase(const Path: AnsiString;
                     Mode: TMsgOpenMode): TMsgBase;

{ Escape hatch — force a specific format. }
function OpenMsgBaseAs(Format: TMsgBaseFormat;
                       const Path: AnsiString;
                       Mode: TMsgOpenMode): TMsgBase;

Fingerprints:

- JAM: .jhr + .jdx pair, "JAM\0" signature
- Squish: .sqd + .sqi
- Hudson: MSGINFO.BBS + MSGHDR.BBS + ...
- GoldBase: MSGINFO.DAT + ...
- PCBoard: .IDX + .MSG pair
- EzyCom: MH*.BBS + MT*.BBS
- Wildcat: WC SDK marker file
- SDM: directory full of numbered .msg files (fallback)
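
As an illustration of what one fingerprint check might look like, the sketch below tests the JAM rule only: a .jdx file exists and the .jhr file starts with the four bytes 'J', 'A', 'M', #0. The function names are hypothetical, lowercase extensions are assumed, and the main block builds a throwaway fixture in the temp directory just to exercise the check.

```pascal
program JamSniffSketch;
{$mode objfpc}{$H+}
uses Classes, SysUtils;

{ True when the file starts with the 4-byte signature 'JAM' + #0. }
function HasJamSignature(const JhrPath: AnsiString): boolean;
var
  F: TFileStream;
  Sig: array[0..3] of AnsiChar;
begin
  Result := False;
  if not FileExists(JhrPath) then Exit;
  F := TFileStream.Create(JhrPath, fmOpenRead or fmShareDenyNone);
  try
    if F.Read(Sig, SizeOf(Sig)) = SizeOf(Sig) then
      Result := (Sig[0] = 'J') and (Sig[1] = 'A') and
                (Sig[2] = 'M') and (Sig[3] = #0);
  finally
    F.Free;
  end;
end;

{ BasePath is the base name without extension (assumes lowercase extensions). }
function LooksLikeJam(const BasePath: AnsiString): boolean;
begin
  Result := FileExists(BasePath + '.jdx') and
            HasJamSignature(BasePath + '.jhr');
end;

{ Test helper: create a file holding exactly the given bytes. }
procedure WriteBytes(const Path, Bytes: AnsiString);
var
  F: TFileStream;
begin
  F := TFileStream.Create(Path, fmCreate);
  try
    if Length(Bytes) > 0 then
      F.WriteBuffer(Bytes[1], Length(Bytes));
  finally
    F.Free;
  end;
end;

var
  Base: AnsiString;
begin
  Base := GetTempFileName;
  WriteBytes(Base + '.jhr', 'JAM' + #0);
  WriteBytes(Base + '.jdx', '');
  WriteLn('fixture detected as JAM: ', LooksLikeJam(Base));
  WriteLn('missing base detected as JAM: ', LooksLikeJam(Base + '-missing'));
  DeleteFile(Base + '.jhr');
  DeleteFile(Base + '.jdx');
end.
```

A real detector would run one such check per format in a fixed order, with the SDM directory test last since the list marks it as the fallback.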

One unit per format, self-registering

uses
  OpenMsgLib,            { core + factory }
  OpenMsgLib.Jam,        { registers JAM format in initialization }
  OpenMsgLib.Hudson;     { registers Hudson in initialization }

No .uni split. The format unit's initialization block calls RegisterFormat(..., @Factory). Inclusion is the registration.

`TOutboundBatch` — the missing half of the tosser

The symmetric twin of TPacketBatch (inbound). Details in the appendix; the shape:

TOutboundBatch = class
  constructor Create(const AOutboundDir: AnsiString;
                     const AOurAka: TFtnAddr);

  { Append a message to the outbound packet for (Target, Flavour).
    Caches the open pkt per (Target, Flavour) pair so repeat calls
    write to the same file until rotation or Flush. }
  function DispatchMessage(const Msg: TMsgRecord;
                           const Target: TFtnAddr;
                           Flavour: FlavourType): boolean;

  { Finalise every cached pkt — write terminator, rename .tmp →
    final, update .flo if configured.  Idempotent. }
  procedure Flush;

  property MaxPktSizeKB: longint;       { 0 = unlimited; rotate at threshold }
  property OnEvent: TMsgEventCallback;
end;

Two features missing in both NR and fpc-msgbase today, baked in here from day one:

- Packet size rotation. Before each write, check cached stream size + estimated msg size. If over threshold, close current (.tmp → .pkt rename), open next.
- Atomic finalize. Writes go to xxxxxxxx.pkt.tmp. Only Flush (or rotation) renames to .pkt. Crash mid-run leaves an orphan .tmp, not a corrupt real packet.
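
Both behaviours reduce to very little code. The sketch below shows the rotation threshold test and the write-to-.tmp-then-rename step in isolation; `NeedsRotation` and `FinalizePkt` are hypothetical names, the two zero bytes are only a stand-in for real packet content, and the rename is assumed to stay on one filesystem (which is what makes it atomic on POSIX systems).

```pascal
program AtomicPktSketch;
{$mode objfpc}{$H+}
uses Classes, SysUtils;

{ True when appending the message would push the packet past the cap.
  MaxPktSizeKB = 0 means unlimited, so it never rotates. }
function NeedsRotation(WrittenSize, EstimatedMsgSize: int64;
                       MaxPktSizeKB: longint): boolean;
begin
  Result := (MaxPktSizeKB > 0) and
            (WrittenSize + EstimatedMsgSize > int64(MaxPktSizeKB) * 1024);
end;

{ Publish a finished packet by renaming the .tmp to its final name. }
procedure FinalizePkt(const TmpPath, FinalPath: AnsiString);
begin
  if not RenameFile(TmpPath, FinalPath) then
    raise Exception.CreateFmt('cannot rename %s to %s', [TmpPath, FinalPath]);
end;

var
  Stem, TmpPath, FinalPath: AnsiString;
  F: TFileStream;
  Terminator: word;
begin
  Stem := GetTempFileName;
  TmpPath := Stem + '.pkt.tmp';
  FinalPath := Stem + '.pkt';

  { All writes go to the .tmp file. }
  F := TFileStream.Create(TmpPath, fmCreate);
  try
    Terminator := 0;   { stand-in for header + messages + terminator }
    F.WriteBuffer(Terminator, SizeOf(Terminator));
  finally
    F.Free;
  end;

  WriteLn('final exists before flush: ', FileExists(FinalPath));
  FinalizePkt(TmpPath, FinalPath);
  WriteLn('final exists after flush: ', FileExists(FinalPath));
  WriteLn('rotate at 900+200 bytes with 1 KB cap: ', NeedsRotation(900, 200, 1));
  DeleteFile(FinalPath);
end.
```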

Format-agnostic: writes to Type-2 / 2+ / 2.2 pkts via the existing ma.fmt.pkt backend. FTN routing stays in the caller (NR's FindRouteForNetmail/GetLinkForRoute is fidoconf-specific, not library material).

Reference map — where to look in existing code

Inside fpc-msgbase

| Concern | File | Relevant lines |
| --- | --- | --- |
| Factory pattern to extend | `src/ma.api.pas` | full file |
| `TUniMessage` to grow with Extras | `src/ma.types.pas` | record definition, top of file |
| Existing inbound tosser to mirror for outbound | `src/ma.batch.pas` | `TPacketBatch`, `GetOrCreateBase` (line 296) |
| Example tosser template | `examples/example_tosser.pas` | `TSimpleTosser.OnMessage` |
| Existing lock layer to make explicit | `src/ma.lock.pas` | full file |
| Event registry to replace with single callback | `src/ma.events.pas` | full file |
| Per-format native → uni adapters to fold together | `src/formats/*.uni.pas` × 9 | each is ~100-200 LOC |
| Sample data for tests (if any) | `tests/` | check what's already there |

In NetReader (`../netreader`)

NR has already solved several of these in its own idiom. The outbound machinery is the most reusable reference:

| Concern | File | Relevant lines |
| --- | --- | --- |
| Cached outbound packet per dest addr | `src/core/nr.scanner.pas` | `GetOutboundPacket` line 214, `CloseAllPackets` line 291 |
| Temp-pkt filename generator | `src/core/nr.scanner.pas` | `CreateTempPktFileName` call at 256 |
| Per-message route + pack logic | `src/core/nr.scanner.pas` | `PackMsg` line 551 |
| Route resolution (stays in NR) | `src/core/nr.scanner.pas` | `FindRouteForNetmail` 398, `GetLinkForRoute` 451 |
| Priority from Attr + FLAGS kludge | `src/core/nr.scanner.pas` | `PackMsg` lines 609-636 |
| Zone-aware outbound path | `src/core/nr.arcmail.pas` | `GetOutboundDir` (around line 178), `GetFLOPath` 219 |
| IsArcMailExt helper | `src/core/nr.arcmail.pas` | `IsArcMailExt` line 155 |
| Pkt header + message write | `src/core/nr.packet.pas` | `WritePktHeader`, `WritePktMessage` |
| Scanner's NoHighWaters pattern (useful for read-only verifiers) | `src/core/nr.scanner.pas` | `ScanNMArea` 1098 |
| JAM header-only read (NR's linker CRC fast path) | `src/msgbase/nr.msgbase.jam.pas` | `TNrJamMsgBase.ReadLinkHdr`, `TNrJamLinkHdr` record |

NR's entire CLI parity pass (last 7 commits in ../netreader) is built on this scaffolding. Pattern should transfer cleanly.

Reference: format specs

The FTSC document collection at /home/ken/Source Code/ftsc/docs/ and the format-author specs (jam.txt, squish.doc, pcboard.doc, etc.) are the authoritative source. Every backend cites the spec it implements in docs/ftsc-compliance.md.

In comet (`../comet`)

comet is the "plug it in and forget" model. Key patterns worth mirroring:

- Single log callback (OnLog), no registry
- TStream-centric I/O: caller controls the stream
- Config hot-reload without API surgery
- Embeddable via narrow callback surface

See ../comet/README.md "Embeddable" bullet.

Test corpus — the `testmsg/` folder

Status: to be populated. Proposal: live at /home/ken/Source Code/fpc-msgbase/testmsg/ (checked in, anonymized, git-tracked so rollback is just git checkout).

Proposed structure:

testmsg/
├── README.md                    how-to regenerate, licensing notes
├── jam/
│   ├── small_echo/              few-hundred-msg JAM area, real echoarea
│   ├── large_echo/              10k+ messages, stresses index growth
│   ├── deleted_mix/             area with tombstoned msgs for Pack tests
│   └── netmail/                 JAM netmail w/ kludges
├── squish/
│   ├── small_echo/
│   └── netmail/
├── hudson/
│   └── 3-board/                 multi-board fixture for Board field test
├── msg/                         FTS-1 numbered *.msg
│   ├── netmail/
│   └── echo/                    rare but tested
├── pcb/
│   └── sample_conf/
├── ezycom/
├── goldbase/
├── wildcat/
├── pkt/
│   ├── type2_plain/
│   ├── type2plus/
│   └── type2_2/
└── reference/
    ├── jam_small_echo.json      canonical-JSON snapshot
    ├── jam_large_echo.json      (round-trip baseline — DO NOT edit)
    └── ...                      one per corpus base

Anonymization rules: before check-in, scrub real user addresses and passwords from kludges. A small helper tools/anonymize.pas can do this deterministically — replaces real MSGID address with 999:9999/9999, replaces user names with User<N> tokens.

Regeneration script: tools/regen_reference.sh walks each corpus base via this library, dumps canonical JSON to testmsg/reference/. Committed output is the ground truth captured from a known-good build; later test runs diff their output against the committed JSON.

Rollback story: git checkout testmsg/ restores any corrupted fixture. Keep fixtures small-ish (<50MB total across all formats) so the repo stays cloneable on slow links.

Byte-agreement cross-verifier — first actionable task

Before any redesign, confirm the two libraries agree on the bytes they read and write today. Without this baseline, reshape could silently regress behavior that currently works.

Tool: tools/cross_verify.pas — standalone FPC program.

What it does:

1. Open a corpus base with the existing fpc-msgbase.
2. Read all messages → dump each to a canonical JSON record.
3. Open the same base with NetReader's nr.msgbase.*.
4. Read all messages → dump to same canonical JSON format.
5. Diff the two outputs. Report mismatches per field.

For write verification:

1. Fabricate 100 messages with known content.
2. Write them through fpc-msgbase to a fresh JAM base.
3. Read them back through NR's backend, diff.
4. Repeat with write via NR, read via fpc-msgbase.

Expected outcome: both should agree for universal fields. Where they disagree is where the proposal's Extras story becomes important — those are the fields each side handles differently (or one side silently drops).

Deliverable: a report noting exactly which fields differ per format, so the Extras registry (above) is anchored in real data rather than speculation.

This is ~2-4 hours of work and produces the single most important input to the reshape.

Implementation plan — six weeks

| Week | Deliverables |
| --- | --- |
| 1 | `testmsg/` corpus committed (3-5 bases per format, anonymized). `tools/cross_verify.pas` running. `tools/regen_reference.sh` producing committed canonical JSON. Baseline: which fields differ between fpc-msgbase and NR today? |
| 2 | Grow `TUniMessage` → `TMsgRecord` with `Extras` bag. Publish `docs/extras-registry.md` naming every well-known key. Backfill round-trip test: every backend reads a corpus base, writes to a fresh base, reads back, Extras map preserved. Fold `.uni` sidecars into format units. |
| 3 | Introduce `ITsmIO`. Refactor every format backend to take an `ITsmIO` instead of `TFileStream` directly. Add `TMemoryTsmIO` for tests. Run the full test suite in memory, with no disk writes. |
| 4 | Land `TOutboundBatch` with size rotation + atomic finalize. Write `examples/example_multiplex` that splits inbound pkts by destination and writes per-link outbound pkts via `TOutboundBatch`. |
| 5 | Unify error model (typed `EMessageBase*` tree; boolean returns mean "nothing to do" only). Replace `TMessageEvents` registry with single `OnEvent` callback. `TMsgBase` locking becomes explicit (no implicit `LockForRead` on Open). `WithReadLock` / `WithWriteLock` helpers. |
| 6 | Documentation pass. Full API reference regen. `CHANGELOG.md` with 0.1 → 1.0 migration notes. Version constants `MA_VERSION_MAJOR`/`MINOR` + runtime `MaRequireVersion`. `{$DEFINE MA_WITH_<FORMAT>}` gates finalized. Ship 1.0. |

Each week's work ships independently — no big-bang merge.

Beyond the six-week plan — things worth planning

Version ABI discipline

const
  MA_VERSION_MAJOR = 1;
  MA_VERSION_MINOR = 0;

{ Caller invokes at program start.  Raises if compiled against
  major < required or (major == required AND minor < required). }
procedure MaRequireVersion(Major, Minor: integer);

Libraries consumed by multiple callers (Fimail, NetReader, third-party) need a noisy "you linked the wrong version" failure mode. Not "weird behavior six months later."
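
A minimal sketch of the guard, following the rule in the comment above literally (fail when the linked major is lower, or the major matches and the linked minor is lower). `EMaVersionMismatch` is a hypothetical exception class chosen for the example; the proposal does not name one.

```pascal
program VersionGuardSketch;
{$mode objfpc}{$H+}
uses SysUtils;

const
  MA_VERSION_MAJOR = 1;
  MA_VERSION_MINOR = 0;

type
  EMaVersionMismatch = class(Exception);   { hypothetical exception class }

{ Raises when the linked library is older than the caller requires. }
procedure MaRequireVersion(Major, Minor: integer);
begin
  if (MA_VERSION_MAJOR < Major) or
     ((MA_VERSION_MAJOR = Major) and (MA_VERSION_MINOR < Minor)) then
    raise EMaVersionMismatch.CreateFmt(
      'fpc-msgbase %d.%d linked, caller requires %d.%d',
      [MA_VERSION_MAJOR, MA_VERSION_MINOR, Major, Minor]);
end;

begin
  MaRequireVersion(1, 0);        { satisfied: passes silently }
  try
    MaRequireVersion(1, 1);      { linked minor is too old: raises }
  except
    on E: EMaVersionMismatch do
      WriteLn(E.Message);
  end;
end.
```

Under this rule a newer major version still passes; whether a major bump should also be rejected is a policy choice the comment leaves open.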

Read-only airtightness

momReadOnly should be verifiable. Regression test:

1. Copy a base to a read-only mount (chmod -w + mount -o ro).
2. OpenMsgBase(path, momReadOnly).
3. Read 10k messages, every field.
4. Assert: every file's st_mtime is unchanged; no syscall fired that opens a file for write.

A BBS running concurrently with a tosser depends on this being airtight — no "oops, we updated last-read" surprises.

Build-time format gating

{ config.inc }
{$DEFINE MA_WITH_JAM}
{$DEFINE MA_WITH_HUDSON}
{$DEFINE MA_WITH_SQUISH}
{ $DEFINE MA_WITH_WILDCAT}    { commented out — 66K LOC pulled in }

{ OpenMsgLib.All.pas — convenience include-all unit }
unit OpenMsgLib.All;
interface
uses
  OpenMsgLib
  {$IFDEF MA_WITH_JAM}, OpenMsgLib.Jam{$ENDIF}
  {$IFDEF MA_WITH_HUDSON}, OpenMsgLib.Hudson{$ENDIF}
  ...
  ;
implementation
end.

Embedders control what ships. A full tosser wants everything. A minimal BBS UI wants JAM only. No one should be forced to compile BTrieve-era SDK code for a JAM reader.

Documentation structure

docs/
├── API.md                          full API reference (regen each 1.x)
├── architecture.md                 layered design (update to three-tier)
├── extras-registry.md              well-known Extras keys per format
├── ftsc-compliance.md              spec notes
├── migration-0.1-to-1.0.md         for existing callers
├── embedder-guide.md               for BBS/tosser authors
└── format-notes/                   per-format quirks & gotchas
    ├── jam.md
    ├── squish.md
    └── ...

CI expectations

When the repo has CI (GitHub Actions, GitLab CI, whatever), the test job is:

1. Build with all formats enabled.
2. Build with each format disabled in turn, to prove conditional compilation holds.
3. Run run_tests.sh.
4. Run tools/cross_verify.pas against the corpus.
5. Verify docs/extras-registry.md lists every key any backend writes (grep the source for Extras.SetValue).

Contribution path back from NetReader

NR has six weeks of HPT-parity work sitting on top of its own msgbase. When the reshape hits 1.0, NR has a decision:

A. Adopt fpc-msgbase wholesale — drop nr.msgbase.*, call the lib. NR becomes a thin areafix + scanner + CLI over the shared library. Big commit, one-time pain.
B. Keep NR's backends, cherry-pick fpc-msgbase's event dispatcher / lock model / outbound batch. Lighter touch.
C. Contribute NR's improvements (JAM CRC fast-path, case-insensitive .msg globbing, netmail-cfg-writer fix, FTS-0004 tag validation) back to fpc-msgbase. Symmetric win.

Option C is the first step regardless of whether A or B lands later. NR's recent work is format-agnostic quality improvements that every embedder wants.

Open questions for the implementation session

Before any code changes:

1. Extras representation. Key-value AnsiString bag is dead simple but slow for 100K messages. Alternatives:
   - TDictionary<AnsiString, AnsiString> (rtl-generics): faster lookup
   - Packed binary blob with offsets: smallest memory footprint
   - Keep strings but cap each Extras to N keys (fixed-size array)

   Benchmark before picking.
2. Transaction nesting. Can BeginTransaction nest? JAM defers .jdx updates in memory, so nested transactions just keep deferring. SDM's tempdir-shadow approach can't nest cleanly. Propose: no nesting. Second BeginTransaction call raises.
3. Thread safety of TMsgBase. Inbound TPacketBatch shares one base across workers, serialises via per-base CS. Works today. Does the reshape preserve that? Answer: yes, the explicit lock API makes it more obvious.
4. Squish .sql lastread vs .sqi index. This library treats .sql as the cross-process lock sentinel (matching the convention other Squish-aware tools use). The reshape should document this explicitly; it is a format-specific quirk that callers shouldn't need to know.
5. PKT as a base vs as a stream. TMessageBase abstraction assumes random-access read. A PKT is a forward-only stream of messages. Does PKT implement Count? (Reader would need to scan ahead to count.) Propose: PKT implements Count but flags CanRandomAccess = False, caller iterates via MoveNext instead of Read(i). Callers who treat PKT as a base get a clear exception.
6. Wildcat SDK cleanup. Is the 40K-LOC wc_sdk/ still needed, or can it be replaced with a narrower interop layer? (Not a 1.0 blocker, but worth scoping for 1.1.)

Decisions on these should go in docs/design-decisions.md as they land, so future sessions don't re-litigate.

First-session actionable steps

If this proposal makes sense to the next implementer, the first session should:

1. Read this document end-to-end. Cross-check my mapping of fpc-msgbase's current state against the actual code and call out anything stale.
2. Create testmsg/ with a corpus. Start with ONE format (JAM: best-spec'd, most used) and 3 bases: small echoarea (~100 msgs), large echoarea (~10K), netmail directory. Commit.
3. Write tools/cross_verify.pas. Use NR's nr.msgbase.jam and fpc-msgbase's ma.fmt.jam as the two readers. Dump canonical JSON, diff. Report.
4. Report the diff. What fields do the two libraries disagree on today? That diff becomes the initial extras-registry.md.
5. Stop. Discuss. Before any backend refactor, the cross-diff report informs every architectural decision below. If the two libraries disagree on 20% of fields, the Extras story is validated. If they agree on 100%, the Extras story is more about future-proofing than immediate need.

Only after the cross-diff is in hand does the week-1 plan above make sense.

Appendix: the `TOutboundBatch` design in full

(From the cross-project conversation; reproduced here for completeness.)

unit OpenMsgLib.Outbound;

{$mode objfpc}{$H+}

interface

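uses
  Classes, SysUtils, contnrs,   { contnrs provides TFPHashList }
  OpenMsgLib, OpenMsgLib.Types, OpenMsgLib.Pkt;

type
  TEntry = class;   { forward declaration; fields are listed under "Per-entry" below }

  TOutboundBatch = class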
  private
    FOutboundDir: AnsiString;
    FOurAka:      TFtnAddr;
    FMaxPktSizeKB: longint;
    FCacheCS:     TRTLCriticalSection;
    FCache:       TFPHashList;  { key: "zone:net/node.point|flavour" -> TEntry }
    FOnEvent:     TMsgEventCallback;

    function  GetOrCreateEntry(const Target: TFtnAddr;
                               Flavour: FlavourType): TEntry;
    procedure RotateIfOver(Entry: TEntry; EstimatedSize: integer);
    procedure FinalizeEntry(Entry: TEntry);  { rename .tmp → final }
  public
    constructor Create(const AOutboundDir: AnsiString;
                       const AOurAka: TFtnAddr);
    destructor Destroy; override;

    function DispatchMessage(const Msg: TMsgRecord;
                             const Target: TFtnAddr;
                             Flavour: FlavourType): boolean;
    procedure Flush;       { finalize every cached entry }

    property MaxPktSizeKB: longint read FMaxPktSizeKB write FMaxPktSizeKB;
    property OnEvent: TMsgEventCallback read FOnEvent write FOnEvent;
  end;

Per-entry:

TEntry = class
  Key:        AnsiString;      { "zone:net/node.point|flavour" }
  Target:     TFtnAddr;
  Flavour:    FlavourType;
  Stream:     TStream;         { writer, actually holds an ITsmIO under the hood }
  TmpPath:    AnsiString;      { xxxxxxxx.pkt.tmp }
  FinalPath:  AnsiString;      { xxxxxxxx.pkt — set on Rotate/Flush }
  CS:         TRTLCriticalSection;   { serialises writes to this one pkt }
  WrittenSize: int64;
end;

Flow inside DispatchMessage:

1. Look up the entry by (Target, Flavour) in FCache.
   - Miss: create entry, open TmpPath, write pkt header, cache.
   - Hit: reuse the cached entry.
2. Enter the entry's CS (on a hit or a miss, so the final "Leave entry CS" step is always balanced).
3. RotateIfOver(Entry, EstimatedMsgSize):
   - If MaxPktSizeKB > 0 and WrittenSize + EstimatedMsgSize > MaxPktSizeKB * 1024:
     - Write terminator, close, rename .tmp → .pkt.
     - Create a new .tmp, write header, reset WrittenSize.
     - Fire metPktRotated.
4. Convert TMsgRecord → TPktMessage via UniToPkt.
5. Write via OpenMsgLib.Pkt writer. Update WrittenSize.
6. Fire metMessageWritten.
7. Leave entry CS.

Flush: iterate FCache, for each entry write terminator, close, FinalizeEntry. Fire metBatchFinalized.

Crash recovery: on startup, scanner sees orphan xxxxxxxx.pkt.tmp files. Decision policy (caller-configurable):

- Discard (default): delete orphan tmps, assume corrupt.
- Recover: try to validate pkt header + terminator, if valid rename to .pkt, else discard.

References to the cross-project conversation

Full transcript of the NR ↔ fpc-msgbase discussion lives in the session log at /home/ken/.claude/projects/-home-ken-Source-Code-netreader/*.jsonl (dates around 2026-04-15). Relevant decisions captured in this document; the session log has the reasoning trail if questions arise.

Key points from that conversation, reproduced for the next session:

- Why not fork fpc-msgbase into a new project? Because fpc-msgbase has tested format backends already. Forking means re-validating them. Reshape in place preserves that investment.
- Why the five-line Hello World test? comet achieves plug-and-forget with a single log callback and a TStream-based API. That's the bar. If the lib requires more ceremony than "open path, read, close," it's not there yet.
- Why explicit locking, not implicit? BBSes with their own global mutex don't want the lib double-locking. Stateless readers don't need cross-process locks. Library guessing leads to surprises. Explicit means embedders can always reason about behavior.
- Why single callback, not registry? comet's OnLog proves one pointer is enough. Multi-observer is the caller's problem; they can write their own fan-out if needed. Registry adds state the library shouldn't own.

Closing

This is a design proposal, not a mandate. The next implementer should push back on anything that doesn't hold up against real code. The testmsg corpus + cross-verifier gives us the data to have that conversation grounded in bytes rather than opinion.

When in doubt: simpler is better. comet's model works because it refused to do things the library didn't absolutely need to do. fpc-msgbase 1.0 should exit with less code, not more, than 0.1.0 — the reshape is about architectural clarity, not feature addition. Features come in 1.1+ on top of a clean 1.0.
