fpc-msgbase/docs/architecture.md
Ken Johnson e876d98b83 0.3.5: ma.kludge shared helper, INTL/FMPT/TOPT, area auto-pop, list helpers, PKT polish
Five consumer-feedback items, one milestone:

(1) Shared FTSC kludge plumbing in src/ma.kludge.pas

  ParseKludgeLine, SplitKludgeBlob, BuildKludgePrefix,
  BuildKludgeSuffix.  Single source of truth for kludge naming,
  INTL/FMPT/TOPT recognition, and the kludge.<lowername>
  forward-compat passthrough.  Eliminates the four near-identical
  parsers MSG/PKT/Squish were carrying; JAM's FTSKLUDGE subfield
  walking also routes through ParseKludgeLine so its unknown
  kludges land in the same `kludge.<name>` slot as the others.

  Bug fix folded in: the parser previously split kludge name from
  value at the first ':' it found, which broke INTL (the value
  contains an FTN address with ':' in it).  Now picks the earlier
  of space and colon, which handles both colon-form ("MSGID: foo")
  and space-form ("INTL <to> <from>") kludges correctly.

(2) INTL / FMPT / TOPT slots in attributes registry

  FSC-4008 cross-zone routing kludges every netmail tosser carries.
  Added to JAM/Squish/MSG/PKT capability lists, parsed natively,
  emitted on Write.  Round-trip covered by tests.

(3) Unified `kludge.*` namespace for unknown FTSC kludges

  Squish's `squish.kludge.<name>`, MSG's `msg.kludge.<name>`, and
  PKT's `pkt.kludge.<name>` all collapse to plain `kludge.<name>`.
  Consumers find passthrough kludges without switching on format.
  JAM's numeric `jam.subfield.<id>` stays — those are JAM-specific
  binary subfields, not FTSC-form kludges.

(4) `area` auto-populated from base.AreaTag on Read

  When the caller passes AAreaTag to MessageBaseOpen (or sets
  the AreaTag property post-construction), every successful
  ReadMessage fills msg.Attributes['area'] unless the adapter
  already populated it from on-disk data (e.g. PKT AREA kludge).
  Saves echomail consumers from copying AreaTag into every
  message attribute manually.

(5) TMsgAttributes multi-line helpers

  GetList / SetList / AppendListItem on TMsgAttributes for the
  multi-instance attributes (seen-by, path, via, trace) that
  store with #13 between entries.  Consumers don't have to roll
  their own split/join.

Plus two PKT polish items from the same feedback round:

(6) ma.fmt.pkt.uni.DoWriteMessage now raises EMessageBase
    explicitly with a pointer to the Native API instead of
    silently returning False.

(7) TPktFile.CreateFromStream / CreateNewToStream constructors
    accept any TStream (with optional ownership), so unit tests
    that round-trip via TMemoryStream don't have to tempfile-dance.
    FStream is now TStream; FOwnsStream gates Free in destructor.

TStringDynArray moved from ma.api.pas to ma.types.pas so both
the capabilities API and the new attribute helpers can share it.

Docs sweep:

- docs/attributes-registry.md: intl/fmpt/topt added; unknown-kludge
  convention documented; multi-line helper section added.
- docs/architecture.md: ma.kludge layer surfaced; .uni adapter
  registration gotcha called out loudly with the recommended
  uses clause; area auto-pop documented.
- docs/API.md: TUniMessage section rewritten for Body+Attributes
  model (was still pre-0.2); HWM API documented; PKT cheat-sheet
  notes Native + CreateFromStream; tests/programs list updated.
- README.md: Building section flags the .uni gotcha first
  thing; ma.kludge added to features.

tests/test_consumer_round1.pas: 7 new tests covering INTL/FMPT/
TOPT round-trip on JAM/Squish/MSG, area auto-pop, GetList/SetList/
AppendListItem, PKT raise, and TPktFile in-memory stream
round-trip.

Suite: 47/47 across 10 programs (test_consumer_round1 adds 7).
2026-04-18 09:14:33 -07:00


fpc-msgbase — architecture

Layers

        ┌──────────────────────────────────────────────────┐
        │  Caller (BBS, tosser, editor, importer, …)       │
        └──────────────────────────────────────────────────┘
                              │
                              ▼
        ┌──────────────────────────────────────────────────┐
        │  ma.api (TMessageBase, factory, TUniMessage)     │
        ├──────────────────────────────────────────────────┤
        │  ma.events   ma.lock   ma.paths    ma.kludge     │
        │  ma.batch (concurrent tosser helper)             │
        ├──────────────────────────────────────────────────┤
        │  Format backends — two .pas units per format:    │
        │   ma.fmt.<fmt>     - native record + I/O class   │
        │   ma.fmt.<fmt>.uni - TMessageBase adapter        │
        │  ma.fmt.hudson(.uni)   ma.fmt.jam(.uni)          │
        │  ma.fmt.squish(.uni)   ma.fmt.msg(.uni)          │
        │  ma.fmt.pkt(.uni)      ma.fmt.pcboard(.uni)      │
        │  ma.fmt.ezycom(.uni)   ma.fmt.goldbase(.uni)     │
        │  ma.fmt.wildcat(.uni)                            │
        ├──────────────────────────────────────────────────┤
        │  RTL: TFileStream, BaseUnix/Windows for locking  │
        └──────────────────────────────────────────────────┘

Integration gotcha: to use a backend through the unified TMessageBase API you must include the .uni adapter unit in your uses clause, not just the native ma.fmt.<format> unit. The adapter's initialization block is what registers the backend with the factory.

uses
  ma.types, ma.events, ma.api,
  ma.fmt.jam, ma.fmt.jam.uni;     { both — .uni is what registers }

Forgetting .uni produces EMessageBase: No backend registered for JAM at the first MessageBaseOpen(mbfJam, ...) call. The exception message hints at the fix.
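The registration mechanism can be pictured with a small stand-alone sketch. Everything in it (TSketchFormat, RegisterBackend, SketchOpen) is a hypothetical stand-in, not the library's factory; in the real library the RegisterBackend call would sit in the .uni unit's initialization block, which only runs if the unit is named in a uses clause.

```pascal
program RegistrySketch;
{$mode objfpc}{$H+}

uses
  SysUtils;

type
  { Hypothetical stand-ins; the real library has its own enum and factory. }
  TSketchFormat = (sfJam, sfSquish);
  TSketchOpenFunc = function(const Path: AnsiString): AnsiString;

var
  { Globals are zero-initialised, so every slot starts out unregistered. }
  Registry: array[TSketchFormat] of TSketchOpenFunc;

function OpenJam(const Path: AnsiString): AnsiString;
begin
  Result := 'JAM base at ' + Path;
end;

procedure RegisterBackend(Fmt: TSketchFormat; Func: TSketchOpenFunc);
begin
  Registry[Fmt] := Func;
end;

function SketchOpen(Fmt: TSketchFormat; const Path: AnsiString): AnsiString;
begin
  if not Assigned(Registry[Fmt]) then
    raise Exception.Create('No backend registered');
  Result := Registry[Fmt](Path);
end;

begin
  { In a real adapter unit this call would live in "initialization". }
  RegisterBackend(sfJam, @OpenJam);
  WriteLn(SketchOpen(sfJam, 'echo/fidonews'));
  try
    { Nothing registered sfSquish, so this raises. }
    WriteLn(SketchOpen(sfSquish, 'echo/fidonews'));
  except
    on E: Exception do
      WriteLn('error: ', E.Message);
  end;
end.
```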

Polymorphism

Every backend descends from TMessageBase and implements the abstract DoOpen, DoClose, DoMessageCount, DoReadMessage, DoWriteMessage contract. Callers can either:

  1. Use the unified API — MessageBaseOpen(format, path, mode) returns a TMessageBase. Read/write through TUniMessage. Format-agnostic.
  2. Drop down to format-specific class methods (e.g. TJamBase.IncModCounter, TSquishBase.SqHashName) when they need behaviour the unified API cannot express. Each backend keeps its rich API public.

TUniMessage — two-area model

TUniMessage = record
  Body:       AnsiString;       { only the message text }
  Attributes: TMsgAttributes;   { everything else, key/value }
end;

Two areas, no surprises:

  • Body carries the user-visible message text and nothing else. Never kludge lines, never headers, never SEEN-BY/PATH. Always a ready-to-display blob.
  • Attributes carries every other piece of data: From, To, Subject, dates, addresses, attribute bits, FTSC kludges (MSGID, ReplyID, PID, SEEN-BY, PATH, …), and per-format extras (jam.msgidcrc, squish.umsgid, pcb.confnum, …).

Same model as RFC 822 email (headers + body). Lossless round-trip across Read → Write → Read is enforced by the regression suite in tests/test_roundtrip_attrs.pas.

The library never composes presentation. A BBS that wants to display kludges inline walks Attributes and prepends ^aMSGID: etc. to its own display. A BBS that hides kludges just shows Body. A tosser that needs MSGID for dupe detection reads Attributes.Get('msgid') directly — no body parsing required.
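As a rough illustration of composing presentation on the caller's side, the sketch below uses a plain TStringList as a stand-in for TMsgAttributes (an assumption; the real type has its own accessors). ComposeWithKludges is a hypothetical helper name.

```pascal
program DisplaySketch;
{$mode objfpc}{$H+}

uses
  Classes, SysUtils;

{ Builds display text: one ^a-prefixed line per attribute, then the body.
  Attrs is a name=value list standing in for TMsgAttributes. }
function ComposeWithKludges(Attrs: TStrings; const Body: AnsiString): AnsiString;
var
  i: Integer;
begin
  Result := '';
  for i := 0 to Attrs.Count - 1 do
    Result := Result + '^a' + UpperCase(Attrs.Names[i]) + ': ' +
              Attrs.ValueFromIndex[i] + LineEnding;
  Result := Result + Body;
end;

var
  Attrs: TStringList;
begin
  Attrs := TStringList.Create;
  try
    Attrs.Values['msgid'] := '1:103/705 12345678';
    Attrs.Values['pid'] := 'SketchEd 0.1';
    { A BBS that hides kludges would print only the body instead. }
    WriteLn(ComposeWithKludges(Attrs, 'Hello, world.'));
  finally
    Attrs.Free;
  end;
end.
```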

Dates land in TDateTime regardless of how the backend stored them (Hudson MM-DD-YY strings with 1950 pivot, Squish FTS-0001 strings, JAM Unix timestamps, PCBoard / EzyCom DOS PackTime). Stored in attributes as date.written / date.received via SetDate / GetDate.
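A minimal sketch of the Hudson conversion, assuming "1950 pivot" means that two-digit years 50..99 map to 19YY and 00..49 map to 20YY. HudsonDateToDateTime is a hypothetical name, not the library's helper.

```pascal
program HudsonDateSketch;
{$mode objfpc}{$H+}

uses
  SysUtils;

{ Parses 'MM-DD-YY'. Assumed pivot: YY >= 50 -> 19YY, otherwise 20YY.
  Returns False for malformed strings or impossible dates. }
function HudsonDateToDateTime(const S: AnsiString; out D: TDateTime): Boolean;
var
  M, Dd, Y: Integer;
begin
  Result := False;
  D := 0;
  if (Length(S) <> 8) or (S[3] <> '-') or (S[6] <> '-') then Exit;
  if not TryStrToInt(Copy(S, 1, 2), M) then Exit;
  if not TryStrToInt(Copy(S, 4, 2), Dd) then Exit;
  if not TryStrToInt(Copy(S, 7, 2), Y) then Exit;
  if Y >= 50 then
    Inc(Y, 1900)
  else
    Inc(Y, 2000);
  Result := TryEncodeDate(Y, M, Dd, D);
end;

var
  D: TDateTime;
begin
  if HudsonDateToDateTime('12-31-93', D) then
    WriteLn(FormatDateTime('yyyy-mm-dd', D));
  if HudsonDateToDateTime('04-18-26', D) then
    WriteLn(FormatDateTime('yyyy-mm-dd', D));
  if not HudsonDateToDateTime('13-40-99', D) then
    WriteLn('rejected');
end.
```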

Format-specific bit fields (Hudson byte attr, JAM 32-bit attr, Squish attr, MSG word attr, PCB status, EzyCom dual byte) are unrolled into individual attr.* boolean attributes on Read via UniAttrBitsToAttributes and recomposed on Write via UniAttrBitsFromAttributes and the per-format XxxAttrFromUni helpers. The canonical MSG_ATTR_* cardinal bitset stays as the internal pivot.
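The unroll/recompose idea can be sketched as follows. The bit values are the FTS-0001 stored-message attribute bits; the attr.* key names and the TStringList stand-in for TMsgAttributes are assumptions for illustration, and the real UniAttrBitsToAttributes / UniAttrBitsFromAttributes helpers may look quite different.

```pascal
program AttrBitsSketch;
{$mode objfpc}{$H+}

uses
  Classes, SysUtils;

const
  { Subset of the FTS-0001 attribute word:
    private, crash, kill/sent, local. }
  Masks: array[0..3] of Word = ($0001, $0002, $0080, $0100);
  { Key names are illustrative guesses, not taken from the registry. }
  Keys: array[0..3] of AnsiString =
    ('attr.private', 'attr.crash', 'attr.killsent', 'attr.local');

{ Read direction: one boolean attribute per set bit. }
procedure BitsToAttrs(Bits: Word; Attrs: TStrings);
var
  i: Integer;
begin
  for i := Low(Masks) to High(Masks) do
    if (Bits and Masks[i]) <> 0 then
      Attrs.Values[Keys[i]] := 'true';
end;

{ Write direction: recompose the word from the boolean attributes. }
function AttrsToBits(Attrs: TStrings): Word;
var
  i: Integer;
begin
  Result := 0;
  for i := Low(Keys) to High(Keys) do
    if Attrs.Values[Keys[i]] = 'true' then
      Result := Result or Masks[i];
end;

var
  Attrs: TStringList;
begin
  Attrs := TStringList.Create;
  try
    BitsToAttrs($0101, Attrs);               { private + local }
    WriteLn(Attrs.CommaText);
    WriteLn(IntToHex(AttrsToBits(Attrs), 4)); { same word comes back }
  finally
    Attrs.Free;
  end;
end.
```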

High-Water Mark (HWM) — per-user scanner pointer

Tossers, scanners, and editors that want to track "last message I processed for user X" can use the per-user HWM API on TMessageBase:

function  SupportsHWM: boolean;
function  GetHWM(const UserName: AnsiString): longint;
procedure SetHWM(const UserName: AnsiString; MsgNum: longint);
procedure MapUser(const UserName: AnsiString; UserId: longint);
property  ActiveUser: AnsiString;     { auto-bump on Read }

HWM uses the format's native lastread mechanism, not a sidecar. A tosser registers itself as just another user ('NetReader', 'Allfix', 'FidoMail-Toss') and its HWM lives in the same file the BBS uses for human-user lastread, so multiple consumers naturally coexist without colliding.

Coverage:

| Format | HWM | Mechanism |
|---|---|---|
| JAM | yes | .JLR (CRC32(lower(name))) |
| Squish | yes | .SQL (CRC32(lower(name))) |
| Hudson | yes | LASTREAD.BBS per-(user-id, board); needs MapUser + Board |
| GoldBase | yes | LASTREAD.DAT per-(user-id, board); needs MapUser + Board |
| EzyCom | no | per-user state lives in the BBS user records, not the message base; no msg-base lastread file to plumb |
| Wildcat | no | SDK exposes MarkMsgRead per-message but no per-user HWM primitive |
| PCBoard | no | USERS file lastread per-conference; deferred |
| MSG, PKT | no | spec has no HWM concept |
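For the CRC32(lower(name)) keying used by JAM and Squish, a minimal sketch might look like the following. It assumes the CRC-32 variant described in the JAM specification (reflected polynomial $EDB88320, initial value $FFFFFFFF, no final inversion); whether the library uses exactly this variant is an assumption, and JamCrc32 / UserKey are hypothetical names.

```pascal
program JamUserCrcSketch;
{$mode objfpc}{$H+}

uses
  SysUtils;

{ Bitwise CRC-32, JAM flavour: init $FFFFFFFF, result not inverted. }
function JamCrc32(const S: AnsiString): Cardinal;
var
  i, Bit: Integer;
begin
  Result := $FFFFFFFF;
  for i := 1 to Length(S) do
  begin
    Result := Result xor Cardinal(Ord(S[i]));
    for Bit := 1 to 8 do
      if (Result and 1) <> 0 then
        Result := (Result shr 1) xor Cardinal($EDB88320)
      else
        Result := Result shr 1;
  end;
end;

{ Lastread slot key: the CRC of the lower-cased user name, so that
  'NetReader' and 'NETREADER' land in the same slot. }
function UserKey(const UserName: AnsiString): Cardinal;
begin
  Result := JamCrc32(LowerCase(UserName));
end;

begin
  WriteLn(IntToHex(UserKey('NetReader'), 8));
  WriteLn(UserKey('NetReader') = UserKey('NETREADER'));
end.
```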

For the multi-board formats (Hudson, GoldBase) the caller must set both:

  • base.MapUser('NetReader', 60001) — pick a numeric user ID (use 60000+ to avoid colliding with real BBS users).
  • base.Board := N — the board / conference number this scan is for. The same physical Hudson base contains all 200 boards; HWM is per-(user, board).

If either one is missing, GetHWM returns -1.

For unsupported formats SupportsHWM returns false and GetHWM returns -1; SetHWM is a no-op. Caller falls back to its own state for those formats (e.g. NR's dupedb).

Auto-bump pattern for scanners:

base.ActiveUser := 'NetReader';
for i := 0 to base.MessageCount - 1 do begin
  base.ReadMessage(i, msg);
  { ... process msg ... }
  { HWM auto-tracks the highest msg.num seen for NetReader. }
end;

When ActiveUser is set, ReadMessage calls SetHWM after each successful read if the just-read msg.num is strictly greater than the current HWM. Never decrements -- reading a lower-numbered message is a no-op. Default off (ActiveUser = '').

Multi-tenant by design: every scanner / tosser gets its own slot in the lastread file, keyed by its name. NR as 'NetReader', Allfix as 'Allfix', Fimail as 'FidoMail-Toss' -- they all coexist in .JLR / .SQL without interfering with each other or with human-user lastread.

Pack/purge is the format's responsibility: each backend's Pack rewrites the lastread file in step with the message renumbering. For JAM and Squish this is handled natively.

area auto-population

When the caller passes an AAreaTag to MessageBaseOpen (or sets the AreaTag property post-construction), every successful ReadMessage auto-populates Msg.Attributes['area'] with that tag — but only if the adapter didn't already populate it from on-disk data (PKT's AREA kludge, for example).

This saves echomail consumers from having to copy AreaTag into every message attribute manually. Multi-format scanners always get a populated area when the area is configured.

Shared kludge plumbing — ma.kludge

ma.kludge exposes the parsing and emission helpers for FTSC-form kludges. They are shared by the inline-kludge backends (MSG, PKT) and the CtrlInfo-style backend (Squish), and JAM's FTSKLUDGE subfield walking routes through them as well:

function  ParseKludgeLine(const Line: AnsiString;
                          var A: TMsgAttributes): boolean;
procedure SplitKludgeBlob(const RawBody: AnsiString;
                          out PlainBody: AnsiString;
                          var A: TMsgAttributes);
function  BuildKludgePrefix(const A: TMsgAttributes): AnsiString;
function  BuildKludgeSuffix(const A: TMsgAttributes): AnsiString;

Consumers that need to parse raw FTSC body blobs (e.g. parity tests, format converters, debug tools) can call these directly without reaching into a backend. Single source of truth for kludge naming, INTL/FMPT/TOPT recognition, and the kludge.<name> forward-compat passthrough.
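The name/value split is the part most likely to trip up a hand-rolled parser, because an INTL value contains ':' inside its FTN addresses. The sketch below shows one way to cut at whichever of space and colon comes first; SplitKludgeSketch is a hypothetical stand-in and not ParseKludgeLine itself.

```pascal
program KludgeSplitSketch;
{$mode objfpc}{$H+}

uses
  SysUtils;

{ Splits one kludge line into name and value at the earlier of the first
  space and the first colon. A leading ^A (#1) is stripped if present. }
function SplitKludgeSketch(const Line: AnsiString;
                           out Name, Value: AnsiString): Boolean;
var
  S: AnsiString;
  PSpace, PColon, Cut: Integer;
begin
  Name := '';
  Value := '';
  S := Line;
  if (S <> '') and (S[1] = #1) then
    Delete(S, 1, 1);
  PSpace := Pos(' ', S);
  PColon := Pos(':', S);
  if (PSpace > 0) and ((PColon = 0) or (PSpace < PColon)) then
    Cut := PSpace
  else
    Cut := PColon;
  if Cut = 0 then
    Name := S                      { bare kludge without a value }
  else
  begin
    Name := Copy(S, 1, Cut - 1);
    Value := Trim(Copy(S, Cut + 1, MaxInt));
  end;
  Result := Name <> '';
end;

var
  N, V: AnsiString;
begin
  if SplitKludgeSketch(#1'MSGID: 1:103/705 12345678', N, V) then
    WriteLn(N, ' -> ', V);
  if SplitKludgeSketch(#1'INTL 2:280/464 1:103/705', N, V) then
    WriteLn(N, ' -> ', V);
end.
```

Splitting only at the first colon would turn the INTL line into the name "INTL 2" and the value "280/464 1:103/705", which is the failure mode this rule avoids.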

Capabilities API — backend self-description

Each backend declares the canonical list of attribute keys it understands via a class function:

class function TMessageBase.ClassSupportedAttributes: TStringDynArray;

Callers query before setting:

if base.SupportsAttribute('attr.returnreceipt') then
  RenderReceiptCheckbox
else
  HideReceiptCheckbox;

Backends silently ignore unknown attributes on Write (RFC 822 X-header semantics — fine for forward compatibility); the capabilities API exists so callers know in advance which keys won't survive on a given format. The full per-format support matrix lives in docs/attributes-registry.md.
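A caller that wants to warn about keys a backend would drop could pre-filter against the capability list. The sketch below uses a local TKeyList type as a stand-in for TStringDynArray and a made-up capability list; HasKey and DroppedKeys are hypothetical helpers, not part of the library.

```pascal
program CapabilitiesSketch;
{$mode objfpc}{$H+}

uses
  SysUtils;

type
  TKeyList = array of AnsiString;  { stand-in for TStringDynArray }

function HasKey(const Keys: TKeyList; const Key: AnsiString): Boolean;
var
  i: Integer;
begin
  Result := False;
  for i := 0 to High(Keys) do
    if SameText(Keys[i], Key) then
      Exit(True);
end;

{ Returns the wanted keys that the backend would silently drop on Write. }
function DroppedKeys(const Supported, Wanted: TKeyList): TKeyList;
var
  i, n: Integer;
begin
  Result := nil;
  n := 0;
  for i := 0 to High(Wanted) do
    if not HasKey(Supported, Wanted[i]) then
    begin
      SetLength(Result, n + 1);
      Result[n] := Wanted[i];
      Inc(n);
    end;
end;

var
  Supported, Wanted, Dropped: TKeyList;
  i: Integer;
begin
  { Made-up capability list, for illustration only. }
  SetLength(Supported, 2);
  Supported[0] := 'msgid';
  Supported[1] := 'area';
  SetLength(Wanted, 3);
  Wanted[0] := 'msgid';
  Wanted[1] := 'attr.returnreceipt';
  Wanted[2] := 'area';
  Dropped := DroppedKeys(Supported, Wanted);
  for i := 0 to High(Dropped) do
    WriteLn('will not survive: ', Dropped[i]);
end.
```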

Locking

Three layers, applied in order on every Open:

  1. In-process: one TRTLCriticalSection per TMessageBase instance.
  2. Cross-process: an advisory lock on a sentinel file (<base>.lck or, for Squish, <base>.SQL so we coexist with other Squish-aware tools). Uses fpflock (LOCK_EX or LOCK_SH) on Unix and LockFileEx on Windows. Acquisition retries with backoff up to a configurable timeout (default 30s). Lock acquire/release fires events.
  3. OS share modes: fmShareDenyWrite for writers, fmShareDenyNone for readers, matching the DOS-era multi-process sharing conventions every classic format expects.
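The retry-with-backoff behaviour of layer 2 can be sketched independently of the platform lock call. In the sketch below the lock attempt is a caller-supplied function (here a fake that fails twice); the starting delay, the doubling, and the 1-second cap are assumptions, not the library's actual schedule.

```pascal
program LockRetrySketch;
{$mode objfpc}{$H+}

uses
  SysUtils;

type
  TTryLockFunc = function: Boolean;

var
  Attempts: Integer = 0;

{ Fake lock for the demo: fails twice, then succeeds. A real caller
  would wrap a non-blocking fpflock or LockFileEx attempt here. }
function FakeTryLock: Boolean;
begin
  Inc(Attempts);
  Result := Attempts >= 3;
end;

{ Retries TryLock until it succeeds or TimeoutMs has elapsed.
  The delay doubles after each failure, capped at about one second. }
function AcquireWithBackoff(TryLock: TTryLockFunc; TimeoutMs: Cardinal): Boolean;
var
  Deadline: QWord;
  DelayMs: Cardinal;
begin
  Deadline := GetTickCount64 + TimeoutMs;
  DelayMs := 10;
  repeat
    if TryLock() then
      Exit(True);
    if GetTickCount64 >= Deadline then
      Break;
    Sleep(DelayMs);
    if DelayMs < 1000 then
      DelayMs := DelayMs * 2;
  until False;
  Result := False;
end;

begin
  if AcquireWithBackoff(@FakeTryLock, 30000) then
    WriteLn('locked after ', Attempts, ' attempts')
  else
    WriteLn('timed out');
end.
```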

Events

TMessageEvents lets callers subscribe one or more handlers to receive metBaseOpened, metMessageRead, metMessageWritten, metLockAcquired, metPackProgress, etc. Internally the dispatcher serialises calls so handlers do not need to be reentrant.

Concurrent tossers

TPacketBatch owns a queue of .pkt paths and a worker thread pool. Each worker opens its packet, reads messages, hands each to the caller-provided processor. The batch caches one TMessageBase per destination area so writes serialise through layer-1 locking; layer-2 keeps separate processes (e.g. an editor) safe at the same time.

Behavioural fidelity

Every format backend is implemented from the published format specification (FTSC documents and the original format authors' own spec papers — see docs/ftsc-compliance.md). Tests read and write real sample bases captured from working BBS installations; round-trip tests verify byte-for-byte preservation across read → write → read cycles.