# fpc-msgbase — architecture

## Layers

```
┌──────────────────────────────────────────────────┐
│  Caller (BBS, tosser, editor, importer, …)       │
└──────────────────────────────────────────────────┘
                          │
                          ▼
┌──────────────────────────────────────────────────┐
│  mb.api (TMessageBase, factory, TUniMessage)     │
├──────────────────────────────────────────────────┤
│  mb.events   mb.lock   mb.paths   mb.kludge      │
├──────────────────────────────────────────────────┤
│  Format backends — two .pas units per format:    │
│    mb.fmt.     - native record + I/O class       │
│    mb.fmt..uni - TMessageBase adapter            │
│  mb.fmt.hudson(.uni)    mb.fmt.jam(.uni)         │
│  mb.fmt.squish(.uni)    mb.fmt.msg(.uni)         │
│  mb.fmt.pcboard(.uni)   mb.fmt.ezycom(.uni)      │
│  mb.fmt.goldbase(.uni)  mb.fmt.wildcat(.uni)     │
├──────────────────────────────────────────────────┤
│  RTL: TFileStream, BaseUnix/Windows for locking  │
└──────────────────────────────────────────────────┘

┌──────────────────────────────────────────────────┐
│  Sibling: fpc-ftn-transport                      │
│  tt.pkt.format / tt.pkt.reader (registers        │
│  mbfPkt) / tt.pkt.writer / tt.pkt.batch          │
│  plus forthcoming BSO / ArcMail / drop modules.  │
└──────────────────────────────────────────────────┘
```

PKT is a wire format and lives in `fpc-ftn-transport`, not here. The `mbfPkt` enum value stays in `mb.types` so `tt.pkt.reader` can register the backend with the unified-API factory. Consumers wanting to iterate `.pkt` files just `uses tt.pkt.reader` and call `MessageBaseOpen(mbfPkt, ...)` as usual. `TPacketBatch` (was here as `ma.batch`) moved with it as `tt.pkt.batch`.

**Integration gotcha:** to use a backend through the unified `TMessageBase` API you must include the `.uni` adapter unit in your `uses` clause, not just the native `mb.fmt.` unit. The adapter's `initialization` block is what registers the backend with the factory.

```pascal
uses
  mb.types, mb.events, mb.api,
  mb.fmt.jam,
  mb.fmt.jam.uni;     { both — .uni is what registers }
```

Forgetting `.uni` produces `EMessageBase: No backend registered for JAM` at the first `MessageBaseOpen(mbfJam, ...)` call. The exception message hints at the fix.

## Polymorphism

Every backend descends from `TMessageBase` and implements the abstract `DoOpen`, `DoClose`, `DoMessageCount`, `DoReadMessage`, `DoWriteMessage` contract. Callers can either:

1. Use the unified API — `MessageBaseOpen(format, path, mode)` returns a `TMessageBase`. Read/write through `TUniMessage`. Format-agnostic.
2. Drop down to format-specific class methods (e.g. `TJamBase.IncModCounter`, `TSquishBase.SqHashName`) when they need behaviour the unified API cannot express. Each backend keeps its rich API public.

## TUniMessage — two-area model

```pascal
TUniMessage = record
  Body:       AnsiString;       { only the message text }
  Attributes: TMsgAttributes;   { everything else, key/value }
end;
```

Two areas, no surprises:

- **Body** carries the user-visible message text and nothing else. Never kludge lines, never headers, never SEEN-BY/PATH. Always a ready-to-display blob.
- **Attributes** carries every other piece of data: From, To, Subject, dates, addresses, attribute bits, FTSC kludges (MSGID, ReplyID, PID, SEEN-BY, PATH, …), and per-format extras (`jam.msgidcrc`, `squish.umsgid`, `pcb.confnum`, …).

Same model as RFC 822 email (headers + body). Lossless round-trip across Read → Write → Read is enforced by the regression suite in `tests/test_roundtrip_attrs.pas`.

**The library never composes presentation.** A BBS that wants to display kludges inline walks `Attributes` and prepends `^aMSGID:` etc. to its own display. A BBS that hides kludges just shows `Body`. A tosser that needs MSGID for dupe detection reads `Attributes.Get('msgid')` directly — no body parsing required.

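
As a rough illustration of the caller-side composition described above, the sketch below builds a display string from a body plus a key/value bag. It is self-contained and does not use the library: a `TStringList` stands in for `TMsgAttributes`, and `FormatKludge`, `ComposeDisplay`, and the choice of `msgid` / `pid` as the keys to show are all assumptions made for the example.

```pascal
program ShowKludges;
{$mode objfpc}{$H+}

uses
  Classes, SysUtils;

{ Renders one attribute as an inline kludge line, e.g. '^aMSGID: value'.
  Hypothetical helper; the library itself never composes presentation. }
function FormatKludge(const Key, Value: string): string;
begin
  Result := '^a' + UpperCase(Key) + ': ' + Value;
end;

{ Attrs is a stand-in for the attribute bag (name=value pairs in a
  TStrings). The real TMsgAttributes API is not reproduced here. }
function ComposeDisplay(Attrs: TStrings; const Body: string;
  ShowKludges: boolean): string;
const
  { Assumed subset of keys a BBS might choose to display. }
  KludgeKeys: array[0..1] of string = ('msgid', 'pid');
var
  i: integer;
  Value: string;
begin
  Result := '';
  if ShowKludges then
    for i := Low(KludgeKeys) to High(KludgeKeys) do
    begin
      Value := Attrs.Values[KludgeKeys[i]];
      if Value <> '' then
        Result := Result + FormatKludge(KludgeKeys[i], Value) + LineEnding;
    end;
  { The body is always appended untouched. }
  Result := Result + Body;
end;

var
  Attrs: TStringList;
begin
  Attrs := TStringList.Create;
  try
    Attrs.Values['msgid'] := '2:5020/1 1a2b3c4d';
    Attrs.Values['from'] := 'Sysop';
    WriteLn(ComposeDisplay(Attrs, 'Hello, world.', True));
    WriteLn(ComposeDisplay(Attrs, 'Hello, world.', False));
  finally
    Attrs.Free;
  end;
end.
```

With `ShowKludges` false the function returns the body unchanged, which mirrors the "just shows `Body`" case; a tosser would skip this step entirely and read the attribute value directly.
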

Dates land in `TDateTime` regardless of how the backend stored them (Hudson `MM-DD-YY` strings with 1950 pivot, Squish FTS-0001 strings, JAM Unix timestamps, PCBoard / EzyCom DOS PackTime). Stored in attributes as `date.written` / `date.received` via `SetDate` / `GetDate`.

Format-specific bit fields (Hudson byte attr, JAM 32-bit attr, Squish attr, MSG word attr, PCB status, EzyCom dual byte) are unrolled into individual `attr.*` boolean attributes on Read via `UniAttrBitsToAttributes` and recomposed on Write via `UniAttrBitsFromAttributes` and the per-format `XxxAttrFromUni` helpers. The canonical `MSG_ATTR_*` cardinal bitset stays as the internal pivot.

### High-Water Mark (HWM) — per-user scanner pointer

Tossers, scanners, and editors that want to track "last message I processed for user X" can use the per-user HWM API on `TMessageBase`:

```pascal
function  SupportsHWM: boolean;
function  GetHWM(const UserName: AnsiString): longint;
procedure SetHWM(const UserName: AnsiString; MsgNum: longint);
procedure MapUser(const UserName: AnsiString; UserId: longint);
property  ActiveUser: AnsiString;   { auto-bump on Read }
```

HWM uses the format's native lastread mechanism, not a sidecar. A tosser registers itself as just another user (`'NetReader'`, `'Allfix'`, `'FidoMail-Toss'`) and its HWM lives in the same file the BBS uses for human-user lastread, so multiple consumers naturally coexist without colliding.


**Coverage:**

| Format | HWM | Mechanism |
|---|:-:|---|
| JAM | ✓ | `.JLR` (CRC32(lower(name))) |
| Squish | ✓ | `.SQL` (CRC32(lower(name))) |
| Hudson | ✓ | `LASTREAD.BBS` per-(user-id, board); needs `MapUser` + `Board` |
| GoldBase | ✓ | `LASTREAD.DAT` per-(user-id, board); needs `MapUser` + `Board` |
| EzyCom | — | per-user state lives in the BBS user records, not the message base; no msg-base lastread file to plumb |
| Wildcat | — | SDK exposes `MarkMsgRead` per-message but no per-user HWM primitive |
| PCBoard | — | USERS file lastread per-conference; deferred |
| MSG, PKT | — | spec has no HWM concept |

For the multi-board formats (Hudson, GoldBase) the caller must set both:

- `base.MapUser('NetReader', 60001)` — pick a numeric user ID (use 60000+ to avoid colliding with real BBS users).
- `base.Board := N` — the board / conference number this scan is for. The same physical Hudson base contains all 200 boards; HWM is per-(user, board).

If either is missing, `GetHWM` returns -1.

For unsupported formats `SupportsHWM` returns false and `GetHWM` returns -1; `SetHWM` is a no-op. The caller falls back to its own state for those formats (e.g. NR's dupedb).

**Auto-bump pattern for scanners:**

```pascal
base.ActiveUser := 'NetReader';
for i := 0 to base.MessageCount - 1 do begin
  base.ReadMessage(i, msg);
  { ... process msg ... }
  { HWM auto-tracks the highest msg.num seen for NetReader. }
end;
```

When `ActiveUser` is set, `ReadMessage` calls `SetHWM` after each successful read if the just-read `msg.num` is strictly greater than the current HWM. Never decrements -- reading a lower-numbered message is a no-op. Default off (`ActiveUser = ''`).

**Multi-tenant by design:** every scanner / tosser gets its own slot in the lastread file, keyed by its name. NR as `'NetReader'`, Allfix as `'Allfix'`, Fimail as `'FidoMail-Toss'` -- they all coexist in `.JLR` / `.SQL` without interfering with each other or with human-user lastread.

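
The auto-bump rule and the per-name slots described above can be modelled in a few lines. This is a minimal sketch under stated assumptions, not the backends' code: the lastread file is replaced by an in-memory `TStringList`, slots are keyed by the lower-cased name rather than a CRC, and `BumpHWM`, `GetSlot`, and `ReadAs` are hypothetical names.

```pascal
program HwmSketch;
{$mode objfpc}{$H+}

uses
  Classes, SysUtils;

{ Pure form of the auto-bump rule: the mark only ever moves forward. }
function BumpHWM(Current, JustRead: longint): longint;
begin
  if JustRead > Current then
    Result := JustRead
  else
    Result := Current;
end;

{ In-memory stand-in for a lastread file: one name=value slot per
  consumer. Returns -1 when the consumer has no slot yet. }
function GetSlot(Slots: TStrings; const User: string): longint;
begin
  Result := StrToIntDef(Slots.Values[LowerCase(User)], -1);
end;

{ Simulates a successful read of MsgNum while User is the active user. }
procedure ReadAs(Slots: TStrings; const User: string; MsgNum: longint);
begin
  Slots.Values[LowerCase(User)] :=
    IntToStr(BumpHWM(GetSlot(Slots, User), MsgNum));
end;

var
  Slots: TStringList;
begin
  Slots := TStringList.Create;
  try
    ReadAs(Slots, 'NetReader', 10);
    ReadAs(Slots, 'NetReader', 7);   { lower number: slot stays at 10 }
    ReadAs(Slots, 'Allfix', 3);      { separate slot, NetReader unaffected }
    WriteLn('NetReader: ', GetSlot(Slots, 'NetReader'));
    WriteLn('Allfix:    ', GetSlot(Slots, 'Allfix'));
    WriteLn('Nobody:    ', GetSlot(Slots, 'Nobody'));
  finally
    Slots.Free;
  end;
end.
```

Keeping the comparison in a pure function makes the "never decrements" property easy to test in isolation from any file format.
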

**Pack/purge** is the format's responsibility: each backend's Pack rewrites the lastread file in step with the message renumbering. For JAM and Squish this is handled natively.

### `area` auto-population

When the caller passes an `AAreaTag` to `MessageBaseOpen` (or sets the `AreaTag` property post-construction), every successful `ReadMessage` auto-populates `Msg.Attributes['area']` with that tag — but only if the adapter didn't already populate it from on-disk data (PKT's AREA kludge, for example).

This saves echomail consumers from having to copy AreaTag into every message attribute manually. Multi-format scanners always get a populated `area` when the area is configured.

### Shared kludge plumbing — `mb.kludge`

`mb.kludge` exposes the FTSC-form kludge parsing and emission helpers shared by the inline-kludge backends (MSG, PKT) and the CtrlInfo-style backend (Squish), and also used by JAM's FTSKLUDGE subfield walking:

```pascal
function ParseKludgeLine(const Line: AnsiString;
                         var A: TMsgAttributes): boolean;
procedure SplitKludgeBlob(const RawBody: AnsiString;
                          out PlainBody: AnsiString;
                          var A: TMsgAttributes);
function BuildKludgePrefix(const A: TMsgAttributes): AnsiString;
function BuildKludgeSuffix(const A: TMsgAttributes): AnsiString;
```

Consumers that need to parse raw FTSC body blobs (e.g. parity tests, format converters, debug tools) can call these directly without reaching into a backend. Single source of truth for kludge naming, INTL/FMPT/TOPT recognition, and the `kludge.` forward-compat passthrough.

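
To give a feel for what parsing a single kludge line involves, here is a small self-contained sketch. It is not the library's `ParseKludgeLine` and makes simplifying assumptions: a kludge is any line starting with the ^A control character (`#1`), the name runs up to the first colon or space, and the key is simply the lower-cased name. `SplitKludge` is a hypothetical function invented for the example.

```pascal
program KludgeSketch;
{$mode objfpc}{$H+}

uses
  SysUtils;

{ Splits one ^A-prefixed line into a lower-cased key and its value.
  Returns False when the line is not a kludge. Illustrative only. }
function SplitKludge(const Line: string; out Key, Value: string): boolean;
var
  i: integer;
begin
  Key := '';
  Value := '';
  Result := (Length(Line) > 1) and (Line[1] = #1);
  if not Result then
    Exit;
  { The name runs up to the first colon or space, whichever comes first. }
  i := 2;
  while (i <= Length(Line)) and not (Line[i] in [':', ' ']) do
    Inc(i);
  Key := LowerCase(Copy(Line, 2, i - 2));
  { Skip the optional colon; Trim removes the spaces that follow. }
  if (i <= Length(Line)) and (Line[i] = ':') then
    Inc(i);
  Value := Trim(Copy(Line, i, MaxInt));
end;

var
  K, V: string;
begin
  if SplitKludge(#1'MSGID: 2:5020/1 1a2b3c4d', K, V) then
    WriteLn(K, ' = ', V);
  if SplitKludge(#1'INTL 2:5020/1 2:5020/2', K, V) then
    WriteLn(K, ' = ', V);
  if not SplitKludge('Plain body text', K, V) then
    WriteLn('not a kludge');
end.
```

Stopping at the first colon or space lets the same loop handle both colon-terminated names (`MSGID:`) and space-separated ones (`INTL`), even though the INTL value itself contains colons.
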

### Capabilities API — backend self-description

Each backend declares the canonical list of attribute keys it understands via a class function:

```pascal
class function TMessageBase.ClassSupportedAttributes: TStringDynArray;
```

Callers query before setting:

```pascal
if base.SupportsAttribute('attr.returnreceipt') then
  RenderReceiptCheckbox
else
  HideReceiptCheckbox;
```

Backends silently ignore unknown attributes on Write (RFC 822 X-header semantics — fine for forward compatibility); the capabilities API exists so callers know in advance which keys won't survive on a given format.

The full per-format support matrix lives in `docs/attributes-registry.md`.

## Locking

Three layers, applied in order on every `Open`:

1. **In-process** — `TRTLCriticalSection` per `TMessageBase` instance.
2. **Cross-process** — advisory lock on a sentinel file (`.lck` or, for Squish, `.SQL` so we coexist with other Squish-aware tools). `fpflock` with `LOCK_EX` or `LOCK_SH` on Unix, `LockFileEx` on Windows. Retry with backoff up to a configurable timeout (default 30s). Lock acquire/release fires events.
3. **OS share modes** — `fmShareDenyWrite` for writers, `fmShareDenyNone` for readers, matching DOS-era multi-process sharing conventions every classic format expects.

## Events

`TMessageEvents` lets callers subscribe one or more handlers to receive `metBaseOpened`, `metMessageRead`, `metMessageWritten`, `metLockAcquired`, `metPackProgress`, etc. Internally the dispatcher serialises calls so handlers do not need to be reentrant.

## Concurrent tossers

`TPacketBatch` (was `ma.batch` here pre-0.4.0; now lives in `fpc-ftn-transport` as `tt.pkt.batch`) owns a queue of `.pkt` paths and a worker thread pool. Each worker opens its packet, reads messages, and hands each to the caller-provided processor. The batch caches one `TMessageBase` per destination area so writes serialise through layer-1 locking; layer-2 keeps separate processes (e.g. an editor) safe at the same time.
The class name is unchanged for caller compatibility.

## Memory ownership

Shared rule across the fpc-* ecosystem (msgbase, ftn-transport, binkp, comet, emsi, log):

Public types exposed to callers are either **value records** (`TFTNAddress`, `TUniMessage`, `TMsgAttributes` — owned by the caller's stack/heap; copy semantics) or **TObject descendants the caller constructs and frees** (`TMessageBase` and its backends). Returned `TBytes` / `string` / `TStream` values are RTL-managed and the caller frees via normal heap semantics.

The library never allocates memory with `GetMem` and expects the caller to `FreeMem` (or vice versa). This keeps static-linked consumers (no shared-heap plugin model like Fastway's cmem-first pattern) compatible without fiddling.

## Behavioural fidelity

Every format backend is implemented from the published format specification (FTSC documents and the original format authors' own spec papers — see `docs/ftsc-compliance.md`). Tests read and write real sample bases captured from working BBS installations; round-trip tests verify byte-for-byte preservation across read → write → read cycles.