fpc-msgbase/docs/architecture.md
Ken Johnson a541085a4b 0.7.0: extract TFTNAddress into leaf mb.address unit
Ask 1 from fpc-binkp consumer thread: non-storage libraries
(fpc-ftn-transport, fpc-binkp, future fpc-comet-proto / fpc-emsi,
SQL-backed messaging like Fastway) only need TFTNAddress, not the
full 1041-line mb.types.  Extract to src/mb.address.pas (~90 lines,
only SysUtils) so they can cp a single file into their project.

mb.types continues to uses mb.address so existing callers see the
type transitively -- BUT FPC does not propagate record-field access
through re-export, so consumers that touch TFTNAddress.Zone/Net/
Node/Point directly must add mb.address to their own uses clause.
All 7 in-tree .uni adapters, 2 examples, 5 test harnesses updated.

No behavioural change.  Full suite passes, multi-target build
green (x86_64-linux, i386-{linux,freebsd,win32,os2,go32v2}).
2026-04-21 13:22:53 -07:00


# fpc-msgbase — architecture
## Layers
```
┌────────────────────────────────────────────────────┐
│  Caller (BBS, tosser, editor, importer, …)         │
└────────────────────────────────────────────────────┘
┌────────────────────────────────────────────────────┐
│  mb.api (TMessageBase, factory, TUniMessage)       │
├────────────────────────────────────────────────────┤
│  mb.events  mb.lock  mb.paths  mb.kludge           │
├────────────────────────────────────────────────────┤
│  Format backends, two .pas units per format:       │
│    mb.fmt.<fmt>     - native record + I/O class    │
│    mb.fmt.<fmt>.uni - TMessageBase adapter         │
│  mb.fmt.hudson(.uni)    mb.fmt.jam(.uni)           │
│  mb.fmt.squish(.uni)    mb.fmt.msg(.uni)           │
│  mb.fmt.pcboard(.uni)   mb.fmt.ezycom(.uni)        │
│  mb.fmt.goldbase(.uni)  mb.fmt.wildcat(.uni)       │
├────────────────────────────────────────────────────┤
│  RTL: TFileStream, BaseUnix/Windows for locking    │
└────────────────────────────────────────────────────┘
┌────────────────────────────────────────────────────┐
│  Sibling: fpc-ftn-transport                        │
│    tt.pkt.format / tt.pkt.reader (registers        │
│    mbfPkt) / tt.pkt.writer / tt.pkt.batch          │
│    plus forthcoming BSO / ArcMail / drop modules.  │
└────────────────────────────────────────────────────┘
```

PKT is a wire format and lives in `fpc-ftn-transport`, not here.
The `mbfPkt` enum value stays in `mb.types` so `tt.pkt.reader`
can register the backend with the unified-API factory. Consumers
that want to iterate `.pkt` files add `tt.pkt.reader` to their
`uses` clause and call `MessageBaseOpen(mbfPkt, ...)` as usual.
`TPacketBatch` (formerly `ma.batch` in this repository) moved with
it and is now `tt.pkt.batch`.

**Integration gotcha:** to use a backend through the unified
`TMessageBase` API you must include the `.uni` adapter unit in
your `uses` clause, not just the native `mb.fmt.<format>` unit.
The adapter's `initialization` block is what registers the
backend with the factory.

```pascal
uses
  mb.types, mb.events, mb.api,
  mb.fmt.jam, mb.fmt.jam.uni;  { both: .uni is what registers }
```
Forgetting `.uni` produces `EMessageBase: No backend registered
for JAM` at the first `MessageBaseOpen(mbfJam, ...)` call. The
exception message hints at the fix.
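
The registration mechanism can be pictured with a toy registry. This is a minimal sketch, not the library's code: `TFormatId`, `RegisterBackend`, `OpenBackend` and `ENoBackend` are hypothetical stand-ins, and the explicit `RegisterBackend` call in the main block plays the role that the adapter unit's `initialization` section plays in the real library.

```pascal
program RegistrySketch;
{$mode objfpc}{$H+}
uses
  SysUtils;

type
  { Hypothetical stand-ins; the real library's types are richer. }
  TFormatId = (fmtJam, fmtSquish);
  TBackendFactory = function: TObject;
  ENoBackend = class(Exception);

var
  { Global variables start zeroed, so every slot is nil until registered. }
  Registry: array[TFormatId] of TBackendFactory;

procedure RegisterBackend(Fmt: TFormatId; Factory: TBackendFactory);
begin
  Registry[Fmt] := Factory;
end;

function OpenBackend(Fmt: TFormatId): TObject;
begin
  if not Assigned(Registry[Fmt]) then
    raise ENoBackend.CreateFmt('No backend registered for format %d', [Ord(Fmt)]);
  Result := Registry[Fmt]();
end;

function MakeJam: TObject;
begin
  Result := TObject.Create;
end;

var
  B: TObject;
begin
  { An adapter unit would make this call from its initialization section. }
  RegisterBackend(fmtJam, @MakeJam);

  B := OpenBackend(fmtJam);
  B.Free;

  try
    B := OpenBackend(fmtSquish);   { nothing registered: raises }
    B.Free;
  except
    on E: ENoBackend do
      WriteLn('Squish: ', E.Message);
  end;
end.
```

Only the format whose registration code actually ran can be opened, which is why leaving the adapter unit out of `uses` surfaces as a runtime error rather than a compile error.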
## Polymorphism
Every backend descends from `TMessageBase` and implements the abstract
`DoOpen`, `DoClose`, `DoMessageCount`, `DoReadMessage`, `DoWriteMessage`
contract. Callers can either:
1. Use the unified API — `MessageBaseOpen(format, path, mode)` returns a
`TMessageBase`. Read/write through `TUniMessage`. Format-agnostic.
2. Drop down to format-specific class methods (e.g. `TJamBase.IncModCounter`,
`TSquishBase.SqHashName`) when they need behaviour the unified API cannot
express. Each backend keeps its rich API public.
## TUniMessage — two-area model
```pascal
TUniMessage = record
  Body:       AnsiString;      { only the message text }
  Attributes: TMsgAttributes;  { everything else, key/value }
end;
```
Two areas, no surprises:
- **Body** carries the user-visible message text and nothing else.
Never kludge lines, never headers, never SEEN-BY/PATH. Always a
ready-to-display blob.
- **Attributes** carries every other piece of data: From, To,
Subject, dates, addresses, attribute bits, FTSC kludges (MSGID,
ReplyID, PID, SEEN-BY, PATH, …), and per-format extras
(`jam.msgidcrc`, `squish.umsgid`, `pcb.confnum`, …).

Same model as RFC 822 email (headers + body). Lossless round-trip
across Read → Write → Read is enforced by the regression suite in
`tests/test_roundtrip_attrs.pas`.

**The library never composes presentation.** A BBS that wants to
display kludges inline walks `Attributes` and prepends `^AMSGID:`
etc. to its own display. A BBS that hides kludges just shows
`Body`. A tosser that needs MSGID for dupe detection reads
`Attributes.Get('msgid')` directly, with no body parsing required.
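
The display-side composition described above might look roughly like this. It is a sketch under stated assumptions: a `TStringList` of `key=value` pairs stands in for `TMsgAttributes`, and `RenderWithKludges` and the `ShownKeys` list are hypothetical names chosen for illustration.

```pascal
program ShowKludges;
{$mode objfpc}{$H+}
uses
  Classes, SysUtils;

const
  { Which attribute keys this hypothetical reader shows as kludge lines. }
  ShownKeys: array[0..1] of string = ('msgid', 'pid');

{ Prepends one "^ANAME: value" line per present key, then the body.
  Attrs is a stand-in for TMsgAttributes. }
function RenderWithKludges(Attrs: TStrings; const Body: string): string;
var
  i: Integer;
  V: string;
begin
  Result := '';
  for i := Low(ShownKeys) to High(ShownKeys) do
  begin
    V := Attrs.Values[ShownKeys[i]];
    if V <> '' then
      Result := Result + '^A' + UpperCase(ShownKeys[i]) + ': ' + V + LineEnding;
  end;
  Result := Result + Body;
end;

var
  Attrs: TStringList;
begin
  Attrs := TStringList.Create;
  try
    Attrs.Values['msgid'] := '2:5020/1 1a2b3c4d';
    Attrs.Values['from'] := 'Sysop';
    WriteLn(RenderWithKludges(Attrs, 'Hello, world.'));
  finally
    Attrs.Free;
  end;
end.
```

A reader that hides kludges would skip the loop and print the body unchanged.
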

Dates land in `TDateTime` regardless of how the backend stored
them (Hudson `MM-DD-YY` strings with 1950 pivot, Squish FTS-0001
strings, JAM Unix timestamps, PCBoard / EzyCom DOS PackTime).
They are stored in attributes as `date.written` / `date.received`
via `SetDate` / `GetDate`.
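
As an illustration of the Hudson case, a two-digit-year conversion could be sketched as follows. `HudsonDateToDateTime` is a hypothetical helper, and the reading of "1950 pivot" used here (`YY >= 50` maps to 19YY, otherwise 20YY) is an assumption, not something taken from the library source.

```pascal
program HudsonDate;
{$mode objfpc}{$H+}
uses
  SysUtils;

{ Converts 'MM-DD-YY' to a TDateTime.
  Assumption: "1950 pivot" means YY >= 50 is 19YY and YY < 50 is 20YY. }
function HudsonDateToDateTime(const S: string): TDateTime;
var
  M, D, Y: Integer;
begin
  if (Length(S) <> 8) or (S[3] <> '-') or (S[6] <> '-') then
    raise EConvertError.CreateFmt('Bad Hudson date: "%s"', [S]);
  M := StrToInt(Copy(S, 1, 2));
  D := StrToInt(Copy(S, 4, 2));
  Y := StrToInt(Copy(S, 7, 2));
  if Y >= 50 then
    Inc(Y, 1900)
  else
    Inc(Y, 2000);
  Result := EncodeDate(Y, M, D);   { raises EConvertError on impossible dates }
end;

begin
  WriteLn(FormatDateTime('yyyy-mm-dd', HudsonDateToDateTime('04-21-26')));
  WriteLn(FormatDateTime('yyyy-mm-dd', HudsonDateToDateTime('12-31-93')));
end.
```

Once every backend converts to `TDateTime` at the edge like this, callers never need to know which on-disk date encoding a message came from.
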
Format-specific bit fields (Hudson byte attr, JAM 32-bit attr,
Squish attr, MSG word attr, PCB status, EzyCom dual byte) are
unrolled into individual `attr.*` boolean attributes on Read via
`UniAttrBitsToAttributes` and recomposed on Write via
`UniAttrBitsFromAttributes` and the per-format `XxxAttrFromUni`
helpers. The canonical `MSG_ATTR_*` cardinal bitset stays as the
internal pivot.
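
The unroll/recompose idea can be sketched with a small mask table. The four bit values follow the FTS-0001 attribute word; the `attr.*` key names, the `'true'` encoding, and the `BitsToAttributes` / `BitsFromAttributes` helpers are illustrative assumptions and are not the library's `UniAttrBits*` routines.

```pascal
program AttrBits;
{$mode objfpc}{$H+}
uses
  Classes, SysUtils;

type
  TBitName = record
    Mask: Cardinal;
    Key: string;
  end;

const
  { Masks per FTS-0001; key names are hypothetical. }
  BitNames: array[0..3] of TBitName = (
    (Mask: $0001; Key: 'attr.private'),
    (Mask: $0002; Key: 'attr.crash'),
    (Mask: $0004; Key: 'attr.received'),
    (Mask: $0008; Key: 'attr.sent')
  );

{ Read side: one boolean attribute per set bit. }
procedure BitsToAttributes(Bits: Cardinal; Attrs: TStrings);
var
  i: Integer;
begin
  for i := Low(BitNames) to High(BitNames) do
    if (Bits and BitNames[i].Mask) <> 0 then
      Attrs.Values[BitNames[i].Key] := 'true';
end;

{ Write side: rebuild the bitset from the boolean attributes. }
function BitsFromAttributes(Attrs: TStrings): Cardinal;
var
  i: Integer;
begin
  Result := 0;
  for i := Low(BitNames) to High(BitNames) do
    if Attrs.Values[BitNames[i].Key] = 'true' then
      Result := Result or BitNames[i].Mask;
end;

var
  Attrs: TStringList;
begin
  Attrs := TStringList.Create;
  try
    BitsToAttributes($0003, Attrs);          { private + crash }
    WriteLn(Attrs.CommaText);
    WriteLn(BitsFromAttributes(Attrs) = $0003);
  finally
    Attrs.Free;
  end;
end.
```

Driving both directions from one table keeps the read and write mappings from drifting apart.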
### High-Water Mark (HWM) — per-user scanner pointer
Tossers, scanners, and editors that want to track "last message I
processed for user X" can use the per-user HWM API on
`TMessageBase`:
```pascal
function SupportsHWM: boolean;
function GetHWM(const UserName: AnsiString): longint;
procedure SetHWM(const UserName: AnsiString; MsgNum: longint);
procedure MapUser(const UserName: AnsiString; UserId: longint);
property ActiveUser: AnsiString; { auto-bump on Read }
```
HWM uses the format's native lastread mechanism, not a sidecar.
A tosser registers itself as just another user (`'NetReader'`,
`'Allfix'`, `'FidoMail-Toss'`) and its HWM lives in the same
file the BBS uses for human-user lastread, so multiple consumers
naturally coexist without colliding.
**Coverage:**
| Format | HWM | Mechanism |
|---|:-:|---|
| JAM | ✓ | `.JLR` (CRC32(lower(name))) |
| Squish | ✓ | `.SQL` (CRC32(lower(name))) |
| Hudson | ✓ | `LASTREAD.BBS` per-(user-id, board); needs `MapUser` + `Board` |
| GoldBase | ✓ | `LASTREAD.DAT` per-(user-id, board); needs `MapUser` + `Board` |
| EzyCom | — | per-user state lives in the BBS user records, not the message base; no msg-base lastread file to plumb |
| Wildcat | — | SDK exposes `MarkMsgRead` per-message but no per-user HWM primitive |
| PCBoard | — | USERS file lastread per-conference; deferred |
| MSG, PKT | — | spec has no HWM concept |

For the multi-board formats (Hudson, GoldBase) the caller must
set both:

- `base.MapUser('NetReader', 60001)`: pick a numeric user ID
  (use 60000+ to avoid colliding with real BBS users).
- `base.Board := N`: the board / conference number this scan
  is for. The same physical Hudson base contains all 200 boards;
  HWM is per-(user, board).

If either is missing, `GetHWM` returns -1.

For unsupported formats `SupportsHWM` returns false and `GetHWM`
returns -1; `SetHWM` is a no-op. The caller falls back to its own
state for those formats (e.g. NR's dupedb).

**Auto-bump pattern for scanners:**
```pascal
base.ActiveUser := 'NetReader';
for i := 0 to base.MessageCount - 1 do begin
  base.ReadMessage(i, msg);
  { ... process msg ... }
  { HWM auto-tracks the highest msg.num seen for NetReader. }
end;
```
When `ActiveUser` is set, `ReadMessage` calls `SetHWM` after each
successful read if the just-read `msg.num` is strictly greater
than the current HWM. The HWM never decrements: reading a
lower-numbered message leaves it unchanged. Auto-bump is off by
default (`ActiveUser = ''`).

**Multi-tenant by design:** every scanner / tosser gets its own
slot in the lastread file, keyed by its name. NR as `'NetReader'`,
Allfix as `'Allfix'`, Fimail as `'FidoMail-Toss'` -- they all
coexist in `.JLR` / `.SQL` without interfering with each other or
with human-user lastread.
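
The combination of per-name slots and the strictly-greater bump rule can be modelled in a few lines. This is a toy: a `TStringList` keyed by the lower-cased name stands in for the lastread file (the real formats key by a CRC32 of the name or by a numeric user ID), and `GetMark` / `NoteRead` are hypothetical names.

```pascal
program HwmSlots;
{$mode objfpc}{$H+}
uses
  Classes, SysUtils;

{ Returns the mark stored for User, or -1 when the user has no slot yet. }
function GetMark(Slots: TStrings; const User: string): LongInt;
begin
  Result := StrToIntDef(Slots.Values[LowerCase(User)], -1);
end;

{ Auto-bump rule: move the mark only when MsgNum is strictly higher. }
procedure NoteRead(Slots: TStrings; const User: string; MsgNum: LongInt);
begin
  if MsgNum > GetMark(Slots, User) then
    Slots.Values[LowerCase(User)] := IntToStr(MsgNum);
end;

var
  Slots: TStringList;
begin
  Slots := TStringList.Create;
  try
    NoteRead(Slots, 'NetReader', 10);
    NoteRead(Slots, 'NetReader', 12);
    NoteRead(Slots, 'NetReader', 11);  { lower number: mark stays at 12 }
    NoteRead(Slots, 'Allfix', 3);      { separate slot, untouched by NetReader }
    WriteLn('NetReader=', GetMark(Slots, 'NetReader'),
            ' Allfix=', GetMark(Slots, 'Allfix'));
  finally
    Slots.Free;
  end;
end.
```

Because each consumer only ever touches its own key, adding another scanner needs no coordination with the existing ones.
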
**Pack/purge** is the format's responsibility: each backend's
Pack rewrites the lastread file in step with the message
renumbering. For JAM and Squish this is handled natively.
### `area` auto-population
When the caller passes an `AAreaTag` to `MessageBaseOpen` (or
sets the `AreaTag` property post-construction), every successful
`ReadMessage` auto-populates `Msg.Attributes['area']` with that
tag — but only if the adapter didn't already populate it from
on-disk data (PKT's AREA kludge, for example).
This saves echomail consumers from having to copy AreaTag into
every message attribute manually. Multi-format scanners always
get a populated `area` when the area is configured.
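
The "fill only if absent" rule amounts to a single guarded assignment. In this sketch `FillArea` is a hypothetical helper and a `TStringList` again stands in for `TMsgAttributes`.

```pascal
program AreaFill;
{$mode objfpc}{$H+}
uses
  Classes;

{ Sets 'area' from the configured tag unless the adapter already supplied one. }
procedure FillArea(Attrs: TStrings; const AreaTag: string);
begin
  if (AreaTag <> '') and (Attrs.Values['area'] = '') then
    Attrs.Values['area'] := AreaTag;
end;

var
  FromDisk, Plain: TStringList;
begin
  FromDisk := TStringList.Create;
  Plain := TStringList.Create;
  try
    FromDisk.Values['area'] := 'FIDONEWS';   { e.g. taken from an AREA kludge }
    FillArea(FromDisk, 'LOCAL.TEST');        { on-disk value wins }
    FillArea(Plain, 'LOCAL.TEST');           { empty, so the tag is applied }
    WriteLn(FromDisk.Values['area'], ' ', Plain.Values['area']);
  finally
    Plain.Free;
    FromDisk.Free;
  end;
end.
```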
### Shared kludge plumbing — `mb.kludge`
`mb.kludge` exposes the helpers for parsing and emitting FTSC-form
kludges. They are shared by the inline-kludge backends (MSG, PKT)
and the CtrlInfo-style backend (Squish), and JAM's FTSKLUDGE
subfield walking uses them too:
```pascal
function ParseKludgeLine(const Line: AnsiString;
                         var A: TMsgAttributes): boolean;
procedure SplitKludgeBlob(const RawBody: AnsiString;
                          out PlainBody: AnsiString;
                          var A: TMsgAttributes);
function BuildKludgePrefix(const A: TMsgAttributes): AnsiString;
function BuildKludgeSuffix(const A: TMsgAttributes): AnsiString;
```
Consumers that need to parse raw FTSC body blobs (e.g. parity
tests, format converters, debug tools) can call these directly
without reaching into a backend. Single source of truth for
kludge naming, INTL/FMPT/TOPT recognition, and the `kludge.<name>`
forward-compat passthrough.
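
To give a feel for what parsing a single kludge line involves, here is a simplified stand-alone sketch. It is not `ParseKludgeLine`: the `ParseKludge` name, the `TStringList` target, and the choice to use the lower-cased kludge name as the key are all assumptions, and it ignores the INTL/FMPT/TOPT special cases mentioned above.

```pascal
program KludgeParse;
{$mode objfpc}{$H+}
uses
  Classes, SysUtils;

{ Parses one "^AName: value" or "^AName value" line into Attrs.
  Returns False if the line does not start with SOH (#1). }
function ParseKludge(const Line: string; Attrs: TStrings): Boolean;
var
  Rest, Name, Value: string;
  P: Integer;
begin
  Result := (Length(Line) > 1) and (Line[1] = #1);
  if not Result then
    Exit;
  Rest := Copy(Line, 2, MaxInt);
  P := Pos(' ', Rest);
  if P = 0 then
  begin
    Name := Rest;
    Value := '';
  end
  else
  begin
    Name := Copy(Rest, 1, P - 1);
    Value := Trim(Copy(Rest, P + 1, MaxInt));
  end;
  { Drop the optional trailing colon from the kludge name. }
  if (Name <> '') and (Name[Length(Name)] = ':') then
    SetLength(Name, Length(Name) - 1);
  Attrs.Add(LowerCase(Name) + '=' + Value);
end;

var
  Attrs: TStringList;
begin
  Attrs := TStringList.Create;
  try
    ParseKludge(#1'MSGID: 2:5020/1 1a2b3c4d', Attrs);
    ParseKludge(#1'INTL 2:5020/1 2:5020/2', Attrs);
    WriteLn(ParseKludge('Plain body text', Attrs));  { not a kludge line }
    WriteLn(Attrs.Values['msgid']);
    WriteLn(Attrs.Values['intl']);
  finally
    Attrs.Free;
  end;
end.
```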
### Capabilities API — backend self-description
Each backend declares the canonical list of attribute keys it
understands via a class function:
```pascal
class function TMessageBase.ClassSupportedAttributes: TStringDynArray;
```
Callers query before setting:
```pascal
if base.SupportsAttribute('attr.returnreceipt') then
  RenderReceiptCheckbox
else
  HideReceiptCheckbox;
```
Backends silently ignore unknown attributes on Write (RFC 822
X-header semantics — fine for forward compatibility); the
capabilities API exists so callers know in advance which keys won't
survive on a given format. The full per-format support matrix lives
in `docs/attributes-registry.md`.
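
A caller that converts between formats might use the capability list to report which keys would be lost. The sketch below assumes the list is available as a `TStringDynArray` (as `ClassSupportedAttributes` returns); `Supports`, `DroppedKeys`, and the sample capability list are hypothetical.

```pascal
program CapsCheck;
{$mode objfpc}{$H+}
uses
  Classes, SysUtils, Types;

function Supports(const Caps: TStringDynArray; const Key: string): Boolean;
var
  i: Integer;
begin
  Result := False;
  for i := 0 to High(Caps) do
    if SameText(Caps[i], Key) then
      Exit(True);
end;

{ Lists the keys in Attrs that are missing from Caps, i.e. the ones a
  backend with that capability list would ignore on write. }
function DroppedKeys(const Caps: TStringDynArray; Attrs: TStrings): string;
var
  i: Integer;
begin
  Result := '';
  for i := 0 to Attrs.Count - 1 do
    if not Supports(Caps, Attrs.Names[i]) then
    begin
      if Result <> '' then
        Result := Result + ', ';
      Result := Result + Attrs.Names[i];
    end;
end;

var
  Caps: TStringDynArray;
  Attrs: TStringList;
begin
  { Made-up capability list for illustration. }
  SetLength(Caps, 3);
  Caps[0] := 'from';
  Caps[1] := 'to';
  Caps[2] := 'subject';

  Attrs := TStringList.Create;
  try
    Attrs.Values['from'] := 'Sysop';
    Attrs.Values['attr.returnreceipt'] := 'true';
    WriteLn('dropped: ', DroppedKeys(Caps, Attrs));
  finally
    Attrs.Free;
  end;
end.
```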
## Locking
Three layers, applied in order on every `Open`:
1. **In-process**: `TRTLCriticalSection` per `TMessageBase` instance.
2. **Cross-process**: advisory lock on a sentinel file
   (`<base>.lck` or, for Squish, `<base>.SQL` so we coexist with other
   Squish-aware tools). `fpflock` with `LOCK_EX` or `LOCK_SH` on Unix,
   `LockFileEx` on Windows. Retry with backoff up to a configurable
   timeout (default 30s). Lock acquire/release fires events.
3. **OS share modes**: `fmShareDenyWrite` for writers,
   `fmShareDenyNone` for readers, matching the DOS-era multi-process
   sharing conventions every classic format expects.
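
The retry-with-backoff step of layer 2 can be sketched independently of the platform lock call. Everything here is an assumption for illustration: the `TTryLock` callback type, the doubling schedule, and the 10 ms / 500 ms delay bounds are not taken from the library, and a real implementation would presumably plug a non-blocking lock attempt in place of `FakeTryLock`.

```pascal
program LockRetry;
{$mode objfpc}{$H+}
uses
  SysUtils;

type
  TTryLock = function: Boolean;

{ Calls TryLock until it succeeds or TimeoutMs elapses, doubling the
  wait between attempts up to MaxDelayMs. }
function AcquireWithBackoff(TryLock: TTryLock; TimeoutMs: QWord): Boolean;
const
  MaxDelayMs = 500;
var
  Deadline: QWord;
  Delay: Cardinal;
begin
  Deadline := GetTickCount64 + TimeoutMs;
  Delay := 10;
  repeat
    if TryLock() then
      Exit(True);
    if GetTickCount64 >= Deadline then
      Exit(False);
    Sleep(Delay);
    Delay := Delay * 2;
    if Delay > MaxDelayMs then
      Delay := MaxDelayMs;
  until False;
end;

var
  Attempts: Integer = 0;

{ Fake lock that fails twice and then succeeds. }
function FakeTryLock: Boolean;
begin
  Inc(Attempts);
  Result := Attempts >= 3;
end;

begin
  if AcquireWithBackoff(@FakeTryLock, 30000) then
    WriteLn('locked after ', Attempts, ' attempts')
  else
    WriteLn('timed out');
end.
```

Capping the delay keeps a waiting process responsive once the holder releases the lock, while the early short delays keep uncontended acquisition fast.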
## Events
`TMessageEvents` lets callers subscribe one or more handlers to receive
`metBaseOpened`, `metMessageRead`, `metMessageWritten`, `metLockAcquired`,
`metPackProgress`, etc. Internally the dispatcher serialises calls so
handlers do not need to be reentrant.
## Concurrent tossers
`TPacketBatch` (was `ma.batch` here pre-0.4.0; now lives in
`fpc-ftn-transport` as `tt.pkt.batch`) owns a queue of `.pkt`
paths and a worker thread pool. Each worker opens its packet,
reads messages, hands each to the caller-provided processor.
The batch caches one `TMessageBase` per destination area so
writes serialise through layer-1 locking; layer-2 keeps
separate processes (e.g. an editor) safe at the same time.
Class name unchanged for caller compatibility.
## Memory ownership
Shared rule across the fpc-* ecosystem (msgbase, ftn-transport,
binkp, comet, emsi, log):
Public types exposed to callers are either **value records**
(`TFTNAddress`, `TUniMessage`, `TMsgAttributes`; owned by the
caller's stack/heap, with copy semantics) or **TObject descendants
the caller constructs and frees** (`TMessageBase` and its
backends). Returned `TBytes` and `string` values are
reference-counted by the RTL and need no explicit release; a
returned `TStream` is an ordinary object that the caller frees.
The library never allocates memory with `GetMem` and expects
the caller to `FreeMem` (or vice versa). This keeps static-
linked consumers (no shared-heap plugin model like Fastway's
cmem-first pattern) compatible without fiddling.
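
The three ownership cases look like this from the caller's side. The sketch uses only RTL types: `TAddr` is a hypothetical value record in the spirit of `TFTNAddress`, and a `TStringList` stands in for any object the caller constructs.

```pascal
program Ownership;
{$mode objfpc}{$H+}
uses
  Classes, SysUtils;

type
  { Hypothetical value record; not the library's TFTNAddress. }
  TAddr = record
    Zone, Net, Node, Point: Word;
  end;

var
  A, B: TAddr;
  Data: TBytes;
  S: TStringList;
begin
  { Value record: assignment copies, so the two variables are independent. }
  A.Zone := 2; A.Net := 5020; A.Node := 1; A.Point := 0;
  B := A;
  B.Node := 99;
  WriteLn(A.Node, ' ', B.Node);

  { Reference-counted buffer: released automatically when it goes out of use. }
  SetLength(Data, 16);
  WriteLn(Length(Data));

  { Object: whoever constructs it frees it. }
  S := TStringList.Create;
  try
    S.Add('caller owns this object');
  finally
    S.Free;
  end;
end.
```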
## Behavioural fidelity
Every format backend is implemented from the published format
specification (FTSC documents and the original format authors' own
spec papers — see `docs/ftsc-compliance.md`). Tests read and write
real sample bases captured from working BBS installations; round-trip
tests verify byte-for-byte preservation across read → write → read
cycles.