Add design proposal: plug-and-forget redesign roadmap
Six-week reshape plan: lossless TMsgRecord with Extras bag, ITsmIO abstraction, TOutboundBatch, single-callback events, explicit locking, typed exception tree. Captures the cross-project conversation with NetReader and the byte-agreement cross-verifier as the first actionable step. Pre-rename baseline.
docs/PROPOSAL.md
@@ -0,0 +1,825 @@
# message_api — plug-and-forget redesign proposal

**Status:** design handover, pre-implementation. Prepared 2026-04-15.
**Target audience:** the next implementation session.
**Precondition:** message_api is at 0.1.0 (commit `b79e7fb` at the time
of writing); 9 format backends ported from Allfix, 7/7 tests green.

---

## TL;DR

message_api today solves the *read* side cleanly and ports 9 formats
with tested fidelity. It's **not yet plug-and-forget** for embedders
the way comet is. Four things are missing: a lossless round-trip
guarantee, an atomic outbound-packet builder (`TOutboundBatch`),
format-agnostic I/O injection, and a single-callback event model.

This proposal is **not a rewrite** — it's a six-week reshape in
place. The format backends stay. The scaffolding changes.

When done, the acceptance test is a five-line Hello World that opens
a path, reads messages, closes. No format name, no lock ceremony,
no event registry, no `.uni` sidecar unit, no init-order hazard.

---

## Context: where this proposal came from

A cross-project conversation between NetReader (`../netreader`, the
HPT-drop-in mail tosser) and message_api. NetReader has its own
format layer (`src/core/nr.msgbase.*`), also Allfix-based, hardened
by a 39/39 live trial and a 54/54 CLI-parity test suite.

During that conversation we asked:

1. Do the two libraries agree on bytes when reading/writing the same
   base? (**Unverified. First action item below.**)
2. What would "plug it in and forget" look like for message_api?

The conclusions that drove this proposal:

- message_api's `TPacketBatch` handles the **inbound** concurrent
  tosser side cleanly. It **does not** handle the symmetric outbound
  side (writing per-link packets with size rotation and atomic
  finalize). Neither library does, today.
- `TUniMessage` as-currently-defined is **lossy**: JAM
  `MsgIdCRC`/`ReplyCRC`, `DateProcessed`, and every other
  format-specific header field has no canonical slot, so a caller
  that does `read → write` through the unified API silently loses
  bytes.
- The two-unit-per-format inclusion pattern (`ma.fmt.jam` +
  `ma.fmt.jam.uni`) asks callers to remember both. The `.uni`
  sidecar's only job is `initialization` registration — it belongs
  *inside* the format unit.
- `wc_sdk/` is ~66K lines of BTrieve-era Pascal pulled in
  unconditionally by any caller who uses Wildcat. Embeddability
  requires build-time gating.
- The error model mixes typed exceptions (`EMessageBase` from the
  factory) with boolean returns (all other methods) with event
  fires (locking). Three mental models for one library is two too
  many.

The reshape below addresses each.

---

## Current state — file-by-file

| Unit | LOC | Purpose | Keep? |
|---|---|---|---|
| `src/ma.api.pas` | ~330 | `TMessageBase` abstract, factory | **Reshape** |
| `src/ma.types.pas` | ~600 | `TUniMessage`, `TFTNAddress`, bits | **Reshape** |
| `src/ma.events.pas` | ~100 | multi-subscriber event registry | **Replace** with single callback |
| `src/ma.lock.pas` | ~250 | 3-layer locking | **Keep**, make explicit |
| `src/ma.paths.pas` | ~180 | per-format path derivation | **Keep** |
| `src/ma.batch.pas` | 333 | inbound `TPacketBatch` | **Keep** + add `TOutboundBatch` twin |
| `src/formats/ma.fmt.<fmt>.pas` × 9 | 400–1300 each | native Allfix-based backend | **Keep** — hardened |
| `src/formats/ma.fmt.<fmt>.uni.pas` × 9 | 100–200 each | adapter to `TUniMessage` | **Fold into** the format unit |
| `src/wc_sdk/` | ~40K | Wildcat BTrieve SDK | **Gate** behind `{$DEFINE MA_WITH_WILDCAT}` |

Tests: `tests/test_*.pas` → keep, extend with the round-trip
corpus below.

---

## The five-line acceptance test

When the reshape is done, this works unchanged across every format:

```pascal
uses OpenMsgLib, OpenMsgLib.Auto;
var B: TMsgBase; M: TMsgRecord; i: integer;
begin
  B := OpenMsgBase('/path/to/base', momReadOnly);
  try
    for i := 0 to B.Count - 1 do
      if B.Read(i, M) then WriteLn(M.Subject);
  finally B.Free; end;
end.
```

No format name (autodetect). No lock call (default = no lock, caller
opts in). No event wiring (default = silent). No `.uni` unit (one
unit per format registers itself). No init-order hazard (single
registration point).

If a new session reads a test base and the above doesn't work
verbatim, the reshape isn't done.

---

## Target architecture

### Three-tier layering

```
Caller (BBS, tosser, editor, importer)
  │
  ├── TMsgBase (unified)       ← most callers live here
  ├── Direct format class      ← drop-down when Extras aren't enough
  └── Raw stream               ← replay, test, encrypt, mock
        │
        ▼
Format backends speak to ITsmIO — never TFileStream directly
        │
        ▼
ITsmIO adapters: file | memory | encrypted | test harness
```

**Why add ITsmIO:**
- Test backends without hitting disk (in-memory fixtures)
- Wrap the lib in encryption/compression
- Run on network mounts with unusual locking semantics
- Replay captured corrupt frames for debugging
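
The proposal names `ITsmIO` but does not spell out its methods, so the following is only a minimal sketch of what the first bullet could look like. The method names (`ReadBytes`, `WriteBytes`, `SeekTo`, `Size`), the `TMemoryTsmIO` internals, and the `HasJamSignature` helper are all assumptions made for illustration, not the library's API.

```pascal
program tsmio_sketch;
{$mode objfpc}{$H+}

uses
  Classes, SysUtils;

type
  { Hypothetical ITsmIO surface; the proposal does not define it. }
  ITsmIO = interface
    function ReadBytes(var Buf; Count: longint): longint;
    function WriteBytes(const Buf; Count: longint): longint;
    procedure SeekTo(Offset: int64);
    function Size: int64;
  end;

  { In-memory adapter: a backend can run against a fixture without disk. }
  TMemoryTsmIO = class(TInterfacedObject, ITsmIO)
  private
    FData: TMemoryStream;
  public
    constructor Create;
    destructor Destroy; override;
    function ReadBytes(var Buf; Count: longint): longint;
    function WriteBytes(const Buf; Count: longint): longint;
    procedure SeekTo(Offset: int64);
    function Size: int64;
  end;

constructor TMemoryTsmIO.Create;
begin
  inherited Create;
  FData := TMemoryStream.Create;
end;

destructor TMemoryTsmIO.Destroy;
begin
  FData.Free;
  inherited Destroy;
end;

function TMemoryTsmIO.ReadBytes(var Buf; Count: longint): longint;
begin
  Result := FData.Read(Buf, Count);
end;

function TMemoryTsmIO.WriteBytes(const Buf; Count: longint): longint;
begin
  Result := FData.Write(Buf, Count);
end;

procedure TMemoryTsmIO.SeekTo(Offset: int64);
begin
  FData.Position := Offset;
end;

function TMemoryTsmIO.Size: int64;
begin
  Result := FData.Size;
end;

{ Backend-side code sees only ITsmIO, never TFileStream. }
function HasJamSignature(const IO: ITsmIO): boolean;
var
  Sig: array[0..3] of AnsiChar;
begin
  IO.SeekTo(0);
  Result := (IO.ReadBytes(Sig, SizeOf(Sig)) = SizeOf(Sig)) and
            (Sig[0] = 'J') and (Sig[1] = 'A') and
            (Sig[2] = 'M') and (Sig[3] = #0);
end;

const
  JamSig: array[0..3] of AnsiChar = ('J', 'A', 'M', #0);
var
  IO: ITsmIO;
begin
  IO := TMemoryTsmIO.Create;
  IO.WriteBytes(JamSig, SizeOf(JamSig));
  WriteLn('JAM signature found: ', HasJamSignature(IO));
end.
```

The same `HasJamSignature` would work unchanged against a file-backed or encrypted adapter, which is the point of routing backends through the interface.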

### `TMsgRecord` with lossless `Extras`

```pascal
{$modeswitch advancedrecords}  { TMsgExtras carries methods }
type
  { Declared first because TMsgRecord embeds it. }
  TMsgExtras = record
    function Get(const Key, Default: AnsiString): AnsiString;
    procedure SetValue(const Key, Value: AnsiString);
    function Has(const Key: AnsiString): boolean;
  end;

  TMsgRecord = record
    { Universal fields every format has: }
    Index: longint;
    From, To_: AnsiString;
    Subject: AnsiString;
    DateWritten: TDateTime;
    DateArrived: TDateTime;
    OrigAddr: TFtnAddr;
    DestAddr: TFtnAddr;
    Attr: cardinal;       { canonical MSG_ATTR_* bitset }
    Body: AnsiString;     { kludges + text, CR-separated }

    { Backend-specific fields preserved verbatim across round-trips.
      Every key a backend WRITES during a Read it MUST re-consume
      during a subsequent Write. Test harness enforces this. }
    Extras: TMsgExtras;
  end;
```

**Well-known keys** (published in `docs/extras-registry.md`, to be
written):

| Key | Type | Source | Notes |
|---|---|---|---|
| `jam.msgidcrc` | hex u32 | JAM `.jhr` fixed header | needed for NR linker `-j` fast path |
| `jam.replycrc` | hex u32 | JAM `.jhr` fixed header | same |
| `jam.dateprocessed` | unix int | JAM `.jhr` | tosser timestamp |
| `jam.passwordcrc` | hex u32 | JAM `.jhr` | per-msg password |
| `jam.cost` | int | JAM `.jhr` | |
| `squish.umsgid` | hex u32 | SQI frame | unique msg ID |
| `hudson.board` | int 1..200 | MSGINFO | board number |
| `hudson.refer` | int | HDR | refer-to ptr |
| `pcb.confnum` | int | PCB `.IDX` | conference |
| `pcb.refer` | int | PCB `.IDX` | |
| `ezy.msgflags` | hex byte | MH#####.BBS | |
| `goldbase.userid` | int | IDX | |
| `wildcat.confnum` | int | WC SDK | |
| `pkt.cost` | int | Type-2 header | |
| `pkt.flavour` | enum | outbound pkt only | crash/hold/direct/imm/norm |

**Round-trip invariant:**
> After `Read(i, M)`, calling `Write(M)` on the same or a different
> base of the same format produces bytes that round-trip through
> `Read` to an identical `TMsgRecord` (same Extras keys, same values).

Tests enforce this across the full corpus.
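
As a rough illustration of the invariant, here is a sketch of one possible `TMsgExtras` backing store plus a read, write, read-back check on two JAM CRC fields. The parallel-array storage, the `TFakeJamHdr` record, and the `HdrToExtras`/`ExtrasToHdr` helpers are hypothetical stand-ins; the proposal leaves the real representation open.

```pascal
program extras_roundtrip_sketch;
{$mode objfpc}{$H+}
{$modeswitch advancedrecords}

uses
  SysUtils;

type
  { One possible backing store: parallel arrays, linear lookup. }
  TMsgExtras = record
  private
    FKeys, FValues: array of AnsiString;
    function IndexOf(const Key: AnsiString): integer;
  public
    function Get(const Key, ADefault: AnsiString): AnsiString;
    procedure SetValue(const Key, Value: AnsiString);
    function Has(const Key: AnsiString): boolean;
  end;

  { Hypothetical stand-in for two JAM fixed-header fields. }
  TFakeJamHdr = record
    MsgIdCRC, ReplyCRC: cardinal;
  end;

function TMsgExtras.IndexOf(const Key: AnsiString): integer;
var
  i: integer;
begin
  for i := 0 to High(FKeys) do
    if FKeys[i] = Key then
      Exit(i);
  Result := -1;
end;

function TMsgExtras.Get(const Key, ADefault: AnsiString): AnsiString;
var
  i: integer;
begin
  i := IndexOf(Key);
  if i >= 0 then Result := FValues[i] else Result := ADefault;
end;

procedure TMsgExtras.SetValue(const Key, Value: AnsiString);
var
  i: integer;
begin
  i := IndexOf(Key);
  if i < 0 then
  begin
    i := Length(FKeys);
    SetLength(FKeys, i + 1);
    SetLength(FValues, i + 1);
    FKeys[i] := Key;
  end;
  FValues[i] := Value;
end;

function TMsgExtras.Has(const Key: AnsiString): boolean;
begin
  Result := IndexOf(Key) >= 0;
end;

{ "Read": header fields without a universal slot go into Extras. }
procedure HdrToExtras(const H: TFakeJamHdr; var X: TMsgExtras);
begin
  X.SetValue('jam.msgidcrc', IntToHex(Int64(H.MsgIdCRC), 8));
  X.SetValue('jam.replycrc', IntToHex(Int64(H.ReplyCRC), 8));
end;

{ "Write": the backend re-consumes every key it produced on Read. }
procedure ExtrasToHdr(const X: TMsgExtras; out H: TFakeJamHdr);
begin
  H.MsgIdCRC := cardinal(StrToInt64('$' + X.Get('jam.msgidcrc', '0')));
  H.ReplyCRC := cardinal(StrToInt64('$' + X.Get('jam.replycrc', '0')));
end;

var
  OrigHdr, NewHdr: TFakeJamHdr;
  X: TMsgExtras;
begin
  OrigHdr.MsgIdCRC := $1234ABCD;
  OrigHdr.ReplyCRC := $0BADF00D;
  HdrToExtras(OrigHdr, X);
  ExtrasToHdr(X, NewHdr);
  if (NewHdr.MsgIdCRC <> OrigHdr.MsgIdCRC) or
     (NewHdr.ReplyCRC <> OrigHdr.ReplyCRC) then
  begin
    WriteLn('round-trip lost data');
    Halt(1);
  end;
  WriteLn('round-trip preserved ', X.Get('jam.msgidcrc', ''));
end.
```

A real harness would run the same comparison over every message in the corpus and over every key a backend emits, not just two hand-picked fields.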

### Error model — one tree

```
EMessageBase
 ├─ EMessageBaseIO        disk full, permission, corrupt read
 ├─ EMessageBaseLock      timeout, contention, deadlock
 ├─ EMessageBaseFormat    bad signature, truncated header
 ├─ EMessageBaseRange     Index out of [0..Count)
 └─ EMessageBaseClosed    operation on a freed/closed base
```

Every method either **succeeds** or **raises**. Boolean returns mean
"nothing to do" (empty Pack, no new messages, no matching records) —
never failure. A single `try/except on E: EMessageBase do ...`
catches the whole tree.
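
A small sketch of how the tree might be declared and caught. The class names come from the diagram above; the `ReadSubject` helper and its in-memory data are invented here purely to have something that raises.

```pascal
program error_tree_sketch;
{$mode objfpc}{$H+}

uses
  SysUtils;

type
  EMessageBase       = class(Exception);
  EMessageBaseIO     = class(EMessageBase);
  EMessageBaseLock   = class(EMessageBase);
  EMessageBaseFormat = class(EMessageBase);
  EMessageBaseRange  = class(EMessageBase);
  EMessageBaseClosed = class(EMessageBase);

{ Hypothetical reader over an in-memory list; stands in for a backend. }
function ReadSubject(const Subjects: array of AnsiString;
                     Idx: integer): AnsiString;
begin
  if (Idx < 0) or (Idx > High(Subjects)) then
    raise EMessageBaseRange.CreateFmt('Index %d out of [0..%d)',
                                      [Idx, Length(Subjects)]);
  Result := Subjects[Idx];
end;

begin
  try
    WriteLn(ReadSubject(['hello', 'world'], 1));
    WriteLn(ReadSubject(['hello', 'world'], 5));  { raises }
  except
    { Most specific handler first; the base class catches the rest. }
    on E: EMessageBaseRange do
      WriteLn('range error: ', E.Message);
    on E: EMessageBase do
      WriteLn('other message-base error: ', E.Message);
  end;
end.
```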

### Locking — explicit, not implicit

```pascal
TMsgBase = class
  procedure LockForRead;
  procedure LockForWrite;
  procedure Unlock;
  function TryLockForRead (TimeoutMs: integer): boolean;
  function TryLockForWrite(TimeoutMs: integer): boolean;

  { Common-case one-liners. }
  procedure WithReadLock (AProc: TProc);
  procedure WithWriteLock(AProc: TProc);
end;
```

Default `Open` acquires **no lock**. Callers choose. A read-only
BBS frontend doesn't need cross-process locking; a tosser does.
Library doesn't guess. Common case stays a one-liner via
`WithReadLock`.
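
One way `WithReadLock` could be built on top of `LockForRead`/`Unlock` is sketched below. `TFakeBase` and its lock counter are hypothetical, and a plain procedure type replaces `TProc` so the example compiles without anonymous-function support.

```pascal
program with_lock_sketch;
{$mode objfpc}{$H+}

type
  { Plain procedure type used instead of TProc (assumption, see lead-in). }
  TLockedProc = procedure;

  { Hypothetical stand-in for TMsgBase; only the lock calls matter. }
  TFakeBase = class
  private
    FReadLocks: integer;
  public
    procedure LockForRead;
    procedure Unlock;
    procedure WithReadLock(AProc: TLockedProc);
    property ReadLocks: integer read FReadLocks;
  end;

procedure TFakeBase.LockForRead;
begin
  Inc(FReadLocks);   { a real backend would take the file lock here }
end;

procedure TFakeBase.Unlock;
begin
  Dec(FReadLocks);
end;

procedure TFakeBase.WithReadLock(AProc: TLockedProc);
begin
  LockForRead;
  try
    AProc();
  finally
    Unlock;          { runs even if AProc raises }
  end;
end;

var
  Base: TFakeBase;

procedure CountMessages;
begin
  WriteLn('locks held inside callback: ', Base.ReadLocks);
end;

begin
  Base := TFakeBase.Create;
  try
    Base.WithReadLock(@CountMessages);
    WriteLn('locks held after callback: ', Base.ReadLocks);
  finally
    Base.Free;
  end;
end.
```

The `try/finally` is the whole value of the helper: the caller cannot forget to unlock, even on an exception.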

### Transactions — declared

```pascal
base.BeginTransaction;
try
  for msg in batch do base.WriteMessage(msg);
  base.Commit;
except
  base.Rollback;
  raise;
end;
```

Backends implement per-format:
- **JAM:** defer `.jdx` index updates in memory, flush on Commit.
- **SDM:** temp-dir shadow, rename on Commit.
- **Hudson:** in-memory index delta, flush on Commit.
- **Squish:** frame-list delta, flush on Commit.
- **PKT (write):** temp file, rename on Commit.
- **Read-only mode:** Commit is a no-op.

Callers who don't call `Begin/Commit` still work — writes flush
per-call. Transactions are opt-in for atomicity.
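
The deferred-index idea can be sketched with an in-memory stand-in. `TFakeJamBase` is hypothetical: it models the `.jdx` file as an array and shows staged writes, rollback, per-call flushing outside a transaction, and the no-nesting rule proposed under the open questions.

```pascal
program txn_sketch;
{$mode objfpc}{$H+}

uses
  SysUtils;

type
  EMessageBase = class(Exception);

  { Hypothetical JAM-like base: index entries are staged in memory and
    only appended to the "on-disk" index on Commit. }
  TFakeJamBase = class
  private
    FIndex: array of longint;    { stands in for the .jdx file }
    FPending: array of longint;  { deferred index updates }
    FInTxn: boolean;
  public
    procedure BeginTransaction;
    procedure WriteMessage(Offset: longint);
    procedure Commit;
    procedure Rollback;
    function IndexCount: integer;
  end;

procedure TFakeJamBase.BeginTransaction;
begin
  if FInTxn then
    raise EMessageBase.Create('transactions do not nest');
  FInTxn := True;
end;

procedure TFakeJamBase.WriteMessage(Offset: longint);
begin
  SetLength(FPending, Length(FPending) + 1);
  FPending[High(FPending)] := Offset;
  if not FInTxn then
    Commit;                      { no transaction: flush per call }
end;

procedure TFakeJamBase.Commit;
var
  i, Start: integer;
begin
  Start := Length(FIndex);
  SetLength(FIndex, Start + Length(FPending));
  for i := 0 to High(FPending) do
    FIndex[Start + i] := FPending[i];
  SetLength(FPending, 0);
  FInTxn := False;
end;

procedure TFakeJamBase.Rollback;
begin
  SetLength(FPending, 0);        { drop staged entries, index untouched }
  FInTxn := False;
end;

function TFakeJamBase.IndexCount: integer;
begin
  Result := Length(FIndex);
end;

var
  B: TFakeJamBase;
begin
  B := TFakeJamBase.Create;
  try
    B.BeginTransaction;
    B.WriteMessage(100);
    B.WriteMessage(200);
    B.Rollback;
    WriteLn('after rollback: ', B.IndexCount);
    B.WriteMessage(300);         { outside a transaction }
    WriteLn('after plain write: ', B.IndexCount);
  finally
    B.Free;
  end;
end.
```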

### Events — single callback

```pascal
TMsgEvent = record
  EventType: TMsgEventType;    { BaseOpened, MessageRead, etc. }
  Source: TMsgBase;            { may be nil for lib-global events }
  Subject: AnsiString;         { path, area tag, msgid }
  Detail: AnsiString;          { human-readable }
  LongValue: int64;            { count, size, offset }
  TimeStamp: TDateTime;
end;

TMsgEventCallback = procedure(const E: TMsgEvent) of object;

base.OnEvent := @MyHandler;    { one pointer, that's it }
```

comet does this with `OnLog`. One callback, caller multiplexes if
they need multiple observers. No multi-subscriber registry. Radical
simplification.
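
Caller-side multiplexing could look something like the sketch below. `TEventFanOut` and `TLogger` are hypothetical caller code, `TMsgEvent` is trimmed to two fields, and a local `OnEvent` variable stands in for `base.OnEvent`.

```pascal
program fanout_sketch;
{$mode objfpc}{$H+}

type
  { Trimmed copy of the proposed event record; only two fields kept. }
  TMsgEvent = record
    Subject: AnsiString;
    Detail: AnsiString;
  end;

  TMsgEventCallback = procedure(const E: TMsgEvent) of object;

  { Caller-side multiplexer: the library sees one callback (Dispatch),
    the caller registers as many observers as it likes. }
  TEventFanOut = class
  private
    FObservers: array of TMsgEventCallback;
  public
    procedure Add(ACallback: TMsgEventCallback);
    procedure Dispatch(const E: TMsgEvent);
  end;

  TLogger = class
    Prefix: AnsiString;
    Seen: integer;
    procedure Handle(const E: TMsgEvent);
  end;

procedure TEventFanOut.Add(ACallback: TMsgEventCallback);
begin
  SetLength(FObservers, Length(FObservers) + 1);
  FObservers[High(FObservers)] := ACallback;
end;

procedure TEventFanOut.Dispatch(const E: TMsgEvent);
var
  i: integer;
begin
  for i := 0 to High(FObservers) do
    FObservers[i](E);
end;

procedure TLogger.Handle(const E: TMsgEvent);
begin
  Inc(Seen);
  WriteLn(Prefix, E.Subject, ': ', E.Detail);
end;

var
  FanOut: TEventFanOut;
  Console, Audit: TLogger;
  OnEvent: TMsgEventCallback;   { stands in for base.OnEvent }
  Ev: TMsgEvent;
begin
  FanOut := TEventFanOut.Create;
  Console := TLogger.Create;
  Console.Prefix := '[console] ';
  Audit := TLogger.Create;
  Audit.Prefix := '[audit] ';
  try
    FanOut.Add(@Console.Handle);
    FanOut.Add(@Audit.Handle);
    OnEvent := @FanOut.Dispatch;

    Ev.Subject := '/path/to/base';
    Ev.Detail := 'base opened';
    OnEvent(Ev);
  finally
    Audit.Free;
    Console.Free;
    FanOut.Free;
  end;
end.
```

The observer list lives entirely in caller code, so the library itself holds no subscriber state.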

### Format detection is the default API

```pascal
{ Primary — sniff the path, pick the backend. }
function OpenMsgBase(const Path: AnsiString;
                     Mode: TMsgOpenMode): TMsgBase;

{ Escape hatch — force a specific format. }
function OpenMsgBaseAs(Format: TMsgBaseFormat;
                       const Path: AnsiString;
                       Mode: TMsgOpenMode): TMsgBase;
```

Fingerprints:
- JAM: `.jhr` + `.jdx` pair, `"JAM\0"` signature
- Squish: `.sqd` + `.sqi`
- Hudson: `MSGINFO.BBS` + `MSGHDR.BBS` + ...
- GoldBase: `MSGINFO.DAT` + ...
- PCBoard: `.IDX` + `.MSG` pair
- EzyCom: `MH*.BBS` + `MT*.BBS`
- Wildcat: WC SDK marker file
- SDM: directory full of numbered `.msg` files (fallback)
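
A file-presence sniffer for three of these fingerprints might be sketched as follows. The `TSniffedFormat` enum, the `SniffFormat` name, and the convention that the path is an extension-less area name (JAM, Squish) or a directory (Hudson) are assumptions; a fuller version would also check the JAM signature bytes and handle case differences on case-sensitive filesystems.

```pascal
program sniff_sketch;
{$mode objfpc}{$H+}

uses
  SysUtils;

type
  { Subset of formats; enum and names are illustrative only. }
  TSniffedFormat = (sfUnknown, sfJam, sfSquish, sfHudson);

{ BasePath is the area path without extension for JAM and Squish,
  or a directory for Hudson (assumed convention). }
function SniffFormat(const BasePath: AnsiString): TSniffedFormat;
var
  Dir: AnsiString;
begin
  Result := sfUnknown;
  if FileExists(BasePath + '.jhr') and FileExists(BasePath + '.jdx') then
    Exit(sfJam);
  if FileExists(BasePath + '.sqd') and FileExists(BasePath + '.sqi') then
    Exit(sfSquish);
  Dir := IncludeTrailingPathDelimiter(BasePath);
  if FileExists(Dir + 'MSGINFO.BBS') and FileExists(Dir + 'MSGHDR.BBS') then
    Exit(sfHudson);
end;

{ Create an empty file so the demo has something to detect. }
procedure Touch(const FileName: AnsiString);
var
  H: THandle;
begin
  H := FileCreate(FileName);
  if H <> THandle(-1) then
    FileClose(H);
end;

var
  Root, Area: AnsiString;
begin
  Root := IncludeTrailingPathDelimiter(GetTempDir) + 'sniff_sketch';
  ForceDirectories(Root);
  Area := IncludeTrailingPathDelimiter(Root) + 'general';
  Touch(Area + '.sqd');
  Touch(Area + '.sqi');
  if SniffFormat(Area) = sfSquish then
    WriteLn('detected Squish')
  else
    WriteLn('not detected');
  DeleteFile(Area + '.sqd');
  DeleteFile(Area + '.sqi');
  RemoveDir(Root);
end.
```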

### One unit per format, self-registering

```pascal
uses
  OpenMsgLib,          { core + factory }
  OpenMsgLib.Jam,      { registers JAM format in initialization }
  OpenMsgLib.Hudson;   { registers Hudson in initialization }
```

No `.uni` split. The format unit's `initialization` block calls
`RegisterFormat(..., @Factory)`. Inclusion is the registration.

### `TOutboundBatch` — the missing half of the tosser

The symmetric twin of `TPacketBatch` (inbound). Details in the
appendix; the shape:

```pascal
TOutboundBatch = class
  constructor Create(const AOutboundDir: AnsiString;
                     const AOurAka: TFtnAddr);

  { Append a message to the outbound packet for (Target, Flavour).
    Caches the open pkt per (Target, Flavour) pair so repeat calls
    write to the same file until rotation or Flush. }
  function DispatchMessage(const Msg: TMsgRecord;
                           const Target: TFtnAddr;
                           Flavour: FlavourType): boolean;

  { Finalise every cached pkt — write terminator, rename .tmp →
    final, update .flo if configured. Idempotent. }
  procedure Flush;

  property MaxPktSizeKB: longint;  { 0 = unlimited; rotate at threshold }
  property OnEvent: TMsgEventCallback;
end;
```

Two features missing in both NR and message_api today, baked in
here from day one:

- **Packet size rotation.** Before each write, check cached stream
  size + estimated msg size. If over threshold, close current
  (`.tmp` → `.pkt` rename), open next.
- **Atomic finalize.** Writes go to `xxxxxxxx.pkt.tmp`. Only `Flush`
  (or rotation) renames to `.pkt`. Crash mid-run leaves an orphan
  `.tmp` — not a corrupt real packet.

Format-agnostic: writes to Type-2 / 2+ / 2.2 pkts via the existing
`ma.fmt.pkt` backend. FTN routing stays in the caller (NR's
`FindRouteForNetmail`/`GetLinkForRoute` is fidoconf-specific, not
library material).
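
The two features reduce to a size comparison and a write-then-rename. The sketch below shows both in isolation; `NeedsRotation` and `WriteAtomically` are hypothetical helper names, the payload is a placeholder string rather than a real packet, and the file lands in the system temp directory.

```pascal
program atomic_pkt_sketch;
{$mode objfpc}{$H+}

uses
  Classes, SysUtils;

{ Rotation test as described above: MaxKB = 0 means unlimited. }
function NeedsRotation(Written, Estimated: int64; MaxKB: longint): boolean;
begin
  Result := (MaxKB > 0) and (Written + Estimated > int64(MaxKB) * 1024);
end;

{ Write Data to <FinalPath>.tmp, then rename. A crash before the rename
  leaves only the .tmp behind. }
function WriteAtomically(const FinalPath: AnsiString;
                         const Data: AnsiString): boolean;
var
  TmpPath: AnsiString;
  S: TFileStream;
begin
  TmpPath := FinalPath + '.tmp';
  S := TFileStream.Create(TmpPath, fmCreate);
  try
    if Length(Data) > 0 then
      S.WriteBuffer(Data[1], Length(Data));
  finally
    S.Free;
  end;
  Result := RenameFile(TmpPath, FinalPath);
end;

var
  Pkt: AnsiString;
begin
  WriteLn('900+200 bytes against a 1 KB cap: ', NeedsRotation(900, 200, 1));
  WriteLn('same sizes with the cap disabled: ', NeedsRotation(900, 200, 0));

  Pkt := IncludeTrailingPathDelimiter(GetTempDir) + '0000abcd.pkt';
  if WriteAtomically(Pkt, 'not a real packet body') then
    WriteLn('finalized; tmp still present: ', FileExists(Pkt + '.tmp'));
  DeleteFile(Pkt);
end.
```

Rename is only atomic when the temporary and final names are on the same filesystem, which holds here because both sit in the same directory.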

---

## Reference map — where to look in existing code

### Inside message_api

| Concern | File | Relevant lines |
|---|---|---|
| Factory pattern to extend | `src/ma.api.pas` | full file |
| `TUniMessage` to grow with Extras | `src/ma.types.pas` | record definition, top of file |
| Existing inbound tosser to mirror for outbound | `src/ma.batch.pas` | `TPacketBatch`, `GetOrCreateBase` (line 296) |
| Example tosser template | `examples/example_tosser.pas` | `TSimpleTosser.OnMessage` |
| Existing lock layer to make explicit | `src/ma.lock.pas` | full file |
| Event registry to replace with single callback | `src/ma.events.pas` | full file |
| Per-format native → uni adapters to fold together | `src/formats/*.uni.pas` × 9 | each is ~100-200 LOC |
| Sample data for tests (if any) | `tests/` | check what's already there |

### In NetReader (`../netreader`)

NR has already solved several of these in its own idiom. The
outbound machinery is the most reusable reference:

| Concern | File | Relevant lines |
|---|---|---|
| Cached outbound packet per dest addr | `src/core/nr.scanner.pas` | `GetOutboundPacket` line 214, `CloseAllPackets` line 291 |
| Temp-pkt filename generator | `src/core/nr.scanner.pas` | `CreateTempPktFileName` call at 256 |
| Per-message route + pack logic | `src/core/nr.scanner.pas` | `PackMsg` line 551 |
| Route resolution (stays in NR) | `src/core/nr.scanner.pas` | `FindRouteForNetmail` 398, `GetLinkForRoute` 451 |
| Priority from Attr + FLAGS kludge | `src/core/nr.scanner.pas` | `PackMsg` lines 609-636 |
| Zone-aware outbound path | `src/core/nr.arcmail.pas` | `GetOutboundDir` (around line 178), `GetFLOPath` 219 |
| IsArcMailExt helper | `src/core/nr.arcmail.pas` | `IsArcMailExt` line 155 |
| Pkt header + message write | `src/core/nr.packet.pas` | `WritePktHeader`, `WritePktMessage` |
| Scanner's NoHighWaters pattern (useful for read-only verifiers) | `src/core/nr.scanner.pas` | `ScanNMArea` 1098 |
| JAM header-only read (NR's linker CRC fast path) | `src/msgbase/nr.msgbase.jam.pas` | `TNrJamMsgBase.ReadLinkHdr`, `TNrJamLinkHdr` record |

NR's entire CLI parity pass (last 7 commits in `../netreader`) is
built on this scaffolding. Pattern should transfer cleanly.

### In Allfix (`../Allfix`, assumed)

The Allfix source is the reference implementation for every format
backend. Use `msgutil` / `domsg` behavior as the ground truth.
**Preferably** dump Allfix's read of each test-base into a JSON
reference file and diff the new lib's output against it.

### In comet (`../comet`)

comet is the "plug it in and forget" model. Key patterns worth
mirroring:

- Single log callback (`OnLog`), no registry
- `TStream`-centric I/O — caller controls the stream
- Config hot-reload without API surgery
- Embeddable via narrow callback surface

See `../comet/README.md` "Embeddable" bullet.

---

## Test corpus — the `testmsg/` folder

**Status:** to be populated. Proposal: live at
`/home/ken/Source Code/message_api/testmsg/` (checked in, anonymized,
git-tracked so rollback is just `git checkout`).

Proposed structure:

```
testmsg/
├── README.md                how-to regenerate, licensing notes
├── jam/
│   ├── small_echo/          few-hundred-msg JAM area, real echoarea
│   ├── large_echo/          10k+ messages, stresses index growth
│   ├── deleted_mix/         area with tombstoned msgs for Pack tests
│   └── netmail/             JAM netmail w/ kludges
├── squish/
│   ├── small_echo/
│   └── netmail/
├── hudson/
│   └── 3-board/             multi-board fixture for Board field test
├── msg/                     FTS-1 numbered *.msg
│   ├── netmail/
│   └── echo/                rare but tested
├── pcb/
│   └── sample_conf/
├── ezycom/
├── goldbase/
├── wildcat/
├── pkt/
│   ├── type2_plain/
│   ├── type2plus/
│   └── type2_2/
└── reference/
    ├── jam_small_echo.json  Allfix-dumped canonical JSON
    ├── jam_large_echo.json  (round-trip baseline — DO NOT edit)
    └── ...                  one per corpus base
```

**Anonymization rules:** before check-in, scrub real user addresses
and passwords from kludges. A small helper `tools/anonymize.pas`
can do this deterministically — replaces real MSGID address with
`999:9999/9999`, replaces user names with `User<N>` tokens.

**Regeneration script:** `tools/regen_reference.sh` runs Allfix
against each corpus base, dumps canonical JSON to
`testmsg/reference/`. Committed output is the ground truth; tests
diff the new lib's output against committed JSON.

**Rollback story:** `git checkout testmsg/` restores any corrupted
fixture. Keep fixtures small-ish (<50MB total across all formats)
so the repo stays cloneable on slow links.

---

## Byte-agreement cross-verifier — first actionable task

Before any redesign, **confirm the two libraries agree on the
bytes they read and write today**. Without this baseline, reshape
could silently regress behavior that currently works.

**Tool:** `tools/cross_verify.pas` — standalone FPC program.

**What it does:**

1. Open a corpus base with the existing message_api.
2. Read all messages → dump each to a canonical JSON record.
3. Open the same base with NetReader's `nr.msgbase.*`.
4. Read all messages → dump to same canonical JSON format.
5. Diff the two outputs. Report mismatches per field.

For write verification:
1. Fabricate 100 messages with known content.
2. Write them through message_api to a fresh JAM base.
3. Read them back through NR's backend, diff.
4. Repeat with write via NR, read via message_api.

Expected outcome: both should agree for universal fields. Where
they disagree is where the proposal's `Extras` story becomes
important — those are the fields each side handles differently
(or one side silently drops).

**Deliverable:** a report noting exactly which fields differ per
format, so the Extras registry (above) is anchored in real data
rather than speculation.

This is ~2-4 hours of work and produces the single most important
input to the reshape.
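
The per-field diff at the heart of the verifier could be sketched with FPC's `fpjson` unit. The two JSON objects below are fabricated stand-ins for one message as seen by each library; the field names and the `DiffRecords` helper are illustrative, not the tool's actual schema.

```pascal
program diff_sketch;
{$mode objfpc}{$H+}

uses
  Classes, SysUtils, fpjson;

{ Appends one line per differing field to Report; returns the count. }
function DiffRecords(A, B: TJSONObject; Report: TStrings): integer;
var
  i: integer;
  Key: AnsiString;
  Other: TJSONData;
begin
  Result := 0;
  for i := 0 to A.Count - 1 do
  begin
    Key := A.Names[i];
    Other := B.Find(Key);
    if Other = nil then
    begin
      Report.Add(Key + ': missing on right');
      Inc(Result);
    end
    else if Other.AsJSON <> A.Items[i].AsJSON then
    begin
      Report.Add(Key + ': ' + A.Items[i].AsJSON + ' <> ' + Other.AsJSON);
      Inc(Result);
    end;
  end;
  { Second pass: fields only the right-hand reader produced. }
  for i := 0 to B.Count - 1 do
    if A.Find(B.Names[i]) = nil then
    begin
      Report.Add(B.Names[i] + ': missing on left');
      Inc(Result);
    end;
end;

var
  FromMa, FromNr: TJSONObject;
  Report: TStringList;
  i: integer;
begin
  FromMa := TJSONObject.Create;
  FromNr := TJSONObject.Create;
  Report := TStringList.Create;
  try
    { Fabricated values standing in for one message read by each library. }
    FromMa.Add('subject', 'Test');
    FromMa.Add('attr', 257);
    FromNr.Add('subject', 'Test');
    FromNr.Add('attr', 257);
    FromNr.Add('jam.msgidcrc', '1234ABCD');

    WriteLn('differing fields: ', DiffRecords(FromMa, FromNr, Report));
    for i := 0 to Report.Count - 1 do
      WriteLn('  ', Report[i]);
  finally
    Report.Free;
    FromNr.Free;
    FromMa.Free;
  end;
end.
```

A field reported as missing on one side is exactly the kind of entry that would seed the Extras registry.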

---

## Implementation plan — six weeks

| Week | Deliverables |
|---|---|
| **1** | `testmsg/` corpus committed (3-5 bases per format, anonymized). `tools/cross_verify.pas` running. `tools/regen_reference.sh` producing committed canonical JSON. Baseline: which fields differ between message_api and NR today? |
| **2** | Grow `TUniMessage` → `TMsgRecord` with `Extras` bag. Publish `docs/extras-registry.md` naming every well-known key. Backfill round-trip test: every backend reads a corpus base, writes to a fresh base, reads back, Extras map preserved. Fold `.uni` sidecars into format units. |
| **3** | Introduce `ITsmIO`. Refactor every format backend to take an `ITsmIO` instead of `TFileStream` directly. Add `TMemoryTsmIO` for tests. Run the full test suite in memory — no disk writes. |
| **4** | Land `TOutboundBatch` with size rotation + atomic finalize. Write `examples/example_multiplex` that splits inbound pkts by destination and writes per-link outbound pkts via `TOutboundBatch`. |
| **5** | Unify error model (typed `EMessageBase*` tree; boolean returns mean "nothing to do" only). Replace `TMessageEvents` registry with single `OnEvent` callback. `TMsgBase` locking becomes explicit (no implicit `LockForRead` on Open). `WithReadLock` / `WithWriteLock` helpers. |
| **6** | Documentation pass. Full API reference regen. `CHANGELOG.md` with 0.1 → 1.0 migration notes. Version constants `MA_VERSION_MAJOR`/`MINOR` + runtime `MaRequireVersion`. `{$DEFINE MA_WITH_<FORMAT>}` gates finalized. Ship 1.0. |

Each week's work ships independently — no big-bang merge.

---

## Beyond the six-week plan — things worth planning

### Version ABI discipline

```pascal
const
  MA_VERSION_MAJOR = 1;
  MA_VERSION_MINOR = 0;

{ Caller invokes at program start. Raises if compiled against
  major < required or (major == required AND minor < required). }
procedure MaRequireVersion(Major, Minor: integer);
```

Libraries consumed by multiple callers (Allfix, NR, third-party)
need a noisy "you linked the wrong version" failure mode. Not
"weird behavior six months later."
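
A possible body for `MaRequireVersion`, following the rule in the comment above. Raising `EMessageBase` and the wording of the message are assumptions; the proposal only says that the call raises.

```pascal
program require_version_sketch;
{$mode objfpc}{$H+}

uses
  SysUtils;

const
  MA_VERSION_MAJOR = 1;
  MA_VERSION_MINOR = 0;

type
  EMessageBase = class(Exception);

{ Raises when the linked library is older than what the caller needs. }
procedure MaRequireVersion(Major, Minor: integer);
begin
  if (MA_VERSION_MAJOR < Major) or
     ((MA_VERSION_MAJOR = Major) and (MA_VERSION_MINOR < Minor)) then
    raise EMessageBase.CreateFmt(
      'message_api %d.%d linked, caller requires %d.%d',
      [MA_VERSION_MAJOR, MA_VERSION_MINOR, Major, Minor]);
end;

begin
  MaRequireVersion(1, 0);        { satisfied: passes silently }
  try
    MaRequireVersion(1, 2);      { newer than this build: raises }
  except
    on E: EMessageBase do
      WriteLn('startup check failed: ', E.Message);
  end;
end.
```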

### Read-only airtightness

`momReadOnly` should be verifiable. Regression test:

1. Copy a base onto a scratch mount, then make it read-only
   (`chmod -w` + `mount -o ro`).
2. `OpenMsgBase(path, momReadOnly)`.
3. Read 10k messages, every field.
4. Assert: every file's `st_mtime` is unchanged; no syscall fired
   that opens a file for write.

A BBS running concurrently with a tosser depends on this being
airtight — no "oops, we updated last-read" surprises.

### Build-time format gating

```pascal
{ config.inc }
{$DEFINE MA_WITH_JAM}
{$DEFINE MA_WITH_HUDSON}
{$DEFINE MA_WITH_SQUISH}
{ $DEFINE MA_WITH_WILDCAT}  { commented out — 66K LOC pulled in }
```

```pascal
{ OpenMsgLib.All.pas — convenience include-all unit }
unit OpenMsgLib.All;
{$I config.inc}  { brings the MA_WITH_* defines into scope }
interface
uses
  OpenMsgLib
  {$IFDEF MA_WITH_JAM}, OpenMsgLib.Jam{$ENDIF}
  {$IFDEF MA_WITH_HUDSON}, OpenMsgLib.Hudson{$ENDIF}
  ...
  ;
implementation
end.
```

Embedders control what ships. Allfix and NR want everything.
A minimal BBS UI wants JAM only. No one should be forced to
compile BTrieve-era SDK code for a JAM reader.

### Documentation structure

```
docs/
├── API.md                   full API reference (regen each 1.x)
├── architecture.md          layered design (update to three-tier)
├── extras-registry.md       well-known Extras keys per format
├── ftsc-compliance.md       spec notes
├── migration-0.1-to-1.0.md  for existing callers
├── embedder-guide.md        for BBS/tosser authors
└── format-notes/            per-format quirks & gotchas
    ├── jam.md
    ├── squish.md
    └── ...
```

### CI expectations

When the repo has CI (GitHub Actions, GitLab CI, whatever), the
test job is:

1. Build with all formats enabled.
2. Build with each format *disabled* in turn — prove conditional
   compilation holds.
3. Run `run_tests.sh`.
4. Build `tools/cross_verify.pas` and run it against the corpus.
5. Verify `docs/extras-registry.md` lists every key any backend
   writes (grep the source for `Extras.SetValue`).

---

## Contribution path back from NetReader

NR has six weeks of HPT-parity work sitting on top of its own
msgbase. When the reshape hits 1.0, NR has a decision:

- **A.** Adopt message_api wholesale — drop `nr.msgbase.*`, call
  the lib. NR becomes a thin areafix + scanner + CLI over the
  shared library. Big commit, one-time pain.
- **B.** Keep NR's backends, cherry-pick message_api's event
  dispatcher / lock model / outbound batch. Lighter touch.
- **C.** Contribute NR's improvements (JAM CRC fast-path,
  case-insensitive `.msg` globbing, netmail-cfg-writer fix,
  FTS-0004 tag validation) back to message_api. Symmetric win.

Option C is the first step regardless of whether A or B lands
later. NR's recent work is format-agnostic quality improvements
that every embedder wants.

---

## Open questions for the implementation session

Before any code changes:

1. **Extras representation.** Key-value `AnsiString` bag is dead
   simple but slow for 100K messages. Alternatives:
   - `TDictionary<AnsiString, AnsiString>` (rtl-generics) — faster lookup
   - Packed binary blob with offsets — smallest memory footprint
   - Keep strings but cap each Extras to N keys (fixed-size array)

   Benchmark before picking.

2. **Transaction nesting.** Can `BeginTransaction` nest? JAM defers
   `.jdx` updates in memory — nested transactions just keep
   deferring. SDM's tempdir-shadow approach can't nest cleanly.
   Propose: **no nesting**. Second `BeginTransaction` call raises.

3. **Thread safety of `TMsgBase`.** Inbound `TPacketBatch` shares
   one base across workers, serialises via per-base CS. Works today.
   Does the reshape preserve that? Answer: yes, the explicit
   lock API makes it *more* obvious.

4. **Squish `.sql` lastread vs `.sqi` index.** Current Allfix
   treats `.sql` as the cross-process lock sentinel. The reshape
   should document this explicitly — it's a format-specific quirk
   that callers shouldn't need to know.

5. **PKT as a base vs as a stream.** `TMessageBase` abstraction
   assumes random-access read. A PKT is a forward-only stream of
   messages. Does PKT implement `Count`? (Reader would need to
   scan ahead to count.) Propose: PKT implements `Count` but
   flags `CanRandomAccess = False`, caller iterates via `MoveNext`
   instead of `Read(i)`. Callers who call `Read(i)` on a PKT get a
   clear exception.

6. **Wildcat SDK cleanup.** Is the 40K-LOC `wc_sdk/` still needed,
   or can it be replaced with a narrower interop layer? (Not a
   1.0 blocker, but worth scoping for 1.1.)

Decisions on these should go in `docs/design-decisions.md` as
they land, so future sessions don't re-litigate.

---

## First-session actionable steps

If this proposal makes sense to the next implementer, the first
session should:

1. **Read this document end-to-end.** Cross-check my mapping of
   message_api's current state against the actual code — call out
   anything stale.

2. **Create `testmsg/` with a corpus.** Start with ONE format
   (JAM — best-spec'd, most used) and 3 bases: small echoarea
   (~100 msgs), large echoarea (~10K), netmail directory. Commit.

3. **Write `tools/cross_verify.pas`.** Use NR's `nr.msgbase.jam`
   and message_api's `ma.fmt.jam` as the two readers. Dump
   canonical JSON, diff. Report.

4. **Report the diff.** What fields do the two libraries disagree
   on today? That diff becomes the initial `extras-registry.md`.

5. **Stop. Discuss.** Before any backend refactor, the cross-diff
   report informs every architectural decision above. If the two
   libraries disagree on 20% of fields, the Extras story is
   validated. If they agree on 100%, the Extras story is more
   about future-proofing than immediate need.

Only after the cross-diff is in hand does the rest of the week-1
plan above make sense.

---

## Appendix: the `TOutboundBatch` design in full

(From the cross-project conversation; reproduced here for
completeness.)

```pascal
unit OpenMsgLib.Outbound;

{$mode objfpc}{$H+}

interface

uses
  Classes, SysUtils, contnrs,   { contnrs provides TFPHashList }
  OpenMsgLib, OpenMsgLib.Types, OpenMsgLib.Pkt;

type
  TEntry = class;   { forward; full declaration under "Per-entry" }

  TOutboundBatch = class
  private
    FOutboundDir: AnsiString;
    FOurAka: TFtnAddr;
    FMaxPktSizeKB: longint;
    FCacheCS: TRTLCriticalSection;
    FCache: TFPHashList;   { key: "zone:net/node.point|flavour" -> TEntry }
    FOnEvent: TMsgEventCallback;

    function GetOrCreateEntry(const Target: TFtnAddr;
                              Flavour: FlavourType): TEntry;
    procedure RotateIfOver(Entry: TEntry; EstimatedSize: integer);
    procedure FinalizeEntry(Entry: TEntry);   { rename .tmp → final }
  public
    constructor Create(const AOutboundDir: AnsiString;
                       const AOurAka: TFtnAddr);
    destructor Destroy; override;

    function DispatchMessage(const Msg: TMsgRecord;
                             const Target: TFtnAddr;
                             Flavour: FlavourType): boolean;
    procedure Flush;   { finalize every cached entry }

    property MaxPktSizeKB: longint read FMaxPktSizeKB write FMaxPktSizeKB;
    property OnEvent: TMsgEventCallback read FOnEvent write FOnEvent;
  end;
```

**Per-entry:**

```pascal
TEntry = class
  Key: AnsiString;           { "zone:net/node.point|flavour" }
  Target: TFtnAddr;
  Flavour: FlavourType;
  Stream: TStream;           { writer, actually holds an ITsmIO under the hood }
  TmpPath: AnsiString;       { xxxxxxxx.pkt.tmp }
  FinalPath: AnsiString;     { xxxxxxxx.pkt — set on Rotate/Flush }
  CS: TRTLCriticalSection;   { serialises writes to this one pkt }
  WrittenSize: int64;
end;
```

**Flow inside `DispatchMessage`:**

1. Under `FCacheCS`, look up the entry by `(Target, Flavour)` in `FCache`.
   - Miss: create entry, open `TmpPath`, write pkt header, cache.
   - Hit or miss: enter the entry's CS.
2. `RotateIfOver(Entry, EstimatedMsgSize)`:
   - If `WrittenSize + EstimatedMsgSize > MaxPktSizeKB * 1024`
     (and `MaxPktSizeKB > 0`):
     - Write terminator, close, rename `.tmp → .pkt`.
     - Create a new `.tmp`, write header, reset `WrittenSize`.
     - Fire `metPktRotated`.
3. Convert `TMsgRecord` → `TPktMessage` via `UniToPkt`.
4. Write via `OpenMsgLib.Pkt` writer. Update `WrittenSize`.
5. Fire `metMessageWritten`.
6. Leave entry CS.

**Flush:** iterate `FCache`, for each entry write terminator,
close, `FinalizeEntry`. Fire `metBatchFinalized`.

**Crash recovery:** on startup, scanner sees orphan `xxxxxxxx.pkt.tmp`
files. Decision policy (caller-configurable):
- **Discard** (default): delete orphan tmps, assume corrupt.
- **Recover**: try to validate pkt header + terminator, if valid
rename to `.pkt`, else discard.
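
For the Recover policy, a cheap structural check might look like the sketch below. It relies on the FTS-0001 Type-2 layout (58-byte header, 16-bit packet type at offset 18, two zero bytes closing the packet); `LooksRecoverable` is a hypothetical name, and passing this check does not prove the messages in between are intact.

```pascal
program pkt_recover_sketch;
{$mode objfpc}{$H+}

uses
  Classes, SysUtils;

const
  PKT_HEADER_SIZE = 58;   { FTS-0001 Type-2 packet header }
  PKT_TYPE_OFFSET = 18;   { 16-bit packet type field, little-endian }

{ Structural check only: header present, type = 2, and the final
  two bytes are the 00 00 packet terminator. }
function LooksRecoverable(S: TStream): boolean;
var
  B: array[0..1] of byte;
begin
  Result := False;
  if S.Size < PKT_HEADER_SIZE + 2 then
    Exit;
  S.Position := PKT_TYPE_OFFSET;
  S.ReadBuffer(B, 2);
  if (B[0] <> 2) or (B[1] <> 0) then
    Exit;
  S.Position := S.Size - 2;
  S.ReadBuffer(B, 2);
  Result := (B[0] = 0) and (B[1] = 0);
end;

var
  M: TMemoryStream;
  Hdr: array[0..PKT_HEADER_SIZE - 1] of byte;
  Term: array[0..1] of byte = (0, 0);
begin
  { Build a minimal fake packet: zeroed header with only the type set. }
  FillChar(Hdr, SizeOf(Hdr), 0);
  Hdr[PKT_TYPE_OFFSET] := 2;
  M := TMemoryStream.Create;
  try
    M.WriteBuffer(Hdr, SizeOf(Hdr));
    WriteLn('header only: ', LooksRecoverable(M));
    M.WriteBuffer(Term, SizeOf(Term));
    WriteLn('header + terminator: ', LooksRecoverable(M));
  finally
    M.Free;
  end;
end.
```

A stricter recover path would walk each packed message between header and terminator before renaming, which is why Discard is the safer default.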

---

## References to the cross-project conversation

Full transcript of the NR ↔ message_api discussion lives in the
session log at
`/home/ken/.claude/projects/-home-ken-Source-Code-netreader/*.jsonl`
(dates around 2026-04-15). Relevant decisions captured in this
document; the session log has the reasoning trail if questions
arise.

Key points from that conversation, reproduced for the next session:

- **Why not fork message_api into a new project?** Because message_api
  has tested format backends already. Forking means re-validating
  them. Reshape in place preserves that investment.
- **Why the five-line Hello World test?** comet achieves
  plug-and-forget with a single log callback and a TStream-based
  API. That's the bar. If the lib requires more ceremony than
  "open path, read, close," it's not there yet.
- **Why explicit locking, not implicit?** BBSes with their own
  global mutex don't want the lib double-locking. Stateless
  readers don't need cross-process locks. Library guessing leads
  to surprises. Explicit means embedders can always reason about
  behavior.
- **Why single callback, not registry?** comet's `OnLog` proves
  one pointer is enough. Multi-observer is the caller's problem —
  they can write their own fan-out if needed. Registry adds state
  the library shouldn't own.

---

## Closing

This is a design proposal, not a mandate. The next implementer
should push back on anything that doesn't hold up against real
code. The testmsg corpus + cross-verifier gives us the data to
have that conversation grounded in bytes rather than opinion.

When in doubt: simpler is better. comet's model works because it
refused to do things the library didn't absolutely need to do.
message_api 1.0 should exit with **less code, not more**, than
0.1.0 — the reshape is about architectural clarity, not feature
addition. Features come in 1.1+ on top of a clean 1.0.