
Taxonomy & Data Model

The entity model behind the browse philosophy. This document covers entity shapes and constraints; API shapes live in API.md and the philosophy in AgnosticUI.md.

The model is content-centric. We have one content kind (Content) and the taxonomy that classifies it: Surface, Group, Channel, Tag, and Platform. The schema supports adding more content kinds later without structural change.


1. Overview

Surface (1) ← Layer 1, typically one: "Content"
├── Group (n) ← Layer 2: topic buckets
│   └── Channel (n) ← Layer 3 (optional): sub-topics
└── …

Tag (n) ← shared vocabulary, faceted
Platform (n) ← source platform badges (youtube, bluesky, …)

Content (n) ← the items themselves
├── surfaceSlug (denormalized ref)
├── groupSlug (denormalized ref)
├── channelSlug? (denormalized ref)
├── platformSlug (denormalized ref)
├── tagSlugs[] (multikey index)
└── approvalStatus (pending | approved | rejected)

Denormalization by slug is intentional: list/filter queries run against Content alone, without joins. Slugs are stable (rename is atomic; see API.md §3).
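As a minimal sketch of what "queries run against Content alone" can look like, the function below builds a MongoDB-style filter from browse parameters. The `BrowseParams` shape and the function name are hypothetical, and requiring every listed tag (`$all`) rather than any of them is an assumption, not something this document specifies.

```typescript
// Hypothetical query parameters for a list endpoint; the names are illustrative.
interface BrowseParams {
  groupSlug?: string;
  channelSlug?: string;
  platformSlug?: string;
  tagSlugs?: string[];
}

// Plain-object filter in MongoDB query syntax.
type ContentFilter = Record<string, unknown>;

// Every condition targets a field stored on Content itself, so no lookup
// against the groups, channels, platforms, or tags collections is needed.
function buildContentFilter(params: BrowseParams): ContentFilter {
  const filter: ContentFilter = { approvalStatus: 'approved', isActive: true };
  if (params.groupSlug) filter.groupSlug = params.groupSlug;
  if (params.channelSlug) filter.channelSlug = params.channelSlug;
  if (params.platformSlug) filter.platformSlug = params.platformSlug;
  if (params.tagSlugs && params.tagSlugs.length > 0) {
    // Assumption: all listed tags must be present; $in would match any of them.
    filter.tagSlugs = { $all: params.tagSlugs };
  }
  return filter;
}
```

The resulting object could be passed straight to a `find()` on the content collection.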


2. Surface (Layer 1)

Usually one. Supports future kinds without structural change.

import { Prop, Schema } from '@nestjs/mongoose'; // decorators used by every schema in this document

@Schema({ timestamps: true, collection: 'surfaces' })
class Surface {
  @Prop({ required: true, unique: true }) slug: string; // e.g. "content"
  @Prop({ required: true }) label: string; // "Content"
  @Prop({ required: true }) routePrefix: string; // "/browse"
  @Prop({ required: true, default: 'topic-groups' })
  browseMode: 'topic-groups' | 'alphabetical' | 'faceted-primary' | 'entity-linked';
  @Prop() accentColor?: string; // hex
  @Prop() icon?: string; // name or URL
  @Prop({ default: 0 }) displayOrder: number;
  @Prop({ default: true }) isVisible: boolean;
  @Prop({ default: true }) isActive: boolean;
  @Prop() minPermission?: string; // e.g. "content.moderate"
}

Indexes: { slug: 1 } unique.


3. Group (Layer 2)

@Schema({ timestamps: true, collection: 'groups' })
class Group {
  @Prop({ required: true, ref: 'Surface' }) surfaceSlug: string;
  @Prop({ required: true }) slug: string; // unique within surfaceSlug
  @Prop({ required: true }) label: string;
  @Prop() color?: string;
  @Prop() icon?: string;
  @Prop({ default: 0 }) displayOrder: number;
  @Prop({ default: true }) isActive: boolean;
}

Indexes: { surfaceSlug: 1, slug: 1 } unique.


4. Channel (Layer 3, optional)

@Schema({ timestamps: true, collection: 'channels' })
class Channel {
  @Prop({ required: true, ref: 'Group' }) groupSlug: string;
  @Prop({ required: true, ref: 'Surface' }) surfaceSlug: string; // denormalized
  @Prop({ required: true }) slug: string; // unique within groupSlug
  @Prop({ required: true }) label: string;
  @Prop({ default: 0 }) displayOrder: number;
  @Prop({ default: true }) isActive: boolean;
}

Indexes: { groupSlug: 1, slug: 1 } unique; { surfaceSlug: 1, groupSlug: 1 } for catalog assembly.


5. Platform

A small, mostly-static list of source platforms that power the platform filter and badge color.

@Schema({ timestamps: true, collection: 'platforms' })
class Platform {
  @Prop({ required: true, unique: true }) slug: string; // "youtube"
  @Prop({ required: true }) label: string; // "YouTube"
  @Prop() color?: string; // brand hex
  @Prop({ default: 0 }) displayOrder: number;
  @Prop({ default: true }) isActive: boolean;
}

Seeded at bootstrap with a starter set (youtube, twitter, bluesky, reddit, article, generic); operators can add more via the admin surface.


6. Tag

Shared vocabulary for all taggable content. Single collection; slug is the natural primary key; facet classifies meaning.

@Schema({ timestamps: true, collection: 'tags' })
class Tag {
  @Prop({ required: true, unique: true }) slug: string;
  @Prop({ required: true }) label: string; // display form
  @Prop({ required: true, default: 'unclassified' })
  facet: 'topic' | 'format' | 'level' | 'source-type' | 'unclassified';
  @Prop() description?: string;
  @Prop({ default: 0 }) usageCount: number; // denormalized, see below
  @Prop({ default: true }) isActive: boolean;
}

usageCount denormalization

Updated on content state transitions:

  • On approve: increment counts for each tag in the approved content.
  • On reject or soft-delete: decrement.
  • On tag edit of approved content: apply diff.

A nightly reconciliation job (countDocuments({ tagSlugs: slug, approvalStatus: 'approved', isActive: true }) per tag) corrects drift caused by missed or duplicated count updates. It is optional in v1 and becomes necessary at scale.
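The three transition rules reduce to one set difference, which the sketch below illustrates. `tagUsageDiff`, `onApprove`, and `onRemove` are hypothetical helper names; how the deltas are written back (for example one `$inc` update per tag) is left open here.

```typescript
// Returns slug -> delta (+1 or -1) for a change in a content item's tag list.
function tagUsageDiff(before: string[], after: string[]): Record<string, number> {
  const prev = new Set(before);
  const next = new Set(after);
  const deltas: Record<string, number> = {};
  for (const slug of after) if (!prev.has(slug)) deltas[slug] = 1; // newly added tag
  for (const slug of before) if (!next.has(slug)) deltas[slug] = -1; // removed tag
  return deltas;
}

// Approve and reject/soft-delete are the two degenerate cases of the diff:
// approving goes from "no counted tags" to the full list, removal the reverse.
const onApprove = (tags: string[]) => tagUsageDiff([], tags);
const onRemove = (tags: string[]) => tagUsageDiff(tags, []);
```

Tags present both before and after the edit produce no entry, so an unchanged tag list yields an empty object and no writes.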

Rename

Atomic and transactional; cascades to Content.tagSlugs. See API.md §3.

Never delete

Rename or deactivate (isActive=false). Deletion loses the audit trail of what a content item meant at a point in time.

Indexes: { slug: 1 } unique; { facet: 1, slug: 1 } for the tag explorer's facet-grouped view; { usageCount: -1 } for by-usage sort.


7. Content

The item being browsed. Existing User/Role schemas stay as-is.

import { Types } from 'mongoose'; // for the ObjectId-typed fields below

@Schema({ timestamps: true, collection: 'content' })
class Content {
  @Prop({ required: true, unique: true }) slug: string;
  @Prop({ required: true }) title: string;
  @Prop() browseTitle?: string; // curator override
  @Prop() description?: string;
  @Prop({ required: true }) url: string; // canonical link
  @Prop({ required: true, ref: 'Platform' }) platformSlug: string;

  // taxonomy links (denormalized by slug)
  @Prop({ required: true, ref: 'Surface' }) surfaceSlug: string;
  @Prop({ required: true, ref: 'Group' }) groupSlug: string;
  @Prop({ ref: 'Channel' }) channelSlug?: string;
  @Prop({ type: [String], default: [] }) tagSlugs: string[];

  // media
  @Prop() thumbnailUrl?: string;
  @Prop({ type: Object })
  thumbnailMip?: { small: string; medium: string; large: string };
  @Prop({ type: Object })
  embed?: { provider: string; html?: string; width?: number; height?: number };

  // submission + approval
  @Prop({ required: true, ref: 'User' }) submittedBy: Types.ObjectId;
  @Prop({ required: true, default: 'pending' })
  approvalStatus: 'pending' | 'approved' | 'rejected';
  @Prop() approvedAt?: Date; // set on transition → approved
  @Prop({ type: Object })
  approvalMeta?: { actorUserId: Types.ObjectId; actorAt: Date; reason?: string };

  // soft delete
  @Prop({ default: true }) isActive: boolean;
}

Required indexes

Index | Purpose
--- | ---
{ slug: 1 } unique | Detail lookup
{ approvalStatus: 1, isActive: 1, approvedAt: -1, _id: -1 } | Default feed (newest approved)
{ tagSlugs: 1 } multikey | Tag filter (hot path)
{ groupSlug: 1, approvedAt: -1 } | Group view
{ groupSlug: 1, channelSlug: 1, approvedAt: -1 } | Channel view
{ platformSlug: 1, approvedAt: -1 } | Platform filter
Text index on { title: 'text', description: 'text' } | q= text search
{ submittedBy: 1, createdAt: -1 } | "My submissions" queries

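MongoDB normally serves a query from a single index, and the planner rarely chooses index intersection, so multi-index planning should be treated as best-effort. The default-feed compound index covers the unfiltered case and the tag multikey index handles tag-narrowed filters. Other filter combinations are served by whichever index the planner judges cheapest; check hot combinations with explain().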

Approval state machine

pending ─approve→ approved
pending ─reject→ rejected
approved ─soft-delete→ (isActive=false; status unchanged)
rejected ─revive→ pending (moderator only)

Illegal transitions (approve an already-rejected item without re-queueing; approve a soft-deleted item) return 422 with error.code = "content.state_invalid".
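The four rows above can be encoded as a small pure function, sketched here under stated assumptions. The type and function names are hypothetical, and treating every action on a soft-deleted item as invalid is an assumption (the text only names approval of a soft-deleted item as illegal). Idempotent repeats, described under "Idempotency" below, are expected to be handled before this table is consulted.

```typescript
type ApprovalStatus = 'pending' | 'approved' | 'rejected';
type Action = 'approve' | 'reject' | 'soft-delete' | 'revive';

interface ContentState {
  approvalStatus: ApprovalStatus;
  isActive: boolean;
}

type TransitionResult =
  | { ok: true; next: ContentState }
  | { ok: false; status: 422; code: 'content.state_invalid' };

const INVALID: TransitionResult = { ok: false, status: 422, code: 'content.state_invalid' };

function transition(state: ContentState, action: Action): TransitionResult {
  // Assumption: a soft-deleted item accepts no further transitions.
  if (!state.isActive) return INVALID;

  const { approvalStatus } = state;
  if (action === 'approve' && approvalStatus === 'pending') {
    return { ok: true, next: { approvalStatus: 'approved', isActive: true } };
  }
  if (action === 'reject' && approvalStatus === 'pending') {
    return { ok: true, next: { approvalStatus: 'rejected', isActive: true } };
  }
  if (action === 'soft-delete' && approvalStatus === 'approved') {
    // Status is left unchanged; only isActive flips.
    return { ok: true, next: { approvalStatus: 'approved', isActive: false } };
  }
  if (action === 'revive' && approvalStatus === 'rejected') {
    return { ok: true, next: { approvalStatus: 'pending', isActive: true } };
  }
  return INVALID;
}
```

The moderator-only restriction on revive is a permission check and would sit outside this function.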

Submit flow

POST /content/submit with { url, title, platformSlug?, groupSlug?, channelSlug?, tagSlugs? }:

  1. Validate URL (valid URL, 2–2048 chars).
  2. Validate platformSlug if supplied; infer from URL hostname if missing (simple map: youtube.com → youtube, x.com/twitter.com → twitter, bsky.app → bluesky, reddit.com → reddit, else generic).
  3. Validate groupSlug is active; fall back to a configured default group if unspecified.
  4. Validate every tag slug exists and is active; reject with 400 if any doesn't (do not auto-create tags from submissions).
  5. Generate a slug from title; on conflict, append a short random suffix.
  6. Insert with approvalStatus: 'pending', submittedBy: <user>.
  7. Fire content.submitted event (in-process EventEmitter v1; queue later).
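Steps 2 and 5 are the most mechanical parts of the flow and can be sketched as below. All function names are hypothetical. Matching subdomains (for example `www.` or `m.`) against the host map, the ASCII-only slug rule, and the six-character suffix length are assumptions; the uniqueness check is passed in as a callback, whereas a real implementation would query the database and still rely on the unique index on slug.

```typescript
// Hostname map from step 2; anything unmatched falls through to "generic".
const HOST_TO_PLATFORM: Array<[string, string]> = [
  ['youtube.com', 'youtube'],
  ['x.com', 'twitter'],
  ['twitter.com', 'twitter'],
  ['bsky.app', 'bluesky'],
  ['reddit.com', 'reddit'],
];

function inferPlatformSlug(rawUrl: string): string {
  let host: string;
  try {
    host = new URL(rawUrl).hostname.toLowerCase();
  } catch {
    return 'generic'; // not parseable as a URL
  }
  for (const [domain, slug] of HOST_TO_PLATFORM) {
    // Assumption: subdomains of a known domain count as that platform.
    if (host === domain || host.endsWith(`.${domain}`)) return slug;
  }
  return 'generic';
}

// ASCII-only slug: lowercase, non-alphanumeric runs collapsed to single hyphens.
function slugify(title: string): string {
  return title
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-')
    .replace(/^-+|-+$/g, '');
}

// Step 5: use the plain slug if free, otherwise append a short random suffix.
function uniqueSlug(title: string, isTaken: (slug: string) => boolean): string {
  const base = slugify(title) || 'item';
  let candidate = base;
  while (isTaken(candidate)) {
    const suffix = Math.random().toString(36).slice(2, 8).padEnd(6, '0');
    candidate = `${base}-${suffix}`;
  }
  return candidate;
}
```

For instance, `inferPlatformSlug('https://www.youtube.com/watch?v=abc')` resolves to `youtube`, while an unrecognized or unparseable URL resolves to `generic`.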

Approval flow

POST /content/<slug>/approve:

  1. Load content; 404 if missing or inactive.
  2. If already approved, return 200 with meta.unchanged: true (see "Idempotency" below); return 422 for any other non-pending state.
  3. Update approvalStatus: 'approved', approvedAt: now, approvalMeta.
  4. Fire content.approved event → async thumbnail mip generation, usage-count increment.
  5. Return updated content in envelope.

POST /content/<slug>/reject is analogous; takes optional reason.

Idempotency. Approving an already-approved item returns the current state with 200 and meta.unchanged: true; it does not error. Rejecting an already-rejected item follows the same pattern.
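The ordering of the checks matters: existence first, then the idempotent repeat, then the state check. A minimal sketch of that decision for the approve endpoint follows; `decideApprove` and the outcome shape are hypothetical, and approvalMeta, the event, and the response envelope are left out.

```typescript
interface ContentDoc {
  slug: string;
  approvalStatus: 'pending' | 'approved' | 'rejected';
  isActive: boolean;
  approvedAt?: Date;
}

type ApproveOutcome =
  | { status: 404 }
  | { status: 422; code: 'content.state_invalid' }
  | { status: 200; unchanged: boolean; content: ContentDoc };

function decideApprove(content: ContentDoc | null, now: Date): ApproveOutcome {
  // Missing or inactive content is indistinguishable to the caller.
  if (!content || !content.isActive) return { status: 404 };

  // Idempotent repeat: report the current state instead of erroring.
  if (content.approvalStatus === 'approved') {
    return { status: 200, unchanged: true, content };
  }

  // Only pending items can be approved; rejected items must be revived first.
  if (content.approvalStatus !== 'pending') {
    return { status: 422, code: 'content.state_invalid' };
  }

  return {
    status: 200,
    unchanged: false,
    content: { ...content, approvalStatus: 'approved', approvedAt: now },
  };
}
```

Only the `unchanged: false` branch should go on to persist the update and fire content.approved, so a repeated call does not increment usage counts twice.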


8. Catalog derivation

GET /catalog is a derived view. Assembly:

  1. Query surfaces.find({ isActive: true, isVisible: true }).sort({ displayOrder: 1 }).lean().
  2. For each surface, query groups (same filters, sorted).
  3. For each group, query channels (same filters, sorted).
  4. Query active platforms and build the platform list.
  5. Compute facet list from a config (or from distinct tags.facet values).
  6. Compute etag from a hash of max(updatedAt) across the queried collections.
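As a rough illustration of steps 1 to 3 and 6, the sketch below nests already-loaded rows in memory rather than issuing one query per parent, which is a simplification of the steps above. The row interfaces, function names, and the choice of SHA-1 for the etag hash are assumptions; platforms and facets (steps 4 and 5) are omitted.

```typescript
import { createHash } from 'node:crypto';

// Fields shared by the taxonomy rows this sketch reads.
interface TaxonomyRow {
  slug: string;
  label: string;
  displayOrder: number;
  isActive: boolean;
  updatedAt: Date;
}
interface SurfaceRow extends TaxonomyRow { isVisible: boolean }
interface GroupRow extends TaxonomyRow { surfaceSlug: string }
interface ChannelRow extends TaxonomyRow { surfaceSlug: string; groupSlug: string }

// Keep active rows ordered by displayOrder; filter() copies, so inputs are not mutated.
function activeSorted<T extends TaxonomyRow>(rows: T[]): T[] {
  return rows.filter((r) => r.isActive).sort((a, b) => a.displayOrder - b.displayOrder);
}

function assembleCatalog(surfaces: SurfaceRow[], groups: GroupRow[], channels: ChannelRow[]) {
  return activeSorted(surfaces.filter((s) => s.isVisible)).map((s) => ({
    slug: s.slug,
    label: s.label,
    groups: activeSorted(groups.filter((g) => g.surfaceSlug === s.slug)).map((g) => ({
      slug: g.slug,
      label: g.label,
      // Group slugs are only unique within a surface, so match on both keys.
      channels: activeSorted(
        channels.filter((c) => c.surfaceSlug === s.slug && c.groupSlug === g.slug),
      ).map((c) => ({ slug: c.slug, label: c.label })),
    })),
  }));
}

// Step 6: hash the newest updatedAt across every row that fed the catalog.
function catalogEtag(rows: TaxonomyRow[]): string {
  const latest = rows.reduce((max, r) => Math.max(max, r.updatedAt.getTime()), 0);
  return createHash('sha1').update(String(latest)).digest('hex');
}
```

Because the etag depends only on the newest timestamp, it changes whenever any taxonomy row is written and stays stable otherwise.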

Assembly happens in a service method that is cached (Nest CacheModule with ttl: 300s, keyed by 'catalog:v1'). Writes to any taxonomy collection invalidate the cache via a central CatalogCacheService.invalidate() hook called from each taxonomy controller's write paths.

Handling admin vs public

Surfaces with minPermission set are omitted for callers who lack that permission. The cache is keyed on a derived permission bucket ('public' | 'moderator' | 'admin'), so each bucket has its own cached catalog.
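One way the bucket idea could look in code is sketched below. Deriving the bucket from role names, the `catalog:v1:<bucket>` key format, and all function names are assumptions; the actual mapping from roles to permissions is defined elsewhere.

```typescript
type PermissionBucket = 'public' | 'moderator' | 'admin';

// Assumption: the bucket is derived from the caller's role names,
// with the most privileged role winning.
function permissionBucket(roles: string[]): PermissionBucket {
  if (roles.includes('admin')) return 'admin';
  if (roles.includes('moderator')) return 'moderator';
  return 'public';
}

// Hypothetical key format: the base key from above plus the bucket.
function catalogCacheKey(bucket: PermissionBucket): string {
  return `catalog:v1:${bucket}`;
}

// Drop surfaces whose minPermission the caller has not been granted.
function visibleSurfaces<T extends { minPermission?: string }>(
  surfaces: T[],
  granted: ReadonlySet<string>,
): T[] {
  return surfaces.filter((s) => !s.minPermission || granted.has(s.minPermission));
}
```

With this layout, invalidating the catalog means clearing the entry for every bucket, not a single key.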


9. Seeding

On first boot (if surfaces.countDocuments() === 0), the app seeds:

  • One surface: { slug: 'content', label: 'Content', routePrefix: '/browse', browseMode: 'topic-groups' }.
  • A starter group: { surfaceSlug: 'content', slug: 'general', label: 'General' }.
  • The platform list (youtube, twitter, bluesky, reddit, article, generic).
  • Roles (admin, moderator, member) per AUTH.md §5.

Seed script: api/scripts/seed.ts (TBD). Safe to re-run — each seed entry uses findOneAndUpdate with upsert: true.
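Since the seed script is still TBD, the following is only a sketch of why upsert-by-key makes re-runs safe. The `SeedOp` shape and both function names are hypothetical, the in-memory `Map` stands in for the database, platform labels are placeholders (only the slugs are given above), and roles are omitted because AUTH.md defines them.

```typescript
interface SeedOp {
  collection: 'surfaces' | 'groups' | 'platforms';
  filter: Record<string, string>; // natural key used to find an existing document
  doc: Record<string, unknown>; // fields to write
}

const PLATFORM_SLUGS = ['youtube', 'twitter', 'bluesky', 'reddit', 'article', 'generic'];

function seedOps(): SeedOp[] {
  const ops: SeedOp[] = [
    {
      collection: 'surfaces',
      filter: { slug: 'content' },
      doc: { slug: 'content', label: 'Content', routePrefix: '/browse', browseMode: 'topic-groups' },
    },
    {
      collection: 'groups',
      filter: { surfaceSlug: 'content', slug: 'general' },
      doc: { surfaceSlug: 'content', slug: 'general', label: 'General' },
    },
  ];
  for (const slug of PLATFORM_SLUGS) {
    // Placeholder label: real display labels are not specified here.
    ops.push({ collection: 'platforms', filter: { slug }, doc: { slug, label: slug } });
  }
  return ops;
}

// In-memory stand-in for an upsert: match on the filter, merge the doc,
// create the entry if it does not exist yet.
function applySeed(store: Map<string, Record<string, unknown>>, ops: SeedOp[]): void {
  for (const op of ops) {
    const key = `${op.collection}:${JSON.stringify(op.filter)}`;
    store.set(key, { ...(store.get(key) ?? {}), ...op.doc });
  }
}
```

Applying the same operations twice leaves the store with the same eight entries, which is the property the upsert-based seed relies on. Against MongoDB, each operation would correspond to one findOneAndUpdate call with upsert: true, keyed by `filter`.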


10. Migration posture

No migration framework in v1. When a schema change is needed:

  1. Make the Mongoose schema change forward-compatible (default: … on new required fields; type widening).
  2. Add a one-off script in api/scripts/migrations/ that mutates existing documents.
  3. Never rename a field in place — add-new, backfill, deprecate, remove.

This is acceptable at current scale (no prod data). Revisit if operational maturity demands a formal tool (e.g. migrate-mongo).