@bobai/frontmatter - Completion Specification

Package Overview

| Field | Value |
| --- | --- |
| Package Name | @bobai/frontmatter |
| Version | 1.1.0 |
| Standard | BOBAI Markdown Standard v1.1 |
| Language | TypeScript |
| Node.js | >= 18.0.0 |
| License | MIT |

Implementation Status

Core Features

| Feature | Status | Notes |
| --- | --- | --- |
| FrontmatterGenerator class | Complete | Static methods for generation |
| Output modes (none/balanced/complete) | Complete | All three modes implemented |
| YAML serialization | Complete | Uses js-yaml with proper formatting |
| Type definitions | Complete | Full TypeScript interfaces |
| Constants & defaults | Complete | Comprehensive coverage |
| LLM enrichment prompts | Complete | Prompt templates included |
| Parser profiles | Complete | All 10 parsers mapped |

Test Coverage

| Test Suite | Tests | Status |
| --- | --- | --- |
| generator.test.ts | 35 | Passing |
| constants.test.ts | 16 | Passing |
| prompts.test.ts | 12 | Passing |
| Total | 63 | All Passing |

File Structure

bobai-frontmatter/
├── src/
│   ├── index.ts              # Main exports (27 lines)
│   ├── generator.ts          # FrontmatterGenerator class (123 lines)
│   ├── types.ts              # TypeScript interfaces (47 lines)
│   ├── constants.ts          # Enums, defaults, balanced fields (130 lines)
│   └── prompts.ts            # LLM enrichment prompts (43 lines)
├── tests/
│   ├── generator.test.ts     # Generator tests (470 lines)
│   ├── constants.test.ts     # Constants tests (140 lines)
│   └── prompts.test.ts       # Prompt tests (80 lines)
├── dist/                     # Compiled JavaScript + type definitions
├── package.json              # NPM configuration with Jest
├── tsconfig.json             # TypeScript configuration
├── README.md                 # Comprehensive documentation
├── COMPLETION_SPEC.md        # This document
└── IMPLEMENTATION_BLUEPRINT.md # Original blueprint

Exports

Types

export type OutputMode = 'none' | 'balanced' | 'complete';
export type AudienceLevel = 'all' | 'beginner' | 'intermediate' | 'expert';
export type DocPurpose = 'reference' | 'tutorial' | 'troubleshooting' | 'conceptual' | 'guide' | 'specification';
export type ProfileType = 'scraped' | 'research' | 'technical' | 'code' | 'data' | 'changelog' | 'legal' | 'test' | 'schema' | 'troubleshoot' | 'meeting' | 'faq' | 'config';

export interface FrontmatterOptions { ... }
export interface DeterministicFields { ... }
export interface LLMEnrichment { ... }

Constants

export const AUDIENCE_VALUES: AudienceLevel[];      // 4 values
export const DOC_PURPOSE_VALUES: DocPurpose[];      // 6 values
export const PROFILE_VALUES: ProfileType[];         // 13 values
export const DEFAULTS: { ... };                     // 5 defaults
export const BALANCED_FIELDS: string[];             // 70 fields
export const PARSER_PROFILES: Record<string, ProfileType>;  // 10 parsers
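
Because the value lists are exported as plain arrays, a caller can use them for runtime checks on untrusted strings. The following is a minimal sketch, not part of the package: `isProfileType` is a hypothetical helper, and the union and array are re-declared locally so the snippet runs without installing @bobai/frontmatter.

```typescript
// Local copies of the exported union and value list (13 values, as listed above).
type ProfileType =
  | 'scraped' | 'research' | 'technical' | 'code' | 'data' | 'changelog'
  | 'legal' | 'test' | 'schema' | 'troubleshoot' | 'meeting' | 'faq' | 'config';

const PROFILE_VALUES: readonly ProfileType[] = [
  'scraped', 'research', 'technical', 'code', 'data', 'changelog',
  'legal', 'test', 'schema', 'troubleshoot', 'meeting', 'faq', 'config',
];

// Hypothetical helper (not a package export): narrows unknown input to ProfileType.
function isProfileType(value: unknown): value is ProfileType {
  return (
    typeof value === 'string' &&
    (PROFILE_VALUES as readonly string[]).indexOf(value) !== -1
  );
}
```

The same pattern would apply to AUDIENCE_VALUES and DOC_PURPOSE_VALUES.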

Functions

export class FrontmatterGenerator {
  static generate(options, deterministic?, enrichment?, mode?): string;
  static generateMarkdown(options, deterministic, content, enrichment?, mode?): string;
}

export function getEnrichmentPrompt(content: string, docType?: string): string;
export function getSamplePromptForDocType(docType: string): string;

Parser Support Matrix

Supported Parsers and Their Balanced Fields

| Parser | Profile | Key Balanced Fields |
| --- | --- | --- |
| fss-parse-pdf | technical | word_count, page_count, has_tables, has_images, has_toc, has_forms, encrypted, author |
| fss-parse-word | technical | word_count, page_count, paragraph_count, has_tracked_changes, has_toc, author |
| fss-parse-excel | data | sheet_count, row_count, column_count, author |
| fss-parse-image | data | width, height, format, channels, has_alpha, ocr_confidence, file_size |
| fss-parse-audio | meeting | duration, bitrate, sample_rate, codec, has_transcript, speaker_count, language |
| fss-parse-video | meeting | duration, width, height, fps, aspect_ratio, video_codec, audio_codec |
| fss-parse-email | data | from, to, cc, sender, recipients, date, message_id, has_attachments, attachment_count, importance |
| fss-parse-presentation | technical | slide_count, total_slides, word_count, chart_count, has_speaker_notes, has_images |
| fss-parse-data | data | record_count, format_detected, file_size, column_count |
| fss-parse-diagram | schema | diagram_count, diagram_type, valid_diagrams, invalid_diagrams, node_count, edge_count |

BALANCED_FIELDS Complete List (70 fields)

Universal Document (10)

  • word_count, page_count, character_count, author, subject, creator, created, modified, file_size, format

Structure Fields (10)

  • has_tables, has_images, table_count, image_count, section_count, has_toc, has_forms, has_tracked_changes, paragraph_count, heading_count

Excel/Data (5)

  • sheet_count, row_count, column_count, record_count, format_detected

Image (7)

  • width, height, channels, has_alpha, color_space, ocr_confidence, has_exif

Audio (8)

  • duration, duration_seconds, bitrate, sample_rate, codec, has_transcript, speaker_count, language

Video (5)

  • fps, aspect_ratio, resolution, video_codec, audio_codec

Presentation (5)

  • slide_count, total_slides, chart_count, has_speaker_notes, has_animations

Email (11)

  • from, to, cc, sender, recipients, date, message_id, has_attachments, attachment_count, importance, thread_id

Diagram (6)

  • diagram_count, diagram_type, valid_diagrams, invalid_diagrams, node_count, edge_count

Analysis (3)

  • encrypted, complexity_score, reading_time_minutes
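
This document does not spell out how the three output modes use the field list above. The sketch below shows one plausible reading, stated as an assumption: 'none' emits no deterministic fields, 'balanced' keeps only fields on the allow-list, and 'complete' keeps everything. `selectDeterministicFields` and `BALANCED_SUBSET` are hypothetical names, and the real generator may behave differently.

```typescript
type OutputMode = 'none' | 'balanced' | 'complete';
type FieldValue = string | number | boolean | string[];

// Abbreviated stand-in for the 70-entry BALANCED_FIELDS export.
const BALANCED_SUBSET: readonly string[] = [
  'word_count', 'page_count', 'has_tables', 'author',
];

// ASSUMPTION: mode semantics are inferred from the mode names, not confirmed
// by the package source.
function selectDeterministicFields(
  fields: Record<string, FieldValue>,
  mode: OutputMode,
  allowList: readonly string[] = BALANCED_SUBSET,
): Record<string, FieldValue> {
  if (mode === 'none') return {};
  if (mode === 'complete') return { ...fields };

  // 'balanced': keep only keys that appear on the allow-list.
  const kept: Record<string, FieldValue> = {};
  for (const key of Object.keys(fields)) {
    const value = fields[key];
    if (value !== undefined && allowList.indexOf(key) !== -1) {
      kept[key] = value;
    }
  }
  return kept;
}
```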

Default Values

| Default | Value | Description |
| --- | --- | --- |
| profile | 'data' | Default document profile |
| audience | 'all' | Default audience level |
| extractionConfidence | 1.0 | Default confidence (0.0-1.0) |
| contentQuality | 1.5 | Default quality score (0.0-2.0) |
| complexity | 3 | Default complexity (1-5) |
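
Since the package performs no validation, a parser may want to fill in missing scores and keep them inside the documented ranges before calling the generator. The sketch below assumes that clamping out-of-range values is acceptable; `resolveScores` and `clampTo` are hypothetical helpers, not package exports, and the default values are copied from the list above.

```typescript
interface ScoreFields {
  extractionConfidence: number; // documented range 0.0-1.0
  contentQuality: number;       // documented range 0.0-2.0
  complexity: number;           // documented range 1-5
}

// Values copied from the Default Values list above.
const SCORE_DEFAULTS: ScoreFields = {
  extractionConfidence: 1.0,
  contentQuality: 1.5,
  complexity: 3,
};

function clampTo(value: number, min: number, max: number): number {
  return Math.min(max, Math.max(min, value));
}

// Hypothetical helper: fills gaps from the defaults, then clamps each score.
function resolveScores(partial: Partial<ScoreFields> = {}): ScoreFields {
  return {
    extractionConfidence: clampTo(
      partial.extractionConfidence ?? SCORE_DEFAULTS.extractionConfidence, 0, 1),
    contentQuality: clampTo(
      partial.contentQuality ?? SCORE_DEFAULTS.contentQuality, 0, 2),
    // Rounding to a whole number is an assumption; the range is given as 1-5.
    complexity: Math.round(
      clampTo(partial.complexity ?? SCORE_DEFAULTS.complexity, 1, 5)),
  };
}
```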

Output Format

Frontmatter Structure

---
# Core fields (always present)
profile: 'technical'
created: '2024-01-15T10:30:00.000Z'
generator: 'fss-parse-pdf'
version: '1.2.0'
title: 'Document Title'
extraction_confidence: 1
content_quality: 1.5
source_file: '/path/to/file.pdf'

# Deterministic fields (based on mode)
word_count: 5000
page_count: 25
has_tables: true
# ... more based on parser type

# LLM enrichment fields (or placeholders)
summary: 'Description of document...'
tags:
  - tag1
  - tag2
category: 'technical'
audience: 'intermediate'
doc_purpose: 'reference'
complexity: 3
actionable: false
key_technologies:
  - TypeScript
  - Node.js
---
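
The delimiter layout above can be illustrated with a small round-trip sketch. It treats the YAML body as an already serialized string and assumes one blank line between the closing `---` and the Markdown content; both `wrapFrontmatter` and `splitFrontmatter` are hypothetical helpers, not the package's implementation.

```typescript
// Joins a serialized YAML body and Markdown content using --- delimiters.
// ASSUMPTION: a single blank line separates the frontmatter from the content.
function wrapFrontmatter(yamlBody: string, content: string): string {
  return `---\n${yamlBody.trim()}\n---\n\n${content}`;
}

// Splits a document in that layout back into its two parts.
// Returns null when the text does not start with a frontmatter block.
function splitFrontmatter(
  text: string,
): { yamlBody: string; content: string } | null {
  const match = /^---\n([\s\S]*?)\n---\n\n?([\s\S]*)$/.exec(text);
  if (!match) return null;
  return { yamlBody: match[1] ?? '', content: match[2] ?? '' };
}
```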

Dependencies

Production

  • js-yaml ^4.1.0 - YAML serialization

Development

  • typescript ^5.0.0 - TypeScript compiler
  • jest ^29.7.0 - Test runner
  • ts-jest ^29.1.0 - Jest TypeScript transformer
  • @types/jest ^29.5.0 - Jest type definitions
  • @types/js-yaml ^4.0.9 - js-yaml type definitions
  • @types/node ^20.0.0 - Node.js type definitions

Usage Patterns

Basic Usage

import { FrontmatterGenerator } from '@bobai/frontmatter';

const markdown = FrontmatterGenerator.generateMarkdown(
  { generator: 'fss-parse-pdf', version: '1.0.0', title: 'Doc' },
  { word_count: 1000, page_count: 5 },
  '# Content here'
);

With LLM Enrichment

import { FrontmatterGenerator, getEnrichmentPrompt, LLMEnrichment } from '@bobai/frontmatter';

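// getLLMResponse is caller-supplied: the package ships prompts but no LLM client.
// options, deterministic, and content are the caller's own values, as in Basic Usage.
const prompt = getEnrichmentPrompt(content, 'pdf');
const enrichment: LLMEnrichment = await getLLMResponse(prompt);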

const markdown = FrontmatterGenerator.generateMarkdown(
  options, deterministic, content, enrichment, 'balanced'
);
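
Because the LLM call is left to the caller, the reply has to be checked before it is passed on as enrichment. The sketch below assumes the model replies with a JSON object and that the field names match the LLM enrichment fields shown under Output Format. `EnrichmentSketch` and `parseEnrichment` are hypothetical; the package's actual LLMEnrichment interface may differ.

```typescript
type AudienceLevel = 'all' | 'beginner' | 'intermediate' | 'expert';
const AUDIENCE_LEVELS: readonly AudienceLevel[] = [
  'all', 'beginner', 'intermediate', 'expert',
];

// ASSUMPTION: a subset of fields inferred from the Output Format example.
interface EnrichmentSketch {
  summary?: string;
  tags?: string[];
  audience?: AudienceLevel;
  complexity?: number;
}

// Parses a raw LLM reply, keeping only fields that pass basic checks.
function parseEnrichment(rawReply: string): EnrichmentSketch {
  let decoded: unknown;
  try {
    decoded = JSON.parse(rawReply);
  } catch {
    return {}; // not JSON: fall back to no enrichment
  }
  if (typeof decoded !== 'object' || decoded === null || Array.isArray(decoded)) {
    return {};
  }
  const fields = decoded as Record<string, unknown>;
  const enrichment: EnrichmentSketch = {};

  const summary = fields['summary'];
  if (typeof summary === 'string') enrichment.summary = summary;

  const tags = fields['tags'];
  if (Array.isArray(tags)) {
    enrichment.tags = tags.filter((tag): tag is string => typeof tag === 'string');
  }

  const audience = fields['audience'];
  const matched = AUDIENCE_LEVELS.find((level) => level === audience);
  if (matched !== undefined) enrichment.audience = matched;

  const complexity = fields['complexity'];
  if (typeof complexity === 'number' && Number.isInteger(complexity) &&
      complexity >= 1 && complexity <= 5) {
    enrichment.complexity = complexity;
  }
  return enrichment;
}
```

Invalid or missing fields are simply dropped here, which leaves the generator free to emit its placeholders for them.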

Using Parser Profiles

import { PARSER_PROFILES } from '@bobai/frontmatter';

const profile = PARSER_PROFILES['fss-parse-audio'];  // 'meeting'
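
A lookup for a parser name that is not in the mapping returns undefined, so callers may want a fallback. The sketch below re-declares the ten mappings from the Parser Support Matrix locally and falls back to 'data', the documented default profile. `profileForParser` is a hypothetical helper, and whether the package applies the same fallback is an assumption.

```typescript
type MappedProfile = 'technical' | 'data' | 'meeting' | 'schema';

// Local copy of the ten parser-to-profile mappings from the support matrix.
const PROFILE_BY_PARSER = new Map<string, MappedProfile>([
  ['fss-parse-pdf', 'technical'],
  ['fss-parse-word', 'technical'],
  ['fss-parse-excel', 'data'],
  ['fss-parse-image', 'data'],
  ['fss-parse-audio', 'meeting'],
  ['fss-parse-video', 'meeting'],
  ['fss-parse-email', 'data'],
  ['fss-parse-presentation', 'technical'],
  ['fss-parse-data', 'data'],
  ['fss-parse-diagram', 'schema'],
]);

// Hypothetical helper: unknown parsers fall back to the default profile 'data'.
function profileForParser(parserName: string): MappedProfile {
  return PROFILE_BY_PARSER.get(parserName) ?? 'data';
}
```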

Integration Requirements

For Parsers to Use This Package

  1. Install: npm install ../packages/bobai-frontmatter
  2. Import: import { FrontmatterGenerator, ... } from '@bobai/frontmatter';
  3. Build: Ensure bobai-frontmatter is built before parser build

Package.json Dependency

{
  "dependencies": {
    "@bobai/frontmatter": "file:../packages/bobai-frontmatter"
  }
}

Quality Metrics

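| Metric | Value |
| --- | --- |
| Total Lines of Code | ~370 (src, summing the per-file counts under File Structure) |
| Test Coverage | 63 tests |
| TypeScript Strict Mode | Yes |
| Zero Runtime Errors | Yes |
| Build Time | < 1s |
| Test Time | ~1s |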

Validation Checklist

  • All types properly exported
  • All constants properly exported
  • FrontmatterGenerator methods work correctly
  • YAML output is valid
  • All output modes function correctly
  • Balanced fields cover all parser types
  • Parser profiles are correct
  • LLM prompts generate correct structure
  • Tests pass with no warnings
  • TypeScript compiles with no errors
  • README documentation complete
  • Package.json properly configured

Known Limitations

  1. No LLM client: The package provides prompt templates but no LLM integration
  2. No file I/O: The generator returns strings only; parsers handle file operations
  3. No validation: The generator trusts parser-provided data

Future Enhancements (Not Implemented)

  1. LLM client integration (src/llm/ directory)
  2. Schema validation for frontmatter
  3. Custom field definitions per parser
  4. Streaming generation for large documents

Conclusion

The @bobai/frontmatter package is complete and ready for integration with all FSS parsers. It provides:

  • Consistent BOBAI v1.1 standard frontmatter generation
  • Support for all 10 parser types
  • Three output modes for different use cases
  • LLM enrichment prompt templates
  • 63 passing tests across three test suites
  • Full TypeScript type safety

Parsers can immediately begin using this package by installing it as a local dependency and importing the required exports.