# @bobai/frontmatter

BOBAI Markdown Standard v1.1 frontmatter generator for FSS parsers. Provides consistent, standardized frontmatter generation across all parser types.

## Installation

```bash
# From local path (recommended for FSS parsers)
npm install ../packages/bobai-frontmatter

# Or link globally
cd /MASTERFOLDER/Tools/parsers/packages/bobai-frontmatter
npm link

# Then in your parser:
npm link @bobai/frontmatter
```

## Quick Start

```typescript
import {
  FrontmatterGenerator,
  getEnrichmentPrompt,
  PARSER_PROFILES,
  LLMEnrichment
} from '@bobai/frontmatter';

// Generate markdown with frontmatter
const markdown = FrontmatterGenerator.generateMarkdown(
  {
    generator: 'fss-parse-pdf',
    version: '1.2.0',
    title: 'My Document',
    sourcePath: '/path/to/file.pdf',
    profile: PARSER_PROFILES['fss-parse-pdf']  // 'technical'
  },
  {
    word_count: 1234,
    page_count: 8,
    has_tables: true,
    has_images: false
  },
  content,      // Markdown content string
  undefined,    // LLMEnrichment or undefined
  'balanced'    // OutputMode
);
```

## API Reference

### FrontmatterGenerator

#### `generate(options, deterministic?, enrichment?, mode?)`

Generate frontmatter YAML block only.

```typescript
const frontmatter = FrontmatterGenerator.generate(
  options: FrontmatterOptions,
  deterministic?: DeterministicFields,
  enrichment?: LLMEnrichment,
  mode?: OutputMode  // 'none' | 'balanced' | 'complete'
): string;
```

#### `generateMarkdown(options, deterministic, content, enrichment?, mode?)`

Generate complete markdown with frontmatter prepended.

```typescript
const markdown = FrontmatterGenerator.generateMarkdown(
  options: FrontmatterOptions,
  deterministic: DeterministicFields,
  content: string,
  enrichment?: LLMEnrichment,
  mode?: OutputMode
): string;
```

### Types

#### FrontmatterOptions

```typescript
interface FrontmatterOptions {
  generator: string;              // e.g., 'fss-parse-pdf'
  version: string;                // e.g., '1.2.0'
  title: string;                  // Document title
  sourcePath?: string | null;     // Original file path
  profile?: ProfileType;          // Document profile
  extractionConfidence?: number;  // 0.0-1.0
  contentQuality?: number;        // 0.0-2.0
}
```

#### DeterministicFields

Parser-extracted metadata. Any fields can be included:

```typescript
interface DeterministicFields {
  word_count?: number;
  page_count?: number;
  character_count?: number;
  [key: string]: any;  // Parser-specific fields
}
```

#### LLMEnrichment

AI-generated metadata fields:

```typescript
interface LLMEnrichment {
  summary?: string;
  tags?: string[];
  category?: string;
  audience?: 'all' | 'beginner' | 'intermediate' | 'expert';
  doc_purpose?: 'reference' | 'tutorial' | 'troubleshooting' | 'conceptual' | 'guide' | 'specification';
  complexity?: number;  // 1-5
  actionable?: boolean;
  key_technologies?: string[];
}
```

## Output Modes

### `none`

Returns empty string (no frontmatter). Content only.

### `balanced` (default)

Includes:
- Core required fields (profile, created, generator, version, title, etc.)
- Key deterministic fields from BALANCED_FIELDS list
- LLM enrichment fields (or placeholders)

Best for RAG indexing and search.

### `complete`

Includes all fields from deterministic object plus core and enrichment fields. Use for archival or when full metadata is needed.

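One way to picture the three modes is as a filter over the deterministic fields. The sketch below is a hypothetical illustration, not the package's implementation: `selectDeterministicFields` and `SAMPLE_BALANCED_FIELDS` are names invented here, and the short sample list only stands in for the real `BALANCED_FIELDS`.

```typescript
// Hypothetical sketch of mode-based field selection (assumed behavior,
// not taken from the @bobai/frontmatter source).
type OutputMode = 'none' | 'balanced' | 'complete';

// Small stand-in for the real BALANCED_FIELDS list.
const SAMPLE_BALANCED_FIELDS: string[] = [
  'word_count', 'page_count', 'has_tables', 'has_images', 'author'
];

// Return the deterministic fields that the given mode would keep.
function selectDeterministicFields(
  fields: Record<string, unknown>,
  mode: OutputMode
): Record<string, unknown> {
  const selected: Record<string, unknown> = {};
  if (mode === 'none') {
    return selected;  // 'none' emits no frontmatter at all
  }
  for (const key of Object.keys(fields)) {
    // 'complete' keeps every field; 'balanced' keeps only listed ones.
    if (mode === 'complete' || SAMPLE_BALANCED_FIELDS.indexOf(key) !== -1) {
      selected[key] = fields[key];
    }
  }
  return selected;
}

const parsed = { word_count: 1234, page_count: 8, encrypted: false, custom_score: 0.7 };
console.log(selectDeterministicFields(parsed, 'balanced'));  // keeps word_count and page_count
console.log(selectDeterministicFields(parsed, 'complete'));  // keeps all four fields
```

Under this reading, a parser-specific field such as `encrypted` would only appear in the output when `complete` is requested, unless it is part of the balanced list.
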
## Parser Profiles

Default profiles for each parser type:

```typescript
import { PARSER_PROFILES } from '@bobai/frontmatter';

PARSER_PROFILES['fss-parse-pdf']           // 'technical'
PARSER_PROFILES['fss-parse-word']          // 'technical'
PARSER_PROFILES['fss-parse-excel']         // 'data'
PARSER_PROFILES['fss-parse-image']         // 'data'
PARSER_PROFILES['fss-parse-audio']         // 'meeting'
PARSER_PROFILES['fss-parse-video']         // 'meeting'
PARSER_PROFILES['fss-parse-email']         // 'data'
PARSER_PROFILES['fss-parse-presentation']  // 'technical'
PARSER_PROFILES['fss-parse-data']          // 'data'
PARSER_PROFILES['fss-parse-diagram']       // 'schema'
```

## Balanced Fields by Parser Type

The BALANCED_FIELDS list includes 70+ fields covering all parser types:

### Universal

`word_count`, `page_count`, `character_count`, `author`, `subject`, `creator`, `created`, `modified`, `file_size`, `format`

### PDF/Word Structure

`has_tables`, `has_images`, `table_count`, `image_count`, `section_count`, `has_toc`, `has_forms`, `has_tracked_changes`, `paragraph_count`, `heading_count`

### Excel/Data

`sheet_count`, `row_count`, `column_count`, `record_count`, `format_detected`

### Image

`width`, `height`, `channels`, `has_alpha`, `color_space`, `ocr_confidence`, `has_exif`

### Audio

`duration`, `duration_seconds`, `bitrate`, `sample_rate`, `codec`, `has_transcript`, `speaker_count`, `language`

### Video

`fps`, `aspect_ratio`, `resolution`, `video_codec`, `audio_codec`

### Presentation

`slide_count`, `total_slides`, `chart_count`, `has_speaker_notes`, `has_animations`

### Email

`from`, `to`, `cc`, `sender`, `recipients`, `date`, `message_id`, `has_attachments`, `attachment_count`, `importance`, `thread_id`

### Diagram

`diagram_count`, `diagram_type`, `valid_diagrams`, `invalid_diagrams`, `node_count`, `edge_count`

## LLM Enrichment

### Getting the Prompt

```typescript
import {
  FrontmatterGenerator,
  getEnrichmentPrompt,
  getSamplePromptForDocType,
  LLMEnrichment
} from '@bobai/frontmatter';

// Get prompt for LLM enrichment
const prompt = getEnrichmentPrompt(content, 'pdf');

// Send to your LLM...
const response = await llm.generate(prompt);
const enrichment: LLMEnrichment = JSON.parse(response);

// Use in frontmatter generation
const markdown = FrontmatterGenerator.generateMarkdown(
  options,
  deterministic,
  content,
  enrichment,
  'balanced'
);
```

### Prompt Output Format

The prompt asks the LLM to return JSON matching the `LLMEnrichment` interface:

```json
{
  "summary": "2-3 sentence description",
  "tags": ["specific", "search", "terms"],
  "category": "technical",
  "audience": "intermediate",
  "doc_purpose": "reference",
  "complexity": 3,
  "actionable": false,
  "key_technologies": ["TypeScript", "Node.js"]
}
```

## Parser Integration Example

```typescript
// In your parser (e.g., pdf-ts/src/pdf-parser.ts)
import {
  FrontmatterGenerator,
  PARSER_PROFILES,
  FrontmatterOptions,
  DeterministicFields
} from '@bobai/frontmatter';
import { version } from '../package.json';

export function generateOutput(
  content: string,
  metadata: ParsedMetadata,
  sourcePath: string,
  mode: 'none' | 'balanced' | 'complete' = 'balanced'
): string {
  const options: FrontmatterOptions = {
    generator: 'fss-parse-pdf',
    version,
    title: metadata.title || 'Untitled',
    sourcePath,
    profile: PARSER_PROFILES['fss-parse-pdf'],
    extractionConfidence: metadata.confidence,
    contentQuality: calculateQuality(metadata)
  };

  const deterministic: DeterministicFields = {
    word_count: metadata.wordCount,
    page_count: metadata.pageCount,
    character_count: metadata.characterCount,
    has_tables: metadata.hasTables,
    has_images: metadata.hasImages,
    table_count: metadata.tableCount,
    image_count: metadata.imageCount,
    author: metadata.author,
    created: metadata.creationDate,
    modified: metadata.modificationDate,
    encrypted: metadata.isEncrypted
  };

  return FrontmatterGenerator.generateMarkdown(
    options,
    deterministic,
    content,
    undefined,  // No LLM enrichment
    mode
  );
}
```

## Constants & Defaults

```typescript
import {
  DEFAULTS,
  AUDIENCE_VALUES,
  DOC_PURPOSE_VALUES,
  PROFILE_VALUES,
  BALANCED_FIELDS
} from '@bobai/frontmatter';

// Default values
DEFAULTS.profile               // 'data'
DEFAULTS.audience              // 'all'
DEFAULTS.extractionConfidence  // 1.0
DEFAULTS.contentQuality        // 1.5
DEFAULTS.complexity            // 3

// Valid values for validation
AUDIENCE_VALUES     // ['all', 'beginner', 'intermediate', 'expert']
DOC_PURPOSE_VALUES  // ['reference', 'tutorial', ...]
PROFILE_VALUES      // ['scraped', 'research', 'technical', ...]
```

## Testing

```bash
npm test               # Run all tests
npm run test:watch     # Watch mode
npm run test:coverage  # Coverage report
```

## Building

```bash
npm run build  # Compile TypeScript to dist/
npm run clean  # Remove dist/
```

## Output Example

```yaml
---
profile: 'technical'
created: '2024-01-15T10:30:00.000Z'
generator: 'fss-parse-pdf'
version: '1.2.0'
title: 'API Documentation'
extraction_confidence: 1
content_quality: 1.5
source_file: '/docs/api.pdf'
word_count: 5000
page_count: 25
has_tables: true
has_images: true
author: 'Development Team'
summary: ''
tags: []
category: ''
---

# API Documentation

Content starts here...
```

## License

MIT