@bobai/frontmatter
Frontmatter generator for FSS parsers, implementing the BOBAI Markdown Standard v1.1. It gives every parser type the same frontmatter fields and layout.
Installation
# From local path (recommended for FSS parsers)
npm install ../packages/bobai-frontmatter
# Or link globally
cd /MASTERFOLDER/Tools/parsers/packages/bobai-frontmatter
npm link
# Then in your parser:
npm link @bobai/frontmatter
Quick Start
import {
  FrontmatterGenerator,
  getEnrichmentPrompt,
  PARSER_PROFILES,
  LLMEnrichment
} from '@bobai/frontmatter';
// Markdown body produced by your parser (placeholder value)
const content = '# My Document\n\nBody text...';
// Generate markdown with frontmatter
const markdown = FrontmatterGenerator.generateMarkdown(
  {
    generator: 'fss-parse-pdf',
    version: '1.2.0',
    title: 'My Document',
    sourcePath: '/path/to/file.pdf',
    profile: PARSER_PROFILES['fss-parse-pdf'] // 'technical'
  },
  {
    word_count: 1234,
    page_count: 8,
    has_tables: true,
    has_images: false
  },
  content, // Markdown content string
  undefined, // LLMEnrichment or undefined
  'balanced' // OutputMode
);
API Reference
FrontmatterGenerator
generate(options, deterministic?, enrichment?, mode?)
Generate frontmatter YAML block only.
FrontmatterGenerator.generate(
  options: FrontmatterOptions,
  deterministic?: DeterministicFields,
  enrichment?: LLMEnrichment,
  mode?: OutputMode // 'none' | 'balanced' | 'complete'
): string
generateMarkdown(options, deterministic, content, enrichment?, mode?)
Generate complete markdown with frontmatter prepended.
FrontmatterGenerator.generateMarkdown(
  options: FrontmatterOptions,
  deterministic: DeterministicFields,
  content: string,
  enrichment?: LLMEnrichment,
  mode?: OutputMode
): string
Types
FrontmatterOptions
interface FrontmatterOptions {
  generator: string;             // e.g., 'fss-parse-pdf'
  version: string;               // e.g., '1.2.0'
  title: string;                 // Document title
  sourcePath?: string | null;    // Original file path
  profile?: ProfileType;         // Document profile
  extractionConfidence?: number; // 0.0-1.0
  contentQuality?: number;       // 0.0-2.0
}
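The two numeric options have documented ranges (0.0-1.0 and 0.0-2.0). The source does not say whether the generator enforces them, so a parser may want to normalize values before passing them in. The following is a minimal sketch under that assumption; `ScoreOptions`, `clampScore`, and `normalizeScores` are hypothetical names, not part of the package, and the fallbacks are the values listed under Constants & Defaults.

```typescript
// Local, trimmed copy of the option shape from this README (not imported
// from the package), limited to the two numeric fields.
interface ScoreOptions {
  extractionConfidence?: number; // documented range 0.0-1.0
  contentQuality?: number;       // documented range 0.0-2.0
}

// Hypothetical helper: clamp a value into [min, max], using the fallback
// when the value is missing or not a finite number.
function clampScore(
  value: number | undefined,
  min: number,
  max: number,
  fallback: number
): number {
  if (value === undefined || !Number.isFinite(value)) return fallback;
  return Math.min(max, Math.max(min, value));
}

// Hypothetical helper: normalize both scores. The fallbacks (1.0 and 1.5)
// mirror DEFAULTS.extractionConfidence and DEFAULTS.contentQuality.
function normalizeScores(options: ScoreOptions): Required<ScoreOptions> {
  return {
    extractionConfidence: clampScore(options.extractionConfidence, 0, 1, 1.0),
    contentQuality: clampScore(options.contentQuality, 0, 2, 1.5),
  };
}
```

Clamping rather than throwing keeps a single bad score from aborting a whole parse run; a stricter parser could raise an error instead.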
DeterministicFields
Parser-extracted metadata. The three fields below are typed explicitly; any other parser-specific field is accepted through the index signature:
interface DeterministicFields {
  word_count?: number;
  page_count?: number;
  character_count?: number;
  [key: string]: any; // Parser-specific fields
}
LLMEnrichment
AI-generated metadata fields:
interface LLMEnrichment {
  summary?: string;
  tags?: string[];
  category?: string;
  audience?: 'all' | 'beginner' | 'intermediate' | 'expert';
  doc_purpose?: 'reference' | 'tutorial' | 'troubleshooting' | 'conceptual' | 'guide' | 'specification';
  complexity?: number; // 1-5
  actionable?: boolean;
  key_technologies?: string[];
}
Output Modes
none
Returns an empty string instead of a frontmatter block, so the output is the content only.
balanced (default)
Includes:
- Core required fields (profile, created, generator, version, title, etc.)
- Key deterministic fields from BALANCED_FIELDS list
- LLM enrichment fields (or placeholders)
Best for RAG indexing and search.
complete
Includes all fields from deterministic object plus core and enrichment fields. Use for archival or when full metadata is needed.
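To make the difference between the three modes concrete, here is a rough sketch of how deterministic fields might be filtered per mode. It is not the package's implementation: `selectFields` and `BALANCED_SUBSET` are hypothetical, the subset stands in for the much longer BALANCED_FIELDS list, and skipping `undefined` values is an assumption.

```typescript
type OutputMode = 'none' | 'balanced' | 'complete';
type Fields = Record<string, unknown>;

// Small illustrative subset; the real BALANCED_FIELDS list is longer.
const BALANCED_SUBSET: readonly string[] = [
  'word_count', 'page_count', 'has_tables', 'has_images', 'author',
];

// Hypothetical helper: pick the deterministic fields a given mode would keep.
function selectFields(
  deterministic: Fields,
  mode: OutputMode,
  balanced: readonly string[] = BALANCED_SUBSET
): Fields {
  if (mode === 'none') return {}; // no frontmatter at all
  const selected: Fields = {};
  for (const key of Object.keys(deterministic)) {
    const value = deterministic[key];
    if (value === undefined) continue; // assumption: unset fields are omitted
    // 'complete' keeps every field, 'balanced' keeps only listed ones
    if (mode === 'complete' || balanced.indexOf(key) !== -1) {
      selected[key] = value;
    }
  }
  return selected;
}

const sampleFields = { word_count: 5000, page_count: 25, encrypted: false, author: undefined };
console.log(selectFields(sampleFields, 'balanced')); // keeps word_count and page_count
console.log(selectFields(sampleFields, 'complete')); // also keeps encrypted
```

In this sketch a parser-specific field such as `encrypted` survives only in 'complete' mode, which matches the description of that mode as the archival option.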
Parser Profiles
Default profiles for each parser type:
import { PARSER_PROFILES } from '@bobai/frontmatter';
PARSER_PROFILES['fss-parse-pdf'] // 'technical'
PARSER_PROFILES['fss-parse-word'] // 'technical'
PARSER_PROFILES['fss-parse-excel'] // 'data'
PARSER_PROFILES['fss-parse-image'] // 'data'
PARSER_PROFILES['fss-parse-audio'] // 'meeting'
PARSER_PROFILES['fss-parse-video'] // 'meeting'
PARSER_PROFILES['fss-parse-email'] // 'data'
PARSER_PROFILES['fss-parse-presentation'] // 'technical'
PARSER_PROFILES['fss-parse-data'] // 'data'
PARSER_PROFILES['fss-parse-diagram'] // 'schema'
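A parser whose name is not in the mapping gets `undefined` from a plain lookup. One way to handle that is to fall back to the default profile. The sketch below uses a local copy of part of the mapping above so that it runs on its own; `PROFILES` and `profileFor` are hypothetical names, and the 'data' fallback mirrors DEFAULTS.profile.

```typescript
// Local copy of part of the mapping listed above; not imported from the package.
const PROFILES: Record<string, string> = {
  'fss-parse-pdf': 'technical',
  'fss-parse-excel': 'data',
  'fss-parse-audio': 'meeting',
  'fss-parse-diagram': 'schema',
};

// Hypothetical helper: resolve the profile for a generator name, falling
// back to 'data' for names missing from the map.
function profileFor(generator: string, fallback: string = 'data'): string {
  // hasOwnProperty guards against inherited keys such as 'toString'
  const found = Object.prototype.hasOwnProperty.call(PROFILES, generator)
    ? PROFILES[generator]
    : undefined;
  return found ?? fallback;
}

console.log(profileFor('fss-parse-pdf'));    // 'technical'
console.log(profileFor('fss-parse-custom')); // 'data'
```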
Balanced Fields by Parser Type
The BALANCED_FIELDS list includes 70+ fields covering all parser types:
Universal
word_count, page_count, character_count, author, subject, creator, created, modified, file_size, format
PDF/Word Structure
has_tables, has_images, table_count, image_count, section_count, has_toc, has_forms, has_tracked_changes, paragraph_count, heading_count
Excel/Data
sheet_count, row_count, column_count, record_count, format_detected
Image
width, height, channels, has_alpha, color_space, ocr_confidence, has_exif
Audio
duration, duration_seconds, bitrate, sample_rate, codec, has_transcript, speaker_count, language
Video
fps, aspect_ratio, resolution, video_codec, audio_codec
Presentation
slide_count, total_slides, chart_count, has_speaker_notes, has_animations
Email
from, to, cc, sender, recipients, date, message_id, has_attachments, attachment_count, importance, thread_id
Diagram
diagram_count, diagram_type, valid_diagrams, invalid_diagrams, node_count, edge_count
LLM Enrichment
Getting the Prompt
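import {
  FrontmatterGenerator,
  getEnrichmentPrompt,
  getSamplePromptForDocType,
  LLMEnrichment
} from '@bobai/frontmatter';
// Get prompt for LLM enrichment
const prompt = getEnrichmentPrompt(content, 'pdf');
// Send to your LLM (`llm` stands in for your own client; it is not part of this package)
const response = await llm.generate(prompt);
// JSON.parse throws on malformed output, so validate the reply before trusting it
const enrichment: LLMEnrichment = JSON.parse(response);
// Use in frontmatter generation (options, deterministic, and content as in Quick Start)
const markdown = FrontmatterGenerator.generateMarkdown(
  options,
  deterministic,
  content,
  enrichment,
  'balanced'
);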
Prompt Output Format
The prompt asks the LLM to return JSON matching the LLMEnrichment interface:
{
  "summary": "2-3 sentence description",
  "tags": ["specific", "search", "terms"],
  "category": "technical",
  "audience": "intermediate",
  "doc_purpose": "reference",
  "complexity": 3,
  "actionable": false,
  "key_technologies": ["TypeScript", "Node.js"]
}
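LLM replies do not always match the requested shape, so it can help to validate the parsed JSON before passing it to the generator. The following is a sketch only: `parseEnrichment` and `isStringArray` are hypothetical helpers, the interface is a local copy of the one documented under Types, and silently dropping invalid fields (rather than rejecting the whole reply) is a design assumption.

```typescript
// Local copy of the LLMEnrichment shape documented above.
interface LLMEnrichment {
  summary?: string;
  tags?: string[];
  category?: string;
  audience?: 'all' | 'beginner' | 'intermediate' | 'expert';
  doc_purpose?: 'reference' | 'tutorial' | 'troubleshooting' | 'conceptual' | 'guide' | 'specification';
  complexity?: number; // 1-5
  actionable?: boolean;
  key_technologies?: string[];
}

const AUDIENCES = ['all', 'beginner', 'intermediate', 'expert'];
const PURPOSES = ['reference', 'tutorial', 'troubleshooting', 'conceptual', 'guide', 'specification'];

function isStringArray(value: unknown): value is string[] {
  return Array.isArray(value) && value.every((item) => typeof item === 'string');
}

// Hypothetical helper: parse an LLM reply and keep only well-formed fields.
// Returns undefined when the reply is not a JSON object at all.
function parseEnrichment(raw: string): LLMEnrichment | undefined {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch (_err) {
    return undefined; // malformed JSON
  }
  if (typeof data !== 'object' || data === null || Array.isArray(data)) return undefined;
  const d = data as Record<string, unknown>;
  const out: LLMEnrichment = {};
  if (typeof d.summary === 'string') out.summary = d.summary;
  if (isStringArray(d.tags)) out.tags = d.tags;
  if (typeof d.category === 'string') out.category = d.category;
  if (typeof d.audience === 'string' && AUDIENCES.indexOf(d.audience) !== -1) {
    out.audience = d.audience as NonNullable<LLMEnrichment['audience']>;
  }
  if (typeof d.doc_purpose === 'string' && PURPOSES.indexOf(d.doc_purpose) !== -1) {
    out.doc_purpose = d.doc_purpose as NonNullable<LLMEnrichment['doc_purpose']>;
  }
  if (typeof d.complexity === 'number' && d.complexity >= 1 && d.complexity <= 5) {
    out.complexity = d.complexity;
  }
  if (typeof d.actionable === 'boolean') out.actionable = d.actionable;
  if (isStringArray(d.key_technologies)) out.key_technologies = d.key_technologies;
  return out;
}
```

A caller could pass the result straight through as the enrichment argument, since `undefined` is already an accepted value there.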
Parser Integration Example
// In your parser (e.g., pdf-ts/src/pdf-parser.ts)
import {
  FrontmatterGenerator,
  PARSER_PROFILES,
  FrontmatterOptions,
  DeterministicFields
} from '@bobai/frontmatter';
// Importing JSON requires "resolveJsonModule" in tsconfig.json
import { version } from '../package.json';
// ParsedMetadata and calculateQuality are defined by your parser, not by this package
export function generateOutput(
  content: string,
  metadata: ParsedMetadata,
  sourcePath: string,
  mode: 'none' | 'balanced' | 'complete' = 'balanced'
): string {
  const options: FrontmatterOptions = {
    generator: 'fss-parse-pdf',
    version,
    title: metadata.title || 'Untitled',
    sourcePath,
    profile: PARSER_PROFILES['fss-parse-pdf'],
    extractionConfidence: metadata.confidence,
    contentQuality: calculateQuality(metadata)
  };
  const deterministic: DeterministicFields = {
    word_count: metadata.wordCount,
    page_count: metadata.pageCount,
    character_count: metadata.characterCount,
    has_tables: metadata.hasTables,
    has_images: metadata.hasImages,
    table_count: metadata.tableCount,
    image_count: metadata.imageCount,
    author: metadata.author,
    created: metadata.creationDate,
    modified: metadata.modificationDate,
    encrypted: metadata.isEncrypted
  };
  return FrontmatterGenerator.generateMarkdown(
    options,
    deterministic,
    content,
    undefined, // No LLM enrichment
    mode
  );
}
Constants & Defaults
import {
  DEFAULTS,
  AUDIENCE_VALUES,
  DOC_PURPOSE_VALUES,
  PROFILE_VALUES,
  BALANCED_FIELDS
} from '@bobai/frontmatter';
// Default values
DEFAULTS.profile // 'data'
DEFAULTS.audience // 'all'
DEFAULTS.extractionConfidence // 1.0
DEFAULTS.contentQuality // 1.5
DEFAULTS.complexity // 3
// Valid values for validation
AUDIENCE_VALUES // ['all', 'beginner', 'intermediate', 'expert']
DOC_PURPOSE_VALUES // ['reference', 'tutorial', ...]
PROFILE_VALUES // ['scraped', 'research', 'technical', ...]
Testing
npm test # Run all tests
npm run test:watch # Watch mode
npm run test:coverage # Coverage report
Building
npm run build # Compile TypeScript to dist/
npm run clean # Remove dist/
Output Example
---
profile: 'technical'
created: '2024-01-15T10:30:00.000Z'
generator: 'fss-parse-pdf'
version: '1.2.0'
title: 'API Documentation'
extraction_confidence: 1
content_quality: 1.5
source_file: '/docs/api.pdf'
word_count: 5000
page_count: 25
has_tables: true
has_images: true
author: 'Development Team'
summary: ''
tags: []
category: ''
---
# API Documentation
Content starts here...
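Downstream tools that consume this output often need the frontmatter and the body separately. The sketch below splits a document on its leading `---` delimiters without a YAML library. `splitFrontmatter` and `SplitResult` are hypothetical names, and the helper assumes the delimiters sit on lines of their own with no trailing whitespace, as in the example above.

```typescript
interface SplitResult {
  frontmatter: string; // raw YAML between the delimiters, '' if none
  body: string;        // everything after the closing delimiter
}

// Hypothetical helper: separate a leading '---' delimited block from the body.
function splitFrontmatter(markdown: string): SplitResult {
  const lines = markdown.split(/\r?\n/);
  // No opening delimiter on the first line: treat the whole input as body
  // (this is what output generated in 'none' mode would look like).
  if (lines[0] !== '---') return { frontmatter: '', body: markdown };
  // Find the closing delimiter, searching from the second line onward.
  const end = lines.indexOf('---', 1);
  if (end === -1) return { frontmatter: '', body: markdown };
  return {
    frontmatter: lines.slice(1, end).join('\n'),
    body: lines.slice(end + 1).join('\n'),
  };
}

const sampleDoc = [
  '---',
  "title: 'API Documentation'",
  'word_count: 5000',
  '---',
  '# API Documentation',
  'Content starts here...',
].join('\n');

const parts = splitFrontmatter(sampleDoc);
console.log(parts.frontmatter); // the two YAML lines
console.log(parts.body);        // the heading and content lines
```

The returned frontmatter string can then be handed to whichever YAML parser the consuming tool already uses.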
License
MIT