# @bobai/frontmatter - Completion Specification
## Package Overview
| Field | Value |
|-------|-------|
| Package Name | `@bobai/frontmatter` |
| Version | 1.1.0 |
| Standard | BOBAI Markdown Standard v1.1 |
| Language | TypeScript |
| Node.js | >= 18.0.0 |
| License | MIT |
## Implementation Status
### Core Features
| Feature | Status | Notes |
|---------|--------|-------|
| FrontmatterGenerator class | Complete | Static methods for generation |
| Output modes (none/balanced/complete) | Complete | All three modes implemented |
| YAML serialization | Complete | Uses js-yaml with proper formatting |
| Type definitions | Complete | Full TypeScript interfaces |
| Constants & defaults | Complete | Comprehensive coverage |
| LLM enrichment prompts | Complete | Prompt templates included |
| Parser profiles | Complete | All 10 parsers mapped |
### Test Coverage
| Test Suite | Tests | Status |
|------------|-------|--------|
| generator.test.ts | 35 | Passing |
| constants.test.ts | 16 | Passing |
| prompts.test.ts | 12 | Passing |
| **Total** | **63** | **All Passing** |
## File Structure
```
bobai-frontmatter/
├── src/
│   ├── index.ts                  # Main exports (27 lines)
│   ├── generator.ts              # FrontmatterGenerator class (123 lines)
│   ├── types.ts                  # TypeScript interfaces (47 lines)
│   ├── constants.ts              # Enums, defaults, balanced fields (130 lines)
│   └── prompts.ts                # LLM enrichment prompts (43 lines)
├── tests/
│   ├── generator.test.ts         # Generator tests (470 lines)
│   ├── constants.test.ts         # Constants tests (140 lines)
│   └── prompts.test.ts           # Prompt tests (80 lines)
├── dist/                         # Compiled JavaScript + type definitions
├── package.json                  # npm configuration with Jest
├── tsconfig.json                 # TypeScript configuration
├── README.md                     # Comprehensive documentation
├── COMPLETION_SPEC.md            # This document
└── IMPLEMENTATION_BLUEPRINT.md   # Original blueprint
```
## Exports
### Types
```typescript
export type OutputMode = 'none' | 'balanced' | 'complete';
export type AudienceLevel = 'all' | 'beginner' | 'intermediate' | 'expert';
export type DocPurpose = 'reference' | 'tutorial' | 'troubleshooting' | 'conceptual' | 'guide' | 'specification';
export type ProfileType = 'scraped' | 'research' | 'technical' | 'code' | 'data' | 'changelog' | 'legal' | 'test' | 'schema' | 'troubleshoot' | 'meeting' | 'faq' | 'config';
export interface FrontmatterOptions { ... }
export interface DeterministicFields { ... }
export interface LLMEnrichment { ... }
```
### Constants
```typescript
export const AUDIENCE_VALUES: AudienceLevel[]; // 4 values
export const DOC_PURPOSE_VALUES: DocPurpose[]; // 6 values
export const PROFILE_VALUES: ProfileType[]; // 13 values
export const DEFAULTS: { ... }; // 5 defaults
export const BALANCED_FIELDS: string[]; // 70 fields
export const PARSER_PROFILES: Record<string, ProfileType>; // 10 parsers
```
### Classes and Functions
```typescript
export class FrontmatterGenerator {
  static generate(options, deterministic?, enrichment?, mode?): string;
  static generateMarkdown(options, deterministic, content, enrichment?, mode?): string;
}
export function getEnrichmentPrompt(content: string, docType?: string): string;
export function getSamplePromptForDocType(docType: string): string;
```
## Parser Support Matrix
### Supported Parsers and Their Balanced Fields
| Parser | Profile | Key Balanced Fields |
|--------|---------|---------------------|
| fss-parse-pdf | technical | word_count, page_count, has_tables, has_images, has_toc, has_forms, encrypted, author |
| fss-parse-word | technical | word_count, page_count, paragraph_count, has_tracked_changes, has_toc, author |
| fss-parse-excel | data | sheet_count, row_count, column_count, author |
| fss-parse-image | data | width, height, format, channels, has_alpha, ocr_confidence, file_size |
| fss-parse-audio | meeting | duration, bitrate, sample_rate, codec, has_transcript, speaker_count, language |
| fss-parse-video | meeting | duration, width, height, fps, aspect_ratio, video_codec, audio_codec |
| fss-parse-email | data | from, to, cc, sender, recipients, date, message_id, has_attachments, attachment_count, importance |
| fss-parse-presentation | technical | slide_count, total_slides, word_count, chart_count, has_speaker_notes, has_images |
| fss-parse-data | data | record_count, format_detected, file_size, column_count |
| fss-parse-diagram | schema | diagram_count, diagram_type, valid_diagrams, invalid_diagrams, node_count, edge_count |
## BALANCED_FIELDS Complete List (70 fields)
### Universal Document (10)
- word_count, page_count, character_count, author, subject, creator, created, modified, file_size, format
### Structure Fields (10)
- has_tables, has_images, table_count, image_count, section_count, has_toc, has_forms, has_tracked_changes, paragraph_count, heading_count
### Excel/Data (5)
- sheet_count, row_count, column_count, record_count, format_detected
### Image (7)
- width, height, channels, has_alpha, color_space, ocr_confidence, has_exif
### Audio (8)
- duration, duration_seconds, bitrate, sample_rate, codec, has_transcript, speaker_count, language
### Video (5)
- fps, aspect_ratio, resolution, video_codec, audio_codec
### Presentation (5)
- slide_count, total_slides, chart_count, has_speaker_notes, has_animations
### Email (11)
- from, to, cc, sender, recipients, date, message_id, has_attachments, attachment_count, importance, thread_id
### Diagram (6)
- diagram_count, diagram_type, valid_diagrams, invalid_diagrams, node_count, edge_count
### Analysis (3)
- encrypted, complexity_score, reading_time_minutes
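
The relationship between output modes and the balanced field list can be sketched as a simple allow-list filter. This is a minimal illustration, not the package's actual implementation: `selectFields` and `BALANCED_SUBSET` are hypothetical names, the subset holds only four of the 70 fields, and the mode semantics (`none` drops all deterministic fields, `balanced` keeps only listed keys, `complete` keeps everything) are an assumption.

```typescript
type OutputMode = 'none' | 'balanced' | 'complete';
type FieldValue = string | number | boolean | string[];

// Illustrative subset of the 70 balanced fields listed above (hypothetical constant).
const BALANCED_SUBSET: ReadonlySet<string> = new Set([
  'word_count',
  'page_count',
  'has_tables',
  'author',
]);

// Assumed semantics: 'none' drops every deterministic field, 'balanced'
// keeps only allow-listed keys, 'complete' passes everything through.
function selectFields(
  fields: Record<string, FieldValue>,
  mode: OutputMode,
): Record<string, FieldValue> {
  if (mode === 'none') return {};
  if (mode === 'complete') return { ...fields };

  const kept: Record<string, FieldValue> = {};
  for (const [key, value] of Object.entries(fields)) {
    if (BALANCED_SUBSET.has(key)) kept[key] = value;
  }
  return kept;
}

// A parser-specific key that is not in the allow-list is dropped in balanced mode.
const parsed = { word_count: 5000, page_count: 25, internal_debug_id: 'x1' };
const balanced = selectFields(parsed, 'balanced');
```

An allow-list keeps balanced output stable when a parser starts emitting new keys; only fields added to the list ever reach the frontmatter.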
## Default Values
| Default | Value | Description |
|---------|-------|-------------|
| profile | 'data' | Default document profile |
| audience | 'all' | Default audience level |
| extractionConfidence | 1.0 | Default confidence (0.0-1.0) |
| contentQuality | 1.5 | Default quality score (0.0-2.0) |
| complexity | 3 | Default complexity (1-5) |
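
One way a caller might apply these defaults is sketched below. The values are copied from the table above; the object shape, the `ScoreOverrides` interface, and the `resolveScores` helper are hypothetical, and clamping to the documented ranges is an assumption rather than confirmed package behavior.

```typescript
// Values copied from the Default Values table; the exact shape is assumed.
const DEFAULTS = {
  profile: 'data',
  audience: 'all',
  extractionConfidence: 1.0,
  contentQuality: 1.5,
  complexity: 3,
} as const;

// Hypothetical override shape for the three numeric scores.
interface ScoreOverrides {
  extractionConfidence?: number;
  contentQuality?: number;
  complexity?: number;
}

function clamp(value: number, min: number, max: number): number {
  return Math.min(max, Math.max(min, value));
}

// Fall back to the default when a score is missing, then clamp to the
// documented range (clamping is an assumption of this sketch).
function resolveScores(overrides: ScoreOverrides = {}) {
  return {
    extractionConfidence: clamp(
      overrides.extractionConfidence ?? DEFAULTS.extractionConfidence, 0, 1),
    contentQuality: clamp(
      overrides.contentQuality ?? DEFAULTS.contentQuality, 0, 2),
    complexity: clamp(
      Math.round(overrides.complexity ?? DEFAULTS.complexity), 1, 5),
  };
}

const defaultScores = resolveScores();
const clampedScores = resolveScores({ contentQuality: 5, complexity: 0 });
```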
## Output Format
### Frontmatter Structure
```yaml
---
# Core fields (always present)
profile: 'technical'
created: '2024-01-15T10:30:00.000Z'
generator: 'fss-parse-pdf'
version: '1.2.0'
title: 'Document Title'
extraction_confidence: 1
content_quality: 1.5
source_file: '/path/to/file.pdf'
# Deterministic fields (based on mode)
word_count: 5000
page_count: 25
has_tables: true
# ... more based on parser type
# LLM enrichment fields (or placeholders)
summary: 'Description of document...'
tags:
- tag1
- tag2
category: 'technical'
audience: 'intermediate'
doc_purpose: 'reference'
complexity: 3
actionable: false
key_technologies:
- TypeScript
- Node.js
---
```
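
The delimiter layout above can be sketched with two small helpers. This assumes the YAML body has already been serialized (the package uses js-yaml for that step) and that one blank line separates the closing `---` from the content; both `wrapFrontmatter` and `splitFrontmatter` are hypothetical names and not part of the package's exports.

```typescript
// Wrap an already-serialized YAML body in '---' delimiters and prepend it
// to the Markdown content. The blank line after the block is an assumption.
function wrapFrontmatter(yamlBody: string, content: string): string {
  const body = yamlBody.endsWith('\n') ? yamlBody : yamlBody + '\n';
  return `---\n${body}---\n\n${content}`;
}

// Inverse helper: split a document back into its YAML body and content.
// Returns null when the document does not start with a frontmatter block.
function splitFrontmatter(
  markdown: string,
): { yaml: string; content: string } | null {
  const match = /^---\n([\s\S]*?)\n---\n+([\s\S]*)$/.exec(markdown);
  if (match === null) return null;
  return { yaml: match[1] ?? '', content: match[2] ?? '' };
}

const doc = wrapFrontmatter('title: Doc\nword_count: 10', '# Body');
const parts = splitFrontmatter(doc);
```

A round trip through both helpers is a cheap way for a parser's own tests to confirm that the frontmatter block is delimited correctly before handing the YAML to a real parser.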
## Dependencies
### Production
- `js-yaml` ^4.1.0 - YAML serialization
### Development
- `typescript` ^5.0.0 - TypeScript compiler
- `jest` ^29.7.0 - Test runner
- `ts-jest` ^29.1.0 - Jest TypeScript transformer
- `@types/jest` ^29.5.0 - Jest type definitions
- `@types/js-yaml` ^4.0.9 - js-yaml type definitions
- `@types/node` ^20.0.0 - Node.js type definitions
## Usage Patterns
### Basic Usage
```typescript
import { FrontmatterGenerator } from '@bobai/frontmatter';
const markdown = FrontmatterGenerator.generateMarkdown(
  { generator: 'fss-parse-pdf', version: '1.0.0', title: 'Doc' },
  { word_count: 1000, page_count: 5 },
  '# Content here'
);
```
### With LLM Enrichment
```typescript
import { FrontmatterGenerator, getEnrichmentPrompt, LLMEnrichment } from '@bobai/frontmatter';
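// `content`, `options`, and `deterministic` come from the calling parser.
const prompt = getEnrichmentPrompt(content, 'pdf');
// getLLMResponse is a caller-supplied helper; this package ships no LLM client.
const enrichment: LLMEnrichment = await getLLMResponse(prompt);
const markdown = FrontmatterGenerator.generateMarkdown(
  options, deterministic, content, enrichment, 'balanced'
);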
```
### Using Parser Profiles
```typescript
import { PARSER_PROFILES } from '@bobai/frontmatter';
const profile = PARSER_PROFILES['fss-parse-audio']; // 'meeting'
```
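
For generator names that are not in the map, a caller may want a fallback. The sketch below rebuilds the mapping locally from the Parser Support Matrix and falls back to the documented default profile `'data'`; the `profileFor` helper and the fallback behavior are assumptions, not part of the package's exports.

```typescript
type ProfileType =
  | 'scraped' | 'research' | 'technical' | 'code' | 'data' | 'changelog'
  | 'legal' | 'test' | 'schema' | 'troubleshoot' | 'meeting' | 'faq' | 'config';

// Local copy of the mapping from the Parser Support Matrix.
const PARSER_PROFILES: Record<string, ProfileType> = {
  'fss-parse-pdf': 'technical',
  'fss-parse-word': 'technical',
  'fss-parse-excel': 'data',
  'fss-parse-image': 'data',
  'fss-parse-audio': 'meeting',
  'fss-parse-video': 'meeting',
  'fss-parse-email': 'data',
  'fss-parse-presentation': 'technical',
  'fss-parse-data': 'data',
  'fss-parse-diagram': 'schema',
};

// Assumed fallback: the documented default profile.
const FALLBACK_PROFILE: ProfileType = 'data';

// Hypothetical helper: own-property check avoids matching inherited keys
// such as 'toString' on the plain object.
function profileFor(generator: string): ProfileType {
  const mapped = Object.prototype.hasOwnProperty.call(PARSER_PROFILES, generator)
    ? PARSER_PROFILES[generator]
    : undefined;
  return mapped ?? FALLBACK_PROFILE;
}

const audioProfile = profileFor('fss-parse-audio');
const unknownProfile = profileFor('fss-parse-unknown');
```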
## Integration Requirements
### For Parsers to Use This Package
1. **Install**: `npm install ../packages/bobai-frontmatter`
2. **Import**: `import { FrontmatterGenerator, ... } from '@bobai/frontmatter';`
3. **Build**: Ensure bobai-frontmatter is built before parser build
### Package.json Dependency
```json
{
  "dependencies": {
    "@bobai/frontmatter": "file:../packages/bobai-frontmatter"
  }
}
```
## Quality Metrics
| Metric | Value |
|--------|-------|
| Total Lines of Code | ~370 (src) |
| Test Coverage | 63 tests |
| TypeScript Strict Mode | Yes |
| Zero Runtime Errors | Yes |
| Build Time | < 1s |
| Test Time | ~1s |
## Validation Checklist
- [x] All types properly exported
- [x] All constants properly exported
- [x] FrontmatterGenerator methods work correctly
- [x] YAML output is valid
- [x] All output modes function correctly
- [x] Balanced fields cover all parser types
- [x] Parser profiles are correct
- [x] LLM prompts generate correct structure
- [x] Tests pass with no warnings
- [x] TypeScript compiles with no errors
- [x] README documentation complete
- [x] Package.json properly configured
## Known Limitations
1. **No LLM client**: Package provides prompts but not LLM integration
2. **No file I/O**: Generates strings only; parsers handle file operations
3. **No validation**: Trusts parser-provided data
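
Because the package trusts parser-provided data, a parser may want to run its own checks before calling the generator. The following is a minimal sketch under stated assumptions: `validateFrontmatter` is a hypothetical helper, the enum values and ranges are taken from the Exports and Default Values sections, and the snake_case keys follow the sample frontmatter.

```typescript
// Allowed values copied from the exported type unions.
const AUDIENCE_VALUES: readonly string[] = [
  'all', 'beginner', 'intermediate', 'expert',
];
const DOC_PURPOSE_VALUES: readonly string[] = [
  'reference', 'tutorial', 'troubleshooting', 'conceptual', 'guide', 'specification',
];

function inRange(value: unknown, min: number, max: number): boolean {
  return typeof value === 'number' && Number.isFinite(value)
    && value >= min && value <= max;
}

// Hypothetical validator: returns a list of problems, empty when the
// fields that are present look valid. Missing fields are not reported.
function validateFrontmatter(fields: Record<string, unknown>): string[] {
  const problems: string[] = [];

  const audience = fields['audience'];
  if (audience !== undefined
      && !(typeof audience === 'string' && AUDIENCE_VALUES.indexOf(audience) !== -1)) {
    problems.push('audience must be one of: ' + AUDIENCE_VALUES.join(', '));
  }

  const purpose = fields['doc_purpose'];
  if (purpose !== undefined
      && !(typeof purpose === 'string' && DOC_PURPOSE_VALUES.indexOf(purpose) !== -1)) {
    problems.push('doc_purpose must be one of: ' + DOC_PURPOSE_VALUES.join(', '));
  }

  const complexity = fields['complexity'];
  if (complexity !== undefined
      && !(inRange(complexity, 1, 5) && Number.isInteger(complexity))) {
    problems.push('complexity must be an integer from 1 to 5');
  }

  const confidence = fields['extraction_confidence'];
  if (confidence !== undefined && !inRange(confidence, 0, 1)) {
    problems.push('extraction_confidence must be between 0.0 and 1.0');
  }

  const quality = fields['content_quality'];
  if (quality !== undefined && !inRange(quality, 0, 2)) {
    problems.push('content_quality must be between 0.0 and 2.0');
  }

  return problems;
}

const okProblems = validateFrontmatter({
  audience: 'intermediate', complexity: 3,
  extraction_confidence: 1, content_quality: 1.5,
});
const badProblems = validateFrontmatter({ audience: 'guru', complexity: 7 });
```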
## Future Enhancements (Not Implemented)
1. LLM client integration (src/llm/ directory)
2. Schema validation for frontmatter
3. Custom field definitions per parser
4. Streaming generation for large documents
## Conclusion
The `@bobai/frontmatter` package is **complete and ready for integration** with all FSS parsers. It provides:
- Consistent BOBAI v1.1 standard frontmatter generation
- Support for all 10 parser types
- Three output modes for different use cases
- LLM enrichment prompt templates
- Comprehensive test coverage
- Full TypeScript type safety
Parsers can immediately begin using this package by installing it as a local dependency and importing the required exports.