# @bobai/frontmatter - Completion Specification

## Package Overview

| Field | Value |
|-------|-------|
| Package Name | `@bobai/frontmatter` |
| Version | 1.1.0 |
| Standard | BOBAI Markdown Standard v1.1 |
| Language | TypeScript |
| Node.js | >= 18.0.0 |
| License | MIT |

## Implementation Status

### Core Features

| Feature | Status | Notes |
|---------|--------|-------|
| FrontmatterGenerator class | Complete | Static methods for generation |
| Output modes (none/balanced/complete) | Complete | All three modes implemented |
| YAML serialization | Complete | Uses js-yaml with proper formatting |
| Type definitions | Complete | Full TypeScript interfaces |
| Constants & defaults | Complete | Comprehensive coverage |
| LLM enrichment prompts | Complete | Prompt templates included |
| Parser profiles | Complete | All 10 parsers mapped |

### Test Coverage

| Test Suite | Tests | Status |
|------------|-------|--------|
| generator.test.ts | 35 | Passing |
| constants.test.ts | 16 | Passing |
| prompts.test.ts | 12 | Passing |
| **Total** | **63** | **All Passing** |

## File Structure

```
bobai-frontmatter/
├── src/
│   ├── index.ts                  # Main exports (27 lines)
│   ├── generator.ts              # FrontmatterGenerator class (123 lines)
│   ├── types.ts                  # TypeScript interfaces (47 lines)
│   ├── constants.ts              # Enums, defaults, balanced fields (130 lines)
│   └── prompts.ts                # LLM enrichment prompts (43 lines)
├── tests/
│   ├── generator.test.ts         # Generator tests (470 lines)
│   ├── constants.test.ts         # Constants tests (140 lines)
│   └── prompts.test.ts           # Prompt tests (80 lines)
├── dist/                         # Compiled JavaScript + type definitions
├── package.json                  # NPM configuration with Jest
├── tsconfig.json                 # TypeScript configuration
├── README.md                     # Comprehensive documentation
├── COMPLETION_SPEC.md            # This document
└── IMPLEMENTATION_BLUEPRINT.md   # Original blueprint
```

## Exports

### Types

```typescript
export type OutputMode = 'none' | 'balanced' | 'complete';
export type AudienceLevel = 'all' | 'beginner' | 'intermediate' | 'expert';
export type DocPurpose = 'reference' | 'tutorial' | 'troubleshooting' | 'conceptual' | 'guide' | 'specification';
export type ProfileType = 'scraped' | 'research' | 'technical' | 'code' | 'data' | 'changelog' | 'legal' | 'test' | 'schema' | 'troubleshoot' | 'meeting' | 'faq' | 'config';

export interface FrontmatterOptions { ... }
export interface DeterministicFields { ... }
export interface LLMEnrichment { ... }
```

### Constants

```typescript
export const AUDIENCE_VALUES: AudienceLevel[];              // 4 values
export const DOC_PURPOSE_VALUES: DocPurpose[];              // 6 values
export const PROFILE_VALUES: ProfileType[];                 // 13 values
export const DEFAULTS: { ... };                             // 5 defaults
export const BALANCED_FIELDS: string[];                     // 70 fields
export const PARSER_PROFILES: Record<string, ProfileType>;  // 10 parsers
```

### Functions

```typescript
export class FrontmatterGenerator {
  static generate(options, deterministic?, enrichment?, mode?): string;
  static generateMarkdown(options, deterministic, content, enrichment?, mode?): string;
}

export function getEnrichmentPrompt(content: string, docType?: string): string;
export function getSamplePromptForDocType(docType: string): string;
```

## Parser Support Matrix

### Supported Parsers and Their Balanced Fields

| Parser | Profile | Key Balanced Fields |
|--------|---------|---------------------|
| fss-parse-pdf | technical | word_count, page_count, has_tables, has_images, has_toc, has_forms, encrypted, author |
| fss-parse-word | technical | word_count, page_count, paragraph_count, has_tracked_changes, has_toc, author |
| fss-parse-excel | data | sheet_count, row_count, column_count, author |
| fss-parse-image | data | width, height, format, channels, has_alpha, ocr_confidence, file_size |
| fss-parse-audio | meeting | duration, bitrate, sample_rate, codec, has_transcript, speaker_count, language |
| fss-parse-video | meeting | duration, width, height, fps, aspect_ratio, video_codec, audio_codec |
| fss-parse-email | data | from, to, cc, sender, recipients, date, message_id, has_attachments, attachment_count, importance |
| fss-parse-presentation | technical | slide_count, total_slides, word_count, chart_count, has_speaker_notes, has_images |
| fss-parse-data | data | record_count, format_detected, file_size, column_count |
| fss-parse-diagram | schema | diagram_count, diagram_type, valid_diagrams, invalid_diagrams, node_count, edge_count |

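The parser-to-profile mapping in the matrix above could be looked up with a fallback for parsers that are not listed. The following is a minimal sketch, not the package's code: `parserProfiles` is transcribed from the table and `resolveProfile` is a hypothetical helper. The real `PARSER_PROFILES` constant lives in `src/constants.ts` and may be structured differently.

```typescript
// Profile union copied from the Types section of this document.
type ProfileType =
  | 'scraped' | 'research' | 'technical' | 'code' | 'data' | 'changelog'
  | 'legal' | 'test' | 'schema' | 'troubleshoot' | 'meeting' | 'faq' | 'config';

// Mapping transcribed from the Parser Support Matrix (illustrative copy).
const parserProfiles = new Map<string, ProfileType>([
  ['fss-parse-pdf', 'technical'],
  ['fss-parse-word', 'technical'],
  ['fss-parse-excel', 'data'],
  ['fss-parse-image', 'data'],
  ['fss-parse-audio', 'meeting'],
  ['fss-parse-video', 'meeting'],
  ['fss-parse-email', 'data'],
  ['fss-parse-presentation', 'technical'],
  ['fss-parse-data', 'data'],
  ['fss-parse-diagram', 'schema'],
]);

// Hypothetical helper: return the mapped profile, or the fallback when the
// parser name is unknown. The fallback 'data' matches the documented default profile.
function resolveProfile(parserName: string, fallback: ProfileType = 'data'): ProfileType {
  return parserProfiles.get(parserName) ?? fallback;
}
```

Using a `Map` rather than a plain object avoids accidental hits on inherited keys such as `constructor` when the parser name comes from untrusted input.
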
## BALANCED_FIELDS Complete List (70 fields)

### Universal Document (10)
- word_count, page_count, character_count, author, subject, creator, created, modified, file_size, format

### Structure Fields (10)
- has_tables, has_images, table_count, image_count, section_count, has_toc, has_forms, has_tracked_changes, paragraph_count, heading_count

### Excel/Data (5)
- sheet_count, row_count, column_count, record_count, format_detected

### Image (7)
- width, height, channels, has_alpha, color_space, ocr_confidence, has_exif

### Audio (8)
- duration, duration_seconds, bitrate, sample_rate, codec, has_transcript, speaker_count, language

### Video (5)
- fps, aspect_ratio, resolution, video_codec, audio_codec

### Presentation (5)
- slide_count, total_slides, chart_count, has_speaker_notes, has_animations

### Email (11)
- from, to, cc, sender, recipients, date, message_id, has_attachments, attachment_count, importance, thread_id

### Diagram (6)
- diagram_count, diagram_type, valid_diagrams, invalid_diagrams, node_count, edge_count

### Analysis (3)
- encrypted, complexity_score, reading_time_minutes

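This document does not spell out how each output mode selects deterministic fields. The sketch below shows one plausible reading, under the assumption that `none` drops them, `balanced` keeps only names on the balanced list, and `complete` keeps everything. `selectFields` and `balancedSubset` are hypothetical names, not package exports, and the set holds only a small excerpt of the list above.

```typescript
type OutputMode = 'none' | 'balanced' | 'complete';
type FieldValue = string | number | boolean;

// Small excerpt of the balanced field names listed above (not the full 70).
const balancedSubset: ReadonlySet<string> = new Set<string>([
  'word_count', 'page_count', 'has_tables', 'author',
]);

// Hypothetical helper. The mode semantics are an assumption, not the
// package's documented behavior.
function selectFields(
  fields: Record<string, FieldValue>,
  mode: OutputMode,
  allowed: ReadonlySet<string> = balancedSubset,
): Record<string, FieldValue> {
  if (mode === 'none') return {};
  if (mode === 'complete') return { ...fields };

  // 'balanced': keep only fields whose names are on the allowed list.
  const selected: Record<string, FieldValue> = {};
  for (const key of Object.keys(fields)) {
    const value = fields[key];
    if (value !== undefined && allowed.has(key)) selected[key] = value;
  }
  return selected;
}
```

Under these assumptions, a parser that emits a custom `render_ms` field alongside `word_count` would see only `word_count` survive in `balanced` mode.
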
## Default Values

| Default | Value | Description |
|---------|-------|-------------|
| profile | 'data' | Default document profile |
| audience | 'all' | Default audience level |
| extractionConfidence | 1.0 | Default confidence (0.0-1.0) |
| contentQuality | 1.5 | Default quality score (0.0-2.0) |
| complexity | 3 | Default complexity (1-5) |

## Output Format

### Frontmatter Structure

```yaml
---
# Core fields (always present)
profile: 'technical'
created: '2024-01-15T10:30:00.000Z'
generator: 'fss-parse-pdf'
version: '1.2.0'
title: 'Document Title'
extraction_confidence: 1
content_quality: 1.5
source_file: '/path/to/file.pdf'

# Deterministic fields (based on mode)
word_count: 5000
page_count: 25
has_tables: true
# ... more based on parser type

# LLM enrichment fields (or placeholders)
summary: 'Description of document...'
tags:
  - tag1
  - tag2
category: 'technical'
audience: 'intermediate'
doc_purpose: 'reference'
complexity: 3
actionable: false
key_technologies:
  - TypeScript
  - Node.js
---
```

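A consumer of the generated Markdown may need to separate the frontmatter block from the document body. The sketch below assumes the output begins with a `---` line and that the block ends at the next `---` line, as in the structure above. `splitFrontmatter` and `SplitDocument` are hypothetical names, not part of the package, and the YAML is returned as raw text rather than parsed.

```typescript
interface SplitDocument {
  frontmatter: string; // raw YAML between the delimiters, without the `---` lines
  body: string;        // everything after the closing delimiter
}

// Hypothetical helper: separates a leading `---` ... `---` block from the body.
// Returns null when the input does not start with a complete frontmatter block.
function splitFrontmatter(markdown: string): SplitDocument | null {
  const lines = markdown.split(/\r?\n/);
  if (lines[0] !== '---') return null;

  // Find the closing delimiter, skipping the opening one at index 0.
  const end = lines.indexOf('---', 1);
  if (end === -1) return null;

  return {
    frontmatter: lines.slice(1, end).join('\n'),
    body: lines.slice(end + 1).join('\n'),
  };
}
```

The raw `frontmatter` string can then be handed to a YAML parser of the caller's choice.
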
## Dependencies

### Production
- `js-yaml` ^4.1.0 - YAML serialization

### Development
- `typescript` ^5.0.0 - TypeScript compiler
- `jest` ^29.7.0 - Test runner
- `ts-jest` ^29.1.0 - Jest TypeScript transformer
- `@types/jest` ^29.5.0 - Jest type definitions
- `@types/js-yaml` ^4.0.9 - js-yaml type definitions
- `@types/node` ^20.0.0 - Node.js type definitions

## Usage Patterns

### Basic Usage

```typescript
import { FrontmatterGenerator } from '@bobai/frontmatter';

const markdown = FrontmatterGenerator.generateMarkdown(
  { generator: 'fss-parse-pdf', version: '1.0.0', title: 'Doc' },
  { word_count: 1000, page_count: 5 },
  '# Content here'
);
```

### With LLM Enrichment

```typescript
import { FrontmatterGenerator, getEnrichmentPrompt, type LLMEnrichment } from '@bobai/frontmatter';

// `content`, `options`, and `deterministic` are supplied by the calling parser.
// `getLLMResponse` is the caller's own LLM client; this package does not provide one.
const prompt = getEnrichmentPrompt(content, 'pdf');
const enrichment: LLMEnrichment = await getLLMResponse(prompt);

const markdown = FrontmatterGenerator.generateMarkdown(
  options, deterministic, content, enrichment, 'balanced'
);
```

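Because the package ships prompts but no LLM client, the caller has to turn the model's reply into enrichment data. The sketch below assumes the reply is a JSON object whose keys match the enrichment keys shown under Output Format. `EnrichmentSketch` is an assumed subset of fields, not the real `LLMEnrichment` interface, and `parseEnrichment` is a hypothetical helper.

```typescript
type AudienceLevel = 'all' | 'beginner' | 'intermediate' | 'expert';
type DocPurpose = 'reference' | 'tutorial' | 'troubleshooting' | 'conceptual' | 'guide' | 'specification';

const AUDIENCES: readonly AudienceLevel[] = ['all', 'beginner', 'intermediate', 'expert'];
const PURPOSES: readonly DocPurpose[] = [
  'reference', 'tutorial', 'troubleshooting', 'conceptual', 'guide', 'specification',
];

// Assumed subset of enrichment fields, named after the YAML keys in Output Format.
interface EnrichmentSketch {
  summary?: string;
  tags?: string[];
  audience?: AudienceLevel;
  doc_purpose?: DocPurpose;
  complexity?: number;
}

// Hypothetical helper: keeps only well-typed, in-range values and silently
// drops everything else, so a malformed reply yields an empty object.
function parseEnrichment(raw: string): EnrichmentSketch {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return {};
  }
  if (typeof parsed !== 'object' || parsed === null || Array.isArray(parsed)) return {};
  const data = parsed as Record<string, unknown>;
  const result: EnrichmentSketch = {};

  const summary = data['summary'];
  if (typeof summary === 'string') result.summary = summary;

  const tags = data['tags'];
  if (Array.isArray(tags)) {
    result.tags = tags.filter((t): t is string => typeof t === 'string');
  }

  const audience = AUDIENCES.find((a) => a === data['audience']);
  if (audience !== undefined) result.audience = audience;

  const purpose = PURPOSES.find((p) => p === data['doc_purpose']);
  if (purpose !== undefined) result.doc_purpose = purpose;

  // Complexity is documented as an integer scale from 1 to 5.
  const complexity = data['complexity'];
  if (typeof complexity === 'number' && Number.isInteger(complexity) && complexity >= 1 && complexity <= 5) {
    result.complexity = complexity;
  }
  return result;
}
```

Dropping invalid values instead of throwing is a design choice for this sketch; a stricter caller might prefer to reject the whole reply and retry the prompt.
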
### Using Parser Profiles

```typescript
import { PARSER_PROFILES } from '@bobai/frontmatter';

const profile = PARSER_PROFILES['fss-parse-audio']; // 'meeting'
```

## Integration Requirements

### For Parsers to Use This Package

1. **Install**: `npm install ../packages/bobai-frontmatter`
2. **Import**: `import { FrontmatterGenerator, ... } from '@bobai/frontmatter';`
3. **Build**: Build `bobai-frontmatter` before building the parser

### Package.json Dependency

```json
{
  "dependencies": {
    "@bobai/frontmatter": "file:../packages/bobai-frontmatter"
  }
}
```

## Quality Metrics

| Metric | Value |
|--------|-------|
| Total Lines of Code | ~500 (src) |
| Test Coverage | 63 tests |
| TypeScript Strict Mode | Yes |
| Zero Runtime Errors | Yes |
| Build Time | < 1s |
| Test Time | ~1s |

## Validation Checklist

- [x] All types properly exported
- [x] All constants properly exported
- [x] FrontmatterGenerator methods work correctly
- [x] YAML output is valid
- [x] All output modes function correctly
- [x] Balanced fields cover all parser types
- [x] Parser profiles are correct
- [x] LLM prompts generate correct structure
- [x] Tests pass with no warnings
- [x] TypeScript compiles with no errors
- [x] README documentation complete
- [x] Package.json properly configured

## Known Limitations

1. **No LLM client**: The package provides prompts but no LLM integration
2. **No file I/O**: Generates strings only; parsers handle file operations
3. **No validation**: Trusts parser-provided data

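Since the package trusts parser-provided data, a parser may want to check numeric scores itself before generating frontmatter. The sketch below uses the ranges given in the Default Values table. `ScoreFields` and `findRangeErrors` are hypothetical names, and the camelCase keys are assumed to match the option names the parser passes in.

```typescript
// Assumed shape: the three scored fields from the Default Values table.
interface ScoreFields {
  extractionConfidence?: number; // documented range 0.0-1.0
  contentQuality?: number;       // documented range 0.0-2.0
  complexity?: number;           // documented range 1-5
}

// Hypothetical caller-side check: returns one message per out-of-range value,
// or an empty array when every provided value is within its documented range.
function findRangeErrors(scores: ScoreFields): string[] {
  const checks: Array<[keyof ScoreFields, number, number]> = [
    ['extractionConfidence', 0, 1],
    ['contentQuality', 0, 2],
    ['complexity', 1, 5],
  ];

  const errors: string[] = [];
  for (const [name, min, max] of checks) {
    const value = scores[name];
    if (value === undefined) continue; // omitted fields fall back to defaults
    if (!Number.isFinite(value) || value < min || value > max) {
      errors.push(`${name} must be between ${min} and ${max}, got ${value}`);
    }
  }
  return errors;
}
```

A parser could log these messages and fall back to the documented defaults rather than writing an out-of-range score into the frontmatter.
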
## Future Enhancements (Not Implemented)

1. LLM client integration (src/llm/ directory)
2. Schema validation for frontmatter
3. Custom field definitions per parser
4. Streaming generation for large documents

## Conclusion

The `@bobai/frontmatter` package is **complete and ready for integration** with all FSS parsers. It provides:

- Consistent BOBAI v1.1 standard frontmatter generation
- Support for all 10 parser types
- Three output modes for different use cases
- LLM enrichment prompt templates
- Comprehensive test coverage
- Full TypeScript type safety

Parsers can immediately begin using this package by installing it as a local dependency and importing the required exports.