Summary
After reviewing the codebase, there's significant code duplication across connectors (CSV, JSON, Notion) that should be extracted to @dashframe/engine.
Duplicated Code (~110 lines per connector)
1. Type Inference (inferType, parseValue)
- CSV: packages/connector-csv/src/index.ts (lines 32-66)
- JSON: packages/connector-json/src/index.ts (lines 50-126)
- Nearly identical logic for detecting number, boolean, date, and string types
- Same parseValue function for converting raw values to typed values
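A minimal sketch of how the shared inferColumnType and parseValueByType helpers (names taken from the proposed exports) might consolidate this logic. It assumes a hypothetical ColumnType union of "number" | "boolean" | "date" | "string", that CSV cells arrive as strings while JSON values arrive as primitives, and that only ISO-8601-style strings count as dates. The existing connectors may use different precedence or date rules, so treat these as illustrative.

```typescript
// Hypothetical union of the four types the connectors currently detect.
export type ColumnType = "number" | "boolean" | "date" | "string";

// Assumed date rule: ISO-8601-style strings, e.g. "2024-01-31" or "2024-01-31T12:00:00Z".
const ISO_DATE =
  /^\d{4}-\d{2}-\d{2}([T ]\d{2}:\d{2}(:\d{2}(\.\d+)?)?(Z|[+-]\d{2}:?\d{2})?)?$/;

// Infer the type of one raw value (a CSV string or a JSON primitive).
export function inferColumnType(value: unknown): ColumnType {
  if (typeof value === "number") return Number.isFinite(value) ? "number" : "string";
  if (typeof value === "boolean") return "boolean";
  if (value instanceof Date) return "date";
  if (typeof value !== "string") return "string"; // anything else: null, objects, arrays, ...
  const text = value.trim();
  if (text === "") return "string";
  if (/^(true|false)$/i.test(text)) return "boolean";
  if (Number.isFinite(Number(text))) return "number";
  if (ISO_DATE.test(text) && !Number.isNaN(Date.parse(text))) return "date";
  return "string";
}

// Convert a raw value to the column's type; blank or unparseable cells become null.
export function parseValueByType(value: unknown, type: ColumnType): unknown {
  if (value === null || value === undefined) return null;
  if (typeof value === "string" && value.trim() === "") return null;
  const raw = typeof value === "string" ? value.trim() : value;
  switch (type) {
    case "number": {
      const n = typeof raw === "number" ? raw : Number(raw);
      return Number.isFinite(n) ? n : null;
    }
    case "boolean": {
      if (typeof raw === "boolean") return raw;
      const s = String(raw).toLowerCase();
      return s === "true" ? true : s === "false" ? false : null;
    }
    case "date": {
      const d = raw instanceof Date ? raw : new Date(String(raw));
      return Number.isNaN(d.getTime()) ? null : d;
    }
    default:
      // String columns keep strings as-is; objects are serialized as JSON.
      return typeof raw === "object" ? JSON.stringify(raw) : String(raw);
  }
}
```

Returning null for an unparseable cell lets one bad value become a null slot instead of failing the whole import. Inferring a whole column would apply inferColumnType to a sample of non-null values and reconcile the results (for example, falling back to "string" on disagreement); that aggregation step is not shown.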
2. Arrow DataFrame Creation
- CSV: packages/connector-csv/src/index.ts (lines 172-215)
- JSON: packages/connector-json/src/index.ts (lines 260-301)
- Same pattern: create Arrow vectors by type, build a Table, convert it to IPC, create a DataFrame
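The Arrow-specific calls depend on the apache-arrow version in use, so the sketch below covers only the step that both connectors share and that needs no dependency: pivoting row objects into one equal-length array per column. The pivotRowsToColumns name and its parse callback are hypothetical. In the shared helper, each resulting array would then be wrapped in an Arrow vector of the matching type (for example with vectorFromArray in recent apache-arrow releases) before building the Table and serializing it to IPC.

```typescript
// Hypothetical column descriptor matching the proposed { name, type } shape.
type ColumnType = "number" | "boolean" | "date" | "string";

interface ColumnSpec {
  name: string;
  type: ColumnType;
}

// Pivot row objects into one equal-length array per column.
export function pivotRowsToColumns(
  rows: Record<string, unknown>[],
  columns: ColumnSpec[],
  // Converts a raw cell to its typed value; the default passes values through.
  parse: (value: unknown, type: ColumnType) => unknown = (value) => value ?? null,
): Record<string, unknown[]> {
  const out: Record<string, unknown[]> = {};
  for (const col of columns) {
    // Pre-size each column so every array has exactly rows.length slots.
    out[col.name] = new Array<unknown>(rows.length);
  }
  rows.forEach((row, i) => {
    for (const col of columns) {
      // A missing key yields undefined, which the default parse maps to null.
      out[col.name][i] = parse(row[col.name], col.type);
    }
  });
  return out;
}
```

Writing a value (or null) for every row matters because all columns in an Arrow Table must have the same length; rows with a missing key would otherwise produce ragged columns. Passing parseValueByType as the parse callback would connect this step to the shared type-inference helpers.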
3. Field Generation
- CSV: packages/connector-csv/src/index.ts (lines 151-170)
- JSON: packages/connector-json/src/index.ts (lines 239-258)
- Notion: packages/connector-notion/src/index.ts (lines 92-142)
- Same _rowIndex computed field pattern, UUID generation, and SourceSchema creation
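A sketch of a shared generateFieldsFromColumns capturing the common pattern: one field per source column plus computed system fields such as _rowIndex, each with a fresh ID. The Field, TableColumn, and UUID shapes here are hypothetical stand-ins for the real @dashframe types. Unlike the proposed signature, this version takes the ID generator as an explicit newId parameter, an assumption made so the function stays deterministic in tests. SourceSchema creation is left out because its shape is not described in this issue.

```typescript
// Hypothetical minimal shapes standing in for the real @dashframe types.
type UUID = string;
type ColumnType = "number" | "boolean" | "date" | "string";

interface TableColumn {
  name: string;
  type: ColumnType;
}

interface Field {
  id: UUID;
  tableId: UUID;
  name: string;
  columnName: string | null; // null for computed fields with no backing source column
  type: ColumnType;
  isComputed: boolean;
}

// The computed row-index field each connector currently adds by hand.
const ROW_INDEX_FIELD: TableColumn = { name: "_rowIndex", type: "number" };

export function generateFieldsFromColumns(
  columns: TableColumn[],
  tableId: UUID,
  newId: () => UUID, // hypothetical parameter: injected ID generator
  systemFields: TableColumn[] = [ROW_INDEX_FIELD],
): Field[] {
  // One field per real column, backed by that column.
  const dataFields = columns.map(
    (col): Field => ({
      id: newId(),
      tableId,
      name: col.name,
      columnName: col.name,
      type: col.type,
      isComputed: false,
    }),
  );

  // Computed system fields, skipping any whose name collides with a real column.
  const taken = new Set(columns.map((col) => col.name));
  const computedFields = systemFields
    .filter((sf) => !taken.has(sf.name))
    .map(
      (sf): Field => ({
        id: newId(),
        tableId,
        name: sf.name,
        columnName: null,
        type: sf.type,
        isComputed: true,
      }),
    );

  return [...dataFields, ...computedFields];
}
```

In production a caller could pass `() => crypto.randomUUID()` as newId. Skipping a system field whose name already exists as a real column avoids a duplicate _rowIndex field when a source file happens to contain that column.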
4. Primary Key Detection
- Both CSV and JSON detect an ID column with /^_?id$/i.test(col.name)
- Both fall back to _rowIndex if no ID column is found
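The shared rule is small enough to sketch in full. This version returns the first column name matching /^_?id$/i and otherwise falls back to "_rowIndex". Preferring the first match when several columns qualify (for example both id and _id) is an assumption, since the issue does not say how the connectors break ties.

```typescript
// Matches "id", "ID", "_id", "_Id", etc., per the regex both connectors use.
const ID_COLUMN = /^_?id$/i;

// Fallback key: the computed _rowIndex field the connectors add.
export const ROW_INDEX_COLUMN = "_rowIndex";

// Return the first ID-like column name, or the row-index fallback.
export function detectPrimaryKeyColumn(columnNames: string[]): string {
  return columnNames.find((name) => ID_COLUMN.test(name)) ?? ROW_INDEX_COLUMN;
}
```

The regex has no `g` flag, so reusing it across calls to `test()` carries no `lastIndex` state between columns.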
Proposed Solution
Add shared utilities to @dashframe/engine:
packages/engine/src/
├── connectors/
│ ├── FileSourceConnector.ts # (existing)
│ ├── type-inference.ts # inferType, parseValue, detectPrimaryKey
│ ├── arrow-helpers.ts # createArrowColumns, createDataFrame
│ ├── field-generation.ts # generateFields, generateSourceSchema
│ └── index.ts
Exports
// type-inference.ts
export function inferColumnType(value: unknown): ColumnType
export function parseValueByType(value: unknown, type: ColumnType): unknown
export function detectPrimaryKeyColumn(columnNames: string[]): string
// arrow-helpers.ts
export function createArrowColumnsFromRows(
rows: Record<string, unknown>[],
columns: { name: string; type: ColumnType }[]
): Record<string, Vector<DataType>>
// field-generation.ts
export function generateFieldsFromColumns(
columns: TableColumn[],
tableId: UUID,
systemFields?: { name: string; type: ColumnType }[]
): Field[]
export function generateSourceSchema(columns: TableColumn[]): SourceSchema
Benefits
- ~110 lines removed from each connector
- Consistency - single source of truth for type inference
- Easier to add connectors - Excel, Parquet, XML connectors get utilities for free
- Better testing - centralized logic can be tested comprehensively in one place
- Simpler dependency graph - connectors already depend on @dashframe/engine, so no new package dependency is needed
Tasks
- Create packages/engine/src/connectors/type-inference.ts
- Create packages/engine/src/connectors/arrow-helpers.ts
- Create packages/engine/src/connectors/field-generation.ts
- Add comprehensive tests for shared utilities
- Refactor @dashframe/connector-csv to use shared utilities
- Refactor @dashframe/connector-json to use shared utilities
- Refactor @dashframe/connector-notion to use shared utilities
- Update exports in @dashframe/engine
Related
- PR #20 "feat: Add JSON file connector for importing JSON data" (introduced the duplication pattern)