refactor: Extract shared connector utilities to @dashframe/engine #22

@youhaowei

Summary

A review of the codebase shows significant code duplication across the CSV, JSON, and Notion connectors. The shared logic should be extracted to @dashframe/engine.

Duplicated Code (~110 lines per connector)

1. Type Inference (inferType, parseValue)

  • CSV: packages/connector-csv/src/index.ts (lines 32-66)
  • JSON: packages/connector-json/src/index.ts (lines 50-126)
  • Nearly identical logic for detecting number, boolean, date, string types
  • Same parseValue function for converting raw values to typed values
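
As a rough illustration of what the shared version could look like, here is a minimal sketch of the two functions under the names proposed in the Exports section. The `ColumnType` union, the regexes, and the null handling are assumptions for illustration, not the logic currently in either connector.

```typescript
// Hypothetical ColumnType union; the real type would come from @dashframe/engine.
type ColumnType = "number" | "boolean" | "date" | "string";

// Assumed patterns: plain decimal numbers and ISO 8601 dates only.
const NUMBER_PATTERN = /^[+-]?(?:\d+\.?\d*|\.\d+)(?:e[+-]?\d+)?$/i;
const ISO_DATE_PATTERN = /^\d{4}-\d{2}-\d{2}(?:T\d{2}:\d{2}(?::\d{2}(?:\.\d+)?)?(?:Z|[+-]\d{2}:\d{2})?)?$/;

// Infer a column type from a single sample value.
export function inferColumnType(value: unknown): ColumnType {
  if (typeof value === "number") return "number";
  if (typeof value === "boolean") return "boolean";
  if (value instanceof Date) return "date";
  if (typeof value !== "string") return "string";

  const text = value.trim();
  if (text === "") return "string";
  if (/^(?:true|false)$/i.test(text)) return "boolean";
  if (NUMBER_PATTERN.test(text)) return "number";
  if (ISO_DATE_PATTERN.test(text) && !Number.isNaN(Date.parse(text))) return "date";
  return "string";
}

// Convert a raw value to the given type; empty or unparseable input becomes null.
export function parseValueByType(value: unknown, type: ColumnType): unknown {
  if (value === null || value === undefined || value === "") return null;
  switch (type) {
    case "number": {
      const n = typeof value === "number" ? value : Number(value);
      return Number.isFinite(n) ? n : null;
    }
    case "boolean": {
      if (typeof value === "boolean") return value;
      return String(value).trim().toLowerCase() === "true";
    }
    case "date": {
      const d = value instanceof Date ? value : new Date(String(value));
      return Number.isNaN(d.getTime()) ? null : d;
    }
    default:
      return String(value);
  }
}
```

Checking booleans before numbers and numbers before dates keeps a value like "2024" classified as a number rather than a date. Whether that ordering matches the existing connectors would need to be confirmed during the refactor.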

2. Arrow DataFrame Creation

  • CSV: packages/connector-csv/src/index.ts (lines 172-215)
  • JSON: packages/connector-json/src/index.ts (lines 260-301)
  • Same pattern: create Arrow vectors by type, build Table, convert to IPC, create DataFrame
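
The first step of that pattern, grouping row values into one array per typed column, can be sketched without Arrow itself. The helper name `pivotRowsToColumns` and the `ColumnSpec` shape are hypothetical, and the Arrow-specific steps (building vectors, the Table, and the IPC buffer) are deliberately left out because they depend on the Arrow version in use.

```typescript
// Hypothetical types for illustration; the real ones would live in @dashframe/engine.
type ColumnType = "number" | "boolean" | "date" | "string";
interface ColumnSpec {
  name: string;
  type: ColumnType;
}

// True when a value already has the runtime shape expected for the column type.
function matchesType(value: unknown, type: ColumnType): boolean {
  switch (type) {
    case "number":
      return typeof value === "number" && Number.isFinite(value);
    case "boolean":
      return typeof value === "boolean";
    case "date":
      return value instanceof Date;
    default:
      return typeof value === "string";
  }
}

// Pivot row objects into one array per column.
// Missing or mistyped values become null so every column has rows.length entries.
export function pivotRowsToColumns(
  rows: Record<string, unknown>[],
  columns: ColumnSpec[]
): Record<string, unknown[]> {
  const result: Record<string, unknown[]> = {};
  for (const { name, type } of columns) {
    result[name] = rows.map((row) => {
      const value = row[name];
      return matchesType(value, type) ? value : null;
    });
  }
  return result;
}
```

Each resulting array could then be handed to whatever Arrow vector builder the connectors use today.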

3. Field Generation

  • CSV: packages/connector-csv/src/index.ts (lines 151-170)
  • JSON: packages/connector-json/src/index.ts (lines 239-258)
  • Notion: packages/connector-notion/src/index.ts (lines 92-142)
  • Same _rowIndex computed field pattern, UUID generation, SourceSchema creation
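
A minimal sketch of the shared field generation, assuming simplified `Field` and `TableColumn` shapes (the real types are not shown in this issue). The `newId` parameter and `placeholderUuid` are hypothetical additions that keep the sketch dependency-free and testable. A real implementation would presumably use the UUID generator the connectors already rely on.

```typescript
// Hypothetical, simplified types; the real definitions live elsewhere in the codebase.
type UUID = string;
type ColumnType = "number" | "boolean" | "date" | "string";
interface TableColumn {
  name: string;
  type: ColumnType;
}
interface Field {
  id: UUID;
  tableId: UUID;
  name: string;
  type: ColumnType;
  columnName?: string; // set for fields backed by a source column
  isComputed: boolean; // true for system fields such as _rowIndex
}

// Placeholder v4-shaped ID built on Math.random; not suitable for production use.
function placeholderUuid(): UUID {
  return "xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx".replace(/[xy]/g, (c) => {
    const r = Math.floor(Math.random() * 16);
    return (c === "x" ? r : (r & 0x3) | 0x8).toString(16);
  });
}

const DEFAULT_SYSTEM_FIELDS: { name: string; type: ColumnType }[] = [
  { name: "_rowIndex", type: "number" },
];

// System fields come first, followed by one field per source column.
export function generateFieldsFromColumns(
  columns: TableColumn[],
  tableId: UUID,
  systemFields: { name: string; type: ColumnType }[] = DEFAULT_SYSTEM_FIELDS,
  newId: () => UUID = placeholderUuid
): Field[] {
  const computed: Field[] = systemFields.map((f) => ({
    id: newId(),
    tableId,
    name: f.name,
    type: f.type,
    isComputed: true,
  }));
  const mapped: Field[] = columns.map((c) => ({
    id: newId(),
    tableId,
    name: c.name,
    type: c.type,
    columnName: c.name,
    isComputed: false,
  }));
  return [...computed, ...mapped];
}
```

Injecting the ID generator makes the output deterministic in tests, which fits the goal of testing the shared utilities in one place.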

4. Primary Key Detection

  • Both CSV and JSON use: /^_?id$/i.test(col.name)
  • Falls back to _rowIndex if no ID column found
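
The detection rule above is small enough to sketch in full. This version uses the regex quoted above and the `detectPrimaryKeyColumn` signature from the Exports section. Returning the first matching column is an assumption about how ties would be resolved.

```typescript
// Matches "id" or "_id", case-insensitively, as the whole column name.
const ID_COLUMN_PATTERN = /^_?id$/i;

// Return the first ID-like column name, or "_rowIndex" when none exists.
export function detectPrimaryKeyColumn(columnNames: string[]): string {
  return columnNames.find((name) => ID_COLUMN_PATTERN.test(name)) ?? "_rowIndex";
}
```

Because the pattern is anchored at both ends, a column such as `user_id` does not qualify and the function falls back to `_rowIndex`.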

Proposed Solution

Add shared utilities to @dashframe/engine:

packages/engine/src/
├── connectors/
│   ├── FileSourceConnector.ts  # (existing)
│   ├── type-inference.ts       # inferColumnType, parseValueByType, detectPrimaryKeyColumn
│   ├── arrow-helpers.ts        # createArrowColumnsFromRows, createDataFrame
│   ├── field-generation.ts     # generateFieldsFromColumns, generateSourceSchema
│   └── index.ts

Exports

// type-inference.ts
export function inferColumnType(value: unknown): ColumnType
export function parseValueByType(value: unknown, type: ColumnType): unknown
export function detectPrimaryKeyColumn(columnNames: string[]): string

// arrow-helpers.ts
export function createArrowColumnsFromRows(
  rows: Record<string, unknown>[],
  columns: { name: string; type: ColumnType }[]
): Record<string, Vector<DataType>>

// field-generation.ts
export function generateFieldsFromColumns(
  columns: TableColumn[],
  tableId: UUID,
  systemFields?: { name: string; type: ColumnType }[]
): Field[]
export function generateSourceSchema(columns: TableColumn[]): SourceSchema

Benefits

  • ~110 lines removed from each connector
  • Consistency - single source of truth for type inference
  • Easier to add connectors - Excel, Parquet, XML connectors get utilities for free
  • Better testing - centralized logic = comprehensive tests in one place
  • Simpler dependency graph - connectors already depend on @dashframe/engine

Tasks

  • Create packages/engine/src/connectors/type-inference.ts
  • Create packages/engine/src/connectors/arrow-helpers.ts
  • Create packages/engine/src/connectors/field-generation.ts
  • Add comprehensive tests for shared utilities
  • Refactor @dashframe/connector-csv to use shared utilities
  • Refactor @dashframe/connector-json to use shared utilities
  • Refactor @dashframe/connector-notion to use shared utilities
  • Update exports in @dashframe/engine
