Conversation

@djstrong (Contributor) commented Sep 30, 2025

  • Introduced `convert-csv` command for converting CSV files to .ensrainbow format.
  • Added support for single- and two-column CSV formats, without headers (see the example below).
  • Implemented error handling for invalid CSV data.
  • Created tests for various CSV scenarios, including special characters and invalid formats.
  • Updated package dependencies to include fast-csv for CSV parsing.
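
For illustration, the two accepted layouts might look like the samples below (rows invented for this example; in the single-column layout the labelhash is computed during conversion, while the two-column layout pairs each labelhash with its label):

```csv
vitalik
ens
```

```csv
0x<64-hex-char labelhash>,vitalik
0x<64-hex-char labelhash>,ens
```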

changeset-bot commented Sep 30, 2025

🦋 Changeset detected

Latest commit: b02b7f1

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 14 packages:

| Name | Type |
| --- | --- |
| ensrainbow | Patch |
| ensindexer | Patch |
| ensadmin | Patch |
| ensapi | Patch |
| @ensnode/datasources | Patch |
| @ensnode/ensrainbow-sdk | Patch |
| @ensnode/ponder-metadata | Patch |
| @ensnode/ensnode-schema | Patch |
| @ensnode/ensnode-react | Patch |
| @ensnode/ponder-subgraph | Patch |
| @ensnode/ensnode-sdk | Patch |
| @ensnode/shared-configs | Patch |
| @docs/ensnode | Patch |
| @docs/ensrainbow | Patch |


vercel bot commented Sep 30, 2025

The latest updates on your projects.

| Project | Deployment | Review | Updated (UTC) |
| --- | --- | --- | --- |
| admin.ensnode.io | Ready | Ready (Preview, Comment) | Dec 17, 2025 3:50pm |
| ensnode.io | Ready | Ready (Preview, Comment) | Dec 17, 2025 3:50pm |
| ensrainbow.io | Ready | Ready (Preview, Comment) | Dec 17, 2025 3:50pm |

- Introduced `convert-csv` command for converting CSV files to .ensrainbow format.
- Added support for single and two-column CSV formats.
- Implemented error handling for invalid CSV data.
- Created tests for various CSV scenarios, including special characters and invalid formats.
- Updated package dependencies to include `csv-simple-parser` for CSV parsing.
- Added new command-line options for CSV conversion: `--silent`, `--disable-dedup`, `--cache-size`, `--use-bloom-filter`, and `--bloom-filter-size`.
- Implemented a deduplication database using ClassicLevel with optional Bloom filter for faster processing (see the first sketch after this list).
- Updated the conversion process to support deduplication and improved memory management.
- Enhanced logging for large file processing and added tests for new deduplication features.
- Added a function to estimate memory usage of Maps for better tracking.
- Reduced default cache size in DeduplicationDB from 10000 to 1000.
- Enhanced backpressure handling during CSV writing to prevent memory overflow (see the second sketch after this list).
- Updated logging to include output backpressure events and improved performance for large files.
- Streamlined the CSV processing to operate in a completely sequential manner.
- Removed unused command-line options for deduplication and Bloom filter from the CLI interface.
- Updated default progress interval from 10000 to 50000 records for improved performance.
- Enhanced logging for file processing and memory management during CSV conversion.
- Cleaned up code for better readability and maintainability.
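
On the deduplication point above, here is a minimal sketch of how a ClassicLevel-backed dedup store with a Bloom filter in front of it could work. This is not the PR's code: `TinyBloom`, `isDuplicate`, and the key layout are invented for illustration, and only `classic-level` itself is a real dependency. The Bloom filter has no false negatives, so a "definitely new" label can skip the disk read entirely.

```typescript
import { createHash } from "node:crypto";
import { ClassicLevel } from "classic-level";

// In-memory Bloom filter: three bit positions derived from one SHA-256.
class TinyBloom {
  private readonly bits: Uint8Array;
  constructor(private readonly size = 8 * 1024 * 1024) {
    this.bits = new Uint8Array(size >> 3); // size is in bits
  }
  private positions(key: string): number[] {
    const digest = createHash("sha256").update(key).digest();
    return [0, 4, 8].map((offset) => digest.readUInt32BE(offset) % this.size);
  }
  add(key: string): void {
    for (const p of this.positions(key)) this.bits[p >> 3] |= 1 << (p & 7);
  }
  mightHave(key: string): boolean {
    return this.positions(key).every((p) => (this.bits[p >> 3] & (1 << (p & 7))) !== 0);
  }
}

const dedupDb = new ClassicLevel<string, string>("./dedup-db");
const bloom = new TinyBloom();

// Returns true if `label` was seen before; records it otherwise.
async function isDuplicate(label: string): Promise<boolean> {
  if (!bloom.mightHave(label)) {
    // No false negatives: the label is definitely new, so record it
    // without a disk read.
    bloom.add(label);
    await dedupDb.put(label, "");
    return false;
  }
  // Possible false positive: confirm against the on-disk store.
  const existing = await dedupDb.get(label).catch(() => undefined);
  if (existing !== undefined) return true;
  bloom.add(label); // harmless if already set; keeps the filter consistent
  await dedupDb.put(label, "");
  return false;
}
```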
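
And on the backpressure point, a minimal sketch of the standard Node.js pattern, assuming the converter writes rows to a plain `WritableStream` (the helper name is invented): when `write()` returns false the stream's internal buffer is full, so the writer waits for `'drain'` instead of letting memory grow.

```typescript
// Write one chunk, pausing until the stream drains if its buffer is full.
async function writeWithBackpressure(
  out: NodeJS.WritableStream,
  chunk: string | Buffer,
): Promise<void> {
  if (!out.write(chunk)) {
    // write() returning false signals backpressure; wait for 'drain'
    // before producing more output so memory stays bounded.
    await new Promise<void>((resolve) => out.once("drain", resolve));
  }
}
```
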
@lightwalker-eth (Member) left a comment

@djstrong Great work here 👍 Reviewed and shared some suggestions. Appreciate your advice 👍

"ensrainbow": patch
---

feat: add CSV conversion command to ensrainbow CLI
Member:

Suggested change:

```diff
- feat: add CSV conversion command to ensrainbow CLI
+ feat: add CSV conversion command to ensrainbow CLI to convert rainbow tables from CSV format to ensrainbow format
```

```bash title="Convert legacy SQL data"
pnpm run convert --input-file path/to/ens_names.sql.gz --output-file subgraph-0.ensrainbow
```
- **SQL Conversion**: Convert legacy ENS Subgraph data (`ens_names.sql.gz`) using `pnpm run convert`
Member:

Would it be a problem to flip these? In other words, the default conversion case moving forward will be converting from CSV files. Converting from SQL files is what should be the special case.

Therefore it seems to me that `convert` should convert a CSV and `convert-sql` should convert a legacy SQL file such as the legacy ENS Subgraph data.

What do you think?
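
Under that proposal the two invocations might look like this (command names are hypothetical, taken from the suggestion above; the flags mirror the existing `convert` docs):

```bash
# Default path going forward: CSV in, .ensrainbow out
pnpm run convert --input-file path/to/names.csv --output-file names.ensrainbow

# Legacy path: ENS Subgraph SQL dump
pnpm run convert-sql --input-file path/to/ens_names.sql.gz --output-file subgraph-0.ensrainbow
```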

```bash title="Convert legacy SQL data"
pnpm run convert --input-file path/to/ens_names.sql.gz --output-file subgraph-0.ensrainbow
```
- **SQL Conversion**: Convert legacy ENS Subgraph data (`ens_names.sql.gz`) using `pnpm run convert`
Member:

For converting legacy ENS Subgraph data, suggest including a link to https://github.com/graphprotocol/ens-rainbow

@@ -0,0 +1,675 @@
---
title: Creating ENSRainbow Files
description: Complete guide to creating .ensrainbow files from SQL dumps and CSV data.
Member:

Suggested change:

```diff
- description: Complete guide to creating .ensrainbow files from SQL dumps and CSV data.
+ description: Complete guide to creating .ensrainbow files.
```

sidebar:
label: Creating Files
order: 3
keywords: [ensrainbow, file creation, conversion, sql, csv]
Member:

Suggested change:

```diff
- keywords: [ensrainbow, file creation, conversion, sql, csv]
+ keywords: [ensrainbow, file creation, conversion, csv]
```

Goal: The .sql file conversion case is a really niche use case. Happy for us to include docs about it; however, we should limit our promotion of the .sql conversion case. It exists only for legacy purposes. In general we should only talk about the .csv conversion case.

}
const maybeLabelHash = providedHash.startsWith("0x") ? providedHash : `0x${providedHash}`;
try {
const labelHash = labelHashToBytes(maybeLabelHash as LabelHash);
Member:

Hmm, where did we validate that the provided labelhash is the correct labelhash for the label? Or was there a performance issue if we did that?

At the very minimum I would want to see us add a check here that the bytes we are given are the expected length of a valid labelhash. Ex: "0x1234" is not enough bytes and should be invalid. Or is that already handled inside labelHashToBytes?
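
For reference, a minimal sketch of the fuller validation being asked for here, assuming viem's `labelhash` helper; `validateProvidedLabelHash` is an invented name, and recomputing keccak256 per row is the performance cost the question alludes to:

```typescript
import { labelhash } from "viem";

// Invented helper: length-check the provided hash, then verify it actually
// matches the label by recomputing the labelhash (one keccak256 per row).
function validateProvidedLabelHash(providedHash: string, label: string): `0x${string}` {
  const maybeLabelHash = providedHash.startsWith("0x") ? providedHash : `0x${providedHash}`;

  // "0x" + 64 hex chars = 66 characters for a 32-byte labelhash,
  // so e.g. "0x1234" fails here.
  if (!/^0x[0-9a-fA-F]{64}$/.test(maybeLabelHash)) {
    throw new Error(`Invalid labelhash: expected 32 bytes of hex, got "${providedHash}"`);
  }

  // Correctness check: the hash must match the label it claims to hash.
  const computed = labelhash(label);
  if (computed.toLowerCase() !== maybeLabelHash.toLowerCase()) {
    throw new Error(`Labelhash mismatch for label "${label}"`);
  }

  return maybeLabelHash as `0x${string}`;
}
```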

label: label,
};
} else {
// Two columns: validate and use provided hash
Member:

I see here it says "validate" but I don't see full validation?

outputStream: NodeJS.WritableStream,
lineNumber: number,
existingDb: ENSRainbowDB | null,
dedupDb: DeduplicationDB,
Member:

What if we removed DeduplicationDb and instead used existingDb for this purpose?

In other words, what if we changed the convert command so that it was always a combined "convert + ingest"?

Concretely:

  • If we ingest from a .ensrainbow file then it doesn't need to output a new .ensrainbow file
  • If we ingest from a .csv file or a .sql file then it might initialize / update the existingDb while also producing as output the incremental .ensrainbow file.

If we did this then it seems we could ingest as we convert and then remove the need for this dedupDb?

Appreciate your advice if this is a good idea or if it would add a bunch of scope. Thanks

Contributor Author:

The drawback of the proposed approach is that the existing database would change during the process, or could even be left in an unhealthy state if an error occurs mid-process.
In DeduplicationDb we save only labels, so I expect the change would not increase memory consumption.
This is probably a good change for the user experience, but it may take some time to implement and test.

Contributor Author:

The decision is to not combine the commands in this PR.

*/
export async function convertCsvCommand(options: ConvertCsvCommandOptions): Promise<void> {
// Validate that existingDbPath is provided when labelSetVersion > 0
if (options.labelSetVersion > 0 && !options.existingDbPath) {
Member:

Please see my other feedback suggesting that we remove the labelSetVersion option from this command as it can be determined dynamically through the existingDbPath

import { ENSRainbowDB } from "@/lib/database";

import { convertCsvCommand } from "./convert-csv-command";

Member:

Super work on these tests 🚀
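
As a sense of their shape, one such scenario test might look like this (vitest and the option names are assumed; only `convertCsvCommand` comes from the diff above):

```typescript
import { describe, expect, it } from "vitest";

import { convertCsvCommand } from "./convert-csv-command";

describe("convert-csv", () => {
  it("rejects a two-column row with a malformed labelhash", async () => {
    // fixtures/short-hash.csv would contain a row like: 0x1234,somelabel
    await expect(
      convertCsvCommand({
        inputFile: "fixtures/short-hash.csv", // assumed option name
        outputFile: "out/short-hash.ensrainbow", // assumed option name
        labelSetVersion: 0,
      } as any),
    ).rejects.toThrow();
  });
});
```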
