Add cecil-data-analysis-xql skill #4

Open
jayendra13 wants to merge 1 commit into cecilearth:main from jayendra13:add-cecil-data-analysis-xql-skill

Conversation

Contributor

jayendra13 commented on May 11, 2026

Summary

Adds a new skill, skills/cecil-data-analysis-xql/, that answers earth-observation analysis questions against Cecil datasets using xarray-sql (xql) as the SQL layer.

The skill owns the full loop:

  1. Pick the right dataset from the Cecil catalog (prefers existing subscriptions; gates new ones on explicit confirmation).
  2. Load it via client.load_xarray(subscription_id).
  3. Register the dataset and every variable's reference_table on an xarray_sql.XarrayContext.
  4. Run the query and present SQL → result table → plain-English interpretation as one compact block.
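
The gate in step 1 can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the skill's actual code: `pick_subscription`, the `{"id", "dataset_id"}` dictionary shape, and the `confirm` callback are all hypothetical, and the placeholder return value stands in for whatever SDK call would create a subscription.

```python
from typing import Callable, Optional


def pick_subscription(
    dataset_id: str,
    subscriptions: list[dict],
    confirm: Callable[[str], bool],
) -> Optional[str]:
    """Return a subscription id to load, or None if the user declines.

    Hypothetical shape: each subscription is {"id": ..., "dataset_id": ...}.
    """
    # Prefer a subscription that already exists for this dataset.
    for sub in subscriptions:
        if sub["dataset_id"] == dataset_id:
            return sub["id"]
    # Nothing to reuse: a new subscription may cost money, so ask first.
    if not confirm(f"No subscription for {dataset_id}. Create one?"):
        return None
    # Placeholder only; a real run would create the subscription via the SDK.
    return f"new:{dataset_id}"
```

Passing the confirmation step in as a callback keeps the decision with the user and makes the gate easy to test without touching a live account.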

Why xql: SQL beats pandas chains for joining categorical reference tables, so no integer codes leak into answers. Window functions such as ROW_NUMBER() OVER and LAG() OVER turn dominant-class and pixel-level change-detection queries into one-liners, and DataFusion streams over the dask-backed xarray Dataset that the SDK already returns.
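
To make the window-plus-reference-table pattern concrete, here is a small sketch of a dominant-class query. It deliberately swaps in Python's built-in `sqlite3` for xarray-sql and DataFusion so that it runs with no extra dependencies (window functions need SQLite 3.25 or newer). The table names, column names, and six toy pixels are hypothetical, and SQL dialect details may differ under DataFusion.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE land_cover (year INTEGER, x INTEGER, y INTEGER, class_code INTEGER);
CREATE TABLE reference_table (class_code INTEGER PRIMARY KEY, label TEXT);
INSERT INTO reference_table VALUES (1, 'Trees'), (2, 'Crops');
-- Toy pixels: three per year.
INSERT INTO land_cover VALUES
  (2020, 0, 0, 1), (2020, 0, 1, 1), (2020, 1, 0, 2),
  (2023, 0, 0, 2), (2023, 0, 1, 2), (2023, 1, 0, 1);
""")

DOMINANT_CLASS_SQL = """
WITH counts AS (
  SELECT year, class_code, COUNT(*) AS px
  FROM land_cover
  GROUP BY year, class_code
),
ranked AS (
  -- Rank classes within each year by pixel count, largest first.
  SELECT year, class_code, px,
         ROW_NUMBER() OVER (PARTITION BY year ORDER BY px DESC) AS rn
  FROM counts
)
-- Join the reference table so the answer shows labels, not integer codes.
SELECT r.year, ref.label AS dominant_class, r.px
FROM ranked AS r
JOIN reference_table AS ref ON ref.class_code = r.class_code
WHERE r.rn = 1
ORDER BY r.year
"""

rows = con.execute(DOMINANT_CLASS_SQL).fetchall()
```

Each row in `rows` is a `(year, dominant_class, px)` tuple, the same three columns as the end-to-end output table in the test plan.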

What's in the skill

  • SKILL.md: instructions, golden-rule output format, worked example
  • references/: Cecil dataset catalog pointers, condensed SDK reference, xql patterns + gotchas, and the canonical demo script
  • scripts/: list_subscriptions.py, inspect_dataset.py, run_analysis.py (load → register → run SQL → save result.csv + result.md + result.png)

Cross-references

Per CONTRIBUTING:

Skill checklist

  • Directory name (cecil-data-analysis-xql) is kebab-case and matches the name: in frontmatter.
  • Frontmatter contains name, description, license: MIT.
  • Body covers prerequisites, steps, constraints, references.
  • Cross-references to sibling skills use relative paths.

Test plan

  • Smoke test: list_subscriptions.py — returns 2 subscriptions on Land Cover 9-Class, output matches the documented format.
  • Smoke test: inspect_dataset.py against the Land Cover 9-Class dataset — returns 9-class reference table cleanly.
  • End-to-end: dominant-land-cover-class worked example through run_analysis.py — runs the windowed SQL, joins the reference table, writes result.csv / result.md / result.png, prints the Markdown summary.

End-to-end output:

|   year | dominant_class   |      px |
|-------:|:-----------------|--------:|
|   2020 | Trees            | 9167795 |
|   2023 | Crops            | 8876719 |

The end-to-end test caught one missing transitive dependency (tabulate, needed by pandas.DataFrame.to_markdown()); fix included in the second commit on this branch.
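
A missing optional dependency like this only surfaces when the call that needs it finally runs. One way to fail earlier is a preflight check, sketched below with the standard library only. `missing_modules` and `REQUIRED` are hypothetical names, not part of the skill's scripts.

```python
from importlib.util import find_spec


def missing_modules(names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if find_spec(name) is None]


# pandas.DataFrame.to_markdown() needs tabulate at call time, so list it
# explicitly rather than relying on pandas to pull it in.
REQUIRED = ["pandas", "tabulate"]
```

A script could call `missing_modules(REQUIRED)` before loading any data and exit with a clear message if the list is non-empty.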

jayendra13 force-pushed the add-cecil-data-analysis-xql-skill branch from f4fcb2f to a232e5d on May 11, 2026 14:28
A Claude skill for answering earth-observation analysis questions against
Cecil datasets, with the SQL layer provided by xarray-sql (xql). Picks a
dataset, loads it via the Cecil SDK, registers the dataset and every
variable's reference_table on an XarrayContext, runs the query, and
presents query + result table + interpretation as a single block.

Contents:
- SKILL.md: instructions, golden-rule output format, worked example
- references/datasets.md: per-category selection guidance + gotchas
  (the live catalog comes from client.list_datasets())
- references/sdk.md: SDK gotchas that aren't in docstrings (the SDK
  has none — signatures come from inspect.signature)
- references/xarray_sql.md: XarrayContext patterns, quoting rules,
  the cftime UDF caveat
- scripts/list_subscriptions.py, inspect_dataset.py, run_analysis.py
  (load → register → run SQL → save result.csv + result.md + result.png)
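
Recovering signatures from an undocumented object with `inspect.signature`, as mentioned for references/sdk.md, might look like the sketch below. `public_signatures` is a hypothetical helper and `FakeClient` is a stand-in class; only the method names `load_xarray` and `list_datasets` come from this PR, and their parameters here are guesses.

```python
import inspect


def public_signatures(obj):
    """Map each public callable attribute of obj to its signature string."""
    found = {}
    for name in dir(obj):
        if name.startswith("_"):
            continue
        attr = getattr(obj, name)
        if not callable(attr):
            continue
        try:
            found[name] = str(inspect.signature(attr))
        except (TypeError, ValueError):
            # Some builtins and C extensions expose no signature.
            found[name] = "(signature unavailable)"
    return found


class FakeClient:
    """Stand-in for an SDK client without docstrings (hypothetical)."""

    def load_xarray(self, subscription_id):
        ...

    def list_datasets(self):
        ...


signatures = public_signatures(FakeClient())
```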
jayendra13 force-pushed the add-cecil-data-analysis-xql-skill branch from a232e5d to 4eb928b on May 11, 2026 14:29
alexlogs added a commit that referenced this pull request May 12, 2026
Restructure of #4 (jayendra13/add-cecil-data-analysis-xql-skill).
The original PR is a single skill containing 4 runnable Python scripts and
3 reference documents (~1,200 LOC). That shape is closer to a tutorial than
a skill — skills are short, focused text loaded into an agent's context at
inference time, not multi-file CLI projects.

Splitting into two artifacts:

- skills/cecil-data-analysis-xql/SKILL.md (~135 lines)
  Just what an agent needs in-context: the subscription gate (wallet
  safety), the golden-rule output format, the SQL idioms, the PascalCase
  quoting rule, vector-vs-raster routing, and failure modes. Links out to
  the tutorial for everything else.

- tutorials/cecil-data-analysis-xql/
  ├── README.md          full Step 0–5 walkthrough + worked example
  ├── references/        datasets.md, sdk.md, xarray_sql.md (Jayendra's)
  └── scripts/           _env.py, list_subscriptions.py, inspect_dataset.py,
                         run_analysis.py (Jayendra's, with three small fixes)
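
The PascalCase quoting rule presumably exists because DataFusion, like PostgreSQL, folds unquoted identifiers to lowercase, so a mixed-case column name only resolves when double-quoted. A minimal sketch of a quoting helper follows; `quote_ident` and `select_columns` are hypothetical names, and the column names in the usage note are illustrative.

```python
def quote_ident(name: str) -> str:
    """Wrap an identifier in double quotes, doubling any embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def select_columns(table: str, columns: list[str]) -> str:
    """Build a SELECT with every identifier quoted, so case is preserved."""
    cols = ", ".join(quote_ident(c) for c in columns)
    return f"SELECT {cols} FROM {quote_ident(table)}"
```

For example, `select_columns("land_cover", ["LandCover", "year"])` yields `SELECT "LandCover", "year" FROM "land_cover"`.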

Three review fixes applied:

1. run_analysis.py: added --vector flag using load_dataframe +
   XarrayContext.from_pandas. Previously the runner unconditionally called
   load_xarray, which would fail on IBAT vector datasets the description
   says are in scope ("threatened species ranges intersect this AOI").

2. run_analysis.py: replaced fig.autofmt_xdate() in the bar-chart branch
   with rotation of categorical xticks. autofmt_xdate is a date-axis
   helper and was being applied to a non-date axis.

3. README.md: pinned xarray-sql to a sub-0.1 range (pre-1.0 package; the
   XarrayContext().from_dataset() API is the kind that drifts in 0.x).
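
The routing in fix 1 can be sketched as follows. This is an illustration, not the code in run_analysis.py: `build_parser` and `load_data` are hypothetical names, and the sketch assumes that `load_dataframe` is a method on the same client object as `load_xarray`.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        description="Run SQL against a Cecil subscription"
    )
    parser.add_argument("subscription_id")
    parser.add_argument(
        "--vector",
        action="store_true",
        help="load as a DataFrame instead of an xarray Dataset",
    )
    return parser


def load_data(client, subscription_id: str, vector: bool):
    """Route to the loader that matches the dataset kind."""
    if vector:
        # Vector datasets are tabular; assumed to be a client method.
        return client.load_dataframe(subscription_id)
    return client.load_xarray(subscription_id)
```

Keeping the raster path as the default preserves the previous behaviour for existing invocations.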

Also dropped the bash-specific `set -a && source .env && set +a` syntax
from the prerequisites, since fish users and zsh users without POSIX emulation would trip over it.
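
A shell-independent alternative is to read the .env file from Python. The sketch below is an assumption about how that could work, not the contents of the tutorial's _env.py, which this PR does not show. `load_env` is a hypothetical name, and the parser handles only simple KEY=VALUE lines.

```python
import os
from pathlib import Path


def load_env(path=".env", environ=os.environ):
    """Parse KEY=VALUE lines from a .env file and return them as a dict.

    Parsed values are copied into environ only where the key is not
    already set, so variables exported by the shell take precedence.
    """
    parsed = {}
    env_file = Path(path)
    if not env_file.is_file():
        return parsed
    for raw in env_file.read_text().splitlines():
        line = raw.strip()
        # Skip blanks, comments, and anything that is not an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        if key.startswith("export "):
            key = key[len("export "):].strip()
        value = value.strip().strip("'\"")
        environ.setdefault(key, value)
        parsed[key] = value
    return parsed
```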

If this merges, #4 should close as superseded.

Co-Authored-By: jayendra13 <651057+jayendra13@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>