Add cecil-data-analysis-xql skill #4
Open
jayendra13 wants to merge 1 commit into
f4fcb2f to a232e5d
A Claude skill for answering earth-observation analysis questions against Cecil datasets, with the SQL layer provided by xarray-sql (xql). Picks a dataset, loads it via the Cecil SDK, registers the dataset and every variable's reference_table on an XarrayContext, runs the query, and presents query + result table + interpretation as a single block.

Contents:
- SKILL.md: instructions, golden-rule output format, worked example
- references/datasets.md: per-category selection guidance + gotchas (the live catalog comes from client.list_datasets())
- references/sdk.md: SDK gotchas that aren't in docstrings (the SDK has none; signatures come from inspect.signature)
- references/xarray_sql.md: XarrayContext patterns, quoting rules, the cftime UDF caveat
- scripts/list_subscriptions.py, inspect_dataset.py, run_analysis.py (load → register → run SQL → save result.csv + result.md + result.png)
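To make the "register, then query" step concrete, here is a minimal sketch of the kind of SQL this flow runs: a join against a reference table so class names replace integer codes, a `ROW_NUMBER` window for the dominant class per year, and a `LAG` window for pixel-level change. It uses Python's built-in `sqlite3` as a stand-in engine, not xarray-sql, and the table names, column names, and sample rows are all hypothetical.

```python
import sqlite3

# Stand-in engine: sqlite3, NOT xarray-sql. Tables and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE land_cover (year INTEGER, x INTEGER, y INTEGER, class_code INTEGER);
CREATE TABLE land_cover_ref (class_code INTEGER PRIMARY KEY, class_name TEXT);
INSERT INTO land_cover_ref VALUES (1, 'Water'), (2, 'Trees'), (3, 'Crops');
INSERT INTO land_cover VALUES
  (2020, 0, 0, 2), (2020, 0, 1, 2), (2020, 1, 0, 3), (2020, 1, 1, 1),
  (2021, 0, 0, 2), (2021, 0, 1, 3), (2021, 1, 0, 3), (2021, 1, 1, 1);
""")

# Dominant class per year: count pixels, rank within each year, keep rank 1,
# then join the reference table so the answer carries names, not codes.
DOMINANT_CLASS = """
WITH counts AS (
  SELECT year, class_code, COUNT(*) AS pixels
  FROM land_cover
  GROUP BY year, class_code
),
ranked AS (
  SELECT year, class_code, pixels,
         ROW_NUMBER() OVER (
           PARTITION BY year ORDER BY pixels DESC, class_code
         ) AS rn
  FROM counts
)
SELECT r.year, ref.class_name, r.pixels
FROM ranked AS r
JOIN land_cover_ref AS ref ON ref.class_code = r.class_code
WHERE r.rn = 1
ORDER BY r.year
"""

# Pixel-level change: compare each pixel's class with its previous year.
CHANGED_PIXELS = """
SELECT x, y, year, prev_code, class_code
FROM (
  SELECT x, y, year, class_code,
         LAG(class_code) OVER (PARTITION BY x, y ORDER BY year) AS prev_code
  FROM land_cover
) AS t
WHERE prev_code IS NOT NULL AND prev_code <> class_code
ORDER BY x, y, year
"""

dominant = conn.execute(DOMINANT_CLASS).fetchall()
changed = conn.execute(CHANGED_PIXELS).fetchall()
print(dominant)
print(changed)
```

With the sample rows above, the dominant class is Trees in 2020 and Crops in 2021, and one pixel changes class between the two years. The same queries would need adapting to the identifier quoting rules and table names that xarray-sql actually exposes.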
a232e5d to 4eb928b
alexlogs added a commit that referenced this pull request on May 12, 2026
Structural restructure of #4 (jayendra13/add-cecil-data-analysis-xql-skill).

The original PR is a single skill containing 4 runnable Python scripts and 3 reference documents (~1,200 LOC). That shape is closer to a tutorial than a skill: skills are short, focused text loaded into an agent's context at inference time, not multi-file CLI projects.

Splitting into two artifacts:

- skills/cecil-data-analysis-xql/SKILL.md (~135 lines)
  Just what an agent needs in-context: the subscription gate (wallet safety), the golden-rule output format, the SQL idioms, the PascalCase quoting rule, vector-vs-raster routing, and failure modes. Links out to the tutorial for everything else.
- tutorials/cecil-data-analysis-xql/
  ├── README.md     full Step 0–5 walkthrough + worked example
  ├── references/   datasets.md, sdk.md, xarray_sql.md (Jayendra's)
  └── scripts/      _env.py, list_subscriptions.py, inspect_dataset.py, run_analysis.py (Jayendra's, with three small fixes)

Three review fixes applied to the script:

1. run_analysis.py: added --vector flag using load_dataframe + XarrayContext.from_pandas. Previously the runner unconditionally called load_xarray, which would fail on IBAT vector datasets the description says are in scope ("threatened species ranges intersect this AOI").
2. run_analysis.py: replaced fig.autofmt_xdate() in the bar-chart branch with rotation of categorical xticks. autofmt_xdate is a date-axis helper applied to a non-date axis.
3. README.md: pinned xarray-sql to a sub-0.1 range (pre-1.0 package; the XarrayContext().from_dataset() API is the kind that drifts in 0.x).

Also dropped the bash-specific `set -a && source .env && set +a` syntax from the prerequisites; fish/zsh-without-posix users would hit it.

If this merges, #4 should close as superseded.

Co-Authored-By: jayendra13 <651057+jayendra13@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
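The first review fix (routing vector datasets away from `load_xarray`) might be sketched as below. This is not the code in `run_analysis.py`: `build_parser`, `choose_loader`, and `plan` are hypothetical helpers, and the sketch only returns the name of the loader it would pick rather than calling the Cecil SDK.

```python
import argparse


def build_parser():
    """Hypothetical CLI for a runner that handles raster and vector datasets."""
    parser = argparse.ArgumentParser(description="Runner sketch (illustrative only)")
    parser.add_argument("subscription_id", help="subscription to load")
    parser.add_argument(
        "--vector",
        action="store_true",
        help="treat the subscription as a vector dataset",
    )
    return parser


def choose_loader(vector):
    """Return the name of the SDK loader the runner would call.

    The names come from the commit message; nothing is called here.
    """
    return "load_dataframe" if vector else "load_xarray"


def plan(argv):
    """Parse arguments and describe which loading path would be taken."""
    args = build_parser().parse_args(argv)
    return {
        "subscription_id": args.subscription_id,
        "loader": choose_loader(args.vector),
    }


if __name__ == "__main__":
    # "sub-123" is a placeholder, not a real subscription id.
    print(plan(["sub-123"]))
    print(plan(["sub-123", "--vector"]))
```

Making the flag opt-in keeps the raster path as the default, so existing invocations keep their behavior.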
Summary

Adds a new skill, `skills/cecil-data-analysis-xql/`, that answers earth-observation analysis questions against Cecil datasets using xarray-sql (xql) as the SQL layer.

The skill owns the full loop:

- Pick a dataset.
- Load it with `client.load_xarray(subscription_id)`.
- Register the dataset and every variable's `reference_table` on an `xarray_sql.XarrayContext`.
- Run the query and present query + result table + interpretation as a single block.

Why xql: SQL beats pandas chains for joining categorical reference tables (no integer codes leaking into answers), windows like `ROW_NUMBER OVER` / `LAG OVER` make dominant-class and pixel-level change-detection queries one-liners, and DataFusion streams over the dask-backed xarray Dataset the SDK already returns.

What's in the skill

- `SKILL.md`: instructions, golden-rule output format, worked example
- `references/`: Cecil dataset catalog pointers, condensed SDK reference, xql patterns + gotchas, and the canonical demo script
- `scripts/`: `list_subscriptions.py`, `inspect_dataset.py`, `run_analysis.py` (load → register → run SQL → save `result.csv` + `result.md` + `result.png`)

Cross-references

Per CONTRIBUTING:

- `subscribe-and-load`: sets up the subscription `load_xarray` expects.
- `land-cover-baseline-and-change`: same outputs without a SQL layer.

Skill checklist

- Name (`cecil-data-analysis-xql`) is kebab-case and matches the `name:` in frontmatter.
- Frontmatter: `name`, `description`, `license: MIT`.

Test plan

- `list_subscriptions.py`: returns 2 subscriptions on Land Cover 9-Class, output matches the documented format.
- `inspect_dataset.py` against the Land Cover 9-Class dataset: returns 9-class reference table cleanly.
- `run_analysis.py`: runs the windowed SQL, joins the reference table, writes `result.csv` / `result.md` / `result.png`, prints the Markdown summary.

End-to-end output:

The end-to-end test caught one missing transitive dependency (`tabulate`, needed by `pandas.DataFrame.to_markdown()`); fix included in the second commit on this branch.
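
A missing optional dependency like this can also be surfaced before the analysis starts. The sketch below is one possible preflight check, not something the PR is known to include; `missing_modules` is a hypothetical helper, and it only handles top-level module names.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # tabulate is the optional dependency that DataFrame.to_markdown() relies on.
    missing = missing_modules(["tabulate"])
    if missing:
        print("Install before running: " + ", ".join(missing))
    else:
        print("All optional dependencies found.")
```

Checking with `find_spec` avoids importing the package just to learn whether it is installed, so the check stays cheap.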