
feat(financial): Extract structured financial statements from EDINET XBRL CSV #4

Merged
beatinaniwa merged 6 commits into main from feat/financial-statement-extraction on Apr 1, 2026

@beatinaniwa beatinaniwa commented Apr 1, 2026

Owner

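Overview

Adds a feature that extracts structured financial statements from EDINET XBRL CSV so that AI agents (Claude Code and similar) can perform financial analysis. By design, the CLI is limited to data extraction; metric calculation, comparison, and screening are the AI agent's responsibility.

Changes

New commands

  • doc financial <docID>: extracts structured financial statements (BS/PL/CF) plus a summary from a single annual securities report
  • company financials <code> --periods N: fetches a company's financial data for multiple periods in one call
  • schema financial-elements: outputs the 157 known XBRL element mappings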

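New package: internal/financial/

  • context.go: parser for EDINET CSV context IDs (annual, quarterly, and semi-annual)
  • classifier.go: classifier from XBRL elements to financial statements (100+ IFRS/JP-GAAP elements, SummaryKey mapping)
  • parser.go: parser from CSV to structured financial statements (automatic consolidated/non-consolidated selection, duplicate resolution, summary construction)
  • types.go: type definitions for Summary, FinancialData, LineItem, and others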

Summary design

About 25 items covering the inputs needed to calculate all Buffett Code metrics:
revenue, operating_income, net_income, total_assets, equity, operating_cf, eps, interest_bearing_debt, and others

Other

  • internal/service/financial.go: FinancialService (caching, corruption recovery, statement filter)
  • cmd/helpers.go: adds shared helpers such as validateStatement and ptrBool
  • internal/schema/schema.go: registers the new commands in ListCommands()

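Codex local review results

  • 4 rounds (fixed: 8, skipped: 0)
    • R1: period retrieval order (fixed to newest first); prevented IFRS non-consolidated rows from leaking in
    • R2: support for quarterly context IDs; Consolidated is recomputed after the statement filter
    • R3: summary now follows the statement filter; semantic errors are excluded from the cache-corruption retry
    • R4: fixed consolidation detection for NonConsolidatedMember plus "その他" (other) rows; stabilized quarterly sorting in periodOrder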

🤖 Generated with Claude Code

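…inancials command added

A data-extraction foundation that lets AI agents (Claude Code and similar) perform Buffett Code-style financial analysis.
The summary field gives immediate access to key financial items (revenue, operating income, net income, total assets, equity, operating cash flow, and so on).
Supports both IFRS and JP-GAAP, and implements automatic selection between consolidated and non-consolidated data.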

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ab9f570e85

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +345 to +347
if hasNonCons {
return nonConsRows, false
}

P2 Badge Preserve neutral "other" rows in non-consolidated mode

When --non-consolidated is used, selectConsolidation returns only nonConsRows, which excludes all rows from the "その他" bucket. In EDINET CSV, several non-consolidation-specific items (for example filing-date share metrics like jpcrp_cor:NumberOfIssuedSharesAsOfFilingDateTotal) are typically tagged as "その他", so they disappear entirely even when they should still be reported. This makes summary and statement items incomplete for valid non-consolidated requests whenever such neutral rows are present.

Useful? React with 👍 / 👎.

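Codex PR review finding: with --non-consolidated, common items that do not depend on consolidated versus non-consolidated status, such as jpcrp_cor:NumberOfIssuedShares, were being excluded.
The "other" rows are now split into neutral rows (jpcrp_cor/jpdei_cor) and IFRS consolidated rows (jpigp_cor and similar), and the former are always included.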
@beatinaniwa

Owner Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c8d2e677d1

Comment thread internal/financial/parser.go Outdated
}

// Detect accounting standard
acctStd := detectAccountingStandard(rows)

P1 Badge Detect accounting standard from selected rows only

buildResult selects consolidated/non-consolidated rows per statement, but then computes AccountingStd from the full rows slice, including rows that were explicitly excluded by consolidation selection. In mixed filings (for example, consolidated IFRS with non-consolidated JP-GAAP rows present), this can label the returned statements as jpgaap even when the selected data is IFRS, which makes downstream analysis incorrect. Compute the standard from selections (or from the built statements) so it matches the data actually returned.

Useful? React with 👍 / 👎.

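Fixes an issue in mixed filings (consolidated IFRS plus non-consolidated JP-GAAP) where a non-consolidated request was wrongly labeled as IFRS because the standard was detected from all rows.
The accounting standard is now detected only from selections (the selected rows).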
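
A minimal sketch of the idea behind this fix, not the actual parser.go code: the `row` type and the prefix check inside `detectAccountingStandard` are simplified assumptions, and the element names are only illustrative. It shows why the result differs depending on whether detection runs over all rows or only over the selected ones.

```go
package main

import (
	"fmt"
	"strings"
)

// row is a hypothetical, simplified stand-in for a parsed CSV row.
type row struct {
	ElementID string
}

// detectAccountingStandard reports "ifrs" if any row carries an IFRS
// taxonomy prefix and "jpgaap" otherwise. The prefix rule is an assumption
// made for this sketch.
func detectAccountingStandard(rows []row) string {
	for _, r := range rows {
		if strings.HasPrefix(r.ElementID, "jpigp_cor:") {
			return "ifrs"
		}
	}
	return "jpgaap"
}

func main() {
	consolidated := []row{{ElementID: "jpigp_cor:RevenueIFRS"}}
	nonConsolidated := []row{{ElementID: "jppfs_cor:NetSales"}}
	all := append(append([]row{}, consolidated...), nonConsolidated...)

	// Detecting from every row reflects the excluded consolidated data.
	fmt.Println("all rows:     ", detectAccountingStandard(all))
	// Detecting from the selection matches what is actually returned.
	fmt.Println("selected rows:", detectAccountingStandard(nonConsolidated))
}
```

Passing only the selected rows keeps the reported standard consistent with the statements that are returned to the caller.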
@beatinaniwa

Owner Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c0e2a7ee91

Comment thread internal/service/financial.go Outdated
Comment on lines +124 to +128
DocType: "120",
From: from,
To: to,
RateLimit: rateLimit,
Limit: 0, // fetch all, then take latest N

P1 Badge Stop using wide day-by-day scans to collect annual filings

GetCompanyFinancials requests filings across a periods*400 day window with Limit: 0, which causes companySvc.Filings/DocumentService.List to iterate and call the EDINET list endpoint for every single date in that range. With the default periods=3, this means roughly 1,200 list requests before statement downloads even begin, making the command very slow and likely to hit API throttling/timeouts in real use; this should short-circuit once enough annual reports are found instead of scanning the entire window.

Useful? React with 👍 / 👎.

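Fixes an issue where GetCompanyFinancials scanned the whole window forward one day at a time (periods*400 days) and generated a large number of API calls.
Adds a Reverse flag to ListOptions so that the scan runs backward from the newest date and stops early once the required number of annual securities reports has been found.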
@beatinaniwa

Owner Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d6742bea12

Comment on lines +218 to +220
if rawValue != "" && rawValue != "-" {
if v, err := strconv.ParseFloat(rawValue, 64); err == nil {
value = &v

P1 Badge Preserve zero values encoded as dash in parsed items

Rows whose value is "-" are currently treated as missing (Value=nil), so explicit zero values are dropped from both line items and summary metrics. This causes accuracy issues for legitimately zero fields (for example cash-flow totals in a period with no activity), because downstream consumers cannot distinguish 0 from "not present" and computed summaries omit those keys.

Useful? React with 👍 / 👎.
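
One way to act on this suggestion is sketched below. It takes the reviewer's reading that "-" encodes an explicit zero as given, which is an assumption about the CSV rather than something confirmed here; `parseValue` is a hypothetical helper name, not a function from the PR.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseValue converts a raw CSV cell into an optional number.
// Assumption (from the review comment): "-" means an explicit zero,
// while an empty cell means the value is absent.
func parseValue(raw string) *float64 {
	raw = strings.TrimSpace(raw)
	switch raw {
	case "":
		return nil // not present
	case "-":
		zero := 0.0
		return &zero // explicit zero
	}
	if v, err := strconv.ParseFloat(raw, 64); err == nil {
		return &v
	}
	return nil // unparseable text is treated as missing
}

func main() {
	for _, raw := range []string{"", "-", "1200", "n/a"} {
		if v := parseValue(raw); v == nil {
			fmt.Printf("%q -> missing\n", raw)
		} else {
			fmt.Printf("%q -> %g\n", raw, *v)
		}
	}
}
```

Returning a pointer keeps the distinction between 0 and "not present" visible to downstream consumers.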

Comment on lines +128 to +129
Limit: periods,
Reverse: true, // scan from newest to oldest, stop once enough found

P2 Badge Fetch additional filings when one of the limited set fails

The company-level fetch limits filing discovery to exactly periods, but later skips failed documents with continue; when any of those filings fails to download/parse, the result silently returns fewer periods than requested even if older valid filings exist just outside the initial limit. This makes output completeness depend on transient per-document failures rather than available history.

Useful? React with 👍 / 👎.

@beatinaniwa

Owner Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d6742bea12

From: from,
To: to,
RateLimit: rateLimit,
Limit: periods,

P2 Badge Backfill older filings when recent periods fail

The initial filing query is capped at periods, so if any of those recent filings fail to download/parse, the function returns fewer periods even when older annual reports in the same lookback window could satisfy the request. This can silently under-deliver historical data for company financials --periods N whenever one of the newest filings is malformed or temporarily unavailable.

Useful? React with 👍 / 👎.

Comment on lines +357 to +361
if hasNonCons {
return nonConsRows, false
}
if hasCons {
*warnings = append(*warnings, fmt.Sprintf("statement %s: non_consolidated data requested but not available, using consolidated as fallback", st))

P2 Badge Keep neutral rows in explicit non-consolidated mode

nonConsRows already includes neutral jpcrp_cor/jpdei_cor rows, but this branch only returns it when hasNonCons is true (which ignores neutral-only data). For statements that contain only neutral rows, --non-consolidated can drop valid values (for example filing-date/share/dividend items) and incorrectly make the statement disappear.

Useful? React with 👍 / 👎.

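Extends the condition so that nonConsRows is returned when neutralOther (jpcrp_cor/jpdei_cor) contains rows, even if hasNonCons is false.
Fixes an issue where statements that contain only neutral rows and no non-consolidated rows, such as filing-date items, disappeared.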
@beatinaniwa

Owner Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9b512b6be9

Comment thread internal/financial/parser.go Outdated
Comment on lines +357 to +358
if hasNonCons || len(neutralOther) > 0 {
return nonConsRows, false

P1 Badge Fallback to consolidated when only neutral rows exist

In explicit non-consolidated mode, this branch treats neutralOther as sufficient non-consolidated data and returns nonConsRows even when true non-consolidated rows are absent. For filings that have consolidated BS/PL/CF rows plus neutral jpcrp_cor rows (e.g., shares/dividend items), this drops the core consolidated statement rows instead of taking the intended consolidated fallback path with a warning, producing incomplete statements and missing summary keys such as total_assets or revenue.

Useful? React with 👍 / 👎.

Comment on lines +117 to +120
lookbackDays := periods * 400
jst := time.FixedZone("JST", 9*60*60)
nowJST := time.Now().In(jst)
from := nowJST.AddDate(0, 0, -lookbackDays).Format("2006-01-02")

P2 Badge Cap lookback to EDINET's 10-year window

The command accepts --periods up to 10, but lookbackDays := periods * 400 makes from about 4000 days in the past for that valid input (~10.96 years). That exceeds the repository’s own EDINET date-window assumption (10 years), so the maximum documented setting can produce out-of-range date requests and fail before any filings are processed. Clamping this lookback (or reducing allowed periods) avoids a user-visible failure at supported inputs.

Useful? React with 👍 / 👎.

…fixed to do so

Separates the hasNonCons and len(neutralOther) conditions so that the consolidated fallback is always used when consolidated rows exist.
Neutral rows are returned on their own only in the neutral-only case.
@beatinaniwa

Owner Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4434d66966

Comment on lines +440 to +444
key := dedupeKey{stmtType, r.contextInfo.Period, r.elementID}

if seen[key] {
continue
}

P1 Badge Include unit/context in deduplication key

The dedupe logic in buildStatement treats (statement, period, elementID) as globally unique and drops later rows, but it does not distinguish different units or contexts. When a filing provides the same concept/period in multiple units or parallel contexts, whichever row appears first is kept and the rest are silently discarded, which can produce materially wrong values in both statement rows and the derived summary.

Useful? React with 👍 / 👎.

Comment on lines +296 to +299
if strings.HasPrefix(elementID, "jpcrp030000-asr_") {
if colonIdx := strings.Index(elementID, ":"); colonIdx >= 0 {
suffix := elementID[colonIdx+1:]
if def, ok := companySuffixes[suffix]; ok {

P2 Badge Match company-specific suffixes for non-ASR jpcrp IDs

Company-specific suffix mapping is gated to jpcrp030000-asr_ IDs only, so equivalent company-specific elements in other jpcrp taxonomies (notably quarterly forms) bypass this mapping and fall back to keyword heuristics. Those heuristic classifications do not set SummaryKey, so key metrics like revenue/operating income are omitted from summary on non-ASR filings even though the suffix map already has the intended mapping.

Useful? React with 👍 / 👎.

@beatinaniwa

Owner Author

@codex review

@chatgpt-codex-connector


Codex Review: Didn't find any major issues. 🎉

@beatinaniwa beatinaniwa merged commit b5cc333 into main Apr 1, 2026
2 checks passed
@beatinaniwa beatinaniwa deleted the feat/financial-statement-extraction branch April 1, 2026 09:17