Feature Request: Auto-extract extension taxonomy facts from XBRL linkbases
Feature Category
Problem Statement
Is your feature request related to a problem? Please describe.
When fetching financials for companies that define custom XBRL extension concepts (Tesla's tsla_AutomotiveRevenue, Berkshire's brka_* concepts, banks, utilities, and many others), all extension facts are silently dropped from the normalized output. These concepts live in the company's own XBRL namespace and are never matched against the standard us-gaap tag map. As a result, the company-specific data that is often most interesting from a fundamental-analysis perspective (segment revenue breakdowns, custom balance sheet line items, non-standard cash flow adjustments) is invisible.
Today, company_mappings/ ships hardcoded JSON for exactly 3 companies (TSLA, MSFT, BRKA). This cannot scale: there are thousands of SEC filers with extension namespaces, and the mappings go stale as companies rename concepts across filings.
The data to solve this is already in every filing. The XBRL submission package that edgartools already fetches contains the company's extension schema (.xsd) and its label and calculation linkbases (_lab.xml, _cal.xml). Together these provide the human-readable label for every extension concept and, wherever the company declared one, the concept's GAAP parent in the calculation hierarchy, with no hardcoding required.
Who would benefit from this feature?
Proposed Solution
Describe the solution you'd like
When edgartools parses an XBRL filing package, add a linkbase extraction pass that reads the three files already present in the submission:
- .xsd (extension schema) - identifies which extension elements are non-abstract (actual reportable values) vs. structural (axes, domains, members, table/line-item wrappers). The abstract="true" attribute is the authoritative signal.
- _lab.xml (label linkbase) - resolves the arc chain (loc → labelArc → label) to get the company-authored human-readable label for each extension concept. The standard-role label is the concept's primary name; the printed 10-K/10-Q may display a terse or total variant selected through preferredLabel in the presentation linkbase.
- _cal.xml (calculation linkbase) - for each extension concept that appears as a child of a us-gaap_ parent in a calculation arc, records the parent concept and weight (+1.0 additive, -1.0 subtractive). This is the company's own declaration of where the concept sits within the financial statement hierarchy.
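A minimal sketch of the label step, using only xml.etree.ElementTree from the standard library. It assumes that locator hrefs end in a fragment equal to the concept id (e.g. ...#tsla_AutomotiveRevenue), which is the usual convention in SEC extension schemas, and that the company prefix is known up front. extract_labels is a hypothetical name, not an existing edgartools API.

```python
import xml.etree.ElementTree as ET

XLINK = "http://www.w3.org/1999/xlink"
LINK = "http://www.xbrl.org/2003/linkbase"
STANDARD_LABEL_ROLE = "http://www.xbrl.org/2003/role/label"


def _x(attr: str) -> str:
    """Qualified name for an xlink:* attribute."""
    return f"{{{XLINK}}}{attr}"


def extract_labels(lab_xml: str, prefix: str) -> dict[str, str]:
    """Hypothetical helper: map extension concept ids to their standard-role label."""
    root = ET.fromstring(lab_xml)
    labels: dict[str, str] = {}
    for link in root.iter(f"{{{LINK}}}labelLink"):
        # loc: xlink:label -> concept id (assumed to be the href fragment)
        locs = {
            loc.get(_x("label")): loc.get(_x("href"), "").split("#")[-1]
            for loc in link.findall(f"{{{LINK}}}loc")
        }
        # label resources: several may share one xlink:label (one per role);
        # a label without xlink:role is a standard label per XBRL 2.1
        resources: dict[str, list[tuple[str, str]]] = {}
        for res in link.findall(f"{{{LINK}}}label"):
            resources.setdefault(res.get(_x("label")), []).append(
                (res.get(_x("role"), STANDARD_LABEL_ROLE), (res.text or "").strip())
            )
        # labelArc: from = locator label, to = label-resource label
        for arc in link.findall(f"{{{LINK}}}labelArc"):
            concept = locs.get(arc.get(_x("from")), "")
            if not concept.startswith(prefix + "_"):
                continue  # skip us-gaap, dei, srt, ... concepts
            for role, text in resources.get(arc.get(_x("to")), []):
                if role == STANDARD_LABEL_ROLE:
                    labels[concept] = text
    return labels
```

Keeping only the standard role yields one label per concept; a fuller version could keep every role so that the presentation linkbase's preferredLabel can pick the terse or total variant shown in the printed statement.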
The output would be a collection of ExtensionFact objects (or similar) exposed alongside the standard normalized facts, annotated with label and parent relationship so downstream consumers can place them correctly without guessing.
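One possible shape for the output type and the join step, as a sketch only. ExtensionFact mirrors the attributes used in the use-case example, and build_extension_facts is a hypothetical helper that combines instance values with the metadata from the three files. Raw facts are assumed to arrive as (concept, period, value) tuples, and each concept is assumed to have at most one chosen parent; neither reflects how edgartools represents facts internally.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class ExtensionFact:  # hypothetical output type
    concept: str                   # e.g. "tsla_RestructuringAndOtherExpenses"
    label: str                     # company-authored standard label (_lab.xml)
    parent_concept: Optional[str]  # us-gaap parent from _cal.xml, if declared
    weight: Optional[float]        # +1.0 additive, -1.0 subtractive
    value: float
    period: str


def build_extension_facts(
    raw_facts: Iterable[tuple[str, str, float]],
    reportable: set[str],
    labels: dict[str, str],
    parents: dict[str, tuple[str, float]],
) -> list[ExtensionFact]:
    """Join instance values with .xsd / _lab.xml / _cal.xml metadata."""
    out = []
    for concept, period, value in raw_facts:
        if concept not in reportable:
            continue  # drops standard-taxonomy facts and abstract scaffolding
        parent, weight = parents.get(concept, (None, None))
        out.append(ExtensionFact(
            concept=concept,
            label=labels.get(concept, concept),  # fall back to the raw concept id
            parent_concept=parent,
            weight=weight,
            value=value,
            period=period,
        ))
    return out
```

Facts with no calculation parent are kept with parent_concept=None rather than dropped, so consumers can still see them even when their placement is unknown.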
Describe alternatives you've considered
- Expanding company_mappings/ manually: not scalable, goes stale, and requires maintenance per company per filing year.
- Using the companyfacts API endpoint: it pre-aggregates facts but strips the linkbase metadata. The label and parent relationship information is only available in the per-filing XML package, which edgartools already accesses for its XBRL parsing.
- Ignoring extension facts entirely (the status quo): for many companies this leaves a significant gap in coverage, particularly for segment-level analysis.
Use Case Example
How would you use this feature?
```python
from edgar import Company

company = Company("TSLA")
financials = company.get_financials()

# Standard facts work as today
revenue = financials.income_statement["Revenue"]

# Extension facts (new, proposed API)
extension_facts = financials.extension_facts  # or similar
for fact in extension_facts:
    print(fact.concept)         # "tsla_RestructuringAndOtherExpenses"
    print(fact.label)           # "Restructuring And Other Expenses"
    print(fact.parent_concept)  # "us-gaap_OperatingExpenses"
    print(fact.weight)          # 1.0 (additive component of parent)
    print(fact.value)           # 1_730_000_000
    print(fact.period)          # "Q3 2025"
```
For developers building applications that need segment-level data:
```python
# Get all revenue sub-components Tesla reports that aren't standard GAAP
revenue_segments = [
    f for f in financials.extension_facts
    if f.parent_concept == "us-gaap_Revenues"
]
# → tsla_AutomotiveRevenues, tsla_EnergyGenerationAndStorageRevenues, tsla_ServicesAndOtherRevenues
```
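Because the weight comes from the company's calculation linkbase, consumers could also sanity-check placements: under XBRL 2.1 summation-item semantics, a parent's value in a given context should equal the sum of weight × value over its children. The sketch below assumes facts are plain (parent_concept, weight, value, period) tuples and that all of a parent's children are supplied; where some children are us-gaap concepts rather than extensions, those would need to be included for the totals to reconcile. Both function names and the fixed tolerance are hypothetical.

```python
from collections import defaultdict
from typing import Iterable


def weighted_subtotals(
    facts: Iterable[tuple[str, float, float, str]],
) -> dict[tuple[str, str], float]:
    """Sum weight * value for each (parent_concept, period) pair."""
    totals: dict[tuple[str, str], float] = defaultdict(float)
    for parent, weight, value, period in facts:
        totals[(parent, period)] += weight * value
    return dict(totals)


def reconciles(subtotal: float, reported_parent: float, tolerance: float = 0.5) -> bool:
    """True if the children's weighted sum matches the parent's reported value.

    The fixed tolerance is an assumption; a stricter check would derive it
    from the facts' decimals attribute.
    """
    return abs(subtotal - reported_parent) <= tolerance
```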
Implementation Considerations
Proof of concept
Verified against Tesla's Q3 2025 10-Q using only the Python standard library plus requests: 17 non-abstract extension concepts with full parent relationships were extracted from a single filing with zero hardcoding. Happy to share the extraction script if useful.
The GAAP parent relationship is precisely what's needed to slot each extension fact into the statement hierarchy, and it is declared by the company itself in the filing.
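A sketch of reading those declarations from _cal.xml with the standard library. It records every us-gaap parent of an extension child together with the arc's weight and the extended-link role URI, since the same concept can sit under different parents in different statements or schedules. As with the label step, extract_gaap_parents is a hypothetical name, and href fragments are assumed to equal concept ids.

```python
import xml.etree.ElementTree as ET

XLINK = "http://www.w3.org/1999/xlink"
LINK = "http://www.xbrl.org/2003/linkbase"


def extract_gaap_parents(cal_xml: str, prefix: str) -> dict[str, list[tuple[str, float, str]]]:
    """Hypothetical helper: extension concept -> [(us-gaap parent, weight, role URI), ...]."""
    root = ET.fromstring(cal_xml)
    parents: dict[str, list[tuple[str, float, str]]] = {}
    for link in root.iter(f"{{{LINK}}}calculationLink"):
        role = link.get(f"{{{XLINK}}}role", "")
        # loc: xlink:label -> concept id (assumed to be the href fragment)
        locs = {
            loc.get(f"{{{XLINK}}}label"): loc.get(f"{{{XLINK}}}href", "").split("#")[-1]
            for loc in link.findall(f"{{{LINK}}}loc")
        }
        for arc in link.findall(f"{{{LINK}}}calculationArc"):
            parent = locs.get(arc.get(f"{{{XLINK}}}from"), "")
            child = locs.get(arc.get(f"{{{XLINK}}}to"), "")
            if child.startswith(prefix + "_") and parent.startswith("us-gaap_"):
                # weight is an unprefixed attribute: 1.0 adds, -1.0 subtracts
                weight = float(arc.get("weight", "1"))
                parents.setdefault(child, []).append((parent, weight, role))
    return parents
```

Keeping the role URI alongside each parent makes it possible to prefer primary-statement placements over disclosure-schedule ones later, without re-parsing the file.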
Complexity Level:
The linkbase parsing code already exists in edgartools' xbrl/ module for other purposes. The main work is wiring the extraction into the normalization pipeline and defining the output type for extension facts.
Backwards Compatibility:
- ✅ This feature maintains backwards compatibility
Extension facts would be surfaced as an additive property or collection; the existing .income_statement, .balance_sheet, and .cash_flow_statement access patterns are unchanged. The company_mappings/ JSON files could remain supported as an override layer for cases where manual curation is preferred.
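The override layer could be as simple as a precedence rule in which a curated entry, when present, wins over the auto-extracted one. In this sketch the curated mapping is assumed to be a flat concept-to-label JSON object; that shape is hypothetical and does not describe the actual company_mappings/ file format. Tagging each result with its source keeps it auditable which values came from curation.

```python
import json


def merge_labels(auto: dict[str, str], overrides_json: str) -> dict[str, tuple[str, str]]:
    """Hypothetical helper: concept -> (label, source); curated entries take precedence."""
    # Start from everything the linkbases produced
    merged = {concept: (label, "linkbase") for concept, label in auto.items()}
    # Curated entries replace or extend the auto-extracted ones
    for concept, label in json.loads(overrides_json).items():
        merged[concept] = (label, "override")
    return merged
```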
Additional Context
Why the companyfacts API can't solve this
The SEC's data.sec.gov/api/xbrl/companyfacts/CIK.json endpoint, the most convenient EDGAR data source, pre-aggregates facts but strips all linkbase metadata. The label and parent relationship for extension concepts are only available in the per-filing XML submission package. Edgartools already accesses these packages for its XBRL instance parsing; this feature would read the linkbases that are fetched alongside them.
Filtering structural elements
Not all extension-namespace elements are facts. XBRL filings use extension namespaces for structural scaffolding: *Axis, *Domain, *Member, *Abstract, *Table, *LineItems. These should be excluded. The reliable filter is abstract="true" in the .xsd: structural elements declare themselves abstract; reportable value elements do not. In the Tesla filing above, 37 of 83 extension concepts are abstract and filtered out, leaving 46 reportable elements.
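The abstract filter needs nothing more than the schema's top-level xs:element declarations, so no name-suffix matching is required. A minimal sketch, with a hypothetical function name and element ids assumed to follow the prefix_Name form used in the examples above:

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"


def split_extension_elements(xsd_text: str) -> tuple[set[str], set[str]]:
    """Hypothetical helper: return (reportable, structural) element ids from an extension schema."""
    root = ET.fromstring(xsd_text)
    reportable: set[str] = set()
    structural: set[str] = set()
    # Concept declarations are top-level xs:element children of xs:schema
    for el in root.findall(f"{{{XS}}}element"):
        el_id = el.get("id") or el.get("name")
        if not el_id:
            continue
        # xs:boolean accepts "true" or "1"; abstract defaults to false when omitted
        if el.get("abstract", "false").strip() in ("true", "1"):
            structural.add(el_id)
        else:
            reportable.add(el_id)
    return reportable, structural
```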
Statement role filtering
Some extension concepts appear in calculation links only under disclosure schedule roles (note tables, supplemental detail schedules) rather than primary financial statement roles (ConsolidatedBalanceSheets, ConsolidatedStatementsofOperations, etc.). These could optionally be flagged with lower prominence, since primary statement facts are what most consumers want by default.
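One way to tell primary statements from disclosure schedules without pattern-matching role URIs is the roleType definitions in the same .xsd. The EDGAR Filer Manual convention formats each definition as "{sort code} - {category} - {title}", with categories such as Document, Statement, and Disclosure. The sketch below assumes the filing follows that convention and falls back to "Unknown" otherwise; role_categories and is_primary_statement are hypothetical helpers whose output could be matched against the role URI recorded for each calculation arc.

```python
import xml.etree.ElementTree as ET

LINK = "http://www.xbrl.org/2003/linkbase"


def role_categories(xsd_text: str) -> dict[str, str]:
    """Hypothetical helper: role URI -> category ('Statement', 'Disclosure', ...)."""
    root = ET.fromstring(xsd_text)
    cats: dict[str, str] = {}
    # roleType declarations live under xs:annotation/xs:appinfo, so search the whole tree
    for rt in root.iter(f"{{{LINK}}}roleType"):
        definition = rt.findtext(f"{{{LINK}}}definition", default="").strip()
        # maxsplit=2 keeps titles that themselves contain " - " intact
        parts = [p.strip() for p in definition.split(" - ", 2)]
        cats[rt.get("roleURI")] = parts[1] if len(parts) == 3 else "Unknown"
    return cats


def is_primary_statement(role_uri: str, cats: dict[str, str]) -> bool:
    """Treat only 'Statement' roles as primary; everything else gets lower prominence."""
    return cats.get(role_uri) == "Statement"
```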
Related Issues/Features:
- The existing company_mappings/ system in edgar/xbrl/standardization/ addresses the same problem via hardcoding; this feature would make that approach unnecessary for the common case.
- unmapped_logger.py in the same directory already tracks unrecognised concepts, suggesting this gap is known.
Feature requests are evaluated based on EdgarTools' core principles: Simple yet powerful, accurate financials, beginner-friendly, and joyful UX.