feat(structural-parsing): cross-locale label resolution, OrderPage split, synthetic CHIPS structure (0.9.0–0.9.2)#104
Closed
Flummy1 wants to merge 8 commits into
Address structural-parsing recall problems found via live engine testing on funpaybotengine PR #35: structure built from data-fields JSON keeps English IDs while OfferPage / OrderPage render localized labels, leaving get_structured_fields with ~1/8 recall on real data.

- SubcategoryFieldDef gains casefolded `aliases: set[str]`; OfferFieldsParser registers field id and form <label> as aliases. label_map / lower_label_map now index aliases. New `lookup_field_id`, `add_alias`, and heuristic `enrich_from_offer` (matches by SELECT/DROPDOWN option value).
- OfferPreview / OrderPreview gain `parse_title_fields(structure)` (cherry-picked from the lost feature/subcategory-structure-parser commits and adapted to the dict-based fields layout). SELECT/DROPDOWN segments are validated against `options`; unmatched segments are dropped.
- OrderPage data is split into `metadata` (canonical RU/EN/UA-aware order fields) and `lot_fields` (everything else). `get_structured_fields` now reads `lot_fields` only, eliminating false positives from order-metadata labels colliding with structural ones. `data` is preserved for back-compat.
- FieldCondition and SubcategoryStructure now extend FunPayObject with `raw_source`, removing the need for the asdict() shim in engine pydantic validators.
- Bump 0.8.0 -> 0.8.2.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
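The alias index from the first bullet can be pictured with a minimal sketch. `FieldDef`, `build_index`, and the free-function `lookup_field_id` below are hypothetical stand-ins, not the library's classes; the sketch only assumes that aliases are casefolded and that an ambiguous label resolves to `None`.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FieldDef:
    """Hypothetical stand-in for SubcategoryFieldDef (alias-related parts only)."""
    id: str
    label: str = ""
    aliases: set[str] = field(default_factory=set)

    def __post_init__(self) -> None:
        # Casefold every alias and drop empty entries.
        self.aliases = {a.casefold() for a in self.aliases if a.strip()}


def build_index(fields: list[FieldDef]) -> dict[str, list[str]]:
    """Map every casefolded label/alias to the ids of the fields that carry it."""
    index: dict[str, list[str]] = {}
    for f in fields:
        keys = {f.label.casefold(), *f.aliases}
        keys.discard("")
        for key in keys:
            ids = index.setdefault(key, [])
            if f.id not in ids:
                ids.append(f.id)
    return index


def lookup_field_id(index: dict[str, list[str]], label: str) -> Optional[str]:
    """Return the field id for a label, or None on a miss or an ambiguous match."""
    ids = index.get(label.casefold(), [])
    return ids[0] if len(ids) == 1 else None


# Usage: the parser registers the English id and the localized form label.
fields = [
    FieldDef("weapon", "Category", {"weapon", "Категория"}),
    FieldDef("quantity", "Quantity", {"quantity", "Количество", ""}),
]
index = build_index(fields)
```

With this index, a localized label seen on an order page (`'КАТЕГОРИЯ'`) resolves to the English id `weapon`, which is the gap the commit describes.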
3da79ee to
b027046
…ues to int

Options like '20 USD', '5000 RUB', '13 звёзд', '50 stars' carry a numeric magnitude with a redundant trailing unit; the unit is already encoded in field_def.id (usd / rub / quantity), so consumers should not have to re-strip it. Pure-text options (e.g. currency='USD', method='По username') are preserved as strings unchanged.

Updated test_select_match_kept (now test_select_quantity_collapsed_to_int) and added test_select_non_numeric_option_kept_as_string for coverage of the string-preserving path.
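A minimal sketch of the collapse rule described above. The pattern and the `collapse_option` name are assumptions for illustration; the commit does not show the actual implementation.

```python
import re
from typing import Union

# Assumed pattern: a leading integer, whitespace, then a non-empty unit.
_NUMERIC_WITH_UNIT_RE = re.compile(r"^\s*(\d+)\s+\S.*$")


def collapse_option(value: str) -> Union[int, str]:
    """Collapse '20 USD' style options to int; keep pure-text options as-is."""
    match = _NUMERIC_WITH_UNIT_RE.match(value)
    return int(match.group(1)) if match else value
```

Options such as `'20 USD'` or `'13 звёзд'` come back as `20` and `13`, while `'USD'` and `'По username'` are returned unchanged.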
… label seeding, right-to-left title matcher
Follow-up on the multi-source label index. Live testing on 32 real
purchases / 8 subcategories surfaced three remaining failure modes
that capped enriched recall at ~80% (43/54 lot_fields). This commit
addresses all three:
Task 1 — Value normalization for option matching.
Sellers commonly decorate values with emoji / star symbols ('RUB🔥'
vs option 'RUB', 'По логину🔥' vs 'По логину'), and option lists
on the filter form are clean. New module-level _normalize_option
helper (casefold + strip emoji/symbol decorations + collapse
whitespace + strip outer punctuation) is now applied on both sides
of the comparison in `SubcategoryStructure.enrich_from_offer` and
in `_parse_title_fields`. Latin / Cyrillic / digits / interior
punctuation are preserved.
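The normalization steps in Task 1 can be sketched as follows. This is not the module's `_normalize_option`: instead of a hand-written decoration regex it uses Unicode general categories (`So` for emoji and symbols, `P*` for punctuation), which is a swapped-in technique chosen to keep the example short.

```python
import unicodedata

# Assumed extras: zero-width joiner and variation selector-16.
_EXTRA_DECORATIONS = {"\u200d", "\ufe0f"}


def _is_outer_noise(ch: str) -> bool:
    """Whitespace or punctuation that may be stripped from either end."""
    return ch.isspace() or unicodedata.category(ch).startswith("P")


def normalize_option(value: str) -> str:
    # 1. Drop emoji / symbol decorations (general category "So") and joiners.
    kept = "".join(
        ch for ch in value
        if unicodedata.category(ch) != "So" and ch not in _EXTRA_DECORATIONS
    )
    # 2. Collapse runs of whitespace.
    collapsed = " ".join(kept.split())
    # 3. Strip outer punctuation only; interior punctuation is preserved.
    start, end = 0, len(collapsed)
    while start < end and _is_outer_noise(collapsed[start]):
        start += 1
    while end > start and _is_outer_noise(collapsed[end - 1]):
        end -= 1
    # 4. Casefold for case-insensitive comparison.
    return collapsed[start:end].casefold()
```

Applying the helper to both sides of a comparison makes `'RUB🔥'` equal to the clean option `'RUB'`.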
Task 2 — Auto-alias the canonical localized label, plus an
`enrich_from_offer_fields` cross-source helper.
`SubcategoryFieldDef.__post_init__` now adds the casefolded `label`
to `aliases` automatically — single source of truth for label
registration, removing the duplicated parser-side wiring. New
`SubcategoryStructure.enrich_from_offer_fields(offer_fields)`
method registers the canonical localized labels from an
authenticated offerEdit-page schema as aliases on a structure
built from the (often English-labeled) public listing form,
unblocking TEXT fields that have no `options` for value-based
enrichment to work against.
Task 3 — Right-to-left greedy `_parse_title_fields`.
Replaced positional rsplit+zip with a reverse walk that tries each
rightmost not-yet-consumed segment against the current field; on
miss, the segment is left for an earlier field. Fixes short-title
rsplit-mismatch where titles with fewer segments than suffix
fields produced wrong assignments (e.g. '50 звёзд, По username'
with 3 declared suffix fields). Also folds in _normalize_option
in the SELECT/DROPDOWN branch so decorated title segments resolve
to the canonical option.
Bump 0.8.2 → 0.8.3 (purely additive surface, behavior of existing
match cases preserved).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ECT first
Live regression on gift-card subcat #1316: titles like
'…, RUB, 2000 RUB' were resolving to {inr2: 2000, currency: 'RUB'}
instead of {rub: 2000, currency: 'RUB'}. Root cause was the right-to-
left matcher in 425e447 attributing leading-numeric segments to the
LAST declared NUMERIC_RANGE field (`inr2`, conditioned on
`inr='другое количество'`) without verifying that the condition holds.
Two-pass algorithm now:
Pass 1 — SELECT/DROPDOWN, right-to-left. These are authoritative
because the segment must literally appear in the field's
options.
Pass 2 — NUMERIC_RANGE for unconsumed segments, ONLY for fields whose
conditions are satisfied by pass-1 results (or have no
conditions). Walks fields in declaration order so the
innermost matching field wins.
Adds two tests:
- test_select_preferred_over_numeric_range_with_unsatisfied_conditions:
reproduces the live #1316 case.
- test_unconditional_numeric_range_still_matches: ensures gating
does not regress unconditional NUMERIC_RANGE fields.
531 tests pass.
b1c35ef to
71be2f3
Closes the remaining ~16% recall gap on the live structural-parsing stand by addressing two well-scoped failure modes from the dataset research:

Task 1: Synthetic SubcategoryStructure for CHIPS subcategories.
CHIPS-listing pages (e.g. #173 Necropolis) often have no ``div.lot-fields`` block, so no authoritative structure can be parsed. But each CHIPS offer's ``other_data`` already carries structured ``(field_id → value_id)`` pairs, which is enough to synthesize one. New ``_synthesize_chips_structure`` helper in the parser module builds a ``SubcategoryStructure`` from the union of ``other_data`` keys/values across the parsed previews, with each distinct key as a SELECT field whose options are the first-seen values, and labels pulled from ``other_data_names`` (or the field id as fallback).

Wired into ``SubcategoryPageParser._parse``: parses the offers first, and when ``structure is None``, ``subcategory_type`` is CHIPS, and the new opt-in ``fallback_structure_from_chips_offers`` flag on ``SubcategoryPageParsingOptions`` is set, synthesizes a structure from them. OFFERS subcategories without lot-fields are intentionally not covered: their ``OfferPage.fields`` carry only order metadata, nothing structural to infer.

``SubcategoryStructure`` gains a ``derived_from`` field (``Literal['lot_fields', 'chips_offers']``, default ``'lot_fields'``) and an ``is_synthetic`` property, so callers can distinguish authoritative vs inferred structures without sniffing ``raw_source``.

Task 2: Composite lot-label expansion in ``_split_order_data``.
``OrderPage.lot_fields`` regularly contains composite labels of the form ``'количество usd' = '20 USD'``, but the structure exposes the canonical currency-amount field as ``usd``, so ``get_structured_fields`` couldn't bridge the two. New ``_COMPOSITE_LABEL_RE`` matches ``'<quantity-locale> <2-4-letter-id>'`` (broad enough to survive FunPay adding new currencies) and ``_COMPOSITE_VALUE_RE`` extracts the numeric magnitude. When both match, an additional ``lot_fields[currency_id] = '<number>'`` entry is emitted alongside the original composite label. Original wins on synthetic-key collisions (``setdefault``). Behavior gated by ``OrderPageParsingOptions.expand_composite_lot_labels`` (default ``True``, which preserves the recall-boosting semantics; callers asserting ``len(lot_fields)`` can opt out).

Bump 0.8.3 → 0.9.0. Surface additions are opt-in and back-compat, but composite expansion changes the contents of ``lot_fields`` for existing payloads, which is the "minor breaking" trigger for the minor bump.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
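The composite-label expansion in Task 2 can be sketched as below. The two patterns, the list of quantity words, and the `expand_composite` function are assumptions made for illustration; the commit names `_COMPOSITE_LABEL_RE` and `_COMPOSITE_VALUE_RE` but does not show them.

```python
import re

# Assumed locale words for "quantity" (RU / EN / UA).
_QUANTITY_WORDS = ("количество", "quantity", "кількість")
# Assumed shape: '<quantity-locale> <2-4-letter-id>'.
COMPOSITE_LABEL_RE = re.compile(
    r"^(?:%s)\s+([a-z]{2,4})$" % "|".join(_QUANTITY_WORDS)
)
# Assumed shape: leading numeric magnitude of the value.
COMPOSITE_VALUE_RE = re.compile(r"^\s*(\d+(?:[.,]\d+)?)")


def expand_composite(lot_fields: dict[str, str], expand: bool = True) -> dict[str, str]:
    """Return a copy of lot_fields with synthetic '<currency_id>' entries added."""
    result = dict(lot_fields)
    if not expand:
        return result
    for label, value in lot_fields.items():
        label_match = COMPOSITE_LABEL_RE.match(label.casefold())
        value_match = COMPOSITE_VALUE_RE.match(value)
        if label_match and value_match:
            # setdefault: a pre-existing key wins over the synthetic one.
            result.setdefault(label_match.group(1), value_match.group(1))
    return result
```

For `{'количество usd': '20 USD'}` the sketch keeps the original entry and adds `'usd': '20'`; passing `expand=False` mirrors the opt-out flag.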
…ce provenance (0.9.1)

Adds two new no-HTTP enrichment paths and a provenance side-channel so aliases registered by each enrich_from_* source can be inspected and selectively invalidated.

* AliasSource enum (LABEL / LISTING / OFFER_EDIT / OFFER_PAGE / ORDER_PAGE / OFFER_PREVIEW / USER) and a SubcategoryStructure-side _alias_sources map keyed by (field_id, alias_casefold).
* add_alias gains source= kwarg (default USER, backward-compatible); alias_source() and forget_aliases_from() expose / invalidate by source.
* enrich_from_order_page(OrderPage): value-match against SELECT options, mirror of enrich_from_offer for completed-order payloads.
* enrich_from_offer_previews(Iterable[OfferPreview]): registers other_data_names entries as aliases for ids already in the structure.
* Existing enrich_from_offer / enrich_from_offer_fields tagged with OFFER_PAGE / OFFER_EDIT respectively; auto-aliases seeded from SubcategoryFieldDef.label tagged LABEL.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
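A minimal sketch of the provenance side-channel described in this commit. `AliasRegistry` is a hypothetical stand-in for the alias bookkeeping on `SubcategoryStructure`, and the enum values are assumed; only the member names and method names come from the commit message.

```python
from enum import Enum
from typing import Optional


class AliasSource(Enum):
    # Member names follow the commit message; the string values are assumed.
    LABEL = "label"
    LISTING = "listing"
    OFFER_EDIT = "offer_edit"
    OFFER_PAGE = "offer_page"
    ORDER_PAGE = "order_page"
    OFFER_PREVIEW = "offer_preview"
    USER = "user"


class AliasRegistry:
    """Hypothetical stand-in: aliases per field plus a provenance map."""

    def __init__(self) -> None:
        self.aliases: dict[str, set[str]] = {}
        self._alias_sources: dict[tuple[str, str], AliasSource] = {}

    def add_alias(self, field_id: str, alias: str,
                  source: AliasSource = AliasSource.USER) -> None:
        alias_cf = alias.casefold()
        self.aliases.setdefault(field_id, set()).add(alias_cf)
        # Re-registering an existing alias updates the recorded source.
        self._alias_sources[(field_id, alias_cf)] = source

    def alias_source(self, field_id: str, alias: str) -> Optional[AliasSource]:
        return self._alias_sources.get((field_id, alias.casefold()))

    def forget_aliases_from(self, source: AliasSource) -> int:
        """Drop every alias registered with `source`; return the count removed."""
        doomed = [key for key, src in self._alias_sources.items() if src is source]
        for field_id, alias_cf in doomed:
            self.aliases[field_id].discard(alias_cf)
            del self._alias_sources[(field_id, alias_cf)]
        return len(doomed)
```

Keeping provenance in a separate map leaves the public alias set untouched, so a caller can drop everything learned from one source, for example order pages, without losing aliases from the others.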
…re merge (0.9.2)

* OfferPage.delivery_fields_spec: parsed from <form action="/orders/new">, maps input.name → label for per-order delivery contract fields (Telegram username, Steam login, …)
* SubcategoryStructure.delivery_fields + enrich_delivery_fields_from_offer to accumulate specs across observed offers (first-seen label wins)
* OrderPage.delivery_fields with static ORDER_DELIVERY_LABELS blacklist; reclassify_with_structure for high-precision per-subcategory routing
* lookup_field_id(context=…) scores ambiguous matches against FieldCondition state, disambiguating shared labels like quantity SELECT vs quantity2 NUMERIC_RANGE
* OrderPage.get_structured_fields uses iterative resolve with accumulated context, falling back to first-by-declaration
* SubcategoryStructure.merge_from for combining synthetic + authoritative structures: deep-copies missing fields, unions aliases, carries alias provenance, invalidates label-map caches

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
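The context-aware disambiguation can be illustrated with a small sketch. The scoring rule here (satisfied conditions outrank an unconditional field, unsatisfied conditions score zero) is an assumption; the commit only states that ambiguous matches are scored against `FieldCondition` state. `Candidate`, `score`, and `resolve` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Candidate:
    """Hypothetical view of a field that matched an ambiguous label."""
    field_id: str
    # Required {field_id: value} pairs; empty means unconditional.
    conditions: dict[str, str] = field(default_factory=dict)


def score(candidate: Candidate, context: dict[str, str]) -> int:
    if not candidate.conditions:
        return 1  # unconditional: a weak but valid match
    satisfied = all(
        str(context.get(key, "")).casefold() == value.casefold()
        for key, value in candidate.conditions.items()
    )
    return 2 if satisfied else 0


def resolve(candidates: list[Candidate],
            context: Optional[dict[str, str]] = None) -> Optional[str]:
    if not candidates:
        return None  # miss
    if len(candidates) == 1:
        return candidates[0].field_id
    if context is None:
        return None  # ambiguous and nothing to disambiguate with
    ranked = sorted(((score(c, context), c.field_id) for c in candidates), reverse=True)
    best, runner_up = ranked[0], ranked[1]
    if best[0] == 0 or best[0] == runner_up[0]:
        return None  # all-zero scoring or a top-tier tie
    return best[1]
```

With a `quantity` SELECT and a `quantity2` NUMERIC_RANGE gated on `quantity='другое количество'`, the sketch picks `quantity2` only when the context shows that condition holds.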
…thods

Title parsing moves to a dedicated external package (title_playground). Remove _TITLE_SUFFIX_TYPES, _parse_title_fields from subcategory_structure.py, OfferPreview.parse_title_fields and OrderPreview.parse_title_fields, their tests (TestParseTitleFields, TestParseTitleFieldsRightToLeft), and the now-unused SubcategoryStructure TYPE_CHECKING import in orders.py.

_normalize_option and _DECORATION_RE are preserved; they are still used by enrich_from_offer and enrich_from_order_page for option-value matching.
Why this PR exists
`SubcategoryStructure` (introduced in #102) was built from `data-fields` JSON, so it indexed fields by their English IDs (`'weapon'`, `'quantity'`, `'currency'`). But every other surface the engine actually consumes (`OfferPage.fields`, the `OrderPage` param-list, the buyer's order form) renders localized labels (`'Категория'`, `'количество'`, `'Тип валюты'`, `'Telegram Username'`, `'Логин Steam'`). Consequence, measured live across iterating stands:

- `OrderPage.lot_fields`
- `b027046` (multi-source aliases + offerEdit-side seeding)
- `enrich_from_offer`
- `8d979c5` (synthetic CHIPS + composite labels, 0.9.0)
- `54e4d56` (no-HTTP enrichment paths + provenance, 0.9.1)
- `e78cc88` (delivery-fields classification + context lookup + merge, 0.9.2)

Three orthogonal contract issues blocked engine-side integration and were addressed alongside the recall work:

- `OrderPage.data` mixed three categories of data: order metadata (`'игра'`, `'сумма'`), lot-config fields (`'регион'`, `'количество usd'`), and per-order delivery-contract data the buyer types at checkout (`'telegram username'`, `'логин steam'`). Lookups against the flat dict risked false positives across all three.
- `FieldCondition` / `SubcategoryStructure` weren't `FunPayObject`s, which broke pydantic `from_attributes` validators in `funpaybotengine`.
- CHIPS subcategories without a `lot-fields` block produced no structure, even though every offer's `other_data` carried structured `(field_id → value_id)` pairs.

This PR addresses the whole pipeline.
Commit chronology
- `b027046`: `SubcategoryFieldDef.aliases`, multi-source `label_map`, `lookup_field_id` / `add_alias` / `enrich_from_offer` / `enrich_from_offer_fields`, `OrderPage` data split into `metadata` + `lot_fields`, `FunPayObject` unification of `FieldCondition` and `SubcategoryStructure`.
- `98dfc96`: options with a trailing unit (`'20 USD'`, `'13 звёзд'`) collapse to bare `int` in `enrich_from_offer` value matching, since the unit is already encoded in `field_id`.
- `425e447`: `_normalize_option` (casefold + strip emoji/symbols + collapse ws + strip outer punct) bridges decorated values like `'RUB🔥'` to clean options. `__post_init__` on `SubcategoryFieldDef` auto-adds `label` to `aliases` (single source of truth; parser-side wiring becomes a one-liner). New `enrich_from_offer_fields` cross-source helper.
- `71be2f3`: `enrich_from_offer` skips NUMERIC_RANGE candidates whose `conditions` are not satisfied by the already-resolved context. Fixes the live regression on subcat #1316.
- `8d979c5`: synthetic `SubcategoryStructure` for CHIPS subcategories without `lot-fields` (built from the union of `OfferPreview.other_data` keys/values; opt-in flag). Composite-label expansion in `_split_order_data` (`'количество usd' = '20 USD'` → also `'usd' = '20'`; opt-out flag). `derived_from` field + `is_synthetic` property. Bump to 0.9.0.
- `54e4d56`: no-HTTP enrichment paths `enrich_from_order_page(OrderPage)` and `enrich_from_offer_previews(Iterable[OfferPreview])`. Adds `AliasSource` enum + side-channel `_alias_sources` map; `add_alias` gains a `source=` kwarg (default `USER`, backward-compatible); new `alias_source()` / `forget_aliases_from(source)`. Bump to 0.9.1.
- `e78cc88`: `OfferPage.delivery_fields_spec`, `SubcategoryStructure.delivery_fields` + `enrich_delivery_fields_from_offer`, `OrderPage.delivery_fields` + static `ORDER_DELIVERY_LABELS` blacklist, `reclassify_with_structure`. Context-aware `lookup_field_id(context=…)`. `SubcategoryStructure.merge_from`. Bump to 0.9.2.
- `8da34ad`: removes `_parse_title_fields`, `_TITLE_SUFFIX_TYPES`, `OfferPreview.parse_title_fields`, `OrderPreview.parse_title_fields`, and their tests. Title parsing moves to a dedicated external package. `_normalize_option` / `_DECORATION_RE` are preserved; they are still used by `enrich_from_offer` and `enrich_from_order_page`.

Areas changed
1. Multi-source label index: `funpayparsers/types/subcategory_structure.py`

Reworks the label/lookup machinery so that a structure built from English `data-fields` IDs still resolves localized labels seen elsewhere.

- `SubcategoryFieldDef.aliases: set[str]`: additional casefolded label aliases.
- `SubcategoryFieldDef.__post_init__` casefolds every alias entry, drops empties, and auto-adds the field's own `label` as an alias. Single source of truth for label registration.
- `SubcategoryStructure.label_map` (`@cached_property`) indexes both `f.label` (as-is, possibly localized) and every entry in `f.aliases`. A per-field `seen` set guards against `label` itself being one of the aliases.
- `SubcategoryStructure.lower_label_map` (`@cached_property`) merges duplicates produced by casefolding without re-introducing already-listed field IDs.
- `SubcategoryStructure.add_alias(field_id, alias, source=...)`: registers a casefolded alias and busts both cached label maps via `self.__dict__.pop(...)`.
- `SubcategoryStructure.enrich_from_offer_fields(offer_fields)`: registers the canonical localized labels from an authenticated `OfferFields.field_schema` (offerEdit page) as aliases on a structure built from the public listing form, bridging the listing-form / offer-page locale gap. Returns `self` for chaining.
- `SubcategoryStructure.enrich_from_offer(offer)`: heuristic enrichment by SELECT/DROPDOWN option value. Registers `label` as an alias only if exactly one field matches (after `_normalize_option` on both sides). Returns `self`.
- `SubcategoryStructure.from_offer_fields` propagates `offer_fields.raw_source` into the new structure.

Parser side (`funpayparsers/parsers/offer_fields_parser.py`): only `aliases={field_id}` is passed; the localized `<label>` text flows in through `__post_init__`.

2. Context-aware `lookup_field_id` (0.9.2)

`SubcategoryStructure.lookup_field_id(label, *, context=None)` now disambiguates ambiguous matches against already-resolved fields. Some FunPay subcategories declare two fields under the same form-locale label, most commonly a `quantity` SELECT and a `quantity2` NUMERIC_RANGE both labelled `'Количество робуксов'`, where the latter is gated on `quantity='другое количество'`. Previously the lookup returned `None` on ambiguity and `get_structured_fields` silently picked first-by-declaration.

- Candidates are scored against `FieldCondition` state in context (a mapping `{field_id: value}` of fields already resolved this pass).
- Returns `None` on miss, `None` on top-tier tie or all-zero scoring (still ambiguous), the winning id otherwise.
- When `context` is omitted and the label is ambiguous, returns `None` exactly like before.

`OrderPage.get_structured_fields` is rewritten as an iterative resolve that feeds the accumulating result back as context on each step, with a fallback to first-by-declaration when context is insufficient (preserves legacy behaviour for callers with no per-field disambiguation needs).

3. Value normalization: `_DECORATION_RE` / `_normalize_option`

Sellers commonly inject decorations into rendered values that are absent from the canonical filter-form options (`'RUB🔥'` vs option `'RUB'`, `'По логину🔥'` vs `'По логину'`, `'★★★Premium★★★'` vs `'Premium'`).

- `_DECORATION_RE` covers extended pictographs (emoji proper), misc symbols + dingbats, regional indicators (flags), misc technical (⌚ ⌛ ⏰), misc symbols and arrows (⭐ ⬆), variation selectors, ZWJ. Latin / Cyrillic / digits / interior punctuation are deliberately preserved.
- `_normalize_option(s)`: casefold + strip decorations + collapse whitespace + strip outer punctuation.
- Applied in `enrich_from_offer` and `enrich_from_order_page` (both sides of the value comparison).

4. `OrderPage.data` split: metadata / lot_fields / delivery_fields

The previous flat `data` dict mixed RU/EN/UA-localized order metadata with everything else. After `b027046` it became `(metadata, lot_fields)`; after `e78cc88` it splits into three disjoint dicts:

- `metadata`: eight canonical keys (`game`, `category`, `short_description`, `detailed_description`, `amount`, `open`, `closed`, `total`), exposed via the public constant `ORDER_METADATA_LABELS`.
- `lot_fields`: lot-config fields and the input for `get_structured_fields`. Composite labels like `'количество usd' = '20 USD'` are additionally indexed under the trailing currency id (`'usd' = '20'`) so the structure's `usd` / `rub` / … fields can resolve them. `setdefault` semantics on synthetic-key collisions.
- `delivery_fields`: new in 0.9.2. Per-order data the buyer typed into the order form (Telegram username, Steam login, character name, email, …). Routed via the public constant `ORDER_DELIVERY_LABELS` (static blacklist of common casefolded labels). Live data showed 10 of 62 stand orders losing recall to misclassified delivery labels: `'telegram username'` × 5 on subcat #2418 (Telegram Stars), `'логин steam'` × 5 on subcat #1086 (Steam).

`_split_order_data` now takes an `extra_delivery_labels` kwarg, which lets callers pass a casefolded harvest of `SubcategoryStructure.delivery_fields.values()` for high-precision per-subcategory classification (extends, does not replace, the static blacklist). All three returned dicts are pairwise disjoint by key.

`OrderPage.reclassify_with_structure(structure)` re-splits `data` using the structure's `delivery_fields` for high-precision routing. This is useful when the page was originally parsed without a structure on hand and the caller has since obtained one.

`OrderPage.metadata` / `lot_fields` / `delivery_fields` are additional fields on the dataclass; `data` is preserved verbatim as the union for back-compat. All metadata properties (`short_description`, `full_description`, `amount`, `open_date_text`, `close_date_text`, `order_category_name`, `order_subcategory_name`, `order_total`) read from `metadata` directly.

Wired in `funpayparsers/parsers/page_parsers/order_page_parser.py`:

- `OrderPageParsingOptions.expand_composite_lot_labels: bool = True` (opt-out for callers asserting `len(lot_fields)`).
- The parser splits `data` once after the param-list loop and wires all three resulting dicts into the constructed `OrderPage`.

5. Buyer order form parsing → `OfferPage.delivery_fields_spec` (0.9.2)

The authoritative source for "which `OrderPage.data` labels are buyer-typed delivery inputs vs lot-config" is the offer's own `<form action="/orders/new">`. Each `<div class="form-group">` containing a `<label class="control-label">` and a named `<input>` / `<select>` is one delivery field.

- `_DELIVERY_FORM_RESERVED_NAMES`: `csrf_token`, `type`, `preview`, `offer_id`, `price_guard`, `username` (chrome autofill bug-fix dummy), `method` (payment dropdown), `amount`, `sum`.
- `_DELIVERY_FORM_SKIP_CLASSES`: `multiple-purchase-switcher`, `offer-calc-box`, `js-price-row`, `js-order-prices`.
- `_parse_delivery_fields_spec(form_node)`: returns `{input_name: label}`. Skips reserved-name / skip-class / `type="hidden"` / unlabeled groups. First named input per form-group wins (`setdefault`).
- `OfferPage.delivery_fields_spec: dict[str, str]`. Wired in `OfferPageParser._parse` via `page_content.css_first('form[action$="orders/new"]', strict=False)`; gracefully empty when the form is absent (anonymous offer pages).

`SubcategoryStructure.delivery_fields: dict[str, str]` (also new in 0.9.2) accumulates specs across observed offers via `enrich_delivery_fields_from_offer(offer)`. First-seen-label wins, so the label seen on the first observed offer stays stable across the subcategory. The accumulated dict feeds `OrderPage.reclassify_with_structure` for high-precision per-order routing.

6. Synthetic `SubcategoryStructure` for CHIPS: `subcategory_page_parser.py`

CHIPS-listing pages (e.g. #173) often have no `div.lot-fields` block, so no authoritative structure can be parsed. But each CHIPS offer's `other_data` already carries structured `(field_id → value_id)` pairs (`{'server': 12448, 'side': 2}`), which is enough to synthesize one. Out of scope: OFFERS subcategories without `lot-fields` (e.g. #3789 ChatGPT) carry only order metadata in `OfferPage.fields`, nothing structural to infer.

- `_synthesize_chips_structure(subcategory_id, offers)`: builds a `SubcategoryStructure` from the union of `OfferPreview.other_data` keys/values across the parsed previews. Each distinct key becomes a `SubcategoryFieldDef(type=SELECT)` whose `options` are the first-seen values. Labels come from `OfferPreview.other_data_names`, with `field_id` as fallback.
- `SubcategoryPageParsingOptions.fallback_structure_from_chips_offers: bool = False`.
- `SubcategoryPageParser._parse`: subcategory id is computed once; offers are parsed before the synth fallback; when `structure is None`, `subcategory_type is CHIPS`, and the option is set, synthesizes a structure from the parsed offers.

Provenance is exposed on the structure itself:

- `SubcategoryStructure.derived_from: Literal['lot_fields', 'chips_offers']`, default `'lot_fields'`. The default preserves back-compat for every existing callsite.
- `SubcategoryStructure.is_synthetic` property: `True` iff `derived_from != 'lot_fields'`. Replaces sniffing `raw_source` for provenance.

7. No-HTTP enrichment paths + alias provenance (0.9.1)
`enrich_from_offer(OfferPage)` covers the post-purchase OfferPage path, but two common flows still hit the structure with locale-mismatched labels: completed-order processing (where the engine has an `OrderPage` in hand from a `NEW_ORDER` message but no `OfferPage`) and listing-page batch enrichment (where every `OfferPreview.other_data` already carries structured `(field_id → value)` pairs, datapoints that cost zero extra HTTP).

A second concern: with multiple enrich-from-* sources feeding the same `aliases` set, debugging "why was this alias added?" and selectively invalidating one source's contributions become real needs. Solved with a side-channel provenance map.

- `AliasSource`: `LABEL` / `LISTING` / `OFFER_EDIT` / `OFFER_PAGE` / `ORDER_PAGE` / `OFFER_PREVIEW` / `USER`. Exported via `__all__`.
- `SubcategoryStructure._alias_sources: dict[(field_id, alias_cf), AliasSource]`. Side-channel rather than a per-`SubcategoryFieldDef` change, so the public `aliases: set[str]` API stays untouched.
- `SubcategoryStructure.__post_init__` seeds existing field aliases (auto-added by `SubcategoryFieldDef.__post_init__` from the field's `label`) with `AliasSource.LABEL`.
- `add_alias(field_id, alias, source=AliasSource.USER)`: gained a `source` kwarg with a backward-compatible default. Re-registering an existing alias updates the recorded source.
- `alias_source(field_id, alias) -> AliasSource | None`: provenance lookup.
- `forget_aliases_from(source) -> int`: drops every alias previously registered with `source`, returning the count removed.
- `enrich_from_offer` and `enrich_from_offer_fields` now tag their additions with `OFFER_PAGE` and `OFFER_EDIT` respectively.

New no-HTTP enrichment paths:

- `enrich_from_order_page(order: OrderPage)`: value-match heuristic on `order.lot_fields`, mirror of `enrich_from_offer`. Source: `ORDER_PAGE`.
- `enrich_from_offer_previews(offers: Iterable[OfferPreview])`: for each preview, registers `other_data_names[field_id]` as an alias for any `field_id` already in `self.fields`. Source: `OFFER_PREVIEW`.
- `enrich_delivery_fields_from_offer(offer)` (0.9.2): accumulates `OfferPage.delivery_fields_spec` entries into `SubcategoryStructure.delivery_fields` with first-seen-label-wins semantics.

8. `SubcategoryStructure.merge_from` (0.9.2)

Combines two structures into one. Used by the engine to layer a synthetic structure (incomplete but cheap, from a CHIPS listing) under an authoritative `from_offer_fields` structure, or to hydrate persistent-cached aliases on top of a freshly parsed structure.

- `field_id` only in `other`: deep-copies the whole `SubcategoryFieldDef` and carries over `_alias_sources` entries.
- `field_id` in both: unions the alias sets only. `self` is treated as the authoritative source for `label`, `options`, `type`, and `conditions` (`other`'s values for these are ignored).
- Alias provenance is carried over via `_alias_sources`, falling back to `AliasSource.USER`.
- `delivery_fields` are unioned with the same first-seen-wins semantic.
- Invalidates the `label_map` / `lower_label_map` caches.
- Returns `self` for chaining.

9. `FieldCondition` and `SubcategoryStructure` unified under `FunPayObject`

In `funpaybotengine` PR #35 (commit `d66422f`), pydantic wrappers built via `model_validate(..., from_attributes=True)` choked on `FieldCondition` and `SubcategoryStructure` because they were plain `@dataclass`es with no `raw_source`. The engine had to add an `asdict()` shim in `_add_raw_source`.

- `FieldCondition` extends `FunPayObject` with `raw_source: str = field(default='', compare=False)`. All other fields gain defaults so existing kw-only callers keep working.
- `SubcategoryStructure` similarly.
- `OfferFieldsParser._parse_conditions` stores the raw condition JSON on each `FieldCondition` via `json.dumps(cond, ensure_ascii=False)`.
- `SubcategoryPageParser` passes `lot_fields_div.html` as `raw_source` to the constructed `SubcategoryStructure`.

The engine's `asdict()` shim can be removed after this PR is released.

10. Version bump: `pyproject.toml`

`0.8.0` → `0.8.2` → `0.8.3` → `0.9.0` → `0.9.1` → `0.9.2`.

The `0.9.0` minor bump covered composite-label expansion (additive, but a contract change for callers asserting `lot_fields` dict size). The `0.9.1` patch bump was strictly additive. The `0.9.2` patch bump is similarly additive at the dataclass level, but `_split_order_data` now returns a 3-tuple, which is a breaking change for any caller importing that private symbol directly (none in tree besides the parser and updated tests).

Public API surface added or changed
- `SubcategoryFieldDef.aliases` (`types/subcategory_structure.py`)
- `SubcategoryStructure.derived_from` (`types/subcategory_structure.py`)
- `SubcategoryStructure.delivery_fields` (`types/subcategory_structure.py`)
- `SubcategoryStructure.is_synthetic` (`types/subcategory_structure.py`)
- `SubcategoryStructure.lookup_field_id(label, *, context=None)` (`types/subcategory_structure.py`): `context` kwarg in 0.9.2
- `SubcategoryStructure.add_alias` (`types/subcategory_structure.py`)
- `SubcategoryStructure.add_alias(source=...)` (`types/subcategory_structure.py`): default `USER`
- `SubcategoryStructure.enrich_from_offer` (`types/subcategory_structure.py`)
- `SubcategoryStructure.enrich_from_offer_fields` (`types/subcategory_structure.py`)
- `SubcategoryStructure.enrich_from_order_page` (`types/subcategory_structure.py`)
- `SubcategoryStructure.enrich_from_offer_previews` (`types/subcategory_structure.py`)
- `SubcategoryStructure.enrich_delivery_fields_from_offer` (`types/subcategory_structure.py`)
- `SubcategoryStructure.merge_from` (`types/subcategory_structure.py`)
- `SubcategoryStructure.alias_source` (`types/subcategory_structure.py`)
- `SubcategoryStructure.forget_aliases_from` (`types/subcategory_structure.py`)
- `AliasSource` (`types/subcategory_structure.py`)
- `OfferPage.delivery_fields_spec` (`types/pages/offer_page.py`)
- `OrderPage.metadata` (`types/pages/order_page.py`)
- `OrderPage.lot_fields` (`types/pages/order_page.py`)
- `OrderPage.delivery_fields` (`types/pages/order_page.py`)
- `OrderPage.reclassify_with_structure` (`types/pages/order_page.py`)
- `ORDER_METADATA_LABELS` (`types/pages/order_page.py`)
- `ORDER_DELIVERY_LABELS` (`types/pages/order_page.py`)
- `OrderPageParsingOptions.expand_composite_lot_labels` (`parsers/page_parsers/order_page_parser.py`): default `True`
- `SubcategoryPageParsingOptions.fallback_structure_from_chips_offers` (`parsers/page_parsers/subcategory_page_parser.py`): default `False`
- `FieldCondition` (`types/subcategory_structure.py`): extends `FunPayObject` (gained `raw_source`)
- `SubcategoryStructure` (`types/subcategory_structure.py`): extends `FunPayObject` (gained `raw_source`)

Backward compatibility
- `OrderPage.data` is kept verbatim; `metadata`, `lot_fields`, `delivery_fields` are additional fields. `data` is the union for any existing caller.
- `FieldCondition` / `SubcategoryStructure` continue to construct via the same kw-only call patterns; new `raw_source` defaults to `''`.
- `SubcategoryStructure.derived_from` defaults to `'lot_fields'`, so any existing test or wrapper that constructs structures directly produces non-synthetic ones. `delivery_fields` defaults to `{}`.
- `OfferFieldsParser` still emits the same `SubcategoryFieldDef` objects; the parser explicitly seeds `aliases={field_id}` and the localized `label` flows in via `__post_init__`.
- `OrderPageParsingOptions.expand_composite_lot_labels` defaults to `True`, which preserves the recall-boosting semantics. Callers asserting `len(lot_fields)` can opt out.
- `SubcategoryPageParsingOptions.fallback_structure_from_chips_offers` defaults to `False`: opt-in only; CHIPS callers that previously got `structure=None` continue to get `structure=None`.
- `lookup_field_id(label)` without a `context` kwarg behaves exactly as before; ambiguous matches still return `None`.
- `_split_order_data` now returns a 3-tuple. The only in-tree callers (`OrderPageParser`, the unit tests) are updated. External callers importing this private helper need to unpack three values.

Tests
`pytest tests/`: 574 passed.

`tests/subcategory_structure_test.py`

- `TestFieldCondition`: case-insensitive matching, numeric value coercion, edge cases.
- `TestSubcategoryStructureLabelMap`: `label_map` indexes both the original label and the casefolded form (label auto-added by `__post_init__`); duplicate labels preserved in declaration order; empty labels grouped; case-insensitive merge.
- `TestSubcategoryFieldDefAliases`: alias casefolding, empty-alias filtering.
- `TestSubcategoryStructureAliasIndexing`: `lookup_field_id` resolves both English ID and localized aliases; `add_alias` invalidates cached label maps; `enrich_from_offer` matches a unique SELECT option and skips ambiguous matches.
- `TestNormalizeOption`: strips emoji / misc symbols / dingbats; collapses whitespace; strips outer punct; preserves Latin / Cyrillic / digits / interior punctuation.
- `TestEnrichFromOfferWithDecoratedValue`: `'RUB🔥'` resolves to a `currency` field whose options include `'RUB'`.
- `TestSubcategoryFieldDefLabelAutoAlias`: `label` auto-added to aliases by `__post_init__`; empty label not added.
- `TestEnrichFromOfferFields`: localized labels from offerEdit-page `OfferFields` get registered as aliases on a listing-page-built structure; ids absent from `self.fields` are ignored; chaining returns `self`.
- `TestEnrichFromOrderPage` (0.9.1): unique-value match registers an alias with `ORDER_PAGE` provenance; ambiguous matches skipped; already-mapped labels left alone.
- `TestEnrichFromOfferPreviews` (0.9.1): `other_data_names` entries registered as aliases with `OFFER_PREVIEW` provenance; no-op on empty inputs / empty `other_data`; unknown field ids ignored.
- `TestAliasSource` (0.9.1): default source is `USER`; explicit source recorded; label-derived auto-aliases seeded with `LABEL`; `forget_aliases_from(source)` removes only matching entries and returns the count; positional `add_alias(fid, alias)` calls remain backward-compatible.
- `TestLookupFieldIdWithContext` (0.9.2): unique label resolves without context; ambiguous + no context returns `None`; context picks unconditional when conditions of the alternative are unsatisfied; context promotes the conditional candidate when its conditions hold; miss returns `None` regardless of context.
- `TestEnrichDeliveryFieldsFromOffer` (0.9.2): multiple offers union; first-seen label wins; chaining returns `self`.
- `TestSubcategoryStructureMerge` (0.9.2): fields only in `other` are deep-copied (mutating `other` afterwards doesn't affect `self`); overlapping field unions aliases; alias provenance carries over; `other` is not mutated; `delivery_fields` are unioned; `label_map` cache is invalidated; chaining returns `self`.
- `TestOrderPageReclassifyAndStructured` (0.9.2): `reclassify_with_structure` promotes a label out of `lot_fields` and into `delivery_fields` when the structure has the matching delivery spec; `get_structured_fields` picks the unconditional candidate on ambiguous lookup with no helpful context.

`tests/order_page_split_test.py`

- `TestSplitOrderData`: canonical RU/EN/UA metadata extracted under canonical keys; `metadata` and `lot_fields` key spaces stay disjoint; English-locale variants.
- `TestSplitOrderDataCompositeExpansion`: `'количество usd' = '20 USD'` produces both the original entry and a synthetic `'usd' = '20'`; same for `rub`; broad currency-id regex (`gbp` works); non-numeric value left alone; pre-existing synthetic-key wins on collision (`setdefault`); opt-out via `expand_composite=False`; non-composite labels untouched.
- `TestSplitOrderDataDeliveryClassification` (0.9.2): static blacklist routes Telegram / Steam labels to `delivery_fields`; `extra_delivery_labels` extends rather than replaces the static blacklist; the three returned dicts are pairwise disjoint; `ORDER_DELIVERY_LABELS` exposes the expected casefolded entries.

`tests/parsers_test/offer_page_delivery_spec_test.py` (0.9.2)

- `TestParseDeliveryFieldsSpec`: basic Telegram-username form-group; payment-method dropdown skipped; chrome-autofill hidden input skipped; `offer-calc-box` form-group skipped; unlabeled groups skipped; multiple visible fields collected; reserved `csrf_token` skipped; named character-name input picked up.

`tests/synthesize_chips_test.py`

- `TestSynthesizeChipsStructure`: empty / single / multi-offer cases; first-seen option ordering; label fallback to field id; default `derived_from='lot_fields'` for normal `SubcategoryStructure` constructors.
- `TestFallbackOptionDefault`: `fallback_structure_from_chips_offers` defaults to `False`.

`tests/parsers_test/lot_fields_parser_test.py`: fixture asserts the auto-registered aliases (`{field_id, casefolded label}`) on each parsed `SubcategoryFieldDef`.

Engine cleanup once 0.9.2 is released
- Remove the `asdict()` shim in `_add_raw_source` (was needed because `FieldCondition` / `SubcategoryStructure` weren't `FunPayObject`).
- Use `structure.is_synthetic` instead of sniffing `raw_source`.
- `ORDER_DELIVERY_LABELS` + `SubcategoryStructure.delivery_fields` is now the canonical source.
- Update `_split_order_data` result unpacking: the third element is `delivery_fields`.
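For the unpacking change in the last item, a rough sketch of a three-way split may help. `split_order_data`, `METADATA_LABELS`, and `DELIVERY_LABELS` below are hypothetical stand-ins with tiny assumed label sets, and the sketch assumes keys are casefolded; they are not the library's `_split_order_data`, `ORDER_METADATA_LABELS`, or `ORDER_DELIVERY_LABELS`.

```python
from typing import Iterable

# Assumed, deliberately tiny label sets for illustration only.
METADATA_LABELS = {"игра": "game", "game": "game"}
DELIVERY_LABELS = frozenset({"telegram username", "логин steam"})


def split_order_data(
    data: dict[str, str],
    extra_delivery_labels: Iterable[str] = (),
) -> tuple[dict[str, str], dict[str, str], dict[str, str]]:
    """Split flat order data into (metadata, lot_fields, delivery_fields)."""
    # Extra labels extend the static set; they do not replace it.
    delivery_labels = DELIVERY_LABELS | {l.casefold() for l in extra_delivery_labels}
    metadata: dict[str, str] = {}
    lot_fields: dict[str, str] = {}
    delivery_fields: dict[str, str] = {}
    for label, value in data.items():
        key = label.casefold()
        if key in METADATA_LABELS:
            metadata[METADATA_LABELS[key]] = value  # stored under the canonical key
        elif key in delivery_labels:
            delivery_fields[key] = value
        else:
            lot_fields[key] = value
    return metadata, lot_fields, delivery_fields


# Callers now unpack three values; the third is delivery_fields.
metadata, lot_fields, delivery_fields = split_order_data(
    {"Игра": "Roblox", "Регион": "RU", "Telegram Username": "@buyer", "Ник персонажа": "Hero"},
    extra_delivery_labels={"Ник персонажа"},
)
```

Each label lands in exactly one of the three dicts, so structural lookups only ever see `lot_fields`.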