Conversation
Walkthrough

Adds optional Polars DataFrame integration behind a feature flag (`ModelPolarsExt`); refactors `build.rs` to use compile-time cfg-based OS/arch detection and per-OS logic; updates `Cargo.toml` to add an optional `polars` dependency.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant ModelPolarsExt
    participant Select as select_columns?
    participant ToFloat as dataframe_to_float_features
    participant ToCat as dataframe_to_cat_features
    participant Calc as calc_model_prediction
    User->>ModelPolarsExt: call predict_dataframe / variants
    ModelPolarsExt->>Select: optionally select columns (variant)
    Select-->>ModelPolarsExt: DataFrame or selection
    ModelPolarsExt->>ToFloat: convert to Vec<Vec<f32>>
    ToFloat-->>ModelPolarsExt: float features
    ModelPolarsExt->>ToCat: convert to Vec<Vec<String>> (optional)
    ToCat-->>ModelPolarsExt: cat features
    ModelPolarsExt->>Calc: calc_model_prediction(float, cat)
    Calc-->>ModelPolarsExt: Vec<f64> predictions
    ModelPolarsExt-->>User: return predictions or CatBoostError
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 5
🧹 Nitpick comments (4)
build.rs (2)
12-24: Architecture detection from `TARGET` is fine but only handles a small set of triples.

`get_arch_from_target` correctly keys off `TARGET` instead of the host, but it only recognizes `x86_64`, `aarch64`/`arm64`, and `i686`/`i586`, panicking otherwise. That's reasonable if CatBoost only ships binaries for those arches, but it does mean any other target will fail at build time even if someone doesn't actually try to download/run CatBoost there.

If you want a softer failure mode, you could return a `Result<&'static str, String>` here and bubble that up into a more descriptive CatBoost-specific error instead of panicking.
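A minimal sketch of that softer failure mode, written as a pure function over the target triple so it is easy to test; the function name and the canonical arch strings are illustrative, not the crate's actual API:

```rust
// Hypothetical sketch: parse the architecture out of a target triple and
// return a Result instead of panicking, so the caller can attach a more
// descriptive CatBoost-specific error.
fn arch_from_triple(triple: &str) -> Result<&'static str, String> {
    // The architecture is the first component of a Rust target triple.
    let arch = triple.split('-').next().unwrap_or("");
    match arch {
        "x86_64" => Ok("x86_64"),
        "aarch64" | "arm64" => Ok("aarch64"),
        "i686" | "i586" => Ok("x86"),
        other => Err(format!(
            "CatBoost: no prebuilt library for target architecture `{}`",
            other
        )),
    }
}

fn main() {
    println!("{:?}", arch_from_triple("x86_64-unknown-linux-gnu"));
    println!("{:?}", arch_from_triple("riscv64gc-unknown-linux-gnu"));
}
```

In `build.rs` the triple would come from `std::env::var("TARGET")`, which Cargo always sets for build scripts.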
341-357: Shared library copy, soname, and rpath logic look good for native builds; cross-compile behavior may be off.

The new unified flow (derive `lib_filename`, copy from `OUT_DIR/libs` to `target/{PROFILE}`, then adjust install name/soname and emit OS-specific rpaths) looks coherent for macOS/Linux/Windows when building natively. Using `patchelf`/`install_name_tool` with ignored exit codes is also a sensible "best effort" approach.

- Same as above, the `#[cfg(target_os = ...)]` here will follow the host OS, not the Cargo target, which can surprise anyone cross-compiling. Once you switch OS detection to use `CARGO_CFG_TARGET_OS`, you may also want these blocks to branch on that value instead of `cfg`.
- For non-default targets, the final binary path is typically `target/<triple>/<profile>`. Right now you copy into `target/{PROFILE}`; this is fine for native builds, but for exotic targets you may want to consider copying into `target/<triple>/{PROFILE}` instead (or at least documenting that only native builds are supported).

Also applies to: 359-390, 399-447
src/polars_ext.rs (2)
110-136: Row/column iteration is correct but can be slightly optimized.

`dataframe_to_float_features` is logically correct: it builds a row-major `Vec<Vec<f32>>` by iterating over rows and then columns. Minor nits:

- `df.get_columns()` is called on each row iteration; you can hoist it out of the outer loop.
- `as_materialized_series()` is also called repeatedly per cell; depending on how cheap this is in Polars 0.45, it might be worth calling it once per column outside the row loop.

A minimal tweak:

```diff
 fn dataframe_to_float_features(df: &DataFrame) -> CatBoostResult<Vec<Vec<f32>>> {
     let num_rows = df.height();
     let num_features = df.width();
     if num_rows == 0 || num_features == 0 { /* … */ }
-    let mut result = Vec::with_capacity(num_rows);
-    for row_idx in 0..num_rows {
-        let mut row_features = Vec::with_capacity(num_features);
-        for col in df.get_columns() {
-            let series = col.as_materialized_series();
-            let value = extract_f32_value(series, row_idx)?;
-            row_features.push(value);
-        }
-        result.push(row_features);
-    }
+    let cols: Vec<&Series> = df.get_columns()
+        .iter()
+        .map(|c| c.as_materialized_series())
+        .collect();
+    let mut result = Vec::with_capacity(num_rows);
+    for row_idx in 0..num_rows {
+        let mut row_features = Vec::with_capacity(num_features);
+        for series in &cols {
+            let value = extract_f32_value(series, row_idx)?;
+            row_features.push(value);
+        }
+        result.push(row_features);
+    }
     Ok(result)
 }
```

Please verify in the Polars 0.45 docs that `as_materialized_series()` doesn't unexpectedly clone data when reused this way.
138-165: Categorical feature conversion mirrors the float conversion; similar perf tweaks apply.

`dataframe_to_cat_features` mirrors the float conversion logic and is correct in terms of building `Vec<Vec<String>>`. The same minor optimizations (hoisting `get_columns()` and `as_materialized_series()` outside the row loop) apply here as well.

If you adopt the optimization for floats, consider applying the same pattern here to keep behavior and performance characteristics aligned.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- Cargo.toml (2 hunks)
- build.rs (8 hunks)
- src/lib.rs (1 hunks)
- src/polars_ext.rs (1 hunks)
🔇 Additional comments (4)
src/lib.rs (1)
16-19: Polars extension module gating and re-export look solid.

Conditionally compiling `mod polars_ext;` and `pub use crate::polars_ext::ModelPolarsExt;` behind `#[cfg(feature = "polars")]` cleanly exposes the extension trait only when Polars is enabled, without affecting existing users.

src/polars_ext.rs (2)
5-56: Trait surface and docs for `ModelPolarsExt` look good and are well-scoped.

The three methods (`predict_dataframe`, `predict_dataframe_with_columns`, `predict_dataframe_with_types`) cover the main Polars use-cases without polluting the base `Model` API, and the doc example is clear. The design choice to keep the extension as a separate trait is nice for ergonomics.

If you expect heavy Polars usage, you might consider later adding a short section to the crate README showing how to enable the `polars` feature and use these methods end-to-end.
167-271: Note: The suggested mitigation to switch to `f64` is not feasible given the CatBoost C binding constraint.

The `calc_model_prediction` function has a hardcoded generic constraint `TFloatFeature: AsRef<[f32]>` and calls the C FFI function `CalcModelPredictionWithHashedCatFeaturesAndTextAndEmbeddingFeatures`, which expects `f32` pointers. The CatBoost C library itself is bound to `f32`, so switching to `f64` would require modifying the FFI bindings in the `sys` crate, not just updating `calc_model_prediction` at this layer.

That said, the code correctly implements what CatBoost requires. The precision-loss concern is technically valid (`f32` can represent integers precisely only up to 2^24), but without evidence of large integer features in your use case, it remains a theoretical concern.

`extract_f32_value` handles common numeric and boolean types correctly. The precision caveat is valid only if your actual feature data includes integers larger than ~16 million; confirm your typical feature value ranges to determine if this is a practical concern.

Cargo.toml (1)
13-15: Polars 0.45 dependency and `dtype-full` feature set verified as correct.

The dependency wiring is sound. The code in `polars_ext.rs` uses standard Polars numeric types (Int8, Int16, Int32, Int64, UInt8, UInt16, UInt32, UInt64, Float32, Float64, Boolean, String), all of which are covered by the `dtype-full` feature. The `dtype-full` preset enables all optional dtype-related features in Polars 0.45, and the optional dependency with explicit feature gate `["dep:polars"]` is correctly configured to keep the base crate lean when the feature is not enabled.
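The wiring described above would look roughly like this in the manifest; versions and feature names are taken from this review, and the actual `Cargo.toml` may differ in layout:

```toml
# Sketch of the optional-dependency wiring described in the review.
[dependencies]
polars = { version = "0.45", features = ["dtype-full"], optional = true }

[features]
# The crate feature pulls in the optional dependency only when enabled.
polars = ["dep:polars"]
```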
```rust
fn get_os_name() -> &'static str {
    #[cfg(target_os = "macos")]
    {
        "darwin"
    }
    #[cfg(target_os = "linux")]
    {
        "linux"
    }
    #[cfg(target_os = "windows")]
    {
        "windows"
    }
}
```
Using `#[cfg(target_os = …)]` in `get_os_name` ties behavior to the host, not the Cargo target.

In build scripts, `cfg(target_os)` refers to the host OS, while `TARGET`/`CARGO_CFG_TARGET_OS` describe the compilation target. `get_os_name()` currently uses `cfg(target_os)` and is then used to form the download URL and platform tuple in `download_compiled_library`. This will break cross-compilation (e.g., building Linux target binaries from macOS) by downloading the wrong library and emitting the wrong errors.
Consider deriving the OS name from `CARGO_CFG_TARGET_OS` or by parsing `TARGET` (similar to `get_arch_from_target`). For example:

```diff
-fn get_os_name() -> &'static str {
-    #[cfg(target_os = "macos")]
-    {
-        "darwin"
-    }
-    #[cfg(target_os = "linux")]
-    {
-        "linux"
-    }
-    #[cfg(target_os = "windows")]
-    {
-        "windows"
-    }
-}
+fn get_os_name() -> String {
+    let target_os = std::env::var("CARGO_CFG_TARGET_OS")
+        .expect("CARGO_CFG_TARGET_OS not set by Cargo");
+    match target_os.as_str() {
+        "macos" => "darwin".to_string(),
+        "linux" => "linux".to_string(),
+        "windows" => "windows".to_string(),
+        other => panic!("Unsupported target OS: {}", other),
+    }
+}
```

You'd then update call sites to accept a `String` instead of `&'static str`.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```rust
fn get_os_name() -> String {
    let target_os = std::env::var("CARGO_CFG_TARGET_OS")
        .expect("CARGO_CFG_TARGET_OS not set by Cargo");
    match target_os.as_str() {
        "macos" => "darwin".to_string(),
        "linux" => "linux".to_string(),
        "windows" => "windows".to_string(),
        other => panic!("Unsupported target OS: {}", other),
    }
}
```
🤖 Prompt for AI Agents
In build.rs around lines 26 to 41, get_os_name currently uses #[cfg(target_os =
"...")] which reflects the build host rather than the compilation target;
replace this with logic that reads CARGO_CFG_TARGET_OS (or parses TARGET) from
env::var at compile time and returns an owned String (not &'static str) to
represent the target OS name (e.g., "linux", "darwin", "windows"); update any
call sites (like download_compiled_library and platform tuple construction) to
accept/consume a String (or &str) instead of &'static str and ensure the string
mapping matches the conventions used elsewhere (same canonical names as
get_arch_from_target).
```rust
fn get_lib_filename() -> &'static str {
    #[cfg(target_os = "windows")]
    {
        "catboostmodel.dll"
    }
    #[cfg(target_os = "macos")]
    {
        "libcatboostmodel.dylib"
    }
    #[cfg(target_os = "linux")]
    {
        "libcatboostmodel.so"
    }
}
```
`get_lib_filename` should also key off the Cargo target OS instead of the host.

Same cross-compilation concern as `get_os_name`: `#[cfg(target_os = "...")]` in a build script evaluates to the host OS. Using it to choose the library filename will misbehave when cross-compiling (e.g., targeting Windows from Linux).

You can align this with the suggested `get_os_name` change by either:

- returning the filename directly from `get_os_name` (e.g., `("darwin", "libcatboostmodel.dylib")`), or
- introducing a `CARGO_CFG_TARGET_OS`-based helper, similar to the previous diff, and using that string to pick the correct filename.
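A hypothetical helper combining both options, written as a pure function over the target-OS string (which a build script would read from `CARGO_CFG_TARGET_OS`); the pairing of OS name and filename follows the values quoted in this review:

```rust
// Hypothetical sketch: map the Cargo target OS to both the CatBoost
// platform name and the shared-library filename in one place, so the
// two can never disagree.
fn os_and_lib(target_os: &str) -> Result<(&'static str, &'static str), String> {
    match target_os {
        "macos" => Ok(("darwin", "libcatboostmodel.dylib")),
        "linux" => Ok(("linux", "libcatboostmodel.so")),
        "windows" => Ok(("windows", "catboostmodel.dll")),
        other => Err(format!("Unsupported target OS: {}", other)),
    }
}

fn main() {
    println!("{:?}", os_and_lib("macos"));
    println!("{:?}", os_and_lib("freebsd"));
}
```

In `build.rs` this would be called as `os_and_lib(&std::env::var("CARGO_CFG_TARGET_OS").unwrap())`, keeping both pieces of platform data derived from the target rather than the host.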
🤖 Prompt for AI Agents
In build.rs around lines 43 to 58, get_lib_filename currently uses host-targeted
#[cfg(target_os = "...")] which breaks cross-compilation; replace this logic so
it keys off the Cargo target OS instead (use env::var("CARGO_CFG_TARGET_OS") or
reuse/extend the proposed get_os_name helper to return the target OS or a (os,
filename) pair) and map the target os string ("windows", "macos"/"darwin",
"linux") to the correct filename ("catboostmodel.dll", "libcatboostmodel.dylib",
"libcatboostmodel.so"); ensure the function uses the target string to select the
filename and keep the return type compatible ( &'static str or adjust callers if
you change signature).
```rust
let os = get_os_name();
let arch = get_arch_from_target();
let version = get_catboost_version();
```
Platform selection in `download_compiled_library` depends on the same OS source; align it with the target OS.

The `match (os, arch)` and later `match os` blocks assume `os` is the target OS. With the current `get_os_name` implementation that's only true for native builds. For cross-builds, you'll end up in the wrong arm of the match or in the `_ => Unsupported platform` branches even for valid target triples.

Once `get_os_name()` is switched to derive from the Cargo target (see earlier comment), this function should work as-is again. After that, it might be worth adding a short debug `println!` of `os`/`arch` and `TARGET` to help diagnose platform mismatches when users hit the "Unsupported platform" errors.

Also applies to: 113-187, 189-246
Actionable comments posted: 0
♻️ Duplicate comments (2)
src/polars_ext.rs (2)
79-105: Blocking issue: Cannot handle pure categorical predictions despite API design.

When `float_columns` is empty, line 87 selects zero columns, producing a DataFrame that triggers the "zero columns" error at lines 115-118. This prevents categorical-only predictions even though CatBoost supports them and the API structure suggests this should work.

Apply this diff to allow empty `float_columns`:

```diff
 fn predict_dataframe_with_types(
     &self,
     df: &DataFrame,
     float_columns: &[&str],
     cat_columns: &[&str],
 ) -> CatBoostResult<Vec<f64>> {
-    // Extract float features
-    let float_col_names: Vec<String> = float_columns.iter().map(|s| s.to_string()).collect();
-    let float_df = df.select(float_col_names).map_err(|e| CatBoostError {
-        description: format!("Failed to select float columns: {}", e),
-    })?;
-    let float_features = dataframe_to_float_features(&float_df)?;
+    // Extract float features
+    let float_features = if float_columns.is_empty() {
+        if cat_columns.is_empty() {
+            return Err(CatBoostError {
+                description: "Both float_columns and cat_columns cannot be empty".to_string(),
+            });
+        }
+        vec![vec![]; df.height()]
+    } else {
+        let float_col_names: Vec<String> = float_columns.iter().map(|s| s.to_string()).collect();
+        let float_df = df.select(float_col_names).map_err(|e| CatBoostError {
+            description: format!("Failed to select float columns: {}", e),
+        })?;
+        dataframe_to_float_features(&float_df)?
+    };

     // Extract categorical features
     let cat_features = if cat_columns.is_empty() {
```
302-333: Major issue: Integer-to-string conversion corrupts large values via `f32`.

Lines 314-317 route all integer types through `extract_f32_value` before converting to string. Since `f32` has only a 24-bit mantissa, this corrupts integers beyond ±16,777,216 and mangles unsigned values when cast to `i64`. This produces incorrect categorical codes for CatBoost.

Extract each integer type directly using its dedicated accessor:

```diff
 fn extract_string_value(series: &Series, idx: usize) -> CatBoostResult<String> {
     use DataType::*;
     match series.dtype() {
         String => {
             let ca = series.str().map_err(|e| CatBoostError {
                 description: format!("Failed to cast to String: {}", e),
             })?;
             Ok(get_checked_value!(ca, idx).to_string())
         }
-        // Convert numeric types to strings for categorical features
-        Int8 | Int16 | Int32 | Int64 | UInt8 | UInt16 | UInt32 | UInt64 => {
-            let value = extract_f32_value(series, idx)?;
-            Ok(format!("{}", value as i64))
-        }
+        Int8 => {
+            let ca = series.i8().map_err(|e| CatBoostError {
+                description: format!("Failed to cast to i8: {}", e),
+            })?;
+            Ok(get_checked_value!(ca, idx).to_string())
+        }
+        Int16 => {
+            let ca = series.i16().map_err(|e| CatBoostError {
+                description: format!("Failed to cast to i16: {}", e),
+            })?;
+            Ok(get_checked_value!(ca, idx).to_string())
+        }
+        Int32 => {
+            let ca = series.i32().map_err(|e| CatBoostError {
+                description: format!("Failed to cast to i32: {}", e),
+            })?;
+            Ok(get_checked_value!(ca, idx).to_string())
+        }
+        Int64 => {
+            let ca = series.i64().map_err(|e| CatBoostError {
+                description: format!("Failed to cast to i64: {}", e),
+            })?;
+            Ok(get_checked_value!(ca, idx).to_string())
+        }
+        UInt8 => {
+            let ca = series.u8().map_err(|e| CatBoostError {
+                description: format!("Failed to cast to u8: {}", e),
+            })?;
+            Ok(get_checked_value!(ca, idx).to_string())
+        }
+        UInt16 => {
+            let ca = series.u16().map_err(|e| CatBoostError {
+                description: format!("Failed to cast to u16: {}", e),
+            })?;
+            Ok(get_checked_value!(ca, idx).to_string())
+        }
+        UInt32 => {
+            let ca = series.u32().map_err(|e| CatBoostError {
+                description: format!("Failed to cast to u32: {}", e),
+            })?;
+            Ok(get_checked_value!(ca, idx).to_string())
+        }
+        UInt64 => {
+            let ca = series.u64().map_err(|e| CatBoostError {
+                description: format!("Failed to cast to u64: {}", e),
+            })?;
+            Ok(get_checked_value!(ca, idx).to_string())
+        }
         Boolean => {
             let ca = series.bool().map_err(|e| CatBoostError {
```
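The 24-bit-mantissa hazard described above can be demonstrated with a few lines of plain Rust, independent of Polars or CatBoost:

```rust
// Demonstration of the precision loss: f32 has a 24-bit significand, so
// 2^24 is the largest range in which every integer is exactly
// representable. One past that, the round-trip through f32 changes the
// value, which is exactly what corrupts large categorical codes.
fn main() {
    let exact: i64 = 16_777_216; // 2^24, still exact in f32
    let beyond: i64 = 16_777_217; // 2^24 + 1, rounds when stored in f32
    assert_eq!(exact as f32 as i64, exact);
    assert_ne!(beyond as f32 as i64, beyond);
    println!("{} survives the round-trip; {} does not", exact, beyond);
}
```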
🧹 Nitpick comments (2)
src/polars_ext.rs (2)
44-55: Clarify behavior and constraints in documentation.

The documentation should specify:

- What happens when both `float_columns` and `cat_columns` are empty (currently this causes an error deep in the call stack)
- Whether categorical columns are automatically cast to String or must already be String dtype
- The expected column order in the resulting predictions
66-77: Implicit numeric-only assumption may surprise users.

This method delegates to `predict_dataframe`, which converts all columns to float features. If a user passes String column names, the conversion at line 131 will fail with a Polars cast error. Consider either documenting this constraint or validating column dtypes before selection.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- src/polars_ext.rs (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Test windows (x86_64) - CatBoost 1.2.7
- GitHub Check: Build Docker image
🔇 Additional comments (5)
src/polars_ext.rs (5)
58-64: LGTM! Clean implementation for the pure numeric prediction case.
107-148: LGTM! Efficient column-by-column conversion.

The implementation uses Polars' cast to Float32 for efficient type conversion and processes column-by-column for better cache locality. Null handling is appropriate.
150-204: LGTM! Well-structured with performance optimizations.

The single-row fast path and direct String iteration are good optimizations. The fallback to `extract_string_value` handles other types (though there's a separate issue with numeric conversion flagged below).
224-300: LGTM! Comprehensive numeric type handling.

Covers all Polars numeric types with appropriate conversions. The cast to `f32` is acceptable here since these values are used as float features in CatBoost.
206-222: LGTM! Clean abstraction for value extraction.

The macro eliminates repetition and provides consistent bounds and null checking across all type accessors.
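For readers who haven't seen the pattern being praised, here is a std-only sketch of such a macro over `Vec<Option<T>>` (standing in for a Polars `ChunkedArray`, whose `get` likewise returns an `Option` per cell); the macro name mirrors the review, and the body is illustrative rather than the crate's actual implementation:

```rust
// Std-only sketch of a checked-access macro: one place that handles both
// out-of-bounds indices and null cells, so every type-specific extractor
// gets identical error behavior.
macro_rules! get_checked_value {
    ($ca:expr, $idx:expr) => {
        match $ca.get($idx) {
            Some(Some(v)) => v.clone(),
            Some(None) => return Err(format!("null value at row {}", $idx)),
            None => return Err(format!("row index {} out of bounds", $idx)),
        }
    };
}

fn extract(values: &Vec<Option<f32>>, idx: usize) -> Result<f32, String> {
    Ok(get_checked_value!(values, idx))
}

fn main() {
    let col = vec![Some(1.5f32), None];
    println!("{:?}", extract(&col, 0)); // present value
    println!("{:?}", extract(&col, 1)); // null cell -> error
    println!("{:?}", extract(&col, 2)); // out of bounds -> error
}
```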
Actionable comments posted: 0
♻️ Duplicate comments (2)
src/polars_ext.rs (2)
79-104: The pure-categorical prediction issue persists.

When `float_columns` is empty, `df.select([])` at lines 86-89 produces a zero-column DataFrame, which causes `dataframe_to_float_features` (line 90) to error at lines 115-119 with "DataFrame has zero rows or columns". This prevents purely categorical predictions even though CatBoost supports categorical-only models.

The previous review suggested handling empty `float_columns` by skipping the selection/conversion and creating an empty feature vector directly:

```diff
 ) -> CatBoostResult<Vec<f64>> {
-    // Extract float features
-    let float_col_names: Vec<String> = float_columns.iter().map(|s| s.to_string()).collect();
-    let float_df = df.select(float_col_names).map_err(|e| CatBoostError {
-        description: format!("Failed to select float columns: {}", e),
-    })?;
-    let float_features = dataframe_to_float_features(&float_df)?;
+    // Extract float features
+    let float_features = if float_columns.is_empty() {
+        // No float features - create empty vectors for each row
+        vec![vec![]; df.height()]
+    } else {
+        let float_col_names: Vec<String> = float_columns.iter().map(|s| s.to_string()).collect();
+        let float_df = df.select(float_col_names).map_err(|e| CatBoostError {
+            description: format!("Failed to select float columns: {}", e),
+        })?;
+        dataframe_to_float_features(&float_df)?
+    };

     // Extract categorical features
```

Additionally, add a check for the case where both are empty:

```diff
+    if float_columns.is_empty() && cat_columns.is_empty() {
+        return Err(CatBoostError {
+            description: "Must provide at least one float or categorical column".to_string(),
+        });
+    }
+
     // Extract float features
```
310-313: Numeric-to-string conversion for categoricals still routes through `f32`, causing precision loss.

The code at lines 310-313 extracts integer values via `extract_f32_value` then casts to `i64`. Since `f32` has only a 24-bit mantissa, large integer values (beyond ±16,777,216) are corrupted, and unsigned types are incorrectly cast to signed, producing wrong categorical codes for CatBoost.

The previous review recommended extracting each integer type directly using Polars' native accessors (`.i8()`, `.i16()`, `.i32()`, `.i64()`, `.u8()`, `.u16()`, `.u32()`, `.u64()`), which are already used correctly in `extract_f32_value`. Example fix:

```diff
-        Int8 | Int16 | Int32 | Int64 | UInt8 | UInt16 | UInt32 | UInt64 => {
-            let value = extract_f32_value(series, idx)?;
-            Ok(format!("{}", value as i64))
+        Int8 => {
+            let ca = series.i8().map_err(|e| CatBoostError {
+                description: format!("Failed to cast to i8: {}", e),
+            })?;
+            Ok(get_checked_value!(ca, idx).to_string())
+        }
+        Int16 => {
+            let ca = series.i16().map_err(|e| CatBoostError {
+                description: format!("Failed to cast to i16: {}", e),
+            })?;
+            Ok(get_checked_value!(ca, idx).to_string())
+        }
+        Int32 => {
+            let ca = series.i32().map_err(|e| CatBoostError {
+                description: format!("Failed to cast to i32: {}", e),
+            })?;
+            Ok(get_checked_value!(ca, idx).to_string())
+        }
+        Int64 => {
+            let ca = series.i64().map_err(|e| CatBoostError {
+                description: format!("Failed to cast to i64: {}", e),
+            })?;
+            Ok(get_checked_value!(ca, idx).to_string())
+        }
+        UInt8 => {
+            let ca = series.u8().map_err(|e| CatBoostError {
+                description: format!("Failed to cast to u8: {}", e),
+            })?;
+            Ok(get_checked_value!(ca, idx).to_string())
+        }
+        UInt16 => {
+            let ca = series.u16().map_err(|e| CatBoostError {
+                description: format!("Failed to cast to u16: {}", e),
+            })?;
+            Ok(get_checked_value!(ca, idx).to_string())
+        }
+        UInt32 => {
+            let ca = series.u32().map_err(|e| CatBoostError {
+                description: format!("Failed to cast to u32: {}", e),
+            })?;
+            Ok(get_checked_value!(ca, idx).to_string())
+        }
+        UInt64 => {
+            let ca = series.u64().map_err(|e| CatBoostError {
+                description: format!("Failed to cast to u64: {}", e),
+            })?;
+            Ok(get_checked_value!(ca, idx).to_string())
         }
```
🧹 Nitpick comments (1)
src/polars_ext.rs (1)
107-204: Conversion logic is sound with good performance considerations.

The column-by-column processing for cache locality is well-reasoned, and the null/bounds checking is thorough.

Minor optional improvement: the error messages at lines 115-119 and 156-159 could be more specific; differentiate between "zero rows" and "zero columns" to aid debugging.
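The suggested message split is a one-function change; this is a hedged sketch with a hypothetical helper name and message wording, not the crate's actual error type:

```rust
// Sketch of the suggested improvement: report zero rows and zero columns
// as distinct errors instead of one combined message.
fn check_shape(height: usize, width: usize) -> Result<(), String> {
    if height == 0 {
        return Err("DataFrame has zero rows".to_string());
    }
    if width == 0 {
        return Err("DataFrame has zero columns".to_string());
    }
    Ok(())
}

fn main() {
    println!("{:?}", check_shape(0, 3));
    println!("{:?}", check_shape(4, 0));
    println!("{:?}", check_shape(4, 3));
}
```

Callers then learn immediately which dimension was empty, which matters here because an empty column list usually points at a column-selection bug while zero rows points at the input data.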
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- src/polars_ext.rs (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Build Docker image
🔇 Additional comments (1)
src/polars_ext.rs (1)
1-56: LGTM! Trait definition is well-documented.

The public API surface is clean with clear documentation and appropriate examples. The trait design provides a natural ergonomic interface for Polars users.
Actionable comments posted: 1
🧹 Nitpick comments (2)
src/polars_ext.rs (2)
33-42: Clarify expected column types in documentation.

The documentation for `predict_dataframe_with_columns` doesn't specify what column types are expected or how the method behaves. Since it delegates to `predict_dataframe`, which converts all columns to float features, this should be documented.

Consider adding a note like:

```diff
 /// Predict using specific columns from a Polars DataFrame
 ///
+/// All selected columns will be treated as float features and must be numeric.
+///
 /// # Arguments
 /// * `df` - Input DataFrame
 /// * `columns` - Column names to use as features (in order)
```
124-161: Consider adding a single-row fast path for consistency.

`dataframe_to_cat_features` includes a single-row optimization (lines 176-184), but `dataframe_to_float_features` does not. While the performance benefit may be negligible for float conversion (since we're already doing a bulk cast to Float32), adding a similar fast path would make the functions symmetric.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- src/polars_ext.rs (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Test windows (x86_64) - CatBoost 1.2.7
🔇 Additional comments (2)
src/polars_ext.rs (2)
79-117: LGTM! Pure-categorical prediction handling is correct.

The implementation properly handles the case when `float_columns` is empty by creating empty feature vectors (lines 94-95) rather than attempting to select zero columns. This allows pure-categorical predictions as intended.
324-371: LGTM! Integer-to-string conversion now preserves precision.

The implementation correctly extracts each integer type directly using the appropriate Polars accessor (`.i8()`, `.u64()`, etc.) and converts to string without going through `f32`. This preserves the exact integer values, including large `i64` and `u64` values that would be corrupted by an `f32` intermediary.
```rust
/// # Arguments
/// * `df` - Input DataFrame
/// * `float_columns` - Names of columns to use as float features
/// * `cat_columns` - Names of columns to use as categorical features (must be String type)
```
Update documentation: categorical columns accept more types than stated.
The documentation states categorical columns "must be String type", but the implementation (lines 324-371) also accepts and converts Int8/16/32/64, UInt8/16/32/64, and Boolean types. Update the doc comment to reflect the actual supported types.
Apply this diff:
```diff
- /// * `cat_columns` - Names of columns to use as categorical features (must be String type)
+ /// * `cat_columns` - Names of columns to use as categorical features (String, numeric, or Boolean types)
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```rust
/// * `cat_columns` - Names of columns to use as categorical features (String, numeric, or Boolean types)
```
🤖 Prompt for AI Agents
In src/polars_ext.rs around line 49, the doc comment for `cat_columns`
incorrectly says "must be String type"; update it to list the actual supported
types observed in the implementation (see logic at lines 324-371) — i.e.,
String, signed integers (Int8/16/32/64), unsigned integers (UInt8/16/32/64), and
Boolean — and clarify that these non-string types will be converted to
categorical as handled by the code. Keep the wording concise and accurate to the
implementation.
Summary by CodeRabbit
New Features
Chores