
[codex] update course automation, transformer, and neuron lessons #2

Merged
hghalebi merged 4 commits into main from codex/add-ci-and-gemini-writing-review
Mar 27, 2026

Conversation

hghalebi (Owner) commented Mar 27, 2026

Summary

This PR extends the course from a CI-only update into a larger curriculum and code pass.

It now includes:

  • the CI and Gemini writing-review automation
  • the typed Transformer encoder rewrite and companion crate updates
  • a newly authored neuron module that bridges Rust basics into the first trainable model

What Changed

Course automation

  • added CI checks for lesson structure, local Markdown links, snippet compilation, and transformer crate quality gates
  • added Gemini-based review for English clarity and technical-teaching quality

Transformer module

  • rewrote the Transformer lessons around English -> Algebra -> Rust
  • added semantic-newtype teaching code and thiserror-based diagnostics in the companion crate
  • added the runnable encoder demo and updated references/docs
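As a sketch of what "semantic-newtype teaching code with expressive errors" means in practice, here is a minimal, hypothetical version. The real types live in `code/transformer/src/types.rs` and use `thiserror`; this stand-in hand-rolls the error impl so it compiles with no dependencies, and all names are illustrative.

```rust
use std::fmt;

// Newtypes give otherwise-identical Vec<f32> values distinct roles.
#[derive(Debug, Clone, PartialEq)]
pub struct Query(pub Vec<f32>);

#[derive(Debug, Clone, PartialEq)]
pub struct Key(pub Vec<f32>);

// Simplified stand-in for the crate's ModelError (which derives thiserror::Error).
#[derive(Debug, PartialEq)]
pub enum ModelError {
    DimensionMismatch { operation: &'static str, left: usize, right: usize },
}

impl fmt::Display for ModelError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ModelError::DimensionMismatch { operation, left, right } => write!(
                f,
                "{operation}: left side has {left} elements but right side has {right}"
            ),
        }
    }
}

impl std::error::Error for ModelError {}

/// Fallible dot product: the raw attention score between a Query and a Key.
pub fn score(query: &Query, key: &Key) -> Result<f32, ModelError> {
    if query.0.len() != key.0.len() {
        return Err(ModelError::DimensionMismatch {
            operation: "score",
            left: query.0.len(),
            right: key.0.len(),
        });
    }
    Ok(query.0.iter().zip(&key.0).map(|(q, k)| q * k).sum())
}

fn main() {
    let q = Query(vec![1.0, 0.0]);
    let k = Key(vec![0.5, 2.0]);
    println!("score = {:?}", score(&q, &k)); // Ok(0.5)
}
```

The newtype wrappers make it a compile error to pass a Key where a Query is expected, while the structured error carries the shapes the lessons teach students to read.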

Neuron module

  • promoted lessons/03-neuron from planned to authored
  • added 01-rust-essentials-for-a-tiny-neuron.md
  • added 02-neuron-as-a-chain-of-functions.md
  • added neuron exercises and solutions
  • integrated the tiny-factory metaphor, chain-of-functions framing, and official Rust docs links
  • updated course indexes and authored-module validators accordingly

Validation

I ran the relevant local checks for the authored content and companion code:

  • python3 scripts/check_course_content.py
  • python3 -m py_compile scripts/check_course_content.py scripts/check_lesson_rust_snippets.py scripts/gemini_review_markdown.py
  • python3 scripts/check_lesson_rust_snippets.py
  • cargo fmt --manifest-path code/transformer/Cargo.toml
  • cargo clippy --manifest-path code/transformer/Cargo.toml --all-targets --all-features
  • cargo test --manifest-path code/transformer/Cargo.toml
  • cargo run --example encoder_demo --manifest-path code/transformer/Cargo.toml

Current Results

  • course content checks passed across 39 Markdown files
  • 18 snippets compiled from the authored foundations, vectors, and neuron lessons
  • 29 snippets compiled from the Transformer module
  • transformer crate tests passed: 34/34

Notes

The branch already has an open PR, so this update refreshes the review context instead of opening a duplicate one.

Summary by CodeRabbit

  • New Features

    • Transformer encoder API, multi-head/linear attention variants, and runnable encoder demo
    • Neuron module with two new Rust lessons and exercises
  • Documentation

    • Major rewrite of Transformer lessons (encoder-focused, semantic newtypes, expressive errors)
    • Added external reference to Raschka’s LLMs-from-scratch
  • Refactor

    • Typed math primitives (dense vectors/matrices) with fallible APIs and richer error types
    • Reorganized attention/encoder modules and semantic types
  • Chores

    • CI workflow action version bumps and added error-reporting dependency

coderabbitai (Bot) commented Mar 27, 2026

Caution

Review failed

Pull request was closed or merged during review

📝 Walkthrough

Walkthrough

Replaces panic-based math/nn primitives with fallible, semantic types and expressive errors; adds attention and encoder modules; updates the public API, examples, and lessons to use the new crate; bumps CI action versions and adds thiserror; updates snippet tooling and course docs to reflect encoder-first, typed-Rust teaching.

Changes

Cohort / File(s) / Summary

  • CI & Dependency (.github/workflows/ci.yml, .github/workflows/gemini-writing-review.yml, code/transformer/Cargo.toml): bumped GitHub Action versions (actions/checkout, actions/setup-python, actions/upload-artifact) and added the thiserror = "2" dependency.
  • Core Math & Errors (code/transformer/src/math.rs, code/transformer/src/error.rs): replaced the dynamic Vector/Matrix with DenseVector/DenseMatrix plus fallible constructors/ops returning Result; added the ModelError enum with detailed error variants and statistical helpers.
  • Semantic Types (code/transformer/src/types.rs): introduced semantic newtypes (e.g., TokenEmbedding, Query, Key, Value, TokenSequence, projection/mask wrappers) with accessors and a validated TokenSequence::new.
  • Attention Implementation (code/transformer/src/attention.rs): new attention module with projection layers (QueryLayer, KeyLayer, ValueLayer, OutputLayer), scaled_attention_score, softmax, weighted_sum, AttentionHead, LinearAttentionHead, MultiHeadAttention, concatenation, and many unit tests.
  • Encoder Architecture (code/transformer/src/transformer.rs): replaced the prior Sequence/SelfAttention/TransformerBlock with encoder-focused APIs: PositionalEncodingTable, LayerNorm, FeedForward layers, TransformerEncoderBlock, Encoder, and validated residual helpers (add_sequences, add_token_embeddings).
  • API Surface & Cleanup (code/transformer/src/lib.rs, code/transformer/src/nn.rs): restructured exports into attention, error, and types; removed the old nn.rs (deleting Linear, relu, softmax, phi, layer_norm, FeedForward, StaticLinear) and re-exported the new primitives and error type.
  • Examples & Crate README (code/transformer/examples/encoder_demo.rs, code/transformer/README.md): added a runnable encoder demo; reframed the README as an executable companion listing components, layout, run command, and updated scope/priorities.
  • Transformer Lessons & Exercises (lessons/07-transformer/...): extensive lesson rewrites emphasizing the encoder path, semantic newtypes, Result<ModelError>, and chunked English→Algebra→Rust pedagogy; exercises and solutions re-scoped to the new API and error-driven pedagogy.
  • Neuron Module & Lessons (lessons/03-neuron/*, code/neuron/README.md): added the Neuron track (two lessons, exercises, solutions) and updated the neuron README and status entries.
  • Course Docs & References (README.md, lessons/README.md, references/README.md, references/repos/llms-from-scratch.md, lessons/06-attention/README.md): updated the course map and status, renamed authored lesson titles, reframed the learning strategy, and added the Sebastian Raschka "LLMs From Scratch" reference and usage guidance.
  • Snippet & Validation Scripts (scripts/check_course_content.py, scripts/check_lesson_rust_snippets.py): check_course_content.py adds module-07-specific required headings; check_lesson_rust_snippets.py special-cases neuron blocks, revises which files compile as general snippets, and now checks transformer snippets by creating temp Cargo projects and running cargo check with adjusted env/target handling and reporting.
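The "fallible constructors/ops returning Result" shape described for the math cohort can be sketched as follows. This is a hypothetical, dependency-free stand-in: the real DenseMatrix in code/transformer/src/math.rs returns the crate's ModelError rather than a String, and its field layout may differ.

```rust
// Illustrative sketch of a fallible math primitive; names mirror the crate's
// DenseMatrix but the implementation here is simplified for reading.
#[derive(Debug)]
pub struct DenseMatrix {
    rows: usize,
    cols: usize,
    data: Vec<f32>, // row-major storage
}

impl DenseMatrix {
    /// Rejects empty shapes and data that does not match rows * cols.
    pub fn new(rows: usize, cols: usize, data: Vec<f32>) -> Result<Self, String> {
        if rows == 0 || cols == 0 || data.len() != rows * cols {
            return Err(format!(
                "DenseMatrix::new: expected {rows}x{cols} = {} values, got {}",
                rows * cols,
                data.len()
            ));
        }
        Ok(Self { rows, cols, data })
    }

    /// Matrix-vector product that reports shape problems instead of panicking.
    pub fn mul_vec(&self, vector: &[f32]) -> Result<Vec<f32>, String> {
        if vector.len() != self.cols {
            return Err(format!(
                "mul_vec: matrix has {} columns but vector has {} elements",
                self.cols,
                vector.len()
            ));
        }
        let mut out = vec![0.0; self.rows];
        for (row, slot) in out.iter_mut().enumerate() {
            *slot = (0..self.cols)
                .map(|col| self.data[row * self.cols + col] * vector[col])
                .sum();
        }
        Ok(out)
    }
}

fn main() {
    let m = DenseMatrix::new(2, 2, vec![1.0, 2.0, 3.0, 4.0]).unwrap();
    println!("{:?}", m.mul_vec(&[1.0, 1.0]));      // Ok([3.0, 7.0])
    println!("{:?}", m.mul_vec(&[1.0, 1.0, 1.0])); // Err(shape message)
}
```

Returning Result from both the constructor and the operation is what lets the lessons turn shape bugs into readable diagnostics rather than panics.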

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Demo as EncoderDemo
    participant Pos as PositionalEncodingTable
    participant Encoder as TransformerEncoderBlock
    participant Norm as LayerNorm
    participant MHA as MultiHeadAttention
    participant Head as AttentionHead
    participant FF as FeedForward

    User->>Demo: run encoder_demo
    Demo->>Demo: build TokenSequence (embeddings)
    Demo->>Pos: add_to_sequence(seq)
    Pos-->>Demo: augmented TokenSequence

    Demo->>Encoder: forward(augmented_seq)

    rect rgba(100, 150, 200, 0.5)
    Encoder->>Norm: forward_sequence(input_seq)
    Norm-->>Encoder: normalized_seq
    Encoder->>MHA: forward(normalized_seq)
    MHA->>Head: forward(seq)
    Head-->>MHA: per-token AttentionOutput
    MHA-->>Encoder: attention TokenSequence
    Encoder->>Encoder: add_sequences(input, attention_out)
    end

    rect rgba(150, 200, 100, 0.5)
    Encoder->>Norm: forward_sequence(residual_seq)
    Norm-->>Encoder: normalized_seq2
    Encoder->>FF: forward_sequence(normalized_seq2)
    FF-->>Encoder: ff TokenSequence
    Encoder->>Encoder: add_sequences(residual, ff_out)
    end

    Encoder-->>Demo: final TokenSequence
    Demo->>User: print token vectors

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐰 Hops through types with careful stride,

Errors caught on every side,
Query greets Key with gentle art,
Attention stitches token to part,
From English thought to Rust-made chart.

🚥 Pre-merge checks | ✅ 3 passed

  • Description Check: ✅ Passed (check skipped because CodeRabbit's high-level summary is enabled)
  • Title Check: ✅ Passed. The title accurately summarizes the main changes: course automation (CI and Gemini workflows), transformer lesson content and code, and new neuron lessons.
  • Docstring Coverage: ✅ Passed. Coverage is 87.77%, above the required 80.00% threshold.



gemini-code-assist (Bot) left a comment


Code Review

This pull request refactors the Transformer module to use semantic newtypes and expressive error handling via the thiserror crate. It introduces a more structured teaching approach ('English -> Algebra -> Rust') across the lessons and provides a complete implementation of an encoder block, including multi-head attention, positional encodings, and layer normalization. A new runnable example demonstrates the full forward pass, and validation scripts have been updated to support the new architecture. I have no feedback to provide.


coderabbitai (Bot) left a comment


Actionable comments posted: 11

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@code/transformer/src/attention.rs`:
- Around line 378-380: Add a short doc comment above the phi function explaining
that phi implements a positive feature map (ReLU plus small epsilon) used to
approximate the softmax kernel for linear attention, and note why the epsilon
(1e-6) is added (to avoid exact zeros/ensure numerical stability and safe
normalization). Reference the phi function and DenseVector in the comment and
mention that the epsilon value can be tuned or documented as a stability
constant.
- Around line 363-376: LinearAttentionHead::new currently clones
QueryLayer/KeyLayer/ValueLayer just to call AttentionHead::new and discard the
clones; change the validation to avoid allocation by adding a validation method
that accepts references (e.g., AttentionHead::validate(query: &QueryLayer, key:
&KeyLayer, value: &ValueLayer)) or by overloading AttentionHead::new to take
references, then call that reference-based validator from
LinearAttentionHead::new and remove the unnecessary .clone() calls so the
original query_layer, key_layer, value_layer are kept and returned directly.

In `@code/transformer/src/math.rs`:
- Around line 249-257: The loop in mul_vec uses
out.iter_mut().enumerate().take(self.rows) redundantly because out already has
self.rows elements; remove the .take(self.rows) so the loop becomes for (row,
slot) in out.iter_mut().enumerate() { ... } and keep the inner logic using
self.get(row, col) and vector.get(col) to compute sum and assign *slot = sum,
ensuring behavior and bounds remain the same.
- Around line 62-69: The public methods get and set in the vector wrapper use
direct indexing and will panic on out-of-bounds access; either add safe,
bounds-checked variants (e.g., get_checked returning Option<f32> and set_checked
returning Result<(), Error> or bool) and have get/set call them or explicitly
document the panic behavior in the public API docs for get and set so callers
know these methods may panic on invalid indices; update function
documentation/comments for get and set to state the panic condition and, if you
add checked variants, implement and expose them alongside the existing get/set
methods.

In `@code/transformer/src/transformer.rs`:
- Around line 160-171: The condition in LayerNorm::forward_token is checking
token.len() != self.dimension() AND token.len() != self.beta.len(), but
self.beta.len() == self.dimension(), so remove the redundant second check: in
the forward_token function only compare token.len() to self.dimension() (keep
the existing ModelError::DimensionMismatch block and its labels/right_shape
as-is referencing "gamma/beta" and self.dimension()); this simplifies the
condition while preserving the same error behavior and messaging.
- Around line 397-404: The initial clone in Encoder::forward can be avoided by
replacing the owned TokenSequence initialization with an Option<TokenSequence>
(e.g., let mut current: Option<TokenSequence> = None) and using the incoming
&TokenSequence (x) on the first iteration, then storing subsequent results in
current via current = Some(block.forward(...)?); specifically, for each block in
self.blocks call block.forward(...) with either x (for the first iteration) or
current.as_ref().unwrap() (for later iterations), taking ownership of the
produced TokenSequence into current and returning current.unwrap() at the end;
update the forward method to use this pattern with the existing symbols
(forward, blocks, TokenSequence, ModelError) to eliminate the initial clone.

In `@code/transformer/src/types.rs`:
- Around line 156-159: The public token(&self, index: usize) -> &TokenEmbedding
currently panics on out-of-bounds; change it to a bounds-checked API by
returning Option<&TokenEmbedding> (pub fn token(&self, index: usize) ->
Option<&TokenEmbedding>) and use self.tokens.get(index) inside, or alternatively
add a new method (e.g., pub fn token_checked(...)->Option<&TokenEmbedding>) that
does this and keep the existing method only if you explicitly document its panic
behavior; update callers of token to handle the Option and adjust the doc
comment to clearly state which variant panics vs. which is safe.

In `@lessons/07-transformer/02-typed-rust-transformer-with-linear-attention.md`:
- Around line 385-395: Remove or reformat the commented Rust snippet: either
delete the commented-out Vector and Matrix block or replace it with a plain
explanatory note (e.g., "Future direction: consider using generic Vector<const
N: usize> and Matrix<const R: usize, const C: usize> types") so the lesson no
longer contains unusual `//` comments inside a text code fence; look for the
`Vector` and `Matrix` identifiers in the snippet and update that section
accordingly.

In `@lessons/07-transformer/exercises.md`:
- Around line 26-34: Add a minimal starter snippet/example for Exercise 2 that
shows constructing a three-token TokenSequence with model width 4 to guide
beginners; mention the TokenSequence::new constructor and show (in the snippet)
creating three tokens of equal width and passing them to TokenSequence::new, and
include brief comments answering "why every token needs the same width" and
"what TokenSequence::new would reject" so learners see both usage and the
failure mode.

In `@lessons/README.md`:
- Around line 17-18: Remove the duplicate table row for
"[07-transformer](07-transformer/README.md)"; keep a single entry for that
folder and update its module index/status so the row reads with "Module 6" and
"Authored" (replace the current "Module 6 | Planned" and remove the "Module 7 |
Authored" line), ensuring only one "[07-transformer](07-transformer/README.md) |
Module 6 | Authored | ..." row remains in the lessons table.

In `@scripts/check_lesson_rust_snippets.py`:
- Around line 1005-1007: The hardcoded DEVELOPER_DIR assignment in the env dict
causes non-macOS breakage; update the code that sets env["DEVELOPER_DIR"] so it
only runs on macOS (e.g., check platform.system() == "Darwin" or
sys.platform.startswith("darwin")) and leave env untouched on Linux/other OSes;
you can import platform (or sys) at the top of
scripts/check_lesson_rust_snippets.py and wrap the DEVELOPER_DIR assignment in
that conditional so env and CARGO_TARGET_DIR (referenced near target_dir) remain
unchanged on non-macOS runners.
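The starter snippet requested above for Exercise 2 (a three-token TokenSequence with model width 4) might look like the following. This is a self-contained sketch: the local TokenSequence stand-in only mirrors the crate's validated-constructor shape, and lesson code should import the real type from the transformer crate instead.

```rust
// Hypothetical stand-in for code/transformer/src/types.rs::TokenSequence,
// kept minimal so the example compiles on its own.
#[derive(Debug)]
pub struct TokenSequence {
    tokens: Vec<Vec<f32>>,
}

impl TokenSequence {
    /// Rejects empty sequences, empty tokens, and inconsistent token widths.
    pub fn new(tokens: Vec<Vec<f32>>) -> Result<Self, String> {
        let width = match tokens.first() {
            Some(first) if !first.is_empty() => first.len(),
            _ => return Err("TokenSequence::new: need at least one non-empty token".into()),
        };
        if let Some(bad) = tokens.iter().find(|t| t.len() != width) {
            // Every token must share one width so projections and residual
            // additions line up; this is what TokenSequence::new rejects.
            return Err(format!(
                "TokenSequence::new: expected width {width}, found token of width {}",
                bad.len()
            ));
        }
        Ok(Self { tokens })
    }

    pub fn len(&self) -> usize {
        self.tokens.len()
    }
}

fn main() {
    // Three tokens, model width 4.
    let seq = TokenSequence::new(vec![
        vec![0.1, 0.2, 0.3, 0.4],
        vec![0.5, 0.6, 0.7, 0.8],
        vec![0.9, 1.0, 1.1, 1.2],
    ])
    .unwrap();
    println!("tokens: {}", seq.len()); // tokens: 3

    // A ragged token (width 3 among width-4 tokens) is rejected.
    let bad = TokenSequence::new(vec![vec![1.0, 2.0, 3.0, 4.0], vec![1.0, 2.0, 3.0]]);
    assert!(bad.is_err());
}
```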

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 80b13887-b0b6-47cf-8eb0-999313bcb877

📥 Commits

Reviewing files that changed from the base of the PR and between f520d97 and 34dea97.

📒 Files selected for processing (25)
  • .github/workflows/ci.yml
  • .github/workflows/gemini-writing-review.yml
  • README.md
  • code/transformer/Cargo.toml
  • code/transformer/README.md
  • code/transformer/examples/encoder_demo.rs
  • code/transformer/src/attention.rs
  • code/transformer/src/error.rs
  • code/transformer/src/lib.rs
  • code/transformer/src/math.rs
  • code/transformer/src/nn.rs
  • code/transformer/src/transformer.rs
  • code/transformer/src/types.rs
  • lessons/06-attention/README.md
  • lessons/07-transformer/01-tiny-transformer-from-first-principles.md
  • lessons/07-transformer/02-typed-rust-transformer-with-linear-attention.md
  • lessons/07-transformer/03-transformer-encoder-in-small-chunks.md
  • lessons/07-transformer/README.md
  • lessons/07-transformer/exercises.md
  • lessons/07-transformer/solutions.md
  • lessons/README.md
  • references/README.md
  • references/repos/llms-from-scratch.md
  • scripts/check_course_content.py
  • scripts/check_lesson_rust_snippets.py
💤 Files with no reviewable changes (1)
  • code/transformer/src/nn.rs
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Gemini Review
🧰 Additional context used
🪛 LanguageTool
lessons/07-transformer/solutions.md

[style] ~13-~13: Consider an alternative for the overused word “exactly”.
Context: ...umns must equal vector length. That is exactly the kind of failure message you want wh...

(EXACTLY_PRECISELY)

lessons/07-transformer/02-typed-rust-transformer-with-linear-attention.md

[style] ~10-~10: This phrase can be considered informal. To elevate your writing, consider using a more professional alternative.
Context: ...e giant generic blob - a clean place to talk about linear attention without confusing it w...

(TALK_ABOUT_DISCUSS)


[style] ~95-~95: This phrasing can be overused. Try elevating your writing with a more formal alternative.
Context: ...velopers and researchers. ### Algebra If you want: math y = Wx then the matrix w...

(IF_YOU_WANT)

lessons/07-transformer/03-transformer-encoder-in-small-chunks.md

[style] ~60-~60: You have already used this phrasing in nearby sentences. Consider replacing it to add variety to your writing.
Context: ... Ok(()) } ``` ## Chunk 1: A vector is just a list of numbers ### English A token...

(REP_BE_JUST)

🪛 Ruff (0.15.7)
scripts/check_lesson_rust_snippets.py

[error] 1009-1009: subprocess call: check for execution of untrusted input

(S603)


[error] 1010-1016: Starting a process with a partial executable path

(S607)

🔇 Additional comments (34)
.github/workflows/ci.yml (1)

16-16: Verify that GitHub Actions v6 versions exist.

This workflow updates actions/checkout@v6 and actions/setup-python@v6 across multiple jobs. As of my knowledge cutoff in March 2025, v4 was the latest for checkout and v5 was the latest for setup-python. Please verify that these v6 versions have been released and are stable before merging. The verification script provided in the review comment for .github/workflows/gemini-writing-review.yml will check all action versions used across both workflow files.

Also applies to: 21-21, 41-41, 44-44, 60-60

.github/workflows/gemini-writing-review.yml (1)

32-32: This comment is resolved. All three GitHub Actions have v6 versions available and stable:

  • actions/checkout@v6: Latest v6.0.2 (released 2026-01-09)
  • actions/setup-python@v6: Latest v6.2.0 (released 2026-01-22)
  • actions/upload-artifact@v6: v6.0.0 (released 2025-12-12)

All v6 versions are production-ready. Note that v6 series require GitHub Actions runner v2.327.1+ due to Node.js 24 runtime updates.

lessons/06-attention/README.md (1)

21-21: Good reference addition for Module 5 context.

The new Raschka link is relevant and correctly scoped as supporting material.

code/transformer/Cargo.toml (1)

10-10: thiserror dependency aligns with the new error model.

This matches the crate’s shift to structured ModelError propagation.

references/README.md (1)

15-24: Nice provenance and usage-boundary clarification.

The new section and usage rule reduce ambiguity about how external repos should be used.

scripts/check_course_content.py (1)

100-112: Module-specific section validation is a good fit here.

The split keeps existing checks stable for other modules while enforcing the new 07-transformer structure.

references/repos/llms-from-scratch.md (1)

1-41: Well-scoped external reference entry.

The file clearly communicates relevance and non-template usage, which is exactly what this references area needs.

lessons/07-transformer/03-transformer-encoder-in-small-chunks.md (1)

19-859: Strong rewrite with consistent chunk rhythm and type-safe vocabulary.

The lesson progression is coherent, and the snippets stay aligned with the crate’s semantic API surface.

lessons/07-transformer/solutions.md (1)

3-102: Solutions now map cleanly to the typed transformer implementation.

Good alignment between debugging guidance, shape constraints, and encoder-block execution steps.

lessons/07-transformer/02-typed-rust-transformer-with-linear-attention.md (3)

1-19: LGTM! Clear learning objectives aligned with crate design.

The lesson structure effectively maps the pedagogical goals to the crate's semantic types and error-handling approach.


59-76: LGTM! Clean demonstration of semantic newtypes.

The snippet effectively shows how TokenEmbedding, Query, Key, and Value wrap DenseVector while maintaining distinct semantic roles.


197-218: Both QueryProjection and ProjectionBias are properly re-exported from the crate root.

The imports on lines 198-201 are correct. Both types are re-exported via the pub use types::{..., ProjectionBias, QueryProjection, ...} statement in lib.rs, making them accessible as shown in the code snippet.

lessons/07-transformer/README.md (1)

1-60: LGTM! Clear module overview aligned with lesson restructuring.

The README accurately describes the three complementary teaching modes and lists the updated lesson titles consistently with the actual lesson files.

code/transformer/examples/encoder_demo.rs (1)

1-129: LGTM! Well-structured encoder demo with consistent dimensions.

The example correctly demonstrates:

  • Two attention heads with compatible Q/K/V projections (4→2)
  • Output projection combining concatenated heads (4→4)
  • Feed-forward with hidden expansion (4→6→4)
  • Proper error propagation throughout
README.md (2)

43-45: LGTM! Lesson titles match actual lesson content.

The updated lesson titles ("What Problem the Transformer Solves", "Typed Rust Transformer with Expressive Errors") are consistent with the lesson file headers.


111-117: LGTM! Crate feature list accurately reflects the new API.

The coverage list properly documents the semantic newtypes, thiserror diagnostics, and encoder components.

lessons/07-transformer/01-tiny-transformer-from-first-principles.md (3)

54-68: LGTM! Clean introductory snippet demonstrating TokenSequence.

The code correctly shows creating a TokenSequence with TokenEmbedding wrappers and printing basic properties.


103-123: LGTM! Good demonstration of attention score computation.

The snippet correctly uses scaled_attention_score, AttentionScores, and softmax to show the attention mechanism step by step.


210-212: Path is correct. The command references valid files: encoder_demo.rs exists at code/transformer/examples/encoder_demo.rs and code/transformer/Cargo.toml exists as expected.

scripts/check_lesson_rust_snippets.py (1)

969-1031: LGTM! Sound approach for testing snippets against the local crate.

The refactored function correctly:

  • Creates isolated Cargo projects per snippet
  • Uses a shared target directory for build caching
  • References the local crate via path dependency
  • Reports failures with clear context (file path and block index)

The Ruff security warnings (S603, S607) are acceptable false positives for a build script that only runs controlled commands.

code/transformer/README.md (1)

1-57: LGTM! Clear documentation of crate scope and structure.

The README accurately describes the crate's educational purpose, lists components matching the actual API, and clearly states what's excluded (decoder, dropout, autograd, etc.).

lessons/07-transformer/exercises.md (1)

7-18: LGTM! Exercise 1 effectively demonstrates structured error diagnostics.

The code intentionally creates a dimension mismatch (2×2 matrix with 3-element vector) to trigger ModelError::DimensionMismatch, teaching students to read expressive error messages.

code/transformer/src/error.rs (1)

1-95: LGTM! Well-structured error module with comprehensive diagnostic variants.

The error design follows best practices:

  • Uses thiserror for ergonomic Error trait implementation
  • Provides shape-aware diagnostics with actionable hints
  • Consistent structure across all variants with operation field for traceability

The use of Vec<usize> for shapes does allocate on the error path, but this is acceptable for a teaching crate prioritizing clarity over performance.

code/transformer/src/lib.rs (1)

1-30: LGTM! Clean crate organization with well-structured public API.

The module layout and re-exports provide a coherent public API surface. The crate-level documentation clearly communicates the design philosophy of semantic types, fallible operations, and shape-aware diagnostics.

code/transformer/src/math.rs (1)

263-347: LGTM! Comprehensive test coverage for math primitives.

The tests cover:

  • Empty input rejection
  • Dimension mismatch errors
  • Happy path computations (dot product, matrix multiplication)
  • Ragged row rejection for matrix construction
code/transformer/src/types.rs (2)

6-29: LGTM! Clean macro for generating semantic newtypes.

The vector_role! macro effectively reduces boilerplate while maintaining consistent API across all vector wrapper types.


96-169: LGTM! Well-validated TokenSequence with comprehensive invariant checking.

The constructor enforces:

  • Non-empty sequences
  • Non-zero token dimensions
  • Consistent token widths across the sequence

The map_tokens method elegantly handles fallible transformations while maintaining invariants.

code/transformer/src/attention.rs (4)

11-24: LGTM! Clean validation helper for projection layers.

The validate_projection function correctly validates that the weight matrix output dimension matches the bias length, which is essential for the linear transformation Wx + b.


172-194: LGTM! Numerically stable softmax implementation.

The implementation correctly:

  • Subtracts the maximum value before exponentiation to prevent overflow
  • Checks for empty input
  • Validates the sum is finite and non-zero to catch numerical issues

410-425: Potential numerical concern with outer product accumulation.

The summary matrix accumulates outer products key[row] * value[col] for all key-value pairs. For long sequences, this could accumulate numerical error. The current implementation is correct for a teaching crate, but worth noting.


523-778: LGTM! Comprehensive test coverage for attention primitives.

The tests thoroughly cover:

  • Layer projection correctness
  • Dimension mismatch error reporting
  • Softmax numerical stability with large values
  • Weighted sum computation
  • Single-token reduction properties
  • Multi-head attention validation and forward pass
  • Linear attention permutation equivariance
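The max-subtraction trick praised in the softmax comment above can be sketched in a few lines. This standalone version returns Option rather than the crate's Result<_, ModelError>, so it is an illustration of the technique, not the crate's actual API.

```rust
/// Numerically stable softmax: shift by the max before exponentiating.
fn softmax(scores: &[f32]) -> Option<Vec<f32>> {
    let max = scores.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    if !max.is_finite() {
        return None; // empty input or non-finite scores
    }
    // exp(s - max) never overflows upward: the largest argument is 0.
    let exps: Vec<f32> = scores.iter().map(|s| (s - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    if !sum.is_finite() || sum == 0.0 {
        return None; // catch numerical collapse, as the crate's checks do
    }
    Some(exps.iter().map(|e| e / sum).collect())
}

fn main() {
    // A naive exp(1000.0) would overflow to infinity; the shifted form is fine.
    let weights = softmax(&[1000.0, 1000.0]).unwrap();
    println!("{weights:?}"); // [0.5, 0.5]
    assert!((weights.iter().sum::<f32>() - 1.0).abs() < 1e-6);
}
```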
code/transformer/src/transformer.rs (3)

50-65: Minor: Positional encoding formula differs slightly from the original paper.

The standard "Attention is All You Need" formula uses 2i for both sin and cos at the same index pair. Your implementation applies sin to even indices and cos to odd indices directly, which produces equivalent results but with a different index mapping. This is acceptable for a teaching crate but worth noting in documentation.


359-367: LGTM! Correct encoder block forward pass with pre-norm residual pattern.

The implementation correctly applies, per block:

  1. Layer normalization
  2. Multi-head attention on the normalized sequence
  3. Residual connection (input + attention output)
  4. Layer normalization
  5. Feed-forward network on the normalized sequence
  6. Residual connection

This matches the pre-norm Transformer encoder variant shown in the sequence diagram.


407-627: LGTM! Thorough test suite covering all encoder components.

The tests validate:

  • Positional encoding pattern at position 0
  • Shape preservation through encoding addition
  • Elementwise token addition
  • Sequence length mismatch errors
  • Layer normalization behavior (centering, scaling, constant token handling)
  • Feed-forward shape preservation
  • Encoder block width validation
  • Output finiteness guarantees
  • Multi-block encoder execution
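The index mapping described in the positional-encoding comment above (sin on even dimensions, cos on odd ones) can be sketched as follows. Constants and names are illustrative, not the crate's PositionalEncodingTable API.

```rust
/// Sinusoidal positional encoding, even-index sin / odd-index cos variant.
fn positional_encoding(position: usize, dim: usize) -> Vec<f32> {
    (0..dim)
        .map(|i| {
            // Dimensions (0,1), (2,3), ... share one frequency, as in the
            // original paper's 2i / 2i+1 pairing.
            let pair = (i / 2) as f32;
            let freq = 1.0 / 10000f32.powf(2.0 * pair / dim as f32);
            let angle = position as f32 * freq;
            if i % 2 == 0 { angle.sin() } else { angle.cos() }
        })
        .collect()
}

fn main() {
    // At position 0 every sin term is 0.0 and every cos term is 1.0,
    // matching the "pattern at position 0" test mentioned in the review.
    let pe = positional_encoding(0, 4);
    println!("{pe:?}"); // [0.0, 1.0, 0.0, 1.0]
}
```

Because each pair keeps the paper's frequency schedule, this variant produces the same information as the canonical layout, just interleaved differently along the dimension axis.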

Comment on lines +363 to +376

impl LinearAttentionHead {
    /// Creates one simplified linear-attention head.
    pub fn new(
        query_layer: QueryLayer,
        key_layer: KeyLayer,
        value_layer: ValueLayer,
    ) -> Result<Self, ModelError> {
        AttentionHead::new(query_layer.clone(), key_layer.clone(), value_layer.clone())?;
        Ok(Self {
            query_layer,
            key_layer,
            value_layer,
        })
    }

🧹 Nitpick | 🔵 Trivial

Unnecessary cloning in LinearAttentionHead::new validation.

The layers are cloned just to validate via AttentionHead::new, then the clones are discarded. Consider extracting the validation logic to avoid the allocation.

♻️ Avoid unnecessary clones
     /// Creates one simplified linear-attention head.
     pub fn new(
         query_layer: QueryLayer,
         key_layer: KeyLayer,
         value_layer: ValueLayer,
     ) -> Result<Self, ModelError> {
-        AttentionHead::new(query_layer.clone(), key_layer.clone(), value_layer.clone())?;
+        // Validate layer compatibility without constructing a full AttentionHead
+        if query_layer.input_dim() != key_layer.input_dim()
+            || query_layer.input_dim() != value_layer.input_dim()
+        {
+            return Err(ModelError::InvalidHeadConfiguration {
+                operation: "LinearAttentionHead::new",
+                details: "query, key, and value layers must accept the same token width",
+            });
+        }
+
+        if query_layer.output_dim() != key_layer.output_dim() {
+            return Err(ModelError::InvalidHeadConfiguration {
+                operation: "LinearAttentionHead::new",
+                details: "query and key layers must produce the same head dimension",
+            });
+        }
+
         Ok(Self {
             query_layer,
             key_layer,
             value_layer,
         })
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@code/transformer/src/attention.rs` around lines 363 - 376,
LinearAttentionHead::new currently clones QueryLayer/KeyLayer/ValueLayer just to
call AttentionHead::new and discard the clones; change the validation to avoid
allocation by adding a validation method that accepts references (e.g.,
AttentionHead::validate(query: &QueryLayer, key: &KeyLayer, value: &ValueLayer))
or by overloading AttentionHead::new to take references, then call that
reference-based validator from LinearAttentionHead::new and remove the
unnecessary .clone() calls so the original query_layer, key_layer, value_layer
are kept and returned directly.

Comment on lines +378 to +380
fn phi(vector: &DenseVector) -> DenseVector {
vector.map(|value| value.max(0.0) + 1e-6)
}

🧹 Nitpick | 🔵 Trivial

Consider documenting the phi function's purpose and choice of epsilon.

The phi function implements a positive feature map (ReLU + epsilon) for linear attention. A brief doc comment would help readers understand its role in the kernel approximation.

📝 Add documentation
+    /// Positive feature map for linear attention: ReLU(x) + ε.
+    ///
+    /// The epsilon ensures positivity, which is required for the kernel
+    /// interpretation of linear attention.
     fn phi(vector: &DenseVector) -> DenseVector {
         vector.map(|value| value.max(0.0) + 1e-6)
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@code/transformer/src/attention.rs` around lines 378 - 380, Add a short doc
comment above the phi function explaining that phi implements a positive feature
map (ReLU plus small epsilon) used to approximate the softmax kernel for linear
attention, and note why the epsilon (1e-6) is added (to avoid exact zeros/ensure
numerical stability and safe normalization). Reference the phi function and
DenseVector in the comment and mention that the epsilon value can be tuned or
documented as a stability constant.

Comment on lines +62 to +69
pub fn get(&self, index: usize) -> f32 {
self.0[index]
}

/// Overwrites a single element.
pub fn set(&mut self, index: usize, value: f32) {
self.0[index] = value;
}

🧹 Nitpick | 🔵 Trivial

Consider bounds checking for get and set methods.

These methods use direct indexing which will panic on out-of-bounds access. While internal callers compute indices correctly, the public API exposes this panic risk to users.

For a teaching crate prioritizing clarity, this is acceptable if documented. However, consider adding bounds-checked variants or documenting the panic behavior.

📝 Optional: Add documentation about panic behavior
     /// Reads a single element.
+    ///
+    /// # Panics
+    /// Panics if `index >= self.len()`.
     pub fn get(&self, index: usize) -> f32 {
         self.0[index]
     }

     /// Overwrites a single element.
+    ///
+    /// # Panics
+    /// Panics if `index >= self.len()`.
     pub fn set(&mut self, index: usize, value: f32) {
         self.0[index] = value;
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@code/transformer/src/math.rs` around lines 62 - 69, The public methods get
and set in the vector wrapper use direct indexing and will panic on
out-of-bounds access; either add safe, bounds-checked variants (e.g.,
get_checked returning Option<f32> and set_checked returning Result<(), Error> or
bool) and have get/set call them or explicitly document the panic behavior in
the public API docs for get and set so callers know these methods may panic on
invalid indices; update function documentation/comments for get and set to state
the panic condition and, if you add checked variants, implement and expose them
alongside the existing get/set methods.

Comment on lines +249 to 257
for (row, slot) in out.iter_mut().enumerate().take(self.rows) {
    let mut sum = 0.0;

    for col in 0..self.cols {
        sum += self.get(row, col) * vector.get(col);
    }

    *slot = sum;
}

🧹 Nitpick | 🔵 Trivial

Redundant .take(self.rows) in mul_vec.

The .take(self.rows) is unnecessary since out is already created with exactly self.rows elements, so enumerate() will naturally stop at that boundary.

♻️ Simplify iteration
-        for (row, slot) in out.iter_mut().enumerate().take(self.rows) {
+        for (row, slot) in out.iter_mut().enumerate() {
             let mut sum = 0.0;

             for col in 0..self.cols {
                 sum += self.get(row, col) * vector.get(col);
             }

             *slot = sum;
         }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@code/transformer/src/math.rs` around lines 249 - 257, The loop in mul_vec
uses out.iter_mut().enumerate().take(self.rows) redundantly because out already
has self.rows elements; remove the .take(self.rows) so the loop becomes for
(row, slot) in out.iter_mut().enumerate() { ... } and keep the inner logic using
self.get(row, col) and vector.get(col) to compute sum and assign *slot = sum,
ensuring behavior and bounds remain the same.

Comment on lines +160 to +171
/// Normalizes one token embedding.
pub fn forward_token(&self, token: &TokenEmbedding) -> Result<TokenEmbedding, ModelError> {
if token.len() != self.dimension() || token.len() != self.beta.len() {
return Err(ModelError::DimensionMismatch {
operation: "LayerNorm::forward_token",
left_label: "token",
left_shape: vec![token.len()],
right_label: "gamma/beta",
right_shape: vec![self.dimension()],
hint: "layer norm parameters must match token width",
});
}

🧹 Nitpick | 🔵 Trivial

Redundant check: once token.len() == self.dimension(), token.len() != self.beta.len() is always false.

Since self.gamma.len() == self.beta.len() (both initialized with the same d_model in the constructor), the second half of the condition can never trigger on its own after the token.len() != self.dimension() check.

♻️ Simplify condition
     pub fn forward_token(&self, token: &TokenEmbedding) -> Result<TokenEmbedding, ModelError> {
-        if token.len() != self.dimension() || token.len() != self.beta.len() {
+        if token.len() != self.dimension() {
             return Err(ModelError::DimensionMismatch {
                 operation: "LayerNorm::forward_token",
                 left_label: "token",
                 left_shape: vec![token.len()],
                 right_label: "gamma/beta",
                 right_shape: vec![self.dimension()],
                 hint: "layer norm parameters must match token width",
             });
         }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@code/transformer/src/transformer.rs` around lines 160 - 171, The condition in
LayerNorm::forward_token is checking token.len() != self.dimension() AND
token.len() != self.beta.len(), but self.beta.len() == self.dimension(), so
remove the redundant second check: in the forward_token function only compare
token.len() to self.dimension() (keep the existing ModelError::DimensionMismatch
block and its labels/right_shape as-is referencing "gamma/beta" and
self.dimension()); this simplifies the condition while preserving the same error
behavior and messaging.

Comment on lines +156 to +159
/// Returns one token by index.
pub fn token(&self, index: usize) -> &TokenEmbedding {
&self.tokens[index]
}

🧹 Nitpick | 🔵 Trivial

The token method can panic on out-of-bounds access.

Similar to the math module, this public method uses direct indexing without bounds checking.

📝 Optional: Document panic or add bounds-checked variant
     /// Returns one token by index.
+    ///
+    /// # Panics
+    /// Panics if `index >= self.len()`.
     pub fn token(&self, index: usize) -> &TokenEmbedding {
         &self.tokens[index]
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@code/transformer/src/types.rs` around lines 156 - 159, The public
token(&self, index: usize) -> &TokenEmbedding currently panics on out-of-bounds;
change it to a bounds-checked API by returning Option<&TokenEmbedding> (pub fn
token(&self, index: usize) -> Option<&TokenEmbedding>) and use
self.tokens.get(index) inside, or alternatively add a new method (e.g., pub fn
token_checked(...)->Option<&TokenEmbedding>) that does this and keep the
existing method only if you explicitly document its panic behavior; update
callers of token to handle the Option and adjust the doc comment to clearly
state which variant panics vs. which is safe.

Comment on lines +385 to 395
```text
// Future direction, not the first lesson:
//
// pub struct Vector<const N: usize> {
// data: [f32; N],
// }
//
// pub struct Matrix<const R: usize, const C: usize> {
// data: [[f32; C]; R],
// }
```

🧹 Nitpick | 🔵 Trivial

Minor: Consider removing the commented-out code block or converting to a cleaner note.

The commented Rust code in a text block is fine for illustrating a future direction, but the // comment style inside a text block is slightly unusual. This is acceptable as-is since it's clearly labeled as a "future direction."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lessons/07-transformer/02-typed-rust-transformer-with-linear-attention.md`
around lines 385 - 395, Remove or reformat the commented Rust snippet: either
delete the commented-out Vector and Matrix block or replace it with a plain
explanatory note (e.g., "Future direction: consider using generic Vector<const
N: usize> and Matrix<const R: usize, const C: usize> types") so the lesson no
longer contains unusual `//` comments inside a text code fence; look for the
`Vector` and `Matrix` identifiers in the snippet and update that section
accordingly.

Comment on lines +26 to 34
## Exercise 2: Build a `TokenSequence`

Try a sharper input such as:

```rust
vec![1.0, 0.0, 0.0, 0.0]
```
Create a three-token sequence with model width `4`.

Questions:

- how do the attention scores change?
- does the output become more concentrated or more mixed?
- why does every token need the same width?
- what would `TokenSequence::new` reject?


🧹 Nitpick | 🔵 Trivial

Consider providing a starter snippet for Exercise 2.

Exercises 3-7 are more open-ended, but Exercise 2 could benefit from a minimal code skeleton to help beginners get started, similar to Exercise 1's complete example.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lessons/07-transformer/exercises.md` around lines 26 - 34, Add a minimal
starter snippet/example for Exercise 2 that shows constructing a three-token
TokenSequence with model width 4 to guide beginners; mention the
TokenSequence::new constructor and show (in the snippet) creating three tokens
of equal width and passing them to TokenSequence::new, and include brief
comments answering "why every token needs the same width" and "what
TokenSequence::new would reject" so learners see both usage and the failure
mode.

Comment thread lessons/README.md
Comment on lines 17 to +18
| [07-transformer](07-transformer/README.md) | Module 6 | Planned | Understand the compact attention formulation as a batched version of explicit dot-product loops. |
| [07-transformer](07-transformer/README.md) | Module 7 | Started | Assemble a tiny transformer block, then grow that picture into a standard encoder and a typed Rust linear-attention variant. |
| [07-transformer](07-transformer/README.md) | Module 7 | Authored | Learn the encoder path through semantic Rust types, expressive errors, and an English/Algebra/Rust chunk ladder. |

⚠️ Potential issue | 🟠 Major

Fix duplicated 07-transformer row and incorrect module index.

The table now has two entries for the same folder with conflicting status, and Module 7 breaks the stated Module-0-based numbering. Keep a single 07-transformer row as Module 6 with Authored.

Suggested patch
-| [07-transformer](07-transformer/README.md) | Module 6 | Planned | Understand the compact attention formulation as a batched version of explicit dot-product loops. |
-| [07-transformer](07-transformer/README.md) | Module 7 | Authored | Learn the encoder path through semantic Rust types, expressive errors, and an English/Algebra/Rust chunk ladder. |
+| [07-transformer](07-transformer/README.md) | Module 6 | Authored | Learn the encoder path through semantic Rust types, expressive errors, and an English/Algebra/Rust chunk ladder. |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lessons/README.md` around lines 17 - 18, Remove the duplicate table row for
"[07-transformer](07-transformer/README.md)"; keep a single entry for that
folder and update its module index/status so the row reads with "Module 6" and
"Authored" (replace the current "Module 6 | Planned" and remove the "Module 7 |
Authored" line), ensuring only one "[07-transformer](07-transformer/README.md) |
Module 6 | Authored | ..." row remains in the lessons table.

Comment on lines +1005 to +1007
env = dict(os.environ)
env["CARGO_TARGET_DIR"] = str(target_dir)
env["DEVELOPER_DIR"] = "/Library/Developer/CommandLineTools"

⚠️ Potential issue | 🟡 Minor

Hardcoded DEVELOPER_DIR breaks portability on non-macOS systems.

Line 1007 sets DEVELOPER_DIR to /Library/Developer/CommandLineTools, which is macOS-specific. This will either be ignored or potentially cause issues on Linux CI runners.

Consider conditionally setting this only on macOS:

🛠️ Proposed fix for cross-platform compatibility
             env = dict(os.environ)
             env["CARGO_TARGET_DIR"] = str(target_dir)
-            env["DEVELOPER_DIR"] = "/Library/Developer/CommandLineTools"
+            import sys
+            if sys.platform == "darwin":
+                env["DEVELOPER_DIR"] = "/Library/Developer/CommandLineTools"

Or move the import to the top of the file and use a cleaner conditional.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/check_lesson_rust_snippets.py` around lines 1005 - 1007, The
hardcoded DEVELOPER_DIR assignment in the env dict causes non-macOS breakage;
update the code that sets env["DEVELOPER_DIR"] so it only runs on macOS (e.g.,
check platform.system() == "Darwin" or sys.platform.startswith("darwin")) and
leave env untouched on Linux/other OSes; you can import platform (or sys) at the
top of scripts/check_lesson_rust_snippets.py and wrap the DEVELOPER_DIR
assignment in that conditional so env and CARGO_TARGET_DIR (referenced near
target_dir) remain unchanged on non-macOS runners.

@hghalebi hghalebi changed the title Codex/add ci and gemini writing review [codex] update course automation, transformer, and neuron lessons Mar 27, 2026
@hghalebi hghalebi merged commit c42bccf into main Mar 27, 2026
5 of 6 checks passed