
[codex] update course automation, transformer, and neuron lessons #2

Merged
hghalebi merged 4 commits into main from codex/add-ci-and-gemini-writing-review
Mar 27, 2026

Conversation

hghalebi (Owner) commented Mar 27, 2026

Summary

This PR extends the course from a CI-only update into a larger curriculum and code pass.

It now includes:

  • the CI and Gemini writing-review automation
  • the typed Transformer encoder rewrite and companion crate updates
  • a newly authored neuron module that bridges Rust basics into the first trainable model

What Changed

Course automation

  • added CI checks for lesson structure, local Markdown links, snippet compilation, and transformer crate quality gates
  • added Gemini-based review for English clarity and technical-teaching quality

Transformer module

  • rewrote the Transformer lessons around English -> Algebra -> Rust
  • added semantic-newtype teaching code and thiserror-based diagnostics in the companion crate
  • added the runnable encoder demo and updated references/docs
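As a sketch of what "semantic-newtype teaching code with expressive errors" means in practice, here is a minimal, hypothetical version. The real types live in `code/transformer/src/types.rs` and use `thiserror`; this stand-in hand-rolls the error impl so it compiles with no dependencies, and all names are illustrative.

```rust
use std::fmt;

// Newtypes give otherwise-identical Vec<f32> values distinct roles.
#[derive(Debug, Clone, PartialEq)]
pub struct Query(pub Vec<f32>);

#[derive(Debug, Clone, PartialEq)]
pub struct Key(pub Vec<f32>);

// Simplified stand-in for the crate's ModelError (which derives thiserror::Error).
#[derive(Debug, PartialEq)]
pub enum ModelError {
    DimensionMismatch { operation: &'static str, left: usize, right: usize },
}

impl fmt::Display for ModelError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ModelError::DimensionMismatch { operation, left, right } => write!(
                f,
                "{operation}: left side has {left} elements but right side has {right}"
            ),
        }
    }
}

impl std::error::Error for ModelError {}

/// Fallible dot product: the raw attention score between a Query and a Key.
pub fn score(query: &Query, key: &Key) -> Result<f32, ModelError> {
    if query.0.len() != key.0.len() {
        return Err(ModelError::DimensionMismatch {
            operation: "score",
            left: query.0.len(),
            right: key.0.len(),
        });
    }
    Ok(query.0.iter().zip(&key.0).map(|(q, k)| q * k).sum())
}

fn main() {
    let q = Query(vec![1.0, 0.0]);
    let k = Key(vec![0.5, 2.0]);
    println!("score = {:?}", score(&q, &k)); // Ok(0.5)
}
```

The newtype wrappers make it a compile error to pass a Key where a Query is expected, while the structured error carries the shapes the lessons teach students to read.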

Neuron module

  • promoted lessons/03-neuron from planned to authored
  • added 01-rust-essentials-for-a-tiny-neuron.md
  • added 02-neuron-as-a-chain-of-functions.md
  • added neuron exercises and solutions
  • integrated the tiny-factory metaphor, chain-of-functions framing, and official Rust docs links
  • updated course indexes and authored-module validators accordingly

Validation

I ran the relevant local checks for the authored content and companion code:

  • python3 scripts/check_course_content.py
  • python3 -m py_compile scripts/check_course_content.py scripts/check_lesson_rust_snippets.py scripts/gemini_review_markdown.py
  • python3 scripts/check_lesson_rust_snippets.py
  • cargo fmt --manifest-path code/transformer/Cargo.toml
  • cargo clippy --manifest-path code/transformer/Cargo.toml --all-targets --all-features
  • cargo test --manifest-path code/transformer/Cargo.toml
  • cargo run --example encoder_demo --manifest-path code/transformer/Cargo.toml

Current Results

  • course content checks passed across 39 Markdown files
  • 18 snippets compiled from the authored foundations, vectors, and neuron lessons
  • 29 snippets compiled from the Transformer module
  • transformer crate tests passed: 34/34

Notes

The branch already has an open PR, so this update refreshes the review context instead of opening a duplicate one.

Summary by CodeRabbit

  • New Features

    • Transformer encoder API, multi-head/linear attention variants, and runnable encoder demo
    • Neuron module with two new Rust lessons and exercises
  • Documentation

    • Major rewrite of Transformer lessons (encoder-focused, semantic newtypes, expressive errors)
    • Added external reference to Raschka’s LLMs-from-scratch
  • Refactor

    • Typed math primitives (dense vectors/matrices) with fallible APIs and richer error types
    • Reorganized attention/encoder modules and semantic types
  • Chores

    • CI workflow action version bumps and added error-reporting dependency

coderabbitai (Bot) commented Mar 27, 2026

Caution

Review failed

Pull request was closed or merged during review

📝 Walkthrough

Walkthrough

Replaces panic-based math/nn primitives with fallible, semantic types and expressive errors; adds attention and encoder modules; updates the public API, examples, and lessons to use the new crate; bumps CI action versions and adds thiserror; updates snippet tooling and course docs to reflect encoder-first, typed-Rust teaching.

Changes

Cohort / File(s) / Summary

  • CI & Dependency (.github/workflows/ci.yml, .github/workflows/gemini-writing-review.yml, code/transformer/Cargo.toml): bumped GitHub Action versions (actions/checkout, actions/setup-python, actions/upload-artifact) and added the thiserror = "2" dependency.
  • Core Math & Errors (code/transformer/src/math.rs, code/transformer/src/error.rs): replaced the dynamic Vector/Matrix with DenseVector/DenseMatrix plus fallible constructors/ops returning Result; added the ModelError enum with detailed error variants and statistical helpers.
  • Semantic Types (code/transformer/src/types.rs): introduced semantic newtypes (e.g., TokenEmbedding, Query, Key, Value, TokenSequence, projection/mask wrappers) with accessors and a validated TokenSequence::new.
  • Attention Implementation (code/transformer/src/attention.rs): new attention module with projection layers (QueryLayer, KeyLayer, ValueLayer, OutputLayer), scaled_attention_score, softmax, weighted_sum, AttentionHead, LinearAttentionHead, MultiHeadAttention, concatenation, and many unit tests.
  • Encoder Architecture (code/transformer/src/transformer.rs): replaced the prior Sequence/SelfAttention/TransformerBlock with encoder-focused APIs: PositionalEncodingTable, LayerNorm, FeedForward layers, TransformerEncoderBlock, Encoder, and validated residual helpers (add_sequences, add_token_embeddings).
  • API Surface & Cleanup (code/transformer/src/lib.rs, code/transformer/src/nn.rs): restructured exports into attention, error, and types; removed the old nn.rs (deleting Linear, relu, softmax, phi, layer_norm, FeedForward, StaticLinear) and re-exported the new primitives and error type.
  • Examples & Crate README (code/transformer/examples/encoder_demo.rs, code/transformer/README.md): added a runnable encoder demo; reframed the README as an executable companion listing components, layout, run command, and updated scope/priorities.
  • Transformer Lessons & Exercises (lessons/07-transformer/...): extensive lesson rewrites emphasizing the encoder path, semantic newtypes, Result<ModelError>, and chunked English→Algebra→Rust pedagogy; exercises and solutions re-scoped to the new API and error-driven pedagogy.
  • Neuron Module & Lessons (lessons/03-neuron/*, code/neuron/README.md): added the Neuron track (two lessons, exercises, solutions) and updated the neuron README and status entries.
  • Course Docs & References (README.md, lessons/README.md, references/README.md, references/repos/llms-from-scratch.md, lessons/06-attention/README.md): updated the course map and status, renamed authored lesson titles, reframed the learning strategy, and added the Sebastian Raschka "LLMs From Scratch" reference and usage guidance.
  • Snippet & Validation Scripts (scripts/check_course_content.py, scripts/check_lesson_rust_snippets.py): check_course_content.py adds module-07-specific required headings; check_lesson_rust_snippets.py special-cases neuron blocks, revises which files compile as general snippets, and now checks transformer snippets by creating temp Cargo projects and running cargo check with adjusted env/target handling and reporting.
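The "fallible constructors/ops returning Result" shape described for the math cohort can be sketched as follows. This is a hypothetical, dependency-free stand-in: the real DenseMatrix in code/transformer/src/math.rs returns the crate's ModelError rather than a String, and its field layout may differ.

```rust
// Illustrative sketch of a fallible math primitive; names mirror the crate's
// DenseMatrix but the implementation here is simplified for reading.
#[derive(Debug)]
pub struct DenseMatrix {
    rows: usize,
    cols: usize,
    data: Vec<f32>, // row-major storage
}

impl DenseMatrix {
    /// Rejects empty shapes and data that does not match rows * cols.
    pub fn new(rows: usize, cols: usize, data: Vec<f32>) -> Result<Self, String> {
        if rows == 0 || cols == 0 || data.len() != rows * cols {
            return Err(format!(
                "DenseMatrix::new: expected {rows}x{cols} = {} values, got {}",
                rows * cols,
                data.len()
            ));
        }
        Ok(Self { rows, cols, data })
    }

    /// Matrix-vector product that reports shape problems instead of panicking.
    pub fn mul_vec(&self, vector: &[f32]) -> Result<Vec<f32>, String> {
        if vector.len() != self.cols {
            return Err(format!(
                "mul_vec: matrix has {} columns but vector has {} elements",
                self.cols,
                vector.len()
            ));
        }
        let mut out = vec![0.0; self.rows];
        for (row, slot) in out.iter_mut().enumerate() {
            *slot = (0..self.cols)
                .map(|col| self.data[row * self.cols + col] * vector[col])
                .sum();
        }
        Ok(out)
    }
}

fn main() {
    let m = DenseMatrix::new(2, 2, vec![1.0, 2.0, 3.0, 4.0]).unwrap();
    println!("{:?}", m.mul_vec(&[1.0, 1.0]));      // Ok([3.0, 7.0])
    println!("{:?}", m.mul_vec(&[1.0, 1.0, 1.0])); // Err(shape message)
}
```

Returning Result from both the constructor and the operation is what lets the lessons turn shape bugs into readable diagnostics rather than panics.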

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Demo as EncoderDemo
    participant Pos as PositionalEncodingTable
    participant Encoder as TransformerEncoderBlock
    participant Norm as LayerNorm
    participant MHA as MultiHeadAttention
    participant Head as AttentionHead
    participant FF as FeedForward

    User->>Demo: run encoder_demo
    Demo->>Demo: build TokenSequence (embeddings)
    Demo->>Pos: add_to_sequence(seq)
    Pos-->>Demo: augmented TokenSequence

    Demo->>Encoder: forward(augmented_seq)

    rect rgba(100, 150, 200, 0.5)
    Encoder->>Norm: forward_sequence(input_seq)
    Norm-->>Encoder: normalized_seq
    Encoder->>MHA: forward(normalized_seq)
    MHA->>Head: forward(seq)
    Head-->>MHA: per-token AttentionOutput
    MHA-->>Encoder: attention TokenSequence
    Encoder->>Encoder: add_sequences(input, attention_out)
    end

    rect rgba(150, 200, 100, 0.5)
    Encoder->>Norm: forward_sequence(residual_seq)
    Norm-->>Encoder: normalized_seq2
    Encoder->>FF: forward_sequence(normalized_seq2)
    FF-->>Encoder: ff TokenSequence
    Encoder->>Encoder: add_sequences(residual, ff_out)
    end

    Encoder-->>Demo: final TokenSequence
    Demo->>User: print token vectors

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐰 Hops through types with careful stride,

Errors caught on every side,
Query greets Key with gentle art,
Attention stitches token to part,
From English thought to Rust-made chart.

🚥 Pre-merge checks | ✅ 3 passed

  • Description Check: ✅ Passed (check skipped because CodeRabbit's high-level summary is enabled)
  • Title Check: ✅ Passed. The title accurately summarizes the main changes: course automation (CI and Gemini workflows), transformer lesson content and code, and new neuron lessons.
  • Docstring Coverage: ✅ Passed. Coverage is 87.77%, above the required 80.00% threshold.



gemini-code-assist (Bot) left a comment


Code Review

This pull request refactors the Transformer module to use semantic newtypes and expressive error handling via the thiserror crate. It introduces a more structured teaching approach ('English -> Algebra -> Rust') across the lessons and provides a complete implementation of an encoder block, including multi-head attention, positional encodings, and layer normalization. A new runnable example demonstrates the full forward pass, and validation scripts have been updated to support the new architecture. I have no feedback to provide.


coderabbitai (Bot) left a comment


Actionable comments posted: 11

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@code/transformer/src/attention.rs`:
- Around line 378-380: Add a short doc comment above the phi function explaining
that phi implements a positive feature map (ReLU plus small epsilon) used to
approximate the softmax kernel for linear attention, and note why the epsilon
(1e-6) is added (to avoid exact zeros/ensure numerical stability and safe
normalization). Reference the phi function and DenseVector in the comment and
mention that the epsilon value can be tuned or documented as a stability
constant.
- Around line 363-376: LinearAttentionHead::new currently clones
QueryLayer/KeyLayer/ValueLayer just to call AttentionHead::new and discard the
clones; change the validation to avoid allocation by adding a validation method
that accepts references (e.g., AttentionHead::validate(query: &QueryLayer, key:
&KeyLayer, value: &ValueLayer)) or by overloading AttentionHead::new to take
references, then call that reference-based validator from
LinearAttentionHead::new and remove the unnecessary .clone() calls so the
original query_layer, key_layer, value_layer are kept and returned directly.

In `@code/transformer/src/math.rs`:
- Around line 249-257: The loop in mul_vec uses
out.iter_mut().enumerate().take(self.rows) redundantly because out already has
self.rows elements; remove the .take(self.rows) so the loop becomes for (row,
slot) in out.iter_mut().enumerate() { ... } and keep the inner logic using
self.get(row, col) and vector.get(col) to compute sum and assign *slot = sum,
ensuring behavior and bounds remain the same.
- Around line 62-69: The public methods get and set in the vector wrapper use
direct indexing and will panic on out-of-bounds access; either add safe,
bounds-checked variants (e.g., get_checked returning Option<f32> and set_checked
returning Result<(), Error> or bool) and have get/set call them or explicitly
document the panic behavior in the public API docs for get and set so callers
know these methods may panic on invalid indices; update function
documentation/comments for get and set to state the panic condition and, if you
add checked variants, implement and expose them alongside the existing get/set
methods.

In `@code/transformer/src/transformer.rs`:
- Around line 160-171: The condition in LayerNorm::forward_token is checking
token.len() != self.dimension() AND token.len() != self.beta.len(), but
self.beta.len() == self.dimension(), so remove the redundant second check: in
the forward_token function only compare token.len() to self.dimension() (keep
the existing ModelError::DimensionMismatch block and its labels/right_shape
as-is referencing "gamma/beta" and self.dimension()); this simplifies the
condition while preserving the same error behavior and messaging.
- Around line 397-404: The initial clone in Encoder::forward can be avoided by
replacing the owned TokenSequence initialization with an Option<TokenSequence>
(e.g., let mut current: Option<TokenSequence> = None) and using the incoming
&TokenSequence (x) on the first iteration, then storing subsequent results in
current via current = Some(block.forward(...)?); specifically, for each block in
self.blocks call block.forward(...) with either x (for the first iteration) or
current.as_ref().unwrap() (for later iterations), taking ownership of the
produced TokenSequence into current and returning current.unwrap() at the end;
update the forward method to use this pattern with the existing symbols
(forward, blocks, TokenSequence, ModelError) to eliminate the initial clone.

In `@code/transformer/src/types.rs`:
- Around line 156-159: The public token(&self, index: usize) -> &TokenEmbedding
currently panics on out-of-bounds; change it to a bounds-checked API by
returning Option<&TokenEmbedding> (pub fn token(&self, index: usize) ->
Option<&TokenEmbedding>) and use self.tokens.get(index) inside, or alternatively
add a new method (e.g., pub fn token_checked(...)->Option<&TokenEmbedding>) that
does this and keep the existing method only if you explicitly document its panic
behavior; update callers of token to handle the Option and adjust the doc
comment to clearly state which variant panics vs. which is safe.

In `@lessons/07-transformer/02-typed-rust-transformer-with-linear-attention.md`:
- Around line 385-395: Remove or reformat the commented Rust snippet: either
delete the commented-out Vector and Matrix block or replace it with a plain
explanatory note (e.g., "Future direction: consider using generic Vector<const
N: usize> and Matrix<const R: usize, const C: usize> types") so the lesson no
longer contains unusual `//` comments inside a text code fence; look for the
`Vector` and `Matrix` identifiers in the snippet and update that section
accordingly.

In `@lessons/07-transformer/exercises.md`:
- Around line 26-34: Add a minimal starter snippet/example for Exercise 2 that
shows constructing a three-token TokenSequence with model width 4 to guide
beginners; mention the TokenSequence::new constructor and show (in the snippet)
creating three tokens of equal width and passing them to TokenSequence::new, and
include brief comments answering "why every token needs the same width" and
"what TokenSequence::new would reject" so learners see both usage and the
failure mode.

In `@lessons/README.md`:
- Around line 17-18: Remove the duplicate table row for
"[07-transformer](07-transformer/README.md)"; keep a single entry for that
folder and update its module index/status so the row reads with "Module 6" and
"Authored" (replace the current "Module 6 | Planned" and remove the "Module 7 |
Authored" line), ensuring only one "[07-transformer](07-transformer/README.md) |
Module 6 | Authored | ..." row remains in the lessons table.

In `@scripts/check_lesson_rust_snippets.py`:
- Around line 1005-1007: The hardcoded DEVELOPER_DIR assignment in the env dict
causes non-macOS breakage; update the code that sets env["DEVELOPER_DIR"] so it
only runs on macOS (e.g., check platform.system() == "Darwin" or
sys.platform.startswith("darwin")) and leave env untouched on Linux/other OSes;
you can import platform (or sys) at the top of
scripts/check_lesson_rust_snippets.py and wrap the DEVELOPER_DIR assignment in
that conditional so env and CARGO_TARGET_DIR (referenced near target_dir) remain
unchanged on non-macOS runners.
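The starter snippet requested above for Exercise 2 (a three-token TokenSequence with model width 4) might look like the following. This is a self-contained sketch: the local TokenSequence stand-in only mirrors the crate's validated-constructor shape, and lesson code should import the real type from the transformer crate instead.

```rust
// Hypothetical stand-in for code/transformer/src/types.rs::TokenSequence,
// kept minimal so the example compiles on its own.
#[derive(Debug)]
pub struct TokenSequence {
    tokens: Vec<Vec<f32>>,
}

impl TokenSequence {
    /// Rejects empty sequences, empty tokens, and inconsistent token widths.
    pub fn new(tokens: Vec<Vec<f32>>) -> Result<Self, String> {
        let width = match tokens.first() {
            Some(first) if !first.is_empty() => first.len(),
            _ => return Err("TokenSequence::new: need at least one non-empty token".into()),
        };
        if let Some(bad) = tokens.iter().find(|t| t.len() != width) {
            // Every token must share one width so projections and residual
            // additions line up; this is what TokenSequence::new rejects.
            return Err(format!(
                "TokenSequence::new: expected width {width}, found token of width {}",
                bad.len()
            ));
        }
        Ok(Self { tokens })
    }

    pub fn len(&self) -> usize {
        self.tokens.len()
    }
}

fn main() {
    // Three tokens, model width 4.
    let seq = TokenSequence::new(vec![
        vec![0.1, 0.2, 0.3, 0.4],
        vec![0.5, 0.6, 0.7, 0.8],
        vec![0.9, 1.0, 1.1, 1.2],
    ])
    .unwrap();
    println!("tokens: {}", seq.len()); // tokens: 3

    // A ragged token (width 3 among width-4 tokens) is rejected.
    let bad = TokenSequence::new(vec![vec![1.0, 2.0, 3.0, 4.0], vec![1.0, 2.0, 3.0]]);
    assert!(bad.is_err());
}
```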

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 80b13887-b0b6-47cf-8eb0-999313bcb877

📥 Commits

Reviewing files that changed from the base of the PR and between f520d97 and 34dea97.

📒 Files selected for processing (25)
  • .github/workflows/ci.yml
  • .github/workflows/gemini-writing-review.yml
  • README.md
  • code/transformer/Cargo.toml
  • code/transformer/README.md
  • code/transformer/examples/encoder_demo.rs
  • code/transformer/src/attention.rs
  • code/transformer/src/error.rs
  • code/transformer/src/lib.rs
  • code/transformer/src/math.rs
  • code/transformer/src/nn.rs
  • code/transformer/src/transformer.rs
  • code/transformer/src/types.rs
  • lessons/06-attention/README.md
  • lessons/07-transformer/01-tiny-transformer-from-first-principles.md
  • lessons/07-transformer/02-typed-rust-transformer-with-linear-attention.md
  • lessons/07-transformer/03-transformer-encoder-in-small-chunks.md
  • lessons/07-transformer/README.md
  • lessons/07-transformer/exercises.md
  • lessons/07-transformer/solutions.md
  • lessons/README.md
  • references/README.md
  • references/repos/llms-from-scratch.md
  • scripts/check_course_content.py
  • scripts/check_lesson_rust_snippets.py
💤 Files with no reviewable changes (1)
  • code/transformer/src/nn.rs
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Gemini Review
🧰 Additional context used
🪛 LanguageTool
lessons/07-transformer/solutions.md

[style] ~13-~13: Consider an alternative for the overused word “exactly”.
Context: ...umns must equal vector length. That is exactly the kind of failure message you want wh...

(EXACTLY_PRECISELY)

lessons/07-transformer/02-typed-rust-transformer-with-linear-attention.md

[style] ~10-~10: This phrase can be considered informal. To elevate your writing, consider using a more professional alternative.
Context: ...e giant generic blob - a clean place to talk about linear attention without confusing it w...

(TALK_ABOUT_DISCUSS)


[style] ~95-~95: This phrasing can be overused. Try elevating your writing with a more formal alternative.
Context: ...velopers and researchers. ### Algebra If you want: math y = Wx then the matrix w...

(IF_YOU_WANT)

lessons/07-transformer/03-transformer-encoder-in-small-chunks.md

[style] ~60-~60: You have already used this phrasing in nearby sentences. Consider replacing it to add variety to your writing.
Context: ... Ok(()) } ``` ## Chunk 1: A vector is just a list of numbers ### English A token...

(REP_BE_JUST)

🪛 Ruff (0.15.7)
scripts/check_lesson_rust_snippets.py

[error] 1009-1009: subprocess call: check for execution of untrusted input

(S603)


[error] 1010-1016: Starting a process with a partial executable path

(S607)

🔇 Additional comments (34)
.github/workflows/ci.yml (1)

16-16: Verify that GitHub Actions v6 versions exist.

This workflow updates actions/checkout@v6 and actions/setup-python@v6 across multiple jobs. As of my knowledge cutoff in March 2025, v4 was the latest for checkout and v5 was the latest for setup-python. Please verify that these v6 versions have been released and are stable before merging. The verification script provided in the review comment for .github/workflows/gemini-writing-review.yml will check all action versions used across both workflow files.

Also applies to: 21-21, 41-41, 44-44, 60-60

.github/workflows/gemini-writing-review.yml (1)

32-32: This comment is resolved. All three GitHub Actions have v6 versions available and stable:

  • actions/checkout@v6: Latest v6.0.2 (released 2026-01-09)
  • actions/setup-python@v6: Latest v6.2.0 (released 2026-01-22)
  • actions/upload-artifact@v6: v6.0.0 (released 2025-12-12)

All v6 versions are production-ready. Note that v6 series require GitHub Actions runner v2.327.1+ due to Node.js 24 runtime updates.

lessons/06-attention/README.md (1)

21-21: Good reference addition for Module 5 context.

The new Raschka link is relevant and correctly scoped as supporting material.

code/transformer/Cargo.toml (1)

10-10: thiserror dependency aligns with the new error model.

This matches the crate’s shift to structured ModelError propagation.

references/README.md (1)

15-24: Nice provenance and usage-boundary clarification.

The new section and usage rule reduce ambiguity about how external repos should be used.

scripts/check_course_content.py (1)

100-112: Module-specific section validation is a good fit here.

The split keeps existing checks stable for other modules while enforcing the new 07-transformer structure.

references/repos/llms-from-scratch.md (1)

1-41: Well-scoped external reference entry.

The file clearly communicates relevance and non-template usage, which is exactly what this references area needs.

lessons/07-transformer/03-transformer-encoder-in-small-chunks.md (1)

19-859: Strong rewrite with consistent chunk rhythm and type-safe vocabulary.

The lesson progression is coherent, and the snippets stay aligned with the crate’s semantic API surface.

lessons/07-transformer/solutions.md (1)

3-102: Solutions now map cleanly to the typed transformer implementation.

Good alignment between debugging guidance, shape constraints, and encoder-block execution steps.

lessons/07-transformer/02-typed-rust-transformer-with-linear-attention.md (3)

1-19: LGTM! Clear learning objectives aligned with crate design.

The lesson structure effectively maps the pedagogical goals to the crate's semantic types and error-handling approach.


59-76: LGTM! Clean demonstration of semantic newtypes.

The snippet effectively shows how TokenEmbedding, Query, Key, and Value wrap DenseVector while maintaining distinct semantic roles.


197-218: Both QueryProjection and ProjectionBias are properly re-exported from the crate root.

The imports on lines 198-201 are correct. Both types are re-exported via the pub use types::{..., ProjectionBias, QueryProjection, ...} statement in lib.rs, making them accessible as shown in the code snippet.

lessons/07-transformer/README.md (1)

1-60: LGTM! Clear module overview aligned with lesson restructuring.

The README accurately describes the three complementary teaching modes and lists the updated lesson titles consistently with the actual lesson files.

code/transformer/examples/encoder_demo.rs (1)

1-129: LGTM! Well-structured encoder demo with consistent dimensions.

The example correctly demonstrates:

  • Two attention heads with compatible Q/K/V projections (4→2)
  • Output projection combining concatenated heads (4→4)
  • Feed-forward with hidden expansion (4→6→4)
  • Proper error propagation throughout
README.md (2)

43-45: LGTM! Lesson titles match actual lesson content.

The updated lesson titles ("What Problem the Transformer Solves", "Typed Rust Transformer with Expressive Errors") are consistent with the lesson file headers.


111-117: LGTM! Crate feature list accurately reflects the new API.

The coverage list properly documents the semantic newtypes, thiserror diagnostics, and encoder components.

lessons/07-transformer/01-tiny-transformer-from-first-principles.md (3)

54-68: LGTM! Clean introductory snippet demonstrating TokenSequence.

The code correctly shows creating a TokenSequence with TokenEmbedding wrappers and printing basic properties.


103-123: LGTM! Good demonstration of attention score computation.

The snippet correctly uses scaled_attention_score, AttentionScores, and softmax to show the attention mechanism step by step.


210-212: Path is correct. The command references valid files: encoder_demo.rs exists at code/transformer/examples/encoder_demo.rs and code/transformer/Cargo.toml exists as expected.

scripts/check_lesson_rust_snippets.py (1)

969-1031: LGTM! Sound approach for testing snippets against the local crate.

The refactored function correctly:

  • Creates isolated Cargo projects per snippet
  • Uses a shared target directory for build caching
  • References the local crate via path dependency
  • Reports failures with clear context (file path and block index)

The Ruff security warnings (S603, S607) are acceptable false positives for a build script that only runs controlled commands.

code/transformer/README.md (1)

1-57: LGTM! Clear documentation of crate scope and structure.

The README accurately describes the crate's educational purpose, lists components matching the actual API, and clearly states what's excluded (decoder, dropout, autograd, etc.).

lessons/07-transformer/exercises.md (1)

7-18: LGTM! Exercise 1 effectively demonstrates structured error diagnostics.

The code intentionally creates a dimension mismatch (2×2 matrix with 3-element vector) to trigger ModelError::DimensionMismatch, teaching students to read expressive error messages.

code/transformer/src/error.rs (1)

1-95: LGTM! Well-structured error module with comprehensive diagnostic variants.

The error design follows best practices:

  • Uses thiserror for ergonomic Error trait implementation
  • Provides shape-aware diagnostics with actionable hints
  • Consistent structure across all variants with operation field for traceability

The use of Vec<usize> for shapes does allocate on the error path, but this is acceptable for a teaching crate prioritizing clarity over performance.

code/transformer/src/lib.rs (1)

1-30: LGTM! Clean crate organization with well-structured public API.

The module layout and re-exports provide a coherent public API surface. The crate-level documentation clearly communicates the design philosophy of semantic types, fallible operations, and shape-aware diagnostics.

code/transformer/src/math.rs (1)

263-347: LGTM! Comprehensive test coverage for math primitives.

The tests cover:

  • Empty input rejection
  • Dimension mismatch errors
  • Happy path computations (dot product, matrix multiplication)
  • Ragged row rejection for matrix construction
code/transformer/src/types.rs (2)

6-29: LGTM! Clean macro for generating semantic newtypes.

The vector_role! macro effectively reduces boilerplate while maintaining consistent API across all vector wrapper types.


96-169: LGTM! Well-validated TokenSequence with comprehensive invariant checking.

The constructor enforces:

  • Non-empty sequences
  • Non-zero token dimensions
  • Consistent token widths across the sequence

The map_tokens method elegantly handles fallible transformations while maintaining invariants.

code/transformer/src/attention.rs (4)

11-24: LGTM! Clean validation helper for projection layers.

The validate_projection function correctly validates that the weight matrix output dimension matches the bias length, which is essential for the linear transformation Wx + b.


172-194: LGTM! Numerically stable softmax implementation.

The implementation correctly:

  • Subtracts the maximum value before exponentiation to prevent overflow
  • Checks for empty input
  • Validates the sum is finite and non-zero to catch numerical issues

410-425: Potential numerical concern with outer product accumulation.

The summary matrix accumulates outer products key[row] * value[col] for all key-value pairs. For long sequences, this could accumulate numerical error. The current implementation is correct for a teaching crate, but worth noting.


523-778: LGTM! Comprehensive test coverage for attention primitives.

The tests thoroughly cover:

  • Layer projection correctness
  • Dimension mismatch error reporting
  • Softmax numerical stability with large values
  • Weighted sum computation
  • Single-token reduction properties
  • Multi-head attention validation and forward pass
  • Linear attention permutation equivariance
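The max-subtraction trick praised in the softmax comment above can be sketched in a few lines. This standalone version returns Option rather than the crate's Result<_, ModelError>, so it is an illustration of the technique, not the crate's actual API.

```rust
/// Numerically stable softmax: shift by the max before exponentiating.
fn softmax(scores: &[f32]) -> Option<Vec<f32>> {
    let max = scores.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    if !max.is_finite() {
        return None; // empty input or non-finite scores
    }
    // exp(s - max) never overflows upward: the largest argument is 0.
    let exps: Vec<f32> = scores.iter().map(|s| (s - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    if !sum.is_finite() || sum == 0.0 {
        return None; // catch numerical collapse, as the crate's checks do
    }
    Some(exps.iter().map(|e| e / sum).collect())
}

fn main() {
    // A naive exp(1000.0) would overflow to infinity; the shifted form is fine.
    let weights = softmax(&[1000.0, 1000.0]).unwrap();
    println!("{weights:?}"); // [0.5, 0.5]
    assert!((weights.iter().sum::<f32>() - 1.0).abs() < 1e-6);
}
```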
code/transformer/src/transformer.rs (3)

50-65: Minor: Positional encoding formula differs slightly from the original paper.

The standard "Attention is All You Need" formula uses 2i for both sin and cos at the same index pair. Your implementation applies sin to even indices and cos to odd indices directly, which produces equivalent results but with a different index mapping. This is acceptable for a teaching crate but worth noting in documentation.


359-367: LGTM! Correct encoder block forward pass with pre-norm residual pattern.

The implementation correctly applies, per block:

  1. Layer normalization
  2. Multi-head attention on the normalized sequence
  3. Residual connection (input + attention output)
  4. Layer normalization
  5. Feed-forward network on the normalized sequence
  6. Residual connection

This matches the pre-norm Transformer encoder variant shown in the sequence diagram.


407-627: LGTM! Thorough test suite covering all encoder components.

The tests validate:

  • Positional encoding pattern at position 0
  • Shape preservation through encoding addition
  • Elementwise token addition
  • Sequence length mismatch errors
  • Layer normalization behavior (centering, scaling, constant token handling)
  • Feed-forward shape preservation
  • Encoder block width validation
  • Output finiteness guarantees
  • Multi-block encoder execution
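The index mapping described in the positional-encoding comment above (sin on even dimensions, cos on odd ones) can be sketched as follows. Constants and names are illustrative, not the crate's PositionalEncodingTable API.

```rust
/// Sinusoidal positional encoding, even-index sin / odd-index cos variant.
fn positional_encoding(position: usize, dim: usize) -> Vec<f32> {
    (0..dim)
        .map(|i| {
            // Dimensions (0,1), (2,3), ... share one frequency, as in the
            // original paper's 2i / 2i+1 pairing.
            let pair = (i / 2) as f32;
            let freq = 1.0 / 10000f32.powf(2.0 * pair / dim as f32);
            let angle = position as f32 * freq;
            if i % 2 == 0 { angle.sin() } else { angle.cos() }
        })
        .collect()
}

fn main() {
    // At position 0 every sin term is 0.0 and every cos term is 1.0,
    // matching the "pattern at position 0" test mentioned in the review.
    let pe = positional_encoding(0, 4);
    println!("{pe:?}"); // [0.0, 1.0, 0.0, 1.0]
}
```

Because each pair keeps the paper's frequency schedule, this variant produces the same information as the canonical layout, just interleaved differently along the dimension axis.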

Comment on lines +363 to +376

impl LinearAttentionHead {
    /// Creates one simplified linear-attention head.
    pub fn new(
        query_layer: QueryLayer,
        key_layer: KeyLayer,
        value_layer: ValueLayer,
    ) -> Result<Self, ModelError> {
        AttentionHead::new(query_layer.clone(), key_layer.clone(), value_layer.clone())?;
        Ok(Self {
            query_layer,
            key_layer,
            value_layer,
        })
    }

🧹 Nitpick | 🔵 Trivial

Unnecessary cloning in LinearAttentionHead::new validation.

The layers are cloned just to validate via AttentionHead::new, then the clones are discarded. Consider extracting the validation logic to avoid the allocation.

♻️ Avoid unnecessary clones
     /// Creates one simplified linear-attention head.
     pub fn new(
         query_layer: QueryLayer,
         key_layer: KeyLayer,
         value_layer: ValueLayer,
     ) -> Result<Self, ModelError> {
-        AttentionHead::new(query_layer.clone(), key_layer.clone(), value_layer.clone())?;
+        // Validate layer compatibility without constructing a full AttentionHead
+        if query_layer.input_dim() != key_layer.input_dim()
+            || query_layer.input_dim() != value_layer.input_dim()
+        {
+            return Err(ModelError::InvalidHeadConfiguration {
+                operation: "LinearAttentionHead::new",
+                details: "query, key, and value layers must accept the same token width",
+            });
+        }
+
+        if query_layer.output_dim() != key_layer.output_dim() {
+            return Err(ModelError::InvalidHeadConfiguration {
+                operation: "LinearAttentionHead::new",
+                details: "query and key layers must produce the same head dimension",
+            });
+        }
+
         Ok(Self {
             query_layer,
             key_layer,
             value_layer,
         })
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@code/transformer/src/attention.rs` around lines 363 - 376,
LinearAttentionHead::new currently clones QueryLayer/KeyLayer/ValueLayer just to
call AttentionHead::new and discard the clones; change the validation to avoid
allocation by adding a validation method that accepts references (e.g.,
AttentionHead::validate(query: &QueryLayer, key: &KeyLayer, value: &ValueLayer))
or by overloading AttentionHead::new to take references, then call that
reference-based validator from LinearAttentionHead::new and remove the
unnecessary .clone() calls so the original query_layer, key_layer, value_layer
are kept and returned directly.

Comment on lines +378 to +380
fn phi(vector: &DenseVector) -> DenseVector {
vector.map(|value| value.max(0.0) + 1e-6)
}

🧹 Nitpick | 🔵 Trivial

Consider documenting the phi function's purpose and choice of epsilon.

The phi function implements a positive feature map (ReLU + epsilon) for linear attention. A brief doc comment would help readers understand its role in the kernel approximation.

📝 Add documentation
+    /// Positive feature map for linear attention: ReLU(x) + ε.
+    ///
+    /// The epsilon ensures positivity, which is required for the kernel
+    /// interpretation of linear attention.
     fn phi(vector: &DenseVector) -> DenseVector {
         vector.map(|value| value.max(0.0) + 1e-6)
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@code/transformer/src/attention.rs` around lines 378 - 380, Add a short doc
comment above the phi function explaining that phi implements a positive feature
map (ReLU plus small epsilon) used to approximate the softmax kernel for linear
attention, and note why the epsilon (1e-6) is added (to avoid exact zeros/ensure
numerical stability and safe normalization). Reference the phi function and
DenseVector in the comment and mention that the epsilon value can be tuned or
documented as a stability constant.

Comment on lines +62 to +69
pub fn get(&self, index: usize) -> f32 {
self.0[index]
}

/// Overwrites a single element.
pub fn set(&mut self, index: usize, value: f32) {
self.0[index] = value;
}

🧹 Nitpick | 🔵 Trivial

Consider bounds checking for get and set methods.

These methods use direct indexing which will panic on out-of-bounds access. While internal callers compute indices correctly, the public API exposes this panic risk to users.

For a teaching crate prioritizing clarity, this is acceptable if documented. However, consider adding bounds-checked variants or documenting the panic behavior.

📝 Optional: Add documentation about panic behavior
     /// Reads a single element.
+    ///
+    /// # Panics
+    /// Panics if `index >= self.len()`.
     pub fn get(&self, index: usize) -> f32 {
         self.0[index]
     }

     /// Overwrites a single element.
+    ///
+    /// # Panics
+    /// Panics if `index >= self.len()`.
     pub fn set(&mut self, index: usize, value: f32) {
         self.0[index] = value;
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@code/transformer/src/math.rs` around lines 62 - 69, The public methods get
and set in the vector wrapper use direct indexing and will panic on
out-of-bounds access; either add safe, bounds-checked variants (e.g.,
get_checked returning Option<f32> and set_checked returning Result<(), Error> or
bool) and have get/set call them or explicitly document the panic behavior in
the public API docs for get and set so callers know these methods may panic on
invalid indices; update function documentation/comments for get and set to state
the panic condition and, if you add checked variants, implement and expose them
alongside the existing get/set methods.

Comment on lines +249 to 257
for (row, slot) in out.iter_mut().enumerate().take(self.rows) {
    let mut sum = 0.0;

    for col in 0..self.cols {
        sum += self.get(row, col) * vector.get(col);
    }

    *slot = sum;
}

🧹 Nitpick | 🔵 Trivial

Redundant .take(self.rows) in mul_vec.

The .take(self.rows) is unnecessary since out is already created with exactly self.rows elements, so enumerate() will naturally stop at that boundary.

♻️ Simplify iteration
-        for (row, slot) in out.iter_mut().enumerate().take(self.rows) {
+        for (row, slot) in out.iter_mut().enumerate() {
             let mut sum = 0.0;

             for col in 0..self.cols {
                 sum += self.get(row, col) * vector.get(col);
             }

             *slot = sum;
         }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@code/transformer/src/math.rs` around lines 249 - 257, The loop in mul_vec
uses out.iter_mut().enumerate().take(self.rows) redundantly because out already
has self.rows elements; remove the .take(self.rows) so the loop becomes for
(row, slot) in out.iter_mut().enumerate() { ... } and keep the inner logic using
self.get(row, col) and vector.get(col) to compute sum and assign *slot = sum,
ensuring behavior and bounds remain the same.

Comment on lines +160 to +171
/// Normalizes one token embedding.
pub fn forward_token(&self, token: &TokenEmbedding) -> Result<TokenEmbedding, ModelError> {
if token.len() != self.dimension() || token.len() != self.beta.len() {
return Err(ModelError::DimensionMismatch {
operation: "LayerNorm::forward_token",
left_label: "token",
left_shape: vec![token.len()],
right_label: "gamma/beta",
right_shape: vec![self.dimension()],
hint: "layer norm parameters must match token width",
});
}

🧹 Nitpick | 🔵 Trivial

Redundant check: once token.len() == self.dimension(), token.len() != self.beta.len() is always false.

Since self.gamma.len() == self.beta.len() (both initialized with the same d_model in the constructor), the second half of the condition can never trigger on its own after the token.len() != self.dimension() check.

♻️ Simplify condition
     pub fn forward_token(&self, token: &TokenEmbedding) -> Result<TokenEmbedding, ModelError> {
-        if token.len() != self.dimension() || token.len() != self.beta.len() {
+        if token.len() != self.dimension() {
             return Err(ModelError::DimensionMismatch {
                 operation: "LayerNorm::forward_token",
                 left_label: "token",
                 left_shape: vec![token.len()],
                 right_label: "gamma/beta",
                 right_shape: vec![self.dimension()],
                 hint: "layer norm parameters must match token width",
             });
         }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@code/transformer/src/transformer.rs` around lines 160 - 171, The condition in
LayerNorm::forward_token is checking token.len() != self.dimension() AND
token.len() != self.beta.len(), but self.beta.len() == self.dimension(), so
remove the redundant second check: in the forward_token function only compare
token.len() to self.dimension() (keep the existing ModelError::DimensionMismatch
block and its labels/right_shape as-is referencing "gamma/beta" and
self.dimension()); this simplifies the condition while preserving the same error
behavior and messaging.

Comment on lines +156 to +159
/// Returns one token by index.
pub fn token(&self, index: usize) -> &TokenEmbedding {
&self.tokens[index]
}

🧹 Nitpick | 🔵 Trivial

The token method can panic on out-of-bounds access.

Similar to the math module, this public method uses direct indexing without bounds checking.

📝 Optional: Document panic or add bounds-checked variant
     /// Returns one token by index.
+    ///
+    /// # Panics
+    /// Panics if `index >= self.len()`.
     pub fn token(&self, index: usize) -> &TokenEmbedding {
         &self.tokens[index]
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@code/transformer/src/types.rs` around lines 156 - 159, The public
token(&self, index: usize) -> &TokenEmbedding currently panics on out-of-bounds;
change it to a bounds-checked API by returning Option<&TokenEmbedding> (pub fn
token(&self, index: usize) -> Option<&TokenEmbedding>) and use
self.tokens.get(index) inside, or alternatively add a new method (e.g., pub fn
token_checked(...)->Option<&TokenEmbedding>) that does this and keep the
existing method only if you explicitly document its panic behavior; update
callers of token to handle the Option and adjust the doc comment to clearly
state which variant panics vs. which is safe.

Comment on lines +385 to 395
```text
// Future direction, not the first lesson:
//
// pub struct Vector<const N: usize> {
// data: [f32; N],
// }
//
// pub struct Matrix<const R: usize, const C: usize> {
// data: [[f32; C]; R],
// }
```

🧹 Nitpick | 🔵 Trivial

Minor: Consider removing the commented-out code block or converting to a cleaner note.

The commented Rust code in a text block is fine for illustrating a future direction, but the // comment style inside a text block is slightly unusual. This is acceptable as-is since it's clearly labeled as a "future direction."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lessons/07-transformer/02-typed-rust-transformer-with-linear-attention.md`
around lines 385 - 395, Remove or reformat the commented Rust snippet: either
delete the commented-out Vector and Matrix block or replace it with a plain
explanatory note (e.g., "Future direction: consider using generic Vector<const
N: usize> and Matrix<const R: usize, const C: usize> types") so the lesson no
longer contains unusual `//` comments inside a text code fence; look for the
`Vector` and `Matrix` identifiers in the snippet and update that section
accordingly.

Comment on lines +26 to 34
## Exercise 2: Build a `TokenSequence`

Try a sharper input such as:

```rust
vec![1.0, 0.0, 0.0, 0.0]
```
Create a three-token sequence with model width `4`.

Questions:

- how do the attention scores change?
- does the output become more concentrated or more mixed?
- why does every token need the same width?
- what would `TokenSequence::new` reject?


🧹 Nitpick | 🔵 Trivial

Consider providing a starter snippet for Exercise 2.

Exercises 3-7 are more open-ended, but Exercise 2 could benefit from a minimal code skeleton to help beginners get started, similar to Exercise 1's complete example.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lessons/07-transformer/exercises.md` around lines 26 - 34, Add a minimal
starter snippet/example for Exercise 2 that shows constructing a three-token
TokenSequence with model width 4 to guide beginners; mention the
TokenSequence::new constructor and show (in the snippet) creating three tokens
of equal width and passing them to TokenSequence::new, and include brief
comments answering "why every token needs the same width" and "what
TokenSequence::new would reject" so learners see both usage and the failure
mode.

Comment thread lessons/README.md
Comment on lines 17 to +18
| [07-transformer](07-transformer/README.md) | Module 6 | Planned | Understand the compact attention formulation as a batched version of explicit dot-product loops. |
| [07-transformer](07-transformer/README.md) | Module 7 | Started | Assemble a tiny transformer block, then grow that picture into a standard encoder and a typed Rust linear-attention variant. |
| [07-transformer](07-transformer/README.md) | Module 7 | Authored | Learn the encoder path through semantic Rust types, expressive errors, and an English/Algebra/Rust chunk ladder. |

⚠️ Potential issue | 🟠 Major

Fix duplicated 07-transformer row and incorrect module index.

The table now has two entries for the same folder with conflicting status, and Module 7 breaks the stated Module-0-based numbering. Keep a single 07-transformer row as Module 6 with Authored.

Suggested patch
-| [07-transformer](07-transformer/README.md) | Module 6 | Planned | Understand the compact attention formulation as a batched version of explicit dot-product loops. |
-| [07-transformer](07-transformer/README.md) | Module 7 | Authored | Learn the encoder path through semantic Rust types, expressive errors, and an English/Algebra/Rust chunk ladder. |
+| [07-transformer](07-transformer/README.md) | Module 6 | Authored | Learn the encoder path through semantic Rust types, expressive errors, and an English/Algebra/Rust chunk ladder. |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lessons/README.md` around lines 17 - 18, Remove the duplicate table row for
"[07-transformer](07-transformer/README.md)"; keep a single entry for that
folder and update its module index/status so the row reads with "Module 6" and
"Authored" (replace the current "Module 6 | Planned" and remove the "Module 7 |
Authored" line), ensuring only one "[07-transformer](07-transformer/README.md) |
Module 6 | Authored | ..." row remains in the lessons table.

Comment on lines +1005 to +1007
env = dict(os.environ)
env["CARGO_TARGET_DIR"] = str(target_dir)
env["DEVELOPER_DIR"] = "/Library/Developer/CommandLineTools"

⚠️ Potential issue | 🟡 Minor

Hardcoded DEVELOPER_DIR breaks portability on non-macOS systems.

Line 1007 sets DEVELOPER_DIR to /Library/Developer/CommandLineTools, which is macOS-specific. This will either be ignored or potentially cause issues on Linux CI runners.

Consider conditionally setting this only on macOS:

🛠️ Proposed fix for cross-platform compatibility
             env = dict(os.environ)
             env["CARGO_TARGET_DIR"] = str(target_dir)
-            env["DEVELOPER_DIR"] = "/Library/Developer/CommandLineTools"
+            import sys
+            if sys.platform == "darwin":
+                env["DEVELOPER_DIR"] = "/Library/Developer/CommandLineTools"

Or move the import to the top of the file and use a cleaner conditional.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/check_lesson_rust_snippets.py` around lines 1005 - 1007, The
hardcoded DEVELOPER_DIR assignment in the env dict causes non-macOS breakage;
update the code that sets env["DEVELOPER_DIR"] so it only runs on macOS (e.g.,
check platform.system() == "Darwin" or sys.platform.startswith("darwin")) and
leave env untouched on Linux/other OSes; you can import platform (or sys) at the
top of scripts/check_lesson_rust_snippets.py and wrap the DEVELOPER_DIR
assignment in that conditional so env and CARGO_TARGET_DIR (referenced near
target_dir) remain unchanged on non-macOS runners.

@hghalebi hghalebi changed the title Codex/add ci and gemini writing review [codex] update course automation, transformer, and neuron lessons Mar 27, 2026
@hghalebi hghalebi merged commit c42bccf into main Mar 27, 2026
5 of 6 checks passed