`_decode_bytes` crashes on invalid UTF-8 sequences #504

## Description

The `_decode_bytes` helper function in `gemma/gm/data/_tasks.py` raises `UnicodeDecodeError` when it encounters an invalid UTF-8 byte sequence. As a result, data processing pipelines cannot handle datasets (especially TFDS datasets) that contain corrupted or non-UTF-8 bytes.

## Current Behavior

The function attempts to decode bytes as UTF-8 without error handling:

```python
def _decode_bytes(element):
  if isinstance(element, bytes):
    return element.decode("utf-8")  # Crashes on invalid UTF-8
  else:
    return element
```

When invalid UTF-8 sequences are encountered (e.g., `bytes([0xFF, 0xFE, 0xFD])`), the function raises `UnicodeDecodeError`, causing the entire data processing pipeline to crash.
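To illustrate why one bad byte is enough to abort decoding, here is a minimal standard-library sketch. The `data` value is a made-up example, not taken from any real dataset. It shows that strict decoding rejects the whole input and that the exception records where the offending bytes are.

```python
# Hypothetical input: valid ASCII text with a single invalid byte (0xFF).
data = b"valid text \xff more"

try:
  data.decode("utf-8")  # errors="strict" is the default
  failure = None
except UnicodeDecodeError as e:
  # The exception carries the offending byte range and a short reason.
  failure = (e.start, e.end, e.reason)
  print(f"bad bytes at [{e.start}:{e.end}]: {data[e.start:e.end]!r} ({e.reason})")
```

The valid text before and after the bad byte is never returned, because `decode` raises before producing any output.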

## Expected Behavior

The function should handle invalid UTF-8 sequences gracefully by:
1. Replacing invalid bytes with the Unicode replacement character (U+FFFD)
2. Issuing a warning to inform users about data quality issues
3. Allowing the data processing pipeline to continue

## Impact

- **Crashes entire data processing pipelines** when datasets contain invalid UTF-8 bytes
- **No graceful degradation** - valid data cannot be processed if any invalid bytes exist
- **Poor error messages** - `UnicodeDecodeError` doesn't clearly indicate the issue is with data encoding
- Affects both `Seq2SeqTask` and `ContrastiveTask` which use this helper function

## Reproduction

```python
from gemma.gm.data._tasks import _decode_bytes

# This crashes with UnicodeDecodeError
invalid_bytes = bytes([0xFF, 0xFE, 0xFD])
result = _decode_bytes(invalid_bytes)
```

**Error:**
```
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte
```

<img width="734" height="418" alt="Image" src="https://github.com/user-attachments/assets/b00ceb1d-3d13-43d4-b6b1-c0dd084d1d82" />
