24 changes: 24 additions & 0 deletions tasks/manifest.yaml
@@ -110,6 +110,30 @@ categories:
- task_log_apache_critical
- task_log_apache_timeline
- task_log_syslog_boot
- task_log_nginx_status_codes
- task_log_nginx_traffic
- task_log_nginx_slow_requests
- task_log_nginx_user_agents
- task_log_nginx_errors
- task_log_ssh_failed_logins
- task_log_ssh_brute_force
- task_log_ssh_successful
- task_log_ssh_user_activity
- task_log_ssh_unusual_times
- task_log_hdfs_failures
- task_log_hdfs_connections
- task_log_hdfs_slow_ops
- task_log_hdfs_block_ops
- task_log_hdfs_storage
- task_log_mapreduce_jobs
- task_log_mapreduce_failures
- task_log_mapreduce_slow_tasks
- task_log_mapreduce_resources
- task_log_mapreduce_timeline
- task_log_syslog_anomalies
- task_log_syslog_services
- task_log_syslog_cron
- task_log_syslog_auth_failures

meeting_analysis:
- task_meeting_council_votes
146 changes: 146 additions & 0 deletions tasks/task_log_hdfs_block_ops.md
@@ -0,0 +1,146 @@
---
id: task_log_hdfs_block_ops
name: HDFS DataNode Log - Block Operations Summary
category: log_analysis
grading_type: hybrid
timeout_seconds: 180
workspace_files:
- dest: "hdfs_datanode.log"
source: "logs/hdfs_datanode.log"
---

# HDFS DataNode Log - Block Operations Summary

## Prompt

Analyze the HDFS DataNode log at `hdfs_datanode.log` and produce a comprehensive summary of all block operations. The log comes from an HDFS cluster and tracks block lifecycle events.

Your report should include:

1. **Block Inventory**: Total unique block IDs in the log, with a full list
2. **Operation Types**: For each operation type (allocateBlock, Receiving, Received, addStoredBlock, replicate, PacketResponder), count total occurrences
3. **Block Lifecycle Tracking**: For each block that has a complete lifecycle (allocate → receive → stored), document the full chain
4. **Replication Chain**: For blocks with replication events, trace the replication path across nodes
5. **Associated Jobs**: Identify the MapReduce jobs that triggered these block operations (visible in file paths)
6. **Per-Block Detail Table**: Create a table with columns: Block ID, Size (if known), Allocated Path, Nodes Involved, Replication Count

Write the report to `hdfs_block_ops_report.md` as a well-structured markdown document.

---

## Expected Behavior

The agent should parse 2000 log entries and produce:

**Block Inventory:**
- ~390 unique block IDs

**Operation Counts:**
- Receiving block: ~1149
- allocateBlock: ~385
- Received block: ~19
- addStoredBlock: ~19
- PacketResponder: ~12
- Replicate: 4

**Complete Block Lifecycles (blocks with full data):**
- blk_-1608999687919862906: 91178 bytes, allocated for job_200811092030_0001/job.jar
- blk_7503483334202473044: 233217 bytes, allocated for job_200811092030_0001/job.split
- blk_-3544583377289625738: 11971 bytes
- blk_-9073992586687739851: 11977 bytes

**Replication Chain:**
- blk_-1608999687919862906 was replicated 4 times across the cluster:
10.250.14.224 → 10.251.215.16 → 10.251.74.79 → 10.251.31.5 → 10.251.90.64

**Associated Job:**
- job_200811092030_0001 — MapReduce job, files: job.jar, job.split

Acceptable variations:
- Block ID lists may be truncated
- Not all 390 blocks need full detail — just those with complete lifecycle data
- Table format may vary
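
The operation counts and unique-block total above can be reproduced with a single pass over the file. A minimal sketch in Python, assuming the Loghub-style HDFS DataNode format (`081109 203518 35 INFO dfs.FSNamesystem: ...`); the substring patterns are illustrative, and a line may match more than one of them:

```python
# Illustrative only -- not part of the prompt or the grader.
import re
from collections import Counter

BLOCK_ID = re.compile(r"blk_-?\d+")
OP_NEEDLES = {
    "allocateBlock": "allocateBlock",
    "Receiving block": "Receiving block",
    "Received block": "Received block",
    "addStoredBlock": "addStoredBlock",
    "PacketResponder": "PacketResponder",
    "replicate": "to replicate",  # assumed wording of replication requests
}

blocks = set()
op_counts = Counter()
with open("hdfs_datanode.log", encoding="utf-8", errors="replace") as fh:
    for line in fh:
        blocks.update(BLOCK_ID.findall(line))
        for op, needle in OP_NEEDLES.items():
            if needle in line:
                op_counts[op] += 1

print(f"unique blocks: {len(blocks)}")
for op, n in op_counts.most_common():
    print(f"{op}: {n}")
```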

---

## Grading Criteria

- [ ] `hdfs_block_ops_report.md` is created in the workspace
- [ ] Unique block count is provided (~390)
- [ ] Operation types are counted (receiving, allocate, replicate, etc.)
- [ ] At least one block lifecycle is fully traced (allocate → receive → stored)
- [ ] The associated MapReduce job is identified (job_200811092030_0001)

---

## Automated Checks

```python
def grade(transcript: list, workspace_path: str) -> dict:
"""Grade the HDFS block operations summary task."""
from pathlib import Path

scores = {}
workspace = Path(workspace_path)
report_file = workspace / "hdfs_block_ops_report.md"

if not report_file.exists():
return {
"output_created": 0.0,
"block_count": 0.0,
"operations_counted": 0.0,
"lifecycle_traced": 0.0,
"job_identified": 0.0,
}

scores["output_created"] = 1.0
content = report_file.read_text(encoding="utf-8").lower()

# Check 1: Block count
has_count = any(n in content for n in ["390", "~390", "385", "~385", "380", "~400"])
scores["block_count"] = (
1.0 if has_count else
0.5 if any(kw in content for kw in ["hundred", "unique block"]) else 0.0
)

# Check 2: Operations counted
op_keywords = ["receiving", "allocate", "replicate", "addstored",
"packetresponder", "received"]
ops_found = sum(1 for kw in op_keywords if kw in content)
scores["operations_counted"] = (
1.0 if ops_found >= 4 else
0.5 if ops_found >= 2 else 0.0
)

# Check 3: Block lifecycle traced
lifecycle_keywords = ["91178", "233217", "blk_-1608999687919862906",
"blk_7503483334202473044", "lifecycle", "job.jar", "job.split"]
scores["lifecycle_traced"] = (
1.0 if sum(1 for kw in lifecycle_keywords if kw in content) >= 2 else
0.5 if sum(1 for kw in lifecycle_keywords if kw in content) >= 1 else 0.0
)

# Check 4: MapReduce job identified
has_job = "job_200811092030_0001" in content or "200811092030" in content
has_mapreduce = "mapreduce" in content or "mapred" in content or "map reduce" in content
scores["job_identified"] = (
1.0 if has_job else
0.5 if has_mapreduce else 0.0
)

return scores
```
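
The harness that calls `grade` is not part of this diff; a hypothetical invocation, purely for illustration, might look like:

```python
# Hypothetical harness call -- the real transcript format is not shown here,
# so an empty list stands in for it.
scores = grade(transcript=[], workspace_path="/tmp/workspace")
final_score = sum(scores.values()) / len(scores)  # equal weights, per the notes below
print(scores, final_score)
```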

---

## Additional Notes

**Key facts from the log:**

- Most blocks only have "Receiving" and "allocateBlock" entries — the cluster was mid-operation
- Only ~19 blocks have complete lifecycle data with confirmed sizes
- The 390 block IDs represent a MapReduce job's data being distributed across the cluster
- Replication is only logged for blk_-1608999687919862906, which is replicated 4 times
- File paths show this is related to a MapReduce job: `/mnt/hadoop/mapred/system/job_200811092030_0001/`
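
The blocks with complete lifecycle data can be located by grouping entries per block ID and recording which stages each block reached. A rough sketch under the same format assumptions as the counting example above (illustrative, not the grader's method):

```python
# Illustrative sketch: which blocks show a full allocate -> receive -> stored chain?
import re
from collections import defaultdict

BLOCK_ID = re.compile(r"blk_-?\d+")
STAGES = {
    "allocate": "allocateBlock",
    "receiving": "Receiving block",
    "received": "Received block",
    "stored": "addStoredBlock",
}

lifecycle = defaultdict(set)
with open("hdfs_datanode.log", encoding="utf-8", errors="replace") as fh:
    for line in fh:
        for stage, needle in STAGES.items():
            if needle in line:
                for blk in BLOCK_ID.findall(line):
                    lifecycle[blk].add(stage)

complete = [blk for blk, seen in lifecycle.items()
            if {"allocate", "receiving", "stored"} <= seen]
print(f"blocks with a full lifecycle: {len(complete)}")
for blk in sorted(complete):
    print(blk)
```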

**Grading weights (equal):** Each of the five criteria contributes 0.2 to the final score.
142 changes: 142 additions & 0 deletions tasks/task_log_hdfs_connections.md
@@ -0,0 +1,142 @@
---
id: task_log_hdfs_connections
name: HDFS DataNode Log - Connection Pattern Analysis
category: log_analysis
grading_type: hybrid
timeout_seconds: 180
workspace_files:
- dest: "hdfs_datanode.log"
source: "logs/hdfs_datanode.log"
---

# HDFS DataNode Log - Connection Pattern Analysis

## Prompt

Analyze the HDFS DataNode log at `hdfs_datanode.log` and produce a report on connection and communication patterns between nodes. The log contains entries from DataNode, FSNamesystem, and PacketResponder components.

Your report should include:

1. **Network Topology**: List all unique IP addresses that appear in the log, categorized by their role (source, destination, or both)
2. **Subnet Analysis**: Group IPs by subnet (e.g., 10.250.x.x vs 10.251.x.x). How many nodes are in each subnet?
3. **Most Active Nodes**: Top 10 IPs by frequency of appearance (as source or destination)
4. **Communication Patterns**: Which pairs of nodes communicate most frequently?
5. **DataNode vs NameSystem**: Separate the activity — what comes from DataNode operations vs FSNamesystem operations?
6. **Cluster Size Estimate**: Based on the IPs observed, estimate the cluster size

Write the report to `hdfs_connections_report.md` as a well-structured markdown document.

---

## Expected Behavior

The agent should parse 2000 log entries and produce:

**Network Topology:**
- 202 unique IP addresses observed
- IPs fall in the 10.250.x.x and 10.251.x.x ranges (private network)
- All nodes use port 50010 (HDFS DataNode data transfer port)

**Subnet Analysis:**
- 10.250.x.x subnet — contains some of the most active nodes
- 10.251.x.x subnet — contains additional DataNode cluster members
- The split suggests a multi-rack HDFS deployment

**Most Active Nodes:**
- 10.250.19.102 — extremely active (appears as source in many block transfers)
- 10.250.10.6, 10.251.215.16, 10.250.14.224 — also very active

**Component Activity:**
- DataNode$DataXceiver: Block receive operations (~1149 entries)
- FSNamesystem: Block allocation and storage tracking (400+ entries)
- DataNode$PacketResponder: Block receive confirmations with sizes

Acceptable variations:
- Exact IP counts and rankings may vary by parsing approach
- Subnet grouping granularity may differ
- Cluster size estimates will be approximate
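
The topology figures above can be approximated with a quick pass that pulls every dotted-quad out of the log. A minimal sketch, assuming addresses appear as plain `10.x.x.x[:port]` strings in the message text:

```python
# Illustrative only -- counts unique IPs, groups them by /16 subnet, and
# ranks the most frequently appearing nodes.
import re
from collections import Counter

IP_RE = re.compile(r"\b(\d{1,3}(?:\.\d{1,3}){3})\b")

ip_counts = Counter()
with open("hdfs_datanode.log", encoding="utf-8", errors="replace") as fh:
    for line in fh:
        ip_counts.update(IP_RE.findall(line))

subnets = Counter(".".join(ip.split(".")[:2]) + ".x.x" for ip in ip_counts)
print(f"unique IPs: {len(ip_counts)}")
print("nodes per subnet:", dict(subnets))
print("top 10 nodes:", ip_counts.most_common(10))
```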

---

## Grading Criteria

- [ ] `hdfs_connections_report.md` is created in the workspace
- [ ] Unique IPs are listed or counted (~202)
- [ ] IPs are grouped by subnet (10.250.x.x vs 10.251.x.x)
- [ ] Most active nodes are identified
- [ ] DataNode vs FSNamesystem activity is distinguished

---

## Automated Checks

```python
def grade(transcript: list, workspace_path: str) -> dict:
"""Grade the HDFS connection pattern analysis task."""
from pathlib import Path

scores = {}
workspace = Path(workspace_path)
report_file = workspace / "hdfs_connections_report.md"

if not report_file.exists():
return {
"output_created": 0.0,
"ips_listed": 0.0,
"subnets_grouped": 0.0,
"active_nodes": 0.0,
"components_separated": 0.0,
}

scores["output_created"] = 1.0
content = report_file.read_text(encoding="utf-8").lower()

# Check 1: IPs listed/counted
has_count = any(n in content for n in ["202", "200", "~200", "over 200"])
has_ips = "10.250" in content and "10.251" in content
scores["ips_listed"] = (
1.0 if has_count and has_ips else
0.5 if has_ips else 0.0
)

# Check 2: Subnets grouped
subnet_keywords = ["subnet", "10.250", "10.251", "rack", "network segment",
"address range", "ip range"]
scores["subnets_grouped"] = (
1.0 if "10.250" in content and "10.251" in content and
sum(1 for kw in subnet_keywords if kw in content) >= 2 else
0.5 if "10.250" in content and "10.251" in content else 0.0
)

# Check 3: Active nodes identified
active_ips = ["10.250.19.102", "10.251.215.16", "10.250.14.224", "10.250.10.6"]
ips_found = sum(1 for ip in active_ips if ip in content)
scores["active_nodes"] = (
1.0 if ips_found >= 2 else
0.5 if ips_found >= 1 else 0.0
)

# Check 4: Components separated
component_keywords = ["dataxceiver", "dataxeceiver", "fsnamesystem",
"packetresponder", "namenode", "datanode"]
scores["components_separated"] = (
1.0 if sum(1 for kw in component_keywords if kw in content) >= 2 else
0.5 if sum(1 for kw in component_keywords if kw in content) >= 1 else 0.0
)

return scores
```

---

## Additional Notes

**Key facts from the log:**

- 202 unique IPs — this is a large HDFS cluster
- Two main subnets: 10.250.x.x and 10.251.x.x
- Port 50010 is used throughout — standard HDFS DataNode port
- 10.250.19.102 appears as source in a disproportionate number of entries
- The log captures a burst of activity related to job_200811092030_0001
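
The DataNode vs FSNamesystem activity split called out above can be approximated by reading the logger name out of each line. A minimal sketch, assuming lines follow the common `081109 203518 35 INFO dfs.FSNamesystem: <message>` layout (the field order is an assumption):

```python
# Illustrative sketch: tally log lines per logging component.
import re
from collections import Counter

COMPONENT_RE = re.compile(r"\b(?:INFO|WARN|ERROR)\s+(\S+?):")

components = Counter()
with open("hdfs_datanode.log", encoding="utf-8", errors="replace") as fh:
    for line in fh:
        m = COMPONENT_RE.search(line)
        if m:
            components[m.group(1)] += 1

for name, count in components.most_common():
    print(f"{name}: {count}")
```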

**Grading weights (equal):** Each of the five criteria contributes 0.2 to the final score.