
feat: onboarding Gmail integration + LinkedIn profile enrichment#524

Open
senamakel wants to merge 6 commits into tinyhumansai:main from senamakel:feat/sunday-3

Conversation

senamakel (Member) commented Apr 13, 2026

Summary

  • Onboarding now shows Gmail integration — replaced the static "Connect Integrations Later" step with a live Gmail connect card (Notion and others available after setup)
  • Context gathering step — after connecting Gmail, a new onboarding step calls the Rust-side LinkedIn enrichment pipeline and shows progress
  • LinkedIn enrichment pipeline (Rust) — searches Gmail HTML bodies for comm/in/<username> links, scrapes the profile via Apify (dev_fusion/linkedin-profile-scraper), passes through LLM summarisation, and writes PROFILE.md to workspace
  • PROFILE.md replaces USER.md — the agent prompt system now loads PROFILE.md (generated from real user data) instead of the generic USER.md template
  • Composio tools hidden from main agent — ToolCategory::Skill tools (Composio, Apify) are filtered out of the orchestrator/main agent prompt; only the skills_agent subagent sees them via category_filter = "skill"
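The category-based filtering in the last bullet can be sketched roughly as follows; the enum and descriptor types are simplified stand-ins for whatever `ToolCategory` and tool metadata the repo actually defines:

```rust
// Hypothetical sketch of category-based tool filtering; the real
// `ToolCategory` and tool descriptor types in the repo may differ.
#[derive(Debug, PartialEq, Clone, Copy)]
enum ToolCategory {
    Core,
    Skill,
}

#[derive(Debug, Clone)]
struct ToolDesc {
    name: &'static str,
    category: ToolCategory,
}

/// Keep only tools whose category passes the predicate. The orchestrator
/// prompt would use `|c| c != ToolCategory::Skill`, while the
/// skills_agent would use `|c| c == ToolCategory::Skill`.
fn filter_tools<F>(tools: &[ToolDesc], keep: F) -> Vec<ToolDesc>
where
    F: Fn(ToolCategory) -> bool,
{
    tools.iter().filter(|t| keep(t.category)).cloned().collect()
}
```

Filtering at prompt-construction time (rather than at registration) keeps the tools callable by the subagent while hiding them from the main agent's system prompt.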

Key files

| Area | Files |
| --- | --- |
| Onboarding UI | app/src/pages/onboarding/steps/SkillsStep.tsx, ContextGatheringStep.tsx, Onboarding.tsx |
| LinkedIn enrichment | src/openhuman/learning/linkedin_enrichment.rs, schemas.rs |
| Profile → prompt | src/openhuman/context/prompt.rs (USER.md → PROFILE.md) |
| Tool filtering | src/openhuman/agent/harness/instructions.rs, channels/runtime/startup.rs |
| Controller registry | src/core/all.rs |

Test plan

  • cargo test --lib -- linkedin_enrichment — 5 regex tests pass
  • cargo check — clean build
  • tsc --noEmit — clean typecheck
  • CLI test: openhuman learning linkedin_enrichment — full pipeline ran successfully, scraped real LinkedIn profile, LLM summarised, PROFILE.md written
  • Manual: complete onboarding flow with Gmail connected, verify context gathering step shows progress
  • Manual: verify orchestrator agent no longer shows Composio tools in its prompt
  • Manual: verify skills_agent still has access to Composio tools
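For illustration, the kind of link extraction the regex tests exercise can be sketched with the stdlib alone; the exact patterns, the `/comm/in/` marker handling, and the allowed slug character set are assumptions here, not the repo's implementation (which uses regexes):

```rust
/// Stdlib-only sketch: pull the username out of the first
/// `/comm/in/<username>` link found in an email's HTML body.
fn extract_linkedin_username(html: &str) -> Option<String> {
    let marker = "/comm/in/";
    let start = html.find(marker)? + marker.len();
    let rest = &html[start..];
    // A username ends at the first character that cannot appear in a
    // LinkedIn public-profile slug (assumed: letters, digits, '-', '_').
    let end = rest
        .find(|c: char| !(c.is_ascii_alphanumeric() || c == '-' || c == '_'))
        .unwrap_or(rest.len());
    if end == 0 {
        None
    } else {
        Some(rest[..end].to_string())
    }
}
```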

Summary by CodeRabbit

  • New Features

    • Added Apify integration for running and monitoring data actors, with tools to track job status and retrieve results.
    • Introduced LinkedIn profile enrichment during onboarding, which automatically gathers profile data and context.
    • Added context gathering step in onboarding flow to enrich user information.
    • Enhanced Skills onboarding to focus on Gmail integration with connection status indicators.
    • Added "Run Apify Actors" capability to the capability catalog.
  • Improvements

    • Refactored agent tool instruction system to improve tool organization and reusability.
    • Updated workspace identity file handling to include generated profile data.
    • Improved Composio toolkit icon rendering.

…ing steps

- Removed the old `toolkitMeta.ts` file and replaced it with a new `toolkitMeta.tsx` file that includes updated metadata handling for Composio toolkits, enhancing the integration with React components.
- Updated the `ComposioConnectModal` to directly render icons without additional markup, streamlining the component structure.
- Modified the `Skills` page to utilize the new icon rendering method, improving consistency across the application.
- Enhanced the onboarding process by introducing a new `ContextGatheringStep` component, which gathers user context from connected integrations, improving the onboarding experience.
- Updated the `SkillsStep` to reflect changes in toolkit connection handling and display, ensuring a smoother user interaction during onboarding.
…d status retrieval

- Added new tools for running Apify actors and fetching their run statuses, enhancing automation capabilities.
- Updated the integration schema to include an `apify` toggle for user configuration, allowing for flexible integration management.
- Enhanced the onboarding experience by modifying the SkillsStep to focus on Gmail integration, streamlining user interactions.
- Improved documentation and comments for clarity on the new Apify functionalities and their usage.
- Introduced a new `linkedin_enrichment` module for enriching user profiles by scraping LinkedIn data from Gmail.
- Implemented the `run_linkedin_enrichment` function to handle the enrichment pipeline, including Gmail search, scraping via Apify, and data persistence.
- Added controller schemas for the learning domain, enabling integration with the existing controller framework.
- Updated the `all.rs` file to register the new learning controllers and schemas, enhancing the overall functionality of the learning system.
- Added functionality to generate a PROFILE.md file from scraped LinkedIn data, summarizing user profiles for agent context.
- Updated the `run_linkedin_enrichment` function to write PROFILE.md to the workspace, enhancing data persistence.
- Introduced helper functions for rendering and summarizing LinkedIn profiles, improving the overall enrichment process.
- Ensured minimal PROFILE.md creation even when scraping fails, maintaining essential user context.
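A minimal sketch of that fallback behaviour, assuming PROFILE.md lives at the workspace root and takes either an LLM summary or a URL-only stub (the file name matches the PR; the layout is illustrative):

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Always write a PROFILE.md: use the LLM summary when available and a
/// minimal URL-only stub when scraping or summarisation failed.
fn write_profile_md(
    workspace: &Path,
    profile_url: &str,
    summary: Option<&str>,
) -> io::Result<()> {
    let body = match summary {
        Some(md) => format!("# Profile\n\n{md}\n"),
        None => format!(
            "# Profile\n\nLinkedIn: {profile_url}\n\n(No scraped data available yet.)\n"
        ),
    };
    fs::write(workspace.join("PROFILE.md"), body)
}
```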
…t pipeline

- Updated the ContextGatheringStep to integrate a new pipeline for LinkedIn enrichment, replacing the previous Gmail profile fetching stages.
- Implemented a progress animation and logging for the enrichment process, improving user feedback during data retrieval.
- Refactored stage definitions to align with the new pipeline structure, enhancing clarity and maintainability.
- Introduced error handling and status updates for each stage of the enrichment process, ensuring robust user experience.
…and flexibility

- Introduced helper functions `tool_instructions_preamble` and `append_tool_entry` to streamline the construction of tool instructions.
- Updated `build_tool_instructions` to utilize the new helper functions, improving code readability and maintainability.
- Added `build_tool_instructions_filtered` to allow for generating instructions from a filtered list of tools, enhancing flexibility in tool usage.
- Adjusted the startup process to use the filtered instructions, ensuring only relevant tools are included in the system prompt.
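Under those names, the helper split might look roughly like this; the function names mirror the commit message, but the signatures and string format are assumptions, not the repo's actual code:

```rust
/// Hypothetical shared preamble for tool instructions.
fn tool_instructions_preamble() -> String {
    "You can call the following tools:\n".to_string()
}

/// Append one tool's entry to the instruction string.
fn append_tool_entry(out: &mut String, name: &str, description: &str) {
    out.push_str(&format!("- {name}: {description}\n"));
}

/// Build instructions from only the tools that pass `include`,
/// so the system prompt sees a filtered subset.
fn build_tool_instructions_filtered<'a, I, F>(tools: I, include: F) -> String
where
    I: IntoIterator<Item = &'a (&'a str, &'a str)>,
    F: Fn(&str) -> bool,
{
    let mut out = tool_instructions_preamble();
    for &(name, desc) in tools {
        if include(name) {
            append_tool_entry(&mut out, name, desc);
        }
    }
    out
}
```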
coderabbitai bot (Contributor) commented Apr 13, 2026

📝 Walkthrough

Walkthrough

The PR introduces LinkedIn profile enrichment during onboarding via Gmail and Apify integration, migrates Composio toolkit icons to React components with branded SVG support, adds three Apify actor tools for running and monitoring async tasks, and extends the onboarding flow with a new context-gathering step that executes the enrichment pipeline.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Frontend Composio Toolkit Refactoring<br>app/src/components/composio/toolkitMeta.tsx, app/src/components/composio/ComposioConnectModal.tsx, app/src/pages/Skills.tsx | Migrated toolkitMeta from static .ts to React .tsx with internal SVG icon components for known toolkits (Gmail, Google Drive/Calendar/Sheets, Notion, GitHub, Slack, etc.), added fallback emoji icon, removed explicit text-lg span wrapper from icon rendering. |
| Frontend Onboarding & Skills Integration<br>app/src/pages/onboarding/Onboarding.tsx, app/src/pages/onboarding/steps/SkillsStep.tsx, app/src/pages/onboarding/steps/ContextGatheringStep.tsx | Added new ContextGatheringStep for LinkedIn enrichment pipeline execution, updated SkillsStep to manage Composio Gmail integration with OAuth modal and connected source tracking, refactored onboarding step advancement logic to include new context step. |
| Frontend Tests<br>app/src/pages/__tests__/Skills.composio-catalog.test.tsx | Extended test assertions to verify additional toolkit names (Google Sheets, Facebook, Instagram, Reddit) render in fallback catalog. |
| Backend Apify Integration<br>src/openhuman/integrations/apify.rs, src/openhuman/integrations/mod.rs, src/openhuman/integrations/types.rs | Implemented three Apify tools: ApifyRunActorTool (starts async actor runs with timeout/memory parameters), ApifyGetRunStatusTool (polls run status), ApifyGetRunResultsTool (fetches paginated results with JSON sample formatting). |
| Backend Learning Pipeline<br>src/openhuman/learning/linkedin_enrichment.rs, src/openhuman/learning/schemas.rs, src/openhuman/learning/mod.rs | Added LinkedIn enrichment pipeline: extracts LinkedIn profile URL from Gmail via Composio, scrapes profile via Apify actor, optionally summarizes via LLM, persists to workspace and memory store. |
| Backend Configuration & Registry<br>src/openhuman/config/schema/tools.rs, src/openhuman/channels/runtime/startup.rs, src/core/all.rs, src/openhuman/tools/ops.rs | Added apify toggle to IntegrationsConfig, registered Apify tools in tool registry when enabled, excluded Skill-category tools from agent system prompt, registered learning controller namespace globally. |
| Backend Schema & Catalog<br>src/openhuman/agent/harness/instructions.rs, src/openhuman/agent/harness/mod.rs, src/openhuman/about_app/catalog.rs, src/openhuman/context/prompt.rs | Refactored tool instruction building with build_tool_instructions_filtered helper for filtered tool subsets, added Apify capability to catalog, updated identity/profile file handling in prompt workspace. |

Sequence Diagram(s)

sequenceDiagram
    actor User
    participant Frontend
    participant Backend as Backend<br/>(RPC Handler)
    participant Gmail as Gmail/Composio
    participant Apify as Apify API
    participant Memory as Memory Store
    participant LLM as LLM Service

    User->>Frontend: Click "Continue" on SkillsStep
    Frontend->>Frontend: Update draft.connectedSources
    Frontend->>Backend: callCoreRpc("openhuman.learning_linkedin_enrichment")
    activate Backend
    Backend->>Gmail: Composio: Search from:linkedin.com
    Gmail-->>Backend: Email messages with LinkedIn URLs
    Backend->>Backend: Extract profile URL from emails
    Backend->>Apify: POST /dev_fusion/linkedin-profile-scraper
    Apify-->>Backend: Run ID, status, dataset reference
    Backend->>Apify: Poll run status until SUCCEEDED
    Apify-->>Backend: Final status SUCCEEDED + results
    Backend->>Backend: Extract profile JSON from dataset
    Backend->>LLM: Optional: Summarize profile (with fallback)
    LLM-->>Backend: Markdown summary or error (use raw JSON)
    Backend->>Memory: Store profile via MemoryClient
    Memory-->>Backend: Stored
    Backend-->>Frontend: { profile_url, profile_data, log }
    deactivate Backend
    Frontend->>Frontend: Parse logs, update UI progress
    Frontend->>User: Display results, enable Continue button
    User->>Frontend: Click "Continue"
    Frontend->>Frontend: Call onNext(), advance to next step
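The poll-until-SUCCEEDED step in the diagram above can be sketched as a generic retry loop; the Apify endpoint named in the comment, the retry budget, and the interval are illustrative, not the repo's actual values:

```rust
use std::thread;
use std::time::Duration;

#[derive(Debug, PartialEq)]
enum RunStatus {
    Running,
    Succeeded,
    Failed,
}

/// Poll a run until it leaves the Running state or the attempt budget
/// is exhausted. `fetch_status` stands in for a real Apify API call
/// (e.g. GET /v2/actor-runs/{runId}).
fn poll_until_done<F>(
    mut fetch_status: F,
    max_attempts: u32,
    interval: Duration,
) -> Result<RunStatus, String>
where
    F: FnMut() -> RunStatus,
{
    for _ in 0..max_attempts {
        match fetch_status() {
            RunStatus::Running => thread::sleep(interval),
            done => return Ok(done),
        }
    }
    Err("actor run did not finish within the polling budget".to_string())
}
```

Bounding the attempts matters here: a scraper run that hangs would otherwise stall the onboarding step indefinitely.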
sequenceDiagram
    participant ComposioModal
    participant SkillsStep
    participant Integrations as useComposioIntegrations
    participant GmailIntegration

    ComposioModal->>SkillsStep: User clicks "Connect Gmail"
    SkillsStep->>SkillsStep: Set activeToolkit = "gmail"
    SkillsStep->>ComposioModal: Render with activeToolkit state
    ComposioModal->>ComposioModal: Show OAuth flow
    ComposioModal->>GmailIntegration: User authorizes
    GmailIntegration-->>ComposioModal: Success callback
    ComposioModal->>SkillsStep: onClose()
    SkillsStep->>Integrations: Re-fetch integrations state
    Integrations-->>SkillsStep: Updated status (connected=true)
    SkillsStep->>SkillsStep: Update UI: show "Connected" badge
    SkillsStep->>SkillsStep: Enable "Continue" button

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Poem

🐰 Hops through the toolkit with branded icons so bright,
Gmail threads and Apify actors running through the night,
LinkedIn profiles gathered in the onboarding flow,
Context enriched and memories stored, watch the learning grow! 🌟

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the main feature addition: Gmail integration during onboarding plus LinkedIn profile enrichment pipeline. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 80.46%, which is sufficient. The required threshold is 80.00%. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 8

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/openhuman/context/prompt.rs (1)

355-375: ⚠️ Potential issue | 🟠 Major

Align identity file usage across agent and channel prompts.

The main agent prompt no longer injects USER.md (replaced with PROFILE.md), but channels_prompt.rs (line 41) still lists USER.md in bootstrap_files, and subconscious/prompt.rs still loads identity context from USER.md. Additionally, workspace/ops.rs (line 11) still includes the default USER.md content.

If USER.md is being phased out in favor of PROFILE.md, update channels_prompt.rs and subconscious/prompt.rs to use PROFILE.md and remove the USER.md default from workspace/ops.rs. If channels and subconscious intentionally retain USER.md while main agents use PROFILE.md, document this separation in code comments.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/context/prompt.rs` around lines 355 - 375, The repo is
inconsistent: main prompt code now uses PROFILE.md but channels_prompt.rs's
bootstrap_files and subconscious/prompt.rs still reference USER.md and
workspace/ops.rs still provides a USER.md default; update channels_prompt.rs
(bootstrap_files) and subconscious/prompt.rs (where identity is loaded) to
reference "PROFILE.md" instead of "USER.md", and remove the USER.md default
content from workspace/ops.rs (or if the intent is to keep both, add a clear
code comment in channels_prompt.rs and subconscious/prompt.rs documenting why
channels/subconscious retain USER.md while main agents use PROFILE.md). Ensure
you update the relevant constants/arrays and any calls that read or inject the
identity file to use "PROFILE.md" (or add the explanatory comment) so file usage
is consistent.
🧹 Nitpick comments (5)
app/src/pages/onboarding/steps/ContextGatheringStep.tsx (1)

113-121: Add debug logging for pipeline execution.

Per coding guidelines, add namespaced debug logs for new flows to aid tracing.

🔧 Suggested improvement
 async function runPipeline() {
+  console.debug('[onboarding:context-gathering] starting enrichment pipeline');
   // Mark all stages as active (pipeline runs as one call).
   setStageStatuses(prev => ({ ...prev, 'gmail-search': 'active' }));

   try {
     const raw = await callCoreRpc<unknown>({
       method: 'openhuman.learning_linkedin_enrichment',
     });
+    console.debug('[onboarding:context-gathering] RPC completed', { raw });
     const result = unwrapCliEnvelope<EnrichmentResult>(raw);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/src/pages/onboarding/steps/ContextGatheringStep.tsx` around lines 113 -
121, The runPipeline function is missing namespaced debug logging for tracing;
add debug logs (using the project's logger or a namespaced console.debug) at key
points in runPipeline: before starting the pipeline (after setStageStatuses),
immediately before and after calling
callCoreRpc('openhuman.learning_linkedin_enrichment'), and after
unwrapCliEnvelope(EnrichmentResult) to log the unwrapped result or error
context; reference the runPipeline function and the
callCoreRpc/unwrapCliEnvelope calls when inserting these concise, namespaced
debug messages to aid tracing.
src/openhuman/learning/linkedin_enrichment.rs (1)

512-546: Consider reusing MemoryClient instance.

Both persist_linkedin_profile and persist_linkedin_url_only create separate MemoryClient::new_local() instances. Since both are called from the same pipeline, consider passing the client as a parameter or creating it once in run_linkedin_enrichment.

This is minor since the pipeline runs once during onboarding.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/learning/linkedin_enrichment.rs` around lines 512 - 546,
persist_linkedin_profile and persist_linkedin_url_only each call
MemoryClient::new_local(), causing duplicate client creation; change the design
to create a single MemoryClient in run_linkedin_enrichment and pass a reference
or owned client into persist_linkedin_profile and persist_linkedin_url_only
(e.g., add a parameter like memory: &MemoryClient or memory: MemoryClient),
update their signatures and call sites in run_linkedin_enrichment accordingly,
and remove the MemoryClient::new_local() calls from those functions so they
reuse the shared client instance.
app/src/components/composio/toolkitMeta.tsx (1)

293-305: Inconsistency between CATALOG and KNOWN_COMPOSIO_TOOLKITS.

The CATALOG includes both slug variants (googlecalendar / google_calendar, googledrive / google_drive, googlesheets / google_sheets), but KNOWN_COMPOSIO_TOOLKITS only includes one variant for each. This could cause display issues if the backend returns the alternate variant.

Consider including both variants in KNOWN_COMPOSIO_TOOLKITS or documenting that the list is non-exhaustive:

♻️ Suggested fix
 export const KNOWN_COMPOSIO_TOOLKITS = Object.freeze([
   'gmail',
   'googlecalendar',
+  'google_calendar',
   'googledrive',
+  'google_drive',
   'notion',
   'github',
   'slack',
   'linear',
   'facebook',
   'google_sheets',
+  'googlesheets',
   'instagram',
   'reddit',
 ]);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/src/components/composio/toolkitMeta.tsx` around lines 293 - 305, The
KNOWN_COMPOSIO_TOOLKITS array is missing alternate slug variants present in
CATALOG (e.g., google_calendar vs googlecalendar, google_drive vs googledrive,
google_sheets vs googlesheets); update KNOWN_COMPOSIO_TOOLKITS to include both
slug variants for each toolkit or explicitly document that the list is
non‑exhaustive and used only as a hint. Locate the KNOWN_COMPOSIO_TOOLKITS
constant and add the alternate strings (google_calendar, google_drive,
google_sheets and any other duplicate variants found in CATALOG) so the frontend
can handle either slug returned by the backend.
app/src/pages/onboarding/Onboarding.tsx (2)

135-160: Add debug logging for context completion flow.

The function handles onboarding completion correctly with good error handling. Consider adding entry debug logging for traceability.

🔧 Suggested improvement
 const handleContextNext = async () => {
+  console.debug('[onboarding] handleContextNext: completing onboarding', {
+    connectedSources: draft.connectedSources,
+  });
   await setOnboardingTasks({
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/src/pages/onboarding/Onboarding.tsx` around lines 135 - 160, Add entry
debug logging at the start of handleContextNext to trace the context completion
flow: log a clear message (including relevant state like draft.connectedSources
or a minimal marker) before calling setOnboardingTasks, and add similar debug
logs before the userApi.onboardingComplete call and before
setOnboardingCompletedFlag to aid tracing; use the existing logging mechanism
(console.debug/console.log or the project logger) and include the function name
handleContextNext in each message to make logs searchable.

130-133: Add debug logging for new flow entry point.

Per coding guidelines, new flows should have substantial development-oriented logs. Consider adding a namespaced debug log when handleSkillsNext is invoked.

🔧 Suggested improvement
 const handleSkillsNext = async (connectedSources: string[]) => {
+  console.debug('[onboarding] handleSkillsNext called', { connectedSources });
   setDraft(prev => ({ ...prev, connectedSources }));
   handleNext();
 };
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/src/pages/onboarding/Onboarding.tsx` around lines 130 - 133, Add a
namespaced debug log at the start of the handleSkillsNext function so developer
telemetry records when this new flow entry point is invoked; specifically,
inside handleSkillsNext (which calls setDraft and handleNext) log a clear
namespaced message (e.g., "onboarding:handleSkillsNext") along with the
connectedSources payload and any relevant draft state before calling
setDraft/handleNext to aid debugging.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@app/src/components/composio/toolkitMeta.tsx`:
- Line 1: Prettier formatting failed in
app/src/components/composio/toolkitMeta.tsx; run the project's Prettier (and
ESLint autofix) in the app workspace, format this file (and any changed files)
and re-run linting so the file (toolkitMeta.tsx) adheres to the code style
rules, then stage and commit the formatted changes before pushing.

In `@app/src/pages/onboarding/steps/ContextGatheringStep.tsx`:
- Line 1: Run Prettier on the ContextGatheringStep.tsx file to resolve
formatting issues reported by the pipeline; specifically run `npm run format` or
`npx prettier --write` in the app workspace, then stage the formatted changes
for commit. Ensure the exported component/function ContextGatheringStep (and any
imports at top of the file) are correctly formatted per project Prettier rules
before pushing.
- Around line 96-111: The effect in useEffect checks hasGmail and synchronously
calls setStageStatuses, setStageDetails, and setFinished which triggers the
react-hooks/set-state-in-effect lint rule; instead, move the "skipped"
derivation out of the effect (derive skipped statuses from STAGES/hasGmail
during render) or wrap the state updates in a microtask so they are async (e.g.,
schedule via setTimeout/queueMicrotask) and keep the existing early-return
behavior in useEffect; update the logic around ranRef, hasGmail, STAGES,
setStageStatuses, setStageDetails, setFinished, and runPipeline accordingly so
no synchronous setState occurs directly in the effect body.

In `@app/src/pages/onboarding/steps/SkillsStep.tsx`:
- Around line 63-76: The step currently hard-codes displayToolkits to Gmail and
computes connectedCount from all connections; update it to derive
displayToolkits from useComposioIntegrations().toolkits by selecting the Gmail
toolkit (or an empty array when not present) so the UI reflects the backend
allowlist, then compute connectedCount and connectedSources only by iterating
connectionByToolkit for the slugs in displayToolkits (use the toolkit.slug to
filter), and change the loading/unavailable logic to show a retry/unavailable
card when composioError is set rather than always rendering an actionable Gmail
card; ensure composioToolkitMeta('gmail') is only used to map metadata for a
toolkit that exists in toolkits before adding to displayToolkits.

In `@src/core/all.rs`:
- Around line 138-139: The new learning controllers registered via
controllers.extend(crate::openhuman::learning::all_learning_registered_controllers())
lack a user-facing description because namespace_description() does not return a
description for the "learning" namespace; update the namespace_description
function to include a descriptive entry for "learning" (and any duplicate
namespace_description match arm referenced later around the other occurrence) so
CLI/help discovery shows a human-readable description for the learning RPC
surface—locate namespace_description and add a case for "learning" (or the exact
namespace string used when registering via all_learning_registered_controllers)
with a short explanatory string.

In `@src/openhuman/agent/harness/instructions.rs`:
- Around line 4-16: The formatting failure is caused by long string literals in
function tool_instructions_preamble(); run cargo fmt to auto-fix or manually
wrap/break the long s.push_str(...) calls into shorter multi-line string
literals (use concatenated or raw/multi-line strings) so they adhere to rustfmt
rules, updating the s.push_str(...) invocations around the code block and
"CRITICAL"/"Example" paragraphs to the shorter, formatted forms suggested in the
review.

In `@src/openhuman/channels/runtime/startup.rs`:
- Around line 220-229: The system prompt still includes Skill-category tool
descriptions because build_system_prompt(...) is being called with the full
tool_descs list; before constructing tool_descs (or before calling
build_system_prompt), filter tools_registry the same way you did for the
appended instruction block: create a non-skill collection (e.g., reuse
non_skill_tools/non_skill_refs logic) and build tool_descs only from those
non-skill tools so build_system_prompt(...) will not receive or include
Skill-category entries like Composio; alternatively, remove Skill entries from
the existing tool_descs vector prior to calling build_system_prompt.

In `@src/openhuman/learning/linkedin_enrichment.rs`:
- Line 1: Run rustfmt on the repository and fix formatting in this module: run
`cargo fmt --all` (and then `cargo check`) and commit the changes; specifically
ensure src/openhuman/learning/linkedin_enrichment.rs is reformatted to comply
with rustfmt rules (fix imports, spacing, line breaks, and doc comment alignment
for the LinkedIn enrichment module and any functions/impls within it) so cargo
fmt no longer reports failures.


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 47534480-d111-45b8-a4b5-4902dfdc16de

📥 Commits

Reviewing files that changed from the base of the PR and between ec138c8 and 6bf5e48.

📒 Files selected for processing (22)
  • app/src/components/composio/ComposioConnectModal.tsx
  • app/src/components/composio/toolkitMeta.ts
  • app/src/components/composio/toolkitMeta.tsx
  • app/src/pages/Skills.tsx
  • app/src/pages/__tests__/Skills.composio-catalog.test.tsx
  • app/src/pages/onboarding/Onboarding.tsx
  • app/src/pages/onboarding/steps/ContextGatheringStep.tsx
  • app/src/pages/onboarding/steps/SkillsStep.tsx
  • src/core/all.rs
  • src/openhuman/about_app/catalog.rs
  • src/openhuman/agent/harness/instructions.rs
  • src/openhuman/agent/harness/mod.rs
  • src/openhuman/channels/runtime/startup.rs
  • src/openhuman/config/schema/tools.rs
  • src/openhuman/context/prompt.rs
  • src/openhuman/integrations/apify.rs
  • src/openhuman/integrations/mod.rs
  • src/openhuman/integrations/types.rs
  • src/openhuman/learning/linkedin_enrichment.rs
  • src/openhuman/learning/mod.rs
  • src/openhuman/learning/schemas.rs
  • src/openhuman/tools/ops.rs
💤 Files with no reviewable changes (1)
  • app/src/components/composio/toolkitMeta.ts

@@ -0,0 +1,320 @@
/**
⚠️ Potential issue | 🟡 Minor

Address Prettier formatting issues.

The pipeline reports Prettier code style issues. Run formatting before merging.

As per coding guidelines: "Run Prettier and ESLint formatting/linting in the app workspace before merging."

🧰 Tools
🪛 GitHub Actions: Type Check

[warning] 1-1: Prettier reported code style issues in this file during --check.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/src/components/composio/toolkitMeta.tsx` at line 1, Prettier reported formatting failures during `--check`. Run the project's Prettier (and ESLint autofix) in the app workspace, format this file (and any other changed files), and re-run linting so toolkitMeta.tsx adheres to the code style rules. Then stage and commit the formatted changes before pushing.

@@ -0,0 +1,315 @@
/**

⚠️ Potential issue | 🟡 Minor

Address Prettier formatting issues.

The pipeline reports Prettier code style issues. Run `npm run format` or `npx prettier --write` on this file before merging.

As per coding guidelines: "Run Prettier and ESLint formatting/linting in the app workspace before merging."

🧰 Tools
🪛 GitHub Actions: Type Check

[warning] 1-1: Prettier reported code style issues in this file during --check.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/src/pages/onboarding/steps/ContextGatheringStep.tsx` at line 1, run Prettier on this file to resolve the formatting issues reported by the pipeline: run `npm run format` or `npx prettier --write` in the app workspace, then stage the formatted changes for commit. Ensure the exported ContextGatheringStep component (and the imports at the top of the file) are formatted per the project's Prettier rules before pushing.

Comment on lines +96 to +111
useEffect(() => {
  if (ranRef.current) return;
  ranRef.current = true;

  if (!hasGmail) {
    const skipped: Record<string, StageStatus> = {};
    for (const s of STAGES) skipped[s.id] = 'skipped';
    setStageStatuses(skipped);
    setStageDetails({ 'gmail-search': 'Gmail not connected' });
    setFinished(true);
    return;
  }

  void runPipeline();
  // eslint-disable-next-line react-hooks/exhaustive-deps
}, []);

⚠️ Potential issue | 🟡 Minor

Potential lint violation: setState calls inside useEffect.

Based on learnings, this codebase disallows synchronous setState calls directly inside useEffect bodies (react-hooks/set-state-in-effect lint rule). Lines 101-105 call setStageStatuses, setStageDetails, and setFinished directly in the effect.

Consider refactoring to derive the "skipped" state from props/render or moving the skip logic outside the effect:

♻️ Suggested refactor
+const ContextGatheringStep = ({
+  connectedSources,
+  onNext,
+  onBack: _onBack,
+}: ContextGatheringStepProps) => {
+  const hasGmail = connectedSources.some(s => s.includes('gmail'));
+
+  // Derive initial state based on hasGmail
+  const [stageStatuses, setStageStatuses] = useState<Record<string, StageStatus>>(() => {
+    const initial: Record<string, StageStatus> = {};
+    for (const s of STAGES) initial[s.id] = hasGmail ? 'pending' : 'skipped';
+    return initial;
+  });
+  const [stageDetails, setStageDetails] = useState<Record<string, string>>(() =>
+    hasGmail ? {} : { 'gmail-search': 'Gmail not connected' }
+  );
+  const [finished, setFinished] = useState(!hasGmail);
   // ...
   useEffect(() => {
     if (ranRef.current) return;
     ranRef.current = true;

-    if (!hasGmail) {
-      const skipped: Record<string, StageStatus> = {};
-      for (const s of STAGES) skipped[s.id] = 'skipped';
-      setStageStatuses(skipped);
-      setStageDetails({ 'gmail-search': 'Gmail not connected' });
-      setFinished(true);
-      return;
-    }
+    if (!hasGmail) return;

     void runPipeline();

Based on learnings: "In React components, do not perform synchronous setState (or other state-updating calls) directly inside useEffect bodies."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/src/pages/onboarding/steps/ContextGatheringStep.tsx` around lines 96 -
111, the effect checks hasGmail and synchronously calls setStageStatuses,
setStageDetails, and setFinished, which triggers the
react-hooks/set-state-in-effect lint rule. Instead, move the "skipped"
derivation out of the effect (derive the skipped statuses from STAGES/hasGmail
during render), or schedule the updates asynchronously (e.g. via
setTimeout/queueMicrotask) while keeping the existing early-return behavior in
useEffect. Update the logic around ranRef, hasGmail, STAGES, setStageStatuses,
setStageDetails, setFinished, and runPipeline accordingly so no synchronous
setState occurs directly in the effect body.

Comment on lines +63 to +76
const {
  connectionByToolkit,
  loading: composioLoading,
  refresh: refreshComposio,
} = useComposioIntegrations();

// Only show Gmail during onboarding — more integrations on the Integrations page.
const gmailMeta = composioToolkitMeta('gmail');
const displayToolkits: ComposioToolkitMeta[] = [gmailMeta];

const connectedCount = Array.from(connectionByToolkit.values()).filter(c => {
  const state = deriveComposioState(c);
  return state === 'connected';
}).length;

⚠️ Potential issue | 🟠 Major

Drive this step from the Gmail allowlist, not the global connection map.

useComposioIntegrations().toolkits is the backend allowlist, but this step ignores it and hard-codes displayToolkits to Gmail. At the same time, connectedCount and connectedSources are computed from all Composio connections. A user with an existing Slack/Notion connection will see “Continue” even if Gmail is disconnected, and ContextGatheringStep can then skip enrichment unexpectedly. The loading branch on Line 105 is also unreachable because displayToolkits.length is always 1. Gate the card on the fetched Gmail toolkit and scope the count/submission to the displayed slug(s).

💡 Possible fix
   const {
+    toolkits,
+    error: composioError,
     connectionByToolkit,
     loading: composioLoading,
     refresh: refreshComposio,
   } = useComposioIntegrations();

   // Only show Gmail during onboarding — more integrations on the Integrations page.
   const gmailMeta = composioToolkitMeta('gmail');
-  const displayToolkits: ComposioToolkitMeta[] = [gmailMeta];
+  const displayToolkits: ComposioToolkitMeta[] =
+    composioLoading || toolkits.includes(gmailMeta.slug) ? [gmailMeta] : [];

-  const connectedCount = Array.from(connectionByToolkit.values()).filter(c => {
-    const state = deriveComposioState(c);
-    return state === 'connected';
-  }).length;
+  const connectedCount = displayToolkits.filter(
+    meta => deriveComposioState(connectionByToolkit.get(meta.slug)) === 'connected'
+  ).length;

   const handleFinish = async () => {
     setError(null);
     setLoading(true);
     try {
-      const connectedSources = Array.from(connectionByToolkit.entries())
-        .filter(([, c]) => deriveComposioState(c) === 'connected')
-        .map(([slug]) => `composio:${slug}`);
+      const connectedSources = displayToolkits
+        .filter(meta => deriveComposioState(connectionByToolkit.get(meta.slug)) === 'connected')
+        .map(meta => `composio:${meta.slug}`);
       await onNext(connectedSources);

Use composioError to render an unavailable/retry state instead of leaving a permanently actionable Gmail card when the allowlist fetch fails.

Also applies to: 82-85, 105-189

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/src/pages/onboarding/steps/SkillsStep.tsx` around lines 63 - 76, the step
currently hard-codes displayToolkits to Gmail and computes connectedCount from
all connections. Instead, derive displayToolkits from
useComposioIntegrations().toolkits by selecting the Gmail toolkit (or an empty
array when it is absent) so the UI reflects the backend allowlist. Then compute
connectedCount and connectedSources by iterating connectionByToolkit only for
the slugs in displayToolkits (filter on toolkit.slug), and change the
loading/unavailable logic to show a retry/unavailable card when composioError is
set rather than always rendering an actionable Gmail card. Ensure
composioToolkitMeta('gmail') is only used to map metadata for a toolkit that
actually exists in toolkits before adding it to displayToolkits.

Comment on lines +138 to +139
// Self-learning and user context enrichment
controllers.extend(crate::openhuman::learning::all_learning_registered_controllers());

⚠️ Potential issue | 🟡 Minor

Add a description for the new learning namespace.

The controllers are registered here, but namespace_description() still falls through to None for learning. That leaves CLI/help discovery incomplete for the new RPC surface.

🛠️ Suggested follow-up
         "local_ai" => Some("Local AI chat, inference, downloads, and media operations."),
+        "learning" => Some("User context enrichment and self-learning controllers."),
         "migrate" => Some("Data migration utilities."),

Also applies to: 184-184

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/core/all.rs` around lines 138 - 139, the new learning controllers
registered via
controllers.extend(crate::openhuman::learning::all_learning_registered_controllers())
lack a user-facing description because namespace_description() does not return a
description for the "learning" namespace. Update namespace_description() to add
a case for "learning" (or the exact namespace string used when registering via
all_learning_registered_controllers) with a short explanatory string, and apply
the same change to the duplicate match arm referenced at the other occurrence,
so CLI/help discovery shows a human-readable description for the learning RPC
surface.

Comment on lines +4 to +16
fn tool_instructions_preamble() -> String {
    let mut s = String::new();
    s.push_str("\n## Tool Use Protocol\n\n");
    s.push_str("To use a tool, wrap a JSON object in <tool_call></tool_call> tags:\n\n");
    s.push_str("```\n<tool_call>\n{\"name\": \"tool_name\", \"arguments\": {\"param\": \"value\"}}\n</tool_call>\n```\n\n");
    s.push_str("CRITICAL: Output actual <tool_call> tags—never describe steps or give examples.\n\n");
    s.push_str("Example: User says \"what's the date?\". You MUST respond with:\n<tool_call>\n{\"name\":\"shell\",\"arguments\":{\"command\":\"date\"}}\n</tool_call>\n\n");
    s.push_str("You may use multiple tool calls in a single response. ");
    s.push_str("After tool execution, results appear in <tool_result> tags. ");
    s.push_str("Continue reasoning with the results until you can give a final answer.\n\n");
    s.push_str("### Available Tools\n\n");
    s
}

⚠️ Potential issue | 🟡 Minor

Fix formatting to pass CI.

The pipeline shows cargo fmt --all -- --check failed due to formatting differences in this file. The long string literals in tool_instructions_preamble() likely need to be reformatted.

🔧 Suggested fix

Run cargo fmt to auto-fix, or manually break long strings:

 fn tool_instructions_preamble() -> String {
     let mut s = String::new();
     s.push_str("\n## Tool Use Protocol\n\n");
     s.push_str("To use a tool, wrap a JSON object in <tool_call></tool_call> tags:\n\n");
-    s.push_str("```\n<tool_call>\n{\"name\": \"tool_name\", \"arguments\": {\"param\": \"value\"}}\n</tool_call>\n```\n\n");
-    s.push_str("CRITICAL: Output actual <tool_call> tags—never describe steps or give examples.\n\n");
-    s.push_str("Example: User says \"what's the date?\". You MUST respond with:\n<tool_call>\n{\"name\":\"shell\",\"arguments\":{\"command\":\"date\"}}\n</tool_call>\n\n");
+    s.push_str(
+        "```\n<tool_call>\n{\"name\": \"tool_name\", \"arguments\": {\"param\": \"value\"}}\n\
+         </tool_call>\n```\n\n",
+    );
+    s.push_str(
+        "CRITICAL: Output actual <tool_call> tags—never describe steps or give examples.\n\n",
+    );
+    s.push_str(
+        "Example: User says \"what's the date?\". You MUST respond with:\n\
+         <tool_call>\n{\"name\":\"shell\",\"arguments\":{\"command\":\"date\"}}\n</tool_call>\n\n",
+    );
     s.push_str("You may use multiple tool calls in a single response. ");
🧰 Tools
🪛 GitHub Actions: Type Check

[error] 6-1: cargo fmt --all -- --check failed due to formatting differences. Suggested change: wrap CRITICAL push_str string in a multi-line push_str(...) call.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/agent/harness/instructions.rs` around lines 4 - 16, the
formatting failure is caused by long string literals in
tool_instructions_preamble(). Run cargo fmt to auto-fix, or manually break the
long s.push_str(...) calls (around the code block and the "CRITICAL"/"Example"
paragraphs) into shorter multi-line string literals (use concatenated or
raw/multi-line strings) so they adhere to rustfmt rules, matching the
formatted forms suggested in the review.

Comment on lines +220 to +229
// Filter out Skill-category tools (e.g. Composio, Apify) from the
// main agent prompt — those are only available to the skills_agent
// subagent via category_filter = "skill".
let non_skill_tools: Vec<&Box<dyn crate::openhuman::tools::Tool>> = tools_registry
    .iter()
    .filter(|t| t.category() != crate::openhuman::tools::traits::ToolCategory::Skill)
    .collect();
let non_skill_refs: Vec<&dyn crate::openhuman::tools::Tool> =
    non_skill_tools.iter().map(|t| t.as_ref()).collect();
system_prompt.push_str(&build_tool_instructions_filtered(&non_skill_refs));

⚠️ Potential issue | 🟠 Major

Filter the manual tool summary too.

This only trims the appended instruction block. build_system_prompt(...) has already consumed tool_descs, and Lines 180-185 still add the composio description when enabled, so the main agent prompt continues to advertise a Skill-category tool. Apply the same non-skill filter before building tool_descs, or remove Skill entries from that list entirely.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/channels/runtime/startup.rs` around lines 220 - 229, the system
prompt still includes Skill-category tool descriptions because
build_system_prompt(...) is called with the full tool_descs list. Before
constructing tool_descs (or before calling build_system_prompt), filter
tools_registry the same way as the appended instruction block: reuse the
non_skill_tools/non_skill_refs logic and build tool_descs only from non-skill
tools, so build_system_prompt(...) never receives Skill-category entries such as
Composio. Alternatively, remove Skill entries from the existing tool_descs
vector prior to calling build_system_prompt.
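
A minimal, self-contained Rust sketch of this non-skill filter applied before building tool_descs. The `Tool` trait, `ToolCategory` enum, and the example tools here are simplified stand-ins for the real crate types, not the actual codebase API:

```rust
// Simplified stand-ins for the real Tool trait and ToolCategory enum.
#[derive(PartialEq)]
enum ToolCategory {
    Core,
    Skill,
}

trait Tool {
    fn name(&self) -> &str;
    fn category(&self) -> ToolCategory;
    fn description(&self) -> &str;
}

struct Shell;
impl Tool for Shell {
    fn name(&self) -> &str { "shell" }
    fn category(&self) -> ToolCategory { ToolCategory::Core }
    fn description(&self) -> &str { "Run shell commands" }
}

struct Composio;
impl Tool for Composio {
    fn name(&self) -> &str { "composio" }
    fn category(&self) -> ToolCategory { ToolCategory::Skill }
    fn description(&self) -> &str { "Composio skill tool" }
}

// Apply the non-skill filter BEFORE producing the descriptions handed to
// the prompt builder, not just to the appended instruction block.
fn non_skill_descs(tools: &[Box<dyn Tool>]) -> Vec<String> {
    tools
        .iter()
        .filter(|t| t.category() != ToolCategory::Skill)
        .map(|t| format!("- {}: {}", t.name(), t.description()))
        .collect()
}

fn main() {
    let tools_registry: Vec<Box<dyn Tool>> = vec![Box::new(Shell), Box::new(Composio)];
    let tool_descs = non_skill_descs(&tools_registry);
    // Only the Core-category shell tool survives the filter.
    println!("{}", tool_descs.join("\n"));
}
```

With this shape, the same filtered list can feed both the tool summary and the instruction block, so the two stay consistent.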

@@ -0,0 +1,615 @@
//! LinkedIn profile enrichment via Gmail email mining + Apify scraping.

⚠️ Potential issue | 🟡 Minor

Address cargo fmt issues before merging.

The pipeline reports multiple cargo fmt failures. Run cargo fmt --all to fix formatting.

As per coding guidelines: "Run cargo fmt and cargo check for Rust code before merging."

🧰 Tools
🪛 GitHub Actions: Type Check

[error] 62-1: cargo fmt --all -- --check failed due to formatting differences. Suggested change: format result.log.push(...) with method chaining across multiple lines.


[error] 72-1: cargo fmt --all -- --check failed due to formatting differences. Suggested change: wrap result.log.push(...) across multiple lines.


[error] 86-1: cargo fmt --all -- --check failed due to formatting differences. Suggested change: wrap result.log.push(...) across multiple lines.


[error] 96-1: cargo fmt --all -- --check failed due to formatting differences. Suggested change: format result.log.push(...) with chained lines.


[error] 279-1: cargo fmt --all -- --check failed due to formatting differences. Suggested change: break exp.get("description").and_then(...).unwrap_or("") chain across multiple lines.


[error] 368-1: cargo fmt --all -- --check failed due to formatting differences. Suggested change: format static COMM_RE LazyLock::new(...) with line breaks.


[error] 407-1: cargo fmt --all -- --check failed due to formatting differences. Suggested change: condense msg.pointer(...).and_then(... ) chain onto a single line.


[error] 516-1: cargo fmt --all -- --check failed due to formatting differences. Suggested change: format MemoryClient::new_local().map_err(...) call with line breaks.


[error] 526-1: cargo fmt --all -- --check failed due to formatting differences. Suggested change: align store_skill_sync string arguments in a formatted multi-line style.


[error] 549-1: cargo fmt --all -- --check failed due to formatting differences. Suggested change: format MemoryClient::new_local().map_err(...) call with line breaks.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/learning/linkedin_enrichment.rs` at line 1, run `cargo fmt
--all` (and then `cargo check`) and commit the changes. Ensure this module is
reformatted to comply with rustfmt rules (imports, spacing, line breaks, and
doc-comment alignment for the LinkedIn enrichment functions and impls) so cargo
fmt no longer reports failures.
