feat: onboarding Gmail integration + LinkedIn profile enrichment #524

senamakel wants to merge 6 commits into `tinyhumansai:main`
Conversation
…ing steps

- Removed the old `toolkitMeta.ts` file and replaced it with a new `toolkitMeta.tsx` file that includes updated metadata handling for Composio toolkits, enhancing the integration with React components.
- Updated the `ComposioConnectModal` to directly render icons without additional markup, streamlining the component structure.
- Modified the `Skills` page to utilize the new icon rendering method, improving consistency across the application.
- Enhanced the onboarding process by introducing a new `ContextGatheringStep` component, which gathers user context from connected integrations, improving the onboarding experience.
- Updated the `SkillsStep` to reflect changes in toolkit connection handling and display, ensuring a smoother user interaction during onboarding.
…d status retrieval

- Added new tools for running Apify actors and fetching their run statuses, enhancing automation capabilities.
- Updated the integration schema to include an `apify` toggle for user configuration, allowing for flexible integration management.
- Enhanced the onboarding experience by modifying the SkillsStep to focus on Gmail integration, streamlining user interactions.
- Improved documentation and comments for clarity on the new Apify functionalities and their usage.
- Introduced a new `linkedin_enrichment` module for enriching user profiles by scraping LinkedIn data from Gmail.
- Implemented the `run_linkedin_enrichment` function to handle the enrichment pipeline, including Gmail search, scraping via Apify, and data persistence.
- Added controller schemas for the learning domain, enabling integration with the existing controller framework.
- Updated the `all.rs` file to register the new learning controllers and schemas, enhancing the overall functionality of the learning system.
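The pipeline's Gmail-search stage has to pull a LinkedIn profile URL out of raw email bodies before handing it to Apify. The PR implements this in Rust inside `linkedin_enrichment.rs`; the following is only a TypeScript sketch of the extraction step, with the function name and matching rules assumed rather than taken from the PR:

```typescript
// Hypothetical sketch: pull the first LinkedIn profile URL out of email bodies.
// The real pipeline is implemented in Rust; this only illustrates the idea.
function extractLinkedInProfileUrl(bodies: string[]): string | null {
  // Match https://www.linkedin.com/in/<slug>, stopping before tracking query strings.
  const pattern = /https?:\/\/(?:www\.)?linkedin\.com\/in\/[A-Za-z0-9_%-]+/;
  for (const body of bodies) {
    const match = body.match(pattern);
    if (match) {
      // Defensive: strip anything after '?' if the slug ever captures it.
      return match[0].split('?')[0];
    }
  }
  return null;
}
```

LinkedIn notification emails typically carry tracking parameters (`?trk=...`), which is why the sketch anchors on the path segment only.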
- Added functionality to generate a PROFILE.md file from scraped LinkedIn data, summarizing user profiles for agent context.
- Updated the `run_linkedin_enrichment` function to write PROFILE.md to the workspace, enhancing data persistence.
- Introduced helper functions for rendering and summarizing LinkedIn profiles, improving the overall enrichment process.
- Ensured minimal PROFILE.md creation even when scraping fails, maintaining essential user context.
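The "minimal PROFILE.md even when scraping fails" behavior above can be sketched as a pure rendering function. The real helpers are Rust and the profile field names below are assumptions, not the PR's actual schema:

```typescript
// Hypothetical sketch of the PROFILE.md rendering step; field names assumed.
interface ScrapedProfile {
  fullName?: string;
  headline?: string;
  location?: string;
}

function renderProfileMarkdown(url: string, profile: ScrapedProfile | null): string {
  const lines = ['# PROFILE', '', `LinkedIn: ${url}`];
  if (profile?.fullName) lines.push(`Name: ${profile.fullName}`);
  if (profile?.headline) lines.push(`Headline: ${profile.headline}`);
  if (profile?.location) lines.push(`Location: ${profile.location}`);
  if (!profile) {
    // Mirror the PR's fallback: keep a minimal PROFILE.md when scraping fails.
    lines.push('', '(Profile scrape unavailable; URL retained for context.)');
  }
  return lines.join('\n');
}
```

The key design point is that the URL is always written, so downstream agents retain some user context even on a failed scrape.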
…t pipeline

- Updated the ContextGatheringStep to integrate a new pipeline for LinkedIn enrichment, replacing the previous Gmail profile fetching stages.
- Implemented a progress animation and logging for the enrichment process, improving user feedback during data retrieval.
- Refactored stage definitions to align with the new pipeline structure, enhancing clarity and maintainability.
- Introduced error handling and status updates for each stage of the enrichment process, ensuring a robust user experience.
…and flexibility

- Introduced helper functions `tool_instructions_preamble` and `append_tool_entry` to streamline the construction of tool instructions.
- Updated `build_tool_instructions` to utilize the new helper functions, improving code readability and maintainability.
- Added `build_tool_instructions_filtered` to allow for generating instructions from a filtered list of tools, enhancing flexibility in tool usage.
- Adjusted the startup process to use the filtered instructions, ensuring only relevant tools are included in the system prompt.
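The helpers above (`tool_instructions_preamble`, `append_tool_entry`, `build_tool_instructions_filtered`) are Rust. As a language-agnostic illustration of the filter-then-build pattern they implement, here is a TypeScript sketch; the preamble text and type shapes are illustrative, not the PR's:

```typescript
// Sketch of building a tool-instructions block from a filtered tool list.
// Names and shapes are stand-ins for the Rust helpers described above.
interface ToolDesc {
  name: string;
  category: string;
  description: string;
}

function buildToolInstructionsFiltered(
  tools: ToolDesc[],
  include: (t: ToolDesc) => boolean
): string {
  let out = 'You may call the following tools:\n'; // stand-in preamble
  for (const t of tools.filter(include)) {
    // One entry per tool that survives the predicate.
    out += `- ${t.name}: ${t.description}\n`;
  }
  return out;
}
```

The startup change described above amounts to passing a predicate that excludes Skill-category tools, so the system prompt only lists tools the agent should call directly.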
📝 Walkthrough

The PR introduces LinkedIn profile enrichment during onboarding via Gmail and Apify integration, migrates Composio toolkit icons to React components with branded SVG support, adds three Apify actor tools for running and monitoring async tasks, and extends the onboarding flow with a new context-gathering step that executes the enrichment pipeline.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    actor User
    participant Frontend
    participant Backend as Backend<br/>(RPC Handler)
    participant Gmail as Gmail/Composio
    participant Apify as Apify API
    participant Memory as Memory Store
    participant LLM as LLM Service
    User->>Frontend: Click "Continue" on SkillsStep
    Frontend->>Frontend: Update draft.connectedSources
    Frontend->>Backend: callCoreRpc("openhuman.learning_linkedin_enrichment")
    activate Backend
    Backend->>Gmail: Composio: Search from:linkedin.com
    Gmail-->>Backend: Email messages with LinkedIn URLs
    Backend->>Backend: Extract profile URL from emails
    Backend->>Apify: POST /dev_fusion/linkedin-profile-scraper
    Apify-->>Backend: Run ID, status, dataset reference
    Backend->>Apify: Poll run status until SUCCEEDED
    Apify-->>Backend: Final status SUCCEEDED + results
    Backend->>Backend: Extract profile JSON from dataset
    Backend->>LLM: Optional: Summarize profile (with fallback)
    LLM-->>Backend: Markdown summary or error (use raw JSON)
    Backend->>Memory: Store profile via MemoryClient
    Memory-->>Backend: Stored
    Backend-->>Frontend: { profile_url, profile_data, log }
    deactivate Backend
    Frontend->>Frontend: Parse logs, update UI progress
    Frontend->>User: Display results, enable Continue button
    User->>Frontend: Click "Continue"
    Frontend->>Frontend: Call onNext(), advance to next step
```
```mermaid
sequenceDiagram
    participant ComposioModal
    participant SkillsStep
    participant Integrations as useComposioIntegrations
    participant GmailIntegration
    ComposioModal->>SkillsStep: User clicks "Connect Gmail"
    SkillsStep->>SkillsStep: Set activeToolkit = "gmail"
    SkillsStep->>ComposioModal: Render with activeToolkit state
    ComposioModal->>ComposioModal: Show OAuth flow
    ComposioModal->>GmailIntegration: User authorizes
    GmailIntegration-->>ComposioModal: Success callback
    ComposioModal->>SkillsStep: onClose()
    SkillsStep->>Integrations: Re-fetch integrations state
    Integrations-->>SkillsStep: Updated status (connected=true)
    SkillsStep->>SkillsStep: Update UI: show "Connected" badge
    SkillsStep->>SkillsStep: Enable "Continue" button
```
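The "Poll run status until SUCCEEDED" step in the first diagram is the only long-running part of the backend flow. A minimal sketch of such a loop, in TypeScript with an injected `fetchStatus` callback; the status names come from the diagram, but the function, options, and budget are assumptions rather than the PR's actual Apify tool code:

```typescript
// Illustrative polling loop for the "poll until SUCCEEDED" step.
type RunStatus = 'READY' | 'RUNNING' | 'SUCCEEDED' | 'FAILED' | 'TIMED-OUT';

async function pollUntilDone(
  fetchStatus: () => Promise<RunStatus>,
  { intervalMs = 1000, maxAttempts = 60 } = {}
): Promise<RunStatus> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const status = await fetchStatus();
    // Terminal states end the loop; anything else waits and retries.
    if (status === 'SUCCEEDED' || status === 'FAILED' || status === 'TIMED-OUT') {
      return status;
    }
    await new Promise(resolve => setTimeout(resolve, intervalMs));
  }
  throw new Error('Apify run did not finish within the polling budget');
}
```

Injecting `fetchStatus` keeps the loop testable without touching the network; a real implementation would also want backoff and cancellation.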
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 3 passed.
Actionable comments posted: 8
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/openhuman/context/prompt.rs (1)
355-375: ⚠️ Potential issue | 🟠 Major

Align identity file usage across agent and channel prompts.

The main agent prompt no longer injects `USER.md` (replaced with `PROFILE.md`), but `channels_prompt.rs` (line 41) still lists `USER.md` in `bootstrap_files`, and `subconscious/prompt.rs` still loads identity context from `USER.md`. Additionally, `workspace/ops.rs` (line 11) still includes the default `USER.md` content.

If `USER.md` is being phased out in favor of `PROFILE.md`, update `channels_prompt.rs` and `subconscious/prompt.rs` to use `PROFILE.md` and remove the `USER.md` default from `workspace/ops.rs`. If channels and subconscious intentionally retain `USER.md` while main agents use `PROFILE.md`, document this separation in code comments.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/openhuman/context/prompt.rs` around lines 355 - 375, The repo is inconsistent: main prompt code now uses PROFILE.md but channels_prompt.rs's bootstrap_files and subconscious/prompt.rs still reference USER.md and workspace/ops.rs still provides a USER.md default; update channels_prompt.rs (bootstrap_files) and subconscious/prompt.rs (where identity is loaded) to reference "PROFILE.md" instead of "USER.md", and remove the USER.md default content from workspace/ops.rs (or if the intent is to keep both, add a clear code comment in channels_prompt.rs and subconscious/prompt.rs documenting why channels/subconscious retain USER.md while main agents use PROFILE.md). Ensure you update the relevant constants/arrays and any calls that read or inject the identity file to use "PROFILE.md" (or add the explanatory comment) so file usage is consistent.
🧹 Nitpick comments (5)
app/src/pages/onboarding/steps/ContextGatheringStep.tsx (1)
113-121: Add debug logging for pipeline execution.

Per coding guidelines, add namespaced debug logs for new flows to aid tracing.

🔧 Suggested improvement

```diff
 async function runPipeline() {
+  console.debug('[onboarding:context-gathering] starting enrichment pipeline');
   // Mark all stages as active (pipeline runs as one call).
   setStageStatuses(prev => ({ ...prev, 'gmail-search': 'active' }));
   try {
     const raw = await callCoreRpc<unknown>({
       method: 'openhuman.learning_linkedin_enrichment',
     });
+    console.debug('[onboarding:context-gathering] RPC completed', { raw });
     const result = unwrapCliEnvelope<EnrichmentResult>(raw);
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@app/src/pages/onboarding/steps/ContextGatheringStep.tsx` around lines 113 - 121, The runPipeline function is missing namespaced debug logging for tracing; add debug logs (using the project's logger or a namespaced console.debug) at key points in runPipeline: before starting the pipeline (after setStageStatuses), immediately before and after calling callCoreRpc('openhuman.learning_linkedin_enrichment'), and after unwrapCliEnvelope(EnrichmentResult) to log the unwrapped result or error context; reference the runPipeline function and the callCoreRpc/unwrapCliEnvelope calls when inserting these concise, namespaced debug messages to aid tracing.

src/openhuman/learning/linkedin_enrichment.rs (1)
512-546: Consider reusing MemoryClient instance.

Both `persist_linkedin_profile` and `persist_linkedin_url_only` create separate `MemoryClient::new_local()` instances. Since both are called from the same pipeline, consider passing the client as a parameter or creating it once in `run_linkedin_enrichment`.

This is minor since the pipeline runs once during onboarding.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/openhuman/learning/linkedin_enrichment.rs` around lines 512 - 546, persist_linkedin_profile and persist_linkedin_url_only each call MemoryClient::new_local(), causing duplicate client creation; change the design to create a single MemoryClient in run_linkedin_enrichment and pass a reference or owned client into persist_linkedin_profile and persist_linkedin_url_only (e.g., add a parameter like memory: &MemoryClient or memory: MemoryClient), update their signatures and call sites in run_linkedin_enrichment accordingly, and remove the MemoryClient::new_local() calls from those functions so they reuse the shared client instance.

app/src/components/composio/toolkitMeta.tsx (1)
293-305: Inconsistency between CATALOG and KNOWN_COMPOSIO_TOOLKITS.

The `CATALOG` includes both slug variants (`googlecalendar`/`google_calendar`, `googledrive`/`google_drive`, `googlesheets`/`google_sheets`), but `KNOWN_COMPOSIO_TOOLKITS` only includes one variant for each. This could cause display issues if the backend returns the alternate variant.
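An alternative to enumerating every variant is to normalize slugs before lookup, so `google_calendar` and `googlecalendar` resolve identically. This helper is hypothetical and not part of the PR; it just sketches the idea:

```typescript
// Hypothetical alternative: normalize slugs so both variants match one entry.
function normalizeToolkitSlug(slug: string): string {
  // Lowercase and drop underscores/hyphens: 'Google_Calendar' -> 'googlecalendar'.
  return slug.toLowerCase().replace(/[_-]/g, '');
}

// Sample subset of the allowlist, stored in normalized form.
const KNOWN = new Set(
  ['gmail', 'googlecalendar', 'googledrive', 'googlesheets'].map(normalizeToolkitSlug)
);

function isKnownToolkit(slug: string): boolean {
  return KNOWN.has(normalizeToolkitSlug(slug));
}
```

Normalizing at the lookup boundary means the list never needs to track which variant the backend happens to return.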
KNOWN_COMPOSIO_TOOLKITSor documenting that the list is non-exhaustive:♻️ Suggested fix
export const KNOWN_COMPOSIO_TOOLKITS = Object.freeze([ 'gmail', 'googlecalendar', + 'google_calendar', 'googledrive', + 'google_drive', 'notion', 'github', 'slack', 'linear', 'facebook', 'google_sheets', + 'googlesheets', 'instagram', 'reddit', ]);🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@app/src/components/composio/toolkitMeta.tsx` around lines 293 - 305, The KNOWN_COMPOSIO_TOOLKITS array is missing alternate slug variants present in CATALOG (e.g., google_calendar vs googlecalendar, google_drive vs googledrive, google_sheets vs googlesheets); update KNOWN_COMPOSIO_TOOLKITS to include both slug variants for each toolkit or explicitly document that the list is non-exhaustive and used only as a hint. Locate the KNOWN_COMPOSIO_TOOLKITS constant and add the alternate strings (google_calendar, google_drive, google_sheets and any other duplicate variants found in CATALOG) so the frontend can handle either slug returned by the backend.

app/src/pages/onboarding/Onboarding.tsx (2)
135-160: Add debug logging for context completion flow.

The function handles onboarding completion correctly with good error handling. Consider adding entry debug logging for traceability.

🔧 Suggested improvement

```diff
 const handleContextNext = async () => {
+  console.debug('[onboarding] handleContextNext: completing onboarding', {
+    connectedSources: draft.connectedSources,
+  });
   await setOnboardingTasks({
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@app/src/pages/onboarding/Onboarding.tsx` around lines 135 - 160, Add entry debug logging at the start of handleContextNext to trace the context completion flow: log a clear message (including relevant state like draft.connectedSources or a minimal marker) before calling setOnboardingTasks, and add similar debug logs before the userApi.onboardingComplete call and before setOnboardingCompletedFlag to aid tracing; use the existing logging mechanism (console.debug/console.log or the project logger) and include the function name handleContextNext in each message to make logs searchable.
130-133: Add debug logging for new flow entry point.

Per coding guidelines, new flows should have substantial development-oriented logs. Consider adding a namespaced debug log when `handleSkillsNext` is invoked.

🔧 Suggested improvement

```diff
 const handleSkillsNext = async (connectedSources: string[]) => {
+  console.debug('[onboarding] handleSkillsNext called', { connectedSources });
   setDraft(prev => ({ ...prev, connectedSources }));
   handleNext();
 };
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@app/src/pages/onboarding/Onboarding.tsx` around lines 130 - 133, Add a namespaced debug log at the start of the handleSkillsNext function so developer telemetry records when this new flow entry point is invoked; specifically, inside handleSkillsNext (which calls setDraft and handleNext) log a clear namespaced message (e.g., "onboarding:handleSkillsNext") along with the connectedSources payload and any relevant draft state before calling setDraft/handleNext to aid debugging.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@app/src/components/composio/toolkitMeta.tsx`:
- Line 1: Prettier formatting failed in
app/src/components/composio/toolkitMeta.tsx; run the project's Prettier (and
ESLint autofix) in the app workspace, format this file (and any changed files)
and re-run linting so the file (toolkitMeta.tsx) adheres to the code style
rules, then stage and commit the formatted changes before pushing.
In `@app/src/pages/onboarding/steps/ContextGatheringStep.tsx`:
- Line 1: Run Prettier on the ContextGatheringStep.tsx file to resolve
formatting issues reported by the pipeline; specifically run `npm run format` or
`npx prettier --write` in the app workspace, then stage the formatted changes
for commit. Ensure the exported component/function ContextGatheringStep (and any
imports at top of the file) are correctly formatted per project Prettier rules
before pushing.
- Around line 96-111: The effect in useEffect checks hasGmail and synchronously
calls setStageStatuses, setStageDetails, and setFinished which triggers the
react-hooks/set-state-in-effect lint rule; instead, move the "skipped"
derivation out of the effect (derive skipped statuses from STAGES/hasGmail
during render) or wrap the state updates in a microtask so they are async (e.g.,
schedule via setTimeout/queueMicrotask) and keep the existing early-return
behavior in useEffect; update the logic around ranRef, hasGmail, STAGES,
setStageStatuses, setStageDetails, setFinished, and runPipeline accordingly so
no synchronous setState occurs directly in the effect body.
In `@app/src/pages/onboarding/steps/SkillsStep.tsx`:
- Around line 63-76: The step currently hard-codes displayToolkits to Gmail and
computes connectedCount from all connections; update it to derive
displayToolkits from useComposioIntegrations().toolkits by selecting the Gmail
toolkit (or an empty array when not present) so the UI reflects the backend
allowlist, then compute connectedCount and connectedSources only by iterating
connectionByToolkit for the slugs in displayToolkits (use the toolkit.slug to
filter), and change the loading/unavailable logic to show a retry/unavailable
card when composioError is set rather than always rendering an actionable Gmail
card; ensure composioToolkitMeta('gmail') is only used to map metadata for a
toolkit that exists in toolkits before adding to displayToolkits.
In `@src/core/all.rs`:
- Around line 138-139: The new learning controllers registered via
controllers.extend(crate::openhuman::learning::all_learning_registered_controllers())
lack a user-facing description because namespace_description() does not return a
description for the "learning" namespace; update the namespace_description
function to include a descriptive entry for "learning" (and any duplicate
namespace_description match arm referenced later around the other occurrence) so
CLI/help discovery shows a human-readable description for the learning RPC
surface—locate namespace_description and add a case for "learning" (or the exact
namespace string used when registering via all_learning_registered_controllers)
with a short explanatory string.
In `@src/openhuman/agent/harness/instructions.rs`:
- Around line 4-16: The formatting failure is caused by long string literals in
function tool_instructions_preamble(); run cargo fmt to auto-fix or manually
wrap/break the long s.push_str(...) calls into shorter multi-line string
literals (use concatenated or raw/multi-line strings) so they adhere to rustfmt
rules, updating the s.push_str(...) invocations around the code block and
"CRITICAL"/"Example" paragraphs to the shorter, formatted forms suggested in the
review.
In `@src/openhuman/channels/runtime/startup.rs`:
- Around line 220-229: The system prompt still includes Skill-category tool
descriptions because build_system_prompt(...) is being called with the full
tool_descs list; before constructing tool_descs (or before calling
build_system_prompt), filter tools_registry the same way you did for the
appended instruction block: create a non-skill collection (e.g., reuse
non_skill_tools/non_skill_refs logic) and build tool_descs only from those
non-skill tools so build_system_prompt(...) will not receive or include
Skill-category entries like Composio; alternatively, remove Skill entries from
the existing tool_descs vector prior to calling build_system_prompt.
In `@src/openhuman/learning/linkedin_enrichment.rs`:
- Line 1: Run rustfmt on the repository and fix formatting in this module: run
`cargo fmt --all` (and then `cargo check`) and commit the changes; specifically
ensure src/openhuman/learning/linkedin_enrichment.rs is reformatted to comply
with rustfmt rules (fix imports, spacing, line breaks, and doc comment alignment
for the LinkedIn enrichment module and any functions/impls within it) so cargo
fmt no longer reports failures.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 47534480-d111-45b8-a4b5-4902dfdc16de
📒 Files selected for processing (22)
- app/src/components/composio/ComposioConnectModal.tsx
- app/src/components/composio/toolkitMeta.ts
- app/src/components/composio/toolkitMeta.tsx
- app/src/pages/Skills.tsx
- app/src/pages/__tests__/Skills.composio-catalog.test.tsx
- app/src/pages/onboarding/Onboarding.tsx
- app/src/pages/onboarding/steps/ContextGatheringStep.tsx
- app/src/pages/onboarding/steps/SkillsStep.tsx
- src/core/all.rs
- src/openhuman/about_app/catalog.rs
- src/openhuman/agent/harness/instructions.rs
- src/openhuman/agent/harness/mod.rs
- src/openhuman/channels/runtime/startup.rs
- src/openhuman/config/schema/tools.rs
- src/openhuman/context/prompt.rs
- src/openhuman/integrations/apify.rs
- src/openhuman/integrations/mod.rs
- src/openhuman/integrations/types.rs
- src/openhuman/learning/linkedin_enrichment.rs
- src/openhuman/learning/mod.rs
- src/openhuman/learning/schemas.rs
- src/openhuman/tools/ops.rs
💤 Files with no reviewable changes (1)
- app/src/components/composio/toolkitMeta.ts
```diff
@@ -0,0 +1,320 @@
 /**
```
Address Prettier formatting issues.
The pipeline reports Prettier code style issues. Run formatting before merging.
As per coding guidelines: "Run Prettier and ESLint formatting/linting in the app workspace before merging."
🧰 Tools
🪛 GitHub Actions: Type Check
[warning] 1-1: Prettier reported code style issues in this file during --check.
```diff
@@ -0,0 +1,315 @@
 /**
```
Address Prettier formatting issues.
The pipeline reports Prettier code style issues. Run `npm run format` or `npx prettier --write` on this file before merging.
As per coding guidelines: "Run Prettier and ESLint formatting/linting in the app workspace before merging."
🧰 Tools
🪛 GitHub Actions: Type Check
[warning] 1-1: Prettier reported code style issues in this file during --check.
```tsx
useEffect(() => {
  if (ranRef.current) return;
  ranRef.current = true;

  if (!hasGmail) {
    const skipped: Record<string, StageStatus> = {};
    for (const s of STAGES) skipped[s.id] = 'skipped';
    setStageStatuses(skipped);
    setStageDetails({ 'gmail-search': 'Gmail not connected' });
    setFinished(true);
    return;
  }

  void runPipeline();
  // eslint-disable-next-line react-hooks/exhaustive-deps
}, []);
```
Potential lint violation: setState calls inside useEffect.
Based on learnings, this codebase disallows synchronous setState calls directly inside useEffect bodies (react-hooks/set-state-in-effect lint rule). Lines 101-105 call setStageStatuses, setStageDetails, and setFinished directly in the effect.
Consider refactoring to derive the "skipped" state from props/render or moving the skip logic outside the effect:
♻️ Suggested refactor

```diff
+const ContextGatheringStep = ({
+  connectedSources,
+  onNext,
+  onBack: _onBack,
+}: ContextGatheringStepProps) => {
+  const hasGmail = connectedSources.some(s => s.includes('gmail'));
+
+  // Derive initial state based on hasGmail
+  const [stageStatuses, setStageStatuses] = useState<Record<string, StageStatus>>(() => {
+    const initial: Record<string, StageStatus> = {};
+    for (const s of STAGES) initial[s.id] = hasGmail ? 'pending' : 'skipped';
+    return initial;
+  });
+  const [stageDetails, setStageDetails] = useState<Record<string, string>>(() =>
+    hasGmail ? {} : { 'gmail-search': 'Gmail not connected' }
+  );
+  const [finished, setFinished] = useState(!hasGmail);
   // ...
   useEffect(() => {
     if (ranRef.current) return;
     ranRef.current = true;
-    if (!hasGmail) {
-      const skipped: Record<string, StageStatus> = {};
-      for (const s of STAGES) skipped[s.id] = 'skipped';
-      setStageStatuses(skipped);
-      setStageDetails({ 'gmail-search': 'Gmail not connected' });
-      setFinished(true);
-      return;
-    }
+    if (!hasGmail) return;
     void runPipeline();
```

Based on learnings: "In React components, do not perform synchronous setState (or other state-updating calls) directly inside useEffect bodies."
```tsx
const {
  connectionByToolkit,
  loading: composioLoading,
  refresh: refreshComposio,
} = useComposioIntegrations();

// Only show Gmail during onboarding — more integrations on the Integrations page.
const gmailMeta = composioToolkitMeta('gmail');
const displayToolkits: ComposioToolkitMeta[] = [gmailMeta];

const connectedCount = Array.from(connectionByToolkit.values()).filter(c => {
  const state = deriveComposioState(c);
  return state === 'connected';
}).length;
```
Drive this step from the Gmail allowlist, not the global connection map.
useComposioIntegrations().toolkits is the backend allowlist, but this step ignores it and hard-codes displayToolkits to Gmail. At the same time, connectedCount and connectedSources are computed from all Composio connections. A user with an existing Slack/Notion connection will see “Continue” even if Gmail is disconnected, and ContextGatheringStep can then skip enrichment unexpectedly. The loading branch on Line 105 is also unreachable because displayToolkits.length is always 1. Gate the card on the fetched Gmail toolkit and scope the count/submission to the displayed slug(s).
💡 Possible fix

```diff
 const {
+  toolkits,
+  error: composioError,
   connectionByToolkit,
   loading: composioLoading,
   refresh: refreshComposio,
 } = useComposioIntegrations();

 // Only show Gmail during onboarding — more integrations on the Integrations page.
 const gmailMeta = composioToolkitMeta('gmail');
-const displayToolkits: ComposioToolkitMeta[] = [gmailMeta];
+const displayToolkits: ComposioToolkitMeta[] =
+  composioLoading || toolkits.includes(gmailMeta.slug) ? [gmailMeta] : [];

-const connectedCount = Array.from(connectionByToolkit.values()).filter(c => {
-  const state = deriveComposioState(c);
-  return state === 'connected';
-}).length;
+const connectedCount = displayToolkits.filter(
+  meta => deriveComposioState(connectionByToolkit.get(meta.slug)) === 'connected'
+).length;

 const handleFinish = async () => {
   setError(null);
   setLoading(true);
   try {
-    const connectedSources = Array.from(connectionByToolkit.entries())
-      .filter(([, c]) => deriveComposioState(c) === 'connected')
-      .map(([slug]) => `composio:${slug}`);
+    const connectedSources = displayToolkits
+      .filter(meta => deriveComposioState(connectionByToolkit.get(meta.slug)) === 'connected')
+      .map(meta => `composio:${meta.slug}`);
     await onNext(connectedSources);
```

Use composioError to render an unavailable/retry state instead of leaving a permanently actionable Gmail card when the allowlist fetch fails.
Also applies to: 82-85, 105-189
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@app/src/pages/onboarding/steps/SkillsStep.tsx` around lines 63 - 76, The step
currently hard-codes displayToolkits to Gmail and computes connectedCount from
all connections; update it to derive displayToolkits from
useComposioIntegrations().toolkits by selecting the Gmail toolkit (or an empty
array when not present) so the UI reflects the backend allowlist, then compute
connectedCount and connectedSources only by iterating connectionByToolkit for
the slugs in displayToolkits (use the toolkit.slug to filter), and change the
loading/unavailable logic to show a retry/unavailable card when composioError is
set rather than always rendering an actionable Gmail card; ensure
composioToolkitMeta('gmail') is only used to map metadata for a toolkit that
exists in toolkits before adding to displayToolkits.
// Self-learning and user context enrichment
controllers.extend(crate::openhuman::learning::all_learning_registered_controllers());
Add a description for the new learning namespace.
The controllers are registered here, but namespace_description() still falls through to None for learning. That leaves CLI/help discovery incomplete for the new RPC surface.
🛠️ Suggested follow-up
"local_ai" => Some("Local AI chat, inference, downloads, and media operations."),
+ "learning" => Some("User context enrichment and self-learning controllers."),
  "migrate" => Some("Data migration utilities."),

Also applies to: 184-184
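A minimal standalone sketch of the lookup the diff extends (the arm strings are taken from the snippet above; the real `namespace_description` in `all.rs` covers every registered namespace):

```rust
// Sketch only: mirrors the three match arms shown in the diff above.
fn namespace_description(ns: &str) -> Option<&'static str> {
    match ns {
        "local_ai" => Some("Local AI chat, inference, downloads, and media operations."),
        "learning" => Some("User context enrichment and self-learning controllers."),
        "migrate" => Some("Data migration utilities."),
        _ => None, // unknown namespaces fall through, as in the original
    }
}

fn main() {
    assert_eq!(
        namespace_description("learning"),
        Some("User context enrichment and self-learning controllers.")
    );
    assert_eq!(namespace_description("unknown"), None);
    println!("{:?}", namespace_description("learning"));
}
```

With the new arm in place, CLI/help discovery can print a human-readable description for the learning namespace instead of falling through to None.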
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/core/all.rs` around lines 138 - 139, The new learning controllers
registered via
controllers.extend(crate::openhuman::learning::all_learning_registered_controllers())
lack a user-facing description because namespace_description() does not return a
description for the "learning" namespace; update the namespace_description
function to include a descriptive entry for "learning" (and any duplicate
namespace_description match arm referenced later around the other occurrence) so
CLI/help discovery shows a human-readable description for the learning RPC
surface—locate namespace_description and add a case for "learning" (or the exact
namespace string used when registering via all_learning_registered_controllers)
with a short explanatory string.
fn tool_instructions_preamble() -> String {
    let mut s = String::new();
    s.push_str("\n## Tool Use Protocol\n\n");
    s.push_str("To use a tool, wrap a JSON object in <tool_call></tool_call> tags:\n\n");
    s.push_str("```\n<tool_call>\n{\"name\": \"tool_name\", \"arguments\": {\"param\": \"value\"}}\n</tool_call>\n```\n\n");
    s.push_str("CRITICAL: Output actual <tool_call> tags—never describe steps or give examples.\n\n");
    s.push_str("Example: User says \"what's the date?\". You MUST respond with:\n<tool_call>\n{\"name\":\"shell\",\"arguments\":{\"command\":\"date\"}}\n</tool_call>\n\n");
    s.push_str("You may use multiple tool calls in a single response. ");
    s.push_str("After tool execution, results appear in <tool_result> tags. ");
    s.push_str("Continue reasoning with the results until you can give a final answer.\n\n");
    s.push_str("### Available Tools\n\n");
    s
}
Fix formatting to pass CI.
The pipeline shows cargo fmt --all -- --check failed due to formatting differences in this file. The long string literals in tool_instructions_preamble() likely need to be reformatted.
🔧 Suggested fix
Run cargo fmt to auto-fix, or manually break long strings:
fn tool_instructions_preamble() -> String {
let mut s = String::new();
s.push_str("\n## Tool Use Protocol\n\n");
s.push_str("To use a tool, wrap a JSON object in <tool_call></tool_call> tags:\n\n");
- s.push_str("```\n<tool_call>\n{\"name\": \"tool_name\", \"arguments\": {\"param\": \"value\"}}\n</tool_call>\n```\n\n");
- s.push_str("CRITICAL: Output actual <tool_call> tags—never describe steps or give examples.\n\n");
- s.push_str("Example: User says \"what's the date?\". You MUST respond with:\n<tool_call>\n{\"name\":\"shell\",\"arguments\":{\"command\":\"date\"}}\n</tool_call>\n\n");
+ s.push_str(
+ "```\n<tool_call>\n{\"name\": \"tool_name\", \"arguments\": {\"param\": \"value\"}}\n\
+ </tool_call>\n```\n\n",
+ );
+ s.push_str(
+ "CRITICAL: Output actual <tool_call> tags—never describe steps or give examples.\n\n",
+ );
+ s.push_str(
+ "Example: User says \"what's the date?\". You MUST respond with:\n\
+ <tool_call>\n{\"name\":\"shell\",\"arguments\":{\"command\":\"date\"}}\n</tool_call>\n\n",
+ );
 s.push_str("You may use multiple tool calls in a single response. ");

🧰 Tools
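The wrapped form relies on Rust's line-continuation escape: a trailing `\` inside a string literal consumes the newline and any leading whitespace on the next line, so the split literal stays byte-identical to the original. A quick check using one of the preamble strings above:

```rust
fn main() {
    // A trailing `\` swallows the newline plus the next line's leading
    // whitespace, so both forms produce exactly the same bytes.
    let wrapped = "To use a tool, wrap a JSON object in \
                   <tool_call></tool_call> tags:\n\n";
    let single = "To use a tool, wrap a JSON object in <tool_call></tool_call> tags:\n\n";
    assert_eq!(wrapped, single);
    println!("identical: {}", wrapped == single);
}
```

Because the bytes are unchanged, the reformat is safe to apply without re-testing the prompt output.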
🪛 GitHub Actions: Type Check
[error] 6-1: cargo fmt --all -- --check failed due to formatting differences. Suggested change: wrap CRITICAL push_str string in a multi-line push_str(...) call.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/openhuman/agent/harness/instructions.rs` around lines 4 - 16, The
formatting failure is caused by long string literals in function
tool_instructions_preamble(); run cargo fmt to auto-fix or manually wrap/break
the long s.push_str(...) calls into shorter multi-line string literals (use
concatenated or raw/multi-line strings) so they adhere to rustfmt rules,
updating the s.push_str(...) invocations around the code block and
"CRITICAL"/"Example" paragraphs to the shorter, formatted forms suggested in the
review.
// Filter out Skill-category tools (e.g. Composio, Apify) from the
// main agent prompt — those are only available to the skills_agent
// subagent via category_filter = "skill".
let non_skill_tools: Vec<&Box<dyn crate::openhuman::tools::Tool>> = tools_registry
    .iter()
    .filter(|t| t.category() != crate::openhuman::tools::traits::ToolCategory::Skill)
    .collect();
let non_skill_refs: Vec<&dyn crate::openhuman::tools::Tool> =
    non_skill_tools.iter().map(|t| t.as_ref()).collect();
system_prompt.push_str(&build_tool_instructions_filtered(&non_skill_refs));
Filter the manual tool summary too.
This only trims the appended instruction block. build_system_prompt(...) has already consumed tool_descs, and Lines 180-185 still add the composio description when enabled, so the main agent prompt continues to advertise a Skill-category tool. Apply the same non-skill filter before building tool_descs, or remove Skill entries from that list entirely.
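A hedged, self-contained sketch of the proposed ordering (the `Tool` and `ToolCategory` types here are simplified stand-ins for the real crate types): apply the non-skill filter before building the description list, so neither the manual summary nor the appended instruction block advertises Skill-category tools.

```rust
#[derive(PartialEq)]
enum ToolCategory {
    General,
    Skill,
}

struct Tool {
    name: &'static str,
    category: ToolCategory,
}

// Build the description list from non-Skill tools only, *before* any
// prompt assembly, so Skill tools never reach the main agent prompt.
fn non_skill_descs(registry: &[Tool]) -> Vec<&'static str> {
    registry
        .iter()
        .filter(|t| t.category != ToolCategory::Skill)
        .map(|t| t.name)
        .collect()
}

fn main() {
    let registry = vec![
        Tool { name: "shell", category: ToolCategory::General },
        Tool { name: "composio", category: ToolCategory::Skill },
        Tool { name: "apify", category: ToolCategory::Skill },
    ];
    let tool_descs = non_skill_descs(&registry);
    assert_eq!(tool_descs, vec!["shell"]);
    println!("{:?}", tool_descs);
}
```

Feeding this filtered list into build_system_prompt(...) would keep the two code paths consistent with a single filtering rule.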
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/openhuman/channels/runtime/startup.rs` around lines 220 - 229, The system
prompt still includes Skill-category tool descriptions because
build_system_prompt(...) is being called with the full tool_descs list; before
constructing tool_descs (or before calling build_system_prompt), filter
tools_registry the same way you did for the appended instruction block: create a
non-skill collection (e.g., reuse non_skill_tools/non_skill_refs logic) and
build tool_descs only from those non-skill tools so build_system_prompt(...)
will not receive or include Skill-category entries like Composio; alternatively,
remove Skill entries from the existing tool_descs vector prior to calling
build_system_prompt.
@@ -0,0 +1,615 @@
//! LinkedIn profile enrichment via Gmail email mining + Apify scraping.
Address cargo fmt issues before merging.
The pipeline reports multiple cargo fmt failures. Run cargo fmt --all to fix formatting.
As per coding guidelines: "Run cargo fmt and cargo check for Rust code before merging."
🧰 Tools
🪛 GitHub Actions: Type Check
[error] 62-1: cargo fmt --all -- --check failed due to formatting differences. Suggested change: format result.log.push(...) with method chaining across multiple lines.
[error] 72-1: cargo fmt --all -- --check failed due to formatting differences. Suggested change: wrap result.log.push(...) across multiple lines.
[error] 86-1: cargo fmt --all -- --check failed due to formatting differences. Suggested change: wrap result.log.push(...) across multiple lines.
[error] 96-1: cargo fmt --all -- --check failed due to formatting differences. Suggested change: format result.log.push(...) with chained lines.
[error] 279-1: cargo fmt --all -- --check failed due to formatting differences. Suggested change: break exp.get("description").and_then(...).unwrap_or("") chain across multiple lines.
[error] 368-1: cargo fmt --all -- --check failed due to formatting differences. Suggested change: format static COMM_RE LazyLock::new(...) with line breaks.
[error] 407-1: cargo fmt --all -- --check failed due to formatting differences. Suggested change: condense msg.pointer(...).and_then(... ) chain onto a single line.
[error] 516-1: cargo fmt --all -- --check failed due to formatting differences. Suggested change: format MemoryClient::new_local().map_err(...) call with line breaks.
[error] 526-1: cargo fmt --all -- --check failed due to formatting differences. Suggested change: align store_skill_sync string arguments in a formatted multi-line style.
[error] 549-1: cargo fmt --all -- --check failed due to formatting differences. Suggested change: format MemoryClient::new_local().map_err(...) call with line breaks.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/openhuman/learning/linkedin_enrichment.rs` at line 1, Run rustfmt on the
repository and fix formatting in this module: run `cargo fmt --all` (and then
`cargo check`) and commit the changes; specifically ensure
src/openhuman/learning/linkedin_enrichment.rs is reformatted to comply with
rustfmt rules (fix imports, spacing, line breaks, and doc comment alignment for
the LinkedIn enrichment module and any functions/impls within it) so cargo fmt
no longer reports failures.
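For reference, the email-mining step keys on `comm/in/<username>` profile links in LinkedIn notification emails. A hedged sketch of that extraction using plain string scanning (the real module uses regexes, which the 5 tests in the test plan cover; this is an illustrative simplification):

```rust
// Illustrative only: extract LinkedIn `comm/in/<username>` slugs from an
// email body. The production module in linkedin_enrichment.rs uses
// compiled regexes rather than this manual scan.
fn extract_usernames(body: &str) -> Vec<String> {
    let marker = "comm/in/";
    let mut out = Vec::new();
    let mut rest = body;
    while let Some(idx) = rest.find(marker) {
        let tail = &rest[idx + marker.len()..];
        // A username ends at the first character outside [A-Za-z0-9-_].
        let end = tail
            .find(|c: char| !(c.is_ascii_alphanumeric() || c == '-' || c == '_'))
            .unwrap_or(tail.len());
        if end > 0 {
            out.push(tail[..end].to_string());
        }
        rest = &tail[end..];
    }
    out
}

fn main() {
    let html = r#"<a href="https://www.linkedin.com/comm/in/jane-doe?trk=x">View</a>"#;
    assert_eq!(extract_usernames(html), vec!["jane-doe"]);
    println!("{:?}", extract_usernames(html));
}
```

The scraper then receives each extracted slug as the profile to fetch via Apify before LLM summarisation.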
Summary
- Scans Gmail for LinkedIn `comm/in/<username>` links, scrapes the profile via Apify (`dev_fusion/linkedin-profile-scraper`), passes the result through LLM summarisation, and writes `PROFILE.md` to the workspace
- `PROFILE.md` (generated from real user data) replaces the generic `USER.md` template
- `ToolCategory::Skill` tools (Composio, Apify) are filtered out of the orchestrator/main agent prompt; only the `skills_agent` subagent sees them via `category_filter = "skill"`

Key files

- `app/src/pages/onboarding/steps/SkillsStep.tsx`, `ContextGatheringStep.tsx`, `Onboarding.tsx`
- `src/openhuman/learning/linkedin_enrichment.rs`, `schemas.rs`
- `src/openhuman/context/prompt.rs` (USER.md → PROFILE.md)
- `src/openhuman/agent/harness/instructions.rs`, `channels/runtime/startup.rs`
- `src/core/all.rs`

Test plan

- `cargo test --lib -- linkedin_enrichment` — 5 regex tests pass
- `cargo check` — clean build
- `tsc --noEmit` — clean typecheck
- `openhuman learning linkedin_enrichment` — full pipeline ran successfully, scraped real LinkedIn profile, LLM summarised, PROFILE.md written