Skip to content

Adding frame.py, functionality for window extraction and coherence analysis#196

Closed
judewerth wants to merge 0 commit into
dj-sciops:mainfrom
judewerth:main
Closed

Adding frame.py, functionality for window extraction and coherence analysis#196
judewerth wants to merge 0 commit into
dj-sciops:mainfrom
judewerth:main

Conversation

@judewerth
Copy link
Copy Markdown
Contributor

Here is my draft for a new pipeline for coherence analysis. However this hasn't been tested, @MilagrosMarin when you come back I would appreciate a coding meeting where we can talk about how to improve, test, and integrate this code into the main repository.
New DataJoint Pipeline.pdf

Copy link
Copy Markdown
Contributor

@MilagrosMarin MilagrosMarin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@judewerth, thank you for providing this pipeline expansion for coherence analyses. The overall pipeline design seems well thought out and aligned with the rest of the pipelines. @ttngu207 and I have reviewed this PR, and we are impressed with your learning and progress on the pipeline design. I have left a few comments on our discussion today. Please feel free to test this new pipeline in the database, but run only 1–2 jobs at a time to avoid overwhelming the worker with untested code.

This PR's feedback is mainly focused on improving the clarity, usability, and integration of your proposed changes to the pipeline:

  • We suggest more meaningful names for the new schema, some tables, and some attributes, to be more related and intuitive to the analysis being performed.
  • Review the total number of primary keys in the downstream pipeline, which can hurt query usability. Consider reducing the number of primary key attributes in downstream tables (e.g., by optimizing the computation to use ephys.LFP instead of ephys.LFP.Trace in Coherence table).
  • Ensure foreign key references between tables (e.g., linking burst detection parameters to active timeframes) are properly defined.
  • Verify the new function for channel-to-electrode mapping is compatible with the existing channel mapping functionality in the pipeline.

Overall, this is a valuable addition to the pipeline, and I am happy to discuss these suggestions further in our meeting tomorrow and provide any additional guidance as needed. Thanks!

Comment thread src/workflow/pipeline/frame.py Outdated
# correctly map electrode indices
electrode_ids = lookup[channel_ids]

return electrode_ids No newline at end of file
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please ensure that this helper function, map_channel_to_electrode, is compatible with the channel mapping in the ephys pipeline.

Comment thread src/workflow/pipeline/frame.py Outdated


# Set up schema (connects to database and manages table creation)
schema = dj.schema(DB_PREFIX + "frame")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's consider a more meaningful name for the schema instead of frame; what about coherence or a similar option?

Comment thread src/workflow/pipeline/frame.py Outdated

definition = """
-> ActiveTimeFrames
burst_param_idx: int # Reference to BurstDetectionParamset
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that the reference to BurstDetectionParamset is missing as a foreign attribute here instead of defining burst_param_idx.

Comment thread src/workflow/pipeline/frame.py Outdated

# Define Manual Tables
@schema
class AnalysisBoundaries(dj.Manual):
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about AnalysisBlock or AnalysisSegment?

Comment thread src/workflow/pipeline/frame.py Outdated

definition = """
-> AnalysisBoundaries
start_time: datetime # Start of active time frame
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's use explicit attribute names, e.g., active_start_time, active_end_time.

Comment thread src/workflow/pipeline/frame.py Outdated

definition = """
-> ActiveTimeFrames
-> ephys.LFP.Trace
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this foreign key be ephys.LFP.Trace? The electrode attribute in Synchrony is unnecessary, since the foreign key to LFP.Trace here in Coherence already indicates electrode information. Moreover, referencing ephys.LFP.Trace here (with electrode details) would make the primary key too long, reducing query efficiency. Could you optimize the code to reference ephys.LFP here instead, and add individual electrode entries in the Synchrony table?

@judewerth
Copy link
Copy Markdown
Contributor Author

judewerth commented Dec 13, 2025

@MilagrosMarin @ttngu207
I've edited frame.py to only include the active frame detection portion of the code (please confirm this change made it to the pull request).

Additionally, I propose a strategy to process the MUA data to make this analysis possible. We focus on the first 2 days (already processed), 2 days before drug treatments, and drug treatments.

However, I'm interested to hear your thoughts. If we agree to move forward with this, I can begin inserting MUA Sessions.

Next steps:

  1. I need access to debug the file directly (see below)
Screenshot 2025-12-12 192406
  1. Confirm code works properly
  2. Strategy to process mua sessions

It takes ~9 seconds to process a minute of recording per organoid
Screenshot 2025-12-12 191703

If we were to process every file with 15 workers, a conservative estimate is 50 days. Pretty long.

Screenshot 2025-12-12 192953

However, if we were to process only the first 2 days, the 2 days before drug treatments, and 1 day of drug treatments, it would only take about 10 days.

Screenshot 2025-12-12 192958

@judewerth
Copy link
Copy Markdown
Contributor Author

Hello @MilagrosMarin and @ttngu207,
This is the GitHub request accompanying and email sent out to the whole team. I've included my powerpoint presentation here as well. GBM Analysis - DataJoint Integration.pdf

In summary, this is my proposed additions to the pipeline in preparation for the methods paper that will include the GBM analysis (my previous results have been through independent code, these changes can get those results through the pipeline).

What I would appreciate from y'all:

  1. What are your thoughts on the new analysis, do you believe there are better ways to characterize the recorded neural activity?
  2. How is the architecture set up, do you think there is a better way to integrate the new analysis?
  3. What steps need to be taken to push these changes to the main repository (all the changes are on the frame.py file but they would need to be incorporated into the ephys, analysis and mua files)?
  4. A plan regarding processing the MUA data (via mua.EphysSession).

Best,
Jude

ttngu207
ttngu207 previously approved these changes Dec 22, 2025
@judewerth
Copy link
Copy Markdown
Contributor Author

@MilagrosMarin @ttngu207

Hello,

I would like to start the process of inserting the MUA Data which is necessary to run the frame pipeline. I've added a new notebook based on the CREATE_new_session notebook to explain and streamline this process (CREATE_new_MUA_session).

Along with this push I've inserted two days worth of data (organoid 15) into the MUAEphysSession table. If possible please change the number of workers to what you believe to be best. Let me know how this process goes, if all goes well I will add new organoids.

Best,
Jude

@MilagrosMarin
Copy link
Copy Markdown
Contributor

@MilagrosMarin @ttngu207

Hello,

I would like to start the process of inserting the MUA Data which is necessary to run the frame pipeline. I've added a new notebook based on the CREATE_new_session notebook to explain and streamline this process (CREATE_new_MUA_session).

Along with this push I've inserted two days worth of data (organoid 15) into the MUAEphysSession table. If possible please change the number of workers to what you believe to be best. Let me know how this process goes, if all goes well I will add new organoids.

Best, Jude

Hi @judewerth,

Regarding the operational status of the new MUAEphysSession entries for organoid_id=O15, there are a total of 5,760 sessions for this organoid O15, including 3,607 entries in the MUASpikes table with results already processed and 2,150 error jobs for this table, all with the error message: "FileNotFoundError: No valid full-path found (from [PosixPath('/home/jovyan/s3/inbox'), PosixPath('/home/jovyan/efs/outbox')]) for O13-16_raw/xxx". You can check these errors in the DataJoint platform as well. Please confirm whether the files associated with these sessions are correctly uploaded to the cloud.

@judewerth
Copy link
Copy Markdown
Contributor Author

Hi Milagros,

Thank you for the message. The insert code in the notebook relys on ephys.RawFile to extract file times. I believe all MUA entries contain files in ephys.RawFile which I verified a handful are seen on our Desktop. I've provided a screenshot in the issue #202.

Best,
Jude

@judewerth
Copy link
Copy Markdown
Contributor Author

Hi @MilagrosMarin @ttngu207,

Like we discussed in the meeting, I will be using this pipeline manually to analyze the first 2 days. I've just pushed a new version including STTFA (spike LFP coupling) and STTC (single unit functional connectivity) along with a notebook showcasing these tables (temporary).

While I'm doing that please keep me updated regarding the MUA processing error in the backend as well as how I can help get this code moved from frame.py to the correct schema (analysis, ephys, and mua).

Best,
Jude

Copy link
Copy Markdown
Contributor

@MilagrosMarin MilagrosMarin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Initial Review

Thanks for this comprehensive addition! I've added some inline comments on issues that need to be addressed before merging.

Summary of Issues:

  1. Critical bugs: Missing parentheses on .pop calls
  2. Missing dependencies: Several packages not in pyproject.toml
  3. Hardcoded configuration: S3 store config should not be in schema module
  4. README changes: Removes important documentation

Happy to discuss any of these points!

Comment thread src/workflow/pipeline/frame.py Outdated
f"Multiple Port IDs found for the {key} - cannot determine the port ID"
)
port_id = port_id.pop

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🐛 Bug: Missing parentheses on .pop() call

This assigns the method object itself to port_id, not the result of calling it.

port_id = port_id.pop  # ❌ Returns method object
port_id = port_id.pop()  # ✅ Returns the value

This will cause a runtime error when port_id is used later.

Comment thread src/workflow/pipeline/frame.py Outdated
f"Multiple Probes found for the {key} - cannot determine the probe name"
)
probe_name = probe_name.pop

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: Missing parentheses on .pop() call

Same issue as line 156 - should be probe_name.pop() with parentheses.

Also, probe_name is fetched but never used in this method. Is this intentional or should it be used in the commented-out EphysSessionProbe.insert1() below?

Comment thread src/workflow/pipeline/frame.py Outdated
import plotly.io as pio
import neo
import quantities as pq
from elephant.spike_train_correlation import spike_time_tiling_coefficient
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing dependencies in pyproject.toml

The following packages are imported but not listed in pyproject.toml:

  • bottleneck (line 11)
  • specparam (line 16)
  • neo (line 21)
  • quantities (line 22)
  • elephant (line 23)

Please add these to the dependencies list to ensure the package installs correctly.

Comment thread src/workflow/pipeline/frame.py Outdated

# create lookup to convert
lookup = np.empty(32, dtype=int)
lookup[channel_mapping] = electrode_mapping
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hardcoded electrode count

The value 32 is hardcoded here, which assumes all probes have 32 electrodes. Consider using the actual electrode count from probe.ElectrodeConfig.Electrode to make this more robust:

num_electrodes = len(probe.ElectrodeConfig.Electrode & electrode_config_key)
lookup = np.empty(num_electrodes, dtype=int)

This would handle probes with different electrode counts.

Comment thread .gitignore Outdated
# Mac
.DS_Store

# Documentation
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: Why ignore README.md?

Adding README.md to .gitignore is unusual - the README is typically an important tracked file. Could you clarify the reason for this?

If the intention is to have a local-only README, consider using a different filename like README.local.md instead.

Comment thread notebooks/test_worker.ipynb Outdated
@@ -0,0 +1,5910 @@
{
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: Clear notebook outputs before merge

This notebook has 5910 lines with embedded cell outputs. Consider clearing outputs before merging to:

  • Reduce file size
  • Avoid tracking output data in version control
  • Make diffs cleaner for future changes

You can clear outputs with: jupyter nbconvert --clear-output --inplace notebooks/test_worker.ipynb

@MilagrosMarin
Copy link
Copy Markdown
Contributor

Hi @judewerth, I suggest closing this PR as it's been superseded by PR #211, where you reorganized the tables from frame.py into their logical homes (analysis.py, mua.py, culture.py, frame.py) — which is what you were asking about in the Jan 10 comment here and what we discussed and reviewed together.

Two tables from this PR are not in #211: ImpedanceFile/ImpedanceMeasurements and STTC. Are those planned for a follow-up PR?

Also, regarding the MUA processing errors for O15 (2,150 FileNotFoundError jobs from Jan 7) — is that resolved, or does it still need attention?

Thanks for the original draft and the PowerPoint — they were very helpful for understanding the design evolution into #211.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants