Adding frame.py, functionality for window extraction and coherence analysis#196
Adding frame.py, functionality for window extraction and coherence analysis#196judewerth wants to merge 0 commit into
Conversation
MilagrosMarin
left a comment
There was a problem hiding this comment.
@judewerth, thank you for providing this pipeline expansion for coherence analyses. The overall pipeline design seems well thought out and aligned with the rest of the pipelines. @ttngu207 and I have reviewed this PR, and we are impressed with your learning and progress on the pipeline design. I have left a few comments on our discussion today. Please feel free to test this new pipeline in the database, but run only 1–2 jobs at a time to avoid overwhelming the worker with untested code.
This PR's feedback is mainly focused on improving the clarity, usability, and integration of your proposed changes to the pipeline:
- We suggest more meaningful names for the new schema, some tables, and some attributes, to be more related and intuitive to the analysis being performed.
- Review the total number of primary keys in the downstream pipeline, which can hurt query usability. Consider reducing the number of primary key attributes in downstream tables (e.g., by optimizing the computation to use
ephys.LFPinstead ofephys.LFP.TraceinCoherencetable). - Ensure foreign key references between tables (e.g., linking burst detection parameters to active timeframes) are properly defined.
- Verify the new function for channel-to-electrode mapping is compatible with the existing channel mapping functionality in the pipeline.
Overall, this is a valuable addition to the pipeline, and I am happy to discuss these suggestions further in our meeting tomorrow and provide any additional guidance as needed. Thanks!
| # correctly map electrode indices | ||
| electrode_ids = lookup[channel_ids] | ||
|
|
||
| return electrode_ids No newline at end of file |
There was a problem hiding this comment.
Please ensure that this helper function, map_channel_to_electrode, is compatible with the channel mapping in the ephys pipeline.
|
|
||
|
|
||
| # Set up schema (connects to database and manages table creation) | ||
| schema = dj.schema(DB_PREFIX + "frame") |
There was a problem hiding this comment.
Let's consider a more meaningful name for the schema instead of frame; what about coherence or a similar option?
|
|
||
| definition = """ | ||
| -> ActiveTimeFrames | ||
| burst_param_idx: int # Reference to BurstDetectionParamset |
There was a problem hiding this comment.
It seems that the reference to BurstDetectionParamset is missing as a foreign attribute here instead of defining burst_param_idx.
|
|
||
| # Define Manual Tables | ||
| @schema | ||
| class AnalysisBoundaries(dj.Manual): |
There was a problem hiding this comment.
What about AnalysisBlock or AnalysisSegment?
|
|
||
| definition = """ | ||
| -> AnalysisBoundaries | ||
| start_time: datetime # Start of active time frame |
There was a problem hiding this comment.
Let's use explicit attribute names, e.g., active_start_time, active_end_time.
|
|
||
| definition = """ | ||
| -> ActiveTimeFrames | ||
| -> ephys.LFP.Trace |
There was a problem hiding this comment.
Should this foreign key be ephys.LFP.Trace? The electrode attribute in Synchrony is unnecessary, since the foreign key to LFP.Trace here in Coherence already indicates electrode information. Moreover, referencing ephys.LFP.Trace here (with electrode details) would make the primary key too long, reducing query efficiency. Could you optimize the code to reference ephys.LFP here instead, and add individual electrode entries in the Synchrony table?
|
@MilagrosMarin @ttngu207 Additionally, I propose a strategy to process the MUA data to make this analysis possible. We focus on the first 2 days (already processed), 2 days before drug treatments, and drug treatments. However, I'm interested to hear your thoughts. If we agree to move forward with this, I can begin inserting MUA Sessions. Next steps:
It takes ~9 seconds to process a minute of recording per organoid If we were to process every file with 15 workers, a conservative estimate is 50 days. Pretty long.
However, if we were to process only the first 2 days, the 2 days before drug treatments, and 1 day of drug treatments, it would only take about 10 days.
|
|
Hello @MilagrosMarin and @ttngu207, In summary, this is my proposed additions to the pipeline in preparation for the methods paper that will include the GBM analysis (my previous results have been through independent code, these changes can get those results through the pipeline). What I would appreciate from y'all:
Best, |
|
Hello, I would like to start the process of inserting the MUA Data which is necessary to run the frame pipeline. I've added a new notebook based on the CREATE_new_session notebook to explain and streamline this process (CREATE_new_MUA_session). Along with this push I've inserted two days worth of data (organoid 15) into the MUAEphysSession table. If possible please change the number of workers to what you believe to be best. Let me know how this process goes, if all goes well I will add new organoids. Best, |
Hi @judewerth, Regarding the operational status of the new |
|
Hi Milagros, Thank you for the message. The insert code in the notebook relys on ephys.RawFile to extract file times. I believe all MUA entries contain files in ephys.RawFile which I verified a handful are seen on our Desktop. I've provided a screenshot in the issue #202. Best, |
|
Like we discussed in the meeting, I will be using this pipeline manually to analyze the first 2 days. I've just pushed a new version including STTFA (spike LFP coupling) and STTC (single unit functional connectivity) along with a notebook showcasing these tables (temporary). While I'm doing that please keep me updated regarding the MUA processing error in the backend as well as how I can help get this code moved from frame.py to the correct schema (analysis, ephys, and mua). Best, |
MilagrosMarin
left a comment
There was a problem hiding this comment.
Initial Review
Thanks for this comprehensive addition! I've added some inline comments on issues that need to be addressed before merging.
Summary of Issues:
- Critical bugs: Missing parentheses on
.popcalls - Missing dependencies: Several packages not in pyproject.toml
- Hardcoded configuration: S3 store config should not be in schema module
- README changes: Removes important documentation
Happy to discuss any of these points!
| f"Multiple Port IDs found for the {key} - cannot determine the port ID" | ||
| ) | ||
| port_id = port_id.pop | ||
|
|
There was a problem hiding this comment.
🐛 Bug: Missing parentheses on .pop() call
This assigns the method object itself to port_id, not the result of calling it.
port_id = port_id.pop # ❌ Returns method object
port_id = port_id.pop() # ✅ Returns the valueThis will cause a runtime error when port_id is used later.
| f"Multiple Probes found for the {key} - cannot determine the probe name" | ||
| ) | ||
| probe_name = probe_name.pop | ||
|
|
There was a problem hiding this comment.
Bug: Missing parentheses on .pop() call
Same issue as line 156 - should be probe_name.pop() with parentheses.
Also, probe_name is fetched but never used in this method. Is this intentional or should it be used in the commented-out EphysSessionProbe.insert1() below?
| import plotly.io as pio | ||
| import neo | ||
| import quantities as pq | ||
| from elephant.spike_train_correlation import spike_time_tiling_coefficient |
There was a problem hiding this comment.
Missing dependencies in pyproject.toml
The following packages are imported but not listed in pyproject.toml:
bottleneck(line 11)specparam(line 16)neo(line 21)quantities(line 22)elephant(line 23)
Please add these to the dependencies list to ensure the package installs correctly.
|
|
||
| # create lookup to convert | ||
| lookup = np.empty(32, dtype=int) | ||
| lookup[channel_mapping] = electrode_mapping |
There was a problem hiding this comment.
Hardcoded electrode count
The value 32 is hardcoded here, which assumes all probes have 32 electrodes. Consider using the actual electrode count from probe.ElectrodeConfig.Electrode to make this more robust:
num_electrodes = len(probe.ElectrodeConfig.Electrode & electrode_config_key)
lookup = np.empty(num_electrodes, dtype=int)This would handle probes with different electrode counts.
| # Mac | ||
| .DS_Store | ||
|
|
||
| # Documentation |
There was a problem hiding this comment.
Question: Why ignore README.md?
Adding README.md to .gitignore is unusual - the README is typically an important tracked file. Could you clarify the reason for this?
If the intention is to have a local-only README, consider using a different filename like README.local.md instead.
| @@ -0,0 +1,5910 @@ | |||
| { | |||
There was a problem hiding this comment.
Suggestion: Clear notebook outputs before merge
This notebook has 5910 lines with embedded cell outputs. Consider clearing outputs before merging to:
- Reduce file size
- Avoid tracking output data in version control
- Make diffs cleaner for future changes
You can clear outputs with: jupyter nbconvert --clear-output --inplace notebooks/test_worker.ipynb
|
Hi @judewerth, I suggest closing this PR as it's been superseded by PR #211, where you reorganized the tables from Two tables from this PR are not in #211: Also, regarding the MUA processing errors for O15 (2,150 FileNotFoundError jobs from Jan 7) — is that resolved, or does it still need attention? Thanks for the original draft and the PowerPoint — they were very helpful for understanding the design evolution into #211. |




Here is my draft for a new pipeline for coherence analysis. However this hasn't been tested, @MilagrosMarin when you come back I would appreciate a coding meeting where we can talk about how to improve, test, and integrate this code into the main repository.
New DataJoint Pipeline.pdf