
Vibe-coded ingestion of stuff from 13 polymer papers #8

Open
justaddcoffee wants to merge 8 commits into main from yolo_ingestion

Conversation

@justaddcoffee
Collaborator

@justaddcoffee justaddcoffee commented May 2, 2025

Again just an experiment to see how well this works

@realmarcin @cmungall @ct-parker @davramov @azaidi06

@justaddcoffee
Collaborator Author

Prompt:
"I want you to make nodes and edges for our KG project using 1) a directory of 83 scientific papers in ./polymer_papers (in the base directory) and 2) the data schema for this project, which lives here: src/matkg/schema/matkg_schema.yaml. You can make as many nodes and edges as you want, but they should be supported by the scientific paper. Prioritize nodes and edges that seem most important, and that reoccur in several papers. There are lots of examples of nodes and edges in kg/. "

@justaddcoffee
Collaborator Author

Some observations:

  • this produced fewer nodes / edges than I'd have expected
  • new associations have DOIs, which is good
  • we should watch out for hallucinations in has_evidence: in the 1-2 examples I spot-checked, the exact evidence string did not seem to be present in the PDF, and in some cases the evidence itself may be dubious
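
A minimal sketch of how the has_evidence spot check could be automated, assuming the PDF text has already been extracted to plain strings. The function names, the `doi` key, and the `paper_texts` mapping are hypothetical and not taken from the matkg code; only `has_evidence` comes from the discussion above.

```python
import re


def normalize(text: str) -> str:
    """Lowercase, rejoin words hyphenated across line breaks, collapse whitespace."""
    text = re.sub(r"-\s*\n\s*", "", text)  # "mobil-\nity" -> "mobility"
    return re.sub(r"\s+", " ", text).strip().lower()


def evidence_supported(evidence: str, paper_text: str) -> bool:
    """True if the evidence string occurs in the paper text after normalization."""
    needle = normalize(evidence)
    if not needle:
        return False  # an empty evidence string supports nothing
    return needle in normalize(paper_text)


def unsupported_edges(edges, paper_texts, doi_key="doi"):
    """Return the edges whose has_evidence string is not found in their paper.

    edges: list of dicts (assumed shape) with doi_key and "has_evidence".
    paper_texts: dict mapping DOI -> extracted full text (assumed to exist).
    """
    flagged = []
    for edge in edges:
        text = paper_texts.get(edge.get(doi_key), "")
        if not evidence_supported(edge.get("has_evidence", ""), text):
            flagged.append(edge)
    return flagged
```

This only catches evidence strings that are not verbatim in the paper. It cannot tell whether a quoted passage actually supports the edge, so dubious-but-present evidence would still need a manual read. Rejoining hyphenated line breaks can also produce false negatives for genuinely hyphenated words.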

@justaddcoffee justaddcoffee changed the title from "Vibe-coded ingestion of stuff from 83 polymer papers" to "Vibe-coded ingestion of stuff from 5 polymer papers" May 2, 2025
Justin Reese added 5 commits May 9, 2025 13:51
…with this prompt

I want you to make nodes and edges for our KG project using 1) a directory of 83 scientific papers in ./polymer_papers (in the base directory) and 2) the data schema for this project, which lives here: src/matkg/schema/matkg_schema.yaml. Process each paper one by one. For each paper, you can make as many nodes and edges as you want, but they should be supported by the scientific paper. Prioritize nodes and edges that seem most important to the overall point of the paper. There are lots of examples of nodes and edges in kg/.

Here's a Claude Code-generated summary of the changes in this commit:

I've processed the following papers so far, extracting information about materials, properties, experimental techniques, and processing methods:

  1. 10.1002/adfm.201002014 - Quinoidal copolymers
  2. 10.1002/adfm.201304216 - PBTIBDTTT polymer
  3. 10.1002/adfm.201801874 - pAQM series polymers
  4. 10.1002/adfm.201902238 - PBDTTPD and PffBT4T-2OD
  5. 10.1002/adfm.202112511 - IIDDT and IIDFDT polymers
  6. 10.1002/adfm.202305611 - PM6:Y6:PBI-BO
  7. 10.1002/advs.201500095 - P3HT:PCBM with nonhalogenated solvents
  8. 10.1002/advs.201600117 - PSeBDT-BDD and SdiPBI-S
  9. 10.1002/adma.201604603 - PII2T-HD polymer
  10. 10.1002/adma.201803045 - PDPP4T doping with F4TCNQ
  11. 10.1002/aenm.201701942 - PNDI2OD-T2 polymer
  12. 10.1002/aenm.201702831 - PBDTT-F-TT polymer
  13. 10.1002/anie.202115585 - PBDT-BQQF and PBDT-BQF quinoidal polymers

  These papers are now represented in the materials knowledge graph, covering:
  - Various conjugated polymers (regular and quinoidal)
  - Electron acceptors (fullerenes and non-fullerene acceptors)
  - Electronic properties (mobility, band gap, HOMO/LUMO levels)
  - Processing methods (thermal annealing, solution processing)
  - Experimental techniques (GIWAXS, UPS, UV-Vis)
  - Device types (OFETs, OPVs)
@justaddcoffee justaddcoffee changed the title from "Vibe-coded ingestion of stuff from 5 polymer papers" to "Vibe-coded ingestion of stuff from 13 polymer papers" May 9, 2025
@justaddcoffee
Collaborator Author

@davramov @ct-parker @realmarcin @cmungall @azaidi06
I've rolled back the previous vibe-coded papers, then re-vibe-coded 13 papers:

10.1002/adfm.201002014
10.1002/adfm.201304216
10.1002/adfm.201801874
10.1002/adfm.201902238
10.1002/adfm.202112511
10.1002/adfm.202305611
10.1002/advs.201500095
10.1002/advs.201600117
10.1002/adma.201604603
10.1002/adma.201803045
10.1002/aenm.201701942
10.1002/aenm.201702831
10.1002/anie.202115585

with this (similar but better) prompt:

I want you to make nodes and edges for our KG project using 1) a directory of 83 scientific papers 
in ./polymer_papers (in the base directory) and 2) the data schema for this project, which lives 
here: src/matkg/schema/matkg_schema.yaml. Process each paper one by one. For each paper, 
you can make as many nodes and edges as you want, but they should be supported by the 
scientific paper. Prioritize nodes and edges that seem most important to the overall point of 
the paper. There are lots of examples of nodes and edges in kg/.

and then rebuilt the KG (106 nodes and 71 edges), which is here:
kg/matkg_graph.json

Maybe we could look at some specific papers and the nodes/edges extracted from them, as a manual QC pass.
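
One possible way to pick what to review by hand is to draw a few edges per paper at random. This is a sketch under assumptions: the `doi` key and the edge-as-dict shape are hypothetical, and `sample_edges_by_paper` is an illustrative helper, not part of the project.

```python
import random
from collections import defaultdict


def sample_edges_by_paper(edges, per_paper=3, doi_key="doi", seed=0):
    """Group edges by DOI and return up to per_paper random edges for each paper.

    edges: list of dicts (assumed shape); edges lacking doi_key go under "unknown".
    seed: fixed so that reviewers all see the same sample.
    """
    by_doi = defaultdict(list)
    for edge in edges:
        by_doi[edge.get(doi_key, "unknown")].append(edge)

    rng = random.Random(seed)
    return {
        doi: rng.sample(group, min(per_paper, len(group)))
        for doi, group in sorted(by_doi.items())
    }
```

The edges would first need to be read out of kg/matkg_graph.json. The layout of that file is not described here, so the loading step is left out rather than guessed.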
