GZ uses a `unique_id` string on the subject metadata to uniquely identify the subject in the research domain context, e.g. in kade/app/services/import/reduction.rb (lines 68 to 74 in bcb057c):

    def unique_id
      unique_id = payload.dig('subject', 'metadata', '#name')
      return unique_id if unique_id

      # staging has older data with different subject metadata - fallback to handling this special env case
      payload.dig('subject', 'metadata', '!SDSS_ID') if Rails.env.staging? || Rails.env.test?
    end

This is the identifier that flows into the catalogues and ML systems to uniquely identify subjects, rather than the subject_id in our systems. As such, we'll need to add this attribute to the subjects table with a unique index and backfill it when importing subject data into the system.
One solution is to add the metadata import to the subject backfiller job (kade/app/sidekiq/subject_backfiller_job.rb, line 8 in bcb057c), next to the existing call:

    Import::SubjectLocations.new(subject).run

Alternatively, this metadata comes through in the Caesar reductions payload, so we can extract the information from that flow of data as it arrives.
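As a rough sketch of the second option, the identifier could be pulled out of each reduction payload and written onto the subject only when it is still missing. This uses plain Ruby hashes in place of the real models. `extract_unique_id`, `backfill_unique_id!`, the `allow_legacy` flag and the sample values are hypothetical, and the payload shape is assumed from the `dig` paths in the existing `unique_id` method.

```ruby
# Hypothetical helper: read the research-domain identifier from a
# reduction payload, using the same metadata keys as the existing code.
def extract_unique_id(payload, allow_legacy: false)
  metadata = payload.dig('subject', 'metadata') || {}
  unique_id = metadata['#name']
  return unique_id if unique_id

  # Older data (e.g. on staging) may only carry the legacy key.
  metadata['!SDSS_ID'] if allow_legacy
end

# Hypothetical backfill step: only set unique_id when it is missing,
# so processing the same reduction twice does not change the subject.
def backfill_unique_id!(subject, payload, allow_legacy: false)
  return subject if subject[:unique_id]

  unique_id = extract_unique_id(payload, allow_legacy: allow_legacy)
  subject[:unique_id] = unique_id if unique_id
  subject
end

# Made-up payload and subject, for illustration only.
payload = { 'subject' => { 'id' => 1, 'metadata' => { '#name' => 'example-galaxy-1' } } }
subject = { id: 1, unique_id: nil }
backfill_unique_id!(subject, payload)
puts subject[:unique_id]
```

In the real system the assignment would presumably be a model update guarded by the unique index, rather than a hash write.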
We can then use this field to uniquely identify the subject linkage when importing or upserting ML results (vector representations, predictions, etc.).
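To illustrate the linkage step, here is a minimal sketch that resolves each ML result to a subject through `unique_id` and upserts it. It works on in-memory hashes rather than the real tables. `link_ml_results`, the `:prediction` key and the sample values are hypothetical and not part of the codebase.

```ruby
# Hypothetical sketch: resolve ML results to subjects via unique_id and
# upsert them, using in-memory structures in place of the database.
def link_ml_results(subjects, ml_results, store = {})
  # Index subjects by unique_id; a unique index on the column is what
  # would guarantee this mapping is one-to-one.
  by_unique_id = subjects.each_with_object({}) do |subject, index|
    index[subject[:unique_id]] = subject[:id] if subject[:unique_id]
  end

  unmatched = []
  ml_results.each do |result|
    subject_id = by_unique_id[result[:unique_id]]
    if subject_id
      # Upsert: the latest result for a subject replaces any earlier one.
      store[subject_id] = result[:prediction]
    else
      unmatched << result[:unique_id]
    end
  end

  [store, unmatched]
end

# Made-up data, for illustration only.
subjects = [{ id: 10, unique_id: 'gal-a' }, { id: 11, unique_id: 'gal-b' }]
ml_results = [
  { unique_id: 'gal-a', prediction: 0.91 },
  { unique_id: 'gal-c', prediction: 0.12 }
]
store, unmatched = link_ml_results(subjects, ml_results)
puts store.inspect
puts unmatched.inspect
```

Results whose `unique_id` has no matching subject are returned separately, so they could be logged or retried once the subject has been backfilled.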