Conversation
Signed-off-by: Prafulla Mahindrakar <prafulla.mahindrakar@gmail.com>
Codecov Report
@@ Coverage Diff @@
## master #394 +/- ##
==========================================
+ Coverage 60.80% 61.38% +0.57%
==========================================
Files 151 154 +3
Lines 10799 11088 +289
==========================================
+ Hits 6566 6806 +240
- Misses 3529 3578 +49
Partials 704 704
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
Signed-off-by: Prafulla Mahindrakar <prafulla.mahindrakar@gmail.com>
katrogan
left a comment
There was a problem hiding this comment.
can the same issue occur in other group by queries like ListWorkflowIds, ListLaunchPlanIds, etc - should we fix those too?
|
|
||
| // Apply consistent sort ordering. | ||
| if input.SortParameter != nil { | ||
| identifierGroupByWithOrderKey := fmt.Sprintf("%s, %s, %s, %s", Project, Domain, Name, input.SortParameter.GetSortKey()) |
There was a problem hiding this comment.
should we check that the sort parameter isn't already in (Project, Domain, Name)?
There was a problem hiding this comment.
Have removed this functionality of dynamically adding the column and instead made it part of the identifierGroupBy which takes care of fixing ListWorkflowIds , ListLaunchPlanIds, ListTaskIdentifiers
Signed-off-by: Prafulla Mahindrakar <prafulla.mahindrakar@gmail.com>
Signed-off-by: Prafulla Mahindrakar <prafulla.mahindrakar@gmail.com>
Signed-off-by: Prafulla Mahindrakar <prafulla.mahindrakar@gmail.com>
| const filters = "filters" | ||
|
|
||
| var identifierGroupBy = fmt.Sprintf("%s, %s, %s", Project, Domain, Name) | ||
| var identifierGroupBy = fmt.Sprintf("%s, %s, %s, %s", Project, Domain, Name, CreatedAt) |
There was a problem hiding this comment.
so we always group by created at now? what if someone queries by updated_at or some other field?
There was a problem hiding this comment.
Facing issue with group by . Let me dig deeper.
I am planning to remove the group by instead do the following queries which should be equivalent
explain SELECT
Distinct
entities.project,
entities.domain,
entities.name,
'2' AS resource_type,
named_entity_metadata.description,
named_entity_metadata.state
FROM "named_entity_metadata"
RIGHT JOIN (
select distinct project,domain,name
from (
SELECT project,domain,name, created_at
FROM "tasks"
WHERE "domain" = 'development' AND "project" = 'admintests' order by created_at
) t limit 2 offset 2) AS entities ON named_entity_metadata.resource_type = 2
AND named_entity_metadata.project = entities.project
AND named_entity_metadata.domain = entities.domain
AND named_entity_metadata.name = entities.name
WHERE COALESCE(named_entity_metadata.state, 0) = '0' ;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Unique (cost=9.82..9.84 rows=1 width=581)
-> Sort (cost=9.82..9.83 rows=1 width=581)
Sort Key: tasks.project, tasks.domain, tasks.name, named_entity_metadata.description, named_entity_metadata.state
-> Nested Loop Left Join (cost=1.76..9.81 rows=1 width=581)
Join Filter: ((named_entity_metadata.project = tasks.project) AND (named_entity_metadata.domain = tasks.domain) AND (named_entity_metadata.name = tasks.name))
Filter: (COALESCE(named_entity_metadata.state, 0) = 0)
-> Limit (cost=1.61..1.62 rows=1 width=29)
-> HashAggregate (cost=1.59..1.61 rows=2 width=29)
Group Key: tasks.project, tasks.domain, tasks.name
-> Sort (cost=1.41..1.44 rows=9 width=37)
Sort Key: tasks.created_at
-> Seq Scan on tasks (cost=0.00..1.27 rows=9 width=37)
Filter: ((domain = 'development'::text) AND (project = 'admintests'::text))
-> Index Scan using named_entity_metadata_type_project_domain_name_idx on named_entity_metadata (cost=0.14..8.16 rows=1 width=616)
Index Cond: (resource_type = 2)
(15 rows)
Also the query plan look more efficient
One with group by
explain SELECT
entities.project,
entities.domain,
entities.name,
'2' AS resource_type,
named_entity_metadata.description,
named_entity_metadata.state
FROM "named_entity_metadata"
RIGHT JOIN (SELECT project,domain,name FROM "workflows" WHERE "domain" = 'development' AND "project" = 'admintests' GROUP BY project, domain, name, created_at ORDER BY created_at LIMIT 2 offset 2) AS entities ON named_entity_metadata.resource_type = 2
AND named_entity_metadata.project = entities.project
AND named_entity_metadata.domain = entities.domain
AND named_entity_metadata.name = entities.name
WHERE COALESCE(named_entity_metadata.state, 0) = '0'
GROUP BY "created_at",entities.project, entities.domain, entities.name, named_entity_metadata.description, named_entity_metadata.state
ORDER BY created_at desc;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Group (cost=9.70..9.72 rows=1 width=589)
Group Key: named_entity_metadata.created_at, workflows.project, workflows.domain, workflows.name, named_entity_metadata.description, named_entity_metadata.state
-> Sort (cost=9.70..9.71 rows=1 width=557)
Sort Key: named_entity_metadata.created_at DESC, workflows.project, workflows.domain, workflows.name, named_entity_metadata.description, named_entity_metadata.state
-> Nested Loop Left Join (cost=1.58..9.69 rows=1 width=557)
Join Filter: ((named_entity_metadata.project = workflows.project) AND (named_entity_metadata.domain = workflows.domain) AND (named_entity_metadata.name = workflows.name))
Filter: (COALESCE(named_entity_metadata.state, 0) = 0)
-> Limit (cost=1.44..1.46 rows=2 width=37)
-> Group (cost=1.41..1.53 rows=9 width=37)
Group Key: workflows.created_at, workflows.project, workflows.domain, workflows.name
-> Sort (cost=1.41..1.44 rows=9 width=37)
Sort Key: workflows.created_at, workflows.name
-> Seq Scan on workflows (cost=0.00..1.27 rows=9 width=37)
Filter: ((domain = 'development'::text) AND (project = 'admintests'::text))
-> Materialize (cost=0.14..8.17 rows=1 width=624)
-> Index Scan using named_entity_metadata_type_project_domain_name_idx on named_entity_metadata (cost=0.14..8.16 rows=1 width=624)
Index Cond: (resource_type = 2)
We dont need an additional sort for the same key in both inner and outer sql queries
There was a problem hiding this comment.
just a note we still need something equivalent to the WHERE COALESCE(named_entity_metadata.state, 0) = '0' clause to filter by named entity state
re:
RIGHT JOIN (
select distinct project,domain,name
from (
SELECT project,domain,name, created_at
FROM "tasks"
WHERE "domain" = 'development' AND "project" = 'admintests' order by created_at
) t limit 2 offset 2) AS entities ON named_entity_metadata.resource_type = 2
why do we need to do the nested select? can't we right join directly on the result of SELECT DISTINCT project, domain, name, created_at
Signed-off-by: Prafulla Mahindrakar prafulla.mahindrakar@gmail.com
TL;DR
The column used in ordering should be part of the group by clause aswell and hence have added the created_at column too.
This wasn't happening for created_at column which was absent.
Only the record sets cannot be orderd by state since those dont exist workflow table.
With the fix :
Type
Are all requirements met?
Complete description
How did you fix the bug, make the feature etc. Link to any design docs etc
Tracking Issue
fixes flyteorg/flyte#2327
Follow-up issue
NA