Skip to content

Commit 82c841d

Browse files
authored
Merge pull request #67 from TechnologyEnhancedLearning/TD-6798-Stored-Proc-Test-Exploration
trigger git copilot pr
2 parents 4e75bac + 9c55d46 commit 82c841d

40 files changed

Lines changed: 534 additions & 64 deletions
File renamed without changes.

.github/PULL_REQUEST_TEMPLATE.md

Lines changed: 116 additions & 18 deletions
Original file line numberDiff line numberDiff line change
@@ -1,28 +1,126 @@
1-
# sections
1+
Go to bottom for pull request template to complete
22

3-
## Contribute to code quality and the future
4-
- tests
5-
- refactor
6-
- reuseable code
7-
- seperated
8-
Links
3+
# Peer Review Guidance
94

10-
## Contrribute to tooling
11-
- AI
5+
## Who
6+
- everyone should get to be reviewed
7+
- everyone should get to review others
8+
- hierachyless
129

13-
## Contribute to knowledge
10+
## What it should be
11+
- Collaborative :) :)
12+
- Opportunity to discuss interesting approaches
13+
- Share knowledge generally
14+
- Share knowledge specifically
15+
- this is the opportunity to see new functions that may be used in future tasks to save work
16+
- there may be an easier way of solving a specific problem
17+
- Celebrate work
18+
- Be supportive
19+
- Be seen
20+
- An excuse for a call and a cup of tea
21+
- If there is nothing to suggest you havent failed at doing a review, you dont have to find something, its just good work, so share that, or a picture of your cat
22+
- A gate, work needs to be interacted with, by a few ppl before going out, we are looking for done not perfect, suggestions and discussion will often result in things useful for next time and learning rather than a change. Sometimes work is needed but it is an additional task, or at this point it maybe even a refactor task for example bring two bits of work in line where they can now share logic. However, sometimes too you will agree to do an extra commit because a change is low effort high value, or required. (Be careful of tasks growing from being atomic at this point in the process.)
23+
- It can be nice practice for reviewers to tell the person who did the work they can now merge, its their work afterall. Some prefer people to just merge it for them.
24+
25+
## What it shoudnt be
26+
- Unless requested linting is better done by AI or linters, no one likes their spelling, grammar etc picked up unless its requested
27+
- Aiming for perfection
28+
- an assessment
29+
30+
Ofter PR becomes a waving-through excerise in teams because
31+
- its uncomfortable to make comments
32+
- the context isnt enough to understand the work
33+
- the task was too big to review easily
34+
- or peer reviewers see it as not their work but an extra
35+
And if this happens teams dont get the benefit of others practice, knowledge of areas theyve not written themselves, appreciation from someone who understands the work, and a natural slow unification/standardisation of approaches. So its important to make it an enjoyable priority.
36+
37+
## refs
38+
[first google but its ok its tips on good peer reviews](https://smartbear.com/learn/code-review/best-practices-for-peer-code-review/)
39+
40+
41+
# Pull Request Guidance
42+
- You can open pull requests as drafts if you like
43+
44+
## Good things to do before a pull request
45+
These are not expectations, as the process become more practiced we will learn what should happen most times and what shouldnt. It is a way of having a joint concept of "good". And as a joint concept this doc should be changed overtime. (If it isnt then it will be because it failed).
46+
47+
### Sanity Checks
48+
These ar not show stoppers just good to check
49+
- checked commit names (some ppl like to squash commits to tidy them up but most dont)
50+
- commit per thing done
51+
- commit naming convention followed
52+
- branch correctly named
53+
- searched for any personal codes you use to leave notes for yourself when developing e.g. "zzzz" to help you tidy up
54+
55+
### Contribute to code quality and the future
56+
- lint (Run_lint targeted files and make improvements)
57+
- tests (Run tests in dbx, write test)
58+
- refactor (Can code be made more reuseable, testable)
59+
- reuseable (Could you reuse existing code or refactor it to be reuseable)
60+
61+
### Contribute to tooling
62+
- AI (add to dbx global, your dbx user name area, or github, context instruction md e.g preference in responses, recommendation it should or shouldnt make or providing context for files it doesnt understand)
63+
- Turned exploratory code you have ran into labeled notebooks to share so others can easily run the same explorations
64+
- Turned exploratory code you have ran into manual tests other developers can run
65+
66+
### Contribute to knowledge
1467
- comments in code
1568
- comments in jira task
1669
- documentation in project
17-
- documentation in confluence
1870
- discussion in pr
1971

20-
## Next steps
21-
- any tasks need creating from discussion
72+
### Contribute to planning
73+
- if this task highlighted a need for any task to be created please make these recommendations or open discussions
74+
75+
76+
# Pull Request Form (Please Complete)
77+
*based on https://github.com/TechnologyEnhancedLearning/LearningHub.Nhs.WebUI/blob/master/.github/pull_request_template.md?plain=1*
78+
79+
80+
81+
## JIRA link
82+
Change this to the jira link to help your reviewers
83+
[TD-####](https://hee-tis.atlassian.net/browse/TD-####)
84+
85+
## Description
86+
_Describe what has changed and how that will affect the app. If relevant, add links to any sources/documentation you used. Highlight anything unusual and give people context around particular decisions._
87+
88+
## Screenshots
89+
_Paste Screen Shots Here (This can be useful sometimes but is more useful for WebUI its totally optional.)_
90+
91+
## Checklists
92+
93+
### Checklist for the Author
94+
95+
This check list is to help you and your peer reviewers have a shared context.
96+
97+
It is not an expectation that everything will be ticked or even most, but a useful prompt if for example a review thinks unit-tests would of been useful, which is a comment that would be in the main comments rather than a comment on the file changes tab.
98+
99+
Peer reviewers should not fail based on this list, it is agile, we are looking for done.
100+
Recommendation may be for future practice, for future refactor tasks, or just good to know. Some may be to refactor somework as part of this ticket. For example maybe a function already existed that does what your doing and there is low effort high benefit to using it instead. Then you would discus, agree to do another commit, make the changes and push.
101+
102+
*Put xs in them so they appear ticked in preview [x]*
103+
- [ ] Checked files changed are the right ones, and they are changing the right things
104+
- [ ] Checked the code by running tests or exploratory code
105+
- [ ] Created Unit Tests
106+
- [ ] Created Integration Tests
107+
- [ ] Created Data Quality Tests
108+
- [ ] Run tests in databricks in my user area
109+
- [ ] Used Spark Expect
110+
- [ ] Update my Jira ticket with useful context notes for testers
111+
- [ ] Documented Work
112+
- [ ] Recommended Jira tickets to the relevant person from needs emerging from this work
113+
- [ ] Thanked testers and reviewers :)
114+
22115

23-
# TODO QQQQ
116+
### Checklist for Peer Reviewer(s)
117+
*there may be many peer reviewers but this is to ensure at least one person has ticked the boxes*
118+
- [ ] Considered if additional tests needed
119+
- [ ] Commented on individual files
120+
- [ ] Commented in general PR
121+
- [ ] Asked curious questions/offered alternative approaches
122+
- [ ] Given praise (if its going to merge its worth praise)
123+
- [ ] Offered to document or request tasks for any areas identified in discussions from the work
124+
- [ ] Agreed additional commits/Approved and let author know
125+
- [ ] Added any additional insight to the jira ticket for the testers
24126

25-
[example from lh](https://github.com/TechnologyEnhancedLearning/LearningHub.Nhs.WebUI/blob/master/.github/pull_request_template.md)
26-
- updating ai files
27-
- look at changed files click the ai star icon top right, ask it to code review locally
28-
- (can i point it at the global instructions?)
Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1 @@
1+
out of scope
Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1 @@
1+
out of scope

.gitignore

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -35,13 +35,14 @@ pip-selfcheck.json
3535
# Scratch / experimental folder
3636
# -----------------------------
3737
scratch/** # ignore all files in scratch
38-
!scratch/README.md # except placeholder README.md
38+
!scratch/scratch-README.md # except placeholder README.md
3939

4040
# Ignore scratch notebooks
4141
# so whenever you want to explore something you can do scratch-phil-ods-thingy and it wont be source controlled.
4242
scratch-*.dbc
4343
scratch-*.ipynb
4444
scratch-*.py
45+
Scratch_*.ipynb
4546

4647
## Data files (we may decide not to do this but will try it early on so its opt in)
4748
*.csv

README.md

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -19,6 +19,7 @@ When raising a pull request (PR), please use the `/addinstructions` command in D
1919
---
2020

2121
# Not done
22+
- This document has excellent coverage and should be used to plan next steps and best practice examples [Dataquality](https://www.databricks.com/discover/pages/data-quality-management)
2223
- read https://blogs.perficient.com/2025/03/19/delta-live-tables-and-great-expectations/
2324
- need public repo for branch rules, theyre not tweaked so cant just be exported but
2425
- can set deployment rules
@@ -30,6 +31,7 @@ When raising a pull request (PR), please use the `/addinstructions` command in D
3031
- recommend enable in branch rules
3132
- and require one reviewer
3233
- /addinstructions as a command in databricks ai can work so can put user space or work space instructions
34+
- lakehouse monitoring!
3335

3436
# Potentially incorrect assumptions
3537
- gold is for data ready for dashboard, aggrogated

docs/AI refinements.md

Lines changed: 72 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,36 @@
11
# Notes on what we want from copilot git, and databricks assistant#
22

3+
# ⚠️WARNING⚠️
4+
- Databricks AI has a low limit for request so
5+
- write a detailed prompt to scaffold
6+
- write a detailed prompt to review when done
7+
- write in sql and use to translate to python
8+
- use other external AI for questions not requiring access to files or our specific prebuilt prompts
9+
- prebuilt prompts can be used in external AI aswell
10+
- Git copilot can be used manually but not as an automatic PR tool **yet** without paying for it (though it is planned to be free)
11+
12+
# Using github copilt in the short term
13+
- for now can try [github copilot client](https://github.com/copilot) and select the repo. and use a prompt like
14+
> "Please confirm the name of the most recent pull request on this repo. Then using the .github/copilot-instructions.md as your context prompt provide a peer review of the pull requests code changes, by providing file names and lines."
15+
- on a second screen you may want to go to the pull request and click the file changes tab
16+
- if the bot needs help understanding descibing certain files please consider helping it by adding this to the context file for future, so we can constantly imporve its context
17+
18+
# Tips
19+
- Do few high quality detailed prompts
20+
- forward slash offers some prebuilt context options to select
21+
- using inline ai button will focus on work your on now
22+
- using it from the side bar means you can get it to find for example the closest example of what your trying to do in the project for reference
23+
- tell it what you want it to consider, what you want to achieve, how you want it to help, the type of response you want, what your priorities are
24+
25+
# refs
26+
[copilot in git code review](https://docs.github.com/en/copilot/how-tos/use-copilot-agents/request-a-code-review/use-code-review)
27+
[github custom context repo prompt generator](https://docs.github.com/en/copilot/how-tos/configure-custom-instructions/add-repository-instructions)
28+
29+
# Notes
30+
We are not standardising commit and then failing them based on name in the repo. But we could still provide guidance as the will aid AI too.
31+
E.g. [Angular convention](https://www.conventionalcommits.org/en/v1.0.0-beta.4/) could adapt to be relevant to data process.
32+
33+
334
## Both
435
- recommend spark over using libraries
536
- recommend refactoring to python functions and unit tests where appropriate
@@ -26,4 +57,44 @@ The above was generated by AI. What our teams wants is:
2657
- do not overwhelm the work be clear on how beneficial changes are in what contexts and how difficult to implement so developers can decide what is worth applying.
2758
- if a change is declined and it may be a change we do not want recommending in future suggest an addition to be made to copilot-instructions.md or github/instructions files, the docs/AI-readme.md and the .assisitive in the workspace
2859
- search databricks upto date information before making recommendation about databricks and ensure its relevant for DABs git source controlled solutions with 3 workspaces deployed via git
29-
## Assistant DBX
60+
## Assistant DBX
61+
62+
<!--This is our custom context for databricks assistant it is the global settings, there are also user space ones you can set. This file is not in source control-->
63+
64+
<!--We will analyse our repo the github copilot to build a context prompt which will go here (if you want to see what that will look like look in the github folder in your user area repo, thats where the github ai tool context instructions live)-->
65+
66+
<!--AI generated global context TODO see github one-->
67+
68+
<!--Human added AI context instructions-->
69+
<!--This is just an example we would agree something together-->
70+
71+
### context
72+
- we use dabs
73+
- we have a dev staging prod setup
74+
- dev is for analystist staging for testing team
75+
- we want unit testable code
76+
- we are migrating existing sql to databricks
77+
- databricks is serving warehoused dashboard data and live read data for our lms
78+
- when providing examples ground them in NHS LMS context.
79+
80+
### Response instructions
81+
82+
- Use the `spark.sql()` syntax for data transformations
83+
- Be very supportive especially with python
84+
- in addition to feedback provide questioning prompts to help us learn python. Ask supportive questions, suggest resources to read.
85+
- only recommend very common python libraries and prioritise to some extent vanilla python over packages because we prefer focussing on python over packages where we can for our learning even if it makes it slightly more verbose. We also preffer common packages that AI assistants will be better at managing.
86+
- dont be picky unless asked highlight major issues, detect where the code might by trying to solve the wrong problem and add more details if there arnt large problems so we always improve.
87+
- quantify suggestions with confidence level of why its an issues, what you think the code is for, how important the issue is
88+
- help the user reach the right questions
89+
- check our existing functions first for opportunities to reuse them
90+
91+
### Response context
92+
We use the following python libraries, recommend tools from libraries we already use before recommending new libraries if possible.
93+
94+
### When providing unit tests
95+
- aim for simple tests
96+
- provide alot of comments and explanation
97+
- provide the option of more test to the user if they wish
98+
- recommend other test types
99+
100+

docs/Databricks Tips.md

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,3 @@
1+
# Databricks Tips
2+
3+
- the search will take you outside of your user area careful about editing other peoples work or your bundles (with the correct permissions the former will be less of an issue)
Lines changed: 97 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,97 @@
1+
# Context
2+
3+
Standardising naming for commits can be useful for autogenerating versioning changelogs.
4+
- for example this git repo is for a tool that does this, on the right hand side you can see for their package they have release versions and clicking on them you can see a changelog
5+
[semver example using their git repo for their package](https://github.com/semantic-release/semantic-release)
6+
This repo explains the commit naming, and demonstrates it. Here is the list of changes theyve made as a generated changelog [semver package releases changelog](https://github.com/semantic-release/semantic-release/releases)
7+
For branches using jira ticket names and environment names allows git to move tasks across the jira board and to mark if an environment has been successfully deployed.
8+
9+
It can be useful when PRing for humans and AI Prs too.
10+
11+
We may want to define out own standardisation which is databrick and data orientated.
12+
13+
14+
## Commit names convention example
15+
fix(pencil): stop graphite breaking when too much pressure applied
16+
17+
### E.g.
18+
19+
#### FEAT — new capability / behaviour
20+
feat(lms): add derived completion_status for statutory training
21+
feat(pipeline): publish daily LMS compliance snapshot to curated layer
22+
feat(model): expose expiry_date logic for mandatory training
23+
24+
#### FIX — bug / incorrect logic
25+
fix(sql): correct join logic causing duplicate learner records
26+
fix(model): handle null end dates for honorary contracts
27+
fix(lms): resolve incorrect compliance flag for bank staff
28+
29+
#### REFACTOR — same behaviour, better structure
30+
refactor(pipeline): move learner completion logic from SQL to PySpark
31+
refactor(pipeline): replace usp_UpdateLMSCompliance with Databricks job
32+
refactor(model): extract completion calculation into reusable function
33+
refactor(pipeline): decouple LMS rules from ingestion logic
34+
35+
#### TEST — tests only
36+
test(model): add unit tests for completion status calculation
37+
test(pipeline): replace LMS integration test with isolated unit tests
38+
test(lms): cover edge cases for training expiry logic
39+
40+
#### PERF — performance improvements
41+
perf(pipeline): reduce LMS compliance job runtime by optimising joins
42+
perf(sql): remove redundant subqueries from compliance extract
43+
44+
#### CHORE — housekeeping, no behaviour change
45+
chore(notebook): clean up LMS compliance analysis notebook
46+
chore(ci): update Databricks job parameters for LMS pipelines
47+
48+
#### SCHEMA — data shape changes
49+
schema(lms): add staff_group_code to training_completion table
50+
schema(lms)!: rename staff_id to person_id across LMS marts
51+
52+
#### DOCS — documentation only
53+
docs(lms): document compliance logic and ESR alignment assumptions
54+
55+
#### BUILD / CI — tooling & pipelines
56+
ci(lms): run unit tests for PySpark models on pull requests
57+
build(pipeline): parameterise LMS job for multiple environments
58+
59+
60+
## Branch naming
61+
62+
Make branches from main.
63+
64+
TODO: See learning hub branch names and documentation
65+
Something like
66+
67+
feature/<issue-number>-<short-description>
68+
fix/<issue-number>-<short-description>
69+
refactor/<issue-number>-<short-description>
70+
hotfix/<issue-number>-<short-description>
71+
72+
**Branch Naming Examples**
73+
74+
**Feature branches** - New functionality or enhancements:
75+
feature/101-add-learner-dashboard
76+
feature/102-enable-course-progress-tracking
77+
feature/103-integrate-esr-user-data
78+
feature/104-add-training-completion-export
79+
80+
**Fix / bug branches** - Bug fixes or corrections:
81+
fix/201-correct-login-session-timeout
82+
fix/202-resolve-duplicate-learner-rows
83+
fix/203-handle-null-expiry-dates
84+
85+
**Refactor branches** - Refactoring code without changing behaviour:
86+
refactor/301-extract-user-service
87+
refactor/302-move-completion-calculation-to-pyspark
88+
refactor/303-cleanup-lms-notebook
89+
90+
**Hotfix / urgent branches** - Critical issues requiring immediate production deployment:
91+
hotfix/401-critical-login-error
92+
hotfix/402-fix-mandatory-training-bug
93+
94+
**Experiment / spike branches** - Temporary or exploratory work:
95+
spike/501-pyspark-performance-test
96+
spike/502-new-dashboard-ui-prototype
97+

0 commit comments

Comments
 (0)