TechnologyEnhancedLearning
diff --git a/.github/ReadMe.md b/.github/Github-folder-ReadMe.md
.github/ReadMe.md renamed to .github/Github-folder-ReadMe.md
diff --git a/.github/PULL_REQUEST_TEMPLATE.md b/.github/PULL_REQUEST_TEMPLATE.md
Lines changed: 116 additions & 18 deletions
diff --git a/.github/workflows/reuseable-dab-deploy.yml-todo b/.github/workflows/reuseable-dab-deploy.yml-todo
Lines changed: 1 addition & 0 deletions
diff --git a/.github/workflows/reuseable-dab-validate.yml-todo b/.github/workflows/reuseable-dab-validate.yml-todo
Lines changed: 1 addition & 0 deletions
diff --git a/.gitignore b/.gitignore
Lines changed: 2 additions & 1 deletion
diff --git a/README.md b/README.md
Lines changed: 2 additions & 0 deletions
diff --git a/docs/AI refinements.md b/docs/AI refinements.md
Lines changed: 72 additions & 1 deletion
diff --git a/docs/Databricks Tips.md b/docs/Databricks Tips.md
Lines changed: 3 additions & 0 deletions
diff --git a/docs/Git branch and commit naming.md b/docs/Git branch and commit naming.md
Lines changed: 97 additions & 0 deletions
@@ -1,28 +1,126 @@
-# sections
+Go to the bottom for the pull request template to complete
 
-## Contribute to code quality and the future
-- tests
-- refactor
-- reuseable code
-- seperated
-Links
+# Peer Review Guidance
 
-## Contrribute to tooling
-- AI
+## Who
+- everyone should get to be reviewed
+- everyone should get to review others
+- hierarchy-less
 
-## Contribute to knowledge
+## What it should be
+- Collaborative :) :)
+- Opportunity to discuss interesting approaches
+- Share knowledge generally
+- Share knowledge specifically
+  - this is the opportunity to see new functions that may be used in future tasks to save work
+  - there may be an easier way of solving a specific problem
+- Celebrate work
+- Be supportive
+- Be seen
+- An excuse for a call and a cup of tea
+- If there is nothing to suggest, you haven't failed at doing a review. You don't have to find something; it's just good work, so share that, or a picture of your cat
+- A gate: work needs to be interacted with by a few people before going out. We are looking for done, not perfect. Suggestions and discussion will often result in things useful for next time and learning rather than a change. Sometimes work is needed but it is an additional task, or at this point maybe even a refactor task, for example bringing two bits of work in line where they can now share logic. However, sometimes you will agree to do an extra commit because a change is low effort and high value, or required. (Be careful of tasks growing beyond being atomic at this point in the process.)
+- It can be nice practice for reviewers to tell the person who did the work they can now merge; it's their work after all. Some prefer people to just merge it for them.
+
+## What it shouldn't be
+- Linting: unless requested, this is better done by AI or linters. No one likes their spelling, grammar etc. picked up unless it's requested
+- Aiming for perfection
+- an assessment
+
+Often PR becomes a waving-through exercise in teams because
+- it's uncomfortable to make comments
+- the context isn't enough to understand the work
+- the task was too big to review easily
+- or peer reviewers see it as not their work but an extra
+
+If this happens, teams don't get the benefit of others' practice, knowledge of areas they've not written themselves, appreciation from someone who understands the work, and a natural, slow unification/standardisation of approaches. So it's important to make it an enjoyable priority.
+
+## refs
+[first Google result, but it's OK: tips on good peer reviews](https://smartbear.com/learn/code-review/best-practices-for-peer-code-review/)
+
+
+# Pull Request Guidance
+- You can open pull requests as drafts if you like
+
+## Good things to do before a pull request 
+These are not expectations. As the process becomes more practised we will learn what should happen most times and what shouldn't. It is a way of having a joint concept of "good", and as a joint concept this doc should be changed over time. (If it isn't, it will be because it failed.)
+
+### Sanity Checks
+These are not show stoppers, just good to check
+- checked commit names (some people like to squash commits to tidy them up but most don't)
+  - commit per thing done
+  - commit naming convention followed
+- branch correctly named
+- searched for any personal codes you use to leave notes for yourself when developing e.g. "zzzz" to help you tidy up
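The search for personal note codes can be scripted. The following is only a minimal sketch: the marker list and the `find_markers` helper are hypothetical, so swap in whatever codes you actually use.

```python
from typing import Iterable, List, Tuple

# Hypothetical personal markers; replace with the codes you actually leave in code.
DEFAULT_MARKERS = ("zzzz", "qqqq")


def find_markers(text: str, markers: Iterable[str] = DEFAULT_MARKERS) -> List[Tuple[int, str]]:
    """Return (line_number, stripped_line) for every line containing a marker, ignoring case."""
    lowered = [marker.lower() for marker in markers]
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        if any(marker in line.lower() for marker in lowered):
            hits.append((number, line.strip()))
    return hits


# Small in-memory example; in practice you would read each changed file's text.
sample = "df = load()\n# zzzz remove this debug print\nprint(df)\n# TODO QQQQ tidy\n"
for number, line in find_markers(sample):
    print(f"line {number}: {line}")
```

Running it over the files changed in your branch before opening the PR gives a quick list of leftovers to tidy up.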
+
+### Contribute to code quality and the future
+- lint (Run_lint targeted files and make improvements)
+- tests (Run tests in dbx, write test)
+- refactor (Can code be made more reusable or testable?)
+- reusable (Could you reuse existing code or refactor it to be reusable?)
+
+### Contribute to tooling
+- AI (add to the dbx global context, your dbx user name area, or the GitHub context instruction md, e.g. preferences in responses, recommendations it should or shouldn't make, or context for files it doesn't understand)
+- Turned exploratory code you have run into labelled notebooks to share so others can easily run the same explorations
+- Turned exploratory code you have run into manual tests other developers can run
+
+### Contribute to knowledge
 - comments in code
 - comments in jira task
 - documentation in project
-- documentation in confluence
 - discussion in pr
 
-## Next steps
-- any tasks need creating from discussion
+### Contribute to planning
+- if this task highlighted a need for any task to be created please make these recommendations or open discussions
+
+
+# Pull Request Form (Please Complete)
+*based on https://github.com/TechnologyEnhancedLearning/LearningHub.Nhs.WebUI/blob/master/.github/pull_request_template.md?plain=1*
+
+
+
+## JIRA link
+Change this to the jira link to help your reviewers
+[TD-####](https://hee-tis.atlassian.net/browse/TD-####)
+
+## Description
+_Describe what has changed and how that will affect the app. If relevant, add links to any sources/documentation you used. Highlight anything unusual and give people context around particular decisions._
+
+## Screenshots
+_Paste screenshots here. (This can be useful sometimes but is more useful for WebUI; it's totally optional.)_
+
+## Checklists
+
+### Checklist for the Author
+
+This checklist is to help you and your peer reviewers have a shared context.
+
+It is not an expectation that everything will be ticked, or even most, but it is a useful prompt if, for example, a reviewer thinks unit tests would have been useful. That comment would go in the main comments rather than on the file changes tab.
+
+Peer reviewers should not fail a PR based on this list. It is agile; we are looking for done.
+Recommendations may be for future practice, for future refactor tasks, or just good to know. Some may be to refactor some work as part of this ticket. For example, maybe a function already existed that does what you're doing and there is low effort and high benefit to using it instead. Then you would discuss, agree to do another commit, make the changes and push.
+
+*Put xs in them so they appear ticked in preview [x]*
+- [ ] Checked files changed are the right ones, and they are changing the right things
+- [ ] Checked the code by running tests or exploratory code
+- [ ] Created Unit Tests
+- [ ] Created Integration Tests
+- [ ] Created Data Quality Tests
+- [ ] Ran tests in Databricks in my user area
+- [ ] Used Spark Expect
+- [ ] Updated my Jira ticket with useful context notes for testers
+- [ ] Documented Work
+- [ ] Recommended Jira tickets to the relevant person from needs emerging from this work
+- [ ] Thanked testers and reviewers :)
+
 
-# TODO QQQQ
+### Checklist for Peer Reviewer(s) 
+*there may be many peer reviewers but this is to ensure at least one person has ticked the boxes*
+- [ ] Considered if additional tests needed
+- [ ] Commented on individual files
+- [ ] Commented in general PR
+- [ ] Asked curious questions/offered alternative approaches
+- [ ] Given praise (if it's going to merge, it's worth praise)
+- [ ] Offered to document or request tasks for any areas identified in discussions from the work
+- [ ] Agreed additional commits/Approved and let author know
+- [ ] Added any additional insight to the jira ticket for the testers
 
-[example from lh](https://github.com/TechnologyEnhancedLearning/LearningHub.Nhs.WebUI/blob/master/.github/pull_request_template.md)
-- updating ai files
-- look at changed files click the ai star icon top right, ask it to code review locally
-  - (can i point it at the global instructions?)
 
@@ -0,0 +1 @@
+out of scope
@@ -0,0 +1 @@
+out of scope
@@ -35,13 +35,14 @@ pip-selfcheck.json
 # Scratch / experimental folder
 # -----------------------------
 scratch/**            # ignore all files in scratch
-!scratch/README.md    # except placeholder README.md
+!scratch/scratch-README.md    # except placeholder README.md
 
 # Ignore scratch notebooks
 # so whenever you want to explore something you can do scratch-phil-ods-thingy and it wont be source controlled.
 scratch-*.dbc
 scratch-*.ipynb
 scratch-*.py
+Scratch_*.ipynb
 
 ## Data files (we may decide not to do this but will try it early on so its opt in)
 *.csv
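To see which notebook names the scratch patterns in this hunk are likely to catch, here is a rough sketch. It assumes case-sensitive matching on the file name only and approximates gitignore with Python's `fnmatchcase`; real gitignore matching has more rules (directories, negation, `core.ignorecase`). `looks_ignored` is a hypothetical helper.

```python
from fnmatch import fnmatchcase

# Basename patterns copied from the .gitignore hunk above.
SCRATCH_PATTERNS = ["scratch-*.dbc", "scratch-*.ipynb", "scratch-*.py", "Scratch_*.ipynb"]


def looks_ignored(filename: str) -> bool:
    """Rough check: does the file name match one of the scratch patterns (case-sensitive)?"""
    return any(fnmatchcase(filename, pattern) for pattern in SCRATCH_PATTERNS)


for name in [
    "scratch-phil-ods-thingy.ipynb",  # matches scratch-*.ipynb
    "Scratch_demo.ipynb",             # matches the newly added Scratch_*.ipynb
    "scratch_demo.ipynb",             # underscore with lowercase s: no pattern matches
    "Scratch-demo.py",                # capital S with hyphen: no pattern matches
]:
    print(name, looks_ignored(name))
```

The sketch suggests the prefix variants (`scratch-` vs `Scratch_`) are not interchangeable, so it may be worth agreeing on one. Separately, as far as I know gitignore only treats `#` as a comment at the start of a line, so the trailing comments on the `scratch/**` lines may be read as part of the pattern; `git check-ignore -v <path>` should confirm either way.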
 
@@ -19,6 +19,7 @@ When raising a pull request (PR), please use the `/addinstructions` command in D
 ---
 
 # Not done
+- This document has excellent coverage and should be used to plan next steps and find best practice examples: [Data quality](https://www.databricks.com/discover/pages/data-quality-management)
 - read https://blogs.perficient.com/2025/03/19/delta-live-tables-and-great-expectations/
 - need public repo for branch rules, theyre not tweaked so cant just be exported but
 	- can set deployment rules
@@ -30,6 +31,7 @@ When raising a pull request (PR), please use the `/addinstructions` command in D
 	- recommend enable in branch rules
 	- and require one reviewer
 	- /addinstructions as a command in databricks ai can work so can put user space or work space instructions
+- lakehouse monitoring!
 
 # Potentially incorrect assumptions
 - gold is for data ready for dashboard, aggregated
 
@@ -1,5 +1,36 @@
 # Notes on what we want from copilot git, and databricks assistant#
 
+# ⚠️WARNING⚠️
+- Databricks AI has a low limit for requests, so
+   - write a detailed prompt to scaffold
+   - write a detailed prompt to review when done
+   - write in SQL and use it to translate to Python
+   - use other external AI for questions not requiring access to files or our specific prebuilt prompts
+   - prebuilt prompts can be used in external AI as well
+- GitHub Copilot can be used manually but not as an automatic PR tool **yet** without paying for it (though it is planned to be free)
+   
+# Using GitHub Copilot in the short term
+   - for now you can try the [GitHub Copilot client](https://github.com/copilot), select the repo, and use a prompt like
+      > "Please confirm the name of the most recent pull request on this repo. Then, using .github/copilot-instructions.md as your context prompt, provide a peer review of the pull request's code changes, giving file names and lines."
+   - on a second screen you may want to go to the pull request and click the file changes tab
+   - if the bot needs help understanding or describing certain files, please consider adding this to the context file for the future, so we can constantly improve its context
+
+# Tips
+- Write a few high-quality, detailed prompts
+- forward slash offers some prebuilt context options to select
+- using the inline AI button will focus on the work you're on now
+- using it from the side bar means you can get it to find, for example, the closest example in the project of what you're trying to do, for reference
+- tell it what you want it to consider, what you want to achieve, how you want it to help, the type of response you want, what your priorities are
+
+# refs
+[copilot in git code review](https://docs.github.com/en/copilot/how-tos/use-copilot-agents/request-a-code-review/use-code-review)
+[github custom context repo prompt generator](https://docs.github.com/en/copilot/how-tos/configure-custom-instructions/add-repository-instructions)
+
+# Notes
+We are not standardising commits and then failing them based on name in the repo. But we could still provide guidance, as this will aid AI too.
+E.g. [Conventional Commits (based on the Angular convention)](https://www.conventionalcommits.org/en/v1.0.0-beta.4/) could be adapted to be relevant to data processing.
+
+
 ## Both
 - recommend spark over using libraries
 - recommend refactoring to python functions and unit tests where appropriate
@@ -26,4 +57,44 @@ The above was generated by AI. What our teams wants is:
 - do not overwhelm the work be clear on how beneficial changes are in what contexts and how difficult to implement so developers can decide what is worth applying.
 - if a change is declined and it may be a change we do not want recommending in future suggest an addition to be made to copilot-instructions.md or github/instructions files, the docs/AI-readme.md and the .assisitive in the workspace
 - search databricks upto date information before making recommendation about databricks and ensure its relevant for DABs git source controlled solutions with 3 workspaces deployed via git
-## Assistant DBX
+## Assistant DBX
+
+<!--This is our custom context for Databricks Assistant. It is the global settings; there are also user space ones you can set. This file is not in source control-->
+
+<!--We will analyse our repo with GitHub Copilot to build a context prompt which will go here (if you want to see what that will look like, look in the github folder in your user area repo; that's where the GitHub AI tool context instructions live)-->
+
+<!--AI generated global context TODO see github one-->
+
+<!--Human added AI context instructions-->
+<!--This is just an example we would agree something together-->
+
+### context
+- we use dabs
+- we have a dev staging prod setup
+- dev is for analysts, staging is for the testing team
+- we want unit testable code
+- we are migrating existing sql to databricks
+- databricks is serving warehoused dashboard data and live read data for our lms
+- when providing examples ground them in NHS LMS context.
+
+### Response instructions
+
+- Use the `spark.sql()` syntax for data transformations
+- Be very supportive especially with python
+  - in addition to feedback provide questioning prompts to help us learn python. Ask supportive questions, suggest resources to read.
+  - only recommend very common Python libraries and, to some extent, prioritise vanilla Python over packages, because we prefer focussing on Python over packages where we can for our learning, even if it makes the code slightly more verbose. We also prefer common packages that AI assistants will be better at managing.
+  - don't be picky unless asked. Highlight major issues, detect where the code might be trying to solve the wrong problem, and add more detail if there aren't large problems, so we always improve.
+- quantify suggestions with a confidence level: why it's an issue, what you think the code is for, how important the issue is
+- help the user reach the right questions
+- check our existing functions first for opportunities to reuse them
+
+### Response context
+We use the following python libraries, recommend tools from libraries we already use before recommending new libraries if possible.
+
+### When providing unit tests
+- aim for simple tests
+- provide a lot of comments and explanation
+- offer the user more tests if they wish
+- recommend other test types
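As a point of reference for what "simple tests with a lot of comments" could look like, here is a minimal sketch in plain Python grounded in the LMS context. The `completion_status` function and its rules are hypothetical, made up for illustration, not taken from the repo.

```python
from datetime import date
from typing import Optional


def completion_status(completed_on: Optional[date], expires_on: Optional[date], today: date) -> str:
    """Derive a learner's status for one course (hypothetical rule set)."""
    if completed_on is None:
        return "not_completed"
    # No expiry date means the completion never lapses.
    if expires_on is None or expires_on >= today:
        return "compliant"
    return "expired"


# A fixed "today" keeps the tests repeatable: they never depend on the real clock.
TODAY = date(2025, 1, 15)


def test_never_completed_is_not_completed():
    # A learner with no completion date has not done the course at all.
    assert completion_status(None, None, TODAY) == "not_completed"


def test_completion_without_expiry_is_compliant():
    # Some courses never expire, so any completion counts.
    assert completion_status(date(2020, 3, 1), None, TODAY) == "compliant"


def test_expiry_today_still_counts_as_compliant():
    # Edge case: the training is still valid on its expiry day.
    assert completion_status(date(2024, 1, 15), TODAY, TODAY) == "compliant"


def test_expiry_in_the_past_is_expired():
    # Completed once, but the certificate lapsed a year ago.
    assert completion_status(date(2023, 1, 1), date(2024, 1, 1), TODAY) == "expired"


if __name__ == "__main__":
    for test in (
        test_never_completed_is_not_completed,
        test_completion_without_expiry_is_compliant,
        test_expiry_today_still_counts_as_compliant,
        test_expiry_in_the_past_is_expired,
    ):
        test()
    print("all tests passed")
```

Each test covers one rule and names it in the function name, so a failure points straight at the behaviour that broke. The same functions should also be collected by pytest if that is the runner in use.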
+
+
@@ -0,0 +1,3 @@
+# Databricks Tips
+
+- the search will take you outside of your user area, so be careful about editing other people's work or your bundles (with the correct permissions the former will be less of an issue)
@@ -0,0 +1,97 @@
+# Context
+
+Standardising naming for commits can be useful for autogenerating versioning changelogs.
+- for example, this git repo is for a tool that does this. On the right-hand side you can see the release versions for their package, and clicking on them shows a changelog
+[semver example using their git repo for their package](https://github.com/semantic-release/semantic-release)
+This repo explains the commit naming and demonstrates it. Here is the list of changes they've made as a generated changelog: [semver package releases changelog](https://github.com/semantic-release/semantic-release/releases)
+For branches, using Jira ticket names and environment names allows git to move tasks across the Jira board and to mark whether an environment has been successfully deployed.
+
+It can be useful for PRs too, both for human reviewers and for AI reviews.
+
+We may want to define our own standardisation which is Databricks and data orientated.
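To make the link between commit names and versioning concrete, here is a minimal sketch of how a tool could pick the next version bump from commit headers. It is not semantic-release's actual implementation; the type-to-bump mapping (`feat` to minor, `fix` and `perf` to patch, `!` to major) is an assumption based on common conventional-commit defaults, and `release_bump` is a hypothetical helper.

```python
import re

# Header shape: type(scope)!: description, where the scope and "!" are optional.
HEADER = re.compile(r"^(?P<type>[a-z]+)(\((?P<scope>[^)]+)\))?(?P<breaking>!)?: (?P<description>.+)$")

# Assumed mapping from commit type to version bump; adjust to the team's own rules.
BUMP_FOR_TYPE = {"feat": "minor", "fix": "patch", "perf": "patch"}
ORDER = ["none", "patch", "minor", "major"]


def release_bump(commit_headers):
    """Return the largest bump ("none", "patch", "minor" or "major") implied by the commits."""
    bump = "none"
    for header in commit_headers:
        match = HEADER.match(header)
        if match is None:
            continue  # commits that do not follow the convention do not affect the version
        candidate = "major" if match["breaking"] else BUMP_FOR_TYPE.get(match["type"], "none")
        if ORDER.index(candidate) > ORDER.index(bump):
            bump = candidate
    return bump


commits = [
    "fix(sql): correct join logic causing duplicate learner records",
    "feat(lms): add derived completion_status for statutory training",
    "docs(lms): document compliance logic and ESR alignment assumptions",
]
print(release_bump(commits))
```

A data-orientated standard could extend the mapping, for example deciding how a breaking `schema(...)!` change should be treated.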
+
+
+## Commit names convention example
+fix(pencil): stop graphite breaking when too much pressure applied
+
+### E.g.
+
+#### FEAT — new capability / behaviour
+feat(lms): add derived completion_status for statutory training
+feat(pipeline): publish daily LMS compliance snapshot to curated layer
+feat(model): expose expiry_date logic for mandatory training
+
+#### FIX — bug / incorrect logic
+fix(sql): correct join logic causing duplicate learner records
+fix(model): handle null end dates for honorary contracts
+fix(lms): resolve incorrect compliance flag for bank staff
+
+#### REFACTOR — same behaviour, better structure
+refactor(pipeline): move learner completion logic from SQL to PySpark
+refactor(pipeline): replace usp_UpdateLMSCompliance with Databricks job
+refactor(model): extract completion calculation into reusable function
+refactor(pipeline): decouple LMS rules from ingestion logic
+
+#### TEST — tests only
+test(model): add unit tests for completion status calculation
+test(pipeline): replace LMS integration test with isolated unit tests
+test(lms): cover edge cases for training expiry logic
+
+#### PERF — performance improvements
+perf(pipeline): reduce LMS compliance job runtime by optimising joins
+perf(sql): remove redundant subqueries from compliance extract
+
+#### CHORE — housekeeping, no behaviour change
+chore(notebook): clean up LMS compliance analysis notebook
+chore(ci): update Databricks job parameters for LMS pipelines
+
+#### SCHEMA — data shape changes
+schema(lms): add staff_group_code to training_completion table
+schema(lms)!: rename staff_id to person_id across LMS marts
+
+#### DOCS — documentation only
+docs(lms): document compliance logic and ESR alignment assumptions
+
+#### BUILD / CI — tooling & pipelines
+ci(lms): run unit tests for PySpark models on pull requests
+build(pipeline): parameterise LMS job for multiple environments
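The examples above all follow the shape `type(scope): description`, with an optional `!` for breaking changes. A minimal sketch of a parser for that shape follows. The allowed-type list is taken from the headings above; the regular expression, `parse_commit` and `group_by_type` are assumptions for illustration, not an agreed standard.

```python
import re
from collections import defaultdict

# Types taken from the section headings above.
ALLOWED_TYPES = {"feat", "fix", "refactor", "test", "perf", "chore", "schema", "docs", "ci", "build"}

# type, optional (scope), optional "!" for a breaking change, then ": description".
HEADER = re.compile(
    r"^(?P<type>[a-z]+)(?:\((?P<scope>[a-z0-9_-]+)\))?(?P<breaking>!)?: (?P<description>\S.*)$"
)


def parse_commit(header):
    """Return the parts of a commit header as a dict, or None if it does not follow the convention."""
    match = HEADER.match(header)
    if match is None or match["type"] not in ALLOWED_TYPES:
        return None
    return {
        "type": match["type"],
        "scope": match["scope"],
        "breaking": match["breaking"] is not None,
        "description": match["description"],
    }


def group_by_type(headers):
    """Group the descriptions of valid commits by type, e.g. as raw material for a changelog."""
    groups = defaultdict(list)
    for header in headers:
        parsed = parse_commit(header)
        if parsed is not None:
            groups[parsed["type"]].append(parsed["description"])
    return dict(groups)


print(parse_commit("schema(lms)!: rename staff_id to person_id across LMS marts"))
print(group_by_type([
    "feat(lms): add derived completion_status for statutory training",
    "fix(model): handle null end dates for honorary contracts",
    "Fixed some stuff",
]))
```

Since commits are not being failed on name, a check like this would suit a local helper or an advisory PR comment rather than a blocking gate.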
+
+
+## Branch naming
+
+Make branches from main.
+
+TODO: See learning hub branch names and documentation
+Something like
+
+feature/<issue-number>-<short-description>
+fix/<issue-number>-<short-description>
+refactor/<issue-number>-<short-description>
+hotfix/<issue-number>-<short-description>
+
+**Branch Naming Examples**
+
+**Feature branches** - New functionality or enhancements:
+feature/101-add-learner-dashboard
+feature/102-enable-course-progress-tracking
+feature/103-integrate-esr-user-data
+feature/104-add-training-completion-export
+
+**Fix / bug branches** - Bug fixes or corrections:
+fix/201-correct-login-session-timeout
+fix/202-resolve-duplicate-learner-rows
+fix/203-handle-null-expiry-dates
+
+**Refactor branches** - Refactoring code without changing behaviour:
+refactor/301-extract-user-service
+refactor/302-move-completion-calculation-to-pyspark
+refactor/303-cleanup-lms-notebook
+
+**Hotfix / urgent branches** - Critical issues requiring immediate production deployment:
+hotfix/401-critical-login-error
+hotfix/402-fix-mandatory-training-bug
+
+**Experiment / spike branches** - Temporary or exploratory work:
+spike/501-pyspark-performance-test
+spike/502-new-dashboard-ui-prototype
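The branch names above follow `<type>/<issue-number>-<short-description>`. As a sketch only, a check for that shape could look like the following. It assumes a purely numeric issue number and lowercase hyphenated descriptions, as in the examples; if Jira keys such as `TD-####` end up in branch names the pattern would need adjusting. `parse_branch` is a hypothetical helper.

```python
import re

# Prefixes taken from the examples above.
PREFIXES = ("feature", "fix", "refactor", "hotfix", "spike")

# <prefix>/<issue-number>-<short-description>, description in lowercase words joined by hyphens.
BRANCH = re.compile(
    r"^(?P<kind>" + "|".join(PREFIXES) + r")/(?P<issue>\d+)-(?P<description>[a-z0-9]+(?:-[a-z0-9]+)*)$"
)


def parse_branch(name):
    """Return the branch kind, issue number and description, or None if the name does not fit."""
    match = BRANCH.match(name)
    return match.groupdict() if match else None


for branch in ["feature/101-add-learner-dashboard", "spike/501-pyspark-performance-test", "my-new-branch"]:
    print(branch, "->", parse_branch(branch))
```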
+