GitHub - imranece59/poc

Problem Statement:

Input Data: The given JSON file contains 20K articles (one object per line). Each article contains a set of attributes, each with a single value.

Recommendation Engine Requirements: Calculate the similarity of articles, identified by sku, based on their attribute values. The number of matching attributes is the most important metric for defining similarity. In case of a draw, attributes whose names come earlier in the alphabet (a is higher than z) carry a heavier weight.

Example 1:
{"sku":"sku-1","attributes": {"att-a": "a1", "att-b": "b1", "att-c": "c1"}} is more similar to
{"sku":"sku-2","attributes": {"att-a": "a2", "att-b": "b1", "att-c": "c1"}} than to
{"sku":"sku-3","attributes": {"att-a": "a1", "att-b": "b3", "att-c": "c3"}}
Here sku-2 shares two attribute values with sku-1 (att-b and att-c), whereas sku-3 shares only one (att-a).

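Example 2:
{"sku":"sku-1","attributes":{"att-a": "a1", "att-b": "b1"}} is more similar to
{"sku":"sku-2","attributes":{"att-a": "a1", "att-b": "b2"}} than to
{"sku":"sku-3","attributes":{"att-a": "a2", "att-b": "b1"}}
Both candidates share exactly one attribute value with sku-1, so the draw is decided by the attribute name: sku-2 matches on att-a, which outranks att-b.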
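The two examples can be checked with a small plain-Scala sketch (no Spark needed). It assumes that a draw is resolved by comparing the alphabetically sorted names of the matching attributes position by position; the problem statement does not spell out the exact weighting, so this is one possible reading. The names SimilaritySketch, matchedNames and moreSimilar are hypothetical and do not come from the repository.

```scala
object SimilaritySketch {
  type Attributes = Map[String, String]

  // Names of the attributes that have the same value in both articles,
  // sorted alphabetically.
  def matchedNames(a: Attributes, b: Attributes): List[String] =
    a.keys.filter(k => b.get(k).contains(a(k))).toList.sorted

  // True when `x` is more similar to `ref` than `y` is.
  def moreSimilar(ref: Attributes, x: Attributes, y: Attributes): Boolean = {
    val mx = matchedNames(ref, x)
    val my = matchedNames(ref, y)
    if (mx.size != my.size) mx.size > my.size // more matches always wins
    else
      // Draw (assumed rule): at the first position where the sorted name
      // lists differ, the alphabetically earlier name wins.
      mx.zip(my)
        .find { case (p, q) => p != q }
        .exists { case (p, q) => p.compareTo(q) < 0 }
  }
}
```

With the articles from Example 1 and Example 2 as inputs, moreSimilar(sku1, sku2, sku3) evaluates to true in both cases.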

Recommendation request example: sku-123 > ENGINE > the 10 most similar SKUs, based on the criteria described above, with their corresponding weights.

Implementation requirements: Please use Spark with Scala for your solution. You can use any of Spark's APIs and modules, such as RDDs, DataFrames, SQL, etc. You can use sbt to manage the Spark libraries. The Spark application/session should be set up to run on a local machine.

Design Approach:

1. Load the home24-test-data-for-spark.json file into a DataFrame.
2. Flatten the rows into the format below:
+-------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+
|skuName|a       |b       |c       |d       |e       |f       |g       |h       |i       |j       |
+-------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+
|sku-1  |att-a-7 |att-b-3 |att-c-10|att-d-10|att-e-15|att-f-11|att-g-2 |att-h-7 |att-i-5 |att-j-14|
|sku-2  |att-a-9 |att-b-7 |att-c-12|att-d-4 |att-e-10|att-f-4 |att-g-13|att-h-4 |att-i-1 |att-j-13|
|sku-3  |att-a-10|att-b-6 |att-c-1 |att-d-1 |att-e-13|att-f-12|att-g-9 |att-h-6 |att-i-7 |att-j-4 |
|sku-4  |att-a-9 |att-b-14|att-c-7 |att-d-4 |att-e-8 |att-f-7 |att-g-14|att-h-9 |att-i-13|att-j-3 |
|sku-5  |att-a-8 |att-b-7 |att-c-10|att-d-4 |att-e-11|att-f-4 |att-g-8 |att-h-8 |att-i-7 |att-j-8 |
|sku-6  |att-a-6 |att-b-2 |att-c-13|att-d-6 |att-e-2 |att-f-11|att-g-2 |att-h-11|att-i-1 |att-j-9 |
3. Compare each column of the requested SKU's row with the same column of every other row and assign a value per column:

if col(a) matches then 9 otherwise -1
if col(b) matches then 8 otherwise -1
if col(c) matches then 7 otherwise -1
if col(d) matches then 6 otherwise -1
if col(e) matches then 5 otherwise -1
if col(f) matches then 4 otherwise -1
if col(g) matches then 3 otherwise -1
if col(h) matches then 2 otherwise -1
if col(i) matches then 1 otherwise -1
if col(j) matches then 0 otherwise -1

4. Concatenate the calculated values of all columns into a single value, with the -1 entries (non-matches) trimmed out.
5. Rank the rows by the concatenated value from step 4.
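The weighting and ranking steps above can be illustrated without Spark. The following is a minimal sketch, not the repository's implementation: it assumes that the concatenated value is a string of the weights of the matching columns, and that ranking puts longer strings (more matches) first and breaks draws by the larger string. The names WeightRankingSketch, Row, matchKey and topSimilar are hypothetical.

```scala
object WeightRankingSketch {
  // One flattened row: SKU name plus the values of columns a..j, in that order.
  final case class Row(sku: String, values: Vector[String])

  // Concatenates the weights of the matching columns: the first column gets
  // weight 9, the tenth gets weight 0. Non-matching columns (the -1 case)
  // contribute nothing.
  def matchKey(target: Row, other: Row): String =
    target.values.zip(other.values).zipWithIndex.collect {
      case ((t, o), i) if t == o => (9 - i).toString
    }.mkString

  // The `num` rows most similar to `target`, each paired with its key.
  // Assumed ranking: more matches first, then the lexicographically larger key.
  def topSimilar(target: Row, rows: Seq[Row], num: Int): Seq[(String, String)] =
    rows
      .filter(_.sku != target.sku)
      .map(r => r.sku -> matchKey(target, r))
      .sortWith { case ((_, k1), (_, k2)) =>
        if (k1.length != k2.length) k1.length > k2.length
        else k1.compareTo(k2) > 0
      }
      .take(num)
}
```

Applied to the sample rows in the table with sku-1 as the target, sku-6 matches columns f and g (key "43") and sku-5 matches only column c (key "7"). Ordering by key length first keeps sku-6 ahead of sku-5, which reflects the requirement that the number of matching attributes outranks the weight of any single attribute.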

Running

sbt "runMain demo.common.SkuRecommendation --sku=sku-12312 --num=10"
--sku - the SKU for which similar matches are calculated
--num - the number of similar matches to return
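How the application reads these flags is not shown here. As a rough illustration only, arguments of the form --key=value could be parsed in plain Scala as follows; ArgsSketch and parse are hypothetical names, and the actual parsing in demo.common.SkuRecommendation may differ.

```scala
object ArgsSketch {
  // Turns arguments such as "--sku=sku-12312" into key/value pairs.
  // Arguments that do not have the form --key=value are ignored.
  def parse(args: Array[String]): Map[String, String] =
    args.iterator
      .filter(_.startsWith("--"))
      .map(_.drop(2).split("=", 2))
      .collect { case Array(key, value) => key -> value }
      .toMap
}
```

For the command above, ArgsSketch.parse(Array("--sku=sku-12312", "--num=10")) yields a map with the keys sku and num; the num value still has to be converted to an integer by the caller.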

Sample Output

Most Similar SKU's
sku-14894
sku-10634
sku-5240
sku-11148
sku-19493
sku-1328
sku-15418
sku-4694
sku-19692
sku-1265

Eclipse Build

Clone the repository with git
Run sbt eclipse from the project folder
Import the project into Scala IDE or IntelliJ

Notes: This is a pure Spark/Scala program that runs in local mode.

The input file is placed at ./src/main/resources/home24-test-data-for-spark.json. Its path does not need to be specified anywhere.
