- Backend: Build a service to upload a product list through a CSV file, persist the data, and then ingest it into Elasticsearch.
- Frontend: Build a search page with Elasticsearch-style filtering (pre-search and post-search).
This repository implements an e-commerce platform with a focus on efficient data ingestion and search functionality.
Key highlights include:
- Elasticsearch: Powers the search functionality.
- Kibana: Provides visualization on top of Elasticsearch.
- Postgres: Relational database storing products and categories.
- Kafka: Enables event-driven product ingestion.
- Redis: Optimizes image handling.
- Concurrent Processing: Efficiently processes product data from CSV uploads.
- API: Built with Express.js (TypeScript).
- Frontend: Served as a static folder from Express; vanilla JS + Tailwind.
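The stack above could be wired together with a compose file along these lines. This is purely illustrative: service names, image tags, and ports are assumptions, not taken from the repo.

```yaml
# Illustrative sketch only — images, ports, and env vars are assumed.
services:
  postgres:
    image: postgres:16
    environment:
      POSTGRES_DB: products
  redis:
    image: redis:7
  kafka:
    image: bitnami/kafka:latest
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.13.0
    environment:
      - discovery.type=single-node
  kibana:
    image: docker.elastic.co/kibana/kibana:8.13.0
    depends_on:
      - elasticsearch
  api:
    build: .
    ports:
      - "3000:3000"
    depends_on: [postgres, redis, kafka, elasticsearch]
```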
- Upload product data using /upload.html.
- Once the file is uploaded, rows are streamed with a read stream and parsed with fast-csv:

```ts
import { createReadStream } from 'fs';
import { parse } from 'fast-csv';

export const parseAndStreamCsvFromPath = (filePath: string) => {
  return createReadStream(filePath).pipe(parse({ headers: true, delimiter: ',', quote: "'" }));
};
```

- Read the CSV stream and batch rows; once we reach batchSize, the current batch is processed:
```ts
for await (const row of csvStream) {
  rowsBatch.push(row);
  skuStash.push(row['sku']);
  if (rowsBatch.length === batchSize) {
    await processBatch();
    rowsBatch = [];
  }
}
```
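The batching behavior can be illustrated with a pure helper (the name `batchRows` is ours, not from the repo). One thing the loop above highlights: a trailing partial batch smaller than batchSize is only processed if it is flushed after the loop ends.

```typescript
// Groups rows into batches of `batchSize`; the trailing partial batch is
// flushed at the end so the last rows are not silently dropped.
export const batchRows = <T>(rows: Iterable<T>, batchSize: number): T[][] => {
  const batches: T[][] = [];
  let current: T[] = [];
  for (const row of rows) {
    current.push(row);
    if (current.length === batchSize) {
      batches.push(current);
      current = [];
    }
  }
  if (current.length > 0) batches.push(current); // final flush
  return batches;
};
```

For example, `batchRows([1, 2, 3, 4, 5], 2)` yields `[[1, 2], [3, 4], [5]]`.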
- To process a batch, we concurrently download images and wait for all of them to settle.
- We insert all products with a successful image download into the db.
- If a mass insert fails, we fall back to inserting products one by one:
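The one-by-one fallback itself is not shown in the repo; below is a minimal sketch of the shape it might take. The repository call is abstracted as an injected `insertOne` function, and the DTO/response fields are illustrative, so treat all signatures here as assumptions.

```typescript
interface IProductDTO {
  sku: string;
  name: string;
}

interface IBatchProductInsertResponse {
  success: number;
  errors: { sku: string; error: string }[];
}

// Sketch of the one-by-one fallback: each row is inserted in isolation so a
// single bad row is recorded as an error instead of failing the whole batch.
// `insertOne` stands in for the repository call (hypothetical signature).
export const insertOneByOne = async (
  products: IProductDTO[],
  insertOne: (product: IProductDTO) => Promise<void>,
): Promise<IBatchProductInsertResponse> => {
  const response: IBatchProductInsertResponse = { success: 0, errors: [] };
  for (const product of products) {
    try {
      await insertOne(product);
      response.success += 1;
    } catch (error) {
      response.errors.push({ sku: product.sku, error: (error as Error).message });
    }
  }
  return response;
};
```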
```ts
export const batchInsertProducts = async (products: IProductDTO[]): Promise<IBatchProductInsertResponse> => {
  const response: IBatchProductInsertResponse = {
    success: 0,
    errors: [],
  };
  try {
    // upsert keyed on sku so re-uploaded rows update existing products
    await productRepo.upsert(products, {
      conflictPaths: ['sku'],
    });
    response.success = products.length;
    return response;
  } catch (error) {
    // if the bulk upsert fails, retry rows individually to isolate bad rows
    return await insertOneByOne(products);
  }
};
```

## Key Considerations
- Image Caching:
  - Images are cached locally to minimize redundant downloads. Cached images are reused if the same URL appears again within 2 minutes.
- Error Handling:
  - Failed image downloads are retried individually to isolate failures.
- Scalability:
  - Kafka ensures asynchronous and decoupled processing.
  - Elasticsearch provides scalable search capabilities for large datasets.
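The 2-minute reuse window described under Image Caching can be sketched as a TTL lookup keyed by image URL. An in-memory Map stands in for Redis here, and the class and method names are ours, so treat this as an illustration of the policy rather than the repo's implementation. The clock is injectable to keep the window testable.

```typescript
// Sketch of the 2-minute image-cache policy: a URL seen again within the TTL
// reuses the cached local path instead of re-downloading. An in-memory Map
// stands in for Redis; `now` is injectable for testing.
const TTL_MS = 2 * 60 * 1000;

export class ImagePathCache {
  private entries = new Map<string, { path: string; storedAt: number }>();

  constructor(private now: () => number = Date.now) {}

  get(url: string): string | undefined {
    const entry = this.entries.get(url);
    if (!entry) return undefined;
    if (this.now() - entry.storedAt > TTL_MS) {
      this.entries.delete(url); // expired: force a fresh download
      return undefined;
    }
    return entry.path;
  }

  set(url: string, path: string): void {
    this.entries.set(url, { path, storedAt: this.now() });
  }
}
```

With Redis, the same policy would typically be a `SET` with a 120-second expiry rather than an explicit timestamp check.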
