L00172671 - Oisin Gibson
Clone the repo, then run:
npm run setup

This downloads and configures everything automatically:
- Apache Tika (PDF text extraction)
- Java Runtime (required by Tika)
- Tesseract OCR (for scanned PDFs — optional, app works without it)
- Python virtual environment + all NLP packages
- Node.js dependencies for all packages
npm start

This starts all four services together:
| Service | URL |
|---|---|
| React client | http://localhost:3000 |
| API server | http://localhost:8080 |
| NLP microservice | http://localhost:8000 |
| Tika (PDF extraction) | http://localhost:9998 |
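
To confirm that all four services came up, each URL in the table can be probed in turn. The following is a minimal Python sketch and is not part of the repo: `check_services`, `http_probe`, and the `probe` parameter are hypothetical names, and it assumes that any HTTP response (even an error status) means the service is listening.

```python
import urllib.error
import urllib.request

# URLs taken from the service table above.
SERVICES = {
    "React client": "http://localhost:3000",
    "API server": "http://localhost:8080",
    "NLP microservice": "http://localhost:8000",
    "Tika (PDF extraction)": "http://localhost:9998",
}


def http_probe(url, timeout=2.0):
    """Return True if anything answers HTTP at `url` (assumption: any status counts)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server responded, just not with a 2xx status.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, etc.
        return False


def check_services(services=SERVICES, probe=http_probe):
    """Map each service name to True/False depending on whether it responded."""
    return {name: probe(url) for name, url in services.items()}


if __name__ == "__main__":
    for name, up in check_services().items():
        print(f"{'UP  ' if up else 'DOWN'}  {name}")
```

The `probe` argument is injectable so the check can be exercised without a running stack, for example `check_services(probe=lambda url: False)`.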
These default accounts are created automatically on first run:
| Email | Password | Role |
|---|---|---|
| admin@achilles.com | Admin@123 | Admin |
| demo@achilles.com | Demo@123 | User |
Financial document management and NLP analysis system with:
- React client for upload, document browsing, and NLP UI
- Node/Express API for auth, document storage, and processing orchestration
- Python NLP microservice for extraction and analysis
- Local runtime dependencies for Java/Tika/Tesseract OCR
This section focuses on maintained source/config files. Large generated/runtime/vendor folders are summarized at the end for readability.
- package.json: Workspace-level scripts/dependencies.
- package-lock.json: Workspace dependency lock file.
- README.md: Project documentation.
- scripts/checkSetup.js: Environment/setup validation helper.
- scripts/setup.js: Local setup bootstrap script.
- scripts/startTika.js: Starts Apache Tika runtime.
- nlp_service/main.py: Python NLP microservice entrypoint.
- nlp_service/requirements.txt: Python package requirements.
- server/.env.example: Example environment variables.
- server/package.json: Server scripts/dependencies.
- server/package-lock.json: Server dependency lock file.
- server/server.js: Express server bootstrap and route mounting.
- server/createAdmin.js: Creates default admin user.
- server/pdf-diagnostic.js: PDF diagnostics utility.
- server/tika-config.xml: Tika OCR configuration.
- server/contracts/nlpResults.json: NLP result contract/schema.
- server/models/User.js: User model, auth helpers, password hashing.
- server/models/Document.js: Document model and NLP-related fields.
- server/middleware/auth.js: JWT authentication middleware.
- server/routes/auth.js: Authentication endpoints.
- server/routes/users.js: Admin/user management endpoints.
- server/routes/documents.js: Document route aggregator.
- server/routes/documents/documentCrudRoutes.js: Document CRUD endpoints.
- server/routes/documents/uploadRoutes.js: Upload endpoints.
- server/routes/documents/nlpRoutes.js: NLP endpoints.
- server/routes/documents/nlpProcessing.js: NLP processing flow helpers.
- server/routes/documents/helpers.js: Shared document-route utilities.
- server/services/nlpProcessor.js: Core NLP extraction/processing logic.
- server/services/nlpMicroservice.js: Integration with Python NLP microservice.
- server/tests/auth.test.js: Authentication tests.
- server/tests/users.test.js: User/admin route tests.
- server/tests/documents.test.js: Document route tests.
- server/tests/auditFlags.test.js: Audit flag behavior tests.
- server/tests/nerAccuracy.test.js: NER quality/accuracy tests.
- client/package.json: Client scripts/dependencies.
- client/package-lock.json: Client dependency lock file.
- client/public/index.html: HTML entry page.
- client/public/manifest.json: PWA metadata.
- client/public/images/logo.png: App logo asset.
- client/public/images/Outlook-ixaxuupp.jpg: UI image asset.
- client/src/index.js: React app bootstrap.
- client/src/App.js: Route setup and top-level app layout.
- client/src/config.js: API URL config.
- client/src/index.css: Global styling.
- client/src/App.css: App-level styling.
- client/src/reportWebVitals.js: Web vitals helper.
- client/src/setupTests.js: Test setup.
- client/src/components/Header.js: Main navigation/header UI.
- client/src/components/Footer.js: Footer UI.
- client/src/components/Dashboard.js: Main dashboard page.
- client/src/components/Login.js: Login form/page.
- client/src/components/Register.js: Registration form/page.
- client/src/components/AdminPanel.js: Admin user management page.
- client/src/components/UploadDocument.js: Upload page.
- client/src/components/AlertMessage.js: Shared alert/message component.
- client/src/components/SelectedFileCard.js: Selected upload file summary card.
- client/src/components/UploadGuidelines.js: Upload help text.
- client/src/components/ProcessingTimes.js: Processing time analytics view.
- client/src/components/ProcessingTimesTable.js: Processing times table.
- client/src/components/SummaryCards.js: Processing metric summary cards.
- client/src/components/ProcessingTimes.styles.js: Processing view styles.
- client/src/components/ProcessingTimes.utils.js: Processing helper functions.
- client/src/components/About.js: About page.
- client/src/components/About.css: About page styles.
- client/src/components/PrivacyPolicy.js: Privacy policy page.
- client/src/components/TermsOfService.js: Terms page.
- client/src/components/Logo.js: Logo component.
- client/src/components/adminPanel/AdminPanelHeader.js: Admin panel header/tab navigation.
- client/src/components/adminPanel/UserManagementTab.js: Users tab content wrapper.
- client/src/components/adminPanel/UserStatisticsCards.js: User metric cards.
- client/src/components/adminPanel/UserCard.js: Single user card UI.
- client/src/components/documents/Documents.js: Documents page and grouping logic.
- client/src/components/documents/Documents.css: Documents page styles.
- client/src/components/documents/DocumentCard.js: Main document card wrapper.
- client/src/components/documents/DocumentStatistics.js: Documents statistics section.
- client/src/components/documents/EmptyDocuments.js: Empty-state documents UI.
- client/src/components/documents/FileDropZone.js: Drag/drop file upload area.
- client/src/components/documents/documentCard/DocumentCardHeader.js: Document card header section.
- client/src/components/documents/documentCard/DocumentCardMeta.js: Document metadata section.
- client/src/components/documents/documentCard/DocumentCardRagBadge.js: RAG badge section.
- client/src/components/documents/documentCard/DocumentCardActions.js: Document action buttons.
- client/src/components/login/LoginHeader.js: Login page header block.
- client/src/components/login/LoginSubmitButton.js: Login submit button/loading state.
- client/src/components/nlp/NLPAnalysis.js: NLP modal data-loading container.
- client/src/components/nlp/NLPAnalysisView.js: NLP analysis page-level renderer.
- client/src/components/nlp/NLPAnalysis.styles.js: NLP UI style definitions.
- client/src/components/nlp/NLPAnalysis.utils.js: NLP helper functions.
- client/src/components/nlp/NLPAnalysisContentSections.js: NLP content section composer.
- client/src/components/nlp/NLPAnalysisOverviewSection.js: NLP overview/statistics section.
- client/src/components/nlp/NLPAnalysisEntitiesSection.js: Named entity display section.
- client/src/components/nlp/NLPAnalysisDocumentSections.js: Document text/frequency section.
- client/src/components/nlp/NLPAnalysisAuditPanel.js: Audit flags container panel.
- client/src/components/nlp/NLPAnalysisAuditRagCard.js: RAG status card.
- client/src/components/nlp/NLPAnalysisAuditFlagList.js: Audit flag list/evidence renderer.
- client/src/hooks/useAlert.js: Alert-state custom hook.
- client/src/hooks/useDocuments.js: Document fetch/delete hook.
- client/src/hooks/useFileUpload.js: File upload flow hook.
- client/src/utils/alertUtils.js: Alert formatting/helpers.
- client/src/utils/documentUtils.js: Document format/download/group helpers.
- client/src/utils/fileUtils.js: File validation/upload helpers.
These generated, runtime, and vendor folders exist in the repo but are intentionally not expanded file by file here to keep this README readable:
- client/build: Built frontend artifacts and source maps.
- server/uploads: Uploaded document files.
- server/lib: JAR dependencies for OCR/image support.
- runtimes: Bundled Java/Tika/Tesseract runtime binaries and docs.
- nlp_service/venv: Python virtual environment and installed packages.
- server/database.sqlite: Local SQLite database file.
Reference Material
- Tokenization Concepts
  - Manning, C. D., & Schütze, H. (1999). Foundations of Statistical Natural Language Processing. MIT Press.
- Stopword Removal
  - Common English stopword list based on NLTK (Natural Language Toolkit)
  - Bird, S., Klein, E., & Loper, E. (2009). Natural Language Processing with Python. O'Reilly Media.
- pdf-parse Library
  - GitHub: https://github.com/modesty/pdf-parse
  - Uses Mozilla's PDF.js for parsing
- Apache Tika
- Tesseract OCR (Windows builds)
- Tesseract OCR (Official)
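
The tokenization and stopword removal entries above can be illustrated with a short Python sketch. It is not the project's actual NLP code: the regex tokenizer is an assumption, and `STOPWORDS` is a tiny hand-picked set for illustration, whereas NLTK's English list is considerably longer.

```python
import re
from collections import Counter

# Tiny illustrative stopword set (assumption): NLTK's English list is much longer.
STOPWORDS = {"a", "an", "and", "are", "for", "in", "is", "of", "the", "to", "was"}


def tokenize(text):
    """Lowercase the text and split it into word tokens (letters, digits, apostrophes)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def remove_stopwords(tokens, stopwords=STOPWORDS):
    """Drop tokens that appear in the stopword set."""
    return [t for t in tokens if t not in stopwords]


def term_frequencies(text):
    """Count the remaining terms after tokenization and stopword removal."""
    return Counter(remove_stopwords(tokenize(text)))


if __name__ == "__main__":
    sample = "The invoice total is due to the supplier in 30 days."
    print(remove_stopwords(tokenize(sample)))
    # ['invoice', 'total', 'due', 'supplier', '30', 'days']
```

Note that this simple tokenizer splits a figure such as `1,200` into two tokens, which is one reason financial text usually needs a more careful tokenizer.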
Some scanned PDFs use JPEG2000 (JP2) images. To OCR these, add the JAI Image I/O JARs:
- Download jai-imageio-core-*.jar and jai-imageio-jpeg2000-*.jar
- Place both files in server/lib
- Restart Tika (npm run tika)
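
Before restarting Tika, it can help to verify that both JARs actually landed in server/lib. The sketch below is a hypothetical helper (`missing_jai_jars` is not a script in this repo) and assumes it is run from the repository root; it only checks filenames against the two patterns named in the steps above.

```python
from pathlib import Path

# Filename patterns from the steps above; the exact version suffix is left open.
REQUIRED_JAR_PATTERNS = ("jai-imageio-core-*.jar", "jai-imageio-jpeg2000-*.jar")


def missing_jai_jars(lib_dir="server/lib"):
    """Return the patterns that have no matching JAR in `lib_dir`."""
    lib = Path(lib_dir)
    return [p for p in REQUIRED_JAR_PATTERNS if not any(lib.glob(p))]


if __name__ == "__main__":
    missing = missing_jai_jars()
    if missing:
        print("Missing JARs:", ", ".join(missing))
    else:
        print("Both JAI Image I/O JARs found; restart Tika to pick them up.")
```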
- React (W3Schools): https://www.w3schools.com/react/
- SQL (W3Schools): https://www.w3schools.com/sql/
- Node.js (GeeksforGeeks): https://www.geeksforgeeks.org/nodejs/
- Express.js (GeeksforGeeks): https://www.geeksforgeeks.org/express-js/
- JWT (GeeksforGeeks): https://www.geeksforgeeks.org/json-web-token-jwt/
- bcrypt (GeeksforGeeks): https://www.geeksforgeeks.org/bcrypt-hashing-in-nodejs/
- React Documentation
  - Official Docs: https://react.dev/
  - React Hooks: https://react.dev/reference/react
- Express.js - Web framework for Node.js
  - Official Guide: https://expressjs.com/
- Sequelize ORM
  - Documentation: https://sequelize.org/docs/v6/
- Bootstrap 5
  - Documentation: https://getbootstrap.com/docs/5.0/
  - Icons: https://icons.getbootstrap.com/
- Component-Based Architecture
  - Fowler, M. (2003). "Patterns of Enterprise Application Architecture"
- JSON Web Tokens (JWT)
  - jwt.io: https://jwt.io/introduction
- bcrypt - Password hashing
- Multer - Node.js middleware for multipart/form-data
  - Documentation: https://github.com/expressjs/multer
- node-cron - Scheduled jobs in Node.js
  - Documentation: https://www.npmjs.com/package/node-cron
- OWASP File Upload Cheat Sheet
- file-type - Detect file signature (magic bytes)
  - Documentation: https://www.npmjs.com/package/file-type
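
The magic-byte check referenced in the last entry works by comparing a file's leading bytes with known signatures rather than trusting its extension. The Python sketch below illustrates the technique only: it does not use or mirror the file-type package's API, and the `SIGNATURES` table, `sniff_mime`, and `is_real_pdf` are hypothetical names with a deliberately short list of formats.

```python
# Leading byte signatures ("magic bytes") for a few common formats.
# Assumption: this short table is illustrative, not the project's allow-list.
SIGNATURES = {
    "application/pdf": b"%PDF-",
    "image/png": b"\x89PNG\r\n\x1a\n",
    "image/jpeg": b"\xff\xd8\xff",
}


def sniff_mime(header):
    """Return the MIME type whose signature prefixes `header`, or None."""
    for mime, magic in SIGNATURES.items():
        if header.startswith(magic):
            return mime
    return None


def is_real_pdf(path):
    """Check the file's first bytes instead of trusting its extension."""
    with open(path, "rb") as fh:
        return sniff_mime(fh.read(16)) == "application/pdf"
```

For example, an executable renamed to `report.pdf` starts with `MZ` rather than `%PDF-`, so `sniff_mime` returns `None` and the upload can be rejected.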