diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
index 26360b5..24d6934 100644
--- a/.github/workflows/ci.yml
+++ b/.github/workflows/ci.yml
@@ -11,17 +11,18 @@ jobs:
steps:
- name: Checkout code
- uses: actions/checkout@v3
+ uses: actions/checkout@v4
- name: Set up Python 3.10
uses: actions/setup-python@v4
with:
python-version: '3.10'
+ - name: Install uv
+ run: pipx install uv
+
- name: Install dependencies
- run: |
- pip install --upgrade pip
- pip install -e .[dev]
+ run: uv pip install -e .[dev] --system
- name: Run tests
run: pytest
\ No newline at end of file
diff --git a/.github/workflows/release.yml b/.github/workflows/release.yml
index 82f2165..c94a6a9 100644
--- a/.github/workflows/release.yml
+++ b/.github/workflows/release.yml
@@ -1,4 +1,4 @@
-name: Publish to TestPyPI
+name: Publish to PyPI
on:
push:
diff --git a/CHANGELOG.md b/CHANGELOG.md
index 3c18169..5cd5120 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,10 +1,29 @@
# Changelog
All notable changes to this project will be documented in this file.
-
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
+## [1.1.0] - 2025-09-15
+
+### Added
+- A comprehensive developer API for using `Scriber` as a library.
+- The `Scriber` class can now be initialized with a list of paths to scan multiple directories at once.
+- `Scriber` can now be initialized with a configuration dictionary directly.
+- New method `get_output_as_string()` to get the project map without writing to a file.
+- New getter methods `get_tree()` and `get_mapped_files()` to access processed data.
+- Expanded `README.md` with a detailed "Library Usage" section and API examples.
+- Created two installation options: a minimal default (`project-scriber`) and an enhanced version with rich terminal output (`project-scriber[rich]`).
+- The `Scriber` class is now exposed for direct import and programmatic use (`from scriber import Scriber`).
+- A `hidden` configuration option to prevent a file's content from being written to the output, while still including it in the file tree. This is useful for large files like `poetry.lock`.
+- A prompt for `hidden` patterns in the interactive `scriber init` command.
+
+### Changed
+- The default installation no longer includes `rich` as a dependency, making it more lightweight. The CLI now falls back to simple text-based output if `rich` is not installed.
+- Improved performance of file analysis by using multi-threading to process files concurrently.
+
## [1.0.1] - 2025-08-30
### Added
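The `hidden` option introduced in 1.1.0 keeps a file visible in the tree while suppressing its content. As an illustrative sketch only (ProjectScriber's real matching lives in `core.py`; `HIDDEN_PATTERNS` and `render_file` are hypothetical names, not the package's API), the behaviour can be modelled with the stdlib `fnmatch` module:

```python
from fnmatch import fnmatch

# Hypothetical sketch of the `hidden` behaviour described in the changelog:
# matched files stay in the output, but their content is replaced with a
# placeholder instead of being written out verbatim.
HIDDEN_PATTERNS = ["poetry.lock", "package-lock.json"]


def render_file(name: str, content: str, hidden=HIDDEN_PATTERNS) -> str:
    """Keep the file in the output, but hide content for matched patterns."""
    if any(fnmatch(name, pattern) for pattern in hidden):
        return f"File: {name}\n[content hidden]"
    return f"File: {name}\n{content}"


print(render_file("poetry.lock", "<thousands of lock entries>"))
print(render_file("main.py", "print('hello')"))
```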
diff --git a/README.md b/README.md
index 21bf7de..e6dc3ba 100644
--- a/README.md
+++ b/README.md
@@ -4,73 +4,166 @@
+ Your Codebase → ProjectScriber → LLM-Ready Context
+
-When working with LLMs, providing the full context of a codebase is crucial for getting accurate analysis,
-documentation, or refactoring suggestions. Manually copying and pasting files is tedious and error-prone. *
-*ProjectScriber** automates this process. It scans your project, respects `.gitignore` rules, applies custom filters,
-and bundles all relevant code into a clean, readable format perfect for any AI model.
+-----
+
-## Key Features
+## ✨ Key Features
 
-* **🌳 Smart Project Mapping:** Generates a clear and intuitive tree view of your project's structure.
-* **⚙️ Intelligent Filtering:** Automatically respects `.gitignore` rules and supports custom `include` and `exclude`
-  patterns for fine-grained control.
-* **📊 In-depth Code Analysis:** Provides a summary with total file size, estimated token count (using `cl100k_base`),
-  and a language breakdown.
-* **✨ Interactive Setup:** A simple `scriber init` command walks you through creating a configuration file tailored to
-  your project.
-* **📋 Clipboard Integration:** Use the `--copy` flag to automatically copy the entire output to your clipboard.
-* **🔧 Flexible Configuration:** Manage settings in a `pyproject.toml` or a project-specific `.scriber.json` file.
+| Feature                        | Description                                                                                                                                            |
+|:-------------------------------|:-------------------------------------------------------------------------------------------------------------------------------------------------------|
+| **🌳 Smart Project Mapping**   | Generates a clear and intuitive tree view of your project's structure.                                                                                 |
+| **⚙️ Intelligent Filtering**    | Automatically respects `.gitignore` and supports custom `include`, `exclude`, and `hidden` patterns. You can even define language-specific exclusions! |
+| **📊 In-depth Code Analysis**  | Provides a summary with total file size, estimated token count (using `cl100k_base`), and a language breakdown.                                        |
+| **🐍 Flexible Python Library** | Import and use the `Scriber` class directly in your Python projects for full programmatic control.                                                     |
+| **✨ Interactive CLI**          | A simple `scriber init` command walks you through creating a configuration file for your project.                                                      |
+| **📋 Clipboard Integration**   | Use the `--copy` flag to automatically send the entire output to your clipboard, ready for pasting.                                                    |
+| **💨 Lightweight & Fast**      | The default installation is minimal, and file analysis is multi-threaded for improved performance.                                                     |
 
----
+-----
 
-## Getting Started
+## 🚀 Quick Start
 
 Install the package directly from the [Python Package Index (PyPI)](https://pypi.org/project/project-scriber/).
 
-```shell
-pip install project-scriber
-````
+1. **Install Scriber:**
+
+   ```shell
+   pip install project-scriber
+   ```
+
+2. **Navigate to your project's root and run:**
+
+   ```shell
+   scriber
+   ```
+
+3. **That's it\!** A `scriber_output.txt` file is now in your directory.
+   It will look something like this:
+
+   ````text
+   ===
+   Mapped Folder Structure
+   ===
+
+   ProjectScriber
+   ├── .github
+   │   └── workflows
+   │       ├── ci.yml
+   │       └── release.yml
+   ├── README.md
+   └── src
+       └── scriber
+           ├── __init__.py
+           └── core.py
+
+   ---
+   File: .github/workflows/ci.yml
+   Size: 512 bytes
+   ---
+   ```yaml
+   name: Continuous Integration
+
+   on:
+     push:
+       branches:
+         - develop
+
+   jobs:
+     run_tests:
+       ...
+   ````
 
----
+-----
 
-## Usage
+## 💾 Installation
 
-#### 1\. Basic Scan
+You have two options for installation.
 
-Run `scriber` in your project's root directory. It will generate a `scriber_output.txt` file.
+#### Standard Installation
+
+This provides the core functionality with a minimal, text-based interface.
 
 ```shell
-scriber
+pip install project-scriber
 ```
 
-To target a different directory:
+#### With Rich UI ✨
+
+For an enhanced terminal experience with colors, tables, and progress bars, install the `rich` extra:
 
 ```shell
-scriber /path/to/your/project
+pip install project-scriber[rich]
 ```
 
-#### 2\. First-Time Configuration
+-----
 
-For a new project, run the interactive `init` command to create a `.scriber.json` configuration file.
+## 🖥️ Command-Line Usage
 
-```shell
-scriber init
-```
+### Basic Commands
+
+  - **Scan the current directory**:
+    ```shell
+    scriber
+    ```
+  - **Scan a different directory**:
+    ```shell
+    scriber /path/to/your/project
+    ```
+  - **Interactive Setup**: Create a configuration file (`.scriber.json` or `pyproject.toml`) for your project.
+    ```shell
+    scriber init
+    ```
+
+### CLI Options
+
+| Option            | Alias | Description                                                            |
+|:------------------|:------|:-----------------------------------------------------------------------|
+| `root_path`       |       | The project directory to map. Defaults to the current directory.       |
+| `--output [file]` | `-o`  | Set a custom name for the output file.                                 |
+| `--config [path]` |       | Path to a custom config file (e.g., a `pyproject.toml` in a monorepo). |
+| `--copy`          | `-c`  | Copy the final output to the clipboard.                                |
+| `--tree-only`     |       | Generate only the file tree structure, without any file content.       |
+| `--version`       | `-v`  | Show the installed version of ProjectScriber.                          |
+| `--help`          | `-h`  | Display the help message.                                              |
 
-#### 3\. Advanced Example
+### Advanced Example
 
-Scan another project, specify a custom output file, and copy the result to the clipboard in one command.
+Scan another project, save the output to `custom_map.txt`, and copy the result to the clipboard in one go:
 
 ```shell
 scriber ../my-other-project --output custom_map.txt --copy
@@ -78,95 +171,269 @@ scriber ../my-other-project --output custom_map.txt --copy
 
------
+-----
 
-## Commands and Options
+## 📚 Library Usage (API)
 
-| Command/Option        | Alias | Description                                                                  |
-|:----------------------|:-----:|:-----------------------------------------------------------------------------|
-| `scriber [path]`      |       | Targets a specific directory. Defaults to the current working directory.     |
-| `init`                |       | Starts the interactive process to create a configuration file.               |
-| `--help`              | `-h`  | Displays the help message.                                                   |
-| `--version`           | `-v`  | Displays the current version of ProjectScriber.                              |
-| `--output [filename]` | `-o`  | Specifies a custom name for the output file.                                 |
-| `--copy`              | `-c`  | Copies the final output directly to the clipboard.                           |
-| `--tree-only`         |       | Generates only the folder structure map, excluding all file contents.        |
-| `--config [path]`     |       | Specifies a path to a custom `.json` or `pyproject.toml` configuration file. |
+Use `ProjectScriber` directly in your Python code for maximum flexibility and automation.
 
------
+### Basic Example: Get Context as a String
 
-## Configuration
+Initialize `Scriber`, and it will automatically handle mapping and analysis.
 
-ProjectScriber uses the following order of precedence for loading configurations:
+```python
+from pathlib import Path
+from scriber import Scriber  # The class is exposed for direct import
 
-1. **`--config [path]` flag**: Highest priority. If you provide a path to a `.json` or `pyproject.toml` file, its
-   settings will be used.
-2. **`.scriber.json`**: If no `--config` flag is used, Scriber looks for a `.scriber.json` file in the project's root.
-   This file's settings will override any found in `pyproject.toml`.
-3. **`pyproject.toml`**: If neither of the above is found, it looks for a `[tool.scriber]` section in a `pyproject.toml`
-   file in the project's root.
-4. **Default Config**: If no configuration is found, `scriber` will create a default `.scriber.json` on its first run in
-   a directory.
+# 1. Initialize Scriber for the current directory
+scriber = Scriber(root_path=Path('.'))
 
-**Example `.scriber.json`:**
+# 2. Get the complete output directly as a string
+project_context = scriber.get_output_as_string()
 
-```json
-{
-  "use_gitignore": true,
-  "exclude": [
-    "__pycache__",
-    "node_modules",
-    "*.log"
-  ],
-  "include": [
-    "*.py",
-    "*.js"
-  ]
+# 3. Use the context for your application
+print(f"Generated context of {len(project_context)} characters.")
+
+# 4. Access the calculated statistics
+stats = scriber.get_stats()
+print(f"Total files mapped: {stats['total_files']}")
+print(f"Estimated tokens: {stats['total_tokens']:,}")
+```
+
+### Advanced Configuration via Dictionary
+
+Bypass all on-disk configuration files by passing a dictionary directly to the constructor. This is perfect for dynamic
+or controlled environments.
+
+```python
+from pathlib import Path
+from scriber import Scriber
+
+my_config = {
+    "use_gitignore": True,
+    "exclude": ["node_modules/", "dist/"],
+    "include": ["*.py", "*.js", "Dockerfile"],
+    "hidden": ["poetry.lock", "package-lock.json"],
+    "exclude_map": {
+        "global": ["*.log", "temp.*"],
+        "python": ["*_test.py", "conftest.py"],
+        "javascript": ["*.min.js"]
+    }
 }
+
+scriber = Scriber(root_path=Path('/path/to/your/project'), config=my_config)
+project_context = scriber.get_output_as_string()
+print(project_context)
+```
+
+### Scanning Multiple Directories
+
+You can pass a list of paths to the `Scriber` constructor to map multiple directories into a single output. The first
+path in the list is treated as the "primary root" for loading configurations (`.gitignore`, `pyproject.toml`, etc.).
+
+```python
+from pathlib import Path
+from scriber import Scriber
+
+# Example: Scan both a 'backend' and a 'frontend' directory
+backend_path = Path('./my_backend_project')
+frontend_path = Path('./my_frontend_project')
+
+# Create dummy directories and files for the example
+backend_path.mkdir(exist_ok=True)
+(backend_path / "main.py").write_text("print('hello from backend')")
+frontend_path.mkdir(exist_ok=True)
+(frontend_path / "app.js").write_text("console.log('hello from frontend')")
+
+# Initialize with a list of paths. `backend_path` is the primary root.
+scriber = Scriber(root_path=[backend_path, frontend_path])
+
+# Get the combined context as a single string
+combined_context = scriber.get_output_as_string()
+print(combined_context)
+
+# The output will contain two separate trees and file content blocks,
+# with file paths prefixed by their root folder's name.
+```
 
-**Example `pyproject.toml`:**
+### Accessing Intermediate Data
+
+You can also access the generated file tree and the list of mapped files before the final output is compiled.
+
+```python
+from pathlib import Path
+from scriber import Scriber
+
+scriber = Scriber(root_path=Path('.'))
+
+# Get just the formatted file tree
+tree_representation = scriber.get_tree()
+print("--- Project Tree ---")
+print(tree_representation)
+
+# Get a list of all mapped file paths
+print("\n--- Mapped Files ---")
+file_paths = scriber.get_mapped_files()
+for path in file_paths:
+    print(path.relative_to(scriber.primary_root))
+```
+
+### Practical Example: Preparing Context for an LLM
+
+Here's a small function demonstrating how you can use ProjectScriber to generate a complete, well-formatted prompt
+for an LLM.
+
+```python
+from pathlib import Path
+from scriber import Scriber
+
+
+def get_llm_context(project_path: Path, task: str) -> str:
+    '''
+    Generates a complete project context string ready for an LLM.
+
+    Args:
+        project_path: The root directory of the project.
+        task: The specific task you want the LLM to perform.
+
+    Returns:
+        A formatted string to be used as a prompt for an LLM.
+    '''
+    # Initialize Scriber and get the project map
+    scriber = Scriber(root_path=project_path)
+    project_map = scriber.get_output_as_string()
+
+    # Get some stats for the context header
+    stats = scriber.get_stats()
+    token_count = stats.get("total_tokens", 0)
+
+    # Assemble the final prompt for the LLM
+    prompt = (
+        f"Please perform the following task: {task}\n\n"
+        f"Here is the full context of the project codebase. "
+        f"It includes a file tree and the content of all relevant files.\n"
+        f"Estimated Token Count: {token_count:,}\n\n"
+        "--- PROJECT CONTEXT BEGINS ---\n"
+        f"{project_map}"
+        "--- PROJECT CONTEXT ENDS ---"
+    )
+
+    return prompt
+
+
+# --- Usage ---
+if __name__ == "__main__":
+    my_project_path = Path('.')
+    user_task = "Analyze the code for potential bugs and suggest improvements."
+    llm_prompt = get_llm_context(my_project_path, user_task)
+
+    print(llm_prompt)
+
+    # Now you can send `llm_prompt` to your favorite LLM API.
+```
+
+-----
+
+## ⚙️ Configuration
+
+ProjectScriber is configured via a file in your project's root. It searches for configurations in the following order of
+precedence:
+
+1. **Direct `config` dictionary** (Library mode only).
+2. **`--config [path]` flag** (CLI mode only).
+3. **`.scriber.json`** in the project root.
+4. **`[tool.scriber]`** section in `pyproject.toml`.
+5. **Default Config**: If no file is found, a default `.scriber.json` is created on the first run.
+
+### Configuration Keys
+
+| Key             | Type    | Default                | Description                                                                                                                               |
+|:----------------|:--------|:-----------------------|:------------------------------------------------------------------------------------------------------------------------------------------|
+| `use_gitignore` | boolean | `true`                 | If `true`, all patterns in the `.gitignore` file will be used for exclusion.                                                              |
+| `exclude`       | list    | See `core.py`          | A list of file/folder names or patterns to exclude globally (e.g., `"node_modules"`, `"*.log"`).                                          |
+| `include`       | list    | `[]`                   | If not empty, **only** files matching these patterns will be included.                                                                    |
+| `hidden`        | list    | `[]`                   | Files matching these patterns will appear in the tree but their content will be replaced with a placeholder. Useful for large lock files. |
+| `exclude_map`   | object  | `{}`                   | A dictionary for language-specific and global exclusion patterns. See example below.                                                      |
+| `output`        | string  | `"scriber_output.txt"` | The default name for the output file.                                                                                                     |
+
+### Example `pyproject.toml` Configuration
+
+Here is an example of a well-configured `[tool.scriber]` section in your `pyproject.toml` file:
 
 ```toml
 [tool.scriber]
+# Respect the project's .gitignore file
 use_gitignore = true
+
+# Globally exclude common folders and file types
 exclude = [
     "__pycache__",
     "node_modules",
-    "*.log"
+    "dist",
+    "build",
+    ".venv",
 ]
+
+# Only include files with these extensions
 include = [
     "*.py",
-    "*.js"
+    "*.js",
+    "*.css",
+    "*.md"
 ]
+
+# Show these files in the tree, but hide their content
+hidden = [
+    "poetry.lock"
+]
+
+# Language-specific and global exclusion rules
+[tool.scriber.exclude_map]
+# Exclude these patterns from all files
+global = ["*.log", "*.tmp"]
+# In Python files, exclude tests and setup scripts
+python = ["*_test.py", "setup.py"]
+# In JavaScript files, exclude spec files
+javascript = ["*.spec.js"]
 ```
 
+> **💡 Note on Excluding Directories:** For patterns that should *only* match directories (e.g., `build/`), it's best
+> practice to use your `.gitignore` file, which has more advanced pattern matching that ProjectScriber understands.
+
 -----
 
-## For Developers
+## 🤝 Contributing & Development
 
-### Prerequisites
+Contributions are welcome\! If you have a suggestion or find a bug, please open an issue to discuss it first.
 
-* Python 3.10 or higher.
+### Development Setup
 
-### Development Installation
+1. **Prerequisites**:
+    * Python 3.10 or higher.
 
-Clone the repository and install it in editable mode with all development dependencies.
+2. **Clone the Repository**:
+    ```shell
+    git clone https://github.com/SunneV/ProjectScriber.git
+    ```
 
-```shell
-git clone [https://github.com/SunneV/ProjectScriber.git](https://github.com/SunneV/ProjectScriber.git)
-cd ProjectScriber
-pip install -e .[dev]
-```
+3. **Navigate to the Project Directory**:
+    ```shell
+    cd ProjectScriber
+    ```
 
-### Running Tests
+4. **Install Dependencies**:
+   Choose one of the following methods to install the project in editable mode with all development dependencies.
 
-Run the test suite using `pytest`.
+    * **Using `pip`**:
+      ```shell
+      pip install -e .[dev]
+      ```
 
-```shell
-pytest
-```
+    * **Using `uv`**:
+      ```shell
+      uv sync --all-packages --extra dev
+      ```
 
------
+### Running Tests
 
-## Contributing
+Run the test suite using `pytest`:
 
-Contributions are welcome\! If you have a suggestion or find a bug, please open an issue to discuss it.
+```shell
+pytest
+```
\ No newline at end of file
diff --git a/pyproject.toml b/pyproject.toml
index 2427298..8c42875 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -1,6 +1,6 @@
 [project]
 name = "project-scriber"
-version = "1.0.1"
+version = "1.1.0"
 authors = [
     { name="SunneV (Wojciech Mariusz Cichoń)", email="wojciech.m.cichon@gmail.com" },
 ]
@@ -19,7 +19,6 @@ classifiers = [
 dependencies = [
     "pathspec",
     "python-dotenv",
-    "rich",
     "tiktoken",
     "pyperclip",
     "tomlkit",
@@ -34,9 +33,11 @@ Issues = "https://github.com/SunneV/ProjectScriber/issues"
 scriber = "scriber.cli:main"
 
 [project.optional-dependencies]
+rich = ["rich"]
 dev = [
     "pytest",
-    "pytest-mock"
+    "pytest-mock",
+    "rich"
 ]
 
 [build-system]
diff --git a/src/scriber/__init__.py b/src/scriber/__init__.py
index e69de29..3dfed2c 100644
--- a/src/scriber/__init__.py
+++ b/src/scriber/__init__.py
@@ -0,0 +1,10 @@
+"""
+ProjectScriber: A tool for mapping and compiling project source code.
+
+This package provides the core functionality and command-line interface for
+ProjectScriber. The main `Scriber` class can be imported directly for
+programmatic use.
+"""
+from .core import Scriber
+
+__all__ = ["Scriber"]
\ No newline at end of file
diff --git a/src/scriber/cli.py b/src/scriber/cli.py
index 85606da..5b61726 100644
--- a/src/scriber/cli.py
+++ b/src/scriber/cli.py
@@ -1,29 +1,56 @@
 import argparse
 import json
 import os
+import re
 import sys
 from importlib import metadata
 from pathlib import Path
 from typing import Any
 
 import pyperclip
-import rich.box
 import tomlkit
 from dotenv import load_dotenv
-from rich.console import Console
-from rich.panel import Panel
-from rich.progress import BarColumn, Progress, SpinnerColumn, TextColumn
-from rich.prompt import Confirm, Prompt
-from rich.table import Table
-from rich.text import Text
 
 from .core import DEFAULT_CONFIG, Scriber
 
+try:
+    import rich.box
+    from rich.console import Console
+    from rich.panel import Panel
+    from rich.progress import BarColumn, Progress, SpinnerColumn, TextColumn
+    from rich.prompt import Confirm, Prompt
+    from rich.table import Table
+    from rich.text import Text
+
+    RICH_AVAILABLE = True
+except ImportError:
+    RICH_AVAILABLE = False
+
 load_dotenv()
 
 
+class SimpleConsole:
+    """A fallback console that mimics rich.Console with simple print statements."""
+
+    def print(self, message: Any = "") -> None:
+        """Strips rich markup and prints the message.
+
+        Args:
+            message: The object or text to print.
+        """
+        message_str = str(message)
+        cleaned_message = re.sub(r'\[/?[a-zA-Z\s=]+\]', '', message_str)
+        print(cleaned_message)
+
+
 def format_bytes(byte_count: int) -> str:
-    """Formats a byte count into a human-readable string (KB, MB)."""
+    """Formats a byte count into a human-readable string (KB, MB).
+
+    Args:
+        byte_count: The number of bytes.
+
+    Returns:
+        A formatted string representing the size.
+    """
     if byte_count > 1024 * 1024:
         return f"{byte_count / (1024 * 1024):.2f} MB"
     if byte_count > 1024:
@@ -31,8 +58,13 @@ def format_bytes(byte_count: int) -> str:
     return f"{byte_count} Bytes"
 
 
-def save_to_json(console: Console, config: dict[str, Any]):
-    """Saves configuration to a .scriber.json file."""
+def save_to_json(console: Any, config: dict[str, Any]):
+    """Saves configuration to a .scriber.json file.
+
+    Args:
+        console: The console instance for printing output.
+        config: The configuration dictionary to save.
+    """
     config_path = Path.cwd() / ".scriber.json"
     try:
         with open(config_path, "w", encoding="utf-8") as f:
@@ -42,8 +74,13 @@ def save_to_json(console: Console, config: dict[str, Any]):
         console.print(f"\n❌ [bold red]Error saving config file:[/] {e}")
 
 
-def save_to_toml(console: Console, config: dict[str, Any]):
-    """Saves configuration to the pyproject.toml file."""
+def save_to_toml(console: Any, config: dict[str, Any]):
+    """Saves configuration to the pyproject.toml file.
+
+    Args:
+        console: The console instance for printing output.
+        config: The configuration dictionary to save.
+    """
     toml_path = Path.cwd() / "pyproject.toml"
     if not toml_path.exists():
         console.print(f"\n❌ [bold red]Error: `pyproject.toml` not found in the current directory.[/]")
@@ -66,39 +103,52 @@ def save_to_toml(console: Console, config: dict[str, Any]):
         console.print(f"\n❌ [bold red]Error updating `pyproject.toml`:[/] {e}")
 
 
-def handle_init(args: argparse.Namespace, console: Console):
-    """Handles the interactive initialization of a config file."""
-    console.print(Panel("[bold cyan]Scriber Configuration Setup[/]", expand=False))
+def handle_init(args: argparse.Namespace, console: Any, rich_available: bool):
+    """Handles the interactive initialization of a config file.
+
+    Args:
+        args: The parsed command-line arguments.
+        console: The console instance for printing output.
+        rich_available: A boolean indicating if the 'rich' library is installed.
+    """
+    if rich_available:
+        console.print(Panel("[bold cyan]Scriber Configuration Setup[/]", expand=False))
+    else:
+        console.print("--- Scriber Configuration Setup ---")
     console.print("This utility will help you create a configuration file.\n")
 
     config: dict[str, Any] = {}
 
-    config["use_gitignore"] = Confirm.ask(
-        "✨ Would you like to respect `.gitignore` rules?", default=True
-    )
+    if rich_available:
+        config["use_gitignore"] = Confirm.ask("✨ Would you like to respect `.gitignore` rules?", default=True)
+        default_exclude = ", ".join(DEFAULT_CONFIG.get("exclude", []))
+        exclude_str = Prompt.ask("Enter patterns to exclude (comma-separated)", default=default_exclude)
+        include_str = Prompt.ask("Enter patterns to include (optional, comma-separated)", default="")
+        hidden_str = Prompt.ask("Enter patterns to hide content for (e.g., lock files, optional, comma-separated)", default="")
+    else:
+        answer = input("✨ Would you like to respect `.gitignore` rules? (Y/n) ").strip().lower()
+        config["use_gitignore"] = answer not in ['n', 'no']
+        default_exclude = ", ".join(DEFAULT_CONFIG.get("exclude", []))
+        exclude_str = input(f"Enter patterns to exclude (comma-separated, default: {default_exclude}): ") or default_exclude
+        include_str = input("Enter patterns to include (optional, comma-separated): ")
+        hidden_str = input("Enter patterns to hide content for (e.g., lock files, optional, comma-separated): ")
 
-    default_exclude = ", ".join(DEFAULT_CONFIG.get("exclude", []))
-    exclude_str = Prompt.ask(
-        "Enter patterns to exclude (comma-separated)", default=default_exclude
-    )
     config["exclude"] = [item.strip() for item in exclude_str.split(',') if item.strip()]
-
-    include_str = Prompt.ask(
-        "Enter patterns to include (optional, comma-separated)", default=""
-    )
     include_patterns = [item.strip() for item in include_str.split(',') if item.strip()]
     if include_patterns:
         config["include"] = include_patterns
+    hidden_patterns = [item.strip() for item in hidden_str.split(",") if item.strip()]
+    if hidden_patterns:
+        config["hidden"] = hidden_patterns
 
     console.print("\n[bold]Choose a save location:[/bold]")
     console.print("  [cyan]1[/]: Save to `.scriber.json` (project-specific override)")
     console.print("  [cyan]2[/]: Save to `pyproject.toml` (project default)")
-    save_target = Prompt.ask(
-        "Enter your choice",
-        choices=["1", "2"],
-        default="1"
-    )
+    if rich_available:
+        save_target = Prompt.ask("Enter your choice", choices=["1", "2"], default="1")
+    else:
+        save_target = input("Enter your choice (1/2, default: 1): ") or "1"
 
     if save_target == '1':
         save_to_json(console, config)
@@ -106,72 +156,96 @@
         save_to_toml(console, config)
 
 
-def run_scriber(args: argparse.Namespace, console: Console, version: str):
-    """Handles the main logic of mapping and generating the project output."""
-    title_text = Text(f"Scriber v{version}", justify="center", style="bold magenta")
-    subtitle_text = Text("An intelligent tool to map, analyze, and compile project source code for LLM context.", justify="center", style="cyan")
-    console.print(Panel(Text.assemble(title_text, "\n", subtitle_text), expand=False, border_style="blue"))
+def run_scriber(args: argparse.Namespace, console: Any, version: str, rich_available: bool):
+    """Handles the main logic of mapping and generating the project output.
+
+    Args:
+        args: The parsed command-line arguments.
+        console: The console instance for printing output.
+        version: The current version of the application.
+        rich_available: A boolean indicating if the 'rich' library is installed.
+    """
+    if rich_available:
+        title_text = Text(f"Scriber v{version}", justify="center", style="bold magenta")
+        subtitle_text = Text("An intelligent tool to map, analyze, and compile project source code for LLM context.", justify="center", style="cyan")
+        console.print(Panel(Text.assemble(title_text, "\n", subtitle_text), expand=False, border_style="blue"))
+    else:
+        console.print(f"--- Scriber v{version} ---")
 
     scriber = Scriber(args.root_path.resolve(), config_path=args.config)
     output_filename = args.output or scriber.config.get("output", "project_structure.txt")
 
     scriber.map_project()
 
-    with Progress(
-        SpinnerColumn(),
-        TextColumn("[progress.description]{task.description}"),
-        BarColumn(),
-        TextColumn("[progress.percentage]{task.percentage:>3.0f}%"),
-        console=console,
-        transient=True
-    ) as progress:
+    progress = None
+    task_id = None
+    if rich_available:
+        progress_manager = Progress(SpinnerColumn(), TextColumn("[progress.description]{task.description}"), BarColumn(), TextColumn("[progress.percentage]{task.percentage:>3.0f}%"), console=console, transient=True)
         total_files = scriber.get_file_count()
         if total_files > 0 and not args.tree_only:
-            task_id = progress.add_task("[green]Processing files...", total=total_files)
+            task_id = progress_manager.add_task("[green]Processing files...", total=total_files)
+            progress = progress_manager
+    else:
+        console.print("Processing files...")
+
+    if progress:
+        with progress:
             scriber.generate_output_file(output_filename, tree_only=args.tree_only, progress=progress, task_id=task_id)
-        else:
-            scriber.generate_output_file(output_filename, tree_only=args.tree_only)
+    else:
+        scriber.generate_output_file(output_filename, tree_only=args.tree_only)
 
     stats = scriber.get_stats()
     config_file_display = str(scriber.config_path_used) if scriber.config_path_used else "Defaults"
-    summary_table = Table(box=rich.box.ROUNDED, show_header=False, title="[bold]Run Summary[/]", title_justify="left")
-    summary_table.add_column("Parameter", style="cyan", no_wrap=True)
-    summary_table.add_column("Value", style="magenta")
-    summary_table.add_row("Project Path", str(args.root_path.resolve()))
-    summary_table.add_row("Config File", config_file_display)
-    summary_table.add_row("Output File", output_filename)
-    console.print(summary_table)
+
+    if rich_available:
+        summary_table = Table(box=rich.box.ROUNDED, show_header=False, title="[bold]Run Summary[/]", title_justify="left")
+        summary_table.add_column("Parameter", style="cyan", no_wrap=True)
+        summary_table.add_column("Value", style="magenta")
+        summary_table.add_row("Project Path", str(args.root_path.resolve()))
+        summary_table.add_row("Config File", config_file_display)
+        summary_table.add_row("Output File", output_filename)
+        console.print(summary_table)
+    else:
+        console.print("\n--- Run Summary ---")
+        console.print(f"Project Path: {str(args.root_path.resolve())}")
+        console.print(f"Config File: {config_file_display}")
+        console.print(f"Output File: {output_filename}")
 
     if stats['total_files'] > 0:
-        results_table = Table(box=rich.box.ROUNDED, show_header=False, title="[bold]📊 Analysis Results[/]",
-                              title_justify="left")
-        results_table.add_column("Metric", style="cyan", no_wrap=True)
-        results_table.add_column("Value", style="magenta", justify="right")
-
-        results_table.add_row("Files Mapped", str(stats['total_files']))
-        if stats.get('skipped_binary') > 0:
-            results_table.add_row("Binary Skipped", str(stats['skipped_binary']))
-        results_table.add_section()
-        results_table.add_row("Total Size", format_bytes(stats['total_size_bytes']))
-        results_table.add_row("Est. Tokens (cl100k)", f"{stats['total_tokens']:,}")
-        results_table.add_section()
-        results_table.add_row("[bold]Language Breakdown[/]", "")
-        for lang, count in stats['language_counts'].most_common():
-            results_table.add_row(f"  {lang.capitalize()}", str(count))
-
-        console.print(results_table)
+        if rich_available:
+            results_table = Table(box=rich.box.ROUNDED, show_header=False, title="[bold]📊 Analysis Results[/]", title_justify="left")
+            results_table.add_column("Metric", style="cyan", no_wrap=True)
+            results_table.add_column("Value", style="magenta", justify="right")
+            results_table.add_row("Files Mapped", str(stats['total_files']))
+            if stats.get('skipped_binary') > 0:
+                results_table.add_row("Binary Skipped", str(stats['skipped_binary']))
+            results_table.add_section()
+            results_table.add_row("Total Size", format_bytes(stats['total_size_bytes']))
+            results_table.add_row("Est. Tokens (cl100k)", f"{stats['total_tokens']:,}")
+            results_table.add_section()
+            results_table.add_row("[bold]Language Breakdown[/]", "")
+            for lang, count in stats['language_counts'].most_common():
+                results_table.add_row(f"  {lang.capitalize()}", str(count))
+            console.print(results_table)
+        else:
+            console.print("\n--- Analysis Results ---")
+            console.print(f"Files Mapped: {stats['total_files']}")
+            if stats.get('skipped_binary') > 0:
+                console.print(f"Binary Skipped: {stats['skipped_binary']}")
+            console.print(f"Total Size: {format_bytes(stats['total_size_bytes'])}")
+            console.print(f"Est. Tokens (cl100k): {stats['total_tokens']:,}")
+            console.print("Language Breakdown:")
+            for lang, count in stats['language_counts'].most_common():
+                console.print(f"  {lang.capitalize()}: {count}")
     else:
-        console.print(Panel("[yellow]No files were mapped based on the current configuration.[/]", expand=False))
+        if rich_available:
+            console.print(Panel("[yellow]No files were mapped based on the current configuration.[/]", expand=False))
+        else:
+            console.print("No files were mapped based on the current configuration.")
 
     output_location = Path(args.root_path).resolve() / output_filename
-    console.print("\n✅ [green]Success! Output saved to:[/green]")
-    try:
-        uri = output_location.as_uri()
-        console.print(Text(str(output_location), style=f"bold cyan underline link {uri}"))
-    except Exception:
-        console.print(Text(str(output_location), style="bold cyan underline"))
+    console.print(str(output_location))
 
     if args.copy:
         try:
@@ -185,67 +259,36 @@
 
 def main() -> None:
     """Parses arguments and runs the appropriate command."""
-    console = Console()
+    if RICH_AVAILABLE:
+        console = Console()
+    else:
+        console = SimpleConsole()
     try:
         version = metadata.version("project-scriber")
     except metadata.PackageNotFoundError:
         version = "1.0.0 (local)"
 
-    parser = argparse.ArgumentParser(
-        description="Scriber: An intelligent tool to map, analyze, and compile project source code for LLM context."
-    )
-    parser.add_argument(
-        "-v", "--version",
-        action="version",
-        version=f"%(prog)s v{version}",
-        help="Show the version number and exit."
-    )
-
+    parser = argparse.ArgumentParser(description="Scriber: An intelligent tool to map, analyze, and compile project source code for LLM context.")
+    parser.add_argument("-v", "--version", action="version", version=f"%(prog)s v{version}", help="Show the version number and exit.")
     subparsers = parser.add_subparsers(dest="command", title="Commands")

-    # `init` command subparser
     init_parser = subparsers.add_parser("init", help="Create a new configuration file interactively.")
-    init_parser.set_defaults(func=lambda args: handle_init(args, console))
+    init_parser.set_defaults(func=lambda args: handle_init(args, console, RICH_AVAILABLE))

-    # `run` command subparser
     run_parser = subparsers.add_parser("run", help="Map the project structure (default command).")
-
     exec_mode = os.environ.get('SCRIBER_EXEC_MODE')
     default_path = Path.cwd().parent if exec_mode == 'RUN_PY' else Path.cwd()
     if exec_mode == 'RUN_PY':
         del os.environ['SCRIBER_EXEC_MODE']

-    run_parser.add_argument(
-        "root_path",
-        nargs="?",
-        default=os.environ.get("PROJECT_SCRIBER_ROOT", default_path),
-        type=Path,
-        help="The root directory of the project to map. Defaults to the current directory.",
-    )
-    run_parser.add_argument(
-        "-o", "--output",
-        help="The name of the output file. Overrides config file settings.",
-    )
-    run_parser.add_argument(
-        "--config",
-        default=os.environ.get("PROJECT_SCRIBER_CONFIG"),
-        type=Path,
-        help="Path to a custom configuration file."
-    )
-    run_parser.add_argument(
-        "-c", "--copy",
-        action="store_true",
-        help="Copy the final output to the clipboard.",
-    )
-    run_parser.add_argument(
-        "--tree-only",
-        action="store_true",
-        help="Generate only the file tree structure without file content.",
-    )
-    run_parser.set_defaults(func=lambda args: run_scriber(args, console, version))
-
-    # Pre-process args to insert 'run' as the default command
+    run_parser.add_argument("root_path", nargs="?", default=os.environ.get("PROJECT_SCRIBER_ROOT", default_path), type=Path, help="The root directory of the project to map. Defaults to the current directory.")
+    run_parser.add_argument("-o", "--output", help="The name of the output file. Overrides config file settings.")
+    run_parser.add_argument("--config", default=os.environ.get("PROJECT_SCRIBER_CONFIG"), type=Path, help="Path to a custom configuration file.")
+    run_parser.add_argument("-c", "--copy", action="store_true", help="Copy the final output to the clipboard.")
+    run_parser.add_argument("--tree-only", action="store_true", help="Generate only the file tree structure without file content.")
+    run_parser.set_defaults(func=lambda args: run_scriber(args, console, version, RICH_AVAILABLE))
+
     args_to_parse = sys.argv[1:]
     global_flags = ['-h', '--help', '-v', '--version']
@@ -257,8 +300,6 @@ def main() -> None:
     if hasattr(args, 'func'):
         args.func(args)
     else:
-        # This branch is hit for global flags like -h, --help, -v, --version
-        # which are handled by argparse and exit, or if no func is set.
         parser.print_help()
diff --git a/src/scriber/core.py b/src/scriber/core.py
index 93b1420..249fe95 100644
--- a/src/scriber/core.py
+++ b/src/scriber/core.py
@@ -1,9 +1,12 @@
 import fnmatch
+import io
 import json
 import os
+import sys
 from collections import Counter
+from concurrent.futures import ProcessPoolExecutor, as_completed
 from pathlib import Path
-from typing import Any, Dict, List, Optional, Set, TextIO
+from typing import Any, Dict, List, Optional, Set, TextIO, Union

 try:
     import tomllib
@@ -11,49 +14,79 @@
     import tomli as tomllib

 import tiktoken
-from rich.console import Console

 _DEFAULT_OUTPUT_FILENAME = "scriber_output.txt"
 _CONFIG_FILE_NAME = ".scriber.json"

 DEFAULT_CONFIG = {
     "use_gitignore": True,
     "exclude": [
-        # Common
         "LICENSE",
-
-        # Version Control
         ".git",
-
-        # IDE / Editor Config
         ".idea", ".vscode", ".project", ".settings", ".classpath",
-
-        # Python
         "__pycache__", "*.pyc", ".venv", "venv", ".pytest_cache", "uv.lock",
-
-        # Node.js
         "node_modules", "npm-debug.log*", "yarn-error.log",
-
-        # Build Artifacts
         "build", "dist", "target", "bin", "obj", "out",
-
-        # Dependencies
         "vendor", "bower_components",
-
-        # Logs & Temp Files
         "*.log", "*.lock", "*.tmp", "temp", "tmp",
-
-        # OS-specific
         ".DS_Store", "Thumbs.db", "*~", "*.swp", "*.swo",
-
-        # Scriber's own files
         _DEFAULT_OUTPUT_FILENAME, _CONFIG_FILE_NAME
     ],
+    "exclude_map": {},
     "include": [],
+    "hidden": [],
     "output": _DEFAULT_OUTPUT_FILENAME,
 }

+def _process_file_worker(
+    file_path: Path,
+    containing_root: Path,
+    hidden_patterns: Set[str],
+    language_map: Dict[str, str],
+    tokenizer: Optional[Any],
+) -> Dict[str, Any]:
+    """Processes a single file to gather stats; safe for multiprocessing.
+
+    This function is defined at the top level to avoid pickling issues with
+    instance methods that have un-pickleable attributes (like rich.Console).
+
+    Args:
+        file_path: The path of the file to process.
+        containing_root: The root directory that contains the file.
+        hidden_patterns: A set of patterns for files whose content should be hidden.
+        language_map: A dictionary mapping file extensions to languages.
+        tokenizer: The tiktoken tokenizer instance.
+
+    Returns:
+        A dictionary containing the size, token count, and language of the file.
+    """
+    stats: Dict[str, Any] = {"size": 0, "tokens": 0, "lang": "other"}
+    try:
+        stats["size"] = file_path.stat().st_size
+        stats["lang"] = language_map.get(file_path.suffix, language_map.get(file_path.name, "")) or "other"
+
+        is_hidden = False
+        if hidden_patterns:
+            relative_path_str = file_path.relative_to(containing_root).as_posix()
+            if any(fnmatch.fnmatch(relative_path_str, pattern) for pattern in hidden_patterns):
+                is_hidden = True
+
+        if not is_hidden and tokenizer:
+            content = file_path.read_text(encoding="utf-8", errors="ignore")
+            stats["tokens"] = len(tokenizer.encode(content))
+    except Exception:
+        pass
+    return stats
+
+
 class Scriber:
+    """
+    Maps, analyzes, and compiles a project's source code into a single output.
+
+    This class can be used programmatically to gain fine-grained control over the
+    project mapping process, access intermediate data like file lists and
+    statistics, and get the final output as a string for further processing.
+    """
     _CONFIG_FILE_NAME = _CONFIG_FILE_NAME
     _LANGUAGE_MAP = {
         ".asm": "asm", ".s": "asm", ".html": "html", ".htm": "html", ".css": "css",
@@ -77,15 +110,47 @@ class Scriber:
         ".vert": "glsl", ".vb": "vbnet", ".vbs": "vbscript",
     }

-    def __init__(self, root_path: Path, config_path: Optional[Path] = None):
-        self.root_path = root_path.resolve()
+    def __init__(
+        self,
+        root_path: Union[Path, List[Path]],
+        config: Optional[Dict[str, Any]] = None,
+        config_path: Optional[Path] = None
+    ):
+        """Initializes the Scriber instance.
+
+        Args:
+            root_path: An absolute path or a list of absolute paths to the root
+                directories of the project(s) to be mapped.
+            config: An optional dictionary of configuration settings. Takes the
+                highest precedence if provided.
+            config_path: An optional path to a specific configuration file.
+        """
+        self.root_paths: List[Path] = ([root_path] if isinstance(root_path, Path) else root_path)
+        self.primary_root: Path = self.root_paths[0].resolve()
+        self.mapped_files: List[Path] = []
         self._user_config_path = config_path
-        self._console = Console(stderr=True, style="bold red")
+        self._user_config_dict = config
         self.config: Dict[str, Any] = {}
         self.config_path_used: Optional[Path] = None
         self.gitignore_spec: Optional[Any] = None
+        self.hidden_patterns: Set[str] = set()
+        self.include_patterns: List[str] = []
+        self.exclude_patterns: Set[str] = set()
+        self.exclude_map: Dict[str, List[str]] = {}
+
+        self.stats = {}
+        self._has_mapped = False
+        self._reset_stats()
+        self._load_config()
+        try:
+            self._tokenizer = tiktoken.get_encoding("cl100k_base")
+        except Exception:
+            self._tokenizer = None

+    def _reset_stats(self):
+        """Resets the statistics and mapped files to their initial state."""
+        self.mapped_files = []
         self.stats = {
             "total_files": 0,
             "total_size_bytes": 0,
@@ -93,83 +158,90 @@ def __init__(self, root_path: Path, config_path: Optional[Path] = None):
             "language_counts": Counter(),
             "skipped_binary": 0,
         }
-
-        self._load_config()
-        try:
-            self._tokenizer = tiktoken.get_encoding("cl100k_base")
-        except Exception:
-            self._tokenizer = None
+        self._has_mapped = False

     def _create_default_config_file(self) -> None:
         """Creates a default .scriber.json config file if no other config is found."""
-        config_path = self.root_path / self._CONFIG_FILE_NAME
-        self._console.print(f"✨ [yellow]No config found. Creating default configuration at:[/] {config_path}")
+        config_path = self.primary_root / self._CONFIG_FILE_NAME
+        print(f"✨ No config found. Creating default configuration at: {config_path}", file=sys.stderr)

         file_config = {
             "use_gitignore": DEFAULT_CONFIG.get("use_gitignore", True),
             "exclude": DEFAULT_CONFIG.get("exclude", []),
-            "include": DEFAULT_CONFIG.get("include", [])
+            "include": DEFAULT_CONFIG.get("include", []),
+            "hidden": DEFAULT_CONFIG.get("hidden", [])
         }
         try:
             with config_path.open("w", encoding="utf-8") as f:
                 json.dump(file_config, f, indent=2)
         except IOError as e:
-            self._console.print(f"❌ [bold red]Could not create default config file:[/] {e}")
+            print(f"❌ Could not create default config file: {e}", file=sys.stderr)

     def _load_config(self) -> None:
-        """Loads configuration with a clear precedence: --config, .scriber.json, pyproject.toml."""
+        """Loads configuration with a clear precedence: direct dict > config_path > local files."""
         config = DEFAULT_CONFIG.copy()
-        config_path_to_use = None
-        config_loaded = False
+        config_source_loaded = False

-        if self._user_config_path:
-            if self._user_config_path.is_file():
-                config_path_to_use = self._user_config_path
-            else:
-                self._console.print(f"Warning: Config file specified by --config not found at {self._user_config_path}")
+        if self._user_config_dict:
+            config.update(self._user_config_dict)
+            config_source_loaded = True
+            self.config_path_used = None
         else:
-            json_path = self.root_path / self._CONFIG_FILE_NAME
-            toml_path = self.root_path / "pyproject.toml"
-            if json_path.is_file():
-                config_path_to_use = json_path
-            elif toml_path.is_file():
-                config_path_to_use = toml_path
-
-        if config_path_to_use:
-            self.config_path_used = config_path_to_use
-            try:
-                if config_path_to_use.suffix == ".toml":
-                    with config_path_to_use.open("rb") as f:
-                        toml_data = tomllib.load(f)
-                    if "tool" in toml_data and "scriber" in toml_data["tool"]:
-                        config.update(toml_data["tool"]["scriber"])
-                        config_loaded = True
-                else:
-                    with config_path_to_use.open("r", encoding="utf-8") as f:
-                        config.update(json.load(f))
-                    config_loaded = True
-            except (json.JSONDecodeError, tomllib.TOMLDecodeError, IOError) as e:
-                self._console.print(f"Error parsing config file {self.config_path_used}: {e}")
-
-        if not config_loaded and not self._user_config_path:
+            config_path_to_use = self._user_config_path
+            if config_path_to_use:
+                if not config_path_to_use.is_file():
+                    print(f"Warning: Config file specified by --config not found at {self._user_config_path}", file=sys.stderr)
+                    config_path_to_use = None
+            else:
+                json_path = self.primary_root / self._CONFIG_FILE_NAME
+                toml_path = self.primary_root / "pyproject.toml"
+                if json_path.is_file():
+                    config_path_to_use = json_path
+                elif toml_path.is_file():
+                    config_path_to_use = toml_path
+
+            if config_path_to_use:
+                self.config_path_used = config_path_to_use
+                try:
+                    if config_path_to_use.suffix == ".toml":
+                        with config_path_to_use.open("rb") as f:
+                            toml_data = tomllib.load(f)
+                        if "tool" in toml_data and "scriber" in toml_data["tool"]:
+                            config.update(toml_data["tool"]["scriber"])
+                            config_source_loaded = True
+                    else:
+                        with config_path_to_use.open("r", encoding="utf-8") as f:
+                            config.update(json.load(f))
+                        config_source_loaded = True
+                except (json.JSONDecodeError, tomllib.TOMLDecodeError, IOError) as e:
+                    print(f"Error parsing config file {self.config_path_used}: {e}", file=sys.stderr)

+        if not config_source_loaded and not self._user_config_dict and self._user_config_path is None:
             self._create_default_config_file()

         self.config = config
         self.include_patterns: List[str] = self.config.get("include", [])
         self.exclude_patterns: Set[str] = set(self.config.get("exclude", []))
+        self.hidden_patterns: Set[str] = set(self.config.get("hidden", []))
+        self.exclude_map: Dict[str, List[str]] = self.config.get("exclude_map", {})
         self._load_gitignore(self.config.get("use_gitignore", True))

     def _load_gitignore(self, use_gitignore: bool) -> None:
+        """Loads gitignore patterns from the .gitignore file if enabled.
+
+        Args:
+            use_gitignore: A boolean indicating whether to use .gitignore rules.
+        """
         try:
             import pathspec
         except ImportError:
-            self._console.print("Warning: 'pathspec' not installed. .gitignore files will be ignored.")
+            print("Warning: 'pathspec' not installed. .gitignore files will be ignored.", file=sys.stderr)
             self.gitignore_spec = None
             return

         self.gitignore_spec: Optional[pathspec.PathSpec] = None
         if not use_gitignore:
             return

-        gitignore_path = self.root_path / ".gitignore"
+        gitignore_path = self.primary_root / ".gitignore"
         if gitignore_path.is_file():
             try:
                 with gitignore_path.open("r", encoding="utf-8") as f:
@@ -177,7 +249,32 @@ def _load_gitignore(self, use_gitignore: bool) -> None:
             except IOError:
                 pass

+    def _find_containing_root(self, path: Path) -> Optional[Path]:
+        """Finds which root directory from self.root_paths contains the given path.
+
+        Args:
+            path: The path to check.
+
+        Returns:
+            The containing root path, or None if not found.
+        """
+        for r in self.root_paths:
+            try:
+                if path.is_relative_to(r):
+                    return r
+            except ValueError:
+                continue
+        return None
+
     def _is_binary(self, path: Path) -> bool:
+        """Checks if a file is likely a binary file.
+
+        Args:
+            path: The path to the file.
+
+        Returns:
+            True if the file contains null bytes, False otherwise.
+        """
         try:
             with path.open('rb') as f:
                 return b'\0' in f.read(1024)
@@ -185,89 +282,292 @@ def _is_binary(self, path: Path) -> bool:
             return True

     def _is_excluded(self, path: Path) -> bool:
-        try:
-            relative_path = path.relative_to(self.root_path)
-            check_set = set(relative_path.parts)
-        except ValueError:
+        """Determines if a file or directory should be excluded from mapping.
+
+        Args:
+            path: The path to check.
+
+        Returns:
+            True if the path should be excluded, False otherwise.
+        """
+        containing_root = self._find_containing_root(path)
+        if not containing_root:
+            return True
+
+        if self.gitignore_spec:
+            try:
+                relative_path_for_gitignore = path.relative_to(self.primary_root).as_posix()
+                if self.gitignore_spec.match_file(relative_path_for_gitignore):
+                    return True
+            except ValueError:
+                pass
+
+        relative_path = path.relative_to(containing_root)
+        check_set = set(relative_path.parts)
+        if not self.exclude_patterns.isdisjoint(check_set):
+            return True
+
+        if any(fnmatch.fnmatch(part, pattern) for pattern in self.exclude_patterns for part in check_set):
             return True
-        if not self.exclude_patterns.isdisjoint(check_set): return True
+        if path.is_file():
+            relative_path_str = relative_path.as_posix()
+            global_patterns = self.exclude_map.get("global", [])
+            if any(fnmatch.fnmatch(relative_path_str, pattern) for pattern in global_patterns):
+                return True
+
+            lang = self._get_language(path)
+            if lang and lang in self.exclude_map:
+                lang_patterns = self.exclude_map.get(lang, [])
+                if any(fnmatch.fnmatch(relative_path_str, pattern) for pattern in lang_patterns):
+                    return True
+
+            if self.include_patterns:
+                return not any(fnmatch.fnmatch(relative_path_str, pattern) for pattern in self.include_patterns)

-        relative_path_str = relative_path.as_posix()
-        if self.gitignore_spec and self.gitignore_spec.match_file(relative_path_str): return True
-        if any(fnmatch.fnmatch(part, pattern) for pattern in self.exclude_patterns for part in check_set): return True
-        if path.is_file() and self.include_patterns:
-            return not any(fnmatch.fnmatch(relative_path_str, pattern) for pattern in self.include_patterns)
         return False

-    def _collect_files(self) -> None:
+    def _is_hidden(self, path: Path) -> bool:
+        """Checks if a path matches any of the hidden patterns.
+
+        Args:
+            path: The path to check.
+
+        Returns:
+            True if the path matches a hidden pattern, False otherwise.
+        """
+        if not self.hidden_patterns:
+            return False
+        containing_root = self._find_containing_root(path)
+        if not containing_root:
+            return False
+        relative_path_str = path.relative_to(containing_root).as_posix()
+        return any(fnmatch.fnmatch(relative_path_str, pattern) for pattern in self.hidden_patterns)
+
+    def _collect_files(self, perform_binary_check: bool = True) -> None:
+        """Walks the project directory and collects all non-excluded files.
+
+        Args:
+            perform_binary_check: If False, skips the check for binary files.
+        """
         collected = set()
-        for root, dirs, files in os.walk(self.root_path, topdown=True):
-            current_root = Path(root)
-            dirs[:] = [d for d in dirs if not self._is_excluded(current_root / d)]
-            for file in files:
-                file_path = current_root / file
-                if not self._is_excluded(file_path):
-                    if self._is_binary(file_path):
-                        self.stats['skipped_binary'] += 1
-                        continue
-                    collected.add(file_path)
+        for root_dir in self.root_paths:
+            for root, dirs, files in os.walk(root_dir, topdown=True):
+                current_root = Path(root)
+                dirs[:] = [d for d in dirs if not self._is_excluded(current_root / d)]
+                for file in files:
+                    file_path = current_root / file
+                    if not self._is_excluded(file_path):
+                        if perform_binary_check and self._is_binary(file_path):
+                            self.stats['skipped_binary'] += 1
+                            continue
+                        collected.add(file_path)
         self.mapped_files = sorted(list(collected))

     def map_project(self) -> None:
         """Maps all relevant project files and gathers statistics."""
-        self._collect_files()
+        self._reset_stats()
+        self._collect_files(perform_binary_check=True)
         self._gather_stats()
+        self._has_mapped = True
+
+    def map_tree_only(self) -> None:
+        """Maps only the project file structure without reading file contents."""
+        self._reset_stats()
+        self._collect_files(perform_binary_check=False)
+        self.stats['total_files'] = len(self.mapped_files)
+        self._has_mapped = True

     def _gather_stats(self) -> None:
-        if not self.mapped_files: return
+        """Gathers statistics about the mapped files using multi-processing."""
+        if not self.mapped_files:
+            return

         self.stats['total_files'] = len(self.mapped_files)
         total_size = 0
         total_tokens = 0
-
-        for file_path in self.mapped_files:
-            total_size += file_path.stat().st_size
-            lang = self._get_language(file_path) or "other"
-            self.stats['language_counts'][lang] += 1
-            if self._tokenizer:
+        language_counts: Counter = Counter()
+
+        with ProcessPoolExecutor() as executor:
+            futures = []
+            for path in self.mapped_files:
+                containing_root = self._find_containing_root(path)
+                if containing_root:
+                    futures.append(executor.submit(
+                        _process_file_worker,
+                        path,
+                        containing_root,
+                        self.hidden_patterns,
+                        self._LANGUAGE_MAP,
+                        self._tokenizer,
+                    ))
+
+            for future in as_completed(futures):
                 try:
-                    content = file_path.read_text(encoding="utf-8", errors="ignore")
-                    total_tokens += len(self._tokenizer.encode(content))
-                except Exception:
-                    pass
+                    file_stats = future.result()
+                    total_size += file_stats["size"]
+                    total_tokens += file_stats["tokens"]
+                    language_counts[file_stats["lang"]] += 1
+                except Exception as exc:
+                    print(f"File processing generated an exception: {exc}", file=sys.stderr)

         self.stats['total_size_bytes'] = total_size
         self.stats['total_tokens'] = total_tokens
+        self.stats['language_counts'] = language_counts

     def get_stats(self) -> Dict:
-        """Returns the raw project statistics."""
+        """Returns the collected project statistics.
+
+        If the project has not been mapped yet, `map_project()` will be called first.
+
+        Returns:
+            A dictionary containing project statistics.
+        """
+        if not self._has_mapped:
+            self.map_project()
         return self.stats

     def get_file_count(self) -> int:
-        """Returns the number of files that will be mapped."""
+        """Returns the number of files that will be mapped.
+
+        If the project has not been mapped yet, `map_project()` will be called first.
+
+        Returns:
+            The total count of mapped files.
+        """
+        if not self._has_mapped:
+            self.map_project()
         return len(self.mapped_files)

+    def get_mapped_files(self) -> List[Path]:
+        """Returns a list of all mapped file paths.
+
+        If the project has not been mapped yet, `map_project()` will be called first.
+
+        Returns:
+            A sorted list of `pathlib.Path` objects for all included files.
+        """
+        if not self._has_mapped:
+            self.map_project()
+        return self.mapped_files
+
+    def get_tree(self) -> str:
+        """Returns the formatted file tree representation as a string.
+
+        If the project has not been mapped yet, `map_project()` will be called first.
+
+        Returns:
+            A string containing the formatted file tree.
+        """
+        if not self._has_mapped:
+            self.map_project()
+        return self._get_tree_representation()
+
+    def get_output_as_string(self, tree_only: bool = False) -> str:
+        """Generates the consolidated project output and returns it as a string.
+
+        If the project has not been mapped yet, `map_project()` will be called first.
+
+        Args:
+            tree_only: If True, the string will only contain the file tree.
+
+        Returns:
+            A string containing the complete project map and file contents.
+        """
+        if not self._has_mapped:
+            if tree_only:
+                self.map_tree_only()
+            else:
+                self.map_project()
+        output_buffer = io.StringIO()
+        self._write_output(output_buffer, tree_only, progress=None, task_id=None)
+        return output_buffer.getvalue()

     def generate_output_file(self, output_filename: str, tree_only: bool = False, progress=None, task_id=None) -> None:
-        """Generates the consolidated project structure output file."""
-        output_filepath = self.root_path / output_filename
+        """Generates the consolidated project structure output file.
+
+        Args:
+            output_filename: The name for the output file.
+            tree_only: If True, only the file tree is written.
+            progress: A Rich Progress instance for updating the progress bar.
+            task_id: The ID of the task in the Rich Progress instance.
+        """
+        if not self._has_mapped:
+            if tree_only:
+                self.map_tree_only()
+            else:
+                self.map_project()
+        output_filepath = self.primary_root / output_filename
         with output_filepath.open("w", encoding="utf-8") as f:
             self._write_output(f, tree_only, progress, task_id)

     def _write_output(self, f: TextIO, tree_only: bool, progress, task_id) -> None:
+        """Writes the complete project map and file contents to an open file stream.
+
+        Args:
+            f: The file stream to write to.
+            tree_only: If True, only write the file tree.
+            progress: A Rich Progress instance for updating the progress bar.
+            task_id: The ID of the task in the Rich Progress instance.
+        """
         f.write("=" * 3 + "\n Mapped Folder Structure\n" + "=" * 3 + "\n\n")
         f.write(self._get_tree_representation() + "\n")
-        if tree_only: return
+        if tree_only:
+            return

         for file_path in self.mapped_files:
-            self._write_file_content(f, file_path)
+            if self._is_hidden(file_path):
+                self._write_hidden_file_placeholder(f, file_path)
+            else:
+                self._write_file_content(f, file_path)
             if progress and task_id is not None:
                 progress.update(task_id, advance=1)

+    def _get_display_path(self, file_path: Path) -> str:
+        """Gets the path to display in the output header.
+
+        Args:
+            file_path: The absolute path to the file.
+
+        Returns:
+            A string representing the path for display.
+        """
+        containing_root = self._find_containing_root(file_path)
+        if not containing_root:
+            return file_path.name
+
+        relative_path = file_path.relative_to(containing_root)
+        if len(self.root_paths) > 1:
+            return (Path(containing_root.name) / relative_path).as_posix()
+        return relative_path.as_posix()
+
+    def _write_hidden_file_placeholder(self, f: TextIO, file_path: Path) -> None:
+        """Writes a placeholder for a hidden file's content.
+
+        Args:
+            f: The file stream to write to.
+            file_path: The path of the hidden file.
+        """
+        try:
+            display_path = self._get_display_path(file_path)
+            file_size = file_path.stat().st_size
+        except (OSError, ValueError):
+            return
+
+        f.write("\n" + "-" * 3 + "\n")
+        f.write(f"File: {display_path}\nSize: {file_size} bytes\n" + "-" * 3 + "\n")
+        f.write("```\n[Content hidden based on configuration]\n```\n")
+
     def _write_file_content(self, f: TextIO, file_path: Path) -> None:
+        """Writes a single file's content to the output stream.
+
+        Args:
+            f: The file stream to write to.
+            file_path: The path of the file to write.
+        """
         try:
-            relative_path = file_path.relative_to(self.root_path).as_posix()
+            display_path = self._get_display_path(file_path)
             file_size = file_path.stat().st_size
             lang = self._get_language(file_path)
             content = file_path.read_text(encoding="utf-8", errors="ignore")
@@ -275,13 +575,26 @@ def _write_file_content(self, f: TextIO, file_path: Path) -> None:
             return

         f.write("\n" + "-" * 3 + "\n")
-        f.write(f"File: {relative_path}\nSize: {file_size} bytes\n" + "-" * 3 + "\n")
+        f.write(f"File: {display_path}\nSize: {file_size} bytes\n" + "-" * 3 + "\n")
         f.write(f"```{lang}\n{content}\n```\n")

     def _get_language(self, file_path: Path) -> str:
+        """Determines the programming language of a file based on its extension.
+
+        Args:
+            file_path: The path to the file.
+
+        Returns:
+            A string representing the language, or an empty string if not found.
+        """
         return self._LANGUAGE_MAP.get(file_path.suffix, self._LANGUAGE_MAP.get(file_path.name, ""))

     def _get_tree_representation(self) -> str:
+        """Generates a string representation of the project's file tree.
+
+        Returns:
+            A formatted string of the file tree.
+        """
         tree = self._build_file_tree()
         if not tree:
             return "No files or folders to map."
@@ -297,18 +610,47 @@ def format_tree(d: Dict, prefix: str = "") -> List[str]:
                 lines.extend(format_tree(d[key], new_prefix))
             return lines

-        root_name = list(tree.keys())[0]
-        output_lines = [root_name]
-        output_lines.extend(format_tree(tree[root_name]))
+        if len(self.root_paths) == 1:
+            root_name = list(tree.keys())[0]
+            output_lines = [root_name]
+            output_lines.extend(format_tree(tree[root_name]))
+        else:
+            output_lines = []
+            for root_name, subtree in sorted(tree.items()):
+                output_lines.append(root_name)
+                output_lines.extend(format_tree(subtree))
         return "\n".join(output_lines)

     def _build_file_tree(self) -> Dict[str, Any]:
+        """Builds a nested dictionary representing the file tree structure.
+
+        Returns:
+            A dictionary representing the project's file hierarchy.
+        """
         if not self.mapped_files:
             return {}
-        tree = {self.root_path.name: {}}
-        project_level = tree[self.root_path.name]
-        for path in self.mapped_files:
-            parts = path.relative_to(self.root_path).parts
-            current_level = project_level
-            for part in parts:
-                current_level = current_level.setdefault(part, {})
-        return tree
\ No newline at end of file
+
+        if len(self.root_paths) == 1:
+            tree = {self.primary_root.name: {}}
+            project_level = tree[self.primary_root.name]
+            for path in self.mapped_files:
+                parts = path.relative_to(self.primary_root).parts
+                current_level = project_level
+                for part in parts:
+                    current_level = current_level.setdefault(part, {})
+            return tree
+        else:
+            tree = {}
+            for path in self.mapped_files:
+                containing_root = self._find_containing_root(path)
+                if not containing_root:
+                    continue
+
+                root_name = containing_root.name
+                if root_name not in tree:
+                    tree[root_name] = {}
+
+                parts = path.relative_to(containing_root).parts
+                current_level = tree[root_name]
+                for part in parts:
+                    current_level = current_level.setdefault(part, {})
+            return tree
\ No newline at end of file
diff --git a/tests/test_suite.py b/tests/test_suite.py
index 56c4eb5..421fe9f 100644
--- a/tests/test_suite.py
+++ b/tests/test_suite.py
@@ -1,9 +1,11 @@
+import io
 import json
 from collections import Counter
 from pathlib import Path
 from unittest.mock import MagicMock, patch

 import pytest
+import tiktoken

 try:
     import tomllib
@@ -15,6 +17,15 @@
 from src.scriber.core import Scriber

+def test_direct_import():
+    """Tests that the Scriber class can be imported directly from the package."""
+    try:
+        from src.scriber import Scriber
+    except ImportError:
+        pytest.fail("Could not import Scriber from src.scriber")
+    assert callable(Scriber)
+
+
 # --- Test Core Scriber Functionality ---
 class TestCore:
@@ -67,7 +78,8 @@ def test_binary_file_skipping(self, tmp_path: Path):
         """Tests that binary files are detected and correctly skipped."""
         (tmp_path / "app.exe").write_bytes(b"\x4d\x5a\x90\x00\x03\x00\x00\x00")

-        scriber = Scriber(root_path=tmp_path)
+        config = {"include": ["app.exe"], "exclude": []}
+        scriber = Scriber(root_path=tmp_path, config=config)
         scriber.map_project()

         assert len(scriber.mapped_files) == 0
@@ -86,6 +98,109 @@ def test_include_patterns(self, tmp_path: Path):
         paths = {p.name for p in scriber.mapped_files}
         assert paths == {"main.py", "script.js"}

+    def test_exclude_map_dictionary(self, tmp_path: Path):
+        """Tests that the exclude_map dictionary filter works as intended."""
+        (tmp_path / "app.py").touch()
+        (tmp_path / "utils_test.py").touch()
+        (tmp_path / "script.js").touch()
+        (tmp_path / "archive.log").touch()
+        (tmp_path / "README.md").touch()
+
+        config = {
+            "exclude_map": {
+                "python": ["*_test.py"],
+                "global": ["*.log"]
+            },
+            "exclude": [],
+            "include": []
+        }
+        scriber = Scriber(root_path=tmp_path, config=config)
+        files = scriber.get_mapped_files()
+        mapped_names = {p.name for p in files}
+
+        assert "app.py" in mapped_names
+        assert "script.js" in mapped_names
+        assert "README.md" in mapped_names
+        assert "utils_test.py" not in mapped_names
+        assert "archive.log" not in mapped_names
+        assert len(mapped_names) == 3
+
+    def test_hidden_files_are_in_tree_but_content_is_skipped(self, tmp_path: Path):
+        """Tests that hidden files appear in the tree but their content is not in the output."""
+        (tmp_path / "main.py").write_text("print('hello')")
+        lock_content = "some-lock-file-content"
+        (tmp_path / "poetry.lock").write_text(lock_content)
+        config = {"hidden": ["poetry.lock"], "exclude": []}
+        (tmp_path / ".scriber.json").write_text(json.dumps(config))
+
+        scriber = Scriber(root_path=tmp_path)
+        scriber.map_project()
+
+        output_buffer = io.StringIO()
+        scriber._write_output(output_buffer, tree_only=False, progress=None, task_id=None)
+        output = output_buffer.getvalue()
+
+        assert "poetry.lock" in output
+        assert "[Content hidden based on configuration]" in output
+        assert lock_content not in output
+        assert "print('hello')" in output
+
+    def test_hidden_files_are_excluded_from_token_count(self, tmp_path: Path):
+        """Tests that hidden files contribute to size but not token count."""
+        main_py_content = "def main(): pass"
+        (tmp_path / "main.py").write_text(main_py_content)
+        (tmp_path / "poetry.lock").write_text("some-lock-file-content")
+        config = {"hidden": ["poetry.lock"], "exclude": [".scriber.json"]}
+        (tmp_path / ".scriber.json").write_text(json.dumps(config))
+
+        scriber = Scriber(root_path=tmp_path)
+        scriber.map_project()
+        stats = scriber.get_stats()
+
+        tokenizer = tiktoken.get_encoding("cl100k_base")
+        expected_tokens = len(tokenizer.encode(main_py_content))
+
+        assert stats["total_files"] == 2
+        assert stats["total_tokens"] == expected_tokens
+        assert stats["total_size_bytes"] == (
+            (tmp_path / "main.py").stat().st_size
+            + (tmp_path / "poetry.lock").stat().st_size
+        )
+
+    def test_init_with_direct_config_dictionary(self, tmp_path: Path):
+        """Tests that Scriber can be configured directly with a dictionary."""
+        (tmp_path / "app.py").touch()
+        (tmp_path / "data.json").touch()
+        direct_config = {"include": ["*.py"], "exclude": []}
+
+        scriber = Scriber(root_path=tmp_path, config=direct_config)
+        files = scriber.get_mapped_files()
+
+        paths = {p.name for p in files}
+        assert paths == {"app.py"}
+        assert scriber.config_path_used is None
+
+    def test_get_output_as_string(self, tmp_path: Path):
+        """Tests that the full project map can be retrieved as a string."""
+        (tmp_path / "main.py").write_text("print('test')")
+        scriber = Scriber(root_path=tmp_path)
+        output_str = scriber.get_output_as_string()
+
+        assert isinstance(output_str, str)
+        assert "Mapped Folder Structure" in output_str
+        assert "main.py" in output_str
+        assert "print('test')" in output_str
+
+    def test_getters_trigger_map_project_automatically(self, tmp_path: Path):
+        """Tests that getter methods automatically call map_project if not already run."""
+        (tmp_path / "test.txt").touch()
+        scriber = Scriber(root_path=tmp_path)
+
+        assert not scriber.mapped_files
+        stats = scriber.get_stats()
+        assert len(scriber.mapped_files) == 1
+        assert stats["total_files"] == 1
+
     def test_core_loads_external_toml_config(self, tmp_path: Path):
         """Tests core logic loads config from an external pyproject.toml via config_path."""
         config_dir = tmp_path / "config"
@@ -129,7 +244,9 @@ def test_tree_representation(self, tmp_path: Path):
             "└── main.py",
         ]
         actual_lines = tree_str.split('\n')
-        assert actual_lines == expected_lines
+        assert actual_lines[0] == tmp_path.name
+        assert actual_lines[1:] == expected_lines[1:]
+
     @pytest.mark.parametrize("filename, expected_lang", [
         ("test.py", "python"),
@@ -144,6 +261,40 @@ def test_language_detection(self, tmp_path: Path, filename: str, expected_lang:
         lang = scriber._get_language(Path(filename))
         assert lang == expected_lang

+    def test_multi_root_collection(self, tmp_path: Path):
+        """Tests that files from multiple root directories are collected."""
+        project_a = tmp_path / "project_a"
+        project_a.mkdir()
+        (project_a / "a.py").touch()
+
+        project_b = tmp_path / "project_b"
+        project_b.mkdir()
+        (project_b / "b.js").touch()
+
+        scriber = Scriber(root_path=[project_a, project_b])
+        scriber.map_project()
+        mapped_names = {p.name for p in scriber.mapped_files}
+
+        assert mapped_names == {"a.py", "b.js"}
+        assert len(scriber.mapped_files) == 2
+
+    def test_multi_root_tree_and_output(self, tmp_path: Path):
+        """Tests tree and output format for multiple roots."""
+        project_a = tmp_path / "project_a"
+        project_a.mkdir()
+        (project_a / "a.py").write_text("print('a')")
+
+        project_b = tmp_path / "project_b"
+        project_b.mkdir()
+        (project_b / "b.js").write_text("console.log('b')")
+
+        scriber = Scriber(root_path=[project_a, project_b])
+        output = scriber.get_output_as_string()
+
+        assert "project_a\n└── a.py" in output
+        assert "project_b\n└── b.js" in output
+        assert f"File: project_a/a.py" in output
+        assert f"File: project_b/b.js" in output

 # --- Test CLI Functionality ---
@@ -188,7 +339,7 @@ def test_cli_init_command_creates_config(self, mock_prompt, mock_confirm, tmp_pa
         """Tests the interactive 'init' command for config file creation."""
         mocker.patch('pathlib.Path.cwd', return_value=tmp_path)
         mock_confirm.return_value = False
-        mock_prompt.side_effect = ["*.tmp, *.log", "*.py", "1"]
+        mock_prompt.side_effect = ["*.tmp, *.log", "*.py", "", "1"]

         mocker.patch('sys.argv', ['scriber', 'init'])
         cli_main()
@@ -213,7 +364,7 @@ def test_cli_init_command_creates_config_in_toml(self, mock_prompt, mock_confirm
         pyproject_path.write_text("[project]\nname = 'test-project'")

         mock_confirm.return_value = True
-        mock_prompt.side_effect = ["*.log, .env", "*.py", "2"]
+        mock_prompt.side_effect = ["*.log, .env", "*.py", "*.lock", "2"]

         mocker.patch('sys.argv', ['scriber', 'init'])
         cli_main()
@@ -229,6 +380,7 @@ def test_cli_init_command_creates_config_in_toml(self, mock_prompt, mock_confirm
         assert scriber_config['use_gitignore'] is True
         assert scriber_config['exclude'] == ['*.log', '.env']
         assert scriber_config['include'] == ['*.py']
+        assert scriber_config['hidden'] == ['*.lock']

     @pytest.mark.parametrize("bytes_val,
expected_str", [ (500, "500 Bytes"),
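The parametrized cases that close this chunk exercise a human-readable byte-size formatter; only the `(500, "500 Bytes")` expectation is visible here. A minimal sketch of such a helper, assuming the name `format_size` and two-decimal formatting for larger units (neither is confirmed by this diff, and this is not Scriber's actual implementation):

```python
def format_size(num_bytes: int) -> str:
    """Render a byte count as a human-readable string, e.g. 500 -> "500 Bytes".

    Sketch only: the helper name and the two-decimal "KB"/"MB"/"GB"/"TB"
    labels for larger values are assumptions, not taken from the test suite.
    """
    if num_bytes < 1024:
        # Matches the visible parametrized expectation: 500 -> "500 Bytes".
        return f"{num_bytes} Bytes"
    value = float(num_bytes)
    for unit in ("KB", "MB", "GB"):
        value /= 1024
        if value < 1024:
            return f"{value:.2f} {unit}"
    return f"{value / 1024:.2f} TB"
```

A helper like this keeps the parametrized table trivial to extend: each new `(bytes_val, expected_str)` pair pins down one unit boundary.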