Expand moderation schema and extend auxiliary test coverage #1
ezedeem223 wants to merge 7 commits into main
Conversation
Reviewer's Guide

This PR enriches the reporting schema with a direct link to the reported user, refactors several core modules (analytics, application factory, configuration, utilities) to defer heavy dependencies and improve testability, introduces a lightweight Insights feature set, and delivers an extensive suite of targeted unit tests covering caching, crypto, content filtering, link preview, media processing, moderation flows, analytics heuristics, and router registration.

Entity relationship diagram for expanded Report and User models:

```mermaid
erDiagram
    User ||--o{ Report : "reports"
    User ||--o{ Report : "reports_received"
    Report }o--|| User : "reporter"
    Report }o--|| User : "reported_user"
    Report {
        int id
        string report_reason
        int post_id
        int comment_id
        int reporter_id
        int reported_user_id
        datetime created_at
        enum status
        bool is_valid
        bool ai_detected
        float ai_confidence
    }
    User {
        int id
    }
```
Class diagram for new Insights analytics models:

```mermaid
classDiagram
    class CohortInput {
        date start_date
        int size
        int returning
    }
    class InsightsRequest {
        List~int~ daily_active_users
        List~int~ new_signups
        List~int~ churned_users
        Dict~str, int~ feature_usage
        List~CohortInput~ cohorts
        float sentiment_score
        List~str~ feedback_samples
    }
    InsightsRequest --> CohortInput
```
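As a rough illustration of the diagram above (a sketch with stdlib dataclasses, not the PR's actual pydantic models), the two Insights types could look like this; the `retention_rate` helper mirrors the guarded division quoted in the review's Comment 18:

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List


@dataclass
class CohortInput:
    start_date: date
    size: int
    returning: int

    @property
    def retention_rate(self) -> float:
        # Guard against empty cohorts before dividing.
        return 0.0 if self.size <= 0 else round(self.returning / self.size, 4)


@dataclass
class InsightsRequest:
    daily_active_users: List[int]
    new_signups: List[int]
    churned_users: List[int]
    feature_usage: Dict[str, int]
    cohorts: List[CohortInput]
    sentiment_score: float = 0.0
    feedback_samples: List[str] = field(default_factory=list)


cohort = CohortInput(start_date=date(2024, 1, 1), size=200, returning=58)
print(cohort.retention_rate)  # → 0.29
```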
Class diagram for refactored Settings and CustomConnectionConfig:

```mermaid
classDiagram
    class CustomConnectionConfig {
        ...
    }
    class Settings {
        str environment
        bool testing
        bool enable_background_tasks
        Optional~str~ database_url
        str database_hostname
        str database_port
        str database_password
        str database_name
        str database_username
        str secret_key
        str refresh_secret_key
        str algorithm
        int access_token_expire_minutes
        str google_client_id
        str google_client_secret
        str facebook_access_token
        str facebook_app_id
        str facebook_app_secret
        str twitter_api_key
        str twitter_api_secret
        str twitter_access_token
        str twitter_access_token_secret
        str huggingface_api_token
        str mail_username
        str mail_password
        EmailStr mail_from
        int mail_port
        str mail_server
        str firebase_api_key
        str firebase_auth_domain
        str firebase_project_id
        str firebase_storage_bucket
        str firebase_messaging_sender_id
        str firebase_app_id
        str firebase_measurement_id
        str default_language
        int NOTIFICATION_RETENTION_DAYS
        int MAX_BULK_NOTIFICATIONS
        int NOTIFICATION_QUEUE_TIMEOUT
        int NOTIFICATION_BATCH_SIZE
        str DEFAULT_NOTIFICATION_CHANNEL
        Optional~str~ REDIS_URL
        str CELERY_BROKER_URL
        str CELERY_BACKEND_URL
        Optional~str~ rsa_private_key_path
        Optional~str~ rsa_public_key_path
        str _rsa_private_key
        str _rsa_public_key
        redis.Redis redis_client
        str sqlalchemy_database_uri
        ConnectionConfig mail_config
    }
    Settings --> CustomConnectionConfig
```
Hey there - I've reviewed your changes and they look great!
Prompt for AI Agents
Please address the comments from this code review:
## Individual Comments
### Comment 1
<location> `app/config.py:166-168` </location>
<code_context>
+ # ------------------------------------------------------------------
+ # Derived properties
+ # ------------------------------------------------------------------
+ @property
+ def sqlalchemy_database_uri(self) -> str:
+ """Return a fully qualified SQLAlchemy connection string.
+
</code_context>
<issue_to_address>
**suggestion (bug_risk):** The fallback to SQLite in sqlalchemy_database_uri may mask misconfiguration.
Falling back to SQLite when Postgres credentials are missing may hide configuration errors in production. Recommend raising an error or warning outside development or testing environments.
Suggested implementation:
```python
@property
def sqlalchemy_database_uri(self) -> str:
    """Return a fully qualified SQLAlchemy connection string.

    Raises:
        RuntimeError: If Postgres credentials are missing outside development
            or testing environments.
    """
```
```python
# Fall back to SQLite only if Postgres credentials are missing
if not self.postgres_user or not self.postgres_password or not self.postgres_host or not self.postgres_db:
    if self.env in ("development", "testing"):
        return f"sqlite:///{self.sqlite_path}"
    else:
        raise RuntimeError(
            "Postgres credentials are missing. Refusing to fall back to SQLite "
            "outside development or testing environments."
        )
```
</issue_to_address>
### Comment 2
<location> `app/config.py:241` </location>
<code_context>
+ logger.info("Successfully read %s key from %s", key_type, path)
+ return key_data
+
+ def _generate_in_memory_key(self, key_type: str) -> str:
+ placeholder = (
+ "-----BEGIN RSA {type} KEY-----\n"
</code_context>
<issue_to_address>
**🚨 issue (security):** In-memory RSA key placeholder is insecure for production use.
Consider implementing a runtime check or warning to ensure the placeholder key is not used outside of development or testing environments.
</issue_to_address>
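One possible shape for such a guard (a minimal sketch, assuming an `environment` field on Settings and a recognizable sentinel inside the generated placeholder; the names `PLACEHOLDER_MARKER` and `ensure_real_key` are illustrative, not the PR's API):

```python
import logging

logger = logging.getLogger(__name__)

# Assumed sentinel embedded in the in-memory placeholder key (hypothetical).
PLACEHOLDER_MARKER = "FAKE KEY"


def ensure_real_key(key_data: str, environment: str) -> str:
    """Refuse to run with a placeholder RSA key outside dev/test environments."""
    if PLACEHOLDER_MARKER in key_data:
        if environment in ("development", "testing"):
            # Tolerable locally, but make it visible in the logs.
            logger.warning("Using in-memory placeholder RSA key (%s env)", environment)
        else:
            raise RuntimeError(
                f"Placeholder RSA key detected in a {environment} environment; "
                "configure rsa_private_key_path/rsa_public_key_path."
            )
    return key_data
```

With this, development keeps its convenient fallback (plus a warning), while a production deployment fails fast instead of signing tokens with a throwaway key.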
### Comment 3
<location> `app/database.py:9` </location>
<code_context>
+from .config import settings
+
+
+def _engine_kwargs(database_url: str) -> Dict:
+ """Return engine kwargs tailored for the selected backend."""
+
</code_context>
<issue_to_address>
**suggestion:** Connection pool sizes for non-SQLite databases may be too low for high concurrency.
Consider allowing pool_size and max_overflow to be configured via environment variables or settings to better support varying production workloads.
Suggested implementation:
```python
from typing import Dict
import os
from sqlalchemy import create_engine
from sqlalchemy.orm import declarative_base, sessionmaker
from .config import settings
```
```python
def _engine_kwargs(database_url: str) -> Dict:
    """Return engine kwargs tailored for the selected backend."""
    # Default pool sizes
    default_pool_size = 5
    default_max_overflow = 10
    pool_size = getattr(settings, "DB_POOL_SIZE", None) or int(
        os.getenv("DB_POOL_SIZE", default_pool_size)
    )
    max_overflow = getattr(settings, "DB_MAX_OVERFLOW", None) or int(
        os.getenv("DB_MAX_OVERFLOW", default_max_overflow)
    )
    if database_url.startswith("sqlite"):
        # SQLite does not support these options
        return {}
    else:
        return {
            "pool_size": pool_size,
            "max_overflow": max_overflow,
        }
```
You should ensure that `DB_POOL_SIZE` and `DB_MAX_OVERFLOW` are available in your `settings` object or set as environment variables in your deployment configuration. If your `settings` object uses a different naming convention or structure, adjust the attribute access accordingly.
</issue_to_address>
### Comment 4
<location> `app/config.py:28` </location>
<code_context>
- extra = Extra.ignore
-
-
-class Settings(BaseSettings):
- # AI settings
- AI_MODEL_PATH: str = "bigscience/bloom-1b7"
</code_context>
<issue_to_address>
**issue (complexity):** Consider refactoring the Settings class into smaller nested Pydantic models and using Field(env=...) for environment variables to simplify configuration management.
Here are a few ways you can dramatically simplify Settings without losing any behavior:
1. Use Pydantic’s `Field(..., env="…")` so you don’t have to call `os.getenv` everywhere.
2. Split out RSA‐fallback and Redis init into tiny helper classes or nested models.
3. Use nested Pydantic models (e.g. `DatabaseSettings`, `MailSettings`, `RSASettings`) rather than one giant class.
Below is an example sketch for RSA and Database:
```python
from pathlib import Path
from pydantic import BaseModel, Field, validator
from typing import Optional

BASE_DIR = Path(__file__).resolve().parent.parent
DEFAULT_PRIVATE_KEY = BASE_DIR / "private_key.pem"
DEFAULT_PUBLIC_KEY = BASE_DIR / "public_key.pem"


class RSASettings(BaseModel):
    private_path: Path = Field(DEFAULT_PRIVATE_KEY, env="RSA_PRIVATE_KEY_PATH")
    public_path: Path = Field(DEFAULT_PUBLIC_KEY, env="RSA_PUBLIC_KEY_PATH")

    @validator("private_path", "public_path", pre=True)
    def ensure_path(cls, v):
        p = Path(v)
        return p if p.is_absolute() else BASE_DIR / p

    def _read_or_fallback(self, path: Path, kind: str) -> str:
        if not path.exists() or path.read_text().strip() == "":
            return self._placeholder(kind)
        return path.read_text().strip()

    @property
    def private_key(self) -> str:
        return self._read_or_fallback(self.private_path, "private")

    @property
    def public_key(self) -> str:
        return self._read_or_fallback(self.public_path, "public")

    def _placeholder(self, kind: str) -> str:
        return (
            f"-----BEGIN RSA {kind.upper()} KEY-----\n"
            f"...FAKE KEY...\n"
            f"-----END RSA {kind.upper()} KEY-----"
        )
```
```python
class DatabaseSettings(BaseModel):
    url: Optional[str] = Field(None, env="DATABASE_URL")
    hostname: str = Field("localhost", env="DATABASE_HOSTNAME")
    port: str = Field("5432", env="DATABASE_PORT")
    username: str = Field("postgres", env="DATABASE_USERNAME")
    password: str = Field("password", env="DATABASE_PASSWORD")
    name: str = Field("app", env="DATABASE_NAME")

    @property
    def sqlalchemy_uri(self) -> str:
        if self.url:
            return self.url
        return (
            f"postgresql://{self.username}:{self.password}"
            f"@{self.hostname}:{self.port}/{self.name}"
        )
```
Finally your main `Settings` becomes just:
```python
from pydantic import BaseSettings, Field
import redis
from typing import Optional


class Settings(BaseSettings):
    environment: str = Field("development", env="ENVIRONMENT")
    testing: bool = Field(False, env="TESTING")
    redis_url: Optional[str] = Field(None, env="REDIS_URL")
    # …other top-level flags…

    rsa: RSASettings = RSASettings()
    db: DatabaseSettings = DatabaseSettings()
    # …mail, firebase, notifications, etc. as nested models…

    class Config:
        env_file = ".env"
        env_file_encoding = "utf-8"
        extra = "ignore"

    @property
    def redis_client(self) -> Optional[redis.Redis]:
        if not self.redis_url:
            return None
        try:
            return redis.Redis.from_url(self.redis_url)
        except Exception:
            return None
```
This:
- Moves each concern into its own class
- Lets Pydantic handle all `env` lookups
- Eliminates huge monolithic logic inside one file
- Keeps all existing behavior (fallbacks, URI logic, placeholders) exactly the same
</issue_to_address>
### Comment 5
<location> `app/utils.py:81` </location>
<code_context>
+MIN_QUALITY_THRESHOLD = 50
+
+
+def _load_transformer_pipeline(task: str, model_name: str, **kwargs):
+ if not pipeline or getattr(settings, "testing", False):
+ return None
</code_context>
<issue_to_address>
**issue (complexity):** Consider refactoring repeated optional dependency and pipeline initialization logic into centralized factory and helper functions for cleaner and more maintainable code.
Here’s a way to collapse all of your “optional‐dependency + guard + pipeline” logic into one small factory + one safe‐call helper. This removes the repeated `if not pipeline…` and try/except in each function, and centralizes all of your model specs in one place:
```python
# 1) Define your configs once
PIPELINE_SPECS = {
    "offensive": ("text-classification", "cardiffnlp/twitter-roberta-base-offensive"),
    "sentiment": ("sentiment-analysis", "distilbert-base-uncased-finetuned-sst-2-english"),
}


# 2) Factory to load them all (only once)
def _load_all_pipelines():
    if not pipeline or getattr(settings, "testing", False):
        return {name: None for name in PIPELINE_SPECS}
    out = {}
    for name, (task, model_name) in PIPELINE_SPECS.items():
        try:
            out[name] = pipeline(
                task,
                model=model_name,
                device=0 if settings.USE_GPU else -1,
            )
        except Exception as e:
            logger.warning("cannot load %s pipeline: %s", name, e)
            out[name] = None
    return out


pipelines = _load_all_pipelines()
```
```python
# 3) Safe-call helper
def _run_pipe(name, *args, default=None, **kwargs):
    p = pipelines.get(name)
    if not p:
        return default
    return p(*args, **kwargs)
```
Now your two functions shrink to:
```python
def is_content_offensive(text: str) -> tuple:
    # returns (False, 0.0) if no model
    result = _run_pipe("offensive", text, default=[{"label": "LABEL_0", "score": 0.0}])[0]
    is_off = result["label"] == "LABEL_1" and result["score"] > 0.8
    return is_off, result["score"]


def analyze_user_behavior(user_history, content: str) -> float:
    # skip sentiment if missing
    default = {"label": "NEUTRAL", "score": 0.0}
    res = _run_pipe("sentiment", content[:512], default=[default])[0]
    sentiment, score = res["label"], float(res["score"])
    base = sum(
        1 for w in content.lower().split() if w in {x.lower() for x in user_history}
    )
    return base + (score if sentiment == "POSITIVE" else 0.0)
```
You can apply the same pattern to stop‐words:
```python
from functools import lru_cache


@lru_cache()
def load_stopwords():
    try:
        import nltk

        nltk.data.find("corpora/stopwords")
        return nltk.corpus.stopwords.words("english")
    except Exception:
        return []


SPELL_STOP_WORDS = load_stopwords()
```
…then in your classifier training:
```python
vectorizer = CountVectorizer(stop_words=SPELL_STOP_WORDS or None)
```
This collapses all of your layered initialization, per‐function guards, and duplicate imports into two factories and two helpers, while preserving every piece of existing functionality.
</issue_to_address>
### Comment 6
<location> `app/utils.py:174` </location>
<code_context>
stop_words = spell_stop_words if spell_stop_words else None
</code_context>
<issue_to_address>
**suggestion (code-quality):** Replace if-expression with `or` ([`or-if-exp-identity`](https://docs.sourcery.ai/Reference/Rules-and-In-Line-Suggestions/Python/Default-Rules/or-if-exp-identity))
```suggestion
stop_words = spell_stop_words or None
```
<br/><details><summary>Explanation</summary>Here we find ourselves setting a value if it evaluates to `True`, and otherwise
using a default.
The 'After' case is a bit easier to read and avoids repeating
`spell_stop_words`.
It works because the left-hand side is evaluated first. If `spell_stop_words`
is truthy, `stop_words` is set to it and the right-hand side is never
evaluated. If it is falsy, the right-hand side is evaluated and `stop_words`
is set to `None`.
</details>
</issue_to_address>
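For illustration, the two spellings side by side, plus the one pitfall worth remembering (`or` treats every falsy value as "missing", not just empty lists):

```python
spell_stop_words = []

# Before: explicit if-expression
stop_words = spell_stop_words if spell_stop_words else None

# After: identical behavior, no repetition of spell_stop_words
stop_words = spell_stop_words or None
print(stop_words)  # → None

# Pitfall: `or` also replaces other falsy values such as 0 or "",
# so the shortcut is only safe when every falsy input should map to the default.
```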
### Comment 7
<location> `app/utils.py:646-648` </location>
<code_context>
relevance_score = sum(
1 for word in content.lower().split() if word in user_interests
)
</code_context>
<issue_to_address>
**suggestion (code-quality):** Simplify constant sum() call ([`simplify-constant-sum`](https://docs.sourcery.ai/Reference/Rules-and-In-Line-Suggestions/Python/Default-Rules/simplify-constant-sum))
```suggestion
relevance_score = sum(bool(word in user_interests)
for word in content.lower().split())
```
<br/><details><summary>Explanation</summary>As `sum` add the values it treats `True` as `1`, and `False` as `0`. We make use
of this fact to simplify the generator expression inside the `sum` call.
</details>
</issue_to_address>
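The two spellings are equivalent because Python's `bool` is a subclass of `int` (`True == 1`, `False == 0`), so booleans sum directly; a quick check with made-up inputs:

```python
content = "Loved The New Search Filters"
user_interests = {"search", "filters"}

words = content.lower().split()

before = sum(1 for word in words if word in user_interests)
after = sum(word in user_interests for word in words)  # bools summed as 0/1
print(before, after)  # → 2 2
```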
### Comment 8
<location> `tests/test_moderation.py:49-50` </location>
<code_context>
</code_context>
<issue_to_address>
**issue (code-quality):** Avoid conditionals in tests. ([`no-conditionals-in-tests`](https://docs.sourcery.ai/Reference/Rules-and-In-Line-Suggestions/Python/Default-Rules/no-conditionals-in-tests))
<details><summary>Explanation</summary>Avoid complex code, like conditionals, in test functions.
Google's software engineering guidelines says:
"Clear tests are trivially correct upon inspection"
To reach that avoid complex code in tests:
* loops
* conditionals
Some ways to fix this:
* Use parametrized tests to get rid of the loop.
* Move the complex logic into helpers.
* Move the complex part into pytest fixtures.
> Complexity is most often introduced in the form of logic. Logic is defined via the imperative parts of programming languages such as operators, loops, and conditionals. When a piece of code contains logic, you need to do a bit of mental computation to determine its result instead of just reading it off of the screen. It doesn't take much logic to make a test more difficult to reason about.
Software Engineering at Google / [Don't Put Logic in Tests](https://abseil.io/resources/swe-book/html/ch12.html#donapostrophet_put_logic_in_tests)
</details>
</issue_to_address>
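As one way to apply the "parametrize" fix with only the standard library, the branching can move out of the test body and into a data table driven through `unittest` subTests (the `classify` function below is a hypothetical stand-in, not the PR's moderation code):

```python
import unittest


def classify(text: str) -> str:
    """Hypothetical stand-in for a moderation helper under test."""
    return "flagged" if "spam" in text.lower() else "ok"


class TestClassify(unittest.TestCase):
    def test_classification(self):
        # Expected outcomes live in the data table, not in if/else branches.
        cases = [
            ("Buy spam now", "flagged"),
            ("Hello friend", "ok"),
        ]
        for text, expected in cases:
            with self.subTest(text=text):
                self.assertEqual(classify(text), expected)


# Run with: python -m unittest <module>
```

With pytest available, `@pytest.mark.parametrize` expresses the same table without even the loop, giving one reported test per case.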
### Comment 9
<location> `tests/test_moderation.py:74-75` </location>
<code_context>
</code_context>
<issue_to_address>
**issue (code-quality):** Avoid conditionals in tests. ([`no-conditionals-in-tests`](https://docs.sourcery.ai/Reference/Rules-and-In-Line-Suggestions/Python/Default-Rules/no-conditionals-in-tests))
<details><summary>Explanation</summary>Same rule as Comment 8: avoid conditionals in tests; prefer parametrized tests, helpers, or fixtures.</details>
</issue_to_address>
### Comment 10
<location> `tests/test_moderation.py:105-106` </location>
<code_context>
</code_context>
<issue_to_address>
**issue (code-quality):** Avoid conditionals in tests. ([`no-conditionals-in-tests`](https://docs.sourcery.ai/Reference/Rules-and-In-Line-Suggestions/Python/Default-Rules/no-conditionals-in-tests))
<details><summary>Explanation</summary>Same rule as Comment 8: avoid conditionals in tests; prefer parametrized tests, helpers, or fixtures.</details>
</issue_to_address>
### Comment 11
<location> `tests/test_moderation.py:107-108` </location>
<code_context>
</code_context>
<issue_to_address>
**issue (code-quality):** Avoid conditionals in tests. ([`no-conditionals-in-tests`](https://docs.sourcery.ai/Reference/Rules-and-In-Line-Suggestions/Python/Default-Rules/no-conditionals-in-tests))
<details><summary>Explanation</summary>Same rule as Comment 8: avoid conditionals in tests; prefer parametrized tests, helpers, or fixtures.</details>
</issue_to_address>
### Comment 12
<location> `tests/test_routers.py:46-49` </location>
<code_context>
</code_context>
<issue_to_address>
**issue (code-quality):** Avoid loops in tests. ([`no-loop-in-tests`](https://docs.sourcery.ai/Reference/Rules-and-In-Line-Suggestions/Python/Default-Rules/no-loop-in-tests))
<details><summary>Explanation</summary>Avoid complex code, like loops, in test functions.
Google's software engineering guidelines says:
"Clear tests are trivially correct upon inspection"
To reach that avoid complex code in tests:
* loops
* conditionals
Some ways to fix this:
* Use parametrized tests to get rid of the loop.
* Move the complex logic into helpers.
* Move the complex part into pytest fixtures.
> Complexity is most often introduced in the form of logic. Logic is defined via the imperative parts of programming languages such as operators, loops, and conditionals. When a piece of code contains logic, you need to do a bit of mental computation to determine its result instead of just reading it off of the screen. It doesn't take much logic to make a test more difficult to reason about.
Software Engineering at Google / [Don't Put Logic in Tests](https://abseil.io/resources/swe-book/html/ch12.html#donapostrophet_put_logic_in_tests)
</details>
</issue_to_address>
### Comment 13
<location> `tests/test_routers.py:48-49` </location>
<code_context>
</code_context>
<issue_to_address>
**issue (code-quality):** Avoid conditionals in tests. ([`no-conditionals-in-tests`](https://docs.sourcery.ai/Reference/Rules-and-In-Line-Suggestions/Python/Default-Rules/no-conditionals-in-tests))
<details><summary>Explanation</summary>Same rule as Comment 8: avoid conditionals in tests; prefer parametrized tests, helpers, or fixtures.</details>
</issue_to_address>
### Comment 14
<location> `tests/test_routers.py:56-61` </location>
<code_context>
</code_context>
<issue_to_address>
**issue (code-quality):** Avoid loops in tests. ([`no-loop-in-tests`](https://docs.sourcery.ai/Reference/Rules-and-In-Line-Suggestions/Python/Default-Rules/no-loop-in-tests))
<details><summary>Explanation</summary>Same rule as Comment 12: avoid loops in tests; prefer parametrized tests, helpers, or fixtures.</details>
</issue_to_address>
### Comment 15
<location> `tests/test_utils_general.py:77-80` </location>
<code_context>
</code_context>
<issue_to_address>
**issue (code-quality):** Avoid conditionals in tests. ([`no-conditionals-in-tests`](https://docs.sourcery.ai/Reference/Rules-and-In-Line-Suggestions/Python/Default-Rules/no-conditionals-in-tests))
<details><summary>Explanation</summary>Same rule as Comment 8: avoid conditionals in tests; prefer parametrized tests, helpers, or fixtures.</details>
</issue_to_address>
### Comment 16
<location> `app/analytics.py:244-248` </location>
<code_context>
def generate_search_trends_chart(lookback_days: int = 30, db: Session | None = None) -> str:
    """Return a base64-encoded PNG chart or JSON representation of search trends."""
    with _session_scope(db) as session:
        if session is None:
            records: Sequence[Tuple[datetime, int]] = []
        else:
            records = _fetch_trend_rows(session, lookback_days)
        return _render_chart(records)
</code_context>
<issue_to_address>
**suggestion (code-quality):** Replace if statement with if expression ([`assign-if-exp`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/assign-if-exp/))
```suggestion
records = [] if session is None else _fetch_trend_rows(session, lookback_days)
```
</issue_to_address>
### Comment 17
<location> `app/analytics.py:289-297` </location>
<code_context>
def update_conversation_statistics(db: Session, conversation_id: str, new_message: models.Message) -> None:
    stats = (
        db.query(models.ConversationStatistics)
        .filter(models.ConversationStatistics.conversation_id == conversation_id)
        .first()
    )
    if not stats:
        stats = models.ConversationStatistics(
            conversation_id=conversation_id,
            user1_id=min(new_message.sender_id, new_message.receiver_id),
            user2_id=max(new_message.sender_id, new_message.receiver_id),
            total_messages=0,
            total_files=0,
            total_emojis=0,
            total_stickers=0,
            total_response_time=0,
            total_responses=0,
            average_response_time=0,
            last_message_at=None,
        )
        db.add(stats)
    stats.total_messages += 1
    stats.last_message_at = func.now()
    if new_message.attachments:
        stats.total_files += len(new_message.attachments)
    if getattr(new_message, "has_emoji", False):
        stats.total_emojis += 1
    if getattr(new_message, "message_type", "") == "sticker":
        stats.total_stickers += 1
    last_message = (
        db.query(models.Message)
        .filter(
            models.Message.conversation_id == conversation_id,
            models.Message.id != new_message.id,
        )
        .order_by(models.Message.created_at.desc())
        .first()
    )
    if last_message:
        time_diff = (new_message.created_at - last_message.created_at).total_seconds()
        stats.total_response_time += time_diff
        stats.total_responses += 1
        stats.average_response_time = stats.total_response_time / max(stats.total_responses, 1)
    db.commit()
</code_context>
<issue_to_address>
**issue (code-quality):** Use named expression to simplify assignment and conditional ([`use-named-expression`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/use-named-expression/))
</issue_to_address>
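A minimal before/after of the named-expression rule, with a plain dict lookup standing in for the `db.query(...).first()` call above (the stub names are illustrative):

```python
def find_stats(store, conversation_id):
    """Stand-in for the SQLAlchemy .first() lookup."""
    return store.get(conversation_id)


store = {}

# Before: separate assignment and truthiness check
stats = find_stats(store, "c1")
if not stats:
    stats = {"total_messages": 0}
    store["c1"] = stats

# After: the walrus operator folds the lookup into the condition
if not (stats := find_stats(store, "c2")):
    stats = {"total_messages": 0}
    store["c2"] = stats

print(sorted(store))  # → ['c1', 'c2']
```

In the real function the same transformation applies to both the `stats` and the `last_message` lookups.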
### Comment 18
<location> `app/insights.py:34-36` </location>
<code_context>
@property
def retention_rate(self) -> float:
    """Return the retention percentage for the cohort."""
    if self.size <= 0:
        return 0.0
    return round(self.returning / self.size, 4)
</code_context>
<issue_to_address>
**suggestion (code-quality):** We've found these issues:
- Lift code into else after jump in control flow ([`reintroduce-else`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/reintroduce-else/))
- Replace if statement with if expression ([`assign-if-exp`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/assign-if-exp/))
```suggestion
return 0.0 if self.size <= 0 else round(self.returning / self.size, 4)
```
</issue_to_address>
### Comment 19
<location> `app/routers/insights.py:36-37` </location>
<code_context>
@router.post("/summary")
def summarize_product_health(request: InsightsRequest):
    """Return a holistic summary that blends retention, growth, and sentiment."""
    cohorts = _convert_cohorts(request.cohorts)
    sentiment_score = request.sentiment_score
    feedback_details: List[str] = []
    if request.feedback_samples:
        sentiments = [
            analytics.analyze_content(text)["sentiment"]["score"]
            for text in request.feedback_samples
        ]
        if sentiments:
            sentiment_score = sum(sentiments) / len(sentiments)
        feedback_details = request.feedback_samples
    churned = request.churned_users or [0] * len(request.new_signups)
    summary = build_insight_summary(
        cohorts=cohorts,
        daily_active_users=request.daily_active_users,
        new_signups=request.new_signups,
        churned_users=churned,
        feature_usage=request.feature_usage,
        sentiment_score=sentiment_score,
    )
    summary["feedback"] = {
        "sample_count": len(feedback_details),
        "notes": feedback_details[:10],
        "average_sentiment": round(sentiment_score, 4),
    }
    return summary
</code_context>
<issue_to_address>
**suggestion (code-quality):** Use named expression to simplify assignment and conditional ([`use-named-expression`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/use-named-expression/))
```suggestion
if sentiments := [
analytics.analyze_content(text)["sentiment"]["score"]
for text in request.feedback_samples
]:
```
</issue_to_address>
### Comment 20
<location> `app/routers/search.py:54` </location>
<code_context>
@router.post("/", response_model=SearchResponse)
async def search(
    search_params: SearchParams,
    db: Session = Depends(get_db),
    current_user: models.User = Depends(oauth2.get_current_user),
):
    """
    Main search endpoint.

    - Receives search parameters from the user.
    - Records the search query.
    - Gets spell-check suggestions.
    - Searches posts and sorts results.
    - Retrieves popular and user search suggestions.
    - Caches results for one hour using Redis.

    Returns a SearchResponse with results, spell suggestion, and search suggestions.
    """
    cache_key = f"search:{search_params.query}:{search_params.sort_by}"
    cache = _get_cache()
    if cache:
        cached_result = cache.get(cache_key)
        if cached_result:
            return json.loads(cached_result)
    # Record the search query
    record_search_query(db, search_params.query, current_user.id)
    suggestions = get_spell_suggestions(search_params.query)
    spell_suggestion = format_spell_suggestions(search_params.query, suggestions)
    results = search_posts(search_params.query, db)
    sorted_results = sort_search_results(results, search_params.sort_by, db)
    # Get search suggestions from popular and user history
    popular_searches = get_popular_searches(db, limit=3)
    user_searches = get_user_searches(db, current_user.id, limit=2)
    search_suggestions = list(
        set([stat.query for stat in popular_searches + user_searches])
    )
    search_response = {
        "results": sorted_results,
        "spell_suggestion": spell_suggestion,
        "search_suggestions": search_suggestions,
    }
    if results and cache:
        cache.setex(cache_key, 3600, json.dumps(search_response))
    return search_response
</code_context>
<issue_to_address>
**issue (code-quality):** We've found these issues:
- Use named expression to simplify assignment and conditional ([`use-named-expression`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/use-named-expression/))
- Replace list(), dict() or set() with comprehension ([`collection-builtin-to-comprehension`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/collection-builtin-to-comprehension/))
- Replace unneeded comprehension with generator ([`comprehension-to-generator`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/comprehension-to-generator/))
</issue_to_address>
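For illustration, here is a hedged standalone sketch of the first two refactorings applied to the patterns in the handler above (the helper names and the fake cache are hypothetical, not from the codebase):

```python
import json


def get_cached(cache, cache_key):
    # use-named-expression: assign and test the cached payload in one step
    if cache and (cached_result := cache.get(cache_key)):
        return json.loads(cached_result)
    return None


def merge_suggestions(popular_searches, user_searches):
    # collection-builtin-to-comprehension: set([stat.query for ...]) becomes
    # a set comprehension, then back to a list for the response payload
    return list({stat.query for stat in popular_searches + user_searches})
```

The third refactoring (comprehension-to-generator) applies wherever a list comprehension is passed directly to a builtin such as `sum(...)` or `set(...)`; dropping the brackets avoids building an intermediate list.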
### Comment 21
<location> `app/routers/search.py:174-175` </location>
<code_context>
@router.get("/autocomplete", response_model=List[schemas.SearchSuggestionOut])
async def autocomplete(
query: str = Query(..., min_length=1),
db: Session = Depends(get_db),
limit: int = 10,
):
"""
Return autocomplete suggestions for a given query.
- Searches for suggestions starting with the query.
- Orders by frequency and caches results for 5 minutes.
"""
cache_key = f"autocomplete:{query}"
cache = _get_cache()
if cache:
cached_result = cache.get(cache_key)
if cached_result:
return json.loads(cached_result)
suggestions = (
db.query(models.SearchSuggestion)
.filter(models.SearchSuggestion.term.ilike(f"{query}%"))
.order_by(models.SearchSuggestion.frequency.desc())
.limit(limit)
.all()
)
result = [schemas.SearchSuggestionOut.from_orm(s) for s in suggestions]
if cache:
cache.setex(cache_key, 300, json.dumps([s.dict() for s in result]))
return result
</code_context>
<issue_to_address>
**suggestion (code-quality):** Use named expression to simplify assignment and conditional ([`use-named-expression`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/use-named-expression/))
```suggestion
if cached_result := cache.get(cache_key):
```
</issue_to_address>
### Comment 22
<location> `app/utils.py:639` </location>
<code_context>
def analyze_user_behavior(user_history, content: str) -> float:
"""
Analyzes user behavior based on search history and the sentiment of the content.
Returns a relevance score.
"""
user_interests = set(item.lower() for item in user_history)
sentiment = "NEUTRAL"
score = 0.0
if sentiment_pipeline:
result = sentiment_pipeline(content[:512])[0]
sentiment = result.get("label", "NEUTRAL")
score = float(result.get("score", 0.0))
relevance_score = sum(
1 for word in content.lower().split() if word in user_interests
)
relevance_score += score if sentiment == "POSITIVE" else 0
return relevance_score
</code_context>
<issue_to_address>
**suggestion (code-quality):** Replace list(), dict() or set() with comprehension ([`collection-builtin-to-comprehension`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/collection-builtin-to-comprehension/))
```suggestion
user_interests = {item.lower() for item in user_history}
```
</issue_to_address>
```python
# Derived properties
# ------------------------------------------------------------------
@property
def sqlalchemy_database_uri(self) -> str:
```
suggestion (bug_risk): The fallback to SQLite in sqlalchemy_database_uri may mask misconfiguration.
Falling back to SQLite when Postgres credentials are missing may hide configuration errors in production. Recommend raising an error or warning outside development or testing environments.
Suggested implementation:

```python
@property
def sqlalchemy_database_uri(self) -> str:
    """Return a fully qualified SQLAlchemy connection string.

    Raises:
        RuntimeError: If Postgres credentials are missing outside
            development or testing environments.
    """
    # Fall back to SQLite if Postgres credentials are missing
    if not self.postgres_user or not self.postgres_password or not self.postgres_host or not self.postgres_db:
        if self.env in ("development", "testing"):
            return f"sqlite:///{self.sqlite_path}"
        raise RuntimeError(
            "Postgres credentials are missing. Refusing to fall back to "
            "SQLite outside development or testing environments."
        )
```

```python
logger.info("Successfully read %s key from %s", key_type, path)
return key_data


def _generate_in_memory_key(self, key_type: str) -> str:
```
🚨 issue (security): In-memory RSA key placeholder is insecure for production use.
Consider implementing a runtime check or warning to ensure the placeholder key is not used outside of development or testing environments.
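One possible guard, sketched with assumed names (the `environment` value and the `FAKE KEY` marker come from the placeholder format suggested elsewhere in this review; adjust to the real settings fields):

```python
def ensure_real_key(key_data: str, environment: str) -> str:
    """Refuse to run with a generated placeholder key outside dev/test."""
    # marker assumed from the placeholder PEM format used by the fallback
    is_placeholder = "FAKE KEY" in key_data
    if is_placeholder and environment not in ("development", "testing"):
        raise RuntimeError(
            f"In-memory placeholder RSA key detected in {environment!r}; "
            "provide RSA_PRIVATE_KEY_PATH / RSA_PUBLIC_KEY_PATH instead."
        )
    return key_data
```

Calling this once at startup (rather than at every use site) keeps the failure loud and early.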
```python
from .config import settings


def _engine_kwargs(database_url: str) -> Dict:
```
suggestion: Connection pool sizes for non-SQLite databases may be too low for high concurrency.
Consider allowing pool_size and max_overflow to be configured via environment variables or settings to better support varying production workloads.
Suggested implementation:

```python
from typing import Dict
import os

from sqlalchemy import create_engine
from sqlalchemy.orm import declarative_base, sessionmaker

from .config import settings


def _engine_kwargs(database_url: str) -> Dict:
    """Return engine kwargs tailored for the selected backend."""
    # Default pool sizes
    default_pool_size = 5
    default_max_overflow = 10
    pool_size = getattr(settings, "DB_POOL_SIZE", None) or int(
        os.getenv("DB_POOL_SIZE", default_pool_size)
    )
    max_overflow = getattr(settings, "DB_MAX_OVERFLOW", None) or int(
        os.getenv("DB_MAX_OVERFLOW", default_max_overflow)
    )
    if database_url.startswith("sqlite"):
        # SQLite does not support these options
        return {}
    return {
        "pool_size": pool_size,
        "max_overflow": max_overflow,
    }
```

You should ensure that DB_POOL_SIZE and DB_MAX_OVERFLOW are available in your settings object or set as environment variables in your deployment configuration. If your settings object uses a different naming convention or structure, adjust the attribute access accordingly.
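To make the override behavior concrete, here is a minimal standalone sketch (assuming only the `DB_POOL_SIZE`/`DB_MAX_OVERFLOW` names from the suggestion; the settings-object branch is omitted):

```python
import os


def pool_kwargs(database_url: str) -> dict:
    """Engine kwargs with env-var overrides; SQLite gets none."""
    if database_url.startswith("sqlite"):
        # SQLite uses an in-process file handle; server-style pooling
        # options like pool_size/max_overflow do not apply
        return {}
    return {
        "pool_size": int(os.getenv("DB_POOL_SIZE", 5)),
        "max_overflow": int(os.getenv("DB_MAX_OVERFLOW", 10)),
    }
```

With `DB_POOL_SIZE=20` exported in the deployment environment, a Postgres URL would yield a pool size of 20 instead of the default 5, without a code change.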
```python
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
```
issue (complexity): Consider refactoring the `Settings` class into smaller nested Pydantic models and using `Field(env=...)` for environment variables to simplify configuration management.
Here are a few ways you can dramatically simplify `Settings` without losing any behavior:
- Use Pydantic's `Field(..., env="…")` so you don't have to call `os.getenv` everywhere.
- Split out RSA-fallback and Redis init into tiny helper classes or nested models.
- Use nested Pydantic models (e.g. `DatabaseSettings`, `MailSettings`, `RSASettings`) rather than one giant class.
Below is an example sketch for RSA and Database:

```python
from pathlib import Path
from typing import Optional

from pydantic import BaseModel, Field, validator

BASE_DIR = Path(__file__).resolve().parent.parent
DEFAULT_PRIVATE_KEY = BASE_DIR / "private_key.pem"
DEFAULT_PUBLIC_KEY = BASE_DIR / "public_key.pem"


class RSASettings(BaseModel):
    private_path: Path = Field(DEFAULT_PRIVATE_KEY, env="RSA_PRIVATE_KEY_PATH")
    public_path: Path = Field(DEFAULT_PUBLIC_KEY, env="RSA_PUBLIC_KEY_PATH")

    @validator("private_path", "public_path", pre=True)
    def ensure_path(cls, v):
        p = Path(v)
        return p if p.is_absolute() else BASE_DIR / p

    def _read_or_fallback(self, path: Path, kind: str) -> str:
        if not path.exists() or path.read_text().strip() == "":
            return self._placeholder(kind)
        return path.read_text().strip()

    @property
    def private_key(self) -> str:
        return self._read_or_fallback(self.private_path, "private")

    @property
    def public_key(self) -> str:
        return self._read_or_fallback(self.public_path, "public")

    def _placeholder(self, kind: str) -> str:
        return (
            f"-----BEGIN RSA {kind.upper()} KEY-----\n"
            f"...FAKE KEY...\n"
            f"-----END RSA {kind.upper()} KEY-----"
        )


class DatabaseSettings(BaseModel):
    url: Optional[str] = Field(None, env="DATABASE_URL")
    hostname: str = Field("localhost", env="DATABASE_HOSTNAME")
    port: str = Field("5432", env="DATABASE_PORT")
    username: str = Field("postgres", env="DATABASE_USERNAME")
    password: str = Field("password", env="DATABASE_PASSWORD")
    name: str = Field("app", env="DATABASE_NAME")

    @property
    def sqlalchemy_uri(self) -> str:
        if self.url:
            return self.url
        return (
            f"postgresql://{self.username}:{self.password}"
            f"@{self.hostname}:{self.port}/{self.name}"
        )
```

Finally your main Settings becomes just:
```python
from typing import Optional

import redis
from pydantic import BaseSettings, Field


class Settings(BaseSettings):
    environment: str = Field("development", env="ENVIRONMENT")
    testing: bool = Field(False, env="TESTING")
    redis_url: Optional[str] = Field(None, env="REDIS_URL")
    # …other top-level flags…

    rsa: RSASettings = RSASettings()
    db: DatabaseSettings = DatabaseSettings()
    # …mail, firebase, notifications, etc. as nested models…

    class Config:
        env_file = ".env"
        env_file_encoding = "utf-8"
        extra = "ignore"

    @property
    def redis_client(self) -> Optional[redis.Redis]:
        if not self.redis_url:
            return None
        try:
            return redis.Redis.from_url(self.redis_url)
        except Exception:
            return None
```

This:
- Moves each concern into its own class
- Lets Pydantic handle all `env` lookups
- Eliminates huge monolithic logic inside one file
- Keeps all existing behavior (fallbacks, URI logic, placeholders) exactly the same
```python
MIN_QUALITY_THRESHOLD = 50


def _load_transformer_pipeline(task: str, model_name: str, **kwargs):
```
issue (complexity): Consider refactoring repeated optional-dependency and pipeline initialization logic into centralized factory and helper functions for cleaner and more maintainable code.
Here's a way to collapse all of your "optional-dependency + guard + pipeline" logic into one small factory + one safe-call helper. This removes the repeated `if not pipeline…` and `try/except` in each function, and centralizes all of your model specs in one place:
```python
# 1) Define your configs once
PIPELINE_SPECS = {
    "offensive": ("text-classification", "cardiffnlp/twitter-roberta-base-offensive"),
    "sentiment": ("sentiment-analysis", "distilbert-base-uncased-finetuned-sst-2-english"),
}


# 2) Factory to load them all (only once)
def _load_all_pipelines():
    if not pipeline or getattr(settings, "testing", False):
        return {name: None for name in PIPELINE_SPECS}
    out = {}
    for name, (task, model_name) in PIPELINE_SPECS.items():
        try:
            out[name] = pipeline(
                task,
                model=model_name,
                device=0 if settings.USE_GPU else -1,
            )
        except Exception as e:
            logger.warning("cannot load %s pipeline: %s", name, e)
            out[name] = None
    return out


pipelines = _load_all_pipelines()


# 3) Safe-call helper
def _run_pipe(name, *args, default=None, **kwargs):
    p = pipelines.get(name)
    if not p:
        return default
    return p(*args, **kwargs)
```

Now your two functions shrink to:
```python
def is_content_offensive(text: str) -> tuple:
    # returns (False, 0.0) if no model
    result = _run_pipe("offensive", text, default=[{"label": "LABEL_0", "score": 0.0}])[0]
    is_off = result["label"] == "LABEL_1" and result["score"] > 0.8
    return is_off, result["score"]


def analyze_user_behavior(user_history, content: str) -> float:
    # skip sentiment if missing
    default = {"label": "NEUTRAL", "score": 0.0}
    res = _run_pipe("sentiment", content[:512], default=[default])[0]
    sentiment, score = res["label"], float(res["score"])
    base = sum(
        1 for w in content.lower().split()
        if w in {x.lower() for x in user_history}
    )
    return base + (score if sentiment == "POSITIVE" else 0.0)
```

You can apply the same pattern to stop-words:
```python
from functools import lru_cache


@lru_cache()
def load_stopwords():
    try:
        import nltk

        nltk.data.find("corpora/stopwords")
        return nltk.corpus.stopwords.words("english")
    except Exception:
        return []


SPELL_STOP_WORDS = load_stopwords()
```

…then in your classifier training:

```python
vectorizer = CountVectorizer(stop_words=SPELL_STOP_WORDS or None)
```

This collapses all of your layered initialization, per-function guards, and duplicate imports into two factories and two helpers, while preserving every piece of existing functionality.
```python
if self.size <= 0:
    return 0.0
return round(self.returning / self.size, 4)
```
suggestion (code-quality): We've found these issues:
- Lift code into else after jump in control flow (`reintroduce-else`)
- Replace if statement with if expression (`assign-if-exp`)
```suggestion
return 0.0 if self.size <= 0 else round(self.returning / self.size, 4)
```
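As a quick check, the one-line form behaves identically. A minimal sketch of the cohort model with field names taken from the `CohortInput` class diagram (the `retention_rate` property name is hypothetical):

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class CohortInput:
    start_date: date
    size: int
    returning: int

    @property
    def retention_rate(self) -> float:
        # guard against empty cohorts, then round to 4 decimal places
        return 0.0 if self.size <= 0 else round(self.returning / self.size, 4)
```

The conditional expression keeps the zero-size guard explicit while removing the early return.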
```python
sentiments = [analytics.analyze_content(text)["sentiment"]["score"] for text in request.feedback_samples]
if sentiments:
```
suggestion (code-quality): Use named expression to simplify assignment and conditional (use-named-expression)
```suggestion
if sentiments := [
    analytics.analyze_content(text)["sentiment"]["score"]
    for text in request.feedback_samples
]:
```
| cache_key = f"search:{search_params.query}:{search_params.sort_by}" | ||
| cache = _get_cache() | ||
| if cache: | ||
| cached_result = cache.get(cache_key) |
issue (code-quality): We've found these issues:
- Use named expression to simplify assignment and conditional (`use-named-expression`)
- Replace list(), dict() or set() with comprehension (`collection-builtin-to-comprehension`)
- Replace unneeded comprehension with generator (`comprehension-to-generator`)
```python
cached_result = cache.get(cache_key)
if cached_result:
```
suggestion (code-quality): Use named expression to simplify assignment and conditional (use-named-expression)
```suggestion
if cached_result := cache.get(cache_key):
```
```python
Analyzes user behavior based on search history and the sentiment of the content.
Returns a relevance score.
"""
user_interests = set(item.lower() for item in user_history)
```
suggestion (code-quality): Replace list(), dict() or set() with comprehension (collection-builtin-to-comprehension)
```suggestion
user_interests = {item.lower() for item in user_history}
```
Summary
Testing
https://chatgpt.com/codex/tasks/task_e_69031b026a5483328f15f506c6df8f83
Summary by Sourcery
Expand the moderation schema with a missing reported_user linkage and bolster code quality by refactoring core utility modules to work without heavy optional dependencies, introducing an application factory pattern, and adding a new lightweight insights endpoint. Extend test coverage across caching, crypto, content filtering, link preview, media processing, analytics, notifications, moderation automation, and utility flows.
New Features:
Enhancements:
Tests: