Merged
2 changes: 1 addition & 1 deletion README.md
@@ -32,7 +32,7 @@ LabelU supports one-click loading of pre-annotated data, which can be refined an
LabelU integrates AI model services for automatic annotation of image data. Click the "AI Annotate" button on the annotation page to have the model automatically detect and segment objects. It also supports batch annotation of an entire task, with real-time progress tracking. Three reference model servers are provided out of the box:

- **Florence-2** — lightweight, CPU-friendly (~4GB VRAM)
- **GroundingDINO + EfficientSAM** — high-quality detection + segmentation (~4GB VRAM)
- **GroundingDINO + SAM ViT-B** — high-quality detection + segmentation (~4GB VRAM)
- **SAM 3** — state-of-the-art unified model (~8GB VRAM, requires high-end GPU)

See [`model_server/README.md`](./model_server/README.md) for setup instructions.
2 changes: 1 addition & 1 deletion README_zh-CN.md
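@@ -32,7 +32,7 @@ LabelU supports one-click loading of pre-annotated data, and users can, based on their actual needs,
LabelU integrates AI model services for automatic annotation of image data. Click the "AI Annotate" button on the annotation page to have the model automatically detect and segment objects. Batch annotation of all unannotated samples in an entire task is also supported, with real-time progress tracking. The project ships with three reference model servers:

- **Florence-2**: lightweight, CPU-friendly (~4GB VRAM)
-- **GroundingDINO + EfficientSAM**: high-quality detection + segmentation (~4GB VRAM)
+- **GroundingDINO + SAM ViT-B**: high-quality detection + segmentation (~4GB VRAM)
- **SAM 3**: latest-generation unified model (~8GB VRAM, requires a high-end GPU)

See [`model_server/README.md`](./model_server/README.md) for deployment instructions.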
10 changes: 5 additions & 5 deletions model_server/README.md
@@ -4,7 +4,7 @@ Reference model server implementations for LabelU's auto-annotation feature. Three options are provided, all

## Option comparison

-| | Florence-2 | GroundingDINO + EfficientSAM | SAM 3 |
+| | Florence-2 | GroundingDINO + SAM ViT-B | SAM 3 |
|---|---|---|---|
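| Architecture | Single model (detection and segmentation as separate steps) | Two models in series | Single unified model |
| Open vocabulary | Yes | Yes | Yes (4M+ concepts) |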
@@ -21,7 +21,7 @@ Reference model server implementations for LabelU's auto-annotation feature. Three options are provided, all

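**Recommendations**:
- High-end GPU such as a 4090/A100 → **SAM 3** (best quality; a single model is the simplest to run)
-- Mid-range GPU (e.g. 1660/2060) → **GroundingDINO + EfficientSAM**
+- Mid-range GPU (e.g. 1660/2060) → **GroundingDINO + SAM ViT-B**
- CPU only, or limited VRAM → **Florence-2**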
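The recommendations above boil down to a decision rule keyed on available GPU memory. The sketch below encodes it in Python as a hypothetical helper, `recommend_model_server`, that is not part of LabelU. The returned strings are the `AI_MODEL_NAME` values listed in this README, while the 8 GB and 4 GB thresholds are assumptions taken from the approximate VRAM figures in the project README rather than measured limits.

```python
from typing import Optional

# AI_MODEL_NAME values listed in model_server/README.md.
FLORENCE2 = "florence-2-base"
DINO_SAM = "grounding-dino-tiny+sam-vit-base"
SAM3 = "sam3"


def recommend_model_server(
    vram_gb: Optional[float],
    sam3_min_vram_gb: float = 8.0,
    dino_sam_min_vram_gb: float = 4.0,
) -> str:
    """Return the AI_MODEL_NAME of the suggested reference server.

    vram_gb=None means CPU only. Both thresholds are assumptions derived
    from the approximate VRAM figures in the README, not measured limits.
    """
    if vram_gb is None or vram_gb < dino_sam_min_vram_gb:
        # CPU only, or too little VRAM for the two-model pipeline.
        return FLORENCE2
    if vram_gb >= sam3_min_vram_gb:
        # Enough memory for the single unified model.
        return SAM3
    # Mid-range card, e.g. a 6 GB GTX 1660 or RTX 2060.
    return DINO_SAM


if __name__ == "__main__":
    for vram in (None, 2.0, 6.0, 24.0):
        print(vram, "->", recommend_model_server(vram))
```

Because the README recommends SAM 3 specifically for 4090/A100-class cards, a caller may want to raise `sam3_min_vram_gb` well above the bare ~8 GB figure.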

## Quick start
@@ -34,7 +34,7 @@ pip install -r requirements.txt
python server.py --device cpu --port 5000
```

-### GroundingDINO + EfficientSAM
+### GroundingDINO + SAM ViT-B

```bash
cd model_server/grounding_dino_sam
@@ -66,7 +66,7 @@ cd model_server/florence2
docker build -t labelu-florence2 .
docker run -p 5000:5000 labelu-florence2

-# GroundingDINO + EfficientSAM (GPU)
+# GroundingDINO + SAM ViT-B (GPU)
cd model_server/grounding_dino_sam
docker build -t labelu-dino-sam .
docker run --gpus all -p 5000:5000 labelu-dino-sam python server.py --device cuda
@@ -85,7 +85,7 @@ docker run --gpus all -p 5000:5000 labelu-sam3
AI_AUTO_LABEL_ENABLED=true
AI_MODEL_ENDPOINT=http://localhost:5000/
AI_MODEL_TIMEOUT_SECONDS=60
-AI_MODEL_NAME=florence-2-base # or grounding-dino-tiny+efficient-sam / sam3
+AI_MODEL_NAME=florence-2-base # or grounding-dino-tiny+sam-vit-base / sam3
```
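The settings above tell LabelU where the model server lives and how long to wait for it. Below is a minimal client-side sketch of how such settings could be read and turned into a request to the server's `POST /` endpoint, using only the standard library. Only the variable names and the request field names (`request_id`, `image_url`, `labels`, `constraints`, `prompt`) come from this PR. The helper names, the defaults, and the value types (a list of label strings, a dict of constraints, an optional prompt string) are assumptions for illustration, not LabelU's actual backend code.

```python
import json
import os
import urllib.request
import uuid
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class AutoLabelConfig:
    enabled: bool
    endpoint: str
    timeout_seconds: float
    model_name: str


def load_config(env: Mapping[str, str] = os.environ) -> AutoLabelConfig:
    """Read the AI_* variables; every default here is an assumption."""
    return AutoLabelConfig(
        enabled=env.get("AI_AUTO_LABEL_ENABLED", "false").strip().lower() in ("1", "true", "yes"),
        endpoint=env.get("AI_MODEL_ENDPOINT", "http://localhost:5000/"),
        timeout_seconds=float(env.get("AI_MODEL_TIMEOUT_SECONDS", "60")),
        model_name=env.get("AI_MODEL_NAME", "florence-2-base"),
    )


def build_predict_request(
    cfg: AutoLabelConfig,
    image_url: str,
    labels: list[str],
    constraints: Optional[dict] = None,
    prompt: Optional[str] = None,
    request_id: Optional[str] = None,
) -> urllib.request.Request:
    """Build (but do not send) a JSON POST carrying the documented fields."""
    payload = {
        "request_id": request_id or str(uuid.uuid4()),
        "image_url": image_url,
        "labels": labels,
        "constraints": constraints or {},
        "prompt": prompt,
    }
    return urllib.request.Request(
        cfg.endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(cfg: AutoLabelConfig, req: urllib.request.Request) -> dict:
    """Send the request, honouring AI_MODEL_TIMEOUT_SECONDS, and decode the JSON reply."""
    with urllib.request.urlopen(req, timeout=cfg.timeout_seconds) as resp:
        return json.load(resp)
```

Keeping request construction separate from `send` lets the payload be inspected without a running server. Per `PredictResponse` in `server.py`, the decoded reply should carry `model`, `latency_ms`, `results` and `warning_message`.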

## API protocol
6 changes: 3 additions & 3 deletions model_server/grounding_dino_sam/Dockerfile
@@ -13,9 +13,9 @@ RUN python -c "\
from transformers import AutoProcessor, AutoModelForZeroShotObjectDetection; \
AutoProcessor.from_pretrained('IDEA-Research/grounding-dino-tiny'); \
AutoModelForZeroShotObjectDetection.from_pretrained('IDEA-Research/grounding-dino-tiny'); \
-from transformers import EfficientSamModel, SamImageProcessor; \
-SamImageProcessor.from_pretrained('ybelkada/efficient-sam-vitt'); \
-EfficientSamModel.from_pretrained('ybelkada/efficient-sam-vitt')"
+from transformers import SamModel, SamProcessor; \
+SamProcessor.from_pretrained('facebook/sam-vit-base'); \
+SamModel.from_pretrained('facebook/sam-vit-base')"

EXPOSE 5000
CMD ["python", "server.py", "--device", "cpu"]
46 changes: 20 additions & 26 deletions model_server/grounding_dino_sam/server.py
@@ -1,8 +1,8 @@
"""
LabelU Model Server — GroundingDINO + EfficientSAM
LabelU Model Server — GroundingDINO + SAM

High-quality reference implementation for open-vocabulary detection
(GroundingDINO) paired with segmentation (EfficientSAM).
(GroundingDINO) paired with segmentation (Segment Anything).

Implements the LabelU auto-label model API protocol:
POST / → { request_id, image_url, labels, constraints, prompt }
@@ -34,7 +34,7 @@
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("dino-sam-server")

-app = FastAPI(title="LabelU GroundingDINO + EfficientSAM Model Server")
+app = FastAPI(title="LabelU GroundingDINO + SAM Model Server")

# ── globals ───────────────────────────────────────────────────────────
dino_model = None
@@ -44,8 +44,8 @@
device = "cpu"

DINO_MODEL_ID = "IDEA-Research/grounding-dino-tiny"
-SAM_MODEL_ID = "ybelkada/efficient-sam-vitt"
-MODEL_LABEL = "grounding-dino-tiny+efficient-sam"
+SAM_MODEL_ID = "facebook/sam-vit-base"
+MODEL_LABEL = "grounding-dino-tiny+sam-vit-base"

BOX_THRESHOLD = 0.25
TEXT_THRESHOLD = 0.25
@@ -122,28 +122,22 @@ def _detect_objects(


def _segment_box(image: Image.Image, box_xyxy: list[float]) -> np.ndarray | None:
-    """Run EfficientSAM on a single bounding box prompt. Returns binary mask."""
+    """Run SAM on a single bounding box prompt. Returns binary mask."""
    if sam_model is None:
        return None

-    w, h = image.size
-    input_points = torch.tensor([[[
-        [box_xyxy[0] / w, box_xyxy[1] / h],
-        [box_xyxy[2] / w, box_xyxy[3] / h],
-    ]]]).to(device)
-    input_labels = torch.tensor([[[2, 3]]]).to(device)  # box prompt: top-left=2, bottom-right=3
-
-    inputs = sam_processor(image, input_points=input_points, input_labels=input_labels, return_tensors="pt").to(device)
+    input_boxes = [[box_xyxy]]
+    inputs = sam_processor(image, input_boxes=input_boxes, return_tensors="pt").to(device)
    with torch.inference_mode():
-        outputs = sam_model(**inputs)
+        outputs = sam_model(**inputs, multimask_output=False)

    mask = sam_processor.image_processor.post_process_masks(
        outputs.pred_masks.cpu(),
        inputs["original_sizes"].cpu(),
        inputs["reshaped_input_sizes"].cpu(),
    )[0]

-    mask_np = mask[0, 0].numpy().astype(np.uint8)
+    mask_np = (mask[0, 0].numpy() > 0).astype(np.uint8)
    return mask_np


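The rewritten `_segment_box` no longer normalizes the box corners into point prompts labelled 2 and 3. It passes the box to `SamProcessor` in absolute pixel coordinates of the original image (`input_boxes=[[box_xyxy]]`). As we understand it, the processor then maps those coordinates into SAM's resized input frame, where the longest image side is scaled to 1024 pixels, and `post_process_masks` maps the predicted mask back to the original size. The sketch below reproduces that resize-longest-side transform in plain Python, following the approach of the original Segment Anything `ResizeLongestSide` utility, to show what happens to a box. It is an illustration, not transformers' internal code, and the 1024 default is SAM's standard input size, stated here as an assumption.

```python
def preprocess_shape(height: int, width: int, longest_edge: int = 1024) -> tuple[int, int]:
    """(height, width) of the image after scaling its longest side to `longest_edge`."""
    scale = longest_edge / max(height, width)
    # Round to the nearest integer pixel size.
    return int(height * scale + 0.5), int(width * scale + 0.5)


def rescale_box(box_xyxy: list[float], height: int, width: int, longest_edge: int = 1024) -> list[float]:
    """Map an absolute (x1, y1, x2, y2) box from the original image into the resized frame."""
    new_h, new_w = preprocess_shape(height, width, longest_edge)
    sx, sy = new_w / width, new_h / height
    x1, y1, x2, y2 = box_xyxy
    return [x1 * sx, y1 * sy, x2 * sx, y2 * sy]


if __name__ == "__main__":
    # A 1920x1080 frame is resized to 1024x576, so a box covering the
    # bottom-right quadrant ends up at roughly [512, 288, 1024, 576].
    print(preprocess_shape(1080, 1920))
    print(rescale_box([960, 540, 1920, 1080], 1080, 1920))
```

With `multimask_output=False` the model returns a single mask per box rather than three candidates, which is why `mask[0, 0]` is enough to select it. The `> 0` comparison then guarantees a 0/1 `uint8` mask regardless of whether the post-processed tensor holds booleans or logits.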
@@ -218,7 +212,7 @@ async def predict(req: PredictRequest) -> PredictResponse:
))

    latency = int((time.perf_counter() - start) * 1000)
-    warning = None if sam_model else "EfficientSAM not loaded; polygon results use bounding boxes"
+    warning = None if sam_model else "SAM not loaded; polygon results use bounding boxes"
    return PredictResponse(model=MODEL_LABEL, latency_ms=latency, results=results, warning_message=warning)


@@ -230,18 +224,18 @@ async def health():
# ── entrypoint ────────────────────────────────────────────────────────
def main():
    global dino_model, dino_processor, sam_model, sam_processor, device
+    global BOX_THRESHOLD, TEXT_THRESHOLD

-    parser = argparse.ArgumentParser(description="LabelU GroundingDINO + EfficientSAM Model Server")
+    parser = argparse.ArgumentParser(description="LabelU GroundingDINO + SAM Model Server")
    parser.add_argument("--port", type=int, default=5000)
    parser.add_argument("--host", type=str, default="0.0.0.0")
    parser.add_argument("--device", type=str, default="cuda" if torch.cuda.is_available() else "cpu")
-    parser.add_argument("--no-sam", action="store_true", help="Disable EfficientSAM (detection only)")
+    parser.add_argument("--no-sam", action="store_true", help="Disable SAM (detection only)")
    parser.add_argument("--box-threshold", type=float, default=BOX_THRESHOLD)
    parser.add_argument("--text-threshold", type=float, default=TEXT_THRESHOLD)
    args = parser.parse_args()

    device = args.device
-    global BOX_THRESHOLD, TEXT_THRESHOLD
    BOX_THRESHOLD = args.box_threshold
    TEXT_THRESHOLD = args.text_threshold

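This hunk moves `global BOX_THRESHOLD, TEXT_THRESHOLD` to the top of `main()`. In the old layout the names were read (as `default=BOX_THRESHOLD` and `default=TEXT_THRESHOLD` in `add_argument`) before the `global` statement. Python 3.6 and later reject that at compile time with `SyntaxError: name 'BOX_THRESHOLD' is used prior to global declaration`, so the old `server.py` could not start at all. The self-contained snippet below reproduces both layouts with the built-in `compile()`. The `THRESHOLD`/`main` bodies are simplified stand-ins, not the server code.

```python
OLD_LAYOUT = '''
THRESHOLD = 0.25

def main(argv):
    default = THRESHOLD          # name is read here...
    global THRESHOLD             # ...before being declared global
    THRESHOLD = float(argv[0]) if argv else default
'''

NEW_LAYOUT = '''
THRESHOLD = 0.25

def main(argv):
    global THRESHOLD             # declared first, as in the fixed server.py
    default = THRESHOLD
    THRESHOLD = float(argv[0]) if argv else default
'''


def compiles(source: str) -> bool:
    """True if `source` compiles; prints the SyntaxError message otherwise."""
    try:
        compile(source, "<layout>", "exec")
        return True
    except SyntaxError as exc:
        print("SyntaxError:", exc.msg)
        return False


if __name__ == "__main__":
    print("old layout compiles:", compiles(OLD_LAYOUT))
    print("new layout compiles:", compiles(NEW_LAYOUT))
    namespace: dict = {}
    exec(NEW_LAYOUT, namespace)
    namespace["main"](["0.4"])
    print("THRESHOLD after main:", namespace["THRESHOLD"])
```

Declaring every module-level name a function will rebind in one `global` statement at the top of that function, as the fixed `main()` now does, rules out this ordering mistake.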
@@ -253,13 +247,13 @@ def main():
    logger.info("GroundingDINO loaded.")

    if not args.no_sam:
-        from transformers import EfficientSamModel, SamImageProcessor
-        logger.info("Loading EfficientSAM (%s) on %s ...", SAM_MODEL_ID, device)
-        sam_processor = SamImageProcessor.from_pretrained(SAM_MODEL_ID)
-        sam_model = EfficientSamModel.from_pretrained(SAM_MODEL_ID).to(device).eval()
-        logger.info("EfficientSAM loaded.")
+        from transformers import SamModel, SamProcessor
+        logger.info("Loading SAM (%s) on %s ...", SAM_MODEL_ID, device)
+        sam_processor = SamProcessor.from_pretrained(SAM_MODEL_ID)
+        sam_model = SamModel.from_pretrained(SAM_MODEL_ID).to(device).eval()
+        logger.info("SAM loaded.")
    else:
-        logger.info("EfficientSAM disabled (--no-sam).")
+        logger.info("SAM disabled (--no-sam).")

    uvicorn.run(app, host=args.host, port=args.port)
