A high-performance Go library for detecting suspicious and potentially malicious URLs through entropy-based analysis, Unicode inspection, and structural pattern recognition.
LinkGuard provides multi-dimensional URL analysis to detect:
- Entropy Analysis: Detects randomized/obfuscated URLs using Shannon entropy calculations
- Unicode Attacks: Identifies homoglyphs (look-alike characters), invisible characters, and mixed scripts
- Structural Anomalies: Recognizes suspicious patterns like IP addresses, excessive subdomains, double encoding, and suspicious TLDs
- Risk Scoring: Provides both numeric scores (0.0-1.0) and categorical risk levels (None, Low, Medium, High, Critical)
- Human-Readable Reports: Generates detailed explanations for each detected suspicious signal
- URL Extraction: Intelligently extracts URLs from text, HTML, Markdown, and various content formats
- High Performance: ~1µs analysis for simple URLs, ~3µs for complex URLs
- Customizable: Create custom analyzers with selected methods or implement your own analysis logic
- Flexible Architecture: Mix and match built-in methods, adjust weights, or add custom detection algorithms
go get github.com/solrac97gr/linkguard

package main
import (
"fmt"
"github.com/solrac97gr/linkguard"
)
func main() {
// Create analyzer (uses default configuration)
analyzer := linkguard.NewAnalyzer()
// Analyze a single URL
result := analyzer.Analyze("https://g\u043E\u043Egle.com/login")
if result.IsSuspicious() {
fmt.Printf("β οΈ Suspicious link detected!\n")
fmt.Printf("Risk Level: %s\n", result.Risk)
fmt.Printf("Score: %.2f\n", result.Score)
fmt.Printf("Reasons:\n")
for _, reason := range result.Reasons {
fmt.Printf(" - %s\n", reason)
}
} else {
fmt.Printf("β
URL appears safe (Risk: %s)\n", result.Risk)
}
}

Output:
⚠️ Suspicious link detected!
Risk Level: medium
Score: 0.53
Reasons:
- mixed Unicode scripts detected: [ASCII Cyrillic]
- 2 homoglyph character(s) found (look-alike characters from other scripts)
// Create analyzer (uses default configuration)
analyzer := linkguard.NewAnalyzer()
// Analyze a normal URL
result := analyzer.Analyze("https://www.example.com")
fmt.Printf("Score: %.2f, Risk: %s\n", result.Score, result.Risk)
// Output: Score: 0.03, Risk: none

analyzer := linkguard.NewAnalyzer()
result := analyzer.Analyze("https://subdomain.example.com/path")
// Access entropy information
fmt.Printf("Shannon Entropy: %.2f\n", result.Entropy)
// Access Unicode analysis
fmt.Printf("Homoglyphs detected: %d\n", result.UnicodeFlags.HomoglyphCount)
fmt.Printf("Scripts found: %v\n", result.UnicodeFlags.ScriptsFound)
// Access structural analysis
fmt.Printf("Has IP address: %t\n", result.StructureFlags.HasIPAddress)
fmt.Printf("Subdomain count: %d\n", result.StructureFlags.SubdomainCount)
fmt.Printf("Path depth: %d\n", result.StructureFlags.PathDepth)// Calculate Shannon entropy independently
url := "https://x7k9q2m4.tk/a8f3e1b5c7d9"
entropy := linkguard.ShannonEntropy(url)
fmt.Printf("Entropy: %.2f\n", entropy)
// Get normalized entropy (0.0-1.0)
normalized := linkguard.NormalizedEntropy(url)
fmt.Printf("Normalized: %.2f\n", normalized)LinkGuard includes a powerful URL extractor that can find URLs in various text formats:
package main
import (
"fmt"
"github.com/solrac97gr/linkguard"
)
func main() {
// Extract URLs from plain text
text := "Check out https://example.com and www.github.com for more info!"
urls := linkguard.ExtractURLs(text)
fmt.Printf("Found URLs: %v\n", urls)
// Output: Found URLs: [https://example.com www.github.com]
// Extract from HTML content
htmlText := `<p>Visit <a href="https://secure-site.com">our site</a> or
contact us at support@company.com</p>`
urls = linkguard.ExtractURLs(htmlText)
fmt.Printf("HTML URLs: %v\n", urls)
// Output: HTML URLs: [https://secure-site.com]
// Extract from Markdown
markdown := "Click [here](https://example.org) or [there](https://test.com)"
urls = linkguard.ExtractURLs(markdown)
fmt.Printf("Markdown URLs: %v\n", urls)
// Output: Markdown URLs: [https://example.org https://test.com]
// Extract and analyze in one step
suspiciousText := "Urgent: verify your account at https://g00gle-security.tk/login"
urls = linkguard.ExtractURLs(suspiciousText)
analyzer := linkguard.NewAnalyzer()
for _, url := range urls {
result := analyzer.Analyze(url)
fmt.Printf("URL: %s | Risk: %s | Score: %.2f\n", url, result.Risk, result.Score)
}
// Output: URL: https://g00gle-security.tk/login | Risk: high | Score: 0.67
}

Process large amounts of text content efficiently:
package main
import (
"fmt"
"strings"
"github.com/solrac97gr/linkguard"
)
func analyzeTextContent(content string) {
// Extract all URLs from the content
urls := linkguard.ExtractURLs(content)
if len(urls) == 0 {
fmt.Println("No URLs found in content")
return
}
// Analyze all extracted URLs
analyzer := linkguard.NewAnalyzer()
results := analyzer.AnalyzeMultiple(urls)
// Report findings
fmt.Printf("Found %d URLs in content:\n", len(urls))
var suspicious []linkguard.Result
for _, result := range results {
if result.IsSuspicious() {
suspicious = append(suspicious, result)
}
fmt.Printf(" %s - Risk: %s (%.2f)\n", result.URL, result.Risk, result.Score)
}
if len(suspicious) > 0 {
fmt.Printf("\nβ οΈ %d suspicious URLs detected!\n", len(suspicious))
for _, result := range suspicious {
fmt.Printf("\nURL: %s\n", result.URL)
for _, reason := range result.Reasons {
fmt.Printf(" - %s\n", reason)
}
}
}
}
func main() {
// Example email content with mixed URLs
emailContent := `
Subject: Account Verification Required
Dear Customer,
Please verify your account by clicking here: https://secure-bank.com/verify
If the above link doesn't work, try this alternative:
https://bnk-verification.tk/urgent/verify?token=x7k9q2m4z8
For support, visit our main site at https://www.realbank.com
or contact us at support@realbank.com
Best regards,
Security Team
`
analyzeTextContent(emailContent)
}package main
import (
"fmt"
"net/mail"
"strings"
"github.com/solrac97gr/linkguard"
"github.com/solrac97gr/linkguard/unicode"
"github.com/solrac97gr/linkguard/structure"
)
// EmailSecurityFilter processes email messages for malicious links
type EmailSecurityFilter struct {
analyzer *linkguard.Analyzer
}
func NewEmailSecurityFilter() *EmailSecurityFilter {
// Configure analyzer for phishing detection
analyzer := linkguard.NewAnalyzer(
unicode.New(0.4), // High weight for homoglyph detection
structure.New(0.6), // High weight for suspicious domains
)
return &EmailSecurityFilter{analyzer: analyzer}
}
func (f *EmailSecurityFilter) ScanEmail(msg *mail.Message) (*SecurityReport, error) {
// Read the email body (requires the "io" import)
bodyBytes, err := io.ReadAll(msg.Body)
if err != nil {
return nil, err
}
body := string(bodyBytes)
// Extract URLs from email content
urls := linkguard.ExtractURLs(body)
report := &SecurityReport{
TotalURLs: len(urls),
URLs: make(map[string]linkguard.Result),
}
// Analyze each URL
for _, url := range urls {
result := f.analyzer.Analyze(url)
report.URLs[url] = result
if result.IsSuspicious() {
report.SuspiciousURLs++
if result.Risk >= linkguard.RiskHigh {
report.HighRiskURLs++
}
}
}
// Determine overall action
if report.HighRiskURLs > 0 {
report.Action = "BLOCK"
} else if report.SuspiciousURLs > 0 {
report.Action = "QUARANTINE"
} else {
report.Action = "ALLOW"
}
return report, nil
}
type SecurityReport struct {
TotalURLs int
SuspiciousURLs int
HighRiskURLs int
URLs map[string]linkguard.Result
Action string
}
func (r *SecurityReport) String() string {
var sb strings.Builder
sb.WriteString(fmt.Sprintf("Security Report - Action: %s\n", r.Action))
sb.WriteString(fmt.Sprintf("Total URLs: %d, Suspicious: %d, High Risk: %d\n",
r.TotalURLs, r.SuspiciousURLs, r.HighRiskURLs))
for url, result := range r.URLs {
if result.IsSuspicious() {
sb.WriteString(fmt.Sprintf(" β οΈ %s (Risk: %s)\n", url, result.Risk))
for _, reason := range result.Reasons {
sb.WriteString(fmt.Sprintf(" - %s\n", reason))
}
}
}
return sb.String()
}

package main
import (
"fmt"
"time"
"github.com/solrac97gr/linkguard"
)
// ChatMessage represents a message in a chat application
type ChatMessage struct {
ID string
UserID string
Content string
Timestamp time.Time
}
// ChatModerator filters messages for malicious URLs
type ChatModerator struct {
analyzer *linkguard.Analyzer
maxRiskLevel linkguard.RiskLevel
}
func NewChatModerator(maxRisk linkguard.RiskLevel) *ChatModerator {
return &ChatModerator{
analyzer: linkguard.NewAnalyzer(),
maxRiskLevel: maxRisk,
}
}
func (m *ChatModerator) ProcessMessage(msg *ChatMessage) (*ModerationResult, error) {
// Extract URLs from message content
urls := linkguard.ExtractURLs(msg.Content)
result := &ModerationResult{
MessageID: msg.ID,
UserID: msg.UserID,
OriginalContent: msg.Content,
URLs: urls,
Action: "ALLOW",
}
if len(urls) == 0 {
return result, nil
}
// Analyze extracted URLs
var maxRisk linkguard.RiskLevel = linkguard.RiskNone
for _, url := range urls {
analysis := m.analyzer.Analyze(url)
result.Analyses = append(result.Analyses, analysis)
if analysis.Risk > maxRisk {
maxRisk = analysis.Risk
}
}
// Determine moderation action
if maxRisk > m.maxRiskLevel {
result.Action = "BLOCK"
result.Reason = fmt.Sprintf("Message contains %s risk URL(s)", maxRisk)
} else if maxRisk >= linkguard.RiskMedium {
result.Action = "FLAG"
result.Reason = "Message contains potentially suspicious URLs"
}
return result, nil
}
type ModerationResult struct {
MessageID string
UserID string
OriginalContent string
URLs []string
Analyses []linkguard.Result
Action string
Reason string
}
func (r *ModerationResult) ShouldBlock() bool {
return r.Action == "BLOCK"
}
func (r *ModerationResult) ShouldFlag() bool {
return r.Action == "FLAG" || r.Action == "BLOCK"
}

package main
import (
"fmt"
"net/http"
"net/url"
"github.com/solrac97gr/linkguard"
)
// ProxyFilter intercepts and analyzes URLs in HTTP requests
type ProxyFilter struct {
analyzer *linkguard.Analyzer
blockLevel linkguard.RiskLevel
}
func NewProxyFilter() *ProxyFilter {
return &ProxyFilter{
analyzer: linkguard.NewAnalyzer(),
blockLevel: linkguard.RiskHigh,
}
}
func (p *ProxyFilter) ProcessRequest(req *http.Request) (*ProxyDecision, error) {
decision := &ProxyDecision{
URL: req.URL.String(),
Method: req.Method,
Action: "ALLOW",
}
// Analyze the request URL
result := p.analyzer.Analyze(req.URL.String())
decision.Analysis = result
// Check if URL should be blocked
if result.Risk >= p.blockLevel {
decision.Action = "BLOCK"
decision.Reason = fmt.Sprintf("URL risk level (%s) exceeds threshold", result.Risk)
} else if result.IsSuspicious() {
decision.Action = "WARN"
decision.Reason = "URL shows suspicious characteristics"
}
// Check referer for additional context
if referer := req.Header.Get("Referer"); referer != "" {
refererResult := p.analyzer.Analyze(referer)
if refererResult.Risk >= linkguard.RiskMedium {
decision.RefererAnalysis = &refererResult
if decision.Action == "ALLOW" {
decision.Action = "WARN"
decision.Reason = "Suspicious referer detected"
}
}
}
return decision, nil
}
type ProxyDecision struct {
URL string
Method string
Action string
Reason string
Analysis linkguard.Result
RefererAnalysis *linkguard.Result
}
func (d *ProxyDecision) ShouldBlock() bool {
return d.Action == "BLOCK"
}
func (d *ProxyDecision) ShouldWarn() bool {
return d.Action == "WARN" || d.Action == "BLOCK"
}

LinkGuard provides a flexible architecture that allows you to customize detection logic.
All analyzers are created using NewAnalyzer():
// Default configuration (Entropy: 35%, Unicode: 35%, Structure: 30%)
analyzer := linkguard.NewAnalyzer()
result := analyzer.Analyze("https://example.com")

Import analysis method packages and configure weights explicitly:
import (
"github.com/solrac97gr/linkguard"
"github.com/solrac97gr/linkguard/shannon"
"github.com/solrac97gr/linkguard/unicode"
"github.com/solrac97gr/linkguard/structure"
)
// Custom analyzer with specific weights
analyzer := linkguard.NewAnalyzer(
shannon.New(0.2), // 20% entropy
unicode.New(0.6), // 60% unicode
structure.New(0.2), // 20% structure
)
// Use only specific analysis methods
analyzer := linkguard.NewAnalyzer(
unicode.New(1.0), // Only unicode detection
)

Adjust the importance of different analysis methods based on your security focus:
import (
"github.com/solrac97gr/linkguard"
"github.com/solrac97gr/linkguard/shannon"
"github.com/solrac97gr/linkguard/unicode"
"github.com/solrac97gr/linkguard/structure"
)
// Security-focused: prioritize structure over entropy
securityAnalyzer := linkguard.NewAnalyzer(
shannon.New(0.2),
unicode.New(0.3),
structure.New(0.5),
)
// Phishing-focused: prioritize Unicode detection
phishingAnalyzer := linkguard.NewAnalyzer(
shannon.New(0.2),
unicode.New(0.6),
structure.New(0.2),
)
// Obfuscation-focused: prioritize entropy
obfuscationAnalyzer := linkguard.NewAnalyzer(
shannon.New(0.7),
unicode.New(0.2),
structure.New(0.1),
)

Implement the AnalysisMethod interface to create your own detection logic:
type AnalysisMethod interface {
Analyze(rawURL string, result *Result) float64 // Returns 0.0-1.0 score
Weight() float64 // Returns method weight
Name() string // Returns method name
}

type KeywordAnalysis struct {
WeightValue float64
Keywords []string
}
func (k *KeywordAnalysis) Analyze(rawURL string, result *linkguard.Result) float64 {
if k.Keywords == nil {
k.Keywords = []string{"login", "admin", "password", "verify"}
}
lowercaseURL := strings.ToLower(rawURL)
for _, keyword := range k.Keywords {
if strings.Contains(lowercaseURL, keyword) {
return 0.8 // High suspicion score for keyword match
}
}
return 0.0
}
func (k *KeywordAnalysis) Weight() float64 {
if k.WeightValue == 0 {
return 0.3 // default weight
}
return k.WeightValue
}
func (k *KeywordAnalysis) Name() string {
return "Keyword Analysis"
}
// Usage
analyzer := linkguard.NewAnalyzer(
&linkguard.EntropyAnalysis{WeightValue: 0.3},
&linkguard.UnicodeAnalysis{WeightValue: 0.3},
&KeywordAnalysis{
WeightValue: 0.4,
Keywords: []string{"login", "verify", "secure", "banking"},
},
)

type LengthAnalysis struct {
WeightValue float64
MaxLength int
}
func (l *LengthAnalysis) Analyze(rawURL string, result *linkguard.Result) float64 {
maxLen := l.MaxLength
if maxLen == 0 {
maxLen = 100 // default threshold
}
if len(rawURL) > maxLen {
excess := float64(len(rawURL) - maxLen)
score := excess / 100.0
if score > 1.0 {
return 1.0
}
return score
}
return 0.0
}
func (l *LengthAnalysis) Weight() float64 {
if l.WeightValue == 0 {
return 0.2
}
return l.WeightValue
}
func (l *LengthAnalysis) Name() string {
return "Length Analysis"
}

type TLDWhitelistAnalysis struct {
WeightValue float64
AllowedTLDs map[string]bool
PenaltyScore float64
}
func (t *TLDWhitelistAnalysis) Analyze(rawURL string, result *linkguard.Result) float64 {
if t.AllowedTLDs == nil {
t.AllowedTLDs = map[string]bool{
".com": true,
".org": true,
".edu": true,
".gov": true,
}
}
parsed, err := url.Parse(rawURL)
if err != nil {
return 0.5 // parsing failure is suspicious
}
host := parsed.Hostname()
for tld := range t.AllowedTLDs {
if strings.HasSuffix(host, tld) {
return 0.0 // whitelisted TLD
}
}
// Non-whitelisted TLD
if t.PenaltyScore == 0 {
return 0.4
}
return t.PenaltyScore
}
func (t *TLDWhitelistAnalysis) Weight() float64 {
if t.WeightValue == 0 {
return 0.25
}
return t.WeightValue
}
func (t *TLDWhitelistAnalysis) Name() string {
return "TLD Whitelist Analysis"
}

Designed to detect phishing links in emails:
import (
"github.com/solrac97gr/linkguard"
"github.com/solrac97gr/linkguard/unicode"
"github.com/solrac97gr/linkguard/structure"
)
func NewEmailLinkAnalyzer() *linkguard.Analyzer {
return linkguard.NewAnalyzer(
unicode.New(0.4), // Homoglyph attacks common in phishing
structure.New(0.3), // IP addresses, suspicious TLDs
&KeywordAnalysis{
WeightValue: 0.3,
Keywords: []string{"verify", "urgent", "suspended", "confirm", "account"},
},
)
}
// Usage
emailAnalyzer := NewEmailLinkAnalyzer()
results := emailAnalyzer.AnalyzeMultiple(extractedLinks)

Designed to detect malicious redirects:
import (
"github.com/solrac97gr/linkguard"
"github.com/solrac97gr/linkguard/shannon"
"github.com/solrac97gr/linkguard/structure"
)
func NewRedirectAnalyzer() *linkguard.Analyzer {
return linkguard.NewAnalyzer(
shannon.New(0.3),
structure.New(0.4),
&LengthAnalysis{
WeightValue: 0.3,
MaxLength: 150, // Long URLs often used in malicious redirects
},
)
}

For validating URLs within controlled environments:
import (
"github.com/solrac97gr/linkguard"
"github.com/solrac97gr/linkguard/structure"
)
func NewInternalURLValidator(allowedDomains []string) *linkguard.Analyzer {
// domainMap could back a custom domain-allowlist method (not shown here)
domainMap := make(map[string]bool)
for _, d := range allowedDomains {
domainMap[d] = true
}
return linkguard.NewAnalyzer(
structure.New(0.5),
&TLDWhitelistAnalysis{
WeightValue: 0.5,
AllowedTLDs: map[string]bool{".com": true, ".org": true},
},
)
}

Add analysis methods to an analyzer after creation:
import (
"github.com/solrac97gr/linkguard"
"github.com/solrac97gr/linkguard/shannon"
)
analyzer := linkguard.NewAnalyzer(shannon.New(0.5))
// Add more methods at runtime
analyzer.AddMethod(&KeywordAnalysis{
WeightValue: 0.5,
Keywords: []string{"phishing", "suspicious"},
})
result := analyzer.Analyze(url)

analyzer := linkguard.NewAnalyzer()
methods := analyzer.Methods()
for _, method := range methods {
fmt.Printf("Method: %s, Weight: %.2f\n", method.Name(), method.Weight())
}
// Output:
// Method: Entropy Analysis, Weight: 0.35
// Method: Unicode Analysis, Weight: 0.35
// Method: Structure Analysis, Weight: 0.30

- Weight Normalization: Weights don't need to sum to 1.0; they're normalized automatically
- Zero Weights: If `WeightValue` is 0 or negative, the method falls back to its default weight
- Stateless Methods: Keep analysis methods stateless when possible for thread safety
- Error Handling: Return a moderate score (0.4-0.6) when errors occur during analysis
- Testing: Always test custom methods with both safe and suspicious URLs (see the sketch below)
- Performance: For batch operations, create the analyzer once and reuse it
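As a minimal sketch of the testing advice above, a table-driven test (in a _test.go file, importing "testing") for the KeywordAnalysis example from earlier might look like this; the expected scores simply mirror that example's 0.8-on-match logic:

func TestKeywordAnalysis(t *testing.T) {
	method := &KeywordAnalysis{WeightValue: 0.4}
	cases := []struct {
		url  string
		want float64
	}{
		{"https://example.com/about", 0.0},      // safe: no keyword match
		{"https://example.com/login", 0.8},      // suspicious: contains "login"
		{"https://evil.tk/verify-account", 0.8}, // suspicious: contains "verify"
	}
	for _, tc := range cases {
		var result linkguard.Result
		if got := method.Analyze(tc.url, &result); got != tc.want {
			t.Errorf("Analyze(%q) = %.2f, want %.2f", tc.url, got, tc.want)
		}
	}
}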
The Analyzer and analysis methods are safe for concurrent use:
analyzer := linkguard.NewAnalyzer()
var wg sync.WaitGroup
for _, url := range urls {
wg.Add(1)
go func(u string) {
defer wg.Done()
result := analyzer.Analyze(u)
// Process result...
}(url)
}
wg.Wait()

| Attack Type | Example | Detection |
|---|---|---|
| Homoglyphs | `g\u043E\u043Egle.com` (Cyrillic о) | ✅ Detected |
| Invisible chars | `exam\u200Bple.com` (zero-width space) | ✅ Detected |
| Mixed scripts | Latin + Cyrillic + Greek | ✅ Detected |
| Fullwidth chars | `\uFF21bc.com` (fullwidth A) | ✅ Detected |
| Pattern | Example | Risk Impact |
|---|---|---|
| IP address | `http://192.168.1.1/admin` | +0.3 |
| Non-standard port | `https://example.com:8443` | +0.15 |
| Excessive subdomains | `a.b.c.d.e.example.com` | +0.2 |
| Double encoding | `%2520admin` | +0.3 |
| Suspicious TLD | `.tk`, `.ml`, `.ga` | +0.2 |
| @ sign trick | `google.com@evil.com` | +0.35 |
| Excessive length | >200 characters | +0.1 |
| Entropy | Classification | Example |
|---|---|---|
| 0.0-3.5 | Normal | https://example.com |
| 3.5-4.0 | Slightly elevated | https://my-site123.com/page |
| 4.0-4.5 | Mildly suspicious | Short random strings |
| 4.5-5.0 | Suspicious | https://x7k9q2m4.tk |
| 5.0+ | Highly suspicious | Long random strings |
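As a quick illustration of these bands, the exported entropy helpers can be applied to the table's examples (printed values are indicative; exact numbers depend on the full string):

package main

import (
	"fmt"

	"github.com/solrac97gr/linkguard"
)

func main() {
	for _, u := range []string{
		"https://example.com",         // normal
		"https://my-site123.com/page", // slightly elevated
		"https://x7k9q2m4.tk",         // suspicious
	} {
		fmt.Printf("%-30s entropy=%.2f normalized=%.2f\n",
			u, linkguard.ShannonEntropy(u), linkguard.NormalizedEntropy(u))
	}
}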
LinkGuard can intelligently extract URLs from various content types:
| Content Type | Example | Extraction Support |
|---|---|---|
| Plain Text | `Visit https://example.com today` | ✅ Full support |
| HTML Content | `<a href="https://site.com">Link</a>` | ✅ Parses href attributes |
| Markdown Links | `[Click here](https://example.org)` | ✅ Extracts both display and target URLs |
| Email Content | `Contact: support@company.com` | ✅ Context-aware filtering |
| Mixed Formats | HTML + Markdown + plain text | ✅ Multi-pass extraction |
| Unicode URLs | `https://ｅｘａｍｐｌｅ．ｃｏｍ` | ✅ Full-width character support |
| IPv6 URLs | `http://[2001:db8::1]:8080` | ✅ Bracket preservation |
| Complex Schemes | `magnet:?xt=urn:btih:...` | ✅ Data, magnet, tel, sms protocols |
| Misleading Links | `[google.com](http://evil.com)` | ✅ Extracts both URLs for analysis |
| Feature | Description | Example |
|---|---|---|
| Smart Punctuation | Removes sentence punctuation while preserving URL syntax | `Visit https://site.com!` → `https://site.com` |
| Markdown Syntax | Handles malformed Markdown patterns | `[text](url` → properly extracts `url` |
| False Positive Filtering | Excludes file paths and non-URL patterns | `file.txt` → not extracted |
| Email Context Detection | Excludes emails in conversational contexts | `contact me at user@domain.com` → not extracted |
| Deduplication | Removes duplicate URLs across extraction passes | Multiple `<a>` tags with the same URL → single result |
| Operation | Performance | Notes |
|---|---|---|
| Simple text | ~2µs | Plain URLs in text |
| HTML parsing | ~5µs | Including href extraction |
| Markdown processing | ~3µs | Link syntax parsing |
| Mixed content | ~8µs | All extraction methods |
Benchmarks run on Apple M3 (arm64):
| Operation | Time | Memory | Allocs |
|---|---|---|---|
| Full Analysis | | | |
| Simple URL | 1142 ns/op | 744 B/op | 8 allocs/op |
| Suspicious URL | 3189 ns/op | 1784 B/op | 22 allocs/op |
| Per-Method Analysis | | | |
| Entropy Analysis | 1100 ns/op | 936 B/op | 5 allocs/op |
| Unicode Analysis | 1124 ns/op | 48 B/op | 2 allocs/op |
| Structure Analysis | 370.7 ns/op | 256 B/op | 4 allocs/op |
Key Takeaways:
- ~1 µs analysis for simple URLs
- ~3 µs for complex URL analysis
- Low memory footprint (~1-3 KB per URL)
- Excellent parallel scaling for batch operations
Run benchmarks yourself:
go test -bench=. -benchmem -tags=benchmark

LinkGuard uses a weighted scoring system that combines three independent analysis modules:
┌─────────────────────────────────────────────┐
│             Analyze(url string)             │
└──────────────────────┬──────────────────────┘
                       │
        ┌──────────────┼──────────────┐
        ▼              ▼              ▼
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│   Entropy    │ │   Unicode    │ │  Structure   │
│   Analysis   │ │   Analysis   │ │   Analysis   │
│  (35% wt.)   │ │  (35% wt.)   │ │  (30% wt.)   │
└──────┬───────┘ └──────┬───────┘ └──────┬───────┘
       └────────────────┼────────────────┘
                        ▼
               ┌─────────────────┐
               │  Weighted Sum   │
               │  Score [0,1]    │
               └────────┬────────┘
                        ▼
               ┌─────────────────┐
               │   Risk Level    │
               │  (None → Crit)  │
               └─────────────────┘
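Given the weight normalization described in Best Practices, the combined score is effectively a normalized weighted average of the per-method scores (a reading of the diagram above, not a verbatim excerpt of the implementation):

$$\text{Score}(u) = \frac{\sum_i w_i \, s_i(u)}{\sum_i w_i}$$

where $s_i(u) \in [0, 1]$ is method $i$'s score for URL $u$ and $w_i$ is its weight; with the default weights this is $0.35\,s_{entropy} + 0.35\,s_{unicode} + 0.30\,s_{structure}$. The resulting score is then bucketed into a risk level: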
| Score Range | Risk Level |
|---|---|
| 0.00 - 0.15 | None |
| 0.15 - 0.30 | Low |
| 0.30 - 0.55 | Medium (Suspicious) |
| 0.55 - 0.80 | High |
| 0.80 - 1.00 | Critical |
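A minimal sketch of this mapping as a standalone helper (hypothetical; the library already exposes the mapped level via Result.Risk):

// riskFromScore mirrors the thresholds in the table above.
func riskFromScore(score float64) linkguard.RiskLevel {
	switch {
	case score < 0.15:
		return linkguard.RiskNone
	case score < 0.30:
		return linkguard.RiskLow
	case score < 0.55:
		return linkguard.RiskMedium
	case score < 0.80:
		return linkguard.RiskHigh
	default:
		return linkguard.RiskCritical
	}
}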
Creates an analyzer with specified analysis methods.
When called with no arguments, returns an analyzer with default configuration (Entropy: 35%, Unicode: 35%, Structure: 30%).
Parameters: Optional variadic list of AnalysisMethod implementations.
Returns: Configured Analyzer instance.
Examples:
// Default configuration
analyzer := linkguard.NewAnalyzer()
// Custom configuration using subpackages
import (
"github.com/solrac97gr/linkguard"
"github.com/solrac97gr/linkguard/shannon"
"github.com/solrac97gr/linkguard/unicode"
)
analyzer := linkguard.NewAnalyzer(
shannon.New(0.4),
unicode.New(0.6),
)

Analyzes a URL using the analyzer's configured methods.
Parameters: URL string to analyze.
Returns: Result with analysis details.
Analyzes multiple URLs using the analyzer's configured methods.
Parameters: Slice of URL strings.
Returns: Slice of Result structs.
Adds a new analysis method to the analyzer.
Parameters: AnalysisMethod implementation to add.
Returns a copy of the analysis methods used by the analyzer.
Returns: Slice of AnalysisMethod interfaces.
Calculates Shannon entropy of a string.
Returns: Entropy value (typically 0.0-6.0 for URLs).
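For reference, this is the standard information-theoretic quantity: for a string $s$ with character frequencies $p(c)$,

$$H(s) = -\sum_{c} p(c) \log_2 p(c)$$

so a URL drawn from a larger, more uniform character set scores higher.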
Calculates Shannon entropy normalized to [0, 1].
Returns: Normalized entropy value.
Extracts all URLs from text content including plain text, HTML, and Markdown formats.
Parameters:
`text` - Input text content to scan for URLs
Returns: Slice of unique URLs found in the text
Features:
- Supports HTTP/HTTPS, FTP, file://, mailto:, tel:, sms:, magnet:, and data: URLs
- Extracts URLs from HTML `href` attributes
- Processes Markdown-style links `[text](url)`
- Handles Unicode URLs and full-width characters
- Context-aware email filtering (excludes emails in conversational contexts)
- Intelligent punctuation trimming (preserves legitimate URL characters)
- IPv6 URL support
Examples:
// Basic extraction
text := "Visit https://example.com and www.github.com"
urls := linkguard.ExtractURLs(text)
// Returns: ["https://example.com", "www.github.com"]
// HTML content
html := `<a href="https://site.com">Link</a>`
urls = linkguard.ExtractURLs(html)
// Returns: ["https://site.com"]
// Markdown links
markdown := "Check [this](https://example.org) out"
urls = linkguard.ExtractURLs(markdown)
// Returns: ["https://example.org"]
// Mixed content with phishing detection
content := "Click [Google](https://g00gle-security.tk/login)"
urls = linkguard.ExtractURLs(content)
// Returns: ["https://g00gle-security.tk/login"]
// Complex URLs
complex := "Contact us at mailto:support@example.com or visit http://[2001:db8::1]:8080"
urls = linkguard.ExtractURLs(complex)
// Returns: ["http://[2001:db8::1]:8080"]Creates a Shannon entropy analyzer.
Package: github.com/solrac97gr/linkguard/shannon
Parameters: weight - importance of this method (standard: 0.35)
Returns: AnalysisMethod for entropy detection
Creates a Unicode attack detector.
Package: github.com/solrac97gr/linkguard/unicode
Parameters: weight - importance of this method (standard: 0.35)
Returns: AnalysisMethod for homoglyph and script detection
Creates a URL structure analyzer.
Package: github.com/solrac97gr/linkguard/structure
Parameters: weight - importance of this method (standard: 0.30)
Returns: AnalysisMethod for structural pattern detection
Create custom analysis methods by implementing:
type AnalysisMethod interface {
Analyze(rawURL string, result *Result) float64 // Returns 0.0-1.0 score
Weight() float64 // Returns method weight
Name() string // Returns method name
}

type Result struct {
URL string // Original input URL
Score float64 // Overall suspicion score (0.0-1.0)
Risk RiskLevel // Categorical risk level
Reasons []string // Human-readable explanations
Entropy float64 // Shannon entropy
UnicodeFlags UnicodeReport // Unicode analysis details
StructureFlags StructureReport // Structure analysis details
}

Returns true if risk level is Medium or above.
const (
RiskNone RiskLevel = iota // No suspicious signals
RiskLow // Minor suspicious signals
RiskMedium // Moderate suspicious signals
RiskHigh // Strong suspicious signals
RiskCritical // Very strong suspicious signals
)

type UnicodeReport struct {
MixedScripts bool // Multiple Unicode scripts detected
HomoglyphCount int // Number of look-alike characters
InvisibleCharCount int // Number of invisible/zero-width chars
NonASCIIRatio float64 // Fraction of non-ASCII characters
ScriptsFound []string // Unicode script names found
}

type StructureReport struct {
ExcessiveSubdomains bool // >3 subdomains
SubdomainCount int // Number of subdomains
HasIPAddress bool // Host is IP address
HasPortNumber bool // Non-standard port specified
ExcessivePathDepth bool // >5 path segments
PathDepth int // Number of path segments
EncodedCharRatio float64 // Percent-encoded character ratio
HasDoubleEncoding bool // Multiple encoding layers
SuspiciousTLD bool // TLD associated with abuse
HasAtSign bool // @ in authority section
ExcessiveLength bool // >200 characters
Length int // Total character count
}

# Run all tests
go test
# Run with coverage
go test -cover
# Run specific test
go test -run TestAnalyze_HomoglyphDomain
# Run with race detection
go test -race -v

# Run all benchmarks
go test -bench=.
# Run with memory statistics
go test -bench=. -benchmem
# Run specific benchmark
go test -bench=BenchmarkAnalyze_Simple
# Run for longer duration
go test -bench=. -benchtime=5s

linkguard/
├── analyzer.go              # Main analysis coordinator and scoring
├── analyzer_interface.go    # Analyzer and AnalysisMethod interfaces
├── methods.go               # Built-in analysis method implementations
├── entropy.go               # Shannon entropy calculations
├── unicode.go               # Unicode and homoglyph detection
├── structure.go             # URL structure analysis
├── extractor.go             # URL extraction from text, HTML, and Markdown
├── linkguard.go             # Public API and type definitions
├── linkguard_test.go        # Unit and integration tests
├── extractor_test.go        # URL extraction tests
├── analyzer_test.go         # Analyzer and custom method tests
├── examples_test.go         # Usage examples
├── linkguard_bench_test.go  # Performance benchmarks
├── CLAUDE.md                # AI assistant guidance
└── README.md                # Complete documentation (you are here)
- Email Security Gateways: Extract and analyze all URLs from incoming emails before delivery
- Web Proxies & Firewalls: Filter suspicious URLs at the network gateway level
- Chat & Messaging Apps: Real-time protection against malicious links in conversations
- Social Media Monitoring: Scan posts and comments for suspicious link patterns
- Security Information Systems: Integrate into SIEM and threat intelligence platforms
- Browser Extensions: Real-time link safety checking before user interaction
- API Security: Validate redirect URLs and callback endpoints in API requests (see the sketch after this list)
- Incident Response: Analyze URLs from security incidents and threat reports
- Document Analysis: Extract URLs from PDFs, Word docs, and other file formats
- Web Scraping Safety: Validate extracted links during crawling operations
- Content Management: Scan user-generated content for malicious links
- Marketing Analytics: Extract and validate campaign links from various sources
- Compliance Monitoring: Ensure shared links meet organizational security policies
- Threat Intelligence: Process IOC feeds and extract URLs for analysis
- Digital Forensics: Extract and analyze URLs from digital evidence
- Automated Testing: Validate links in test suites and CI/CD pipelines
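For instance, the API Security use case can be sketched as an HTTP handler that vets a redirect target before issuing the redirect; the `redirect` query parameter name and the Medium-risk threshold are illustrative choices, not library conventions:

import (
	"net/http"

	"github.com/solrac97gr/linkguard"
)

func safeRedirectHandler(analyzer *linkguard.Analyzer) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		target := r.URL.Query().Get("redirect") // hypothetical parameter name
		if target == "" {
			http.Error(w, "missing redirect target", http.StatusBadRequest)
			return
		}
		// Reject redirect targets that LinkGuard flags as suspicious.
		if result := analyzer.Analyze(target); result.Risk >= linkguard.RiskMedium {
			http.Error(w, "redirect target rejected", http.StatusForbidden)
			return
		}
		http.Redirect(w, r, target, http.StatusFound)
	}
}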
- Not a complete security solution: Should be used as part of a layered defense strategy (see the sketch after this list)
- False positives possible: Legitimate URLs with unusual characteristics may be flagged
- No DNS/WHOIS checking: Only analyzes the URL string itself
- No active scanning: Does not fetch or execute URLs
- Best effort Unicode detection: Some advanced Unicode attacks may evade detection
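Because only the URL string is inspected, a layered setup might pair LinkGuard with a live DNS lookup; a rough sketch, where the High-risk cutoff and the treat-unresolvable-as-unsafe policy are arbitrary choices for illustration:

import (
	"net"
	"net/url"

	"github.com/solrac97gr/linkguard"
)

// checkURL combines static URL analysis with a DNS resolution check.
func checkURL(analyzer *linkguard.Analyzer, rawURL string) (bool, error) {
	// Layer 1: static string analysis.
	if result := analyzer.Analyze(rawURL); result.Risk >= linkguard.RiskHigh {
		return false, nil
	}
	parsed, err := url.Parse(rawURL)
	if err != nil {
		return false, err
	}
	// Layer 2: does the host actually resolve?
	if _, err := net.LookupHost(parsed.Hostname()); err != nil {
		return false, nil
	}
	return true, nil
}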
Contributions are welcome! Here's how you can help:
- Report bugs: Open an issue with detailed reproduction steps
- Suggest features: Propose new detection patterns or improvements
- Submit PRs: Add new detection capabilities or fix bugs
- Improve docs: Help make documentation clearer
When contributing:
- Write tests for new features
- Run benchmarks to ensure no performance regression
- Follow existing code style
- Update documentation as needed
Built with inspiration from:
- OWASP URL validation guidelines
- Unicode security best practices
- Shannon's information theory
- Phishing detection research
- GitHub: @solrac97gr
- Issues: GitHub Issues
Made with ❤️ by Carlos Garcia