Skip to content

AttributeError when Content-Disposition header doesn't contain filename parameter #31

Description

Description

When using download_protected_document() (or other file download methods), an AttributeError: 'NoneType' object has no attribute 'group' can occur intermittently if the server returns a Content-Disposition header that doesn't match the expected regex pattern.

Environment

  • Library version: 6.2.0 (latest)
  • Python version: 3.12

Root Cause

The issue is in model_utils.py at line 1399-1401 in the deserialize_file() function:

if content_disposition:
    filename = re.search(r'filename=[\'"]?([^\'"\s]+)[\'"]?',
                         content_disposition).group(1)
    path = os.path.join(os.path.dirname(path), filename)

The code checks if content_disposition is truthy, but does not check if re.search() actually finds a match. When the header exists but doesn't contain filename= in the expected format, re.search() returns None and calling .group(1) on None raises AttributeError.

Scenarios That Can Cause This

  1. Content-Disposition: attachment (no filename parameter)
  2. Content-Disposition: attachment; filename*=UTF-8''encoded%20name (RFC 5987 encoding)
  3. Other non-standard header formats

Error Message

AttributeError: 'NoneType' object has no attribute 'group'

Suggested Fix

Add a null check before calling .group():

if content_disposition:
    match = re.search(r'filename=[\'"]?([^\'"\s]+)[\'"]?', content_disposition)
    if match:
        filename = match.group(1)
        path = os.path.join(os.path.dirname(path), filename)

Or optionally, also handle RFC 5987 encoded filenames:

if content_disposition:
    # Try standard filename first
    match = re.search(r'filename=[\'"]?([^\'"\s]+)[\'"]?', content_disposition)
    if not match:
        # Try RFC 5987 encoded filename (filename*=)
        match = re.search(r"filename\*=(?:UTF-8''|utf-8'')([^;\s]+)", content_disposition)
    if match:
        filename = match.group(1)
        # URL decode if needed for RFC 5987
        if 'filename*=' in content_disposition:
            from urllib.parse import unquote
            filename = unquote(filename)
        path = os.path.join(os.path.dirname(path), filename)

Workaround

As a temporary workaround, users can pass _preload_content=False to file download methods to bypass the deserialization:

response = api.download_protected_document(id=document_id, _preload_content=False)
content = response.read()

This returns the raw urllib3.HTTPResponse object instead of going through deserialize_file().

Additional Context

This is a common bug pattern - similar issues have been reported and fixed in other libraries that parse Content-Disposition headers (e.g., gdown, requests-toolbelt).

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions