Hello Prof. Gao! The Chapter 12 example fails with `[nltk_data] Error loading punkt: <urlopen error [Errno 11004] getaddrinfo failed>`. Could you check whether the program has a special dependency on the network? #5

@easychu

Description

Windows 11; rag1 is the name of my virtual environment.

  1. Install Anaconda.
  2. Create the virtual environment:
    conda create -n rag1 python=3.10 -y
  3. Activate it with `activate rag1`, copy the dependency file requirements.txt into the user directory, and install the dependencies:
    pip install -r requirements.txt -i https://pypi.mirrors.ustc.edu.cn/simple --trusted-host=pypi.mirrors.ustc.edu.cn
  4. Download the text2vec-base-chinese model files from aliendao.cn:
    python model_download.py --e --repo_id shibing624/text2vec-base-chinese --token YPY8KHDQ2NAHQ2SG
    The downloaded files end up in the C:\Users\Z/dataroot/models/shibing624/text2vec-base-chinese directory.
  5. Download the Chapter 12 code from GitHub into D:\zhu\tech\RAG\rag1, and move the text2vec-base-chinese embedding-model folder from C:\Users\Z to D:\zhu\tech\RAG\rag1.
  6. Run python rag-demo.py; the following error appears:

(rag1) D:\zhu\tech\RAG\rag1>python rag-demo.py
0%| | 0/2 [00:00<?, ?it/s][nltk_data] Error loading punkt: <urlopen error [Errno 11004]
[nltk_data] getaddrinfo failed>
Error loading file documents\README.md
50%|██████████████████████████████████████████ | 1/2 [00:25<00:25, 25.90s/it]Traceback (most recent call last):
File "D:\zhu\tech\RAG\rag1\rag-demo.py", line 92, in <module>
documents = load_docs("./documents")
File "D:\zhu\tech\RAG\rag1\rag-demo.py", line 24, in load_docs
documents = loader.load()
File "C:\Users\Z\anaconda3\envs\rag1\lib\site-packages\langchain_community\document_loaders\directory.py", line 158, in load
self.load_file(i, p, docs, pbar)
File "C:\Users\Z\anaconda3\envs\rag1\lib\site-packages\langchain_community\document_loaders\directory.py", line 107, in load_file
raise e
File "C:\Users\Z\anaconda3\envs\rag1\lib\site-packages\langchain_community\document_loaders\directory.py", line 100, in load_file
sub_docs = self.loader_cls(str(item), **self.loader_kwargs).load()
File "C:\Users\Z\anaconda3\envs\rag1\lib\site-packages\langchain_community\document_loaders\unstructured.py", line 87, in load
elements = self._get_elements()
File "C:\Users\Z\anaconda3\envs\rag1\lib\site-packages\langchain_community\document_loaders\unstructured.py", line 179, in _get_elements
return partition(filename=self.file_path, **self.unstructured_kwargs)
File "C:\Users\Z\anaconda3\envs\rag1\lib\site-packages\unstructured\partition\auto.py", line 397, in partition
elements = partition_md(
File "C:\Users\Z\anaconda3\envs\rag1\lib\site-packages\unstructured\documents\elements.py", line 518, in wrapper
elements = func(*args, **kwargs)
File "C:\Users\Z\anaconda3\envs\rag1\lib\site-packages\unstructured\file_utils\filetype.py", line 591, in wrapper
elements = func(*args, **kwargs)
File "C:\Users\Z\anaconda3\envs\rag1\lib\site-packages\unstructured\file_utils\filetype.py", line 546, in wrapper
elements = func(*args, **kwargs)
File "C:\Users\Z\anaconda3\envs\rag1\lib\site-packages\unstructured\chunking\__init__.py", line 52, in wrapper
elements = func(*args, **kwargs)
File "C:\Users\Z\anaconda3\envs\rag1\lib\site-packages\unstructured\partition\md.py", line 104, in partition_md
return partition_html(
File "C:\Users\Z\anaconda3\envs\rag1\lib\site-packages\unstructured\documents\elements.py", line 518, in wrapper
elements = func(*args, **kwargs)
File "C:\Users\Z\anaconda3\envs\rag1\lib\site-packages\unstructured\file_utils\filetype.py", line 591, in wrapper
elements = func(*args, **kwargs)
File "C:\Users\Z\anaconda3\envs\rag1\lib\site-packages\unstructured\file_utils\filetype.py", line 546, in wrapper
elements = func(*args, **kwargs)
File "C:\Users\Z\anaconda3\envs\rag1\lib\site-packages\unstructured\chunking\__init__.py", line 52, in wrapper
elements = func(*args, **kwargs)
File "C:\Users\Z\anaconda3\envs\rag1\lib\site-packages\unstructured\partition\html.py", line 141, in partition_html
document_to_element_list(
File "C:\Users\Z\anaconda3\envs\rag1\lib\site-packages\unstructured\partition\common.py", line 559, in document_to_element_list
num_pages = len(document.pages)
File "C:\Users\Z\anaconda3\envs\rag1\lib\site-packages\unstructured\documents\xml.py", line 54, in pages
self._pages = self._parse_pages_from_element_tree()
File "C:\Users\Z\anaconda3\envs\rag1\lib\site-packages\unstructured\documents\html.py", line 176, in _parse_pages_from_element_tree
element = _parse_tag(tag_elem)
File "C:\Users\Z\anaconda3\envs\rag1\lib\site-packages\unstructured\documents\html.py", line 410, in _parse_tag
return _text_to_element(
File "C:\Users\Z\anaconda3\envs\rag1\lib\site-packages\unstructured\documents\html.py", line 458, in _text_to_element
elif is_narrative_tag(text, tag):
File "C:\Users\Z\anaconda3\envs\rag1\lib\site-packages\unstructured\documents\html.py", line 506, in is_narrative_tag
return tag not in HEADING_TAGS and is_possible_narrative_text(text)
File "C:\Users\Z\anaconda3\envs\rag1\lib\site-packages\unstructured\partition\text_type.py", line 77, in is_possible_narrative_text
if exceeds_cap_ratio(text, threshold=cap_threshold):
File "C:\Users\Z\anaconda3\envs\rag1\lib\site-packages\unstructured\partition\text_type.py", line 273, in exceeds_cap_ratio
if sentence_count(text, 3) > 1:
File "C:\Users\Z\anaconda3\envs\rag1\lib\site-packages\unstructured\partition\text_type.py", line 222, in sentence_count
sentences = sent_tokenize(text)
File "C:\Users\Z\anaconda3\envs\rag1\lib\site-packages\unstructured\nlp\tokenize.py", line 30, in sent_tokenize
return sent_tokenize(text)
File "C:\Users\Z\anaconda3\envs\rag1\lib\site-packages\nltk\tokenize\__init__.py", line 119, in sent_tokenize
tokenizer = _get_punkt_tokenizer(language)
File "C:\Users\Z\anaconda3\envs\rag1\lib\site-packages\nltk\tokenize\__init__.py", line 105, in _get_punkt_tokenizer
return PunktTokenizer(language)
File "C:\Users\Z\anaconda3\envs\rag1\lib\site-packages\nltk\tokenize\punkt.py", line 1744, in __init__
self.load_lang(lang)
File "C:\Users\Z\anaconda3\envs\rag1\lib\site-packages\nltk\tokenize\punkt.py", line 1749, in load_lang
lang_dir = find(f"tokenizers/punkt_tab/{lang}/")
File "C:\Users\Z\anaconda3\envs\rag1\lib\site-packages\nltk\data.py", line 579, in find
raise LookupError(resource_not_found)
LookupError:


Resource punkt_tab not found.
Please use the NLTK Downloader to obtain the resource:

>>> import nltk
>>> nltk.download('punkt_tab')

For more information see: https://www.nltk.org/data.html

Attempted to load tokenizers/punkt_tab/english/

Searched in:
- 'C:\Users\Z/nltk_data'
- 'C:\Users\Z\anaconda3\envs\rag1\nltk_data'
- 'C:\Users\Z\anaconda3\envs\rag1\share\nltk_data'
- 'C:\Users\Z\anaconda3\envs\rag1\lib\nltk_data'
- 'C:\Users\Z\AppData\Roaming\nltk_data'
- 'C:\nltk_data'
- 'D:\nltk_data'
- 'E:\nltk_data'


50%|██████████████████████████████████████████ | 1/2 [00:27<00:27, 27.18s/it]
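From the traceback, the answer appears to be yes: the `unstructured` loaders call NLTK's `sent_tokenize`, and NLTK tries to download the `punkt`/`punkt_tab` tokenizer data over the network on first use, which fails on a machine without internet access (Errno 11004 is a Windows DNS-resolution failure). A possible workaround, sketched below as an assumption rather than a confirmed fix, is to fetch the data once on a connected machine and copy the resulting `nltk_data` folder into any directory NLTK searches. The helper name `nltk_data_candidates` is hypothetical; it just rebuilds, with stdlib calls, the kind of search list shown in the traceback above.

```python
import os
import sys


def nltk_data_candidates(home=None, prefix=None):
    """Candidate nltk_data directories, mirroring the search list NLTK
    printed in the traceback (user home, the active environment, and
    subdirectories of it). Illustrative, not exhaustive."""
    home = home or os.path.expanduser("~")
    prefix = prefix or sys.prefix
    return [
        os.path.join(home, "nltk_data"),
        os.path.join(prefix, "nltk_data"),
        os.path.join(prefix, "share", "nltk_data"),
        os.path.join(prefix, "lib", "nltk_data"),
    ]


# On a machine WITH network access, fetch the data once:
#   import nltk
#   nltk.download("punkt")      # resource named in the first error
#   nltk.download("punkt_tab")  # resource named in the LookupError
# then copy the downloaded nltk_data folder (e.g. C:\Users\<user>\nltk_data)
# into any of the candidate directories on the offline machine, so that
# tokenizers\punkt_tab\english\ exists under it.
if __name__ == "__main__":
    for p in nltk_data_candidates():
        print(p)
```

Alternatively, the data folder can be placed anywhere and pointed to via the `NLTK_DATA` environment variable before running `rag-demo.py`.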
