Channel: Active questions tagged python - Stack Overflow

Python wget fails with a .tmp FileNotFoundError on specific URLs (web crawling)


Hello, I have a weird problem using wget in Python and would be grateful for any help.

What I want to do:

Download files ('.pdf', '.djvu') from a specific website (e.g. a wiki) using wget in Python, which should be easy.

The full target page

The specific page I am trying to crawl

Getting the file link for wget

Problem:

It's really weird. On most pages of the website, it works well.

But on some pages with the same HTML structure, it doesn't work.

Even on the same page, some files download fine with wget but others don't,

and I get this error message.

Error message:

    C:\start_automation\crawling_job>C:/Users/sa031/AppData/Local/Programs/Python/Python311/python.exe c:/start_automation/crawling_job/download_test.py
    Traceback (most recent call last):
      File "c:\start_automation\crawling_job\download_test.py", line 39, in <module>
        wget.download(url)
      File "C:\Users\sa031\AppData\Local\Programs\Python\Python311\Lib\site-packages\wget.py", line 303, in download
        (fd, tmpfile) = tempfile.mkstemp(".tmp", prefix=prefix, dir=".")
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "C:\Users\sa031\AppData\Local\Programs\Python\Python311\Lib\tempfile.py", line 341, in mkstemp
        return _mkstemp_inner(dir, prefix, suffix, flags, output_type)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "C:\Users\sa031\AppData\Local\Programs\Python\Python311\Lib\tempfile.py", line 256, in _mkstemp_inner
        fd = _os.open(file, flags, 0o600)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    FileNotFoundError: [Errno 2] No such file or directory: 'C:\\start_automation\\crawling_job\\CADAL06210101_%E7%9A%87%E6%B8%85%E7%B6%93%E8%A7%A3%E7%BA%8C%E7%B7%A8%EF%BC%9A%E6%98%93%E5%9C%96%E6%A2%9D%E8%BE%AE%E7%9A%87%E6%B8%85%E7%B6%93%E8%A7%A3%E7%BA%8C%E7%B7%A8%EF%BC%9A%E8%99%9E%E6%B0%8F%E6%98%93%E4%BA%8B.djvu.3kii8ipd.tmp'
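
One possible explanation, which nothing in this post confirms, is the classic Windows path limit (`MAX_PATH`, 260 characters): the traceback shows the temporary file being named after the percent-encoded URL, and each Chinese character takes nine characters in that form. The sketch below only measures the path; `temp_file_path` is a hypothetical helper that imitates the `<name>.<8 random characters>.tmp` pattern visible in the error message, not a function from the wget module.

```python
import ntpath
import posixpath
from urllib.parse import unquote, urlsplit

# URL and working directory copied from the question.
URL = (
    "https://upload.wikimedia.org/wikipedia/commons/a/a7/CADAL06210101_"
    "%E7%9A%87%E6%B8%85%E7%B6%93%E8%A7%A3%E7%BA%8C%E7%B7%A8%EF%BC%9A"
    "%E6%98%93%E5%9C%96%E6%A2%9D%E8%BE%AE"
    "%E7%9A%87%E6%B8%85%E7%B6%93%E8%A7%A3%E7%BA%8C%E7%B7%A8%EF%BC%9A"
    "%E8%99%9E%E6%B0%8F%E6%98%93%E4%BA%8B.djvu"
)
WORK_DIR = r"C:\start_automation\crawling_job"
MAX_PATH = 260  # classic Win32 limit; it includes the terminating NUL


def encoded_name(url: str) -> str:
    """Last path segment of the URL, still percent-encoded."""
    return posixpath.basename(urlsplit(url).path)


def temp_file_path(url: str, work_dir: str) -> str:
    """Hypothetical helper: rebuild a temp path shaped like the one in the traceback."""
    placeholder = "x" * 8  # stands in for the 8 random characters
    return ntpath.join(work_dir, f"{encoded_name(url)}.{placeholder}.tmp")


tmp_path = temp_file_path(URL, WORK_DIR)
decoded_path = ntpath.join(WORK_DIR, unquote(encoded_name(URL)))

print("temp path length:   ", len(tmp_path), "(limit:", MAX_PATH, ")")
print("decoded path length:", len(decoded_path))
```

If the first number is above the limit while shorter file names from the same page stay below it, that would match the "some files work, some don't" behaviour described above.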

What I have done:

Googled it and tested with several different pages in the wiki.

Asked ChatGPT and got the code below, which uses an absolute path, but it does not work either.

    import os
    import wget

    def download_file(url, save_path):
        try:
            print("Downloading file...")
            wget.download(url, save_path)
            print("\nDownload complete!")
        except Exception as e:
            print(f"An error occurred: {e}")

    if __name__ == "__main__":
        # URL of the file to download
        file_url = "https://upload.wikimedia.org/wikipedia/commons/a/a7/CADAL06210101_%E7%9A%87%E6%B8%85%E7%B6%93%E8%A7%A3%E7%B6%8C%E7%B7%A8%EF%BC%9A%E6%98%93%E5%9C%96%E6%A2%9D%E8%BE%AE%E7%9A%87%E6%B8%85%E7%B6%93%E8%A7%A3%E7%B6%8C%E7%B7%A8%EF%BC%9A%E8%99%9E%E6%B0%8F%E6%98%93%E4%BA%8B.djvu"
        # Specify an absolute path for saving the file
        save_location = os.path.join(os.getcwd(), "downloaded_file.djvu")
        # Call the function to download the file
        download_file(file_url, save_location)

The code:

The code below, with the URL included, is the one that does not work.

    import wget

    url = 'https://upload.wikimedia.org/wikipedia/commons/a/a7/CADAL06210101_%E7%9A%87%E6%B8%85%E7%B6%93%E8%A7%A3%E7%BA%8C%E7%B7%A8%EF%BC%9A%E6%98%93%E5%9C%96%E6%A2%9D%E8%BE%AE%E7%9A%87%E6%B8%85%E7%B6%93%E8%A7%A3%E7%BA%8C%E7%B7%A8%EF%BC%9A%E8%99%9E%E6%B0%8F%E6%98%93%E4%BA%8B.djvu'
    wget.download(url)

My guess:

.djvu.3kii8ipd.tmp'

Maybe the problem is related to this odd .tmp name shown in the error message, but I have no idea why.
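
A minimal sketch of one workaround to try, assuming the failure comes from the very long percent-encoded file name and not from the server: skip `wget.download` and write the response straight to a file named after the percent-decoded last URL segment, using only the standard library. `local_name`, `download`, and the `USER_AGENT` value are hypothetical names chosen for this illustration; the explicit User-Agent header is a precaution, since some hosts may reject Python's default one.

```python
import posixpath
import shutil
from pathlib import Path
from urllib.parse import unquote, urlsplit
from urllib.request import Request, urlopen

# URL copied from the question.
FILE_URL = (
    "https://upload.wikimedia.org/wikipedia/commons/a/a7/CADAL06210101_"
    "%E7%9A%87%E6%B8%85%E7%B6%93%E8%A7%A3%E7%BA%8C%E7%B7%A8%EF%BC%9A"
    "%E6%98%93%E5%9C%96%E6%A2%9D%E8%BE%AE"
    "%E7%9A%87%E6%B8%85%E7%B6%93%E8%A7%A3%E7%BA%8C%E7%B7%A8%EF%BC%9A"
    "%E8%99%9E%E6%B0%8F%E6%98%93%E4%BA%8B.djvu"
)
# Placeholder value (an assumption); replace with something that identifies your script.
USER_AGENT = "download-test/0.1 (example script)"


def local_name(url: str) -> str:
    """Percent-decoded last path segment, e.g. '%E6%98%93.djvu' -> '易.djvu'."""
    return unquote(posixpath.basename(urlsplit(url).path))


def download(url: str, directory: str = ".") -> Path:
    """Stream the URL into <directory>/<decoded name> and return that path."""
    target = Path(directory) / local_name(url)
    request = Request(url, headers={"User-Agent": USER_AGENT})
    with urlopen(request) as response, open(target, "wb") as out:
        shutil.copyfileobj(response, out)  # copies in chunks, no temp file
    return target


# Example call (needs network access):
# saved = download(FILE_URL)
# print(saved)
```

The decoded name is far shorter than the encoded one, so the resulting path stays well below 260 characters in the directory from the traceback. Decoded names can still contain characters that Windows rejects (for example `?` or `"`), and this sketch does not sanitize them.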

Thanks for reading. I really appreciate any help.

