How can I batch-download the images from this site? The filenames contain random characters, but I think it should be doable.

If it were me, I'd just compromise and use a semi-automated approach:

The semi-automated part is mainly paging: each trigger turns one page and grabs the image info (or downloads it directly).

Pair the steps below with an automation tool such as Keyboard Maestro.

The crude way:

Page with ← / →.
Bind a hotkey to Option + right-click, and use that hotkey to download.

A bit more advanced:

Page with ← / →.
Run JavaScript to extract the image links into a file.

Then use a regular expression to pull out the links with the right size (the size is encoded in the URL).

Then batch-download them.

You can ask an AI for both the regex and the JavaScript.

let images = document.querySelectorAll('img');
images.forEach(img => console.log(img.src));
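After dumping the links with the console snippet above, the regex-filtering step might look like this in Python. This is only a sketch: it assumes the size shows up as `width=NNN` somewhere in each link, and the sample URLs are made up.

```python
import re

def pick_links(links, width):
    # Keep only URLs whose embedded size matches the width we want;
    # the "width=NNN" URL pattern is an assumption about this site.
    pattern = re.compile(rf"width={width}\b")
    return [u for u in links if pattern.search(u)]

# Hypothetical URLs for illustration
links = [
    "https://example.com/asset/format=auto,width=600/a.jpg",
    "https://example.com/asset/format=auto,width=2450/a.jpg",
]
print(pick_links(links, 2450))
```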

In the extension: open Settings → Other → Custom large-image rules.
Overwrite the existing rules with the code below, confirm, then refresh the page, and you'll be able to copy and download the large images.

[
    {
        "name": "fashionsnap",
        "url": "^https://www.fashionsnap.com/",
        "r": "width=600",
        "s": "width=24800"
    }
]
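The rule above is essentially a regex substitution: every match of the `"r"` pattern in an image URL is replaced with the `"s"` string. A rough Python equivalent (the sample URL path is hypothetical):

```python
import re

def apply_rule(url, rule):
    # Replace the thumbnail-size marker ("r") with the large-size
    # marker ("s"), mirroring the extension's rewrite rule
    return re.sub(rule["r"], rule["s"], url)

rule = {"r": "width=600", "s": "width=24800"}
# Hypothetical thumbnail URL on the site
thumb = "https://www.fashionsnap.com/asset/format=auto,width=600/img.jpg"
print(apply_rule(thumb, rule))
```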

If you're not crawling the site at scale, just use this extension.

Parsing the page tags directly means reading each image element's data-path custom attribute, which holds the large-image URL.
However, this site is very convenient: the backend already generates a complete image list and embeds the metadata as JSON in the page, inside a <script> tag whose id attribute is __NEXT_DATA__.
You can try the code below. It depends only on the standard library and downloads the full-size images. Pass the page URLs as command-line arguments (multiple are supported), or put each URL as a string in the targets list. The program creates a folder in the current directory named after each page's title and saves that page's images, in display order, into the folder.

from __future__ import annotations

from concurrent.futures import ALL_COMPLETED, ProcessPoolExecutor, wait
from html import unescape
from json import loads
from pathlib import Path
from re import search
from sys import argv
from traceback import print_exception
from urllib.parse import urljoin
from urllib.request import Request, urlopen

targets = []

CommonHeader = {
  "User-Agent": "Mozilla/5.0 (Linux; Android 10; SM-P610) AppleWebKit/537.36 (KHTML, like Gecko) Brave Chrome/86.0.4240.111 Safari/537.36",
}


def download_image(url: str, destination: Path) -> None:
  request = Request(url, headers=CommonHeader)
  # Use the response as a context manager so the connection is closed
  with urlopen(request, timeout=8) as response:
    with destination.open("wb") as saved_file:
      saved_file.write(response.read())
  print(url)


def download_page(url: str, pool: ProcessPoolExecutor) -> None:
  request = Request(url, headers=CommonHeader)
  with urlopen(request, timeout=8) as response:
    page = response.read().decode("utf-8").replace("\n", "").replace("\r", "")
  raw_json = search('<script id="__NEXT_DATA__" type="application/json">(.+?)</script>', page)[1]
  information = loads(raw_json)
  information = information["props"]["pageProps"]["article"]

  title = unescape(search("<title>(.+?)</title>", page)[1])
  translate_table = str.maketrans("", "", ">:<?!|*/")
  title = title.translate(translate_table)
  print(title)

  destination_directory = Path.cwd().joinpath(f"{title}")
  destination_directory.mkdir(parents=False, exist_ok=True)
  futures = [
    pool.submit(
      download_image,
      url=urljoin(base=url, url=item["item"]["path"]),
      destination=destination_directory.joinpath(
        f"{item['sortOrder']:03d}-{Path(item['item']['path']).name}",
      ),
    )
    for item in information["articleItems"]
  ]

  finished, _ = wait(futures, return_when=ALL_COMPLETED)
  for future in finished:
    exception = future.exception()
    if exception is not None:
      print_exception(exception)


if __name__ == "__main__":
  # The guard matters: ProcessPoolExecutor re-imports this module in its
  # worker processes on spawn-based platforms (Windows, macOS)
  targets.extend(argv[1:])
  with ProcessPoolExecutor() as pool:
    for target in targets:
      download_page(target, pool)

There's almost no error handling; as long as it runs, that's good enough :laughing:

The image URLs in the list are thumbnails; strip the /asset/format=auto,width=800 segment in the middle to get the original-image URL.
That's why the Python script builds the URL directly from the filename, since the prefix is the same for all of them.
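Stripping that resizing segment to recover the original-image URL can be sketched like this; the surrounding path shape in the sample URL is a hypothetical example.

```python
import re

def to_original(url: str) -> str:
    # Drop the "/asset/format=auto,width=NNN" resizing segment;
    # URLs without it are returned unchanged
    return re.sub(r"/asset/format=auto,width=\d+", "", url)

thumb = "https://example.com/asset/format=auto,width=800/images/photo.jpg"
print(to_original(thumb))  # -> https://example.com/images/photo.jpg
```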

Thanks, that makes sense. My friends all recommend those semi-automated bots, RPA (might have the spelling wrong) -_-||
Yingdao (影刀), I think that's the name. I downloaded it and spent half a day learning it, but still couldn't get it working, haha.

Thanks. I roughly understand now.

Thanks for the detailed explanation. I've sent you a DM.

How about trying a bookmarklet?


Quicker has a related action for this (查看网页所有图片 ["View all images on a page"] - by Qthzrvy - action info - Quicker)

Or create a new bookmark with the content below.

The first bookmarklet below lists each image with its dimensions; the second lists the filename, the image, and its alt text.

javascript:(function(){outText='';for(i=0;i<document.images.length;i++){var image=document.images[i];var maxWidthStyle=%22max-width:700px;%22;var imageTag='<a href=%22'+image.src+'%22 target=%22_blank%22><img style=%22'+(image.tagName.toLowerCase()===%22img%22||image.tagName.toLowerCase()===%22svg%22?maxWidthStyle:%22%22)+%27%22%20src=%22%27+image.src+%27%22%3E%3C/a%3E%27;if(outText.indexOf(image.src)==-1){outText+=%27%3Ctr%3E%3Ctd%20style=%22width:%20700px;%22%3E%3Cdiv%20style=%22max-width:%20700px;%20overflow:%20hidden;%22%3E%27+imageTag+%27%3C/div%3E%3C/td%3E%3Ctd%3E%27+image.naturalWidth+%27x%27+image.naturalHeight+%27%3C/td%3E%3C/tr%3E%3Cp%3E%27;}};if(outText!=%27%27){imgWindow=window.open(%27%27,%27imgWin%27);imgWindow.document.write(%27%3Ctable%20style=margin:auto%20border=1%20cellpadding=10%3E%3Ctr%3E%3Cth%3EImage%3C/th%3E%3Cth%3ESize%3C/th%3E%3C/tr%3E%3Cp%3E%27+outText+%27%3C/table%3E%3Cp%3E%27);imgWindow.document.close();};var%20previousAlert=document.getElementById(%27clipboard-alert%27);if(previousAlert){clearTimeout(previousAlert.timeoutId);document.body.removeChild(previousAlert);}var%20tempAlert=document.createElement(%27div%27);tempAlert.id=%27clipboard-alert%27;tempAlert.textContent=%27%E6%B2%A1%E6%9C%89%E5%9B%BE%E7%89%87%EF%BC%81%27;var%20alertStyles={%27min-width%27:%27150px%27,%27margin-left%27:%27-75px%27,%27background-color%27:%27#3B7CF1','color':'white','text-align':'center','border-radius':'4px','padding':'14px','position':'fixed','z-index':'9999999','left':'50%25','top':'30px','font-size':'16px','font-family':'sans-serif'};for(var%20style%20in%20alertStyles){tempAlert.style.setProperty(style,alertStyles[style]);}document.body.appendChild(tempAlert);tempAlert.timeoutId=setTimeout(function(){document.body.removeChild(tempAlert);},1000);})();

javascript:(function(){var A={},B=[],D=document,i,e,a,k,y,s,m,u,t,r,j,v,h,q,c,G; G=open().document;G.open();G.close(); function C(t){return G.createElement(t)}function P(p,c){p.appendChild(c)}function T(t){return G.createTextNode(t)}for(i=0;e=D.images[i];++i){a=e.getAttribute('alt');k=escape(e.src)+'%'+(a!=null)+a;if(!A[k]){y=!!a+(a!=null);s=C('span');s.style.color=['red','gray','green'][y];s.style.fontStyle=['italic','italic',''][y];P(s,T(['missing','empty',a][y]));m=e.cloneNode(true); if(G.importNode)m=G.importNode(m, true); if(m.width>350)m.width=350;B.push([0,7,T(e.src.split('/').reverse()[0]),m,s]);A[k]=B.length;}u=B[A[k]-1];u[1]=(T(++u[0]));}t=C('table');t.border=1;r=t.createTHead().insertRow(-1);for(j=0;v=['#','Filename','Image','Alternate%20text'][j];++j){h=C('th');P(h,T(v));P(r,h);}for(i=0;q=B[i];++i){r=t.insertRow(-1);for(j=1;v=q[j];++j){c=r.insertCell(-1);P(c,v);}}%20P(G.body,t);})()

I tried it, and it did grab images, but there are two problems:

  1. It grabbed the 600px-wide thumbnails (the originals are 2450px wide)
  2. It only grabbed the first 20 images; the original page has 108

Thanks!!