X-Crawl

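X-Crawl is a flexible, multifunctional crawler library for Node.js. Its flexible usage and wide range of features help you crawl pages, APIs, and files quickly, safely, and reliably.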

If you like X-Crawl, you can support it by giving the x-crawl repository a star. Thank you for your support!

Features

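- Asynchronous/synchronous - Change the mode property to switch between asynchronous and synchronous crawling.
- Multiple purposes - It can crawl pages, crawl APIs, crawl files, and run polling crawls to cover a range of scenarios.
- Flexible writing style - The same crawling API accepts several configuration styles, each with its own strengths.
- Interval crawling - No interval, a fixed interval, or a random interval, to allow or avoid highly concurrent crawling.
- Failed retry - Avoids crawl failures caused by short-lived problems, with a customizable number of retries.
- Proxy rotation - Automatically rotates proxies on failed retries, with customizable error counts and HTTP status codes.
- Device fingerprinting - Zero configuration or custom configuration, to avoid being identified and tracked across locations through fingerprinting.
- Priority queue - A crawl target with a higher priority is crawled ahead of the other targets.
- Crawl SPA - Crawl an SPA (single-page application) to generate pre-rendered content (also known as "SSR", server-side rendering).
- Control page - You can submit forms, send keyboard input, trigger events, take screenshots of the page, and more.
- Capture record - Captures and records crawl information, and highlights it in the terminal with colored strings.
- TypeScript - Ships its own types and implements complete typing through generics.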
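
To make the "interval crawling" and "failed retry" features more concrete, here is a minimal sketch of the general idea in plain JavaScript. It is not X-Crawl's internal implementation: the helper names `randomInterval`, `withRetry`, and `crawlSequentially` are hypothetical, and only the option names `maxRetry` and `intervalTime: { min, max }` are taken from the example in this document.

```javascript
// Hypothetical helper (not part of X-Crawl): pick a wait time in [min, max] ms.
function randomInterval({ min, max }) {
  return Math.floor(min + Math.random() * (max - min + 1))
}

// Resolve after `ms` milliseconds.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms))

// Hypothetical helper: run `task`, retrying up to `maxRetry` extra times.
// The last error is rethrown if every attempt fails.
async function withRetry(task, maxRetry) {
  let lastError
  for (let attempt = 0; attempt <= maxRetry; attempt++) {
    try {
      return await task(attempt)
    } catch (error) {
      lastError = error
    }
  }
  throw lastError
}

// Hypothetical helper: crawl targets one by one, waiting a random interval
// between targets and retrying each target on failure.
async function crawlSequentially(targets, fetchTarget, { maxRetry, intervalTime }) {
  const results = []
  for (let i = 0; i < targets.length; i++) {
    if (i > 0) await sleep(randomInterval(intervalTime))
    results.push(await withRetry(() => fetchTarget(targets[i]), maxRetry))
  }
  return results
}
```

Here `fetchTarget` stands for any async function that fetches one target. A random interval spreads requests out so they do not arrive at a fixed rhythm, while the retry loop absorbs short-lived failures.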

Example

As an example, the following automatically fetches photos of experiences and homes around the world once a day:

// 1.Import module ES/CJS
import xCrawl from 'x-crawl'

// 2.Create a crawler instance
const myXCrawl = xCrawl({ maxRetry: 3, intervalTime: { max: 3000, min: 2000 } })

// 3.Set the crawling task
/*
  Call the startPolling API to start the polling function.
  With { d: 1 }, the callback function is called once a day.
*/
myXCrawl.startPolling({ d: 1 }, async (count, stopPolling) => {
  // Call crawlPage API to crawl Page
  const res = await myXCrawl.crawlPage({
    targets: [
      'https://www.airbnb.cn/s/experiences',
      'https://www.airbnb.cn/s/plus_homes'
    ],
    viewport: { width: 1920, height: 1080 }
  })

  // Store the image URL to targets
  const targets = []
  const elSelectorMap = ['._fig15y', '._aov0j6']
  for (const item of res) {
    const { id } = item
    const { page } = item.data

    // Wait for the page to load
    await new Promise((r) => setTimeout(r, 300))

    // Gets the URL of the page image
    const urls = await page.$$eval(
      `${elSelectorMap[id - 1]} img`,
      (imgEls) => {
        return imgEls.map((item) => item.src)
      }
    )
    targets.push(...urls)

    // Close the page and wait for it to finish closing
    await page.close()
  }

  // Call the crawlFile API to download the images into ./upload
  await myXCrawl.crawlFile({ targets, storeDir: './upload' })
})

Running result:

Note: Do not crawl sites indiscriminately. Check the site's robots.txt before crawling. This example only demonstrates how to use X-Crawl.
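
As a rough illustration of the robots.txt check mentioned in the note, the sketch below tests a path against the `Disallow` rules in a robots.txt text. It is deliberately simplified and is not part of X-Crawl: the function name `isDisallowed` is hypothetical, and only `User-agent` groups and `Disallow` prefix rules are handled (`Allow` overrides, wildcards, and `Crawl-delay` are ignored).

```javascript
// Simplified robots.txt check (hypothetical helper, not part of X-Crawl).
// Handles only `User-agent` groups and `Disallow` prefix rules.
function isDisallowed(robotsTxt, path, userAgent = '*') {
  const agent = userAgent.toLowerCase()
  let groupAgents = []      // user agents named by the group being read
  let readingRules = false  // true once the current group has listed a rule
  let sawSpecific = false   // true if a group names `agent` explicitly
  const specific = []       // Disallow prefixes for `agent`
  const wildcard = []       // Disallow prefixes for `*`

  for (const rawLine of robotsTxt.split(/\r?\n/)) {
    const line = rawLine.split('#')[0].trim() // strip comments
    const sep = line.indexOf(':')
    if (sep === -1) continue
    const field = line.slice(0, sep).trim().toLowerCase()
    const value = line.slice(sep + 1).trim()

    if (field === 'user-agent') {
      // A User-agent line after rules starts a new group.
      if (readingRules) {
        groupAgents = []
        readingRules = false
      }
      groupAgents.push(value.toLowerCase())
      if (agent !== '*' && value.toLowerCase() === agent) sawSpecific = true
    } else if (field === 'allow') {
      readingRules = true // ends the User-agent list; the rule itself is ignored
    } else if (field === 'disallow') {
      readingRules = true
      if (value === '') continue // an empty Disallow permits everything
      if (groupAgents.includes(agent)) specific.push(value)
      if (groupAgents.includes('*')) wildcard.push(value)
    }
  }

  // A group that names the agent explicitly takes precedence over `*`.
  const rules = sawSpecific ? specific : wildcard
  return rules.some((prefix) => path.startsWith(prefix))
}
```

The function only inspects text that has already been downloaded. Fetching the site's robots.txt and deciding how to react to a disallowed path are left to the caller.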
