Web scraping Apple App Store product info and reviews with Nodejs
#node #webscraping #serpapi

What will be scraped


Full code

If you don't need an explanation, have a look at the full code example in the online IDE.

import dotenv from "dotenv";
dotenv.config();
import { getJson } from "serpapi";

const getSearchParams = (searchType) => {
  const isProduct = searchType === "product";
  const reviewsLimit = 10; // hardcoded limit for demonstration purpose
  const engine = isProduct ? "apple_product" : "apple_reviews"; // search engine
  const params = {
    api_key: process.env.API_KEY, //your API key from serpapi.com
    product_id: "1507782672", // Parameter defines the ID of a product you want to get the reviews for
    country: "us", // Parameter defines the country to use for the search
    type: isProduct ? "app" : undefined, // Parameter defines the type of Apple Product to get the product page of
    page: isProduct ? undefined : 1, // Parameter is used to get the items on a specific page
    sort: isProduct ? undefined : "mostrecent", // Parameter is used for sorting reviews
  };
  return { engine, params, reviewsLimit };
};

const getProductInfo = async () => {
  const { engine, params } = getSearchParams("product");
  const json = await getJson(engine, params);
  delete json.search_metadata;
  delete json.search_parameters;
  delete json.search_information;
  return json;
};

const getReviews = async () => {
  const reviews = [];
  const { engine, params, reviewsLimit } = getSearchParams();
  while (true) {
    const json = await getJson(engine, params);
    if (json.reviews) {
      reviews.push(...json.reviews);
      params.page += 1;
    } else break;
    if (reviews.length >= reviewsLimit) break;
  }
  return reviews;
};

const getResults = async () => {
  return { productInfo: await getProductInfo(), reviews: await getReviews() };
};

getResults().then((result) => console.dir(result, { depth: null }));

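Why use the Apple Product Page Scraper and Apple App Store Reviews Scraper APIs from SerpApi?

Using an API generally solves all or most of the problems you might run into while building your own parser or crawler. From a web scraping perspective, our API helps with the most painful ones:

  • Bypassing blocks from supported search engines by solving CAPTCHAs or working around IP blocks.
  • No need to create a parser from scratch and maintain it.
  • No need to pay for proxies and CAPTCHA solvers.
  • No need to use browser automation when you have to extract data faster.

Head to the Apple Product Page playground and the Apple App Store Reviews playground for a live, interactive demo.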

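Preparation

First, we need to create a Node.js* project and add the npm packages serpapi and dotenv.

To do this, open the command line in our project's directory and enter:

$ npm init -y

And then: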

$ npm i serpapi dotenv

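*If you don't have Node.js installed, you can download it from nodejs.org and follow the installation documentation.

  • The SerpApi package is used to scrape and parse search engine results using SerpApi. It gets search results from Google, Bing, Baidu, Yandex, Yahoo, Home Depot, eBay, and more.

  • The dotenv package is a zero-dependency module that loads environment variables from a .env file into process.env.

Next, we need to add a top-level "type" field with the value "module" to our package.json file to allow using ES6 modules in Node.JS.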
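As a rough illustration, package.json might look like the fragment below after this step. The name and version values are placeholders (npm init -y derives them from your folder), and the dependencies that npm adds are left out; the "type" line is the only part that matters here.

```json
{
  "name": "my-scraper",
  "version": "1.0.0",
  "type": "module"
}
```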


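That completes the Node.js environment setup for our project, so we can move on to the step-by-step code explanation.

Code explanation

First, we need to import dotenv from the dotenv library and call the config() method, then import getJson from the serpapi library: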

import dotenv from "dotenv";
dotenv.config();
import { getJson } from "serpapi";
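
  • config() reads your .env file, parses the contents, assigns them to process.env, and returns an object with a parsed key containing the loaded content, or an error key if it failed.
  • getJson() lets you get a JSON response based on the search parameters.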
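
If the .env file is missing or has no API_KEY line, process.env.API_KEY is simply undefined and the problem only shows up later, at request time. A minimal sketch of an early check, assuming a hypothetical helper named requireEnv (it is not part of dotenv or serpapi):

```javascript
// Hypothetical helper: read a required environment variable and fail early
// with a clear message if it is missing or empty.
const requireEnv = (name) => {
  const value = process.env[name];
  if (value === undefined || value === "") {
    throw new Error(`Missing environment variable ${name}; add it to your .env file`);
  }
  return value;
};

// Usage, after dotenv.config() has run:
// const apiKey = requireEnv("API_KEY");
```

Calling it once at startup keeps the failure close to its cause instead of inside the first API call.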

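Next, we write the getSearchParams function, which builds the search parameters for the two different APIs. In this function, we define the isProduct constant and set it based on the searchType argument.

Then we define and return the values that differ between the Product Page API and the Reviews API: the search engine, how many reviews we want to receive (the reviewsLimit constant), and the parameters for the search request: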

const getSearchParams = (searchType) => {
  const isProduct = searchType === "product";
  const reviewsLimit = 10; // hardcoded limit for demonstration purpose
  const engine = isProduct ? "apple_product" : "apple_reviews"; // search engine
  const params = {
    api_key: process.env.API_KEY, //your API key from serpapi.com
    product_id: "1507782672", // Parameter defines the ID of a product you want to get the reviews for
    country: "us", // Parameter defines the country to use for the search
    type: isProduct ? "app" : undefined, // Parameter defines the type of Apple Product to get the product page of
    page: isProduct ? undefined : 1, // Parameter is used to get the items on a specific page
    sort: isProduct ? undefined : "mostrecent", // Parameter is used for sorting reviews
  };
  return { engine, params, reviewsLimit };
};

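When we call this function, we receive different search parameters depending on the argument:

  • Product page API (getSearchParams("product")): engine is apple_product, type is app, and page and sort are undefined.

  • Reviews API (getSearchParams()): engine is apple_reviews, type is undefined, page is 1, and sort is mostrecent.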
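
To see which keys actually carry a value in each case, here is a small sketch. buildParams is a compact stand-in for getSearchParams with the API key left out, and definedOnly is a hypothetical helper; neither comes from serpapi. It only shows which keys are set, not how the serpapi package itself treats undefined values.

```javascript
// Compact stand-in for getSearchParams (API key left out on purpose).
const buildParams = (searchType) => {
  const isProduct = searchType === "product";
  return {
    product_id: "1507782672",
    country: "us",
    type: isProduct ? "app" : undefined,
    page: isProduct ? undefined : 1,
    sort: isProduct ? undefined : "mostrecent",
  };
};

// JSON.stringify skips properties whose value is undefined,
// so a round trip leaves only the keys that are actually set.
const definedOnly = (obj) => JSON.parse(JSON.stringify(obj));

const productParams = definedOnly(buildParams("product"));
const reviewsParams = definedOnly(buildParams());

console.log(productParams); // { product_id: '1507782672', country: 'us', type: 'app' }
console.log(reviewsParams); // { product_id: '1507782672', country: 'us', page: 1, sort: 'mostrecent' }
```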

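You can use the following search parameters:

Common parameters:

  • api_key parameter defines the SerpApi private key to use.
  • product_id parameter defines the ID of the product you want to get the reviews for. You can get a product ID from our Web scraping Apple App Store Search with Nodejs blog post. You can also get it from the app's URL. For example, the product_id of "https://apps.apple.com/us/app/the-great-coffee-app/id534220544" is the long numerical value that comes after "id", 534220544.
  • country parameter defines the country to use for the search. It is a two-letter country code (e.g., us (default) for the United States, uk for the United Kingdom, or fr for France). Head to Apple Regions for the full list of supported Apple regions.
  • no_cache parameter forces SerpApi to fetch the App Store results even if a cached version is already present. A cache is served only if the query and all parameters are exactly the same. The cache expires after 1 hour. Cached searches are free and are not counted towards your searches per month. It can be set to false (default) to allow results from the cache, or to true to disallow them. The no_cache and async parameters should not be used together.
  • async parameter defines how you want to submit your search to SerpApi. It can be set to false (default) to open an HTTP connection and keep it open until you get your search results, or to true to just submit your search to SerpApi and retrieve it later. In that case, you need to use our Searches Archive API to retrieve the results. The async and no_cache parameters should not be used together. async should not be used on accounts with Ludicrous Speed enabled.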
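
The URL rule described for product_id can be sketched as a small function. getProductIdFromUrl is a hypothetical helper name, and the regular expression assumes the ID always appears as an "/id<digits>" segment at the end of a path component, as in the example URL.

```javascript
// Hypothetical helper: pull the numeric product ID out of an App Store URL.
// Returns null when the path has no "/id<digits>" segment.
const getProductIdFromUrl = (url) => {
  const match = new URL(url).pathname.match(/\/id(\d+)(?:\/|$)/);
  return match ? match[1] : null;
};

console.log(
  getProductIdFromUrl("https://apps.apple.com/us/app/the-great-coffee-app/id534220544")
); // prints 534220544
```

The ID is returned as a string, which matches how product_id is written in the search parameters.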

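Product page parameters:

  • type parameter defines the type of Apple product to get the product page of. It defaults to app.

Reviews parameters:

  • page parameter is used to get the items on a specific page (e.g., 1 (default) is the first page of results, 2 is the second page, 3 is the third page, and so on).
  • sort parameter is used for sorting reviews. It can be set to mostrecent (most recent, default) or mosthelpful (most helpful).

Next, we declare the getProductInfo function, which gets all product information from the page and returns it. In this function, we receive and destructure engine and params from the getSearchParams function, called with the "product" argument. Then we get the json with the results, delete the keys we don't need, and return it: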

const getProductInfo = async () => {
  const { engine, params } = getSearchParams("product");
  const json = await getJson(engine, params);
  delete json.search_metadata;
  delete json.search_parameters;
  delete json.search_information;
  return json;
};
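
The delete statements modify the response object in place. If you would rather leave it untouched, one alternative is rest destructuring, sketched below. stripMetadata is a hypothetical name, and the sample object is made up for illustration; it is not a real API response.

```javascript
// Hypothetical alternative to the delete statements: rest destructuring
// copies every key except the three metadata keys into a new object.
const stripMetadata = (json) => {
  const { search_metadata, search_parameters, search_information, ...rest } = json;
  return rest;
};

// Made-up response shape, only for illustration.
const sample = {
  search_metadata: { status: "Success" },
  search_parameters: { engine: "apple_product" },
  title: "Pixea",
};

console.log(stripMetadata(sample)); // { title: 'Pixea' }
console.log("search_metadata" in sample); // true, the input is untouched
```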

Next, we declare the getReviews function, which gets review results from all pages (using pagination) and returns them:

const getReviews = async () => {
  ...
};

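In this function, we declare an empty reviews array, then receive and destructure engine, params, and reviewsLimit from the getSearchParams function, called without arguments. Then we use a while loop to get the json with results, add the reviews from each page to the array, and set the next page index (the params.page value).

If there are no more results on the page, or the number of received results reaches or exceeds reviewsLimit, we stop the loop (using break) and return the array with the results: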

const reviews = [];
const { engine, params, reviewsLimit } = getSearchParams();
while (true) {
  const json = await getJson(engine, params);
  if (json.reviews) {
    reviews.push(...json.reviews);
    params.page += 1;
  } else break;
  if (reviews.length >= reviewsLimit) break;
}
return reviews;
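
Because the loop appends whole pages, it can return more than reviewsLimit items. Below is a minimal sketch of one way to cap the result. fakeGetJson and collectReviews are hypothetical names, not part of serpapi; the fake fetcher stands in for getJson so the example runs without an API key. Stopping on an empty reviews array is a defensive assumption, not documented API behavior.

```javascript
// Stand-in for getJson: three pages of four fake reviews each,
// then a response without a reviews key.
const fakeGetJson = async (engine, params) => {
  if (params.page > 3) return {};
  const reviews = Array.from({ length: 4 }, (_, i) => ({
    position: (params.page - 1) * 4 + i + 1,
  }));
  return { reviews };
};

const collectReviews = async (fetchPage, limit) => {
  const reviews = [];
  const params = { page: 1 };
  while (reviews.length < limit) {
    const json = await fetchPage("apple_reviews", params);
    if (!json.reviews || json.reviews.length === 0) break; // no more pages
    reviews.push(...json.reviews);
    params.page += 1;
  }
  return reviews.slice(0, limit); // drop the overshoot from the last page
};

collectReviews(fakeGetJson, 10).then((reviews) => console.log(reviews.length)); // prints 10
```

Passing the fetcher in as an argument also makes the pagination logic easy to try out before pointing it at the real getJson.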

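Finally, we declare and run the getResults function, in which we build an object from the results of the getProductInfo and getReviews functions. Then we print all the received information to the console with the console.dir method, which accepts an options object that changes the default output settings: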

const getResults = async () => {
  return { productInfo: await getProductInfo(), reviews: await getReviews() };
};

getResults().then((result) => console.dir(result, { depth: null }));

Output

{
   "productInfo":{
      "title":"Pixea",
      "snippet":"The invisible image viewer",
      "id":"1507782672",
      "age_rating":"4+",
      "developer":{
         "name":"ImageTasks Inc",
         "link":"https://apps.apple.com/us/developer/imagetasks-inc/id450316587"
      },
      "rating":4.6,
      "rating_count":"594 Ratings",
      "price":"Free",
      "logo":"https://is3-ssl.mzstatic.com/image/thumb/Purple118/v4/f6/93/b6/f693b68f-9b14-3689-7521-c19a83fb0d88/AppIcon-1x_U007emarketing-85-220-6.png/320x0w.webp",
      "mac_screenshots":[
         "https://is3-ssl.mzstatic.com/image/thumb/PurpleSource124/v4/b1/8c/fb/b18cfb80-cb5c-d67d-2edc-ee1f6666e012/35b8d5a7-b493-4a80-bdbd-3e9d564601dd_Pixea-1.jpg/643x0w.webp",
         "https://is1-ssl.mzstatic.com/image/thumb/PurpleSource124/v4/96/08/83/9608834d-3d2b-5c0b-570c-f022407ff5cc/1836573e-1b6a-421c-b654-6ae2f915d755_Pixea-2.jpg/643x0w.webp",
         "https://is1-ssl.mzstatic.com/image/thumb/PurpleSource124/v4/58/fd/db/58fddb5d-9480-2536-8679-92d6b067d285/98e22b63-1575-4ee6-b08d-343b9e0474ea_Pixea-3.jpg/643x0w.webp",
         "https://is2-ssl.mzstatic.com/image/thumb/PurpleSource124/v4/c3/f3/f3/c3f3f3b5-deb0-4b58-4afc-79073373b7b9/28f51f38-bc59-4a61-a5a1-bff553838267_Pixea-4.jpg/643x0w.webp"
      ],
      "description":"Pixea is an image viewer for macOS with a nice minimal modern user interface. Pixea works great with JPEG, HEIC, PSD, RAW, WEBP, PNG, GIF, and many other formats. Provides basic image processing, including flip and rotate, shows a color histogram, EXIF, and other information. Supports keyboard shortcuts and trackpad gestures. Shows images inside archives, without extracting them.Supported formats:JPEG, HEIC, GIF, PNG, TIFF, Photoshop (PSD), BMP, Fax images, macOS and Windows icons, Radiance images, Google's WebP. RAW formats: Leica DNG and RAW, Sony ARW, Olympus ORF, Minolta MRW, Nikon NEF, Fuji RAF, Canon CR2 and CRW, Hasselblad 3FR. Sketch files (preview only). ZIP-archives.Export formats:JPEG, JPEG-2000, PNG, TIFF, BMP.Found a bug? Have a suggestion? Please, send it to support@imagetasks.comFollow us on Twitter @imagetasks!",
      "version_history":[
         {
            "release_version":"1.4",
            "release_notes":"- New icon- macOS Big Sur support- Universal Binary- Bug fixes and improvements",
            "release_date":"2020-11-09"
         },
        ... and other versions
      ],
      "ratings_and_reviews":{
         "rating_percentage":{
            "5_star":"76%",
            "4_star":"14%",
            "3_star":"4%",
            "2_star":"2%",
            "1_star":"3%"
         },
         "review_examples":[
            {
               "rating":"5 out of 5",
               "username":"MyrtleBlink182",
               "review_date":"01/18/2022",
               "review_title":"Full-Screen Perfection",
               "review_text":"This photo-viewer is by far the best in the biz. I thoroughly enjoy viewing photos with it. I tried a couple of others out, but this one is exactly what I was looking for. There is no dead space or any extra design baggage when viewing photos. Pixea knocks it out of the park keeping the design minimalistic while ensuring the functionality is through the roof"
            },
            ... and other reviews examples
         ]
      },
      "privacy":{
         "description":"The developer, ImageTasks Inc, indicated that the app’s privacy practices may include handling of data as described below. For more information, see the developer’s privacy policy.",
         "privacy_policy_link":"https://www.imagetasks.com/Pixea-policy.txt",
         "cards":[
            {
               "title":"Data Not Collected",
               "description":"The developer does not collect any data from this app."
            }
         ],
         "sidenote":"Privacy practices may vary, for example, based on the features you use or your age. Learn More",
         "learn_more_link":"https://apps.apple.com/story/id1538632801"
      },
      "information":{
         "seller":"ImageTasks Inc",
         "price":"Free",
         "size":"5.8 MB",
         "categories":[
            "Photo & Video"
         ],
         "compatibility":[
            {
               "device":"Mac",
               "requirement":"Requires macOS 10.12 or later."
            }
         ],
         "supported_languages":[
            "English"
         ],
         "age_rating":{
            "rating":"4+"
         },
         "copyright":"Copyright © 2020 Andrey Tsarkov. All rights reserved.",
         "developer_website":"https://www.imagetasks.com",
         "app_support_link":"https://www.imagetasks.com/pixea",
         "privacy_policy_link":"https://www.imagetasks.com/Pixea-policy.txt"
      },
      "more_by_this_developer":{
         "apps":[
            {
               "logo":"https://is3-ssl.mzstatic.com/image/thumb/Purple118/v4/f6/93/b6/f693b68f-9b14-3689-7521-c19a83fb0d88/AppIcon-1x_U007emarketing-85-220-6.png/320x0w.webp",
               "link":"https://apps.apple.com/us/app/istatistica/id1126874522",
               "serpapi_link":"https://serpapi.com/search.json?country=us&engine=apple_product&product_id=1507782672&type=app",
               "name":"iStatistica",
               "category":"Utilities"
            },
            ... and other apps
         ],
         "result_type":"Full",
         "see_all_link":"https://apps.apple.com/us/app/id1507782672#see-all/developer-other-apps"
      }
   },
   "reviews":[
      {
         "position":1,
         "id":"9332275235",
         "title":"Doesn't respect aspect ratios",
         "text":"Seemingly no way to maintain the aspect ratio of an image. It always wants to fill the photo to the window size, no matter what sizing options you pick. How useless is that?",
         "rating":3,
         "review_date":"2022-11-26 13:29:43 UTC",
         "author":{
            "name":"soren121",
            "link":"https://itunes.apple.com/us/reviews/id33706024"
         }
      },
      ... and other reviews
   ]
}
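
Several fields in this output are strings that contain numbers, such as "594 Ratings" and "76%". A minimal sketch of turning them into numbers, using values copied from the output above. leadingInt is a hypothetical helper, and it assumes the count is a plain integer; an abbreviated count such as "1.2K Ratings" would need different handling.

```javascript
// Values copied from the sample output above.
const productInfo = {
  rating: 4.6,
  rating_count: "594 Ratings",
  ratings_and_reviews: {
    rating_percentage: {
      "5_star": "76%",
      "4_star": "14%",
      "3_star": "4%",
      "2_star": "2%",
      "1_star": "3%",
    },
  },
};

// Hypothetical helper: take the leading integer from strings like "594 Ratings" or "76%".
const leadingInt = (text) => Number.parseInt(text, 10);

const ratingCount = leadingInt(productInfo.rating_count);
const percentages = Object.fromEntries(
  Object.entries(productInfo.ratings_and_reviews.rating_percentage).map(
    ([stars, pct]) => [stars, leadingInt(pct)]
  )
);

console.log(ratingCount); // 594
console.log(percentages["5_star"]); // 76
```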

Links

If you want other functionality added to this blog post, or if you want to see some projects made with SerpApi, write me a message.


Join us on Twitter | YouTube

Add a Feature Request or a Bug