Web Scraping Google Play App Reviews
#node #webscraping #serpapi



Note: You can use the official Google Play Developer API, which has a default limit of 200,000 requests per day for retrieving the list of reviews and individual reviews.

Alternatively, you can use a complete third-party Google Play Store scraping solution, google-play-scraper. Third-party solutions are usually used to get around quota limits.

This blog post is meant to give you an idea of how to scrape Google Play Store app reviews with Puppeteer, so that you can build something of your own.


If you don't need an explanation, have a look at the full code example in the online IDE.

const puppeteer = require("puppeteer-extra");
const StealthPlugin = require("puppeteer-extra-plugin-stealth");

puppeteer.use(StealthPlugin()); // register the stealth plugin before launching

const reviewsLimit = 100; // hardcoded limit for demonstration purpose

const searchParams = {
  id: "com.discord", // Parameter defines the ID of a product you want to get the results for
  hl: "en", // Parameter defines the language to use for the Google search
  gl: "us", // parameter defines the country to use for the Google search
};

// Google Play app details page for the chosen app, language and country
const URL = `https://play.google.com/store/apps/details?id=${searchParams.id}&hl=${searchParams.hl}&gl=${searchParams.gl}`;

async function scrollPage(page, clickElement, scrollContainer) {
  let lastHeight = await page.evaluate(`document.querySelector("${scrollContainer}").scrollHeight`);
  while (true) {
    await page.click(clickElement); // keep the focus inside the reviews list
    await page.waitForTimeout(500);
    await page.keyboard.press("End"); // scroll to the last loaded review
    await page.waitForTimeout(2000);
    let newHeight = await page.evaluate(`document.querySelector("${scrollContainer}").scrollHeight`);
    const reviews = await page.$$(".RHo1pe");
    if (newHeight === lastHeight || reviews.length > reviewsLimit) {
      break;
    }
    lastHeight = newHeight;
  }
}

async function getReviewsFromPage(page) {
  return await page.evaluate(() => ({
    reviews: Array.from(document.querySelectorAll(".RHo1pe")).map((el) => ({
      title: el.querySelector(".X5PpBb")?.textContent.trim(),
      avatar: el.querySelector(".gSGphe > img")?.getAttribute("srcset")?.slice(0, -3),
      rating: parseInt(el.querySelector(".Jx4nYe > div")?.getAttribute("aria-label")?.slice(6)),
      snippet: el.querySelector(".h3YV2d")?.textContent.trim(),
      likes: parseInt(el.querySelector(".AJTPZc")?.textContent.trim()) || "No likes",
      date: el.querySelector(".bp9Aid")?.textContent.trim(),
      response: {
        title: el.querySelector(".ocpBU .I6j64d")?.textContent.trim(),
        snippet: el.querySelector(".ocpBU .ras4vb")?.textContent.trim(),
        date: el.querySelector(".ocpBU .I9Jtec")?.textContent.trim(),
      },
    })),
  }));
}

async function getAppReviews() {
  const browser = await puppeteer.launch({
    headless: true, // if you want to see what the browser is doing, you need to change this option to "false"
    args: ["--no-sandbox", "--disable-setuid-sandbox"],
  });

  const page = await browser.newPage();

  await page.setDefaultNavigationTimeout(60000);
  await page.goto(URL);

  await page.waitForSelector(".qZmL0");

  const moreReviewButton = await page.$("c-wiz[jsrenderer='C7s1K'] .VMq4uf button");

  if (moreReviewButton) {
    await page.click("c-wiz[jsrenderer='C7s1K'] .VMq4uf button");
    await page.waitForSelector(".RHo1pe .h3YV2d");
    await scrollPage(page, ".RHo1pe .h3YV2d", ".odk6He");
  }
  const reviews = await getReviewsFromPage(page);

  await browser.close();

  return reviews;
}

getAppReviews().then((result) => console.dir(result, { depth: null }));


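First, we need to create a Node.js* project and add the npm packages puppeteer, puppeteer-extra and puppeteer-extra-plugin-stealth to control Chromium (or Chrome, or Firefox, but for now we work only with Chromium) over the DevTools Protocol in headless or non-headless mode.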


$ npm init -y


$ npm i puppeteer puppeteer-extra puppeteer-extra-plugin-stealth

*If you don't have Node.js installed, you can download it from nodejs.org and follow the installation documentation.

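Note: You can also use puppeteer without any extensions, but I strongly recommend using it with puppeteer-extra and puppeteer-extra-plugin-stealth to prevent websites from detecting that you are using headless Chromium or a web driver. You can check it on the Chrome headless tests website. The screenshot below shows the difference.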




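The next step is to extract data from the HTML elements once scrolling is finished. Getting the right CSS selectors is fairly easy with the SelectorGadget Chrome extension, which lets you grab a CSS selector by clicking on the desired element in the browser. However, it does not always work perfectly, especially when the website relies heavily on JavaScript.

If you want to learn more about them, we have a dedicated Web Scraping with CSS Selectors blog post at SerpApi.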




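Declare puppeteer from the puppeteer-extra library to control the Chromium browser, and StealthPlugin from the puppeteer-extra-plugin-stealth library to prevent websites from detecting that you are using a web driver: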

const puppeteer = require("puppeteer-extra");
const StealthPlugin = require("puppeteer-extra-plugin-stealth");

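Next, we "say" to puppeteer to use StealthPlugin, write the necessary request parameters and the search URL, and set how many reviews we want to receive (the reviewsLimit constant):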


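puppeteer.use(StealthPlugin()); // register the stealth plugin before launching

const reviewsLimit = 100; // hardcoded limit for demonstration purpose

const searchParams = {
  id: "com.discord", // Parameter defines the ID of a product you want to get the results for
  hl: "en", // Parameter defines the language to use for the Google search
  gl: "us", // parameter defines the country to use for the Google search
};

// Google Play app details page for the chosen app, language and country
const URL = `https://play.google.com/store/apps/details?id=${searchParams.id}&hl=${searchParams.hl}&gl=${searchParams.gl}`;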
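As a hedged alternative to the template string, the same address can be assembled with Node's built-in `URL` and `URLSearchParams` classes, which handle the encoding of parameter values. `buildPlayUrl` is a hypothetical helper name used only for this sketch:

```javascript
// Sketch: build the Google Play app page URL from a params object.
// `buildPlayUrl` is a hypothetical helper; it is not part of Puppeteer.
function buildPlayUrl({ id, hl = "en", gl = "us" }) {
  const url = new URL("https://play.google.com/store/apps/details");
  // URLSearchParams encodes the values, so unusual characters stay safe
  url.search = new URLSearchParams({ id, hl, gl }).toString();
  return url.toString();
}

console.log(buildPlayUrl({ id: "com.discord" }));
// https://play.google.com/store/apps/details?id=com.discord&hl=en&gl=us
```

Keep in mind that the walkthrough names its own constant `URL`, which shadows the global `URL` class inside that file, so the constant would need a different name if this helper lived next to it.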


async function scrollPage(page, clickElement, scrollContainer) {


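Then we use a while loop in which we click (page.click() method) on a review element to keep the focus inside the list, wait 0.5 seconds (using the waitForTimeout method), press the "End" key to scroll to the last review element, wait 2 seconds and get the new scrollContainer height. If the height has not changed, or more than reviewsLimit reviews are loaded, we stop the loop; otherwise we store newHeight in lastHeight and repeat: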


let lastHeight = await page.evaluate(`document.querySelector("${scrollContainer}").scrollHeight`);
while (true) {
  await page.click(clickElement); // keep the focus inside the reviews list
  await page.waitForTimeout(500);
  await page.keyboard.press("End"); // scroll to the last loaded review
  await page.waitForTimeout(2000);
  let newHeight = await page.evaluate(`document.querySelector("${scrollContainer}").scrollHeight`);
  const reviews = await page.$$(".RHo1pe");
  if (newHeight === lastHeight || reviews.length > reviewsLimit) {
    break;
  }
  lastHeight = newHeight;
}
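To make the stop condition easier to reason about, here is a minimal sketch of the same loop decoupled from Puppeteer, so it can be run against a fake page. `scrollUntilStable` and its callback names are hypothetical. The `sleep` helper stands in for `page.waitForTimeout`, which, as far as I know, is deprecated in newer Puppeteer releases:

```javascript
// Sketch of the "scroll until nothing new loads" loop, decoupled from Puppeteer.
// `scrollUntilStable` and its callbacks are hypothetical names for illustration.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function scrollUntilStable({ getHeight, getCount, scrollOnce, limit, pauseMs = 0 }) {
  let lastHeight = await getHeight();
  let rounds = 0;
  while (true) {
    await scrollOnce(); // e.g. click a review, then press "End"
    await sleep(pauseMs); // give the page time to load more reviews
    rounds += 1;
    const newHeight = await getHeight();
    const count = await getCount();
    // stop when the container stopped growing or enough reviews are loaded
    if (newHeight === lastHeight || count > limit) break;
    lastHeight = newHeight;
  }
  return rounds;
}

// Usage with a fake page that grows three times and then stops
const fakePage = { height: 100, count: 10, growthLeft: 3 };
scrollUntilStable({
  getHeight: async () => fakePage.height,
  getCount: async () => fakePage.count,
  scrollOnce: async () => {
    if (fakePage.growthLeft > 0) {
      fakePage.growthLeft -= 1;
      fakePage.height += 100;
      fakePage.count += 10;
    }
  },
  limit: 100,
}).then((rounds) => console.log(`stopped after ${rounds} scroll rounds`));
```

With Puppeteer, the callbacks would wrap `page.evaluate`, `page.$$`, `page.click` and `page.keyboard.press("End")`.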


async function getReviewsFromPage(page) {



return await page.evaluate(() => ({
    reviews: Array.from(document.querySelectorAll(".RHo1pe")).map((el) => ({


title: el.querySelector(".X5PpBb")?.textContent.trim(),
avatar: el.querySelector(".gSGphe > img")?.getAttribute("srcset")?.slice(0, -3),
rating: parseInt(el.querySelector(".Jx4nYe > div")?.getAttribute("aria-label")?.slice(6)),
snippet: el.querySelector(".h3YV2d")?.textContent.trim(),
likes: parseInt(el.querySelector(".AJTPZc")?.textContent.trim()) || "No likes",
date: el.querySelector(".bp9Aid")?.textContent.trim(),
response: {
  title: el.querySelector(".ocpBU .I6j64d")?.textContent.trim(),
  snippet: el.querySelector(".ocpBU .ras4vb")?.textContent.trim(),
  date: el.querySelector(".ocpBU .I9Jtec")?.textContent.trim(),
},
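The string clean-up steps in these fields (`slice(6)`, `slice(0, -3)`, `parseInt`) are easier to follow in isolation. The sketch below pulls them out as pure functions. The helper names are hypothetical, and the sample inputs are assumptions about what Google Play renders, not values taken from the page:

```javascript
// Sketch: the string clean-up steps from the extraction code as pure functions.
// The sample inputs below are assumptions about what Google Play renders.

// "Rated 4 stars out of five stars" -> 4 (slice(6) drops the "Rated " prefix)
const parseRating = (ariaLabel) => parseInt(ariaLabel?.slice(6), 10);

// "https://example.com/avatar.png 2x" -> URL without the trailing " 2x"
const parseAvatar = (srcset) => srcset?.slice(0, -3);

// "12 people found this review helpful" -> 12, missing text -> "No likes"
const parseLikes = (text) => parseInt(text?.trim(), 10) || "No likes";

console.log(parseRating("Rated 4 stars out of five stars")); // 4
console.log(parseAvatar("https://example.com/avatar.png 2x")); // https://example.com/avatar.png
console.log(parseLikes(undefined)); // No likes
```

Note that `likes` ends up as either a number or the string "No likes". Returning `null` for the missing case would keep the field's type consistent if the data is processed further.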


async function getAppReviews() {

First, in this function we need to define browser using the puppeteer.launch({options}) method with the current options, such as headless: true and args: ["--no-sandbox", "--disable-setuid-sandbox"]. Then we open a new page:


const browser = await puppeteer.launch({
  headless: true, // if you want to see what the browser is doing, you need to change this option to "false"
  args: ["--no-sandbox", "--disable-setuid-sandbox"],
});

const page = await browser.newPage();
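If you switch between headless and visible mode often, editing the source each time gets tedious. A small sketch of one way around it, assuming a hypothetical `HEADFUL` environment variable and a hypothetical `buildLaunchOptions` helper:

```javascript
// Sketch: choose the launch options from an environment variable.
// `HEADFUL` and `buildLaunchOptions` are hypothetical names for illustration.
function buildLaunchOptions(env = process.env) {
  return {
    headless: env.HEADFUL !== "1", // HEADFUL=1 opens a visible browser window
    args: ["--no-sandbox", "--disable-setuid-sandbox"],
  };
}

console.log(buildLaunchOptions({ HEADFUL: "1" }).headless); // false
console.log(buildLaunchOptions({}).headless); // true
```

The result can be passed straight to the launch call, for example `puppeteer.launch(buildLaunchOptions())`.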

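Next, we raise the default navigation timeout (30 sec) to 60000 ms (1 min) with the setDefaultNavigationTimeout method to allow for slow internet connections, go to the URL with the goto method, and use the waitForSelector method to wait until the selector is loaded: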

await page.setDefaultNavigationTimeout(60000);
await page.goto(URL);
await page.waitForSelector(".qZmL0");
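Even with a longer timeout, a slow connection can still make the navigation fail. One common way to soften that is a small retry wrapper. The sketch below is not part of the scraper above, and `retry` is a hypothetical helper name:

```javascript
// Sketch: retry an async step a few times before giving up.
// `retry` is a hypothetical helper, not a Puppeteer API.
async function retry(task, { attempts = 3, delayMs = 0 } = {}) {
  let lastError;
  for (let attempt = 1; attempt <= attempts; attempt += 1) {
    try {
      return await task(attempt);
    } catch (error) {
      lastError = error; // remember the failure and try again
      if (attempt < attempts) {
        await new Promise((resolve) => setTimeout(resolve, delayMs));
      }
    }
  }
  throw lastError; // every attempt failed
}

// Usage with a fake step that fails twice, then succeeds
let calls = 0;
retry(async () => {
  calls += 1;
  if (calls < 3) throw new Error("timeout");
  return "loaded";
}).then((result) => console.log(result, "after", calls, "calls"));
```

In the scraper this could wrap the navigation step, for example `await retry(() => page.goto(URL))`.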


const moreReviewButton = await page.$("c-wiz[jsrenderer='C7s1K'] .VMq4uf button");

if (moreReviewButton) {
  await page.click("c-wiz[jsrenderer='C7s1K'] .VMq4uf button");
  await page.waitForSelector(".RHo1pe .h3YV2d");
  await scrollPage(page, ".RHo1pe .h3YV2d", ".odk6He");
}
const reviews = await getReviewsFromPage(page);

await browser.close();

return reviews;
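If any step before `browser.close()` throws (a selector that never appears, for instance), the browser process stays open. A minimal sketch of guarding against that with try/finally follows. `withBrowser` is a hypothetical wrapper, and the fake browser object only imitates the `close()` method:

```javascript
// Sketch: always close the browser, even when scraping throws.
// `withBrowser` is a hypothetical wrapper; `launch` is any function returning
// an object with a close() method (puppeteer.launch in the real script).
async function withBrowser(launch, task) {
  const browser = await launch();
  try {
    return await task(browser);
  } finally {
    await browser.close(); // runs on success and on error
  }
}

// Usage with a fake browser object
const fakeBrowser = {
  closed: false,
  async close() {
    this.closed = true;
  },
};
withBrowser(async () => fakeBrowser, async () => {
  throw new Error("selector not found");
}).catch((error) => console.log(error.message, "| browser closed:", fakeBrowser.closed));
```

In the scraper, the first argument would be something like `() => puppeteer.launch({ headless: true })` and the second would hold the page logic.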


$ node YOUR_FILE_NAME # YOUR_FILE_NAME is the name of your .js file


{
   "reviews":[
      {
         "title":"Faera Rathion",
         "snippet":"I would've given this 5 stars a few months ago, being a long time user, but these recent updates have made the app extremely frustrating to use. I get randomly put into channels when I open the app, they scroll me back sometimes hundreds of messages, it's impossible to see all the channels in some Discords, doesn't clear notifications without having to try to fully scroll through a channel I was mentioned in to the point of having to refresh it multiple times and many more consistent issues.",
         "date":"October 19, 2022",
         "response":{
            "title":"Discord Inc.",
            "snippet":"We're sorry for the inconvenience. We hear you and our teams are actively working on rolling out fixes daily. If you continue to experience issues, please make sure your app is on the latest updated version. Also, your feedback greatly affects what we focus on so please let us know if you continue to have issues at",
            "date":"October 19, 2022"
         }
      },
      {
         "title":"Avoxx Nepps",
         "snippet":"The new update has made it borderline unusable. It is extremely glitchy and a lot of times doesn't even work properly. Can't even join a voice call without it leaving and rejoining by itself or muting me for unknown reason. The new video system absolutely sucks. All of the minor inconveniences the previous version had is nothing compared to this update which looks like it was thrown together by a team of teenagers in Middle School in a month for a school project.",
         "likes":"No likes",
         "date":"October 20, 2022",
         "response":{
            "title":"Discord Inc.",
            "snippet":"We'd like to know more about the issues you've encountered after the recent update. Could you please submit a support ticket so we can look into the issue?: If you have any suggestions about what should be changed or improved, please share them on our Feedback page here:",
            "date":"October 21, 2022"
         }
      },
      ...and other reviews
   ]
}
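Once the result object is in hand, it can be post-processed like any other JavaScript data. As a small sketch, the hypothetical `summarize` helper below counts how many reviews received a developer response. The sample data is invented for the example:

```javascript
// Sketch: post-process the scraped result. `summarize` is a hypothetical helper.
function summarize({ reviews }) {
  // a review counts as answered when its response has a title
  const answered = reviews.filter((review) => review.response?.title);
  return { total: reviews.length, answered: answered.length };
}

// Invented sample in the same shape as the scraper output
const sample = {
  reviews: [
    { title: "User A", response: { title: "Discord Inc." } },
    { title: "User B", response: { title: undefined } },
  ],
};
console.log(summarize(sample)); // { total: 2, answered: 1 }
```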

Using Google Play Product API from SerpApi





npm i google-search-results-nodejs

Here is the full code example, if you don't need an explanation:

const SerpApi = require("google-search-results-nodejs");
const search = new SerpApi.GoogleSearch(process.env.API_KEY); // your SerpApi API key

const reviewsLimit = 100; // hardcoded limit for demonstration purpose

const params = {
  engine: "google_play_product", // search engine
  gl: "us", // parameter defines the country to use for the Google search
  hl: "en", // parameter defines the language to use for the Google search
  store: "apps", // parameter defines the type of Google Play store
  product_id: "com.discord", // Parameter defines the ID of a product you want to get the results for.
  all_reviews: "true", // Parameter is used for retrieving all reviews of a product
};

const getJson = () => {
  return new Promise((resolve) => {
    search.json(params, resolve);
  });
};

const getResults = async () => {
  const allReviews = [];
  while (true) {
    const json = await getJson();
    if (json.reviews) {
      allReviews.push(...json.reviews);
    } else break;
    if (json.serpapi_pagination?.next_page_token) {
      params.next_page_token = json.serpapi_pagination?.next_page_token;
    } else break;
    if (allReviews.length > reviewsLimit) break;
  }
  return allReviews;
};

getResults().then((result) => console.dir(result, { depth: null }));



const SerpApi = require("google-search-results-nodejs");
const search = new SerpApi.GoogleSearch(API_KEY);


const reviewsLimit = 100; // hardcoded limit for demonstration purpose

const params = {
  engine: "google_play_product", // search engine
  gl: "us", // parameter defines the country to use for the Google search
  hl: "en", // parameter defines the language to use for the Google search
  store: "apps", // parameter defines the type of Google Play store
  product_id: "com.discord", // Parameter defines the ID of a product you want to get the results for.
  all_reviews: "true", // Parameter is used for retrieving all reviews of a product
};


const getJson = () => {
  return new Promise((resolve) => {
    search.json(params, resolve);
  });
};
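The wrapper above turns a callback-style call into a Promise so it can be awaited. The sketch below shows the same pattern against a stand-in object, so it runs without an API key. `fakeSearch` only imitates the `json(params, callback)` shape used above and is not the real SerpApi client. `getJsonFrom` is a hypothetical name:

```javascript
// Sketch: wrap a callback-style call in a Promise.
// `fakeSearch` imitates the shape of search.json(params, callback);
// it is a stand-in for illustration, not the real SerpApi client.
const fakeSearch = {
  json(params, callback) {
    setTimeout(() => callback({ reviews: [], echoed: params.product_id }), 0);
  },
};

const getJsonFrom = (search, params) =>
  new Promise((resolve) => {
    search.json(params, resolve); // resolve receives the parsed response
  });

getJsonFrom(fakeSearch, { product_id: "com.discord" }).then((json) =>
  console.log(json.echoed)
);
```

Passing `search` and `params` as arguments, instead of reading them from the enclosing scope, makes the wrapper easier to test and reuse.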


const getResults = async () => {


const allReviews = [];

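Next, we need to use a while loop. In this loop we get the json with results, check whether reviews are present on the page, push them (push() method) to the allReviews array (using spread syntax), set next_page_token on the params object, and repeat the loop until there are no more results on the page or the number of received reviews is greater than reviewsLimit: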

while (true) {
  const json = await getJson();
  if (json.reviews) {
    allReviews.push(...json.reviews);
  } else break;
  if (json.serpapi_pagination?.next_page_token) {
    params.next_page_token = json.serpapi_pagination?.next_page_token;
  } else break;
  if (allReviews.length > reviewsLimit) break;
}
return allReviews;
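Because each page adds several reviews at once, the loop above can return more than reviewsLimit items. Below is a minimal sketch of the same token-based pagination with the result trimmed to the limit. `collectReviews` and `fetchPage` are hypothetical names, and the fake pages only mimic the `reviews` and `serpapi_pagination.next_page_token` fields used above:

```javascript
// Sketch: token-based pagination with a hard cap on collected items.
// `fetchPage` is a hypothetical async function returning
// { reviews, serpapi_pagination: { next_page_token } } like the JSON above.
async function collectReviews(fetchPage, limit) {
  const all = [];
  let token; // undefined requests the first page
  while (all.length < limit) {
    const json = await fetchPage(token);
    if (!json.reviews?.length) break; // no more results
    all.push(...json.reviews);
    token = json.serpapi_pagination?.next_page_token;
    if (!token) break; // last page reached
  }
  return all.slice(0, limit); // trim the overshoot from the final page
}

// Usage with three fake pages of two reviews each
const pages = {
  start: { reviews: ["a", "b"], serpapi_pagination: { next_page_token: "p2" } },
  p2: { reviews: ["c", "d"], serpapi_pagination: { next_page_token: "p3" } },
  p3: { reviews: ["e", "f"] },
};
collectReviews(async (token) => pages[token ?? "start"], 3).then((reviews) =>
  console.log(reviews)
);
```

Keeping the token in a local variable also avoids mutating the shared params object between requests.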


getResults().then((result) => console.dir(result, { depth: null }));


[
   {
      "title":"Johnathan Kamuda",
      "snippet":"Been using Discord for many, many years. They are always making it better. It's become so much more robust and feature filled since I first started using it. And it's platform to pay for extras is great. You don't NEED to, but it's nice to have that kind of service a available if we wanted some perks. I think some of the options could be laid out better. Personal example - changing individuals volume in a call, not an intuitive option to find at first. Things like that fixed, would be perfect.",
      "date":"October 19, 2022"
   },
   {
      "title":"Lark Reid",
      "snippet":"Ever since the new update me and other people that I know have completely lost the ability to upload more than one image/video at a time. It freezes on 70-100% when uploading multiple at a time. Now the audio on videos that I upload turn into static. I played the videos on my phone to make sure they weren't corrupted, and they are just fine. Sometimes when I open the app it gets stuck connecting and I have to restart it. Please fix your app asap. It's just not my phone that is effected.",
      "date":"October 21, 2022",
      "response":{
         "title":"Discord Inc.",
         "snippet":"We're sorry for the inconvenience. We hear you and our teams are actively working on rolling out fixes daily. If you continue to experience issues, please make sure your app is on the latest updated version. Also, your feedback greatly affects what we focus on so please let us know if you continue to have issues at",
         "date":"October 21, 2022"
      }
   },
   ... and other reviews
]


If you want to see some projects made with SerpApi, write me a message.

Join us on Twitter | YouTube

Add a Feature Request or a Bug