Meatballs.live: a hybrid Hacker News experience with Redis Stack - Part 3

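Welcome to part 3 of this series!

If you are looking for my hackathon submission, head over to part 2.

Edit: Since then, I have migrated the ingest and generate services from Vercel ($20/mo) to Railway ($5-10/mo) to reduce costs. An updated hackathon installation guide is in progress.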

Jobs Server services-migration 分支
Services server repo

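Now that submissions for the DEV x Redis hackathon are closed and judging has begun, I will wrap up (?) this series with a post-mortem that compares Meatballs.live (MB) collections with the Hacker News (HN) front page.

The Hacker News front page keeps changing, so the ranking positions listed in the table below may differ after the publication date. I will edit as appropriate.

Edit: the table has been updated to compare original rank/updated rank.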

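MB	HN	Title	Created (CT)
1	57/58	Observations from our Joe Rogan Experience experience	6:41 AM
2	65/66	Telegram asks German users when to share information with law enforcement	5:51 AM
3	80/81	Covid fudge factors - effective Covid data corruption and methods	7:13 AM
4	79/80	Issue trackers considered harmful	3:13 AM
5	5/16	Why is WebMD so bad?	8:03 AM
6	85/85	Ask HN: Should I notify investors that my startup employer is a scam?	3:33 AM
7	60/61	Using a phone as a software development platform	3:15 AM
8	42/43	Ask HN: How do you get side gigs?	6:19 AM
9	28/35	Remote-controlled lawn tractor	7:08 AM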

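At the time of publication, only one story in the Meatballs.live collections generated for Aug 30 was in the top 9 of Hacker News.

Edit (Hacker News top 9): the Meatballs.live algorithm is biased toward stories created earlier in the day, because the processor that generates collections is limited to a 24-hour (00:00-23:59) range. It would make sense to extend the upper end of the range by another 12 hours or so. Fortunately, the data is already there, so it is just a matter of regenerating past dates.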

Our algorithms are completely different.

For context, here is an insightful post from last year that covers Hacker News ranking.

HN: rankingScore = pow(upvotes, 0.8) / pow(ageHours + 2, 1.8)
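To make the age penalty concrete, here is a minimal sketch of that formula in TypeScript. The function name hnRankingScore is my own and not taken from Hacker News or from the Meatballs.live codebase; only the formula comes from the quoted post.

```typescript
// Hypothetical helper: evaluates the ranking formula quoted above.
// upvotes and ageHours are the only inputs the formula needs.
function hnRankingScore(upvotes: number, ageHours: number): number {
  return Math.pow(upvotes, 0.8) / Math.pow(ageHours + 2, 1.8)
}

// With equal upvotes, the younger story scores higher.
const fresh = hnRankingScore(100, 1)
const stale = hnRankingScore(100, 12)
console.log(fresh > stale)

// Twice the upvotes does not make up for eleven extra hours of age.
console.log(hnRankingScore(200, 12) < hnRankingScore(100, 1))
```

Because the age exponent (1.8) is larger than the upvote exponent (0.8), age dominates quickly, which is why a front page built on this formula favors recent stories.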

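Meatballs.live, on the other hand, measures time series performance, with an emphasis on scoring comments. Let's walk through how these results are produced.

Generating collections algorithm

Through the koude1 endpoint, the previous day's collections (or those of any day with ingested data) can be generated at 00:00 UTC. The API only requires a dateKey parameter in the format YYYY:M:D, e.g. 2022:8:30. It then executes koude4.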
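As an illustration of the dateKey format, the sketch below parses a YYYY:M:D key and derives the millisecond bounds of that day. The helper names parseDateKey and dayRangeInMilliseconds are hypothetical, the shape of CollectionDate is inferred from the fields used in the post's code, and treating the day as a UTC day is an assumption on my part.

```typescript
// Assumed shape, inferred from the year/month/day fields used in the post.
interface CollectionDate {
  year: number
  month: number
  day: number
}

// Hypothetical helper: parses "YYYY:M:D" and rejects malformed or impossible dates.
function parseDateKey(dateKey: string): CollectionDate | undefined {
  const match = /^(\d{4}):(\d{1,2}):(\d{1,2})$/.exec(dateKey)
  if (!match) return undefined

  const year = Number(match[1])
  const month = Number(match[2])
  const day = Number(match[3])

  // Round-trip through Date so that keys such as 2022:2:31 are rejected.
  const date = new Date(Date.UTC(year, month - 1, day))
  const isRealDate =
    date.getUTCFullYear() === year &&
    date.getUTCMonth() === month - 1 &&
    date.getUTCDate() === day

  return isRealDate ? { year, month, day } : undefined
}

// Hypothetical helper: first and last millisecond of the day (assumes UTC).
function dayRangeInMilliseconds({ year, month, day }: CollectionDate) {
  const start = Date.UTC(year, month - 1, day)
  const end = start + 24 * 60 * 60 * 1000 - 1
  return { start, end }
}

const parsed = parseDateKey('2022:8:30')
if (parsed) {
  const { start, end } = dayRangeInMilliseconds(parsed)
  console.log(new Date(start).toISOString(), new Date(end).toISOString())
}
```

Bounds like these are the kind of values a time series range query over a single day needs as its start and end timestamps.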

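What happens next?

Return success: false if the requested dateKey is earlier than the environment variable MEATBALLS_COLLECTIONS_START_DATE_KEY or is too late
Return success: false if collection data already exists for the requested dateKey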

export const getCollectionsByDate = async ({
  repository,
  date: { year, month, day }
}: {
  repository: Repository<Collection>
  date: CollectionDate
}) =>
  await repository
    .search()
    .where('year')
    .eq(year)
    .and('month')
    .eq(month)
    .and('day')
    .eq(day)
    .sortBy('position')
    .return.all()

Return success: false if no time series data is found for the requested dateKey

await redisClient.ts.mRange(
  startOfRequestedDayInMilliseconds,
  endOfRequestedDayInMilliseconds,
  ['type=weighted', 'compacted=day'],
  {
    GROUPBY: { label: 'story', reducer: TimeSeriesReducers.MAXIMUM }
  }
)

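Sort by highest sample value, descending, and return the top 20; the value is calculated by koude12 from story activity and saved to the story's time series
Return success: false if the specific story data is not found in the graph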

const findStoriesTransaction = redisClient.multi()

// prepare transaction calls
timeSeriesWithSamples.map((series) => {
  const storyId = series.key.replace('story=', `${DATA_SOURCE.HN}:`)

  findStoriesTransaction.graph.query(
    `${MEATBALLS_DB_KEY.GRAPH}`,
    `
    MATCH (s:Story)
    WHERE s.name = "${storyId}"
    return s.name, s.score, s.comment_total, s.created
    `
  )
})

const foundStories = await findStoriesTransaction.exec()

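Rank the found stories, bubbling commented stories to the top, and return the first 9
Build the collections; look up story and comment recommendations and cover images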
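The "sort by highest sample value and return the top 20" step above can be sketched as follows. This is an assumption-laden illustration, not the project's code: the SeriesWithSamples shape mirrors what I would expect a grouped range reply to look like (a key plus a list of timestamp/value samples), and topSeriesByPeak is a hypothetical name.

```typescript
// Assumed reply shape: one entry per story, each with its samples.
interface SeriesWithSamples {
  key: string
  samples: { timestamp: number; value: number }[]
}

// Hypothetical helper: orders series by their highest sample, descending,
// and keeps the first `limit` entries. Series without samples are dropped.
function topSeriesByPeak(
  series: SeriesWithSamples[],
  limit = 20
): SeriesWithSamples[] {
  const peakOf = (s: SeriesWithSamples) =>
    s.samples.reduce((max, { value }) => Math.max(max, value), -Infinity)

  return series
    .filter((s) => s.samples.length > 0)
    .map((s) => ({ s, peak: peakOf(s) }))
    .sort((a, b) => b.peak - a.peak)
    .slice(0, limit)
    .map(({ s }) => s)
}

const ranked = topSeriesByPeak(
  [
    { key: 'story=1', samples: [{ timestamp: 1, value: 3 }] },
    { key: 'story=2', samples: [{ timestamp: 1, value: 9 }, { timestamp: 2, value: 4 }] },
    { key: 'story=3', samples: [] }
  ],
  2
)
console.log(ranked.map(({ key }) => key))
```

Computing each peak once before sorting keeps the comparator cheap, which matters little for 20 stories but keeps the intent readable.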

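More on recommendations after the break.

More on recommendations

My favorite Meatballs.live feature is the comment and story recommendations shown with each collection.

When you open a collection, you get up to 5 comments and, on the right, up to 5 story recommendations based on the story title. For Observations from our Joe Rogan Experience experience, the recommendations look like:

The process for getting recommendations is simple for now, but fairly effective at this stage: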

const queryTitle = storyContent.title
  ? removeSpecialCharacters(storyContent.title).replace(/ /g, '|')
  : undefined

const recommendedStories: { id: string; title: string }[] = []

const foundDocuments = (
    await redisClient.ft.search(`Story:index`, queryTitle, {
      LIMIT: { from: 0, size: 5 }
    })
  ).documents,
  docTitles = foundDocuments.map(({ value }) => value.title)

// skip duplicate titles and the story itself, then collect the rest
foundDocuments
  .filter(
    ({ id, value }, index) =>
      value.title &&
      !docTitles.includes(value.title, index + 1) &&
      id.replace('Story:', '') !== story.id &&
      value.title !== storyContent.title
  )
  .forEach(({ id, value }) => {
    if (value.title)
      recommendedStories.push({
        id: id.replace('Story:hn:', ''),
        title: value.title as string
      })
  })
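To show what the search query ends up looking like, here is a small sketch of the title-to-query step. The body of removeSpecialCharacters is not shown in this post, so the version below is a hypothetical stand-in, and buildTitleQuery is a name I made up. The pipe character works as an OR (union) operator in the RediSearch query syntax, so each word of the title becomes an alternative.

```typescript
// Hypothetical stand-in for the removeSpecialCharacters helper used above:
// keeps letters, digits and spaces, then collapses repeated whitespace.
function removeSpecialCharacters(text: string): string {
  return text
    .replace(/[^a-zA-Z0-9 ]/g, ' ')
    .replace(/\s+/g, ' ')
    .trim()
}

// Hypothetical helper: turns a title into a union query, one term per word.
// Returns undefined when there is nothing left to search for.
function buildTitleQuery(title?: string): string | undefined {
  if (!title) return undefined
  const cleaned = removeSpecialCharacters(title)
  return cleaned ? cleaned.split(' ').join('|') : undefined
}

console.log(buildTitleQuery('Why is WebMD so bad?')) // Why|is|WebMD|so|bad
```

Returning undefined for an empty result makes it easy to skip the search call entirely instead of sending an empty query.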

Finding recommended comments is more complex:

redisClient.graph.query(
  `${MEATBALLS_DB_KEY.GRAPH}`,
  `
  MATCH (:Story { name: "${story.id}" })-[:PROVOKED]->(topComment)<-[:REACTION_TO*1..]-(childComment)
  WITH topComment, collect(childComment) as childComments
  RETURN topComment.name, topComment.created, SIZE(childComments)
  ORDER BY SIZE(childComments) DESC LIMIT 5
  `
)

Here, the query finds at most 5 topComment nodes under the root story, ranked by activity and reaction depth using Cypher's collect aggregate function.
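For readers less familiar with Cypher, the same ranking idea can be sketched in plain TypeScript over an in-memory reply map. This is only an analogy under stated assumptions: ReplyMap, countDescendants and topCommentsByActivity are hypothetical names, and the sketch assumes replies form a tree with no cycles.

```typescript
// Hypothetical in-memory stand-in for the REACTION_TO edges:
// maps a comment id to the ids of its direct replies.
type ReplyMap = Map<string, string[]>

// Counts every reply at any depth below the given comment.
function countDescendants(id: string, childrenOf: ReplyMap): number {
  let count = 0
  const stack = [...(childrenOf.get(id) ?? [])]
  while (stack.length > 0) {
    const current = stack.pop() as string
    count += 1
    stack.push(...(childrenOf.get(current) ?? []))
  }
  return count
}

// Ranks top-level comments by total replies, descending, and keeps `limit`.
// Comments without any reply are dropped, since the variable-length
// pattern in the query requires at least one child comment.
function topCommentsByActivity(
  topCommentIds: string[],
  childrenOf: ReplyMap,
  limit = 5
): { id: string; replies: number }[] {
  return topCommentIds
    .map((id) => ({ id, replies: countDescendants(id, childrenOf) }))
    .filter(({ replies }) => replies > 0)
    .sort((a, b) => b.replies - a.replies)
    .slice(0, limit)
}

const replies: ReplyMap = new Map([
  ['a', ['b', 'c']],
  ['b', ['d']],
  ['e', ['f']]
])
console.log(topCommentsByActivity(['a', 'e', 'g'], replies))
```

In the graph version the traversal is expressed by the variable-length relationship pattern, so none of this bookkeeping has to be written by hand.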

Is it a top comment, or did it spark an engaging conversation?

A visual representation of the MATCH part of the query looks like this when run in RedisInsight:

GRAPH.QUERY _meatballs 'MATCH graph=(:Story { name: "hn:32567147" })-[:PROVOKED]->(topComment)<-[:REACTION_TO*1..]-(childComment)
  RETURN graph'

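First thoughts on improvements?

I'm interested in your feedback. Here are some of my immediate thoughts:

Weight by time, so that the top results are not so often stories created in the morning; i.e. instead of aggregating over the whole day, look at samples over variable time ranges throughout the day and then compare
Support an overwrite parameter on the new-collections API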
Handle the case where processNewCollections fails before completing
Beyond titles, search comment text and users' comments as part of selecting recommended stories

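That's it!

Head over to meatballs.live and let me know what you think. I look forward to building on this project and welcome contributors.

Thanks for reading, and good luck to everyone who participated!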