This article covers the question "Can I use counters on a many-to-many field in a database to reduce lookups?" and an answer to it.

Problem description

I am trying to figure out the fastest way to access data stored in a junction object. The example below is analogous to my problem but uses a different context, because the actual dataset I am dealing with has somewhat unintuitive relationships.

We have 3 classes: User, Product, and Rating. User has a many-to-many relationship to Product with Rating as the junction/'through' class.

The Rating object stores the answers to several questions, each answered with an integer rating on a scale of 1-5 (example questions: how good is the quality of the Product, how good is its value, how user-friendly is it). For simplicity, assume every User rates every Product they buy.
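A minimal sketch of this model as SQLite tables, using Python's built-in sqlite3 module, may help fix ideas. The table and column names (users, products, ratings, quality, value, usability) are assumptions for illustration, not the asker's real schema. Rating becomes a junction table keyed by the (user, product) pair, with one CHECK-constrained column per question.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative assumptions.
SCHEMA = """
CREATE TABLE users    (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- Junction ("through") table: one row per (user, product) purchase and rating.
CREATE TABLE ratings (
    user_id    INTEGER NOT NULL REFERENCES users(id),
    product_id INTEGER NOT NULL REFERENCES products(id),
    quality    INTEGER NOT NULL CHECK (quality   BETWEEN 1 AND 5),
    value      INTEGER NOT NULL CHECK (value     BETWEEN 1 AND 5),
    usability  INTEGER NOT NULL CHECK (usability BETWEEN 1 AND 5),
    PRIMARY KEY (user_id, product_id)
);
"""


def make_db() -> sqlite3.Connection:
    """Create an in-memory database holding the example schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
    conn.executescript(SCHEMA)
    return conn


conn = make_db()
with conn:  # commits the inserts on success
    conn.execute("INSERT INTO users VALUES (1, 'alice')")
    conn.execute("INSERT INTO products VALUES (10, 'kettle')")
    conn.execute("INSERT INTO ratings VALUES (1, 10, 4, 3, 5)")
```

Keying the junction table on (user_id, product_id) encodes "one rating per purchase"; a real design might add a surrogate id instead.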

Now here is the calculation I want to perform: for a User, calculate the average rating of all the Products they have bought (that is, each Product's average rating across all Users who rated it, one of whom is this User). Then we can tell the user "On average, you buy products rated 3/5 for value by all customers who bought that product".
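Here is the quantity written out for one criterion. This assumes "average rating" means the unweighted mean, over the user's products, of each product's mean rating. Pooling all the ratings into a single mean is another possible reading. The symbols are introduced here for illustration: P_u is the set of products user u bought, U_p is the set of users who rated product p, and r_{v,p} is user v's rating of p. Because every User rates what they buy, u itself belongs to each U_p.

```latex
\bar{r}(u) = \frac{1}{|P_u|} \sum_{p \in P_u} \frac{1}{|U_p|} \sum_{v \in U_p} r_{v,p},
\qquad u \in U_p \ \text{for every } p \in P_u .
```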

The simple but slow way is to iterate over all of a user's Rating objects and, for each one, over all the ratings of that Product. If we assume that each user has bought a small number (<100) of products, and each product has n ratings, this is O(100n) = O(n).
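Here is a sketch of the slow approach in Python. It uses hypothetical in-memory data for a single criterion, and the dictionaries and names are assumptions. Each call reads every rating of every purchased product, which is where the O(100n) cost comes from.

```python
from statistics import mean

# Hypothetical sample data for a single criterion ("value"); names are illustrative.
# ratings_by_product[p] holds every user's 1-5 rating of product p.
ratings_by_product: dict[str, list[int]] = {
    "kettle": [4, 2, 3],
    "toaster": [5, 3],
    "blender": [1, 1, 2, 4],
}
# products_by_user[u] is the set of products user u bought (and therefore rated).
products_by_user: dict[str, set[str]] = {
    "alice": {"kettle", "toaster"},
    "bob": {"kettle", "toaster", "blender"},
}


def naive_user_average(user: str) -> float:
    """Mean, over the user's products, of each product's mean rating.

    Every rating of every purchased product is read on each call, so with
    <100 products of n ratings each the cost is O(100n) = O(n).
    """
    per_product = [mean(ratings_by_product[p]) for p in products_by_user[user]]
    return mean(per_product)


print(naive_user_average("alice"))  # prints 3.5 (kettle averages 3, toaster 4)
```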

However, I could also do the following: on the Product class, keep a counter of the number of Ratings that selected each number (e.g. how many Users rated this product 3/5 for value). If that counter is incremented every time a Product is rated, then computing the average for a given Product just requires reading the 5 counters for each rating criterion.
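Here is a sketch of the counter idea, again for one criterion and with hypothetical class and method names. Each product keeps five counters c_1 ... c_5, and its average is sum(k * c_k) / sum(c_k), so the work per product no longer depends on n. The sketch only handles new ratings. Deleting or editing a Rating would need a matching decrement.

```python
from dataclasses import dataclass, field


@dataclass
class ProductCounters:
    """Cached histogram for one criterion: counts[k - 1] = number of ratings equal to k."""

    counts: list[int] = field(default_factory=lambda: [0] * 5)

    def record(self, score: int) -> None:
        """Increment the bucket for a new 1-5 rating (call on every new Rating)."""
        if not 1 <= score <= 5:
            raise ValueError(f"score must be 1-5, got {score}")
        self.counts[score - 1] += 1

    def average(self) -> float:
        """sum(k * c_k) / sum(c_k): reads 5 counters instead of n ratings."""
        total = sum(self.counts)
        if total == 0:
            raise ValueError("product has no ratings yet")
        return sum(k * c for k, c in enumerate(self.counts, start=1)) / total


# Hypothetical data: one counter object per product, fed as ratings arrive.
counters = {name: ProductCounters() for name in ("kettle", "toaster")}
for name, scores in {"kettle": [4, 2, 3], "toaster": [5, 3]}.items():
    for s in scores:
        counters[name].record(s)


def counter_user_average(bought: set[str]) -> float:
    """O(5 * |bought|) work, independent of how many ratings each product has."""
    return sum(counters[p].average() for p in bought) / len(bought)


print(counter_user_average({"kettle", "toaster"}))  # prints 3.5
```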

Is this a valid technique? Is it commonly employed/is there a name for it? It seems intuitive to me, but I don't know enough about databases to tell whether there's some fundamental flaw or not.

Solution

This is normal. It is ultimately caching: encoding of state redundantly to benefit some patterns of usage at the expense of others. Of course it's also a complexification.
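The "at the expense of others" part can be made concrete with one common way to keep such a cache in sync. SQLite triggers adjust a per-product histogram table whenever the ratings table changes, so the cache is updated in the same transaction as the base row. The schema and names are assumptions for illustration. Triggers are only one option; application code or periodic rebuilds are others. Reads of the average become cheap, while every insert, update, or delete on ratings now does extra writes.

```python
import sqlite3
from typing import Optional

# Hypothetical schema for one criterion ("value"); value_counts is the redundant cache.
SCHEMA = """
CREATE TABLE ratings (
    user_id    INTEGER NOT NULL,
    product_id INTEGER NOT NULL,
    value      INTEGER NOT NULL CHECK (value BETWEEN 1 AND 5),
    PRIMARY KEY (user_id, product_id)
);
CREATE TABLE value_counts (
    product_id INTEGER NOT NULL,
    score      INTEGER NOT NULL,
    n          INTEGER NOT NULL,
    PRIMARY KEY (product_id, score)
);
-- Each write to ratings now also writes value_counts: cheaper reads, costlier writes.
CREATE TRIGGER ratings_ins AFTER INSERT ON ratings BEGIN
    INSERT OR IGNORE INTO value_counts VALUES (NEW.product_id, NEW.value, 0);
    UPDATE value_counts SET n = n + 1
     WHERE product_id = NEW.product_id AND score = NEW.value;
END;
CREATE TRIGGER ratings_del AFTER DELETE ON ratings BEGIN
    UPDATE value_counts SET n = n - 1
     WHERE product_id = OLD.product_id AND score = OLD.value;
END;
CREATE TRIGGER ratings_upd AFTER UPDATE ON ratings BEGIN
    UPDATE value_counts SET n = n - 1
     WHERE product_id = OLD.product_id AND score = OLD.value;
    INSERT OR IGNORE INTO value_counts VALUES (NEW.product_id, NEW.value, 0);
    UPDATE value_counts SET n = n + 1
     WHERE product_id = NEW.product_id AND score = NEW.value;
END;
"""


def product_average(conn: sqlite3.Connection, product_id: int) -> Optional[float]:
    """Average 'value' rating computed from the cached counts; None if unrated."""
    row = conn.execute(
        "SELECT CAST(SUM(score * n) AS REAL) / SUM(n)"
        " FROM value_counts WHERE product_id = ?",
        (product_id,),
    ).fetchone()
    return row[0]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
with conn:
    conn.executemany("INSERT INTO ratings VALUES (?, ?, ?)",
                     [(1, 10, 4), (2, 10, 2), (3, 10, 3)])
print(product_average(conn, 10))  # prints 3.0
with conn:
    conn.execute("UPDATE ratings SET value = 5 WHERE user_id = 2 AND product_id = 10")
print(product_average(conn, 10))  # prints 4.0
```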

Just because the RDBMS data structure is relations doesn't mean you can't rearrange how you encode state away from some straightforward form, e.g. by denormalization.

(Sometimes redundant designs, including ones like yours, are called "denormalized" when they are not actually the result of denormalization, and the redundancy is not the kind that denormalization causes or normalization removes; see the related question "Cross Table Dependency/Constraint in SQL Database". Indeed, one could reasonably describe your case as involving normalization without preserving FDs (functional dependencies). Start with a single table holding a user's id & other columns, their ratings (a relation-valued column) & that relation's counter. Then ratings functionally determines counter, since counter = SELECT COUNT(*) FROM ratings. Decompose into user etc + counter, i.e. table User, and user + ratings, which ungroups into table Rating. The FD from ratings to counter now spans the two tables, so neither table's own constraints preserve it.)
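The decomposition in the parenthetical above can be sketched concretely. Once the counter sits on users and the rows it counts sit in ratings, the dependency between them has to be maintained or checked separately. This sketch uses hypothetical table names and SQLite via Python's sqlite3. It detects users whose stored counter has drifted from COUNT(*) of their ratings, then rebuilds the counters from the base rows.

```python
import sqlite3

# Hypothetical decomposed schema: the counter lives on users, the counted rows in ratings.
SCHEMA = """
CREATE TABLE users (
    id           INTEGER PRIMARY KEY,
    rating_count INTEGER NOT NULL DEFAULT 0  -- redundant: should equal COUNT(*) of the user's ratings
);
CREATE TABLE ratings (
    user_id    INTEGER NOT NULL REFERENCES users(id),
    product_id INTEGER NOT NULL,
    PRIMARY KEY (user_id, product_id)
);
"""

# The FD ratings -> counter spans two tables, so no single-table key or CHECK
# enforces it; this query lists users whose stored counter disagrees with the data.
DRIFT_QUERY = """
SELECT u.id, u.rating_count, COUNT(r.product_id) AS actual
  FROM users AS u LEFT JOIN ratings AS r ON r.user_id = u.id
 GROUP BY u.id, u.rating_count
HAVING u.rating_count <> COUNT(r.product_id)
"""

# Recompute every counter from the base rows (the brute-force "invalidate and rebuild").
REBUILD = ("UPDATE users SET rating_count = "
           "(SELECT COUNT(*) FROM ratings WHERE ratings.user_id = users.id)")


def find_drift(conn: sqlite3.Connection) -> list[tuple[int, int, int]]:
    """Return (user id, stored counter, actual count) for every inconsistent user."""
    return conn.execute(DRIFT_QUERY).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
with conn:
    conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, 2), (2, 1)])  # user 2 is wrong
    conn.executemany("INSERT INTO ratings VALUES (?, ?)", [(1, 10), (1, 11)])
print(find_drift(conn))  # prints [(2, 1, 0)]
with conn:
    conn.execute(REBUILD)
print(find_drift(conn))  # prints []
```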


A frequent comment by me: Google many clear, concise & specific phrasings of your question/problem/goal/desiderata, with various subsets of terms & tags as you discover them, with & without your specific names (of variables/databases/tables/columns/constraints/etc). E.g. 'when can i store a (sum OR total) redundantly in a database'. Human phrasing, not just keywords, seems to help. Your best bet may be material on optimizing SQL database designs for performance. There are entire books ('amazon isbn'), some of them online ('pdf'), though perhaps mostly about queries. Also investigate techniques relevant to data warehousing (since an OLTP database acts as an input buffer to an OLAP database) and to using SQL with big data. (E.g. snapshot scheduling.)

PS My calling this "caching" (as the caching tag also does) is, typically of me, rather abstract, to the point where there are serious jokes that everything in CS is caching. (Googling... "There are only two hard problems in Computer Science: cache invalidation and naming things." -- Phil Karlton.) (Welcome to both.)


10-22 15:42