This article covers an approach to designing a database for periodic sensor data; hopefully the discussion below is a useful reference for anyone solving a similar problem.

Problem description
I'm designing a PostgreSQL database that takes in readings from many sensor sources. I've done a lot of research into the design and I'm looking for some fresh input to help get me out of a rut here.

To be clear, I am not looking for help describing the sources of data or any related metadata. I am specifically trying to figure out how to best store data values (eventually of various types).

The basic structure of the data coming in is as follows:

- For each data logging device, there are several channels.
- For each channel, the logger reads data and attaches it to a record with a timestamp.
- Different channels may have different data types, but generally a float4 will suffice.
- Users should (through database functions) be able to add different value types, but this concern is secondary.
- Loggers and channels will also be added through functions.

The distinguishing characteristic of this data layout is that I've got many channels associating data points to a single record with a timestamp and index number.

Now, to describe the data volume and common access patterns:

- Data will be coming in for about 5 loggers, each with 48 channels, every minute.
- The total data volume in this case will be 345,600 readings per day, 126 million per year, and this data needs to be continually read for the next 10 years at least.
- More loggers & channels will be added in the future, possibly from physically different types of devices but hopefully with similar storage representation.
- Common access will include querying similar channel types across all loggers and joining across logger timestamps.
For example, get channel1 from logger1, channel4 from logger2, and do a full outer join on logger1.time = logger2.time.

I should also mention that each logger timestamp is something that is subject to change due to time adjustment, and will be described in a different table showing the server's time reading, the logger's time reading, transmission latency, clock adjustment, and the resulting adjusted clock value. This will happen for a set of logger records/timestamps depending on retrieval. This is my motivation for RecordTable below, but otherwise isn't of much concern for now as long as I can reference a (logger, time, record) row from somewhere that will change the timestamps for associated data.

I have considered quite a few schema options, the most simple resembling a hybrid EAV approach where the table itself describes the attribute, since most attributes will just be a real value called "value". Here's a basic layout:

RecordTable          DataValueTable
-----------          --------------
[PK] id         <--  [FK] record_id
[FK] logger_id       [FK] channel_id
record_number        value
logger_time

Considering that logger_id, record_number, and logger_time are unique, I suppose I am making use of surrogate keys here, but hopefully my justification of saving space is meaningful here. I have also considered adding a PK id to DataValueTable (rather than the PK being record_id and channel_id) in order to reference data values from other tables, but I am trying to resist the urge to make this model "too flexible" for now. I do, however, want to start getting data flowing soon and not have to change this part when extra features or differently-structured data need to be added later.

At first, I was creating record tables for each logger and then value tables for each channel and describing them elsewhere (in one place), with views to connect them all, but that just felt "wrong" because I was repeating the same thing so many times.
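For concreteness, the layout above might be sketched in PostgreSQL DDL as follows, together with the cross-logger query described earlier. The column types, constraints, and the specific logger/channel ids are my assumptions for illustration; only the table and column names come from the diagram:

```sql
-- Hypothetical DDL for the RecordTable / DataValueTable layout.
-- Types and constraints are assumptions; logger_id and channel_id
-- are presumed to reference logger/channel tables defined elsewhere.
CREATE TABLE RecordTable (
    id            serial PRIMARY KEY,
    logger_id     integer     NOT NULL,  -- FK to a logger table
    record_number integer     NOT NULL,
    logger_time   timestamptz NOT NULL,
    UNIQUE (logger_id, record_number, logger_time)
);

CREATE TABLE DataValueTable (
    record_id  integer NOT NULL REFERENCES RecordTable (id),
    channel_id integer NOT NULL,         -- FK to a channel table
    value      real,                     -- float4
    PRIMARY KEY (record_id, channel_id)
);

-- "Get channel1 from logger1, channel4 from logger2, and do a full
-- outer join on logger1.time = logger2.time" (ids are placeholders):
SELECT COALESCE(l1.logger_time, l2.logger_time) AS t,
       l1.value AS logger1_channel1,
       l2.value AS logger2_channel4
FROM  (SELECT r.logger_time, v.value
       FROM RecordTable r
       JOIN DataValueTable v ON v.record_id = r.id
       WHERE r.logger_id = 1 AND v.channel_id = 1) AS l1
FULL OUTER JOIN
      (SELECT r.logger_time, v.value
       FROM RecordTable r
       JOIN DataValueTable v ON v.record_id = r.id
       WHERE r.logger_id = 2 AND v.channel_id = 4) AS l2
ON l1.logger_time = l2.logger_time;
```

The COALESCE handles the rows that exist on only one side of the full outer join, which is exactly the case the question is after when two loggers don't share every timestamp.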
I guess I'm trying to find a happy medium between too many tables and too many rows, but partitioning the bigger data (DataValueTable) seems strange because I'd most likely be partitioning on channel_id, so each partition would have the same value for every row. Also, partitioning in that regard would require a bit of work in re-defining the check conditions in the main table every time a channel is added. Partitioning by date is only applicable to the RecordTable, which isn't really necessary considering how relatively small it will be (7,200 rows per day with the 5 loggers).

I also considered using the above with partial indexes on channel_id, since DataValueTable will grow very large but the set of channel ids will remain small-ish, but I am really not certain that this will scale well after many years. I have done some basic testing with mock data and the performance is only so-so, and I want it to remain exceptional as data volume grows. Also, some express concern with vacuuming and analyzing a large table, and dealing with a large number of indexes (up to 250 in this case).

On a very small side note, I will also be tracking changes to this data and allowing for annotations (e.g. a bird crapped on the sensor, so these values were adjusted/marked etc.), so keep that in the back of your mind when considering the design here, but it is a separate concern for now.

Some background on my experience/technical level, if it helps to see where I'm coming from: I am a CS PhD student, and I work with data/databases on a regular basis as part of my research. However, my practical experience in designing a robust database for clients (this is part of a business) that has exceptional longevity and flexible data representation is somewhat limited.
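As a sketch of the two physical-design alternatives weighed above: in modern PostgreSQL (10+), declarative list partitioning replaces the inheritance-plus-CHECK-constraint scheme whose maintenance cost the question mentions, though adding a partition per new channel is still a manual step. Table names and channel ids here are placeholders, and the two options are alternatives, not meant to coexist:

```sql
-- Option A: partition DataValueTable by channel_id. Every row in a
-- partition shares the same channel_id, and each new channel needs
-- its own CREATE TABLE ... PARTITION OF.
CREATE TABLE DataValueTable (
    record_id  integer NOT NULL,
    channel_id integer NOT NULL,
    value      real,
    PRIMARY KEY (record_id, channel_id)
) PARTITION BY LIST (channel_id);

CREATE TABLE DataValueTable_ch1 PARTITION OF DataValueTable
    FOR VALUES IN (1);  -- repeated per channel as channels are added

-- Option B: keep one unpartitioned table and create a partial index
-- per channel (with ~250 channels, that is ~250 small indexes rather
-- than one huge composite index).
CREATE INDEX datavalue_ch1_idx ON DataValueTable (record_id)
    WHERE channel_id = 1;
```

With Option B, the planner only considers a partial index for queries whose WHERE clause provably implies `channel_id = 1`, which matches the access pattern of pulling one channel across a time range.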
I think my main problem now is I am considering all the angles of approach to this problem instead of focusing on getting it done, and I don't see a "right" solution in front of me at all.

So in conclusion, I guess these are my primary queries for you: if you've done something like this, what has worked for you? What are the benefits/drawbacks I'm not seeing of the various designs I've proposed here? How might you design something like this, given these parameters and access patterns?

I'll be happy to provide clarification/details where needed, and thanks in advance for being awesome.

Solution

It is no problem at all to provide all this in a Relational database. PostgreSQL is not enterprise class, but it is certainly one of the better freeware SQLs.

"To be clear, I am not looking for help describing the sources of data or any related metadata. I am specifically trying to figure out how to best store data values (eventually of various types)."

That is your biggest obstacle. Contrary to program design, which allows decomposition and isolated analysis/design of components, databases need to be designed as a single unit. Normalisation and other design techniques need to consider both the whole and the component in context. The data, the descriptions, and the metadata have to be evaluated together, not as separate parts.

Second, when you start off with surrogate keys, implying that you know the data and how it relates to other data, it prevents you from genuine modelling of the data.

I have answered a very similar set of questions, coincidentally re very similar data. If you could read those answers first, it would save us both a lot of typing time on your question/answer.

- Answer One / ID Obstacle
- Answer Two / Main
- Answer Three / Historical