Data conversion (synchronization) between SQL and HBase

Problem description

We are overhauling our product by moving completely from Microsoft and the .NET family to open source (two of the reasons are cost cutting and an exponential increase in data).

We plan to move our data model completely from SQL Server (relational data) to Hadoop (the famous key-value pair ecosystem).

In the beginning, we want to support both versions (say 1.0 and the new v2.0). To maintain data consistency, we plan to sync the data between both systems, which is a fairly challenging and error-prone task, but we don't have any other option.

I am a bit confused about where to start, so I am turning to the community of experts. Any strategy, existing literature, or other kind of guidance in this direction would be greatly helpful.

Solution

I am not entirely sure how your code is structured, but if you currently have a data or persistence layer, or at least a database access class through which all your SQL is executed, you could override the save functions to write changes to both databases. If you do not have a data layer, you may want to consider writing one before starting the transition.
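To make the dual-write idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the class name `DualWriteCustomerStore`, the `customer` table, an in-memory SQLite database standing in for SQL Server, and a plain dict standing in for an HBase table client. It only shows the shape of a save function that writes to both stores.

```python
import sqlite3


class DualWriteCustomerStore:
    """Hypothetical data-access class: every save goes to both stores."""

    def __init__(self, sql_conn, kv_store):
        self.sql_conn = sql_conn  # stand-in for the SQL Server connection
        self.kv_store = kv_store  # stand-in for an HBase table client
        self.sql_conn.execute(
            "CREATE TABLE IF NOT EXISTS customer (id INTEGER PRIMARY KEY, name TEXT)"
        )

    def save(self, customer_id, name):
        # 1. Write to the relational system of record first, in a transaction.
        with self.sql_conn:
            self.sql_conn.execute(
                "INSERT OR REPLACE INTO customer (id, name) VALUES (?, ?)",
                (customer_id, name),
            )
        # 2. Mirror the change to the key-value store.
        #    A real implementation has to decide what happens if this step
        #    fails (retry, queue the change, or flag the row for repair).
        self.kv_store[f"customer:{customer_id}"] = {"name": name}


sql_conn = sqlite3.connect(":memory:")
kv_store = {}
store = DualWriteCustomerStore(sql_conn, kv_store)
store.save(1, "Alice")
store.save(1, "Alice B.")  # an update reaches both stores as well
```

The two writes are not atomic, so the stores can drift apart if the second write fails. That is the main reason this approach usually needs a repair or reconciliation job next to it.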

Otherwise, you could add triggers in MSSQL to update Hadoop, though I am not sure what you could do in Hadoop to keep MSSQL in sync.
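One way to read the trigger suggestion, sketched under assumptions: rather than calling Hadoop from inside a trigger, the triggers record each change in a log table, and a separate process forwards that log to the Hadoop side. The example uses SQLite trigger syntax through Python's `sqlite3` module as a stand-in; T-SQL trigger syntax in SQL Server is different, and the table and trigger names here are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);

    -- Hypothetical change log filled by the triggers; a separate process
    -- would read it and apply the changes to the Hadoop side.
    CREATE TABLE customer_changes (
        change_id INTEGER PRIMARY KEY AUTOINCREMENT,
        customer_id INTEGER,
        op TEXT
    );

    CREATE TRIGGER customer_ins AFTER INSERT ON customer
    BEGIN
        INSERT INTO customer_changes (customer_id, op) VALUES (NEW.id, 'upsert');
    END;

    CREATE TRIGGER customer_upd AFTER UPDATE ON customer
    BEGIN
        INSERT INTO customer_changes (customer_id, op) VALUES (NEW.id, 'upsert');
    END;

    CREATE TRIGGER customer_del AFTER DELETE ON customer
    BEGIN
        INSERT INTO customer_changes (customer_id, op) VALUES (OLD.id, 'delete');
    END;
    """
)

conn.execute("INSERT INTO customer (id, name) VALUES (1, 'Alice')")
conn.execute("UPDATE customer SET name = 'Alice B.' WHERE id = 1")
conn.execute("DELETE FROM customer WHERE id = 1")

# The log now holds one entry per change, in the order the changes happened.
changes = conn.execute(
    "SELECT customer_id, op FROM customer_changes ORDER BY change_id"
).fetchall()
```

Keeping the trigger body to a single local insert means a slow or unavailable Hadoop cluster cannot block writes to the relational database.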

Or, you could have a process that runs every x minutes and manually syncs the two databases.
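A periodic sync process could look roughly like the sketch below. It assumes each row carries a monotonically increasing `version` column (in SQL Server a `rowversion` column could play that role) and remembers the highest version it has copied so far. The function name `sync_once`, the schema, the SQLite stand-in for SQL Server, and the dict stand-in for HBase are all assumptions for illustration. The scheduler that would call it every x minutes is left out.

```python
import sqlite3


def sync_once(sql_conn, kv_store, last_version):
    """Copy rows changed since last_version; return the new watermark."""
    rows = sql_conn.execute(
        "SELECT id, name, version FROM customer WHERE version > ? ORDER BY version",
        (last_version,),
    ).fetchall()
    for row_id, name, version in rows:
        kv_store[f"customer:{row_id}"] = {"name": name}
        last_version = version  # rows are ordered, so this ends at the maximum
    return last_version


sql_conn = sqlite3.connect(":memory:")
sql_conn.execute(
    "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, version INTEGER)"
)
sql_conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)", [(1, "Alice", 1), (2, "Bob", 2)]
)

kv_store = {}
watermark = sync_once(sql_conn, kv_store, 0)  # first pass copies both rows
sql_conn.execute("UPDATE customer SET name = 'Bobby', version = 3 WHERE id = 2")
watermark = sync_once(sql_conn, kv_store, watermark)  # second pass copies row 2 only
```

A watermark like this does not see deleted rows, so deletes would need a soft-delete flag or a separate change log.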

Personally, I would avoid trying to maintain two databases of record. Moving changes from a new, experimental database back to your stable database seems risky: you stand a chance of corrupting your stable system. Instead, I would write a converter to move data from your relational DB to Hadoop. Then every night or so, copy your data into Hadoop and use it for the development and testing of your new system. I think test users would understand if you said your beta version is just a test playground and won't affect your live product. If you plan on making major changes to your UI and fear some users will not want to transition to 2.0, then you might be trying to tackle too much at once.
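The core of such a converter is the mapping from a relational row to the HBase data model, where each row has a row key and its values are stored as byte arrays under `family:qualifier` column names. The sketch below shows one possible mapping. The function name `row_to_cells`, the `table:id` row-key scheme, and the single column family `d` are assumptions, not something the question specifies.

```python
def row_to_cells(table, key_column, row, family="d"):
    """Map one relational row (a dict) to (row_key, cells) in an HBase-like layout."""
    row_key = f"{table}:{row[key_column]}".encode("utf-8")
    cells = {}
    for column, value in row.items():
        if column == key_column or value is None:
            continue  # the key lives in the row key; NULLs are simply not stored
        cells[f"{family}:{column}".encode("utf-8")] = str(value).encode("utf-8")
    return row_key, cells


row_key, cells = row_to_cells(
    "customer", "id", {"id": 42, "name": "Alice", "email": None, "orders": 3}
)
```

A nightly job would run a function like this over every row read from the relational database and write the resulting cells to HBase. Turning every value into a string is the simplest choice. A real converter would likely pick a typed encoding for each column.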

Those are the solutions I came up with... Good luck!

10-20 01:19