Transferring a legacy code base from CVS to a distributed repository (e.g. git or mercurial): advice needed for the initial repository design

Question

Introduction and Background

We are in the process of changing our source control system and we are currently evaluating git and mercurial. The total code base is around 6 million lines of code, so not massive and not really small either.

Let me start with a very brief introduction to what the current repository design looks like.

We have one base folder for the complete code base, and beneath that level there are all sorts of modules used in several different contexts. For example "dllproject1" and "dllproject2" can be looked at as completely separate projects.

The software we are developing is something we call a configurator, which can be customized endlessly for different customer needs. In total we probably have 50 different versions of it. However, they have one thing in common: they all share a couple of mandatory modules (mandatory_module1 ..). These folders basically contain kernel/core code, common language resources, etc. Each customization can then be any combination of the other modules (module1 ..).

Since we are currently using cvs, we've added aliases in the CVSROOT/modules file. They might look something like:

core -a mandatory_module1 mandatory_module2 mandatory_module3
project_x -a module1 module3 module5 core

So if someone decides to work on project_x, he/she can quickly check out the modules needed with:

base> cvs co project_x

Questions

Intuitively, it just feels wrong to have the base folder as a single repository. As a programmer you should be able to check out the exact subset of code needed for the project you are currently working on. What are your thoughts on this?

On the other hand, it feels more right to have each of these modules in a separate repository. But this makes it harder for programmers to check out the modules that they need. You should be able to do this with a single command. So my question is: are there similar ways of defining aliases in git/mercurial?

Any other questions, suggestions or pointers are very welcome!

PS. I have searched for similar questions but didn’t feel that any of them applied 100% to my situation.

Solution

Just a quick comment to remind you that:

Such migrations often offer the opportunity to reorganize the sources, not along modules (each with its own repository) but along functional domains (several modules belonging to the same functional domain go into the same repository).

Submodules are then used as a way to define a configuration.

Git is fine, but by Linus's own admission, putting everything into one repository can be problematic.

Both points argue for a more component-oriented approach for a large system (and a large legacy repository).

With Git submodules, you can check those components out in your project (even if it is a two-step process). There are, however, tools that can make submodule management easier (git.rake, for instance).
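To make the submodule-as-configuration idea concrete, here is a minimal sketch of how the CVS `project_x` alias could be approximated. It builds everything in a temporary directory so it can be run as-is. The repository layout is an assumption: `core` stands in for one repository holding the mandatory modules, and `module1`/`module3` are separate repositories. The `-c protocol.file.allow=always` setting is only there because the demo uses local paths as submodule URLs (recent Git versions refuse those by default); it would not be needed with a real server URL.

```shell
#!/bin/sh
set -e
# Throwaway identity so commits work without any global Git configuration.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
cd "$work"

# One small repository per component (names are illustrative only).
for m in core module1 module3; do
    git init -q "$m"
    ( cd "$m" && echo "$m" > README && git add README && git commit -q -m "initial $m" )
done

# The "configuration" repository: it contains no code of its own, only
# references to the components that make up project_x.
git init -q project_x
cd project_x
for m in core module1 module3; do
    git -c protocol.file.allow=always submodule add "$work/$m" "$m"
done
git commit -q -m "project_x: core + module1 + module3"

cd "$work"

# Two-step checkout: clone the configuration, then fetch its submodules.
git clone -q project_x checkout_a
( cd checkout_a && git -c protocol.file.allow=always submodule update --init )

# Single-command checkout, roughly the counterpart of "cvs co project_x".
git -c protocol.file.allow=always clone -q --recurse-submodules project_x checkout_b

ls checkout_b
```

Each customer configuration would then be its own small repository of this kind, and the `.gitmodules` file plays a role similar to an entry in CVSROOT/modules, with the difference that every component is recorded at a specific commit.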

That is what I describe in the post "Vendor Branch" as the "system approach": everyone works on the latest (HEAD) of everything, which is effective for a small number of projects.
For a large number of modules, though, the notion of "module" is still very useful, but it is not managed the same way with a DVCS:

For closely related modules (in the same functional domain, like all modules related to PNL, "Profit aNd Losses", or to risk analysis, in a financial domain), you do need to work with the latest (HEAD) of all components involved.
That is achieved with a subtree strategy, not so that you can publish (push) corrections to those other modules, but so that you can track the work done by other teams.
Git allows that, with the added bonus that this "tracking" does not have to take place between your repository and one "central" repository; it can also take place between you and the local repository of the other team, allowing for very quick back-and-forth integration and testing between projects of a similar nature.
For modules that are not directly in your functional domain, however, submodules are a better option, because they refer to a fixed version of a module (a commit):
When a low-level framework changes, you do not want the change to propagate instantly, since it would affect all the other teams, who would then have to drop what they are doing to adapt their code to the new version. (You do, however, want all the other teams to be aware of the new version, so that they do not forget to update that low-level component or "module".)
That allows you to work only with official, stable, identified versions of other modules, and not with potentially unstable or not fully tested HEADs.
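The contrast between the two cases can be sketched in a single script. All names here are hypothetical: `risk_module` plays a closely related module that is followed via a subtree merge, `framework` plays a low-level dependency pinned as a submodule, and `make_upstream`/`bump_upstream` are helpers invented for the demo. The subtree part follows the subtree-merge recipe from the Git documentation, using the explicit `-X subtree=<path>` merge option; it pulls directly from the other team's repository path, with no central server involved.

```shell
#!/bin/sh
set -e
# Throwaway identity so commits work without any global Git configuration.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)

# Hypothetical helper: create an upstream repository with one versioned file.
make_upstream() {
    git init -q "$work/$1"
    ( cd "$work/$1" && echo "v1" > VERSION && git add VERSION && git commit -q -m "$1 v1" )
}
# Hypothetical helper: simulate the owning team publishing a new version.
bump_upstream() {
    ( cd "$work/$1" && echo "v2" > VERSION && git commit -q -am "$1 v2" )
}

make_upstream risk_module   # same functional domain: follow its HEAD
make_upstream framework     # low-level dependency: pin to a commit

git init -q "$work/pnl_project"
cd "$work/pnl_project"
echo "pnl" > pnl.txt
git add pnl.txt
git commit -q -m "pnl: initial"

# Subtree: graft the other team's history under risk_module/.
git fetch -q "$work/risk_module" HEAD
git merge -s ours --no-commit --allow-unrelated-histories FETCH_HEAD
git read-tree --prefix=risk_module/ -u FETCH_HEAD
git commit -q -m "Merge risk_module as a subtree"

# Submodule: record framework at one specific commit.
# (protocol.file.allow is only needed because the demo URL is a local path.)
git -c protocol.file.allow=always submodule add "$work/framework" framework
git commit -q -m "pin framework v1"

# Both upstream teams move on.
bump_upstream risk_module
bump_upstream framework

# Tracking the related module: fetch its HEAD and merge it into the subtree.
git fetch -q "$work/risk_module" HEAD
git merge -q --no-edit -X subtree=risk_module FETCH_HEAD

echo "risk_module: $(cat risk_module/VERSION)"
echo "framework:   $(cat framework/VERSION)"
```

After the run, `risk_module/VERSION` should contain v2 while `framework/VERSION` still contains v1: the subtree follows the other team's latest work, and the submodule stays on the recorded commit. Moving the pin is a separate, deliberate step, for example `git submodule update --remote framework` followed by a commit in the superproject.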
