This article looks at the question "Computer Vision / Augmented Reality: how to visually overlay 3D objects?" and approaches to solving it. We hope it serves as a useful reference for anyone facing the same problem; read on below.

Problem Description

I am trying to create a sample application where I can overlay 3D objects on the camera screen. They will be placed at a particular point, and re-drawn every frame as the user moves the camera and the perspective shifts.

In essence, I'm looking to replicate this: http://www.youtube.com/watch?v=EEstFtQbzow

Here's my attempt at phrasing the problem more precisely: consider being given an initial image matrix (representing all the X, Y pixel coords) at the time of initial object placement. Once it's placed, every subsequent video frame will need to be analyzed to re-position the placed object so that it can be re-drawn (overlaid) correctly given the new perspective.

I have a bit of a background in computer vision, but I am unsure how to do this particular task. For reference, the sample application I'm looking to create will be for Android, so if anyone knows of existing code I could leverage, that would be great as well. However, I'm very open to being directed to academic papers describing the algorithms I need to implement. Thanks.

Recommended Answer

This is a pretty well-known problem in computer vision. There are various papers you can refer to for this, including systems that do simultaneous localisation and mapping (SLAM), which may use either bundle adjustment or filter-based tracking. Reading up on popular papers on these topics will give you a lot of insight into cameras and tracking in the real world.

To summarise, you will need to obtain the 6D pose of the camera in every frame, i.e. you need to figure out where the camera is in the real world (translation) and where it is pointing (rotation). This is usually done by first tracking salient features in the scene, estimating their 3D positions, and then using the perceived motion of these features to figure out the camera pose in every frame. You will need to define an origin in the real world (you cannot use the camera as the origin for the problem you're trying to solve) and have at least 4 known/measured points as a reference to start with. In the video you've included in your question, Augment seem to use a printed pattern to get the initial camera pose. They then track features in the real world to continue tracking the pose.
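To make the initial-pose step concrete, here is a minimal sketch using OpenCV's solvePnP. This is not from the original answer, and the intrinsics, marker size, and detected pixel coordinates are all made-up placeholder values: given at least four known 3D reference points (say, the corners of a printed marker) and their detected 2D pixel locations, it recovers the camera's rotation and translation.

```python
import numpy as np
import cv2

# Hypothetical intrinsics (focal lengths and principal point, in pixels);
# in a real app these come from a one-time camera calibration.
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])
dist = np.zeros(5)  # assume negligible lens distortion for this sketch

# Four known 3D points: corners of a 10 cm x 10 cm printed marker lying
# on the Z = 0 plane of the world origin we chose to define.
object_points = np.array([[0.0, 0.0, 0.0],
                          [0.1, 0.0, 0.0],
                          [0.1, 0.1, 0.0],
                          [0.0, 0.1, 0.0]])

# Where those corners were detected in the current frame (placeholder pixels).
image_points = np.array([[322.0, 250.0],
                         [405.0, 247.0],
                         [410.0, 330.0],
                         [318.0, 335.0]])

# Recover the 6D pose: rvec is the rotation (axis-angle), tvec the translation.
ok, rvec, tvec = cv2.solvePnP(object_points, image_points, K, dist)
R, _ = cv2.Rodrigues(rvec)  # 3x3 rotation matrix, if you need it explicitly
```

Running this once against the printed pattern gives the initial pose; re-running it (or updating the pose from tracked features) every frame gives the per-frame pose described above.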

Once you have the camera pose, you can position the 3D object in the real world using projections. The camera pose, together with the intrinsics, gives you the camera projection matrix, using which you can transform any 3D point in the world to a 2D position in the camera's frame. So to render a virtual 3D point in the real world, say at (x, y, z), you will project (x, y, z) to a 2D point (u, v) using the camera matrix. Then render the point on the image obtained from the camera. Do this for every point of the object you want to render, and you're done :)
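Continuing the same hypothetical setup (reusing the K, R, and tvec names from the sketch above), the projection u, v = K[R|t]X is one matrix multiplication plus a perspective divide:

```python
import numpy as np

def project_point(K, R, t, point_3d):
    """Project a 3D world point to 2D pixel coordinates via K [R|t] X."""
    p_cam = R @ point_3d + t.ravel()  # world -> camera coordinates
    u, v, w = K @ p_cam               # camera -> homogeneous image coordinates
    return u / w, v / w               # perspective divide

# Render a virtual point 5 cm from the marker origin along the world Z axis
# (whether that counts as "above" the marker depends on your axis convention).
u, v = project_point(K, R, tvec, np.array([0.0, 0.0, 0.05]))
```

In practice you would call OpenCV's cv2.projectPoints(points, rvec, tvec, K, dist), which does the same thing and also accounts for lens distortion.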

This concludes this article on "Computer Vision / Augmented Reality: how to visually overlay 3D objects?". We hope the recommended answer is helpful, and thank you for your support!
