Picking up results from an ML Studio pipeline in a Data Factory pipeline

Question

We currently have a Data Factory pipeline that successfully calls one of our ML Studio Pipelines. After the ML Studio Pipeline completes, we want Azure Data Factory to pick up its results and store them in SQL Server.

We found that the PipelineData class stores the results in a blob folder based on the child run ID, which makes it hard for Data Factory to pick up the results. We then discovered OutputFileDatasetConfig, which lets ML Studio save the results to a static location for Data Factory. This worked great for Data Factory, except that OutputFileDatasetConfig doesn't always work :( since it is an experimental class. It took us a while to figure this out, and we even created a Stack Overflow question for it, which we resolved and which can be found here: Azure ML Studio ML Pipeline - Exception: No temp file found

We went back to the PipelineData class, which stores the results in a blob folder based on the child run ID, but we can't figure out how to get Data Factory to find the blob using the child run ID of the ML Studio Pipeline it just ran.
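To make the difficulty concrete, here is a minimal sketch of why the output location changes on every run. It assumes the commonly observed default layout, in which a PipelineData output lands under azureml/<step run id>/<output name> on the datastore. That layout, the function name, and the sample run IDs are illustrative assumptions and are not taken from the question.

```python
def pipeline_data_path(step_run_id: str, output_name: str) -> str:
    """Build the blob path of a PipelineData output (assumed default layout)."""
    # Assumption: outputs are written to azureml/<step run id>/<output name>.
    return f"azureml/{step_run_id}/{output_name}"


# Hypothetical run IDs: each pipeline run gets a fresh one, so the folder
# that Data Factory would have to read from is different every time.
first = pipeline_data_path("11111111-aaaa", "output_dir")
second = pipeline_data_path("22222222-bbbb", "output_dir")

print(first)
print(second)
```

Because the run ID is only known once the run has started, a Data Factory copy activity configured with a fixed source path has nothing stable to point at.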

So my question is: how do we get Data Factory to pick up the results of an ML Studio Pipeline that was triggered from a Data Factory pipeline?

Here is a simple visual of the Data Factory pipeline we're trying to build.

Step 1: Store Data in azure file store -->
Step 2: Run ML Studio scoring Pipeline -->
Step 3: Copy Results to SQL Server

Step 3 is the step we can't figure out. Any help would be greatly appreciated. Thanks and happy coding!

Recommended answer

I think I answered my own question. It turns out my question is similar to another question that was asked a few months ago, and its top solution worked for me.

How to write Azure machine learning batch scoring results to data lake?

I was able to use DataTransferStep as follows.

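from azureml.pipeline.steps import DataTransferStep

# output_dir, blob_data_ref and data_factory_compute are defined elsewhere
# in the pipeline script: the source output of the scoring step, the
# destination reference in blob storage, and the compute target that
# performs the copy.
transfer_ml_to_blob = DataTransferStep(
    name="transfer_ml_to_blob",
    source_data_reference=output_dir,
    destination_data_reference=blob_data_ref,
    compute_target=data_factory_compute,
    source_reference_type="directory",
    destination_reference_type="directory",
)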
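The answer does not show how blob_data_ref and data_factory_compute were created. The following is a rough sketch of one way they might be set up with the v1 azureml SDK, wrapped in a helper so that all inputs are explicit. The function name build_transfer_step and its parameters (data_factory_name, datastore_name, static_path) are hypothetical, and it assumes a Data Factory compute target and a blob datastore are already attached to the workspace.

```python
def build_transfer_step(workspace, output_dir, data_factory_name,
                        datastore_name, static_path):
    """Return a DataTransferStep that copies output_dir to a fixed blob folder."""
    # Imports are deferred so the sketch can be loaded without the SDK installed.
    from azureml.core import Datastore
    from azureml.core.compute import DataFactoryCompute
    from azureml.data.data_reference import DataReference
    from azureml.pipeline.steps import DataTransferStep

    # Look up an existing Data Factory compute target by name (assumed attached).
    data_factory_compute = DataFactoryCompute(workspace, data_factory_name)

    # Destination: a fixed folder whose path does not depend on the run ID,
    # so a downstream Data Factory copy activity can use a static path.
    blob_data_ref = DataReference(
        datastore=Datastore.get(workspace, datastore_name),
        data_reference_name="blob_data_ref",
        path_on_datastore=static_path,
    )

    return DataTransferStep(
        name="transfer_ml_to_blob",
        source_data_reference=output_dir,
        destination_data_reference=blob_data_ref,
        compute_target=data_factory_compute,
        source_reference_type="directory",
        destination_reference_type="directory",
    )
```

The returned step would be added to the pipeline's step list after the scoring step that produces output_dir, so that Step 3 in Data Factory can copy from static_path into SQL Server.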

Some other useful resources:

https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-data-transfer.ipynb

https://social.msdn.microsoft.com/Forums/zh-CN/026b9b1d-6-4217-b179-0c1973ac1fa2/数据传输作业失败，发生意外错误，系统无效，exception-blob-contains-forum = AzureMachineLearningService#7b46c5eb-b7f1-4c2f-a6d0-553672a83e7a

Azure ML PipelineData with DataTransferStep results in 0 bytes file
