This article covers how to use the Wishart distribution in PyMC for covariance estimation.

Problem Description

I need to model and estimate a variance-covariance matrix from asset class returns so I was looking at the stock returns example given in chapter 6 of https://github.com/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers

Here is my simple implementation, where I start with a sample drawn from a multivariate normal with a known mean and variance-covariance matrix. I then try to estimate it using a non-informative prior.

The estimate is different from the known values, so I'm not sure my implementation is correct. I'd appreciate it if someone could point out what I'm doing wrong.

import numpy as np
import pandas as pd
import pymc as pm

# Simulate n_obs return observations for p assets from a multivariate
# normal with a known mean and variance-covariance matrix.
p = 3
mu = [.03, .05, -.02]
cov_matrix = [[.025, 0.0075, 0.00175],
              [0.0075, .007, 0.00135],
              [0.00175, 0.00135, .00043]]

n_obs = 10000
x = np.random.multivariate_normal(mu, cov_matrix, n_obs)

prior_mu = np.ones(p)
prior_sigma = np.eye(p)

# Priors: a normal on the means and a Wishart on the precision
# (inverse covariance) matrix.
post_mu = pm.Normal("returns", prior_mu, 1, size=p)
post_cov_matrix_inv = pm.Wishart("cov_matrix_inv", n_obs, np.linalg.inv(cov_matrix))

# Likelihood of the observed returns.
obs = pm.MvNormal("observed returns", post_mu, post_cov_matrix_inv,
                  observed=True, value=x)

model = pm.Model([obs, post_mu, post_cov_matrix_inv])
mcmc = pm.MCMC()

mcmc.sample(5000, 2000, 3)

# Posterior means of the returns and of the covariance matrix.
mu_samples = mcmc.trace("returns")[:]
mu_samples.mean(axis=0)
cov_inv_samples = mcmc.trace("cov_matrix_inv")[:]
mean_covariance_matrix = np.linalg.inv(cov_inv_samples.mean(axis=0))
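
For reference, a quick side-by-side check of the point estimates against the values used to simulate the data (using only the variables already defined above):

# Compare posterior point estimates with the known simulation inputs.
print(np.round(mean_covariance_matrix, 5))
print(np.array(cov_matrix))
print(np.round(mu_samples.mean(axis=0), 4))
print(mu)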

Recommended Answer

Here are some suggestions that can improve the code and the inference:

  1. I would change pm.Wishart("cov_matrix_inv", n_obs, np.linalg.inv(cov_matrix)) to pm.Wishart("cov_matrix_inv", n_obs, np.eye(3)), as it is more objective (and with 10000 data points your prior is not going to matter much anyway).

  2. mcmc = pm.MCMC() should be mcmc = pm.MCMC(model).

  3. mcmc.sample( 5000, 2000, 3 ) uses far too few samples. The second half of MCMC, Monte Carlo, is strongest when there are lots of samples: I mean tens of thousands. Here you only have 1000, so the error caused by Monte Carlo will be quite high (the error decreases as the sample size increases). Furthermore, the MCMC has likely not converged after 2000 burn-in samples. You can check convergence with plot from pymc.Matplot by calling plot(mcmc). I used mcmc.sample( 25000, 15000, 1 ) and was getting better results. A revised sketch combining all three suggestions is shown after this list.
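
Putting the three suggestions together, a minimal revised sketch (PyMC2 API, reusing p, n_obs, prior_mu, and x from the question) might look like this:

import numpy as np
import pymc as pm

# Identity scale matrix: a more objective Wishart prior (suggestion 1).
post_mu = pm.Normal("returns", prior_mu, 1, size=p)
post_cov_matrix_inv = pm.Wishart("cov_matrix_inv", n_obs, np.eye(p))

obs = pm.MvNormal("observed returns", post_mu, post_cov_matrix_inv,
                  observed=True, value=x)

# Pass the model to MCMC (suggestion 2) and draw many more samples,
# with a longer burn-in and no thinning (suggestion 3).
model = pm.Model([obs, post_mu, post_cov_matrix_inv])
mcmc = pm.MCMC(model)
mcmc.sample(25000, 15000, 1)

# Inspect the traces for convergence.
from pymc.Matplot import plot
plot(mcmc)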

I imagine the reason you used so few samples was performance. Much of the cost comes from the large number of observations: you have 10000 of them, which might be higher than what you actually have in practice.

And remember, much of the value in Bayesian inference is being given posterior samples: taking the mean of these samples seems like a waste. Think about using the samples in a loss function (see chapter 5 in the book); a toy example follows.
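
As a toy illustration of using the samples directly (the equal-weight vector w and the portfolio-variance quantity are hypothetical, not from the question), evaluate the quantity of interest on every posterior sample and summarize that distribution instead of plugging in a single point estimate:

import numpy as np

# Hypothetical equal-weight portfolio over the three assets.
w = np.ones(3) / 3.0

cov_inv_samples = mcmc.trace("cov_matrix_inv")[:]

# Posterior distribution of the portfolio variance w' Sigma w,
# evaluated once per posterior sample of the precision matrix.
portfolio_var = np.array(
    [w.dot(np.linalg.inv(tau)).dot(w) for tau in cov_inv_samples])

print(portfolio_var.mean())
print(np.percentile(portfolio_var, [5, 95]))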
