My code uses more than 25 GB of memory and crashes

Problem description

I'm using Extra Trees Classifier to find the feature importances in my dataset, which consists of 13 columns and about 10 million rows. I ran elliptic envelope and isolation forest on it and everything was fine; they even took less than 10 GB. I ran my code in a Jupyter notebook and it gave me a memory error, even with low_memory=True set. I then tried Google Colab, which has about 25 GB of memory, and it still crashed. I'm very confused right now.

Code:

import numpy as np

import pandas as pd

import matplotlib.pyplot as plt

%matplotlib inline


from sklearn.ensemble import ExtraTreesClassifier 


from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials


# Authenticate and create the PyDrive client.
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)



# Loading First Dataframe

link = '...'

fluff, file_id = link.split('=')  # file_id avoids shadowing the built-in id()
print(file_id)  # Verify that you have everything after '='
downloaded = drive.CreateFile({'id': file_id})
downloaded.GetContentFile('Final After Simple Filtering.csv')  
df = pd.read_csv('Final After Simple Filtering.csv',index_col=None,low_memory=True)
#df = df.astype(float)


ExtraT = ExtraTreesClassifier(n_estimators = 100,bootstrap=False,n_jobs=1) 

y=df['Power_kW']

del df['Power_kW']

X=df


ExtraT.fit(X,y)

feature_importance = ExtraT.feature_importances_ 

# Standard deviation of each feature's importance across the trees (axis=0 is the tree axis)
feature_importance_normalized = np.std([tree.feature_importances_ for tree in ExtraT.estimators_], axis = 0)

plt.bar(X.columns, feature_importance) 
plt.xlabel('Label')
plt.ylabel('Feature Importance') 
plt.title('Parameters Importance') 
plt.show()  

Thanks

Recommended answer

I had the same error before and I solved it.

Change the runtime type. A GPU runtime can be faster than a CPU one, so it may help, although scikit-learn itself runs on the CPU. In Colab this is done from the Runtime menu, under "Change runtime type".
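After switching the runtime, it may be worth confirming how much RAM the session actually has. The following is a minimal sketch, assuming a Linux machine (Colab runtimes are Linux-based); the helper name total_ram_gb is made up for illustration, and the os.sysconf keys used here are not available on Windows.

```python
import os


def total_ram_gb():
    """Return the machine's physical RAM in GB (decimal), via POSIX sysconf."""
    page_size = os.sysconf("SC_PAGE_SIZE")    # bytes per memory page
    page_count = os.sysconf("SC_PHYS_PAGES")  # number of physical pages
    return page_size * page_count / 1e9


print(f"Total RAM: {total_ram_gb():.1f} GB")
```

If the printed figure is close to 12 GB rather than 25 GB, the session is on the smaller runtime.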

Be sure that you are using 25 GB of RAM, not 12 GB. Don't forget that Colab is a free, limited edition. If you still have a problem, tell me and I will help you ASAP.
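Even 25 GB may not be enough with the default settings. Here is a rough back-of-the-envelope sketch, not a measurement: scikit-learn trees keep one float64 per class for every node, and by default trees are grown until the leaves are pure, so with bootstrap=False a tree over n rows can have up to 2n - 1 nodes. The function name and the class count of 2 are illustrative assumptions; the real number of classes in Power_kW is not given in the question, and more classes would make the figure larger.

```python
def estimate_value_array_bytes(n_samples, n_classes, n_trees, min_samples_leaf=1):
    """Rough upper bound on bytes used by the per-node class-count arrays."""
    # Each leaf holds at least min_samples_leaf samples, so this caps the leaf count.
    max_leaves = max(1, n_samples // min_samples_leaf)
    # A binary tree with L leaves has 2 * L - 1 nodes in total.
    max_nodes = 2 * max_leaves - 1
    bytes_per_node = n_classes * 8  # one float64 per class
    return n_trees * max_nodes * bytes_per_node


# Figures from the question: about 10 million rows, 100 trees.
# n_classes=2 is a hypothetical value chosen only for illustration.
default_bound = estimate_value_array_bytes(10_000_000, 2, 100)
limited_bound = estimate_value_array_bytes(10_000_000, 2, 100, min_samples_leaf=100)

print(f"{default_bound / 1e9:.2f} GB")  # 32.00 GB
print(f"{limited_bound / 1e9:.2f} GB")  # 0.32 GB
```

Under these assumptions the default setup could need up to about 32 GB for those arrays alone, which is a worst case rather than a prediction. min_samples_leaf and max_depth are existing ExtraTreesClassifier parameters that limit tree size; the value 100 is only an example and would need tuning for the actual data.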


11-03 10:59