Word vectors trained with word2vec in deeplearning4j are unstable

Question

I set up a Maven project in IntelliJ IDEA. The code is as follows:

    System.out.println("Load data....");
    SentenceIterator iter = new LineSentenceIterator(new File("/home/zs/programs/deeplearning4j-master/dl4j-test-resources/src/main/resources/raw_sentences.txt"));
    iter.setPreProcessor(new SentencePreProcessor() {
        @Override
        public String preProcess(String sentence) {
            return sentence.toLowerCase();
        }
    });
    System.out.println("Build model....");
    int batchSize = 1000;
    int iterations = 30;
    int layerSize = 300;
    com.sari.Word2Vec vec = new com.sari.Word2Vec.Builder()
            .batchSize(batchSize) // number of words per minibatch
            .sampling(1e-5) // subsampling threshold: randomly drops very frequent words
            .minWordFrequency(5) // ignore words that occur fewer than 5 times
            .useAdaGrad(false) // plain updates, no AdaGrad
            .layerSize(layerSize) // word feature vector size
            .iterations(iterations) // number of training iterations
            .learningRate(0.025) // initial learning rate
            .minLearningRate(1e-2) // floor for the learning rate, which decays with the number of words processed
            .negativeSample(10) // number of negative samples
            .iterate(iter) // sentence source
            .tokenizerFactory(tokenizer) // tokenizer is defined elsewhere (not shown)
            .build();
    vec.fit();
    System.out.println("Evaluate model....");
    double cosSim = vec.similarity("day" , "night");
    System.out.println("Similarity between day and night: "+cosSim);

This code is based on the word2vec example in deeplearning4j, but the results are unstable: they differ a lot from one run to the next. For example, the cosine similarity between 'day' and 'night' is sometimes as high as 0.98 and sometimes as low as 0.4.

Here are the results of two runs:

Evaluate model....
Similarity between day and night: 0.706292986869812

Evaluate model....
Similarity between day and night: 0.5550910234451294

Why does this happen? I have only just started learning word2vec and there is a lot I do not understand yet, so I hope someone more experienced can help. Thanks!

Recommended answer

You have set the following line:

.minLearningRate(1e-2) // learning rate decays wrt # words. floor learning

But that is an extremely high value for the minimum learning rate. With a learning rate that high, the model never 'settles' into any state; instead, a few updates can significantly change the learned representation. That is not a problem during the first few updates, but it is bad for convergence.
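To make the 'settling' point concrete, here is a toy sketch in Java. It is not word2vec and does not use deeplearning4j: it runs stochastic gradient descent with a noisy gradient on f(x) = x^2 / 2, whose minimum is at x = 0. The class `NoisySgdDemo`, the linear decay schedule, the step count and the noise model are all assumptions chosen for illustration. The learning rate starts at 0.025 as in the question and is clamped at a floor; the program measures how much the final value of x varies across differently seeded runs.

```java
import java.util.Random;

/** Toy illustration only: noisy SGD on f(x) = x^2 / 2, not word2vec itself. */
class NoisySgdDemo {

    /** Assumed schedule: linear decay from initialRate towards zero, clamped at minRate. */
    static double rate(double initialRate, double minRate, double progress) {
        return Math.max(minRate, initialRate * (1.0 - progress));
    }

    /** Runs SGD with a noisy gradient and returns the final parameter value. */
    static double finalValue(long seed, double minRate) {
        Random rng = new Random(seed);
        int steps = 20_000;
        double x = 1.0;
        for (int t = 0; t < steps; t++) {
            double lr = rate(0.025, minRate, (double) t / steps);
            // The true gradient of x^2 / 2 is x; Gaussian noise stands in for minibatch noise.
            double noisyGradient = x + rng.nextGaussian();
            x -= lr * noisyGradient;
        }
        return x;
    }

    /** Standard deviation of the final value over 100 differently seeded runs. */
    static double spreadAcrossSeeds(double minRate) {
        int runs = 100;
        double sum = 0.0;
        double sumOfSquares = 0.0;
        for (long seed = 0; seed < runs; seed++) {
            double x = finalValue(seed, minRate);
            sum += x;
            sumOfSquares += x * x;
        }
        double mean = sum / runs;
        return Math.sqrt(sumOfSquares / runs - mean * mean);
    }

    public static void main(String[] args) {
        System.out.println("spread with floor 1e-2:  " + spreadAcrossSeeds(1e-2));
        System.out.println("spread with floor 1e-15: " + spreadAcrossSeeds(1e-15));
    }
}
```

In this toy setting the runs with the 1e-2 floor end up more scattered around the minimum than the runs whose rate is allowed to decay almost to zero, which mirrors the run-to-run differences described in the question. How closely this carries over to a real word2vec model is not something the sketch can show.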

Solution: allow the learning rate to decay. You can leave this line out completely or, if you must set it, use a more appropriate value, such as 1e-15.
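As a rough illustration of why the floor matters, the sketch below assumes a linear decay schedule clamped at the minimum rate, similar in spirit to the one in the original word2vec implementation. The exact formula deeplearning4j applies may differ, and `LearningRateSchedule`, `rate` and `floorReachedAt` are hypothetical names introduced only for this example.

```java
/** Sketch of an assumed schedule: linear decay clamped at a floor. */
class LearningRateSchedule {

    /** Learning rate at a given training progress in [0, 1]. */
    static double rate(double initialRate, double minRate, double progress) {
        return Math.max(minRate, initialRate * (1.0 - progress));
    }

    /** Fraction of training after which the rate stops decaying and stays at the floor. */
    static double floorReachedAt(double initialRate, double minRate) {
        return 1.0 - minRate / initialRate;
    }

    public static void main(String[] args) {
        double initialRate = 0.025; // value from the question
        for (double floor : new double[] {1e-2, 1e-15}) {
            System.out.println("floor " + floor + " is reached at progress "
                    + floorReachedAt(initialRate, floor));
            for (int i = 0; i <= 4; i++) {
                double progress = i / 4.0;
                System.out.println("  progress " + progress + " -> rate "
                        + rate(initialRate, floor, progress));
            }
        }
    }
}
```

Under this assumed schedule, a floor of 1e-2 with an initial rate of 0.025 stops the decay after about 60% of training, so the remaining updates all run at that comparatively large step size. With a floor of 1e-15 the rate keeps shrinking until training is essentially finished.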


09-15 03:32