This article covers how to handle unexpected variations in HBase read performance. It should be a useful reference if you are troubleshooting a similar problem.

Problem Description

I've installed HBase 0.94.0. I need to improve my read performance for scans, so I inserted 100,000 random records.

When I set setCaching(100), the scan took 16 seconds for 100,000 records.

When I set it to setCaching(50), it took 90 seconds for 100,000 records.

When I set it to setCaching(10), it took 16 seconds for 100,000 records.

import java.io.IOException;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class Test {
    public static void main(String[] args) {

        long start, middle, end;

        HTableDescriptor descriptor = new HTableDescriptor("Student7");
        descriptor.addFamily(new HColumnDescriptor("No"));
        descriptor.addFamily(new HColumnDescriptor("Subject"));

        try {
            HBaseConfiguration config = new HBaseConfiguration();
            HBaseAdmin admin = new HBaseAdmin(config);

            admin.createTable(descriptor);
            HTable table = new HTable(config, "Student7");
            System.out.println("Table created!");

            start = System.currentTimeMillis();

            // Load the table; the row key is simply the incremental counter as a string.
            for (int i = 1; i < 100000; i++) {
                String s = Integer.toString(i);
                Put p = new Put(Bytes.toBytes(s));
                // Note: these store the literal strings "i+10", "i+20", ... rather than computed values.
                p.add(Bytes.toBytes("No"), Bytes.toBytes("IDCARD"), Bytes.toBytes("i+10"));
                p.add(Bytes.toBytes("No"), Bytes.toBytes("PHONE"), Bytes.toBytes("i+20"));
                p.add(Bytes.toBytes("No"), Bytes.toBytes("PAN"), Bytes.toBytes("i+30"));
                p.add(Bytes.toBytes("No"), Bytes.toBytes("ACCT"), Bytes.toBytes("i+40"));
                p.add(Bytes.toBytes("Subject"), Bytes.toBytes("English"), Bytes.toBytes("50"));
                p.add(Bytes.toBytes("Subject"), Bytes.toBytes("Science"), Bytes.toBytes("60"));
                p.add(Bytes.toBytes("Subject"), Bytes.toBytes("History"), Bytes.toBytes("70"));

                table.put(p);
            }
            middle = System.currentTimeMillis();

            // Full table scan with scanner caching set to 100 rows per RPC.
            Scan s = new Scan();
            s.setCaching(100);
            ResultScanner scanner = table.getScanner(s);

            try {
                for (Result rr = scanner.next(); rr != null; rr = scanner.next()) {
                    System.out.println("Found row: " + rr);
                }
                end = System.currentTimeMillis();
            } finally {
                scanner.close();
            }
            System.out.println("TableCreation-Time: " + (middle - start));
            System.out.println("Scan-Time: " + (end - middle));
        } catch (IOException e) {
            System.out.println("IOError: cannot create Table.");
            e.printStackTrace();
        }
    }
}

Why is this happening?

Solution

Why would you want to return every record in your 100,000-record table? You're doing a full table scan, and, just as in any large database, that is slow.

Try thinking about a more useful use case in which you would like to return some columns of a record or a range of records.

HBase has only one index on its tables: the row key. Make use of it. Try defining your row key so that you can get the data you need just by specifying the row key.
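For instance, here is a minimal sketch of such a point lookup against the Student7 table from the question (my illustration, not part of the original answer; the row key "80050" is just an assumed example value, and table is the HTable opened earlier):

// Hypothetical point lookup: fetch a single row of Student7 directly by its row key.
Get g = new Get(Bytes.toBytes("80050"));
g.addColumn(Bytes.toBytes("Subject"), Bytes.toBytes("History"));
Result r = table.get(g);
byte[] value = r.getValue(Bytes.toBytes("Subject"), Bytes.toBytes("History"));
System.out.println("History: " + Bytes.toString(value));

Because the row key is the only index, a lookup like this goes straight to the region holding that row instead of streaming the whole table back to the client.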

Let's say you would like to know the value of Subject:History for the rows with a row key between 80000 and 80100. (Note that setCaching(100) means HBase will fetch 100 records per RPC, so this case is just one RPC. Fetching 100 rows at a time obviously requires more memory than fetching, say, one row at a time; keep that in mind in a large multi-user environment.)

long start, end;
start = System.currentTimeMillis();

Scan s = new Scan(String.valueOf(80000).getBytes(), String.valueOf(80100).getBytes());
s.setCaching(100);
s.addColumn("Subject".getBytes(), "History".getBytes());

ResultScanner scanner = table.getScanner(s);
try {
    for (Result rr = scanner.next(); rr != null; rr=scanner.next()) {
        System.out.println("Found row: " + new String(rr.getRow(), "UTF-8") + " value: " + new String(rr.getValue("Subject".getBytes(), "History".getBytes()), "UTF-8")));
    }
    end = System.currentTimeMillis(); 
} finally {
    scanner.close();
}       
System.out.println("Scan: " + (end - start));

This might look stupid, because how would you know which rows you need just from an integer? Well, exactly; that's why you need to design your row key according to what you're going to query, instead of just using an incremental value as you would in a traditional database.

Try this example. It should be fast.

Note: I didn't run the example. I just typed it here. Maybe there are some small syntax errors you should correct but I hope the idea is clear.
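To make the row-key-design advice above a bit more concrete (this sketch is my addition, not from the original answer), one common pattern is to derive the row key from the fields you query by; for example, zero-padding the numeric id to a fixed width so that byte-wise ordering matches numeric ordering and range scans return exactly the rows you expect. The widths and values below are assumptions for illustration, and table is the HTable from the question:

// Sketch: zero-padded row keys for the Student7 table.
// With plain Integer.toString(i) keys, a scan from "80000" to "80100" also
// matches shorter keys such as "8001".."8009", because HBase compares row
// keys lexicographically as bytes. Padding to a fixed width avoids that.
int i = 80050;                                   // hypothetical student number
String rowKey = String.format("%08d", i);        // 80050 -> "00080050"
Put p = new Put(Bytes.toBytes(rowKey));
p.add(Bytes.toBytes("Subject"), Bytes.toBytes("History"), Bytes.toBytes("70"));
table.put(p);

// The matching range scan then covers exactly the 100 keys 00080000..00080099:
Scan scan = new Scan(Bytes.toBytes(String.format("%08d", 80000)),
                     Bytes.toBytes(String.format("%08d", 80100)));
scan.addColumn(Bytes.toBytes("Subject"), Bytes.toBytes("History"));

The same idea extends to composite keys, for example prefixing the student number with whatever you usually filter on first, so that rows you tend to read together are stored next to each other.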

That concludes this article on unexpected variations in HBase read performance. We hope the answer above is helpful, and thank you for your continued support!
