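Applications usually write logs for later analysis. In most cases the logs are consulted only after a problem has occurred, which makes this a static, after-the-fact form of analysis. Often, however, we need to know how the whole system is behaving right now or at a particular moment. For a backend service, for example, we may want real-time monitoring data such as: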
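1. How many requests are handled per second (TPS)?
2. What is the average processing time per request?
3. What is the longest processing time of a request?
4. What does the histogram of response times look like?
5. What fraction of requests receive a correct response?
6. How long is the queue of requests waiting to be processed?
7. What are the system's CPU usage, memory footprint, JVM state, error rate, and so on?
To collect this kind of real-time data, the simplest approach is to place instrumentation points at the system's entry points, exit points, and other key locations, and then send the collected data to a real-time monitoring platform or store it in a cache or database for further analysis and display.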
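To make the idea of an instrumentation point concrete, here is a minimal hand-rolled sketch that uses only the JDK. The class RequestProbe and all of its methods are hypothetical names invented for this illustration; they are not part of the Metrics library, which provides richer versions of the same measurements.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical probe for illustration only; not a Metrics class. */
final class RequestProbe {
    private final LongAdder total = new LongAdder();
    private final LongAdder succeeded = new LongAdder();
    private final LongAdder totalNanos = new LongAdder();
    private final AtomicLong maxNanos = new AtomicLong();

    /** Call at the exit point of a request with its measured duration. */
    void record(long durationNanos, boolean success) {
        total.increment();
        if (success) {
            succeeded.increment();
        }
        totalNanos.add(durationNanos);
        maxNanos.accumulateAndGet(durationNanos, Math::max);
    }

    long count() { return total.sum(); }

    long maxNanos() { return maxNanos.get(); }

    double meanNanos() {
        long n = total.sum();
        return n == 0 ? 0.0 : (double) totalNanos.sum() / n;
    }

    double successRatio() {
        long n = total.sum();
        return n == 0 ? Double.NaN : (double) succeeded.sum() / n;
    }

    public static void main(String[] args) {
        RequestProbe probe = new RequestProbe();
        long start = System.nanoTime();      // entry point
        boolean ok = true;
        try {
            Thread.sleep(20);                // stand-in for real request handling
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            ok = false;
        } finally {
            probe.record(System.nanoTime() - start, ok);   // exit point
        }
        System.out.println("count=" + probe.count()
                + " meanNanos=" + probe.meanNanos()
                + " successRatio=" + probe.successRatio());
    }
}
```

The probe covers items 2, 3, and 5 of the list above; rates, histograms, and queue lengths need more machinery, which is where a library becomes worthwhile.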
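Metrics is a library for measuring monitoring indicators. It provides many tools that help developers monitor all kinds of data.
See the official documentation: https://metrics.dropwizard.io/3.1.0/manual/core/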
I. Introduction to the Metrics library
Metrics provides five basic metric types: Meters, Gauges, Counters, Histograms, and Timers.
1. Set up the Maven dependencies
<dependencies>
    <dependency>
        <groupId>io.dropwizard.metrics</groupId>
        <artifactId>metrics-core</artifactId>
        <version>3.2.6</version>
    </dependency>
    <dependency>
        <groupId>io.dropwizard.metrics</groupId>
        <artifactId>metrics-healthchecks</artifactId>
        <version>3.2.6</version>
    </dependency>
</dependencies>
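2. Meters: introduction and usage
A Meter is a counter that can only be incremented. It is typically used to measure the rate at which a series of events occurs. It reports the mean rate as well as exponentially weighted moving average rates over 1-minute, 5-minute, and 15-minute windows.

import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;

import com.codahale.metrics.ConsoleReporter;
import com.codahale.metrics.Meter;
import com.codahale.metrics.MetricRegistry;

public class MetricsExample {
    // Create the registry
    private final static MetricRegistry registry = new MetricRegistry();
    // Meter for all requests
    private final static Meter requestMeter = registry.meter("tps");
    // Meter for failed requests
    private final static Meter errorMeter = registry.meter("err_request");

    public static void main(String[] args) {
        // Build the reporter (rates are reported per minute)
        ConsoleReporter report = ConsoleReporter.forRegistry(registry)
                .convertRatesTo(TimeUnit.MINUTES)
                .convertDurationsTo(TimeUnit.MINUTES)
                .build();
        report.start(10, TimeUnit.SECONDS); // print to the console every 10 seconds
        for (;;) {          // simulate a continuous stream of requests
            getAsk();       // send a request
            randomSleep();  // pause between requests
        }
    }

    // Handle one request
    public static void getAsk() {
        try {
            requestMeter.mark();
            randomSleep();
            // nextInt(6) can return 0; the resulting ArithmeticException simulates a failed request
            int x = 10 / ThreadLocalRandom.current().nextInt(6);
        } catch (Exception e) {
            System.out.println("Error");
            errorMeter.mark();
        }
    }

    // Simulate the time spent handling a request
    public static void randomSleep() {
        try {
            TimeUnit.SECONDS.sleep(ThreadLocalRandom.current().nextInt(10)); // sleep a random number of seconds
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
    }
}

// Output: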
19-6-4 16:38:47 ================================================================
-- Meters ----------------------------------------------------------------------
err_request
count = 1
mean rate = 1.50 events/minute
1-minute rate = 0.75 events/minute
5-minute rate = 0.19 events/minute
15-minute rate = 0.07 events/minute
tps
count = 4
mean rate = 5.99 events/minute
1-minute rate = 8.85 events/minute
5-minute rate = 11.24 events/minute
15-minute rate = 11.74 events/minute
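The 1-, 5-, and 15-minute rates in the output above are exponentially weighted moving averages. The following is a simplified, single-threaded sketch of how such an average can be maintained; it is an illustration of the idea, not the library's implementation. The class name EwmaSketch, the explicit tick() call, and the 5-second tick with a 60-second window are assumptions chosen for the example.

```java
/** Simplified moving-average rate for illustration; not thread-safe and not a Metrics class. */
final class EwmaSketch {
    private final double alpha;        // smoothing factor applied on every tick
    private final double tickSeconds;  // length of one tick interval
    private long uncounted;            // events marked since the last tick
    private double rate;               // current average, in events per second
    private boolean initialized;

    EwmaSketch(double windowSeconds, double tickSeconds) {
        this.tickSeconds = tickSeconds;
        this.alpha = 1.0 - Math.exp(-tickSeconds / windowSeconds);
    }

    /** Record one event. */
    void mark() {
        uncounted++;
    }

    /** Call once per tick interval to fold the recent events into the average. */
    void tick() {
        double instantRate = uncounted / tickSeconds;
        uncounted = 0;
        if (initialized) {
            rate += alpha * (instantRate - rate);
        } else {
            rate = instantRate;        // the first tick seeds the average
            initialized = true;
        }
    }

    double ratePerSecond() {
        return rate;
    }

    public static void main(String[] args) {
        EwmaSketch oneMinute = new EwmaSketch(60.0, 5.0);
        for (int i = 0; i < 5; i++) {
            oneMinute.mark();          // five events in the first tick interval
        }
        oneMinute.tick();
        System.out.println("after a busy tick: " + oneMinute.ratePerSecond());
        oneMinute.tick();              // a tick interval with no events
        System.out.println("after an idle tick: " + oneMinute.ratePerSecond());
    }
}
```

Because each tick moves the average only part of the way toward the latest instantaneous rate, a short burst or pause changes the 1-minute rate quickly and the 15-minute rate slowly, which is why the three rates in the output differ.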
3. Gauges: introduction and usage
3.1 Using a Gauge

import java.util.Queue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

import com.codahale.metrics.ConsoleReporter;
import com.codahale.metrics.Gauge;
import com.codahale.metrics.MetricRegistry;

/**
 * @des Using a gauge
 * @author zhao
 * @date 2019-06-14 00:08:02
 * A Gauge is the simplest metric. It is generally used to report an instantaneous value,
 * for example the size of a collection at a given moment.
 */
public class GaugeExample {
    // The metric registry
    private static MetricRegistry registry = new MetricRegistry();
    // The queue being observed
    private static Queue<Integer> queue = new LinkedBlockingQueue<>();

    public static void main(String[] args) throws InterruptedException {
        // Print the metrics to the console
        ConsoleReporter reporter = ConsoleReporter.forRegistry(registry).build();
        reporter.start(3, TimeUnit.SECONDS);

        Gauge<Integer> gauge = new Gauge<Integer>() {
            @Override
            public Integer getValue() {
                return queue.size();
            }
        };
        // Register the gauge with the registry
        registry.register(MetricRegistry.name(GaugeExample.class, "queue-size"), gauge);

        // Simulate data arriving in the queue
        for (int i = 0; i < 100; i++) {
            queue.add(i);
            TimeUnit.MILLISECONDS.sleep(100);
        }
        Thread.currentThread().join();
    }
}

// Output
19-6-14 0:39:17 ================================================================
-- Gauges ----------------------------------------------------------------------
com.zpb.gauge.GaugeExample.queue-size
value = 31
19-6-14 0:39:20 ================================================================
-- Gauges ----------------------------------------------------------------------
com.zpb.gauge.GaugeExample.queue-size
value = 60
19-6-14 0:39:23 ================================================================
-- Gauges ----------------------------------------------------------------------
com.zpb.gauge.GaugeExample.queue-size
value = 90
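3.2 Using a RatioGauge
Purpose: measures the success rate of events. Examples: cache hit rate, API call success rate, and so on.

import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;

import com.codahale.metrics.ConsoleReporter;
import com.codahale.metrics.Meter;
import com.codahale.metrics.MetricRegistry;
import com.codahale.metrics.RatioGauge;

public class RatioGaugeExample {
    private static MetricRegistry registry = new MetricRegistry();
    private static Meter totalMeter = registry.meter("totalCount");
    private static Meter succMeter = registry.meter("succCount");

    public static void main(String[] args) {
        ConsoleReporter reporter = ConsoleReporter.forRegistry(registry).build();
        reporter.start(5, TimeUnit.SECONDS); // print to the console every 5 seconds

        registry.gauge("succ-ratio", () -> new RatioGauge() {
            @Override
            protected Ratio getRatio() {
                // first argument: numerator; second argument: denominator
                return Ratio.of(succMeter.getCount(), totalMeter.getCount());
            }
        });

        // Keep calling the handler
        for (;;) {
            processHandle();
        }
    }

    public static void processHandle() {
        // total count
        totalMeter.mark();
        try {
            // nextInt(10) can return 0; the resulting ArithmeticException simulates a failure
            int x = 10 / ThreadLocalRandom.current().nextInt(10);
            TimeUnit.MILLISECONDS.sleep(100);
            // success count
            succMeter.mark();
        } catch (Exception e) {
            System.out.println("================ err");
        }
    }
}

// Output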
19-6-17 9:28:13 ================================================================
-- Gauges ----------------------------------------------------------------------
succ-ratio
value = 0.9607843137254902
-- Meters ----------------------------------------------------------------------
succCount
count = 49
mean rate = 9.52 events/second
1-minute rate = 9.60 events/second
5-minute rate = 9.60 events/second
15-minute rate = 9.60 events/second
totalCount
count = 51
mean rate = 9.90 events/second
1-minute rate = 10.00 events/second
5-minute rate = 10.00 events/second
15-minute rate = 10.00 events/second
19-6-17 9:28:18 ================================================================
-- Gauges ----------------------------------------------------------------------
succ-ratio
value = 0.9423076923076923
-- Meters ----------------------------------------------------------------------
succCount
count = 98
mean rate = 9.71 events/second
1-minute rate = 9.63 events/second
5-minute rate = 9.61 events/second
15-minute rate = 9.60 events/second
totalCount
count = 104
mean rate = 10.31 events/second
1-minute rate = 10.06 events/second
5-minute rate = 10.01 events/second
15-minute rate = 10.00 events/second
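The succ-ratio value in each report is simply the success count divided by the total count at that moment. One detail worth handling when computing such a ratio by hand is an empty denominator, for example before the first request arrives. The sketch below shows one way to do that with plain JDK code; RatioSketch and ratio() are hypothetical names, and returning NaN for a zero denominator is a design choice made for this example.

```java
/** Hand-rolled ratio helper for illustration; not a Metrics class. */
final class RatioSketch {
    /** Returns numerator / denominator, or NaN when nothing has been counted yet. */
    static double ratio(long numerator, long denominator) {
        if (denominator == 0) {
            return Double.NaN;
        }
        return (double) numerator / denominator;
    }

    public static void main(String[] args) {
        // 49 successes out of 51 calls, the counts from the first report above
        System.out.println(ratio(49, 51));
        // no calls yet
        System.out.println(ratio(0, 0));
    }
}
```

Note that a ratio built from lifetime counts changes more and more slowly as the counts grow; a ratio of recent rates reacts faster to a sudden run of failures.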
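4. Using a Counter
Purpose: a Counter is a special case of a Gauge. It maintains a count that can be changed with inc() and dec(). Usage is much the same as for a Gauge, and the counter() method of MetricRegistry creates and registers a Counter directly. A Counter can be used to measure the relationship between a producer and a consumer, for example the number of items waiting in a queue.

import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.TimeUnit;

import com.codahale.metrics.ConsoleReporter;
import com.codahale.metrics.Counter;
import com.codahale.metrics.MetricRegistry;

public class CounterExample {
    // The metric registry
    private static final MetricRegistry registry = new MetricRegistry();
    // The counter
    private static final Counter counter = registry.counter(MetricRegistry.name(CounterExample.class, ""));
    private static final ConsoleReporter report = ConsoleReporter.forRegistry(registry)
            .convertRatesTo(TimeUnit.MINUTES)
            .convertDurationsTo(TimeUnit.MINUTES)
            .build();
    // Thread-safe queue, because the producer and the consumer run on different threads
    private static Queue<String> queue = new ConcurrentLinkedQueue<String>();

    public static void main(String[] args) throws Exception {
        report.start(5, TimeUnit.SECONDS); // print to the console every 5 seconds
        new Thread(new Runnable() {
            @Override
            public void run() {
                try {
                    production("abc");
                } catch (InterruptedException e) {
                    e.printStackTrace();
                }
            }
        }).start();
        new Thread(new Runnable() {
            @Override
            public void run() {
                try {
                    consume();
                } catch (InterruptedException e) {
                    e.printStackTrace();
                }
            }
        }).start();
        Thread.currentThread().join();
    }

    public static void production(String s) throws InterruptedException {
        for (int i = 0; i < 100; i++) {
            counter.inc();
            queue.offer(s);
        }
    }

    // Note: the consumer stops as soon as it sees an empty queue,
    // so it may exit before the producer has finished
    public static void consume() throws InterruptedException {
        while (queue.size() != 0) {
            queue.poll(); // remove the first element
            counter.dec();
        }
    }
}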
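The producer and consumer idea can also be sketched with JDK classes alone, which makes it easy to see what the counter value means: the number of items produced but not yet consumed. In this sketch a LongAdder stands in for the Counter and a BlockingQueue lets the consumer wait for items instead of stopping at the first empty queue. PendingJobsSketch and run() are hypothetical names for this illustration.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.LongAdder;

/** JDK-only producer/consumer sketch; not a Metrics class. */
final class PendingJobsSketch {
    /** Produces and consumes the given number of items and returns the pending count at the end. */
    static long run(int items) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        LongAdder pending = new LongAdder();       // plays the role of the Counter

        Thread producer = new Thread(() -> {
            for (int i = 0; i < items; i++) {
                pending.increment();               // like counter.inc()
                queue.add("job-" + i);
            }
        });
        Thread consumer = new Thread(() -> {
            try {
                for (int i = 0; i < items; i++) {
                    queue.take();                  // blocks until an item is available
                    pending.decrement();           // like counter.dec()
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        producer.start();
        consumer.start();
        try {
            producer.join();
            consumer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return pending.sum();
    }

    public static void main(String[] args) {
        System.out.println("pending after run: " + run(100));
    }
}
```

While both threads are running, the pending value rises when the producer is ahead and falls as the consumer catches up; once every item has been consumed it returns to zero.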
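5. Histograms
Purpose: a Histogram measures the distribution of a stream of values: minimum, maximum, mean, median, and percentiles (75th, 95th, 98th, 99th, and 99.9th).
For example, it can track the response times of requests to a page or to an API method.

import java.util.Random;
import java.util.concurrent.TimeUnit;

import com.codahale.metrics.ConsoleReporter;
import com.codahale.metrics.Histogram;
import com.codahale.metrics.MetricRegistry;

public class HistogramsExample {
    private static final MetricRegistry registry = new MetricRegistry();
    private static ConsoleReporter reporter = ConsoleReporter.forRegistry(registry).build();
    // Create a Histogram
    private static final Histogram histogram = registry.histogram(MetricRegistry.name(HistogramsExample.class, "histogram"));

    public static void main(String[] args) throws InterruptedException {
        reporter.start(5, TimeUnit.SECONDS);
        Random r = new Random();
        while (true) {
            processHandle(r.nextDouble());
            Thread.sleep(100);
        }
    }

    private static void processHandle(Double d) {
        // In an application, call update() at the point where the value to be measured is known
        histogram.update((int) (d * 100));
    }
}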
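To show what a percentile such as "75% <= 78.00" means, here is a small JDK-only sketch that computes quantiles from a set of recorded values using linear interpolation between neighbouring sorted samples. This is one common way to compute percentiles and is offered as an illustration, not as the method the library uses internally; PercentileSketch and valueAt() are hypothetical names.

```java
import java.util.Arrays;

/** Quantile calculation by linear interpolation; an illustration, not a Metrics class. */
final class PercentileSketch {
    /** Returns the value at the given quantile (0.0 to 1.0) of the samples. */
    static double valueAt(double quantile, long[] samples) {
        if (quantile < 0.0 || quantile > 1.0) {
            throw new IllegalArgumentException(quantile + " is not in [0, 1]");
        }
        if (samples.length == 0) {
            return 0.0;
        }
        long[] sorted = samples.clone();
        Arrays.sort(sorted);

        double pos = quantile * (sorted.length + 1);   // 1-based fractional rank
        int index = (int) pos;
        if (index < 1) {
            return sorted[0];
        }
        if (index >= sorted.length) {
            return sorted[sorted.length - 1];
        }
        double lower = sorted[index - 1];
        double upper = sorted[index];
        return lower + (pos - Math.floor(pos)) * (upper - lower);
    }

    public static void main(String[] args) {
        long[] responseTimes = {10, 1, 9, 2, 8, 3, 7, 4, 6, 5};
        System.out.println("median = " + valueAt(0.5, responseTimes));
        System.out.println("75%    = " + valueAt(0.75, responseTimes));
        System.out.println("99.9%  = " + valueAt(0.999, responseTimes));
    }
}
```

A long-running service cannot keep every value it has ever seen, so a real histogram works on a bounded sample of the stream; the calculation on that sample follows the same idea.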
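6. Using a Timer
Purpose: measures both the rate of requests and the time spent handling them.
For example: the total number of requests to an API within a period of time and their average processing time.

import java.util.concurrent.TimeUnit;

import com.codahale.metrics.ConsoleReporter;
import com.codahale.metrics.MetricRegistry;
import com.codahale.metrics.Timer;
import com.codahale.metrics.Timer.Context;

public class TimerExample {
    // Create the metric registry
    private static final MetricRegistry registry = new MetricRegistry();
    // Print to the console
    private static final ConsoleReporter report = ConsoleReporter.forRegistry(registry).build();
    // Create the timer
    private static final Timer timer = registry.timer("request");

    public static void main(String[] args) {
        report.start(5, TimeUnit.SECONDS);
        while (true) {
            handleRequest();
        }
    }

    private static void handleRequest() {
        Context time = timer.time();
        try {
            Thread.sleep(500); // simulate the time spent handling a request
        } catch (Exception e) {
            System.out.println("err");
        } finally {
            time.stop(); // stop the timer context after every request
            System.out.println("==== timer stopped");
        }
    }
}

// Output (this report comes from HistogramsExample in section 5)
19-6-17 11:25:27 ===============================================================
-- Histograms ------------------------------------------------------------------
com.zpb.histograms.HistogramsExample.histogram
count = 50 # number of recorded values
min = 0
max = 98
mean = 53.14 # mean
stddev = 27.04 # standard deviation
median = 50.00 # median
75% <= 78.00
95% <= 92.00
98% <= 94.00
99% <= 98.00
99.9% <= 98.00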
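The try/finally pattern in handleRequest() guarantees that the duration is recorded even when the request fails. The same guarantee can be expressed with try-with-resources. The JDK-only sketch below shows the pattern with a hypothetical StopwatchSketch class whose context implements AutoCloseable; the injectable clock is an assumption added so the behaviour can be checked without real sleeping.

```java
import java.util.concurrent.atomic.LongAdder;
import java.util.function.LongSupplier;

/** Minimal timing helper for illustration; not a Metrics class. */
final class StopwatchSketch {
    private final LongSupplier clock;               // source of the current time in nanoseconds
    private final LongAdder count = new LongAdder();
    private final LongAdder totalNanos = new LongAdder();

    StopwatchSketch(LongSupplier clock) {
        this.clock = clock;
    }

    /** Starts timing; closing the returned context records the elapsed time. */
    Context time() {
        return new Context(clock.getAsLong());
    }

    long count() {
        return count.sum();
    }

    double meanNanos() {
        long n = count.sum();
        return n == 0 ? 0.0 : (double) totalNanos.sum() / n;
    }

    final class Context implements AutoCloseable {
        private final long start;

        private Context(long start) {
            this.start = start;
        }

        @Override
        public void close() {
            count.increment();
            totalNanos.add(clock.getAsLong() - start);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        StopwatchSketch stopwatch = new StopwatchSketch(System::nanoTime);
        for (int i = 0; i < 3; i++) {
            try (StopwatchSketch.Context ignored = stopwatch.time()) {
                Thread.sleep(50);                   // stand-in for request handling
            }
        }
        System.out.println("requests=" + stopwatch.count()
                + " meanNanos=" + stopwatch.meanNanos());
    }
}
```

The sketch records only a count and a mean; a full timer would also keep the distribution of durations and the request rate.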
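7. HealthChecks
Purpose: health checks verify that an application, its submodules, and the modules it depends on are running normally.
Implementation:
Class A: extends HealthCheck and overrides check(); inside check() it calls the method of class B that is being checked.
Class B: defines a method that returns a boolean. (Class B may also be a class in another system.)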
import java.util.Map.Entry;
import java.util.Random;
import java.util.Set;

import com.codahale.metrics.health.HealthCheck;
import com.codahale.metrics.health.HealthCheckRegistry;

public class HealthChecksExample extends HealthCheck {
    private DataBase database;

    public HealthChecksExample(DataBase database) {
        this.database = database;
    }

    @Override
    protected Result check() throws Exception {
        if (database.ping()) {
            return Result.healthy();
        }
        return Result.unhealthy("Can't ping database.");
    }

    static class DataBase {
        // Simulated ping method
        public boolean ping() {
            Random r = new Random();
            return r.nextBoolean();
        }
    }

    public static void main(String[] args) {
        // Create the health check registry
        HealthCheckRegistry registry = new HealthCheckRegistry();
        // Register the checks with the registry
        registry.register("database1", new HealthChecksExample(new DataBase()));
        registry.register("database2", new HealthChecksExample(new DataBase()));
        // Run the health checks and get the results.
        // Note: the checks run only once here, so the loop below prints the same results
        // every time; call runHealthChecks() inside the loop to re-check on each iteration.
        Set<Entry<String, Result>> entrySet = registry.runHealthChecks().entrySet();
        while (true) {
            for (Entry<String, Result> entry : entrySet) {
                if (entry.getValue().isHealthy()) {
                    System.out.println(entry.getKey() + ": OK");
                } else {
                    System.err.println(entry.getKey() + "FAIL:error message: " + entry.getValue().getMessage());
                    final Throwable e = entry.getValue().getError();
                    if (e != null) {
                        e.printStackTrace();
                    }
                }
            }
            try {
                Thread.sleep(1000);
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }
}

// Output
database1FAIL:error message: Can't ping database.
database2: OK
database1FAIL:error message: Can't ping database.
database2: OK
database1FAIL:error message: Can't ping database.
database2: OK
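The output repeats the same result on every pass because the example runs the checks once, before the loop, and then prints the stored results. The JDK-only sketch below shows the alternative of re-running every check on each pass and treating a check that throws as unhealthy. HealthCheckSketch, register(), and runAll() are hypothetical names for this illustration and are not the library's API.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.Callable;

/** Minimal health check runner for illustration; not a Metrics class. */
final class HealthCheckSketch {
    private final Map<String, Callable<Boolean>> checks = new LinkedHashMap<>();

    void register(String name, Callable<Boolean> check) {
        checks.put(name, check);
    }

    /** Runs every registered check now; a check that throws is reported as unhealthy. */
    Map<String, Boolean> runAll() {
        Map<String, Boolean> results = new LinkedHashMap<>();
        for (Map.Entry<String, Callable<Boolean>> entry : checks.entrySet()) {
            boolean healthy;
            try {
                healthy = Boolean.TRUE.equals(entry.getValue().call());
            } catch (Exception e) {
                healthy = false;
            }
            results.put(entry.getKey(), healthy);
        }
        return results;
    }

    public static void main(String[] args) throws InterruptedException {
        HealthCheckSketch registry = new HealthCheckSketch();
        registry.register("database1", () -> Math.random() < 0.5);   // simulated ping
        registry.register("database2", () -> Math.random() < 0.5);

        for (int pass = 0; pass < 3; pass++) {
            // The checks are executed again on every pass
            for (Map.Entry<String, Boolean> result : registry.runAll().entrySet()) {
                System.out.println(result.getKey() + ": " + (result.getValue() ? "OK" : "FAIL"));
            }
            Thread.sleep(1000);
        }
    }
}
```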
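II. Reporters
As the examples above show, we can now collect many kinds of data. In a real deployment, however, printing to the console is not enough, so the data has to be exported in a form that can be displayed. The official site lists many types of reporters; only the one used most often at work is covered here.
1. Writing Timer data to a log
The metrics are written to a log file through logback. For the configuration details, see the separate article on introducing and configuring logback.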
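import java.util.concurrent.TimeUnit;

import org.slf4j.LoggerFactory;

import com.codahale.metrics.MetricRegistry;
import com.codahale.metrics.Slf4jReporter;
import com.codahale.metrics.Timer;
import com.codahale.metrics.Timer.Context;

public class TimerExample {
    // Create the metric registry
    private static final MetricRegistry registry = new MetricRegistry();
    // Write to the log file.
    // The logger name can be anything, as long as it matches the name of a logger in logback.xml
    private static final Slf4jReporter report = Slf4jReporter.forRegistry(registry)
            .outputTo(LoggerFactory.getLogger("com.metrics.timer"))
            .convertRatesTo(TimeUnit.SECONDS)
            .convertDurationsTo(TimeUnit.SECONDS)
            .build();
    // Create the timer
    private static final Timer timer = registry.timer("request");

    public static void main(String[] args) {
        report.start(5, TimeUnit.SECONDS);
        while (true) {
            handleRequest();
        }
    }

    private static void handleRequest() {
        Context time = timer.time();
        try {
            Thread.sleep(500); // simulate the time spent handling a request
        } catch (Exception e) {
            System.out.println("err =" + e);
        } finally {
            time.stop(); // always stop in finally so that every request is recorded
            System.out.println("==== timer stopped");
        }
    }
}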
2. Writing Counter data to a log

import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.TimeUnit;

import org.slf4j.LoggerFactory;

import com.codahale.metrics.Counter;
import com.codahale.metrics.MetricRegistry;
import com.codahale.metrics.Slf4jReporter;

public class CounterExample {
    // The metric registry
    private static final MetricRegistry registry = new MetricRegistry();
    // The counter
    private static final Counter counter = registry.counter(MetricRegistry.name(CounterExample.class, ""));
    // Write to the log file through logback
    private static final Slf4jReporter reporter = Slf4jReporter.forRegistry(registry)
            .outputTo(LoggerFactory.getLogger("com.metrics"))
            .convertRatesTo(TimeUnit.SECONDS)
            .convertDurationsTo(TimeUnit.SECONDS)
            .build();
    // Thread-safe queue, because the producer and the consumer run on different threads
    private static Queue<String> queue = new ConcurrentLinkedQueue<String>();

    public static void main(String[] args) throws Exception {
        reporter.start(5, TimeUnit.SECONDS); // write to the log every 5 seconds
        new Thread(new Runnable() {
            @Override
            public void run() {
                try {
                    production("abc");
                } catch (InterruptedException e) {
                    e.printStackTrace();
                }
            }
        }).start();
        new Thread(new Runnable() {
            @Override
            public void run() {
                try {
                    consume();
                } catch (InterruptedException e) {
                    e.printStackTrace();
                }
            }
        }).start();
        Thread.currentThread().join();
    }

    public static void production(String s) throws InterruptedException {
        for (int i = 0; i < 100; i++) {
            counter.inc();
            queue.offer(s);
            System.out.println("------- produced ----------->" + queue.size());
        }
    }

    // Note: the consumer stops as soon as it sees an empty queue,
    // so it may exit before the producer has finished
    public static void consume() throws InterruptedException {
        while (queue.size() != 0) {
            queue.poll(); // remove the first element
            counter.dec();
            System.err.println("<------- consumed ----------- " + queue.size());
        }
    }
}