一次超诡异的FGC，你之前肯定没碰到过的场景，快Get起来技术頭條網

2019-10-15 18:37:48 阿飛的BLOG

正撸着代码，内部聊天工具弹出一条信息：“我这个机器总是频繁FGC...快帮我看看

”

我打开对话框，机智的回复一个表情：

继续默默撸码，代码是我的命，一日不撸，就蓝瘦。随后，小伙伴砸了一段GC日志过来：

<code>2019-09-17T20:33:57.889+0800: 4753520.554: [Full GC (Metadata GC Threshold) 4753520.554: [CMS[YG occupancy: 723220 K (1887488 K)]4753520.988: [weak refs process ing, 0.0042134 secs]4753520.992: [classunloading, 0.0987343 secs]4753521.091: [scrub symbol table, 0.0237609 secs]4753521.115: [scrubstringtable, 0.0025983 s ecs]: 145423K->141376K(3354624K), 0.6260023 secs] 868644K->864597K(5242112K), [Metaspace: 128179K->128179K(1234944K)], 0.6264315 secs] [Times: user=1.24 sys=0.0 0, real=0.63 secs]/<code>
<code>4159962 Heap after GC invocations=8029 (full 50):/<code>
<code>4159963 par newgeneration total 1887488K, used 723220K [0x0000000673400000, 0x00000006f3400000, 0x00000006f3400000)/<code>
<code>4159964 eden space 1677824K, 42% used [0x0000000673400000, 0x000000069ed59090, 0x00000006d9a80000)/<code>
<code>4159965 fromspace 209664K, 4% used [0x00000006d9a80000, 0x00000006da36c210, 0x00000006e6740000)/<code>
<code>4159966 to space 209664K, 0% used [0x00000006e6740000, 0x00000006e6740000, 0x00000006f3400000)/<code>
<code>4159967 concurrent mark-sweep generation total 3354624K, used 141376K [0x00000006f3400000, 0x00000007c0000000, 0x00000007c0000000)/<code>
<code>4159968 Metaspace used 128145K, capacity 136860K, committed 262144K, reserved 1234944K/<code>
<code>4159969 class 
space used 14443K, capacity 16168K, committed 77312K, reserved 1048576K/<code>
<code>4159970 }/<code>
<code>4159971 {Heap before GC invocations=8029 (full 50):/<code>
<code>4159972 par newgeneration total 1887488K, used 723220K [0x0000000673400000, 0x00000006f3400000, 0x00000006f3400000)/<code>
<code>4159973 eden space 1677824K, 42% used [0x0000000673400000, 0x000000069ed59090, 0x00000006d9a80000)/<code>
<code>4159974 fromspace 209664K, 4% used [0x00000006d9a80000, 0x00000006da36c210, 0x00000006e6740000)/<code>
<code>4159975 to space 209664K, 0% used [0x00000006e6740000, 0x00000006e6740000, 0x00000006f3400000)/<code>
<code>4159976 concurrent mark-sweep generation total 3354624K, used 141376K [0x00000006f3400000, 0x00000007c0000000, 0x00000007c0000000)/<code>
<code>4159977 Metaspace used 128145K, capacity 136860K, committed 262144K, reserved 1234944K/<code>
<code>4159978 classspace used 14443K, capacity 16168K, committed 77312K, reserved 1048576K/<code>

我这慧眼一瞧，看到了几个关键单词 <code>FullGC/<code>、<code>MetadataGCThreshold/<code>，然后很随意的回复了

“是不是metaspace没有设置，或者设置太小，导致的FGC ”

然后，又砸过来一段JVM参数配置

<code>CATALINA_OPTS="$CATALINA_OPTS -server -Djava.awt.headless=true -Xms5324m -Xmx5324m -Xss512k -XX:PermSize=350m -XX:MaxPermSize=350m -XX:MetaspaceSize=256m -XX:MaxMet aspaceSize=256m -XX:NewSize=2048m -XX:MaxNewSize=2048m -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=9 -XX:+UseConcMarkSweepGC -XX:+UseCMSInitiatingOccupancyOnly -XX :+CMSScavengeBeforeRemark -XX:+ScavengeBeforeFullGC -XX:+UseCMSCompactAtFullCollection -XX:+CMSParallelRemarkEnabled -XX:CMSFullGCsBeforeCompaction=9 -XX:CMSInitiat ingOccupancyFraction=80 -XX:+CMSClassUnloadingEnabled -XX:SoftRefLRUPolicyMSPerMB=0 -XX:-ReduceInitialCardMarks -XX:+CMSPermGenSweepingEnabled -XX:CMSInitiatingPerm OccupancyFraction=80 -XX:+ExplicitGCInvokesConcurrent -Djava.nio.channels.spi.SelectorProvider=[sun.nio.ch](http://sun.nio.ch/).EPollSelectorProvider -Djava.util.logging.manager=org.apac he.juli.ClassLoaderLogManager -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCApplicationConcurrentTime -XX:+PrintGCApplicationStoppedTime -XX:+PrintHeapAtGC -Xloggc:/data/applogs/heap_trace.txt -XX:+IgnoreUnrecognizedVMOptions -XX:-HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/data/applogs/HeapDumpOnOutOfMemoryError"/<code>

“应该不是，我们配置了

-XX:MaxMetaspaceSize=256m

-XX:MetaspaceSize=256m”

看到配置之后，有点懵逼，好像超出了我的认知范围，一下子没回复，又扔过来一堆数据。

“看cat监控数据，Metaspace使用率在50%的时候就FGC了，GC 日志上的显示也只用了142M，可是我们明明设置了初始值是256M，最大值250M，这还没达到阈值 ”

机智如我，赶紧回复 “等等，我空的时候再看看 ”

等空闲下来，又想起了这个问题，决定好好研究下。

既然是Metadata GC Threshold引起的FGC，那么只可能是MetadataSpace使用完了，我又反复的看了下GC日志片段，盯着看了会

 
<code>[0x00000006f3400000, 0x00000007c0000000, 0x00000007c0000000)/<code>
<code>4159977 Metaspace used 128145K, capacity 136860K, committed 262144K, reserved 1234944K/<code>

发生FGC之前，Metaspace的committed确实达到了256M，但是used却只有125M，难道某一次的类初始化需要大于256 - 125 = 131M？

这显然不合理，排除掉这种情况，那么只有一种解释了，Metaspace包含了太多了内存碎片，导致这256M中没有足够大的连续内存。

之前听过老年代因为CMS的标记清理会产生内存碎片导致FGC，为什么Metaspace也会有这样的问题？

让同事对有问题的机器dump了下，用mat打开之后，发现了新大陆，里面包含了大量的类加载器。

难道这个碎片问题是大量类加载器引起的？

本地验证

有了这个疑问，那就简单了，看看能不能在本地复现。

1、先定义一个自定义的类加载器，破坏双亲委派

<code>publicclassMyClassLoaderextendsClassLoader {/<code>

<code>@Override/<code>
<code>protectedClass> findClass(String name)throwsClassNotFoundException {/<code>
<code>try{/<code>
<code>String filePath = "/Users/zhanjun/Desktop/" + name.replace('.', File.separatorChar) + ".class";/<code>
<code>//指定读取磁盘上的某个文件夹下的.class文件：/<code>
<code>File file = newFile(filePath);/<code>
<code>FileInputStream fis = new 
FileInputStream(file);/<code>
<code>byte bytes =newbyte[fis.available()];/<code>
<code>fis.read(bytes);/<code>
<code>//调用defineClass方法，将字节数组转换成Class对象/<code>
<code>Class> clazz = this.defineClass(name, bytes, 0, bytes.length);/<code>
<code>fis.close;/<code>
<code>returnclazz;/<code>
<code>}catch(FileNotFoundException e){/<code>
<code>e.printStackTrace;/<code>
<code>}  
catch(IOException e) {/<code>
<code>e.printStackTrace;/<code>
<code>} finally{/<code>

<code>}/<code>
<code>returnsuper.findClass(name);/<code>
<code>}/<code>
<code>}/<code>

2、然后在while循环中，不断的 load 已经编译好的class文件

<code>publicstaticvoidmain(String[] args) 
throwsException {/<code>
<code>while(true) {/<code>
<code>Class clazz0 = newMyClassLoader.loadClass("com.sankuai.discover.memory.OOM");/<code>
<code>}/<code>
<code>}/<code>

3、最后，配置一下jvm启动参数

<code>-Xmx2688M -Xms2688M -Xmn960M -XX:MetaspaceSize=50M -XX:MaxMetaspaceSize=100M -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintHeapAtGC -XX:+UseConcMarkSweepGC/<code>

启动之后，不一会儿在控制台果然出现了日志

<code>{Heap before GC invocations=0 (full 0):/<code>
 
<code>par newgeneration total 884736K, used 330302K [0x0000000752400000, 0x000000078e400000, 0x000000078e400000)/<code>
<code>eden space 786432K, 42% used [0x0000000752400000, 0x000000076668fae0, 0x0000000782400000)/<code>
<code>fromspace 98304K, 0% used [0x0000000782400000, 0x0000000782400000, 0x0000000788400000)/<code>
<code>to space 98304K, 0% used [0x0000000788400000, 0x0000000788400000, 0x000000078e400000)/<code>
<code>concurrent mark-sweep generation total 1769472K, used 0K [0x000000078e400000, 0x00000007fa400000, 0x00000007fa400000)/<code>
<code>Metaspace used 22636K, capacity 102360K, committed 102400K, reserved 1118208K/<code>
<code>classspace used 8829K, capacity 33008K, committed 33008K, reserved 1048576K/<code>
<code>2019-09-21T16:09:28.562-0800: [Full GC (Metadata GC Threshold) 2019-09-21T16:09:28.562-0800: [CMS: 0K->5029K(1769472K), 0.0987115 secs] 33030 
2K->5029K(2654208K), [Metaspace: 22636K->22636K(1118208K)], 0.1340367 secs] [Times: user=0.11 sys=0.03, real=0.13 secs] /<code>
<code>Heap after GC invocations=1 (full 1):/<code>
<code>par newgeneration total 884736K, used 0K [0x0000000752400000, 0x000000078e400000, 0x000000078e400000)/<code>
<code>eden space 786432K, 0% used [0x0000000752400000, 0x0000000752400000, 0x0000000782400000)/<code>
<code>fromspace 98304K, 0% used [0x0000000782400000, 0x0000000782400000, 0x0000000788400000)/<code>
<code>to space 98304K, 0% used [0x0000000788400000, 0x0000000788400000, 0x000000078e400000)/<code>
<code>concurrent mark-sweep generation total 1769472K, used 5029K [0x000000078e400000, 0x00000007fa400000, 0x00000007fa400000)/<code>
<code>Metaspace used 2885K, capacity 4500K, committed 43008K, reserved 1058816K/<code>
<code>classspace used 291K, capacity 388K, committed 33008K, reserved 1048576K/<code>
 
<code>}/<code>