在安装mongodb 2.6.2的时候发现一个奇怪的问题 RS 节点的journal log并没有提前分配.我们知道在安装mongodb的时候
mongo总是会预先分配journal log 启用smallfile的时候默认为128MB 否则会分配1GB的journal log
下面是仲裁节点的日志:
2014-06-17T11:50:09.842+0800 [initandlisten] MongoDB starting : pid=4749 port=27017 dbpath=/data/mongodb/data 64-bit host=vm-3-57 2014-06-17T11:50:09.844+0800 [initandlisten] db version v2.6.2 2014-06-17T11:50:09.844+0800 [initandlisten] git version: 4d06e27876697d67348a397955b46dabb8443827 2014-06-17T11:50:09.844+0800 [initandlisten] build info: Linux build10.nj1.10gen.cc 2.6.32-431.3.1.el6.x86_64 #1 SMP Fri Jan 3 21:39:27 UTC 2014 x86_64 BOOST_LIB_VERSION=1_49 2014-06-17T11:50:09.844+0800 [initandlisten] allocator: tcmalloc 2014-06-17T11:50:09.844+0800 [initandlisten] options: { config: "/data/mongodb/mongod.cnf", net: { http: { enabled: false }, maxIncomingConnections: 5000, port: 27017, unixDomainSocket: { pathPrefix: "/data/mongodb/data" } }, operationProfiling: { mode: "slowOp", slowOpThresholdMs: 500 }, processManagement: { fork: true, pidFilePath: "/data/mongodb/data/mongod.pid" }, replication: { replSet: "rs1" }, security: { authorization: "enabled", keyFile: "/data/mongodb/data/rs1.keyfile" }, storage: { dbPath: "/data/mongodb/data", directoryPerDB: true, journal: { enabled: true }, repairPath: "/data/mongodb/data", syncPeriodSecs: 10.0 }, systemLog: { destination: "file", path: "/data/mongodb/log/mongod_data.log", quiet: true } } 2014-06-17T11:50:09.863+0800 [initandlisten] journal dir=/data/mongodb/data/journal 2014-06-17T11:50:09.864+0800 [initandlisten] recover : no journal files present, no recovery needed 2014-06-17T11:50:10.147+0800 [initandlisten] preallocateIsFaster=true 3.52 2014-06-17T11:50:10.378+0800 [initandlisten] preallocateIsFaster=true 3.4 2014-06-17T11:50:11.662+0800 [initandlisten] preallocateIsFaster=true 2.9 2014-06-17T11:50:11.662+0800 [initandlisten] preallocating a journal file /data/mongodb/data/journal/prealloc.0 2014-06-17T11:50:14.009+0800 [initandlisten] File Preallocator Progress: 629145600/1073741824 58% 2014-06-17T11:50:26.266+0800 [initandlisten] preallocating a journal file /data/mongodb/data/journal/prealloc.1 2014-06-17T11:50:29.009+0800 [initandlisten] File Preallocator Progress: 723517440/1073741824 67% 2014-06-17T11:50:40.751+0800 [initandlisten] preallocating a journal file /data/mongodb/data/journal/prealloc.2 2014-06-17T11:50:43.020+0800 [initandlisten] File Preallocator Progress: 597688320/1073741824 55% 2014-06-17T11:50:55.830+0800 [FileAllocator] allocating new datafile /data/mongodb/data/local/local.ns, filling with zeroes...
mongo默认创建了3个1GB 的journal log
再来看下RS 节点的日志:
2014-06-17T14:31:31.095+0800 [initandlisten] MongoDB starting : pid=8630 port=27017 dbpath=/storage/sas/mongodb/data 64-bit host=db-mysql-common01a 2014-06-17T14:31:31.096+0800 [initandlisten] db version v2.6.2 2014-06-17T14:31:31.096+0800 [initandlisten] git version: 4d06e27876697d67348a397955b46dabb8443827 2014-06-17T14:31:31.096+0800 [initandlisten] build info: Linux build10.nj1.10gen.cc 2.6.32-431.3.1.el6.x86_64 #1 SMP Fri Jan 3 21:39:27 UTC 2014 x86_64 BOOST_LIB_VERSION=1_49 2014-06-17T14:31:31.096+0800 [initandlisten] allocator: tcmalloc 2014-06-17T14:31:31.096+0800 [initandlisten] options: { config: "/storage/sas/mongodb/mongod.cnf", net: { http: { enabled: false }, maxIncomingConnections: 5000, port: 27017, unixDomainSocket: { pathPrefix: "/storage/sas/mongodb/data" } }, operationProfiling: { mode: "slowOp", slowOpThresholdMs: 500 }, processManagement: { fork: true, pidFilePath: "/storage/sas/mongodb/data/mongod.pid" }, replication: { replSet: "rs1" }, security: { authorization: "enabled", keyFile: "/storage/sas/mongodb/data/rs1.keyfile" }, storage: { dbPath: "/storage/sas/mongodb/data", directoryPerDB: true, journal: { enabled: true }, repairPath: "/storage/sas/mongodb/data", syncPeriodSecs: 10.0 }, systemLog: { destination: "file", path: "/storage/sas/mongodb/log/mongod_data.log", quiet: true } } 2014-06-17T14:31:31.101+0800 [initandlisten] journal dir=/storage/sas/mongodb/data/journal 2014-06-17T14:31:31.102+0800 [initandlisten] recover : no journal files present, no recovery needed 2014-06-17T14:31:31.130+0800 [FileAllocator] allocating new datafile /storage/sas/mongodb/data/local/local.ns, filling with zeroes... 2014-06-17T14:31:31.130+0800 [FileAllocator] creating directory /storage/sas/mongodb/data/local/_tmp 2014-06-17T14:31:31.132+0800 [FileAllocator] done allocating datafile /storage/sas/mongodb/data/local/local.ns, size: 16MB, took 0 secs 2014-06-17T14:31:31.137+0800 [FileAllocator] allocating new datafile /storage/sas/mongodb/data/local/local.0, filling with zeroes... 2014-06-17T14:31:31.138+0800 [FileAllocator] done allocating datafile /storage/sas/mongodb/data/local/local.0, size: 64MB, took 0 secs 2014-06-17T14:31:31.141+0800 [initandlisten] build index on: local.startup_log properties: { v: 1, key: { _id: 1 }, name: "_id_", ns: "local.startup_log" }
没有创建journal log 直接创建了datafile 这个问题很奇怪 之前认为是ext4的问题,咨询朋友之后发现mongodb在创建journal log 之前会有一个判断,部分源码如下:
// @file dur_journal.cpp writing to the writeahead logging journal bool _preallocateIsFaster() { bool faster = false; boost::filesystem::path p = getJournalDir() / "tempLatencyTest"; if (boost::filesystem::exists(p)) { try { remove(p); } catch(const std::exception& e) { log() << "Unable to remove temporary file due to: " << e.what() << endl; } } try { AlignedBuilder b(8192); int millis[2]; const int N = 50; for( int pass = 0; pass < 2; pass++ ) { LogFile f(p.string()); Timer t; for( int i = 0 ; i < N; i++ ) { f.synchronousAppend(b.buf(), 8192); } millis[pass] = t.millis(); // second time through, file exists and is prealloc case } int diff = millis[0] - millis[1]; if( diff > 2 * N ) { // at least 2ms faster for prealloc case? faster = true; log() << "preallocateIsFaster=true " << diff / (1.0*N) << endl; } } catch (const std::exception& e) { log() << "info preallocateIsFaster couldn't run due to: " << e.what() << "; returning false" << endl; } if (boost::filesystem::exists(p)) { try { remove(p); } catch(const std::exception& e) { log() << "Unable to remove temporary file due to: " << e.what() << endl; } } return faster; } bool preallocateIsFaster() { Timer t; bool res = false; if( _preallocateIsFaster() && _preallocateIsFaster() ) { // maybe system is just super busy at the moment? sleep a second to let it calm down. // deciding to to prealloc is a medium big decision: sleepsecs(1); res = _preallocateIsFaster(); } if( t.millis() > 3000 ) log() << "preallocateIsFaster check took " << t.millis()/1000.0 << " secs" << endl; return res; }
int diff = millis[0] - millis[1]; if( diff > 2 * N ) { // at least 2ms faster for prealloc case? faster = true; log() << "preallocateIsFaster=true " << diff / (1.0*N) << endl; }
如果diff> 2*N 那么mongo 将会认为 preallocate 是更好的选择,将会预先分配log ,这在仲裁节点日志也体现出来了,都大于2ms
2014-06-17T11:50:10.147+0800 [initandlisten] preallocateIsFaster=true 3.52 2014-06-17T11:50:10.378+0800 [initandlisten] preallocateIsFaster=true 3.4 2014-06-17T11:50:11.662+0800 [initandlisten] preallocateIsFaster=true 2.9
不过个人认为这个设计很无聊,我相信没人会介意初始化日志的那么点时间吧,何况如果出现峰值的之后再去分配log,对IO又是一个冲击。