在安装mongodb 2.6.2的时候发现一个奇怪的问题 RS 节点的journal log并没有提前分配.我们知道在安装mongodb的时候
mongo总是会预先分配journal log 启用smallfile的时候默认为128MB 否则会分配1GB的journal log

下面是仲裁节点的日志:

2014-06-17T11:50:09.842+0800 [initandlisten] MongoDB starting : pid=4749 port=27017 dbpath=/data/mongodb/data 64-bit host=vm-3-57
2014-06-17T11:50:09.844+0800 [initandlisten] db version v2.6.2
2014-06-17T11:50:09.844+0800 [initandlisten] git version: 4d06e27876697d67348a397955b46dabb8443827
2014-06-17T11:50:09.844+0800 [initandlisten] build info: Linux build10.nj1.10gen.cc 2.6.32-431.3.1.el6.x86_64 #1 SMP Fri Jan 3 21:39:27 UTC 2014 x86_64 BOOST_LIB_VERSION=1_49
2014-06-17T11:50:09.844+0800 [initandlisten] allocator: tcmalloc
2014-06-17T11:50:09.844+0800 [initandlisten] options: { config: "/data/mongodb/mongod.cnf", net: { http: { enabled: false }, maxIncomingConnections: 5000, port: 27017, unixDomainSocket: { pathPrefix: "/data/mongodb/data" } }, operationProfiling: { mode: "slowOp", slowOpThresholdMs: 500 }, processManagement: { fork: true, pidFilePath: "/data/mongodb/data/mongod.pid" }, replication: { replSet: "rs1" }, security: { authorization: "enabled", keyFile: "/data/mongodb/data/rs1.keyfile" }, storage: { dbPath: "/data/mongodb/data", directoryPerDB: true, journal: { enabled: true }, repairPath: "/data/mongodb/data", syncPeriodSecs: 10.0 }, systemLog: { destination: "file", path: "/data/mongodb/log/mongod_data.log", quiet: true } }
2014-06-17T11:50:09.863+0800 [initandlisten] journal dir=/data/mongodb/data/journal
2014-06-17T11:50:09.864+0800 [initandlisten] recover : no journal files present, no recovery needed
2014-06-17T11:50:10.147+0800 [initandlisten] preallocateIsFaster=true 3.52
2014-06-17T11:50:10.378+0800 [initandlisten] preallocateIsFaster=true 3.4
2014-06-17T11:50:11.662+0800 [initandlisten] preallocateIsFaster=true 2.9
2014-06-17T11:50:11.662+0800 [initandlisten] preallocating a journal file /data/mongodb/data/journal/prealloc.0
2014-06-17T11:50:14.009+0800 [initandlisten]        File Preallocator Progress: 629145600/1073741824    58%
2014-06-17T11:50:26.266+0800 [initandlisten] preallocating a journal file /data/mongodb/data/journal/prealloc.1
2014-06-17T11:50:29.009+0800 [initandlisten]        File Preallocator Progress: 723517440/1073741824    67%
2014-06-17T11:50:40.751+0800 [initandlisten] preallocating a journal file /data/mongodb/data/journal/prealloc.2
2014-06-17T11:50:43.020+0800 [initandlisten]        File Preallocator Progress: 597688320/1073741824    55%
2014-06-17T11:50:55.830+0800 [FileAllocator] allocating new datafile /data/mongodb/data/local/local.ns, filling with zeroes...

mongo默认创建了3个1GB 的journal log

再来看下RS 节点的日志:

2014-06-17T14:31:31.095+0800 [initandlisten] MongoDB starting : pid=8630 port=27017 dbpath=/storage/sas/mongodb/data 64-bit host=db-mysql-common01a
2014-06-17T14:31:31.096+0800 [initandlisten] db version v2.6.2
2014-06-17T14:31:31.096+0800 [initandlisten] git version: 4d06e27876697d67348a397955b46dabb8443827
2014-06-17T14:31:31.096+0800 [initandlisten] build info: Linux build10.nj1.10gen.cc 2.6.32-431.3.1.el6.x86_64 #1 SMP Fri Jan 3 21:39:27 UTC 2014 x86_64 BOOST_LIB_VERSION=1_49
2014-06-17T14:31:31.096+0800 [initandlisten] allocator: tcmalloc
2014-06-17T14:31:31.096+0800 [initandlisten] options: { config: "/storage/sas/mongodb/mongod.cnf", net: { http: { enabled: false }, maxIncomingConnections: 5000, port: 27017, unixDomainSocket: { pathPrefix: "/storage/sas/mongodb/data" } }, operationProfiling: { mode: "slowOp", slowOpThresholdMs: 500 }, processManagement: { fork: true, pidFilePath: "/storage/sas/mongodb/data/mongod.pid" }, replication: { replSet: "rs1" }, security: { authorization: "enabled", keyFile: "/storage/sas/mongodb/data/rs1.keyfile" }, storage: { dbPath: "/storage/sas/mongodb/data", directoryPerDB: true, journal: { enabled: true }, repairPath: "/storage/sas/mongodb/data", syncPeriodSecs: 10.0 }, systemLog: { destination: "file", path: "/storage/sas/mongodb/log/mongod_data.log", quiet: true } }
2014-06-17T14:31:31.101+0800 [initandlisten] journal dir=/storage/sas/mongodb/data/journal
2014-06-17T14:31:31.102+0800 [initandlisten] recover : no journal files present, no recovery needed
2014-06-17T14:31:31.130+0800 [FileAllocator] allocating new datafile /storage/sas/mongodb/data/local/local.ns, filling with zeroes...
2014-06-17T14:31:31.130+0800 [FileAllocator] creating directory /storage/sas/mongodb/data/local/_tmp
2014-06-17T14:31:31.132+0800 [FileAllocator] done allocating datafile /storage/sas/mongodb/data/local/local.ns, size: 16MB,  took 0 secs
2014-06-17T14:31:31.137+0800 [FileAllocator] allocating new datafile /storage/sas/mongodb/data/local/local.0, filling with zeroes...
2014-06-17T14:31:31.138+0800 [FileAllocator] done allocating datafile /storage/sas/mongodb/data/local/local.0, size: 64MB,  took 0 secs
2014-06-17T14:31:31.141+0800 [initandlisten] build index on: local.startup_log properties: { v: 1, key: { _id: 1 }, name: "_id_", ns: "local.startup_log" }

没有创建journal log 直接创建了datafile 这个问题很奇怪 之前认为是ext4的问题,咨询朋友之后发现mongodb在创建journal log 之前会有一个判断,部分源码如下:

// @file dur_journal.cpp writing to the writeahead logging journal

  bool _preallocateIsFaster() {
            bool faster = false;
            boost::filesystem::path p = getJournalDir() / "tempLatencyTest";
            if (boost::filesystem::exists(p)) {
                try {
                    remove(p);
                }
                catch(const std::exception& e) {
                    log() << "Unable to remove temporary file due to: " << e.what() << endl;
                }
            }
            try {
                AlignedBuilder b(8192);
                int millis[2];
                const int N = 50;
                for( int pass = 0; pass < 2; pass++ ) {
                    LogFile f(p.string());
                    Timer t;
                    for( int i = 0 ; i < N; i++ ) { 
                        f.synchronousAppend(b.buf(), 8192);
                    }
                    millis[pass] = t.millis();
                    // second time through, file exists and is prealloc case
                }
                int diff = millis[0] - millis[1];
                if( diff > 2 * N ) {
                    // at least 2ms faster for prealloc case?
                    faster = true;
                    log() << "preallocateIsFaster=true " << diff / (1.0*N) << endl;
                }
            }
            catch (const std::exception& e) {
                log() << "info preallocateIsFaster couldn't run due to: " << e.what()
                      << "; returning false" << endl;
            }
            if (boost::filesystem::exists(p)) {
                try {
                    remove(p);
                }
                catch(const std::exception& e) {
                    log() << "Unable to remove temporary file due to: " << e.what() << endl;
                }
            }
            return faster;
        }
        bool preallocateIsFaster() {
            Timer t;
            bool res = false;
            if( _preallocateIsFaster() && _preallocateIsFaster() ) { 
                // maybe system is just super busy at the moment? sleep a second to let it calm down.  
                // deciding to to prealloc is a medium big decision:
                sleepsecs(1);
                res = _preallocateIsFaster();
            }
            if( t.millis() > 3000 ) 
                log() << "preallocateIsFaster check took " << t.millis()/1000.0 << " secs" << endl;
            return res;
        }
        
 int diff = millis[0] - millis[1];
                if( diff > 2 * N ) {
                    // at least 2ms faster for prealloc case?
                    faster = true;
                    log() << "preallocateIsFaster=true " << diff / (1.0*N) << endl;
                }   

如果diff> 2*N 那么mongo 将会认为 preallocate 是更好的选择,将会预先分配log ,这在仲裁节点日志也体现出来了,都大于2ms

2014-06-17T11:50:10.147+0800 [initandlisten] preallocateIsFaster=true 3.52
2014-06-17T11:50:10.378+0800 [initandlisten] preallocateIsFaster=true 3.4
2014-06-17T11:50:11.662+0800 [initandlisten] preallocateIsFaster=true 2.9

不过个人认为这个设计很无聊,我相信没人会介意初始化日志的那么点时间吧,何况如果出现峰值的之后再去分配log,对IO又是一个冲击。