Processing Timing Logs
Back To: XMC Cat
For the scalability tests done for XMC Cat I needed a means to process my timing log and get the time required for each step. For this I wrote a small program that would process the log and write a csv file.
Contents |
Java Code Changes
In this example, I have a log4j logger declared named timinglog. This logger is in addition to the regular logger I have created for logging in XMC Cat. Also, unlike the regular logger where it is declared in each class using the class name, I use the same name for the log in each class as follows:
protected static Logger timinglog = Logger.getLogger("xmccat.timing.logger");
In the java code for XMC Cat I have log messages that use the following format:
if (timinglog.isDebugEnabled() )
timinglog.debug("AddPropertyOperation-constructor: started: " + System.currentTimeMillis() );
For the processing of the log to work correctly, each message must follow this format: <class name>-<method name>:<description>:<long system time in ms>:<optional additional info>
WARNING: Be sure that your log messages are not in a loop! The processing of the log depends on a set (maximum) number of messages. Not every operation has to have each message, but it cannot have more messages than are in your operation steps file (see below). If you need to measure within a loop, set some accumulators and output log messages after the loop - increment the start time of the loop by the amount in each accumulator. This would be similar to if you had a loop with three steps A, B, and C that are performed 100 times and you did 100 iterations of step A first and then 100 iterations of step B, followed by 100 interations of step C.
For the first log message for any operation, the class name should be preceeded by one or more asterisks (*) such as in the following:
if (timinglog.isDebugEnabled() )
timinglog.debug("******************** CreateObjectOperation-CatalogServiceSkeleton_createObject: started: " + System.currentTimeMillis() );
This is a bit of a kludge based on the log statements I already had created in my code. At some point I'll probably change it to be similar to how it finds the last message of an operation. However, having the asterisks in the first message of an operation makes it easier to read the log after it is processed.
All of the fields are required except the last colon and the additional info. This is used to add info that we want to see in the processing output that could vary between particular instances in a run. In my tests I use the additional info to add the user's DN as a comment in the start of the create method. By having different DN's used in the scalability testing for inserting different types of objects (experiments versus files) I can diffentiate these easily in the processed log. The fields are not actually in angle brackets (see the example above). Each message is a debug statement, but wrapped in a call to the isDebugEnabled() method in log4j so that the message is not processed unless the timing log is on (set to DEBUG).
log4j properties file
The log4j properties file I use has the following additional logger for the timing:
log4j.logger.xmccat.timing.logger=DEBUG,xmctiming
log4j.appender.xmctiming=org.apache.log4j.RollingFileAppender
log4j.appender.xmctiming.MaxBackupIndex=100
log4j.appender.xmctiming.MaxFileSize=200MB
log4j.appender.xmctiming.Threshold=DEBUG
log4j.appender.xmctiming.layout=org.apache.log4j.PatternLayout
log4j.appender.xmctiming.layout.ConversionPattern=%-4d{DATE} [%t] %-5p %c %x -%m%n
log4j.appender.xmctiming.File=xmccat-timing.log
The Log Generated
The log generated based on this approach would have a line for each logging statement that is similar to the following (the messages are obviously dependent on what you add to your code).
01 Feb 2010 03:15:50,661 [http-10083-Processor25] DEBUG xmccat.timing.logger -QueryOperation-constructor: started: 1265012150661
Processing The Log
To process the log, we need to know the order of the log messages. In part this is so we can line up the messages between calls to an operation. Depending on the particulars of an invocation, possibly not all of the steps are invoked in each instance.
The configuration probably should be XML, but I use a text file that must follow these rules:
1. There is one line per operation type. In XMC Cat these are the operations exposed in the WSDL (e.g., query by ID, context query, create object, etc.). 2. The line for each operation is as follows: <operation name>,<message>,<message>, ...,<message> operation name: text (e.g., QueryByIdOperation) message: <class name>:<method name>:<description>
The class name, method name, and description must match the same fields as entered in the log messages in the java code as described above. A snippet from the setup file for processing the XMC Cat timing log is as follows:
QueryByIdOperation,QueryByIdOperation:CatalogServiceSkeleton_objectIdQuery:started,QueryOperation:constructor:started,...
To process the log, use the LogProcessor class' main method from the package named edu.indiana.dde.xmccat.logprocessor.
The location of the log4.jproperties file (if needed) can be set with the java option scalability such as:
-Dscalability.log4j=/mypath/log4j.properties
The parameters that the utility takes are:
-log <log file name> -sum <summary output file name> -steps <operation step file name> -sec <seconds to skip> -ops <number of operations to skip> -dur <test length in minutes>
The log file is your timing log generated as described above.
The summary file contains averages (mean) for each of the steps in the process. The summary is by operation type. The full path and name for the summary file must not exist - it cannot be overwritten and if it already exists the utility will abort and print a message to that effect. The summary log name is also used as a prefix for the detailed files generated for each operatino type (e.g., query by ID). The detailed summaries have 1 line for each olperation completed.
The operation step file is the file described above with the messages for each step to be included in the output.
The -sec, -ops, and -dur settings are all optional and if present will be ignored if set to 0. The -sec willl exclude operations that start before the number of seconds into the log specified by -sec. For example -sec is set to 5, then any operation that has a starting message within 5 seconds of the first message in the log will not be processed and will not be included in the totals for calculating the averages.
The -ops option specifies how many completed operations will be excluded at the start of the file. If the first operation is long running, it may not be excluded if the spercified number of operations start after it and finish before it.
The -sec and -ops settings are mutually exclusive.
The -dur option says how many minutes after the start of the processing (past the skipped seconds) the processing should stop. If you wanted to collect data for 20 minutes and have a 2-minute warm-up, the -sec would be set to 120 and the -dur would be set to 20. Any operations after the 22nd minute in the logt file would not be processed.
This project can be checked out of the D2I svn.