I am working with an application that runs on Tomcat in OMVS. It runs terribly on one mainframe and adequately on another. Is there a way I can compare the CPUs of the two mainframes as a reference?

I tried:

/d m=cpu

I didn't find the results very promising. The results seemed to be the same for our mini and our main system. I would assume the mini is actually more limited.

Note: I am looking more for CPU processing power on this particular LPAR.

GC_


Solution 1:

Comparing the number of CPUs of a mainframe image isn't meaningful, most likely. Mainframes are designed to run multiple tasks at the same time, and give priority to whatever the business says is most important, and are capable of being very heavily virtualized, so looking at the number of CPUs doesn't tell you much. You have to understand the environment surrounding your application, which includes the weight assigned to your LPAR (how much access to the logical CPUs the LPAR is guaranteed), other things that are running on the LPAR at the same time, and other things running on other LPARs on the same CEC at the same time. You also need to understand the WLM policy of the LPAR, as this tells z/OS what application goals are most important, and what are less important.

Please note that mainframe performance analysis is a specialized skill, which people spend years learning, so there's a limit to what can be said via stackexchange. Talking to your system programmer/performance analysts would probably be a much better thing to do than trying to figure it out yourself, other than as a pure learning exercise.

That said, I can give you some basic things to look at, or ask about. You may or may not be able to access some of the data/tools I'll mention.

First, and most basic, all mainframes can gather performance data in SMF type 70-79 records, which we recommend shops collect as a matter of common practice, and, if you want to get really low-level, in SMF 113 records. These are binary records and not easy to read, but they are there. Their format is documented in the z/OS MVS System Management Facilities (SMF) book.

Next, there are a number of tools that can be used to post-process SMF records, such as RMF, from IBM, and a variety of vendor tools. If you have access to them, you can get very in-depth information on the CPU utilization of various address spaces/processes over time. Some tools also have interactive modes, where you can get real-time snapshots of individual LPAR activity, as well as activity across the CEC. SDSF and EJES can also give you some very basic information about the LPAR, CEC, and running address spaces, so you can look at accumulated CPU time, for example. If you can tell us what tools you have access to, we might be able to give you more specific advice.

At a guess, though, while the two images have the same number of logical CPUs defined, the main system has a much higher weight than the mini system. That would mean the main system has guaranteed access to more CPU capacity than the mini one does, and most of the time the mini system can't and won't attempt to actually dispatch work to most of those CPUs. If you are running on a z13 and are in PROCVIEW CORE mode, one of the things that the /d m=cpu command will tell you is whether the CPUs are parked or unparked. Parked CPUs are CPUs that the z/OS image is not going to dispatch work to, as the system that owns them (probably the main system, if both are on the same CEC) is dispatching work to them.
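To make the weight idea concrete, here is a small Python sketch of the usual back-of-the-envelope calculation: an LPAR's guaranteed share is its weight divided by the sum of the weights of all active LPARs sharing the same physical processors, multiplied by the number of those processors. The LPAR names, weights, and processor count are invented for illustration and are not taken from the systems in the question.

```python
def guaranteed_cps(weights, shared_physical_cps):
    """Return each LPAR's guaranteed capacity, in physical-CP equivalents.

    weights: dict of LPAR name -> LPAR weight (hypothetical values below).
    shared_physical_cps: number of shared physical GPs on the CEC.
    """
    total = sum(weights.values())
    return {name: shared_physical_cps * w / total for name, w in weights.items()}


# Hypothetical example: two LPARs sharing 4 physical GPs.
shares = guaranteed_cps({"MAIN": 800, "MINI": 200}, shared_physical_cps=4)
for name, cps in shares.items():
    print(f"{name}: {cps:.2f} CPs guaranteed")
```

With these made-up numbers, both LPARs could have four logical CPUs defined, yet MINI is only guaranteed less than one physical CP's worth of capacity when MAIN is busy.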

Solution 2:

Kevin mentions a number of great points, but it might help to start the analysis at a little higher level: what are the two machines and, since we're talking about Tomcat which runs in a JVM, do both have zAAPs or zIIPs (assuming zAAP on zIIP)?

From the "d m=cpu" you should be able to get the machine model information which will at least let you know if you're really comparing apples to apples. Here's an oldish example from my notes:

D M=CPU                                                
IEE174I 13.15.43 DISPLAY M 443                         
PROCESSOR STATUS                                       
ID  CPU                  SERIAL                        
00  +                     0xxxxx2817                   
01  +                     0xxxxx2817                   
02  -                                                  
03  N                                                  
04  N                                                  
05  N                                                  
06  N                                                  
07  N                                                  
08  NI                                                 
09  NI                                                 
0A  NI                                                 
0B  NI                                                 

CPC ND = 002817.M15.IBM.02.0000000xxxxx                
CPC SI = 2817.403.IBM.02.00000000000xxxxx            

Key points here: the xxxxx is the (obscured here) serial number. The 2817 is the model number, which equates to a z196, which today in 2016 is two generations back from the current z13 (model 2964). The model numbers make very limited sense: you'll have to look them up. But if the two machines in question are different models, that's part one of a difference.

The "M15" on the CPC ND line is an indication of how many books/drawers are installed. It's likely a minor consideration in this situation.

The "403" on the CPC SI line though is important. The "4" indicates the relative speed of the general purpose (GP) engines. For the larger (used to be called Enterprise Class) machines, this can range from 4 (slowest) to 7 (fastest). For the smaller (used to be called "Business Class") machines, the speed indicator goes from A (slowest) to Z (fastest, but slower than a 7xx of the same machine generation). The "03" indicates how many GPs are available on the machine. For common configurations of less than 100 GPs, this is simply a decimal number. So in this example the machine is a z196 with 3 GPs that are running at the slowest speed possible on this generation of machine.
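As a rough illustration of the decoding just described, here is a minimal Python sketch that splits a CPC SI value into machine type, speed indicator, and GP count. The function name `decode_cpc_si` and the small machine-type table are my own, not part of any IBM tool. The table is deliberately incomplete, and the parser only handles the common case of a one-character speed indicator followed by a two-digit decimal GP count.

```python
# Hypothetical helper for illustration only; not an IBM-supplied utility.
# Partial lookup table of machine types; extend as needed.
MACHINE_TYPES = {
    "2817": "z196",
    "2827": "zEC12",
    "2828": "zBC12",
    "2964": "z13",
}


def decode_cpc_si(cpc_si):
    """Decode a value such as '2817.403.IBM.02.00000000000xxxxx'."""
    fields = cpc_si.strip().split(".")
    machine_type, capacity = fields[0], fields[1]
    # Only the common form is handled: speed character + 2 decimal digits.
    if len(capacity) != 3 or not capacity[1:].isdigit():
        raise ValueError(f"unsupported model capacity identifier: {capacity!r}")
    return {
        "machine_type": machine_type,
        "machine_name": MACHINE_TYPES.get(machine_type, "unknown"),
        "speed_indicator": capacity[0],
        "gp_count": int(capacity[1:]),
        "is_7xx": capacity[0] == "7",  # False means sub-capacity GPs
    }


if __name__ == "__main__":
    print(decode_cpc_si("2817.403.IBM.02.00000000000xxxxx"))
```

Running this against the CPC SI value from each machine gives a quick side-by-side of generation, GP speed setting, and GP count.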

However, you mentioned Tomcat, and since Tomcat runs in the JVM, most of its CPU time should actually be on a specialty engine (either a zAAP or a zIIP), assuming that A) such engines were purchased for the machine, and B) they are configured properly to the LPAR. The specialty engines run at full speed regardless of the speed of the general purpose engines; i.e., zIIPs always run at the 7xx speed, even if they are on a 4xx machine.

If you're trying to run Tomcat without specialty engines... well, that's probably not good if you're on a sub-capacity (not a 7xx) machine, for potentially a number of reasons related to likely available capacity and software costs.

However, note that even though most of Tomcat's CPU time will be offloaded to the zIIP/zAAP (when available), there will still be some amount that runs on the GP engines, making the GP situation important to understand as well. Depending on the configuration the amount run on the GPs might be as low as 1-2% of the total or could be >10%.

Note in the above display that the zIIPs are CPUs 08-0B, but they're "N"ot available. In this case they were defined to the LPAR, but they weren't currently available on the hardware because this was a DR machine that didn't have its CBU configuration in place at the time of the snapshot. Unfortunately, those are only logical zIIPs. The number of physical zIIPs or zAAPs is not available from this display; that's actually a bit harder to track down. But if you have logical zIIPs/zAAPs online, you know that you have at least some physical engines to back them.
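If you want to compare the two systems' displays mechanically, a sketch like the following Python snippet can tally the logical CPUs by type and state. It assumes the older per-CPU layout shown above (a two-hex-digit ID followed by status flags), treats "+" as online, "-" as offline, "N" as not available, and "I" as a zIIP, and additionally assumes "A" marks a zAAP. The function name and the abbreviated sample are mine; PROCVIEW CORE output is laid out differently and is not handled.

```python
import re

# A processor row: two-hex-digit ID, whitespace, then one or more status flags.
ROW = re.compile(r"^([0-9A-F]{2})\s+([+\-NIA]+)(?:\s|$)")

# Abbreviated sample in the same layout as the display above.
SAMPLE = """\
ID  CPU                  SERIAL
00  +                     0xxxxx2817
01  +                     0xxxxx2817
02  -
03  N
08  NI
09  NI
"""


def summarize_cpus(display_text):
    """Count logical CPUs by (type, state) from D M=CPU output text."""
    summary = {}
    for line in display_text.splitlines():
        match = ROW.match(line.strip())
        if not match:
            continue  # header, message ID, or CPC ND/SI line
        flags = match.group(2)
        if "I" in flags:
            kind = "zIIP"
        elif "A" in flags:
            kind = "zAAP"  # assumption: "A" flags a zAAP
        else:
            kind = "GP"
        if "+" in flags:
            state = "online"
        elif "-" in flags:
            state = "offline"
        else:
            state = "not available"
        summary[(kind, state)] = summary.get((kind, state), 0) + 1
    return summary


if __name__ == "__main__":
    for key, count in sorted(summarize_cpus(SAMPLE).items()):
        print(key, count)
```

A system that reports zero online zIIPs or zAAPs here is a strong hint that the JVM work is landing on the GPs.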

Even if the two machines in question are the same generation with the same speed settings and the same number of engines (GPs and zIIPs), a whole host of questions comes into play because mainframes rarely run a single system; usually there are multiple LPARs running concurrently. In that situation, you have to start digging into the data that Kevin mentions to understand what's really going on. But if you're comparing apples (2964-605, with zIIPs) to bananas (2828-F03 without specialty engines), you should expect performance differences from the start.

Finally, I should note that the version of Java being used relative to the generation of machine is important too. For example, if the two machines in question are z13s, but one Tomcat is using Java 8 and the other Java 7, I would expect differences because the exploitation of the new z13 instructions is only in Java 8.

And all this just focuses on CPU-related performance issues. Obviously you could have differences and issues elsewhere as well. But CPU is a good place to start looking absent any other information.

I have successfully run Tomcat on z/OS with little to no trouble--but I had adequate zAAP/zIIP capacity available.