Oracle SODA Python Driver and Jupyter Lab


Oracle SODA Python Driver and Jupyter Lab

This workbook is divided into two sections the first is a quick guide to setting up Jupyter Lab (Python Notebooks) such that it can connect to a database running inside of OCI, in this case an ATP instance. The second section uses the JSON python driver to connect to the database to run a few tests. This notebook is largely a reminder to myself on how to set this up but hopefully it will be useful to others.

Setting up Python 3.6 and Jupyter Lab on our compute instance

I won't go into much detail on setting up ATP or ADW or creating a IaaS server. I covered that process in quite a lot of detail here. We'll be setting up something similar to the following

OCI Cloud

Once you've created the server You'll need to logon to the server with the details found on the compute instances home screen. You just need to grab it's IP address to enable you to logon over ssh.

OCI Cloud

The next step is to connect over ssh to with a command similar to

ssh opc@
Enter passphrase for key '/Users/dgiles/.ssh/id_rsa': 
Last login: Wed Jan  9 20:48:46 2019 from host10.10.10.1

In the following steps we'll be using python so we need to set up python on the server and configure the needed modules. Our first step is to use yum to install python 3.6 (This is personal preference and you could stick with python 2.7). To do this we first need to enable yum and then install the environment. Run the following commands

sudo yum -y install yum-utils
sudo yum-config-manager --enable ol7_developer_epel
sudo yum install -y python36
python3.6 -m venv myvirtualenv
source myvirtualenv/bin/activate

This will install python and enable a virtual environment for use (our own Python sand pit). You can make sure that python is installed by simply typing python3.6 ie.

$> python3.6
Python 3.6.3 (default, Feb  1 2018, 22:26:31) 
[GCC 4.8.5 20150623 (Red Hat 4.8.5-16)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> quit()

Make sure you type quit() to leave the REPL environment.

We now need to install the needed modules. You can do this one by one or simply use the following file requirements.txt and run the following command

pip -p requirements.txt

This will install all of the need python modules for the next step which is to start up Jupyter Lab.

Jupyter Lab is an interactive web based application that enables you do interactively run code and document the process at the same time. This blog is written in it and the code below can be run once your environment is set up. Vists the website to see more details.

To start jupyer lab we run the following command.

nohup jupyter-lab --ip= &

Be aware that this will only work if you have activated you virtual environment. In out instance we did this with with the command source myvirtualenv/bin/activate. At this point the jupyter-lab is running in the background and and is listening (by default) on port 8888. You could start a desktop up and use VNC to view the output. However I prefer to redirect the output to my own desktop over ssh. To do this you'll need to run the following ssh command from your desktop

ssh -N -f -L 5555:localhost:8888 opc@

Replacing the IP address above with the one for your compute instance

This will redirect the output of 8888 to port 5555 on your destop. You can then connect to it by simply going to the following url http://localhost:5555. After doing this you should see a screen asking you input a token (you'll only need to do this once). You can find this token inside of the nohup.out file running on the compute instance. It should be near the top of the file and should look something like

[I 20:49:12.237 LabApp]

Just copy the text after "token=" and paste it in to the dialogue box. After completing that step you should see something like this

Jupyter Lab

You can now start creating your own notebooks or load this one found here. I'd visit the website to familiarise yourself on how the notebooks work.

Using Python and the JSON SODA API

This section will walk through using The SODA API with Python from within the Jupyter-lab envionment we set up in the previous section. The SODA API is a simple object API that enables developers persist and retrieve JSON documents held inside of the Oracle Database. SODA drivers are available for Java, C, REST, Node and Python.

You can find the documentation for this API here

To get started we'll need to import the need the following python modules

In [11]:
import cx_Oracle
import keyring
import os
import pandas as pd

We now need to set the location of the directory containing the wallet to enable us to connect to the ATP instance. Once we've done that we can connect to the Oracle ATP instance and get a SODA object to enable us to work with JSON documents. NOTE : I'm using the module keyring to hide the password for my database. You should replace this call with the password for your user.

In [20]:
os.environ['TNS_ADMIN'] = '/home/opc/Wallet'
connection = cx_Oracle.connect('json', keyring.get_password('ATPLondon','json'), 'sbatp_tpurgent')
soda = connection.getSodaDatabase()

We now need to create JSON collection and if needed add any additional indexes which might accelerate data access.

In [21]:
    collection = soda.createCollection("customers_json_soda")
    collection.createIndex({ "name"   : "customer_index",
                          "fields" : [ { "path"     : "name_last",
                          "datatype" : "string"}]})
except cx_Oracle.DatabaseError as ex:
    print("It looks like the index already exists : {}".format(ex))

We can now add data to the collection. Here I'm inserting the document into the database and retrieving it's key. You can find find some examples/test cases on how to use collections here

In [22]:
customer_doc = {"id"          : 1,
       "name_last"    : "Giles",
       "name_first"   : "Dom",
       "description"  : "Gold customer, since 1990",
       "account info" : None,
       "dataplan"     : True,
       "phones"       : [{"type" : "mobile", "number" : 9999965499},
                         {"type" : "home",   "number" : 3248723987}]}
doc = collection.insertOneAndGet(customer_doc)

To fetch documents we could use SQL or Query By Example (QBE) as shown below. You can find further details on QBE here. In this example there should just be a single document. NOTE: I'm simply using pandas DataFrame to render the retrieved data but it does highlight how simple it is to use the framework for additional analysis at a later stage.

In [23]:
documents = collection.find().filter({'name_first': {'$eq': 'Dom'}}).getDocuments()
results = [document.getContent() for document in documents]    
account info dataplan description id name_first name_last phones
0 None True Gold customer, since 1990 1 Dom Giles [{'type': 'mobile', 'number': 9999965499}, {'t...

To update records we can use the replaceOne method.

In [24]:
document = collection.find().filter({'name_first': {'$eq': 'Dom'}}).getOne()
updated = collection.find().key(doc.key).replaceOne({"id"          : 1,
       "name_last"    : "Giles",
       "name_first"   : "Dominic",
       "description"  : "Gold customer, since 1990",
       "account info" : None,
       "dataplan"     : True,
       "phones"       : [{"type" : "mobile", "number" : 9999965499},
                         {"type" : "home",   "number" : 3248723987}]},)

And just to make sure the change happened

In [25]:
data = collection.find().key(document.key).getOne().getContent()
account info dataplan description id name_first name_last phones
0 None True Gold customer, since 1990 1 Dominic Giles [{'type': 'mobile', 'number': 9999965499}, {'t...

And finally we can drop the collection.

In [26]:
except cx_Oracle.DatabaseError as ex:
    print("We're were unable to drop the collection")
In [27]:

Python, Oracle_cx, Altair and Jupyter Notebooks


Simple Oracle/Jupyter/Keyring/Altair Example

In this trivial example we'll be using Jupyter Lab to create this small notebook of Python code. We will see how easy it is to run SQL and analyze those results. The point of the exercise here isn't really the SQL we use but rather how simple it is for developers or analysts who prefer working with code rather than a full blown UI to retrieve and visualise data.

For reference I'm using Python 3.6 and Oracle Database 12c Release 2. You can find the source here

First we'll need to import the required libraries. It's likely that you won't have them installed on your server/workstation. To do that use the following pip command "pip install cx_Oracle keyring pandas altair jupyter jupyter-sql". You should have pip installed and probably be using virtualenv but if you don't I'm afraid that's beyond the scope of this example. Note whilst I'm not importing jupyter and jupyter-sql in this example they are implicitly used.

In [1]:
import cx_Oracle
import keyring
import pandas as pd
import altair as alt

We will be using the magic-sql functionality inside of jupyter-lab. This means we can trivially embed SQL rather than coding the cusrsors and fetches we would typically have to do if we were using straight forward cx_Oracle. This uses the "magic function" syntax" which start with "%" or "%%". First we need to load the support for SQL

In [2]:
%load_ext sql

Next we can connect to the local docker Oracle Database. However one thing we don't want to do with notebooks when working with server application is to disclose our passwords. I use the python module "keyring" which enables you to store the password once in a platform appropriate fashion and then recall it in code. Outside of this notebook I used the keyring command line utility and ran "keyring set local_docker system". It then prompted me for a password which I'm using here. Please be careful with passwords and code notebooks. It's possible to mistakenly serialise a password and then potentially expose it when sharing notebooks with colleagues.

In [3]:
password = keyring.get_password('local_docker','system')

We can then connect to the Oracle Database. In this instance I'm using the cx_Oracle Diriver and telling it to connect to a local docker database running Oracle Database 12c ( Because I'm using a service I need to specify that in the connect string. I also substitute the password I fetched earlier.

In [4]:
%sql oracle+cx_oracle://system:$password@localhost/?service_name=soe
'Connected: system@'

I'm now connected to the Oracle database as the system user. I'm simply using this user as it has access to a few more tables than a typical user would. Next I can issue a simple select statement to fetch some table metadata. the %%sql result << command retrieves the result set into the variable called result

In [5]:
%%sql result <<
select table_name, tablespace_name, num_rows, blocks, avg_row_len, trunc(last_analyzed)
from all_tables
where num_rows  > 0
and tablespace_name is not null
 * oracle+cx_oracle://system:***@localhost/?service_name=soe
0 rows affected.
Returning data to local variable result

In Python a lot of tabular manipulation of data is performed using the module Pandas. It's an amazing piece of code enabling you to analyze, group, filter, join, pivot columnar data with ease. If you've not used it before I strongly reccomend you take a look. With that in mind we need to take the resultset we retrieved and convert it into a DataFrame (the base Pandas tabular structure)

In [6]:
result_df = result.DataFrame()

We can see a sample of that result set the the Panada's head function

In [7]:
table_name tablespace_name num_rows blocks avg_row_len TRUNC(LAST_ANALYZED)
0 BOOTSTRAP$ SYSTEM 60 3 314 2018-06-14
1 CON$ SYSTEM 7068 28 22 2018-06-20
2 OBJ$ SYSTEM 72687 1211 112 2018-06-14
3 TS$ SYSTEM 7 7 89 2018-06-14
4 IND$ SYSTEM 2892 1565 89 2018-06-19

All very useful but what if wanted to chart how many tables were owned by each user. To do that we could of course use the charting functionlty of Pandas (and Matplotlib) but I've recently started experimenting with Altair and found it to make much more sense when define what and how to plot data. So lets just plot the count of tables on the X axis and the tablespace name on the Y axis.

In [8]:

Altair makes much more sense to me as a SQL user than some of the cryptic functionality of Matplotlib. But the chart looks really dull. We can trvially tidy it up a few additonal commands

In [9]:
    x=alt.Y('count()', title='Number of tables in Tablespace'),
    y=alt.Y('tablespace_name', title='Tablespace Name'),
).properties(width=600, height= 150, title='Tables Overview')

Much better but Altair can do significantly more sophisticated charts than a simple bar chart. Next lets take a look at the relationship between the number of rows in a table and the blocks required to store them. It's worth noting that this won't, or at least shouldn't, produce any startling results but it's a useful plot on a near empty database that will give use something to look at. Using another simple scatter chart we get the following

In [10]:
    y = 'num_rows',
    x = 'blocks',

Again it's a near empty database and so there's not much going on or at least we have a lot of very small tables and a few big ones. Lets use a logarithmic scale to spread things out a little better and take the oppertunity to brighten it up

In [11]:
    y = alt.Y('num_rows',scale=alt.Scale(type='log')),
    x = alt.X('blocks', scale=alt.Scale(type='log')),
    color = 'tablespace_name',

Much better... But Altair and the libraries it uses has a another fantastic trick up its sleeve. It can make the charts interactive (zoom and pan) but also enable you to select data in one chart and have it reflected in another. To do that we use a "Selection" object and use this to filter the data in a second chart. Lets do this for the two charts we created. Note you'll need to have run this notebook "live" to see the interactive selection.

In [12]:
interval = alt.selection_interval()

chart1 = alt.Chart(result_df).mark_point().encode(
    y = alt.Y('num_rows',scale=alt.Scale(type='log')),
    x = alt.X('blocks', scale=alt.Scale(type='log')),
    color = 'tablespace_name',
).properties(width=600, selection=interval)

chart2 = alt.Chart(result_df).mark_bar().encode(
    color = 'tablespace_name'
).properties(width=600, height= 150).transform_filter(

chart1 & chart2

And thats it. I'll be using this framework to create a few additional examples in the coming weeks.

The following shows what you would have seen if you had been running the notebook code inside of a browser


Making the alert log just a little more readable

One of the most valuable sources of information about what the Oracle database has done and is currently doing is the alert log. It's something that every Oracle Database professional should be familiar with. So what can you do to improve you chances of not missing important pieces of info? The obvious answer is that you should use a tool like Enterprise Manager. This is particularly true if you are looking after hundreds of databases.

But what if you are only looking after one or two or just testing something out? Well the most common solution is to simply tail the alert log file.

The only issue is that it's not the most exciting thing to view, this of course could be said for any terminal based text file. But there are things you can do to make it easier to parse visually and improve your chances of catching an unexpected issue.

The approach I take is to push the alert log file through python and use the various libraries to brighten it up. It's very easy to go from this (tail -f)

Screenshot of ScreenFloat (20-03-2018, 08-25-26)

To this

Screenshot of ScreenFloat (20-03-2018, 08-25-58)

The reason this works is that python provides a rich set of libraries which can add a little bit of colour and formatting to the alert file.

You can find the code to achieve this in the gist below

Just a quick note on installing this. You'll need either python 2.7 or 3 available on your server.

I'd also recommend installing pip and then the following libraries

pip install humanize psutil colorama python-dateutil

After you've done that it's just a case of running the script. If you have $ORACLE_BASE and $ORACLE_SID set the library will try and make a guess at the location of the alert file. i.e


But if that doesn't work or you get an error you can also explicitly specify the location of the alert log with something like

python -a $ORACLE_BASE/diag/rdbms/orcl/orcl/trace/alert_orcl.log

This script isn't supposed to be an end product just a simple example of what can be achieved to make things a little easier. And whilst I print information like CPU load and Memory there's nothing to stop you from modifying the script to display the number of warnings or errors found in the alert log and update it things change. Or if you really want to go wild implement something similar but a lot more sophisticated using python and curses

The age of "Terminal" is far from over….

Installing Python 2.7 in local directory on Oracle Bare Metal Cloud

I know there will be Linux and Python specialist spitting feathers about this approach. But if you're in need of an up to date python environment then the following approach might be of help.It's worth nothing that this technique will probably work on most Oracle Enterprise Linux or Red Hat Platform releases.

#make localdirectory to install python i.e.
mkdir python

#make sure latest libs needs for python are installed
sudo yum install openssl openssl-devel
sudo yum install zlib-devel

#Download latest source i.e. Python-2.7.13.tgz and uncompress
tar xvfz Python-2.7.13.tgz

cd Python-2.7.13
#configure python for local install 
config --enable-shared --prefix=/home/opc/python --with-libs=/usr/local/lib
make; make install

#python 2.7.13 is now installed but isn't currently being used (2.6 is still the default)

#get pip
curl -o
#install pip (this will still be installed with 2.6)
sudo python
#install virtualenv
sudo pip install virtualenv
#create a virtualenv using the newly installed python
virtualenv -p /home/opc/python/bin/python myvirtualenv
#activate it
source myvirtualenv/bin/activate
#install packages…
pip install cx_Oracle

Changing the size of redo logs in python

I create a lot of small databases to do testing on. The trouble is that I often need to change the size of redo log files when I'm testing large transaction workloads or loading a lot of data. Now there are lots of better ways to do whats shown in the code below but this approach gave me the chance to keep brushing up my python skills and use the might cx_oracle driver. The following should never be considered anything but a nasty hack but it does save me a little bit of time i.e. don't use this on anything but a test database… Clearly the sensible way to do this is to write my own scripts to build databases.

The following code works it's way through the redo log files drops one thats inactive and then simply recreates it. It finished when it's set all of the redo to the right size.

Running the script is simply a case of running it with the parameters shown below

python ChangeRedoSize -u sys -p welcome1 -cs myserver/orclcdb --size 300

Note : the user is the sysdba of the container database if you are using the multitenant arhcitecture and the size is in Mega Bytes.

You should then see something similar to the following

Current Redo Log configuration
| Group No. | Thread No. | Sequence No. | Size (MB) | No of Members |  Status  |
|     1     |     1      |     446      | 524288000 |       1       | INACTIVE |
|     2     |     1      |     448      | 524288000 |       1       | CURRENT  |
|     3     |     1      |     447      | 524288000 |       1       |  ACTIVE  |
alter system switch logfile
alter system switch logfile
alter database drop logfile group 2
alter database add logfile group 2 size 314572800
alter system switch logfile
alter database drop logfile group 1
alter database add logfile group 1 size 314572800
alter system switch logfile
alter system switch logfile
alter system switch logfile
alter system switch logfile
alter database drop logfile group 3
alter database add logfile group 3 size 314572800
alter system switch logfile
All logs correctly sized. Finishing...
New Redo Log configuration
| Group No. | Thread No. | Sequence No. | Size (MB) | No of Members |  Status  |
|     1     |     1      |     455      | 314572800 |       1       |  ACTIVE  |
|     2     |     1      |     454      | 314572800 |       1       | INACTIVE |
|     3     |     1      |     456      | 314572800 |       1       | CURRENT  |

Interpolating data with Python


So as usual for this time of year I find myself on vacation with very little to do. So I try and find personal projects that interest me. This is usually a mixture of electronics and mucking around with software in a way that I don't usally find the time for normally. One of projects is my sensor network.

I have a number of Raspberry Pi's around my house and garden that take measurements of temperature, humidity, pressure and light. They hold the data locally and then periodically upload them to a central server (another Raspberry Pi) where they are aggregated. However for any number of reasons (usally a power failure) the raspberrypi's occasionally restart and are unable to join the network. This means that some of their data is lost. I've improved their resiliance to failure and so it's a less common occurance but it's still possible for it to happen. When this means I'm left with some ugly gaps in an otherwise perfect data set. It's not a big deal but it's pretty easy to fix. Before I begin, I acknolwedge that I'm effectively "making up" data to make graphs "prettier".

In the following code notebook I'll be using Python and Pandas to tidy up the gaps.

To start with I need to load the libraries to process the data. The important ones are included at the start of the imports. The rest from "SensorDatabaseUtilities" aren't really relevant since they are just helper classes to get data from my repository

In [75]:
import matplotlib.pyplot as plt
from matplotlib import style
import pandas as pd
import matplotlib
import json
from import json_normalize
# The following imports are from my own Sensor Library modules and aren't really relevant
from SensorDatabaseUtilities import AggregateItem
from SensorDatabaseUtilities import SensorDatabaseUtilities
# Make sure the charts appear in this notebook and are readable
%matplotlib inline
matplotlib.rcParams['figure.figsize'] = (20.0, 10.0)

The following function is used to convert a list of JSON documents (sensor readings) into a Pandas DataFrame. It then finds the minimum and maximum dates and creates a range for that period. It uses this period to find any missing dates. The heavy lifting of the function uses the reindex() function to insert new entries whilst at the same time interpolating any missing values in the dataframe. It then returns just the newly generated rows

In [76]:
def fillin_missing_data(sensor_name, results_list, algorithm='linear', order=2):
    # Turn list of json documents into a json document
    results = {"results": results_list}
    # Convert JSON into Panda Dataframe/Table
    df = json_normalize(results['results'])
    # Convert Date String to actual DateTime object
    df['Date'] = pd.to_datetime(df['Date'])
    # Find the max and min of the Data Range and generate a complete range of Dates
    full_range = pd.date_range(df['Date'].min(), df['Date'].max())
    # Find the dates that aren't in the complete range
    missing_dates = full_range[~full_range.isin(df['Date'])]
    # Set the Date to be the index
    df.set_index(['Date'], inplace=True)
    # Reindex the data filling in the missing date and interpolating missing values
    if algorithm in ['spline', 'polynomial'] :
        df = df.sort_index().reindex(full_range).interpolate(method=algorithm, order=order)
    elif algorithm in ['ffill', 'bfill']:
        df = df.sort_index().reindex(full_range, method=algorithm)
        df = df.sort_index().reindex(full_range).interpolate(method=algorithm)
    # Find the dates in original data set that have been added
    new_dates = df[df.index.isin(missing_dates)]
    # Create new aggregate records and insert them into the database
    # new_dates.apply(gen_json,axis=1, args=[sensor_name])
    return new_dates

This function simply takes an array of JSON documents and converts them into a DataFrame using the Pandas json_normalize function. It provides us with the dataset that contains missing data i.e. an incomplete data set.

In [77]:
def json_to_dataframe(results_list):
    # Turn list of json documents into a json dodument
    results = {"results": results_list}
    # Convert JSON into Panda Dataframe/Table
    df = json_normalize(results['results'])
    return df

The first step is to pull the data from the database. I'm using some helper functions to do this for me. I've also selected a date range where I know I have a problem.

In [92]:
utils = SensorDatabaseUtilities('raspberrypi', 'localhost')
data = utils.getRangeData('20-jan-2015', '10-feb-2015')
# The following isn't need in the code but is included just to show the structure of the JSON Record
'{"Date": "2015-01-20 00:00:00", "AverageHumidity": 35.6, "AverageTemperature": 18.96, "AveragePressure": 99838.78, "AverageLight": 119.38}'

Next simply convert the list of JSON records into a Pandas DataFrame and set it's index to the "Date" Column. NOTE : Only the first 5 records are shown

In [93]:
incomplete_data = json_to_dataframe(data)
# Find the range of the data and build a series with all dates for that range 
full_range = pd.date_range(incomplete_data['Date'].min(), incomplete_data['Date'].max())
incomplete_data['Date'] = pd.to_datetime(incomplete_data['Date'])
incomplete_data.set_index(['Date'], inplace=True)
# Show the structure of the data set when converted into a DataFrame
AverageHumidity AverageLight AveragePressure AverageTemperature
2015-01-20 35.60 119.38 99838.78 18.96
2015-01-21 38.77 63.65 99617.15 19.48
2015-01-22 37.45 143.00 100909.08 20.08
2015-01-23 35.52 119.87 101306.30 20.12
2015-01-24 39.72 92.43 101528.54 19.90

The following step isn't needed but simply shows the problem we have. In this instance we are missing the days for Janurary 26th 2015 to Janurary 30th 2015

In [94]:
#incomplete_data.set_index(['Date'], inplace=True)
problem_data = incomplete_data.sort_index().reindex(full_range)
axis = problem_data['AverageTemperature'].plot(kind='bar')

Pandas offers you a number of approaches for interpolating the missing data in a series. They range from the simple method of backfilling or forward filling values to the more powerful approaches of methods such as "linear", "quadratic" and "cubic" all the way through to the more sophisticated approaches of "pchip", "spline" and "polynomial". Each approach has its benefits and disadvantages. Rather than talk through each it's much simpler to show you the effect of each interpolation on the data. I've used a line graph rather than a bar graph to allow me to show all of the approaches on a single graph.

In [95]:
interpolation_algorithms = ['linear', 'quadratic', 'cubic', 'spline', 'polynomial', 'pchip', 'ffill', 'bfill']

fig, ax = plt.subplots()
for ia in interpolation_algorithms:
    new_df = pd.concat([incomplete_data, fillin_missing_data('raspberrypi', data, ia)])
    ax = new_df['AverageTemperature'].plot()

handles, not_needed = ax.get_legend_handles_labels()
ax.legend(handles, interpolation_algorithms, loc='best')

Looking at the graph it appears that either pchip (Piecewise Cubic Hermite Interpolating Polynomial) or Cubic interpolation is going to provide the best approximation for the missing values in my data set. This is largely subjective because these are "made up values" but I believe either of these approaches provide values that are closest to what the data could have been.

The next step is to apply one to the incomplete data set and store it back in the database

In [96]:
complete_data = pd.concat([incomplete_data, fillin_missing_data('raspberrypi', data, 'pchip')])

axis = complete_data.sort_index()['AverageTemperature'].plot(kind='bar')

And thats it. I've made the code much more verbose that it needed to be purely to demonstrate the point. Pandas makes it very simple to patch up a data set.


Java Version Performance

Sometimes it’s easy to loose track of the various version numbers for software as they continue their march ever onwards. However as I continue my plans to migrate onto Java8 and all of the coding goodness that lies within I thought it was a sensible to check what difference it would make to swingbench in terms of performance.

Now before we go any further it’s worth pointing out this was a trivial test and my results might not be representative of what you or anyone else might find.

My environment was

iMac (Retina 5K, 27-inch, Late 2014), 4 GHz Intel Core i7, 32 GB 1600 MHz DDR3 with a 500GB SSD

Oracle Database 12c ( with the January Patch Bundle running in a VM with 8GB of memory.

The test is pretty simple but might have an impact on your choice of JVM when generating a lot of data (TB+) with swingbench. I simply created a 1GB SOE with the oewizard. This is a pretty CPU intensive operation for the entire stack : swingbench, the jdbc drivers and the database. The part of the operation that should be most effected by the performance of the JVM is the “Data Generation Step”.

So enough talk what impact did the JVM version have?


Now the numbers might not be earth shattering but it’s nice to know a simple upgrade of the JVM can result in nearly a 25% improvement in performance of a CPU/database intensive batch job. I expect these numbers to go up as I optimise some of the logic to take advantage of Java8 specific functionality.


PDF Generation of report files

I finally got round to adding some code that creates pdf files such that you can convert the “XML” result files into something more readable. However this new functionality requires a Java8 VM to work. You can download the latest build here.

All you need to do is to run swingbench and from the menu save the summary results.


Minibench and charbench will automatically create a results config file in the local directory after a benchmark run. The file that’s created will typically start with “result” and it should look something like this.

[bin]$ ls ccwizard.xml coordinator oewizard shwizard.xmlbmcompare charbench data oewizard.xml swingbenchccconfig.xml clusteroverview debug.log results.xml swingconfig.xmlccwizard clusteroverview.xml minibench shwizardAll you need to do after this is to run the “results2pdf command

[bin]$ ./results2pdf -c results2pdf

There’s really only 2 command line options

[bin]$ ./results2pdf -husage: parameters: -c the config file to convert from xml to pdf -debug send debug information to stdout -h,--help print this message -o output filenameThey are for the input file (-c) and the output file (-o).

The resultant file will contain tables and graphs. The type of data will depend heavily on the type of stats collected in the benchmark. For the richest collection you should enable

  • Full stats collection
  • Database statistics collection
  • CPU collection

An example of the output can be found here.


I plan to try and have the resultant pdf generated and displayed at the end of every bench mark. I’ll include this functionality in a future build.