2011
New build of swingbench 2.4
08/12/11 19:56 Filed in: Swingbench
I’ve just uploaded a new build of swingbench with the following fixes
- Wizard now writes output to debug.log correctly with the -debugFile option
- Fixed a bug where partitioning was always installed regardless of the options chosen, which meant the wizards couldn't install against a Standard Edition database
- Fixed a bug where swingbench wouldn't start unless all users were able to log on. Swingbench will now start when it has logged on as many users as it can.
- Fixed a bug where users were incorrectly counted as being logged on
- Fixed a bug where logging wasn't correctly written to a debug file
- Changed -debugFile to -debugf to provide consistency with other tools
- Changed the way the number of users logged on is reported in verbose mode of charbench
- Updated the secure shell libraries to support Solaris 11
You can find it here. Let me know if you find any problems.
2.4 makes it to stable status
08/11/11 16:02 Filed in: Swingbench
At last I feel that I’ve fixed enough bugs and had enough feedback to change 2.4 to stable status. This means that this should be the default client most groups use. Based on the feedback I get, I’ll remove 2.3 from the downloads page.
There have been a few changes in this release. These include:
- Improvements to the Overview chart to provide a dynamic Y axis and floating legends
- The overview chart now uses a 5 second rolling window for response time metrics as opposed to a meaningless average
- “Minibench” has been made a little bigger
- Numerous fixes to the wizards
- Numerous fixes to Universal Connection Pooling
- New command line options
The new Overview chart is shown below

As per usual you can download it here
New Build of swingbench 2.4
06/05/11 18:43 Filed in: Swingbench
It’s been 2 or 3 months since the last drop of swingbench and so it will come as no surprise that I’ve released another build. This release features
- A fix to enable the JDBC version of the SOE benchmark to work without errors. I’ll be updating it over the coming week to try and make it as close as possible to the PL/SQL version
- A new parameter (and UI change) to support the disconnection/reconnection of sessions after a fixed number of transactions
- Various fixes
New build of swingbench 2.4
15/02/11 10:22 Filed in: Swingbench
I’ve just uploaded a new build of swingbench 2.4. It has a number of bug fixes which sort out the following issues
- Unexpected termination due to driver issues
- Unclear timeline text on the overview chart
- Numerous UI and java issues
You can download it here
Let me know if you have any problems via the comment page
Notes on pre-parsing data for Oracle data loads
14/02/11 18:53 Filed in: Oracle
Sometimes data simply isn't in a form that is easy to load into an Oracle database, i.e. column form. It would be great if everybody exchanged data in a simple CSV format with a single file-to-table mapping. Sadly that isn't the case, and sometimes you have to do a little work to get it into a form that's usable. A recent benchmark highlighted this issue very well. The customer provided the data in compressed CSV form (so far so good), but the data was held in key-value pairs (not so good). They also provided us with a mapping file that describes how it all fits together.
Now, typically, the approach many people would take would be to develop some form of program that parses all of the data, writes it to a staging area and then loads it all in one go into the target database. I make no criticism of this approach since it works well, as long as it's not time critical; it's by far the simplest method. However, I'm a big fan of taking advantage of what's already available, and one of the most underused and powerful features of the Oracle database is the pre-parser. It enables you to pipeline various operations so they all run as quickly as possible. So, going back to my benchmark, we used this approach to load data into our target database. It consisted of 4 steps
- Read the data off the filesystem as efficiently as possible and write it to stdout
- Read from stdin, unzip the data and write it to stdout
- Read from stdin into a Java program to do the key-value mapping and error detection/correction, writing the output to stdout
- Read from stdin into sqlloader
The Java program simply reads from stdin and writes to stdout. Handling the key-value pairs just required the program to read in the mapping file and then split and parse the values from stdin. The data was then written to stdout in a well-known order.
Java extract from my program...
BufferedReader br = new BufferedReader(new InputStreamReader(System.in), OneMB);
String line = null;
HashMap<String, String> keyValuePairs = null;
MyTokenizer mt = null;
while ((line = br.readLine()) != null) {
    keyValuePairs = new HashMap<String, String>(200);
    mt = new MyTokenizer(line, delimitor);
    // split each "key=value" token and store it in the map
    for (String token : mt) {
        int loc = token.indexOf("=");
        if (loc != -1) {
            String i = token.substring(0, loc);
            String s = token.substring(loc + 1, token.length());
            keyValuePairs.put(i, s);
        }
    }
    // write the values to stdout in the column order the control file expects
    StringBuffer outRec = new StringBuffer(1000);
    outRec.append(checkForNull(keyValuePairs.get("uniqueID"), "")).append(seperator);
    // mapping logic similar to above repeats
    System.out.println(outRec.toString());
}
All that was needed for sqlloader to process the files was a control file that understood the order of the columns and any additional formatting.
One of the additional benefits is that we can load the data via "direct path" and implement other features such as multi-table inserts. The Java pre-parser enables you to add all of the additional formatting needed to make this a trivial process.
The following diagram illustrates the process.

This equates to a Unix/Linux pipeline such as
/bin/dd if=myverybigfile.txt bs=1024k status=noxfer 2>/dev/null | /bin/gunzip -c | java -classpath /home/oracle/loader.jar com.dom.KeyValueParserStdIn | sqlldr bench/bench control=kv.ctl data=\"-\" direct=TRUE
NOTE: one thing you may have noticed is that I'm using dd to do 1MB I/Os. This is just an efficiency measure and works well on structures such as DBFS; you could skip this part of the operation if needed.
Which brings us onto external tables and the preparser
External Tables and pre-parsers
As I mentioned earlier, I like to take advantage of functionality that's already available, and one of those features in the Oracle database is external tables. I don't intend to go into much detail as to why you should use external tables, other than to say that they do much of the heavy lifting for you and provide a seamless interface between the filesystem and the database. They effectively make files look like tables.
This means it's trivial to implement parallelism for our pre-parser. We don't need to worry about how to handle the files or how to schedule everything; external tables take care of all of that for you. In our benchmark we used them in the following way

Our previous pipeline remains the same, except that we don't need sqlloader; it's all managed by the table definition itself. So we end up with something similar to the following for the table definition (I've abbreviated it quite substantially and highlighted the important bits)
create table staging_ext_mydata_jan01
( uniqueid NUMBER,
  .. -- Lots of columns
  ..
)
ORGANIZATION EXTERNAL
( TYPE oracle_loader
  DEFAULT DIRECTORY load_dir_jan01
  ACCESS PARAMETERS
  ( RECORDS DELIMITED BY NEWLINE
    PREPROCESSOR exec_dir:'external_tab.sh'
    BADFILE log_dir:'external.bad'
    LOGFILE log_dir:'external.log'
    FIELDS TERMINATED BY '|' OPTIONALLY ENCLOSED BY '"' AND '"'
    ( uniqueid char(100),
      .. -- Lots of definitions
      ..
    )
  )
  LOCATION ('data1.txt', 'data2.txt', ... , 'data100.txt')
)
REJECT LIMIT UNLIMITED;
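As an aside, the table definition above assumes that the directory objects it references already exist and that the loading schema can use them. A minimal sketch of what that setup might look like, using the bench schema from the earlier sqlldr command and purely illustrative filesystem paths:

-- the paths and the bench grantee are illustrative assumptions, not the benchmark's actual layout
CREATE OR REPLACE DIRECTORY load_dir_jan01 AS '/data/load/jan01';  -- where data1.txt .. data100.txt live
CREATE OR REPLACE DIRECTORY exec_dir       AS '/home/oracle/bin';  -- where external_tab.sh lives
CREATE OR REPLACE DIRECTORY log_dir        AS '/data/load/logs';   -- bad/log files

GRANT READ          ON DIRECTORY load_dir_jan01 TO bench;
GRANT READ, EXECUTE ON DIRECTORY exec_dir       TO bench;          -- EXECUTE is needed for the preprocessor
GRANT READ, WRITE   ON DIRECTORY log_dir        TO bench;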
One of the things to note is that I've included the pipelined preprocessor inside a shell script, which looks like this
/bin/dd if=$1 bs=1024k status=noxfer 2>/dev/null | /bin/gunzip -c | java -classpath /home/oracle/loader.jar com.dom.KeyValueParserStdIn
The important part of this script is the parameter ($1) that is passed to the shell script. This is the file name that the external table wants to process.
A simple select statement from my "staging_ext_mydata_jan01" table unzips and parses the data, converting it into usable columns. What's more, if I issue the statement in parallel, Oracle takes care of creating the processes for me and making sure everything is scheduled in an orderly fashion.
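For example, a rough sketch of what driving it in parallel might look like; the degree of 16 here is purely illustrative, and because of the preprocessor Oracle parallelises at the granularity of the files listed in the LOCATION clause:

-- illustrative degree of parallelism; each slave runs its own copy of external_tab.sh against a file
ALTER TABLE staging_ext_mydata_jan01 PARALLEL 16;

SELECT /*+ PARALLEL(s, 16) */ COUNT(*)
FROM   staging_ext_mydata_jan01 s;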
To finish the load we simply used a multi-table insert to put the data into the correct tables in an efficient fashion. Using this approach we were able to read zipped files, parse them and insert them into our three target tables at over 1.5 million source records/sec.
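Something along the following lines gives the shape of that final step; the three target tables, the record_type discriminator and the value columns are hypothetical stand-ins rather than the actual benchmark schema, with direct path requested via the APPEND hint:

-- hypothetical targets and columns, purely to illustrate the shape of the statement
INSERT /*+ APPEND */ ALL
  WHEN record_type = 'C' THEN
    INTO customers   (customer_id, customer_name) VALUES (uniqueid, val1)
  WHEN record_type = 'O' THEN
    INTO orders      (order_id, customer_ref)     VALUES (uniqueid, val2)
  WHEN record_type = 'I' THEN
    INTO order_items (item_id, order_ref)         VALUES (uniqueid, val3)
SELECT uniqueid, record_type, val1, val2, val3
FROM   staging_ext_mydata_jan01;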