Jupyter, Python, Oracle and Docker Part 3
Create Oracle Database in a Docker Container¶
These Python (3.6) scripts walk through the creation of a database and standby server. The reason I've done this rather than just take the default configuration is that this approach gives me a lot more control over the build and enables me to change specifc steps. If you are just wanting to get the Oracle Database running inside of Docker I strongly suggest that you use the docker files and guides in the Oracle Github repository. The approach documented below is very much for someone who is interested in a high level of control over the various steps in the installation and configuration of the Oracle Database. This current version is build on top of Oracle's Internal GiaaS docker image but will be extended to support the public dockers images as well. It aims to build an Active Data Guard model with maximum performance but can be trivially changed to support any of the required models.
It uses a mix of calls to the Docker Python API and Calls direct to the databases via cx_Oracle.
The following code section imports the needed modules to setup up our Docker container to create the Oracle Database. After that we get a Docker client handle which enables use to call the API against our local Docker environment.
import docker
import humanize
import os
import tarfile
from prettytable import PrettyTable
import cx_Oracle
from IPython.display import HTML, display
import keyring
from ipynb.fs.full.OracleDockerDatabaseFunctions import list_images,list_containers,copy_to,create_and_run_script,containter_exec,containter_root_exec,copy_string_to_file
client = docker.from_env(timeout=600)
list_images(client)
Configuration Parameters¶
The following section contains the parameters for setting the configuration of the install. The following parameters image_name
,host_oradata
,sb_host_oradata
need to be changed, although sb_host_oradata
is only important if you are planning on installing a standby database.
# The following parameters are specific to your install and almost certainly need to be changed
image_name = 'cc75a47617' # Taken from the id value above
host_oradata = '/Users/dgiles/Downloads/dockerdbs/oradataprimary' # The directory on the host where primary database will be persisted
sb_host_oradata = '/Users/dgiles/Downloads/dockerdbs/oradatastby' # The direcotry on the host where the standby database will be persisted
#
# The rest are fairly generic and can be changed if needed
oracle_version = '18.0.0'
db_name = 'ORCL'
stby_name = 'ORCL_STBY'
sys_password = keyring.get_password('docker','sys') # I'm just using keyring to hide my password but you can set it to a simple sting i.e. 'mypassword'
pdb_name = 'soe'
p_host_name = 'oracleprimary'
sb_host_name = 'oraclestby'
total_memory = 2048
container_oradata = '/u01/app/oracle/oradata'
o_base = '/u01/app/oracle'
r_area = f'{o_base}/oradata/recovery_area'
o_area = f'{o_base}/oradata/'
a_area = f'{o_base}/admin/ORCL/adump'
o_home = f'{o_base}/product/{oracle_version}/dbhome_1'
t_admin = f'{o_base}/oradata/dbconfig'
log_size = 200
Create Primary Database¶
This code does the heavy lifting. It creates a container oracleprimary (unless you changed the paramter) running the Oracle Database. The containers 1521 port is mapped onto the the hosts 1521 port. This means that to connect from the host, via a tool like sqlplus, all you'd need to do is sqlplus soe/soe@//locahost/soe
.
path = f'{o_home}/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin'
p_container = client.containers.create(image_name,
command="/bin/bash",
hostname=p_host_name,
tty=True,
stdin_open=True,
auto_remove=False,
name=p_host_name,
shm_size='3G',
# network_mode='host',
ports={1521:1521,5500:5500},
volumes={host_oradata: {'bind': container_oradata, 'mode': 'rw'}},
environment={'PATH':path,'ORACLE_SID': db_name, 'ORACLE_BASE': o_base,'TNS_ADMIN': t_admin}
)
p_container.start()
# Make all of the containers attributes available via Python Docker API
p_container.reload()
The next step uses DBCA and configures features like Automatic Memory Mangement, Oracle Managed Files and sets the size of the SGA and redo logs. It prints out the status of the creation during it's progression. NOTE : This step typically takes 10 to 12 minutes.
statement = f'''dbca -silent \
-createDatabase \
-templateName General_Purpose.dbc \
-gdbname {db_name} -sid {db_name} -responseFile NO_VALUE \
-characterSet AL32UTF8 \
-sysPassword {sys_password} \
-systemPassword {sys_password} \
-createAsContainerDatabase true \
-numberOfPDBs 1 \
-pdbName {pdb_name} \
-pdbAdminPassword {sys_password} \
-databaseType MULTIPURPOSE \
-totalMemory {total_memory} \
-memoryMgmtType AUTO_SGA \
-recoveryAreaDestination "{r_area}" \
-storageType FS \
-useOMF true \
-datafileDestination "{o_area}" \
-redoLogFileSize {log_size} \
-emConfiguration NONE \
-ignorePreReqs\
'''
containter_exec(p_container, statement)
Create Primary Database's Listener¶
This step creates the database listener for the primary database. The tnsnames.ora
will be over written in a later step if you choose to have a stand by configuration. NOTE : I could create a DNSMasq container or something similar and add container networking details to make the whole inter node communication simpler but it's a bit of an overkill and so we'll use IP addresses which are easily found.
p_ip_adress = p_container.attrs['NetworkSettings']['IPAddress']
p_listener = f'''LISTENER=
(DESCRIPTION=
(ADDRESS = (PROTOCOL=tcp)(HOST={p_ip_adress})(PORT=1521))
(ADDRESS = (PROTOCOL = IPC)(KEY = EXTPROC1521))
)
SID_LIST_LISTENER=
(SID_LIST=
(SID_DESC=
(GLOBAL_DBNAME={db_name}_DGMGRL)
(ORACLE_HOME={o_home})
(SID_NAME={db_name})
(ENVS="TNS_ADMIN={t_admin}"
)
)
'''
copy_string_to_file(p_listener, f'{t_admin}/listener.ora', p_container)
contents = '''NAMES.DIRECTORY_PATH= (TNSNAMES, EZCONNECT)'''
copy_string_to_file(contents, f'{t_admin}/sqlnet.ora', p_container)
contents = f'''
ORCL =
(DESCRIPTION =
(ADDRESS = (PROTOCOL = TCP)(HOST = {p_ip_adress})(PORT=1521))
(CONNECT_DATA =
(SERVER = DEDICATED)
(SID = {db_name})
)
)
'''
copy_string_to_file(contents, f'{t_admin}/tnsnames.ora', p_container)
And start the listener
containter_exec(p_container, 'lsnrctl start')
At this stage you should have a fully functioning Oracle Database. In theory there's no need to go any further if thats all you want.
Create Stand By Container¶
This step creates another container to run the standby databases. It should be pretty much instant. NOTE : You'll only need to run the rest of the code from here if you need a standby database.
sb_container = client.containers.create(image_name,
hostname=sb_host_name,
command="/bin/bash",
tty=True,
stdin_open=True,
auto_remove=False,
name=sb_host_name,
shm_size='3G',
ports={1521:1522,5500:5501},
volumes={sb_host_oradata: {'bind': container_oradata, 'mode': 'rw'}},
environment={'PATH':path,'ORACLE_SID':db_name,'ORACLE_BASE':o_base,'TNS_ADMIN':t_admin}
)
sb_container.start()
# Make all of the containers attributes available via Python Docker API
sb_container.reload()
Display the running containers.
list_containers(client)
Configure the Standby Database¶
Create some additional directories on the standby so they are consistent with the primary.
containter_exec(sb_container, f'mkdir -p {o_area}/{db_name}')
containter_exec(sb_container, f'mkdir -p {t_admin}')
containter_exec(sb_container, f'mkdir -p {r_area}/{db_name}')
containter_exec(sb_container, f'mkdir -p {a_area}')
Create Standby Database's Listener¶
Create the standby listenrs network configuration and then start the listener. NOTE : We'll be overwriting the primary databases tnsnames.ora
file in this step.
sb_ip_adress = sb_container.attrs['NetworkSettings']['IPAddress']
contents = f'''
ORCL =
(DESCRIPTION =
(ADDRESS = (PROTOCOL = TCP)(HOST = {p_ip_adress})(PORT=1521))
(CONNECT_DATA =
(SERVER = DEDICATED)
(SID = {db_name})
)
)
ORCL_STBY =
(DESCRIPTION =
(ADDRESS = (PROTOCOL = TCP)(HOST = {sb_ip_adress})(PORT=1521))
(CONNECT_DATA =
(SERVER = DEDICATED)
(SID = {db_name})
)
)
'''
copy_string_to_file(contents, f'{t_admin}/tnsnames.ora', p_container)
copy_string_to_file(contents, f'{t_admin}/tnsnames.ora', sb_container)
sb_listener = f'''LISTENER=
(DESCRIPTION=
(ADDRESS = (PROTOCOL=tcp)(HOST={sb_ip_adress})(PORT =1521))
(ADDRESS = (PROTOCOL = IPC)(KEY = EXTPROC1521))
)
SID_LIST_LISTENER=
(SID_LIST=
(SID_DESC=
(GLOBAL_DBNAME={stby_name}_DGMGRL)
(ORACLE_HOME={o_home})
(SID_NAME={db_name})
(ENVS="TNS_ADMIN={t_admin}"
)
)
'''
copy_string_to_file(sb_listener, f'{t_admin}/listener.ora', sb_container)
contents = '''NAMES.DIRECTORY_PATH= (TNSNAMES, EZCONNECT)'''
copy_string_to_file(contents, f'{t_admin}/sqlnet.ora', sb_container)
And start the listener
containter_exec(sb_container, 'lsnrctl start')
Configure the servers for Data Guard¶
It might be necessary to pause for a few seconds before moving onto the next step to allow the database to register with the listener...
The next step is to connect to primary and standby servers and set various parameters and configuration to enable us to run Data Guard.
First check the status of archive logging on the primary.
connection = cx_Oracle.connect("sys",sys_password,f"//localhost:1521/{db_name}", mode=cx_Oracle.SYSDBA)
cursor = connection.cursor();
rs = cursor.execute("SELECT log_mode FROM v$database")
for row in rs:
print(f"Database is in {row[0]} mode")
By default it will be in no archivelog mode so we need to shut it down and enable archive log mode
contents = '''sqlplus / as sysdba << EOF
SET ECHO ON;
SHUTDOWN IMMEDIATE;
STARTUP MOUNT;
ALTER DATABASE ARCHIVELOG;
ALTER DATABASE OPEN;
EOF
'''
create_and_run_script(contents, '/tmp/set_archivelog.sql', '/bin/bash /tmp/set_archivelog.sql', p_container)
And check again
connection = cx_Oracle.connect("sys",sys_password,f"//localhost:1521/{db_name}", mode=cx_Oracle.SYSDBA)
cursor = connection.cursor();
rs = cursor.execute("SELECT log_mode FROM v$database")
for row in rs:
print(f"Database is in {row[0]} mode")
And then force a log switch
cursor.execute("ALTER DATABASE FORCE LOGGING")
cursor.execute("ALTER SYSTEM SWITCH LOGFILE")
Add some Standby Logging Files
cursor.execute("ALTER DATABASE ADD STANDBY LOGFILE SIZE 200M")
cursor.execute("ALTER DATABASE ADD STANDBY LOGFILE SIZE 200M")
cursor.execute("ALTER DATABASE ADD STANDBY LOGFILE SIZE 200M")
cursor.execute("ALTER DATABASE ADD STANDBY LOGFILE SIZE 200M")
Enable Flashback and standby file management
cursor.execute("ALTER DATABASE FLASHBACK ON")
cursor.execute("ALTER SYSTEM SET STANDBY_FILE_MANAGEMENT=AUTO")
Start an instance¶
Create a temporary init.ora file to enable us to start an instance on the standby
contents = f"DB_NAME='{db_name}'\n"
copy_string_to_file(contents, f'/tmp/init{db_name}.ora', sb_container)
Create a password file on the standby with the same parameters as the primary
containter_exec(sb_container, f'orapwd file=$ORACLE_HOME/dbs/orapw{db_name} password={sys_password} entries=10 format=12')
And start up the standby instance
contents = f'''STARTUP NOMOUNT PFILE='/tmp/init{db_name}.ora';
EXIT;
'''
create_and_run_script(contents, '/tmp/start_db.sql', 'sqlplus / as sysdba @/tmp/start_db.sql', sb_container)
Duplicate the Primary database to the Standby database¶
Duplicate the primary to the standby. For some reason the tnsnames isn't picked up on the primary/standby in the same location so an explicit connection string is needed.
contents = f'''rman TARGET sys/{sys_password}@{db_name} AUXILIARY sys/{sys_password}@{stby_name} << EOF
DUPLICATE TARGET DATABASE
FOR STANDBY
FROM ACTIVE DATABASE
DORECOVER
SPFILE
SET db_unique_name='{stby_name}' COMMENT 'Is standby'
NOFILENAMECHECK;
EOF
'''
create_and_run_script(contents, '/tmp/duplicate.sh', "/bin/bash /tmp/duplicate.sh", sb_container)
Start the Data Guard Broker¶
It's best practice to use Data Guard Broker and so we'll need to start it on both instances and then create a configuration.
cursor.execute("ALTER SYSTEM SET dg_broker_start=true")
sb_connection = cx_Oracle.connect("sys",sys_password,f"//localhost:1522/{stby_name}", mode=cx_Oracle.SYSDBA)
sb_cursor = sb_connection.cursor()
sb_cursor.execute("ALTER SYSTEM SET dg_broker_start=true")
Create a configuration
contents = f'''export TNS_ADMIN={t_admin};
dgmgrl sys/{sys_password}@{db_name} << EOF
SET ECHO ON;
CREATE CONFIGURATION orcl_stby_config AS PRIMARY DATABASE IS {db_name} CONNECT IDENTIFIER IS {db_name};
EXIT;
EOF
'''
create_and_run_script(contents, '/tmp/dgconfig.sh', "/bin/bash /tmp/dgconfig.sh", p_container)
Add the standby
contents = f'''export TNS_ADMIN={t_admin};
dgmgrl sys/{sys_password}@{db_name} << EOF
SET ECHO ON;
ADD DATABASE {stby_name} AS CONNECT IDENTIFIER IS {stby_name} MAINTAINED AS PHYSICAL;
EXIT;
EOF
'''
create_and_run_script(contents, '/tmp/dgconfig2.sh', "/bin/bash /tmp/dgconfig2.sh", p_container)
Enable the configuration
contents = f'''export TNS_ADMIN={t_admin};
dgmgrl sys/{sys_password}@{db_name} << EOF
SET ECHO ON;
ENABLE CONFIGURATION;
EXIT;
EOF
'''
create_and_run_script(contents, '/tmp/dgconfig3.sh', "/bin/bash /tmp/dgconfig3.sh", p_container)
Display the configuration
contents = f'''export TNS_ADMIN={t_admin};
dgmgrl sys/{sys_password}@{db_name} << EOF
SET ECHO ON;
SHOW CONFIGURATION;
SHOW DATABASE {db_name};
SHOW DATABASE {stby_name};
EOF
'''
create_and_run_script(contents, '/tmp/dgconfig4.sh', "/bin/bash /tmp/dgconfig4.sh", p_container)
Start the Standby in managed recovery¶
We now need to start the standby so it begins applying redo to keep it consistent with the primary.
contents='''sqlplus / as sysdba << EOF
SET ECHO ON;
SHUTDOWN IMMEDIATE;
STARTUP MOUNT;
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT FROM SESSION;
EOF
'''
create_and_run_script(contents, '/tmp/convert_to_active.sh', "/bin/bash /tmp/convert_to_active.sh", sb_container)
Standby Database Creation Complete¶
We now have a primary and standby database that we can begin testing with.
Additional Steps¶
At this point you should have a physical standby database that is running in maximum performance mode. This might be enough for the testing you want to carry out but there's a number of possible changes that you might want to consider.
- Change the physical standby database to an Active Standby
- Convert the current mode (Maximum Performance) to Maximum Protection or Maximum Availability
- Configure the Oracle Database 19c Active Data Guard feature, DML Redirect
I'll cover these in the following sections but they "icing on the cake" rather than required.
Active Data Guard¶
This is a relatively trivial change. We just need to alter the standby database to open readonly and then start managed recovery as before
contents='''sqlplus / as sysdba << EOF
SET ECHO ON;
SHUTDOWN IMMEDIATE;
STARTUP MOUNT;
ALTER DATABASE OPEN READ ONLY;
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT FROM SESSION;
EOF
'''
create_and_run_script(contents, '/tmp/convert_to_active.sh', "/bin/bash /tmp/convert_to_active.sh", sb_container)
Maximum Performance to Maximum Availability¶
For this change we'll use the Database Guard Broker command line tool to make the change
contents = f'''
dgmgrl sys/{sys_password}@{db_name} << EOF
SET ECHO ON;
SHOW CONFIGURATION;
edit database {stby_name} set property logxptmode=SYNC;
edit configuration set protection mode as maxavailability;
SHOW CONFIGURATION;
EOF
'''
create_and_run_script(contents, '/tmp/max_avail.sh', "/bin/bash /tmp/max_avail.sh", p_container)
Maximum Performance to Maximum Protection¶
As before we'll use the Database Guard Broker command line tool to make the change.
contents = f'''
dgmgrl sys/{sys_password}@{db_name} << EOF
SET ECHO ON;
SHOW CONFIGURATION;
edit database {stby_name} set property logxptmode=SYNC;
edit configuration set protection mode as maxprotection;
SHOW CONFIGURATION;
EOF
'''
create_and_run_script(contents, '/tmp/max_prot.sh', "/bin/bash /tmp/max_prot.sh", p_container)
Back to Max Perfromance¶
We'll use Database Guard Broker to change us back to asynchronus mode.
contents = f'''
dgmgrl sys/{sys_password}@{db_name} << EOF
SET ECHO ON;
SHOW CONFIGURATION;
edit configuration set protection mode as maxperformance;
edit database {stby_name} set property logxptmode=ASYNC;
SHOW CONFIGURATION;
EOF
'''
create_and_run_script(contents, '/tmp/max_prot.sh', "/bin/bash /tmp/max_prot.sh", p_container)
Oracle Database 19c Acvtive Data Guard DML Redirect¶
On Oracle Database 19c we can also enable DML redirect from the standby to the primary. I'll add this on the release of the Oracle Database 19c software for on premises.
Jupyter Notebooks, Python and Oracle Installation on Linux
Installing Jupyter-lab and Docker environment on Linux¶
The following walk-through guides you through the steps needed to set up your enviroment to run Jupyter-lab, Oracle and Docker to build and run docker images for testing. This should work for either an on premises install or on Oracle's cloud using IaaS (Compute). This walkthough will serve primarily as a reminder to myself.
Prerequisites¶
I'm making the assumption that you're running on Linux (I've a similar walkthrough for mac). In my example I'm using Oracle Enterprise Linux 7 (OEL). I'm also assuming a few other things
- Python 3.6 or higher is installed
- You have access to root either directly or via sudo. In this example I'm installing everything in the Oracle user account that has sudo privelege
- Docker is installed. If it isn't, see this excellent guide. If you aren't running as root also make sure you follow this final step as well.
Install¶
The install is pretty simple. It consists of setting up python, installing Oracle Instant client, installing Git and then cloning this directory to the server. Lets start with setting up the Python Environment
Python Setup¶
By default OEL 7 runs Python 2 rather than Python 3 which is likely to change in the future, until then we have a few steps we need to run through. The first is to install pip and virtualenv. We can easily do this on OEL with yum
sudo yum install python36-pip
If this command fails saying something like No package python36-pip available.
then you'll need to edit the yum config file at /etc/yum.repos.d/public-yum-ol7.repo
and enable the software development repos.
The next step is to install virtualenv. Virtualenv enables us to create isolated sandpits to develop Python applications without running into module or library conflicts. Once we have pip installed it's very simple to install
sudo pip3.6 install virtualenv
Next we can create a virtual environment and enable it.
virtualenv -p /usr/bin/python36 myvirtualenv
source myvirtualenv/bin/activate
This will create a directory called myvirtualenv
(you can call it what you like) with it's own version of the python interpreter and pip. Once we "active it", any library we install will only be in this directory and won't effect the system as a whole. You should see you command prompt change when you activate it. It should look something like this
[04:59 PM : oracle@ora18server ~]$ virtualenv -p /usr/bin/python36 myvirtualenv
Running virtualenv with interpreter /usr/bin/python36
Using base prefix '/usr'
No LICENSE.txt / LICENSE found in source
New python executable in /home/oracle/myvirtualenv/bin/python36
Also creating executable in /home/oracle/myvirtualenv/bin/python
Installing setuptools, pip, wheel...
done.
[05:00 PM : oracle@ora18server ~]$ source myvirtualenv/bin/activate
(myvirtualenv) [05:00 PM : oracle@ora18server ~]$ ls
Desktop ora18server-certificate.crt sql swingbench
myvirtualenv OracleUtils sqlcl wallet
(myvirtualenv) [05:00 PM : oracle@ora18server ~]$
Running the following command will show what Python models we have installed at this point.
(myvirtualenv) [05:00 PM : oracle@ora18server ~]$ pip list
Package Version
---------- -------
pip 19.0.3
setuptools 40.8.0
wheel 0.33.1
Which shouldn't be very many.
Git Installation¶
We now need to install Git which is useful for managing and versioning code. That might not be a requirement for you but it also makes it very simple to clone existing repostories. Installing it is very simple.
sudo yum install git
We can now clone my IPython/Jupyter notebooks from github which provides you with the code for creating your own Oracle Docker Images.
git clone https://github.com/domgiles/JuypterLabWork.git
This will create a directory call JuypterLabWork
Installing Oracle Instant Client¶
One of the recent updates to Oracles install models is the support for RPMs and yum installations without the need for click through agreements. This makes it very simple to install a client with a single command
sudo yum install oracle-instantclient18.3-basic
This will typically install the the software into /usr/lib/oracle/18.3
Installing Jupyter-Lab¶
In the JupyterLabWork directory that was created when we ran the git clone command there's a file called requirments.txt
. This is a list of modules needed to run the notebooks in that directory. To install them all you need to do is to run the command
pip install -r requirements.txt
This will install all of the needed modules. From there all we need to do is to run the command
jupyter-lab
If you're running directly on a workstation or virtual machine and have a browser installed it should take you directly into the jupyter environment.
If you're running headless then when jupyter-lab starts it should give you a url that you can connect to. Look for something like
[I 12:52:30.450 NotebookApp] The Jupyter Notebook is running at:
[I 12:52:30.450 NotebookApp] http://oracle18cserver:8888/?token=f71e677e202f5fffc3d20fe458ff973e616e0dc3b8eaf072
[I 12:52:30.450 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
That's it.
Connecting to Oracle ADB from Python
Connect to Oracle ATP or ADW from Python¶
I realised that in some of my previous posts I didn't really detail connecting to ATP and ADW. Here's a slight more in depth walkthough.
Connecting to Oracle Autonomous Transaction Processing or Autonomous Datawarehouse is pretty simple from Python. It requires only a few things
- Oracle Instant Client (Or alternative)
- A Python environment.
- The Oracle_CX Python driver Module.
- A valid wallet for an ATP or ADW service
Let's go through each of these in turn
Oracle Instant Clent¶
The next step is pretty straight forward. You can download the oracle instant client from here.
You'll only need the basic package. Unzip the downloaded file into a suitable location. It's worth pointing out that on Linux this step is even easier. You can now use yum to install the instant client direct from the command line. You can find details on how to configure it here
Python environment¶
There's plenty of guides out there that show you how to install python on your windows or mac. If you haven't done this already This guide is a good place to start. I'm assuming that you've also gone through the steps of installing pip
. If not you can follow this simple guide. I'd also advice you create a virtual environment with virtualenv
before you doing anything else. It's considered best practice and isolates you from current or future library conficts.
First lets create our virtual env
virtualenv adb_virt_env
And then active it (I'm assuming linux or mac)
source adb_virt_env/bin/activate
The next step is to install the Python driver. This is as simple as
pip install cx_Oracle
And thats all we need to do at this stage inside to setup our Python environment.
Oracle ADW or ATP Wallet¶
The final thing we need is the wallet containing the credential and connect string details to enable us to connect to ATP or ADW. You'll need to log onto Oracle OCI console to do this unless have been provided the wallet by a colleague. Simply navigate to your ATP or ADW instance and follow the instructions below.
While it's not necessary we'll download and unzip the wallet into the virtual directory we've created (adb_virt_env).
$> ls
bin cx_Oracle-doc include lib pip-selfcheck.json wallet_SBATP.zip
$> unzip wallet_SBATP.zip
Archive: wallet_SBATP.zip
inflating: cwallet.sso
inflating: tnsnames.ora
inflating: truststore.jks
inflating: ojdbc.properties
inflating: sqlnet.ora
inflating: ewallet.p12
inflating: keystore.jks
$> ls
bin cx_Oracle-doc include lib pip-selfcheck.json tnsnames.ora wallet_SBATP.zip
cwallet.sso ewallet.p12 keystore.jks ojdbc.properties sqlnet.ora truststore.jks
Next we need to edit the sqlnet.ora
file to reflect the location where it's located. Currently for my environment it looks like
WALLET_LOCATION = (SOURCE = (METHOD = file) (METHOD_DATA = (DIRECTORY="?/network/admin")))
SSL_SERVER_DN_MATCH=yes
We'll need to change the DIRECTORY
parameter to our virtual environment. In my case /Users/dgiles/Downloads/adb_virt_env
. So for my environment it will look like
WALLET_LOCATION = (SOURCE = (METHOD = file) (METHOD_DATA = (DIRECTORY="/Users/dgiles/Downloads/adb_virt_env")))
SSL_SERVER_DN_MATCH=yes
We should also take a look at tnsnames.ora to see which services we'll be using. You can do this by taking a look in the tnsnames.ora
file. There's likely to by lots of entries if you have lots of ATB or ADW instances in you OCI compartment. In my instance I'll be using a connect string called sbatp_medium
which has a medium priority but pick the one appropriate to your environment.
sbatp_medium = (description= (address=(protocol=tcps)(port=1522)(host=adb.us-phoenix-1.oraclecloud.com))(connect_data=(service_name=gebqwccvhjbqbs_sbatp_medium.atp.oraclecloud.com))(security=(ssl_server_cert_dn=
"CN=adwc.uscom-east-1.oraclecloud.com,OU=Oracle BMCS US,O=Oracle Corporation,L=Redwood City,ST=California,C=US")) )
We'll only need to remember its name for the next step.
The Code¶
Finally we're ready to write some code. The first step is to import the modules we'll need. In this case it's just cx_oracle
and os
import cx_Oracle
import os
We need to set the environment variable TNS_ADMIN
to point at our directory (adb_virt_env) where all of the files from our wallet are located.
os.environ['TNS_ADMIN'] = '/Users/dgiles/Downloads/adb_virt_env'
And now we can simply connect to ATP or ADW instance using a standard Python database connect operation using the connect string we remebered from the tnsnames.ora
file. NOTE : I'm assuming you've created a user in the database or you're using the admin user created for this instance.
connection = cx_Oracle.connect('admin', 'ReallyLongPassw0rd', 'sbatp_medium')
And thats it... From here on in we can use the connection as it was a local database.
cursor = connection.cursor()
rs = cursor.execute("select 'Hello for ADB' from dual")
rs.fetchall()
Update to MonitorDB
You can add the location in the configuration file or
On the command line
I've also compiled it for Java8 and used the latest jdbc drivers.
You can find it here
Oracle SODA Python Driver and Jupyter Lab
Oracle SODA Python Driver and Jupyter Lab¶
This workbook is divided into two sections the first is a quick guide to setting up Jupyter Lab (Python Notebooks) such that it can connect to a database running inside of OCI, in this case an ATP instance. The second section uses the JSON python driver to connect to the database to run a few tests. This notebook is largely a reminder to myself on how to set this up but hopefully it will be useful to others.
Setting up Python 3.6 and Jupyter Lab on our compute instance¶
I won't go into much detail on setting up ATP or ADW or creating a IaaS server. I covered that process in quite a lot of detail here. We'll be setting up something similar to the following
Once you've created the server You'll need to logon to the server with the details found on the compute instances home screen. You just need to grab it's IP address to enable you to logon over ssh.
The next step is to connect over ssh to with a command similar to
ssh opc@132.146.27.111
Enter passphrase for key '/Users/dgiles/.ssh/id_rsa':
Last login: Wed Jan 9 20:48:46 2019 from host10.10.10.1
In the following steps we'll be using python so we need to set up python on the server and configure the needed modules. Our first step is to use yum to install python 3.6 (This is personal preference and you could stick with python 2.7). To do this we first need to enable yum and then install the environment. Run the following commands
sudo yum -y install yum-utils
sudo yum-config-manager --enable ol7_developer_epel
sudo yum install -y python36
python3.6 -m venv myvirtualenv
source myvirtualenv/bin/activate
This will install python and enable a virtual environment for use (our own Python sand pit). You can make sure that python is installed by simply typing python3.6 ie.
$> python3.6
Python 3.6.3 (default, Feb 1 2018, 22:26:31)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-16)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> quit()
Make sure you type quit()
to leave the REPL environment.
We now need to install the needed modules. You can do this one by one or simply use the following file requirements.txt and run the following command
pip -p requirements.txt
This will install all of the need python modules for the next step which is to start up Jupyter Lab.
Jupyter Lab is an interactive web based application that enables you do interactively run code and document the process at the same time. This blog is written in it and the code below can be run once your environment is set up. Vists the website to see more details.
To start jupyer lab we run the following command.
nohup jupyter-lab --ip=127.0.0.1 &
Be aware that this will only work if you have activated you virtual environment. In out instance we did this with with the command source myvirtualenv/bin/activate
. At this point the jupyter-lab is running in the background and and is listening (by default) on port 8888. You could start a desktop up and use VNC to view the output. However I prefer to redirect the output to my own desktop over ssh. To do this you'll need to run the following ssh command from your desktop
ssh -N -f -L 5555:localhost:8888 opc@132.146.27.111
Replacing the IP address above with the one for your compute instance
This will redirect the output of 8888 to port 5555 on your destop. You can then connect to it by simply going to the following url http://localhost:5555
. After doing this you should see a screen asking you input a token (you'll only need to do this once). You can find this token inside of the nohup.out
file running on the compute instance. It should be near the top of the file and should look something like
[I 20:49:12.237 LabApp] http://127.0.0.1:8888/?token=216272ef06c7b7cb3fa8da4e2d7c727dab77c0942fac29c8
Just copy the text after "token=" and paste it in to the dialogue box. After completing that step you should see something like this
You can now start creating your own notebooks or load this one found here. I'd visit the website to familiarise yourself on how the notebooks work.
Using Python and the JSON SODA API¶
This section will walk through using The SODA API with Python from within the Jupyter-lab envionment we set up in the previous section. The SODA API is a simple object API that enables developers persist and retrieve JSON documents held inside of the Oracle Database. SODA drivers are available for Java, C, REST, Node and Python.
You can find the documentation for this API here
To get started we'll need to import the need the following python modules
import cx_Oracle
import keyring
import os
import pandas as pd
We now need to set the location of the directory containing the wallet to enable us to connect to the ATP instance. Once we've done that we can connect to the Oracle ATP instance and get a SODA object to enable us to work with JSON documents. NOTE : I'm using the module keyring
to hide the password for my database. You should replace this call with the password for your user.
os.environ['TNS_ADMIN'] = '/home/opc/Wallet'
connection = cx_Oracle.connect('json', keyring.get_password('ATPLondon','json'), 'sbatp_tpurgent')
soda = connection.getSodaDatabase()
We now need to create JSON collection and if needed add any additional indexes which might accelerate data access.
try:
collection = soda.createCollection("customers_json_soda")
collection.createIndex({ "name" : "customer_index",
"fields" : [ { "path" : "name_last",
"datatype" : "string"}]})
except cx_Oracle.DatabaseError as ex:
print("It looks like the index already exists : {}".format(ex))
We can now add data to the collection. Here I'm inserting the document into the database and retrieving it's key. You can find find some examples/test cases on how to use collections here
customer_doc = {"id" : 1,
"name_last" : "Giles",
"name_first" : "Dom",
"description" : "Gold customer, since 1990",
"account info" : None,
"dataplan" : True,
"phones" : [{"type" : "mobile", "number" : 9999965499},
{"type" : "home", "number" : 3248723987}]}
doc = collection.insertOneAndGet(customer_doc)
connection.commit()
To fetch documents we could use SQL or Query By Example (QBE) as shown below. You can find further details on QBE here. In this example there should just be a single document. NOTE: I'm simply using pandas DataFrame
to render the retrieved data but it does highlight how simple it is to use the framework for additional analysis at a later stage.
documents = collection.find().filter({'name_first': {'$eq': 'Dom'}}).getDocuments()
results = [document.getContent() for document in documents]
pd.DataFrame(results)
To update records we can use the replaceOne
method.
document = collection.find().filter({'name_first': {'$eq': 'Dom'}}).getOne()
updated = collection.find().key(doc.key).replaceOne({"id" : 1,
"name_last" : "Giles",
"name_first" : "Dominic",
"description" : "Gold customer, since 1990",
"account info" : None,
"dataplan" : True,
"phones" : [{"type" : "mobile", "number" : 9999965499},
{"type" : "home", "number" : 3248723987}]},)
connection.commit()
And just to make sure the change happened
data = collection.find().key(document.key).getOne().getContent()
pd.DataFrame([data])
And finally we can drop the collection.
try:
collection.drop()
except cx_Oracle.DatabaseError as ex:
print("We're were unable to drop the collection")
connection.close()
Accessing the Oracle Object Store
OCI Object Store Examples¶
The following are a series of examples showing the loading of data into the Oracle Object Store. For these to work with your own data you'll need to have your own Oracle Cloud account and uploaded a key. You can find details on how to achieve this here
I'll be using the Oracle OCI Python SDK which wrappers the REST API. You can find details on the API here
Before we do anything we'll need to load the required needed Python modules.
import oci
import keyring
import ast
import os
Configuration needed to connect¶
I'm using the "keyring" Python module to hold the config for my connection to OCI (to avoid needlessly exposing sensitive information). It's of the form
{
"user": "your user ocid",
"key_file": "the path to your private key file",
"fingerprint": "the fingerprint of your public key",
"tenancy": "your tenancy ocid",
"region": "the region you are working with"
}
After retrieving it from my keyring store I then need to convert it into a dictionary before using it. You can also validate the config you are using as well. Handy if this is the first time you've configured it.
my_config = ast.literal_eval(keyring.get_password('oci_opj','doms'))
oci.config.validate_config(my_config)
Create object storage client¶
Then I just need to retireve a Object Storage client to start working with data
object_storage_client = oci.object_storage.ObjectStorageClient(my_config)
namespace = object_storage_client.get_namespace().data
bucket_name = "doms_object_store"
Upload the contents of user directory to a bucket¶
I'll create a bucket and then select all of the files from a user defined directory and upload them to the newly created bucket
import os, io
directory = '/Users/dgiles/datagenerator/bin/generateddata'
files_to_process = [file for file in os.listdir(directory) if file.endswith('csv')]
Create a bucket named "Sales_Data" and give it the tenancy ocid from your config.
try:
create_bucket_response = object_storage_client.create_bucket(
namespace,
oci.object_storage.models.CreateBucketDetails(
name='Sales_Data',
compartment_id=my_config['tenancy']
)
)
except Exception as e:
print(e.message)
Then we just need to loop through the list of files in the directory specified and upload them to the newly created bucket
bucket_name = 'Sales_Data'
for upload_file in files_to_process:
print('Uploading file {}'.format(upload_file))
object_storage_client.put_object(namespace, bucket_name, upload_file, io.open(os.path.join(directory,upload_file),'r'))
Retrieve a list of objects in a bucket¶
The folowing retrieves a bucket and gets a list of objects in the bucket
bucket = object_storage_client.get_bucket(namespace, bucket_name)
object_list = object_storage_client.list_objects(namespace, bucket_name)
for o in object_list.data.objects:
print(o.name)
Download the contents of an object¶
The following downloads a file from a named bucket in chunks and writes it to user defined directory on the client
# Attempt to download a file
object_name = "CUSTOMERS.csv"
destination_dir = '/Users/dgiles/Downloads'.format(object_name)
get_obj = object_storage_client.get_object(namespace, bucket_name, object_name)
with open(os.path.join(destination_dir,object_name), 'wb') as f:
for chunk in get_obj.data.raw.stream(1024 * 1024, decode_content=False):
f.write(chunk)
Delete a bucket¶
We can just as simply delete the bucket we've just created but first we'll need to delete all of the objects inside of it.
object_list = object_storage_client.list_objects(namespace, bucket_name)
for o in object_list.data.objects:
print('Deleting object {}'.format(o.name))
object_storage_client.delete_object(namespace, bucket_name, o.name)
print('Deleting bucket')
response = object_storage_client.delete_bucket(namespace, bucket_name)
Making the alert log just a little more readable
But what if you are only looking after one or two or just testing something out? Well the most common solution is to simply tail the alert log file.
The only issue is that it's not the most exciting thing to view, this of course could be said for any terminal based text file. But there are things you can do to make it easier to parse visually and improve your chances of catching an unexpected issue.
The approach I take is to push the alert log file through python and use the various libraries to brighten it up. It's very easy to go from this (tail -f)
To this
The reason this works is that python provides a rich set of libraries which can add a little bit of colour and formatting to the alert file.
You can find the code to achieve this in the gist below
Just a quick note on installing this. You'll need either python 2.7 or 3 available on your server.
I'd also recommend installing pip and then the following libraries
pip install humanize psutil colorama python-dateutil
After you've done that it's just a case of running the script. If you have $ORACLE_BASE and $ORACLE_SID set the library will try and make a guess at the location of the alert file. i.e
python alertlogparser.py
But if that doesn't work or you get an error you can also explicitly specify the location of the alert log with something like
python alertlogparser.py -a $ORACLE_BASE/diag/rdbms/orcl/orcl/trace/alert_orcl.log
This script isn't supposed to be an end product just a simple example of what can be achieved to make things a little easier. And whilst I print information like CPU load and Memory there's nothing to stop you from modifying the script to display the number of warnings or errors found in the alert log and update it things change. Or if you really want to go wild implement something similar but a lot more sophisticated using python and curses
The age of "Terminal" is far from over….
ORDS From : Start to Finish
Introduction
This is likely to be a pretty long posting on ORDS to describe the workflow for creating a REST application from start to finish. With that said I'm going to have to make a few compromises to keep it tolerable to follow. I thought long and hard about what would be a meaningful example that would be easy to understand but show most of the aspects of building a REST based application using Oracle ORDS. With this in mind I settled on a simple micro service to enable users to "like" products/rows in a similar fashion to being able to like posts on Facebook/Twitter or other social media networks. Whilst there are plenty of examples of how to do this out there already I couldn't find many that explained the entire process from start to finish and using the Oracle Database as the target. I'll try and keep it up to date as we are expecting to see a few changes in SQLDeveloper and ORDS shortly. On completion we should have a web page that looks a little like the one below.
The code for this blog is maintained here
https://github.com/domgiles/ORDS_Example
To create a working environment simply go to the link above, click on the "Clone or download" button and then click on the "Download ZIP" link. When it's finished downloading uncompress the file into your preferred location.
This post should largely serve as a reference for those wishing to test out this functionality themselves. I'll also be releasing a version of this code where we store JSON in the database as opposed to relational table to offer a comparison with the approach.
ORDS Description
First of all what is ORDS and why do we care. Well this is Oracle's description
"Oracle REST Data Services (ORDS) makes it easy to develop modern REST interfaces for relational data in the Oracle Database and now, with ORDS 3.0, the Oracle Database 12c JSON Document Store and Oracle NoSQL Database. ORDS is available both as an Oracle Database Cloud Service and on premise.
REST has become the dominant interfaces for accessing services on the Internet, including those provided by major vendors such as Google, Facebook, Twitter, and Oracle, and within the enterprise by leading companies throughout the world. REST provides a powerful yet simple alternative to standards such as SOAP with connectivity to virtually every language environment, without having to install client drivers, because it is based on simple HTTP calls which virtually all language environments support.
For relational data, ORDS 3.0 now automatically generates REST endpoints for about a dozen major single table operations including create, query, update, and delete. Easy to write JSON filters allow query predicates and sorts to be specified in a query-by-example format. For more complex operations, REST calls are mapped to SQL and PL/SQL routines you can write which return data in JSON and other formats.
ORDS 3.0 enables a wide range of operations to be performed on JSON documents in the Oracle Database 12c Document Store using the Simple Oracle Document Access (SODA) API over REST. These operations include create, query, update, and delete with support for JSON filters."
With that said lets get started.
Prerequisites
If you plan to install this example you'll need to have the following components available. I won't be describing their installation. I'll assume you'll have done this ahead of time
- Oracle Database (SE or EE) 11.1 or later. Oracle Database 12c is the preferred version. It can be configured either as a PDB or not. You can download it from here
- Oracle SQL Developer 4.1.3. It needs to be at least this version. You can down load it here
- Oracle ORDS 3.0.6. You can download it here. (NOTE : I will walkthrough this installation later in the document)
- A Java Virtual Machine on the server/laptop/VM you are running the example on. I'd recommend Java 8 available here
- Alternatively you could download a Virtual Box VM such as the "Hands-on Database Application Development" VM from here.Which will provide everything you need in a ready configured package.
I'm also going to assume that you'll be running on a unix type OS (Linux, Mac OS, Solaris, AIX). Although this will also work fine on windows (just need to convert some of the commands).
My setup
For my setup I have the following configuration.
In reality everything is running on my workstation but the the diagram above provides a working model
Install of ORDS
Assuming you aren't going down the pre installed Virtual Box VM mentioned in the previous section. The install of ORDS is very simple. After you've downloaded it to your VM/Server/Laptop all you need to do is unzip it and change into the directory . i.e.In the directory you unzipped the download run ORDS. The first time you run this it will lead you through the install asking you for the connection to the database. I'd recommended skipping configuring APEX at this stage (by selecting 2 when asked) and running ORDS using http. Whilst these options might not be ideal for production environments they will enable you to get started.
After this completes hit control-c to exit the program. At this stage you should probably add a user with SQLDeveloper development support. I've named mine "appdev" but any name would be fine. We'll use this user later in the example. You can do this using the following command and supplying passwords.
And then restart ORDS. This time it won't prompt you for any passwords but simply run in the foreground
Thats all thats needed to install ORDS.
Install Database Tables
The next step is to create a user and tablespace to hold the tables. I've provided a simple script "create_user.sql" that will create a database user "SOE" and tablespace "SOE" to run it simply connect to the database with a user with DBA privilege. (NOTE : This could also be done inside of SQLDeveloper)
The script creates a user "soe" and and tablespace "soe" and then grants the user access to the tablespace. It finishes off by granting the user enough privileges to create tables, views and procedures
At this stage it's worth starting SQLDeveloper where the bulk of the work will be done. You'll need to create a connection to the Database .You'll do this from the connections tab as show below.
Then fill in the details and hit connect
The next step is to create the example tables and test data. I've provided a simple script "create_tables.sql to do this for you. It will create one to hold the "product_information" that would be typically part of and existing application and other to hold the likes. This table is called "user_likes".
The "USER_LIKES" table is an Index Organised Table. Whilst the trivial amount of data we holding in this example won't benefit from his approach it would likely have an impact on production scale workloads. The script will insert a few sample rows as well.
To run this script copy and paste the SQL into the SQLDeveloper
Install REST Service Application Code
At this point we'll connect to ORDS from SQLDeveloper. You'll be using the username you used during the installation of ORDS. In my case this was "appdev". From the menu you'll need to view "REST Data Services" and select the "Development" option as shown below
This will launch the REST development UI (by default on the left hand side of the screen) as shown below. Add a new connection and fill in the details as shown i.e. Server Path = "/ords" and Schema Workspace = "/soe/".
The next step is to run a script "utilities_module.sql" which will enable the schema to support ORDS and create the ORDS modules used by the application. This is purely for convenience. I'll describe the process of creating one of the modules in a blog shortly after this one. To install the REST modules simply copy and paste the code below into the SQLDeveloper session you've already created.
As shown below
After this script completes we can then download the modules into SQLDeveloper from ORDS's repository. The reason we need to do this is that we've simply imported the data into the ORDS repository and not made it visible to SQLDeveloper at this point. To achieve this all we need to do is
This will display the modules we created by running the script. The two modules are
- Products : This will simply retrieve the rows from the PRODUCTS table. It's not really necessary to explicitly declare this module as ORDS base functionality already has a REST API that could be used instead.
- Utilities : This module contains three templates or services we can call to like a row in a table
- get likes : Gets the total number of likes of a row by all users and the user specified in the parameter
- like : Enables you to like a row if you haven't already
- unlike : reverses a like operation by the user specifed if he's liked it before
The first tab list the SQL or PL/SQL that will implement the business logic. In this particular example we are using PL/SQL to first check if a user has liked a row by checking first the USER_LIKES table. If they haven't already liked the row we insert a new one and commit the transaction.
The next tab lists the parameters that are passed to our logic or SQL statement. In our particular example we are passing three parameters that tell the procedure the table that is being liked, the unique identifier for the row (in our simple case it's assumed to be a string but could as equally be a rowid or number) and the user identifier (typically application specific but we'll assume it's something like an email).
The final tab summarises some of the settings for the service and provides you with a url to call to test the service.
You can test most "GET" REST calls pretty trivially within a browser put it takes just a little bit more for POST, PUT and DELETE. To test the POST call shown in the screen shot above we are going to use a command line utility called curl (installed by default on linux and MacOS). We'll also use a file that contains the json we want to send to the REST service. This file is called "test.json" and has the following contents
To call the REST service all we need to do is issue the following command
Obviously substitute the name of your server in the code. You should get a HTTP return code of 200. Indicating that the code successfully ran.
We can also test the GET REST call that returns the "like" we just made using the following command
You should get a small JSON snippet with the number of likes by everybody and the user "dominic.giles"
Calling the REST Services from a web page
Now that we've got the services up and running we can use a simple web page that displays all of the rows from the PRODUCT_INFORMATION table with a "thumb" to enable us to like the row. Like all of the code used in this blog you can find them all on my github repository.Let me start by saying that the web page we are going to create is a very simple example and would normally require a lot more validation and error handling before it would ever be considered for a production environment.
The web page we'll put together uses JQuery and a javascript library called "DataTables" (link here). However it's important to point out that implementing this functionality doesn't require these tools. It could have been as easily achieved using Oracle Application Express or a framework like Jet. But to simplify things this approach allows us to minimise the steps required and reduce the amount of moving parts.
The final app should look like this
When the form is first loaded the javascript calls the "GetLikes" REST service (once for each row displayed) which asynchronously returns the total count of "likes" a product has received and sets the colour of the them dependent on whether the current user has liked it. It will stay grey if the current user hasn't liked it and turn blue if they have. On clicking on a thumb the javascript checks whether the user has previously like it. If they haven't it calls the "Like" REST Call and call the "Unlike" REST call if they have.
The code is made up broken up into two files. One containing the HTML and the other the containing the javascript.
And the javascript
This really isn't the place to go into all of the details of the code but there are one or two things that are worth pointing out. The first is at the start of liketable.js file. You'll need to change the following entries to reflect your own environment. It's likely you'll only need to change the "hostport" variable to start with. The username is hardcoded in this example but in the real world it would be derived from the system context.
var $userName = 'dominic.giles'; var $tableName = 'product_information'; var $hostport = 'http://oracle12c2:8080'; var $uniqueIDColumn = "product_id";
The other piece of code worth looking at is the calls to the REST service. The code below uses the javascript ajax functionality to make an asynchronous call to ORDS which with then return the JSON response.
function getRowLikes(un, tn, rID) { var likeResult = $.ajax({ url: $hostport + "/ords/soe/utilities/getlikes?table_name=" + tn + "&column_value=" + rID + "&user_id=" + un, type: "GET" }); return likeResult; }
Likewise the following shows the AJAX call to the like function. This call unlike the the previous one is performed synchronously but follows the same principals.
function likeRow(un, tn, rID) { $.ajax({ url: $hostport + "/ords/soe/utilities/like", type: "POST", data: {"table_name": tn, "column_value": rID, "user_id": un}, async: false }); }
Once you've made the changes to the hostname variable you should be able to open a browser and open the products.html file within it.
Over the coming days I'll knock up a little screen cast to accompany this blog.
A new member of the family
Notes on pre-parsing data for Oracle data loads
Now typically the approach many people would take would be to develop some form of program that parses all of the data and writes it to staging area and then loads all of it in one go to the target database. I make no criticism of this approach since it works well and as long as its not time critical. It's by far the simplest method. However Im a big fan of taking advantage of whats already available and one of the most underused and powerful features of the Oracle database is the preparser. It enables you to pipeline various operations so they all run as quickly as possible. So going back to my benchmark we used this approach to load data into out target database. It consisted of 4 steps
- Read the data of the filesystem as efficiently as possible and write it to stdout
- Read from stdin and Unzip the the data writing it to stdout
- Read from stdin into a java program to do the key value mapping and error detection/correction writing the output to stdout
- Read from stdin into sqlloader
The java program simple reads from stdin and writes to stdout. To handle key value pairs just required the program to read the mapping file in and split and parse the values from stdin. The data was then written to stdout in a well know order.
Java extract from my program...
BufferedReader br = new BufferedReader(new InputStreamReader(System.in), OneMB); String line = null; HashMapkeyValuePairs = null; MyTokenizer mt = null; while ((line = br.readLine()) != null) { keyValuePairs = new HashMap (200); mt = new MyTokenizer(line, delimitor); for (String token : mt) { int loc = token.indexOf("="); if (loc != -1) { String i = token.substring(0, loc); String s = token.substring(loc + 1, token.length()); keyValuePairs.put(i, s); } StringBuffer outRec = new StringBuffer(1000); outRec.append(checkForNull(keyValuePairs.get("uniqueID"), "")).append(seperator); // mapping logic similar to above repeats System.out.println(outRec.toString()); }
All that was needed for sqlloader to process the files was a control file that understood the order of the columns and any additional formatting.
One of the additional benefits is that we can load the data via "direct path" and implement other features such as multi table insert. The Java preparser enables you to add all of the additional formatting to make this a trivial process.
The following diagram illustrates the process.
This equates into a Unix/Linux statement such as
/bin/dd if=myverybigfile.txt bs=1024k status=noxfer 2>/dev/null | /bin/gunzip -c | java -classpath /home/oracle/loader.jar com.dom.KeyValueParserStdIn | sqlloader bench/bench control=kv.ctl data=\"-\" direct=TRUE;
NOTE : one thing you may have noticed is that Im using dd to do 1MB I/Os. This just an efficiency operation and works well on structures such as DBFS, you could skip this part of the operation if needed.
Which brings us onto external tables and the preparser
External Tables and pre-parsers
As I mentioned earlier I like to take advantage of functionality that's already available and one of those features in the Oracle database is external tables. I don't intend to go into much detail as to why you should use external tables other than they do much of the heavy lifting for you and they provide a seamless interface between the filesystem and the database. They effectively make files look like tables.
This means it's trivial to implement parallelism for our pre-parser. We don't need to worry about how to handle the files and how to schedule everything, external tables take care of all of that for you. In our benchmark we used them in the following way
Our previous pipeline remains the same except that we don't need sqlloader its all managed by the table definition itself. So we end up with something similar to the following for the table definition (I've abbreviated it quite substantially and highlighted the important bits)
create table staging_ext_mydata_jan01 ( uniqueid NUMBER, .. -- Lots of columns .. ) ORGANIZATION EXTERNAL ( TYPE oracle_loader DEFAULT DIRECTORY load_dir_jan01 ACCESS PARAMETERS ( RECORDS DELIMITED BY NEWLINE PREPROCESSOR exec_dir:'external_tab.sh' BADFILE log_dir: 'external.bad' LOGFILE log_dir: 'external.log' fields terminated by '|' OPTIONALLY ENCLOSED BY '"' AND '"' ( uniqueid char(100), .. -- Lots of defintions .. ) ) LOCATION('data1.txt',data2.txt'...'data100.txt')) REJECT LIMIT UNLIMITED;
One of the things to note is that I've included the pipelined preprocessor inside of a shell script which looks like this
/bin/dd if=$1 bs=1024k status=noxfer 2>/dev/null | /bin/gunzip -c | java -classpath /home/oracle/loader.jar com.dom.KeyValueParserStdIn
The important part of this script is the parameter ($1) that is passed to the shell script. This is the file name that the external table wants to process.
A simple select statement from my "staging_ext_mydata_jan01" unzips and parses the data converting it to usable columns. Whats more if I issue the statement in parallel Oracle takes care of creating the processes for me and making sure everything is scheduled in an orderly fashion.
To finish the load we simply used a multi table insert to put the data into the correct tables in an efficient fashion. Using this approach we were able to read zipped files, parse them and insert them into our three target tables at over 1.5 million source records/sec.
New Build of Datagenerator
In the side panel you can now include scripts and parameters for the scripts.
I’ve also included the script files used for generating the SH and SOE schemas used by swingbench. This should make it easier to understand what is going on and potentially create your own versions of the schemas.
In this release I’ve also improved the threading model and included one or two other performance enhancements....
In the next release I’m going to try and add support for for well know data items such as zip/post codes, NI numbers, Social Security etc.... as well as allowing users to plug their own data generators in.
You can download it from the usual place and as before leave comments below or via the comments page.