Jupyter, Python, Oracle and Docker Part 3
Create Oracle Database in a Docker Container¶
These Python (3.6) scripts walk through the creation of a database and standby server. The reason I've done this rather than just take the default configuration is that this approach gives me a lot more control over the build and enables me to change specifc steps. If you are just wanting to get the Oracle Database running inside of Docker I strongly suggest that you use the docker files and guides in the Oracle Github repository. The approach documented below is very much for someone who is interested in a high level of control over the various steps in the installation and configuration of the Oracle Database. This current version is build on top of Oracle's Internal GiaaS docker image but will be extended to support the public dockers images as well. It aims to build an Active Data Guard model with maximum performance but can be trivially changed to support any of the required models.
It uses a mix of calls to the Docker Python API and Calls direct to the databases via cx_Oracle.
The following code section imports the needed modules to setup up our Docker container to create the Oracle Database. After that we get a Docker client handle which enables use to call the API against our local Docker environment.
import docker
import humanize
import os
import tarfile
from prettytable import PrettyTable
import cx_Oracle
from IPython.display import HTML, display
import keyring
from ipynb.fs.full.OracleDockerDatabaseFunctions import list_images,list_containers,copy_to,create_and_run_script,containter_exec,containter_root_exec,copy_string_to_file
client = docker.from_env(timeout=600)
list_images(client)
Configuration Parameters¶
The following section contains the parameters for setting the configuration of the install. The following parameters image_name
,host_oradata
,sb_host_oradata
need to be changed, although sb_host_oradata
is only important if you are planning on installing a standby database.
# The following parameters are specific to your install and almost certainly need to be changed
image_name = 'cc75a47617' # Taken from the id value above
host_oradata = '/Users/dgiles/Downloads/dockerdbs/oradataprimary' # The directory on the host where primary database will be persisted
sb_host_oradata = '/Users/dgiles/Downloads/dockerdbs/oradatastby' # The direcotry on the host where the standby database will be persisted
#
# The rest are fairly generic and can be changed if needed
oracle_version = '18.0.0'
db_name = 'ORCL'
stby_name = 'ORCL_STBY'
sys_password = keyring.get_password('docker','sys') # I'm just using keyring to hide my password but you can set it to a simple sting i.e. 'mypassword'
pdb_name = 'soe'
p_host_name = 'oracleprimary'
sb_host_name = 'oraclestby'
total_memory = 2048
container_oradata = '/u01/app/oracle/oradata'
o_base = '/u01/app/oracle'
r_area = f'{o_base}/oradata/recovery_area'
o_area = f'{o_base}/oradata/'
a_area = f'{o_base}/admin/ORCL/adump'
o_home = f'{o_base}/product/{oracle_version}/dbhome_1'
t_admin = f'{o_base}/oradata/dbconfig'
log_size = 200
Create Primary Database¶
This code does the heavy lifting. It creates a container oracleprimary (unless you changed the paramter) running the Oracle Database. The containers 1521 port is mapped onto the the hosts 1521 port. This means that to connect from the host, via a tool like sqlplus, all you'd need to do is sqlplus soe/soe@//locahost/soe
.
path = f'{o_home}/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin'
p_container = client.containers.create(image_name,
command="/bin/bash",
hostname=p_host_name,
tty=True,
stdin_open=True,
auto_remove=False,
name=p_host_name,
shm_size='3G',
# network_mode='host',
ports={1521:1521,5500:5500},
volumes={host_oradata: {'bind': container_oradata, 'mode': 'rw'}},
environment={'PATH':path,'ORACLE_SID': db_name, 'ORACLE_BASE': o_base,'TNS_ADMIN': t_admin}
)
p_container.start()
# Make all of the containers attributes available via Python Docker API
p_container.reload()
The next step uses DBCA and configures features like Automatic Memory Mangement, Oracle Managed Files and sets the size of the SGA and redo logs. It prints out the status of the creation during it's progression. NOTE : This step typically takes 10 to 12 minutes.
statement = f'''dbca -silent \
-createDatabase \
-templateName General_Purpose.dbc \
-gdbname {db_name} -sid {db_name} -responseFile NO_VALUE \
-characterSet AL32UTF8 \
-sysPassword {sys_password} \
-systemPassword {sys_password} \
-createAsContainerDatabase true \
-numberOfPDBs 1 \
-pdbName {pdb_name} \
-pdbAdminPassword {sys_password} \
-databaseType MULTIPURPOSE \
-totalMemory {total_memory} \
-memoryMgmtType AUTO_SGA \
-recoveryAreaDestination "{r_area}" \
-storageType FS \
-useOMF true \
-datafileDestination "{o_area}" \
-redoLogFileSize {log_size} \
-emConfiguration NONE \
-ignorePreReqs\
'''
containter_exec(p_container, statement)
Create Primary Database's Listener¶
This step creates the database listener for the primary database. The tnsnames.ora
will be over written in a later step if you choose to have a stand by configuration. NOTE : I could create a DNSMasq container or something similar and add container networking details to make the whole inter node communication simpler but it's a bit of an overkill and so we'll use IP addresses which are easily found.
p_ip_adress = p_container.attrs['NetworkSettings']['IPAddress']
p_listener = f'''LISTENER=
(DESCRIPTION=
(ADDRESS = (PROTOCOL=tcp)(HOST={p_ip_adress})(PORT=1521))
(ADDRESS = (PROTOCOL = IPC)(KEY = EXTPROC1521))
)
SID_LIST_LISTENER=
(SID_LIST=
(SID_DESC=
(GLOBAL_DBNAME={db_name}_DGMGRL)
(ORACLE_HOME={o_home})
(SID_NAME={db_name})
(ENVS="TNS_ADMIN={t_admin}"
)
)
'''
copy_string_to_file(p_listener, f'{t_admin}/listener.ora', p_container)
contents = '''NAMES.DIRECTORY_PATH= (TNSNAMES, EZCONNECT)'''
copy_string_to_file(contents, f'{t_admin}/sqlnet.ora', p_container)
contents = f'''
ORCL =
(DESCRIPTION =
(ADDRESS = (PROTOCOL = TCP)(HOST = {p_ip_adress})(PORT=1521))
(CONNECT_DATA =
(SERVER = DEDICATED)
(SID = {db_name})
)
)
'''
copy_string_to_file(contents, f'{t_admin}/tnsnames.ora', p_container)
And start the listener
containter_exec(p_container, 'lsnrctl start')
At this stage you should have a fully functioning Oracle Database. In theory there's no need to go any further if thats all you want.
Create Stand By Container¶
This step creates another container to run the standby databases. It should be pretty much instant. NOTE : You'll only need to run the rest of the code from here if you need a standby database.
sb_container = client.containers.create(image_name,
hostname=sb_host_name,
command="/bin/bash",
tty=True,
stdin_open=True,
auto_remove=False,
name=sb_host_name,
shm_size='3G',
ports={1521:1522,5500:5501},
volumes={sb_host_oradata: {'bind': container_oradata, 'mode': 'rw'}},
environment={'PATH':path,'ORACLE_SID':db_name,'ORACLE_BASE':o_base,'TNS_ADMIN':t_admin}
)
sb_container.start()
# Make all of the containers attributes available via Python Docker API
sb_container.reload()
Display the running containers.
list_containers(client)
Configure the Standby Database¶
Create some additional directories on the standby so they are consistent with the primary.
containter_exec(sb_container, f'mkdir -p {o_area}/{db_name}')
containter_exec(sb_container, f'mkdir -p {t_admin}')
containter_exec(sb_container, f'mkdir -p {r_area}/{db_name}')
containter_exec(sb_container, f'mkdir -p {a_area}')
Create Standby Database's Listener¶
Create the standby listenrs network configuration and then start the listener. NOTE : We'll be overwriting the primary databases tnsnames.ora
file in this step.
sb_ip_adress = sb_container.attrs['NetworkSettings']['IPAddress']
contents = f'''
ORCL =
(DESCRIPTION =
(ADDRESS = (PROTOCOL = TCP)(HOST = {p_ip_adress})(PORT=1521))
(CONNECT_DATA =
(SERVER = DEDICATED)
(SID = {db_name})
)
)
ORCL_STBY =
(DESCRIPTION =
(ADDRESS = (PROTOCOL = TCP)(HOST = {sb_ip_adress})(PORT=1521))
(CONNECT_DATA =
(SERVER = DEDICATED)
(SID = {db_name})
)
)
'''
copy_string_to_file(contents, f'{t_admin}/tnsnames.ora', p_container)
copy_string_to_file(contents, f'{t_admin}/tnsnames.ora', sb_container)
sb_listener = f'''LISTENER=
(DESCRIPTION=
(ADDRESS = (PROTOCOL=tcp)(HOST={sb_ip_adress})(PORT =1521))
(ADDRESS = (PROTOCOL = IPC)(KEY = EXTPROC1521))
)
SID_LIST_LISTENER=
(SID_LIST=
(SID_DESC=
(GLOBAL_DBNAME={stby_name}_DGMGRL)
(ORACLE_HOME={o_home})
(SID_NAME={db_name})
(ENVS="TNS_ADMIN={t_admin}"
)
)
'''
copy_string_to_file(sb_listener, f'{t_admin}/listener.ora', sb_container)
contents = '''NAMES.DIRECTORY_PATH= (TNSNAMES, EZCONNECT)'''
copy_string_to_file(contents, f'{t_admin}/sqlnet.ora', sb_container)
And start the listener
containter_exec(sb_container, 'lsnrctl start')
Configure the servers for Data Guard¶
It might be necessary to pause for a few seconds before moving onto the next step to allow the database to register with the listener...
The next step is to connect to primary and standby servers and set various parameters and configuration to enable us to run Data Guard.
First check the status of archive logging on the primary.
connection = cx_Oracle.connect("sys",sys_password,f"//localhost:1521/{db_name}", mode=cx_Oracle.SYSDBA)
cursor = connection.cursor();
rs = cursor.execute("SELECT log_mode FROM v$database")
for row in rs:
print(f"Database is in {row[0]} mode")
By default it will be in no archivelog mode so we need to shut it down and enable archive log mode
contents = '''sqlplus / as sysdba << EOF
SET ECHO ON;
SHUTDOWN IMMEDIATE;
STARTUP MOUNT;
ALTER DATABASE ARCHIVELOG;
ALTER DATABASE OPEN;
EOF
'''
create_and_run_script(contents, '/tmp/set_archivelog.sql', '/bin/bash /tmp/set_archivelog.sql', p_container)
And check again
connection = cx_Oracle.connect("sys",sys_password,f"//localhost:1521/{db_name}", mode=cx_Oracle.SYSDBA)
cursor = connection.cursor();
rs = cursor.execute("SELECT log_mode FROM v$database")
for row in rs:
print(f"Database is in {row[0]} mode")
And then force a log switch
cursor.execute("ALTER DATABASE FORCE LOGGING")
cursor.execute("ALTER SYSTEM SWITCH LOGFILE")
Add some Standby Logging Files
cursor.execute("ALTER DATABASE ADD STANDBY LOGFILE SIZE 200M")
cursor.execute("ALTER DATABASE ADD STANDBY LOGFILE SIZE 200M")
cursor.execute("ALTER DATABASE ADD STANDBY LOGFILE SIZE 200M")
cursor.execute("ALTER DATABASE ADD STANDBY LOGFILE SIZE 200M")
Enable Flashback and standby file management
cursor.execute("ALTER DATABASE FLASHBACK ON")
cursor.execute("ALTER SYSTEM SET STANDBY_FILE_MANAGEMENT=AUTO")
Start an instance¶
Create a temporary init.ora file to enable us to start an instance on the standby
contents = f"DB_NAME='{db_name}'\n"
copy_string_to_file(contents, f'/tmp/init{db_name}.ora', sb_container)
Create a password file on the standby with the same parameters as the primary
containter_exec(sb_container, f'orapwd file=$ORACLE_HOME/dbs/orapw{db_name} password={sys_password} entries=10 format=12')
And start up the standby instance
contents = f'''STARTUP NOMOUNT PFILE='/tmp/init{db_name}.ora';
EXIT;
'''
create_and_run_script(contents, '/tmp/start_db.sql', 'sqlplus / as sysdba @/tmp/start_db.sql', sb_container)
Duplicate the Primary database to the Standby database¶
Duplicate the primary to the standby. For some reason the tnsnames isn't picked up on the primary/standby in the same location so an explicit connection string is needed.
contents = f'''rman TARGET sys/{sys_password}@{db_name} AUXILIARY sys/{sys_password}@{stby_name} << EOF
DUPLICATE TARGET DATABASE
FOR STANDBY
FROM ACTIVE DATABASE
DORECOVER
SPFILE
SET db_unique_name='{stby_name}' COMMENT 'Is standby'
NOFILENAMECHECK;
EOF
'''
create_and_run_script(contents, '/tmp/duplicate.sh', "/bin/bash /tmp/duplicate.sh", sb_container)
Start the Data Guard Broker¶
It's best practice to use Data Guard Broker and so we'll need to start it on both instances and then create a configuration.
cursor.execute("ALTER SYSTEM SET dg_broker_start=true")
sb_connection = cx_Oracle.connect("sys",sys_password,f"//localhost:1522/{stby_name}", mode=cx_Oracle.SYSDBA)
sb_cursor = sb_connection.cursor()
sb_cursor.execute("ALTER SYSTEM SET dg_broker_start=true")
Create a configuration
contents = f'''export TNS_ADMIN={t_admin};
dgmgrl sys/{sys_password}@{db_name} << EOF
SET ECHO ON;
CREATE CONFIGURATION orcl_stby_config AS PRIMARY DATABASE IS {db_name} CONNECT IDENTIFIER IS {db_name};
EXIT;
EOF
'''
create_and_run_script(contents, '/tmp/dgconfig.sh', "/bin/bash /tmp/dgconfig.sh", p_container)
Add the standby
contents = f'''export TNS_ADMIN={t_admin};
dgmgrl sys/{sys_password}@{db_name} << EOF
SET ECHO ON;
ADD DATABASE {stby_name} AS CONNECT IDENTIFIER IS {stby_name} MAINTAINED AS PHYSICAL;
EXIT;
EOF
'''
create_and_run_script(contents, '/tmp/dgconfig2.sh', "/bin/bash /tmp/dgconfig2.sh", p_container)
Enable the configuration
contents = f'''export TNS_ADMIN={t_admin};
dgmgrl sys/{sys_password}@{db_name} << EOF
SET ECHO ON;
ENABLE CONFIGURATION;
EXIT;
EOF
'''
create_and_run_script(contents, '/tmp/dgconfig3.sh', "/bin/bash /tmp/dgconfig3.sh", p_container)
Display the configuration
contents = f'''export TNS_ADMIN={t_admin};
dgmgrl sys/{sys_password}@{db_name} << EOF
SET ECHO ON;
SHOW CONFIGURATION;
SHOW DATABASE {db_name};
SHOW DATABASE {stby_name};
EOF
'''
create_and_run_script(contents, '/tmp/dgconfig4.sh', "/bin/bash /tmp/dgconfig4.sh", p_container)
Start the Standby in managed recovery¶
We now need to start the standby so it begins applying redo to keep it consistent with the primary.
contents='''sqlplus / as sysdba << EOF
SET ECHO ON;
SHUTDOWN IMMEDIATE;
STARTUP MOUNT;
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT FROM SESSION;
EOF
'''
create_and_run_script(contents, '/tmp/convert_to_active.sh', "/bin/bash /tmp/convert_to_active.sh", sb_container)
Standby Database Creation Complete¶
We now have a primary and standby database that we can begin testing with.
Additional Steps¶
At this point you should have a physical standby database that is running in maximum performance mode. This might be enough for the testing you want to carry out but there's a number of possible changes that you might want to consider.
- Change the physical standby database to an Active Standby
- Convert the current mode (Maximum Performance) to Maximum Protection or Maximum Availability
- Configure the Oracle Database 19c Active Data Guard feature, DML Redirect
I'll cover these in the following sections but they "icing on the cake" rather than required.
Active Data Guard¶
This is a relatively trivial change. We just need to alter the standby database to open readonly and then start managed recovery as before
contents='''sqlplus / as sysdba << EOF
SET ECHO ON;
SHUTDOWN IMMEDIATE;
STARTUP MOUNT;
ALTER DATABASE OPEN READ ONLY;
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT FROM SESSION;
EOF
'''
create_and_run_script(contents, '/tmp/convert_to_active.sh', "/bin/bash /tmp/convert_to_active.sh", sb_container)
Maximum Performance to Maximum Availability¶
For this change we'll use the Database Guard Broker command line tool to make the change
contents = f'''
dgmgrl sys/{sys_password}@{db_name} << EOF
SET ECHO ON;
SHOW CONFIGURATION;
edit database {stby_name} set property logxptmode=SYNC;
edit configuration set protection mode as maxavailability;
SHOW CONFIGURATION;
EOF
'''
create_and_run_script(contents, '/tmp/max_avail.sh', "/bin/bash /tmp/max_avail.sh", p_container)
Maximum Performance to Maximum Protection¶
As before we'll use the Database Guard Broker command line tool to make the change.
contents = f'''
dgmgrl sys/{sys_password}@{db_name} << EOF
SET ECHO ON;
SHOW CONFIGURATION;
edit database {stby_name} set property logxptmode=SYNC;
edit configuration set protection mode as maxprotection;
SHOW CONFIGURATION;
EOF
'''
create_and_run_script(contents, '/tmp/max_prot.sh', "/bin/bash /tmp/max_prot.sh", p_container)
Back to Max Perfromance¶
We'll use Database Guard Broker to change us back to asynchronus mode.
contents = f'''
dgmgrl sys/{sys_password}@{db_name} << EOF
SET ECHO ON;
SHOW CONFIGURATION;
edit configuration set protection mode as maxperformance;
edit database {stby_name} set property logxptmode=ASYNC;
SHOW CONFIGURATION;
EOF
'''
create_and_run_script(contents, '/tmp/max_prot.sh', "/bin/bash /tmp/max_prot.sh", p_container)
Oracle Database 19c Acvtive Data Guard DML Redirect¶
On Oracle Database 19c we can also enable DML redirect from the standby to the primary. I'll add this on the release of the Oracle Database 19c software for on premises.
Jupyter, Python, Oracle and Docker Part 2
Build Docker Oracle Database Base Image¶
The following notebook goes through the process of building an Oracle Docker image of the Oracle Database. If you are just wanting to get the Oracle Database running inside of Docker I strongly suggest that you use the docker files and guides in the Oracle Github repository. The approach documented below is very much for someone who is interested in a high level of control over the various steps in the installation and configuration of the Oracle Database or simply to understand how the various teps work.
Prerequisites¶
The process documented below uses a Jupyter Notebook (iPython). The reason I use this approach and not straight python is that it's easy to change and is self documenting. It only takes a few minutes to set up the environment. I've included a requirements file which makes it very simple to install the needed Python libraries. I go through the process of setting up a Jupyter environment for Mac and Linux here.
Running the notebook¶
Typically the only modification that the user will need to do is to modifythe values in the "Parameters" section. The code can then be run by pressing "Command SHIFT" on a Mac or "Ctrl Shift" on Windows. Or by pressing the "Play" icon in the tool bar. It is also possible to run all of the cells automatically, you can do this from "Run" menu item.
import docker
import os
import tarfile
from prettytable import PrettyTable
from IPython.display import HTML, display, Markdown
import humanize
import re
from ipynb.fs.full.OracleDockerDatabaseFunctions import list_images,list_containers,copy_to,create_and_run_script,containter_exec,copy_string_to_file,containter_root_exec
client = docker.from_env(timeout=600)
Parameters¶
This section details the parameters used to define the docker image you'll end up creating. It's almost certainly the case that you'll need to change the parameters in the first section. The parameters in the second section can be changed if there's something i.e. hostname that you want to change
# You'll need to to change the following two parameters to reflect your environment
host_orabase = '/Users/dgiles/oradata18c' # The directory on the host where you'll stage the persisted datafiles
host_install_dir = '/Users/dgiles/Downloads/oracle18_software' # The directory on the host where the downloaded Oracle Database zip file is.
# You can change any of the following parameters but it's not necssary
p_host_name = 'oracle_db'
oracle_version = '18.0.0'
oracle_base = '/u01/app/oracle'
oracle_home = f'{oracle_base}/product/{oracle_version}/dbhome_1'
db_name = 'ORCL'
oracle_sid= db_name
path=f'{oracle_home}/bin:$PATH'
tns_admin=f'{oracle_base}/oradata/dbconfig'
container_oradata = '/u01/app/oracle/oradata'
r_area = f'{oracle_base}/oradata/recovery_area'
a_area = f'{oracle_base}/admin/ORCL/adump'
container_install_dir = '/u01/install_files'
path = f'{oracle_home}/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin'
Attempt to create the needed directories on the host and print warnings if needed
try:
os.makedirs(host_orabase, exist_ok=True)
os.makedirs(host_install_dir, exist_ok=True)
except:
display(Markdown(f'**WARNING** : Unable to create directories {host_orabase} and {host_install_dir}'))
files = os.listdir(host_install_dir)
found_similar:bool = False
for file in files:
if file.startswith('LINUX.X64'):
found_similar = True
break
if not found_similar:
display(Markdown(f"**WARNING** : Are you sure that you've downloaded the needed Oracle executable to the `{host_install_dir}` directory"))
The first step in creating a usable image is to create a docker file. This details what the docker container will be based on and what needs to be installed. It will use the parameters defined above. It does require network connectivity for this to work as docker will pull down the required images and RPMs.
script = f'''
FROM oraclelinux:7-slim
ENV ORACLE_BASE={oracle_base} \
ORACLE_HOME={oracle_home} \
ORACLE_SID={oracle_sid} \
PATH={oracle_home}/bin:$PATH
RUN yum -y install unzip
RUN yum -y install oracle-database-preinstall-18c
RUN yum -y install openssl
# RUN groupadd -g 500 dba
# RUN useradd -ms /bin/bash -g dba oracle
RUN mkdir -p $ORACLE_BASE
RUN mkdir -p $ORACLE_HOME
RUN mkdir -p {container_install_dir}
RUN chown -R oracle:dba {oracle_base}
RUN chown -R oracle:dba {oracle_home}
RUN chown -R oracle:dba {container_install_dir}
USER oracle
WORKDIR /home/oracle
VOLUME ["$ORACLE_BASE/oradata"]
VOLUME ["{container_install_dir}"]
EXPOSE 1521 8080 5500
'''
with open('Dockerfile','w') as f:
f.write(script)
And now we can create the image. The period of time for this operation to complete will depend on what docker images have already been downloaded/cached and your network speed.
image, output = client.images.build(path=os.getcwd(),dockerfile='Dockerfile', tag=f"linux_for_db{oracle_version}",rm="True",nocache="False")
for out in output:
print(out)
Once the image has been created we can start a container based on it.
db_container = client.containers.create(image.short_id,
command="/bin/bash",
hostname=p_host_name,
tty=True,
stdin_open=True,
auto_remove=False,
name=p_host_name,
shm_size='3G',
# network_mode='host',
ports={1521:1522,5500:5501},
volumes={host_orabase: {'bind': container_oradata, 'mode': 'rw'}, host_install_dir: {'bind': container_install_dir, 'mode': 'rw'}},
environment={'PATH':path,'ORACLE_SID': db_name, 'ORACLE_BASE': oracle_base,'TNS_ADMIN': tns_admin, 'ORACLE_HOME':oracle_home}
)
# volumes={host_orabase: {'bind': oracle_base, 'mode': 'rw'}, host_install_dir: {'bind': container_install_dir, 'mode': 'rw'}},
db_container.start()
p_ip_adress = db_container.attrs['NetworkSettings']['IPAddress']
And then created the needed directory structure within it.
containter_exec(db_container, f'mkdir -p {container_oradata}/{db_name}')
containter_exec(db_container, f'mkdir -p {tns_admin}')
containter_exec(db_container, f'mkdir -p {r_area}/{db_name}')
containter_exec(db_container, f'mkdir -p {a_area}')
containter_exec(db_container, f'mkdir -p {oracle_base}/oraInventory')
containter_exec(db_container, f'mkdir -p {oracle_home}')
containter_root_exec(db_container,'usermod -a -G dba oracle')
Unzip Oracle Database software and validate¶
We now need to unzip the Oracle software which should be located in the host_install_dir
variable. This is unzipped within the container not the host. NOTE: I don't stream the output because it's realtively large. It should take 2-5 minutes.
files = [f for f in os.listdir(host_install_dir) if f.endswith('.zip')]
if files == 0:
display(Markdown(f"**There doesn't appear to be any zip files in the {host_install_dir} directory. This should contain the oracle database for Linux 64bit in its orginal zipped format**"))
else:
display(Markdown(f'Unzipping `{files[0]}`'))
containter_exec(db_container, f'unzip -o {container_install_dir}/{files[0]} -d {oracle_home}', show_output=False, stream_val=False)
And now display the contents of the Oracle Home
display(Markdown('Directory Contents'))
containter_exec(db_container, f'ls -l {oracle_home}')
The next step is to create an Oracle Installer response file to reflect the paremeters we've defined. We're only going to perform a software only install.
script = f'''oracle.install.responseFileVersion=/oracle/install/rspfmt_dbinstall_response_schema_v18.0.0
oracle.install.option=INSTALL_DB_SWONLY
UNIX_GROUP_NAME=dba
INVENTORY_LOCATION={oracle_base}/oraInventory
ORACLE_BASE={oracle_base}
ORACLE_HOME={oracle_home}
oracle.install.db.InstallEdition=EE
oracle.install.db.OSDBA_GROUP=dba
oracle.install.db.OSBACKUPDBA_GROUP=dba
oracle.install.db.OSDGDBA_GROUP=dba
oracle.install.db.OSKMDBA_GROUP=dba
oracle.install.db.OSRACDBA_GROUP=dba
'''
copy_string_to_file(script, f'{oracle_home}/db_install.rsp', db_container)
Run the Oracle Installer¶
Now we can run the Oracle Installer in silent mode with a response file we've just created.
containter_exec(db_container, f'{oracle_home}/runInstaller -silent -force -waitforcompletion -responsefile {oracle_home}/db_install.rsp -ignorePrereqFailure')
containter_root_exec(db_container,f'/bin/bash {oracle_base}/oraInventory/orainstRoot.sh')
containter_root_exec(db_container,f'/bin/bash {oracle_home}/root.sh')
Commit the container to create an image¶
And finally we can commit the container creating an image for future use.
repository_name = 'dominicgiles'
db_container.commit(repository=repository_name,tag=f'db{oracle_version}')
Tidy Up¶
The following code is included to enable you to quickly drop the container and potentially the immage.
db_container.stop()
db_container.remove()
list_containers(client)
list_images(client)
#client.images.remove(image.id)
#list_images()
Jupyter, Python, Oracle and Docker Part 1
Part 1
Part 2
The first discusses what Jupyter Notebooks are and the second describes the installation of the environment I use.
There should be two or three more to follow.
Jupyter Notebooks, Python and Oracle Installation on Mac
Installing Jupyter-lab and Docker environment on Mac¶
The following walk-through guides you through the steps needed to set up your enviroment to run Jupyter-lab, Oracle and Docker to build and run docker images for testing. This should work for either an on premises install or on Oracle's cloud using IaaS (Compute). This walkthough will serve primarily as a reminder to myself.
Prerequisites¶
I tested this walkthrough on Mac with High Sierra but it should work on any moderately modern variant. I'm also assuming a few other things
- You have admin access.
- Comfortable using the Terminal utility.
Install¶
The install is pretty simple. It consists of Installing Homebrew, setting up python, installing Oracle Instant client, installing Git and then cloning this directory to the server. Lets start with setting up the Python Environment
Homebrew Install¶
Homebrew is equivalent on Mac to yum
or apt-get
on Linux. The beauty of it is is that it enables you to install packages in "user" space rather than "admin". It's easy to use and incredibly simple to install. You can either visit the website yourself and follow their instructions or within the terminal run the following command
/usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
It will prompt you through the install. NOTE : It's possible that it may prompt you to install the XCode command line tools. Say yes.
Python Setup¶
Mac has Python installed but it's 2.7 and comes with a whole lot of packages that are needed for this exercise. Mucking around with it's version can lead to awhole lot of trouble. The good news is we can use brew to install an upto date variant. To do this just run the command
brew install python3
This might take a little while but it's worth doing. Brew should also install pip3 as part of the install but it's possible that this failed. To make sure it's usable simply run the following command
brew unlink python && brew link python
pip3
should now work and running a command like
pip3 list
Package Version
---------- -------
pip 18.1
setuptools 40.6.3
virtualenv 16.3.0
wheel 0.32.3
Will show you the installed modules.
Next we can create a virtual environment and enable it.
virtualenv -p /usr/local/bin/python3 myvirtualenv
source myvirtualenv/bin/activate
This will create a directory called myvirtualenv
(you can call it what you like) with it's own version of the python interpreter and pip. Once we "active it", any library we install will only be in this directory and won't effect the system as a whole.
Docker Install¶
You have a couple of choices here. You could just run homebrew to install docker
brew install docker
Which works fine or you might want to use the semi graphical version "Docker Desktop for Mac". This has the small advantage of notifying you of available updates. If so you can download it here
https://hub.docker.com/editions/community/docker-ce-desktop-mac
Git Installation¶
We now need to install Git which is useful for managing and versioning code. That might not be a requirement for you but it also makes it very simple to clone existing repostories. Installing it is very simple.
brew install git
We can now clone my IPython/Jupyter notebooks from github which provide you with the code for creating your own Oracle Docker Images.
git clone https://github.com/domgiles/JuypterLabWork.git
This will create a directory call JuypterLabWork
Installing Oracle Instant Client¶
Installing the Oracle Instant client takes a few more steps on Mac than it does on Linux. Simply vist
Select Instant Client for Mac OS X (Intel x86) (32-bit and 64-bit) and on the next page select the Version 18.1.0.0.0 (64-bit) Basic Package. When it's downloaded simply unzip it into your home directory.
Installing Jupyter-Lab¶
In the JupyterLabWork directory that was created when we ran the git clone command there's a file called requirments.txt
. This is a list of modules needed to run the notebooks in that directory. To install them all you need to do is to run the command
pip install -r requirements.txt
NOTE : You don't need to use pip3 when you have your virtual environment activated.
This will install all of the needed modules. From there all we need to do is to run the command
jupyter-lab
This should take you directly into a browser and the the jupyter environment.
That's it.
Jupyter Notebooks, Python and Oracle Installation on Linux
Installing Jupyter-lab and Docker environment on Linux¶
The following walk-through guides you through the steps needed to set up your enviroment to run Jupyter-lab, Oracle and Docker to build and run docker images for testing. This should work for either an on premises install or on Oracle's cloud using IaaS (Compute). This walkthough will serve primarily as a reminder to myself.
Prerequisites¶
I'm making the assumption that you're running on Linux (I've a similar walkthrough for mac). In my example I'm using Oracle Enterprise Linux 7 (OEL). I'm also assuming a few other things
- Python 3.6 or higher is installed
- You have access to root either directly or via sudo. In this example I'm installing everything in the Oracle user account that has sudo privelege
- Docker is installed. If it isn't, see this excellent guide. If you aren't running as root also make sure you follow this final step as well.
Install¶
The install is pretty simple. It consists of setting up python, installing Oracle Instant client, installing Git and then cloning this directory to the server. Lets start with setting up the Python Environment
Python Setup¶
By default OEL 7 runs Python 2 rather than Python 3 which is likely to change in the future, until then we have a few steps we need to run through. The first is to install pip and virtualenv. We can easily do this on OEL with yum
sudo yum install python36-pip
If this command fails saying something like No package python36-pip available.
then you'll need to edit the yum config file at /etc/yum.repos.d/public-yum-ol7.repo
and enable the software development repos.
The next step is to install virtualenv. Virtualenv enables us to create isolated sandpits to develop Python applications without running into module or library conflicts. Once we have pip installed it's very simple to install
sudo pip3.6 install virtualenv
Next we can create a virtual environment and enable it.
virtualenv -p /usr/bin/python36 myvirtualenv
source myvirtualenv/bin/activate
This will create a directory called myvirtualenv
(you can call it what you like) with it's own version of the python interpreter and pip. Once we "active it", any library we install will only be in this directory and won't effect the system as a whole. You should see you command prompt change when you activate it. It should look something like this
[04:59 PM : oracle@ora18server ~]$ virtualenv -p /usr/bin/python36 myvirtualenv
Running virtualenv with interpreter /usr/bin/python36
Using base prefix '/usr'
No LICENSE.txt / LICENSE found in source
New python executable in /home/oracle/myvirtualenv/bin/python36
Also creating executable in /home/oracle/myvirtualenv/bin/python
Installing setuptools, pip, wheel...
done.
[05:00 PM : oracle@ora18server ~]$ source myvirtualenv/bin/activate
(myvirtualenv) [05:00 PM : oracle@ora18server ~]$ ls
Desktop ora18server-certificate.crt sql swingbench
myvirtualenv OracleUtils sqlcl wallet
(myvirtualenv) [05:00 PM : oracle@ora18server ~]$
Running the following command will show what Python models we have installed at this point.
(myvirtualenv) [05:00 PM : oracle@ora18server ~]$ pip list
Package Version
---------- -------
pip 19.0.3
setuptools 40.8.0
wheel 0.33.1
Which shouldn't be very many.
Git Installation¶
We now need to install Git which is useful for managing and versioning code. That might not be a requirement for you but it also makes it very simple to clone existing repostories. Installing it is very simple.
sudo yum install git
We can now clone my IPython/Jupyter notebooks from github which provides you with the code for creating your own Oracle Docker Images.
git clone https://github.com/domgiles/JuypterLabWork.git
This will create a directory call JuypterLabWork
Installing Oracle Instant Client¶
One of the recent updates to Oracles install models is the support for RPMs and yum installations without the need for click through agreements. This makes it very simple to install a client with a single command
sudo yum install oracle-instantclient18.3-basic
This will typically install the the software into /usr/lib/oracle/18.3
Installing Jupyter-Lab¶
In the JupyterLabWork directory that was created when we ran the git clone command there's a file called requirments.txt
. This is a list of modules needed to run the notebooks in that directory. To install them all you need to do is to run the command
pip install -r requirements.txt
This will install all of the needed modules. From there all we need to do is to run the command
jupyter-lab
If you're running directly on a workstation or virtual machine and have a browser installed it should take you directly into the jupyter environment.
If you're running headless then when jupyter-lab starts it should give you a url that you can connect to. Look for something like
[I 12:52:30.450 NotebookApp] The Jupyter Notebook is running at:
[I 12:52:30.450 NotebookApp] http://oracle18cserver:8888/?token=f71e677e202f5fffc3d20fe458ff973e616e0dc3b8eaf072
[I 12:52:30.450 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
That's it.
Connecting to Oracle ADB from Python
Connect to Oracle ATP or ADW from Python¶
I realised that in some of my previous posts I didn't really detail connecting to ATP and ADW. Here's a slight more in depth walkthough.
Connecting to Oracle Autonomous Transaction Processing or Autonomous Datawarehouse is pretty simple from Python. It requires only a few things
- Oracle Instant Client (Or alternative)
- A Python environment.
- The Oracle_CX Python driver Module.
- A valid wallet for an ATP or ADW service
Let's go through each of these in turn
Oracle Instant Clent¶
The next step is pretty straight forward. You can download the oracle instant client from here.
You'll only need the basic package. Unzip the downloaded file into a suitable location. It's worth pointing out that on Linux this step is even easier. You can now use yum to install the instant client direct from the command line. You can find details on how to configure it here
Python environment¶
There's plenty of guides out there that show you how to install python on your windows or mac. If you haven't done this already This guide is a good place to start. I'm assuming that you've also gone through the steps of installing pip
. If not you can follow this simple guide. I'd also advice you create a virtual environment with virtualenv
before you doing anything else. It's considered best practice and isolates you from current or future library conficts.
First lets create our virtual env
virtualenv adb_virt_env
And then active it (I'm assuming linux or mac)
source adb_virt_env/bin/activate
The next step is to install the Python driver. This is as simple as
pip install cx_Oracle
And thats all we need to do at this stage inside to setup our Python environment.
Oracle ADW or ATP Wallet¶
The final thing we need is the wallet containing the credential and connect string details to enable us to connect to ATP or ADW. You'll need to log onto Oracle OCI console to do this unless have been provided the wallet by a colleague. Simply navigate to your ATP or ADW instance and follow the instructions below.
While it's not necessary we'll download and unzip the wallet into the virtual directory we've created (adb_virt_env).
$> ls
bin cx_Oracle-doc include lib pip-selfcheck.json wallet_SBATP.zip
$> unzip wallet_SBATP.zip
Archive: wallet_SBATP.zip
inflating: cwallet.sso
inflating: tnsnames.ora
inflating: truststore.jks
inflating: ojdbc.properties
inflating: sqlnet.ora
inflating: ewallet.p12
inflating: keystore.jks
$> ls
bin cx_Oracle-doc include lib pip-selfcheck.json tnsnames.ora wallet_SBATP.zip
cwallet.sso ewallet.p12 keystore.jks ojdbc.properties sqlnet.ora truststore.jks
Next we need to edit the sqlnet.ora
file to reflect the location where it's located. Currently for my environment it looks like
WALLET_LOCATION = (SOURCE = (METHOD = file) (METHOD_DATA = (DIRECTORY="?/network/admin")))
SSL_SERVER_DN_MATCH=yes
We'll need to change the DIRECTORY
parameter to our virtual environment. In my case /Users/dgiles/Downloads/adb_virt_env
. So for my environment it will look like
WALLET_LOCATION = (SOURCE = (METHOD = file) (METHOD_DATA = (DIRECTORY="/Users/dgiles/Downloads/adb_virt_env")))
SSL_SERVER_DN_MATCH=yes
We should also take a look at tnsnames.ora to see which services we'll be using. You can do this by taking a look in the tnsnames.ora
file. There's likely to by lots of entries if you have lots of ATB or ADW instances in you OCI compartment. In my instance I'll be using a connect string called sbatp_medium
which has a medium priority but pick the one appropriate to your environment.
sbatp_medium = (description= (address=(protocol=tcps)(port=1522)(host=adb.us-phoenix-1.oraclecloud.com))(connect_data=(service_name=gebqwccvhjbqbs_sbatp_medium.atp.oraclecloud.com))(security=(ssl_server_cert_dn=
"CN=adwc.uscom-east-1.oraclecloud.com,OU=Oracle BMCS US,O=Oracle Corporation,L=Redwood City,ST=California,C=US")) )
We'll only need to remember its name for the next step.
The Code¶
Finally we're ready to write some code. The first step is to import the modules we'll need. In this case it's just cx_oracle
and os
import cx_Oracle
import os
We need to set the environment variable TNS_ADMIN
to point at our directory (adb_virt_env) where all of the files from our wallet are located.
os.environ['TNS_ADMIN'] = '/Users/dgiles/Downloads/adb_virt_env'
And now we can simply connect to ATP or ADW instance using a standard Python database connect operation using the connect string we remebered from the tnsnames.ora
file. NOTE : I'm assuming you've created a user in the database or you're using the admin user created for this instance.
connection = cx_Oracle.connect('admin', 'ReallyLongPassw0rd', 'sbatp_medium')
And thats it... From here on in we can use the connection as it was a local database.
cursor = connection.cursor()
rs = cursor.execute("select 'Hello for ADB' from dual")
rs.fetchall()
Update to MonitorDB
You can add the location in the configuration file or
On the command line
I've also compiled it for Java8 and used the latest jdbc drivers.
You can find it here
Oracle SODA Python Driver and Jupyter Lab
Oracle SODA Python Driver and Jupyter Lab¶
This workbook is divided into two sections the first is a quick guide to setting up Jupyter Lab (Python Notebooks) such that it can connect to a database running inside of OCI, in this case an ATP instance. The second section uses the JSON python driver to connect to the database to run a few tests. This notebook is largely a reminder to myself on how to set this up but hopefully it will be useful to others.
Setting up Python 3.6 and Jupyter Lab on our compute instance¶
I won't go into much detail on setting up ATP or ADW or creating a IaaS server. I covered that process in quite a lot of detail here. We'll be setting up something similar to the following
Once you've created the server You'll need to logon to the server with the details found on the compute instances home screen. You just need to grab it's IP address to enable you to logon over ssh.
The next step is to connect over ssh to with a command similar to
ssh opc@132.146.27.111
Enter passphrase for key '/Users/dgiles/.ssh/id_rsa':
Last login: Wed Jan 9 20:48:46 2019 from host10.10.10.1
In the following steps we'll be using python so we need to set up python on the server and configure the needed modules. Our first step is to use yum to install python 3.6 (This is personal preference and you could stick with python 2.7). To do this we first need to enable yum and then install the environment. Run the following commands
sudo yum -y install yum-utils
sudo yum-config-manager --enable ol7_developer_epel
sudo yum install -y python36
python3.6 -m venv myvirtualenv
source myvirtualenv/bin/activate
This will install python and enable a virtual environment for use (our own Python sand pit). You can make sure that python is installed by simply typing python3.6 ie.
$> python3.6
Python 3.6.3 (default, Feb 1 2018, 22:26:31)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-16)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> quit()
Make sure you type quit()
to leave the REPL environment.
We now need to install the needed modules. You can do this one by one or simply use the following file requirements.txt and run the following command
pip -p requirements.txt
This will install all of the need python modules for the next step which is to start up Jupyter Lab.
Jupyter Lab is an interactive web based application that enables you do interactively run code and document the process at the same time. This blog is written in it and the code below can be run once your environment is set up. Vists the website to see more details.
To start jupyer lab we run the following command.
nohup jupyter-lab --ip=127.0.0.1 &
Be aware that this will only work if you have activated you virtual environment. In out instance we did this with with the command source myvirtualenv/bin/activate
. At this point the jupyter-lab is running in the background and and is listening (by default) on port 8888. You could start a desktop up and use VNC to view the output. However I prefer to redirect the output to my own desktop over ssh. To do this you'll need to run the following ssh command from your desktop
ssh -N -f -L 5555:localhost:8888 opc@132.146.27.111
Replacing the IP address above with the one for your compute instance
This will redirect the output of 8888 to port 5555 on your destop. You can then connect to it by simply going to the following url http://localhost:5555
. After doing this you should see a screen asking you input a token (you'll only need to do this once). You can find this token inside of the nohup.out
file running on the compute instance. It should be near the top of the file and should look something like
[I 20:49:12.237 LabApp] http://127.0.0.1:8888/?token=216272ef06c7b7cb3fa8da4e2d7c727dab77c0942fac29c8
Just copy the text after "token=" and paste it in to the dialogue box. After completing that step you should see something like this
You can now start creating your own notebooks or load this one found here. I'd visit the website to familiarise yourself on how the notebooks work.
Using Python and the JSON SODA API¶
This section will walk through using The SODA API with Python from within the Jupyter-lab envionment we set up in the previous section. The SODA API is a simple object API that enables developers persist and retrieve JSON documents held inside of the Oracle Database. SODA drivers are available for Java, C, REST, Node and Python.
You can find the documentation for this API here
To get started we'll need to import the need the following python modules
import cx_Oracle
import keyring
import os
import pandas as pd
We now need to set the location of the directory containing the wallet to enable us to connect to the ATP instance. Once we've done that we can connect to the Oracle ATP instance and get a SODA object to enable us to work with JSON documents. NOTE : I'm using the module keyring
to hide the password for my database. You should replace this call with the password for your user.
os.environ['TNS_ADMIN'] = '/home/opc/Wallet'
connection = cx_Oracle.connect('json', keyring.get_password('ATPLondon','json'), 'sbatp_tpurgent')
soda = connection.getSodaDatabase()
We now need to create JSON collection and if needed add any additional indexes which might accelerate data access.
try:
collection = soda.createCollection("customers_json_soda")
collection.createIndex({ "name" : "customer_index",
"fields" : [ { "path" : "name_last",
"datatype" : "string"}]})
except cx_Oracle.DatabaseError as ex:
print("It looks like the index already exists : {}".format(ex))
We can now add data to the collection. Here I'm inserting the document into the database and retrieving it's key. You can find find some examples/test cases on how to use collections here
customer_doc = {"id" : 1,
"name_last" : "Giles",
"name_first" : "Dom",
"description" : "Gold customer, since 1990",
"account info" : None,
"dataplan" : True,
"phones" : [{"type" : "mobile", "number" : 9999965499},
{"type" : "home", "number" : 3248723987}]}
doc = collection.insertOneAndGet(customer_doc)
connection.commit()
To fetch documents we could use SQL or Query By Example (QBE) as shown below. You can find further details on QBE here. In this example there should just be a single document. NOTE: I'm simply using pandas DataFrame
to render the retrieved data but it does highlight how simple it is to use the framework for additional analysis at a later stage.
documents = collection.find().filter({'name_first': {'$eq': 'Dom'}}).getDocuments()
results = [document.getContent() for document in documents]
pd.DataFrame(results)
To update records we can use the replaceOne
method.
document = collection.find().filter({'name_first': {'$eq': 'Dom'}}).getOne()
updated = collection.find().key(doc.key).replaceOne({"id" : 1,
"name_last" : "Giles",
"name_first" : "Dominic",
"description" : "Gold customer, since 1990",
"account info" : None,
"dataplan" : True,
"phones" : [{"type" : "mobile", "number" : 9999965499},
{"type" : "home", "number" : 3248723987}]},)
connection.commit()
And just to make sure the change happened
data = collection.find().key(document.key).getOne().getContent()
pd.DataFrame([data])
And finally we can drop the collection.
try:
collection.drop()
except cx_Oracle.DatabaseError as ex:
print("We're were unable to drop the collection")
connection.close()
Creating, Querying and Loading Data in to the Oracle Autonomous Data Warehouse
Oracle Autonomous Data Warehouse (ADW) access via Python¶
The following shows how to access the Oracle Autonomous Data Warehouse using Python and how to load the data using the DBMS_CLOUD package via the cx_Oracle module. This is obviously simpler via the Graphical front end or SQL Developer but using Python provdes a simple scriptable model whilst hiding some of the complexities of using native REST APIs.
This simple example assumes that you've got an Oracle Cloud account and that you've created or got access an ADW database. You'll also have to download the credentials file to provide SQL*Net access. You can find the details on how to do that here. We'll be using Python in this short example but most of what we're doing could be achieved using the GUI and/or REST Calls.
Connecting to ADW Instance¶
To start with we'll make sure we can connect to the ADW Instance we've previously created. To do that we need to import the required libraries. If you dodn't have these I reccommend using PIP (and virtualenv)
import cx_Oracle
import keyring
import os
We need to use an environment variable to reflect the location of the downloaded credentials files to be used by SQL*Net.
os.environ['TNS_ADMIN'] = '/Users/dgiles/Downloads/wallet_DOMSDB'
This is equivlent to bash
export TNS_ADMIN=/Users/dgiles/Downloads/wallet_DOMSDB
and points to the unzipped directory containing the tnsnames.ora, sqlnet.ora etc. NOTE: you'll need to update the sqlnet.ora to ensure the wallet points to the same directory specified in the TNS_ADMIN environment variable. i.e.
WALLET_LOCATION = (SOURCE = (METHOD = file) (METHOD_DATA = (DIRECTORY="/Users/dgiles/Downloads/wallet_DOMSDB")))
SSL_SERVER_DN_MATCH=yes
In the example above I've changed DIRECTORY to the location where I downloaded and unzipped the credentials file.
The next steps are to connect to the Oracle ADW instance. In the example below I've store my password using the Python Module "keyring". I'm also using the jupyter notebook magic sql functionality. We'll test the connection using the admin user and the connect string "domsdb_medium" which is one of the services included in the tnsnames.ora file.
%load_ext sql
password = keyring.get_password('adw','admin')
%sql oracle+cx_oracle://admin:$password@domsdb_medium
%%sql admin@domsdb_medium
select 1 from dual
Generating test data¶
We've connected to the oracle database and can now start uploading data to the instance. In this example we'll use datagenerator to generate the data into flat files and then place these on Oracle Object Store and load them from there.
The first step is to install datagenerator. You can find details on how to do that here. We can now simply generate data for the "SH" benchmark.
import subprocess
# Change the 2 following parameters to reflect your environment
generated_data_dir = '/Users/dgiles/Downloads/generated_data'
datagenerator_home = '/Users/dgiles/datagenerator'
# Change the following paramters relating to the way datagenerator will create the data
scale = 100
parallel = 8
dg_command = '{dg}/bin/datagenerator -c {dg}/bin/sh.xml -scale {s} -cl -f -d {gdd} -tc {p}'.format(
dg = datagenerator_home,
s = scale,
gdd = generated_data_dir,
p = parallel
)
# Typically we'd use a command similiar to the one below but since we're in a notebook it's easier to use the default functionality
# p = subprocess.Popen(command, shell=True, stdout=subprocess.PIPE)
# (output, err) = p.communicate()
print(dg_command)
!{dg_command}
We should now have a series of files in the "generated_data_dir" directory. These will be a mix of csv files, create table scripts, loader scripts etc.
!ls {generated_data_dir}
Uploading the data to the Oracle Object Store¶
We're really only interested in the "csv" files so we'll upload just those. But before we do this we'll need to establish a connection to the Oracle Object Store. I give some detail behind how to do this in this notebook. I'll be using the object storage out of the Frankfurt Region.
import oci
import ast
my_config = ast.literal_eval(keyring.get_password('oci_opj','doms'))
my_config['region'] = 'eu-frankfurt-1'
object_storage_client = oci.object_storage.ObjectStorageClient(my_config)
namespace = object_storage_client.get_namespace().data
We've now got a handle to the Oracle Object Store Client so we can now create a bucket which we'll call and upload the "CSV" Files.
import os, io
bucket_name = 'Sales_Data'
files_to_process = [file for file in os.listdir(generated_data_dir) if file.endswith('csv')]
try:
create_bucket_response = object_storage_client.create_bucket(
namespace,
oci.object_storage.models.CreateBucketDetails(
name=bucket_name,
compartment_id=my_config['tenancy']
)
)
except Exception as e:
print(e.message)
for upload_file in files_to_process:
print('Uploading file {}'.format(upload_file))
object_storage_client.put_object(namespace, bucket_name, upload_file, io.open(os.path.join(generated_data_dir,upload_file),'r'))
We need to create an authentication token that can be used by the ADW instance to access our Object storage. To do this we need to create an identity client.
indentity_client = oci.identity.IdentityClient(my_config)
token = indentity_client.create_auth_token(
oci.identity.models.CreateAuthTokenDetails(
description = "Token used to provide access to newly loaded files"
),
user_id = my_config['user']
)
Creating users and tables in the ADW Instance¶
The following steps will feel very familiar to any DBA/developer of and Oracle database. We need to create a schema and assocated tables to load the data into.
First we'll need to create a user/schema and grant it the appropriate roles
%sql create user mysh identified by ReallyLongPassw0rd default tablespace data
Grant the "mysh" user the DWROLE
%sql grant DWROLE to mysh
%sql oracle+cx_oracle://mysh:ReallyLongPassw0rd@domsdb_medium
We can now create the tables we'll use to load the data into.
%%sql mysh@domsdb_medium
CREATE TABLE COUNTRIES (
COUNTRY_ID NUMBER NOT NULL,
COUNTRY_ISO_CODE CHAR(2) NOT NULL,
COUNTRY_NAME VARCHAR2(40) NOT NULL,
COUNTRY_SUBREGION VARCHAR2(30) NOT NULL,
COUNTRY_SUBREGION_ID NUMBER NOT NULL,
COUNTRY_REGION VARCHAR2(20) NOT NULL,
COUNTRY_REGION_ID NUMBER NOT NULL,
COUNTRY_TOTAL NUMBER(9) NOT NULL,
COUNTRY_TOTAL_ID NUMBER NOT NULL,
COUNTRY_NAME_HIST VARCHAR2(40)
)
%%sql mysh@domsdb_medium
CREATE TABLE SALES (
PROD_ID NUMBER NOT NULL,
CUST_ID NUMBER NOT NULL,
TIME_ID DATE NOT NULL,
CHANNEL_ID NUMBER NOT NULL,
PROMO_ID NUMBER NOT NULL,
QUANTITY_SOLD NUMBER(10) NOT NULL,
AMOUNT_SOLD NUMBER(10) NOT NULL
)
%%sql mysh@domsdb_medium
CREATE TABLE SUPPLEMENTARY_DEMOGRAPHICS (
CUST_ID NUMBER NOT NULL,
EDUCATION VARCHAR2(21),
OCCUPATION VARCHAR2(21),
HOUSEHOLD_SIZE VARCHAR2(21),
YRS_RESIDENCE NUMBER,
AFFINITY_CARD NUMBER(10),
BULK_PACK_DISKETTES NUMBER(10),
FLAT_PANEL_MONITOR NUMBER(10),
HOME_THEATER_PACKAGE NUMBER(10),
BOOKKEEPING_APPLICATION NUMBER(10),
PRINTER_SUPPLIES NUMBER(10),
Y_BOX_GAMES NUMBER(10),
OS_DOC_SET_KANJI NUMBER(10),
COMMENTS VARCHAR2(4000)
)
%%sql mysh@domsdb_medium
CREATE TABLE CUSTOMERS (
CUST_ID NUMBER NOT NULL,
CUST_FIRST_NAME VARCHAR2(20) NOT NULL,
CUST_LAST_NAME VARCHAR2(40) NOT NULL,
CUST_GENDER CHAR(1) NOT NULL,
CUST_YEAR_OF_BIRTH NUMBER(4) NOT NULL,
CUST_MARITAL_STATUS VARCHAR2(20),
CUST_STREET_ADDRESS VARCHAR2(40) NOT NULL,
CUST_POSTAL_CODE VARCHAR2(10) NOT NULL,
CUST_CITY VARCHAR2(30) NOT NULL,
CUST_CITY_ID NUMBER NOT NULL,
CUST_STATE_PROVINCE VARCHAR2(40) NOT NULL,
CUST_STATE_PROVINCE_ID NUMBER NOT NULL,
COUNTRY_ID NUMBER NOT NULL,
CUST_MAIN_PHONE_NUMBER VARCHAR2(25) NOT NULL,
CUST_INCOME_LEVEL VARCHAR2(30),
CUST_CREDIT_LIMIT NUMBER,
CUST_EMAIL VARCHAR2(40),
CUST_TOTAL VARCHAR2(14) NOT NULL,
CUST_TOTAL_ID NUMBER NOT NULL,
CUST_SRC_ID NUMBER,
CUST_EFF_FROM DATE,
CUST_EFF_TO DATE,
CUST_VALID VARCHAR2(1)
)
%%sql mysh@domsdb_medium
CREATE TABLE CHANNELS (
CHANNEL_ID NUMBER NOT NULL,
CHANNEL_DESC VARCHAR2(20) NOT NULL,
CHANNEL_CLASS VARCHAR2(20) NOT NULL,
CHANNEL_CLASS_ID NUMBER NOT NULL,
CHANNEL_TOTAL VARCHAR2(13) NOT NULL,
CHANNEL_TOTAL_ID NUMBER NOT NULL
)
%%sql mysh@domsdb_medium
CREATE TABLE PRODUCTS (
PROD_ID NUMBER(6) NOT NULL,
PROD_NAME VARCHAR2(50) NOT NULL,
PROD_DESC VARCHAR2(4000) NOT NULL,
PROD_SUBCATEGORY VARCHAR2(50) NOT NULL,
PROD_SUBCATEGORY_ID NUMBER NOT NULL,
PROD_SUBCATEGORY_DESC VARCHAR2(2000) NOT NULL,
PROD_CATEGORY VARCHAR2(50) NOT NULL,
PROD_CATEGORY_ID NUMBER NOT NULL,
PROD_CATEGORY_DESC VARCHAR2(2000) NOT NULL,
PROD_WEIGHT_CLASS NUMBER(3) NOT NULL,
PROD_UNIT_OF_MEASURE VARCHAR2(20),
PROD_PACK_SIZE VARCHAR2(30) NOT NULL,
SUPPLIER_ID NUMBER(6) NOT NULL,
PROD_STATUS VARCHAR2(20) NOT NULL,
PROD_LIST_PRICE NUMBER(8) NOT NULL,
PROD_MIN_PRICE NUMBER(8) NOT NULL,
PROD_TOTAL VARCHAR2(13) NOT NULL,
PROD_TOTAL_ID NUMBER NOT NULL,
PROD_SRC_ID NUMBER,
PROD_EFF_FROM DATE,
PROD_EFF_TO DATE,
PROD_VALID VARCHAR2(1)
)
%%sql mysh@domsdb_medium
CREATE TABLE PROMOTIONS (
PROMO_ID NUMBER(6) NOT NULL,
PROMO_NAME VARCHAR2(30) NOT NULL,
PROMO_SUBCATEGORY VARCHAR2(30) NOT NULL,
PROMO_SUBCATEGORY_ID NUMBER NOT NULL,
PROMO_CATEGORY VARCHAR2(30) NOT NULL,
PROMO_CATEGORY_ID NUMBER NOT NULL,
PROMO_COST NUMBER(10) NOT NULL,
PROMO_BEGIN_DATE DATE NOT NULL,
PROMO_END_DATE DATE NOT NULL,
PROMO_TOTAL VARCHAR2(15) NOT NULL,
PROMO_TOTAL_ID NUMBER NOT NULL
)
%%sql mysh@domsdb_medium
CREATE TABLE times AS
SELECT udate time_id,
TO_CHAR(udate,'Day' day_name,
TO_CHAR(udate,'DD' day_number_in_month,
TO_CHAR(udate,'DDD' day_number_in_year,
TO_CHAR(udate,'YYYY' ) calendar_year,
TO_CHAR(udate,'Q' ) calendar_quarter_number,
TO_CHAR(udate,'MM' ) calendar_month_number,
TO_CHAR(udate,'WW' ) calendar_week_number,
TO_CHAR(udate,'YYYY-MM' calendar_month_desc,
TO_CHAR(udate,'YYYY-Q' calendar_quarter_desc
FROM
(SELECT to_date('31/12/1994','DD/MM/YYYY'+rownum udate
FROM all_objects
WHERE to_date('31/12/1994','DD/MM/YYYY'+rownum <= to_date( '31/12/2014','DD/MM/YYYY'
)
%%sql mysh@domsdb_medium
select * from tab
Copying the data from the object store¶
We need to add the authorisation token to the newly created schema to allow it to access the object stores files. We can't do this using the sql magic syntax we've been using till this point so we'll do it using standard cx_Oracle calls.
connection = cx_Oracle.connect('mysh', 'ReallyLongPassw0rd', 'domsdb_medium')
cursor = connection.cursor();
cursor.callproc('DBMS_CLOUD.create_credential', keywordParameters = {'credential_name':'SALES_DATA_AUTH',
'username':'dominic.giles@oracle.com',
'password':token.data.token})
We can access the object storage using a url of the the format
https://swiftobjectstorage.<region>.oraclecloud.com/v1/<tenancy>/<bucket name>/<object name>
We can use this to dynamically generate a url for each of the objects inside of the bucket we've just created and use the DBMS_CLOUD package to copy the data into the ADW instance. The code below gets all of the names of the tables we've just created and the loops through each table copying the associated csv file into the ADW instance.
from tqdm import tqdm
format = '''{"delimiter" : ",",
"skipheaders" : 1,
"ignoremissingcolumns" : "true",
"removequotes" : "true",
"dateformat" : "DD-MON-YYYY HH24:MI:SS",
"blankasnull" : "true"}'''
file_location = '''https://swiftobjectstorage.{region}.oraclecloud.com/v1/{tenancy}/{bucket_name}/{table_name}.csv'''
region = my_config['region']
tenancy= 'oracleonpremjava'
rs = cursor.execute("select table_name from user_tables where table_name not like 'COPY%'")
rows = rs.fetchall()
for row in tqdm(rows):
url = file_location.format(region=region, tenancy=tenancy, bucket_name=bucket_name, table_name=row[0])
cursor.callproc('DBMS_CLOUD.copy_data', keywordParameters= {'table_name':row[0],
'credential_name':'SALES_DATA_AUTH',
'file_uri_list':url,
'format': format
})
We can now take a look and see how many rows we've loaded into the tables
rs = cursor.execute("select table_name from user_tables where table_name not like 'COPY%'")
rows = rs.fetchall()
for row in rows:
rs2 = cursor.execute("select count(*) from {}".format(row[0]))
rows2 = rs2.fetchone()
print('{tn: <35}{rc:>10,}'.format(tn=row[0],rc=rows2[0]))
It's now possible to run standard queries against the newly loaded data. No need to create anything else (indexes etc.)
%%sql mysh@domsdb_medium
SELECT channels.channel_desc,
countries.country_iso_code,
TO_CHAR(SUM(amount_sold), '9,999,999,999' SALES$
FROM sales, customers, times, channels, countries
WHERE sales.time_id=times.time_id
AND sales.cust_id=customers.cust_id
AND customers.country_id = countries.country_id
AND sales.channel_id = channels.channel_id
AND channels.channel_desc IN ('Tele Sales','Internet'
AND times.calendar_year = '2006'
AND countries.country_iso_code IN ('GB','DE','FR','DK'
GROUP BY
ROLLUP(channels.channel_desc,countries.country_iso_code)
ORDER BY 1
Tidying up the object store and database¶
You can run the following steps if you want to remove all of the tables from the schema and purge the object store of files. Lets start by removing the tables. NOTE : The code below will remove all of the tables from the schema. Make sure you've not got anything in the schema that you want to keep before running it.
from tqdm import tqdm
rs = cursor.execute("select table_name from user_tables")
rows = rs.fetchall()
for row in tqdm(rows):
rs2 = cursor.execute("drop table {} purge".format(row[0]))
And then removing the object store files and bucket
object_list = object_storage_client.list_objects(namespace, bucket_name)
for o in object_list.data.objects:
print('Deleting object {}'.format(o.name))
object_storage_client.delete_object(namespace, bucket_name, o.name)
print('Deleting bucket')
response = object_storage_client.delete_bucket(namespace, bucket_name)
Making the alert log just a little more readable
But what if you are only looking after one or two or just testing something out? Well the most common solution is to simply tail the alert log file.
The only issue is that it's not the most exciting thing to view, this of course could be said for any terminal based text file. But there are things you can do to make it easier to parse visually and improve your chances of catching an unexpected issue.
The approach I take is to push the alert log file through python and use the various libraries to brighten it up. It's very easy to go from this (tail -f)
To this
The reason this works is that python provides a rich set of libraries which can add a little bit of colour and formatting to the alert file.
You can find the code to achieve this in the gist below
Just a quick note on installing this. You'll need either python 2.7 or 3 available on your server.
I'd also recommend installing pip and then the following libraries
pip install humanize psutil colorama python-dateutil
After you've done that it's just a case of running the script. If you have $ORACLE_BASE and $ORACLE_SID set the library will try and make a guess at the location of the alert file. i.e
python alertlogparser.py
But if that doesn't work or you get an error you can also explicitly specify the location of the alert log with something like
python alertlogparser.py -a $ORACLE_BASE/diag/rdbms/orcl/orcl/trace/alert_orcl.log
This script isn't supposed to be an end product just a simple example of what can be achieved to make things a little easier. And whilst I print information like CPU load and Memory there's nothing to stop you from modifying the script to display the number of warnings or errors found in the alert log and update it things change. Or if you really want to go wild implement something similar but a lot more sophisticated using python and curses
The age of "Terminal" is far from over….
Changing the size of redo logs in python
The following code works it's way through the redo log files drops one thats inactive and then simply recreates it. It finished when it's set all of the redo to the right size.
Running the script is simply a case of running it with the parameters shown below
python ChangeRedoSize -u sys -p welcome1 -cs myserver/orclcdb --size 300
Note : the user is the sysdba of the container database if you are using the multitenant arhcitecture and the size is in Mega Bytes.
You should then see something similar to the following
Current Redo Log configuration +-----------+------------+--------------+-----------+---------------+----------+ | Group No. | Thread No. | Sequence No. | Size (MB) | No of Members | Status | +-----------+------------+--------------+-----------+---------------+----------+ | 1 | 1 | 446 | 524288000 | 1 | INACTIVE | | 2 | 1 | 448 | 524288000 | 1 | CURRENT | | 3 | 1 | 447 | 524288000 | 1 | ACTIVE | +-----------+------------+--------------+-----------+---------------+----------+ alter system switch logfile alter system switch logfile alter database drop logfile group 2 alter database add logfile group 2 size 314572800 alter system switch logfile alter database drop logfile group 1 alter database add logfile group 1 size 314572800 alter system switch logfile alter system switch logfile alter system switch logfile alter system switch logfile alter database drop logfile group 3 alter database add logfile group 3 size 314572800 alter system switch logfile All logs correctly sized. Finishing... New Redo Log configuration +-----------+------------+--------------+-----------+---------------+----------+ | Group No. | Thread No. | Sequence No. | Size (MB) | No of Members | Status | +-----------+------------+--------------+-----------+---------------+----------+ | 1 | 1 | 455 | 314572800 | 1 | ACTIVE | | 2 | 1 | 454 | 314572800 | 1 | INACTIVE | | 3 | 1 | 456 | 314572800 | 1 | CURRENT | +-----------+------------+--------------+-----------+---------------+----------+
Notes on pre-parsing data for Oracle data loads
Now typically the approach many people would take would be to develop some form of program that parses all of the data and writes it to staging area and then loads all of it in one go to the target database. I make no criticism of this approach since it works well and as long as its not time critical. It's by far the simplest method. However Im a big fan of taking advantage of whats already available and one of the most underused and powerful features of the Oracle database is the preparser. It enables you to pipeline various operations so they all run as quickly as possible. So going back to my benchmark we used this approach to load data into out target database. It consisted of 4 steps
- Read the data of the filesystem as efficiently as possible and write it to stdout
- Read from stdin and Unzip the the data writing it to stdout
- Read from stdin into a java program to do the key value mapping and error detection/correction writing the output to stdout
- Read from stdin into sqlloader
The java program simple reads from stdin and writes to stdout. To handle key value pairs just required the program to read the mapping file in and split and parse the values from stdin. The data was then written to stdout in a well know order.
Java extract from my program...
BufferedReader br = new BufferedReader(new InputStreamReader(System.in), OneMB); String line = null; HashMapkeyValuePairs = null; MyTokenizer mt = null; while ((line = br.readLine()) != null) { keyValuePairs = new HashMap (200); mt = new MyTokenizer(line, delimitor); for (String token : mt) { int loc = token.indexOf("="); if (loc != -1) { String i = token.substring(0, loc); String s = token.substring(loc + 1, token.length()); keyValuePairs.put(i, s); } StringBuffer outRec = new StringBuffer(1000); outRec.append(checkForNull(keyValuePairs.get("uniqueID"), "")).append(seperator); // mapping logic similar to above repeats System.out.println(outRec.toString()); }
All that was needed for sqlloader to process the files was a control file that understood the order of the columns and any additional formatting.
One of the additional benefits is that we can load the data via "direct path" and implement other features such as multi table insert. The Java preparser enables you to add all of the additional formatting to make this a trivial process.
The following diagram illustrates the process.
This equates into a Unix/Linux statement such as
/bin/dd if=myverybigfile.txt bs=1024k status=noxfer 2>/dev/null | /bin/gunzip -c | java -classpath /home/oracle/loader.jar com.dom.KeyValueParserStdIn | sqlloader bench/bench control=kv.ctl data=\"-\" direct=TRUE;
NOTE : one thing you may have noticed is that Im using dd to do 1MB I/Os. This just an efficiency operation and works well on structures such as DBFS, you could skip this part of the operation if needed.
Which brings us onto external tables and the preparser
External Tables and pre-parsers
As I mentioned earlier I like to take advantage of functionality that's already available and one of those features in the Oracle database is external tables. I don't intend to go into much detail as to why you should use external tables other than they do much of the heavy lifting for you and they provide a seamless interface between the filesystem and the database. They effectively make files look like tables.
This means it's trivial to implement parallelism for our pre-parser. We don't need to worry about how to handle the files and how to schedule everything, external tables take care of all of that for you. In our benchmark we used them in the following way
Our previous pipeline remains the same except that we don't need sqlloader its all managed by the table definition itself. So we end up with something similar to the following for the table definition (I've abbreviated it quite substantially and highlighted the important bits)
create table staging_ext_mydata_jan01 ( uniqueid NUMBER, .. -- Lots of columns .. ) ORGANIZATION EXTERNAL ( TYPE oracle_loader DEFAULT DIRECTORY load_dir_jan01 ACCESS PARAMETERS ( RECORDS DELIMITED BY NEWLINE PREPROCESSOR exec_dir:'external_tab.sh' BADFILE log_dir: 'external.bad' LOGFILE log_dir: 'external.log' fields terminated by '|' OPTIONALLY ENCLOSED BY '"' AND '"' ( uniqueid char(100), .. -- Lots of defintions .. ) ) LOCATION('data1.txt',data2.txt'...'data100.txt')) REJECT LIMIT UNLIMITED;
One of the things to note is that I've included the pipelined preprocessor inside of a shell script which looks like this
/bin/dd if=$1 bs=1024k status=noxfer 2>/dev/null | /bin/gunzip -c | java -classpath /home/oracle/loader.jar com.dom.KeyValueParserStdIn
The important part of this script is the parameter ($1) that is passed to the shell script. This is the file name that the external table wants to process.
A simple select statement from my "staging_ext_mydata_jan01" unzips and parses the data converting it to usable columns. Whats more if I issue the statement in parallel Oracle takes care of creating the processes for me and making sure everything is scheduled in an orderly fashion.
To finish the load we simply used a multi table insert to put the data into the correct tables in an efficient fashion. Using this approach we were able to read zipped files, parse them and insert them into our three target tables at over 1.5 million source records/sec.
Timing groups of SQL operations
Obviously turning on is achieved with the “set timing on” operation. i.e
SQL > set timing on
SQL > select count(1) from all_objects;
COUNT(1)
----------
68653
Elapsed: 00:00:03.95
SQL>
Which is great but what if want to time mulitiple operations. Use the timing function and simply give the timer a name, in this case statement timer.
SQL> timing start statement_timer
SQL> select count(1) from all_objects;
COUNT(1)
----------
68653
SYS@orcl > /
COUNT(1)
----------
68653
SQL> timing show statement_timer;
timing for: statement_timer
Elapsed: 00:00:30.85
SQL>
Which times anything that went on in between the timer starting and finishing. In this case also my typing of the commands. Its a fantastic utility for timing stages in a batch job including call outs to os operations.
Datawarehousing benchmark
2 Intel based servers.
- 4 CPUs with Dual Cores (hyper threading enabled).
- 64 GB of memory in each
- 2 dual ported 4Gb HBAs
- 2 dual ported 1Gb Nics
- 10 low cost storage arrays
- 20 controllers
- 20 trays of disk with 14 15k 36GB drives in each
- 2 mid range 16 port fibre switches
- 1 mid range 16 port 1Gb switch (I know we really should have had two of these)
The point here is that the hardware was low end but with the ability to deliver plenty of CPU and I/O at very attractive price point. For those that aren't that interested in any more of the details the headline figures where that we performed full table scans at over 1.6GB/sec but due to compression we achieved logically three times this figure.
We used a stock 2.6 64 bit Linux kernel we modified the following kernel settings
- wem, rmem were to set to 512k to reflect the fact that we were going to use 32k block sizes
- Huge pages were enabled... its a much more sensible way of managing the shared memory required for the SGA
- We enabled jumbo frames in the switch and set the MTU on the interconnect to 9000bytes
- We used the deadline scheduler to improve I/O prioritisation.
- Multipath was used to provide DMP over the 4 paths to disk from each of the servers.
The Oracle database was largely unchanged from the defaults with the exception of the following settings
- 16GB for SGA and PGA
- The fact table was compressed
- 32k block sized to improve compression ratios
- Two parallel instance groups to ensure execution of certain queries on a given node.
- Query rewrite was enabled.
- ocfs2 was used for the quorum and ocr files
- ASM was used to provide storage for the datafiles etc. Each physical tray of disk was presented to ASM as logical disk of roughly 500Gb.
- ASMlib was used to persist disk identity and simplify configuration.
The tests consisted of a series of SQL statements that were run serial, concurrently and as a series of streams. I cant give much in the way of specifics for obvious reasons but the machine ran at about 80% utilization during the test most of the queries returned sub minute when run against the full data and many sub second when materialized views were exploited.
However the real benefit came from the fact that low the cost commodity hardware made the system relatively cheap and easy to put together. Like any system there were things we would have liked to have done differently if we'd had the chance... The benchmark was done in 10 days (OS Install, disk, layout, Oracle install, data load etc.) and so things were done in a rush and with a little more thought could have been done much more efficiently... We are fairly certain we could have improved the I/O performance but we didn't have the luxury of experimentation and so we made a call and lived with the consequences... I strongly recommend that any one building one of these systems spends some time with Oracle Orion disk benchmarking system to determine an optimal layout. A recent customer evaluating several possible disk layout configurations showed over 100% difference between two subtly different versions.
On the subject of I/O
- CPUs, have we enough. Fast as possible.
- Memory, as much as we can put in the box. Oracle don't charge us for that
- SAN, big as possible.
At this stage the purchase order is usually given the nod and the hardware supplier will ship yet another run of the mill box. Don't get me wrong. Many experienced DBAs have been through this process many times before and realise that not only is the list in the wrong order but its missing some critical components.
- HBAs, need to specify these in proportion to the CPUs and attached storage
- NICs, might need a lot of these i.e public, cluster interconnect, storage, management, backup. And typically in multiples for resillience or performance.
- Backup, are we using the existing backup infrastructure?
I don't blame anyone for this way of thinking, its the way its always been. When discussing a new server the first question that people tend to ask is "So whats this monster packing? 16 CPUs!!!" followed by lots of very macho grunts and hollering. The standard licensing model (not just Oracles) doesn't help. It starts with premise of a CPU describing the power of a server, and to a large degree it does but misses the point of what a database is all about and that's information. Typically that information is held in ones and zeros on a bunch of spinning scrap metal. The real power of a database comes from its ability to aggregate, analyze and process those ones and zeros, turn it into information and push results out to interested parties. Paraphrasing a little "Its all about I/O stupid".
With this in mind I'm constantly surprised by the imbalance of I/O put into servers both disk and network. Its not unusual to see a 4 cpu server running with the latest generation Intel and AMD CPUs but with a single HBA and dual ported NIC. Whilst memory is cheap many of these servers still run 32 bit kernels. This typically means only a small proportion of the database is cached in memory be it in the SGA or file cache (don't me started on file cache). I'd make a rash guess that whilst the size of the memory in a typical database server has increased the average size of the SGA hasn't increased in line with this trend. To make matters worse the typical size of a database has got significantly bigger. This has to lead us to the conclusion that less of the database is cached and as a result a bigger proportion of its is located on disk. As I said this is just a guess but its backed up with real customer engagements. What would be of interest is to have performed an analysis over the last 10 years to see if the wait event for scattered and sequential reads had decreased or increased as a proportion of the total wait event in production databases.
What I'm driving at is the need to move I/O way up the agenda when sizing a server for databases. The number of CPUs needs to be married to the number of I/O channels available. It makes no sense to buy database licenses for a machine that will simply sit and wait on I/O, Its simply wasting money. Equally it makes no sense to stuff a 4 cpu machine full of HBAs for a database application that will perform index lookups on a index that fits comfortably in the cache. Adding HBAs later to an existing server isn't necessarily a simple option either especially for a mission critical application or one that has hard coded paths to disk.
The next obvious question is "well thats well and good but how do I size the ratio of HBAs to CPUs." and in a typically vague fashion I reply "well that depends". The type of application and the type of processor should heavily influence the decision. Certainly the CPU has been winning the race in terms of performance over the last few years and it needs a lot more I/O to keep it busy. But the equation also needs to be balanced with the amount of memory available on the box. A large SGA will certainly reduce the need to visit disk. The best advice I can give is to speak to your hardware supplier and find out what the current state of play is. Also check the latest TPC-C and TPC-H figures show. Whilst these are generally edging towards the extremes of performance it does show what a hardware supplier believed was needed to show their hardware in the best light.
SQL Developer goes production.
SQLDeveloper
It features much of the functionality you'd expect in a top end development tool plus features that many of its competitors charge top dollar for. The best piece of news is that its an extensible framework and plugins have started to pop up all of the place... One of my particular favourites is at fourthelephant. Perhaps a little over the top for a text man like myself but I appreciate the work that must have gone into it.
They've inspired me to think about putting one together myself... The API is pretty simple and so it shouldn't be too taxing.... Any ideas? drop me a line and I'll see what I can do.
rlwrap : Command line editing in sqlplus
You can find the tool here
Server side failover
Let start by explaining what it is. Server side failover allows the sysadmin/dba to configure the profile of connection availablity on the server using a service. Users are effectively unaware of what will happen in the advent of a node in the cluster failing. Previously in (9i and 10g) users needed an entry in their local tnsnames file to describe which nodes they could failover to and which nodes were used to load balance connections on. Unless you used a remote naming service to maintain the connection information every time you added or removed nodes from a cluster it meant an update to potentially hundreds or thousands of tnsnames files.
This was simplified with easy connect in 10g release 1 which allowed the creation and connection to a service specified on the server. For the first time users only needed to connect to a nominated "listener node" and know the name of the service, for example imagine we have a nominated server inside of our organisation called "oracleservice", this of course is the name used for our virtual ip that will float between our cluster of listeners. In 10g release 1 we could create a service called "orderentry" using either dbca or srvctl that would allow our users to connect to it using a connect string of the form
sqlplus soe/soe@//oracleservice/orderentry
This greatly simplifies Oracle network maintenance. In some cases it could mean the removal of tnsnames files from the client or application server. It has other advantages for the DBA as well. If some business event occurs that requires the provisioning of a new application or a new resource profile for a short period of time the DBA can provision it in seconds and trivially remove it when it is no longer required.
Sadly in Oracle10g release 1 this functionality didn't support Transparent Application Failover (TAF), this meant that DBAs still needed to maintain tnsnames files contain a description of what nodes a service could failover on. The good news is that in Oracle10g release 2 this all changed. DBA's could set up a service specifying TAF and the Oracle OCI layer would use this definition provided by the server to describe the load balancing and failover profile.
Implementing this functionality is pretty trivial but there is a step that might catch you out. So lets go through it step by step
To set up the service you can use either Oracle DBCA, the DBMS_SERVICE package, Enterprise Manager or srvctl. The choice is entirely dependent on what you have running. DBCA or Enterprise manager provide the simplest mechanism but you will still have to run the final step using the dbms_service package to tell the database about its failover profile.
I'll use the DBMS_SERVICE package and srvctl for the sake of brevity. In the following example I have a database called db10g2 with two instances db10g21 and db10g22. Im going to create a service called "orderentry" that will provide transparent application failover between the two instances.
The first step is to create the service using srvctl
srvctl add service -d db10g2 -s orderentry -r "db10g21,db10g22" -a "db10g21,db10g22" -P BASIC
and check on its status
$ > srvctl status service -d db10g2 -s "orderentry"
Service orderentry is not running.
So we'll have to start the service first
$ > srvctl start service -d db10g2 -s "orderentry"
if we now use sqlplus connecting as system/sys we can see the service.
SYSTEM@db10g21 > select SERVICE_ID, NAME, NETWORK_NAME, failover_method from dba_services;
id Name Network Name Failover
--- ------------------ ------------------------- ------------
1 SYS$BACKGROUND
2 SYS$USERS
3 orderentry orderentry
The thing to note is that the service hasn't got a failover profile associated with it. So we'll have to modify it using the DBMS_SERVICE package
SYS@db10g21 > get t1.sql
1 begin
2 DBMS_SERVICE.MODIFY_SERVICE(
3 service_name => 'orderentry',
4 failover_method => DBMS_SERVICE.FAILOVER_METHOD_BASIC,
5 failover_type => DBMS_SERVICE.FAILOVER_TYPE_SELECT,
6 failover_retries => 180,
7 failover_delay => 5);
8* end;
SYS@db10g21 >
if we now select the service information again
id Name Network Name Failover
--- ------------------ ------------------------- ------------
1 SYS$BACKGROUND
2 SYS$USERS
3 orderentry orderentry BASIC
We can now test the service using sqlplus.
sqlplus soe/soe@//node1/orderentry
SQL*Plus: Release 10.2.0.1.0 - Production on Wed Jan 18 14:11:47 2006
Copyright (c) 1982, 2005, Oracle. All rights reserved.
Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Production
With the Partitioning, Real Application Clusters, OLAP and Data Mining options
SOE@//node1/orderentry >
So all we need to do now is to fire up swingbench and use the service we've created.
[oracle@node1 bin]$ ./charbench -cs //node1/orderentry -dt oci -uc 30 -a
Author : Dominic Giles
Version : 2.2
Results will be written to results.xml.
Users : 30 TPM : 272 Nested TPM : 0
If we log onto the database we can see that the connections have being balanced across the two nodes
SYS@db10g21 >;
1 select instance_name, count(1) usercount, nvl(username,'INTERNAL') user_name,
2 failover_type, failover_method
3 from gv$session s, gv$instance i
4 where s.inst_id = i.inst_id
5 group by instance_name, username, failover_type, failover_method
6* order by username, instance_name
SYS@db10g21 > /
Instance No. of Users Username Fail Over Type Fail Over Method
---------- ------------ ---------- ------------------ ------------------
db10g21 15 SOE SELECT BASIC
db10g22 15 SOE SELECT BASIC
db10g21 6 SYS NONE NONE
db10g22 6 SYS NONE NONE
db10g21 23 INTERNAL NONE NONE
db10g22 25 INTERNAL NONE NONE
so lets shut down of the instances
SQL*Plus: Release 10.2.0.1.0 - Production on Wed Jan 18 15:28:22 2006
Copyright (c) 1982, 2005, Oracle. All rights reserved.
Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Production
With the Partitioning, Real Application Clusters, OLAP and Data Mining options
SYS@db10g22 > shutdown abort;
ORACLE instance shut down.
SYS@db10g22 >
And re-query the session profile
Instance No. of Users Username Fail Over Type Fail Over Method
---------- ------------ ---------- ------------------ ------------------
db10g21 30 SOE SELECT BASIC
6 SYS NONE NONE
25 INTERNAL NONE NONE
There's a lot more thats possible using the service approach to database connection but I'll discuss that in another blog.