Create Oracle Database in a Docker Container¶

These Python (3.6) scripts walk through the creation of a database and standby server. The reason I've done this rather than just take the default configuration is that this approach gives me a lot more control over the build and enables me to change specifc steps. If you are just wanting to get the Oracle Database running inside of Docker I strongly suggest that you use the docker files and guides in the Oracle Github repository. The approach documented below is very much for someone who is interested in a high level of control over the various steps in the installation and configuration of the Oracle Database. This current version is build on top of Oracle's Internal GiaaS docker image but will be extended to support the public dockers images as well. It aims to build an Active Data Guard model with maximum performance but can be trivially changed to support any of the required models.

It uses a mix of calls to the Docker Python API and Calls direct to the databases via cx_Oracle.

The following code section imports the needed modules to setup up our Docker container to create the Oracle Database. After that we get a Docker client handle which enables use to call the API against our local Docker environment.

import docker
import humanize
import os
import tarfile
from prettytable import PrettyTable
import cx_Oracle
from IPython.display import HTML, display
import keyring
from ipynb.fs.full.OracleDockerDatabaseFunctions import list_images,list_containers,copy_to,create_and_run_script,containter_exec,containter_root_exec,copy_string_to_file

client = docker.from_env(timeout=600)

list_images(client)

Configuration Parameters¶

The following section contains the parameters for setting the configuration of the install. The following parameters image_name,host_oradata,sb_host_oradata need to be changed, although sb_host_oradata is only important if you are planning on installing a standby database.

# The following parameters are specific to your install and almost certainly need to be changed
image_name = 'cc75a47617' # Taken from the id value above
host_oradata = '/Users/dgiles/Downloads/dockerdbs/oradataprimary' # The directory on the host where primary database will be persisted
sb_host_oradata = '/Users/dgiles/Downloads/dockerdbs/oradatastby' # The direcotry on the host where the standby database will be persisted
#
# The rest are fairly generic and can be changed if needed
oracle_version = '18.0.0'
db_name = 'ORCL'
stby_name = 'ORCL_STBY'
sys_password =  keyring.get_password('docker','sys') # I'm just using keyring to hide my password but you can set it to a simple sting i.e. 'mypassword'
pdb_name = 'soe'
p_host_name = 'oracleprimary'
sb_host_name = 'oraclestby'
total_memory = 2048

container_oradata = '/u01/app/oracle/oradata'
o_base = '/u01/app/oracle'
r_area = f'{o_base}/oradata/recovery_area'
o_area = f'{o_base}/oradata/'
a_area = f'{o_base}/admin/ORCL/adump'
o_home = f'{o_base}/product/{oracle_version}/dbhome_1'
t_admin = f'{o_base}/oradata/dbconfig'
log_size = 200

Create Primary Database¶

This code does the heavy lifting. It creates a container oracleprimary (unless you changed the paramter) running the Oracle Database. The containers 1521 port is mapped onto the the hosts 1521 port. This means that to connect from the host, via a tool like sqlplus, all you'd need to do is sqlplus soe/soe@//locahost/soe.

path = f'{o_home}/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin'

p_container = client.containers.create(image_name,
                                       command="/bin/bash",
                                       hostname=p_host_name,
                                       tty=True,
                                       stdin_open=True,
                                       auto_remove=False,
                                       name=p_host_name,
                                       shm_size='3G',
                                       # network_mode='host',
                                       ports={1521:1521,5500:5500},
                                       volumes={host_oradata: {'bind': container_oradata, 'mode': 'rw'}},
                                       environment={'PATH':path,'ORACLE_SID': db_name, 'ORACLE_BASE': o_base,'TNS_ADMIN': t_admin}
                                       )

p_container.start()
# Make all of the containers attributes available via Python Docker API
p_container.reload()

The next step uses DBCA and configures features like Automatic Memory Mangement, Oracle Managed Files and sets the size of the SGA and redo logs. It prints out the status of the creation during it's progression. NOTE : This step typically takes 10 to 12 minutes.

statement = f'''dbca -silent \
 -createDatabase \
 -templateName General_Purpose.dbc \
 -gdbname {db_name} -sid {db_name} -responseFile NO_VALUE \
 -characterSet AL32UTF8 \
 -sysPassword {sys_password} \
 -systemPassword {sys_password} \
 -createAsContainerDatabase true \
 -numberOfPDBs 1 \
 -pdbName {pdb_name} \
 -pdbAdminPassword {sys_password} \
 -databaseType MULTIPURPOSE \
 -totalMemory {total_memory} \
 -memoryMgmtType AUTO_SGA \
 -recoveryAreaDestination "{r_area}" \
 -storageType FS \
 -useOMF true \
 -datafileDestination "{o_area}" \
 -redoLogFileSize {log_size} \
 -emConfiguration NONE \
 -ignorePreReqs\
 '''

containter_exec(p_container, statement)

[WARNING] [DBT-06208] The 'SYS' password entered does not conform to the Oracle recommended standards.
   CAUSE: 
a. Oracle recommends that the password entered should be at least 8 characters in length, contain at least 1 uppercase character, 1 lower case character and 1 digit [0-9].
b.The password entered is a keyword that Oracle does not recommend to be used as password
   ACTION: Specify a strong password. If required refer Oracle documentation for guidelines.
[WARNING] [DBT-06208] The 'SYSTEM' password entered does not conform to the Oracle recommended standards.
   CAUSE: 
a. Oracle recommends that the password entered should be at least 8 characters in length, contain at least 1 uppercase character, 1 lower case character and 1 digit [0-9].
b.The password entered is a keyword that Oracle does not recommend to be used as password
   ACTION: Specify a strong password. If required refer Oracle documentation for guidelines.
[WARNING] [DBT-06208] The 'PDBADMIN' password entered does not conform to the Oracle recommended standards.
   CAUSE: 
a. Oracle recommends that the password entered should be at least 8 characters in length, contain at least 1 uppercase character, 1 lower case character and 1 digit [0-9].
b.The password entered is a keyword that Oracle does not recommend to be used as password
   ACTION: Specify a strong password. If required refer Oracle documentation for guidelines.
Prepare for db operation
8% complete
Copying database files
31% complete
Creating and starting Oracle instance
32% complete
36% complete
40% complete
43% complete
46% complete
Completing Database Creation
51% complete
53% complete

54% complete

Creating Pluggable Databases

58% complete

77% complete
Executing Post Configuration Actions
100% complete
Database creation complete. For details check the logfiles at:
 /u01/app/oracle/cfgtoollogs/dbca/ORCL.
Database Information:
Global Database Name:ORCL
System Identifier(SID):ORCL
Look at the log file "/u01/app/oracle/cfgtoollogs/dbca/ORCL/ORCL.log" for further details.

Create Primary Database's Listener¶

This step creates the database listener for the primary database. The tnsnames.ora will be over written in a later step if you choose to have a stand by configuration. NOTE : I could create a DNSMasq container or something similar and add container networking details to make the whole inter node communication simpler but it's a bit of an overkill and so we'll use IP addresses which are easily found.

p_ip_adress = p_container.attrs['NetworkSettings']['IPAddress']

p_listener = f'''LISTENER=
    (DESCRIPTION=
        (ADDRESS = (PROTOCOL=tcp)(HOST={p_ip_adress})(PORT=1521))
        (ADDRESS = (PROTOCOL = IPC)(KEY = EXTPROC1521))
    )    

SID_LIST_LISTENER=
    (SID_LIST=
        (SID_DESC=
            (GLOBAL_DBNAME={db_name}_DGMGRL)
            (ORACLE_HOME={o_home})
            (SID_NAME={db_name})
            (ENVS="TNS_ADMIN={t_admin}"
        )
    ) 
'''
copy_string_to_file(p_listener, f'{t_admin}/listener.ora', p_container)

contents = '''NAMES.DIRECTORY_PATH= (TNSNAMES, EZCONNECT)'''

copy_string_to_file(contents, f'{t_admin}/sqlnet.ora', p_container)

contents = f'''
ORCL = 
  (DESCRIPTION = 
    (ADDRESS = (PROTOCOL = TCP)(HOST = {p_ip_adress})(PORT=1521)) 
    (CONNECT_DATA = 
      (SERVER = DEDICATED) 
      (SID = {db_name}) 
    ) 
  ) 
'''
copy_string_to_file(contents, f'{t_admin}/tnsnames.ora', p_container)

And start the listener

containter_exec(p_container, 'lsnrctl start')

LSNRCTL for Linux: Version 18.0.0.0.0 - Production on 25-MAR-2019 17:05:35

Copyright (c) 1991, 2018, Oracle.  All rights reserved.
Starting /u01/app/oracle/product/18.0.0/dbhome_1/bin/tnslsnr: please wait...
TNSLSNR for Linux: Version 18.0.0.0.0 - Production
System parameter file is /u01/app/oracle/oradata/dbconfig/listener.ora
Log messages written to /u01/app/oracle/diag/tnslsnr/oracleprimary/listener/alert/log.xml
Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=172.17.0.2)(PORT=1521)))
Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=EXTPROC1521)))

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=172.17.0.2)(PORT=1521)))
STATUS of the LISTENER
------------------------
Alias                     LISTENER
Version                   TNSLSNR for Linux: Version 18.0.0.0.0 - Production
Start Date                25-MAR-2019 17:05:35
Uptime                    0 days 0 hr. 0 min. 0 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /u01/app/oracle/oradata/dbconfig/listener.ora
Listener Log File         /u01/app/oracle/diag/tnslsnr/oracleprimary/listener/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=172.17.0.2)(PORT=1521)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=EXTPROC1521)))
Services Summary...
Service "ORCL_DGMGRL" has 1 instance(s).
  Instance "ORCL", status UNKNOWN, has 1 handler(s) for this service...
The command completed successfully

At this stage you should have a fully functioning Oracle Database. In theory there's no need to go any further if thats all you want.

Create Stand By Container¶

This step creates another container to run the standby databases. It should be pretty much instant. NOTE : You'll only need to run the rest of the code from here if you need a standby database.

sb_container = client.containers.create(image_name,
                                       hostname=sb_host_name,
                                       command="/bin/bash",
                                       tty=True,
                                       stdin_open=True,
                                       auto_remove=False,
                                       name=sb_host_name,
                                       shm_size='3G',
                                       ports={1521:1522,5500:5501},
                                       volumes={sb_host_oradata: {'bind': container_oradata, 'mode': 'rw'}},
                                       environment={'PATH':path,'ORACLE_SID':db_name,'ORACLE_BASE':o_base,'TNS_ADMIN':t_admin}
                                       )
sb_container.start()
# Make all of the containers attributes available via Python Docker API
sb_container.reload()

Display the running containers.

list_containers(client)

Configure the Standby Database¶

Create some additional directories on the standby so they are consistent with the primary.

containter_exec(sb_container, f'mkdir -p {o_area}/{db_name}')
containter_exec(sb_container, f'mkdir -p {t_admin}')
containter_exec(sb_container, f'mkdir -p {r_area}/{db_name}')
containter_exec(sb_container, f'mkdir -p {a_area}')

Create Standby Database's Listener¶

Create the standby listenrs network configuration and then start the listener. NOTE : We'll be overwriting the primary databases tnsnames.ora file in this step.

sb_ip_adress = sb_container.attrs['NetworkSettings']['IPAddress']

contents = f'''
ORCL = 
  (DESCRIPTION = 
    (ADDRESS = (PROTOCOL = TCP)(HOST = {p_ip_adress})(PORT=1521)) 
    (CONNECT_DATA = 
      (SERVER = DEDICATED) 
      (SID = {db_name}) 
    ) 
  ) 

ORCL_STBY =  
  (DESCRIPTION = 
    (ADDRESS = (PROTOCOL = TCP)(HOST = {sb_ip_adress})(PORT=1521)) 
    (CONNECT_DATA = 
      (SERVER = DEDICATED) 
      (SID = {db_name}) 
    ) 
  ) 
'''

copy_string_to_file(contents, f'{t_admin}/tnsnames.ora', p_container)
copy_string_to_file(contents, f'{t_admin}/tnsnames.ora', sb_container)

sb_listener = f'''LISTENER=
    (DESCRIPTION=
        (ADDRESS = (PROTOCOL=tcp)(HOST={sb_ip_adress})(PORT =1521))
        (ADDRESS = (PROTOCOL = IPC)(KEY = EXTPROC1521))
    )    

SID_LIST_LISTENER=
    (SID_LIST=
        (SID_DESC=
            (GLOBAL_DBNAME={stby_name}_DGMGRL)
            (ORACLE_HOME={o_home})
            (SID_NAME={db_name})
            (ENVS="TNS_ADMIN={t_admin}"
        )
    ) 
'''
copy_string_to_file(sb_listener, f'{t_admin}/listener.ora', sb_container)

contents = '''NAMES.DIRECTORY_PATH= (TNSNAMES, EZCONNECT)'''

copy_string_to_file(contents, f'{t_admin}/sqlnet.ora', sb_container)

And start the listener

containter_exec(sb_container, 'lsnrctl start')

LSNRCTL for Linux: Version 18.0.0.0.0 - Production on 25-MAR-2019 17:08:51

Copyright (c) 1991, 2018, Oracle.  All rights reserved.
Starting /u01/app/oracle/product/18.0.0/dbhome_1/bin/tnslsnr: please wait...
TNSLSNR for Linux: Version 18.0.0.0.0 - Production
System parameter file is /u01/app/oracle/oradata/dbconfig/listener.ora
Log messages written to /u01/app/oracle/diag/tnslsnr/oraclestby/listener/alert/log.xml
Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=172.17.0.3)(PORT=1521)))
Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=EXTPROC1521)))

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=172.17.0.3)(PORT=1521)))
STATUS of the LISTENER
------------------------
Alias                     LISTENER
Version                   TNSLSNR for Linux: Version 18.0.0.0.0 - Production
Start Date                25-MAR-2019 17:08:51
Uptime                    0 days 0 hr. 0 min. 0 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /u01/app/oracle/oradata/dbconfig/listener.ora
Listener Log File         /u01/app/oracle/diag/tnslsnr/oraclestby/listener/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=172.17.0.3)(PORT=1521)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=EXTPROC1521)))
Services Summary...
Service "ORCL_STBY_DGMGRL" has 1 instance(s).
  Instance "ORCL", status UNKNOWN, has 1 handler(s) for this service...
The command completed successfully

Configure the servers for Data Guard¶

It might be necessary to pause for a few seconds before moving onto the next step to allow the database to register with the listener...

The next step is to connect to primary and standby servers and set various parameters and configuration to enable us to run Data Guard.

First check the status of archive logging on the primary.

connection = cx_Oracle.connect("sys",sys_password,f"//localhost:1521/{db_name}", mode=cx_Oracle.SYSDBA)
cursor = connection.cursor();
rs = cursor.execute("SELECT log_mode FROM v$database")
for row in rs:
    print(f"Database is in {row[0]} mode")

Database is in NOARCHIVELOG mode

By default it will be in no archivelog mode so we need to shut it down and enable archive log mode

contents = '''sqlplus / as sysdba << EOF
SET ECHO ON;
SHUTDOWN IMMEDIATE;
STARTUP MOUNT;
ALTER DATABASE ARCHIVELOG;
ALTER DATABASE OPEN;
EOF
'''

create_and_run_script(contents, '/tmp/set_archivelog.sql',  '/bin/bash /tmp/set_archivelog.sql', p_container)

/bin/bash /tmp/set_archivelog.sql
SQL*Plus: Release 18.0.0.0.0 - Production on Mon Mar 25 17:09:15 2019
Version 18.3.0.0.0

Copyright (c) 1982, 2018, Oracle.  All rights reserved.
Connected to:
Oracle Database 18c Enterprise Edition Release 18.0.0.0.0 - Production
Version 18.3.0.0.0
SQL> SQL>
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL>
ORACLE instance started.

Total System Global Area 1610612704 bytes
Fixed Size		    8896480 bytes
Variable Size		  520093696 bytes
Database Buffers	 1073741824 bytes
Redo Buffers		    7880704 bytes
Database mounted.
SQL>
Database altered.

SQL>
Database altered.

SQL>
Disconnected from Oracle Database 18c Enterprise Edition Release 18.0.0.0.0 - Production
Version 18.3.0.0.0

And check again

connection = cx_Oracle.connect("sys",sys_password,f"//localhost:1521/{db_name}", mode=cx_Oracle.SYSDBA)
cursor = connection.cursor();
rs = cursor.execute("SELECT log_mode FROM v$database")
for row in rs:
    print(f"Database is in {row[0]} mode")

Database is in ARCHIVELOG mode

And then force a log switch

cursor.execute("ALTER DATABASE FORCE LOGGING")
cursor.execute("ALTER SYSTEM SWITCH LOGFILE")

Add some Standby Logging Files

cursor.execute("ALTER DATABASE ADD STANDBY LOGFILE SIZE 200M")
cursor.execute("ALTER DATABASE ADD STANDBY LOGFILE SIZE 200M")
cursor.execute("ALTER DATABASE ADD STANDBY LOGFILE SIZE 200M")
cursor.execute("ALTER DATABASE ADD STANDBY LOGFILE SIZE 200M")

Enable Flashback and standby file management

cursor.execute("ALTER DATABASE FLASHBACK ON")
cursor.execute("ALTER SYSTEM SET STANDBY_FILE_MANAGEMENT=AUTO")

Start an instance¶

Create a temporary init.ora file to enable us to start an instance on the standby

contents = f"DB_NAME='{db_name}'\n"
copy_string_to_file(contents, f'/tmp/init{db_name}.ora', sb_container)

Create a password file on the standby with the same parameters as the primary

containter_exec(sb_container, f'orapwd file=$ORACLE_HOME/dbs/orapw{db_name} password={sys_password} entries=10 format=12')

And start up the standby instance

contents = f'''STARTUP NOMOUNT PFILE='/tmp/init{db_name}.ora';
EXIT;
'''
create_and_run_script(contents, '/tmp/start_db.sql', 'sqlplus / as sysdba @/tmp/start_db.sql', sb_container)

sqlplus / as sysdba @/tmp/start_db.sql
SQL*Plus: Release 18.0.0.0.0 - Production on Mon Mar 25 17:11:01 2019
Version 18.3.0.0.0

Copyright (c) 1982, 2018, Oracle.  All rights reserved.
Connected to an idle instance.
ORACLE instance started.
Total System Global Area  268433000 bytes
Fixed Size		    8656488 bytes
Variable Size		  201326592 bytes
Database Buffers	   50331648 bytes
Redo Buffers		    8118272 bytes
Disconnected from Oracle Database 18c Enterprise Edition Release 18.0.0.0.0 - Production
Version 18.3.0.0.0

Duplicate the Primary database to the Standby database¶

Duplicate the primary to the standby. For some reason the tnsnames isn't picked up on the primary/standby in the same location so an explicit connection string is needed.

contents = f'''rman TARGET sys/{sys_password}@{db_name} AUXILIARY sys/{sys_password}@{stby_name} << EOF
DUPLICATE TARGET DATABASE
  FOR STANDBY
  FROM ACTIVE DATABASE
  DORECOVER
  SPFILE
    SET db_unique_name='{stby_name}' COMMENT 'Is standby'
  NOFILENAMECHECK;
EOF
'''

create_and_run_script(contents, '/tmp/duplicate.sh', "/bin/bash /tmp/duplicate.sh", sb_container)

/bin/bash /tmp/duplicate.sh
Recovery Manager: Release 18.0.0.0.0 - Production on Mon Mar 25 17:11:27 2019
Version 18.3.0.0.0

Copyright (c) 1982, 2018, Oracle and/or its affiliates.  All rights reserved.
connected to target database: ORCL (DBID=1531636684)
connected to auxiliary database: ORCL (not mounted)

RMAN>
2>
3> 4>
5> 6>
7>

Starting Duplicate Db at 25-MAR-19
using target database control file instead of recovery catalog
allocated channel: ORA_AUX_DISK_1
channel ORA_AUX_DISK_1: SID=20 device type=DISK
current log archived

contents of Memory Script:
{
backup as copy reuse
passwordfile auxiliary format  '/u01/app/oracle/product/18.0.0/dbhome_1/dbs/orapwORCL'   ;
restore clone from service  'ORCL' spfile to
'/u01/app/oracle/product/18.0.0/dbhome_1/dbs/spfileORCL.ora';
sql clone "alter system set spfile= ''/u01/app/oracle/product/18.0.0/dbhome_1/dbs/spfileORCL.ora''";
}
executing Memory Script

Starting backup at 25-MAR-19
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=272 device type=DISK
Finished backup at 25-MAR-19

Starting restore at 25-MAR-19
using channel ORA_AUX_DISK_1

...
...
...

released channel: ORA_AUX_DISK_1
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=272 device type=DISK
deleted archived log
archived log file name=/u01/app/oracle/oradata/recovery_area/ORCL_STBY/archivelog/2019_03_25/o1_mf_1_6_g9l30xtn_.arc RECID=1 STAMP=1003857181
deleted archived log
archived log file name=/u01/app/oracle/oradata/recovery_area/ORCL_STBY/archivelog/2019_03_25/o1_mf_1_7_g9l30yxy_.arc RECID=2 STAMP=1003857182
Deleted 2 objects

Finished Duplicate Db at 25-MAR-19

RMAN>

Recovery Manager complete.

Start the Data Guard Broker¶

It's best practice to use Data Guard Broker and so we'll need to start it on both instances and then create a configuration.

cursor.execute("ALTER SYSTEM SET dg_broker_start=true")

sb_connection = cx_Oracle.connect("sys",sys_password,f"//localhost:1522/{stby_name}", mode=cx_Oracle.SYSDBA)
sb_cursor = sb_connection.cursor()
sb_cursor.execute("ALTER SYSTEM SET dg_broker_start=true")

Create a configuration

contents = f'''export TNS_ADMIN={t_admin};
dgmgrl sys/{sys_password}@{db_name} << EOF
SET ECHO ON;
CREATE CONFIGURATION orcl_stby_config AS PRIMARY DATABASE IS {db_name} CONNECT IDENTIFIER IS {db_name};
EXIT;
EOF
'''

create_and_run_script(contents, '/tmp/dgconfig.sh', "/bin/bash /tmp/dgconfig.sh", p_container)

/bin/bash /tmp/dgconfig.sh
DGMGRL for Linux: Release 18.0.0.0.0 - Production on Mon Mar 25 17:13:53 2019
Version 18.3.0.0.0

Copyright (c) 1982, 2018, Oracle and/or its affiliates.  All rights reserved.
Welcome to DGMGRL, type "help" for information.
Connected to "ORCL"
Connected as SYSDBA.
DGMGRL>
DGMGRL> CREATE CONFIGURATION orcl_stby_config AS PRIMARY DATABASE IS ORCL CONNECT IDENTIFIER IS ORCL;
Configuration "orcl_stby_config" created with primary database "orcl"
DGMGRL> EXIT;

Add the standby

contents = f'''export TNS_ADMIN={t_admin};
dgmgrl sys/{sys_password}@{db_name} << EOF
SET ECHO ON;
ADD DATABASE {stby_name} AS CONNECT IDENTIFIER IS {stby_name} MAINTAINED AS PHYSICAL;
EXIT;
EOF
'''

create_and_run_script(contents, '/tmp/dgconfig2.sh', "/bin/bash /tmp/dgconfig2.sh", p_container)

/bin/bash /tmp/dgconfig2.sh
DGMGRL for Linux: Release 18.0.0.0.0 - Production on Mon Mar 25 17:14:05 2019
Version 18.3.0.0.0

Copyright (c) 1982, 2018, Oracle and/or its affiliates.  All rights reserved.
Welcome to DGMGRL, type "help" for information.
Connected to "ORCL"
Connected as SYSDBA.
DGMGRL>
DGMGRL>
ADD DATABASE ORCL_STBY AS CONNECT IDENTIFIER IS ORCL_STBY
MAINTAINED AS PHYSICAL;
Database "orcl_stby" added
DGMGRL> EXIT;

Enable the configuration

contents = f'''export TNS_ADMIN={t_admin};
dgmgrl sys/{sys_password}@{db_name} << EOF
SET ECHO ON;
ENABLE CONFIGURATION;
EXIT;
EOF
'''

create_and_run_script(contents, '/tmp/dgconfig3.sh', "/bin/bash /tmp/dgconfig3.sh", p_container)

/bin/bash /tmp/dgconfig3.sh
DGMGRL for Linux: Release 18.0.0.0.0 - Production on Mon Mar 25 17:14:17 2019
Version 18.3.0.0.0

Copyright (c) 1982, 2018, Oracle and/or its affiliates.  All rights reserved.
Welcome to DGMGRL, type "help" for information.
Connected to "ORCL"
Connected as SYSDBA.
DGMGRL>
DGMGRL>
ENABLE CONFIGURATION;
Enabled.
DGMGRL>
EXIT;

Display the configuration

contents = f'''export TNS_ADMIN={t_admin};
dgmgrl sys/{sys_password}@{db_name} << EOF
SET ECHO ON;
SHOW CONFIGURATION;
SHOW DATABASE {db_name};
SHOW DATABASE {stby_name};
EOF
'''

create_and_run_script(contents, '/tmp/dgconfig4.sh', "/bin/bash /tmp/dgconfig4.sh", p_container)

/bin/bash /tmp/dgconfig4.sh
DGMGRL for Linux: Release 18.0.0.0.0 - Production on Mon Mar 25 17:14:40 2019
Version 18.3.0.0.0

Copyright (c) 1982, 2018, Oracle and/or its affiliates.  All rights reserved.

Welcome to DGMGRL, type "help" for information.
Connected to "ORCL"
Connected as SYSDBA.
DGMGRL>
DGMGRL>
SHOW CONFIGURATION;

Configuration - orcl_stby_config
Protection Mode: MaxPerformance
Members:
orcl      - Primary database
  orcl_stby - Physical standby database
Warning: ORA-16854: apply lag could not be determined

Fast-Start Failover: DISABLED
Configuration Status:
WARNING (status updated 7 seconds ago)
DGMGRL>
SHOW DATABASE ORCL;
Database - orcl
Role:             PRIMARY
Intended State:   TRANSPORT-ON
Instance(s):
  ORCL
Database Status:
SUCCESS
DGMGRL>
SHOW DATABASE ORCL_STBY;
Database - orcl_stby
Role:             PHYSICAL STANDBY
Intended State:   APPLY-ON
Transport Lag:    0 seconds (computed 2 seconds ago)
Apply Lag:        (unknown)
Average Apply Rateunknown)
Real Time Query:  OFF
Instance(s):
  ORCL


Database Warning(s):
ORA-16854: apply lag could not be determined

Database Status:
WARNING

DGMGRL>
DGMGRL>

Start the Standby in managed recovery¶

We now need to start the standby so it begins applying redo to keep it consistent with the primary.

contents='''sqlplus / as sysdba << EOF
SET ECHO ON;
SHUTDOWN IMMEDIATE;
STARTUP MOUNT;
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT FROM SESSION;
EOF
'''

create_and_run_script(contents, '/tmp/convert_to_active.sh', "/bin/bash /tmp/convert_to_active.sh", sb_container)

/bin/bash /tmp/convert_to_active.sh
SQL*Plus: Release 18.0.0.0.0 - Production on Mon Mar 25 17:15:08 2019
Version 18.3.0.0.0

Copyright (c) 1982, 2018, Oracle.  All rights reserved.
Connected to:
Oracle Database 18c Enterprise Edition Release 18.0.0.0.0 - Production
Version 18.3.0.0.0

SQL> SQL>
ORA-01109: database not open
Database dismounted.
ORACLE instance shut down.
SQL>
ORACLE instance started.
Total System Global Area 1610612704 bytes
Fixed Size		    8896480 bytes
Variable Size		  520093696 bytes
Database Buffers	 1073741824 bytes
Redo Buffers		    7880704 bytes
Database mounted.
SQL>

Database altered.

SQL>
Disconnected from Oracle Database 18c Enterprise Edition Release 18.0.0.0.0 - Production
Version 18.3.0.0.0

Standby Database Creation Complete¶

We now have a primary and standby database that we can begin testing with.

Additional Steps¶

At this point you should have a physical standby database that is running in maximum performance mode. This might be enough for the testing you want to carry out but there's a number of possible changes that you might want to consider.

Change the physical standby database to an Active Standby
Convert the current mode (Maximum Performance) to Maximum Protection or Maximum Availability
Configure the Oracle Database 19c Active Data Guard feature, DML Redirect

I'll cover these in the following sections but they "icing on the cake" rather than required.

Active Data Guard¶

This is a relatively trivial change. We just need to alter the standby database to open readonly and then start managed recovery as before

contents='''sqlplus / as sysdba << EOF
SET ECHO ON;
SHUTDOWN IMMEDIATE;
STARTUP MOUNT;
ALTER DATABASE OPEN READ ONLY;
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT FROM SESSION;
EOF
'''

create_and_run_script(contents, '/tmp/convert_to_active.sh', "/bin/bash /tmp/convert_to_active.sh", sb_container)

/bin/bash /tmp/convert_to_active.sh
SQL*Plus: Release 18.0.0.0.0 - Production on Mon Mar 25 17:16:16 2019
Version 18.3.0.0.0

Copyright (c) 1982, 2018, Oracle.  All rights reserved.
Connected to:
Oracle Database 18c Enterprise Edition Release 18.0.0.0.0 - Production
Version 18.3.0.0.0
SQL>
SQL>
ORA-01109: database not open
Database dismounted.
ORACLE instance shut down.
SQL>
ORACLE instance started.

Total System Global Area 1610612704 bytes
Fixed Size		    8896480 bytes
Variable Size		  520093696 bytes
Database Buffers	 1073741824 bytes
Redo Buffers		    7880704 bytes
Database mounted.
SQL>
Database altered.
SQL>
Database altered.

SQL> Disconnected from Oracle Database 18c Enterprise Edition Release 18.0.0.0.0 - Production
Version 18.3.0.0.0

Maximum Performance to Maximum Availability¶

For this change we'll use the Database Guard Broker command line tool to make the change

contents = f'''
dgmgrl sys/{sys_password}@{db_name} << EOF
SET ECHO ON;
SHOW CONFIGURATION;
edit database {stby_name} set property logxptmode=SYNC;
edit configuration set protection mode as maxavailability;
SHOW CONFIGURATION;
EOF
'''

create_and_run_script(contents, '/tmp/max_avail.sh', "/bin/bash /tmp/max_avail.sh", p_container)

Maximum Performance to Maximum Protection¶

As before we'll use the Database Guard Broker command line tool to make the change.

contents = f'''
dgmgrl sys/{sys_password}@{db_name} << EOF
SET ECHO ON;
SHOW CONFIGURATION;
edit database {stby_name} set property logxptmode=SYNC;
edit configuration set protection mode as maxprotection;
SHOW CONFIGURATION;
EOF
'''

create_and_run_script(contents, '/tmp/max_prot.sh', "/bin/bash /tmp/max_prot.sh", p_container)

Back to Max Perfromance¶

We'll use Database Guard Broker to change us back to asynchronus mode.

contents = f'''
dgmgrl sys/{sys_password}@{db_name} << EOF
SET ECHO ON;
SHOW CONFIGURATION;
edit configuration set protection mode as maxperformance;
edit database {stby_name} set property logxptmode=ASYNC;
SHOW CONFIGURATION;
EOF
'''

create_and_run_script(contents, '/tmp/max_prot.sh', "/bin/bash /tmp/max_prot.sh", p_container)

Oracle Database 19c Acvtive Data Guard DML Redirect¶

On Oracle Database 19c we can also enable DML redirect from the standby to the primary. I'll add this on the release of the Oracle Database 19c software for on premises.

Build Docker Oracle Database Base Image¶

The following notebook goes through the process of building an Oracle Docker image of the Oracle Database. If you are just wanting to get the Oracle Database running inside of Docker I strongly suggest that you use the docker files and guides in the Oracle Github repository. The approach documented below is very much for someone who is interested in a high level of control over the various steps in the installation and configuration of the Oracle Database or simply to understand how the various teps work.

Prerequisites¶

The process documented below uses a Jupyter Notebook (iPython). The reason I use this approach and not straight python is that it's easy to change and is self documenting. It only takes a few minutes to set up the environment. I've included a requirements file which makes it very simple to install the needed Python libraries. I go through the process of setting up a Jupyter environment for Mac and Linux here.

Running the notebook¶

Typically the only modification that the user will need to do is to modifythe values in the "Parameters" section. The code can then be run by pressing "Command SHIFT" on a Mac or "Ctrl Shift" on Windows. Or by pressing the "Play" icon in the tool bar. It is also possible to run all of the cells automatically, you can do this from "Run" menu item.

import docker
import os
import tarfile
from prettytable import PrettyTable
from IPython.display import HTML, display, Markdown
import humanize
import re
from ipynb.fs.full.OracleDockerDatabaseFunctions import list_images,list_containers,copy_to,create_and_run_script,containter_exec,copy_string_to_file,containter_root_exec

client = docker.from_env(timeout=600)

Parameters¶

This section details the parameters used to define the docker image you'll end up creating. It's almost certainly the case that you'll need to change the parameters in the first section. The parameters in the second section can be changed if there's something i.e. hostname that you want to change

# You'll need to to change the following two parameters to reflect your environment
host_orabase = '/Users/dgiles/oradata18c' # The directory on the host where you'll stage the persisted datafiles
host_install_dir = '/Users/dgiles/Downloads/oracle18_software' # The directory on the host where the downloaded Oracle Database zip file is.
# You can change any of the following parameters but it's not necssary
p_host_name = 'oracle_db'
oracle_version = '18.0.0'
oracle_base = '/u01/app/oracle'
oracle_home = f'{oracle_base}/product/{oracle_version}/dbhome_1'
db_name = 'ORCL'
oracle_sid= db_name
path=f'{oracle_home}/bin:$PATH'
tns_admin=f'{oracle_base}/oradata/dbconfig'
container_oradata = '/u01/app/oracle/oradata'
r_area = f'{oracle_base}/oradata/recovery_area'
a_area = f'{oracle_base}/admin/ORCL/adump'
container_install_dir = '/u01/install_files'
path = f'{oracle_home}/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin'

Attempt to create the needed directories on the host and print warnings if needed

try:
    os.makedirs(host_orabase, exist_ok=True)
    os.makedirs(host_install_dir, exist_ok=True)
except:
   display(Markdown(f'**WARNING** : Unable to create directories {host_orabase} and {host_install_dir}'))
    
files = os.listdir(host_install_dir)
found_similar:bool = False
for file in files:
    if file.startswith('LINUX.X64'):
        found_similar = True
        break
if not found_similar:
    display(Markdown(f"**WARNING** : Are you sure that you've downloaded the needed Oracle executable to the `{host_install_dir}` directory"))

The first step in creating a usable image is to create a docker file. This details what the docker container will be based on and what needs to be installed. It will use the parameters defined above. It does require network connectivity for this to work as docker will pull down the required images and RPMs.

script = f'''
FROM oraclelinux:7-slim

ENV ORACLE_BASE={oracle_base} \
    ORACLE_HOME={oracle_home} \
    ORACLE_SID={oracle_sid} \
    PATH={oracle_home}/bin:$PATH
    
RUN yum -y install unzip
RUN yum -y install oracle-database-preinstall-18c
RUN yum -y install openssl
    
# RUN groupadd -g 500 dba
# RUN useradd -ms /bin/bash -g dba oracle

RUN mkdir -p $ORACLE_BASE
RUN mkdir -p $ORACLE_HOME
RUN mkdir -p {container_install_dir}

RUN chown -R oracle:dba {oracle_base}
RUN chown -R oracle:dba {oracle_home}
RUN chown -R oracle:dba {container_install_dir}

USER oracle
WORKDIR /home/oracle

VOLUME ["$ORACLE_BASE/oradata"]
VOLUME ["{container_install_dir}"]
EXPOSE 1521 8080 5500
'''

with open('Dockerfile','w') as f:
    f.write(script)

And now we can create the image. The period of time for this operation to complete will depend on what docker images have already been downloaded/cached and your network speed.

image, output = client.images.build(path=os.getcwd(),dockerfile='Dockerfile', tag=f"linux_for_db{oracle_version}",rm="True",nocache="False")
for out in output:
    print(out)

{'stream': 'Step 1/16 : FROM oraclelinux:7-slim'}
{'stream': '\n'}
{'stream': ' ---> c3d869388183\n'}
{'stream': 'Step 2/16 : ENV ORACLE_BASE=/u01/app/oracle     ORACLE_HOME=/u01/app/oracle/product/18.0.0/dbhome_1     ORACLE_SID=ORCL     PATH=/u01/app/oracle/product/18.0.0/dbhome_1/bin:$PATH'}
{'stream': '\n'}
{'stream': ' ---> Running in f7be15a38f17\n'}
{'stream': 'Removing intermediate container f7be15a38f17\n'}
{'stream': ' ---> ec34e0a4ca0b\n'}
{'stream': 'Step 3/16 : RUN yum -y install unzip'}
{'stream': '\n'}
{'stream': ' ---> Running in b8ae90884550\n'}
{'stream': 'Loaded plugins: ovl\n'}
{'stream': 'Resolving Dependencies\n'}
{'stream': '--> Running transaction check\n'}
{'stream': '---> Package unzip.x86_64 0:6.0-19.el7 will be installed\n'}
{'stream': '--> Finished Dependency Resolution\n'}
{'stream': '\nDependencies Resolved\n'}
{'stream': '\n================================================================================\n Package        Arch            Version               Repository           Size\n================================================================================\nInstalling:\n unzip          x86_64          6.0-19.el7            ol7_latest          169 k\n\nTransaction Summary\n================================================================================\nInstall  1 Package\n\n'}
{'stream': 'Total download size: 169 k\n'}
{'stream': 'Installed size: 365 k\n'}
{'stream': 'Downloading packages:\n'}
{'stream': 'Running transaction check\n'}
{'stream': 'Running transaction test\n'}
{'stream': 'Transaction test succeeded\n'}
{'stream': 'Running transaction\n'}
{'stream': '  Installing : unzip-6.0-19.el7.x86_64                                      1/1'}
{'stream': ' \n  Verifying  : unzip-6.0-19.el7.x86_64                                      1/1'}
{'stream': ' \n\nInstalled:\n  unzip.x86_64 0:6.0-19.el7                                                     \n\n'}
{'stream': 'Complete!\n'}
{'stream': 'Removing intermediate container b8ae90884550\n'}
{'stream': ' ---> d7f9d5678e7f\n'}
{'stream': 'Step 4/16 : RUN yum -y install oracle-database-preinstall-18c'}
{'stream': '\n'}
{'stream': ' ---> Running in 35e010d92729\n'}
{'stream': 'Loaded plugins: ovl\n'}
{'stream': 'Resolving Dependencies\n'}
{'stream': '--> Running transaction check\n'}
{'stream': '---> Package oracle-database-preinstall-18c.x86_64 0:1.0-1.el7 will be installed\n'}
{'stream': '--> Processing Dependency: binutils for package: oracle-database-preinstall-18c-1.0-1.el7.x86_64\n'}
{'stream': '--> Processing Dependency: bc for package: oracle-database-preinstall-18c-1.0-1.el7.x86_64\n'}
{'stream': '--> Processing Dependency: initscripts for package: oracle-database-preinstall-18c-1.0-1.el7.x86_64\n'}
{'stream': '--> Processing Dependency: sysstat for package: oracle-database-preinstall-18c-1.0-1.el7.x86_64\n'}
{'stream': '--> Processing Dependency: xorg-x11-utils for package: oracle-database-preinstall-18c-1.0-1.el7.x86_64\n'}
{'stream': '--> Processing Dependency: bind-utils for package: oracle-database-preinstall-18c-1.0-1.el7.x86_64\n'}
{'stream': '--> Processing Dependency: net-tools for package: oracle-database-preinstall-18c-1.0-1.el7.x86_64\n'}
{'stream': '--> Processing Dependency: pam for package: oracle-database-preinstall-18c-1.0-1.el7.x86_64\n'}
{'stream': '--> Processing Dependency: compat-libcap1 for package: oracle-database-preinstall-18c-1.0-1.el7.x86_64\n'}
{'stream': '--> Processing Dependency: ksh for package: oracle-database-preinstall-18c-1.0-1.el7.x86_64\n'}
{'stream': '--> Processing Dependency: libaio-devel for package: oracle-database-preinstall-18c-1.0-1.el7.x86_64\n'}
{'stream': '--> Processing Dependency: xorg-x11-xauth for package: oracle-database-preinstall-18c-1.0-1.el7.x86_64\n'}
{'stream': '--> Processing Dependency: util-linux-ng for package: oracle-database-preinstall-18c-1.0-1.el7.x86_64\n'}
{'stream': '--> Processing Dependency: libaio for package: oracle-database-preinstall-18c-1.0-1.el7.x86_64\n'}
{'stream': '--> Processing Dependency: psmisc for package: oracle-database-preinstall-18c-1.0-1.el7.x86_64\n'}
{'stream': '--> Processing Dependency: module-init-tools for package: oracle-database-preinstall-18c-1.0-1.el7.x86_64\n'}
{'stream': '--> Processing Dependency: glibc-devel for package: oracle-database-preinstall-18c-1.0-1.el7.x86_64\n'}
{'stream': '--> Processing Dependency: compat-libstdc++-33 for package: oracle-database-preinstall-18c-1.0-1.el7.x86_64\n'}
{'stream': '--> Processing Dependency: libstdc++-devel for package: oracle-database-preinstall-18c-1.0-1.el7.x86_64\n'}
{'stream': '--> Processing Dependency: ethtool for package: oracle-database-preinstall-18c-1.0-1.el7.x86_64\n'}

....
....
....


{'stream': ' ---> Running in 7a359fa6a8d3\n'}
{'stream': 'Removing intermediate container 7a359fa6a8d3\n'}
{'stream': ' ---> b04ec8faaa7b\n'}
{'stream': 'Step 13/16 : WORKDIR /home/oracle'}
{'stream': '\n'}
{'stream': ' ---> Running in 6f3186c4b034\n'}
{'stream': 'Removing intermediate container 6f3186c4b034\n'}
{'stream': ' ---> db70e7f386dd\n'}
{'stream': 'Step 14/16 : VOLUME ["$ORACLE_BASE/oradata"]'}
{'stream': '\n'}
{'stream': ' ---> Running in a3807026181b\n'}
{'stream': 'Removing intermediate container a3807026181b\n'}
{'stream': ' ---> fa4f83782cac\n'}
{'stream': 'Step 15/16 : VOLUME ["/u01/install_files"]'}
{'stream': '\n'}
{'stream': ' ---> Running in 4a5210ddaf73\n'}
{'stream': 'Removing intermediate container 4a5210ddaf73\n'}
{'stream': ' ---> 74a6329f5a96\n'}
{'stream': 'Step 16/16 : EXPOSE 1521 8080 5500'}
{'stream': '\n'}
{'stream': ' ---> Running in 2226a299f10c\n'}
{'stream': 'Removing intermediate container 2226a299f10c\n'}
{'stream': ' ---> b350079361f4\n'}
{'aux': {'ID': 'sha256:b350079361f42e64ee37dbc5c8257a6734fc3fdd3c66ab500fc86a0adf3b93b7'}}
{'stream': 'Successfully built b350079361f4\n'}
{'stream': 'Successfully tagged linux_for_db18.0.0:latest\n'}

Once the image has been created we can start a container based on it.

db_container = client.containers.create(image.short_id,
                                       command="/bin/bash",
                                       hostname=p_host_name,
                                       tty=True,
                                       stdin_open=True,
                                       auto_remove=False,
                                       name=p_host_name,
                                       shm_size='3G',
                                       # network_mode='host',
                                       ports={1521:1522,5500:5501},
                                       volumes={host_orabase: {'bind': container_oradata, 'mode': 'rw'}, host_install_dir: {'bind': container_install_dir, 'mode': 'rw'}},
                                       environment={'PATH':path,'ORACLE_SID': db_name, 'ORACLE_BASE': oracle_base,'TNS_ADMIN': tns_admin, 'ORACLE_HOME':oracle_home}
                                       )

#                                        volumes={host_orabase: {'bind': oracle_base, 'mode': 'rw'}, host_install_dir: {'bind': container_install_dir, 'mode': 'rw'}},

db_container.start()
p_ip_adress = db_container.attrs['NetworkSettings']['IPAddress']

And then created the needed directory structure within it.

containter_exec(db_container, f'mkdir -p {container_oradata}/{db_name}')
containter_exec(db_container, f'mkdir -p {tns_admin}')
containter_exec(db_container, f'mkdir -p {r_area}/{db_name}')
containter_exec(db_container, f'mkdir -p {a_area}')
containter_exec(db_container, f'mkdir -p {oracle_base}/oraInventory')
containter_exec(db_container, f'mkdir -p {oracle_home}')
containter_root_exec(db_container,'usermod -a -G dba oracle')

Unzip Oracle Database software and validate¶

We now need to unzip the Oracle software which should be located in the host_install_dir variable. This is unzipped within the container not the host. NOTE: I don't stream the output because it's realtively large. It should take 2-5 minutes.

files = [f for f in os.listdir(host_install_dir) if f.endswith('.zip')]
    
if files == 0:
    display(Markdown(f"**There doesn't appear to be any zip files in the {host_install_dir} directory. This should contain the oracle database for Linux 64bit in its orginal zipped format**"))
else:
    display(Markdown(f'Unzipping `{files[0]}`'))
    containter_exec(db_container, f'unzip -o {container_install_dir}/{files[0]} -d {oracle_home}', show_output=False, stream_val=False)

And now display the contents of the Oracle Home

display(Markdown('Directory Contents'))
containter_exec(db_container, f'ls -l {oracle_home}')

total 300
drwxr-x--- 14 oracle oinstall  4096 Jun  4  2018 OPatch
drwxr-xr-x  2 oracle oinstall  4096 Feb  7  2018 QOpatch
drwxr-xr-x  5 oracle oinstall  4096 Feb  7  2018 R
drwxr-xr-x  2 oracle oinstall  4096 Jul 18  2018 addnode
drwxr-xr-x  6 oracle oinstall  4096 Feb  7  2018 apex
drwxr-xr-x  9 oracle oinstall  4096 Jul 18  2018 assistants
drwxr-xr-x  2 oracle oinstall  4096 Jul 18  2018 bin
drwxr-xr-x  4 oracle oinstall  4096 Jul 18  2018 clone
drwxr-xr-x  6 oracle oinstall  4096 Jul 18  2018 crs
drwxr-xr-x  3 oracle oinstall  4096 Feb  7  2018 css
drwxr-xr-x 11 oracle oinstall  4096 Feb  7  2018 ctx
drwxr-xr-x  7 oracle oinstall  4096 Jul 18  2018 cv
drwxr-xr-x  3 oracle oinstall  4096 Feb  7  2018 data
drwxr-xr-x  3 oracle oinstall  4096 Feb  7  2018 dbjava
drwxr-xr-x  2 oracle oinstall  4096 Feb  7  2018 dbs
drwxr-xr-x  5 oracle oinstall  4096 Jul 18  2018 deinstall
drwxr-xr-x  3 oracle oinstall  4096 Jul 18  2018 demo
drwxr-xr-x  3 oracle oinstall  4096 Feb  7  2018 diagnostics
drwxr-xr-x 13 oracle oinstall  4096 Feb  7  2018 dmu
drwxr-xr-x  4 oracle oinstall  4096 Jul 18  2018 drdaas
drwxr-xr-x  3 oracle oinstall  4096 Feb  7  2018 dv
-rw-r--r--  1 oracle oinstall   852 Aug 18  2015 env.ora
drwxr-xr-x  3 oracle oinstall  4096 Feb  7  2018 has
drwxr-xr-x  5 oracle oinstall  4096 Jul 18  2018 hs
drwxr-xr-x 10 oracle oinstall  4096 Jul 18  2018 install
drwxr-xr-x  2 oracle oinstall  4096 Feb  7  2018 instantclient
drwxr-x--- 13 oracle oinstall  4096 Jul 18  2018 inventory
drwxr-xr-x  8 oracle oinstall  4096 Jul 18  2018 javavm
drwxr-xr-x  3 oracle oinstall  4096 Feb  7  2018 jdbc
drwxr-xr-x  7 oracle oinstall  4096 Jul 18  2018 jdk
drwxr-xr-x  2 oracle oinstall  4096 Jul 18  2018 jlib
drwxr-xr-x 10 oracle oinstall  4096 Jul 18  2018 ldap
drwxr-xr-x  3 oracle oinstall 16384 Jul 18  2018 lib
drwxr-xr-x  2 oracle oinstall  4096 Feb  7  2018 log
drwxr-xr-x  9 oracle oinstall  4096 Jul 18  2018 md
drwxr-xr-x  4 oracle oinstall  4096 Feb  7  2018 mgw
drwxr-xr-x 10 oracle oinstall  4096 Jul 18  2018 network
drwxr-xr-x  5 oracle oinstall  4096 Jul 18  2018 nls
drwxr-xr-x  8 oracle oinstall  4096 Feb  7  2018 odbc
drwxr-xr-x  5 oracle oinstall  4096 Feb  7  2018 olap
drwxr-xr-x  7 oracle oinstall  4096 Jul 18  2018 opmn
drwxr-xr-x  4 oracle oinstall  4096 Feb  7  2018 oracore
drwxr-xr-x  7 oracle oinstall  4096 Jul 18  2018 ord
drwxr-xr-x  3 oracle oinstall  4096 Feb  7  2018 ordim
drwxr-xr-x  4 oracle oinstall  4096 Feb  7  2018 ords
drwxr-xr-x  3 oracle oinstall  4096 Feb  7  2018 oss
drwxr-xr-x  8 oracle oinstall  4096 Jul 18  2018 oui
drwxr-xr-x  4 oracle oinstall  4096 Feb  7  2018 owm
drwxr-xr-x  5 oracle oinstall  4096 Jul 18  2018 perl
drwxr-xr-x  6 oracle oinstall  4096 Feb  7  2018 plsql
drwxr-xr-x  6 oracle oinstall  4096 Jul 18  2018 precomp
drwxr-xr-x  4 oracle oinstall  4096 Feb  7  2018 racg
drwxr-xr-x 13 oracle oinstall  4096 Jul 18  2018 rdbms
drwxr-xr-x  3 oracle oinstall  4096 Feb  7  2018 relnotes
-rwx------  1 oracle oinstall   638 Jul 18  2018 root.sh
-rwx------  1 oracle oinstall   786 Feb  7  2018 root.sh.old
-rw-r-----  1 oracle oinstall    10 Feb  7  2018 root.sh.old.1
-rwxr-x---  1 oracle oinstall  1783 Mar  8  2017 runInstaller
-rw-r--r--  1 oracle oinstall  2927 Oct 14  2016 schagent.conf
drwxr-xr-x  5 oracle oinstall  4096 Feb  7  2018 sdk
drwxr-xr-x  3 oracle oinstall  4096 Feb  7  2018 slax
drwxr-xr-x 22 oracle oinstall  4096 Feb  7  2018 sqldeveloper
drwxr-xr-x  3 oracle oinstall  4096 Feb  7  2018 sqlj
drwxr-xr-x  5 oracle oinstall  4096 Jul 18  2018 sqlpatch
drwxr-xr-x  6 oracle oinstall  4096 Jul 18  2018 sqlplus
drwxr-xr-x  6 oracle oinstall  4096 Feb  7  2018 srvm
drwxr-xr-x  5 oracle oinstall  4096 Feb  7  2018 suptools
drwxr-xr-x  3 oracle oinstall  4096 Feb  7  2018 ucp
drwxr-xr-x  4 oracle oinstall  4096 Feb  7  2018 usm
drwxr-xr-x  2 oracle oinstall  4096 Feb  7  2018 utl
drwxr-xr-x  3 oracle oinstall  4096 Feb  7  2018 wwg
drwxr-x---  7 oracle oinstall  4096 Feb  7  2018 xdk

The next step is to create an Oracle Installer response file to reflect the paremeters we've defined. We're only going to perform a software only install.

script = f'''oracle.install.responseFileVersion=/oracle/install/rspfmt_dbinstall_response_schema_v18.0.0
oracle.install.option=INSTALL_DB_SWONLY
UNIX_GROUP_NAME=dba
INVENTORY_LOCATION={oracle_base}/oraInventory
ORACLE_BASE={oracle_base}
ORACLE_HOME={oracle_home}
oracle.install.db.InstallEdition=EE
oracle.install.db.OSDBA_GROUP=dba
oracle.install.db.OSBACKUPDBA_GROUP=dba
oracle.install.db.OSDGDBA_GROUP=dba
oracle.install.db.OSKMDBA_GROUP=dba
oracle.install.db.OSRACDBA_GROUP=dba
'''

copy_string_to_file(script, f'{oracle_home}/db_install.rsp', db_container)

Run the Oracle Installer¶

Now we can run the Oracle Installer in silent mode with a response file we've just created.

containter_exec(db_container, f'{oracle_home}/runInstaller -silent -force -waitforcompletion -responsefile {oracle_home}/db_install.rsp -ignorePrereqFailure')

Launching Oracle Database Setup Wizard...

[WARNING] [INS-32055] The Central Inventory is located in the Oracle base.
   ACTION: Oracle recommends placing this Central Inventory in a location outside the Oracle base directory.
[WARNING] [INS-13013] Target environment does not meet some mandatory requirements.
   CAUSE: Some of the mandatory prerequisites are not met. See logs for details. /tmp/InstallActions2019-03-19_02-40-31PM/installActions2019-03-19_02-40-31PM.log
   ACTION: Identify the list of failed prerequisite checks from the log: /tmp/InstallActions2019-03-19_02-40-31PM/installActions2019-03-19_02-40-31PM.log. Then either from the log file or from installation manual find the appropriate configuration to meet the prerequisites and fix it manually.
The response file for this session can be found at:
 /u01/app/oracle/product/18.0.0/dbhome_1/install/response/db_2019-03-19_02-40-31PM.rsp


You can find the log of this install session at:
 /tmp/InstallActions2019-03-19_02-40-31PM/installActions2019-03-19_02-40-31PM.log

As a root user, execute the following script(s):
1. /u01/app/oracle/oraInventory/orainstRoot.sh
	2. /u01/app/oracle/product/18.0.0/dbhome_1/root.sh

Execute /u01/app/oracle/oraInventory/orainstRoot.sh on the following nodes:
[oracle_db]

Execute /u01/app/oracle/product/18.0.0/dbhome_1/root.sh on the following nodes: 
[oracle_db]

Successfully Setup Software with warning(s).
Moved the install session logs to:

/u01/app/oracle/oraInventory/logs/InstallActions2019-03-19_02-40-31PM

containter_root_exec(db_container,f'/bin/bash {oracle_base}/oraInventory/orainstRoot.sh')
containter_root_exec(db_container,f'/bin/bash {oracle_home}/root.sh')

Changing permissions of /u01/app/oracle/oraInventory.
Adding read,write permissions for group.
Removing read,write,execute permissions for world.
Changing groupname of /u01/app/oracle/oraInventory to dba.
The execution of the script is complete.
Check /u01/app/oracle/product/18.0.0/dbhome_1/install/root_oracle_db_2019-03-19_14-42-43-621215020.log for the output of root script

Commit the container to create an image¶

And finally we can commit the container creating an image for future use.

repository_name = 'dominicgiles'
db_container.commit(repository=repository_name,tag=f'db{oracle_version}')

<Image: 'dominicgiles:db18.0.0'>

Tidy Up¶

The following code is included to enable you to quickly drop the container and potentially the immage.

db_container.stop()
db_container.remove()

list_containers(client)

list_images(client)

#client.images.remove(image.id)

#list_images()

Oracle SODA Python Driver and Jupyter Lab¶

This workbook is divided into two sections the first is a quick guide to setting up Jupyter Lab (Python Notebooks) such that it can connect to a database running inside of OCI, in this case an ATP instance. The second section uses the JSON python driver to connect to the database to run a few tests. This notebook is largely a reminder to myself on how to set this up but hopefully it will be useful to others.

Setting up Python 3.6 and Jupyter Lab on our compute instance¶

I won't go into much detail on setting up ATP or ADW or creating a IaaS server. I covered that process in quite a lot of detail here. We'll be setting up something similar to the following

OCI Cloud

Once you've created the server You'll need to logon to the server with the details found on the compute instances home screen. You just need to grab it's IP address to enable you to logon over ssh.

OCI Cloud

The next step is to connect over ssh to with a command similar to

ssh opc@132.146.27.111
Enter passphrase for key '/Users/dgiles/.ssh/id_rsa': 
Last login: Wed Jan  9 20:48:46 2019 from host10.10.10.1

In the following steps we'll be using python so we need to set up python on the server and configure the needed modules. Our first step is to use yum to install python 3.6 (This is personal preference and you could stick with python 2.7). To do this we first need to enable yum and then install the environment. Run the following commands

sudo yum -y install yum-utils
sudo yum-config-manager --enable ol7_developer_epel
sudo yum install -y python36
python3.6 -m venv myvirtualenv
source myvirtualenv/bin/activate

This will install python and enable a virtual environment for use (our own Python sand pit). You can make sure that python is installed by simply typing python3.6 ie.

$> python3.6
Python 3.6.3 (default, Feb  1 2018, 22:26:31) 
[GCC 4.8.5 20150623 (Red Hat 4.8.5-16)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> quit()

Make sure you type quit() to leave the REPL environment.

We now need to install the needed modules. You can do this one by one or simply use the following file requirements.txt and run the following command

pip -p requirements.txt

This will install all of the need python modules for the next step which is to start up Jupyter Lab.

Jupyter Lab is an interactive web based application that enables you do interactively run code and document the process at the same time. This blog is written in it and the code below can be run once your environment is set up. Vists the website to see more details.

To start jupyer lab we run the following command.

nohup jupyter-lab --ip=127.0.0.1 &

Be aware that this will only work if you have activated you virtual environment. In out instance we did this with with the command source myvirtualenv/bin/activate. At this point the jupyter-lab is running in the background and and is listening (by default) on port 8888. You could start a desktop up and use VNC to view the output. However I prefer to redirect the output to my own desktop over ssh. To do this you'll need to run the following ssh command from your desktop

ssh -N -f -L 5555:localhost:8888 opc@132.146.27.111

Replacing the IP address above with the one for your compute instance

This will redirect the output of 8888 to port 5555 on your destop. You can then connect to it by simply going to the following url http://localhost:5555. After doing this you should see a screen asking you input a token (you'll only need to do this once). You can find this token inside of the nohup.out file running on the compute instance. It should be near the top of the file and should look something like

[I 20:49:12.237 LabApp] http://127.0.0.1:8888/?token=216272ef06c7b7cb3fa8da4e2d7c727dab77c0942fac29c8

Just copy the text after "token=" and paste it in to the dialogue box. After completing that step you should see something like this

Jupyter Lab

You can now start creating your own notebooks or load this one found here. I'd visit the website to familiarise yourself on how the notebooks work.

Using Python and the JSON SODA API¶

This section will walk through using The SODA API with Python from within the Jupyter-lab envionment we set up in the previous section. The SODA API is a simple object API that enables developers persist and retrieve JSON documents held inside of the Oracle Database. SODA drivers are available for Java, C, REST, Node and Python.

You can find the documentation for this API here

To get started we'll need to import the need the following python modules

import cx_Oracle
import keyring
import os
import pandas as pd

We now need to set the location of the directory containing the wallet to enable us to connect to the ATP instance. Once we've done that we can connect to the Oracle ATP instance and get a SODA object to enable us to work with JSON documents. NOTE : I'm using the module keyring to hide the password for my database. You should replace this call with the password for your user.

os.environ['TNS_ADMIN'] = '/home/opc/Wallet'
connection = cx_Oracle.connect('json', keyring.get_password('ATPLondon','json'), 'sbatp_tpurgent')
soda = connection.getSodaDatabase()

We now need to create JSON collection and if needed add any additional indexes which might accelerate data access.

try:
    collection = soda.createCollection("customers_json_soda")
    collection.createIndex({ "name"   : "customer_index",
                          "fields" : [ { "path"     : "name_last",
                          "datatype" : "string"}]})
except cx_Oracle.DatabaseError as ex:
    print("It looks like the index already exists : {}".format(ex))

We can now add data to the collection. Here I'm inserting the document into the database and retrieving it's key. You can find find some examples/test cases on how to use collections here

customer_doc = {"id"          : 1,
       "name_last"    : "Giles",
       "name_first"   : "Dom",
       "description"  : "Gold customer, since 1990",
       "account info" : None,
       "dataplan"     : True,
       "phones"       : [{"type" : "mobile", "number" : 9999965499},
                         {"type" : "home",   "number" : 3248723987}]}
doc = collection.insertOneAndGet(customer_doc)
connection.commit()

To fetch documents we could use SQL or Query By Example (QBE) as shown below. You can find further details on QBE here. In this example there should just be a single document. NOTE: I'm simply using pandas DataFrame to render the retrieved data but it does highlight how simple it is to use the framework for additional analysis at a later stage.

documents = collection.find().filter({'name_first': {'$eq': 'Dom'}}).getDocuments()
results = [document.getContent() for document in documents]    
pd.DataFrame(results)

To update records we can use the replaceOne method.

document = collection.find().filter({'name_first': {'$eq': 'Dom'}}).getOne()
updated = collection.find().key(doc.key).replaceOne({"id"          : 1,
       "name_last"    : "Giles",
       "name_first"   : "Dominic",
       "description"  : "Gold customer, since 1990",
       "account info" : None,
       "dataplan"     : True,
       "phones"       : [{"type" : "mobile", "number" : 9999965499},
                         {"type" : "home",   "number" : 3248723987}]},)
connection.commit()

And just to make sure the change happened

data = collection.find().key(document.key).getOne().getContent()
pd.DataFrame([data])

And finally we can drop the collection.

try:
    collection.drop()
except cx_Oracle.DatabaseError as ex:
    print("We're were unable to drop the collection")

connection.close()

Simple Oracle/Jupyter/Keyring/Altair Example¶

In this trivial example we'll be using Jupyter Lab to create this small notebook of Python code. We will see how easy it is to run SQL and analyze those results. The point of the exercise here isn't really the SQL we use but rather how simple it is for developers or analysts who prefer working with code rather than a full blown UI to retrieve and visualise data.

For reference I'm using Python 3.6 and Oracle Database 12c Release 2. You can find the source here

First we'll need to import the required libraries. It's likely that you won't have them installed on your server/workstation. To do that use the following pip command "pip install cx_Oracle keyring pandas altair jupyter jupyter-sql". You should have pip installed and probably be using virtualenv but if you don't I'm afraid that's beyond the scope of this example. Note whilst I'm not importing jupyter and jupyter-sql in this example they are implicitly used.

import cx_Oracle
import keyring
import pandas as pd
import altair as alt

We will be using the magic-sql functionality inside of jupyter-lab. This means we can trivially embed SQL rather than coding the cusrsors and fetches we would typically have to do if we were using straight forward cx_Oracle. This uses the "magic function" syntax" which start with "%" or "%%". First we need to load the support for SQL

%load_ext sql

Next we can connect to the local docker Oracle Database. However one thing we don't want to do with notebooks when working with server application is to disclose our passwords. I use the python module "keyring" which enables you to store the password once in a platform appropriate fashion and then recall it in code. Outside of this notebook I used the keyring command line utility and ran "keyring set local_docker system". It then prompted me for a password which I'm using here. Please be careful with passwords and code notebooks. It's possible to mistakenly serialise a password and then potentially expose it when sharing notebooks with colleagues.

password = keyring.get_password('local_docker','system')

We can then connect to the Oracle Database. In this instance I'm using the cx_Oracle Diriver and telling it to connect to a local docker database running Oracle Database 12c (12.2.0.1). Because I'm using a service I need to specify that in the connect string. I also substitute the password I fetched earlier.

%sql oracle+cx_oracle://system:$password@localhost/?service_name=soe

'Connected: system@'

I'm now connected to the Oracle database as the system user. I'm simply using this user as it has access to a few more tables than a typical user would. Next I can issue a simple select statement to fetch some table metadata. the %%sql result << command retrieves the result set into the variable called result

%%sql result <<
select table_name, tablespace_name, num_rows, blocks, avg_row_len, trunc(last_analyzed)
from all_tables
where num_rows  > 0
and tablespace_name is not null

 * oracle+cx_oracle://system:***@localhost/?service_name=soe
0 rows affected.
Returning data to local variable result

In Python a lot of tabular manipulation of data is performed using the module Pandas. It's an amazing piece of code enabling you to analyze, group, filter, join, pivot columnar data with ease. If you've not used it before I strongly reccomend you take a look. With that in mind we need to take the resultset we retrieved and convert it into a DataFrame (the base Pandas tabular structure)

result_df = result.DataFrame()

We can see a sample of that result set the the Panada's head function

result_df.head()

All very useful but what if wanted to chart how many tables were owned by each user. To do that we could of course use the charting functionlty of Pandas (and Matplotlib) but I've recently started experimenting with Altair and found it to make much more sense when define what and how to plot data. So lets just plot the count of tables on the X axis and the tablespace name on the Y axis.

alt.Chart(result_df).mark_bar().encode(
    x='count()',
    y='tablespace_name',
)

Altair makes much more sense to me as a SQL user than some of the cryptic functionality of Matplotlib. But the chart looks really dull. We can trvially tidy it up a few additonal commands

alt.Chart(result_df).mark_bar().encode(
    x=alt.Y('count()', title='Number of tables in Tablespace'),
    y=alt.Y('tablespace_name', title='Tablespace Name'),
    color='tablespace_name'
).properties(width=600, height= 150, title='Tables Overview')

Much better but Altair can do significantly more sophisticated charts than a simple bar chart. Next lets take a look at the relationship between the number of rows in a table and the blocks required to store them. It's worth noting that this won't, or at least shouldn't, produce any startling results but it's a useful plot on a near empty database that will give use something to look at. Using another simple scatter chart we get the following

alt.Chart(result_df).mark_point().encode(
    y = 'num_rows',
    x = 'blocks',
).properties(width=600)

Again it's a near empty database and so there's not much going on or at least we have a lot of very small tables and a few big ones. Lets use a logarithmic scale to spread things out a little better and take the oppertunity to brighten it up

alt.Chart(result_df).mark_point().encode(
    y = alt.Y('num_rows',scale=alt.Scale(type='log')),
    x = alt.X('blocks', scale=alt.Scale(type='log')),
    color = 'tablespace_name',
    tooltip=['table_name']
).properties(width=600)

Much better... But Altair and the libraries it uses has a another fantastic trick up its sleeve. It can make the charts interactive (zoom and pan) but also enable you to select data in one chart and have it reflected in another. To do that we use a "Selection" object and use this to filter the data in a second chart. Lets do this for the two charts we created. Note you'll need to have run this notebook "live" to see the interactive selection.

interval = alt.selection_interval()

chart1 = alt.Chart(result_df).mark_point().encode(
    y = alt.Y('num_rows',scale=alt.Scale(type='log')),
    x = alt.X('blocks', scale=alt.Scale(type='log')),
    color = 'tablespace_name',
    tooltip=['table_name']
).properties(width=600, selection=interval)

chart2 = alt.Chart(result_df).mark_bar().encode(
    x='count()',
    y='tablespace_name',
    color = 'tablespace_name'
).properties(width=600, height= 150).transform_filter(
    interval
)

chart1 & chart2

And thats it. I'll be using this framework to create a few additional examples in the coming weeks.

So as usual for this time of year I find myself on vacation with very little to do. So I try and find personal projects that interest me. This is usually a mixture of electronics and mucking around with software in a way that I don't usally find the time for normally. One of projects is my sensor network.

I have a number of Raspberry Pi's around my house and garden that take measurements of temperature, humidity, pressure and light. They hold the data locally and then periodically upload them to a central server (another Raspberry Pi) where they are aggregated. However for any number of reasons (usally a power failure) the raspberrypi's occasionally restart and are unable to join the network. This means that some of their data is lost. I've improved their resiliance to failure and so it's a less common occurance but it's still possible for it to happen. When this means I'm left with some ugly gaps in an otherwise perfect data set. It's not a big deal but it's pretty easy to fix. Before I begin, I acknolwedge that I'm effectively "making up" data to make graphs "prettier".

In the following code notebook I'll be using Python and Pandas to tidy up the gaps.

To start with I need to load the libraries to process the data. The important ones are included at the start of the imports. The rest from "SensorDatabaseUtilities" aren't really relevant since they are just helper classes to get data from my repository

import matplotlib.pyplot as plt
from matplotlib import style
import pandas as pd
import matplotlib
import json
from pandas.io.json import json_normalize
# The following imports are from my own Sensor Library modules and aren't really relevant
from SensorDatabaseUtilities import AggregateItem
from SensorDatabaseUtilities import SensorDatabaseUtilities
# Make sure the charts appear in this notebook and are readable
%matplotlib inline
matplotlib.rcParams['figure.figsize'] = (20.0, 10.0)

The following function is used to convert a list of JSON documents (sensor readings) into a Pandas DataFrame. It then finds the minimum and maximum dates and creates a range for that period. It uses this period to find any missing dates. The heavy lifting of the function uses the reindex() function to insert new entries whilst at the same time interpolating any missing values in the dataframe. It then returns just the newly generated rows

def fillin_missing_data(sensor_name, results_list, algorithm='linear', order=2):
    # Turn list of json documents into a json document
    results = {"results": results_list}
    # Convert JSON into Panda Dataframe/Table
    df = json_normalize(results['results'])
    # Convert Date String to actual DateTime object
    df['Date'] = pd.to_datetime(df['Date'])
    # Find the max and min of the Data Range and generate a complete range of Dates
    full_range = pd.date_range(df['Date'].min(), df['Date'].max())
    # Find the dates that aren't in the complete range
    missing_dates = full_range[~full_range.isin(df['Date'])]
    # Set the Date to be the index
    df.set_index(['Date'], inplace=True)
    # Reindex the data filling in the missing date and interpolating missing values
    if algorithm in ['spline', 'polynomial'] :
        df = df.sort_index().reindex(full_range).interpolate(method=algorithm, order=order)
    elif algorithm in ['ffill', 'bfill']:
        df = df.sort_index().reindex(full_range, method=algorithm)
    else:
        df = df.sort_index().reindex(full_range).interpolate(method=algorithm)
    # Find the dates in original data set that have been added
    new_dates = df[df.index.isin(missing_dates)]
    # Create new aggregate records and insert them into the database
    # new_dates.apply(gen_json,axis=1, args=[sensor_name])
    return new_dates

This function simply takes an array of JSON documents and converts them into a DataFrame using the Pandas json_normalize function. It provides us with the dataset that contains missing data i.e. an incomplete data set.

def json_to_dataframe(results_list):
    # Turn list of json documents into a json dodument
    results = {"results": results_list}
    # Convert JSON into Panda Dataframe/Table
    df = json_normalize(results['results'])
    return df

The first step is to pull the data from the database. I'm using some helper functions to do this for me. I've also selected a date range where I know I have a problem.

utils = SensorDatabaseUtilities('raspberrypi', 'localhost')
data = utils.getRangeData('20-jan-2015', '10-feb-2015')
# The following isn't need in the code but is included just to show the structure of the JSON Record
json.dumps(data[0])

'{"Date": "2015-01-20 00:00:00", "AverageHumidity": 35.6, "AverageTemperature": 18.96, "AveragePressure": 99838.78, "AverageLight": 119.38}'

Next simply convert the list of JSON records into a Pandas DataFrame and set it's index to the "Date" Column. NOTE : Only the first 5 records are shown

incomplete_data = json_to_dataframe(data)
# Find the range of the data and build a series with all dates for that range 
full_range = pd.date_range(incomplete_data['Date'].min(), incomplete_data['Date'].max())
incomplete_data['Date'] = pd.to_datetime(incomplete_data['Date'])
incomplete_data.set_index(['Date'], inplace=True)
# Show the structure of the data set when converted into a DataFrame
incomplete_data.head()

The following step isn't needed but simply shows the problem we have. In this instance we are missing the days for Janurary 26th 2015 to Janurary 30th 2015

#incomplete_data.set_index(['Date'], inplace=True)
problem_data = incomplete_data.sort_index().reindex(full_range)
axis = problem_data['AverageTemperature'].plot(kind='bar')
axis.set_ylim(18,22)
plt.show()

Pandas offers you a number of approaches for interpolating the missing data in a series. They range from the simple method of backfilling or forward filling values to the more powerful approaches of methods such as "linear", "quadratic" and "cubic" all the way through to the more sophisticated approaches of "pchip", "spline" and "polynomial". Each approach has its benefits and disadvantages. Rather than talk through each it's much simpler to show you the effect of each interpolation on the data. I've used a line graph rather than a bar graph to allow me to show all of the approaches on a single graph.

interpolation_algorithms = ['linear', 'quadratic', 'cubic', 'spline', 'polynomial', 'pchip', 'ffill', 'bfill']

fig, ax = plt.subplots()
for ia in interpolation_algorithms:
    new_df = pd.concat([incomplete_data, fillin_missing_data('raspberrypi', data, ia)])
    ax = new_df['AverageTemperature'].plot()

handles, not_needed = ax.get_legend_handles_labels()
ax.legend(handles, interpolation_algorithms, loc='best')

plt.show()

Looking at the graph it appears that either pchip (Piecewise Cubic Hermite Interpolating Polynomial) or Cubic interpolation is going to provide the best approximation for the missing values in my data set. This is largely subjective because these are "made up values" but I believe either of these approaches provide values that are closest to what the data could have been.

The next step is to apply one to the incomplete data set and store it back in the database

complete_data = pd.concat([incomplete_data, fillin_missing_data('raspberrypi', data, 'pchip')])

axis = complete_data.sort_index()['AverageTemperature'].plot(kind='bar')
axis.set_ylim(18,22)
plt.show()

And thats it. I've made the code much more verbose that it needed to be purely to demonstrate the point. Pandas makes it very simple to patch up a data set.

Tag	id	Size
dominicgiles:db18.0.0	cc75a47617	10.8 GB
linux_for_db18.0.0:latest	b350079361	926.8 MB
oraclelinux:7-slim	c3d8693881	117.3 MB
giaas.us.oracle.com:5001/dominic.giles/104213:db19000	808b7de5b5	7.8 GB
dbtools-docker.dockerhub-den.oraclecorp.com/oracle/sdw:18.4.0	af2d2f4838	1.8 GB
dbtools-docker.dockerhub-den.oraclecorp.com/oracle/database:18.3.0-ee	c9eae81d87	8.4 GB

Tag	id	Size
dominicgiles:db18.0.0	cc75a47617	10.8 GB
linux_for_db18.0.0:latest	b350079361	926.8 MB
oraclelinux:7-slim	c3d8693881	117.3 MB
giaas.us.oracle.com:5001/dominic.giles/104213:db19000	808b7de5b5	7.8 GB
dbtools-docker.dockerhub-den.oraclecorp.com/oracle/sdw:18.4.0	af2d2f4838	1.8 GB
dbtools-docker.dockerhub-den.oraclecorp.com/oracle/database:18.3.0-ee	c9eae81d87	8.4 GB

	table_name	tablespace_name	num_rows	blocks	avg_row_len	TRUNC(LAST_ANALYZED)
0	BOOTSTRAP$	SYSTEM	60	3	314	2018-06-14
1	CON$	SYSTEM	7068	28	22	2018-06-20
2	OBJ$	SYSTEM	72687	1211	112	2018-06-14
3	TS$	SYSTEM	7	7	89	2018-06-14
4	IND$	SYSTEM	2892	1565	89	2018-06-19

	AverageHumidity	AverageLight	AveragePressure	AverageTemperature
Date
2015-01-20	35.60	119.38	99838.78	18.96
2015-01-21	38.77	63.65	99617.15	19.48
2015-01-22	37.45	143.00	100909.08	20.08
2015-01-23	35.52	119.87	101306.30	20.12
2015-01-24	39.72	92.43	101528.54	19.90

Name	Short id	Status	IP Address
oraclestby	90e777d67f	running	172.17.0.3
oracleprimary	51fb562d84	running	172.17.0.2

dominicgiles.com

"God rot Windows and all its ugly, clunky badly-designed horror" - Stephen Fry

My Blog