Oracle CDC Connector

The Oracle CDC connector reads snapshot data and incremental data from an Oracle database. This document describes how to set up the Oracle CDC connector to run SQL queries against Oracle databases.

Dependencies

To set up the Oracle CDC connector, add the dependency that matches your environment: the Maven dependency below for projects using a build automation tool (such as Maven or SBT), or the SQL JAR bundle for the SQL Client.

Maven dependency

<dependency>
  <groupId>com.ververica</groupId>
  <artifactId>flink-connector-oracle-cdc</artifactId>
  <!-- the dependency is available only for stable releases. -->
  <version>2.1.1</version>
</dependency>

SQL Client JAR

Download link is available only for stable releases.

Download flink-sql-connector-oracle-cdc-2.1.1.jar and put it under <FLINK_HOME>/lib/.

Setup Oracle

You have to enable log archiving for the Oracle database and define an Oracle user with appropriate permissions on all databases that the Debezium Oracle connector monitors.

  1. Enable log archiving

    (1.1). Connect to the database as DBA

    sqlplus sys/password@host:port/SID AS SYSDBA
    

    (1.2). Enable log archiving

    alter system set db_recovery_file_dest_size = 10G;
    alter system set db_recovery_file_dest = '/opt/oracle/oradata/recovery_area' scope=spfile;
    shutdown immediate;
    startup mount;
    alter database archivelog;
    alter database open;
    

    Notes:

    • Enabling log archiving requires a database restart, so plan for the downtime before doing it.

    • Archived logs occupy a large amount of disk space, so consider cleaning up expired logs periodically.

    (1.3). Check whether log archiving is enabled

    -- Should now show "Database log mode: Archive Mode"
    archive log list;
    

    Notes:

    Supplemental logging must be enabled for the captured tables or for the whole database so that change events can capture the before state of changed rows. The following statements show how to configure this at the table level and at the database level.

    -- Enable supplemental logging for a specific table:
    ALTER TABLE inventory.customers ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS;
    
    -- Enable supplemental logging for database
    ALTER DATABASE ADD SUPPLEMENTAL LOG DATA;
    
  2. Create an Oracle user with permissions

    (2.1). Create Tablespace

    sqlplus sys/password@host:port/SID AS SYSDBA;
      CREATE TABLESPACE logminer_tbs DATAFILE '/opt/oracle/oradata/SID/logminer_tbs.dbf' SIZE 25M REUSE AUTOEXTEND ON MAXSIZE UNLIMITED;
      exit;
    

    (2.2). Create a user and grant permissions

    sqlplus sys/password@host:port/SID AS SYSDBA;
      CREATE USER flinkuser IDENTIFIED BY flinkpw DEFAULT TABLESPACE LOGMINER_TBS QUOTA UNLIMITED ON LOGMINER_TBS;
      GRANT CREATE SESSION TO flinkuser;
      GRANT SET CONTAINER TO flinkuser;
      GRANT SELECT ON V_$DATABASE to flinkuser;
      GRANT FLASHBACK ANY TABLE TO flinkuser;
      GRANT SELECT ANY TABLE TO flinkuser;
      GRANT SELECT_CATALOG_ROLE TO flinkuser;
      GRANT EXECUTE_CATALOG_ROLE TO flinkuser;
      GRANT SELECT ANY TRANSACTION TO flinkuser;
      GRANT LOGMINING TO flinkuser;
    
      GRANT CREATE TABLE TO flinkuser;
      GRANT LOCK ANY TABLE TO flinkuser;
      GRANT ALTER ANY TABLE TO flinkuser;
      GRANT CREATE SEQUENCE TO flinkuser;
    
      GRANT EXECUTE ON DBMS_LOGMNR TO flinkuser;
      GRANT EXECUTE ON DBMS_LOGMNR_D TO flinkuser;
    
      GRANT SELECT ON V_$LOG TO flinkuser;
      GRANT SELECT ON V_$LOG_HISTORY TO flinkuser;
      GRANT SELECT ON V_$LOGMNR_LOGS TO flinkuser;
      GRANT SELECT ON V_$LOGMNR_CONTENTS TO flinkuser;
      GRANT SELECT ON V_$LOGMNR_PARAMETERS TO flinkuser;
      GRANT SELECT ON V_$LOGFILE TO flinkuser;
      GRANT SELECT ON V_$ARCHIVED_LOG TO flinkuser;
      GRANT SELECT ON V_$ARCHIVE_DEST_STATUS TO flinkuser;
      exit;
    

For more information, see Setting up Oracle.

How to create an Oracle CDC table

The Oracle CDC table can be defined as follows:

-- register an Oracle table 'products' in Flink SQL
Flink SQL> CREATE TABLE products (
     ID INT NOT NULL,
     NAME STRING,
     DESCRIPTION STRING,
     WEIGHT DECIMAL(10, 3),
     PRIMARY KEY(ID) NOT ENFORCED
     ) WITH (
     'connector' = 'oracle-cdc',
     'hostname' = 'localhost',
     'port' = '1521',
     'username' = 'flinkuser',
     'password' = 'flinkpw',
     'database-name' = 'XE',
     'schema-name' = 'inventory',
     'table-name' = 'products');
  
-- read the snapshot and redo log changes from the products table
Flink SQL> SELECT * FROM products;

Connector Options

| Option | Required | Default | Type | Description |
|---|---|---|---|---|
| connector | required | (none) | String | Specifies which connector to use; here it should be 'oracle-cdc'. |
| hostname | required | (none) | String | IP address or hostname of the Oracle database server. |
| username | required | (none) | String | Name of the Oracle user to use when connecting to the Oracle database server. |
| password | required | (none) | String | Password to use when connecting to the Oracle database server. |
| database-name | required | (none) | String | Database name of the Oracle server to monitor. |
| schema-name | required | (none) | String | Schema name of the Oracle database to monitor. |
| table-name | required | (none) | String | Table name of the Oracle database to monitor. |
| port | optional | 1521 | Integer | Integer port number of the Oracle database server. |
| scan.startup.mode | optional | initial | String | Optional startup mode for the Oracle CDC consumer; valid enumerations are "initial" and "latest-offset". See the Startup Reading Position section for more detailed information. |
| debezium.* | optional | (none) | String | Passes Debezium properties through to the Debezium Embedded Engine, which is used to capture data changes from the Oracle server. For example: 'debezium.snapshot.mode' = 'never'. See the Debezium Oracle connector properties for more details. |

Limitation

Checkpoints cannot be performed while scanning table snapshots

While the connector scans the snapshot of the database tables, there is no recoverable position, so checkpoints cannot be performed. To avoid taking checkpoints during this phase, the Oracle CDC source keeps each checkpoint waiting until it times out. A timed-out checkpoint is treated as a failed checkpoint, which by default triggers a failover of the Flink job. If the database table is large, it is therefore recommended to add the following Flink configuration to avoid failovers caused by timed-out checkpoints:

execution.checkpointing.interval: 10min
execution.checkpointing.tolerable-failed-checkpoints: 100
restart-strategy: fixed-delay
restart-strategy.fixed-delay.attempts: 2147483647

Available Metadata

The following format metadata can be exposed as read-only (VIRTUAL) columns in a table definition.

| Key | DataType | Description |
|---|---|---|
| table_name | STRING NOT NULL | Name of the table that contains the row. |
| schema_name | STRING NOT NULL | Name of the schema that contains the row. |
| database_name | STRING NOT NULL | Name of the database that contains the row. |
| op_ts | TIMESTAMP_LTZ(3) NOT NULL | Time at which the change was made in the database. If the record is read from the snapshot of the table instead of the change stream, the value is always 0. |

The extended CREATE TABLE example demonstrates the syntax for exposing these metadata fields:

CREATE TABLE products (
    db_name STRING METADATA FROM 'database_name' VIRTUAL,
    schema_name STRING METADATA FROM 'schema_name' VIRTUAL,
    table_name STRING METADATA FROM 'table_name' VIRTUAL,
    operation_ts TIMESTAMP_LTZ(3) METADATA FROM 'op_ts' VIRTUAL,
    ID INT NOT NULL,
    NAME STRING,
    DESCRIPTION STRING,
    WEIGHT DECIMAL(10, 3),
    PRIMARY KEY(ID) NOT ENFORCED
) WITH (
    'connector' = 'oracle-cdc',
    'hostname' = 'localhost',
    'port' = '1521',
    'username' = 'flinkuser',
    'password' = 'flinkpw',
    'database-name' = 'XE',
    'schema-name' = 'inventory',
    'table-name' = 'products'
);
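
As a small illustration of the op_ts semantics described in the metadata table, the following Java sketch tells snapshot rows apart from change-stream rows by checking for the zero timestamp. It assumes that the op_ts value has already been obtained as a java.time.Instant and that "always 0" corresponds to the Unix epoch. The class and method names (OpTsCheck, isSnapshotRecord) are hypothetical and not part of the connector.

```java
import java.time.Instant;

// Hypothetical helper for illustration; not part of the Oracle CDC connector API.
final class OpTsCheck {

    // Assumption: an op_ts of 0 (the Unix epoch) marks a row read from the table snapshot.
    static boolean isSnapshotRecord(Instant opTs) {
        return opTs.toEpochMilli() == 0L;
    }

    public static void main(String[] args) {
        // A snapshot row carries the zero timestamp.
        System.out.println(isSnapshotRecord(Instant.EPOCH));
        // A change-stream row carries the time of the change in the database.
        System.out.println(isSnapshotRecord(Instant.ofEpochMilli(1_640_995_200_000L)));
    }
}
```

A downstream job could use such a check to skip event-time logic for snapshot rows, since their op_ts does not reflect when the row was actually written.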

Note: The Oracle dialect is case-sensitive and converts a field name to uppercase if it is not quoted, whereas Flink SQL does not convert field names. Therefore, for physical columns from an Oracle database, use the converted field name from Oracle when defining an oracle-cdc table in Flink SQL.
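
The naming rule in the note above can be sketched in Java as follows. This is a minimal illustration, assuming the identifier is given exactly as it was written in the Oracle DDL; the class and method names (OracleIdentifiers, storedName) are hypothetical and not part of the connector.

```java
import java.util.Locale;

// Hypothetical helper for illustration; not part of the Oracle CDC connector API.
final class OracleIdentifiers {

    // Returns the name to use for the column in the Flink DDL:
    // a double-quoted identifier keeps its case, an unquoted one is uppercased.
    static String storedName(String identifier) {
        boolean quoted = identifier.length() >= 2
                && identifier.startsWith("\"")
                && identifier.endsWith("\"");
        if (quoted) {
            return identifier.substring(1, identifier.length() - 1);
        }
        return identifier.toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(storedName("weight"));     // unquoted in the Oracle DDL
        System.out.println(storedName("\"weight\"")); // quoted in the Oracle DDL
    }
}
```

With this rule, a column created in Oracle as weight is declared as WEIGHT in the Flink table, while a column created as "weight" keeps its lowercase name.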

Features

Exactly-Once Processing

The Oracle CDC connector is a Flink Source connector that first reads a database snapshot and then continues to read change events, with exactly-once processing even when failures happen. For details, please read How the connector works.

Startup Reading Position

The config option scan.startup.mode specifies the startup mode for Oracle CDC consumer. The valid enumerations are:

  • initial (default): Performs an initial snapshot of the monitored database tables upon first startup, and then continues to read the latest change events.

  • latest-offset: Never performs a snapshot of the monitored database tables upon first startup; reads only the changes made since the connector was started.

Note: the scan.startup.mode option relies on Debezium's snapshot.mode configuration, so please do not use them together. If you specify both scan.startup.mode and debezium.snapshot.mode in the table DDL, scan.startup.mode may not take effect.
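
A simple guard against this conflict could look like the Java sketch below. It assumes the table options are available as a plain map of the WITH-clause keys and values; the class and method names (StartupOptionCheck, hasConflictingSnapshotOptions) are hypothetical and the connector itself is not known to perform this check.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical pre-flight check for illustration; not part of the connector.
final class StartupOptionCheck {

    // True when both options are set, the combination the note above advises against.
    static boolean hasConflictingSnapshotOptions(Map<String, String> options) {
        return options.containsKey("scan.startup.mode")
                && options.containsKey("debezium.snapshot.mode");
    }

    public static void main(String[] args) {
        Map<String, String> options = new HashMap<>();
        options.put("connector", "oracle-cdc");
        options.put("scan.startup.mode", "latest-offset");
        options.put("debezium.snapshot.mode", "never");
        if (hasConflictingSnapshotOptions(options)) {
            System.out.println("Remove either scan.startup.mode or debezium.snapshot.mode.");
        }
    }
}
```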

Single Thread Reading

The Oracle CDC source cannot read in parallel, because only one task can receive change events.

DataStream Source

The Oracle CDC connector can also be used as a DataStream source. You can create a SourceFunction as follows:

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.source.SourceFunction;
import com.ververica.cdc.debezium.JsonDebeziumDeserializationSchema;
import com.ververica.cdc.connectors.oracle.OracleSource;

public class OracleSourceExample {
  public static void main(String[] args) throws Exception {
     SourceFunction<String> sourceFunction = OracleSource.<String>builder()
             .hostname("localhost") // hostname of the Oracle server, as in the SQL example
             .port(1521)
             .database("XE") // monitor XE database
             .schemaList("inventory") // monitor inventory schema
             .tableList("inventory.products") // monitor products table
             .username("flinkuser")
             .password("flinkpw")
             .deserializer(new JsonDebeziumDeserializationSchema()) // converts SourceRecord to JSON String
             .build();

     StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

     env
        .addSource(sourceFunction)
        .print().setParallelism(1); // use parallelism 1 for sink to keep message ordering   
     
     env.execute();
  }
}

Note: Please refer to Deserialization for more details about the JSON deserialization.

Data Type Mapping

| Oracle type | Flink SQL type |
|---|---|
| NUMBER(p, s <= 0), p - s < 3 | TINYINT |
| NUMBER(p, s <= 0), p - s < 5 | SMALLINT |
| NUMBER(p, s <= 0), p - s < 10 | INT |
| NUMBER(p, s <= 0), p - s < 19 | BIGINT |
| NUMBER(p, s <= 0), 19 <= p - s <= 38 | DECIMAL(p - s, 0) |
| NUMBER(p, s > 0) | DECIMAL(p, s) |
| NUMBER(p, s <= 0), p - s > 38 | STRING |
| FLOAT, BINARY_FLOAT | FLOAT |
| DOUBLE PRECISION, BINARY_DOUBLE | DOUBLE |
| NUMBER(1) | BOOLEAN |
| DATE, TIMESTAMP [(p)] | TIMESTAMP [(p)] [WITHOUT TIMEZONE] |
| TIMESTAMP [(p)] WITH TIME ZONE | TIMESTAMP [(p)] WITH TIME ZONE |
| TIMESTAMP [(p)] WITH LOCAL TIME ZONE | TIMESTAMP_LTZ [(p)] |
| CHAR(n), NCHAR(n), NVARCHAR2(n), VARCHAR(n), VARCHAR2(n), CLOB, NCLOB, XMLType | STRING |
| BLOB, ROWID | BYTES |
| INTERVAL DAY TO SECOND, INTERVAL YEAR TO MONTH | BIGINT |
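
The NUMBER(p, s) rows of the mapping above can be read as a small decision procedure. The Java sketch below mirrors those rows only; it is an illustration, not the connector's actual implementation. The class and method names (OracleNumberMapping, flinkTypeFor) are hypothetical, and the sketch deliberately ignores the separate NUMBER(1) to BOOLEAN row, because the table does not state which row takes precedence for that type.

```java
// Hypothetical helper mirroring the NUMBER(p, s) rows of the mapping table;
// not part of the Oracle CDC connector API.
final class OracleNumberMapping {

    static String flinkTypeFor(int precision, int scale) {
        if (scale > 0) {
            // NUMBER(p, s > 0) keeps precision and scale.
            return "DECIMAL(" + precision + ", " + scale + ")";
        }
        // For s <= 0 the table decides on p - s.
        int width = precision - scale;
        if (width < 3) return "TINYINT";
        if (width < 5) return "SMALLINT";
        if (width < 10) return "INT";
        if (width < 19) return "BIGINT";
        if (width <= 38) return "DECIMAL(" + width + ", 0)";
        return "STRING";
    }

    public static void main(String[] args) {
        System.out.println("NUMBER(9, 0)   -> " + flinkTypeFor(9, 0));
        System.out.println("NUMBER(10, 3)  -> " + flinkTypeFor(10, 3));
        System.out.println("NUMBER(38, -2) -> " + flinkTypeFor(38, -2));
    }
}
```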