Debezium Server to Cloud PubSub: A Kafka-less way to stream changes from databases

Ravish Garg
Published in Nerd For Tech
4 min read · Jul 9, 2021

According to Harvard Business Review (HBR), less than 1% of an organization’s unstructured data and less than 50% of its structured data is ever used in decision making, because organizations struggle to integrate, manage, and extract value from their data efficiently. At the same time, to stay competitive, organizations need to process and analyze data in real time to gain insights. This is where data streaming comes in, and one of the most loved open-source utilities for it is Debezium, a distributed platform for change data capture (CDC).

The main restriction has been that Debezium must be deployed through Apache Kafka Connect as a source connector. Every captured record therefore has to travel through Kafka to a sink connector before it reaches other systems, including other messaging services such as Cloud Pub/Sub.

Cloud PubSub integration using Debezium Kafka Connector, By Author

The Debezium team has since introduced Debezium Server, a ready-to-use application that streams change events from a source database directly to messaging infrastructure such as Google Cloud’s managed Pub/Sub service. Apache Kafka is no longer required, which removes both the associated operational overhead and an extra hop.

Cloud PubSub integration using Debezium Server, By Author

Debezium Server set-up

Let’s configure Debezium Server with SQL Server, an enterprise database engine, as the source and Google Cloud Pub/Sub as the sink, without any Kafka components.

Configure SQL Server for CDC

  • Ensure you are a member of the sysadmin fixed server role.
  • Ensure you are a db_owner of the database.
  • Ensure SQL Server Agent is running.
  • To enable CDC for the database “demodb”, execute the following:
USE demodb
GO
EXEC sys.sp_cdc_enable_db
GO
  • Enable change data capture on each source table of interest.
USE demodb
GO
EXEC sys.sp_cdc_enable_table
@source_schema = N'dbo',
@source_name = N'Inventory',
@role_name = NULL,
@supports_net_changes = 0
GO
Source Table definition: Inventory, By Author
  • Verify that the table is tracked and that you can access its CDC configuration.
USE demodb
GO
EXEC sys.sp_cdc_help_change_data_capture
GO
CDC table dbo_Inventory has been created and is available, By Author
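
The CDC steps above can also be double-checked from a shell by querying the catalog flags `is_cdc_enabled` (in `sys.databases`) and `is_tracked_by_cdc` (in `sys.tables`); both should report 1 once CDC is enabled. The sketch below assumes the `sqlcmd` client is installed and that the password is supplied through the `SQLCMDPASSWORD` environment variable. `SQLSERVER_HOST` and the `run` helper are hypothetical names introduced for illustration. By default the script only prints the commands; set `APPLY=1` to execute them.

```shell
#!/bin/sh
# Sketch: check that CDC is enabled for demodb and for dbo.Inventory.
# Assumes sqlcmd is installed and SQLCMDPASSWORD holds the password for "sa".

# Print each command by default; run it only when APPLY=1 is set.
run() {
  if [ "${APPLY:-0}" = "1" ]; then "$@"; else echo "+ $*"; fi
}

# is_cdc_enabled is 1 when CDC is enabled on the database.
DB_QUERY="SELECT name, is_cdc_enabled FROM sys.databases WHERE name = 'demodb';"
# is_tracked_by_cdc is 1 when the table has a capture instance.
TABLE_QUERY="SELECT name, is_tracked_by_cdc FROM sys.tables WHERE name = 'Inventory';"

# SQLSERVER_HOST is a placeholder variable; it falls back to localhost.
for query in "$DB_QUERY" "$TABLE_QUERY"; do
  run sqlcmd -S "${SQLSERVER_HOST:-localhost}" -U sa -d demodb -Q "$query"
done
```

Run it as `APPLY=1 sh check_cdc.sh` (the file name is arbitrary) once the connection details are in place.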

Configure Debezium Server

  • Download and install the Debezium Server distribution.
  • Create the configuration file “conf/application.properties”, which holds all configuration for the source, sink, format and any applicable transformations.
  • With a SQL Server database as the source and Google Cloud Pub/Sub as the messaging infrastructure, the configuration file looks like this:
debezium.sink.type=pubsub
debezium.sink.pubsub.project.id= <<PUT YOUR PROJECT_ID>>
debezium.source.connector.class=io.debezium.connector.sqlserver.SqlServerConnector
debezium.source.offset.storage.file.filename=/tmp/offsets.dat
debezium.source.offset.flush.interval.ms=0
debezium.source.database.hostname=<<SQL-Server HOSTNAME/IP_ADDRESS>>
debezium.source.database.port=1433
debezium.source.database.user=sa
debezium.source.database.password= <<PASSWORD>>
debezium.source.database.dbname=demodb
debezium.source.database.server.name=demodb
debezium.sink.pravega.scope=empty
debezium.source.table.include.list=dbo.Inventory
debezium.source.database.history.file.filename=/tmp/FileDatabaseHistory.dat
debezium.source.database.history=io.debezium.relational.history.FileDatabaseHistory
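
Before starting the server, it can help to confirm that the file defines the keys this walkthrough depends on. The following is a minimal sketch and not part of Debezium: `check_config` and the `REQUIRED_KEYS` list are hypothetical helpers built from the configuration shown above, so adjust the list to your own setup.

```shell
#!/bin/sh
# Sketch: confirm conf/application.properties defines the keys used in this setup.
# The key list mirrors the configuration above; it is not an official checklist.

REQUIRED_KEYS="debezium.sink.type
debezium.sink.pubsub.project.id
debezium.source.connector.class
debezium.source.offset.storage.file.filename
debezium.source.database.hostname
debezium.source.database.port
debezium.source.database.user
debezium.source.database.password
debezium.source.database.dbname
debezium.source.database.server.name"

# check_config FILE: print each missing key to stderr; return 1 if any is missing.
check_config() {
  file="$1"
  missing=0
  for key in $REQUIRED_KEYS; do
    # Dots in the key act as regex wildcards, which is loose but adequate here.
    if ! grep -q "^${key}[[:space:]]*=" "$file"; then
      echo "missing: ${key}" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Only check the default location when the file is actually there.
CONFIG="conf/application.properties"
if [ -f "$CONFIG" ]; then
  check_config "$CONFIG" && echo "all required keys present in $CONFIG"
fi
```

The check only looks for the presence of each key; it does not validate the values or test connectivity.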

Configure Cloud PubSub

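  • Create the Pub/Sub topic with the name <<server_name.schema.table_name>>, where server_name is the value of debezium.source.database.server.name (here it is the same as the database name, demodb).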
PubSub Topic: demodb.dbo.Inventory, By Author
  • Create a Pub/Sub subscription on that topic so you can view the messages published to it.
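
The two steps above can be scripted with the `gcloud` CLI. This is a sketch under a few assumptions: `gcloud` is installed and authenticated, a default project is configured (otherwise add `--project`), and the subscription name `demodb.dbo.Inventory-sub` is a hypothetical choice. The `run` helper is also illustrative; it prints the commands unless `APPLY=1` is set.

```shell
#!/bin/sh
# Sketch: create the topic Debezium Server publishes to, plus a pull subscription.

# Print each command by default; run it only when APPLY=1 is set.
run() {
  if [ "${APPLY:-0}" = "1" ]; then "$@"; else echo "+ $*"; fi
}

SERVER_NAME="demodb"   # value of debezium.source.database.server.name
SCHEMA="dbo"
TABLE="Inventory"

# The topic name must match <server_name>.<schema>.<table>.
TOPIC="${SERVER_NAME}.${SCHEMA}.${TABLE}"
SUBSCRIPTION="${TOPIC}-sub"   # hypothetical subscription name

run gcloud pubsub topics create "$TOPIC"
run gcloud pubsub subscriptions create "$SUBSCRIPTION" --topic="$TOPIC"
```

Deriving the topic name from the same three values used in the Debezium configuration keeps the two from drifting apart.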

Initiate Debezium Server

  • Start the Debezium server by executing <<Debezium_Server_Path/run.sh>>
./run.sh output, By Author
  • Let’s add data to the Inventory table and check the Pub/Sub subscription.
CDC data from SQL Server is visible in PubSub Subscription, By Author
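
Besides the Cloud Console, the subscription can be checked from a shell. The sketch below assumes an authenticated `gcloud` CLI and uses a hypothetical subscription name, so substitute the one you created. The `run` helper is illustrative and only prints the command unless `APPLY=1` is set.

```shell
#!/bin/sh
# Sketch: pull a few change events from the subscription after inserting rows.

# Print each command by default; run it only when APPLY=1 is set.
run() {
  if [ "${APPLY:-0}" = "1" ]; then "$@"; else echo "+ $*"; fi
}

SUBSCRIPTION="demodb.dbo.Inventory-sub"   # hypothetical; use your own subscription
LIMIT=5                                   # maximum number of messages to pull

# --auto-ack acknowledges the pulled messages so they are not redelivered.
run gcloud pubsub subscriptions pull "$SUBSCRIPTION" --limit="$LIMIT" --auto-ack
```

Drop `--auto-ack` if you want the same messages to remain available for another pull.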

Summary

Debezium Server removes the dependency on Kafka Connect, so users can integrate Debezium with the managed messaging service of their choice without planning for the extra operational overhead of running Kafka. A word of caution, however: this feature is still incubating, i.e. exact semantics, configuration options, etc. may change in future revisions. Test it thoroughly in a non-production environment before rolling it out to production systems.

Ravish Garg

Customer Engineer, Data Specialist @ Google Cloud. I help customers transform and evolve their business via Google’s global network and software infrastructure.