AI Platform Training using External Database Platform

AI Platform makes it easy for machine learning developers, data scientists, and data engineers to take their ML projects from ideation to production and deployment, quickly and cost-effectively. Organizations can run their training jobs in a serverless fashion, either by using built-in algorithms (beta) or by creating their own training application to run on AI Platform Training.

This post demonstrates how to package your training application when it needs to connect to an external (on-premises or multi-cloud) database to fetch its source dataset.

How it helps your business

Most organizations store their business data in one or more database products. Training jobs often need that data, and some need to read it in real time because of the nature of the job.

AI Platform jobs give your organization the option to access this data directly from AI Platform by using its custom container feature.

Enabling AI Platform Jobs connectivity to database platform

In this post, we walk you through building a sample AI Platform Training job that connects to an Oracle database hosted in your on-premises data center or on another cloud platform. The key steps are:

  1. Enable private services access (VPC Network Peering) between AI Platform and your VPC network.
  2. Allow required firewall rules.
  3. Export custom routes to AI Platform so that it can reach your on-premises network.
  4. Build a Docker container image with the required database client libraries and your training job script.
  5. Push your custom container image to Google Container Registry.
  6. Submit the job using your container image on your VPC network.

Figure: Reference architecture (by author).

  1. Select or create a Google Cloud project and the VPC network that you want to peer with the AI Platform service (the service producer), and enable the Compute Engine, AI Platform Training & Prediction, and Service Networking APIs.
  2. For VPC Network Peering with an on-premises network, ensure your on-premises network is connected to your VPC via a VPN tunnel or Interconnect, with custom routes in place.

Set a reserved range using gcloud beta compute addresses create.

Establish a peering connection between your VPC host project and Google’s service networking, using gcloud beta services vpc-peerings connect.

PROJECT_ID=<<your-project-id>>
gcloud config set project $PROJECT_ID

REGION=<<your-region>>
PEERING_RANGE_NAME=<<peering-range-name>>
NETWORK=<<your-vpc-network-name>>

# Reserve an IP range for the service producer.
gcloud beta compute addresses create $PEERING_RANGE_NAME \
--global \
--prefix-length=16 \
--description="peering range for Google Producer service" \
--network=$NETWORK \
--purpose=VPC_PEERING

# Create the VPC connection.
gcloud beta services vpc-peerings connect \
--service=servicenetworking.googleapis.com \
--network=$NETWORK \
--ranges=$PEERING_RANGE_NAME \
--project=$PROJECT_ID

# Export custom routes (if applicable).
# This command takes the peering name, not the reserved range name.
# For service networking it is typically servicenetworking-googleapis-com;
# confirm with: gcloud compute networks peerings list --network=$NETWORK
gcloud compute networks peerings update servicenetworking-googleapis-com \
--network=$NETWORK \
--export-custom-routes \
--project=$PROJECT_ID
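
Because the reserved /16 range is routed alongside your VPC and on-premises subnets, it can be worth confirming that it does not collide with ranges you already use. The following is a minimal sketch using Python's standard `ipaddress` module; the CIDR values and the `find_overlaps` helper are hypothetical and only illustrate the check.

```python
import ipaddress


def find_overlaps(peering_range, other_ranges):
    """Return the entries of other_ranges that overlap peering_range."""
    reserved = ipaddress.ip_network(peering_range)
    return [
        cidr
        for cidr in other_ranges
        if reserved.overlaps(ipaddress.ip_network(cidr))
    ]


# Hypothetical values for illustration only; substitute the range that was
# actually reserved and the subnets used in your VPC and on-premises network.
PEERING_RANGE = "10.100.0.0/16"
EXISTING_RANGES = ["10.0.0.0/20", "10.100.8.0/24", "192.168.10.0/24"]

if __name__ == "__main__":
    reserved = ipaddress.ip_network(PEERING_RANGE)
    print(f"{PEERING_RANGE} holds {reserved.num_addresses} addresses")
    for cidr in find_overlaps(PEERING_RANGE, EXISTING_RANGES):
        print(f"Overlap with existing range: {cidr}")
```

An empty result from `find_overlaps` suggests the reserved range is safe to export routes for; any reported overlap is a hint to reserve a different range.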

Here we use the AI Platform custom container feature, which lets you run your application inside a Docker image and so gives you the flexibility to include non-ML dependencies, libraries, and binaries that AI Platform Training does not otherwise support.

Now suppose your dataset lives in an Oracle database (one of the relational database platforms most commonly used by organizations) hosted in your data center, and your training job is written in Python. The job can connect to Oracle through the cx_Oracle module, which internally loads the Oracle Client libraries to communicate with the database over Oracle Net. Your custom container therefore needs the free Oracle Instant Client “Basic” or “Basic Light” package.

Note: This post provides instructions for working with this app: Oracle Instant Client version 12.2.0.1. The instructions might not represent newer versions of the app. For more information, see the documentation: Oracle Instant Client.

  1. Download oracle-instantclient12.2-basic-12.2.0.1.0-1.x86_64.rpm.
  2. Create requirements.txt for all packages required by your training job.
cx_Oracle

3. Create your training job script (oratest.py in this example), which fetches the dataset from your on-premises Oracle database.

# Sample Python script to test the database connectivity
import cx_Oracle

# Build the connect descriptor for the on-premises database.
dsn_tns = cx_Oracle.makedsn('<<your-onprem-private-ip>>', '1521', service_name='orcl')
conn = cx_Oracle.connect(user='system', password='<<password>>', dsn=dsn_tns)
c = conn.cursor()
c.execute('select username from dba_users')
for row in c:
    print(row[0])
conn.close()
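
The sample script hardcodes the host and credentials, which is convenient for a connectivity test but awkward for a real job. As one possible variation, the sketch below reads the settings from environment variables and checks TCP reachability before handing over to cx_Oracle. The `ORA_*` variable names and the helper functions are assumptions made for illustration, and how you inject the values into the container is left open.

```python
import os
import socket


def load_db_settings(env=os.environ):
    """Read connection settings from environment variables.

    The ORA_* names are hypothetical; choose names that suit your job.
    """
    return {
        "host": env["ORA_HOST"],
        "port": int(env.get("ORA_PORT", "1521")),
        "service_name": env.get("ORA_SERVICE", "orcl"),
        "user": env["ORA_USER"],
        "password": env["ORA_PASSWORD"],
    }


def can_reach(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def fetch_usernames(settings):
    """Connect with cx_Oracle and return the usernames listed in dba_users."""
    import cx_Oracle  # imported lazily so the helpers above work without it

    dsn = cx_Oracle.makedsn(
        settings["host"], settings["port"], service_name=settings["service_name"]
    )
    conn = cx_Oracle.connect(
        user=settings["user"], password=settings["password"], dsn=dsn
    )
    try:
        cursor = conn.cursor()
        cursor.execute("select username from dba_users")
        return [row[0] for row in cursor]
    finally:
        conn.close()
```

Calling `can_reach(settings["host"], settings["port"])` first helps separate network problems (peering, firewall rules, custom routes) from database problems (listener, service name, credentials) when a job fails.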

4. Prepare your Dockerfile (named oratest.dockerfile in the build command of the next step) with Oracle Instant Client and the required Python packages.

#Dockerfile
FROM oraclelinux:7-slim
ADD oracle-instantclient*.rpm /tmp/
RUN yum -y install /tmp/oracle-instantclient*.rpm && \
rm -rf /var/cache/yum && \
rm -rf /tmp/oracle-instantclient*.rpm && \
yum install -y python3 && \
yum install -y python3-pip && \
echo /usr/lib/oracle/12.2/client64/lib > /etc/ld.so.conf.d/oracle-instantclient12.2.conf && \
ldconfig
ENV PATH=$PATH:/usr/lib/oracle/12.2/client64/bin
WORKDIR /app
COPY requirements.txt .
COPY oratest.py .
RUN pip3 install --no-cache-dir -r requirements.txt
ENTRYPOINT ["python3","oratest.py"]
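
A common failure with this kind of image is that cx_Oracle cannot locate the Oracle Client shared library at run time (reported as a DPI-1047 error). The sketch below is one way to check this from inside the container, assuming the `ldconfig` step in the Dockerfile has registered the Instant Client directory. The `oracle_client_library` helper is hypothetical and uses only the standard library.

```python
import ctypes.util


def oracle_client_library():
    """Return the Oracle Client library name if the loader can find it, else None."""
    # "clntsh" is the base name of libclntsh.so, shipped with Instant Client.
    return ctypes.util.find_library("clntsh")


if __name__ == "__main__":
    found = oracle_client_library()
    if found:
        print(f"Oracle Client library found: {found}")
    else:
        print("libclntsh not found; check the ld.so.conf.d entry and rerun ldconfig")
```

Running this with `python3` in the built image (for example through `docker run` with an overridden entrypoint) gives a quick signal before you submit a job.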

5. Build the Docker image and push it to Google Container Registry.

# Build Docker image 
export PROJECT_ID=$(gcloud config list project --format "value(core.project)")
export IMAGE_REPO_NAME=oratest_container # you can give any name as suits your project
export IMAGE_TAG=oracle_ic # you can give any name as suits your project
export IMAGE_URI=gcr.io/$PROJECT_ID/$IMAGE_REPO_NAME:$IMAGE_TAG
docker build -f oratest.dockerfile -t $IMAGE_URI ./
# Push the image to Google Container Registry
docker push $IMAGE_URI

6. Submit the AI Platform job using your Docker image and VPC network.

# Find your Google Project Number 
gcloud projects describe $PROJECT_ID --format="value(projectNumber)"
# Setup config.yaml to enable AI Platform VPC private access.
trainingInput:
  scaleTier: BASIC
  network: projects/<<Project_Number>>/global/networks/raves-vpc
# Submit your AI job (run in a Jupyter notebook cell)
import time
BUCKET_NAME = 'raves-ai-bkt' # Provide the bucket name where you want to store output.
IMAGE_URI = 'gcr.io/gargravish/oratest_container:oracle_ic'
# Define a timestamped job name
JOB_NAME = "raves_oratest_{}".format(int(time.time()))
# Submit the training job:
!gcloud ai-platform jobs submit training $JOB_NAME \
--region asia-southeast1 \
--config ./config.yaml \
--scale-tier BASIC \
--stream-logs \
--master-image-uri $IMAGE_URI
# To pass arguments to your training script, append them after a trailing "--".
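
If you prefer to drive the submission from a plain Python script rather than a notebook, the steps above can be sketched as follows. This is only an illustration: `render_config` and `build_submit_command` are hypothetical helpers, and the project number, network name, and image URI shown are placeholders. The gcloud flags are the same ones used in the command above.

```python
import time


def render_config(project_number, network_name, scale_tier="BASIC"):
    """Return config.yaml text that attaches the job to a peered VPC network."""
    network = f"projects/{project_number}/global/networks/{network_name}"
    return (
        "trainingInput:\n"
        f"  scaleTier: {scale_tier}\n"
        f"  network: {network}\n"
    )


def build_submit_command(job_name, region, image_uri, config_path="config.yaml"):
    """Return the gcloud invocation as an argument list for subprocess.run."""
    return [
        "gcloud", "ai-platform", "jobs", "submit", "training", job_name,
        "--region", region,
        "--config", config_path,
        "--master-image-uri", image_uri,
        "--stream-logs",
    ]


if __name__ == "__main__":
    # Placeholder values for illustration only.
    print(render_config("123456789012", "my-vpc"))
    job_name = "oratest_{}".format(int(time.time()))
    command = build_submit_command(
        job_name, "asia-southeast1", "gcr.io/my-project/oratest_container:oracle_ic"
    )
    print(" ".join(command))
```

Writing the rendered text to `config.yaml` and passing the list to `subprocess.run(command, check=True)` would submit the job; building the command as a list avoids shell quoting issues.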

Google Cloud is based on the following principles of an open cloud:

  1. Ability to move any app between on-premises, other clouds, and GCP at any time.
  2. Open-source software (OSS) permits a continuous feedback loop.
  3. Open APIs provide the ability to build on top of each other’s work.

In line with these principles, Google AI Platform lets you run jobs with your own ML frameworks, algorithms, libraries, and dependencies that your business relies on, rather than limiting you to the built-in feature sets.

Customer Engineer, Data Specialist @ Google Cloud. I help customers transform and evolve their business via Google’s global network and software infrastructure.