Thursday, April 11, 2024

Run Data Science workloads on OCI with GraalVM, Autonomous Database and GraalPy


GraalVM is a high-performance polyglot virtual machine that supports multiple languages, such as Java, JavaScript, Python, Ruby, R, and more. GraalVM can run either standalone or embedded in other environments, such as the Oracle Cloud Infrastructure (OCI).
The GraalVM Stack 
Data science is a fast-developing field that uses computational techniques to extract valuable insights from extensive and intricate datasets. Data scientists employ numerous tools and languages, including Python, R, SQL, and Java, to carry out data analysis, visualization, and machine learning tasks.

Working with large volumes of data and using different tools and languages can be a challenging and inefficient task for data scientists. Furthermore, traditional platforms that are not optimized for data-intensive applications may result in performance issues while running workloads.

Oracle Cloud Infrastructure (OCI) provides a powerful solution for data science workloads through GraalVM. Because GraalVM runs Python, R, Java, JavaScript, Ruby, and other languages on a single virtual machine, data scientists can combine different languages and libraries within the same application without compromising performance or interoperability.

One significant GraalVM feature is GraalPy, a fast and compatible implementation of Python that runs on GraalVM. With GraalPy, data scientists can run their existing Python code on GraalVM with minimal modifications and benefit from GraalVM's speed and scalability. GraalPy also gives access to other GraalVM languages, such as R and Java, and to Python libraries such as NumPy.

Another advantage of using GraalVM for data science workloads is the integration with Oracle Autonomous Database (ADB), a fully managed cloud database service that provides high availability, security, and performance for any type of data. ADB supports both SQL and NoSQL data models, as well as built-in machine learning capabilities. OCI also offers a dedicated Data Science service that allows data scientists to collaborate and share their projects, models, and notebooks.

By combining GraalVM, ADB, and the Data Science service, data scientists can leverage the best of both worlds: the flexibility and productivity of Python and other languages on GraalVM, and the reliability and scalability of ADB on OCI. In this blog post, I will show you how to run a simple data science workload on OCI using GraalVM, ADB, and OML4Py. It is a basic setup for using GraalVM on OCI with the Autonomous Database and Python for data science applications.

Prerequisites


The basic prerequisites for running your workloads are:

  • An OCI Cloud environment and a compartment with the necessary permissions to create and manage resources.
  • A GraalVM Enterprise Edition instance on OCI. You can use the GraalVM Enterprise Edition (GraalVM EE) - BYOL image from the OCI Marketplace to launch a compute instance with GraalVM EE pre-installed.
  • An Autonomous Database instance on OCI. You can use either the Autonomous Transaction Processing (ATP) or the Autonomous Data Warehouse (ADW) service, depending on your workload.
  • A Python development environment with pip and virtualenv installed. You can use the GraalVM EE instance as your development environment, or you can use a separate machine with SSH access to the GraalVM EE instance.

This diagram shows a simple setup of running your workload in the cloud. For production purposes it might be more complicated.


After you create an OCI compute node, you can follow the steps later in this post to install GraalVM and GraalPy on it. GraalPy lets you run Python code on GraalVM and interoperate with the other languages that GraalVM supports.
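
Once GraalPy is installed, a quick way to confirm that a script is really running on it is to inspect the interpreter at runtime. The sketch below is only an illustration: the function names are made up for this post, it assumes that GraalPy reports an implementation name starting with "graal", and the Java call assumes a GraalPy setup in which Java interoperability is available. On any other interpreter the Java part is simply skipped.

```python
import platform
import sys


def runtime_info():
    """Return a small dict describing the Python interpreter in use."""
    # Assumption: GraalPy reports a name such as "graalpy"; CPython reports "cpython"
    impl = sys.implementation.name
    return {
        "implementation": impl,
        "version": platform.python_version(),
        "on_graalpy": impl.startswith("graal"),
    }


def java_list_demo():
    """On GraalPy, build a java.util.ArrayList and return its size; elsewhere return None."""
    if not runtime_info()["on_graalpy"]:
        return None
    import java  # interop module that is only available on GraalPy

    array_list = java.type("java.util.ArrayList")()
    array_list.add("hello from Java")
    return array_list.size()


print(runtime_info())
print(java_list_demo())
```

Running the same file with the regular python launcher and with the graalpy launcher should show a different implementation name, which makes it easy to spot a virtual environment that points at the wrong interpreter.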

Specific components

To run data science workloads, you might use the following components:

  • GraalPy is a Python implementation that runs on GraalVM, a high-performance polyglot virtual machine that supports multiple languages such as Java, JavaScript, Ruby, R, and Python.
  • Oracle Autonomous Database is a cloud service that configures and optimizes your database for you, based on your workload. It supports different workload types, including Data Warehouse, Transaction Processing, JSON Database, and APEX Service.
  • A GraalPy workload is a Python application that runs on GraalVM as its execution engine and works against the Oracle Autonomous Database. This allows you to leverage the performance, scalability, security, and manageability of the Oracle Autonomous Database for your Python applications.

A possible workload on an Autonomous Database is a data analysis and machine learning application that uses the Oracle Machine Learning for Python (OML4Py) package. OML4Py is a Python package that provides an interface for data scientists and developers to work with data and models on the Autonomous Database. The package utilizes the in-database algorithms and parallel execution capabilities of the Autonomous Database, making data analysis and machine learning more scalable and efficient.

To run this application, you will need to install GraalVM Enterprise Edition on a compute node that can reach your Autonomous Database. Then you can create a Python environment using the GraalVM Updater on that compute node. After that, you can use the cx_Oracle module, or its successor python-oracledb, to connect to your database. Additionally, you will need to install the OML4Py package and its dependencies using the pip command. Finally, you can use the OML4Py API to load data from your database, explore and transform the data, create and train machine learning models, and evaluate and deploy these models.



Here is a code snippet that shows how to use OML4Py to create and train a logistic regression model on the iris dataset, a sample dataset that contains measurements of different species of iris flowers. Replace specifics such as the username and password with the values from your own setup.

# Import OML4Py (it opens the database connection itself)
import oml

# Connect to the Autonomous Database; replace the placeholders with the
# user, password and service name from your own setup
oml.connect(user="username", password="password", dsn="dsn")

# Create a proxy object for the IRIS table in the database
iris = oml.sync(table="IRIS")

# Split the dataset into training and testing sets
train, test = iris.split()

# Separate the predictor columns from the target column
train_x = train.drop("Species")
train_y = train["Species"]

# Create a GLM classification model, which is logistic regression in OML4Py.
# Note: GLM classification expects a two-class target, so you may need to
# restrict the data to two species first
model = oml.glm("classification")

# Train the model on the training set
model = model.fit(train_x, train_y)

# Print the model details
print(model)

This script and the Iris training model are described by Mark Hornick at https://shorturl.at/orwDR
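
To make the split step of the script more concrete, here is a minimal local sketch of what a train/test split does. It is not how OML4Py implements it (there the split runs inside the database); the function name split_rows, the 70/30 ratio and the fixed seed are assumptions chosen for illustration only.

```python
import random


def split_rows(rows, ratio=0.7, seed=42):
    """Shuffle rows with a fixed seed and cut them into a train and a test list."""
    shuffled = list(rows)                  # copy, so the caller's data stays untouched
    random.Random(seed).shuffle(shuffled)  # seeded shuffle gives a repeatable split
    cut = round(len(shuffled) * ratio)     # number of rows that go into the training set
    return shuffled[:cut], shuffled[cut:]


# Ten made-up rows standing in for records of the IRIS table
rows = [{"id": i, "Species": "setosa" if i % 2 else "virginica"} for i in range(10)]
train, test = split_rows(rows)
print(len(train), len(test))
```

Every row ends up in exactly one of the two lists, so the model is later evaluated on data it has not seen during training.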

To set up GraalVM Enterprise Edition on an Oracle Cloud Infrastructure (OCI) compute node with Autonomous Database (ADB) and GraalPy, follow these steps:

1. Create an OCI compute node with the desired shape and operating system. You can use the OCI console, CLI, or Terraform to do this.
2. Install GraalVM EE on the compute node. You can download the latest version from the Oracle Technology Network (OTN) or use the OCI Resource Manager to provision it automatically.
3. Configure GraalVM EE to work with ADB. You need to set the environment variables JAVA_HOME, GRAALVM_HOME, and TNS_ADMIN to point to the GraalVM EE installation, the GraalVM EE home directory, and the directory where you store your ADB wallet files, respectively. You also need to add the GraalVM EE bin directory to your PATH variable.
4. Install GraalPython on GraalVM EE. You can use the GraalVM Updater tool (gu) to install GraalPython and its dependencies. For example, you can run `gu install python` to install GraalPython.
5. Download the client credentials (wallet) from the ADB service console and set the TNS_ADMIN environment variable to the path of the wallet directory. For example, run the following command:

export TNS_ADMIN=/path/to/wallet

6. Install the python-oracledb driver on GraalPy using the pip tool. For example, run the following command:

$GRAALVM_HOME/bin/pip install oracledb

7. Test your GraalPy installation and connection to ADB. You can use the GraalPy interactive shell (graalpython) or run a GraalPy script to connect to ADB and perform some queries. For example, you can run `graalpython connect.py` where connect.py is a script that uses the oracledb module to connect to ADB and execute some SQL statements.
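
Before testing the connection, it can help to verify the environment that the steps above set up. The following sketch checks the three environment variables named in the steps and looks for a tnsnames.ora file in the wallet directory. The function name check_environment and the wording of the messages are hypothetical, and the check says nothing about whether the wallet itself is valid.

```python
import os
from pathlib import Path

# Variables named in the setup steps
REQUIRED_VARS = ("JAVA_HOME", "GRAALVM_HOME", "TNS_ADMIN")


def check_environment(env=None):
    """Return a list of problems found; an empty list means all checks passed."""
    env = os.environ if env is None else env
    problems = [f"{name} is not set" for name in REQUIRED_VARS if not env.get(name)]

    wallet_dir = env.get("TNS_ADMIN")
    if wallet_dir:
        wallet = Path(wallet_dir)
        if not wallet.is_dir():
            problems.append(f"TNS_ADMIN directory does not exist: {wallet}")
        elif not (wallet / "tnsnames.ora").is_file():
            problems.append(f"no tnsnames.ora found in {wallet}")
    return problems


for message in check_environment() or ["environment looks complete"]:
    print(message)
```

Passing a plain dictionary instead of the real environment, for example check_environment({"TNS_ADMIN": "/path/to/wallet"}), makes it easy to try the check without changing your shell.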

To connect to Oracle Autonomous Database from your Python application, you can use the following code:

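import os

# Set TNS_ADMIN to the wallet directory before importing the driver,
# so that the setting is in place when the driver loads its defaults
os.environ['TNS_ADMIN'] = '/path/to/wallet'

import oracledb

# Connect using a service name from the wallet's tnsnames.ora file.
# config_dir is passed explicitly so the lookup does not depend on TNS_ADMIN.
# In the driver's default Thin mode, a wallet-based (mTLS) connection also
# needs the wallet_location and wallet_password arguments.
conn = oracledb.connect(user='username', password='password',
                        dsn='service_name', config_dir='/path/to/wallet')
print(conn)
conn.close()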

For the connection to succeed, the ADB wallet must be unpacked in the directory that TNS_ADMIN points to. You can obtain the service name for your ADB from the OCI console; the available names are also listed in the wallet's tnsnames.ora file.
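
If you are unsure which service names your wallet offers, you can read them from tnsnames.ora. The sketch below is a simple illustration, not a full parser for that file format: it assumes every alias starts at the beginning of a line and is followed by "=" and an opening parenthesis. The function name and the sample entries, including the host example.invalid, are made up.

```python
import re

# An alias sits at the start of a line, followed by "=" and an opening parenthesis
ALIAS_PATTERN = re.compile(r"^\s*([\w.\-]+)\s*=\s*\(", re.MULTILINE)


def list_service_aliases(tnsnames_text):
    """Return the connection aliases defined in the given tnsnames.ora content."""
    return ALIAS_PATTERN.findall(tnsnames_text)


# Made-up sample content in the style of a wallet's tnsnames.ora
sample = """
mydb_high = (description= (address=(protocol=tcps)(port=1522)(host=example.invalid)))
mydb_low = (description= (address=(protocol=tcps)(port=1522)(host=example.invalid)))
"""
print(list_service_aliases(sample))
```

For a real wallet you would read the file first, for example with pathlib.Path(wallet_dir, "tnsnames.ora").read_text(), and pass one of the returned names as the dsn argument when connecting.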

This should give you a good start to experiment with GraalVM, GraalPy and Data Science in the Oracle Cloud. It's a powerful solution for your production workloads, and starting with the basics will help you explore the possibilities.
