Working with large volumes of data and using different tools and languages can be a challenging and inefficient task for data scientists. Furthermore, traditional platforms that are not optimized for data-intensive applications may result in performance issues while running workloads.
Oracle Cloud Infrastructure (OCI) provides a powerful solution for data science workloads through GraalVM. GraalVM is a high-performance virtual machine that supports multiple programming languages, including Python, R, Java, JavaScript, and Ruby. With GraalVM, data scientists can combine different languages and libraries within the same application without giving up performance or interoperability.
A key feature of GraalVM is GraalPy, a fast, compatible implementation of Python that runs on GraalVM. With GraalPy, data scientists can run their existing Python code on GraalVM with minimal modifications and benefit from GraalVM's speed and scalability. GraalPy also gives access to other GraalVM languages such as R and Java, and to Python libraries such as NumPy.
Another advantage of using GraalVM for data science workloads is the integration with Oracle Autonomous Database (ADB), a fully managed cloud database service that provides high availability, security, and performance for any type of data. ADB supports both SQL and NoSQL data models, as well as built-in machine learning capabilities. In addition, OCI offers a dedicated Data Science service that allows data scientists to collaborate and share their projects, models, and notebooks.
By combining GraalVM, ADB, and the Data Science service, data scientists get the best of both worlds: the flexibility and productivity of Python and other languages on GraalVM, and the reliability and scalability of ADB on OCI. In this blog post, I will show you a basic setup for running a simple data science workload on OCI using GraalVM, ADB, and OML4Py.
Prerequisites
The basic prerequisites for running your workloads are:
- An OCI Cloud environment and a compartment with the necessary permissions to create and manage resources.
- A GraalVM Enterprise Edition instance on OCI. You can use the GraalVM Enterprise Edition (GraalVM EE) - BYOL image from the OCI Marketplace to launch a compute instance with GraalVM EE pre-installed.
- An Autonomous Database instance on OCI. You can use either the Autonomous Transaction Processing (ATP) or the Autonomous Data Warehouse (ADW) service, depending on your workload.
- A Python development environment with pip and virtualenv installed. You can use the GraalVM EE instance as your development environment, or you can use a separate machine with SSH access to the GraalVM EE instance.
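Before installing anything, it can help to confirm which Python runtime and environment you are actually using. The following is a minimal sketch, not an official check: the helper name describe_runtime is hypothetical, and it assumes that GraalPy reports an implementation name starting with "graal" (CPython reports "cpython").

```python
import platform
import sys


def describe_runtime():
    """Summarize the running Python interpreter (hypothetical helper)."""
    impl = sys.implementation.name
    return {
        "implementation": impl,
        "version": platform.python_version(),
        # Assumption: GraalPy builds report a name beginning with "graal"
        "is_graalpy": impl.startswith("graal"),
        # True when running inside a virtual environment
        "in_virtualenv": sys.prefix != sys.base_prefix,
    }


print(describe_runtime())
```

Run it once with the interpreter on the GraalVM EE instance and once with any separate development machine to see which runtime each one uses.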
Specific components
To run data science workloads, you might use the following components:
- GraalPy is a Python implementation that runs on GraalVM, a high-performance polyglot virtual machine that supports multiple languages such as Java, JavaScript, Ruby, R, and Python.
- Oracle Autonomous Database is a cloud service that configures and optimizes your database for you, based on your workload. It supports different workload types, including Data Warehouse, Transaction Processing, JSON Database, and APEX Service.
- A GraalPy workload runs Python applications against the Oracle Autonomous Database, using GraalVM as the execution engine. This allows you to leverage the performance, scalability, security, and manageability of the Oracle Autonomous Database for your Python applications.
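To illustrate the polyglot side of GraalPy mentioned above, here is a small sketch that evaluates a JavaScript expression from Python. It relies on GraalPy's built-in polyglot module and assumes that the JavaScript language is installed in your GraalVM and that polyglot access is enabled (older releases need the --polyglot and --jvm launcher flags). The helper name run_js_snippet is hypothetical, and on other Python runtimes it simply returns None.

```python
def run_js_snippet(source):
    """Evaluate JavaScript via GraalPy's polyglot module; return None elsewhere."""
    try:
        # The polyglot module is built into GraalPy and absent from CPython
        import polyglot
    except ImportError:
        return None
    return polyglot.eval(language="js", string=source)


result = run_js_snippet("1 + 1")
if result is None:
    print("polyglot module not available; not running on GraalPy")
else:
    print("JavaScript returned:", result)
```

Java classes can be reached in a similar way on GraalPy with import java followed by java.type("java.util.ArrayList").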
A possible workload on an Autonomous Database is a data analysis and machine learning application that uses the Oracle Machine Learning for Python (OML4Py) package. OML4Py is a Python package that provides an interface for data scientists and developers to work with data and models on the Autonomous Database. The package utilizes the in-database algorithms and parallel execution capabilities of the Autonomous Database, making data analysis and machine learning more scalable and efficient.
To run this application, you will need GraalVM Enterprise Edition on a compute instance that can reach your Autonomous Database. On that instance, you can add the Python runtime with the GraalVM Updater (gu) and create a Python environment. After that, you can use the cx_Oracle module (or its successor, python-oracledb) to connect to your database. Additionally, you will need to install the OML4Py package and its dependencies using the pip command. Finally, you can use the OML4Py API to load data from your database, explore and transform the data, create and train machine learning models, and evaluate and deploy these models.
# Import the OML4Py client package
import oml
# Connect to the Autonomous Database; oml.connect takes the credentials and DSN
# directly and opens the database connection itself
oml.connect(user="username", password="password", dsn="dsn")
# Create a proxy object for the IRIS table in the database
iris = oml.sync(table="IRIS")
# Split the dataset into training and testing sets
train, test = iris.split()
# Separate the predictors from the target column
train_x = train.drop("Species")
train_y = train["Species"]
# Create a classification model. OML4Py's logistic regression is oml.glm("classification"),
# which expects a binary target, so a support vector machine is swapped in here
# to handle the three iris species
model = oml.svm("classification")
# Train the model on the training set
model = model.fit(train_x, train_y)
# Print the model details
print(model)
This script and the Iris training model are described by Mark Hornick at https://shorturl.at/orwDR
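import oracledb
# Directory that holds the unzipped ADB wallet (tnsnames.ora, ewallet.pem)
wallet_dir = '/path/to/wallet'
# Connect using a service name from the wallet's tnsnames.ora file.
# In python-oracledb's default Thin mode, the wallet directory and the
# wallet password are passed explicitly to connect()
conn = oracledb.connect(user='username', password='password', dsn='service_name',
                        config_dir=wallet_dir, wallet_location=wallet_dir,
                        wallet_password='wallet_password')
print(conn)
conn.close()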
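To connect to the database, download the ADB wallet from the OCI console, unzip it, and point the script at that directory. The available service names are listed in the wallet's tnsnames.ora file and on the database connection page of your ADB in the OCI console.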
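If you prefer to read the service names from the wallet rather than the console, a few lines of Python are enough. This is a minimal sketch: the helper name list_service_names is hypothetical, and it assumes that each alias in tnsnames.ora starts a line and is followed by "= (", which is how ADB wallets are typically laid out.

```python
import re
from pathlib import Path


def list_service_names(wallet_dir):
    """Return the connection aliases defined in the wallet's tnsnames.ora."""
    text = (Path(wallet_dir) / "tnsnames.ora").read_text()
    # An alias is a name at the start of a line followed by "= ("
    return re.findall(r"^[ \t]*([\w.-]+)[ \t]*=[ \t]*\(", text, flags=re.MULTILINE)


wallet_dir = Path("/path/to/wallet")  # placeholder path, as in the script above
if (wallet_dir / "tnsnames.ora").exists():
    print(list_service_names(wallet_dir))
```

Any of the returned names can be used as the dsn value when connecting.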
This should give you a good start to experiment with GraalVM, GraalPy and
Data Science in the Oracle Cloud. It's a powerful solution for your
production workloads, and starting with the basics will help you explore
the possibilities.