Thursday, April 11, 2024

Run Data Science workloads on OCI with GraalVM, Autonomous Database and GraalPy


GraalVM is a high-performance polyglot virtual machine that supports multiple languages, such as Java, JavaScript, Python, Ruby, and R. GraalVM can run either standalone or embedded in other environments, such as Oracle Cloud Infrastructure (OCI).
The GraalVM Stack 
Data science is a fast-developing field that uses computational techniques to extract valuable insights from extensive and intricate datasets. Data scientists employ numerous tools and languages, including Python, R, SQL, and Java, to carry out data analysis, visualization, and machine learning tasks.

Working with large volumes of data and using different tools and languages can be a challenging and inefficient task for data scientists. Furthermore, traditional platforms that are not optimized for data-intensive applications may result in performance issues while running workloads.

Oracle Cloud Infrastructure (OCI) provides a powerful solution for data science workloads through GraalVM. GraalVM is a high-performance virtual machine that supports multiple programming languages, such as Python, R, Java, JavaScript, and Ruby. With GraalVM, data scientists can effortlessly integrate different languages and libraries within the same application, without compromising performance or interoperability.

A significant feature of GraalVM is GraalPy, a fast and compatible implementation of Python running on GraalVM. With GraalPy, data scientists can execute their existing Python code on GraalVM with minimal modifications, taking full advantage of GraalVM's speed and scalability. Moreover, GraalPy offers effortless access to other GraalVM languages, such as Java and R, as well as to Python libraries such as NumPy.

Another advantage of using GraalVM for data science workloads is the integration with Oracle Autonomous Database (ADB), a fully managed cloud database service that provides high availability, security, and performance for any type of data. ADB supports both SQL and NoSQL data models, as well as built-in machine learning capabilities. OCI also offers a dedicated Data Science service that allows data scientists to collaborate and share their projects, models, and notebooks.

By combining GraalVM, ADB, and the Data Science service, data scientists can leverage the best of both worlds: the flexibility and productivity of Python and other languages on GraalVM, and the reliability and scalability of ADB on OCI. In this blog post, I will show you how to run a simple data science workload on OCI using GraalVM, ADB, and OML4Py. It is a basic setup of how to use GraalVM on OCI with the Autonomous Database and Python for data science applications.

Prerequisites


The basic prerequisites for running your workloads are:

  • An OCI Cloud environment and a compartment with the necessary permissions to create and manage resources.
  • A GraalVM Enterprise Edition instance on OCI. You can use the GraalVM Enterprise Edition (GraalVM EE) - BYOL image from the OCI Marketplace to launch a compute instance with GraalVM EE pre-installed.
  • An Autonomous Database instance on OCI. You can use either the Autonomous Transaction Processing (ATP) or the Autonomous Data Warehouse (ADW) service, depending on your workload.
  • A Python development environment with pip and virtualenv installed. You can use the GraalVM EE instance as your development environment, or you can use a separate machine with SSH access to the GraalVM EE instance.


This diagram shows a simple setup of running your workload in the cloud. For production purposes it might be more complicated.


When you create an OCI compute node, you can follow the steps to install GraalVM and GraalPy on it. GraalPy is a Python implementation based on GraalVM, a high-performance polyglot virtual machine. GraalPy allows you to run Python code faster and more efficiently, and to interoperate with other languages supported by GraalVM.
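
To give a feel for that interoperability, here is a small example of calling a JDK class directly from Python code running on GraalPy. This is a minimal sketch: the java module is only available when the script is executed with GraalPy, not with standard CPython.

# Run with GraalPy on the JVM (for example: graalpy --jvm interop_demo.py);
# the `java` module is GraalPy-specific.
import java

# Look up and instantiate a JDK class directly from Python.
ArrayList = java.type("java.util.ArrayList")
values = ArrayList()
for v in (5.1, 4.9, 4.7):
    values.add(v)

print(values.size())   # 3
print(values.get(0))   # 5.1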

Specific components

To run data science workloads, you might use the following components:

  • GraalPy is a Python implementation that runs on GraalVM, a high-performance polyglot virtual machine that supports multiple languages such as Java, JavaScript, Ruby, R, and Python.
  • Oracle Autonomous Database is a cloud service that configures and optimizes your database for you, based on your workload. It supports different workload types, including Data Warehouse, Transaction Processing, JSON Database, and APEX Service.
  • A GraalPy workload is a workload that involves running Python applications against the Oracle Autonomous Database, using GraalVM as the execution engine. This allows you to leverage the performance, scalability, security, and manageability of the Oracle Autonomous Database for your Python applications.
  • A possible workload on an Autonomous Database is a data analysis and machine learning application that uses the Oracle Machine Learning for Python (OML4Py) package. OML4Py is a Python package that provides an interface for data scientists and developers to work with data and models on the Autonomous Database. The package utilizes the in-database algorithms and parallel execution capabilities of the Autonomous Database, making data analysis and machine learning more scalable and efficient.

    To run this application, you will need a compute node that has GraalVM Enterprise Edition installed and can reach your Autonomous Database. On that node, you can install the Python runtime with the GraalVM Updater and create a Python virtual environment. After that, you can use the cx_Oracle module to connect to your database, as shown in the sketch below. Additionally, you will need to install the OML4Py package and its dependencies using pip. Finally, you can use the OML4Py API to load data from your database, explore and transform the data, create and train machine learning models, and evaluate and deploy those models.
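
    As a quick sanity check of the connectivity, the snippet below opens a connection with cx_Oracle. The user name, password, and TNS alias ("mydb_high") are placeholders to be replaced with values from your own wallet and setup; this is a sketch, not a complete deployment script.

import cx_Oracle

# TNS_ADMIN must point at the unzipped ADB wallet directory;
# the credentials and the "mydb_high" alias are placeholders from your own setup.
connection = cx_Oracle.connect(
    user="ADMIN",
    password="<your-password>",
    dsn="mydb_high",
)

with connection.cursor() as cursor:
    cursor.execute("SELECT sysdate FROM dual")
    print("Connected, database time is:", cursor.fetchone()[0])

connection.close()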



    Here is a code snippet that shows how to use OML4Py to create and train a logistic regression model on the iris dataset, a sample dataset that contains measurements of different species of iris flowers. Specifics such as usernames and passwords come from your own setup.
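
    The sketch below follows the standard OML4Py (oml) client workflow; the connection arguments, the column names, and the scikit-learn import used to build the sample data locally are assumptions you can adapt to your own environment.

import oml
import pandas as pd
from sklearn import datasets

# Placeholder credentials and TNS alias: take these from your own ADB setup and wallet.
oml.connect(user="OML_USER", password="<your-password>", dsn="mydb_high")

# Build the iris dataset locally and push it to a temporary table in the database.
iris = datasets.load_iris()
df = pd.DataFrame(iris.data, columns=["SEPAL_LENGTH", "SEPAL_WIDTH", "PETAL_LENGTH", "PETAL_WIDTH"])
df["SPECIES"] = [iris.target_names[t] for t in iris.target]
oml_iris = oml.push(df)

# Split into train/test proxy frames; the data itself stays inside the Autonomous Database.
train, test = oml_iris.split()
train_x, train_y = train.drop("SPECIES"), train["SPECIES"]
test_x, test_y = test.drop("SPECIES"), test["SPECIES"]

# Create and train an in-database GLM classification model (logistic regression).
glm_mod = oml.glm("classification")
glm_mod = glm_mod.fit(train_x, train_y)

# Score the held-out rows and print the predictions and model accuracy.
predictions = glm_mod.predict(test_x, supplemental_cols=test_y)
print(predictions.head())
print("accuracy:", glm_mod.score(test_x, test_y))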


    Tuesday, April 2, 2024

    From introverted person to public speaker

    I write and speak a lot about technology, but a personal touch and experience can be nice and interesting sometimes too. This personal touch is the story of how an introvert like me became a public speaker, and I hope you get some inspiration out of it, maybe even taking your first step towards speaking in public.

    Drive and enthusiasm

    Now, not everyone feels the need to speak in public, so you obviously need the drive to want to do it. I wasn't really keen on speaking in public in the beginning, as I have a shy nature, but the most important tips are: know what you want to tell, and practice, practice, practice.

    And even the most important thing: have fun doing it!

    Knowing what you want to tell, especially in the technology area (though I suppose it also applies to other areas where people speak in public), begins with doing research and combining it with your day-to-day experience and view of the world. It all starts with a good sense of direction. My "public speaking career" actually started with authoring a book about technology, a beginner's guide for starters.

    https://bit.ly/2Kk8seB

    After speaking at an event, I noticed I enjoyed it, and as time goes by, people get to know you better and you become more enthusiastic about speaking in public.

    This doesn’t mean that everything always goes well. Sometimes the subject you choose is not interesting enough, sometimes your own performance needs improvement, and so on. It depends on multiple factors, but as a speaker you have a lot under your own control, even your own nerves. And believe me, even the most experienced speakers feel the excitement when they are about to speak in front of a large audience.

    Prepare yourself

    This may seem like stating the obvious, but knowing what to tell helps a lot in gaining self-confidence. My experience is not to include too much in my talk. I did that a few times and ended up not telling everything I wanted. There are also a lot of tips available on how and what to present, and they all apply: make a presentation as visual as you can. It’s better not to have slides with a lot of text; put those details in the speaker notes instead.

    So preparing means choosing the subject, preparing some slides, and writing your story down for yourself. It helped me a lot, but the uncertainty about being complete still remains. Well, in my talks I sometimes cover too much, so time management is really important if you want to get through all the content.

    I also followed some courses on how to speak in public, and these are the most important things I took away from them:

    • Do some storytelling: talk about everyday things to visualize what you want to tell. The audience will recognize it and will be more eager to listen to your story. Interacting with the audience boosted my self-confidence.
    • Have a good beginning and end, which means structuring your talk so the audience can think it over afterwards.
    • Know your position: can everyone see me well? Is the tone of my voice not too boring? Do I come across as someone with self-confidence?

    You also don’t want to be disturbed by failing devices or to struggle with your presentation equipment. So I always arrive well before I start presenting, to explore the room or hall and know what’s there.

    Some presentations include demos. That can be a nice thing, but don’t make a demo too long or too complicated. I’ve seen a lot of demos where many things are happening and code is flying across the screen, but they don’t add anything to the story. Prepare what you can prepare, and demo a short and clear case.

    My personal touch

    Now, to prevent this story from becoming just a collection of hints and tips, I would like to add some personal touches. Remember that not all hints and tips work for everyone and they can be found everywhere. I love interacting with my audience during my talks. I enjoy seeing their reactions, from those who are interested and engaged to those who get a bit sleepy and start to close their eyes. Unfortunately, these days, all conferences are virtual, and we don't have the same level of interaction.

    To keep my audience engaged, I always try to inject some humor and fun into my talks. I believe that getting your audience to laugh can boost your self-confidence and keep them interested.

    However, even with all my experience, I still experience what's called "the imposter syndrome." This means that I sometimes fear being exposed as not being the expert people think I am. But I've learned that as long as I receive useful questions and see that people are interested and engaged, I don't have to worry about being "nailed."

    Sometimes, I lose track of the structure of my story when I'm telling it. I get stuck and forget what I wanted to say. However, I've found that the best way to get back on track is to go back to my topics and start again from where I began.

    Remember that not all talks will go smoothly all the time. I always question myself after a talk and ask if the topic was well-suited, if I was unclear, or if my tone was too monotone. Every talk you give can be a learning experience for the next one.

    Support from the Oracle ACE community

    Since I joined the ACE program in 2012 and was awarded Oracle ACE Director in 2019, my speaking activities have gotten a real boost: being at conferences, speaking with like-minded people, sharing but also gaining knowledge, and above all, meeting new people. One big part of these conferences is networking. As an introvert, that's a bigger step to take than it is for someone who isn't; at least I suppose so. For me it's a huge step to talk to people I meet for the first time, but over time you run into people you've seen before and it becomes a bit easier. Still, it remains a challenge for me.
    Anyway, the Oracle ACE program helped give my speaking career a boost.
    For more information about the ACE program see: https://ace.oracle.com/

    From introvert to public speaker

    I am naturally an introverted person, but I enjoy speaking in public and sharing my knowledge with others. I appreciate receiving feedback, even if it is critical, as it helps to strengthen my confidence in my abilities. I hope that my story can inspire others to consider speaking in public, and I would be honored to be part of the audience and support them.

    https://i.gifer.com/origin/c6/c653cdf2c5df010f4d74503986408205_w200.gif


    Monday, January 1, 2024

    Handling Microservice transactions with Oracle MicroTX


    Maintaining data consistency in today's complicated "digital highway", and, for the purposes of this article, across multiple microservices, is one of the significant challenges today. Each microservice has its own local transactions and databases, which may result in data inconsistencies if one microservice fails or its transaction is rolled back without the others following suit. This problem becomes even more complicated in distributed and asynchronous environments, where communication failures and network latency can occur.

    Another challenge is how to perform queries that involve data from several microservices without causing too much overhead or coupling. This means that microservices should expose their data in a way that is easy to consume and aggregate by other services or clients without affecting their autonomy or performance. 

    Consistency helps maintain reliable operations and ensures that once a transaction is committed, the effects are permanent and visible to all subsequent transactions. If consistency is not maintained, later transactions might see outdated or incorrect data, leading to incorrect operations and results.

    In distributed systems where data is stored across multiple nodes or locations, consistency ensures that a change made in one location is reflected across all others. This synchronisation is vital for the system to function as a coherent whole, rather than a collection of disjointed parts.

    Inconsistent data might not only lead to operational problems but also legal issues, especially in industries that have regulatory requirements to ensure that data is handled accurately and consistently. It can also erode users' trust in a system and lead to a loss of reputation and business for the company operating the system.

    Consistency helps in avoiding various types of anomalies like lost updates, temporary inconsistencies, and uncommitted data being read. It also facilitates collaboration in systems where multiple users might be working with the same data simultaneously.

    Consistency can be achieved by implementing consistency handling mechanisms, such as:

    Manual reconciliation 

    Manual reconciliation is a process that is often used to ensure data consistency, accuracy, and integrity between different systems or within different parts of a single system. It is typically applied in scenarios where automated reconciliation might not be feasible or where discrepancies have been detected that require human intervention to resolve. Drawbacks of this approach include:

    • Inconsistent data view for a period
    • Potential financial losses due to loss of business and customer dissatisfaction 
    • Resource intensive task, which increases cost of operations 

    Develop transaction logic

    Another option is for developers to build transaction management logic into the applications themselves. Drawbacks of this approach include:

    • Requires developers to have advanced skills
    • Takes valuable time away from app developers
    • Can be error prone; increases testing complexity
    • Increases time and cost to market 

    Use of existing Transaction Managers

    This option will be elaborated on further in this article.


    In summary, data consistency is fundamental to the correct, reliable, and lawful operation of databases and distributed systems. It's what allows different parts of a system to work together coherently and provides users with accurate and reliable information. Without it, systems can become unreliable, confusing, and prone to errors and misuse. Transaction patterns are essential for ensuring data consistency and reliability in distributed systems, where multiple services interact with each other and with external resources. Oracle MicroTX provides a comprehensive framework that supports various transaction models, such as SAGA, two-phase commit (2PC), and XA, and allows you to choose the best option for your use case. Let's take a closer look at each of these patterns and how Oracle MicroTX can facilitate their management.

    ACID

    ACID (Atomicity, Consistency, Isolation, Durability) is a set of properties that ensure reliable processing in a database system. These properties are essential for the proper functioning of a database system, particularly when handling transactions - sequences of operations that read or write database items. Here's what each property means:

    1. Atomicity: This property ensures that a transaction is treated as a single unit, which either completes entirely or not at all. If a transaction is interrupted (for example, due to a system failure), any changes it made are rolled back, and the database is left in a consistent state as though the transaction had never occurred.

    2. Consistency: This property ensures that any transaction will bring the database from one valid state to another while maintaining all defined rules and constraints. It guarantees that the database will not be left in a contradictory or conflicting state after the transaction. If a transaction might violate a constraint, it will be rolled back, and an error will be reported.

    3. Isolation: Isolation ensures that concurrent execution of transactions leaves the database in the same state as though the transactions were executed sequentially. In other words, other transactions cannot see the results of a transaction until it has been committed. This prevents temporary inconsistencies and protects ongoing transactions from seeing partial results from other concurrently running transactions.

    4. Durability: Once a transaction has been committed, it is permanent, and the changes made by the transaction will persist even in the face of system failures. Durability is typically ensured by storing transaction data in non-volatile storage, and often a transaction log is used to replay changes if a failure occurs after a transaction is committed but before all its changes are physically written to disk.

    The ACID properties are a set of principles that work together to provide a strong and dependable framework for processing transactions. By making sure that transactions are atomic, consistent, isolated, and durable, ACID helps prevent data corruption, maintain data integrity, and offer predictable and correct behavior even when there are multiple users and potential system failures.
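
    To make atomicity and consistency concrete, here is a small, self-contained sketch. It uses Python's built-in sqlite3 module purely as a local illustration (it is not Oracle- or MicroTX-specific): a transfer either applies both account updates or neither, and a CHECK constraint keeps the data in a valid state.

import sqlite3

# Set up a throwaway in-memory database with a consistency rule: balances may never go negative.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 20)])
conn.commit()

def transfer(conn, src, dst, amount):
    try:
        with conn:  # one transaction: commits on success, rolls back on any exception
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))
    except sqlite3.IntegrityError:
        print(f"transfer of {amount} from {src} violated a constraint and was rolled back")

transfer(conn, "alice", "bob", 30)    # succeeds: both updates are committed as one unit
transfer(conn, "bob", "alice", 500)   # fails the CHECK constraint: neither update survives
print(conn.execute("SELECT * FROM accounts ORDER BY name").fetchall())
# [('alice', 70), ('bob', 50)]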

    Traditional relational database systems are built to follow ACID compliance to ensure data consistency, durability, isolation, and atomicity. However, in distributed systems, achieving all four ACID properties at the same time can be challenging. Therefore, some models prefer to relax one or more of these properties to improve performance or availability. For example, some NoSQL databases prioritize eventual consistency over strict consistency to achieve higher availability and partition tolerance.

    In conclusion, ACID transactions are fundamental to ensuring that database operations are reliable, and they are a core concept in database management and design.

    Transaction patterns and solutions

    In this section, a few of the common patterns are discussed; there are more, each with its own characteristics and use cases.

    SAGA is a transaction pattern that consists of a sequence of local transactions, each performed by a different service, that together achieve a global business goal. If one of the local transactions fails, the SAGA executes a series of compensating actions to undo the effects of the previous transactions and restore the system to a consistent state. SAGA is suitable for long-running and complex transactions that involve multiple services and resources, where locking or blocking them for the duration of the transaction is not feasible or desirable. SAGA also provides more flexibility and resilience than traditional atomic transactions, as it allows partial failures and retries.

    To illustrate how the SAGA pattern works, let's consider an example of a travel booking system that consists of three microservices: flight service, hotel service and payment service. The global business goal is to book a flight and a hotel for a customer and charge their credit card accordingly. The SAGA workflow for this scenario could be as follows:

    - The customer initiates the booking request by providing their travel details and payment information.

    - The flight service receives the request and tries to reserve a flight ticket for the customer. If successful, it returns a confirmation code to the customer and notifies the coordinator service. If not, it returns an error message to the customer and aborts the transaction.

    - The hotel service receives the request and tries to reserve a hotel room for the customer. If successful, it returns a confirmation code to the customer and notifies the coordinator service. If not, it returns an error message to the customer and executes a compensating action to cancel the flight reservation by calling the flight service with the confirmation code.

    - The payment service receives the request and tries to charge the customer's credit card for the total amount of the booking. If successful, it returns a receipt to the customer and notifies the coordinator service. If not, it returns an error message to the customer and executes two compensating actions to cancel both the flight and the hotel reservations by calling the respective services with their confirmation codes.





    As you can see, each local transaction has a corresponding compensating action that reverses its effect in case of failure. The coordinator service is responsible for orchestrating the execution of the local transactions and the compensating actions, as well as handling failures and timeouts.
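
    To make the flow concrete, here is a minimal, framework-agnostic sketch of this booking SAGA in Python. The service functions (book_flight, book_hotel, charge_card and their cancel counterparts) are hypothetical in-process stand-ins for real microservice calls; in a real deployment the MicroTX coordinator takes over the role of this hand-written orchestration.

import uuid

# Hypothetical stand-ins for the flight, hotel and payment microservices.
def book_flight(req): return f"FL-{uuid.uuid4().hex[:6]}"
def cancel_flight(code): print("compensate: cancel flight", code)
def book_hotel(req): raise RuntimeError("no rooms available")   # forced failure to show compensation
def cancel_hotel(code): print("compensate: cancel hotel", code)
def charge_card(req): return f"RCPT-{uuid.uuid4().hex[:6]}"

def run_booking_saga(request):
    compensations = []  # undo actions for the local transactions completed so far
    try:
        flight = book_flight(request)
        compensations.append(lambda: cancel_flight(flight))

        hotel = book_hotel(request)
        compensations.append(lambda: cancel_hotel(hotel))

        receipt = charge_card(request)
        return {"flight": flight, "hotel": hotel, "receipt": receipt}
    except Exception as error:
        for compensate in reversed(compensations):  # undo in reverse order
            compensate()
        raise RuntimeError("booking saga aborted and compensated") from error

try:
    run_booking_saga({"customer": "demo"})
except RuntimeError as err:
    print(err)   # the hotel step failed, so the flight reservation was cancelled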


    Oracle MicroTX supports the SAGA pattern by providing a coordinator service that orchestrates the execution of the local transactions and the compensating actions. The coordinator service communicates with the participating services through a standard interface, which defines the business logic and the compensation logic for each service. The coordinator service also maintains a log of the transaction state and handles failures and timeouts. Oracle MicroTX allows you to define your SAGA workflows using declarative annotations or XML configuration files, which simplifies the development and maintenance of your microservices.


    2 phase commit (2PC) is a transaction pattern that ensures atomicity and consistency across multiple resources, such as databases, message queues or web services. 2PC involves two phases: a prepare phase and a commit phase. In the prepare phase, each resource is asked to vote on whether it can commit or abort the transaction. If all resources vote to commit, the transaction moves to the commit phase, where each resource is instructed to finalize the transaction. If any resource votes to abort or fails to respond, the transaction moves to the abort phase, where each resource is instructed to roll back the transaction.


    Prepare Phase – The coordinator asks the participating nodes whether they are ready to commit the transaction. Each participant responds with a yes or no.


    Commit Phase – If all the participating nodes respond affirmatively in phase 1, the coordinator asks all of them to commit. If at least one node returns negative, the coordinator asks all participants to roll back their local transactions.
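
    The two phases can be summarised in a few lines of Python. The Participant class below is a hypothetical in-memory stand-in for a real resource manager; in practice a transaction manager such as MicroTX performs this coordination over a protocol like XA, JTA, or WS-AT.

class Participant:
    """A hypothetical resource manager that votes in phase 1 and finalizes in phase 2."""
    def __init__(self, name, can_commit=True):
        self.name, self.can_commit = name, can_commit

    def prepare(self):   # phase 1: vote yes or no
        print(f"{self.name}: vote {'yes' if self.can_commit else 'no'}")
        return self.can_commit

    def commit(self):    # phase 2, on a unanimous yes
        print(f"{self.name}: committed")

    def rollback(self):  # phase 2, if any participant voted no
        print(f"{self.name}: rolled back")

def two_phase_commit(participants):
    votes = [p.prepare() for p in participants]   # prepare phase: collect the votes
    if all(votes):                                # commit phase: unanimous yes
        for p in participants:
            p.commit()
        return "committed"
    for p in participants:                        # abort phase: at least one no
        p.rollback()
    return "rolled back"

print(two_phase_commit([Participant("orders-db"), Participant("inventory-db")]))
print(two_phase_commit([Participant("orders-db"), Participant("payments-db", can_commit=False)]))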

    Oracle MicroTX supports the 2PC pattern by providing a transaction manager service that coordinates the voting and the finalization of the transactions across multiple resources. The transaction manager service uses a standard protocol, such as JTA or WS-AT, to communicate with the resources and ensure their agreement on the outcome of the transaction. Oracle MicroTX also provides APIs and tools for integrating various types of resources with the transaction manager service, such as JDBC drivers, JMS providers or REST clients.

    XA is a specification that defines how distributed transactions can be managed by a transaction manager service and multiple resource managers. XA is based on the 2PC pattern, but it adds some additional features, such as recovery mechanisms, timeout settings, and heuristic decisions (decisions made in unusual circumstances, such as communication failures). XA is widely adopted as a standard for distributed transactions in heterogeneous environments, where different types of resources need to be coordinated by a common transaction manager service.



    Oracle MicroTX supports the XA specification by providing an XA-compliant transaction manager service that can interoperate with any XA-compliant resource manager. Oracle MicroTX also provides XA drivers for various Oracle products, such as Oracle Database, Oracle WebLogic Server or Oracle Coherence, which enable them to participate in XA transactions. Oracle MicroTX also offers advanced features for monitoring and managing XA transactions, such as performance tuning, diagnostic tools and recovery options.

    Other well-known patterns are: Try-Confirm-Cancel (TCC), Long Running Actions (LRA), Eventual Consistency, and Optimistic/Pessimistic Locking.

    Oracle MicroTX Architecture




    The Oracle MicroTX architecture is a distributed transaction management system that enables data consistency across microservices deployed in Kubernetes and/or other environments. The MicroTX architecture consists of two main components: the transaction coordinator and the MicroTX library.

    The transaction coordinator is a microservice that runs on Kubernetes and coordinates the outcome of distributed transactions among the participating microservices. The transaction coordinator supports different transaction protocols, such as XA, SAGA, LRA, and TCC, depending on the consistency and performance requirements of the application.

    The MicroTX library is a client library that is integrated with the application microservices and provides the APIs and annotations to manage distributed transactions. The MicroTX library communicates with the transaction coordinator and the database to perform the transaction operations, such as begin, commit, and rollback.
    In addition, supported open-source tools such as Prometheus, Jaeger, ELK, and Kiali are available for observability of the transaction coordination.

    The MicroTX architecture simplifies the application development and operations by providing capabilities that make it easier to develop, deploy, and maintain microservices-based applications.

    As you can see, Oracle MicroTX provides a comprehensive framework for handling different transaction patterns in microservice architectures, whether you need to implement SAGA, 2PC, or XA transactions, or one of the many other transaction patterns.

