Oracle turns up the thermostat on MySQL HeatWave for machine learning

To further strengthen our commitment to providing industry-leading coverage of data technology, VentureBeat welcomes Andrew Brust and Tony Baer as regular contributors. Watch for their articles in the Data Pipeline.

Since acquiring Sun Microsystems more than a decade ago, Oracle has owned MySQL. Under Oracle's stewardship, MySQL remained a distinct product. But unless you were MariaDB, until a few years ago few people gave much thought to Oracle's stewardship. And as each major cloud provider rolled out its own managed MySQL database service, Oracle offered customers relatively few reasons to choose Oracle-supported MySQL.

Well, no longer. Fifteen months ago, Oracle introduced MySQL HeatWave, which runs an optimized implementation of MySQL on Oracle Cloud Infrastructure (OCI, Oracle's public cloud platform). These optimizations are meant to be transparent to applications. And now Oracle is releasing version 3.0 of HeatWave, which increases node size, cuts costs for a range of workloads, and introduces in-database machine learning that can take advantage of the denser nodes.

HeatWave is not open source MySQL, because it adds extensions developed by Oracle (described below). That is not unusual in open source: Amazon Aurora and Azure PostgreSQL Hyperscale, not to mention the countless other PostgreSQL variants on the market, show that open source databases provide a clean slate for differentiation.

In becoming a serious competitor in the MySQL space, Oracle has taken the database in a unique direction with HeatWave: it has optimized it for analytics rather than transaction processing, using MySQL's support for pluggable storage engines. In this case, Oracle plugged in a columnar storage engine that works side by side with the row store and includes optimizations tailored to processing analytic queries.

It is not unusual to run a columnar storage engine side by side with a row-based engine; MariaDB did just that, and in fact Oracle took a similar path, albeit with different technology, for its flagship database a few years ago. But to date, Oracle is the only provider to release an analytics-optimized engine for MySQL.

In the latest version, Oracle has introduced enhancements that reduce compute costs and bring machine learning into the database.

Let's start with operating costs. HeatWave 3.0 doubles the data density of each compute node without changing the price of compute. So you can now consume (and pay for) only half as many nodes for a given workload. And, by the way, Oracle set the stage for all of this in the previous version, HeatWave 2.0, which doubled the maximum size of a HeatWave cluster to 64 nodes.
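
To make the arithmetic concrete, here is a minimal sketch assuming hypothetical figures: a 3,200 GB working set, 400 GB of capacity per node before the density increase, and a flat per-node price. None of these numbers are Oracle's published capacities or prices; only the doubling and the 64-node ceiling come from the article.

```python
import math

MAX_NODES = 64  # maximum HeatWave cluster size cited in the article


def nodes_needed(data_gb: float, gb_per_node: float) -> int:
    """Smallest whole number of nodes whose combined capacity holds the data."""
    if data_gb <= 0 or gb_per_node <= 0:
        raise ValueError("sizes must be positive")
    nodes = math.ceil(data_gb / gb_per_node)
    if nodes > MAX_NODES:
        raise ValueError(f"{nodes} nodes exceeds the {MAX_NODES}-node cluster limit")
    return nodes


# Hypothetical inputs for illustration only (not Oracle's published figures).
DATA_GB = 3200
PRICE_PER_NODE = 1.0                   # assumed price per node-hour, unchanged across versions
OLD_GB_PER_NODE = 400                  # assumed per-node capacity before the density increase
NEW_GB_PER_NODE = 2 * OLD_GB_PER_NODE  # density per node doubled at the same price

old_nodes = nodes_needed(DATA_GB, OLD_GB_PER_NODE)
new_nodes = nodes_needed(DATA_GB, NEW_GB_PER_NODE)
print(old_nodes, old_nodes * PRICE_PER_NODE)  # 8 8.0
print(new_nodes, new_nodes * PRICE_PER_NODE)  # 4 4.0
```

Because node counts round up, the saving is exactly half only when the data divides evenly across nodes; a 3,300 GB working set, for example, would still need 5 of the denser nodes rather than 4.5.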

Together, the cost-effectiveness and scale should now make it practical to run machine learning models inside the database. Keep that thought in mind.

Beyond data density, HeatWave 3.0 also scales more economically, because you can add nodes (up to a maximum of 64) in any increment. This is in line with what Oracle introduced for its Autonomous Database cloud service, and it does away with standard "t-shirt sizes." So elasticity with HeatWave no longer means doubling the number of active nodes each time your workload needs more compute. HeatWave also improves availability during resizing, pausing queries for at most a few microseconds.
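
The difference between growing in any increment and growing by doubling can be sketched as follows. The doubling rule is a simplified, hypothetical stand-in for "t-shirt" sizing, not a description of any specific vendor's tiers; the 64-node cap is the one cited in the article.

```python
MAX_NODES = 64  # cluster ceiling cited in the article


def any_increment(required: int) -> int:
    """Provision exactly the number of nodes the workload needs."""
    if not 1 <= required <= MAX_NODES:
        raise ValueError(f"required must be between 1 and {MAX_NODES}")
    return required


def doubling(current: int, required: int) -> int:
    """Simplified 't-shirt' growth: keep doubling the current size until it covers the need."""
    if current < 1 or not 1 <= required <= MAX_NODES:
        raise ValueError("invalid cluster sizes")
    size = current
    while size < required:
        size *= 2
    return min(size, MAX_NODES)  # never exceed the cluster ceiling


for current, required in [(16, 20), (16, 33), (40, 50)]:
    print(current, required, doubling(current, required), any_increment(required))
# 16 20 32 20
# 16 33 64 33
# 40 50 64 50
```

In the first case, needing just 4 more nodes would force a jump from 16 to 32 under the doubling rule, while incremental scaling provisions 20.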

HeatWave 3.0 adds a few tricks to speed up processing further. Like any columnar storage engine, HeatWave makes extensive use of data compression. It also uses some common techniques, such as Bloom filters, which reduce the amount of memory needed to process queries. Specifically, HeatWave has implemented blocked Bloom filters, which can perform the necessary lookups at lower cost and significantly reduce the memory required.
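
For readers unfamiliar with the technique, here is a minimal, generic sketch of a blocked Bloom filter used to discard non-matching rows before a join. It assumes the "blocked" variant described above and is not HeatWave's implementation; the class name, block size, hash scheme, and sample keys are all hypothetical choices for illustration.

```python
import hashlib


class BlockedBloomFilter:
    """Each key maps to one fixed-size block, and all k of its bits are set inside
    that block, so a lookup touches a single block (often sized to one cache line)
    instead of k scattered positions. False positives are possible; false
    negatives are not."""

    def __init__(self, num_blocks: int = 64, block_bits: int = 512, k: int = 4):
        self.num_blocks = num_blocks
        self.block_bits = block_bits  # 512 bits = one 64-byte cache line
        self.k = k
        self.blocks = [0] * num_blocks  # each block stored as an integer bitmask

    def _locate(self, key: str):
        digest = hashlib.blake2b(key.encode("utf-8"), digest_size=24).digest()
        block = int.from_bytes(digest[0:8], "little") % self.num_blocks
        a = int.from_bytes(digest[8:16], "little")
        b = int.from_bytes(digest[16:24], "little") | 1  # odd step for double hashing
        bits = [(a + i * b) % self.block_bits for i in range(self.k)]
        return block, bits

    def add(self, key: str) -> None:
        block, bits = self._locate(key)
        for bit in bits:
            self.blocks[block] |= 1 << bit

    def might_contain(self, key: str) -> bool:
        block, bits = self._locate(key)
        word = self.blocks[block]
        return all((word >> bit) & 1 for bit in bits)


# Build the filter from the smaller (build) side of a hypothetical join...
build_keys = ["cust-1", "cust-7", "cust-42"]
bf = BlockedBloomFilter()
for key in build_keys:
    bf.add(key)

# ...then drop probe-side rows that cannot match before doing the real join.
probe_rows = [("cust-7", 10), ("cust-99", 3), ("cust-42", 8)]
candidates = [row for row in probe_rows if bf.might_contain(row[0])]
print(candidates)
```

Rows that pass the filter still have to be checked by the join itself, since a Bloom filter can report false positives; what it saves is the memory and work otherwise spent on rows that never match.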

These capabilities, in turn, pave the way for Oracle to process machine learning models within the database, without the need for an external ETL engine or ML environment. In doing so, Oracle is following a trend that includes AWS (Amazon Redshift ML), Google (BigQuery ML), Microsoft (SQL Server with in-database R and Python functions), Snowflake (with Snowpark) and Teradata (via SQL extensions). But comparing these approaches is apples and oranges, because providers take different paths: some support developing models externally, some offer a limited, curated selection of options for running ML, and others extend SQL.

HeatWave takes the curated path. This approach suits business analysts or "citizen data scientists," democratizing machine learning much as self-service visualization put BI in the hands of the average user. By contrast, the external route is aimed at data scientists in organizations that compete on their ability to develop their own unique and highly complex models.

The advantage of the curated approach is that it does not require external tools: the selection, tuning, training and execution of ML models are performed entirely within the database. This eliminates the cost and overhead of moving data to ML tools or services running on separate nodes. Oracle also argues that keeping it all in the database shrinks the attack surface and, as a result, reduces security exposure.

Here is how HeatWave's AutoML approach works. The user selects a table, columns, and the type of algorithm (e.g., regression or classification), and then specifies where the model artifacts are stored. The system automatically determines the best algorithm, the relevant features and the optimal hyperparameters, and generates a tuned model.
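
The core AutoML loop (try candidate algorithms and hyperparameters, keep whichever scores best on held-out data) can be illustrated with a toy sketch. Everything here is hypothetical: the search space, function names, and sample table are illustrative, it handles regression only, and it skips feature selection. It is not HeatWave's implementation or its SQL interface.

```python
import math
from statistics import mean


def mean_baseline(train, x, param):
    """Ignore the features and predict the average training target."""
    return mean(y for _, y in train)


def knn(train, x, k):
    """k-nearest-neighbour regression: average the targets of the k closest rows."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], x))[:k]
    return mean(y for _, y in nearest)


# Hypothetical search space: algorithm name -> (predict function, hyperparameter grid).
SEARCH_SPACE = {
    "mean_baseline": (mean_baseline, [None]),
    "knn": (knn, [1, 3, 5]),
}


def auto_train(rows, feature_cols, target_col):
    """Pick the algorithm and hyperparameter with the lowest validation MSE."""
    data = [(tuple(r[c] for c in feature_cols), r[target_col]) for r in rows]
    valid = data[::4]                                 # every 4th row held out
    train = [d for i, d in enumerate(data) if i % 4]  # remaining rows used for fitting
    best = None
    for name, (predict, grid) in SEARCH_SPACE.items():
        for param in grid:
            mse = mean((predict(train, x, param) - y) ** 2 for x, y in valid)
            if best is None or mse < best["validation_mse"]:
                best = {"algorithm": name, "hyperparameter": param,
                        "features": list(feature_cols), "validation_mse": mse}
    return best


# Hypothetical table: the user names the feature columns and the target column.
rows = [{"size": s, "rooms": s // 10, "price": 2 * s} for s in range(20, 60)]
print(auto_train(rows, ["size", "rooms"], "price"))
```

The returned dictionary plays the role of the "tuned model" description: which algorithm won, with which setting, on which features.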

It streamlines the key steps. For example, when testing a candidate model, it breaks out the individual tasks or steps the model performs, and evaluates each one using proxy models that mimic the algorithm against a representative sample of hyperparameter settings. It then automatically documents the choice of data, algorithm and hyperparameters to make the model explainable, as shown in the figure below.
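
The general idea of screening candidates cheaply before spending full compute on them, and logging each decision, can be sketched as below. This swaps in a simpler technique than proxy models: each candidate is scored on a small random sample, only the best few are re-scored on the full data, and every step is recorded. The function names, candidates, and data are hypothetical.

```python
import random
from statistics import mean


def evaluate(model, rows):
    """Mean squared error of `model` (a callable x -> prediction) on (x, y) rows."""
    return mean((model(x) - y) ** 2 for x, y in rows)


def screen_then_confirm(candidates, rows, sample_size=20, keep=2, seed=0):
    """Score all candidates on a small sample, then fully evaluate only the best few."""
    rng = random.Random(seed)
    sample = rng.sample(rows, min(sample_size, len(rows)))
    log = []
    # Stage 1: cheap screening scores on the sample.
    screen = {name: evaluate(model, sample) for name, model in candidates.items()}
    shortlist = sorted(screen, key=screen.get)[:keep]
    log.append(("screening_scores", screen))
    log.append(("shortlist", shortlist))
    # Stage 2: full evaluation only for the shortlisted candidates.
    full = {name: evaluate(candidates[name], rows) for name in shortlist}
    winner = min(full, key=full.get)
    log.append(("full_scores", full))
    log.append(("winner", winner))
    return winner, log


candidates = {
    "double": lambda x: 2 * x,
    "double_plus_one": lambda x: 2 * x + 1,
    "identity": lambda x: x,
    "zero": lambda x: 0,
}
rows = [(x, 2 * x) for x in range(200)]
winner, log = screen_then_confirm(candidates, rows)
for step, detail in log:
    print(step, detail)
```

The log is a crude version of the documentation step: it records which candidates were considered, why most were dropped, and which one was finally chosen.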

The advantage of in-database ML processing is a leaner architecture and less data movement. Although pushing any application processing into the database carries its own costs, there are several design features that address these issues.

The cloud-native architecture, which allows compute to scale out when necessary, eliminates the issue of contention for limited resources. In addition, most cloud analytics platforms that support in-database ML curate or support only limited model libraries, to avoid the AI equivalent of the workload from hell, especially for training runs that are more demanding of time and compute. Oracle has published ML benchmarks for HeatWave 3.0, which are available on GitHub for customers and prospects to run and test for themselves.

Oracle's introduction of ML processing in HeatWave complements the ML-related feature in its previous release, version 2.0 from last summer: MySQL Autopilot, which uses machine learning internally to help customers manage the database, for example by recommending how to provision and load it, while offering closed-loop automation for failure handling and error recovery and for query execution.

With version 3.0, MySQL HeatWave comes full circle, using ML both to manage the database and to run ML models within it. This is another example of the prediction we made this year that machine learning will take center stage, both for optimizing database operations and for giving customers the ability to design and/or run models in the database.

VentureBeat's mission is to be a digital town square for technical decision-makers to gain knowledge about transformative enterprise technology and transact.
