Oracle Database 21c, the new release of the longtime industry leading RDBMS, is currently available in the Oracle Cloud, where it can be deployed as a Virtual Machine DB System (for clusters and single instance) or a Bare Metal DB System (single instance). It’s also available in the Autonomous Database Free Tier Service, with limited storage and CPU capacity. It will be available more broadly later this year, including on-premises.
While Oracle Database 20c was scheduled for release in 2020, that release was cancelled because of the COVID-19 pandemic and its effect on customer IT departments. All of the features planned for 20c were rolled into 21c.
There are several notable points to be made about Oracle Database 21c. It can serve as a transactional, analysis, or enterprise database, and tune itself for its usage. It supports relational, graph, spatial, text, OLAP, XML, and JSON data. It has in-database training of machine learning models that includes AutoML, and that training can take advantage of the extra CPUs in storage servers on Exadata hardware. Speaking of Exadata hardware, Oracle Database 21c takes full advantage of its Octane persistent memory, and the Autonomous Database feature uses Exadata to automate index creation.
Oracle claims high performance for all of its converged data types, even compared against databases specialised for JSON documents or graphs. The hard benchmark numbers for Oracle Database 21c are not yet available, however.
Oracle Database history
Oracle Database 21c is the result of four decades of development. Oracle Database was the first commercial SQL-based relational database (1979), implemented based on a published IBM specification. (IBM held off releasing its own relational database, DB2, to avoid cannibalising sales of its hierarchical database, IMS.)
Briefly, Oracle Version 3 (1983) was a rewrite in portable C, to run on mainframes, minicomputers, and personal computers. Version 6 had enhancements to disk I/O, row locking, scalability, and backup and recovery. Oracle7 (1992) introduced PL/SQL stored procedures and triggers. Oracle8 had objects and table partitioning. Oracle8i (1999) provided native support for internet protocols and server-side support for Java. Oracle9i introduced Oracle Real Application Clusters (RAC) in 2001; RAC is a shared disk, multiple instance architecture. Oracle Database 10g introduced grid computing in 2003.
Oracle Database 11g (2007) improved manageability, diagnosability, and availability. Oracle Database 12c (2013) was designed for the cloud, featuring a new multitenant architecture, in-memory column store, and support for JSON documents. And Oracle Database 21c improves the developer experience with features such as Oracle Blockchain Tables and native JSON data types. Enhancements to Automatic In-Memory make the in-memory column store largely self-managing.
The Oracle Database is only one of Oracle’s many product lines. Oracle also offers nearly a dozen application development tools for Oracle Database, four data integration tools, nine other databases including MySQL and Berkeley DB, and half a dozen big data products — and that’s without going into Java, storage, operating systems, applications, or the Oracle Cloud.
Oracle Database overview
The Oracle Database 21c Technical Architecture document runs to 40 pages with block diagrams on almost every page (see the first two below). I don’t have room for that, nor would you want to read it here. Instead, let’s hit the high points.
First of all, Oracle is a converged database. It offers many data types and data models, several workload types, and a bunch of ancillary capabilities for developers and analysts.
Oracle Database supports relational, graph, spatial, text, OLAP, XML, and JSON data — all at once. It supports transactional (OLTP), analytic (OLAP), machine learning (ML), IoT, streaming, and blockchain workloads. It has integrated microservices, events, REST services, in-database ML training, CI/CD, and low-code application development. Again, that’s all in one database, rather than requiring an assortment of special-purpose databases. Oracle contends that a converged database saves customers money, reduces data transport and transformation, and simplifies analytics and applications.
Oracle offers database availability on-premises, in its public cloud, and in an extension of its cloud running on-prem. Oracle Database can be hosted, co-managed, or fully managed (autonomous).
Autonomous Database can perform automated scaling, automated tuning, automated provisioning, automated fault management, and automated patching. Automated tuning means that the execution of your Oracle Databases is continually analysed and tuned to optimise database performance for application development, production transaction processing, and data warehousing workloads that change over time. That includes creating indexes based on usage, which is especially important for data warehouses.
Exadata hardware and software underlies Autonomous Database. More explicitly, Exadata powers Oracle Exadata Database Machine (dedicated on-prem), Oracle Exadata Cloud Service, and Oracle Exadata Cloud@Customer (cloud on-prem). Exadata Database Machine X8M (the latest model) has a high-availability, scale-out design, separate database and storage servers, uses Optane persistent memory (PMEM) as part of a layered storage system, and uses in-memory columnar hybrid compressed format with vector processing for analytics and reporting. X8M takes advantage of remote direct memory access (RDMA) over converged 100Gbps Ethernet (RoCE) for internal interconnects, which completely bypasses the normal operating system stacks.
An Oracle database is a multi-tenant container database (CDB) that holds one system seed pluggable database and any number of user-created pluggable databases (PDBs). (See the third diagram below.) Users interact only with the PDBs, and a user may never realise that the PDB they are using isn’t a standalone database. In the past, Oracle also supported non-CDB database instances, but that option is no longer supported in Oracle Database 21c.
An Oracle Real Application Clusters (RAC) database architecture consists of multiple instances that run on separate server machines, all sharing a single database. The cluster presents a single end point to users and applications, through a listener process. RAC offers even higher availability than single Exadata machines.
Traditional Oracle databases were strictly disk-based, which limited their performance. In recent years, Oracle added an in-memory option for “hot” data. Oracle Database In-Memory is a suite of features that greatly improves performance for real-time analytics and mixed workloads. The in-memory column store is the key feature of Database In-Memory. Transactional queries always go to the row store; the in-memory column store is only for analytic and reporting queries.
What’s new in Oracle Database 21c
According to Oracle, the 200-plus new features in Oracle Database 21c fall into six categories: application development; big data and data warehousing solutions; database upgrade and utilities; management solutions; performance and high availability options; and security solutions.
Reading through all the improvements, it is clear to me that Oracle Database 21c is a mature database being improved by smart people in multiple ways to make it work better, faster, and more reliably, especially in the cloud. There are way too many new features to cover them all. I’ll pick out a few of the high-level points, mostly following William Hardie’s overview blog post.
While I’m something of a blockchain skeptic, Oracle’s addition of a blockchain ledger to its converged database will probably be useful for financial institutions. Basically, a blockchain is an immutable table that chains rows together cryptographically. The Oracle implementation uses an SHA2-512 computation over the current and previous rows as a hash.
There are a few deviations from immutability in the implementation, however. For one, rows can be set to NO DELETE UNTIL n DAYS AFTER INSERT, so that, for example, the blockchain table can conform to GDPR’s “right to delete” requirement when used for an external user table, after the retention period has expired.
Blockchains are normally supposed to be distributed; an Oracle database is typically centralised. Oracle does have a separate distributed Blockchain Platform. Oracle suggests using the integrated blockchain table in Oracle Database 21c when you don’t really need a distributed ledger, for example when the blockchain is used internally by an enterprise for financials or compliance data.
Native JSON type
Oracle Database has had JSON support since version 12c, but only within other data types (VARCHAR2, CLOB, or BLOB). Now it also has a native JSON type, which is stored in binary format, similar to the JSONB used by MongoDB. As a native type, JSON parsing happens only on inserts. This can result in read and update operations being four or five times faster than text-based JSON, and updates to very large JSON documents being 20 to 30 times faster.
Oracle developers have long used PL/SQL stored procedures and table functions to simplify SQL queries. Unfortunately, the Oracle PL/SQL and SQL engines operate in different contexts, and calling PL/SQL from SQL causes an expensive context switch. To avoid this, Oracle Database 21c adds a SQL macro capability with the new SQL_MACRO keyword, which causes functions to run in the SQL engine and avoid the context switch.
In-memory database enhancements
Database In-Memory used to require considerable management from users or DBAs. Now you can set INMEMORY_AUTOMATIC_LEVEL to HIGH, and all columns will be considered for in-memory analysis.
In addition, in-memory hash joins have been optimised using SIMD vectorisation, resulting in up to 10x speedups. Also, columnar scans have been sped up for the case where not every column is in memory. When that happens, the optimiser can now perform a hybrid scan and fetch projected column values from the row store if needed, instead of needing to perform the scan entirely on the row store.
Optimised graph models
Graph databases can have millions or even billions of edges and vertices. In Oracle Database 21c, the graph representation has been optimised, allowing the analysis of larger graphs using less memory than previous versions. This version also allows developers to create or extend graph algorithms in Java. These user-defined graph algorithms can run as quickly as native algorithms.
Persistent memory (PMEM) support
The latest Exadata hardware can contain up to 21TB of Intel Optane persistent memory (PMEM) per rack. Oracle Database 21c has significant improvements in how it handles PMEM. It keeps data and redo in local PMEM. It runs SQL directly on data stored in the mapped PMEM file system, eliminating the need for a large buffer cache. And it provides much faster transaction durability and near-instant recovery.
Oracle Machine Learning
While Oracle performed in-database machine learning in previous database versions, Oracle Database 21c offers a new AutoML option and new machine learning algorithms. Oracle AutoML does automatic algorithm selection, automatic feature engineering, and automatic model tuning.
New machine learning algorithms include XGBoost, MSET-SPRT, and the Adam optimiser. XGBoost is a highly efficient, scalable gradient tree boosting machine learning algorithm for regression and classification. The Multivariate State Estimation Technique–Sequential Probability Ratio Test (MSET-SPRT) algorithm is a nonlinear, nonparametric anomaly detection technique for monitoring critical processes. And Adam is a popular extension to stochastic gradient descent that uses mini-batch optimisation and can make progress faster while seeing less data than the other supported neural network optimisation solver, Limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) with line search.
Running Oracle Database 21c in Oracle Cloud
Oracle Database 21c currently has limited availability. It is currently only available in Always Free Autonomous Database, and in Database Cloud Service. It will be available more broadly later this year.
Always Free Autonomous Database is simple to create, and as the description says, always free, but it’s limited to one core and 20 GB of storage. (See the screenshot below.)
Database Cloud Service takes more effort to create (see the screenshots below), but offers you a choice of “shapes” — meaning the configuration of the VM. To perform the testing described in the next section, two cores are the minimum and four cores are recommended. Before creating the database VM, I had to create a virtual cloud network (VCN) and request a higher limit on cores and RAM than came with my account. Requests for Oracle Cloud service limit increases take a business day, or sometimes more.
Once the database was ready, I was able to connect to the VM and the database service from a shell on my own machine.
Then I went through the two setup labs and many of the example labs, taking most of a day. While I fumble-fingered a few things and found a few messages alarming, I was always able to recover, and there were no serious problems with the labs. A day of running the database cost US$4.43.
I wish that I could have tested and benchmarked some of the bigger features, such as graph algorithms and JSON document performance, but Oracle isn’t ready to benchmark 21c, and they don’t usually publish benchmarks that were run in the cloud. The published performance numbers — 19 microseconds latency, 1 TB/second analytic scans, and 560 GB/second SQL throughput — were obtained in 2019 by benchmarking Oracle Database 19c running on Exadata X8M on-prem.
RDBMS and kitchen sink
Oracle Database 21c is the newest in a long line of relational databases going back to 1979. It is now a converged database, meaning that it supports relational, graph, spatial, text, OLAP, XML, and JSON data all at once in the same database, and supports transactional, analytic, machine learning, IoT, streaming, and blockchain workloads. Oracle Database relies on Exadata appliances for its performance and many of its capabilities, such as automatic index creation and automatic database tuning, which are bundled into the Autonomous Database feature.
Oracle Database tries to compete with every other database in the world. Historically its on-premises prices were considered high by many customers, but moving from one database to another is non-trivial, and it sometimes took large enterprises multiple years to complete their migrations and end their Oracle contracts. Oracle’s cloud pricing now feels roughly competitive with high-performance commercial cloud database services such as SQL Azure and Amazon Aurora, but direct comparisons of pricing and performance among databases are difficult to make.
I’d suggest trying an always-free Oracle Autonomous Database cloud instance for basic functionality evaluations, and a larger Database Cloud Service instance to evaluate performance with significant amounts of data. Getting started with Oracle Database takes time, even for experienced DBAs and database programmers. I’d plan on taking at least a week to do an evaluation.
Cost: A few Oracle cloud services are free, and you can get a US$300 credit when you open an account. After that, costs depend on resource usage. You can estimate your Oracle cloud costs online. Running an Exadata X8M rack on-premises starts at ~US$100K plus Oracle Database user licenses; Oracle Cloud@Customer can change finances to usage-based billing.
Platform: Currently, Oracle Database 21c is only available in the Oracle Cloud. Later in 2021, it will also be available on-premises on Exadata X8M hardware.