Oracle has staked its claim in the data science platform space with the availability of the Oracle Cloud Data Science Platform.
The platform, built on technology from DataScience.com, which Oracle acquired in 2018, is geared toward teams of data scientists working collaboratively. Its capabilities include shared projects, model catalogs, team security policies, reproducibility, and auditability.
The platform has the Oracle Cloud Infrastructure Data Science service at its core. The service lets users build, train, and manage machine learning models on Oracle Cloud using Python, TensorFlow, Keras, Jupyter, and other popular data science tools.
Six additional services round out the platform, including new machine learning capabilities integrated in Oracle Autonomous Database, the Oracle Cloud Infrastructure Data Catalog, Oracle Big Data Service, Oracle Cloud SQL, Oracle Cloud Infrastructure Data Flow, and Oracle Cloud Infrastructure Virtual Machines for Data Science.
"The service is really the first of its kind in terms of a native cloud service in that it's really targeted for the enterprise," says Greg Pavlik, senior vice president product development of Oracle Data and AI Services. "It is focused on providing an environment for collaboration and governance for data scientists."
According to Pavlik, the offering targets the full lifecycle of machine learning within the enterprise, meaning that it's not just about developing or training models, but also taking those models into production and maintaining them.
"As data changes, models become potentially less valid and users need to be able to continue to leverage them inside of applications or inside the analytic reports on the one hand. On the other hand, they have to have a high confidence that what they're using is actually giving them good answers or correct responses," Pavlik says.
Simplifying data science
With Oracle Cloud Infrastructure Data Science, Oracle is taking on platforms from competitors such as Alteryx, KNIME Analytics Platform, and RapidMiner with a focus on automating the data science workflow.
The platform leverages AutoML algorithm selection and tuning, using machine learning models to select the best-fit algorithm for a specific use case, and to help users choose algorithm inputs and tune the model, Pavlik says. The platform also simplifies feature engineering by automatically identifying key predictive features from larger data sets.
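The idea of automated algorithm selection and tuning can be illustrated with a minimal sketch. This is not Oracle's implementation, which the article does not describe in detail. It assumes a plain k-fold cross-validation search over a handful of hypothetical candidates (a mean baseline, a one-feature linear fit, and two k-nearest-neighbour settings) on synthetic data, and all function names are invented for illustration.

```python
import random
from statistics import mean

def fit_mean(xs, ys):
    """Baseline: always predict the training mean."""
    m = mean(ys)
    return lambda x: m

def fit_linear(xs, ys):
    """Ordinary least squares for a single input feature."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    intercept = my - slope * mx
    return lambda x: slope * x + intercept

def make_knn(k):
    """Return a fitter for k-nearest-neighbour regression."""
    def fit(xs, ys):
        pairs = list(zip(xs, ys))
        def predict(x):
            nearest = sorted(pairs, key=lambda p: abs(p[0] - x))[:k]
            return mean(y for _, y in nearest)
        return predict
    return fit

def cv_mse(fit, xs, ys, folds=5):
    """Mean squared error averaged over the held-out folds."""
    errors = []
    for f in range(folds):
        train = [i for i in range(len(xs)) if i % folds != f]
        test = [i for i in range(len(xs)) if i % folds == f]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        errors.append(mean((model(xs[i]) - ys[i]) ** 2 for i in test))
    return mean(errors)

def select_algorithm(candidates, xs, ys):
    """Score every candidate and return (best_name, all_scores)."""
    scores = {name: cv_mse(fit, xs, ys) for name, fit in candidates.items()}
    return min(scores, key=scores.get), scores

# Synthetic, roughly linear data (illustrative only).
rng = random.Random(0)
xs = [i / 10 for i in range(100)]
ys = [3.0 * x + 1.0 + rng.gauss(0, 0.5) for x in xs]

candidates = {
    "mean_baseline": fit_mean,
    "linear": fit_linear,
    "knn_k3": make_knn(3),  # k plays the role of a tuned hyperparameter
    "knn_k9": make_knn(9),
}
best, scores = select_algorithm(candidates, xs, ys)
print(best, scores)
```

A production AutoML system would typically search a much larger space and, as Pavlik describes, may use learned models to guide the search rather than scoring every candidate exhaustively.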
Oracle Cloud Infrastructure Data Science also aids in model evaluation by generating a suite of metrics and visualisations to help users measure model performance against new data and rank models over time.
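As a rough illustration of ranking models by metrics on new data, the sketch below computes a few standard classification metrics and sorts candidate models by one of them. The metric choice, the function names, and the tiny labelled batch are assumptions made for the example, not details of Oracle's service.

```python
def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

def rank_models(y_true, predictions, metric="f1"):
    """Return (name, metrics) pairs sorted from best to worst on `metric`."""
    scored = {name: classification_metrics(y_true, preds)
              for name, preds in predictions.items()}
    return sorted(scored.items(), key=lambda kv: kv[1][metric], reverse=True)

# A hypothetical batch of newly labelled data and two models' predictions.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
predictions = {
    "model_a": [1, 0, 1, 0, 0, 1, 0, 1],
    "model_b": [1, 1, 1, 1, 1, 1, 1, 1],  # predicts the positive class every time
}

ranking = rank_models(y_true, predictions, metric="f1")
for name, metrics in ranking:
    print(name, metrics)
```

On this batch model_a ranks first by F1, while model_b ranks first by recall, which shows why a suite of metrics is more informative than any single number. Re-running the ranking on each new batch gives a view of how models compare over time.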
To support regulatory compliance efforts and help data teams establish trust in the output of their algorithms, Oracle's offering provides automated explanation of the weighting and importance of factors used to generate a prediction.
"We have advanced capabilities that we've developed in our Oracle Labs organization for model explainability," Pavlik says. "That's really understanding what is driving the model to its prediction, which is particularly important for regulatory situations where you have to be able to give an accounting of why: Why is the business making this decision? Why is the model telling us to do this?"
To support collaboration, Oracle has drawn inspiration from modern software development processes, adding capabilities that support shared projects, model catalogs, team-based security policies, and reproducibility and accountability.
"The big problem that we often see with teams is the data scientists are off downloading a bunch of stuff on their laptop and then they're working in relative isolation,” Pavlik says.
“You lose some of the sense of accountability, safety, some of the best practices you'd have from software development. So, we're looking to help organisations solve that problem without taking anything away from the data scientist."
The platform enables teams to leverage version control and share data and notebook sessions. Using model catalogs, teams can also share models and the artefacts necessary to modify and deploy them.
Team-based security policies provide access controls to models, code, and data, all integrated with Oracle Cloud Infrastructure Identity and Access Management. Enterprises can also track assets via the platform, thereby ensuring models can be reproduced and audited, even if team members leave.
Additional data and machine learning services
Oracle Cloud Infrastructure Data Science sits at the core of the new Oracle Cloud Data Science Platform, but Oracle also unveiled six other data and machine learning services to support the platform and integrate it with the company’s overall cloud offering.
"If you're working in your notebook, you're doing Python training, it allows you to transparently go out, use compute resources, do scale-out training jobs, without having to drop into an IT administrative type mode. You can, within the tool itself, leverage the elastic capabilities of the cloud as part of your model training and model experimentation process," Pavlik says.
The six additional services are:
- New machine learning capabilities in Oracle Autonomous Database: Oracle has added support for Python and automated machine learning to Oracle Autonomous Database. Forthcoming integration with Oracle Cloud Infrastructure Data Science will give data scientists the ability to develop models using open source and scalable in-database algorithms.
- Oracle Cloud Infrastructure Data Catalog: The data catalog lets users discover, organise, enrich, and trace data assets. It features a built-in business glossary.
- Oracle Big Data Service: This service offers a full Cloudera Hadoop implementation, as well as machine learning for Spark.
- Oracle Cloud SQL: This service gives users the ability to run SQL queries on data in HDFS, Hive, Kafka, NoSQL, and Object Storage.
- Oracle Cloud Infrastructure Data Flow: This fully managed service lets users run Apache Spark applications without deploying or managing infrastructure.
- Oracle Cloud Infrastructure Virtual Machines for Data Science: This service offers preconfigured GPU-based environments for $30 a day.