Kubeflow was built to address two major issues with machine learning projects: the need for integrated, end-to-end workflows, and the need to make deployments of machine learning systems simple, manageable, and scalable. Kubeflow lets data scientists build machine learning workflows on Kubernetes and deploy, manage, and scale machine learning models in production without learning the intricacies of Kubernetes or its components.
Kubeflow is designed to manage every phase of a machine learning project: writing the code, building the containers, allocating the Kubernetes resources to run them, training the models, and serving predictions from those models. The Kubeflow 1.0 release provides tools to help with each phase, such as Jupyter notebooks for experimenting with data and a web-based dashboard UI for general oversight.
Google claims Kubeflow provides repeatability, isolation, scale, and resilience not just for model training and prediction serving, but also for development and research work. Jupyter notebooks running under Kubeflow can be resource-limited and process-limited, and can reuse configurations, secrets, and data-source access.
Several Kubeflow components are still under development and will be rolled out in future releases. Pipelines allow complex workflows to be defined in Python. Metadata provides a way to track details about individual models, data sets, training jobs, and prediction runs. Katib gives Kubeflow users a mechanism for hyperparameter tuning, an automated way to improve the accuracy of a model's predictions.
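The core idea behind Pipelines is that each step of a workflow is an ordinary Python function, and the pipeline wires the output of one step into the next; the Kubeflow Pipelines SDK then turns those functions into containerized steps that run on Kubernetes. The dependency-free sketch below illustrates only that composition idea, with toy step names and logic of our own invention, not any actual Kubeflow API:

```python
# Dependency-free sketch of the pipeline idea: each step is a plain
# Python function, and the pipeline composes them. In real Kubeflow
# Pipelines, an SDK compiles such steps into containerized workflow
# stages; these function names and bodies are purely illustrative.

def load_data():
    # Stand-in for a step that pulls a training set from storage.
    return [1.0, 2.0, 3.0, 4.0]

def preprocess(data):
    # Stand-in for feature engineering: scale values into the 0..1 range.
    top = max(data)
    return [x / top for x in data]

def train(features):
    # Toy "model": the mean of the features, standing in for a real
    # training step that would produce a serialized model artifact.
    return sum(features) / len(features)

def pipeline():
    # The pipeline itself is just the wiring: each step's output
    # becomes the next step's input, forming a simple DAG.
    raw = load_data()
    features = preprocess(raw)
    return train(features)

print(pipeline())  # prints 0.625
```

In the real system, each of these steps would run in its own container, so a failed step can be retried or swapped out without rerunning the whole workflow.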