New features in Upcoming Apache Spark 2.4 and MLflow - Tim Hunter - Databricks
This talk combines two topics: I will start with an overview of the latest developments in Spark, and then present a recent Databricks project for simplifying machine learning. The soon-to-be-released Apache Spark 2.4 comes packed with new functionality: a new scheduling model, a native Avro data source, PySpark's eager evaluation mode, Kubernetes support, and many other improvements. MLflow is a new open source project from Databricks that simplifies the machine learning development process. It provides APIs for tracking experiment runs across multiple users within a reproducible environment, and for managing the deployment of models to production. Moreover, MLflow is designed to be an open, modular platform: you can use it with any existing ML library and incorporate it incrementally into an existing ML development process.