Saul Shanabrook: metadsl: separating API from execution | PyData Austin 2019
"Overview of motivation New hardware (GPU, TPU, asic) and execution paradigm (distributed, polyhedral compilation) Fragmented array ecosystem (Tensorflow, PyTorch, Tensor Comprehensions, numexpr) Requires reimplementing optimizations on all backends Requires end user libraries (like scikit-learn) to support all backends Different APIs for users to learn Hard to collaborate Need to provide users with some sort of API that they can target, agnostic of optimizations and backends So can't we just have a __array_index__ and __shape__ API in Python that we can all collaborate over? No, this would be super slow! We don't do our execution in Python. But we do want this sort of API, but instead of doing the execution eagerly, it builds up a representation of the computation we want to do that can then be targeted to different backends Reference Mathematics of Arrays as original specification of this kind of algebra Overview of metadsl, the resulting framework Show how we can go from NumPy like API to LLVM Show integrations with other libraries Also show how this can be embedded in notebooks for context about the execution Conclusion How can we encourage distributed collaboration to solve the problems in our ecosystem?" metadsl is a Python framework for writing APIs that are detached from how they are executed. With it we can be framework agnostic definitions of concepts like "arrays" and compile them to backends like Tensorflow or LLVM. In this talk, we will use metadsl to build high performance scientific computing libraries. www.pydata.org PyData is an educational program of NumFOCUS, a 501(c)3 non-profit organization in the United States. PyData provides a forum for the international community of users and developers of data analysis tools to share ideas and learn from each other. The global PyData network promotes discussion of best practices, new approaches, and emerging technologies for data management, processing, analytics, and visualization. PyData communities approach data science using many languages, including (but not limited to) Python, Julia, and R. PyData conferences aim to be accessible and community-driven, with novice to advanced level presentations. PyData tutorials and talks bring attendees the latest project features along with cutting-edge use cases. 00:00 Welcome! 00:10 Help us add time stamps or captions to this video! See the description for details. Want to help add timestamps to our YouTube videos to help with discoverability? Find out more here: https://github.com/numfocus/YouTubeVideoTimestamps
"Overview of motivation New hardware (GPU, TPU, asic) and execution paradigm (distributed, polyhedral compilation) Fragmented array ecosystem (Tensorflow, PyTorch, Tensor Comprehensions, numexpr) Requires reimplementing optimizations on all backends Requires end user libraries (like scikit-learn) to support all backends Different APIs for users to learn Hard to collaborate Need to provide users with some sort of API that they can target, agnostic of optimizations and backends So can't we just have a __array_index__ and __shape__ API in Python that we can all collaborate over? No, this would be super slow! We don't do our execution in Python. But we do want this sort of API, but instead of doing the execution eagerly, it builds up a representation of the computation we want to do that can then be targeted to different backends Reference Mathematics of Arrays as original specification of this kind of algebra Overview of metadsl, the resulting framework Show how we can go from NumPy like API to LLVM Show integrations with other libraries Also show how this can be embedded in notebooks for context about the execution Conclusion How can we encourage distributed collaboration to solve the problems in our ecosystem?" metadsl is a Python framework for writing APIs that are detached from how they are executed. With it we can be framework agnostic definitions of concepts like "arrays" and compile them to backends like Tensorflow or LLVM. In this talk, we will use metadsl to build high performance scientific computing libraries. www.pydata.org PyData is an educational program of NumFOCUS, a 501(c)3 non-profit organization in the United States. PyData provides a forum for the international community of users and developers of data analysis tools to share ideas and learn from each other. The global PyData network promotes discussion of best practices, new approaches, and emerging technologies for data management, processing, analytics, and visualization. PyData communities approach data science using many languages, including (but not limited to) Python, Julia, and R. PyData conferences aim to be accessible and community-driven, with novice to advanced level presentations. PyData tutorials and talks bring attendees the latest project features along with cutting-edge use cases. 00:00 Welcome! 00:10 Help us add time stamps or captions to this video! See the description for details. Want to help add timestamps to our YouTube videos to help with discoverability? Find out more here: https://github.com/numfocus/YouTubeVideoTimestamps