This guide is for DevOps and GitOps engineers who design, deploy and manage Jupyter platforms for data scientists. It details the architectural building blocks of the rich Jupyter ecosystem, from the user interface to the servers, reviewing the tradeoffs between the different options and highlighting when you should use each of them. It helps you make sound architecture decisions to deliver performance, security and observability, and shows you how to configure and operate the resulting system.
This book also explains, with practical examples, how you can further customize and extend the building blocks to turn the platform into a real data product that fits evolving data science use cases such as Spark and Dask workloads. The author is closely connected with Jupyter stakeholders, both core developers and large users.
About the topic
Jupyter has grown from a local server used in the academic world to become the default choice of data scientists at large companies. Its openness and its rich set of contributors have been key to this success. The teams responsible for designing, deploying and managing this infrastructure are beginning to feel friction as the requirements of their users and use cases become more and more demanding. One option is to use managed services from cloud providers, but many corporations still need a level of customization and control that cloud providers cannot offer.
Therefore, DevOps and GitOps teams have to discover and learn from the GitHub repositories and documentation, make decisions, and often develop extensions and implement ad hoc systems. There is no single place where a developer can get a complete and accurate view of the Jupyter solutions they can rely on for their architecture, such as Jupyter Notebook, JupyterLab, JupyterHub, the ipynb format and Voila for dashboarding, while anticipating the future and experimental pieces. Developers also need to discard the outdated and soon-to-be-deprecated pieces. This book aims to fill that gap so that Jupyter, which grew out of academia, can be used by businesses.
What knowledge do you assume of them? What books can you assume they have read? What skills can you assume they've mastered?
The audience for this book is DevOps and GitOps engineers who design, deploy and operate a data science platform built on top of the Jupyter ecosystem. Often, a managed service offered by a cloud provider or by a PaaS (Platform as a Service) does not fit a company's specific needs and security requirements.
These engineers need a deep understanding of the Jupyter ecosystem architecture, where it is going, and how to use and extend it today. This book gives them a comprehensive and readable version of the content scattered across many GitHub organisations (*).
(*) https://github.com/jupyter - https://github.com/jupyterlab - https://github.com/jupyter-server - https://github.com/jupyterhub - https://github.com/executablebooks - https://github.com/jupyter-widgets
This book is meant to be a guide that can be read piece by piece, with readers returning to it regularly and covering more ground each time. It is not meant to be a daily reference, but rather a path of discovery that can span months, depending on the reader's practical requirements.
What the reader will learn—and how to apply it
By the end of this book, the reader will understand:
- The current rich Jupyter ecosystem and where it is going, as well as the standards and protocols that back its technical components.
- The various use cases (real-time collaboration, distributed workloads, scheduled workloads…) their users will come up with.
- The customization and extension capabilities.
And the reader will be able to:
- Define, deploy and manage a secure and scalable data science platform.
- Extend the platform with user-facing and backend extensions.
- Give data scientists the ability to run distributed workloads such as Spark and Dask.
Jupyter, Notebook, JupyterLab, Jupyter Server, JupyterHub, ipywidgets, React.js, Dashboard, Platform, Data, Data Science, Kubernetes, Extension, DevOps, GitOps, Scheduling, Security, Observability.
Other book features
Will there be a GitHub site for code samples?
Yes. It will host the deployment definitions for Docker and Kubernetes, and the extension examples for JupyterLab, ipywidgets and JupyterHub.