---
title: How to set up MLflow
---
# How to set up MLflow on the cluster (with VS Code)

*Work in progress*

This tutorial shows how to set up [MLflow](https://mlflow.org) on the cluster and use it from VS Code.

MLflow is a tool for tracking and managing machine learning experiments. It lets you log parameters, metrics, and models, and view them in a local web user interface (UI). This makes it easier to keep track of training runs across scripts and notebooks, compare results, and reproduce them.

{width=844 height=319}
## Preliminaries:

1. [Set up access to the cluster via VS Code](Tutorials/How-to-connect-VS-Code-with-Cluster)
2. Set up a virtual environment in which you install MLflow, ipykernel, and any other packages you need: <br>
  - In your home folder on the cluster, create a new directory to keep things organised (for example "mlflow_project"), and execute the following commands in that directory: <br>
    ```
    module load anaconda # to load conda
    conda create -n mlflow-env python=3.10 # name your environment as you like; "mlflow-env" is used here
    source activate mlflow-env # activate your environment
    ```
  - Install MLflow, ipykernel, and other Python packages you need: <br>
    * `pip install mlflow` Note: This will also install `numpy`, `pandas`, `scikit-learn` and others <br>
    * `pip install ipykernel` To run Jupyter notebooks in VS Code <br>
    (Optional but useful: register the environment as a Jupyter kernel with `python -m ipykernel install --user --name=mlflow-env --display-name "Python (mlflow-env)"`)
3. Check if your setup is correct:
    - Open a .ipynb notebook and select your environment as the kernel. Check, for example, whether you can import NumPy.

{width=1435 height=294}


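The check from step 3 can be done in a single notebook cell. The following is a minimal sketch using only the Python standard library; `check_packages` is a hypothetical helper name, and the package list simply mirrors what was installed above. It also prints the interpreter path, which should point into your `mlflow-env` environment if the right kernel is selected.

```python
import importlib.util
import sys


def check_packages(names):
    """Map each top-level package name to True if it can be found by this interpreter."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


# Show which Python interpreter the selected kernel is using.
print("Interpreter:", sys.executable)

# Packages this tutorial expects to be installed in the environment.
for name, found in check_packages(["numpy", "mlflow", "ipykernel"]).items():
    print(f"{name}: {'found' if found else 'MISSING'}")
```

If a package is reported as missing, the notebook is probably running on a different kernel than the environment you installed into.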
## Follow the [MLflow Tracking Quickstart](https://mlflow.org/docs/latest/getting-started/intro-quickstart) guide:

1. (Optional): Start a local MLflow Tracking Server. <br>
In a terminal, run `mlflow server --host 127.0.0.1 --port 8080`.

Info:
With a tracking server running, the UI picks up new runs while your code is logging them; without one, you have to restart the UI manually each time you want to see the latest results. <br>
Host 127.0.0.1 means the server only accepts connections from the node it runs on, so the UI is not exposed to the network. If port 8080 is already in use, choose another one, for example 8081.

To stop the server, press `Ctrl + C` in the terminal (on Mac this is also `Ctrl`, not `Cmd`).

{width=1089 height=235}

2. Follow the [MLflow Quickstart test notebook](https://mlflow.org/docs/latest/getting-started/intro-quickstart) <br>

{width=1097 height=547}

If you used a tracking server, make sure that you set the tracking server's URI correctly using:
```
mlflow.set_tracking_uri(uri="http://<host>:<port>")
```
If the tracking URI is not set in your notebook or runtime environment (for example via the `MLFLOW_TRACKING_URI` environment variable), the runs are logged to your local file system instead. Setting a tracking URI sends the runs to a running MLflow server, which enables real-time, centralized logging. Depending on how the server is configured, multiple users might be able to access the UI. If your server is bound to host 127.0.0.1, it is only reachable from the node it runs on.

{width=955 height=164}

3. Run your training code (e.g. the quickstart notebook) and look at the logged data in the MLflow UI, as described in the next section.
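If you want a quick smoke test before running the full quickstart notebook, a single logged run is enough to make something appear in the UI. The sketch below assumes the tracking server from step 1 is running on 127.0.0.1:8080. The helper names `build_tracking_uri` and `log_demo_run`, the experiment name, and the logged parameter and metric values are all made up for illustration.

```python
def build_tracking_uri(host="127.0.0.1", port=8080):
    """Build the HTTP URI of a tracking server from its host and port."""
    return f"http://{host}:{port}"


def log_demo_run(tracking_uri, experiment_name="mlflow-setup-check"):
    """Log one illustrative parameter and metric to the given tracking server."""
    import mlflow  # imported here so build_tracking_uri works without MLflow installed

    mlflow.set_tracking_uri(uri=tracking_uri)
    mlflow.set_experiment(experiment_name)  # creates the experiment if it does not exist
    with mlflow.start_run():
        mlflow.log_param("learning_rate", 0.01)  # placeholder value
        mlflow.log_metric("accuracy", 0.9)       # placeholder value
```

In a notebook cell, calling `log_demo_run(build_tracking_uri(port=8080))` should create one run under the `mlflow-setup-check` experiment. Adjust the port if you started the server on a different one.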
## Forward a port from the cluster to your local browser to view the MLflow UI

To view the UI, you must create a secure tunnel from localhost:YourPortNumber on your laptop to 127.0.0.1:YourPortNumber on the cluster. In VS Code:

* Open the Command Palette: Ctrl + Shift + P (or Cmd + Shift + P on Mac)
* Select "Forward a Port" and, when prompted, enter YourPortNumber, e.g. 8081
* VS Code will now show the forwarded port in the “Ports” panel at the bottom. Click on the globe symbol to open the UI in your local browser.

{width=1436 height=119}

{width=1440 height=591}

{width=1438 height=523}

If you did not set up a tracking server, you can access the MLflow UI simply via this command:
`mlflow ui --port 8081` (this is equivalent to `mlflow ui --host 127.0.0.1 --port 8081`)

Then forward the port to your local computer (see the steps above).

Note: The UI in this mode is read-only, and new runs or file changes in mlruns/ may not appear immediately. To refresh, you may need to restart the `mlflow ui` process.
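If the forwarded page does not load, it can help to check whether anything is listening on the port at all. The following sketch uses only the Python standard library; `is_port_open` is a hypothetical helper name, and 8081 is just the example port used above. Run on the cluster, it tells you whether the MLflow process is up (or whether a port is already taken before you start a server). Run on your laptop, it tells you whether the tunnel is active.

```python
import socket


def is_port_open(host, port, timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False


# Example port from this tutorial; replace it with the one you chose.
print("Port 8081 reachable:", is_port_open("127.0.0.1", 8081))
```

This only checks that a TCP connection can be opened; it does not verify that the process behind the port is actually MLflow.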