Packaging Python Apps That Use scikit-learn with PyInstaller

How to solve hidden import errors when bundling scikit-learn and other complex packages

I'm trying to package a machine learning app that uses scikit-learn into a standalone executable. PyInstaller doesn't seem to recognise the package properly. How do I work around this?

If you've built a Python GUI application that uses a library like scikit-learn and then tried to package it with PyInstaller, you may have run into errors where the packaged app can't find modules that worked perfectly during development. This is a common issue with packages that have complex internal structures, use Cython extensions, or rely on many sub-packages that aren't obviously imported in your code.

Let's walk through why this happens and how to fix it.

Why Some Packages Don't Work Out of the Box

When PyInstaller bundles your application, it analyses your code to figure out which modules and packages need to be included. It follows the import statements in your scripts and traces through the dependency tree.

This works well for most packages. But some libraries — like scikit-learn — have imports that PyInstaller can't detect automatically. These are called hidden imports. They happen for a few reasons:

  • The package uses dynamic imports (e.g., importlib.import_module()).
  • The package includes compiled Cython or C extensions whose own imports aren't visible to static analysis.
  • Sub-modules are loaded conditionally at runtime.

scikit-learn ticks several of these boxes, which is why you'll often see ModuleNotFoundError or ImportError when running a PyInstaller-built executable that depends on it.
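To see why, here's a deliberately simplified scanner built on Python's ast module. It only collects names from import statements, which is the core idea behind static analysis. PyInstaller's real analysis is far more thorough, but it has the same blind spot for module names that are only assembled at runtime. The SOURCE snippet and the static_imports() helper are made up for illustration.

```python
import ast

# A made-up script: two ordinary imports plus one dynamic import
SOURCE = '''
import numpy as np
from sklearn.linear_model import LinearRegression
import importlib

# The module name only exists as a string built at runtime
backend = "sklearn." + "svm"
module = importlib.import_module(backend)
'''


def static_imports(source):
    """Return the module names that appear in import statements in `source`."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            found.add(node.module)
    return found


detected = static_imports(SOURCE)
print(sorted(detected))
# 'sklearn.svm' never shows up: it is only named inside a runtime string
```

The scanner finds numpy, sklearn.linear_model and importlib, but it has no way of knowing that sklearn.svm will be needed. A bundler faces exactly this situation with hidden imports.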

It's worth noting that PyInstaller's list of supported packages is not exhaustive. A package not appearing on that list doesn't mean it won't work, since most packages do work out of the box. It just means PyInstaller may not ship a hook for it, and you may need to provide one yourself.

Solving the Problem with Hooks

PyInstaller uses a system called hooks to handle packages that need special treatment during bundling. A hook is a small Python file that tells PyInstaller about additional files or modules it needs to include.

Using --hidden-import on the Command Line

The simplest approach is to tell PyInstaller about the missing modules directly. If you see an error like:

text
ModuleNotFoundError: No module named 'sklearn.utils._cython_blas'

You can add it as a hidden import:

sh
pyinstaller --hidden-import=sklearn.utils._cython_blas your_app.py

You can specify multiple hidden imports by repeating the flag:

sh
pyinstaller --hidden-import=sklearn.utils._cython_blas --hidden-import=sklearn.neighbors._typedefs your_app.py
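If you're gathering these names from several failed runs, a small helper can turn pasted error output into the flags. hidden_import_flags() is a hypothetical helper. It assumes the messages use the standard "No module named '...'" wording.

```python
import re

# Matches the module name in messages like:
#   ModuleNotFoundError: No module named 'sklearn.utils._cython_blas'
MISSING_RE = re.compile(r"No module named '([\w.]+)'")


def hidden_import_flags(log_text):
    """Build one --hidden-import flag per distinct missing module in log_text."""
    names = []
    for name in MISSING_RE.findall(log_text):
        if name not in names:  # keep first-seen order, skip duplicates
            names.append(name)
    return [f"--hidden-import={name}" for name in names]


log = """
ModuleNotFoundError: No module named 'sklearn.utils._cython_blas'
ModuleNotFoundError: No module named 'sklearn.neighbors._typedefs'
ModuleNotFoundError: No module named 'sklearn.utils._cython_blas'
"""

flags = hidden_import_flags(log)
print(" ".join(["pyinstaller", *flags, "your_app.py"]))
```

With the sample log, this prints the same two-flag command shown above. The repeated error appears only once in the output.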

This can get tedious if there are many missing modules, so a hook file is often a better approach.

Writing a Custom Hook File

A hook file lets you collect all the sub-modules of a package automatically, so you don't have to chase down each missing import one by one.

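Create a file called hook-sklearn.py with the following content. The name matters: PyInstaller matches a hook to a package by its import name, and scikit-learn is imported as sklearn.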

python
from PyInstaller.utils.hooks import collect_submodules

hiddenimports = collect_submodules('sklearn')

The collect_submodules() function walks through the entire sklearn package and returns a list of every sub-module it finds. PyInstaller will then include all of them in the bundle.
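If you're curious what that list looks like, the standard library's pkgutil.walk_packages() can produce something similar. This is a rough approximation for illustration, not PyInstaller's actual implementation, and list_submodules() is a hypothetical name. It's run against the standard library's json package so it works without scikit-learn installed.

```python
import importlib
import pkgutil


def list_submodules(package_name):
    """Roughly approximate collect_submodules(): the package itself plus
    every module and sub-package found beneath it."""
    package = importlib.import_module(package_name)
    names = [package_name]
    # walk_packages recurses into sub-packages; the prefix makes names absolute.
    # onerror quietly skips sub-packages that fail to import.
    for info in pkgutil.walk_packages(
        package.__path__, prefix=package_name + ".", onerror=lambda name: None
    ):
        names.append(info.name)
    return sorted(names)


json_modules = list_submodules("json")
print(json_modules)
```

Pointing it at "sklearn" gives a feel for how many modules the hook pulls into your bundle.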

To use this hook, place it in a directory (e.g., a folder called hooks/ in your project) and point PyInstaller to it with the --additional-hooks-dir flag:

sh
pyinstaller --additional-hooks-dir=hooks your_app.py

Your project structure would look something like this:

text
my_project/
├── your_app.py
└── hooks/
    └── hook-sklearn.py

Handling Data Files

Some packages also need data files (not just Python modules) to function correctly. scikit-learn, for example, ships small example datasets inside the package, such as the CSV file behind sklearn.datasets.load_iris(). You can collect those too by extending your hook:

python
from PyInstaller.utils.hooks import collect_submodules, collect_data_files

hiddenimports = collect_submodules('sklearn')
datas = collect_data_files('sklearn')

This ensures that both the code modules and any associated data files are bundled into your executable.
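Conceptually, collecting data files means finding every non-Python file inside the package and recording where it should live inside the bundle. The sketch below is a simplified stand-in, not collect_data_files() itself; find_data_files() and the throwaway mypkg package are hypothetical. It returns (source file, destination directory) pairs, the same general shape as the datas list a hook provides.

```python
import os
import tempfile


def find_data_files(package_dir, package_name):
    """Return (source_file, dest_dir) pairs for the non-Python files under
    package_dir, with dest_dir relative to the root of the bundle."""
    pairs = []
    for root, dirs, files in os.walk(package_dir):
        dirs[:] = [d for d in dirs if d != "__pycache__"]  # skip bytecode caches
        rel_root = os.path.relpath(root, package_dir)
        # Destination directory inside the bundle, e.g. "mypkg/data"
        dest = package_name if rel_root == "." else os.path.join(package_name, rel_root)
        for filename in files:
            if filename.endswith((".py", ".pyc")):
                continue  # code is handled by the import analysis, not datas
            pairs.append((os.path.join(root, filename), dest))
    return sorted(pairs)


# Demo: a throwaway package containing one module and one bundled CSV file
with tempfile.TemporaryDirectory() as tmp:
    pkg = os.path.join(tmp, "mypkg")
    os.makedirs(os.path.join(pkg, "data"))
    open(os.path.join(pkg, "__init__.py"), "w").close()
    with open(os.path.join(pkg, "data", "iris.csv"), "w") as f:
        f.write("sepal_length,species\n5.1,setosa\n")
    # Show source paths relative to the temp dir so the output is readable
    result = [
        (os.path.relpath(src, tmp), dest)
        for src, dest in find_data_files(pkg, "mypkg")
    ]

print(result)
```

The CSV file is kept at the same relative location (mypkg/data) in the bundle. Code that reads it relative to its own package can therefore still find it.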

A Worked Example

Here's a minimal PyQt6 application that uses scikit-learn to train a simple model and display a prediction. You can use this as a test case for your packaging setup.

python
import sys

from PyQt6.QtWidgets import (
    QApplication, QLabel, QMainWindow, QPushButton, QVBoxLayout, QWidget,
)
from sklearn.linear_model import LinearRegression
import numpy as np


class MainWindow(QMainWindow):
    def __init__(self):
        super().__init__()
        self.setWindowTitle("scikit-learn + PyQt6")

        layout = QVBoxLayout()

        self.label = QLabel("Click the button to run a prediction.")
        layout.addWidget(self.label)

        button = QPushButton("Predict")
        button.clicked.connect(self.run_prediction)
        layout.addWidget(button)

        container = QWidget()
        container.setLayout(layout)
        self.setCentralWidget(container)

    def run_prediction(self):
        # Simple linear regression: y = 2x + 1
        X = np.array([[1], [2], [3], [4], [5]])
        y = np.array([3, 5, 7, 9, 11])

        model = LinearRegression()
        model.fit(X, y)

        prediction = model.predict(np.array([[6]]))[0]
        self.label.setText(f"Prediction for x=6: {prediction:.2f}")


app = QApplication(sys.argv)
window = MainWindow()
window.show()
app.exec()

To package this with PyInstaller, first create the hook file at hooks/hook-sklearn.py:

python
from PyInstaller.utils.hooks import collect_submodules, collect_data_files

hiddenimports = collect_submodules('sklearn')
datas = collect_data_files('sklearn')

Then run:

sh
pyinstaller --additional-hooks-dir=hooks --windowed your_app.py

The --windowed flag prevents a console window from appearing alongside your GUI on Windows and macOS. On macOS, it also builds a .app bundle.
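You'll likely rebuild many times while testing, so it can help to keep this command in a small Python build script. PyInstaller documents a PyInstaller.__main__.run() entry point that accepts the same arguments as the command line. build_args() and build() are hypothetical helper names, and the script and folder names are the ones used in this example.

```python
def build_args(script="your_app.py", hooks_dir="hooks", windowed=True):
    """Assemble PyInstaller's command-line arguments as a list.

    With the defaults this mirrors:
        pyinstaller --additional-hooks-dir=hooks --windowed your_app.py
    """
    args = [f"--additional-hooks-dir={hooks_dir}"]
    if windowed:
        args.append("--windowed")  # leave this out while debugging to keep a console
    args.append(script)
    return args


def build(**options):
    """Run the build in-process; requires PyInstaller to be installed."""
    import PyInstaller.__main__  # imported lazily so build_args() works without it

    PyInstaller.__main__.run(build_args(**options))
```

Calling build() from a build script runs the same build as the command above. Calling build(windowed=False) produces a console build, which is handy for seeing tracebacks.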

Using a .spec File

If you find yourself re-running PyInstaller with many flags, you can use a .spec file instead. When you run PyInstaller on a script, it also writes a .spec file named after that script (here, your_app.spec) to your project directory. You can edit the Analysis section to include your hidden imports directly:

python
a = Analysis(
    ['your_app.py'],
    hiddenimports=['sklearn.utils._cython_blas', 'sklearn.neighbors._typedefs'],
    # ... other options
)

Or, for the automatic collection approach, you can add this at the top of the .spec file:

python
from PyInstaller.utils.hooks import collect_submodules, collect_data_files

sklearn_hidden = collect_submodules('sklearn')
sklearn_data = collect_data_files('sklearn')

And then reference them in the Analysis call:

python
a = Analysis(
    ['your_app.py'],
    hiddenimports=sklearn_hidden,
    datas=sklearn_data,
    # ... other options
)

Then build from the spec file:

sh
pyinstaller your_app.spec
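If you also want to keep using the hooks/ folder from earlier, set it in the spec itself through the hookspath argument to Analysis. Options like --additional-hooks-dir only shape how the .spec file is generated, so they don't apply when you build from an existing spec. If you originally generated the spec with --additional-hooks-dir=hooks, this entry may already be filled in. The fragment below is a sketch of just the relevant line:

```python
a = Analysis(
    ['your_app.py'],
    hookspath=['hooks'],  # same effect as --additional-hooks-dir=hooks
    # ... other options
)
```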

A Note on fbs

If you're using fbs (fman build system) as your packaging tool, be aware that fbs wraps PyInstaller internally and doesn't currently support custom PyInstaller hooks. This means you may not be able to use the --additional-hooks-dir approach with fbs.

If you need to package an application that uses scikit-learn or similarly complex libraries, using PyInstaller directly gives you full control over hooks and hidden imports.

Tips for Debugging Packaging Issues

When your packaged app fails, here are some practical steps:

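  1. Run the bundled app from the terminal to see the full traceback. On Windows, run the .exe from a command prompt. On Linux, run the binary from a terminal. On macOS, run the executable inside the app bundle (e.g. your_app.app/Contents/MacOS/your_app) from a terminal. This will show you exactly which module is missing. If you built with --windowed, rebuilding without it while debugging ensures the output has a console to go to.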

  2. Add hidden imports one at a time as errors appear. Each run may reveal a new missing module.

  3. Use collect_submodules() to save time. Rather than hunting down individual modules, collecting all sub-modules for a package is usually the most reliable approach.

  4. Check the bundle size. Using collect_submodules() may include more than you strictly need, which increases the size of your executable. For a final release, you might want to pare this back to only the specific sub-modules your app actually uses.

  5. Test on a clean machine (or a virtual machine without Python installed) to make sure the packaged app truly works standalone.
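To speed up the one-at-a-time loop in tip 2, you could temporarily add a check like this to your app and run the bundle from a terminal. find_missing() is a hypothetical helper. The names passed to it here are placeholders; in practice you'd list the modules your app depends on, such as the scikit-learn sub-modules from the errors earlier.

```python
import importlib


def find_missing(module_names):
    """Try importing each name and return the ones that fail.

    Run inside the bundled app, this reports every missing module in one
    pass instead of one crash at a time."""
    missing = []
    for name in module_names:
        try:
            importlib.import_module(name)
        except ImportError:  # ModuleNotFoundError is a subclass of ImportError
            missing.append(name)
    return missing


# Placeholder check list; replace with the modules your app relies on
CHECK = ["json", "collections.abc", "definitely_not_a_real_module"]
print(find_missing(CHECK))  # → ['definitely_not_a_real_module']
```

The check imports modules by name at runtime, so PyInstaller won't bundle them on its behalf. That is exactly why it reliably reports what's missing from the bundle.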

Packaging complex Python applications takes a bit of patience, but once you have the right hooks in place, rebuilding is straightforward. The approach described here works for scikit-learn, but the same technique applies to any package that causes hidden import errors — collect_submodules() and custom hooks are your go-to tools.




Packaging Python Apps That Use scikit-learn with PyInstaller was written by Martin Fitzpatrick.

Martin Fitzpatrick has been developing Python/Qt apps for 8 years. When building desktop applications to make data-analysis tools more user-friendly, he found Python the obvious choice, starting with Tk, later moving to wxWidgets and finally adopting PyQt. Martin founded PythonGUIs to provide easy-to-follow GUI programming tutorials to the Python community. He has written a number of popular Python books on the subject.