Train, Serve, and Deploy a Scikit-learn Model with FastAPI

FastAPI has rapidly emerged as a leading framework for serving machine learning models, lauded for its lightweight nature, exceptional speed, and user-friendly interface. This popularity stems from its ability to transform trained models into robust APIs, facilitating seamless testing, sharing, and production deployment for a wide array of machine learning and AI applications. This comprehensive guide outlines the process of training a Scikit-learn model, integrating it with FastAPI for inference, and deploying the resulting API to FastAPI Cloud. We will navigate through project setup, model training on a sample dataset, building a functional FastAPI inference server, rigorous local testing, and finally, a streamlined cloud deployment.
Establishing the Project Foundation
The initial step in building a scalable machine learning API involves meticulous project organization. Creating a dedicated folder and establishing a clear directory structure from the outset ensures that training scripts, application code, and saved model artifacts are managed efficiently.
To commence, navigate to your terminal and execute the following commands:
mkdir sklearn-fastapi-app
cd sklearn-fastapi-app
mkdir app artifacts
touch app/__init__.py
This sequence of commands creates the root project directory sklearn-fastapi-app, a subdirectory named app to house the FastAPI application logic, and an artifacts folder to store the trained model. The __init__.py file is a standard Python convention that signifies the app directory as a package.
Once you add the remaining files over the course of this guide, your project’s directory structure will look like this:
sklearn-fastapi-app/
├── app/
│ ├── __init__.py
│ └── main.py
├── artifacts/
├── train.py
├── pyproject.toml
└── requirements.txt
The next crucial step is to define the project’s dependencies by creating a requirements.txt file. This file will list all the necessary Python packages:
fastapi[standard]
scikit-learn
joblib
numpy
uvicorn[standard]
These packages are fundamental to the project:
- fastapi[standard]: Provides the core framework for building the web API, including its asynchronous capabilities and automatic data validation. The [standard] extra installs recommended optional dependencies, including the fastapi CLI.
- scikit-learn: The cornerstone library for machine learning in Python, offering a wide array of algorithms and tools for model training and evaluation.
- joblib: An efficient library for saving and loading Python objects, particularly large NumPy arrays and Scikit-learn models, with faster disk I/O than standard pickle for such objects.
- numpy: Essential for numerical operations, especially the array-based data that machine learning models process.
- uvicorn[standard]: An ASGI server that runs FastAPI applications, known for its speed and efficiency in handling concurrent requests.
With the requirements.txt file populated, install these dependencies using pip:
pip install -r requirements.txt
This command ensures that all required libraries are downloaded and installed within your project’s environment, preparing it for the subsequent development phases.
Training the Machine Learning Model
The core of our machine learning API is the trained model. For this demonstration, we will train a simple classification model using the well-known breast cancer dataset readily available within Scikit-learn. This dataset is a standard benchmark for binary classification tasks.
Create a new Python file named train.py within your project’s root directory and populate it with the following code:
from pathlib import Path
import joblib
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
def main():
    """
    Trains a RandomForestClassifier on the breast cancer dataset, evaluates it,
    and saves the model and associated metadata.
    """
    print("Loading breast cancer dataset...")
    data = load_breast_cancer()
    X = data.data
    y = data.target
    print(f"Dataset loaded: {X.shape[0]} samples, {X.shape[1]} features.")

    # Split data into training and testing sets
    print("Splitting data into training and testing sets...")
    X_train, X_test, y_train, y_test = train_test_split(
        X,
        y,
        test_size=0.2,      # 20% for testing
        random_state=42,    # for reproducibility
        stratify=y,         # ensures class distribution is similar in train/test
    )
    print(f"Training set size: {X_train.shape[0]}, Testing set size: {X_test.shape[0]}")

    # Initialize and train the RandomForestClassifier
    print("Initializing and training RandomForestClassifier...")
    model = RandomForestClassifier(
        n_estimators=200,         # Number of trees in the forest
        random_state=42,          # Seed for the random number generator
        max_depth=10,             # Maximum depth of the trees (to limit overfitting)
        min_samples_split=5,      # Minimum samples required to split an internal node
        min_samples_leaf=3,       # Minimum samples required at a leaf node
        class_weight="balanced",  # Weights classes inversely to their frequencies
    )
    model.fit(X_train, y_train)
    print("Model training complete.")

    # Evaluate the model
    print("Evaluating model accuracy on the test set...")
    predictions = model.predict(X_test)
    accuracy = accuracy_score(y_test, predictions)
    print(f"Test accuracy: {accuracy:.4f}")

    # Prepare artifacts for saving
    artifact = {
        "model": model,
        "target_names": data.target_names.tolist(),    # Convert numpy array to list
        "feature_names": data.feature_names.tolist(),  # Convert numpy array to list
    }

    # Define output path and save the artifact
    output_path = Path("artifacts/breast_cancer_model.joblib")
    output_path.parent.mkdir(parents=True, exist_ok=True)  # Create directory if it doesn't exist
    joblib.dump(artifact, output_path)
    print(f"Model and artifacts saved to: {output_path}")


if __name__ == "__main__":
    main()
This script performs several critical tasks:
- Data Loading: It fetches the breast cancer dataset, separating the features (X) from the target labels (y).
- Data Splitting: The data is divided into training and testing sets using train_test_split. The stratify=y parameter is crucial here: it ensures that the proportion of malignant and benign samples is maintained in both subsets, which is vital for unbiased model evaluation. A random_state is set for reproducibility, so the split is the same each time the script runs.
- Model Initialization and Training: A RandomForestClassifier is instantiated with specific hyperparameters (n_estimators, max_depth, min_samples_split, min_samples_leaf, class_weight) chosen to balance performance and generalization, then trained on the training data.
- Model Evaluation: The trained model’s performance is assessed on the unseen test data via the accuracy score, providing an objective measure of how well the model is expected to perform on new, real-world data.
- Artifact Saving: The trained model, together with the target class names and feature names, is bundled into a dictionary, then serialized to breast_cancer_model.joblib in the artifacts directory using joblib.dump. Saving this metadata is essential for the API to correctly interpret predictions.
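The dump/load round trip can be sketched in isolation. The snippet below uses a stand-in dictionary rather than the real trained model, purely to show that joblib restores the artifact with its keys intact:

```python
import tempfile
from pathlib import Path

import joblib

# Minimal sketch of the same dump/load pattern train.py uses,
# with a stand-in payload instead of the real trained model.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "artifact.joblib"
    artifact = {"model": None, "target_names": ["malignant", "benign"]}
    joblib.dump(artifact, path)

    loaded = joblib.load(path)
    print(loaded["target_names"])  # -> ['malignant', 'benign']
```

The API server relies on exactly this round trip: whatever keys are dumped here are the keys it reads back at startup.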
To execute the training process, run the script from your terminal:
python train.py
Upon successful execution, you will see output similar to this:
Loading breast cancer dataset...
Dataset loaded: 569 samples, 30 features.
Splitting data into training and testing sets...
Training set size: 455, Testing set size: 114
Initializing and training RandomForestClassifier...
Model training complete.
Evaluating model accuracy on the test set...
Test accuracy: 0.9561
Model and artifacts saved to: artifacts/breast_cancer_model.joblib
This output confirms that the model has been trained, evaluated, and saved. The accuracy score of approximately 95.61% indicates a strong performance on the held-out test data.
Building the FastAPI Inference Server
With the trained model secured, the next logical step is to create a FastAPI application that can load this model and serve predictions via an API. This server will be responsible for receiving input data, feeding it to the model, and returning the prediction results.
Create a new file named app/main.py and insert the following code:
from pathlib import Path
import joblib
import numpy as np
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from typing import List, Dict, Any
# Define the path to the saved model artifact
ARTIFACT_PATH = Path("artifacts/breast_cancer_model.joblib")
# Initialize the FastAPI application with metadata
app = FastAPI(
    title="Breast Cancer Prediction API",
    version="1.0.0",
    description="A FastAPI server for serving a scikit-learn breast cancer classifier to predict malignancy.",
    contact={
        "name": "API Support",
        "url": "https://example.com/support",
        "email": "[email protected]",
    },
    license_info={
        "name": "MIT License",
        "url": "https://opensource.org/licenses/MIT",
    },
)
# Define the input data structure using Pydantic
class PredictionRequest(BaseModel):
    """
    Pydantic model for incoming prediction requests.
    Corresponds to the features of the breast cancer dataset.
    """
    mean_radius: float
    mean_texture: float
    mean_perimeter: float
    mean_area: float
    mean_smoothness: float
    mean_compactness: float
    mean_concavity: float
    mean_concave_points: float
    mean_symmetry: float
    mean_fractal_dimension: float
    radius_error: float
    texture_error: float
    perimeter_error: float
    area_error: float
    smoothness_error: float
    compactness_error: float
    concavity_error: float
    concave_points_error: float
    symmetry_error: float
    fractal_dimension_error: float
    worst_radius: float
    worst_texture: float
    worst_perimeter: float
    worst_area: float
    worst_smoothness: float
    worst_compactness: float
    worst_concavity: float
    worst_concave_points: float
    worst_symmetry: float
    worst_fractal_dimension: float
# Event handler to load the model upon application startup
@app.on_event("startup")
def load_model():
    """
    Loads the trained machine learning model and associated metadata
    from the artifact file when the FastAPI application starts.
    Raises an error if the artifact file is not found.
    """
    if not ARTIFACT_PATH.exists():
        raise RuntimeError(
            f"Model file not found at {ARTIFACT_PATH}. Please ensure 'train.py' has been run successfully."
        )
    try:
        artifact = joblib.load(ARTIFACT_PATH)
        app.state.model = artifact["model"]
        app.state.target_names = artifact["target_names"]
        app.state.feature_names = artifact["feature_names"]
        print(f"Model loaded successfully from {ARTIFACT_PATH}.")
        print(f"Target names: {app.state.target_names}")
        print(f"Feature names: {app.state.feature_names}")
    except Exception as e:
        raise RuntimeError(f"Error loading model artifact: {e}")

# Health check endpoint
@app.get("/health", tags=["Health Check"])
def health() -> Dict[str, str]:
    """
    Provides a simple health check for the API.
    Returns 'ok' status if the server is running.
    """
    return {"status": "ok"}
# Prediction endpoint
@app.post("/predict", response_model=Dict[str, Any], tags=["Predictions"])
def predict(request: PredictionRequest) -> Dict[str, Any]:
    """
    Accepts a JSON request with feature values and returns a breast cancer prediction.
    Includes the predicted class ID, label, and probabilities for each class.
    """
    try:
        # Convert the Pydantic model to a NumPy array.
        # The order of features must match the training data.
        features_list = [
            request.mean_radius, request.mean_texture, request.mean_perimeter,
            request.mean_area, request.mean_smoothness, request.mean_compactness,
            request.mean_concavity, request.mean_concave_points, request.mean_symmetry,
            request.mean_fractal_dimension, request.radius_error, request.texture_error,
            request.perimeter_error, request.area_error, request.smoothness_error,
            request.compactness_error, request.concavity_error, request.concave_points_error,
            request.symmetry_error, request.fractal_dimension_error, request.worst_radius,
            request.worst_texture, request.worst_perimeter, request.worst_area,
            request.worst_smoothness, request.worst_compactness, request.worst_concavity,
            request.worst_concave_points, request.worst_symmetry, request.worst_fractal_dimension,
        ]
        features = np.array([features_list])

        # Retrieve model and target names from application state
        model = app.state.model
        target_names = app.state.target_names

        # Make prediction
        prediction_id = int(model.predict(features)[0])
        probabilities = model.predict_proba(features)[0]

        # Format the response
        response = {
            "prediction_id": prediction_id,
            "prediction_label": target_names[prediction_id],
            "probabilities": {
                target_names[i]: float(round(probabilities[i], 6))
                for i in range(len(target_names))
            },
        }
        return response
    except Exception as e:
        # Log the error for debugging purposes
        print(f"An error occurred during prediction: {e}")
        raise HTTPException(status_code=500, detail=f"Internal server error: {e}")
# Optional: Add documentation for the /predict endpoint if not handled by FastAPI's default
# Example:
# @app.get("/docs", include_in_schema=False)
# async def custom_swagger_ui_html():
# from fastapi.openapi.docs import get_swagger_ui_html
# return get_swagger_ui_html(openapi_url=app.openapi_url, title=app.title + " - Swagger UI")
This FastAPI application is structured to be efficient and robust:
- Pydantic Model (PredictionRequest): Defines the expected structure and data types for incoming requests to the /predict endpoint, so the API validates data automatically. Its fields correspond directly to the 30 features of the breast cancer dataset.
- Startup Event (load_model): The @app.on_event("startup") decorator ensures load_model runs automatically when the FastAPI server starts. This is a critical optimization: the trained model is read from disk once, rather than on every prediction request. The loaded model and metadata are stored in app.state, making them accessible throughout the application, and a check confirms the model artifact exists before attempting to load it.
- Health Check Endpoint (/health): A simple GET endpoint that returns a status message. This is invaluable for monitoring the API’s availability and ensuring it is running as expected.
- Prediction Endpoint (/predict): This POST endpoint is the core of the API. It accepts a PredictionRequest object (validated by Pydantic), converts the incoming features into a NumPy array with the shape and order the Scikit-learn model requires, and retrieves the loaded model and target names from app.state. model.predict() produces the predicted class ID, and model.predict_proba() the probability distribution across all classes. The response is a JSON object containing the class ID, its human-readable label (e.g., ‘malignant’ or ‘benign’), and per-class probabilities rounded for clarity. A try...except block catches any exception during prediction and returns a meaningful HTTPException to the client.
Testing the Model Inference Server Locally
Before deploying the API to the cloud, it’s essential to test its functionality thoroughly in a local environment. FastAPI simplifies this process by automatically generating interactive API documentation and providing a built-in development server.
To start the FastAPI server, use the uvicorn command in your terminal:

uvicorn app.main:app --reload
The --reload flag is particularly useful during development, as it automatically restarts the server whenever you make changes to your code.
Once the server is running, it will typically be accessible at http://127.0.0.1:8000. FastAPI automatically serves interactive API documentation at the /docs endpoint. Open your web browser and navigate to:
http://127.0.0.1:8000/docs
This Swagger UI interface provides an interactive playground for your API. You can expand the /predict endpoint, click "Try it out," and input sample feature values. Then, click "Execute" to send a request to your local server and view the response.
For automated testing, you can use curl from your terminal. This command sends a POST request with a sample set of features to the /predict endpoint:
curl -X POST "http://127.0.0.1:8000/predict" \
  -H "Content-Type: application/json" \
  -d '{
    "mean_radius": 17.99,
    "mean_texture": 10.38,
    "mean_perimeter": 122.8,
    "mean_area": 1001.0,
    "mean_smoothness": 0.1184,
    "mean_compactness": 0.2776,
    "mean_concavity": 0.3001,
    "mean_concave_points": 0.1471,
    "mean_symmetry": 0.2419,
    "mean_fractal_dimension": 0.07871,
    "radius_error": 1.095,
    "texture_error": 0.9053,
    "perimeter_error": 8.589,
    "area_error": 153.4,
    "smoothness_error": 0.006399,
    "compactness_error": 0.04904,
    "concavity_error": 0.05373,
    "concave_points_error": 0.01587,
    "symmetry_error": 0.03003,
    "fractal_dimension_error": 0.006193,
    "worst_radius": 25.38,
    "worst_texture": 17.33,
    "worst_perimeter": 184.6,
    "worst_area": 2019.0,
    "worst_smoothness": 0.1622,
    "worst_compactness": 0.6656,
    "worst_concavity": 0.7119,
    "worst_concave_points": 0.2654,
    "worst_symmetry": 0.4601,
    "worst_fractal_dimension": 0.1189
  }'
A successful curl command will return a JSON response similar to this:
{
  "prediction_id": 0,
  "prediction_label": "malignant",
  "probabilities": {
    "malignant": 0.998,
    "benign": 0.002
  }
}
This output confirms that the local inference server is functioning correctly and ready for deployment.
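Typing out all 30 fields by hand is error-prone. One way to build a valid payload programmatically is to derive the keys from Scikit-learn itself; this sketch assumes, as in the Pydantic model above, that the field names are the dataset’s feature names with spaces replaced by underscores:

```python
from sklearn.datasets import load_breast_cancer

# Derive the payload keys from the dataset's feature names; the Pydantic
# model uses underscores where sklearn's names use spaces.
data = load_breast_cancer()
keys = [name.replace(" ", "_") for name in data.feature_names]

# Pair the keys with the first sample in the dataset to form a request body.
payload = dict(zip(keys, data.data[0].tolist()))
print(len(payload), keys[0], keys[-1])  # -> 30 mean_radius worst_fractal_dimension
```

The resulting dictionary can be serialized with json.dumps(payload) and sent with curl or any HTTP client; the first sample happens to match the values used in the curl example above.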
Deploying the API to FastAPI Cloud
The final stage of our workflow is deploying the FastAPI application to the cloud, making it accessible globally. FastAPI Cloud offers a streamlined deployment process directly from the command line.
First, stop the local development server by pressing CTRL + C in your terminal.
To deploy, you’ll need to log in to your FastAPI Cloud account. If you don’t have one, you can create one through their website. Then, use the CLI command:
fastapi login
Follow the prompts to authenticate your account.
Once logged in, you can deploy your application with a single command:
fastapi deploy
During the initial deployment, the FastAPI CLI will guide you through the setup process. This includes selecting or creating a team to associate with the application and choosing whether to create a new application instance or link to an existing one. The CLI handles packaging your project, installing dependencies in the cloud environment, deploying the application, and verifying its successful launch. For subsequent deployments, a .fastapicloud directory will be created in your project, simplifying the process.
A successful deployment will conclude with a message similar to this, indicating your application’s live URL:
✅ Ready the chicken! 🐔 Your app is ready at https://sklearn-fastapi-app.fastapicloud.dev
You can then access the deployed API’s interactive documentation by navigating to the provided URL followed by /docs. To test the deployed API from your terminal, simply replace the local URL (http://127.0.0.1:8000) with your cloud URL in the curl command used previously.
For ongoing monitoring, the FastAPI Cloud dashboard provides access to build logs, startup behavior, and runtime issue tracking, offering valuable insights into your deployed application’s performance and health.
Future Considerations for Production Readiness
Having successfully deployed a Scikit-learn model via FastAPI to the cloud, you have established a robust end-to-end workflow. To elevate this deployment to a true production-grade system, several critical aspects require attention. These include implementing robust security measures, comprehensive testing strategies, continuous monitoring, and ensuring the API can reliably handle real-world traffic volumes at scale.
Key areas for further development include:
- Security Enhancements: Implementing authentication and authorization mechanisms (e.g., API keys, OAuth2) to control access to your API.
- Advanced Testing: Developing unit tests for individual components and integration tests for the API endpoints to ensure reliability and catch regressions.
- Monitoring and Alerting: Setting up detailed logging and integrating with monitoring services to track performance metrics, error rates, and system health, with alerts configured for critical events.
- Scalability and Performance Optimization: Exploring strategies like containerization (e.g., Docker), load balancing, and database optimizations to ensure the API can scale effectively to meet demand.
- CI/CD Integration: Automating the build, test, and deployment pipeline using Continuous Integration and Continuous Deployment tools for faster and more reliable releases.
Addressing these considerations is paramount for transforming a functional deployed API into one that operates with resilience and efficiency in a demanding production environment.