Best Machine Learning Toolkits for Python in 2021




There are numerous machine learning toolkits in Python, each with its own set of advantages and disadvantages. The best way to learn a new toolkit is to choose one and dive in, as each toolkit teaches concepts that transfer to other machine learning libraries. When choosing a machine learning toolkit, there is no one-size-fits-all solution. Instead, the best toolkit depends on the developer’s experience with machine learning, their need for performance optimization, and the problems they wish to solve.

Here, we’ll discuss four of the most popular machine learning toolkits for Python:

  • TensorFlow
  • Keras
  • PyTorch
  • Scikit-learn

To provide a comparison between these different toolkits, we will demonstrate training a neural network on the Iris dataset, a very simple dataset that is popular in the machine learning space.

A neural network is a machine learning algorithm that mirrors the biological structure of the brain. A neural network is composed of layers of neurons including an input layer, an output layer, and at least one hidden layer. Inputs enter the algorithm through an input layer and pass through neurons in the hidden layers until they reach an output layer. Connections between neurons are assigned a weight that indicates the strength of the connection.
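To make the layer structure concrete, here is a minimal NumPy sketch of one forward pass through a network with a single hidden layer. The layer sizes, the random weights, and the sample values are illustrative assumptions and do not come from any of the toolkits discussed below.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumed): 4 input features, 5 hidden neurons, 3 output classes
n_inputs, n_hidden, n_outputs = 4, 5, 3

# Every connection between two layers has a weight; every neuron has a bias
w_hidden = rng.normal(size=(n_inputs, n_hidden))
b_hidden = np.zeros(n_hidden)
w_output = rng.normal(size=(n_hidden, n_outputs))
b_output = np.zeros(n_outputs)

def forward(sample):
    # Input layer -> hidden layer, followed by a ReLU activation
    hidden = np.maximum(0, sample @ w_hidden + b_hidden)
    # Hidden layer -> output layer (one score per class)
    return hidden @ w_output + b_output

sample = np.array([5.1, 3.5, 1.4, 0.2])  # one flower described by four measurements
scores = forward(sample)
print(scores.shape)
```

Training, described next, is the process of adjusting the weights and biases so that the highest score lands on the correct class.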

Training a model means feeding data to an algorithm so that it learns parameter values that optimize a chosen performance metric. When training a neural network, the algorithm passes over the complete training set repeatedly to determine the appropriate weights for the neural connections. Each complete pass over the dataset is called an epoch.
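The idea of repeated passes can be sketched without any toolkit. The example below fits a single weight to toy data with plain gradient descent. The data, learning rate, and epoch count are made-up values chosen only to show the structure of the loop.

```python
import numpy as np

# Toy data (assumed): the target is exactly 2 * input
inputs = np.array([1.0, 2.0, 3.0, 4.0])
targets = 2.0 * inputs

weight = 0.0
learning_rate = 0.01
training_epochs = 50
losses = []

for epoch in range(training_epochs):
    # One epoch is one complete pass over every training example
    for x_i, y_i in zip(inputs, targets):
        error = weight * x_i - y_i
        # Gradient of the squared error (error ** 2) with respect to the weight
        weight -= learning_rate * 2 * error * x_i
    # Record the mean squared error at the end of the epoch
    losses.append(float(np.mean((weight * inputs - targets) ** 2)))

print('first epoch loss:', losses[0], 'last epoch loss:', losses[-1])
```

The loss recorded after the last epoch is far smaller than after the first, which is the behavior the per-epoch output in the later examples is meant to reveal.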

Using the Iris dataset, we’ll be using supervised learning to train neural network models with labeled data. After the training is complete, we can use the output model to predict classification values for the irises in the testing set.

Data setup

Before comparing the different machine learning approaches, we need to set up the data. After downloading the Iris dataset to the current directory as iris.csv, we can read in the data file:

import pandas

df = pandas.read_csv('iris.csv')

Now the data must be split into input and output data. To be used in model training, the output labels must be one-hot encoded as an array:

from sklearn.preprocessing import OneHotEncoder

encoder = OneHotEncoder()
x = df.loc[:, df.columns != 'species']
# Double brackets keep the species column two-dimensional, as the encoder expects
y = encoder.fit_transform(df[['species']]).toarray()
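To see what the encoding produces, here is a small NumPy sketch that builds the same kind of array by hand. The four labels are illustrative; the labels in the downloaded CSV file may be spelled differently.

```python
import numpy as np

species = np.array(['setosa', 'versicolor', 'virginica', 'setosa'])

# Sorted unique labels and, for each row, the index of its label
classes, class_index = np.unique(species, return_inverse=True)

# Row i of the identity matrix is the one-hot vector for class i
one_hot = np.eye(len(classes))[class_index]
print(classes)
print(one_hot)
```

Each row contains a single 1 in the column of its class, so the number of columns equals the number of Iris types.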

Although there are more sophisticated methods of preparing a dataset for training, we will randomly divide the dataset so that 70% of the data is in the training set and 30% is in the testing set. These sets will be used to train and test all models:

import numpy

# Set up a random seed for reproducing the data split (the value itself is arbitrary)
numpy.random.seed(0)

# Divide the dataset into a training and testing set based on the random numbers generated
msk = numpy.random.rand(len(df)) < 0.7
x_train = x[msk]
y_train = y[msk]

x_test = x[~msk]
y_test = y[~msk]
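Note that a random mask gives an approximate split rather than an exact one, because each row is assigned independently. The sketch below counts the rows on each side of the mask. The seed value is an arbitrary assumption, and 150 is the number of rows in the standard Iris dataset.

```python
import numpy as np

np.random.seed(0)  # arbitrary seed, assumed for illustration
n_rows = 150       # the standard Iris dataset has 150 rows

# Each row independently has a 70% chance of landing in the training set
mask = np.random.rand(n_rows) < 0.7
n_train = int(mask.sum())
n_test = int((~mask).sum())
print(n_train, n_test)
```

If an exact 70/30 split matters, scikit-learn's train_test_split function is an alternative to the mask.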

Before training neural networks with this data, we should also normalize the data set so that each feature has zero mean and unit variance. This step helps to ensure that neural network models converge. Convergence occurs when the model’s loss function approaches a stable value.

To normalize the data, we first calculate the mean and standard deviation required to scale the data set. To prevent information leakage into the testing set, this scaling factor should be calculated using only the training data.

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
# Learn the mean and standard deviation from the training data only
scaler.fit(x_train)

Now that we have the scaling factor, we can apply this transformation to both the training and testing data sets:

x_train = scaler.transform(x_train)
x_test = scaler.transform(x_test)
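The transformation itself is simple arithmetic. As a rough sketch of what the scaler does, the NumPy example below standardizes a tiny made-up data set by hand, taking the statistics from the training rows only.

```python
import numpy as np

# Made-up values for illustration: two features on very different scales
train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
test = np.array([[2.0, 40.0]])

# The statistics come from the training rows only
mean = train.mean(axis=0)
std = train.std(axis=0)

# The same mean and standard deviation are applied to both sets
train_scaled = (train - mean) / std
test_scaled = (test - mean) / std

print(train_scaled.mean(axis=0), train_scaled.std(axis=0))
print(test_scaled)
```

The scaled training columns have zero mean and unit variance. The test rows generally will not, which is expected because they played no part in computing the statistics.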

We can also go ahead and define some constants that we’ll be using across a variety of models:

# Number of features
input_dim = x.shape[1]

# Number of Iris types
output_dim = y.shape[1]

# Determines how quickly the model adjusts its weights to adapt to a problem
learning_rate = 0.1

# Determines the number of times the algorithm works through the entire training set
training_epochs = 50
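The effect of the learning rate is easiest to see on a toy problem. The sketch below runs gradient descent on f(w) = w ** 2, whose minimum is at 0. The function and the three rates are assumptions chosen for illustration and are unrelated to the Iris models.

```python
def minimize(learning_rate, steps=50, start=1.0):
    # Gradient descent on f(w) = w ** 2, whose gradient is 2 * w
    w = start
    for _ in range(steps):
        w -= learning_rate * 2 * w
    return w

small = minimize(0.001)   # moves slowly and is still far from the minimum
moderate = minimize(0.1)  # ends up close to the minimum
large = minimize(1.1)     # overshoots on every step and moves away from the minimum
print(small, moderate, large)
```

A rate that is too small wastes epochs, while a rate that is too large prevents convergence, so this constant is usually tuned by experiment.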

Best machine learning toolkits for Python


TensorFlow

TensorFlow is an open-source machine learning library developed by Google. Although it provides support for a wide variety of machine learning algorithms, the library primarily focuses on deep learning models. TensorFlow is a low-level API that provides many opportunities for model optimization once users are familiar with its capabilities. Additionally, it’s compatible with Keras, a high-level API that makes model development simpler. Due to TensorFlow’s power and customizability, it is one of the most popular options for deep learning and is the industry standard for deployed machine learning solutions.

Because of its complexity, however, TensorFlow has a steeper learning curve than other machine learning toolkits and provides limited functionality for quick prototyping. Additionally, TensorFlow models are difficult to debug since the models generally cannot be stepped through.

TensorFlow provides a great option for users looking for a high-performance machine learning toolkit that provides advanced functionality and that is easily compatible with a high-level machine learning API.

TensorFlow is most often used in conjunction with Keras, but there are a myriad of ways to create deep learning models using TensorFlow alone. In this example, we’ll focus on using Eager Execution to build a neural network from scratch.

Now, we need to design our model. For simplicity, we’ll limit this model to an input layer, one hidden layer, and an output layer.

import tensorflow as tf

class Model(object):
    def __init__(self, input_dim, output_dim, hidden_nodes):
        self.input_dim = input_dim
        self.output_dim = output_dim
        self.hidden_nodes = hidden_nodes

        # Weights between layers
        # weights[0]: weights between input layer and hidden layer
        # weights[1]: weights between hidden layer and output layer
        self.weights = [tf.Variable(tf.random.normal([self.input_dim, self.hidden_nodes])),
                        tf.Variable(tf.random.normal([self.hidden_nodes, self.output_dim]))]

        # Biases for each layer
        # biases[0]: bias for hidden layer
        # biases[1]: bias for output layer
        self.biases = [tf.Variable(tf.random.normal([1, self.hidden_nodes])),
                       tf.Variable(tf.random.normal([1, self.output_dim]))]

        # Create the optimizer once so that Adam keeps its running statistics between updates
        self.optimizer = tf.optimizers.Adam(learning_rate)

        # Variables for calculating accuracy
        self.total_correct = 0
        self.total_obs = 0

    def variables(self):
        return self.weights + self.biases

    def accuracy(self):
        if self.total_obs <= 0:
            return -1

        return self.total_correct / self.total_obs

    def forward(self, x):
        # Wrap the single observation in a list to get a tensor of shape (1, input_dim)
        x_tf = tf.cast([x], dtype=tf.float32)
        a = tf.matmul(x_tf, self.weights[0]) + self.biases[0]
        hidden_layer = tf.nn.relu(a)
        return tf.matmul(hidden_layer, self.weights[1]) + self.biases[1]

    def backward(self, x_train, y_train):
        # Record the forward pass so that gradients can be computed from the loss
        with tf.GradientTape() as tape:
            predicted = self.forward(x_train)
            current_loss = self.loss(predicted, y_train)
        # variables is a method, so it must be called to get the list of tensors
        grads = tape.gradient(current_loss, self.variables())
        self.optimizer.apply_gradients(zip(grads, self.variables()))

    def loss(self, y_pred, y_true):
        y_true_tensor = tf.cast(tf.reshape(y_true, (-1, self.output_dim)), dtype=tf.float32)
        y_pred_tensor = tf.cast(y_pred, dtype=tf.float32)
        return tf.losses.mean_squared_error(y_true_tensor, y_pred_tensor)

    def reset_accuracy(self):
        self.total_correct = 0
        self.total_obs = 0

    def update_accuracy(self, y_pred, y_true):
        self.total_obs += 1
        true_index = numpy.argmax(y_true)
        # For predicted values, the predicted classification is determined by the index
        # with the highest value. The prediction has shape (1, output_dim), so take element 0.
        pred_index = numpy.argmax(y_pred.numpy(), axis=1)[0]

        # If actual and predicted values are the same, the classification is correct
        if int(true_index) == int(pred_index):
            self.total_correct += 1

We can instantiate our model:

hidden_nodes = 20
model = Model(input_dim, output_dim, hidden_nodes)

Now we can train our model, viewing the mean squared error and accuracy for each epoch as the model converges over time:

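for epoch in range(training_epochs):
    # Start each epoch with fresh counters so the accuracy reflects this epoch only
    model.reset_accuracy()
    loss_total = tf.Variable(0, dtype=tf.float32)

    # Distinct loop names keep the x and y datasets defined earlier from being overwritten
    for x_row, y_row in zip(x_train, y_train):
        preds = model.forward(x_row)
        loss_total = loss_total + model.loss(preds, y_row)
        model.update_accuracy(preds, y_row)
        model.backward(x_row, y_row)

    mse = loss_total.numpy() / x_train.shape[0]
    print('Epoch {} - MSE: {:.3f}, Accuracy: {:.3f}'.format(epoch + 1, mse[0], model.accuracy()))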

For the last epoch, our model has a mean squared error of 0.10 and an accuracy of 0.90 on the training set. This performance looks good, so we can try validating our model on the testing set:

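# Reset the counters so that training observations are not counted in the test accuracy
model.reset_accuracy()
loss_total = tf.Variable(0, dtype=tf.float32)

for x_row, y_row in zip(x_test, y_test):
    preds = model.forward(x_row)
    loss_total = loss_total + model.loss(preds, y_row)
    model.update_accuracy(preds, y_row)

# Average over the number of testing observations, not training observations
mse = loss_total.numpy() / x_test.shape[0]
print('Test - MSE: {:.3f}, Accuracy: {:.3f}'.format(mse[0], model.accuracy()))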

On the testing set, our model has a mean squared error of 0.11 and an accuracy of 0.69. While model performance is generally lower on a testing set, this drop in performance suggests that the structure of this model can be improved. TensorFlow provides many opportunities for optimization but doesn’t necessarily give strong results right out of the box.
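Mean squared error and accuracy measure different things, which is why a low error can coexist with a noticeably lower accuracy. The NumPy sketch below computes both for a handful of made-up predictions. The evaluate helper and the sample values are hypothetical and are not part of TensorFlow.

```python
import numpy as np

def evaluate(y_pred, y_true):
    # y_pred holds one score per class and y_true holds one-hot labels,
    # both shaped (rows, classes)
    mse = float(np.mean((y_pred - y_true) ** 2))
    # A row counts as correct when the highest score sits on the true class
    correct = np.argmax(y_pred, axis=1) == np.argmax(y_true, axis=1)
    return mse, float(np.mean(correct))

y_true = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 1, 0]], dtype=float)
y_pred = np.array([[0.9, 0.1, 0.0],
                   [0.2, 0.7, 0.1],
                   [0.3, 0.4, 0.3],   # highest score is on the wrong class
                   [0.1, 0.8, 0.1]])
mse, accuracy = evaluate(y_pred, y_true)
print(mse, accuracy)
```

Here one wrong row out of four lowers the accuracy to 0.75 while the mean squared error stays near 0.08, because the error averages over every score rather than counting whole rows.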


Keras

For developers interested in a higher-level alternative to TensorFlow, Keras is a great option. Also developed by a Google engineer, Keras is built on top of TensorFlow and incorporated into the TensorFlow API. Compared to TensorFlow, Keras is much more beginner-friendly, and it provides an easy-to-use interface for fast prototyping and experimentation.

Because Keras provides a higher-level interface, it provides fewer opportunities for optimization than TensorFlow, resulting in slower training times. However, the time the user loses in training is often made up for by the speed of prototyping, especially when working with smaller datasets. Although its debugging capabilities are as limited as TensorFlow’s, the simplicity of the implementations means that bugs are few and far between.

Keras is a great option for users looking to perform rapid prototyping with small datasets. Users new to machine learning may find it useful to start with Keras before eventually using the core TensorFlow library as needed.

Keras provides two core model types: a sequential model for models that require only one input and one output, and a functional API for more complex interactions. Rather than defining our own model class as we did with TensorFlow, we can use one of these pre-built models. For training on the Iris dataset, a sequential model is sufficient:

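from tensorflow.keras import Input
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Sequential

# The number of nodes sets the width of each hidden layer. The best value for a given dataset can be determined through experimentation.
nodes = 9

input_dim = x_train.shape[1] # number of features
output_dim = y_train.shape[1] # number of iris types

models = []

# First, we need to define the models. Let's try a few different models with 1 to 3 hidden layers.
for num_layers in range(1, 4):
    model = Sequential()
    model.add(Input(shape=(input_dim,)))

    for _ in range(num_layers):
        # Add a new Dense hidden layer to the model
        model.add(Dense(nodes, activation='relu'))

    # Add an output layer to the model
    model.add(Dense(output_dim, activation='softmax'))

    # Using the compile function, we must set the factors that will be used to train the model.
    # Assumption: the Adam optimizer and categorical cross-entropy loss are common
    # choices for one-hot labels; other settings may work equally well.
    model.compile(optimizer='adam',
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])
    models.append(model)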

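Once we’ve defined the models, we can then train each model and test its accuracy on the testing data:

for model in models:
    # Assumption: reuse the shared epoch count defined in the data setup
    model.fit(x_train, y_train,
              epochs=training_epochs,
              validation_data=(x_test, y_test))
    score = model.evaluate(x_test, y_test, verbose=0)

    # score[0] is the loss and score[1] is the accuracy metric set in compile
    print('Accuracy: ', score[1])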

Although the values will vary on repeat executions, we found that the 1-layer model had an accuracy of 0.87, the 2-layer model had an accuracy of 0.90, and the 3-layer model had an accuracy of 0.98. Using Keras, we were able to create a high-performing model with minimal effort.


PyTorch

PyTorch is an open-source deep learning library developed by Facebook. It is based on Torch, a machine learning library developed in Lua. Like TensorFlow, PyTorch is a low-level API focused on deep learning. It is popular for natural language processing and is used by Facebook’s artificial intelligence group.

It is a relative newcomer to the machine learning space and has a relatively small community compared to Keras and TensorFlow, but it is rapidly growing in popularity. It provides fast performance and a variety of optimization options, but it is relatively difficult to implement and does not provide a high-level API. PyTorch is especially difficult for beginners since developers must dive straight into low-level deep learning implementations.

What PyTorch lacks in a simple interface, however, it makes up for with a robust debugging experience. PyTorch is structured in a way that feels native to Python, and it allows users to step through models in a way that Keras and TensorFlow don’t permit.

PyTorch is a great option for users with previous experience with machine learning who are looking for a deep learning toolkit with strong debugging capabilities.

Like TensorFlow, PyTorch requires the user to build models from tensors. While its lower-level interface is more intuitive than TensorFlow’s, it does not provide a high-level API comparable to Keras. There are numerous tutorials demonstrating how to implement a neural network in PyTorch using the Iris dataset.
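For a sense of what such an implementation looks like, here is a minimal sketch built from the torch.nn building blocks. It trains on synthetic stand-in data rather than the Iris file, and the layer sizes, learning rate, and epoch count are assumptions chosen for illustration, not a tuned configuration.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Synthetic stand-in for the Iris data (assumed): 90 rows, 4 features, 3 classes
features = torch.randn(90, 4)
labels = torch.argmax(features[:, :3], dim=1)  # class index for each row

model = nn.Sequential(
    nn.Linear(4, 20),   # input layer -> hidden layer
    nn.ReLU(),
    nn.Linear(20, 3),   # hidden layer -> output layer
)
loss_fn = nn.CrossEntropyLoss()  # expects raw scores and class indices
optimizer = torch.optim.Adam(model.parameters(), lr=0.05)

losses = []
for epoch in range(100):
    optimizer.zero_grad()                     # clear gradients from the previous step
    loss = loss_fn(model(features), labels)   # forward pass over the full data set
    loss.backward()                           # compute gradients
    optimizer.step()                          # update the weights
    losses.append(loss.item())

# Evaluate without tracking gradients
with torch.no_grad():
    accuracy = (model(features).argmax(dim=1) == labels).float().mean().item()
print('final loss: {:.3f}, training accuracy: {:.3f}'.format(losses[-1], accuracy))
```

Every line in the training loop runs immediately as ordinary Python, so a debugger breakpoint can be placed anywhere inside it to inspect the tensors.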


Scikit-learn

Unlike the previous libraries, scikit-learn is native to Python. This library focuses on traditional machine learning models like random forests, decision trees, and support vector machines, rather than deep learning techniques. This library is effective for solving simple modeling and classification problems using small data sets.

It provides a purely high-level API with few options for customization. While not a good fit for difficult deep learning problems, scikit-learn can be a great way to learn the basics of various machine learning techniques without getting caught up in the details. The functionality of scikit-learn is generally limited to defining a model and its parameters, training that model, and then predicting results on a test set. Scikit-learn also provides a simple API for creating machine learning visualizations.

Scikit-learn is a great option for users completely new to machine learning who are interested in exploring the differences between different algorithms.

For this model, we will use scikit-learn’s multilayer perceptron. In general, this model defaults to reasonable values and does not require much customization to create a working model. The full list of optional arguments is available in the documentation.

from sklearn.neural_network import MLPClassifier

# Create a model. Each entry in hidden_layer_sizes is the number of neurons in one hidden layer,
# so this network has three hidden layers with 3, 20, and 100 neurons.
mlp = MLPClassifier(hidden_layer_sizes=(3, 20, 100))

# Train that model on the training data set
mlp.fit(x_train, y_train)

# Now we can verify the accuracy of the model on the training data set
train_accuracy = mlp.score(x_train, y_train)

# And test the accuracy of the model on the testing data set
test_accuracy = mlp.score(x_test, y_test)

print('Train accuracy: ', train_accuracy)
print('Test accuracy: ', test_accuracy)

This model achieved an accuracy of 0.99 on the training set and 0.98 on the testing set. As with Keras, we were able to create a high-performing model with minimal effort.
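A tuple passed to hidden_layer_sizes builds one network with those layers. Comparing several architectures takes a separate search step. One way to do this, sketched below, is scikit-learn's GridSearchCV. The candidate sizes, the iteration limit, and the use of the bundled load_iris data are assumptions made to keep the example self-contained.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

iris = load_iris()

# Candidate architectures (assumed): each tuple lists the hidden layer sizes of one model
candidates = [(3,), (20,), (100,)]

# Scaling inside the pipeline keeps each validation fold out of the scaler's statistics
pipeline = make_pipeline(StandardScaler(),
                         MLPClassifier(max_iter=1000, random_state=0))
search = GridSearchCV(pipeline,
                      param_grid={'mlpclassifier__hidden_layer_sizes': candidates},
                      cv=3)
search.fit(iris.data, iris.target)

print(search.best_params_, search.best_score_)
```

The search trains every candidate with three-fold cross-validation and reports the architecture with the best average validation accuracy.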


Each of the above toolkits provides useful frameworks for creating machine learning models in Python. The best machine learning toolkit depends on the background and intent of the developer using it.

TensorFlow is best for optimizing deep learning algorithms in deployed models, Keras is best for quick prototypes of deep learning algorithms, PyTorch is best for experienced developers looking for a more robust debugging environment, and scikit-learn is best for quick exploration of simple machine learning algorithms. For both beginner and advanced users, Keras and TensorFlow form an especially powerful combination, as they provide both low-level control and fast prototyping.
