Pandas.DataFrame.sort_index | Python | Initial Commit

Introduction

Pandas is a powerful Python library for manipulating and analyzing data. It provides many methods for viewing and reshaping data, one of which is the DataFrame.sort_index() method.

What is sort index?

sort_index() is a method of the pandas DataFrame object. It sorts a dataframe by its row or column labels rather than by the data it contains. By default it sorts the rows, but you can tell it to sort the columns instead.

What does sort_index do in pandas?

sort_index() applies a sorting algorithm to the labels of the dataframe and returns a new, sorted dataframe. By default it uses the quicksort algorithm. You can also sort the dataframe in place, modifying the original, by setting the keyword argument inplace to True like so: df.sort_index(inplace=True). In that case the method returns None.
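As a minimal sketch of the two call styles (the three-row frame here is made up purely for illustration), the default call returns a new dataframe and leaves the original untouched, while inplace=True reorders the original and returns None:

```python
import pandas as pd

# Hypothetical three-row frame, used only for illustration.
df = pd.DataFrame({"value": [10, 20, 30]}, index=[3, 1, 2])

# Default call: returns a new, sorted dataframe; df itself is unchanged.
sorted_df = df.sort_index()
original_labels = df.index.tolist()
print(sorted_df.index.tolist())  # [1, 2, 3]
print(original_labels)           # [3, 1, 2]

# inplace=True: df is reordered in place and the call returns None.
result = df.sort_index(inplace=True)
print(result)                    # None
print(df.index.tolist())         # [1, 2, 3]
```

Because the in-place call returns None, assigning its result back to df (df = df.sort_index(inplace=True)) would discard the dataframe.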

How do I reorder index in pandas?

The sort_index() method lets you sort the rows of a pandas dataframe by their labels. Let's take a look at an example. First we import the pandas library:

>>> import pandas as pd

Next, we generate a sample dataframe df:

>>> data = [[5.09, 1230, 6260, 0.82, 0.68, 0.012],
...         [3.89,  970, 5860, 0.64, 0.68, 0.012],
...         [6.49, 1010, 5860, 0.64, 0.68, 0.012],
...         [4.79, 1070, 6360, 0.82, 0.68, 0.012],
...         [5.09, 1110, 6460, 0.91, 0.68, 0.012],
...         [4.79,  830, 5860, 0.64, 0.68, 0.012],
...         [4.89, 1090, 6360, 0.82, 0.68, 0.012],
...         [4.69, 1110, 5860, 0.64, 0.68, 0.012],
...         [4.99, 1210, 6460, 0.82, 0.68, 0.012],
...         [4.29, 1030, 6160, 0.82, 0.68, 0.012]]
>>> df = pd.DataFrame(data, columns=["M","L","T","G","X","Z"], index=[429, 1543, 1441, 775, 300, 1136, 725, 1282, 1055, 393])
>>> df
       M     L      T      G      X     Z
429   5.09  1230  6260  0.82  0.68  0.012
1543  3.89   970  5860  0.64  0.68  0.012
1441  6.49  1010  5860  0.64  0.68  0.012
775   4.79  1070  6360  0.82  0.68  0.012
300   5.09  1110  6460  0.91  0.68  0.012
1136  4.79   830  5860  0.64  0.68  0.012
725   4.89  1090  6360  0.82  0.68  0.012
1282  4.69  1110  5860  0.64  0.68  0.012
1055  4.99  1210  6460  0.82  0.68  0.012
393   4.29  1030  6160  0.82  0.68  0.012

Since the method sorts the rows in ascending order by default, we can just call sort_index() on the dataframe to sort by row label:

>>> df.sort_index()
         M      L       T        G       X      Z
300 	5.09 	1110 	6460 	0.91 	0.68 	0.012
393 	4.29 	1030 	6160 	0.82 	0.68 	0.012
429 	5.09 	1230 	6260 	0.82 	0.68 	0.012
725 	4.89 	1090 	6360 	0.82 	0.68 	0.012
775 	4.79 	1070 	6360 	0.82 	0.68 	0.012
1055 	4.99 	1210 	6460 	0.82 	0.68 	0.012
1136 	4.79 	830 	5860 	0.64 	0.68 	0.012
1282 	4.69 	1110 	5860 	0.64 	0.68 	0.012
1441 	6.49 	1010 	5860 	0.64 	0.68 	0.012
1543 	3.89 	970 	5860 	0.64 	0.68 	0.012

We can also choose the sort direction with the ascending keyword argument. True (the default) sorts in ascending order, and False sorts in descending order:

>>> df.sort_index(ascending=False)
         M      L       T        G       X      Z
1543 	3.89 	970 	5860 	0.64 	0.68 	0.012
1441 	6.49 	1010 	5860 	0.64 	0.68 	0.012
1282 	4.69 	1110 	5860 	0.64 	0.68 	0.012
1136 	4.79 	830 	5860 	0.64 	0.68 	0.012
1055 	4.99 	1210 	6460 	0.82 	0.68 	0.012
775 	4.79 	1070 	6360 	0.82 	0.68 	0.012
725 	4.89 	1090 	6360 	0.82 	0.68 	0.012
429 	5.09 	1230 	6260 	0.82 	0.68 	0.012
393 	4.29 	1030 	6160 	0.82 	0.68 	0.012
300 	5.09 	1110 	6460 	0.91 	0.68 	0.012

If your index labels contain any NaNs, they are placed at the end of the sorted dataframe by default. You can place them at the beginning instead using the na_position keyword argument: df.sort_index(na_position='first').
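The following sketch shows both settings on a small, made-up frame whose index contains one missing label. Note that na_position concerns NaNs in the labels being sorted, not NaNs inside the data columns:

```python
import numpy as np
import pandas as pd

# Hypothetical frame whose index contains a missing label (NaN).
df = pd.DataFrame({"value": [1, 2, 3]}, index=[2.0, np.nan, 1.0])

# Default (na_position="last"): the NaN label is sorted to the end.
nan_last = df.sort_index()
print(nan_last["value"].tolist())   # [3, 1, 2]

# na_position="first": the NaN label is moved to the beginning.
nan_first = df.sort_index(na_position="first")
print(nan_first["value"].tolist())  # [2, 3, 1]
```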

How do you sort columns in a data frame?

You can also use the sort_index() method to sort a dataframe by its column labels. Use the axis keyword argument to choose what to sort: 0 (the default) sorts the rows, and 1 sorts the columns.

pandas sort alphabetically

If you want to sort the columns alphabetically, just specify that you want to sort on the columns. String labels are sorted alphabetically by default:

>>> df.sort_index(axis=1)
	    G 	    L 	    M 	     T 	    X 	     Z
429 	0.82 	1230 	5.09 	6260 	0.68 	0.012
1543 	0.64 	970 	3.89 	5860 	0.68 	0.012
1441 	0.64 	1010 	6.49 	5860 	0.68 	0.012
775 	0.82 	1070 	4.79 	6360 	0.68 	0.012
300 	0.91 	1110 	5.09 	6460 	0.68 	0.012
1136 	0.64 	830 	4.79 	5860 	0.68 	0.012
725 	0.82 	1090 	4.89 	6360 	0.68 	0.012
1282 	0.64 	1110 	4.69 	5860 	0.68 	0.012
1055 	0.82 	1210 	4.99 	6460 	0.68 	0.012
393 	0.82 	1030 	4.29 	6160 	0.68 	0.012
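The axis argument combines with the other keyword arguments. The sketch below reverses the column order with ascending=False, and also illustrates a caveat: the default ordering of string labels is case-sensitive, so uppercase labels sort before lowercase ones. The mixed-case frame is hypothetical, and the key argument used to fold case assumes pandas 1.1 or later:

```python
import pandas as pd

# One-row frame reusing the column names from the example above.
df = pd.DataFrame([[5.09, 1230, 6260, 0.82, 0.68, 0.012]],
                  columns=["M", "L", "T", "G", "X", "Z"])

# axis=1 sorts the column labels; ascending=False reverses the order.
reversed_cols = df.sort_index(axis=1, ascending=False)
print(reversed_cols.columns.tolist())  # ['Z', 'X', 'T', 'M', 'L', 'G']

# Hypothetical mixed-case labels: uppercase sorts before lowercase by default.
mixed = pd.DataFrame([[1, 2, 3]], columns=["a", "B", "c"])
default_order = mixed.sort_index(axis=1).columns.tolist()
print(default_order)                   # ['B', 'a', 'c']

# key (assumes pandas >= 1.1) transforms the labels before they are compared.
folded_order = mixed.sort_index(
    axis=1, key=lambda labels: labels.str.lower()
).columns.tolist()
print(folded_order)                    # ['a', 'B', 'c']
```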

What is the difference between the sort_values() function and the sort_index() function?

The sort_index() function sorts a dataframe by its labels. It requires no arguments: it sorts the row labels by default, and sorts the column labels if you specify axis=1. On the other hand, the sort_values() function sorts by the actual values inside the dataframe. It has one required argument, by, which specifies the column or columns you would like to sort on.

pandas sort values

Here is an example that sorts our dataframe by the 'L' column:

>>> df.sort_values(by="L")
         M       L        T       G        X       Z
1136 	4.79 	830 	5860 	0.64 	0.68 	0.012
1543 	3.89 	970 	5860 	0.64 	0.68 	0.012
1441 	6.49 	1010 	5860 	0.64 	0.68 	0.012
393 	4.29 	1030 	6160 	0.82 	0.68 	0.012
775 	4.79 	1070 	6360 	0.82 	0.68 	0.012
725 	4.89 	1090 	6360 	0.82 	0.68 	0.012
300 	5.09 	1110 	6460 	0.91 	0.68 	0.012
1282 	4.69 	1110 	5860 	0.64 	0.68 	0.012
1055 	4.99 	1210 	6460 	0.82 	0.68 	0.012
429 	5.09 	1230 	6260 	0.82 	0.68 	0.012

Notice how the row labels are still out of order, unlike when using the sort_index() function.
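To make the contrast concrete, here is a minimal sketch on a small, made-up frame: sort_values() orders the rows by the data in a column, and a following sort_index() puts the same rows back into label order:

```python
import pandas as pd

# Hypothetical small frame; labels and values are in different orders.
df = pd.DataFrame({"L": [1230, 970, 1010]}, index=[429, 1543, 1441])

by_value = df.sort_values(by="L")  # ordered by the data in column "L"
by_label = by_value.sort_index()   # ordered by the row labels again

print(by_value.index.tolist())     # [1543, 1441, 429]
print(by_label.index.tolist())     # [429, 1441, 1543]
```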

pandas sort by multiple columns

You can use the sort_values() method to sort by multiple columns. Simply pass a list of column names to the by argument. Rows are ordered by the first column, and ties are broken by the next one:

>>> df.sort_values(by=["L","G"])
         M       L        T       G        X       Z
1136 	4.79 	830 	5860 	0.64 	0.68 	0.012
1543 	3.89 	970 	5860 	0.64 	0.68 	0.012
1441 	6.49 	1010 	5860 	0.64 	0.68 	0.012
393 	4.29 	1030 	6160 	0.82 	0.68 	0.012
775 	4.79 	1070 	6360 	0.82 	0.68 	0.012
725 	4.89 	1090 	6360 	0.82 	0.68 	0.012
1282 	4.69 	1110 	5860 	0.64 	0.68 	0.012
300 	5.09 	1110 	6460 	0.91 	0.68 	0.012
1055 	4.99 	1210 	6460 	0.82 	0.68 	0.012
429 	5.09 	1230 	6260 	0.82 	0.68 	0.012
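The ascending argument also accepts a list, one entry per column in by, so each sort key can have its own direction. A minimal sketch, using a made-up three-row frame with a tie in column "L":

```python
import pandas as pd

# Hypothetical frame with a tie in column "L" so the second key matters.
df = pd.DataFrame(
    {"L": [1110, 830, 1110], "G": [0.91, 0.64, 0.64]},
    index=[300, 1136, 1282],
)

# Sort by "L" ascending, then break ties by "G" descending.
mixed = df.sort_values(by=["L", "G"], ascending=[True, False])
print(mixed.index.tolist())  # [1136, 300, 1282]
```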

How do you drop duplicate rows in a DataFrame?

You can drop duplicate rows in a dataframe by using the pandas DataFrame.drop_duplicates() method. By default it compares entire rows and keeps the first occurrence of each. Here is an example dataframe with multiple duplicate rows:

>>> data = [[4.70,  830, 5860, 0.64, 0.68, 0.012],
...         [6.49, 1010, 5860, 0.64, 0.68, 0.012],
...         [4.70,  830, 5860, 0.64, 0.68, 0.012],
...         [6.49, 1010, 5860, 0.64, 0.68, 0.012],
...         [6.49, 1010, 5860, 0.64, 0.68, 0.012],
...         [4.99, 1210, 6460, 0.82, 0.68, 0.012],
...         [4.99, 1210, 6460, 0.82, 0.68, 0.012]]
>>> df = pd.DataFrame(data, columns=["M","L","T","G","X","Z"])
>>> df
       M      L       T        G       X      Z
0 	4.70 	830 	5860 	0.64 	0.68 	0.012
1 	6.49 	1010 	5860 	0.64 	0.68 	0.012
2 	4.70 	830 	5860 	0.64 	0.68 	0.012
3 	6.49 	1010 	5860 	0.64 	0.68 	0.012
4 	6.49 	1010 	5860 	0.64 	0.68 	0.012
5 	4.99 	1210 	6460 	0.82 	0.68 	0.012
6 	4.99 	1210 	6460 	0.82 	0.68 	0.012

>>> df.drop_duplicates()
       M      L       T        G       X      Z
0 	4.70 	830 	5860 	0.64 	0.68 	0.012
1 	6.49 	1010 	5860 	0.64 	0.68 	0.012
5 	4.99 	1210 	6460 	0.82 	0.68 	0.012
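drop_duplicates() takes a few keyword arguments that change what counts as a duplicate and which copy survives. The sketch below, on a made-up four-row frame, shows subset, keep, and ignore_index (the last one assumes pandas 1.0 or later):

```python
import pandas as pd

# Hypothetical frame: rows 0 and 2 are identical; rows 1 and 3 share only "L".
df = pd.DataFrame(
    {"M": [4.70, 6.49, 4.70, 5.10], "L": [830, 1010, 830, 1010]}
)

# Default: compare whole rows and keep the first occurrence.
default_rows = df.drop_duplicates().index.tolist()
print(default_rows)    # [0, 1, 3]

# subset: only the listed columns are compared.
subset_rows = df.drop_duplicates(subset=["L"]).index.tolist()
print(subset_rows)     # [0, 1]

# keep="last": keep the last occurrence instead of the first.
last_rows = df.drop_duplicates(keep="last").index.tolist()
print(last_rows)       # [1, 2, 3]

# ignore_index=True: relabel the surviving rows 0, 1, 2, ...
relabeled_rows = df.drop_duplicates(ignore_index=True).index.tolist()
print(relabeled_rows)  # [0, 1, 2]
```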

Summary

In this article you learned what the pandas sort_index() function does and how to use it. First, you learned that the pandas sort_index() method sorts a dataframe by its row labels. Next, you learned several keyword arguments you can specify to sort in different ways. Then, you learned how to sort by column labels using the axis keyword argument. Finally, you learned the difference between the sort_index() method and the sort_values() method, and how to drop duplicate rows in a dataframe.

Next steps

If you're interested in learning more about the basics of Python, coding, and software development, check out our Coding Essentials Guidebook for Developers, where we cover the essential languages, concepts, and tools that you'll need to become a professional developer.

Thanks and happy coding! We hope you enjoyed this article. If you have any questions or comments, feel free to reach out to jacob@initialcommit.io.

