Pandas.DataFrame.sort_index | Python | Initial Commit
Table of Contents
- Introduction
- What is sort index?
- What does sort_index() do in pandas?
- How do I reorder the index in pandas?
- How do you sort columns in a DataFrame?
- pandas sort alphabetically
- What is the difference between the sort_values() function and the sort_index() function?
- pandas sort values
- pandas sort by multiple columns
- How do you drop duplicate rows in a DataFrame?
- Summary
- Next steps
- References
Introduction
Pandas is a powerful Python library for manipulating and analyzing data. It provides many methods for viewing and reorganizing data; one of them is DataFrame.sort_index().
What is sort index?
sort_index() is a method that can be called on a pandas DataFrame object. It sorts the DataFrame by its row or column labels rather than by the values it contains. By default it sorts the rows, but you can tell it to sort the columns instead.
What does sort_index() do in pandas?
sort_index() applies a sorting algorithm to the DataFrame's labels and returns a new, sorted DataFrame; the original is left unchanged. By default it uses the quicksort algorithm, and the kind keyword argument lets you choose another one, such as "mergesort" or "stable". To sort the DataFrame in place instead, set the inplace keyword argument to True: df.sort_index(inplace=True). In that case df itself is reordered and the method returns None.
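The difference between the default behavior and inplace=True is easy to check with a short script. This is a minimal sketch using a small, made-up DataFrame (the column name "value" and its labels are illustrative assumptions, not part of the article's example):

```python
import pandas as pd

# Hypothetical sample data: row labels deliberately out of order.
df = pd.DataFrame({"value": [10, 20, 30]}, index=[3, 1, 2])

# Default behavior: a new, sorted DataFrame is returned and df is untouched.
sorted_df = df.sort_index()
print(sorted_df.index.tolist())  # [1, 2, 3]
print(df.index.tolist())         # [3, 1, 2]

# inplace=True: df itself is reordered and the method returns None.
result = df.sort_index(inplace=True)
print(result)                    # None
print(df.index.tolist())         # [1, 2, 3]
```

Returning a new object is usually the safer default, since the unsorted original stays available for comparison.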
How do I reorder the index in pandas?
The sort_index() method lets you reorder the rows of a pandas DataFrame by their index labels. Let's look at an example. First, import the pandas library:
>>> import pandas as pd
Next, we create a sample DataFrame, df:
>>> data = [[5.09, 1230, 6260, 0.82, 0.68, 0.012],
...         [3.89,  970, 5860, 0.64, 0.68, 0.012],
...         [6.49, 1010, 5860, 0.64, 0.68, 0.012],
...         [4.79, 1070, 6360, 0.82, 0.68, 0.012],
...         [5.09, 1110, 6460, 0.91, 0.68, 0.012],
...         [4.79,  830, 5860, 0.64, 0.68, 0.012],
...         [4.89, 1090, 6360, 0.82, 0.68, 0.012],
...         [4.69, 1110, 5860, 0.64, 0.68, 0.012],
...         [4.99, 1210, 6460, 0.82, 0.68, 0.012],
...         [4.29, 1030, 6160, 0.82, 0.68, 0.012]]
>>> df = pd.DataFrame(data, columns=["M", "L", "T", "G", "X", "Z"],
...                   index=[429, 1543, 1441, 775, 300, 1136, 725, 1282, 1055, 393])
>>> df
         M     L     T     G     X      Z
429   5.09  1230  6260  0.82  0.68  0.012
1543  3.89   970  5860  0.64  0.68  0.012
1441  6.49  1010  5860  0.64  0.68  0.012
775   4.79  1070  6360  0.82  0.68  0.012
300   5.09  1110  6460  0.91  0.68  0.012
1136  4.79   830  5860  0.64  0.68  0.012
725   4.89  1090  6360  0.82  0.68  0.012
1282  4.69  1110  5860  0.64  0.68  0.012
1055  4.99  1210  6460  0.82  0.68  0.012
393   4.29  1030  6160  0.82  0.68  0.012
Because sort_index() sorts the rows by default, we can call it with no arguments to sort df by its row labels in ascending order:
>>> df.sort_index()
         M     L     T     G     X      Z
300   5.09  1110  6460  0.91  0.68  0.012
393   4.29  1030  6160  0.82  0.68  0.012
429   5.09  1230  6260  0.82  0.68  0.012
725   4.89  1090  6360  0.82  0.68  0.012
775   4.79  1070  6360  0.82  0.68  0.012
1055  4.99  1210  6460  0.82  0.68  0.012
1136  4.79   830  5860  0.64  0.68  0.012
1282  4.69  1110  5860  0.64  0.68  0.012
1441  6.49  1010  5860  0.64  0.68  0.012
1543  3.89   970  5860  0.64  0.68  0.012
We can also choose between ascending and descending order with the ascending keyword argument. It defaults to True (ascending); set it to False to sort in descending order:
>>> df.sort_index(ascending=False)
         M     L     T     G     X      Z
1543  3.89   970  5860  0.64  0.68  0.012
1441  6.49  1010  5860  0.64  0.68  0.012
1282  4.69  1110  5860  0.64  0.68  0.012
1136  4.79   830  5860  0.64  0.68  0.012
1055  4.99  1210  6460  0.82  0.68  0.012
775   4.79  1070  6360  0.82  0.68  0.012
725   4.89  1090  6360  0.82  0.68  0.012
429   5.09  1230  6260  0.82  0.68  0.012
393   4.29  1030  6160  0.82  0.68  0.012
300   5.09  1110  6460  0.91  0.68  0.012
If the index contains any missing labels (NaN), they are placed at the end of the result by default. To put them at the beginning instead, use the na_position keyword argument: df.sort_index(na_position="first").
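Here is a minimal sketch of na_position in action, using a small, made-up DataFrame whose index contains one missing label (the data is illustrative only):

```python
import numpy as np
import pandas as pd

# Hypothetical DataFrame whose index contains a missing label (NaN).
df = pd.DataFrame({"value": [1, 2, 3]}, index=[2.0, np.nan, 1.0])

# Default: na_position="last" puts the NaN label at the end.
last = df.sort_index()
print(last.index.tolist())   # [1.0, 2.0, nan]

# na_position="first" moves the NaN label to the beginning.
first = df.sort_index(na_position="first")
print(first.index.tolist())  # [nan, 1.0, 2.0]
```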
How do you sort columns in a DataFrame?
You can also use the sort_index() method to sort a DataFrame by its column labels. The axis keyword argument controls which labels are sorted: set it to 0 (or "index"), the default, to sort the rows, or to 1 (or "columns") to sort the columns.
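To see both values of axis side by side, here is a minimal sketch on a small, made-up DataFrame (its row and column labels are illustrative only):

```python
import pandas as pd

# Hypothetical DataFrame with unsorted row and column labels.
df = pd.DataFrame([[1, 2], [3, 4]], index=["b", "a"], columns=["y", "x"])

# axis=0 (the default, also spelled axis="index") sorts the row labels.
rows_sorted = df.sort_index(axis=0)
# axis=1 (also spelled axis="columns") sorts the column labels.
cols_sorted = df.sort_index(axis=1)

print(rows_sorted.index.tolist())    # ['a', 'b']
print(cols_sorted.columns.tolist())  # ['x', 'y']

# The string spelling is equivalent to the numeric one.
print(cols_sorted.equals(df.sort_index(axis="columns")))  # True
```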
pandas sort alphabetically
To sort the columns alphabetically, just tell sort_index() to sort on the columns. String labels are compared character by character, so labels with consistent capitalization, like the single uppercase letters in our example, come out in alphabetical order:
>>> df.sort_index(axis=1)
         G     L     M     T     X      Z
429   0.82  1230  5.09  6260  0.68  0.012
1543  0.64   970  3.89  5860  0.68  0.012
1441  0.64  1010  6.49  5860  0.68  0.012
775   0.82  1070  4.79  6360  0.68  0.012
300   0.91  1110  5.09  6460  0.68  0.012
1136  0.64   830  4.79  5860  0.68  0.012
725   0.82  1090  4.89  6360  0.68  0.012
1282  0.64  1110  4.69  5860  0.68  0.012
1055  0.82  1210  4.99  6460  0.68  0.012
393   0.82  1030  4.29  6160  0.68  0.012
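Because the default comparison is case-sensitive, uppercase letters sort before lowercase ones, so mixed-case labels may not come out in the order you expect. The key keyword argument (available in pandas 1.1 and later) transforms the labels before they are compared. This is a minimal sketch with made-up column names:

```python
import pandas as pd

# Hypothetical DataFrame with mixed-case column labels.
df = pd.DataFrame([[1, 2, 3]], columns=["b", "A", "C"])

# Default comparison is case-sensitive: uppercase sorts before lowercase.
case_sensitive = df.sort_index(axis=1)
print(case_sensitive.columns.tolist())  # ['A', 'C', 'b']

# key receives the column Index and must return an Index of the same length.
# Lowercasing only affects the comparison; the original labels are kept.
ci = df.sort_index(axis=1, key=lambda labels: labels.str.lower())
print(ci.columns.tolist())  # ['A', 'b', 'C']
```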
What is the difference between the sort_values() function and the sort_index() function?
The sort_index() function sorts a DataFrame by its labels. It requires no arguments: it sorts the row labels by default, and sorts the column labels if you specify axis=1. The sort_values() function, on the other hand, sorts by the actual values inside the DataFrame. It has one required argument, by, which names the column (or list of columns) to sort on.
pandas sort values
Here is what happens when we sort df by its "L" column:
>>> df.sort_values(by="L")
         M     L     T     G     X      Z
1136  4.79   830  5860  0.64  0.68  0.012
1543  3.89   970  5860  0.64  0.68  0.012
1441  6.49  1010  5860  0.64  0.68  0.012
393   4.29  1030  6160  0.82  0.68  0.012
775   4.79  1070  6360  0.82  0.68  0.012
725   4.89  1090  6360  0.82  0.68  0.012
300   5.09  1110  6460  0.91  0.68  0.012
1282  4.69  1110  5860  0.64  0.68  0.012
1055  4.99  1210  6460  0.82  0.68  0.012
429   5.09  1230  6260  0.82  0.68  0.012
Notice that the row labels are now out of order: each row keeps its label and moves with its values, whereas sort_index() puts the labels themselves in order.
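One way to see the difference is to sort by values and then by labels. In this minimal sketch, which uses only three rows of the example data, sort_values() reorders the rows by the L column and leaves the labels out of order, and calling sort_index() on the result puts the labels back in order:

```python
import pandas as pd

# Three rows from the example data: labels and L values are in different orders.
df = pd.DataFrame({"L": [1230, 970, 1010]}, index=[429, 1543, 1441])

by_value = df.sort_values(by="L")  # rows ordered by the L column
print(by_value.index.tolist())     # [1543, 1441, 429]

by_label = by_value.sort_index()   # rows ordered by their labels again
print(by_label.index.tolist())     # [429, 1441, 1543]
print(by_label.equals(df.sort_index()))  # True
```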
pandas sort by multiple columns
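sort_values() can also sort by multiple columns. Pass a list of column names to the by argument: the rows are sorted by the first column, and ties are broken by the next column in the list. In the output below, rows 1282 and 300 share the same L value (1110), so they are ordered by G instead: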
>>> df.sort_values(by=["L","G"])
         M     L     T     G     X      Z
1136  4.79   830  5860  0.64  0.68  0.012
1543  3.89   970  5860  0.64  0.68  0.012
1441  6.49  1010  5860  0.64  0.68  0.012
393   4.29  1030  6160  0.82  0.68  0.012
775   4.79  1070  6360  0.82  0.68  0.012
725   4.89  1090  6360  0.82  0.68  0.012
1282  4.69  1110  5860  0.64  0.68  0.012
300   5.09  1110  6460  0.91  0.68  0.012
1055  4.99  1210  6460  0.82  0.68  0.012
429   5.09  1230  6260  0.82  0.68  0.012
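The ascending argument of sort_values() also accepts a list with one entry per column in by, so each column can be sorted in its own direction. This minimal sketch reuses three rows from the example above; sorting G in descending order is just an illustrative choice:

```python
import pandas as pd

# Three rows taken from the example above; two of them share L == 1110.
df = pd.DataFrame(
    {"L": [1110, 1030, 1110], "G": [0.91, 0.82, 0.64]},
    index=[300, 393, 1282],
)

# Both columns ascending (the default): the tie on L is broken by smaller G first.
both_up = df.sort_values(by=["L", "G"])
print(both_up.index.tolist())  # [393, 1282, 300]

# L ascending, then G descending: the tie is broken by larger G first.
mixed = df.sort_values(by=["L", "G"], ascending=[True, False])
print(mixed.index.tolist())    # [393, 300, 1282]
```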
How do you drop duplicate rows in a DataFrame?
You can drop duplicate rows from a DataFrame with the DataFrame.drop_duplicates() method. Here is an example DataFrame with several duplicate rows:
>>> data = [[4.70,  830, 5860, 0.64, 0.68, 0.012],
...         [6.49, 1010, 5860, 0.64, 0.68, 0.012],
...         [4.70,  830, 5860, 0.64, 0.68, 0.012],
...         [6.49, 1010, 5860, 0.64, 0.68, 0.012],
...         [6.49, 1010, 5860, 0.64, 0.68, 0.012],
...         [4.99, 1210, 6460, 0.82, 0.68, 0.012],
...         [4.99, 1210, 6460, 0.82, 0.68, 0.012]]
>>> df = pd.DataFrame(data, columns=["M", "L", "T", "G", "X", "Z"])
>>> df
      M     L     T     G     X      Z
0  4.70   830  5860  0.64  0.68  0.012
1  6.49  1010  5860  0.64  0.68  0.012
2  4.70   830  5860  0.64  0.68  0.012
3  6.49  1010  5860  0.64  0.68  0.012
4  6.49  1010  5860  0.64  0.68  0.012
5  4.99  1210  6460  0.82  0.68  0.012
6  4.99  1210  6460  0.82  0.68  0.012
>>> df.drop_duplicates()
      M     L     T     G     X      Z
0  4.70   830  5860  0.64  0.68  0.012
1  6.49  1010  5860  0.64  0.68  0.012
5  4.99  1210  6460  0.82  0.68  0.012
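By default drop_duplicates() keeps the first occurrence of each duplicated row, which is why rows 0, 1 and 5 survive above. Its keep argument chooses a different survivor ("last"), or drops every copy (False), and the related duplicated() method shows which rows count as repeats. This minimal sketch uses a reduced version of the data above with only the M and L columns:

```python
import pandas as pd

# Reduced version of the example data: only the M and L columns.
df = pd.DataFrame({"M": [4.70, 6.49, 4.70, 6.49, 6.49, 4.99, 4.99],
                   "L": [830, 1010, 830, 1010, 1010, 1210, 1210]})

# duplicated() flags every row that repeats an earlier row.
flags = df.duplicated()
print(flags.tolist())  # [False, False, True, True, True, False, True]

# keep="last" keeps the final occurrence of each group of duplicates.
last = df.drop_duplicates(keep="last")
print(last.index.tolist())  # [2, 4, 6]

# keep=False drops every row that has a duplicate anywhere.
none_left = df.drop_duplicates(keep=False)
print(none_left.index.tolist())  # []
```

To compare only some of the columns, pass their names to the subset argument, for example df.drop_duplicates(subset=["L"]).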
Summary
In this article you learned what the pandas sort_index() method does and how to use it. First, you learned that sort_index() sorts a DataFrame by its row labels. Next, you learned several keyword arguments that change how it sorts. Then, you learned how to sort by column labels using the axis keyword argument. Finally, you learned the difference between the sort_index() and sort_values() methods, and how to drop duplicate rows from a DataFrame.
Next steps
If you're interested in learning more about the basics of Python, coding, and software development, check out our Coding Essentials Guidebook for Developers, where we cover the essential languages, concepts, and tools that you'll need to become a professional developer.
Thanks and happy coding! We hope you enjoyed this article. If you have any questions or comments, feel free to reach out to jacob@initialcommit.io.
References
- Pandas Documentation - https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.sort_index.html
- Sort_index - https://www.geeksforgeeks.org/python-pandas-dataframe-sort_index/