Python Tutorial: Transforming categorical variables
Want to learn more? Take the full course at https://learn.datacamp.com/courses/human-resources-analytics-predicting-employee-churn-in-python at your own pace. More than a video, you'll learn hands-on coding & quickly apply skills to your daily work.
---
Now that we know which variables in our dataset are categorical, we can start transforming them into numeric ones.
To transform a categorical variable into a numeric one, we first have to understand its type. There are two types of categorical variables: ordinal and nominal. Ordinal variables have two or more categories that can be ranked or ordered. In our case that is the **salary** column, where the values clearly have a logical order.
The second type is nominal, where the categories do not have any intrinsic or logical order. An example of this kind of variable in our dataset is the **department** column, as its values clearly do not have any order or rank: the sales department is not higher than hr or vice versa, and so on.
Depending on which type of categorical variable you have, there are different methods for transforming it.
For ordinal variables, we can encode the categories by converting each of them into a corresponding numeric value. There are 3 steps to accomplish that task in Python.
- First, we have to tell Python that the salary column is actually categorical. This is done with the **astype()** method, which takes the desired type of the variable as its argument.
- Then, once Python knows that it is a categorical variable, we have to specify the correct order of the categories, using the cat.reorder_categories() method. As you can see in the code, this method takes a list as input, in which the categories are given in the correct order.
- Last but not least, we use the cat.codes attribute to encode each category with a numeric value that follows our order. The result overwrites the old values of the salary column with the new numeric values, as presented in the table.
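The three steps above can be sketched as follows. This is a minimal illustration, not the course's exact code: the DataFrame name `data` and the sample salary values `low`, `medium` and `high` are assumptions made for the example.

```python
import pandas as pd

# Hypothetical sample data; the real dataset has many more rows and columns.
data = pd.DataFrame({"salary": ["low", "high", "medium", "low"]})

# Step 1: tell pandas that the column is categorical.
data["salary"] = data["salary"].astype("category")

# Step 2: provide the correct order of the categories (assumed values).
data["salary"] = data["salary"].cat.reorder_categories(["low", "medium", "high"])

# Step 3: replace each category with its integer code, following that order.
data["salary"] = data["salary"].cat.codes

print(data["salary"].tolist())  # [0, 2, 1, 0]
```

With this order, `low` is encoded as 0, `medium` as 1 and `high` as 2, so the numeric values keep the ranking of the original categories.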
The next categorical variable is nominal, as there is no order or rank between departments. This means that the encoding approach is no longer appropriate, because it would impose an order that does not exist. In this case, the transformation should be accomplished through so-called dummy variables.
Dummy variables are variables that take only two values: 0 or 1. Let's say an employee is from the technical department. This means that if we have a separate column for each department, this employee will have a value of 1 in the technical column and 0 in the columns of all other departments.
This means we will have to create a new DataFrame where each department is a separate column and each row is a separate employee, with a 1 in the column of his or her department and 0 everywhere else. While the task may seem confusing, it is very easy from a technical perspective thanks to a very handy pandas function called **get_dummies()**.
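As a rough sketch of what this looks like, the snippet below applies `pd.get_dummies()` to a small made-up **department** column. The DataFrame name `data` and the sample values are assumptions; `dtype=int` is passed only so that the result shows 0s and 1s rather than booleans.

```python
import pandas as pd

# Hypothetical sample data with three departments.
data = pd.DataFrame({"department": ["technical", "sales", "hr", "technical"]})

# One new column per department; each row has a single 1 in its own department.
departments = pd.get_dummies(data["department"], dtype=int)

print(departments)
```

Each row of `departments` sums to 1, because every employee belongs to exactly one department.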
When dealing with dummy variables, one should be cautious of a phenomenon known as the dummy trap. This is the situation in which different dummy variables convey the same information. In this example, the sample employee is from the technical department, so that is the only column with a value of 1 in the first table. In the second table, the last column is dropped, but we can still tell that the employee is from the technical department because all the other departments have a value of 0. For that reason, whenever dummies are created in situations like this, one of them can be dropped, as its information is already contained in the others.
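One way to avoid the dummy trap is sketched below, again on made-up data: create the dummies, drop one of the columns, and join the rest back to the original DataFrame. The names `data`, `departments` and `reduced`, and the choice of dropping the `technical` column, are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical sample data.
data = pd.DataFrame({"department": ["technical", "sales", "hr"]})

# Create one dummy column per department.
departments = pd.get_dummies(data["department"], dtype=int)

# Drop one dummy column: its information is implied by the remaining ones.
reduced = departments.drop("technical", axis=1)

# Replace the original text column with the remaining dummy columns.
data = data.drop("department", axis=1).join(reduced)

print(data)
```

The employee from the technical department now has 0 in every remaining column, which is enough to identify the department. Alternatively, `pd.get_dummies(..., drop_first=True)` drops the first category automatically instead of one chosen by name.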
Ok, time to put this into practice.