Python Tutorial: Numeric variables
Want to learn more? Take the full course at https://learn.datacamp.com/courses/feature-engineering-for-machine-learning-in-python at your own pace. More than a video, you'll learn hands-on coding & quickly apply skills to your daily work.
---
As mentioned in the previous lesson, most machine learning models will require your data to be in numeric format. However, even if your raw data is all numeric, there is still a lot you can do to improve your features.
Numeric features can be used to represent a huge array of different characteristics and measurements. Pretty much anything that can be quantitatively measured can be recorded as numeric data. For example, age, the price of an item, counts, and even spatial data such as coordinates. Depending on the use case, numeric features can be treated in several different ways. We will work through a few of the considerations and possible feature engineering steps to keep in mind when dealing with numeric data.
One of the first questions you should ask when working with numeric features is whether the magnitude of the feature is its most important trait, or just its direction. For example, if you had a dataset of restaurant health and safety ratings containing the number of times a restaurant had major violations, you might care far more about whether the restaurant had any major violations at all (as you would rather not take any chances), over whether it was a repeat offender. Looking at this toy dataset containing restaurant IDs and the number of times they had major violations, we can see that some restaurants have no major violations but many have one or more. We will be creating a new binary column representing whether or not a restaurant committed any violation.
Here we first create a new column Binary_Violation and set it to zero. Then, we use the dot loc notation to find all rows where Number_of_Violations is greater than zero and set the Binary_Violation column to 1.
As you can see here, all rows where Number_of_Violations is equal to 0 are also zeros in Binary_Violation. However, for all rows where Number_of_Violations is greater than zero is 1 in Binary_Violation.
An extension of this is perhaps you wish to group a numeric variable into more than two bins. This is often useful for variables such as age, wage brackets, etc where exact numbers are less relevant than the general magnitude of the value.
Consider the same dataset of restaurant health and safety ratings containing the number of times a restaurant has had major violations. This time we will be creating three groups, Group 1, for restaurants with no offenses, Group 2 for restaurants with one or two offenses and group 3 for all restaurants with three or more offenses.
Bins are created by using the pandas' cut() function. You can define the intervals using the bins argument as shown here, which in this case is a list of 4 values. You can also pass a list of labels like so.
Note as we want to include 0 in the first bin, we must set the leftmost edge to lower than that, so all values between negative infinity and 0 are labeled as 1, all values equal to 1 or 2 are labeled as 2, and values greater than 2 are labeled as 3.
Now you know how to binarize and bin numeric columns, it's time for you to put this into practice.
#PythonTutorial #Python #DataCamp #Engineering #Numeric #variables #MachineLearning
Что делает видео по-настоящему запоминающимся? Наверное, та самая атмосфера, которая заставляет забыть о времени. Когда вы заходите на RUVIDEO, чтобы посмотреть онлайн «Python Tutorial: Numeric variables», вы рассчитываете на нечто большее, чем просто загрузку плеера. И мы это понимаем. Контент такого уровня заслуживает того, чтобы его смотрели в HD 1080, без дрожания картинки и бесконечного буферизации.
Честно говоря, Rutube сегодня — это кладезь уникальных находок, которые часто теряются в общем шуме. Мы же вытаскиваем на поверхность самое интересное. Будь то динамичный экшн, глубокий разбор темы от любимого автора или просто уютное видео для настроения — всё это доступно здесь бесплатно и без лишних формальностей. Никаких «заполните анкету, чтобы продолжить». Только вы, ваш экран и качественный поток.
Если вас зацепило это видео, не забудьте взглянуть на похожие материалы в блоке справа. Мы откалибровали наши алгоритмы так, чтобы они подбирали контент не просто «по тегам», а по настроению и смыслу. Ведь в конечном итоге, онлайн-кинотеатр — это не склад файлов, а место, где каждый вечер можно найти свою историю. Приятного вам отдыха на RUVIDEO!
Видео взято из открытых источников Rutube. Если вы правообладатель, обратитесь к первоисточнику.