Python Regular Expressions - Meta Characters - Part 2
Concepts covered: character classes, negated character classes, the dot meta-character, ignore case, word boundaries, the pipe meta-character, parentheses, back referencing.
Link to course : http://rshankar.com/courses/autolayoutyt7/
References:
https://developmentality.wordpress.com/2011/09/22/python-gotcha-word-boundaries-in-regular-expressions/
https://teamtreehouse.com/library/regular-expressions-in-python
Script:
In the last video we looked at quite a few meta-characters.
For example, we saw meta-characters to match a word character, a digit and a space.
And conversely, there are meta-characters that match anything that is NOT a word character, digit or space respectively.
We also saw character classes, quantifiers and the concept of look behind.
At this point, perhaps your mind is buzzing - Mine certainly is.
So at some point, we will organize these meta-characters into logical groups.
But that time is NOT now.
In this video, we will continue building our repertoire of meta-characters and tackle a problem often encountered when dealing with text: the double word problem.
But let's start with a question - I have a string and I wish to match its entire content.
The string may have letters, numbers or other characters - I do not care.
I just wish to match the ENTIRE string.
How do we do that?
The meta-character . matches any character (except a newline, by default). Let's see how it works.
This pattern matches the entire string.
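The pattern shown on screen is not part of this transcript. As a minimal sketch, assuming the dot was combined with the + quantifier from the previous video, it could look like this (the sample string is made up):

```python
import re

text = "Order #42: shipped on 12/05/17!"  # made-up sample string

# '.' matches any single character except a newline;
# '+' repeats it one or more times, so '.+' runs to the end of the line.
match = re.search(r".+", text)
whole = match.group()

print(whole)
print(whole == text)
```

Note that this only covers the whole string when it has no line breaks in it.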
Great, but can we have a more useful example?
A common problem is searching for dates in a string.
So let's say we know that dates are in dd mm yy format.
But we are not sure about the separator - it could be a dash, a slash or a dot.
How do we handle this?
One way is to use a character class that lists the three separators. Notice the dash at the beginning of the character class: placed first, it is treated as a literal dash rather than as a range.
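A sketch of the character class approach described here. The sample text is invented, and the order of the slash and dot inside the class is an assumption; only the leading dash comes from the script.

```python
import re

text = "Invoices dated 12-05-17, 13/05/17 and 14.05.17 are overdue."  # invented sample

# The dash comes first inside [...] so it is a literal '-', not a range.
# The dot inside the class is a literal '.' too.
pattern = r"\d\d[-/.]\d\d[-/.]\d\d"

dates = re.findall(pattern, text)
print(dates)  # ['12-05-17', '13/05/17', '14.05.17']
```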
But can we simplify?
Let's try another pattern, with a dot in place of the character class.
That works too. But wait a minute. How is the . outside the character class different from the . inside the character class?
The dot inside the character class is treated literally.
The dot outside the character class is a meta-character so matches any character.
Please pause the video and ensure this point is clear.
So stretching this discussion further, we may argue that this pattern is simpler, but on the flip side it may match things we did not intend.
Consider a string representing, say, a lottery number. It will get matched too.
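To make the trade-off concrete, here is a small comparison of the two patterns. The lottery number is a hypothetical value, not the one used in the video.

```python
import re

strict = r"\d\d[-/.]\d\d[-/.]\d\d"  # separator must be '-', '/' or '.'
loose = r"\d\d.\d\d.\d\d"           # separator can be ANY character, even a digit

date = "14.05.17"
lottery = "19374582"  # hypothetical lottery number

strict_date = bool(re.search(strict, date))
loose_date = bool(re.search(loose, date))
strict_lottery = bool(re.search(strict, lottery))
loose_lottery = bool(re.search(loose, lottery))

print(strict_date, loose_date)        # True True
print(strict_lottery, loose_lottery)  # False True
```

In the loose pattern the digits 3 and 5 of the lottery number are consumed by the two dots, which is why it matches.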
So which pattern should we use? It all depends on how specific we wish to be.
We have seen a few examples of character classes by now.
But there is one more kind - a negated character class. Let's take an example.
I wish to find words that start with a q but the second character should NOT be a u.
Can you think of any such words?
Let's try it out - ignoring these interesting words for a moment, we have just used a negated character class.
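A minimal sketch of the negated character class. The sentence and the trailing \w* (added so the rest of each word is shown) are assumptions, not the exact pattern from the video.

```python
import re

text = "The Iraqi trader sold qat in Qatar; it was quiet."  # invented sample

# [^u] matches any ONE character that is not 'u'.
# \w* then picks up the rest of the word, if any.
hits = re.findall(r"q[^u]\w*", text)

print(hits)  # ['qi', 'qat']
```

Two things stand out: Qatar is missed because of the capital Q, and qi is only the tail end of Iraqi.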
But there are 2 problems.
1. Firstly, I really do not care about case. So I would want to see this last word too. Let's fix this by adding an ignore-case parameter to our function. Great!
2. But I still want to find words. The qi here is part of the word Iraqi. How do we fix this? Well, we will wrap our pattern in word boundaries.
Let's test. That blew things up, hey!
It turns out that in an ordinary Python string \b is also the backspace escape sequence. So for the regular expression engine to see \b as a word boundary, we can use a raw string literal.
Please allow me to not get into more detail here. We have made a case for the raw string literal and will encounter it again in the future.
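The three steps above can be sketched as follows. The sample sentence and the \w* suffix are assumptions; re.IGNORECASE is the standard flag for case-insensitive matching in Python's re module.

```python
import re

text = "The Iraqi trader sold qat in Qatar."  # invented sample

# 1. Ignore case: with re.IGNORECASE, 'q' also matches 'Q'.
ignore_case = re.findall(r"q[^u]\w*", text, re.IGNORECASE)
print(ignore_case)  # ['qi', 'qat', 'Qatar']

# 2. Word boundary in an ordinary string: "\b" becomes a backspace
#    character, so the regex looks for a literal backspace and finds nothing.
no_raw = re.findall("\bq[^u]", text, re.IGNORECASE)
print(no_raw)  # []

# 3. Raw string: the backslash reaches the regex engine untouched,
#    so \b means "word boundary" and the 'qi' inside 'Iraqi' is dropped.
with_raw = re.findall(r"\bq[^u]\w*\b", text, re.IGNORECASE)
print(with_raw)  # ['qat', 'Qatar']
```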
At this point, let's throw in a question - how would you match a pattern such as July?
The only problem here is that July could be written as either Jul or July.
Pause the video and try it out.
So here is one solution using a quantifier.
And here is another using the or meta-character (|), which matches any of the pipe-separated expressions.
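Both solutions can be sketched like this. The sample text is invented, and listing July before Jul in the alternation is a choice made here, not necessarily what the video shows.

```python
import re

text = "Due 4 Jul 2017, paid 12 July 2017."  # invented sample

# Quantifier: '?' makes the preceding 'y' optional.
with_quantifier = re.findall(r"July?", text)
print(with_quantifier)   # ['Jul', 'July']

# Alternation: '|' matches either expression. Alternatives are tried
# left to right, so the longer spelling is listed first.
with_alternation = re.findall(r"July|Jul", text)
print(with_alternation)  # ['Jul', 'July']
```

Written the other way round, as Jul|July, the search would stop at Jul even inside the word July, because the first alternative that succeeds wins.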
We are approaching the end of the video but before leaving let us look at one last problem.
Double words are a very common problem - can we use regular expressions to detect them?
So in this simple example we have 2 double words.
What we need to do is first remember each word in the pattern and then see if it is repeated.
We will use 2 new concepts here:
a. Parentheses and
b. A concept called a back reference
In many regex flavors, parentheses can remember the text matched by the subexpression they enclose.
And how do we access that remembered text? By using back referencing.
So in this case, we have 1 pair of parentheses. To access its contents we will use a special meta sequence - \1, which represents the text matched within the parentheses - whatever it is at that point.
Let us test. All good here.
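A sketch of the double word check, combining the parentheses, the back reference and the word boundaries from earlier. The sample sentence and the exact pattern are assumptions.

```python
import re

text = "This is is a simple test test of the pattern."  # invented sample

# (\w+)  remembers a word as group 1
# \s+    matches the whitespace between the two words
# \1     must repeat exactly the text that group 1 matched
# \b     on both sides keeps the 'is' inside 'This' from counting
pattern = r"\b(\w+)\s+\1\b"

# With one group in the pattern, findall returns the text of that group.
doubles = re.findall(pattern, text)
print(doubles)  # ['is', 'test']
```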
So, in conclusion, we saw many more meta-characters in this video - character classes, negated character classes, the dot meta-character, ignore case, word boundaries, the pipe meta-character, parentheses and back referencing.
Please practice each of these concepts so that it gradually sinks in.
See you in the next video.