regex_syntax::hir

Struct Properties

pub struct Properties(/* private fields */);

A type that collects various properties of an HIR value.

Properties are always scalar values and represent metadata that is computed inductively on an HIR value. Properties are defined for all HIR values.

All methods on a Properties value take constant time and are meant to be cheap to call.

Implementations§

impl Properties

pub fn minimum_len(&self) -> Option<usize>

Returns the length (in bytes) of the smallest string matched by this HIR.

A return value of 0 is possible and occurs when the HIR can match an empty string.

None is returned when there is no minimum length. This occurs in precisely the cases where the HIR matches nothing, that is, when the language the regex matches is empty. An example of such a regex is \P{any}.

pub fn maximum_len(&self) -> Option<usize>

Returns the length (in bytes) of the longest string matched by this HIR.

A return value of 0 is possible and occurs when nothing longer than the empty string is in the language described by this HIR.

None is returned when there is no longest matching string. This occurs when the HIR matches nothing or when there is no upper bound on the length of matching strings. Examples of such regexes are \P{any} (matches nothing) and a+ (has no upper bound).
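
The two length properties can be pictured as a bottom-up computation over an expression tree. The sketch below is a toy model under stated assumptions, not the crate's implementation: the Expr type and the min_len, max_len, opt and plus helpers are hypothetical names invented for illustration. It shows how None can arise from an expression that matches nothing and, for the maximum only, from an unbounded repetition.

```rust
/// Hypothetical toy expression tree; NOT the crate's `Hir` type.
enum Expr {
    /// Matches nothing at all, like `\P{any}`.
    Fail,
    /// Matches exactly these bytes.
    Lit(&'static str),
    /// Matches each sub-expression in sequence.
    Concat(Vec<Expr>),
    /// Matches `sub` between `min` and `max` times; `max == None`
    /// means there is no upper bound.
    Repeat { min: usize, max: Option<usize>, sub: Box<Expr> },
}

/// Byte length of the shortest match, or `None` if nothing matches.
fn min_len(e: &Expr) -> Option<usize> {
    match e {
        Expr::Fail => None,
        Expr::Lit(s) => Some(s.len()),
        // `?` propagates `None` from any part that matches nothing.
        Expr::Concat(es) => es.iter().try_fold(0usize, |acc, e| Some(acc + min_len(e)?)),
        Expr::Repeat { min, sub, .. } => Some(min_len(sub)? * *min),
    }
}

/// Byte length of the longest match, or `None` if there is none.
fn max_len(e: &Expr) -> Option<usize> {
    match e {
        Expr::Fail => None,
        Expr::Lit(s) => Some(s.len()),
        Expr::Concat(es) => es.iter().try_fold(0usize, |acc, e| Some(acc + max_len(e)?)),
        // An unbounded repetition yields `None` via `(*max)?`.
        Expr::Repeat { max, sub, .. } => Some(max_len(sub)? * (*max)?),
    }
}

/// `e?`: zero or one repetition.
fn opt(e: Expr) -> Expr {
    Expr::Repeat { min: 0, max: Some(1), sub: Box::new(e) }
}

/// `e+`: one or more repetitions.
fn plus(e: Expr) -> Expr {
    Expr::Repeat { min: 1, max: None, sub: Box::new(e) }
}

fn main() {
    // Models `ab?c?`.
    let bounded = Expr::Concat(vec![Expr::Lit("a"), opt(Expr::Lit("b")), opt(Expr::Lit("c"))]);
    assert_eq!((min_len(&bounded), max_len(&bounded)), (Some(1), Some(3)));

    // Models `a+`: a minimum exists, but no maximum.
    let unbounded = plus(Expr::Lit("a"));
    assert_eq!((min_len(&unbounded), max_len(&unbounded)), (Some(1), None));

    // Models `\P{any}`: nothing matches, so neither bound exists.
    assert_eq!((min_len(&Expr::Fail), max_len(&Expr::Fail)), (None, None));
}
```

Because both "matches nothing" and "unbounded" map to None for the maximum, a caller that needs to tell them apart can also consult the minimum, which is None only in the first case.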

pub fn look_set(&self) -> LookSet

Returns a set of all look-around assertions that appear at least once in this HIR value.

pub fn look_set_prefix(&self) -> LookSet

Returns a set of all look-around assertions that appear as a prefix for this HIR value. That is, the set returned corresponds to the set of assertions that must be passed before matching any bytes in a haystack.

For example, hir.properties().look_set_prefix().contains(Look::Start) returns true if and only if the HIR is fully anchored at the start.

pub fn look_set_prefix_any(&self) -> LookSet

Returns a set of all look-around assertions that appear as a possible prefix for this HIR value. That is, the set returned corresponds to the set of assertions that may be passed before matching any bytes in a haystack.

For example, hir.properties().look_set_prefix_any().contains(Look::Start) returns true if and only if it’s possible for the regex to match through an anchored assertion before consuming any input.
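
The difference between "must be passed" and "may be passed" shows up most clearly on alternations. The following is a toy model, not the crate's implementation: the Expr type, the START and WORD_BOUNDARY flags and the prefix function are all hypothetical. It assumes that an alternation intersects the prefix sets of its branches for the "must" variant and unions them for the "may" variant.

```rust
/// Hypothetical bit flags standing in for two `Look` variants.
const START: u8 = 0b01;
const WORD_BOUNDARY: u8 = 0b10;

/// Hypothetical toy expression tree; NOT the crate's `Hir` type.
#[allow(dead_code)] // literal payloads are carried only for readability
enum Expr {
    /// A zero-width assertion, given as a set of flags.
    Look(u8),
    Lit(&'static str),
    Concat(Vec<Expr>),
    Alt(Vec<Expr>),
}

/// Assertions found before any input is consumed. With `any == false`
/// every branch must contain them; with `any == true` one branch suffices.
fn prefix(e: &Expr, any: bool) -> u8 {
    match e {
        Expr::Look(bits) => *bits,
        Expr::Lit(_) => 0,
        Expr::Concat(es) => {
            let mut set = 0;
            for e in es {
                set |= prefix(e, any);
                // Stop at the first element that is not a bare
                // assertion, since it may consume input.
                if !matches!(e, Expr::Look(_)) {
                    break;
                }
            }
            set
        }
        Expr::Alt(es) => {
            let mut sets = es.iter().map(|e| prefix(e, any));
            let first = sets.next().unwrap_or(0);
            if any {
                sets.fold(first, |acc, s| acc | s)
            } else {
                sets.fold(first, |acc, s| acc & s)
            }
        }
    }
}

/// Models `^s`.
fn anchored(s: &'static str) -> Expr {
    Expr::Concat(vec![Expr::Look(START), Expr::Lit(s)])
}

fn main() {
    // Models `^a|^b`: every branch starts with the anchor.
    let all = Expr::Alt(vec![anchored("a"), anchored("b")]);
    assert_eq!(prefix(&all, false) & START, START);

    // Models `^a|b`: only one branch starts with the anchor.
    let some = Expr::Alt(vec![anchored("a"), Expr::Lit("b")]);
    assert_eq!(prefix(&some, false) & START, 0);
    assert_eq!(prefix(&some, true) & START, START);

    // Consecutive assertions at the front all count as prefix.
    let two = Expr::Concat(vec![Expr::Look(START), Expr::Look(WORD_BOUNDARY), Expr::Lit("foo")]);
    assert_eq!(prefix(&two, false), START | WORD_BOUNDARY);
}
```

The suffix variants can be pictured the same way by scanning a concatenation from its last element instead of its first.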

pub fn look_set_suffix(&self) -> LookSet

Returns a set of all look-around assertions that appear as a suffix for this HIR value. That is, the set returned corresponds to the set of assertions that must be passed in order to be considered a match after all other consuming HIR expressions.

For example, hir.properties().look_set_suffix().contains(Look::End) returns true if and only if the HIR is fully anchored at the end.

pub fn look_set_suffix_any(&self) -> LookSet

Returns a set of all look-around assertions that appear as a possible suffix for this HIR value. That is, the set returned corresponds to the set of assertions that may be passed in order to be considered a match after all other consuming HIR expressions.

For example, hir.properties().look_set_suffix_any().contains(Look::End) returns true if and only if it’s possible for the regex to match through an anchored assertion at the end of a match without consuming any input.

pub fn is_utf8(&self) -> bool

Return true if and only if the corresponding HIR will always match valid UTF-8.

When this returns false, then it is possible for this HIR expression to match invalid UTF-8, including by matching between the code units of a single UTF-8 encoded codepoint.

Note that this returns true even when the corresponding HIR can match the empty string. Since an empty string can technically appear between UTF-8 code units, it is possible for a match to be reported that splits a codepoint which could in turn be considered matching invalid UTF-8. However, it is generally assumed that such empty matches are handled specially by the search routine if it is absolutely required that matches not split a codepoint.

§Example

This code example shows the UTF-8 property of a variety of patterns.

use regex_syntax::{ParserBuilder, parse};

// Examples of 'is_utf8() == true'.
assert!(parse(r"a")?.properties().is_utf8());
assert!(parse(r"[^a]")?.properties().is_utf8());
assert!(parse(r".")?.properties().is_utf8());
assert!(parse(r"\W")?.properties().is_utf8());
assert!(parse(r"\b")?.properties().is_utf8());
assert!(parse(r"\B")?.properties().is_utf8());
assert!(parse(r"(?-u)\b")?.properties().is_utf8());
assert!(parse(r"(?-u)\B")?.properties().is_utf8());
// Unicode mode is enabled by default, and in
// that mode, all \x hex escapes are treated as
// codepoints. So this actually matches the UTF-8
// encoding of U+00FF.
assert!(parse(r"\xFF")?.properties().is_utf8());

// Now we show examples of 'is_utf8() == false'.
// The only way to do this is to force the parser
// to permit invalid UTF-8, otherwise all of these
// would fail to parse!
let parse = |pattern| {
    ParserBuilder::new().utf8(false).build().parse(pattern)
};
assert!(!parse(r"(?-u)[^a]")?.properties().is_utf8());
assert!(!parse(r"(?-u).")?.properties().is_utf8());
assert!(!parse(r"(?-u)\W")?.properties().is_utf8());
// Conversely to the equivalent example above,
// when Unicode mode is disabled, \x hex escapes
// are treated as their raw byte values.
assert!(!parse(r"(?-u)\xFF")?.properties().is_utf8());
// Note that just because we disabled UTF-8 in the
// parser doesn't mean we still can't use Unicode.
// It is enabled by default, so \xFF is still
// equivalent to matching the UTF-8 encoding of
// U+00FF by default.
assert!(parse(r"\xFF")?.properties().is_utf8());
// Even though we use raw bytes that individually
// are not valid UTF-8, when combined together, the
// overall expression *does* match valid UTF-8!
assert!(parse(r"(?-u)\xE2\x98\x83")?.properties().is_utf8());

pub fn explicit_captures_len(&self) -> usize

Returns the total number of explicit capturing groups in the corresponding HIR.

Note that this does not include the implicit capturing group corresponding to the entire match that is typically included by regex engines.

§Example

This method will return 0 for a and 1 for (a):

use regex_syntax::parse;

assert_eq!(0, parse("a")?.properties().explicit_captures_len());
assert_eq!(1, parse("(a)")?.properties().explicit_captures_len());

pub fn static_explicit_captures_len(&self) -> Option<usize>

Returns the total number of explicit capturing groups that appear in every possible match.

If the number of capture groups can vary depending on the match, then this returns None. That is, a value is only returned when the number of matching groups is invariant or “static.”

Note that this does not include the implicit capturing group corresponding to the entire match.

§Example

This shows a few cases where a static number of capture groups is available and a few cases where it is not.

use regex_syntax::parse;

let len = |pattern| {
    parse(pattern).map(|h| {
        h.properties().static_explicit_captures_len()
    })
};

assert_eq!(Some(0), len("a")?);
assert_eq!(Some(1), len("(a)")?);
assert_eq!(Some(1), len("(a)|(b)")?);
assert_eq!(Some(2), len("(a)(b)|(c)(d)")?);
assert_eq!(None, len("(a)|b")?);
assert_eq!(None, len("a|(b)")?);
assert_eq!(None, len("(b)*")?);
assert_eq!(Some(1), len("(b)+")?);
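
One way to read the results above is as a rule applied to each node of the expression tree. The sketch below is a toy model, not the crate's implementation: Expr, static_captures and cap are hypothetical names. It assumes that an alternation has a static count only when all branches agree, and that a repetition that may run zero times loses any non-zero count.

```rust
/// Hypothetical toy expression tree; NOT the crate's `Hir` type.
#[allow(dead_code)] // literal payloads are carried only for readability
enum Expr {
    Lit(&'static str),
    /// An explicit capturing group such as `(a)`.
    Capture(Box<Expr>),
    Concat(Vec<Expr>),
    Alt(Vec<Expr>),
    /// `sub` repeated at least `min` times.
    Repeat { min: usize, sub: Box<Expr> },
}

/// Number of groups that participate in every match, if that number is fixed.
fn static_captures(e: &Expr) -> Option<usize> {
    match e {
        Expr::Lit(_) => Some(0),
        Expr::Capture(sub) => Some(1 + static_captures(sub)?),
        Expr::Concat(es) => es.iter().try_fold(0usize, |acc, e| Some(acc + static_captures(e)?)),
        // All branches must agree, otherwise the count depends on the match.
        Expr::Alt(es) => {
            let mut counts = es.iter().map(static_captures);
            let first = counts.next()??;
            counts.all(|c| c == Some(first)).then_some(first)
        }
        Expr::Repeat { min, sub } => match static_captures(sub)? {
            0 => Some(0),
            // Zero iterations would leave the groups unmatched.
            _ if *min == 0 => None,
            n => Some(n),
        },
    }
}

/// Models `(s)`.
fn cap(s: &'static str) -> Expr {
    Expr::Capture(Box::new(Expr::Lit(s)))
}

fn main() {
    // Models `(a)|(b)`: both branches have one group.
    let both = Expr::Alt(vec![cap("a"), cap("b")]);
    assert_eq!(static_captures(&both), Some(1));

    // Models `(a)|b`: the branches disagree.
    let mixed = Expr::Alt(vec![cap("a"), Expr::Lit("b")]);
    assert_eq!(static_captures(&mixed), None);

    // Models `(b)*` and `(b)+`.
    let star = Expr::Repeat { min: 0, sub: Box::new(cap("b")) };
    let plus = Expr::Repeat { min: 1, sub: Box::new(cap("b")) };
    assert_eq!(static_captures(&star), None);
    assert_eq!(static_captures(&plus), Some(1));
}
```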

pub fn is_literal(&self) -> bool

Return true if and only if this HIR is a simple literal. This is only true when this HIR expression is either itself a Literal or a concatenation of only Literals.

For example, f and foo are literals, but f+, (foo), foo() and the empty string are not (even though they contain sub-expressions that are literals).

pub fn is_alternation_literal(&self) -> bool

Return true if and only if this HIR is either a simple literal or an alternation of simple literals. This is only true when this HIR expression is either itself a Literal or a concatenation of only Literals or an alternation of only Literals.

For example, f, foo, a|b|c, and foo|bar|baz are alternation literals, but f+, (foo), foo(), and the empty pattern are not (even though they contain sub-expressions that are literals).
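
The two literal predicates can be restated as structural checks. This is a toy model, not the crate's implementation: Expr, is_literal and is_alternation_literal here are hypothetical free functions over a made-up tree, written only to mirror the rules and examples given above.

```rust
/// Hypothetical toy expression tree; NOT the crate's `Hir` type.
#[allow(dead_code)] // literal payloads are carried only for readability
enum Expr {
    /// The empty pattern.
    Empty,
    Lit(&'static str),
    Capture(Box<Expr>),
    Concat(Vec<Expr>),
    Alt(Vec<Expr>),
    /// One or more repetitions, like `f+`.
    Plus(Box<Expr>),
}

/// True for a literal or a non-empty concatenation of only literals.
fn is_literal(e: &Expr) -> bool {
    match e {
        Expr::Lit(_) => true,
        Expr::Concat(es) => !es.is_empty() && es.iter().all(|e| matches!(e, Expr::Lit(_))),
        _ => false,
    }
}

/// True for a simple literal or a non-empty alternation of simple literals.
fn is_alternation_literal(e: &Expr) -> bool {
    is_literal(e) || matches!(e, Expr::Alt(es) if !es.is_empty() && es.iter().all(is_literal))
}

fn main() {
    let foo = || Expr::Lit("foo");

    // `foo` and `foo|bar|baz`.
    assert!(is_literal(&foo()));
    let alt = Expr::Alt(vec![foo(), Expr::Lit("bar"), Expr::Lit("baz")]);
    assert!(!is_literal(&alt));
    assert!(is_alternation_literal(&alt));

    // `f+`, `(foo)`, `foo()` and the empty pattern are neither.
    let cases = [
        Expr::Plus(Box::new(Expr::Lit("f"))),
        Expr::Capture(Box::new(foo())),
        Expr::Concat(vec![foo(), Expr::Capture(Box::new(Expr::Empty))]),
        Expr::Empty,
    ];
    for e in &cases {
        assert!(!is_alternation_literal(e));
    }
}
```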

pub fn memory_usage(&self) -> usize

Returns the total amount of heap memory usage, in bytes, used by this Properties value.

pub fn union<I, P>(props: I) -> Properties
where I: IntoIterator<Item = P>, P: Borrow<Properties>,

Returns a new set of properties that corresponds to the union of the iterator of properties given.

This is useful when one has multiple Hir expressions and wants to combine them into a single alternation without constructing the corresponding Hir. This routine provides a way of combining the properties of each Hir expression into one set of properties representing the union of those expressions.

§Example: union with HIRs that never match

This example shows that unioning properties together with one that represents a regex that never matches will “poison” certain attributes, like the minimum and maximum lengths.

use regex_syntax::{hir::Properties, parse};

let hir1 = parse("ab?c?")?;
assert_eq!(Some(1), hir1.properties().minimum_len());
assert_eq!(Some(3), hir1.properties().maximum_len());

let hir2 = parse(r"[a&&b]")?;
assert_eq!(None, hir2.properties().minimum_len());
assert_eq!(None, hir2.properties().maximum_len());

let hir3 = parse(r"wxy?z?")?;
assert_eq!(Some(2), hir3.properties().minimum_len());
assert_eq!(Some(4), hir3.properties().maximum_len());

let unioned = Properties::union([
    hir1.properties(),
    hir2.properties(),
    hir3.properties(),
]);
assert_eq!(None, unioned.minimum_len());
assert_eq!(None, unioned.maximum_len());

The maximum length can also be “poisoned” by a pattern that has no upper bound on the length of a match. The minimum length remains unaffected:

use regex_syntax::{hir::Properties, parse};

let hir1 = parse("ab?c?")?;
assert_eq!(Some(1), hir1.properties().minimum_len());
assert_eq!(Some(3), hir1.properties().maximum_len());

let hir2 = parse(r"a+")?;
assert_eq!(Some(1), hir2.properties().minimum_len());
assert_eq!(None, hir2.properties().maximum_len());

let hir3 = parse(r"wxy?z?")?;
assert_eq!(Some(2), hir3.properties().minimum_len());
assert_eq!(Some(4), hir3.properties().maximum_len());

let unioned = Properties::union([
    hir1.properties(),
    hir2.properties(),
    hir3.properties(),
]);
assert_eq!(Some(1), unioned.minimum_len());
assert_eq!(None, unioned.maximum_len());
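
The "poisoning" seen in both examples can be reproduced with plain Option values. The sketch below is a toy model, not the crate's implementation: Lens and union_lens are hypothetical names, and only the two length attributes are modeled. It relies on the standard behavior that collecting an iterator of Option values yields None as soon as one element is None.

```rust
/// Hypothetical stand-in for the two length attributes of a `Properties`.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Lens {
    min: Option<usize>,
    max: Option<usize>,
}

/// Combines length bounds as an alternation would, letting any `None`
/// "poison" the corresponding result.
fn union_lens(items: &[Lens]) -> Lens {
    // Collecting into `Option<Vec<_>>` gives `None` if any item is `None`.
    let mins: Option<Vec<usize>> = items.iter().map(|l| l.min).collect();
    let maxs: Option<Vec<usize>> = items.iter().map(|l| l.max).collect();
    Lens {
        min: mins.and_then(|v| v.into_iter().min()),
        max: maxs.and_then(|v| v.into_iter().max()),
    }
}

fn main() {
    let ab = Lens { min: Some(1), max: Some(3) }; // like `ab?c?`
    let wx = Lens { min: Some(2), max: Some(4) }; // like `wxy?z?`

    // A member that never matches poisons both bounds.
    let never = Lens { min: None, max: None }; // like `[a&&b]`
    assert_eq!(union_lens(&[ab, never, wx]), Lens { min: None, max: None });

    // An unbounded member poisons only the maximum.
    let a_plus = Lens { min: Some(1), max: None }; // like `a+`
    assert_eq!(union_lens(&[ab, a_plus, wx]), Lens { min: Some(1), max: None });
}
```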

Trait Implementations§

impl Clone for Properties

fn clone(&self) -> Properties

Returns a copy of the value.

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.

impl Debug for Properties

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

impl PartialEq for Properties

fn eq(&self, other: &Properties) -> bool

Tests for self and other values to be equal, and is used by ==.

fn ne(&self, other: &Rhs) -> bool

Tests for !=. The default implementation is almost always sufficient, and should not be overridden without very good reason.

impl Eq for Properties

impl StructuralPartialEq for Properties

Auto Trait Implementations§

Blanket Implementations§

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> CloneToUninit for T
where T: Clone,

unsafe fn clone_to_uninit(&self, dst: *mut T)

🔬This is a nightly-only experimental API. (clone_to_uninit)
Performs copy-assignment from self to dst.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> ToOwned for T
where T: Clone,

type Owned = T

The resulting type after obtaining ownership.

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning.

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.