reuse.extract module

Utilities related to the extraction of REUSE information out of files.

reuse.extract.decoded_text_from_binary(binary_file: BinaryIO, size: int | None = None) → str

Given a binary file object, detect its encoding and return its contents as a decoded string. Decoding errors do not raise an exception; invalid characters are replaced instead.

If size is specified, read at most that many bytes.
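The replace-on-error behaviour can be illustrated with a minimal sketch. This is not the library's implementation: decode_leniently is a hypothetical name, and the sketch skips encoding detection entirely by assuming UTF-8 unless told otherwise.

```python
import io
from typing import BinaryIO, Optional


def decode_leniently(
    binary_file: BinaryIO,
    size: Optional[int] = None,
    encoding: str = "utf-8",  # assumption: no detection, caller supplies encoding
) -> str:
    """Hypothetical stand-in: read up to `size` bytes and decode without raising."""
    raw = binary_file.read() if size is None else binary_file.read(size)
    # errors="replace" substitutes U+FFFD for bytes that cannot be decoded,
    # so a malformed file never raises UnicodeDecodeError.
    return raw.decode(encoding, errors="replace")


# Usage: the invalid byte 0xFF becomes the replacement character.
text = decode_leniently(io.BytesIO(b"ab\xffcd"))
head = decode_leniently(io.BytesIO(b"abcdef"), size=3)
```

Replacing rather than raising keeps a scan over many files going even when a few of them are mis-encoded.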

reuse.extract.extract_reuse_info(text: str) → ReuseInfo

Extract REUSE information from comments in a string.

Raises:
  • ExpressionError – if an SPDX expression could not be parsed.

  • ParseError – if an SPDX expression could not be parsed.

reuse.extract.reuse_info_of_file(path: str | PathLike[str], original_path: str | PathLike[str], root: str | PathLike[str]) → ReuseInfo

Open path and return its ReuseInfo.

Normally, only the first _HEADER_BYTES bytes of the file are read. If a snippet is detected, however, the entire file is read.

reuse.extract.find_spdx_tag(text: str, pattern: Pattern) → Iterator[str]

Extract all values in text that match pattern’s regex, stripping extraneous whitespace and formatting from each value.
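A rough sketch of this kind of tag extraction follows. Everything here is an assumption made for illustration: the function name find_tag_values, the LICENSE_PATTERN regex with its named group value, and the set of comment terminators that get stripped. None of it is taken from the library's source.

```python
import re
from typing import Iterator, Pattern

# Hypothetical pattern; the real module defines its own patterns.
LICENSE_PATTERN = re.compile(
    r"SPDX-License-Identifier:[ \t]+(?P<value>.*)$", re.MULTILINE
)

# Assumed examples of "formatting" that may trail a value inside a comment.
_COMMENT_TERMINATORS = ("*/", "-->")


def find_tag_values(text: str, pattern: Pattern[str]) -> Iterator[str]:
    """Yield each captured value with whitespace and comment closers removed."""
    for match in pattern.finditer(text):
        value = match.group("value").strip()
        for terminator in _COMMENT_TERMINATORS:
            if value.endswith(terminator):
                value = value[: -len(terminator)].rstrip()
        yield value


sample = (
    "# SPDX-License-Identifier: MIT  \n"
    "/* SPDX-License-Identifier: GPL-3.0-or-later */\n"
)
values = list(find_tag_values(sample, LICENSE_PATTERN))
```

Returning an iterator lets the caller stop early or collect all values, whichever the use case needs.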

reuse.extract.filter_ignore_block(text: str) → str

Filter out blocks beginning with REUSE_IGNORE_START and ending with REUSE_IGNORE_END to remove lines that should not be treated as copyright and licensing information.
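The filtering idea can be sketched as follows. The marker strings are assumed values for the two constants, strip_ignored_blocks is a hypothetical name, and dropping everything after an unterminated start marker is a choice made for this sketch rather than a statement about the library's behaviour.

```python
# Assumed marker values for REUSE_IGNORE_START / REUSE_IGNORE_END.
REUSE_IGNORE_START = "REUSE-IgnoreStart"
REUSE_IGNORE_END = "REUSE-IgnoreEnd"


def strip_ignored_blocks(text: str) -> str:
    """Hypothetical sketch: remove every start..end block, markers included."""
    kept = []
    position = 0
    while True:
        start = text.find(REUSE_IGNORE_START, position)
        if start == -1:
            # No further blocks: keep the remainder unchanged.
            kept.append(text[position:])
            break
        kept.append(text[position:start])
        end = text.find(REUSE_IGNORE_END, start + len(REUSE_IGNORE_START))
        if end == -1:
            # Unterminated block: this sketch drops the rest of the text.
            break
        position = end + len(REUSE_IGNORE_END)
    return "".join(kept)


filtered = strip_ignored_blocks(
    "a\nREUSE-IgnoreStart\nSPDX-License-Identifier: example\nREUSE-IgnoreEnd\nc"
)
```

This is useful for files such as documentation or tests that quote SPDX tags as examples without declaring them.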

reuse.extract.contains_reuse_info(text: str) → bool

Return whether the text contains REUSE information.

reuse.extract.detect_line_endings(text: str) → str

Return one of '\n', '\r' or '\r\n' depending on the line endings used in text. Return os.linesep if there are no line endings.
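One way to implement such a check is sketched below. The name guess_line_ending is hypothetical, and the order in which candidates are tested is an assumption of the sketch: the two-character ending is checked first so that Windows-style text is not misreported as using a bare carriage return or line feed.

```python
import os


def guess_line_ending(text: str) -> str:
    """Hypothetical sketch: report the first line-ending style found in text."""
    # "\r\n" must be tested before "\r" and "\n", which are both substrings of it.
    for candidate in ("\r\n", "\r", "\n"):
        if candidate in text:
            return candidate
    # No line ending at all: fall back to the platform default.
    return os.linesep


windows_style = guess_line_ending("first\r\nsecond")
unix_style = guess_line_ending("first\nsecond")
```

A caller can use the result to write generated headers back with the same line endings the file already uses.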