reuse.extract module

Utilities related to the extraction of REUSE information out of files.

reuse.extract.get_encoding_module() → ModuleType: Get the module used to detect the encodings of files.

reuse.extract.set_encoding_module(name: str) → ModuleType: Set the module used to detect the encodings of files, and return the module.
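The getter and setter above suggest a module-level selection that is swapped by import name. The following is only a sketch of that general pattern, assuming the module is resolved with importlib. The names `_encoding_module`, `set_module_sketch` and `get_module_sketch` are hypothetical and not part of reuse.extract, and the standard-library `encodings` package merely stands in for a real detection library.

```python
import importlib
from types import ModuleType
from typing import Optional

# Hypothetical module-level slot holding the currently selected module.
_encoding_module: Optional[ModuleType] = None


def set_module_sketch(name: str) -> ModuleType:
    """Import the module called `name`, remember it, and return it."""
    global _encoding_module
    _encoding_module = importlib.import_module(name)
    return _encoding_module


def get_module_sketch() -> ModuleType:
    """Return the remembered module, or fail if none was set (an assumption)."""
    if _encoding_module is None:
        raise RuntimeError("no encoding module has been set")
    return _encoding_module


# `encodings` is a stand-in; a real caller would pass a detection library.
selected = set_module_sketch("encodings")
```

Returning the module from the setter lets a caller select and use it in one expression.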

reuse.extract.CHUNK_SIZE = 65536: Default chunk size for reading files.

reuse.extract.LINE_SIZE = 1024: Default line size for reading files.

reuse.extract.HEURISTICS_CHUNK_SIZE = 2048: Default chunk size used to heuristically detect file type, encoding, et cetera.

class reuse.extract.FilterBlock(text: str, in_ignore_block: bool)

Bases: NamedTuple

A simple tuple that holds a block of text, and whether that block of text is in an ignore block.

text: str: Alias for field number 0

in_ignore_block: bool: Alias for field number 1
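Because FilterBlock is a NamedTuple, its fields can be read by name, by index, or by unpacking. The sketch below defines a local stand-in with the same two fields so that it runs without reuse installed; it is not the library's class.

```python
from typing import NamedTuple


class FilterBlock(NamedTuple):
    """Local stand-in mirroring the two documented fields."""

    text: str
    in_ignore_block: bool


block = FilterBlock("some text\n", False)

# Field 0 and field 1 are reachable by name or by position.
by_name = block.text
by_index = block[0]

# Tuple unpacking works as well.
text, still_open = block
```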

reuse.extract.filter_ignore_block(text: str, in_ignore_block: bool = False) → FilterBlock

Filter out blocks beginning with REUSE_IGNORE_START and ending with REUSE_IGNORE_END to remove lines that should not be treated as copyright and licensing information.

Parameters:

text – The text out of which the ignore blocks must be filtered.
in_ignore_block – Whether the text is already in an ignore block. This is useful when you parse subsequent chunks of text, and one chunk does not close the ignore block.

Returns:

A FilterBlock tuple that contains the filtered text and a boolean that signals whether the ignore block is still open.
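To illustrate how the in_ignore_block flag carries state from one chunk to the next, here is a minimal re-implementation sketch. It is not the library's code: the function name `filter_ignore_block_sketch` is hypothetical, and the marker strings assigned to `START` and `END` are assumed values for REUSE_IGNORE_START and REUSE_IGNORE_END.

```python
from typing import NamedTuple

# Assumed marker values; check them against the library's constants.
START = "REUSE-IgnoreStart"
END = "REUSE-IgnoreEnd"


class FilterBlock(NamedTuple):
    text: str
    in_ignore_block: bool


def filter_ignore_block_sketch(text: str, in_ignore_block: bool = False) -> FilterBlock:
    """Drop everything between START and END, tracking an unclosed block."""
    kept = []
    pos = 0
    while pos < len(text):
        if in_ignore_block:
            end = text.find(END, pos)
            if end == -1:
                # The block is not closed in this chunk: discard the rest.
                break
            pos = end + len(END)
            in_ignore_block = False
        else:
            start = text.find(START, pos)
            if start == -1:
                # No further block: keep the remainder.
                kept.append(text[pos:])
                break
            kept.append(text[pos:start])
            pos = start + len(START)
            in_ignore_block = True
    return FilterBlock("".join(kept), in_ignore_block)


# Two consecutive chunks; the first leaves the ignore block open.
first = filter_ignore_block_sketch("x REUSE-IgnoreStart y")
second = filter_ignore_block_sketch("ignored REUSE-IgnoreEnd z", first.in_ignore_block)
```

Passing the flag returned for one chunk into the call for the next chunk is what keeps text ignored across chunk boundaries.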

reuse.extract.extract_reuse_info(text: str) → ReuseInfo

Extract REUSE information from a multi-line text block.

Raises:

ExpressionError – if an SPDX expression could not be parsed.
ParseError – if an SPDX expression could not be parsed.
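As a rough picture of what extracting information from a multi-line text block involves, the sketch below collects the values of the two well-known SPDX tags with regular expressions. It is a simplification under stated assumptions: the function name `find_spdx_tags_sketch` is hypothetical, it returns plain strings rather than a ReuseInfo, it does not parse SPDX expressions (so it never raises ExpressionError or ParseError), and it does not strip comment terminators.

```python
import re
from typing import List, Tuple

# One match per line; the captured group is the value after the tag.
LICENSE_RE = re.compile(r"SPDX-License-Identifier:[ \t]*(.+?)[ \t]*$", re.MULTILINE)
COPYRIGHT_RE = re.compile(r"SPDX-FileCopyrightText:[ \t]*(.+?)[ \t]*$", re.MULTILINE)


def find_spdx_tags_sketch(text: str) -> Tuple[List[str], List[str]]:
    """Return (license expressions, copyright notices) found in `text`."""
    licenses = LICENSE_RE.findall(text)
    copyrights = COPYRIGHT_RE.findall(text)
    return licenses, copyrights


sample = (
    "# SPDX-FileCopyrightText: 2024 Jane Doe\n"
    "# SPDX-License-Identifier: MIT OR Apache-2.0\n"
)
licenses, copyrights = find_spdx_tags_sketch(sample)
```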

reuse.extract.detect_encoding(chunk: bytes) → str | None

Find the encoding of the bytes chunk, and return it as a normalised name. See encodings.normalize_encoding(). If no encoding could be found, return None.

If the chunk is empty or the encoding of the chunk is ASCII, 'utf_8' is returned.
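The documented contract (empty or ASCII input maps to 'utf_8', otherwise a normalised name or None) can be sketched with a deliberately simplistic detector. The function name `detect_encoding_sketch` is hypothetical, and a strict UTF-8 decode attempt stands in here for the configurable detection module; real detection libraries recognise many more encodings.

```python
import encodings
from typing import Optional


def detect_encoding_sketch(chunk: bytes) -> Optional[str]:
    """Return a normalised encoding name, or None if nothing fits."""
    # Empty and pure-ASCII chunks are reported as UTF-8.
    if not chunk or chunk.isascii():
        return "utf_8"
    try:
        # Stand-in for a real detector: only UTF-8 is recognised.
        chunk.decode("utf-8")
    except UnicodeDecodeError:
        return None
    # normalize_encoding() turns "utf-8" into "utf_8".
    return encodings.normalize_encoding("utf-8")
```

Reporting ASCII as 'utf_8' is safe because every ASCII byte sequence is also valid UTF-8.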

reuse.extract.detect_newline(chunk: bytes, encoding: str = 'ascii') → str: Return one of '\n', '\r' or '\r\n' depending on the line endings used in chunk. Return os.linesep if there are no line endings.
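A minimal sketch of this behaviour follows. The function name `detect_newline_sketch` is hypothetical, and the order in which the candidates are checked when a chunk mixes line endings is an assumption, since the description above does not specify it.

```python
import os


def detect_newline_sketch(chunk: bytes, encoding: str = "ascii") -> str:
    """Guess the newline style of `chunk`, falling back to os.linesep."""
    # Replace undecodable bytes so a chunk cut mid-character cannot raise.
    text = chunk.decode(encoding, errors="replace")
    # "\r\n" is checked first because it contains both other candidates.
    for newline in ("\r\n", "\n", "\r"):
        if newline in text:
            return newline
    return os.linesep
```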

reuse.extract.reuse_info_of_file(fp: BinaryIO, chunk_size: int = 65536, line_size: int = 1024) → ReuseInfo

Read from fp to extract REUSE information. It is read in chunks of chunk_size, additionally reading up to line_size until the next newline.

This function decodes the binary data into UTF-8 and removes REUSE ignore blocks before attempting to extract the REUSE information.
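The reading strategy described above (a fixed-size chunk, extended by up to line_size bytes until the next newline) can be sketched as a generator. The name `iter_chunks_sketch` is hypothetical, and the sketch covers only the reading step, not decoding, ignore-block filtering, or extraction.

```python
import io
from typing import BinaryIO, Iterator


def iter_chunks_sketch(
    fp: BinaryIO, chunk_size: int = 65536, line_size: int = 1024
) -> Iterator[bytes]:
    """Yield chunks that end on a newline where one is close enough."""
    while True:
        chunk = fp.read(chunk_size)
        if not chunk:
            break
        # Read on to the next newline, but at most line_size more bytes.
        chunk += fp.readline(line_size)
        yield chunk


data = b"aaaa\nbbbb\ncccc\n"
chunks = list(iter_chunks_sketch(io.BytesIO(data), chunk_size=3, line_size=10))
```

Extending each chunk to a line boundary avoids splitting a tag across two chunks, while the line_size cap bounds memory use on files with very long lines.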

reuse.extract.contains_reuse_info(text: str) → bool: Return whether the text contains REUSE information.