improvement: Implement support for NumPy-style docstrings #279

Open
wants to merge 8 commits into
base: develop

@celsiusnarhwal celsiusnarhwal commented Jan 25, 2023

This PR implements support for NumPy-style docstrings via the new NumpyProcessor class. It does so with the help of the numpydoc package, on which this PR makes Pydoc-Markdown dependent.

In addition to the above, this PR:

  • Adds a unit test for NumpyProcessor
  • Updates SmartProcessor to support NumpyProcessor
  • Updates pyproject.toml to reflect the addition of numpydoc as a dependency
  • Updates readme.md to reflect the addition of NumPy-style docstring support

This PR resolves #251.

Caveats and Limitations

  • NumpyProcessor.check_docstring_format() returns True if a docstring passes numpydoc's docstring validator without warnings or errors and False otherwise. Because SmartProcessor skips the call to check_docstring_format if the format is explicitly indicated in the docstring (e.g., with @doc:fmt:numpy), a docstring that would fail numpydoc's validator but nonetheless explicitly identifies itself as a NumPy-style docstring may result in warnings or exceptions at processing time.
    • The processor converts docstrings to NumpyDocString objects before converting them to Markdown syntax. Instantiating a NumpyDocString object with an invalid docstring will result in warnings or exceptions.
  • Reference indexes in a docstring's Notes section are not hyperlinked to their corresponding entries in the References section, which deviates from the numpydoc spec. In every attempt I made to implement this, Pydoc-Markdown's existing machinery rendered the HTML tags in a way that broke the hyperlinks. Examples of how NumpyProcessor renders reference indexes and references can be found below.
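
The first caveat can be illustrated with a toy model of the dispatch it describes. This is only a sketch under assumptions: `pick_format`, `FORMAT_MARKER`, and the `validators` mapping are hypothetical names, not Pydoc-Markdown's actual code. The one detail taken from the description above is that an explicit `@doc:fmt:<name>` marker bypasses the format check.

```python
import re
from typing import Callable, Dict, Optional

# Hypothetical marker pattern, modeled on the `@doc:fmt:numpy` example above.
FORMAT_MARKER = re.compile(r"@doc:fmt:(\w+)")


def pick_format(docstring: str, validators: Dict[str, Callable[[str], bool]]) -> Optional[str]:
    """Toy model: choose a docstring format by marker first, by validation second."""
    marker = FORMAT_MARKER.search(docstring)
    if marker:
        # Explicit marker: the format is trusted and no validator is consulted,
        # so an invalid docstring only fails later, at processing time.
        return marker.group(1)
    for name, is_valid in validators.items():
        if is_valid(docstring):
            return name
    return None
```

With a strict validator that rejects everything, `pick_format("@doc:fmt:numpy\nbroken", {"numpy": lambda d: False})` would still select `numpy`, which is the situation the caveat warns about.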

Examples

Here are examples of how the various sections of a NumPy-style docstring are rendered by NumpyProcessor.

Summary / Extended Summary

The Summary and Extended Summary are rendered together as a single summary.

Input

Decode a string by shifting each character by a given offset.

Extended Summary
----------------
There's not much else to say about this function, but if there was, it would go here. Fun fact: you 
don't need to include the Extended Summary heading — if your summary spans multiple lines, everything after the 
first will be implicitly considered to be the Extended Summary. You can't have both an implicit *and* explicit 
Extended Summary, though — that causes an exception!

Output

Decode a string by shifting each character by a given offset.

There's not much else to say about this function, but if there was, it would go here. Fun fact: you don't need to include the Extended Summary heading — if your summary spans multiple lines, everything after the first will be implicitly considered to be the Extended Summary. You can't have both an implicit and explicit Extended Summary, though — that causes an exception!

Parameters / Other Parameters / Attributes / Receives

The Parameters, Other Parameters, Attributes, and Receives sections are all rendered similarly.
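
As a rough sketch of the shared transformation (not the actual NumpyProcessor code, which relies on numpydoc for parsing), the body of such a section can be turned into `name (type): description` bullets as follows. The function name `render_fields` and the `- ` bullet marker are assumptions made for illustration.

```python
from typing import List, Tuple


def render_fields(section_body: str) -> List[str]:
    """Turn 'name : type' lines with indented descriptions into Markdown bullets."""
    entries: List[Tuple[str, str, List[str]]] = []
    for line in section_body.splitlines():
        if not line.strip():
            continue  # blank or whitespace-only separator line
        if line[0].isspace():
            # Indented line: part of the description of the current entry.
            if entries:
                entries[-1][2].append(line.strip())
            continue
        # Unindented line: starts a new entry, optionally as 'name : type'.
        name, _, type_ = line.partition(" : ")
        entries.append((name.strip(), type_.strip(), []))

    bullets = []
    for name, type_, desc in entries:
        label = f"{name} ({type_})" if type_ else name
        bullets.append(f"- {label}: {' '.join(desc)}")
    return bullets
```

An entry without a ` : type` part, such as an exception name in a Raises section, simply comes out as `- name: description`.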

Input

Parameters
----------
string : str
    The string to decode.
   
Other Parameters
----------------
offset : int
    The offset by which to shift each character in the string. Defaults to 13.
    
Attributes
----------
attr : Any
    Functions don't have attributes, but if we were documenting a class, we'd put its attributes here. 
    Unfortunately, we are not. Too bad!
    
Receives
--------
param : Any
    If this was a generator, we'd document the parameters passed to its `send()` method here.
    Unfortunately, it is not. Too bad!

Output

Arguments

  • string (str): The string to decode.
  • offset (int): The offset by which to shift each character in the string. Defaults to 13.

Attributes

  • attr (Any): Functions don't have attributes, but if we were documenting a class, we'd put its attributes here. Unfortunately, we are not. Too bad!

Receives

  • param (Any): If this was a generator, we'd document the parameters passed to its send() method here. Unfortunately, it is not. Too bad!

Returns / Yields

The Returns and Yields sections are rendered similarly.

Input

Returns
-------
str
    The decoded string.

Yields
------
char : str
    The decoded string, one character at a time. By the way, you can optionally annotate your return and yield 
    values with names like I did here. The type annotation isn't optional, though.

Output

Returns

  • str: The decoded string.

Yields

  • char (str): The decoded string, one character at a time. By the way, you can optionally annotate your return and yield values with names like I did here. The type annotation isn't optional, though.

Raises / Warns

The Raises and Warns sections are rendered similarly.

Input

Raises
------
ValueError
    If the string contains non-alphabetic characters.

Warns
-----
UserWarning
    If I don't like you.

Output

Raises

  • ValueError: If the string contains non-alphabetic characters.

Warns

  • UserWarning: If I don't like you.

See Also

Input

See Also
--------
:func:`encode`
    Encode a string by shifting each character by a given offset.

Output

See Also

  • :func:`encode`: Encode a string by shifting each character by a given offset.

(The processor leaves the task of cross-referencing functions, classes, and methods in this section to Pydoc-Markdown's existing faculties.)

Notes

Input

Notes
-----
This function implements an inverse substitution cipher[1]_.

Output

Notes

This function implements an inverse substitution cipher¹.

References

Input

References
----------
.. [1] https://en.wikipedia.org/wiki/Substitution_cipher

Output

References

  1. https://en.wikipedia.org/wiki/Substitution_cipher

Examples

The Examples section supports doctests. The processor renders doctests in code blocks and other content as plain text.

The processor considers the start of a doctest to be marked by a line beginning with >>> and the end of a doctest to be marked by a blank line. If multiple doctests are present, they are rendered in separate code blocks.
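
The rule above can be approximated with a single regular expression. The sketch below is an assumption about how such a rule might be written, not the pattern NumpyProcessor actually uses; `DOCTEST`, `FENCE`, and `fence_doctests` are illustrative names.

```python
import re

# Build the fence marker programmatically so this example stays easy to embed.
FENCE = "`" * 3

# A doctest starts at a line beginning with ">>>" and continues over every
# following non-empty line; the first blank line (or end of text) ends it.
DOCTEST = re.compile(r"^>>>.*(?:\n.+)*", re.MULTILINE)


def fence_doctests(text: str) -> str:
    """Wrap each doctest in its own Python code block and leave other text alone."""
    return DOCTEST.sub(lambda m: f"{FENCE}python\n{m.group(0)}\n{FENCE}", text)
```

Because each match stops at the first blank line, two doctests separated by prose end up in two separate code blocks, as in the example that follows.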

Input

Examples
--------
>>> decode("Qba'g nfx fghcvq dhrfgvbaf!")
"Don't ask stupid questions!"

This is a super simple function so I don't really know why you'd need more than one example but here's another one 
anyway.

>>> decode("Gunax lbh xvaqyl sbe lbhe nggragvba!")
"Thank you kindly for your attention!"

Output

Examples

>>> decode("Qba'g nfx fghcvq dhrfgvbaf!")
"Don't ask stupid questions!"

This is a super simple function so I don't really know why you'd need more than one example but here's another one anyway.

>>> decode("Gunax lbh xvaqyl sbe lbhe nggragvba!")
"Thank you kindly for your attention!"

@celsiusnarhwal marked this pull request as ready for review January 25, 2023 23:46

@NiklasRosenstein
Owner

Hey @celsiusnarhwal, thanks for this great PR! I'll be able to take a closer look at it next week.

@NiklasRosenstein
Owner

Hey @celsiusnarhwal, sorry for the silence. I'm finally finding some time again to look at your PR.

I've made some minor adjustments, and I'd almost be happy to merge it as it is now! The only problem is that two unit tests are failing: NumpyProcessor identifies the examples below as NumPy-style docstrings when they are not, and as a consequence they don't get processed.

E.g. for the test_pydocmd_processor test:

# Arguments
s (str): A string.
b (int): An int.

It spits the same back out. I've added some logging so we can tell which processor the SmartProcessor is delegating to:

INFO     pydoc_markdown.contrib.processors.smart:smart.py:92 Using `numpy` processor for Module `test` (detected)

"NumpyProcessor.check_docstring_format() returns True if a docstring passes numpydoc's docstring validator without warnings or errors and False otherwise"

I'm also thinking that this, on the other hand, may be too restrictive. If I want to use the NumPy docstring format, I may still make mistakes, and I'd want the docstring to be identified as NumPy format even if it contains a minor formatting mistake. Getting a warning (though maybe not an exception) in this case would be desirable.

What do you think about checking for the presence of Numpy-doc-like sections (e.g. Raises\n-------) in the content of the docstring instead?
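
A minimal sketch of that suggestion, assuming the heuristic is "a known section title followed by a line of dashes". The names `NUMPY_SECTIONS` and `looks_like_numpy` are made up here, and the section list is taken from the numpydoc style guide rather than from this PR.

```python
import re

# Section titles defined by the numpydoc style guide.
NUMPY_SECTIONS = (
    "Parameters", "Returns", "Yields", "Receives", "Other Parameters",
    "Raises", "Warns", "Warnings", "See Also", "Notes", "References",
    "Examples", "Attributes", "Methods",
)

# A section title on its own line, followed by an underline of dashes.
_SECTION_RE = re.compile(
    r"^[ \t]*(?:%s)[ \t]*\n[ \t]*-{3,}[ \t]*$"
    % "|".join(re.escape(name) for name in NUMPY_SECTIONS),
    re.MULTILINE,
)


def looks_like_numpy(docstring: str) -> bool:
    """Heuristic: True if the docstring contains at least one underlined section title."""
    return _SECTION_RE.search(docstring) is not None
```

Under this heuristic the `# Arguments` example from the failing test would not be claimed by the NumPy processor, while a docstring with a small formatting mistake elsewhere would still be detected.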

@hello-binit

This is cool! Useful for me. Let me know if I can somehow help land this PR. My fork.

@luiztauffer

Hey guys, thank you so much for your work on this! Any chance this can be incorporated into a new release of pydoc-markdown? This would be so helpful!

@luiztauffer

I'm having trouble trying to use this as it is.
In my pydoc-markdown.yaml, I have:

processors:
  - type: numpy

When trying to run the CLI with pydoc-markdown, I get this error:

Traceback (most recent call last):
  File "/home/luiz/anaconda3/envs/env_voluseg/bin/pydoc-markdown", line 8, in <module>
    sys.exit(cli())
             ^^^^^
  File "/home/luiz/anaconda3/envs/env_voluseg/lib/python3.12/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/luiz/anaconda3/envs/env_voluseg/lib/python3.12/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/home/luiz/anaconda3/envs/env_voluseg/lib/python3.12/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/luiz/anaconda3/envs/env_voluseg/lib/python3.12/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/shared_storage/Github/pydoc-markdown/src/pydoc_markdown/main.py", line 371, in cli
    pydocmd = session.load()
              ^^^^^^^^^^^^^^
  File "/mnt/shared_storage/Github/pydoc-markdown/src/pydoc_markdown/main.py", line 117, in load
    config.load_config(self.config)
  File "/mnt/shared_storage/Github/pydoc-markdown/src/pydoc_markdown/__init__.py", line 121, in load_config
    result = databind.json.load(
             ^^^^^^^^^^^^^^^^^^^
  File "/home/luiz/anaconda3/envs/env_voluseg/lib/python3.12/site-packages/databind/json/__init__.py", line 66, in load
    return get_object_mapper().deserialize(value, type_, filename, settings)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/luiz/anaconda3/envs/env_voluseg/lib/python3.12/site-packages/databind/core/mapper.py", line 104, in deserialize
    self.convert(Direction.DESERIALIZE, value, datatype, Location(filename, None, None), settings),
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/luiz/anaconda3/envs/env_voluseg/lib/python3.12/site-packages/databind/core/mapper.py", line 76, in convert
    return context.convert()
           ^^^^^^^^^^^^^^^^^
  File "/home/luiz/anaconda3/envs/env_voluseg/lib/python3.12/site-packages/databind/core/context.py", line 123, in convert
    return self.convert_func(self)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/luiz/anaconda3/envs/env_voluseg/lib/python3.12/site-packages/databind/core/converter.py", line 84, in convert
    return converter.convert(ctx)
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/luiz/anaconda3/envs/env_voluseg/lib/python3.12/site-packages/databind/core/converter.py", line 45, in convert
    return self.deserialize(ctx)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/luiz/anaconda3/envs/env_voluseg/lib/python3.12/site-packages/databind/json/converters.py", line 619, in deserialize
    return self.deserialize_from_schema(ctx, schema)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/luiz/anaconda3/envs/env_voluseg/lib/python3.12/site-packages/databind/json/converters.py", line 600, in deserialize_from_schema
    value = ctx.spawn(container[field_name], field.datatype, field_name).convert()
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/luiz/anaconda3/envs/env_voluseg/lib/python3.12/site-packages/databind/core/context.py", line 123, in convert
    return self.convert_func(self)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/luiz/anaconda3/envs/env_voluseg/lib/python3.12/site-packages/databind/core/converter.py", line 84, in convert
    return converter.convert(ctx)
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/luiz/anaconda3/envs/env_voluseg/lib/python3.12/site-packages/databind/json/converters.py", line 161, in convert
    values = list(values)
             ^^^^^^^^^^^^
  File "/home/luiz/anaconda3/envs/env_voluseg/lib/python3.12/site-packages/databind/json/converters.py", line 147, in <genexpr>
    ctx.spawn(val, item_type, idx).convert()
  File "/home/luiz/anaconda3/envs/env_voluseg/lib/python3.12/site-packages/databind/core/context.py", line 123, in convert
    return self.convert_func(self)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/luiz/anaconda3/envs/env_voluseg/lib/python3.12/site-packages/databind/core/converter.py", line 84, in convert
    return converter.convert(ctx)
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/luiz/anaconda3/envs/env_voluseg/lib/python3.12/site-packages/databind/json/converters.py", line 807, in convert
    member_type = union.members.get_type_by_id(member_name)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/luiz/anaconda3/envs/env_voluseg/lib/python3.12/site-packages/databind/core/union.py", line 236, in get_type_by_id
    raise ValueError(f"{type_id!r} type ID is not a member of {self}\n" + "- \n".join(map(str, errors)))
ValueError: 'numpy' type ID is not a member of ChainUnionMembers(delegates=[EntrypointUnionMembers(group='pydoc_markdown.interfaces.Processor'), ImportUnionMembers()])
'numpy' is not a valid type ID for EntrypointUnionMembers(group='pydoc_markdown.interfaces.Processor')- 
'numpy' does not point to a type (got module instead)

@luiztauffer

luiztauffer commented Oct 30, 2024

OK, that was a simple fix in pyproject.toml:

[tool.poetry.plugins."pydoc_markdown.interfaces.Processor"]
numpy = "pydoc_markdown.contrib.processors.numpy:NumpyProcessor"

On my fork, I also fixed these two lines in numpy.py, using raw strings to avoid invalid escape sequence errors.
Line 221:

citations = re.compile(r"(\.\. )?\[(?P<ref_id>\w+)][_ ]?")

and line 238:

*doctests.sub(r"```python\n\g<0>\n```", "\n".join(contents)).splitlines(),
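
For context on why the raw-string prefix matters: in a regular string literal, Python treats sequences such as `\.` and `\[` as invalid escapes (a SyntaxWarning as of Python 3.12), whereas a raw string passes the backslashes through to `re` untouched. Below is a small check of the citation pattern quoted above, run against the Notes and References examples from this PR. The helper `find_ref_ids` is hypothetical and exists only for this illustration.

```python
import re
from typing import List

# Pattern copied from the comment above. The r"" prefix keeps the backslashes
# intact so that `re`, not the Python parser, interprets them.
citations = re.compile(r"(\.\. )?\[(?P<ref_id>\w+)][_ ]?")


def find_ref_ids(text: str) -> List[str]:
    """Return every reference ID the pattern finds in `text` (hypothetical helper)."""
    return [match.group("ref_id") for match in citations.finditer(text)]


print(find_ref_ids("This function implements an inverse substitution cipher[1]_."))
print(find_ref_ids(".. [1] https://en.wikipedia.org/wiki/Substitution_cipher"))
```

Both calls print `['1']`: the same pattern picks up the inline reference index in Notes and the matching entry in References.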