Content from Introduction to Data Python Data Analysis Projects
Last updated on 2022-11-15
This episode will introduce the various tools that will be taught throughout the lesson and how the components relate to one another. This will set the foundation and motivation for the lesson.
Overview
Questions
- What are common features of a project?
- What do I need to do to get my project shared?
- What will this lesson cover?
Objectives
- Categorize pieces of code and organize them for efficient future use
- Identify components of a complete project
A Data Analysis Project
Exercise
In small groups, describe all of the steps you might go through in developing a project, how it could work, and the things you want your project to do. Then discuss problems you anticipate or have had.
The rest of the episode adds them
Workflows, project stages, and common challenges
- collaboration
- work on multiple computers
- promote the work
- Makefiles
- Pipeline tools
- backup
Data and Code
- Different back up needs, different space requirements
- Different sharing needs
- Shared server examples
- scripts, numerical experiments, plotting (that get narrative)
- things that are project specific
- things that are method-related might be reused
- these can be grouped as a package for install and then imported
- can become citable: Zenodo, get a DOI
- Data documentation: who, where, when, why (Mozilla has a checklist)
Environments
- the set of requirements and dependencies
- what version of different software and packages
- don’t need to track it ourselves, the environment is like a wrapper
- many different managers; one is conda
Documentation
- demonstrate and publicize what you did (beyond an academic paper)
- help your team use your code
- clarify your thinking by writing it down in real time
- multiscale: overview, details
- you need to write the parts in natural language, but you don’t need to work on the infrastructure; tools can do that for you
Why do good practices matter?
Lots of things can work and following “best” practices can take a lot of extra time. Why should we follow them and seek them?
- JupyterCon talk on the problems with notebooks
- hidden states
- more risk for beginners
- bad habits
- hinder reproducibility
- automation tools are based on good practices: a little bit of good practice helps the fancy stuff be easy
- sphinx autodocs
Key Points
- Projects have common structures
- Packaging enables a project to be installed
- An environment allows different people to all have the same versions and run software more reliably
- Documentation is an essential component of any complete project and should exist with the code
Content from Setting up a Project
Project Organization
Overview
Questions
- How do I set up a project in practice?
- What organization will help support the goals of my project?
- What additional infrastructure will support opening my project?
Objectives
- Create a project structure
- Save helper excerpts of code
Now that we’ve brainstormed the parts of a project and talked a little about what each of them consists of, how should we organize the code to help our future selves and collaborators?
There isn’t a single answer, but there are some guiding principles. There are also some packages that create a basic setup for you. These can help you get started if you are building something that follows a lot of standards, but they do not help you reorganize your existing code.
We will begin in this section talking about how to start from scratch, noting that often the reality is that you have code and want to organize and sort it to be more functional. We start from clean to give you the ideas and concepts, then we’ll return to how to sort and organize code into the bins we created.
Exercise
Let’s look around on GitHub for some examples and compare and contrast them.
Here are some ideas to consider:
Questions
- What files and directory structures are common?
- Which ones do you think you could get started with right away?
- What different goals do they seem to be organized for?
Next, we take these ideas in turn and give some specific advice on each topic.
File Naming
This is the step of least resistance you can take to make your code more reusable. Naming things is an important aspect of programming. This Data Carpentry episode provides some useful principles for file naming.
These are the three main characteristics of a good file name:
- Machine readable
- Human readable
- Plays well with default ordering
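As a rough illustration of these three characteristics, the sketch below builds file names from an ISO date, a zero-padded step number, and a lower-case, hyphenated description. The naming scheme and the helper `make_filename` are assumptions made for this example, not a convention prescribed by the lesson.

```python
from datetime import date


def make_filename(day, step, description, extension):
    """Build a name like 2022-11-15_02_clean-data.py (hypothetical scheme)."""
    # No spaces and only lower-case letters: easy for scripts to parse.
    slug = description.lower().replace(" ", "-")
    # ISO dates and zero-padded numbers sort correctly as plain text.
    return f"{day.isoformat()}_{step:02d}_{slug}.{extension}"


names = [
    make_filename(date(2022, 11, 15), 10, "Make figures", "py"),
    make_filename(date(2022, 11, 15), 2, "Clean data", "py"),
    make_filename(date(2022, 1, 3), 1, "Raw data", "csv"),
]

# Default (alphabetical) ordering now matches chronological order.
for name in sorted(names):
    print(name)
```

Because the date and step number come first and are padded, an ordinary alphabetical listing shows the files in the order they were meant to be read.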
Guiding Principles
There are numerous resources on good practices for starting and developing your project, such as:
- NeurIPS Tips for Publishing Research Code
- GitHub’s Open Source Guide
- Good Enough Practices in Scientific Computing (PLoS Comp Bio)
In this lesson, we are going to create a project that attempts to abide by the guiding principles presented in these resources.
Setting up a project
Sometimes we get to start from scratch. So we can set up everything from the beginning.
Templates
For some types of projects there are tools that generate the base structure for you. These tools are sometimes called “cookie cutters” or simply project templates. They are available in a variety of languages, and some examples include:
For our lesson, we will be manually creating a small project. However, it will be similar to the examples above.
BASH
git clone
cd project
mkdir data
mkdir docs
mkdir experiments
mkdir package
touch setup.py
touch README.md
We will also have a .gitignore file listing files and folders that are not included in the repository. In general, data is ignored, but scripts that download or process the data in some way are good to keep. Results should be ignored.
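The same layout can be sketched in Python with pathlib, which may be handy on systems without a bash shell. The example works in a temporary directory so it is safe to run anywhere; the two ignore patterns are hypothetical and should be adapted to where your own data and results live.

```python
import tempfile
from pathlib import Path

# Work in a throwaway location; in practice this would be the cloned repository.
project = Path(tempfile.mkdtemp()) / "project"

# Same folders and files as the mkdir/touch commands in the lesson.
for folder in ["data", "docs", "experiments", "package"]:
    (project / folder).mkdir(parents=True)
for filename in ["setup.py", "README.md"]:
    (project / filename).touch()

# Hypothetical ignore rules: skip raw data files and generated results,
# but keep any scripts that download or process the data.
(project / ".gitignore").write_text("data/*.csv\nresults/\n")

for path in sorted(project.rglob("*")):
    print(path.relative_to(project))
```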
Exercise
Make each of the following files in the project in the correct location by replacing the __ on each line
BASH
touch __/raw_data.csv # raw data for processing
touch __/generate_figures.py # functions to create figures for presentation/publication
touch __/new_technique.py # contains the novel method at the core of your publication
touch __/reproduce_paper.py # code to re-run the analyses reported in your methods paper about the package
touch __/helper_functions.py # auxiliary functions for routine tasks associated with the novel method
touch __/how_to_setup.md # details to help others prepare equivalent experiments to those presented in your paper
Open Source Basics, MWE
Open source guidelines are generally written to be ready to scale. Here we propose the basics to get your project live and usable vs. things that will help if it grows and builds a community, but n
README
A README file is the first information about your project most people will see. It should encourage people to start using it and cover key steps in that process. It includes key information, such as:
- What the project does
- Why the project is useful
- How users can get started with the project
- Where users can get help with the project
- Who maintains and contributes to the project
- How to repeat the analysis (if it is a data project)
If you are not sure of what to put in your README, these bullet points are a good starting point. There are many resources on how to write good README files, such as Awesome README.
Exercise
Choose 2 README files from the Awesome README gallery examples or from projects that you regularly use and discuss with a group:
- What are common sections?
- What is the purpose of the file?
- What useful information does it contain?
Licenses
As a creative work, software is subject to copyright. When code is published without a license describing the terms under which it can be used by others, all of the author’s rights are reserved by default. This means that no one else is allowed to copy, re-use, or adapt the software without the express permission of the author. Such cases are surprisingly common but, if you want your methods to be useful to, and used by, other people, you should make sure to include a license to tell them how you want them to do this.
Choosing a license for your software can be intimidating and confusing, and you should make sure you feel well-informed before you do so. This lesson and the paper linked from it provide more information about why licenses are important, which are in common use for research software, and what you might consider when choosing one for your own project. Choosealicense.com is another helpful tool to guide you through this process.
Exercise
Using the resources linked above, compare the terms of the following licenses:
What do you think are the benefits and drawbacks of each with regards to research software?
Discuss with a partner before sharing your thoughts with the rest of the group.
Open Source, Next Steps
Other common components are
- code of conduct
- contributing guidelines
- citation
Even more advanced for building a community
- issue templates
- pull request templates
- pathways and personas
For training and mentoring see Mozilla Open Leaders. For reading, check out the curriculum.
Re-organizing a project
Practice working on projects
FIXME: provide an example project folder, spend time sorting, or allow people some time to work on their own projects and generate questions.
Key Points
- Data and code should be governed by different principles
- A package enables a project to be installed
- An environment allows different people to all have the same versions and run software more reliably
- Documentation is an essential component of any complete project and should exist with the code
Content from Packaging Python Projects
Recall: Functions
Overview
Questions
- How do I use my own functions?
- How can I make my functions most usable for my collaborators?
Objectives
- Identify the components of a Python package
- Apply a template for packaging existing code
- Update the packaged project after modifying the code
- Install and update a local or GitHub-hosted package
When we develop code for research, we often start by writing unorganized code in notebook cells or a script. Eventually, we might want to re-use the code we wrote in other contexts. In order to re-use code, it is helpful to organize it into functions and classes in separate .py files. We call these files modules, and will soon go into more detail about them.
Whenever we refer to a module in Python, we can think of it as a .py file that has other code, typically functions or other objects, in it.
For example, say we are making a program that deals with temperature data. We have a function to convert from degrees Fahrenheit to Celsius:
PYTHON
def fahr_to_celsius(temperature):
    """
    Function to convert temperature from Fahrenheit to Celsius

    Parameters
    ----------
    temperature : float
        temperature in Fahrenheit

    Returns
    -------
    temperature_c : float
        temperature in Celsius
    """
    return (temperature - 32) * (5 / 9)
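A quick way to convince ourselves the function behaves as intended is to try it on temperatures whose Celsius values we already know. This is only a sanity check sketched for illustration; the choice of test values and the use of `math.isclose` are ours.

```python
import math


def fahr_to_celsius(temperature):
    """Convert a temperature from Fahrenheit to Celsius."""
    return (temperature - 32) * (5 / 9)


# Freezing point of water: 32 F is exactly 0 C.
print(fahr_to_celsius(32))

# Floating point arithmetic can introduce tiny rounding errors,
# so compare against the expected value with a tolerance.
print(math.isclose(fahr_to_celsius(212), 100.0))
print(math.isclose(fahr_to_celsius(-40), -40.0))
```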
We use this function a lot, so we don’t want to have to copy and paste it every time. Instead, we can store it in a module and import it from there. You have probably imported modules or functions before; this time we will do that for our own code!
Pip
Pip is the most common package manager for Python. Pip allows you to easily install Python packages locally from your computer or from an online repository like the Python Package Index (PyPI). Once a package is installed with pip, you can import that package and use it in your own code.
Pip is a command line tool. We’ll start by exploring its help manual:
BASH
pip
The output will look like this
OUTPUT
Usage:
pip <command> [options]
Commands:
install Install packages.
download Download packages.
uninstall Uninstall packages.
freeze Output installed packages in requirements format.
list List installed packages.
show Show information about installed packages.
check Verify installed packages have compatible dependencies.
config Manage local and global configuration.
search Search PyPI for packages.
wheel Build wheels from your requirements.
hash Compute hashes of package archives.
completion A helper command used for command completion.
help Show help for commands.
General Options:
-h, --help Show help.
--isolated Run pip in an isolated mode, ignoring
environment variables and user configuration.
-v, --verbose Give more output. Option is additive, and can be
used up to 3 times.
-V, --version Show version and exit.
-q, --quiet Give less output. Option is additive, and can be
used up to 3 times (corresponding to WARNING,
ERROR, and CRITICAL logging levels).
--log <path> Path to a verbose appending log.
--proxy <proxy> Specify a proxy in the form
[user:passwd@]proxy.server:port.
--retries <retries> Maximum number of retries each connection should
attempt (default 5 times).
--timeout <sec> Set the socket timeout (default 15 seconds).
--exists-action <action> Default action when a path already exists:
(s)witch, (i)gnore, (w)ipe, (b)ackup, (a)bort).
--trusted-host <hostname> Mark this host as trusted, even though it does
not have valid or any HTTPS.
--cert <path> Path to alternate CA bundle.
--client-cert <path> Path to SSL client certificate, a single file
containing the private key and the certificate
in PEM format.
--cache-dir <dir> Store the cache data in <dir>.
--no-cache-dir Disable the cache.
--disable-pip-version-check
Don't periodically check PyPI to determine
whether a new version of pip is available for
download. Implied with --no-index.
--no-color Suppress colored output
This shows the basic commands available with pip and the general options.
Exercise
- Use pip to install the sphinx package; we will need it later.
- Choose a pip command and look up its options. Discuss the command with your neighbour.
Python Modules
A module is a piece of code that serves a specific purpose. In Python, a module is written in a .py file. The name of the file is the name of the module. A module can contain classes, functions, or a combination of both. Modules can also define variables for use; for example, numpy defines the value of pi with numpy.pi.
If a .py file is on the path, we can import functions from it to our current file. Open up Python, import sys and print the path.
PYTHON
import sys
sys.path
OUTPUT
['',
'/home/vlad/anaconda3/lib/python37.zip',
'/home/vlad/anaconda3/lib/python3.7',
'/home/vlad/anaconda3/lib/python3.7/lib-dynload',
'/home/vlad/anaconda3/lib/python3.7/site-packages'
]
Here we see that Python is aware of the directories of the Python installation, as well as other directories like site-packages. The empty string at the start of the list stands for the current directory.
sys.path is a list of strings, each describing the path to a directory. Python will look in these directories for modules. If we have a directory containing modules we want Python to be aware of, we append that directory to the path. If I have a package in /home/vlad/Documents/science/cool-package, I add it with sys.path.append:
PYTHON
sys.path.append('/home/vlad/Documents/science/cool-package')
sys.path
OUTPUT
['',
'/home/vlad/anaconda3/lib/python37.zip',
'/home/vlad/anaconda3/lib/python3.7',
'/home/vlad/anaconda3/lib/python3.7/lib-dynload',
'/home/vlad/anaconda3/lib/python3.7/site-packages',
'/home/vlad/Documents/science/cool-package'
]
We can see that the path to our module has been added to sys.path. Once the module you want is in sys.path, it can be imported just like any other module.
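To see the whole cycle in one place, the sketch below writes a tiny module into a temporary directory, shows that it cannot be imported at first, and then imports it after appending the directory to sys.path. The module name `cool_module` and its `greet` function are made up for this illustration.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Create a throwaway directory holding one module (hypothetical names).
package_dir = Path(tempfile.mkdtemp()) / "cool-package"
package_dir.mkdir()
(package_dir / "cool_module.py").write_text(
    "def greet(name):\n"
    "    return 'Hello, ' + name\n"
)

# The directory is not on sys.path yet, so the import fails.
try:
    import cool_module
except ModuleNotFoundError:
    print("cool_module is not importable yet")

# Append the directory; Python will now search it for modules.
sys.path.append(str(package_dir))
importlib.invalidate_caches()
import cool_module

print(cool_module.greet("Vlad"))
```

Note that the change to sys.path only lasts for the current Python session, which is one reason the lesson moves on to installable packages.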
Python Packages
To save adding modules to the path every time we want to use them, we can package our modules to be installable. This method of importing standardises how we import modules across different user systems. This is why when we import packages like pandas and matplotlib we don’t have to write out their path, or add it to the path before importing. When we install a package, its location gets added to the path, or it’s saved to a location already on the path.
Many packages contain multiple modules. When we import matplotlib.pyplot as plt we are importing only the pyplot module, not the entire matplotlib package. This use of package.module is a practice referred to as a namespace. Python namespaces help to keep modules and functions with the same name separate. For instance, both scipy and numpy have a rand function to create arrays of random numbers. We can differentiate them in our code by using scipy.sparse.rand and numpy.random.rand, respectively.
In this way, namespaces allow multiple packages to have functions of the same name without creating conflicts. Packages are namespaces or containers which can contain multiple modules.
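The same idea can be tried without installing anything, using two standard library modules that each define a function called sqrt. Here math and cmath stand in for scipy and numpy purely so the example runs on a bare Python installation.

```python
import cmath
import math

# Two different functions share the name "sqrt"; the module prefix
# (the namespace) tells Python which one we mean.
print(math.sqrt(4))    # real-valued square root
print(cmath.sqrt(-1))  # complex-valued square root

# math.sqrt only accepts non-negative real numbers.
try:
    math.sqrt(-1)
except ValueError as error:
    print("math.sqrt(-1) failed:", error)
```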
Making Python code into a package requires no extra tools. We need to:
- Create a directory, named after our package.
- Put modules (.py files) in the directory.
- Create an __init__.py file in the directory.
- Create a setup.py file alongside the directory.
Our final package will look like this:
├── package-name
│ ├── __init__.py
│ ├── module-a.py
│ └── module-b.py
└── setup.py
The __init__.py file tells Python that the directory is supposed to be treated as a package.
Let’s create a package called conversions with two modules temperature and speed.
Step 2: Adding Modules
conversions/temperature.py
PYTHON
def fahr_to_celsius(temperature):
    """
    Function to convert temperature from Fahrenheit to Celsius

    Parameters
    ----------
    temperature : float
        temperature in Fahrenheit

    Returns
    -------
    temperature_c : float
        temperature in Celsius
    """
    return (temperature - 32) * (5 / 9)
The file temperature.py will be treated as a module called temperature. This module contains the function fahr_to_celsius. The top level container is the package conversions. The end user will import this as:
PYTHON
from conversions.temperature import fahr_to_celsius
Exercise
- Create a file named speed.py inside the conversions directory and add a function named kph_to_ms that will convert kilometres per hour to metres per second. Here’s the docstring describing the function:
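The docstring for this exercise is not reproduced here, so the version below is an assumption written in the same style as fahr_to_celsius. It is one possible solution sketch, not the official answer; the parameter name `speed` is our choice.

```python
def kph_to_ms(speed):
    """
    Convert speed from kilometres per hour to metres per second

    Parameters
    ----------
    speed : float
        speed in kilometres per hour

    Returns
    -------
    speed_ms : float
        speed in metres per second
    """
    # 1 km is 1000 m and 1 hour is 3600 s
    return speed * 1000 / 3600


print(kph_to_ms(36))  # 36 km/h is 10 m/s
```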
Step 3: Adding the init file
Finally, we create a file named __init__.py inside the conversions directory:
The init file is the map that tells Python what our package looks like. It is also what tells Python that a directory is a package: even an empty init file marks a directory as a package.
Now, if we launch a new Python terminal from this directory, we can import the package conversions
Even if the __init__.py file is empty, its existence indicates to Python that we can import names from that package. However, by adding import code to it, we can make our package easier to use. Add the following code to the init file:
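Judging from the description that follows, the init file holds two relative imports, one per module. The sketch below assumes exactly that and, so it can be run anywhere, writes a throwaway copy of the conversions package into a temporary directory before importing from it. The shortened function bodies and the temporary location are conveniences for this example only.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Assumed contents of conversions/__init__.py: relative imports that
# re-export one function from each module in the package.
INIT_SOURCE = (
    "from .temperature import fahr_to_celsius\n"
    "from .speed import kph_to_ms\n"
)
TEMPERATURE_SOURCE = (
    "def fahr_to_celsius(temperature):\n"
    "    return (temperature - 32) * (5 / 9)\n"
)
SPEED_SOURCE = (
    "def kph_to_ms(speed):\n"
    "    return speed * 1000 / 3600\n"
)

# Build a throwaway copy of the package.
project_dir = Path(tempfile.mkdtemp())
package_dir = project_dir / "conversions"
package_dir.mkdir()
(package_dir / "__init__.py").write_text(INIT_SOURCE)
(package_dir / "temperature.py").write_text(TEMPERATURE_SOURCE)
(package_dir / "speed.py").write_text(SPEED_SOURCE)

# Put the directory one level above the package on the path.
sys.path.insert(0, str(project_dir))
importlib.invalidate_caches()

# Thanks to the init file, both names are available at the package level.
from conversions import fahr_to_celsius, kph_to_ms

print(fahr_to_celsius(32))
print(kph_to_ms(36))
```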
The . before temperature and speed means that they refer to local modules, that is, files in the same directory as the __init__.py file. If we start a new Python interpreter, we can now import fahr_to_celsius and kph_to_ms directly from the conversions package:
Now, we can import from conversions, but only if our working directory is one level above the conversions directory. What if we want to use the conversions package from another project or directory?
SetupTools and installing Locally
The file setup.py contains the essential information about our package for PyPI. It needs to be machine readable, so be sure to format it correctly.
PYTHON
import setuptools

with open("README.md", "r") as fh:
    long_description = fh.read()

setuptools.setup(
    name="conversions",
    version="0.0.1",
    author="Example Author",
    author_email="author@example.com",
    description="An example package to perform unit conversions",
    long_description=long_description,
    long_description_content_type="text/markdown",
    url="https://github.com/pypa/sampleproject",
    packages=setuptools.find_packages(),
    classifiers=[
        "Programming Language :: Python :: 3",
        "License :: OSI Approved :: MIT License",
        "Operating System :: OS Independent",
    ],
)
Now that our code is organized into a package and has setup instructions, how can we use it? If we try importing it now, what happens?
We need to install it first. Earlier, we saw that pip can install packages remotely from PyPI. pip can also install from a local directory.
Relative file paths
We want to install the package located in the conversions/ directory. If we move inside that directory, we can refer to it as . (a single dot). This is a special file path that means the current directory. We can see what directory we are in with the pwd command, which stands for “print working directory”.
Other special file paths are .. (two dots), meaning “the directory containing this one”, and ~, which refers to the current user’s home directory (usually /home/<user-name> for UNIX systems).
Usually the . and .. file paths are hidden if we run ls (and the same happens for all file names that start with the . character), but if we run ls -a, we can list them:
OUTPUT
. .. conversions setup.py
So, to install our package, we can run:
The -e flag (aka --editable) tells pip to install this package in editable mode. This allows us to make changes to the package without re-installing it. Analysis code can change dramatically over time, so this is a useful option!
Now we can try importing and using our package.
Command Line Tools
FIXME: how to make a tool command line installable
More details on this may be found on the Python packaging documentation site.
Getting a Package from A Colleague
Many projects are distributed via GitHub as open source projects; we can use pip to install those as well.
Using git clone
Download and unzip their folder
Direct download via pip
BASH
cd project_dir
pip install .
PyPI Submission
To make pip install packagename work you have to submit your package to the repository. We won’t do that today, but an important thing to think about, if you might want to go in this direction, is that the name must be unique. This means that it’s helpful to check PyPI before creating your package so that you choose a name that is available.
To do this, you also need to package it up somewhat more. There are two types of archives that it looks for, as ‘compiled’ versions of your code. One is a source archive (.tar.gz) and the other is a built distribution (.whl). The built version will be used most often, but the source archive is a backup and makes your package more broadly compatible.
The next step is to generate distribution packages for the package. These are archives that are uploaded to the Package Index and can be installed by pip.
Make sure you have the latest versions of setuptools and wheel installed:
BASH
python3 setup.py sdist bdist_wheel
This command should output a lot of text and once completed should generate two files in the dist directory:
OUTPUT
dist/
example_pkg_your_username-0.0.1-py3-none-any.whl
example_pkg_your_username-0.0.1.tar.gz
Finally, it’s time to upload your package to the Python Package Index!
First, we’ll register for accounts on Test PyPI, intended for testing and experimentation. This way, we can practice all of the steps, without publishing our sample code that we’ve been working with.
Go to test.pypi.org/account/register/ and complete the steps on that page, then verify your account.
Now that you are registered, you can use twine to upload the distribution packages. You’ll need to install Twine:
Once installed, run Twine to upload all of the archives under dist:
You will be prompted for the username and password you registered with Test PyPI. After the command completes, you should see output similar to this:
BASH
Uploading distributions to https://test.pypi.org/legacy/
Enter your username: [your username]
Enter your password:
Uploading example_pkg_your_username-0.0.1-py3-none-any.whl
100%|█████████████████████| 4.65k/4.65k [00:01<00:00, 2.88kB/s]
Uploading example_pkg_your_username-0.0.1.tar.gz
100%|█████████████████████| 4.25k/4.25k [00:01<00:00, 3.05kB/s]
Once uploaded your package should be viewable on TestPyPI, for example, https://test.pypi.org/project/example-pkg-your-username
Test by having your neighbor install your package.
Since these are not actually packages with real functionality, we should uninstall them once we’re done, with pip uninstall.
Key Points
- Packaged code is reusable within and across systems
- A Python package consists of modules
- Projects can be distributed in many ways and installed with a package manager
Content from Managing Python Environments with Conda
Environments and environment managers
Overview
Questions
- How can I make sure the whole team (or lab) gets the same results?
- How can I simplify setup and dependencies for people to use my code or reproduce my results?
Objectives
- Identify an environment, dependencies, and an environment manager
- Use conda to install a different version of python
- Use conda to create an environment per project
- Store a project’s dependencies
An environment consists of a certain Python version and some packages
Why use one:
- to deliver code and keep the same versions
- to contribute to a package you also use
How to choose which of the main strategies to use: virtualenv and pip, or conda
Dependencies
Conda Python installs
Conda for projects
Key Points
- A python dependency is another, independent package that a given project uses and requires to be able to run
- An environment is
- An environment manager enables one step installing and documentation of dependencies, including versions
- Conda is the included environment manager with Anaconda; it is also an installer
- Other popular environment managers are FIXME
Content from Managing Python Environments with VirtualEnv
Environments and Package managers
Overview
Questions
- How can I make sure the whole team (or lab) gets the same results?
- How can I simplify setup and dependencies for people to use my code or reproduce my results?
Objectives
- Identify an environment, dependencies, and an environment manager
- Install an older version of python
- Use virtualenv to create an environment per project
- Store a project’s dependencies
- Install dependencies for a project
An environment consists of a certain Python version and some packages. A virtual environment allows you to have multiple, independent versions of python on your system. Environments can also be saved so that you can install all of the packages and replicate the environment on a new system.
Why use one:
- to deliver code and keep the same versions
- to contribute to a package you also use
- to install on servers
- to share your environment with others
How to choose which of the main strategies to use: virtualenv and pip, or conda.
conda comes from Anaconda and provides both package management and virtual environments.
pip is the main Python package installer.
virtualenv creates environments that are compatible with pip install.
Making your own packages pip installable requires fewer dependencies, so we’ll focus on virtualenv and pip in this workshop.
Create an environment
Before we create an environment, let’s see what happens when we import one of our favorite packages. In a python interpreter:
That should work, because we have the package installed on our system. If not, use a package you know you have installed, or install numpy.
Next, we’ll create an environment from scratch.
If Python 3 isn’t your default, you might need to pass the version of Python that you want installed:
Then we can activate the environment:
Now we see that the command line prompt changes to show the environment name, and we can further test our environment with our favorite package from before.
Now it won’t work, but we can install it and a few other favorites.
Save an environment
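Saving an environment usually means recording which packages and versions are installed, in the requirements format that pip freeze produces. As a rough standard-library approximation of that idea (not a replacement for pip freeze), the sketch below lists every installed distribution as a name==version line. The helper name `frozen_requirements` is ours.

```python
from importlib.metadata import distributions


def frozen_requirements():
    """Return sorted 'name==version' lines for installed distributions."""
    lines = set()
    for dist in distributions():
        name = dist.metadata.get("Name")
        if name:  # skip broken installs that lack a name
            lines.add(f"{name}=={dist.version}")
    return sorted(lines, key=str.lower)


# Inside an activated environment, this shows only that environment's packages.
for line in frozen_requirements():
    print(line)
```

Writing these lines to a requirements file gives collaborators a record of the versions you used, which they can pass to pip install when recreating the environment.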
Deactivate an environment
When you’re done with an environment, you exit it with deactivate. Also note that an environment only exists in the one terminal window. If you open a new terminal, you’ll be back to your default environment.
Exercise
Download a project, create a new environment and install from the requirements file.
Hint: use the pip help manual to find options you can pass to pip install
Key Points
- A python dependency is another, independent package that a given project uses and requires to be able to run
- An environment is
- An environment manager enables one step installing and documentation of dependencies, including versions
- Virtualenv is …
Content from Getting started with Documentation
Audiences for documentation
Overview
Questions
- How do I tell people how to use my code and advertise my project?
Objectives
- Identify types of documentation in a project
- Access different types of documentation for a given project
Documentation serves many purposes and many audiences, including
- future self
- collaborators
- users
- contributors
Exercise
In small groups, brainstorm the different goals for reading documentation that different audiences might have.
How is documentation used?
A potential user first needs to understand what your code does and how it works, enough to determine whether they want to use it. They might need to know its dependencies, features, limitations, etc.
Next the user will need to know how to install the code and make it run. A collaborator or contributor might need different instructions than a more passive user.
Once we’re using it we may have questions about details of the implementation or how the pieces work together. We may need to know the usage for a specific function.
In any Python kernel we have access to information about all available objects through the help() function.
PYTHON
help(print)
We can use this at a terminal or in a Jupyter notebook. In a Jupyter notebook we can also access help with ? and with shift + tab. These forms of help all use the docstring in Python.
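To make the link between help and docstrings concrete, the sketch below defines a small function and reads its docstring back directly. The one-line docstring is a shortened stand-in for the fuller one used earlier in the lesson.

```python
import inspect


def fahr_to_celsius(temperature):
    """Convert a temperature from Fahrenheit to Celsius."""
    return (temperature - 32) * (5 / 9)


# The docstring is stored on the function object itself.
print(fahr_to_celsius.__doc__)

# inspect.getdoc returns the same text with indentation cleaned up;
# it works for built-in functions such as print as well.
print(inspect.getdoc(fahr_to_celsius))
print(inspect.getdoc(print).splitlines()[0])
```

A function without a docstring has `__doc__` set to None, which is why help has nothing useful to show for undocumented code.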
Literal Documentation
installation guides, README files, how to repeat analysis
Purpose: Literal documentation helps users understand what your tool does and how to get started using it.
Location: Literal documentation lives outside of the code, but best practice is to keep it close. We will see that tools to support literal documentation in your code base recommend a docs folder with the files in there. These can be rendered as a book.
API Documentation
Purpose: API documentation describes the usage (input, output, description of what it does) for each piece of your code. This includes classes and functions
Location and Format: Docstrings in Python live inside the function. We’ll see more examples of these in the next episode.
PYTHON
def best_function_ever(a_param, another_parameter):
    """
    this is the docstring
    """
Tutorials
Purpose: To give a thorough, runnable overview of how to accomplish something with your package, maybe reproduce experimental results, or how to get started.
Location and Format: These go alongside the literal documentation
often and are typically in a .y
Examples or Cookbooks
Purpose: To give common or anticipated patterns of use for your code.
Location and Format: These are smaller excerpts of code, they typically live in a gallery type format.
Putting it all together
Exercise
FIXME: matching exercise sorting examples of documentation into the types and/ or matching questions/goals to a type of documentation or location
Key Points
- Documentation tells people how to use code and provides examples
- Types of documentation include: literal, API, and tutorial/example
- Literal Documentation lives outside the code and explains the big picture ideas of the project and how to get it set up
- API documentation lives in docstrings within the code and explains how to use functions in detail
- Examples are scripts (or notebooks, or code excerpts) that live alongside the project and connect between the details and the common tasks.
Content from Documentation in Code
Documenting for collaboration
Overview
Questions
- How should I document my code in the files?
Objectives
- Outline new functions with comment pseudocode
- Create numpydoc friendly docstrings
- explain the steps
- pseudocode
API Documentation
Docstrings: Numpydoc syntax
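As a rough illustration of both ideas, the sketch below pairs a numpydoc-style docstring (with Parameters, Returns, and Raises sections) with comment pseudocode that outlines the steps of the body. The function `rescale` and its parameters are hypothetical and chosen only for this example.

```python
def rescale(values, lower=0.0, upper=1.0):
    """
    Linearly rescale values to a target range.

    Parameters
    ----------
    values : list of float
        Numbers to rescale; must contain at least two distinct values.
    lower : float, optional
        Smallest value of the output range (default 0.0).
    upper : float, optional
        Largest value of the output range (default 1.0).

    Returns
    -------
    list of float
        The rescaled numbers, in the same order as the input.

    Raises
    ------
    ValueError
        If all input values are identical.
    """
    # step 1: find the current range of the data
    smallest, largest = min(values), max(values)
    if smallest == largest:
        raise ValueError("values must contain at least two distinct numbers")

    # step 2: map each value onto [0, 1]
    unit = [(v - smallest) / (largest - smallest) for v in values]

    # step 3: stretch and shift onto [lower, upper]
    return [lower + u * (upper - lower) for u in unit]
```

Writing the step comments first, before any code, is one way to outline a new function; the docstring then tells users what goes in and what comes out, while the comments help collaborators follow how it works.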
Key Points
- Docstrings describe functions
- Comments throughout the code help onboard and debug
Content from Building Documentation with Sphinx
Sphinx is a tool for building documentation.
Overview
Questions
- How can I make my documentation more accessible?
Objectives
- Build a documentation website with Sphinx
- Add overview documentation
- Distribute a Sphinx documentation site
What does sphinx produce?
Exercise
In a group, have each member open one of the following packages’ documentation
Discuss what the common components are, what is helpful about these documentation sites, how they address the general concepts on documentation, how they’re similar and how they’re different.
these all use sphinx to generate them?
Sphinx quickstart
Install Sphinx if you haven’t done so already:
Move into the directory that is to store your documentation:
Start the interactive Sphinx quickstart wizard, which creates a Sphinx config file, conf.py, using your preferences.
Suggested responses to the wizard’s questions:
- Separate source and build directories? -> yes
- Project name -> sensible to re-use the package name
- Author name(s) -> list of authors
- Project release -> sensible to re-use the package version specified in setup.py (see lesson 3) e.g. ‘0.1’
- Project language -> en, but you may want to target other languages as well/instead.
This will create:
- docs/source/conf.py -> Sphinx configuration file
- docs/source/index.rst -> Sphinx main index page, which like almost all Sphinx content, is written in reStructuredText (like Markdown)
- docs/Makefile -> for performing various tasks on Linux/macOS e.g. building HTML or a PDF
- docs/make.bat -> for performing those tasks on Windows
You should now be able to build and serve this basic documentation site using:
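One way to do both steps from Python is sketched below. It assumes Sphinx is installed and that the script is run from the docs directory with the layout created by the quickstart (source and build inside it). The function names `build_html` and `make_server` are illustrative and not part of Sphinx.

```python
import subprocess
import sys
from functools import partial
from http.server import HTTPServer, SimpleHTTPRequestHandler


def build_html(source_dir="source", build_dir="build/html"):
    """Run the Sphinx HTML builder on source_dir, writing to build_dir."""
    # `python -m sphinx` is the module form of the sphinx-build command.
    subprocess.run(
        [sys.executable, "-m", "sphinx", "-b", "html", source_dir, build_dir],
        check=True,
    )


def make_server(directory, port=8000):
    """Create a local web server for the files in `directory`."""
    handler = partial(SimpleHTTPRequestHandler, directory=directory)
    return HTTPServer(("127.0.0.1", port), handler)
```

Calling `build_html()` and then `make_server("build/html").serve_forever()` builds the site and serves it at http://127.0.0.1:8000/ until interrupted.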
When you browse to the URL shown in the output of the second command you can see your HTML documentation site but it’s looking fairly bare! Let’s learn a little more about reStructuredText then start adding some content to our documentation site.
Adding literal documentation
FIXME: RST overview
FIXME: adding pages
API Documentation
Add an api line to the index.rst so that it has a link to it.
Then create an API.rst file:
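A minimal sketch of what this file could contain, written from Python so that it can be run as-is. The package name `mypackage` and the helper `write_api_page` are hypothetical, and the sketch assumes that "sphinx.ext.autodoc" is listed under `extensions` in conf.py.

```python
from pathlib import Path

# Hypothetical package name; replace with the name of your own package.
PACKAGE_NAME = "mypackage"

# autodoc's automodule directive pulls in the docstrings of the module
# and, with :members:, of its documented functions and classes.
API_PAGE = f"""\
API
===

.. automodule:: {PACKAGE_NAME}
   :members:
"""


def write_api_page(source_dir="source"):
    """Write API.rst into the Sphinx source directory and return its path."""
    path = Path(source_dir) / "API.rst"
    path.write_text(API_PAGE, encoding="utf-8")
    return path
```

For numpydoc-formatted docstrings to render as structured sections, an extension such as numpydoc or sphinx.ext.napoleon would also need to be enabled in conf.py.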
Key Points
- Building documentation into a website is a common way of distributing it
- Sphinx will auto build a website from plain text files and your docstrings
Content from Example Gallery with Sphinx Gallery
Exercise
Look at pages with good examples. FIXME: discussion questions
Overview
Questions
- How can I include a number of use cases?
Objectives
- Add sphinx-gallery as an extension
- Outline example material
Sphinx Gallery Setup
Sphinx Gallery
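Since a Sphinx configuration file is plain Python, adding the extension comes down to a few lines in conf.py. The fragment below is a sketch: the `examples_dirs` path is an assumption about where the example scripts live, and the autodoc entry is included only to show the extension alongside others.

```python
# Fragment of docs/source/conf.py

extensions = [
    "sphinx.ext.autodoc",          # API pages from docstrings
    "sphinx_gallery.gen_gallery",  # builds the example gallery
]

sphinx_gallery_conf = {
    # Assumed location of the example scripts, relative to conf.py.
    "examples_dirs": "../../examples",
    # Where the generated gallery pages are written, relative to conf.py.
    "gallery_dirs": "auto_examples",
}
```

The generated gallery can then be linked from index.rst like any other page.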
What does a good example look like?
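One possible shape for a gallery example is sketched below, assuming a hypothetical file such as examples/plot_summary.py (with the default settings, Sphinx-Gallery executes scripts whose file names start with plot). The module docstring supplies the title and introduction, and each `# %%` comment block becomes a paragraph of text between code cells.

```python
"""
Summarizing a small set of measurements
=======================================

A short, self-contained example: it builds its own data, performs one
task, and shows the result.
"""

# %%
# Start from data that is created inside the example, so that it runs
# without any external files.
import statistics

measurements = [2.0, 4.0, 6.0]

# %%
# Perform the one task that the example is about and show the result.
average = statistics.mean(measurements)
print(f"mean = {average}")
```

Keeping each example focused on a single task makes it easier for readers to find and adapt the pattern they need.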
Key Points
- Sphinx Gallery creates a gallery for examples and tutorials
Content from Publishing code and data
Why and what
Overview
Questions
Objectives
Publishing makes the code, data, and documentation accessible. We’ll address each in turn.
Releasing isn’t necessarily enough.
Publishing Code Getting A DOI
Zenodo, archiving a copy, and DOI
Serving the documentation
|Read the Docs | Gh-pages|
Content from Testing and Continuous Integration
Testing
Overview
Questions
- How can I make sure code doesn’t get broken by changes?
- How can I automate that checking?
Objectives
- Understand basic testing tools
- Configure TravisCI with GitHub
Automated testing can seem intimidating. Having every component of a large software application tested for correctness requires a lot of
Testing
- Check pretty basic things about the results
- Save to file; then you don’t have to worry about breaking
- Note that you’re testing interactively as you develop, then break it out as a formal test
- Brainstorm what you test as you’re working; how can you formalize that?
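As a sketch of that last step, the checks you would type interactively can be written down as small functions containing plain `assert` statements. The function `column_means` is a hypothetical piece of analysis code used only for illustration; functions named `test_...` follow the convention that test runners such as pytest use to discover tests.

```python
def column_means(rows):
    """Return the mean of each column in a list of equal-length rows."""
    n_rows = len(rows)
    return [sum(column) / n_rows for column in zip(*rows)]


def test_column_means_basic():
    # The kind of check you might first type interactively...
    result = column_means([[1.0, 10.0], [3.0, 30.0]])
    # ...written down so it can be rerun after every change.
    assert result == [2.0, 20.0]


def test_column_means_keeps_one_value_per_column():
    rows = [[1, 2, 3], [4, 5, 6]]
    assert len(column_means(rows)) == 3
```

Saving these in a file alongside the code means a later change that breaks `column_means` is caught the next time the tests run.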