SPIP Database to Markdown


spip2md is a little Python app that exports a SPIP database into a plain-text Markdown + YAML repository, usable with static site generators.

Features

spip2md can currently:

- Export every section (`spip_rubriques`), with every article (`spip_articles`) it contains
- Replace author (`spip_auteurs`) IDs with their names (in the YAML block)
- Generate a separate file for each language found in `<multi>` blocks
- Copy over all the attached files (`spip_documents`), with proper links
- Convert the SPIP markup language to Markdown
- Convert SPIP ID-based internal links (like `<art123>`) into normal, path-based links
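To make two of the conversions above more concrete, here is a minimal sketch, not spip2md's actual code. It assumes the usual SPIP form `<multi>[fr]Bonjour[en]Hello</multi>` for multilingual blocks and the `<art123>` link form mentioned above. The function names and the `paths` mapping (article ID to output path) are hypothetical.

```python
import re

# Assumed SPIP syntax: <multi>[fr]Bonjour[en]Hello</multi>
MULTI_BLOCK = re.compile(r"<multi>(.*?)</multi>", re.DOTALL)
LANG_CHUNK = re.compile(r"\[([a-z_]{2,})\](.*?)(?=\[[a-z_]{2,}\]|\Z)", re.DOTALL)
ID_LINK = re.compile(r"<art(\d+)>")


def pick_language(text: str, lang: str) -> str:
    """Keep only the `lang` variant of every <multi> block in `text`."""

    def replace(match: re.Match) -> str:
        variants = dict(LANG_CHUNK.findall(match.group(1)))
        if not variants:
            # No language tags found: keep the block content unchanged.
            return match.group(1)
        # Fall back to the first variant when `lang` is missing.
        return variants.get(lang, next(iter(variants.values()))).strip()

    return MULTI_BLOCK.sub(replace, text)


def resolve_links(text: str, paths: dict[int, str]) -> str:
    """Turn <art123> into a Markdown link using a hypothetical ID -> path map."""

    def replace(match: re.Match) -> str:
        path = paths.get(int(match.group(1)))
        # Leave unknown IDs untouched rather than emit a broken link.
        return f"[{path}]({path})" if path else match.group(0)

    return ID_LINK.sub(replace, text)
```

Calling `pick_language` once per exported language gives one text per language, which can then be written to separate files.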

Dependencies

spip2md needs Python 3.9 or later.

spip2md uses three Python libraries (as defined in `pyproject.toml`):

- Peewee, with a database driver for your database:
  - pymysql (MySQL/MariaDB)
- PyYAML
- python-slugify (unidecode variant preferred)

Installation

Simple `pip` method

Install the package with `pip install spip2md` (or `python -m pip install spip2md` if the `pip` command is not on your `$PATH`).

Assuming your `$PATH` contains your pip install directory, you can now run spip2md as a normal command of the same name.

Traditional method

Clone this Git repository with `git clone` and `cd` into the created directory.

Either make sure you have the dependencies installed system-wide, or create a Python virtual environment and install them inside it.

You can then run spip2md as a Python module with `python -m spip2md`. If you did not `cd` into this repository's directory, replace `spip2md` with the path to the `spip2md` directory.

Configuration and Usage

Make sure you have access to the SPIP database you want to export on a MySQL/MariaDB server. By default, spip2md expects a database named `spip` hosted on `localhost`, with a user named `spip` whose password is `password`. You can change these and other settings in the YAML config file.
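As an illustration of how these defaults map onto a database connection, here is a minimal sketch using Peewee's `MySQLDatabase`. It is not taken from spip2md's source. The helper names `connection_args` and `connect` are hypothetical, and the setting names are the ones from the YAML configuration file.

```python
# Default values as documented in the configuration file.
DEFAULTS = {
    "db": "spip",
    "db_host": "localhost",
    "db_user": "spip",
    "db_pass": "password",
}


def connection_args(config: dict) -> tuple[str, dict]:
    """Map spip2md setting names to the arguments Peewee expects."""
    merged = {**DEFAULTS, **config}
    return merged["db"], {
        "host": merged["db_host"],
        "user": merged["db_user"],
        "password": merged["db_pass"],
    }


def connect(config: dict):
    """Open a connection; requires peewee and pymysql to be installed."""
    # Imported lazily so connection_args() stays usable without peewee.
    from peewee import MySQLDatabase

    name, kwargs = connection_args(config)
    database = MySQLDatabase(name, **kwargs)
    database.connect()
    return database
```

With an empty config, `connect({})` would try `spip@localhost` with the password `password`, matching the defaults described above.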

If you want to copy over attached files such as images, you also need access to the data directory of your SPIP website, usually named `IMG`. Either rename it to `data` in your current working directory, or set the `data_dir` setting to its path.

YAML configuration file

To configure spip2md, you can place a file named `spip2md.yml` in one of the standard *nix configuration locations, pass its path as a command-line argument, or run the program with a `spip2md.yml` file in your working directory.
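The lookup described above could be sketched as follows. This is an assumption-laden illustration, not spip2md's implementation. The exact locations and their precedence are guesses (explicit path first, then the working directory, then `$XDG_CONFIG_HOME` or `~/.config`, then the home directory), and the function names are hypothetical.

```python
import os
from pathlib import Path

CONFIG_NAME = "spip2md.yml"


def candidate_paths(cli_path=None, cwd=None, env=None):
    """Return config paths to try, most specific first (assumed order)."""
    env = os.environ if env is None else env
    cwd = Path.cwd() if cwd is None else Path(cwd)
    home = Path(env["HOME"]) if "HOME" in env else Path.home()
    xdg = Path(env.get("XDG_CONFIG_HOME") or home / ".config")
    paths = [cwd / CONFIG_NAME, xdg / CONFIG_NAME, home / CONFIG_NAME]
    if cli_path:
        # A path given on the command line takes precedence.
        paths.insert(0, Path(cli_path))
    return paths


def find_config(**kwargs):
    """Return the first existing candidate, or None to use the defaults."""
    return next((p for p in candidate_paths(**kwargs) if p.is_file()), None)
```

The file found this way could then be parsed with PyYAML's `yaml.safe_load` and merged over the default options.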

Here are the default configuration options, with comments explaining their meaning:

# Data source settings
db: spip # Name of the database
db_host: localhost # Host of the database
db_user: spip # The database user
db_pass: password # The database password
data_dir: data # The directory in which SPIP images & files are stored

# Data destination settings
export_languages: ["en"] # Array of languages to export, two letter lang code
# If set, directories will be created only for this language, according to this
# language’s titles. Other languages will be written along with correct url: attribute
storage_language: null
output_dir: output/ # The directory in which files will be written

# Destination directories names settings
# Prepend ID to directory slug, preventing collisions
# If false, a counter will be appended in case of name collision
prepend_id: false
# Prepend lang of the object to directory slug, preventing collisions between langs
prepend_lang: false
title_max_length: 42 # Maximum length (chars) of a single filename

# Text body processing settings
remove_html: true # Should we clean remaining HTML blocks
metadata_markup: false # Should we keep markup (Markdown) in metadata fields, like title
unknown_char_replacement: ?? # String to replace broken encoding that cannot be repaired
prepend_h1: false # Add title of articles as Markdown h1, looks better on certain themes
# Array of objects with 2 or 3 values, allowing to move some fields into others.
# {src: moved_field_name, dest: destination_field_name, repr: "how to merge them"}
# repr is formatted with "{}" being the moved field, and "_" the destination one
# For example, to append a field "subtitle" to a field "title":
#   - src: subtitle
#     dest: title
#     repr: "{} _" # (this is the default repr)
move_fields: []
# Some taxonomies (Spip Mots types) to not export, typically specific to Spip functions
ignore_taxonomies: ["Gestion du site", "Gestion des articles", "Mise en page"]
rename_taxonomies: { equipes: "tag-equipes" } # Rename taxonomies (prevents conflicts)

# Ignored data settings
export_drafts: true # Should we export drafts
export_empty: true # Should we export empty articles
ignore_patterns: [] # List of regexes : Matching sections or articles will be ignored

# Settings you probably don’t want to modify
clear_log: true # Clear logfile between runs instead of appending to it
clear_output: true # Clear output dir between runs instead of merging into it

logfile: log-spip2md.log # Name of the logs file
loglevel: WARNING # Refer to Python’s loglevels

export_filetype: md # Filetype of exported text files
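The `move_fields` option is easier to follow with an example. The sketch below applies the substitution rule stated in the comments (`{}` stands for the moved field, `_` for the destination field). It is an illustration only: the function names are hypothetical, and dropping the source field after the move is an assumption.

```python
import re

# Matches the two placeholders used in a `repr` template.
PLACEHOLDER = re.compile(r"\{\}|_")


def move_field(record: dict, src: str, dest: str, template: str = "{} _") -> dict:
    """Merge record[src] into record[dest] following the `repr` template."""
    if src not in record:
        return dict(record)
    # Assumption: the source field is removed once it has been moved.
    result = {key: value for key, value in record.items() if key != src}
    moved, current = str(record[src]), str(record.get(dest, ""))
    # Single pass, so "_" or "{}" inside the field values are left alone.
    result[dest] = PLACEHOLDER.sub(
        lambda m: moved if m.group(0) == "{}" else current, template
    ).strip()
    return result


def apply_moves(record: dict, move_fields: list) -> dict:
    """Apply every rule of a `move_fields` list in order."""
    for rule in move_fields:
        record = move_field(
            record, rule["src"], rule["dest"], rule.get("repr", "{} _")
        )
    return record
```

With the default template, a record with `title: Title` and `subtitle: Sub` ends up with `title: Sub Title`. A template such as `"_ {}"` would put the moved field after the destination instead.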

External links

SPIP Database structure

TODO

These tables seem to contain less useful information, but this needs to be investigated:

spip_evenements
spip_meta
spip_mots
spip_syndic_articles
spip_mots_liens
spip_zones_liens
spip_groupes_mots
spip_meslettres
spip_messages
spip_syndic
spip_zones

These tables seem technical and SPIP-specific:

spip_depots
spip_depots_plugins
spip_jobs
spip_ortho_cache
spip_paquets
spip_plugins
spip_referers
spip_referers_articles
spip_types_documents
spip_versions
spip_versions_fragments
spip_visites
spip_visites_articles

These tables are empty:

spip_breves
spip_evenements_participants
spip_forum
spip_jobs_liens
spip_ortho_dico
spip_petitions
spip_resultats
spip_signatures
spip_test
spip_urls
