#
# WikiBrain Container
#
# When running an image built with this Dockerfile, it is required to define
# the following environment variables:
#
# MEM:
# The amount of memory to allocate to the JVM when running WikiBrain's
# loader, given as a number of megabytes followed by 'm', e.g. '8000m' (the
# value is passed to the JVM's -Xmx option, so other sizes it accepts, such
# as '9g', also work). If this is set too low, the diagnostic stage of
# WikiBrain's loader will fail with an error message stating how much memory
# is needed. NOTE: if you set MEM higher than the maximum amount of memory
# available to Docker (as defined in your Docker preferences), the loader
# will crash with an uninformative error message (code 137).
#
# WIKILANG:
# The (usually two-letter) language code of the Wikipedia from which you'd
# like to load pages, e.g. 'en' or 'simple'.
#
# To get the eventual output files from CartoExtractor, you'll need to set up a
# "volume" at run time to be shared with the host. This can be done with the
# option '-v HOST_DIR:/output', where HOST_DIR is a path on the host to a
# directory (created if it doesn't exist) that will receive the output files.
#
# The built image should also be run with the following options specifying
# shared memory parameters, which are needed for WikiBrain:
#
# --sysctl kernel.shmmax=64205988352
# --sysctl kernel.shmall=15675290
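#
# Note that kernel.shmmax is expressed in bytes, while kernel.shmall is
# expressed in pages. Assuming a 4096-byte page size (an assumption; check
# with 'getconf PAGE_SIZE' on your host), the shmall value above corresponds
# to shmmax divided by the page size, rounded down:
#
# echo $((64205988352 / 4096))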
#
# Running the image automatically starts PostgreSQL and then runs WikiBrain's
# loader. If the '-it' option is given at run time, the container then gives
# the user an interactive shell.
#
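# The following two lines are an example of how to build an image and run a
# container from this Dockerfile. Docker image names must be lowercase, and
# $1 is a shell positional parameter; substitute a language code such as
# 'simple' when running the command by hand:
#
# docker build -t cartocontainer .
# docker run --sysctl kernel.shmmax=64205988352 --sysctl kernel.shmall=15675290 -e WIKILANG=$1 -e MEM=9g -v "$(pwd)/output":/output -it cartocontainer

# Pull Ubuntu base image.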
FROM ubuntu
# Install Java. Source: TODO: track down and add source
RUN apt-get update
# Add Oracle Repository
RUN echo oracle-java8-installer shared/accepted-oracle-license-v1-1 select true | debconf-set-selections
RUN apt-get --assume-yes install software-properties-common
RUN add-apt-repository -y ppa:webupd8team/java
# TODO: check if below update is meaningful
RUN apt-get update
# Install Java 8 package from Oracle Repository
RUN apt-get install --assume-yes oracle-java8-installer
RUN rm -rf /var/lib/apt/lists/*
RUN rm -rf /var/cache/oracle-jdk8-installer
# Define JAVA_HOME
ENV JAVA_HOME=/usr/lib/jvm/java-8-oracle
# Install Maven (update and install in one layer so the package index is never stale)
RUN apt-get update && apt-get install --assume-yes maven
## Clone WikiBrain from Git
# Install Git
RUN apt-get --assume-yes install git
WORKDIR /home
# Clone WB to appropriate path
RUN git clone https://github.com/shilad/wikibrain.git ./wikibrain
WORKDIR /home/wikibrain
# Check out the develop branch
RUN git checkout develop
# Compile wikibrain-utils and run its ResourceInstaller
RUN mvn -f wikibrain-utils/pom.xml clean compile exec:java -Dexec.mainClass=org.wikibrain.utils.ResourceInstaller
# Install PostgreSQL
WORKDIR /home/
COPY apt.postgresql.org.sh script.sh
# Make the script readable and executable, then answer "yes" to its prompts
RUN chmod 755 script.sh && yes | ./script.sh
RUN DEBIAN_FRONTEND=noninteractive apt-get install -y -q postgresql-9.5
RUN apt-get install -y postgresql-9.5-postgis-2.3
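# Update PostgreSQL settings:
# TODO: PostgreSQL reads 'postgresql.conf' by default; confirm that the
# 'postgres.conf' destination name below is intended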
ADD postgres.conf postgres.conf
RUN cp postgres.conf /etc/postgresql/9.5/main/postgres.conf
# Add Custom WikiBrain Configuration File
WORKDIR /home/wikibrain/
ADD customized.conf_template customized.conf_template
# Add script to create appropriate users and DBs in Postgres
ADD postgres_setup.sh postgres_setup.sh
# Add pre-downloaded English Wikipedia
# ADD download en/download
# Update PostgreSQL settings:
ADD postgres.conf postgres.conf
RUN cp postgres.conf /etc/postgresql/9.5/main/postgres.conf
WORKDIR /home/wikibrain
CMD \
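# Suggested guard (an addition, not part of the original setup): stop early
# with a clear message if a required variable is unset or empty
: "${MEM:?MEM must be set, e.g. 8000m}" "${WIKILANG:?WIKILANG must be set, e.g. en}" && \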
## Start up and configure PostgreSQL
# Start the PostgreSQL server
service postgresql start && \
# Add the appropriate database and user to PostgreSQL
sh postgres_setup.sh && \
# Generate the WikiBrain configuration file for the chosen Wikipedia language
sed "s/<WIKILANG>/$WIKILANG/g" customized.conf_template > customized.conf && \
# Run WikiBrain's loader
./wb-java.sh -Xmx"$MEM" org.wikibrain.Loader -l "$WIKILANG" -c customized.conf && \
# Start a shell (interactive when the container is run with '-it')
bash