Use types from pudo's 'typecast' library, PEP8 etc #171
Open: davidread wants to merge 36 commits into master from cleanup-mt2-redux
36 commits
1390f09
Use `typecast` for type conversion.
pudo 879dc69
Fix up type guessing tests.
pudo 2fdaf25
Hide coverage results.
pudo d1d0972
Clean up imports.
pudo 2f71c24
Get rid of old type names.
pudo f06a3c1
Clean out old aliases for XLSXTableSet
pudo 92fb215
Further pieces of clean up.
pudo 1108885
Start getting rid of the compatibility layer
pudo ed8cda1
Remove remaining awkward compatibility work-arounds.
pudo e87c774
avoid circular import
pudo 3dd9bad
Clean up README.
pudo 8a56e5d
fix py3 compat
pudo afca917
Don’t raise for 0 as a date.
pudo 5f4d978
fix up test errors, attempt to make travis pass
pudo 145e2ee
skip tests if en_GB is not supported
pudo dcdf21d
remove ambiguous var
pudo de3e840
dont score null values in type detection
pudo 10576f3
Move test utilities to a specific module.
pudo 7da15bf
Move the buffered reader to it’s own module.
pudo ccb094c
Move guesser class to typecast.
pudo 2565632
Factor out CSV re-coder
pudo b63baeb
use cchardet
pudo 2e4b96c
simplify the handling of CSV dialects
pudo f373325
try relative imports with py3
pudo 96549a9
PEP8.
pudo 910b6c2
Simplify JTS code.
pudo a4c22f3
pep8
pudo b7b4851
Move stuff around.
pudo b8f15ed
Formatting.
pudo 3c96240
Replace CSV reader with a fully streaming implementation.
pudo ce3627c
Fix up Python 3 support
pudo 7dd9e5b
confirm at least python 3.5 is working
pudo 6cd1222
Readd Python 3.4 to Travis
StevenMaude 506269e
Fix missing comma in setup.py
StevenMaude 6638e58
Fix byte concatenation in Python 3.4
StevenMaude 126630d
Merge branch 'master' into cleanup-mt2-redux
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,7 +1,12 @@ | ||
| *.swp | ||
| *.egg-info | ||
| *.pyc | ||
| *.eggs | ||
| *.DS_Store | ||
| */_build/* | ||
| *.py~ | ||
| *.~lock.*# | ||
| .coverage | ||
| dist/* | ||
| .tox/* | ||
| pyenv3 |
This file was deleted.
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,10 +1,4 @@ | ||
| run: build | ||
| @docker run \ | ||
| --rm \ | ||
| -ti \ | ||
| messytables | ||
| test: | ||
| nosetests --with-coverage --cover-package=messytables --cover-erase | ||
| | ||
| build: | ||
| @docker build -t messytables . | ||
| | ||
| .PHONY: run build | ||
| .PHONY: run build test |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,15 +1,11 @@ | ||
| # Parsing for messy tables | ||
| | ||
| [](https://travis-ci.org/okfn/messytables) | ||
| [](https://coveralls.io/r/okfn/messytables?branch=master) | ||
| [](https://pypi.python.org/pypi/messytables/) | ||
| # Parsing for messy tables [](https://travis-ci.org/okfn/messytables) [](https://coveralls.io/r/okfn/messytables?branch=master) [](https://pypi.python.org/pypi/messytables/) | ||
| | ||
| A library for dealing with messy tabular data in several formats, guessing types and detecting headers. | ||
| | ||
| See the documentation at: https://messytables.readthedocs.io | ||
| | ||
| Find the package at: https://pypi.python.org/pypi/messytables | ||
| | ||
| See CONTRIBUTING.md for how to send patches, run tests. | ||
| See ``CONTRIBUTING.md`` for how to send patches, run tests. | ||
| | ||
| **Contact**: Open Knowledge Labs - http://okfnlabs.org/contact/. We especially recommend the forum: http://discuss.okfn.org/category/open-knowledge-labs/ |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,25 +1,21 @@ | ||
| | ||
| from messytables.util import offset_processor, null_processor | ||
| from messytables.headers import headers_guess, headers_processor, headers_make_unique | ||
| from messytables.headers import headers_guess, headers_processor | ||
| from messytables.headers import headers_make_unique | ||
| from messytables.types import type_guess, types_processor | ||
| from messytables.types import StringType, IntegerType, FloatType, \ | ||
| DecimalType, DateType, DateUtilType, BoolType | ||
| from messytables.error import ReadError | ||
| | ||
| from messytables.core import Cell, TableSet, RowSet, seekable_stream | ||
| from messytables.commas import CSVTableSet, CSVRowSet | ||
| from messytables.buffered import seekable_stream | ||
| from messytables.core import Cell, TableSet, RowSet | ||
| from messytables.commas import CSVTableSet, CSVRowSet, TSVTableSet | ||
| from messytables.ods import ODSTableSet, ODSRowSet | ||
| from messytables.excel import XLSTableSet, XLSRowSet | ||
| | ||
| # XLSXTableSet has been deprecated and its functionality is now provided by | ||
| # XLSTableSet. This is to retain backwards compatibility with anyone | ||
| # constructing XLSXTableSet directly (rather than using any_tableset) | ||
| XLSXTableSet = XLSTableSet | ||
| XLSXRowSet = XLSRowSet | ||
| | ||
| from messytables.zip import ZIPTableSet | ||
| from messytables.html import HTMLTableSet, HTMLRowSet | ||
| from messytables.pdf import PDFTableSet, PDFRowSet | ||
| from messytables.any import any_tableset, AnyTableSet | ||
| from messytables.any import any_tableset | ||
| | ||
| from messytables.jts import rowset_as_jts, headers_and_typed_as_jts | ||
| | ||
| import warnings | ||
| warnings.filterwarnings('ignore', "Coercing non-XML name") |
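
The `warnings.filterwarnings('ignore', "Coercing non-XML name")` call kept at the end of this file relies on the standard library treating the message argument as a regular expression matched against the start of the warning text. The sketch below illustrates that behaviour in isolation; the `emit` helper and the example message strings are made up for the demonstration and are not part of messytables.

```python
import warnings


def emit(message):
    """Raise a UserWarning and return the messages that survive the filters.

    Hypothetical helper for illustration only.
    """
    with warnings.catch_warnings(record=True) as caught:
        # Record everything by default...
        warnings.simplefilter("always")
        # ...except warnings whose text starts with this pattern. Filters
        # added later are consulted first, so this one takes precedence.
        warnings.filterwarnings("ignore", "Coercing non-XML name")
        warnings.warn(message, UserWarning)
    return [str(w.message) for w in caught]


# Suppressed: the text begins with the pattern.
print(emit("Coercing non-XML name: my column"))
# Not suppressed: the pattern appears, but not at the start.
print(emit("Note: Coercing non-XML name happened"))
```

Because the filter is installed at import time, it applies process-wide to any matching warning, not only to those raised by messytables itself.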
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,89 @@ | ||
| import io | ||
| | ||
| BUFFER_SIZE = 4096 | ||
| | ||
| | ||
| def seekable_stream(fileobj): | ||
| try: | ||
| fileobj.seek(0) | ||
| # if we got here, the stream is seekable | ||
| return fileobj | ||
| except: | ||
| # otherwise seek failed, so slurp in stream and wrap | ||
| # it in a BytesIO | ||
| return BufferedFile(fileobj) | ||
| | ||
| | ||
| class BufferedFile(object): | ||
| """A buffered file that preserves the beginning of a stream.""" | ||
| | ||
| def __init__(self, fp, buffer_size=BUFFER_SIZE + 2): | ||
| self.data = io.BytesIO() | ||
| self.fp = fp | ||
| self.offset = 0 | ||
| self.len = 0 | ||
| self.fp_offset = 0 | ||
| self.buffer_size = buffer_size | ||
| | ||
| def _next_line(self): | ||
| try: | ||
| return self.fp.readline() | ||
| except AttributeError: | ||
| return next(self.fp) | ||
| | ||
| def _read(self, n): | ||
| return self.fp.read(n) | ||
| | ||
| @property | ||
| def _buffer_full(self): | ||
| return self.len >= self.buffer_size | ||
| | ||
| def readline(self): | ||
| if self.len < self.offset < self.fp_offset: | ||
| raise BufferError('Line is not available anymore') | ||
| if self.offset >= self.len: | ||
| line = self._next_line() | ||
| self.fp_offset += len(line) | ||
| | ||
| self.offset += len(line) | ||
| | ||
| if not self._buffer_full: | ||
| self.data.write(line) | ||
| self.len += len(line) | ||
| else: | ||
| line = self.data.readline() | ||
| self.offset += len(line) | ||
| return line | ||
| | ||
| def read(self, n=-1): | ||
| if n == -1: | ||
| # if the request is to do a complete read, then do a complete | ||
| # read. | ||
| self.data.seek(self.offset) | ||
| return self.data.read(-1) + self.fp.read(-1) | ||
| | ||
| if self.len < self.offset < self.fp_offset: | ||
| raise BufferError('Data is not available anymore') | ||
| if self.offset >= self.len: | ||
| byte = self._read(n) | ||
| self.fp_offset += len(byte) | ||
| | ||
| self.offset += len(byte) | ||
| | ||
| if not self._buffer_full: | ||
| self.data.write(byte) | ||
| self.len += len(byte) | ||
| else: | ||
| byte = self.data.read(n) | ||
| self.offset += len(byte) | ||
| return byte | ||
| | ||
| def tell(self): | ||
| return self.offset | ||
| | ||
| def seek(self, offset): | ||
| if self.len < offset < self.fp_offset: | ||
| raise BufferError('Cannot seek because data is not buffered here') | ||
| self.offset = offset | ||
| if offset < self.len: | ||
| self.data.seek(offset) |
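
The idea behind `seekable_stream` and `BufferedFile` in the new module above can be sketched in a few lines: probe the stream with `seek(0)`, and if that fails, wrap it in an object that remembers the first bytes so they can be re-read. What follows is a deliberately simplified illustration, not the PR's implementation. `NonSeekable`, `HeadBuffer` and `seekable` are hypothetical names, only sized `read(n)` calls are handled, and the bare `except:` from the diff is narrowed to specific exceptions as an assumption about what a failed seek raises.

```python
import io


class NonSeekable:
    """Hypothetical stand-in for a pipe or socket: read() works, seek() fails."""

    def __init__(self, payload):
        self._inner = io.BytesIO(payload)

    def read(self, n=-1):
        return self._inner.read(n)

    def seek(self, offset):
        raise io.UnsupportedOperation("seek")


class HeadBuffer:
    """Simplified sketch: preserve the beginning of a stream for re-reading."""

    def __init__(self, fp, limit=4096):
        self.fp = fp
        self.limit = limit   # stop growing the head once it reaches this size
        self.head = b""      # bytes preserved from the start of the stream
        self.pos = 0         # logical read position seen by the caller
        self.consumed = 0    # bytes pulled from the underlying stream so far

    def read(self, n):
        if self.pos < len(self.head):
            # Serve from the preserved head (may return fewer than n bytes).
            chunk = self.head[self.pos:self.pos + n]
        elif self.pos == self.consumed:
            # At the live end of the stream: pull fresh data.
            chunk = self.fp.read(n)
            if len(self.head) == self.consumed and len(self.head) < self.limit:
                self.head += chunk
            self.consumed += len(chunk)
        else:
            # Between the preserved head and the live position: data is gone.
            raise BufferError("data is no longer buffered")
        self.pos += len(chunk)
        return chunk

    def seek(self, offset):
        if not (0 <= offset <= len(self.head) or offset == self.consumed):
            raise BufferError("cannot seek outside the buffered head")
        self.pos = offset


def seekable(fileobj):
    """Return fileobj if it supports seek(0), else wrap it in a HeadBuffer."""
    try:
        fileobj.seek(0)
        return fileobj
    except (AttributeError, OSError):
        return HeadBuffer(fileobj)


stream = seekable(NonSeekable(b"a,b\n1,2\n3,4\n"))
sample = stream.read(4)   # peek at the first line, e.g. for header guessing
stream.seek(0)            # rewind; the head is replayed from memory
print(stream.read(4) == sample)
```

The trade-off is the same one the diff makes with `BUFFER_SIZE`: only a bounded prefix is kept in memory, so rewinding works for sampling the start of a file but raises `BufferError` once the caller tries to read from a region that was streamed past without being buffered.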