diff --git a/README.md b/README.md
index 829d6f3..e8199ea 100644
--- a/README.md
+++ b/README.md
@@ -355,7 +355,7 @@ Note: The methods above are built on Pillow's [`ImageDraw` methods](http://pillo
 
 ## Extracting tables
 
-`pdfplumber`'s approach to table detection borrows heavily from [Anssi Nurminen's master's thesis](http://dspace.cc.tut.fi/dpub/bitstream/handle/123456789/21520/Nurminen.pdf?sequence=3), and is inspired by [Tabula](https://github.com/tabulapdf/tabula-extractor/issues/16). It works like this:
+`pdfplumber`'s approach to table detection borrows heavily from [Anssi Nurminen's master's thesis](https://trepo.tuni.fi/bitstream/handle/123456789/21520/Nurminen.pdf?sequence=3), and is inspired by [Tabula](https://github.com/tabulapdf/tabula-extractor/issues/16). It works like this:
 
 1. For any given PDF page, find the lines that are (a) explicitly defined and/or (b) implied by the alignment of words on the page.
 2. Merge overlapping, or nearly-overlapping, lines.