PDF Table Converters


tabula-py

  • tabula-py is a simple Python wrapper for tabula-java. It needs a Java runtime installed, and it returns extracted tables as pandas DataFrames.
  • Installation
pip install tabula-py
  • Usage
import tabula

# Read every page of the PDF into a list of DataFrames (one per detected table)
dfs = tabula.read_pdf("test.pdf", pages='all')

# Read a remote PDF by URL into a list of DataFrames (without pages=, only the first page is read)
dfs2 = tabula.read_pdf("https://github.com/tabulapdf/tabula-java/raw/master/src/test/resources/technology/tabula/arabic.pdf")

# Convert all tables in the PDF into a single CSV file
tabula.convert_into("test.pdf", "output.csv", output_format="csv", pages='all')

# Convert every PDF in a directory to CSV
tabula.convert_into_by_batch("input_directory", output_format='csv', pages='all')
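If you want the converted files written to a separate directory, or more control over which files are processed, one option is a plain loop that calls a converter per PDF. The sketch below assumes the converter is called like `tabula.convert_into(input_path, output_path, output_format=..., pages=...)`, as in the examples above; the function name `convert_directory` and its parameters are hypothetical, and the converter is passed in as an argument so the loop itself does not depend on Java or tabula-py.

```python
from pathlib import Path
from typing import Callable, List


def convert_directory(
    input_dir: str,
    output_dir: str,
    convert: Callable[..., None],
    output_format: str = "csv",
    pages: str = "all",
) -> List[Path]:
    """Convert every *.pdf in input_dir into output_dir/<stem>.<output_format>.

    Hypothetical helper: `convert` is assumed to accept
    (input_path, output_path, output_format=..., pages=...),
    which matches how tabula.convert_into is called above.
    """
    src = Path(input_dir)
    dst = Path(output_dir)
    dst.mkdir(parents=True, exist_ok=True)  # create the target directory if missing

    written = []
    # Non-recursive; sorted so files are processed in a stable order
    for pdf in sorted(src.glob("*.pdf")):
        target = dst / f"{pdf.stem}.{output_format}"
        convert(str(pdf), str(target), output_format=output_format, pages=pages)
        written.append(target)
    return written
```

With tabula-py installed, `convert_directory("input_directory", "csv_output", tabula.convert_into)` would write one CSV per PDF into `csv_output`. Injecting the converter also makes the loop easy to test with a stand-in function.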

tabulizer (R)

  • tabulizer is an R package that provides bindings to tabula-java
  • Installation
install.packages("tabulizer")
  • Usage
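library("tabulizer")
# Path to the example PDF shipped with the package
f <- system.file("examples", "data.pdf", package = "tabulizer")
# Extract every table; the result is a list with one matrix per table
out1 <- extract_tables(f)
# Inspect the structure of the result
str(out1)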