tabula extract table from pdf remove line break

You need to add the lattice=True parameter. Replace

file1 = "path_to_pdf_file"
table = tabula.read_pdf(file1,pages=1)
table[0]

with

file1 = "path_to_pdf_file"
table = tabula.read_pdf(file1, pages=1, lattice=True)
table[0]

All of this is according to the tabula-py documentation.

Here is an example:

See the article "https://effectivehealthcare.ahrq.gov/sites/default/files/pdf/methods-guidance-tests-bias_methods.pdf"

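import tabula

# Local copy of the article linked above
file1 = r"C:\Users\s-degossondevarennes\.......\Desktop\methods-guidance-tests-bias_methods.pdf"
# lattice=True makes tabula use the ruling lines of the table to find cell boundaries
table = tabula.read_pdf(file1, pages=3, lattice=True)

# read_pdf returns a list of DataFrames; take the first table on the page
df = table[0]
# Drop the columns that are not needed for this example
df = df.drop(['Unnamed: 1', 'Unnamed: 2', 'Description', 'Unnamed: 3'], axis=1)
df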

returns:

     Unnamed: 0  \
0                                    NaN   
1                        Spectrum effect   
2                           Context bias   
3                         Selection bias   
4                                    NaN   
5            Variation in test execution   
6           Variation in test technology   
7                      Treatment paradox   
8               Disease progression bias   
9                                    NaN   
10     Inappropriate reference\rstandard   
11        Differential verification bias   
12             Partial verification bias   
13                                   NaN   
14                           Review bias   
15                  Clinical review bias   
16                    Incorporation bias   
17                  Observer variability   
18                                   NaN   
19    Handling of indeterminate\rresults   
20  Arbitrary choice of threshold\rvalue   

                            Source of Systematic Bias  
0                                          Population  
1   Tests may perform differently in various sampl...  
2   Prevalence of the target condition varies acco...  
3   The selection process determines the compositi...  
4                Test Protocol: Materials and Methods  
5   A sufficient description of the execution of i...  
6   When the characteristics of a medical test cha...  
7   Occurs when treatment is started on the basis ...  
8   Occurs when the index test is performed an unu...  
9       Reference Standard and Verification Procedure  
10  Errors of imperfect reference standard bias th...  
11  Part of the index test results is verified by ...  
12  Only a selected sample of patients who underwe...  
13                                     Interpretation  
14  Interpretation of the index test or reference ...  
15  Availability of clinical data such as age, sex...  
16  The result of the index test is used to establ...  
17  The reproducibility of test results is one det...  
18                                           Analysis  
19  A medical test can produce an uninterpretable ...  
20  The selection of the threshold value for the i...  

The three dots in the column Source of Systematic Bias are only pandas truncating the display. Everything that was in that cell, line breaks included, is considered a single cell (item), not multiple cells. Another proof of that is

df.iloc[2,1]

returns the cell content:

'Prevalence of the target condition varies according to setting and may affect\restimates of test performance. Interpreters may consider test results to be\rpositive more frequently in settings with higher disease prevalence, which may\ralso affect estimates of test performance.'
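The line breaks from the PDF are still inside the cell as \r characters. If you want them gone, here is a minimal sketch that replaces them with spaces. It uses a small stand-in DataFrame with values copied from the output above instead of the real tabula result, and strip_line_breaks is just a helper name I made up for illustration:

```python
import pandas as pd

# Stand-in for the DataFrame returned by tabula (values copied from the output above)
df = pd.DataFrame({
    "Unnamed: 0": ["Inappropriate reference\rstandard", float("nan")],
    "Source of Systematic Bias": [
        "Prevalence of the target condition varies according to setting and may affect\restimates of test performance.",
        "Analysis",
    ],
})

def strip_line_breaks(frame):
    """Return a copy of frame with \\r / \\n runs in text columns replaced by one space."""
    out = frame.copy()
    for col in out.columns:
        # Only touch text columns; numeric columns are left unchanged
        if pd.api.types.is_object_dtype(out[col]) or pd.api.types.is_string_dtype(out[col]):
            out[col] = out[col].str.replace(r"[\r\n]+", " ", regex=True)
    return out

cleaned = strip_line_breaks(df)
print(cleaned.iloc[0, 0])
```

With the real data you would call strip_line_breaks(df) on the df from the example above. Missing values (NaN) stay as they are.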

There must be something different about your PDF. If it's available online, share the link and I'll take a look.