Why is the output printed on two lines?

I have a question about my code below:

#Python code to scrape the shipment URLs 
from bs4 import BeautifulSoup
import urllib.request
import urllib.error
import urllib

# read urls of websites from text file > change it to where you keep the file
list_open = open(r"C:\***\data.csv")
# skip the header
read_list = list_open.readlines()[1:]
file_path = "ShipmentUpdates.txt"

for url in read_list:
    soup = BeautifulSoup(urllib.request.urlopen(url).read(), "html5lib")
    # parse shipment info
    shipment = soup.find_all("span")
    Preparation = shipment[0]
    Sent = shipment[1]
    InTransit = shipment[2]
    Delivered = shipment[3]

    with open(file_path, "a") as f:
        line = f"{url} ; Preparation {Preparation.getText()}; Sent {Sent.getText()}; InTransit {InTransit.getText()}; Delivered {Delivered.getText()}"
        print(line)
        f.write(line + '\n')

my output is:

http://carmoov.fr/CfQd
 ; Preparation on 06/01/2022 at 17:45; Sent on 06/01/2022 at 18:14; InTransit ; Delivered on 07/01/2022 at 10:31
http://carmoov.fr/CfQz
 ; Preparation on 06/01/2022 at 11:18; Sent on 06/01/2022 at 18:14; InTransit ; Delivered on 07/01/2022 at 11:56

But I want the URL to be on the same line as the shipment info.

Could you help me check what the problem is? Thank you!


Solution 1:

This most likely happens because each URL ends with a newline character \n, since you read the URLs as individual lines from list_open. To prevent this, replace {url} with {url.strip()} in your f-string.
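
Here is a small, self-contained sketch of the effect (the URL value is just taken from your sample output):

# a line read with readlines() keeps its trailing newline,
# which pushes the rest of the row onto a second line
url = "http://carmoov.fr/CfQd\n"
print(f"{url} ; Preparation on 06/01/2022")          # wraps onto two lines
print(f"{url.strip()} ; Preparation on 06/01/2022")  # stays on one line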

Solution 2:

Just use url = url.strip(). This will remove the newline character '\n':

from bs4 import BeautifulSoup
import urllib.request

# read urls of websites from text file > change it to where you keep the file
list_open = open(r"C:\***\data.csv")
# skip the header
read_list = list_open.readlines()[1:]
file_path = "ShipmentUpdates.txt"

for url in read_list:
    soup = BeautifulSoup(urllib.request.urlopen(url).read(), "html5lib")
    # parse shipment info
    shipment = soup.find_all("span")
    Preparation = shipment[0]
    Sent = shipment[1]
    InTransit = shipment[2]
    Delivered = shipment[3]
    # strip the trailing newline from the URL before building the output line
    url = url.strip()
    with open(file_path, "a") as f:
        line = f"{url} ; Preparation {Preparation.getText()}; Sent {Sent.getText()}; InTransit {InTransit.getText()}; Delivered {Delivered.getText()}"
        print(line)
        f.write(line + '\n')

Solution 3:

An alternative is to use a generator expression to strip each URL right as you read it:

list_open = open(r"C:\***\data.csv")
#skips the header
read_list = (url.strip() for url in list_open.readlines()[1:])

It's important to note that doing this means you can only iterate over read_list once (which is all your code sample does), because this expression creates a generator rather than a list.

If you wanted to access read_list again after your initial for-loop, you could convert it to a tuple or list by doing read_list = tuple(url.strip() for url in list_open.readlines()[1:]) or read_list = list(url.strip() for url in list_open.readlines()[1:]).
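
For example, this sketch (using a small hypothetical list of lines in place of your CSV) shows the difference between the generator and a materialised list:

# hypothetical stand-in for list_open.readlines()[1:]
lines = ["http://carmoov.fr/CfQd\n", "http://carmoov.fr/CfQz\n"]

gen = (url.strip() for url in lines)   # generator: single pass only
print(list(gen))   # ['http://carmoov.fr/CfQd', 'http://carmoov.fr/CfQz']
print(list(gen))   # [] -- the generator is already exhausted

urls = [url.strip() for url in lines]  # list: can be iterated repeatedly
print(urls)
print(urls)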