Channel: Active questions tagged python - Stack Overflow
Viewing all articles
Browse latest Browse all 23131

Python : XML file downloaded from S3 full of string escaping characters


I have a number of XML files that I have added to S3 (a localstack server). I can view these files through Cyberduck and they are valid XML files. However, when I download the objects, the XML data is wrapped in double quotes, with each double quote in the document escaped and each line ending in a literal \n. I have made sure the response content type is "text/xml".

    s3 = boto3.client('s3',
                      config=s3_config,
                      endpoint_url=endpoint_url,
                      aws_access_key_id='foo',
                      aws_secret_access_key='bar',
                     )
    try:
        r = s3.get_object(Bucket=bucket, Key=key)
        return Response(r['Body'].read().decode("utf-8"))
    except Exception as e:
        raise(e)

which results in a response of

"<rpc-reply xmlns:....">\n<data>\n<configuration>\n    <server>meanwhileinhell</server>\n<security>\n  <group>\n  <name>mih-</name>\n<system>\n            <scripts>\n             ...             ...             ...</configuration>\n</data>\n</rpc-reply>\n"
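For what it's worth, this shape (outer double quotes, inner quotes escaped, newlines shown as \n) is what you get when a string is serialized as JSON. The sketch below only illustrates that effect with the standard library. It assumes, without confirmation from the code above, that `Response` JSON-encodes its payload, and `xml_text` is a made-up sample rather than the real S3 object.

```python
import json

# Hypothetical sample standing in for the decoded S3 object body.
xml_text = '<rpc-reply xmlns="urn:example">\n<data>\n</data>\n</rpc-reply>\n'

# JSON-encoding a str wraps it in double quotes and escapes
# the inner double quotes and the newline characters.
encoded = json.dumps(xml_text)
print(encoded)

# Decoding gives back the original text unchanged, which would mean
# the data read from S3 was fine and only the rendering step altered it.
decoded = json.loads(encoded)
print(decoded == xml_text)
```

If that assumption holds, the thing to look at is how the response is rendered, not how the object is read from S3.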

I cannot seem to ensure this is a raw XML response body, with all of the escaping removed. Here are some of the other implementations I have tried:

    from io import BytesIO
    f = BytesIO()
    s3.download_fileobj(bucket, key, f)
    return Response(f.getvalue(), content_type="text/xml")

    from xml.etree import ElementTree
    tree = ElementTree.fromstring(r['Body'].read())
    return Response(tree)
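A side note on the ElementTree attempt: `ElementTree.fromstring` returns an `Element` object, not text, so it would need to be serialized again with `ElementTree.tostring` before it could serve as a response body. A minimal sketch, where `raw` is a hypothetical stand-in for `r['Body'].read()`:

```python
from xml.etree import ElementTree

# Hypothetical sample bytes standing in for r['Body'].read().
raw = b"<data><server>meanwhileinhell</server></data>"

# fromstring parses the bytes into an Element tree (not a str or bytes).
root = ElementTree.fromstring(raw)
print(type(root).__name__, root.tag)

# tostring serializes the Element back to bytes.
body = ElementTree.tostring(root, encoding="utf-8")
print(isinstance(body, bytes))
```

This round trip only validates and re-serializes the document. It does not by itself change how a response class renders the body.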

I have also tried using pickle and BeautifulSoup with no further success. I have not tried this with another type of file such as a jpg, but why can't I get the actual raw binary data from the objects? The files I am downloading are <50KB.

