Unable to parse XML if there are any wrong characters or wrong UTF #2269

Closed
stephaniemartin7 opened this issue Mar 4, 2023 · 4 comments

Comments

@stephaniemartin7

Some API responses contain characters or UTF encoding that Karate cannot process.
When parsing an XML response from an API, I was getting the error: SAXParseException; 0; Content is not allowed in prolog.

Even though I tried to replace part of the String, I could not get it to work.

My solution was to create a Java function and call it in my Karate file. The Java function replaces all invalid characters, fixes the UTF and removes the namespaces, making it easier to parse the XML.

In my .feature file, after making a call to an API, I am calling my Java function with:
* xml response = cleanUpKarateXml(response)

My Java function in the backend looks like:

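    import java.io.StringReader;
    import java.io.StringWriter;
    import javax.xml.parsers.DocumentBuilder;
    import javax.xml.parsers.DocumentBuilderFactory;
    import javax.xml.transform.OutputKeys;
    import javax.xml.transform.Transformer;
    import javax.xml.transform.TransformerFactory;
    import javax.xml.transform.dom.DOMSource;
    import javax.xml.transform.stream.StreamResult;
    import org.w3c.dom.Document;
    import org.w3c.dom.Element;
    import org.w3c.dom.Node;
    import org.w3c.dom.NodeList;
    import org.xml.sax.InputSource;

    public static String cleanUpKarateXml(Object oldxml) {
        try {
            String xml = oldxml.toString();
            // Serializer configured for plain, unindented XML output
            TransformerFactory tf = TransformerFactory.newInstance();
            Transformer transformer = tf.newTransformer();
            transformer.setOutputProperty(OutputKeys.METHOD, "xml");
            transformer.setOutputProperty(OutputKeys.INDENT, "no"); // valid values are "yes" and "no"
            // Parse the response string into a DOM document
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = factory.newDocumentBuilder();
            InputSource inputSource = new InputSource(new StringReader(xml));
            Document xmlDoc = builder.parse(inputSource);
            // Rebuild the root element without its attributes, which drops its xmlns declarations
            Node root = xmlDoc.getDocumentElement();
            NodeList rootchildren = root.getChildNodes();
            Element newroot = xmlDoc.createElement(root.getNodeName());
            for (int i = 0; i < rootchildren.getLength(); i++) {
                newroot.appendChild(rootchildren.item(i).cloneNode(true));
            }
            xmlDoc.replaceChild(newroot, root);
            // Serialize the modified document back to a string
            DOMSource requestXMLSource = new DOMSource(xmlDoc.getDocumentElement());
            StringWriter requestXMLStringWriter = new StringWriter();
            StreamResult requestXMLStreamResult = new StreamResult(requestXMLStringWriter);
            transformer.transform(requestXMLSource, requestXMLStreamResult);
            String modifiedRequestXML = requestXMLStringWriter.toString();

            return modifiedRequestXML;
        } catch (Exception e) {
            System.out.println("Could not parse message as xml: " + e.getMessage());
        }
        return "";
    }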
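
For anyone reusing this: "Content is not allowed in prolog" is commonly triggered by a byte-order mark or other stray characters ahead of the first "<", and the function above still hands the raw string to the parser. Below is a minimal sketch of a pre-cleaning step that could run before parsing. The class and method names (XmlPreClean, preClean, isAllowedXmlChar) are hypothetical and not part of Karate, and it assumes the response has already been decoded into a Java String.

```java
// Hypothetical helper, not part of Karate: cleans a decoded response string
// before it is handed to an XML parser.
class XmlPreClean {

    static String preClean(String raw) {
        if (raw == null) {
            return "";
        }
        // Drop anything before the first '<' (BOM, whitespace, stray characters).
        int start = raw.indexOf('<');
        String body = start >= 0 ? raw.substring(start) : raw;

        StringBuilder out = new StringBuilder(body.length());
        // Walk by code point so a valid surrogate pair is treated as one character.
        for (int i = 0; i < body.length(); ) {
            int cp = body.codePointAt(i);
            i += Character.charCount(cp);
            if (isAllowedXmlChar(cp)) {
                out.appendCodePoint(cp);
            }
        }
        return out.toString();
    }

    // Ranges permitted by the XML 1.0 "Char" production.
    static boolean isAllowedXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }
}
```

With a step like this, the first line of cleanUpKarateXml could become String xml = XmlPreClean.preClean(oldxml.toString()); whether that is enough depends on what the API actually returns.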

I am raising this issue because I believe having this function as part of Karate would be very helpful to a lot of people.

@ptrthomas
Member

@stephaniemartin7 thanks for raising this

I just remembered this thread, can you see if this makes a difference: #1587 (comment)

* configure xmlNamespaceAware = true
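
For readers unsure what namespace awareness changes at the parser level, here is a small plain-JAXP sketch. It is an assumption that Karate's xmlNamespaceAware setting corresponds to DocumentBuilderFactory.setNamespaceAware; the class name NamespaceAwareDemo and the sample XML are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Illustrative only: shows how the same document looks to DOM code
// with and without namespace processing enabled.
class NamespaceAwareDemo {

    // Parses the XML and returns its root element.
    static Element parseRoot(String xml, boolean namespaceAware) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(namespaceAware);
        DocumentBuilder builder = factory.newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml))).getDocumentElement();
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample document with a prefixed root element.
        String xml = "<ns:order xmlns:ns=\"http://example.com/ns\"><ns:id>1</ns:id></ns:order>";

        Element plain = parseRoot(xml, false);
        Element aware = parseRoot(xml, true);

        // Without namespace processing, only the raw qualified name is meaningful.
        System.out.println(plain.getNodeName() + " / " + plain.getNamespaceURI());
        // With namespace processing, prefix and URI are resolved separately.
        System.out.println(aware.getLocalName() + " / " + aware.getNamespaceURI());
    }
}
```

Note that this setting only affects how prefixes are interpreted; it does not remove characters that appear before the XML declaration.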

@ptrthomas
Member

Closing this as won't fix, since there has been no response. Anyone is welcome to contribute a PR against this issue, though.

@stephaniemartin7
Author

@ptrthomas This did not fix the issue, in case you'd like to reopen.
