Subject: Handling HTML entities in Java's HTMLEditorKit
Category: Computers > Programming
Asked by: vespasian-ga
List Price: $10.00
Posted: 26 Apr 2003 14:02 PDT
Expires: 07 May 2003 23:51 PDT
Question ID: 195868
There is no answer at this time.
Subject: Re: Handling HTML entities in Java's HTMLEditorKit
From: eadfrith-ga on 26 Apr 2003 19:10 PDT

How are you invoking the parse operation? It sounds like the DTD is not being set on the parser. The parser uses the DTD to resolve entities found in the source, so with no DTD the entities can't be resolved. This could happen if you're creating your own Parser (a DocumentParser, maybe?). Anyway, this code will create a parser with an HTML DTD set. Try it and see if it fixes your problem.

    import javax.swing.text.html.HTMLEditorKit;
    import javax.swing.text.html.parser.ParserDelegator;

    // ParserDelegator is an HTMLEditorKit.Parser that sets up the default HTML DTD
    HTMLEditorKit.Parser p = new ParserDelegator();
    p.parse(reader, yourParserCallback, true);

Cheers,
Eadfrith
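To see what the default DTD does to entities, here is a minimal sketch that runs a string through ParserDelegator and collects the text handed to the callback. The class name DelegatorDemo and the method extractText are made up for illustration; only ParserDelegator, HTMLEditorKit.ParserCallback and handleText come from the Swing API.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical demo class, not part of the original question.
final class DelegatorDemo {

    /** Parses the HTML with the default DTD and returns the text passed to handleText. */
    static String extractText(String html) throws IOException {
        final StringBuilder out = new StringBuilder();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                out.append(data); // entities have already been resolved at this point
            }
        };
        new ParserDelegator().parse(new StringReader(html), callback, true);
        return out.toString();
    }
}
```

Calling DelegatorDemo.extractText on a fragment containing &eacute; should give back text with the accented character in place of the entity, which is the resolution behaviour being discussed in this thread.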
Subject: Re: Handling HTML entities in Java's HTMLEditorKit
From: saman007uk20-ga on 27 Apr 2003 01:52 PDT

I suggest you add a backslash before the "&", so that an entity such as "&nbsp;" would read "\&nbsp;". This tells Java to ignore the "&" as a special character and just print it out.

Regards,
Saman007uk20
Subject: Re: Handling HTML entities in Java's HTMLEditorKit
From: eadfrith-ga on 27 Apr 2003 12:30 PDT

Vespasian,

So, the problem wasn't that a DTD wasn't being set. I dug a little deeper. As it parses the HTML, the parser (in fact a DocumentParser) uses its DTD to resolve entities, via the two forms of the getEntity() method in the DTD class. It seems that if these methods return null, indicating that they don't recognise the entity, then the parser outputs the entity unchanged, which is what you want. So, if we could plug in our own DTD that returned null when asked for an entity, we'd be OK.

The problem is that the whole HTML parser API is very unfriendly, and it's pretty tricky to change the DTD that it uses. I've hacked together the code below, which attempts to replace the standard DTD with our own that overrides the getEntity methods to return null.

    import java.io.BufferedInputStream;
    import java.io.ByteArrayInputStream;
    import java.io.InputStreamReader;
    import javax.swing.text.html.HTMLEditorKit.ParserCallback;
    import javax.swing.text.html.parser.DTD;
    import javax.swing.text.html.parser.DocumentParser;
    import javax.swing.text.html.parser.Entity;
    import javax.swing.text.html.parser.ParserDelegator;

    /**
     * The DTDEntityFilter class extends DTD but returns null when
     * asked to resolve an entity.
     */
    public class DTDEntityFilter extends DTD {

        // Singleton instance
        private static DTD c_instance;

        // Use lazy instantiation to create the singleton
        public static DTD getInstance() {
            if (null == c_instance) {
                DTD dtd = new DTDEntityFilter();
                c_instance = ParserDelegatorExt.createDTD(dtd, "html32");
            }
            return c_instance;
        }

        /**
         * We have to extend the ParserDelegator class in order to access
         * the createDTD method, which is protected.
         */
        static class ParserDelegatorExt extends ParserDelegator {
            public static DTD createDTD(DTD dtd, String name) {
                return ParserDelegator.createDTD(dtd, name);
            }
        }

        public DTDEntityFilter() {
            super("html32");
        }

        public Entity getEntity(int ch) {
            return null;
        }

        public Entity getEntity(String name) {
            return null;
        }

        public static void main(String[] args) {
            // "response" stands for the byte[] holding your HTML, and
            // YourParserCallbackClass for your own ParserCallback subclass.
            ByteArrayInputStream responseStream = new ByteArrayInputStream(response);
            BufferedInputStream bufResponse = new BufferedInputStream(responseStream);
            ParserCallback callback = new YourParserCallbackClass();
            DocumentParser parser = new DocumentParser(DTDEntityFilter.getInstance());
            try {
                parser.parse(new InputStreamReader(bufResponse), callback, true);
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }

Let me know if this works.

Cheers,
Eadfrith
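For trying the null-entity DTD idea without a live response, here is a compact, self-contained variant. It is only a sketch: the names NullEntityDtd, Bridge and extractText are hypothetical, the callback simply concatenates the text it receives, and it assumes (as the comment above does) that the parser emits an entity unchanged when getEntity returns null.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.DTD;
import javax.swing.text.html.parser.DocumentParser;
import javax.swing.text.html.parser.Entity;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical name: a DTD that declines to resolve any entity.
final class NullEntityDtd extends DTD {

    // Exists only to reach ParserDelegator's protected static createDTD method.
    private static final class Bridge extends ParserDelegator {
        static DTD load(DTD dtd, String name) {
            return ParserDelegator.createDTD(dtd, name);
        }
    }

    private NullEntityDtd() {
        super("html32");
    }

    /** Builds the filter DTD and fills it with the standard html32 definitions. */
    static DTD create() {
        return Bridge.load(new NullEntityDtd(), "html32");
    }

    @Override
    public Entity getEntity(int ch) {
        return null; // pretend no entity is known
    }

    @Override
    public Entity getEntity(String name) {
        return null; // pretend no entity is known
    }

    /** Parses the HTML with the filter DTD and returns the text passed to handleText. */
    static String extractText(String html) throws IOException {
        final StringBuilder out = new StringBuilder();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                out.append(data);
            }
        };
        new DocumentParser(create()).parse(new StringReader(html), callback, true);
        return out.toString();
    }
}
```

With this in place, NullEntityDtd.extractText on a fragment containing &eacute; should return text in which the named entity is still spelled out, rather than replaced by the accented character.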
Subject: Re: Handling HTML entities in Java's HTMLEditorKit
From: eadfrith-ga on 27 Apr 2003 18:55 PDT

Vespasian,

I'm not a Google researcher, so this one is on the house :-)

Too bad that we couldn't solve the numeric entity problem. Sun's implementation of the HTML parser is pretty lousy. As you've discovered, it never calls getEntity(int). Instead, it resolves each numeric entity by replacing it with a single character whose Unicode value is the one specified by the entity. This is buried in a private method in the Parser implementation, so we can't alter the standard behaviour. It would have been far better if they had called getEntity(int) and put the same code in the default implementation of this method in the DTD class. This would have let us modify the behaviour, as we have with textual entities.

Anyway, I think you now have two options:

1. Have your handleText method re-encode the entities. Presumably your current handleText method does something simple like this:

    public void handleText(char[] data, int pos) {
        responseWriter.write(data);
    }

You could do something like this instead:

    public void handleText(char[] data, int pos) {
        for (int i = 0; i < data.length; i++) {
            if (data[i] > 127) {
                // Cast to int so the numeric value is written, not the character itself
                responseWriter.write("&#" + (int) data[i] + ";");
            } else {
                responseWriter.write(data, i, 1);
            }
        }
    }

The problem with this approach is that if characters that have textual entities were instead encoded using numeric entities, then you'd miss them. For example, if the HTML source used &#60; instead of &lt; then you'd incorrectly pass it through unencoded. I don't think there's anything to be done about this.

2. The other approach is to reconsider saman007uk20's solution. You could do this by wrapping the input stream in a filter and escaping all '&' characters before they get read by the parser. You could use BufferedReader to get started. Let me know if you want to explore this solution.

Cheers,
Eadfrith
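The second option, escaping every '&' before the parser reads it, might be sketched as follows. This is one possible shape rather than a worked-out solution: it uses java.io.FilterReader instead of the BufferedReader mentioned above, the class name AmpersandEscapingReader and the method extractText are hypothetical, and it escapes every '&' in the stream, including those inside attribute values.

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper: rewrites every '&' in the stream as "&amp;".
final class AmpersandEscapingReader extends FilterReader {

    private static final char[] TAIL = {'a', 'm', 'p', ';'};

    // Index of the next TAIL character still owed to the caller;
    // TAIL.length means nothing is pending.
    private int pending = TAIL.length;

    AmpersandEscapingReader(Reader in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        if (pending < TAIL.length) {
            return TAIL[pending++];
        }
        int c = in.read();
        if (c == '&') {
            pending = 0; // emit '&' now, then "amp;" on the following reads
        }
        return c;
    }

    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        // Simple char-by-char fill; fine for in-memory input, may block on slow streams.
        int n = 0;
        while (n < len) {
            int c = read();
            if (c == -1) {
                break;
            }
            buf[off + n++] = (char) c;
        }
        return (n == 0 && len > 0) ? -1 : n;
    }

    @Override
    public boolean markSupported() {
        return false; // mark/reset would not account for the pending characters
    }

    /** Parses the HTML through the escaping reader and returns the text passed to handleText. */
    static String extractText(String html) throws IOException {
        final StringBuilder out = new StringBuilder();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                out.append(data);
            }
        };
        Reader escaped = new AmpersandEscapingReader(new StringReader(html));
        new ParserDelegator().parse(escaped, callback, true);
        return out.toString();
    }
}
```

Because the parser only ever sees &amp;, it resolves that to a literal '&' and leaves the rest of each original entity as ordinary text, so both named and numeric entities should reach handleText in their source form.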