home—info—labs—hws—exams
textbook—java.lang docs—java.util docs—archive
Challenge Extra credit:
Write a method which can translate a web page into pig latin.
It takes two Strings as input: the URL to read from, and the name of a file to write the translated HTML to.
(You'll be able to open that file from a browser, and view your result!)
(This is an extra-credit followup to hw08—Translating many words.)
It turns out, having a scanner read from a web page (rather than System.in) is easy. However, web pages are more than just words of text; web pages contain markup to indicate the structure of the document (where emphasized text should start and end, where to insert horizontal lines, etc.). Thus, web pages are written in “HTML” — “hypertext markup language”. We need to translate the regular information into pig latin, but leave this markup information untouched.
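One way to picture "translate the words, leave the markup" is a character-by-character pass that tracks whether we are currently inside a tag. The sketch below is only an illustration under some assumptions: `pigLatin` is a hypothetical stand-in for your hw08 translator (it uses a simplified rule), and `translateHtml` and the class name are made up for this example. It also naively treats everything outside of `<...>` as ordinary text, so entities such as `&amp;` and the contents of script blocks would get translated too.

```java
class MarkupAwareSketch {
    // Hypothetical stand-in for the hw08 translator (simplified rule):
    // leading consonants move to the end, followed by "ay";
    // a word that starts with a vowel just gets "way" appended.
    static String pigLatin(String word) {
        int i = 0;
        while (i < word.length() && "aeiouAEIOU".indexOf(word.charAt(i)) < 0) {
            i++;
        }
        if (i == 0) {
            return word + "way";
        }
        return word.substring(i) + word.substring(0, i) + "ay";
    }

    // Translates runs of letters that occur outside <...>; copies everything else as-is.
    static String translateHtml(String html) {
        StringBuilder out = new StringBuilder();
        StringBuilder word = new StringBuilder();
        boolean inTag = false;
        for (int k = 0; k < html.length(); k++) {
            char c = html.charAt(k);
            if (!inTag && Character.isLetter(c)) {
                word.append(c);   // still collecting the current word
                continue;
            }
            // Any other character ends the current word, if there is one.
            if (word.length() > 0) {
                out.append(pigLatin(word.toString()));
                word.setLength(0);
            }
            if (c == '<') {
                inTag = true;
            } else if (c == '>') {
                inTag = false;
            }
            out.append(c);
        }
        if (word.length() > 0) {
            out.append(pigLatin(word.toString()));   // word at the very end of the input
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints: <p>ellohay <em>igpay</em> atinlay</p>
        System.out.println(translateHtml("<p>hello <em>pig</em> latin</p>"));
    }
}
```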
More information you need is here.
This requires some knowledge about
As promised, here's some library-specific information:
java.util.Scanner s = new java.util.Scanner( new java.net.URL("http://www.radford.edu/").openStream() );
java.io.PrintStream myOut = new java.io.PrintStream( new java.io.File( "H:/oinkayOinkay.html" ) );
// Now, you can say:
myOut.println("hello");
// Before our program quits, we must close the file [1]:
myOut.close();
throws java.net.MalformedURLException, java.io.FileNotFoundException, java.io.IOException
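The three fragments above might be combined roughly as follows. This is a minimal sketch, not the intended solution: the class name `PageCopySketch` and the method name `copyPage` are made up for illustration, and the loop merely copies each line unchanged where a real solution would translate it. To avoid depending on a network connection, `main` reads from a temporary file through a `file:` URL.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.PrintStream;
import java.net.MalformedURLException;
import java.net.URL;
import java.util.Scanner;

class PageCopySketch {
    // Reads the page at `url` line by line and writes each line to `outFileName`.
    // (The URL(String) constructor is deprecated in recent JDKs but still available.)
    static void copyPage(String url, String outFileName)
            throws MalformedURLException, FileNotFoundException, IOException {
        // try-with-resources closes both the Scanner and the PrintStream,
        // even if an exception is thrown partway through.
        try (Scanner in = new Scanner(new URL(url).openStream());
             PrintStream out = new PrintStream(new File(outFileName))) {
            while (in.hasNextLine()) {
                String line = in.nextLine();
                // A real solution would translate `line` here, leaving the markup alone.
                out.println(line);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Make a tiny page on disk so the example runs without a network connection.
        File src = File.createTempFile("page", ".html");
        try (PrintStream p = new PrintStream(src)) {
            p.println("<p>hello</p>");
        }
        File dst = File.createTempFile("copy", ".html");
        copyPage(src.toURI().toURL().toString(), dst.getPath());
        System.out.println("Wrote " + dst.getPath());
    }
}
```

Using try-with-resources is a design choice here; calling `close()` explicitly, as in the fragments above, works too as long as it happens on every path out of the method.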
A final note: our HTML processing is oblivious to the actual structure of the markup. A proper approach would be more sophisticated: reading the structure of the markup, and then processing the resulting tree.
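To give a flavor of what "process the resulting tree" could look like, here is a rough sketch using the XML parser that ships with the JDK. It rests on a big assumption: the input must be well-formed XML (XHTML-style), which most real web pages are not. The names `TreeSketch`, `translateTree`, and `translateXhtml` are made up for illustration, and `translateText` is a hypothetical placeholder that upper-cases text so the effect is easy to see; a real version would call the pig latin translator instead.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class TreeSketch {
    // Hypothetical placeholder for the real translator: just upper-cases the text.
    static String translateText(String text) {
        return text.toUpperCase();
    }

    // Walks the tree; only text nodes are changed, so element names
    // and attributes are never touched.
    static void translateTree(Node node) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            node.setNodeValue(translateText(node.getNodeValue()));
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            translateTree(child);
        }
    }

    // Parses well-formed markup into a tree, translates it, and writes it back out.
    static String translateXhtml(String xhtml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xhtml)));
        translateTree(doc.getDocumentElement());

        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter result = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(result));
        return result.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(translateXhtml("<p>hello <em>pig</em> latin</p>"));
    }
}
```

Because the parser decides what is markup and what is text, this version never has to guess whether a `<` starts a tag.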
[1] Well, technically, it's the java.io.PrintStream which we must close.
©2008, Ian Barland, Radford University. Last modified 2008.Nov.04 (Tue)
Please mail any suggestions (incl. typos, broken links) to ibarlandradford.edu