ITEC 120
2008fall
aaray, ejderrick, ibarland, jmdymacek


hw08-ec-web
extra-credit: latinizing web pages

Challenge Extra credit: Write a method which can translate a web page into pig latin.
It takes two Strings as input — the URL to read from, and the name of a file to write the translated html to.
(You'll be able to open that file from a browser, and view your result!)
(This is an extra-credit followup to hw08—Translating many words.)

It turns out, having a scanner read from a web page (rather than System.in) is easy. However, web pages are more than just words of text; web pages contain markup to indicate the structure of the document (where emphasized text should start and end, where to insert horizontal lines, etc.). Thus, web pages are written in “HTML” — “hypertext markup language”. We need to translate the regular information into pig latin, but leave this markup information untouched.
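
One way to picture "translate the text, leave the markup untouched" is to walk through the characters, remembering whether we are currently between a `<` and a `>`. The sketch below is only an illustration under assumptions: the names `MarkupSketch` and `translateOutsideTags` are made up for this example, and `translateWord` is a hypothetical stand-in that merely upper-cases (you would call your own hw08 pig-latin method instead). It also assumes each tag starts and ends within the String it is given, and it makes no attempt to handle entities such as `&amp;` or the contents of scripts.

```java
class MarkupSketch {
    /** Hypothetical stand-in for your hw08 word translator; it only upper-cases. */
    static String translateWord( String word ) {
        return word.toUpperCase();
    }

    /** Return `html`, with each run of letters outside of <...> passed through translateWord. */
    static String translateOutsideTags( String html ) {
        StringBuilder result = new StringBuilder();
        StringBuilder word = new StringBuilder();   // letters of the word collected so far
        boolean insideTag = false;                  // are we between a '<' and its '>' ?

        for (int i = 0;  i < html.length();  ++i) {
            char c = html.charAt(i);
            if (!insideTag && Character.isLetter(c)) {
                word.append(c);                     // still in the middle of a word
            }
            else {
                if (word.length() > 0) {            // a word just ended: translate it
                    result.append( translateWord(word.toString()) );
                    word.setLength(0);
                }
                if (c == '<') { insideTag = true; }
                else if (c == '>') { insideTag = false; }
                result.append(c);                   // markup and punctuation are copied as-is
            }
        }
        if (word.length() > 0) {                    // the input might end with a word
            result.append( translateWord(word.toString()) );
        }
        return result.toString();
    }
}
```

With the upper-casing stand-in, `MarkupSketch.translateOutsideTags("<b>hello</b> world")` returns `"<b>HELLO</b> WORLD"`: the tag names are left alone, and only the words between tags change.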


This requires some knowledge about

  1. how to make a Scanner which reads from a web page, and
  2. how to write to a file instead of to the console.
We'll mention those below.

As promised, here's some library-specific information:

  1. To create a Scanner which reads from (say) the RU home page rather than from the keyboard (System.in),
    java.util.Scanner s = new java.util.Scanner( new java.net.URL("http://www.radford.edu/").openStream() );
    
    Note that, as before, the Scanner method hasNext returns true as long as there is more input to read; it only returns false once the entire web page has been read.
  2. To write to a file instead of the console window (System.out),
    java.io.PrintStream myOut = new java.io.PrintStream( new java.io.File( "H:/oinkayOinkay.html" ) );
    // Now, you can say:
    myOut.println("hello");
    
    // Before our program quits, we must close the file [1]:
    myOut.close();
    
  3. Important: in order for either of the above two to compile, we need to add some information about Exceptions (errors) to our method's header, right after its parameter list; exceptions will be discussed further in ITEC 220.
    throws java.net.MalformedURLException, java.io.FileNotFoundException, java.io.IOException
    
    (More on this coming soon.)
  4. To actually see your translated page, you'll need to use a web browser, select File > Open File…, and open the disk-file your program just printed its output to.
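
The library pieces listed above might fit together roughly as follows. This is a sketch rather than a solution: the names `LatinizeSketch`, `copyLines` and `copyPage` are made up for illustration, and the code only copies the page line-by-line without translating anything. The loop is kept in its own method, taking a Scanner and a PrintStream, so that it can be tried out on a plain String before pointing it at a real web page.

```java
class LatinizeSketch {
    /** Copy every line from `in` to `out`, unchanged.
     *  (A real solution would translate each line before printing it.)
     */
    static void copyLines( java.util.Scanner in, java.io.PrintStream out ) {
        while (in.hasNextLine()) {
            out.println( in.nextLine() );
        }
    }

    /** Read the web page at `url`, and write its contents to the file named `outFileName`.
     *  The `throws` clause goes in the method header, after the parameter list.
     */
    static void copyPage( String url, String outFileName )
            throws java.net.MalformedURLException, java.io.FileNotFoundException, java.io.IOException {
        java.util.Scanner in = new java.util.Scanner( new java.net.URL(url).openStream() );
        java.io.PrintStream out = new java.io.PrintStream( new java.io.File(outFileName) );

        copyLines( in, out );

        out.close();   // Before we finish, we must close the PrintStream,
        in.close();    // and it is tidy to close the Scanner as well.
    }
}
```

For instance, calling `LatinizeSketch.copyPage( "http://www.radford.edu/", "H:/oinkayOinkay.html" )` would be expected to save an untranslated copy of the RU home page, which you could then open from a browser as described in item 4.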

A final note: our html-processing is oblivious to the actual structure of the markup. A proper approach would be more sophisticated: reading the structure of the markup, and then processing the resulting tree.


[1] Well, technically, it's the java.io.PrintStream which we must close.



©2008, Ian Barland, Radford University
Last modified 2008.Nov.04 (Tue)