Thursday, January 21, 2016

Converting A Paragraph Into Sentences

The simple program below splits a paragraph into sentences. It does not use regular expressions or other pattern-matching techniques.

public class ParToSent {
    public static void main(String[] args){
        String st="Each programming language has its own advantages and disadvantages. A wide variety of open source applications are written in PHP. ASP offers a much smaller learning curve. JSP provides all the features of Java.";
        
        //Counting the number of sentences in the paragraph.
        int counter=0;
        for(int i=0;i<st.length();i++){
            if(st.charAt(i)=='.'){
                counter++;
            }
        }
        
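        //Creating an array with one element per sentence.
        String[] sent=new String[counter];
        String stemp=st;   //working copy; each stored sentence is cut off its front
        int firstindex=0;  //a sentence always starts at the front of stemp
        for(int i=0;i<sent.length;i++){
            //The first '.' left in stemp ends the current sentence.
            int lastindex=stemp.indexOf('.');
            //Copy the sentence without its period and trim surrounding spaces.
            sent[i]=stemp.substring(firstindex,lastindex).trim();
            //Remove the stored sentence and its period from stemp.
            stemp=stemp.substring(lastindex+1);
        }
        
        //Printing the sentences, one per line.
        for(int i=0;i<sent.length;i++){
            System.out.println(sent[i]);
        }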
    }
}


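The variable counter holds the number of sentences in the paragraph. It is found by counting the periods ('.'), on the assumption that every sentence ends with exactly one. The count is then used to set the size of the array sent.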
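For comparison, the same count can be written as a single expression with String.chars() (Java 8 or later). This is a minimal sketch, not part of the original program; the class name PeriodCounter is made up for the example, and, like the original, it assumes every sentence ends with a period and no other periods occur in the text:

```java
class PeriodCounter {
    // Counts the '.' characters in the text. Under the assumption above,
    // this equals the number of sentences.
    static int countSentences(String text) {
        return (int) text.chars().filter(c -> c == '.').count();
    }

    public static void main(String[] args) {
        String st = "Each programming language has its own advantages and disadvantages. "
                + "A wide variety of open source applications are written in PHP. "
                + "ASP offers a much smaller learning curve. "
                + "JSP provides all the features of Java.";
        System.out.println(countSentences(st)); // prints 4
        // A decimal number breaks the assumption: this single sentence counts as 2.
        System.out.println(countSentences("PHP 5.6 is still in use.")); // prints 2
    }
}
```

The second call shows where the period-counting approach gives a wrong answer: the decimal point in "5.6" is counted as an extra sentence.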

int lastindex=stemp.indexOf('.');

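This line finds the index of the first period ('.') in stemp, the part of the paragraph that has not been processed yet. That period marks the end of the current sentence, and its index is stored in the variable lastindex.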
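In the original program the loop runs exactly counter times, so indexOf('.') always finds a period. If it did not, indexOf() would return -1 and the following substring() call would throw an exception. If sentences may also end with '!' or '?', the search can still be done without regular expressions by scanning character by character. The helper indexOfTerminator below is hypothetical, a sketch that follows the same -1 convention as indexOf():

```java
class TerminatorSearch {
    // Hypothetical helper: returns the index of the first '.', '!' or '?'
    // at or after 'from', or -1 if there is none.
    static int indexOfTerminator(String text, int from) {
        for (int i = from; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '.' || c == '!' || c == '?') {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String text = "Is PHP popular? Yes. JSP is Java!";
        System.out.println(text.indexOf('.'));          // prints 19
        System.out.println(indexOfTerminator(text, 0)); // prints 14
        System.out.println(text.indexOf('#'));          // prints -1
    }
}
```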

sent[i]=stemp.substring(firstindex,lastindex).trim();

Once the end of the first sentence is found, the substring() method copies that sentence from the paragraph into the array. Because substring(begin, end) stops just before end, the period itself is not copied. The for( ; ; ) loop repeats this until the last sentence in the paragraph has been stored. The trim() method then removes leading and trailing whitespace, such as the space that follows each period; the spaces between words are kept.
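The short sketch below shows both effects on the value stemp has after the first two sentences have been removed. firstSentence is a hypothetical helper that mirrors one pass of the loop:

```java
class SubstringTrimDemo {
    // Hypothetical helper mirroring one pass of the loop in ParToSent.
    // If 'remaining' has no period, indexOf returns -1 and substring throws
    // StringIndexOutOfBoundsException.
    static String firstSentence(String remaining) {
        int lastindex = remaining.indexOf('.');
        // substring(begin, end) copies indices begin..end-1, so the period is left out;
        // trim() then removes leading and trailing whitespace only.
        return remaining.substring(0, lastindex).trim();
    }

    public static void main(String[] args) {
        // The value of stemp after the first two sentences have been removed.
        String stemp = " ASP offers a much smaller learning curve. JSP provides all the features of Java.";
        int lastindex = stemp.indexOf('.');
        System.out.println("[" + stemp.substring(0, lastindex) + "]");
        // prints [ ASP offers a much smaller learning curve]
        System.out.println("[" + firstSentence(stemp) + "]");
        // prints [ASP offers a much smaller learning curve]
    }
}
```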

stemp=stemp.substring(lastindex+1);

Every time a sentence is stored in the array, it is cut off the front of stemp together with its period; the original string st is left unchanged. The second sentence becomes the first, the third becomes the second, and so on, so on the next pass indexOf('.') finds the end of the next sentence. This is also why firstindex can stay at 0.
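Shrinking stemp creates a new string on every pass. An alternative, sketched below, walks through the paragraph once and remembers where the current sentence starts. It assumes sentences end with '.', '!' or '?' (abbreviations such as "e.g." will still be split wrongly), needs no sentence count up front because the sentences go into an ArrayList, and keeps each sentence's closing punctuation. The class and method names are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;

class SentenceSplitter {
    // Splits text into sentences ending in '.', '!' or '?', without regular
    // expressions. A start index moves forward through the original string
    // instead of cutting the paragraph down after every sentence.
    static List<String> split(String text) {
        List<String> sentences = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '.' || c == '!' || c == '?') {
                // End index i + 1 keeps the terminator in the sentence.
                sentences.add(text.substring(start, i + 1).trim());
                start = i + 1;
            }
        }
        // Any text after the last terminator is kept as a final sentence.
        String rest = text.substring(start).trim();
        if (!rest.isEmpty()) {
            sentences.add(rest);
        }
        return sentences;
    }

    public static void main(String[] args) {
        String st = "Each programming language has its own advantages and disadvantages. "
                + "A wide variety of open source applications are written in PHP. "
                + "ASP offers a much smaller learning curve. "
                + "JSP provides all the features of Java.";
        for (String s : split(st)) {
            System.out.println(s);
        }
    }
}
```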

System.out.println(sent[i]);

Finally, all the sentences stored in the array are displayed on the screen.

Replacing Words With Their Synonyms

This program replaces certain words in a sentence with their synonyms. The mapping is built from two parallel arrays: the array word holds the words to be replaced, and the array syn stores the synonym of each word at the same index (syn[0] corresponds to word[0], and so on).

public class Synonym {
    public static void main(String[] args){
        String sentence="A PDA is an invaluable tool to help manage our life.";
        String[] word={"PDA","tool","life"};
        String[] syn={"handheld organizer","device","daily tasks"};
        
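        //Replace each word with its synonym. replace() changes every
        //occurrence, including ones inside longer words.
        for(int i=0;i<word.length;i++){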
            if(sentence.contains(word[i])){
                sentence=sentence.replace(word[i],syn[i]);
            }
        }
        
        System.out.println(sentence);
    }
}
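One thing to be aware of: String.replace() replaces every occurrence of the character sequence, even inside a longer word, so "toolbox" would become "devicebox". A minimal sketch of whole-word replacement with the same two arrays follows. It assumes words are separated by spaces and that only a trailing period needs to be set aside; the method name replaceWholeWords is hypothetical:

```java
class WholeWordSynonym {
    // Replaces whole words only, using two parallel arrays
    // (syn[i] is the synonym of word[i]), as in the Synonym program.
    static String replaceWholeWords(String sentence, String[] word, String[] syn) {
        String[] tokens = sentence.split(" ");
        StringBuilder out = new StringBuilder();
        for (int t = 0; t < tokens.length; t++) {
            String token = tokens[t];
            // Set a trailing period aside so "life." can still match "life".
            String suffix = "";
            if (token.endsWith(".")) {
                suffix = ".";
                token = token.substring(0, token.length() - 1);
            }
            for (int i = 0; i < word.length; i++) {
                if (token.equals(word[i])) {
                    token = syn[i];
                    break;
                }
            }
            if (t > 0) {
                out.append(' ');
            }
            out.append(token).append(suffix);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String[] word = {"PDA", "tool", "life"};
        String[] syn = {"handheld organizer", "device", "daily tasks"};
        // "toolbox" is left alone; only the separate word "tool" changes.
        System.out.println(replaceWholeWords("My toolbox holds a tool.", word, syn));
        // prints My toolbox holds a device.
    }
}
```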


HashMap

The following program does the same thing with a HashMap instead of two arrays.

import java.util.*;

public class Synonym2 {
    public static void main(String[] args){
        Map map = new HashMap();
        map.put("PDA","handheld organizer");
        map.put("tool","device");
        map.put("life","daily tasks");
        
        ArrayList<String> lst=new ArrayList<String>();
        String sentence="A PDA is an invaluable tool to help manage our life.";
        
        StringTokenizer st=new StringTokenizer(sentence," ");
        while(st.hasMoreTokens()){
            String temp=st.nextToken();
            if(temp.endsWith(("."))){
                temp=temp.substring(0,temp.length()-1);
            }
            lst.add(temp);
        }
        
        for(int i=0;i<lst.size();i++){
            for(int j=0;j<map.size();j++){
                if(map.get(lst.get(i))!=null){
                    String stemp=map.get(lst.get(i)).toString();
                    sentence=sentence.replace(lst.get(i),stemp);
                }
            }
        }
        
        System.out.println(sentence);
    }
}


Map map = new HashMap();
map.put("PDA","handheld organizer");
map.put("tool","device");
map.put("life","daily tasks");

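HashMap is a class that implements the Map interface. It stores key-value pairs, much like a dictionary. The put() method adds an entry: the first argument is the key and the second is the value associated with it. The get() method, used later in the program, returns the value for a given key, or null if the key is not in the map.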
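The replacement loop relies on that lookup behavior. The sketch below (class and method names are hypothetical) also declares the map with type parameters, Map<String, String>, so get() returns a String directly and the toString() call in Synonym2 is no longer needed:

```java
import java.util.HashMap;
import java.util.Map;

class MapLookupDemo {
    // Builds the same synonym map as Synonym2, but typed.
    static Map<String, String> synonyms() {
        Map<String, String> map = new HashMap<>();
        map.put("PDA", "handheld organizer");
        map.put("tool", "device");
        map.put("life", "daily tasks");
        return map;
    }

    public static void main(String[] args) {
        Map<String, String> map = synonyms();
        System.out.println(map.get("tool"));         // prints device
        System.out.println(map.get("is"));           // prints null: "is" is not a key
        System.out.println(map.containsKey("life")); // prints true
        // getOrDefault (Java 8+) returns the second argument when the key is missing.
        System.out.println(map.getOrDefault("is", "is")); // prints is
    }
}
```

getOrDefault is convenient for a "replace the word if it has a synonym, otherwise keep it" rule, since it avoids the separate null check.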

StringTokenizer st=new StringTokenizer(sentence," ");
while(st.hasMoreTokens()){
     String temp=st.nextToken();
     if(temp.endsWith(("."))){
          temp=temp.substring(0,temp.length()-1);
      }
      lst.add(temp);
}

StringTokenizer is used to split the sentence into words. It breaks a string into tokens separated by delimiters, here the space character. The hasMoreTokens() method returns true as long as there are tokens left to read, and the nextToken() method returns the next token as a String. Each token is assigned to a temporary String named temp.
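The Javadoc for StringTokenizer describes it as a legacy class and recommends String.split() for new code. The sketch below (hypothetical names) collects the words both ways. For this sentence the results are the same, but they differ when spaces are repeated:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;

class TokenizeDemo {
    // Collects the tokens produced by StringTokenizer, as Synonym2 does.
    static List<String> withTokenizer(String sentence) {
        List<String> words = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(sentence, " ");
        while (st.hasMoreTokens()) { // true while unread tokens remain
            words.add(st.nextToken());
        }
        return words;
    }

    // The same split done with String.split.
    static List<String> withSplit(String sentence) {
        return Arrays.asList(sentence.split(" "));
    }

    public static void main(String[] args) {
        String sentence = "A PDA is an invaluable tool to help manage our life.";
        System.out.println(withTokenizer(sentence));
        System.out.println(withSplit(sentence));
        // StringTokenizer skips empty tokens; split(" ") keeps them as "".
        System.out.println(withTokenizer("a  b")); // prints [a, b]
        System.out.println(withSplit("a  b"));     // prints [a, , b]
    }
}
```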

if(temp.endsWith(("."))){
          temp=temp.substring(0,temp.length()-1);
}
lst.add(temp);

This code checks whether the word that has just been extracted ends with a period ("."). If it does, the period is removed with the substring() method. All the extracted words are then stored in the ArrayList lst.
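The check only handles a period. If the sentence might contain commas or other punctuation, a small helper can strip any run of trailing punctuation marks instead. This is a sketch; the helper name and the exact set of characters are assumptions:

```java
class PunctuationStripper {
    // Hypothetical helper: removes any run of trailing '.', ',', '!', '?', ';' or ':'
    // so that "tool," and "life." both reduce to the bare word.
    static String stripTrailingPunctuation(String token) {
        int end = token.length();
        while (end > 0 && ".,!?;:".indexOf(token.charAt(end - 1)) >= 0) {
            end--;
        }
        return token.substring(0, end);
    }

    public static void main(String[] args) {
        System.out.println(stripTrailingPunctuation("life.")); // prints life
        System.out.println(stripTrailingPunctuation("tool,")); // prints tool
        System.out.println(stripTrailingPunctuation("PDA"));   // prints PDA
    }
}
```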

for(int i=0;i<lst.size();i++){
     for(int j=0;j<map.size();j++){
          if(map.get(lst.get(i))!=null){
               String stemp=map.get(lst.get(i)).toString();
               sentence=sentence.replace(lst.get(i),stemp);
          }
     }
}

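These lines perform the replacement. The outer loop iterates through the words stored in the ArrayList. For each word, map.get() checks whether the word is one of the keys in the HashMap; if it is, the corresponding value is retrieved and replace() substitutes it for the word in the sentence. Note that the inner loop does not actually iterate through the HashMap: it repeats the same lookup map.size() times. For this map, after the first pass the word is no longer in the sentence, so the extra passes change nothing and the inner loop could be removed.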
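Since a single map.get() call already compares a word against every key, the replacement loop can be reduced to one lookup per word. A sketch of that simplification, with a typed map and a hypothetical class name Synonym3, produces the same sentence as Synonym2 for this example:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;

class Synonym3 {
    static String replaceSynonyms(String sentence, Map<String, String> map) {
        // Split into words and drop a trailing period, as in Synonym2.
        List<String> words = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(sentence, " ");
        while (st.hasMoreTokens()) {
            String temp = st.nextToken();
            if (temp.endsWith(".")) {
                temp = temp.substring(0, temp.length() - 1);
            }
            words.add(temp);
        }
        // One hash lookup per word is enough; no loop over the map is needed.
        String result = sentence;
        for (String w : words) {
            String synonym = map.get(w); // null when w is not a key
            if (synonym != null) {
                result = result.replace(w, synonym);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> map = new HashMap<>();
        map.put("PDA", "handheld organizer");
        map.put("tool", "device");
        map.put("life", "daily tasks");
        System.out.println(replaceSynonyms(
                "A PDA is an invaluable tool to help manage our life.", map));
        // prints A handheld organizer is an invaluable device to help manage our daily tasks.
    }
}
```

Like Synonym2, this version still uses String.replace(), so a key that also appears inside a longer word would be replaced there too.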