Web Crawler


Table of Contents
1. Introduction
2. Brief Overview
3. Methods
4. References
5. Source Code
6. Sample Output


1. Introduction:
The purpose of this document is to outline my design choices when implementing a persistent file system. This discussion of the program code will be divided into the following parts: Brief Overview, FileSystem(), Inode class, create(), delete(), ls(), read() and write(). Design choices, and algorithms used in the program will be discussed in these sections. Choices in data structures and data types will be discussed after. At the end of the code discussion are a list of references


2. Brief Overview:
The goal of this lab assignment is create a persistent file system. The file system is segmented into 128 blocks which are 1KB in size. The first block contains the superblock which keeps track of the free block list and the inodes. In the program, an array is created for the free block list and the inodes. The array for the free blocks is simply an integer array that gets set to 0 if the block is not used, and 1 if the block is used. The inode list is a arraylist of Inode objects. To do so, a new Inode class is created that keep stores of the inode name, size, location (start address of the inode block in the superblock), blockPointers, and used flag (0 if free, 1 otherwise). The first thing the program does when it starts is to read in the superblock from the disk and to initialize the free block list and the inode list. Therefore, all 16 inodes are initialized in the beginning. When a file is created, these inodes are updated and in turn, the inodes updates the disk. The free block list array in the program will also be updated and written to the disk. In the same manner files can be deleted, read, written, and listed using the inode list and the free block list (and writing to the disk after each change).


3. Methods
FileSystem():
This method is called at the beginning of the program. The program reads in the input file and pass to this method the name of the disk file to be read from. The method then reads the first 128 bytes from the disk and populates the freeBlockList (which is created as a global variable. It will also initialized a byte[56] (global variable) of 0s which will be later used in delete). The

method then populates the inodeList by creating 16 inodes. This is done by observing in the bytes corresponding to to the inode’s name, size, location (start address for each inode in superblock), converting them into char[] and ints respectively, and passing them in as parameters in the inode constructure.

Inode Class:
The inode class stores a name, size, bytePointers, used flag, and address. It also have an additional name field in bytes called nameBytes. This offers a more accurate way to compare names which is needed in the read(), write() and delete methods. The main function of the Inode class is to read from and write to the disk file. It contains the following methods:

­ update():
update reads the inode’s block of data from the superblock (56 bytes in length) and uses it to update the inode’s information (name, size.. etc.). Is called by delete() when this inode data block (in the superblock) has been written to the ask as all 0s (deleted).

­ set():
set is called when a file is created. The inode set its information to those that were passed to it (from the parameters) and writes them to disk.

­ getPointers():
this methods reads from the disk the current node’s blockPointer bytes and updates the inodes’ blockPointer array with the information read.

­ addPointer():
called by create() when it allocates a block to the inode. addPointer adds the address of the allocated block into the inode’s blockPointer array (which is passed in the parameters) and writes the updated information into file.

­ getName():
converts the name bytes from the disk into a String to be used for easy printing.

Create():
This method attempts to create a file of a certain size. It first reads in the freeBlockList and count the number of free blocks available. If the number of blocks required for the file is not greater than the number of available blocks, then it cannot be created. Otherwise, a free Inode (which can be found by looping through the Inode array and observing each inode’s used value), is updated to store that file name and file size. The method then allocates blocks to the designated inode by looping through the freeBlockList to find free blocks and then setting them to used (1). It then calls addPointer(x) on the inode. This is done “size” amount of times­­ therefore the method allocated size number of blocks to the inode and links them with the inode’s blockPointer.

Delete():
Delete loops through the inode list and compare the names of each inode in bytes to the char[] array that was passed in also converted to bytes. If there is a match, then that inode’s blocks are freed. This is done by looking at the blockPointer values in the inode’s array. The blockPointer values are the addresses of the blocks in the freeBlocksList. We reference the linked block in the freeBlockList and set its flag in to “0”. The updated list is then written to disk. An empty inode (which was created in FileSystem) is then written to the disk at the start address of the to­be­deleted inode. This inode then calls update() to read its changed bytes from the disk and update (delete) its stored information.

Ls():
This method loops through the Inode list and prints out the name and size of all the Inodes that have its used flag set to 1.

Read():
Read loops through the Inode list to find a name that matches the filename that was passed in (compare in bytes). When a match is found, it looks at the blockPointer of that inode and look at its value at the blockNum value that was passed in. It then reference the block’s address from disk and reads in 1024 bytes. The read bytes are then converted into a char[] and the passed char[] buf is set to equal it.

Write():
Like read(), write looks through the Inode list to find a filename that matches the one that was passed in (in bytes). It will then also look at that inode’s blockPointer at the location of the blockNum value that was passed in. This will give the starting address of a block. The method then converts the passed in char[] buf into bytes and write it into the disk starting at the address of the block obtained.

Data Structures: Arrays (byte[], char[], int[]) are mostly used in this program rather than list (ArrayList, LinkedList.. etc) since they can be more easily converted between each other by using a ByteBuffer. Byte[] arrays are preferred since they do not need to be converted when reading or writing onto the disk.

Data Types:
Enum: Used in the switch() statements in main rather than have multiple if statements. This names the code easier to read and shorter.
RandomAccessFile: Simplifies seeking, reading and writing to and from files

4. References:
Enum Types:
http://docs.oracle.com/javase/tutorial/java/javaOO/enum.html 
ByteBuffer: http://docs.oracle.com/javase/7/docs/api/java/nio/ByteBuffer.html
Converting byte[] to String: http://stackoverflow.com/questions/88838/how-to-convert-strings-to-and-from-utf8-byte-arrays-in-java
Converting int to byte[]: http://stackoverflow.com/questions/6374915/java-convert-int-to-byte-array-of-4-bytes
RandomAccessFiles: http://tutorials.jenkov.com/java-io/randomaccessfile.html


5. Source Code*:
Starter Code

myFileSystem.java

import java.util.*;
import java.io.*;
import java.nio.ByteBuffer;

public class myFileSystem{
  RandomAccessFile disk;

  byte[] freeBlockList = new byte[128];
  Inode[] inodeList = new Inode[16];
  byte[] emptyInode = new byte[56];

  public class Inode {
    char[] name;
    int size;
    int[] blockPointers=new int[8];
    private int start; //start byte location of inode in superblock
    int used; // 0 free, 1 not
    byte[] nameBytes; //use for comparing inodes

    public Inode(ByteBuffer name, int size, int loc) throws Exception{
      this.name=name.asCharBuffer().toString().toCharArray();
      this.size=size;
      start=loc;
      nameBytes=name.array();
      
      getPointers(); //sets the value in the blockPointers array and the use value
    }

    public void update() throws Exception{ //read from disk and set inode values with it
      disk.seek(start);

      ByteBuffer myName = ByteBuffer.allocate(16); //get name bytes
      disk.readFully(myName.array());
      name = myName.asCharBuffer().toString().toCharArray();
      nameBytes = myName.array();

      ByteBuffer mySize = ByteBuffer.allocate(4); //get size bytes
      disk.readFully(mySize.array());
      size = mySize.getInt();

      getPointers(); //sets the value in the blockPointers array and the use value
    }

    public void set(char[] fileName, int size) throws Exception{ //create sets inodes
      disk.seek(start);
      this.name = fileName;
      ByteBuffer fname = ByteBuffer.allocate(16); //translate to bytes
      for (int n=0; n<fileName.length; n++)
        fname.putChar(fileName[n]);

      disk.write(fname.array()); //update disk with name
      nameBytes=fname.array(); //name in bytes

      disk.write(ByteBuffer.allocate(4).putInt(size).array()); //update size in disk
      this.size=size;

      disk.seek(start+52); 
      disk.write(ByteBuffer.allocate(4).putInt(1).array()); //update use flag in disk
      this.used=1; //set used
    }

    public void getPointers() throws Exception{
      int index=0;
      for (int i=(start+20); i<(start+52); i+=4){
        ByteBuffer pointerVal = ByteBuffer.allocate(4);
        disk.seek(i);
        disk.readFully(pointerVal.array()); //get pointer bytes from disk
        
        blockPointers[index]=pointerVal.getInt(); //translate byte to int and set blockPointer
        index++;
      }

      ByteBuffer flag = ByteBuffer.allocate(4); 
      disk.readFully(flag.array()); //get used bytes
      used = flag.getInt(); //set used value
    }

    public void addPointer(int val) throws Exception{ //used in create
      for(int i=0; i<blockPointers.length; i++)
        if(blockPointers[i]==0){
          //System.out.println("Inode"+start/128 + " added block:" +val + " at block index "+i);
          blockPointers[i]=val; //update blockPointers array

          disk.seek(start+20+4*i); //start of blockPointers in disk
          disk.write(ByteBuffer.allocate(4).putInt(val).array()); //update disk 
          break;
        }
    }

    public String getName() throws Exception{ //translate name to string and return
      disk.seek(start);
      ByteBuffer fileName = ByteBuffer.allocate(16);
      disk.read(fileName.array());
      return fileName.asCharBuffer().toString();
    }
  }

  public void FileSystem(char[] diskName) throws Exception{
    String name="";
    for (int i=0; i<diskName.length; i++)
      name=name+diskName[i];

    disk = new RandomAccessFile(name, "rw");
    disk.seek(0);
    disk.readFully(freeBlockList); //populate freeblock list

    for (int j=0; j<emptyInode.length; j++)
      emptyInode[j]=0; //used in delete

    for (int i=0; i<16; i++){
      disk.seek(128+i*56);

      ByteBuffer inodeName = ByteBuffer.allocate(16); 
      disk.read(inodeName.array()); // get the inode names
      ByteBuffer inodeSize = ByteBuffer.allocate(4);
      disk.readFully(inodeSize.array()); //get size

      inodeList[i]= new Inode(inodeName, inodeSize.getInt(), 128+i*56); //add inodes to a list
    }
  }


  public void create(char[] name, int size) throws Exception {  
    System.out.println("C: Creating new file:"+String.valueOf(name)+" " +size);
    int fileSize=size;
    disk.seek(0);
    
    disk.readFully(freeBlockList); //reading in freeblocklist
    int numFreeBlocks=0; //get num free blocks
    for (int j=1; j<freeBlockList.length; j++)
      if(freeBlockList[j]==0)
        numFreeBlocks++;

    ByteBuffer fname = ByteBuffer.allocate(16); //get name in bytes
    for (int n=0; n<name.length; n++)
      fname.putChar(name[n]);

    //-------- does the file name already exist in the file?
    boolean exist=false; 
    for (int ii=0; ii<inodeList.length;ii++) //checks to see if filename already used
      if(Arrays.equals(inodeList[ii].nameBytes,fname.array())){
        System.out.println(" ERROR: A file with that name already exist!");
        exist=true;
        break;
      }

    
    if(numFreeBlocks>size && exist==false && size<=8){ //are there enough data blocks? Is size <=8?
      int freeInode=0; //find freeInode index in array
      
      for (int lookUp=0; lookUp<inodeList.length; lookUp++) //can for free inode in list
        if(inodeList[lookUp].used==0){
          freeInode=lookUp;
          inodeList[freeInode].set(name,size); //if free, update inode with info
          break;
        }

      String allocBlocks="";
      while(fileSize>0) //allocate all blocks to new file
        for (int f=1; f<freeBlockList.length; f++){
          if(fileSize<=0) break; //don't have any more blocks to allocate
          if (freeBlockList[f]==0){ //if block is free
            freeBlockList[f]=1; //then block is no longer free

            inodeList[freeInode].addPointer(f); //set pointers to blocks
            allocBlocks=allocBlocks+f+" ";

            fileSize--; //dec file size
          }
        }
      System.out.println(" Allocated Blocks: "+allocBlocks);
      disk.seek(0);
      disk.write(freeBlockList); // updates free blocks list
    }
  } 

  public void delete(char[] name) throws Exception{
    ByteBuffer fname = ByteBuffer.allocate(16); //get name in bytes
    for (int n=0; n<name.length; n++)
      fname.putChar(name[n]);

    for(int i=0; i<=inodeList.length; i++){ //finds file
      if(Arrays.equals(inodeList[i].nameBytes, fname.array())){ //if equal
        System.out.println("D: Deleting file "+ inodeList[i].getName());

        String allocBlocks="";
        for(int j=0; j<inodeList[i].blockPointers.length; j++) //looks at used blocks in freeBlockList
          if(inodeList[i].blockPointers[j]>0){
            allocBlocks=allocBlocks+inodeList[i].blockPointers[j]+" ";
            //System.out.println("Inode"+i+ " deleted block: "+(inodeList[i].blockPointers[j]));
            freeBlockList[(inodeList[i].blockPointers[j])]=0; //set blocks to not used
            
          }
        System.out.println(" Deallocated Blocks: "+allocBlocks);
        disk.seek(0);
        disk.write(freeBlockList); //update freeBlock list in disk
        disk.seek(inodeList[i].start);
        disk.write(emptyInode); //clear inode data from disk
        inodeList[i].update(); //clear inode data in program
        break;
      }
    }
  }

  public void ls() throws Exception{
    System.out.println("L: Listing Files"); //loops through inodeList array and prints
    for (int i=0; i<inodeList.length; i++)
      if(inodeList[i].used==1) //print inode if only it is used. 
        System.out.println(" File Name: "+inodeList[i].getName()+"; File Size: "+inodeList[i].size);
  }

  public void read(char[] name, int blocknum, char[] buf) throws Exception{
    ByteBuffer fname = ByteBuffer.allocate(16); //name in bytes
    for (int n=0; n<name.length; n++)
      fname.putChar(name[n]);

    for(int i=0; i<=inodeList.length; i++){
      if(Arrays.equals(inodeList[i].nameBytes, fname.array())){ //if equal we have located inode
        System.out.println("R: Reading Block "+blocknum+" of "+inodeList[i].getName());
        if(blocknum<inodeList[i].size){
          int blockIndex = inodeList[i].blockPointers[blocknum]; //find value of the blockPointer array at index of blocknum
          System.out.println(" --Disk Address of Block: "+blockIndex);
          
          ByteBuffer theBlock = ByteBuffer.allocate(1024); 
          disk.seek(blockIndex*1024); 
          disk.readFully(theBlock.array()); //theBlock now contains all the bytes in blockNum
          
          buf = theBlock.asCharBuffer().toString().toCharArray(); //sets buf
          System.out.println(" READ: "+theBlock.asCharBuffer().toString()); //print out read bytes
        }
        else
          System.out.println(" ERROR: "+inodeList[i].getName()+" does not have that many blocks");
        break;
        }
      }
  } 

  public void write(char[] name, int blockNum, char[] buf) throws Exception{
    ByteBuffer fname = ByteBuffer.allocate(16); //name in bytes
    for (int n=0; n<name.length; n++)
      fname.putChar(name[n]);

    for(int i=0; i<=inodeList.length; i++){
      if(Arrays.equals(inodeList[i].nameBytes, fname.array())){ //if equal we have located inode
        System.out.println("W: Writing Block" +blockNum +" of "+inodeList[i].getName());
        
        if(blockNum<inodeList[i].size){
          int blockIndex = inodeList[i].blockPointers[blockNum]; //disk addr
          System.out.println(" --Disk Address of Block: "+blockIndex);
          
          disk.seek(blockIndex*1024);
          ByteBuffer theBlock=ByteBuffer.allocate(1024); //use for conversion
          for (int f=0; f<buf.length; f++)
            theBlock.putChar(buf[f]);
          
          disk.write(theBlock.array()); //writes the char[] into file
          System.out.println(" WROTE: " +new String(theBlock.array(), "US-ASCII"));
        }
        else
          System.out.println(" ERROR: "+inodeList[i].getName()+" does not have that many blocks");
      break;
      }
    }

  }

  public enum Commands{ C, L, D, R, W } //used in switch() in main

  public static void main (String[] args) throws Exception{
    myFileSystem fs = new myFileSystem();
    
    FileInputStream input = new FileInputStream("testFile.txt"); 
    BufferedReader in = new BufferedReader(new InputStreamReader(input)); //used to read file
    
    String diskName = in.readLine(); //first line is diskName
    fs.FileSystem(diskName.toCharArray()); //set FileSystem
    
    char[] buf = new char[1024]; //used as 3rd param in R
    char[] hello = {'h','e','l','l','o'}; //for testing purposes of W

    String str;
    while((str = in.readLine())!=null){ //read in all other lines
      String[] createMe = str.split(" ");
      Commands enumVal = Commands.valueOf(createMe[0]); //make first letter from file a enum

      /*note: for testing purposes, parameters 3 of R and W are filled in */
      switch (enumVal) {
        case C: fs.create(createMe[1].toCharArray(), Integer.parseInt(createMe[2])); break;
        case L: fs.ls();break;
        case D: fs.delete(createMe[1].toCharArray()); break;
        case R: fs.read(createMe[1].toCharArray(), Integer.parseInt(createMe[2]), buf); break;
        case W: fs.write(createMe[1].toCharArray(), Integer.parseInt(createMe[2]), hello); break;
      }

    }

  }

}


Sample Output

#include 
1x-vl909-128-119-20-255:src xianchen$ java myFileSystem
C: Creating new file:file1.c 3
 Allocated Blocks: 1 2 3 
C: Creating new file:file2.c 8
 Allocated Blocks: 4 5 6 7 8 9 10 11 
C: Creating new file:file3.c 4
 Allocated Blocks: 12 13 14 15 
C: Creating new file:a.out 5
 Allocated Blocks: 16 17 18 19 20 
C: Creating new file:lab1.jav 6
 Allocated Blocks: 21 22 23 24 25 26 
L: Listing Files
 File Name: file1.c; File Size: 3
 File Name: file2.c; File Size: 8
 File Name: file3.c; File Size: 4
 File Name: a.out; File Size: 5
 File Name: lab1.jav; File Size: 6
W: Writing Block0 of file1.c
 --Disk Address of Block: 1
 WROTE: hello
W: Writing Block1 of file1.c
 --Disk Address of Block: 2
 WROTE: hello
W: Writing Block2 of file1.c
 --Disk Address of Block: 3
 WROTE: hello
W: Writing Block3 of file2.c
 --Disk Address of Block: 7
 WROTE: hello
W: Writing Block7 of file2.c
 --Disk Address of Block: 11
 WROTE: hello
W: Writing Block2 of file2.c
 --Disk Address of Block: 6
 WROTE: hello
W: Writing Block4 of file2.c
 --Disk Address of Block: 8
 WROTE: hello
W: Writing Block5 of file2.c
 --Disk Address of Block: 9
 WROTE: hello
W: Writing Block0 of a.out
 --Disk Address of Block: 16
 WROTE: hello
W: Writing Block1 of a.out
 --Disk Address of Block: 17
 WROTE: hello
W: Writing Block2 of a.out
 --Disk Address of Block: 18
 WROTE: hello
W: Writing Block3 of a.out
 --Disk Address of Block: 19
 WROTE: hello
W: Writing Block4 of a.out
 --Disk Address of Block: 20
 WROTE: hello
D: Deleting file file3.c
 Deallocated Blocks: 12 13 14 15 
R: Reading Block 2 of file1.c
 --Disk Address of Block: 3
 READ: hello
C: Creating new file:file4.c 7
 Allocated Blocks: 12 13 14 15 27 28 29 
L: Listing Files
 File Name: file1.c; File Size: 3
 File Name: file2.c; File Size: 8
 File Name: file4.c; File Size: 7
 File Name: a.out; File Size: 5
 File Name: lab1.jav; File Size: 6
R: Reading Block 4 of file2.c
 --Disk Address of Block: 8
 READ: hello
R: Reading Block 5 of file2.c
 --Disk Address of Block: 9
 READ: hello
R: Reading Block 6 of file2.c
 --Disk Address of Block: 10
 READ: 
D: Deleting file lab1.jav
 Deallocated Blocks: 21 22 23 24 25 26 
C: Creating new file:lab2.jav 7
 Allocated Blocks: 21 22 23 24 25 26 30 
R: Reading Block 1 of a.out
 --Disk Address of Block: 17
 READ: hello
R: Reading Block 3 of a.out
 --Disk Address of Block: 19
 READ: hello
R: Reading Block 0 of a.out
 --Disk Address of Block: 16
 READ: hello
L: Listing Files
 File Name: file1.c; File Size: 3
 File Name: file2.c; File Size: 8
 File Name: file4.c; File Size: 7
 File Name: a.out; File Size: 5
 File Name: lab2.jav; File Size: 7