Welcome, Guest: Register On Nairaland / LOGIN! / Trending / Recent / New
Stats: 3,150,564 members, 7,809,059 topics. Date: Thursday, 25 April 2024 at 10:03 PM

Another Programming Challenge - Programming - Nairaland

Nairaland Forum / Science/Technology / Programming / Another Programming Challenge (2199 Views)

Programming Challenge For Beginners Competition Two N20000 -SEASON 2- / Programming Challenge For Beginners N20000 / Facebook Programming Challenge Question (2) (3) (4)

(1) (2) (Reply) (Go Down)

Another Programming Challenge by ektbear: 12:10pm On Sep 27, 2012
You are given a very large text file, too big to load all of it in memory. It has N lines in it, where N is unknown to you.

You are also given some number K.

Write a program that prints K randomly chosen lines from the text file to the screen.

Try to make your program as efficient as possible.

(there is a slight bit of vagueness in what I mean by "K randomly chosen lines", but don't worry too much about it).
Re: Another Programming Challenge by Javanian: 12:32pm On Sep 27, 2012
If k =3 and the file contains 100 lines...should the printed lines follow each other e.g line 50,51 and 52 or it can be any 3 random lines e.g. 3,45, 82 ??
Re: Another Programming Challenge by ektbear: 12:45pm On Sep 27, 2012
The latter.
Re: Another Programming Challenge by Javanian: 8:13pm On Sep 27, 2012


import java.util.*;
import java.io.*;
public class ReadFile
{
static List list;

public static void main(String[] args)
{
Scanner read = new Scanner(System.in);
System.out.println("Enter the number of lines (k) you want to print"wink;
int k = read.nextInt();

list = new ArrayList();
File file = new File(args[0]);

try
{
FileReader fr = new FileReader(file);
BufferedReader in = new BufferedReader(fr);
String s;

//read each line from the file
s = in.readLine();

while(s!=null)
{

list.add(s);
s = in.readLine();
}
in.close();

}
catch(Exception ex)
{
ex.printStackTrace();
}
printRandom(k);
}

public static void printRandom(int k)
{
long i = Math.round(Math.random()*list.size());

if(k<=list.size())
{
for(int m =0; m<k; m++)
{
System.out.println("Read:" + list.get((int)(i)));
i = Math.round(Math.random()*list.size());
}
}
else
{
System.out.println("No of Lines too much..."wink;
System.exit(0);
}

}

}

Re: Another Programming Challenge by ektbear: 8:16pm On Sep 27, 2012
//read each line from the file
s = in.readLine();

while(s!=null)
{

list.add(s);
s = in.readLine();
}
in.close();


This above loads the entire file into memory ( the ArrayList list()).

If the file is bigger than your memory size, then this won't work. Assume the file is several terabytes or something.
Re: Another Programming Challenge by ektbear: 11:01pm On Sep 27, 2012
At a high level, this is the idea:

1. Sweep through file line by line to compute N
2. Select some subset of K integers from the numbers 1 through N
3. Sweep through the file again, adding the corresponding line numbers to the file.
Re: Another Programming Challenge by ektbear: 12:44am On Sep 28, 2012
too boring, I take it?
Re: Another Programming Challenge by PrinceNN(m): 7:07am On Sep 28, 2012
Am On it...expect my submission the moment I arrive my destination..no internet here, just my phone...
Re: Another Programming Challenge by ektbear: 8:01am On Sep 28, 2012
Another small one:

1. Write a function that determines if a number is prime or not (try to use an algorithm at least slightly better than the obvious one that just tests all factors).

2. If you plan on calling this function many times with many repeated values, how would you suggest speeding up the performance?
Re: Another Programming Challenge by Javanian: 8:36am On Sep 28, 2012
^ i suggest you remove this one until some one comes up with a solution to the former...so that we dont derail the thread...

For the former can i read it in parts, lets say 100kb until the file is exhausted?
Re: Another Programming Challenge by lordZOUGA(m): 10:37am On Sep 28, 2012
ekt_bear: At a high level, this is the idea:

1. Sweep through file line by line to compute N
2. Select some subset of K integers from the numbers 1 through N
3. Sweep through the file again, adding the corresponding line numbers to the file.
sweeping through a file of unknown length line by line?? Even if the file is say up to 1terabyte??
Alot of vagueness in your question though....
Besides I wouldn't append index to the file... Why? Because there would be no way to jump to it since you cannot load all the file into memory at once.
A buffer is a wise choice though...
Re: Another Programming Challenge by Javanian: 5:30pm On Sep 28, 2012


import java.util.*;
import java.io.*;
public class ReadFile
{
static List list;

public static void main(String[] args) throws Exception
{
//receive input for number of lines to print
Scanner read = new Scanner(System.in);
System.out.println("Enter the number of lines (k) you want to print"wink;
int k = read.nextInt();

list = new ArrayList();
File file = new File(args[0]);
//get the file size
int filesize=(int)file.length();
FileInputStream fis= new FileInputStream(file);

//size of required to read to at a time (1kb)
int size = 1024;


try
{
byte b[]=new byte[size];


int ch,c=0;

while(filesize>0)
{

ch=fis.read(b,0,size);
//read file of 1kb


filesize = filesize-ch;
//update the file size

//create temporary file name
String fname=c+"."+file.getName()+"";
c++;
//create temporary file
File tempFile = new File(fname);
FileOutputStream fos= new FileOutputStream(tempFile);
fos.write(b,0,ch);
fos.flush();
fos.close();

//read temporary file
FileReader fr = new FileReader(fname);
BufferedReader in = new BufferedReader(fr);
String s;

//read each line from the file
s = in.readLine();

while(s!=null)
{
//add each line of file to a list object
list.add(s);
s = in.readLine();
}
//delete temporary file to save memory
tempFile.delete();

in.close();
}
}
catch(Exception ex)
{
ex.printStackTrace();
}
printRandom(k);
}
public static void printRandom(int k)
{
//Generate random number
long i = Math.round(Math.random()*list.size());

//ensure k (required no of lines) is not more than number of lines in the list
if(k<=list.size())
{
//generate reqired number of lines(k)
for(int m =0; m<k; m++)
{
//print an random selected line
System.out.println("Read:" + list.get((int)(i)));
i = Math.round(Math.random()*list.size());
}
}
else
{
System.out.println("No of Lines too much..."wink;
System.exit(0);
}

}

}

Re: Another Programming Challenge by ektbear: 8:48pm On Sep 28, 2012
lordZOUGA:
sweeping through a file of unknown length line by line?? Even if the file is say up to 1terabyte??
If there are N lines, this will take O(N) time, won't it?

Plus most languages will do some sort of buffered I/O which reads big blocks of the file at a time, no?
Re: Another Programming Challenge by Javanian: 8:56pm On Sep 28, 2012
@ekt_bear did i get it right??
Re: Another Programming Challenge by ektbear: 9:02pm On Sep 28, 2012
Hmm...I think you need to have a function countLines() that first computes N.

Once you have this, then you generate a list of the K row numbers you'll need to grab.

Then you grab them.
Re: Another Programming Challenge by Javanian: 9:23pm On Sep 28, 2012

//This adds each line to the list
while(s!=null)
{

list.add(s);
s = in.readLine();
}
//This gets the no of lines in the list
list.size()



i dont think i need an extra function to compute N
Re: Another Programming Challenge by ektbear: 9:25pm On Sep 28, 2012
If the file is 4 terabyes, you've just created a list of size 4 TB. Which will not fit in your computer's memory.
Re: Another Programming Challenge by Javanian: 9:29pm On Sep 28, 2012
ok, then i will seat back and wait for a solution grin

1 Like

Re: Another Programming Challenge by mkwayisi: 9:38pm On Sep 28, 2012
Javanian: ok, then i will seat back and wait for a solution grin
And there you have it... implemented in C#

using System;
using System.IO;

public class Program
{
public static void Main()
{
string fileName = "file.txt"; // Name of file to read
int linesToRead = 5; // Number of random lines read

Random random = new Random();
int totalLines = 0;
int[] groups = new int[linesToRead];
string[] randomLines = new string[linesToRead];
StreamReader sr = null;

// Count the number of lines in the file. (Hint: I hate exceptions!)
try { sr = new StreamReader(fileName); }
catch (Exception e) { Console.WriteLine(e.Message); return; }
while (!sr.EndOfStream) {
sr.ReadLine();
totalLines++;
}

// "You don high?" -- Javanian
if (linesToRead > totalLines) {
Console.WriteLine("Can't dash me {0} naira when you have only {1} wink",
linesToRead, totalLines);
goto end; // The more they hate you, the more I love you!
}

// Split the lines into groups so we will pick one random line from each.
// This will make the result set more semantically random.
for (; linesToRead > 0; linesToRead--) {
groups[linesToRead - 1] = totalLines / linesToRead;
totalLines -= groups[linesToRead - 1];
}

// Randomly select the line number to read from each group.
for (int i = 0, min = 0, max = 0; i < groups.Length; min = max, i++)
groups[i] = random.Next(min, max += groups[i]);

// With the line numbers to read set, now read them!
sr.BaseStream.Position = 0;
for (int i = 0, line = 0; !sr.EndOfStream; line++)
if (groups[i] == line) {
randomLines[i++] = sr.ReadLine();
if (i == groups.Length)
break;
} else sr.ReadLine();

// Prove to the agnostic that this algorithm works wink
for (int i = 0; i < randomLines.Length; i++)
Console.WriteLine("Random Line {0}: {1}", i + 1, randomLines[i]);

end:
sr.Close();
}
}
Re: Another Programming Challenge by ektbear: 9:41pm On Sep 28, 2012
goto end; // The more they hate you, the more I love you!

lmao! grin
Re: Another Programming Challenge by Javanian: 9:41pm On Sep 28, 2012
while (!sr.EndOfStream) {
sr.ReadLine();
totalLines++;
}


You are doing the samething i did grin
Re: Another Programming Challenge by Javanian: 9:41pm On Sep 28, 2012
ekt_bear: goto end; // The more they hate you, the more I love you!

lmao! grin

dont mind him angry
Re: Another Programming Challenge by ektbear: 9:43pm On Sep 28, 2012
I think your algorithm works, mkwayisi. However, the splitting the lines into groups stuff is not something I mentioned...but can of course be corrected pretty easily.
Re: Another Programming Challenge by ektbear: 9:44pm On Sep 28, 2012
Javanian:
while (!sr.EndOfStream) {
sr.ReadLine();
totalLines++;
}


You are doing the samething i did grin

Right, but he just counts the number of lines to compute N. He doesn't store the line in memory like you do with your list.
Re: Another Programming Challenge by mkwayisi: 9:48pm On Sep 28, 2012
ekt_bear: I think your algorithm works, mkwayisi. However, the splitting the lines into groups stuff is not something I mentioned...but can of course be corrected pretty easily.
I introduced it just to make the result *more* random wink
Re: Another Programming Challenge by ektbear: 9:49pm On Sep 28, 2012
Eh...that makes it less random, not more random.

Well, it depends on how you define random I guess.
Re: Another Programming Challenge by mkwayisi: 9:53pm On Sep 28, 2012
My definition of "random":
Suppose you are to select 3 random numbers from set P, where P = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10},
R = {2, 5, 9} is *more* random logically (but certainly not academically) than R = {1, 2, 3}
Re: Another Programming Challenge by ektbear: 9:56pm On Sep 28, 2012
That typically isn't how people define a random subset of size 3 from N objects.

Basically, there are N choose 3 such subsets. And when people say "random", then mean that each of these subsets are equally likely.

Otoh, you seem to be throwing away subsets such as R={1,2,3}.

Anyway, not a big deal...easy to fix
Re: Another Programming Challenge by PrinceNN(m): 12:58am On Sep 29, 2012
Done in Java
http://pastebin.com/znMpn3TH

Pass the file path as d only arguement

Java fileClass.java c:/file.txt
Re: Another Programming Challenge by mkwayisi: 8:42am On Sep 29, 2012
₱®ÌИСΞ:
Done in Java
http://pastebin.com/znMpn3TH

Pass the file path as d only arguement

Java fileClass.java c:/file.txt

With your implementation, you did not eliminate the possibility of the same number being generated multiple times. So if your random numbers generated are say {1, 1, 2, 2, 3}, you'd end up displaying same lines multiple times as well.
Re: Another Programming Challenge by PrinceNN(m): 12:38pm On Sep 29, 2012
mkwayisi:

With your implementation, you did not eliminate the possibility of the same number being generated multiple times. So if your random numbers generated are say {1, 1, 2, 2, 3}, you'd end up displaying same lines multiple times as well.

Hahaha lol...thnx for dat...I imagined the possibility of dat occuring in a HUGE file was near impossible so I guess my mind didn't go there...lemme fix it
Re: Another Programming Challenge by lordZOUGA(m): 3:57pm On Sep 29, 2012
ekt_bear:
If there are N lines, this will take O(N) time, won't it?

Plus most languages will do some sort of buffered I/O which reads big blocks of the file at a time, no?
it will take O(N)... This would be nice but your algorithm suggested doing it twice when in my opinion, once would be enough.
Yea most languages has internal buffer... C++ has the internal buffer object stream_buf that is created for every I/O operation

(1) (2) (Reply)

Your Advice For A Beginner Database Administrator? / Lack Of Startup Ideas / Please How Can I Download A Torrent File From Freetutorial.us

(Go Up)

Sections: politics (1) business autos (1) jobs (1) career education (1) romance computers phones travel sports fashion health
religion celebs tv-movies music-radio literature webmasters programming techmarket

Links: (1) (2) (3) (4) (5) (6) (7) (8) (9) (10)

Nairaland - Copyright © 2005 - 2024 Oluwaseun Osewa. All rights reserved. See How To Advertise. 43
Disclaimer: Every Nairaland member is solely responsible for anything that he/she posts or uploads on Nairaland.