Welcome, Guest: Register On Nairaland / LOGIN! / Trending / Recent / New
Stats: 3,148,909 members, 7,802,957 topics. Date: Saturday, 20 April 2024 at 05:17 AM

Storing K largest numbers from a stream - Programming - Nairaland

Nairaland Forum / Science/Technology / Programming / Storing K largest numbers from a stream (2551 Views)

Sorting List Of Numbers And Strings: Simple Puzzle / Microsoft Attempting World's Largest Coding Marathon / Randomising ,storing And Retrieving From Database (2) (3) (4)

(1) (Reply) (Go Down)

Storing K largest numbers from a stream by ektbear: 3:39am On Oct 31, 2012
I'm going to see a long stream of numbers. Say, N of them..think N=1 billion, 1 trillion, etc.

I want to store the top K largest of these numbers. Think K=10, etc.

Design a function/program that given stores the K largest numbers of this stream. Try to make your program/function as efficient as possible.

Hint: Essentially, start thinking about how to design a function insert(f) that given a number f, will check whether it needs to be stored in my collection of the K largest numbers and updates this collection appropriately.
Re: Storing K largest numbers from a stream by Javanian: 8:22am On Oct 31, 2012
Is memory a problem?
Re: Storing K largest numbers from a stream by ektbear: 8:24am On Oct 31, 2012
The N numbers could be much bigger than the capacity of your memory.

So the simple algorithm one might have in mind of loading all the N numbers into memory, sorting, and picking the K largest won't work, since you don't have space to sort all N items at once.

So, you sort of want to process the data sequentially. So a possible design is this:

1. Build some class/data structure that has internal storage for the K largest elements
2. build an insert method for this class that allows you to insert a potential number into the data structure.
3. Then, as you see this stream of N numbers, call your insert function on each new number you see.

So in pseudocode:

a = MyClass.new(K)
for i=1..N
a.insert(Next # in stream of numbers)
end

Something like that. Or more legibly/filled out a bit more, something like this:
http://pastebin.com/5xRVQ6iC
Re: Storing K largest numbers from a stream by lordZOUGA(m): 2:18pm On Oct 31, 2012
ekt_bear: I'm going to see a long stream of numbers. Say, N of them..think N=1 billion, 1 trillion, etc.

I want to store the top K largest of these numbers. Think K=10, etc.

Design a function/program that given stores the K largest numbers of this stream. Try to make your program/function as efficient as possible.

Hint: Essentially, start thinking about how to design a function insert(f) that given a number f, will check whether it needs to be stored in my collection of the K largest numbers and updates this collection appropriately.
okay, my first reply was kinda hasty. Didn't get that the exercise needed K greater numbers.
I will make an array that can hold K greater numbers and another K sized array that will hold the buffer... Both arrays will always be sorted in descending order...
If(greater[LAST] >= buffer[FIRST]){
//discard buffer}
if(greater[FIRST] <= buffer[LAST]){
//replace greater with buffer
}
if all the above returns false, I will use a merge() that returns an array with the greatest of both arrays...
Re: Storing K largest numbers from a stream by Javanian: 8:39pm On Oct 31, 2012

/**PHCN has decided to abandon us, so i wrote this code from a mobile phone, if it works its my code else @ekt_bear wrote it grin
***/

import java.util.*;
public class TopK
{
//assuming k is 5
int k=5;
int[] numbers;

public TopK()
{
numbers = new int[k];
for(int i=0; i< numbers.length; i++)
{

numbers[i]=0;
}
}
public void insert(int n)
{
//sort the arrays in ascending order
Arrays.sort(numbers);
for(int i=0; i<=numbers.length; i++)
{
if(n>numbers[i])

numbers[i]=n;
break;
}
}
public void print()
{
for(int i=0; i<numbers.length; i++)
System.out.println(numbers[i]);
}
public static void main(String[] args)
{
TopK tp = new TopK();
//Here you will read @ekt_bear 1 Zillion numbers and insert them individually e.g tp.insert(read.nextInt)
//but for sake of convinience i will just insert some numbers to test it

tp.insert(90);
tp.insert(40);
tp.insert(100);
tp.insert(60);
tp.insert(10);
tp.insert(80);
tp.insert(110);
tp.insert(1000);
tp.insert(260);
tp.insert(10);
tp.print();
}

}

Re: Storing K largest numbers from a stream by ektbear: 9:04pm On Oct 31, 2012
Your Java implementation looks close.

However, some comments:

1. In your constructor, shouldn't you pass k as an argument to it?
2. In your constructor, you initialize all the values to zero. This is fine if you know that all of the #s in the stream are bigger than zero. But if they are just arbitrary floating point numbers, then you'd need to do things a bit differently.
3. In your function insert(), since you sort numbers right away, then you only need to compare n to the bottom element of numbers, i.e., numbers[0], right?


But you are quite close.
Re: Storing K largest numbers from a stream by Javanian: 9:06pm On Oct 31, 2012
Like i said I'm writing from a mobile phone so i overlooked a lot of things but those should be easy to fix right?
Re: Storing K largest numbers from a stream by ektbear: 9:08pm On Oct 31, 2012
1 and 3 are easy to fix.

2 is also easy to fix, but you'll either need some sort of minimum float value from Java, or (even better) to handle the case when you've only seen K numbers from the stream separately from the case when you've already seen more than K numbers.
Re: Storing K largest numbers from a stream by ektbear: 10:25pm On Oct 31, 2012
Anyway, let's pretend/assume that you fixed those issues. Essentially, you are paying K log(K) time each time you call insert. If K is a small constant or something, this isn't a big deal.

However, there is a way to improve this substantially by making the "cost" of each insertion log(K). Have you heard of something called a heap/priority queue? You can make the program much faster if you use a min-heap.
Re: Storing K largest numbers from a stream by Javanian: 10:41pm On Oct 31, 2012
Yeah!...i know of heap and priority queues...i will attempt your suggestion when am on p.c. Although i would have prefered using something like a Binary search tree
Re: Storing K largest numbers from a stream by ektbear: 10:49pm On Oct 31, 2012
Interesting, if you can implement it with a BST that'd be cool.
Re: Storing K largest numbers from a stream by PrinceNN(m): 8:07am On Nov 01, 2012
*peeps in...steals @javanian's implementation for rebranding* hehe
Re: Storing K largest numbers from a stream by lordZOUGA(m): 6:00pm On Nov 01, 2012
modified my post above
Re: Storing K largest numbers from a stream by ektbear: 8:45am On Nov 04, 2012
Here is a Minheap based approach in Ruby: http://pastebin.com/P1uYFJP5

I'll at some point see if the data structures Javanian proposes can also be used to solve the problem.
Re: Storing K largest numbers from a stream by Javanian: 9:50pm On Nov 04, 2012
with a HEAP

class Node
{
private int iData;

Node(int key)
{ iData = key; }

int getKey()
{ return iData; }

void setKey(int id)
{ iData = id; }
}
class Heap
{
private Node[] heapArray;
private int maxSize; // size of array
private int currentSize; // number of nodes in array

Heap(int mx)
{
maxSize = mx;
currentSize = 0;
heapArray = new Node[maxSize];

for(int i=0; i<heapArray.length; i++)
{
Node newNode = new Node(0);
heapArray[i] = newNode;
currentSize++;
}
}

boolean isEmpty()
{ return currentSize==0; }

boolean insert(int key)
{
for(int i=0; i<heapArray.length; i++)
{
Node newNode = new Node(key);
while(newNode.getKey() > heapArray[i].getKey())
{
Node temp = heapArray[i];
heapArray[i] = newNode;
change(i+1, temp.getKey());
trickleUp(i);
return true;
}
}
return true;
}

void trickleUp(int index)
{
int parent = (index-1) / 2;
Node bottom = heapArray[index];
while( index > 0 &&
heapArray[parent].getKey() < bottom.getKey() )
{
heapArray[index] = heapArray[parent]; // move it down
index = parent;
parent = (parent-1) / 2;
}
heapArray[index] = bottom;
}

public boolean change(int index, int newValue)
{
if(index<0 || index>=currentSize)
return false;
int oldValue = heapArray[index].getKey(); // remember old
heapArray[index].setKey(newValue); // change to new
if(oldValue < newValue) // if raised,
trickleUp(index); // trickle it up
return true;
}
public void displayHeap()
{
for(int i=0; i<heapArray.length; i++)
{
System.out.println(heapArray[i].getKey());
}
}

}
class HeapApp
{
public static void main(String[] args)
{
Heap theHeap = new Heap(2);
theHeap.insert(7000000);
theHeap.insert(4000);
theHeap.insert(50000);
theHeap.insert(20);
theHeap.insert(-1);
theHeap.insert(100);
theHeap.insert(8000);
theHeap.insert(30);
theHeap.insert(10);
theHeap.insert(900000000);

theHeap.displayHeap();


}
}
Re: Storing K largest numbers from a stream by ektbear: 10:35pm On Nov 04, 2012
Why not use the java collections heap rather than implementing your own?
Re: Storing K largest numbers from a stream by ektbear: 10:36pm On Nov 04, 2012
Cool that you know how to implement a heap from scratch off the top of your head though
Re: Storing K largest numbers from a stream by Javanian: 10:51pm On Nov 04, 2012
ekt_bear: Why not use the java collections heap rather than implementing your own?

Actually, I've been looking for an opportunity to implement it grin , Thanks for giving me one...
Re: Storing K largest numbers from a stream by ektbear: 8:23am On Nov 05, 2012
Sure. Now, let's see your binary search tree based approach.
Re: Storing K largest numbers from a stream by ektbear: 12:10am On Nov 09, 2012
Javanian, where u dey
Re: Storing K largest numbers from a stream by kadeerna: 1:21pm On Nov 09, 2012
Maybe I don't completely understand what you guys are doing here, but exactly why are you sorting and comparing such a large stream of numbers using less than ( < ) and greater than ( > ) comparators?

Have you thought of keeping a moving track of the ex. lengths of each entry instead?
Re: Storing K largest numbers from a stream by ektbear: 7:55am On Nov 10, 2012
not too sure what you mean, kadeerna
Re: Storing K largest numbers from a stream by kadeerna: 11:26am On Nov 10, 2012
In keeping track of the highest K values from a list of N values,

- Start at value 1 and store the length of the value converted to a string.
- On every other value in the list, check the length stored from 1 above against the length of the current value (converted to a string).
- If the length of the current value is greater than the value from 1, update the variable with the current longest found length and save the value to longest[0].
- Continue the same check through other values in the list
- Where you find the length of the current value is equal to longest length variable, either do a greater than, less than or equality comparison. Another option might be to check the individual digits
ex.

longestValue = 122323213254354664323, longestValueLength = 20
current(list) = 1290344355465742345, current(list).length = 20

currentLonger = false
while i < 19 or currentLonger == false
currentLonger = (longestValue[0] == current(list)[0]) <-- cross check condition.

longestValue = (currentLonger) ? current(list) : longestValue;

Expanding this to store the last K values will simply modifying the above to use arrays or K instead.

This way, you don't store at least 2 data structures of N size.
Re: Storing K largest numbers from a stream by ektbear: 10:13pm On Nov 10, 2012
What is "length of value?"

Like, what is the length of the value of say the number 50?
Re: Storing K largest numbers from a stream by Javanian: 2:33am On Nov 11, 2012
@ekt_bear i'm sorry for the late reply there has been shortage of power in my area.

O.k. I tried implementing it with a BST it worked but it was no different from when i did it with a heap. It will take O(log2n) time at worst case. Although i have to admit that suggesting a BST at first was wrong, what i had in mind was something like a Binary Heap which will return the maximum item in O(1) time at worst case. But it is more or less useless because I'm not going to be inserting all the items. So i wont be able to take advantage of this...
Re: Storing K largest numbers from a stream by kadeerna: 11:24am On Nov 11, 2012
ekt_bear: What is "length of value?"

Like, what is the length of the value of say the number 50?

Length here, is the number of characters in the value when converted to a string.

->>>> ((string)12232111).Length

Length of 50 = 2
Re: Storing K largest numbers from a stream by Nobody: 6:06pm On Nov 11, 2012
Ekt, I'd like to program like you. How do I go about it? I've intermediate programming knowledge.
Re: Storing K largest numbers from a stream by ektbear: 1:08am On Nov 12, 2012
Hi,

My suggestion is for you to pick one nice, simple, well-designed scripting language like Python and use it as your main programming language for a year.

The nice thing about Python is that it is only a level above pseudo-code...makes it very easy to express your ideas.

Register for a programming competition site like spoj (http://www.spoj.pl/), work through the exercises there. You'll learn a lot from doing this.

Anyway, I think this would be a good place to start.

Let me know if you have any questions.
Re: Storing K largest numbers from a stream by ektbear: 1:24am On Nov 12, 2012
Javanian: @ekt_bear i'm sorry for the late reply there has been shortage of power in my area.

O.k. I tried implementing it with a BST it worked but it was no different from when i did it with a heap. It will take O(log2n) time at worst case. Although i have to admit that suggesting a BST at first was wrong, what i had in mind was something like a Binary Heap which will return the maximum item in O(1) time at worst case. But it is more or less useless because I'm not going to be inserting all the items. So i wont be able to take advantage of this...

Yeah, so it looks like using a binary search tree leads to basically the same result as a heap.

You pay log(k) for insertions, deletions, and searches. Java has a TreeMap (http://docs.oracle.com/javase/6/docs/api/java/util/TreeMap.html) data structure which can be used for this.
Re: Storing K largest numbers from a stream by Nobody: 10:55am On Nov 12, 2012
@Ekt_bear Thanks.
Re: Storing K largest numbers from a stream by Javanian: 5:05pm On Dec 25, 2013
bump
Re: Storing K largest numbers from a stream by ektbear: 3:28am On Dec 26, 2013
Here is a refreshed version in Java: http://pastebin.com/X61JmSEs

(1) (Reply)

Multiple Login Control / What Is The Longest Line Of Code You Have Written And In What Language / I Need Pdf E-book On Vb.net For Web Application

(Go Up)

Sections: politics (1) business autos (1) jobs (1) career education (1) romance computers phones travel sports fashion health
religion celebs tv-movies music-radio literature webmasters programming techmarket

Links: (1) (2) (3) (4) (5) (6) (7) (8) (9) (10)

Nairaland - Copyright © 2005 - 2024 Oluwaseun Osewa. All rights reserved. See How To Advertise. 44
Disclaimer: Every Nairaland member is solely responsible for anything that he/she posts or uploads on Nairaland.