# ソートされた配列の要素を検索する

ポジションを申請していたところ、コーディングの問題を完了するように依頼されました。私はそうし、それを提出しましたが、後で私はそのポジションから拒否されたことがわかりました。とにかく、私は折衷的なプログラミングの背景を持っているので、自分のコードがひどく間違っているのか、それともそこに最良の解決策がなかったのかはわかりません。私のコードを投稿し、それについていくつかのフィードバックを得たいと思います。その前に、問題の説明を次に示します。

You are given a sorted array of integers, say, c2020_0. Now you are supposed to write a program (in C or C++, but I chose C) that prompts the user for an element to search for. The program will then search for the element. If it is found, then it should return the first index the entry was found at and the number of instances of that element. If the element is not found, then it should return "not found" or something similar. Here's a simple run of it (with the array I just put up):

```Enter a number to search for: 4

4 was found at index 2.
There are 2 instances for 4 in the array.

Enter a number to search for: -4.

-4 is not in the array.
```

• ユーザーに入力を求めます。
• 次に、それが境界内にあるかどうかを確認します（配列のa [0]より大きく、配列の最大要素より小さい）。
• その場合、私は二分探索を実行します。
• 要素が見つかった場合、2つのwhileループを記述しました。1つのwhileループは、見つかった要素の左側にカウントされ、2番目のwhileループは、見つかった要素の右側にカウントされます。ループは、隣接する要素が目的の値と一致しない場合に終了します。

とにかく、欠けている高度なテクニックがあるのか​​、CSの背景がなくて大きなエラーを起こしただけなのかはわかりません。建設的な批評をいただければ幸いです！

``````#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stddef.h>

/* function prototype */

int get_num_of_ints(
const int* arr,
size_t r,
int N,
size_t* first,
size_t* count
);

int main()
{
int N;                                 /* input variable */
int arr[]={1,1,2,3,3,4,4,4,4,5,5,7,7,7,7,8,8,8,9,11,12,12}; /* sorted */
size_t r = sizeof(arr)/sizeof(arr[0]); /* right bound */
size_t first;                          /* first match index */
size_t count;                          /* total number of matches */

/* prompts the user to enter input */

printf( "\nPlease input the integer you would like to find.\n" );
scanf( "%d", &N );

int a = get_num_of_ints( arr, r, N, &first, &count );

/* If the function returns -1 then the value is not found.
Else it is returned */

if( a == -1)
printf( "%d has not been found.\n", N );

else if(a >= 0){
printf( "The first matching index is %d.\n", first );
printf( "The total number of instances is %d.\n", count );
}

return 0;
}

/* function definition */

int get_num_of_ints(
const int* arr,
size_t r,
int N,
size_t* first,
size_t* count)
{
int lo=0;    /* lower bound for search */
int m=0;     /* middle value obtained */
int hi=r-1;  /* upper bound for search */

int w=r-1;   /* used as a fixed upper bound to calculate the number of
right instances of a particular value. */

/* binary search to find if a value exists */

/* first check if the element is out of bounds */

if( N < arr[0] || arr[hi] < N ){
m = -1;
}
else{ /* binary search to find value if it exists within given params */

while(lo <= hi){
m = (hi + lo)/2;

if(arr[m] < N)
lo = m+1;
else if(arr[m] > N)
hi = m-1;
else if(arr[m]==N){
m=m;
break;
}
}
if (lo > hi)    /* if it doesn't we assign it -1 */
m = -1;
}

/* If the value is found, then we compute left and right instances of it */

if( m >= 0 ){

int j = m-1;    /* starting with the first term to the left */
int L = 0;  /* total number of left instances */

/* while loop computes total number of left instances */

while( j >= 0 && arr[j] == arr[m] ){
L++;
j--;
}

/* There are six possible outcomes of this. Depending on the outcome,
we must assign the first index variable accordingly */

if( j > 0 && L > 0 )
*first=j+1;
else if( j==0 && L==0)
*first=m;
else if( j > 0 && L==0 )
*first=m;
else if(j < 0 && L==0 )
*first=m;
else if( j < 0 && L > 0 )
*first=0;
else if( j=0 && L > 0 )
*first=j+1;

int h = m + 1;  /* starting with the first term to the right */
int R = 0;      /* total number of right instances */

/* while loop computes total number of right instances */
/* we fixed w earlier so that it's value does not change */

while( arr[h]==arr[m] && h <= w ){
R++;
h++;
}

*count = (R + L + 1); /* total num of instances stored into count */
return *first;        /* first instance index stored here */
}

/* if value does not exist, then we return a negative value */

else if( m==-1)
return -1;
}
``````

It's been a long while since I used C++, so I couldn't write it write now, but I know there were algorithms in the STL that would make very short work of this. Implementing your own binary search is probably a sign to the prospective employer that you're something of a beginner. The fact that you can implement it is good - but that you would choose to, when such algorithms are already implemented for you, might be discouraging to them.

I would factor the binary search out as a separate procedure.

Also, I would input the array, even if this is not called for, or factor it out into an input procedure that could be converted into something that reads from a file.

I prefer camelCase convention, and more meaningful variable names.

Your algorithm, however, seems fine. Have you thought about the possibility that the test was not a big factor in the hiring decision? Several applicants may have written equally good answers, and the decision made on other criteria.

It looks reasonable enough to me. Yes, maybe some STL methods might help. The only thing I would criticize is that you don't verify that the input is actually a number.

Still I don't see enough here to reject you base don this assignment. Maybe it was another part of the interview. Or maybe they did not reject you - they accepted someone else.

Your code is too complex for what it should do. There are too many comments, variables poor named, and not a clear definitions of functions roles. Some code to show what I would expect as a response:

``````#include <stdio.h>

int binary_search( const int value, const int *arr, size_t start, size_t end ){
if( value < arr[start] || value > arr[end] ){
return -1;
}

while(start <= end){
int pivot = (start+end) >> 1;
if(arr[pivot] == value){
return pivot;
}else if(arr[pivot] < value){
start = pivot+1;
} else if(arr[pivot] > value){
end = pivot-1;
}
}
return -1;
}

int get_occurences( int begin, const int *arr, size_t max){
int counter = 1;
int cursor = begin;
while ( (cursor+1) < max && arr[cursor] == arr[cursor+1]) {
counter++;
cursor++;
}
cursor = begin;
while ( (cursor-1) > 0 && arr[cursor] == arr[cursor-1]) {
counter++;
cursor--;
}
return counter;
}

#define MAX 22
int main()
{
int value;
int arr_sorted []={1,1,2,3,3,
4,4,4,4,5,
5,7,7,7,7,
8,8,8,9,11,
12,12};
size_t arr_size = MAX; // works also the other way

printf( "\nPlease input the integer you would like to find.\n" );
scanf( "%d", &value );
printf("Searching %d\n", value);
int pos = binary_search( value, arr_sorted, 0, arr_size-1);

if( pos == -1) {
printf( "%d has not been found.\n", value );
} else{
int howmany = get_occurences( pos, arr_sorted, arr_size);
printf( "The first matching index is %d.\n", pos );
printf( "The total number of instances is %d.\n", howmany );
}

return 0;
}
``````

I think the last part of your algorithm is sub-optimal. You could carry on your binary search to find the lowest and highest elements equal to the one you are looking for, and then subtract the addresses to find how many there are.

This would scale better if there are many identical elements, but, more importantly for the interview question, it would make your algorithm simpler. For example, the "6 outcomes" code is pretty hairy, and having such a bunch of if-else is often considered a code smell.

Random observations:

• The binary search should be decoupled from "find the longest run of identical elements containing this index."

• It's possible they want time logarithmic in the length of the run, rather than linear.

• I'd reject any candidate who submitted a six-way case analysis for a simple problem like finding a run. If you split that out as its own routine, you probably find a simpler way to do it like

``````for (first = m; first > arr && *first == arr[m]; first--)
;
for (last = m; last < arr+r-1 && *last == arr[m]; last++)
;
``````

This code costs linear in the length of the run; if they want logarithmic it becomes a tricky little problem—but in my opinion, over the top as an interview problem.

I would have a few concerns about hiring someone who submitted this for a code sample. Here is what I see.

First, addressing overall design, the algorithm is suboptimal, and is worst case linear instead of worst case logarithmic, because it doesn't use a binary search to find the amount of elements, but a linear one.

Second, (and this is what would have really killed it for me) the variable names. Most of these are either one or two letters, and the code is very unreadable because of it. Giving your variables descriptive names is important for maintainability.

Third, ignoring standard libraries. Unless instructed not to use them, you should prefer standard libraries which have implementations of binary search (e.g. the stl or bsearch)

Fourth, why have get_num_of_ints return -1 has a magic value feeling to me; better to just set count = 0 and check that.

Fifth, get_num_of_ints is just far too long, and tries to do way too much. It badly needs to be broken up.

Sixth (and this is a personal choice), I think C++, with the STL, is a far better choice in this instance.

In the spirit of "show, don't tell", here is how I would have written the assignment (untested, uncompiled) (edited to match the required function signature):

``````#include <iostream>
#include <algorithm>

using namespace std;

// This function signature is required:
int get_num_of_ints(const int* searchBegin, size_t searchSize, int input,
size_t* first, size_t* count) {
const int* searchEnd = searchBegin + searchSize;
const int* result = lower_bound(searchBegin, searchEnd, input);

if (searchEnd == result || *result != input)
return -1;

*first = result - searchBegin;
*count = upper_bound(result, searchEnd, input) - result;
return 0;
}

void print_search_results(const int* searchBegin, size_t searchSize, int input) {
size_t first;
size_t count;

if (get_num_of_ints(searchBegin, searchSize, input, &first, &count) < 0) {
cout << input << " is not in the array." << endl;
return;
}

cout << input << " was found at index " << first << ". "
<< "There are " << count << " instances for " << input << " in the array."
<< endl;
}

cout << "Enter a number to search for: ";
bool succeeded = cin >> *input;
cout << endl;
return succeeded;
}

int main (int argc, char** argv) {
const int searchNumbers[] = {1, 2, 4, 4, 5, 8, 9, 9, 9, 9, 9, 9, 10, 10, 10, 11, 13};

while(1) {
int input;
count << "Bad input, exiting" << endl;
return 1;
}

}
}
``````

``````m = (hi + lo)/2;
``````

is the wrong way to find the middle index. This can overflow. You should do:

``````m = lo + (hi - lo) / 2;
``````

Second, the line:

``````m=m;
``````

has no effect, and can be eliminated.

Was there a specific requirement that you do a binary search on the list? If there wasn't, why are you doing it? Unnecessarily overcomplicating a solution is one strike against you. * Edit: Whoops you said you had to do it this way in the question! Never mind, carry on *

Why are you using single character variable names? There's plenty of room in the symbol table for more descriptive names. That's another strike.

Too many comments. I can hear you all now: "is he crazy??". Seriously. The code should speak for itself as much as possible. Every time you start writing a comment think "is there a way I could make the code clear enough to avoid this comment?". That's strike three.

Coding is as much communicating to other people as it is communicating to a machine. Your coding style should reflect that. Code Complete is a great book with lots of useful coding style advice. Read the first few chapters and code up the solution again, applying the techniques he describes. Compare your two solutions while asking yourself "which one would I rather maintain?".

Is it just me, or am I the only one who would implement this in an entirely different fashion?

Looking at the requirements of the problem:

1. Determine if element is present
2. If present return array index of first occurrence
3. If present return number of occurrences

You have to think outside the box on a problem like this. Specifically, I would implement a solution to this by dumping the array into a hash at the beginning of the program. The hash key is the number, and the value is a struct containing the index of the first occurance and the total occurance count. You set up this hash as part of the program start up / initilization and then any subsequent look ups by the user are always constant time operations - BigO(1).

You would have no problem implementing this in C++ with the STL.

unordered_map (C++)

Binary Search is just the wrong solution to this problem.

Edit

Regarding the cost of setting up the hash:

Setup is a constant cost incurred only once. While I won't say it doesn't matter, there are a limited number of cases where it does; usually when you have either very small data sets or you execute the algorithm a very small number of times. Other than that setup cost is amortized across all the executions of the algorithm. An algorithm that requires n setup but executes in BigO(1) still beats an algorithm that requires 0 setup but executes in BigO(log n) time unless you're executing it a very small number of times.

This is what I would have turned in, simple clean and easy to read:

I modified it to test the timing with an array of 300 Million ints and even on my older system it found the "worst case" values (the ones at the end of the array) in about a second (took 6 seconds to fill the array at startup).

So IMHO all this binary search blather falls under the category of "pre-mature optimization" for this simple interview question. :-)

Do the simplest thing that works. Write the code so people can read it and not have to decipher it. Of course, be prepared to discuss performance alternatives if needed.

Latest edit: Addition of FindStartingPoint() function to optimize for speed.

``````/*****************************************************************
Module: FindInts.cpp
Author: Ron Savage
Date: 03/13/2010

Description:
This program prompts the user for an integer, searches for it
in a given sorted array of integers and prints the first location
and number of matches in the array.

Modification History:
Date       Init Comment
03/14/2010 RS   Added a recursive FindStartingPoint function to
find a good starting point for the linear search.
03/14/2010 RS   Added a 300 million int array to test timing.
03/13/2010 RS   Created.
*****************************************************************/
#include <stdafx.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stddef.h>

/* function prototype */
void   FillArray( int* searchList, size_t listSize);
int*   FindStartPoint( int* searchList, size_t listSize, int numberToFind );
size_t FindMatches( int* searchList, size_t listSize, int numberToFind, size_t* firstMatchIndex, size_t* matchCount );

/*****************************************************************
Function: main()
Author: Ron Savage
Date: 03/13/2010

Description:
Entry point to the program.
*****************************************************************/
int main()
{
int userInput = 0;
int *valueList = 0;

// Allocate an array of 300 million ints
size_t listSize = 300000000;
if (valueList = (int*) malloc(listSize * sizeof(int)))
{
size_t firstMatchIndex = 0;
size_t matchCount = 0;

/* Fill the array with some values */
FillArray(valueList, listSize);

/* prompt the user to enter input */
printf( "\nPlease input the integer you would like to find.\n" );
scanf_s( "%d", &userInput );

/* Search the list for the value entered */
size_t iterations = 0;
iterations = FindMatches( valueList, listSize, userInput, &firstMatchIndex, &matchCount );

/* Display the results if found */
if ( matchCount > 0 )
{
printf("\n%d matches for [%d] were found, starting at index %d in %ld iterations.\n", matchCount, userInput, firstMatchIndex, iterations);
}
else
{
printf("\nNo matches for [%d] were found.\n", userInput);
}
}
else
{
printf("\nCouldn't allocate memory for [%ld] ints.\n", listSize);
}
}

/*****************************************************************
Function: FindMatches()
Author: Ron Savage
Date: 03/13/2010

Description:
This function searches the searchList for the value entered
by the user (numberToFind) and returns the first index and
count of the matches.
*****************************************************************/
size_t FindMatches( int* searchList, size_t listSize, int numberToFind, size_t* firstMatchIndex, size_t* matchCount )
{
// Set the default values if no matches are found
*firstMatchIndex    = 0;
*matchCount         = 0;

// Find the start point of the number
int* startPoint   = FindStartPoint( searchList, listSize, numberToFind );

// Loop through the list looking for matches, exit early if we pass the value in the sorted list.
size_t searchIndex       = 0;
size_t searchInterations = 0;
size_t startIndex        = startPoint - searchList;
for (searchIndex = startIndex; searchIndex < listSize; searchIndex++)
{
searchInterations++;

if ( searchList[searchIndex] == numberToFind )   if ( ++(*matchCount) == 1 ) *firstMatchIndex = searchIndex;

if ( searchList[searchIndex] > numberToFind ) break;
}

return(searchInterations);
}

/*****************************************************************
Function: FindStartPoint()
Author: Ron Savage
Date: 03/14/2010

Description:
This function checks to see if the numberToFind is in the top
or bottom half of the searchList, then recursively calls itself
on each subsequently smaller sub-list until a starting point
at or slightly before the first numberToFind is found.
*****************************************************************/
int* FindStartPoint( int* searchList, size_t listSize, int numberToFind )
{
// Check the value in the middle, to determine which half the number is in
size_t middleIndex  = listSize / 2;

// Recurse into this function for just the first half of the list
if ( searchList[middleIndex] >= numberToFind && middleIndex > 1 )
{
return(FindStartPoint(searchList, middleIndex, numberToFind ));
}

// Recurse into this function for just the last half of the list
if ( searchList[middleIndex] < numberToFind && middleIndex > 1 )
{
return(FindStartPoint(&searchList[middleIndex], middleIndex, numberToFind ));
}

return(searchList);
}

/*****************************************************************
Function: FillArray()
Author: Ron Savage
Date: 03/13/2010

Description:
This function fills the array with a sorted list of values.
*****************************************************************/
void FillArray( int* searchList, size_t listSize)
{
size_t searchIndex  = 0;
int    value        = 0;

for (searchIndex = 0; searchIndex < listSize; searchIndex++)
{
searchList[searchIndex] = value++/100;
}
}
``````

I came up with a solution that is $$\O(\log n)\$$. It involves a modified binary search that can be told to always take exactly log n comparisons (no early breaking if a match is found), and depending on a parameter, it will either keep trying to find the value to the left or to the right in the array until exhaustion.

In this manner, we can run the binary search at most 2 times - no need to run it the second time if the first reports the value wasn't found - and give back the number of matches (last index - first index + 1). I pass in the `pos` variable as a pointer so we can effectively "return" 2 values - the number of matches, and the index of the first match in the run.

``````// returns the number of matches, and sets pos to the index of
// the first match, or -1 if none found
int findMatches(int array[], int size, int searchNum, int* pos)
{
*pos = binarySearch(array, size, searchNum, -1);
// if it was found, pos points to a positive number
if(*pos >= 0)
{
int lastIdx = binarySearch(array, size, searchNum, 1);
return lastIdx - *pos + 1;
}
return 0;
}
``````

Then I've got a binary search method that takes an extra argument, `direction`, which indicates whether you want the first binary search index (0), the earliest (-1) or the last (+1).

``````int binarySearch(int array[], int size, int searchNum, int direction)
{
int left = 0;
int right = size - 1;
int center;
int pos = -1;

while(left <= right)
{
center = (right + left) >> 1;

if(array[center] == searchNum)
{
pos = center;
// break early if we want to find the exact match
if(direction == 0) break;
}

// adding 1 to searchNum means we will find the last in the run
if(array[center] < searchNum + ((direction > 0) ? 1 : 0))
{
left = center + 1;
}
else
{
right = center - 1;
}
}

return pos;
}
``````

In my experience, the

``````if (condition)
consequence;
statementx;
...
``````

style is a land mine just waiting for another (or even the same) developer to extend it to:

``````if (condition)
consequence1;
consequence2;
statementx;
...
``````

Some of you might see what the problem is, but for most programmers, it is a virtually invisible bug because developers tend to interpret code by indentation even though the curly braces are missing, making consequence2 unconditional.

Another possibility is that the choice of C++ and C was a test in itself :) Maybe they were willing to settle for someone with C experience, but would have preferred someone with C++ experience, so that could have been a factor.

If you are really curious, you could even contact them again and ask for some feedback.

However, programming style is something that you should work on continuously, not just in preparation for an interview. You aren't going to change things much in that time. I would take a longer range view and try to get some C++, C#, and Java experience under your belt, maybe PHP or perl, or ruby or python, work continouously on programming better, maybe read a book or article on interviewing and resume writing, and keep applying.

I really think maybe they just didn't like the shirt you were wearing :)

I came a bit late to this, but this is something like what I might expect to see in C++.

``````// Interface defined by the problem statement has been adjusted
size_t get_num_of_ints( const int* arr, size_t r, int N, size_t* first )
{
std::pair<const int *, const int *> range = std::equal_range(arr, arr+r, N);
size_t count = range.second - range.first;
if (count > 0 && first != 0)
*first = range.first - arr;
return count;
}
``````

I think the interface is broken as defined, so I've fixed it :)

Slightly more than a nitpick, but probably not a big enough problem to rule you out:

Why do you explicitly test the final `else` case in three exhaustive `if ... else` statements? Generally, either they should have another `else` statement to catch any possible unexpected case, or, in all these cases, you ought to comment out the final `if` clause:

``````if( a < 0)
printf( "%d has not been found.\n", N );
else //if(a >= 0)
{ ...

if(arr[m] < N)
lo = m+1;
else if(arr[m] > N)
hi = m-1;
else //if(arr[m]==N)
{ ...

if( j > 0 && L > 0 )
*first=j+1;
else if( j==0 && L==0)
*first=m;
else if( j > 0 && L==0 )
*first=m;
else if(j < 0 && L==0 )
*first=m;
else if( j < 0 && L > 0 )
*first=0;
else //if( j=0 && L > 0 )
*first=j+1;
``````

Note also the typo in that last condition.

I did adjust the first condition to clearly ensure there's not an `else` case to add, and that is a valid option for the final case too, but as others have mentioned there's more consolidation available here anyway.

BTW don't take my reusing your single statement then/else clauses as an endorsement of that; it just wasn't what I was intending to discuss.

Your solution works, yet, as said in previous answers, it runs in worst case linear time. You can reduce it to worst-case logarithmic by rewriting two functions in the C++ standard library to C: namely, std::lower_bound and std::upper_bound. `std::lower_bound` will return the "iterator" (pointer in C, call it `const int* res`) to the first matching element (with value `N`). If there is no such, we will have `N != *res`. If the pointer is correct, perform `std::upper_bound` and the count of matching elements will be simply the difference between the result of `std::upper_bound` and `std::lower_bound`. Also, what comes to API, I suggest you insert the value of zero to `count` whenever there is no match. Putting all pieces together, you may obtain something like this:

``````#include <stdio.h>

static const int* upper_bound(const int* begin,
const int* end,
const int value)
{
size_t count;
size_t step;
const int* iter;
count = end - begin;

while (count > 0)
{
iter = begin + (step = (count >> 1));

if (*iter <= value)
{
begin = iter + 1;
count -= step + 1;
}
else
{
count = step;
}
}

return begin;
}

static const int* lower_bound(const int* begin,
const int* end,
const int value)
{
size_t count;
size_t step;
const int* iter;
count = end - begin;

while (count > 0)
{
iter = begin + (step = (count >> 1));

if (*iter < value) /* upper_bound compares "*iter <= value" */
{
begin = iter + 1;
count -= step + 1;
}
else
{
count = step;
}
}

return begin;
}

void get_number_of_ints(const int* array,
size_t array_length,
int target_value,
size_t* first_index,
size_t* count)
{
const int* iter1;
const int* iter2;

iter1 = lower_bound(array, array + array_length, target_value);

if (*iter1 != target_value)
{
/* No match. */
*count = 0;
return;
}

iter2 = upper_bound(array, array + array_length, target_value);
*first_index = (size_t)(iter1 - array);
*count = (size_t)(iter2 - iter1);
}

int main()
{
int N;                                 /* input variable */
int arr[]={1,1,2,3,3,4,4,4,4,5,5,7,7,7,7,8,8,8,9,11,12,12}; /* sorted */
size_t r = sizeof(arr)/sizeof(arr[0]); /* right bound */
size_t first;                          /* first match index */
size_t count;                          /* total number of matches */

/* prompts the user to enter input */

printf( "\nPlease input the integer you would like to find.\n" );
scanf( "%d", &N );

get_number_of_ints(arr, r, N, &first, &count);

if (count == 0)
{
}
else
{
printf("%d found at %zu, length %zu.\n", N, first, count);
}

return 0;
}
``````

Before we even look at the interface and algorithm, there's a few issues to address:

• `<string.h>` and `<stddef.h>` aren't required, as far as I can see (note that `<stdlib.h>` must provide `size_t`, if that's why the latter was included).

• `int main()` is a non-prototype definition; prefer `int main(void)` instead.

• There's no check that `scanf()` successfully converted a value. This lack of checking is a serious bug. If `scanf()` doesn't return 1 here, we need to take corrective action (consume input and prompt again, or just write a message to `stderr` and return `EXIT_FAILURE`).

• We attempt to print the results using `%d` with `size_t` values - we should be using `%zd` instead.

As for the `get_num_of_ints()` function, there are many narrowing and sign-converting conversions hidden there, due to use of `int` variables which really ought to be `size_t` (don't just blindly convert, because subtraction will overflow when `r` is zero).

To be honest, I don't understand why you're implementing your own binary search from scratch instead of simply using the Standard Library `bsearch()` for this.

Oh, and the function name is highly misleading, as the return value is not the number of matches; it's the index of first match.

I think all my other observations are mentioned in older answers; there's no point repeating those.