Amazon Interview Question for SDE-2s


Country: India

6 of 6 votes

Piped grep commands aside, you're probably better off with an architecture like Amazon Kinesis -> Spark -> Elasticsearch, or with a third-party solution like Loggly. That said, here's a typical recommended approach.

1. From the initial design, each application's logger should emit structured entries that include the application and component name, along with a timestamp, log level, and message.

2. Either add a log "wrapper" to each application that persists entries to a queue, cache, or central data mart; or make sure log rotation is enabled on each server, tail the logs, and pipe them through a "wrapper" that exports them to a central location. Another alternative (my preference) is to use Logstash to consume log messages and ship them to an external store, usually Kafka or Kinesis topics.

3. Consumer groups can grok/normalize the various log entries if they aren't already standardized, then republish the normalized versions to another topic in Kafka or Kinesis (see the consumer sketch after this list). You could also use a stream processor built into Kafka, or a third-party engine like Spark, if necessary.

4. Again leverage Logstash to consume the normalized topics and publish to Elasticsearch or another Lucene-based search engine with a document store. Be sure to add index rotation to the data mart later (perhaps expiring records beyond 15 days) or your cluster will become massive (see the indexing sketch after this list).

5. Spin up Kibana to provide a searchable browser-based interface with dashboards, visualizations, and time-series analysis plugins.

6. Add server-side alerting using Nagios or Monit, or use a commercial product like Elastic's X-Pack Watcher. Personally I use Monit with a custom shell script that posts a webhook message to a Slack channel for various system alerts (a webhook sketch follows after this list).
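
To make step 3 concrete, here is a minimal sketch of a normalizing consumer group using the kafka-python client. The topic names (raw-logs, normalized-logs) and the raw line format are illustrative assumptions, not fixed parts of the pipeline above.

{{{
from kafka import KafkaConsumer, KafkaProducer  # pip install kafka-python
import json
import re

consumer = KafkaConsumer('raw-logs', bootstrap_servers='localhost:9092',
                         group_id='log-normalizers')
producer = KafkaProducer(bootstrap_servers='localhost:9092',
                         value_serializer=lambda d: json.dumps(d).encode('utf-8'))

# Assumed raw format: "2017-05-17 19:20:01 ERROR payment-svc timed out"
LINE = re.compile(r'(?P<ts>\S+ \S+) (?P<level>\w+) (?P<app>\S+) (?P<msg>.*)')

for record in consumer:
    match = LINE.match(record.value.decode('utf-8'))
    if match:
        # Republish the normalized entry to a second topic (step 3).
        producer.send('normalized-logs', match.groupdict())
}}}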
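
For step 4's index rotation, the usual trick is to write each day's records into a date-stamped index, so that expiring data older than ~15 days is a cheap index deletion rather than a document-by-document purge. A sketch with the elasticsearch Python client (v8-style API; the index naming and fields are assumptions):

{{{
from datetime import datetime, timezone
from elasticsearch import Elasticsearch  # pip install elasticsearch

es = Elasticsearch('http://localhost:9200')

doc = {'ts': '2017-05-17T19:20:01Z', 'level': 'ERROR',
       'app': 'payment-svc', 'msg': 'timed out'}

# Daily indices (logs-2017.05.17, ...) let a scheduled job delete whole
# indices older than 15 days instead of scanning documents.
index = 'logs-' + datetime.now(timezone.utc).strftime('%Y.%m.%d')
es.index(index=index, document=doc)
}}}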
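
And for step 6, a sketch of the kind of webhook alert I described, written here in Python rather than shell; the webhook URL is a placeholder you would get from Slack's incoming-webhooks setup.

{{{
import requests  # pip install requests

SLACK_WEBHOOK = 'https://hooks.slack.com/services/T000/B000/XXXX'  # placeholder

def alert(text):
    # Slack incoming webhooks accept a JSON payload with a "text" field.
    requests.post(SLACK_WEBHOOK, json={'text': text}, timeout=5)

alert('disk usage above 90% on host web-01')
}}}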

;-) Enjoy!

- Mike Sparr - www.goomzee.com, May 17, 2017
0 of 0 votes

One possible solution:
1. Each application may log different parameters in its log file, as its requirements dictate.
2. Each log file carries an APPLICATION_ID that is unique to each application.
3. Apart from this application id, each log entry will have a LOG_LEVEL, DATE, TIME_STAMP, BODY, and STATUS.
4. Store the logs from each application in the distributed log system.
5. Map each APPLICATION_NAME to its APPLICATION_ID in the system; we can then query for all logs by APPLICATION_NAME.
6. We can likewise query all logs for a given TIME_STAMP and LOG_LEVEL (a schema sketch follows below).
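
A minimal sketch of that record layout as a Python dataclass; the field types and the name-to-id registry are illustrative assumptions.

{{{
from dataclasses import dataclass

@dataclass
class LogRecord:
    application_id: str  # unique per application (point 2)
    log_level: str       # e.g. DEBUG / INFO / WARN / ERROR (point 3)
    date: str            # YYYY-MM-DD
    time_stamp: int      # epoch millis; sortable for range queries (point 6)
    body: str
    status: str

# Point 5: map each application name to its id so queries can be
# expressed by name.
app_registry = {'checkout-service': 'app-42'}
}}}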

- Harsh Bhardwaj, June 08, 2017
0 of 0 votes

I am wondering how you design the schema to store log records so that queries by timestamp, by log level, and by application type all succeed. For the timestamp, I believe the logs can be ordered (via re-timestamping) when they are retrieved from Kafka (or another queuing system), rather than using the timestamp provided by the application server.
Once the logs are pushed to the log collector (Logstash etc.), some real-time analytics or decisions can be made per log instance (e.g. triggering events, acting on critical logs, raising alerts). Apart from these real-time actions, the logs need to be stored and queried.
Let's suppose the logs are stored in a database (an RDBMS or NoSQL?); these need to be searchable to provide the answers this question poses.
How should we store the log records to allow fast searches/lookups? (A query sketch follows below.)
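
One common answer, following the Elasticsearch route discussed above, is to index the level and application as exact-match (keyword) fields and the timestamp as a date field, so all three filters combine into a single bool query. A sketch with the elasticsearch Python client; the index and field names are assumptions:

{{{
from elasticsearch import Elasticsearch  # pip install elasticsearch

es = Elasticsearch('http://localhost:9200')

resp = es.search(index='logs-*', query={
    'bool': {'filter': [
        {'term': {'level': 'ERROR'}},            # by log level
        {'term': {'app': 'payment-svc'}},        # by application
        {'range': {'ts': {'gte': '2017-05-15',   # by time window
                          'lte': '2017-05-17'}}},
    ]}
})
for hit in resp['hits']['hits']:
    print(hit['_source'])
}}}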

- ali.kheam, June 11, 2017
0 of 0 votes

Here is a proposal for a solution:
1) Logs will be kept in clusters of in-memory databases, each cluster dedicated to a group of application servers running application X.

2) Each cluster will hold a data structure that stores the logs for an application of a particular type.

3) The in-memory structure holding the logs for application type X will be based on N associative caches, one cache per log level. The cache entries will be time ranges, say minute-based buckets, and each bucket will rely on suffix trees for fast text search.
There will be one such in-memory structure per day.
This allows a fast search by application type, filtered by level, timestamp, particular text, or any combination of the above (a sketch follows below).
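
A minimal Python sketch of that structure, with a plain substring scan standing in for the per-bucket suffix tree; all names are illustrative.

{{{
from collections import defaultdict

class DayLogStore:
    """One per application type per day: level -> minute bucket -> entries."""

    def __init__(self):
        self.caches = defaultdict(lambda: defaultdict(list))

    def add(self, level, epoch_seconds, text):
        self.caches[level][epoch_seconds // 60].append((epoch_seconds, text))

    def search(self, level, start_minute, end_minute, needle=''):
        # A real implementation would query a suffix tree per bucket;
        # a linear substring scan stands in for it here.
        for minute in range(start_minute, end_minute + 1):
            for ts, text in self.caches[level][minute]:
                if needle in text:
                    yield ts, text

store = DayLogStore()
store.add('ERROR', 1494962401, 'payment timeout for order 991')
minute = 1494962401 // 60
print(list(store.search('ERROR', minute, minute, 'timeout')))
}}}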

- Arie, March 07, 2018
-2 of 2 votes

grep "particular-text" logfile.txt | grep "1[5-7]-05-2017 19:20" | grep "log-level"

- Kapil, May 17, 2017

