Databricks Certified Professional Data Engineer Exam certification: Databricks-Certified-Professional-Data-Engineer exam questions (Q39-Q44):

Question # 39
What steps need to be taken to set up a DELTA LIVE PIPELINE as a job using the workspace UI?

  • A. Use Pipeline creation UI, select a new pipeline and job cluster
  • B. Select Workflows UI and Delta live tables tab, under task type select Delta live tables pipeline and select the notebook
  • C. DELTA LIVE TABLES do not support job cluster
  • D. Select Workflows UI and Delta live tables tab, under task type select Delta live tables pipeline and select the pipeline JSON file


The answer is B:
Select the Workflows UI and the Delta Live Tables tab; under task type, select Delta Live Tables pipeline and select the notebook.
Create a pipeline
To create a new pipeline using the Delta Live Tables notebook:
1. Click Workflows in the sidebar, click the Delta Live Tables tab, and click Create Pipeline.
2. Give the pipeline a name and click to select a notebook.
3. Optionally enter a storage location for output data from the pipeline. The system uses a default location if you leave Storage Location empty.
4. Select Triggered for Pipeline Mode.
5. Click Create.
The system displays the Pipeline Details page after you click Create. You can also access your pipeline by clicking the pipeline name in the Delta Live Tables tab.
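The same settings can also be written down as data rather than clicked through. The sketch below is a minimal illustration, assuming the field names used by the Databricks Pipelines and Jobs REST APIs (`libraries`, `storage`, `continuous`, `pipeline_task`); the pipeline name, notebook path, and pipeline ID are hypothetical placeholders, and the code only builds the payloads without sending them anywhere.

```python
import json
from typing import Optional


def build_pipeline_settings(name: str, notebook_path: str,
                            storage: Optional[str] = None) -> dict:
    """Mirror the UI steps: name, notebook, optional storage, Triggered mode."""
    settings = {
        "name": name,
        # Step 2: the notebook that defines the pipeline's tables.
        "libraries": [{"notebook": {"path": notebook_path}}],
        # Step 4: Triggered mode corresponds to a non-continuous pipeline.
        "continuous": False,
    }
    # Step 3: storage is optional; omitting it leaves the default location.
    if storage:
        settings["storage"] = storage
    return settings


def build_job_with_pipeline_task(job_name: str, pipeline_id: str) -> dict:
    """A job whose single task runs an existing pipeline."""
    return {
        "name": job_name,
        "tasks": [
            {
                "task_key": "run_pipeline",
                "pipeline_task": {"pipeline_id": pipeline_id},
            }
        ],
    }


if __name__ == "__main__":
    # Hypothetical names and IDs, for illustration only.
    pipeline = build_pipeline_settings("sales_pipeline", "/Repos/demo/dlt_sales")
    job = build_job_with_pipeline_task("nightly_sales", "<pipeline-id>")
    print(json.dumps(pipeline, indent=2))
    print(json.dumps(job, indent=2))
```

Keeping the two payloads separate reflects the split in the UI: the pipeline is created first, and the job task then refers to it by ID.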

Question # 40
Which of the following two options are supported for identifying the arrival of new files and incremental data from cloud object storage using Auto Loader?

  • A. File hashing, Dynamic file lookup
  • B. Writing ahead logging, read head logging
  • C. Checking pointing, watermarking
  • D. Directory listing, File notification
  • E. Checkpointing and Write ahead logging


The answer is D: Directory listing, File notification.
Directory listing: Auto Loader identifies new files by listing the input directory.
File notification: Auto Loader can automatically set up a notification service and queue service that subscribe to file events from the input directory.
Reference: Choosing between file notification and directory listing modes (Databricks on AWS documentation)
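As a rough sketch of how the two modes are selected in practice: Auto Loader reads with the `cloudFiles` source, and the `cloudFiles.useNotifications` option switches between directory listing (the default) and file notification. The helper name `auto_loader_options` is hypothetical, and the Spark call is shown only in a comment because it needs a Databricks runtime.

```python
def auto_loader_options(file_format: str, use_notifications: bool = False) -> dict:
    """Build reader options for Auto Loader (hypothetical helper).

    use_notifications=False -> directory listing mode
    use_notifications=True  -> file notification mode
    """
    return {
        "cloudFiles.format": file_format,
        "cloudFiles.useNotifications": "true" if use_notifications else "false",
    }


if __name__ == "__main__":
    listing = auto_loader_options("json")
    notify = auto_loader_options("json", use_notifications=True)
    print(listing)
    print(notify)
    # On Databricks, the options would be passed to a streaming read, e.g.:
    #   spark.readStream.format("cloudFiles").options(**notify).load(input_path)
    # where input_path is the cloud storage directory being monitored.
```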

Question # 41
What is the purpose of the bronze layer in a Multi-hop Medallion architecture?

  • A. Reduces data storage by compressing the data
  • B. Data quality checks, corrupt data quarantined
  • C. Powers ML applications
  • D. Contain aggregated data that is to be consumed into Silver
  • E. Copy of raw data, easy to query and ingest data for downstream processes.


The answer is E: Copy of raw data, easy to query and ingest data for downstream processes (reference: Medallion Architecture, Databricks). Here is the typical role of the bronze layer in a medallion architecture.
Bronze Layer:
1. Raw copy of ingested data
2. Replaces traditional data lake
3. Provides efficient storage and querying of full, unprocessed history of data
4. No schema is applied at this layer
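The idea can be illustrated in plain Python, without any Databricks API. In this sketch the bronze step stores every input line unchanged alongside ingestion metadata, and a later step (playing the role usually given to the silver layer) parses and validates it. The `order_id` field, the file name, and both function names are assumptions made up for the example.

```python
import json
from datetime import datetime, timezone


def to_bronze(raw_lines, source_file):
    """Keep every line exactly as received; only add ingestion metadata."""
    ingested_at = datetime.now(timezone.utc).isoformat()
    return [
        {"raw": line, "source_file": source_file, "ingested_at": ingested_at}
        for line in raw_lines
    ]


def to_silver(bronze_rows):
    """Parse and validate bronze rows; quarantine anything that fails."""
    clean, quarantined = [], []
    for row in bronze_rows:
        try:
            record = json.loads(row["raw"])
            # Hypothetical rule: a valid record is an object with an order_id.
            if not isinstance(record, dict) or "order_id" not in record:
                raise ValueError("missing order_id")
            clean.append(record)
        except ValueError:  # json.JSONDecodeError is a subclass of ValueError
            quarantined.append(row)
    return clean, quarantined


if __name__ == "__main__":
    bronze = to_bronze(['{"order_id": 1}', "not json"], "orders_001.json")
    clean, quarantined = to_silver(bronze)
    print(len(bronze), len(clean), len(quarantined))
```

Because the bronze rows are untouched, a record that fails validation can be reprocessed later from the original text once the rule or the parser is fixed.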
Exam focus: Please review the image below and understand the role of each layer (bronze, silver, gold) in the medallion architecture. You will see varying questions targeting each layer and its purpose.

Question # 42
What are the different ways you can schedule a job in Databricks workspace?

  • A. Cron, File notification from Cloud object storage
  • B. On-Demand runs, File notification from Cloud object storage
  • C. Continuous, Incremental
  • D. Once, Continuous
  • E. Cron, On Demand runs


The answer is E: Cron, On-Demand runs.
A job can be run immediately (on demand) or scheduled using cron syntax.
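A minimal sketch of what the two options look like as request payloads, assuming the Jobs REST API conventions: a `schedule` block with a Quartz cron expression for scheduled runs, and a `run-now` request body for on-demand runs. The job ID and the 06:00 daily schedule are hypothetical, and nothing is sent over the network.

```python
def cron_schedule(quartz_expr: str, timezone_id: str = "UTC") -> dict:
    """Schedule block for a job definition (Quartz cron syntax)."""
    return {
        "quartz_cron_expression": quartz_expr,
        "timezone_id": timezone_id,
        "pause_status": "UNPAUSED",
    }


def run_now_request(job_id: int) -> dict:
    """Request body for an on-demand run (POST /api/2.1/jobs/run-now)."""
    return {"job_id": job_id}


if __name__ == "__main__":
    # Quartz fields: second minute hour day-of-month month day-of-week.
    # "0 0 6 * * ?" fires every day at 06:00:00 in the given time zone.
    print(cron_schedule("0 0 6 * * ?"))
    print(run_now_request(1234))  # hypothetical job ID
```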

Question # 43
You are currently working on a production job failure with a job set up in job clusters due to a data issue, what cluster do you need to start to investigate and analyze the data?

  • A. Existing job cluster can be used to investigate the issue
  • B. Databricks SQL Endpoint can be used to investigate the issue
  • C. All-purpose cluster/ interactive cluster is the recommended way to run commands and view the data.
  • D. A Job cluster can be used to analyze the problem


The answer is C: All-purpose cluster/interactive cluster is the recommended way to run commands and view the data.
A job cluster cannot provide a way for a user to interact with a notebook once the job is submitted, but an interactive cluster allows you to display data, view visualizations, and write or edit queries, which makes it a good fit for investigating and analyzing the data.

Question # 44


