import React from 'react';

import Markdown from './Markdown';

const markdown = `
## Skills
Watch For is highly programmable. It provides an expressive, web-based programming framework to build custom analysis pipelines for images, videos, live streams, and gifs.
These pipelines, called **Skills**, are built using JavaScript with fully imperative code. 
Skills are used to program which components (from the repository) are run on each image/frame,
how they are run and what results are returned, 
and for videos and live streams, how frames are sampled and how frame results are aggregated (temporal analysis).
Custom moderation policies and optimization goals can be expressed through skills, including at a per-request level.

### Using skills
A skill is executed on every request. The output of the skill is returned as the *result*.
A deployment comes with *system-managed skills* for processing [images](/images/skills), [videos](/videos/skills), [live streams](/streams/skills), and [gifs](/gifs/skills). 
System-managed skills are managed by the Watch For team and are not editable. These skills are built
for the common case and can be used out of the box. System-managed skills are initially set as the *default skills*.
Default skills are executed when no skillId is specified in the request.
A skill can be set as default by selecting the <code>Set as default</code> option.
A custom skill (that is not default) can be executed for a request by setting the *skillId* in the request.

Note that system-managed skills may be updated on deployment updates as the Watch For team
improves accuracy for the common case. 
Use a (cloned) custom skill as default if automatic updates are not preferred.

### Building skills
A new skill can be created by selecting 
the <code>Create skill</code> option. This action clones the system-managed skill instead of
creating an empty one so that it's easy to get started. A new skill can also be created
by cloning other custom skills by selecting the <code>Clone skill</code> option.

A unique skill id is specified when a skill is created. A skill id can be up to 1024 characters long and may contain only letters, digits, hyphens, and underscores ([A-Za-z0-9-_]).
This id is passed as the *skillId* in requests to run the specific skill.
A skill also has a descriptive name that can be edited.
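
The id constraint above can be checked on the client before a skill is created. The following is a minimal sketch: the helper name <code>isValidSkillId</code> is hypothetical and not part of Watch For, and the sketch assumes an id must contain at least one character.

~~~javascript
// Hypothetical client-side check; not part of the Watch For API.
// Assumes ids are 1 to 1024 characters from the set [A-Za-z0-9-_].
const SKILL_ID_PATTERN = /^[A-Za-z0-9_-]{1,1024}$/;

function isValidSkillId(skillId) {
  return typeof skillId === 'string' && SKILL_ID_PATTERN.test(skillId);
}

isValidSkillId('my-image-skill'); // true
isValidSkillId('my image skill'); // false, spaces are not allowed
~~~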

#### Programming model
Watch For provides a web-based programming framework for building skills.

A skill has two parts:
* **Components**: A list of components added from the repository.
* **Code**: JavaScript code that describes the analysis pipeline. Images, videos, live streams, and gifs have different templates with specific methods to fill.

**Components**
Watch For manages a repository of AI models and libraries, called components, that can be used in a skill to build a custom analysis pipeline.
Components are added by selecting <code>Add from repository</code>.
On the [repository](/repository) page, click on a component to view its documentation: a description, the methods the component exposes, and example usage. 
Click <code>Add</code> next to a component to add it to the skill.
When a component is added to the skill, it gets a *handle*, which can be used in code to access its methods (similar to an object).
A single component can be added multiple times. Each will have its own handle.
Click on <code>Edit handle</code> to edit its value. 
Some components have *required properties* that need to be set before they can be used.
Click on <code>Edit properties</code> to set their values. If a component has required properties,
a warning appears below the component.
As an example, components that make API calls to Azure's cognitive services have Endpoint and ApiKey exposed as
required properties that need to be set.

**Code**
Watch For provides a simple JavaScript coding framework to express how analysis is done.
For each media type (images, videos, live streams, and gifs), it exposes a few JavaScript methods that need to be filled in.
These are described in detail in sections below.

The framework supports most JavaScript [ES6 features](https://github.com/sebastienros/jint).
In addition to using native JavaScript constructs, a component (added from the repository)
can be directly invoked using its handle. Internally, component invocations are delegated to managed/native methods
and results are returned to the scripting runtime.
The example below runs the CVS Gore model on an image input and stores the result in a variable.

~~~javascript
const goreResult = cvsGore.run(image);
~~~

#### Building image analysis
The framework provides a built-in template for processing images.
It exposes a JavaScript method called <code>processImage</code> which is executed on every request with the image as the input.
The return value of the method is sent back with the response in the *result* field.
The <code>processImage</code> method can invoke components and can have any imperative logic.
There are two input parameters to the method:
* **image**: A reference to the input image. This reference can be used directly with image models and other components in the repository that process images.
* **props**: A set of custom JSON properties sent in the *props* field along with the request. The values in props can be used to adapt the processing logic based on application-specific signals. The use of props is described in detail in the *Using props* section.

Consider a new skill, with id *my-image-skill*, with two added components, Bing Adult V3 classifier (handle: bingAdultV3) and CVS Gore classifier (handle: cvsGore).
The following code simply runs both models on the image and returns their results:

~~~javascript
function processImage(image, props) {
  const result = {};
  result.adultResult = bingAdultV3.run(image);
  result.goreResult = cvsGore.run(image);
  return result;
}
~~~

The following request invokes this skill:
~~~
POST /images

{
  "id": "test-image",
  "imageUri": "https://www.gstatic.com/webp/gallery/1.jpg",
  "skillId": "my-image-skill"
}
~~~

The following is the response:
~~~
{
  "id": "test-image",
  "result": {
    "adultResult": {
      "adult":false,
      "adultScore":0.0021993641275912523,
      "racy":false,
      "racyScore":0.0020786249078810215
    },
    "goreResult": {
      "gore":false,
      "goreScore":0.00038933451287448406
    }
  },
  ...
}
~~~

In addition to executing components, the skill can have any imperative logic.
The following code thresholds based on scores and returns a boolean result:

~~~javascript
function processImage(image, props) {
  const result = { adult: false, gore: false };
  const adultResult = bingAdultV3.run(image);
  if (adultResult.adultScore >= 0.8) {
    result.adult = true;
  }
  const goreResult = cvsGore.run(image);
  if (goreResult.goreScore >= 0.9) {
    result.gore = true;
  }
  return result;
}
~~~

The following is the response for the test image:
~~~
{
  "id": "test-image",
  "result": {
    "adult": false,
    "gore": false
  },
  ...
}
~~~

In the following example, an additional component is added: CVS adult classifier (handle: cvsAdult)
and the image is marked as adult only if both the adult classifiers agree on the result with high scores.

~~~javascript
function processImage(image, props) {
  const result = { adult: false, gore: false };
  const adultResult1 = bingAdultV3.run(image);
  const adultResult2 = cvsAdult.run(image);
  if (adultResult1.adultScore >= 0.8 && adultResult2.adultScore >= 0.75) {
    result.adult = true;
  }
  const goreResult = cvsGore.run(image);
  if (goreResult.goreScore >= 0.9) {
    result.gore = true;
  }
  return result;
}
~~~

#### Building video analysis
The framework provides a built-in template for sampling video frames, processing each frame and aggregating frame results.
The template exposes five JavaScript methods: <code>init</code>, <code>getSampleTimestamp</code>, <code>processFrame</code>, <code>shouldStoreFrame</code> and <code>aggregateFrameResults</code>.
These methods are executed in the following way for each video:

~~~pseudocode
init(props);
frameResults = [], prevSample = null;
while (true) {
  sampleTimestamp = getSampleTimestamp(videoInfo, prevSample);
  if (sampleTimestamp === null)
    break;
  frame = seekKeyFrame(sampleTimestamp); // fetch frame from video
  if (prevSample && frame.timestamp === prevSample.actualTimestamp) {
    prevSample = { duplicate: true, sampleTimestamp, actualTimestamp : frame.timestamp };
    continue;
  }
  frameResult = processFrame(frame);
  frameResults.push(frameResult);
  if (shouldStoreFrame(frameResult))
    storeFrame(frame, frameResult); // frame is stored and the frameUri is populated
  prevSample = { sampleTimestamp, actualTimestamp : frame.timestamp };
}
result = aggregateFrameResults(frameResults);
pushResult(result); // Results updated in storage and sent through event hub
~~~

<code>init</code> is called before a video is processed (line 1). The framework reuses the same script loaded in memory
across videos. Hence, the code in <code>init</code> should reset any global state maintained in the skill.
The use of state is described in detail in the *Using state* section.

<code>getSampleTimestamp</code> method is called for sampling the next frame in the video for processing (line 4).
The return value of the method is the *absolute timestamp in seconds* of the frame to be sampled.
**Currently, the framework only supports key frame sampling.** Given a timestamp to be sampled,
the system fetches the nearest key frame that exists before that timestamp (line 7).
For example, if <code>getSampleTimestamp</code> method returns 4 (timestamp: 4 seconds) and key frames are at 0, 3 and 6 seconds,
the frame at second 3 is sampled.
Processing of frames is stopped once null is returned from the <code>getSampleTimestamp</code> method (lines 5-6).
If sampling returns the same frame as the previously processed frame, processing is skipped for the frame (lines 8-11)
and <code>getSampleTimestamp</code> method is called again (line 4).
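
The key frame selection described above can be illustrated with a short sketch. The framework performs this lookup internally; the function name <code>nearestKeyFrameAtOrBefore</code> is hypothetical, and the sketch assumes that a key frame located exactly at the requested timestamp is selected.

~~~javascript
// Illustration only: the framework performs this lookup internally.
// keyFrameTimestamps is assumed to be sorted in ascending order.
function nearestKeyFrameAtOrBefore(keyFrameTimestamps, sampleTimestamp) {
  let selected = null;
  for (const ts of keyFrameTimestamps) {
    if (ts > sampleTimestamp) {
      break;
    }
    selected = ts;
  }
  return selected;
}

nearestKeyFrameAtOrBefore([0, 3, 6], 4); // 3
nearestKeyFrameAtOrBefore([0, 3, 6], 5); // 3 again, so this sample is a duplicate
~~~

The second call shows why duplicates occur: two different sample timestamps can resolve to the same key frame, in which case the frame is skipped.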

For every sampled frame, the <code>processFrame</code> method is called to process the frame (line 12).
<code>processFrame</code> method can invoke components from the repository to analyze the frame.
The return value of the method is stored as the frame result (line 13).

Once a frame is processed, <code>shouldStoreFrame</code> method is called to get a value indicating 
whether the frame should be stored (line 14). If *true* is returned, the frame is stored and the frameUri is populated 
in the frame response (line 15). If *false* is returned, the frame is not stored and the frameUri is set to null.
If retention policy does not allow frame storage, the frame is not stored irrespective of the value returned by the method.

Once all sampled frames are processed, the <code>aggregateFrameResults</code> method is called (line 18) with the frame results.
The return value of the method is sent in the *result* field of the video response (line 19).

The methods that can be customized are described in detail below with a running example.

<code>init</code> method is called with one input parameter: 
* **props**: A set of custom JSON properties sent in the *props* field along with the request. The values in props can be used to adapt the processing logic based on application-specific signals.

The following code shows the default implementation of the <code>init</code> method. The method simply resets the global state variable.
The use of props and state is described in detail later.

~~~javascript
let state = {};
function init(props) {
  state = {};
}
~~~

<code>getSampleTimestamp</code> method is called with two parameters:
* **videoInfo**: An object with metadata about the video. Currently, it has only one field called *duration* that provides the total length of the video in seconds.
* **prevSample**: An object with information about the previous sample. The object is null for the first sample call. For subsequent calls, the object has three fields:
  * sampleTimestamp: Value returned by the previous invocation of the sample method.
  * actualTimestamp: Timestamp in seconds of the actual key frame sampled.
  * duplicate: A boolean indicating whether the previous sample resulted in a duplicate frame.

The following code shows an implementation of <code>getSampleTimestamp</code> method to sample the video every 2 seconds.

~~~javascript
function getSampleTimestamp(videoInfo, prevSample) {
  const { duration } = videoInfo;
  let sampleTimestamp = 0; // reassigned below, so it cannot be const
  if (prevSample) {
    sampleTimestamp = prevSample.sampleTimestamp + 2;
    if (sampleTimestamp > duration) {
      return null;
    }
  }
  return sampleTimestamp;
}
~~~

<code>processFrame</code> method is called with one parameter:
* **frame**: An object with frame data. It has two fields.
  * image: A reference to the frame image. This reference can be used directly with image models and other components in the repository that process images.
  * timestamp: Timestamp of the frame in seconds in the video.

Consider a skill with two added components, Bing Adult V3 classifier (handle: bingAdultV3) and CVS Gore classifier (handle: cvsGore).
The following code runs the models on the frame image, applies thresholds and marks the frame as adult or gore.

~~~javascript
function processFrame(frame) {
  const { image } = frame;
  const frameResult = { adult: false, gore: false };
  const adultResult = bingAdultV3.run(image);
  if (adultResult.adultScore >= 0.8) {
    frameResult.adult = true;
  }
  const goreResult = cvsGore.run(image);
  if (goreResult.goreScore >= 0.9) {
    frameResult.gore = true;
  }
  return frameResult;
}
~~~

<code>shouldStoreFrame</code> method is called with one parameter:
* **frameResult**: The value returned by the <code>processFrame</code> method for the frame.

The following code returns *true* if the frame result has the adult or gore flag set, otherwise it returns false.
In this example, only flagged frames are stored (e.g. as evidence for human review).

~~~javascript
function shouldStoreFrame(frameResult) {
  if (frameResult.adult || frameResult.gore) {
    return true;
  }
  return false;
}
~~~

<code>aggregateFrameResults</code> method is called with one parameter:
* **frameResults**: An array of frame results returned by the <code>processFrame</code> method for each frame.

The following code aggregates the frame results. In this example, if at least two frames are marked as adult, the video is marked as adult.
Similarly, if at least two frames are marked as gore, the video is marked as gore.

~~~javascript
function aggregateFrameResults(frameResults) {
  const result = {};
  let adultFrames = 0;
  let goreFrames = 0;
  frameResults.forEach((frameResult) => {
    if (frameResult.adult) {
      adultFrames++;
    }
    if (frameResult.gore) {
      goreFrames++;
    }
  });
  result.adult = adultFrames >= 2;
  result.gore = goreFrames >= 2;
  return result;
}
~~~
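
Because frame results are pushed in the order the frames are processed, aggregation can also take temporal patterns into account. The following variation is a sketch, not the default implementation: it marks the video only when flagged frames appear in a run of at least <code>minRun</code> consecutive samples. The helper <code>longestRun</code> and the value of <code>minRun</code> are illustrative assumptions.

~~~javascript
// Length of the longest run of consecutive frames with the given flag set.
function longestRun(frameResults, flag) {
  let longest = 0;
  let current = 0;
  frameResults.forEach((frameResult) => {
    current = frameResult[flag] ? current + 1 : 0;
    longest = Math.max(longest, current);
  });
  return longest;
}

function aggregateFrameResults(frameResults) {
  const minRun = 2; // illustrative value
  return {
    adult: longestRun(frameResults, 'adult') >= minRun,
    gore: longestRun(frameResults, 'gore') >= minRun,
  };
}
~~~

Compared with counting flagged frames anywhere in the video, requiring consecutive frames is stricter: two isolated flagged frames no longer mark the video.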

Suppose the example skill above has the id *my-video-skill*. The following request invokes the skill:
~~~
POST /videos

{
  "id": "test-video",
  "videoUri": "https://www.learningcontainer.com/wp-content/uploads/2020/05/sample-mp4-file.mp4",
  "skillId": "my-video-skill"
}
~~~

The following is the response:
~~~
{
  "id": "test-video",
  "result": {
    "adult": false,
    "gore": false
  },
  ...
}
~~~
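
Before deploying a video skill, it can be useful to dry-run its methods locally. The sketch below mirrors the execution order of the video template outside Watch For. It is not the framework's implementation: <code>dryRunVideoSkill</code>, the list of key frame timestamps, the fake image reference, and the stubbed classifier are all assumptions made for illustration.

~~~javascript
// Local dry-run of the video template; NOT the framework's implementation.
// skill: object holding the template methods.
// keyFrames: ascending key frame timestamps (stand-in for a real video).
function dryRunVideoSkill(skill, videoInfo, keyFrames, props) {
  skill.init(props);
  const frameResults = [];
  const storedTimestamps = [];
  let prevSample = null;
  // Bounded loop guards against a getSampleTimestamp that never returns null.
  for (let i = 0; i < 10000; i++) {
    const sampleTimestamp = skill.getSampleTimestamp(videoInfo, prevSample);
    if (sampleTimestamp === null || sampleTimestamp === undefined) {
      break;
    }
    // Nearest key frame at or before the requested timestamp.
    const candidates = keyFrames.filter((ts) => ts <= sampleTimestamp);
    const timestamp = candidates.length ? candidates[candidates.length - 1] : keyFrames[0];
    if (prevSample && timestamp === prevSample.actualTimestamp) {
      prevSample = { duplicate: true, sampleTimestamp, actualTimestamp: timestamp };
      continue;
    }
    const frame = { image: { timestamp }, timestamp }; // fake image reference
    const frameResult = skill.processFrame(frame);
    frameResults.push(frameResult);
    if (skill.shouldStoreFrame(frameResult)) {
      storedTimestamps.push(timestamp);
    }
    prevSample = { sampleTimestamp, actualTimestamp: timestamp };
  }
  return { result: skill.aggregateFrameResults(frameResults), frameResults, storedTimestamps };
}

// Example skill with a stubbed classifier in place of a repository component.
const flaggedTimestamps = [3, 6];
const exampleSkill = {
  init() {},
  getSampleTimestamp(videoInfo, prevSample) {
    if (!prevSample) return 0;
    const next = prevSample.sampleTimestamp + 2;
    return next > videoInfo.duration ? null : next;
  },
  processFrame(frame) {
    return { adult: flaggedTimestamps.indexOf(frame.timestamp) !== -1, gore: false };
  },
  shouldStoreFrame(frameResult) {
    return frameResult.adult || frameResult.gore;
  },
  aggregateFrameResults(frameResults) {
    return { adult: frameResults.filter((r) => r.adult).length >= 2 };
  },
};

const report = dryRunVideoSkill(exampleSkill, { duration: 8 }, [0, 3, 6]);
// report.result is { adult: true }; the frames at 3 and 6 seconds are stored
~~~

With key frames at 0, 3 and 6 seconds, the samples requested at 2 and 8 seconds resolve to already processed key frames and are skipped as duplicates, so only three frames are processed.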

#### Building live stream analysis
The framework provides a built-in template for sampling frames from live streams and processing each frame.
The results are continuously updated (and sent through event hub) as frames are processed.
The template exposes four JavaScript methods: <code>init</code>, <code>processFrame</code>, <code>shouldStoreFrame</code> and <code>getSampleTime</code>.
These methods are executed in the following way for each live stream:

~~~pseudocode
init(props);
while (true) {
  if (checkStopped()) // Check if live stream processing is stopped
    break;
  frame = seekLastKeyFrame(); // fetch the last key frame from live stream
  frameResult = processFrame(frame);
  pushResult(frameResult); // Results updated in storage and sent through event hub
  if (shouldStoreFrame(frameResult))
    storeFrame(frame, frameResult); // frame is stored and the frameUri is populated
  sampleTime = getSampleTime();
  sleep(sampleTime);
}
~~~

<code>init</code> is called before a live stream is processed (line 1). The framework reuses the same script loaded in memory
across live streams. Hence, the code in <code>init</code> should reset any global state maintained in the skill.
The use of state is described in detail in the *Using state* section.

Frames are continuously sampled and processed until processing is stopped (lines 3-4). Processing can be stopped using the stop API.
At every sample time, the last key frame is fetched (line 5).
Watch For currently supports only HLS streams. To get a frame, the HLS manifest is parsed for a list of chunks.
The first frame of the last chunk (a key frame) is fetched as the frame to be processed.

For every sampled frame, the <code>processFrame</code> method is called to process the frame (line 6).
<code>processFrame</code> method can invoke components from the repository to analyze the frame.
The return value of the method is pushed as the current result (line 7).

Once a frame is processed, <code>shouldStoreFrame</code> method is called to get a value indicating 
whether the frame should be stored (line 8). If *true* is returned, the frame is stored and the frameUri is populated 
in the frame response (line 9). If *false* is returned, the frame is not stored and the frameUri is set to null.
If retention policy does not allow frame storage, the frame is not stored irrespective of the value returned by the method.

<code>getSampleTime</code> method is called to get the next relative sampling time (line 10).
The return value of the method is the *time to sleep in seconds* before the next frame is sampled and processed (line 11). 

The methods that can be customized are described in detail below with a running example.

<code>init</code> method is called with one input parameter: 
* **props**: A set of custom JSON properties sent in the *props* field along with the request. The values in props can be used to adapt the processing logic based on application-specific signals.

The following code shows the default implementation of the <code>init</code> method. The method simply resets the global state variable.
The use of props and state is described in detail later.

~~~javascript
let state = {};
function init(props) {
  state = {};
}
~~~

<code>processFrame</code> method is called with one parameter:
* **frame**: An object with frame data. It has two fields.
  * image: A reference to the frame image. This reference can be used directly with image models and other components in the repository that process images.
  * timestamp: HLS timestamp of the frame in seconds in the stream.

Consider a skill with two added components, Bing Adult V3 classifier (handle: bingAdultV3) and CVS Gore classifier (handle: cvsGore).
The following code runs the models on the frame image, applies thresholds and marks the frame as adult or gore.

~~~javascript
function processFrame(frame) {
  const { image } = frame;
  const frameResult = { adult: false, gore: false };
  const adultResult = bingAdultV3.run(image);
  if (adultResult.adultScore >= 0.8) {
    frameResult.adult = true;
  }
  const goreResult = cvsGore.run(image);
  if (goreResult.goreScore >= 0.9) {
    frameResult.gore = true;
  }
  return frameResult;
}
~~~

<code>shouldStoreFrame</code> method is called with one parameter:
* **frameResult**: The value returned by the <code>processFrame</code> method for the frame.

The following code returns *true* if the frame result has the adult or gore flag set, otherwise it returns false.
In this example, only flagged frames are stored (e.g. as evidence for human review).

~~~javascript
function shouldStoreFrame(frameResult) {
  if (frameResult.adult || frameResult.gore) {
    return true;
  }
  return false;
}
~~~

<code>getSampleTime</code> method is called with no parameters.
The following code shows an implementation of <code>getSampleTime</code> method to sample the live stream every 5 seconds after processing a frame.

~~~javascript
function getSampleTime() {
  return 5;
}
~~~

Suppose the example skill above has the id *my-stream-skill*. The following request invokes the skill:
~~~
POST /streams

{
  "id": "test-stream",
  "streamUri": "https://nmxlive.akamaized.net/hls/live/529965/Live_1/index.m3u8",
  "skillId": "my-stream-skill"
}
~~~

The following is a response. Note that results are continuously updated and pushed for live streams.
~~~
{
  "id": "test-stream",
  "result": {
    "adult": false,
    "gore": false
  },
  ...
}
~~~

#### Building gif analysis
The framework provides a built-in template for sampling gif frames, processing each frame and aggregating frame results.
The template exposes five JavaScript methods: <code>init</code>, <code>getSampleIndex</code>, <code>processFrame</code>, <code>shouldStoreFrame</code> and <code>aggregateFrameResults</code>.
These methods are executed in the following way for each gif:

~~~pseudocode
init(props);
frameResults = [], prevSample = null;
while (true) {
  sampleIndex = getSampleIndex(gifInfo, prevSample);
  if (sampleIndex === null)
    break;
  frame = seekFrame(sampleIndex); // fetch frame from gif
  if (prevSample && frame.FrameIndex == prevSample.actualIndex) {
    prevSample = { duplicate: true, sampleIndex, actualIndex : frame.FrameIndex };
    continue;
  }
  frameResult = processFrame(frame);
  frameResults.push(frameResult);
  if (shouldStoreFrame(frameResult))
    storeFrame(frame, frameResult); // frame is stored and the frameUri is populated
  prevSample = { sampleIndex, actualIndex : frame.FrameIndex };
}
result = aggregateFrameResults(frameResults);
pushResult(result); // Results updated in storage and sent through event hub
~~~

<code>init</code> is called before a gif is processed (line 1). The framework reuses the same script loaded in memory
across gifs. Hence, the code in <code>init</code> should reset any global state maintained in the skill.
The use of state is described in detail in the *Using state* section.

<code>getSampleIndex</code> method is called for sampling the next frame in the gif for processing (line 4).
The return value of the method is the *index* of the frame to be sampled. 
Processing of frames is stopped once null is returned from the <code>getSampleIndex</code> method (lines 5-6).
If sampling returns the same frame as the previously processed frame, processing is skipped for the frame (lines 8-11)
and <code>getSampleIndex</code> method is called again (line 4).

For every sampled frame, the <code>processFrame</code> method is called to process the frame (line 12).
<code>processFrame</code> method can invoke components from the repository to analyze the frame.
The return value of the method is stored as the frame result (line 13).

Once a frame is processed, <code>shouldStoreFrame</code> method is called to get a value indicating 
whether the frame should be stored (line 14). If *true* is returned, the frame is stored and the frameUri is populated 
in the frame response (line 15). If *false* is returned, the frame is not stored and the frameUri is set to null.
If retention policy does not allow frame storage, the frame is not stored irrespective of the value returned by the method.

Once all sampled frames are processed, the <code>aggregateFrameResults</code> method is called (line 18) with the frame results.
The return value of the method is sent in the *result* field of the gif response (line 19).

The methods that can be customized are described in detail below with a running example.

<code>init</code> method is called with one input parameter: 
* **props**: A set of custom JSON properties sent in the *props* field along with the request. The values in props can be used to adapt the processing logic based on application-specific signals.

The following code shows the default implementation of the <code>init</code> method. The method simply resets the global state variable.
The use of props and state is described in detail later.

~~~javascript
let state = {};
function init(props) {
  state = {};
}
~~~

<code>getSampleIndex</code> method is called with two parameters:
* **gifInfo**: An object with metadata about the gif. Currently, it has only one field called *FrameCount* that provides the total number of frames in the gif.
* **prevSample**: An object with information about the previous sample. The object is null for the first sample call. For subsequent calls, the object has three fields:
  * sampleIndex: Value returned by the previous invocation of the sample method.
  * actualIndex: The index of the actual frame sampled.
  * duplicate: A boolean indicating whether the previous sample resulted in a duplicate frame.

The following code shows an implementation of <code>getSampleIndex</code> method to sample the gif every 2 frames.

~~~javascript
function getSampleIndex(gifInfo, prevSample) {
  let sampleIndex = 0; // reassigned below, so it cannot be const
  if (prevSample) {
    sampleIndex = prevSample.sampleIndex + 2;
    if (sampleIndex > gifInfo.FrameCount) {
      return null;
    }
  }
  return sampleIndex;
}
~~~
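
A fixed step processes more frames for longer gifs. As an alternative, the step can be derived from the frame count so that the number of processed frames stays bounded. The following is a sketch under two assumptions: frame indices are zero-based, and <code>maxSamples</code> is an illustrative value rather than a framework setting.

~~~javascript
function getSampleIndex(gifInfo, prevSample) {
  const maxSamples = 5; // illustrative upper bound on processed frames
  // Spread the samples evenly across the gif; the step is at least one frame.
  const step = Math.max(1, Math.ceil(gifInfo.FrameCount / maxSamples));
  const sampleIndex = prevSample ? prevSample.sampleIndex + step : 0;
  // Assumes zero-based indices, so FrameCount itself is out of range.
  return sampleIndex >= gifInfo.FrameCount ? null : sampleIndex;
}
~~~

For a gif with 20 frames, this samples indices 0, 4, 8, 12 and 16 and then returns null.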

<code>processFrame</code> method is called with one parameter:
* **frame**: An object with frame data. It has two fields.
  * image: A reference to the frame image. This reference can be used directly with image models and other components in the repository that process images.
  * frameindex: the index of the frame in the gif.

Consider a skill with two added components, Bing Adult V3 classifier (handle: bingAdultV3) and CVS Gore classifier (handle: cvsGore).
The following code runs the models on the frame image, applies thresholds and marks the frame as adult or gore.

~~~javascript
function processFrame(frame) {
  const { image } = frame;
  const frameResult = { adult: false, gore: false };
  const adultResult = bingAdultV3.run(image);
  if (adultResult.adultScore >= 0.8) {
    frameResult.adult = true;
  }
  const goreResult = cvsGore.run(image);
  if (goreResult.goreScore >= 0.9) {
    frameResult.gore = true;
  }
  return frameResult;
}
~~~

<code>shouldStoreFrame</code> method is called with one parameter:
* **frameResult**: The value returned by the <code>processFrame</code> method for the frame.

The following code returns *true* if the frame result has the adult or gore flag set, otherwise it returns false.
In this example, only flagged frames are stored (e.g. as evidence for human review).

~~~javascript
function shouldStoreFrame(frameResult) {
  if (frameResult.adult || frameResult.gore) {
    return true;
  }
  return false;
}
~~~

<code>aggregateFrameResults</code> method is called with one parameter:
* **frameResults**: An array of frame results returned by the <code>processFrame</code> method for each frame.

The following code aggregates the frame results. In this example, if at least two frames are marked as adult, the gif is marked as adult.
Similarly, if at least two frames are marked as gore, the gif is marked as gore.

~~~javascript
function aggregateFrameResults(frameResults) {
  const result = {};
  let adultFrames = 0;
  let goreFrames = 0;
  frameResults.forEach((frameResult) => {
    if (frameResult.adult) {
      adultFrames++;
    }
    if (frameResult.gore) {
      goreFrames++;
    }
  });
  result.adult = adultFrames >= 2;
  result.gore = goreFrames >= 2;
  return result;
}
~~~

Suppose the example skill above has the id *my-gif-skill*. The following request invokes the skill:
~~~
POST /gifs

{
  "id": "test-gif",
  "gifUri": "https://media.giphy.com/media/Mwzz3DItueHqaMYNDF/giphy.gif",
  "skillId": "my-gif-skill"
}
~~~

The following is the response:
~~~
{
  "id": "test-gif",
  "result": {
    "adult": false,
    "gore": false
  },
  ...
}
~~~

#### Using *state*
For videos, live streams, and gifs, the *state* variable can be used to pass data between methods and across frames.
*state* is a JavaScript object and can store any information. 
Note that the <code>init</code> method needs to reset *state* since the same scripting runtime is reused across videos and live streams
for performance reasons.

The following example uses *state* to pass information between <code>processFrame</code> and <code>getSampleTime</code>
to sample a live stream faster (2 seconds vs. 5 seconds) for the next 10 samples if a frame is flagged as adult.
*fastSampling* and *fastSamplingCount* are state maintained across methods and frames.

~~~javascript
let state = {};
function init(props) {
  state = {};
}
~~~

~~~javascript
function getSampleTime() {
  if (state.fastSampling)
    return 2;
  return 5;
}
~~~

~~~javascript
function processFrame(frame) {
  const { image } = frame;
  const frameResult = {};
  const adultResult = bingAdultV3.run(image);
  if (adultResult.adultScore >= 0.8) {
    frameResult.adult = true;
    state.fastSampling = true;
    state.fastSamplingCount = 0;
  }

  if (state.fastSampling) {
    state.fastSamplingCount++;
    if (state.fastSamplingCount > 10)
      state.fastSampling = false;
  }

  return frameResult;
}
~~~
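
The effect of the shared state can be checked locally by driving the methods in the order the live stream template calls them. The sketch below repeats the three methods in compact form so that it is self-contained; the <code>bingAdultV3</code> stub, its scores, and the fake image references are made up for illustration and do not reflect real component output.

~~~javascript
// Stub standing in for the Bing Adult V3 component; scores are made up.
const scores = [0.1, 0.9, 0.1, 0.1];
const bingAdultV3 = { run: (image) => ({ adultScore: scores[image.index] }) };

let state = {};
function init(props) {
  state = {};
}

function getSampleTime() {
  return state.fastSampling ? 2 : 5;
}

function processFrame(frame) {
  const frameResult = {};
  const adultResult = bingAdultV3.run(frame.image);
  if (adultResult.adultScore >= 0.8) {
    frameResult.adult = true;
    state.fastSampling = true;
    state.fastSamplingCount = 0;
  }
  if (state.fastSampling) {
    state.fastSamplingCount++;
    if (state.fastSamplingCount > 10) {
      state.fastSampling = false;
    }
  }
  return frameResult;
}

// Drive the methods in the order the live stream template calls them.
init({});
const sleepTimes = scores.map((score, index) => {
  processFrame({ image: { index }, timestamp: index });
  return getSampleTime();
});
// sleepTimes is [5, 2, 2, 2]
~~~

The second frame is flagged, so the sleep time drops from 5 to 2 seconds starting with that frame and stays there while the counter is at or below 10.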

The following example adapts the processing for the next frame based on the results of the previous frame.
If the previous frame is marked as adult, the threshold is reduced for the next frame.

~~~javascript
let state = {};
function init(props) {
  state = {};
}
~~~

~~~javascript
function processFrame(frame) {
  const { image } = frame;
  const frameResult = { adult: false };
  const threshold = state.prevAdult ? 0.6 : 0.75;

  const adultResult = bingAdultV3.run(image);
  if (adultResult.adultScore >= threshold) {
    frameResult.adult = true;
  }

  state.prevAdult = frameResult.adult;
  return frameResult;
}
~~~

#### Using *props*
*props* is a set of custom JSON properties sent along with the request and is passed to the skill.
The values in *props* can be used to adapt the processing logic based on application-specific signals.

The following example adapts the sampling rate for videos based on a parameter (*highRisk*) passed in props.

~~~javascript
let state = {};
function init(props) {
  state = {};
  state.highRisk = props.highRisk;
}
~~~

~~~javascript
function getSampleTimestamp(videoInfo, prevSample) {
  const { duration } = videoInfo;
  let sampleTimestamp = 0; // reassigned below, so it cannot be const
  const samplePeriod = state.highRisk ? 1 : 2;
  if (prevSample) {
    sampleTimestamp = prevSample.sampleTimestamp + samplePeriod;
    if (sampleTimestamp > duration) {
      return null;
    }
  }
  return sampleTimestamp;
}
~~~

The request is sent as below:

~~~
POST /videos

{
  "id": "test-video",
  "videoUri": "https://www.learningcontainer.com/wp-content/uploads/2020/05/sample-mp4-file.mp4",
  "skillId": "my-video-skill",
  "props": {
    "highRisk": true
  }
}
~~~

The following example adapts the classifiers run for image processing based on values in props.

~~~javascript
function processImage(image, props) {
  const result = {};
  if (props.runAdult) {
    result.adultResult = bingAdultV3.run(image);
  }
  if (props.runGore) {
    result.goreResult = cvsGore.run(image);
  }
  if (props.runOcr) {
    result.ocrResult = ocr.run(image);
  }
  return result;
}
~~~

The request is sent as below:

~~~
POST /images

{
  "id": "test-image",
  "imageUri": "https://www.gstatic.com/webp/gallery/1.jpg",
  "skillId": "my-image-skill",
  "props": {
    "runAdult": true,
    "runGore": false,
    "runOcr": true
  }
}
~~~
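
Since *props* comes from the caller, a skill may want to guard against missing or malformed values. The following is a minimal sketch of one way to do that. *normalizeProps* and the default values are hypothetical and not part of Watch For, and it assumes *props* may be undefined or only partially filled.

~~~javascript
// Hypothetical defaults used when a flag is absent or not a boolean.
const DEFAULT_PROPS = { runAdult: true, runGore: false, runOcr: false };

function normalizeProps(props) {
  const normalized = {};
  Object.keys(DEFAULT_PROPS).forEach((key) => {
    const value = props ? props[key] : undefined;
    // Accept only real booleans; anything else falls back to the default.
    normalized[key] = typeof value === 'boolean' ? value : DEFAULT_PROPS[key];
  });
  return normalized;
}

console.log(normalizeProps(undefined));
console.log(normalizeProps({ runOcr: true, runAdult: 'no' }));
~~~

Calling *normalizeProps(props)* at the top of *processImage* keeps the rest of the skill free of undefined checks.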

### Examples
This section provides more real-world examples of skills with various components from the repository.

#### Cascading multiple models
Cascading multiple similar-purposed models can significantly reduce false positives.
The insight here is that similar-purposed models (e.g. to find adult content) built by different teams
are trained on similar *true* data but different *false* data. Hence, they agree on the true positives
but disagree on the false positives. By checking for agreement across multiple classifiers, false positives
can be minimized. By having the cheapest model first in the cascade, significant cost efficiency can be achieved.

In the following example, multiple adult classifiers are invoked in a cascaded manner to 
flag an adult frame. The cheapest classifier in terms of cost (bingAdultV3) is used
first in the cascade.

~~~javascript
function processFrame(frame) {
  const { image } = frame;
  const result = { adult: false };
  result.classifiers = {};

  result.classifiers.bingAdultV3 = bingAdultV3.run(image);
  if (result.classifiers.bingAdultV3.adultScore >= 0.75) {
    result.classifiers.xiaoice = xiaoice.run(image);
    if (result.classifiers.xiaoice.category === 'adult') {
      result.classifiers.cvsAdult = cvsAdult.run(image);
      if (result.classifiers.cvsAdult.adultScore >= 0.75) {
        result.adult = true;
      }
    }
  }
  return result;
}
~~~
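
The same cascade can also be written as a list of stages, which makes it easier to reorder or add classifiers. This is a sketch under stated assumptions: the classifiers are replaced by local stubs with made-up outputs (*makeStub* and the *stages* list are hypothetical), so it only illustrates the short-circuit behaviour, not real model results.

~~~javascript
// Stub classifiers standing in for repository components (hypothetical outputs).
const calls = [];
function makeStub(name, output) {
  return { run: () => { calls.push(name); return output; } };
}
const bingAdultV3 = makeStub('bingAdultV3', { adultScore: 0.9 });
const xiaoice = makeStub('xiaoice', { category: 'neutral' });
const cvsAdult = makeStub('cvsAdult', { adultScore: 0.95 });

// Each stage pairs a classifier with the predicate that counts as agreement.
// Stages are listed cheapest first.
const stages = [
  { name: 'bingAdultV3', component: bingAdultV3, agrees: r => r.adultScore >= 0.75 },
  { name: 'xiaoice', component: xiaoice, agrees: r => r.category === 'adult' },
  { name: 'cvsAdult', component: cvsAdult, agrees: r => r.adultScore >= 0.75 },
];

function processFrame(frame) {
  const result = { adult: false, classifiers: {} };
  for (const stage of stages) {
    const output = stage.component.run(frame.image);
    result.classifiers[stage.name] = output;
    if (!stage.agrees(output)) {
      return result; // stop at the first disagreement; later stages never run
    }
  }
  result.adult = true; // every stage agreed
  return result;
}

const result = processFrame({ image: null });
console.log(result.adult, calls);
~~~

Here the second stub disagrees, so the frame is not flagged and the third classifier is never invoked.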


#### Face detection and age classification
In addition to models for finding inappropriate content, the repository hosts many 
general-purpose image components such as face detection and age classification.
These components can be run in the same way as AI models.
In the following example, an image is flagged for review if there 
is a person with age less than 13 
(detected using face detection and age classification).

~~~javascript
function processImage(image, props) {
  const result = { review: false };
  const faces = faceDetector.run(image);
  faces.forEach((face) => {
    const age = ageClassifier.run(image, face);
    if (age < 13) {
      result.review = true;
    }
  });
  return result;
}
~~~

#### OCR
The repository hosts a few OCR components for recognizing text in images.
The skill can invoke a local OCR (e.g. Tesseract) or call Azure Cognitive Services.
The Watch For team is working on onboarding more local OCR components and custom models to
detect text (before invoking a full OCR).
To use the Cognitive Services OCR component, an API key and endpoint are required properties, and
it is up to the customer team to make sure the provided endpoint can scale to their workload.

In the following example, a local OCR is invoked, and the raw text is sent back in the results.
The returned text can be sent to a text moderation system (e.g. Community Sift) to moderate content.

~~~javascript
function processImage(image, props) {
  const result = {};
  result.ocrResult = tessOcr.run(image);
  return result;
}
~~~

#### Adaptive sampling of frames
Videos and live streams can be adaptively sampled both to improve accuracy and reduce cost.
For instance, a video or a live stream perceived as high risk can be sampled faster while
others can be sampled slower. The sampling rate can either be set using application-specific
signals (in *props*) or can be dynamically changed as the content is being processed.

In the following example, a live stream is processed with 3x more samples if the calling application
marks the stream as high risk (static setting) or 
if a classifier flags a frame as adult (dynamic setting).
It is typical to set streams from users who have recently created accounts or are not verified
as high risk. This can be set in *props*.
If a frame is flagged as adult, sampling faster for a period of time (next 10 frames in the example)
can help quickly determine if the flagged frame is a false positive or a true positive.
For instance, if more than one unique frame is flagged, then the live stream can be considered for review.

~~~javascript
let state = {};
function init(props) {
  state = {};
  state.highRisk = props.highRisk;
}
~~~

~~~javascript
function getSampleTime() {
  if (state.fastSampling || state.highRisk)
    return 5;
  return 15;
}
~~~

~~~javascript
function processFrame(frame) {
  const { image } = frame;
  const frameResult = {};
  const adultResult = bingAdultV3.run(image);
  if (adultResult.adultScore >= 0.8) {
    frameResult.adult = true;
    state.fastSampling = true;
    state.fastSamplingCount = 0;
  }

  if (state.fastSampling) {
    state.fastSamplingCount++;
    if (state.fastSamplingCount > 10)
      state.fastSampling = false;
  }

  return frameResult;
}
~~~
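
To see how the interval reacts to a flagged frame, the hooks above can be replayed locally against a stubbed classifier. This sketch makes two assumptions that are not confirmed by the runtime: the scores are made up, and *getSampleTime* is assumed to be called after each *processFrame*.

~~~javascript
// Stub classifier replaying fixed scores (hypothetical values): one flagged frame, then quiet.
const scores = [0.1, 0.9].concat(new Array(11).fill(0.1));
let nextScore = 0;
const bingAdultV3 = { run: () => ({ adultScore: scores[nextScore++] }) };

let state = {};
function init(props) {
  state = { highRisk: props.highRisk };
}

function getSampleTime() {
  return state.fastSampling || state.highRisk ? 5 : 15;
}

function processFrame(frame) {
  const frameResult = {};
  if (bingAdultV3.run(frame.image).adultScore >= 0.8) {
    frameResult.adult = true;
    state.fastSampling = true;
    state.fastSamplingCount = 0;
  }
  if (state.fastSampling) {
    state.fastSamplingCount++;
    if (state.fastSamplingCount > 10) {
      state.fastSampling = false;
    }
  }
  return frameResult;
}

// Assumed call order: process a frame, then ask how long to wait for the next one.
init({ highRisk: false });
const intervals = scores.map(() => {
  processFrame({ image: null });
  return getSampleTime();
});
console.log(intervals.join(' '));
~~~

In this replay the interval drops from 15 to 5 at the flagged frame, stays at 5 for ten samples in total, and then returns to 15.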

#### Deduping frames
The repository has components to fingerprint and compare images for similarity.
For example, the **dHash** component provides a perceptual hash of an image. It exposes two methods,
one to compute the hash and the other to compare hashes. Hashes are computed as a 64-bit string.
They are compared using hamming distance -- the smaller the distance (e.g. < 5), the more similar the images are.
Image hashes can be used to dedup frames in videos and live streams.
In the following example, a video is flagged as adult if there are at least two frames
flagged as adult and the frames have a hamming distance of more than 10 (not similar).

~~~javascript
function processFrame(frame) {
  const { image } = frame;
  const frameResult = { adult: false };
  const adultResult = bingAdultV3.run(image);
  if (adultResult.adultScore >= 0.75) {
    frameResult.adult = true;
  }
  frameResult.hash = dHash.compute(image);
  return frameResult;
}
~~~

~~~javascript
function aggregateFrameResults(frameResults) {
  const result = { adult: false };
  for (let i = 0; i < frameResults.length - 1; i++) {
    for (let j = i + 1; j < frameResults.length; j++) {
      if (frameResults[i].adult && frameResults[j].adult) {
        const distance = dHash.compare(frameResults[i].hash, frameResults[j].hash);
        if (distance > 10) {
          result.adult = true;
        }
      }
    }
  }
  return result;
}
~~~
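
For intuition, the hamming distance used to compare hashes is the number of bit positions in which two hashes differ. The sketch below is not the **dHash** component's implementation. It assumes, purely for illustration, that a 64-bit hash is encoded as a 16-character hex string, and *hammingDistance* is a hypothetical name.

~~~javascript
// Counts differing bits between two equal-length hex strings.
function hammingDistance(hashA, hashB) {
  if (hashA.length !== hashB.length) {
    throw new Error('hashes must have the same length');
  }
  let distance = 0;
  for (let i = 0; i < hashA.length; i++) {
    // XOR the two 4-bit digits, then count the set bits.
    let diff = parseInt(hashA[i], 16) ^ parseInt(hashB[i], 16);
    while (diff) {
      distance += diff & 1;
      diff >>= 1;
    }
  }
  return distance;
}

console.log(hammingDistance('ffff0000ffff0000', 'ffff0000ffff0001')); // 1
console.log(hammingDistance('ffff0000ffff0000', '0000ffff0000ffff')); // 64
~~~

A distance of 1 would count as a near-duplicate under the thresholds above, while 64 means every bit differs.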

#### Aggregating results in a sliding time window

For live streams, currently there are no built-in aggregation methods to aggregate results from across frames.
However, *state* can be used to aggregate frame results over a time or frame window.
In the following example, *result.adult* is set based on frame results over a time window.
*result.adult* is set to true (i.e. the live stream is marked as adult) if there are at least two 
unique frames (different by a dHash distance of more than 10) flagged as adult within the last 120 seconds.

~~~javascript
function processFrame(frame) {
  const { image } = frame;
  const time = Math.floor(Date.now() / 1000);

  const result = { adult: false };
  let adultFrame = false;

  const hash = dHash.compute(image);
  const adultResult = bingAdultV3.run(image);  
  if (adultResult.adultScore >= 0.8) {
    adultFrame = true;
  }
  
  const timeWindow = 120;
  // Start from an empty list on the first frame, then drop entries older than the window.
  state.adultFrames = (state.adultFrames || []).filter(af => time - af.time <= timeWindow);
  if (adultFrame) {
    state.adultFrames.forEach((af) => {
      const distance = dHash.compare(af.hash, hash);
      if (distance > 10) {
        result.adult = true;
      }
    });

    state.adultFrames.push({ time, hash });
  }

  return result;
}
~~~
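
The window pruning step can be checked in isolation by passing the current time explicitly instead of reading the clock. This is a minimal sketch with made-up timestamps and hashes, and *pruneWindow* is a hypothetical helper name.

~~~javascript
// Keeps only entries whose time falls within the last windowSeconds.
function pruneWindow(entries, now, windowSeconds) {
  return entries.filter(entry => now - entry.time <= windowSeconds);
}

let adultFrames = [
  { time: 1000, hash: 'a' },
  { time: 1100, hash: 'b' },
];
adultFrames = pruneWindow(adultFrames, 1125, 120);
console.log(adultFrames.length); // 1: the frame at time 1000 is 125 seconds old and is dropped
~~~

Passing the time in as an argument makes the aggregation logic testable without waiting for real time to elapse.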

### Debugging skills
Clone a new skill when creating or modifying a skill.
Test the skill by setting the skillId in the requests.
Use [boards](/videos/boards) or the interactive [query interface](/videos) to browse results and verify the accuracy of the skill.
Debugging information in the skill can be added to *result* and can be viewed with the responses.
Use [metrics](/videos/metrics) to study the performance of the skill.
Use the [errors page](/videos/errors) to find scripting or runtime errors.
Errors (if any) are populated as soon as a skill is run on a request.
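
As an illustration of adding debugging information to *result*, the sketch below attaches intermediate values under a separate key. The classifier is a local stub with a made-up score, and the *debug* key is an arbitrary choice rather than a reserved field.

~~~javascript
// Stub standing in for a repository component (hypothetical output).
const bingAdultV3 = { run: () => ({ adultScore: 0.62 }) };

function processImage(image, props) {
  const threshold = 0.75;
  const adultResult = bingAdultV3.run(image);
  const result = { adult: adultResult.adultScore >= threshold };
  // Debug fields travel with the response and can be removed once the skill is verified.
  result.debug = {
    threshold,
    adultScore: adultResult.adultScore,
    props: props || null,
  };
  return result;
}

console.log(JSON.stringify(processImage(null, { source: 'test' })));
~~~

Keeping debug values under one key makes them easy to strip before the skill is set as default.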

The Watch For team is working on more tools for easy debugging.

<br/><br/>
`;

export default function Doc() {
  return <Markdown markdown={markdown} />;
}
