Knowledge base search and retrieval based on document similarity

    Publication No.: US10332123B2

    Publication date: 2019-06-25

    Application No.: US14837249

    Application date: 2015-08-27

    Abstract: A system performs search and retrieval. The system monitors one or more user interface (“UI”) fields configured to receive text input in a UI. The system determines that the one or more UI fields are being used to enter a textual description, and performs a search on a knowledge base based on document similarity to identify documents that are similar to a portion of the textual description that has already been entered in the one or more UI fields. The system then provides one or more of the documents in a UI field of the UI, and repeats the monitoring, the determining, the performing, and the providing.
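The monitor, search, and provide loop in this abstract can be illustrated with a small sketch. The abstract does not say which similarity measure is used, so the TF-IDF weighting and cosine similarity below are assumptions, as are the names `build_index` and `search` and the three sample knowledge base entries. Each call to `search` stands in for one pass of the loop over the text typed so far.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Build smoothed IDF weights and TF-IDF vectors for {doc_id: text}."""
    counts = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    doc_freq = Counter()
    for c in counts.values():
        doc_freq.update(c.keys())
    n = len(docs)
    idf = {t: math.log((1 + n) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    vectors = {
        doc_id: {t: tf * idf[t] for t, tf in c.items()}
        for doc_id, c in counts.items()
    }
    return idf, vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(partial_text, idf, vectors, top_k=3):
    """Rank documents by similarity to the text entered so far."""
    tf = Counter(tokenize(partial_text))
    query = {t: c * idf[t] for t, c in tf.items() if t in idf}
    scored = sorted(
        ((cosine(query, vec), doc_id) for doc_id, vec in vectors.items()),
        reverse=True,
    )
    return [doc_id for score, doc_id in scored[:top_k] if score > 0.0]


# Hypothetical knowledge base entries, for illustration only.
KB = {
    "kb-1": "Printer does not print after driver update",
    "kb-2": "Resetting a forgotten account password",
    "kb-3": "Laptop cannot connect to wireless network",
}
IDF, VECTORS = build_index(KB)

# Simulate repeated monitoring as the description grows in the UI field.
for typed in ["my", "my printer", "my printer will not print"]:
    print(repr(typed), "->", search(typed, IDF, VECTORS))
```

In a real UI the loop body would run on a debounced input event rather than over a fixed list of strings.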

    STREAMING LATENT DIRICHLET ALLOCATION
    Invention application

    Publication No.: US20190114319A1

    Publication date: 2019-04-18

    Application No.: US15934262

    Application date: 2018-03-23

    Abstract: Embodiments make novel use of random data structures to facilitate streaming inference for a Latent Dirichlet Allocation (LDA) model. Utilizing random data structures facilitates streaming inference by entirely avoiding the need for pre-computation, which is generally an obstacle to many current “streaming” variants of LDA as described above. Specifically, streaming inference—based on an inference algorithm such as Stochastic Cellular Automata (SCA), Gibbs sampling, and/or Stochastic Expectation Maximization (SEM)—is implemented using a count-min sketch to track sufficient statistics for the inference procedure. Use of a count-min sketch avoids the need to know the vocabulary size V a priori. Also, use of a count-min sketch directly enables feature hashing, which addresses the problem of effectively encoding words into indices without the need of pre-computation. Approximate counters are also used within the count-min sketch to avoid bit overflow issues with the counts in the sketch.
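A minimal sketch of the count-min idea named in this abstract follows. It tracks word-topic counts keyed by hashed strings, so no vocabulary size or word-to-index table has to be built in advance. The class name `CountMinSketch`, the `word|topic` key format, the table dimensions, and the sample stream are assumptions for illustration. The sketch uses plain integer cells and omits the approximate counters the abstract mentions, and it is not the patent's inference procedure.

```python
import hashlib


class CountMinSketch:
    """Fixed-size table of counters addressed by one hash per row."""

    def __init__(self, width=1024, depth=4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, key, row):
        # Feature hashing: the string key is hashed straight to a column,
        # with the row number as salt so each row uses a different hash.
        digest = hashlib.blake2b(
            key.encode("utf-8"),
            digest_size=8,
            salt=row.to_bytes(8, "little"),
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, key, delta=1):
        """Increment the counter for this key in every row."""
        for row in range(self.depth):
            self.table[row][self._index(key, row)] += delta

    def estimate(self, key):
        """Return the minimum across rows; it never underestimates."""
        return min(
            self.table[row][self._index(key, row)]
            for row in range(self.depth)
        )


# Hypothetical stream of (word, topic) assignments from an LDA sampler.
stream = [("apple", 0), ("apple", 0), ("bank", 1), ("apple", 1), ("bank", 1)]

sketch = CountMinSketch()
for word, topic in stream:
    sketch.add(f"{word}|{topic}")

print(sketch.estimate("apple|0"), sketch.estimate("bank|1"))
```

Because collisions can only add to a cell, taking the minimum over rows bounds the error from above, which is why the estimate is at least the true count.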

    Debiasing Pre-trained Sentence Encoders With Probabilistic Dropouts

    Publication No.: US20240419900A1

    Publication date: 2024-12-19

    Application No.: US18817147

    Application date: 2024-08-27

    Abstract: Debiasing pre-trained sentence encoders with probabilistic dropouts may be performed by various systems, services, or applications. A sentence may be received, where the words of the sentence may be provided as tokens to an encoder of a machine learning model. A token-wise correlation using semantic orientation may be computed to determine a bias score for the tokens in the input sentence. A probability of dropout for tokens in the input sentence may be determined from the bias scores. The machine learning model may be trained or tuned based on the probabilities of dropout for the tokens in the input sentence.
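The step from bias scores to dropout probabilities can be sketched as follows. The abstract does not give the mapping, so the linear scaling by the largest absolute score, the `max_prob` cap, the `[MASK]` replacement token, and the example scores are all assumptions. The bias scores themselves are taken as given rather than computed from semantic orientation.

```python
import random


def dropout_probabilities(bias_scores, max_prob=0.9):
    """Scale absolute bias scores linearly into [0, max_prob]."""
    peak = max((abs(s) for s in bias_scores), default=0.0)
    if peak == 0.0:
        return [0.0 for _ in bias_scores]
    return [max_prob * abs(s) / peak for s in bias_scores]


def apply_dropout(tokens, probs, rng, mask_token="[MASK]"):
    """Replace each token with mask_token with its own probability."""
    return [
        mask_token if rng.random() < p else tok
        for tok, p in zip(tokens, probs)
    ]


# Hypothetical tokens and bias scores, for illustration only.
tokens = ["the", "nurse", "said", "she", "was", "tired"]
bias_scores = [0.0, 0.2, 0.0, 0.8, 0.0, 0.0]

probs = dropout_probabilities(bias_scores)
rng = random.Random(0)  # seeded so repeated runs give the same samples

# Each training step would draw a fresh mask for the same sentence.
for _ in range(3):
    print(apply_dropout(tokens, probs, rng))
```

Tokens with a score of zero are never dropped, while the highest-scoring token is dropped most often, so across many training steps the encoder sees the sentence mostly without its most bias-correlated tokens.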

    Debiasing Pre-trained Sentence Encoders With Probabilistic Dropouts

    Publication No.: US20220245339A1

    Publication date: 2022-08-04

    Application No.: US17589662

    Application date: 2022-01-31

    Abstract: Debiasing pre-trained sentence encoders with probabilistic dropouts may be performed by various systems, services, or applications. A sentence may be received, where the words of the sentence may be provided as tokens to an encoder of a machine learning model. A token-wise correlation using semantic orientation may be computed to determine a bias score for the tokens in the input sentence. A probability of dropout for tokens in the input sentence may be determined from the bias scores. The machine learning model may be trained or tuned based on the probabilities of dropout for the tokens in the input sentence.
